Barbara Miller
b2a4fbb17f
Merge branch 'ARI-4868' into qa
2017-05-10 15:09:00 -07:00
Barbara Miller
35977f6276
enable huffpostslides.js
2017-05-10 15:08:29 -07:00
Barbara Miller
054625b8a5
Merge pull request #40 from BitBaron/ari-4960
Crawl Google Calendar for fortstjames.ca
2017-05-09 14:12:48 -07:00
Noah Levitt
0c45ca2211
Merge branch 'master' into qa
* master:
do a better job of making sure to shut down the browser when brozzle-page is killed
2017-05-03 16:43:38 -07:00
Noah Levitt
b4bf17df9b
do a better job of making sure to shut down the browser when brozzle-page is killed
2017-05-03 16:43:31 -07:00
Noah Levitt
c3637ecb35
Merge branch 'master' into qa
* master:
handle another rethinkdb outage corner case
2017-05-01 14:12:51 -07:00
Noah Levitt
9d4cbbf6eb
handle another rethinkdb outage corner case
2017-05-01 14:12:43 -07:00
Noah Levitt
15a3da61c6
Merge branch 'master' into qa
* master:
BrozzlerWorkerThread separate from MainThread to avoid SIGTERM/SIGINT raising exception inside of some rethinkdb code or other sensitive code in that BrozzlerWorker.run() calls
2017-05-01 13:46:28 -07:00
Noah Levitt
389db01458
BrozzlerWorkerThread separate from MainThread to avoid SIGTERM/SIGINT raising exception inside of some rethinkdb code or other sensitive code in that BrozzlerWorker.run() calls
2017-05-01 13:46:19 -07:00
Noah Levitt
69d8571871
Merge branch 'master' into qa
* master:
re-claim sites after 1 hour instead of 2 so that sites don't have to wait as long to be brozzled again in case of kill -9 brozzler-worker
add a github PR template for this repo
update headless chrome instructions for regular chrome builds
use the new api `with brozzler.thread_accept_exceptions()`
refactor thread_raise safety to use a context manager
allow this stupid test to fail
improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id
safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such
2017-05-01 13:00:34 -07:00
Noah Levitt
52433ade78
re-claim sites after 1 hour instead of 2 so that sites don't have to wait as long to be brozzled again in case of kill -9 brozzler-worker
2017-05-01 13:00:04 -07:00
Noah Levitt
000d40c4dc
Merge pull request #39 from bnewbold/bnewbold-pr-template
add a github PR template for this repo
2017-04-26 14:34:32 -07:00
bnewbold
83552eb444
add a github PR template for this repo
2017-04-26 14:10:24 -07:00
Noah Levitt
d972919db0
Merge pull request #36 from nlevitt/safe-thread-raise
safen up brozzler.thread_raise() to avoid interrupting rethinkdb tran…
2017-04-26 11:15:02 -07:00
Noah Levitt
27ee8d53f8
Merge pull request #38 from ato/headless-doc
update headless chrome instructions for regular chrome builds
2017-04-25 09:39:43 -07:00
Alex Osborne
69aba8b762
update headless chrome instructions for regular chrome builds
Also make it clearer that this hasn't been tested much.
2017-04-25 15:00:25 +10:00
Noah Levitt
dcf4811470
Merge branch 'master' into safe-thread-raise
2017-04-24 20:06:37 -07:00
Noah Levitt
d916b68ab9
use the new api `with brozzler.thread_accept_exceptions()`
2017-04-24 20:02:34 -07:00
Noah Levitt
0953e6972e
refactor thread_raise safety to use a context manager
2017-04-24 19:51:51 -07:00
Noah Levitt
f140e5bdbd
allow this stupid test to fail
2017-04-21 12:17:11 -07:00
Noah Levitt
ba519d7288
improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id
2017-04-20 18:04:17 -07:00
Noah Levitt
7706bab8b8
safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such
2017-04-20 17:08:16 -07:00
Noah Levitt
4f5553954c
Merge branch 'master' into qa
* master:
quote that shell meta character
need warcprox in python path for travis tests now
2017-04-19 08:58:47 -07:00
Noah Levitt
b3fa7a4e39
quote that shell meta character
2017-04-18 18:46:59 -07:00
Noah Levitt
426916a238
need warcprox in python path for travis tests now
2017-04-18 18:10:18 -07:00
Noah Levitt
d8904dc9e7
Merge branch 'master' into qa
* master:
implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
have _warcprox_write_record also raise ProxyError when appropriate, and test this
2017-04-18 17:54:21 -07:00
Noah Levitt
8256a34b4f
implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
2017-04-18 17:54:12 -07:00
Noah Levitt
5603ff5380
have _warcprox_write_record also raise ProxyError when appropriate, and test this
2017-04-18 16:58:51 -07:00
Neil Minton
f90d05075e
Merge branch 'ari-4960' into qa
2017-04-18 15:24:14 -07:00
Neil Minton
f541dce5c3
Crawl Google Calendar for fortstjames.ca
2017-04-18 15:22:33 -07:00
Noah Levitt
d05474173a
Merge branch 'master' into qa
* master:
fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth
2017-04-18 12:00:33 -07:00
Noah Levitt
ac972d399f
fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth
2017-04-18 12:00:23 -07:00
Noah Levitt
6844cb5bcb
Merge branch 'master' into qa
* master:
raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch
raise new exception brozzler.ProxyError in case of proxy error browsing a page
make brozzle-page respect --proxy (no test for this!)
oops, version bump for previous commit
bubble up proxy errors fetching robots.txt, with unit test, and documentation
2017-04-17 18:15:32 -07:00
Noah Levitt
dc43794363
raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch
2017-04-17 18:15:22 -07:00
Noah Levitt
349b41ab32
raise new exception brozzler.ProxyError in case of proxy error browsing a page
2017-04-17 18:14:02 -07:00
Noah Levitt
87a7301f4d
make brozzle-page respect --proxy (no test for this!)
2017-04-17 18:11:09 -07:00
Noah Levitt
0e90950de2
oops, version bump for previous commit
2017-04-17 18:10:56 -07:00
Noah Levitt
0884b4cd56
bubble up proxy errors fetching robots.txt, with unit test, and documentation
2017-04-17 16:47:05 -07:00
Noah Levitt
929f046ebb
Merge branch 'master' into qa
* master:
new command line utility brozzler-stop-crawl, with tests
2017-04-14 18:06:24 -07:00
Noah Levitt
df7734f2ca
new command line utility brozzler-stop-crawl, with tests
2017-04-14 18:06:15 -07:00
Barbara Miller
11279e001b
Merge branch 'ARI-5259' into qa
2017-04-14 15:20:11 -07:00
Barbara Miller
72e9d8da58
blog.sin.com.cn pagination
2017-04-14 15:19:12 -07:00
Noah Levitt
a768b07a65
Merge branch 'master' into qa
* master:
parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run
2017-04-14 11:46:39 -07:00
Noah Levitt
fae60e9960
parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run
2017-04-14 11:46:26 -07:00
Noah Levitt
b7731bdc75
Merge branch 'master' into qa
* master:
stupid version number bump
Revert "bump version number for last pull request"
2017-04-05 17:02:09 -07:00
Noah Levitt
b3cf746f53
stupid version number bump
2017-04-05 17:01:52 -07:00
Noah Levitt
62917a6f1a
Revert "bump version number for last pull request"
This reverts commit d192fc269e.
2017-04-05 17:01:06 -07:00
Noah Levitt
40022ad0c2
Merge branch 'master' into qa
* master:
bump version number for last pull request
2017-04-05 16:15:33 -07:00
Noah Levitt
d192fc269e
bump version number for last pull request
2017-04-05 16:15:24 -07:00
Barbara Miller
537eb1cf7f
Merge pull request #34 from galgeek/ARI-5193
mouseover for ky.gov sites
2017-04-05 16:13:57 -07:00