Noah Levitt
000d40c4dc
Merge pull request #39 from bnewbold/bnewbold-pr-template
...
add a github PR template for this repo
2017-04-26 14:34:32 -07:00
bnewbold
83552eb444
add a github PR template for this repo
2017-04-26 14:10:24 -07:00
Noah Levitt
d972919db0
Merge pull request #36 from nlevitt/safe-thread-raise
...
safen up brozzler.thread_raise() to avoid interrupting rethinkdb tran…
2017-04-26 11:15:02 -07:00
Noah Levitt
27ee8d53f8
Merge pull request #38 from ato/headless-doc
...
update headless chrome instructions for regular chrome builds
2017-04-25 09:39:43 -07:00
Alex Osborne
69aba8b762
update headless chrome instructions for regular chrome builds
...
Also make it clearer that this hasn't been tested much.
2017-04-25 15:00:25 +10:00
Noah Levitt
dcf4811470
Merge branch 'master' into safe-thread-raise
2017-04-24 20:06:37 -07:00
Noah Levitt
d916b68ab9
use the new api with brozzler.thread_accept_exceptions()
2017-04-24 20:02:34 -07:00
Noah Levitt
0953e6972e
refactor thread_raise safety to use a context manager
2017-04-24 19:51:51 -07:00
Noah Levitt
f140e5bdbd
allow this stupid test to fail
2017-04-21 12:17:11 -07:00
Noah Levitt
ba519d7288
improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id
2017-04-20 18:04:17 -07:00
Noah Levitt
7706bab8b8
safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such
2017-04-20 17:08:16 -07:00
Noah Levitt
4f5553954c
Merge branch 'master' into qa
...
* master:
  quote that shell meta character
  need warcprox in python path for travis tests now
2017-04-19 08:58:47 -07:00
Noah Levitt
b3fa7a4e39
quote that shell meta character
2017-04-18 18:46:59 -07:00
Noah Levitt
426916a238
need warcprox in python path for travis tests now
2017-04-18 18:10:18 -07:00
Noah Levitt
d8904dc9e7
Merge branch 'master' into qa
...
* master:
  implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
  have _warcprox_write_record also raise ProxyError when appropriate, and test this
2017-04-18 17:54:21 -07:00
Noah Levitt
8256a34b4f
implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
2017-04-18 17:54:12 -07:00
Noah Levitt
5603ff5380
have _warcprox_write_record also raise ProxyError when appropriate, and test this
2017-04-18 16:58:51 -07:00
Neil Minton
f90d05075e
Merge branch 'ari-4960' into qa
2017-04-18 15:24:14 -07:00
Neil Minton
f541dce5c3
Crawl Google Calendar for fortstjames.ca
2017-04-18 15:22:33 -07:00
Noah Levitt
d05474173a
Merge branch 'master' into qa
...
* master:
fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth
2017-04-18 12:00:33 -07:00
Noah Levitt
ac972d399f
fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth
2017-04-18 12:00:23 -07:00
Noah Levitt
6844cb5bcb
Merge branch 'master' into qa
...
* master:
  raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch
  raise new exception brozzler.ProxyError in case of proxy error browsing a page
  make brozzle-page respect --proxy (no test for this!)
  oops, version bump for previous commit
  bubble up proxy errors fetching robots.txt, with unit test, and documentation
2017-04-17 18:15:32 -07:00
Noah Levitt
dc43794363
raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch
2017-04-17 18:15:22 -07:00
Noah Levitt
349b41ab32
raise new exception brozzler.ProxyError in case of proxy error browsing a page
2017-04-17 18:14:02 -07:00
Noah Levitt
87a7301f4d
make brozzle-page respect --proxy (no test for this!)
2017-04-17 18:11:09 -07:00
Noah Levitt
0e90950de2
oops, version bump for previous commit
2017-04-17 18:10:56 -07:00
Noah Levitt
0884b4cd56
bubble up proxy errors fetching robots.txt, with unit test, and documentation
2017-04-17 16:47:05 -07:00
Noah Levitt
929f046ebb
Merge branch 'master' into qa
...
* master:
new command line utility brozzler-stop-crawl, with tests
2017-04-14 18:06:24 -07:00
Noah Levitt
df7734f2ca
new command line utility brozzler-stop-crawl, with tests
2017-04-14 18:06:15 -07:00
Barbara Miller
11279e001b
Merge branch 'ARI-5259' into qa
2017-04-14 15:20:11 -07:00
Barbara Miller
72e9d8da58
blog.sin.com.cn pagination
2017-04-14 15:19:12 -07:00
Noah Levitt
a768b07a65
Merge branch 'master' into qa
...
* master:
parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run
2017-04-14 11:46:39 -07:00
Noah Levitt
fae60e9960
parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run
2017-04-14 11:46:26 -07:00
Noah Levitt
b7731bdc75
Merge branch 'master' into qa
...
* master:
  stupid version number bump
  Revert "bump version number for last pull request"
2017-04-05 17:02:09 -07:00
Noah Levitt
b3cf746f53
stupid version number bump
2017-04-05 17:01:52 -07:00
Noah Levitt
62917a6f1a
Revert "bump version number for last pull request"
...
This reverts commit d192fc269e.
2017-04-05 17:01:06 -07:00
Noah Levitt
40022ad0c2
Merge branch 'master' into qa
...
* master:
bump version number for last pull request
2017-04-05 16:15:33 -07:00
Noah Levitt
d192fc269e
bump version number for last pull request
2017-04-05 16:15:24 -07:00
Barbara Miller
537eb1cf7f
Merge pull request #34 from galgeek/ARI-5193
...
mouseover for ky.gov sites
2017-04-05 16:13:57 -07:00
Noah Levitt
6535922fa6
Merge branch 'master' into qa
...
* master:
extract area/@href links, and add test for outlink extraction
2017-04-05 12:09:58 -07:00
Noah Levitt
5bcd10c228
extract area/@href links, and add test for outlink extraction
2017-04-05 12:09:48 -07:00
Barbara Miller
010c2869dd
Merge branch 'ARI-5193' into qa
2017-04-04 15:52:50 -07:00
Barbara Miller
847b68eaf4
add JIRA info
2017-04-04 15:52:03 -07:00
Barbara Miller
bdf4c2db42
Merge branch 'ARI-5193' into qa
2017-03-31 15:49:24 -07:00
Barbara Miller
901321199c
mouseover for ky.gov sites
2017-03-31 15:48:01 -07:00
Noah Levitt
db8c1d36fa
Merge branch 'master' into qa
...
* master:
ugh fix version number
2017-03-30 17:53:45 -07:00
Noah Levitt
d4d3ef4fd3
ugh fix version number
2017-03-30 17:53:36 -07:00
Noah Levitt
082f10d327
Merge branch 'master' into qa
...
* master:
consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
2017-03-29 18:49:15 -07:00
Noah Levitt
125d77b8c4
consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
2017-03-29 18:49:04 -07:00
Noah Levitt
a83c11b302
Merge branch 'master' into qa
...
* master:
  new model for crawling hashtags, each one is no longer a top-level page
  remove some vestiges of old proxy stuff
2017-03-27 12:16:11 -07:00