1361 commits

Author SHA1 Message Date
Noah Levitt
000d40c4dc Merge pull request #39 from bnewbold/bnewbold-pr-template
add a github PR template for this repo
2017-04-26 14:34:32 -07:00
bnewbold
83552eb444 add a github PR template for this repo 2017-04-26 14:10:24 -07:00
Noah Levitt
d972919db0 Merge pull request #36 from nlevitt/safe-thread-raise
safen up brozzler.thread_raise() to avoid interrupting rethinkdb tran…
2017-04-26 11:15:02 -07:00
Noah Levitt
27ee8d53f8 Merge pull request #38 from ato/headless-doc
update headless chrome instructions for regular chrome builds
2017-04-25 09:39:43 -07:00
Alex Osborne
69aba8b762 update headless chrome instructions for regular chrome builds
Also make it clearer that this hasn't been tested much.
2017-04-25 15:00:25 +10:00
Noah Levitt
dcf4811470 Merge branch 'master' into safe-thread-raise 2017-04-24 20:06:37 -07:00
Noah Levitt
d916b68ab9 use the new api with brozzler.thread_accept_exceptions() 2017-04-24 20:02:34 -07:00
Noah Levitt
0953e6972e refactor thread_raise safety to use a context manager 2017-04-24 19:51:51 -07:00
Noah Levitt
f140e5bdbd allow this stupid test to fail 2017-04-21 12:17:11 -07:00
Noah Levitt
ba519d7288 improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id 2017-04-20 18:04:17 -07:00
Noah Levitt
7706bab8b8 safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such 2017-04-20 17:08:16 -07:00
Noah Levitt
4f5553954c Merge branch 'master' into qa
* master:
  quote that shell meta character
  need warcprox in python path for travis tests now
2017-04-19 08:58:47 -07:00
Noah Levitt
b3fa7a4e39 quote that shell meta character 2017-04-18 18:46:59 -07:00
Noah Levitt
426916a238 need warcprox in python path for travis tests now 2017-04-18 18:10:18 -07:00
Noah Levitt
d8904dc9e7 Merge branch 'master' into qa
* master:
  implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
  have _warcprox_write_record also raise ProxyError when appropriate, and test this
2017-04-18 17:54:21 -07:00
Noah Levitt
8256a34b4f implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker 2017-04-18 17:54:12 -07:00
Noah Levitt
5603ff5380 have _warcprox_write_record also raise ProxyError when appropriate, and test this 2017-04-18 16:58:51 -07:00
Neil Minton
f90d05075e Merge branch 'ari-4960' into qa 2017-04-18 15:24:14 -07:00
Neil Minton
f541dce5c3 Crawl Google Calendar for fortstjames.ca 2017-04-18 15:22:33 -07:00
Noah Levitt
d05474173a Merge branch 'master' into qa
* master:
  fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth
2017-04-18 12:00:33 -07:00
Noah Levitt
ac972d399f fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth 2017-04-18 12:00:23 -07:00
Noah Levitt
6844cb5bcb Merge branch 'master' into qa
* master:
  raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch
  raise new exception brozzler.ProxyError in case of proxy error browsing a page
  make brozzle-page respect --proxy (no test for this!)
  oops, version bump for previous commit
  bubble up proxy errors fetching robots.txt, with unit test, and documentation
2017-04-17 18:15:32 -07:00
Noah Levitt
dc43794363 raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch 2017-04-17 18:15:22 -07:00
Noah Levitt
349b41ab32 raise new exception brozzler.ProxyError in case of proxy error browsing a page 2017-04-17 18:14:02 -07:00
Noah Levitt
87a7301f4d make brozzle-page respect --proxy (no test for this!) 2017-04-17 18:11:09 -07:00
Noah Levitt
0e90950de2 oops, version bump for previous commit 2017-04-17 18:10:56 -07:00
Noah Levitt
0884b4cd56 bubble up proxy errors fetching robots.txt, with unit test, and documentation 2017-04-17 16:47:05 -07:00
Noah Levitt
929f046ebb Merge branch 'master' into qa
* master:
  new command line utility brozzler-stop-crawl, with tests
2017-04-14 18:06:24 -07:00
Noah Levitt
df7734f2ca new command line utility brozzler-stop-crawl, with tests 2017-04-14 18:06:15 -07:00
Barbara Miller
11279e001b Merge branch 'ARI-5259' into qa 2017-04-14 15:20:11 -07:00
Barbara Miller
72e9d8da58 blog.sin.com.cn pagination 2017-04-14 15:19:12 -07:00
Noah Levitt
a768b07a65 Merge branch 'master' into qa
* master:
  parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run
2017-04-14 11:46:39 -07:00
Noah Levitt
fae60e9960 parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run 2017-04-14 11:46:26 -07:00
Noah Levitt
b7731bdc75 Merge branch 'master' into qa
* master:
  stupid version number bump
  Revert "bump version number for last pull request"
2017-04-05 17:02:09 -07:00
Noah Levitt
b3cf746f53 stupid version number bump 2017-04-05 17:01:52 -07:00
Noah Levitt
62917a6f1a Revert "bump version number for last pull request"
This reverts commit d192fc269e.
2017-04-05 17:01:06 -07:00
Noah Levitt
40022ad0c2 Merge branch 'master' into qa
* master:
  bump version number for last pull request
2017-04-05 16:15:33 -07:00
Noah Levitt
d192fc269e bump version number for last pull request 2017-04-05 16:15:24 -07:00
Barbara Miller
537eb1cf7f Merge pull request #34 from galgeek/ARI-5193
mouseover for ky.gov sites
2017-04-05 16:13:57 -07:00
Noah Levitt
6535922fa6 Merge branch 'master' into qa
* master:
  extract area/@href links, and add test for outlink extraction
2017-04-05 12:09:58 -07:00
Noah Levitt
5bcd10c228 extract area/@href links, and add test for outlink extraction 2017-04-05 12:09:48 -07:00
Barbara Miller
010c2869dd Merge branch 'ARI-5193' into qa 2017-04-04 15:52:50 -07:00
Barbara Miller
847b68eaf4 add JIRA info 2017-04-04 15:52:03 -07:00
Barbara Miller
bdf4c2db42 Merge branch 'ARI-5193' into qa 2017-03-31 15:49:24 -07:00
Barbara Miller
901321199c mouseover for ky.gov sites 2017-03-31 15:48:01 -07:00
Noah Levitt
db8c1d36fa Merge branch 'master' into qa
* master:
  ugh fix version number
2017-03-30 17:53:45 -07:00
Noah Levitt
d4d3ef4fd3 ugh fix version number 2017-03-30 17:53:36 -07:00
Noah Levitt
082f10d327 Merge branch 'master' into qa
* master:
  consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
2017-03-29 18:49:15 -07:00
Noah Levitt
125d77b8c4 consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin 2017-03-29 18:49:04 -07:00
Noah Levitt
a83c11b302 Merge branch 'master' into qa
* master:
  new model for crawling hashtags, each one is no longer a top-level page
  remove some vestiges of old proxy stuff
2017-03-27 12:16:11 -07:00