brozzler

mirror of https://github.com/internetarchive/brozzler.git synced 2025-08-18 19:18:15 -04:00

Author	SHA1	Message	Date
Noah Levitt	f2227e6759	have travis-ci test against python 3.5 and 3.6 too	2017-05-26 13:28:00 -07:00
Noah Levitt	bdc0badec3	rewrite frontier.scope_and_schedule_outlinks() to use batch rethinkdb queries, because we have witnessed the method running for hours(!)	2017-05-26 13:24:14 -07:00
Noah Levitt	0a19770ba7	Merge branch 'master' into qa * master: remove stray logging	2017-05-24 11:36:15 -07:00
Noah Levitt	d904daea9c	remove stray logging	2017-05-24 11:36:06 -07:00
Noah Levitt	f0b9020c0a	Merge branch 'master' into qa * master: use "ttl" for updated doublethink svc reg api	2017-05-23 11:35:19 -07:00
Noah Levitt	ac543ee5b6	use "ttl" for updated doublethink svc reg api	2017-05-23 11:33:04 -07:00
Barbara Miller	079db762d4	add relocated behavior file with updated copyright	2017-05-22 12:38:50 -07:00
Barbara Miller	d7c31be8d0	enable huffpostslides.js	2017-05-22 12:32:28 -07:00
Noah Levitt	9c8f626c38	Merge branch 'master' into qa * master: fix exception from ReachedLimit.__repr__ when it has been instantiated implicitly and __init__ was not called improve thread_raise() so that the new tests pass even more, better failing tests for thread_raise failing test for forthcoming behavior of thread_raise	2017-05-16 15:47:27 -07:00
Noah Levitt	89e7c8b079	fix exception from ReachedLimit.__repr__ when it has been instantiated implicitly and __init__ was not called	2017-05-16 15:47:18 -07:00
Noah Levitt	31dc6a2d97	improve thread_raise() so that the new tests pass 1. If thread is not currently accepting exceptions, queue it and raise if and when it does start accepting them. This fixes problem of thread_raise exceptions being ignored when raised just before the target thread starts accepting exceptions. 2. Avoid problems caused by raising multiple exceptions in the same thread in quick succession by ensuring that only one is actually raised for a given `with` block. This type of occurrence had been putting brozzler into a borked/frozen state.	2017-05-16 14:20:53 -07:00
Noah Levitt	d514eaec15	even more, better failing tests for thread_raise	2017-05-16 14:00:10 -07:00
Noah Levitt	d2525e2e87	failing test for forthcoming behavior of thread_raise	2017-05-15 16:20:20 -07:00
Noah Levitt	e5371ef0b0	Merge branch 'master' into qa * master: recognize ConnectionError (of which ConnectionResetError is a subclass) in _warcprox_write_record as a proxy error	2017-05-12 10:04:06 -07:00
Noah Levitt	60c5a7c1c4	recognize ConnectionError (of which ConnectionResetError is a subclass) in _warcprox_write_record as a proxy error	2017-05-12 10:03:53 -07:00
Barbara Miller	b2a4fbb17f	Merge branch 'ARI-4868' into qa	2017-05-10 15:09:00 -07:00
Barbara Miller	35977f6276	enable huffpostslides.js	2017-05-10 15:08:29 -07:00
Barbara Miller	054625b8a5	Merge pull request #40 from BitBaron/ari-4960 Crawl Google Calendar for fortstjames.ca	2017-05-09 14:12:48 -07:00
Noah Levitt	0c45ca2211	Merge branch 'master' into qa * master: do a better job of making sure to shut down the browser when brozzle-page is killed	2017-05-03 16:43:38 -07:00
Noah Levitt	b4bf17df9b	do a better job of making sure to shut down the browser when brozzle-page is killed	2017-05-03 16:43:31 -07:00
Noah Levitt	c3637ecb35	Merge branch 'master' into qa * master: handle another rethinkdb outage corner case	2017-05-01 14:12:51 -07:00
Noah Levitt	9d4cbbf6eb	handle another rethinkdb outage corner case	2017-05-01 14:12:43 -07:00
Noah Levitt	15a3da61c6	Merge branch 'master' into qa * master: BrozzlerWorkerThread separate from MainThread to avoid SIGTERM/SIGINT raising exception inside of some rethinkdb code or other sensitive code in that BrozzlerWorker.run() calls	2017-05-01 13:46:28 -07:00
Noah Levitt	389db01458	BrozzlerWorkerThread separate from MainThread to avoid SIGTERM/SIGINT raising exception inside of some rethinkdb code or other sensitive code in that BrozzlerWorker.run() calls	2017-05-01 13:46:19 -07:00
Noah Levitt	69d8571871	Merge branch 'master' into qa * master: re-claim sites after 1 hour instead of 2 so that sites don't have to wait as long to be brozzled again in case of kill -9 brozzler-worker add a github PR template for this repo update headless chrome instructions for regular chrome builds use the new api `with brozzler.thread_accept_exceptions()` refactor thread_raise safety to use a context manager allow this stupid test to fail improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such	2017-05-01 13:00:34 -07:00
Noah Levitt	52433ade78	re-claim sites after 1 hour instead of 2 so that sites don't have to wait as long to be brozzled again in case of kill -9 brozzler-worker	2017-05-01 13:00:04 -07:00
Noah Levitt	000d40c4dc	Merge pull request #39 from bnewbold/bnewbold-pr-template add a github PR template for this repo	2017-04-26 14:34:32 -07:00
bnewbold	83552eb444	add a github PR template for this repo	2017-04-26 14:10:24 -07:00
Noah Levitt	d972919db0	Merge pull request #36 from nlevitt/safe-thread-raise safen up brozzler.thread_raise() to avoid interrupting rethinkdb tran…	2017-04-26 11:15:02 -07:00
Noah Levitt	27ee8d53f8	Merge pull request #38 from ato/headless-doc update headless chrome instructions for regular chrome builds	2017-04-25 09:39:43 -07:00
Alex Osborne	69aba8b762	update headless chrome instructions for regular chrome builds Also make it clearer that this hasn't been tested much.	2017-04-25 15:00:25 +10:00
Noah Levitt	dcf4811470	Merge branch 'master' into safe-thread-raise	2017-04-24 20:06:37 -07:00
Noah Levitt	d916b68ab9	use the new api `with brozzler.thread_accept_exceptions()`	2017-04-24 20:02:34 -07:00
Noah Levitt	0953e6972e	refactor thread_raise safety to use a context manager	2017-04-24 19:51:51 -07:00
Noah Levitt	f140e5bdbd	allow this stupid test to fail	2017-04-21 12:17:11 -07:00
Noah Levitt	ba519d7288	improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id	2017-04-20 18:04:17 -07:00
Noah Levitt	7706bab8b8	safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such	2017-04-20 17:08:16 -07:00
Noah Levitt	4f5553954c	Merge branch 'master' into qa * master: quote that shell meta character need warcprox in python path for travis tests now	2017-04-19 08:58:47 -07:00
Noah Levitt	b3fa7a4e39	quote that shell meta character	2017-04-18 18:46:59 -07:00
Noah Levitt	426916a238	need warcprox in python path for travis tests now	2017-04-18 18:10:18 -07:00
Noah Levitt	d8904dc9e7	Merge branch 'master' into qa * master: implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker have _warcprox_write_record also raise ProxyError when appropriate, and test this	2017-04-18 17:54:21 -07:00
Noah Levitt	8256a34b4f	implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker	2017-04-18 17:54:12 -07:00
Noah Levitt	5603ff5380	have _warcprox_write_record also raise ProxyError when appropriate, and test this	2017-04-18 16:58:51 -07:00
Neil Minton	f90d05075e	Merge branch 'ari-4960' into qa	2017-04-18 15:24:14 -07:00
Neil Minton	f541dce5c3	Crawl Google Calendar for fortstjames.ca	2017-04-18 15:22:33 -07:00
Noah Levitt	d05474173a	Merge branch 'master' into qa * master: fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth	2017-04-18 12:00:33 -07:00
Noah Levitt	ac972d399f	fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth	2017-04-18 12:00:23 -07:00
Noah Levitt	6844cb5bcb	Merge branch 'master' into qa * master: raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch raise new exception brozzler.ProxyError in case of proxy error browsing a page make brozzle-page respect --proxy (no test for this!) oops, version bump for previous commit bubble up proxy errors fetching robots.txt, with unit test, and documentation	2017-04-17 18:15:32 -07:00
Noah Levitt	dc43794363	raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch	2017-04-17 18:15:22 -07:00
Noah Levitt	349b41ab32	raise new exception brozzler.ProxyError in case of proxy error browsing a page	2017-04-17 18:14:02 -07:00

1587 commits