25 Commits

Author SHA1 Message Date
Noah Levitt
a74f46dc53 least surprise on http/https seed redirects
if http://foo.com/ redirects to https://foo.com/a/b/c let's also
put all of https://foo.com/ in scope
2018-12-21 15:17:31 -08:00
Noah Levitt
db62402be8 fix tests 2018-11-27 14:35:00 -08:00
Noah Levitt
e7d2273856 fix failing tests 2018-08-16 11:40:54 -07:00
Noah Levitt
c52c16c260 fix bug in test, add another one 2018-06-22 16:10:23 -05:00
Noah Levitt
aeb7c3f825 treat any error fetching robots.txt as "allow all" 2018-06-22 14:50:57 -05:00
Noah Levitt
05f8ab3495 fix more tests for new approach sans scope['surt'] 2018-05-14 15:38:28 -07:00
Noah Levitt
bf5401283e new test test_needs_browsing
currently exposes bug in resolving "location" response header
2018-01-26 10:59:18 -08:00
Noah Levitt
d514eaec15 even more, better failing tests for thread_raise 2017-05-16 14:00:10 -07:00
Noah Levitt
d2525e2e87 failing test for forthcoming behavior of thread_raise 2017-05-15 16:20:20 -07:00
Noah Levitt
0953e6972e refactor thread_raise safety to use a context manager 2017-04-24 19:51:51 -07:00
Noah Levitt
7706bab8b8 safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such 2017-04-20 17:08:16 -07:00
Noah Levitt
5603ff5380 have _warcprox_write_record also raise ProxyError when appropriate, and test this 2017-04-18 16:58:51 -07:00
Noah Levitt
ac972d399f fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth 2017-04-18 12:00:23 -07:00
Noah Levitt
dc43794363 raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch 2017-04-17 18:15:22 -07:00
Noah Levitt
0884b4cd56 bubble up proxy errors fetching robots.txt, with unit test, and documentation 2017-04-17 16:47:05 -07:00
Noah Levitt
12fb9eaa15 use urlcanon library for canonicalization, surtification, scope match rules 2017-03-15 14:59:51 -07:00
Noah Levitt
40bbbb3524 add tests of backwards compatibility handling of start/stop times and fix a bug or two 2017-03-02 16:53:24 -08:00
Noah Levitt
569af05b11 rethinkstuff is now "doublethink 2017-03-02 12:48:45 -08:00
Noah Levitt
2398031010 let the OS pick an available port, to avoid what appear to be timing issues causing multiple browsers to choose the same port 2017-02-22 12:44:19 -08:00
Noah Levitt
b409e49cfa deprecate current scope rule syntax and create new syntax with slightly different semantics (to be documented), and add parent_url_regex scope rule; unit test for scoping 2017-02-15 16:46:45 -08:00
Noah Levitt
86d6060a2d loosen the find_available_port test slightly, since it seems to be not 100% predictable for reasons i haven't investigated 2016-12-20 17:52:21 -08:00
Noah Levitt
9bcec54f4b fix _find_available_port and its unit test 2016-12-07 14:08:34 -08:00
Noah Levitt
eed8b9ec30 little fixes 2016-12-07 11:20:10 -08:00
Noah Levitt
ce03381b92 move _find_available_ports to chrome.py, changing the way it works so that browser:9200 doesn't get stuck at 9201 forever, which pushes 9201 to 9202 etc, and add a unit test 2016-12-06 17:12:20 -08:00
Noah Levitt
3aead6de93 monkey-patch reppy to support substring user-agent matching 2016-11-16 11:41:34 -08:00