432 Commits

Author SHA1 Message Date
Noah Levitt
b409e49cfa deprecate current scope rule syntax and create new syntax with slightly different semantics (to be documented), and add parent_url_regex scope rule; unit test for scoping 2017-02-15 16:46:45 -08:00
Noah Levitt
c0057e591a add --yaml option to brozzler-list-* commands 2017-02-15 23:13:09 +00:00
Noah Levitt
1054e8e3cb take screenshot before running behavior (but after login) - thanks danielbicho 2017-02-15 09:13:44 -08:00
Noah Levitt
e58f4b7c44 logging tweaks 2017-02-10 15:19:28 -08:00
Noah Levitt
09fa41f959 fix TypeError: not all arguments converted during string formatting 2017-02-03 17:24:47 -08:00
Noah Levitt
14e312e4c4 make sure site is not "claimed" when it's finished 2017-02-03 16:40:15 -08:00
Noah Levitt
a60878c5a7 support for resuming jobs, keeping track of each start and stop time, used to enforce time limits correctly 2017-02-03 14:56:12 -08:00
Noah Levitt
5a0301ac12 let rethinkdb generate job.id if not supplied in configuration 2017-02-03 14:53:50 -08:00
Noah Levitt
129a1e8f47 use underscore convention 2017-02-02 11:52:19 -08:00
Noah Levitt
5f4c5190da improve TRACE level logging 2017-02-02 11:41:40 -08:00
Noah Levitt
ed2d58d87d stopgap fix for problem where an attempt to save a screenshot of a url with a hash tag containing spaces or non-ascii characters would fail, causing the whole brozzle of the page to fail, and end up in a retry loop (better handling of hash tags is planned which will obviate this change) 2017-02-01 22:39:12 +00:00
Noah Levitt
5c684779e5 pywb support for thumbnail: and screenshot: urls 2017-01-31 10:26:38 -08:00
Noah Levitt
8f5003b784 fix oops 2017-01-30 23:47:39 -08:00
Noah Levitt
4b6831b464 new flag Page.blocked_by_robots 2017-01-30 10:43:25 -08:00
Noah Levitt
a8b564f100 be more patient to avoid spurious warnings waiting for browser to start up 2017-01-24 10:06:37 -08:00
Noah Levitt
d22cc075e0 restore ping_timeout argument to WebSocketApp.run_forever to fix problem of leaking websocket receiver threads hanging forever on select() 2017-01-24 09:55:56 -08:00
Noah Levitt
5375b819dd missed a spot 2017-01-20 23:59:31 -08:00
Noah Levitt
c3b637d244 improve brozzler-dashboard logging; fix default wayback baseurl in brozzler dashboard (https://github.com/internetarchive/brozzler/issues/31); tweak arg parsing related stuff 2017-01-20 23:41:59 -08:00
Noah Levitt
095456aa27 avoid js errors in case site or job is not configured to keep stats 2017-01-20 23:36:23 -08:00
Noah Levitt
65f818e901 add travis-ci slack notification to internetarchive/brozzler channel 2017-01-16 12:44:12 -08:00
Noah Levitt
037723fe2b support for BROZZLER_RETHINKDB_SERVERS and BROZZLER_RETHINKDB_DB environment variables, honored by all the brozzler-* commands 2017-01-13 20:27:09 +00:00
Noah Levitt
77c4dc1116 adapt to exception message from newer versions of chromium (e.g. 57.0.2981.0) 2017-01-13 12:08:00 -08:00
Noah Levitt
011d814ee2 tests for dismissal of javascript dialogs (alert, prompt, confirm) 2017-01-13 11:46:42 -08:00
Noah Levitt
d2ed6b97a2 dismiss alerts from the page being browsed (avoids hanging) 2017-01-13 10:27:37 -08:00
Noah Levitt
766441e65c simpleclicks - only click if element is visible, fixes spinning on moma.org sites 2017-01-12 23:23:46 -08:00
Noah Levitt
38d9eee68d implement brozzler-list-pages 2017-01-12 08:22:45 +00:00
Noah Levitt
184612332e new cli utils brozzler-list-jobs and brozzler-list-sites 2017-01-12 07:50:58 +00:00
Noah Levitt
64a0ea879a implement sha1 lookup and url prefix lookup for brozzler-list-captures 2017-01-12 01:26:09 +00:00
Noah Levitt
32097a8f8b catch exceptions parsing funky urls when scoping and extracting outlinks 2017-01-09 15:18:19 -08:00
Noah Levitt
2486768830 fix bug where login form would not be detected in some cases when there was a non-login form earlier on the page 2017-01-09 11:40:30 -08:00
Noah Levitt
d0022fe7bf reset browser shutdown flag when starting up 2017-01-06 17:57:11 -08:00
Noah Levitt
76b658747e fix oversight including username/password in site config when starting a new job 2017-01-06 13:03:09 -08:00
Noah Levitt
c2704b18be restore BrozzlerWorker built-in support for managing its own thread 2017-01-04 14:57:34 -08:00
Noah Levitt
70b67942a5 restore handling of 420 Reached limit, with a rudimentary test 2016-12-22 13:44:09 -08:00
Noah Levitt
e5fb6cb4b9 add import missing from test 2016-12-21 19:19:34 -08:00
Noah Levitt
c90c73372e need $DISPLAY set for test_brozzling.py 2016-12-21 15:15:03 -08:00
Noah Levitt
f7427219cf restore handling of "aw snap" or "he's dead jim" 2016-12-21 14:21:20 -08:00
Noah Levitt
a5d48a9fdb add seed username/password parameters to job config schema 2016-12-20 18:06:20 -08:00
Noah Levitt
edf0a3a50d convert mouseovers and simpleclicks to jinja2 2016-12-20 17:34:29 -08:00
Noah Levitt
e2dbf68ccd remove obsolete facebook login code 2016-12-20 16:38:11 -08:00
Noah Levitt
a0b61408b9 convert behaviors to jinja2, move them to new subdir js-templates, along with javascript previously stored as a string in browser.py 2016-12-20 16:33:25 -08:00
Noah Levitt
7a40822e64 forgot to git add new test data 2016-12-19 18:10:07 -08:00
Noah Levitt
2f8f20bbb4 detect <input type="email"> as potential username field for login 2016-12-19 18:08:10 -08:00
Noah Levitt
86ac48d6c3 generalized support for login doing automatic detection of login form on a page 2016-12-19 17:30:09 -08:00
Noah Levitt
bc6e0d243f yet more refactoring of browser.py, clearer separation of purpose, Browser class manages browsing, sends most of the messages to chrome, WebsockReceiverThread handles messages that come back from chrome 2016-12-16 13:52:12 -08:00
Noah Levitt
534d2e63d6 bump version number in setup.py 2016-12-15 16:43:27 -08:00
Noah Levitt
f6333df6ef back to dev version number 2016-12-15 12:34:26 -08:00
Noah Levitt
85de2fad6a i dub thee 1.1b8 2016-12-15 12:33:34 -08:00
Noah Levitt
d68053764c fix bug handling page with zero outlinks 2016-12-09 16:43:23 -08:00
Noah Levitt
af1e1c75ec avoid infinite loop in case youtube-dl encounters redirect loop (which can be ok if cookies have been set or something) 2016-12-09 14:16:27 -08:00
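
Commit 037723fe2b above says the BROZZLER_RETHINKDB_SERVERS and BROZZLER_RETHINKDB_DB environment variables are honored by all the brozzler-* commands. A minimal sketch of how a command-line tool might do that is below. Only the two environment variable names come from the log. The option names, the fallback defaults, and the helper functions `add_rethinkdb_options` and `parse_servers` are assumptions for illustration, not brozzler's actual code.

```python
import argparse
import os


def add_rethinkdb_options(parser, environ=None):
    """Add rethinkdb options whose defaults come from the environment.

    Hypothetical helper: the option names and fallback values here are
    assumptions, only the environment variable names are from the log.
    """
    environ = os.environ if environ is None else environ
    parser.add_argument(
        "--rethinkdb-servers",
        dest="rethinkdb_servers",
        default=environ.get("BROZZLER_RETHINKDB_SERVERS", "localhost"),
        help="comma-separated list of rethinkdb servers",
    )
    parser.add_argument(
        "--rethinkdb-db",
        dest="rethinkdb_db",
        default=environ.get("BROZZLER_RETHINKDB_DB", "brozzler"),
        help="rethinkdb database name",
    )
    return parser


def parse_servers(value):
    """Split a comma-separated server string into a clean list."""
    return [s.strip() for s in value.split(",") if s.strip()]


if __name__ == "__main__":
    args = add_rethinkdb_options(argparse.ArgumentParser()).parse_args()
    print(parse_servers(args.rethinkdb_servers), args.rethinkdb_db)
```

Reading the environment only to compute argparse defaults means an explicit command-line option still takes precedence over the environment variable, and every command that calls the shared helper picks up the same behavior.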