Noah Levitt
|
242ff51ec7
|
fix bug with seed redirects where scope change was applied too late to affect scoping of outlinks from the seed (with automated tests)
|
2017-03-06 15:13:40 -08:00 |
|
Noah Levitt
|
40bbbb3524
|
add tests of backwards compatibility handling of start/stop times and fix a bug or two
|
2017-03-02 16:53:24 -08:00 |
|
Noah Levitt
|
569af05b11
|
rethinkstuff is now "doublethink"
|
2017-03-02 12:48:45 -08:00 |
|
Noah Levitt
|
700b08b7d7
|
use new rethinkstuff ORM
|
2017-02-28 16:12:50 -08:00 |
|
Noah Levitt
|
a1f1681cad
|
fix issue where use of YoutubeDLSpy caused youtube-dl connections to remote servers to be kept open
|
2017-02-24 11:15:17 -08:00 |
|
Noah Levitt
|
b4f19e2594
|
fix typo
|
2017-02-23 10:47:04 -08:00 |
|
Noah Levitt
|
7417310d57
|
more pywb monkey-patching to get at least some youtube videos captured by brozzler to play back
|
2017-02-23 10:43:07 -08:00 |
|
Noah Levitt
|
2398031010
|
let the OS pick an available port, to avoid what appear to be timing issues causing multiple browsers to choose the same port
|
2017-02-22 12:44:19 -08:00 |
|
Noah Levitt
|
3c4ab834da
|
handle errors from extract-outlinks.js, which happens on polyvore.com because it changes the definition of Set 😭
|
2017-02-22 10:57:11 -08:00 |
|
Noah Levitt
|
0d0da22613
|
brozzler-list-jobs --yaml
|
2017-02-16 10:20:36 -08:00 |
|
Noah Levitt
|
f02d4ed40e
|
missed this in the last commit
|
2017-02-15 23:20:47 -08:00 |
|
Noah Levitt
|
b409e49cfa
|
deprecate current scope rule syntax and create new syntax with slightly different semantics (to be documented), and add parent_url_regex scope rule; unit test for scoping
|
2017-02-15 16:46:45 -08:00 |
|
Noah Levitt
|
c0057e591a
|
add --yaml option to brozzler-list-* commands
|
2017-02-15 23:13:09 +00:00 |
|
Noah Levitt
|
1054e8e3cb
|
take screenshot before running behavior (but after login) - thanks danielbicho
|
2017-02-15 09:13:44 -08:00 |
|
Noah Levitt
|
e58f4b7c44
|
logging tweaks
|
2017-02-10 15:19:28 -08:00 |
|
Noah Levitt
|
09fa41f959
|
fix TypeError: not all arguments converted during string formatting
|
2017-02-03 17:24:47 -08:00 |
|
Noah Levitt
|
14e312e4c4
|
make sure site is not "claimed" when it's finished
|
2017-02-03 16:40:15 -08:00 |
|
Noah Levitt
|
a60878c5a7
|
support for resuming jobs, keeping track of each start and stop time, used to enforce time limits correctly
|
2017-02-03 14:56:12 -08:00 |
|
Noah Levitt
|
5a0301ac12
|
let rethinkdb generate job.id if not supplied in configuration
|
2017-02-03 14:53:50 -08:00 |
|
Noah Levitt
|
129a1e8f47
|
use underscore convention
|
2017-02-02 11:52:19 -08:00 |
|
Noah Levitt
|
5f4c5190da
|
improve TRACE level logging
|
2017-02-02 11:41:40 -08:00 |
|
Noah Levitt
|
ed2d58d87d
|
stopgap fix for problem where an attempt to save a screenshot of a url with a hash tag containing spaces or non-ascii characters would fail, causing the whole brozzle of the page to fail, and end up in a retry loop (better handling of hash tags is planned which will obviate this change)
|
2017-02-01 22:39:12 +00:00 |
|
Noah Levitt
|
5c684779e5
|
pywb support for thumbnail: and screenshot: urls
|
2017-01-31 10:26:38 -08:00 |
|
Noah Levitt
|
8f5003b784
|
fix oops
|
2017-01-30 23:47:39 -08:00 |
|
Noah Levitt
|
4b6831b464
|
new flag Page.blocked_by_robots
|
2017-01-30 10:43:25 -08:00 |
|
Noah Levitt
|
a8b564f100
|
be more patient to avoid spurious warnings waiting for browser to start up
|
2017-01-24 10:06:37 -08:00 |
|
Noah Levitt
|
d22cc075e0
|
restore ping_timeout argument to WebSocketApp.run_forever to fix problem of leaking websocket receiver threads hanging forever on select()
|
2017-01-24 09:55:56 -08:00 |
|
Noah Levitt
|
5375b819dd
|
missed a spot
|
2017-01-20 23:59:31 -08:00 |
|
Noah Levitt
|
c3b637d244
|
improve brozzler-dashboard logging; fix default wayback baseurl in brozzler dashboard (https://github.com/internetarchive/brozzler/issues/31); tweak arg parsing related stuff
|
2017-01-20 23:41:59 -08:00 |
|
Noah Levitt
|
095456aa27
|
avoid js errors in case site or job is not configured to keep stats
|
2017-01-20 23:36:23 -08:00 |
|
Noah Levitt
|
65f818e901
|
add travis-ci slack notification to internetarchive/brozzler channel
|
2017-01-16 12:44:12 -08:00 |
|
Noah Levitt
|
037723fe2b
|
support for BROZZLER_RETHINKDB_SERVERS and BROZZLER_RETHINKDB_DB environment variables, honored by all the brozzler-* commands
|
2017-01-13 20:27:09 +00:00 |
|
Noah Levitt
|
77c4dc1116
|
adapt to exception message from newer versions of chromium (e.g. 57.0.2981.0)
|
2017-01-13 12:08:00 -08:00 |
|
Noah Levitt
|
011d814ee2
|
tests for dismissal of javascript dialogs (alert, prompt, confirm)
|
2017-01-13 11:46:42 -08:00 |
|
Noah Levitt
|
d2ed6b97a2
|
dismiss alerts from the page being browsed (avoids hanging)
|
2017-01-13 10:27:37 -08:00 |
|
Noah Levitt
|
766441e65c
|
simpleclicks - only click if element is visible, fixes spinning on moma.org sites
|
2017-01-12 23:23:46 -08:00 |
|
Noah Levitt
|
38d9eee68d
|
implement brozzler-list-pages
|
2017-01-12 08:22:45 +00:00 |
|
Noah Levitt
|
184612332e
|
new cli utils brozzler-list-jobs and brozzler-list-sites
|
2017-01-12 07:50:58 +00:00 |
|
Noah Levitt
|
64a0ea879a
|
implement sha1 lookup and url prefix lookup for brozzler-list-captures
|
2017-01-12 01:26:09 +00:00 |
|
Noah Levitt
|
32097a8f8b
|
catch exceptions parsing funky urls when scoping and extracting outlinks
|
2017-01-09 15:18:19 -08:00 |
|
Noah Levitt
|
2486768830
|
fix bug where login form would not be detected in some cases when there was a non-login form earlier on the page
|
2017-01-09 11:40:30 -08:00 |
|
Noah Levitt
|
d0022fe7bf
|
reset browser shutdown flag when starting up
|
2017-01-06 17:57:11 -08:00 |
|
Noah Levitt
|
76b658747e
|
fix oversight including username/password in site config when starting a new job
|
2017-01-06 13:03:09 -08:00 |
|
Noah Levitt
|
c2704b18be
|
restore BrozzlerWorker built-in support for managing its own thread
|
2017-01-04 14:57:34 -08:00 |
|
Noah Levitt
|
70b67942a5
|
restore handling of 420 Reached limit, with a rudimentary test
|
2016-12-22 13:44:09 -08:00 |
|
Noah Levitt
|
e5fb6cb4b9
|
add import missing from test
|
2016-12-21 19:19:34 -08:00 |
|
Noah Levitt
|
eabb0fb114
|
restore support for on_response and on_request, with an automated test for on_response
|
2016-12-21 18:35:55 -08:00 |
|
Noah Levitt
|
c90c73372e
|
need $DISPLAY set for test_brozzling.py
|
2016-12-21 15:15:03 -08:00 |
|
Noah Levitt
|
f7427219cf
|
restore handling of "aw snap" or "he's dead jim"
|
2016-12-21 14:21:20 -08:00 |
|
Noah Levitt
|
a5d48a9fdb
|
add seed username/password parameters to job config schema
|
2016-12-20 18:06:20 -08:00 |
|