Noah Levitt | 5f4c5190da | improve TRACE level logging | 2017-02-02 11:41:40 -08:00
Noah Levitt | ed2d58d87d | stopgap fix for problem where an attempt to save a screenshot of a url with a hash tag containing spaces or non-ascii characters would fail, causing the whole brozzle of the page to fail, and end up in a retry loop (better handling of hash tags is planned which will obviate this change) | 2017-02-01 22:39:12 +00:00
Noah Levitt | 5c684779e5 | pywb support for thumbnail: and screenshot: urls | 2017-01-31 10:26:38 -08:00
Noah Levitt | 8f5003b784 | fix oops | 2017-01-30 23:47:39 -08:00
Noah Levitt | 4b6831b464 | new flag Page.blocked_by_robots | 2017-01-30 10:43:25 -08:00
Noah Levitt | a8b564f100 | be more patient to avoid spurious warnings waiting for browser to start up | 2017-01-24 10:06:37 -08:00
Noah Levitt | d22cc075e0 | restore ping_timeout argument to WebSocketApp.run_forever to fix problem of leaking websocket receiver threads hanging forever on select() | 2017-01-24 09:55:56 -08:00
Noah Levitt | 5375b819dd | missed a spot | 2017-01-20 23:59:31 -08:00
Noah Levitt | c3b637d244 | improve brozzler-dashboard logging; fix default wayback baseurl in brozzler dashboard (https://github.com/internetarchive/brozzler/issues/31); tweak arg parsing related stuff | 2017-01-20 23:41:59 -08:00
Noah Levitt | 095456aa27 | avoid js errors in case site or job is not configured to keep stats | 2017-01-20 23:36:23 -08:00
Noah Levitt | 65f818e901 | add travis-ci slack notification to internetarchive/brozzler channel | 2017-01-16 12:44:12 -08:00
Noah Levitt | 037723fe2b | support for BROZZLER_RETHINKDB_SERVERS and BROZZLER_RETHINKDB_DB environment variables, honored by all the brozzler-* commands | 2017-01-13 20:27:09 +00:00
Noah Levitt | 77c4dc1116 | adapt to exception message from newer versions of chromium (e.g. 57.0.2981.0) | 2017-01-13 12:08:00 -08:00
Noah Levitt | 011d814ee2 | tests for dismissal of javascript dialogs (alert, prompt, confirm) | 2017-01-13 11:46:42 -08:00
Noah Levitt | d2ed6b97a2 | dismiss alerts from the page being browsed (avoids hanging) | 2017-01-13 10:27:37 -08:00
Noah Levitt | 766441e65c | simpleclicks - only click if element is visible, fixes spinning on moma.org sites | 2017-01-12 23:23:46 -08:00
Noah Levitt | 38d9eee68d | implement brozzler-list-pages | 2017-01-12 08:22:45 +00:00
Noah Levitt | 184612332e | new cli utils brozzler-list-jobs and brozzler-list-sites | 2017-01-12 07:50:58 +00:00
Noah Levitt | 64a0ea879a | implement sha1 lookup and url prefix lookup for brozzler-list-captures | 2017-01-12 01:26:09 +00:00
Noah Levitt | 32097a8f8b | catch exceptions parsing funky urls when scoping and extracting outlinks | 2017-01-09 15:18:19 -08:00
Noah Levitt | 2486768830 | fix bug where login form would not be detected in some cases when there was a non-login form earlier on the page | 2017-01-09 11:40:30 -08:00
Noah Levitt | d0022fe7bf | reset browser shutdown flag when starting up | 2017-01-06 17:57:11 -08:00
Noah Levitt | 76b658747e | fix oversight including username/password in site config when starting a new job | 2017-01-06 13:03:09 -08:00
Noah Levitt | c2704b18be | restore BrozzlerWorker built-in support for managing its own thread | 2017-01-04 14:57:34 -08:00
Noah Levitt | 70b67942a5 | restore handling of 420 Reached limit, with a rudimentary test | 2016-12-22 13:44:09 -08:00
Noah Levitt | e5fb6cb4b9 | add import missing from test | 2016-12-21 19:19:34 -08:00
Noah Levitt | c90c73372e | need $DISPLAY set for test_brozzling.py | 2016-12-21 15:15:03 -08:00
Noah Levitt | f7427219cf | restore handling of "aw snap" or "he's dead jim" | 2016-12-21 14:21:20 -08:00
Noah Levitt | a5d48a9fdb | add seed username/password parameters to job config schema | 2016-12-20 18:06:20 -08:00
Noah Levitt | edf0a3a50d | convert mouseovers and simpleclicks to jinja2 | 2016-12-20 17:34:29 -08:00
Noah Levitt | e2dbf68ccd | remove obsolete facebook login code | 2016-12-20 16:38:11 -08:00
Noah Levitt | a0b61408b9 | convert behaviors to jinja2, move them to new subdir js-templates, along with javascript previously stored as a string in browser.py | 2016-12-20 16:33:25 -08:00
Noah Levitt | 7a40822e64 | forgot to git add new test data | 2016-12-19 18:10:07 -08:00
Noah Levitt | 2f8f20bbb4 | detect <input type="email"> as potential username field for login | 2016-12-19 18:08:10 -08:00
Noah Levitt | 86ac48d6c3 | generalized support for login doing automatic detection of login form on a page | 2016-12-19 17:30:09 -08:00
Noah Levitt | bc6e0d243f | yet more refactoring of browser.py, clearer separation of purpose, Browser class manages browsing, sends most of the messages to chrome, WebsockReceiverThread handles messages that come back from chrome | 2016-12-16 13:52:12 -08:00
Noah Levitt | 534d2e63d6 | bump version number in setup.py | 2016-12-15 16:43:27 -08:00
Noah Levitt | f6333df6ef | back to dev version number | 2016-12-15 12:34:26 -08:00
Noah Levitt | 85de2fad6a | i dub thee 1.1b8 | 2016-12-15 12:33:34 -08:00
Noah Levitt | d68053764c | fix bug handling page with zero outlinks | 2016-12-09 16:43:23 -08:00
Noah Levitt | af1e1c75ec | avoid infinite loop in case youtube-dl encounters redirect loop (which can be ok if cookies have been set or something) | 2016-12-09 14:16:27 -08:00
Noah Levitt | f6a25aa4f0 | brozzler logo svg with small default size | 2016-12-08 15:16:02 -08:00
Noah Levitt | 40b4d9bfe8 | travis-ci slack integration | 2016-12-07 14:46:29 -08:00
Noah Levitt | 9bcec54f4b | fix _find_available_port and its unit test | 2016-12-07 14:08:34 -08:00
Noah Levitt | eed8b9ec30 | little fixes | 2016-12-07 11:20:10 -08:00
Noah Levitt | 0b6c5346bd | avoid broken version of websocket-client to fix https://github.com/internetarchive/brozzler/issues/28 | 2016-12-07 11:18:41 -08:00
Noah Levitt | e250c4ca89 | wrong branch of warcprox in ansible install | 2016-12-07 09:33:06 -08:00
Noah Levitt | d3063fbd2b | move cookie db management code into chrome.py | 2016-12-06 18:04:51 -08:00
Noah Levitt | ce03381b92 | move _find_available_ports to chrome.py, changing the way it works so that browser:9200 doesn't get stuck at 9201 forever, which pushes 9201 to 9202 etc, and add a unit test | 2016-12-06 17:12:20 -08:00
Noah Levitt | 74009852d6 | split Chrome class into its own module | 2016-12-06 12:50:38 -08:00