Noah Levitt | 1c5c9417d2 | avoid "Uncaught TypeError: Cannot read property 'querySelectorAll' of undefined" from outlinks script | 2016-08-25 13:10:30 -07:00
Noah Levitt | ed7e01210d | little readme fix | 2016-08-12 17:02:41 -07:00
Noah Levitt | c9bc9fb67d | for vagrant, static ansible inventory file, add brozzler-webconsole | 2016-08-10 18:41:23 -07:00
Noah Levitt | f671cf4f11 | add info to display of jobless sites in brozzler-webconsole; fix creation of "least_hops" index on the rethinkdb table "pages" | 2016-08-09 11:24:58 -07:00
Noah Levitt | 74b229cfb0 | add arguments --webconsole-address --webconsole-port --pywb-address and change default ports | 2016-08-09 10:43:52 -07:00
Noah Levitt | 94a8e70226 | list jobless sites on brozzler-webconsole front page | 2016-08-08 17:44:41 -07:00
Noah Levitt | 4fa1571bc5 | run brozzler-webconsole inside brozzler-easy | 2016-08-08 17:43:38 -07:00
Noah Levitt | 531b26aabb | add section about brozzler-easy to the readme | 2016-08-05 18:28:30 -07:00
Noah Levitt | c04bf85f4e | add --help to brozzler-webconsole | 2016-08-05 18:19:15 -07:00
Noah Levitt | ba6b342e28 | fix exception happening now that we have binary data in rethinkdb (the cookie db) "TypeError: <binary, 7168 bytes, '53 51 4c 69 74 65...'> is not JSON serializable" | 2016-08-05 17:12:22 -07:00
Noah Levitt | a211cc0514 | dev version number again | 2016-08-04 17:34:58 -07:00
Noah Levitt | ae63369c3c | another version for pypi | 2016-08-04 17:33:47 -07:00
Noah Levitt | 20f9934dd9 | avoid "Uncaught RangeError: Maximum call stack size exceeded" compiling outlinks | 2016-08-04 17:33:06 -07:00
Noah Levitt | 7734399a22 | back to a dev version number | 2016-08-04 16:00:42 -07:00
Noah Levitt | 57c0d84fbd | bump version to 1.1b4 for pypi upload | 2016-08-04 15:55:56 -07:00
Noah Levitt | e62055d7d6 | logging tweak | 2016-08-04 15:54:05 -07:00
Noah Levitt | 65d97caa9a | install brozzler.webconsole package | 2016-07-29 12:56:10 -05:00
Noah Levitt | cfc18e6845 | add docstring to _chain_chrome_messages, remove debug logging, tweak name of websock thread | 2016-07-28 20:29:11 -05:00
Noah Levitt | 2046ee36e0 | add a timeout to the one post-behavior step that didn't already have one (getting a screenshot), and majorly refactored the post-behavior code to incorporate timeouts automatically into each step, and hopefully make it easier to follow | 2016-07-28 19:59:28 -05:00
Noah Levitt | b2b07b79a9 | logging tweaks | 2016-07-28 10:19:30 -05:00
Noah Levitt | dd2d8c89e3 | reduce log level of messages from chrome, since it spews stuff that looks bad but usually isn't | 2016-07-27 18:48:13 -05:00
Noah Levitt | 041a4970ce | back to a dev version number | 2016-07-27 16:57:42 -05:00
Noah Levitt | d94a7c23b9 | 1.1b3 for upload to pypi | 2016-07-27 16:53:10 -05:00
Noah Levitt | c4bdb6c1fd | pass behavior template parameters on to behavior - fixes umbra's ability to log in with parameters received from amqp | 2016-07-26 19:47:09 -05:00
Noah Levitt | 127002b77d | brozzler[easy] requires warcprox>=2.0b1 | 2016-07-21 19:14:11 -05:00
Noah Levitt | 37bff5328b | look for a sensible default chromium/chrome executable | 2016-07-19 15:57:24 -05:00
Noah Levitt | c902a70450 | tweak thread names | 2016-07-19 14:33:57 -05:00
Noah Levitt | ac3a71742d | convert domain specific rule url prefixes to our style of surt | 2016-07-19 14:31:43 -05:00
Noah Levitt | 7d9f019e67 | have pywb support loading warc records from warc files still being written (look for foo.warc.gz.open) | 2016-07-17 20:09:56 -05:00
Noah Levitt | b62d5a6350 | install flash plugin for chromium | 2016-07-13 15:23:50 -05:00
Noah Levitt | 04e1e5277e | make state dumping signal handler more robust (now you can kill -QUIT a thousand times in a row without causing problems) | 2016-07-13 14:52:05 -05:00
Noah Levitt | c6e6b34e82 | handle case where websocket connection is unexpectedly closed during the post-behavior phase | 2016-07-06 18:17:01 -05:00
Noah Levitt | 3bf3c80720 | implement timeout and retries to work around issue where sometimes we receive no result message after requesting outlinks | 2016-07-06 17:54:36 -05:00
Noah Levitt | be58fb46f7 | forgot to commit easy.py, add pywb.py with support for pywb rethinkdb index, and make brozzler-easy also run pywb | 2016-07-06 14:52:00 -05:00
Noah Levitt | 3b252002b7 | working on brozzler-easy, single process with brozzler-worker and warcprox working together (pywb to be added) | 2016-07-05 18:46:42 -05:00
Noah Levitt | 1a7b94cae7 | twirldown for site yaml on site page | 2016-07-05 21:42:36 +00:00
Noah Levitt | f825e76371 | give master a version number considered later than the one up on pypi (1.1b3.dev45 > 1.1b2) | 2016-07-05 10:44:48 -05:00
Noah Levitt | 0b9ce94226 | in vagrant/ansible, install brozzler from this checkout instead of from github master | 2016-07-01 15:45:39 -05:00
Noah Levitt | 3e128d2b27 | option to save list of outlinks (categorized as "accepted", "blocked" (by robots), or "rejected") per page in rethinkdb (to be used by archive-it for out-of-scope reporting) | 2016-07-01 15:23:46 -05:00
Noah Levitt | 01e38ea8c7 | oops didn't mean to leave that windows-only subprocess flag | 2016-07-01 14:07:04 -05:00
Noah Levitt | ad502f33da | remove accidentally committed playbook.retry | 2016-06-30 17:56:56 -05:00
Noah Levitt | 2aef00826b | vagrant setup (unfinished) | 2016-06-30 17:50:11 -05:00
Noah Levitt | 79ad57669c | do not send more than one SIGTERM when shutting down browser process, because on recent chromium on linux, the second sigterm abruptly ends the process, and sometimes leaves orphan subprocesses; also send TERM/KILL signals to the whole process group, another measure to avoid orphans; and adjust logging levels for captured chrome output | 2016-06-30 17:10:27 -05:00
Noah Levitt | 371590b578 | command line utility brozzler-ensure-tables, creates rethinkdb tables if they don't already exist... brozzler normally creates them on demand at startup, but if multiple instances are starting up at the same time, you can end up with duplicate broken tables, so it's a good idea to use this utility when spinning up a cluster | 2016-06-30 15:16:04 -05:00
Noah Levitt | 9fd78fdbe8 | implement timeout to work around issue where sometimes we receive no result message after requesting scroll to top | 2016-06-30 11:45:19 -05:00
Noah Levitt | a1910fc0fe | avoid "AttributeError: 'ExtractorError' object has no attribute 'code'" checking for 430 (soft limit) from youtube-dl | 2016-06-29 19:57:51 -05:00
Noah Levitt | 79beddfc44 | set Browser._chrome_instance=None if _chrome_instance.start() throws exception, to avoid endless loop after one failure | 2016-06-29 19:47:25 -05:00
Noah Levitt | 2e687b65fb | fix case where rethinkdb page already has claimed=True | 2016-06-29 19:29:18 -05:00
Noah Levitt | ffcf26b6c9 | undo accidentally committed change to browser startup timeout, and remove now misleading comment about browser ports (see https://github.com/internetarchive/brozzler/pull/3) | 2016-06-29 18:53:32 -05:00
Noah Levitt | 7431ae0eb1 | fix bug preventing brozzler-new-site from working, add note about brozzler-new-site in readme | 2016-06-29 18:41:45 -05:00