791 commits

Author SHA1 Message Date
Noah Levitt
4fa1571bc5 run brozzler-webconsole inside brozzler-easy 2016-08-08 17:43:38 -07:00
Noah Levitt
531b26aabb add section about brozzler-easy to the readme 2016-08-05 18:28:30 -07:00
Noah Levitt
c04bf85f4e add --help to brozzler-webconsole 2016-08-05 18:19:15 -07:00
Noah Levitt
10c90431e6 Merge branch 'master' into qa
* master:
  fix exception happening now that we have binary data in rethinkdb (the cookie db) "TypeError: <binary, 7168 bytes, '53 51 4c 69 74 65...'> is not JSON serializable"
  dev version number again
  another version for pypi
  avoid "Uncaught RangeError: Maximum call stack size exceeded" compiling outlinks
  back to a dev version number
  bump version to 1.1b4 for pypi upload
  logging tweak
  install brozzler.webconsole package
2016-08-05 17:13:09 -07:00
Noah Levitt
ba6b342e28 fix exception happening now that we have binary data in rethinkdb (the cookie db) "TypeError: <binary, 7168 bytes, '53 51 4c 69 74 65...'> is not JSON serializable" 2016-08-05 17:12:22 -07:00
Noah Levitt
a211cc0514 dev version number again 2016-08-04 17:34:58 -07:00
Noah Levitt
ae63369c3c another version for pypi 2016-08-04 17:33:47 -07:00
Noah Levitt
20f9934dd9 avoid "Uncaught RangeError: Maximum call stack size exceeded" compiling outlinks 2016-08-04 17:33:06 -07:00
Noah Levitt
7734399a22 back to a dev version number 2016-08-04 16:00:42 -07:00
Noah Levitt
57c0d84fbd bump version to 1.1b4 for pypi upload 2016-08-04 15:55:56 -07:00
Noah Levitt
e62055d7d6 logging tweak 2016-08-04 15:54:05 -07:00
Noah Levitt
65d97caa9a install brozzler.webconsole package 2016-07-29 12:56:10 -05:00
Noah Levitt
feff7a8d05 Merge branch 'master' into qa
* master:
  add docstring to _chain_chrome_messages, remove debug logging, tweak name of websock thread
  add a timeout to the one post-behavior step that didn't already have one (getting a screenshot), and majorly refactored the post-behavior code to incorporate timeouts automatically into each step, and hopefully make it easier to follow
  logging tweaks
  reduce log level of messages from chrome, since it spews stuff that looks bad but usually isn't
  back to a dev version number
  1.1b3 for upload to pypi
2016-07-28 20:32:33 -05:00
Noah Levitt
cfc18e6845 add docstring to _chain_chrome_messages, remove debug logging, tweak name of websock thread 2016-07-28 20:29:11 -05:00
Noah Levitt
2046ee36e0 add a timeout to the one post-behavior step that didn't already have one (getting a screenshot), and majorly refactored the post-behavior code to incorporate timeouts automatically into each step, and hopefully make it easier to follow 2016-07-28 19:59:28 -05:00
Noah Levitt
b2b07b79a9 logging tweaks 2016-07-28 10:19:30 -05:00
Noah Levitt
dd2d8c89e3 reduce log level of messages from chrome, since it spews stuff that looks bad but usually isn't 2016-07-27 18:48:13 -05:00
Noah Levitt
041a4970ce back to a dev version number 2016-07-27 16:57:42 -05:00
Noah Levitt
d94a7c23b9 1.1b3 for upload to pypi 2016-07-27 16:53:10 -05:00
Noah Levitt
fdc2f87a0e Merge branch 'master' into qa
* master:
  pass behavior template parameters on to behavior - fixes umbra's ability to log in with parameters received from amqp
  Changing EnvironmentError to OSError
  Fix naming conventions.
  Create cookie directory if it doesn't exist. Add debug messages for cookie db read/write.
  Read/Write Cookie DB file when creating and stopping browser instance.
  brozzler[easy] requires warcprox>=2.0b1
  look for a sensible default chromium/chrome executable
  tweak thread names
  convert domain specific rule url prefixes to our style of surt
  have pywb support loading warc records from warc files still being written (look for foo.warc.gz.open)
  install flash plugin for chromium
  make state dumping signal handler more robust (now you can kill -QUIT a thousand times in a row without causing problems)
  handle case where websocket connection is unexpectedly closed during the post-behavior phase
  implement timeout and retries to work around issue where sometimes we receive no result message after requesting outlinks
  forgot to commit easy.py, add pywb.py with support for pywb rethinkdb index, and make brozzler-easy also run pywb
  working on brozzler-easy, single process with brozzler-worker and warcprox working together (pywb to be added)
  twirldown for site yaml on site page
  give master a version number considered later than the one up on pypi (1.1b3.dev45 > 1.1b2)
  in vagrant/ansible, install brozzler from this checkout instead of from github master
  option to save list of outlinks (categorized as "accepted", "blocked" (by robots), or "rejected") per page in rethinkdb (to be used by archive-it for out-of-scope reporting)
  oops didn't mean to leave that windows-only subprocess flag
  remove accidentally committed playbook.retry
  vagrant setup (unfinished)
  do not send more than one SIGTERM when shutting down browser process, because on recent chromium on linux, the second sigterm abruptly ends the process, and sometimes leaves orphan subprocesses; also send TERM/KILL signals to the whole process group, another measure to avoid orphans; and adjust logging levels for captured chrome output
  command line utility brozzler-ensure-tables, creates rethinkdb tables if they don't already exist... brozzler normally creates them on demand at startup, but if multiple instances are starting up at the same time, you can end up with duplicate broken tables, so it's a good idea to use this utility when spinning up a cluster
2016-07-26 19:47:50 -05:00
Noah Levitt
c4bdb6c1fd pass behavior template parameters on to behavior - fixes umbra's ability to log in with parameters received from amqp 2016-07-26 19:47:09 -05:00
Noah Levitt
c685a4432c Merge pull request #9 from internetarchive/AITFIVE-841
Aitfive 841
2016-07-26 11:28:32 -05:00
Adam Miller
c2dc2fee2a Changing EnvironmentError to OSError 2016-07-26 00:46:16 +00:00
Adam Miller
77dabd4057 Fix naming conventions. 2016-07-26 00:39:50 +00:00
Adam Miller
2029964a74 Create cookie directory if it doesn't exist. Add debug messages for cookie db read/write. 2016-07-25 23:36:14 +00:00
Adam Miller
1cb6653fab Read/Write Cookie DB file when creating and stopping browser instance. 2016-07-22 00:22:28 +00:00
Noah Levitt
127002b77d brozzler[easy] requires warcprox>=2.0b1 2016-07-21 19:14:11 -05:00
Noah Levitt
37bff5328b look for a sensible default chromium/chrome executable 2016-07-19 15:57:24 -05:00
Noah Levitt
c902a70450 tweak thread names 2016-07-19 14:33:57 -05:00
Noah Levitt
ac3a71742d convert domain specific rule url prefixes to our style of surt 2016-07-19 14:31:43 -05:00
Noah Levitt
7d9f019e67 have pywb support loading warc records from warc files still being written (look for foo.warc.gz.open) 2016-07-17 20:09:56 -05:00
Noah Levitt
b62d5a6350 install flash plugin for chromium 2016-07-13 15:23:50 -05:00
Noah Levitt
04e1e5277e make state dumping signal handler more robust (now you can kill -QUIT a thousand times in a row without causing problems) 2016-07-13 14:52:05 -05:00
Noah Levitt
c6e6b34e82 handle case where websocket connection is unexpectedly closed during the post-behavior phase 2016-07-06 18:17:01 -05:00
Noah Levitt
3bf3c80720 implement timeout and retries to work around issue where sometimes we receive no result message after requesting outlinks 2016-07-06 17:54:36 -05:00
Noah Levitt
be58fb46f7 forgot to commit easy.py, add pywb.py with support for pywb rethinkdb index, and make brozzler-easy also run pywb 2016-07-06 14:52:00 -05:00
Noah Levitt
3b252002b7 working on brozzler-easy, single process with brozzler-worker and warcprox working together (pywb to be added) 2016-07-05 18:46:42 -05:00
Noah Levitt
1a7b94cae7 twirldown for site yaml on site page 2016-07-05 21:42:36 +00:00
Noah Levitt
f825e76371 give master a version number considered later than the one up on pypi (1.1b3.dev45 > 1.1b2) 2016-07-05 10:44:48 -05:00
Noah Levitt
0b9ce94226 in vagrant/ansible, install brozzler from this checkout instead of from github master 2016-07-01 15:45:39 -05:00
Noah Levitt
3e128d2b27 option to save list of outlinks (categorized as "accepted", "blocked" (by robots), or "rejected") per page in rethinkdb (to be used by archive-it for out-of-scope reporting) 2016-07-01 15:23:46 -05:00
Noah Levitt
01e38ea8c7 oops didn't mean to leave that windows-only subprocess flag 2016-07-01 14:07:04 -05:00
Noah Levitt
ad502f33da remove accidentally committed playbook.retry 2016-06-30 17:56:56 -05:00
Noah Levitt
2aef00826b vagrant setup (unfinished) 2016-06-30 17:50:11 -05:00
Noah Levitt
79ad57669c do not send more than one SIGTERM when shutting down browser process, because on recent chromium on linux, the second sigterm abruptly ends the process, and sometimes leaves orphan subprocesses; also send TERM/KILL signals to the whole process group, another measure to avoid orphans; and adjust logging levels for captured chrome output 2016-06-30 17:10:27 -05:00
Noah Levitt
371590b578 command line utility brozzler-ensure-tables, creates rethinkdb tables if they don't already exist... brozzler normally creates them on demand at startup, but if multiple instances are starting up at the same time, you can end up with duplicate broken tables, so it's a good idea to use this utility when spinning up a cluster 2016-06-30 15:16:04 -05:00
Noah Levitt
d82feb14da Merge branch 'master' into qa
* master:
  implement timeout to work around issue where sometimes we receive no result message after requesting scroll to top
  avoid "AttributeError: 'ExtractorError' object has no attribute 'code'" checking for 430 (soft limit) from youtube-dl
  set Browser._chrome_instance=None if _chrome_instance.start() throws exception, to avoid endless loop after one failure
  fix case where rethinkdb page already has claimed=True
  undo accidentally committed change to browser startup timeout, and remove now misleading comment about browser ports (see https://github.com/internetarchive/brozzler/pull/3)
  fix bug preventing brozzler-new-site from working, add note about brozzler-new-site in readme
  --trace level logging
2016-06-30 11:46:31 -05:00
Noah Levitt
9fd78fdbe8 implement timeout to work around issue where sometimes we receive no result message after requesting scroll to top 2016-06-30 11:45:19 -05:00
Noah Levitt
a1910fc0fe avoid "AttributeError: 'ExtractorError' object has no attribute 'code'" checking for 430 (soft limit) from youtube-dl 2016-06-29 19:57:51 -05:00
Noah Levitt
79beddfc44 set Browser._chrome_instance=None if _chrome_instance.start() throws exception, to avoid endless loop after one failure 2016-06-29 19:47:25 -05:00