* master:
pass behavior template parameters on to the behavior; fixes umbra's ability to log in with parameters received from AMQP
Change EnvironmentError to OSError
Fix naming conventions.
Create cookie directory if it doesn't exist. Add debug messages for cookie DB read/write.
Read/write the cookie DB file when creating and stopping a browser instance.
brozzler[easy] requires warcprox>=2.0b1
look for a sensible default chromium/chrome executable
tweak thread names
convert domain-specific rule URL prefixes to our style of SURT
have pywb support loading WARC records from WARC files that are still being written (look for foo.warc.gz.open)
install flash plugin for chromium
make state dumping signal handler more robust (now you can kill -QUIT a thousand times in a row without causing problems)
handle case where websocket connection is unexpectedly closed during the post-behavior phase
implement timeout and retries to work around issue where sometimes we receive no result message after requesting outlinks
forgot to commit easy.py, add pywb.py with support for pywb rethinkdb index, and make brozzler-easy also run pywb
working on brozzler-easy, single process with brozzler-worker and warcprox working together (pywb to be added)
twirldown (collapsible section) for site YAML on site page
give master a version number considered later than the one up on pypi (1.1b3.dev45 > 1.1b2)
in vagrant/ansible, install brozzler from this checkout instead of from github master
option to save list of outlinks (categorized as "accepted", "blocked" (by robots), or "rejected") per page in rethinkdb (to be used by archive-it for out-of-scope reporting)
oops, didn't mean to leave in that Windows-only subprocess flag
remove accidentally committed playbook.retry
vagrant setup (unfinished)
do not send more than one SIGTERM when shutting down the browser process, because with recent Chromium on Linux the second SIGTERM abruptly ends the process and sometimes leaves orphaned subprocesses; also send TERM/KILL signals to the whole process group, another measure to avoid orphans; and adjust logging levels for captured Chrome output
command line utility brozzler-ensure-tables creates rethinkdb tables if they don't already exist. Brozzler normally creates them on demand at startup, but if multiple instances start up at the same time you can end up with duplicate, broken tables, so it's a good idea to run this utility when spinning up a cluster
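The browser-shutdown change in this list (send SIGTERM exactly once, and signal the whole process group so no orphans survive) can be sketched with the standard library. This is a minimal, POSIX-only sketch, not brozzler's actual code: the helper names start_in_own_group and stop_process_group and the 5-second grace period are hypothetical.

```python
import os
import signal
import subprocess


def start_in_own_group(args):
    # start_new_session=True calls setsid() in the child, so the child
    # leads a new process group whose id equals its pid (POSIX only).
    return subprocess.Popen(args, start_new_session=True)


def stop_process_group(proc, timeout=5.0):
    """Send SIGTERM once to the whole group; SIGKILL the group if it lingers.

    Returns proc.returncode (negative signal number if killed by a signal).
    """
    pgid = proc.pid  # equal to the pgid because of start_new_session=True
    os.killpg(pgid, signal.SIGTERM)  # exactly one TERM, never repeated
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Escalate for every process in the group, including children
            # that the leader spawned.
            os.killpg(pgid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # group already gone
        return proc.wait()
```

Signalling the group rather than the single pid is what reaches helper processes the browser spawned; escalating to SIGKILL after a timeout replaces the repeated SIGTERM that the entry says abruptly ends Chromium.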
* master:
implement timeout to work around issue where sometimes we receive no result message after requesting scroll to top
avoid "AttributeError: 'ExtractorError' object has no attribute 'code'" when checking for 430 (soft limit) from youtube-dl
set Browser._chrome_instance=None if _chrome_instance.start() throws exception, to avoid endless loop after one failure
fix case where rethinkdb page already has claimed=True
undo accidentally committed change to browser startup timeout, and remove now misleading comment about browser ports (see https://github.com/internetarchive/brozzler/pull/3)
fix bug preventing brozzler-new-site from working, add note about brozzler-new-site in readme
--trace level logging
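The timeout workarounds for missing result messages (after requesting scroll to top, and with retries after requesting outlinks) follow a common request/response pattern: tag each request with an id, wait a bounded time for the reply carrying that id, and resend if nothing arrives. The sketch below is hypothetical (ResultWaiter, ResultTimeout, and the default timeout and retry counts are not brozzler's names or values); the message shape mirrors the id/method/params convention of Chrome DevTools Protocol messages, and the transport is any callable.

```python
import itertools
import threading


class ResultTimeout(Exception):
    """Raised when no result message arrives after all attempts."""


class ResultWaiter:
    """Matches reply messages to requests by message id (hypothetical helper).

    `send` is any callable that transmits a request dict (for example over
    a websocket); the receiving side calls `on_message` for each reply.
    """

    def __init__(self, send):
        self._send = send
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self._pending = {}  # message id -> [threading.Event, result]

    def on_message(self, message):
        with self._lock:
            slot = self._pending.get(message.get("id"))
        if slot is not None:
            slot[1] = message.get("result")
            slot[0].set()  # wake the thread waiting in request()

    def request(self, method, params=None, timeout=10.0, retries=2):
        for _ in range(retries + 1):
            msg_id = next(self._ids)  # fresh id so late replies are ignored
            slot = [threading.Event(), None]
            with self._lock:
                self._pending[msg_id] = slot
            try:
                self._send({"id": msg_id, "method": method,
                            "params": params or {}})
                if slot[0].wait(timeout):  # False means we timed out
                    return slot[1]
            finally:
                with self._lock:
                    del self._pending[msg_id]
        raise ResultTimeout(
            f"no result for {method!r} after {retries + 1} attempts")
```

Using a fresh id per attempt means a reply that arrives after its attempt timed out is silently dropped instead of being mistaken for the answer to the retry.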
* master:
to avoid infinite loops in some cases, ignore the "claimed" field in the rethinkdb table "pages", because if a page is left "claimed", it must have been because of some error... site.claimed is the real claiming mechanism
quieter logging: don't print a stack trace on 430 from youtube-dl
fix buglet in creation of new least_hops on pages table
rename scope rule "host" to "domain" to make it less confusing, since rules apply to subdomains as well
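The rename reflects how the rule matches: a "domain" rule covers the domain itself and every subdomain, which the name "host" did not suggest. A minimal sketch of that matching, assuming a hypothetical helper host_matches_domain (case-insensitive, ignores a trailing dot, no IDN or port handling):

```python
def host_matches_domain(host, domain):
    """True if host is the domain itself or any subdomain of it.

    Hypothetical helper, not brozzler's scope code.
    """
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # Require a dot boundary so "badexample.com" does not match "example.com".
    return host == domain or host.endswith("." + domain)
```

The dot-boundary check is the important part: a plain suffix test would wrongly put unrelated hosts that merely end with the same characters in scope.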
* master:
let youtube-dl write to a temporary directory instead of /dev/null, to fix errors like this: "youtube_dl.utils.DownloadError: ERROR: unable to open for writing: [Errno 13] Permission denied: '/dev/null-Frag0.part'"
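As the error message shows, youtube-dl derives fragment file names from the output file name, so an output path of /dev/null produces unwritable paths such as /dev/null-Frag0.part. A minimal sketch of the fix, assuming a hypothetical helper ydl_options; "outtmpl", "quiet", and the %(autonumber)s template field are standard youtube-dl options, but the file-name pattern is illustrative. youtube-dl itself is not imported, so the sketch runs without it installed.

```python
import os
import tempfile


def ydl_options(tmpdir):
    """Build youtube-dl options that write downloads under tmpdir.

    Hypothetical helper: fragment and .part files derived from this
    template land inside a writable directory instead of next to /dev/null.
    """
    return {
        "outtmpl": os.path.join(tmpdir, "ydl%(autonumber)s.out"),
        "quiet": True,
    }


with tempfile.TemporaryDirectory(prefix="ydl-") as tmpdir:
    opts = ydl_options(tmpdir)
    # youtube_dl.YoutubeDL(opts) would be constructed and used here;
    # the directory and everything written into it is removed on exit.
```

TemporaryDirectory gives each run its own writable directory and cleans it up afterwards, which keeps the "throw the media file away" behavior that /dev/null was meant to provide.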
* master:
handle "undefined" in list of frames when extracting outlinks (fixes ARI-4988)
avoid hanging in case a page has no outlinks
fix noVNC submodule path since brozzler webconsole has moved
handle new bucket format in brozzler-webconsole
fix brozzler.svg symlink
convert command-line executables to entry_points console_scripts, best practice according to the Python Packaging Authority (eases testing, etc.)
make brozzler-webconsole a part of the main brozzler package, using optional "extras_require" dependencies
remove crufty docker and no-docker scripts
note python 3.4 requirement in readme
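The console_scripts and extras_require changes can be illustrated with the keyword arguments a setup.py passes to setuptools.setup(). This is a sketch: the module:function targets, the "webconsole" extra name, and its "flask" dependency are hypothetical placeholders; only the brozzler[easy] requirement on warcprox>=2.0b1 comes from this log.

```python
# Hypothetical setup.py excerpt; in a real setup.py these keyword
# arguments would be passed as setuptools.setup(**setup_kwargs).
setup_kwargs = dict(
    name="brozzler",
    packages=["brozzler"],
    entry_points={
        "console_scripts": [
            # "command = package.module:function"; pip generates a wrapper
            # script for each entry (targets here are placeholders)
            "brozzler-new-site = brozzler.cli:brozzler_new_site",
            "brozzler-worker = brozzler.cli:brozzler_worker",
            "brozzler-webconsole = brozzler.webconsole:run",
        ],
    },
    extras_require={
        # installed only on request, e.g. pip install brozzler[easy]
        "easy": ["warcprox>=2.0b1"],
        "webconsole": ["flask"],  # placeholder dependency list
    },
)
```

Because the commands become functions, tests can import and call them directly instead of running separate scripts, and the webconsole's dependencies are only pulled in when the extra is requested.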