* master:
oops, back to dev version number
wait 20 seconds to claim sites if none were avail-
tweak logging
why did those tests fail??? (#117)
Add screenshots
Add screenshots
back to dev version
1.4 for pypi
explain --warcprox-auto briefly
vagrant readme fixes (thanks funkyfuture)
update cryptography dep version
* max-claimed-sites:
new job setting max_claimed_sites
move time limit enforcement
Invalid syntax in WebsockReceiverThread._javascript_dialog_open
Make Browser._wait_for sleep time a variable
Send more compact JSON to browser
Remove google safebrowsing flags
try to get chromium 64? (#92)
Add chromium CLI flags to improve capture performance
We've been seeing some of this:
    2018-02-14 20:16:44,011 17816 CRITICAL BrozzlingThread:36444 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:559) unexpected exception
    Traceback (most recent call last):
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 528, in brozzle_site
        enable_youtube_dl=not self._skip_youtube_dl)
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 385, in brozzle_page
        on_request)
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 447, in _browse_page
        cookie_db=site.get('cookie_db'))
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 338, in start
        self._wait_for(lambda: self.websock_thread.is_open, timeout=10)
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 311, in _wait_for
        elapsed, callback))
    brozzler.browser.BrowsingTimeout: timed out after 11.1s waiting for: <function Browser.start.<locals>.<lambda> at 0x7fb2dc772bd8>
Mostly at startup. Now that brozzler claims sites in batches for
brozzling, we have situations where we start up a whole bunch of
browsers at the same time. That's probably why in some cases they are
slow to establish the websocket connection.
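For context, the kind of polling wait that raises this timeout can be sketched
roughly as follows. This is a minimal illustration, not brozzler's actual code:
the function name `wait_for`, its `sleep_time` parameter, and the local
`BrowsingTimeout` class are stand-ins assumed here. Only the error message
format is taken from the traceback above.

```python
import time


class BrowsingTimeout(Exception):
    """Stand-in for brozzler.browser.BrowsingTimeout (assumed shape)."""


def wait_for(callback, timeout, sleep_time=0.5):
    """Poll callback() until it returns a truthy value or timeout elapses.

    sleep_time is a hypothetical parameter: the pause between polls.
    """
    start = time.monotonic()
    while True:
        if callback():
            return
        elapsed = time.monotonic() - start
        if elapsed > timeout:
            # Same message shape as the log line in the traceback.
            raise BrowsingTimeout(
                'timed out after %.1fs waiting for: %s' % (elapsed, callback))
        time.sleep(sleep_time)
```

In a loop like this the timeout is only checked once per poll, so the reported
elapsed time can overshoot the requested timeout by up to one sleep interval
plus the callback's own running time.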
We've been seeing a lot of this:
    2018-02-14 20:06:01,472 13286 CRITICAL BrozzlingThread:44789 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:559) unexpected exception
    Traceback (most recent call last):
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 528, in brozzle_site
        enable_youtube_dl=not self._skip_youtube_dl)
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 385, in brozzle_page
        on_request)
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 459, in _browse_page
        behavior_timeout=self._behavior_timeout)
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 463, in browse_page
        jpeg_bytes = self.screenshot()
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 565, in screenshot
        timeout=timeout)
      File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 311, in _wait_for
        elapsed, callback))
    brozzler.browser.BrowsingTimeout: timed out after 90.5s waiting for: <function Browser.screenshot.<locals>.<lambda> at 0x7f5ab0076a68>
Browser bug, maybe? To work around it, reduce the screenshot timeout to
45 seconds and try up to 3 times. If it still fails, proceed anyway and
don't queue the page for recrawling.
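The workaround described above could look something like this. It is a sketch
under assumptions, not the actual change: `screenshot_with_retries` and the
`take_screenshot` callable are hypothetical names, and `BrowsingTimeout` is a
local stand-in for the exception seen in the traceback.

```python
import logging


class BrowsingTimeout(Exception):
    """Stand-in for brozzler.browser.BrowsingTimeout (assumed shape)."""


def screenshot_with_retries(take_screenshot, attempts=3, timeout=45):
    """Try take_screenshot(timeout=...) up to `attempts` times.

    Returns the screenshot bytes, or None if every attempt timed out, so
    the caller can carry on with the page instead of requeueing it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return take_screenshot(timeout=timeout)
        except BrowsingTimeout as e:
            logging.warning(
                'screenshot attempt %s of %s failed: %s',
                attempt, attempts, e)
    return None
```

A caller would treat a `None` result as "no screenshot for this page" and
continue with the rest of the page processing.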
* master:
fix attempt for deadlock-ish situation
fix unclosed file warnings when running python in debug mode
give vagrant vm a name in virtualbox
add note to readme about browser version
check browser version at startup
Reinstate logging
Fix typo and block legacy google-analytics.com/ga.js
Use Network.setBlockedUrls instead of Debugger to block URLs
bump dev version after PR merge
back to dev version number
commit for beta release
this should fix travis build?
fix tests
update brozzler-easy for current warcprox api
claim sites to brozzle in batches to reduce contention over sites table
lengthen site session brozzling time to 15 minutes
fix needs_browsing check
new test test_needs_browsing
increase timeout waiting for screenshot
Use TCP_NODELAY in websocket connection to improve performance
Brozzler hard-codes the locations of its JS behavior templates:
``brozzler/behaviors.yaml`` and ``brozzler/js-templates/``. With this
change, you can pass the optional ``behaviors_dir`` parameter to
``browser.browse_page`` to load JS behaviors from a custom location.
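One way to picture the lookup is the sketch below. It is illustrative only:
`behavior_paths` and `DEFAULT_BEHAVIORS_DIR` are hypothetical names, and it
assumes a custom directory mirrors the default layout (a ``behaviors.yaml``
file next to a ``js-templates/`` directory).

```python
import os

# Assumption: the built-in behaviors live under the package directory.
DEFAULT_BEHAVIORS_DIR = 'brozzler'


def behavior_paths(behaviors_dir=None):
    """Return (behaviors.yaml path, js-templates path).

    Falls back to the default location when behaviors_dir is not given.
    """
    base = behaviors_dir or DEFAULT_BEHAVIORS_DIR
    return (
        os.path.join(base, 'behaviors.yaml'),
        os.path.join(base, 'js-templates'),
    )
```

Calling ``behavior_paths('/path/to/custom')`` would then point both lookups at
the custom directory, while ``behavior_paths()`` keeps the default behavior.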