Every time we run a JS behavior, we load a Jinja2 template.
By default, Jinja2 has option `auto_reload=True`. This mean that
every time a template is requested the loader checks if the source file changed
and if yes, it will reload the template. For higher performance it’s possible
to disable that.
Also note that Jinja caches 400 templates by default.
Ref: http://jinja.pocoo.org/docs/2.10/api/
In Brozzler, we don't make changes to JS templates while the system is
running. So, there is no point in having auto_reload=True.
We've been seeing some of this:
2018-02-14 20:16:44,011 17816 CRITICAL BrozzlingThread:36444 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:559) unexpected exception
Traceback (most recent call last):
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 528, in brozzle_site
enable_youtube_dl=not self._skip_youtube_dl)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 385, in brozzle_page
on_request)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 447, in _browse_page
cookie_db=site.get('cookie_db'))
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 338, in start
self._wait_for(lambda: self.websock_thread.is_open, timeout=10)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 311, in _wait_for
elapsed, callback))
brozzler.browser.BrowsingTimeout: timed out after 11.1s waiting for: <function Browser.start.<locals>.<lambda> at 0x7fb2dc772bd8>
Mostly at startup. Now that brozzler claims sites in batches for
brozzling, we have situations where we start up a whole bunch of
browsers at the same time. That's probably why in some cases they are
slow to establish the websocket connection.
We've been seeing a lot of this:
2018-02-14 20:06:01,472 13286 CRITICAL BrozzlingThread:44789 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:559) unexpected exception
Traceback (most recent call last):
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 528, in brozzle_site
enable_youtube_dl=not self._skip_youtube_dl)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 385, in brozzle_page
on_request)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/worker.py", line 459, in _browse_page
behavior_timeout=self._behavior_timeout)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 463, in browse_page
jpeg_bytes = self.screenshot()
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 565, in screenshot
timeout=timeout)
File "/opt/brozzler-ve3/lib/python3.5/site-packages/brozzler/browser.py", line 311, in _wait_for
elapsed, callback))
brozzler.browser.BrowsingTimeout: timed out after 90.5s waiting for: <function Browser.screenshot.<locals>.<lambda> at 0x7f5ab0076a68>
Browser bug, maybe? To work around it, reduce timeout to 45 seconds, try
getting screenshot 3 times, and if it fails proceed anyway, don't queue
the page for recrawling.
``--disable-background-timer-throttling`` and ``--disable-renderer-backgrounding``:
karma JS test runner uses these to improve chrome performance
https://github.com/karma-runner/karma-chrome-launcher/issues/123
``--disable-hang-monitor``: Suppresses hang monitor dialogs in renderer
processes. This may allow slow unload handlers on a page to prevent the
tab from closing, but the Task Manager can be used to terminate the
offending process in this case.
``--mute-audio``: obvious.
The following are part of google safe browsing features:
``--disable-client-side-phishing-detection``
``--safebrowsing-disable-auto-update``
``--safebrowsing-disable-download-protection``
Reference: https://peter.sh/experiments/chromium-command-line-switches/
* master:
back to dev version number
commit for beta release
this should fix travis build?
fix tests
update brozzler-easy for current warcprox api
simpleclicks for minutes PDF
avoids this kind of error:
wbgrp-svc294 2018-01-19 21:04:43,973 648 ERROR BrozzlingThread:39295 youtube_dl.to_stderr(YoutubeDL.py:514) ERROR: Unable to download webpage: <urlopen error no host given> (caused by URLError('no host given',))
wbgrp-svc294 2018-01-19 21:04:43,973 648 ERROR BrozzlingThread:39295 root.brozzle_site(worker.py:521) proxy error (site.proxy=wbgrp-svc400.us.archive.org:8002), will try to choose a healthy instance next time site is brozzled: youtube-dl hit apparent proxy error from https:/www.laphil.com/press1718
--warcprox-auto distribute assigned sites evenly
When running with --warcprox-auto, choose the instance of warcprox with
the least number of assigned sites, instead of the lowest load in the
service registry. In practice we often start brozzling a whole bunch of
sites at approximately the same time, and because it takes time for that
to affect the "load" reported by warcprox instances, sites end up being
distributed very unevenly.