Add option `full_page` to `Browser.screenshot`. The default behavior
remains the same.
We took inspiration from Puppeteer's approach to capturing a screenshot
of the full page:
https://github.com/GoogleChrome/puppeteer/blob/master/lib/Page.js#L898
Add option `screenshot_full_page=False` to `Browser.browse_page` to use
the new feature when capturing a page.
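The full-page technique borrowed from Puppeteer comes down to three
DevTools Protocol calls. First, read the document size from
`Page.getLayoutMetrics`. Next, temporarily resize the emulated viewport
to that size with `Emulation.setDeviceMetricsOverride`. Finally, call
`Page.captureScreenshot`. The sketch below only builds the command
messages. The function name, the JPEG format and quality, and restoring
the viewport with `Emulation.clearDeviceMetricsOverride` are assumptions,
not Brozzler's actual code.

```python
import math


def full_page_screenshot_commands(layout_metrics, start_id=1, quality=90):
    """Build DevTools Protocol commands for a full-page screenshot.

    Hypothetical helper. `layout_metrics` is the result of a
    Page.getLayoutMetrics call; its contentSize describes the whole
    document, not just the visible viewport.
    """
    content = layout_metrics['contentSize']
    # Round up so fractional CSS pixels are not cut off.
    width = math.ceil(content['width'])
    height = math.ceil(content['height'])
    return [
        # 1. Make the emulated viewport as large as the whole page.
        {'id': start_id, 'method': 'Emulation.setDeviceMetricsOverride',
         'params': {'width': width, 'height': height,
                    'deviceScaleFactor': 1, 'mobile': False}},
        # 2. Capture that area (JPEG and quality are assumed settings).
        {'id': start_id + 1, 'method': 'Page.captureScreenshot',
         'params': {'format': 'jpeg', 'quality': quality,
                    'clip': {'x': 0, 'y': 0, 'width': width,
                             'height': height, 'scale': 1}}},
        # 3. Drop the override so later actions see the normal viewport.
        {'id': start_id + 2, 'method': 'Emulation.clearDeviceMetricsOverride',
         'params': {}},
    ]


metrics = {'contentSize': {'x': 0, 'y': 0, 'width': 1280, 'height': 4321.5}}
for command in full_page_screenshot_commands(metrics):
    print(command['id'], command['method'])
```

Each command would then be JSON-encoded and sent over the websocket. The
image comes back base64-encoded in the `data` field of the reply to
`Page.captureScreenshot`.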
When capturing a page, we receive a LOT of messages from Chrome.
Examining these messages, we see that we can reduce their number
somewhat to speed up Brozzler.
Until now we always called `Console.enable`, which reports all browser
console output. We also always called `Runtime.enable`. The docs say:
https://chromedevtools.github.io/devtools-protocol/1-3/Runtime#method-enable
Enables reporting of execution contexts creation by means of
executionContextCreated event. When the reporting gets enabled the event
will be sent immediately for each existing execution context.
These outputs are useful when debugging but not in production.
If we disable them, we reduce the websocket traffic and improve
performance. With this PR, we enable them only when the current logging
level is `DEBUG`.
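Here is a minimal sketch of the logging-level check, using the standard
`logging` module. Only `Console.enable` and `Runtime.enable` come from
the text above. The other enabled domains, the logger name, and the
function name are hypothetical placeholders.

```python
import logging

# Hypothetical: domains enabled regardless of log level.
ALWAYS_ENABLED = ['Page.enable', 'Network.enable']
# From the commit message: chatty domains that only help when debugging.
DEBUG_ONLY = ['Console.enable', 'Runtime.enable']


def domains_to_enable(logger):
    """Return the DevTools *.enable methods to call for this session."""
    methods = list(ALWAYS_ENABLED)
    # isEnabledFor honours the effective level, including parent loggers.
    if logger.isEnabledFor(logging.DEBUG):
        methods.extend(DEBUG_ONLY)
    return methods


logger = logging.getLogger('sketch.browser')
logger.setLevel(logging.INFO)
print(domains_to_enable(logger))  # ['Page.enable', 'Network.enable']
```

Checking the level once per browsing session keeps the extra websocket
traffic out of production runs, but it still lets a developer see console
and execution-context events by switching to `DEBUG`.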
Counting the number of messages before and after the change, we see
improvements like:
https://www.gnome.org/technologies/ 220 -> 202 messages
https://www.whitehouse.gov/issues/budget-spending/ 203 -> 189 messages
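In relative terms, the two examples above work out to roughly an 8% and
a 7% reduction in messages. The arithmetic is spelled out below.

```python
def message_reduction(before, after):
    """Return (messages saved, percentage saved rounded to 0.1)."""
    saved = before - after
    return saved, round(100 * saved / before, 1)


print(message_reduction(220, 202))  # (18, 8.2)
print(message_reduction(203, 189))  # (14, 6.9)
```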
When the Chrome process dies and we try to read its STDOUT/STDERR, we
get
`ValueError: I/O operation on closed file` or
`OSError: [Errno 9] Bad file descriptor`.
We modify the `readline_nonblock` method to return the buffer it has
read up to that point instead of raising.
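The following is a minimal sketch of the idea, assuming a POSIX pipe
(like a subprocess's stdout/stderr) and `select`. The signature and the
timeout are hypothetical, and Brozzler's own `readline_nonblock` may
differ. Any bytes read before the file goes away are returned instead of
being lost in an exception.

```python
import os
import select


def readline_nonblock(f, timeout=0.5):
    """Read up to one line from file object `f` without blocking forever.

    Returns whatever was read so far if no data arrives within `timeout`
    seconds, on EOF, or if the file has been closed (e.g. chrome died).
    """
    buf = b''
    try:
        fd = f.fileno()  # ValueError: I/O operation on closed file
        while not buf.endswith(b'\n'):
            readable, _, _ = select.select([fd], [], [], timeout)
            if not readable:
                break  # no more data for now
            chunk = os.read(fd, 1)  # OSError: [Errno 9] if the fd went away
            if not chunk:
                break  # EOF: the writer closed its end
            buf += chunk
    except (ValueError, OSError):
        pass  # fall through and return the partial buffer
    return buf


r, w = os.pipe()
reader = os.fdopen(r, 'rb')
os.write(w, b'first line\npartial')
print(readline_nonblock(reader))               # b'first line\n'
print(readline_nonblock(reader, timeout=0.1))  # b'partial'
reader.close()
print(readline_nonblock(reader))               # b''
os.close(w)
```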
If you use a JS behavior timeout shorter than 7 seconds, the JS behavior
still always takes 7 seconds because `sleep(7)` is hard-coded in it.
We make a minor change to sleep for `min(timeout, 7)` instead, so the
behavior finishes sooner when a shorter JS behavior timeout is used.
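Here is a minimal sketch of the change. The function and constant names
are hypothetical, and the surrounding JS behavior loop is not shown. The
`sleep` parameter is injectable only so the logic can be exercised
without actually waiting.

```python
import time

# The delay that used to be hard-coded, in seconds.
MAX_BEHAVIOR_SLEEP = 7


def behavior_sleep(timeout, sleep=time.sleep):
    """Sleep for min(timeout, 7) seconds and return the delay used."""
    delay = min(timeout, MAX_BEHAVIOR_SLEEP)
    sleep(delay)
    return delay


print(behavior_sleep(3, sleep=lambda s: None))   # 3
print(behavior_sleep(60, sleep=lambda s: None))  # 7
```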
We used `self.headers.getheader`, which no longer exists in Python 3.
We replace it with `self.headers.get`.
We change the code to write binary data (`bytes`) to `self.wfile`,
because writing `str` and/or `None` raises an exception.
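The next sketch shows both fixes in a minimal Python 3 `http.server`
handler. Here `self.headers` is an `http.client.HTTPMessage`, whose
`get()` returns `None` for a missing header, and `self.wfile` only
accepts bytes. The handler class, the `X-Echo` header, and the
`fetch_echo` helper are hypothetical and only illustrate the two calls.

```python
import http.client
import http.server
import threading


class EchoHeaderHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that echoes the X-Echo request header."""

    def do_GET(self):
        # Python 3: .get() replaces Python 2's .getheader(); None if absent.
        value = self.headers.get('X-Echo')
        # wfile is a binary stream: encode str and never write None.
        body = (value or '').encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_echo(echo=None):
    """Start the server on a free port, make one request, shut down."""
    server = http.server.HTTPServer(('127.0.0.1', 0), EchoHeaderHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
        headers = {'X-Echo': echo} if echo is not None else {}
        conn.request('GET', '/', headers=headers)
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


print(fetch_echo('hello'))  # b'hello'
print(fetch_echo())         # b''
```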
which encumbers the validation with additional requirements;
specifically, it makes it difficult to validate a subclass of `dict`,
because it expects a constructor that works like `dict.__init__()`.
See https://travis-ci.org/internetarchive/brozzler/jobs/514858838
(expand "sudo cat /var/log/brozzler-worker.log"):
2019-04-02 20:16:01,792 18595 CRITICAL BrozzlingThread:42073 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:412) unexpected exception
Traceback (most recent call last):
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 379, in brozzle_site
    enable_youtube_dl=not self._skip_youtube_dl)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 215, in brozzle_page
    browser, site, page, on_screenshot, on_request)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 292, in _browse_page
    cookie_db=site.get('cookie_db'))
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/browser.py", line 341, in start
    self.websock_url = self.chrome.start(**kwargs)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 200, in start
    return self._websocket_url()
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 247, in _websocket_url
    raise e
Exception: chrome process died with status 1
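The `dict`-subclass constructor problem described earlier can be
reproduced without any validation library. The sketch below is
hypothetical. `rebuild_as_same_type` stands in for any code that
recreates a mapping by calling `type(mapping)(...)` as if it were
`dict`. `Record` mimics a `dict` subclass whose `__init__` needs an
extra argument. Neither is Brozzler or validator code.

```python
class PlainRecord(dict):
    """A dict subclass whose constructor still behaves like dict()."""


class Record(dict):
    """Hypothetical dict subclass that needs a connection to be built."""

    def __init__(self, conn, d):
        super().__init__(d)
        self.conn = conn


def rebuild_as_same_type(mapping, items):
    """Recreate `mapping` with new contents, assuming a dict-like constructor."""
    return type(mapping)(items)


print(rebuild_as_same_type(PlainRecord(a=1), {'a': 2}))  # {'a': 2}
try:
    rebuild_as_same_type(Record(None, {'a': 1}), {'a': 2})
except TypeError as e:
    # Record.__init__ received only one argument, so `d` is missing.
    print('TypeError:', e)
```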