From 771d6aa626c6076cc1f5ed5274af57d8ee72e656 Mon Sep 17 00:00:00 2001
From: Noah Levitt
Date: Mon, 23 Jul 2018 19:05:49 -0500
Subject: [PATCH] more readme edits

---
 README.rst | 48 +++++++++++++----------------------------------
 1 file changed, 13 insertions(+), 35 deletions(-)

diff --git a/README.rst b/README.rst
index 27d25e9..2f1e7b6 100644
--- a/README.rst
+++ b/README.rst
@@ -11,10 +11,10 @@
 Brozzler is a distributed web crawler (爬虫) that uses a real browser (Chrome
 or Chromium) to fetch pages and embedded URLs and to extract links. It employs
 `youtube-dl `_ to enhance media capture
-capabilities, warcprox to write content to Web ARChive (WARC) files, `rethinkdb
-`_ to index captured URLs, a native
-dashboard for crawl job monitoring, and a customized Python Wayback interface
-for archival replay.
+capabilities and `rethinkdb `_ to
+manage crawl state.
+
+Brozzler is designed to work in conjunction with warcprox for web archiving.

 Requirements
 ------------
@@ -24,10 +24,12 @@ Requirements
 - Chromium or Google Chrome >= version 64

 Note: The browser requires a graphical environment to run. When brozzler is run
-on a server, this may require deploying some additional infrastructure
-(typically X11; Xvfb does not support screenshots, however Xvnc4 from package
-vnc4server, does). The `vagrant configuration `_ in the brozzler
-repository (still a work in progress) has an example setup.
+on a server, this may require deploying some additional infrastructure,
+typically X11. Xvnc4 and Xvfb are X11 variants that are suitable for use on a
+server, because they don't display anything to a physical screen. The `vagrant
+configuration `_ in the brozzler repository has an example setup
+using Xvnc4. (When last tested, chromium on Xvfb did not support screenshots,
+so Xvnc4 is preferred at this time.)

 Getting Started
 ---------------
@@ -168,35 +170,11 @@ Run pywb like so:

 Then browse http://localhost:8880/brozzler/.

-
 Headless Chrome (experimental)
---------------------------------
+------------------------------

-`Headless Chromium
-`_
-is now available in stable Chrome releases for 64-bit Linux and may be used to
-run the browser without a visible window or X11.
-
-To try this out, create a wrapper script like ~/bin/chrome-headless.sh:
-
-::
-
-    #!/bin/bash
-    exec /opt/google/chrome/chrome --headless --disable-gpu "$@"
-
-Run brozzler passing the path to the wrapper script as the ``--chrome-exe``
-option:
-
-::
-
-    chmod +x ~/bin/chrome-headless.sh
-    brozzler-worker --chrome-exe ~/bin/chrome-headless.sh
-
-Beware: Chrome's headless mode is still very new and has `unresolved issues
-`_.
-Its use with brozzler has not yet been extensively tested. You may experience
-hangs or crashes with some types of content. For the moment we recommend using
-Chrome's regular mode instead.
+Brozzler is known to work nominally with Chrome/Chromium in headless mode, but
+this has not yet been extensively tested.

 License
 -------