add some info to the readme

Noah Levitt 2015-07-20 12:00:14 -07:00
parent 2f28f00a09
commit dc04048d50

@@ -1,12 +1,34 @@
brozzler brozzler
======== ========
"browser" ^ "crawler" = "brozzler" "browser" | "crawler" = "brozzler"
Brozzler is a distributed web crawler that uses a real browser (chrome or Brozzler is a distributed web crawler that uses a real browser (chrome or
chromium) to fetch pages and embedded urls and to extract links. chromium) to fetch pages and embedded urls and to extract links. It also
uses [youtube-dl](https://github.com/rg3/youtube-dl) to enhance media capture
capabilities.
It is forked from https://github.com/internetarchive/umbra. It is forked from https://github.com/internetarchive/umbra.
Brozzler is designed to work in conjunction with
[warcprox](https://github.com/internetarchive/warcprox) for web archiving.
Installation
------------
```
git clone https://github.com/nlevitt/brozzler
cd brozzler
# set up virtualenv if desired
pip install -r requirements.txt .
```
Brozzler also requires a rabbitmq server.
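Before starting brozzler it can help to confirm that a rabbitmq server is actually reachable. The sketch below is not part of brozzler; `port_is_open` is a hypothetical helper, and it assumes rabbitmq is listening on `localhost` at its default AMQP port, 5672. It only checks that the TCP port accepts connections, not that the AMQP handshake succeeds.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: rabbitmq runs locally on its default AMQP port, 5672.
    host, port = "localhost", 5672
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"rabbitmq at {host}:{port} is {state}")
```

If the server runs on another machine or port, adjust `host` and `port` accordingly.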
Fonts for good screenshots
--------------------------
On ubuntu 14.04 trusty I installed these packages:
xfonts-base ttf-mscorefonts-installer fonts-arphic-bkai00mp fonts-arphic-bsmi00lp fonts-arphic-gbsn00lp fonts-arphic-gkai00mp fonts-arphic-ukai fonts-farsiweb fonts-nafees fonts-sil-abyssinica fonts-sil-ezra fonts-sil-padauk
fonts-unfonts-extra fonts-unfonts-core ttf-indic-fonts fonts-thai-tlwg fonts-lklug-sinhala
License License
------- -------