mirror of
https://github.com/internetarchive/brozzler.git
synced 2025-06-21 13:24:20 -04:00
add some info to the readme
parent
2f28f00a09
commit
dc04048d50
1 changed files with 25 additions and 3 deletions
28
README.md
@@ -1,12 +1,34 @@
 brozzler
 ========
-"browser" ^ "crawler" = "brozzler"
+"browser" | "crawler" = "brozzler"
 
 Brozzler is a distributed web crawler that uses a real browser (chrome or
-chromium) to fetch pages and embedded urls and to extract links.
+chromium) to fetch pages and embedded urls and to extract links. It also
+uses [youtube-dl](https://github.com/rg3/youtube-dl) to enhance media capture
+capabilities.
 
 It is forked from https://github.com/internetarchive/umbra.
 
+Brozzler is designed to work in conjunction with
+[warcprox](https://github.com/internetarchive/warcprox) for web archiving.
+
+Installation
+------------
+```
+git clone https://github.com/nlevitt/brozzler
+cd brozzler
+# set up virtualenv if desired
+pip install -r requirements.txt .
+```
+Brozzler also requires a rabbitmq server.
+
+Fonts for good screenshots
+--------------------------
+On ubuntu 14.04 trusty I installed these packages:
+
+xfonts-base ttf-mscorefonts-installer fonts-arphic-bkai00mp fonts-arphic-bsmi00lp fonts-arphic-gbsn00lp fonts-arphic-gkai00mp fonts-arphic-ukai fonts-farsiweb fonts-nafees fonts-sil-abyssinica fonts-sil-ezra fonts-sil-padauk fonts-unfonts-
+extra fonts-unfonts-core ttf-indic-fonts fonts-thai-tlwg fonts-lklug-sinhala
+
 License
 -------
 
@@ -15,7 +37,7 @@ Copyright 2015 Internet Archive
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this software except in compliance with the License.
 You may obtain a copy of the License at
 
 http://www.apache.org/licenses/LICENSE-2.0
 
 Unless required by applicable law or agreed to in writing, software