mirror of https://github.com/internetarchive/brozzler.git synced 2025-12-20 10:55:22 -05:00

brozzler - distributed browser-based web crawler

Noah Levitt 8749b97811 oops, check in browser.py		2014-05-20 03:10:33 -07:00
bin	clean shutdown without draining entire amqp queue (only consume urls from amqp when browser activity isn't saturated)	2014-05-20 03:02:48 -07:00
umbra	oops, check in browser.py	2014-05-20 03:10:33 -07:00
.gitignore	Some refactor/testing and utility scripts	2014-01-22 18:03:02 +00:00
README.md	Update readme	2014-01-28 00:12:33 -05:00
setup.py	refactor umbra.py into controller.py and browser.py, improve class names	2014-05-20 02:42:40 -07:00

README.md

umbra

Browser automation via the Chrome debug protocol

Install

Install via pip from this repo.

Run

The "umbra" script should be in bin/. load_url.py takes URLs as arguments and puts them onto a RabbitMQ queue. dump_queue.py prints resources discovered by the browser and sent over the return queue.
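The flow between the three scripts can be sketched in-process. This is only an illustration under loud assumptions: Python's standard `queue.Queue` stands in for the RabbitMQ queues, `fake_browse` is a hypothetical placeholder for the real browser, and the message fields (`parent_url`, `url`) are invented for the example. It is not the project's actual code or wire format.

```python
import queue


def load_urls(url_queue, urls):
    """Stand-in for load_url.py: put each URL onto the URL queue."""
    for url in urls:
        url_queue.put(url)


def fake_browse(url):
    """Hypothetical browser: pretend the page loads itself and one stylesheet."""
    return [url, url.rstrip("/") + "/style.css"]


def run_browser(url_queue, return_queue):
    """Stand-in for umbra: consume URLs, publish discovered resources."""
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            break
        for resource in fake_browse(url):
            # Assumed message shape, for illustration only.
            return_queue.put({"parent_url": url, "url": resource})


def dump_queue(return_queue):
    """Stand-in for dump_queue.py: drain the return queue into a list."""
    found = []
    while not return_queue.empty():
        found.append(return_queue.get())
    return found


if __name__ == "__main__":
    urls_q, return_q = queue.Queue(), queue.Queue()
    load_urls(urls_q, ["http://example.com/"])
    run_browser(urls_q, return_q)
    for message in dump_queue(return_q):
        print(message)
```

With a real broker, the two `queue.Queue` objects would be replaced by AMQP queues, and the three functions would run as separate processes.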

On Ubuntu, installing RabbitMQ with sudo apt-get install rabbitmq-server should automatically set it up so that these three scripts work on localhost (the default AMQP URL).
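Before running the scripts, it can help to confirm that a broker is actually listening. The sketch below uses only the standard library. The URL `amqp://guest:guest@localhost:5672/%2f` is an assumption based on RabbitMQ's stock defaults (guest/guest, port 5672, vhost "/"), not a value taken from this repo, and the function names are hypothetical.

```python
import socket
from urllib.parse import unquote, urlsplit

# Assumption: RabbitMQ's stock defaults, not read from the umbra scripts.
ASSUMED_DEFAULT_AMQP_URL = "amqp://guest:guest@localhost:5672/%2f"


def parse_amqp_url(url):
    """Split an AMQP URL into its parts, falling back to RabbitMQ defaults."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or 5672,
        "username": parts.username or "guest",
        "password": parts.password or "guest",
        # The vhost is the percent-decoded path without its leading slash.
        "vhost": unquote(parts.path[1:]) or "/",
    }


def broker_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    conf = parse_amqp_url(ASSUMED_DEFAULT_AMQP_URL)
    state = "reachable" if broker_reachable(conf["host"], conf["port"]) else "not reachable"
    print(f"broker at {conf['host']}:{conf['port']} is {state}")
```

This only checks that the TCP port is open. It does not perform an AMQP handshake or verify credentials.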