mirror of
https://github.com/internetarchive/brozzler.git
synced 2025-05-12 11:32:16 -04:00
brozzler - distributed browser-based web crawler
bin | ||
umbra | ||
.gitignore | ||
README.md | ||
setup.py |
umbra
Browser automation via chrome debug protocol
Install
Install via pip from this repo.
Run
"umbra" script should be in bin/. load_url.py takes urls as arguments and puts them onto a rabbitmq queue dump_queue.py prints resources discovered by the browser and sent over the return queue.
On ubuntu, rabbitmq install with sudo apt-get install rabbitmq-server
should automatically
be set up for these three scripts to function on localhost ( the default amqp url ).