brozzler - distributed browser-based web crawler

umbra

Browser automation via the Chrome remote debugging protocol.

Install

Install with pip from this repository (for example, run pip install . from the top-level directory of a checkout).

Run

The "umbra" script should be in bin/. load_url.py takes URLs as command-line arguments and puts them onto a RabbitMQ queue. dump_queue.py prints the resources that the browser discovers and sends back over the return queue.
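To make the flow between the three scripts concrete, here is a minimal in-process sketch. It assumes nothing about umbra's real code: Python's queue.Queue stands in for the two RabbitMQ queues, fake_browser is a hypothetical stand-in for umbra itself, and the {"page": ..., "resource": ...} message shape is invented for illustration. It only shows the request-queue / return-queue pattern described above.

```python
import queue
from typing import Callable, List


def load_urls(urls: List[str], url_q: "queue.Queue[str]") -> None:
    """Like load_url.py: put each URL onto the request queue."""
    for url in urls:
        url_q.put(url)


def fake_browser(
    url_q: "queue.Queue[str]",
    return_q: "queue.Queue[dict]",
    discover: Callable[[str], List[str]],
) -> None:
    """Hypothetical stand-in for umbra: take each queued URL, 'browse' it,
    and report every discovered resource on the return queue."""
    while True:
        try:
            url = url_q.get_nowait()
        except queue.Empty:
            break  # no more URLs to browse
        for resource in discover(url):
            # Message shape is an assumption, not umbra's actual format.
            return_q.put({"page": url, "resource": resource})


def dump_queue(return_q: "queue.Queue[dict]") -> List[dict]:
    """Like dump_queue.py: drain the return queue, printing each resource."""
    items = []
    while True:
        try:
            item = return_q.get_nowait()
        except queue.Empty:
            break
        print(item["page"], "->", item["resource"])
        items.append(item)
    return items


if __name__ == "__main__":
    urls_q: "queue.Queue[str]" = queue.Queue()
    returns_q: "queue.Queue[dict]" = queue.Queue()
    load_urls(["http://example.com/"], urls_q)
    # Pretend every page links to a stylesheet and an image.
    fake_browser(urls_q, returns_q, lambda u: [u + "style.css", u + "logo.png"])
    dump_queue(returns_q)
```

With a real broker, the two queue.Queue objects would be replaced by named queues on RabbitMQ. The producer and consumer then run as separate processes, which is why load_url.py and dump_queue.py can be used independently of the browser.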

On Ubuntu, a RabbitMQ server installed with sudo apt-get install rabbitmq-server should already be configured for these three scripts to work on localhost (the default AMQP URL).
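The sketch below shows what a localhost AMQP URL is made of. It assumes RabbitMQ's stock settings (user guest with password guest, port 5672, virtual host "/"). The exact default URL string that umbra uses is not stated here, so DEFAULT_AMQP_URL and describe_amqp_url are hypothetical names for illustration.

```python
from urllib.parse import unquote, urlparse

# Assumption: RabbitMQ's out-of-the-box localhost settings.
# umbra's own default URL may be written differently.
DEFAULT_AMQP_URL = "amqp://guest:guest@localhost:5672/%2f"


def describe_amqp_url(url: str) -> dict:
    """Split an AMQP URL into the pieces a client needs to connect."""
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5672,  # 5672 is the standard AMQP port
        "user": parts.username,
        "password": parts.password,
        # The path names the virtual host; "%2f" is the URL-encoded
        # default vhost "/". An empty path also means the default vhost.
        "vhost": unquote(parts.path[1:]) if len(parts.path) > 1 else "/",
    }


if __name__ == "__main__":
    print(describe_amqp_url(DEFAULT_AMQP_URL))
```

RabbitMQ only lets the guest account log in from localhost by default, which is one reason the scripts work locally without extra setup.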