brozzler - distributed browser-based web crawler
bin Replace js evaluation with direct page navigation, add default for dump_queue 2014-01-28 00:10:31 -05:00
umbra formatting change only - indent with 4 spaces 2014-02-10 20:45:18 -08:00
.gitignore Some refactor/testing and utility scripts 2014-01-22 18:03:02 +00:00
README.md Update readme 2014-01-28 00:12:33 -05:00
setup.py specify classifier 'Programming Language :: Python :: 3.3' since websocket-client-py3 requires python 3.3, doesn't work with 3.2 2014-02-12 12:17:41 -08:00

umbra

Browser automation via the Chrome debug protocol.

Install

Install via pip from this repo.

Run

The "umbra" script should be in bin/. load_url.py takes URLs as arguments and puts them onto a RabbitMQ queue. dump_queue.py prints the resources discovered by the browser and sent back over the return queue.
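The flow between the three scripts can be sketched with in-memory queues. This is only an illustration of the roles described above, not the actual umbra code: the standard-library queue.Queue objects stand in for RabbitMQ, and the JSON message fields, the function names and fake_discover are all hypothetical.

```python
import json
import queue

# In-memory stand-ins for the two RabbitMQ queues (hypothetical names).
request_queue = queue.Queue()  # URLs waiting to be browsed
return_queue = queue.Queue()   # resources reported back by the browser


def load_urls(urls):
    """Role of load_url.py: put each URL onto the request queue."""
    for url in urls:
        request_queue.put(json.dumps({"url": url}))


def browse_once(discover):
    """Role of the umbra script: take one URL and report what the page loaded."""
    message = json.loads(request_queue.get())
    for resource in discover(message["url"]):
        payload = {"parent": message["url"], "url": resource}
        return_queue.put(json.dumps(payload))


def dump_queue():
    """Role of dump_queue.py: drain the return queue and return its messages."""
    found = []
    while not return_queue.empty():
        found.append(json.loads(return_queue.get()))
    return found


def fake_discover(url):
    # Placeholder for a real browser visit; returns made-up resource URLs.
    return [url + "style.css", url + "logo.png"]


load_urls(["http://example.com/"])
browse_once(fake_discover)
results = dump_queue()
for item in results:
    print(item["url"])
```

In a real deployment the two queues would live on the RabbitMQ broker, and each role would run as a separate process.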

On Ubuntu, installing RabbitMQ with sudo apt-get install rabbitmq-server should automatically set it up so that these three scripts work on localhost (the default AMQP URL).
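The README does not spell out the default AMQP URL. As a rough sketch, assuming it matches a fresh RabbitMQ install (guest account, port 5672, vhost "/"), the URL breaks down as follows. DEFAULT_AMQP_URL and parse_amqp_url are illustrative names, not part of umbra.

```python
from urllib.parse import urlsplit, unquote

# Assumed value: the guest account on a stock local RabbitMQ broker.
DEFAULT_AMQP_URL = "amqp://guest:guest@localhost:5672/%2f"


def parse_amqp_url(url):
    """Split an AMQP URL into the pieces a client needs to connect."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5672,  # 5672 is the standard AMQP port
        "user": parts.username,
        "password": parts.password,
        # The vhost is the percent-encoded path; "%2f" decodes to "/".
        "vhost": unquote(parts.path[1:]) or "/",
    }


broker = parse_amqp_url(DEFAULT_AMQP_URL)
print(broker)
```

If the broker runs on another machine, only the host portion of the URL should need to change. Note that recent RabbitMQ versions restrict the guest account to localhost connections.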