brozzler/vagrant
2016-09-29 12:03:16 -07:00
..
ansible add missing rethinkdb config file to ansible config 2016-09-22 01:45:28 +01:00
README.rst vagrant setup (unfinished) 2016-06-30 17:50:11 -05:00
run-tests.sh starting to create a framework for testing 2016-09-14 17:06:49 -07:00
vagrant-brozzler-new-job.py starting on documenting job configuration 2016-09-29 12:03:16 -07:00
vagrant-brozzler-new-site.py starting on documenting job configuration 2016-09-29 12:03:16 -07:00
Vagrantfile for vagrant, static ansible inventory file, add brozzler-webconsole 2016-08-10 18:41:23 -07:00

Single-VM Vagrant Brozzler Deployment
-------------------------------------

This is a work in progress. Vagrant + ansible configuration for a single-vm
deployment of brozzler and warcprox with dependencies (notably rethinkdb).

The idea is for this to be a quick way for people to get up and running with a
deployment resembling a real distributed deployment, and to offer a starting
configuration for people to adapt to their clusters.

And equally important, as a harness for integration tests. (As of now brozzler
itself has no automated tests!)

You'll need vagrant installed.
https://www.vagrantup.com/docs/installation/
Then run:

::

    my-laptop$ vagrant up

Currently to start a crawl you first need to ssh to the vagrant vm and activate
the brozzler virtualenv.

::

    my-laptop$ vagrant ssh
    vagrant@brozzler-easy:~$ source ~/brozzler-ve34/bin/activate
    (brozzler-ve34)vagrant@brozzler-easy:~$

Then you can run brozzler-new-site:

::

    (brozzler-ve34)vagrant@brozzler-easy:~$ brozzler-new-site \
           --proxy=localhost:8000 --enable-warcprox-features \
           http://example.com/


Or brozzler-new-job (make sure to set the proxy to localhost:8000):

::

    (brozzler-ve34)vagrant@brozzler-easy:~$ cat >job1.yml
    id: job1
    proxy: localhost:8000 # point at warcprox for archiving
    enable_warcprox_features: true
    seeds:
      - url: https://example.org/
    (brozzler-ve34)vagrant@brozzler-easy:~$ brozzler-new-job job1.yml

WARC files will appear in ./warcs and brozzler, warcprox and rethinkdb logs in
./logs (via vagrant folders syncing).

You can also look at the rethinkdb console by opening http://localhost:8080 in
your browser after opening an ssh tunnel like so:

::

    my-laptop$ vagrant ssh -- -fN -Llocalhost:8080:localhost:8080