mirror of https://github.com/internetarchive/brozzler.git (synced 2025-02-23 08:09:48 -05:00)
vagrant readme fixes (thanks funkyfuture)
parent ffa8021968
commit d19e139101
setup.py
@@ -32,7 +32,7 @@ def find_package_data(package):
 
 setuptools.setup(
         name='brozzler',
-        version='1.4.dev298',
+        version='1.4.dev299',
         description='Distributed web crawling with browsers',
         url='https://github.com/internetarchive/brozzler',
         author='Noah Levitt',
@@ -1,15 +1,14 @@
 Single-VM Vagrant Brozzler Deployment
 -------------------------------------
 
-This is a work in progress. Vagrant + ansible configuration for a single-vm
-deployment of brozzler and warcprox with dependencies (notably rethinkdb).
+This is a vagrant + ansible configuration for a single-vm deployment of
+brozzler and warcprox with dependencies (notably rethinkdb).
 
 The idea is for this to be a quick way for people to get up and running with a
 deployment resembling a real distributed deployment, and to offer a starting
 configuration for people to adapt to their clusters.
 
-And equally important, as a harness for integration tests. (As of now brozzler
-itself has no automated tests!)
+And equally important, as a harness for integration tests.
 
 You'll need vagrant installed.
 https://www.vagrantup.com/docs/installation/
@@ -25,27 +24,27 @@ the brozzler virtualenv.
 ::
 
     my-laptop$ vagrant ssh
-    vagrant@brozzler-easy:~$ source ~/brozzler-ve34/bin/activate
-    (brozzler-ve34)vagrant@brozzler-easy:~$
+    vagrant@brzl:~$ source /opt/brozzler-ve34/bin/activate
+    (brozzler-ve34)vagrant@brzl:~$
 
 Then you can run brozzler-new-site:
 
 ::
 
-    (brozzler-ve34)vagrant@brozzler-easy:~$ brozzler-new-site \
-        --proxy=localhost:8000 http://example.com/
+    (brozzler-ve34)vagrant@brzl:~$ brozzler-new-site --proxy=localhost:8000 http://example.com/
 
 
 Or brozzler-new-job (make sure to set the proxy to localhost:8000):
 
 ::
 
-    (brozzler-ve34)vagrant@brozzler-easy:~$ cat >job1.yml
+    (brozzler-ve34)vagrant@brzl:~$ cat >job1.yml <<EOF
     id: job1
     proxy: localhost:8000 # point at warcprox for archiving
     seeds:
-    - url: https://example.org/
-    (brozzler-ve34)vagrant@brozzler-easy:~$ brozzler-new-job job1.yml
+    - url: https://example.org/
+    EOF
+    (brozzler-ve34)vagrant@brzl:~$ brozzler-new-job job1.yml
 
 WARC files will appear in ./warcs and brozzler, warcprox and rethinkdb logs in
 ./logs (via vagrant folders syncing).
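
The job1.yml heredoc in the README works for a single seed. For jobs with several seeds, the same file could be generated instead of typed by hand. The following is a minimal sketch, not part of brozzler: `render_job_yaml` is a hypothetical helper, and it assumes the job file needs only the three keys shown in the README (`id`, `proxy`, `seeds`).

```python
def render_job_yaml(job_id, proxy, seed_urls):
    """Return job file text in the same shape as the README's job1.yml.

    Hypothetical helper for illustration; it emits only the keys that
    appear in the README example (id, proxy, seeds).
    """
    lines = [
        f"id: {job_id}",
        f"proxy: {proxy} # point at warcprox for archiving",
        "seeds:",
    ]
    # One "- url: ..." list entry per seed, matching the README layout.
    lines.extend(f"- url: {url}" for url in seed_urls)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the job file; redirect stdout to job1.yml inside the VM.
    print(render_job_yaml("job1", "localhost:8000",
                          ["https://example.org/"]), end="")
```

Saved under an assumed name such as make_job.py, it could be run inside the VM as `python make_job.py > job1.yml`, followed by `brozzler-new-job job1.yml` as in the README.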