mirror of https://github.com/internetarchive/brozzler.git
Merge branch 'master' into qa
* master:
  avoid "Uncaught TypeError: Cannot read property 'querySelectorAll' of undefined" from outlinks script
  little readme fix
  for vagrant, static ansible inventory file, add brozzler-webconsole
  add info to display of jobless sites in brozzler-webconsole; fix creation of "least_hops" index on the rethinkdb table "pages"
  add arguments --webconsole-address --webconsole-port --pywb-address and change default ports
  list jobless sites on brozzler-webconsole front page
  run brozzler-webconsole inside brozzler-easy
  add section about brozzler-easy to the readme
  add --help to brozzler-webconsole
commit caadb2beff
README.rst (100 lines changed)
@@ -1,4 +1,4 @@
-.. |logo| image:: https://cdn.rawgit.com/nlevitt/brozzler/d1158ab2242815b28fe7bb066042b5b5982e4627/webconsole/static/brozzler.svg
+.. |logo| image:: https://cdn.rawgit.com/internetarchive/brozzler/1.1b5/brozzler/webconsole/static/brozzler.svg
    :width: 7%

 brozzler |logo|
@@ -6,37 +6,76 @@ brozzler |logo|

 "browser" \| "crawler" = "brozzler"

-Brozzler is a distributed web crawler (爬虫) that uses a real browser
-(chrome or chromium) to fetch pages and embedded urls and to extract
-links. It also uses `youtube-dl <https://github.com/rg3/youtube-dl>`__
-to enhance media capture capabilities.
-
-It is forked from https://github.com/internetarchive/umbra.
+Brozzler is a distributed web crawler (爬虫) that uses a real browser (chrome
+or chromium) to fetch pages and embedded urls and to extract links. It also
+uses `youtube-dl <https://github.com/rg3/youtube-dl>`_ to enhance media
+capture capabilities.

 Brozzler is designed to work in conjunction with
-`warcprox <https://github.com/internetarchive/warcprox>`__ for web
+`warcprox <https://github.com/internetarchive/warcprox>`_ for web
 archiving.

-Installation
+Requirements
 ------------

-Brozzler requires python 3.4 or later.
+- Python 3.4 or later
+- RethinkDB deployment
+- Chromium or Google Chrome browser
+
+Worth noting is that the browser requires a graphical environment to run. You
+already have this on your laptop, but on a server it will probably require
+deploying some additional infrastructure (typically X11). The vagrant
+configuration in the brozzler repository (still a work in progress) has an
+example setup.
+
+Getting Started
+---------------
+
+The easiest way to get started with brozzler for web archiving is with
+``brozzler-easy``. Brozzler-easy runs brozzler-worker, warcprox,
+`pywb <https://github.com/ikreymer/pywb>`_, and brozzler-webconsole, configured
+to work with each other, in a single process.
+
+Mac instructions:

 ::

-    # set up virtualenv if desired
-    pip install brozzler
+    # install and start rethinkdb
+    brew install rethinkdb
+    rethinkdb &>>rethinkdb.log &

-Brozzler also requires a rethinkdb deployment.
+    # install brozzler with special dependencies pywb and warcprox
+    pip install brozzler[easy] # in a virtualenv if desired

-Usage
------
+    # queue a site to crawl
+    brozzler-new-site http://example.com/
+
+    # or a job
+    brozzler-new-job job1.yml
+
+    # start brozzler-easy
+    brozzler-easy
+
+At this point brozzler-easy will start brozzling your site. Results will be
+immediately available for playback in pywb at http://localhost:8880/brozzler/.
+
+*Brozzler-easy demonstrates the full brozzler archival crawling workflow, but
+does not take advantage of brozzler's distributed nature.*
+
+Installation and Usage
+----------------------
+
+To install brozzler only:
+
+::
+
+    pip install brozzler # in a virtualenv if desired

 Launch one or more workers:

 ::

-    brozzler-worker -e chromium
+    brozzler-worker

 Submit jobs:

@@ -44,6 +83,13 @@ Submit jobs:

     brozzler-new-job myjob.yaml

+Submit sites not tied to a job:
+
+::
+
+    brozzler-new-site --proxy=localhost:8000 --enable-warcprox-features \
+        --time-limit=600 http://example.com/
+
 Job Configuration
 -----------------

@@ -70,14 +116,6 @@ must be specified, everything else is optional.
         scope:
           surt: http://(org,example,

-Submit a Site to Crawl Without Configuring a Job
-------------------------------------------------
-
-::
-
-    brozzler-new-site --proxy=localhost:8000 --enable-warcprox-features \
-        --time-limit=600 http://example.com/
-
 Brozzler Web Console
 --------------------

@@ -95,19 +133,7 @@ To start the app, run

     brozzler-webconsole

-
-XXX configuration stuff
-
-Fonts (for decent screenshots)
-------------------------------
-
-On ubuntu 14.04 trusty I installed these packages:
-
-xfonts-base ttf-mscorefonts-installer fonts-arphic-bkai00mp
-fonts-arphic-bsmi00lp fonts-arphic-gbsn00lp fonts-arphic-gkai00mp
-fonts-arphic-ukai fonts-farsiweb fonts-nafees fonts-sil-abyssinica
-fonts-sil-ezra fonts-sil-padauk fonts-unfonts-extra fonts-unfonts-core
-ttf-indic-fonts fonts-thai-tlwg fonts-lklug-sinhala
+See ``brozzler-webconsole --help`` for configuration options.

 License
 -------
@@ -304,11 +304,13 @@ class Browser:
 var __brzl_framesDone = new Set();
 var __brzl_compileOutlinks = function(frame) {
     __brzl_framesDone.add(frame);
-    var outlinks = Array.prototype.slice.call(
-            frame.document.querySelectorAll('a[href]'));
-    for (var i = 0; i < frame.frames.length; i++) {
-        if (frame.frames[i] && !__brzl_framesDone.has(frame.frames[i])) {
-            outlinks = outlinks.concat(__brzl_compileOutlinks(frame.frames[i]));
+    if (frame && frame.document) {
+        var outlinks = Array.prototype.slice.call(
+                frame.document.querySelectorAll('a[href]'));
+        for (var i = 0; i < frame.frames.length; i++) {
+            if (frame.frames[i] && !__brzl_framesDone.has(frame.frames[i])) {
+                outlinks = outlinks.concat(__brzl_compileOutlinks(frame.frames[i]));
+            }
         }
     }
     return outlinks;
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 '''
-brozzler-easy - brozzler-worker, warcprox, and pywb all working together in a
-single process
+brozzler-easy - brozzler-worker, warcprox, pywb, and brozzler-webconsole all
+working together in a single process

 Copyright (C) 2016 Internet Archive

@@ -27,7 +27,7 @@ try:
     import brozzler.pywb
     import wsgiref.simple_server
     import wsgiref.handlers
-    import six.moves.socketserver
+    import brozzler.webconsole
 except ImportError as e:
     logging.critical(
         '%s: %s\n\nYou might need to run "pip install '
@@ -44,16 +44,17 @@ import threading
 import time
 import rethinkstuff
 import traceback
+import socketserver

 def _build_arg_parser(prog=os.path.basename(sys.argv[0])):
     arg_parser = argparse.ArgumentParser(
             prog=prog, formatter_class=argparse.ArgumentDefaultsHelpFormatter,
             description=(
                 'brozzler-easy - easy deployment of brozzler, with '
-                'brozzler-worker, warcprox, and pywb all running in a single '
-                'process'))
+                'brozzler-worker, warcprox, pywb, and brozzler-webconsole all '
+                'running in a single process'))

-    # === common args ===
+    # common args
     arg_parser.add_argument(
             '--rethinkdb-servers', dest='rethinkdb_servers',
             default='localhost', help=(
@@ -66,7 +67,7 @@ def _build_arg_parser(prog=os.path.basename(sys.argv[0])):
             '-d', '--warcs-dir', dest='warcs_dir', default='./warcs',
             help='where to write warcs')

-    # === warcprox args ===
+    # warcprox args
     arg_parser.add_argument(
             '-c', '--cacert', dest='cacert',
             default='./%s-warcprox-ca.pem' % socket.gethostname(),
@@ -83,24 +84,42 @@ def _build_arg_parser(prog=os.path.basename(sys.argv[0])):
                 'host:port of tor socks proxy, used only to connect to '
                 '.onion sites'))

-    # === brozzler-worker args ===
+    # brozzler-worker args
     arg_parser.add_argument(
             '-e', '--chrome-exe', dest='chrome_exe',
             default=brozzler.cli.suggest_default_chome_exe(),
             help='executable to use to invoke chrome')
     arg_parser.add_argument(
-            '-n', '--max-browsers', dest='max_browsers', default='1',
-            help='max number of chrome instances simultaneously browsing pages')
+            '-n', '--max-browsers', dest='max_browsers',
+            type=int, default=1, help=(
+                'max number of chrome instances simultaneously '
+                'browsing pages'))

-    # === pywb args ===
+    # pywb args
     arg_parser.add_argument(
-            '--pywb-port', dest='pywb_port', type=int, default=8091,
-            help='pywb wayback port')
+            '--pywb-address', dest='pywb_address',
+            default='0.0.0.0',
+            help='pywb wayback address to listen on')
+    arg_parser.add_argument(
+            '--pywb-port', dest='pywb_port', type=int,
+            default=8880, help='pywb wayback port')

-    # === common at the bottom args ===
+    # webconsole args
     arg_parser.add_argument(
-            '-v', '--verbose', dest='verbose', action='store_true')
-    arg_parser.add_argument('-q', '--quiet', dest='quiet', action='store_true')
+            '--webconsole-address', dest='webconsole_address',
+            default='localhost',
+            help='brozzler web console address to listen on')
+    arg_parser.add_argument(
+            '--webconsole-port', dest='webconsole_port',
+            type=int, default=8881, help='brozzler web console port')
+
+    # common at the bottom args
+    arg_parser.add_argument(
+            '-v', '--verbose', dest='verbose', action='store_true',
+            help='verbose logging')
+    arg_parser.add_argument(
+            '-q', '--quiet', dest='quiet', action='store_true',
+            help='quiet logging (warnings and errors only)')
     # arg_parser.add_argument(
     #         '-s', '--silent', dest='log_level', action='store_const',
     #         default=logging.INFO, const=logging.CRITICAL)
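Review note on the `--max-browsers` change in the hunk above: moving the conversion into argparse (`type=int, default=1`) means the value arrives as an `int`, so callers no longer need `int(args.max_browsers)`. The following is a minimal standalone sketch of that behaviour, not the project's code; `build_parser` is a hypothetical helper that copies only the one option.

```python
import argparse


def build_parser():
    """Hypothetical parser carrying only the --max-browsers option."""
    parser = argparse.ArgumentParser(
        prog='example',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        '-n', '--max-browsers', dest='max_browsers',
        type=int, default=1, help=(
            'max number of chrome instances simultaneously '
            'browsing pages'))
    return parser


# argparse applies type=int to command-line strings; the int default is
# used as-is when the option is absent.
explicit = build_parser().parse_args(['--max-browsers', '4'])
default = build_parser().parse_args([])
print(type(explicit.max_browsers).__name__, explicit.max_browsers)
print(type(default.max_browsers).__name__, default.max_browsers)
```

A non-numeric value such as `-n many` is then rejected by argparse itself with a usage error, rather than failing later inside the worker.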
@@ -110,6 +129,10 @@ def _build_arg_parser(prog=os.path.basename(sys.argv[0])):

     return arg_parser

+class ThreadingWSGIServer(
+        socketserver.ThreadingMixIn, wsgiref.simple_server.WSGIServer):
+    pass
+
 class BrozzlerEasyController:
     logger = logging.getLogger(__module__ + "." + __qualname__)

@@ -120,6 +143,12 @@ class BrozzlerEasyController:
                 self._warcprox_args(args))
         self.brozzler_worker = self._init_brozzler_worker(args)
         self.pywb_httpd = self._init_pywb(args)
+        self.webconsole_httpd = self._init_brozzler_webconsole(args)
+
+    def _init_brozzler_webconsole(self, args):
+        return wsgiref.simple_server.make_server(
+                args.webconsole_address, args.webconsole_port,
+                brozzler.webconsole.app, ThreadingWSGIServer)

     def _init_brozzler_worker(self, args):
         r = rethinkstuff.Rethinker(
@@ -128,7 +157,7 @@ class BrozzlerEasyController:
         service_registry = rethinkstuff.ServiceRegistry(r)
         worker = brozzler.worker.BrozzlerWorker(
                 frontier, service_registry,
-                max_browsers=int(args.max_browsers),
+                max_browsers=args.max_browsers,
                 chrome_exe=args.chrome_exe,
                 proxy='%s:%s' % self.warcprox_controller.proxy.server_address,
                 enable_warcprox_features=True)
@@ -166,12 +195,9 @@ class BrozzlerEasyController:

         # disable is_hop_by_hop restrictions
         wsgiref.handlers.is_hop_by_hop = lambda x: False
-        class ThreadingWSGIServer(
-                six.moves.socketserver.ThreadingMixIn,
-                wsgiref.simple_server.WSGIServer):
-            pass
         return wsgiref.simple_server.make_server(
-                '', args.pywb_port, wsgi_app, ThreadingWSGIServer)
+                args.pywb_address, args.pywb_port, wsgi_app,
+                ThreadingWSGIServer)

     def start(self):
         self.logger.info('starting warcprox')
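Review note on the hunks above: the module-level `ThreadingWSGIServer` is now shared by the pywb and webconsole servers, each built with `wsgiref.simple_server.make_server` and served from its own thread. Below is a self-contained sketch of that pattern using only the standard library. `hello_app`, `QuietHandler` and `fetch_once` are hypothetical stand-ins, and the server binds to an ephemeral loopback port rather than the ports used in the commit.

```python
import socketserver
import threading
import urllib.request
import wsgiref.simple_server


class ThreadingWSGIServer(
        socketserver.ThreadingMixIn, wsgiref.simple_server.WSGIServer):
    # handle each request in its own thread; daemon threads do not block exit
    daemon_threads = True


class QuietHandler(wsgiref.simple_server.WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the example's output clean


def hello_app(environ, start_response):
    """Trivial WSGI app standing in for pywb or the web console."""
    body = b'hello from ' + environ['PATH_INFO'].encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body)))])
    return [body]


def fetch_once(path='/brozzler/'):
    """Start the server, fetch one URL from it, then shut it down."""
    httpd = wsgiref.simple_server.make_server(
        '127.0.0.1', 0, hello_app, ThreadingWSGIServer, QuietHandler)
    thread = threading.Thread(target=httpd.serve_forever)
    thread.start()
    try:
        port = httpd.server_address[1]
        # bypass any proxy configured in the environment
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open('http://127.0.0.1:%d%s' % (port, path)) as resp:
            return resp.read()
    finally:
        httpd.shutdown()      # stops serve_forever()
        thread.join()
        httpd.server_close()
```

The same `shutdown()` call is what the commit's `shutdown` method relies on to stop the webconsole server.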
@@ -185,7 +211,15 @@ class BrozzlerEasyController:
                 'starting pywb at %s:%s', *self.pywb_httpd.server_address)
         threading.Thread(target=self.pywb_httpd.serve_forever).start()

+        self.logger.info(
+                'starting brozzler-webconsole at %s:%s',
+                *self.webconsole_httpd.server_address)
+        threading.Thread(target=self.webconsole_httpd.serve_forever).start()
+
     def shutdown(self):
+        self.logger.info('shutting down brozzler-webconsole')
+        self.webconsole_httpd.shutdown()
+
         self.logger.info('shutting down brozzler-worker')
         self.brozzler_worker.shutdown_now()
         # brozzler-worker is fully shut down at this point
@@ -69,7 +69,7 @@ class RethinkDbFrontier:
             self.r.table("pages").index_create(
                     "least_hops", [
                         self.r.row["site_id"], self.r.row["brozzle_count"],
-                        self.r.row["hops_from_seed"]])
+                        self.r.row["hops_from_seed"]]).run()
         if not "jobs" in tables:
             self.logger.info(
                     "creating rethinkdb table 'jobs' in database %s",
@@ -27,7 +27,6 @@ except ImportError as e:
         'brozzler[webconsole]".\nSee README.rst for more information.',
         type(e).__name__, e)
     sys.exit(1)
-
 import rethinkstuff
 import json
 import os
@@ -56,11 +55,16 @@ SETTINGS = {
         'RETHINKDB_SERVERS', 'localhost').split(','),
     'RETHINKDB_DB': os.environ.get('RETHINKDB_DB', 'brozzler'),
     'WAYBACK_BASEURL': os.environ.get(
-        'WAYBACK_BASEURL', 'http://wbgrp-svc107.us.archive.org:8091'),
+        'WAYBACK_BASEURL', 'http://localhost:8091/brozzler'),
 }
 r = rethinkstuff.Rethinker(
         SETTINGS['RETHINKDB_SERVERS'], db=SETTINGS['RETHINKDB_DB'])
-service_registry = rethinkstuff.ServiceRegistry(r)
+_svc_reg = None
+def service_registry():
+    global _svc_reg
+    if not _svc_reg:
+        _svc_reg = rethinkstuff.ServiceRegistry(r)
+    return _svc_reg

 @app.route("/api/sites/<site_id>/queued_count")
 @app.route("/api/site/<site_id>/queued_count")
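Review note on the `service_registry()` change above: replacing the module-level object with a function defers construction until first use, so importing the module (for example to print `--help`) does not build the registry. A minimal sketch of the same lazy-singleton pattern follows; `FakeServiceRegistry` is a hypothetical stand-in for `rethinkstuff.ServiceRegistry`, so no database is needed, and the `rethinker` parameter is added only for illustration.

```python
class FakeServiceRegistry:
    """Hypothetical stand-in that counts how often it is constructed."""
    instances_created = 0

    def __init__(self, rethinker):
        FakeServiceRegistry.instances_created += 1
        self.rethinker = rethinker


_svc_reg = None


def service_registry(rethinker='stand-in rethinker'):
    """Create the registry on first call, then reuse it."""
    global _svc_reg
    if _svc_reg is None:
        _svc_reg = FakeServiceRegistry(rethinker)
    return _svc_reg
```

Every call site has to change from attribute access on an object to a function call, which is what the later `service_registry().available_services(...)` hunks do.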
@@ -149,6 +153,16 @@ def sites(job_id):
             s["cookie_db"] = base64.b64encode(s["cookie_db"]).decode("ascii")
     return flask.jsonify(sites=sites_)

+@app.route("/api/jobless-sites")
+def jobless_sites():
+    # XXX inefficient (unindexed) query
+    sites_ = list(r.table("sites").filter(~r.row.has_fields("job_id")).run())
+    # TypeError: <binary, 7168 bytes, '53 51 4c 69 74 65...'> is not JSON serializable
+    for s in sites_:
+        if "cookie_db" in s:
+            s["cookie_db"] = base64.b64encode(s["cookie_db"]).decode("ascii")
+    return flask.jsonify(sites=sites_)
+
 @app.route("/api/jobs/<int:job_id>")
 @app.route("/api/job/<int:job_id>")
 def job(job_id):
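Review note on `jobless_sites()` above: the `TypeError` comment explains why `cookie_db` is base64-encoded before `flask.jsonify`, since raw bytes are not JSON serializable. The sketch below reproduces the idea with the standard `json` module instead of Flask. `make_json_safe` and the sample record are hypothetical, and unlike the handler it copies the record rather than mutating it.

```python
import base64
import json


def make_json_safe(site):
    """Return a copy of the site record with binary cookie_db as base64 text."""
    safe = dict(site)
    if 'cookie_db' in safe:
        safe['cookie_db'] = base64.b64encode(safe['cookie_db']).decode('ascii')
    return safe


# hypothetical record; real cookie_db values are whole SQLite files
site = {
    'id': 1,
    'seed': 'http://example.com/',
    'cookie_db': b'SQLite format 3\x00',
}

encoded = json.dumps(make_json_safe(site))
decoded_cookie_db = base64.b64decode(json.loads(encoded)['cookie_db'])
```

A client that needs the original bytes can reverse the encoding with `base64.b64decode`, as the last line shows.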
@@ -165,12 +179,12 @@ def job_yaml(job_id):

 @app.route("/api/workers")
 def workers():
-    workers_ = service_registry.available_services("brozzler-worker")
+    workers_ = service_registry().available_services("brozzler-worker")
     return flask.jsonify(workers=list(workers_))

 @app.route("/api/services")
 def services():
-    services_ = service_registry.available_services()
+    services_ = service_registry().available_services()
     return flask.jsonify(services=list(services_))

 @app.route("/api/jobs")
@@ -221,7 +235,26 @@ except ImportError:
         logging.info('running brozzler-webconsole using simple flask app.run')
         app.run()

-if __name__ == "__main__":
-    # arguments?
+def main():
+    import argparse
+    arg_parser = argparse.ArgumentParser(
+            prog=os.path.basename(sys.argv[0]),
+            formatter_class=argparse.RawDescriptionHelpFormatter,
+            description=(
+                'brozzler-webconsole - web application for viewing brozzler '
+                'crawl status'),
+            epilog=(
+                'brozzler-webconsole has no command line options, but can be '
+                'configured using the following environment variables:\n\n'
+                ' RETHINKDB_SERVERS rethinkdb servers, e.g. db0.foo.org,'
+                'db0.foo.org:38015,db1.foo.org (default: localhost)\n'
+                ' RETHINKDB_DB rethinkdb database name (default: '
+                'brozzler)\n'
+                ' WAYBACK_BASEURL base url for constructing wayback '
+                'links (default http://localhost:8091/brozzler)'))
+    args = arg_parser.parse_args(args=sys.argv[1:])
     run()

+if __name__ == "__main__":
+    main()
+
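Review note on `main()` above: `argparse.RawDescriptionHelpFormatter` is what keeps the newlines in the `epilog`, so the list of environment variables is printed one per line instead of being re-wrapped into a paragraph. A small sketch of that effect follows; the program name, the wording and the column spacing are illustrative assumptions, not the project's exact help text.

```python
import argparse


def build_parser():
    """Hypothetical parser whose epilog documents environment variables."""
    return argparse.ArgumentParser(
        prog='example-webconsole',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description='example - configured through environment variables',
        epilog=(
            'configured using the following environment variables:\n\n'
            '  RETHINKDB_SERVERS   rethinkdb servers (default: localhost)\n'
            '  RETHINKDB_DB        rethinkdb database name (default: brozzler)\n'))


# format_help() returns the text that --help would print
help_text = build_parser().format_help()
print(help_text)
```

With the default `HelpFormatter` the same epilog would be collapsed into a single wrapped paragraph, which is harder to scan.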
@@ -79,6 +79,9 @@ brozzlerControllers.controller("HomeController", ["$scope", "$http",
         $http.get("/api/services").success(function(data) {
             $scope.services = data.services;
         });
+        $http.get("/api/jobless-sites").success(function(data) {
+            $scope.joblessSites = data.sites;
+        });
     }]);

 brozzlerControllers.controller("WorkersListController", ["$scope", "$http",
@@ -41,7 +41,6 @@
   </div>

   <h2>Jobs</h2>
-
   <div class="row">
     <div class="col-sm-12">
       <table class="table table-striped">
@@ -66,4 +65,29 @@
       </table>
     </div>
   </div>
+
+  <h2>Jobless Sites</h2>
+  <div class="row">
+    <div class="col-sm-12">
+      <table class="table table-striped">
+        <thead>
+          <tr>
+            <th>id</th>
+            <th>status</th>
+            <th>started</th>
+            <th>seed url</th>
+          </tr>
+        </thead>
+        <tbody>
+          <tr ng-repeat="site in joblessSites">
+            <td><a href="/sites/{{site.id}}">{{site.id}}</a></td>
+            <td>{{site.status}}</td>
+            <td>{{site.start_time}}</td>
+            <td>{{site.seed}}</td>
+          </tr>
+        </tbody>
+      </table>
+    </div>
+  </div>
+
 </div>
setup.py (6 lines changed)
@@ -32,7 +32,7 @@ def find_package_data(package):

 setuptools.setup(
         name='brozzler',
-        version='1.1b6.dev69',
+        version='1.1b6.dev78',
         description='Distributed web crawling with browsers',
         url='https://github.com/internetarchive/brozzler',
         author='Noah Levitt',
@@ -51,7 +51,7 @@ setuptools.setup(
                 'brozzler-new-site=brozzler.cli:brozzler_new_site',
                 'brozzler-worker=brozzler.cli:brozzler_worker',
                 'brozzler-ensure-tables=brozzler.cli:brozzler_ensure_tables',
-                'brozzler-webconsole=brozzler.webconsole:run',
+                'brozzler-webconsole=brozzler.webconsole:main',
                 'brozzler-easy=brozzler.easy:main',
             ],
         },
@@ -69,7 +69,7 @@ setuptools.setup(
         ],
         extras_require={
             'webconsole': ['flask>=0.11', 'gunicorn'],
-            'easy': ['warcprox>=2.0b1', 'pywb'],
+            'easy': ['warcprox>=2.0b1', 'pywb', 'flask>=0.11', 'gunicorn'],
         },
         zip_safe=False,
         classifiers=[
vagrant/Vagrantfile (11 lines changed, vendored)
@@ -1,16 +1,13 @@
 Vagrant.configure(2) do |config|
   config.vm.box = "ubuntu/trusty64"
-  config.vm.hostname = "brozzler-easy"
+  config.vm.define "10.9.9.9"
+  config.vm.hostname = "brzl"
+  config.vm.network :private_network, ip: "10.9.9.9"

   config.vm.synced_folder "..", "/brozzler"

   config.vm.provision "ansible" do |ansible|
+    ansible.inventory_path = "ansible/hosts"
     ansible.playbook = "ansible/playbook.yml"
-    ansible.groups = {
-      "rethinkdb" => ["default"],
-      "warcprox" => ["default"],
-      "brozzler-worker" => ["default"],
-      # "brozzler-webconsole" => ["default"],
-    }
   end
 end
vagrant/ansible/hosts (16 lines changed, new file)
@@ -0,0 +1,16 @@
+ansible_ssh_private_key_file=.vagrant/machines/10.9.9.9/virtualbox/private_key
+
+[rethinkdb]
+10.9.9.9
+
+[warcprox]
+10.9.9.9
+
+[brozzler-worker]
+10.9.9.9
+
+[brozzler-webconsole]
+10.9.9.9
+
+[pywb]
+10.9.9.9
@@ -2,27 +2,27 @@
 - name: apply common configuration to all nodes
   hosts: all
   roles:
-    - common
+  - common

 - name: deploy rethinkdb
   hosts: rethinkdb
   roles:
-    - rethinkdb
+  - rethinkdb

 - name: deploy warcprox
   hosts: warcprox
   roles:
-    - warcprox
+  - warcprox

 - name: deploy brozzler-worker
   hosts: brozzler-worker
   roles:
-    - brozzler-worker
+  - brozzler-worker

-# - name: deploy brozzler-webconsole
-#   hosts: brozzler-webconsole
-#   roles:
-#     - brozzler-webconsole
+- name: deploy brozzler-webconsole
+  hosts: brozzler-webconsole
+  roles:
+  - brozzler-webconsole

 # - name: deploy pywb
 #   hosts: pywb
@@ -0,0 +1,4 @@
+---
+- name: restart brozzler-webconsole
+  service: name=brozzler-webconsole state=restarted
+  become: true
@@ -1,19 +1,15 @@
 ---
-- name: git clone https://github.com/internetarchive/brozzler.git
-  git: repo=https://github.com/internetarchive/brozzler.git
-       dest=/home/vagrant/brozzler
-- name: pip install -r requirements.txt in virtualenv
-  pip: requirements=/home/vagrant/brozzler/webconsole/requirements.txt
+- name: install brozzler[webconsole] in virtualenv
+  become: true
+  pip: name='-e /brozzler[webconsole]'
        virtualenv=/home/vagrant/brozzler-webconsole-ve34
        virtualenv_python=python3.4
        extra_args='--no-input --upgrade --pre'
   notify:
-    - restart brozzler-webconsole
+  - restart brozzler-webconsole
 - name: install upstart config /etc/init/brozzler-webconsole.conf
   become: true
   template: src=templates/brozzler-webconsole.conf.j2
             dest=/etc/init/brozzler-webconsole.conf
   notify:
-    - restart brozzler-webconsole
-
-
+  - restart brozzler-webconsole
@@ -3,19 +3,16 @@ description "brozzler-webconsole"
 start on runlevel [2345]
 stop on runlevel [!2345]

-env PYTHONPATH=/home/vagrant/brozzler-webconsole-ve34/lib/python3.4/site-packages:/home/vagrant/brozzler/webconsole
+env PYTHONPATH=/home/vagrant/brozzler-webconsole-ve34/lib/python3.4/site-packages
 env PATH=/home/vagrant/brozzler-webconsole-ve34/bin:/usr/bin:/bin
 env LC_ALL=C.UTF-8

-env WAYBACK_BASEURL={{base_wayback_url}}/all
-# env RETHINKDB_SERVERS={{groups['rethinkdb'] | join(',')}}
-env RETHINKDB_SERVERS=localhost
-env RETHINKDB_DB={{rethinkdb_db}}
+env WAYBACK_BASEURL=http://{{groups['pywb'][0]}}:8880/brozzler
+env RETHINKDB_SERVERS={{groups['rethinkdb'] | join(',')}}
+env RETHINKDB_DB=brozzler

 setuid vagrant

 # console log

-exec gunicorn --bind=0.0.0.0:8081 brozzler-webconsole:app >&/vagrant/logs/brozzler-webconsole.log
-
-
+exec gunicorn --bind=0.0.0.0:8881 brozzler.webconsole:app >>/vagrant/logs/brozzler-webconsole.log 2>&1
@@ -19,7 +19,5 @@ stop on stopping Xvnc
 kill timeout 60

 exec nice brozzler-worker \
-    --rethinkdb-servers=localhost \
-    --max-browsers=4 >>/vagrant/logs/brozzler-worker.log 2>&1
-    # --rethinkdb-servers={{groups['rethinkdb'] | join(',')}} \
-
+    --rethinkdb-servers={{groups['rethinkdb'] | join(',')}} \
+    --max-browsers=4 >>/vagrant/logs/brozzler-worker.log 2>&1
@@ -10,5 +10,6 @@ console log
 env PYTHONPATH=/home/vagrant/websockify-ve34/lib/python3.4/site-packages
 env PATH=/home/vagrant/websockify-ve34/bin:/usr/bin:/bin

+# port 8901 is hard-coded in brozzler/webconsole/static/partials/workers.html
 exec nice websockify 0.0.0.0:8901 localhost:5901

@@ -1,5 +1,5 @@
 runuser=vagrant
-# bind=0.0.0.0
+bind=0.0.0.0
 # directory=/var/lib/rethinkdb
 # log-file=/var/log/rethinkdb.log
 log-file=/vagrant/logs/rethinkdb.log # synced dir