Merge branch 'master' into qa

* master:
  avoid "Uncaught TypeError: Cannot read property 'querySelectorAll' of undefined" from outlinks script
  little readme fix
  for vagrant, static ansible inventory file, add brozzler-webconsole
  add info to display of jobless sites in brozzler-webconsole; fix creation of "least_hops" index on the rethinkdb table "pages"
  add arguments --webconsole-address --webconsole-port --pywb-address and change default ports
  list jobless sites on brozzler-webconsole front page
  run brozzler-webconsole inside brozzler-easy
  add section about brozzler-easy to the readme
  add --help to brozzler-webconsole
Noah Levitt 2016-08-29 09:59:55 -07:00
commit caadb2beff
17 changed files with 244 additions and 113 deletions

@ -1,4 +1,4 @@
.. |logo| image:: https://cdn.rawgit.com/nlevitt/brozzler/d1158ab2242815b28fe7bb066042b5b5982e4627/webconsole/static/brozzler.svg .. |logo| image:: https://cdn.rawgit.com/internetarchive/brozzler/1.1b5/brozzler/webconsole/static/brozzler.svg
:width: 7% :width: 7%
brozzler |logo| brozzler |logo|
@ -6,37 +6,76 @@ brozzler |logo|
"browser" \| "crawler" = "brozzler" "browser" \| "crawler" = "brozzler"
Brozzler is a distributed web crawler (爬虫) that uses a real browser Brozzler is a distributed web crawler (爬虫) that uses a real browser (chrome
(chrome or chromium) to fetch pages and embedded urls and to extract or chromium) to fetch pages and embedded urls and to extract links. It also
links. It also uses `youtube-dl <https://github.com/rg3/youtube-dl>`__ uses `youtube-dl <https://github.com/rg3/youtube-dl>`_ to enhance media
to enhance media capture capabilities. capture capabilities.
It is forked from https://github.com/internetarchive/umbra.
Brozzler is designed to work in conjunction with Brozzler is designed to work in conjunction with
`warcprox <https://github.com/internetarchive/warcprox>`__ for web `warcprox <https://github.com/internetarchive/warcprox>`_ for web
archiving. archiving.
Installation Requirements
------------ ------------
Brozzler requires python 3.4 or later. - Python 3.4 or later
- RethinkDB deployment
- Chromium or Google Chrome browser
Worth noting is that the browser requires a graphical environment to run. You
already have this on your laptop, but on a server it will probably require
deploying some additional infrastructure (typically X11). The vagrant
configuration in the brozzler repository (still a work in progress) has an
example setup.
Getting Started
---------------
The easiest way to get started with brozzler for web archiving is with
``brozzler-easy``. Brozzler-easy runs brozzler-worker, warcprox,
`pywb <https://github.com/ikreymer/pywb>`_, and brozzler-webconsole, configured
to work with each other, in a single process.
Mac instructions:
:: ::
# set up virtualenv if desired # install and start rethinkdb
pip install brozzler brew install rethinkdb
rethinkdb &>>rethinkdb.log &
Brozzler also requires a rethinkdb deployment. # install brozzler with special dependencies pywb and warcprox
pip install brozzler[easy] # in a virtualenv if desired
Usage # queue a site to crawl
----- brozzler-new-site http://example.com/
# or a job
brozzler-new-job job1.yml
# start brozzler-easy
brozzler-easy
At this point brozzler-easy will start brozzling your site. Results will be
immediately available for playback in pywb at http://localhost:8880/brozzler/.
*Brozzler-easy demonstrates the full brozzler archival crawling workflow, but
does not take advantage of brozzler's distributed nature.*
Installation and Usage
----------------------
To install brozzler only:
::
pip install brozzler # in a virtualenv if desired
Launch one or more workers: Launch one or more workers:
:: ::
brozzler-worker -e chromium brozzler-worker
Submit jobs: Submit jobs:
@ -44,6 +83,13 @@ Submit jobs:
brozzler-new-job myjob.yaml brozzler-new-job myjob.yaml
Submit sites not tied to a job:
::
brozzler-new-site --proxy=localhost:8000 --enable-warcprox-features \
--time-limit=600 http://example.com/
Job Configuration Job Configuration
----------------- -----------------
@ -70,14 +116,6 @@ must be specified, everything else is optional.
scope: scope:
surt: http://(org,example, surt: http://(org,example,
Submit a Site to Crawl Without Configuring a Job
------------------------------------------------
::
brozzler-new-site --proxy=localhost:8000 --enable-warcprox-features \
--time-limit=600 http://example.com/
Brozzler Web Console Brozzler Web Console
-------------------- --------------------
@ -95,19 +133,7 @@ To start the app, run
brozzler-webconsole brozzler-webconsole
See ``brozzler-webconsole --help`` for configuration options.
XXX configuration stuff
Fonts (for decent screenshots)
------------------------------
On ubuntu 14.04 trusty I installed these packages:
xfonts-base ttf-mscorefonts-installer fonts-arphic-bkai00mp
fonts-arphic-bsmi00lp fonts-arphic-gbsn00lp fonts-arphic-gkai00mp
fonts-arphic-ukai fonts-farsiweb fonts-nafees fonts-sil-abyssinica
fonts-sil-ezra fonts-sil-padauk fonts-unfonts-extra fonts-unfonts-core
ttf-indic-fonts fonts-thai-tlwg fonts-lklug-sinhala
License License
------- -------

@ -304,11 +304,13 @@ class Browser:
var __brzl_framesDone = new Set(); var __brzl_framesDone = new Set();
var __brzl_compileOutlinks = function(frame) { var __brzl_compileOutlinks = function(frame) {
__brzl_framesDone.add(frame); __brzl_framesDone.add(frame);
var outlinks = Array.prototype.slice.call( if (frame && frame.document) {
frame.document.querySelectorAll('a[href]')); var outlinks = Array.prototype.slice.call(
for (var i = 0; i < frame.frames.length; i++) { frame.document.querySelectorAll('a[href]'));
if (frame.frames[i] && !__brzl_framesDone.has(frame.frames[i])) { for (var i = 0; i < frame.frames.length; i++) {
outlinks = outlinks.concat(__brzl_compileOutlinks(frame.frames[i])); if (frame.frames[i] && !__brzl_framesDone.has(frame.frames[i])) {
outlinks = outlinks.concat(__brzl_compileOutlinks(frame.frames[i]));
}
} }
} }
return outlinks; return outlinks;
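A rough Python rendering of the guarded traversal in this hunk, for readers who want to follow the control flow outside the injected script. `Frame` is a hypothetical stand-in for a browser frame object, with `links=None` modeling a frame that has no `document`; none of these names exist in brozzler. One difference is deliberate: in the JavaScript, `outlinks` is declared inside the `if`, so it is `undefined` when the guard fails, whereas this sketch initializes the list up front and always returns a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    """Hypothetical stand-in for a browser window/frame object."""
    links: Optional[List[str]] = None  # None models a frame without a document
    frames: List[Optional["Frame"]] = field(default_factory=list)


def compile_outlinks(frame, frames_done=None):
    """Collect links from a frame and its subframes, visiting each frame once."""
    if frames_done is None:
        frames_done = set()
    outlinks = []  # always a list, even when the guard below fails
    if frame is None or id(frame) in frames_done:
        return outlinks
    frames_done.add(id(frame))
    # The guard added by the commit: only touch frames that have a document.
    if frame.links is not None:
        outlinks.extend(frame.links)
        for child in frame.frames:
            outlinks.extend(compile_outlinks(child, frames_done))
    return outlinks
```

Tracking visited frames by identity mirrors the `__brzl_framesDone` set and keeps the recursion from looping when a frame refers back to an ancestor.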

@ -1,7 +1,7 @@
#!/usr/bin/env python #!/usr/bin/env python
''' '''
brozzler-easy - brozzler-worker, warcprox, and pywb all working together in a brozzler-easy - brozzler-worker, warcprox, pywb, and brozzler-webconsole all
single process working together in a single process
Copyright (C) 2016 Internet Archive Copyright (C) 2016 Internet Archive
@ -27,7 +27,7 @@ try:
import brozzler.pywb import brozzler.pywb
import wsgiref.simple_server import wsgiref.simple_server
import wsgiref.handlers import wsgiref.handlers
import six.moves.socketserver import brozzler.webconsole
except ImportError as e: except ImportError as e:
logging.critical( logging.critical(
'%s: %s\n\nYou might need to run "pip install ' '%s: %s\n\nYou might need to run "pip install '
@ -44,16 +44,17 @@ import threading
import time import time
import rethinkstuff import rethinkstuff
import traceback import traceback
import socketserver
def _build_arg_parser(prog=os.path.basename(sys.argv[0])): def _build_arg_parser(prog=os.path.basename(sys.argv[0])):
arg_parser = argparse.ArgumentParser( arg_parser = argparse.ArgumentParser(
prog=prog, formatter_class=argparse.ArgumentDefaultsHelpFormatter, prog=prog, formatter_class=argparse.ArgumentDefaultsHelpFormatter,
description=( description=(
'brozzler-easy - easy deployment of brozzler, with ' 'brozzler-easy - easy deployment of brozzler, with '
'brozzler-worker, warcprox, and pywb all running in a single ' 'brozzler-worker, warcprox, pywb, and brozzler-webconsole all '
'process')) 'running in a single process'))
# === common args === # common args
arg_parser.add_argument( arg_parser.add_argument(
'--rethinkdb-servers', dest='rethinkdb_servers', '--rethinkdb-servers', dest='rethinkdb_servers',
default='localhost', help=( default='localhost', help=(
@ -66,7 +67,7 @@ def _build_arg_parser(prog=os.path.basename(sys.argv[0])):
'-d', '--warcs-dir', dest='warcs_dir', default='./warcs', '-d', '--warcs-dir', dest='warcs_dir', default='./warcs',
help='where to write warcs') help='where to write warcs')
# === warcprox args === # warcprox args
arg_parser.add_argument( arg_parser.add_argument(
'-c', '--cacert', dest='cacert', '-c', '--cacert', dest='cacert',
default='./%s-warcprox-ca.pem' % socket.gethostname(), default='./%s-warcprox-ca.pem' % socket.gethostname(),
@ -83,24 +84,42 @@ def _build_arg_parser(prog=os.path.basename(sys.argv[0])):
'host:port of tor socks proxy, used only to connect to ' 'host:port of tor socks proxy, used only to connect to '
'.onion sites')) '.onion sites'))
# === brozzler-worker args === # brozzler-worker args
arg_parser.add_argument( arg_parser.add_argument(
'-e', '--chrome-exe', dest='chrome_exe', '-e', '--chrome-exe', dest='chrome_exe',
default=brozzler.cli.suggest_default_chome_exe(), default=brozzler.cli.suggest_default_chome_exe(),
help='executable to use to invoke chrome') help='executable to use to invoke chrome')
arg_parser.add_argument( arg_parser.add_argument(
'-n', '--max-browsers', dest='max_browsers', default='1', '-n', '--max-browsers', dest='max_browsers',
help='max number of chrome instances simultaneously browsing pages') type=int, default=1, help=(
'max number of chrome instances simultaneously '
'browsing pages'))
# === pywb args === # pywb args
arg_parser.add_argument( arg_parser.add_argument(
'--pywb-port', dest='pywb_port', type=int, default=8091, '--pywb-address', dest='pywb_address',
help='pywb wayback port') default='0.0.0.0',
help='pywb wayback address to listen on')
arg_parser.add_argument(
'--pywb-port', dest='pywb_port', type=int,
default=8880, help='pywb wayback port')
# === common at the bottom args === # webconsole args
arg_parser.add_argument( arg_parser.add_argument(
'-v', '--verbose', dest='verbose', action='store_true') '--webconsole-address', dest='webconsole_address',
arg_parser.add_argument('-q', '--quiet', dest='quiet', action='store_true') default='localhost',
help='brozzler web console address to listen on')
arg_parser.add_argument(
'--webconsole-port', dest='webconsole_port',
type=int, default=8881, help='brozzler web console port')
# common at the bottom args
arg_parser.add_argument(
'-v', '--verbose', dest='verbose', action='store_true',
help='verbose logging')
arg_parser.add_argument(
'-q', '--quiet', dest='quiet', action='store_true',
help='quiet logging (warnings and errors only)')
# arg_parser.add_argument( # arg_parser.add_argument(
# '-s', '--silent', dest='log_level', action='store_const', # '-s', '--silent', dest='log_level', action='store_const',
# default=logging.INFO, const=logging.CRITICAL) # default=logging.INFO, const=logging.CRITICAL)
@ -110,6 +129,10 @@ def _build_arg_parser(prog=os.path.basename(sys.argv[0])):
return arg_parser return arg_parser
class ThreadingWSGIServer(
socketserver.ThreadingMixIn, wsgiref.simple_server.WSGIServer):
pass
class BrozzlerEasyController: class BrozzlerEasyController:
logger = logging.getLogger(__module__ + "." + __qualname__) logger = logging.getLogger(__module__ + "." + __qualname__)
@ -120,6 +143,12 @@ class BrozzlerEasyController:
self._warcprox_args(args)) self._warcprox_args(args))
self.brozzler_worker = self._init_brozzler_worker(args) self.brozzler_worker = self._init_brozzler_worker(args)
self.pywb_httpd = self._init_pywb(args) self.pywb_httpd = self._init_pywb(args)
self.webconsole_httpd = self._init_brozzler_webconsole(args)
def _init_brozzler_webconsole(self, args):
return wsgiref.simple_server.make_server(
args.webconsole_address, args.webconsole_port,
brozzler.webconsole.app, ThreadingWSGIServer)
def _init_brozzler_worker(self, args): def _init_brozzler_worker(self, args):
r = rethinkstuff.Rethinker( r = rethinkstuff.Rethinker(
@ -128,7 +157,7 @@ class BrozzlerEasyController:
service_registry = rethinkstuff.ServiceRegistry(r) service_registry = rethinkstuff.ServiceRegistry(r)
worker = brozzler.worker.BrozzlerWorker( worker = brozzler.worker.BrozzlerWorker(
frontier, service_registry, frontier, service_registry,
max_browsers=int(args.max_browsers), max_browsers=args.max_browsers,
chrome_exe=args.chrome_exe, chrome_exe=args.chrome_exe,
proxy='%s:%s' % self.warcprox_controller.proxy.server_address, proxy='%s:%s' % self.warcprox_controller.proxy.server_address,
enable_warcprox_features=True) enable_warcprox_features=True)
@ -166,12 +195,9 @@ class BrozzlerEasyController:
# disable is_hop_by_hop restrictions # disable is_hop_by_hop restrictions
wsgiref.handlers.is_hop_by_hop = lambda x: False wsgiref.handlers.is_hop_by_hop = lambda x: False
class ThreadingWSGIServer(
six.moves.socketserver.ThreadingMixIn,
wsgiref.simple_server.WSGIServer):
pass
return wsgiref.simple_server.make_server( return wsgiref.simple_server.make_server(
'', args.pywb_port, wsgi_app, ThreadingWSGIServer) args.pywb_address, args.pywb_port, wsgi_app,
ThreadingWSGIServer)
def start(self): def start(self):
self.logger.info('starting warcprox') self.logger.info('starting warcprox')
@ -185,7 +211,15 @@ class BrozzlerEasyController:
'starting pywb at %s:%s', *self.pywb_httpd.server_address) 'starting pywb at %s:%s', *self.pywb_httpd.server_address)
threading.Thread(target=self.pywb_httpd.serve_forever).start() threading.Thread(target=self.pywb_httpd.serve_forever).start()
self.logger.info(
'starting brozzler-webconsole at %s:%s',
*self.webconsole_httpd.server_address)
threading.Thread(target=self.webconsole_httpd.serve_forever).start()
def shutdown(self): def shutdown(self):
self.logger.info('shutting down brozzler-webconsole')
self.webconsole_httpd.shutdown()
self.logger.info('shutting down brozzler-worker') self.logger.info('shutting down brozzler-worker')
self.brozzler_worker.shutdown_now() self.brozzler_worker.shutdown_now()
# brozzler-worker is fully shut down at this point # brozzler-worker is fully shut down at this point
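The pattern this file now uses twice (once for pywb, once for the web console) can be sketched with the standard library alone: a `ThreadingMixIn` WSGI server created with `make_server`, served from a background thread, and stopped with `shutdown()`. This is a minimal sketch, not brozzler's code. `demo_app`, `QuietHandler`, `start_server` and `fetch` are hypothetical names, and `demo_app` merely stands in for `wsgi_app` or `brozzler.webconsole.app`.

```python
import http.client
import socketserver
import threading
import wsgiref.simple_server


class ThreadingWSGIServer(
        socketserver.ThreadingMixIn, wsgiref.simple_server.WSGIServer):
    # Each request is handled in its own thread.
    daemon_threads = True


class QuietHandler(wsgiref.simple_server.WSGIRequestHandler):
    # Hypothetical helper: suppress per-request logging on stderr.
    def log_message(self, format, *args):
        pass


def demo_app(environ, start_response):
    # Hypothetical WSGI app standing in for pywb or the web console.
    body = b'hello from ' + environ['PATH_INFO'].encode('latin-1')
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body)))])
    return [body]


def start_server(address, port, app):
    # Port 0 asks the OS for a free port; see httpd.server_address.
    httpd = wsgiref.simple_server.make_server(
            address, port, app,
            server_class=ThreadingWSGIServer, handler_class=QuietHandler)
    thread = threading.Thread(target=httpd.serve_forever)
    thread.start()
    return httpd, thread


def fetch(httpd, path):
    # Issue one GET request against the running server.
    host, port = httpd.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request('GET', path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

Calling `httpd.shutdown()` from another thread makes `serve_forever()` return, which is how the `shutdown()` method in the diff stops the web console.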

@ -69,7 +69,7 @@ class RethinkDbFrontier:
self.r.table("pages").index_create( self.r.table("pages").index_create(
"least_hops", [ "least_hops", [
self.r.row["site_id"], self.r.row["brozzle_count"], self.r.row["site_id"], self.r.row["brozzle_count"],
self.r.row["hops_from_seed"]]) self.r.row["hops_from_seed"]]).run()
if not "jobs" in tables: if not "jobs" in tables:
self.logger.info( self.logger.info(
"creating rethinkdb table 'jobs' in database %s", "creating rethinkdb table 'jobs' in database %s",

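The one-word fix above matters because RethinkDB queries are built lazily: chaining `index_create(...)` only constructs a query object, and nothing is sent to the server until `run()` is called. The toy class below illustrates that behavior under stated assumptions. `LazyQuery` is entirely hypothetical and does not use the rethinkdb or rethinkstuff APIs.

```python
class LazyQuery:
    """Toy stand-in for a lazily evaluated query builder (hypothetical)."""

    def __init__(self, executed_log, description):
        self._log = executed_log
        self._description = description

    def index_create(self, name, fields):
        # Building the query has no side effects; it returns a new query.
        return LazyQuery(
                self._log,
                '%s.index_create(%r, %r)' % (self._description, name, fields))

    def run(self):
        # Only here is the query "sent" (recorded in the log).
        self._log.append(self._description)
        return {'created': 1}


executed = []
pages = LazyQuery(executed, 'table("pages")')

# Without run(): the index is silently never created.
pages.index_create('least_hops', ['site_id', 'brozzle_count'])
executed_without_run = list(executed)

# With run(): the query is actually executed.
result = pages.index_create('least_hops', ['site_id', 'brozzle_count']).run()
```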
@ -27,7 +27,6 @@ except ImportError as e:
'brozzler[webconsole]".\nSee README.rst for more information.', 'brozzler[webconsole]".\nSee README.rst for more information.',
type(e).__name__, e) type(e).__name__, e)
sys.exit(1) sys.exit(1)
import rethinkstuff import rethinkstuff
import json import json
import os import os
@ -56,11 +55,16 @@ SETTINGS = {
'RETHINKDB_SERVERS', 'localhost').split(','), 'RETHINKDB_SERVERS', 'localhost').split(','),
'RETHINKDB_DB': os.environ.get('RETHINKDB_DB', 'brozzler'), 'RETHINKDB_DB': os.environ.get('RETHINKDB_DB', 'brozzler'),
'WAYBACK_BASEURL': os.environ.get( 'WAYBACK_BASEURL': os.environ.get(
'WAYBACK_BASEURL', 'http://wbgrp-svc107.us.archive.org:8091'), 'WAYBACK_BASEURL', 'http://localhost:8091/brozzler'),
} }
r = rethinkstuff.Rethinker( r = rethinkstuff.Rethinker(
SETTINGS['RETHINKDB_SERVERS'], db=SETTINGS['RETHINKDB_DB']) SETTINGS['RETHINKDB_SERVERS'], db=SETTINGS['RETHINKDB_DB'])
service_registry = rethinkstuff.ServiceRegistry(r) _svc_reg = None
def service_registry():
global _svc_reg
if not _svc_reg:
_svc_reg = rethinkstuff.ServiceRegistry(r)
return _svc_reg
@app.route("/api/sites/<site_id>/queued_count") @app.route("/api/sites/<site_id>/queued_count")
@app.route("/api/site/<site_id>/queued_count") @app.route("/api/site/<site_id>/queued_count")
@ -149,6 +153,16 @@ def sites(job_id):
s["cookie_db"] = base64.b64encode(s["cookie_db"]).decode("ascii") s["cookie_db"] = base64.b64encode(s["cookie_db"]).decode("ascii")
return flask.jsonify(sites=sites_) return flask.jsonify(sites=sites_)
@app.route("/api/jobless-sites")
def jobless_sites():
# XXX inefficient (unindexed) query
sites_ = list(r.table("sites").filter(~r.row.has_fields("job_id")).run())
# TypeError: <binary, 7168 bytes, '53 51 4c 69 74 65...'> is not JSON serializable
for s in sites_:
if "cookie_db" in s:
s["cookie_db"] = base64.b64encode(s["cookie_db"]).decode("ascii")
return flask.jsonify(sites=sites_)
@app.route("/api/jobs/<int:job_id>") @app.route("/api/jobs/<int:job_id>")
@app.route("/api/job/<int:job_id>") @app.route("/api/job/<int:job_id>")
def job(job_id): def job(job_id):
@ -165,12 +179,12 @@ def job_yaml(job_id):
@app.route("/api/workers") @app.route("/api/workers")
def workers(): def workers():
workers_ = service_registry.available_services("brozzler-worker") workers_ = service_registry().available_services("brozzler-worker")
return flask.jsonify(workers=list(workers_)) return flask.jsonify(workers=list(workers_))
@app.route("/api/services") @app.route("/api/services")
def services(): def services():
services_ = service_registry.available_services() services_ = service_registry().available_services()
return flask.jsonify(services=list(services_)) return flask.jsonify(services=list(services_))
@app.route("/api/jobs") @app.route("/api/jobs")
@ -221,7 +235,26 @@ except ImportError:
logging.info('running brozzler-webconsole using simple flask app.run') logging.info('running brozzler-webconsole using simple flask app.run')
app.run() app.run()
if __name__ == "__main__": def main():
# arguments? import argparse
arg_parser = argparse.ArgumentParser(
prog=os.path.basename(sys.argv[0]),
formatter_class=argparse.RawDescriptionHelpFormatter,
description=(
'brozzler-webconsole - web application for viewing brozzler '
'crawl status'),
epilog=(
'brozzler-webconsole has no command line options, but can be '
'configured using the following environment variables:\n\n'
' RETHINKDB_SERVERS rethinkdb servers, e.g. db0.foo.org,'
'db0.foo.org:38015,db1.foo.org (default: localhost)\n'
' RETHINKDB_DB rethinkdb database name (default: '
'brozzler)\n'
' WAYBACK_BASEURL base url for constructing wayback '
'links (default http://localhost:8091/brozzler)'))
args = arg_parser.parse_args(args=sys.argv[1:])
run() run()
if __name__ == "__main__":
main()
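The switch from a module-level `service_registry` object to a `service_registry()` function is a lazy-initialization pattern: the registry is built on first use rather than at import time, presumably so that importing `brozzler.webconsole` (as brozzler-easy now does) has no side effects against the database. A minimal sketch, assuming a hypothetical `FakeServiceRegistry` in place of `rethinkstuff.ServiceRegistry`:

```python
class FakeServiceRegistry:
    """Hypothetical stand-in for rethinkstuff.ServiceRegistry."""
    instances = 0  # counts constructions, for demonstration only

    def __init__(self):
        FakeServiceRegistry.instances += 1

    def available_services(self, role=None):
        return []


_svc_reg = None


def service_registry():
    # Construct the registry on first call, then reuse it.
    global _svc_reg
    if _svc_reg is None:
        _svc_reg = FakeServiceRegistry()
    return _svc_reg
```

Callers then write `service_registry().available_services(...)`, as the `/api/workers` and `/api/services` handlers in the diff do.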

@ -79,6 +79,9 @@ brozzlerControllers.controller("HomeController", ["$scope", "$http",
$http.get("/api/services").success(function(data) { $http.get("/api/services").success(function(data) {
$scope.services = data.services; $scope.services = data.services;
}); });
$http.get("/api/jobless-sites").success(function(data) {
$scope.joblessSites = data.sites;
});
}]); }]);
brozzlerControllers.controller("WorkersListController", ["$scope", "$http", brozzlerControllers.controller("WorkersListController", ["$scope", "$http",

@ -41,7 +41,6 @@
</div> </div>
<h2>Jobs</h2> <h2>Jobs</h2>
<div class="row"> <div class="row">
<div class="col-sm-12"> <div class="col-sm-12">
<table class="table table-striped"> <table class="table table-striped">
@ -66,4 +65,29 @@
</table> </table>
</div> </div>
</div> </div>
<h2>Jobless Sites</h2>
<div class="row">
<div class="col-sm-12">
<table class="table table-striped">
<thead>
<tr>
<th>id</th>
<th>status</th>
<th>started</th>
<th>seed url</th>
</tr>
</thead>
<tbody>
<tr ng-repeat="site in joblessSites">
<td><a href="/sites/{{site.id}}">{{site.id}}</a></td>
<td>{{site.status}}</td>
<td>{{site.start_time}}</td>
<td>{{site.seed}}</td>
</tr>
</tbody>
</table>
</div>
</div>
</div> </div>

@ -32,7 +32,7 @@ def find_package_data(package):
setuptools.setup( setuptools.setup(
name='brozzler', name='brozzler',
version='1.1b6.dev69', version='1.1b6.dev78',
description='Distributed web crawling with browsers', description='Distributed web crawling with browsers',
url='https://github.com/internetarchive/brozzler', url='https://github.com/internetarchive/brozzler',
author='Noah Levitt', author='Noah Levitt',
@ -51,7 +51,7 @@ setuptools.setup(
'brozzler-new-site=brozzler.cli:brozzler_new_site', 'brozzler-new-site=brozzler.cli:brozzler_new_site',
'brozzler-worker=brozzler.cli:brozzler_worker', 'brozzler-worker=brozzler.cli:brozzler_worker',
'brozzler-ensure-tables=brozzler.cli:brozzler_ensure_tables', 'brozzler-ensure-tables=brozzler.cli:brozzler_ensure_tables',
'brozzler-webconsole=brozzler.webconsole:run', 'brozzler-webconsole=brozzler.webconsole:main',
'brozzler-easy=brozzler.easy:main', 'brozzler-easy=brozzler.easy:main',
], ],
}, },
@ -69,7 +69,7 @@ setuptools.setup(
], ],
extras_require={ extras_require={
'webconsole': ['flask>=0.11', 'gunicorn'], 'webconsole': ['flask>=0.11', 'gunicorn'],
'easy': ['warcprox>=2.0b1', 'pywb'], 'easy': ['warcprox>=2.0b1', 'pywb', 'flask>=0.11', 'gunicorn'],
}, },
zip_safe=False, zip_safe=False,
classifiers=[ classifiers=[

vagrant/Vagrantfile (11 lines changed)

@ -1,16 +1,13 @@
Vagrant.configure(2) do |config| Vagrant.configure(2) do |config|
config.vm.box = "ubuntu/trusty64" config.vm.box = "ubuntu/trusty64"
config.vm.hostname = "brozzler-easy" config.vm.define "10.9.9.9"
config.vm.hostname = "brzl"
config.vm.network :private_network, ip: "10.9.9.9"
config.vm.synced_folder "..", "/brozzler" config.vm.synced_folder "..", "/brozzler"
config.vm.provision "ansible" do |ansible| config.vm.provision "ansible" do |ansible|
ansible.inventory_path = "ansible/hosts"
ansible.playbook = "ansible/playbook.yml" ansible.playbook = "ansible/playbook.yml"
ansible.groups = {
"rethinkdb" => ["default"],
"warcprox" => ["default"],
"brozzler-worker" => ["default"],
# "brozzler-webconsole" => ["default"],
}
end end
end end

vagrant/ansible/hosts (new file, 16 lines added)

@ -0,0 +1,16 @@
ansible_ssh_private_key_file=.vagrant/machines/10.9.9.9/virtualbox/private_key
[rethinkdb]
10.9.9.9
[warcprox]
10.9.9.9
[brozzler-worker]
10.9.9.9
[brozzler-webconsole]
10.9.9.9
[pywb]
10.9.9.9

@ -2,27 +2,27 @@
- name: apply common configuration to all nodes - name: apply common configuration to all nodes
hosts: all hosts: all
roles: roles:
- common - common
- name: deploy rethinkdb - name: deploy rethinkdb
hosts: rethinkdb hosts: rethinkdb
roles: roles:
- rethinkdb - rethinkdb
- name: deploy warcprox - name: deploy warcprox
hosts: warcprox hosts: warcprox
roles: roles:
- warcprox - warcprox
- name: deploy brozzler-worker - name: deploy brozzler-worker
hosts: brozzler-worker hosts: brozzler-worker
roles: roles:
- brozzler-worker - brozzler-worker
# - name: deploy brozzler-webconsole - name: deploy brozzler-webconsole
# hosts: brozzler-webconsole hosts: brozzler-webconsole
# roles: roles:
# - brozzler-webconsole - brozzler-webconsole
# - name: deploy pywb # - name: deploy pywb
# hosts: pywb # hosts: pywb

@ -0,0 +1,4 @@
---
- name: restart brozzler-webconsole
service: name=brozzler-webconsole state=restarted
become: true

@ -1,19 +1,15 @@
--- ---
- name: git clone https://github.com/internetarchive/brozzler.git - name: install brozzler[webconsole] in virtualenv
git: repo=https://github.com/internetarchive/brozzler.git become: true
dest=/home/vagrant/brozzler pip: name='-e /brozzler[webconsole]'
- name: pip install -r requirements.txt in virtualenv
pip: requirements=/home/vagrant/brozzler/webconsole/requirements.txt
virtualenv=/home/vagrant/brozzler-webconsole-ve34 virtualenv=/home/vagrant/brozzler-webconsole-ve34
virtualenv_python=python3.4 virtualenv_python=python3.4
extra_args='--no-input --upgrade --pre' extra_args='--no-input --upgrade --pre'
notify: notify:
- restart brozzler-webconsole - restart brozzler-webconsole
- name: install upstart config /etc/init/brozzler-webconsole.conf - name: install upstart config /etc/init/brozzler-webconsole.conf
become: true become: true
template: src=templates/brozzler-webconsole.conf.j2 template: src=templates/brozzler-webconsole.conf.j2
dest=/etc/init/brozzler-webconsole.conf dest=/etc/init/brozzler-webconsole.conf
notify: notify:
- restart brozzler-webconsole - restart brozzler-webconsole

@ -3,19 +3,16 @@ description "brozzler-webconsole"
start on runlevel [2345] start on runlevel [2345]
stop on runlevel [!2345] stop on runlevel [!2345]
env PYTHONPATH=/home/vagrant/brozzler-webconsole-ve34/lib/python3.4/site-packages:/home/vagrant/brozzler/webconsole env PYTHONPATH=/home/vagrant/brozzler-webconsole-ve34/lib/python3.4/site-packages
env PATH=/home/vagrant/brozzler-webconsole-ve34/bin:/usr/bin:/bin env PATH=/home/vagrant/brozzler-webconsole-ve34/bin:/usr/bin:/bin
env LC_ALL=C.UTF-8 env LC_ALL=C.UTF-8
env WAYBACK_BASEURL={{base_wayback_url}}/all env WAYBACK_BASEURL=http://{{groups['pywb'][0]}}:8880/brozzler
# env RETHINKDB_SERVERS={{groups['rethinkdb'] | join(',')}} env RETHINKDB_SERVERS={{groups['rethinkdb'] | join(',')}}
env RETHINKDB_SERVERS=localhost env RETHINKDB_DB=brozzler
env RETHINKDB_DB={{rethinkdb_db}}
setuid vagrant setuid vagrant
# console log # console log
exec gunicorn --bind=0.0.0.0:8081 brozzler-webconsole:app >&/vagrant/logs/brozzler-webconsole.log exec gunicorn --bind=0.0.0.0:8881 brozzler.webconsole:app >>/vagrant/logs/brozzler-webconsole.log 2>&1

@ -19,7 +19,5 @@ stop on stopping Xvnc
kill timeout 60 kill timeout 60
exec nice brozzler-worker \ exec nice brozzler-worker \
--rethinkdb-servers=localhost \ --rethinkdb-servers={{groups['rethinkdb'] | join(',')}} \
--max-browsers=4 >>/vagrant/logs/brozzler-worker.log 2>&1 --max-browsers=4 >>/vagrant/logs/brozzler-worker.log 2>&1
# --rethinkdb-servers={{groups['rethinkdb'] | join(',')}} \

@ -10,5 +10,6 @@ console log
env PYTHONPATH=/home/vagrant/websockify-ve34/lib/python3.4/site-packages env PYTHONPATH=/home/vagrant/websockify-ve34/lib/python3.4/site-packages
env PATH=/home/vagrant/websockify-ve34/bin:/usr/bin:/bin env PATH=/home/vagrant/websockify-ve34/bin:/usr/bin:/bin
# port 8901 is hard-coded in brozzler/webconsole/static/partials/workers.html
exec nice websockify 0.0.0.0:8901 localhost:5901 exec nice websockify 0.0.0.0:8901 localhost:5901

@ -1,5 +1,5 @@
runuser=vagrant runuser=vagrant
# bind=0.0.0.0 bind=0.0.0.0
# directory=/var/lib/rethinkdb # directory=/var/lib/rethinkdb
# log-file=/var/log/rethinkdb.log # log-file=/var/log/rethinkdb.log
log-file=/vagrant/logs/rethinkdb.log # synced dir log-file=/vagrant/logs/rethinkdb.log # synced dir