Noah Levitt
|
df61e55b6b
|
add license headers
|
2016-04-25 20:02:11 +00:00 |
|
Noah Levitt
|
c2e80ed6ff
|
make whole process die if main worker thread dies
|
2016-03-16 23:35:33 +00:00 |
|
Noah Levitt
|
343b5c0f82
|
register with service registry; only start chrome right before using it, so that web console vnc windows aren't always full of about:blank
|
2015-11-12 02:56:27 +00:00 |
|
Noah Levitt
|
dff4149185
|
missed one more use of brozzler.version
|
2015-09-24 00:44:35 +00:00 |
|
Noah Levitt
|
2bc66f52d4
|
new rethinkstuff.Rethinker api
|
2015-09-23 00:50:15 +00:00 |
|
Noah Levitt
|
92a288bc35
|
detect jobs finishing! (not well tested yet)
|
2015-09-09 22:11:48 +00:00 |
|
Noah Levitt
|
f334107b47
|
support for specifying rethinkdb database name; wrap rethinkdb operations and retry if appropriate (as best as we can tell)
|
2015-08-28 00:37:26 +00:00 |
|
Noah Levitt
|
b8506a2ab4
|
rename "db" to "frontier"
|
2015-08-19 17:47:05 +00:00 |
|
Noah Levitt
|
a878730e02
|
goodbye sqlite and rabbitmq, hello rethinkdb
|
2015-08-18 21:44:54 +00:00 |
|
Noah Levitt
|
85a863b1e3
|
change argument to --amqp-url for clarity and consistency
|
2015-07-23 00:39:57 +00:00 |
|
Noah Levitt
|
2ba5bd4d4b
|
support adding extra http request headers
|
2015-07-17 13:45:27 -07:00 |
|
Noah Levitt
|
140a441eb5
|
honor site proxy setting; remove brozzler-worker options that are now configured at the site level (and in the case of ignore_cert_errors, always on, no longer an option); use "reppy" library for robots.txt handling; fix some bugs
|
2015-07-16 17:19:12 -07:00 |
|
Noah Levitt
|
5aea76ab6d
|
refactor worker code into worker module
|
2015-07-15 15:42:40 -07:00 |
|
Noah Levitt
|
7b92ba39c7
|
avoid printing stack trace on normal youtube_dl unsupported condition (still prints error message unfortunately)
|
2015-07-15 14:33:22 -07:00 |
|
Noah Levitt
|
9b5da57d7e
|
initial youtube-dl support, including saving youtube-dl derived json with warcprox by sending a PUTMETA request, if new option --enable-warcprox-features is enabled
|
2015-07-14 18:57:45 -07:00 |
|
Noah Levitt
|
fd0c3322ee
|
update readme, s/umbra/brozzler/ in most places, delete non-brozzler stuff
|
2015-07-13 17:09:39 -07:00 |
|
Noah Levitt
|
3eff099b16
|
determine if youtube-dl can do something with a url
|
2015-07-13 16:40:56 -07:00 |
|
Noah Levitt
|
6470a8ef26
|
sigquit dumps thread traces
|
2015-07-13 15:57:14 -07:00 |
|
Noah Levitt
|
eb74967fed
|
brozzler-worker round-robins sites needing crawling
|
2015-07-13 12:13:41 -07:00 |
|
Noah Levitt
|
ddd764cac5
|
brozzler-worker options --proxy-server=host:port and --ignore-certificate-errors (for use with warcprox)
|
2015-07-11 23:07:47 -07:00 |
|
Noah Levitt
|
610f9c8cf4
|
add missing file hq.py, improve some logging, fix little race condition bug
|
2015-07-11 13:09:45 -07:00 |
|
Noah Levitt
|
1fb336cb2e
|
crawling outlinks not totally working
|
2015-07-11 02:29:19 -07:00 |
|
Noah Levitt
|
56a7bb7306
|
submit outlinks to hq
|
2015-07-10 21:31:41 -07:00 |
|
Noah Levitt
|
fd99764baa
|
brozzler-worker partially working
|
2015-07-10 21:07:47 -07:00 |
|