Hunter Stern | 8829323a38 | Remove changes for https://webarchive.jira.com/browse/ARI-4518: | 2015-09-17 09:07:03 -07:00
Hunter Stern | f282213981 | Add fix for https://webarchive.jira.com/browse/ARI-4518 | 2015-09-17 08:43:30 -07:00
Noah Levitt | c780b147b3 | missed "git+" | 2015-09-16 19:24:48 +00:00
Noah Levitt | c682627aec | Rethinker moved to pyrethink library | 2015-09-16 19:24:17 +00:00
Noah Levitt | a8f9664212 | separate virtualenvs | 2015-09-16 19:23:11 +00:00
Hunter Stern | 5ccc535f51 | More changes | 2015-09-16 09:23:13 -07:00
Hunter Stern | 3467670900 | More changes for handling psu24 site | 2015-09-15 18:03:08 -07:00
Noah Levitt | 5a6cbf01da | Dockerfile for brozzler worker | 2015-09-15 23:02:37 +00:00
Hunter Stern | ea41653c44 | Pulled in changes from https://github.com/nlevitt/umbra/tree/aitfive-451-alt | 2015-09-15 11:53:53 -07:00
Noah Levitt | 70308c10f4 | shouldn't have local paths as requirements | 2015-09-15 18:07:47 +00:00
Noah Levitt | b30cc2d68b | simpler implementation for https://github.com/internetarchive/umbra/pull/42/files | 2015-09-14 17:57:01 -07:00
Noah Levitt | dc9d1a4959 | detecting job finish seems to be working now | 2015-09-10 01:38:31 +00:00
Noah Levitt | 92a288bc35 | detect jobs finishing! (not well tested yet) | 2015-09-09 22:11:48 +00:00
Noah Levitt | 72e72e03c4 | brozzler-job-starter.py -> ait-brozzler-boss.py | 2015-09-09 22:11:14 +00:00
Noah Levitt | 1b94d10723 | on reset, mark active jobs as finished | 2015-09-08 22:38:39 +00:00
Noah Levitt | 290ea433a5 | save full size screenshot as jpeg too | 2015-09-08 22:37:35 +00:00
Noah Levitt | 9698b0f847 | create thumbnail of screenshot and send to warcprox | 2015-09-07 06:27:21 +00:00
Noah Levitt | 565ab5f936 | save screenshots with new scheme url screenshot:..., WARC-Type:resource | 2015-09-07 00:26:37 +00:00
Noah Levitt | 993ae6a833 | run ait5 partner webapp; consolidate "status" and "fullstatus" | 2015-09-04 21:02:33 +00:00
Noah Levitt | 5fe2805285 | fix bug claiming site, looks like there could be a race condition with other worker claiming the same site | 2015-09-04 01:36:29 +00:00
Noah Levitt | 3c23aa8fd4 | finally, the jobs table | 2015-09-03 01:05:03 +00:00
Noah Levitt | 6cda4739b8 | log exception when thread dies (seems to be dying silently sometimes) | 2015-09-03 01:04:41 +00:00
Noah Levitt | 839bf6f4ae | script to help with starting/restarting/etc in my dev environment | 2015-09-03 01:03:19 +00:00
Noah Levitt | f334107b47 | support for specifying rethinkdb database name; wrap rethinkdb operations and retry if appropriate (as best as we can tell) | 2015-08-28 00:37:26 +00:00
Noah Levitt | cf91fb1377 | Revert "use dependency_links instead of requirements.txt in spite of ugliness of --process-dependency-links, #egg=..., so that dependent projects can use brozzler more easily" | 2015-08-26 19:44:04 +00:00
    Ugh.. too much pain, not worth the time to figure out the magic #egg=
    incantation.
    This reverts commit 78ca0701651c35bda69122ddf652cbb8d95daeb0.
Noah Levitt | 78ca070165 | use dependency_links instead of requirements.txt in spite of ugliness of --process-dependency-links, #egg=..., so that dependent projects can use brozzler more easily | 2015-08-26 19:22:59 +00:00
Noah Levitt | efa640c640 | refactor to simplify starting new job from code | 2015-08-25 19:52:33 +00:00
Noah Levitt | 68de85022a | there is no hq anymore; database notes can still be found in git history, though there's nothing about rethinkdb | 2015-08-21 17:55:29 +00:00
Noah Levitt | 231d019659 | use nlevitt fork of surt library for less stupid handling of mailto: urls, etc | 2015-08-20 21:23:59 +00:00
Noah Levitt | ee50818dca | if database already exists but tables don't, just create them | 2015-08-20 21:23:08 +00:00
Noah Levitt | 3af1e10e13 | make it work again, and list discovered outlinks | 2015-08-20 21:22:08 +00:00
Noah Levitt | 8b45d7eb69 | since I can't figure out what's causing these sporadic errors fetching certain robots.txt through warcprox, stick a retry loop around the fetch | 2015-08-19 22:50:04 +00:00
Noah Levitt | ad543e6134 | enforce time limits; move scope_and_schedule_outlinks into frontier.py; fix bugs around scoping on seed redirect | 2015-08-19 20:16:25 +00:00
Noah Levitt | ddce1cdc71 | fix mistakenly removed import; try to shut down chrome in case of unexpected exception | 2015-08-19 20:04:46 +00:00
Noah Levitt | 2533229fa1 | add __all__ to modules | 2015-08-19 19:01:28 +00:00
Noah Levitt | b7df0a1f37 | make frontier prioritize least recently brozzled site; move disclaim_site() and completed_page() into frontier.py | 2015-08-19 18:45:19 +00:00
Noah Levitt | b8506a2ab4 | rename "db" to "frontier" | 2015-08-19 17:47:05 +00:00
Noah Levitt | cd3a644298 | switch order of brozzle_count and claimed in priority_by_site index to fix has_outstanding_pages check | 2015-08-19 00:04:20 +00:00
Noah Levitt | 382c826678 | rethinkdb connection per request, to server chosen randomly from list | 2015-08-18 23:47:28 +00:00
Noah Levitt | a878730e02 | goodbye sqlite and rabbitmq, hello rethinkdb | 2015-08-18 21:44:54 +00:00
Noah Levitt | e6fbf0e2e9 | rename brozzler-add-site to brozzler-new-site to match brozzler-new-job et al | 2015-08-17 22:48:25 +00:00
Noah Levitt | 6b6583e63a | more notes on choosing a db | 2015-08-13 01:01:35 +00:00
Noah Levitt | e68c98e66d | brozzle a site for 5 minutes at a time instead of 1 for now | 2015-08-11 18:15:16 +00:00
Noah Levitt | fc75e18928 | handle "aw snap" or "he's dead jim" from chrome | 2015-08-11 18:14:53 +00:00
Noah Levitt | 3d70776ce3 | some thoughts on distributed database | 2015-08-11 18:06:58 +00:00
Noah Levitt | ce154fc3db | more robustness improvements | 2015-08-10 20:11:46 +00:00
Noah Levitt | e96b16e19a | support for max_hops scope rule | 2015-08-07 22:36:39 +00:00
Noah Levitt | a47292dab5 | thread to read and selectively log output from chrome | 2015-08-07 22:36:07 +00:00
Noah Levitt | 2a7a0b7c30 | little fix, tweak | 2015-08-05 00:17:43 +00:00
Noah Levitt | b6beac3807 | new script brozzler-new-job to queue a new job with brozzler based on yaml configuration file | 2015-08-04 19:52:01 +00:00