327 Commits

Author SHA1 Message Date
Noah Levitt
0050fe56b8 logo 2015-10-07 17:53:16 -07:00
Noah Levitt
2ddda68392 symlink to root 2015-10-08 00:37:39 +00:00
Noah Levitt
d1158ab224 incremental progress on web console 2015-10-08 00:33:49 +00:00
Noah Levitt
7ab2eb4fda brozzler web console in the mix 2015-10-08 00:31:28 +00:00
Noah Levitt
82011c15cd Merge branch 'master' of github.com:nlevitt/brozzler
* 'master' of github.com:nlevitt/brozzler:
  logo!?
2015-10-07 23:56:44 +00:00
Noah Levitt
3805c7bf93 logo!? 2015-10-07 15:45:01 -07:00
Noah Levitt
a5eb223b32 run brozzler workers inside docker containers 2015-10-06 01:24:01 +00:00
Noah Levitt
5868192e0a more stubby stuff 2015-09-28 22:05:43 +00:00
Noah Levitt
2e1601ac81 i think hash-less urls are working 2015-09-25 22:48:01 +00:00
Noah Levitt
05e15b9667 progress on the structure of this little app 2015-09-25 22:19:29 +00:00
Noah Levitt
51732d0d49 run warcprox on wbgrp-svc111 2015-09-25 19:16:27 +00:00
Noah Levitt
69a25bc74a equivalent functionality using angular and restful json 2015-09-25 19:15:20 +00:00
Noah Levitt
1ca17f204b brozzler web console initial fiddling 2015-09-25 17:59:38 +00:00
Noah Levitt
dff4149185 missed one more use of brozzler.version 2015-09-24 00:44:35 +00:00
Noah Levitt
a94dfd27f8 oops, set brozzler.__version__ 2015-09-24 00:34:51 +00:00
Noah Levitt
8c69ca3b39 giving up on using git revision in version number :( latest issue is when installing a package that calls git to compute a version number, but cwd is some other git project, you get the wrong thing 2015-09-24 00:17:33 +00:00
Noah Levitt
9699a40645 remove "dev" from version number and switch README to rst 2015-09-23 22:35:26 +00:00
Noah Levitt
245078284d pep440 compliant versioning 2015-09-23 14:46:57 -07:00
Noah Levitt
40522ef5a5 fix some rethinkdb related stuff; most notably r.desc() and related stuff don't currently work correctly if r is a Rethinker, so use rethinkdb directly in that case 2015-09-23 01:53:05 +00:00
Noah Levitt
8bf34d9db6 tweaks 2015-09-23 00:50:38 +00:00
Noah Levitt
2bc66f52d4 new rethinkstuff.Rethinker api 2015-09-23 00:50:15 +00:00
Noah Levitt
2863b7e422 goodbye requirements.txt now that we have devpi 2015-09-23 00:49:20 +00:00
Noah Levitt
c780b147b3 missed "git+" 2015-09-16 19:24:48 +00:00
Noah Levitt
c682627aec Rethinker moved to pyrethink library 2015-09-16 19:24:17 +00:00
Noah Levitt
a8f9664212 separate virtualenvs 2015-09-16 19:23:11 +00:00
Noah Levitt
5a6cbf01da Dockerfile for brozzler worker 2015-09-15 23:02:37 +00:00
Noah Levitt
70308c10f4 shouldn't have local paths as requirements 2015-09-15 18:07:47 +00:00
Noah Levitt
dc9d1a4959 detecting job finish seems to be working now 2015-09-10 01:38:31 +00:00
Noah Levitt
92a288bc35 detect jobs finishing! (not well tested yet) 2015-09-09 22:11:48 +00:00
Noah Levitt
72e72e03c4 brozzler-job-starter.py -> ait-brozzler-boss.py 2015-09-09 22:11:14 +00:00
Noah Levitt
1b94d10723 on reset, mark active jobs as finished 2015-09-08 22:38:39 +00:00
Noah Levitt
290ea433a5 save full size screenshot as jpeg too 2015-09-08 22:37:35 +00:00
Noah Levitt
9698b0f847 create thumbnail of screenshot and send to warcprox 2015-09-07 06:27:21 +00:00
Noah Levitt
565ab5f936 save screenshots with new scheme url screenshot:..., WARC-Type:resource 2015-09-07 00:26:37 +00:00
Noah Levitt
993ae6a833 run ait5 partner webapp; consolidate "status" and "fullstatus" 2015-09-04 21:02:33 +00:00
Noah Levitt
5fe2805285 fix bug claiming site, looks like there could be a race condition with other worker claiming the same site 2015-09-04 01:36:29 +00:00
Noah Levitt
3c23aa8fd4 finally, the jobs table 2015-09-03 01:05:03 +00:00
Noah Levitt
6cda4739b8 log exception when thread dies (seems to be dying silently sometimes) 2015-09-03 01:04:41 +00:00
Noah Levitt
839bf6f4ae script to help with starting/restarting/etc in my dev environment 2015-09-03 01:03:19 +00:00
Noah Levitt
f334107b47 support for specifying rethinkdb database name; wrap rethinkdb operations and retry if appropriate (as best as we can tell) 2015-08-28 00:37:26 +00:00
Noah Levitt
cf91fb1377 Revert "use dependency_links instead of requirements.txt in spite of ugliness of --process-dependency-links, #egg=..., so that dependent projects can use brozzler more easily"
Ugh.. too much pain, not worth the time to figure out the magic #egg=
incantation.

This reverts commit 78ca0701651c35bda69122ddf652cbb8d95daeb0.
2015-08-26 19:44:04 +00:00
Noah Levitt
78ca070165 use dependency_links instead of requirements.txt in spite of ugliness of --process-dependency-links, #egg=..., so that dependent projects can use brozzler more easily 2015-08-26 19:22:59 +00:00
Noah Levitt
efa640c640 refactor to simplify starting new job from code 2015-08-25 19:52:33 +00:00
Noah Levitt
68de85022a there is no hq anymore; database notes can still be found in git history, though there's nothing about rethinkdb 2015-08-21 17:55:29 +00:00
Noah Levitt
231d019659 use nlevitt fork of surt library for less stupid handling of mailto: urls, etc 2015-08-20 21:23:59 +00:00
Noah Levitt
ee50818dca if database already exists but tables don't, just create them 2015-08-20 21:23:08 +00:00
Noah Levitt
3af1e10e13 make it work again, and list discovered outlinks 2015-08-20 21:22:08 +00:00
Noah Levitt
8b45d7eb69 since I can't figure out what's causing these sporadic errors fetching certain robots.txt through warcprox, stick a retry loop around the fetch 2015-08-19 22:50:04 +00:00
Noah Levitt
ad543e6134 enforce time limits; move scope_and_schedule_outlinks into frontier.py; fix bugs around scoping on seed redirect 2015-08-19 20:16:25 +00:00
Noah Levitt
ddce1cdc71 fix mistakenly removed import; try to shut down chrome in case of unexpected exception 2015-08-19 20:04:46 +00:00