Noah Levitt
72a94ed816
un-hardcode some stuff in webconsole, load from environment variables instead
2016-04-19 18:51:14 +00:00
Noah Levitt
35b713a2e7
little version bump
2016-04-07 23:36:05 +00:00
Noah Levitt
919692f9fa
pin rethinkdb requirement to 2.3.x (this needs to roughly track deployed version)
2016-04-07 23:35:20 +00:00
Noah Levitt
7c637a45e0
remove debugging line
2016-04-07 23:34:44 +00:00
Noah Levitt
5bb23b354c
fix bug where all new sites would get the same start_time
2016-04-07 23:34:30 +00:00
Noah Levitt
ecb2e44442
if youtube-dl fetches pages or makes HEAD requests, use the responses to decide whether the page is html and therefore needs to be browsed; if it does not need to be browsed, check whether youtube-dl has already fetched it (a GET request to the final url of the redirect chain that returned a 200); if not, simply fetch it
2016-04-06 17:50:48 -07:00
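The decision described in ecb2e44442 can be sketched roughly as follows. This is a minimal illustration, not brozzler's actual code: `Txn`, `is_html`, and `next_step` are hypothetical names, the content-type check is a simplification, and falling back to browsing when youtube-dl recorded no response for the url is an assumption the commit message does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Txn:
    """One request/response pair observed while youtube-dl ran (hypothetical record)."""
    method: str
    url: str
    status: int
    content_type: str


def is_html(txn: Txn) -> bool:
    # Simplified check; real responses may need more careful content sniffing.
    return txn.content_type.lower().startswith("text/html")


def next_step(final_url: str, txns: List[Txn]) -> str:
    """Return 'browse', 'skip', or 'fetch' for the final url of the redirect chain."""
    seen = [t for t in txns if t.url == final_url]
    if not seen:
        # Assumption: with nothing to inspect, fall back to browsing.
        return "browse"
    if any(is_html(t) for t in seen):
        # html pages need a real browser.
        return "browse"
    if any(t.method == "GET" and t.status == 200 for t in seen):
        # youtube-dl already fetched it; nothing more to do.
        return "skip"
    # Not html and not fetched yet (for example, only a HEAD request was made).
    return "fetch"
```

For example, a url that youtube-dl only probed with a HEAD request and that turned out to be a PDF would come back as 'fetch'.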
Noah Levitt
ed0ea24de6
Merge branch 'master' of github.com:nlevitt/brozzler
* 'master' of github.com:nlevitt/brozzler:
fix bug preventing brozzler from simultaneously working on more than one site from the same job
2016-04-04 22:43:18 -07:00
Noah Levitt
d834516362
include custom http request headers in youtube-dl requests without need for special hacked youtube-dl
2016-04-04 22:43:08 -07:00
Noah Levitt
733124c7dc
fix bug preventing brozzler from simultaneously working on more than one site from the same job
2016-04-04 23:28:24 +00:00
Noah Levitt
a43b5016e1
use a dev version number
2016-03-18 02:03:20 +00:00
Noah Levitt
c2e80ed6ff
make whole process die if main worker thread dies
2016-03-16 23:35:33 +00:00
Noah Levitt
ca9e62f5cf
if a site is marked "claimed" in rethinkdb, but last_disclaimed is more than 2 hours ago, claim it and log a warning
2016-03-14 22:21:16 +00:00
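The stale-claim rule from ca9e62f5cf could look something like the sketch below. It assumes a site is a plain dict with `claimed` and `last_disclaimed` fields (names taken from the commit message); `is_claimable` and `STALE_CLAIM_AGE` are hypothetical, and treating a claimed site with no `last_disclaimed` value as not claimable is an assumption.

```python
import logging
from datetime import datetime, timedelta, timezone

# Threshold from the commit message: a claim older than this is considered stale.
STALE_CLAIM_AGE = timedelta(hours=2)


def is_claimable(site: dict, now: datetime) -> bool:
    """Return True if a worker may claim this site."""
    if not site.get("claimed"):
        return True
    last_disclaimed = site.get("last_disclaimed")
    if last_disclaimed is None:
        # Assumption: without a timestamp we cannot tell the claim is stale.
        return False
    if now - last_disclaimed > STALE_CLAIM_AGE:
        logging.warning(
            "site %s is marked claimed but last_disclaimed is %s; claiming it anyway",
            site.get("id"), last_disclaimed)
        return True
    return False
```

A worker would call this with the current time, for example `is_claimable(site, datetime.now(timezone.utc))`, before attempting the claim.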
Noah Levitt
4874eaccbb
Merge remote-tracking branch 'umbra/master'
* umbra/master:
Handle Python to JS boolean conversion
Allow clicking on already clicked element to continue in behaviors if click_until_hard_timeout is set to true
Make Umbra click on 'Load More' button for youtube pages
catch and log exception deleting temporary work directory
update detection of modal close button for facebook changes
Add custom behavior for Brooklyn Museum.
2016-03-07 17:37:12 -08:00
Noah Levitt
b06381790c
honor crawl job stop requests
2016-03-08 00:18:54 +00:00
Noah Levitt
d2567f4a13
loosen surt requirement
2016-03-02 00:16:58 +00:00
Noah Levitt
b75577fca4
Merge pull request #52 from vonrosen/ARI-4725
Allow clicking on already clicked element to continue in behaviors if…
2016-02-16 15:22:15 -08:00
Noah Levitt
77dfbcd328
remove cluster-control.sh script because it's specific to ait environment
2016-02-11 23:56:16 +00:00
Noah Levitt
664bb33add
tweaks that have been sitting here
2016-02-10 00:38:48 +00:00
Noah Levitt
887eadb99a
lock down vnc
2016-02-10 00:37:36 +00:00
Hunter Stern
fe650b69ed
Handle Python to JS boolean conversion
2016-02-09 10:48:33 -08:00
Hunter Stern
2ed96f9b59
Allow clicking on already clicked element to continue in behaviors if click_until_hard_timeout is set to true
2016-02-05 10:00:24 -08:00
Neil Minton
b9973c7cae
Merge pull request #51 from vonrosen/ARI-4690
Make Umbra click on 'Load More' button for youtube pages
2016-02-03 14:07:51 -08:00
Hunter Stern
fe81aa4ff2
Make Umbra click on 'Load More' button for youtube pages
2016-01-28 11:53:59 -08:00
Neil Minton
54d92f88b0
Merge pull request #49 from nlevitt/work-dir-cleanup-exception
catch and log exception deleting temporary work directory
2015-12-18 11:34:54 -08:00
Noah Levitt
f1770b813d
Merge pull request #48 from sfdevguy/master
Add custom behavior for Brooklyn Museum
2015-12-18 11:34:00 -08:00
Noah Levitt
8ab0857dad
catch and log exception deleting temporary work directory
2015-12-18 11:26:36 -08:00
Neil Minton
c494afb749
Merge branch 'AITFIVE-497'
2015-12-02 10:02:05 -08:00
Noah Levitt
36e2bb2729
use rethinkdb native time type for date/time values
2015-11-18 02:07:27 +00:00
Noah Levitt
ca0053e3be
when adding a new job, also insert all sites before the job, so that brozzler workers do not conclude the job is finished before all of its sites are in the db
2015-11-14 03:10:58 +00:00
Noah Levitt
3260fe4e9e
when adding a new job, insert the seed url Page document into the database before the Site, to avoid the situation where a brozzler worker claims the site, finds no pages to crawl, and decides the site is finished
2015-11-13 23:47:51 +00:00
Noah Levitt
21906f8cad
vnc-websock.sh uses bashisms
2015-11-12 02:59:45 +00:00
Noah Levitt
3bcd2400f7
2 instances of warcprox; no docker for brozzler worker
2015-11-12 02:59:21 +00:00
Noah Levitt
4c2ecab856
surt==0.3b2 (available on pypi)
2015-11-12 02:58:53 +00:00
Noah Levitt
38dec97e19
logging tweaks
2015-11-12 02:58:26 +00:00
Noah Levitt
5597b4cf1a
quiet down requests.packages.urllib3
2015-11-12 02:58:00 +00:00
Noah Levitt
998c3975b2
replace jobs page with home page which also lists services
2015-11-12 02:57:27 +00:00
Noah Levitt
343b5c0f82
register with service registry; only start chrome right before using it, so that web console vnc windows aren't always full of about:blank
2015-11-12 02:56:27 +00:00
Noah Levitt
b91d7e4c3f
startup scripts for services needed for non-docker deployment
2015-11-11 21:28:55 +00:00
Noah Levitt
29b6a0b0d4
Merge branch 'master' of github.com:nlevitt/brozzler
* 'master' of github.com:nlevitt/brozzler:
update detection of modal close button for facebook changes
refactor umbraAboveBelowOrOnScreen into umbraBehavior object
fixes for psu24 behavior
More changes.
Remove changes for https://webarchive.jira.com/browse/ARI-4518 :
Add fix for https://webarchive.jira.com/browse/ARI-4518
More changes
More changes for handling psu24 site
Pulled in changes from https://github.com/nlevitt/umbra/tree/aitfive-451-alt
simpler implementation for https://github.com/internetarchive/umbra/pull/42/files
Adds routing_key to queue Queue creation
2015-11-05 20:10:22 +00:00
Noah Levitt
8c422534a5
smart waiting for tables and indexes to be ready
2015-11-05 20:10:14 +00:00
Hunter
b329d193ca
Merge pull request #46 from nlevitt/facebook-modal-close
update detection of modal close button for facebook changes
2015-11-04 07:37:36 -08:00
Noah Levitt
f6f4daf24a
update detection of modal close button for facebook changes
2015-11-03 15:36:31 -08:00
Noah Levitt
8889707f24
update detection of modal close button for facebook changes
2015-11-03 15:33:46 -08:00
Noah Levitt
85d87a5e42
Merge remote-tracking branch 'umbra/master'
* umbra/master:
refactor umbraAboveBelowOrOnScreen into umbraBehavior object
fixes for psu24 behavior
More changes.
Remove changes for https://webarchive.jira.com/browse/ARI-4518 :
Add fix for https://webarchive.jira.com/browse/ARI-4518
More changes
More changes for handling psu24 site
Pulled in changes from https://github.com/nlevitt/umbra/tree/aitfive-451-alt
simpler implementation for https://github.com/internetarchive/umbra/pull/42/files
Adds routing_key to queue Queue creation
2015-11-03 15:31:38 -08:00
Neil Minton
dceef1a676
Add custom behavior for Brooklyn Museum.
2015-11-03 13:59:20 -08:00
Noah Levitt
90fad87f7e
websockify startup script
2015-11-03 20:15:41 +00:00
Noah Levitt
03e7c29701
switch noVNC git url to https
2015-10-29 21:36:43 +00:00
Noah Levitt
d9d69a88fd
tweaking workers page
2015-10-29 01:01:28 +00:00
Noah Levitt
7b39ba021b
proof of concept presenting workers in web console with novnc
2015-10-27 19:01:21 +00:00
Noah Levitt
a0f4fd449c
Merge pull request #1 from adam-miller/fixes
uncommented init imports, removed required job_id in Frontier.finished
2015-10-22 15:33:46 -07:00