jkafader
898756690f
Merge pull request #142 from nlevitt/service-worker
fetch service worker script with proper headers
2018-11-29 13:42:59 -08:00
jkafader
9c27e829aa
Merge pull request #136 from nlevitt/revert-time-limit
change time limit enforcement
2018-11-29 12:29:35 -08:00
Noah Levitt
db62402be8
fix tests
2018-11-27 14:35:00 -08:00
Barbara Miller
e2b2542d4a
handle http auth (#138)
abort brozzling on interstitial (auth dialog)
because we have no other recourse at this point. waiting on Network.requestIntercepted auth challenge support. (didn't work in our latest testing)
https://chromedevtools.github.io/devtools-protocol/tot/Network#type-AuthChallengeResponse
2018-11-16 15:10:30 -08:00
Noah Levitt
05fab8b909
change time limit enforcement
enforce time limit based on all the time that a site was in active
rotation, including time it spent waiting for its turn to be brozzled;
this undoes the change from b9640b8a30c934, because now it seems that
was the wrong decision (brozzler jobs with many seeds and low
max_claimed_sites hanging around forever)
2018-11-12 16:21:38 -08:00
Noah Levitt
7497b7e5ac
tests expect outlinks to be a set
2018-10-12 11:03:54 -07:00
Noah Levitt
1ef717fa75
test exposing bug that we don't send warcprox-meta
when pushing stitched-up video with WARCPROX_WRITE_RECORD
2018-09-18 01:05:18 -07:00
jkafader
8368cd2bcb
Merge pull request #115 from nlevitt/ydl-stitched
Ydl stitched
2018-09-06 16:15:52 -07:00
Noah Levitt
88d3d3b310
why did those tests fail??? (#117)
1.4 for pypi
2018-08-22 14:35:39 -07:00
Noah Levitt
e7d2273856
fix failing tests
2018-08-16 11:40:54 -07:00
Noah Levitt
3c27132aaa
test for youtube-dl stitch-up
2018-08-15 17:42:53 -07:00
Noah Levitt
d4db8ba9bc
is test_time_limit failing because of timing?
give it up to ten seconds to mark the job finished
2018-06-25 10:35:24 -05:00
Noah Levitt
c52c16c260
fix bug in test, add another one
2018-06-22 16:10:23 -05:00
Noah Levitt
aeb7c3f825
treat any error fetching robots.txt as "allow all"
2018-06-22 14:50:57 -05:00
Noah Levitt
331d07fe88
these ssurts are strings too
2018-05-16 17:11:08 -07:00
Noah Levitt
5bb392ec7c
ssurts are strings now
because they're friendlier that way in rethinkdb
2018-05-16 16:43:10 -07:00
Noah Levitt
1572fd3ed6
missed a spot where is_permitted_by_robots needs monkeying
2018-05-15 16:52:48 -07:00
Noah Levitt
fc05cac338
ok seriously tests
2018-05-14 15:38:28 -07:00
Noah Levitt
05f8ab3495
fix more tests for new approach sans scope['surt']
2018-05-14 15:38:28 -07:00
Noah Levitt
85a4757527
s/max_hops_off_surt/max_hops_off/
2018-05-14 15:38:28 -07:00
Noah Levitt
5ebd2fb709
new test of max_hops_off
2018-05-14 15:38:28 -07:00
Noah Levitt
b83d3cb9df
rename page.hops_off_surt to page.hops_off
2018-05-14 15:38:28 -07:00
Noah Levitt
245e27a21a
tests for new approach without scope['surt']
replaced by an accept rule (two rules in some cases of seed redirects)
2018-05-14 15:38:28 -07:00
Noah Levitt
f26d711a89
new job setting max_claimed_sites
Puts a cap on the number of sites belonging to a given job that can be brozzled
simultaneously across the cluster. Addresses the problem of a job with many
seeds starving out other jobs. For AITFIVE-1578.
2018-03-01 17:17:54 -08:00
Noah Levitt
d7512fbeb6
move time limit enforcement
now it's next to stop request enforcement which makes more sense and
supports more timely action
2018-03-01 11:28:30 -08:00
Noah Levitt
9a0941f1fd
Merge branch 'master' into claim-batches
* master:
  back to dev version number
  commit for beta release
  this should fix travis build?
  fix tests
  update brozzler-easy for current warcprox api
  simpleclicks for minutes PDF
2018-02-06 11:46:15 -08:00
Noah Levitt
8505720c41
fix tests
2018-02-02 15:11:26 -08:00
Noah Levitt
7962444f09
claim sites to brozzle in batches to reduce contention over sites table
2018-02-02 13:56:24 -08:00
Noah Levitt
bf5401283e
new test test_needs_browsing
currently exposes bug in resolving "location" response header
2018-01-26 10:59:18 -08:00
Noah Levitt
7f78c335e1
--warcprox-auto distribute assigned sites evenly (#78)
When running with --warcprox-auto, choose the instance of warcprox with
the least number of assigned sites, instead of the lowest load in the
service registry. In practice we often start brozzling a whole bunch of
sites at approximately the same time, and because it takes time for that
to affect the "load" reported by warcprox instances, sites end up being
distributed very unevenly.
2018-01-19 14:54:33 -08:00
Noah Levitt
b81cc4eb0a
remove stray pdb line
2017-11-08 17:03:54 -08:00
Noah Levitt
133726e942
test a real-ish mpd
2017-11-08 17:01:27 -08:00
Barbara Miller
e8fdf84db8
add test--not a Video
2017-11-07 17:23:51 -08:00
Daniel Bicho
c4fa612547
fix some errors in test_resume_job
2017-10-17 10:33:26 +01:00
Daniel Bicho
bb98a43c8c
fix and test both job stop request and site stop requests
2017-10-16 11:46:35 +01:00
Daniel Bicho
8aa10962bc
test resume_job adding a simulation of a crawl job stopped and then resumed.
2017-10-15 19:11:46 +01:00
Daniel Bicho
378c097c29
add verification change to test_resume_job
2017-10-13 12:13:51 +01:00
Noah Levitt
384c877e9a
new test exposing problem where each hashtag visited causes a page load, if page redirects
2017-09-27 14:08:28 -07:00
Noah Levitt
3385d727ac
minimally update test_time_limit for new time accounting
2017-06-26 17:57:50 -07:00
Noah Levitt
405c5725e4
restore reclamation of orphaned, claimed sites, and heartbeat site.last_claimed every 7 minutes during youtube-dl processing, to prevent another brozzler-worker claiming the site
2017-06-23 13:50:49 -07:00
Noah Levitt
6bae53e646
disable the re-claiming of sites that are marked claimed from more than an hour ago, because sometimes pages legitimately take longer than an hour to brozzle; working on a better solution to this issue
2017-06-19 11:21:02 -07:00
Noah Levitt
d514eaec15
even more, better failing tests for thread_raise
2017-05-16 14:00:10 -07:00
Noah Levitt
d2525e2e87
failing test for forthcoming behavior of thread_raise
2017-05-15 16:20:20 -07:00
Noah Levitt
52433ade78
re-claim sites after 1 hour instead of 2 so that sites don't have to wait as long to be brozzled again in case of kill -9 brozzler-worker
2017-05-01 13:00:04 -07:00
Noah Levitt
dcf4811470
Merge branch 'master' into safe-thread-raise
2017-04-24 20:06:37 -07:00
Noah Levitt
0953e6972e
refactor thread_raise safety to use a context manager
2017-04-24 19:51:51 -07:00
Noah Levitt
f140e5bdbd
allow this stupid test to fail
2017-04-21 12:17:11 -07:00
Noah Levitt
ba519d7288
improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id
2017-04-20 18:04:17 -07:00
Noah Levitt
7706bab8b8
safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such
2017-04-20 17:08:16 -07:00
Noah Levitt
8256a34b4f
implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
2017-04-18 17:54:12 -07:00