Noah Levitt
7f78c335e1
--warcprox-auto distribute assigned sites evenly ( #78 )
...
--warcprox-auto distribute assigned sites evenly
When running with --warcprox-auto, choose the warcprox instance with
the fewest assigned sites, instead of the one with the lowest load in the
service registry. In practice we often start brozzling a whole bunch of
sites at approximately the same time, and because it takes time for that
to affect the "load" reported by warcprox instances, sites end up being
distributed very unevenly.
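A minimal sketch of the selection rule described above, not the code from this commit. The function name ``choose_warcprox``, the ``id`` and ``load`` keys, and the tie-break on load are all assumptions made for illustration.

```python
from collections import Counter


def choose_warcprox(instances, assigned_instance_ids):
    """Return the instance with the fewest assigned sites.

    instances: list of dicts with hypothetical keys "id" and "load".
    assigned_instance_ids: one instance id per currently assigned site.
    Ties are broken by the reported load (an assumption, not from the commit).
    """
    counts = Counter(assigned_instance_ids)
    # Counter returns 0 for instances that have no sites assigned yet.
    return min(instances, key=lambda inst: (counts[inst["id"]], inst["load"]))


instances = [
    {"id": "warcprox-a", "load": 0.1},
    {"id": "warcprox-b", "load": 0.9},
]
# warcprox-a reports a lower load but already has two sites assigned.
chosen = choose_warcprox(instances, ["warcprox-a", "warcprox-a", "warcprox-b"])
```

Counting assignments takes effect immediately, whereas the reported load lags behind newly started sites, which is the problem the commit message describes.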
2018-01-19 14:54:33 -08:00
Noah Levitt
9e80a3b0d3
Merge pull request #71 from internetarchive/brofurb
...
JS class-based generalized behavior
2018-01-18 12:23:18 -08:00
Barbara Miller
2f3f258856
update copyright dates
2018-01-15 19:39:41 -08:00
Barbara Miller
e52ba4c8ef
rm default.js
2018-01-15 19:38:15 -08:00
Barbara Miller
93ceeacfd7
rm obsolete
2018-01-15 19:36:32 -08:00
Barbara Miller
2ce9cf41a1
resolve conflicts
2018-01-15 19:34:47 -08:00
Barbara Miller
9aa670ece5
simple multi-selector test with window.scroll
2018-01-15 17:58:10 -08:00
Barbara Miller
7dccc809d0
use shorter interval
2018-01-15 17:58:10 -08:00
Barbara Miller
06a2b5f817
tidied
2018-01-15 17:58:10 -08:00
Barbara Miller
b979372e85
update copyright
2018-01-15 17:58:10 -08:00
Barbara Miller
93a81a4a37
qa simpleIntervalFunc for now
2018-01-15 17:58:10 -08:00
Barbara Miller
b589324a05
add simplerIntervalFunc...
2018-01-15 17:58:10 -08:00
Barbara Miller
f78e1ff710
minor edits
2018-01-15 17:58:10 -08:00
Barbara Miller
d0203ff9eb
tweaks post-troubleshooting ARI-5241
2018-01-15 17:58:10 -08:00
Barbara Miller
dd3b041eec
class-based generalized behavior
2018-01-15 17:58:10 -08:00
Barbara Miller
34fb4baf00
WIP: class-based generalized behavior
2018-01-15 17:58:10 -08:00
Barbara Miller
b968397fbe
update default selectors
2018-01-15 17:58:10 -08:00
Barbara Miller
e364b79796
refurb behaviors.yaml 171015
2018-01-15 17:58:10 -08:00
Noah Levitt
016bd5d3f7
Merge pull request #77 from vbanos/chrome-stop-del-tmpdir
...
Fix to delete tmpdir on Chrome.stop()
2018-01-15 10:36:50 -08:00
Vangelis Banos
820c7cd8cc
Fix to delete tmpdir on Chrome.stop()
...
The ``self._home_tmpdir.cleanup()`` call is not always executed when
stopping Chrome. As a result, a large number of ``/tmp/tmpXXX`` dirs
accumulate in production.
The reason is that ``Chrome.stop()`` can exit at the ``return``
statement on the following line:
https://github.com/internetarchive/brozzler/blob/master/brozzler/chrome.py#L268
so ``cleanup()`` never runs.
Moving ``cleanup()`` into the ``finally`` part of the
``try/except/finally`` block makes it always run at the end of
``Chrome.stop()``, so the tmp directory is cleaned up in every case.
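A minimal sketch of the pattern, assuming a stand-in class rather than the real ``brozzler.chrome.Chrome``. ``FakeChrome`` and its ``running`` flag are hypothetical; only ``_home_tmpdir`` and ``stop()`` mirror names from the commit message.

```python
import os
import tempfile


class FakeChrome:
    """Hypothetical stand-in used only to illustrate the finally-based cleanup."""

    def __init__(self):
        self._home_tmpdir = tempfile.TemporaryDirectory()
        self.running = False

    def stop(self):
        try:
            if not self.running:
                # Early return: without the finally clause below,
                # cleanup() would be skipped on this path.
                return
            self.running = False
        finally:
            # Runs on normal exit, on early return, and on exceptions.
            self._home_tmpdir.cleanup()


chrome = FakeChrome()
tmp_path = chrome._home_tmpdir.name
existed_before = os.path.isdir(tmp_path)
chrome.stop()  # takes the early-return path
exists_after = os.path.exists(tmp_path)
```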
2018-01-15 13:09:43 +00:00
Noah Levitt
4f37dc0104
Merge pull request #73 from vbanos/configurable-js-templates
...
Configurable JS templates location
2018-01-10 11:43:16 -08:00
Noah Levitt
46fcd055a6
Merge pull request #74 from vbanos/disable-background-networking
...
Add --disable-background-networking chromium flag
2018-01-09 09:57:23 -08:00
Vangelis Banos
3984ca017f
Replace cwd var with d
2018-01-09 06:33:03 +00:00
Barbara Miller
37c5720729
log Page.interstitialShown
2018-01-08 08:26:44 -08:00
Vangelis Banos
3b0175c65b
Add --disable-background-networking chromium flag
...
Chromium browser docs describe this as follows:
Disable several subsystems which run network requests in the
background. This is for use when doing network performance testing to
avoid noise in the measurements.
Testing indicates that irrelevant HTTP requests like the following stop
with this change.
```
HEAD http://ugfgntuqva/ HTTP/1.1
```
2018-01-06 19:07:22 +00:00
Vangelis Banos
dacfba330c
Configurable JS templates location
...
Brozzler hard-codes the locations of its JS behavior configuration,
``brozzler/behaviors.yaml`` and ``brozzler/js-templates/``. With this change, you can pass
the optional ``behaviors_dir`` parameter to ``browser.browse_page`` to set a
custom location and load your own JS behaviors from it.
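A rough sketch of how an optional directory parameter can fall back to the built-in locations. The helper ``behavior_paths`` and the constant ``DEFAULT_BEHAVIORS_DIR`` are hypothetical and are not part of brozzler's API. Only the ``behaviors_dir`` name and the two default paths come from the commit message.

```python
import os

# Assumed default, taken from the paths named in the commit message.
DEFAULT_BEHAVIORS_DIR = "brozzler"


def behavior_paths(behaviors_dir=None):
    """Return (behaviors.yaml path, js-templates dir) for the given directory.

    Falls back to the default location when behaviors_dir is not provided.
    """
    d = behaviors_dir or DEFAULT_BEHAVIORS_DIR
    return os.path.join(d, "behaviors.yaml"), os.path.join(d, "js-templates")


default_yaml, default_templates = behavior_paths()
custom_yaml, custom_templates = behavior_paths("/opt/my-behaviors")
```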
2018-01-04 17:37:02 +00:00
Noah Levitt
503771d653
set a timeout on warcprox_write_record request
2017-12-27 15:52:55 -08:00
Noah Levitt
cc6297ef60
wait for ack from browser setting request headers
...
guessing this might fix the issue where some requests are missing the
warcprox-meta header, which results in their being written to the wrong
warc
2017-12-27 14:43:26 -08:00
Noah Levitt
1dea1f3f93
use Accept-Encoding: gzip instead of identity
...
fixes twitter scrolling, which had been giving "Loading seems to be
taking a while." error message
2017-12-27 14:22:24 -08:00
Noah Levitt
daecb4f59e
fix brozzler-list-sites --site=SITE_ID
2017-12-21 17:16:41 -08:00
Noah Levitt
1a3e15d23b
update for warcprox 2.3
2017-12-15 16:47:15 -08:00
Noah Levitt
2cf3239080
fiddling with travis-ci
2017-12-15 16:02:02 -08:00
Noah Levitt
7ff99266ea
quiet down the logging
2017-12-15 15:57:36 -08:00
Noah Levitt
df6615cc2c
avoid rethinkdb.errors.ReqlDriverError: Query size
2017-12-15 15:55:10 -08:00
Neil Minton
a6e5700c18
Merge pull request #72 from galgeek/ARI-5241b
...
simpleclicks for ARI-5241
2017-11-21 12:42:55 -08:00
Barbara Miller
2246fb3d07
simpleclicks for ARI-5241
2017-11-20 17:25:32 -08:00
Noah Levitt
196cd2c5eb
will this fix the travis build?
2017-11-08 17:41:39 -08:00
Noah Levitt
a24fac0194
Merge pull request #70 from internetarchive/skipDashManifest
...
skip remembering dash manifests
2017-11-08 17:12:44 -08:00
Noah Levitt
b81cc4eb0a
remove stray pdb line
2017-11-08 17:03:54 -08:00
Noah Levitt
133726e942
test a real-ish mpd
2017-11-08 17:01:27 -08:00
Barbara Miller
e8fdf84db8
add test--not a Video
2017-11-07 17:23:51 -08:00
Barbara Miller
91527f12df
comment referencing PR
2017-11-07 16:05:35 -08:00
Barbara Miller
31e54c94e7
skip remembering dash manifests
2017-11-06 16:43:43 -08:00
Barbara Miller
f3aa794115
simpleclicks for thejewishnews.com
2017-10-26 19:43:29 -07:00
Barbara Miller
7f4deacdf7
Merge pull request #69 from BitBaron/ari-5426
...
Thanks, Neil!
2017-10-25 15:37:37 -07:00
Noah Levitt
19b67196ab
Merge pull request #68 from danielbicho/master
...
fix resume_job
2017-10-17 09:51:54 -07:00
Daniel Bicho
c4fa612547
fix some errors in test_resume_job
2017-10-17 10:33:26 +01:00
Noah Levitt
d40390f938
cryptography lib version 2.1.1 is causing problems
2017-10-16 10:52:09 -07:00
Daniel Bicho
bb98a43c8c
fix and test both job stop request and site stop requests
2017-10-16 11:46:35 +01:00
Daniel Bicho
8aa10962bc
test resume_job adding a simulation of a crawl job stopped and then resumed.
2017-10-15 19:11:46 +01:00