Vangelis Banos
3b0175c65b
Add --disable-background-networking chromium flag
...
Chromium browser docs describe this as follows:
Disable several subsystems which run network requests in the
background. This is for use when doing network performance testing to
avoid noise in the measurements.
Testing indicates that irrelevant HTTP requests like the following stop
with this improvement.
```
HEAD http://ugfgntuqva/ HTTP/1.1
```
2018-01-06 19:07:22 +00:00
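As a rough illustration of the commit above, the flag is simply one more entry in the argument list used to launch the browser. This is a minimal sketch, not the project's actual launch code: the function name `chromium_command`, the executable name `chromium-browser`, and the port value are assumptions made for the example. Only `--disable-background-networking` comes from the commit itself; `--remote-debugging-port` is a standard Chromium switch.

```python
def chromium_command(chrome_exe="chromium-browser", port=9222):
    """Build an argument list for launching Chromium (hypothetical helper).

    chrome_exe and port are illustrative defaults, not values taken
    from the commit log.
    """
    return [
        chrome_exe,
        # The flag added by the commit: suppresses background network
        # subsystems so they do not add unrelated requests to the capture.
        "--disable-background-networking",
        f"--remote-debugging-port={port}",
        "about:blank",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # on machines without Chromium installed.
    print(" ".join(chromium_command()))
```

The list could be passed to `subprocess.Popen` as-is; keeping it as a list avoids shell quoting issues.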
Noah Levitt
503771d653
set a timeout on warcprox_write_record request
2017-12-27 15:52:55 -08:00
Noah Levitt
cc6297ef60
wait for ack from browser setting request headers
...
guessing this might fix the issue where some requests are missing the
warcprox-meta header, which results in their being written to the wrong
warc
2017-12-27 14:43:26 -08:00
Noah Levitt
1dea1f3f93
use Accept-Encoding: gzip instead of identity
...
fixes twitter scrolling, which had been giving "Loading seems to be
taking a while." error message
2017-12-27 14:22:24 -08:00
Noah Levitt
daecb4f59e
fix brozzler-list-sites --site=SITE_ID
2017-12-21 17:16:41 -08:00
Noah Levitt
1a3e15d23b
update for warcprox 2.3
2017-12-15 16:47:15 -08:00
Noah Levitt
2cf3239080
fiddling with travis-ci
2017-12-15 16:02:02 -08:00
Noah Levitt
7ff99266ea
quiet down the logging
2017-12-15 15:57:36 -08:00
Noah Levitt
df6615cc2c
avoid rethinkdb.errors.ReqlDriverError: Query size
2017-12-15 15:55:10 -08:00
Neil Minton
a6e5700c18
Merge pull request #72 from galgeek/ARI-5241b
...
simpleclicks for ARI-5241
2017-11-21 12:42:55 -08:00
Barbara Miller
2246fb3d07
simpleclicks for ARI-5241
2017-11-20 17:25:32 -08:00
Noah Levitt
196cd2c5eb
will this fix the travis build?
2017-11-08 17:41:39 -08:00
Noah Levitt
a24fac0194
Merge pull request #70 from internetarchive/skipDashManifest
...
skip remembering dash manifests
2017-11-08 17:12:44 -08:00
Noah Levitt
b81cc4eb0a
remove stray pdb line
2017-11-08 17:03:54 -08:00
Noah Levitt
133726e942
test a real-ish mpd
2017-11-08 17:01:27 -08:00
Barbara Miller
e8fdf84db8
add test--not a Video
2017-11-07 17:23:51 -08:00
Barbara Miller
91527f12df
comment referencing PR
2017-11-07 16:05:35 -08:00
Barbara Miller
31e54c94e7
skip remembering dash manifests
2017-11-06 16:43:43 -08:00
Barbara Miller
7f4deacdf7
Merge pull request #69 from BitBaron/ari-5426
...
Thanks, Neil!
2017-10-25 15:37:37 -07:00
Noah Levitt
19b67196ab
Merge pull request #68 from danielbicho/master
...
fix resume_job
2017-10-17 09:51:54 -07:00
Daniel Bicho
c4fa612547
fix some errors in test_resume_job
2017-10-17 10:33:26 +01:00
Noah Levitt
d40390f938
cryptography lib version 2.1.1 is causing problems
2017-10-16 10:52:09 -07:00
Daniel Bicho
bb98a43c8c
fix and test both job stop request and site stop requests
2017-10-16 11:46:35 +01:00
Daniel Bicho
8aa10962bc
test resume_job adding a simulation of a crawl job stopped and then resumed.
2017-10-15 19:11:46 +01:00
Daniel Bicho
378c097c29
add verification change to test_resume_job
2017-10-13 12:13:51 +01:00
Daniel Bicho
36e323c942
fix resume_job function: the job could not resume because its stop_requested value was not reset.
2017-10-12 19:21:13 +01:00
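The resume_job fix described above can be pictured with plain dictionaries. This is a hedged sketch only: the real code works against database-backed job and site objects, and everything here except the `stop_requested` field name (the `resume_job` signature, the `should_stop` helper, the dictionary layout) is an assumption for illustration.

```python
def should_stop(record):
    """A record with a non-None stop_requested value must not be crawled."""
    return record.get("stop_requested") is not None


def resume_job(job, sites):
    """Clear stop requests so the job and its sites can run again.

    Hypothetical stand-in for the real function: it resets
    stop_requested on the job as well as on every site, since a stale
    value on either would keep the crawl stopped.
    """
    job["stop_requested"] = None
    for site in sites:
        site["stop_requested"] = None
    return job, sites


if __name__ == "__main__":
    job = {"id": 1, "stop_requested": "2017-10-12T18:00:00Z"}
    sites = [{"id": "a", "stop_requested": "2017-10-12T18:00:00Z"}]
    resume_job(job, sites)
    print(should_stop(job), [should_stop(s) for s in sites])
```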
Noah Levitt
554dbe821b
Merge pull request #67 from internetarchive/skip_youtube_dl
...
skip_youtube_dl
2017-09-29 15:10:10 -07:00
Barbara Miller
a86bde734f
skip unnecessary assignment too
2017-09-29 15:06:36 -07:00
Barbara Miller
e6bb6791af
skip unnecessary assignment
2017-09-29 14:53:24 -07:00
Barbara Miller
5e7b3b73dd
skip_youtube_dl
2017-09-29 14:33:23 -07:00
Noah Levitt
ec847e48bc
fix problem where each hashtag visited causes a page load if page url redirects
2017-09-27 14:11:20 -07:00
Noah Levitt
384c877e9a
new test exposing problem where each hashtag visited causes a page load, if page redirects
2017-09-27 14:08:28 -07:00
Noah Levitt
519ce4c733
Merge pull request #66 from internetarchive/ARI-5259
...
ARI-5259 blog.sina.com.cn pagination
2017-09-07 13:07:50 -07:00
Barbara Miller
eb1f79271f
blog.sina.com.cn pagination
2017-09-05 14:20:36 -07:00
Barbara Miller
71d54faae0
Merge pull request #65 from vbanos/behavior_timeout
...
Make behavior_timeout configurable
2017-08-31 14:39:39 -07:00
Vangelis Banos
bb93b04c23
Make behavior_timeout configurable
...
``behavior_timeout`` is hardcoded to 900s. With this MR we make it
configurable with a default value of 900. We add a new variable to
``BrozzlerWorker`` and ``Browser``.
2017-08-31 08:06:26 +00:00
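The behavior_timeout change above follows a common pattern: a constant becomes a constructor parameter whose default is the old hardcoded value, and the outer object forwards it to the inner one. The sketch below shows that pattern with toy classes. `SketchWorker`, `SketchBrowser`, and `new_browser` are hypothetical names, not the real `BrozzlerWorker` and `Browser` signatures.

```python
class SketchBrowser:
    """Toy stand-in for a browser wrapper (not the real Browser class)."""

    def __init__(self, behavior_timeout=900):
        # Default matches the previously hardcoded 900 seconds.
        self.behavior_timeout = behavior_timeout


class SketchWorker:
    """Toy stand-in for a worker that creates browsers."""

    def __init__(self, behavior_timeout=900):
        self.behavior_timeout = behavior_timeout

    def new_browser(self):
        # Forward the configured value so both layers agree on it.
        return SketchBrowser(behavior_timeout=self.behavior_timeout)


if __name__ == "__main__":
    print(SketchWorker().new_browser().behavior_timeout)
    print(SketchWorker(behavior_timeout=60).new_browser().behavior_timeout)
```

Keeping 900 as the default means existing callers see no change in behavior unless they opt in.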
Barbara Miller
18a52f0b15
Merge pull request #64 from galgeek/typo
...
fix typo
2017-08-26 16:58:58 -07:00
Barbara Miller
e786013b1b
fix typo
2017-08-26 16:58:00 -07:00
Barbara Miller
00b57ed87a
Merge pull request #61 from internetarchive/x11-support
...
screenshots don't work with Xvfb
2017-08-26 16:45:50 -07:00
Barbara Miller
f810603cdf
Merge pull request #63 from vbanos/configurable-page-timeout
...
Thank you, @vbanos!
2017-08-23 13:31:29 -07:00
Vangelis Banos
00513af877
Configurable page timeout
...
The page loading timeout was hard-coded to 300s. With this change,
we make it configurable with a default value of 300.
2017-08-23 08:05:14 +00:00
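A configurable page timeout like the one described above usually ends up as a deadline in a wait loop. The following is a minimal sketch under that assumption: `wait_for_page_load`, `is_loaded`, `poll_interval`, and the injectable `clock` and `sleep` parameters are all hypothetical and exist only to make the example self-contained and testable. The commit itself only states that the 300s value became configurable.

```python
import time


def wait_for_page_load(is_loaded, page_timeout=300, poll_interval=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll is_loaded() until it returns True or page_timeout seconds pass.

    Raises TimeoutError when the deadline is reached. The default of
    300 mirrors the previously hardcoded value.
    """
    deadline = clock() + page_timeout
    while not is_loaded():
        if clock() >= deadline:
            raise TimeoutError(f"page not loaded after {page_timeout}s")
        sleep(poll_interval)


if __name__ == "__main__":
    # A page that is already loaded returns immediately.
    wait_for_page_load(lambda: True, page_timeout=5)
    print("loaded")
```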
Neil Minton
4733b0ac7d
Update SoundCloud.com behavior selectors.
2017-08-18 14:16:51 -07:00
Neil Minton
a8a624fbbf
Add Archive.org playlists to default behavior.
2017-08-18 14:16:51 -07:00
Neil Minton
b0fd1df1ef
Generalize default behavior.
2017-08-18 14:16:51 -07:00
Neil Minton
12e02ae401
Merge pull request #62 from internetarchive/ARI-5460
...
update Instagram selectors
2017-08-17 16:08:44 -07:00
Barbara Miller
c181f4bcc3
screenshots don't work w/Xvfb
2017-08-16 15:20:43 -07:00
Barbara Miller
6391e7b40f
Merge pull request #60 from galgeek/ARI-5453
...
simpleclicks for wixsite.com
2017-08-14 17:14:09 -07:00
Barbara Miller
901995c6cf
Merge pull request #58 from internetarchive/ARI-5379
...
ARI 5379 URL regex update
2017-08-14 16:54:17 -07:00
Barbara Miller
36b7e4f3d6
Merge pull request #59 from galgeek/ARI-5465
...
skip a.uiMorePagerPrimary after all
2017-08-14 16:50:39 -07:00
Barbara Miller
b5121c26a8
simpleclicks for wixsite.com
2017-08-14 16:47:49 -07:00