Noah Levitt
503771d653
set a timeout on warcprox_write_record request
2017-12-27 15:52:55 -08:00
Noah Levitt
cc6297ef60
wait for ack from browser setting request headers
...
guessing this might fix the issue where some requests are missing the
warcprox-meta header, which results in their being written to the wrong
warc
2017-12-27 14:43:26 -08:00
Noah Levitt
1dea1f3f93
use Accept-Encoding: gzip instead of identity
...
fixes twitter scrolling, which had been giving "Loading seems to be
taking a while." error message
2017-12-27 14:22:24 -08:00
Noah Levitt
daecb4f59e
fix brozzler-list-sites --site=SITE_ID
2017-12-21 17:16:41 -08:00
Noah Levitt
1a3e15d23b
update for warcprox 2.3
2017-12-15 16:47:15 -08:00
Noah Levitt
2cf3239080
fiddling with travis-ci
2017-12-15 16:02:02 -08:00
Noah Levitt
7ff99266ea
quiet down the logging
2017-12-15 15:57:36 -08:00
Noah Levitt
df6615cc2c
avoid rethinkdb.errors.ReqlDriverError: Query size
2017-12-15 15:55:10 -08:00
Neil Minton
a6e5700c18
Merge pull request #72 from galgeek/ARI-5241b
...
simpleclicks for ARI-5241
2017-11-21 12:42:55 -08:00
Barbara Miller
2246fb3d07
simpleclicks for ARI-5241
2017-11-20 17:25:32 -08:00
Noah Levitt
196cd2c5eb
will this fix the travis build?
2017-11-08 17:41:39 -08:00
Noah Levitt
a24fac0194
Merge pull request #70 from internetarchive/skipDashManifest
...
skip remembering dash manifests
2017-11-08 17:12:44 -08:00
Noah Levitt
b81cc4eb0a
remove stray pdb line
2017-11-08 17:03:54 -08:00
Noah Levitt
133726e942
test a real-ish mpd
2017-11-08 17:01:27 -08:00
Barbara Miller
e8fdf84db8
add test--not a Video
2017-11-07 17:23:51 -08:00
Barbara Miller
91527f12df
comment referencing PR
2017-11-07 16:05:35 -08:00
Barbara Miller
31e54c94e7
skip remembering dash manifests
2017-11-06 16:43:43 -08:00
Barbara Miller
f3aa794115
simpleclicks for thejewishnews.com
2017-10-26 19:43:29 -07:00
Barbara Miller
7f4deacdf7
Merge pull request #69 from BitBaron/ari-5426
...
Thanks, Neil!
2017-10-25 15:37:37 -07:00
Noah Levitt
19b67196ab
Merge pull request #68 from danielbicho/master
...
fix resume_job
2017-10-17 09:51:54 -07:00
Daniel Bicho
c4fa612547
fix some errors in test_resume_job
2017-10-17 10:33:26 +01:00
Noah Levitt
d40390f938
cryptography lib version 2.1.1 is causing problems
2017-10-16 10:52:09 -07:00
Daniel Bicho
bb98a43c8c
fix and test both job stop request and site stop requests
2017-10-16 11:46:35 +01:00
Daniel Bicho
8aa10962bc
test resume_job by simulating a crawl job that is stopped and then resumed
2017-10-15 19:11:46 +01:00
Daniel Bicho
378c097c29
add verification change to test_resume_job
2017-10-13 12:13:51 +01:00
Daniel Bicho
36e323c942
fix resume_job function; the job could not resume because its stop_requested value was not reset
2017-10-12 19:21:13 +01:00
Noah Levitt
554dbe821b
Merge pull request #67 from internetarchive/skip_youtube_dl
...
skip_youtube_dl
2017-09-29 15:10:10 -07:00
Barbara Miller
a86bde734f
skip unnecessary assignment too
2017-09-29 15:06:36 -07:00
Barbara Miller
e6bb6791af
skip unnecessary assignment
2017-09-29 14:53:24 -07:00
Barbara Miller
5e7b3b73dd
skip_youtube_dl
2017-09-29 14:33:23 -07:00
Noah Levitt
ec847e48bc
fix problem where each hashtag visited causes a page load if page url redirects
2017-09-27 14:11:20 -07:00
Noah Levitt
384c877e9a
new test exposing problem where each hashtag visited causes a page load, if page redirects
2017-09-27 14:08:28 -07:00
Noah Levitt
519ce4c733
Merge pull request #66 from internetarchive/ARI-5259
...
ARI-5259 blog.sina.com.cn pagination
2017-09-07 13:07:50 -07:00
Barbara Miller
eb1f79271f
blog.sina.com.cn pagination
2017-09-05 14:20:36 -07:00
Barbara Miller
71d54faae0
Merge pull request #65 from vbanos/behavior_timeout
...
Make behavior_timeout configurable
2017-08-31 14:39:39 -07:00
Vangelis Banos
bb93b04c23
Make behavior_timeout configurable
...
``behavior_timeout`` is hardcoded to 900s. With this MR we make it
configurable with a default value of 900. We add a new variable to
``BrozzlerWorker`` and ``Browser``.
2017-08-31 08:06:26 +00:00
Barbara Miller
18a52f0b15
Merge pull request #64 from galgeek/typo
...
fix typo
2017-08-26 16:58:58 -07:00
Barbara Miller
e786013b1b
fix typo
2017-08-26 16:58:00 -07:00
Barbara Miller
00b57ed87a
Merge pull request #61 from internetarchive/x11-support
...
screenshots don't work with Xvfb
2017-08-26 16:45:50 -07:00
Barbara Miller
f810603cdf
Merge pull request #63 from vbanos/configurable-page-timeout
...
Thank you, @vbanos!
2017-08-23 13:31:29 -07:00
Vangelis Banos
00513af877
Configurable page timeout
...
The page loading timeout was hard-coded to 300s. With this change,
we make it configurable with a default value of 300.
2017-08-23 08:05:14 +00:00
Neil Minton
4733b0ac7d
Update SoundCloud.com behavior selectors.
2017-08-18 14:16:51 -07:00
Neil Minton
a8a624fbbf
Add Archive.org playlists to default behavior.
2017-08-18 14:16:51 -07:00
Neil Minton
b0fd1df1ef
Generalize default behavior.
2017-08-18 14:16:51 -07:00
Neil Minton
12e02ae401
Merge pull request #62 from internetarchive/ARI-5460
...
update Instagram selectors
2017-08-17 16:08:44 -07:00
Barbara Miller
c181f4bcc3
screenshots don't work w/Xvfb
2017-08-16 15:20:43 -07:00
Barbara Miller
6391e7b40f
Merge pull request #60 from galgeek/ARI-5453
...
simpleclicks for wixsite.com
2017-08-14 17:14:09 -07:00
Barbara Miller
901995c6cf
Merge pull request #58 from internetarchive/ARI-5379
...
ARI 5379 URL regex update
2017-08-14 16:54:17 -07:00
Barbara Miller
36b7e4f3d6
Merge pull request #59 from galgeek/ARI-5465
...
skip a.uiMorePagerPrimary after all
2017-08-14 16:50:39 -07:00
Barbara Miller
b5121c26a8
simpleclicks for wixsite.com
2017-08-14 16:47:49 -07:00