998 Commits

Author SHA1 Message Date
Noah Levitt
016bd5d3f7
Merge pull request #77 from vbanos/chrome-stop-del-tmpdir
Fix to delete tmpdir on Chrome.stop()
2018-01-15 10:36:50 -08:00
Vangelis Banos
820c7cd8cc Fix to delete tmpdir on Chrome.stop()
The ``self._home_tmpdir.cleanup()`` call is not always executed when
stopping Chrome. As a result, a large number of ``/tmp/tmpXXX`` dirs
accumulate in production.

The reason is that ``Chrome.stop()`` can exit at the ``return``
statement on the following line:
https://github.com/internetarchive/brozzler/blob/master/brozzler/chrome.py#L268
so ``cleanup()`` does not run.

Moving the ``cleanup()`` call into the ``finally`` part of the
``try/except/finally`` block makes it always run at the end of
``Chrome.stop()``, so the tmp directory is cleaned up in any case.
2018-01-15 13:09:43 +00:00
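The effect of moving ``cleanup()`` into ``finally`` can be illustrated with a minimal sketch. ``FakeChrome`` and its ``running`` attribute are hypothetical stand-ins, not brozzler's actual ``Chrome`` class; only the ``_home_tmpdir`` name comes from the commit message.

```python
import os
import tempfile


class FakeChrome:
    """Hypothetical stand-in for a browser wrapper; not brozzler's Chrome class."""

    def __init__(self):
        # Temporary home directory, analogous to self._home_tmpdir in the commit.
        self._home_tmpdir = tempfile.TemporaryDirectory()
        self.running = True  # assumed flag, for illustration only

    def stop(self):
        try:
            if self.running:
                self.running = False
                # Early return: code placed after the try block would be skipped.
                return
        finally:
            # Runs even when the try block returns or raises.
            self._home_tmpdir.cleanup()


chrome = FakeChrome()
tmp_path = chrome._home_tmpdir.name
chrome.stop()
# The temporary directory is removed despite the early return.
tmp_removed = not os.path.exists(tmp_path)
```

Because a ``finally`` clause runs on every exit path from the ``try`` block, the cleanup no longer depends on which branch of ``stop()`` is taken.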
Noah Levitt
4f37dc0104
Merge pull request #73 from vbanos/configurable-js-templates
Configurable JS templates location
2018-01-10 11:43:16 -08:00
Noah Levitt
46fcd055a6
Merge pull request #74 from vbanos/disable-background-networking
Add --disable-background-networking chromium flag
2018-01-09 09:57:23 -08:00
Vangelis Banos
3984ca017f Replace cwd var with d 2018-01-09 06:33:03 +00:00
Barbara Miller
37c5720729 log Page.interstitialShown 2018-01-08 08:26:44 -08:00
Vangelis Banos
3b0175c65b Add --disable-background-networking chromium flag
Chromium browser docs describe this as follows:
Disable several subsystems which run network requests in the
background. This is for use when doing network performance testing to
avoid noise in the measurements.

Testing indicates that irrelevant HTTP requests like the following stop
with this improvement.
```
HEAD http://ugfgntuqva/ HTTP/1.1
```
2018-01-06 19:07:22 +00:00
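As a rough sketch of where such a flag ends up, the snippet below builds a Chromium command line that includes ``--disable-background-networking``. The ``chrome_args`` helper, the executable name, and the port value are assumptions for illustration; this is not brozzler's actual launch code.

```python
def chrome_args(chrome_exe, port=9222):
    """Build a Chromium argument list (hypothetical helper, for illustration)."""
    return [
        chrome_exe,
        # Standard Chromium flag for exposing the remote debugging protocol.
        "--remote-debugging-port=%d" % port,
        # Flag added by the commit: suppresses background network requests.
        "--disable-background-networking",
    ]


args = chrome_args("chromium-browser")
```

The resulting list could be passed to ``subprocess.Popen`` to start the browser.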
Vangelis Banos
dacfba330c Configurable JS templates location
Brozzler has hard-coded the locations of the JS behaviors config
(``brozzler/behaviors.yaml``) and of the JS templates
(``brozzler/js-templates/``). With this change, you can pass the optional
``behaviors_dir`` parameter to ``browser.browse_page`` to set a custom
location and use any JS behaviors found there.
2018-01-04 17:37:02 +00:00
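One way to picture the optional ``behaviors_dir`` parameter is as a fallback to a default directory. The sketch below is an assumption about the general shape, not the code from the commit: ``behavior_paths`` and ``DEFAULT_BEHAVIORS_DIR`` are hypothetical names, and only the ``behaviors.yaml`` and ``js-templates`` names come from the commit message.

```python
import os

# Hypothetical default; the commit describes the defaults as living under brozzler/.
DEFAULT_BEHAVIORS_DIR = "brozzler"


def behavior_paths(behaviors_dir=None):
    """Return (behaviors.yaml path, js-templates dir) for the chosen location."""
    d = behaviors_dir or DEFAULT_BEHAVIORS_DIR
    return os.path.join(d, "behaviors.yaml"), os.path.join(d, "js-templates")


default_yaml, default_templates = behavior_paths()
custom_yaml, custom_templates = behavior_paths("/srv/custom-behaviors")
```

Callers that omit ``behaviors_dir`` keep the built-in location, so existing behavior is unchanged.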
Noah Levitt
503771d653 set a timeout on warcprox_write_record request 2017-12-27 15:52:55 -08:00
Noah Levitt
cc6297ef60 wait for ack from browser setting request headers
guessing this might fix the issue where some requests are missing the
warcprox-meta header, which results in their being written to the wrong
warc
2017-12-27 14:43:26 -08:00
Noah Levitt
1dea1f3f93 use Accept-Encoding: gzip instead of identity
fixes twitter scrolling, which had been giving "Loading seems to be
taking a while." error message
2017-12-27 14:22:24 -08:00
Noah Levitt
daecb4f59e fix brozzler-list-sites --site=SITE_ID 2017-12-21 17:16:41 -08:00
Noah Levitt
1a3e15d23b update for warcprox 2.3 2017-12-15 16:47:15 -08:00
Noah Levitt
2cf3239080 fiddling with travis-ci 2017-12-15 16:02:02 -08:00
Noah Levitt
7ff99266ea quiet down the logging 2017-12-15 15:57:36 -08:00
Noah Levitt
df6615cc2c avoid rethinkdb.errors.ReqlDriverError: Query size 2017-12-15 15:55:10 -08:00
Neil Minton
a6e5700c18
Merge pull request #72 from galgeek/ARI-5241b
simpleclicks for ARI-5241
2017-11-21 12:42:55 -08:00
Barbara Miller
2246fb3d07 simpleclicks for ARI-5241 2017-11-20 17:25:32 -08:00
Noah Levitt
196cd2c5eb will this fix the travis build? 2017-11-08 17:41:39 -08:00
Noah Levitt
a24fac0194
Merge pull request #70 from internetarchive/skipDashManifest
skip remembering dash manifests
2017-11-08 17:12:44 -08:00
Noah Levitt
b81cc4eb0a remove stray pdb line 2017-11-08 17:03:54 -08:00
Noah Levitt
133726e942 test a real-ish mpd 2017-11-08 17:01:27 -08:00
Barbara Miller
e8fdf84db8 add test--not a Video 2017-11-07 17:23:51 -08:00
Barbara Miller
91527f12df comment referencing PR 2017-11-07 16:05:35 -08:00
Barbara Miller
31e54c94e7 skip remembering dash manifests 2017-11-06 16:43:43 -08:00
Barbara Miller
f3aa794115 simpleclicks for thejewishnews.com 2017-10-26 19:43:29 -07:00
Barbara Miller
7f4deacdf7 Merge pull request #69 from BitBaron/ari-5426
Thanks, Neil!
2017-10-25 15:37:37 -07:00
Noah Levitt
19b67196ab Merge pull request #68 from danielbicho/master
fix resume_job
2017-10-17 09:51:54 -07:00
Daniel Bicho
c4fa612547 fix some errors in test_resume_job 2017-10-17 10:33:26 +01:00
Noah Levitt
d40390f938 cryptography lib version 2.1.1 is causing problems 2017-10-16 10:52:09 -07:00
Daniel Bicho
bb98a43c8c fix and test both job stop request and site stop requests 2017-10-16 11:46:35 +01:00
Daniel Bicho
8aa10962bc test resume_job adding a simulation of a crawl job stopped and then resumed. 2017-10-15 19:11:46 +01:00
Daniel Bicho
378c097c29 add verification change to test_resume_job 2017-10-13 12:13:51 +01:00
Daniel Bicho
36e323c942 fix resume_job function, the job was not able to resume because the job stop_requested value was not reset. 2017-10-12 19:21:13 +01:00
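The bug described in the commit above can be sketched as follows. The dict-based job record and the ``should_stop`` helper are hypothetical simplifications; only the ``resume_job`` and ``stop_requested`` names come from the commit message.

```python
def should_stop(job):
    """A worker would stop the job while a stop request is recorded."""
    return job.get("stop_requested") is not None


def resume_job(job):
    """Clear the earlier stop request so the job can actually run again."""
    job["stop_requested"] = None
    return job


# A job that was stopped earlier still carries its stop request.
job = {"id": 1, "stop_requested": "2017-10-12T18:00:00Z"}
stopped_before = should_stop(job)
resume_job(job)
stopped_after = should_stop(job)
```

Without the reset, a resumed job would still look like one that had been asked to stop.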
Noah Levitt
554dbe821b Merge pull request #67 from internetarchive/skip_youtube_dl
skip_youtube_dl
2017-09-29 15:10:10 -07:00
Barbara Miller
a86bde734f skip unnecessary assignment too 2017-09-29 15:06:36 -07:00
Barbara Miller
e6bb6791af skip unnecessary assignment 2017-09-29 14:53:24 -07:00
Barbara Miller
5e7b3b73dd skip_youtube_dl 2017-09-29 14:33:23 -07:00
Noah Levitt
ec847e48bc fix problem where each hashtag visited causes a page load if page url redirects 2017-09-27 14:11:20 -07:00
Noah Levitt
384c877e9a new test exposing problem where each hashtag visited causes a page load, if page redirects 2017-09-27 14:08:28 -07:00
Noah Levitt
519ce4c733 Merge pull request #66 from internetarchive/ARI-5259
ARI-5259 blog.sina.com.cn pagination
2017-09-07 13:07:50 -07:00
Barbara Miller
eb1f79271f blog.sina.com.cn pagination 2017-09-05 14:20:36 -07:00
Barbara Miller
71d54faae0 Merge pull request #65 from vbanos/behavior_timeout
Make behavior_timeout configurable
2017-08-31 14:39:39 -07:00
Vangelis Banos
bb93b04c23 Make behavior_timeout configurable
``behavior_timeout`` is hardcoded to 900s. With this MR we make it
configurable with a default value of 900. We add a new variable to
``BrozzlerWorker`` and ``Browser``.
2017-08-31 08:06:26 +00:00
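A minimal sketch of threading a configurable timeout from a worker down to a browser, keeping 900 as the default, might look like this. ``Worker``, ``new_browser``, and the simplified ``Browser`` are hypothetical stand-ins, not the actual ``BrozzlerWorker`` and ``Browser`` classes.

```python
class Browser:
    """Hypothetical stand-in; not brozzler's actual Browser class."""

    def __init__(self, behavior_timeout=900):
        # Seconds a page behavior may run before it is cut off.
        self.behavior_timeout = behavior_timeout


class Worker:
    """Hypothetical stand-in for a worker that creates browsers."""

    def __init__(self, behavior_timeout=900):
        self._behavior_timeout = behavior_timeout

    def new_browser(self):
        # Pass the worker-level setting through to each browser.
        return Browser(behavior_timeout=self._behavior_timeout)


default_browser = Worker().new_browser()
short_browser = Worker(behavior_timeout=60).new_browser()
```

Keeping the old hard-coded value as the default means callers that do not set the parameter see no change.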
Barbara Miller
18a52f0b15 Merge pull request #64 from galgeek/typo
fix typo
2017-08-26 16:58:58 -07:00
Barbara Miller
e786013b1b fix typo 2017-08-26 16:58:00 -07:00
Barbara Miller
00b57ed87a Merge pull request #61 from internetarchive/x11-support
screenshots don't work with Xvfb
2017-08-26 16:45:50 -07:00
Barbara Miller
f810603cdf Merge pull request #63 from vbanos/configurable-page-timeout
Thank you, @vbanos!
2017-08-23 13:31:29 -07:00
Vangelis Banos
00513af877 Configurable page timeout
The page loading timeout was hard-coded to 300s. With this change,
we make it configurable with a default value of 300.
2017-08-23 08:05:14 +00:00
Neil Minton
4733b0ac7d Update SoundCloud.com behavior selectors. 2017-08-18 14:16:51 -07:00