1361 commits

Author SHA1 Message Date
Barbara Miller
011cdde7ce Merge branch 'ARI-5379' into qa 2018-01-29 15:27:07 -08:00
Barbara Miller
92c137f402 behavior_timeout_custom (not timeout_from_behavior) 2018-01-29 15:26:23 -08:00
Barbara Miller
d3088c6418 Merge branch 'ARI-5379' into qa 2018-01-26 17:26:25 -08:00
Barbara Miller
70af801da1 configurable behavior timeout 2018-01-26 17:23:51 -08:00
Noah Levitt
4d37f88bcb
Merge pull request #75 from galgeek/pageInterstitialShown
log Page.interstitialShown
2018-01-26 16:18:22 -08:00
Noah Levitt
0e17205e17
Merge pull request #82 from vbanos/websock-tcp-nodely
Use TCP_NODELAY in websocket connection to improve performance
2018-01-26 12:14:44 -08:00
Noah Levitt
ba8d5a3740 fix needs_browsing check
correctly handle relative url "location" response header
2018-01-26 11:00:46 -08:00
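A minimal sketch of resolving a relative "location" response header against the request URL, using the standard library's `urljoin`. The helper name `resolve_location` and the example URLs are illustrative assumptions, not code from this commit.

```python
from urllib.parse import urljoin


def resolve_location(request_url, location_header):
    """Return the absolute redirect target for a "location" header value.

    Hypothetical helper: relative values are resolved against the URL
    that was requested; absolute values pass through unchanged.
    """
    return urljoin(request_url, location_header)


# Path-absolute, path-relative, and fully absolute header values
print(resolve_location("http://example.com/a/b", "/login"))
print(resolve_location("http://example.com/a/b", "c?x=1"))
print(resolve_location("http://example.com/a/b", "https://other.example/"))
```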
Noah Levitt
bf5401283e new test test_needs_browsing
currently exposes bug in resolving "location" response header
2018-01-26 10:59:18 -08:00
Noah Levitt
67d5a0e671 increase timeout waiting for screenshot
because we are seeing timeouts on moderately busy machines
2018-01-26 10:19:23 -08:00
Vangelis Banos
3b0d1203c3 Use TCP_NODELAY in websocket connection to improve performance 2018-01-25 22:39:32 +00:00
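For reference, TCP_NODELAY disables Nagle's algorithm so that small writes are sent immediately instead of being coalesced. The sketch below sets the option on a plain TCP socket; how the commit passes it to the websocket connection is not shown in this log, and `make_nodelay_socket` is a hypothetical name.

```python
import socket


def make_nodelay_socket():
    """Create an unconnected TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# A nonzero value means the option is enabled
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```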
Barbara Miller
455014a631 Merge branch 'ARI-5294' into qa 2018-01-23 11:47:57 -08:00
Barbara Miller
bc21b325d7 simpleclicks for minutes PDF 2018-01-23 11:43:35 -08:00
Noah Levitt
c934759852 pass canonicalized url to youtube-dl
avoids this kind of error:
wbgrp-svc294 2018-01-19 21:04:43,973 648 ERROR BrozzlingThread:39295 youtube_dl.to_stderr(YoutubeDL.py:514) ERROR: Unable to download webpage: <urlopen error no host given> (caused by URLError('no host given',))
wbgrp-svc294 2018-01-19 21:04:43,973 648 ERROR BrozzlingThread:39295 root.brozzle_site(worker.py:521) proxy error (site.proxy=wbgrp-svc400.us.archive.org:8002), will try to choose a healthy instance next time site is brozzled: youtube-dl hit apparent proxy error from https:/www.laphil.com/press1718
2018-01-22 14:52:54 -08:00
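The "no host given" error is consistent with the single-slash URL in the log line above: parsed as written, `https:/www.laphil.com/press1718` has no host component. The sketch below demonstrates this with the standard library. `fix_scheme_slashes` is a hypothetical, much-simplified stand-in for a real canonicalizer and is not what the commit uses.

```python
import re
from urllib.parse import urlsplit


def has_host(url):
    """True if the URL parses with a non-empty network location."""
    return bool(urlsplit(url).netloc)


def fix_scheme_slashes(url):
    """Hypothetical stand-in for canonicalization: force exactly two
    slashes after an http or https scheme."""
    return re.sub(r"^(https?):/+", r"\1://", url, flags=re.IGNORECASE)


raw = "https:/www.laphil.com/press1718"
print(has_host(raw))                      # False: everything lands in the path
print(has_host(fix_scheme_slashes(raw)))  # True: host is www.laphil.com
```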
Noah Levitt
190a159188 Merge branch 'master' into qa
* master:
  pass canonicalized url to youtube-dl
2018-01-22 12:48:28 -08:00
Noah Levitt
4ddd76f542 pass canonicalized url to youtube-dl
avoids this kind of error:
wbgrp-svc294 2018-01-19 21:04:43,973 648 ERROR BrozzlingThread:39295 youtube_dl.to_stderr(YoutubeDL.py:514) ERROR: Unable to download webpage: <urlopen error no host given> (caused by URLError('no host given',))
wbgrp-svc294 2018-01-19 21:04:43,973 648 ERROR BrozzlingThread:39295 root.brozzle_site(worker.py:521) proxy error (site.proxy=wbgrp-svc400.us.archive.org:8002), will try to choose a healthy instance next time site is brozzled: youtube-dl hit apparent proxy error from https:/www.laphil.com/press1718
2018-01-22 12:47:26 -08:00
Noah Levitt
9034377f7e Merge branch 'master' into qa
* master:
  bump version after pull request merge
  --warcprox-auto distribute assigned sites evenly (#78)
2018-01-19 15:03:36 -08:00
Noah Levitt
c22e81341a bump version after pull request merge 2018-01-19 15:02:55 -08:00
Noah Levitt
7f78c335e1
--warcprox-auto distribute assigned sites evenly (#78)
--warcprox-auto distribute assigned sites evenly

When running with --warcprox-auto, choose the instance of warcprox with
the least number of assigned sites, instead of the lowest load in the
service registry. In practice we often start brozzling a whole bunch of
sites at approximately the same time, and because it takes time for that
to affect the "load" reported by warcprox instances, sites end up being
distributed very unevenly.
2018-01-19 14:54:33 -08:00
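The selection rule described above can be sketched as follows. The data shapes (dicts with `id`, `load`, and `proxy` keys) are assumptions made for illustration, and breaking ties on reported load is also an assumption; the real `choose_warcprox` reads from the service registry and may differ.

```python
from collections import Counter


def choose_warcprox(instances, active_sites):
    """Pick the warcprox instance with the fewest assigned sites.

    instances: list of dicts like {"id": "host:port", "load": 0.3}
    active_sites: list of dicts like {"proxy": "host:port"}
    Both shapes are hypothetical.
    """
    if not instances:
        return None
    counts = Counter(s["proxy"] for s in active_sites if s.get("proxy"))
    # Fewest assigned sites first; reported load only breaks ties
    return min(instances, key=lambda inst: (counts[inst["id"]], inst["load"]))


instances = [
    {"id": "wp1:8000", "load": 0.1},
    {"id": "wp2:8000", "load": 0.9},
]
sites = [{"proxy": "wp1:8000"}, {"proxy": "wp1:8000"}, {"proxy": "wp2:8000"}]
print(choose_warcprox(instances, sites)["id"])
```

Here `wp2:8000` is chosen even though it reports the higher load, because it has one assigned site against two for `wp1:8000`.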
Noah Levitt
78f079154a Merge branch 'choose-warcprox' into qa
* choose-warcprox:
  test and fix choose_warcprox
2018-01-19 13:11:44 -08:00
Noah Levitt
1308f784ac test and fix choose_warcprox 2018-01-19 13:11:25 -08:00
Noah Levitt
9157cf3a0a Merge branch 'choose-warcprox' into qa
* choose-warcprox:
  --warcprox-auto distribute assigned sites evenly
2018-01-19 11:43:01 -08:00
Noah Levitt
32bea18ab0 Merge branch 'master' into qa
* master:
2018-01-19 11:42:45 -08:00
Noah Levitt
bc4b2f3145 --warcprox-auto distribute assigned sites evenly
When running with --warcprox-auto, choose the instance of warcprox with
the least number of assigned sites, instead of the lowest load in the
service registry. In practice we often start brozzling a whole bunch of
sites at approximately the same time, and because it takes time for that
to affect the "load" reported by warcprox instances, sites end up being
distributed very unevenly.
2018-01-19 11:35:06 -08:00
Noah Levitt
9e80a3b0d3
Merge pull request #71 from internetarchive/brofurb
JS class-based generalized behavior
2018-01-18 12:23:18 -08:00
Barbara Miller
2773c4ab6f Merge branch 'brofurb' into qa 2018-01-15 19:52:38 -08:00
Barbara Miller
2f3f258856 update copyright dates 2018-01-15 19:39:41 -08:00
Barbara Miller
e52ba4c8ef rm default.js 2018-01-15 19:38:15 -08:00
Barbara Miller
93ceeacfd7 rm obsolete 2018-01-15 19:36:32 -08:00
Barbara Miller
2ce9cf41a1 resolve conflicts 2018-01-15 19:34:47 -08:00
Barbara Miller
9aa670ece5 simple multi-selector test with window.scroll 2018-01-15 17:58:10 -08:00
Barbara Miller
7dccc809d0 use shorter interval 2018-01-15 17:58:10 -08:00
Barbara Miller
06a2b5f817 tidied 2018-01-15 17:58:10 -08:00
Barbara Miller
b979372e85 update copyright 2018-01-15 17:58:10 -08:00
Barbara Miller
93a81a4a37 qa simpleIntervalFunc for now 2018-01-15 17:58:10 -08:00
Barbara Miller
b589324a05 add simplerIntervalFunc... 2018-01-15 17:58:10 -08:00
Barbara Miller
f78e1ff710 minor edits 2018-01-15 17:58:10 -08:00
Barbara Miller
d0203ff9eb tweaks post-troubleshooting ARI-5241 2018-01-15 17:58:10 -08:00
Barbara Miller
dd3b041eec class-based generalized behavior 2018-01-15 17:58:10 -08:00
Barbara Miller
34fb4baf00 WIP: class-based generalized behavior 2018-01-15 17:58:10 -08:00
Barbara Miller
b968397fbe update default selectors 2018-01-15 17:58:10 -08:00
Barbara Miller
e364b79796 refurb behaviors.yaml 171015 2018-01-15 17:58:10 -08:00
Noah Levitt
016bd5d3f7
Merge pull request #77 from vbanos/chrome-stop-del-tmpdir
Fix to delete tmpdir on Chrome.stop()
2018-01-15 10:36:50 -08:00
Vangelis Banos
820c7cd8cc Fix to delete tmpdir on Chrome.stop()
The ``self._home_tmpdir.cleanup()`` cmd is not always executed when
stopping Chrome. As a result, a large number of ``/tmp/tmpXXX`` dirs are
created in production.

The reason is that ``Chrome.stop()`` execution can stop in the ``return``
statement in the following line:
https://github.com/internetarchive/brozzler/blob/master/brozzler/chrome.py#L268
and ``cleanup()`` does not run.

Moving the ``cleanup()`` call into the ``finally`` part of the
``try/except/finally`` block makes it always run at the end of
``Chrome.stop()``, so the tmp directory is cleaned up in every case.
2018-01-15 13:09:43 +00:00
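The pattern described in this commit can be sketched as below. `FakeChrome` and its `process_running` attribute are hypothetical stand-ins, not brozzler's `Chrome` class; the point is only that a `finally` clause runs even when the `try` body exits through an early `return`.

```python
import os
import tempfile


class FakeChrome:
    """Hypothetical stand-in used only to show the cleanup pattern."""

    def __init__(self):
        self._home_tmpdir = tempfile.TemporaryDirectory()
        self.process_running = False

    def stop(self):
        try:
            if not self.process_running:
                return  # early return: the finally clause still runs
            # ... terminate the browser process here ...
        finally:
            # Runs on normal exit, early return, or exception
            self._home_tmpdir.cleanup()


chrome = FakeChrome()
tmp_path = chrome._home_tmpdir.name
chrome.stop()
print(os.path.exists(tmp_path))  # False: directory removed despite the early return
```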
Noah Levitt
4f37dc0104
Merge pull request #73 from vbanos/configurable-js-templates
Configurable JS templates location
2018-01-10 11:43:16 -08:00
Noah Levitt
46fcd055a6
Merge pull request #74 from vbanos/disable-background-networking
Add --disable-background-networking chromium flag
2018-01-09 09:57:23 -08:00
Vangelis Banos
3984ca017f Replace cwd var with d 2018-01-09 06:33:03 +00:00
Barbara Miller
5901434c2b Merge branch 'pageInterstitialShown' into qa 2018-01-08 08:27:45 -08:00
Barbara Miller
37c5720729 log Page.interstitialShown 2018-01-08 08:26:44 -08:00
Vangelis Banos
3b0175c65b Add --disable-background-networking chromium flag
Chromium browser docs describe this as follows:
Disable several subsystems which run network requests in the
background. This is for use when doing network performance testing to
avoid noise in the measurements.

Testing indicates that irrelevant HTTP requests like the following stop
with this improvement.
```
HEAD http://ugfgntuqva/ HTTP/1.1
```
2018-01-06 19:07:22 +00:00
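A minimal sketch of adding the flag to a Chromium command line. The function name, executable path, port, and profile directory are illustrative assumptions; only `--disable-background-networking` comes from this commit.

```python
def chrome_args(chrome_exe, port, user_data_dir):
    """Build a Chromium argument list (hypothetical helper)."""
    return [
        chrome_exe,
        "--remote-debugging-port=%d" % port,
        "--user-data-dir=%s" % user_data_dir,
        # Flag added by this commit: suppress background network requests
        "--disable-background-networking",
    ]


print(chrome_args("chromium-browser", 9222, "/tmp/profile"))
```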
Vangelis Banos
dacfba330c Configurable JS templates location
Brozzler hard-codes the locations of its JS behavior templates:
``brozzler/behaviors.yaml`` and ``brozzler/js-templates/``. With this
change, you can pass the optional ``behaviors_dir`` parameter to
``browser.browse_page`` to set a custom location and load your own JS
behaviors.
2018-01-04 17:37:02 +00:00
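One way such a parameter might be resolved is sketched below. It assumes, without confirmation from the log, that a custom directory mirrors the default layout (a `behaviors.yaml` file next to a `js-templates/` directory). `behavior_paths` and `DEFAULT_BEHAVIORS_DIR` are hypothetical names.

```python
import os

# Assumed default, based on the paths named in the commit message
DEFAULT_BEHAVIORS_DIR = "brozzler"


def behavior_paths(behaviors_dir=None):
    """Return (behaviors.yaml path, js-templates dir) for a behaviors dir.

    Falls back to the default location when no directory is given.
    """
    d = behaviors_dir or DEFAULT_BEHAVIORS_DIR
    return os.path.join(d, "behaviors.yaml"), os.path.join(d, "js-templates")


print(behavior_paths())
print(behavior_paths("/opt/custom-behaviors"))
```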