Noah Levitt
c5279f3348
log everything in this test
2019-10-03 16:23:33 -07:00
Noah Levitt
781a4d424d
yet more logging
2019-10-03 15:54:56 -07:00
Noah Levitt
5c8d4d57b1
fix TypeError
2019-10-03 12:10:29 -07:00
Noah Levitt
2e20380618
python 3.4 is deprecated, tests are failing anyway
2019-10-03 11:41:51 -07:00
Noah Levitt
1c51396111
trace logging again
2019-10-03 11:41:34 -07:00
Noah Levitt
fdb24f2893
add some trace logging to debug test failure
2019-10-02 15:45:59 -07:00
Noah Levitt
8a51f28c3d
fix dishonest travis badge
2019-10-02 15:02:56 -07:00
Noah Levitt
85e6027838
bump version after merge
2019-09-27 10:40:59 -07:00
Noah Levitt
996070b35c
Merge pull request #167 from vbanos/console-debug-only
...
Enable Console and Runtime outputs only when debugging
2019-09-27 10:40:17 -07:00
Vangelis Banos
fed5e6b741
Enable Console and Runtime outputs only when debugging
...
When capturing a page, we receive a LOT of messages from chrome.
Examining these messages, we see that we can reduce them a bit to speed
up Brozzler.
We always use `Console.enable` which returns all browser console output.
Also, we always use `Runtime.enable`. Doc says:
https://chromedevtools.github.io/devtools-protocol/1-3/Runtime#method-enable
Enables reporting of execution contexts creation by means of
executionContextCreated event. When the reporting gets enabled the event
will be sent immediately for each existing execution context.
These outputs are useful when debugging but not in production.
If we disable them, we reduce the websocket traffic and improve
performance. With this PR, we enable them only when the current logging
level is `DEBUG`.
Counting the number of messages before and after the change, we see
improvements like:
https://www.gnome.org/technologies/ 220 -> 202 messages
https://www.whitehouse.gov/issues/budget-spending/ 203 -> 189 messages
2019-09-27 13:24:06 +00:00
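A minimal sketch of the idea in fed5e6b741, not the actual brozzler code: decide which DevTools domains to enable from the logger's effective level. The function name `startup_messages` and the always-on `Network.enable` / `Page.enable` entries are assumptions made for illustration. Only `Console.enable` and `Runtime.enable` come from the commit message.

```python
import logging


def startup_messages(logger):
    """Build the DevTools commands to send when a browser session starts.

    Hypothetical helper: the always-on commands below are illustrative.
    """
    methods = ["Network.enable", "Page.enable"]
    # Console and Runtime events are only worth the websocket traffic
    # when debug output is actually being logged.
    if logger.isEnabledFor(logging.DEBUG):
        methods += ["Console.enable", "Runtime.enable"]
    return [{"id": i, "method": m} for i, m in enumerate(methods, start=1)]
```

Note that `isEnabledFor(logging.DEBUG)` is also true for levels more verbose than `DEBUG`, which may or may not match what the PR does.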
Noah Levitt
7273c7c3a2
Merge pull request #166 from CorentinB/facebook-ads-lib
...
Add support for Facebook ads library and fix closing
2019-09-26 14:13:47 -07:00
Corentin Barreau
e701e3f101
Add: break after closing the first visible element
2019-09-26 21:44:25 +02:00
Corentin Barreau
101f7f2e4a
Remove: useless comment
2019-09-25 19:48:38 +02:00
Corentin Barreau
fb30fb9aa3
Add: isVisible check for close selectors
...
Modify: doTarget - Revert to initial code
2019-09-25 16:19:41 +02:00
Corentin Barreau
5c5743ea11
Fix: closeSelector not being clicked
...
Add: support for facebook.com/ads/library - Open and close metrics for ads
2019-09-25 16:10:59 +02:00
Noah Levitt
efa185a8dc
Merge pull request #160 from vbanos/behavior-timeout
...
More accurate JS behavior timeout
2019-09-24 12:11:37 -07:00
Noah Levitt
eb30ba0c33
Merge pull request #165 from vbanos/stderr-stdout-exception-handling
...
Improve exception handling when reading STDIN/STDERR
2019-09-24 12:03:06 -07:00
Vangelis Banos
f42ff08da1
Improve exception handling when reading STDIN/STDERR
...
When the chrome process dies and we try to read STDIN/STDERR, we get
`ValueError: I/O operation on closed file` or
`OSError: [Errno 9] Bad file descriptor`.
We modify the `readline_nonblock` method to catch these exceptions and
return whatever it has read into the buffer up to that point.
2019-09-19 20:08:55 +00:00
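The behavior described in f42ff08da1 could look roughly like the sketch below. It is a standalone function written for illustration, assuming an unbuffered binary file object on a POSIX system. The real `readline_nonblock` is a method and may differ in its details. The `timeout` parameter is an assumption.

```python
import os
import select


def readline_nonblock(f, timeout=0.5):
    """Read at most one line from f, giving up after `timeout` idle seconds."""
    buf = b""
    try:
        # Stop at a newline, or when no data arrives within `timeout`.
        while not buf.endswith(b"\n") and select.select([f], [], [], timeout)[0]:
            chunk = f.read(1)
            if not chunk:  # EOF: the writer closed its end
                break
            buf += chunk
    except (ValueError, OSError):
        # Raised when the file was closed underneath us, for example
        # because the browser process died. Keep what was read so far.
        pass
    return buf
```

Returning the partial buffer instead of propagating the exception means the caller still gets to log the last bytes the process wrote before it went away.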
Vangelis Banos
0b28a4a57f
More accurate JS behavior timeout
...
If you use a JS behavior timeout smaller than 7 sec, the JS behavior
will always need 7 sec because `sleep(7)` is hard-coded there.
We make a minor addition to use `min(timeout, 7)` for sleep so it will
finish faster when using a smaller JS behavior timeout.
2019-08-22 21:15:44 +00:00
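The `min(timeout, 7)` change in 0b28a4a57f can be illustrated with a small polling loop. This is a sketch under assumptions, not the brozzler implementation: `run_behavior`, `is_finished`, and the injectable `sleep` and `clock` parameters are hypothetical names introduced so the example can run on its own.

```python
import time


def run_behavior(is_finished, timeout, sleep=time.sleep, clock=time.monotonic):
    """Poll is_finished() until it returns True or `timeout` seconds pass."""
    start = clock()
    while True:
        # Never sleep longer than the overall timeout; with a hard-coded
        # sleep(7), a 2-second timeout would still take 7 seconds.
        sleep(min(timeout, 7))
        if is_finished():
            return True
        if clock() - start >= timeout:
            return False
```

With `timeout=2` the loop sleeps 2 seconds per iteration. With `timeout=30` it keeps the original 7-second polling interval.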
Noah Levitt
16f886259d
Merge pull request #158 from galgeek/aitfive-1668-soundcoud
...
capture soundcloud user page before capturing tracks
2019-08-15 15:46:55 -07:00
Noah Levitt
94cd6cacb6
bump version after merge
2019-07-18 11:07:27 -07:00
Noah Levitt
726c6effed
Merge pull request #157 from vbanos/block-amp-analytics
...
Block AMP analytics JS script
2019-07-18 11:07:09 -07:00
Barbara Miller
9cc60449d7
skip downloading tracks from soundcloud user page
2019-07-17 17:45:02 -07:00
Vangelis Banos
6bd4fd6532
Block AMP analytics JS script
...
AMP analytics is part of Google Analytics, so we need to block it for
similar reasons.
AMP analytics reference:
https://developers.google.com/analytics/devguides/collection/amp-analytics/
2019-06-26 21:19:35 +00:00
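One way the blocking in 6bd4fd6532 might be expressed is as a DevTools `Network.setBlockedURLs` command, which accepts URL patterns with `*` wildcards. The sketch below only builds the JSON message. The pattern strings, including the AMP one, are assumptions for illustration and are not taken from the commit.

```python
import json

# Illustrative patterns only; the exact strings used by the project
# are not given in the commit message.
BLOCKED_URL_PATTERNS = [
    "*google-analytics.com/analytics.js",
    "*google-analytics.com/ga.js",
    "*cdn.ampproject.org/*/amp-analytics*.js",  # assumed AMP analytics pattern
]


def set_blocked_urls_message(msg_id, patterns=BLOCKED_URL_PATTERNS):
    """Serialize a Network.setBlockedURLs command for the devtools websocket."""
    return json.dumps({
        "id": msg_id,
        "method": "Network.setBlockedURLs",
        "params": {"urls": list(patterns)},
    })
```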
Noah Levitt
8107abd804
Merge pull request #154 from vbanos/fix-brozzling-test
...
Fix test_brozzling::httpd fixture
1.5.6
2019-05-16 14:23:04 -07:00
Noah Levitt
5fdb2dd39c
documentation tweak
2019-05-16 14:03:43 -07:00
Noah Levitt
aa2d491009
i don't know where pyyaml 5.8 came from
2019-05-16 01:29:05 -07:00
Noah Levitt
42ddfba923
Merge pull request #150 from nlevitt/purge-old
...
Purge old
2019-05-16 00:29:58 -07:00
Noah Levitt
40331f02ba
Merge pull request #153 from vbanos/warn-deprecated
...
logging.warn is deprecated and replaced by logging.warning
2019-05-16 00:27:22 -07:00
Noah Levitt
f8db17ce3d
bump version after merge
2019-05-16 00:22:29 -07:00
Noah Levitt
eb34bebb91
Merge pull request #149 from nlevitt/travis-py37
...
trying to make this work with xenial for travis
2019-05-16 00:22:08 -07:00
Noah Levitt
c651bcdd18
remove some travis-ci debugging stuff
2019-05-16 00:21:28 -07:00
Noah Levitt
0a1360ab25
don't use localhost for test http server...
...
... because apparently sometimes chromium bypasses the proxy for local
addresses
2019-05-15 18:49:18 -07:00
Noah Levitt
f8165dc02b
work around pytest issue until fix is out
...
https://github.com/pytest-dev/pytest/issues/5257
2019-05-15 18:46:21 -07:00
Vangelis Banos
a1f9122317
Fix test_brozzling::httpd fixture
...
We used `self.headers.getheader`, which no longer works under Python 3.
We replace it with `self.headers.get`.
We also change the code to write binary data to `self.wfile`, because
writing `str` or `None` raises an exception.
2019-05-14 16:29:52 +00:00
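The two fixes in a1f9122317 can be shown with a small standalone handler. This is a sketch, not the test fixture itself: the class name `EchoHeaderHandler` and the `X-Example` header are made up for illustration.

```python
import http.server


class EchoHeaderHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that echoes one request header back."""

    def do_GET(self):
        # In Python 3, self.headers is an email.message.Message subclass:
        # use .get(), which returns the default when the header is absent.
        value = self.headers.get("X-Example", "")
        # wfile is a binary stream, so encode the str before writing.
        body = ("X-Example: %s\n" % value).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep test output quiet.
        pass
```

Passing an explicit default to `.get()` avoids ever formatting or writing `None`, which is one of the failure modes the commit mentions.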
Vangelis Banos
a2ac3a0374
logging.warn is deprecated and replaced by logging.warning
...
We replace it everywhere in the code base.
2019-05-14 12:10:59 +00:00
Noah Levitt
ee8ef23f0c
fix mistake in job-conf.rst
2019-04-30 10:49:48 -07:00
Noah Levitt
411b3f266a
bump version after merge
2019-04-09 22:07:51 +00:00
Noah Levitt
d4386491df
Merge pull request #151 from nlevitt/no-cerberus-normalize
...
don't attempt cerberus normalization
2019-04-09 15:06:17 -07:00
Noah Levitt
5385232b40
don't attempt cerberus normalization
...
which encumbers the validation with additional requirements,
specifically makes it difficult to validate a subclass of `dict` because
it expects a constructor that works like dict.__init__()
2019-04-09 01:45:37 -07:00
Noah Levitt
8dfd92cf7f
fix this utility
2019-04-09 01:44:14 -07:00
Noah Levitt
433b201b52
use logging.warning() to quiet py37 warnings
2019-04-09 01:43:38 -07:00
Noah Levitt
dfd9d9ecdd
omfg
2019-04-04 17:22:15 -07:00
Noah Levitt
fd0fe811e9
so little output from chromium-browser :(
...
https://travis-ci.org/internetarchive/brozzler/jobs/515942434
could it be problems running as this other user?
2019-04-04 16:09:21 -07:00
Noah Levitt
55541be9e9
let's see chromium output inside brozzler-worker
...
using --trace, because chromium seems to be working ok when we just run
it
2019-04-04 15:11:24 -07:00
Noah Levitt
58d1d1c429
chromium-browser with no args isn't dying at start
...
what about with all the args?
2019-04-04 14:38:29 -07:00
Noah Levitt
473e891fb4
not sure if --disable-extensions did something
2019-04-04 13:34:45 -07:00
Noah Levitt
6d145c87c8
chromium-browser --disable-extensions ?
2019-04-04 13:24:12 -07:00
Noah Levitt
0d46d8ce19
still trying to figure out what's up with chromium
2019-04-04 13:15:17 -07:00
Noah Levitt
45ac12117a
maybe Xvnc.log will tell us something
2019-04-04 13:09:02 -07:00