Vangelis Banos
0b28a4a57f
More accurate JS behavior timeout
If you use a JS behavior timeout smaller than 7 sec, the JS behavior
will always need 7 sec because `sleep(7)` is hard-coded there.
We make a minor addition to use `min(timeout, 7)` for sleep so it will
finish faster when using a smaller JS behavior timeout.
2019-08-22 21:15:44 +00:00
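A minimal Python sketch of the polling idea behind "More accurate JS behavior timeout". It assumes the behavior is started elsewhere and exposes a hypothetical `is_finished()` callable. `wait_for_behavior` and `max_interval` are illustrative names, not brozzler's actual code. Capping the sleep at `min(timeout, 7)` means a 2 sec timeout waits about 2 sec instead of a full 7.

```python
import time


def wait_for_behavior(is_finished, timeout, max_interval=7.0):
    """Poll is_finished() until it returns True or `timeout` seconds pass.

    Hypothetical helper: returns True if the behavior finished in time,
    False if the timeout expired first.
    """
    # Never sleep longer than the timeout itself, so short timeouts are
    # not rounded up to the hard-coded 7 second interval.
    check_interval = min(timeout, max_interval)
    start = time.monotonic()
    while True:
        time.sleep(check_interval)
        if is_finished():
            return True
        if time.monotonic() - start >= timeout:
            return False
```

For example, `wait_for_behavior(lambda: False, timeout=0.5)` gives up after roughly half a second. The total wait can still overshoot the timeout by up to one interval, which is why the interval itself is capped.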
Noah Levitt
16f886259d
Merge pull request #158 from galgeek/aitfive-1668-soundcoud
capture soundcloud user page before capturing tracks
2019-08-15 15:46:55 -07:00
Noah Levitt
94cd6cacb6
bump version after merge
2019-07-18 11:07:27 -07:00
Noah Levitt
726c6effed
Merge pull request #157 from vbanos/block-amp-analytics
Block AMP analytics JS script
2019-07-18 11:07:09 -07:00
Barbara Miller
9cc60449d7
skip downloading tracks from soundcloud user page
2019-07-17 17:45:02 -07:00
Vangelis Banos
6bd4fd6532
Block AMP analytics JS script
AMP analytics is part of Google Analytics. We need to block it for
similar reasons.
AMP analytics reference:
https://developers.google.com/analytics/devguides/collection/amp-analytics/
2019-06-26 21:19:35 +00:00
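A rough sketch of matching request URLs against a block list that includes the AMP analytics script. Chrome's DevTools protocol has a `Network.setBlockedURLs` method that accepts URL patterns with `*` wildcards. The sketch below only imitates that kind of matching locally with `fnmatch`. The pattern list and `is_blocked` are assumptions for illustration, not brozzler's actual configuration.

```python
from fnmatch import fnmatchcase

# Illustrative glob patterns (assumptions, not brozzler's real list).
# In fnmatch, "*" also matches "/", so one pattern covers any AMP version.
BLOCKED_URL_PATTERNS = [
    "*google-analytics.com/analytics.js*",
    "*google-analytics.com/ga.js*",
    "*cdn.ampproject.org/*/amp-analytics*.js",
]


def is_blocked(url, patterns=BLOCKED_URL_PATTERNS):
    """Return True if url matches any of the blocked glob patterns."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)
```

With these patterns, `https://cdn.ampproject.org/v0/amp-analytics-0.1.js` is blocked while the core AMP runtime `https://cdn.ampproject.org/v0.js` is still fetched.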
Noah Levitt
8107abd804
Merge pull request #154 from vbanos/fix-brozzling-test
Fix test_brozzling::httpd fixture
1.5.6
2019-05-16 14:23:04 -07:00
Noah Levitt
5fdb2dd39c
documentation tweak
2019-05-16 14:03:43 -07:00
Noah Levitt
aa2d491009
i don't know where pyyaml 5.8 came from
2019-05-16 01:29:05 -07:00
Noah Levitt
42ddfba923
Merge pull request #150 from nlevitt/purge-old
Purge old
2019-05-16 00:29:58 -07:00
Noah Levitt
40331f02ba
Merge pull request #153 from vbanos/warn-deprecated
logging.warn is deprecated and replaced by logging.warning
2019-05-16 00:27:22 -07:00
Noah Levitt
f8db17ce3d
bump version after merge
2019-05-16 00:22:29 -07:00
Noah Levitt
eb34bebb91
Merge pull request #149 from nlevitt/travis-py37
trying to make this work with xenial for travis
2019-05-16 00:22:08 -07:00
Noah Levitt
c651bcdd18
remove some travis-ci debugging stuff
2019-05-16 00:21:28 -07:00
Noah Levitt
0a1360ab25
don't use localhost for test http server...
... because apparently sometimes chromium bypasses the proxy for local
addresses
2019-05-15 18:49:18 -07:00
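A best-effort sketch of picking a non-loopback address for a test HTTP server, so the browser does not treat the URL as local and skip the proxy. The UDP `connect()` trick sends no packets; it only asks the OS which local address it would route from. The target address `10.255.255.255` and the helper name are assumptions. On a machine with no usable route it falls back to loopback, so callers should still check the result.

```python
import socket


def non_loopback_address(fallback="127.0.0.1"):
    """Guess this host's outward-facing IPv4 address (hypothetical helper).

    Connecting a UDP socket does not send anything; the destination only
    needs to be routable, not reachable.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect(("10.255.255.255", 1))
            return s.getsockname()[0]
        except OSError:
            # No route available (e.g. an offline sandbox): use fallback.
            return fallback
```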
Noah Levitt
f8165dc02b
work around pytest issue until fix is out
https://github.com/pytest-dev/pytest/issues/5257
2019-05-15 18:46:21 -07:00
Vangelis Banos
a1f9122317
Fix test_brozzling::httpd fixture
We used `self.headers.getheader`, which no longer works in Python 3. We
replace it with `self.headers.get`.
We change the code to write binary data to `self.wfile` because writing
str and/or None raises an exception.
2019-05-14 16:29:52 +00:00
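A minimal Python 3 sketch of the two fixes in "Fix test_brozzling::httpd fixture": reading headers with `.get()` and writing bytes to `self.wfile`. `EchoUserAgentHandler` and `start_httpd` are hypothetical names; the real fixture serves test files rather than echoing a header. Binding to `127.0.0.1` is only a default for trying it standalone.

```python
import http.server
import threading


class EchoUserAgentHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test handler that echoes the User-Agent header."""

    def do_GET(self):
        # Python 3: self.headers is an http.client.HTTPMessage, which has
        # .get() but not the Python 2 style .getheader().
        user_agent = self.headers.get("User-Agent") or ""
        body = ("user-agent: %s\n" % user_agent).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        # self.wfile is a binary stream: write bytes, never str or None.
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_httpd(host="127.0.0.1"):
    """Serve on a free port in a daemon thread and return the server."""
    httpd = http.server.HTTPServer((host, 0), EchoUserAgentHandler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    return httpd
```

A test would read the port from `httpd.server_address` and call `httpd.shutdown()` when done.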
Vangelis Banos
a2ac3a0374
logging.warn is deprecated and replaced by logging.warning
We replace it everywhere in the code base.
2019-05-14 12:10:59 +00:00
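`logging.warning()` is the documented name; `warn` is a deprecated alias. A rough sketch of doing the replacement mechanically across source text. The regex is a heuristic (an assumption, not how the commit was made): it skips `warnings.warn()`, which is a different, non-deprecated function, but it will miss indirect uses such as `getattr(log, "warn")`, so the resulting diff still needs review.

```python
import re

# Match ".warn(" calls on logging modules/loggers, but not warnings.warn().
_WARN_CALL = re.compile(r"(?<!warnings)\.warn\(")


def replace_deprecated_warn(source):
    """Rewrite `.warn(` calls to `.warning(` in Python source text."""
    return _WARN_CALL.sub(".warning(", source)
```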
Noah Levitt
ee8ef23f0c
fix mistake in job-conf.rst
2019-04-30 10:49:48 -07:00
Noah Levitt
411b3f266a
bump version after merge
2019-04-09 22:07:51 +00:00
Noah Levitt
d4386491df
Merge pull request #151 from nlevitt/no-cerberus-normalize
don't attempt cerberus normalization
2019-04-09 15:06:17 -07:00
Noah Levitt
5385232b40
don't attempt cerberus normalization
which encumbers the validation with additional requirements,
specifically makes it difficult to validate a subclass of `dict` because
it expects a constructor that works like dict.__init__()
2019-04-09 01:45:37 -07:00
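To illustrate the failure mode described in "don't attempt cerberus normalization", here is a toy validator. It is not cerberus's code or API; `Document` and `validate` are hypothetical. The normalizing path rebuilds the document with `type(document)(document)`, which only works if the class's constructor behaves like `dict.__init__()`. In cerberus 1.x, the corresponding switch is the `normalize` argument of `Validator.validate()`.

```python
class Document(dict):
    """Hypothetical dict subclass whose constructor differs from dict's.

    Like a database-backed record, it needs a handle (`db`) plus the data.
    """

    def __init__(self, db, d):
        super().__init__(d)
        self.db = db


def validate(document, required_keys, normalize=True):
    """Toy validator: checks that every required key is present.

    With normalize=True it first copies the document via
    type(document)(document), assuming a dict-like constructor.
    """
    if normalize:
        # Raises TypeError for Document: its __init__ wants (db, d).
        document = type(document)(document)
    return all(key in document for key in required_keys)
```

Validating with `normalize=False` checks the data as-is and never needs to construct a new instance of the subclass.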
Noah Levitt
8dfd92cf7f
fix this utility
2019-04-09 01:44:14 -07:00
Noah Levitt
433b201b52
use logging.warning() to quiet py37 warnings
2019-04-09 01:43:38 -07:00
Noah Levitt
dfd9d9ecdd
omfg
2019-04-04 17:22:15 -07:00
Noah Levitt
fd0fe811e9
so little output from chromium-browser :(
https://travis-ci.org/internetarchive/brozzler/jobs/515942434
could it be problems running as this other user?
2019-04-04 16:09:21 -07:00
Noah Levitt
55541be9e9
let's see chromium output inside brozzler-worker
using --trace, because chromium seems to be working ok when we just run
it
2019-04-04 15:11:24 -07:00
Noah Levitt
58d1d1c429
chromium-browser with no args isn't dying at start
what about with all the args?
2019-04-04 14:38:29 -07:00
Noah Levitt
473e891fb4
not sure if --disable-extensions did something
2019-04-04 13:34:45 -07:00
Noah Levitt
6d145c87c8
chromium-browser --disable-extensions ?
2019-04-04 13:24:12 -07:00
Noah Levitt
0d46d8ce19
still trying to figure out what's up with chromium
2019-04-04 13:15:17 -07:00
Noah Levitt
45ac12117a
maybe Xvnc.log will tell us something
2019-04-04 13:09:02 -07:00
Noah Levitt
8303fd3ab3
guessing DISPLAY was the issue here
https://travis-ci.org/internetarchive/brozzler/jobs/515882174#L610
2019-04-04 12:50:50 -07:00
Noah Levitt
899794f2da
debug what's going on with chromium in travis
see https://travis-ci.org/internetarchive/brozzler/jobs/514858838
(unroll "sudo cat /var/log/brozzler-worker.log")
2019-04-02 20:16:01,792 18595 CRITICAL BrozzlingThread:42073 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:412) unexpected exception
Traceback (most recent call last):
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 379, in brozzle_site
    enable_youtube_dl=not self._skip_youtube_dl)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 215, in brozzle_page
    browser, site, page, on_screenshot, on_request)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 292, in _browse_page
    cookie_db=site.get('cookie_db'))
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/browser.py", line 341, in start
    self.websock_url = self.chrome.start(**kwargs)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 200, in start
    return self._websocket_url()
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 247, in _websocket_url
    raise e
Exception: chrome process died with status 1
2019-04-04 12:38:46 -07:00
Noah Levitt
9459ed40d0
fix typo
2019-04-04 12:38:41 -07:00
Noah Levitt
68ce9eac76
debugging travis-ci is a slow process
2019-04-02 13:05:36 -07:00
Noah Levitt
85c6ac0ab2
fix next travis-ci problem
2019-04-02 12:05:08 -07:00
Noah Levitt
06e072a716
update some dependencies
2019-04-02 17:58:35 +00:00
Noah Levitt
8b6e5cbfb9
new option brozzler-purge --finished-before=...
2019-04-02 17:58:13 +00:00
Noah Levitt
9c658cddf7
fix a couple of svc definitions
2019-03-24 16:06:36 -07:00
Noah Levitt
48bb03418f
daemontools
2019-03-23 00:26:39 -07:00
Noah Levitt
18b4a26db6
porting ansible config to xenial
no more upstart, switch to daemontools, among other things
2019-03-22 23:50:46 -07:00
Noah Levitt
19522aff85
adjusting ansible config for xenial
untested because of vagrant problems
2019-03-19 16:37:13 -07:00
Noah Levitt
d4f8bc768f
trying to make this work with xenial for travis
see error https://travis-ci.org/internetarchive/brozzler/jobs/508141058
2019-03-18 16:38:23 -07:00
Noah Levitt
f2a9908395
travis only has py 3.7 for xenial
2019-03-18 16:20:54 -07:00
Noah Levitt
d729c8d0d5
use yaml.safe_load()
getting new warnings
see https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
2019-03-18 15:49:44 -07:00
Noah Levitt
6f5f090c33
test py 3.7
2019-03-18 15:49:03 -07:00
Noah Levitt
ef981706f4
fix rethinkdb dependency version
2019-03-18 15:08:36 -07:00
Noah Levitt
61274ae994
peg to working doublethink
see: https://github.com/internetarchive/doublethink/commit/f7fc7da725c9b
2019-03-14 20:04:09 +00:00
Noah Levitt
7d5bb4b5d4
Merge pull request #148 from vbanos/disk-cache
Add disk cache options to Chrome
2019-02-12 14:39:49 -08:00
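A small sketch of adding optional disk cache settings when building a Chrome/Chromium command line. `--user-data-dir`, `--disk-cache-dir` and `--disk-cache-size` are real Chromium switches (the size is in bytes). `build_chrome_args` and its parameter names are hypothetical and do not reproduce brozzler's own option names.

```python
def build_chrome_args(user_data_dir, disk_cache_dir=None, disk_cache_size=None):
    """Return a list of Chromium switches (hypothetical helper)."""
    args = ["--user-data-dir=%s" % user_data_dir]
    # Only add the cache switches when asked for them, so the browser
    # keeps its default cache location and size otherwise.
    if disk_cache_dir:
        args.append("--disk-cache-dir=%s" % disk_cache_dir)
    if disk_cache_size is not None:
        # Chromium reads this value as a maximum size in bytes.
        args.append("--disk-cache-size=%d" % disk_cache_size)
    return args
```

For example, `build_chrome_args("/tmp/profile", disk_cache_dir="/tmp/cache", disk_cache_size=100 * 1024 * 1024)` adds `--disk-cache-size=104857600` for a 100 MiB cache.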