1172 Commits

Author SHA1 Message Date
Noah Levitt
94cd6cacb6 bump version after merge 2019-07-18 11:07:27 -07:00
Noah Levitt
726c6effed
Merge pull request #157 from vbanos/block-amp-analytics
Block AMP analytics JS script
2019-07-18 11:07:09 -07:00
Vangelis Banos
6bd4fd6532 Block AMP analytics JS script
AMP analytics is part of Google Analytics. We need to block it for
similar reasons.

AMP analytics reference:

https://developers.google.com/analytics/devguides/collection/amp-analytics/
2019-06-26 21:19:35 +00:00
Noah Levitt
8107abd804
Merge pull request #154 from vbanos/fix-brozzling-test
Fix test_brozzling::httpd fixture
tag: 1.5.6
2019-05-16 14:23:04 -07:00
Noah Levitt
5fdb2dd39c documentation tweak 2019-05-16 14:03:43 -07:00
Noah Levitt
aa2d491009 i don't know where pyyaml 5.8 came from 2019-05-16 01:29:05 -07:00
Noah Levitt
42ddfba923
Merge pull request #150 from nlevitt/purge-old
Purge old
2019-05-16 00:29:58 -07:00
Noah Levitt
40331f02ba
Merge pull request #153 from vbanos/warn-deprecated
logging.warn is deprecated and replaced by logging.warning
2019-05-16 00:27:22 -07:00
Noah Levitt
f8db17ce3d bump version after merge 2019-05-16 00:22:29 -07:00
Noah Levitt
eb34bebb91
Merge pull request #149 from nlevitt/travis-py37
trying to make this work with xenial for travis
2019-05-16 00:22:08 -07:00
Noah Levitt
c651bcdd18 remove some travis-ci debugging stuff 2019-05-16 00:21:28 -07:00
Noah Levitt
0a1360ab25 don't use localhost for test http server...
... because apparently sometimes chromium bypasses the proxy for local
addresses
2019-05-15 18:49:18 -07:00
Noah Levitt
f8165dc02b work around pytest issue until fix is out
https://github.com/pytest-dev/pytest/issues/5257
2019-05-15 18:46:21 -07:00
Vangelis Banos
a1f9122317 Fix test_brozzling::httpd fixture
We used `self.headers.getheader`, which no longer works in Python 3. We
replace it with `self.headers.get`.

We change the code to write binary data to `self.wfile` because we get
an exception for writing str and/or None.
2019-05-14 16:29:52 +00:00
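
The two changes described in a1f9122317 can be illustrated with a small standalone handler. This is only a sketch under assumptions, not the actual test fixture: `EchoHeaderHandler`, `fetch_echo`, and the `X-Example` header are hypothetical names chosen for the example. It shows reading a request header with `self.headers.get` and encoding the body to bytes before writing it to `self.wfile`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeaderHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for illustration, not brozzler's fixture."""

    def do_GET(self):
        # In Python 3, self.headers is an http.client.HTTPMessage.
        # It has .get(name, default) but no .getheader().
        value = self.headers.get("X-Example", "")
        body = ("X-Example: %s\n" % value).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        # wfile is a binary stream, so it must be given bytes, not str.
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_echo(header_value):
    """Serve on an ephemeral port, make one request, return the body."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeaderHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        request = urllib.request.Request(
            url, headers={"X-Example": header_value})
        # An empty ProxyHandler ignores proxy environment variables.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_echo("hello")` returns the response body as bytes, which is the same type the handler had to write.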
Vangelis Banos
a2ac3a0374 logging.warn is deprecated and replaced by logging.warning
We replace it everywhere in the code base.
2019-05-14 12:10:59 +00:00
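
For reference, the replacement described in a2ac3a0374 amounts to calling `warning()` where `warn()` was called before. The sketch below is not taken from the brozzler code base; the logger name `example.brozzling`, the `ListHandler` class, and `log_slow_page` are assumptions made so the example can capture and inspect the emitted record.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def log_slow_page(url):
    # "example.brozzling" is a hypothetical logger name.
    logger = logging.getLogger("example.brozzling")
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        # warning() is the supported spelling; warn() is the
        # deprecated alias that triggers a DeprecationWarning.
        logger.warning("page %s is taking a long time", url)
    finally:
        logger.removeHandler(handler)
    return handler.records
```

The arguments are unchanged by the rename, so the substitution is mechanical.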
Noah Levitt
ee8ef23f0c fix mistake in job-conf.rst 2019-04-30 10:49:48 -07:00
Noah Levitt
411b3f266a bump version after merge 2019-04-09 22:07:51 +00:00
Noah Levitt
d4386491df
Merge pull request #151 from nlevitt/no-cerberus-normalize
don't attempt cerberus normalization
2019-04-09 15:06:17 -07:00
Noah Levitt
5385232b40 don't attempt cerberus normalization
which encumbers the validation with additional requirements,
specifically makes it difficult to validate a subclass of `dict` because
it expects a constructor that works like dict.__init__()
2019-04-09 01:45:37 -07:00
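
The constructor problem mentioned in 5385232b40 can be reproduced without cerberus at all. The sketch below is an assumption-laden illustration, not cerberus's implementation and not brozzler's model classes: `SiteDocument`, `copy_like_dict`, and `validate_in_place` are hypothetical. It shows that any step which rebuilds a mapping by calling its class like `dict(...)` fails for a subclass with a different constructor, while a read-only check does not.

```python
class SiteDocument(dict):
    """Hypothetical dict subclass whose constructor needs an extra argument."""

    def __init__(self, fields=None, *, connection):
        super().__init__(fields or {})
        self.connection = connection


def copy_like_dict(document):
    # Stand-in for any code that rebuilds a mapping by calling its class
    # the way dict() is called. NOT cerberus's actual implementation.
    return type(document)(document.items())


def validate_in_place(document, required_keys):
    # Read-only check: never constructs a new instance of the subclass.
    # Returns the list of required keys that are missing.
    return [key for key in required_keys if key not in document]
```

With `doc = SiteDocument({"seed": "http://example.com/"}, connection=object())`, `validate_in_place(doc, ["seed", "id"])` works, whereas `copy_like_dict(doc)` raises `TypeError` because the required `connection` argument is missing.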
Noah Levitt
8dfd92cf7f fix this utility 2019-04-09 01:44:14 -07:00
Noah Levitt
433b201b52 use logging.warning() to quiet py37 warnings 2019-04-09 01:43:38 -07:00
Noah Levitt
dfd9d9ecdd omfg 2019-04-04 17:22:15 -07:00
Noah Levitt
fd0fe811e9 so little output from chromium-browser :(
https://travis-ci.org/internetarchive/brozzler/jobs/515942434

could it be problems running as this other user?
2019-04-04 16:09:21 -07:00
Noah Levitt
55541be9e9 let's see chromium output inside brozzler-worker
using --trace, because chromium seems to be working ok when we just run
it
2019-04-04 15:11:24 -07:00
Noah Levitt
58d1d1c429 chromium-browser with no args isn't dying at start
what about with all the args?
2019-04-04 14:38:29 -07:00
Noah Levitt
473e891fb4 not sure if --disable-extensions did something 2019-04-04 13:34:45 -07:00
Noah Levitt
6d145c87c8 chromium-browser --disable-extensions ? 2019-04-04 13:24:12 -07:00
Noah Levitt
0d46d8ce19 still trying to figure out what's up with chromium 2019-04-04 13:15:17 -07:00
Noah Levitt
45ac12117a maybe Xvnc.log will tell us something 2019-04-04 13:09:02 -07:00
Noah Levitt
8303fd3ab3 guessing DISPLAY was the issue here
https://travis-ci.org/internetarchive/brozzler/jobs/515882174#L610
2019-04-04 12:50:50 -07:00
Noah Levitt
899794f2da debug what's going on with chromium in travis
see https://travis-ci.org/internetarchive/brozzler/jobs/514858838
(unroll "sudo cat /var/log/brozzler-worker.log")

2019-04-02 20:16:01,792 18595 CRITICAL BrozzlingThread:42073 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:412) unexpected exception
Traceback (most recent call last):
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 379, in brozzle_site
    enable_youtube_dl=not self._skip_youtube_dl)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 215, in brozzle_page
    browser, site, page, on_screenshot, on_request)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 292, in _browse_page
    cookie_db=site.get('cookie_db'))
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/browser.py", line 341, in start
    self.websock_url = self.chrome.start(**kwargs)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 200, in start
    return self._websocket_url()
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 247, in _websocket_url
    raise e
Exception: chrome process died with status 1
2019-04-04 12:38:46 -07:00
Noah Levitt
9459ed40d0 fix typo 2019-04-04 12:38:41 -07:00
Noah Levitt
68ce9eac76 debugging travis-ci is a slow process 2019-04-02 13:05:36 -07:00
Noah Levitt
85c6ac0ab2 fix next travis-ci problem 2019-04-02 12:05:08 -07:00
Noah Levitt
06e072a716 update some dependencies 2019-04-02 17:58:35 +00:00
Noah Levitt
8b6e5cbfb9 new option brozzler-purge --finished-before=... 2019-04-02 17:58:13 +00:00
Noah Levitt
9c658cddf7 fix a couple of svc definitions 2019-03-24 16:06:36 -07:00
Noah Levitt
48bb03418f daemontools 2019-03-23 00:26:39 -07:00
Noah Levitt
18b4a26db6 porting ansible config to xenial
no more upstart, switch to daemontools, among other things
2019-03-22 23:50:46 -07:00
Noah Levitt
19522aff85 adjusting ansible config for xenial
untested because of vagrant problems
2019-03-19 16:37:13 -07:00
Noah Levitt
d4f8bc768f trying to make this work with xenial for travis
see error https://travis-ci.org/internetarchive/brozzler/jobs/508141058
2019-03-18 16:38:23 -07:00
Noah Levitt
f2a9908395 travis only has py 3.7 for xenial 2019-03-18 16:20:54 -07:00
Noah Levitt
d729c8d0d5 use yaml.safe_load()
getting new warnings
see https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
2019-03-18 15:49:44 -07:00
Noah Levitt
6f5f090c33 test py 3.7 2019-03-18 15:49:03 -07:00
Noah Levitt
ef981706f4 fix rethinkdb dependency version 2019-03-18 15:08:36 -07:00
Noah Levitt
61274ae994 peg to working doublethink
see: https://github.com/internetarchive/doublethink/commit/f7fc7da725c9b
2019-03-14 20:04:09 +00:00
Noah Levitt
7d5bb4b5d4
Merge pull request #148 from vbanos/disk-cache
Add disk cache options to Chrome
2019-02-12 14:39:49 -08:00
Vangelis Banos
9c48a6fa11 Use disk cache params only on Chrome.start
Use `disk_cache_dir` and `disk_cache_size` only on `Chrome.start` and
not on `Chrome.__init__`.

Drop `disk_cache_dir` and `disk_cache_size` class attributes.
2019-02-12 20:59:08 +00:00
Vangelis Banos
adeca823dd Remove stale comment 2019-02-12 07:21:44 +00:00
Vangelis Banos
31e611771e Improve disk cache options
Remove `--disable-cache`; it is no longer used.

Rename `disk_cache` to `disk_cache_dir` and use only path (str)
argument.

Decouple `--disk-cache-size` from `--disk-cache-dir` so it is possible
to use either or both.
2019-02-07 07:42:45 +00:00
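
The decoupling described in 31e611771e can be sketched as a helper that adds each Chromium switch only when its own parameter is set. This is a minimal illustration, not the code in `brozzler/chrome.py`: the function name `disk_cache_args` is hypothetical, and it assumes the two parameters map directly onto Chromium's `--disk-cache-dir` and `--disk-cache-size` switches.

```python
def disk_cache_args(disk_cache_dir=None, disk_cache_size=None):
    """Build the cache-related command line arguments (hypothetical helper)."""
    args = []
    # Each option is checked on its own, so either or both may be given.
    if disk_cache_dir is not None:
        args.append("--disk-cache-dir=%s" % disk_cache_dir)
    if disk_cache_size is not None:
        args.append("--disk-cache-size=%s" % disk_cache_size)
    return args
```

For example, `disk_cache_args(disk_cache_size=1000000)` yields only the size switch, and calling it with no arguments yields an empty list.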