1420 Commits

Author SHA1 Message Date
Vangelis Banos
6bd4fd6532 Block AMP analytics JS script
AMP analytics is part of Google Analytics. We need to block it for
similar reasons.

AMP analytics reference:

https://developers.google.com/analytics/devguides/collection/amp-analytics/
2019-06-26 21:19:35 +00:00
Noah Levitt
8107abd804
Merge pull request #154 from vbanos/fix-brozzling-test
Fix test_brozzling::httpd fixture
1.5.6
2019-05-16 14:23:04 -07:00
Noah Levitt
5fdb2dd39c documentation tweak 2019-05-16 14:03:43 -07:00
Noah Levitt
aa2d491009 i don't know where pyyaml 5.8 came from 2019-05-16 01:29:05 -07:00
Noah Levitt
42ddfba923
Merge pull request #150 from nlevitt/purge-old
Purge old
2019-05-16 00:29:58 -07:00
Noah Levitt
40331f02ba
Merge pull request #153 from vbanos/warn-deprecated
logging.warn is deprecated and replaced by logging.warning
2019-05-16 00:27:22 -07:00
Noah Levitt
f8db17ce3d bump version after merge 2019-05-16 00:22:29 -07:00
Noah Levitt
eb34bebb91
Merge pull request #149 from nlevitt/travis-py37
trying to make this work with xenial for travis
2019-05-16 00:22:08 -07:00
Noah Levitt
c651bcdd18 remove some travis-ci debugging stuff 2019-05-16 00:21:28 -07:00
Noah Levitt
0a1360ab25 don't use localhost for test http server...
... because apparently sometimes chromium bypasses the proxy for local
addresses
2019-05-15 18:49:18 -07:00
Noah Levitt
f8165dc02b work around pytest issue until fix is out
https://github.com/pytest-dev/pytest/issues/5257
2019-05-15 18:46:21 -07:00
Vangelis Banos
a1f9122317 Fix test_brozzling::httpd fixture
We used `self.headers.getheader` which no longer works. We replace it
with `self.headers.get`.

We change the code to write binary data to `self.wfile` because we get
an exception for writing str and/or None.
2019-05-14 16:29:52 +00:00
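As a rough illustration of the fixture fix described in a1f9122317, the sketch below shows a Python 3 `http.server` handler that reads a request header with `self.headers.get` and writes bytes to `self.wfile`. The handler name, the `X-Example` header, and the `fetch_with_header` helper are all hypothetical and are not taken from the brozzler test suite.

```python
import http.server
import threading
import urllib.request


class EchoHeaderHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler for illustration; not the actual brozzler fixture."""

    def do_GET(self):
        # In Python 3, self.headers is an email.message.Message subclass,
        # so headers are read with .get(); the Python 2 .getheader() is gone.
        value = self.headers.get('X-Example', 'none')
        # Encode before writing: self.wfile is a binary stream.
        body = ('X-Example: %s' % value).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; the default implementation logs to stderr.
        pass


def fetch_with_header(value):
    """Start a throwaway server on a free port and fetch one response."""
    server = http.server.HTTPServer(('127.0.0.1', 0), EchoHeaderHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = 'http://127.0.0.1:%d/' % server.server_address[1]
        request = urllib.request.Request(url, headers={'X-Example': value})
        # Empty ProxyHandler so environment proxy settings are ignored.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_with_header('hello')` returns `b'X-Example: hello'`; header lookup through `.get()` is case-insensitive, so the capitalization applied by `urllib` does not matter.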
Vangelis Banos
a2ac3a0374 logging.warn is deprecated and replaced by logging.warning
We replace it everywhere in the code base.
2019-05-14 12:10:59 +00:00
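The replacement described in a2ac3a0374 amounts to calling `warning()` wherever `warn()` was used. A minimal sketch follows, assuming a hypothetical logger name and message that do not come from the brozzler code base; the output is captured in memory only so that the result is easy to inspect.

```python
import io
import logging


def log_warning_example():
    """Emit one warning with logging.warning-style API and return the text."""
    stream = io.StringIO()
    # Hypothetical logger name, chosen for this example only.
    logger = logging.getLogger('example.warn_deprecated')
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    try:
        # Previously: logger.warn(...), which is deprecated.
        logger.warning('disk usage at %s%%', 91)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

`log_warning_example()` returns `'WARNING disk usage at 91%\n'`. The arguments and lazy `%`-style formatting are unchanged; only the method name differs.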
Noah Levitt
ee8ef23f0c fix mistake in job-conf.rst 2019-04-30 10:49:48 -07:00
Noah Levitt
411b3f266a bump version after merge 2019-04-09 22:07:51 +00:00
Noah Levitt
d4386491df
Merge pull request #151 from nlevitt/no-cerberus-normalize
don't attempt cerberus normalization
2019-04-09 15:06:17 -07:00
Noah Levitt
5385232b40 don't attempt cerberus normalization
which encumbers the validation with additional requirements;
specifically, it makes it difficult to validate a subclass of `dict`
because it expects a constructor that works like dict.__init__()
2019-04-09 01:45:37 -07:00
Noah Levitt
8dfd92cf7f fix this utility 2019-04-09 01:44:14 -07:00
Noah Levitt
433b201b52 use logging.warning() to quiet py37 warnings 2019-04-09 01:43:38 -07:00
Noah Levitt
dfd9d9ecdd omfg 2019-04-04 17:22:15 -07:00
Noah Levitt
fd0fe811e9 so little output from chromium-browser :(
https://travis-ci.org/internetarchive/brozzler/jobs/515942434

could it be problems running as this other user?
2019-04-04 16:09:21 -07:00
Noah Levitt
55541be9e9 let's see chromium output inside brozzler-worker
using --trace, because chromium seems to be working ok when we just run
it
2019-04-04 15:11:24 -07:00
Noah Levitt
58d1d1c429 chromium-browser with no args isn't dying at start
what about with all the args?
2019-04-04 14:38:29 -07:00
Noah Levitt
473e891fb4 not sure if --disable-extensions did something 2019-04-04 13:34:45 -07:00
Noah Levitt
6d145c87c8 chromium-browser --disable-extensions ? 2019-04-04 13:24:12 -07:00
Noah Levitt
0d46d8ce19 still trying to figure out what's up with chromium 2019-04-04 13:15:17 -07:00
Noah Levitt
45ac12117a maybe Xvnc.log will tell us something 2019-04-04 13:09:02 -07:00
Noah Levitt
8303fd3ab3 guessing DISPLAY was the issue here
https://travis-ci.org/internetarchive/brozzler/jobs/515882174#L610
2019-04-04 12:50:50 -07:00
Noah Levitt
899794f2da debug what's going on with chromium in travis
see https://travis-ci.org/internetarchive/brozzler/jobs/514858838
(unroll "sudo cat /var/log/brozzler-worker.log")

2019-04-02 20:16:01,792 18595 CRITICAL BrozzlingThread:42073 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:412) unexpected exception
Traceback (most recent call last):
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 379, in brozzle_site
    enable_youtube_dl=not self._skip_youtube_dl)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 215, in brozzle_page
    browser, site, page, on_screenshot, on_request)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 292, in _browse_page
    cookie_db=site.get('cookie_db'))
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/browser.py", line 341, in start
    self.websock_url = self.chrome.start(**kwargs)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 200, in start
    return self._websocket_url()
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 247, in _websocket_url
    raise e
Exception: chrome process died with status 1
2019-04-04 12:38:46 -07:00
Noah Levitt
9459ed40d0 fix typo 2019-04-04 12:38:41 -07:00
Noah Levitt
68ce9eac76 debugging travis-ci is a slow process 2019-04-02 13:05:36 -07:00
Noah Levitt
85c6ac0ab2 fix next travis-ci problem 2019-04-02 12:05:08 -07:00
Noah Levitt
06e072a716 update some dependencies 2019-04-02 17:58:35 +00:00
Noah Levitt
8b6e5cbfb9 new option brozzler-purge --finished-before=... 2019-04-02 17:58:13 +00:00
Noah Levitt
9c658cddf7 fix a couple of svc definitions 2019-03-24 16:06:36 -07:00
Noah Levitt
48bb03418f daemontools 2019-03-23 00:26:39 -07:00
Noah Levitt
18b4a26db6 porting ansible config to xenial
no more upstart, switch to daemontools, among other things
2019-03-22 23:50:46 -07:00
Noah Levitt
19522aff85 adjusting ansible config for xenial
untested because of vagrant problems
2019-03-19 16:37:13 -07:00
Noah Levitt
d4f8bc768f trying to make this work with xenial for travis
see error https://travis-ci.org/internetarchive/brozzler/jobs/508141058
2019-03-18 16:38:23 -07:00
Noah Levitt
f2a9908395 travis only has py 3.7 for xenial 2019-03-18 16:20:54 -07:00
Noah Levitt
d729c8d0d5 use yaml.safe_load()
getting new warnings
see https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
2019-03-18 15:49:44 -07:00
Noah Levitt
6f5f090c33 test py 3.7 2019-03-18 15:49:03 -07:00
Noah Levitt
ef981706f4 fix rethinkdb dependency version 2019-03-18 15:08:36 -07:00
Noah Levitt
61274ae994 peg to working doublethink
see: https://github.com/internetarchive/doublethink/commit/f7fc7da725c9b
2019-03-14 20:04:09 +00:00
Noah Levitt
7d5bb4b5d4
Merge pull request #148 from vbanos/disk-cache
Add disk cache options to Chrome
2019-02-12 14:39:49 -08:00
Vangelis Banos
9c48a6fa11 Use disk cache params only on Chrome.start
Use `disk_cache_dir` and `disk_cache_size` only on `Chrome.start` and
not on `Chrome.__init__`.

Drop `disk_cache_dir` and `disk_cache_size` class attributes.
2019-02-12 20:59:08 +00:00
Vangelis Banos
adeca823dd Remove stale comment 2019-02-12 07:21:44 +00:00
Vangelis Banos
31e611771e Improve disk cache options
Remove `--disable-cache`, it's not used any more.

Rename `disk_cache` to `disk_cache_dir` and use only path (str)
argument.

Decouple `--disk-cache-size` from `--disk-cache-dir` so it is possible
to use either or both.
2019-02-07 07:42:45 +00:00
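The decoupling described in 31e611771e can be sketched as a small function that builds the chromium command-line flags, where each option is emitted independently of the other. The function name `disk_cache_args` is hypothetical and this is not the code in `brozzler/chrome.py`; only the parameter names and the two chromium flags come from the commit messages.

```python
def disk_cache_args(disk_cache_dir=None, disk_cache_size=None):
    """Return chromium flags for the given disk cache settings.

    Hypothetical helper for illustration. Either argument may be given
    alone, both together, or neither.
    """
    args = []
    if disk_cache_dir is not None:
        # Path (str) to the directory chromium should use for its disk cache.
        args.append('--disk-cache-dir=%s' % disk_cache_dir)
    if disk_cache_size is not None:
        # Cache size in bytes.
        args.append('--disk-cache-size=%s' % disk_cache_size)
    return args
```

For example, `disk_cache_args(disk_cache_size=10000)` returns `['--disk-cache-size=10000']`, and passing no arguments returns an empty list, so no cache-related flags are added.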
Vangelis Banos
c288c9ae98 Add disk cache options to Chrome
Add `Chrome` options `disk_cache` and `disk_cache_size` which add chromium
options `--disk-cache-dir=<DIR>` and `--disk-cache-size=N` (bytes).
The default is to use `--disable-cache` (no disk caching).

There are two ways to use the new vars. If you just use
`Chrome(disk_cache=True)`, the chromium cli option `--disable-cache` is
NOT used and chromium writes its disk cache inside the profile dir.

If you use `Chrome(disk_cache='/tmp/custom_dir', disk_cache_size=10000)`
chromium will use `--disk-cache-dir=/tmp/custom_dir
--disk-cache-size=10000`.
2019-02-06 16:22:10 +00:00
Noah Levitt
809ea3885f
Merge pull request #147 from galgeek/bye_simpleclicks
no more simpleclicks/mouseovers
2019-01-14 13:48:48 -08:00