1149 Commits

Author SHA1 Message Date
Noah Levitt
dfd9d9ecdd omfg 2019-04-04 17:22:15 -07:00
Noah Levitt
fd0fe811e9 so little output from chromium-browser :(
https://travis-ci.org/internetarchive/brozzler/jobs/515942434

could it be problems running as this other user?
2019-04-04 16:09:21 -07:00
Noah Levitt
55541be9e9 let's see chromium output inside brozzler-worker
using --trace, because chromium seems to be working ok when we just run
it
2019-04-04 15:11:24 -07:00
Noah Levitt
58d1d1c429 chromium-browser with no args isn't dying at start
what about with all the args?
2019-04-04 14:38:29 -07:00
Noah Levitt
473e891fb4 not sure if --disable-extensions did something 2019-04-04 13:34:45 -07:00
Noah Levitt
6d145c87c8 chromium-browser --disable-extensions ? 2019-04-04 13:24:12 -07:00
Noah Levitt
0d46d8ce19 still trying to figure out what's up with chromium 2019-04-04 13:15:17 -07:00
Noah Levitt
45ac12117a maybe Xvnc.log will tell us something 2019-04-04 13:09:02 -07:00
Noah Levitt
8303fd3ab3 guessing DISPLAY was the issue here
https://travis-ci.org/internetarchive/brozzler/jobs/515882174#L610
2019-04-04 12:50:50 -07:00
Noah Levitt
899794f2da debug what's going on with chromium in travis
see https://travis-ci.org/internetarchive/brozzler/jobs/514858838
(unroll "sudo cat /var/log/brozzler-worker.log")

2019-04-02 20:16:01,792 18595 CRITICAL BrozzlingThread:42073 brozzler.worker.BrozzlerWorker.brozzle_site(worker.py:412) unexpected exception
Traceback (most recent call last):
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 379, in brozzle_site
    enable_youtube_dl=not self._skip_youtube_dl)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 215, in brozzle_page
    browser, site, page, on_screenshot, on_request)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/worker.py", line 292, in _browse_page
    cookie_db=site.get('cookie_db'))
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/browser.py", line 341, in start
    self.websock_url = self.chrome.start(**kwargs)
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 200, in start
    return self._websocket_url()
  File "/opt/brozzler-ve3/lib/python3.6/site-packages/brozzler/chrome.py", line 247, in _websocket_url
    raise e
Exception: chrome process died with status 1
2019-04-04 12:38:46 -07:00
Noah Levitt
9459ed40d0 fix typo 2019-04-04 12:38:41 -07:00
Noah Levitt
68ce9eac76 debugging travis-ci is a slow process 2019-04-02 13:05:36 -07:00
Noah Levitt
85c6ac0ab2 fix next travis-ci problem 2019-04-02 12:05:08 -07:00
Noah Levitt
9c658cddf7 fix a couple of svc definitions 2019-03-24 16:06:36 -07:00
Noah Levitt
48bb03418f daemontools 2019-03-23 00:26:39 -07:00
Noah Levitt
18b4a26db6 porting ansible config to xenial
no more upstart, switch to daemontools, among other things
2019-03-22 23:50:46 -07:00
Noah Levitt
19522aff85 adjusting ansible config for xenial
untested because of vagrant problems
2019-03-19 16:37:13 -07:00
Noah Levitt
d4f8bc768f trying to make this work with xenial for travis
see error https://travis-ci.org/internetarchive/brozzler/jobs/508141058
2019-03-18 16:38:23 -07:00
Noah Levitt
f2a9908395 travis only has py 3.7 for xenial 2019-03-18 16:20:54 -07:00
Noah Levitt
d729c8d0d5 use yaml.safe_load()
getting new warnings
see https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
2019-03-18 15:49:44 -07:00
Noah Levitt
6f5f090c33 test py 3.7 2019-03-18 15:49:03 -07:00
Noah Levitt
ef981706f4 fix rethinkdb dependency version 2019-03-18 15:08:36 -07:00
Noah Levitt
61274ae994 peg to working doublethink
see: https://github.com/internetarchive/doublethink/commit/f7fc7da725c9b
2019-03-14 20:04:09 +00:00
Noah Levitt
7d5bb4b5d4
Merge pull request #148 from vbanos/disk-cache
Add disk cache options to Chrome
2019-02-12 14:39:49 -08:00
Vangelis Banos
9c48a6fa11 Use disk cache params only on Chrome.start
Use `disk_cache_dir` and `disk_cache_size` only on `Chrome.start` and
not on `Chrome.__init__`.

Drop `disk_cache_dir` and `disk_cache_size` class attributes.
2019-02-12 20:59:08 +00:00
Vangelis Banos
adeca823dd Remove stale comment 2019-02-12 07:21:44 +00:00
Vangelis Banos
31e611771e Improve disk cache options
Remove `--disable-cache`, it's not used any more.

Rename `disk_cache` to `disk_cache_dir` and use only path (str)
argument.

Decouple `--disk-cache-size` from `--disk-cache-dir` so it is possible
to use either or both.
2019-02-07 07:42:45 +00:00
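
A minimal sketch of the decoupling described in the commit above, where either cache option can be supplied without the other. The helper name `disk_cache_args` is hypothetical and is not taken from brozzler's code. Only the two chromium flags and the parameter names `disk_cache_dir` and `disk_cache_size` come from the commit messages.

```python
def disk_cache_args(disk_cache_dir=None, disk_cache_size=None):
    """Build chromium command line flags for the disk cache.

    Hypothetical helper for illustration only. Each option is handled
    independently, so a caller may pass either one, both, or neither.
    """
    args = []
    if disk_cache_dir:
        # path (str) of the directory chromium should use for its cache
        args.append('--disk-cache-dir=%s' % disk_cache_dir)
    if disk_cache_size:
        # maximum cache size in bytes
        args.append('--disk-cache-size=%s' % disk_cache_size)
    return args


if __name__ == '__main__':
    print(disk_cache_args('/tmp/custom_dir', 10000))
    print(disk_cache_args(disk_cache_size=10000))
```

Returning a list of flags keeps the sketch independent of how the browser process is actually launched.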
Vangelis Banos
c288c9ae98 Add disk cache options to Chrome
Add `Chrome` options `disk_cache` and `disk_cache_size` which add chromium
options `--disk-cache-dir=<DIR>` and `--disk-cache-size=N` (bytes).
The default is to use `--disable-cache` (no disk caching).

There are two ways to use the new vars. If you just use
`Chrome(disk_cache=True)`, the chromium cli option `--disable-cache` is
NOT used and chromium writes the disk cache inside the profile dir.

If you use `Chrome(disk_cache='/tmp/custom_dir', disk_cache_size=10000)`
chromium will use `--disk-cache-dir=/tmp/custom_dir
--disk-cache-size=10000`.
2019-02-06 16:22:10 +00:00
Noah Levitt
809ea3885f
Merge pull request #147 from galgeek/bye_simpleclicks
no more simpleclicks/mouseovers
2019-01-14 13:48:48 -08:00
Barbara Miller
f6ffb4acea update (C) 2019-01-10 16:11:24 -08:00
Barbara Miller
9001156b54 rm simpleclicks.js.j2 mouseovers.js.j2 2019-01-10 15:58:38 -08:00
Barbara Miller
770ea6de1e no more simpleclicks/mouseovers 2019-01-10 15:54:47 -08:00
Barbara Miller
e1ceb87ca2
Merge pull request #146 from nlevitt/https-redirect
least surprise on http/https seed redirects
2018-12-21 15:26:04 -08:00
Noah Levitt
a74f46dc53 least surprise on http/https seed redirects
if http://foo.com/ redirects to https://foo.com/a/b/c let's also
put all of https://foo.com/ in scope
2018-12-21 15:17:31 -08:00
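
One way to picture the rule in the commit above, as a rough sketch rather than brozzler's actual scoping code: when a seed redirects to the same host under the other scheme, the seed URL is re-expressed with the new scheme and treated as an extra scope prefix. The function name `scheme_swap_scope` and the use of `urllib.parse` are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def scheme_swap_scope(seed_url, redirect_url):
    """Return an extra scope prefix for an http<->https seed redirect.

    Hypothetical helper for illustration only. If the redirect stays on
    the same host and only flips between http and https, return the seed
    URL rewritten with the redirect's scheme. Otherwise return None.
    """
    seed = urlsplit(seed_url)
    target = urlsplit(redirect_url)
    same_host = seed.hostname == target.hostname
    scheme_flipped = {seed.scheme, target.scheme} == {'http', 'https'}
    if same_host and scheme_flipped:
        return seed._replace(scheme=target.scheme).geturl()
    return None


if __name__ == '__main__':
    print(scheme_swap_scope('http://foo.com/', 'https://foo.com/a/b/c'))
```

With the example from the commit message, the seed `http://foo.com/` redirecting to `https://foo.com/a/b/c` yields the prefix `https://foo.com/`, so the whole https site is in scope and not only the redirect target's path.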
Noah Levitt
6b8e597a43 bump version after merge 2018-12-20 11:30:49 -08:00
Noah Levitt
0a08c01461
Merge pull request #145 from galgeek/no-skipIframes
no skipIframes for umbraBehavior
2018-12-20 11:30:28 -08:00
Barbara Miller
047b46bc4e back out now unnecessary updates 2018-12-20 11:25:06 -08:00
Barbara Miller
d8f97e7b3f no current need for skipIframes with new try/catch 2018-12-20 11:24:30 -08:00
Noah Levitt
034f7938c4 catch common exception in default behavior 2018-12-20 10:46:05 -08:00
Noah Levitt
2cd64811b3 bump version after merge 2018-12-17 15:10:26 -08:00
Noah Levitt
d8c9dd2ff4
Merge pull request #144 from galgeek/umbraBehavior18q4
fix instagram captures; add skipIframe feature
2018-12-17 15:09:52 -08:00
Barbara Miller
4a0d95277f update umbraBehavior 2018-12-17 15:04:36 -08:00
Barbara Miller
425d44bf4a updates for jinja2 2018-12-13 17:27:15 -08:00
Barbara Miller
6c21a9f773 iframe option and other instagram updates 2018-12-13 15:54:10 -08:00
Noah Levitt
15870e6010 avoid IndexError
in some cases we receive this event from the browser:
{"method":"ServiceWorker.workerVersionUpdated","params":{"versions":[]}}
2018-12-13 15:49:38 -08:00
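
The kind of guard this commit describes could look roughly like the sketch below. It is an illustration under assumptions, not the code from the commit: the function name `first_worker_version` is hypothetical, and only the event shape is taken from the commit message.

```python
import json


def first_worker_version(message):
    """Return the first entry of params.versions, or None if there is none.

    Hypothetical helper for illustration only. Indexing versions[0]
    directly raises IndexError when the browser sends an empty list, so
    the list is checked before it is indexed.
    """
    event = json.loads(message)
    if event.get('method') != 'ServiceWorker.workerVersionUpdated':
        return None
    versions = event.get('params', {}).get('versions') or []
    return versions[0] if versions else None


if __name__ == '__main__':
    empty = ('{"method":"ServiceWorker.workerVersionUpdated",'
             '"params":{"versions":[]}}')
    print(first_worker_version(empty))
```

Returning `None` for the empty case lets the caller skip the event instead of crashing the message handling thread.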
Noah Levitt
b577fe3c36 log browser uncaught exceptions at debug level
didn't realize these weren't showing up as console messages
2018-12-13 15:45:35 -08:00
Noah Levitt
ebcc063fe2 bump version after merge 2018-11-29 14:52:11 -08:00
jkafader
898756690f
Merge pull request #142 from nlevitt/service-worker
fetch service worker script with proper headers
2018-11-29 13:42:59 -08:00
jkafader
9c27e829aa
Merge pull request #136 from nlevitt/revert-time-limit
change time limit enforcement
2018-11-29 12:29:35 -08:00
Noah Levitt
db62402be8 fix tests 2018-11-27 14:35:00 -08:00