Noah Levitt
9459ed40d0
fix typo
2019-04-04 12:38:41 -07:00
Noah Levitt
68ce9eac76
debugging travis-ci is a slow process
2019-04-02 13:05:36 -07:00
Noah Levitt
85c6ac0ab2
fix next travis-ci problem
2019-04-02 12:05:08 -07:00
Noah Levitt
06e072a716
update some dependencies
2019-04-02 17:58:35 +00:00
Noah Levitt
8b6e5cbfb9
new option brozzler-purge --finished-before=...
2019-04-02 17:58:13 +00:00
Noah Levitt
9c658cddf7
fix a couple of svc definitions
2019-03-24 16:06:36 -07:00
Noah Levitt
48bb03418f
daemontools
2019-03-23 00:26:39 -07:00
Noah Levitt
18b4a26db6
porting ansible config to xenial
...
no more upstart, switch to daemontools, among other things
2019-03-22 23:50:46 -07:00
Noah Levitt
19522aff85
adjusting ansible config for xenial
...
untested because of vagrant problems
2019-03-19 16:37:13 -07:00
Noah Levitt
d4f8bc768f
trying to make this work with xenial for travis
...
see error https://travis-ci.org/internetarchive/brozzler/jobs/508141058
2019-03-18 16:38:23 -07:00
Noah Levitt
f2a9908395
travis only has py 3.7 for xenial
2019-03-18 16:20:54 -07:00
Noah Levitt
d729c8d0d5
use yaml.safe_load()
...
getting new warnings
see https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
2019-03-18 15:49:44 -07:00
Noah Levitt
6f5f090c33
test py 3.7
2019-03-18 15:49:03 -07:00
Noah Levitt
ef981706f4
fix rethinkdb dependency version
2019-03-18 15:08:36 -07:00
Noah Levitt
61274ae994
peg to working doublethink
...
see: https://github.com/internetarchive/doublethink/commit/f7fc7da725c9b
2019-03-14 20:04:09 +00:00
Noah Levitt
7d5bb4b5d4
Merge pull request #148 from vbanos/disk-cache
...
Add disk cache options to Chrome
2019-02-12 14:39:49 -08:00
Vangelis Banos
9c48a6fa11
Use disk cache params only on Chrome.start
...
Use `disk_cache_dir` and `disk_cache_size` only on `Chrome.start` and
not on `Chrome.__init__`.
Drop `disk_cache_dir` and `disk_cache_size` class attributes.
2019-02-12 20:59:08 +00:00
Vangelis Banos
adeca823dd
Remove stale comment
2019-02-12 07:21:44 +00:00
Vangelis Banos
31e611771e
Improve disk cache options
...
Remove `--disable-cache`; it's not used any more.
Rename `disk_cache` to `disk_cache_dir` and use only path (str)
argument.
Decouple `--disk-cache-size` from `--disk-cache-dir` so it is possible
to use either or both.
2019-02-07 07:42:45 +00:00
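A minimal sketch of the decoupling described in the commit above. `disk_cache_args` is a hypothetical helper written for illustration, not brozzler's actual code; only the two chromium flags come from the commit messages.

```python
from typing import List, Optional


def disk_cache_args(disk_cache_dir: Optional[str] = None,
                    disk_cache_size: Optional[int] = None) -> List[str]:
    """Build chromium flags for the two independent disk cache options.

    Hypothetical helper for illustration; not brozzler's actual code.
    """
    args = []
    # Each option is handled on its own, so either or both may be given.
    if disk_cache_dir is not None:
        args.append("--disk-cache-dir=%s" % disk_cache_dir)
    if disk_cache_size is not None:
        args.append("--disk-cache-size=%s" % disk_cache_size)
    return args


print(disk_cache_args(disk_cache_dir="/tmp/custom_dir"))
print(disk_cache_args(disk_cache_size=10000))
print(disk_cache_args("/tmp/custom_dir", 10000))
```

With neither option set the helper returns an empty list, which matches the removal of `--disable-cache` only under the assumption that nothing else replaces it.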
Vangelis Banos
c288c9ae98
Add disk cache options to Chrome
...
Add `Chrome` options `disk_cache` and `disk_cache_size` which add chromium
options `--disk-cache-dir=<DIR>` and `--disk-cache-size=N` (bytes).
The default is to use `--disable-cache` (no disk caching).
There are two ways to use the new options. If you just use
`Chrome(disk_cache=True)`, the chromium CLI option `--disable-cache` is
NOT used and chromium writes its disk cache inside the profile dir.
If you use `Chrome(disk_cache='/tmp/custom_dir', disk_cache_size=10000)`
chromium will use `--disk-cache-dir=/tmp/custom_dir
--disk-cache-size=10000`.
2019-02-06 16:22:10 +00:00
Noah Levitt
809ea3885f
Merge pull request #147 from galgeek/bye_simpleclicks
...
no more simpleclicks/mouseovers
2019-01-14 13:48:48 -08:00
Barbara Miller
f6ffb4acea
update (C)
2019-01-10 16:11:24 -08:00
Barbara Miller
9001156b54
rm simpleclicks.js.j2 mouseovers.js.j2
2019-01-10 15:58:38 -08:00
Barbara Miller
770ea6de1e
no more simpleclicks/mouseovers
2019-01-10 15:54:47 -08:00
Barbara Miller
e1ceb87ca2
Merge pull request #146 from nlevitt/https-redirect
...
least surprise on http/https seed redirects
2018-12-21 15:26:04 -08:00
Noah Levitt
a74f46dc53
least surprise on http/https seed redirects
...
if http://foo.com/ redirects to https://foo.com/a/b/c let's also
put all of https://foo.com/ in scope
2018-12-21 15:17:31 -08:00
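The scope rule in the commit above could be sketched roughly as follows. `extra_scope_for_redirect` is a hypothetical name, and the exact conditions (same host, scheme flipped between http and https) are assumptions drawn from the commit message, not brozzler's actual scoping code.

```python
from typing import Optional
from urllib.parse import urlsplit


def extra_scope_for_redirect(seed_url: str, redirect_url: str) -> Optional[str]:
    """Return an additional URL prefix to put in scope, or None.

    Hypothetical helper: if the seed redirects to the same host under the
    other scheme (http <-> https), the seed's own path under the new scheme
    is also put in scope, not just the redirect target.
    """
    seed = urlsplit(seed_url)
    dest = urlsplit(redirect_url)
    same_host = seed.hostname == dest.hostname
    scheme_flip = {seed.scheme, dest.scheme} == {"http", "https"}
    if same_host and scheme_flip:
        # Keep the seed's path, swap in the redirect's scheme.
        return "%s://%s%s" % (dest.scheme, dest.netloc, seed.path or "/")
    return None


print(extra_scope_for_redirect("http://foo.com/", "https://foo.com/a/b/c"))
```

For the example in the commit message this yields `https://foo.com/`, so the whole site stays in scope rather than only `https://foo.com/a/b/c`.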
Noah Levitt
6b8e597a43
bump version after merge
2018-12-20 11:30:49 -08:00
Noah Levitt
0a08c01461
Merge pull request #145 from galgeek/no-skipIframes
...
no skipIframes for umbraBehavior
2018-12-20 11:30:28 -08:00
Barbara Miller
047b46bc4e
back out now unnecessary updates
2018-12-20 11:25:06 -08:00
Barbara Miller
d8f97e7b3f
no current need for skipIframes with new try/catch
2018-12-20 11:24:30 -08:00
Noah Levitt
034f7938c4
catch common exception in default behavior
2018-12-20 10:46:05 -08:00
Noah Levitt
2cd64811b3
bump version after merge
2018-12-17 15:10:26 -08:00
Noah Levitt
d8c9dd2ff4
Merge pull request #144 from galgeek/umbraBehavior18q4
...
fix instagram captures; add skipIframe feature
2018-12-17 15:09:52 -08:00
Barbara Miller
4a0d95277f
update umbraBehavior
2018-12-17 15:04:36 -08:00
Barbara Miller
425d44bf4a
updates for jinja2
2018-12-13 17:27:15 -08:00
Barbara Miller
6c21a9f773
iframe option and other instagram updates
2018-12-13 15:54:10 -08:00
Noah Levitt
15870e6010
avoid IndexError
...
in some cases we receive this event from the browser:
{"method":"ServiceWorker.workerVersionUpdated","params":{"versions":[]}}
2018-12-13 15:49:38 -08:00
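A minimal sketch of the kind of guard the commit above implies, using the event quoted in its message. `first_worker_version` is a hypothetical helper and the non-empty sample event is made up for illustration; brozzler's actual handling may differ.

```python
import json
from typing import Optional


def first_worker_version(message: str) -> Optional[dict]:
    """Return the first entry of params.versions, or None if the list is empty.

    Hypothetical helper for illustration; indexing versions[0] without the
    emptiness check is what would raise IndexError.
    """
    event = json.loads(message)
    versions = event.get("params", {}).get("versions", [])
    return versions[0] if versions else None


# The event from the commit message: an empty "versions" list.
empty = '{"method":"ServiceWorker.workerVersionUpdated","params":{"versions":[]}}'
print(first_worker_version(empty))
```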
Noah Levitt
b577fe3c36
log browser uncaught exceptions at debug level
...
didn't realize these weren't showing up as console messages
2018-12-13 15:45:35 -08:00
Noah Levitt
ebcc063fe2
bump version after merge
2018-11-29 14:52:11 -08:00
jkafader
898756690f
Merge pull request #142 from nlevitt/service-worker
...
fetch service worker script with proper headers
2018-11-29 13:42:59 -08:00
jkafader
9c27e829aa
Merge pull request #136 from nlevitt/revert-time-limit
...
change time limit enforcement
2018-11-29 12:29:35 -08:00
Noah Levitt
db62402be8
fix tests
2018-11-27 14:35:00 -08:00
Noah Levitt
f63947cfe9
fetch service worker script with proper headers
2018-11-27 12:35:33 -08:00
Noah Levitt
574af7846e
bump version after merge
2018-11-16 15:10:46 -08:00
Barbara Miller
e2b2542d4a
handle http auth ( #138 )
...
abort brozzling on interstitial (auth dialog)
because we have no other recourse at this point. waiting on
Network.requestIntercepted auth challenge support (it didn't work in our
latest testing).
https://chromedevtools.github.io/devtools-protocol/tot/Network#type-AuthChallengeResponse
2018-11-16 15:10:30 -08:00
Noah Levitt
05fab8b909
change time limit enforcement
...
enforce time limit based on all the time that a site was in active
rotation, including time it spent waiting for its turn to be brozzled;
this undoes the change from b9640b8a30c934, because now it seems that
was the wrong decision (brozzler jobs with many seeds and low
max_claimed_sites hanging around forever)
2018-11-12 16:21:38 -08:00
Noah Levitt
15610fa990
fail quickly if browser dies at startup
...
instead of trying to retrieve /json for 600 seconds
2018-11-01 15:57:03 -07:00
Noah Levitt
1073431f76
handle exceptions extracting links
...
like this one:
Uncaught DOMException: Blocked a frame with origin "https://www.youtube.com" from accessing a cross-origin frame.
at __brzl_compileOutlinks (<anonymous>:4:24)
at __brzl_compileOutlinks (<anonymous>:10:29)
at <anonymous>:16:1
__brzl_compileOutlinks @ VM194:4
__brzl_compileOutlinks @ VM194:10
not sure exactly why this happens but we just have to handle it
2018-10-29 17:42:25 -07:00
Noah Levitt
af85f28908
fix reported chromium crash by removing argument
...
--single-process
https://github.com/internetarchive/brozzler/issues/128
2018-10-22 14:28:31 -07:00
Noah Levitt
20996fa501
bump version after merge
2018-10-12 12:46:09 -07:00