1424 Commits

Author SHA1 Message Date
Adam Miller
cd16985724 Refactor of hop referrer passing 2022-03-24 21:38:47 +00:00
Barbara Miller
70bb544389
bump version 2022-03-22 13:59:48 -07:00
Barbara Miller
7ee6ea50d1
Merge pull request #242 from internetarchive/yt-dlp-03
for the record, @avdempsey ok'd this elsewhere
2022-03-22 10:23:58 -07:00
Barbara Miller
d5e41bf9ef skip vimeo special case 2022-03-22 10:00:18 -07:00
Barbara Miller
c52b4af608 vimeo/M3u8 handling, better logging 2022-03-21 20:26:20 -07:00
Barbara Miller
d67a05572d prefer video+audio files, debug postprocessor hook 2022-03-21 13:28:08 -07:00
Adam Miller
f4a9e77b06 Catching edge cases that were avoiding setting hop path information 2022-03-03 00:15:20 +00:00
Barbara Miller
7ea7e543a6
Merge pull request #241 from internetarchive/yt-dlp-too
yt-dlp for brozzler
2022-02-25 15:26:33 -08:00
Barbara Miller
25bb65a635 brozzler/ydl.py updates 2022-02-23 22:34:47 -08:00
Barbara Miller
0305db5e69 yt_dlp, not youtube-dl 2022-02-23 22:32:00 -08:00
Adam Miller
d61cec399e Merge branch 'master' into adds-hop-path-support 2022-02-09 18:10:37 +00:00
Barbara Miller
d9ac067e41
bump version, copyright statement 2022-01-18 17:45:58 -08:00
Barbara Miller
de199e789e
Merge pull request #237 from vbanos/disable-breakpad
Thanks, @vbanos!
2022-01-18 17:43:45 -08:00
Vangelis Banos
fdc84fb848 Add chrome options --disable-sync and --disable-breakpad
`--disable-sync` disables syncing to a Google account.

`--disable-breakpad` disables crashdump collection.

These options aren't useful for Brozzler. They are already used in
puppeteer
https://github.com/puppeteer/puppeteer/blob/main/src/node/Launcher.ts#L211

Docs in chrome-launcher
https://github.com/GoogleChrome/chrome-launcher/blob/master/docs/chrome-flags-for-tools.md
2022-01-18 10:09:39 +00:00
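
As a rough illustration of the change described in this commit, the sketch below builds a Chrome command line that always includes the two flags. The helper name `build_chrome_args`, the executable name, and the port are assumptions made for the example; they are not taken from the Brozzler source.

```python
def build_chrome_args(chrome_exe, port, extra_flags=()):
    """Return an argv list for launching Chrome (hypothetical helper)."""
    args = [
        chrome_exe,
        f"--remote-debugging-port={port}",
        "--disable-sync",      # do not sync to a Google account
        "--disable-breakpad",  # do not collect crash dumps
    ]
    # Caller-supplied flags are appended after the fixed ones.
    args.extend(extra_flags)
    return args


# Example values only; the executable name and port are assumptions.
argv = build_chrome_args("chromium-browser", 9222)
```

The resulting list could be handed to `subprocess.Popen`; the sketch stops short of launching a browser.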
Alex Dempsey
427908e821
Merge pull request #233 from cclauss/codespell
Fix typos
2021-10-12 12:34:37 -07:00
Christian Clauss
a5ed291e65 Fix typos 2021-10-12 10:19:48 +02:00
Adam Miller
0f72233f3b Adding support for hop path information to be stored and passed along to warcprox 2021-08-31 19:44:55 +00:00
Barbara Miller
4f301f4e03
Merge pull request #225 from internetarchive/wt-376-yt-user-page-fix
Added new extractor type to brozzler's youtube-dl playlist handling
2021-06-08 14:43:42 -07:00
Barbara Miller
c311fbb41f
bump version, update copyright 2021-05-25 17:14:21 -07:00
Barbara Miller
b59c4395ed
Merge pull request #223 from vbanos/fix-AddressValueError
Skip invalid outlink
2021-05-25 17:12:35 -07:00
Vangelis Banos
7aabc5f655 Skip invalid outlink
When one of the outlinks is `http://-1/` `urlcanon.whatwg` raises an
unhandled exception `ipaddress.AddressValueError` and the capture fails.

We can skip the problematic outlink and keep the rest without crashing.
2021-05-23 11:31:47 +00:00
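
The skip-and-continue approach from this commit might look roughly like the following. This is a minimal sketch, not the actual patch: `keep_parseable_outlinks` and `fake_parse` are hypothetical names, and `fake_parse` only stands in for the real URL parser by raising the same exception type for the one host named in the commit message.

```python
import ipaddress
import logging


def keep_parseable_outlinks(outlinks, parse_url):
    """Return the outlinks that parse_url accepts, skipping invalid ones."""
    kept = []
    for url in outlinks:
        try:
            parse_url(url)
        except ipaddress.AddressValueError:
            # One bad outlink should not fail the whole capture.
            logging.warning("skipping invalid outlink %r", url)
            continue
        kept.append(url)
    return kept


def fake_parse(url):
    """Stand-in parser (assumption): rejects only the host from the commit."""
    if url == "http://-1/":
        raise ipaddress.AddressValueError("invalid host in %r" % url)
    return url


links = keep_parseable_outlinks(
    ["http://example.com/", "http://-1/", "http://example.org/"], fake_parse
)
```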
Pravin Visakan
eabdeb0238 Added user page extractor type to ytdl monkeypatch 2021-05-04 16:50:38 -07:00
Barbara Miller
0f27c9995a
bump version 2020-10-29 17:12:14 -07:00
jkafader
5005c619f6
Merge pull request #211 from internetarchive/galgeek-websocket-url-timeout
configurable websocket url timeout, default 60
2020-10-29 17:08:48 -07:00
Barbara Miller
11c5cfa865 add param for Chrome.start 2020-10-21 15:39:46 -07:00
Barbara Miller
dc50fe1db2
Merge pull request #212 from internetarchive/bump-version-to-1.5.23
bump version after merge
2020-10-13 15:21:18 -07:00
Barbara Miller
052c3552ca
bump version after merge 2020-10-13 15:19:50 -07:00
Barbara Miller
f2ebdca597
configurable websocket url timeout, default 60 2020-10-13 15:12:32 -07:00
Barbara Miller
bb7594a14d
Merge pull request #209 from vbanos/outlinks-timeout
Thanks, @vbanos!
2020-10-13 15:01:55 -07:00
Vangelis Banos
8addaf31d5 Add option extract_outlinks_timeout
`Browser.extract_outlinks` has a default `timeout=60` param that cannot be
changed in any way. (It is always invoked using `extract_outlinks()`.)

We add param `extract_outlinks_timeout=60` to `BrozzlerWorker` and
`Browser.browse_page` to allow that.
2020-10-04 15:39:30 +00:00
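
The parameter pass-through described in this commit can be pictured as below. `SketchBrowser` and `SketchWorker` are hypothetical stand-ins, not the real `Browser` and `BrozzlerWorker` classes; the stub `extract_outlinks` simply returns the timeout it received so the plumbing is visible.

```python
class SketchBrowser:
    """Hypothetical stand-in for the browser class."""

    def extract_outlinks(self, timeout=60):
        # The real method would run JavaScript in the page; this stub
        # returns the timeout so the example can show what was passed.
        return timeout

    def browse_page(self, url, extract_outlinks_timeout=60):
        # Forward the caller's value instead of relying on the default.
        return self.extract_outlinks(timeout=extract_outlinks_timeout)


class SketchWorker:
    """Hypothetical stand-in for the worker class."""

    def __init__(self, extract_outlinks_timeout=60):
        self._extract_outlinks_timeout = extract_outlinks_timeout

    def brozzle_page(self, browser, url):
        return browser.browse_page(
            url, extract_outlinks_timeout=self._extract_outlinks_timeout
        )


default_timeout = SketchWorker().brozzle_page(SketchBrowser(), "http://example.com/")
custom_timeout = SketchWorker(extract_outlinks_timeout=120).brozzle_page(
    SketchBrowser(), "http://example.com/"
)
```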
Barbara Miller
18d3f5f930
Merge pull request #208 from internetarchive/galgeek-patch-2
based on PR #207, thanks @cclauss!
2020-09-21 18:06:03 -07:00
Barbara Miller
297eaac6dd
update travis.yml and test! 2020-09-21 17:08:39 -07:00
Barbara Miller
c744bb2f92
update copyright 2020-09-01 19:05:21 -07:00
Barbara Miller
d599778c27
Merge pull request #206 from internetarchive/galgeek-patch-1
bump version after merge
2020-08-05 09:24:28 -07:00
Barbara Miller
84d6bb43fa
bump version after merge 2020-08-05 09:23:58 -07:00
Barbara Miller
5a6ecb09d5
Merge pull request #205 from vbanos/behavior-timeout-zero
Skip loading behavior when behavior_timeout=0

behavior_timeout is an existing parameter to `Browser.browse_page`
2020-08-04 16:18:58 -07:00
Neil Minton
12913cccf0
Merge pull request #204 from galgeek/noplaylist-ydl
youtube-dl option noplaylist: True
2020-08-04 14:12:14 -04:00
Vangelis Banos
8b10587031 Skip loading behavior when behavior_timeout=0
The user may set `behavior_timeout=0`. This means that they don't want
to run the behavior. As it is now, Brozzler will invoke
`brozzler.behavior_script` to load the script and `self.run_behavior`
to execute it.
We will run the behavior using `Runtime.evaluate` but then it will be
terminated immediately because of timeout=0.

It is better to skip behavior loading and running when
`behavior_timeout=0`.
2020-08-04 06:27:21 +00:00
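
The guard this commit describes could be sketched as follows. The function `maybe_run_behavior` and its callable parameters are hypothetical; they only mirror the two steps named in the message (loading the script, then running it) and are not the Brozzler API.

```python
def maybe_run_behavior(url, behavior_timeout, load_script, run_script):
    """Load and run a behavior unless behavior_timeout is 0.

    Returns True if the behavior ran, False if it was skipped.
    """
    if behavior_timeout == 0:
        # Nothing useful can happen with a zero timeout, so skip both
        # loading and running the script.
        return False
    script = load_script(url)
    run_script(script, timeout=behavior_timeout)
    return True


calls = []


def load(url):
    calls.append("load")
    return "/* behavior for %s */" % url


def run(script, timeout):
    calls.append("run")


skipped = maybe_run_behavior("http://example.com/", 0, load, run)
calls_after_skip = list(calls)
ran = maybe_run_behavior("http://example.com/", 900, load, run)
```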
Barbara Miller
dc0d99470a
Merge pull request #203 from miku/update-readme-proxy
Thank you, @miku!
2020-07-28 13:43:19 -07:00
Martin Czygan
8e670ca814 readme: remove proxy from job configuration
It has been removed in 934190084c73699747cf3f4c4d2ee7e268927eae.
2020-07-28 22:21:05 +02:00
Barbara Miller
e3a067cf60 youtube-dl option noplaylist: True 2020-07-24 16:22:50 -07:00
jkafader
1b9ebca13c
Merge pull request #202 from galgeek/limit_downloadThroughput
configurable limit for Chromium download throughput
2020-07-23 14:14:20 -07:00
Barbara Miller
739d09294e make configurable 2020-07-14 10:12:28 -07:00
Barbara Miller
36b4f80350 try SPN2 downloadThroughput limit 2020-07-14 10:12:28 -07:00
Barbara Miller
03594413f9
Merge pull request #200 from NGTmeaty/fix-test
Merging for the current fixes—thanks, @NGTmeaty!
2020-06-18 13:22:41 -07:00
NGTmeaty
25313a97de
Fix tests:
Update the RethinkDB pubkey location and repo location based on their guide https://rethinkdb.com/docs/install/ubuntu/
NumPy no longer supports Python 3.5; on 3.5, we should install an earlier version of NumPy to maintain compatibility.
2020-06-02 03:26:33 -04:00
Neil Minton
3c5d1f24e0
Merge pull request #199 from galgeek/ARI-6097
instagram selector update
2020-05-26 17:11:35 -04:00
Barbara Miller
8da3ae9274 instagram update 2020-05-07 17:55:26 -07:00
jkafader
212111f581
Merge pull request #196 from galgeek/no-cache-dir-ydl
youtube-dl cache_dir: False
2020-04-30 15:05:00 -07:00
Barbara Miller
926de9c853 cache_dir: False 2020-04-30 11:11:38 -07:00