Barbara Miller
14466a7fb3
'youtube_dl' logger
2022-06-08 14:30:32 -07:00
Adam Miller
1de63f0aea
Merge pull request #244 from internetarchive/yt-dlp-skip-live
...
yt-dlp should skip live streams
2022-04-27 15:29:07 -07:00
Adam Miller
66252e17c3
Merge pull request #243 from internetarchive/adds-hop-path-support
...
Adds hop path support
2022-04-26 12:10:43 -07:00
Adam Miller
eef8a1c432
Bump version
2022-04-26 09:55:08 -07:00
Adam Miller
05826942a9
Style fix
2022-04-20 22:49:18 +00:00
Barbara Miller
b693b8713f
skip live streams
2022-04-03 17:50:27 -07:00
Adam Miller
cd16985724
Refactor of hop referrer passing
2022-03-24 21:38:47 +00:00
Barbara Miller
70bb544389
bump version
2022-03-22 13:59:48 -07:00
Barbara Miller
7ee6ea50d1
Merge pull request #242 from internetarchive/yt-dlp-03
...
for the record, @avdempsey ok'd this elsewhere
2022-03-22 10:23:58 -07:00
Barbara Miller
d5e41bf9ef
skip vimeo special case
2022-03-22 10:00:18 -07:00
Barbara Miller
c52b4af608
vimeo/M3u8 handling, better logging
2022-03-21 20:26:20 -07:00
Barbara Miller
d67a05572d
prefer video+audio files, debug postprocessor hook
2022-03-21 13:28:08 -07:00
Adam Miller
f4a9e77b06
Catching edge cases that were avoiding setting hop path information
2022-03-03 00:15:20 +00:00
Barbara Miller
7ea7e543a6
Merge pull request #241 from internetarchive/yt-dlp-too
...
yt-dlp for brozzler
2022-02-25 15:26:33 -08:00
Barbara Miller
25bb65a635
brozzler/ydl.py updates
2022-02-23 22:34:47 -08:00
Barbara Miller
0305db5e69
yt_dlp, not youtube-dl
2022-02-23 22:32:00 -08:00
Adam Miller
d61cec399e
Merge branch 'master' into adds-hop-path-support
2022-02-09 18:10:37 +00:00
Barbara Miller
d9ac067e41
bump version, copyright statement
2022-01-18 17:45:58 -08:00
Barbara Miller
de199e789e
Merge pull request #237 from vbanos/disable-breakpad
...
Thanks, @vbanos!
2022-01-18 17:43:45 -08:00
Vangelis Banos
fdc84fb848
Add chrome options --disable-sync and --disable-breakpad
...
`--disable-sync` disables syncing to a Google account.
`--disable-breakpad` disables crashdump collection.
Neither feature is useful for Brozzler. Both flags are already used in
puppeteer
https://github.com/puppeteer/puppeteer/blob/main/src/node/Launcher.ts#L211
Docs in chrome-launcher
https://github.com/GoogleChrome/chrome-launcher/blob/master/docs/chrome-flags-for-tools.md
2022-01-18 10:09:39 +00:00
Alex Dempsey
427908e821
Merge pull request #233 from cclauss/codespell
...
Fix typos
2021-10-12 12:34:37 -07:00
Christian Clauss
a5ed291e65
Fix typos
2021-10-12 10:19:48 +02:00
Adam Miller
0f72233f3b
Adding support for hop path information to be stored and passed along to warcprox
2021-08-31 19:44:55 +00:00
Barbara Miller
4f301f4e03
Merge pull request #225 from internetarchive/wt-376-yt-user-page-fix
...
Added new extractor type to brozzler's youtube-dl playlist handling
2021-06-08 14:43:42 -07:00
Barbara Miller
c311fbb41f
bump version, update copyright
2021-05-25 17:14:21 -07:00
Barbara Miller
b59c4395ed
Merge pull request #223 from vbanos/fix-AddressValueError
...
Skip invalid outlink
2021-05-25 17:12:35 -07:00
Vangelis Banos
7aabc5f655
Skip invalid outlink
...
When one of the outlinks is `http://-1/`, `urlcanon.whatwg` raises an
unhandled exception `ipaddress.AddressValueError` and the capture fails.
We can skip the problematic outlink and keep the rest without crashing.
2021-05-23 11:31:47 +00:00
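The "Skip invalid outlink" commit above (7aabc5f655) can be illustrated with a minimal sketch. It is not the code from the commit: `canonicalize` is a hypothetical stand-in for the `urlcanon.whatwg` call, built only on the standard library, and `filter_outlinks` is an assumed helper name. The point is the pattern of catching `ipaddress.AddressValueError` per outlink so that one bad URL does not fail the whole capture.

```python
import ipaddress
from urllib.parse import urlsplit


def canonicalize(url):
    """Hypothetical stand-in for a URL canonicalizer (not urlcanon itself).

    A purely numeric host is treated as an IPv4 integer, which raises
    ipaddress.AddressValueError for values such as -1.
    """
    host = urlsplit(url).hostname or ""
    if host.lstrip("-").isdigit():
        ipaddress.IPv4Address(int(host))  # raises AddressValueError for -1
    return url


def filter_outlinks(outlinks):
    """Keep every outlink that canonicalizes; skip the ones that raise."""
    kept = []
    for url in outlinks:
        try:
            kept.append(canonicalize(url))
        except ipaddress.AddressValueError:
            # Skip only the problematic outlink and keep processing the rest.
            continue
    return kept


links = ["http://example.com/", "http://-1/", "http://example.org/a"]
print(filter_outlinks(links))
```

Catching the exception inside the loop, rather than around it, is what preserves the remaining outlinks.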
Pravin Visakan
eabdeb0238
Added user page extractor type to ytdl monkeypatch
2021-05-04 16:50:38 -07:00
Barbara Miller
0f27c9995a
bump version
2020-10-29 17:12:14 -07:00
jkafader
5005c619f6
Merge pull request #211 from internetarchive/galgeek-websocket-url-timeout
...
configurable websocket url timeout, default 60
2020-10-29 17:08:48 -07:00
Barbara Miller
11c5cfa865
add param for Chrome.start
2020-10-21 15:39:46 -07:00
Barbara Miller
dc50fe1db2
Merge pull request #212 from internetarchive/bump-version-to-1.5.23
...
bump version after merge
2020-10-13 15:21:18 -07:00
Barbara Miller
052c3552ca
bump version after merge
2020-10-13 15:19:50 -07:00
Barbara Miller
f2ebdca597
configurable websocket url timeout, default 60
2020-10-13 15:12:32 -07:00
Barbara Miller
bb7594a14d
Merge pull request #209 from vbanos/outlinks-timeout
...
Thanks, @vbanos!
2020-10-13 15:01:55 -07:00
Vangelis Banos
8addaf31d5
Add option extract_outlinks_timeout
...
`Browser.extract_outlinks` has a default `timeout=60` param that cannot be
changed in any way. (It is always invoked using `extract_outlinks()`.)
We add param `extract_outlinks_timeout=60` to `BrozzlerWorker` and
`Browser.browse_page` to allow that.
2020-10-04 15:39:30 +00:00
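The parameter threading described in the "Add option extract_outlinks_timeout" commit above (8addaf31d5) might look roughly like the following sketch. `SketchBrowser` and `SketchWorker` are hypothetical, heavily simplified classes, not brozzler's real `Browser` and `BrozzlerWorker`. Only the parameter name `extract_outlinks_timeout` and its default of 60 come from the commit message; the `brozzle` method name and the return value are assumptions made for illustration.

```python
class SketchBrowser:
    """Hypothetical stand-in for a browser wrapper."""

    def extract_outlinks(self, timeout=60):
        # A real implementation would evaluate JavaScript in the page and
        # wait up to `timeout` seconds; here we only report the value used.
        return {"timeout": timeout, "outlinks": []}

    def browse_page(self, url, extract_outlinks_timeout=60):
        # Forward the caller's value instead of relying on the default.
        return self.extract_outlinks(timeout=extract_outlinks_timeout)


class SketchWorker:
    """Hypothetical stand-in for a worker that owns the configuration."""

    def __init__(self, extract_outlinks_timeout=60):
        self._extract_outlinks_timeout = extract_outlinks_timeout

    def brozzle(self, browser, url):
        return browser.browse_page(
            url, extract_outlinks_timeout=self._extract_outlinks_timeout
        )


worker = SketchWorker(extract_outlinks_timeout=5)
result = worker.brozzle(SketchBrowser(), "http://example.com/")
print(result["timeout"])
```

Keeping 60 as the default at every level means existing callers see no change in behavior.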
Barbara Miller
18d3f5f930
Merge pull request #208 from internetarchive/galgeek-patch-2
...
based on PR #207, thanks @cclauss!
2020-09-21 18:06:03 -07:00
Barbara Miller
297eaac6dd
update travis.yml and test!
2020-09-21 17:08:39 -07:00
Barbara Miller
c744bb2f92
update copyright
2020-09-01 19:05:21 -07:00
Barbara Miller
d599778c27
Merge pull request #206 from internetarchive/galgeek-patch-1
...
bump version after merge
2020-08-05 09:24:28 -07:00
Barbara Miller
84d6bb43fa
bump version after merge
2020-08-05 09:23:58 -07:00
Barbara Miller
5a6ecb09d5
Merge pull request #205 from vbanos/behavior-timeout-zero
...
Skip loading behavior when behavior_timeout=0
behavior_timeout is an existing parameter to `Browser.browse_page`
2020-08-04 16:18:58 -07:00
Neil Minton
12913cccf0
Merge pull request #204 from galgeek/noplaylist-ydl
...
youtube-dl option noplaylist: True
2020-08-04 14:12:14 -04:00
Vangelis Banos
8b10587031
Skip loading behavior when behavior_timeout=0
...
The user may set `behavior_timeout=0`. This means that they don't want
to run the behavior. As it is now, Brozzler will invoke
`brozzler.behavior_script` to load the script and `self.run_behavior`
to execute it.
We will run the behavior using `Runtime.evaluate` but then it will be
terminated immediately because of timeout=0.
It is better to skip behavior loading and running when
`behavior_timeout=0`.
2020-08-04 06:27:21 +00:00
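The guard described in the "Skip loading behavior when behavior_timeout=0" commit above (8b10587031) can be sketched as follows. This is an illustration under stated assumptions, not the commit's diff: `maybe_run_behavior`, `load_script`, and `run_script` are hypothetical names standing in for the loading step (`brozzler.behavior_script` in the commit message) and the execution step (`self.run_behavior`).

```python
def maybe_run_behavior(url, behavior_timeout, load_script, run_script):
    """Load and run a page behavior unless the timeout is zero.

    Returns True if the behavior ran, False if it was skipped.
    `load_script` and `run_script` are hypothetical callables.
    """
    if behavior_timeout == 0:
        # Nothing useful can run in zero seconds, so skip both the
        # script lookup and the evaluation.
        return False
    script = load_script(url)
    run_script(script, timeout=behavior_timeout)
    return True


calls = []


def fake_load(url):
    calls.append(("load", url))
    return "/* behavior script */"


def fake_run(script, timeout):
    calls.append(("run", timeout))


print(maybe_run_behavior("http://example.com/", 0, fake_load, fake_run))
print(maybe_run_behavior("http://example.com/", 30, fake_load, fake_run))
print(calls)
```

Checking the timeout before loading the script avoids both the lookup cost and an evaluation that would be terminated immediately.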
Barbara Miller
dc0d99470a
Merge pull request #203 from miku/update-readme-proxy
...
Thank you, @miku!
2020-07-28 13:43:19 -07:00
Martin Czygan
8e670ca814
readme: remove proxy from job configuration
...
It has been removed in 934190084c73699747cf3f4c4d2ee7e268927eae.
2020-07-28 22:21:05 +02:00
Barbara Miller
e3a067cf60
youtube-dl option noplaylist: True
2020-07-24 16:22:50 -07:00
jkafader
1b9ebca13c
Merge pull request #202 from galgeek/limit_downloadThroughput
...
configurable limit for Chromium download throughput
2020-07-23 14:14:20 -07:00
Barbara Miller
739d09294e
make configurable
2020-07-14 10:12:28 -07:00
Barbara Miller
36b4f80350
try SPN2 downloadThroughput limit
2020-07-14 10:12:28 -07:00