Barbara Miller
03a6b15717
warcprox>=2.4.31
2022-08-19 12:50:34 -07:00
Barbara Miller
a4195e1a83
bump version
2022-08-12 10:41:48 -07:00
Barbara Miller
50c2b424c2
Merge pull request #248 from vbanos/stealth2
...
Add more stealth evasions
2022-08-12 10:40:34 -07:00
Barbara Miller
60645f7f37
bump version
2022-08-05 15:58:55 -07:00
Barbara Miller
0b60a2e2f3
Merge pull request #249 from internetarchive/blocks-shrink
...
@adam-miller ok'd this elsewhere
2022-08-05 15:36:34 -07:00
Barbara Miller
7edb0f11b0
and decode()
2022-08-04 16:04:37 -07:00
Barbara Miller
a5ee78e662
zlib compression
2022-08-04 11:16:38 -07:00
Vangelis Banos
b5b7d9d52b
Add more stealth evasions
...
Set `navigator.platform = 'Win32'` instead of the default `Linux`, as we
usually run Brozzler on Linux.
Randomize the `navigator.deviceMemory` and
`navigator.hardwareConcurrency` to avoid browser fingerprinting.
Define `window.Notification` which is not defined because we run Chrome
with CLI parameter `--disable-notifications`.
2022-07-29 11:21:08 +00:00
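The evasions listed in the commit above could be rendered into a script roughly as follows. This is a minimal sketch, not the contents of the real `stealth.js`: the value pools, the `Notification` stub, and the names `JS_TEMPLATE` and `build_stealth_script` are assumptions made for illustration.

```python
import random

# Hypothetical value pools; the commit does not say which values are used.
DEVICE_MEMORY_CHOICES = [4, 8]
HARDWARE_CONCURRENCY_CHOICES = [2, 4, 8]

# JavaScript template (an assumption, not the real stealth.js).
JS_TEMPLATE = """\
Object.defineProperty(navigator, 'platform', {get: () => 'Win32'});
Object.defineProperty(navigator, 'deviceMemory', {get: () => %(memory)d});
Object.defineProperty(navigator, 'hardwareConcurrency', {get: () => %(cores)d});
if (!('Notification' in window)) {
  window.Notification = {permission: 'denied'};
}
"""


def build_stealth_script(rng=random):
    """Return JS source with randomized memory and core counts."""
    return JS_TEMPLATE % {
        "memory": rng.choice(DEVICE_MEMORY_CHOICES),
        "cores": rng.choice(HARDWARE_CONCURRENCY_CHOICES),
    }
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which may help when testing.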
Barbara Miller
39eb80567d
bump version
2022-06-22 16:13:59 -07:00
Barbara Miller
fa59a88a26
Merge pull request #247 from internetarchive/stealth-too
2022-06-22 16:13:12 -07:00
Barbara Miller
218a49e824
stealth for brozzler_worker
2022-06-22 14:14:50 -07:00
Barbara Miller
de8d67e1e7
bump version
2022-06-20 13:44:42 -07:00
Barbara Miller
fe0aaa1ff6
Merge pull request #246 from vbanos/stealth
...
Looks good, thank you, @vbanos!
2022-06-20 13:43:25 -07:00
Vangelis Banos
7a12925004
Add stealth parameter to avoid antibot systems
...
The aim is to prevent Brozzler detection and blocking by antibot
systems. To do that, we need to run some JS before any other code runs
on page load and mock specific browser attributes which indicate that
Brozzler is a bot.
We add the option `stealth` in `Browser`, `brozzler.cli` and
`BrozzlerWorker`. It is disabled by default.
If enabled, we run `stealth.js` which is executed before anything else
on the page via `Page.addScriptToEvaluateOnNewDocument`.
For now, we mock only the graphics driver attributes.
If this is OK, we can add more antibot evasions in the same script.
There are many antibot tests; we are using this one: https://bot.sannysoft.com/
Inspired mainly by:
https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
2022-06-17 10:53:12 +00:00
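The opt-in behaviour described in the commit above might look like the sketch below, which builds the Chrome DevTools Protocol message for `Page.addScriptToEvaluateOnNewDocument` only when stealth is enabled. The function names and the placeholder script are hypothetical, and Brozzler's actual `Browser` code may be organized differently.

```python
import json

# Placeholder; the real stealth.js contents are not reproduced here.
STEALTH_JS = "/* stealth.js source would go here */"


def add_script_message(source, message_id=1):
    """Build the JSON message that registers a script to run before page code."""
    return json.dumps({
        "id": message_id,
        "method": "Page.addScriptToEvaluateOnNewDocument",
        "params": {"source": source},
    })


def startup_messages(stealth=False, source=STEALTH_JS):
    """Return the messages to send on browser start; none when stealth is off."""
    if not stealth:
        return []
    return [add_script_message(source)]
```

With `stealth=False`, the default, no message is produced, which mirrors the "disabled by default" behaviour stated in the commit message.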
Barbara Miller
ddf7cb4cbc
bump version
2022-06-09 15:14:21 -07:00
Barbara Miller
f2d70e1e25
Merge pull request #245 from internetarchive/yt-dlp-log
...
yt-dlp: use 'youtube_dl' logger
2022-06-09 15:12:51 -07:00
Barbara Miller
14466a7fb3
'youtube_dl' logger
2022-06-08 14:30:32 -07:00
Adam Miller
1de63f0aea
Merge pull request #244 from internetarchive/yt-dlp-skip-live
...
yt-dlp should skip live streams
2022-04-27 15:29:07 -07:00
Adam Miller
66252e17c3
Merge pull request #243 from internetarchive/adds-hop-path-support
...
Adds hop path support
2022-04-26 12:10:43 -07:00
Adam Miller
eef8a1c432
Bump version
2022-04-26 09:55:08 -07:00
Adam Miller
05826942a9
Style fix
2022-04-20 22:49:18 +00:00
Barbara Miller
b693b8713f
skip live streams
2022-04-03 17:50:27 -07:00
Adam Miller
cd16985724
Refactor of hop referrer passing
2022-03-24 21:38:47 +00:00
Barbara Miller
70bb544389
bump version
2022-03-22 13:59:48 -07:00
Barbara Miller
7ee6ea50d1
Merge pull request #242 from internetarchive/yt-dlp-03
...
for the record, @avdempsey ok'd this elsewhere
2022-03-22 10:23:58 -07:00
Barbara Miller
d5e41bf9ef
skip vimeo special case
2022-03-22 10:00:18 -07:00
Barbara Miller
c52b4af608
vimeo/M3u8 handling, better logging
2022-03-21 20:26:20 -07:00
Barbara Miller
d67a05572d
prefer video+audio files, debug postprocessor hook
2022-03-21 13:28:08 -07:00
Adam Miller
f4a9e77b06
Catching edge cases that were avoiding setting hop path information
2022-03-03 00:15:20 +00:00
Barbara Miller
7ea7e543a6
Merge pull request #241 from internetarchive/yt-dlp-too
...
yt-dlp for brozzler
2022-02-25 15:26:33 -08:00
Barbara Miller
25bb65a635
brozzler/ydl.py updates
2022-02-23 22:34:47 -08:00
Barbara Miller
0305db5e69
yt_dlp, not youtube-dl
2022-02-23 22:32:00 -08:00
Adam Miller
d61cec399e
Merge branch 'master' into adds-hop-path-support
2022-02-09 18:10:37 +00:00
Barbara Miller
d9ac067e41
bump version, copyright statement
2022-01-18 17:45:58 -08:00
Barbara Miller
de199e789e
Merge pull request #237 from vbanos/disable-breakpad
...
Thanks, @vbanos!
2022-01-18 17:43:45 -08:00
Vangelis Banos
fdc84fb848
Add chrome options --disable-sync and --disable-breakpad
...
`--disable-sync` disables syncing to a Google account.
`--disable-breakpad` disables crashdump collection.
These options aren't useful for Brozzler. They are already used in
puppeteer
https://github.com/puppeteer/puppeteer/blob/main/src/node/Launcher.ts#L211
Docs in chrome-launcher
https://github.com/GoogleChrome/chrome-launcher/blob/master/docs/chrome-flags-for-tools.md
2022-01-18 10:09:39 +00:00
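As a rough illustration of the commit above, the two flags could be appended to a Chrome command line like this. The helper name `chrome_args`, the executable name, and the port are assumptions; only `--disable-sync` and `--disable-breakpad` come from the commit message.

```python
def chrome_args(executable, port):
    """Build an argument list for launching Chrome (hypothetical helper)."""
    return [
        executable,
        f"--remote-debugging-port={port}",
        "--disable-sync",      # do not sync to a Google account
        "--disable-breakpad",  # do not collect crash dumps
    ]


# Example values only; a real launch would pass this list to subprocess.Popen.
args = chrome_args("chromium-browser", 9222)
```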
Alex Dempsey
427908e821
Merge pull request #233 from cclauss/codespell
...
Fix typos
2021-10-12 12:34:37 -07:00
Christian Clauss
a5ed291e65
Fix typos
2021-10-12 10:19:48 +02:00
Adam Miller
0f72233f3b
Adding support for hop path information to be stored and passed along to warcprox
2021-08-31 19:44:55 +00:00
Barbara Miller
4f301f4e03
Merge pull request #225 from internetarchive/wt-376-yt-user-page-fix
...
Added new extractor type to brozzler's youtube-dl playlist handling
2021-06-08 14:43:42 -07:00
Barbara Miller
c311fbb41f
bump version, update copyright
2021-05-25 17:14:21 -07:00
Barbara Miller
b59c4395ed
Merge pull request #223 from vbanos/fix-AddressValueError
...
Skip invalid outlink
2021-05-25 17:12:35 -07:00
Vangelis Banos
7aabc5f655
Skip invalid outlink
...
When one of the outlinks is `http://-1/`, `urlcanon.whatwg` raises an
unhandled exception `ipaddress.AddressValueError` and the capture fails.
We can skip the problematic outlink and keep the rest without crashing.
2021-05-23 11:31:47 +00:00
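The skip-and-continue approach from the commit above can be sketched as below. To stay self-contained, `canonicalize` is a hypothetical stand-in that fails the same way for `http://-1/`. It is not the `urlcanon` API, and the real fix may be structured differently.

```python
import ipaddress
from urllib.parse import urlsplit


def canonicalize(url):
    """Hypothetical stand-in for the real canonicalizer."""
    host = urlsplit(url).hostname or ""
    # Treat an all-numeric host as an integer IPv4 address; a negative
    # value such as -1 fails validation with AddressValueError.
    if host.lstrip("-").isdigit():
        ipaddress.IPv4Address(int(host))
    return url


def keep_valid_outlinks(outlinks):
    """Canonicalize each outlink, skipping those that raise AddressValueError."""
    valid = []
    for link in outlinks:
        try:
            valid.append(canonicalize(link))
        except ipaddress.AddressValueError:
            continue  # drop only the problematic outlink, keep the rest
    return valid
```

Catching the exception per outlink, rather than around the whole loop, is what lets the remaining links survive a single bad one.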
Pravin Visakan
eabdeb0238
Added user page extractor type to ytdl monkeypatch
2021-05-04 16:50:38 -07:00
Barbara Miller
0f27c9995a
bump version
2020-10-29 17:12:14 -07:00
jkafader
5005c619f6
Merge pull request #211 from internetarchive/galgeek-websocket-url-timeout
...
configurable websocket url timeout, default 60
2020-10-29 17:08:48 -07:00
Barbara Miller
11c5cfa865
add param for Chrome.start
2020-10-21 15:39:46 -07:00
Barbara Miller
dc50fe1db2
Merge pull request #212 from internetarchive/bump-version-to-1.5.23
...
bump version after merge
2020-10-13 15:21:18 -07:00
Barbara Miller
052c3552ca
bump version after merge
2020-10-13 15:19:50 -07:00
Barbara Miller
f2ebdca597
configurable websocket url timeout, default 60
2020-10-13 15:12:32 -07:00