1611 Commits

Author SHA1 Message Date
Barbara Miller
9cf12039c9 skip remembering youtube video chunks 2023-09-07 12:01:16 -07:00
Barbara Miller
7a3c6d6abe set url per postprocessor 2023-09-06 17:30:48 -07:00
Barbara Miller
c5c918bc87 running well enough maybe 2023-09-05 15:40:23 -07:00
Barbara Miller
c74b1123bb update for mp4s like they used to be 2023-08-31 18:02:01 -07:00
Barbara Miller
57d4fd8060 bump version 2023-07-17 16:00:21 -07:00
Barbara Miller
740019cc18 Merge pull request #257 from vbanos/screenshot-on-successful-capture
Thanks, @vbanos!
2023-07-17 15:59:12 -07:00
Vangelis Banos
7ad7a230f6 Disable screenshot on 4xx/5xx when simpler404 option is used
Also update the relevant comment.
2023-07-16 14:57:09 +00:00
Vangelis Banos
dc0f2a7455 Do not try to get a screenshot if status is 4xx, 5xx
The screenshot is an additional thing we do when the capture is
successful. Why get a screenshot of 4xx/5xx responses? It's just extra
system load.
We already got the capture for archiving reasons.
2023-07-07 11:47:16 +00:00
Barbara Miller
b138b1e89b bump version, copyright date 2023-04-30 11:15:23 -07:00
Barbara Miller
daefd9a4d5 Merge pull request #254 from internetarchive/bigger-window
configurable browser window height & width
2023-04-29 18:48:39 -07:00
Barbara Miller
6d69105c79 configurable window height & width 2023-04-28 13:49:44 -07:00
Barbara Miller
7783f92ce2 larger chrome window: 1400,900 2023-04-26 14:51:19 -07:00
Barbara Miller
0d4ed6a8be bump version 2023-03-15 15:55:08 -07:00
Barbara Miller
4e65c2f046 Merge pull request #253 from internetarchive/yt-dlp-timeout
add socket_timeout opt for yt-dlp

Mike Wilson reviewed this via Slack. We've agreed that it may be helpful to offer this setting as a command-line option for brozzler when this code is next updated.
2023-03-15 15:54:19 -07:00
Barbara Miller
0847d93d9e add socket_timeout opt for yt-dlp 2023-03-15 14:15:18 -07:00
Barbara Miller
03a6b15717 warcprox>=2.4.31 2022-08-19 12:50:34 -07:00
Barbara Miller
a4195e1a83 bump version 2022-08-12 10:41:48 -07:00
Barbara Miller
50c2b424c2 Merge pull request #248 from vbanos/stealth2
Add more stealth evasions
2022-08-12 10:40:34 -07:00
Barbara Miller
60645f7f37 bump version 2022-08-05 15:58:55 -07:00
Barbara Miller
0b60a2e2f3 Merge pull request #249 from internetarchive/blocks-shrink
@adam-miller ok'd this elsewhere
2022-08-05 15:36:34 -07:00
Barbara Miller
7edb0f11b0 and decode() 2022-08-04 16:04:37 -07:00
Barbara Miller
a5ee78e662 zlib compression 2022-08-04 11:16:38 -07:00
Vangelis Banos
b5b7d9d52b Add more stealth evasions
Set `navigator.platform = 'Win32'` instead of the default `Linux`, as we
usually run Brozzler on Linux.

Randomize the `navigator.deviceMemory` and
`navigator.hardwareConcurrency` to avoid browser fingerprinting.

Define `window.Notification`, which is otherwise undefined because we run
Chrome with the CLI parameter `--disable-notifications`.
2022-07-29 11:21:08 +00:00
Barbara Miller
39eb80567d bump version 2022-06-22 16:13:59 -07:00
Barbara Miller
fa59a88a26 Merge pull request #247 from internetarchive/stealth-too 2022-06-22 16:13:12 -07:00
Barbara Miller
218a49e824 stealth for brozzler_worker 2022-06-22 14:14:50 -07:00
Barbara Miller
de8d67e1e7 bump version 2022-06-20 13:44:42 -07:00
Barbara Miller
fe0aaa1ff6 Merge pull request #246 from vbanos/stealth
Looks good, thank you, @vbanos!
2022-06-20 13:43:25 -07:00
Vangelis Banos
7a12925004 Add stealth parameter to avoid antibot systems
The aim is to prevent Brozzler detection and blocking by antibot
systems. To do that, we need to run some JS before any other code runs
on page load and mock specific browser attributes which indicate that
Brozzler is a bot.

We add the option `stealth` in `Browser`, `brozzler.cli` and
`BrozzlerWorker`. It is disabled by default.

If enabled, we run `stealth.js` which is executed before anything else
on the page via `Page.addScriptToEvaluateOnNewDocument`.

For now, we mock only the graphics driver attributes.
If this is OK, we can add more antibot evasions in the same script.

There are many antibot tests; we are using this one: https://bot.sannysoft.com/

Inspired mainly by:
https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
2022-06-17 10:53:12 +00:00
Barbara Miller
ddf7cb4cbc bump version 2022-06-09 15:14:21 -07:00
Barbara Miller
f2d70e1e25 Merge pull request #245 from internetarchive/yt-dlp-log
yt-dlp: use 'youtube_dl' logger
2022-06-09 15:12:51 -07:00
Barbara Miller
14466a7fb3 'youtube_dl' logger 2022-06-08 14:30:32 -07:00
Adam Miller
1de63f0aea Merge pull request #244 from internetarchive/yt-dlp-skip-live
yt-dlp should skip live streams
2022-04-27 15:29:07 -07:00
Adam Miller
66252e17c3 Merge pull request #243 from internetarchive/adds-hop-path-support
Adds hop path support
2022-04-26 12:10:43 -07:00
Adam Miller
eef8a1c432 Bump version 2022-04-26 09:55:08 -07:00
Adam Miller
05826942a9 Style fix 2022-04-20 22:49:18 +00:00
Barbara Miller
b693b8713f skip live streams 2022-04-03 17:50:27 -07:00
Adam Miller
cd16985724 Refactor of hop referrer passing 2022-03-24 21:38:47 +00:00
Barbara Miller
70bb544389 bump version 2022-03-22 13:59:48 -07:00
Barbara Miller
7ee6ea50d1 Merge pull request #242 from internetarchive/yt-dlp-03
for the record, @avdempsey ok'd this elsewhere
2022-03-22 10:23:58 -07:00
Barbara Miller
d5e41bf9ef skip vimeo special case 2022-03-22 10:00:18 -07:00
Barbara Miller
c52b4af608 vimeo/M3u8 handling, better logging 2022-03-21 20:26:20 -07:00
Barbara Miller
d67a05572d prefer video+audio files, debug postprocessor hook 2022-03-21 13:28:08 -07:00
Adam Miller
f4a9e77b06 Catching edge cases that were avoiding setting hop path information 2022-03-03 00:15:20 +00:00
Barbara Miller
7ea7e543a6 Merge pull request #241 from internetarchive/yt-dlp-too
yt-dlp for brozzler
2022-02-25 15:26:33 -08:00
Barbara Miller
25bb65a635 brozzler/ydl.py updates 2022-02-23 22:34:47 -08:00
Barbara Miller
0305db5e69 yt_dlp, not youtube-dl 2022-02-23 22:32:00 -08:00
Adam Miller
d61cec399e Merge branch 'master' into adds-hop-path-support 2022-02-09 18:10:37 +00:00
Barbara Miller
d9ac067e41 bump version, copyright statement 2022-01-18 17:45:58 -08:00
Barbara Miller
de199e789e Merge pull request #237 from vbanos/disable-breakpad
Thanks, @vbanos!
2022-01-18 17:43:45 -08:00