Barbara Miller
9cf12039c9
skip remembering youtube video chunks
2023-09-07 12:01:16 -07:00
Barbara Miller
7a3c6d6abe
set url per postprocessor
2023-09-06 17:30:48 -07:00
Barbara Miller
c5c918bc87
running well enough maybe
2023-09-05 15:40:23 -07:00
Barbara Miller
c74b1123bb
update for mp4s like they used to be
2023-08-31 18:02:01 -07:00
Barbara Miller
57d4fd8060
bump version
2023-07-17 16:00:21 -07:00
Barbara Miller
740019cc18
Merge pull request #257 from vbanos/screenshot-on-successful-capture
Thanks, @vbanos!
2023-07-17 15:59:12 -07:00
Vangelis Banos
7ad7a230f6
Disable screenshot on 4xx/5xx when simpler404 option is used
Also update the relevant comment.
2023-07-16 14:57:09 +00:00
Vangelis Banos
dc0f2a7455
Do not try to get a screenshot if status is 4xx, 5xx
The screenshot is an additional thing we do when the capture is
successful. Why get a screenshot of 4xx/5xx responses? It's just extra
system load.
We already got the capture for archiving reasons.
2023-07-07 11:47:16 +00:00
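The status check described in commit dc0f2a7455 can be sketched roughly as follows. This is only an illustration of the rule stated in the message; the helper name `should_take_screenshot` is hypothetical and is not taken from the brozzler source.

```python
def should_take_screenshot(status_code):
    """Return True only when the response is not a 4xx/5xx error.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not brozzler's actual code.
    """
    # Client errors (4xx) and server errors (5xx) start at 400, so a
    # screenshot is only worth the extra load below that threshold.
    return status_code < 400


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(code, should_take_screenshot(code))
```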
Barbara Miller
b138b1e89b
bump version, copyright date
2023-04-30 11:15:23 -07:00
Barbara Miller
daefd9a4d5
Merge pull request #254 from internetarchive/bigger-window
configurable browser window height & width
2023-04-29 18:48:39 -07:00
Barbara Miller
6d69105c79
configurable window height & width
2023-04-28 13:49:44 -07:00
Barbara Miller
7783f92ce2
larger chrome window: 1400,900
2023-04-26 14:51:19 -07:00
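As a rough sketch of what a configurable window size could look like, the helper below builds Chrome's `--window-size` command-line flag from a width and height. The function name is hypothetical; the 1400 and 900 defaults come from the commit message above, and how brozzler actually wires the setting through is not shown here.

```python
def window_size_arg(width=1400, height=900):
    """Build Chrome's --window-size flag.

    Hypothetical helper for illustration; the defaults mirror the
    "1400,900" mentioned in the commit message.
    """
    return f"--window-size={width},{height}"


if __name__ == "__main__":
    print(window_size_arg())            # default size
    print(window_size_arg(1920, 1080))  # caller-supplied size
```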
Barbara Miller
0d4ed6a8be
bump version
2023-03-15 15:55:08 -07:00
Barbara Miller
4e65c2f046
Merge pull request #253 from internetarchive/yt-dlp-timeout
add socket_timeout opt for yt-dlp
Mike Wilson reviewed this via Slack. We've agreed that it may be helpful to offer this setting as a brozzler command-line option the next time this code is updated.
2023-03-15 15:54:19 -07:00
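For context, `socket_timeout` is one of the options yt-dlp accepts in the dictionary passed to `yt_dlp.YoutubeDL`. The sketch below only builds such a dictionary; the helper name and the 40-second default are assumptions for illustration, since the commit does not state which value brozzler uses.

```python
def build_ydl_opts(socket_timeout=40):
    """Return a yt-dlp options dict with a socket timeout in seconds.

    Hypothetical helper; the default of 40 is an assumed value, not
    the one chosen in brozzler.
    """
    return {"socket_timeout": socket_timeout}


if __name__ == "__main__":
    opts = build_ydl_opts()
    print(opts)
```

A dictionary like this would then be handed to `yt_dlp.YoutubeDL(opts)`. Exposing the value as a parameter is what would make a later command-line option straightforward.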
Barbara Miller
0847d93d9e
add socket_timeout opt for yt-dlp
2023-03-15 14:15:18 -07:00
Barbara Miller
03a6b15717
warcprox>=2.4.31
2022-08-19 12:50:34 -07:00
Barbara Miller
a4195e1a83
bump version
2022-08-12 10:41:48 -07:00
Barbara Miller
50c2b424c2
Merge pull request #248 from vbanos/stealth2
Add more stealth evasions
2022-08-12 10:40:34 -07:00
Barbara Miller
60645f7f37
bump version
2022-08-05 15:58:55 -07:00
Barbara Miller
0b60a2e2f3
Merge pull request #249 from internetarchive/blocks-shrink
@adam-miller ok'd this elsewhere
2022-08-05 15:36:34 -07:00
Barbara Miller
7edb0f11b0
and decode()
2022-08-04 16:04:37 -07:00
Barbara Miller
a5ee78e662
zlib compression
2022-08-04 11:16:38 -07:00
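The two commits above mention zlib compression and a follow-up `decode()`. The messages do not say what data is compressed, so the following is only a generic round-trip sketch with Python's standard `zlib` module; the function names and the sample text are hypothetical.

```python
import zlib


def shrink(text):
    """Compress a string to bytes (hypothetical helper)."""
    return zlib.compress(text.encode("utf-8"))


def expand(blob):
    """Decompress bytes and decode back to a string.

    zlib.decompress returns bytes, so a decode() step is needed to
    recover the original text.
    """
    return zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    original = "example.com\n" * 100  # made-up sample data
    packed = shrink(original)
    print(len(original), len(packed))
    print(expand(packed) == original)
```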
Vangelis Banos
b5b7d9d52b
Add more stealth evasions
Set `navigator.platform = 'Win32'` instead of the default `Linux` as we
usually run Brozzler on Linux.
Randomize the `navigator.deviceMemory` and
`navigator.hardwareConcurrency` to avoid browser fingerprinting.
Define `window.Notification` which is not defined because we run Chrome
with CLI parameter `--disable-notifications`.
2022-07-29 11:21:08 +00:00
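To make the evasions listed in commit b5b7d9d52b more concrete, here is a loose Python sketch that generates JavaScript overrides of the kind described. It is not the contents of brozzler's `stealth.js`: the function name, the candidate values for `deviceMemory` and `hardwareConcurrency`, and the shape of the `window.Notification` stub are all assumptions.

```python
import random


def build_stealth_overrides(rng=None):
    """Return JavaScript source that overrides a few navigator fields.

    Hypothetical sketch; the value ranges below are assumed.
    """
    rng = rng or random.Random()
    device_memory = rng.choice([4, 8])            # assumed candidates
    hardware_concurrency = rng.choice([2, 4, 8])  # assumed candidates
    lines = [
        # Report Windows rather than the Linux host platform.
        "Object.defineProperty(navigator, 'platform', {get: () => 'Win32'});",
        # Randomized values make fingerprinting less stable.
        "Object.defineProperty(navigator, 'deviceMemory', "
        f"{{get: () => {device_memory}}});",
        "Object.defineProperty(navigator, 'hardwareConcurrency', "
        f"{{get: () => {hardware_concurrency}}});",
        # Provide a stub when notifications are disabled (assumed shape).
        "if (!window.Notification) { window.Notification = "
        "{permission: 'denied'}; }",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_stealth_overrides(random.Random(0)))
```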
Barbara Miller
39eb80567d
bump version
2022-06-22 16:13:59 -07:00
Barbara Miller
fa59a88a26
Merge pull request #247 from internetarchive/stealth-too
2022-06-22 16:13:12 -07:00
Barbara Miller
218a49e824
stealth for brozzler_worker
2022-06-22 14:14:50 -07:00
Barbara Miller
de8d67e1e7
bump version
2022-06-20 13:44:42 -07:00
Barbara Miller
fe0aaa1ff6
Merge pull request #246 from vbanos/stealth
Looks good, thank you, @vbanos!
2022-06-20 13:43:25 -07:00
Vangelis Banos
7a12925004
Add stealth parameter to avoid antibot systems
The aim is to prevent Brozzler detection and blocking by antibot
systems. To do that, we need to run some JS before any other code runs
on page load and mock specific browser attributes which indicate that
Brozzler is a bot.
We add the option `stealth` in `Browser`, `brozzler.cli` and
`BrozzlerWorker`. It is disabled by default.
If enabled, we run `stealth.js` which is executed before anything else
on the page via `Page.addScriptToEvaluateOnNewDocument`.
For now, we mock only the graphics driver attributes.
If this is OK, we can add more antibot evasions in the same script.
There are many antibot tests; we are using this one: https://bot.sannysoft.com/
Inspired mainly by:
https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
2022-06-17 10:53:12 +00:00
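The mechanism named in commit 7a12925004, `Page.addScriptToEvaluateOnNewDocument`, is a Chrome DevTools Protocol method that takes the script text in a `source` parameter. A minimal sketch of the JSON message a client might send is shown below; the helper name and the sample script are hypothetical, and how brozzler's `Browser` class actually sends the message is not shown.

```python
import json


def add_script_message(source, message_id=1):
    """Serialize a DevTools Protocol request that registers `source`
    to run in every new document before the page's own scripts.

    Hypothetical helper for illustration only.
    """
    return json.dumps(
        {
            "id": message_id,
            "method": "Page.addScriptToEvaluateOnNewDocument",
            "params": {"source": source},
        }
    )


if __name__ == "__main__":
    # Placeholder script; the real stealth.js content is not reproduced.
    print(add_script_message("/* stealth overrides go here */"))
```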
Barbara Miller
ddf7cb4cbc
bump version
2022-06-09 15:14:21 -07:00
Barbara Miller
f2d70e1e25
Merge pull request #245 from internetarchive/yt-dlp-log
yt-dlp: use 'youtube_dl' logger
2022-06-09 15:12:51 -07:00
Barbara Miller
14466a7fb3
'youtube_dl' logger
2022-06-08 14:30:32 -07:00
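As an illustration of the logger change, yt-dlp accepts a `logger` entry in its options dictionary, and a standard `logging.Logger` provides the `debug`, `warning`, and `error` methods it calls. The sketch below shows one way a logger named `'youtube_dl'` could be supplied; the helper name is hypothetical and this is not a copy of brozzler's code.

```python
import logging


def ydl_logging_opts():
    """Return yt-dlp options that route its output to the
    'youtube_dl' logger (hypothetical helper)."""
    return {"logger": logging.getLogger("youtube_dl")}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    opts = ydl_logging_opts()
    opts["logger"].info("yt-dlp messages would be logged under this name")
```

Keeping the older logger name presumably means existing logging configuration keyed on `youtube_dl` continues to apply after the switch to yt-dlp.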
Adam Miller
1de63f0aea
Merge pull request #244 from internetarchive/yt-dlp-skip-live
yt-dlp should skip live streams
2022-04-27 15:29:07 -07:00
Adam Miller
66252e17c3
Merge pull request #243 from internetarchive/adds-hop-path-support
Adds hop path support
2022-04-26 12:10:43 -07:00
Adam Miller
eef8a1c432
Bump version
2022-04-26 09:55:08 -07:00
Adam Miller
05826942a9
Style fix
2022-04-20 22:49:18 +00:00
Barbara Miller
b693b8713f
skip live streams
2022-04-03 17:50:27 -07:00
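One plausible way to skip live streams is to check the `is_live` field that yt-dlp includes in its extracted info dictionaries. The sketch below assumes that approach; the function name is hypothetical and the commit itself does not say how the check is implemented.

```python
def is_live_stream(info):
    """Return True if a yt-dlp info dict marks the video as live.

    Hypothetical helper; a missing or None 'is_live' value is treated
    as not live.
    """
    return bool(info.get("is_live"))


if __name__ == "__main__":
    print(is_live_stream({"id": "abc", "is_live": True}))
    print(is_live_stream({"id": "def"}))
```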
Adam Miller
cd16985724
Refactor of hop referrer passing
2022-03-24 21:38:47 +00:00
Barbara Miller
70bb544389
bump version
2022-03-22 13:59:48 -07:00
Barbara Miller
7ee6ea50d1
Merge pull request #242 from internetarchive/yt-dlp-03
for the record, @avdempsey ok'd this elsewhere
2022-03-22 10:23:58 -07:00
Barbara Miller
d5e41bf9ef
skip vimeo special case
2022-03-22 10:00:18 -07:00
Barbara Miller
c52b4af608
vimeo/M3u8 handling, better logging
2022-03-21 20:26:20 -07:00
Barbara Miller
d67a05572d
prefer video+audio files, debug postprocessor hook
2022-03-21 13:28:08 -07:00
Adam Miller
f4a9e77b06
Catching edge cases that were avoiding setting hop path information
2022-03-03 00:15:20 +00:00
Barbara Miller
7ea7e543a6
Merge pull request #241 from internetarchive/yt-dlp-too
yt-dlp for brozzler
2022-02-25 15:26:33 -08:00
Barbara Miller
25bb65a635
brozzler/ydl.py updates
2022-02-23 22:34:47 -08:00
Barbara Miller
0305db5e69
yt_dlp, not youtube-dl
2022-02-23 22:32:00 -08:00
Adam Miller
d61cec399e
Merge branch 'master' into adds-hop-path-support
2022-02-09 18:10:37 +00:00
Barbara Miller
d9ac067e41
bump version, copyright statement
2022-01-18 17:45:58 -08:00
Barbara Miller
de199e789e
Merge pull request #237 from vbanos/disable-breakpad
Thanks, @vbanos!
2022-01-18 17:43:45 -08:00