1582 Commits

Author SHA1 Message Date
Barbara Miller
6c4a9c0f89
bump version 2023-10-19 10:24:20 -07:00
Barbara Miller
5b5d4cb062
Merge pull request #263 from galgeek/yt-dlp-vimeo
yt-dlp: capture postprocessor "Merger" videos
2023-10-19 10:23:29 -07:00
Barbara Miller
52235d9668 catch yt_dlp.utils.UnsupportedError 2023-10-19 09:56:36 -07:00
Barbara Miller
43a97f596c rm old vimeo custom behavior 2023-10-18 18:14:57 -07:00
Barbara Miller
e20c59eeb9 update youtube-dl references to yt-dlp 2023-10-18 16:39:00 -07:00
Barbara Miller
1980573f9e minimal fix for current vimeo capture 2023-10-18 16:36:19 -07:00
Barbara Miller
a62e33a683
bump version 2023-10-18 11:03:33 -07:00
Barbara Miller
c1345e2c9f
Merge pull request #262 from galgeek/pr261
@avdempsey approved this merge.
2023-10-18 11:01:15 -07:00
Barbara Miller
542000ead3
Merge pull request #260 from galgeek/headless
@avdempsey approved this merge.
2023-10-18 11:00:07 -07:00
Barbara Miller
dc4097a9df better version number 2023-10-17 15:01:51 -07:00
Barbara Miller
d07cee8cf0 update rethinkdb imports 2023-10-17 14:58:34 -07:00
Barbara Miller
8c32c98431 update doublethink dependency, too 2023-10-17 14:24:46 -07:00
Vangelis Banos
dff1fbb08b Update rethinkdb dependency
The latest `warcprox` 2.5.1 requirement
https://github.com/internetarchive/doublethink/blob/Py311/setup.py
requires `rethinkdb>=2.4.9,<2.5` but Brozzler has `rethinkdb>=2.3,<2.4`
and this creates a conflict if they are in the same virtualenv.

We update Brozzler to use the same dependency.
2023-10-17 19:36:04 +00:00
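The conflict described in dff1fbb08b comes from two version ranges that share no version. The sketch below shows that with a simplified check. It is an assumption-laden illustration: it handles only plain `>=X,<Y` specifiers modeled as half-open intervals, while pip itself uses full PEP 440 specifier semantics. `parse_version`, `parse_range` and `ranges_overlap` are hypothetical helpers and do not come from brozzler.

```python
def parse_version(text):
    """Turn '2.4' or '2.4.9' into a comparable 3-tuple such as (2, 4, 0)."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so (2, 4) compares like (2, 4, 0)
    return tuple(parts)


def parse_range(spec):
    """Parse a simple '>=X,<Y' specifier into (lower, upper) version tuples."""
    lower = upper = None
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            lower = parse_version(clause[2:])
        elif clause.startswith("<") and not clause.startswith("<="):
            upper = parse_version(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    if lower is None or upper is None:
        raise ValueError(f"expected both '>=' and '<' bounds in {spec!r}")
    return lower, upper


def ranges_overlap(spec_a, spec_b):
    """True if at least one version satisfies both specifiers."""
    lo_a, hi_a = parse_range(spec_a)
    lo_b, hi_b = parse_range(spec_b)
    # Intersection of [lo_a, hi_a) and [lo_b, hi_b) is non-empty iff max(lo) < min(hi).
    return max(lo_a, lo_b) < min(hi_a, hi_b)


doublethink_req = ">=2.4.9,<2.5"
old_brozzler_req = ">=2.3,<2.4"
print(ranges_overlap(doublethink_req, old_brozzler_req))  # False: no shared version
```

Aligning Brozzler on the same `>=2.4.9,<2.5` range makes the intersection non-empty again, which is why both packages can then share one virtualenv.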
Barbara Miller
d610c7745b headless chrome 2023-10-11 15:47:38 -07:00
Barbara Miller
3fc48a9aa8
bump version 2023-09-27 17:24:53 -07:00
Barbara Miller
1367f4bbdb
Merge pull request #259 from galgeek/yt-dlp-fix
@adam-miller ok'd the merge
2023-09-27 17:19:53 -07:00
Barbara Miller
eef7173d72 update for m3u8s, better naming 2023-09-18 14:51:44 -07:00
Barbara Miller
75e0555d43 don't handle FixupM3u8? 2023-09-15 16:56:29 -07:00
Barbara Miller
9c55c7b4c9
bump version 2023-09-07 16:40:29 -07:00
Barbara Miller
540d353a53
Merge pull request #258 from galgeek/yt-dlp-mp4-again
@avdempsey looked at this, I think, and remarked, "Looks good!" which I interpret as, ready to merge, since it fixes an urgent-ish capture issue
2023-09-07 16:38:43 -07:00
Barbara Miller
f868ce146b tidying 2023-09-07 12:39:43 -07:00
Barbara Miller
9cf12039c9 skip remembering youtube video chunks 2023-09-07 12:01:16 -07:00
Barbara Miller
7a3c6d6abe set url per postprocessor 2023-09-06 17:30:48 -07:00
Barbara Miller
c5c918bc87 running well enough maybe 2023-09-05 15:40:23 -07:00
Barbara Miller
c74b1123bb update for mp4s like they used to be 2023-08-31 18:02:01 -07:00
Barbara Miller
57d4fd8060
bump version 2023-07-17 16:00:21 -07:00
Barbara Miller
740019cc18
Merge pull request #257 from vbanos/screenshot-on-successful-capture
Thanks, @vbanos!
2023-07-17 15:59:12 -07:00
Vangelis Banos
7ad7a230f6 Disable screenshot on 4xx/5xx when simpler404 option is used
Also update the relevant comment.
2023-07-16 14:57:09 +00:00
Vangelis Banos
dc0f2a7455 Do not try to get a screenshot if status is 4xx, 5xx
The screenshot is an additional thing we do when the capture is
successful. Why get a screenshot of 4xx/5xx responses? It's just extra
system load.
We already got the capture for archiving reasons.
2023-07-07 11:47:16 +00:00
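Commits dc0f2a7455 and 7ad7a230f6 together describe when a screenshot is skipped. Below is a minimal sketch of one reading of that rule, assuming that 4xx/5xx responses are skipped only when the `simpler404` option is enabled. `should_take_screenshot` is a hypothetical name and is not Brozzler's actual code.

```python
def should_take_screenshot(status_code, simpler404=False):
    """Decide whether to screenshot a page after it has been captured.

    Assumption: error responses (4xx/5xx) are skipped only when the
    simpler404 option is on. The capture itself is archived either way;
    this only avoids the extra load of rendering a screenshot.
    """
    is_error = status_code is not None and 400 <= status_code <= 599
    return not (simpler404 and is_error)


print(should_take_screenshot(200))                   # True
print(should_take_screenshot(404, simpler404=True))  # False
```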
Barbara Miller
b138b1e89b
bump version, copyright date 2023-04-30 11:15:23 -07:00
Barbara Miller
daefd9a4d5
Merge pull request #254 from internetarchive/bigger-window
configurable browser window height & width
2023-04-29 18:48:39 -07:00
Barbara Miller
6d69105c79 configurable window height & width 2023-04-28 13:49:44 -07:00
Barbara Miller
7783f92ce2 larger chrome window: 1400,900 2023-04-26 14:51:19 -07:00
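The "1400,900" in 7783f92ce2 matches the format of Chrome's `--window-size=W,H` command-line switch. This sketch assumes that is how the configurable height and width from 6d69105c79 reach the browser. `chrome_window_args` and the default constants are hypothetical names.

```python
DEFAULT_WINDOW_WIDTH = 1400  # default size taken from commit 7783f92ce2
DEFAULT_WINDOW_HEIGHT = 900


def chrome_window_args(width=DEFAULT_WINDOW_WIDTH, height=DEFAULT_WINDOW_HEIGHT):
    """Build the Chrome command-line switch for a given window size."""
    width, height = int(width), int(height)
    if width <= 0 or height <= 0:
        raise ValueError("window dimensions must be positive")
    return [f"--window-size={width},{height}"]


print(chrome_window_args())            # ['--window-size=1400,900']
print(chrome_window_args(1920, 1080))  # ['--window-size=1920,1080']
```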
Barbara Miller
0d4ed6a8be
bump version 2023-03-15 15:55:08 -07:00
Barbara Miller
4e65c2f046
Merge pull request #253 from internetarchive/yt-dlp-timeout
add socket_timeout opt for yt-dlp

Mike Wilson reviewed this via slack. We've agreed that it may be helpful to offer this setting as a command line option for brozzler, when this code is updated again.
2023-03-15 15:54:19 -07:00
Barbara Miller
0847d93d9e add socket_timeout opt for yt-dlp 2023-03-15 14:15:18 -07:00
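`socket_timeout` is a real `yt_dlp.YoutubeDL` option: the number of seconds to wait for an unresponsive host. The review note on PR #253 suggests exposing it as a brozzler command-line option later. The sketch below shows one way that could look. The `--ytdlp-socket-timeout` flag, its 60-second default, and both helper functions are hypothetical. They are not existing brozzler options.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="brozzler-worker-sketch")
    # Hypothetical flag; brozzler does not define this option.
    parser.add_argument(
        "--ytdlp-socket-timeout", type=float, default=60.0,
        help="seconds yt-dlp waits on an unresponsive host")
    return parser


def ytdlp_options(socket_timeout):
    """Options dict for yt_dlp.YoutubeDL; both keys are real yt-dlp options."""
    return {
        "socket_timeout": socket_timeout,
        "quiet": True,
    }


args = build_parser().parse_args(["--ytdlp-socket-timeout", "30"])
opts = ytdlp_options(args.ytdlp_socket_timeout)
print(opts)  # {'socket_timeout': 30.0, 'quiet': True}
# With yt-dlp installed, these would be passed as yt_dlp.YoutubeDL(opts).
```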
Barbara Miller
03a6b15717
warcprox>=2.4.31 2022-08-19 12:50:34 -07:00
Barbara Miller
a4195e1a83
bump version 2022-08-12 10:41:48 -07:00
Barbara Miller
50c2b424c2
Merge pull request #248 from vbanos/stealth2
Add more stealth evasions
2022-08-12 10:40:34 -07:00
Barbara Miller
60645f7f37
bump version 2022-08-05 15:58:55 -07:00
Barbara Miller
0b60a2e2f3
Merge pull request #249 from internetarchive/blocks-shrink
@adam-miller ok'd this elsewhere
2022-08-05 15:36:34 -07:00
Barbara Miller
7edb0f11b0 and decode() 2022-08-04 16:04:37 -07:00
Barbara Miller
a5ee78e662 zlib compression 2022-08-04 11:16:38 -07:00
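PR #249 shrinks stored data with zlib, and its follow-up "and decode()" suggests converting the compressed bytes back to `str`. The log does not say what the "blocks" contain or where they are stored. This sketch therefore assumes JSON-serializable data that needs a text form, and it uses base64 for that (also an assumption). `shrink` and `expand` are hypothetical names.

```python
import base64
import json
import zlib


def shrink(obj):
    """Serialize to JSON, zlib-compress, and return an ASCII str."""
    raw = json.dumps(obj).encode("utf-8")
    # b64encode returns bytes; decode() turns them into a storable str.
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def expand(text):
    """Reverse shrink(): base64-decode, decompress, and parse the JSON."""
    raw = zlib.decompress(base64.b64decode(text))
    return json.loads(raw.decode("utf-8"))


data = [{"url": "https://example.com/page", "outlinks": ["https://example.com/a"] * 10}] * 50
packed = shrink(data)
print(len(json.dumps(data)), len(packed))  # repetitive data compresses well
```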
Vangelis Banos
b5b7d9d52b Add more stealth evasions
Set `navigator.platform = 'Win32'` instead of the default `Linux`, as we
usually run Brozzler on Linux.

Randomize the `navigator.deviceMemory` and
`navigator.hardwareConcurrency` to avoid browser fingerprinting.

Define `window.Notification`, which is otherwise undefined because we run Chrome
with the CLI parameter `--disable-notifications`.
2022-07-29 11:21:08 +00:00
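Commit b5b7d9d52b lists three evasions: a fixed `navigator.platform`, randomized `navigator.deviceMemory` and `navigator.hardwareConcurrency`, and a defined `window.Notification`. The sketch below generates JavaScript for those evasions from Python. It is an illustration, not the contents of Brozzler's `stealth.js`. The core-count pool and the crude `Notification` stub are assumptions. The memory values are those the Device Memory API reports (0.25 to 8 GiB).

```python
import json
import random

# Values navigator.deviceMemory can report, in GiB.
DEVICE_MEMORY_CHOICES = [0.25, 0.5, 1, 2, 4, 8]
# Hypothetical pool of plausible logical-core counts.
CORE_COUNT_CHOICES = [2, 4, 8, 12, 16]


def stealth_js(rng=random):
    """Return JS that overrides navigator attributes with randomized values."""
    device_memory = rng.choice(DEVICE_MEMORY_CHOICES)
    cores = rng.choice(CORE_COUNT_CHOICES)
    # Doubled braces {{ }} produce literal JS braces inside the f-string.
    return f"""
Object.defineProperty(Navigator.prototype, 'platform', {{get: () => 'Win32'}});
Object.defineProperty(Navigator.prototype, 'deviceMemory', {{get: () => {json.dumps(device_memory)}}});
Object.defineProperty(Navigator.prototype, 'hardwareConcurrency', {{get: () => {cores}}});
if (!('Notification' in window)) {{
  window.Notification = {{permission: 'denied'}};  // crude stub, not a real constructor
}}
"""


print(stealth_js(random.Random(0)))
```

Passing a seeded `random.Random` keeps output reproducible for testing. The default module-level generator varies the fingerprint between runs, which is the point of the randomization.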
Barbara Miller
39eb80567d
bump version 2022-06-22 16:13:59 -07:00
Barbara Miller
fa59a88a26
Merge pull request #247 from internetarchive/stealth-too 2022-06-22 16:13:12 -07:00
Barbara Miller
218a49e824 stealth for brozzler_worker 2022-06-22 14:14:50 -07:00
Barbara Miller
de8d67e1e7
bump version 2022-06-20 13:44:42 -07:00
Barbara Miller
fe0aaa1ff6
Merge pull request #246 from vbanos/stealth
Looks good, thank you, @vbanos!
2022-06-20 13:43:25 -07:00
Vangelis Banos
7a12925004 Add stealth parameter to avoid antibot systems
The aim is to prevent Brozzler detection and blocking by antibot
systems. To do that, we need to run some JS before any other code runs
on page load and mock specific browser attributes which indicate that
Brozzler is a bot.

We add the option `stealth` in `Browser`, `brozzler.cli` and
`BrozzlerWorker`. It is disabled by default.

If enabled, we run `stealth.js` which is executed before anything else
on the page via `Page.addScriptToEvaluateOnNewDocument`.

For now, we mock only the graphics driver attributes.
If this is OK, we can add more antibot evasions in the same script.

There are many antibot tests, we are using this: https://bot.sannysoft.com/

Inspired mainly by:
https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
2022-06-17 10:53:12 +00:00
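`Page.addScriptToEvaluateOnNewDocument` is a Chrome DevTools Protocol method whose `source` parameter runs in every new document before the page's own scripts. The sketch below builds that request as a JSON message of the kind sent over the DevTools websocket. The sample source mocks the WebGL vendor and renderer (the graphics driver attributes mentioned above), similar to the puppeteer-extra evasion. The vendor and renderer strings are assumptions, not what Brozzler's `stealth.js` uses, and the websocket itself is not shown.

```python
import itertools
import json

_message_ids = itertools.count(1)  # CDP requests need a unique integer id

# Sample stealth source: 37445/37446 are UNMASKED_VENDOR_WEBGL / UNMASKED_RENDERER_WEBGL.
STEALTH_SOURCE = """
const getParameter = WebGLRenderingContext.prototype.getParameter;
WebGLRenderingContext.prototype.getParameter = function (parameter) {
  if (parameter === 37445) return 'Intel Inc.';
  if (parameter === 37446) return 'Intel Iris OpenGL Engine';
  return getParameter.call(this, parameter);
};
"""


def add_script_on_new_document(source):
    """Build a CDP request that runs `source` before any page script."""
    return json.dumps({
        "id": next(_message_ids),
        "method": "Page.addScriptToEvaluateOnNewDocument",
        "params": {"source": source},
    })


message = add_script_on_new_document(STEALTH_SOURCE)
print(json.loads(message)["method"])  # Page.addScriptToEvaluateOnNewDocument
```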