Commit graph

1729 commits

Author SHA1 Message Date
Misty De Méo
771f553572 release: 1.8.1 2025-11-13 15:37:51 -08:00
Adam Miller
78263af2f7
Merge pull request #422 from internetarchive/adam/fix_claim_sites_pre_filter
fix: We were applying the max_sites_to_claim filter too early. Many s…
2025-11-13 11:30:20 -08:00
Adam Miller
4430605ef1 chore: address ci formatting 2025-11-13 11:19:04 -08:00
Adam Miller
96942d40f9 fix: We were applying the max_sites_to_claim filter too early. Many sites in a single crawl prevent claimable sites from getting through. 2025-11-13 11:14:11 -08:00
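The ordering problem this fix describes can be sketched as follows. This is a minimal illustration, not brozzler's actual code: the `Site` class, the `claimable` flag, and both function names are hypothetical; only the `max_sites_to_claim` name comes from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Hypothetical stand-in for a frontier site record.
    id: int
    claimable: bool


def claim_too_early(sites, max_sites_to_claim):
    # Problematic order: truncate first, then filter. If the first
    # entries are not claimable, nothing gets through.
    candidates = sites[:max_sites_to_claim]
    return [s for s in candidates if s.claimable]


def claim_after_filter(sites, max_sites_to_claim):
    # Order described by the fix: filter for claimable sites first,
    # then apply the limit.
    claimable = [s for s in sites if s.claimable]
    return claimable[:max_sites_to_claim]


# Six sites; only those with id >= 3 are claimable.
sites = [Site(i, claimable=(i >= 3)) for i in range(6)]
early = claim_too_early(sites, 2)
late = claim_after_filter(sites, 2)
```

With the limit applied first, the two unclaimable sites at the front of the list use up the whole quota, which matches the symptom the commit message reports.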
Misty De Méo
4d1fb31bc6 ci: install deno
2025-11-07 14:57:42 -08:00
Misty De Méo
9a47de68ee deps: bump minimum python to 3.10
3.9 is now EOL, and yt-dlp no longer supports it.
2025-10-27 12:47:28 -07:00
Misty De Méo
1678d163d4 deps: bump pluggy
Fixes a warning under Python 3.14.
https://docs.python.org/3.14/whatsnew/3.14.html#pep-765-control-flow-in-finally-blocks
2025-10-08 14:08:00 -07:00
Misty De Méo
44484583cb release: 1.8.0
2025-10-07 13:42:16 -07:00
Misty De Méo
3587f6e486 pyproject: add Misty 2025-10-07 13:42:16 -07:00
Misty De Méo
04d06ca49e deps: bump locked yt-dlp
The locked version hasn't been upgraded for a while.
2025-10-07 12:06:55 -07:00
Barbara Miller
ba9e4f1be7
Merge pull request #367 from galgeek/barbara/header_request_timeout_60
increase HEADER_REQUEST_TIMEOUT
2025-10-07 12:00:15 -07:00
Barbara Miller
35153266a1
Merge branch 'master' into barbara/header_request_timeout_60 2025-10-07 11:00:39 -07:00
Misty De Méo
98a829f269 ssl: allow fetching pages needing legacy renegotiation
Unsafe legacy renegotiation is disabled by default in requests, but
it's needed to access some webpages that real browsers are able to
safely access. This leaves it disabled by default when fetching
headers, while logging and retrying with it enabled if that fails.
robots.txt fetching is always done with legacy renegotiation on.
2025-10-06 09:28:36 -07:00
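The "try strict, log, retry with legacy renegotiation" flow in this commit could look roughly like the sketch below. It uses only the standard library `ssl` module rather than requests, and the `fetch` callable, `make_context`, and `fetch_with_fallback` are hypothetical names. It also assumes that OpenSSL's `SSL_OP_LEGACY_SERVER_CONNECT` option (value `0x4`) is the relevant switch; the commit message does not say which option brozzler sets.

```python
import logging
import ssl

logger = logging.getLogger(__name__)

# Assumption: this is the OpenSSL option that permits connecting to
# servers without secure renegotiation support. Newer Python versions
# expose it by name; 0x4 is OpenSSL's numeric value for it.
LEGACY_FLAG = getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)


def make_context(allow_legacy):
    # Build a default client context, optionally enabling the legacy flag.
    ctx = ssl.create_default_context()
    if allow_legacy:
        ctx.options |= LEGACY_FLAG
    return ctx


def fetch_with_fallback(fetch, url):
    # `fetch` is a hypothetical callable taking (url, ssl_context) and
    # raising ssl.SSLError when the TLS handshake fails.
    try:
        return fetch(url, make_context(allow_legacy=False))
    except ssl.SSLError as exc:
        logger.warning("TLS error for %s (%s); retrying with legacy option", url, exc)
        return fetch(url, make_context(allow_legacy=True))
```

Keeping the strict context as the first attempt means the weaker setting is only used for hosts that actually need it, and each such case leaves a log line.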
TheTechRobo
89d06af104 Fix workaround
2025-08-28 15:25:59 -03:00
TheTechRobo
a19d8c7814 Add test for network monitoring 2025-08-28 15:25:59 -03:00
TheTechRobo
a5b2ecbda9 Track network activity and wait for idle when visiting hashtags 2025-08-28 15:25:59 -03:00
Misty De Méo
3e82a55207 ci: migrate yt-dlp autotest to renovate
This replaces the previous yt-dlp auto-test and merge workflow to use
Renovate instead of Dependabot, since we've found that Dependabot is
no longer able to update our dependencies.
2025-08-21 13:25:29 -07:00
renovate[bot]
2b0bc419c0 chore(config): migrate config renovate.json 2025-08-21 13:14:07 -07:00
Misty De Méo
10db4bd19f renovate: disable everything but yt-dlp 2025-08-21 12:39:20 -07:00
Misty De Méo
9b7999989b renovate: customizations 2025-08-21 12:39:20 -07:00
renovate[bot]
b889cedf64 Add renovate.json 2025-08-21 12:39:20 -07:00
Misty De Méo
972b816878 deps: warctools 5.0.1
Silences a noisy warning; no other changes.
2025-08-18 15:32:43 -07:00
Misty De Méo
6261ea15ad tests: add some silenced warnings
These come from a dependency we can't affect right now.
2025-08-18 15:20:12 -07:00
Misty De Méo
940dadfc12 worker: add missing import
2025-07-30 14:17:30 -07:00
Misty De Méo
5ee31cd879 browser: fix json separators 2025-07-30 14:17:30 -07:00
TheTechRobo
08bb09ff06 Add --no-headless option to brozzle-page and brozzler-worker CLI
2025-07-28 15:04:00 -07:00
TheTechRobo
7d7968e833 Add headless option to Chrome.start 2025-07-28 15:04:00 -07:00
Misty De Méo
f719b61983 docs: bump README copyright year
2025-07-28 14:19:46 -07:00
Misty De Méo
43b7e57147 docs: remove outdated README comment 2025-07-28 14:19:46 -07:00
Misty De Méo
4c77515063 deps: warctools 5.0.0
Needed for the warcprox import to work.
2025-07-21 12:40:11 -07:00
Misty De Méo
99575b03b4 ci: always run full test suite
We previously ran the full suite, including test_brozzling, on a daily
timer because it took an enormous amount of time to run. I'd been under
the impression this was because it *had* to take that long to do the
work it was performing, but it looks like it hadn't been necessary and
the suite has been sped up massively since. We can now run it in about
six and a half minutes, which is perfectly fine to run on every PR.
2025-07-21 12:40:11 -07:00
Misty De Méo
f54e9e382a tests: fix invalid escape
This made the common mistake of putting `\.` instead of `\\.`
in a non-raw string.
2025-07-18 16:51:51 -07:00
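The escape mistake mentioned in this commit can be illustrated with a short sketch. The pattern and test strings are made up for the example and are not taken from the brozzler test suite.

```python
import re

# In a non-raw string literal, a backslash followed by "." is not a
# recognized escape sequence, and recent Python versions warn about it.
# Two equivalent ways to write a regex that matches a literal dot:
escaped = "example\\.com"  # doubled backslash in a normal string
raw = r"example\.com"      # raw string, usually the more readable choice

same_pattern = escaped == raw

pattern = re.compile(raw)
matches_dot = bool(pattern.fullmatch("example.com"))
matches_other = bool(pattern.fullmatch("exampleXcom"))
```

Both spellings produce the same pattern text; the raw string simply avoids a layer of escaping.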
Misty De Méo
60f363ca89 tests: mark frontier perf test xfail
This is failing for me in CI, but passing locally.
2025-07-18 16:32:55 -07:00
Misty De Méo
db5cc6758a ci: run frontier tests
This was skipped before due to flakiness, but it seems to be both
reliable and fast enough to be tolerable. It takes about 30 seconds
to complete on my local machine.
2025-07-18 16:32:55 -07:00
Misty De Méo
dfcfed8ace ci: skip manpage generation
This should speed up dependency installs.
2025-07-18 16:07:59 -07:00
Misty De Méo
cb2ee89aee tests: fix out of date frontier fixture 2025-07-18 15:34:59 -07:00
Misty De Méo
306e55d61a ci: fix daily run
I migrated our regular tests to use `uv`, but neglected to update
this config too.
2025-07-18 15:34:59 -07:00
Misty De Méo
0f0ae4fbc3 remove unnecessary imports, use find_spec
This was flagged by ruff check - if we just want to find out if a
package is available, and don't need to actually import it, we can
use importlib.util.find_spec() to resolve it. This can lead to a
moderate speedup too, since the import might be slow.
2025-07-18 15:09:45 -07:00
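A minimal sketch of the availability check this commit describes, using `importlib.util.find_spec()`. The helper name `is_available` and the module names are illustrative only.

```python
import importlib.util


def is_available(module_name):
    # find_spec() locates a top-level module without executing it and
    # returns None if it cannot be found. For dotted names it imports
    # the parent package first, so this helper is meant for top-level
    # names.
    return importlib.util.find_spec(module_name) is not None


has_json = is_available("json")
has_missing = is_available("definitely_not_a_real_module_xyz")
```

Because the module body is never run, the check stays cheap even when the package itself is slow to import.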
Misty De Méo
85ae741b5d deps: bump ruff 2025-07-18 14:45:48 -07:00
Misty De Méo
f9cc2ea48e ci: test with 3.14 beta
3.14 beta 4 is very late in the cycle, so it seems like a good time
for us to start testing with it to make sure we're ready.
2025-07-10 09:47:18 -07:00
Misty De Méo
aea4286bd1 ci: use uv 2025-07-10 09:41:09 -07:00
Misty De Méo
7b691fe397 worker: skip audio content-types for media exclusion
2025-07-07 14:41:03 -07:00
Barbara Miller
d20d452fcf bump version to 1.6.14 2025-06-30 17:51:26 -07:00
Misty De Méo
a0f60c1051 Video exclusion: skip YouTube UMP packets too
In testing a page with an embedded YouTube video with video
exclusion enabled, I found that brozzler ended up capturing about
30MB of UMP packets. We should be filtering those out too.
2025-06-26 17:13:24 -07:00
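A content-type based exclusion check of the kind these media-exclusion commits describe might look like the sketch below. Everything here is an assumption for illustration: the function and constant names are hypothetical, and `application/vnd.yt-ump` is assumed to be the content type used for YouTube UMP responses. The commit messages do not show brozzler's actual filter.

```python
# Assumed MIME prefixes for ordinary media responses.
EXCLUDED_PREFIXES = ("video/", "audio/")
# Assumed content type for YouTube UMP packets.
EXCLUDED_TYPES = {"application/vnd.yt-ump"}


def should_skip(content_type):
    # Decide from a Content-Type header value whether a response would
    # be left out of the capture when media exclusion is enabled.
    if not content_type:
        return False
    # Drop parameters such as "; charset=..." and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith(EXCLUDED_PREFIXES) or mime in EXCLUDED_TYPES


skip_video = should_skip("video/mp4")
skip_audio = should_skip("Audio/MPEG; rate=44100")
skip_ump = should_skip("application/vnd.yt-ump")
skip_html = should_skip("text/html; charset=utf-8")
```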
Misty De Méo
5ff893ddaf brozzler-new-site: add flag to disable videos
This makes it easier to test the new video exclusion work.
2025-06-26 14:38:15 -07:00
Misty De Méo
38f164dbc4 Makefile: remove target-version
This can be inferred from our pyproject.toml.
2025-06-26 09:04:51 -07:00
Misty De Méo
f9848efc1e tests: recognize CI=true 2025-06-26 09:04:51 -07:00
Misty De Méo
a4e5418e13 tests: enable format check 2025-06-26 09:04:51 -07:00
Misty De Méo
0f2c166e2a tests: use github-format in ci 2025-06-26 09:04:51 -07:00
Misty De Méo
422527d7e4 tests: ruff fixes
2025-06-25 15:50:39 -07:00