1626 Commits

Author SHA1 Message Date
Gretchen Leigh Miller
5350c202dc
Update README.rst to remove brozzler-easy and Wayback sections + other cleanup (#336)
* update instructions for brozzler-easy + add pywb extras

* revert pywb extra + updated README

* ruffing up

* more README.rst updates

* revert https change for local URL scheme
2025-03-05 14:33:47 -08:00
Misty De Méo
05b72906bd remove travis config 2025-03-05 13:34:03 -08:00
Misty De Méo
b45e5dc096 CLI: add new --worker-id option
This adds a new commandline flag allowing the worker ID to be specified.
If present, it will be added to the global context so that it will be
included in every logging statement.

Previously, we only had some indirect values to tie logging statements
to specific workers, so this should make it easier to follow.
2025-03-05 11:01:50 -08:00
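A rough illustration of the idea in the `--worker-id` commit above: parse an optional worker ID from the command line and, if present, attach it to every log statement. This is a minimal sketch using only the standard library's `argparse` and `logging`; it is not brozzler's code, which by the time of this commit had been ported to `structlog` and would bind the value into that library's context instead. The names `WorkerIdFilter`, `parse_args`, `configure_logging`, and the `worker-demo` program name are hypothetical.

```python
import argparse
import logging


class WorkerIdFilter(logging.Filter):
    """Attach a fixed worker_id attribute to every record that passes through."""

    def __init__(self, worker_id):
        super().__init__()
        self.worker_id = worker_id

    def filter(self, record):
        record.worker_id = self.worker_id
        return True  # never drop the record, only annotate it


def parse_args(argv=None):
    # Hypothetical CLI: only the --worker-id flag itself comes from the commit.
    parser = argparse.ArgumentParser(prog="worker-demo")
    parser.add_argument(
        "--worker-id",
        dest="worker_id",
        default=None,
        help="identifier to include in every logging statement",
    )
    return parser.parse_args(argv)


def configure_logging(args, handler):
    """Add the worker ID to the handler's output only when one was given."""
    if args.worker_id is not None:
        handler.addFilter(WorkerIdFilter(args.worker_id))
        fmt = "%(levelname)s worker_id=%(worker_id)s %(message)s"
    else:
        fmt = "%(levelname)s %(message)s"
    handler.setFormatter(logging.Formatter(fmt))
    return handler
```

Putting the filter on the handler rather than on one logger means records from every logger routed through that handler carry the ID, which is the closest stdlib analogue to a global context.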
Misty De Méo
af0f3ed378 CLI: enable log prefixing
This adds a commandline option which enables log level prefixing.
These prefixes enable log level-based filtering in journalctl when
present so long as logs are going to the journal, and
`SyslogLevelPrefix=` is set to `true` (which it is by default).

For documentation: https://manpages.debian.org/testing/libsystemd-dev/sd-daemon.3.en.html
2025-03-05 11:01:50 -08:00
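To make the log prefixing commit above concrete: sd-daemon(3) defines prefixes of the form `<N>`, where N is a syslog priority from 0 (emergency) to 7 (debug), and the journal strips the prefix and records the priority when `SyslogLevelPrefix=` is enabled. The sketch below shows one way to emit such prefixes from Python's standard `logging` module. It is an assumption-laden illustration rather than brozzler's implementation; `LevelPrefixFormatter` and `SD_PRIORITY` are hypothetical names.

```python
import logging
import sys

# Syslog priorities used by sd-daemon(3) prefixes, keyed by Python log level.
# Python has no NOTICE/ALERT/EMERG levels, so those priorities are unused here.
SD_PRIORITY = {
    logging.CRITICAL: 2,  # SD_CRIT
    logging.ERROR: 3,     # SD_ERR
    logging.WARNING: 4,   # SD_WARNING
    logging.INFO: 6,      # SD_INFO
    logging.DEBUG: 7,     # SD_DEBUG
}


class LevelPrefixFormatter(logging.Formatter):
    """Prepend "<N>" to each formatted message so journald can read the level."""

    def format(self, record):
        message = super().format(record)
        # Unknown or custom levels fall back to the INFO priority.
        priority = SD_PRIORITY.get(record.levelno, 6)
        return f"<{priority}>{message}"


def install_prefixing(logger, stream=sys.stderr):
    """Send the logger's output to `stream` with level prefixes applied."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(LevelPrefixFormatter("%(message)s"))
    logger.addHandler(handler)
    return handler
```

With output like this going to the journal, a command such as `journalctl -p err` can filter by level.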
Misty De Méo
f384d0b830 deps: add dev deps to pyproject.toml 2025-03-05 10:07:29 -08:00
Misty De Méo
ffeaee7a01 chore: use ruff for formatting
There are a few minor changes here compared to black; it flagged
unnecessary string concatenations, and has slightly different
opinions on line length.
2025-03-05 09:43:17 -08:00
Misty De Méo
c59b08df33
test: add CI (#329)
This adds two CI runs: a quick one that happens for every pull
request and merge to master, and a longer one that happens daily.

This also adds a new installation group to setup.py because the
`easy` group isn't currently installable, and some of the dependencies
specified there need to be present for the tests to run.
2025-03-04 09:34:23 -08:00
Barbara Miller
984a129b43
Merge pull request #333 from galgeek/barbara/rm_workflow_maybe
remove unused publish-artifacts workflow
2025-03-03 14:53:57 -08:00
Barbara Miller
e6332c7f94 unused 2025-03-03 14:41:30 -08:00
Adam Miller
39b695dc70
Merge pull request #320 from internetarchive/adam/chrome-flags-disable-optimization-model-downloads
feat: add chrome flag to disable download of autocomplete optimizatio…
2025-03-03 14:09:29 -08:00
Adam Miller
ccfadc87c8 combine disable-features flags 2025-03-03 13:58:48 -08:00
Misty De Méo
e37c0ad78c
deps: remove easy group (#331)
This group isn't actually installable right now because of the jinja
dependency conflict.
2025-03-03 09:24:52 -08:00
Gretchen Leigh Miller
6ca1a62489
update minimum Python version in README (#327)
2025-02-25 12:08:35 -08:00
Misty De Méo
23cee477dc
feat: set up structlog logging (#325)
This ports the logging from `logging` to `structlog`. This updates
all of the logger instantiations along with all of the places
`logging` was called. Data that was being inlined into log statements
has been broken out so that it's now structured arguments to the
log statements instead.
2025-02-24 16:31:09 -08:00
Misty De Méo
69d682beb9
Merge pull request #324 from mistydemeo/mdfind
CLI: improve Chrome finding on Mac
2025-02-19 14:48:21 -08:00
Misty De Méo
53cac65540 CLI: improve Chrome finding on Mac
On macOS, we can find Chrome even if it's installed in a non-default
path by querying `mdfind`. This is the CLI entrypoint to Spotlight,
and we can use it to look up applications using their unique bundle
identifiers.

If `mdfind` fails to find anything, this falls back to the hardcoded
paths. This should ensure this still works if Spotlight indexing is
off, but Chrome is in the default path.
2025-02-19 13:55:18 -08:00
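The lookup described in the commit above can be sketched as follows: ask Spotlight, via `mdfind`, for the application with Chrome's bundle identifier, and fall back to a hardcoded path if that yields nothing. This is an illustrative sketch, not the code from the commit. The bundle identifier `com.google.Chrome` and the default install path are the standard ones for Google Chrome on macOS, but the function names, the 10-second timeout, and the injectable `run` and `exists` parameters are assumptions made here for testability.

```python
import os
import subprocess

CHROME_BUNDLE_ID = "com.google.Chrome"
DEFAULT_CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def mdfind_app(bundle_id, run=subprocess.run):
    """Return the first path Spotlight reports for bundle_id, or None."""
    query = f"kMDItemCFBundleIdentifier == '{bundle_id}'"
    try:
        result = run(["mdfind", query], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        # mdfind is missing (not macOS) or it timed out.
        return None
    if result.returncode != 0:
        return None
    paths = [line for line in result.stdout.splitlines() if line.strip()]
    return paths[0] if paths else None


def find_chrome(run=subprocess.run, exists=os.path.exists):
    """Prefer the Spotlight result; fall back to the default install path."""
    app = mdfind_app(CHROME_BUNDLE_ID, run=run)
    if app:
        executable = os.path.join(app, "Contents", "MacOS", "Google Chrome")
        if exists(executable):
            return executable
    # Covers the case where Spotlight indexing is off but Chrome is in
    # the default location.
    return DEFAULT_CHROME if exists(DEFAULT_CHROME) else None
```

Calling `find_chrome()` with no arguments uses the real `mdfind` and filesystem; on a machine without either, it returns `None`.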
Barbara Miller
591ba3c95a
bump version
2025-02-14 13:05:38 -08:00
Barbara Miller
c63f4296a6
Merge pull request #323 from galgeek/bmiller/better_fetch_url_timeout_errors
better error handling for _fetch_url
2025-02-14 12:40:04 -08:00
Barbara Miller
71ffbddfeb log _fetch_url completion 2025-02-14 10:38:24 -08:00
Barbara Miller
ba7031f2da better exceptions for fetch_url 2025-02-14 09:39:41 -08:00
Barbara Miller
732a7943f0 http, not https, maybe 2025-02-13 17:57:53 -08:00
Barbara Miller
819a483227 black'd 2025-02-13 17:55:36 -08:00
Barbara Miller
9dca200230 cert_reqs="CERT_NONE" 2025-02-13 16:27:05 -08:00
Barbara Miller
4af48be6ca use urllib3 2025-02-13 16:11:22 -08:00
Barbara Miller
2c9c040b84 black'd 2025-02-13 14:24:27 -08:00
Barbara Miller
53a1869def better error handling for _fetch_url 2025-02-13 14:21:24 -08:00
Barbara Miller
5fccdd83e3
bump version to 1.6.8
2025-02-11 17:34:14 -08:00
Barbara Miller
bfc85f5e89
Merge pull request #322 from galgeek/bmiller/more_better_requests
requests timeout for fetch_url, plus user_agent
2025-02-11 17:33:14 -08:00
Barbara Miller
ca79c3a329 minor logging fix 2025-02-11 17:28:18 -08:00
Barbara Miller
430c0daf39 catch and log more exceptions on fetch_url error 2025-02-11 12:51:26 -08:00
Barbara Miller
561e0803c6 requests timeout and user_agent 2025-02-11 12:27:50 -08:00
Barbara Miller
65de0d2a5f timeout for fetch_url 2025-02-09 11:13:03 -08:00
Adam Miller
6fa65557fb feat: add chrome flag to disable download of autocomplete optimization models at launch 2025-02-06 16:27:33 -08:00
Adam Miller
7ededbc521
Merge pull request #318 from internetarchive/adam/get-page-header-timeout
feat: add timeout to header check
2025-02-06 11:22:28 -08:00
Adam Miller
8ed517c1c0 chore: bump version 2025-02-06 11:19:23 -08:00
Adam Miller
3afc63242b fix: syntax bug on HEADER_REQUEST_TIMEOUT 2025-02-05 12:38:41 -08:00
Adam Miller
c5844dfdd6 chore: cleanup unused variable 2025-02-04 16:36:23 -08:00
Adam Miller
0feac5cd07 feat: add timeout to header check 2025-02-04 16:21:28 -08:00
Barbara Miller
df4bd148d5
bump version and update copyright
2025-01-23 16:26:16 -08:00
Barbara Miller
a749b2968b
Merge pull request #316 from galgeek/bmiller/shorter_behavior_timeout
shorter behavior timeout
2025-01-23 15:37:29 -08:00
Barbara Miller
5e701e9dbe
Merge pull request #315 from galgeek/bmiller/proxy_select
yt-dlp proxy handling update
2025-01-23 15:37:01 -08:00
Adam Miller
1e30b4f478
Merge pull request #312 from internetarchive/adam/patch-yt-dlp-infinite-loop-bug
feat: override yt-dlp generic extractor to add redirect loop detectio…
2025-01-23 15:30:56 -08:00
Barbara Miller
2905324435 behavior_timeout=300seconds 2025-01-23 14:56:44 -08:00
Barbara Miller
9e09782984 ytdlp_proxy_file param 2025-01-23 14:35:34 -08:00
Barbara Miller
b22349e281 black'd 2025-01-23 12:37:56 -08:00
Barbara Miller
baa33e3079 ytdlp_proxy 2025-01-23 12:17:07 -08:00
Barbara Miller
854970f4dd black'd 2025-01-23 11:21:05 -08:00
Barbara Miller
170377fe89 yt-dlp proxy handling update 2025-01-23 10:58:32 -08:00
Adam Miller
493587ca2c fix: return ie_result and cleanup variable names to properly represent hop depth instead of redirects 2025-01-15 12:00:07 -08:00
Adam Miller
a250eb2b68 fix: ensure url is not a video when determining if we are in a redirect 2025-01-06 18:56:22 -08:00