1636 Commits

Author SHA1 Message Date
Misty De Méo
7fc45fe6d0 brozzler 1.6.10
1.16.10
2025-03-12 12:05:25 -07:00
Misty De Méo
af34639adb test: fix test brozzler imports
2025-03-11 11:12:15 -07:00
Misty De Méo
3ef0c3abc9
Merge pull request #345 from mistydemeo/fix_worker_id
fix: bind worker_id inside BrozzlerWorker
2025-03-10 14:42:06 -07:00
Misty De Méo
a902ae7a02 fix: bind worker_id inside BrozzlerWorker
This ensures the parameter remains available within a multithreaded context.
2025-03-10 14:03:14 -07:00
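A minimal sketch of why binding inside the worker matters, assuming the logging context lives in a `contextvars.ContextVar`. The names `worker_id_var` and `Worker` are hypothetical stand-ins, not brozzler's actual code. On standard CPython builds a new thread starts with an empty context, so a value bound only in the main thread is not visible there.

```python
import contextvars
import threading

# Hypothetical context variable standing in for a per-context logging value.
worker_id_var = contextvars.ContextVar("worker_id", default=None)


def read_worker_id(results, key):
    """Record whatever worker ID is visible in the calling thread."""
    results[key] = worker_id_var.get()


class Worker:
    """Hypothetical worker that binds its ID inside its own thread."""

    def __init__(self, worker_id):
        self.worker_id = worker_id

    def run(self, results):
        # Binding here makes the value visible to code running in this thread.
        worker_id_var.set(self.worker_id)
        read_worker_id(results, "bound_in_thread")


def demo():
    results = {}
    worker_id_var.set("worker-1")  # bound in the main thread only

    # This thread never binds the value itself, so it sees the default.
    t = threading.Thread(target=read_worker_id, args=(results, "bound_in_main"))
    t.start()
    t.join()

    # This thread binds the value inside the worker before reading it.
    t = threading.Thread(target=Worker("worker-1").run, args=(results,))
    t.start()
    t.join()
    return results
```

Calling `demo()` shows the thread that relied on the main-thread binding seeing no worker ID, while the thread that bound it inside the worker sees it.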
Misty De Méo
353cc1b9fd deps: install setuptools on python 3.12+
distutils was removed beginning in Python 3.12, but it's used at
runtime by rethinkdb 2.4.9. setuptools provides a copy of distutils,
so we should make sure to install it when we're on Python 3.12 or
newer until we're able to upgrade to a version of rethinkdb that
no longer needs it.

See: https://www.python.org/downloads/release/python-3120/
2025-03-07 17:13:54 -08:00
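A rough sketch of the version gate described above. The helper `needs_setuptools` is a hypothetical name for illustration, and the requirement string is one common way to express the same condition as an environment marker; the project's actual dependency declaration may differ.

```python
import sys


def needs_setuptools(version_info=sys.version_info):
    """Return True when the stdlib no longer ships distutils (Python 3.12+)."""
    return tuple(version_info[:2]) >= (3, 12)


# The same condition written as a dependency specifier with an environment
# marker, so installers only pull in setuptools on 3.12 or newer.
SETUPTOOLS_REQUIREMENT = 'setuptools; python_version >= "3.12"'
```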
Gretchen Leigh Miller
65b0b5f50b
Makefile improvements + pre-commit hook (#340)
* Makefile improvements + pre-commit hook

* update make target in CI

* fix CI more

* .gitignore update

* couple more Makefile refinements

* make target-version explicit on ruff import sorting
2025-03-07 16:45:53 -08:00
Gretchen Leigh Miller
f64db214d4
ruff linting fixes (#343)
* ruff linting fixes

* move imports back down to where they're re-exported
2025-03-07 16:03:35 -08:00
Gretchen Leigh Miller
6f011cc6c8
ruff import sorting pass + adding uv.lock (#342)
* ruff import sorting pass

* add uv.lock

* move comment back to its proper place
2025-03-07 10:04:11 -08:00
Misty De Méo
21102ca95c
__init__.py: rework imports (#334)
* __init__.py: rework imports

Although doublethink is an optional dependency to allow brozzler to be
used as a library without it, in practice we had some mandatory import
statements that prevented brozzler from being imported without it.
This fixes that by gating off some of the imports and exports.

If doublethink is available, brozzler works as it is now. But if it
isn't, we make a few changes:

* brozzler.worker, brozzler.cli and brozzler.model reexports are
  disabled
* One brozzler.cli function, which is used outside brozzler's own cli,
  has been moved into brozzler's __init__.py. For compatibility, it's
  reexported from brozzler.cli.

* Make tz-aware datetime of the epoch with stdlib

* Only import yt-dlp if we're using it

* ydl: never try if extra missing

* cli: use worker's yt-dlp check

---------

Co-authored-by: Alex Dempsey <avdempsey@archive.org>
2025-03-06 14:49:22 -08:00
Misty De Méo
0f707dc02b CI: extend daily job timeout
This was left at the default of six hours, but it timed out last
night. I'll set it at eight hours to see if this is more reliable.
2025-03-05 17:10:27 -08:00
Gretchen Leigh Miller
5350c202dc
Update README.rst to remove brozzler-easy and Wayback sections + other cleanup (#336)
* update instructions for brozzler-easy + add pywb extras

* revert pywb extra + updated README

* ruffing up

* more README.rst updates

* revert https change for local URL scheme
2025-03-05 14:33:47 -08:00
Misty De Méo
05b72906bd remove travis config 2025-03-05 13:34:03 -08:00
Misty De Méo
b45e5dc096 CLI: add new --worker-id option
This adds a new commandline flag allowing the worker ID to be specified.
If present, it will be added to the global context so that it will be
included in every logging statement.

Previously, we only had some indirect values to tie logging statements
to specific workers, so this should make it easier to follow.
2025-03-05 11:01:50 -08:00
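A minimal stdlib sketch of the idea, assuming a `logging.Filter` is used to attach the ID to every record. Only the `--worker-id` flag name comes from the commit message; `WorkerIdFilter`, `configure_logging`, and the log format are hypothetical, and brozzler's real implementation may look quite different.

```python
import argparse
import io
import logging


class WorkerIdFilter(logging.Filter):
    """Attach a fixed worker ID to every record passing through a handler."""

    def __init__(self, worker_id):
        super().__init__()
        self.worker_id = worker_id

    def filter(self, record):
        record.worker_id = self.worker_id
        return True


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--worker-id", default=None, help="ID to include in logs")
    return parser.parse_args(argv)


def configure_logging(worker_id, stream):
    """Build a logger whose output always carries the worker ID."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s worker_id=%(worker_id)s %(message)s")
    )
    handler.addFilter(WorkerIdFilter(worker_id))
    logger = logging.getLogger("worker_id_demo")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def demo(argv):
    args = parse_args(argv)
    stream = io.StringIO()
    logger = configure_logging(args.worker_id, stream)
    logger.info("claimed site")
    return stream.getvalue()
```

For example, `demo(["--worker-id", "w7"])` yields a line tagged with `worker_id=w7`, which makes it straightforward to follow one worker's log statements.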
Misty De Méo
af0f3ed378 CLI: enable log prefixing
This adds a commandline option which enables log level prefixing.
When logs are going to the journal and `SyslogLevelPrefix=` is set to
`true` (which it is by default), these prefixes enable log level-based
filtering in journalctl.

For documentation: https://manpages.debian.org/testing/libsystemd-dev/sd-daemon.3.en.html
2025-03-05 11:01:50 -08:00
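A rough sketch of level prefixing with the stdlib `logging` module, using the `<N>` prefixes defined in sd-daemon(3). The class name `JournalPrefixFormatter` and the level mapping are assumptions for illustration, not brozzler's implementation.

```python
import logging

# sd-daemon(3) prefixes: <2> critical, <3> error, <4> warning, <6> info, <7> debug.
SD_PREFIXES = {
    logging.CRITICAL: "<2>",
    logging.ERROR: "<3>",
    logging.WARNING: "<4>",
    logging.INFO: "<6>",
    logging.DEBUG: "<7>",
}


class JournalPrefixFormatter(logging.Formatter):
    """Prepend the syslog-style level prefix to each formatted record."""

    def format(self, record):
        # Unknown levels fall back to the informational prefix.
        prefix = SD_PREFIXES.get(record.levelno, "<6>")
        return prefix + super().format(record)


def format_line(level, message):
    """Format a single message at the given level, for demonstration."""
    record = logging.makeLogRecord(
        {"msg": message, "levelno": level, "levelname": logging.getLevelName(level)}
    )
    return JournalPrefixFormatter("%(message)s").format(record)
```

With prefixes like these in place, a command such as `journalctl -p err` can select entries by priority.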
Misty De Méo
f384d0b830 deps: add dev deps to pyproject.toml 2025-03-05 10:07:29 -08:00
Misty De Méo
ffeaee7a01 chore: use ruff for formatting
There are a few minor changes here compared to black; it flagged
unnecessary string concatenations, and has slightly different
opinions on line length.
2025-03-05 09:43:17 -08:00
Misty De Méo
c59b08df33
test: add CI (#329)
This adds two CI runs: a quick one that happens for every pull
request and merge to master, and a longer one that happens daily.

This also adds a new installation group to setup.py because the
`easy` group isn't currently installable, and some of the dependencies
specified there need to be present for the tests to run.
2025-03-04 09:34:23 -08:00
Barbara Miller
984a129b43
Merge pull request #333 from galgeek/barbara/rm_workflow_maybe
remove unused publish-artifacts workflow
2025-03-03 14:53:57 -08:00
Barbara Miller
e6332c7f94 unused 2025-03-03 14:41:30 -08:00
Adam Miller
39b695dc70
Merge pull request #320 from internetarchive/adam/chrome-flags-disable-optimization-model-downloads
feat: add chrome flag to disable download of autocomplete optimizatio…
2025-03-03 14:09:29 -08:00
Adam Miller
ccfadc87c8 combine disable-features flags 2025-03-03 13:58:48 -08:00
Misty De Méo
e37c0ad78c
deps: remove easy group (#331)
This group isn't actually installable right now because of the jinja
dependency conflict.
2025-03-03 09:24:52 -08:00
Gretchen Leigh Miller
6ca1a62489
update minimum Python version in README (#327)
2025-02-25 12:08:35 -08:00
Misty De Méo
23cee477dc
feat: set up structlog logging (#325)
This ports the logging from `logging` to `structlog`. This updates
all of the logger instantiations along with all of the places
`logging` was called. Data that was being inlined into log statements
has been broken out so that it's now structured arguments to the
log statements instead.
2025-02-24 16:31:09 -08:00
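To illustrate what breaking inlined data out into structured arguments means, here is a stdlib-only sketch. It does not use structlog itself; `StructuredFormatter`, `format_event`, and the `fields` attribute are hypothetical, chosen only to show the difference between a message with values formatted into it and an event with separate key/value fields.

```python
import json
import logging


class StructuredFormatter(logging.Formatter):
    """Render the event name plus separate key/value fields as JSON."""

    def format(self, record):
        payload = {"event": record.getMessage(), "level": record.levelname.lower()}
        # Fields arrive as a dict attribute, e.g. via logging's `extra` argument.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def format_event(message, **fields):
    """Format one event with its structured fields, for demonstration."""
    record = logging.makeLogRecord(
        {
            "msg": message,
            "levelname": "INFO",
            "levelno": logging.INFO,
            "fields": fields,
        }
    )
    return StructuredFormatter().format(record)


# Inlined style: the values are baked into the message string.
inlined = "fetched page %s with status %d" % ("http://example.com/", 200)

# Structured style: a constant event name with the values as separate fields.
structured = format_event("fetched page", url="http://example.com/", status=200)
```

The structured form keeps the event name constant, so logs can be searched or filtered on `url` or `status` without parsing message text.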
Misty De Méo
69d682beb9
Merge pull request #324 from mistydemeo/mdfind
CLI: improve Chrome finding on Mac
2025-02-19 14:48:21 -08:00
Misty De Méo
53cac65540 CLI: improve Chrome finding on Mac
On macOS, we can find Chrome even if it's installed in a non-default
path by querying `mdfind`. This is the CLI entrypoint to Spotlight,
and we can use it to look up applications using their unique bundle
identifiers.

If `mdfind` fails to find anything, this falls back to the hardcoded
paths. This should ensure this still works if Spotlight indexing is
off, but Chrome is in the default path.
2025-02-19 13:55:18 -08:00
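The lookup-with-fallback described above might be sketched like this. The function names, the default path list, and the injectable `run` and `exists` parameters are assumptions for illustration; only the use of `mdfind` with a bundle identifier and the fallback to hardcoded paths come from the commit message.

```python
import os
import subprocess

CHROME_BUNDLE_ID = "com.google.Chrome"
# Assumed default install location, used when Spotlight finds nothing.
DEFAULT_PATHS = ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"]


def mdfind_app(bundle_id, run=subprocess.run):
    """Ask Spotlight for an app bundle path; return None on any failure."""
    query = f"kMDItemCFBundleIdentifier == '{bundle_id}'"
    try:
        result = run(
            ["mdfind", query],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # mdfind is missing (not macOS), failed, or timed out.
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


def find_chrome(run=subprocess.run, exists=os.path.exists):
    """Prefer the Spotlight result, then fall back to hardcoded paths."""
    app = mdfind_app(CHROME_BUNDLE_ID, run=run)
    if app:
        return f"{app}/Contents/MacOS/Google Chrome"
    for path in DEFAULT_PATHS:
        if exists(path):
            return path
    return None
```

Passing `run` and `exists` as parameters is only a convenience so the fallback logic can be exercised without a real Spotlight index.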
Barbara Miller
591ba3c95a
bump version
2025-02-14 13:05:38 -08:00
Barbara Miller
c63f4296a6
Merge pull request #323 from galgeek/bmiller/better_fetch_url_timeout_errors
better error handling for _fetch_url
2025-02-14 12:40:04 -08:00
Barbara Miller
71ffbddfeb log _fetch_url completion 2025-02-14 10:38:24 -08:00
Barbara Miller
ba7031f2da better exceptions for fetch_url 2025-02-14 09:39:41 -08:00
Barbara Miller
732a7943f0 http, not https, maybe 2025-02-13 17:57:53 -08:00
Barbara Miller
819a483227 black'd 2025-02-13 17:55:36 -08:00
Barbara Miller
9dca200230 cert_reqs="CERT_NONE" 2025-02-13 16:27:05 -08:00
Barbara Miller
4af48be6ca use urllib3 2025-02-13 16:11:22 -08:00
Barbara Miller
2c9c040b84 black'd 2025-02-13 14:24:27 -08:00
Barbara Miller
53a1869def better error handling for _fetch_url 2025-02-13 14:21:24 -08:00
Barbara Miller
5fccdd83e3
bump version to 1.6.8
2025-02-11 17:34:14 -08:00
Barbara Miller
bfc85f5e89
Merge pull request #322 from galgeek/bmiller/more_better_requests
requests timeout for fetch_url, plus user_agent
2025-02-11 17:33:14 -08:00
Barbara Miller
ca79c3a329 minor logging fix 2025-02-11 17:28:18 -08:00
Barbara Miller
430c0daf39 catch and log more exceptions on fetch_url error 2025-02-11 12:51:26 -08:00
Barbara Miller
561e0803c6 requests timeout and user_agent 2025-02-11 12:27:50 -08:00
Barbara Miller
65de0d2a5f timeout for fetch_url 2025-02-09 11:13:03 -08:00
Adam Miller
6fa65557fb feat: add chrome flag to disable download of autocomplete optimization models at launch 2025-02-06 16:27:33 -08:00
Adam Miller
7ededbc521
Merge pull request #318 from internetarchive/adam/get-page-header-timeout
feat: add timeout to header check
2025-02-06 11:22:28 -08:00
Adam Miller
8ed517c1c0 chore: bump version 2025-02-06 11:19:23 -08:00
Adam Miller
3afc63242b fix: syntax bug on HEADER_REQUEST_TIMEOUT 2025-02-05 12:38:41 -08:00
Adam Miller
c5844dfdd6 chore: cleanup unused variable 2025-02-04 16:36:23 -08:00
Adam Miller
0feac5cd07 feat: add timeout to header check 2025-02-04 16:21:28 -08:00
Barbara Miller
df4bd148d5
bump version and update copyright
2025-01-23 16:26:16 -08:00
Barbara Miller
a749b2968b
Merge pull request #316 from galgeek/bmiller/shorter_behavior_timeout
shorter behavior timeout
2025-01-23 15:37:29 -08:00