Commit graph

1659 commits

Author SHA1 Message Date
Barbara Miller
aadd9cd521
bump version: 1.6.13
2025-05-22 14:45:10 -07:00
Barbara Miller
6b249478cc
Merge pull request #355 from mikemccabe/mccabe/disable-auto-https
Try new flag to disable auto http->https
2025-05-22 14:44:03 -07:00
Mike McCabe
d6e079d8cb Try new flag to disable auto http->https
Fix for https://webarchive.jira.com/browse/WWM-2292 (as seen by pyspn)
2025-05-21 21:07:42 -07:00
Barbara Miller
370638a876
bump version: 1.6.12
2025-05-19 15:20:10 -07:00
Barbara Miller
cb4a846f4a
Merge pull request #348 from internetarchive/adam/new_claim_sites_query
feat: Create new claim_sites() query, and fix frontier tests
2025-05-19 15:19:06 -07:00
Barbara Miller
8b1d80fcc3
bump version: 1.6.11
2025-05-19 12:30:08 -07:00
Barbara Miller
79d288bf17
Merge pull request #353 from galgeek/barbara/misc_ytdlp
ytdlp config updates for saved livestreams (mostly)
2025-05-19 12:24:07 -07:00
Barbara Miller
a665d49bba
Merge pull request #350 from mistydemeo/misty/add_thread_to_dict
cli: add thread name to event dict
2025-05-15 12:42:04 -07:00
Misty De Méo
b3fbdceeca CI: add a simple tag-to-release config
This adds a tag-to-release Actions config based around uv.

This is triggered by pushing a tag with a new version; it will
automatically kick off this job, which will publish the new
version to PyPI on completion. We can push that tag from a PR
or directly to master.

At the moment, this doesn't do anything to automatically create
a GitHub release from the tag; we can do that manually for now,
but if we're interested I can add something to automatically
generate the release too.

We don't need to provide a token to uv to publish; instead, we
just need to configure the repo for PyPI access using this:
https://docs.pypi.org/trusted-publishers/adding-a-publisher/
2025-05-08 15:30:42 -07:00
Barbara Miller
1c59a076b5 stop skipping HLS, too 2025-05-07 18:41:49 -07:00
Barbara Miller
3dfcc2ade6 don't skip dash, do impersonate, smaller sleep intervals 2025-05-07 15:20:33 -07:00
Misty De Méo
aa86928154 cli: add thread name to event dict
I missed this item from the old log formatting config in the previous
structlog PRs.
2025-05-02 15:57:13 -07:00
Adam Miller
d36313f08f chore: ruff format pass 2025-04-15 14:05:54 -07:00
Adam Miller
0f57188a2c refactor: short circuit claimable sites loop when we have enough sites 2025-04-15 14:03:15 -07:00
Adam Miller
f0d527cda7 chore: merge logged proxy info into existing log call 2025-04-15 13:40:37 -07:00
Adam Miller
cdb81496f6 chore: disable cluster tests, add frontier load test 2025-04-01 14:16:42 -07:00
Adam Miller
addf73f865 chore: Additional frontier testing and reformat 2025-03-31 16:03:44 -07:00
Adam Miller
e7e4225bf2 chore: fixing more tests 2025-03-27 17:12:17 -07:00
Adam Miller
b5ee8a9ea7 feat: Create new claim_sites() query, and fix frontier tests 2025-03-26 18:06:55 -07:00
Adam Miller
42b4a88c96
Merge pull request #347 from internetarchive/adam/annotate_claim_sites
chore: annotate claim_sites()
2025-03-26 10:19:17 -07:00
Adam Miller
ae82d6fc13 chore: reformat with ruff 2025-03-26 10:11:00 -07:00
Adam Miller
fd633c32bf chore: additional claim_sites() annotation 2025-03-25 14:34:52 -07:00
Adam Miller
c249aa1728 chore: annotate claim_sites() 2025-03-21 17:09:47 -07:00
Misty De Méo
7fc45fe6d0 brozzler 1.6.10
2025-03-12 12:05:25 -07:00
Misty De Méo
af34639adb test: fix test brozzler imports
2025-03-11 11:12:15 -07:00
Misty De Méo
3ef0c3abc9
Merge pull request #345 from mistydemeo/fix_worker_id
fix: bind worker_id inside BrozzlerWorker
2025-03-10 14:42:06 -07:00
Misty De Méo
a902ae7a02 fix: bind worker_id inside BrozzlerWorker
This ensures the parameter remains available within a multithreaded context.
2025-03-10 14:03:14 -07:00
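
A minimal sketch of the idea behind a902ae7a02, using only the standard library. It assumes the underlying issue is that context-local values bound in one thread are not reliably visible in threads started later, so the worker binds its ID inside the thread that does the work. `Worker`, `worker_id_var`, and `run_in_thread` are hypothetical names for illustration, not brozzler's code.

```python
import contextvars
import threading

# Hypothetical context variable standing in for a logging context.
worker_id_var = contextvars.ContextVar("worker_id", default=None)


class Worker:
    """Hypothetical worker that binds its ID inside its own thread."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.seen = None

    def run(self):
        # Bind here, in the thread that actually runs the work, instead of
        # relying on a value set earlier in the launching thread.
        worker_id_var.set(self.worker_id)
        self.seen = worker_id_var.get()


def run_in_thread(worker):
    """Run the worker in a new thread and return the ID it observed."""
    thread = threading.Thread(target=worker.run)
    thread.start()
    thread.join()
    return worker.seen
```

Calling `run_in_thread(Worker("w-42"))` returns the ID as seen from inside the worker thread.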
Misty De Méo
353cc1b9fd deps: install setuptools on python 3.12+
distutils was removed beginning in Python 3.12, but it's used at
runtime by rethinkdb 2.4.9. setuptools provides a copy of distutils,
so we should make sure to install it when we're on Python 3.12 or
newer until we're able to upgrade to a version of rethinkdb that
no longer needs it.

See: https://www.python.org/downloads/release/python-3120/
2025-03-07 17:13:54 -08:00
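
One way the conditional dependency described in 353cc1b9fd could be expressed is a PEP 508 environment marker. This is a sketch only; how brozzler actually declares the dependency is not shown in this log, and `SETUPTOOLS_REQUIREMENT` and `needs_setuptools` are hypothetical names.

```python
import sys

# PEP 508 requirement string: installers apply it only on Python 3.12+.
SETUPTOOLS_REQUIREMENT = 'setuptools; python_version >= "3.12"'


def needs_setuptools(version_info=sys.version_info):
    """Mirror the marker's check: True when distutils is gone from the stdlib."""
    return tuple(version_info[:2]) >= (3, 12)
```

A string like `SETUPTOOLS_REQUIREMENT` can sit in a dependency list next to the other requirements; the helper only restates the same condition in code.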
Gretchen Leigh Miller
65b0b5f50b
Makefile improvements + pre-commit hook (#340)
* Makefile improvements + pre-commit hook

* update make target in CI

* fix CI more

* .gitignore update

* couple more Makefile refinements

* make target-version explicit on ruff import sorting
2025-03-07 16:45:53 -08:00
Gretchen Leigh Miller
f64db214d4
ruff linting fixes (#343)
* ruff linting fixes

* move imports back down to where they're re-exported
2025-03-07 16:03:35 -08:00
Gretchen Leigh Miller
6f011cc6c8
ruff import sorting pass + adding uv.lock (#342)
* ruff import sorting pass

* add uv.lock

* move comment back to its proper place
2025-03-07 10:04:11 -08:00
Misty De Méo
21102ca95c
__init__.py: rework imports (#334)
* __init__.py: rework imports

Although doublethink is an optional dependency to allow brozzler to be
used as a library without it, in practice we had some mandatory import
statements that prevented brozzler from being imported without it.
This fixes that by gating off some of the imports and exports.

If doublethink is available, brozzler works as it is now. But if it
isn't, we make a few changes:

* brozzler.worker, brozzler.cli and brozzler.model reexports are
  disabled
* One brozzler.cli function, which is used outside brozzler's own cli,
  has been moved into brozzler's __init__.py. For compatibility, it's
  reexported from brozzler.cli.

* Make tz-aware datetime of the epoch with stdlib

* Only import yt-dlp if we're using it

* ydl: never try if extra missing

* cli: use worker's yt-dlp check

---------

Co-authored-by: Alex Dempsey <avdempsey@archive.org>
2025-03-06 14:49:22 -08:00
Misty De Méo
0f707dc02b CI: extend daily job timeout
This was left at the default of six hours, but it timed out last
night. I'll set it at eight hours to see if this is more reliable.
2025-03-05 17:10:27 -08:00
Gretchen Leigh Miller
5350c202dc
Update README.rst to remove brozzler-easy and Wayback sections + other cleanup (#336)
* update instructions for brozzler-easy + add pywb extras

* revert pywb extra + updated README

* ruffing up

* more README.rst updates

* revert https change for local URL scheme
2025-03-05 14:33:47 -08:00
Misty De Méo
05b72906bd remove travis config 2025-03-05 13:34:03 -08:00
Misty De Méo
b45e5dc096 CLI: add new --worker-id option
This adds a new commandline flag allowing the worker ID to be specified.
If present, it will be added to the global context so that it will be
included in every logging statement.

Previously, we only had some indirect values to tie logging statements
to specific workers, so this should make it easier to follow.
2025-03-05 11:01:50 -08:00
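
A rough standard-library sketch of what b45e5dc096 describes: parse a `--worker-id` flag and attach the value to every log record. brozzler itself logs through structlog, so the `logging.Filter` here is a stand-in, and `WorkerIdFilter` and `parse_args` are hypothetical names.

```python
import argparse
import logging


class WorkerIdFilter(logging.Filter):
    """Stamp a fixed worker ID onto every record that passes through."""

    def __init__(self, worker_id):
        super().__init__()
        self.worker_id = worker_id

    def filter(self, record):
        record.worker_id = self.worker_id
        return True  # never drop the record, only annotate it


def parse_args(argv=None):
    """Parse the command line; only the worker ID option is sketched here."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--worker-id",
        dest="worker_id",
        default=None,
        help="identifier included in every log statement",
    )
    return parser.parse_args(argv)
```

Attaching `WorkerIdFilter(args.worker_id)` to a handler makes `%(worker_id)s` available in that handler's format string.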
Misty De Méo
af0f3ed378 CLI: enable log prefixing
This adds a commandline option which enables log level prefixing.
These prefixes enable log level-based filtering in journalctl when
present so long as logs are going to the journal, and
`SyslogLevelPrefix=` is set to `true` (which it is by default).

For documentation: https://manpages.debian.org/testing/libsystemd-dev/sd-daemon.3.en.html
2025-03-05 11:01:50 -08:00
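
The prefixes referred to in af0f3ed378 are the `<N>` markers from sd-daemon(3), where N is a syslog priority. Below is a minimal sketch using the standard `logging` module rather than brozzler's structlog setup; `SD_PREFIXES` and `SdDaemonPrefixFormatter` are hypothetical names.

```python
import logging

# Python log levels mapped to sd-daemon(3) priority prefixes.
SD_PREFIXES = {
    logging.CRITICAL: "<2>",  # SD_CRIT
    logging.ERROR: "<3>",     # SD_ERR
    logging.WARNING: "<4>",   # SD_WARNING
    logging.INFO: "<6>",      # SD_INFO
    logging.DEBUG: "<7>",     # SD_DEBUG
}


class SdDaemonPrefixFormatter(logging.Formatter):
    """Prefix each output line with the priority marker for its level."""

    def format(self, record):
        prefix = SD_PREFIXES.get(record.levelno, "<6>")
        text = super().format(record)
        # The journal reads the stream line by line, so prefix every line.
        return "\n".join(prefix + line for line in text.split("\n"))
```

With output going to the journal, a filter such as `journalctl -p err` can then select on the recorded priority.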
Misty De Méo
f384d0b830 deps: add dev deps to pyproject.toml 2025-03-05 10:07:29 -08:00
Misty De Méo
ffeaee7a01 chore: use ruff for formatting
There are a few minor changes here compared to black; it flagged
unnecessary string concatenations, and has slightly different
opinions on line length.
2025-03-05 09:43:17 -08:00
Misty De Méo
c59b08df33
test: add CI (#329)
This adds two CI runs: a quick one that happens for every pull
request and merge to master, and a longer one that happens daily.

This also adds a new installation group to setup.py because the
`easy` group isn't currently installable, and some of the dependencies
specified there need to be present for the tests to run.
2025-03-04 09:34:23 -08:00
Barbara Miller
984a129b43
Merge pull request #333 from galgeek/barbara/rm_workflow_maybe
remove unused publish-artifacts workflow
2025-03-03 14:53:57 -08:00
Barbara Miller
e6332c7f94 unused 2025-03-03 14:41:30 -08:00
Adam Miller
39b695dc70
Merge pull request #320 from internetarchive/adam/chrome-flags-disable-optimization-model-downloads
feat: add chrome flag to disable download of autocomplete optimizatio…
2025-03-03 14:09:29 -08:00
Adam Miller
ccfadc87c8 combine disable-features flags 2025-03-03 13:58:48 -08:00
Misty De Méo
e37c0ad78c
deps: remove easy group (#331)
This group isn't actually installable right now because of the jinja
dependency conflict.
2025-03-03 09:24:52 -08:00
Gretchen Leigh Miller
6ca1a62489
update minimum Python version in README (#327)
2025-02-25 12:08:35 -08:00
Misty De Méo
23cee477dc
feat: set up structlog logging (#325)
This ports the logging from `logging` to `structlog`. This updates
all of the logger instantiations along with all of the places
`logging` was called. Data that was being inlined into log statements
has been broken out so that it's now structured arguments to the
log statements instead.
2025-02-24 16:31:09 -08:00
Misty De Méo
69d682beb9
Merge pull request #324 from mistydemeo/mdfind
CLI: improve Chrome finding on Mac
2025-02-19 14:48:21 -08:00
Misty De Méo
53cac65540 CLI: improve Chrome finding on Mac
On macOS, we can find Chrome even if it's installed in a non-default
path by querying `mdfind`. This is the CLI entrypoint to Spotlight,
and we can use it to look up applications using their unique bundle
identifiers.

If `mdfind` fails to find anything, this falls back to the hardcoded
paths. This should ensure this still works if Spotlight indexing is
off, but Chrome is in the default path.
2025-02-19 13:55:18 -08:00
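
The lookup described in 53cac65540 can be sketched roughly as follows. The bundle identifier `com.google.Chrome` and the fallback path are assumptions for illustration, and `find_chrome_app` and `_run_mdfind` are hypothetical names; the actual brozzler code may query other identifiers or return an executable path instead of the app bundle.

```python
import os
import subprocess

# Assumed values for illustration only.
CHROME_QUERY = "kMDItemCFBundleIdentifier == 'com.google.Chrome'"
FALLBACK_APP_PATHS = ["/Applications/Google Chrome.app"]


def _run_mdfind(query):
    """Ask Spotlight for matching paths; returns mdfind's raw stdout."""
    result = subprocess.run(
        ["mdfind", query], capture_output=True, text=True, check=True
    )
    return result.stdout


def find_chrome_app(run_mdfind=_run_mdfind, fallback_paths=FALLBACK_APP_PATHS):
    """Return the first Chrome app bundle found, or None."""
    try:
        output = run_mdfind(CHROME_QUERY)
    except (OSError, subprocess.CalledProcessError):
        # mdfind missing (not macOS) or failed: behave as if nothing matched.
        output = ""
    for line in output.splitlines():
        if line.strip():
            return line.strip()
    # Spotlight found nothing (for example, indexing is off): try defaults.
    for path in fallback_paths:
        if os.path.exists(path):
            return path
    return None
```

Passing `run_mdfind` as a parameter keeps the Spotlight call replaceable, which makes the fallback logic easy to exercise on machines without `mdfind`.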
Barbara Miller
591ba3c95a
bump version
2025-02-14 13:05:38 -08:00