131 Commits

Author SHA1 Message Date
Adam Miller
cdb81496f6 chore: disable cluster tests, add frontier load test 2025-04-01 14:16:42 -07:00
Adam Miller
addf73f865 chore: Additional frontier testing and reformat 2025-03-31 16:03:44 -07:00
Adam Miller
e7e4225bf2 chore: fixing more tests 2025-03-27 17:12:17 -07:00
Adam Miller
b5ee8a9ea7 feat: Create new claim_sites() query, and fix frontier tests 2025-03-26 18:06:55 -07:00
Misty De Méo
af34639adb test: fix test brozzler imports
2025-03-11 11:12:15 -07:00
Gretchen Leigh Miller
f64db214d4
ruff linting fixes (#343)
* ruff linting fixes

* move imports back down to where they're re-exported
2025-03-07 16:03:35 -08:00
Gretchen Leigh Miller
6f011cc6c8
ruff import sorting pass + adding uv.lock (#342)
* ruff import sorting pass

* add uv.lock

* move comment back to its proper place
2025-03-07 10:04:11 -08:00
Misty De Méo
af0f3ed378 CLI: enable log prefixing
This adds a commandline option which enables log level prefixing.
These prefixes enable log level-based filtering in journalctl when
present so long as logs are going to the journal, and
`SyslogLevelPrefix=` is set to `true` (which it is by default).

For documentation: https://manpages.debian.org/testing/libsystemd-dev/sd-daemon.3.en.html
2025-03-05 11:01:50 -08:00
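The commit does not show how the prefixes are produced. The sketch below shows the general idea with the standard library's `logging` module rather than brozzler's structlog setup, so it should not be read as the project's implementation. The `<N>` values are the syslog priorities defined in sd-daemon(3), and `JournaldPrefixFormatter` and `make_logger` are hypothetical names.

```python
import logging
import sys

# Syslog priority prefixes from sd-daemon(3):
# SD_CRIT="<2>", SD_ERR="<3>", SD_WARNING="<4>", SD_INFO="<6>", SD_DEBUG="<7>".
_SD_PREFIXES = {
    logging.CRITICAL: "<2>",
    logging.ERROR: "<3>",
    logging.WARNING: "<4>",
    logging.INFO: "<6>",
    logging.DEBUG: "<7>",
}


class JournaldPrefixFormatter(logging.Formatter):
    """Hypothetical formatter that prepends an sd-daemon level prefix."""

    def format(self, record):
        message = super().format(record)
        # Custom or unknown levels fall back to "<6>" (info).
        prefix = _SD_PREFIXES.get(record.levelno, "<6>")
        return prefix + message


def make_logger(name="example", stream=sys.stderr):
    """Wire the formatter into a stream handler (illustrative only)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JournaldPrefixFormatter("%(name)s %(message)s"))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger
```

When stderr is connected to the journal and `SyslogLevelPrefix=` is left at its default, journald strips the prefix and records the message at that priority. Filters such as `journalctl -p warning` can then select on it.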
Misty De Méo
c59b08df33
test: add CI (#329)
This adds two CI runs: a quick one that happens for every pull
request and merge to master, and a longer one that happens daily.

This also adds a new installation group to setup.py because the
`easy` group isn't currently installable, and some of the dependencies
specified there need to be present for the tests to run.
2025-03-04 09:34:23 -08:00
Misty De Méo
23cee477dc
feat: set up structlog logging (#325)
This ports the logging from `logging` to `structlog`. This updates
all of the logger instantiations along with all of the places
`logging` was called. Data that was being inlined into log statements
has been broken out so that it's now structured arguments to the
log statements instead.
2025-02-24 16:31:09 -08:00
Alex Dempsey
8b23430a87 Use black, enforce with GitHub Actions 2024-02-08 12:07:41 -08:00
Adam Miller
d61cec399e Merge branch 'master' into adds-hop-path-support 2022-02-09 18:10:37 +00:00
Christian Clauss
a5ed291e65 Fix typos 2021-10-12 10:19:48 +02:00
Adam Miller
0f72233f3b Adding support for hop path information to be stored and passed along to warcprox 2021-08-31 19:44:55 +00:00
Jake L
78365c9f35
Expanding Brozzler's logging-in capabilities
Some sites don't allow you to login without clicking on a button to open a retracted modal.

This update to the login code allows Brozzler to click on all elements that we think are related to opening a login modal.

Then, if there isn't a regular form, we will attempt to fill out abnormal form schemes.

The test_try_login test has been expanded for the new type of login form we are supporting.
2020-04-14 17:19:53 -04:00
Vangelis Banos
041feaf426 Add missing super().do_POST() 2020-04-14 09:39:48 +00:00
Vangelis Banos
782aab3048 Add unit tests for try_login behavior
Add unit tests for the code that detects and tries to use login forms
automatically (`Browser.try_login`).

Add `htdocs/favicon.ico` because it is loaded automatically when the
browser tries to use the test web server and it causes a "missing"
warning.

Create a new dir `tests/htdocs/site11` which is used for login related
test html files.
2020-04-13 19:16:10 +00:00
Barbara Miller
2dfe3632f5 xfail test 2020-03-11 20:37:30 -07:00
Noah Levitt
7915220ab7 consider page completed after 3 failures
https://github.com/internetarchive/brozzler/pull/183#issuecomment-560562807

"We've had a number of cases where a page kept failing for one reason or
another, and it's bad. We can end up with tons of duplicate captures,
the crawl is not able to make progress, and the overall performance of
the cluster is impacted in cases like yours, where a browser is sitting
there doing nothing for five minutes."
2019-12-04 12:38:22 -08:00
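Here is a minimal sketch of the "give up after 3 failures" rule. The `Page` dataclass, its `failed_attempts` and `completed` fields, and `record_failure` are hypothetical stand-ins and do not reflect brozzler's actual data model.

```python
from dataclasses import dataclass

# Hypothetical threshold mirroring the commit message ("after 3 failures").
MAX_PAGE_FAILURES = 3


@dataclass
class Page:
    url: str
    failed_attempts: int = 0
    completed: bool = False


def record_failure(page: Page) -> bool:
    """Count one failed attempt; return True if the page should be retried."""
    page.failed_attempts += 1
    if page.failed_attempts >= MAX_PAGE_FAILURES:
        # Give up and treat the page as done so the crawl can make progress
        # instead of producing duplicate captures of the same page.
        page.completed = True
        return False
    return True
```

Marking the page completed means the next claim skips it. This caps both the duplicate captures and the browser time spent on a page that keeps failing.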
Noah Levitt
e23fa68d65 fix bug clobbering own changes to parent_page
and some other tweaks (python 3.5+, pytest logging config, ...)
2019-10-17 13:47:54 -07:00
Noah Levitt
8107abd804
Merge pull request #154 from vbanos/fix-brozzling-test
Fix test_brozzling::httpd fixture
2019-05-16 14:23:04 -07:00
Noah Levitt
0a1360ab25 don't use localhost for test http server...
... because apparently sometimes chromium bypasses the proxy for local
addresses
2019-05-15 18:49:18 -07:00
Vangelis Banos
a1f9122317 Fix test_brozzling::httpd fixture
We used `self.headers.getheader` which no longer works. We replace it
with `self.headers.get`.

We change the code to write binary data to `self.wfile` because we get
an exception for writing str and/or None.
2019-05-14 16:29:52 +00:00
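The two fixes described above look roughly like this in a Python 3 `http.server` handler. This is an illustrative handler, not the actual `test_brozzling` fixture. The response body and the `start_server` and `fetch` helpers are hypothetical. For simplicity it binds to 127.0.0.1, although a later commit notes that Chromium may bypass the proxy for local addresses.

```python
import http.client
import http.server
import threading


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Python 3: self.headers is an http.client.HTTPMessage, which offers
        # .get(); the Python 2 .getheader() method no longer exists.
        user_agent = self.headers.get("User-Agent", "")
        body = ("you sent: %s\n" % user_agent).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        # self.wfile is a binary stream: write bytes, never str or None.
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_server():
    """Serve on an ephemeral port in a background thread."""
    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, user_agent):
    """Request "/" directly (no proxy) and return the raw response body."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", "/", headers={"User-Agent": user_agent})
        return conn.getresponse().read()
    finally:
        conn.close()
```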
Noah Levitt
433b201b52 use logging.warning() to quiet py37 warnings 2019-04-09 01:43:38 -07:00
Noah Levitt
85c6ac0ab2 fix next travis-ci problem 2019-04-02 12:05:08 -07:00
Noah Levitt
d729c8d0d5 use yaml.safe_load()
getting new warnings
see https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
2019-03-18 15:49:44 -07:00
Noah Levitt
a74f46dc53 least surprise on http/https seed redirects
if http://foo.com/ redirects to https://foo.com/a/b/c let's also
put all of https://foo.com/ in scope
2018-12-21 15:17:31 -08:00
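The following sketch shows the scoping rule with plain URL prefixes. Brozzler's real scope logic works with SURTs and accept rules, and `https_equivalent_scope` is a hypothetical helper. When an http seed redirects to https on the same host, it returns the https form of the seed's own path rather than the deeper redirect target.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def https_equivalent_scope(seed_url: str, redirect_url: str) -> Optional[str]:
    """Hypothetical helper: extra URL prefix to accept after a seed redirect."""
    seed = urlsplit(seed_url)
    target = urlsplit(redirect_url)
    if (
        seed.scheme == "http"
        and target.scheme == "https"
        and seed.hostname == target.hostname
    ):
        # Same host, upgraded to https: accept everything under the seed's
        # own path on the https side, not just the redirect target's path.
        return urlunsplit(("https", target.netloc, seed.path or "/", "", ""))
    return None
```

For example, `https_equivalent_scope("http://foo.com/", "https://foo.com/a/b/c")` yields `"https://foo.com/"`. A redirect to a different host adds nothing.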
jkafader
898756690f
Merge pull request #142 from nlevitt/service-worker
fetch service worker script with proper headers
2018-11-29 13:42:59 -08:00
jkafader
9c27e829aa
Merge pull request #136 from nlevitt/revert-time-limit
change time limit enforcement
2018-11-29 12:29:35 -08:00
Noah Levitt
db62402be8 fix tests 2018-11-27 14:35:00 -08:00
Barbara Miller
e2b2542d4a handle http auth (#138)
abort brozzling on interstitial (auth dialog)

because we have no other recourse at this point. waiting on Network.requestIntercepted auth challenge support. (didn't work in our latest testing)
https://chromedevtools.github.io/devtools-protocol/tot/Network#type-AuthChallengeResponse
2018-11-16 15:10:30 -08:00
Noah Levitt
05fab8b909 change time limit enforcement
enforce time limit based on all the time that a site was in active
rotation, including time it spent waiting for its turn to be brozzled;
this undoes the change from b9640b8a30c934, because now it seems that
was the wrong decision (brozzler jobs with many seeds and low
max_claimed_sites hanging around forever)
2018-11-12 16:21:38 -08:00
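One way to express the new rule is to sum the wall-clock time of every period the site spent in active rotation, whether it was being brozzled or waiting for a turn. The `Site` class, its `rotation_periods` field, and both functions are hypothetical and only illustrate the accounting. The reverted approach would instead have counted only the time a site was actually claimed.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interval = Tuple[dt.datetime, Optional[dt.datetime]]


@dataclass
class Site:
    time_limit: Optional[float]  # seconds; None means no limit
    # Periods during which the site was in active rotation (claimed *or*
    # waiting for its turn). The last stop is None while still active.
    rotation_periods: List[Interval] = field(default_factory=list)


def elapsed_in_rotation(site: Site, now: dt.datetime) -> float:
    """Total seconds spent in active rotation, counting an open period up to now."""
    total = 0.0
    for start, stop in site.rotation_periods:
        total += ((stop or now) - start).total_seconds()
    return total


def time_limit_reached(site: Site, now: dt.datetime) -> bool:
    if site.time_limit is None:
        return False
    return elapsed_in_rotation(site, now) > site.time_limit
```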
Noah Levitt
7497b7e5ac tests expect outlinks to be a set 2018-10-12 11:03:54 -07:00
Noah Levitt
1ef717fa75 test exposing bug that we don't send warcprox-meta
when pushing stitched-up video with WARCPROX_WRITE_RECORD
2018-09-18 01:05:18 -07:00
jkafader
8368cd2bcb
Merge pull request #115 from nlevitt/ydl-stitched
Ydl stitched
2018-09-06 16:15:52 -07:00
Noah Levitt
88d3d3b310
why did those tests fail??? (#117)
1.4 for pypi
2018-08-22 14:35:39 -07:00
Noah Levitt
e7d2273856 fix failing tests 2018-08-16 11:40:54 -07:00
Noah Levitt
3c27132aaa test for youtube-dl stitch-up 2018-08-15 17:42:53 -07:00
Noah Levitt
d4db8ba9bc is test_time_limit failing because of timing?
give it up to ten seconds to mark the job finished
2018-06-25 10:35:24 -05:00
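A test can give an asynchronous state change "up to ten seconds" by polling until a deadline. The generic helper below illustrates the pattern and is not brozzler's test code. `wait_until` is a hypothetical name, and `job_is_finished` in the usage note is likewise hypothetical.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5
) -> bool:
    """Poll condition() until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a test this might read `assert wait_until(lambda: job_is_finished(job))`. `time.monotonic()` is used so that wall-clock adjustments cannot shorten or stretch the wait.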
Noah Levitt
c52c16c260 fix bug in test, add another one 2018-06-22 16:10:23 -05:00
Noah Levitt
aeb7c3f825 treat any error fetching robots.txt as "allow all" 2018-06-22 14:50:57 -05:00
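Here is a sketch of the "any error means allow all" policy using the standard library's `urllib.robotparser`. Brozzler may rely on a different robots.txt library. `robots_allows` and its injected `fetch_robots_txt` callable are hypothetical and do not represent brozzler's `is_permitted_by_robots` signature.

```python
import urllib.robotparser
from typing import Callable


def robots_allows(
    url: str, user_agent: str, fetch_robots_txt: Callable[[str], str]
) -> bool:
    """Check robots.txt, treating any error fetching it as "allow all"."""
    try:
        text = fetch_robots_txt(url)
    except Exception:
        # Timeouts, DNS failures, bad responses, etc.: assume no restrictions.
        return True
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser.can_fetch(user_agent, url)
```

Injecting the fetcher keeps the policy testable without network access, much as the tests in later commits monkeypatch robots checking.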
Noah Levitt
331d07fe88 these ssurts are strings too 2018-05-16 17:11:08 -07:00
Noah Levitt
5bb392ec7c ssurts are strings now
because they're friendlier that way in rethinkdb
2018-05-16 16:43:10 -07:00
Noah Levitt
1572fd3ed6 missed a spot where is_permitted_by_robots needs monkeying 2018-05-15 16:52:48 -07:00
Noah Levitt
fc05cac338 ok seriously tests 2018-05-14 15:38:28 -07:00
Noah Levitt
05f8ab3495 fix more tests for new approach sans scope['surt'] 2018-05-14 15:38:28 -07:00
Noah Levitt
85a4757527 s/max_hops_off_surt/max_hops_off/ 2018-05-14 15:38:28 -07:00
Noah Levitt
5ebd2fb709 new test of max_hops_off 2018-05-14 15:38:28 -07:00
Noah Levitt
b83d3cb9df rename page.hops_off_surt to page.hops_off 2018-05-14 15:38:28 -07:00
Noah Levitt
245e27a21a tests for new approach without scope['surt']
replaced by an accept rule (two rules in some cases of seed redirects)
2018-05-14 15:38:28 -07:00