Gretchen Miller
ec877b769d
ruff linting fixes
2025-03-07 15:45:15 -08:00
Gretchen Leigh Miller
6f011cc6c8
ruff import sorting pass + adding uv.lock (#342)
...
* ruff import sorting pass
* add uv.lock
* move comment back to its proper place
2025-03-07 10:04:11 -08:00
Misty De Méo
23cee477dc
feat: set up structlog logging (#325)
...
This ports the logging from `logging` to `structlog`. This updates
all of the logger instantiations along with all of the places
`logging` was called. Data that was being inlined into log statements
has been broken out so that it's now structured arguments to the
log statements instead.
2025-02-24 16:31:09 -08:00
Alex Dempsey
8b23430a87
Use black, enforce with GitHub Actions
2024-02-08 12:07:41 -08:00
Noah Levitt
e23fa68d65
fix bug clobbering own changes to parent_page
...
and some other tweaks (python 3.5+, pytest logging config, ...)
2019-10-17 13:47:54 -07:00
Noah Levitt
0a1360ab25
don't use localhost for test http server...
...
... because apparently sometimes chromium bypasses the proxy for local
addresses
2019-05-15 18:49:18 -07:00
Noah Levitt
433b201b52
use logging.warning() to quiet py37 warnings
2019-04-09 01:43:38 -07:00
Noah Levitt
85c6ac0ab2
fix next travis-ci problem
2019-04-02 12:05:08 -07:00
Noah Levitt
1ef717fa75
test exposing bug that we don't send warcprox-meta
...
when pushing stitched-up video with WARCPROX_WRITE_RECORD
2018-09-18 01:05:18 -07:00
Noah Levitt
3c27132aaa
test for youtube-dl stitch-up
2018-08-15 17:42:53 -07:00
Noah Levitt
d4db8ba9bc
is test_time_limit failing because of timing?
...
give it up to ten seconds to mark the job finished
2018-06-25 10:35:24 -05:00
Noah Levitt
331d07fe88
these ssurts are strings too
2018-05-16 17:11:08 -07:00
Noah Levitt
5bb392ec7c
ssurts are strings now
...
because they're friendlier that way in rethinkdb
2018-05-16 16:43:10 -07:00
Noah Levitt
fc05cac338
ok seriously tests
2018-05-14 15:38:28 -07:00
Noah Levitt
05f8ab3495
fix more tests for new approach sans scope['surt']
2018-05-14 15:38:28 -07:00
Noah Levitt
d7512fbeb6
move time limit enforcement
...
now it's next to stop request enforcement which makes more sense and
supports more timely action
2018-03-01 11:28:30 -08:00
Noah Levitt
8505720c41
fix tests
2018-02-02 15:11:26 -08:00
Noah Levitt
384c877e9a
new test exposing problem where each hashtag visited causes a page load, if page redirects
2017-09-27 14:08:28 -07:00
Noah Levitt
8256a34b4f
implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
2017-04-18 17:54:12 -07:00
Noah Levitt
df7734f2ca
new command line utility brozzler-stop-crawl, with tests
2017-04-14 18:06:15 -07:00
Noah Levitt
3d47805ec1
new model for crawling hashtags, each one is no longer a top-level page
2017-03-27 12:15:49 -07:00
Noah Levitt
a836269e95
remove some vestiges of old proxy stuff
2017-03-24 16:04:43 -07:00
Noah Levitt
934190084c
Refactor the way the proxy is configured. The job/site settings "proxy" and "enable_warcprox_features" are gone. brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry and enable warcprox features. If --proxy is provided, brozzler-worker determines whether the proxy is warcprox by consulting http://{proxy_address}/status (see https://github.com/internetarchive/warcprox/commit/8caae0d7d3) and enables warcprox features if so.
2017-03-24 13:55:23 -07:00
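
The option handling described in the commit above could look roughly like the sketch below. It only illustrates the mutually exclusive --proxy / --warcprox-auto pair and the status URL pattern quoted in the message; `build_parser`, `status_url`, `plan`, and the "service-registry" marker string are hypothetical names invented here, not brozzler's actual code, and no network request is made.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the two options named in the commit message."""
    parser = argparse.ArgumentParser(prog="brozzler-worker")
    # argparse rejects any command line that supplies both options.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--proxy", metavar="HOST:PORT", default=None,
        help="use this proxy; warcprox features depend on its /status response")
    group.add_argument(
        "--warcprox-auto", action="store_true",
        help="find a warcprox instance in the service registry")
    return parser


def status_url(proxy_address):
    """Build the URL pattern quoted in the commit message."""
    return "http://%s/status" % proxy_address


def plan(args):
    """Return where the worker would look to decide on warcprox features.

    Assumption: this return value is purely illustrative. A real worker
    would query the service registry or fetch the status URL instead.
    """
    if args.warcprox_auto:
        return "service-registry"
    if args.proxy:
        return status_url(args.proxy)
    return None  # no proxy configured


if __name__ == "__main__":
    print(plan(build_parser().parse_args()))
```

Passing both options at once makes argparse exit with a usage error, which is the behavior "mutually exclusive" implies.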
Noah Levitt
242ff51ec7
fix bug with seed redirects where scope change was applied too late to affect scoping of outlinks from the seed (with automated tests)
2017-03-06 15:13:40 -08:00
Noah Levitt
569af05b11
rethinkstuff is now "doublethink"
2017-03-02 12:48:45 -08:00
Noah Levitt
5c684779e5
pywb support for thumbnail: and screenshot: urls
2017-01-31 10:26:38 -08:00
Noah Levitt
4b6831b464
new flag Page.blocked_by_robots
2017-01-30 10:43:25 -08:00
Noah Levitt
86ac48d6c3
generalized support for login doing automatic detection of login form on a page
2016-12-19 17:30:09 -08:00
Noah Levitt
72816d1058
don't check robots.txt when scheduling a new site to be crawled, but mark the seed Page as needs_robots_check, and delegate the robots check to brozzler-worker; new test of robots.txt adherence
2016-11-16 12:23:59 -08:00
Noah Levitt
5ac8994a24
rename webconsole to dashboard
2016-11-04 17:46:23 -07:00
Mouse Reeve
2215aaab21
Use warcprox if enable_warcprox_features is true
2016-10-18 17:39:33 -07:00
Noah Levitt
a370e7b987
tiny fix, and now the test passes for me
2016-10-14 19:21:26 -07:00
Noah Levitt
27452990ee
toward getting initial tests to pass
2016-10-14 18:26:48 -07:00
Noah Levitt
56e651baeb
working on basic integration tests
2016-10-13 17:12:35 -07:00
Noah Levitt
c864499a64
starting to create a framework for testing
2016-09-14 17:06:49 -07:00