Git-Mirrors/brozzler
Mirror of https://github.com/internetarchive/brozzler.git, synced 2025-02-24 08:39:59 -05:00
brozzler/brozzler
Latest commit 537eb1cf7f by Barbara Miller, 2017-04-05 16:13:57 -07:00
Merge pull request #34 from galgeek/ARI-5193
mouseover for ky.gov sites
dashboard
rethinkstuff is now "doublethink"
2017-03-02 12:48:45 -08:00
js-templates
extract area/@href links, and add test for outlink extraction
2017-04-05 12:09:48 -07:00
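The js-templates commit refers to extracting `area/@href` links in addition to ordinary anchor links. Brozzler does this with JavaScript run inside the browser. The sketch below is only a rough Python analogue built on the standard library's `html.parser`, and the names `OutlinkExtractor` and `extract_outlinks` are hypothetical, not part of brozzler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OutlinkExtractor(HTMLParser):
    """Hypothetical helper: collects href values from <a> and <area> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    # resolve relative links against the page URL
                    self.outlinks.append(urljoin(self.base_url, value))


def extract_outlinks(html, base_url):
    """Return absolute outlinks from a/@href and area/@href, in document order."""
    parser = OutlinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.outlinks


if __name__ == "__main__":
    page = '<a href="/x">x</a><map><area shape="rect" href="b.html"/></map>'
    print(extract_outlinks(page, "http://example.com/dir/"))
```

Image-map `<area>` elements are easy to miss when a link extractor only looks at `<a>` tags. This sketch treats the two tags the same way.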
__init__.py
consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
2017-03-29 18:49:04 -07:00
behaviors.yaml
add JIRA info
2017-04-04 15:52:03 -07:00
browser.py
new model for crawling hashtags, each one is no longer a top-level page
2017-03-27 12:15:49 -07:00
chrome.py
let the OS pick an available port, to avoid what appear to be timing issues causing multiple browsers to choose the same port
2017-02-22 12:44:19 -08:00
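The chrome.py commit describes letting the OS pick an available port so that several browsers do not choose the same one. A common way to do this is to bind a socket to port 0 and read back the port the kernel assigned. The following is a minimal sketch under that assumption. The helper name `pick_free_port` is hypothetical.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Hypothetical helper: ask the OS for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # port 0 tells the kernel to choose any free ephemeral port
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("browser could use port", pick_free_port())
```

The socket is closed before the browser binds the returned port, so another process could still claim it in between. This narrows the race compared with choosing ports in userspace, but it does not eliminate it.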
cli.py
consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
2017-03-29 18:49:04 -07:00
easy.py
Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. --proxy is provided, determines if proxy is warcprox by consulting http://{proxy_address}/status (see https://github.com/internetarchive/warcprox/commit/8caae0d7d3), and enables warcprox features if so.
2017-03-24 13:55:23 -07:00
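The proxy refactor commit says that when `--proxy` is given, brozzler-worker decides whether the proxy is warcprox by consulting `http://{proxy_address}/status`. Below is a minimal sketch of such a check. It assumes the status endpoint returns JSON containing `"role": "warcprox"`, which is an assumption about warcprox's response and is not confirmed by this listing. The helper name `proxy_is_warcprox` and its `fetch` parameter are hypothetical.

```python
import json
import urllib.request


def proxy_is_warcprox(proxy_address, fetch=None, timeout=5.0):
    """Hypothetical helper: guess whether the proxy at host:port is warcprox."""
    url = "http://{}/status".format(proxy_address)
    if fetch is None:
        def fetch(u):
            # request the status page directly from the proxy (not through it)
            with urllib.request.urlopen(u, timeout=timeout) as resp:
                return resp.read().decode("utf-8")
    try:
        status = json.loads(fetch(url))
    except (OSError, ValueError):
        # unreachable proxy, HTTP error, or non-JSON body: treat as not warcprox
        return False
    # assumption: warcprox identifies itself with "role": "warcprox"
    return isinstance(status, dict) and status.get("role") == "warcprox"


if __name__ == "__main__":
    print(proxy_is_warcprox("localhost:8000"))
```

The optional `fetch` argument lets the check run without a live proxy. Any failure to reach or parse the endpoint is treated as "not warcprox", which leaves warcprox features disabled.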
frontier.py
new model for crawling hashtags, each one is no longer a top-level page
2017-03-27 12:15:49 -07:00
job_schema.yaml
Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. --proxy is provided, determines if proxy is warcprox by consulting http://{proxy_address}/status (see https://github.com/internetarchive/warcprox/commit/8caae0d7d3), and enables warcprox features if so.
2017-03-24 13:55:23 -07:00
model.py
consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
2017-03-29 18:49:04 -07:00
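The model.py commit has Job and Site share an `elapsed()` method through a mixin. The sketch below illustrates only the mixin pattern. The `starts_and_stops` attribute and the interval-summing logic are hypothetical and were chosen for illustration, not taken from brozzler's source.

```python
import datetime


class ElapsedMixIn:
    """Hypothetical mixin: total running time over recorded start/stop intervals."""

    def elapsed(self, now=None):
        # assumption: self.starts_and_stops is a list of {"start": dt, "stop": dt or None}
        if now is None:
            now = datetime.datetime.now(datetime.timezone.utc)
        total = 0.0
        for interval in self.starts_and_stops:
            # an open interval (stop is None) is counted up to "now"
            stop = interval["stop"] if interval["stop"] is not None else now
            total += (stop - interval["start"]).total_seconds()
        return total


class Job(ElapsedMixIn):
    def __init__(self, starts_and_stops):
        self.starts_and_stops = starts_and_stops


class Site(ElapsedMixIn):
    def __init__(self, starts_and_stops):
        self.starts_and_stops = starts_and_stops
```

Both classes get the same `elapsed()` implementation without a common base model class. This is the kind of code sharing the commit message describes.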
pywb.py
use urlcanon library for canonicalization, surtification, scope match rules
2017-03-15 14:59:51 -07:00
robots.py
Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. --proxy is provided, determines if proxy is warcprox by consulting http://{proxy_address}/status (see https://github.com/internetarchive/warcprox/commit/8caae0d7d3), and enables warcprox features if so.
2017-03-24 13:55:23 -07:00
worker.py
new model for crawling hashtags, each one is no longer a top-level page
2017-03-27 12:15:49 -07:00