18 Commits

Author SHA1 Message Date
Noah Levitt
384c877e9a new test exposing problem where each hashtag visited causes a page load, if page redirects 2017-09-27 14:08:28 -07:00
Noah Levitt
8256a34b4f implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker 2017-04-18 17:54:12 -07:00
Noah Levitt
df7734f2ca new command line utility brozzler-stop-crawl, with tests 2017-04-14 18:06:15 -07:00
Noah Levitt
3d47805ec1 new model for crawling hashtags, each one is no longer a top-level page 2017-03-27 12:15:49 -07:00
Noah Levitt
a836269e95 remove some vestiges of old proxy stuff 2017-03-24 16:04:43 -07:00
Noah Levitt
934190084c Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. If --proxy is provided, brozzler-worker determines whether the proxy is warcprox by consulting http://{proxy_address}/status (see https://github.com/internetarchive/warcprox/commit/8caae0d7d3), and enables warcprox features if so. 2017-03-24 13:55:23 -07:00
Noah Levitt
242ff51ec7 fix bug with seed redirects where scope change was applied too late to affect scoping of outlinks from the seed (with automated tests) 2017-03-06 15:13:40 -08:00
Noah Levitt
569af05b11 rethinkstuff is now "doublethink" 2017-03-02 12:48:45 -08:00
Noah Levitt
5c684779e5 pywb support for thumbnail: and screenshot: urls 2017-01-31 10:26:38 -08:00
Noah Levitt
4b6831b464 new flag Page.blocked_by_robots 2017-01-30 10:43:25 -08:00
Noah Levitt
86ac48d6c3 generalized support for login doing automatic detection of login form on a page 2016-12-19 17:30:09 -08:00
Noah Levitt
72816d1058 don't check robots.txt when scheduling a new site to be crawled, but mark the seed Page as needs_robots_check, and delegate the robots check to brozzler-worker; new test of robots.txt adherence 2016-11-16 12:23:59 -08:00
Noah Levitt
5ac8994a24 rename webconsole to dashboard 2016-11-04 17:46:23 -07:00
Mouse Reeve
2215aaab21 Use warcprox if enable_warcprox_features is true 2016-10-18 17:39:33 -07:00
Noah Levitt
a370e7b987 tiny fix, and now the test passes for me 2016-10-14 19:21:26 -07:00
Noah Levitt
27452990ee toward getting initial tests to pass 2016-10-14 18:26:48 -07:00
Noah Levitt
56e651baeb working on basic integration tests 2016-10-13 17:12:35 -07:00
Noah Levitt
c864499a64 starting to create a framework for testing 2016-09-14 17:06:49 -07:00