Noah Levitt | 8256a34b4f | implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker | 2017-04-18 17:54:12 -07:00
Noah Levitt | 3d47805ec1 | new model for crawling hashtags, each one is no longer a top-level page | 2017-03-27 12:15:49 -07:00
Noah Levitt | 12fb9eaa15 | use urlcanon library for canonicalization, surtification, scope match rules | 2017-03-15 14:59:51 -07:00
Noah Levitt | c90c73372e | need $DISPLAY set for test_brozzling.py | 2016-12-21 15:15:03 -08:00
Noah Levitt | 72816d1058 | don't check robots.txt when scheduling a new site to be crawled, but mark the seed Page as needs_robots_check, and delegate the robots check to brozzler-worker; new test of robots.txt adherence | 2016-11-16 12:23:59 -08:00
Noah Levitt | 5ac8994a24 | rename webconsole to dashboard | 2016-11-04 17:46:23 -07:00
Noah Levitt | 5a373466a3 | some vagrant/ansible fixes | 2016-10-14 13:47:54 -07:00
Noah Levitt | c864499a64 | starting to create a framework for testing | 2016-09-14 17:06:49 -07:00