don't check robots.txt when scheduling a new site to be crawled; instead, mark the seed Page as needs_robots_check and delegate the robots check to brozzler-worker; new test of robots.txt adherence
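
The idea in this commit is that the scheduler no longer blocks on fetching robots.txt; it just flags the seed page, and the worker performs the check right before brozzling. A minimal sketch of that worker-side check, using only the standard library's urllib.robotparser (the function names and the Page/needs_robots_check attributes here are illustrative, not brozzler's actual API):

```python
# Hypothetical sketch of a deferred, worker-side robots.txt check.
# brozzler's real implementation differs; this only illustrates the flow.
import urllib.robotparser
from urllib.parse import urlparse

def is_permitted_by_robots(url, user_agent="brozzler"):
    # Fetch and parse robots.txt for the url's origin, then ask whether
    # the given user agent is allowed to fetch this url.
    parts = urlparse(url)
    robots_url = "%s://%s/robots.txt" % (parts.scheme, parts.netloc)
    rp = urllib.robotparser.RobotFileParser(robots_url)
    rp.read()
    return rp.can_fetch(user_agent, url)

def brozzle_page(page):
    # If scheduling deferred the robots check, the worker does it now,
    # skipping pages that robots.txt disallows.
    if getattr(page, "needs_robots_check", False) \
            and not is_permitted_by_robots(page.url):
        return
    # ... proceed to browse and capture the page ...
```

Deferring the check this way keeps site scheduling fast and moves the network fetch of robots.txt into the worker, which is already doing network I/O anyway.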

Noah Levitt 2016-11-16 12:23:59 -08:00
parent 24cc8377fb
commit 72816d1058
7 changed files with 121 additions and 36 deletions

@@ -32,7 +32,7 @@ def find_package_data(package):
 setuptools.setup(
     name='brozzler',
-    version='1.1b8.dev126',
+    version='1.1b8.dev127',
     description='Distributed web crawling with browsers',
     url='https://github.com/internetarchive/brozzler',
     author='Noah Levitt',