Noah Levitt
426916a238
need warcprox in python path for travis tests now
2017-04-18 18:10:18 -07:00
Noah Levitt
d8904dc9e7
Merge branch 'master' into qa
...
* master:
implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
have _warcprox_write_record also raise ProxyError when appropriate, and test this
2017-04-18 17:54:21 -07:00
Noah Levitt
8256a34b4f
implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker
2017-04-18 17:54:12 -07:00
Noah Levitt
5603ff5380
have _warcprox_write_record also raise ProxyError when appropriate, and test this
2017-04-18 16:58:51 -07:00
Neil Minton
f90d05075e
Merge branch 'ari-4960' into qa
2017-04-18 15:24:14 -07:00
Neil Minton
f541dce5c3
Crawl Google Calendar for fortstjames.ca
2017-04-18 15:22:33 -07:00
Noah Levitt
d05474173a
Merge branch 'master' into qa
...
* master:
fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth
2017-04-18 12:00:33 -07:00
Noah Levitt
ac972d399f
fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth
2017-04-18 12:00:23 -07:00
Noah Levitt
6844cb5bcb
Merge branch 'master' into qa
...
* master:
raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch
raise new exception brozzler.ProxyError in case of proxy error browsing a page
make brozzle-page respect --proxy (no test for this!)
oops, version bump for previous commit
bubble up proxy errors fetching robots.txt, with unit test, and documentation
2017-04-17 18:15:32 -07:00
Noah Levitt
dc43794363
raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch
2017-04-17 18:15:22 -07:00
Noah Levitt
349b41ab32
raise new exception brozzler.ProxyError in case of proxy error browsing a page
2017-04-17 18:14:02 -07:00
Noah Levitt
87a7301f4d
make brozzle-page respect --proxy (no test for this!)
2017-04-17 18:11:09 -07:00
Noah Levitt
0e90950de2
oops, version bump for previous commit
2017-04-17 18:10:56 -07:00
Noah Levitt
0884b4cd56
bubble up proxy errors fetching robots.txt, with unit test, and documentation
2017-04-17 16:47:05 -07:00
Noah Levitt
929f046ebb
Merge branch 'master' into qa
...
* master:
new command line utility brozzler-stop-crawl, with tests
2017-04-14 18:06:24 -07:00
Noah Levitt
df7734f2ca
new command line utility brozzler-stop-crawl, with tests
2017-04-14 18:06:15 -07:00
Barbara Miller
11279e001b
Merge branch 'ARI-5259' into qa
2017-04-14 15:20:11 -07:00
Barbara Miller
72e9d8da58
blog.sin.com.cn pagination
2017-04-14 15:19:12 -07:00
Noah Levitt
a768b07a65
Merge branch 'master' into qa
...
* master:
parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run
2017-04-14 11:46:39 -07:00
Noah Levitt
fae60e9960
parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run
2017-04-14 11:46:26 -07:00
Noah Levitt
b7731bdc75
Merge branch 'master' into qa
...
* master:
stupid version number bump
Revert "bump version number for last pull request"
2017-04-05 17:02:09 -07:00
Noah Levitt
b3cf746f53
stupid version number bump
2017-04-05 17:01:52 -07:00
Noah Levitt
62917a6f1a
Revert "bump version number for last pull request"
...
This reverts commit d192fc269e.
2017-04-05 17:01:06 -07:00
Noah Levitt
40022ad0c2
Merge branch 'master' into qa
...
* master:
bump version number for last pull request
2017-04-05 16:15:33 -07:00
Noah Levitt
d192fc269e
bump version number for last pull request
2017-04-05 16:15:24 -07:00
Barbara Miller
537eb1cf7f
Merge pull request #34 from galgeek/ARI-5193
...
mouseover for ky.gov sites
2017-04-05 16:13:57 -07:00
Noah Levitt
6535922fa6
Merge branch 'master' into qa
...
* master:
extract area/@href links, and add test for outlink extraction
2017-04-05 12:09:58 -07:00
Noah Levitt
5bcd10c228
extract area/@href links, and add test for outlink extraction
2017-04-05 12:09:48 -07:00
Barbara Miller
010c2869dd
Merge branch 'ARI-5193' into qa
2017-04-04 15:52:50 -07:00
Barbara Miller
847b68eaf4
add JIRA info
2017-04-04 15:52:03 -07:00
Barbara Miller
bdf4c2db42
Merge branch 'ARI-5193' into qa
2017-03-31 15:49:24 -07:00
Barbara Miller
901321199c
mouseover for ky.gov sites
2017-03-31 15:48:01 -07:00
Noah Levitt
db8c1d36fa
Merge branch 'master' into qa
...
* master:
ugh fix version number
2017-03-30 17:53:45 -07:00
Noah Levitt
d4d3ef4fd3
ugh fix version number
2017-03-30 17:53:36 -07:00
Noah Levitt
082f10d327
Merge branch 'master' into qa
...
* master:
consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
2017-03-29 18:49:15 -07:00
Noah Levitt
125d77b8c4
consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
2017-03-29 18:49:04 -07:00
Noah Levitt
a83c11b302
Merge branch 'master' into qa
...
* master:
new model for crawling hashtags, each one is no longer a top-level page
remove some vestiges of old proxy stuff
2017-03-27 12:16:11 -07:00
Noah Levitt
3d47805ec1
new model for crawling hashtags, each one is no longer a top-level page
2017-03-27 12:15:49 -07:00
Noah Levitt
a836269e95
remove some vestiges of old proxy stuff
2017-03-24 16:04:43 -07:00
Noah Levitt
d373611061
Merge branch 'master' into qa
...
* master:
new test of frontier.seed_page
2017-03-24 15:45:48 -07:00
Noah Levitt
a826fdc7ef
new test of frontier.seed_page
2017-03-24 15:45:40 -07:00
Noah Levitt
ec3472ce61
Merge branch 'master' into qa
...
* master:
actually respect --proxy and --warcprox-auto options to brozzler-worker
2017-03-24 22:28:20 +00:00
Noah Levitt
0e35de43b6
actually respect --proxy and --warcprox-auto options to brozzler-worker
2017-03-24 22:27:52 +00:00
Noah Levitt
fb2d760306
Merge branch 'master' into qa
...
* master:
Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. If --proxy is provided, brozzler-worker determines whether the proxy is warcprox by consulting http://{proxy_address}/status (see 8caae0d7d3), and enables warcprox features if so.
2017-03-24 14:38:13 -07:00
Noah Levitt
934190084c
Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. If --proxy is provided, brozzler-worker determines whether the proxy is warcprox by consulting http://{proxy_address}/status (see 8caae0d7d3), and enables warcprox features if so.
2017-03-24 13:55:23 -07:00
Noah Levitt
bc2d4d5cba
Merge branch 'master' into qa
...
* master:
back to a dev version number
1.1b10 since 1.1b9 has bugs :(
2017-03-22 16:12:50 -07:00
Noah Levitt
9a2f181eb6
back to a dev version number
2017-03-22 16:12:39 -07:00
Noah Levitt
613dca29dc
1.1b10 since 1.1b9 has bugs :(
2017-03-22 16:11:26 -07:00
Noah Levitt
06ef045e63
Merge branch 'master' into qa
...
* master:
ugh, avoid infinite recursion
fix frontier tests now that enable_warcprox_features is simply omitted by default
i dub thee 1.1b9
github didn't like that, how about a width in pixels
maybe pypi supports RST image "scale"
2017-03-22 15:54:07 -07:00
Noah Levitt
4ba25db684
ugh, avoid infinite recursion
2017-03-22 15:53:58 -07:00