Noah Levitt
3d47805ec1
new model for crawling hashtags, each one is no longer a top-level page
2017-03-27 12:15:49 -07:00
Noah Levitt
a836269e95
remove some vestiges of old proxy stuff
2017-03-24 16:04:43 -07:00
Noah Levitt
d373611061
Merge branch 'master' into qa
...
* master:
new test of frontier.seed_page
2017-03-24 15:45:48 -07:00
Noah Levitt
a826fdc7ef
new test of frontier.seed_page
2017-03-24 15:45:40 -07:00
Noah Levitt
ec3472ce61
Merge branch 'master' into qa
...
* master:
actually respect --proxy and --warcprox-auto options to brozzler-worker
2017-03-24 22:28:20 +00:00
Noah Levitt
0e35de43b6
actually respect --proxy and --warcprox-auto options to brozzler-worker
2017-03-24 22:27:52 +00:00
Noah Levitt
fb2d760306
Merge branch 'master' into qa
...
* master:
Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. If --proxy is provided, brozzler-worker determines whether the proxy is warcprox by consulting http://{proxy_address}/status (see 8caae0d7d3), and enables warcprox features if so.
2017-03-24 14:38:13 -07:00
Noah Levitt
934190084c
Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. If --proxy is provided, brozzler-worker determines whether the proxy is warcprox by consulting http://{proxy_address}/status (see 8caae0d7d3), and enables warcprox features if so.
2017-03-24 13:55:23 -07:00
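A rough sketch of the behavior described in the commit message above, not the actual brozzler code. The function names are hypothetical, the service-registry lookup for --warcprox-auto is left out, and the check on a "role" field in the status response is an assumption about what warcprox returns from /status.

```python
import argparse
import http.client
import json
import urllib.request


def build_arg_parser():
    """Hypothetical parser with --proxy and --warcprox-auto mutually exclusive."""
    parser = argparse.ArgumentParser(prog="brozzler-worker-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--proxy", metavar="HOST:PORT", default=None)
    group.add_argument("--warcprox-auto", action="store_true")
    return parser


def status_says_warcprox(status_body):
    """Return True if a /status response body identifies the proxy as warcprox.

    ASSUMPTION: the body is a JSON object with "role" set to "warcprox".
    """
    try:
        status = json.loads(status_body)
    except ValueError:
        return False
    return isinstance(status, dict) and status.get("role") == "warcprox"


def proxy_is_warcprox(proxy_address, timeout=10):
    """Consult http://{proxy_address}/status; treat any failure as 'not warcprox'."""
    url = "http://%s/status" % proxy_address
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_says_warcprox(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        return False
```

With a parser like this, passing both options makes argparse exit with a usage error, and warcprox features would be enabled only when `proxy_is_warcprox(args.proxy)` returns True.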
Noah Levitt
bc2d4d5cba
Merge branch 'master' into qa
...
* master:
back to a dev version number
1.1b10 since 1.1b9 has bugs :(
2017-03-22 16:12:50 -07:00
Noah Levitt
9a2f181eb6
back to a dev version number
2017-03-22 16:12:39 -07:00
Noah Levitt
613dca29dc
1.1b10 since 1.1b9 has bugs :(
2017-03-22 16:11:26 -07:00
Noah Levitt
06ef045e63
Merge branch 'master' into qa
...
* master:
ugh, avoid infinite recursion
fix frontier tests now that enable_warcprox_features is simply omitted by default
i dub thee 1.1b9
github didn't like that, how about a width in pixels
maybe pypi supports RST image "scale"
2017-03-22 15:54:07 -07:00
Noah Levitt
4ba25db684
ugh, avoid infinite recursion
2017-03-22 15:53:58 -07:00
Noah Levitt
34bb64297f
fix frontier tests now that enable_warcprox_features is simply omitted by default
2017-03-22 15:46:12 -07:00
Noah Levitt
4aa611af52
i dub thee 1.1b9
2017-03-22 15:25:55 -07:00
Noah Levitt
b63badea53
github didn't like that, how about a width in pixels
2017-03-22 15:23:47 -07:00
Noah Levitt
2e6fe9ccc0
maybe pypi supports RST image "scale"
2017-03-22 15:20:35 -07:00
Noah Levitt
8f7a820b05
Merge branch 'master' into qa
...
* master:
fix brozzler-easy so that warcprox features are enabled automatically (feature was already there but broken)
2017-03-22 15:15:17 -07:00
Noah Levitt
aae810cc6e
fix brozzler-easy so that warcprox features are enabled automatically (feature was already there but broken)
2017-03-22 15:15:07 -07:00
Noah Levitt
2fda63ed9d
Merge branch 'master' into qa
...
* master:
restore accidentally deleted line of code
2017-03-21 13:08:24 -07:00
Noah Levitt
603956ec41
restore accidentally deleted line of code
2017-03-21 13:08:18 -07:00
Noah Levitt
a7a880ba97
Merge branch 'master' into qa
...
* master:
initialize page.videos correctly in all cases
2017-03-21 11:11:05 -07:00
Noah Levitt
95ba334b89
initialize page.videos correctly in all cases
2017-03-21 11:10:57 -07:00
Noah Levitt
a334ff5e69
Merge branch 'master' into qa
...
* master:
three-value "brozzled" parameter for frontier.site_pages(); fix thing where every Site got a list of all the seeds from the job; and some more frontier tests to catch these kinds of things
2017-03-20 17:28:24 -07:00
Noah Levitt
eeee523b18
three-value "brozzled" parameter for frontier.site_pages(); fix thing where every Site got a list of all the seeds from the job; and some more frontier tests to catch these kinds of things
2017-03-20 17:28:16 -07:00
Noah Levitt
4e55dea519
Merge branch 'master' into qa
...
* master:
forgot to add the new test data
2017-03-20 12:33:59 -07:00
Noah Levitt
0e9f4a0c26
forgot to add the new test data
2017-03-20 12:33:52 -07:00
Noah Levitt
14373b40a4
Merge branch 'master' into qa
...
* master:
oops remove pdb call
2017-03-20 12:14:17 -07:00
Noah Levitt
e9c7606318
oops remove pdb call
2017-03-20 12:14:11 -07:00
Noah Levitt
a1ef257474
Merge branch 'master' into qa
...
* master:
save info about embedded videos in page document in rethinkdb
2017-03-20 11:49:20 -07:00
Noah Levitt
13130bd9d9
save info about embedded videos in page document in rethinkdb
2017-03-20 11:49:11 -07:00
Noah Levitt
6f41c70892
Merge branch 'master' into qa
...
* master:
actually implement the brozzler-list-jobs --job option
2017-03-17 11:14:51 -07:00
Noah Levitt
94ba56dca5
actually implement the brozzler-list-jobs --job option
2017-03-17 11:14:45 -07:00
Noah Levitt
775bfb123f
Merge branch 'master' into qa
...
* master:
always save outlinks info on rethinkdb page object, get rid of 'remember_outlinks' option, to keep config simple, and because it's not a very expensive thing
2017-03-17 10:04:18 -07:00
Noah Levitt
0685c77d01
always save outlinks info on rethinkdb page object, get rid of 'remember_outlinks' option, to keep config simple, and because it's not a very expensive thing
2017-03-17 10:04:10 -07:00
Noah Levitt
3d1c5f8b2b
Merge branch 'master' into qa
...
* master:
make brozzler-list-* a little more intuitive, maybe
2017-03-16 13:01:48 -07:00
Noah Levitt
701f7654a8
make brozzler-list-* a little more intuitive, maybe
2017-03-16 13:01:41 -07:00
Noah Levitt
ff7f1d207c
Merge branch 'master' into qa
...
* master:
if parent page has a redirect_url, check scope rules both with the parent_page original url and with the redirect url, with automated tests
2017-03-16 12:12:41 -07:00
Noah Levitt
6c81b40e28
if parent page has a redirect_url, check scope rules both with the parent_page original url and with the redirect url, with automated tests
2017-03-16 12:12:33 -07:00
Noah Levitt
2aacf01950
Merge branch 'master' into qa
...
* master:
add the new urlcanon.MatchRule conditions to job_schema.yaml
2017-03-15 17:08:37 -07:00
Noah Levitt
0021a9d5f0
add the new urlcanon.MatchRule conditions to job_schema.yaml
2017-03-15 17:08:27 -07:00
Noah Levitt
63474c09f2
Merge branch 'master' into qa
...
* master:
use urlcanon library for canonicalization, surtification, scope match rules
more automated tests of frontier stuff
2017-03-15 15:00:01 -07:00
Noah Levitt
12fb9eaa15
use urlcanon library for canonicalization, surtification, scope match rules
2017-03-15 14:59:51 -07:00
Noah Levitt
479f0f7e09
more automated tests of frontier stuff
2017-03-15 14:54:16 -07:00
Noah Levitt
6526c40bb8
Merge branch 'master' into qa
...
* master:
turns out we want populate_defaults to happen in __init__, fix so things work right
2017-03-08 17:34:34 -08:00
Noah Levitt
9e1e002a71
turns out we want populate_defaults to happen in __init__, fix so things work right
2017-03-07 17:52:38 -08:00
Noah Levitt
335b67f42b
Merge branch 'master' into qa
...
* master:
use updated doublethink library populate_defaults() to avoid problem where under certain circumstances field values from the database would be overwritten by defaults
2017-03-07 13:20:10 -08:00
Noah Levitt
01653c01d7
use updated doublethink library populate_defaults() to avoid problem where under certain circumstances field values from the database would be overwritten by defaults
2017-03-07 13:19:56 -08:00
Noah Levitt
59316a38f8
Merge branch 'master' into qa
...
* master:
fix bug with seed redirects where scope change was applied too late to affect scoping of outlinks from the seed (with automated tests)
2017-03-06 15:13:48 -08:00
Noah Levitt
242ff51ec7
fix bug with seed redirects where scope change was applied too late to affect scoping of outlinks from the seed (with automated tests)
2017-03-06 15:13:40 -08:00