Barbara Miller
|
537eb1cf7f
|
Merge pull request #34 from galgeek/ARI-5193
mouseover for ky.gov sites
|
2017-04-05 16:13:57 -07:00 |
|
Noah Levitt
|
5bcd10c228
|
extract area/@href links, and add test for outlink extraction
|
2017-04-05 12:09:48 -07:00 |
|
Barbara Miller
|
847b68eaf4
|
add JIRA info
|
2017-04-04 15:52:03 -07:00 |
|
Barbara Miller
|
901321199c
|
mouseover for ky.gov sites
|
2017-03-31 15:48:01 -07:00 |
|
Noah Levitt
|
d4d3ef4fd3
|
ugh fix version number
|
2017-03-30 17:53:36 -07:00 |
|
Noah Levitt
|
125d77b8c4
|
consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin
|
2017-03-29 18:49:04 -07:00 |
|
Noah Levitt
|
3d47805ec1
|
new model for crawling hashtags, each one is no longer a top-level page
|
2017-03-27 12:15:49 -07:00 |
|
Noah Levitt
|
a836269e95
|
remove some vestiges of old proxy stuff
|
2017-03-24 16:04:43 -07:00 |
|
Noah Levitt
|
a826fdc7ef
|
new test of frontier.seed_page
|
2017-03-24 15:45:40 -07:00 |
|
Noah Levitt
|
0e35de43b6
|
actually respect --proxy and --warcprox-auto options to brozzler-worker
|
2017-03-24 22:27:52 +00:00 |
|
Noah Levitt
|
934190084c
|
Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. If --proxy is provided, brozzler-worker determines whether the proxy is warcprox by consulting http://{proxy_address}/status (see https://github.com/internetarchive/warcprox/commit/8caae0d7d3), and enables warcprox features if so.
|
2017-03-24 13:55:23 -07:00 |
|
Noah Levitt
|
9a2f181eb6
|
back to a dev version number
|
2017-03-22 16:12:39 -07:00 |
|
Noah Levitt
|
613dca29dc
|
1.1b10 since 1.1b9 has bugs :(
1.1b10
|
2017-03-22 16:11:26 -07:00 |
|
Noah Levitt
|
4ba25db684
|
ugh, avoid infinite recursion
|
2017-03-22 15:53:58 -07:00 |
|
Noah Levitt
|
34bb64297f
|
fix frontier tests now that enable_warcprox_features is simply omitted by default
|
2017-03-22 15:46:12 -07:00 |
|
Noah Levitt
|
4aa611af52
|
i dub thee 1.1b9
1.1b9
|
2017-03-22 15:25:55 -07:00 |
|
Noah Levitt
|
b63badea53
|
github didn't like that, how about a width in pixels
|
2017-03-22 15:23:47 -07:00 |
|
Noah Levitt
|
2e6fe9ccc0
|
maybe pypi supports RST image "scale"
|
2017-03-22 15:20:35 -07:00 |
|
Noah Levitt
|
aae810cc6e
|
fix brozzler-easy so that warcprox features are enabled automatically (feature was already there but broken)
|
2017-03-22 15:15:07 -07:00 |
|
Noah Levitt
|
603956ec41
|
restore accidentally deleted line of code
|
2017-03-21 13:08:18 -07:00 |
|
Noah Levitt
|
95ba334b89
|
initialize page.videos correctly in all cases
|
2017-03-21 11:10:57 -07:00 |
|
Noah Levitt
|
eeee523b18
|
three-value "brozzled" parameter for frontier.site_pages(); fix thing where every Site got a list of all the seeds from the job; and some more frontier tests to catch these kinds of things
|
2017-03-20 17:28:16 -07:00 |
|
Noah Levitt
|
0e9f4a0c26
|
forgot to add the new test data
|
2017-03-20 12:33:52 -07:00 |
|
Noah Levitt
|
e9c7606318
|
oops remove pdb call
|
2017-03-20 12:14:11 -07:00 |
|
Noah Levitt
|
13130bd9d9
|
save info about embedded videos in page document in rethinkdb
|
2017-03-20 11:49:11 -07:00 |
|
Noah Levitt
|
94ba56dca5
|
actually implement the brozzler-list-jobs --job option
|
2017-03-17 11:14:45 -07:00 |
|
Noah Levitt
|
0685c77d01
|
always save outlinks info on rethinkdb page object, get rid of 'remember_outlinks' option, to keep config simple, and because it's not a very expensive thing
|
2017-03-17 10:04:10 -07:00 |
|
Noah Levitt
|
701f7654a8
|
make brozzler-list-* a little more intuitive, maybe
|
2017-03-16 13:01:41 -07:00 |
|
Noah Levitt
|
6c81b40e28
|
if parent page has a redirect_url, check scope rules both with the parent_page original url and with the redirect url, with automated tests
|
2017-03-16 12:12:33 -07:00 |
|
Noah Levitt
|
0021a9d5f0
|
add the new urlcanon.MatchRule conditions to job_schema.yaml
|
2017-03-15 17:08:27 -07:00 |
|
Noah Levitt
|
12fb9eaa15
|
use urlcanon library for canonicalization, surtification, scope match rules
|
2017-03-15 14:59:51 -07:00 |
|
Noah Levitt
|
479f0f7e09
|
more automated tests of frontier stuff
|
2017-03-15 14:54:16 -07:00 |
|
Noah Levitt
|
9e1e002a71
|
turns out we want populate_defaults to happen in __init__, fix so things work right
|
2017-03-07 17:52:38 -08:00 |
|
Noah Levitt
|
01653c01d7
|
use updated doublethink library populate_defaults() to avoid problem where under certain circumstances field values from the database would be overwritten by defaults
|
2017-03-07 13:19:56 -08:00 |
|
Noah Levitt
|
242ff51ec7
|
fix bug with seed redirects where scope change was applied too late to affect scoping of outlinks from the seed (with automated tests)
|
2017-03-06 15:13:40 -08:00 |
|
Noah Levitt
|
40bbbb3524
|
add tests of backwards compatibility handling of start/stop times and fix a bug or two
|
2017-03-02 16:53:24 -08:00 |
|
Noah Levitt
|
569af05b11
|
rethinkstuff is now "doublethink"
|
2017-03-02 12:48:45 -08:00 |
|
Noah Levitt
|
700b08b7d7
|
use new rethinkstuff ORM
|
2017-02-28 16:12:50 -08:00 |
|
Noah Levitt
|
a1f1681cad
|
fix issue where use of YoutubeDLSpy caused youtube-dl connections to remote servers to be kept open
|
2017-02-24 11:15:17 -08:00 |
|
Noah Levitt
|
b4f19e2594
|
fix typo
|
2017-02-23 10:47:04 -08:00 |
|
Noah Levitt
|
7417310d57
|
more pywb monkey-patching to get at least some youtube videos captured by brozzler to play back
|
2017-02-23 10:43:07 -08:00 |
|
Noah Levitt
|
2398031010
|
let the OS pick an available port, to avoid what appear to be timing issues causing multiple browsers to choose the same port
|
2017-02-22 12:44:19 -08:00 |
|
Noah Levitt
|
3c4ab834da
|
handle errors from extract-outlinks.js, which happens on polyvore.com because it changes the definition of Set 😭
|
2017-02-22 10:57:11 -08:00 |
|
Noah Levitt
|
0d0da22613
|
brozzler-list-jobs --yaml
|
2017-02-16 10:20:36 -08:00 |
|
Noah Levitt
|
f02d4ed40e
|
missed this in the last commit
|
2017-02-15 23:20:47 -08:00 |
|
Noah Levitt
|
b409e49cfa
|
deprecate current scope rule syntax and create new syntax with slightly different semantics (to be documented), and add parent_url_regex scope rule; unit test for scoping
|
2017-02-15 16:46:45 -08:00 |
|
Noah Levitt
|
c0057e591a
|
add --yaml option to brozzler-list-* commands
|
2017-02-15 23:13:09 +00:00 |
|
Noah Levitt
|
1054e8e3cb
|
take screenshot before running behavior (but after login) - thanks danielbicho
|
2017-02-15 09:13:44 -08:00 |
|
Noah Levitt
|
e58f4b7c44
|
logging tweaks
|
2017-02-10 15:19:28 -08:00 |
|
Noah Levitt
|
09fa41f959
|
fix TypeError: not all arguments converted during string formatting
|
2017-02-03 17:24:47 -08:00 |
|