Barbara Miller
5c6184201f
Merge branch 'master' into behavior-refactor
2017-07-21 16:24:36 -07:00
Barbara Miller
a563e9eb0c
Merge pull request #50 from internetarchive/ARI-5407
Add click selector for facebook’s new See More link
2017-07-21 14:43:38 -07:00
Barbara Miller
6c7d5124a7
correct merge error
2017-07-20 15:55:41 -07:00
Barbara Miller
9b69d554f1
Merge branch 'ARI-5379' into qa
2017-07-20 15:53:33 -07:00
Barbara Miller
7620ee6fae
only div.teaser, for *pm.gc.ca*
2017-07-20 15:52:53 -07:00
Barbara Miller
d08e5df10b
Merge branch 'ARI-5379' into qa
2017-07-19 15:52:00 -07:00
Barbara Miller
795b9ab809
improve/expand url_regex
2017-07-19 15:50:49 -07:00
Barbara Miller
ddb1ee283a
Merge branch 'ARI-5407' into qa
2017-07-19 14:47:30 -07:00
Barbara Miller
5c4961fbce
add selector for See More link
2017-07-19 14:44:27 -07:00
Barbara Miller
b33e2c194f
Merge branch 'ARI-5334' into qa
2017-07-18 15:19:22 -07:00
Barbara Miller
ab1083ad0c
better paging through google search results?
2017-07-18 15:18:59 -07:00
Barbara Miller
be2b663ad4
Merge branch 'ARI-5242' into qa
2017-07-18 14:30:26 -07:00
Barbara Miller
59571dadfa
div.compactTrackListItem for soundcloud.com
2017-07-18 14:27:10 -07:00
Barbara Miller
8a27b64a1d
Merge branch 'ARI-5242' into qa
2017-07-18 14:11:12 -07:00
Barbara Miller
3c9eb30212
div.soundItem selector for multi-item list
2017-07-18 14:08:20 -07:00
Barbara Miller
8524992840
var limit
2017-07-15 14:26:35 -07:00
Barbara Miller
ac70617a05
add limits
2017-07-14 18:06:59 -07:00
Noah Levitt
0955d56926
Merge pull request #46 from internetarchive/ARI-5379
ARI-5379: custom behavior for pm.gc.ca
2017-07-13 11:42:34 -07:00
Barbara Miller
0a2895364d
resolve conflict
2017-07-13 11:32:25 -07:00
Neil Minton
512931b6c8
Merge branch 'ari-5210' into qa
2017-07-12 17:30:06 -07:00
Neil Minton
218c7c6372
New pagination behavior for http://www.ssab.gov/Our-Work
2017-07-12 17:28:41 -07:00
Neil Minton
78d73a9adc
Merge branch 'ari-5210' into qa
2017-07-12 16:39:03 -07:00
Neil Minton
ddd61e4642
New pagination behavior for http://www.ssab.gov/Our-Work
2017-07-12 16:38:11 -07:00
Barbara Miller
1ee8a7e002
Merge branch 'ARI-5409' into qa
2017-07-12 14:19:51 -07:00
Barbara Miller
5e0c448e11
simpleclicks for tuebingen.de
2017-07-12 14:19:33 -07:00
Barbara Miller
f0bc6bb28e
Merge branch 'ARI-5242' into qa
2017-07-12 11:22:29 -07:00
Barbara Miller
762b65ee3e
selectors for multi-item playlist
2017-07-12 11:19:53 -07:00
Noah Levitt
c77f4e4249
dev version bump
2017-07-06 17:19:53 -07:00
Noah Levitt
6cbe097c87
Merge pull request #48 from vbanos/WWM-802
new skip cli options for brozzle-page and brozzler-worker
2017-07-06 17:19:28 -07:00
Vangelis Banos
8019eb4b5f
Hide the options using argparse.SUPPRESS
2017-07-06 06:25:04 +00:00
Barbara Miller
9db30b089c
supports rewritten www.news.com.au yaml
2017-07-05 18:46:18 -07:00
Vangelis Banos
475ddd329c
add skip cli options to brozzle-page
Add --skip-extract-outlinks --skip-visit-hashtags options to
`brozzle-page` command.
2017-07-05 07:31:14 +00:00
Vangelis Banos
89877670a4
--skip-extract-outlinks, --skip-visit-hashtags
Brozzler always did these actions. We make it possible to skip them with
this MR. Options are passed to `brozzler-worker`.
This feature is useful for tasks where we just need to retrieve a specific
page and we don't need to extract outlinks to continue crawling.
2017-07-04 21:50:05 +00:00
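A minimal sketch of how the two skip flags described above could be declared, assuming a plain `argparse` parser. The function name `build_parser`, the `prog` string, and the `dest` names are hypothetical and not taken from the brozzler source; only the flag names and the use of `argparse.SUPPRESS` come from the commit messages in this log.

```python
import argparse


def build_parser():
    """Build an illustrative parser; names here are assumptions, not brozzler's code."""
    parser = argparse.ArgumentParser(prog="brozzle-page-sketch")
    parser.add_argument("url", help="page to brozzle")
    # help=argparse.SUPPRESS keeps the option usable but hides it from --help output.
    parser.add_argument(
        "--skip-extract-outlinks",
        dest="skip_extract_outlinks",
        action="store_true",
        help=argparse.SUPPRESS,
    )
    parser.add_argument(
        "--skip-visit-hashtags",
        dest="skip_visit_hashtags",
        action="store_true",
        help=argparse.SUPPRESS,
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.skip_extract_outlinks, args.skip_visit_hashtags)
```

Both flags default to false, so the outlink extraction and hashtag visits still happen unless a flag is passed explicitly.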
Noah Levitt
261e7977ad
Merge pull request #47 from galgeek/ARI-5389
custom behavior for pitchfork.com, based on facebook & pm-gc-ca behaviors
2017-07-03 16:40:27 -07:00
Barbara Miller
24a68cb55d
pitchfork behavior, based on pm-ca and facebook behaviors
2017-06-30 13:54:54 -07:00
Noah Levitt
7b6fbd7b1a
Merge branch 'master' into qa
* master:
  fix "local variable 'start' referenced before assignment"
2017-06-27 11:09:00 -07:00
Noah Levitt
051e299a80
fix "local variable 'start' referenced before assignment"
2017-06-27 11:08:51 -07:00
Noah Levitt
b132c9c956
Merge branch 'master' into qa
* master:
  enforce time limits based on time claimed by worker actively brozzling, to avoid problem of stopping crawls that haven't had much chance to crawl, because of cluster busy-ness
  minimally update test_time_limit for new time accounting
  make sure youtube-dl progress thing can't derail youtube-dl operation
2017-06-26 18:00:41 -07:00
Noah Levitt
b9640b8a30
enforce time limits based on time claimed by worker actively brozzling, to avoid problem of stopping crawls that haven't had much chance to crawl, because of cluster busy-ness
2017-06-26 18:00:32 -07:00
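The idea in the commit above, counting only the time a worker actually held the site instead of wall-clock time since the crawl started, might be sketched as follows. The class `SiteClock` and its methods are hypothetical illustrations and are not brozzler's implementation.

```python
from dataclasses import dataclass


@dataclass
class SiteClock:
    """Illustrative accounting of active crawl time (hypothetical, not brozzler's code)."""

    time_limit: float  # seconds of active brozzling allowed
    active_seconds: float = 0.0

    def record_claim(self, claimed_at, released_at):
        # Only the interval during which a worker held the site counts;
        # time spent waiting in a busy cluster is never added.
        self.active_seconds += max(0.0, released_at - claimed_at)

    def time_limit_reached(self):
        return self.active_seconds >= self.time_limit


clock = SiteClock(time_limit=60)
clock.record_claim(0, 20)       # 20 s of active work
clock.record_claim(1000, 1030)  # long idle gap, then 30 s more
print(clock.time_limit_reached())
```

Under wall-clock accounting the second claim would already be far past a 60-second limit; here only the 50 seconds of claimed time count.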
Noah Levitt
3385d727ac
minimally update test_time_limit for new time accounting
2017-06-26 17:57:50 -07:00
Noah Levitt
8ef7972ace
make sure youtube-dl progress thing can't derail youtube-dl operation
2017-06-26 16:10:40 -07:00
Noah Levitt
d45837cf7b
Merge branch 'master' into qa
* master:
  have brozzler-list-sites --active use the index
2017-06-24 01:05:37 +00:00
Noah Levitt
caee2787b0
have brozzler-list-sites --active use the index
2017-06-24 01:05:19 +00:00
Noah Levitt
37404ba5a9
Merge branch 'master' into qa
* master:
  make youtube-dl prefer unsegmented videos
  try workaround, maybe this is an issue with https://blog.travis-ci.com/2017-06-21-trusty-updates-2017-Q2-launch
  shed some light on the travis-ci error
  restore reclamation of orphaned, claimed sites, and heartbeat site.last_claimed every 7 minutes during youtube-dl processing, to prevent another brozzler-worker claiming the site
2017-06-23 16:21:59 -07:00
Noah Levitt
35babeb01b
make youtube-dl prefer unsegmented videos
2017-06-23 15:19:30 -07:00
Noah Levitt
e6b5770f6c
try workaround, maybe this is an issue with https://blog.travis-ci.com/2017-06-21-trusty-updates-2017-Q2-launch
2017-06-23 14:07:07 -07:00
Noah Levitt
29b19b1e9d
shed some light on the travis-ci error
2017-06-23 13:56:25 -07:00
Noah Levitt
405c5725e4
restore reclamation of orphaned, claimed sites, and heartbeat site.last_claimed every 7 minutes during youtube-dl processing, to prevent another brozzler-worker claiming the site
2017-06-23 13:50:49 -07:00
Barbara Miller
b856751963
Merge branch 'ARI-5389' into qa
2017-06-19 18:01:28 -07:00
Barbara Miller
974a961713
pitchfork behavior, based on pm-ca and facebook behaviors
2017-06-19 17:59:55 -07:00