Commit Graph

  • 0a2895364d resolve conflict Barbara Miller 2017-07-13 11:32:25 -07:00
  • 512931b6c8 Merge branch 'ari-5210' into qa Neil Minton 2017-07-12 17:30:06 -07:00
  • 218c7c6372 New pagination behavior for http://www.ssab.gov/Our-Work Neil Minton 2017-07-12 16:36:18 -07:00
  • 78d73a9adc Merge branch 'ari-5210' into qa Neil Minton 2017-07-12 16:39:03 -07:00
  • ddd61e4642 New pagination behavior for http://www.ssab.gov/Our-Work Neil Minton 2017-07-12 16:36:18 -07:00
  • 1ee8a7e002 Merge branch 'ARI-5409' into qa Barbara Miller 2017-07-12 14:19:51 -07:00
  • 5e0c448e11 simpleclicks for tuebingen.de Barbara Miller 2017-07-12 14:19:33 -07:00
  • f0bc6bb28e Merge branch 'ARI-5242' into qa Barbara Miller 2017-07-12 11:22:29 -07:00
  • 762b65ee3e selectors for multi-item playlist Barbara Miller 2017-07-12 11:19:28 -07:00
  • c77f4e4249 dev version bump Noah Levitt 2017-07-06 17:19:53 -07:00
  • 6cbe097c87 Merge pull request #48 from vbanos/WWM-802 Noah Levitt 2017-07-06 17:19:28 -07:00
  • 8019eb4b5f Hide the options using argparse.SUPPRESS Vangelis Banos 2017-07-06 06:25:04 +00:00
  • 9db30b089c supports rewritten www.news.com.au yaml Barbara Miller 2017-07-05 18:46:18 -07:00
  • 475ddd329c add skip cli options to brozzle-page Vangelis Banos 2017-07-05 07:31:14 +00:00
  • 89877670a4 --skip-extract-outlinks, --skip-visit-hashtags Vangelis Banos 2017-07-04 21:50:05 +00:00
  • 261e7977ad Merge pull request #47 from galgeek/ARI-5389 Noah Levitt 2017-07-03 16:40:27 -07:00
  • 24a68cb55d pitchfork behavior, based on pm-ca and facebook behaviors Barbara Miller 2017-06-19 17:59:55 -07:00
  • 7b6fbd7b1a Merge branch 'master' into qa Noah Levitt 2017-06-27 11:09:00 -07:00
  • 051e299a80 fix "local variable 'start' referenced before assignment" Noah Levitt 2017-06-27 11:08:51 -07:00
  • b132c9c956 Merge branch 'master' into qa Noah Levitt 2017-06-26 18:00:41 -07:00
  • b9640b8a30 enforce time limits based on time claimed by worker actively brozzling, to avoid problem of stopping crawls that haven't had much chance to crawl, because of cluster busy-ness Noah Levitt 2017-06-26 18:00:32 -07:00
  • 3385d727ac minimally update test_time_limit for new time accounting Noah Levitt 2017-06-26 17:57:50 -07:00
  • 8ef7972ace make sure youtube-dl progress thing can't derail youtube-dl operation Noah Levitt 2017-06-26 16:10:40 -07:00
  • d45837cf7b Merge branch 'master' into qa Noah Levitt 2017-06-24 01:05:37 +00:00
  • caee2787b0 have brozzler-list-sites --active use the index Noah Levitt 2017-06-24 01:05:19 +00:00
  • 37404ba5a9 Merge branch 'master' into qa Noah Levitt 2017-06-23 16:21:59 -07:00
  • 35babeb01b make youtube-dl prefer unsegmented videos Noah Levitt 2017-06-23 15:19:30 -07:00
  • e6b5770f6c try workaround, maybe this is an issue with https://blog.travis-ci.com/2017-06-21-trusty-updates-2017-Q2-launch Noah Levitt 2017-06-23 14:07:07 -07:00
  • 29b19b1e9d shed some light on the travis-ci error Noah Levitt 2017-06-23 13:56:25 -07:00
  • 405c5725e4 restore reclamation of orphaned, claimed sites, and heartbeat site.last_claimed every 7 minutes during youtube-dl processing, to prevent another brozzler-worker claiming the site Noah Levitt 2017-06-23 13:50:49 -07:00
  • b856751963 Merge branch 'ARI-5389' into qa Barbara Miller 2017-06-19 18:01:28 -07:00
  • 974a961713 pitchfork behavior, based on pm-ca and facebook behaviors Barbara Miller 2017-06-19 17:59:55 -07:00
  • d04a3f4f2b Merge branch 'master' into qa Noah Levitt 2017-06-19 11:21:10 -07:00
  • 6bae53e646 disable the re-claiming of sites that are marked claimed from more than an hour ago, because sometimes pages legitimately take longer than an hour to brozzle; working on a better solution to this issue Noah Levitt 2017-06-19 11:21:02 -07:00
  • 82b77b6903 WIP Barbara Miller 2017-06-16 10:10:42 -07:00
  • 20cc10efb1 Merge branch 'ARI-5379' into qa Barbara Miller 2017-06-13 18:24:55 -07:00
  • c2c246bb57 custom behavior for pm.gc.ca Barbara Miller 2017-06-13 18:24:30 -07:00
  • a6f7d1bc14 Merge branch 'master' into qa Noah Levitt 2017-06-12 15:42:59 -07:00
  • 7ae22381ef bump version number for pull request Noah Levitt 2017-06-12 15:42:49 -07:00
  • 33508af58f Merge pull request #44 from galgeek/ARI-5384 Noah Levitt 2017-06-12 15:42:04 -07:00
  • 03a6c64970 Merge branch 'ARI-5384' into qa Barbara Miller 2017-06-09 14:09:02 -07:00
  • 626220ce86 simpleclicks for recent issuu.com URLs Barbara Miller 2017-06-09 14:08:34 -07:00
  • 193ac43797 back to dev version number Noah Levitt 2017-06-08 17:33:29 -07:00
  • 44f74066cf 1.1b11 1.1b11 Noah Levitt 2017-06-08 17:30:24 -07:00
  • 27040fd8b7 mini fix Noah Levitt 2017-06-08 17:29:51 -07:00
  • 6edfbddd64 Merge branch 'master' into qa Noah Levitt 2017-06-07 13:08:31 -07:00
  • 02e1c88fac oops bump version Noah Levitt 2017-06-07 13:08:23 -07:00
  • da6e07fb61 Merge branch 'master' into qa Noah Levitt 2017-06-07 13:07:51 -07:00
  • 4d7f4518b5 use %r instead of calling repr() Noah Levitt 2017-06-07 13:07:42 -07:00
  • 99c8ebcc8b Merge branch 'master' into qa Noah Levitt 2017-06-07 08:52:12 -07:00
  • 65adc11d95 oops, should have bumped version number after merging pull requests Noah Levitt 2017-06-07 08:51:21 -07:00
  • 39fb811d13 Merge pull request #41 from galgeek/ARI-4868 Noah Levitt 2017-06-02 14:41:02 -07:00
  • 5e38a9755e Merge pull request #42 from galgeek/loginAndReloadSeed Noah Levitt 2017-06-02 14:03:51 -07:00
  • d41f30cbc7 Merge branch 'loginAndReloadSeed' into qa Barbara Miller 2017-06-02 13:40:36 -07:00
  • a0330d9716 updates per Noah's review Barbara Miller 2017-06-02 13:27:01 -07:00
  • 830b0eef89 undo post-login nav (ARI-5385 and/or ARI-5386) Barbara Miller 2017-06-02 12:45:21 -07:00
  • f2227e6759 have travis-ci test against python 3.5 and 3.6 too Noah Levitt 2017-05-26 13:28:00 -07:00
  • bdc0badec3 rewrite frontier.scope_and_schedule_outlinks() to use batch rethinkdb queries, because we have witnessed the method running for hours(!) Noah Levitt 2017-05-26 13:24:14 -07:00
  • 0a19770ba7 Merge branch 'master' into qa Noah Levitt 2017-05-24 11:36:15 -07:00
  • d904daea9c remove stray logging Noah Levitt 2017-05-24 11:36:06 -07:00
  • f0b9020c0a Merge branch 'master' into qa Noah Levitt 2017-05-23 11:35:19 -07:00
  • ac543ee5b6 use "ttl" for updated doublethink svc reg api Noah Levitt 2017-05-23 11:33:04 -07:00
  • 079db762d4 add relocated behavior file with updated copyright Barbara Miller 2017-05-22 12:38:43 -07:00
  • d7c31be8d0 enable huffpostslides.js Barbara Miller 2017-05-10 15:08:29 -07:00
  • 9c8f626c38 Merge branch 'master' into qa Noah Levitt 2017-05-16 15:47:27 -07:00
  • 89e7c8b079 fix exception from ReachedLimit.__repr__ when it has been instantiated implicitly and __init__ was not called Noah Levitt 2017-05-16 15:47:18 -07:00
  • 31dc6a2d97 improve thread_raise() so that the new tests pass Noah Levitt 2017-05-16 14:20:53 -07:00
  • d514eaec15 even more, better failing tests for thread_raise Noah Levitt 2017-05-16 14:00:10 -07:00
  • dd1b275653 Merge 197a5710339b5e513f6a1474b5c8ea770f63aa12 into d2525e2e8771e6ad74832bd0ac9489339a3a254e Mat Kelly 2017-05-16 01:52:24 +00:00
  • d2525e2e87 failing test for forthcoming behavior of thread_raise Noah Levitt 2017-05-15 16:20:20 -07:00
  • e5371ef0b0 Merge branch 'master' into qa Noah Levitt 2017-05-12 10:04:06 -07:00
  • 60c5a7c1c4 recognize ConnectionError (of which ConnectionResetError is a subclass) in _warcprox_write_record as a proxy error Noah Levitt 2017-05-12 10:03:53 -07:00
  • b2a4fbb17f Merge branch 'ARI-4868' into qa Barbara Miller 2017-05-10 15:09:00 -07:00
  • 35977f6276 enable huffpostslides.js Barbara Miller 2017-05-10 15:08:29 -07:00
  • 054625b8a5 Merge pull request #40 from BitBaron/ari-4960 Barbara Miller 2017-05-09 14:12:48 -07:00
  • 0c45ca2211 Merge branch 'master' into qa Noah Levitt 2017-05-03 16:43:38 -07:00
  • b4bf17df9b do a better job of making sure to shut down the browser when brozzle-page is killed Noah Levitt 2017-05-03 16:43:31 -07:00
  • c3637ecb35 Merge branch 'master' into qa Noah Levitt 2017-05-01 14:12:51 -07:00
  • 9d4cbbf6eb handle another rethinkdb outage corner case Noah Levitt 2017-05-01 14:12:43 -07:00
  • 15a3da61c6 Merge branch 'master' into qa Noah Levitt 2017-05-01 13:46:28 -07:00
  • 389db01458 BrozzlerWorkerThread separate from MainThread to avoid SIGTERM/SIGINT raising exception inside of some rethinkdb code or other sensitive code in that BrozzlerWorker.run() calls Noah Levitt 2017-05-01 13:46:19 -07:00
  • 69d8571871 Merge branch 'master' into qa Noah Levitt 2017-05-01 13:00:34 -07:00
  • 52433ade78 re-claim sites after 1 hour instead of 2 so that sites don't have to wait as long to be brozzled again in case of kill -9 brozzler-worker Noah Levitt 2017-05-01 12:59:44 -07:00
  • 000d40c4dc Merge pull request #39 from bnewbold/bnewbold-pr-template Noah Levitt 2017-04-26 14:34:32 -07:00
  • 83552eb444 add a github PR template for this repo bnewbold 2017-04-25 19:39:44 -07:00
  • d972919db0 Merge pull request #36 from nlevitt/safe-thread-raise Noah Levitt 2017-04-26 11:15:02 -07:00
  • 27ee8d53f8 Merge pull request #38 from ato/headless-doc Noah Levitt 2017-04-25 09:39:43 -07:00
  • 69aba8b762 update headless chrome instructions for regular chrome builds Alex Osborne 2017-04-25 14:58:43 +10:00
  • dcf4811470 Merge branch 'master' into safe-thread-raise Noah Levitt 2017-04-24 20:06:37 -07:00
  • d916b68ab9 use the new api with brozzler.thread_accept_exceptions() Noah Levitt 2017-04-24 20:02:34 -07:00
  • 0953e6972e refactor thread_raise safety to use a context manager Noah Levitt 2017-04-24 19:51:51 -07:00
  • f140e5bdbd allow this stupid test to fail Noah Levitt 2017-04-21 12:17:11 -07:00
  • ba519d7288 improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id Noah Levitt 2017-04-20 18:04:17 -07:00
  • 7706bab8b8 safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such Noah Levitt 2017-04-20 17:08:16 -07:00
  • 4f5553954c Merge branch 'master' into qa Noah Levitt 2017-04-19 08:58:47 -07:00
  • b3fa7a4e39 quote that shell meta character Noah Levitt 2017-04-18 18:46:59 -07:00
  • 426916a238 need warcprox in python path for travis tests now Noah Levitt 2017-04-18 18:10:18 -07:00
  • d8904dc9e7 Merge branch 'master' into qa Noah Levitt 2017-04-18 17:54:21 -07:00
  • 8256a34b4f implement resilience to warcprox outage, i.e. deal with brozzler.ProxyError in brozzler-worker Noah Levitt 2017-04-18 17:54:12 -07:00
  • 5603ff5380 have _warcprox_write_record also raise ProxyError when appropriate, and test this Noah Levitt 2017-04-18 16:58:51 -07:00