Commit Graph

  • f90d05075e Merge branch 'ari-4960' into qa Neil Minton 2017-04-18 15:24:14 -07:00
  • f541dce5c3 Crawl Google Calendar for fortstjames.ca Neil Minton 2017-04-18 15:22:33 -07:00
  • d05474173a Merge branch 'master' into qa Noah Levitt 2017-04-18 12:00:33 -07:00
  • ac972d399f fix robots.txt proxy down test by setting site.id (cached robots is stored by site.id, and other tests that ran earlier with no site.id were interfering); and test another kind of connection error, for whatever that's worth Noah Levitt 2017-04-18 12:00:23 -07:00
  • c1469d7793 narrow the tests further to speed up debugging further (travis-debugging) Noah Levitt 2017-04-18 10:26:10 -07:00
  • 964496aa7a narrow the tests to speed up debugging Noah Levitt 2017-04-18 10:25:06 -07:00
  • 0880ab615b debugging travis-ci test failure Noah Levitt 2017-04-18 10:23:45 -07:00
  • 6844cb5bcb Merge branch 'master' into qa Noah Levitt 2017-04-17 18:15:32 -07:00
  • dc43794363 raise brozzler.ProxyError in case of proxy error fetching robots.txt, doing youtube-dl, or doing raw fetch Noah Levitt 2017-04-17 18:15:22 -07:00
  • 349b41ab32 raise new exception brozzler.ProxyError in case of proxy error browsing a page Noah Levitt 2017-04-17 18:14:02 -07:00
  • 87a7301f4d make brozzle-page respect --proxy (no test for this!) Noah Levitt 2017-04-17 18:11:09 -07:00
  • 0e90950de2 oops, version bump for previous commit Noah Levitt 2017-04-17 18:10:56 -07:00
  • 0884b4cd56 bubble up proxy errors fetching robots.txt, with unit test, and documentation Noah Levitt 2017-04-17 16:47:05 -07:00
  • 929f046ebb Merge branch 'master' into qa Noah Levitt 2017-04-14 18:06:24 -07:00
  • df7734f2ca new command line utility brozzler-stop-crawl, with tests Noah Levitt 2017-04-14 18:06:15 -07:00
  • 11279e001b Merge branch 'ARI-5259' into qa Barbara Miller 2017-04-14 15:20:11 -07:00
  • 72e9d8da58 blog.sin.com.cn pagination Barbara Miller 2017-04-14 13:18:47 -07:00
  • a768b07a65 Merge branch 'master' into qa Noah Levitt 2017-04-14 11:46:39 -07:00
  • fae60e9960 parameterize command line entry points and add tests of --version, a rudimentary check that the commands at least run Noah Levitt 2017-04-14 11:46:26 -07:00
  • b7731bdc75 Merge branch 'master' into qa Noah Levitt 2017-04-05 17:02:09 -07:00
  • b3cf746f53 stupid version number bump Noah Levitt 2017-04-05 17:01:52 -07:00
  • 62917a6f1a Revert "bump version number for last pull request" Noah Levitt 2017-04-05 17:01:06 -07:00
  • 40022ad0c2 Merge branch 'master' into qa Noah Levitt 2017-04-05 16:15:33 -07:00
  • d192fc269e bump version number for last pull request Noah Levitt 2017-04-05 16:15:24 -07:00
  • 537eb1cf7f Merge pull request #34 from galgeek/ARI-5193 Barbara Miller 2017-04-05 16:13:57 -07:00
  • 6535922fa6 Merge branch 'master' into qa Noah Levitt 2017-04-05 12:09:58 -07:00
  • 5bcd10c228 extract area/@href links, and add test for outlink extraction Noah Levitt 2017-04-05 12:09:48 -07:00
  • 010c2869dd Merge branch 'ARI-5193' into qa Barbara Miller 2017-04-04 15:52:50 -07:00
  • 847b68eaf4 add JIRA info Barbara Miller 2017-04-04 15:51:55 -07:00
  • bdf4c2db42 Merge branch 'ARI-5193' into qa Barbara Miller 2017-03-31 15:49:24 -07:00
  • 901321199c mouseover for ky.gov sites Barbara Miller 2017-03-31 15:48:01 -07:00
  • db8c1d36fa Merge branch 'master' into qa Noah Levitt 2017-03-30 17:53:45 -07:00
  • d4d3ef4fd3 ugh fix version number Noah Levitt 2017-03-30 17:53:36 -07:00
  • 082f10d327 Merge branch 'master' into qa Noah Levitt 2017-03-29 18:49:15 -07:00
  • 125d77b8c4 consolidate job.py and site.py into model.py, and let Job and Site share the elapsed() method by way of a mixin Noah Levitt 2017-03-29 18:49:04 -07:00
  • a83c11b302 Merge branch 'master' into qa Noah Levitt 2017-03-27 12:16:11 -07:00
  • 3d47805ec1 new model for crawling hashtags, each one is no longer a top-level page Noah Levitt 2017-03-27 12:15:49 -07:00
  • a836269e95 remove some vestiges of old proxy stuff Noah Levitt 2017-03-24 16:04:43 -07:00
  • d373611061 Merge branch 'master' into qa Noah Levitt 2017-03-24 15:45:48 -07:00
  • a826fdc7ef new test of frontier.seed_page Noah Levitt 2017-03-24 15:45:40 -07:00
  • ec3472ce61 Merge branch 'master' into qa Noah Levitt 2017-03-24 22:28:20 +00:00
  • 0e35de43b6 actually respect --proxy and --warcprox-auto options to brozzler-worker Noah Levitt 2017-03-24 22:27:52 +00:00
  • fb2d760306 Merge branch 'master' into qa Noah Levitt 2017-03-24 14:38:13 -07:00
  • 934190084c Refactor the way the proxy is configured. Job/site settings "proxy" and "enable_warcprox_features" are gone. Brozzler-worker now has mutually exclusive options --proxy and --warcprox-auto. --warcprox-auto means find an instance of warcprox in the service registry, and enable warcprox features. If --proxy is provided, brozzler-worker determines whether the proxy is warcprox by consulting http://{proxy_address}/status (see https://github.com/internetarchive/warcprox/commit/8caae0d7d3), and enables warcprox features if so. Noah Levitt 2017-03-24 13:55:23 -07:00
  • bc2d4d5cba Merge branch 'master' into qa Noah Levitt 2017-03-22 16:12:50 -07:00
  • 9a2f181eb6 back to a dev version number Noah Levitt 2017-03-22 16:12:39 -07:00
  • 613dca29dc 1.1b10 since 1.1b9 has bugs :( (tag: 1.1b10) Noah Levitt 2017-03-22 16:11:26 -07:00
  • 06ef045e63 Merge branch 'master' into qa Noah Levitt 2017-03-22 15:54:07 -07:00
  • 4ba25db684 ugh, avoid infinite recursion Noah Levitt 2017-03-22 15:53:58 -07:00
  • 34bb64297f fix frontier tests now that enable_warcprox_features is simply omitted by default Noah Levitt 2017-03-22 15:46:12 -07:00
  • 4aa611af52 i dub thee 1.1b9 (tag: 1.1b9) Noah Levitt 2017-03-22 15:25:55 -07:00
  • b63badea53 github didn't like that, how about a width in pixels Noah Levitt 2017-03-22 15:23:47 -07:00
  • 2e6fe9ccc0 maybe pypi supports RST image "scale" Noah Levitt 2017-03-22 15:20:35 -07:00
  • 8f7a820b05 Merge branch 'master' into qa Noah Levitt 2017-03-22 15:15:17 -07:00
  • aae810cc6e fix brozzler-easy so that warcprox features are enabled automatically (feature was already there but broken) Noah Levitt 2017-03-22 15:15:07 -07:00
  • 2fda63ed9d Merge branch 'master' into qa Noah Levitt 2017-03-21 13:08:24 -07:00
  • 603956ec41 restore accidentally deleted line of code Noah Levitt 2017-03-21 13:08:18 -07:00
  • a7a880ba97 Merge branch 'master' into qa Noah Levitt 2017-03-21 11:11:05 -07:00
  • 95ba334b89 initialize page.videos correctly in all cases Noah Levitt 2017-03-21 11:10:57 -07:00
  • a334ff5e69 Merge branch 'master' into qa Noah Levitt 2017-03-20 17:28:24 -07:00
  • eeee523b18 three-value "brozzled" parameter for frontier.site_pages(); fix thing where every Site got a list of all the seeds from the job; and some more frontier tests to catch these kinds of things Noah Levitt 2017-03-20 17:28:16 -07:00
  • 4e55dea519 Merge branch 'master' into qa Noah Levitt 2017-03-20 12:33:59 -07:00
  • 0e9f4a0c26 forgot to add the new test data Noah Levitt 2017-03-20 12:33:52 -07:00
  • 14373b40a4 Merge branch 'master' into qa Noah Levitt 2017-03-20 12:14:17 -07:00
  • e9c7606318 oops remove pdb call Noah Levitt 2017-03-20 12:14:11 -07:00
  • a1ef257474 Merge branch 'master' into qa Noah Levitt 2017-03-20 11:49:20 -07:00
  • 13130bd9d9 save info about embedded videos in page document in rethinkdb Noah Levitt 2017-03-20 11:49:11 -07:00
  • 6f41c70892 Merge branch 'master' into qa Noah Levitt 2017-03-17 11:14:51 -07:00
  • 94ba56dca5 actually implement the brozzler-list-jobs --job option Noah Levitt 2017-03-17 11:14:45 -07:00
  • 775bfb123f Merge branch 'master' into qa Noah Levitt 2017-03-17 10:04:18 -07:00
  • 0685c77d01 always save outlinks info on rethinkdb page object, get rid of 'remember_outlinks' option, to keep config simple, and because it's not a very expensive thing Noah Levitt 2017-03-17 10:04:10 -07:00
  • 3d1c5f8b2b Merge branch 'master' into qa Noah Levitt 2017-03-16 13:01:48 -07:00
  • 701f7654a8 make brozzler-list-* a little more intuitive, maybe Noah Levitt 2017-03-16 13:01:41 -07:00
  • ff7f1d207c Merge branch 'master' into qa Noah Levitt 2017-03-16 12:12:41 -07:00
  • 6c81b40e28 if parent page has a redirect_url, check scope rules both with the parent_page original url and with the redirect url, with automated tests Noah Levitt 2017-03-16 12:12:33 -07:00
  • 2aacf01950 Merge branch 'master' into qa Noah Levitt 2017-03-15 17:08:37 -07:00
  • 0021a9d5f0 add the new urlcanon.MatchRule conditions to job_schema.yaml Noah Levitt 2017-03-15 17:08:27 -07:00
  • 63474c09f2 Merge branch 'master' into qa Noah Levitt 2017-03-15 15:00:01 -07:00
  • 12fb9eaa15 use urlcanon library for canonicalization, surtification, scope match rules Noah Levitt 2017-03-15 14:59:51 -07:00
  • 479f0f7e09 more automated tests of frontier stuff Noah Levitt 2017-03-15 14:54:16 -07:00
  • 6526c40bb8 Merge branch 'master' into qa Noah Levitt 2017-03-08 17:34:34 -08:00
  • 9e1e002a71 turns out we want populate_defaults to happen in __init__, fix so things work right Noah Levitt 2017-03-07 17:52:38 -08:00
  • 335b67f42b Merge branch 'master' into qa Noah Levitt 2017-03-07 13:20:10 -08:00
  • 01653c01d7 use updated doublethink library populate_defaults() to avoid problem where under certain circumstances field values from the database would be overwritten by defaults Noah Levitt 2017-03-07 13:19:56 -08:00
  • 59316a38f8 Merge branch 'master' into qa Noah Levitt 2017-03-06 15:13:48 -08:00
  • 242ff51ec7 fix bug with seed redirects where scope change was applied too late to affect scoping of outlinks from the seed (with automated tests) Noah Levitt 2017-03-06 15:13:40 -08:00
  • d55dd0b26f Merge branch 'master' into qa Noah Levitt 2017-03-02 16:53:31 -08:00
  • 40bbbb3524 add tests of backwards compatibility handling of start/stop times and fix a bug or two Noah Levitt 2017-03-02 16:53:24 -08:00
  • c434bbf3da Merge branch 'master' into qa Noah Levitt 2017-03-02 12:48:58 -08:00
  • 569af05b11 rethinkstuff is now "doublethink" Noah Levitt 2017-03-02 12:48:45 -08:00
  • 95f362d49a Merge branch 'master' into qa Noah Levitt 2017-02-28 16:12:58 -08:00
  • 700b08b7d7 use new rethinkstuff ORM Noah Levitt 2017-02-28 16:12:50 -08:00
  • 6db69de2e5 Merge branch 'master' into qa Noah Levitt 2017-02-24 11:15:34 -08:00
  • a1f1681cad fix issue where use of YoutubeDLSpy caused youtube-dl connections to remote servers to be kept open Noah Levitt 2017-02-24 11:15:17 -08:00
  • 2b41cebfc1 Merge branch 'master' into qa Noah Levitt 2017-02-23 10:47:10 -08:00
  • b4f19e2594 fix typo Noah Levitt 2017-02-23 10:47:04 -08:00
  • b496bce320 Merge branch 'master' into qa Noah Levitt 2017-02-23 10:43:15 -08:00
  • 7417310d57 more pywb monkey-patching to get at least some youtube videos captured by brozzler to play back Noah Levitt 2017-02-23 10:43:07 -08:00
  • cb75bb6e04 Merge branch 'master' into qa Noah Levitt 2017-02-22 12:44:27 -08:00
  • 2398031010 let the OS pick an available port, to avoid what appear to be timing issues causing multiple browsers to choose the same port Noah Levitt 2017-02-22 12:44:19 -08:00
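
The oldest commit listed above (2398031010) describes letting the OS pick an available port instead of choosing one in application code. The usual way to do that is to bind to port 0. The sketch below is only a minimal illustration of that general technique and is not taken from the brozzler source; the function name `pick_free_port` and the default host are assumptions made for the example.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port (hypothetical helper).

    Binding to port 0 tells the kernel to assign any free port; the
    assigned number is then read back with getsockname().
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    port = pick_free_port()
    print("OS-assigned port:", port)
```

Note that the socket is closed before the port number is returned, so another process could in principle claim the port before it is reused. Where the program opens its own listening socket, keeping the bound socket open and using it directly avoids that window.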