1361 commits

Author SHA1 Message Date
Noah Levitt
d04a3f4f2b Merge branch 'master' into qa
* master:
  disable the re-claiming of sites that are marked claimed from more than an hour ago, because sometimes pages legitimately take longer than an hour to brozzle; working on a better solution to this issue
2017-06-19 11:21:10 -07:00
Noah Levitt
6bae53e646 disable the re-claiming of sites that are marked claimed from more than an hour ago, because sometimes pages legitimately take longer than an hour to brozzle; working on a better solution to this issue 2017-06-19 11:21:02 -07:00
Barbara Miller
82b77b6903 WIP 2017-06-16 10:10:42 -07:00
Barbara Miller
20cc10efb1 Merge branch 'ARI-5379' into qa 2017-06-13 18:24:55 -07:00
Barbara Miller
c2c246bb57 custom behavior for pm.gc.ca 2017-06-13 18:24:30 -07:00
Noah Levitt
a6f7d1bc14 Merge branch 'master' into qa
* master:
  bump version number for pull request
2017-06-12 15:42:59 -07:00
Noah Levitt
7ae22381ef bump version number for pull request 2017-06-12 15:42:49 -07:00
Noah Levitt
33508af58f Merge pull request #44 from galgeek/ARI-5384
simpleclicks for recent issuu.com URLs
2017-06-12 15:42:04 -07:00
Barbara Miller
03a6c64970 Merge branch 'ARI-5384' into qa 2017-06-09 14:09:02 -07:00
Barbara Miller
626220ce86 simpleclicks for recent issuu.com URLs 2017-06-09 14:08:34 -07:00
Noah Levitt
193ac43797 back to dev version number 2017-06-08 17:33:29 -07:00
Noah Levitt
44f74066cf 1.1b11 2017-06-08 17:30:24 -07:00
Noah Levitt
27040fd8b7 mini fix 2017-06-08 17:29:51 -07:00
Noah Levitt
6edfbddd64 Merge branch 'master' into qa
* master:
  oops bump version
2017-06-07 13:08:31 -07:00
Noah Levitt
02e1c88fac oops bump version 2017-06-07 13:08:23 -07:00
Noah Levitt
da6e07fb61 Merge branch 'master' into qa
* master:
  use %r instead of calling repr()
2017-06-07 13:07:51 -07:00
Noah Levitt
4d7f4518b5 use %r instead of calling repr() 2017-06-07 13:07:42 -07:00
Noah Levitt
99c8ebcc8b Merge branch 'master' into qa
* master:
  oops, should have bumped version number after merging pull requests
  add relocated behavior file with updated copyright
  enable huffpostslides.js
2017-06-07 08:52:12 -07:00
Noah Levitt
65adc11d95 oops, should have bumped version number after merging pull requests 2017-06-07 08:51:21 -07:00
Noah Levitt
39fb811d13 Merge pull request #41 from galgeek/ARI-4868
ARI-4868 behavior for Huffington Post slideshow
2017-06-02 14:41:02 -07:00
Noah Levitt
5e38a9755e Merge pull request #42 from galgeek/loginAndReloadSeed
login and reload original url if navigated away
2017-06-02 14:03:51 -07:00
Barbara Miller
d41f30cbc7 Merge branch 'loginAndReloadSeed' into qa 2017-06-02 13:40:36 -07:00
Barbara Miller
a0330d9716 updates per Noah's review 2017-06-02 13:27:01 -07:00
Barbara Miller
830b0eef89 undo post-login nav (ARI-5385 and/or ARI-5386) 2017-06-02 12:47:19 -07:00
Noah Levitt
f2227e6759 have travis-ci test against python 3.5 and 3.6 too 2017-05-26 13:28:00 -07:00
Noah Levitt
bdc0badec3 rewrite frontier.scope_and_schedule_outlinks() to use batch rethinkdb queries, because we have witnessed the method running for hours(!) 2017-05-26 13:24:14 -07:00
Noah Levitt
0a19770ba7 Merge branch 'master' into qa
* master:
  remove stray logging
2017-05-24 11:36:15 -07:00
Noah Levitt
d904daea9c remove stray logging 2017-05-24 11:36:06 -07:00
Noah Levitt
f0b9020c0a Merge branch 'master' into qa
* master:
  use "ttl" for updated doublethink svc reg api
2017-05-23 11:35:19 -07:00
Noah Levitt
ac543ee5b6 use "ttl" for updated doublethink svc reg api 2017-05-23 11:33:04 -07:00
Barbara Miller
079db762d4 add relocated behavior file with updated copyright 2017-05-22 12:38:50 -07:00
Barbara Miller
d7c31be8d0 enable huffpostslides.js 2017-05-22 12:32:28 -07:00
Noah Levitt
9c8f626c38 Merge branch 'master' into qa
* master:
  fix exception from ReachedLimit.__repr__ when it has been instantiated implicitly and __init__ was not called
  improve thread_raise() so that the new tests pass
  even more, better failing tests for thread_raise
  failing test for forthcoming behavior of thread_raise
2017-05-16 15:47:27 -07:00
Noah Levitt
89e7c8b079 fix exception from ReachedLimit.__repr__ when it has been instantiated implicitly and __init__ was not called 2017-05-16 15:47:18 -07:00
Noah Levitt
31dc6a2d97 improve thread_raise() so that the new tests pass
1. If thread is not currently accepting exceptions, queue it and raise if and
   when it does start accepting them. This fixes problem of thread_raise
   exceptions being ignored when raised just before the target thread starts
   accepting exceptions.
2. Avoid problems caused by raising multiple exceptions in the same
   thread in quick succession by ensuring that only one is actually raised for
   a given `with` block. This type of occurrence had been putting brozzler into
   a borked/frozen state.
2017-05-16 14:20:53 -07:00
Noah Levitt
d514eaec15 even more, better failing tests for thread_raise 2017-05-16 14:00:10 -07:00
Noah Levitt
d2525e2e87 failing test for forthcoming behavior of thread_raise 2017-05-15 16:20:20 -07:00
Noah Levitt
e5371ef0b0 Merge branch 'master' into qa
* master:
  recognize ConnectionError (of which ConnectionResetError is a subclass) in _warcprox_write_record as a proxy error
2017-05-12 10:04:06 -07:00
Noah Levitt
60c5a7c1c4 recognize ConnectionError (of which ConnectionResetError is a subclass) in _warcprox_write_record as a proxy error 2017-05-12 10:03:53 -07:00
Barbara Miller
b2a4fbb17f Merge branch 'ARI-4868' into qa 2017-05-10 15:09:00 -07:00
Barbara Miller
35977f6276 enable huffpostslides.js 2017-05-10 15:08:29 -07:00
Barbara Miller
054625b8a5 Merge pull request #40 from BitBaron/ari-4960
Crawl Google Calendar for fortstjames.ca
2017-05-09 14:12:48 -07:00
Noah Levitt
0c45ca2211 Merge branch 'master' into qa
* master:
  do a better job of making sure to shut down the browser when brozzle-page is killed
2017-05-03 16:43:38 -07:00
Noah Levitt
b4bf17df9b do a better job of making sure to shut down the browser when brozzle-page is killed 2017-05-03 16:43:31 -07:00
Noah Levitt
c3637ecb35 Merge branch 'master' into qa
* master:
  handle another rethinkdb outage corner case
2017-05-01 14:12:51 -07:00
Noah Levitt
9d4cbbf6eb handle another rethinkdb outage corner case 2017-05-01 14:12:43 -07:00
Noah Levitt
15a3da61c6 Merge branch 'master' into qa
* master:
  BrozzlerWorkerThread separate from MainThread to avoid SIGTERM/SIGINT raising exception inside of some rethinkdb code or other sensitive code in that BrozzlerWorker.run() calls
2017-05-01 13:46:28 -07:00
Noah Levitt
389db01458 BrozzlerWorkerThread separate from MainThread to avoid SIGTERM/SIGINT raising exception inside of some rethinkdb code or other sensitive code in that BrozzlerWorker.run() calls 2017-05-01 13:46:19 -07:00
Noah Levitt
69d8571871 Merge branch 'master' into qa
* master:
  re-claim sites after 1 hour instead of 2 so that sites don't have to wait as long to be brozzled again in case of kill -9 brozzler-worker
  add a github PR template for this repo
  update headless chrome instructions for regular chrome builds
  use the new api `with brozzler.thread_accept_exceptions()`
  refactor thread_raise safety to use a context manager
  allow this stupid test to fail
  improve messaging when brozzler-stop-crawl is passed nonexistent seed/job id
  safen up brozzler.thread_raise() to avoid interrupting rethinkdb transactions and such
2017-05-01 13:00:34 -07:00
Noah Levitt
52433ade78 re-claim sites after 1 hour instead of 2 so that sites don't have to wait as long to be brozzled again in case of kill -9 brozzler-worker 2017-05-01 13:00:04 -07:00