Commit Graph

56 Commits

Author SHA1 Message Date
Patrick Cloke
77dfc1f939
Fix a bug where servers could be marked as up when they were failing (#16506)
After this change a server will only be reported as back online
if they were previously having requests fail.
2023-10-17 07:32:40 -04:00
Erik Johnston
d35bed8369
Don't wake up destination transaction queue if they're not due for retry. (#16223) 2023-09-04 17:14:09 +01:00
Erik Johnston
f84baecb6f
Don't reset retry timers on "valid" error codes (#16221) 2023-09-04 14:04:43 +01:00
Mathieu Velten
f0a860908b
Allow config of the backoff algorithm for the federation client. (#15754)
Adds three new configuration variables:

* destination_min_retry_interval is identical to before (10mn).
* destination_retry_multiplier is now 2 instead of 5, the maximum value will
  be reached slower.
* destination_max_retry_interval is one day instead of (essentially) infinity.

Capping this will cause destinations to continue to be retried sometimes instead
of being lost forever. The previous value was 2 ^ 62 milliseconds.
2023-08-03 14:36:55 -04:00
Eric Eastwood
40fa8294e3
Refactor MSC3030 /timestamp_to_event to move away from our snowflake pull from destination pattern (#14096)
1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
 2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
 3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
2022-10-26 16:10:55 -05:00
Sean Quah
2be5a2b07b
Fix RetryDestinationLimiter re-starting finished log contexts (#12803)
Signed-off-by: Sean Quah <seanq@matrix.org>
2022-05-19 20:17:10 +01:00
Erik Johnston
8dd3e0e084
Immediately retry any requests that have backed off when a server comes back online. (#12500)
Otherwise it can take up to a minute for any in-flight `/send` requests to be retried.
2022-05-10 10:39:54 +01:00
David Robertson
a2b00a4486
Bump black and click versions (#12320) 2022-03-29 10:41:19 +00:00
reivilibre
524b8ead77
Add types to synapse.util. (#10601) 2021-09-10 17:03:18 +01:00
Erik Johnston
3e831f24ff
Don't hammer the database for destination retry timings every ~5mins (#10036) 2021-05-21 17:57:08 +01:00
Jonathan de Jong
4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Dan Callahan
aff1eb7c67
Tell Black to format code for Python 3.5 (#8664)
This allows trailing commas in multi-line arg lists.

Minor, but we might as well keep our formatting current with regard to
our minimum supported Python version.

Signed-off-by: Dan Callahan <danc@element.io>
2020-10-27 23:26:36 +00:00
Patrick Cloke
8a4a4186de
Simplify super() calls to Python 3 syntax. (#8344)
This converts calls like super(Foo, self) -> super().

Generated with:

    sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
2020-09-18 09:56:44 -04:00
Patrick Cloke
c619253db8
Stop sub-classing object (#8249) 2020-09-04 06:54:56 -04:00
Patrick Cloke
fe6cfc80ec
Convert some util functions to async (#8035) 2020-08-06 08:39:35 -04:00
Patrick Cloke
38e1fac886
Fix some spelling mistakes / typos. (#7811) 2020-07-09 09:52:58 -04:00
Erik Johnston
f44f1d2e83 Fix errors storing large retry intervals.
We have set the max retry interval to a value larger than a postgres or
sqlite int can hold, which caused exceptions when updating the
destinations table.

To fix postgres we need to change the column to a bigint, and for sqlite
we lower the max interval to 2**62 (which is still incredibly long).
2019-10-02 10:36:27 +01:00
Richard van der Hoff
1e19ce00bf
Add 'failure_ts' column to 'destinations' table (#6016)
Track the time that a server started failing at, for general analysis purposes.
2019-09-17 11:41:54 +01:00
Richard van der Hoff
3d882a7ba5
Remove the cap on federation retry interval. (#6026)
Essentially the intention here is to end up blacklisting servers which never
respond to federation requests.

Fixes https://github.com/matrix-org/synapse/issues/5113.
2019-09-12 13:00:13 +01:00
Richard van der Hoff
0388beafe4
Fix bug in calculating the federation retry backoff period (#6025)
This was intended to introduce an element of jitter; instead it gave you a
30/60 chance of resetting to zero.
2019-09-12 12:59:43 +01:00
Richard van der Hoff
7902bf1e1d
Clean up some code in the retry logic (#6017)
* remove some unused code
* make things which were constants into constants for efficiency and clarity
2019-09-11 15:14:56 +01:00
Amber Brown
4806651744
Replace returnValue with return (#5736) 2019-07-23 23:00:55 +10:00
Amber Brown
463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ (#5606) 2019-07-04 00:07:04 +10:00
Richard van der Hoff
aa530e6800
Call RetryLimiter correctly (#5340)
Fixes a regression introduced in #5335.
2019-06-04 22:02:53 +01:00
Richard van der Hoff
dce6e9e0c1 Avoid rapidly backing-off a server if we ignore the retry interval 2019-06-03 23:58:42 +01:00
Richard van der Hoff
642199570c
Improve the logging when handling a federation transaction (#3904)
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:

- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
  them into other, more cryptic, exceptions.
2018-09-19 17:28:18 +01:00
Amber Brown
49af402019 run isort 2018-07-09 16:09:20 +10:00
Richard van der Hoff
2a13af23bc Use run_in_background in preference to preserve_fn
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Matthew Hodgson
ab9f844aaf
Add federation_domain_whitelist option (#2820)
Add federation_domain_whitelist

gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
2018-01-22 19:11:18 +01:00
Richard van der Hoff
eaaabc6c4f replace 'except:' with 'except Exception:'
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Richard van der Hoff
9397edb28b Merge pull request #2050 from matrix-org/rav/federation_backoff
push federation retry limiter down to matrixfederationclient
2017-03-23 22:27:01 +00:00
Richard van der Hoff
5a16cb4bf0 Ignore backoff history for invites, aliases, and roomdirs
Add a param to the federation client which lets us ignore historical backoff
data for federation queries, and set it for a handful of operations.
2017-03-23 12:23:22 +00:00
Richard van der Hoff
4bd597d9fc push federation retry limiter down to matrixfederationclient
rather than having to instrument everywhere we make a federation call,
make the MatrixFederationHttpClient manage the retry limiter.
2017-03-23 09:28:46 +00:00
Richard van der Hoff
19b9366d73 Fix a couple of logcontext leaks
Use preserve_fn to correctly manage the logcontexts around things we don't want
to yield on.
2017-03-23 00:17:46 +00:00
Erik Johnston
df4ecff5a9 Correctly raise exceptions for ratelimitng. Ratelimit on 401 2017-02-01 15:42:19 +00:00
Erik Johnston
fe08db2713 Remove explicit < 400 check as apparently this is confusing 2017-01-31 15:21:32 +00:00
Erik Johnston
4c0ec15bdc Comment 2017-01-31 13:53:46 +00:00
Erik Johnston
85c590105f Comment 2017-01-31 13:46:38 +00:00
Erik Johnston
ae7a132f38 Better handle 404 response for federation /send/ 2017-01-31 13:40:09 +00:00
Erik Johnston
11bfe438a2 Use correct var 2016-11-24 15:26:53 +00:00
Erik Johnston
aaecffba3a Correctly handle 500's and 429 on federation 2016-11-24 15:04:49 +00:00
Erik Johnston
90565d015e Invalidate retry cache in both directions 2016-11-22 17:45:44 +00:00
Mark Haines
bf81e38d36 Fix retry utils to check if the exception is a subclass of CME 2016-07-28 10:47:02 +01:00
Matthew Hodgson
6c28ac260c copyrights 2016-01-07 04:26:29 +00:00
Erik Johnston
eacb068ac2 Retry dead servers a lot less often 2015-11-02 16:56:30 +00:00
Erik Johnston
9236136f3a Make work in both Maria and SQLite. Fix tests 2015-04-01 14:12:33 +01:00
Erik Johnston
cc3d3babb0 Remove unused import 2015-02-18 12:01:41 +00:00
Erik Johnston
36e144091b Remove spurious comma. Remove temp run_on_reactor 2015-02-18 11:25:20 +00:00
Erik Johnston
b17bd31da0 Temporarily add a run_on_reactor() call 2015-02-18 11:17:26 +00:00
Erik Johnston
859fbd4423 s/self._clock/self.clock/ 2015-02-18 10:39:14 +00:00