Commit Graph

97 Commits

Author SHA1 Message Date
Richard van der Hoff
9255a6cb17 Improve exception handling for background processes
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.

This is unsatisfactory for a number of reasons:
 - logging on garbage collection is best-effort and may happen some time after
   the error, if at all
 - it can be hard to figure out where the error actually happened.
 - it is logged as a scary CRITICAL error which (a) I always forget to grep for
   and (b) it's not really CRITICAL if a background process we don't care about
   fails.

So this is an attempt to add exception handling to everything we fire off into
the background.
2018-04-27 11:07:40 +01:00
Erik Johnston
f67e906e18 Set all metrics at the same time 2018-04-12 11:18:19 +01:00
Erik Johnston
4dae4a97ed Track last processed event received_ts 2018-04-11 14:27:09 +01:00
Erik Johnston
92e34615c5 Track where event stream processing have gotten up to 2018-04-11 12:13:40 +01:00
Erik Johnston
a060dfa132 Use run_in_background instead 2018-04-10 14:25:11 +01:00
Erik Johnston
1246d23710 Preserve log contexts correctly 2018-04-10 12:04:32 +01:00
Erik Johnston
d49cbf712f Log event ID on exception 2018-04-10 12:03:41 +01:00
Erik Johnston
6e025a97b4 Handle all events in a room correctly 2018-04-09 16:02:48 +01:00
Erik Johnston
11974f3787 Send federation events concurrently 2018-04-09 11:47:10 +01:00
Erik Johnston
145d14656b Handle exceptions in get_hosts_for_room when sending events over federation 2018-04-09 11:47:01 +01:00
Matthew Hodgson
ab9f844aaf
Add federation_domain_whitelist option (#2820)
Add federation_domain_whitelist

gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
2018-01-22 19:11:18 +01:00
Richard van der Hoff
a027c2af8d Metrics for events processed in appservice and fed sender
More metrics I wished I'd had
2018-01-15 18:23:24 +00:00
Richard van der Hoff
d4fb4f7c52 Clear logcontext before starting fed txn queue runner
These processes take a long time compared to the request, so there is lots of
"Entering|Restoring dead context" in the logs. Let's try to shut it up a bit.
2017-11-28 15:26:14 +00:00
Richard van der Hoff
01bbacf3c4 Fix up logcontext handling in (federation) TransactionQueue
Avoid using preserve_context_over_function, which has problems with respect to
logcontexts.
2017-10-06 22:39:25 +01:00
Erik Johnston
6e2a7ee1bc Remove spurious log lines 2017-06-07 11:05:17 +01:00
Erik Johnston
dfbda5e025 Faster cache for get_joined_hosts 2017-05-25 17:24:44 +01:00
Erik Johnston
ec5c4499f4 Make presence use cached users/hosts in room 2017-05-16 16:01:43 +01:00
Erik Johnston
7166854f41 Add cache for get_current_hosts_in_room 2017-05-02 10:36:35 +01:00
Erik Johnston
247c736b9b Merge pull request #2115 from matrix-org/erikj/dedupe_federation_repl
Reduce federation replication traffic
2017-04-12 11:07:13 +01:00
Erik Johnston
1745069543 Comment 2017-04-12 10:17:10 +01:00
Erik Johnston
c7ddb5ef7a Reuse get_interested_parties 2017-04-12 10:16:26 +01:00
Paul "LeoNerd" Evans
11dbceb761 Add a counter metric for successfully-sent transactions 2017-04-11 17:16:12 +01:00
Erik Johnston
a8c8e4efd4 Comment 2017-04-11 15:35:49 +01:00
Erik Johnston
6308ac45b0 Move get_interested_remotes back to presence handler 2017-04-11 15:19:26 +01:00
Erik Johnston
b9b72bc6e2 Comments 2017-04-11 15:15:34 +01:00
Erik Johnston
29574fd5b3 Reduce federation presence replication traffic
This is mainly done by moving the calculation of where to send presence
updates from the presence handler to the transaction queue, so we only
need to send the presence event (and not the destinations) across the
replication connection. Before we were duplicating by sending the full
state across once per destination.
2017-04-10 16:48:30 +01:00
Erik Johnston
85be3dde81 Bail early if remote wouldn't be retried (#2064)
* Bail early if remote wouldn't be retried

* Don't always return true

* Just use get_retry_limiter

* Spelling
2017-03-29 11:48:27 +01:00
Erik Johnston
2a28b79e04 Batch sending of device list pokes 2017-03-24 14:44:49 +00:00
Richard van der Hoff
4bd597d9fc push federation retry limiter down to matrixfederationclient
rather than having to instrument everywhere we make a federation call,
make the MatrixFederationHttpClient manage the retry limiter.
2017-03-23 09:28:46 +00:00
Richard van der Hoff
29ed09e80a Fix assertion to stop transaction queue getting wedged
... and update some docstrings to correctly reflect the types being used.

get_new_device_msgs_for_remote can return a long under some circumstances,
which was being stored in last_device_list_stream_id_by_dest, and was then
upsetting things on the next loop.
2017-03-15 12:16:55 +00:00
Richard van der Hoff
0c4cf9372b Fix a race in transaction queue
It was theoretically possible for a PDU to get queued and not sent for ages. On
closer inspection I think there were bigger problems elsewhere, but we might as
well fix this since it's easy.
2017-02-20 16:46:25 +00:00
Erik Johnston
df4ecff5a9 Correctly raise exceptions for ratelimitng. Ratelimit on 401 2017-02-01 15:42:19 +00:00
Erik Johnston
ae7a132f38 Better handle 404 response for federation /send/ 2017-01-31 13:40:09 +00:00
Erik Johnston
51e9fe36e4 Fix up sending of m.device_list_update edus 2017-01-25 16:55:21 +00:00
Erik Johnston
2367c5568c Add basic implementation of local device list changes 2017-01-25 14:27:27 +00:00
Erik Johnston
f878f64f43 Lower the not retrying host log line to debug 2017-01-17 17:20:39 +00:00
Mark Haines
f784980d2b Only send events that originate on this server.
Or events that are sent via the federation "send_join" API.

This should match the behaviour from before v0.18.5 and #1635 landed.
2017-01-05 11:26:30 +00:00
Mark Haines
e02bdaf08b Get the destinations from the state from before the event
Rather than the state after then event.
2017-01-04 15:17:15 +00:00
Mark Haines
b6b67715ed Send ALL membership events to the server that was affected.
Send all membership changes to the server that was affected.
This ensures that if the last member of a room on a server
was kicked or banned they get told about it.
2017-01-04 13:56:20 +00:00
Erik Johnston
aaecffba3a Correctly handle 500's and 429 on federation 2016-11-24 15:04:49 +00:00
Erik Johnston
50934ce460 Comments 2016-11-21 16:55:23 +00:00
Erik Johnston
9687e039e7 Remove explicit calls to send_pdu 2016-11-21 14:48:51 +00:00
Erik Johnston
524d61bf7e Fix tests 2016-11-21 11:53:02 +00:00
Erik Johnston
7c9cdb2245 Store federation stream positions in the database 2016-11-21 11:33:08 +00:00
Erik Johnston
f8ee66250a Handle sending events and device messages over federation 2016-11-17 15:48:04 +00:00
Erik Johnston
59ef517e6b Use new federation_sender DI 2016-11-16 14:47:52 +00:00
Erik Johnston
847d5db1d1 Add transaction queue and transport layer to DI 2016-11-16 14:47:52 +00:00
Erik Johnston
daec6fc355 Move logic into transaction_queue 2016-11-16 14:47:52 +00:00
Erik Johnston
0e830d3770 Rename transaction queue functions to send_* 2016-11-16 14:47:52 +00:00
Erik Johnston
af4701b311 Fix incorrect attribute name 2016-09-09 17:36:56 +01:00