Commit Graph

102 Commits

Author SHA1 Message Date
Shay
a86b2f6837
Fix a bug where redactions were not being sent over federation if we did not have the original event. (#13813) 2022-10-11 11:18:45 -07:00
Erik Johnston
299b00d968
Prioritize outbound to-device over device list updates (#13922)
Otherwise device list changes for large accounts can temporarily delay to-device messages.
2022-09-27 15:17:41 +01:00
reivilibre
526f84bc2e
Fix Prometheus recording rules to not use legacy metric names. (#13718) 2022-09-08 15:01:42 +01:00
Erik Johnston
2318603772
Add some logging to help track down #13444 (#13679) 2022-09-01 13:54:52 +01:00
Nick Mills-Barrett
21eeacc995
Federation Sender & Appservice Pusher Stream Optimisations (#13251)
* Replace `get_new_events_for_appservice` with `get_all_new_events_stream`

The functions were near identical and this brings the AS worker closer
to the way federation senders work which can allow for multiple workers
to handle AS traffic.

* Pull received TS alongside events when processing the stream

This avoids an extra query -per event- when both federation sender
and appservice pusher process events.
2022-07-15 09:36:56 +01:00
Erik Johnston
a7e506ddee
Reduce amount of state we pull out when attempting to send catchup PDUs. (#12963)
* Don't pull out state for catchup

* Newsfile

* Merge newsfile
2022-06-07 14:35:56 +01:00
Erik Johnston
44de53bb79
Reduce state pulled from DB due to sending typing and receipts over federation (#12964)
Reducing the amount of state we pull from the DB is useful as fetching state is expensive in terms of DB, CPU and memory.
2022-06-06 16:46:11 +01:00
Patrick Cloke
c52abc1cfd
Additional constants for EDU types. (#12884)
Instead of hard-coding strings in many places.
2022-05-27 07:14:36 -04:00
Patrick Cloke
b5707ceaba
Avoid attempting to delete push actions for remote users. (#12879)
Remote users will never have push actions, so we can avoid a database
round-trip/transaction completely.
2022-05-26 07:09:16 -04:00
Dirk Klimpel
6edefef602
Add some type hints to datastore (#12717) 2022-05-17 15:29:06 +01:00
Richard van der Hoff
d66d68f917
Add extra debug logging to federation sender (#12614)
... in order to debug some problems we've been having with certain events not
being sent when expected.
2022-05-03 16:32:40 +01:00
Richard van der Hoff
db2edf5a65
Exclude OOB memberships from the federation sender (#12570)
As the comment says, there is no need to process such events, and indeed we
need to avoid doing so.

Fixes #12509.
2022-05-03 12:47:56 +00:00
Erik Johnston
423cca9efe
Spread out sending device lists to remote hosts (#12132) 2022-03-04 11:48:15 +00:00
Richard van der Hoff
e24ff8ebe3
Remove HomeServer.get_datastore() (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
Richard van der Hoff
a85dde3445
Minor typing fixes (#12034)
These started failing in
https://github.com/matrix-org/synapse/pull/12031... I'm a bit mystified by how
they ever worked.
2022-02-21 18:37:04 +00:00
David Robertson
f160fe18e3
Debug for device lists updates (#11760)
Debug for #8631.

I'm having a hard time tracking down what's going wrong in that issue.
In the reported example, I could see server A sending federation traffic
to server B and all was well. Yet B reports out-of-sync device updates
from A.

I couldn't see what was _in_ the events being sent from A to B. So I
have added some crude logging to track

- when we have updates to send to a remote HS
- the edus we actually accumulate to send
- when a federation transaction includes a device list update edu
- when such an EDU is received

This is a bit of a sledgehammer.
2022-01-20 13:38:44 +00:00
Patrick Cloke
10a88ba91c
Use auto_attribs/native type hints for attrs classes. (#11692) 2022-01-13 13:49:28 +00:00
Patrick Cloke
d2279f471b
Add most of the missing type hints to synapse.federation. (#11483)
This skips a few methods which are difficult to type.
2021-12-02 16:18:10 +00:00
reivilibre
75ca0a6168
Annotate log_function decorator (#10943)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2021-10-27 17:27:23 +01:00
Andrew Morgan
aa2c027792
Remove unnecessary parentheses around tuples returned from methods (#10889) 2021-09-23 11:59:07 +01:00
Patrick Cloke
8c7a531e27
Use direct references for some configuration variables (part 2) (#10812) 2021-09-15 08:34:52 -04:00
Patrick Cloke
01c88a09cd
Use direct references for some configuration variables (#10798)
Instead of proxying through the magic getter of the RootConfig
object. This should be more performant (and is more explicit).
2021-09-13 13:07:12 -04:00
reivilibre
524b8ead77
Add types to synapse.util. (#10601) 2021-09-10 17:03:18 +01:00
Patrick Cloke
1de26b3467
Convert Transaction and Edu object to attrs (#10542)
Instead of wrapping the JSON into an object, this creates concrete
instances for Transaction and Edu. This allows for improved type
hints and simplified code.
2021-08-06 09:39:59 -04:00
Erik Johnston
ac5c221208
Stagger send presence to remotes (#10398)
This is to help with performance, where trying to connect to thousands
of hosts at once can consume a lot of CPU (due to TLS etc).

Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2021-07-15 11:52:56 +01:00
Jonathan de Jong
bf72d10dbf
Use inline type hints in various other places (in synapse/) (#10380) 2021-07-15 11:02:43 +01:00
Richard van der Hoff
b378d98c8f
Add debug logging for issue #9533 (#9959)
Hopefully this will help us track down where to-device messages are getting
lost/delayed.
2021-05-11 11:04:03 +01:00
Andrew Morgan
4e0fd35bc9 Revert "Experimental Federation Speedup (#9702)"
This reverts commit 05e8c70c05.
2021-04-28 11:38:33 +01:00
Richard van der Hoff
294c675033
Remove synapse.types.Collection (#9856)
This is no longer required, since we have dropped support for Python 3.5.
2021-04-22 16:43:50 +01:00
Erik Johnston
db70435de7
Fix bug where we sent remote presence states to remote servers (#9850) 2021-04-20 13:37:54 +01:00
Erik Johnston
2b7dd21655
Don't send normal presence updates over federation replication stream (#9828) 2021-04-19 10:50:49 +01:00
Richard van der Hoff
5a153772c1
remove HomeServer.get_config (#9815)
Every single time I want to access the config object, I have to remember
whether or not we use `get_config`. Let's just get rid of it.
2021-04-14 19:09:08 +01:00
Jonathan de Jong
05e8c70c05
Experimental Federation Speedup (#9702)
This basically speeds up federation by "squeezing" each individual dual database call (to destinations and destination_rooms), which previously happened per every event, into one call for an entire batch (100 max).

Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
2021-04-14 17:19:02 +01:00
Jonathan de Jong
4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Erik Johnston
3a569fb200 Fix sharded federation sender sometimes using 100% CPU.
We pull all destinations requiring catchup from the DB in batches.
However, if all those destinations get filtered out (due to the
federation sender being sharded), then the `last_processed` destination
doesn't get updated, and we keep requesting the same set repeatedly.
2021-04-08 17:34:07 +01:00
Andrew Morgan
04819239ba
Add a Synapse Module for configuring presence update routing (#9491)
At the moment, if you'd like to share presence between local or remote users, those users must be sharing a room together. This isn't always the most convenient or useful situation though.

This PR adds a module to Synapse that will allow deployments to set up extra logic on where presence updates should be routed. The module must implement two methods, `get_users_for_states` and `get_interested_users`. These methods are given presence updates or user IDs and must return information that Synapse will use to grant passing presence updates around.

A method is additionally added to `ModuleApi` which allows triggering a set of users to receive the current, online presence information for all users they are considered interested in. This is the equivalent of that user receiving presence information during an initial sync. 

The goal of this module is to be fairly generic and useful for a variety of applications, with hard requirements being:

* Sending state for a specific set or all known users to a defined set of local and remote users.
* The ability to trigger an initial sync for specific users, so they receive all current state.
2021-04-06 14:38:30 +01:00
Erik Johnston
33548f37aa
Improve tracing for to device messages (#9686) 2021-04-01 17:08:21 +01:00
Patrick Cloke
da75d2ea1f
Add type hints for the federation sender. (#9681)
Includes an abstract base class which both the FederationSender
and the FederationRemoteSendQueue must implement.
2021-03-29 11:43:20 -04:00
Erik Johnston
c602ba8336
Fixed undefined variable error in catchup (#9664)
Broke in #9640

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2021-03-24 16:12:47 +00:00
Erik Johnston
dd71eb0f8a
Make federation catchup send last event from any server. (#9640)
Currently federation catchup will send the last *local* event that we
failed to send to the remote. This can cause issues for large rooms
where lots of servers have sent events while the remote server was down,
as when it comes back up again it'll be flooded with events from various
points in the DAG.

Instead, let's make it so that all the servers send the most recent
events, even if its not theirs. The remote should deduplicate the
events, so there shouldn't be much overhead in doing this.
Alternatively, the servers could only send local events if they were
also extremities and hope that the other server will send the event
over, but that is a bit risky.
2021-03-18 15:52:26 +00:00
Erik Johnston
026503fa3b
Don't go into federation catch up mode so easily (#9561)
Federation catch up mode is very inefficient if the number of events
that the remote server has missed is small, since handling gaps can be
very expensive, c.f. #9492.

Instead of going into catch up mode whenever we see an error, we instead
do so only if we've backed off from trying the remote for more than an
hour (the assumption being that in such a case it is more than a
transient failure).
2021-03-15 14:42:40 +00:00
Richard van der Hoff
8a4b3738f3
Replace last_*_pdu_age metrics with timestamps (#9540)
Following the advice at
https://prometheus.io/docs/practices/instrumentation/#timestamps-not-time-since,
it's preferable to export unix timestamps, not ages.

There doesn't seem to be any particular naming convention for timestamp
metrics.
2021-03-04 16:40:18 +00:00
Andrew Morgan
8bcfc2eaad
Be smarter about which hosts to send presence to when processing room joins (#9402)
This PR attempts to eliminate unnecessary presence sending work when your local server joins a room, or when a remote server joins a room your server is participating in by processing state deltas in chunks rather than individually.

---

When your server joins a room for the first time, it requests the historical state as well. This chunk of new state is passed to the presence handler which, after filtering that state down to only membership joins, will send presence updates to homeservers for each join processed.

It turns out that we were being a bit naive and processing each event individually, and sending out presence updates for every one of those joins. Even if many different joins were users on the same server (hello IRC bridges), we'd send presence to that same homeserver for every remote user join we saw.

This PR attempts to deduplicate all of that by processing the entire batch of state deltas at once, instead of only doing each join individually. We process the joins and note down which servers need which presence:

* If it was a local user join, send that user's latest presence to all servers in the room
* If it was a remote user join, send the presence for all local users in the room to that homeserver

We deduplicate by inserting all of those pending updates into a dictionary of the form:

```
{
  server_name1: {presence_update1, ...},
  server_name2: {presence_update1, presence_update2, ...}
}
```

Only after building this dict do we then start sending out presence updates.
2021-02-19 11:37:29 +00:00
Eric Eastwood
0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Erik Johnston
dd8da8c5f6
Precompute joined hosts and store in Redis (#9198) 2021-01-26 13:57:31 +00:00
Erik Johnston
921a3f8a59
Fix not sending events over federation when using sharded event persisters (#8536)
* Fix outbound federaion with multiple event persisters.

We incorrectly notified federation senders that the minimum persisted
stream position had advanced when we got an `RDATA` from an event
persister.

Notifying of federation senders already correctly happens in the
notifier, so we just delete the offending line.

* Change some interfaces to use RoomStreamToken.

By enforcing use of `RoomStreamTokens` we make it less likely that
people pass in random ints that they got from somewhere random.
2020-10-14 13:27:51 +01:00
Richard van der Hoff
f31f8e6319
Remove stream ordering from Metadata dict (#8452)
There's no need for it to be in the dict as well as the events table. Instead,
we store it in a separate attribute in the EventInternalMetadata object, and
populate that on load.

This means that we can rely on it being correctly populated for any event which
has been persited to the database.
2020-10-05 14:43:14 +01:00
Richard van der Hoff
3bd3707cb9
Fix malformed log line in new federation "catch up" logic (#8442) 2020-10-02 11:05:29 +01:00
Richard van der Hoff
c1ef579b63
Add prometheus metrics to track federation delays (#8430)
Add a pair of federation metrics to track the delays in sending PDUs to/from 
particular servers.
2020-10-01 11:09:12 +01:00
reivilibre
36efbcaf51
Catch-up after Federation Outage (bonus): Catch-up on Synapse Startup (#8322)
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Fix _set_destination_retry_timings

This came about because the code assumed that retry_interval
could not be NULL — which has been challenged by catch-up.
2020-09-18 14:59:13 +01:00