Commit Graph

4376 Commits

Author SHA1 Message Date
Patrick Cloke
66a7857334
Use stable identifiers for MSC3771 & MSC3773. (#14050)
These are both part of Matrix 1.4 which has now been released.

For now, support both the unstable and stable identifiers.
2022-10-07 09:26:40 -04:00
Shay
7b7478e8b6
Batch up notifications after event persistence (#14033) 2022-10-05 10:12:48 -07:00
Nick Mills-Barrett
0506bb100e
Remove get rooms for user with stream ordering (#13991)
By getting the joined rooms before the current token we avoid any reading
history to confirm a user *was* in a room. We can then use any membership
change events, which we already fetch during sync, to determine the final
list of joined room IDs.
2022-10-04 16:42:59 +01:00
Patrick Cloke
b4ec4f5e71
Track notification counts per thread (implement MSC3773). (#13776)
When retrieving counts of notifications segment the results based on the
thread ID, but choose whether to return them as individual threads or as
a single summed field by letting the client opt-in via a sync flag.

The summarization code is also updated to be per thread, instead of per
room.
2022-10-04 09:47:04 -04:00
Patrick Cloke
b706111b78
Do not return unspecced original_event field when using the stable /relations endpoint. (#14025)
Keep the old behavior (of including the original_event field) for any
requests to the /unstable version of the endpoint, but do not include
the field when the /v1 version is used.

This should avoid new clients from depending on this field, but will
not help with current dependencies.
2022-10-03 16:47:15 +00:00
lukasdenk
719488dda8
Add query parameter ts to allow appservices set the origin_server_ts for state events. (#11866)
MSC3316 declares that both /rooms/{roomId}/send and /rooms/{roomId}/state
should accept a ts parameter for appservices. This change expands support
to /state and adds tests.
2022-10-03 13:30:45 +00:00
David Robertson
a423f45294
Fix twisted trunk mypy errors (#14012) 2022-10-03 13:26:49 +00:00
Eric Eastwood
2769ef4df1
Revert the general exception recording introduced in #13814 (#13969)
* Maybe not catch all errors to avoid things in the nature-of CancelledError

See https://github.com/matrix-org/synapse/pull/13815#discussion_r983384698

* Remove general exception tracking

* Add changelog
2022-10-03 10:14:45 +01:00
Eric Eastwood
a52c40e2a6
Fix get_users_in_room mis-use in transfer_room_state_on_room_upgrade (#13960)
Spawning from looking into `get_users_in_room` while investigating https://github.com/matrix-org/synapse/issues/13942#issuecomment-1262787050.

See https://github.com/matrix-org/synapse/pull/13575#discussion_r953023755 for the original exploration around finding `get_users_in_room` mis-uses.

Related to the following PRs where we also cleaned up some `get_users_in_room` mis-uses:

 - https://github.com/matrix-org/synapse/pull/13605
 - https://github.com/matrix-org/synapse/pull/13608
 - https://github.com/matrix-org/synapse/pull/13606
 - https://github.com/matrix-org/synapse/pull/13958
2022-09-30 20:10:50 -05:00
Eric Eastwood
ad4c14e4b0
Clarifications in user directory for users who share rooms tracking (#13966)
Spawned while working on [`get_users_in_room` mis-uses](https://github.com/matrix-org/synapse/pull/13958#discussion_r984074897) and thinking we could use `get_local_users_in_room` here but we can't.

From first glance, it seemed like this was only using local users from all of the `is_mine_id(user_id)` checks but I see that it does actually use remote users. Just making things a little more clear here what it does and mentions remote users so maybe that will be more obvious in the future.
2022-09-30 14:40:18 -05:00
David Robertson
5507bfa769
Discourage automatic replies to Synapse's emails (#13957)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-09-30 13:23:37 +00:00
Erik Johnston
3dfc4a08dc
Fix performance regression in get_users_in_room (#13972)
Fixes #13942. Introduced in #13575.

Basically, let's only get the ordered set of hosts out of the DB if we need an ordered set of hosts. Since we split the function up the caching won't be as good, but I think it will still be fine as e.g. multiple backfill requests for the same room will hit the cache.
2022-09-30 13:15:32 +01:00
Nick Mills-Barrett
a466164647
Optimise get_rooms_for_user (drop with_stream_ordering) (#13787) 2022-09-29 13:55:12 +00:00
Brendan Abolivier
be76cd8200
Allow admins to require a manual approval process before new accounts can be used (using MSC3866) (#13556) 2022-09-29 15:23:24 +02:00
reivilibre
73ecff7e9e
Improve backfill robustness by trying more servers. (#13890)
Co-authored-by: Eric Eastwood <erice@element.io>
2022-09-29 10:00:02 +00:00
Erik Johnston
5f659d4a88
Handle local device list updates during partial join (#13934) 2022-09-28 23:22:35 +01:00
Eric Eastwood
df8b91ed2b
Limit and filter the number of backfill points to get from the database (#13879)
There is no need to grab thousands of backfill points when we only need 5 to make the `/backfill` request with. We need to grab a few extra in case the first few aren't visible in the history.

Previously, we grabbed thousands of backfill points from the database, then sorted and filtered them in the app. Fetching the 4.6k backfill points for `#matrix:matrix.org` from the database takes ~50ms - ~570ms so it's not like this saves a lot of time 🤷. But it might save us more time now that `get_backfill_points_in_room`/`get_insertion_event_backward_extremities_in_room` are more complicated after https://github.com/matrix-org/synapse/pull/13635 

This PR moves the filtering and limiting to the SQL query so we just have less data to work with in the first place.

Part of https://github.com/matrix-org/synapse/issues/13356
2022-09-28 15:26:16 -05:00
Erik Johnston
4b17a5ace8
Handle remote device list updates during partial join (#13913)
c.f. #12993 (comment), point 3

This stores all device list updates that we receive while partial joins are ongoing, and processes them once we have the full state.

Note: We don't actually process the device lists in the same ways as if we weren't partially joined. Instead of updating the device list remote cache, we simply notify local users that a change in the remote user's devices has happened. I think this is safe as if the local user requests the keys for the remote user and we don't have them we'll simply fetch them as normal.
2022-09-28 13:42:43 +00:00
Kateřina Churanová
6caa303083
fix: Push notifications for invite over federation (#13719) 2022-09-28 12:31:53 +00:00
Shay
8ab16a92ed
Persist CreateRoom events to DB in a batch (#13800) 2022-09-28 10:11:48 +00:00
Shay
a2cf66a94d
Prepatory work for batching events to send (#13487)
This PR begins work on batching up events during the creation of a room. The PR splits out the creation and sending/persisting of the events. The first three events in the creation of the room-creating the room, joining the creator to the room, and the power levels event are sent sequentially, while the subsequent events are created and collected to be sent at the end of the function. This is currently done by appending them to a list and then iterating over the list to send, the next step (after this PR) would be to send and persist the collected events as a batch.
2022-09-28 10:39:03 +01:00
David Robertson
f5aaa55e27
Add new columns tracking when we partial-joined (#13892) 2022-09-27 17:26:35 +01:00
Quentin Gliech
50c92f3a69
Carry IdP Session IDs through user-mapping sessions. (#13839)
Since #11482, we're saving sessions IDs from upstream IdPs, but we've been losing them when the user goes through a user mapping session on account registration.
2022-09-27 14:38:14 +01:00
Sean Quah
85e161631a
Faster room joins: Fix spurious error when joining a room (#13872)
During a `lazy_load_members` `/sync`, we look through auth events in
rooms with partial state to find prior membership events. When such a
membership is not found, an error is logged.

Since the first join event for a user never has a prior membership event
to cite, the error would always be logged when one appeared in the room
timeline.

Avoid logging errors for such events.

Introduced in #13477.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-27 11:17:23 +01:00
Mathieu Velten
41461fd4d6
typing: check origin server of typing event against room's servers (#13830)
This is also using the partial state approximation if needed so we do
not block here during a fast join.

Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2022-09-26 17:33:32 +02:00
Eric Eastwood
ac1a31740b
Only try to backfill event if we haven't tried before recently (#13635)
Only try to backfill event if we haven't tried before recently (exponential backoff). No need to keep trying the same backfill point that fails over and over.

Fix https://github.com/matrix-org/synapse/issues/13622
Fix https://github.com/matrix-org/synapse/issues/8451

Follow-up to https://github.com/matrix-org/synapse/pull/13589

Part of https://github.com/matrix-org/synapse/issues/13356
2022-09-23 14:01:29 -05:00
Sean Quah
f49f73c0da
Faster room joins: Avoid blocking /keys/changes (#13888)
Part of the work for #12993.

Once #12993 is fully resolved, we expect `/keys/changes` to behave
sensibly when joined to a room with partial state.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-23 17:55:15 +01:00
Patrick Cloke
efd108b45d
Accept & store thread IDs for receipts (implement MSC3771). (#13782)
Updates the `/receipts` endpoint and receipt EDU handler to parse a
`thread_id` from the body and insert it in the database.
2022-09-23 14:33:28 +00:00
Sean Quah
03c2bfb7f8
Send device list updates out to servers in partially joined rooms (#13874)
Use the provided list of servers in the room from the `/send_join`
response, since we will not know which users are in the room.  This
isn't sufficient to ensure that all remote servers receive the right
device list updates, since the `/send_join` response may be inaccurate
or we may calculate the membership state of new users in the room
incorrectly.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-23 13:44:03 +01:00
reivilibre
c06b2b7142
Faster Remote Room Joins: tell remote homeservers that we are unable to authorise them if they query a room which has partial state on our server. (#13823) 2022-09-23 11:47:16 +01:00
Brendan Abolivier
8ae42ab8fa
Support enabling/disabling pushers (from MSC3881) (#13799)
Partial implementation of MSC3881
2022-09-21 14:39:01 +00:00
Peter Scheu
16e1a9d9a7
Correct documentation for map_user_attributes of OpenID Mapping Providers (#13836)
Co-authored-by: David Robertson <davidr@element.io>
2022-09-21 13:08:16 +00:00
Quentin Gliech
85fc7ea1a1
Remove the complete_sso_login method from the Module API which was deprecated in Synapse 1.13.0. (#13843)
Signed-off-by: Quentin Gliech <quenting@element.io>
2022-09-20 15:18:07 +02:00
Erik Johnston
42d261c32f
Port the push rule classes to Rust. (#13768) 2022-09-20 12:10:31 +01:00
Sean Quah
d64e85197a
Remove error spam when users query the keys of departed remote users (#13826)
The error message introduced in #13749 has turned out to be very spammy.
Remove it for now.
2022-09-16 16:16:05 +01:00
Eric Eastwood
140af0cdb6
Record any exception when processing a pulled event (#13814)
Part of https://github.com/matrix-org/synapse/issues/13700 and https://github.com/matrix-org/synapse/issues/13356

Follow-up to https://github.com/matrix-org/synapse/pull/13589
2022-09-15 14:40:49 -05:00
Eric Eastwood
957e3d74fc
Keep track when we try and fail to process a pulled event (#13589)
We can follow-up this PR with:

 1. Only try to backfill from an event if we haven't tried recently -> https://github.com/matrix-org/synapse/issues/13622
 1. When we decide to backfill that event again, process it in the background so it doesn't block and make `/messages` slow when we know it will probably fail again -> https://github.com/matrix-org/synapse/issues/13623
 1. Generally track failures everywhere we try and fail to pull an event over federation -> https://github.com/matrix-org/synapse/issues/13700

Fix https://github.com/matrix-org/synapse/issues/13621

Part of https://github.com/matrix-org/synapse/issues/13356

Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.qv7cj51sv9i5)
2022-09-14 13:57:50 -05:00
reivilibre
6302753012
Deduplicate is_server_notices_room. (#13780) 2022-09-14 15:53:18 +00:00
Sean Quah
c73774467e
Fix bug in device list caching when remote users leave rooms (#13749)
When a remote user leaves the last room shared with the homeserver, we
have to mark their device list as unsubscribed, otherwise we would hold
on to a stale device list in our cache. Crucially, the device list would
remain cached even after the remote user rejoined the room, which could
lead to E2EE failures until the next change to the remote user's device
list.

Fixes #13651.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-14 10:42:57 +01:00
Sean Quah
69fa29700e
Re-type hint some collections in /sync code as read-only (#13754)
Signed-off-by: Sean Quah <seanq@matrix.org>
2022-09-08 20:13:39 +01:00
Dirk Klimpel
f799eac7ea
Add timestamp to user's consent (#13741)
Co-authored-by: reivilibre <olivier@librepush.net>
2022-09-08 15:41:48 +00:00
reivilibre
d3d9ca156e
Cancel the processing of key query requests when they time out. (#13680) 2022-09-07 12:03:32 +01:00
Connor Davis
bb5b47b62a
Add Admin API to Fetch Messages Within a Particular Window (#13672)
This adds two new admin APIs that allow us to fetch messages from a room within a particular time.
2022-09-07 10:54:44 +01:00
reivilibre
26bc26586b
Remove the unspecced room_id field in the /hierarchy response. (#13506)
This is a re-do of 57d334a13d (#13365),
which was backed out in 12abd72497 (#13501).

The `room_id` field represented the parent space for each room
and was made redundant by changes in the API shape where the
`children_state` is now nested underneath each `room`.

The room ID of each child is in the `state_key` field and is still
available.
2022-09-06 15:28:44 -04:00
Šimon Brandner
0e99f07952
Remove support for unstable private read receipts (#13653)
Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>
2022-09-01 13:31:54 +01:00
Jacek Kuśnierz
84ddcd7bbf
Drop support for calling /_matrix/client/v3/rooms/{roomId}/invite without an id_access_token (#13241)
Fixes #13206

Signed-off-by: Jacek Kusnierz jacek.kusnierz@tum.de
2022-08-31 12:10:25 +00:00
Dirk Klimpel
682dfcfc0d
Fix that user cannot /forget rooms after the last member has left (#13546) 2022-08-30 09:58:38 +00:00
Eric Eastwood
51d732db3b
Optimize how we calculate likely_domains during backfill (#13575)
Optimize how we calculate `likely_domains` during backfill because I've seen this take 17s in production just to `get_current_state` which is used to `get_domains_from_state` (see case [*2. Loading tons of events* in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)).

There are 3 ways we currently calculate hosts that are in the room:

 1. `get_current_state` -> `get_domains_from_state`
    - Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill`
    - This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳
 1. `get_current_hosts_in_room`
    - Used for other federation things like sending read receipts and typing indicators
 1. `get_hosts_in_room_at_events`
    - Used when pushing out events over federation to other servers in the `_process_event_queue_loop`

Fix https://github.com/matrix-org/synapse/issues/13626

Part of https://github.com/matrix-org/synapse/issues/13356

Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh)


### Query performance

#### Before

The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s):
```
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 16035.612 ms (00:16.036)

synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 4243.237 ms (00:04.243)
```

But what about `get_current_hosts_in_room`: When there is 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added). But takes 930ms after a Postgres restart or 390ms if running back to back to back.

```sh
$ psql synapse
synapse=# \timing on
synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
FROM current_state_events
WHERE
    type = 'm.room.member'
    AND membership = 'join'
    AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
  4130
(1 row)

Time: 13181.598 ms (00:13.182)

synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
 80814

synapse=# SELECT COUNT(*) from current_state_events;
  count
---------
 8162847

synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
 pg_size_pretty
----------------
 4702 MB
```

#### After

I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart.

After a Postgres restart, it takes 330ms and running back to back takes 260ms.

```sh
$ psql synapse
synapse=# \timing on
Timing is on.
synapse=# SELECT
    substring(c.state_key FROM '@[^:]*:(.*)$') as host
FROM current_state_events c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
WHERE
    c.type = 'm.room.member'
    AND c.membership = 'join'
    AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org'
GROUP BY host
ORDER BY min(e.depth) ASC;
Time: 333.800 ms
```

#### Going further

To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up. 

Another thing we can do is optimize the query to use a index skip scan:

 - https://wiki.postgresql.org/wiki/Loose_indexscan
 - Index Skip Scan, https://commitfest.postgresql.org/37/1741/
 - https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
2022-08-30 01:38:14 -05:00
Brad Murray
967d7bad6c
Move the execution of the retention purge_jobs to the main worker (#13632)
Fixes #9927

Signed-off-by: Brad Murray brad@beeper.com
2022-08-26 08:38:10 +01:00
Eric Eastwood
0bf180cbb4
Comment about a better future where we can get the state diff between two events (#13586)
Split off from https://github.com/matrix-org/synapse/pull/13561

Part of https://github.com/matrix-org/synapse/issues/13356

Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh)
2022-08-24 18:59:27 -05:00