Commit Graph

494 Commits

Author SHA1 Message Date
Eric Eastwood
51d732db3b
Optimize how we calculate likely_domains during backfill (#13575)
Optimize how we calculate `likely_domains` during backfill because I've seen this take 17s in production just to `get_current_state` which is used to `get_domains_from_state` (see case [*2. Loading tons of events* in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)).

There are 3 ways we currently calculate hosts that are in the room:

 1. `get_current_state` -> `get_domains_from_state`
    - Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill`
    - This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳
 1. `get_current_hosts_in_room`
    - Used for other federation things like sending read receipts and typing indicators
 1. `get_hosts_in_room_at_events`
    - Used when pushing out events over federation to other servers in the `_process_event_queue_loop`

Fix https://github.com/matrix-org/synapse/issues/13626

Part of https://github.com/matrix-org/synapse/issues/13356

Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh)


### Query performance

#### Before

The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s):
```
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 16035.612 ms (00:16.036)

synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 4243.237 ms (00:04.243)
```

But what about `get_current_hosts_in_room`: When there is 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added). But takes 930ms after a Postgres restart or 390ms if running back to back to back.

```sh
$ psql synapse
synapse=# \timing on
synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
FROM current_state_events
WHERE
    type = 'm.room.member'
    AND membership = 'join'
    AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
  4130
(1 row)

Time: 13181.598 ms (00:13.182)

synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
 80814

synapse=# SELECT COUNT(*) from current_state_events;
  count
---------
 8162847

synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
 pg_size_pretty
----------------
 4702 MB
```

#### After

I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart.

After a Postgres restart, it takes 330ms and running back to back takes 260ms.

```sh
$ psql synapse
synapse=# \timing on
Timing is on.
synapse=# SELECT
    substring(c.state_key FROM '@[^:]*:(.*)$') as host
FROM current_state_events c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
WHERE
    c.type = 'm.room.member'
    AND c.membership = 'join'
    AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org'
GROUP BY host
ORDER BY min(e.depth) ASC;
Time: 333.800 ms
```

#### Going further

To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up. 

Another thing we can do is optimize the query to use a index skip scan:

 - https://wiki.postgresql.org/wiki/Loose_indexscan
 - Index Skip Scan, https://commitfest.postgresql.org/37/1741/
 - https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
2022-08-30 01:38:14 -05:00
Eric Eastwood
d58615c82c
Directly lookup local membership instead of getting all members in a room first (get_users_in_room mis-use) (#13608)
See https://github.com/matrix-org/synapse/pull/13575#discussion_r953023755
2022-08-24 14:13:12 -05:00
Quentin Gliech
3dd175b628
synapse.api.auth.Auth cleanup: make permission-related methods use Requester instead of the UserID (#13024)
Part of #13019

This changes all the permission-related methods to rely on the Requester instead of the UserID. This is a first step towards enabling scoped access tokens at some point, since I expect the Requester to have scope-related informations in it.

It also changes methods which figure out the user/device/appservice out of the access token to return a Requester instead of something else. This avoids having store-related objects in the methods signatures.
2022-08-22 14:17:59 +01:00
Eric Eastwood
357561c1a2
Backfill remote event fetched by MSC3030 so we can paginate from it later (#13205)
Depends on https://github.com/matrix-org/synapse/pull/13320

Complement tests: https://github.com/matrix-org/complement/pull/406

We could use the same method to backfill for `/context` as well in the future, see https://github.com/matrix-org/synapse/issues/3848
2022-07-22 16:00:11 -05:00
Nick Mills-Barrett
982fe29655
Optimise room creation event lookups part 2 (#13224) 2022-07-13 19:32:46 +01:00
Nick Mills-Barrett
92202ce867
Reduce event lookups during room creation by passing known event IDs (#13210)
Inspired by the room batch handler, this uses previous event inserts to
pre-populate prev events during room creation, reducing the number of
queries required to create a room.

Signed off by Nick @ Beeper (@Fizzadar)
2022-07-11 18:00:12 +01:00
David Teller
11f811470f
Uniformize spam-checker API, part 5: expand other spam-checker callbacks to return Tuple[Codes, dict] (#13044)
Signed-off-by: David Teller <davidt@element.io>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-07-11 16:52:10 +00:00
Eric Eastwood
a962c5a56d
Fix exception when using MSC3030 to look for remote federated events before room creation (#13197)
Complement tests: https://github.com/matrix-org/complement/pull/405

This happens when you have some messages imported before the room is created.
Then use MSC3030 to look backwards before the room creation from a remote
federated server. The server won't find anything locally, but will ask over
federation which will have the remote event. The previous logic would
choke on not having the local event assigned.

```
Failed to fetch /timestamp_to_event from hs2 because of exception(UnboundLocalError) local variable 'local_event' referenced before assignment args=("local variable 'local_event' referenced before assignment",)
```
2022-07-07 11:52:45 -05:00
Quentin Gliech
92103cb2c8
Decouple synapse.api.auth_blocking.AuthBlocking from synapse.api.auth.Auth. (#13021) 2022-06-14 09:51:15 +01:00
David Teller
a164a46038
Uniformize spam-checker API, part 4: port other spam-checker callbacks to return Union[Allow, Codes]. (#12857)
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-06-13 18:16:16 +00:00
Richard van der Hoff
c1b28b8842 Remove redundant room_version param from check_auth_rules_from_context
It's now implied by the room_version property on the event.
2022-06-12 23:13:10 +01:00
Richard van der Hoff
68be42f6b6 Remove room_version param from validate_event_for_room_version
Instead, use the `room_version` property of the event we're validating.

The `room_version` was originally added as a parameter somewhere around #4482,
but really it's been redundant since #6875 added a `room_version` field to `EventBase`.
2022-06-12 23:13:09 +01:00
Erik Johnston
e3163e2e11
Reduce the amount of state we pull from the DB (#12811) 2022-06-06 09:24:12 +01:00
Erik Johnston
888a29f412
Wait for lazy join to complete when getting current state (#12872) 2022-06-01 16:02:53 +01:00
Patrick Cloke
7bc08f3201
Remove remaining bits of groups code. (#12936)
* Update worker docs to remove group endpoints.
* Removes an unused parameter to `ApplicationService`.
* Break dependency between media repo and groups.
* Avoid copying `m.room.related_groups` state events during room upgrades.
2022-06-01 09:41:25 -04:00
Erik Johnston
1e453053cb
Rename storage classes (#12913) 2022-05-31 12:17:50 +00:00
Erik Johnston
4660d9fdcf
Fix up state_store naming (#12871) 2022-05-25 12:59:04 +01:00
Shay
71e8afe34d
Update EventContext get_current_event_ids and get_prev_event_ids to accept state filters and update calls where possible (#12791) 2022-05-20 09:54:12 +01:00
Andrew Morgan
96df31239c
Add a unit test for copying over arbitrary room types when upgrading a room (#12792) 2022-05-19 18:32:48 +01:00
Aminda Suomalainen
d25935cd3d
Implement MSC3818: copy room type on upgrade (#12786)
Resolves: #11896

Signed-off-by: Aminda Suomalainen <suomalainen+git@mikaela.info>
2022-05-19 12:28:10 +01:00
reivilibre
df4963548b
Give a meaningful error message when a client tries to create a room with an invalid alias localpart. (#12779) 2022-05-18 11:46:06 +00:00
Andrew Morgan
83be72d76c
Add StreamKeyType class and replace string literals with constants (#12567) 2022-05-16 15:35:31 +00:00
Sean Quah
a5c26750b5
Fix room upgrades creating an empty room when auth fails (#12696)
Signed-off-by: Sean Quah <seanq@element.io>
2022-05-16 14:06:04 +01:00
Andy Balaam
de1e599b9d
add default_power_level_content_override config option. (#12618)
Co-authored-by: Matthew Hodgson <matthew@matrix.org>
2022-05-12 10:41:35 +00:00
David Robertson
051a1c3f22
Convert stringy power levels to integers on room upgrade (#12657) 2022-05-07 13:37:29 +01:00
Eric Eastwood
793d03e2c5
Generate historic pagination token for /messages when no ?from token provided (#12370) 2022-04-06 11:40:28 +01:00
Sean Quah
800ba87cc8
Refactor and convert Linearizer to async (#12357)
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.

Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 15:43:52 +01:00
reivilibre
c4cf916ed7
Default to private room visibility rather than public when a client does not specify one, according to spec. (#12350) 2022-04-01 15:55:09 +01:00
Patrick Cloke
8fe930c215
Move get_bundled_aggregations to relations handler. (#12237)
The get_bundled_aggregations code is fairly high-level and uses
a lot of store methods, we move it into the handler as that seems
like a better fit.
2022-03-18 17:49:32 +00:00
Will Hunt
15382b1afa
Add third_party module callbacks to check if a user can delete a room and deactivate a user (#12028)
* Add check_can_deactivate_user

* Add check_can_shutdown_rooms

* Documentation

* callbacks, not functions

* Various suggested tweaks

* Add tests for test_check_can_shutdown_room and test_check_can_deactivate_user

* Update check_can_deactivate_user to not take a Requester

* Fix check_can_shutdown_room docs

* Renegade and use `by_admin` instead of `admin_user_id`

* fix lint

* Update docs/modules/third_party_rules_callbacks.md

Co-authored-by: Brendan Abolivier <babolivier@matrix.org>

* Update docs/modules/third_party_rules_callbacks.md

Co-authored-by: Brendan Abolivier <babolivier@matrix.org>

* Update docs/modules/third_party_rules_callbacks.md

Co-authored-by: Brendan Abolivier <babolivier@matrix.org>

* Update docs/modules/third_party_rules_callbacks.md

Co-authored-by: Brendan Abolivier <babolivier@matrix.org>

Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-03-09 18:23:57 +00:00
Richard van der Hoff
e24ff8ebe3
Remove HomeServer.get_datastore() (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
Brendan Abolivier
0171fa5226
Remove deprecated user_may_create_room_with_invites callback (#11950)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-02-11 13:58:11 +00:00
Patrick Cloke
2897fb6b4f
Improvements to bundling aggregations. (#11815)
This is some odds and ends found during the review of #11791
and while continuing to work in this code:

* Return attrs classes instead of dictionaries from some methods
  to improve type safety.
* Call `get_bundled_aggregations` fewer times.
* Adds a missing assertion in the tests.
* Do not return empty bundled aggregations for an event (preferring
  to not include the bundle at all, as the docstring states).
2022-01-26 08:27:04 -05:00
Patrick Cloke
68acb0a29d
Include whether the requesting user has participated in a thread. (#11577)
Per updates to MSC3440.

This is implement as a separate method since it needs to be cached
on a per-user basis, instead of a per-thread basis.
2022-01-18 11:38:57 -05:00
Patrick Cloke
6bf81a7a61
Bundle aggregations outside of the serialization method. (#11612)
This makes the serialization of events synchronous (and it no
longer access the database), but we must manually calculate and
provide the bundled aggregations.

Overall this should cause no change in behavior, but is prep work
for other improvements.
2022-01-07 09:10:46 -05:00
lukasdenk
2ef1fea8d2
Make room creations denied by user_may_create_room cause an M_FORBIDDEN error to be returned, not M_UNKNOWN (#11672)
Co-authored-by: reivilibre <olivier@librepush.net>
2022-01-06 13:16:42 +00:00
Richard van der Hoff
c3e38b88f2
Improve opentracing support for ResponseCache (#11607)
This adds some opentracing annotations to ResponseCache, to make it easier to see what's going on; in particular, it adds a link back to the initial trace which is actually doing the work of generating the response.
2021-12-20 18:12:08 +00:00
Richard van der Hoff
b1ecd19c5d
Fix 'delete room' admin api to work on incomplete rooms (#11523)
If, for some reason, we don't have the create event, we should still be able to
purge a room.
2021-12-07 11:37:54 +00:00
Eric Eastwood
a6f1a3abec
Add MSC3030 experimental client and federation API endpoints to get the closest event to a given timestamp (#9445)
MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030

Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
    "event_id": ...
    "origin_server_ts": ...
}
```

Federation API endpoint:
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
    "event_id": ...
    "origin_server_ts": ...
}
```

Co-authored-by: Erik Johnston <erik@matrix.org>
2021-12-02 01:02:20 -06:00
Patrick Cloke
7ae559944a
Fix checking whether a room can be published on creation. (#11392)
If `room_list_publication_rules` was configured with a rule with a
non-wildcard alias and a room was created with an alias then an
internal server error would have been thrown.

This fixes the error and properly applies the publication rules
during room creation.
2021-11-19 15:19:32 +00:00
Dirk Klimpel
8840a7b7f1
Convert delete room admin API to async endpoint (#11223)
Signed-off-by: Dirk Klimpel dirk@klimpel.org
2021-11-12 12:35:31 +00:00
David Robertson
b6f4d122ef
Allow admins to proactively block rooms (#11228)
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2021-11-09 13:11:47 +00:00
Patrick Cloke
a19d01c3d9
Support filtering by relations per MSC3440 (#11236)
Adds experimental support for `relation_types` and `relation_senders`
fields for filters.
2021-11-09 08:10:58 -05:00
Patrick Cloke
c01bc5f43d
Add remaining type hints to synapse.events. (#11098) 2021-11-02 09:55:52 -04:00
Patrick Cloke
19d5dc6931
Refactor Filter to handle fields according to data being filtered. (#11194)
This avoids filtering against fields which cannot exist on an
event source. E.g. presence updates don't have a room.
2021-10-27 11:26:30 -04:00
AndrewFerr
4387b791e0
Don't set new room alias before potential 403 (#10930)
Fixes: #10929 

Signed-off-by: Andrew Ferrazzutti <fair@miscworks.net>
2021-10-25 15:24:49 +01:00
Patrick Cloke
1f9d0b8a7a
Add type hints to synapse.events.*. (#11066)
Except `synapse/events/__init__.py`, which will be done in a follow-up.
2021-10-13 07:24:07 -04:00
Patrick Cloke
eb9ddc8c2e
Remove the deprecated BaseHandler. (#11005)
The shared ratelimit function was replaced with a dedicated
RequestRatelimiter class (accessible from the HomeServer
object).

Other properties were copied to each sub-class that inherited
from BaseHandler.
2021-10-08 07:44:43 -04:00
Brendan Abolivier
829f2a82b0
Add a spamchecker callback to allow or deny room joins (#10910)
Co-authored-by: Erik Johnston <erik@matrix.org>
2021-10-06 14:32:16 +00:00
Richard van der Hoff
428174f902
Split event_auth.check into two parts (#10940)
Broadly, the existing `event_auth.check` function has two parts:
 * a validation section: checks that the event isn't too big, that it has the rught signatures, etc. 
   This bit is independent of the rest of the state in the room, and so need only be done once 
   for each event.
 * an auth section: ensures that the event is allowed, given the rest of the state in the room.
   This gets done multiple times, against various sets of room state, because it forms part of
   the state res algorithm.

Currently, this is implemented with `do_sig_check` and `do_size_check` parameters, but I think
that makes everything hard to follow. Instead, we split the function in two and call each part
separately where it is needed.
2021-09-29 18:59:15 +01:00