synapse-product

mirror of https://git.anonymousland.org/anonymousland/synapse-product.git synced 2024-10-01 08:25:44 -04:00

Author	SHA1	Message	Date
Patrick Cloke	20df96a7a7	Speed up inserting `event_push_actions_staging`. (#13634 ) By using `execute_values` instead of `execute_batch`.	2022-08-30 07:12:48 -04:00
Eric Eastwood	51d732db3b	Optimize how we calculate `likely_domains` during backfill (#13575 ) Optimize how we calculate `likely_domains` during backfill because I've seen this take 17s in production just to `get_current_state` which is used to `get_domains_from_state` (see case [2. Loading tons of events in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)). There are 3 ways we currently calculate hosts that are in the room: 1. `get_current_state` -> `get_domains_from_state` - Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill` - This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳 2. `get_current_hosts_in_room` - Used for other federation things like sending read receipts and typing indicators 3. `get_hosts_in_room_at_events` - Used when pushing out events over federation to other servers in the `_process_event_queue_loop` Fix https://github.com/matrix-org/synapse/issues/13626 Part of https://github.com/matrix-org/synapse/issues/13356 Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh) ### Query performance #### Before The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s): ``` synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; Time: 16035.612 ms (00:16.036) synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; Time: 4243.237 ms (00:04.243) ``` But what about `get_current_hosts_in_room`: When there are 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added). 
But it takes 930ms after a Postgres restart or 390ms if running back to back to back. ```sh $ psql synapse synapse=# \timing on synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$')) FROM current_state_events WHERE type = 'm.room.member' AND membership = 'join' AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; count ------- 4130 (1 row) Time: 13181.598 ms (00:13.182) synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; count ------- 80814 synapse=# SELECT COUNT(*) from current_state_events; count --------- 8162847 synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') ); pg_size_pretty ---------------- 4702 MB ``` #### After I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart. After a Postgres restart, it takes 330ms and running back to back takes 260ms. ```sh $ psql synapse synapse=# \timing on Timing is on. synapse=# SELECT substring(c.state_key FROM '@[^:]*:(.*)$') as host FROM current_state_events c /* Get the depth of the event from the events table */ INNER JOIN events AS e USING (event_id) WHERE c.type = 'm.room.member' AND c.membership = 'join' AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org' GROUP BY host ORDER BY min(e.depth) ASC; Time: 333.800 ms ``` #### Going further To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up. 
Another thing we can do is optimize the query to use an index skip scan: - https://wiki.postgresql.org/wiki/Loose_indexscan - Index Skip Scan, https://commitfest.postgresql.org/37/1741/ - https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/	2022-08-30 01:38:14 -05:00
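A rough Python rendering of what the "After" query in #13575 computes may help: take the host part of each joined member's `state_key`, group by host, and order the hosts by the smallest event depth. This is only a sketch and not Synapse code. The function name, the `(state_key, depth)` row shape, and the `limit` argument (the "Going further" idea) are hypothetical, and the pattern `@[^:]*:(.*)$` is an assumed reading of the regex used in the query.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed pattern: everything after the first ":" of "@localpart:domain".
HOST_RE = re.compile(r"@[^:]*:(.*)$")


def hosts_ordered_by_min_depth(
    rows: Iterable[Tuple[str, int]], limit: Optional[int] = None
) -> List[str]:
    """Group joined members by host and order hosts by their lowest depth.

    `rows` is a hypothetical list of (state_key, depth) pairs, standing in for
    the join of `current_state_events` and `events` in the SQL query.
    """
    min_depth: Dict[str, int] = {}
    for state_key, depth in rows:
        match = HOST_RE.search(state_key)
        if match is None:
            continue  # not a user ID, skip it
        host = match.group(1)
        if host not in min_depth or depth < min_depth[host]:
            min_depth[host] = depth
    # Equivalent of GROUP BY host ORDER BY min(depth) ASC.
    ordered = sorted(min_depth, key=lambda host: min_depth[host])
    # A slice with limit=None returns every host.
    return ordered[:limit]
```

With a small `limit`, a caller would only see the hosts that have been in the room longest, which is the set the commit message suggests is worth trying first.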
Eric Eastwood	d58615c82c	Directly lookup local membership instead of getting all members in a room first (`get_users_in_room` mis-use) (#13608 ) See https://github.com/matrix-org/synapse/pull/13575#discussion_r953023755	2022-08-24 14:13:12 -05:00
Eric Eastwood	b93bd95e8a	When loading current ids, sort by `stream_id` to avoid incorrect overwrite and avoid errors caused by sorting alphabetically by instance name which can be `null` (#13585 ) When loading current ids, sort by stream ID so that we don't overwrite the `current_position` of an instance to a lower stream ID than we're actually at ([discussion](https://github.com/matrix-org/synapse/pull/13585#discussion_r951795379)). Previously, it sorted alphabetically by instance name which can be `null` and throw errors but more importantly, accomplishes nothing. Fixes the following startup error which is why I started looking into this area: ``` $ poetry run synapse_homeserver --config-path homeserver.yaml ************************************************************** Error during initialisation: '<' not supported between instances of 'NoneType' and 'str' There may be more information in the logs. ************************************************************** ``` Somehow my database ended up looking like the following, notice the `instance_name` is `null` in the db, and we can't sort `NoneType` things. Another question is why do we see the `instance_name` as `null` sometimes instead of `master` in monolith mode? ``` $ psql synapse synapse=# SELECT * FROM stream_positions; stream_name | instance_name | stream_id -----------------+---------------+----------- account_data | master | 1242 events | master | 1787 to_device | master | 58 presence_stream | master | 485638 receipts | master | 341 backfill | master | -139106 (6 rows) synapse=# SELECT instance_name, stream_id FROM receipts_linearized; instance_name | stream_id ---------------+----------- | 211 | 3 | 4 | 212 | 213 | 224 | 228 | 164 | 313 | 253 | 38 | 321 | 324 | 189 | 192 | 193 | 194 | 195 | 197 | 198 | 275 | 79 | 339 | 340 | 82 | 341 | 84 | 85 | 91 | 119 ```	2022-08-24 12:53:46 -05:00
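The failure mode described in #13585 can be reproduced in a few lines of plain Python. The sketch below is not the Synapse implementation; the function name and the `(instance_name, stream_id)` row shape are assumptions made for illustration. It shows that sorting whole rows breaks once an `instance_name` is `None`, while sorting on `stream_id` alone both works and leaves each instance at its highest position.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row shape: (instance_name, stream_id).
Row = Tuple[Optional[str], int]


def load_current_positions(rows: List[Row]) -> Dict[Optional[str], int]:
    """Return the latest stream ID seen for each instance."""
    positions: Dict[Optional[str], int] = {}
    # Sort on the integer stream_id only. Integers always compare, and
    # ascending order means the last write per instance is the highest ID.
    for instance_name, stream_id in sorted(rows, key=lambda row: row[1]):
        positions[instance_name] = stream_id
    return positions


def naive_sort(rows: List[Row]) -> List[Row]:
    """Sort whole tuples, which compares instance names first.

    Raises TypeError as soon as a None name meets a str name.
    """
    return sorted(rows)
```

Calling `naive_sort([("master", 5), (None, 3)])` raises a `TypeError` of the same kind as the startup error quoted in the commit message.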
Nick Mills-Barrett	b687010f89	Rewrite get push actions queries (#13597 )	2022-08-24 10:12:51 +01:00
Erik Johnston	05c9c7363b	Fix regression caused by #13573 (#13600 ) Broke in #13573.	2022-08-23 14:14:05 +00:00
Erik Johnston	aec87a0f93	Speed up fetching large numbers of push rules (#13592 )	2022-08-23 13:15:43 +01:00
Nick Mills-Barrett	5e7847dc92	Cache user IDs instead of profile objects (#13573 ) The profile objects are never used and increase cache size significantly.	2022-08-23 09:49:59 +00:00
Quentin Gliech	3dd175b628	`synapse.api.auth.Auth` cleanup: make permission-related methods use `Requester` instead of the `UserID` (#13024 ) Part of #13019 This changes all the permission-related methods to rely on the Requester instead of the UserID. This is a first step towards enabling scoped access tokens at some point, since I expect the Requester to have scope-related information in it. It also changes methods which figure out the user/device/appservice out of the access token to return a Requester instead of something else. This avoids having store-related objects in the method signatures.	2022-08-22 14:17:59 +01:00
Sean Quah	84169a82dc	Avoid blocking lazy-loading `/sync`s during partial joins (#13477 ) Use a state filter or accept partial state in a few places where we request state, to avoid blocking. To make lazy-loading `/sync`s work, we need to provide the memberships of event senders, which are not guaranteed to be in the room state. Instead we dig through auth events for memberships to present to clients. The auth events of an event are guaranteed to contain a passable membership event, otherwise the event would have been rejected. Note that this only covers the common code paths encountered during testing. There has been no exhaustive checking of all sync code paths. Fixes #13146. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-08-18 11:53:02 +01:00
reivilibre	8bdf2bd31e	Fix a bug in the `/event_reports` Admin API which meant that the total count could be larger than the number of results you can actually query for. (#13525 ) Co-authored-by: Brendan Abolivier <babolivier@matrix.org>	2022-08-17 18:08:23 +00:00
Dirk Klimpel	d75512d19e	Add forgotten status to Room Details API (#13503 )	2022-08-17 09:42:01 +00:00
Eric Eastwood	0a4efbc1dd	Instrument the federation/backfill part of `/messages` (#13489 ) Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace. Split out from https://github.com/matrix-org/synapse/pull/13440 Follow-up from https://github.com/matrix-org/synapse/pull/13368 Part of https://github.com/matrix-org/synapse/issues/13356	2022-08-16 12:39:40 -05:00
reivilibre	c3516e9dec	Faster room joins: make `/joined_members` block whilst the room is partial stated. (#13514 )	2022-08-16 13:16:56 +01:00
Erik Johnston	5442891cbc	Make push rules use proper structures. (#13522 ) This improves load times for push rules: | Version | Time per user | Time for 1k users | | -------------------- | ------------- | ----------------- | | Before | 138 µs | 138 ms | | Now (with custom) | 2.11 µs | 2.11 ms | | Now (without custom) | 49.7 ns | 0.05 ms | This therefore has a large impact on send times for rooms with large numbers of local users in the room.	2022-08-16 12:22:17 +01:00
Eric Eastwood	344a2f767c	Instrument `FederationStateIdsServlet` - `/state_ids` (#13499 ) Instrument FederationStateIdsServlet - `/state_ids` so it's easier to follow what's going on in Jaeger when viewing a trace.	2022-08-15 19:41:23 +01:00
David Robertson	19e5d44886	Revert "Update locked versions of mypy and mypy-zope (#13521 )" This reverts commit `f383b9b3ec`. Other PRs were seeing mypy failures that looked to be related to mypy-zope. Confusingly, we didn't see this on #13521. Revert this for now and investigate later.	2022-08-15 14:51:05 +01:00
Patrick Cloke	46bd7f4ed9	Clarifications for event push action processing. (#13485 ) * Clarifies comments. * Fixes an erroneous comment (about return type) added in #13455 (`ec24813220`). * Clarifies the name of a variable. * Simplifies logic of pulling out the latest join for the requesting user.	2022-08-15 09:33:17 -04:00
David Robertson	f383b9b3ec	Update locked versions of mypy and mypy-zope (#13521 )	2022-08-15 11:32:30 +01:00
Richard van der Hoff	507c1cb330	Update the rejected state of events during resync (#13459 ) Events can be un-rejected or newly-rejected during resync, so ensure we update the database and caches when that happens.	2022-08-11 10:42:24 +00:00
Šimon Brandner	ab18441573	Support stable identifiers for MSC2285: private read receipts. (#13273 ) This adds support for the stable identifiers of MSC2285 while continuing to support the unstable identifiers behind the configuration flag. These will be removed in a future version.	2022-08-05 11:09:33 -04:00
Erik Johnston	b6a6bb4027	Add comments about how event push actions are stored. (#13445 )	2022-08-04 19:38:08 +00:00
Patrick Cloke	ec24813220	Improve comments (& avoid a duplicate query) in push actions processing. (#13455 ) * Adds docstrings and inline comments. * Formats SQL queries using triple quoted strings. * Minor formatting changes. * Avoid fetching `event_push_summary_stream_ordering` multiple times in the same transactions.	2022-08-04 19:24:44 +00:00
Richard van der Hoff	96d92156d0	Update type of `EventContext.rejected` (#13460 )	2022-08-04 17:45:01 +01:00
Nick Mills-Barrett	41320a0554	Optimise async get event lookups (#13435 ) Still maintains local in memory lookup optimisation, but does any external lookup as part of the deferred that prevents duplicate lookups for the same event at once. This makes the assumption that fetching from an external cache is a non-zero load operation.	2022-08-04 15:49:55 +01:00
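The idea in #13435, keeping a local in-memory lookup and sharing a single in-flight external lookup between concurrent callers, can be sketched with `asyncio`. Synapse itself uses Twisted deferreds, so this is a swapped-in illustration rather than the actual code; the class name, the `fetch` callable, and the dict-shaped events are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class DedupingEventFetcher:
    """Share one external lookup between concurrent requests for an event."""

    def __init__(self, fetch: Callable[[str], Awaitable[dict]]) -> None:
        self._fetch = fetch  # hypothetical external cache or database lookup
        self._local: Dict[str, dict] = {}  # in-memory results
        self._in_flight: Dict[str, "asyncio.Future[dict]"] = {}

    async def get(self, event_id: str) -> dict:
        # Fast path: already in local memory, no external work at all.
        if event_id in self._local:
            return self._local[event_id]
        pending = self._in_flight.get(event_id)
        if pending is None:
            # First caller starts the lookup; later callers reuse it.
            pending = asyncio.ensure_future(self._load(event_id))
            self._in_flight[event_id] = pending
        # shield() keeps one cancelled caller from cancelling the shared lookup.
        return await asyncio.shield(pending)

    async def _load(self, event_id: str) -> dict:
        try:
            event = await self._fetch(event_id)
            self._local[event_id] = event
            return event
        finally:
            # Always clear the in-flight entry, even if the fetch failed.
            self._in_flight.pop(event_id, None)
```

Placing the external lookup inside the shared task is what prevents duplicate fetches: any caller arriving while the lookup is running waits on the same future instead of starting its own.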
Eric Eastwood	92d21faf12	Instrument `/messages` for understandable traces in Jaeger (#13368 ) In Jaeger: - Before: huge list of uncategorized database calls - After: nice and collapsible into units of work	2022-08-03 10:57:38 -05:00
Sean Quah	224d792dd7	Refactor `_resolve_state_at_missing_prevs` to return an `EventContext` (#13404 ) Previously, `_resolve_state_at_missing_prevs` returned the resolved state before an event and a partial state flag. These were unwieldy to carry around and would only ever be used to build an event context. Build the event context directly instead. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-08-01 13:53:56 +01:00
Richard van der Hoff	23768ccb4d	Faster joins: fix rejected events becoming un-rejected during resync (#13413 ) Make sure that we re-check the auth rules during state resync, otherwise rejected events get un-rejected.	2022-08-01 11:20:05 +01:00
Šimon Brandner	583f22780f	Use stable prefixes for MSC3827: filtering of `/publicRooms` by room type (#13370 ) Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>	2022-07-27 19:46:57 +01:00
Richard van der Hoff	ca3db044a3	Fix infinite loop in partial-state resync (#13353 ) Make sure that we only pull out events from the db once they have no prev-events with partial state.	2022-07-26 11:47:31 +00:00
Sean Quah	335ebb21cc	Faster room joins: avoid blocking when pulling events with missing prevs (#13355 ) Avoid blocking on full state in `_resolve_state_at_missing_prevs` and return a new flag indicating whether the resolved state is partial. Thread that flag around so that it makes it into the event context. Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>	2022-07-26 12:39:23 +01:00
Patrick Cloke	8b603299bf	Remove unused argument for get_relations_for_event. (#13383 )	2022-07-26 07:19:20 -04:00
Erik Johnston	43adf2521c	Refactor presence so we can prune user in room caches (#13313 ) See #10826 and #10786 for context as to why we had to disable pruning on those caches. Now that `get_users_who_share_room_with_user` is called frequently only for presence, we just need to make calls to it less frequent and then we can remove the various levels of caching that is going on.	2022-07-25 09:21:06 +00:00
Erik Johnston	0b87eb8e0c	Make DictionaryCache have better expiry properties (#13292 )	2022-07-21 17:13:44 +01:00
David Robertson	34949ead1f	Track DB txn times w/ two counters, not histogram (#13342 )	2022-07-21 13:23:05 +01:00
Patrick Cloke	50122754c8	Add missing types to opentracing. (#13345 ) After this change `synapse.logging` is fully typed.	2022-07-21 12:01:52 +00:00
Nick Mills-Barrett	190f49d8ab	Use cache store remove base slaved (#13329 ) This comes from two identical definitions in each of the base stores, and means the base slaved store is now empty and can be removed.	2022-07-21 11:51:30 +01:00
Eric Eastwood	0f971ca68e	Update `get_pdu` to return the original, pristine `EventBase` (#13320 ) Update `get_pdu` to return the untouched, pristine `EventBase` as it was originally seen over federation (no metadata added). Previously, we returned the same `event` reference that we stored in the cache which downstream code modified in place and added metadata like setting it as an `outlier` and essentially poisoned our cache. Now we always return a copy of the `event` so the original can stay pristine in our cache and re-used for the next cache call. Split out from https://github.com/matrix-org/synapse/pull/13205 As discussed at: - https://github.com/matrix-org/synapse/pull/13205#discussion_r918365746 - https://github.com/matrix-org/synapse/pull/13205#discussion_r918366125 Related to https://github.com/matrix-org/synapse/issues/12584. This PR doesn't fix that issue because it hits [`get_event` which exists from the local database before it tries to `get_pdu`](`7864f33e28/synapse/federation/federation_client.py (L581-L594)`).	2022-07-20 15:58:51 -05:00
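The cache-poisoning problem described in #13320 is a general one: if a cache hands out the same object it stores, any caller that mutates the result also mutates the cached copy. A minimal sketch of the fix, returning a copy on every read, is below. It assumes plain dicts in place of `EventBase` and a hypothetical `PristineCache` class, and it is not how Synapse's `get_pdu` cache is actually written.

```python
import copy
from typing import Dict, Optional


class PristineCache:
    """Cache that never exposes its stored objects to callers."""

    def __init__(self) -> None:
        self._events: Dict[str, dict] = {}

    def put(self, event_id: str, event: dict) -> None:
        # Copy on the way in so later changes by the caller are not cached.
        self._events[event_id] = copy.deepcopy(event)

    def get(self, event_id: str) -> Optional[dict]:
        event = self._events.get(event_id)
        if event is None:
            return None
        # Copy on the way out so downstream code can add metadata freely.
        return copy.deepcopy(event)
```

A caller can then mark its own copy as an outlier, for example, without the next cache hit seeing that flag.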
Patrick Cloke	a6895dd576	Add type annotations to `trace` decorator. (#13328 ) Functions that are decorated with `trace` are now properly typed and the type hints for them are fixed.	2022-07-19 14:14:30 -04:00
Erik Johnston	de70b25e84	Reduce memory usage of state group cache (#13323 )	2022-07-19 14:40:37 +01:00
David Robertson	b977867358	Rate limit joins per-room (#13276 )	2022-07-19 11:45:17 +00:00
Nick Mills-Barrett	2ee0b6ef4b	Safe async event cache (#13308 ) Fix race conditions in the async cache invalidation logic, by separating the async & local invalidation calls and ensuring any async call is executed first. Signed off by Nick @ Beeper (@Fizzadar).	2022-07-19 11:25:29 +00:00
Shay	7864f33e28	Increase batch size of `bulk_get_push_rules` and `_get_joined_profiles_from_event_ids`. (#13300 )	2022-07-18 13:15:23 -07:00
Shay	15edf23626	Improve performance of query `_get_subset_users_in_room_with_profiles` (#13299 )	2022-07-18 12:35:45 -07:00
Erik Johnston	f721f1baba	Revert "Make all `process_replication_rows` methods async (#13304 )" (#13312 ) This reverts commit `5d4028f217`.	2022-07-18 14:28:14 +01:00
Nick Mills-Barrett	6785b0f39d	Use READ COMMITTED isolation level when purging rooms (#12942 ) To close: #10294. Signed off by Nick @ Beeper.	2022-07-18 14:17:24 +01:00
Nick Mills-Barrett	5d4028f217	Make all `process_replication_rows` methods async (#13304 ) More prep work for asynchronous caching, also makes all process_replication_rows methods consistent (presence handler already is so). Signed off by Nick @ Beeper (@Fizzadar)	2022-07-17 22:19:43 +01:00
Erik Johnston	0731e0829c	Don't pull out the full state when storing state (#13274 )	2022-07-15 12:59:45 +00:00
Richard van der Hoff	b116d3ce00	Bg update to populate new `events` table columns (#13215 ) These columns were added back in Synapse 1.52, and have been populated for new events since then. It's now (beyond) time to back-populate them for existing events.	2022-07-15 12:47:26 +01:00
Erik Johnston	7be954f59b	Fix a bug which could lead to incorrect state (#13278 ) There are two fixes here: 1. A long-standing bug where we incorrectly calculated `delta_ids`; and 2. A bug introduced in #13267 where we got current state incorrect.	2022-07-15 11:06:41 +00:00

4600 Commits