forked-synapse

mirror of https://mau.dev/maunium/synapse.git synced 2024-10-01 01:36:05 -04:00

Author	SHA1	Message	Date
David Robertson	f2d2481e56	Require SQLite >= 3.27.0 (#13760 )	2022-09-09 11:14:10 +01:00
Dirk Klimpel	f799eac7ea	Add timestamp to user's consent (#13741 ) Co-authored-by: reivilibre <olivier@librepush.net>	2022-09-08 15:41:48 +00:00
Sean Quah	906cead9ca	Update docstrings to explain the impact of partial state (#13750 ) Update the docstrings for `get_users_in_room` and `get_current_hosts_in_room` to explain the impact of partial state. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-09-08 15:55:29 +01:00
Sean Quah	89e8b98b65	Avoid raising errors due to malformed IDs in `get_current_hosts_in_room` (#13748 ) Handle malformed user IDs with no colons in `get_current_hosts_in_room`. It's not currently possible for a malformed user ID to join a room, so this error would never be hit. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-09-08 15:55:03 +01:00
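The malformed-ID handling described in the commit above can be sketched roughly as follows. This is an illustration only, not Synapse's code: `domain_from_user_id` and `hosts_in_room` are hypothetical names, and the sketch assumes that an ID without a colon should simply be skipped.

```python
from typing import Iterable, Optional, Set


def domain_from_user_id(user_id: str) -> Optional[str]:
    """Return the server name of a Matrix user ID, or None if it is malformed.

    Hypothetical helper: user IDs look like "@localpart:server.name", so we
    split on the first colon and treat a missing colon as malformed.
    """
    _, sep, domain = user_id.partition(":")
    if not sep or not domain:
        return None
    return domain


def hosts_in_room(user_ids: Iterable[str]) -> Set[str]:
    """Collect the distinct hosts for a set of joined users, skipping bad IDs."""
    hosts: Set[str] = set()
    for user_id in user_ids:
        domain = domain_from_user_id(user_id)
        if domain is not None:
            hosts.add(domain)
    return hosts
```

Skipping the malformed entry, rather than raising, keeps one bad row from failing the whole host calculation.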
Eric Eastwood	d4d3249ded	Instrument `get_metadata_for_events` for tracing (#13730 ) When backfilling, `_get_state_ids_after_missing_prev_event` calls [`get_metadata_for_events`](`26bc26586b/synapse/handlers/federation_event.py (L1133)`). For `#matrix:matrix.org`, it's called with 77k `state_events` which means 77 calls to the database and takes 28 seconds.	2022-09-07 11:41:52 -05:00
reivilibre	d3d9ca156e	Cancel the processing of key query requests when they time out. (#13680 )	2022-09-07 12:03:32 +01:00
reivilibre	c2fe48a6ff	Rename the `EventFormatVersions` enum values so that they line up with room version numbers. (#13706 )	2022-09-07 11:08:20 +01:00
Patrick Cloke	48a5c47a9f	Add a schema delta to drop unstable private read receipts. (#13692 ) Otherwise they'll be leaked due to the filtering code only respecting the stable identifiers for private read receipts.	2022-09-01 14:57:47 -04:00
Erik Johnston	9d2823ab70	Cache `is_partial_state_room` (#13693 ) Fixes #13613.	2022-09-01 16:07:01 +01:00
Šimon Brandner	0e99f07952	Remove support for unstable private read receipts (#13653 ) Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>	2022-09-01 13:31:54 +01:00
Nick Mills-Barrett	42b11d5565	Remove cached wrap on `_get_joined_users_from_context` method (#13569 ) The method doesn't actually do any data fetching and the method that does, `_get_joined_profile_from_event_id`, has its own cache. Signed off by Nick @ Beeper (@Fizzadar).	2022-08-31 12:19:39 +01:00
David Robertson	a160406d24	Fix admin List Room API return type on sqlite (#13509 )	2022-08-31 10:38:16 +00:00
Eric Eastwood	92c5817e34	Give the correct next event when the message timestamps are the same - MSC3030 (#13658 ) Discovered while working on https://github.com/matrix-org/synapse/pull/13589 and I had all the messages at the same timestamp in the tests. Part of https://github.com/matrix-org/matrix-spec-proposals/pull/3030 Complement tests: https://github.com/matrix-org/complement/pull/457	2022-08-30 14:50:06 -05:00
Shay	20c76cecb9	Drop unused column `application_services_state.last_txn` (#13627 )	2022-08-30 10:29:16 -07:00
Patrick Cloke	20df96a7a7	Speed up inserting `event_push_actions_staging`. (#13634 ) By using `execute_values` instead of `execute_batch`.	2022-08-30 07:12:48 -04:00
Eric Eastwood	51d732db3b	Optimize how we calculate `likely_domains` during backfill (#13575 ) Optimize how we calculate `likely_domains` during backfill because I've seen this take 17s in production just to `get_current_state` which is used to `get_domains_from_state` (see case [2. Loading tons of events in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)). There are 3 ways we currently calculate hosts that are in the room: 1. `get_current_state` -> `get_domains_from_state` - Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill` - This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳 2. `get_current_hosts_in_room` - Used for other federation things like sending read receipts and typing indicators 3. `get_hosts_in_room_at_events` - Used when pushing out events over federation to other servers in the `_process_event_queue_loop` Fix https://github.com/matrix-org/synapse/issues/13626 Part of https://github.com/matrix-org/synapse/issues/13356 Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh) ### Query performance #### Before The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s): ``` synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; Time: 16035.612 ms (00:16.036) synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; Time: 4243.237 ms (00:04.243) ``` But what about `get_current_hosts_in_room`: When there are 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added). 
But it takes 930ms after a Postgres restart or 390ms if running back to back to back. ```sh $ psql synapse synapse=# \timing on synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$')) FROM current_state_events WHERE type = 'm.room.member' AND membership = 'join' AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; count ------- 4130 (1 row) Time: 13181.598 ms (00:13.182) synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; count ------- 80814 synapse=# SELECT COUNT(*) from current_state_events; count --------- 8162847 synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') ); pg_size_pretty ---------------- 4702 MB ``` #### After I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart. After a Postgres restart, it takes 330ms and running back to back takes 260ms. ```sh $ psql synapse synapse=# \timing on Timing is on. synapse=# SELECT substring(c.state_key FROM '@[^:]*:(.*)$') as host FROM current_state_events c /* Get the depth of the event from the events table */ INNER JOIN events AS e USING (event_id) WHERE c.type = 'm.room.member' AND c.membership = 'join' AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org' GROUP BY host ORDER BY min(e.depth) ASC; Time: 333.800 ms ``` #### Going further To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up. 
Another thing we can do is optimize the query to use an index skip scan: - https://wiki.postgresql.org/wiki/Loose_indexscan - Index Skip Scan, https://commitfest.postgresql.org/37/1741/ - https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/	2022-08-30 01:38:14 -05:00
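The "After" query in the commit above groups joined members by host and orders the hosts by the minimum depth of their membership events. A minimal in-memory sketch of that grouping is shown below. It is not Synapse's implementation: `hosts_ordered_by_min_depth` is a hypothetical name, and the input is assumed to be `(user_id, depth)` pairs for joined members.

```python
from typing import Dict, Iterable, List, Tuple


def hosts_ordered_by_min_depth(joined_members: Iterable[Tuple[str, int]]) -> List[str]:
    """Mirror `GROUP BY host ORDER BY min(e.depth) ASC` in plain Python.

    `joined_members` yields (user_id, depth of the membership event).
    """
    min_depth: Dict[str, int] = {}
    for user_id, depth in joined_members:
        # Equivalent in spirit to the regex in the query: everything after
        # the first colon is treated as the host.
        _, sep, host = user_id.partition(":")
        if not sep or not host:
            continue
        if host not in min_depth or depth < min_depth[host]:
            min_depth[host] = depth
    # Hosts whose earliest join has the lowest depth come first.
    return sorted(min_depth, key=lambda host: min_depth[host])
```

A `limit` parameter, as suggested under "Going further", would amount to slicing the returned list.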
Eric Eastwood	d58615c82c	Directly lookup local membership instead of getting all members in a room first (`get_users_in_room` mis-use) (#13608 ) See https://github.com/matrix-org/synapse/pull/13575#discussion_r953023755	2022-08-24 14:13:12 -05:00
Eric Eastwood	b93bd95e8a	When loading current ids, sort by `stream_id` to avoid incorrect overwrite and avoid errors caused by sorting alphabetical instance name which can be `null` (#13585 ) When loading current ids, sort by stream ID so that we don't overwrite the `current_position` of an instance to a lower stream ID than we're actually at ([discussion](https://github.com/matrix-org/synapse/pull/13585#discussion_r951795379)). Previously, it sorted alphabetically by instance name which can be `null` and throw errors but more importantly, accomplishes nothing. Fixes the following startup error which is why I started looking into this area: ``` $ poetry run synapse_homeserver --config-path homeserver.yaml ************************************************************** Error during initialisation: '<' not supported between instances of 'NoneType' and 'str' There may be more information in the logs. ************************************************************** ``` Somehow my database ended up looking like the following, notice the `instance_name` is `null` in the db, and we can't sort `NoneType` things. Another question is why do we see the `instance_name` as `null` sometimes instead of `master` in monolith mode? ``` $ psql synapse synapse=# SELECT * FROM stream_positions; stream_name \| instance_name \| stream_id -----------------+---------------+----------- account_data \| master \| 1242 events \| master \| 1787 to_device \| master \| 58 presence_stream \| master \| 485638 receipts \| master \| 341 backfill \| master \| -139106 (6 rows) synapse=# SELECT instance_name, stream_id FROM receipts_linearized; instance_name \| stream_id ---------------+----------- \| 211 \| 3 \| 4 \| 212 \| 213 \| 224 \| 228 \| 164 \| 313 \| 253 \| 38 \| 321 \| 324 \| 189 \| 192 \| 193 \| 194 \| 195 \| 197 \| 198 \| 275 \| 79 \| 339 \| 340 \| 82 \| 341 \| 84 \| 85 \| 91 \| 119 ```	2022-08-24 12:53:46 -05:00
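The sorting problem in the commit above can be reproduced in a few lines. The sketch below is an assumption-laden illustration rather than Synapse's loader: `load_current_positions` is a hypothetical name, and rows are assumed to be `(instance_name, stream_id)` tuples where the name may be `None`.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[Optional[str], int]  # (instance_name, stream_id); the name may be None


def load_current_positions(rows: Iterable[Row]) -> Dict[Optional[str], int]:
    """Record the latest position per instance.

    Sorting on the stream ID alone never compares instance names, so a None
    name cannot raise, and the highest stream ID for each instance is written
    last and therefore wins.
    """
    positions: Dict[Optional[str], int] = {}
    for instance_name, stream_id in sorted(rows, key=lambda row: row[1]):
        positions[instance_name] = stream_id
    return positions
```

By contrast, a plain `sorted(rows)` on tuples that start with a mix of `None` and `str` raises the `'<' not supported between instances of 'NoneType' and 'str'` error quoted in the commit message.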
Nick Mills-Barrett	b687010f89	Rewrite get push actions queries (#13597 )	2022-08-24 10:12:51 +01:00
Erik Johnston	05c9c7363b	Fix regression caused by #13573 (#13600 ) Broke in #13573.	2022-08-23 14:14:05 +00:00
Erik Johnston	aec87a0f93	Speed up fetching large numbers of push rules (#13592 )	2022-08-23 13:15:43 +01:00
Nick Mills-Barrett	5e7847dc92	Cache user IDs instead of profile objects (#13573 ) The profile objects are never used and increase cache size significantly.	2022-08-23 09:49:59 +00:00
Quentin Gliech	3dd175b628	`synapse.api.auth.Auth` cleanup: make permission-related methods use `Requester` instead of the `UserID` (#13024 ) Part of #13019 This changes all the permission-related methods to rely on the Requester instead of the UserID. This is a first step towards enabling scoped access tokens at some point, since I expect the Requester to have scope-related information in it. It also changes methods which figure out the user/device/appservice out of the access token to return a Requester instead of something else. This avoids having store-related objects in the method signatures.	2022-08-22 14:17:59 +01:00
Sean Quah	84169a82dc	Avoid blocking lazy-loading `/sync`s during partial joins (#13477 ) Use a state filter or accept partial state in a few places where we request state, to avoid blocking. To make lazy-loading `/sync`s work, we need to provide the memberships of event senders, which are not guaranteed to be in the room state. Instead we dig through auth events for memberships to present to clients. The auth events of an event are guaranteed to contain a passable membership event, otherwise the event would have been rejected. Note that this only covers the common code paths encountered during testing. There has been no exhaustive checking of all sync code paths. Fixes #13146. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-08-18 11:53:02 +01:00
reivilibre	8bdf2bd31e	Fix a bug in the `/event_reports` Admin API which meant that the total count could be larger than the number of results you can actually query for. (#13525 ) Co-authored-by: Brendan Abolivier <babolivier@matrix.org>	2022-08-17 18:08:23 +00:00
Dirk Klimpel	d75512d19e	Add forgotten status to Room Details API (#13503 )	2022-08-17 09:42:01 +00:00
Eric Eastwood	0a4efbc1dd	Instrument the federation/backfill part of `/messages` (#13489 ) Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace. Split out from https://github.com/matrix-org/synapse/pull/13440 Follow-up from https://github.com/matrix-org/synapse/pull/13368 Part of https://github.com/matrix-org/synapse/issues/13356	2022-08-16 12:39:40 -05:00
reivilibre	c3516e9dec	Faster room joins: make `/joined_members` block whilst the room is partial stated. (#13514 )	2022-08-16 13:16:56 +01:00
Erik Johnston	5442891cbc	Make push rules use proper structures. (#13522 ) This improves load times for push rules: \| Version \| Time per user \| Time for 1k users \| \| -------------------- \| ------------- \| ----------------- \| \| Before \| 138 µs \| 138ms \| \| Now (with custom) \| 2.11 µs \| 2.11ms \| \| Now (without custom) \| 49.7 ns \| 0.05 ms \| This therefore has a large impact on send times for rooms with large numbers of local users in the room.	2022-08-16 12:22:17 +01:00
Eric Eastwood	344a2f767c	Instrument `FederationStateIdsServlet` - `/state_ids` (#13499 ) Instrument FederationStateIdsServlet - `/state_ids` so it's easier to follow what's going on in Jaeger when viewing a trace.	2022-08-15 19:41:23 +01:00
David Robertson	19e5d44886	Revert "Update locked versions of mypy and mypy-zope (#13521 )" This reverts commit `f383b9b3ec`. Other PRs were seeing mypy failures that looked to be related to mypy-zope. Confusingly, we didn't see this on #13521. Revert this for now and investigate later.	2022-08-15 14:51:05 +01:00
Patrick Cloke	46bd7f4ed9	Clarifications for event push action processing. (#13485 ) * Clarifies comments. * Fixes an erroneous comment (about return type) added in #13455 (`ec24813220`). * Clarifies the name of a variable. * Simplifies logic of pulling out the latest join for the requesting user.	2022-08-15 09:33:17 -04:00
David Robertson	f383b9b3ec	Update locked versions of mypy and mypy-zope (#13521 )	2022-08-15 11:32:30 +01:00
Richard van der Hoff	507c1cb330	Update the rejected state of events during resync (#13459 ) Events can be un-rejected or newly-rejected during resync, so ensure we update the database and caches when that happens.	2022-08-11 10:42:24 +00:00
Šimon Brandner	ab18441573	Support stable identifiers for MSC2285: private read receipts. (#13273 ) This adds support for the stable identifiers of MSC2285 while continuing to support the unstable identifiers behind the configuration flag. These will be removed in a future version.	2022-08-05 11:09:33 -04:00
Erik Johnston	b6a6bb4027	Add comments about how event push actions are stored. (#13445 )	2022-08-04 19:38:08 +00:00
Patrick Cloke	ec24813220	Improve comments (& avoid a duplicate query) in push actions processing. (#13455 ) * Adds docstrings and inline comments. * Formats SQL queries using triple quoted strings. * Minor formatting changes. * Avoid fetching `event_push_summary_stream_ordering` multiple times in the same transactions.	2022-08-04 19:24:44 +00:00
Richard van der Hoff	96d92156d0	Update type of `EventContext.rejected` (#13460 )	2022-08-04 17:45:01 +01:00
Nick Mills-Barrett	41320a0554	Optimise async get event lookups (#13435 ) Still maintains local in memory lookup optimisation, but does any external lookup as part of the deferred that prevents duplicate lookups for the same event at once. This makes the assumption that fetching from an external cache is a non-zero load operation.	2022-08-04 15:49:55 +01:00
Eric Eastwood	92d21faf12	Instrument `/messages` for understandable traces in Jaeger (#13368 ) In Jaeger: - Before: huge list of uncategorized database calls - After: nice and collapsible into units of work	2022-08-03 10:57:38 -05:00
Sean Quah	224d792dd7	Refactor `_resolve_state_at_missing_prevs` to return an `EventContext` (#13404 ) Previously, `_resolve_state_at_missing_prevs` returned the resolved state before an event and a partial state flag. These were unwieldy to carry around and would only ever be used to build an event context. Build the event context directly instead. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-08-01 13:53:56 +01:00
Richard van der Hoff	23768ccb4d	Faster joins: fix rejected events becoming un-rejected during resync (#13413 ) Make sure that we re-check the auth rules during state resync, otherwise rejected events get un-rejected.	2022-08-01 11:20:05 +01:00
Šimon Brandner	583f22780f	Use stable prefixes for MSC3827: filtering of `/publicRooms` by room type (#13370 ) Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>	2022-07-27 19:46:57 +01:00
Richard van der Hoff	ca3db044a3	Fix infinite loop in partial-state resync (#13353 ) Make sure that we only pull out events from the db once they have no prev-events with partial state.	2022-07-26 11:47:31 +00:00
Sean Quah	335ebb21cc	Faster room joins: avoid blocking when pulling events with missing prevs (#13355 ) Avoid blocking on full state in `_resolve_state_at_missing_prevs` and return a new flag indicating whether the resolved state is partial. Thread that flag around so that it makes it into the event context. Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>	2022-07-26 12:39:23 +01:00
Patrick Cloke	8b603299bf	Remove unused argument for get_relations_for_event. (#13383 )	2022-07-26 07:19:20 -04:00
Erik Johnston	43adf2521c	Refactor presence so we can prune user in room caches (#13313 ) See #10826 and #10786 for context as to why we had to disable pruning on those caches. Now that `get_users_who_share_room_with_user` is called frequently only for presence, we just need to make calls to it less frequent and then we can remove the various levels of caching that is going on.	2022-07-25 09:21:06 +00:00
Erik Johnston	0b87eb8e0c	Make DictionaryCache have better expiry properties (#13292 )	2022-07-21 17:13:44 +01:00
David Robertson	34949ead1f	Track DB txn times w/ two counters, not histogram (#13342 )	2022-07-21 13:23:05 +01:00
Patrick Cloke	50122754c8	Add missing types to opentracing. (#13345 ) After this change `synapse.logging` is fully typed.	2022-07-21 12:01:52 +00:00
Nick Mills-Barrett	190f49d8ab	Use cache store remove base slaved (#13329 ) This comes from two identical definitions in each of the base stores, and means the base slaved store is now empty and can be removed.	2022-07-21 11:51:30 +01:00
Eric Eastwood	0f971ca68e	Update `get_pdu` to return the original, pristine `EventBase` (#13320 ) Update `get_pdu` to return the untouched, pristine `EventBase` as it was originally seen over federation (no metadata added). Previously, we returned the same `event` reference that we stored in the cache which downstream code modified in place and added metadata like setting it as an `outlier` and essentially poisoned our cache. Now we always return a copy of the `event` so the original can stay pristine in our cache and re-used for the next cache call. Split out from https://github.com/matrix-org/synapse/pull/13205 As discussed at: - https://github.com/matrix-org/synapse/pull/13205#discussion_r918365746 - https://github.com/matrix-org/synapse/pull/13205#discussion_r918366125 Related to https://github.com/matrix-org/synapse/issues/12584. This PR doesn't fix that issue because it hits [`get_event` which exists from the local database before it tries to `get_pdu`](`7864f33e28/synapse/federation/federation_client.py (L581-L594)`).	2022-07-20 15:58:51 -05:00
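The cache-poisoning problem described in the commit above comes from handing out the same object that is stored in the cache. A minimal sketch of the "always return a copy" idea follows. `PristineCache` is a hypothetical class and events are modelled as plain dicts, so this is not how Synapse's `EventBase` or `get_pdu` cache is actually implemented.

```python
import copy
from typing import Any, Dict, Optional


class PristineCache:
    """Toy cache that never exposes its stored objects to callers."""

    def __init__(self) -> None:
        self._events: Dict[str, Dict[str, Any]] = {}

    def put(self, event_id: str, event: Dict[str, Any]) -> None:
        # Store a private copy so later changes by the caller don't leak in.
        self._events[event_id] = copy.deepcopy(event)

    def get(self, event_id: str) -> Optional[Dict[str, Any]]:
        event = self._events.get(event_id)
        if event is None:
            return None
        # Hand out a copy: downstream code may add metadata (for example
        # marking the event as an outlier) without touching the cached one.
        return copy.deepcopy(event)
```

The trade-off is an extra copy per lookup in exchange for a cached value that stays as it was first seen.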
Patrick Cloke	a6895dd576	Add type annotations to `trace` decorator. (#13328 ) Functions that are decorated with `trace` are now properly typed and the type hints for them are fixed.	2022-07-19 14:14:30 -04:00
Erik Johnston	de70b25e84	Reduce memory usage of state group cache (#13323 )	2022-07-19 14:40:37 +01:00
David Robertson	b977867358	Rate limit joins per-room (#13276 )	2022-07-19 11:45:17 +00:00
Nick Mills-Barrett	2ee0b6ef4b	Safe async event cache (#13308 ) Fix race conditions in the async cache invalidation logic, by separating the async & local invalidation calls and ensuring any async call is executed first. Signed off by Nick @ Beeper (@Fizzadar).	2022-07-19 11:25:29 +00:00
Shay	7864f33e28	Increase batch size of `bulk_get_push_rules` and `_get_joined_profiles_from_event_ids`. (#13300 )	2022-07-18 13:15:23 -07:00
Shay	15edf23626	Improve performance of query `_get_subset_users_in_room_with_profiles` (#13299 )	2022-07-18 12:35:45 -07:00
Erik Johnston	f721f1baba	Revert "Make all `process_replication_rows` methods async (#13304 )" (#13312 ) This reverts commit `5d4028f217`.	2022-07-18 14:28:14 +01:00
Nick Mills-Barrett	6785b0f39d	Use READ COMMITTED isolation level when purging rooms (#12942 ) To close: #10294. Signed off by Nick @ Beeper.	2022-07-18 14:17:24 +01:00
Nick Mills-Barrett	5d4028f217	Make all `process_replication_rows` methods async (#13304 ) More prep work for asynchronous caching, also makes all process_replication_rows methods consistent (presence handler already is so). Signed off by Nick @ Beeper (@Fizzadar)	2022-07-17 22:19:43 +01:00
Erik Johnston	0731e0829c	Don't pull out the full state when storing state (#13274 )	2022-07-15 12:59:45 +00:00
Richard van der Hoff	b116d3ce00	Bg update to populate new `events` table columns (#13215 ) These columns were added back in Synapse 1.52, and have been populated for new events since then. It's now (beyond) time to back-populate them for existing events.	2022-07-15 12:47:26 +01:00
Erik Johnston	7be954f59b	Fix a bug which could lead to incorrect state (#13278 ) There are two fixes here: 1. A long-standing bug where we incorrectly calculated `delta_ids`; and 2. A bug introduced in #13267 where we got current state incorrect.	2022-07-15 11:06:41 +00:00
Nick Mills-Barrett	cc21a431f3	Async get event cache prep (#13242 ) Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches. Signed off by Nick @ Beeper (@Fizzadar)	2022-07-15 09:30:46 +00:00
Nick Mills-Barrett	21eeacc995	Federation Sender & Appservice Pusher Stream Optimisations (#13251 ) * Replace `get_new_events_for_appservice` with `get_all_new_events_stream` The functions were near identical and this brings the AS worker closer to the way federation senders work which can allow for multiple workers to handle AS traffic. * Pull received TS alongside events when processing the stream This avoids an extra query -per event- when both federation sender and appservice pusher process events.	2022-07-15 09:36:56 +01:00
Erik Johnston	0ca4172b5d	Don't pull out state in `compute_event_context` for unconflicted state (#13267 )	2022-07-14 13:57:02 +00:00
Patrick Cloke	4db7862e0f	Drop unused tables from groups/communities. (#12967 ) These tables have been unused since Synapse v1.61.0, although schema version 72 was added in Synapse v1.62.0.	2022-07-13 09:55:14 -04:00
Richard van der Hoff	a366b75b72	Drop unused table `event_reference_hashes` (#13218 ) This is unused since Synapse 1.60.0 (#12679). It's time for it to go.	2022-07-12 18:52:06 +00:00
Sean Quah	3f178332d6	Log the stack when waiting for an entire room to be un-partial stated (#13257 ) The stack is already logged when waiting for an event to be un-partial stated. Log the stack for rooms as well, to aid in debugging.	2022-07-12 18:57:38 +01:00
andrew do	2d82cdafd2	expose whether a room is a space in the Admin API (#13208 )	2022-07-12 15:30:53 +01:00
Erik Johnston	e5716b631c	Don't pull out the full state when calculating push actions (#13078 )	2022-07-11 20:08:39 +00:00
Erik Johnston	f1711e1f5c	Remove delay when rotating event push actions (#13211 ) We want to be as up to date as possible, and sleeping doesn't help here and can mean we fall behind.	2022-07-11 16:51:30 +01:00
Erik Johnston	757bc0caef	Fix notification count after a highlighted message (#13223 ) Fixes #13196 Broke by #13005	2022-07-08 14:00:29 +01:00
Sean Quah	1391a76cd2	Faster room joins: fix race in recalculation of current room state (#13151 ) Bounce recalculation of current state to the correct event persister and move recalculation of current state into the event persistence queue, to avoid concurrent updates to a room's current state. Also give recalculation of a room's current state a real stream ordering. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-07-07 12:19:31 +00:00
Erik Johnston	a0f51b059c	Fix bug where we failed to delete old push actions (#13194 ) This happened if we encountered a stream ordering in `event_push_actions` that had more rows than the batch size of the delete, as if we don't delete any rows in an iteration then the next time round we get the exact same stream ordering and get stuck.	2022-07-06 12:09:19 +01:00
Sean Quah	68db233f0c	Handle race between persisting an event and un-partial stating a room (#13100 ) Whenever we want to persist an event, we first compute an event context, which includes the state at the event and a flag indicating whether the state is partial. After a lot of processing, we finally try to store the event in the database, which can fail for partial state events when the containing room has been un-partial stated in the meantime. We detect the race as a foreign key constraint failure in the data store layer and turn it into a special `PartialStateConflictError` exception, which makes its way up to the method in which we computed the event context. To make things difficult, the exception needs to cross a replication request: `/fed_send_events` for events coming over federation and `/send_event` for events from clients. We transport the `PartialStateConflictError` as a `409 Conflict` over replication and turn `409`s back into `PartialStateConflictError`s on the worker making the request. All client events go through `EventCreationHandler.handle_new_client_event`, which is called in a lot of places. Instead of trying to update all the code which creates client events, we turn the `PartialStateConflictError` into a `429 Too Many Requests` in `EventCreationHandler.handle_new_client_event` and hope that clients take it as a hint to retry their request. On the federation event side, there are 7 places which compute event contexts. 4 of them use outlier event contexts: `FederationEventHandler._auth_and_persist_outliers_inner`, `FederationHandler.do_knock`, `FederationHandler.on_invite_request` and `FederationHandler.do_remotely_reject_invite`. These events won't have the partial state flag, so we do not need to do anything for them. The remaining 3 paths which create events are `FederationEventHandler.process_remote_join`, `FederationEventHandler.on_send_membership_event` and `FederationEventHandler._process_received_pdu`. 
We can't experience the race in `process_remote_join`, unless we're handling an additional join into a partial state room, which currently blocks, so we make no attempt to handle it correctly. `on_send_membership_event` is only called by `FederationServer._on_send_membership_event`, so we catch the `PartialStateConflictError` there and retry just once. `_process_received_pdu` is called by `on_receive_pdu` for incoming events and `_process_pulled_event` for backfill. The latter should never try to persist partial state events, so we ignore it. We catch the `PartialStateConflictError` in `on_receive_pdu` and retry just once. Referring to the graph of code paths in https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648 may make the above make more sense. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-07-05 16:12:52 +01:00
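The "catch the `PartialStateConflictError` and retry just once" pattern from the commit above might look roughly like the sketch below. It is synchronous and heavily simplified. The exception class is a local stand-in rather than Synapse's, and `persist_with_single_retry`, `compute_context` and `persist` are hypothetical names.

```python
from typing import Callable, TypeVar

C = TypeVar("C")
R = TypeVar("R")


class PartialStateConflictError(Exception):
    """Local stand-in: the room was un-partial stated while we were working."""


def persist_with_single_retry(
    compute_context: Callable[[], C],
    persist: Callable[[C], R],
) -> R:
    """Compute an event context and persist; on a conflict, redo both once."""
    context = compute_context()
    try:
        return persist(context)
    except PartialStateConflictError:
        # The partial-state flag in the old context is stale. Recompute the
        # context and try again; a second conflict propagates to the caller.
        context = compute_context()
        return persist(context)
```

A single retry fits the situation the commit message describes, where the conflict arises because the room was un-partial stated between computing the context and storing the event.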
Erik Johnston	578a5e24a9	Use upserts for updating `event_push_summary` (#13153 )	2022-07-05 13:51:04 +01:00
Andrew Morgan	6180e1bc4b	Merge tag 'v1.62.0rc3' into develop Synapse 1.62.0rc3 (2022-07-04) ============================== Bugfixes -------- - Update the version of the [ldap3 plugin](https://github.com/matrix-org/matrix-synapse-ldap3/) included in the `matrixdotorg/synapse` DockerHub images and the Debian packages hosted on `packages.matrix.org` to 0.2.1. This fixes [a bug](https://github.com/matrix-org/matrix-synapse-ldap3/pull/163) with usernames containing uppercase characters. ([\#13156](https://github.com/matrix-org/synapse/issues/13156)) - Fix a bug introduced in Synapse 1.62.0rc1 affecting unread counts for users on small servers. ([\#13168](https://github.com/matrix-org/synapse/issues/13168))	2022-07-04 17:35:06 +01:00
Erik Johnston	723ce73d02	Fix stuck notification counts on small servers (#13168 )	2022-07-04 16:02:21 +01:00
Patrick Cloke	b0366853ca	Merge remote-tracking branch 'origin/release-v1.62' into develop	2022-06-30 13:27:24 -04:00
Erik Johnston	dbce28b2f1	Fix unread counts on large servers (#13140 )	2022-06-30 15:08:40 +01:00
Erik Johnston	a3a05c812d	Add index to help delete old push actions (#13141 )	2022-06-30 14:05:49 +00:00
Brendan Abolivier	4d3b8fb23f	Don't actually one-line the SQL statements we send to the DB (#13129 )	2022-06-30 10:43:24 +02:00
Šimon Brandner	13e359aec8	Implement MSC3827: Filtering of `/publicRooms` by room type (#13031 ) Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>	2022-06-29 17:12:45 +00:00
Erik Johnston	92a0c18ef0	Improve performance of getting unread counts in rooms (#13119 )	2022-06-29 10:32:38 +00:00
Erik Johnston	7469824d58	Fix serialization errors when rotating notifications (#13118 )	2022-06-28 13:13:44 +01:00
reivilibre	b26cbe3d45	Fix type error that made its way onto develop (#13098 ) * Fix type error introduced accidentally by #13045 * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>	2022-06-17 13:05:27 +01:00
Erik Johnston	5ef05c70c3	Rotate notifications more frequently (#13096 )	2022-06-17 10:58:00 +00:00
Erik Johnston	5099b5ecc7	Use new `device_list_changes_in_room` table when getting device list changes (#13045 )	2022-06-17 11:42:03 +01:00
Erik Johnston	8ceed5e6b5	Add desc to `get_earliest_token_for_stats` (#13085 )	2022-06-16 17:50:46 +00:00
David Robertson	97e9fbe1b2	Type annotations in `synapse.databases.main.devices` (#13025 ) Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>	2022-06-15 15:20:04 +00:00
Erik Johnston	0d1d3e0708	Speed up `get_unread_event_push_actions_by_room` (#13005 ) Fixes #11887 hopefully. The core change here is that `event_push_summary` now holds a summary of counts up until a much more recent point, meaning that the range of rows we need to count in `event_push_actions` is much smaller. This needs two major changes: 1. When we get a receipt we need to recalculate `event_push_summary` rather than just delete it 2. The logic for deleting `event_push_actions` is now divorced from calculating `event_push_summary`. In future it would be good to calculate `event_push_summary` while we persist a new event (it should just be a case of adding one to the relevant rows in `event_push_summary`), as that will further simplify the get counts logic and remove the need for us to periodically update `event_push_summary` in a background job.	2022-06-15 15:17:14 +00:00
Richard van der Hoff	75fb10ee45	Clean up schema for `event_edges` (#12893 ) * Remove redundant references to `event_edges.room_id` We don't need to care about the room_id here, because we are already checking the event id. * Clean up the event_edges table We make a number of changes to `event_edges`: * We give the `room_id` and `is_state` columns defaults (null and false respectively) so that we can stop populating them. * We drop any rows that have `is_state` set true - they should no longer exist. * We drop any rows that do not exist in `events` - these should not exist either. * We drop the old unique constraint on all the columns, which wasn't much use. * We create a new unique index on `(event_id, prev_event_id)`. * We add a foreign key constraint to `events`. These happen rather differently depending on whether we are on Postgres or SQLite. For SQLite, we just rebuild the whole table, copying only the rows we want to keep. For Postgres, we try to do things in the background as much as possible. * Stop populating `event_edges.room_id` and `is_state` We can just rely on the defaults.	2022-06-15 12:29:42 +01:00
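The end state described in the commit above (column defaults, a unique index on `(event_id, prev_event_id)`, and a foreign key to `events`) can be illustrated with an in-memory SQLite database. This sketch is not the actual schema delta: the column types, the index name and the `make_db` helper are assumptions made for the example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a toy `events` / `event_edges` pair with the described constraints."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE events (event_id TEXT PRIMARY KEY);
        CREATE TABLE event_edges (
            event_id TEXT NOT NULL REFERENCES events (event_id),
            prev_event_id TEXT NOT NULL,
            room_id TEXT DEFAULT NULL,   -- no longer populated
            is_state BOOLEAN DEFAULT 0   -- no longer populated
        );
        CREATE UNIQUE INDEX event_edges_event_id_prev_event_id_idx
            ON event_edges (event_id, prev_event_id);
        """
    )
    return conn
```

With this in place, an insert only needs to supply `event_id` and `prev_event_id`. A duplicate edge, or an edge for an event missing from `events`, raises `sqlite3.IntegrityError`.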
Patrick Cloke	5f4ecf759d	Rename delta to apply in the proper schema version. (#13050 )	2022-06-14 14:34:04 +00:00
Patrick Cloke	53b77b203a	Replace noop background updates with DELETE. (#12954 ) Removes the `register_noop_background_update` and deletes the background updates directly in a delta file.	2022-06-13 14:06:27 -04:00
Richard van der Hoff	7c6b2204d1	Faster joins: add issue links to the TODOs (#13004 ) ... to help us keep track of these things	2022-06-09 10:13:03 +00:00
Nick Mills-Barrett	04ca3a52f6	Use READ COMMITTED isolation level when inserting read receipts (#12957 )	2022-06-09 09:44:16 +01:00
David Robertson	586bfc6dc0	Use dummy fallback engines if imports fail (#12979 )	2022-06-07 17:33:55 +01:00
Patrick Cloke	d2fd7f7b5c	Fix a stale comment in get_room_version_id_txn. (#12969 )	2022-06-07 07:44:31 -04:00


4664 Commits