synapse-product/synapse/storage/databases/main
Sean Quah 68db233f0c
Handle race between persisting an event and un-partial stating a room (#13100)
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.
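
As a rough illustration of the detection step, the sketch below uses an in-memory SQLite database rather than Synapse's real storage layer. The table names, the `make_db` and `store_partial_state_event` helpers, and the exception class defined here are stand-ins for illustration only, not the actual implementation.

```python
import sqlite3


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: a partial-state event row references its room.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE partial_state_rooms (room_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE partial_state_events ("
        " event_id TEXT PRIMARY KEY,"
        " room_id TEXT NOT NULL REFERENCES partial_state_rooms (room_id))"
    )
    return conn


def store_partial_state_event(
    conn: sqlite3.Connection, event_id: str, room_id: str
) -> None:
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO partial_state_events (event_id, room_id) VALUES (?, ?)",
                (event_id, room_id),
            )
    except sqlite3.IntegrityError as e:
        # Assumption: in this sketch the only expected integrity failure is the
        # missing room row. Real code would need to tell a foreign key failure
        # apart from other constraint violations.
        raise PartialStateConflictError(
            f"room {room_id} no longer has partial state"
        ) from e
```

If the room's row is deleted between computing the event context and the insert, the insert fails and the caller sees `PartialStateConflictError` instead of a raw database error.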

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.
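
The round trip over replication might look roughly like the following. This is a minimal sketch with no real HTTP involved: `replication_endpoint` and `replication_client` are hypothetical helpers, and the exception class is a local stand-in.

```python
from http import HTTPStatus
from typing import Callable, Dict, Tuple


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


def replication_endpoint(persist: Callable[[], None]) -> Tuple[int, Dict[str, str]]:
    """Side receiving the replication request: turn the exception into a 409."""
    try:
        persist()
    except PartialStateConflictError as e:
        return HTTPStatus.CONFLICT, {"error": str(e)}
    return HTTPStatus.OK, {}


def replication_client(status: int, body: Dict[str, str]) -> None:
    """Worker making the request: turn a 409 back into the exception."""
    if status == HTTPStatus.CONFLICT:
        raise PartialStateConflictError(body.get("error", ""))
    if status != HTTPStatus.OK:
        raise RuntimeError(f"unexpected replication response: {status}")
```

The caller on the requesting worker can then handle `PartialStateConflictError` as if persistence had happened in-process.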

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.
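
A simplified picture of that conversion, assuming a hypothetical `ClientRequestError` type that carries an HTTP status code (Synapse's own error classes are not reproduced here):

```python
from http import HTTPStatus
from typing import Callable, TypeVar

T = TypeVar("T")


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


class ClientRequestError(Exception):
    """Hypothetical error type that is rendered to the client as an HTTP status."""

    def __init__(self, code: int, msg: str) -> None:
        super().__init__(msg)
        self.code = code


def handle_new_client_event_sketch(persist: Callable[[], T]) -> T:
    # One central catch, so that callers creating client events need no changes.
    try:
        return persist()
    except PartialStateConflictError as e:
        raise ClientRequestError(
            HTTPStatus.TOO_MANY_REQUESTS,
            "Room state changed while handling the request; please retry",
        ) from e
```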

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for them.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.
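
The retry-once pattern used in both places can be sketched as below. `run_with_one_retry` is a hypothetical helper, not a function from the codebase, and the exception class is a local stand-in.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


async def run_with_one_retry(attempt: Callable[[], Awaitable[T]]) -> T:
    """Run `attempt`, which computes the event context and persists the event.

    On a conflict the room has since been un-partial stated, so a second
    attempt recomputes the context with full state and should not hit the
    same race. A second failure is left to propagate.
    """
    try:
        return await attempt()
    except PartialStateConflictError:
        pass
    return await attempt()
```

A single retry is enough under the assumption that a room is un-partial stated only once, so the second attempt cannot race with the same transition again.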

Referring to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may help clarify the above.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
__init__.py Improve performance of getting unread counts in rooms (#13119) 2022-06-29 10:32:38 +00:00
account_data.py Use the ignored_users table to test event visibility & sync. (#12225) 2022-03-15 14:06:05 -04:00
appservice.py Remove code which updates application_services_state.last_txn (#12680) 2022-05-17 11:07:18 +01:00
cache.py Add index to cache invalidations (#12747) 2022-05-17 09:34:59 +00:00
censor_events.py Type hint the constructors of the data store classes (#11555) 2021-12-13 17:05:00 +00:00
client_ips.py Optimise _update_client_ips_batch_txn to batch together database operations. (#12252) 2022-04-08 15:29:13 +01:00
deviceinbox.py Replace noop background updates with DELETE. (#12954) 2022-06-13 14:06:27 -04:00
devices.py Fix type error that made its way onto develop (#13098) 2022-06-17 13:05:27 +01:00
directory.py Replace uses of simple_insert_many with simple_insert_many_values. (#11742) 2022-01-13 19:44:18 -05:00
e2e_room_keys.py Add StreamKeyType class and replace string literals with constants (#12567) 2022-05-16 15:35:31 +00:00
end_to_end_keys.py Add support for MSC3202: sending one-time key counts and fallback key usage states to Application Services. (#11617) 2022-02-24 17:55:45 +00:00
event_federation.py Stop reading from event_edges.room_id. (#12914) 2022-05-31 13:51:49 +01:00
event_push_actions.py Use upserts for updating event_push_summary (#13153) 2022-07-05 13:51:04 +01:00
events_bg_updates.py Clean up schema for event_edges (#12893) 2022-06-15 12:29:42 +01:00
events_forward_extremities.py Fix returned count of delete extremities admin API (#12496) 2022-04-19 16:49:45 +01:00
events_worker.py Stop reading from event_edges.room_id. (#12914) 2022-05-31 13:51:49 +01:00
events.py Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
filtering.py Improve type hints in storage classes. (#11652) 2021-12-29 13:04:28 +00:00
keys.py Add some type hints to datastore (#12485) 2022-04-27 13:05:00 +01:00
lock.py LockStore: fix acquiring a lock via LockStore.try_acquire_lock (#12832) 2022-05-30 09:41:13 +01:00
media_repository.py Replace noop background updates with DELETE. (#12954) 2022-06-13 14:06:27 -04:00
metrics.py Add some type hints to datastore (#12717) 2022-05-17 15:29:06 +01:00
monthly_active_users.py Add storage and module API methods to get monthly active users and their appservices (#12838) 2022-05-27 10:25:57 +00:00
openid.py Add type hints to some storage classes (#11307) 2021-11-11 08:47:31 -05:00
presence.py Reduce DB load of /sync when using presence (#12885) 2022-05-31 13:01:05 +00:00
profile.py Remove remaining pieces of groups code. (#12966) 2022-06-06 13:20:05 -04:00
purge_events.py Clean up schema for event_edges (#12893) 2022-06-15 12:29:42 +01:00
push_rule.py Speed up get_unread_event_push_actions_by_room (#13005) 2022-06-15 15:17:14 +00:00
pusher.py Fix invite notifications for users without pushers (#12840) 2022-05-30 13:14:43 +02:00
receipts.py Fix serialization errors when rotating notifications (#13118) 2022-06-28 13:13:44 +01:00
registration.py Replace noop background updates with DELETE. (#12954) 2022-06-13 14:06:27 -04:00
rejections.py Remove redundant "coding: utf-8" lines (#9786) 2021-04-14 15:34:27 +01:00
relations.py Fix caching behavior for relations push rules. (#12859) 2022-05-25 07:49:54 -04:00
room_batch.py Correct type hint for room_batch.py (#11310) 2021-11-11 16:49:28 +00:00
room.py Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
roommember.py Reduce state pulled from DB due to sending typing and receipts over federation (#12964) 2022-06-06 16:46:11 +01:00
search.py Replace noop background updates with DELETE. (#12954) 2022-06-13 14:06:27 -04:00
session.py Run pyupgrade --py37-plus --keep-percent-format on Synapse (#11685) 2022-01-05 09:53:05 -08:00
signatures.py remove constantly lib use and switch to enums. (#12624) 2022-05-04 11:26:11 +00:00
state_deltas.py Wait for lazy join to complete when getting current state (#12872) 2022-06-01 16:02:53 +01:00
state.py Faster joins: add issue links to the TODOs (#13004) 2022-06-09 10:13:03 +00:00
stats.py Implement MSC3827: Filtering of /publicRooms by room type (#13031) 2022-06-29 17:12:45 +00:00
stream.py Improve performance of getting unread counts in rooms (#13119) 2022-06-29 10:32:38 +00:00
tags.py Add some type hints to datastore (#12423) 2022-04-12 11:54:00 +01:00
transactions.py Add admin API to get a list of federated rooms (#11658) 2022-01-25 16:11:40 +00:00
ui_auth.py Add some type hints to datastore (#12485) 2022-04-27 13:05:00 +01:00
user_directory.py Wait for lazy join to complete when getting current state (#12872) 2022-06-01 16:02:53 +01:00
user_erasure_store.py Annotations for user_erasure_store (#11313) 2021-11-11 19:22:19 +00:00