forked-synapse/synapse/handlers
Sean Quah 68db233f0c
Handle race between persisting an event and un-partial stating a room (#13100)
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.
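
As an illustration of the detection step described above, here is a minimal sketch using the standard-library `sqlite3` module. It is not Synapse's storage code: the table layout, `make_db` and `persist_partial_state_event` are hypothetical, and matching on SQLite's error text is an assumption made only for this demo.

```python
import sqlite3


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: a partial-state event row must reference a room
    # that is still marked as having partial state.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE partial_state_rooms (room_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE partial_state_events ("
        " event_id TEXT PRIMARY KEY,"
        " room_id TEXT NOT NULL REFERENCES partial_state_rooms(room_id))"
    )
    return conn


def persist_partial_state_event(
    conn: sqlite3.Connection, event_id: str, room_id: str
) -> None:
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO partial_state_events VALUES (?, ?)",
                (event_id, room_id),
            )
    except sqlite3.IntegrityError as e:
        # Only a foreign key failure means the room row has gone away,
        # i.e. the room was un-partial stated in the meantime.
        if "FOREIGN KEY" in str(e):
            raise PartialStateConflictError(
                f"{room_id} no longer has partial state"
            ) from e
        raise
```

Other integrity errors, such as a duplicate event ID, are re-raised unchanged so that only the race itself is reported as a conflict.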

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.
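
The round trip over replication can be sketched roughly as follows. This is an illustration only: `replication_endpoint` and `replication_client` are hypothetical names, and the `(status, body)` tuples stand in for real HTTP requests and responses.

```python
from http import HTTPStatus
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


def replication_endpoint(persist: Callable[[], None]) -> Response:
    """Runs on the instance that persists events (hypothetical)."""
    try:
        persist()
    except PartialStateConflictError as e:
        # Encode the conflict as a 409 so it can cross the HTTP boundary.
        return HTTPStatus.CONFLICT, {"error": str(e)}
    return HTTPStatus.OK, {}


def replication_client(call_endpoint: Callable[[], Response]) -> Dict[str, str]:
    """Runs on the worker making the request (hypothetical)."""
    status, body = call_endpoint()
    if status == HTTPStatus.CONFLICT:
        # Turn the 409 back into the exception the caller knows how to handle.
        raise PartialStateConflictError(body.get("error", ""))
    return body
```

With this shape, the code that computed the event context sees the same exception whether the event was persisted locally or on another worker.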

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.
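
A rough sketch of that conversion, under stated assumptions: `ClientFacingError` and the free function `handle_new_client_event` below are hypothetical stand-ins, not the real `EventCreationHandler` method or Synapse's error classes.

```python
from http import HTTPStatus
from typing import Callable, TypeVar

T = TypeVar("T")


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


class ClientFacingError(Exception):
    """Hypothetical error type carrying an HTTP status for the client."""

    def __init__(self, code: int, msg: str) -> None:
        super().__init__(msg)
        self.code = code
        self.msg = msg


def handle_new_client_event(persist: Callable[[], T]) -> T:
    # Single choke point for client events: translate the race into a
    # retryable status instead of updating every caller.
    try:
        return persist()
    except PartialStateConflictError as e:
        raise ClientFacingError(
            HTTPStatus.TOO_MANY_REQUESTS, "Room state changed, please retry"
        ) from e
```

The design trade-off is that the server does not retry on the client's behalf; it relies on the client treating the `429` as a prompt to resend.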

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for them.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.
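
The "retry just once" pattern used for both `on_send_membership_event` and `on_receive_pdu` might look roughly like this. `retry_once_on_conflict` is a hypothetical helper written for this sketch; the commit message does not say the real code is factored this way.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


async def retry_once_on_conflict(attempt: Callable[[], Awaitable[T]]) -> T:
    """Run `attempt`; if it hits the race, run it exactly one more time.

    Each attempt is expected to recompute the event context, so the second
    attempt sees the room as no longer having partial state.
    """
    try:
        return await attempt()
    except PartialStateConflictError:
        # A second conflict is not caught here and propagates to the caller.
        return await attempt()
```

One retry should be enough in this model because a room is un-partial stated only once, so the recomputed context no longer carries the partial state flag.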

Referring to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may help make sense of the above.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
..
ui_auth Fix typo in some instances of enable_registration_token_3pid_bypass. (#12639) 2022-05-05 07:11:52 -04:00
__init__.py Remove redundant "coding: utf-8" lines (#9786) 2021-04-14 15:34:27 +01:00
account_data.py Add StreamKeyType class and replace string literals with constants (#12567) 2022-05-16 15:35:31 +00:00
account_validity.py Implement cancellation support/protection for module callbacks (#12568) 2022-05-09 12:31:14 +01:00
account.py Optionally include account validity in MSC3720 account status responses (#12266) 2022-03-24 11:19:41 +01:00
admin.py Rename storage classes (#12913) 2022-05-31 12:17:50 +00:00
appservice.py Additional constants for EDU types. (#12884) 2022-05-27 07:14:36 -04:00
auth.py Move the "email unsubscribe" resource, refactor the macaroon generator & simplify the access token verification logic. (#12986) 2022-06-14 09:12:08 -04:00
cas.py Remove HomeServer.get_datastore() (#12031) 2022-02-23 11:04:02 +00:00
deactivate_account.py Add third_party module callbacks to check if a user can delete a room and deactivate a user (#12028) 2022-03-09 18:23:57 +00:00
device.py Use new device_list_changes_in_room table when getting device list changes (#13045) 2022-06-17 11:42:03 +01:00
devicemessage.py Additional constants for EDU types. (#12884) 2022-05-27 07:14:36 -04:00
directory.py Uniformize spam-checker API, part 4: port other spam-checker callbacks to return Union[Allow, Codes]. (#12857) 2022-06-13 18:16:16 +00:00
e2e_keys.py Additional constants for EDU types. (#12884) 2022-05-27 07:14:36 -04:00
e2e_room_keys.py Refactor and convert Linearizer to async (#12357) 2022-04-05 15:43:52 +01:00
event_auth.py Move some event auth checks out to a different method (#13065) 2022-06-15 19:48:22 +01:00
events.py Rename storage classes (#12913) 2022-05-31 12:17:50 +00:00
federation_event.py Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
federation.py Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
identity.py Use getClientAddress instead of getClientIP. (#12599) 2022-05-04 14:11:21 -04:00
initial_sync.py Reduce the amount of state we pull from the DB (#12811) 2022-06-06 09:24:12 +01:00
message.py Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
oidc.py Move the "email unsubscribe" resource, refactor the macaroon generator & simplify the access token verification logic. (#12986) 2022-06-14 09:12:08 -04:00
pagination.py Rename storage classes (#12913) 2022-05-31 12:17:50 +00:00
password_policy.py Use direct references for some configuration variables (part 3) (#10885) 2021-09-23 07:13:34 -04:00
presence.py Wait for lazy join to complete when getting current state (#12872) 2022-06-01 16:02:53 +01:00
profile.py Remove remaining pieces of groups code. (#12966) 2022-06-06 13:20:05 -04:00
push_rules.py Add a module API to allow modules to edit push rule actions (#12406) 2022-04-27 13:55:33 +00:00
read_marker.py Refactor and convert Linearizer to async (#12357) 2022-04-05 15:43:52 +01:00
receipts.py Additional constants for EDU types. (#12884) 2022-05-27 07:14:36 -04:00
register.py Decouple synapse.api.auth_blocking.AuthBlocking from synapse.api.auth.Auth. (#13021) 2022-06-14 09:51:15 +01:00
relations.py Implement MSC3816, consider the root event for thread participation. (#12766) 2022-06-06 07:18:04 -04:00
room_batch.py Rename storage classes (#12913) 2022-05-31 12:17:50 +00:00
room_list.py Implement MSC3827: Filtering of /publicRooms by room type (#13031) 2022-06-29 17:12:45 +00:00
room_member_worker.py Implement knock feature (#6739) 2021-06-09 19:39:51 +01:00
room_member.py Fix application service not being able to join remote federated room without a profile set (#13131) 2022-07-05 05:56:06 -05:00
room_summary.py Wait for lazy join to complete when getting current state (#12872) 2022-06-01 16:02:53 +01:00
room.py Decouple synapse.api.auth_blocking.AuthBlocking from synapse.api.auth.Auth. (#13021) 2022-06-14 09:51:15 +01:00
saml.py Remove HomeServer.get_datastore() (#12031) 2022-02-23 11:04:02 +00:00
search.py Reduce the amount of state we pull from the DB (#12811) 2022-06-06 09:24:12 +01:00
send_email.py Remove unnecessary ignores due to Twisted upgrade. (#11939) 2022-02-08 09:15:59 -05:00
set_password.py Remove HomeServer.get_datastore() (#12031) 2022-02-23 11:04:02 +00:00
sso.py Use getClientAddress instead of getClientIP. (#12599) 2022-05-04 14:11:21 -04:00
state_deltas.py Remove HomeServer.get_datastore() (#12031) 2022-02-23 11:04:02 +00:00
stats.py Implement MSC3827: Filtering of /publicRooms by room type (#13031) 2022-06-29 17:12:45 +00:00
sync.py Use new device_list_changes_in_room table when getting device list changes (#13045) 2022-06-17 11:42:03 +01:00
typing.py Reduce state pulled from DB due to sending typing and receipts over federation (#12964) 2022-06-06 16:46:11 +01:00
user_directory.py Wait for lazy join to complete when getting current state (#12872) 2022-06-01 16:02:53 +01:00