forked-synapse/synapse
Sean Quah 68db233f0c
Handle race between persisting an event and un-partial stating a room (#13100)
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.
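
A minimal sketch of that detection step, using the standard-library `sqlite3` module rather than Synapse's storage layer. The table layout and the `persist_partial_state_event` and `make_db` helpers are assumptions made for illustration, and the exception class is a local stand-in for the real one. For brevity the sketch treats any integrity error as the conflict; a real implementation would want to check that the failure is specifically the foreign key.

```python
import sqlite3


class PartialStateConflictError(Exception):
    """Local stand-in for the exception named in the commit message."""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE partial_state_rooms (room_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE partial_state_events ("
        " event_id TEXT PRIMARY KEY,"
        " room_id TEXT NOT NULL REFERENCES partial_state_rooms (room_id))"
    )
    return conn


def persist_partial_state_event(
    conn: sqlite3.Connection, event_id: str, room_id: str
) -> None:
    """Store a partial state event, or raise if its room is no longer partial."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO partial_state_events (event_id, room_id)"
                " VALUES (?, ?)",
                (event_id, room_id),
            )
    except sqlite3.IntegrityError as e:
        # The room row disappeared between computing the context and the
        # insert, so the foreign key check failed.
        raise PartialStateConflictError(room_id) from e
```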

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.
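
The round trip can be sketched as follows. This is not the replication code itself: `serve_replication_request` and `call_replication` are hypothetical helpers, responses are modelled as plain `(status, body)` tuples, and the exception class is a local stand-in.

```python
from http import HTTPStatus
from typing import Callable, Tuple


class PartialStateConflictError(Exception):
    """Local stand-in for the exception named in the commit message."""


def serve_replication_request(persist: Callable[[], dict]) -> Tuple[int, dict]:
    """Receiving side: run `persist` and map the conflict to a 409 response."""
    try:
        return HTTPStatus.OK, persist()
    except PartialStateConflictError as e:
        return HTTPStatus.CONFLICT, {"error": str(e)}


def call_replication(send_request: Callable[[], Tuple[int, dict]]) -> dict:
    """Requesting side: turn a 409 response back into the exception."""
    code, body = send_request()
    if code == HTTPStatus.CONFLICT:
        raise PartialStateConflictError(body.get("error", ""))
    return body
```

With this shape, the caller on the requesting worker sees the same exception type whether the event was persisted locally or on another process.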

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.
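
As a rough illustration of that translation, the sketch below wraps a persistence coroutine and converts the conflict into an error carrying status 429. `HttpError` and the `persist_event` parameter are assumptions for the example, not Synapse's actual error classes or signatures.

```python
import asyncio
from typing import Awaitable, Callable


class PartialStateConflictError(Exception):
    """Local stand-in for the exception named in the commit message."""


class HttpError(Exception):
    """Hypothetical client-facing error carrying an HTTP status code."""

    def __init__(self, code: int, msg: str) -> None:
        super().__init__(msg)
        self.code = code


async def handle_new_client_event(
    persist_event: Callable[[], Awaitable[str]]
) -> str:
    """Persist a client event, surfacing the race as 429 Too Many Requests."""
    try:
        return await persist_event()
    except PartialStateConflictError as e:
        # Callers are not updated to retry; the client is asked to instead.
        raise HttpError(429, "Room state changed; please retry") from e
```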

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for them.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.
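
The "retry just once" pattern used in both places can be sketched like this. `retry_once_on_conflict` and its `attempt` parameter are hypothetical names; the assumption is that each attempt recomputes the event context from scratch, so the second attempt sees the room's full state.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class PartialStateConflictError(Exception):
    """Local stand-in for the exception named in the commit message."""


async def retry_once_on_conflict(attempt: Callable[[], Awaitable[T]]) -> T:
    """Run `attempt`, retrying exactly one more time on a conflict."""
    try:
        return await attempt()
    except PartialStateConflictError:
        # The room was un-partial stated while we were working. Try again
        # with a fresh context; a second conflict propagates to the caller.
        return await attempt()
```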

Referring to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may help make sense of the above.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
_scripts Merge remote-tracking branch 'origin/release-v1.62' into develop 2022-06-30 13:27:24 -04:00
api Implement MSC3827: Filtering of /publicRooms by room type (#13031) 2022-06-29 17:12:45 +00:00
app Improve startup times in Complement test runs against workers, particularly in CPU-constrained environments. (#13127) 2022-06-30 11:58:12 +00:00
appservice Remove remaining bits of groups code. (#12936) 2022-06-01 09:41:25 -04:00
config Allow dependency errors to pass through (#13113) 2022-06-30 19:48:04 +02:00
crypto Bump black and click versions (#12320) 2022-04-05 11:04:28 +01:00
events Uniformize spam-checker API, part 4: port other spam-checker callbacks to return Union[Allow, Codes]. (#12857) 2022-06-13 18:16:16 +00:00
federation Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
handlers Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
http Add Cross-Origin-Resource-Policy header to thumbnail and download media endpoints (#12944) 2022-06-27 14:44:05 +01:00
logging More type hints for synapse.logging (#13103) 2022-06-30 13:05:06 +00:00
metrics Fix Synapse git info missing in version strings (#12973) 2022-06-07 15:24:11 +01:00
module_api Uniformize spam-checker API, part 4: port other spam-checker callbacks to return Union[Allow, Codes]. (#12857) 2022-06-13 18:16:16 +00:00
push Update MSC3786 implementation: Check the state_key (#12939) 2022-06-27 20:28:34 +01:00
replication Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
res Fix Jinja templating error when generating thumbnail URLs. (#12510) 2022-04-20 12:03:03 -04:00
rest Extra validation for rest/client/account_data (#13148) 2022-07-01 11:04:56 +01:00
server_notices Decouple synapse.api.auth_blocking.AuthBlocking from synapse.api.auth.Auth. (#13021) 2022-06-14 09:51:15 +01:00
spam_checker_api Fix import in module_api module and docs on the new check_event_for_spam signature (#12918) 2022-05-31 12:04:53 +02:00
state Skip waiting for full state for incoming events (#13144) 2022-07-01 10:19:27 +01:00
static Display an error page during failure of fallback UIA. (#10561) 2021-08-18 08:13:35 -04:00
storage Handle race between persisting an event and un-partial stating a room (#13100) 2022-07-05 16:12:52 +01:00
streams Rework stream token to stop caring about groups. (#12897) 2022-05-31 07:42:50 -04:00
util Type tests.utils (#13028) 2022-07-05 15:13:47 +01:00
__init__.py Fix Synapse git info missing in version strings (#12973) 2022-06-07 15:24:11 +01:00
event_auth.py Fix inconsistencies in event validation (#13088) 2022-06-17 16:30:59 +01:00
notifier.py Reduce the amount of state we pull from the DB (#12811) 2022-06-06 09:24:12 +01:00
py.typed Mark Module API error imports as re-exported and mark Synapse as containing type annotations (#11054) 2021-10-13 08:42:41 +01:00
server.py Move the "email unsubscribe" resource, refactor the macaroon generator & simplify the access token verification logic. (#12986) 2022-06-14 09:12:08 -04:00
types.py Fix destination_is errors seen in sentry. (#13041) 2022-06-14 18:28:26 +01:00
visibility.py Fix 404 on /sync when the last event is a redaction of an unknown/purged event (#12905) 2022-06-01 11:29:51 +00:00