Commit Graph

39 Commits

Author SHA1 Message Date
Sean Quah
68db233f0c
Handle race between persisting an event and un-partial stating a room ()
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.

Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
Erik Johnston
1e453053cb
Rename storage classes () 2022-05-31 12:17:50 +00:00
Richard van der Hoff
e24ff8ebe3
Remove HomeServer.get_datastore() ()
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of 
2022-02-23 11:04:02 +00:00
Patrick Cloke
63d90f10ec
Add missing type hints to synapse.replication.http. () 2022-02-08 07:44:39 -05:00
Sean Quah
2b82ec425f
Add type hints for most HomeServer parameters () 2021-10-22 18:15:41 +01:00
Jonathan de Jong
4b965c862d
Remove redundant "coding: utf-8" lines ()
Part of 

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Richard van der Hoff
567f88f835
Prep work for removing outlier from internal_metadata ()
* Populate `internal_metadata.outlier` based on `events` table

Rather than relying on `outlier` being in the `internal_metadata` column,
populate it based on the `events.outlier` column.

* Move `outlier` out of InternalMetadata._dict

Ultimately, this will allow us to stop writing it to the database. For now, we
have to grandfather it back in so as to maintain compatibility with older
versions of Synapse.
2021-03-17 12:33:18 +00:00
Erik Johnston
f21e24ffc2
Add ability for access tokens to belong to one user but grant access to another user. ()
We do it this way round so that only the "owner" can delete the access token (i.e. `/logout/all` by the "owner" also deletes that token, but `/logout/all` by the "target user" doesn't).

A future PR will add an API for creating such a token.

When the target user and authenticated entity are different the `Processed request` log line will be logged with a: `{@admin:server as @bob:server} ...`. I'm not convinced by that format (especially since it adds spaces in there, making it harder to use `cut -d ' '` to chop off the start of log lines). Suggestions welcome.
2020-10-29 15:58:44 +00:00
Erik Johnston
b2486f6656
Fix message duplication if something goes wrong after persisting the event ()
Should fix .
2020-10-13 12:07:56 +01:00
Patrick Cloke
8a4a4186de
Simplify super() calls to Python 3 syntax. ()
This converts calls like super(Foo, self) -> super().

Generated with:

    sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
2020-09-18 09:56:44 -04:00
Patrick Cloke
3b415e23a5
Convert replication code to async/await. () 2020-08-03 07:12:55 -04:00
Patrick Cloke
8553f46498
Convert a synapse.events to async/await. () 2020-07-27 13:40:22 -04:00
Erik Johnston
1531b214fc
Add ability to wait for replication streams ()
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).

Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.

People probably want to look at this commit by commit.
2020-05-22 14:21:54 +01:00
Richard van der Hoff
78a15b1f9d
Store room_versions in EventBase objects ()
This is a bit fiddly because it all has to be done on one fell swoop:

* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
2020-03-05 15:46:44 +00:00
Erik Johnston
fa780e9721
Change EventContext to use the Storage class () 2019-12-20 10:32:02 +00:00
Erik Johnston
e577a4b2ad Port replication http server endpoints to async/await 2019-10-29 13:00:51 +00:00
Andrew Morgan
4548d1f87e
Remove unnecessary parentheses around return statements ()
Python will return a tuple whether there are parentheses around the returned values or not.

I'm just sick of my editor complaining about this all over the place :)
2019-08-30 16:28:26 +01:00
Amber Brown
4806651744
Replace returnValue with return () 2019-07-23 23:00:55 +10:00
Amber Brown
32e7c9e7f2
Run Black. () 2019-06-20 19:32:02 +10:00
Erik Johnston
678a92cb56 Replace missed usages of FrozenEvent 2019-01-25 10:32:30 +00:00
Erik Johnston
be6a7e47fa
Revert "Require event format version to parse or create events" 2019-01-25 10:23:51 +00:00
Erik Johnston
e8c9f15397 Replace missed usages of FrozenEvent 2019-01-24 11:14:07 +00:00
Erik Johnston
bebe325e6c Rename POST param to METHOD 2018-08-08 10:36:18 +01:00
Erik Johnston
729b672823 Use new helper base class for ReplicationSendEventRestServlet 2018-07-31 14:32:23 +01:00
Erik Johnston
0faa3223cd Fix missing attributes on workers.
This was missed during the transition from attribute to getter for
getting state from context.
2018-07-23 16:28:00 +01:00
Amber Brown
49af402019 run isort 2018-07-09 16:09:20 +10:00
Amber Brown
77ac14b960
Pass around the reactor explicitly () 2018-06-22 09:37:10 +01:00
Richard van der Hoff
b78395b7fe Refactor ResponseCache usage
Adds a `.wrap` method to ResponseCache which wraps up the boilerplate of a
(get, set) pair, and then use it throughout the codebase.

This will be largely non-functional, but does include the following functional
changes:

* federation_server.on_context_state_request: drops use of _server_linearizer
  which looked redundant and could cause incorrect cache misses by yielding
  between the get and the set.
* RoomListHandler.get_remote_public_room_list(): fixes logcontext leaks
* the wrap function includes some logging. I'm hoping this won't be too noisy
  on production.
2018-04-12 13:02:15 +01:00
Richard van der Hoff
b3384232a0 Add metrics for ResponseCache 2018-04-10 23:14:47 +01:00
Erik Johnston
d0fcc48f9d extra_users is actually a list of UserIDs 2018-03-13 11:20:06 +00:00
Erik Johnston
126b9bf96f Log in the correct places 2018-03-01 12:05:33 +00:00
Erik Johnston
157298f986 Don't do preserve_fn for every request 2018-03-01 11:59:45 +00:00
Erik Johnston
89f90d808a Add some logging 2018-03-01 11:59:16 +00:00
Erik Johnston
8ded8ba2c7 Make repl send_event idempotent and retry on timeouts
If we treated timeouts as failures on the worker we would attempt to
clean up e.g. push actions while the master might still process the
event.
2018-03-01 11:20:34 +00:00
Erik Johnston
6b8604239f Correctly send ratelimit and extra_users params 2018-03-01 10:08:39 +00:00
Erik Johnston
28e973ac11 Calculate push actions on worker 2018-02-28 18:02:30 +00:00
Erik Johnston
106906a65e Don't serialize current state over replication 2018-02-15 13:53:18 +00:00
Erik Johnston
ef344b10e5 Don't log errors propogated from send_event 2018-02-15 11:03:49 +00:00
Erik Johnston
24dd73028a Add replication http endpoint for event sending 2018-02-07 10:32:32 +00:00