forked-synapse

mirror of https://mau.dev/maunium/synapse.git synced 2024-10-01 01:36:05 -04:00

Author	SHA1	Message	Date
Eric Eastwood	9385c41ba4	Fix Prometheus metrics being negative (mixed up start/end) (#13584 ) Fix: - https://github.com/matrix-org/synapse/pull/13535#discussion_r949582508 - https://github.com/matrix-org/synapse/pull/13533#discussion_r949577244	2022-08-23 08:47:30 +01:00
Eric Eastwood	088bcb7ecb	Time how long it takes us to do backfill processing (#13535 )	2022-08-17 10:33:19 +01:00
Eric Eastwood	0a4efbc1dd	Instrument the federation/backfill part of `/messages` (#13489 ) Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace. Split out from https://github.com/matrix-org/synapse/pull/13440 Follow-up from https://github.com/matrix-org/synapse/pull/13368 Part of https://github.com/matrix-org/synapse/issues/13356	2022-08-16 12:39:40 -05:00
Eric Eastwood	344a2f767c	Instrument `FederationStateIdsServlet` - `/state_ids` (#13499 ) Instrument FederationStateIdsServlet - `/state_ids` so it's easier to follow what's going on in Jaeger when viewing a trace.	2022-08-15 19:41:23 +01:00
reivilibre	e9e6aacfbe	Faster Room Joins: prevent Synapse from answering federated join requests for a room which it has not fully joined yet. (#13416 )	2022-08-04 16:27:04 +01:00
Eric Eastwood	92d21faf12	Instrument `/messages` for understandable traces in Jaeger (#13368 ) In Jaeger: - Before: huge list of uncategorized database calls - After: nice and collapsible into units of work	2022-08-03 10:57:38 -05:00
Sean Quah	8d317f6da5	Fix error when out of servers to sync partial state with (#13432 ) so that we raise the intended error instead. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-08-02 12:12:44 +01:00
reivilibre	e17e5c97e0	Faster Room Joins: don't leave a stuck room partial state flag if the join fails. (#13403 )	2022-08-01 16:45:39 +00:00
David Teller	11f811470f	Uniformize spam-checker API, part 5: expand other spam-checker callbacks to return `Tuple[Codes, dict]` (#13044 ) Signed-off-by: David Teller <davidt@element.io> Co-authored-by: Brendan Abolivier <babolivier@matrix.org>	2022-07-11 16:52:10 +00:00
Sean Quah	1391a76cd2	Faster room joins: fix race in recalculation of current room state (#13151 ) Bounce recalculation of current state to the correct event persister and move recalculation of current state into the event persistence queue, to avoid concurrent updates to a room's current state. Also give recalculation of a room's current state a real stream ordering. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-07-07 12:19:31 +00:00
Sean Quah	68db233f0c	Handle race between persisting an event and un-partial stating a room (#13100 ) Whenever we want to persist an event, we first compute an event context, which includes the state at the event and a flag indicating whether the state is partial. After a lot of processing, we finally try to store the event in the database, which can fail for partial state events when the containing room has been un-partial stated in the meantime. We detect the race as a foreign key constraint failure in the data store layer and turn it into a special `PartialStateConflictError` exception, which makes its way up to the method in which we computed the event context. To make things difficult, the exception needs to cross a replication request: `/fed_send_events` for events coming over federation and `/send_event` for events from clients. We transport the `PartialStateConflictError` as a `409 Conflict` over replication and turn `409`s back into `PartialStateConflictError`s on the worker making the request. All client events go through `EventCreationHandler.handle_new_client_event`, which is called in a lot of places. Instead of trying to update all the code which creates client events, we turn the `PartialStateConflictError` into a `429 Too Many Requests` in `EventCreationHandler.handle_new_client_event` and hope that clients take it as a hint to retry their request. On the federation event side, there are 7 places which compute event contexts. 4 of them use outlier event contexts: `FederationEventHandler._auth_and_persist_outliers_inner`, `FederationHandler.do_knock`, `FederationHandler.on_invite_request` and `FederationHandler.do_remotely_reject_invite`. These events won't have the partial state flag, so we do not need to do anything for them. The remaining 3 paths which create events are `FederationEventHandler.process_remote_join`, `FederationEventHandler.on_send_membership_event` and `FederationEventHandler._process_received_pdu`. 
We can't experience the race in `process_remote_join`, unless we're handling an additional join into a partial state room, which currently blocks, so we make no attempt to handle it correctly. `on_send_membership_event` is only called by `FederationServer._on_send_membership_event`, so we catch the `PartialStateConflictError` there and retry just once. `_process_received_pdu` is called by `on_receive_pdu` for incoming events and `_process_pulled_event` for backfill. The latter should never try to persist partial state events, so we ignore it. We catch the `PartialStateConflictError` in `on_receive_pdu` and retry just once. Referring to the graph of code paths in https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648 may make the above make more sense. Signed-off-by: Sean Quah <seanq@matrix.org>	2022-07-05 16:12:52 +01:00
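The "catch the `PartialStateConflictError` and retry just once" handling described in 68db233f0c can be sketched as below. This is a minimal illustration using `asyncio`, not Synapse's code: the exception class here is a local stand-in, and `persist_with_single_retry` and `flaky_persist` are hypothetical names.

```python
import asyncio


class PartialStateConflictError(Exception):
    """Local stand-in for the exception named in the commit message."""


async def persist_with_single_retry(compute_context_and_persist):
    """Run the persist step, retrying exactly once on a partial-state conflict.

    `compute_context_and_persist` is a hypothetical coroutine function that
    recomputes the event context and then tries to store the event.
    """
    try:
        return await compute_context_and_persist()
    except PartialStateConflictError:
        # The room was un-partial-stated in the meantime. Recomputing the
        # context should now see full state, so one more attempt is enough.
        return await compute_context_and_persist()


attempts = []


async def flaky_persist():
    # Simulates the race: the first attempt conflicts, the second succeeds.
    attempts.append(1)
    if len(attempts) == 1:
        raise PartialStateConflictError()
    return "persisted"


result = asyncio.run(persist_with_single_retry(flaky_persist))
```

A second conflict is deliberately not caught, so it propagates to the caller rather than looping.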
David Teller	a164a46038	Uniformize spam-checker API, part 4: port other spam-checker callbacks to return `Union[Allow, Codes]`. (#12857 ) Co-authored-by: Brendan Abolivier <babolivier@matrix.org>	2022-06-13 18:16:16 +00:00
Richard van der Hoff	f68b5e5773	Merge branch 'rav/simplify_event_auth_interface' into develop	2022-06-13 11:34:59 +01:00
Richard van der Hoff	c1b28b8842	Remove redundant `room_version` param from `check_auth_rules_from_context` It's now implied by the room_version property on the event.	2022-06-12 23:13:10 +01:00
Richard van der Hoff	68be42f6b6	Remove `room_version` param from `validate_event_for_room_version` Instead, use the `room_version` property of the event we're validating. The `room_version` was originally added as a parameter somewhere around #4482, but really it's been redundant since #6875 added a `room_version` field to `EventBase`.	2022-06-12 23:13:09 +01:00
Richard van der Hoff	7c6b2204d1	Faster joins: add issue links to the TODOs (#13004 ) ... to help us keep track of these things	2022-06-09 10:13:03 +00:00
Erik Johnston	e3163e2e11	Reduce the amount of state we pull from the DB (#12811 )	2022-06-06 09:24:12 +01:00
Erik Johnston	888a29f412	Wait for lazy join to complete when getting current state (#12872 )	2022-06-01 16:02:53 +01:00
Sean Quah	641908f72f	Faster room joins: Resume state re-syncing after a Synapse restart (#12813 ) Signed-off-by: Sean Quah <seanq@matrix.org>	2022-05-31 15:15:08 +00:00
Sean Quah	2fba1076c5	Faster room joins: Try other destinations when resyncing the state of a partial-state room (#12812 ) Signed-off-by: Sean Quah <seanq@matrix.org>	2022-05-31 15:50:29 +01:00
Erik Johnston	1e453053cb	Rename storage classes (#12913 )	2022-05-31 12:17:50 +00:00
Erik Johnston	4660d9fdcf	Fix up `state_store` naming (#12871 )	2022-05-25 12:59:04 +01:00
Shay	71e8afe34d	Update EventContext `get_current_event_ids` and `get_prev_event_ids` to accept state filters and update calls where possible (#12791 )	2022-05-20 09:54:12 +01:00
Erik Johnston	c72d26c1e1	Refactor `EventContext` (#12689 ) Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing. The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching). One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since it's just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used. Part of #12684	2022-05-10 19:43:13 +00:00
andrew do	01e625513a	remove constantly lib use and switch to enums. (#12624 )	2022-05-04 11:26:11 +00:00
Richard van der Hoff	17d99f758a	Optimise backfill calculation (#12522 ) Try to avoid an OOM by checking fewer extremities. Generally this is a big rewrite of _maybe_backfill, to try and fix some of the TODOs and other problems in it. It's best reviewed commit-by-commit.	2022-04-26 10:27:11 +01:00
Richard van der Hoff	320186319a	Resync state after partial-state join (#12394 ) We work through all the events with partial state, updating the state at each of them. Once it's done, we recalculate the state for the whole room, and then mark the room as having complete state.	2022-04-12 13:23:43 +00:00
Sean Quah	800ba87cc8	Refactor and convert `Linearizer` to async (#12357 ) Refactor and convert `Linearizer` to async. This makes a `Linearizer` cancellation bug easier to fix. Also refactor to use an async context manager, which eliminates an unlikely footgun where code that doesn't immediately use the context manager could forget to release the lock. Signed-off-by: Sean Quah <seanq@element.io>	2022-04-05 15:43:52 +01:00
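The benefit of the async context manager mentioned in 800ba87cc8 (the lock is released when the block exits, so a caller cannot forget to release it) can be illustrated with a per-key lock. This is a sketch built on `asyncio`, not Synapse's Twisted-based `Linearizer`; `KeyedLock`, `worker` and `main` are hypothetical names.

```python
import asyncio
from collections import defaultdict
from contextlib import asynccontextmanager


class KeyedLock:
    """Serialises access per key (illustrative only, not Synapse's Linearizer)."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    @asynccontextmanager
    async def queue(self, key):
        # The lock is acquired on entry and always released on exit,
        # including when the body raises or is cancelled.
        async with self._locks[key]:
            yield


async def worker(lock, name, log):
    async with lock.queue("!room:example.org"):
        log.append(f"{name} start")
        await asyncio.sleep(0)  # yield to the event loop while holding the lock
        log.append(f"{name} end")


async def main():
    lock = KeyedLock()
    log = []
    await asyncio.gather(worker(lock, "a", log), worker(lock, "b", log))
    return log


log = asyncio.run(main())
```

Even though worker `a` yields to the event loop mid-block, worker `b` cannot enter until `a` has left the `async with` block.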
Richard van der Hoff	afa17f0eab	Return a 404 from `/state` for an outlier (#12087 ) * Replace `get_state_for_pdu` with `get_state_ids_for_pdu` and `get_events_as_list`. * Return a 404 from `/state` and `/state_ids` for an outlier	2022-03-21 11:23:32 +00:00
Richard van der Hoff	dc8d825ef2	Skip attempt to get state at backwards-extremities (#12173 ) We don't have the state at a backwards-extremity, so this is never going to do anything useful.	2022-03-09 11:00:48 +00:00
Richard van der Hoff	e2e1d90a5e	Faster joins: persist to database (#12012 ) When we get a partial_state response from send_join, store information in the database about it: * store a record about the room as a whole having partial state, and stash the list of member servers too. * flag the join event itself as having partial state * also, for any new events whose prev-events are partial-stated, note that they will also be partial-stated. We don't yet make any attempt to interpret this data, so API calls (and a bunch of other things) are just going to get incorrect data.	2022-03-01 12:49:54 +00:00
Richard van der Hoff	e24ff8ebe3	Remove `HomeServer.get_datastore()` (#12031 ) The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733	2022-02-23 11:04:02 +00:00
Richard van der Hoff	81364db49b	Run `_handle_queued_pdus` as a background process (#12041 ) ... to ensure it gets a proper log context, mostly.	2022-02-22 13:33:22 +00:00
Richard van der Hoff	3070af4809	remote join processing: get create event from state, not auth_chain (#12039 ) A follow-up to #12005, in which I apparently missed that there are a bunch of other places that assume the create event is in the auth chain.	2022-02-21 19:27:35 +00:00
Eric Eastwood	fef2e792be	Fix historical messages backfilling in random order on remote homeservers (MSC2716) (#11114 ) Fix https://github.com/matrix-org/synapse/issues/11091 Fix https://github.com/matrix-org/synapse/issues/10764 (side-stepping the issue because we no longer have to deal with `fake_prev_event_id`) 1. Made the `/backfill` response return messages in `(depth, stream_ordering)` order (previously only sorted by `depth`) - Technically, it shouldn't really matter how `/backfill` returns things but I'm just trying to make the `stream_ordering` a little more consistent from the origin to the remote homeservers in order to get the order of messages from `/messages` consistent ([sorted by `(topological_ordering, stream_ordering)`](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)). - Even now that we return backfilled messages in order, it still doesn't guarantee the same `stream_ordering` (and more importantly the [`/messages` order](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)) on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between and insert the permalink as the next message in the `stream_order` and totally throw off the sort. - This will be even more the case when we add the [MSC3030 jump to date API endpoint](https://github.com/matrix-org/matrix-doc/pull/3030) so the static archives can navigate and jump to a certain date. - We're solving this in the future by switching to [online topological ordering](https://github.com/matrix-org/gomatrixserverlib/issues/187) and [chunking](https://github.com/matrix-org/synapse/issues/3785) which by its nature will apply retroactively to fix any inconsistencies introduced by people permalinking 2. 
As we're navigating `prev_events` to return in `/backfill`, we order by `depth` first (newest -> oldest) and now also tie-break based on the `stream_ordering` (newest -> oldest). This is technically important because MSC2716 inserts a bunch of historical messages at the same `depth` so it's best to be prescriptive about which ones we should process first. In reality, I think the code already looped over the historical messages as expected because the database is already in order. 3. Making the historical state chain and historical event chain float on their own by having no `prev_events` instead of a fake `prev_event` which caused backfill to get clogged with an unresolvable event. Fixes https://github.com/matrix-org/synapse/issues/11091 and https://github.com/matrix-org/synapse/issues/10764 4. We no longer find connected insertion events by finding a potential `prev_event` connection to the current event we're iterating over. We now solely rely on marker events which when processed, add the insertion event as an extremity and the federating homeserver can ask about it when time calls. - Related discussion, https://github.com/matrix-org/synapse/pull/11114#discussion_r741514793 Before \| After --- \| --- ![](https://user-images.githubusercontent.com/558581/139218681-b465c862-5c49-4702-a59e-466733b0cf45.png) \| ![](https://user-images.githubusercontent.com/558581/146453159-a1609e0a-8324-439d-ae44-e4bce43ac6d1.png) #### Why aren't we sorting topologically when receiving backfill events? > The main reason we're going to opt to not sort topologically when receiving backfill events is because it's probably best to do whatever is easiest to make it just work. People will probably have opinions once they look at [MSC2716](https://github.com/matrix-org/matrix-doc/pull/2716) which could change whatever implementation anyway. 
> > As mentioned, ideally we would do this but code necessary to make the fake edges but it gets confusing and gives an impression of “just whyyyy” (feels icky). This problem also dissolves with online topological ordering. > > -- https://github.com/matrix-org/synapse/pull/11114#discussion_r741517138 See https://github.com/matrix-org/synapse/pull/11114#discussion_r739610091 for the technical difficulties	2022-02-07 15:54:13 -06:00
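The ordering described in point 2 of fef2e792be (sort by `depth` first, then tie-break on `stream_ordering`, both newest to oldest) can be sketched as a plain tuple sort. The `Event` type and the sample values are made up for illustration; the real code operates on database rows.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    event_id: str
    depth: int
    stream_ordering: int


def backfill_order(events: List[Event]) -> List[Event]:
    """Order events newest -> oldest by depth, tie-breaking on stream_ordering."""
    return sorted(events, key=lambda e: (e.depth, e.stream_ordering), reverse=True)


# Three historical events share depth 10, as MSC2716 batches do.
events = [
    Event("$a", depth=10, stream_ordering=1),
    Event("$b", depth=10, stream_ordering=3),
    Event("$c", depth=11, stream_ordering=2),
    Event("$d", depth=10, stream_ordering=2),
]
ordered_ids = [e.event_id for e in backfill_order(events)]
```

Sorting on `depth` alone would leave the relative order of `$a`, `$b` and `$d` up to the input order, which is the ambiguity the tie-break removes.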
Richard van der Hoff	251b5567ec	Remove `log_function` and its uses (#11761 ) I've never found this terribly useful. I think it was added in the early days of Synapse, without much thought as to what would actually be useful to log, and has just been cargo-culted ever since. Rather, it tends to clutter up debug logs with useless information.	2022-01-18 13:06:04 +00:00
Sean Quah	0147b3de20	Add missing type hints to `synapse.logging.context` (#11556 )	2021-12-14 17:35:28 +00:00
Eric Eastwood	a6f1a3abec	Add MSC3030 experimental client and federation API endpoints to get the closest event to a given timestamp (#9445 ) MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030 Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about. ``` GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction> { "event_id": ... "origin_server_ts": ... } ``` Federation API endpoint: ``` GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction> { "event_id": ... "origin_server_ts": ... } ``` Co-authored-by: Erik Johnston <erik@matrix.org>	2021-12-02 01:02:20 -06:00
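As a rough illustration of the client endpoint shown in a6f1a3abec, the request path can be assembled as follows. The helper name `timestamp_to_event_path` is hypothetical, and the `dir` value `f` (forwards) is an assumption based on MSC3030 rather than something stated in the commit message.

```python
from urllib.parse import quote, urlencode


def timestamp_to_event_path(room_id: str, ts: int, direction: str) -> str:
    """Build the experimental MSC3030 client path for a room and timestamp."""
    query = urlencode({"ts": ts, "dir": direction})
    # Room IDs contain '!' and ':', so percent-encode the whole path segment.
    room_segment = quote(room_id, safe="")
    return (
        "/_matrix/client/unstable/org.matrix.msc3030/rooms/"
        f"{room_segment}/timestamp_to_event?{query}"
    )


path = timestamp_to_event_path("!abc:example.org", 1638428540000, "f")
```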
Richard van der Hoff	f3efa0036b	Move _persist_auth_tree into FederationEventHandler (#11115 ) This is just a lift-and-shift, because it fits more naturally here. We do rename it to `process_remote_join` at the same time though.	2021-10-19 10:24:09 +01:00
Richard van der Hoff	a5d2ea3d08	Check all auth events for room id and rejection (#11009 ) This fixes a bug where we would accept an event whose `auth_events` include rejected events, if the rejected event was shadowed by another `auth_event` with same `(type, state_key)`. The approach is to pass a list of auth events into `check_auth_rules_for_event` instead of a dict, which of course means updating the call sites. This is an extension of #10956.	2021-10-18 18:28:30 +01:00
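The shadowing problem described in a5d2ea3d08 comes from keying auth events by `(type, state_key)`: a later entry with the same key silently replaces an earlier one. The sketch below uses plain dicts with made-up fields (`rejected` in particular is a simplification), not Synapse's event classes.

```python
auth_events = [
    {
        "event_id": "$rejected",
        "type": "m.room.member",
        "state_key": "@user:example.org",
        "rejected": True,
    },
    {
        "event_id": "$accepted",
        "type": "m.room.member",
        "state_key": "@user:example.org",
        "rejected": False,
    },
]

# Keyed by (type, state_key): the second event overwrites the first,
# so the rejected event is never inspected.
by_key = {(e["type"], e["state_key"]): e for e in auth_events}
rejected_seen_in_dict = any(e["rejected"] for e in by_key.values())

# Checking the full list sees every auth event, including the rejected one.
rejected_seen_in_list = any(e["rejected"] for e in auth_events)
```

This is why passing a list instead of a dict lets every auth event be checked.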
Eric Eastwood	daf498e099	Fix 500 error on `/messages` when we accumulate more than 5 backward extremities (#11027 ) Found while working on the Gitter backfill script and noticed it only happened after we sent 7 batches, https://gitlab.com/gitterHQ/webapp/-/merge_requests/2229#note_665906390 When there are more than 5 backward extremities for a given depth, backfill will throw an error because we sliced the extremity list to 5 but then try to iterate over the full list. This causes us to look for state that we never fetched and we get a `KeyError`. Before when calling `/messages` when there are more than 5 backward extremities: ``` Traceback (most recent call last): File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 258, in _async_render_wrapper callback_return = await self._async_render(request) File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 446, in _async_render callback_return = await raw_callback_return File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/room.py", line 580, in on_GET msgs = await self.pagination_handler.get_messages( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/pagination.py", line 396, in get_messages await self.hs.get_federation_handler().maybe_backfill( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 133, in maybe_backfill return await self._maybe_backfill_inner(room_id, current_depth, limit) File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 386, in _maybe_backfill_inner likely_extremeties_domains = get_domains_from_state(states[e_id]) KeyError: '$zpFflMEBtZdgcMQWTakaVItTLMjLFdKcRWUPHbbSZJl' ```	2021-10-14 18:53:45 -05:00
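The failure mode in daf498e099 (state fetched for a list sliced to 5, then a loop over the unsliced list) can be reproduced in a few lines. The names `fetch_states`, `lookup_buggy` and `lookup_consistent` are hypothetical, and iterating over the same sliced list is only one way to avoid the error, not necessarily the change made in the commit.

```python
def fetch_states(extremity_ids):
    """Pretend to load state for each requested extremity."""
    return {e_id: f"state-for-{e_id}" for e_id in extremity_ids}


all_extremities = [f"$event{i}" for i in range(7)]
limited = all_extremities[:5]
states = fetch_states(limited)  # state exists only for the first 5


def lookup_buggy():
    # Iterates over the full list, so the 6th ID has no entry in `states`.
    return [states[e_id] for e_id in all_extremities]


def lookup_consistent():
    # Iterates over the same sliced list that was used to fetch the state.
    return [states[e_id] for e_id in limited]


try:
    lookup_buggy()
    buggy_raised = False
except KeyError:
    buggy_raised = True

consistent_result = lookup_consistent()
```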
Patrick Cloke	eb9ddc8c2e	Remove the deprecated BaseHandler. (#11005 ) The shared ratelimit function was replaced with a dedicated RequestRatelimiter class (accessible from the HomeServer object). Other properties were copied to each sub-class that inherited from BaseHandler.	2021-10-08 07:44:43 -04:00
Patrick Cloke	d1bf5f7c9d	Strip "join_authorised_via_users_server" from join events which do not need it. (#10933 ) This fixes an "Event not signed by authorising server" error when transitioning a room member from join -> join, e.g. when updating a display name or avatar URL for restricted rooms.	2021-09-30 11:13:59 -04:00
Richard van der Hoff	428174f902	Split `event_auth.check` into two parts (#10940 ) Broadly, the existing `event_auth.check` function has two parts: * a validation section: checks that the event isn't too big, that it has the right signatures, etc. This bit is independent of the rest of the state in the room, and so need only be done once for each event. * an auth section: ensures that the event is allowed, given the rest of the state in the room. This gets done multiple times, against various sets of room state, because it forms part of the state res algorithm. Currently, this is implemented with `do_sig_check` and `do_size_check` parameters, but I think that makes everything hard to follow. Instead, we split the function in two and call each part separately where it is needed.	2021-09-29 18:59:15 +01:00
Patrick Cloke	94b620a5ed	Use direct references for configuration variables (part 6). (#10916 )	2021-09-29 06:44:15 -04:00
Richard van der Hoff	5279b9161b	Use `RoomVersion` objects (#10934 ) Various refactors to use `RoomVersion` objects instead of room version identifiers.	2021-09-29 10:57:10 +01:00
Patrick Cloke	bb7fdd821b	Use direct references for configuration variables (part 5). (#10897 )	2021-09-24 07:25:21 -04:00
Andrew Morgan	aa2c027792	Remove unnecessary parentheses around tuples returned from methods (#10889 )	2021-09-23 11:59:07 +01:00
Richard van der Hoff	26f2bfedbf	Factor out a separate `EventContext.for_outlier` (#10883 ) Constructing an EventContext for an outlier is actually really simple, and there's no sense in going via an `async` method in the `StateHandler`. This also means that we can resolve a bunch of FIXMEs.	2021-09-22 17:58:57 +01:00
Richard van der Hoff	8f2a52766b	Ensure we mark sent knocks as outliers (#10873 )	2021-09-22 15:20:18 +01:00

778 Commits