Commit Graph

118 Commits

Author SHA1 Message Date
Mathieu Velten
6cddf24e36
Faster joins: don't stall when a user joins during a fast join (#14606)
Fixes #12801.
Complement tests are at
https://github.com/matrix-org/complement/pull/567.

Avoid blocking on full state when handling a subsequent join into a
partial state room.

Also always perform a remote join into partial state rooms, since we do
not know whether the joining user has been banned and want to avoid
leaking history to banned users.

Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
2023-02-10 23:31:05 +00:00
Patrick Cloke
733531ee3e
Add final type hint to synapse.server. (#15035) 2023-02-09 09:49:04 -05:00
David Robertson
9cd7610f86
Revert "Add event_stream_ordering column to membership state tables (#14979)"
This reverts commit 5fdc12f482.
2023-02-07 15:26:55 +00:00
Nick Mills-Barrett
5fdc12f482
Add event_stream_ordering column to membership state tables (#14979)
This adds an `event_stream_ordering` column to `current_state_events`,
`local_current_membership` and `room_memberships`. Each of these tables
is regularly joined with the `events` table to get the stream ordering
and denormalising this into each table will yield significant query
performance improvements once used. Includes a background job to
populate these values from the `events` table.

Same idea as https://github.com/matrix-org/synapse/pull/13703.

Signed off by Nick @ Beeper (@fizzadar).
2023-02-07 00:10:54 +00:00
David Robertson
796a4b7482
Prefer type(x) is int to isinstance(x, int) (#14945)
* Perfer `type(x) is int` to `isinstance(x, int)`

This covered all additional instances I could see where `x` was
user-controlled.
The remaining cases are

```
$ rg -s 'isinstance.*[^_]int'
tests/replication/_base.py
576:        if isinstance(obj, int):

synapse/util/caches/stream_change_cache.py
136:        assert isinstance(stream_pos, int)
214:        assert isinstance(stream_pos, int)
246:        assert isinstance(stream_pos, int)
267:        assert isinstance(stream_pos, int)

synapse/replication/tcp/external_cache.py
133:        if isinstance(result, int):

synapse/metrics/__init__.py
100:        if isinstance(calls, (int, float)):

synapse/handlers/appservice.py
262:        assert isinstance(new_token, int)

synapse/config/_util.py
62:        if isinstance(p, int):
```

which cover metrics, logic related to `jsonschema`, and replication and
data streams. AFAICS these are all internal to Synapse

* Changelog
2023-01-31 10:33:07 +00:00
Patrick Cloke
6d7523ef14
Batch fetch bundled references (#14508)
Avoid an n+1 query problem and fetch the bundled aggregations for
m.reference relations in a single query instead of a query per event.

This applies similar logic for as was previously done for edits in
8b309adb43 (#11660; threads
in b65acead42 (#11752); and
annotations in 1799a54a54 (#14491).
2022-11-22 09:41:09 -05:00
Patrick Cloke
d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Patrick Cloke
fb66fae84b
Clean-up events persistance code (#14411)
By removing unused variables and making some arguments
required which are always provided.
2022-11-14 08:13:11 -05:00
Patrick Cloke
4dd7aa371b
Properly update the threads table when thread events are redacted. (#14248)
When the last event in a thread is redacted we need to update
the threads table:

* Find the new latest event in the thread and store it into the table; or
* Remove the thread from the table if it is no longer a thread (i.e. all
  events in the thread were redacted).
2022-10-21 09:11:19 -04:00
Patrick Cloke
3bbe532abb
Add an API for listing threads in a room. (#13394)
Implement the /threads endpoint from MSC3856.

This is currently unstable and behind an experimental configuration
flag.

It includes a background update to backfill data, results from
the /threads endpoint will be partial until that finishes.
2022-10-13 08:02:11 -04:00
Patrick Cloke
09be8ab5f9
Remove the experimental implementation of MSC3772. (#14094)
MSC3772 has been abandoned.
2022-10-12 06:26:39 -04:00
Erik Johnston
2c237debd3
Fix bug where we didn't delete staging push actions (#14014)
Introduced in #13719
2022-10-03 13:45:19 +00:00
Kateřina Churanová
6caa303083
fix: Push notifications for invite over federation (#13719) 2022-09-28 12:31:53 +00:00
Erik Johnston
e8318a4333
Handle the case of remote users leaving a partial join room for device lists (#13885) 2022-09-27 13:01:08 +01:00
Nick Mills-Barrett
6b4593a80f
Simplify cache invalidation after event persist txn (#13796)
This moves all the invalidations into a single place and de-duplicates
the code involved in invalidating caches for a given event by using
the base class method.
2022-09-26 16:26:35 +01:00
Eric Eastwood
957e3d74fc
Keep track when we try and fail to process a pulled event (#13589)
We can follow-up this PR with:

 1. Only try to backfill from an event if we haven't tried recently -> https://github.com/matrix-org/synapse/issues/13622
 1. When we decide to backfill that event again, process it in the background so it doesn't block and make `/messages` slow when we know it will probably fail again -> https://github.com/matrix-org/synapse/issues/13623
 1. Generally track failures everywhere we try and fail to pull an event over federation -> https://github.com/matrix-org/synapse/issues/13700

Fix https://github.com/matrix-org/synapse/issues/13621

Part of https://github.com/matrix-org/synapse/issues/13356

Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.qv7cj51sv9i5)
2022-09-14 13:57:50 -05:00
Patrick Cloke
666ae87729
Update event push action and receipt tables to support threads. (#13753)
Adds a `thread_id` column to the `event_push_actions`, `event_push_actions_staging`,
and `event_push_summary` tables. This will notifications to be segmented by the thread
in a future pull request. The `thread_id` column stores the root event ID or the special
value `"main"`.

The `thread_id` column for `event_push_actions` and `event_push_summary` is
backfilled with `"main"` for all existing rows. New entries into `event_push_actions`
and `event_push_actions_staging` will get the proper thread ID.

`receipts_linearized` and `receipts_graph` also gain a `thread_id` column, which is similar,
except `NULL` is a special value meaning the receipt is "unthreaded".

See MSC3771 and MSC3773 for where this data will be useful.
2022-09-14 17:11:16 +00:00
Eric Eastwood
0a4efbc1dd
Instrument the federation/backfill part of /messages (#13489)
Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace.

Split out from https://github.com/matrix-org/synapse/pull/13440

Follow-up from https://github.com/matrix-org/synapse/pull/13368

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-16 12:39:40 -05:00
Richard van der Hoff
96d92156d0
Update type of EventContext.rejected (#13460) 2022-08-04 17:45:01 +01:00
Eric Eastwood
0f971ca68e
Update get_pdu to return the original, pristine EventBase (#13320)
Update `get_pdu` to return the untouched, pristine `EventBase` as it was originally seen over federation (no metadata added). Previously, we returned the same `event` reference that we stored in the cache which downstream code modified in place and added metadata like setting it as an `outlier`  and essentially poisoned our cache. Now we always return a copy of the `event` so the original can stay pristine in our cache and re-used for the next cache call.

Split out from https://github.com/matrix-org/synapse/pull/13205

As discussed at:

 - https://github.com/matrix-org/synapse/pull/13205#discussion_r918365746
 - https://github.com/matrix-org/synapse/pull/13205#discussion_r918366125

Related to https://github.com/matrix-org/synapse/issues/12584. This PR doesn't fix that issue because it hits [`get_event` which exists from the local database before it tries to `get_pdu`](7864f33e28/synapse/federation/federation_client.py (L581-L594)).
2022-07-20 15:58:51 -05:00
Nick Mills-Barrett
2ee0b6ef4b
Safe async event cache (#13308)
Fix race conditions in the async cache invalidation logic, by separating
the async & local invalidation calls and ensuring any async call i
executed first.

Signed off by Nick @ Beeper (@Fizzadar).
2022-07-19 11:25:29 +00:00
Nick Mills-Barrett
cc21a431f3
Async get event cache prep (#13242)
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.

Signed off by Nick @ Beeper (@Fizzadar)
2022-07-15 09:30:46 +00:00
Erik Johnston
e5716b631c
Don't pull out the full state when calculating push actions (#13078) 2022-07-11 20:08:39 +00:00
Sean Quah
1391a76cd2
Faster room joins: fix race in recalculation of current room state (#13151)
Bounce recalculation of current state to the correct event persister and
move recalculation of current state into the event persistence queue, to
avoid concurrent updates to a room's current state.

Also give recalculation of a room's current state a real stream
ordering.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-07 12:19:31 +00:00
Sean Quah
68db233f0c
Handle race between persisting an event and un-partial stating a room (#13100)
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.

Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
Richard van der Hoff
75fb10ee45
Clean up schema for event_edges (#12893)
* Remove redundant references to `event_edges.room_id`

We don't need to care about the room_id here, because we are already checking
the event id.

* Clean up the event_edges table

We make a number of changes to `event_edges`:

 * We give the `room_id` and `is_state` columns defaults (null and false
   respectively) so that we can stop populating them.
 * We drop any rows that have `is_state` set true - they should no longer
   exist.
 * We drop any rows that do not exist in `events` - these should not exist
   either.
 * We drop the old unique constraint on all the colums, which wasn't much use.
 * We create a new unique index on `(event_id, prev_event_id)`.
 * We add a foreign key constraint to `events`.

These happen rather differently depending on whether we are on Postgres or
SQLite. For SQLite, we just rebuild the whole table, copying only the rows we
want to keep. For Postgres, we try to do things in the background as much as
possible.

* Stop populating `event_edges.room_id` and `is_state`

We can just rely on the defaults.
2022-06-15 12:29:42 +01:00
David Robertson
586bfc6dc0
Use dummy fallback engines if imports fail (#12979) 2022-06-07 17:33:55 +01:00
Patrick Cloke
88ce3080d4
Experimental support for MSC3772 (#12740)
Implements the following behind an experimental configuration flag:

* A new push rule kind for mutually related events.
* A new default push rule (`.m.rule.thread_reply`) under an unstable prefix.

This is missing part of MSC3772:

* The `.m.rule.thread_reply_to_me` push rule, this depends on MSC3664 / #11804.
2022-05-24 13:23:23 +00:00
David Robertson
d4713d3e33
Discard null-containing strings before updating the user directory (#12762) 2022-05-18 11:28:14 +01:00
Patrick Cloke
86a515ccbf
Consolidate logic for parsing relations. (#12693)
Parse the `m.relates_to` event content field (which describes relations)
in a single place, this is used during:

* Event persistence.
* Validation of the Client-Server API.
* Fetching bundled aggregations.
* Processing of push rules.

Each of these separately implement the logic and each made slightly
different assumptions about what was valid. Some had minor / potential
bugs.
2022-05-16 12:42:45 +00:00
Erik Johnston
c72d26c1e1
Refactor EventContext (#12689)
Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing.

The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching). 

One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since its just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used.


Part of #12684
2022-05-10 19:43:13 +00:00
Dirk Klimpel
989fa33096
Add some type hints to datastore. (#12477) 2022-05-10 14:07:48 -04:00
Richard van der Hoff
147f098fb4
Stop writing to event_reference_hashes (#12679)
This table is never read, since #11794. We stop writing to it; in future we can
drop it altogether.
2022-05-10 15:35:08 +01:00
David Robertson
fa0eab9c8e
Use ParamSpec in a few places (#12667) 2022-05-09 10:27:39 +00:00
Erik Johnston
ae7858f184
Fix race when persisting an event and deleting a room (#12594)
This works by taking a row level lock on the `rooms` table at the start of both transactions, ensuring that they don't run at the same time. In the event persistence transaction we also check that there is an entry still in the `rooms` table.

I can't figure out how to do this in SQLite. I was just going to lock the table, but it seems that we don't support that in SQLite either, so I'm *really* confused as to how we maintain integrity in SQLite when using `lock_table`....
2022-05-03 11:47:21 +01:00
Richard van der Hoff
320186319a
Resync state after partial-state join (#12394)
We work through all the events with partial state, updating the state at each
of them. Once it's done, we recalculate the state for the whole room, and then
mark the room as having complete state.
2022-04-12 13:23:43 +00:00
Patrick Cloke
86cf6a3a17
Remove references to unstable identifiers from MSC3440. (#12382)
Removes references to unstable thread relation, unstable
identifiers for filtering parameters, and the experimental
config flag.
2022-04-12 08:42:03 -04:00
Richard van der Hoff
6fe757d69e
Fix synapse_event_persisted_position metric (#12390)
Fixes a bug introduced in #11417 where we would only included backfilled events
in `synapse_event_persisted_position`
2022-04-06 13:52:39 +00:00
Richard van der Hoff
ae01a7edd3
Update type annotations for compatiblity with prometheus_client 0.14 (#12389)
Principally, `prometheus_client.REGISTRY.register` now requires its argument to
extend `prometheus_client.Collector`.

Additionally, `Gauge.set` is now annotated so that passing `Optional[int]`
causes an error.
2022-04-06 12:59:04 +00:00
Erik Johnston
7ca8ee67a5
Add cache for get_membership_from_event_ids (#12272)
This should speed up push rule calculations for rooms with large numbers of local users when the main push rule cache fails.

Co-authored-by: reivilibre <oliverw@matrix.org>
2022-03-25 14:58:56 +00:00
Patrick Cloke
ea27528b5d
Support stable identifiers for MSC3440: Threading (#12151)
The unstable identifiers are still supported if the experimental configuration
flag is enabled. The unstable identifiers will be removed in a future release.
2022-03-10 15:36:13 +00:00
Patrick Cloke
88cd6f9378
Allow retrieving the relations of a redacted event. (#12130)
This is allowed per MSC2675, although the original implementation did
not allow for it and would return an empty chunk / not bundle aggregations.

The main thing to improve is that the various caches get cleared properly
when an event is redacted, and that edits must not leak if the original
event is redacted (as that would presumably leak something similar to
the original event content).
2022-03-10 09:03:59 -05:00
Patrick Cloke
f63bedef07
Invalidate caches when an event with a relation is redacted. (#12121)
The caches for the target of the relation must be cleared
so that the bundled aggregations are re-calculated after
the redaction is processed.
2022-03-07 14:00:05 +00:00
Richard van der Hoff
e2e1d90a5e
Faster joins: persist to database (#12012)
When we get a partial_state response from send_join, store information in the
database about it:
 * store a record about the room as a whole having partial state, and stash the
   list of member servers too.
 * flag the join event itself as having partial state
 * also, for any new events whose prev-events are partial-stated, note that
   they will *also* be partial-stated.

We don't yet make any attempt to interpret this data, so API calls (and a bunch
of other things) are just going to get incorrect data.
2022-03-01 12:49:54 +00:00
Sean Quah
f3fd8558cd
Minor typing fixes for synapse/storage/persist_events.py (#12069)
Signed-off-by: Sean Quah <seanq@element.io>
2022-02-25 10:19:49 +00:00
Sean Quah
41cf4c2cf6
Fix non-strings in the event_search table (#12037)
Don't attempt to add non-string `value`s to `event_search` and add a
background update to clear out bad rows from `event_search` when
using sqlite.

Signed-off-by: Sean Quah <seanq@element.io>
2022-02-24 11:52:28 +00:00
Erik Johnston
dc9fe61050
Fix incorrect get_rooms_for_user for remote user (#11999)
When the server leaves a room the `get_rooms_for_user` cache is not
correctly invalidated for the remote users in the room. This means that
subsequent calls to `get_rooms_for_user` for the remote users would
incorrectly include the room (it shouldn't be included because the
server no longer knows anything about the room).
2022-02-15 14:26:28 +00:00
Patrick Cloke
b65acead42
Fetch thread summaries for multiple events in a single query (#11752)
This should reduce database usage when fetching bundled aggregations
as the number of individual queries (and round trips to the database) are
reduced.
2022-02-11 09:50:14 -05:00
Patrick Cloke
8b309adb43
Fetch edits for multiple events in a single query. (#11660)
This should reduce database usage when fetching bundled aggregations
as the number of individual queries (and round trips to the database) are
reduced.
2022-02-08 07:43:30 -05:00
Eric Eastwood
fef2e792be
Fix historical messages backfilling in random order on remote homeservers (MSC2716) (#11114)
Fix https://github.com/matrix-org/synapse/issues/11091
Fix https://github.com/matrix-org/synapse/issues/10764 (side-stepping the issue because we no longer have to deal with `fake_prev_event_id`)

 1. Made the `/backfill` response return messages in `(depth, stream_ordering)` order (previously only sorted by `depth`)
    - Technically, it shouldn't really matter how `/backfill` returns things but I'm just trying to make the `stream_ordering` a little more consistent from the origin to the remote homeservers in order to get the order of messages from `/messages` consistent ([sorted by `(topological_ordering, stream_ordering)`](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)).
    - Even now that we return backfilled messages in order, it still doesn't guarantee the same `stream_ordering` (and more importantly the [`/messages` order](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)) on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between and insert the permalink as the next message in the `stream_order` and totally throw off the sort.
       - This will be even more the case when we add the [MSC3030 jump to date API endpoint](https://github.com/matrix-org/matrix-doc/pull/3030) so the static archives can navigate and jump to a certain date.
       - We're solving this in the future by switching to [online topological ordering](https://github.com/matrix-org/gomatrixserverlib/issues/187) and [chunking](https://github.com/matrix-org/synapse/issues/3785) which by its nature will apply retroactively to fix any inconsistencies introduced by people permalinking
 2. As we're navigating `prev_events` to return in `/backfill`, we order by `depth` first (newest -> oldest) and now also tie-break based on the `stream_ordering` (newest -> oldest). This is technically important because MSC2716 inserts a bunch of historical messages at the same `depth` so it's best to be prescriptive about which ones we should process first. In reality, I think the code already looped over the historical messages as expected because the database is already in order.
 3. Making the historical state chain and historical event chain float on their own by having no `prev_events` instead of a fake `prev_event` which caused backfill to get clogged with an unresolvable event. Fixes https://github.com/matrix-org/synapse/issues/11091 and https://github.com/matrix-org/synapse/issues/10764
 4. We no longer find connected insertion events by finding a potential `prev_event` connection to the current event we're iterating over. We now solely rely on marker events which when processed, add the insertion event as an extremity and the federating homeserver can ask about it when time calls.
    - Related discussion, https://github.com/matrix-org/synapse/pull/11114#discussion_r741514793


Before | After
--- | ---
![](https://user-images.githubusercontent.com/558581/139218681-b465c862-5c49-4702-a59e-466733b0cf45.png) | ![](https://user-images.githubusercontent.com/558581/146453159-a1609e0a-8324-439d-ae44-e4bce43ac6d1.png)



#### Why aren't we sorting topologically when receiving backfill events?

> The main reason we're going to opt to not sort topologically when receiving backfill events is because it's probably best to do whatever is easiest to make it just work. People will probably have opinions once they look at [MSC2716](https://github.com/matrix-org/matrix-doc/pull/2716) which could change whatever implementation anyway.
> 
> As mentioned, ideally we would do this but code necessary to make the fake edges but it gets confusing and gives an impression of “just whyyyy” (feels icky). This problem also dissolves with online topological ordering.
>
> -- https://github.com/matrix-org/synapse/pull/11114#discussion_r741517138

See https://github.com/matrix-org/synapse/pull/11114#discussion_r739610091 for the technical difficulties
2022-02-07 15:54:13 -06:00