Commit Graph

79 Commits

Author SHA1 Message Date
Shay
a86b2f6837
Fix a bug where redactions were not being sent over federation if we did not have the original event. (#13813) 2022-10-11 11:18:45 -07:00
Eric Eastwood
29269d9d3f
Fix have_seen_event cache not being invalidated (#13863)
Fix https://github.com/matrix-org/synapse/issues/13856
Fix https://github.com/matrix-org/synapse/issues/13865

> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](https://github.com/matrix-org/complement/pull/214#discussion_r741678240). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to https://github.com/matrix-org/synapse/issues/13625) 
> 
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
> 
> *-- https://github.com/matrix-org/synapse/issues/13856*


### The problem

`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).

Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.

Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
2022-09-27 15:55:43 -05:00
reivilibre
d3d9ca156e
Cancel the processing of key query requests when they time out. (#13680) 2022-09-07 12:03:32 +01:00
reivilibre
c2fe48a6ff
Rename the EventFormatVersions enum values so that they line up with room version numbers. (#13706) 2022-09-07 11:08:20 +01:00
Eric Eastwood
92c5817e34
Give the correct next event when the message timestamps are the same - MSC3030 (#13658)
Discovered while working on https://github.com/matrix-org/synapse/pull/13589 and I had all the messages at the same timestamp in the tests.

Part of https://github.com/matrix-org/matrix-spec-proposals/pull/3030

Complement tests: https://github.com/matrix-org/complement/pull/457
2022-08-30 14:50:06 -05:00
Eric Eastwood
0a4efbc1dd
Instrument the federation/backfill part of /messages (#13489)
Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace.

Split out from https://github.com/matrix-org/synapse/pull/13440

Follow-up from https://github.com/matrix-org/synapse/pull/13368

Part of https://github.com/matrix-org/synapse/issues/13356
2022-08-16 12:39:40 -05:00
Richard van der Hoff
507c1cb330
Update the rejected state of events during resync (#13459)
Events can be un-rejected or newly-rejected during resync, so ensure we update
the database and caches when that happens.
2022-08-11 10:42:24 +00:00
Nick Mills-Barrett
41320a0554
Optimise async get event lookups (#13435)
Still maintains local in memory lookup optimisation, but does any external
lookup as part of the deferred that prevents duplicate lookups for the same
event at once. This makes the assumption that fetching from an external
cache is a non-zero load operation.
2022-08-04 15:49:55 +01:00
Richard van der Hoff
ca3db044a3
Fix infinite loop in partial-state resync (#13353)
Make sure that we only pull out events from the db once they have no
prev-events with partial state.
2022-07-26 11:47:31 +00:00
David Robertson
b977867358
Rate limit joins per-room (#13276) 2022-07-19 11:45:17 +00:00
Nick Mills-Barrett
2ee0b6ef4b
Safe async event cache (#13308)
Fix race conditions in the async cache invalidation logic, by separating
the async & local invalidation calls and ensuring any async call i
executed first.

Signed off by Nick @ Beeper (@Fizzadar).
2022-07-19 11:25:29 +00:00
Erik Johnston
f721f1baba
Revert "Make all process_replication_rows methods async (#13304)" (#13312)
This reverts commit 5d4028f217.
2022-07-18 14:28:14 +01:00
Nick Mills-Barrett
5d4028f217
Make all process_replication_rows methods async (#13304)
More prep work for asyncronous caching, also makes all process_replication_rows methods consistent (presence handler already is so).

Signed off by Nick @ Beeper (@Fizzadar)
2022-07-17 22:19:43 +01:00
Nick Mills-Barrett
cc21a431f3
Async get event cache prep (#13242)
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.

Signed off by Nick @ Beeper (@Fizzadar)
2022-07-15 09:30:46 +00:00
Nick Mills-Barrett
21eeacc995
Federation Sender & Appservice Pusher Stream Optimisations (#13251)
* Replace `get_new_events_for_appservice` with `get_all_new_events_stream`

The functions were near identical and this brings the AS worker closer
to the way federation senders work which can allow for multiple workers
to handle AS traffic.

* Pull received TS alongside events when processing the stream

This avoids an extra query -per event- when both federation sender
and appservice pusher process events.
2022-07-15 09:36:56 +01:00
Richard van der Hoff
5e17922ef7
Stop reading from event_edges.room_id. (#12914)
event_edges.room_id is implied by the event id, so there is no need to join on the room id.
2022-05-31 13:51:49 +01:00
Richard van der Hoff
bc1beebc27
Refactor have_seen_events to reduce OOMs (#12886)
My server is currently OOMing in the middle of have_seen_events, so let's try
to fix that.
2022-05-27 10:27:33 +01:00
Erik Johnston
fcf951d5dc
Track in memory events using weakrefs (#10533) 2022-05-17 10:34:27 +01:00
andrew do
01e625513a
remove constantly lib use and switch to enums. (#12624) 2022-05-04 11:26:11 +00:00
Richard van der Hoff
96e0cdbc5a
Add a consistency check on events read from the database (#12620)
I've seen a few errors which can only plausibly be explained by the calculated
event id for an event being different from the ID of the event in the
database. It should be cheap to check this, so let's do so and raise an
exception.
2022-05-03 21:27:52 +01:00
Sean Quah
8a87b4435a
Handle cancellation in EventsWorkerStore._get_events_from_cache_or_db (#12529)
Multiple calls to `EventsWorkerStore._get_events_from_cache_or_db` can
reuse the same database fetch, which is initiated by the first call.
Ensure that cancelling the first call doesn't cancel the other calls
sharing the same database fetch.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-25 19:39:17 +01:00
Richard van der Hoff
f5668f0b4a
Await un-partial-stating after a partial-state join (#12399)
When we join a room via the faster-joins mechanism, we end up with "partial
state" at some points on the event DAG. Many parts of the codebase need to
wait for the full state to load. So, we implement a mechanism to keep track of
which events have partial state, and wait for them to be fully-populated.
2022-04-21 07:42:03 +01:00
Tulir Asokan
4bc8cb4669
Implement MSC2815: allow room moderators to view redacted event content (#12427)
Implements matrix-org/matrix-spec-proposals#2815

Signed-off-by: Tulir Asokan <tulir@maunium.net>
2022-04-20 12:57:39 +01:00
Richard van der Hoff
320186319a
Resync state after partial-state join (#12394)
We work through all the events with partial state, updating the state at each
of them. Once it's done, we recalculate the state for the whole room, and then
mark the room as having complete state.
2022-04-12 13:23:43 +00:00
Richard van der Hoff
9b43df1f7b
Optimise _get_state_after_missing_prev_event: use /state (#12040)
If we're missing most of the events in the room state, then we may as well call the /state endpoint, instead of individually requesting each and every event.
2022-04-01 12:53:42 +01:00
Patrick Cloke
690cb4f3b3
Allow for ignoring some arguments when caching. (#12189)
* `@cached` can now take an `uncached_args` which is an iterable of names to not use in the cache key.
* Requires `@cached`, @cachedList` and `@lru_cache` to use keyword arguments for clarity.
* Asserts that keyword-only arguments in cached functions are not accepted. (I tested this briefly and I don't believe this works properly.)
2022-03-09 18:07:41 +00:00
Richard van der Hoff
e2e1d90a5e
Faster joins: persist to database (#12012)
When we get a partial_state response from send_join, store information in the
database about it:
 * store a record about the room as a whole having partial state, and stash the
   list of member servers too.
 * flag the join event itself as having partial state
 * also, for any new events whose prev-events are partial-stated, note that
   they will *also* be partial-stated.

We don't yet make any attempt to interpret this data, so API calls (and a bunch
of other things) are just going to get incorrect data.
2022-03-01 12:49:54 +00:00
Eric Eastwood
5a6911598a
Fix 500 error with Postgres when looking backwards with the MSC3030 /timestamp_to_event endpoint (#12024) 2022-02-18 12:11:18 +00:00
Patrick Cloke
45f45404de
Fix incorrect thread summaries when the latest event is edited. (#11992)
If the latest event in a thread was edited than the original
event content was included in bundled aggregation for
threads instead of the edited event content.
2022-02-15 08:26:57 -05:00
Richard van der Hoff
2359ee3864
Remove redundant get_current_events_token (#11643)
* Push `get_room_{min,max_stream_ordering}` into StreamStore

Both implementations of this are identical, so we may as well push it down and
get rid of the abstract base class nonsense.

* Remove redundant `StreamStore` class

This is empty now

* Remove redundant `get_current_events_token`

This was an exact duplicate of `get_room_max_stream_ordering`, so let's get rid
of it.

* newsfile
2022-01-04 16:10:27 +00:00
Richard van der Hoff
5640992d17
Disambiguate queries on state_key (#11497)
We're going to add a `state_key` column to the `events` table, so we need to
add some disambiguation to queries which use it.
2021-12-02 22:42:58 +00:00
Eric Eastwood
a6f1a3abec
Add MSC3030 experimental client and federation API endpoints to get the closest event to a given timestamp (#9445)
MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030

Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
    "event_id": ...
    "origin_server_ts": ...
}
```

Federation API endpoint:
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
    "event_id": ...
    "origin_server_ts": ...
}
```

Co-authored-by: Erik Johnston <erik@matrix.org>
2021-12-02 01:02:20 -06:00
Sean Quah
ffd858aa68
Add type hints to synapse/storage/databases/main/events_worker.py (#11411)
Also refactor the stream ID trackers/generators a bit and try to
document them better.
2021-11-26 18:41:31 +00:00
Sean Quah
c675a18071
Track ongoing event fetches correctly (again) (#11376)
The previous fix for the ongoing event fetches counter
(8eec25a1d9) was both insufficient and
incorrect.

When the database is unreachable, `_do_fetch` never gets run and so
`_event_fetch_ongoing` is never decremented.

The previous fix also moved the `_event_fetch_ongoing` decrement outside
of the `_event_fetch_lock` which allowed race conditions to corrupt the
counter.
2021-11-26 13:47:24 +00:00
Sean Quah
8eec25a1d9
Track ongoing event fetches correctly in the presence of failure (#11240)
When an event fetcher aborts due to an exception, `_event_fetch_ongoing`
must be decremented, otherwise the event fetcher would never be
replaced. If enough event fetchers were to fail, no more events would be
fetched and requests would get stuck waiting for events.
2021-11-04 10:33:53 +00:00
Patrick Cloke
0dd0c40329
Add missing type hints to event fetching. (#11121)
Updates the event rows returned from the database to be
attrs classes instead of dictionaries.
2021-10-19 14:29:03 +00:00
Andrew Morgan
aa2c027792
Remove unnecessary parentheses around tuples returned from methods (#10889) 2021-09-23 11:59:07 +01:00
Patrick Cloke
01c88a09cd
Use direct references for some configuration variables (#10798)
Instead of proxying through the magic getter of the RootConfig
object. This should be more performant (and is more explicit).
2021-09-13 13:07:12 -04:00
Erik Johnston
c4fa4f37cb
Fix perf of fetching the same events many times. (#10703)
The code to deduplicate repeated fetches of the same set of events was
N^2 (over the number of events requested), which could lead to a process
being completely wedged.

The main fix is to deduplicate the returned deferreds so we only await
on a deferred once rather than many times. Seperately, when handling the
returned events from the defrered we only add the events we care about
to the event map to be returned (so that we don't pay the price of
inserting extraneous events into the dict).
2021-08-27 09:15:50 +00:00
Erik Johnston
c37dad67ab
Improve event caching code (#10119)
Ensure we only load an event from the DB once when the same event is requested multiple times at once.
2021-08-04 13:54:51 +01:00
Jonathan de Jong
bdfde6dca1
Use inline type hints in http/federation/, storage/ and util/ (#10381) 2021-07-15 12:46:54 -04:00
Richard van der Hoff
b4b2fd2ece
add a cache to have_seen_event (#9953)
Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.
2021-06-01 12:04:47 +01:00
Richard van der Hoff
c0df6bae06
Remove keylen from LruCache. (#9993)
`keylen` seems to be a thing that is frequently incorrectly set, and we don't really need it.

The only time it was used was to figure out if we had removed a subtree in `del_multi`, which we can do better by changing `TreeCache.pop` to return a different type (`TreeCacheNode`).

Commits should be independently reviewable.
2021-05-24 14:02:01 +01:00
Richard van der Hoff
294c675033
Remove synapse.types.Collection (#9856)
This is no longer required, since we have dropped support for Python 3.5.
2021-04-22 16:43:50 +01:00
Jonathan de Jong
4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Richard van der Hoff
f02663c4dd
Replace room_invite_state_types with room_prejoin_state (#9700)
`room_invite_state_types` was inconvenient as a configuration setting, because
anyone that ever set it would not receive any new types that were added to the
defaults. Here, we deprecate the old setting, and replace it with a couple of
new settings under `room_prejoin_state`.
2021-03-30 12:12:44 +01:00
Richard van der Hoff
567f88f835
Prep work for removing outlier from internal_metadata (#9411)
* Populate `internal_metadata.outlier` based on `events` table

Rather than relying on `outlier` being in the `internal_metadata` column,
populate it based on the `events.outlier` column.

* Move `outlier` out of InternalMetadata._dict

Ultimately, this will allow us to stop writing it to the database. For now, we
have to grandfather it back in so as to maintain compatibility with older
versions of Synapse.
2021-03-17 12:33:18 +00:00
Richard van der Hoff
af2248f8bf
Optimise missing prev_event handling (#9601)
Background: When we receive incoming federation traffic, and notice that we are missing prev_events from 
the incoming traffic, first we do a `/get_missing_events` request, and then if we still have missing prev_events,
we set up new backwards-extremities. To do that, we need to make a `/state_ids` request to ask the remote
server for the state at those prev_events, and then we may need to then ask the remote server for any events
in that state which we don't already have, as well as the auth events for those missing state events, so that we
can auth them.

This PR attempts to optimise the processing of that state request. The `state_ids` API returns a list of the state
events, as well as a list of all the auth events for *all* of those state events. The optimisation comes from the
observation that we are currently loading all of those auth events into memory at the start of the operation, but
we almost certainly aren't going to need *all* of the auth events. Rather, we can check that we have them, and
leave the actual load into memory for later. (Ideally the federation API would tell us which auth events we're
actually going to need, but it doesn't.)

The effect of this is to reduce the number of events that I need to load for an event in Matrix HQ from about
60000 to about 22000, which means it can stay in my in-memory cache, whereas previously the sheer number
of events meant that all 60K events had to be loaded from db for each request, due to the amount of cache
churn. (NB I've already tripled the size of the cache from its default of 10K).

Unfortunately I've ended up basically C&Ping `_get_state_for_room` and `_get_events_from_store_or_dest` into
a new method, because `_get_state_for_room` is also called during backfill, which expects the auth events to be
returned, so the same tricks don't work. That said, I don't really know why that codepath is completely different
(ultimately we're doing the same thing in setting up a new backwards extremity) so I've left a TODO suggesting
that we clean it up.
2021-03-15 13:51:02 +00:00
Erik Johnston
0b5c967813
Refactor to ensure we call check_consistency (#9470)
The idea here is to stop people forgetting to call `check_consistency`. Folks can still just pass in `None` to the new args in `build_sequence_generator`, but hopefully they won't.
2021-02-24 10:13:53 +00:00
Eric Eastwood
0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00