Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing.
The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching).
One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since its just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used.
Part of #12684
I've seen a few errors which can only plausibly be explained by the calculated
event id for an event being different from the ID of the event in the
database. It should be cheap to check this, so let's do so and raise an
exception.
When configuring the return values of mocks, prefer awaitables from
`make_awaitable` over `defer.succeed`. `Deferred`s are only awaitable
once, so it is inappropriate for a mock to return the same `Deferred`
multiple times.
Also update `run_in_background` to support functions that return
arbitrary awaitables.
Signed-off-by: Sean Quah <seanq@element.io>
Multiple calls to `EventsWorkerStore._get_events_from_cache_or_db` can
reuse the same database fetch, which is initiated by the first call.
Ensure that cancelling the first call doesn't cancel the other calls
sharing the same database fetch.
Signed-off-by: Sean Quah <seanq@element.io>
When we join a room via the faster-joins mechanism, we end up with "partial
state" at some points on the event DAG. Many parts of the codebase need to
wait for the full state to load. So, we implement a mechanism to keep track of
which events have partial state, and wait for them to be fully-populated.
This is a first step in dealing with #7721.
The idea is basically that rather than calculating the full set of users a device list update needs to be sent to up front, we instead simply record the rooms the user was in at the time of the change. This will allow a few things:
1. we can defer calculating the set of remote servers that need to be poked about the change; and
2. during `/sync` and `/keys/changes` we can avoid also avoid calculating users who share rooms with other users, and instead just look at the rooms that have changed.
However, care needs to be taken to correctly handle server downgrades. As such this PR writes to both `device_lists_changes_in_room` and the `device_lists_outbound_pokes` table synchronously. In a future release we can then bump the database schema compat version to `69` and then we can assume that the new `device_lists_changes_in_room` exists and is handled.
There is a temporary option to disable writing to `device_lists_outbound_pokes` synchronously, allowing us to test the new code path does work (and by implication upgrading to a future release and downgrading to this one will work correctly).
Note: Ideally we'd do the calculation of room to servers on a worker (e.g. the background worker), but currently only master can write to the `device_list_outbound_pokes` table.
There are a bunch of places we call get_success on an immediate value, which is unnecessary. Let's rip them out, and remove the redundant functionality in get_success and friends.
Switching to a sequence means there's no need to track `last_txn` on the
AS state table to generate new TXN IDs. This also means that there is
no longer contention between the AS scheduler and AS handler on updates
to the `application_services_state` table, which will prevent serialization
errors during the complete AS txn transaction.
To handle cancellation, we ensure that `after_callback`s and
`exception_callback`s are always run, since the transaction will
complete on another thread regardless of cancellation.
We also wait until everything is done before releasing the
`CancelledError`, so that logging contexts won't get used after they
have been finished.
Signed-off-by: Sean Quah <seanq@element.io>
The unstable identifiers are still supported if the experimental configuration
flag is enabled. The unstable identifiers will be removed in a future release.
Don't attempt to add non-string `value`s to `event_search` and add a
background update to clear out bad rows from `event_search` when
using sqlite.
Signed-off-by: Sean Quah <seanq@element.io>
When the server leaves a room the `get_rooms_for_user` cache is not
correctly invalidated for the remote users in the room. This means that
subsequent calls to `get_rooms_for_user` for the remote users would
incorrectly include the room (it shouldn't be included because the
server no longer knows anything about the room).
* Make `get_auth_chain_ids` return a Set
It has a set internally, and a set is often useful where it gets used, so let's
avoid converting to an intermediate list.
* Minor refactors in `on_send_join_request`
A little bit of non-functional groundwork
* Implement MSC3706: partial state in /send_join response
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse 1.40.0 that caused Synapse to fail to process incoming federation traffic after handling a large amount of events in a v1 room. ([\#11806](https://github.com/matrix-org/synapse/issues/11806))
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAmHum5QTHGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9G6vD/9Dw6V0687vzahnU6aYWaecRSO1sbww
EtcCiXOh0r8HXPAwEIXJYomSTTbPl0eAwP8T1the1WZVQArvRW2VvMzDoBheO8bt
dCm4CTKFCGUsI/GrlFDPtjAEd6kruAETmpHZn7bAFSqtIpissD6FjUwg/ND/NJfM
zVM/bMgM0+js6eD9J5/k16E1ZWj7Lbp+/fKN+qTeQrXzIeT13WQZ4Nz1o/cqe/21
o/coI3FmYq8CzKpfA6qZMDd/OtYpYqwwr7otSmW+6qHeZI/yqeoxlpzfgSZGpRUe
mtXqbQllZQeqrbm8oK96GNmKcThy6awTwJPoD46b1AOkmFdf/lGSqO0lnTQRRPqR
hyPPdrx6Lt2t7DDbVVyUElzkqLPhVJtrItPDC8669sWvmSgsPQdUqIsRmhF+8aJe
ffjRvKbGRICnNZkT+qGf1HwuMHajZyAIHAS/kyFDKUZCvau3VQ1wlyOqVQ1hWr7+
3k3CobjSx4y1bYKXncAK6hWH6lE8M319jaTnVfYXocDLWRonyFf7o286Q+c8WBGF
tY0FzvUPb5S2kktC4WLwcqtcTWK1cu5MI6GfD69EqL7iifJRVyKDeUoD7tiUvgzN
O2wBl2soJIsAU+8y16WQu7p+k2nZIomCCAySnW3C8mlxvv7CKQu0Url8J7IH8uZE
ZVN2MMe7H+ZHJQ==
=Fpsa
-----END PGP SIGNATURE-----
Merge tag 'v1.51.0rc2' into develop
Synapse 1.51.0rc2 (2022-01-24)
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse 1.40.0 that caused Synapse to fail to process incoming federation traffic after handling a large amount of events in a v1 room. ([\#11806](https://github.com/matrix-org/synapse/issues/11806))