Commit Graph

134 Commits

Author SHA1 Message Date
Sean Quah
68db233f0c
Handle race between persisting an event and un-partial stating a room (#13100)
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.

Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
Erik Johnston
1e453053cb
Rename storage classes (#12913) 2022-05-31 12:17:50 +00:00
Sean Quah
a559c8b0d9
Respect the @cancellable flag for ReplicationEndpoints (#12700)
While `ReplicationEndpoint`s register themselves via `JsonResource`,
they pass a method that calls the handler, instead of the handler itself,
to `register_paths`. As a result, `JsonResource` will not correctly pick
up the `@cancellable` flag and we have to apply it ourselves.

Signed-off-by: Sean Quah <seanq@element.io>
2022-05-11 12:25:39 +01:00
David Robertson
a2b00a4486
Bump black and click versions (#12320) 2022-03-29 10:41:19 +00:00
Nick Mills-Barrett
180d8ff0d4
Retry some http replication failures (#12182)
This allows for the target process to be down for around a minute
which provides time for restarts during synapse upgrades/config updates.

Closes: #12178

Signed off by Nick Mills-Barrett nick@beeper.com
2022-03-09 14:53:28 +00:00
Richard van der Hoff
e24ff8ebe3
Remove HomeServer.get_datastore() (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
Erik Johnston
6d14b3dabf
Better error message when failing to request from another process (#12060) 2022-02-22 15:52:08 +00:00
Patrick Cloke
63d90f10ec
Add missing type hints to synapse.replication.http. (#11856) 2022-02-08 07:44:39 -05:00
Quentin Gliech
a15a893df8
Save the OIDC session ID (sid) with the device on login (#11482)
As a step towards allowing back-channel logout for OIDC.
2021-12-06 12:43:06 -05:00
Sean Quah
2b82ec425f
Add type hints for most HomeServer parameters (#11095) 2021-10-22 18:15:41 +01:00
Sean Quah
6b18eb4430
Fix opentracing and Prometheus metrics for replication requests (#10996)
This commit fixes two bugs to do with decorators not instrumenting
`ReplicationEndpoint`'s `send_request` correctly. There are two
decorators on `send_request`: Prometheus' `Gauge.track_inprogress()`
and Synapse's `opentracing.trace`.

`Gauge.track_inprogress()` does not have any support for async
functions when used as a decorator. Since async functions behave like
regular functions that return coroutines, only the creation of the
coroutine was covered by the metric and none of the actual body of
`send_request`.

`Gauge.track_inprogress()` returns a regular, non-async function
wrapping `send_request`, which is the source of the next bug.
The `opentracing.trace` decorator would normally handle async functions
correctly, but since the wrapped `send_request` is a non-async function,
the decorator ends up suffering from the same issue as
`Gauge.track_inprogress()`: the opentracing span only measures the
creation of the coroutine and none of the actual function body.

Using `Gauge.track_inprogress()` as a context manager instead of a
decorator resolves both bugs.
2021-10-12 11:23:46 +01:00
Patrick Cloke
bb7fdd821b
Use direct references for configuration variables (part 5). (#10897) 2021-09-24 07:25:21 -04:00
Richard van der Hoff
1800aabfc2
Split FederationHandler in half (#10692)
The idea here is to take anything to do with incoming events and move it out to a separate handler, as a way of making FederationHandler smaller.
2021-08-26 21:41:44 +01:00
Jonathan de Jong
bf72d10dbf
Use inline type hints in various other places (in synapse/) (#10380) 2021-07-15 11:02:43 +01:00
Quentin Gliech
bd4919fb72
MSC2918 Refresh tokens implementation (#9450)
This implements refresh tokens, as defined by MSC2918

This MSC has been implemented client side in Hydrogen Web: vector-im/hydrogen-web#235

The basics of the MSC works: requesting refresh tokens on login, having the access tokens expire, and using the refresh token to get a new one.

Signed-off-by: Quentin Gliech <quentingliech@gmail.com>
2021-06-24 14:33:20 +01:00
Richard van der Hoff
d7808a2dde
Extend ResponseCache to pass a context object into the callback (#10157)
This is the first of two PRs which seek to address #8518. This first PR lays the groundwork by extending ResponseCache; a second PR (#10158) will update the SyncHandler to actually use it, and fix the bug.

The idea here is that we allow the callback given to ResponseCache.wrap to decide whether its result should be cached or not. We do that by (optionally) passing a ResponseCacheContext into it, which it can modify.
2021-06-14 10:26:09 +01:00
Sorunome
d936371b69
Implement knock feature (#6739)
This PR aims to implement the knock feature as proposed in https://github.com/matrix-org/matrix-doc/pull/2403

Signed-off-by: Sorunome mail@sorunome.de
Signed-off-by: Andrew Morgan andrewm@element.io
2021-06-09 19:39:51 +01:00
Richard van der Hoff
1bf83a191b
Clean up the interface for injecting opentracing over HTTP (#10143)
* Remove unused helper functions

* Clean up the interface for injecting opentracing over HTTP

* changelog
2021-06-09 11:33:00 +01:00
Andrew Morgan
4d6e5a5e99
Use a database table to hold the users that should have full presence sent to them, instead of something in-memory (#9823) 2021-05-18 14:13:45 +01:00
Erik Johnston
9d25a0ae65
Split presence out of master (#9820) 2021-04-23 12:21:55 +01:00
Jonathan de Jong
4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Erik Johnston
963f4309fe
Make RateLimiter class check for ratelimit overrides (#9711)
This should fix a class of bug where we forget to check if e.g. the appservice shouldn't be ratelimited.

We also check the `ratelimit_override` table to check if the user has ratelimiting disabled. That table is really only meant to override the event sender ratelimiting, so we don't use any values from it (as they might not make sense for different rate limits), but we do infer that if ratelimiting is disabled for the user we should disabled all ratelimits.

Fixes #9663
2021-03-30 12:06:09 +01:00
Richard van der Hoff
567f88f835
Prep work for removing outlier from internal_metadata (#9411)
* Populate `internal_metadata.outlier` based on `events` table

Rather than relying on `outlier` being in the `internal_metadata` column,
populate it based on the `events.outlier` column.

* Move `outlier` out of InternalMetadata._dict

Ultimately, this will allow us to stop writing it to the database. For now, we
have to grandfather it back in so as to maintain compatibility with older
versions of Synapse.
2021-03-17 12:33:18 +00:00
Richard van der Hoff
1107214a1d
Fix the auth provider on the logins metric (#9573)
We either need to pass the auth provider over the replication api, or make sure
we report the auth provider on the worker that received the request. I've gone
with the latter.
2021-03-10 18:15:03 +00:00
Jonathan de Jong
d6196efafc
Add ResponseCache tests. (#9458) 2021-03-08 14:00:07 -05:00
Patrick Cloke
a0bc9d387e
Use the proper Request in type hints. (#9515)
This also pins the Twisted version in the mypy job for CI until
proper type hints are fixed throughout Synapse.
2021-03-01 12:23:46 -05:00
Erik Johnston
66f4949e7f
Fix deleting pushers when using sharded pushers. (#9465) 2021-02-22 21:14:42 +00:00
AndrewFerr
9bc74743d5
Add configs to make profile data more private (#9203)
Add off-by-default configuration settings to:
- disable putting an invitee's profile info in invite events
- disable profile lookup via federation

Signed-off-by: Andrew Ferrazzutti <fair@miscworks.net>
2021-02-19 09:50:41 +00:00
Eric Eastwood
0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Erik Johnston
6633a4015a
Allow moving account data and receipts streams off master (#9104) 2021-01-18 15:47:59 +00:00
Erik Johnston
f08ef64926
Enforce all replication HTTP clients calls use kwargs (#9144) 2021-01-18 15:24:04 +00:00
Erik Johnston
a7a913918c Merge remote-tracking branch 'origin/erikj/as_mau_block' into develop 2020-12-18 09:51:56 +00:00
Erik Johnston
4c33796b20 Correctly handle AS registerations and add test 2020-12-17 12:55:21 +00:00
Patrick Cloke
96358cb424
Add authentication to replication endpoints. (#8853)
Authentication is done by checking a shared secret provided
in the Synapse configuration file.
2020-12-04 10:56:28 -05:00
Andrew Morgan
5cbe8d93fe
Add typing to membership Replication class methods (#8809)
This PR grew out of #6739, and adds typing to some method arguments

You'll notice that there are a lot of `# type: ignores` in here. This is due to the base methods not matching the overloads here. This is necessary to stop mypy complaining, but a better solution is #8828.
2020-11-27 10:49:38 +00:00
Andrew Morgan
e8d0853739
Generalise _maybe_store_room_on_invite (#8754)
There's a handy function called maybe_store_room_on_invite which allows us to create an entry in the rooms table for a room and its version for which we aren't joined to yet, but we can reference when ingesting events about.

This is currently used for invites where we receive some stripped state about the room and pass it down via /sync to the client, without us being in the room yet.

There is a similar requirement for knocking, where we will eventually do the same thing, and need an entry in the rooms table as well. Thus, reusing this function works, however its name needs to be generalised a bit.

Separated out from #6739.
2020-11-13 16:24:04 +00:00
Erik Johnston
f21e24ffc2
Add ability for access tokens to belong to one user but grant access to another user. (#8616)
We do it this way round so that only the "owner" can delete the access token (i.e. `/logout/all` by the "owner" also deletes that token, but `/logout/all` by the "target user" doesn't).

A future PR will add an API for creating such a token.

When the target user and authenticated entity are different the `Processed request` log line will be logged with a: `{@admin:server as @bob:server} ...`. I'm not convinced by that format (especially since it adds spaces in there, making it harder to use `cut -d ' '` to chop off the start of log lines). Suggestions welcome.
2020-10-29 15:58:44 +00:00
Erik Johnston
b2486f6656
Fix message duplication if something goes wrong after persisting the event (#8476)
Should fix #3365.
2020-10-13 12:07:56 +01:00
Patrick Cloke
1781bbe319
Add type hints to response cache. (#8507) 2020-10-09 11:35:11 -04:00
Patrick Cloke
c9c0ad5e20
Remove the deprecated Handlers object (#8494)
All handlers now available via get_*_handler() methods on the HomeServer.
2020-10-09 07:24:34 -04:00
Richard van der Hoff
866c84da8d
Add metrics to track success/otherwise of replication requests (#8406)
One hope is that this might provide some insights into #3365.
2020-09-29 11:06:11 +01:00
Patrick Cloke
8a4a4186de
Simplify super() calls to Python 3 syntax. (#8344)
This converts calls like super(Foo, self) -> super().

Generated with:

    sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
2020-09-18 09:56:44 -04:00
Jonathan de Jong
a3f124b821
Switch metaclass initialization to python 3-compatible syntax (#8326) 2020-09-16 15:15:55 -04:00
Erik Johnston
04cc249b43
Add experimental support for sharding event persister. Again. (#8294)
This is *not* ready for production yet. Caveats:

1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
2020-09-14 10:16:41 +01:00
Patrick Cloke
2ea1c68249
Remove some unused distributor signals (#8216)
Removes the `user_joined_room` and stops calling it since there are no observers.

Also cleans-up some other unused signals and related code.
2020-09-09 12:22:00 -04:00
Patrick Cloke
c619253db8
Stop sub-classing object (#8249) 2020-09-04 06:54:56 -04:00
Brendan Abolivier
9f8abdcc38
Revert "Add experimental support for sharding event persister. (#8170)" (#8242)
* Revert "Add experimental support for sharding event persister. (#8170)"

This reverts commit 82c1ee1c22.

* Changelog
2020-09-04 10:19:42 +01:00
Erik Johnston
82c1ee1c22
Add experimental support for sharding event persister. (#8170)
This is *not* ready for production yet. Caveats:

1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
2020-09-02 15:48:37 +01:00
Patrick Cloke
ac77cdb64e
Add a shadow-banned flag to users. (#8092) 2020-08-14 12:37:59 -04:00
Patrick Cloke
3b415e23a5
Convert replication code to async/await. (#7987) 2020-08-03 07:12:55 -04:00