Commit Graph

717 Commits

Author SHA1 Message Date
Patrick Cloke
375b0a8a11
Update code to refer to "workers". (#15606)
A bunch of comments and variables are out of date and use
obsolete terms.
2023-05-16 15:56:38 -04:00
Roel ter Maat
2611433b70
Add redis SSL configuration options (#15312)
* Add SSL options to redis config

* fix lint issues

* Add documentation and changelog file

* add missing . at the end of the changelog

* Move client context factory to new file

* Rename ssl to tls and fix typo

* fix lint issues

* Added when redis attributes were added
2023-05-11 13:02:51 +01:00
Jason Little
e4f545c452
Remove worker_replication_* settings (#15491)
* Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.

* Fix typo in drive by.

* Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place)

* Several updates:

1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
3. Clean up old comments/commented out code.
4. Adjust unit tests to match with new code.
5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.

* Initial Docs upload

* Changelog

* Missed some commented out code that can go now

* Remove TODO comment that no longer holds true.

* Fix links in docs

* More docs

* Remove debug logging

* Apply suggestions from code review

Co-authored-by: reivilibre <olivier@librepush.net>

* Apply suggestions from code review

Co-authored-by: reivilibre <olivier@librepush.net>

* Update version to latest, include completeish before/after examples in upgrade notes.

* Fix up and docs too

---------

Co-authored-by: reivilibre <olivier@librepush.net>
2023-05-11 11:30:56 +01:00
Jason Little
d3bd03559b
HTTP Replication Client (#15470)
Separate out a HTTP client for replication in preparation for
also supporting using UNIX sockets. The major difference from
the base class is that this does not use treq to handle HTTP
requests.
2023-05-09 14:25:20 -04:00
Alok Kumar Singh
197fbb123b
Remove legacy code of single user device resync api (#15418)
* Removed single-user resync usage and updated it to use multi-user counterpart

Signed-off-by: Alok Kumar Singh alokaks601@gmail.com
2023-04-21 12:06:39 +01:00
Mathieu Velten
9228ae633f
Add some clarification to the doc/comments regarding TCP replication (#15354) 2023-03-30 12:51:35 +02:00
David Robertson
1bc9985eb7
Have replication clients remove _INT_STREAM_POS (#15309)
* Have replication clients remove _INT_STREAM_POS

Suppose worker A makes an internal http request from worker B. B may
make changes that A later learns about over replication. We want A's
request to block until it has seen those changes—mainly to ensure A's
caches are invalidated promptly. This helps provide read-after-write
consistency, eliminating entire categories of races and test flakes.

To implement this, B includes a top-level field `_INT_STREAM_POS` in its
response JSON. Roughly speaking, the field's value tells A what to wait
for. But we weren't removing that internal field before A's request
completed!

Introduced in https://github.com/matrix-org/synapse/pull/14820.
Fixes #15308.

* Changelog
2023-03-22 12:53:55 +00:00
Patrick Cloke
afb216c202
Remove no-op send_command for Redis replication. (#15274)
With Redis commands do not need to be re-issued by the main
process (they fan-out to all processes at once) and thus it is no
longer necessary to worry about them reflecting recursively forever.
2023-03-16 11:13:30 -04:00
Patrick Cloke
3bf973edc7
Remove unused class: DirectTcpReplicationClientFactory. (#15272) 2023-03-15 15:42:20 -04:00
Dirk Klimpel
ecbe0ddbe7
Add support for knocking to workers. (#15133) 2023-03-02 12:59:53 -05:00
H. Shay
b2fd03d075 Merge branch 'master' into develop 2023-02-28 10:14:20 -08:00
Erik Johnston
b2357a898c
Fix bug where 5s delays would occasionally happen. (#15150)
This only affects deployments using workers.
2023-02-24 14:39:50 +00:00
dependabot[bot]
9bb2eac719
Bump black from 22.12.0 to 23.1.0 (#15103) 2023-02-22 15:29:09 -05:00
reivilibre
addd12f16d
Tweak logging for when a worker waits for its view of a replication stream to catch up. (#15120)Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Improve logging messages for the 'wait for repl stream' read-after-write consistency feature

* Newsfile

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>

* Update synapse/replication/tcp/client.py

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

---------

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2023-02-21 12:26:00 +00:00
Erik Johnston
c78c67c5a9
Fix bug in replication where response is cached (#15024) 2023-02-08 16:41:55 +00:00
David Robertson
80d44060c9
Faster joins: omit partial rooms from eager syncs until the resync completes (#14870)
* Allow `AbstractSet` in `StrCollection`

Or else frozensets are excluded. This will be useful in an upcoming
commit where I plan to change a function that accepts `List[str]` to
accept `StrCollection` instead.

* `rooms_to_exclude` -> `rooms_to_exclude_globally`

I am about to make use of this exclusion mechanism to exclude rooms for
a specific user and a specific sync. This rename helps to clarify the
distinction between the global config and the rooms to exclude for a
specific sync.

* Better function names for internal sync methods

* Track a list of excluded rooms on SyncResultBuilder

I plan to feed a list of partially stated rooms for this sync to ignore

* Exclude partial state rooms during eager sync

using the mechanism established in the previous commit

* Track un-partial-state stream in sync tokens

So that we can work out which rooms have become fully-stated during a
given sync period.

* Fix mutation of `@cached` return value

This was fouling up a complement test added alongside this PR.
Excluding a room would mean the set of forgotten rooms in the cache
would be extended. This means that room could be erroneously considered
forgotten in the future.

Introduced in #12310, Synapse 1.57.0. I don't think this had any
user-visible side effects (until now).

* SyncResultBuilder: track rooms to force as newly joined

Similar plan as before. We've omitted rooms from certain sync responses;
now we establish the mechanism to reintroduce them into future syncs.

* Read new field, to present rooms as newly joined

* Force un-partial-stated rooms to be newly-joined

for eager incremental syncs only, provided they're still fully stated

* Notify user stream listeners to wake up long polling syncs

* Changelog

* Typo fix

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Unnecessary list cast

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Rephrase comment

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Another comment

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

* Fixup merge(?)

* Poke notifier when receiving un-partial-stated msg over replication

* Fixup merge whoops

Thanks MV :)

Co-authored-by: Mathieu Velen <mathieuv@matrix.org>

Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2023-01-23 15:44:39 +00:00
Sean Quah
2ec9c58496
Faster joins: Update room stats and the user directory on workers when finishing join (#14874)
* Faster joins: Update room stats and user directory on workers when done

When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.

We wake the room stats and user directory updaters at the appropriate
time in this commit.

Part of #12814 and #12815.

Signed-off-by: Sean Quah <seanq@matrix.org>

* fixup comment

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-23 10:31:36 +00:00
reivilibre
22cc93afe3
Enable Faster Remote Room Joins against worker-mode Synapse. (#14752)
* Enable Complement tests for Faster Remote Room Joins on worker-mode

* (dangerous) Add an override to allow Complement to use FRRJ under workers

* Newsfile

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>

* Fix race where we didn't send out replication notification

* MORE HACKS

* Fix get_un_partial_stated_rooms_token to take instance_name

* Fix bad merge

* Remove warning

* Correctly advance un_partial_stated_room_stream

* Fix merge

* Add another notify_replication

* Fixups

* Create a separate ReplicationNotifier

* Fix test

* Fix portdb

* Create a separate ReplicationNotifier

* Fix test

* Fix portdb

* Fix presence test

* Newsfile

* Apply suggestions from code review

* Update changelog.d/14752.misc

Co-authored-by: Erik Johnston <erik@matrix.org>

* lint

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Erik Johnston <erik@matrix.org>
2023-01-22 21:10:11 +00:00
Erik Johnston
0ec12a3753
Reduce max time we wait for stream positions (#14881)
Now that we wait for stream positions whenever we do a HTTP replication
hit, we need to be less brutal in the case where we do timeout (as we
have bugs around this).
2023-01-20 21:04:33 +00:00
Erik Johnston
cdf2707678
Fix bug in wait for stream position (#14872)
This caused some requests to fail.

This caused some requests to fail.

This really only started causing issues due to #14856
2023-01-19 22:19:56 +00:00
Erik Johnston
9187fd940e
Wait for streams to catch up when processing HTTP replication. (#14820)
This should hopefully mitigate a class of races where data gets out of
sync due a HTTP replication request racing with the replication streams.
2023-01-18 19:35:29 +00:00
Erik Johnston
316590d1ea
Fix bug in wait_for_stream_position (#14856)
We were incorrectly checking if the *local* token had been advanced, rather than the token for the remote instance.

In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).
2023-01-17 09:58:22 +00:00
Erik Johnston
2b084c5b71
Merge device list replication streams (#14833) 2023-01-17 09:29:58 +00:00
Erik Johnston
73ff493dfb
Merge account data streams (#14826) 2023-01-13 14:57:43 +00:00
reivilibre
ba4ea7d13f
Batch up replication requests to request the resyncing of remote users's devices. (#14716) 2023-01-10 11:17:59 +00:00
Nick Mills-Barrett
db1cfe9c80
Update all stream IDs after processing replication rows (#14723)
This creates a new store method, `process_replication_position` that
is called after `process_replication_rows`. By moving stream ID advances
here this guarantees any relevant cache invalidations will have been
applied before the stream is advanced.

This avoids race conditions where Python switches between threads mid
way through processing the `process_replication_rows` method where stream
IDs may be advanced before caches are invalidated due to class resolution
ordering.

See this comment/issue for further discussion:
	https://github.com/matrix-org/synapse/issues/14158#issuecomment-1344048703
2023-01-04 11:49:26 +00:00
Andrew Morgan
c4456114e1
Add experimental support for MSC3391: deleting account data (#14714) 2023-01-01 03:40:46 +00:00
reivilibre
2888d7ec83
Faster remote room joins: invalidate caches and unblock requests when receiving un-partial-stated event notifications over replication. [rei:frrj/streams/unpsr] (#14546) 2022-12-19 14:57:51 +00:00
reivilibre
fb60cb16fe
Faster remote room joins: stream the un-partial-stating of events over replication. [rei:frrj/streams/unpsr] (#14545) 2022-12-14 14:47:11 +00:00
reivilibre
9e82caac45
Faster remote room joins: unblock tasks waiting for full room state when the un-partial-stating of that room is received over the replication stream. [rei:frrj/streams/unpsr] (#14474) 2022-12-06 15:48:42 +00:00
reivilibre
501f62d1a6
Faster remote room joins: stream the un-partial-stating of rooms over replication. [rei:frrj/streams/unpsr] (#14473) 2022-12-05 13:07:55 +00:00
Patrick Cloke
6d47b7e325
Add a type hint for get_device_handler() and fix incorrect types. (#14055)
This was the last untyped handler from the HomeServer object. Since
it was being treated as Any (and thus unchecked) it was being used
incorrectly in a few places.
2022-11-22 14:08:04 -05:00
Andrew Morgan
e7132c3f81
Fix check to ignore blank lines in incoming TCP replication (#14449) 2022-11-17 16:09:56 +00:00
David Robertson
115f0eb233
Reintroduce #14376, with bugfix for monoliths (#14468)
* Add tests for StreamIdGenerator

* Drive-by: annotate all defs

* Revert "Revert "Remove slaved id tracker (#14376)" (#14463)"

This reverts commit d63814fd73, which in
turn reverted 36097e88c4. This restores
the latter.

* Fix StreamIdGenerator not handling unpersisted IDs

Spotted by @erikjohnston.

Closes #14456.

* Changelog

Co-authored-by: Nick Mills-Barrett <nick@fizzadar.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
2022-11-16 22:16:46 +00:00
realtyem
c15e9a0edb
Remove need for worker_main_http_uri setting to use /keys/upload. (#14400) 2022-11-16 22:16:25 +00:00
Patrick Cloke
d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Erik Johnston
d63814fd73
Revert "Remove slaved id tracker (#14376)" (#14463)
This reverts commit 36097e88c4.
2022-11-16 13:50:07 +00:00
Tuomas Ojamies
b5ab2c428a
Support using SSL on worker endpoints. (#14128)
* Fix missing SSL support in worker endpoints.

* Add changelog

* SSL for Replication endpoint

* Remove unit test change

* Refactor listener creation to reduce duplicated code

* Fix the logger message

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Add config documentation for new TLS option

Co-authored-by: Tuomas Ojamies <tojamies@palantir.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2022-11-15 12:55:00 +00:00
Nick Mills-Barrett
36097e88c4
Remove slaved id tracker (#14376)
This matches the multi instance writer ID generator class which can
both handle advancing the current token over replication and by calling
the database.
2022-11-14 17:31:36 +00:00
Nick Mills-Barrett
3a4f80f8c6
Merge/remove Slaved* stores into WorkerStores (#14375) 2022-11-11 10:51:49 +00:00
Patrick Cloke
bc2bd92b93 Merge remote-tracking branch 'origin/release-v1.69' into develop 2022-10-14 14:11:27 -04:00
Brendan Abolivier
422cff7df6
Fallback if 'approved' isn't included in a registration replication request (#14135) 2022-10-11 14:41:06 +02:00
Shay
7b7478e8b6
Batch up notifications after event persistence (#14033) 2022-10-05 10:12:48 -07:00
Brendan Abolivier
be76cd8200
Allow admins to require a manual approval process before new accounts can be used (using MSC3866) (#13556) 2022-09-29 15:23:24 +02:00
Shay
8ab16a92ed
Persist CreateRoom events to DB in a batch (#13800) 2022-09-28 10:11:48 +00:00
Patrick Cloke
efd108b45d
Accept & store thread IDs for receipts (implement MSC3771). (#13782)
Updates the `/receipts` endpoint and receipt EDU handler to parse a
`thread_id` from the body and insert it in the database.
2022-09-23 14:33:28 +00:00
Brendan Abolivier
8ae42ab8fa
Support enabling/disabling pushers (from MSC3881) (#13799)
Partial implementation of MSC3881
2022-09-21 14:39:01 +00:00
Patrick Cloke
32fc3b7ba4
Remove configuration options for direct TCP replication. (#13647)
Removes the ability to configure legacy direct TCP replication. Workers now require Redis to run.
2022-09-06 07:50:02 +00:00
Šimon Brandner
0e99f07952
Remove support for unstable private read receipts (#13653)
Signed-off-by: Šimon Brandner <simon.bra.ag@gmail.com>
2022-09-01 13:31:54 +01:00
reivilibre
7bc110a19e
Generalise the @cancellable annotation so it can be used on functions other than just servlet methods. (#13662) 2022-08-31 11:16:05 +00:00
Erik Johnston
aec87a0f93
Speed up fetching large numbers of push rules (#13592) 2022-08-23 13:15:43 +01:00
Šimon Brandner
ab18441573
Support stable identifiers for MSC2285: private read receipts. (#13273)
This adds support for the stable identifiers of MSC2285 while
continuing to support the unstable identifiers behind the configuration
flag. These will be removed in a future version.
2022-08-05 11:09:33 -04:00
Nick Mills-Barrett
86e366a46e
Remove old empty/redundant slaved stores. (#13349) 2022-07-21 17:56:45 +00:00
Nick Mills-Barrett
190f49d8ab
Use cache store remove base slaved (#13329)
This comes from two identical definitions in each of the base stores, and means the base slaved store is now empty and can be removed.
2022-07-21 11:51:30 +01:00
Patrick Cloke
a6895dd576
Add type annotations to trace decorator. (#13328)
Functions that are decorated with `trace` are now properly typed
and the type hints for them are fixed.
2022-07-19 14:14:30 -04:00
David Robertson
b977867358
Rate limit joins per-room (#13276) 2022-07-19 11:45:17 +00:00
Erik Johnston
f721f1baba
Revert "Make all process_replication_rows methods async (#13304)" (#13312)
This reverts commit 5d4028f217.
2022-07-18 14:28:14 +01:00
Nick Mills-Barrett
5d4028f217
Make all process_replication_rows methods async (#13304)
More prep work for asyncronous caching, also makes all process_replication_rows methods consistent (presence handler already is so).

Signed off by Nick @ Beeper (@Fizzadar)
2022-07-17 22:19:43 +01:00
Sean Quah
1391a76cd2
Faster room joins: fix race in recalculation of current room state (#13151)
Bounce recalculation of current state to the correct event persister and
move recalculation of current state into the event persistence queue, to
avoid concurrent updates to a room's current state.

Also give recalculation of a room's current state a real stream
ordering.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-07 12:19:31 +00:00
Sean Quah
68db233f0c
Handle race between persisting an event and un-partial stating a room (#13100)
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.

We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.

To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.

All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.

On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.

The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.

We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.

`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.

`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.

Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-07-05 16:12:52 +01:00
David Robertson
97e9fbe1b2
Type annotations in synapse.databases.main.devices (#13025)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-06-15 15:20:04 +00:00
Patrick Cloke
cf05258f76
Remove groups replication code. (#12900)
The replication logic for groups is no longer used, so the message
passing infrastructure can be removed.
2022-05-31 13:04:08 -04:00
Erik Johnston
1e453053cb
Rename storage classes (#12913) 2022-05-31 12:17:50 +00:00
reivilibre
39dee30f01
Send USER_IP commands on a different Redis channel, in order to reduce traffic to workers that do not process these commands. (#12809) 2022-05-20 15:28:23 +01:00
reivilibre
177b884ad7
Lay some foundation work to allow workers to only subscribe to some kinds of messages, reducing replication traffic. (#12672) 2022-05-19 16:29:08 +01:00
Andrew Morgan
83be72d76c
Add StreamKeyType class and replace string literals with constants (#12567) 2022-05-16 15:35:31 +00:00
Sean Quah
a559c8b0d9
Respect the @cancellable flag for ReplicationEndpoints (#12700)
While `ReplicationEndpoint`s register themselves via `JsonResource`,
they pass a method that calls the handler, instead of the handler itself,
to `register_paths`. As a result, `JsonResource` will not correctly pick
up the `@cancellable` flag and we have to apply it ourselves.

Signed-off-by: Sean Quah <seanq@element.io>
2022-05-11 12:25:39 +01:00
Shay
d80a7ab151
Update replication.md with info on TCP module structure (#12621) 2022-05-09 14:46:43 -07:00
Šimon Brandner
ef86cf3d28
Update _on_new_receipts() to work with MSC2285 changes. (#12636) 2022-05-05 13:25:51 +00:00
Erik Johnston
c0379d6e5b
Reduce log spam when running multiple event persisters (#12610) 2022-05-05 10:20:23 +01:00
Erik Johnston
d1cd96ce29
Add opentracing spans to calls to external cache (#12380) 2022-04-07 13:18:29 +01:00
Sean Quah
800ba87cc8
Refactor and convert Linearizer to async (#12357)
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.

Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 15:43:52 +01:00
Erik Johnston
66053b6bfb
Prefill more stream change caches. (#12372) 2022-04-05 14:26:41 +01:00
Erik Johnston
b446c99ac9
Prefill the device_list_stream_cache (#12367)
* Prefill the device_list_stream_cache

* Newsfile

* Newsfile
2022-04-04 20:12:25 +01:00
Erik Johnston
5c9e39e619
Track device list updates per room. (#12321)
This is a first step in dealing with #7721.

The idea is basically that rather than calculating the full set of users a device list update needs to be sent to up front, we instead simply record the rooms the user was in at the time of the change. This will allow a few things:

1. we can defer calculating the set of remote servers that need to be poked about the change; and
2. during `/sync` and `/keys/changes` we can avoid also avoid calculating users who share rooms with other users, and instead just look at the rooms that have changed.

However, care needs to be taken to correctly handle server downgrades. As such this PR writes to both `device_lists_changes_in_room` and the `device_lists_outbound_pokes` table synchronously. In a future release we can then bump the database schema compat version to `69` and then we can assume that the new `device_lists_changes_in_room` exists and is handled.

There is a temporary option to disable writing to `device_lists_outbound_pokes` synchronously, allowing us to test the new code path does work (and by implication upgrading to a future release and downgrading to this one will work correctly).

Note: Ideally we'd do the calculation of room to servers on a worker (e.g. the background worker), but currently only master can write to the `device_list_outbound_pokes` table.
2022-04-04 15:25:20 +01:00
reivilibre
f871222880
Move update_client_ip background job from the main process to the background worker. (#12251) 2022-04-01 13:08:55 +01:00
David Robertson
a2b00a4486
Bump black and click versions (#12320) 2022-03-29 10:41:19 +00:00
reivilibre
4a53f35737
Improve code documentation for the typing stream over replication. (#12211) 2022-03-11 14:00:15 +00:00
Patrick Cloke
3e4af36bc8
Rename get_tcp_replication to get_replication_command_handler. (#12192)
Since the object it returns is a ReplicationCommandHandler.

This is clean-up from adding support to Redis where the command handler
was added as an additional layer of abstraction from the TCP protocol.
2022-03-10 13:01:56 +00:00
Nick Mills-Barrett
180d8ff0d4
Retry some http replication failures (#12182)
This allows for the target process to be down for around a minute
which provides time for restarts during synapse upgrades/config updates.

Closes: #12178

Signed off by Nick Mills-Barrett nick@beeper.com
2022-03-09 14:53:28 +00:00
Patrick Cloke
d8bab6793c
Fix incorrect type hints for txredis. (#12042)
Some properties were marked as RedisProtocol instead of ConnectionHandler,
which wraps RedisProtocol instance(s).
2022-03-08 07:26:05 -05:00
Erik Johnston
423cca9efe
Spread out sending device lists to remote hosts (#12132) 2022-03-04 11:48:15 +00:00
Richard van der Hoff
e24ff8ebe3
Remove HomeServer.get_datastore() (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
Erik Johnston
6d14b3dabf
Better error message when failing to request from another process (#12060) 2022-02-22 15:52:08 +00:00
Patrick Cloke
d0e78af35e
Add missing type hints to synapse.replication. (#11938) 2022-02-08 11:03:08 -05:00
Patrick Cloke
6c0984e3f0
Remove unnecessary ignores due to Twisted upgrade. (#11939)
Twisted 22.1.0 fixed some internal type hints, allowing Synapse
to remove ignore calls for parameters to connectTCP.
2022-02-08 09:15:59 -05:00
Patrick Cloke
63d90f10ec
Add missing type hints to synapse.replication.http. (#11856) 2022-02-08 07:44:39 -05:00
Richard van der Hoff
2277275485
Stop reading from event_reference_hashes (#11794)
Preparation for dropping this table altogether. Part of #6574.
2022-01-21 09:18:10 +00:00
Patrick Cloke
10a88ba91c
Use auto_attribs/native type hints for attrs classes. (#11692) 2022-01-13 13:49:28 +00:00
Richard van der Hoff
2359ee3864
Remove redundant get_current_events_token (#11643)
* Push `get_room_{min,max_stream_ordering}` into StreamStore

Both implementations of this are identical, so we may as well push it down and
get rid of the abstract base class nonsense.

* Remove redundant `StreamStore` class

This is empty now

* Remove redundant `get_current_events_token`

This was an exact duplicate of `get_room_max_stream_ordering`, so let's get rid
of it.

* newsfile
2022-01-04 16:10:27 +00:00
Patrick Cloke
cbd82d0b2d
Convert all namedtuples to attrs. (#11665)
To improve type hints throughout the code.
2021-12-30 18:47:12 +00:00
Sean Quah
5305a5e881
Type hint the constructors of the data store classes (#11555) 2021-12-13 17:05:00 +00:00
Quentin Gliech
a15a893df8
Save the OIDC session ID (sid) with the device on login (#11482)
As a step towards allowing back-channel logout for OIDC.
2021-12-06 12:43:06 -05:00
Sean Quah
ffd858aa68
Add type hints to synapse/storage/databases/main/events_worker.py (#11411)
Also refactor the stream ID trackers/generators a bit and try to
document them better.
2021-11-26 18:41:31 +00:00
Patrick Cloke
5cace20bf1
Add missing type hints to synapse.app. (#11287) 2021-11-10 15:06:54 -05:00
Nick Barrett
af54167516
Enable passing typing stream writers as a list. (#11237)
This makes the typing stream writer config match the other stream writers
that only currently support a single worker.
2021-11-03 14:25:47 +00:00
Brendan Abolivier
c7a5e49664
Implement an on_new_event callback (#11126)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2021-10-26 15:17:36 +02:00
Sean Quah
2b82ec425f
Add type hints for most HomeServer parameters (#11095) 2021-10-22 18:15:41 +01:00
Sean Quah
6a67f3786a
Fix logging context warnings when losing replication connection (#10984)
Instead of triggering `__exit__` manually on the replication handler's
logging context, use it as a context manager so that there is an
`__enter__` call to balance the `__exit__`.
2021-10-15 13:10:58 +01:00
Sean Quah
6b18eb4430
Fix opentracing and Prometheus metrics for replication requests (#10996)
This commit fixes two bugs to do with decorators not instrumenting
`ReplicationEndpoint`'s `send_request` correctly. There are two
decorators on `send_request`: Prometheus' `Gauge.track_inprogress()`
and Synapse's `opentracing.trace`.

`Gauge.track_inprogress()` does not have any support for async
functions when used as a decorator. Since async functions behave like
regular functions that return coroutines, only the creation of the
coroutine was covered by the metric and none of the actual body of
`send_request`.

`Gauge.track_inprogress()` returns a regular, non-async function
wrapping `send_request`, which is the source of the next bug.
The `opentracing.trace` decorator would normally handle async functions
correctly, but since the wrapped `send_request` is a non-async function,
the decorator ends up suffering from the same issue as
`Gauge.track_inprogress()`: the opentracing span only measures the
creation of the coroutine and none of the actual function body.

Using `Gauge.track_inprogress()` as a context manager instead of a
decorator resolves both bugs.
2021-10-12 11:23:46 +01:00