Commit Graph

136 Commits

Author SHA1 Message Date
Erik Johnston
b3b793786c
Fix sync waiting for an invalid token from the "future" (#17386)
Fixes https://github.com/element-hq/synapse/issues/17274, hopefully.

Basically, old versions of Synapse could advance streams without
persisting anything in the DB (fixed in #17229). On restart those
updates would get lost, and so the position of the stream would revert
to an older position. If this happened across an upgrade to a later
Synapse version which included #17215, then sync could get blocked
indefinitely (until the stream advanced to the position in the token).

We fix this by bounding the stream positions we'll wait for to the
maximum position of the underlying stream ID generator.
2024-07-02 12:39:49 +01:00
Erik Johnston
c89fea3fd1
Limit amount of replication we send (#17358)
Fixes up #17333, where we failed to actually send less data (the
`DISTINCT` didn't work due to `stream_id` being different).

We fix this by making it so that every device list outbound poke for a
given user ID has the same stream ID. We can't change the query to only
return e.g. max stream ID as the receivers look up the destinations to
send to by doing `SELECT WHERE stream_id = ?`
2024-06-25 11:17:39 +01:00
Erik Johnston
554a92601a
Reintroduce "Reduce device lists replication traffic."" (#17361)
Reintroduces https://github.com/element-hq/synapse/pull/17333


Turns out the reason for revert was down two master instances running
2024-06-25 10:34:34 +01:00
Erik Johnston
a98cb87bee
Revert "Reduce device lists replication traffic." (#17360)
Reverts element-hq/synapse#17333

It looks like master was still sending out replication RDATA with the
old format... somehow
2024-06-25 09:57:34 +01:00
Erik Johnston
cf711ac03c
Reduce device lists replication traffic. (#17333)
Reduce the replication traffic of device lists, by not sending every
destination that needs to be sent the device list update over
replication. Instead a "hosts to send to have been calculated"
notification over replication, and then federation senders read the
destinations from the DB.

For non federation senders this should heavily reduce the impact of a
user in many large rooms changing a device.
2024-06-24 14:15:13 +01:00
Erik Johnston
8c4937b216
Fix bug where device lists would break sync (#17292)
If the stream ID in the unconverted table is ahead of the device lists
ID gen, then it can break all /sync requests that had an ID from ahead
of the table.

The fix is to make sure we add the unconverted table to the list of
tables we check at start up.

Broke in https://github.com/element-hq/synapse/pull/17229
2024-06-10 15:56:57 +01:00
Erik Johnston
d16910ca02
Replaces all usages of StreamIdGenerator with MultiWriterIdGenerator (#17229)
Replaces all usages of `StreamIdGenerator` with `MultiWriterIdGenerator`, which is safer.
2024-05-30 11:07:32 +00:00
Erik Johnston
b71d277438
Reduce work of calculating outbound device pokes (#17211) 2024-05-22 13:55:18 +01:00
Erik Johnston
b5facbac0f
Improve perf of sync device lists (#17216)
Re-introduces #17191, and includes #17197 and #17214

The basic idea is to stop calling `get_rooms_for_user` everywhere, and
instead use the table `device_lists_changes_in_room`.

Commits reviewable one-by-one.
2024-05-21 16:48:20 +01:00
Erik Johnston
fd12003441
Revert "Improve perf of sync device lists" (#17207)
Reverts element-hq/synapse#17191
2024-05-16 16:07:54 +01:00
Erik Johnston
0b91ccce47
Improve perf of sync device lists (#17191)
It's almost always more efficient to query the rooms that have device
list changes, rather than looking at the list of all users whose devices
have changed and then look for shared rooms.
2024-05-14 14:39:04 +01:00
dependabot[bot]
1e68b56a62
Bump black from 23.10.1 to 24.2.0 (#16936) 2024-03-13 16:46:44 +00:00
Erik Johnston
23740eaa3d
Correctly mention previous copyright (#16820)
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
2024-01-23 11:26:48 +00:00
Erik Johnston
4c67f0391b
Split up deleting devices into batches (#16766)
Otherwise for users with large numbers of devices this can cause a lot
of woe.
2024-01-10 13:55:16 +00:00
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Patrick Cloke
ab3f1b3b53
Convert simple_select_one_txn and simple_select_one to return tuples. (#16612) 2023-11-09 11:13:31 -05:00
Patrick Cloke
9738b1c497
Avoid executing no-op queries. (#16583)
If simple_{insert,upsert,update}_many_txn is called without any data
to modify then return instead of executing the query.

This matches the behavior of simple_{select,delete}_many_txn.
2023-11-07 14:00:25 -05:00
Patrick Cloke
cfb6d38c47
Remove remaining usage of cursor_to_dict. (#16564) 2023-10-31 13:13:28 -04:00
Patrick Cloke
679c691f6f
Remove more usages of cursor_to_dict. (#16551)
Mostly to improve type safety.
2023-10-26 15:12:28 -04:00
Patrick Cloke
9407d5ba78
Convert simple_select_list and simple_select_list_txn to return lists of tuples (#16505)
This should use fewer allocations and improves type hints.
2023-10-26 13:01:36 -04:00
Patrick Cloke
a4904dcb04
Convert simple_select_many_batch, simple_select_many_txn to tuples. (#16444) 2023-10-11 13:24:56 -04:00
Patrick Cloke
fa907025f4
Remove manys calls to cursor_to_dict (#16431)
This avoids calling cursor_to_dict and then immediately
unpacking the values in the dict for other users. By not
creating the intermediate dictionary we can avoid allocating
the dictionary and strings for the keys, which should generally
be more performant.

Additionally this improves type hints by avoid Dict[str, Any]
dictionaries coming out of the database layer.
2023-10-05 11:07:38 -04:00
Patrick Cloke
d7c89c5908
Return immutable objects for cachedList decorators (#16350) 2023-09-19 15:26:44 -04:00
Erik Johnston
1cd410a783
Recheck if remote device is cached before requesting it (#16252)
This fixes a bug where we could get stuck re-requesting the device over
replication again and again.
2023-09-07 12:45:43 +00:00
Patrick Cloke
32fb264120 Merge remote-tracking branch 'origin/release-v1.92' into develop 2023-09-06 13:08:22 -04:00
Quentin Gliech
1940d990a3
Revert MSC3861 introspection cache, admin impersonation and account lock (#16258) 2023-09-06 15:19:51 +01:00
Mathieu Velten
4f1840a88a
Delete device messages asynchronously and in staged batches (#16240) 2023-09-06 09:30:53 +02:00
Shay
69048f7b48
Add an admin endpoint to allow authorizing server to signal token revocations (#16125) 2023-08-22 14:15:34 +00:00
Shay
0328b56468
Support MSC3814: Dehydrated Devices Part 2 (#16010) 2023-08-08 12:04:46 -07:00
pacien
07d7cbfe69
devices: use combined ANY clause for faster cleanup (#15861)
Old device entries for the same user were being removed in individual
SQL commands, making the batch take way longer than necessary.

This combines the commands into a single one with a IN/ANY clause.

Example of log entry before the change, regularly observed with
"log_min_duration_statement = 10000" in PostgreSQL's config:

    LOG:  duration: 42538.282 ms  statement:
    DELETE FROM device_lists_stream
    WHERE user_id = '@someone' AND device_id = 'someid1'
    AND stream_id < 123456789
    ;
    DELETE FROM device_lists_stream
    WHERE user_id = '@someone' AND device_id = 'someid2'
    AND stream_id < 123456789
    ;
    [repeated for each device ID of that user, potentially a lot...]

With the patch applied on my instance for the past couple of days, I
no longer notice overly long statements of that particular kind.

Signed-off-by: pacien <pacien.trangirard@pacien.net>
2023-07-03 16:39:38 +02:00
Erik Johnston
5ed0e8c61f
Cache requests for user's devices from federation (#15675)
This should mitigate the issue where lots of different servers requests
the same user's devices all at once.
2023-06-01 13:25:20 +00:00
Patrick Cloke
375b0a8a11
Update code to refer to "workers". (#15606)
A bunch of comments and variables are out of date and use
obsolete terms.
2023-05-16 15:56:38 -04:00
Erik Johnston
6204c3663e
Revert pruning of old devices (#15360)
* Revert "Fix registering a device on an account with lots of devices (#15348)"

This reverts commit f0d8f66eaa.

* Revert "Delete stale non-e2e devices for users, take 3 (#15183)"

This reverts commit 78cdb72cd6.
2023-03-31 13:51:51 +01:00
Erik Johnston
f0d8f66eaa
Fix registering a device on an account with lots of devices (#15348)
Fixes up #15183
2023-03-29 13:37:06 +00:00
Erik Johnston
78cdb72cd6
Delete stale non-e2e devices for users, take 3 (#15183)
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.

We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
2023-03-29 12:07:14 +01:00
Patrick Cloke
02f74f3a99
Combine AbstractStreamIdTracker and AbstractStreamIdGenerator. (#15192)
AbstractStreamIdTracker (now) has only a single sub-class: AbstractStreamIdGenerator,
combine them to simplify some code and remove any direct references to
AbstractStreamIdTracker.
2023-03-03 08:13:37 -05:00
dependabot[bot]
9bb2eac719
Bump black from 22.12.0 to 23.1.0 (#15103) 2023-02-22 15:29:09 -05:00
Sean Quah
d0c713cc85
Return read-only collections from @cached methods (#13755)
It's important that collections returned from `@cached` methods are not
modified, otherwise future retrievals from the cache will return the
modified collection.

This applies to the return values from `@cached` methods and the values
inside the dictionaries returned by `@cachedList` methods. It's not
necessary for the dictionaries returned by `@cachedList` methods
themselves to be read-only.

Signed-off-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
2023-02-10 23:29:00 +00:00
Patrick Cloke
a481fb9f98
Refactor get_user_devices_from_cache to avoid mutating cached values. (#15040)
The previous version of the code could mutate a cached value,
but only if the input requested all devices of a user *and* a specific
device.

To avoid this nonsensical situation we no longer fetch a specific
device ID if all of a user's devices are returned.
2023-02-10 08:09:47 -05:00
Erik Johnston
fd296b7343
Fix exception on start up about device lists (#15041)
Fixes #15010.
2023-02-10 09:52:35 +00:00
Sean Quah
cf66d712c6
Fix initialization of _device_list_id_gen (#14914)
On startup, the `_device_list_id_gen` stream id generator is initialized
using the maximum stream id seen in a list of tables. When we started
populating the `device_list_remote_pending` table in #13913, we forgot
to add it to the aforementioned list of tables, so the stream id
generator can hand out old stream ids after a restart. The end result is
that Synapse can fail to handle device list update EDUs after a restart
when a partial state join is in progress.

Add the `device_list_remote_pending` table to the list of tables to
consider when initializing the `_device_list_id_gen` stream id generator.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-26 10:38:49 +00:00
Erik Johnston
65d0386693
Always notify replication when a stream advances (#14877)
This ensures that all other workers are told about stream updates in a timely manner, without having to remember to manually poke replication.
2023-01-20 18:02:18 +00:00
Erik Johnston
2b084c5b71
Merge device list replication streams (#14833) 2023-01-17 09:29:58 +00:00
reivilibre
ba4ea7d13f
Batch up replication requests to request the resyncing of remote users's devices. (#14716) 2023-01-10 11:17:59 +00:00
Nick Mills-Barrett
db1cfe9c80
Update all stream IDs after processing replication rows (#14723)
This creates a new store method, `process_replication_position` that
is called after `process_replication_rows`. By moving stream ID advances
here this guarantees any relevant cache invalidations will have been
applied before the stream is advanced.

This avoids race conditions where Python switches between threads mid
way through processing the `process_replication_rows` method where stream
IDs may be advanced before caches are invalidated due to class resolution
ordering.

See this comment/issue for further discussion:
	https://github.com/matrix-org/synapse/issues/14158#issuecomment-1344048703
2023-01-04 11:49:26 +00:00
reivilibre
74b89c2761
Revert the deletion of stale devices due to performance issues. (#14662) 2022-12-12 13:55:23 +00:00
Erik Johnston
94bc21e69f
Limit the number of devices we delete at once (#14649) 2022-12-09 13:31:32 +00:00
Erik Johnston
c2de2ca630
Delete stale non-e2e devices for users, take 2 (#14595)
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.

We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
2022-12-09 09:37:07 +00:00
Erik Johnston
cee9445884
Better return type for get_all_entities_changed (#14604)
Help callers from using the return value incorrectly by ensuring
that callers explicitly check if there was a cache hit or not.
2022-12-05 15:19:14 -05:00
Patrick Cloke
fac8a38525
Properly handle unknown results for the stream change cache. (#14592)
StreamChangeCache.get_all_changed_entities can return None to signify
it does not have information at the given stream position. Two callers (related
to device lists and presence) were treating this response the same as an empty
list (i.e. there being no updates).
2022-12-02 10:28:41 -05:00