#8567 started a span for every background process. This is good as it means all Synapse code that gets run should be in a span (unless in the sentinel logging context), but it means we generate about 15x the number of spans as we did previously.
This PR attempts to reduce that number by a) not starting one for send commands to Redis, and b) deferring starting background processes until after we're sure they're necessary.
I don't really know how much this will help.
Currently background proccesses stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress.
This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e using the latest token it sees if its not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.
We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:
1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.
The valid operations are then:
1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.
(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out *when* the max stream position has caught up to the event and we can notify people about it.
This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens.
The valid operations here are:
1. Is a position before or after a token
2. Fetching all events between two tokens
3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A *or* before B, then P is before C.
Future PR will change the token type to a dedicated type.
`pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed.
I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e, that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the effected users.
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Move `get_devices_with_keys_by_user` to `EndToEndKeyWorkerStore`
this seems a better fit for it.
This commit simply moves the existing code: no other changes at all.
* Rename `get_devices_with_keys_by_user`
to better reflect what it does.
* get_device_stream_token abstract method
To avoid referencing fields which are declared in the derived classes, make
`get_device_stream_token` abstract, and define that in the classes which define
`_device_list_id_gen`.
This fixes a bug where having multiple callers waiting on the same
stream and position will cause it to try and compare two deferreds,
which fails (due to the sorted list having an entry of `Tuple[int,
Deferred]`).
It's just a thin wrapper around two ID gens to make `get_current_token`
and `get_next` return tuples. This can easily be replaced by calling the
appropriate methods on the underlying ID gens directly.
The function is used for two purposes: 1) for subscribers of streams to
get a token they can use to get further updates with, and 2) for
replication to track position of the writers of the stream.
For streams with a single writer the two scenarios produce the same
result, however the situation becomes complicated for streams with
multiple writers. The current `MultiWriterIdGenerator` does not
correctly handle the first case (which is not an issue as its only used
for the `caches` stream which nothing subscribes to outside of
replication).
==============================
Bugfixes
--------
- Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
- Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967))
Internal Changes
----------------
- Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl8f/f8ACgkQOSor00I9
eP8/Uwf8CiVWvrBsmFZMvxJDkUWm0/f1kN4IQdm8ibDtyNyvFUx+Y1K8KOQS+VwG
a3bZqSC2Vv2sO9O9kR+V2tk831l+ujO0Nlaohuqyvhcl9lzh04rRYI9x9IHlAq2H
WPb0NMLwMufL6YkXDBwZT/G9TVW1vLRGASu4f7X2rXqek34VNVgYbg1hB2dp4dDa
wjKk3iBZ6h34IhKPgu0sLBUcyvX4U5xdOHjEG3HXvNnvDNO0HMD8rGB7065vFMD6
PH4nUK/h+RL0UBs2sJOMK1ZazFUODdURwANJQNAQ6pNvf9/RWgw2okka2bYIcmQQ
UT7tiwMsBvKdy4PER5fcDX3COY16qw==
=Q+bI
-----END PGP SIGNATURE-----
Merge tag 'v1.18.0rc2' into develop
Synapse 1.18.0rc2 (2020-07-28)
==============================
Bugfixes
--------
- Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
- Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967))
Internal Changes
----------------
- Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
IIRC this doesn't break tests because its only hit on reconnection, or something.
Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB), the problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.
Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.
Handling of incoming typing stream updates from replication was not
hooked up on master, effecting set ups where typing was handled on a
different worker.
This is really only a problem if the master process is also handling
sync requests, which is unlikely for those that are at the stage of
moving typing off.
The other observable effect is that if a worker restarts or a
replication connect drops then the typing worker will issue a
`POSITION typing`, triggering master process to try and stream *all*
typing updates from position 0.
Fixes#7907
It serves no purpose and updating everytime we write to the device inbox
stream means all such transactions will conflict, causing lots of
transaction failures and retries.
When we get behind on replication, we tend to stack up background processes
behind a linearizer. Bg processes are heavy (particularly with respect to
prometheus metrics) and linearizers aren't terribly efficient once the queue
gets long either.
A better approach is to maintain a queue of requests to be processed, and
nominate a single process to work its way through the queue.
Fixes: #7444