Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing.
The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching).
One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since its just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used.
Part of #12684
* Add mau_appservice_trial_days
* Add a test
* Tweaks
* changelog
* Ensure we sync after the delay
* Fix types
* Add config statement
* Fix test
* Reinstate logging that got removed
* Fix feature name
* Changes hidden read receipts to be a separate receipt type
(instead of a field on `m.read`).
* Updates the `/receipts` endpoint to accept `m.fully_read`.
The `latest_event` field of the bundled aggregations for `m.thread` relations
did not include bundled aggregations itself. This resulted in clients needing to
immediately request the event from the server (and thus making it useless that
the latest event itself was serialized instead of just including an event ID).
I've seen a few errors which can only plausibly be explained by the calculated
event id for an event being different from the ID of the event in the
database. It should be cheap to check this, so let's do so and raise an
exception.
This works by taking a row level lock on the `rooms` table at the start of both transactions, ensuring that they don't run at the same time. In the event persistence transaction we also check that there is an entry still in the `rooms` table.
I can't figure out how to do this in SQLite. I was just going to lock the table, but it seems that we don't support that in SQLite either, so I'm *really* confused as to how we maintain integrity in SQLite when using `lock_table`....
Over time we've begun to use newer versions of mypy, typeshed, stub
packages---and of course we've improved our own annotations. This makes
some type ignore comments no longer necessary. I have removed them.
There was one exception: a module that imports `select.epoll`. The
ignore is redundant on Linux, but I've kept it ignored for those of us
who work on the source tree using not-Linux. (#11771)
I'm more interested in the config line which enforces this. I want
unused ignores to be reported, because I think it's useful feedback when
annotating to know when you've fixed a problem you had to previously
ignore.
* Installing extras before typechecking
Lacking an easy way to install all extras generically, let's bite the bullet and
make install the hand-maintained `all` extra before typechecking.
Now that https://github.com/matrix-org/backend-meta/pull/6 is merged to
the release/v1 branch.
Try to avoid an OOM by checking fewer extremities.
Generally this is a big rewrite of _maybe_backfill, to try and fix some of the TODOs and other problems in it. It's best reviewed commit-by-commit.
Multiple calls to `EventsWorkerStore._get_events_from_cache_or_db` can
reuse the same database fetch, which is initiated by the first call.
Ensure that cancelling the first call doesn't cancel the other calls
sharing the same database fetch.
Signed-off-by: Sean Quah <seanq@element.io>
This will mainly be useful when dealing with module callbacks, which are
all typed as returning `Awaitable`s instead of coroutines or
`Deferred`s.
Signed-off-by: Sean Quah <seanq@element.io>
When we join a room via the faster-joins mechanism, we end up with "partial
state" at some points on the event DAG. Many parts of the codebase need to
wait for the full state to load. So, we implement a mechanism to keep track of
which events have partial state, and wait for them to be fully-populated.
We work through all the events with partial state, updating the state at each
of them. Once it's done, we recalculate the state for the whole room, and then
mark the room as having complete state.
* Add some type hints to datastore
* newsfile
* change `Collection` to `List`
* refactor return type of `select_users_txn`
* correct type hint in `stream.py`
* Remove `Optional` in `select_users_txn`
* remove not needed return type in `__init__`
* Revert change in `get_stream_id_for_event_txn`
* Remove import from `Literal`
Consider the requester's ignored users when calculating the
bundled aggregations.
See #12285 / 4df10d3214
for corresponding changes for the `/relations` endpoint.
Of note:
* No untyped defs in `register_new_matrix_user`
This one might be contraversial. `request_registration` has three
dependency-injection arguments used for testing. I'm removing the
injection of the `requests` module and using `unitest.mock.patch` in the
test cases instead.
Doing `reveal_type(requests)` and `reveal_type(requests.get)` before the
change:
```
synapse/_scripts/register_new_matrix_user.py:45: note: Revealed type is "Any"
synapse/_scripts/register_new_matrix_user.py:46: note: Revealed type is "Any"
```
And after:
```
synapse/_scripts/register_new_matrix_user.py:44: note: Revealed type is "types.ModuleType"
synapse/_scripts/register_new_matrix_user.py:45: note: Revealed type is "def (url: Union[builtins.str, builtins.bytes], params: Union[Union[_typeshed.SupportsItems[Union[builtins.str, builtins.bytes, builtins.int, builtins.float], Union[builtins.str, builtins.bytes, builtins.int, builtins.float, typing.Iterable[Union[builtins.str, builtins.bytes, builtins.int, builtins.float]], None]], Tuple[Union[builtins.str, builtins.bytes, builtins.int, builtins.float], Union[builtins.str, builtins.bytes, builtins.int, builtins.float, typing.Iterable[Union[builtins.str, builtins.bytes, builtins.int, builtins.float]], None]], typing.Iterable[Tuple[Union[builtins.str, builtins.bytes, builtins.int, builtins.float], Union[builtins.str, builtins.bytes, builtins.int, builtins.float, typing.Iterable[Union[builtins.str, builtins.bytes, builtins.int, builtins.float]], None]]], builtins.str, builtins.bytes], None] =, data: Union[Any, None] =, headers: Union[Any, None] =, cookies: Union[Any, None] =, files: Union[Any, None] =, auth: Union[Any, None] =, timeout: Union[Any, None] =, allow_redirects: builtins.bool =, proxies: Union[Any, None] =, hooks: Union[Any, None] =, stream: Union[Any, None] =, verify: Union[Any, None] =, cert: Union[Any, None] =, json: Union[Any, None] =) -> requests.models.Response"
```
* Drive-by comment in `synapse.storage.types`
* No untyped defs in `synapse_port_db`
This was by far the most painful. I'm happy to break this up into
smaller pieces for review if it's not managable as-is.
Principally, `prometheus_client.REGISTRY.register` now requires its argument to
extend `prometheus_client.Collector`.
Additionally, `Gauge.set` is now annotated so that passing `Optional[int]`
causes an error.
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.
Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.
Signed-off-by: Sean Quah <seanq@element.io>
This is a first step in dealing with #7721.
The idea is basically that rather than calculating the full set of users a device list update needs to be sent to up front, we instead simply record the rooms the user was in at the time of the change. This will allow a few things:
1. we can defer calculating the set of remote servers that need to be poked about the change; and
2. during `/sync` and `/keys/changes` we can avoid also avoid calculating users who share rooms with other users, and instead just look at the rooms that have changed.
However, care needs to be taken to correctly handle server downgrades. As such this PR writes to both `device_lists_changes_in_room` and the `device_lists_outbound_pokes` table synchronously. In a future release we can then bump the database schema compat version to `69` and then we can assume that the new `device_lists_changes_in_room` exists and is handled.
There is a temporary option to disable writing to `device_lists_outbound_pokes` synchronously, allowing us to test the new code path does work (and by implication upgrading to a future release and downgrading to this one will work correctly).
Note: Ideally we'd do the calculation of room to servers on a worker (e.g. the background worker), but currently only master can write to the `device_list_outbound_pokes` table.
Switching to a sequence means there's no need to track `last_txn` on the
AS state table to generate new TXN IDs. This also means that there is
no longer contention between the AS scheduler and AS handler on updates
to the `application_services_state` table, which will prevent serialization
errors during the complete AS txn transaction.
It seems like calling `_get_state_group_for_events` for an event where the
state is unknown is an error. Accordingly, let's raise an exception rather than
silently returning an empty result.
If we're missing most of the events in the room state, then we may as well call the /state endpoint, instead of individually requesting each and every event.
The PaginationChunk class attempted to bundle some properties
together, but really just caused callers to jump through hoops and
hid implementation details.
This endpoint was removed from MSC2675 before it was approved.
It is currently unspecified (even in any MSCs) and therefore subject to
removal. It is not implemented by any known clients.
This also changes the bundled aggregation format for `m.annotation`,
which previously included pagination tokens for the `/aggregations`
endpoint, which are no longer useful.
Document the behaviour of `LoggingTransaction.call_after` and
`LoggingTransaction.call_on_exception` when transactions are retried.
Signed-off-by: Sean Quah <seanq@element.io>
When we are processing a `/backfill` request from a remote server, exclude any
outliers from consideration early on. We can't return outliers anyway (since we
don't know the state at the outlier), and filtering them out earlier means that
we won't attempt to calulate the state for them.
This should speed up push rule calculations for rooms with large numbers of local users when the main push rule cache fails.
Co-authored-by: reivilibre <oliverw@matrix.org>
We fetch the thread summary in two phases:
1. The summary that is shared by all users (count of messages and latest event).
2. Whether the requesting user has participated in the thread.
There's no use in attempting step 2 for events which did not return a summary
from step 1.
* Formally type the UserProfile in user searches
* export UserProfile in synapse.module_api
* Update docs
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
To handle cancellation, we ensure that `after_callback`s and
`exception_callback`s are always run, since the transaction will
complete on another thread regardless of cancellation.
We also wait until everything is done before releasing the
`CancelledError`, so that logging contexts won't get used after they
have been finished.
Signed-off-by: Sean Quah <seanq@element.io>
The unstable identifiers are still supported if the experimental configuration
flag is enabled. The unstable identifiers will be removed in a future release.
This is allowed per MSC2675, although the original implementation did
not allow for it and would return an empty chunk / not bundle aggregations.
The main thing to improve is that the various caches get cleared properly
when an event is redacted, and that edits must not leak if the original
event is redacted (as that would presumably leak something similar to
the original event content).
* `@cached` can now take an `uncached_args` which is an iterable of names to not use in the cache key.
* Requires `@cached`, @cachedList` and `@lru_cache` to use keyword arguments for clarity.
* Asserts that keyword-only arguments in cached functions are not accepted. (I tested this briefly and I don't believe this works properly.)
When we get a partial_state response from send_join, store information in the
database about it:
* store a record about the room as a whole having partial state, and stash the
list of member servers too.
* flag the join event itself as having partial state
* also, for any new events whose prev-events are partial-stated, note that
they will *also* be partial-stated.
We don't yet make any attempt to interpret this data, so API calls (and a bunch
of other things) are just going to get incorrect data.
Don't attempt to add non-string `value`s to `event_search` and add a
background update to clear out bad rows from `event_search` when
using sqlite.
Signed-off-by: Sean Quah <seanq@element.io>
For users with large accounts it is inefficient to calculate the set of
users they share a room with (and takes a lot of space in the cache).
Instead we can look at users whose devices have changed since the last
sync and check if they share a room with the syncing user.
When the server leaves a room the `get_rooms_for_user` cache is not
correctly invalidated for the remote users in the room. This means that
subsequent calls to `get_rooms_for_user` for the remote users would
incorrectly include the room (it shouldn't be included because the
server no longer knows anything about the room).
Splits the search code into a few logical functions instead of a single
unreadable function.
There are also a few additional changes for readability.
After refactoring it was clear to see there were some unused and
unnecessary variables, which were simplified.
If the latest event in a thread was edited than the original
event content was included in bundled aggregation for
threads instead of the edited event content.
* Make `get_auth_chain_ids` return a Set
It has a set internally, and a set is often useful where it gets used, so let's
avoid converting to an intermediate list.
* Minor refactors in `on_send_join_request`
A little bit of non-functional groundwork
* Implement MSC3706: partial state in /send_join response
This should reduce database usage when fetching bundled aggregations
as the number of individual queries (and round trips to the database) are
reduced.
If ther are more than 100 to-device messages pending for a device
`/sync` will only return the first 100, however the next batch token was
incorrectly calculated and so all other pending messages would be
dropped.
This is due to `txn.rowcount` only returning the number of rows that
*changed*, rather than the number *selected* in SQLite.
This should reduce database usage when fetching bundled aggregations
as the number of individual queries (and round trips to the database) are
reduced.
The get_users_in_room and get_users_in_room_with_profiles
are now only invalidated when the membership of a room changes,
instead of during any state change in the room.
* Include 'prev_content' field in AS events
Signed-off-by: Vaishnav Nair <nairvaishnav007@icloud.com>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
This is some odds and ends found during the review of #11791
and while continuing to work in this code:
* Return attrs classes instead of dictionaries from some methods
to improve type safety.
* Call `get_bundled_aggregations` fewer times.
* Adds a missing assertion in the tests.
* Do not return empty bundled aggregations for an event (preferring
to not include the bundle at all, as the docstring states).
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse 1.40.0 that caused Synapse to fail to process incoming federation traffic after handling a large amount of events in a v1 room. ([\#11806](https://github.com/matrix-org/synapse/issues/11806))
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAmHum5QTHGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9G6vD/9Dw6V0687vzahnU6aYWaecRSO1sbww
EtcCiXOh0r8HXPAwEIXJYomSTTbPl0eAwP8T1the1WZVQArvRW2VvMzDoBheO8bt
dCm4CTKFCGUsI/GrlFDPtjAEd6kruAETmpHZn7bAFSqtIpissD6FjUwg/ND/NJfM
zVM/bMgM0+js6eD9J5/k16E1ZWj7Lbp+/fKN+qTeQrXzIeT13WQZ4Nz1o/cqe/21
o/coI3FmYq8CzKpfA6qZMDd/OtYpYqwwr7otSmW+6qHeZI/yqeoxlpzfgSZGpRUe
mtXqbQllZQeqrbm8oK96GNmKcThy6awTwJPoD46b1AOkmFdf/lGSqO0lnTQRRPqR
hyPPdrx6Lt2t7DDbVVyUElzkqLPhVJtrItPDC8669sWvmSgsPQdUqIsRmhF+8aJe
ffjRvKbGRICnNZkT+qGf1HwuMHajZyAIHAS/kyFDKUZCvau3VQ1wlyOqVQ1hWr7+
3k3CobjSx4y1bYKXncAK6hWH6lE8M319jaTnVfYXocDLWRonyFf7o286Q+c8WBGF
tY0FzvUPb5S2kktC4WLwcqtcTWK1cu5MI6GfD69EqL7iifJRVyKDeUoD7tiUvgzN
O2wBl2soJIsAU+8y16WQu7p+k2nZIomCCAySnW3C8mlxvv7CKQu0Url8J7IH8uZE
ZVN2MMe7H+ZHJQ==
=Fpsa
-----END PGP SIGNATURE-----
Merge tag 'v1.51.0rc2' into develop
Synapse 1.51.0rc2 (2022-01-24)
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse 1.40.0 that caused Synapse to fail to process incoming federation traffic after handling a large amount of events in a v1 room. ([\#11806](https://github.com/matrix-org/synapse/issues/11806))
* remove reference in comments to python3.6
* upgrade tox python env in script
* bump python version in example for completeness
* upgrade python version requirement in setup doc
* upgrade necessary python version in __init__.py
* upgrade python version in setup.py
* newsfragment
* drops refs to bionic and replace with focal
* bump refs to postgres 9.6 to 10
* fix hanging ci
* try installing tzdata first
* revert change made in b979f336
* ignore new random mypy error while debugging other error
* fix lint error for temporary workaround
* revert change to install list
* try passing env var
* export debian frontend var?
* move line and add comment
* bump pillow dependency
* bump lxml depenency
* install libjpeg-dev for pillow
* bump automat version to one compatible with py3.8
* add libwebp for pillow
* bump twisted trunk python version
* change suffix of newsfragment
* remove redundant python 3.7 checks
* lint
Debug for #8631.
I'm having a hard time tracking down what's going wrong in that issue.
In the reported example, I could see server A sending federation traffic
to server B and all was well. Yet B reports out-of-sync device updates
from A.
I couldn't see what was _in_ the events being sent from A to B. So I
have added some crude logging to track
- when we have updates to send to a remote HS
- the edus we actually accumulate to send
- when a federation transaction includes a device list update edu
- when such an EDU is received
This is a bit of a sledgehammer.
I've never found this terribly useful. I think it was added in the early days
of Synapse, without much thought as to what would actually be useful to log,
and has just been cargo-culted ever since.
Rather, it tends to clutter up debug logs with useless information.
Always add state.room_id after the configurable ORDER BY. Otherwise,
for any sort, certain pages can contain results from
other pages. (Especially when sorting by creator, since there may
be many rooms by the same creator)
* Document different order direction of numerical fields
"joined_members", "joined_local_members", "version" and "state_events"
are ordered in descending direction by default (dir=f). Added a note
in tests to explain the differences in ordering.
Signed-off-by: Daniël Sonck <daniel@sonck.nl>
This makes the serialization of events synchronous (and it no
longer access the database), but we must manually calculate and
provide the bundled aggregations.
Overall this should cause no change in behavior, but is prep work
for other improvements.
* Push `get_room_{min,max_stream_ordering}` into StreamStore
Both implementations of this are identical, so we may as well push it down and
get rid of the abstract base class nonsense.
* Remove redundant `StreamStore` class
This is empty now
* Remove redundant `get_current_events_token`
This was an exact duplicate of `get_room_max_stream_ordering`, so let's get rid
of it.
* newsfile
A couple of safety-checks to hopefully stop people doing what I just did, and create a storage
function which only works the first time it is called (and not when it is re-run due to a database
concurrency error or similar).
Create a new dict helper method `simple_insert_many_values_txn`, which takes
raw row values, rather than {key=>value} dicts. This saves us a bunch of dict
munging, and makes it easier to use generators rather than creating
intermediate lists and dicts.
This mainly consists of docstrings and inline comments. There are one or two type annotations and variable renames thrown in while I was here.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030
Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Federation API endpoint:
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Co-authored-by: Erik Johnston <erik@matrix.org>
Part of https://github.com/matrix-org/synapse/issues/11300
Call stack:
- `_persist_events_and_state_updates` (added `use_negative_stream_ordering`)
- `_persist_events_txn`
- `_update_room_depths_txn` (added `update_room_forward_stream_ordering`)
- `_update_metadata_tables_txn`
- `_store_room_members_txn` (added `inhibit_local_membership_updates`)
Using keyword-only arguments (`*`) to reduce the mistakes from `backfilled` being left as a positional argument somewhere and being interpreted wrong by our new arguments.
The previous fix for the ongoing event fetches counter
(8eec25a1d9) was both insufficient and
incorrect.
When the database is unreachable, `_do_fetch` never gets run and so
`_event_fetch_ongoing` is never decremented.
The previous fix also moved the `_event_fetch_ongoing` decrement outside
of the `_event_fetch_lock` which allowed race conditions to corrupt the
counter.
* remove background update code related to deprecated config flag
* changelog entry
* update changelog
* Delete 11394.removal
Duplicate, wrong number
* add no-op background update and change newfragment so it will be consolidated with associated work
* remove unused code
* Remove code associated with deprecated flag from legacy docker dynamic config file
Co-authored-by: reivilibre <oliverw@matrix.org>
Instead of only known relation types. This also reworks the background
update for thread relations to crawl events and search for any relation
type, not just threaded relations.
Adds validation to the Client-Server API to ensure that
the potential thread head does not relate to another event
already. This results in not allowing a thread to "fork" into
other threads.
If the target event is unknown for some reason (maybe it isn't
visible to your homeserver), but is the target of other events
it is assumed that the thread can be created from it. Otherwise,
it is rejected as an unknown event.
==============================
Bugfixes
--------
- Fix a bug introduced in 1.47.0rc1 which caused worker processes to not halt startup in the presence of outstanding database migrations. ([\#11346](https://github.com/matrix-org/synapse/issues/11346))
- Fix a bug introduced in 1.47.0rc1 which prevented the 'remove deleted devices from `device_inbox` column' background process from running when updating from a recent Synapse version. ([\#11303](https://github.com/matrix-org/synapse/issues/11303), [\#11353](https://github.com/matrix-org/synapse/issues/11353))
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAmGTw7wTHGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9FUdD/9EgimhmOUuwcbNQS+zJJo6qhbnxngA
uOIWGhuoBq9vyZjx3s1mwRO/sCuBc6KRs2JYzhntfHwV6FCgkC0QJXPkrYO7mkbg
L/QsET3QYrWwClMN1y/j2/rstOqGtgrtg6pPgd9LiQH+LjdM+vzAyrgxJRyhk9Sx
7184Y9QpEt/6ApqFIbXZewP+zH1z1QyVeZ1WiuHsqhjcrrcpv1t8Rq0fv4FxgIVP
Tmy3T2uw0FeVoQIL8BnNVmRGbLjlSgpDfiXZTwuXGS/uLtTe1tnHf0QsKEK7XZxY
yUsRKQVDy76GW9AJuWKFXa9Voy5gkjLWnvMwHvr+B1WcJi/5cjz9lP2A0Bsm6vhQ
ivrktlhlw7O3x0EvrL1r4z2gRY4GXxyZKOBLRsY4AAIMblNR/e94SmK0rLoKIQn8
Rp50VV2B4cf3WLxoqHBpJh7CDFisth1nXMnETU3y0VgAYMg2+cxqVDawTSTH9v7Q
QeGI10QCMmeFwGoL6lyVJ+qhyUqFeFOqTrNqzigsiB1qule+w1fEEl7cvI+Om/QM
Wjkvw6fpOvUGBN1OOARL5Qnm4rjxa0Ld2vw5vugkGiDTV8P3TMJP1JFQ44IX2GKP
yGJH92ac0qy6vQ2za1gL/w/U3miNblHUIvAmpkaIIlvovQJH9dU5dMsxnvi+QZU+
Auyqh9rwiWiizA==
=iNww
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE1508oLYUKainYFJakD7OEIo53t0FAmGT0mcACgkQkD7OEIo5
3t0sSA/+KjLlnYe3xO9Pbeeq7/E6Qs3VTMbwERcVVj0WZQU8Hik/EH+p0zsnG9jU
8t9UVs0xnu1r5ZMNsrMnfOU5SR9o7RM8UBCHnIYde8yFf1N6FI0YgGJ8nOP96gSG
dKZQOCKA9psQqd0pvhwCem4ITmBvVz2CrnUZYkKp9rHjUZeBYwE7cWmHL5W/WzT+
Sv0zafhvYi76PxMKdXu08MArMZ9JpCLbZnlzaoW/sY+xvnQBRfQviqYkj+qZhQjr
tPnJlQN5FS1pO1ZHd9o0mVcdbDBwLqeQlqC1toFuZXbed6e558KtCzvizM5VAfF8
peRGVFarbZpD/QcvoFljoH9qIECrcAYIN1HeE8aX7pIedh2AVROyINumdnbmsMJH
F4OyX/aLb3KbbecHJaVLQ2c/KyDnSnxr6Fs3HlEslD7DRdA0TS7XsI/BVER0pscp
j2Q5nLOthpV2d1SVekj4Ge/Hr++AuTnriHldEx9OlGI3/74ldrNL2c/L5gM3ue1W
qnPQfr5ehtIitqa/ROiexxSoWS5OC953UujTCwBgil9mAD9gC7mhoEFOAFVuMHfN
zBjsunMGBRYsGcw2umTPGPd7D3Gi1FXQrduN1xUoV8g8vmJld4GDjqJsitiaR3lv
NUsQ5JzttakDUwAJU7qijOo2Y/HtSs5E2nF66bMCSmwHvl9AkUc=
=c7gi
-----END PGP SIGNATURE-----
Merge tag 'v1.47.0rc3' into develop
Synapse 1.47.0rc3 (2021-11-16)
==============================
Bugfixes
--------
- Fix a bug introduced in 1.47.0rc1 which caused worker processes to not halt startup in the presence of outstanding database migrations. ([\#11346](https://github.com/matrix-org/synapse/issues/11346))
- Fix a bug introduced in 1.47.0rc1 which prevented the 'remove deleted devices from `device_inbox` column' background process from running when updating from a recent Synapse version. ([\#11303](https://github.com/matrix-org/synapse/issues/11303), [\#11353](https://github.com/matrix-org/synapse/issues/11353))
It already seems to pass mypy. I wonder what changed, given that it was
on the exclusion list. So this commit consists of me ensuring
`--disallow-untyped-defs` passes and a minor fixup to a function that
returned either `True` or `None`.
* remove unused tables room_stats_historical and user_stats_historical
* update changelog number
* Bump schema compat version comment
* make linter happy
* Update comment to give more info
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: reivilibre <oliverw@matrix.org>
* Prefer `HTTPStatus` over plain `int`
This is an Opinion that no-one has seemed to object to yet.
* `--disallow-untyped-defs` for `tests.rest.client.test_directory`
* Improve synapse's annotations for deleting aliases
* Test case for deleting a room alias
* Changelog
* change display names/avatar URLS to None if they contain null bytes
* add changelog
* add POC test, requested changes
* add a saner test and remove old one
* update test to verify that display name has been changed to None
* make test less fragile
* Make DataStore inherit from EventForwardExtremitiesStore before CacheInvalidationWorkerStore
the former implicitly inherits from the latter, so they should be
ordered like this when used.
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Make lock better handle process being killed
If the process gets killed and restarted (so that it didn't have a
chance to drop its locks gracefully) then there may still be locks in
the DB that are for the same instance that haven't yet timed out but are
safe to delete.
We handle this case by a) checking if the current instance already has
taken out the lock, and b) if not then ignoring locks that are for the
same instance.
* Periodically check for old staged events
This is to protect against other instances dying and their locks timing
out.
When an event fetcher aborts due to an exception, `_event_fetch_ongoing`
must be decremented, otherwise the event fetcher would never be
replaced. If enough event fetchers were to fail, no more events would be
fetched and requests would get stuck waiting for events.
Users admin API can now also modify user
type in addition to allowing it to be
set on user creation.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
Resolve and share `state_groups` for all historical events in batch. This also helps for showing the appropriate avatar/displayname in Element and will work whenever `/messages` has one of the historical messages as the first message in the batch.
This does have the flaw where if you just insert a single historical event somewhere, it probably won't resolve the state correctly from `/messages` or `/context` since it will grab a non historical event above or below with resolved state which never included the historical state back then. For the same reasions, this also does not work in Element between the transition from actual messages to historical messages. In the Gitter case, this isn't really a problem since all of the historical messages are in one big lump at the beginning of the room.
For a future iteration, might be good to look at `/messages` and `/context` to additionally add the `state` for any historical messages in that batch.
---
How are the `state_groups` shared? To illustrate the `state_group` sharing, see this example:
**Before** (new `state_group` for every event 😬, very inefficient):
```
# Tests from https://github.com/matrix-org/complement/pull/206
$ COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh TestBackfillingHistory/parallel/should_resolve_member_state_events_for_historical_events
create_new_client_event m.room.member event=$_JXfwUDIWS6xKGG4SmZXjSFrizhARM7QblhATVWWUcA state_group=None
create_new_client_event org.matrix.msc2716.insertion event=$1ZBfmBKEjg94d-vGYymKrVYeghwBOuGJ3wubU1-I9y0 state_group=9
create_new_client_event org.matrix.msc2716.insertion event=$Mq2JvRetTyclPuozRI682SAjYp3GqRuPc8_cH5-ezPY state_group=10
create_new_client_event m.room.message event=$MfmY4rBQkxrIp8jVwVMTJ4PKnxSigpG9E2cn7S0AtTo state_group=11
create_new_client_event m.room.message event=$uYOv6V8wiF7xHwOMt-60d1AoOIbqLgrDLz6ZIQDdWUI state_group=12
create_new_client_event m.room.message event=$PAbkJRMxb0bX4A6av463faiAhxkE3FEObM1xB4D0UG4 state_group=13
create_new_client_event org.matrix.msc2716.batch event=$Oy_S7AWN7rJQe_MYwGPEy6RtbYklrI-tAhmfiLrCaKI state_group=14
```
**After** (all events in batch sharing `state_group=10`) (the base insertion event has `state_group=8` which matches the `prev_event` we're inserting next to):
```
# Tests from https://github.com/matrix-org/complement/pull/206
$ COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh TestBackfillingHistory/parallel/should_resolve_member_state_events_for_historical_events
create_new_client_event m.room.member event=$PWomJ8PwENYEYuVNoG30gqtybuQQSZ55eldBUSs0i0U state_group=None
create_new_client_event org.matrix.msc2716.insertion event=$e_mCU7Eah9ABF6nQU7lu4E1RxIWccNF05AKaTT5m3lw state_group=9
create_new_client_event org.matrix.msc2716.insertion event=$ui7A3_GdXIcJq0C8GpyrF8X7B3DTjMd_WGCjogax7xU state_group=10
create_new_client_event m.room.message event=$EnTIM5rEGVezQJiYl62uFBl6kJ7B-sMxWqe2D_4FX1I state_group=10
create_new_client_event m.room.message event=$LGx5jGONnBPuNhAuZqHeEoXChd9ryVkuTZatGisOPjk state_group=10
create_new_client_event m.room.message event=$wW0zwoN50lbLu1KoKbybVMxLbKUj7GV_olozIc5i3M0 state_group=10
create_new_client_event org.matrix.msc2716.batch event=$5ZB6dtzqFBCEuMRgpkU201Qhx3WtXZGTz_YgldL6JrQ state_group=10
```
The following scenarios would halt the user directory updater:
- user joins room
- user leaves room
- user present in room which switches from private to public, or vice versa.
for two classes of users:
- appservice senders
- users missing from the user table.
If this happened, the user directory would be stuck, unable to make forward progress.
Exclude both cases from the user directory, so that we ignore them.
Co-authored-by: Eric Eastwood <erice@element.io>
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
The race allowed the current position to advance too far when stream IDs
are still being persisted.
This happened when it received a new stream ID from a remote write
between a new stream ID being allocated and it being added to the set of
unpersisted stream IDs.
Fixes#9424.
Make `get_last_client_by_ip` return the same dictionary structure
regardless of whether the data has been persisted to the database.
This change will allow slightly cleaner type hints to be applied later
on.
Updating mypy past version 0.9 means that third-party stubs are no-longer distributed with typeshed. See http://mypy-lang.blogspot.com/2021/06/mypy-0900-released.html for details.
We therefore pull in stub packages in setup.py
Additionally, some modules that we were previously ignoring import failures for now have stubs. So let's use them.
The rest of this change consists of fixups to make the newer mypy + stubs pass CI.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This removes the magic allowing accessing configurable
variables directly from the config object. It is now required
that a specific configuration class is used (e.g. `config.foo`
must be replaced with `config.server.foo`).
There are two steps to rebuilding the user directory:
1. a scan over rooms, followed by
2. a scan over local users.
The former reads avatars and display names from the `room_memberships`
table and therefore contains potentially private avatars and
display names. The latter reads from the the `profiles` table which only
contains public data; moreover it will overwrite any private profiles
that the rooms scan may have written to the user directory. This means
that the rebuild could leak private user while the rebuild was in
progress, only to later cover up the leaks once the rebuild had completed.
This change skips over local users when writing user_directory rows
when scanning rooms. Doing so means that it'll take longer for a rebuild
to make local users searchable, which is unfortunate. I think a future
PR can improve this by swapping the order of the two steps above. (And
indeed there's more to do here, e.g. copying from `profiles` without
going via Python.)
Small tidy-ups while I'm here:
* Remove duplicated code from test_initial. This was meant to be pulled into `purge_and_rebuild_user_dir`.
* Move `is_public` before updating sharing tables. No functional change; it's still before the first read of `is_public`.
* Don't bother creating a set from dict keys. Slightly nicer and makes the code simpler.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
We correctly allowed using the MSC2716 batch endpoint for
the room creator in existing room versions but accidentally didn't track
the events because of a logic flaw.
This prevented you from connecting subsequent chunks together because it would
throw the unknown batch ID error.
We only want to process MSC2716 events when:
- The room version supports MSC2716
- Any room where the homeserver has the `msc2716_enabled` experimental feature enabled and the event is from the room creator
* add test
* add function to remove user from monthly active table in deactivate code
* add function to remove user from monthly active table
* add changelog entry
* update changelog number
* requested changes
* update docstring on new function
* fix lint error
* Update synapse/storage/databases/main/monthly_active_users.py
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse v1.40.0 where changing a user's display name or avatar in a restricted room would cause an authentication error. ([\#10933](https://github.com/matrix-org/synapse/issues/10933))
- Fix `/admin/whois/{user_id}` endpoint, which was broken in v1.44.0rc1. ([\#10968](https://github.com/matrix-org/synapse/issues/10968))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdVkXOgzrGzds0jtrHgFcFF8ZFs0FAmFbCRUACgkQHgFcFF8Z
Fs1n2Q/+P6WZmqNPEyPb4zgEhHgBZtAiYsS8i+6TBdu/viricfe3xmiifmKYyblV
C07a8Qu14OMgmXKYR8AY0HvLRx+zdhJFGuYa3I0+yY1GTlQWlf+OKnQtsbgdmJ6O
b8HbmGB1qJCJ0rbbWhCQvx/XXlqqMIATc6qJj3HTwOpWxNL8TymOFg6TGtb+rLkN
/s2NfxWPJqKMKTLVgZUVjkNGBBtJmu/+Ow3PYz7J9hEW69REONed/014wVgiWx+W
BJOoUj8dHAS5ZhdKWiSKAzl5A9FQyxUPDwkO44wsK5OIu+dVy4eF9HCMi0EP/9nT
G3VWwr3z/TA55foL8XdrIPdp1SFqJmIEwDLgaibISD1/8MoC15YzAkt4CoKYOsL6
EQtDDQal1BNSFZPITz7PWSFGOMgN3tKfMQUqCC+eagHvNfSxVH+J1zNg7ve2/24h
PbRU/tt4mJiPu7M2Ejj0EWNHyI2fp92ARzzAzQ0JlrPJe34Z/hAiqf7w4kjtJ7Ew
Lm890EAw2azo7RYU9xevkOsU2CEtLEnKsGMW8pJc9eRxUjRKfk/EW39dCQq9Myhu
uQFeYHALcn4vu9jGhGyoO9fJDKIxpM76h37Cwu7shg84Gp2ZwucSjwW2l2rZKiIS
YUeruLOavUueUZYTcyXxAqAAsH8z1hbKmnXIbNxFrcu+NOl+o84=
=HR3N
-----END PGP SIGNATURE-----
Merge tag 'v1.44.0rc3' into develop
Synapse 1.44.0rc3 (2021-10-04)
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse v1.40.0 where changing a user's display name or avatar in a restricted room would cause an authentication error. ([\#10933](https://github.com/matrix-org/synapse/issues/10933))
- Fix `/admin/whois/{user_id}` endpoint, which was broken in v1.44.0rc1. ([\#10968](https://github.com/matrix-org/synapse/issues/10968))
* Introduce `should_include_local_users_in_dir`
We exclude three kinds of local users from the user_directory tables. At
present we don't consistently exclude all three in the same places. This
commit introduces a new function to gather those exclusion conditions
together. Because we have to handle local and remote users in different
ways, I've made that function only consider the case of remote users.
It's the caller's responsibility to make the local versus remote
distinction clear and correct.
A test fixup is required. The test now hits a path which makes db
queries against the users table. The expected rows were missing, because
we were using a dummy user that hadn't actually been registered.
We also add new test cases to covert the exclusion logic.
----
By my reading this makes these changes:
* When an app service user registers or changes their profile, they will
_not_ be added to the user directory. (Previously only support and
deactivated users were excluded). This is consistent with the logic that
rebuilds the user directory. See also [the discussion
here](https://github.com/matrix-org/synapse/pull/10914#discussion_r716859548).
* When rebuilding the directory, exclude support and disabled users from
room sharing tables. Previously only appservice users were excluded.
* Exclude all three categories of local users when rebuilding the
directory. Previously `_populate_user_directory_process_users` didn't do
any exclusion.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Pull out GetUserDirectoryTables helper
* Don't rebuild the dir in tests that don't need it
In #10796 I changed registering a user to add directory entries under.
This means we don't have to force a directory regbuild in to tests of
the user directory search.
* Move test_initial to tests/storage
* Add type hints to both test_user_directory files
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Before Synapse 1.31 (#9411), we relied on `outlier` being stored in the
`internal_metadata` column. We can now assume nobody will roll back their
deployment that far and drop the legacy support.
* Improve typing in user_directory files
This makes the user_directory.py in storage pass most of mypy's
checks (including `no-untyped-defs`). Unfortunately that file is in the
tangled web of Store class inheritance so doesn't pass mypy at the moment.
The handlers directory has already been mypyed.
Co-authored-by: reivilibre <olivier@librepush.net>
This change adds a check for row existence before accessing row element, this should fix issue #10669
Signed-off-by: Vasya Boytsov vasiliy.boytsov@phystech.edu
This avoids the overhead of searching through the various
configuration classes by directly referencing the class that
the attributes are in.
It also improves type hints since mypy can now resolve the
types of the configuration variables.
* add test to check if null code points are being inserted
* add logic to detect and replace null code points before insertion into db
* lints
* add license to test
* change approach to null substitution
* add type hint for SearchEntry
* Add changelog entry
Signed-off-by: H.Shay <shaysquared@gmail.com>
* updated changelog
* update chanelog message
* remove duplicate changelog
* Update synapse/storage/databases/main/events.py remove extra space
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* rename and move test file, update tests, delete old test file
* fix typo in comments
* update _find_highlights_in_postgres to replace null byte with space
* replace null byte in sqlite search insertion
* beef up and reorganize test for this pr
* update changelog
* add type hints and update docstring
* check db engine directly vs using env variable
* refactor tests to be less repetetive
* move rplace logic into seperate function
* requested changes
* Fix typo.
* Update synapse/storage/databases/main/search.py
Co-authored-by: reivilibre <olivier@librepush.net>
* Update changelog.d/10820.misc
Co-authored-by: Aaron Raimist <aaron@raim.ist>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: reivilibre <olivier@librepush.net>
Co-authored-by: Aaron Raimist <aaron@raim.ist>
The invalidation was missing in `_claim_e2e_one_time_key_returning`,
which is used on SQLite 3.24+ and Postgres. This could break e2ee if
nothing else happened to invalidate the caches before the keys ran out.
Signed-off-by: Tulir Asokan <tulir@beeper.com>
The deprecated /initialSync endpoint maintains a cache of responses,
using parameter values as part of the cache key. When a `from` or `to`
parameter is specified, it gets converted into a `StreamToken`, which
contains a `RoomStreamToken` and forms part of the cache key.
`RoomStreamToken`s need to be made hashable for this to work.
It's a simplification, but one that'll help make the user directory logic easier
to follow with the other changes upcoming. It's not strictly required for those
changes, but this will help simplify the resulting logic that listens for
`m.room.member` events and generally make the logic easier to follow.
This means the config option `search_all_users` ends up controlling the
search query only, and not the data we store. The cost of doing so is an
extra row in the `user_directory` and `user_directory_search` tables for
each local user which
- belongs to no public rooms
- belongs to no private rooms of size ≥ 2
I think the cost of this will be marginal (since they'll already have entries
in `users` and `profiles` anyway).
As a small upside, a homeserver whose directory was built with this
change can toggle `search_all_users` without having to rebuild their
directory.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
I had one of these error messages yesterday and assumed it was an
invalid auth token (because that was an HTTP query parameter in the
test) I was working on. In fact, it was an invalid next batch token for
syncing.
We've already batched up the events previously, and assume in other
places in the events.py file that we have. Removing this makes it easier
to adjust the batch sizes in one place.