Fix MSC3030 `/timestamp_to_event` endpoint returning `outliers` that it has no idea whether are near a gap or not (and therefore unable to determine whether it's actually the closest event). The reason Synapse doesn't know whether an `outlier` is next to a gap is because our gap checks rely on entries in the `event_edges`, `event_forward_extremeties`, and `event_backward_extremities` tables which is [not the case for `outliers`](2c63cdcc3f/docs/development/room-dag-concepts.md (outliers)).
Also fixes MSC3030 Complement `can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint` test flake. Although this acted flakey in Complement, if `sync_partial_state` raced and beat us before `/timestamp_to_event`, then even if we retried the failing `/context` request it wouldn't work until we made this Synapse change. With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
Fix https://github.com/matrix-org/synapse/issues/13944
### Why did this fail before? Why was it flakey?
Sleuthing the server logs on the [CI failure](https://github.com/matrix-org/synapse/actions/runs/3149623842/jobs/5121449357#step:5:5805), it looks like `hs2:/timestamp_to_event` found `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` event locally. Then when we went and asked for it via `/context`, since it's an `outlier`, it was filtered out of the results -> `You don't have permission to access that event.`
This is reproducible when `sync_partial_state` races and persists `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` before we evaluate `get_event_for_timestamp(...)`. To consistently reproduce locally, just add a delay at the [start of `get_event_for_timestamp(...)`](cb20b885cb/synapse/handlers/room.py (L1470-L1496)) so it always runs after `sync_partial_state` completes.
```py
from twisted.internet import task as twisted_task
d = twisted_task.deferLater(self.hs.get_reactor(), 3.5)
await d
```
In a run where it passes, on `hs2`, `get_event_for_timestamp(...)` finds a different event locally which is next to a gap and we request from a closer one from `hs1` which gets backfilled. And since the backfilled event is not an `outlier`, it's returned as expected during `/context`.
With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
* Fix `track_memory_usage` on poetry-core 1.3.x installations
The same kind of problem as discussed in #14085:
1. we defined an extra with an underscore
2. we look it up at runtime with an underscore
3. but poetry-core 1.3.x. installs it with a dash, causing (2) to fail.
Fix by using a dash everywhere.
* Changelog
Spawned while investigating https://github.com/matrix-org/synapse/issues/13944
This way we might get some more context whenever an `403 Forbidden - body: {"errcode":"M_FORBIDDEN","error":"You don't have permission to access that event."}` error is produced.
`log_config.yaml`
```yaml
loggers:
synapse:
level: INFO
synapse.visibility:
level: DEBUG
```
This should fix a race where the event notification comes in over
replication before the state replication, leaving a window during
which a sync may get an incorrect list of rooms for the user.
While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place.
Related to
- https://github.com/matrix-org/synapse/issues/13622
- https://github.com/matrix-org/synapse/pull/13635
- https://github.com/matrix-org/synapse/issues/13676
Part of https://github.com/matrix-org/synapse/issues/13356
Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures.
With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now.
For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761
To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.
- `backfill`
- `outgoing-federation-request` `/backfill`
- `_check_sigs_and_hash_and_fetch`
- `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
- ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
- ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
- `_process_pulled_events`
- `_process_pulled_event` for each validated event
- ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
- `_get_state_ids_after_missing_prev_event`
- `outgoing-federation-request` `/state_ids`
- ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
- ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
The root node of a thread (and events related to it) are considered
"part of a thread" when validating receipts. This allows clients which
show the root node in both the main timeline and the threaded timeline
to easily send receipts in either.
Note that threaded notifications are not created for these events, these
events created notifications on the main timeline.
The callers either set a default limit or manually handle a None-limit
later on (by setting a default value).
Update the callers to always instantiate PaginationConfig with a default
limit and then assume the limit is non-None.
Stabilize the threads API (MSC3856) by supporting (only) the v1
path for the endpoint.
This also marks the API as safe for workers since it is a read-only
API.
Implement the /threads endpoint from MSC3856.
This is currently unstable and behind an experimental configuration
flag.
It includes a background update to backfill data, results from
the /threads endpoint will be partial until that finishes.
**Before:**
```
WARNING - POST-11 - Unable to parse JSON: Expecting value: line 1 column 1 (char 0) (b'')
```
**After:**
```
WARNING - POST-11 - Unable to parse JSON from POST /_matrix/client/v3/join/%21ZlmJtelqFroDRJYZaq:hs1?server_name=hs1 response: Expecting value: line 1 column 1 (char 0) (b'')
```
---
It's possible to figure out which endpoint these warnings were coming from before but you had to follow the request ID `POST-11` to the log line that says `Completed request [...]`. Including this key information next to the JSON parsing error makes it much easier to reason whether it matters or not.
```
2022-09-29T08:23:25.7875506Z synapse_main | 2022-09-29 08:21:10,336 - synapse.http.matrixfederationclient - 299 - INFO - POST-11 - {GET-O-13} [hs1] Completed request: 200 OK in 0.53 secs, got 450 bytes - GET matrix://hs1/_matrix/federation/v1/make_join/%21ohtKoQiXlPePSycXwp%3Ahs1/%40charlie%3Ahs2?ver=1&ver=2&ver=3&ver=4&ver=5&ver=6&ver=org.matrix.msc2176&ver=7&ver=8&ver=9&ver=org.matrix.msc3787&ver=10&ver=org.matrix.msc2716v4
```
---
As a note, having no `body` is normal for the `/join` endpoint and it can handle it.
0c853e0970/synapse/rest/client/room.py (L398-L403)
Alternatively we could remove these extra logs but they are probably more usually helpful to figure out what went wrong.
Fixes two related bugs:
* No edit information was bundled for events which aren't `m.room.message`.
* `m.new_content` was not applied for those events.
* Revert to prior build-system requirements
This reverts #14080.
* Use normalised extra name, which poetry-core 1.3 will generate anyway
* Changelog
* Upper bound build-system requirements
* Remove upgrade note; expand changelog entry a little.
* Fix typo in build-system comment
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Attempt to parse any valid information from an oEmbed response
(instead of bailing at the first unexpected data). This should allow
for more partial oEmbed data to be returned, resulting in better /
more URL previews, even if those URL previews are only partial.
==============================
Please note that legacy Prometheus metric names are now deprecated and will be removed in Synapse 1.73.0.
Server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.69/upgrade.html#upgrading-to-v1690) for more details.
Deprecations and Removals
-------------------------
- Deprecate the `generate_short_term_login_token` method in favor of an async `create_login_token` method in the Module API. ([\#13842](https://github.com/matrix-org/synapse/issues/13842))
Internal Changes
----------------
- Ensure Synapse v1.69 works with upcoming database changes in v1.70. ([\#14045](https://github.com/matrix-org/synapse/issues/14045))
- Fix a bug introduced in Synapse v1.68.0 where messages could not be sent in rooms with non-integer `notifications` power level. ([\#14073](https://github.com/matrix-org/synapse/issues/14073))
- Temporarily pin build-system requirements to workaround an incompatibility with poetry-core 1.3.0. This will be reverted before the v1.69.0 release proper, see [\#14079](https://github.com/matrix-org/synapse/issues/14079). ([\#14080](https://github.com/matrix-org/synapse/issues/14080))
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmM+674ACgkQ+84Kzgcy
GG+TdQwAm3EswLr9gq+PPD59ewt7BI6DgdrW0aw+OS0qbYBEg2BklNk8RpQsELFC
U/FT+Ty9VakI+/ID4W0mHhPDc9xtZz09NE+fZqrSNxel7rdL8SfQu3gsrT433o53
FElQbeknuspiX8kokBFnzuFKR+HzZsZq5Of9SqSy6ItO/YsV8kFdx5QSYolzWZdP
AU/HNrfEBog/blFJEx504XDg7SgQRjAQM4uOYS1twSlKdG19yDm9+N1kLxdH4nyy
K5bxKQMARzk35QgM5JQIfryMKVo5rcfB5uotxUXd8s1mlHGxAVAb+jfXdSSvHrKq
SSUD5cII2XQmkZbsqig1fQaknh8RbwR4d3y+3O1Rjm1JlCQ4vjPMZ7KVyTS2nmVA
wdo9bEMbtP2qMMoq5Mlj9QpMg3VMU+ZpRDIY6zoa8oWHUCAL4Q/jxeDb4XtvLn3h
MuE6yRHca390jeYIJnSy1zZjoacIAAEBy3he1oTTor4KhCZV4jLRRtcc/nmR5cZ8
sgW+7wFh
=Iu7b
-----END PGP SIGNATURE-----
Merge tag 'v1.69.0rc2' into develop
Synapse 1.69.0rc2 (2022-10-06)
==============================
Please note that legacy Prometheus metric names are now deprecated and will be removed in Synapse 1.73.0.
Server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.69/upgrade.html#upgrading-to-v1690) for more details.
Deprecations and Removals
-------------------------
- Deprecate the `generate_short_term_login_token` method in favor of an async `create_login_token` method in the Module API. ([\#13842](https://github.com/matrix-org/synapse/issues/13842))
Internal Changes
----------------
- Ensure Synapse v1.69 works with upcoming database changes in v1.70. ([\#14045](https://github.com/matrix-org/synapse/issues/14045))
- Fix a bug introduced in Synapse v1.68.0 where messages could not be sent in rooms with non-integer `notifications` power level. ([\#14073](https://github.com/matrix-org/synapse/issues/14073))
- Temporarily pin build-system requirements to workaround an incompatibility with poetry-core 1.3.0. This will be reverted before the v1.69.0 release proper, see [\#14079](https://github.com/matrix-org/synapse/issues/14079). ([\#14080](https://github.com/matrix-org/synapse/issues/14080))
Fixes two related bugs:
* The handling of `[null]` for a `room_types` filter was incorrect.
* The ordering of arguments when providing both a network tuple
and room type field was incorrect.
By getting the joined rooms before the current token we avoid any reading
history to confirm a user *was* in a room. We can then use any membership
change events, which we already fetch during sync, to determine the final
list of joined room IDs.
Applies the proper logic for unthreaded and threaded receipts to either
apply to all events in the room or only events in the same thread, respectively.
When retrieving counts of notifications segment the results based on the
thread ID, but choose whether to return them as individual threads or as
a single summed field by letting the client opt-in via a sync flag.
The summarization code is also updated to be per thread, instead of per
room.
Implements MSC2832 by sending application service access
tokens in the Authorization header.
The access token is also still sent as a query parameter until
the application service ecosystem has fully migrated to using
headers. In the future this could be made opt-in, or removed
completely.
Keep the old behavior (of including the original_event field) for any
requests to the /unstable version of the endpoint, but do not include
the field when the /v1 version is used.
This should avoid new clients from depending on this field, but will
not help with current dependencies.
MSC3316 declares that both /rooms/{roomId}/send and /rooms/{roomId}/state
should accept a ts parameter for appservices. This change expands support
to /state and adds tests.
Instead of running a single large query, run a single query for
user-only lookups and additional queries for batches of user device
lookups.
Resolves#13580.
Signed-off-by: Sean Quah <seanq@matrix.org>
Spawned while working on [`get_users_in_room` mis-uses](https://github.com/matrix-org/synapse/pull/13958#discussion_r984074897) and thinking we could use `get_local_users_in_room` here but we can't.
From first glance, it seemed like this was only using local users from all of the `is_mine_id(user_id)` checks but I see that it does actually use remote users. Just making things a little more clear here what it does and mentions remote users so maybe that will be more obvious in the future.
* Update mypy and mypy-zope
* Unignore assigning to LogRecord attributes
Presumably https://github.com/python/typeshed/pull/8064 makes this ok
Cherry-picked from #13521
* Remove unused ignores due to mypy ParamSpec fixes
https://github.com/python/mypy/pull/12668
Cherry-picked from #13521
* Remove additional unused ignores
* Fix new mypy complaints related to `assertGreater`
Presumably due to https://github.com/python/typeshed/pull/8077
* Changelog
* Reword changelog
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
We move the expensive check of visibility to after calculating push actions, avoiding the expensive check for users who won't get pushed anyway.
I think this should have a big impact on rooms with large numbers of local users that have pushed disabled.
Fixes#13942. Introduced in #13575.
Basically, let's only get the ordered set of hosts out of the DB if we need an ordered set of hosts. Since we split the function up the caching won't be as good, but I think it will still be fine as e.g. multiple backfill requests for the same room will hit the cache.
There is no need to grab thousands of backfill points when we only need 5 to make the `/backfill` request with. We need to grab a few extra in case the first few aren't visible in the history.
Previously, we grabbed thousands of backfill points from the database, then sorted and filtered them in the app. Fetching the 4.6k backfill points for `#matrix:matrix.org` from the database takes ~50ms - ~570ms so it's not like this saves a lot of time 🤷. But it might save us more time now that `get_backfill_points_in_room`/`get_insertion_event_backward_extremities_in_room` are more complicated after https://github.com/matrix-org/synapse/pull/13635
This PR moves the filtering and limiting to the SQL query so we just have less data to work with in the first place.
Part of https://github.com/matrix-org/synapse/issues/13356
c.f. #12993 (comment), point 3
This stores all device list updates that we receive while partial joins are ongoing, and processes them once we have the full state.
Note: We don't actually process the device lists in the same ways as if we weren't partially joined. Instead of updating the device list remote cache, we simply notify local users that a change in the remote user's devices has happened. I think this is safe as if the local user requests the keys for the remote user and we don't have them we'll simply fetch them as normal.