Optimize how we calculate `likely_domains` during backfill because I've seen this take 17s in production just to `get_current_state` which is used to `get_domains_from_state` (see case [*2. Loading tons of events* in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)).
There are 3 ways we currently calculate hosts that are in the room:
1. `get_current_state` -> `get_domains_from_state`
- Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill`
- This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳
1. `get_current_hosts_in_room`
- Used for other federation things like sending read receipts and typing indicators
1. `get_hosts_in_room_at_events`
- Used when pushing out events over federation to other servers in the `_process_event_queue_loop`
Fix https://github.com/matrix-org/synapse/issues/13626
Part of https://github.com/matrix-org/synapse/issues/13356
Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh)
### Query performance
#### Before
The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s):
```
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 16035.612 ms (00:16.036)
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 4243.237 ms (00:04.243)
```
But what about `get_current_hosts_in_room`: When there is 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added). But takes 930ms after a Postgres restart or 390ms if running back to back to back.
```sh
$ psql synapse
synapse=# \timing on
synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
FROM current_state_events
WHERE
type = 'm.room.member'
AND membership = 'join'
AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
count
-------
4130
(1 row)
Time: 13181.598 ms (00:13.182)
synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
count
-------
80814
synapse=# SELECT COUNT(*) from current_state_events;
count
---------
8162847
synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
pg_size_pretty
----------------
4702 MB
```
#### After
I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart.
After a Postgres restart, it takes 330ms and running back to back takes 260ms.
```sh
$ psql synapse
synapse=# \timing on
Timing is on.
synapse=# SELECT
substring(c.state_key FROM '@[^:]*:(.*)$') as host
FROM current_state_events c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
WHERE
c.type = 'm.room.member'
AND c.membership = 'join'
AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org'
GROUP BY host
ORDER BY min(e.depth) ASC;
Time: 333.800 ms
```
#### Going further
To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up.
Another thing we can do is optimize the query to use a index skip scan:
- https://wiki.postgresql.org/wiki/Loose_indexscan
- Index Skip Scan, https://commitfest.postgresql.org/37/1741/
- https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
If things like the signing key file are missing, let's just try to generate
them on startup.
Again, this is useful for k8s-like deployments where we just want to generate
keys on the first run.
When loading current ids, sort by stream ID so that we don't want to overwrite the `current_position` of an instance to a lower stream ID than we're actually at ([discussion](https://github.com/matrix-org/synapse/pull/13585#discussion_r951795379)). Previously, it sorted alphabetically by instance name which can be `null` and throw errors but more importantly, accomplishes nothing.
Fixes the following startup error which is why I started looking into this area:
```
$ poetry run synapse_homeserver --config-path homeserver.yaml
****************************************************************
Error during initialisation:
'<' not supported between instances of 'NoneType' and 'str'
There may be more information in the logs.
****************************************************************
```
Somehow my database ended up looking like the following, notice the `instance_name` is `null` in the db, and we can't sort `NoneType` things. Another question is why do we see the `instance_name` as `null` sometimes instead of `master` in monolith mode?
```
$ psql synapse
synapse=# SELECT * FROM stream_positions;
stream_name | instance_name | stream_id
-----------------+---------------+-----------
account_data | master | 1242
events | master | 1787
to_device | master | 58
presence_stream | master | 485638
receipts | master | 341
backfill | master | -139106
(6 rows)
synapse=# SELECT instance_name, stream_id FROM receipts_linearized;
instance_name | stream_id
---------------+-----------
| 211
| 3
| 4
| 212
| 213
| 224
| 228
| 164
| 313
| 253
| 38
| 321
| 324
| 189
| 192
| 193
| 194
| 195
| 197
| 198
| 275
| 79
| 339
| 340
| 82
| 341
| 84
| 85
| 91
| 119
```
Use dedicated `get_local_users_in_room` to find local users when calculating `join_authorised_via_users_server` ("the authorising user for joining a restricted room") of a `/make_join` request.
Found while working on https://github.com/matrix-org/synapse/pull/13575#discussion_r953023755 but it's not related.
This speeds things up by ~2x.
The vast majority of the time is now spent in `LruCache` moving things around the linked lists.
We do this via two things:
1. Don't create a deferred per-key during bulk set operations in `DeferredCache`. Instead, only create them if a subsequent caller asks for the key.
2. Add a bulk lookup API to `DeferredCache` rather than use a loop.
Part of #13019
This changes all the permission-related methods to rely on the Requester instead of the UserID. This is a first step towards enabling scoped access tokens at some point, since I expect the Requester to have scope-related informations in it.
It also changes methods which figure out the user/device/appservice out of the access token to return a Requester instead of something else. This avoids having store-related objects in the methods signatures.
Use a state filter or accept partial state in a few places where we
request state, to avoid blocking.
To make lazy-loading `/sync`s work, we need to provide the memberships
of event senders, which are not guaranteed to be in the room state.
Instead we dig through auth events for memberships to present to
clients. The auth events of an event are guaranteed to contain a
passable membership event, otherwise the event would have been rejected.
Note that this only covers the common code paths encountered during
testing. There has been no exhaustive checking of all sync code paths.
Fixes#13146.
Signed-off-by: Sean Quah <seanq@matrix.org>
Broke by #13522
It looks like we have some rules in the DB with a priority class less
than 0 that don't override the base rules. Before these were just
dropped, but #13522 made that a hard error.
This improves load times for push rules:
| Version | Time per user | Time for 1k users |
| -------------------- | ------------- | ----------------- |
| Before | 138 µs | 138ms |
| Now (with custom) | 2.11 µs | 2.11ms |
| Now (without custom) | 49.7 ns | 0.05 ms |
This therefore has a large impact on send times for rooms
with large numbers of local users in the room.
This reverts commit f383b9b3ec. Other PRs
were seeing mypy failures that looked to be related to mypy-zope.
Confusingly, we didn't see this on #13521.
Revert this for now and investigate later.
* Clarifies comments.
* Fixes an erroneous comment (about return type) added in #13455
(ec24813220).
* Clarifies the name of a variable.
* Simplifies logic of pulling out the latest join for the requesting user.
Add some miscellaneous comments to document sync, especially around
`compute_state_delta`.
Signed-off-by: Sean Quah <seanq@matrix.org>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
```py
@trace
@tag_args
async def get_oldest_event_ids_with_depth_in_room(...)
...
```
Before this PR, you would see a warning in the logs and the span was not exported:
```
2022-08-03 19:11:59,383 - synapse.logging.opentracing - 835 - ERROR - GET-0 - @trace may not have wrapped EventFederationWorkerStore.get_oldest_event_ids_with_depth_in_room correctly! The function is not async but returned a coroutine.
```
In state res v2, we apply two passes of iterative auth checks. The first
pass replays power events and events in their auth chains, but only
those belonging to the full conflicted set. The source code as written
suggests that we want only those belonging to the auth difference (which
is a smaller set of events).
At runtime we were doing the correct thing anyway, because the only
callsite of `_reverse_topological_power_sort` passes in the
`full_conflicted_set`. So this really is just a rename.
This adds support for the stable identifiers of MSC2285 while
continuing to support the unstable identifiers behind the configuration
flag. These will be removed in a future version.
Fix @tag_args being off-by-one (ahead)
Example:
```
argspec.args=[
'self',
'room_id'
]
args=(
<synapse.storage.databases.main.DataStore object at 0x10d0b8d00>,
'!HBehERstyQBxyJDLfR:my.synapse.server'
)
```
---
The previous logic was also flawed and we can end up in a situation like this:
```
argspec.args=['self', 'dest', 'room_id', 'limit', 'extremities']
args=(<synapse.federation.federation_client.FederationClient object at 0x7f1651c18160>, 'hs1', '!jAEHKIubyIfuLOdfpY:hs1')
```
From this source:
```py
async def backfill(
self, dest: str, room_id: str, limit: int, extremities: Collection[str]
) -> Optional[List[EventBase]]:
```
And this usage:
```py
events = await self._federation_client.backfill(
dest, room_id, limit=limit, extremities=extremities
)
```
which would previously cause this error:
```
synapse_main | 2022-08-04 06:13:12,051 - synapse.handlers.federation - 424 - ERROR - GET-5 - Failed to backfill from hs1 because tuple index out of range
synapse_main | Traceback (most recent call last):
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation.py", line 392, in try_backfill
synapse_main | await self._federation_event_handler.backfill(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 828, in _wrapper
synapse_main | return await func(*args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 593, in backfill
synapse_main | events = await self._federation_client.backfill(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 828, in _wrapper
synapse_main | return await func(*args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 827, in _wrapper
synapse_main | with wrapping_logic(func, *args, **kwargs):
synapse_main | File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
synapse_main | return next(self.gen)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 922, in _wrapping_logic
synapse_main | set_attribute("ARG_" + arg, str(args[i + 1])) # type: ignore[index]
synapse_main | IndexError: tuple index out of range
```
* Adds docstrings and inline comments.
* Formats SQL queries using triple quoted strings.
* Minor formatting changes.
* Avoid fetching `event_push_summary_stream_ordering` multiple times
in the same transactions.
Still maintains local in memory lookup optimisation, but does any external
lookup as part of the deferred that prevents duplicate lookups for the same
event at once. This makes the assumption that fetching from an external
cache is a non-zero load operation.
Signed-off-by: Andrew Doh <andrewddo@gmail.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Andrew Morgan <andrewm@element.io>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
Previously, `_resolve_state_at_missing_prevs` returned the resolved
state before an event and a partial state flag. These were unwieldy to
carry around would only ever be used to build an event context. Build
the event context directly instead.
Signed-off-by: Sean Quah <seanq@matrix.org>
==============================
This RC reintroduces support for `account_threepid_delegates.email`, which was removed in 1.64.0rc1. It remains deprecated and will be removed altogether in a future release. ([\#13406](https://github.com/matrix-org/synapse/issues/13406))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAmLj4a8ACgkQOSor00I9
eP8biQf/c8yY2mbeRZcBKtp6yoQCRYQvboSMEXyi+dLe1hNqdhSZwRQcAoFuAFwE
WdScDvoTaElUxv0v6eCI1x9CoXnZ6xpDShvK39j5Yhzv+1tNsm5Uq9imyG3jK5i6
U/3Gt6CrCsS01VkGslQ3B5I6MFtbC6ZZK9O48yg+GD8Oqw2HH/gllr5swyVbKdbc
GGhRBHvgXn+w6d/KnKt8uRxJqIpDt9JMga+WdB8CwFR5WnWbGdw24KsyxmBuOLC3
caQRiluJL/X4jApUpfsJMBBd/jrDod5wWDFO/4P+v0+2d3Ts+hKezZbt5h1VIYSw
szZXbzxn5RNDkNiJDpOOOMYQ5DXGmA==
=3/nK
-----END PGP SIGNATURE-----
Merge tag 'v1.64.0rc2' into develop
Synapse 1.64.0rc2 (2022-07-29)
==============================
This RC reintroduces support for `account_threepid_delegates.email`, which was removed in 1.64.0rc1. It remains deprecated and will be removed altogether in a future release. ([\#13406](https://github.com/matrix-org/synapse/issues/13406))
The `room_id` field represented the parent space for each room
and was made redundant by changes in the API shape where the
`children_state` is now nested underneath each `room`.
The room ID of each child is in the `state_key` field and is still
available.
Avoid blocking on full state in `_resolve_state_at_missing_prevs` and
return a new flag indicating whether the resolved state is partial.
Thread that flag around so that it makes it into the event context.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
When registering a new account via SSO on iOS, the text field becomes pretty annoying as it autocapitalises and autocorrects your input. This PR fixes that (although I have only tested the raw HTML file on the simulator, I'm not sure how to get the complete setup available for testing in the flow).
Previously, TLS could only be used with STARTTLS.
Add a new option `force_tls`, where TLS is used from the start.
Implicit TLS is recommended over STARTLS,
see https://datatracker.ietf.org/doc/html/rfc8314Fixes#8046.
Signed-off-by: Jan Schär <jan@jschaer.ch>
See #10826 and #10786 for context as to why we had to disable pruning on
those caches.
Now that `get_users_who_share_room_with_user` is called frequently only
for presence, we just need to make calls to it less frequent and then we
can remove the various levels of caching that is going on.
When a room has the partial state flag, we may not have an accurate
`m.room.member` event for event senders in the room's current state, and
so cannot perform soft fail checks correctly. Skip the soft fail check
entirely in this case.
As an alternative, we could block until we have full state, but that
would prevent us from receiving incoming events over federation, which
is undesirable.
Signed-off-by: Sean Quah <seanq@matrix.org>
Fix race conditions in the async cache invalidation logic, by separating
the async & local invalidation calls and ensuring any async call i
executed first.
Signed off by Nick @ Beeper (@Fizzadar).
More prep work for asyncronous caching, also makes all process_replication_rows methods consistent (presence handler already is so).
Signed off by Nick @ Beeper (@Fizzadar)
Fix https://github.com/matrix-org/synapse/issues/13016
## New error code and status
### Before
Previously, we returned a `404` for `/thumbnail` which isn't even in the spec.
```json
{
"errcode": "M_NOT_FOUND",
"error": "Not found [b'hs1', b'tefQeZhmVxoiBfuFQUKRzJxc']"
}
```
### After
What does the spec say?
> 400: The request does not make sense to the server, or the server cannot thumbnail the content. For example, the client requested non-integer dimensions or asked for negatively-sized images.
>
> *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixmediav3thumbnailservernamemediaid*
Now with this PR, we respond with a `400` when we don't have thumbnails to serve and we explain why we might not have any thumbnails.
```json
{
"errcode": "M_UNKNOWN",
"error": "Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)",
}
```
> Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)
---
We still respond with a 404 in many other places. But we can iterate on those later and maybe keep some in some specific places after spec updates/clarification: https://github.com/matrix-org/matrix-spec/issues/1122
We can also iterate on the bugs where Synapse doesn't thumbnail when it should in other issues/PRs.
These columns were added back in Synapse 1.52, and have been populated for new
events since then. It's now (beyond) time to back-populate them for existing
events.
There are two fixes here:
1. A long-standing bug where we incorrectly calculated `delta_ids`; and
2. A bug introduced in #13267 where we got current state incorrect.
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.
Signed off by Nick @ Beeper (@Fizzadar)
* Replace `get_new_events_for_appservice` with `get_all_new_events_stream`
The functions were near identical and this brings the AS worker closer
to the way federation senders work which can allow for multiple workers
to handle AS traffic.
* Pull received TS alongside events when processing the stream
This avoids an extra query -per event- when both federation sender
and appservice pusher process events.
There is a corner in `_check_event_auth` (long known as "the weird corner") where, if we get an event with auth_events which don't match those we were expecting, we attempt to resolve the diffence between our state and the remote's with a state resolution.
This isn't specced, and there's general agreement we shouldn't be doing it.
However, it turns out that the faster-joins code was relying on it, so we need to introduce something similar (but rather simpler) for that.
* Drop support for v1 unbind
Signed-off-by: Jacek Kusnierz <jacek.kusnierz@tum.de>
* Add changelog
Signed-off-by: Jacek Kusnierz <jacek.kusnierz@tum.de>
* Update changelog.d/13240.misc
* Drop support for delegating email validation
Delegating email validation to an IS is insecure (since it allows the owner of
the IS to do a password reset on your HS), and has long been deprecated. It
will now cause a config error at startup.
* Update unit test which checks for email verification
Give it an `email` config instead of a threepid delegate
* Remove unused method `requestEmailToken`
* Simplify config handling for email verification
Rather than an enum and a boolean, all we need here is a single bool, which
says whether we are or are not doing email verification.
* update docs
* changelog
* upgrade.md: fix typo
* update version number
this will be in 1.64, not 1.63
* update version number
this one too
This gets rid of another usage of get_appservice_by_req, with all the benefits, including correctly tracking the appservice IP and setting the tracing attributes correctly.
Signed-off-by: Quentin Gliech <quenting@element.io>
Inspired by the room batch handler, this uses previous event inserts to
pre-populate prev events during room creation, reducing the number of
queries required to create a room.
Signed off by Nick @ Beeper (@Fizzadar)
Complement tests: https://github.com/matrix-org/complement/pull/405
This happens when you have some messages imported before the room is created.
Then use MSC3030 to look backwards before the room creation from a remote
federated server. The server won't find anything locally, but will ask over
federation which will have the remote event. The previous logic would
choke on not having the local event assigned.
```
Failed to fetch /timestamp_to_event from hs2 because of exception(UnboundLocalError) local variable 'local_event' referenced before assignment args=("local variable 'local_event' referenced before assignment",)
```
Bounce recalculation of current state to the correct event persister and
move recalculation of current state into the event persistence queue, to
avoid concurrent updates to a room's current state.
Also give recalculation of a room's current state a real stream
ordering.
Signed-off-by: Sean Quah <seanq@matrix.org>
Method `_get_state_map_for_room` seems to break in presence of some ill-formed events in the database. Reimplementing this method to use `get_current_state`, which is more robust to such events.
This happened if we encountered a stream ordering in `event_push_actions` that had more rows than the batch size of the delete, as If we don't delete any rows in an iteration then the next time round we get the exact same stream ordering and get stuck.
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.
We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.
To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.
All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.
On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.
The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.
We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.
`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.
`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.
Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Cast to postgres types when handling postgres db
* Remove unused method
* Easy annotations
* Annotate create_room
* Use `ParamSpec` to annotate looping_call
* Annotate `default_config`
* Track `now` as a float
`time_ms` returns an int like the proper Synapse `Clock`
* Introduce a `Timer` dataclass
* Introduce a Looper type
* Suppress checking of a mock
* tests.utils is typed
* Changelog
* Whoops, import ParamSpec from typing_extensions
* ditch the psycopg2 casts
==============================
Bugfixes
--------
- Update the version of the [ldap3 plugin](https://github.com/matrix-org/matrix-synapse-ldap3/) included in the `matrixdotorg/synapse` DockerHub images and the Debian packages hosted on `packages.matrix.org` to 0.2.1. This fixes [a bug](https://github.com/matrix-org/matrix-synapse-ldap3/pull/163) with usernames containing uppercase characters. ([\#13156](https://github.com/matrix-org/synapse/issues/13156))
- Fix a bug introduced in Synapse 1.62.0rc1 affecting unread counts for users on small servers. ([\#13168](https://github.com/matrix-org/synapse/issues/13168))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAmLDDVgACgkQiISIDS7+
X/Q+KQ//WuWB9hfAW8XEyYHWox95zaITsAzY/TTG1IXygAMjgEk2+9utdRaX3wbk
YDaZCeEw+vbK/3w/lt1RzI30K3uVCZVcW2DTQr1Qi4B+UWLOlCsVfOT9LcMvNoJe
ww/cOK6RpPgTlqk5ij0MtdjfWkAJeToi7ESMooORxhWFm3Zd8e5BpbNv89WUBZhk
zCqCjIdjSF+Mwk8NwmU1iJi5JQY/+Xl51uk2+wGIAe4vtgPTz7PJmoPF1E6nGGVF
9OYdlWU4H7u6js8n05QL2jKtX34uszCo2hwoW2aFPPmF0B2CFEV6WFBiDOppLZ1g
ZMJv1s/34RXoBu8pAuJnq2BZkWxu99LRmPV+f/R+S0jDT1MH9tdSdhfcGu7iH/Y9
uguGX3OOlxnkUb5o825Xt3mvBcVaTGY+sspFtB12RtXmWRdll/Hq6w11ZN5f6qDy
Nr/DuoPjMAH7kzelFn/GpP6K8zX8iYjf0lLCyrbYV7OYAI6/I+Vao+sT2ctHD1T8
s4aTTx1bEl23mo/RiqH2fRHaPhBjZKW0uv6iRNqDE2ThYPAXinVtt7MiUU0QGco5
vMca/RZBkEj0Lov0AleBx4XRXlBTyq5BX2V1frYLenKp42bDzN9sgsPAOPeKieHW
qjr+Ti9i47wGADXs2GI/mke/C8jlONEKJm/v8mwXItn8Za7wBJc=
=SpI6
-----END PGP SIGNATURE-----
Merge tag 'v1.62.0rc3' into develop
Synapse 1.62.0rc3 (2022-07-04)
==============================
Bugfixes
--------
- Update the version of the [ldap3 plugin](https://github.com/matrix-org/matrix-synapse-ldap3/) included in the `matrixdotorg/synapse` DockerHub images and the Debian packages hosted on `packages.matrix.org` to 0.2.1. This fixes [a bug](https://github.com/matrix-org/matrix-synapse-ldap3/pull/163) with usernames containing uppercase characters. ([\#13156](https://github.com/matrix-org/synapse/issues/13156))
- Fix a bug introduced in Synapse 1.62.0rc1 affecting unread counts for users on small servers. ([\#13168](https://github.com/matrix-org/synapse/issues/13168))
When we receive an event over federation during a faster join, there is no need
to wait for full state, since we have a whole reconciliation process designed
to take the partial state into account.
* Make _iterate_over_text easier to read by using simple data structures
* Prefer a set of tags to ignore
In my tests, it's 4x faster to check for containment in a set of this size
* Add a stack size limit to _iterate_over_text
* Continue accepting the case where there is no body element
* Use an early return instead for None
Co-authored-by: Richard van der Hoff <richard@matrix.org>
* Extend the auth rule checks for `m.room.create` events
... and move them up to the top of the function. Since the no auth_events are
allowed for m.room.create events, we may as well get the m.room.create event
checks out of the way first.
* Add a test for create events with prev_events
When we fail to persist a federation event, we kick off a task to remove
its push actions in the background, using the current logging context.
Since we don't `await` that task, we may finish our logging context
before the task finishes. There's no reason to not `await` the task, so
let's do that.
Signed-off-by: Sean Quah <seanq@matrix.org>
Pull out `twitter:` meta tags when generating a preview and
use it to augment any `og:` meta tags.
Prefers Open Graph information over Twitter card information.
* Add auth events to events used in tests
* Move some event auth checks out to a different method
Some of the event auth checks apply to an event's auth_events, rather than the
state at the event - which means they can play no part in state
resolution. Move them out to a separate method.
* Rename check_auth_rules_for_event
Now it only checks the state-dependent auth rules, it needs a better name.
Fixes#11887 hopefully.
The core change here is that `event_push_summary` now holds a summary of counts up until a much more recent point, meaning that the range of rows we need to count in `event_push_actions` is much smaller.
This needs two major changes:
1. When we get a receipt we need to recalculate `event_push_summary` rather than just delete it
2. The logic for deleting `event_push_actions` is now divorced from calculating `event_push_summary`.
In future it would be good to calculate `event_push_summary` while we persist a new event (it should just be a case of adding one to the relevant rows in `event_push_summary`), as that will further simplify the get counts logic and remove the need for us to periodically update `event_push_summary` in a background job.
* Remove redundant references to `event_edges.room_id`
We don't need to care about the room_id here, because we are already checking
the event id.
* Clean up the event_edges table
We make a number of changes to `event_edges`:
* We give the `room_id` and `is_state` columns defaults (null and false
respectively) so that we can stop populating them.
* We drop any rows that have `is_state` set true - they should no longer
exist.
* We drop any rows that do not exist in `events` - these should not exist
either.
* We drop the old unique constraint on all the colums, which wasn't much use.
* We create a new unique index on `(event_id, prev_event_id)`.
* We add a foreign key constraint to `events`.
These happen rather differently depending on whether we are on Postgres or
SQLite. For SQLite, we just rebuild the whole table, copying only the rows we
want to keep. For Postgres, we try to do things in the background as much as
possible.
* Stop populating `event_edges.room_id` and `is_state`
We can just rely on the defaults.
* Rename test_fedclient to match its source file
* Require at least one destination to be truthy
* Explicitly validate user ID in profile endpoint GETs
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This simplifies the access token verification logic by removing the `rights`
parameter which was only ever used for the unsubscribe link in email
notifications. The latter has been moved under the `/_synapse` namespace,
since it is not a standard API.
This also makes the email verification link more secure, by embedding the
app_id and pushkey in the macaroon and verifying it. This prevents the user
from tampering the query parameters of that unsubscribe link.
Macaroon generation is refactored:
- Centralised all macaroon generation and verification logic to the
`MacaroonGenerator`
- Moved to `synapse.utils`
- Changed the constructor to require only a `Clock`, hostname, and a secret key
(instead of a full `Homeserver`).
- Added tests for all methods.
Instead, use the `room_version` property of the event we're checking.
The `room_version` was originally added as a parameter somewhere around #4482,
but really it's been redundant since #6875 added a `room_version` field to `EventBase`.
Instead, use the `room_version` property of the event we're validating.
The `room_version` was originally added as a parameter somewhere around #4482,
but really it's been redundant since #6875 added a `room_version` field to `EventBase`.
The `room_id` field was removed from MSC2946 before
it was accepted. It was initially kept for backwards compatibility
and should be removed now that the stable form of the API
is used.
This change only stops Synapse from validating that it is returned,
a future PR will remove returning it as part of the response.
By always using delete_devices and sometimes passing a list
with a single device ID.
Previously these methods had gotten out of sync with each
other and it seems there's little benefit to the single-device
variant.
* Update worker docs to remove group endpoints.
* Removes an unused parameter to `ApplicationService`.
* Break dependency between media repo and groups.
* Avoid copying `m.room.related_groups` state events during room upgrades.
Currently, we try to pull the event corresponding to a sync token from the database. However, when
we fetch redaction events, we check the target of that redaction (because we aren't allowed to send
redactions to clients without validating them). So, if the sync token points to a redaction of an event
that we don't have, we have a problem.
It turns out we don't really need that event, and can just work with its ID and metadata, which
sidesteps the whole problem.
* Raise a dedicated `InvalidEventSignatureError` from `_check_sigs_on_pdu`
* Downgrade logging about redactions to DEBUG
this can be very spammy during a room join, and it's not very useful.
* Raise `InvalidEventSignatureError` from `_check_sigs_and_hash`
... and, more importantly, move the logging out to the callers.
* changelog