This improves load times for push rules:
| Version | Time per user | Time for 1k users |
| -------------------- | ------------- | ----------------- |
| Before | 138 µs | 138ms |
| Now (with custom) | 2.11 µs | 2.11ms |
| Now (without custom) | 49.7 ns | 0.05 ms |
This therefore has a large impact on send times for rooms
with large numbers of local users in the room.
This reverts commit f383b9b3ec. Other PRs
were seeing mypy failures that looked to be related to mypy-zope.
Confusingly, we didn't see this on #13521.
Revert this for now and investigate later.
* Clarifies comments.
* Fixes an erroneous comment (about return type) added in #13455
(ec24813220).
* Clarifies the name of a variable.
* Simplifies logic of pulling out the latest join for the requesting user.
Add some miscellaneous comments to document sync, especially around
`compute_state_delta`.
Signed-off-by: Sean Quah <seanq@matrix.org>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
```py
@trace
@tag_args
async def get_oldest_event_ids_with_depth_in_room(...)
...
```
Before this PR, you would see a warning in the logs and the span was not exported:
```
2022-08-03 19:11:59,383 - synapse.logging.opentracing - 835 - ERROR - GET-0 - @trace may not have wrapped EventFederationWorkerStore.get_oldest_event_ids_with_depth_in_room correctly! The function is not async but returned a coroutine.
```
In state res v2, we apply two passes of iterative auth checks. The first
pass replays power events and events in their auth chains, but only
those belonging to the full conflicted set. The source code as written
suggests that we want only those belonging to the auth difference (which
is a smaller set of events).
At runtime we were doing the correct thing anyway, because the only
callsite of `_reverse_topological_power_sort` passes in the
`full_conflicted_set`. So this really is just a rename.
This adds support for the stable identifiers of MSC2285 while
continuing to support the unstable identifiers behind the configuration
flag. These will be removed in a future version.
Fix @tag_args being off-by-one (ahead)
Example:
```
argspec.args=[
'self',
'room_id'
]
args=(
<synapse.storage.databases.main.DataStore object at 0x10d0b8d00>,
'!HBehERstyQBxyJDLfR:my.synapse.server'
)
```
---
The previous logic was also flawed and we can end up in a situation like this:
```
argspec.args=['self', 'dest', 'room_id', 'limit', 'extremities']
args=(<synapse.federation.federation_client.FederationClient object at 0x7f1651c18160>, 'hs1', '!jAEHKIubyIfuLOdfpY:hs1')
```
From this source:
```py
async def backfill(
self, dest: str, room_id: str, limit: int, extremities: Collection[str]
) -> Optional[List[EventBase]]:
```
And this usage:
```py
events = await self._federation_client.backfill(
dest, room_id, limit=limit, extremities=extremities
)
```
which would previously cause this error:
```
synapse_main | 2022-08-04 06:13:12,051 - synapse.handlers.federation - 424 - ERROR - GET-5 - Failed to backfill from hs1 because tuple index out of range
synapse_main | Traceback (most recent call last):
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation.py", line 392, in try_backfill
synapse_main | await self._federation_event_handler.backfill(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 828, in _wrapper
synapse_main | return await func(*args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 593, in backfill
synapse_main | events = await self._federation_client.backfill(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 828, in _wrapper
synapse_main | return await func(*args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 827, in _wrapper
synapse_main | with wrapping_logic(func, *args, **kwargs):
synapse_main | File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
synapse_main | return next(self.gen)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 922, in _wrapping_logic
synapse_main | set_attribute("ARG_" + arg, str(args[i + 1])) # type: ignore[index]
synapse_main | IndexError: tuple index out of range
```
* Adds docstrings and inline comments.
* Formats SQL queries using triple quoted strings.
* Minor formatting changes.
* Avoid fetching `event_push_summary_stream_ordering` multiple times
in the same transactions.
Still maintains local in memory lookup optimisation, but does any external
lookup as part of the deferred that prevents duplicate lookups for the same
event at once. This makes the assumption that fetching from an external
cache is a non-zero load operation.
Part of my continuing quest to make the docker images build quicker: copy nginx and redis in from base docker images, rather than apt installing each time.
Signed-off-by: Andrew Doh <andrewddo@gmail.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Andrew Morgan <andrewm@element.io>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
* Improved section regarding server admin
Added steps describing how to elevate an existing user to administrator by manipulating a `postgres` database.
Signed-off-by: jejo86 28619134+jejo86@users.noreply.github.com
* Improved section regarding server admin
* Reference database settings
Add instructions to check database settings to find out the database name, instead of listing all available PostgreSQL databases.
* Add suggestions from PR conversation
Replace config filename `homeserver.yaml`. with "config file".
Remove instructions to switch to `postgres` user.
Add instructions how to connect to SQLite database.
* Update changelog.d/13230.doc
Co-authored-by: reivilibre <olivier@librepush.net>
Previously, `_resolve_state_at_missing_prevs` returned the resolved
state before an event and a partial state flag. These were unwieldy to
carry around would only ever be used to build an event context. Build
the event context directly instead.
Signed-off-by: Sean Quah <seanq@matrix.org>
The `room_id` field represented the parent space for each room
and was made redundant by changes in the API shape where the
`children_state` is now nested underneath each `room`.
The room ID of each child is in the `state_key` field and is still
available.
Avoid blocking on full state in `_resolve_state_at_missing_prevs` and
return a new flag indicating whether the resolved state is partial.
Thread that flag around so that it makes it into the event context.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
When registering a new account via SSO on iOS, the text field becomes pretty annoying as it autocapitalises and autocorrects your input. This PR fixes that (although I have only tested the raw HTML file on the simulator, I'm not sure how to get the complete setup available for testing in the flow).
Previously, TLS could only be used with STARTTLS.
Add a new option `force_tls`, where TLS is used from the start.
Implicit TLS is recommended over STARTLS,
see https://datatracker.ietf.org/doc/html/rfc8314Fixes#8046.
Signed-off-by: Jan Schär <jan@jschaer.ch>
See #10826 and #10786 for context as to why we had to disable pruning on
those caches.
Now that `get_users_who_share_room_with_user` is called frequently only
for presence, we just need to make calls to it less frequent and then we
can remove the various levels of caching that is going on.
When a room has the partial state flag, we may not have an accurate
`m.room.member` event for event senders in the room's current state, and
so cannot perform soft fail checks correctly. Skip the soft fail check
entirely in this case.
As an alternative, we could block until we have full state, but that
would prevent us from receiving incoming events over federation, which
is undesirable.
Signed-off-by: Sean Quah <seanq@matrix.org>
Add another bash script to the contrib directory. It creates multiple stream writers and also prints out the example configuration for homeserver.yaml.
Signed-off-by: Ville Petteri Huh.
Fix race conditions in the async cache invalidation logic, by separating
the async & local invalidation calls and ensuring any async call i
executed first.
Signed off by Nick @ Beeper (@Fizzadar).
More prep work for asyncronous caching, also makes all process_replication_rows methods consistent (presence handler already is so).
Signed off by Nick @ Beeper (@Fizzadar)
Fix https://github.com/matrix-org/synapse/issues/13016
## New error code and status
### Before
Previously, we returned a `404` for `/thumbnail` which isn't even in the spec.
```json
{
"errcode": "M_NOT_FOUND",
"error": "Not found [b'hs1', b'tefQeZhmVxoiBfuFQUKRzJxc']"
}
```
### After
What does the spec say?
> 400: The request does not make sense to the server, or the server cannot thumbnail the content. For example, the client requested non-integer dimensions or asked for negatively-sized images.
>
> *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixmediav3thumbnailservernamemediaid*
Now with this PR, we respond with a `400` when we don't have thumbnails to serve and we explain why we might not have any thumbnails.
```json
{
"errcode": "M_UNKNOWN",
"error": "Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)",
}
```
> Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)
---
We still respond with a 404 in many other places. But we can iterate on those later and maybe keep some in some specific places after spec updates/clarification: https://github.com/matrix-org/matrix-spec/issues/1122
We can also iterate on the bugs where Synapse doesn't thumbnail when it should in other issues/PRs.
`frozendict` 2.3.2 includes a fix for a memory leak in
`frozendict.__hash__`. This likely has no impact outside of the
deprecated `/initialSync` endpoint, which uses `StreamToken`s,
containing `RoomStreamToken`s, containing `frozendict`s, as cache keys.
Signed-off-by: Sean Quah <seanq@matrix.org>
These columns were added back in Synapse 1.52, and have been populated for new
events since then. It's now (beyond) time to back-populate them for existing
events.
There are two fixes here:
1. A long-standing bug where we incorrectly calculated `delta_ids`; and
2. A bug introduced in #13267 where we got current state incorrect.
When building the docker images for complement testing, copy a preinstalled
complement over from a base image, rather than apt installing it. This avoids
network traffic and is much faster.
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.
Signed off by Nick @ Beeper (@Fizzadar)
* Replace `get_new_events_for_appservice` with `get_all_new_events_stream`
The functions were near identical and this brings the AS worker closer
to the way federation senders work which can allow for multiple workers
to handle AS traffic.
* Pull received TS alongside events when processing the stream
This avoids an extra query -per event- when both federation sender
and appservice pusher process events.
There is a corner in `_check_event_auth` (long known as "the weird corner") where, if we get an event with auth_events which don't match those we were expecting, we attempt to resolve the diffence between our state and the remote's with a state resolution.
This isn't specced, and there's general agreement we shouldn't be doing it.
However, it turns out that the faster-joins code was relying on it, so we need to introduce something similar (but rather simpler) for that.
* Drop support for v1 unbind
Signed-off-by: Jacek Kusnierz <jacek.kusnierz@tum.de>
* Add changelog
Signed-off-by: Jacek Kusnierz <jacek.kusnierz@tum.de>
* Update changelog.d/13240.misc
* Admin API request explanation improved
Pointed out, that the Admin API is not accessible by default from any remote computer, but only from the PC `matrix-synapse` is running on.
Added a full, working example, making sure to include the cURL flag `-X`, which needs to be prepended to `GET`, `POST`, `PUT` etc. and listing the full query string including protocol, IP address and port.
* Admin API request explanation improved
* Apply suggestions from code review
Update changelog. Reword prose.
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Drop support for delegating email validation
Delegating email validation to an IS is insecure (since it allows the owner of
the IS to do a password reset on your HS), and has long been deprecated. It
will now cause a config error at startup.
* Update unit test which checks for email verification
Give it an `email` config instead of a threepid delegate
* Remove unused method `requestEmailToken`
* Simplify config handling for email verification
Rather than an enum and a boolean, all we need here is a single bool, which
says whether we are or are not doing email verification.
* update docs
* changelog
* upgrade.md: fix typo
* update version number
this will be in 1.64, not 1.63
* update version number
this one too
This gets rid of another usage of get_appservice_by_req, with all the benefits, including correctly tracking the appservice IP and setting the tracing attributes correctly.
Signed-off-by: Quentin Gliech <quenting@element.io>
Inspired by the room batch handler, this uses previous event inserts to
pre-populate prev events during room creation, reducing the number of
queries required to create a room.
Signed off by Nick @ Beeper (@Fizzadar)
Complement tests: https://github.com/matrix-org/complement/pull/405
This happens when you have some messages imported before the room is created.
Then use MSC3030 to look backwards before the room creation from a remote
federated server. The server won't find anything locally, but will ask over
federation which will have the remote event. The previous logic would
choke on not having the local event assigned.
```
Failed to fetch /timestamp_to_event from hs2 because of exception(UnboundLocalError) local variable 'local_event' referenced before assignment args=("local variable 'local_event' referenced before assignment",)
```
All tests are prefixed with `STALE_` and therefore they are silently
skipped. They were moved to `STALE_` in version `v0.5.0` in commit
2fcce3b3c5 - `Remove stale tests`.
Tests from `RoomEventsStoreTestCase` class are not used for last 8
years, I believe the best would be to remove them entirely.
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Bounce recalculation of current state to the correct event persister and
move recalculation of current state into the event persistence queue, to
avoid concurrent updates to a room's current state.
Also give recalculation of a room's current state a real stream
ordering.
Signed-off-by: Sean Quah <seanq@matrix.org>
Method `_get_state_map_for_room` seems to break in presence of some ill-formed events in the database. Reimplementing this method to use `get_current_state`, which is more robust to such events.
This happened if we encountered a stream ordering in `event_push_actions` that had more rows than the batch size of the delete, as If we don't delete any rows in an iteration then the next time round we get the exact same stream ordering and get stuck.
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.
We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.
To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.
All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.
On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.
The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.
We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.
`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.
`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.
Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Cast to postgres types when handling postgres db
* Remove unused method
* Easy annotations
* Annotate create_room
* Use `ParamSpec` to annotate looping_call
* Annotate `default_config`
* Track `now` as a float
`time_ms` returns an int like the proper Synapse `Clock`
* Introduce a `Timer` dataclass
* Introduce a Looper type
* Suppress checking of a mock
* tests.utils is typed
* Changelog
* Whoops, import ParamSpec from typing_extensions
* ditch the psycopg2 casts
==============================
Bugfixes
--------
- Fix unread counts for users on large servers. Introduced in v1.62.0rc1. ([\#13140](https://github.com/matrix-org/synapse/issues/13140))
- Fix DB performance when deleting old push notifications. Introduced in v1.62.0rc1. ([\#13141](https://github.com/matrix-org/synapse/issues/13141))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAmK+0FYACgkQiISIDS7+
X/RE7w//RJTQD+9rqanBj9IxE07Vy6nbMxRoxMbhdj9DMidepxNalxg9MqkhZXJC
AJquNIVhgMLs7HDO0KAu6xePEjr/E2/kmUJZwZX6cgeh8/Yp2NcSpWp4kq+gc2IJ
vtc5GZ5tgyvQ8UJB6xozL62g3aDcCtXppRJKDx+OkBgWWrczHg+zGi6XkLfsL86L
0yBgyp57cLRIyZ97isvAH7BEYF/vSVVvpjzA/m61LdftfMalTqgpUUt89rqWmbyw
iqcaspRTukHXpeCwFyvrpZviMl82wVTN3K2rUZAGisS6Xuht7K30I7uszIPx3gd/
S47fav2onOpAwSHni2IbZa+cLsz7vEaNFIj21/QS+wZvkFpbvf17m8AIxb7y31i4
cBKJC2qRyVFlQyGYNi3yZ4V1jY2nLV+/lC9Z1epYH5EmXAKSxHvxNL609QLaMovA
OPv/wgPESQnvxpHc9qqzGh9LJ8O+YaaPY4t2MB4Kf5CkIf25yOuDjvzUcJKiM3G8
ytwEdOipw80WozyfUuivz5F5skJ3ay8OvUeL/AxlA0k+2Q5nThyLO7LODA5pZSlO
vHLvIhp8FDdoFLwRtBiGfC42cvjVEsjiYD46M7uYQCGVVdu1/UegUoaA0beFwbHI
wFyFOTGip63nwAtsSAgqkq/yyBigHJsb9Z37X+sMRweY13337nM=
=PH+y
-----END PGP SIGNATURE-----
Merge tag 'v1.62.0rc2' into develop
Synapse 1.62.0rc2 (2022-07-01)
==============================
Bugfixes
--------
- Fix unread counts for users on large servers. Introduced in v1.62.0rc1. ([\#13140](https://github.com/matrix-org/synapse/issues/13140))
- Fix DB performance when deleting old push notifications. Introduced in v1.62.0rc1. ([\#13141](https://github.com/matrix-org/synapse/issues/13141))
When we receive an event over federation during a faster join, there is no need
to wait for full state, since we have a whole reconciliation process designed
to take the partial state into account.
* Extend the auth rule checks for `m.room.create` events
... and move them up to the top of the function. Since the no auth_events are
allowed for m.room.create events, we may as well get the m.room.create event
checks out of the way first.
* Add a test for create events with prev_events
When we fail to persist a federation event, we kick off a task to remove
its push actions in the background, using the current logging context.
Since we don't `await` that task, we may finish our logging context
before the task finishes. There's no reason to not `await` the task, so
let's do that.
Signed-off-by: Sean Quah <seanq@matrix.org>
Pull out `twitter:` meta tags when generating a preview and
use it to augment any `og:` meta tags.
Prefers Open Graph information over Twitter card information.
* Add auth events to events used in tests
* Move some event auth checks out to a different method
Some of the event auth checks apply to an event's auth_events, rather than the
state at the event - which means they can play no part in state
resolution. Move them out to a separate method.
* Rename check_auth_rules_for_event
Now it only checks the state-dependent auth rules, it needs a better name.
Fixes#11887 hopefully.
The core change here is that `event_push_summary` now holds a summary of counts up until a much more recent point, meaning that the range of rows we need to count in `event_push_actions` is much smaller.
This needs two major changes:
1. When we get a receipt we need to recalculate `event_push_summary` rather than just delete it
2. The logic for deleting `event_push_actions` is now divorced from calculating `event_push_summary`.
In future it would be good to calculate `event_push_summary` while we persist a new event (it should just be a case of adding one to the relevant rows in `event_push_summary`), as that will further simplify the get counts logic and remove the need for us to periodically update `event_push_summary` in a background job.
* Remove redundant references to `event_edges.room_id`
We don't need to care about the room_id here, because we are already checking
the event id.
* Clean up the event_edges table
We make a number of changes to `event_edges`:
* We give the `room_id` and `is_state` columns defaults (null and false
respectively) so that we can stop populating them.
* We drop any rows that have `is_state` set true - they should no longer
exist.
* We drop any rows that do not exist in `events` - these should not exist
either.
* We drop the old unique constraint on all the colums, which wasn't much use.
* We create a new unique index on `(event_id, prev_event_id)`.
* We add a foreign key constraint to `events`.
These happen rather differently depending on whether we are on Postgres or
SQLite. For SQLite, we just rebuild the whole table, copying only the rows we
want to keep. For Postgres, we try to do things in the background as much as
possible.
* Stop populating `event_edges.room_id` and `is_state`
We can just rely on the defaults.
* Rename test_fedclient to match its source file
* Require at least one destination to be truthy
* Explicitly validate user ID in profile endpoint GETs
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This simplifies the access token verification logic by removing the `rights`
parameter which was only ever used for the unsubscribe link in email
notifications. The latter has been moved under the `/_synapse` namespace,
since it is not a standard API.
This also makes the email verification link more secure, by embedding the
app_id and pushkey in the macaroon and verifying it. This prevents the user
from tampering the query parameters of that unsubscribe link.
Macaroon generation is refactored:
- Centralised all macaroon generation and verification logic to the
`MacaroonGenerator`
- Moved to `synapse.utils`
- Changed the constructor to require only a `Clock`, hostname, and a secret key
(instead of a full `Homeserver`).
- Added tests for all methods.
The `room_id` field was removed from MSC2946 before
it was accepted. It was initially kept for backwards compatibility
and should be removed now that the stable form of the API
is used.
This change only stops Synapse from validating that it is returned,
a future PR will remove returning it as part of the response.
By always using delete_devices and sometimes passing a list
with a single device ID.
Previously these methods had gotten out of sync with each
other and it seems there's little benefit to the single-device
variant.
* Update worker docs to remove group endpoints.
* Removes an unused parameter to `ApplicationService`.
* Break dependency between media repo and groups.
* Avoid copying `m.room.related_groups` state events during room upgrades.
Currently, we try to pull the event corresponding to a sync token from the database. However, when
we fetch redaction events, we check the target of that redaction (because we aren't allowed to send
redactions to clients without validating them). So, if the sync token points to a redaction of an event
that we don't have, we have a problem.
It turns out we don't really need that event, and can just work with its ID and metadata, which
sidesteps the whole problem.
* Raise a dedicated `InvalidEventSignatureError` from `_check_sigs_on_pdu`
* Downgrade logging about redactions to DEBUG
this can be very spammy during a room join, and it's not very useful.
* Raise `InvalidEventSignatureError` from `_check_sigs_and_hash`
... and, more importantly, move the logging out to the callers.
* changelog
* Fix room deletion
ae7858f broke room deletion by attempting to delete the entry from `rooms`
before the tables that reference it.
* faster_joins: remove database rows on purge
* Refactor HTTP response size limits
Rather than passing a separate `max_response_size` down the stack, make it an
attribute of the `parser`.
* Allow bigger responses on `federation/v1/state`
`/state` can return huge responses, so we need to handle that.
Makes it so that groups/communities no longer exist from a user-POV. E.g. we remove:
* All API endpoints (including Client-Server, Server-Server, and admin).
* Documented configuration options (and the experimental flag, which is now unused).
* Special handling during room upgrades.
* The `groups` section of the `/sync` response.