... when workers are unreachable, etc.
Fixes https://github.com/element-hq/synapse/issues/17117.
The general principle is just to make sure that we propagate any
exceptions to the JsonResource, so that we return an error code to the
sending server. That means that the sending server no longer considers
the message safely sent, so it will retry later.
In the issue, Erik mentions that an alternative solution would be to
persist the to-device messages into a table so that they can be retried.
This might be an improvement for performance, but even if we did that,
we still need this mechanism, since we might be unable to reach the
database. So, if we want to do that, it can be a later follow-up.
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
PR #16942 removed an invalid optimisation that avoided pulling out state
for non-gappy syncs. This causes a large increase in DB usage. c.f.
#16941 for why that optimisation was wrong.
However, we can still optimise in the simple case where the events in
the timeline are a linear chain without any branching/merging of the
DAG.
cc. @richvdh
This PR fixes a very, very niche edge-case, but I've got some more work
coming which will otherwise make the problem worse.
The bug happens when the syncing user leaves a room, and has a sync
filter which includes "left" rooms, but sets the timeline limit to 0. In
that case, the state returned in the `state` section is calculated
incorrectly.
The fix is to pass a token corresponding to the point that the user
leaves the room through to `compute_state_delta`.
When a lot of locks are waiting for a single lock, notifying all locks
independently with `call_later` on each release is really costly and
incurs some kind of async contention, where the CPU is spinning a lot
for not much.
The included test is taking around 30s before the change, and 0.5s
after.
It was found following failing tests with
https://github.com/element-hq/synapse/pull/16827.
List of users not to send out device list updates for when they register
new devices. This is useful to handle bot accounts.
This is undocumented as its mostly a hack to test on matrix.org.
Note: This will still send out device list updates if the device is
later updated, e.g. end to end keys are added.
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
Prior to this PR, if a request to create a public (public as in
published to the rooms directory) room violated the room list
publication rules set in the
[config](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#room_list_publication_rules),
the request to create the room was denied and the room was not created.
This PR changes the behavior such that when a request to create a room
published to the directory violates room list publication rules, the
room is still created but the room is not published to the directory.
Sometimes we fail to fetch events during backfill due to missing state,
and we often end up querying the same bad events periodically (as people
backpaginate). In such cases its likely we will continue to fail to get
the state, and therefore we should try *before* loading the state that
we have from the DB (as otherwise it's wasted DB and memory).
---------
Co-authored-by: reivilibre <oliverw@matrix.org>
There are two changes here:
1. Only pull out the required state when handling the request.
2. Change the get filtered state return type to check that we're only
querying state that was requested
---------
Co-authored-by: reivilibre <oliverw@matrix.org>
Instead of persisting outliers in a bunch of batches, let's just do them
all at once.
This is fine because all `_auth_and_persist_outliers_inner` is doing is
checking the auth rules for each event, which requires the events to be
topologically sorted by the auth graph.
The idea here being that the directory server shouldn't advertise rooms
to a requesting server is the requesting server would not be allowed to
join or participate in the room.
<!--
Fixes: # <!-- -->
<!--
Supersedes: # <!-- -->
<!--
Follows: # <!-- -->
<!--
Part of: # <!-- -->
Base: `develop` <!-- git-stack-base-branch:develop -->
<!--
This pull request is commit-by-commit review friendly. <!-- -->
<!--
This pull request is intended for commit-by-commit review. <!-- -->
Original commit schedule, with full messages:
<ol>
<li>
Pass `from_federation_origin` down into room list retrieval code
</li>
<li>
Don't cache /publicRooms response for inbound federated requests
</li>
<li>
fixup! Don't cache /publicRooms response for inbound federated requests
</li>
<li>
Cap the number of /publicRooms entries to 100
</li>
<li>
Simplify code now that you can't request unlimited rooms
</li>
<li>
Filter out rooms from federated requests that don't have the correct ACL
</li>
<li>
Request a handful more when filtering ACLs so that we can try to avoid
shortchanging the requester
</li>
</ol>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
We remove these fields as they're just duplicating data the event
already stores, and (for reasons 🤫) I'd like to simplify
the class to only store simple types.
I'm not entirely convinced that we shouldn't instead add helper methods
to the event class to generate stream tokens, but I don't really think
that's where they belong either
This is an extra response parameter just added to MSC3981. In the
current impl, the recursion depth is always 3, so this just returns a
static 3 if the recurse parameter is supplied.
Keeping track of a lower bound of stream ID where we've deleted everything below makes the queries much faster. Otherwise, every time we scan for rows to delete we'd re-scan across all the rows that have previously deleted (until the next table VACUUM).
This is mostly useful for federated rooms where some users
would get stuck in the invite or knock state when the room
was purged from their homeserver.
This adds a module API which allows a module to update a user's
presence state/status message. This is useful for controlling presence
from an external system.
To fully control presence from the module the presence.enabled config
parameter gains a new state of "untracked" which disables internal tracking
of presence changes via user actions, etc. Only updates from the module will
be persisted and sent down sync properly).
This splits thinsg into two queries, but most of the time we won't have
new event backwards extremities so this shouldn't actually add an extra
RTT for the majority of cases.
Note this removes the check for events with no prev events, but that was
part of MSC2716 work that has since been removed.
Synapse was incorrectly implemented with a knock_state_events
property on some APIs (instead of knock_room_state). This was
correct in Synapse 1.70.0, but *both* fields were sent to also be
compatible with Synapse versions expecting the wrong field.
Enough time has passed that only the correct field needs to be
included/handled.
Co-authored-by: David Robertson <davidr@element.io>
Co-authored-by: Patrick Cloke <patrickc@matrix.org>
Co-authored-by: Erik Johnston <erik@matrix.org>
Assert that the return type of callables wrapped in @cached
and @cachedList are cachable (aka immutable).
Also add restore of purge/shutdown rooms after a synapse restart.
Co-authored-by: Eric Eastwood <erice@matrix.org>
Co-authored-by: Erik Johnston <erikj@matrix.org>
Refresh tokens were not correctly moved to the rehydrated
device (similar to how the access token is currently handled).
This resulted in invalid refresh tokens after rehydration.
Adds both the List-Unsubscribe (RFC2369) and List-Unsubscribe-Post (RFC8058)
headers to push notification emails, which together should:
* Show an "Unsubscribe" link in the MUA UI when viewing Synapse notification emails.
* Enable "one-click" unsubscribe (the user never leaves their MUA, which automatically
makes a POST request to the specified endpoint).
* Allow user_id to be optional for room deletion
* Add module API method to delete a room
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Don't worry about the case block=True && requester_user_id is None
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Add a (long) timeout to when a "busy" device is considered not online.
This does *not* match MSC3026, but is a reasonable thing for an
implementation to do.
Expands tests for the (unstable) busy presence with multiple devices.
Tracks presence on an individual per-device basis and combine
the per-device state into a per-user state. This should help in
situations where a user has multiple devices with conflicting status
(e.g. one is syncing with unavailable and one is syncing with online).
The tie-breaking is done by priority:
BUSY > ONLINE > UNAVAILABLE > OFFLINE
Refactoring to use both the user ID & the device ID when tracking
the currently syncing users in the presence handler.
This is done both locally and over replication. Note that the device
ID is discarded but will be used in a future change.
Refactoring to pass the device ID (in addition to the user ID) through
the presence handler (specifically the `user_syncing`, `set_state`,
and `bump_presence_active_time` methods and their replication
versions).
Simplify some of the presence code by reducing duplicated code between
worker & non-worker modes.
The main change is to push some of the logic from `user_syncing` into
`set_state`. This is done by passing whether the user is setting the presence
via a `/sync` with a new `is_sync` flag to `set_state`. If this is `true` some
additional logic is performed:
* Don't override `busy` presence.
* Update the `last_user_sync_ts`.
* Never update the status message.
Misc. clean-ups to:
* Use keyword arguments.
* Return early (reducing indentation) of some functions.
* Removing duplicated / unused code.
* Use wrap_as_background_process.
For now this maintains compatible with old Synapses by falling back
to using transaction semantics on a per-access token. A future version
of Synapse will drop support for this.
Signed-off-by: Nicolas Werner <n.werner@famedly.com>
Co-authored-by: Nicolas Werner <n.werner@famedly.com>
Co-authored-by: Nicolas Werner <89468146+nico-famedly@users.noreply.github.com>
Co-authored-by: Hubert Chathi <hubert@uhoreg.ca>
And fix a bug in the implementation of the updated redaction
format (MSC2174) where the top-level redacts field was not
properly added for backwards-compatibility.
**Before:**
```
Error retrieving alias
```
**After:**
```
Error retrieving alias #foo:bar -> 401 Unauthorized
```
*Spawning from creating the [manual testing strategy for the outbound federation proxy](https://github.com/matrix-org/synapse/pull/15773).*
We now only block the client to backfill when we see a large gap in the events (more than 2 events missing in a row according to `depth`), more than 3 single-event holes, or not enough messages to fill the response. Otherwise, we return the messages directly to the client and backfill in the background for eventual consistency sake.
Fix https://github.com/matrix-org/synapse/issues/15696
* Check required power levels earlier in createRoom handler.
- If a server was configured to reject the creation of rooms with E2EE
enabled (by specifying an unattainably high power level for
"m.room.encryption" in default_power_level_content_override), the 403
error was not being triggered until after the room was created and
before the "m.room.power_levels" was sent. This allowed a user to
access the partially-configured room and complete the setup of E2EE
and power levels manually.
- This change causes the power level overrides to be checked earlier and
the request to be rejected before the user gains access to the room.
- A new `_validate_room_config` method is added to contain checks that
should be run before a room is created.
- The new test case confirms that a user request is rejected by the new
validation method.
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
* Add a changelog file.
* Formatting fix for black.
* Remove unneeded line from test.
---------
Signed-off-by: Grant McLean <grant@catalyst.net.nz>