Fixes https://github.com/matrix-org/synapse/issues/2431
Adds config option `encryption_enabled_by_default_for_room_type`, which determines whether encryption should be enabled with the default encryption algorithm in private or public rooms upon creation. Whether the room is private or public is decided based upon the room creation preset that is used.
Part of this PR is also pulling out all of the individual instances of `m.megolm.v1.aes-sha2` into a constant variable to eliminate typos ala https://github.com/matrix-org/synapse/pull/7637
Based on #7637
While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:
* Rather undocumented, and
* causing a *lot* of config checks
This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation.
Best to be reviewed commit-by-commit.
* Expose `return_html_error`, and allow it to take a Jinja2 template instead of a raw string
* Clean up exception handling in SAML2ResponseResource
* use the existing code in `return_html_error` instead of re-implementing it
(giving it a jinja2 template rather than inventing a new form of template)
* do the exception-catching in the REST layer rather than in the handler
layer, to make sure we catch all exceptions.
It looks like `user_device_resync` was ignoring cross-signing keys from the results received from the remote server. This patch fixes this, by processing these keys using the same process `_handle_signing_key_updates` does (and effectively factor that part out of that function).
Without this patch, if an error happens which isn't caught by `user_device_resync`, then `_maybe_retry_device_resync` would fail, without retrying the next users in the iteration. This patch fixes this so that it now only logs an error in this case.
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).
Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.
People probably want to look at this commit by commit.
Instead of doing a complicated dance of deleting and moving aliases one
by one, which sends a canonical alias update into the old room for each
one, lets do it all in one go.
This also changes the function to move *all* local alias events to the new
room, however that happens later on anyway.
When a call to `user_device_resync` fails, we don't currently mark the remote user's device list as out of sync, nor do we retry to sync it.
https://github.com/matrix-org/synapse/pull/6776 introduced some code infrastructure to mark device lists as stale/out of sync.
This commit uses that code infrastructure to mark device lists as out of sync if processing an incoming device list update makes the device handler realise that the device list is out of sync, but we can't resync right now.
It also adds a looping call to retry all failed resync every 30s. This shouldn't cause too much spam in the logs as this commit also removes the "Failed to handle device list update for..." warning logs when catching `NotRetryingDestination`.
Fixes#7418
Bugfixes:
- Hash passwords as early as possible during registration. #7523
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl7CpGYACgkQM/xY9qcR
MEhVixAAk2hDWVXxbGzUk2LmfiIsFA2eV55sw+VqEw0eRfe1d/mP6aH75VmTt3pw
IymZUVxDXdbTnPNPw+ldyGhzu9C6JJjXnNRBZnIkR5vcSbWsV0mPl/qHFu/4FnZI
m4Nj1Sx3sG0CyNDpWjVrzTW6SbDX9J68DXbLwnNTSX3KPa7gNn6TUmFfKzlrNI23
pPmD+EITYMn/H9HOhxhTzq//Ja7UOViAKQ0q4N2I4GxmLP6ufx9P3s5FG/oJqA+H
Pka2+9JnfHq2Ze22CoDcg8q5f5MgVkQzGeir0ZsGJwJqOYjeTmbCvD3T/RYWO5g+
ZghON3tsMQmdzUQqGRxcn/YLOZY9ZqrX2kBf5E6Wapwj9MfKg2ToLZM4yrWN0+RX
KDuWaKXYtkSQCo1nDS2KooVMWjGNZautWWnHzZ0KNQCIkxVpGC234JYI685grKXb
dg7R41kdXI7NJzqS4iM1fxXoLx64fpoREa/pbLF6VeLaYXBlzMjfhiIx2pQBN3L/
q/y3ftev9VCp+2wPxiKUkiC4Sh7dgWUzNuyHU+4lsPUbI1H/MN5dN2ryObdEGWc/
5YU3tv2MTQJ7jECHR+/fastnG+5d2kVm/FK+zVhG17JvA2VmDaLnSde+mzGbsO1N
gIUx5VrTEP7y0tC8C/VgbS3c2KqCSOopqd3j2slLLrtQlXM71VE=
=lpDI
-----END PGP SIGNATURE-----
Merge tag 'v1.13.0rc3' into develop
Synapse 1.13.0rc3 (2020-05-18)
Bugfixes:
- Hash passwords as early as possible during registration. #7523
==============================
Bugfixes
--------
- Fix a long-standing bug which could cause messages not to be sent over federation, when state events with state keys matching user IDs (such as custom user statuses) were received. ([\#7376](https://github.com/matrix-org/synapse/issues/7376))
- Restore compatibility with non-compliant clients during the user interactive authentication process, fixing a problem introduced in v1.13.0rc1. ([\#7483](https://github.com/matrix-org/synapse/issues/7483))
Internal Changes
----------------
- Fix linting errors in new version of Flake8. ([\#7470](https://github.com/matrix-org/synapse/issues/7470))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl69IQ8ACgkQOSor00I9
eP87lAf8DK+v6cs2U0BoD5opzQ7ZazJT6JYTmnMBaTzHU6Wx20V2ttkF7Vpwm3WU
Zsz0048tdYtHFyYBQ1kF5RNIBBJwV8SA/QUcPkR7FVpwZMLR2q4aJn0EE7kC9OMf
tYsmdbHeBdyfLXpXzazxWlgHquLyEIt52ykAcCphjx/Jl2fAExFEhtfsxpECoJ2f
8Dqhjg3WFjd6QWU6AFkElbwHUYCdIWdJOcsC8N1p8OvBmDz5QXv/RlYipHE00Cpx
QQQOgEjdRc6dlz2mbetMklnfII3p2kO9bzNdmEpOzT0Zt7nFaGdntW4I1QA0yJfa
gows9bYMzhqYk7YSiyTYOZ4qyavVtw==
=N/zZ
-----END PGP SIGNATURE-----
Merge tag 'v1.13.0rc2' into develop
Synapse 1.13.0rc2 (2020-05-14)
==============================
Bugfixes
--------
- Fix a long-standing bug which could cause messages not to be sent over federation, when state events with state keys matching user IDs (such as custom user statuses) were received. ([\#7376](https://github.com/matrix-org/synapse/issues/7376))
- Restore compatibility with non-compliant clients during the user interactive authentication process, fixing a problem introduced in v1.13.0rc1. ([\#7483](https://github.com/matrix-org/synapse/issues/7483))
Internal Changes
----------------
- Fix linting errors in new version of Flake8. ([\#7470](https://github.com/matrix-org/synapse/issues/7470))
* release-v1.13.0:
Don't UPGRADE database rows
RST indenting
Put rollback instructions in upgrade notes
Fix changelog typo
Oh yeah, RST
Absolute URL it is then
Fix upgrade notes link
Provide summary of upgrade issues in changelog. Fix )
Move next version notes from changelog to upgrade notes
Changelog fixes
1.13.0rc1
Documentation on setting up redis (#7446)
Rework UI Auth session validation for registration (#7455)
Fix errors from malformed log line (#7454)
Drop support for redis.dbid (#7450)
By persisting the user interactive authentication sessions to the database, this fixes
situations where a user hits different works throughout their auth session and also
allows sessions to persist through restarts of Synapse.
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
Add changelog
Save retrieved keys to the db
lint
Fix and de-brittle remote result dict processing
Use query_user_devices instead, assume only master, self_signing key types
Make changelog more useful
Remove very specific exception handling
Wrap get_verify_key_from_cross_signing_key in a try/except
Note that _get_e2e_cross_signing_verify_key can raise a SynapseError
lint
Add comment explaining why this is useful
Only fetch master and self_signing key types
Fix log statements, docstrings
Remove extraneous items from remote query try/except
lint
Factor key retrieval out into a separate function
Send device updates, modeled after SigningKeyEduUpdater._handle_signing_key_updates
Update method docstring
* Pull Sentinel out of LoggingContext
... and drop a few unnecessary references to it
* Factor out LoggingContext.current_context
move `current_context` and `set_context` out to top-level functions.
Mostly this means that I can more easily trace what's actually referring to
LoggingContext, but I think it's generally neater.
* move copy-to-parent into `stop`
this really just makes `start` and `stop` more symetric. It also means that it
behaves correctly if you manually `set_log_context` rather than using the
context manager.
* Replace `LoggingContext.alive` with `finished`
Turn `alive` into `finished` and make it a bit better defined.
If an error happened while processing a SAML AuthN response, or a client
ends up doing a `GET` request to `/authn_response`, then render a
customisable error page rather than a confusing error.
When we get an invite over federation, store the room version in the rooms table.
The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
This is intended as a precursor to storing room versions when we receive an
invite over federation, but has the happy side-effect of fixing #3374 at last.
In short: change the store_room with try/except to a proper upsert which
updates the right columns.
... and set it everywhere it's called.
while we're here, rename it for consistency with `check_user_in_room` (and to
help check that I haven't missed any instances)
* Reject device display names that are too long.
Too long is currently defined as 100 characters in length.
* Add a regression test for rejecting a too long device display name.
A lot of the things we log at INFO are now a bit superfluous, so lets
make them DEBUG logs to reduce the amount we log by default.
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Brendan Abolivier <github@brendanabolivier.com>
==============================
Bugfixes
--------
- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848))
Internal Changes
----------------
- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEumuwyPtYLL2OMhYdOtoG7cdT0R4FAl478l8QHGVyaWtAbWF0
cml4Lm9yZwAKCRA62gbtx1PRHmKfD/0ZpBV95gMk0iecfvO6IY8+0KigzDUypgXf
zp1k+k5DBUmH5/83h7+cXWHIJyofv1At1YdIq9J/nNeyqr3V3dcQwPabOKYbI96d
kVD0EZX+2NjuYejAuF9eJNdiwRq5yMLluCfnOXr5jCS0NNdpO1xVb4bWYm50RmcD
WJvBvmfwXwCC3AFyNb4IAipaVUXuVXXx1YhjFs6DsbtpPUG88e4uIhpfvrS4v0t8
GFJZNuVJStOvyKHHTfNRIMkhy4xidj3HRUMSmjNpx07WnPBnbv4LnUOD1boYxgaP
EoODYnoAPshdiKB0AywNNjue3TmFD/Z7vVEzlFP/lNZ4GDU4kN9C/EwFYw0C2Jqq
f9O/E2ZuP9Qqume5O9UwMQQAmhV5lMBaIsYRbixU+9bSB893zRHRffA11HypzD00
ZXj8QDOXpiXp2jpAwN1Rk9e/aZEX3qd+zUAgRmk+kYb0KTC2/gcY956rodNSSO/U
RFYg4DFvoCgCrRSxZ4LQMlFu3YY2E7qWH+p/OHk3WG79jW64VpEFaytvv8fR+GKq
g3EbtWy6mUzlKYqbjZ+ZTUDe5AWdv6ZX8xJqRD/S6cpiwyh6Gp89HHNvBThXoCmK
fxkgw8if7eIITuTlNuDrqunxyWqwd3oVlzd2mi2bg0yRfcqJ9C6OuBV1VTLFZeky
3sjCiU0IRw==
=kcSy
-----END PGP SIGNATURE-----
Merge tag 'v1.10.0rc2' into develop
Synapse 1.10.0rc2 (2020-02-06)
==============================
Bugfixes
--------
- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848))
Internal Changes
----------------
- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
We were looking at the wrong event type (`m.room.encryption` vs
`m.room.encrypted`).
Also fixup the duplicate `EvenTypes` entries.
Introduced in #6776.
These are easier to work with than the strings and we normally have one around.
This fixes `FederationHander._persist_auth_tree` which was passing a
RoomVersion object into event_auth.check instead of a string.
Turns out that figuring out a remote user id for the SAML user isn't quite as obvious as it seems. Factor it out to the SamlMappingProvider so that it's easy to control.
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
When figuring out which topological token to start a purge job at, we
need to do the following:
1. Figure out a timestamp before which events will be purged
2. Select the first stream ordering after that timestamp
3. Select info about the first event after that stream ordering
4. Build a topological token from that info
In some situations (e.g. quiet rooms with a short max_lifetime), there
might not be an event after the stream ordering at step 3, therefore we
abort the purge with the error `No event found`. To mitigate that, this
patch fetches the first event _before_ the stream ordering, instead of
after.
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
When we request the state/auth_events to populate a backwards extremity (on
backfill or in the case of missing events in a transaction push), we should
check that the returned events are in the right room rather than blindly using
them in the room state or auth chain.
Given that _get_events_from_store_or_dest takes a room_id, it seems clear that
it should be sanity-checking the room_id of the requested events, so let's do
it there.
Make it return the state *after* the requested event, rather than the one
before it. This is a bit easier and requires fewer calls to
get_events_from_store_or_dest.
This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
Sometimes the filtering function can return a pruned version of an event (on top of either the event itself or an empty list), if it thinks the user should be able to see that there's an event there but not the content of that event. Therefore, the previous logic of 'if filtered is empty then we can use the event we retrieved from the database' is flawed, and we should use the event returned by the filtering function.
When we request the state/auth_events to populate a backwards extremity (on
backfill or in the case of missing events in a transaction push), we should
check that the returned events are in the right room rather than blindly using
them in the room state or auth chain.
Given that _get_events_from_store_or_dest takes a room_id, it seems clear that
it should be sanity-checking the room_id of the requested events, so let's do
it there.
Make it return the state *after* the requested event, rather than the one
before it. This is a bit easier and requires fewer calls to
get_events_from_store_or_dest.
PaginationHandler.get_messages is only called by RoomMessageListRestServlet,
which is async.
Chase the code path down from there:
- FederationHandler.maybe_backfill (and nested try_backfill)
- FederationHandler.backfill
have_events was a map from event_id to rejection reason (or None) for events
which are in our local database. It was used as filter on the list of
event_ids being passed into get_events_as_list. However, since
get_events_as_list will ignore any event_ids that are unknown or rejected, we
can equivalently just leave it to get_events_as_list to do the filtering.
That means that we don't have to keep `have_events` up-to-date, and can use
`have_seen_events` instead of `get_seen_events_with_rejection` in the one place
we do need it.
Implement part [MSC2228](https://github.com/matrix-org/matrix-doc/pull/2228). The parts that differ are:
* the feature is hidden behind a configuration flag (`enable_ephemeral_messages`)
* self-destruction doesn't happen for state events
* only implement support for the `m.self_destruct_after` field (not the `m.self_destruct` one)
* doesn't send synthetic redactions to clients because for this specific case we consider the clients to be able to destroy an event themselves, instead we just censor it (by pruning its JSON) in the database
Purge jobs don't delete the latest event in a room in order to keep the forward extremity and not break the room. On the other hand, get_state_events, when given an at_token argument calls filter_events_for_client to know if the user can see the event that matches that (sync) token. That function uses the retention policies of the events it's given to filter out those that are too old from a client's view.
Some clients, such as Riot, when loading a room, request the list of members for the latest sync token it knows about, and get confused to the point of refusing to send any message if the server tells it that it can't get that information. This can happen very easily with the message retention feature turned on and a room with low activity so that the last event sent becomes too old according to the room's retention policy.
An easy and clean fix for that issue is to discard the room's retention policies when retrieving state.
We were doing this in a number of places which meant that some login
code paths incremented the counter multiple times.
It was also applying ratelimiting to UIA endpoints, which was probably
not intentional.
In particular, some custom auth modules were calling
`check_user_exists`, which incremented the counters, meaning that people
would fail to login sometimes.
Fixes a bug where rejected events were persisted with the wrong state group.
Also fixes an occasional internal-server-error when receiving events over
federation which are rejected and (possibly because they are
backwards-extremities) have no prev_group.
Fixes#6289.
* Raise an exception if accessing state for rejected events
Add some sanity checks on accessing state_group etc for
rejected events.
* Skip calculating push actions for rejected events
It didn't actually cause any bugs, because rejected events get filtered out at
various later points, but there's not point in trying to calculate the push
actions for a rejected event.
When the `/keys/query` API is hit on client_reader worker Synapse may
decide that it needs to resync some remote deivces. Usually this happens
on master, and then gets cached. However, that fails on workers and so
it falls back to fetching devices from remotes directly, which may in
turn fail if the remote is down.
While the current version of the spec doesn't say much about how this endpoint uses filters (see https://github.com/matrix-org/matrix-doc/issues/2338), the current implementation is that some fields of an EventFilter apply (the ones that are used when running the SQL query) and others don't (the ones that are used by the filter itself) because we don't call event_filter.filter(...). This seems counter-intuitive and probably not what we want so this commit fixes it.
The intention here is to make it clearer which fields we can expect to be
populated when: notably, that the _event_type etc aren't used for the
synchronous impl of EventContext.
The `http_proxy` and `HTTPS_PROXY` env vars can be set to a `host[:port]` value which should point to a proxy.
The address of the proxy should be excluded from IP blacklists such as the `url_preview_ip_range_blacklist`.
The proxy will then be used for
* push
* url previews
* phone-home stats
* recaptcha validation
* CAS auth validation
It will *not* be used for:
* Application Services
* Identity servers
* Outbound federation
* In worker configurations, connections from workers to masters
Fixes#4198.
* Fix presence timeouts when synchrotron restarts.
Handling timeouts would fail if there was an external process that had
timed out, e.g. a synchrotron restarting. This was due to a couple of
variable name typoes.
Fixes#3715.
Hopefully this will fix the occasional failures we were seeing in the room directory.
The problem was that events are not necessarily persisted (and `current_state_delta_stream` updated) in the same order as their stream_id. So for instance current_state_delta 9 might be persisted *before* current_state_delta 8. Then, when the room stats saw stream_id 9, it assumed it had done everything up to 9, and never came back to do stream_id 8.
We can solve this easily by only processing up to the stream_id where we know all events have been persisted.
It turns out that _local_membership_update doesn't run when you join a new, remote room. It only runs if you're joining a room that your server already knows about. This would explain #4703 and #5295 and why the transfer would work in testing and some rooms, but not others. This would especially hit single-user homeservers.
The check has been moved to right after the room has been joined, and works much more reliably. (Though it may still be a bit awkward of a place).
We incorrectly used `room_id` as to bound the result set, even though we
order by `joined_members, room_id`, leading to incorrect results after
pagination.
Copy push rules during a room upgrade from the old room to the new room, instead of deleting them from the old room.
For instance, we've defined upgrading of a room multiple times to be possible, and push rules won't be transferred on the second upgrade if they're deleted during the first.
Also fix some missing yields that probably broke things quite a bit.
While this is not documented in the spec (but should be), Riot (and other clients) revoke 3PID invites by sending a m.room.third_party_invite event with an empty ({}) content to the room's state.
When the invited 3PID gets associated with a MXID, the identity server (which doesn't know about revocations) sends down to the MXID's homeserver all of the undelivered invites it has for this 3PID. The homeserver then tries to talk to the inviting homeserver in order to exchange these invite for m.room.member events.
When one of the invite is revoked, the inviting homeserver responds with a 500 error because it tries to extract a 'display_name' property from the content, which is empty. This might cause the invited server to consider that the server is down and not try to exchange other, valid invites (or at least delay it).
This fix handles the case of revoked invites by avoiding trying to fetch a 'display_name' from the original invite's content, and letting the m.room.member event fail the auth rules (because, since the original invite's content is empty, it doesn't have public keys), which results in sending a 403 with the correct error message to the invited server.
Pull the checkers out to their own classes, rather than having them lost in a
massive 1000-line class which does everything.
This is also preparation for some more intelligent advertising of flows, as per #6100
Second part of solving #6076Fixes#6076
We return a submit_url parameter on calls to POST */msisdn/requestToken so that clients know where to submit token information to.
Uses a SimpleHttpClient instance equipped with the federation_ip_range_blacklist list for requests to identity servers provided by user input. Does not use a blacklist when contacting identity servers specified by account_threepid_delegates. The homeserver trusts the latter and we don't want to prevent homeserver admins from specifying delegates that are on internal IP addresses.
Fixes#5935
Implements MSC2290. This PR adds two new endpoints, /unstable/account/3pid/add and /unstable/account/3pid/bind. Depending on the progress of that MSC the unstable prefix may go away.
This PR also removes the blacklist on some 3PID tests which occurs in #6042, as the corresponding Sytest PR changes them to use the new endpoints.
Finally, it also modifies the account deactivation code such that it doesn't just try to deactivate 3PIDs that were bound to the user's account, but any 3PIDs that were bound through the homeserver on that user's account.
3PID invites require making a request to an identity server to check that the invited 3PID has an Matrix ID linked, and if so, what it is.
These requests are being made on behalf of a user. The user will supply an identity server and an access token for that identity server. The homeserver will then forward this request with the access token (using an `Authorization` header) and, if the given identity server doesn't support v2 endpoints, will fall back to v1 (which doesn't require any access tokens).
Requires: ~~#5976~~
Broke in #5971
Basically the bug is that if get_current_state_deltas returns no new updates and we then take the max pos, its possible that we miss an update that happens in between the two calls. (e.g. get_current_state_deltas looks up to stream pos 5, then an event persists and so getting the max stream pos returns 6, meaning that next time we check for things with a stream pos bigger than 6)
We want to assign unique mxids to saml users based on an incrementing
suffix. For that to work, we need to record the allocated mxid in a separate
table.
Some small fixes to `room_member.py` found while doing other PRs.
1. Add requester to the base `_remote_reject_invite` method.
2. `send_membership_event`'s docstring was out of date and took in a `remote_room_hosts` arg that was not used and no calling function provided.
Another small fixup noticed during work on a larger PR. The `origin` field of `add_display_name_to_third_party_invite` is not used and likely was just carried over from the `on_PUT` method of `FederationThirdPartyInviteExchangeServlet` which, like all other servlets, provides an `origin` argument.
Since it's not used anywhere in the handler function though, we should remove it from the function arguments.
Previously if the first registered user was a "support" or "bot" user,
when the first real user registers, the auto-join rooms were not
created.
Fix to exclude non-real (ie users with a special user type) users
when counting how many users there are to determine whether we should
auto-create a room.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
This is a combination of a few different PRs, finally all being merged into `develop`:
* #5875
* #5876
* #5868 (This one added the `/versions` flag but the flag itself was actually [backed out](891afb57cb (diff-e591d42d30690ffb79f63bb726200891)) in #5969. What's left is just giving /versions access to the config file, which could be useful in the future)
* #5835
* #5969
* #5940
Clients should not actually use the new registration functionality until https://github.com/matrix-org/synapse/pull/5972 is merged.
UPGRADE.rst, changelog entries and config file changes should all be reviewed closely before this PR is merged.
Remove all the "double return" statements which were a result of us removing all the instances of
```
defer.returnValue(...)
return
```
statements when we switched to python3 fully.
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
This hopefully addresses #5407 by gracefully handling an empty but
limited TimelineBatch. We also add some logging to figure out how this
is happening.
The `expire_access_token` didn't do what it sounded like it should do. What it
actually did was make Synapse enforce the 'time' caveat on macaroons used as
access tokens, but since our access token macaroons never contained such a
caveat, it was always a no-op.
(The code to add 'time' caveats was removed back in v0.18.5, in #1656)
`None` is not a valid event id, so queuing up a database fetch for it seems
like a silly thing to do.
I considered making `get_event` return `None` if `event_id is None`, but then
its interaction with `allow_none` seemed uninituitive, and strong typing ftw.
this is only used in one place, so it's clearer if we inline it and reduce the
API surface.
Also, fixes a buglet where we would create an access token even if we were
about to block the user (we would never return the AT, so the user could never
use it, but it was still created and added to the db.)
FederationDeniedError is a subclass of SynapseError, which is a subclass of
CodeMessageException, so if e is a FederationDeniedError, then this check for
FederationDeniedError will never be reached since it will be caught by the
check for CodeMessageException above. The check for CodeMessageException does
almost the same thing as this check (since FederationDeniedError initialises
with code=403 and msg="Federation denied with %s."), so may as well just keep
allowing it to handle this case.
Nothing uses this now, so we can remove the dead code, and clean up the
API.
Since we're changing the shape of the return value anyway, we take the
opportunity to give the method a better name.
When a user creates an account and the 'require_auth_for_profile_requests' config flag is set, and a client that performed the registration wants to lookup the newly-created profile, the request will be denied because the user doesn't share a room with themselves yet.
This has never been documented, and I'm not sure it's ever been used outside
sytest.
It's quite a lot of poorly-maintained code, so I'd like to get rid of it.
For now I haven't removed the database table; I suggest we leave that for a
future clearout.
* SAML2 Improvements and redirect stuff
Signed-off-by: Alexander Trost <galexrt@googlemail.com>
* Code cleanups and simplifications.
Also: share the saml client between redirect and response handlers.
* changelog
* Revert redundant changes to static js
* Move all the saml stuff out to a centralised handler
* Add support for tracking SAML2 sessions.
This allows us to correctly handle `allow_unsolicited: False`.
* update sample config
* cleanups
* update sample config
* rename BaseSSORedirectServlet for consistency
* Address review comments
This would cause emails being sent, but Synapse responding with a 429 when creating the event. The client would then retry, and with bad timing the same scenario would happen again. Some testing I did ended up sending me 10 emails for one single invite because of this.
When a client asks for users whose devices have changed since a token we
used to pull *all* users from the database since the token, which could
easily be thousands of rows for old tokens.
This PR changes this to only check for changes for users the client is
actually interested in.
Fixes#5553
Closes#4583
Does slightly less than #5045, which prevented a room from being upgraded multiple times, one after another. This PR still allows that, but just prevents two from happening at the same time.
Mostly just to mitigate the fact that servers are slow and it can take a moment for the room upgrade to actually complete. We don't want people sending another request to upgrade the room when really they just thought the first didn't go through.
If no `from` param is specified we calculate and use the "current
token" that inlcuded typing, presence, etc. These are unused during
pagination and are not available on workers, so we simply don't
calculate them.
If, for some reason, presence updates take a while to persist then it
can trigger clients to tightloop calling `/sync` due to the presence
handler returning updates but not advancing the stream token.
Fixes#5503.
I had to add quite a lot of logging to diagnose a problem with 3pid
invites - we only logged the one failure which isn't all that
informative.
NB. I'm not convinced the logic of this loop is right: I think it
should just accept a single valid signature from a trusted source
rather than fail if *any* signature is invalid. Also it should
probably not skip the rest of middle loop if a check fails? However,
I'm deliberately not changing the logic here.
Adds new config option `cleanup_extremities_with_dummy_events` which
periodically sends dummy events to rooms with more than 10 extremities.
THIS IS REALLY EXPERIMENTAL.
Fixes that when a user exchanges a 3PID invite for a proper invite over
federation it does not include the `invite_room_state` key.
This was due to synapse incorrectly sending out two invite requests.
Sends password reset emails from the homeserver instead of proxying to the identity server. This is now the default behaviour for security reasons. If you wish to continue proxying password reset requests to the identity server you must now enable the email.trust_identity_server_for_password_resets option.
This PR is a culmination of 3 smaller PRs which have each been separately reviewed:
* #5308
* #5345
* #5368
* Fix background updates to handle redactions/rejections
In background updates based on current state delta stream we need to
handle that we may not have all the events (or at least that
`get_events` may raise an exception).
When processing an incoming event over federation, we may try and
resolve any unexpected differences in auth events. This is a
non-essential process and so should not stop the processing of the event
if it fails (e.g. due to the remote disappearing or not implementing the
necessary endpoints).
Fixes#3330
Replaces DEFAULT_ROOM_VERSION constant with a method that first checks the config, then returns a hardcoded value if the option is not present.
That hardcoded value is now located in the server.py config file.
I was staring at this function trying to figure out wtf it was actually
doing. This is (hopefully) a non-functional refactor which makes it a bit
clearer.
This commit adds two config options:
* `restrict_public_rooms_to_local_users`
Requires auth to fetch the public rooms directory through the CS API and disables fetching it through the federation API.
* `require_auth_for_profile_requests`
When set to `true`, requires that requests to `/profile` over the CS API are authenticated, and only returns the user's profile if the requester shares a room with the profile's owner, as per MSC1301.
MSC1301 also specifies a behaviour for federation (only returning the profile if the server asking for it shares a room with the profile's owner), but that's currently really non-trivial to do in a not too expensive way. Next step is writing down a MSC that allows a HS to specify which user sent the profile query. In this implementation, Synapse won't send a profile query over federation if it doesn't believe it already shares a room with the profile's owner, though.
Groups have been intentionally omitted from this commit.
Currently if a call to `/get_missing_events` fails we log an exception
and stop processing the top level event we received over federation.
Instead let's try and handle it sensibly given it is a somewhat expected
failure mode.
By default the homeserver will use the identity server used during the
binding of the 3PID to unbind the 3PID. However, we need to allow
clients to explicitly ask the homeserver to unbind via a particular
identity server, for the case where the 3PID was bound out of band from
the homeserver.
Implements MSC915.
This changes the behaviour from using the server specified trusted
identity server to using the IS that used during the binding of the
3PID, if known.
This is the behaviour specified by MSC1915.
Primarily this fixes a bug in the handling of remote users joining a
room where the server sent out the presence for all local users in the
room to all servers in the room.
We also change to using the state delta stream, rather than the
distributor, as it will make it easier to split processing out of the
master process (as well as being more flexible).
Finally, when sending presence states to newly joined servers we filter
out old presence states to reduce the number sent. Initially we filter
out states that are offline and have a last active more than a week ago,
though this can be changed down the line.
Fixes#3962
Adds a new method, check_3pid_auth, which gives password providers
the chance to allow authentication with third-party identifiers such
as email or msisdn.
There are a number of instances where a server or admin may puppet a
user to join/leave rooms, which we don't want to fail if the user has
not consented to the privacy policy. We fix this by adding a check to
test if the requester has an associated access_token, which is used as a
proxy to answer the question of whether the action is being done on
behalf of a real request from the user.
We keep track of what stream IDs we've seen so that we know what updates
we've handled or missed. If we re-sync we don't know if the updates
we've seen are included in the re-sync (there may be a race), so we
should reset the seen updates.
* Rate-limiting for registration
* Add unit test for registration rate limiting
* Add config parameters for rate limiting on auth endpoints
* Doc
* Fix doc of rate limiting function
Co-Authored-By: babolivier <contact@brendanabolivier.com>
* Incorporate review
* Fix config parsing
* Fix linting errors
* Set default config for auth rate limiting
* Fix tests
* Add changelog
* Advance reactor instead of mocked clock
* Move parameters to registration specific config and give them more sensible default values
* Remove unused config options
* Don't mock the rate limiter un MAU tests
* Rename _register_with_store into register_with_store
* Make CI happy
* Remove unused import
* Update sample config
* Fix ratelimiting test for py2
* Add non-guest test
Remove a call to run_as_background_process: there is no need to run this as a
background process, because build_and_send_edu does not block.
We may as well inline the whole of _push_remotes.
When filtering events to send to server we check more than just history
visibility. However when deciding whether to backfill or not we only
care about the history visibility.
In worker mode, on the federation sender, when we receive an edu for sending
over the replication socket, it is parsed into an Edu object. There is no point
extracting the contents of it so that we can then immediately build another Edu.
When guest_access changes from allowed to forbidden all local guest
users should be kicked from the room. This did not happen when
revocation was received from federation on a worker.
Presumably broken in #4141
This allows registration to be handled by a worker, though the actual
write to the database still happens on master.
Note: due to the in-memory session map all registration requests must be
handled by the same worker.
This was broken in PR #4405, commit 886e5ac, where we changed remote
rejections to be outliers.
The fix is to explicitly add the leave event in when we know its an out
of band invite. We can't always add the event as if the server is/was in
the room there might be more events to send down the sync than just the
leave.
* Handle listening for ACME requests on IPv6 addresses
the weird url-but-not-actually-a-url-string doesn't handle IPv6 addresses
without extra quoting. Building a string which you are about to parse again
seems like a weird choice. Let's just use listenTCP, which is consistent with
what we do elsewhere.
* Clean up the default ACME config
make it look a bit more consistent with everything else, and tweak the defaults
to listen on port 80.
* newsfile
The transaction queue only sends out events that we generate. This was
done by checking domain of event ID, but that can no longer be used.
Instead, we may as well use the sender field.
We currently pass FrozenEvent instead of `dict` to
`compute_event_signature`, which works by accident due to `dict(event)`
producing the correct result.
This fixes PR #4493 commit 855a151
The validator was being run on the EventBuilder objects, and so the
validator only checked a subset of fields. With the upcoming
EventBuilder refactor even fewer fields will be there to validate.
To get around this we split the validation into those that can be run
against an EventBuilder and those run against a fully fledged event.
Currently they're stored as non-outliers even though the server isn't in
the room, which can be problematic in places where the code assumes it
has the state for all non outlier events.
In particular, there is an edge case where persisting the leave event
triggers a state resolution, which requires looking up the room version
from state. Since the server doesn't have the state, this causes an
exception to be thrown.
Allow for the creation of a support user.
A support user can access the server, join rooms, interact with other users, but does not appear in the user directory nor does it contribute to monthly active user limits.
This is mostly factoring out the post-CAS-login code to somewhere we can reuse
it for other SSO flows, but it also fixes the userid mapping while we're at it.
* Rip out half-implemented m.login.saml2 support
This was implemented in an odd way that left most of the work to the client, in
a way that I really didn't understand. It's going to be a pain to maintain, so
let's start by ripping it out.
* drop undocumented dependency on dateutil
It turns out we were relying on dateutil being pulled in transitively by
pysaml2. There's no need for that bloat.
Fixes handling of rooms where we have permission to send the tombstone, but not
other state. We need to (a) fail more gracefully when we can't send the PLs in
the old room, and (b) not set the PLs in the new room until we are done with
the other stuff.
Currently when fetching state groups from the data store we make two
hits two the database: once for members and once for non-members (unless
request is filtered to one or the other). This adds needless load to the
datbase, so this PR refactors the lookup to make only a single database
hit.
Wrap calls to deferToThread() in a thing which uses a child logcontext to
attribute CPU usage to the right request.
While we're in the area, remove the logcontext_tracer stuff, which is never
used, and afaik doesn't work.
Fixes#4064
`on_new_notifications` and `on_new_receipts` in `HttpPusher` and `EmailPusher`
now always return synchronously, so we can remove the `defer.gatherResults` on
their results, and the `run_as_background_process` wrappers can be removed too
because the PusherPool methods will now complete quickly enough.
It's quite important that get_missing_events returns the *latest* events in the
room; however we were pulling event ids out of the database until we got *at
least* 10, and then taking the *earliest* of the results.
We also shouldn't really be relying on depth, and should be checking the
room_id.
In particular, we assume that the name and canonical alias events in
the state have not been rejected. In practice this may not be the case
(though we should probably think about fixing that) so lets ensure that
we gracefully handle that case, rather than 404'ing the sync request
like we do now.
If we have a forward extremity for a room as `E`, and you receive `A`, `B`,
s.t. `A -> B -> E`, and `B` also points to an unknown event `X`, then we need
to do state res between `X` and `E`.
When that happens, we need to make sure we include `X` in the state that goes
into the state res alg.
Fixes#3934.
If we've fetched state events from remote servers in order to resolve the state
for a new event, we need to actually pass those events into
resolve_events_with_factory (so that it can do the state res) and then persist
the ones we need - otherwise other bits of the codebase get confused about why
we have state groups pointing to non-existent events.
get_state_groups returns a map from state_group_id to a list of FrozenEvents,
so was very much the wrong thing to be putting as one of the entries in the
list passed to resolve_events_with_factory (which expects maps from
(event_type, state_key) to event id).
We actually want get_state_groups_ids().values() rather than
get_state_groups().
This fixes the main problem in #3923, but there are other problems with this
bit of code which get discovered once you do so.
* add some comments on things that look a bit bogus
* rename this `state` variable to avoid confusion with the `state` used
elsewhere in this function. (There was no actual conflict, but it was
a confusing bit of spaghetti.)
when processing incoming transactions, it can be hard to see what's going on,
because we process a bunch of stuff in parallel, and because we may end up
recursively working our way through a chain of three or four events.
This commit creates a way to use logcontexts to add the relevant event ids to
the log lines.
Given we have disabled lazy loading for incr syncs in #3840, we can make self-LL more efficient by only doing it on initial sync. Also adds a bounds check for if/when we change our mind, so that we don't try to include LL members on sync responses with no timeline.
Add some informative comments about what's going on here.
Also, `sent_to_us_directly` and `get_missing` were doing the same thing (apart
from in `_handle_queued_pdus`, which looks like a bug), so let's get rid of
`get_missing` and use `sent_to_us_directly` consistently.
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:
- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
them into other, more cryptic, exceptions.
don't filter membership events based on history visibility
as we will already have filtered the messages in the timeline, and state events
are always visible.
and because @erikjohnston said so.
* speed up room summaries by pulling their data from room_memberships rather than room state
* disable LL for incr syncs, and log incr sync stats (#3840)
When a user joined a room any existing tags were not sent down the sync
stream. Ordinarily this isn't a problem because the user needs to be in
the room to have set tags in it, however synapse will sometimes add tags
for a user to a room, e.g. for server notices, which need to come down
sync.
Fetching the list of all new typing notifications involved iterating
over all rooms and comparing their serial. Lets move to using a stream
change cache, like we do for other streams.
Turns out that the user directory handling is fairly racey as a bunch
of stuff assumes that the processing happens on master, which it doesn't
when there is a synapse.app.user_dir worker. So lets just call the
function directly until we actually get round to fixing it, since it
doesn't make the situation any worse.
First of all, avoid resetting the logcontext before running the pushers, to fix
the "Starting db txn 'get_all_updated_receipts' from sentinel context" warning.
Instead, give them their own "background process" logcontexts.
Older identity servers may not support the unbind 3pid request, so we
shouldn't fail the requests if we received one of 400/404/501. The
request still fails if we receive e.g. 500 responses, allowing clients
to retry requests on transient identity server errors that otherwise do
support the API.
Fixes#3661