When fetching the state of a room over federation we receive the event
IDs of the state and auth chain. We then fetch those events that we
don't already have.
However, we used a function that recursively fetched any missing auth
events for the fetched events, which can lead to a lot of recursion if
the server is missing most of the auth chain. This work is entirely
pointless because would have queued up the missing events in the auth
chain to be fetched already.
Let's just diable the recursion, since it only gets called from one
place anyway.
Fixes#2181.
The basic premise is that, when we
fail to reject an invite via the remote server, we can generate our own
out-of-band leave event and persist it as an outlier, so that we have something
to send to the client.
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
==============================
Synapse 1.16.0rc2 includes the security fixes released with Synapse 1.15.2.
Please see [below](https://github.com/matrix-org/synapse/blob/master/CHANGES.md#synapse-1152-2020-07-02) for more details.
Improved Documentation
----------------------
- Update postgres image in example `docker-compose.yaml` to tag `12-alpine`. ([\#7696](https://github.com/matrix-org/synapse/issues/7696))
Internal Changes
----------------
- Add some metrics for inbound and outbound federation latencies: `synapse_federation_server_pdu_process_time` and `synapse_event_processing_lag_by_event`. ([\#7771](https://github.com/matrix-org/synapse/issues/7771))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl79+qgACgkQM/xY9qcR
MEhcaRAAjWLW3ojN1F0DUfE85jziZK2VdnMQC3g+uEOLX6QRbfqFNaNNMjLdK+vl
K/+2ZoHkRsg6g8noSPhPmI1z1+hb5xDJaxjltzHxonIipW8XSU8o2PQMkf8O/BAy
VS58y3GyLkhEgzWC+/hcII+LBgcqXpLuNM0xrKTHmxclIjdewlwe1v+hxkP+6wsX
9Whhn1f4sNHrCtyFVK9uzMFcVyzcQaiWZRjEDMj2uR7rWT6UbCUifN/G4fWmtGbY
xWoNoC4Qv8xiqXOG4U7juPp9T3bRyWMKyjBFM5PWO6Ec2zfafDyFzhBxJhlQhODG
g21tS4PowX/dM/pBpJFEOPh1BVrPZzzTD+YMmTcd3NO79HeaQGqEX/+tzFCFUyPp
0daJK3Y85+l5w/M09WU8DDN8CiR3PFJyGDIZp+nweMsiJZkbEbLOkh1tx6TL+5/6
zwewU6cq8nTVGrn53Tn58l8C7Sj4w+Qk+1XDzymAoidyoWqAKW9Y/fw53PaViUSx
voDu0rpsEUXR1OzCBG8SAPQCFy9gdEWV04OvIpzHuq2uojkz66f7NAXy+Wz+Occ9
AYb/s6Ei80bGCLgRd5jg+myqavwRbzCyv+LIC6dxpopxZJ3AzrFuD11eXKtrIxOC
FZYf3U4KeBk4Q9TV5IFV1xcGFrq5aK36LdmP6WOsEl3PXVT9p/Q=
=YaJn
-----END PGP SIGNATURE-----
Merge tag 'v1.16.0rc2' into develop
Synapse 1.16.0rc2 (2020-07-02)
==============================
Synapse 1.16.0rc2 includes the security fixes released with Synapse 1.15.2.
Please see [below](https://github.com/matrix-org/synapse/blob/master/CHANGES.md#synapse-1152-2020-07-02) for more details.
Improved Documentation
----------------------
- Update postgres image in example `docker-compose.yaml` to tag `12-alpine`. ([\#7696](https://github.com/matrix-org/synapse/issues/7696))
Internal Changes
----------------
- Add some metrics for inbound and outbound federation latencies: `synapse_federation_server_pdu_process_time` and `synapse_event_processing_lag_by_event`. ([\#7771](https://github.com/matrix-org/synapse/issues/7771))
State res v2 across large data sets can be very CPU intensive, and if
all the relevant events are in the cache the algorithm will run from
start to finish within a single reactor tick. This can result in
blocking the reactor tick for several seconds, which can have major
repercussions on other requests.
To fix this we simply add the occaisonal `sleep(0)` during iterations to
yield execution until the next reactor tick. The aim is to only do this
for large data sets so that we don't impact otherwise quick resolutions.=
The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill of `db_query_to_update_function` function.
Fixes https://github.com/matrix-org/synapse/issues/2431
Adds config option `encryption_enabled_by_default_for_room_type`, which determines whether encryption should be enabled with the default encryption algorithm in private or public rooms upon creation. Whether the room is private or public is decided based upon the room creation preset that is used.
Part of this PR is also pulling out all of the individual instances of `m.megolm.v1.aes-sha2` into a constant variable to eliminate typos ala https://github.com/matrix-org/synapse/pull/7637
Based on #7637
While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:
* Rather undocumented, and
* causing a *lot* of config checks
This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation.
Best to be reviewed commit-by-commit.
* Expose `return_html_error`, and allow it to take a Jinja2 template instead of a raw string
* Clean up exception handling in SAML2ResponseResource
* use the existing code in `return_html_error` instead of re-implementing it
(giving it a jinja2 template rather than inventing a new form of template)
* do the exception-catching in the REST layer rather than in the handler
layer, to make sure we catch all exceptions.
It looks like `user_device_resync` was ignoring cross-signing keys from the results received from the remote server. This patch fixes this, by processing these keys using the same process `_handle_signing_key_updates` does (and effectively factor that part out of that function).
Without this patch, if an error happens which isn't caught by `user_device_resync`, then `_maybe_retry_device_resync` would fail, without retrying the next users in the iteration. This patch fixes this so that it now only logs an error in this case.
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).
Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.
People probably want to look at this commit by commit.
Instead of doing a complicated dance of deleting and moving aliases one
by one, which sends a canonical alias update into the old room for each
one, lets do it all in one go.
This also changes the function to move *all* local alias events to the new
room, however that happens later on anyway.
When a call to `user_device_resync` fails, we don't currently mark the remote user's device list as out of sync, nor do we retry to sync it.
https://github.com/matrix-org/synapse/pull/6776 introduced some code infrastructure to mark device lists as stale/out of sync.
This commit uses that code infrastructure to mark device lists as out of sync if processing an incoming device list update makes the device handler realise that the device list is out of sync, but we can't resync right now.
It also adds a looping call to retry all failed resync every 30s. This shouldn't cause too much spam in the logs as this commit also removes the "Failed to handle device list update for..." warning logs when catching `NotRetryingDestination`.
Fixes#7418
Bugfixes:
- Hash passwords as early as possible during registration. #7523
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl7CpGYACgkQM/xY9qcR
MEhVixAAk2hDWVXxbGzUk2LmfiIsFA2eV55sw+VqEw0eRfe1d/mP6aH75VmTt3pw
IymZUVxDXdbTnPNPw+ldyGhzu9C6JJjXnNRBZnIkR5vcSbWsV0mPl/qHFu/4FnZI
m4Nj1Sx3sG0CyNDpWjVrzTW6SbDX9J68DXbLwnNTSX3KPa7gNn6TUmFfKzlrNI23
pPmD+EITYMn/H9HOhxhTzq//Ja7UOViAKQ0q4N2I4GxmLP6ufx9P3s5FG/oJqA+H
Pka2+9JnfHq2Ze22CoDcg8q5f5MgVkQzGeir0ZsGJwJqOYjeTmbCvD3T/RYWO5g+
ZghON3tsMQmdzUQqGRxcn/YLOZY9ZqrX2kBf5E6Wapwj9MfKg2ToLZM4yrWN0+RX
KDuWaKXYtkSQCo1nDS2KooVMWjGNZautWWnHzZ0KNQCIkxVpGC234JYI685grKXb
dg7R41kdXI7NJzqS4iM1fxXoLx64fpoREa/pbLF6VeLaYXBlzMjfhiIx2pQBN3L/
q/y3ftev9VCp+2wPxiKUkiC4Sh7dgWUzNuyHU+4lsPUbI1H/MN5dN2ryObdEGWc/
5YU3tv2MTQJ7jECHR+/fastnG+5d2kVm/FK+zVhG17JvA2VmDaLnSde+mzGbsO1N
gIUx5VrTEP7y0tC8C/VgbS3c2KqCSOopqd3j2slLLrtQlXM71VE=
=lpDI
-----END PGP SIGNATURE-----
Merge tag 'v1.13.0rc3' into develop
Synapse 1.13.0rc3 (2020-05-18)
Bugfixes:
- Hash passwords as early as possible during registration. #7523
==============================
Bugfixes
--------
- Fix a long-standing bug which could cause messages not to be sent over federation, when state events with state keys matching user IDs (such as custom user statuses) were received. ([\#7376](https://github.com/matrix-org/synapse/issues/7376))
- Restore compatibility with non-compliant clients during the user interactive authentication process, fixing a problem introduced in v1.13.0rc1. ([\#7483](https://github.com/matrix-org/synapse/issues/7483))
Internal Changes
----------------
- Fix linting errors in new version of Flake8. ([\#7470](https://github.com/matrix-org/synapse/issues/7470))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl69IQ8ACgkQOSor00I9
eP87lAf8DK+v6cs2U0BoD5opzQ7ZazJT6JYTmnMBaTzHU6Wx20V2ttkF7Vpwm3WU
Zsz0048tdYtHFyYBQ1kF5RNIBBJwV8SA/QUcPkR7FVpwZMLR2q4aJn0EE7kC9OMf
tYsmdbHeBdyfLXpXzazxWlgHquLyEIt52ykAcCphjx/Jl2fAExFEhtfsxpECoJ2f
8Dqhjg3WFjd6QWU6AFkElbwHUYCdIWdJOcsC8N1p8OvBmDz5QXv/RlYipHE00Cpx
QQQOgEjdRc6dlz2mbetMklnfII3p2kO9bzNdmEpOzT0Zt7nFaGdntW4I1QA0yJfa
gows9bYMzhqYk7YSiyTYOZ4qyavVtw==
=N/zZ
-----END PGP SIGNATURE-----
Merge tag 'v1.13.0rc2' into develop
Synapse 1.13.0rc2 (2020-05-14)
==============================
Bugfixes
--------
- Fix a long-standing bug which could cause messages not to be sent over federation, when state events with state keys matching user IDs (such as custom user statuses) were received. ([\#7376](https://github.com/matrix-org/synapse/issues/7376))
- Restore compatibility with non-compliant clients during the user interactive authentication process, fixing a problem introduced in v1.13.0rc1. ([\#7483](https://github.com/matrix-org/synapse/issues/7483))
Internal Changes
----------------
- Fix linting errors in new version of Flake8. ([\#7470](https://github.com/matrix-org/synapse/issues/7470))
* release-v1.13.0:
Don't UPGRADE database rows
RST indenting
Put rollback instructions in upgrade notes
Fix changelog typo
Oh yeah, RST
Absolute URL it is then
Fix upgrade notes link
Provide summary of upgrade issues in changelog. Fix )
Move next version notes from changelog to upgrade notes
Changelog fixes
1.13.0rc1
Documentation on setting up redis (#7446)
Rework UI Auth session validation for registration (#7455)
Fix errors from malformed log line (#7454)
Drop support for redis.dbid (#7450)
By persisting the user interactive authentication sessions to the database, this fixes
situations where a user hits different works throughout their auth session and also
allows sessions to persist through restarts of Synapse.
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
Add changelog
Save retrieved keys to the db
lint
Fix and de-brittle remote result dict processing
Use query_user_devices instead, assume only master, self_signing key types
Make changelog more useful
Remove very specific exception handling
Wrap get_verify_key_from_cross_signing_key in a try/except
Note that _get_e2e_cross_signing_verify_key can raise a SynapseError
lint
Add comment explaining why this is useful
Only fetch master and self_signing key types
Fix log statements, docstrings
Remove extraneous items from remote query try/except
lint
Factor key retrieval out into a separate function
Send device updates, modeled after SigningKeyEduUpdater._handle_signing_key_updates
Update method docstring
* Pull Sentinel out of LoggingContext
... and drop a few unnecessary references to it
* Factor out LoggingContext.current_context
move `current_context` and `set_context` out to top-level functions.
Mostly this means that I can more easily trace what's actually referring to
LoggingContext, but I think it's generally neater.
* move copy-to-parent into `stop`
this really just makes `start` and `stop` more symetric. It also means that it
behaves correctly if you manually `set_log_context` rather than using the
context manager.
* Replace `LoggingContext.alive` with `finished`
Turn `alive` into `finished` and make it a bit better defined.
If an error happened while processing a SAML AuthN response, or a client
ends up doing a `GET` request to `/authn_response`, then render a
customisable error page rather than a confusing error.
When we get an invite over federation, store the room version in the rooms table.
The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
This is intended as a precursor to storing room versions when we receive an
invite over federation, but has the happy side-effect of fixing #3374 at last.
In short: change the store_room with try/except to a proper upsert which
updates the right columns.
... and set it everywhere it's called.
while we're here, rename it for consistency with `check_user_in_room` (and to
help check that I haven't missed any instances)
* Reject device display names that are too long.
Too long is currently defined as 100 characters in length.
* Add a regression test for rejecting a too long device display name.
A lot of the things we log at INFO are now a bit superfluous, so lets
make them DEBUG logs to reduce the amount we log by default.
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Brendan Abolivier <github@brendanabolivier.com>
==============================
Bugfixes
--------
- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848))
Internal Changes
----------------
- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEumuwyPtYLL2OMhYdOtoG7cdT0R4FAl478l8QHGVyaWtAbWF0
cml4Lm9yZwAKCRA62gbtx1PRHmKfD/0ZpBV95gMk0iecfvO6IY8+0KigzDUypgXf
zp1k+k5DBUmH5/83h7+cXWHIJyofv1At1YdIq9J/nNeyqr3V3dcQwPabOKYbI96d
kVD0EZX+2NjuYejAuF9eJNdiwRq5yMLluCfnOXr5jCS0NNdpO1xVb4bWYm50RmcD
WJvBvmfwXwCC3AFyNb4IAipaVUXuVXXx1YhjFs6DsbtpPUG88e4uIhpfvrS4v0t8
GFJZNuVJStOvyKHHTfNRIMkhy4xidj3HRUMSmjNpx07WnPBnbv4LnUOD1boYxgaP
EoODYnoAPshdiKB0AywNNjue3TmFD/Z7vVEzlFP/lNZ4GDU4kN9C/EwFYw0C2Jqq
f9O/E2ZuP9Qqume5O9UwMQQAmhV5lMBaIsYRbixU+9bSB893zRHRffA11HypzD00
ZXj8QDOXpiXp2jpAwN1Rk9e/aZEX3qd+zUAgRmk+kYb0KTC2/gcY956rodNSSO/U
RFYg4DFvoCgCrRSxZ4LQMlFu3YY2E7qWH+p/OHk3WG79jW64VpEFaytvv8fR+GKq
g3EbtWy6mUzlKYqbjZ+ZTUDe5AWdv6ZX8xJqRD/S6cpiwyh6Gp89HHNvBThXoCmK
fxkgw8if7eIITuTlNuDrqunxyWqwd3oVlzd2mi2bg0yRfcqJ9C6OuBV1VTLFZeky
3sjCiU0IRw==
=kcSy
-----END PGP SIGNATURE-----
Merge tag 'v1.10.0rc2' into develop
Synapse 1.10.0rc2 (2020-02-06)
==============================
Bugfixes
--------
- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848))
Internal Changes
----------------
- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
We were looking at the wrong event type (`m.room.encryption` vs
`m.room.encrypted`).
Also fixup the duplicate `EvenTypes` entries.
Introduced in #6776.
These are easier to work with than the strings and we normally have one around.
This fixes `FederationHander._persist_auth_tree` which was passing a
RoomVersion object into event_auth.check instead of a string.
Turns out that figuring out a remote user id for the SAML user isn't quite as obvious as it seems. Factor it out to the SamlMappingProvider so that it's easy to control.
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
When figuring out which topological token to start a purge job at, we
need to do the following:
1. Figure out a timestamp before which events will be purged
2. Select the first stream ordering after that timestamp
3. Select info about the first event after that stream ordering
4. Build a topological token from that info
In some situations (e.g. quiet rooms with a short max_lifetime), there
might not be an event after the stream ordering at step 3, therefore we
abort the purge with the error `No event found`. To mitigate that, this
patch fetches the first event _before_ the stream ordering, instead of
after.
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
When we request the state/auth_events to populate a backwards extremity (on
backfill or in the case of missing events in a transaction push), we should
check that the returned events are in the right room rather than blindly using
them in the room state or auth chain.
Given that _get_events_from_store_or_dest takes a room_id, it seems clear that
it should be sanity-checking the room_id of the requested events, so let's do
it there.
Make it return the state *after* the requested event, rather than the one
before it. This is a bit easier and requires fewer calls to
get_events_from_store_or_dest.
This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
Sometimes the filtering function can return a pruned version of an event (on top of either the event itself or an empty list), if it thinks the user should be able to see that there's an event there but not the content of that event. Therefore, the previous logic of 'if filtered is empty then we can use the event we retrieved from the database' is flawed, and we should use the event returned by the filtering function.
When we request the state/auth_events to populate a backwards extremity (on
backfill or in the case of missing events in a transaction push), we should
check that the returned events are in the right room rather than blindly using
them in the room state or auth chain.
Given that _get_events_from_store_or_dest takes a room_id, it seems clear that
it should be sanity-checking the room_id of the requested events, so let's do
it there.
Make it return the state *after* the requested event, rather than the one
before it. This is a bit easier and requires fewer calls to
get_events_from_store_or_dest.
PaginationHandler.get_messages is only called by RoomMessageListRestServlet,
which is async.
Chase the code path down from there:
- FederationHandler.maybe_backfill (and nested try_backfill)
- FederationHandler.backfill