`StatsHandler` handles updates to the `current_state_delta_stream`, and updates room stats such as the amount of state events, joined users, etc.
However, it counts every new join membership as a new user entering a room (and that user being in another room), whereas it's possible for a user's membership status to go from join -> join, for instance when they change their per-room profile information.
This PR adds a check for join->join membership transitions, and bails out early, as none of the further checks are necessary at that point.
Due to this bug, membership stats in many rooms have ended up being wildly larger than their true values. I am not sure if we also want to include a migration step which recalculates these statistics (possibly using the `_populate_stats_process_rooms` bg update).
Bug introduced in the initial implementation https://github.com/matrix-org/synapse/pull/4338.
Thanks to some slightly overzealous cleanup in the
`delete_old_current_state_events`, it's possible to end up with no
`event_forward_extremities` in a room where we have outstanding local
invites. The user would then get a "no create event in auth events" when trying
to reject the invite.
We can hack around it by using the dangling invite as the prev event.
The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.
When fetching the state of a room over federation we receive the event
IDs of the state and auth chain. We then fetch those events that we
don't already have.
However, we used a function that recursively fetched any missing auth
events for the fetched events, which can lead to a lot of recursion if
the server is missing most of the auth chain. This work is entirely
pointless because would have queued up the missing events in the auth
chain to be fetched already.
Let's just diable the recursion, since it only gets called from one
place anyway.
Fixes#2181.
The basic premise is that, when we
fail to reject an invite via the remote server, we can generate our own
out-of-band leave event and persist it as an outlier, so that we have something
to send to the client.
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
==============================
Synapse 1.16.0rc2 includes the security fixes released with Synapse 1.15.2.
Please see [below](https://github.com/matrix-org/synapse/blob/master/CHANGES.md#synapse-1152-2020-07-02) for more details.
Improved Documentation
----------------------
- Update postgres image in example `docker-compose.yaml` to tag `12-alpine`. ([\#7696](https://github.com/matrix-org/synapse/issues/7696))
Internal Changes
----------------
- Add some metrics for inbound and outbound federation latencies: `synapse_federation_server_pdu_process_time` and `synapse_event_processing_lag_by_event`. ([\#7771](https://github.com/matrix-org/synapse/issues/7771))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl79+qgACgkQM/xY9qcR
MEhcaRAAjWLW3ojN1F0DUfE85jziZK2VdnMQC3g+uEOLX6QRbfqFNaNNMjLdK+vl
K/+2ZoHkRsg6g8noSPhPmI1z1+hb5xDJaxjltzHxonIipW8XSU8o2PQMkf8O/BAy
VS58y3GyLkhEgzWC+/hcII+LBgcqXpLuNM0xrKTHmxclIjdewlwe1v+hxkP+6wsX
9Whhn1f4sNHrCtyFVK9uzMFcVyzcQaiWZRjEDMj2uR7rWT6UbCUifN/G4fWmtGbY
xWoNoC4Qv8xiqXOG4U7juPp9T3bRyWMKyjBFM5PWO6Ec2zfafDyFzhBxJhlQhODG
g21tS4PowX/dM/pBpJFEOPh1BVrPZzzTD+YMmTcd3NO79HeaQGqEX/+tzFCFUyPp
0daJK3Y85+l5w/M09WU8DDN8CiR3PFJyGDIZp+nweMsiJZkbEbLOkh1tx6TL+5/6
zwewU6cq8nTVGrn53Tn58l8C7Sj4w+Qk+1XDzymAoidyoWqAKW9Y/fw53PaViUSx
voDu0rpsEUXR1OzCBG8SAPQCFy9gdEWV04OvIpzHuq2uojkz66f7NAXy+Wz+Occ9
AYb/s6Ei80bGCLgRd5jg+myqavwRbzCyv+LIC6dxpopxZJ3AzrFuD11eXKtrIxOC
FZYf3U4KeBk4Q9TV5IFV1xcGFrq5aK36LdmP6WOsEl3PXVT9p/Q=
=YaJn
-----END PGP SIGNATURE-----
Merge tag 'v1.16.0rc2' into develop
Synapse 1.16.0rc2 (2020-07-02)
==============================
Synapse 1.16.0rc2 includes the security fixes released with Synapse 1.15.2.
Please see [below](https://github.com/matrix-org/synapse/blob/master/CHANGES.md#synapse-1152-2020-07-02) for more details.
Improved Documentation
----------------------
- Update postgres image in example `docker-compose.yaml` to tag `12-alpine`. ([\#7696](https://github.com/matrix-org/synapse/issues/7696))
Internal Changes
----------------
- Add some metrics for inbound and outbound federation latencies: `synapse_federation_server_pdu_process_time` and `synapse_event_processing_lag_by_event`. ([\#7771](https://github.com/matrix-org/synapse/issues/7771))
State res v2 across large data sets can be very CPU intensive, and if
all the relevant events are in the cache the algorithm will run from
start to finish within a single reactor tick. This can result in
blocking the reactor tick for several seconds, which can have major
repercussions on other requests.
To fix this we simply add the occaisonal `sleep(0)` during iterations to
yield execution until the next reactor tick. The aim is to only do this
for large data sets so that we don't impact otherwise quick resolutions.=
The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill of `db_query_to_update_function` function.
Fixes https://github.com/matrix-org/synapse/issues/2431
Adds config option `encryption_enabled_by_default_for_room_type`, which determines whether encryption should be enabled with the default encryption algorithm in private or public rooms upon creation. Whether the room is private or public is decided based upon the room creation preset that is used.
Part of this PR is also pulling out all of the individual instances of `m.megolm.v1.aes-sha2` into a constant variable to eliminate typos ala https://github.com/matrix-org/synapse/pull/7637
Based on #7637
While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:
* Rather undocumented, and
* causing a *lot* of config checks
This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation.
Best to be reviewed commit-by-commit.
* Expose `return_html_error`, and allow it to take a Jinja2 template instead of a raw string
* Clean up exception handling in SAML2ResponseResource
* use the existing code in `return_html_error` instead of re-implementing it
(giving it a jinja2 template rather than inventing a new form of template)
* do the exception-catching in the REST layer rather than in the handler
layer, to make sure we catch all exceptions.
It looks like `user_device_resync` was ignoring cross-signing keys from the results received from the remote server. This patch fixes this, by processing these keys using the same process `_handle_signing_key_updates` does (and effectively factor that part out of that function).
Without this patch, if an error happens which isn't caught by `user_device_resync`, then `_maybe_retry_device_resync` would fail, without retrying the next users in the iteration. This patch fixes this so that it now only logs an error in this case.
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).
Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.
People probably want to look at this commit by commit.
Instead of doing a complicated dance of deleting and moving aliases one
by one, which sends a canonical alias update into the old room for each
one, lets do it all in one go.
This also changes the function to move *all* local alias events to the new
room, however that happens later on anyway.
When a call to `user_device_resync` fails, we don't currently mark the remote user's device list as out of sync, nor do we retry to sync it.
https://github.com/matrix-org/synapse/pull/6776 introduced some code infrastructure to mark device lists as stale/out of sync.
This commit uses that code infrastructure to mark device lists as out of sync if processing an incoming device list update makes the device handler realise that the device list is out of sync, but we can't resync right now.
It also adds a looping call to retry all failed resync every 30s. This shouldn't cause too much spam in the logs as this commit also removes the "Failed to handle device list update for..." warning logs when catching `NotRetryingDestination`.
Fixes#7418
Bugfixes:
- Hash passwords as early as possible during registration. #7523
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl7CpGYACgkQM/xY9qcR
MEhVixAAk2hDWVXxbGzUk2LmfiIsFA2eV55sw+VqEw0eRfe1d/mP6aH75VmTt3pw
IymZUVxDXdbTnPNPw+ldyGhzu9C6JJjXnNRBZnIkR5vcSbWsV0mPl/qHFu/4FnZI
m4Nj1Sx3sG0CyNDpWjVrzTW6SbDX9J68DXbLwnNTSX3KPa7gNn6TUmFfKzlrNI23
pPmD+EITYMn/H9HOhxhTzq//Ja7UOViAKQ0q4N2I4GxmLP6ufx9P3s5FG/oJqA+H
Pka2+9JnfHq2Ze22CoDcg8q5f5MgVkQzGeir0ZsGJwJqOYjeTmbCvD3T/RYWO5g+
ZghON3tsMQmdzUQqGRxcn/YLOZY9ZqrX2kBf5E6Wapwj9MfKg2ToLZM4yrWN0+RX
KDuWaKXYtkSQCo1nDS2KooVMWjGNZautWWnHzZ0KNQCIkxVpGC234JYI685grKXb
dg7R41kdXI7NJzqS4iM1fxXoLx64fpoREa/pbLF6VeLaYXBlzMjfhiIx2pQBN3L/
q/y3ftev9VCp+2wPxiKUkiC4Sh7dgWUzNuyHU+4lsPUbI1H/MN5dN2ryObdEGWc/
5YU3tv2MTQJ7jECHR+/fastnG+5d2kVm/FK+zVhG17JvA2VmDaLnSde+mzGbsO1N
gIUx5VrTEP7y0tC8C/VgbS3c2KqCSOopqd3j2slLLrtQlXM71VE=
=lpDI
-----END PGP SIGNATURE-----
Merge tag 'v1.13.0rc3' into develop
Synapse 1.13.0rc3 (2020-05-18)
Bugfixes:
- Hash passwords as early as possible during registration. #7523
==============================
Bugfixes
--------
- Fix a long-standing bug which could cause messages not to be sent over federation, when state events with state keys matching user IDs (such as custom user statuses) were received. ([\#7376](https://github.com/matrix-org/synapse/issues/7376))
- Restore compatibility with non-compliant clients during the user interactive authentication process, fixing a problem introduced in v1.13.0rc1. ([\#7483](https://github.com/matrix-org/synapse/issues/7483))
Internal Changes
----------------
- Fix linting errors in new version of Flake8. ([\#7470](https://github.com/matrix-org/synapse/issues/7470))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl69IQ8ACgkQOSor00I9
eP87lAf8DK+v6cs2U0BoD5opzQ7ZazJT6JYTmnMBaTzHU6Wx20V2ttkF7Vpwm3WU
Zsz0048tdYtHFyYBQ1kF5RNIBBJwV8SA/QUcPkR7FVpwZMLR2q4aJn0EE7kC9OMf
tYsmdbHeBdyfLXpXzazxWlgHquLyEIt52ykAcCphjx/Jl2fAExFEhtfsxpECoJ2f
8Dqhjg3WFjd6QWU6AFkElbwHUYCdIWdJOcsC8N1p8OvBmDz5QXv/RlYipHE00Cpx
QQQOgEjdRc6dlz2mbetMklnfII3p2kO9bzNdmEpOzT0Zt7nFaGdntW4I1QA0yJfa
gows9bYMzhqYk7YSiyTYOZ4qyavVtw==
=N/zZ
-----END PGP SIGNATURE-----
Merge tag 'v1.13.0rc2' into develop
Synapse 1.13.0rc2 (2020-05-14)
==============================
Bugfixes
--------
- Fix a long-standing bug which could cause messages not to be sent over federation, when state events with state keys matching user IDs (such as custom user statuses) were received. ([\#7376](https://github.com/matrix-org/synapse/issues/7376))
- Restore compatibility with non-compliant clients during the user interactive authentication process, fixing a problem introduced in v1.13.0rc1. ([\#7483](https://github.com/matrix-org/synapse/issues/7483))
Internal Changes
----------------
- Fix linting errors in new version of Flake8. ([\#7470](https://github.com/matrix-org/synapse/issues/7470))
* release-v1.13.0:
Don't UPGRADE database rows
RST indenting
Put rollback instructions in upgrade notes
Fix changelog typo
Oh yeah, RST
Absolute URL it is then
Fix upgrade notes link
Provide summary of upgrade issues in changelog. Fix )
Move next version notes from changelog to upgrade notes
Changelog fixes
1.13.0rc1
Documentation on setting up redis (#7446)
Rework UI Auth session validation for registration (#7455)
Fix errors from malformed log line (#7454)
Drop support for redis.dbid (#7450)
By persisting the user interactive authentication sessions to the database, this fixes
situations where a user hits different works throughout their auth session and also
allows sessions to persist through restarts of Synapse.
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
Add changelog
Save retrieved keys to the db
lint
Fix and de-brittle remote result dict processing
Use query_user_devices instead, assume only master, self_signing key types
Make changelog more useful
Remove very specific exception handling
Wrap get_verify_key_from_cross_signing_key in a try/except
Note that _get_e2e_cross_signing_verify_key can raise a SynapseError
lint
Add comment explaining why this is useful
Only fetch master and self_signing key types
Fix log statements, docstrings
Remove extraneous items from remote query try/except
lint
Factor key retrieval out into a separate function
Send device updates, modeled after SigningKeyEduUpdater._handle_signing_key_updates
Update method docstring
* Pull Sentinel out of LoggingContext
... and drop a few unnecessary references to it
* Factor out LoggingContext.current_context
move `current_context` and `set_context` out to top-level functions.
Mostly this means that I can more easily trace what's actually referring to
LoggingContext, but I think it's generally neater.
* move copy-to-parent into `stop`
this really just makes `start` and `stop` more symetric. It also means that it
behaves correctly if you manually `set_log_context` rather than using the
context manager.
* Replace `LoggingContext.alive` with `finished`
Turn `alive` into `finished` and make it a bit better defined.
If an error happened while processing a SAML AuthN response, or a client
ends up doing a `GET` request to `/authn_response`, then render a
customisable error page rather than a confusing error.
When we get an invite over federation, store the room version in the rooms table.
The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
This is intended as a precursor to storing room versions when we receive an
invite over federation, but has the happy side-effect of fixing #3374 at last.
In short: change the store_room with try/except to a proper upsert which
updates the right columns.