This fixes a bug where having multiple callers waiting on the same
stream and position will cause it to try and compare two deferreds,
which fails (due to the sorted list having an entry of `Tuple[int,
Deferred]`).
This is split out from https://github.com/matrix-org/synapse/pull/7438, which had gotten rather large.
`LoginRestServlet` has a couple helper methods, `login_submission_legacy_convert` and `login_id_thirdparty_from_phone`. They're primarily used for converting legacy user login submissions to "identifier" dicts ([see spec](https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-login)). Identifying information such as usernames or 3PID information used to be top-level in the login body. They're now supposed to be put inside an [identifier](https://matrix.org/docs/spec/client_server/r0.6.1#identifier-types) parameter instead.
#7438's purpose is to allow using the new identifier parameter during User-Interactive Authentication, which is currently handled in AuthHandler. That's why I've moved these helper methods there. I also moved the refactoring of these method from #7438 as they're relevant.
Small cleanup PR.
* Removed the unused `is_guest` argument
* Added a safeguard to a (currently) impossible code path, fixing static checking at the same time.
==============================
Bugfixes
--------
- Fix a bug introduced in v1.19.0 where appservices with ratelimiting disabled would still be ratelimited when joining rooms. ([\#8139](https://github.com/matrix-org/synapse/issues/8139))
- Fix a bug introduced in v1.19.0 that would cause e.g. profile updates to fail due to incorrect application of rate limits on join requests. ([\#8153](https://github.com/matrix-org/synapse/issues/8153))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdVkXOgzrGzds0jtrHgFcFF8ZFs0FAl9FIKkACgkQHgFcFF8Z
Fs3zIw/9H40ieb73Iay6ecQeOSfHiMVMzvJqYbKgho/a6h5JDHir+NGpwuLFdeGM
eHR07QkQgUU9+dLNTMOQpCKTIsU70wvzH1vTDINS3ChjnRBdrHKqhAG6ZyEt1dJx
kxYX54zsQUwiwshMKbJ+DPclHaBFnL+SY5OFfqCNjvaNob59DbHAL3tlSktPc2go
tGmj81q0dWY6maMCGI3IIYcrW7oLi+4TwosZual5Hz/xgRBiGaKHXRIJnInvkXpl
R+rSOmpYraapfDPHzPyQgLN4Dt7aAccGho843tt7dAVfd2GRSaUkLGXVXCdoruQG
CRjY1P3BRBzRBx6o80Uw7Ah0hsoVgpJTSzY008KigJce+IiRWG5sgPjoubhfK0MA
BqzmCa3/lrR+/WUOf4+w6HSfRncKawgAp7Y7wVj4nQF5fc8mwpFLz4pA/C2YOyjp
nYXCHf60/KSBDhnr0ZRAhAby4MJoYSf03djFG1oef5SVzOzHD7zho9oBnEz15Tab
XXkg1iJ7AhNFiQjsY4H1sl2onoF4T7B53NOnUEwD0oll+nXIYGe6hlNuq6x4j6l9
39ZlMoe9zK28LoKKWa1RDug8z+PmarKRJ2zATlHIb2RGeVX+oFfaKVsIbJtupJVC
8HSFt7gcgLCdUazk6taKpOHeVyGxK6WUkLnCMHzD2rzPhzpSyws=
=0rHM
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdVkXOgzrGzds0jtrHgFcFF8ZFs0FAl9FJK4ACgkQHgFcFF8Z
Fs0O2xAAgxqCfAHylVuwunRppwV49Bvq6H0mMM29hYBogGB5cmh1vRnUm5GsPQt0
9vTKlwMz6XKjLs3TLqYsnZfXqK+lyaN0xFXd7xWCNzFtEXoIvDnq+u3h5WGaKOC9
PcXb2LSZoHC5yECBpoh58ZQEPKtYILEjo+OSDboIqHz4N5HSjMSPGPvMkUn6xMNG
roWTAWKPd9juyE2dPzZxIoBWwJGn3D8EMkeTQDlExTTvmnDPyPvJ5MVUY/xaHLgy
XV8lapFu/SzWAKotc5+9qkVN64obaxwovYTU9JnlqEc5+WlD+Jl+g0258Q1bV1H9
341aQQJX08iYw3xw13xVgT0zLPRbp82O3/SHC3S1nz27HUWKXqUtsm6woDbgHIz5
UPvKFQsp2dEN4tFXxkEHiossIVNGuXdRYwEjFQrxOwayCuS4cQwDADhqnzDU4hio
LSVhtxs9rgLps4iKpcaRAqK8kifTrsomlQfh/7axPJQ43pmBR2PiItetlBW/9Le6
KTH90ghLQzJwKFkIcFcvPhFMVqSyXI32+g5++YAPmNVy9M/7LdJxuEc9ifTWgwds
LtV3/F8xlqd0qwl5IbwC6Wf19N06jdlRv/q1zL/Hb6qu3FLQeGd+/1aiC0rsbq15
grdHVZkZi1iVF/zrOx24ctxQvgLyGHA+M7n/oIaIgxlT1S6+FUI=
=49ZC
-----END PGP SIGNATURE-----
Merge tag 'v1.19.1rc1' into develop
Synapse 1.19.1rc1 (2020-08-25)
==============================
Bugfixes
--------
- Fix a bug introduced in v1.19.0 where appservices with ratelimiting disabled would still be ratelimited when joining rooms. ([\#8139](https://github.com/matrix-org/synapse/issues/8139))
- Fix a bug introduced in v1.19.0 that would cause e.g. profile updates to fail due to incorrect application of rate limits on join requests. ([\#8153](https://github.com/matrix-org/synapse/issues/8153))
Add new method ratelimiter.can_requester_do_action and ensure that appservices are exempt from being ratelimited.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
* Don't raise session_id errors on submit_token if request_token_inhibit_3pid_errors is set
* Changelog
* Also wait some time before responding to /requestToken
* Incorporate review
* Update synapse/storage/databases/main/registration.py
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Incorporate review
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Add new method ratelimiter.can_requester_do_action and ensure that appservices are exempt from being ratelimited.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
It's just a thin wrapper around two ID gens to make `get_current_token`
and `get_next` return tuples. This can easily be replaced by calling the
appropriate methods on the underlying ID gens directly.
The function is used for two purposes: 1) for subscribers of streams to
get a token they can use to get further updates with, and 2) for
replication to track position of the writers of the stream.
For streams with a single writer the two scenarios produce the same
result, however the situation becomes complicated for streams with
multiple writers. The current `MultiWriterIdGenerator` does not
correctly handle the first case (which is not an issue as its only used
for the `caches` stream which nothing subscribes to outside of
replication).
If we got an error persisting an event, we would try to remove the push actions
asynchronously, which would lead to a 'Re-starting finished log context'
warning.
I don't think there's any need for this to be asynchronous.
We do this to prevent foot guns. The default config uses a MemoryFilter,
but users are free to change to logging to files directly. If they do
then they have to ensure to set the `filters: [context]` on the right
handler, otherwise records get written with the wrong context.
Instead we move the logic to happen when we generate a record, which is
when we *log* rather than *handle*.
(It's possible to add filters to loggers in the config, however they
don't apply to descendant loggers and so they have to be manually set on
*every* logger used in the code base)
We do this to prevent foot guns. The default config uses a MemoryFilter,
but users are free to change to logging to files directly. If they do
then they have to ensure to set the `filters: [context]` on the right
handler, otherwise records get written with the wrong context.
Instead we move the logic to happen when we generate a record, which is
when we *log* rather than *handle*.
(It's possible to add filters to loggers in the config, however they
don't apply to descendant loggers and so they have to be manually set on
*every* logger used in the code base)
c.f. #8021
A lot of the code here is to change the `Completed 200 OK` logging to include the request URI so that we can drop the `Sending request...` log line.
Some notes:
1. We won't log retries, which may be confusing considering the time taken log line includes retries and sleeps.
2. The `_send_request_with_optional_trailing_slash` will always be logged *without* the forward slash, even if it succeeded only with the forward slash.
* Change default log config to buffer by default.
This batches up writes to the filesystem, which is more efficient for
disk I/O. This means that it can take some time for logs to get written
to disk. Note that ERROR logs (and above) immediately flush the buffer.
This only effects new installs, as we only write the log config if
started with `--generate-config` (in the same way we do for generating
signing keys).
* Default to keeping last 4 days of logs.
This hopefully reduces the amount of logs kept for new servers. Keeping
the last 1GB of logs is likely overkill for new servers, but equally may
not be enough for busy ones.
Instead, we keep the last four days worth of logs, enough so that admins
can investigate any problems that happened over e.g. a long weekend.
Duplicating function signatures between server.py and server.pyi is
silly. This commit changes that by changing all `build_*` methods to
`get_*` methods and changing the `_make_dependency_method` to work work
as a descriptor that caches the produced value.
There are some changes in other files that were made to fix the typing
in server.py.
This has long been something I've wanted to do. Basically the `Daemonize` code
is both too flexible and not flexible enough, in that it offers a bunch of
features that we don't use (changing UID, closing FDs in the child, logging to
syslog) and doesn't offer a bunch that we could do with (redirecting stdout/err
to a file instead of /dev/null; having the parent not exit until the child is
running).
As a first step, I've lifted the Daemonize code and removed the bits we don't
use. This should be a non-functional change. Fixing everything else will come
later.
We've [decided](https://github.com/matrix-org/synapse/issues/5253#issuecomment-665976308) to remove the signature check for v1 lookups.
The signature check has been removed in v2 lookups. v1 lookups are currently deprecated. As mentioned in the above linked issue, this verification was causing deployments for the vector.im and matrix.org IS deployments, and this change is the simplest solution, without being unjustified.
Implementations are encouraged to use the v2 lookup API as it has [increased privacy benefits](https://github.com/matrix-org/matrix-doc/pull/2134).
`StatsHandler` handles updates to the `current_state_delta_stream`, and updates room stats such as the amount of state events, joined users, etc.
However, it counts every new join membership as a new user entering a room (and that user being in another room), whereas it's possible for a user's membership status to go from join -> join, for instance when they change their per-room profile information.
This PR adds a check for join->join membership transitions, and bails out early, as none of the further checks are necessary at that point.
Due to this bug, membership stats in many rooms have ended up being wildly larger than their true values. I am not sure if we also want to include a migration step which recalculates these statistics (possibly using the `_populate_stats_process_rooms` bg update).
Bug introduced in the initial implementation https://github.com/matrix-org/synapse/pull/4338.
Thanks to some slightly overzealous cleanup in the
`delete_old_current_state_events`, it's possible to end up with no
`event_forward_extremities` in a room where we have outstanding local
invites. The user would then get a "no create event in auth events" when trying
to reject the invite.
We can hack around it by using the dangling invite as the prev event.
==============================
Bugfixes
--------
- Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
- Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967))
Internal Changes
----------------
- Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl8f/f8ACgkQOSor00I9
eP8/Uwf8CiVWvrBsmFZMvxJDkUWm0/f1kN4IQdm8ibDtyNyvFUx+Y1K8KOQS+VwG
a3bZqSC2Vv2sO9O9kR+V2tk831l+ujO0Nlaohuqyvhcl9lzh04rRYI9x9IHlAq2H
WPb0NMLwMufL6YkXDBwZT/G9TVW1vLRGASu4f7X2rXqek34VNVgYbg1hB2dp4dDa
wjKk3iBZ6h34IhKPgu0sLBUcyvX4U5xdOHjEG3HXvNnvDNO0HMD8rGB7065vFMD6
PH4nUK/h+RL0UBs2sJOMK1ZazFUODdURwANJQNAQ6pNvf9/RWgw2okka2bYIcmQQ
UT7tiwMsBvKdy4PER5fcDX3COY16qw==
=Q+bI
-----END PGP SIGNATURE-----
Merge tag 'v1.18.0rc2' into develop
Synapse 1.18.0rc2 (2020-07-28)
==============================
Bugfixes
--------
- Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
- Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967))
Internal Changes
----------------
- Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
IIRC this doesn't break tests because its only hit on reconnection, or something.
Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB), the problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.
Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.
Handling of incoming typing stream updates from replication was not
hooked up on master, effecting set ups where typing was handled on a
different worker.
This is really only a problem if the master process is also handling
sync requests, which is unlikely for those that are at the stage of
moving typing off.
The other observable effect is that if a worker restarts or a
replication connect drops then the typing worker will issue a
`POSITION typing`, triggering master process to try and stream *all*
typing updates from position 0.
Fixes#7907
If we send out an event which refers to `prev_events` which other servers in
the federation are missing, then (after a round or two of backfill attempts),
they will end up asking us for `/state_ids` at a particular point in the DAG.
As per https://github.com/matrix-org/synapse/issues/7893, this is quite
expensive, and we tend to see lots of very similar requests around the same
time.
We can therefore handle this much more efficiently by using a cache, which (a)
ensures that if we see the same request from multiple servers (or even the same
server, multiple times), then they share the result, and (b) any other servers
that miss the initial excitement can also benefit from the work.
[It's interesting to note that `/state` has a cache for exactly this
reason. `/state` is now essentially unused and replaced with `/state_ids`, but
evidently when we replaced it we forgot to add a cache to the new endpoint.]
For inbound federation requests, if a given remote server makes too many
requests at once, we start stacking them up rather than processing them
immediatedly.
However, that means that there is a fair chance that the requesting server will
disconnect before we start processing the request. In that case, if it was a
read-only request (ie, a GET request), there is absolutely no point in
building a response (and some requests are quite expensive to handle).
Even in the case of a POST request, one of two things will happen:
* Most likely, the requesting server will retry the request and we'll get the
information anyway.
* Even if it doesn't, the requesting server has to assume that we didn't get
the memo, and act accordingly.
In short, we're better off aborting the request at this point rather than
ploughing on with what might be a quite expensive request.
The [postgres setup docs](https://github.com/matrix-org/synapse/blob/develop/docs/postgres.md#set-up-database) recommend setting up your database with user `synapse_user`.
However, uncommenting the postgres defaults in the sample config leave you with user `synapse`.
This PR switches the sample config to recommend `synapse_user`. Took a me a second to figure this out, so assume this will beneficial to others.
It serves no purpose and updating everytime we write to the device inbox
stream means all such transactions will conflict, causing lots of
transaction failures and retries.
When we get behind on replication, we tend to stack up background processes
behind a linearizer. Bg processes are heavy (particularly with respect to
prometheus metrics) and linearizers aren't terribly efficient once the queue
gets long either.
A better approach is to maintain a queue of requests to be processed, and
nominate a single process to work its way through the queue.
Fixes: #7444
It was correct at the time of our friend Jorik writing it (checking
git blame), but the world has moved now and it is no longer a
generator.
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
When considering rooms to clean up in `delete_old_current_state_events`, skip
rooms which we are creating, which otherwise look a bit like rooms we have
left.
Fixes#7834.
As far as I can tell from the sentry logs, the only time this has actually done
anything in the last two years is when we had two master workers running at
once, and even then, it made a bit of a mess of it (see
https://github.com/matrix-org/synapse/issues/7845#issuecomment-658238739).
Generally I feel like this code is doing more harm than good.
The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.
When fetching the state of a room over federation we receive the event
IDs of the state and auth chain. We then fetch those events that we
don't already have.
However, we used a function that recursively fetched any missing auth
events for the fetched events, which can lead to a lot of recursion if
the server is missing most of the auth chain. This work is entirely
pointless because would have queued up the missing events in the auth
chain to be fetched already.
Let's just diable the recursion, since it only gets called from one
place anyway.
Fixes#2181.
The basic premise is that, when we
fail to reject an invite via the remote server, we can generate our own
out-of-band leave event and persist it as an outlier, so that we have something
to send to the client.
This table is no longer used, so we may as well stop populating it. Removing it
would prevent people rolling back to older releases of Synapse, so that can
happen in a future release.
* Fix spec compliance; tweaks without values are valid
(default to True, which is only concretely specified for
`highlight`, but it seems only reasonable to generalise)
* Changelog for 7766.
* Add documentation to `tweaks_for_actions`
May as well tidy up when I'm here.
* Add a test for `tweaks_for_actions`
Fixes https://github.com/matrix-org/synapse/issues/7641
The package was pinned to <0.8.0 without an obvious reasoning with
7ad1d7635
in https://github.com/matrix-org/synapse/pull/5636
while the version selection looks to just try to exclude an arbitrary
next minor version number that might introduce API breaking changes.
Selecting the next minor number might be a good conservative selection.
Downstream distributions already reported success patching out the version
requirements.
This also fixes the integration of upgraded packages into openSUSE packages,
e.g. for openSUSE Tumbleweed which already ships prometheus_client >= 0.8 .
Signed-off-by: Oliver Kurz <okurz@suse.de>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
==============================
Synapse 1.16.0rc2 includes the security fixes released with Synapse 1.15.2.
Please see [below](https://github.com/matrix-org/synapse/blob/master/CHANGES.md#synapse-1152-2020-07-02) for more details.
Improved Documentation
----------------------
- Update postgres image in example `docker-compose.yaml` to tag `12-alpine`. ([\#7696](https://github.com/matrix-org/synapse/issues/7696))
Internal Changes
----------------
- Add some metrics for inbound and outbound federation latencies: `synapse_federation_server_pdu_process_time` and `synapse_event_processing_lag_by_event`. ([\#7771](https://github.com/matrix-org/synapse/issues/7771))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl79+qgACgkQM/xY9qcR
MEhcaRAAjWLW3ojN1F0DUfE85jziZK2VdnMQC3g+uEOLX6QRbfqFNaNNMjLdK+vl
K/+2ZoHkRsg6g8noSPhPmI1z1+hb5xDJaxjltzHxonIipW8XSU8o2PQMkf8O/BAy
VS58y3GyLkhEgzWC+/hcII+LBgcqXpLuNM0xrKTHmxclIjdewlwe1v+hxkP+6wsX
9Whhn1f4sNHrCtyFVK9uzMFcVyzcQaiWZRjEDMj2uR7rWT6UbCUifN/G4fWmtGbY
xWoNoC4Qv8xiqXOG4U7juPp9T3bRyWMKyjBFM5PWO6Ec2zfafDyFzhBxJhlQhODG
g21tS4PowX/dM/pBpJFEOPh1BVrPZzzTD+YMmTcd3NO79HeaQGqEX/+tzFCFUyPp
0daJK3Y85+l5w/M09WU8DDN8CiR3PFJyGDIZp+nweMsiJZkbEbLOkh1tx6TL+5/6
zwewU6cq8nTVGrn53Tn58l8C7Sj4w+Qk+1XDzymAoidyoWqAKW9Y/fw53PaViUSx
voDu0rpsEUXR1OzCBG8SAPQCFy9gdEWV04OvIpzHuq2uojkz66f7NAXy+Wz+Occ9
AYb/s6Ei80bGCLgRd5jg+myqavwRbzCyv+LIC6dxpopxZJ3AzrFuD11eXKtrIxOC
FZYf3U4KeBk4Q9TV5IFV1xcGFrq5aK36LdmP6WOsEl3PXVT9p/Q=
=YaJn
-----END PGP SIGNATURE-----
Merge tag 'v1.16.0rc2' into develop
Synapse 1.16.0rc2 (2020-07-02)
==============================
Synapse 1.16.0rc2 includes the security fixes released with Synapse 1.15.2.
Please see [below](https://github.com/matrix-org/synapse/blob/master/CHANGES.md#synapse-1152-2020-07-02) for more details.
Improved Documentation
----------------------
- Update postgres image in example `docker-compose.yaml` to tag `12-alpine`. ([\#7696](https://github.com/matrix-org/synapse/issues/7696))
Internal Changes
----------------
- Add some metrics for inbound and outbound federation latencies: `synapse_federation_server_pdu_process_time` and `synapse_event_processing_lag_by_event`. ([\#7771](https://github.com/matrix-org/synapse/issues/7771))
State res v2 across large data sets can be very CPU intensive, and if
all the relevant events are in the cache the algorithm will run from
start to finish within a single reactor tick. This can result in
blocking the reactor tick for several seconds, which can have major
repercussions on other requests.
To fix this we simply add the occaisonal `sleep(0)` during iterations to
yield execution until the next reactor tick. The aim is to only do this
for large data sets so that we don't impact otherwise quick resolutions.=
HTTP requires the response to contain a Content-Length header unless chunked encoding is being used.
Prometheus metrics endpoint did not set this, causing software such as prometheus-proxy to not be able to scrape synapse for metrics.
Signed-off-by: Christian Svensson <blue@cmd.nu>
Older versions of `parameterized` package have no `parameterized_class` decorator. This decorator is used in tests.
Signed-off-by: Oleg Girko <ol@infoserver.lv>
* Always return an unread_count in get_unread_event_push_actions_by_room_for_user
* Don't always expect unread_count to be there so we don't take out sync entirely if something goes wrong
This requires a new config option to specify which media repo should be
responsible for running background jobs to e.g. clear out expired URL
preview caches.
The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill of `db_query_to_update_function` function.
This ended up being a bit more invasive than I'd hoped for (not helped by
generic_worker duplicating some of the code from homeserver), but hopefully
it's an improvement.
The idea is that, rather than storing unstructured `dict`s in the config for
the listener configurations, we instead parse it into a structured
`ListenerConfig` object.
Fixes https://github.com/matrix-org/synapse/issues/7683
Broke in: #7649
We had a `yield` acting on a coroutine. To be fair this one is a bit difficult to notice as there's a function in the middle that just passes the coroutine along.
The spec [states](https://matrix.org/docs/spec/client_server/r0.6.1#phone-number) that `m.id.phone` requires the field `country` and `phone`.
In Synapse, we've been enforcing `country` and `number`.
I am not currently sure whether this affects any client implementations.
This issue was introduced in #1994.
Fixes https://github.com/matrix-org/synapse/issues/2431
Adds config option `encryption_enabled_by_default_for_room_type`, which determines whether encryption should be enabled with the default encryption algorithm in private or public rooms upon creation. Whether the room is private or public is decided based upon the room creation preset that is used.
Part of this PR is also pulling out all of the individual instances of `m.megolm.v1.aes-sha2` into a constant variable to eliminate typos ala https://github.com/matrix-org/synapse/pull/7637
Based on #7637
* Ensure account data stream IDs are unique.
The account data stream is shared between three tables, and the maximum
allocated ID was tracked in a dedicated table. Updating the max ID
happened outside the transaction that allocated the ID, leading to a
race where if the server was restarted then the same ID could be
allocated but the max ID failed to be updated, leading it to be reused.
The ID generators have support for tracking across multiple tables, so
we may as well use that instead of a dedicated table.
* Fix bug in account data replication stream.
If the same stream ID was used in both global and room account data then
the getting updates for the replication stream would fail due to
`heapq.merge(..)` trying to compare a `str` with a `None`. (This is
because you'd have two rows like `(534, '!room')` and `(534, None)` from
the room and global account data tables).
Fix is just to order by stream ID, since we don't rely on the ordering
beyond that. The bug where stream IDs can be reused should be fixed now,
so this case shouldn't happen going forward.
Fixes#7617
While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:
* Rather undocumented, and
* causing a *lot* of config checks
This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation.
Best to be reviewed commit-by-commit.
* Expose `return_html_error`, and allow it to take a Jinja2 template instead of a raw string
* Clean up exception handling in SAML2ResponseResource
* use the existing code in `return_html_error` instead of re-implementing it
(giving it a jinja2 template rather than inventing a new form of template)
* do the exception-catching in the REST layer rather than in the handler
layer, to make sure we catch all exceptions.
It looks like `user_device_resync` was ignoring cross-signing keys from the results received from the remote server. This patch fixes this, by processing these keys using the same process `_handle_signing_key_updates` does (and effectively factor that part out of that function).
The query keeps showing up in my slow query log.
This changes the plan under the top-level Sort node from
```
WindowAgg (cost=280335.88..292963.15 rows=561212 width=80) (actual time=138.651..160.562 rows=27112 loops=1)
-> Sort (cost=280335.88..281738.91 rows=561212 width=84) (actual time=138.597..140.622 rows=27112 loops=1)
Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group
Sort Method: quicksort Memory: 4581kB
-> Nested Loop (cost=2.83..226745.22 rows=561212 width=84) (actual time=21.548..47.657 rows=27112 loops=1)
-> HashAggregate (cost=2.27..3.28 rows=101 width=8) (actual time=21.526..21.535 rows=20 loops=1)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..2.02 rows=101 width=8) (actual time=21.280..21.493 rows=20 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.005..0.991 rows=1356 loops=20)
Index Cond: (state_group = state.state_group)
```
to
```
Nested Loop (cost=2.83..226745.22 rows=561212 width=84) (actual time=24.194..52.834 rows=27112 loops=1)
-> HashAggregate (cost=2.27..3.28 rows=101 width=8) (actual time=24.130..24.138 rows=20 loops=1)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..2.02 rows=101 width=8) (actual time=23.887..24.113 rows=20 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.016..1.159 rows=1356 loops=20)
Index Cond: (state_group = state.state_group)
```
This cuts the execution time from ~190ms to ~130ms, i.e. a reduction
of ~30%.
The full plans are visualised at https://explain.depesz.com/s/WpbT and
https://explain.depesz.com/s/KlEk
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Without this patch, if an error happens which isn't caught by `user_device_resync`, then `_maybe_retry_device_resync` would fail, without retrying the next users in the iteration. This patch fixes this so that it now only logs an error in this case.
==============================
Bugfixes
--------
- Fix cache config to not apply cache factor to event cache. Regression in v1.14.0rc1. ([\#7578](https://github.com/matrix-org/synapse/issues/7578))
- Fix bug where `ReplicationStreamer` was not always started when replication was enabled. Bug introduced in v1.14.0rc1. ([\#7579](https://github.com/matrix-org/synapse/issues/7579))
- Fix specifying individual cache factors for caches with special characters in their name. Regression in v1.14.0rc1. ([\#7580](https://github.com/matrix-org/synapse/issues/7580))
Improved Documentation
----------------------
- Fix the OIDC `client_auth_method` value in the sample config. ([\#7581](https://github.com/matrix-org/synapse/issues/7581))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdVkXOgzrGzds0jtrHgFcFF8ZFs0FAl7Oh9sACgkQHgFcFF8Z
Fs2OmQ//SlksllrJ7egG0JeprfCCz9TJt7cYJJhCprqfBVNs8lIc+UT/NX0/duGe
g2KC4kPGEyusRmGSV43Yt/m/qqhcdM4tXq4iErgHG/xOTqNN/GVvrE3RaUvo9ydS
9IdeIkDON5ylEe8sSigbBGUnpCS20Stch1z2edoUSQHNnMMSgmTNpFaZnCEXc+Us
EYv+HfCeShdLXbzMioYj6B5qNiRnG+hJmz+h40Bmp/HAuZRyp4kx5EawAgMXIDfy
DGWn3H7TksqC+7zETAlQgMwFzggwQx64Dpa9w2RFAchUCG6bi8pM2U3doVDGlu1i
4HcltfBE+fE5Sy2tT2zz8qpaGGFwAp6K/c+4h29PwVBKSHRN4nepSqNHHb3fA1Ea
GI8DWiHGyZWzlMmKI4x85lMFyAGvGJiO1Jo8icietJHZ3+7tr6SUlnIPwlLFdkUv
xCKmlrYzwDO5enzoF6HuddnMLbtE304Ckr+vj5gWpBwYf3FIXpUS101YBV8zXTsB
NJBMu2fjYEkPQ/d9YjWRZwL312Cb68Kytlp2ETiTuuyfLCA5Df2wdA6yReemyg//
ZFfN1Z/Dc6p6MVRF3sf38jcWKX3r9ErNQ0p5q//uo4JbpnMW7FLXiSv/xonj4JTv
VLSFy6F0YIIVTol4H9Fj7f3iq8/zsJe3kBE/Kycd4uhplmfvK2w=
=BVTL
-----END PGP SIGNATURE-----
Merge tag 'v1.14.0rc2' into develop
Synapse 1.14.0rc2 (2020-05-27)
==============================
Bugfixes
--------
- Fix cache config to not apply cache factor to event cache. Regression in v1.14.0rc1. ([\#7578](https://github.com/matrix-org/synapse/issues/7578))
- Fix bug where `ReplicationStreamer` was not always started when replication was enabled. Bug introduced in v1.14.0rc1. ([\#7579](https://github.com/matrix-org/synapse/issues/7579))
- Fix specifying individual cache factors for caches with special characters in their name. Regression in v1.14.0rc1. ([\#7580](https://github.com/matrix-org/synapse/issues/7580))
Improved Documentation
----------------------
- Fix the OIDC `client_auth_method` value in the sample config. ([\#7581](https://github.com/matrix-org/synapse/issues/7581))
'client_auth_method' commented out value was erronously 'client_auth_basic',
when code and docstring says it should be 'client_secret_basic'.
Signed-off-by: Jason Robinson <jasonr@matrix.org>