While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:
* Rather undocumented, and
* causing a *lot* of config checks
This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation.
Best to be reviewed commit-by-commit.
* Expose `return_html_error`, and allow it to take a Jinja2 template instead of a raw string
* Clean up exception handling in SAML2ResponseResource
* use the existing code in `return_html_error` instead of re-implementing it
(giving it a jinja2 template rather than inventing a new form of template)
* do the exception-catching in the REST layer rather than in the handler
layer, to make sure we catch all exceptions.
It looks like `user_device_resync` was ignoring cross-signing keys from the results received from the remote server. This patch fixes this, by processing these keys using the same process `_handle_signing_key_updates` does (and effectively factor that part out of that function).
The query keeps showing up in my slow query log.
This changes the plan under the top-level Sort node from
```
WindowAgg (cost=280335.88..292963.15 rows=561212 width=80) (actual time=138.651..160.562 rows=27112 loops=1)
-> Sort (cost=280335.88..281738.91 rows=561212 width=84) (actual time=138.597..140.622 rows=27112 loops=1)
Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group
Sort Method: quicksort Memory: 4581kB
-> Nested Loop (cost=2.83..226745.22 rows=561212 width=84) (actual time=21.548..47.657 rows=27112 loops=1)
-> HashAggregate (cost=2.27..3.28 rows=101 width=8) (actual time=21.526..21.535 rows=20 loops=1)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..2.02 rows=101 width=8) (actual time=21.280..21.493 rows=20 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.005..0.991 rows=1356 loops=20)
Index Cond: (state_group = state.state_group)
```
to
```
Nested Loop (cost=2.83..226745.22 rows=561212 width=84) (actual time=24.194..52.834 rows=27112 loops=1)
-> HashAggregate (cost=2.27..3.28 rows=101 width=8) (actual time=24.130..24.138 rows=20 loops=1)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..2.02 rows=101 width=8) (actual time=23.887..24.113 rows=20 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.016..1.159 rows=1356 loops=20)
Index Cond: (state_group = state.state_group)
```
This cuts the execution time from ~190ms to ~130ms, i.e. a reduction
of ~30%.
The full plans are visualised at https://explain.depesz.com/s/WpbT and
https://explain.depesz.com/s/KlEk
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Without this patch, if an error happens which isn't caught by `user_device_resync`, then `_maybe_retry_device_resync` would fail, without retrying the next users in the iteration. This patch fixes this so that it now only logs an error in this case.
==============================
Bugfixes
--------
- Fix cache config to not apply cache factor to event cache. Regression in v1.14.0rc1. ([\#7578](https://github.com/matrix-org/synapse/issues/7578))
- Fix bug where `ReplicationStreamer` was not always started when replication was enabled. Bug introduced in v1.14.0rc1. ([\#7579](https://github.com/matrix-org/synapse/issues/7579))
- Fix specifying individual cache factors for caches with special characters in their name. Regression in v1.14.0rc1. ([\#7580](https://github.com/matrix-org/synapse/issues/7580))
Improved Documentation
----------------------
- Fix the OIDC `client_auth_method` value in the sample config. ([\#7581](https://github.com/matrix-org/synapse/issues/7581))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdVkXOgzrGzds0jtrHgFcFF8ZFs0FAl7Oh9sACgkQHgFcFF8Z
Fs2OmQ//SlksllrJ7egG0JeprfCCz9TJt7cYJJhCprqfBVNs8lIc+UT/NX0/duGe
g2KC4kPGEyusRmGSV43Yt/m/qqhcdM4tXq4iErgHG/xOTqNN/GVvrE3RaUvo9ydS
9IdeIkDON5ylEe8sSigbBGUnpCS20Stch1z2edoUSQHNnMMSgmTNpFaZnCEXc+Us
EYv+HfCeShdLXbzMioYj6B5qNiRnG+hJmz+h40Bmp/HAuZRyp4kx5EawAgMXIDfy
DGWn3H7TksqC+7zETAlQgMwFzggwQx64Dpa9w2RFAchUCG6bi8pM2U3doVDGlu1i
4HcltfBE+fE5Sy2tT2zz8qpaGGFwAp6K/c+4h29PwVBKSHRN4nepSqNHHb3fA1Ea
GI8DWiHGyZWzlMmKI4x85lMFyAGvGJiO1Jo8icietJHZ3+7tr6SUlnIPwlLFdkUv
xCKmlrYzwDO5enzoF6HuddnMLbtE304Ckr+vj5gWpBwYf3FIXpUS101YBV8zXTsB
NJBMu2fjYEkPQ/d9YjWRZwL312Cb68Kytlp2ETiTuuyfLCA5Df2wdA6yReemyg//
ZFfN1Z/Dc6p6MVRF3sf38jcWKX3r9ErNQ0p5q//uo4JbpnMW7FLXiSv/xonj4JTv
VLSFy6F0YIIVTol4H9Fj7f3iq8/zsJe3kBE/Kycd4uhplmfvK2w=
=BVTL
-----END PGP SIGNATURE-----
Merge tag 'v1.14.0rc2' into develop
Synapse 1.14.0rc2 (2020-05-27)
==============================
Bugfixes
--------
- Fix cache config to not apply cache factor to event cache. Regression in v1.14.0rc1. ([\#7578](https://github.com/matrix-org/synapse/issues/7578))
- Fix bug where `ReplicationStreamer` was not always started when replication was enabled. Bug introduced in v1.14.0rc1. ([\#7579](https://github.com/matrix-org/synapse/issues/7579))
- Fix specifying individual cache factors for caches with special characters in their name. Regression in v1.14.0rc1. ([\#7580](https://github.com/matrix-org/synapse/issues/7580))
Improved Documentation
----------------------
- Fix the OIDC `client_auth_method` value in the sample config. ([\#7581](https://github.com/matrix-org/synapse/issues/7581))
'client_auth_method' commented out value was erronously 'client_auth_basic',
when code and docstring says it should be 'client_secret_basic'.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
The bg update never managed to complete, because it kept being interrupted by
transactions which want to take a lock.
Just doing it in the foreground isn't that bad, and is a good deal simpler.
A couple of changes of significance:
* remove the `_last_ack < federation_position` condition, so that
updates will still be correctly processed after restart
* Correctly wire up send_federation_ack to the right class.
we can use `make_in_list_sql_clause` rather than doing our own half-baked
equivalent, which has the benefit of working just fine with empty lists.
(This has quite a lot of tests, so I think it's pretty safe)
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).
Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.
People probably want to look at this commit by commit.
Instead of doing a complicated dance of deleting and moving aliases one
by one, which sends a canonical alias update into the old room for each
one, lets do it all in one go.
This also changes the function to move *all* local alias events to the new
room, however that happens later on anyway.