Commit Graph

17629 Commits

Author SHA1 Message Date
Richard van der Hoff
c1ef579b63
Add prometheus metrics to track federation delays (#8430)
Add a pair of federation metrics to track the delays in sending PDUs to/from 
particular servers.
2020-10-01 11:09:12 +01:00
Erik Johnston
7941372ec8
Make token serializing/deserializing async (#8427)
The idea is that in future tokens will encode a mapping of instance to position. However, we don't want to include the full instance name in the string representation, so instead we'll have a mapping between instance name and an immutable integer ID in the DB that we can use instead. We'll then do the lookup when we serialize/deserialize the token (we could alternatively pass around an `Instance` type that includes both the name and ID, but that turns out to be a lot more invasive).
2020-09-30 20:29:19 +01:00
Richard van der Hoff
a0a1ba6973
Merge pull request #8425 from matrix-org/rav/extremity_metrics
Add an improved "forward extremities" metric
2020-09-30 19:33:27 +01:00
Patrick Cloke
8b40843392
Allow additional SSO properties to be passed to the client (#8413) 2020-09-30 13:02:43 -04:00
Richard van der Hoff
32acab3fa2 changelog 2020-09-30 16:49:15 +01:00
Richard van der Hoff
20e7c4de26 Add an improved "forward extremities" metric
Hopefully, N(extremities) * N(state_events) is a more realistic approximation
to "how big a problem is this room?".
2020-09-30 16:49:15 +01:00
Richard van der Hoff
6d2d42f8fb Rewrite BucketCollector
This was a bit unweildy for what I wanted: in particular, I wanted to assign
each measurement straight into a bucket, rather than storing an intermediate
Counter which didn't do any bucketing at all.

I've replaced it with something that is hopefully a bit easier to use.

(I'm not entirely sure what the difference between a HistogramMetricFamily and
a GaugeHistogramMetricFamily is, but given our counters can go down as well as
up the latter *sounds* more accurate?)
2020-09-30 16:49:15 +01:00
Richard van der Hoff
1c8ca2c543 Fix _exposition.py to stop stripping samples
Our hacked-up `_exposition.py` was stripping out some samples it shouldn't
have been. Put them back in, to more closely match the upstream
`exposition.py`.
2020-09-30 16:45:43 +01:00
Richard van der Hoff
ceafb5a1c6
Drop support for ancient prometheus_client (#8426)
Drop compatibility hacks for prometheus-client pre 0.4.0. Debian stretch and
Fedora 31 both have newer versions, so hopefully this will be ok.
2020-09-30 16:42:05 +01:00
Richard van der Hoff
c429dfc300
Merge pull request #8420 from matrix-org/rav/state_res_stats
Report metrics on expensive rooms for state res
2020-09-30 10:37:52 +01:00
Erik Johnston
ea70f1c362
Various clean ups to room stream tokens. (#8423) 2020-09-29 21:48:33 +01:00
Aaron Raimist
8238b55e08
Update description of server_name config option (#8415) 2020-09-29 13:50:25 -04:00
Richard van der Hoff
d4274dd17e changelog 2020-09-29 17:35:20 +01:00
Richard van der Hoff
057f04fa9f Report state res metrics to Prometheus and log 2020-09-29 17:35:20 +01:00
Richard van der Hoff
8412c08a87 Move Measure calls into resolve_events_with_store 2020-09-29 17:35:20 +01:00
Richard van der Hoff
ba700074c6 Expose a get_resource_usage method in Measure 2020-09-29 17:35:20 +01:00
Richard van der Hoff
937393abd8 Move resolve_events_with_store into StateResolutionHandler 2020-09-29 17:35:20 +01:00
Will Hunt
c2bdf040aa
Discard an empty upload_name before persisting an uploaded file (#7905) 2020-09-29 12:15:27 -04:00
Andrew Morgan
e154f7ccb5
Don't check whether a 3pid is allowed to register during password reset (#8414)
* Don't check whether a 3pid is allowed to register during password reset

This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.

* Changelog
2020-09-29 16:42:25 +01:00
Erik Johnston
b1433bf231
Don't table scan events on worker startup (#8419)
* Fix table scan of events on worker startup.

This happened because we assumed "new" writers had an initial stream
position of 0, so the replication code tried to fetch all events written
by the instance between 0 and the current position.

Instead, set the initial position of new writers to the current
persisted up to position, on the assumption that new writers won't have
written anything before that point.

* Consider old writers coming back as "new".

Otherwise we'd try and fetch entries between the old stale token and the
current position, even though it won't have written any rows.

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2020-09-29 16:42:19 +01:00
Richard van der Hoff
2649d545a5
Mypy fixes for synapse.handlers.federation (#8422)
For some reason, an apparently unrelated PR upset mypy about this module. Here are a number of little fixes.
2020-09-29 15:57:36 +01:00
Andrew Morgan
f43c66d23b Merge branch 'develop' of github.com:matrix-org/synapse into anoa/info-mainline-no-check-password-reset 2020-09-29 14:21:41 +01:00
Andrew Morgan
12f0d18611
Add support for running Complement against the local checkout (#8317)
This PR adds a script that:

* Builds the local Synapse checkout using our existing `docker/Dockerfile` image.
* Downloads [Complement](https://github.com/matrix-org/complement/)'s source code.
* Builds the [Synapse.Dockerfile](https://github.com/matrix-org/complement/blob/master/dockerfiles/Synapse.Dockerfile) using the above dockerfile as a base.
* Builds and runs Complement against it.

This set up differs slightly from [that of the dendrite repo](https://github.com/matrix-org/dendrite/blob/master/build/scripts/complement.sh) (`complement.sh`, `Complement.Dockerfile`), which instead stores a separate, but slightly modified, dockerfile in Dendrite's repo rather than running the one stored in Complement's repo. That synapse equivalent to that dockerfile (`Synapse.Dockerfile`) in Complement's repo is just based on top of `matrixdotorg/synapse:latest`, which we opt to build here locally.

Thus copying over the files from Complement's repo wouldn't change any functionality, and would result in two instances of the same files. So just using the dockerfile in Complement's repo was decided upon instead.
2020-09-29 13:47:47 +01:00
Will Hunt
8676d8ab2e
Filter out appservices from mau count (#8404)
This is an attempt to fix #8403.
2020-09-29 13:11:02 +01:00
Andrew Morgan
1c6b8752b8
Only assert valid next_link params when provided (#8417)
Broken in https://github.com/matrix-org/synapse/pull/8275 and has yet to be put in a release. Fixes https://github.com/matrix-org/synapse/issues/8418.

`next_link` is an optional parameter. However, we were checking whether the `next_link` param was valid, even if it wasn't provided. In that case, `next_link` was `None`, which would clearly not be a valid URL.

This would prevent password reset and other operations if `next_link` was not provided, and the `next_link_domain_whitelist` config option was set.
2020-09-29 12:36:44 +01:00
Richard van der Hoff
866c84da8d
Add metrics to track success/otherwise of replication requests (#8406)
One hope is that this might provide some insights into #3365.
2020-09-29 11:06:11 +01:00
Richard van der Hoff
1c262431f9
Fix handling of connection timeouts in outgoing http requests (#8400)
* Remove `on_timeout_cancel` from `timeout_deferred`

The `on_timeout_cancel` param to `timeout_deferred` wasn't always called on a
timeout (in particular if the canceller raised an exception), so it was
unreliable. It was also only used in one place, and to be honest it's easier to
do what it does a different way.

* Fix handling of connection timeouts in outgoing http requests

Turns out that if we get a timeout during connection, then a different
exception is raised, which wasn't always handled correctly.

To fix it, catch the exception in SimpleHttpClient and turn it into a
RequestTimedOutError (which is already a documented exception).

Also add a description to RequestTimedOutError so that we can see which stage
it failed at.

* Fix incorrect handling of timeouts reading federation responses

This was trapping the wrong sort of TimeoutError, so was never being hit.

The effect was relatively minor, but we should fix this so that it does the
expected thing.

* Fix inconsistent handling of `timeout` param between methods

`get_json`, `put_json` and `delete_json` were applying a different timeout to
the response body to `post_json`; bring them in line and test.

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
2020-09-29 10:29:21 +01:00
Andrew Morgan
fe443acaee Changelog 2020-09-28 18:51:41 +01:00
Andrew Morgan
d4605d1f16 Don't check whether a 3pid is allowed to register during password reset
This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.
2020-09-28 18:46:59 +01:00
Erik Johnston
bd380d942f
Add checks for postgres sequence consistency (#8402) 2020-09-28 18:00:30 +01:00
Richard van der Hoff
5e3ca12b15
Create a mechanism for marking tests "logcontext clean" (#8399) 2020-09-28 17:58:33 +01:00
Dagfinn Ilmari Mannsåker
bd715e1278
Add ui_auth_sessions_ips table to synapse_port_db ignore list (#8410)
This table was created in #8034 (1.20.0).  It references
`ui_auth_sessions`, which is ignored, so this one should be too.

Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
2020-09-28 15:35:02 +01:00
Richard van der Hoff
450ec48445
A pair of tiny cleanups in the federation request code. (#8401) 2020-09-28 13:15:00 +01:00
Matthew Hodgson
4b3a1faa08 typo 2020-09-28 00:23:35 +01:00
Patrick Cloke
31acc5c309
Escape the error description on the sso_error template. (#8405) 2020-09-25 11:05:54 -04:00
Richard van der Hoff
fec6f9ac17
Fix occasional "Re-starting finished log context" from keyring (#8398)
* Fix test_verify_json_objects_for_server_awaits_previous_requests

It turns out that this wasn't really testing what it thought it was testing
(in particular, `check_context` was turning failures into success, which was
making the tests pass even though it wasn't clear they should have been.

It was also somewhat overcomplex - we can test what it was trying to test
without mocking out perspectives servers.

* Fix warnings about finished logcontexts in the keyring

We need to make sure that we finish the key fetching magic before we run the
verifying code, to ensure that we don't mess up our logcontexts.
2020-09-25 12:29:54 +01:00
Tdxdxoz
abd04b6af0
Allow existing users to login via OpenID Connect. (#8345)
Co-authored-by: Benjamin Koch <bbbsnowball@gmail.com>

This adds configuration flags that will match a user to pre-existing users
when logging in via OpenID Connect. This is useful when switching to
an existing SSO system.
2020-09-25 07:01:45 -04:00
Erik Johnston
3e87d79e1c
Fix schema delta for servers that have not backfilled (#8396)
Fixes #8395.
2020-09-25 09:58:32 +01:00
Andrew Morgan
c77c4a2fcd Merge branch 'master' into develop 2020-09-24 17:00:33 +01:00
Erik Johnston
f112cfe5bb
Fix MultiWriteIdGenerator's handling of restarts. (#8374)
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.

We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
2020-09-24 16:53:51 +01:00
Andrew Morgan
ab903e7337 s/URLs/variables in changelog 2020-09-24 16:35:31 +01:00
Andrew Morgan
271086ebda s/accidentally/incorrectly in changelog 2020-09-24 16:33:49 +01:00
Andrew Morgan
5ce5a9f144 Update changelog wording 2020-09-24 16:26:57 +01:00
Andrew Morgan
920dd1083e 1.20.1 2020-09-24 16:25:33 +01:00
Patrick Cloke
f3e5c2e702 Mark the shadow_banned column as boolean in synapse_port_db. (#8386) 2020-09-24 16:24:24 +01:00
Andrew Morgan
3f4a2a7064
Hotfix: disable autoescape by default when rendering Jinja2 templates (#8394)
#8037 changed the default `autoescape` option when rendering Jinja2 templates from `False` to `True`. This caused some bugs, noticeably around redirect URLs being escaped in SAML2 auth confirmation templates, causing those URLs to break for users.

This change returns the previous behaviour as it stood. We may want to look at each template individually and see whether autoescaping is a good idea at some point, but for now lets just fix the breakage.
2020-09-24 16:24:08 +01:00
Richard van der Hoff
11c9e17738
Add type annotations to SimpleHttpClient (#8372) 2020-09-24 15:47:20 +01:00
Erik Johnston
6fdf577593
Add new sequences to port DB script (#8387) 2020-09-24 13:43:49 +01:00
Erik Johnston
ac11fcbbb8
Add EventStreamPosition type (#8388)
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:

1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.

The valid operations are then:

1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.

(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
2020-09-24 13:24:17 +01:00
Patrick Cloke
13099ae431
Mark the shadow_banned column as boolean in synapse_port_db. (#8386) 2020-09-24 08:13:55 -04:00