Commit Graph

12458 Commits

Author SHA1 Message Date
Erik Johnston
fa8934b175 Reduce serialization errors in MultiWriterIdGen (#8456)
We call `_update_stream_positions_table_txn` a lot, which is an UPSERT
that can conflict in `REPEATABLE READ` isolation level. Instead of doing
a transaction consisting of a single query we may as well run it outside
of a transaction.
2020-10-07 17:08:58 +01:00
Richard van der Hoff
f6c526ce67 1.21.0rc2 2020-10-02 12:46:58 +01:00
Richard van der Hoff
73d93039ff
Fix bug in remote thumbnail search (#8438)
#7124 changed the behaviour of remote thumbnails so that the thumbnailing method was included in the filename of the thumbnail. To support existing files, it included a fallback so that we would check the old filename if the new filename didn't exist.

Unfortunately, it didn't apply this logic to storage providers, so any thumbnails stored on such a storage provider was broken.
2020-10-02 12:29:29 +01:00
Erik Johnston
695240d34a
Fix DB query on startup for negative streams. (#8447)
For negative streams we have to negate the internal stream ID before
querying the DB.

The effect of this bug was to query far too many rows, slowing start up
time, but we would correctly filter the results afterwards so there was
no ill effect.
2020-10-02 12:22:19 +01:00
Patrick Cloke
34ff8da83b
Convert additional templates to Jinja (#8444)
This converts a few more of our inline HTML templates to Jinja. This is somewhat part of #7280 and should make it a bit easier to customize these in the future.
2020-10-02 11:15:53 +01:00
Richard van der Hoff
3bd3707cb9
Fix malformed log line in new federation "catch up" logic (#8442) 2020-10-02 11:05:29 +01:00
Patrick Cloke
61aaf36a1c
Do not expose the experimental appservice login flow to clients. (#8440) 2020-10-01 13:38:20 -04:00
Richard van der Hoff
b1f4e6e4fc
fix a logging error in thumbnailer (#8435)
Introduced in #8236
2020-10-01 13:34:24 +01:00
Richard van der Hoff
c501c80e46 fix version number
we're not doing a final release yet!
2020-10-01 13:17:59 +01:00
Richard van der Hoff
cc40a59b4a 1.21.0 2020-10-01 13:14:56 +01:00
Richard van der Hoff
c1ef579b63
Add prometheus metrics to track federation delays (#8430)
Add a pair of federation metrics to track the delays in sending PDUs to/from 
particular servers.
2020-10-01 11:09:12 +01:00
Erik Johnston
7941372ec8
Make token serializing/deserializing async (#8427)
The idea is that in future tokens will encode a mapping of instance to position. However, we don't want to include the full instance name in the string representation, so instead we'll have a mapping between instance name and an immutable integer ID in the DB that we can use instead. We'll then do the lookup when we serialize/deserialize the token (we could alternatively pass around an `Instance` type that includes both the name and ID, but that turns out to be a lot more invasive).
2020-09-30 20:29:19 +01:00
Richard van der Hoff
a0a1ba6973
Merge pull request #8425 from matrix-org/rav/extremity_metrics
Add an improved "forward extremities" metric
2020-09-30 19:33:27 +01:00
Patrick Cloke
8b40843392
Allow additional SSO properties to be passed to the client (#8413) 2020-09-30 13:02:43 -04:00
Richard van der Hoff
20e7c4de26 Add an improved "forward extremities" metric
Hopefully, N(extremities) * N(state_events) is a more realistic approximation
to "how big a problem is this room?".
2020-09-30 16:49:15 +01:00
Richard van der Hoff
6d2d42f8fb Rewrite BucketCollector
This was a bit unweildy for what I wanted: in particular, I wanted to assign
each measurement straight into a bucket, rather than storing an intermediate
Counter which didn't do any bucketing at all.

I've replaced it with something that is hopefully a bit easier to use.

(I'm not entirely sure what the difference between a HistogramMetricFamily and
a GaugeHistogramMetricFamily is, but given our counters can go down as well as
up the latter *sounds* more accurate?)
2020-09-30 16:49:15 +01:00
Richard van der Hoff
1c8ca2c543 Fix _exposition.py to stop stripping samples
Our hacked-up `_exposition.py` was stripping out some samples it shouldn't
have been. Put them back in, to more closely match the upstream
`exposition.py`.
2020-09-30 16:45:43 +01:00
Richard van der Hoff
ceafb5a1c6
Drop support for ancient prometheus_client (#8426)
Drop compatibility hacks for prometheus-client pre 0.4.0. Debian stretch and
Fedora 31 both have newer versions, so hopefully this will be ok.
2020-09-30 16:42:05 +01:00
Richard van der Hoff
c429dfc300
Merge pull request #8420 from matrix-org/rav/state_res_stats
Report metrics on expensive rooms for state res
2020-09-30 10:37:52 +01:00
Erik Johnston
ea70f1c362
Various clean ups to room stream tokens. (#8423) 2020-09-29 21:48:33 +01:00
Aaron Raimist
8238b55e08
Update description of server_name config option (#8415) 2020-09-29 13:50:25 -04:00
Richard van der Hoff
057f04fa9f Report state res metrics to Prometheus and log 2020-09-29 17:35:20 +01:00
Richard van der Hoff
8412c08a87 Move Measure calls into resolve_events_with_store 2020-09-29 17:35:20 +01:00
Richard van der Hoff
ba700074c6 Expose a get_resource_usage method in Measure 2020-09-29 17:35:20 +01:00
Richard van der Hoff
937393abd8 Move resolve_events_with_store into StateResolutionHandler 2020-09-29 17:35:20 +01:00
Will Hunt
c2bdf040aa
Discard an empty upload_name before persisting an uploaded file (#7905) 2020-09-29 12:15:27 -04:00
Andrew Morgan
e154f7ccb5
Don't check whether a 3pid is allowed to register during password reset (#8414)
* Don't check whether a 3pid is allowed to register during password reset

This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.

* Changelog
2020-09-29 16:42:25 +01:00
Erik Johnston
b1433bf231
Don't table scan events on worker startup (#8419)
* Fix table scan of events on worker startup.

This happened because we assumed "new" writers had an initial stream
position of 0, so the replication code tried to fetch all events written
by the instance between 0 and the current position.

Instead, set the initial position of new writers to the current
persisted up to position, on the assumption that new writers won't have
written anything before that point.

* Consider old writers coming back as "new".

Otherwise we'd try and fetch entries between the old stale token and the
current position, even though it won't have written any rows.

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2020-09-29 16:42:19 +01:00
Richard van der Hoff
2649d545a5
Mypy fixes for synapse.handlers.federation (#8422)
For some reason, an apparently unrelated PR upset mypy about this module. Here are a number of little fixes.
2020-09-29 15:57:36 +01:00
Andrew Morgan
f43c66d23b Merge branch 'develop' of github.com:matrix-org/synapse into anoa/info-mainline-no-check-password-reset 2020-09-29 14:21:41 +01:00
Will Hunt
8676d8ab2e
Filter out appservices from mau count (#8404)
This is an attempt to fix #8403.
2020-09-29 13:11:02 +01:00
Andrew Morgan
1c6b8752b8
Only assert valid next_link params when provided (#8417)
Broken in https://github.com/matrix-org/synapse/pull/8275 and has yet to be put in a release. Fixes https://github.com/matrix-org/synapse/issues/8418.

`next_link` is an optional parameter. However, we were checking whether the `next_link` param was valid, even if it wasn't provided. In that case, `next_link` was `None`, which would clearly not be a valid URL.

This would prevent password reset and other operations if `next_link` was not provided, and the `next_link_domain_whitelist` config option was set.
2020-09-29 12:36:44 +01:00
Richard van der Hoff
866c84da8d
Add metrics to track success/otherwise of replication requests (#8406)
One hope is that this might provide some insights into #3365.
2020-09-29 11:06:11 +01:00
Richard van der Hoff
1c262431f9
Fix handling of connection timeouts in outgoing http requests (#8400)
* Remove `on_timeout_cancel` from `timeout_deferred`

The `on_timeout_cancel` param to `timeout_deferred` wasn't always called on a
timeout (in particular if the canceller raised an exception), so it was
unreliable. It was also only used in one place, and to be honest it's easier to
do what it does a different way.

* Fix handling of connection timeouts in outgoing http requests

Turns out that if we get a timeout during connection, then a different
exception is raised, which wasn't always handled correctly.

To fix it, catch the exception in SimpleHttpClient and turn it into a
RequestTimedOutError (which is already a documented exception).

Also add a description to RequestTimedOutError so that we can see which stage
it failed at.

* Fix incorrect handling of timeouts reading federation responses

This was trapping the wrong sort of TimeoutError, so was never being hit.

The effect was relatively minor, but we should fix this so that it does the
expected thing.

* Fix inconsistent handling of `timeout` param between methods

`get_json`, `put_json` and `delete_json` were applying a different timeout to
the response body to `post_json`; bring them in line and test.

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
2020-09-29 10:29:21 +01:00
Andrew Morgan
d4605d1f16 Don't check whether a 3pid is allowed to register during password reset
This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.
2020-09-28 18:46:59 +01:00
Erik Johnston
bd380d942f
Add checks for postgres sequence consistency (#8402) 2020-09-28 18:00:30 +01:00
Richard van der Hoff
5e3ca12b15
Create a mechanism for marking tests "logcontext clean" (#8399) 2020-09-28 17:58:33 +01:00
Richard van der Hoff
450ec48445
A pair of tiny cleanups in the federation request code. (#8401) 2020-09-28 13:15:00 +01:00
Matthew Hodgson
4b3a1faa08 typo 2020-09-28 00:23:35 +01:00
Patrick Cloke
31acc5c309
Escape the error description on the sso_error template. (#8405) 2020-09-25 11:05:54 -04:00
Richard van der Hoff
fec6f9ac17
Fix occasional "Re-starting finished log context" from keyring (#8398)
* Fix test_verify_json_objects_for_server_awaits_previous_requests

It turns out that this wasn't really testing what it thought it was testing
(in particular, `check_context` was turning failures into success, which was
making the tests pass even though it wasn't clear they should have been.

It was also somewhat overcomplex - we can test what it was trying to test
without mocking out perspectives servers.

* Fix warnings about finished logcontexts in the keyring

We need to make sure that we finish the key fetching magic before we run the
verifying code, to ensure that we don't mess up our logcontexts.
2020-09-25 12:29:54 +01:00
Tdxdxoz
abd04b6af0
Allow existing users to login via OpenID Connect. (#8345)
Co-authored-by: Benjamin Koch <bbbsnowball@gmail.com>

This adds configuration flags that will match a user to pre-existing users
when logging in via OpenID Connect. This is useful when switching to
an existing SSO system.
2020-09-25 07:01:45 -04:00
Erik Johnston
3e87d79e1c
Fix schema delta for servers that have not backfilled (#8396)
Fixes #8395.
2020-09-25 09:58:32 +01:00
Andrew Morgan
c77c4a2fcd Merge branch 'master' into develop 2020-09-24 17:00:33 +01:00
Erik Johnston
f112cfe5bb
Fix MultiWriteIdGenerator's handling of restarts. (#8374)
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.

We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
2020-09-24 16:53:51 +01:00
Andrew Morgan
920dd1083e 1.20.1 2020-09-24 16:25:33 +01:00
Andrew Morgan
3f4a2a7064
Hotfix: disable autoescape by default when rendering Jinja2 templates (#8394)
#8037 changed the default `autoescape` option when rendering Jinja2 templates from `False` to `True`. This caused some bugs, noticeably around redirect URLs being escaped in SAML2 auth confirmation templates, causing those URLs to break for users.

This change returns the previous behaviour as it stood. We may want to look at each template individually and see whether autoescaping is a good idea at some point, but for now lets just fix the breakage.
2020-09-24 16:24:08 +01:00
Richard van der Hoff
11c9e17738
Add type annotations to SimpleHttpClient (#8372) 2020-09-24 15:47:20 +01:00
Erik Johnston
ac11fcbbb8
Add EventStreamPosition type (#8388)
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:

1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.

The valid operations are then:

1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.

(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
2020-09-24 13:24:17 +01:00
Richard van der Hoff
2983049a77
Factor out _send_dummy_event_for_room (#8370)
this makes it possible to use from the manhole, and seems cleaner anyway.
2020-09-23 18:18:43 +01:00