We call `_update_stream_positions_table_txn` a lot, which is an UPSERT
that can conflict in `REPEATABLE READ` isolation level. Instead of doing
a transaction consisting of a single query we may as well run it outside
of a transaction.
Currently when using multiple event persisters we (in the worst case) don't tell clients about events until all event persisters have persisted new events after the original event. This is a suboptimal, especially if one of the event persisters goes down.
To handle this, we encode the position of each event persister in the room tokens so that we can send events to clients immediately. To reduce the size of the token we do two things:
1. We create a unique immutable persistent mapping between instance names and a generated small integer ID, which we can encode in the tokens instead of the instance name; and
2. We encode the "persisted upto position" of the room token and then only explicitly include instances that have positions strictly greater than that.
The new tokens look something like: `m3478~1.3488~2.3489`, where the first number is the min position, and the subsequent `-` separated pairs are the instance ID to positions map. (We use `.` and `~` as separators as they're URL safe and not already used by `StreamToken`).
Lots of different module apis is not easy to maintain.
Rather than adding yet another ModuleApi(hs, hs.get_auth_handler()) incantation, first add an hs.get_module_api() method and use it where possible.
* Optimise and test state fetching for 3p event rules
Getting all the events at once is much more efficient than getting them
individually
* Test that 3p event rules can modify events
PR #8292 tried to maintain backwards compat with modules which don't provide a
`check_visibility_can_be_modified` method, but the tests weren't being run,
and the check didn't work.
This PR allows `ThirdPartyEventRules` modules to view, manipulate and block changes to the state of whether a room is published in the public rooms directory.
While the idea of whether a room is in the public rooms list is not kept within an event in the room, `ThirdPartyEventRules` generally deal with controlling which modifications can happen to a room. Public rooms fits within that idea, even if its toggle state isn't controlled through a state event.
There's no need for it to be in the dict as well as the events table. Instead,
we store it in a separate attribute in the EventInternalMetadata object, and
populate that on load.
This means that we can rely on it being correctly populated for any event which
has been persited to the database.
This is so we can tell what is going on when things are taking a while to start up.
The main change here is to ensure that transactions that are created during startup get correctly logged like normal transactions.
#7124 changed the behaviour of remote thumbnails so that the thumbnailing method was included in the filename of the thumbnail. To support existing files, it included a fallback so that we would check the old filename if the new filename didn't exist.
Unfortunately, it didn't apply this logic to storage providers, so any thumbnails stored on such a storage provider was broken.
For negative streams we have to negate the internal stream ID before
querying the DB.
The effect of this bug was to query far too many rows, slowing start up
time, but we would correctly filter the results afterwards so there was
no ill effect.
This converts a few more of our inline HTML templates to Jinja. This is somewhat part of #7280 and should make it a bit easier to customize these in the future.
The idea is that in future tokens will encode a mapping of instance to position. However, we don't want to include the full instance name in the string representation, so instead we'll have a mapping between instance name and an immutable integer ID in the DB that we can use instead. We'll then do the lookup when we serialize/deserialize the token (we could alternatively pass around an `Instance` type that includes both the name and ID, but that turns out to be a lot more invasive).
This was a bit unweildy for what I wanted: in particular, I wanted to assign
each measurement straight into a bucket, rather than storing an intermediate
Counter which didn't do any bucketing at all.
I've replaced it with something that is hopefully a bit easier to use.
(I'm not entirely sure what the difference between a HistogramMetricFamily and
a GaugeHistogramMetricFamily is, but given our counters can go down as well as
up the latter *sounds* more accurate?)
Our hacked-up `_exposition.py` was stripping out some samples it shouldn't
have been. Put them back in, to more closely match the upstream
`exposition.py`.
* Don't check whether a 3pid is allowed to register during password reset
This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.
* Changelog
* Fix table scan of events on worker startup.
This happened because we assumed "new" writers had an initial stream
position of 0, so the replication code tried to fetch all events written
by the instance between 0 and the current position.
Instead, set the initial position of new writers to the current
persisted up to position, on the assumption that new writers won't have
written anything before that point.
* Consider old writers coming back as "new".
Otherwise we'd try and fetch entries between the old stale token and the
current position, even though it won't have written any rows.
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Broken in https://github.com/matrix-org/synapse/pull/8275 and has yet to be put in a release. Fixes https://github.com/matrix-org/synapse/issues/8418.
`next_link` is an optional parameter. However, we were checking whether the `next_link` param was valid, even if it wasn't provided. In that case, `next_link` was `None`, which would clearly not be a valid URL.
This would prevent password reset and other operations if `next_link` was not provided, and the `next_link_domain_whitelist` config option was set.
* Remove `on_timeout_cancel` from `timeout_deferred`
The `on_timeout_cancel` param to `timeout_deferred` wasn't always called on a
timeout (in particular if the canceller raised an exception), so it was
unreliable. It was also only used in one place, and to be honest it's easier to
do what it does a different way.
* Fix handling of connection timeouts in outgoing http requests
Turns out that if we get a timeout during connection, then a different
exception is raised, which wasn't always handled correctly.
To fix it, catch the exception in SimpleHttpClient and turn it into a
RequestTimedOutError (which is already a documented exception).
Also add a description to RequestTimedOutError so that we can see which stage
it failed at.
* Fix incorrect handling of timeouts reading federation responses
This was trapping the wrong sort of TimeoutError, so was never being hit.
The effect was relatively minor, but we should fix this so that it does the
expected thing.
* Fix inconsistent handling of `timeout` param between methods
`get_json`, `put_json` and `delete_json` were applying a different timeout to
the response body to `post_json`; bring them in line and test.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.
* Fix test_verify_json_objects_for_server_awaits_previous_requests
It turns out that this wasn't really testing what it thought it was testing
(in particular, `check_context` was turning failures into success, which was
making the tests pass even though it wasn't clear they should have been.
It was also somewhat overcomplex - we can test what it was trying to test
without mocking out perspectives servers.
* Fix warnings about finished logcontexts in the keyring
We need to make sure that we finish the key fetching magic before we run the
verifying code, to ensure that we don't mess up our logcontexts.
Co-authored-by: Benjamin Koch <bbbsnowball@gmail.com>
This adds configuration flags that will match a user to pre-existing users
when logging in via OpenID Connect. This is useful when switching to
an existing SSO system.
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.
We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
#8037 changed the default `autoescape` option when rendering Jinja2 templates from `False` to `True`. This caused some bugs, noticeably around redirect URLs being escaped in SAML2 auth confirmation templates, causing those URLs to break for users.
This change returns the previous behaviour as it stood. We may want to look at each template individually and see whether autoescaping is a good idea at some point, but for now lets just fix the breakage.
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:
1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.
The valid operations are then:
1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.
(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
I'd like to get a better insight into what we are doing with respect to state
res. The list of state groups we are resolving across should be short (if it
isn't, that's a massive problem in itself), so it should be fine to log it in
ite entiretly.
I've done some grepping and found approximately zero cases in which the
"shortcut" code delivered the result, so I've ripped that out too.
When updating the `room_stats_state` table, we try to check for null bytes slipping in to the content for state events. It turns out we had added `guest_access` as a field to room_stats_state without including it in the null byte check.
Lo and behold, a null byte in a `m.room.guest_access` event then breaks `room_stats_state` updates.
This PR adds the check for `guest_access`.
When updating room_stats_state, we try to check for null bytes slipping
in to the
content for state events. It turns out we had added guest_access as a
field to
room_stats_state without including it in the null byte check.
Lo and behold, a null byte in a m.room.guest_access event then breaks
room_stats_state
updates.
This PR adds the check for guest_access. A further PR will improve this
function so that this hopefully does not happen again in future.
Fixes: #8359
Trying to reactivate a user with the admin API (`PUT /_synapse/admin/v2/users/<user_name>`) causes an internal server error.
Seems to be a regression in #8033.
* Create a new function to verify that the length of a device name is
under a certain threshold.
* Refactor old code and tests to use said function.
* Verify device name length during registration of device
* Add a test for the above
Signed-off-by: Dionysis Grigoropoulos <dgrig@erethon.com>
==============================
In addition to the below, Synapse 1.20.0rc5 also includes the bug fix that was included in 1.19.3.
Features
--------
- Add flags to the `/versions` endpoint for whether new rooms default to using E2EE. ([\#8343](https://github.com/matrix-org/synapse/issues/8343))
Bugfixes
--------
- Fix rate limiting of federation `/send` requests. ([\#8342](https://github.com/matrix-org/synapse/issues/8342))
- Fix a longstanding bug where back pagination over federation could get stuck if it failed to handle a received event. ([\#8349](https://github.com/matrix-org/synapse/issues/8349))
Internal Changes
----------------
- Blacklist [MSC2753](https://github.com/matrix-org/matrix-doc/pull/2753) SyTests until it is implemented. ([\#8285](https://github.com/matrix-org/synapse/issues/8285))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl9kzk8ACgkQM/xY9qcR
MEim6A//aERkhyLGRlGpLd37lCyFQCeffTMH1rTvu04iIBQBaUZ6g7CYWOpK43zT
U8kt379+5OShjdAXs/X4XP+ucdHVbrwsRSP3hBS/fFLiDT0fJgP8uiSf5QqO6NnT
OqDyXYjcXvj/c6tMKglVtsdh8u4hFwNZjGPMGG68IzJu14uEhnD100cL9jSB9bLB
ongWpsQzzdGBpJPSFRjv9dCUSeRbzyUdl1t0uqzrNqyN9s/JnzFTn7ZYo6y3lnSS
dHGVMMo/12M2PkbBHnbJVvDY5Q/R7ZxyXlpz0gvSNOQIw8FqYFnuB0Niy5dQhXSR
Sy5h4qbczLxqbql1x+lmzeQm4ZMORsW/Tl4C3z6yK6OYaOCJHIf9en4DplTSTqp1
t+85JxWR2wH10d99YHBpaYKmkVovpwgchrO4YWrtXljUFAhhavzf+YiAdOHYT52s
RDsDLsvjMbxEHsz4cHfycmshYhjzjb340wkoDXuQpj0zrO99d+Zd83xdK8pS0UQn
OaljLRAd/5iBjTSyZPSrB1U5141OzlM3QZVJzaYAnP12yhR9eaX2twSCk+lPYOWd
nhLJjNnj1B1XSGArthuE5NLyEiCPz6KyN2RhO0EOx5YjZN9TwH7LS9upyNFe1nN1
GIhO5gz+jWLuBZE3xzRNjJyCx/I/LolpCwGMvKDu6638rpsbrPs=
=tT5/
-----END PGP SIGNATURE-----
Merge tag 'v1.20.0rc5' into develop
Synapse 1.20.0rc5 (2020-09-18)
==============================
In addition to the below, Synapse 1.20.0rc5 also includes the bug fix that was included in 1.19.3.
Features
--------
- Add flags to the `/versions` endpoint for whether new rooms default to using E2EE. ([\#8343](https://github.com/matrix-org/synapse/issues/8343))
Bugfixes
--------
- Fix rate limiting of federation `/send` requests. ([\#8342](https://github.com/matrix-org/synapse/issues/8342))
- Fix a longstanding bug where back pagination over federation could get stuck if it failed to handle a received event. ([\#8349](https://github.com/matrix-org/synapse/issues/8349))
Internal Changes
----------------
- Blacklist [MSC2753](https://github.com/matrix-org/matrix-doc/pull/2753) SyTests until it is implemented. ([\#8285](https://github.com/matrix-org/synapse/issues/8285))
Synapse 1.19.3 (2020-09-18)
===========================
Bugfixes
--------
- Partially mitigate bug where newly joined servers couldn't get past
events in a room when there is a malformed event.
([\#8350](https://github.com/matrix-org/synapse/issues/8350))
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAl9kxN4THGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9LznEACSm7ZL0GjVDcDjGEu+QjKIi3KUiFq3
i8EXEZWT3w5NNNER0Jey5BHxaBKtnPsvon0k3U4bp5KKOA6BGa9L4NDGZYJa3p0k
A1uc3DCgGG1aVazpIzfhWRA0va13T+zRKdz52GdjzksH0WGl6w3UoJhloWOmHyxz
K4UxGwOqJMSBxseBHOFcXdomPtsNYUqsrOcZYWjh3hWN0GMV6H+WrcbKVYl49V0F
6aVHuaxit35iAGYER41mnTA34ZNuC1Qkp83mAaE+Z8i39qBWPMRErUNAyZQ/mCKz
QrF98p7F2kFgSzDagtZiUPZj3w3XwfZf05bqnyd9cxBEQdIYFLAL0lokEXcoY1os
q7gKwGuwicuvYEQrt+gSFlkoUaSvy7/b4cmFqvT0NGnBNZoYl6MX4MXP2CNHuaFk
yljZoTecKEmhInY10S4uy+Hp0JNHuZWEOYGKy7CrQaqRo8MhBLk5LWBPjUOayPLP
uvDNv6MShQ8SpCiKvsoCBiX9G3LEo1yHPo5oX57nOr+IHawH0PPkXVKL3b+K+7s0
eXah/9n/wQYO5K+ReqTFd9ZCegN0/hW/NAT9aX/gEYASkS4ANvGALWwXbZSOG5IG
2glXiewbJSOaVutPRpIVI3XGDSdm3/8VpO+cAKotZ+pR1V6nsxtVwLRmAhxqhNFD
3AULCLMt2yKzDw==
=a9VC
-----END PGP SIGNATURE-----
Merge tag 'v1.19.3' into release-v1.20.0
1.19.3
Synapse 1.19.3 (2020-09-18)
===========================
Bugfixes
--------
- Partially mitigate bug where newly joined servers couldn't get past
events in a room when there is a malformed event.
([\#8350](https://github.com/matrix-org/synapse/issues/8350))
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Fix _set_destination_retry_timings
This came about because the code assumed that retry_interval
could not be NULL — which has been challenged by catch-up.
Add ability for ASes to /login using the `uk.half-shot.msc2778.login.application_service` login `type`.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This is a bit of a hack, as `_check_sigs_and_hash_and_fetch` is intended
for attempting to pull an event from the database/(re)pull it from the
server that originally sent the event if checking the signature of the
event fails.
During backfill we *know* that we won't have the event in our database,
however it is still useful to be able to query the original sending
server as the server we're backfilling from may be acting maliciously.
The main benefit and reason for this change however is that
`_check_sigs_and_hash_and_fetch` will drop an event during backfill if
it cannot be successfully validated, whereas the current code will
simply fail the backfill request - resulting in the client's /messages
request silently being dropped.
This is a quick patch to fix backfilling rooms that contain malformed
events. A better implementation in planned in future.
Instead of just using the most recent extremities let's pick the
ones that will give us results that the pagination request cares about,
i.e. pick extremities only if they have a smaller depth than the
pagination token.
This is useful when we fail to backfill an extremity, as we no longer
get stuck requesting that same extremity repeatedly.
==============================
Synapse 1.20.0rc4 is identical to 1.20.0rc3, with the addition of the security fix that was included in 1.19.2.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl9iC2QACgkQM/xY9qcR
MEhQJw/+N158AAvxLQsk4bNu8gscc6CAJaDtdR+QNqQa9UOit8dj7nnIhgRCqvPy
ZyNjKpI0LxytTS+NAYWNT88gfOcODR9TkWd7o8rGLCBenF3SwdZ2MM40gmHkola1
WunsRSHuYH1lyVumdOOIw8uOOnglORgRjcuQ4ht5AKYYnWAMBxD+crluXOGdJeLS
OdxBzCCHO1s3nb09ybvpxsxtXwbsRqhDKqotZn3gJzPbtV8NUNaer29bA7bAfMJ0
NvIKIEjrg79PEYX2G0i2kGaNqcCne8SY8Mb/nc0/M2KiLPDevy+eKA2jKYmWfUK+
2MbGYmA5rXUy4RPUB4cmGkqbsZhBPMGT4O6GD8UTJ/9dESUPzTJKcJgv1mZbMA1F
HfQOnjTRrsf9incU+QBgkKkrVboBTnpewO2mRoiLtavAUiPiXX0WvWrYfRpXyiSU
uihuVHY/tefxEW2uUGYGsCxthm8V2vrIumd8qASdybAO7gZtyrKN/5Z9M1Xhu9Fc
FidoOXo1TrAjoqVL2rfslPYdl0Pqed4IoWthwSIV+0C33e3qzeH32X2unY0HX4s/
3KPvxXqAsdE6T0WdZwKEDrGO33OtrBPkJK+O2yjSV+3d+wdqGm+RVm39lsVysKX2
KJXcJ/An8yDwaN+lemeSLbOxU3TRWGdT/Wrek8HAH/bAULzlLWg=
=0feV
-----END PGP SIGNATURE-----
Merge tag 'v1.20.0rc4' into develop
Synapse 1.20.0rc4 (2020-09-16)
==============================
Synapse 1.20.0rc4 is identical to 1.20.0rc3, with the addition of the security fix that was included in 1.19.2.
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
==============================
Bugfixes
--------
- Fix a bug introduced in v1.20.0rc1 where the wrong exception was raised when invalid JSON data is encountered. ([\#8291](https://github.com/matrix-org/synapse/issues/8291))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl9bbA8ACgkQM/xY9qcR
MEjjJg//ZGcvPLr8y3F5JXptcjWA3AFja3DgztBA3uNFoDrwVhR9m3+F4rL/Fbat
gnaccWsKIXOjZw5WOThQEUIfYMKmHEsaDl7xtlUPC4ylmSV+31ypu1KiRSxND2kT
TmOupvqw+4/E4RajpYX6WT3e2oQUpIcBAKsUyZL3ZXECR/yNYrA6w2w7i4wrv3cU
QDzBFrCcr5aJql7VUc88BlaZUG6xom2/kkPtjmO6imPnPGBHLN22uPU7zQ4n7Vgi
Nrg45v5WHVCx57WoXqAEap1zdKgoRCna1x0NkqYh0OAXq1l6aVwlf/Pdynt91f2x
yyaMYjMskjyatHlzONMV4kz0w03dUrGJXiAx2ldEDd32SKBUxLJ/CYtNEdoXZ4Mm
o0OJfDbaSJwlqK7FhdZJE3kCSkUvlmVasDErXfQRDVHy5sx9Ufr4P0kkaZyXDys1
UAWFcCJwy9dgFR2679PoXk1s/gHf10wuk6FOs8ESbcJJOvCljxWrOIqKNzSUL3HQ
Hj5V1zHYXB6HtgiI7tbKxSn/d2uajHI9b8LVOkyV9RKsb1DGGRgNQQ5Kz4Zud7dC
vuQ2jVw7JIo41W3QIXvcL1KdQeje2nwYwuwNREZaEDZJsGLjxaJ6dNyBVgkmYYFP
gxhplSOSsFjQRGEsBrc2utWxDnLVGPHEAt7iHVSqjwVUyARvxjo=
=e2T1
-----END PGP SIGNATURE-----
Merge tag 'v1.20.0rc3' into develop
Synapse 1.20.0rc3 (2020-09-11)
==============================
Bugfixes
--------
- Fix a bug introduced in v1.20.0rc1 where the wrong exception was raised when invalid JSON data is encountered. ([\#8291](https://github.com/matrix-org/synapse/issues/8291))
The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out *when* the max stream position has caught up to the event and we can notify people about it.
This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens.
The valid operations here are:
1. Is a position before or after a token
2. Fetching all events between two tokens
3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A *or* before B, then P is before C.
Future PR will change the token type to a dedicated type.
This PR adds a confirmation step to resetting your user password between clicking the link in your email and your password actually being reset.
This is to better align our password reset flow with the industry standard of requiring a confirmation from the user after email validation.
If a file cannot be thumbnailed for some reason (e.g. the file is empty), then
catch the exception and convert it to a reasonable error message for the client.
`pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed.
I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e, that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the effected users.
This fixes an issue where different methods (crop/scale) overwrite each other.
This first tries the new path. If that fails and we are looking for a
remote thumbnail, it tries the old path. If that still isn't found, it
continues as normal.
This should probably be removed in the future, after some of the newer
thumbnails were generated with the new path on most deployments. Then
the overhead should be minimal if the other thumbnails need to be
regenerated.
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
The intention here is to change `StreamToken.room_key` to be a `RoomStreamToken` in a future PR, but that is a big enough change without this refactoring too.
This is a config option ported over from DINUM's Sydent: https://github.com/matrix-org/sydent/pull/285
They've switched to validating 3PIDs via Synapse rather than Sydent, and would like to retain this functionality.
This original purpose for this change is phishing prevention. This solution could also potentially be replaced by a similar one to https://github.com/matrix-org/synapse/pull/8004, but across all `*/submit_token` endpoint.
This option may still be useful to enterprise even with that safeguard in place though, if they want to be absolutely sure that their employees don't follow links to other domains.
This removes `SourcePaginationConfig` and `get_pagination_rows`. The reasoning behind this is that these generic classes/functions erased the types of the IDs it used (i.e. instead of passing around `StreamToken` it'd pass in e.g. `token.room_key`, which don't have uniform types).
By importing from canonicaljson the simplejson module was still being used
in some situations. After this change the std lib json is consistenty used
throughout Synapse.
Fixes https://github.com/matrix-org/synapse/issues/8238
Alongside the delta file, some changes were also necessary to the codebase to remove references to the now defunct `populate_stats_process_rooms_2` background job. Thankfully the latter doesn't seem to have made it into any documentation yet :)
The version 1.3.0 has a bug with unicode charecters:
```
>>> from canonicaljson import encode_pretty_printed_json
>>> encode_pretty_printed_json({'a': 'à'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/erdnaxeli/.pyenv/versions/3.6.7/lib/python3.6/site-packages/canonicaljson.py", line 96, in encode_pretty_printed_json
return _pretty_encoder.encode(json_object).encode("ascii")
UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 12: ordinal not in range(128)
```
Signed-off-by: Alexandre Morignot <erdnaxeli@cervoi.se>
Co-authored-by: Alexandre Morignot <erdnaxeli@cervoi.se>
* Fixup `ALTER TABLE` database queries
Make the new columns nullable, because doing otherwise can wedge a
server with a big database, as setting a default value rewrites the
table.
* Switch back to using the notifications count in the push badge
Clients are likely to be confused if we send a push but the badge count
is the unread messages one, and not the notifications one.
* Changelog
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Add shared_rooms api
* Add changelog
* Add .
* Wrap response in {"rooms": }
* linting
* Add unstable_features key
* Remove options from isort that aren't part of 5.x
`-y` and `-rc` are now default behaviour and no longer exist.
`dont-skip` is no longer required
https://timothycrosley.github.io/isort/CHANGELOG/#500-penny-july-4-2020
* Update imports to make isort happy
* Add changelog
* Update tox.ini file with correct invocation
* fix linting again for isort
* Vendor prefix unstable API
* Fix to match spec
* import Codes
* import Codes
* Use FORBIDDEN
* Update changelog.d/7785.feature
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Implement get_shared_rooms_for_users
* a comma
* trailing whitespace
* Handle the easy feedback
* Switch to using runInteraction
* Add tests
* Feedback
* Seperate unstable endpoint from v2
* Add upgrade node
* a line
* Fix style by adding a blank line at EOF.
* Update synapse/storage/databases/main/user_directory.py
Co-authored-by: Tulir Asokan <tulir@maunium.net>
* Update synapse/storage/databases/main/user_directory.py
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Update UPGRADE.rst
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Fix UPGRADE/CHANGELOG unstable paths
unstable unstable unstable
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>
* Move `get_devices_with_keys_by_user` to `EndToEndKeyWorkerStore`
this seems a better fit for it.
This commit simply moves the existing code: no other changes at all.
* Rename `get_devices_with_keys_by_user`
to better reflect what it does.
* get_device_stream_token abstract method
To avoid referencing fields which are declared in the derived classes, make
`get_device_stream_token` abstract, and define that in the classes which define
`_device_list_id_gen`.
... and to show that it does something slightly different to
`_get_e2e_device_keys_txn`.
`include_all_devices` and `include_deleted_devices` were never used (and
`include_deleted_devices` was broken, since that would cause `None`s in the
result which were not handled in the loop below.
Add some typing too.
This fixes a bug where having multiple callers waiting on the same
stream and position will cause it to try and compare two deferreds,
which fails (due to the sorted list having an entry of `Tuple[int,
Deferred]`).
This is split out from https://github.com/matrix-org/synapse/pull/7438, which had gotten rather large.
`LoginRestServlet` has a couple helper methods, `login_submission_legacy_convert` and `login_id_thirdparty_from_phone`. They're primarily used for converting legacy user login submissions to "identifier" dicts ([see spec](https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-login)). Identifying information such as usernames or 3PID information used to be top-level in the login body. They're now supposed to be put inside an [identifier](https://matrix.org/docs/spec/client_server/r0.6.1#identifier-types) parameter instead.
#7438's purpose is to allow using the new identifier parameter during User-Interactive Authentication, which is currently handled in AuthHandler. That's why I've moved these helper methods there. I also moved the refactoring of these method from #7438 as they're relevant.
Small cleanup PR.
* Removed the unused `is_guest` argument
* Added a safeguard to a (currently) impossible code path, fixing static checking at the same time.
==============================
Bugfixes
--------
- Fix a bug introduced in v1.19.0 where appservices with ratelimiting disabled would still be ratelimited when joining rooms. ([\#8139](https://github.com/matrix-org/synapse/issues/8139))
- Fix a bug introduced in v1.19.0 that would cause e.g. profile updates to fail due to incorrect application of rate limits on join requests. ([\#8153](https://github.com/matrix-org/synapse/issues/8153))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdVkXOgzrGzds0jtrHgFcFF8ZFs0FAl9FIKkACgkQHgFcFF8Z
Fs3zIw/9H40ieb73Iay6ecQeOSfHiMVMzvJqYbKgho/a6h5JDHir+NGpwuLFdeGM
eHR07QkQgUU9+dLNTMOQpCKTIsU70wvzH1vTDINS3ChjnRBdrHKqhAG6ZyEt1dJx
kxYX54zsQUwiwshMKbJ+DPclHaBFnL+SY5OFfqCNjvaNob59DbHAL3tlSktPc2go
tGmj81q0dWY6maMCGI3IIYcrW7oLi+4TwosZual5Hz/xgRBiGaKHXRIJnInvkXpl
R+rSOmpYraapfDPHzPyQgLN4Dt7aAccGho843tt7dAVfd2GRSaUkLGXVXCdoruQG
CRjY1P3BRBzRBx6o80Uw7Ah0hsoVgpJTSzY008KigJce+IiRWG5sgPjoubhfK0MA
BqzmCa3/lrR+/WUOf4+w6HSfRncKawgAp7Y7wVj4nQF5fc8mwpFLz4pA/C2YOyjp
nYXCHf60/KSBDhnr0ZRAhAby4MJoYSf03djFG1oef5SVzOzHD7zho9oBnEz15Tab
XXkg1iJ7AhNFiQjsY4H1sl2onoF4T7B53NOnUEwD0oll+nXIYGe6hlNuq6x4j6l9
39ZlMoe9zK28LoKKWa1RDug8z+PmarKRJ2zATlHIb2RGeVX+oFfaKVsIbJtupJVC
8HSFt7gcgLCdUazk6taKpOHeVyGxK6WUkLnCMHzD2rzPhzpSyws=
=0rHM
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdVkXOgzrGzds0jtrHgFcFF8ZFs0FAl9FJK4ACgkQHgFcFF8Z
Fs0O2xAAgxqCfAHylVuwunRppwV49Bvq6H0mMM29hYBogGB5cmh1vRnUm5GsPQt0
9vTKlwMz6XKjLs3TLqYsnZfXqK+lyaN0xFXd7xWCNzFtEXoIvDnq+u3h5WGaKOC9
PcXb2LSZoHC5yECBpoh58ZQEPKtYILEjo+OSDboIqHz4N5HSjMSPGPvMkUn6xMNG
roWTAWKPd9juyE2dPzZxIoBWwJGn3D8EMkeTQDlExTTvmnDPyPvJ5MVUY/xaHLgy
XV8lapFu/SzWAKotc5+9qkVN64obaxwovYTU9JnlqEc5+WlD+Jl+g0258Q1bV1H9
341aQQJX08iYw3xw13xVgT0zLPRbp82O3/SHC3S1nz27HUWKXqUtsm6woDbgHIz5
UPvKFQsp2dEN4tFXxkEHiossIVNGuXdRYwEjFQrxOwayCuS4cQwDADhqnzDU4hio
LSVhtxs9rgLps4iKpcaRAqK8kifTrsomlQfh/7axPJQ43pmBR2PiItetlBW/9Le6
KTH90ghLQzJwKFkIcFcvPhFMVqSyXI32+g5++YAPmNVy9M/7LdJxuEc9ifTWgwds
LtV3/F8xlqd0qwl5IbwC6Wf19N06jdlRv/q1zL/Hb6qu3FLQeGd+/1aiC0rsbq15
grdHVZkZi1iVF/zrOx24ctxQvgLyGHA+M7n/oIaIgxlT1S6+FUI=
=49ZC
-----END PGP SIGNATURE-----
Merge tag 'v1.19.1rc1' into develop
Synapse 1.19.1rc1 (2020-08-25)
==============================
Bugfixes
--------
- Fix a bug introduced in v1.19.0 where appservices with ratelimiting disabled would still be ratelimited when joining rooms. ([\#8139](https://github.com/matrix-org/synapse/issues/8139))
- Fix a bug introduced in v1.19.0 that would cause e.g. profile updates to fail due to incorrect application of rate limits on join requests. ([\#8153](https://github.com/matrix-org/synapse/issues/8153))
Add new method ratelimiter.can_requester_do_action and ensure that appservices are exempt from being ratelimited.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
* Don't raise session_id errors on submit_token if request_token_inhibit_3pid_errors is set
* Changelog
* Also wait some time before responding to /requestToken
* Incorporate review
* Update synapse/storage/databases/main/registration.py
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Incorporate review
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Add new method ratelimiter.can_requester_do_action and ensure that appservices are exempt from being ratelimited.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
It's just a thin wrapper around two ID gens to make `get_current_token`
and `get_next` return tuples. This can easily be replaced by calling the
appropriate methods on the underlying ID gens directly.
The function is used for two purposes: 1) for subscribers of streams to
get a token they can use to get further updates with, and 2) for
replication to track position of the writers of the stream.
For streams with a single writer the two scenarios produce the same
result, however the situation becomes complicated for streams with
multiple writers. The current `MultiWriterIdGenerator` does not
correctly handle the first case (which is not an issue as its only used
for the `caches` stream which nothing subscribes to outside of
replication).
If we got an error persisting an event, we would try to remove the push actions
asynchronously, which would lead to a 'Re-starting finished log context'
warning.
I don't think there's any need for this to be asynchronous.
We do this to prevent foot guns. The default config uses a MemoryFilter,
but users are free to change to logging to files directly. If they do
then they have to ensure to set the `filters: [context]` on the right
handler, otherwise records get written with the wrong context.
Instead we move the logic to happen when we generate a record, which is
when we *log* rather than *handle*.
(It's possible to add filters to loggers in the config, however they
don't apply to descendant loggers and so they have to be manually set on
*every* logger used in the code base)
We do this to prevent foot guns. The default config uses a MemoryFilter,
but users are free to change to logging to files directly. If they do
then they have to ensure to set the `filters: [context]` on the right
handler, otherwise records get written with the wrong context.
Instead we move the logic to happen when we generate a record, which is
when we *log* rather than *handle*.
(It's possible to add filters to loggers in the config, however they
don't apply to descendant loggers and so they have to be manually set on
*every* logger used in the code base)
c.f. #8021
A lot of the code here is to change the `Completed 200 OK` logging to include the request URI so that we can drop the `Sending request...` log line.
Some notes:
1. We won't log retries, which may be confusing considering the time taken log line includes retries and sleeps.
2. The `_send_request_with_optional_trailing_slash` will always be logged *without* the forward slash, even if it succeeded only with the forward slash.
* Change default log config to buffer by default.
This batches up writes to the filesystem, which is more efficient for
disk I/O. This means that it can take some time for logs to get written
to disk. Note that ERROR logs (and above) immediately flush the buffer.
This only effects new installs, as we only write the log config if
started with `--generate-config` (in the same way we do for generating
signing keys).
* Default to keeping last 4 days of logs.
This hopefully reduces the amount of logs kept for new servers. Keeping
the last 1GB of logs is likely overkill for new servers, but equally may
not be enough for busy ones.
Instead, we keep the last four days worth of logs, enough so that admins
can investigate any problems that happened over e.g. a long weekend.
Duplicating function signatures between server.py and server.pyi is
silly. This commit changes that by changing all `build_*` methods to
`get_*` methods and changing the `_make_dependency_method` to work work
as a descriptor that caches the produced value.
There are some changes in other files that were made to fix the typing
in server.py.
This has long been something I've wanted to do. Basically the `Daemonize` code
is both too flexible and not flexible enough, in that it offers a bunch of
features that we don't use (changing UID, closing FDs in the child, logging to
syslog) and doesn't offer a bunch that we could do with (redirecting stdout/err
to a file instead of /dev/null; having the parent not exit until the child is
running).
As a first step, I've lifted the Daemonize code and removed the bits we don't
use. This should be a non-functional change. Fixing everything else will come
later.
We've [decided](https://github.com/matrix-org/synapse/issues/5253#issuecomment-665976308) to remove the signature check for v1 lookups.
The signature check has been removed in v2 lookups. v1 lookups are currently deprecated. As mentioned in the above linked issue, this verification was causing deployments for the vector.im and matrix.org IS deployments, and this change is the simplest solution, without being unjustified.
Implementations are encouraged to use the v2 lookup API as it has [increased privacy benefits](https://github.com/matrix-org/matrix-doc/pull/2134).
`StatsHandler` handles updates to the `current_state_delta_stream`, and updates room stats such as the amount of state events, joined users, etc.
However, it counts every new join membership as a new user entering a room (and that user being in another room), whereas it's possible for a user's membership status to go from join -> join, for instance when they change their per-room profile information.
This PR adds a check for join->join membership transitions, and bails out early, as none of the further checks are necessary at that point.
Due to this bug, membership stats in many rooms have ended up being wildly larger than their true values. I am not sure if we also want to include a migration step which recalculates these statistics (possibly using the `_populate_stats_process_rooms` bg update).
Bug introduced in the initial implementation https://github.com/matrix-org/synapse/pull/4338.
Thanks to some slightly overzealous cleanup in the
`delete_old_current_state_events`, it's possible to end up with no
`event_forward_extremities` in a room where we have outstanding local
invites. The user would then get a "no create event in auth events" when trying
to reject the invite.
We can hack around it by using the dangling invite as the prev event.
==============================
Bugfixes
--------
- Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
- Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967))
Internal Changes
----------------
- Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl8f/f8ACgkQOSor00I9
eP8/Uwf8CiVWvrBsmFZMvxJDkUWm0/f1kN4IQdm8ibDtyNyvFUx+Y1K8KOQS+VwG
a3bZqSC2Vv2sO9O9kR+V2tk831l+ujO0Nlaohuqyvhcl9lzh04rRYI9x9IHlAq2H
WPb0NMLwMufL6YkXDBwZT/G9TVW1vLRGASu4f7X2rXqek34VNVgYbg1hB2dp4dDa
wjKk3iBZ6h34IhKPgu0sLBUcyvX4U5xdOHjEG3HXvNnvDNO0HMD8rGB7065vFMD6
PH4nUK/h+RL0UBs2sJOMK1ZazFUODdURwANJQNAQ6pNvf9/RWgw2okka2bYIcmQQ
UT7tiwMsBvKdy4PER5fcDX3COY16qw==
=Q+bI
-----END PGP SIGNATURE-----
Merge tag 'v1.18.0rc2' into develop
Synapse 1.18.0rc2 (2020-07-28)
==============================
Bugfixes
--------
- Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
- Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967))
Internal Changes
----------------
- Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
IIRC this doesn't break tests because its only hit on reconnection, or something.
Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB), the problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.
Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.
Handling of incoming typing stream updates from replication was not
hooked up on master, effecting set ups where typing was handled on a
different worker.
This is really only a problem if the master process is also handling
sync requests, which is unlikely for those that are at the stage of
moving typing off.
The other observable effect is that if a worker restarts or a
replication connect drops then the typing worker will issue a
`POSITION typing`, triggering master process to try and stream *all*
typing updates from position 0.
Fixes#7907
If we send out an event which refers to `prev_events` which other servers in
the federation are missing, then (after a round or two of backfill attempts),
they will end up asking us for `/state_ids` at a particular point in the DAG.
As per https://github.com/matrix-org/synapse/issues/7893, this is quite
expensive, and we tend to see lots of very similar requests around the same
time.
We can therefore handle this much more efficiently by using a cache, which (a)
ensures that if we see the same request from multiple servers (or even the same
server, multiple times), then they share the result, and (b) any other servers
that miss the initial excitement can also benefit from the work.
[It's interesting to note that `/state` has a cache for exactly this
reason. `/state` is now essentially unused and replaced with `/state_ids`, but
evidently when we replaced it we forgot to add a cache to the new endpoint.]
For inbound federation requests, if a given remote server makes too many
requests at once, we start stacking them up rather than processing them
immediatedly.
However, that means that there is a fair chance that the requesting server will
disconnect before we start processing the request. In that case, if it was a
read-only request (ie, a GET request), there is absolutely no point in
building a response (and some requests are quite expensive to handle).
Even in the case of a POST request, one of two things will happen:
* Most likely, the requesting server will retry the request and we'll get the
information anyway.
* Even if it doesn't, the requesting server has to assume that we didn't get
the memo, and act accordingly.
In short, we're better off aborting the request at this point rather than
ploughing on with what might be a quite expensive request.
The [postgres setup docs](https://github.com/matrix-org/synapse/blob/develop/docs/postgres.md#set-up-database) recommend setting up your database with user `synapse_user`.
However, uncommenting the postgres defaults in the sample config leave you with user `synapse`.
This PR switches the sample config to recommend `synapse_user`. Took a me a second to figure this out, so assume this will beneficial to others.
It serves no purpose and updating everytime we write to the device inbox
stream means all such transactions will conflict, causing lots of
transaction failures and retries.
When we get behind on replication, we tend to stack up background processes
behind a linearizer. Bg processes are heavy (particularly with respect to
prometheus metrics) and linearizers aren't terribly efficient once the queue
gets long either.
A better approach is to maintain a queue of requests to be processed, and
nominate a single process to work its way through the queue.
Fixes: #7444
It was correct at the time of our friend Jorik writing it (checking
git blame), but the world has moved now and it is no longer a
generator.
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
When considering rooms to clean up in `delete_old_current_state_events`, skip
rooms which we are creating, which otherwise look a bit like rooms we have
left.
Fixes#7834.
As far as I can tell from the sentry logs, the only time this has actually done
anything in the last two years is when we had two master workers running at
once, and even then, it made a bit of a mess of it (see
https://github.com/matrix-org/synapse/issues/7845#issuecomment-658238739).
Generally I feel like this code is doing more harm than good.
The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.
When fetching the state of a room over federation we receive the event
IDs of the state and auth chain. We then fetch those events that we
don't already have.
However, we used a function that recursively fetched any missing auth
events for the fetched events, which can lead to a lot of recursion if
the server is missing most of the auth chain. This work is entirely
pointless because would have queued up the missing events in the auth
chain to be fetched already.
Let's just diable the recursion, since it only gets called from one
place anyway.