Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
==============================
Features
--------
- Always send users their own device updates. ([\#7160](https://github.com/matrix-org/synapse/issues/7160))
- Add support for handling GET requests for `account_data` on a worker. ([\#7311](https://github.com/matrix-org/synapse/issues/7311))
Bugfixes
--------
- Fix a bug that prevented cross-signing with users on worker-mode synapses. ([\#7255](https://github.com/matrix-org/synapse/issues/7255))
- Do not treat display names as globs in push rules. ([\#7271](https://github.com/matrix-org/synapse/issues/7271))
- Fix a bug with cross-signing devices belonging to remote users who did not share a room with any user on the local homeserver. ([\#7289](https://github.com/matrix-org/synapse/issues/7289))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl6gS+IACgkQOSor00I9
eP8CTwf+KLehHQMcrz0yoHN+/O86mpNv3Uz83BJ0qDIgezVbtqT7+ZwnK2Jb8dwM
zTcGRq5a6V+JLdfSTnXfrPzvAPt0WLaEENSAEGwZ6unqqxcG2PgATkRb0XtY8Rzh
Feu8WgJTnORbiiM3sBLI8toZMxal3X3ZcftPEAan1SaoyqLPQ2s14edDwgx64S5e
rBGzPDNSxZU1McNBq7Q11QTg3zgBUF+TYwyhnLbC5X1RdDwQpKhNZhwhYBS1yOD6
TsPjtlixC+HXb21Rmob7XY8VFxTBSBdUdtVQKAT4h3+l9bNFQLpBfDSVc7xkPiLt
/Gj+PL+YgY9xQDvHO2fVw8m4ololZw==
=p53e
-----END PGP SIGNATURE-----
Merge tag 'v1.12.4rc1' into develop
Synapse 1.12.4rc1 (2020-04-22)
==============================
Features
--------
- Always send users their own device updates. ([\#7160](https://github.com/matrix-org/synapse/issues/7160))
- Add support for handling GET requests for `account_data` on a worker. ([\#7311](https://github.com/matrix-org/synapse/issues/7311))
Bugfixes
--------
- Fix a bug that prevented cross-signing with users on worker-mode synapses. ([\#7255](https://github.com/matrix-org/synapse/issues/7255))
- Do not treat display names as globs in push rules. ([\#7271](https://github.com/matrix-org/synapse/issues/7271))
- Fix a bug with cross-signing devices belonging to remote users who did not share a room with any user on the local homeserver. ([\#7289](https://github.com/matrix-org/synapse/issues/7289))
First some background: StreamChangeCache is used to keep track of what "entities" have
changed since a given stream ID. So for example, we might use it to keep track of when the last
to-device message for a given user was received [1], and hence whether we need to pull any to-device messages from the database on a sync [2].
Now, it turns out that StreamChangeCache didn't support more than one thing being changed at
a given stream_id (this was part of the problem with #7206). However, it's entirely valid to send
to-device messages to more than one user at a time.
As it turns out, this did in fact work, because *some* methods of StreamChangeCache coped
ok with having multiple things changing on the same stream ID, and it seems we never actually
use the methods which don't work on the stream change caches where we allow multiple
changes at the same stream ID. But that feels horribly fragile, hence: let's update
StreamChangeCache to properly support this, and add some typing and some more tests while
we're at it.
[1]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L301
[2]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L47-L51
Splitting based on the response code means we can avoid double logging here and identical information from line 164 while still logging at info if we don't get a good response and need to retry.
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.
This code was introduced in #7024, and I hope this fixes#7206.
Add changelog
Save retrieved keys to the db
lint
Fix and de-brittle remote result dict processing
Use query_user_devices instead, assume only master, self_signing key types
Make changelog more useful
Remove very specific exception handling
Wrap get_verify_key_from_cross_signing_key in a try/except
Note that _get_e2e_cross_signing_verify_key can raise a SynapseError
lint
Add comment explaining why this is useful
Only fetch master and self_signing key types
Fix log statements, docstrings
Remove extraneous items from remote query try/except
lint
Factor key retrieval out into a separate function
Send device updates, modeled after SigningKeyEduUpdater._handle_signing_key_updates
Update method docstring
The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290.
After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.
I've also introduced a couple of new types, to take out some duplication.
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.
(think this was introduced in #7024)
I don't really remember why this was so complicated; I think it dates
back to the time when we had to instantiate the Config classes before
we could call `add_arguments` - ie before #5597. In any case, I don't
think there's a good reason for it any more, and the impact of it
being complicated is that `--help` doesn't work correctly.
`REPLICATE` is now a valid command, and it's nice if you can issue it from the
console without remembering to call it `REPLICATE ` with a trailing space.
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
Fixes#6815
Before figuring out whether we should alert a user on MAU, we call get_notice_room_for_user to get some info on the existing server notices room for this user. This function, if the room doesn't exist, creates it and invites the user in it. This means that, if we decide later that no server notice is needed, the user gets invited in a room with no message in it. This happens at every restart of the server, since the room ID returned by get_notice_room_for_user is cached.
This PR fixes that by moving the inviting bit to a dedicated function, that's only called when the server actually needs to send a notice to the user. A potential issue with this approach is that the room that's created by get_notice_room_for_user doesn't match how that same function looks for an existing room (i.e. it creates a room that doesn't have an invite or a join for the current user in it, so it could lead to a new room being created each time a user syncs), but I'm not sure this is a problem given it's cached until the server restarts, so that function won't run very often.
It also renames get_notice_room_for_user into get_or_create_notice_room_for_user to make what it does clearer.
Occasionally we could get a federation device list update transaction which
looked like:
```
[
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D2', 'prev_id': [], 'stream_id': 12, 'deleted': True}},
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D1', 'prev_id': [12], 'stream_id': 11, 'deleted': True}},
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D3', 'prev_id': [11], 'stream_id': 13, 'deleted': True}}
]
```
Having `stream_ids` which are lower than `prev_ids` looks odd. It might work
(I'm not actually sure), but in any case it doesn't seem like a reasonable
thing to expect other implementations to support.
* master:
1.12.1
Note where bugs were introduced
1.12.1rc1
Newsfile
Rewrite changelog
Add changelog
Only import sqlite3 when type checking
Fix another instance
Only setdefault for signatures if device has key_json
Fix starting workers when federation sending not split out.
Attempt to clarify Python version requirements (#7161)
Improve the UX of the login fallback when using SSO (#7152)
Update the wording of the config comment
Lint
Changelog
Regenerate sample config
Whitelist the login fallback by default for SSO
===========================
No significant changes since 1.12.1rc1.
Synapse 1.12.1rc1 (2020-03-31)
==============================
Bugfixes
--------
- Fix starting workers when federation sending not split out. ([\#7133](https://github.com/matrix-org/synapse/issues/7133)). Introduced in v1.12.0.
- Avoid importing `sqlite3` when using the postgres backend. Contributed by David Vo. ([\#7155](https://github.com/matrix-org/synapse/issues/7155)). Introduced in v1.12.0rc1.
- Fix a bug which could cause outbound federation traffic to stop working if a client uploaded an incorrect e2e device signature. ([\#7177](https://github.com/matrix-org/synapse/issues/7177)). Introduced in v1.11.0.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAl6GAZMTHGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9JqOD/4kjIBwKSOaUzYhzaP+4o4fDwz49IiO
GzSgq6bf+C1V6Vev/7+N1is0FnbfelaJZHf7wM1044tozL+puPqaGl2A/Zjxs8Pf
x9LpS53yOBsYYUvvYSUdE8MlWPimV/EERJa9eoIKloMt2vtcNpwwE+KYygqPR6Rz
xvexnh5FwOj9zQAS3KDE+ZUbgrx+S+VkV6C5tlDziuOlT1VZBsGGj/SmfKX9d11z
e22hgebAaEKheACAEvrHzGl5JyR8fmAuBOSWRmybucyzAQRGNm4yqoOaonD8ic0l
CkbjC1ix5BdfP2vAww6wzWRDXSmU1qk8hC4/SXCaS/xH2RN5Jh9ptnw0tuWwP0dk
J8joNBGR3cOrDDwjnqguqE1fckLTa/JOxNy8JzVTugO0v1KmDk7kYuO6GOapNmXI
qUuqrQARTCsoAt6r5qMlyKw/yk4vjLdZ9VxRDtI4uz/P+WeWS4LTG8G9eMKjZ9Kd
rOaIlO7lA6vwFMd7Twe1p6y721yfhGyp6Jcz6UDOdh+cbxZ1fSg8/SsYA9NrbqOk
4bHt7s8YNI/V2kU+LxuZjtOFENa7XvsR/rURts2GvNGDusZJMNHl8aJvOCnF+ReO
M6Ayzx+91R3oRaRdmuuvwLdFStrnSfHp7XcZYOGvN8fUBocfB2c+yZR5H9MhWSnv
h1Gndj+lR7uvKg==
=/J/H
-----END PGP SIGNATURE-----
Merge tag 'v1.12.1'
Synapse 1.12.1 (2020-04-02)
===========================
No significant changes since 1.12.1rc1.
Synapse 1.12.1rc1 (2020-03-31)
==============================
Bugfixes
--------
- Fix starting workers when federation sending not split out. ([\#7133](https://github.com/matrix-org/synapse/issues/7133)). Introduced in v1.12.0.
- Avoid importing `sqlite3` when using the postgres backend. Contributed by David Vo. ([\#7155](https://github.com/matrix-org/synapse/issues/7155)). Introduced in v1.12.0rc1.
- Fix a bug which could cause outbound federation traffic to stop working if a client uploaded an incorrect e2e device signature. ([\#7177](https://github.com/matrix-org/synapse/issues/7177)). Introduced in v1.11.0.
* tag 'v1.12.1':
1.12.1
Note where bugs were introduced
1.12.1rc1
Newsfile
Rewrite changelog
Add changelog
Only import sqlite3 when type checking
Fix another instance
Only setdefault for signatures if device has key_json
Fix starting workers when federation sending not split out.
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
If there was an exception setting up one of the attributes of the Homeserver
god object, then future attempts to fetch that attribute would raise a
confusing "Cyclic dependency" error. Let's make sure that we clear the
`building` flag so that we just get the original exception.
Ref: #7169
* Remove `conn_id` usage for UserSyncCommand.
Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replicaiton commands.
This really only effects UserSyncCommand.
* Add CLEAR_USER_SYNCS command that is sent on shutdown.
This should help with the case where a synchrotron gets restarted
gracefully, rather than rely on 5 minute timeout.
That fallback sets the redirect URL to itself (so it can process the login
token then return gracefully to the client). This would make it pointless to
ask the user for confirmation, since the URL the confirmation page would be
showing wouldn't be the client's.
* Don't show the login forms if we're currently logging in with a
password or a token.
* Submit directly the SSO login form, showing only a spinner to the
user, in order to eliminate from the clunkiness of SSO through this
fallback.
* Don't show the login forms if we're currently logging in with a
password or a token.
* Submit directly the SSO login form, showing only a spinner to the
user, in order to eliminate from the clunkiness of SSO through this
fallback.
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
* Pull Sentinel out of LoggingContext
... and drop a few unnecessary references to it
* Factor out LoggingContext.current_context
move `current_context` and `set_context` out to top-level functions.
Mostly this means that I can more easily trace what's actually referring to
LoggingContext, but I think it's generally neater.
* move copy-to-parent into `stop`
this really just makes `start` and `stop` more symetric. It also means that it
behaves correctly if you manually `set_log_context` rather than using the
context manager.
* Replace `LoggingContext.alive` with `finished`
Turn `alive` into `finished` and make it a bit better defined.
* Add 'device_lists_outbound_pokes' as extra table.
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
* Change device lists stream to have one row per id.
This will make it possible to process the streams more incrementally,
avoiding having to process large chunks at once.
* Change device list replication to match new semantics.
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
* Newsfile
* Remove handling of multiple rows per ID
* Fix worker handling
* Comments from review
This should be safe to do on all workers/masters because it is guarded by
a config option which will ensure it is only actually done on the worker
assigned as a pusher.
It was originally implemented by pulling the full auth chain of all
state sets out of the database and doing set comparison. However, that
can take a lot work if the state and auth chains are large.
Instead, lets try and fetch the auth chains at the same time and
calculate the difference on the fly, allowing us to bail early if all
the auth chains converge. Assuming that the auth chains do converge more
often than not, this should improve performance. Hopefully.
Fixes#7065
This is basically the same as https://github.com/matrix-org/synapse/pull/6847 except it tries to populate events from `state_events` rather than `current_state_events`, since the latter might have been cleared from the state of some rooms too early, leaving them with a `NULL` room version.
If an error happened while processing a SAML AuthN response, or a client
ends up doing a `GET` request to `/authn_response`, then render a
customisable error page rather than a confusing error.
Fixes#7054
I also had a look at the rest of the functions in
`EventPushActionsStore` and in the push notifications send code and it
looks to me like there shouldn't be any other method with this issue in
this part of the codebase.
This is a bit fiddly because it all has to be done on one fell swoop:
* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
This currently causes presence notify code to log exceptions when there
is no state changes to process. This doesn't actually cause any problems
as we'd simply do nothing anyway.
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
Instead lets just warn if the worker has a media listener configured but
has the media repository disabled.
Previously non media repository workers would just ignore the media
listener.
When we get an invite over federation, store the room version in the rooms table.
The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
This is intended as a precursor to storing room versions when we receive an
invite over federation, but has the happy side-effect of fixing #3374 at last.
In short: change the store_room with try/except to a proper upsert which
updates the right columns.
* Give `notif_template_html`, `notif_template_text` default values (fixes#6960)
* Don't complain if `smtp_host` and `smtp_port` are unset, since they have sensible defaults (fixes#6961)
* Set the example for `enable_notifs` to `True`, for consistency and because it's more useful
* Raise errors as ConfigError rather than RuntimeError for nicer formatting
The state res v2 algorithm only cares about the difference between auth
chains, so we can pass in the known common state to the `get_auth_chain`
storage function so that it can ignore those events.
* Increase DB/CPU perf of `_is_server_still_joined` check.
For rooms with large amount of state a single user leaving could cause
us to go and load a lot of membership events and then pull out
membership state in a large number of batches.
* Newsfile
* Update synapse/storage/persist_events.py
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Fix adding if too soon
* Update docstring
* Review comments
* Woops typo
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
... and set it everywhere it's called.
while we're here, rename it for consistency with `check_user_in_room` (and to
help check that I haven't missed any instances)
* Reject device display names that are too long.
Too long is currently defined as 100 characters in length.
* Add a regression test for rejecting a too long device display name.
==============================
Features
--------
- Filter out m.room.aliases from the CS API to mitigate abuse while a better solution is specced. ([\#6878](https://github.com/matrix-org/synapse/issues/6878))
Internal Changes
----------------
- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](https://github.com/matrix-org/synapse/issues/6880))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl5BLBwACgkQOSor00I9
eP/9Ogf9HZ+XD5tt1uYkGqDSvUBS4sHdNcvxpNf1UMo6VaY4K5b9kF28MjY07ur2
nGq/nnLyLZZzwVXtV3q8upIdIK1hcMp2HCU0qDhEljepOkGmyQTqZooxrdrLTjPS
232bbifB6VFdK8P4gPIGopCRQI9a+TII4jODBobiJR43e0gEPfSREJKAnJs7DYaD
Quf3Svzu+jB+X8ymWLVbCFXuk4A4UqPzrRPb8TQBDN3w35G4JBDAgIpdxhQdJpdR
RIS1HgSm95P03doS23RLUjXuIpBfYveyStuGfPx4P9Lf30M2o6bxDeTPo2+IGbST
yhl1jB0l8Ve5/6veuTieA6vuXWU55g==
=2rgX
-----END PGP SIGNATURE-----
Merge tag 'v1.10.0rc3' into develop
Synapse 1.10.0rc3 (2020-02-10)
==============================
Features
--------
- Filter out m.room.aliases from the CS API to mitigate abuse while a better solution is specced. ([\#6878](https://github.com/matrix-org/synapse/issues/6878))
Internal Changes
----------------
- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](https://github.com/matrix-org/synapse/issues/6880))
We're in the middle of properly mitigating spam caused by malicious aliases being added to a room. However, until this work fully lands, we temporarily filter out all m.room.aliases events from /sync and /messages on the CS API, to remove abusive aliases. This is considered acceptable as m.room.aliases events were never a reliable record of the given alias->id mapping and were purely informational, and in their current state do more harm than good.
A lot of the things we log at INFO are now a bit superfluous, so lets
make them DEBUG logs to reduce the amount we log by default.
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Brendan Abolivier <github@brendanabolivier.com>
==============================
Bugfixes
--------
- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848))
Internal Changes
----------------
- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEumuwyPtYLL2OMhYdOtoG7cdT0R4FAl478l8QHGVyaWtAbWF0
cml4Lm9yZwAKCRA62gbtx1PRHmKfD/0ZpBV95gMk0iecfvO6IY8+0KigzDUypgXf
zp1k+k5DBUmH5/83h7+cXWHIJyofv1At1YdIq9J/nNeyqr3V3dcQwPabOKYbI96d
kVD0EZX+2NjuYejAuF9eJNdiwRq5yMLluCfnOXr5jCS0NNdpO1xVb4bWYm50RmcD
WJvBvmfwXwCC3AFyNb4IAipaVUXuVXXx1YhjFs6DsbtpPUG88e4uIhpfvrS4v0t8
GFJZNuVJStOvyKHHTfNRIMkhy4xidj3HRUMSmjNpx07WnPBnbv4LnUOD1boYxgaP
EoODYnoAPshdiKB0AywNNjue3TmFD/Z7vVEzlFP/lNZ4GDU4kN9C/EwFYw0C2Jqq
f9O/E2ZuP9Qqume5O9UwMQQAmhV5lMBaIsYRbixU+9bSB893zRHRffA11HypzD00
ZXj8QDOXpiXp2jpAwN1Rk9e/aZEX3qd+zUAgRmk+kYb0KTC2/gcY956rodNSSO/U
RFYg4DFvoCgCrRSxZ4LQMlFu3YY2E7qWH+p/OHk3WG79jW64VpEFaytvv8fR+GKq
g3EbtWy6mUzlKYqbjZ+ZTUDe5AWdv6ZX8xJqRD/S6cpiwyh6Gp89HHNvBThXoCmK
fxkgw8if7eIITuTlNuDrqunxyWqwd3oVlzd2mi2bg0yRfcqJ9C6OuBV1VTLFZeky
3sjCiU0IRw==
=kcSy
-----END PGP SIGNATURE-----
Merge tag 'v1.10.0rc2' into develop
Synapse 1.10.0rc2 (2020-02-06)
==============================
Bugfixes
--------
- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848))
Internal Changes
----------------
- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
We were looking at the wrong event type (`m.room.encryption` vs
`m.room.encrypted`).
Also fixup the duplicate `EvenTypes` entries.
Introduced in #6776.