For the record, the reason we need this is as follows:
each RDATA command comes down the redis pipe as a subscription message. txredisapi as written needs at least three reactor ticks to read each subscription message from the tcp buffer. Hence, once the process gets loaded, it starts getting behind, and eventually redis knifes the connection. it then takes ages for the master to work its way through the backlog, before it reconnects again, during which any commands from any workers are dropped.
We forgot to set the password on the subscriber connection, as well as
not calling super methods for overridden connectionMade/connectionLost
functions.
For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
By persisting the user interactive authentication sessions to the database, this fixes
situations where a user hits different works throughout their auth session and also
allows sessions to persist through restarts of Synapse.
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
For direct TCP connections we need the master to relay REMOTE_SERVER_UP
commands to the other connections so that all instances get notified
about it. The old implementation just relayed to all connections,
assuming that sending back to the original sender of the command was
safe. This is not true for redis, where commands sent get echoed back to
the sender, which was causing master to effectively infinite loop
sending and then re-receiving REMOTE_SERVER_UP commands that it sent.
The fix is to ensure that we only relay to *other* connections and not
to the connection we received the notification from.
Fixes#7334.
* Factor out functions for injecting events into database
I want to add some more flexibility to the tools for injecting events into the
database, and I don't want to clutter up HomeserverTestCase with them, so let's
factor them out to a new file.
* Rework TestReplicationDataHandler
This wasn't very easy to work with: the mock wrapping was largely superfluous,
and it's useful to be able to inspect the received rows, and clear out the
received list.
* Fix AssertionErrors being thrown by EventsStream
Part of the problem was that there was an off-by-one error in the assertion,
but also the limit logic was too simple. Fix it all up and add some tests.
If the admin adds a `.yaml` file that's either empty or doesn't parse into a dict to a config directory (e.g. `conf.d` for debs installs), stuff like https://github.com/matrix-org/synapse/issues/7322 would happen. This PR checks that the file is correctly parsed into a dict, or ignores it with a warning if it parses into any other type (including `None` for empty files).
Fixes https://github.com/matrix-org/synapse/issues/7322
Figuring out how to correctly limit updates from this stream without dropping
entries is far more complicated than just counting the number of rows being
returned. We need to consider each query separately and, if any one query hits
the limit, truncate the results from the others.
I think this also fixes some potentially long-standing bugs where events or
state changes could get missed if we hit the limit on either query.
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
==============================
Features
--------
- Always send users their own device updates. ([\#7160](https://github.com/matrix-org/synapse/issues/7160))
- Add support for handling GET requests for `account_data` on a worker. ([\#7311](https://github.com/matrix-org/synapse/issues/7311))
Bugfixes
--------
- Fix a bug that prevented cross-signing with users on worker-mode synapses. ([\#7255](https://github.com/matrix-org/synapse/issues/7255))
- Do not treat display names as globs in push rules. ([\#7271](https://github.com/matrix-org/synapse/issues/7271))
- Fix a bug with cross-signing devices belonging to remote users who did not share a room with any user on the local homeserver. ([\#7289](https://github.com/matrix-org/synapse/issues/7289))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl6gS+IACgkQOSor00I9
eP8CTwf+KLehHQMcrz0yoHN+/O86mpNv3Uz83BJ0qDIgezVbtqT7+ZwnK2Jb8dwM
zTcGRq5a6V+JLdfSTnXfrPzvAPt0WLaEENSAEGwZ6unqqxcG2PgATkRb0XtY8Rzh
Feu8WgJTnORbiiM3sBLI8toZMxal3X3ZcftPEAan1SaoyqLPQ2s14edDwgx64S5e
rBGzPDNSxZU1McNBq7Q11QTg3zgBUF+TYwyhnLbC5X1RdDwQpKhNZhwhYBS1yOD6
TsPjtlixC+HXb21Rmob7XY8VFxTBSBdUdtVQKAT4h3+l9bNFQLpBfDSVc7xkPiLt
/Gj+PL+YgY9xQDvHO2fVw8m4ololZw==
=p53e
-----END PGP SIGNATURE-----
Merge tag 'v1.12.4rc1' into develop
Synapse 1.12.4rc1 (2020-04-22)
==============================
Features
--------
- Always send users their own device updates. ([\#7160](https://github.com/matrix-org/synapse/issues/7160))
- Add support for handling GET requests for `account_data` on a worker. ([\#7311](https://github.com/matrix-org/synapse/issues/7311))
Bugfixes
--------
- Fix a bug that prevented cross-signing with users on worker-mode synapses. ([\#7255](https://github.com/matrix-org/synapse/issues/7255))
- Do not treat display names as globs in push rules. ([\#7271](https://github.com/matrix-org/synapse/issues/7271))
- Fix a bug with cross-signing devices belonging to remote users who did not share a room with any user on the local homeserver. ([\#7289](https://github.com/matrix-org/synapse/issues/7289))
First some background: StreamChangeCache is used to keep track of what "entities" have
changed since a given stream ID. So for example, we might use it to keep track of when the last
to-device message for a given user was received [1], and hence whether we need to pull any to-device messages from the database on a sync [2].
Now, it turns out that StreamChangeCache didn't support more than one thing being changed at
a given stream_id (this was part of the problem with #7206). However, it's entirely valid to send
to-device messages to more than one user at a time.
As it turns out, this did in fact work, because *some* methods of StreamChangeCache coped
ok with having multiple things changing on the same stream ID, and it seems we never actually
use the methods which don't work on the stream change caches where we allow multiple
changes at the same stream ID. But that feels horribly fragile, hence: let's update
StreamChangeCache to properly support this, and add some typing and some more tests while
we're at it.
[1]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L301
[2]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L47-L51
Splitting based on the response code means we can avoid double logging here and identical information from line 164 while still logging at info if we don't get a good response and need to retry.
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.
This code was introduced in #7024, and I hope this fixes#7206.
Add changelog
Save retrieved keys to the db
lint
Fix and de-brittle remote result dict processing
Use query_user_devices instead, assume only master, self_signing key types
Make changelog more useful
Remove very specific exception handling
Wrap get_verify_key_from_cross_signing_key in a try/except
Note that _get_e2e_cross_signing_verify_key can raise a SynapseError
lint
Add comment explaining why this is useful
Only fetch master and self_signing key types
Fix log statements, docstrings
Remove extraneous items from remote query try/except
lint
Factor key retrieval out into a separate function
Send device updates, modeled after SigningKeyEduUpdater._handle_signing_key_updates
Update method docstring
The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290.
After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.
I've also introduced a couple of new types, to take out some duplication.
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.
(think this was introduced in #7024)
I don't really remember why this was so complicated; I think it dates
back to the time when we had to instantiate the Config classes before
we could call `add_arguments` - ie before #5597. In any case, I don't
think there's a good reason for it any more, and the impact of it
being complicated is that `--help` doesn't work correctly.
`REPLICATE` is now a valid command, and it's nice if you can issue it from the
console without remembering to call it `REPLICATE ` with a trailing space.
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
Fixes#6815
Before figuring out whether we should alert a user on MAU, we call get_notice_room_for_user to get some info on the existing server notices room for this user. This function, if the room doesn't exist, creates it and invites the user in it. This means that, if we decide later that no server notice is needed, the user gets invited in a room with no message in it. This happens at every restart of the server, since the room ID returned by get_notice_room_for_user is cached.
This PR fixes that by moving the inviting bit to a dedicated function, that's only called when the server actually needs to send a notice to the user. A potential issue with this approach is that the room that's created by get_notice_room_for_user doesn't match how that same function looks for an existing room (i.e. it creates a room that doesn't have an invite or a join for the current user in it, so it could lead to a new room being created each time a user syncs), but I'm not sure this is a problem given it's cached until the server restarts, so that function won't run very often.
It also renames get_notice_room_for_user into get_or_create_notice_room_for_user to make what it does clearer.
Occasionally we could get a federation device list update transaction which
looked like:
```
[
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D2', 'prev_id': [], 'stream_id': 12, 'deleted': True}},
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D1', 'prev_id': [12], 'stream_id': 11, 'deleted': True}},
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D3', 'prev_id': [11], 'stream_id': 13, 'deleted': True}}
]
```
Having `stream_ids` which are lower than `prev_ids` looks odd. It might work
(I'm not actually sure), but in any case it doesn't seem like a reasonable
thing to expect other implementations to support.
* master:
1.12.1
Note where bugs were introduced
1.12.1rc1
Newsfile
Rewrite changelog
Add changelog
Only import sqlite3 when type checking
Fix another instance
Only setdefault for signatures if device has key_json
Fix starting workers when federation sending not split out.
Attempt to clarify Python version requirements (#7161)
Improve the UX of the login fallback when using SSO (#7152)
Update the wording of the config comment
Lint
Changelog
Regenerate sample config
Whitelist the login fallback by default for SSO
===========================
No significant changes since 1.12.1rc1.
Synapse 1.12.1rc1 (2020-03-31)
==============================
Bugfixes
--------
- Fix starting workers when federation sending not split out. ([\#7133](https://github.com/matrix-org/synapse/issues/7133)). Introduced in v1.12.0.
- Avoid importing `sqlite3` when using the postgres backend. Contributed by David Vo. ([\#7155](https://github.com/matrix-org/synapse/issues/7155)). Introduced in v1.12.0rc1.
- Fix a bug which could cause outbound federation traffic to stop working if a client uploaded an incorrect e2e device signature. ([\#7177](https://github.com/matrix-org/synapse/issues/7177)). Introduced in v1.11.0.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAl6GAZMTHGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9JqOD/4kjIBwKSOaUzYhzaP+4o4fDwz49IiO
GzSgq6bf+C1V6Vev/7+N1is0FnbfelaJZHf7wM1044tozL+puPqaGl2A/Zjxs8Pf
x9LpS53yOBsYYUvvYSUdE8MlWPimV/EERJa9eoIKloMt2vtcNpwwE+KYygqPR6Rz
xvexnh5FwOj9zQAS3KDE+ZUbgrx+S+VkV6C5tlDziuOlT1VZBsGGj/SmfKX9d11z
e22hgebAaEKheACAEvrHzGl5JyR8fmAuBOSWRmybucyzAQRGNm4yqoOaonD8ic0l
CkbjC1ix5BdfP2vAww6wzWRDXSmU1qk8hC4/SXCaS/xH2RN5Jh9ptnw0tuWwP0dk
J8joNBGR3cOrDDwjnqguqE1fckLTa/JOxNy8JzVTugO0v1KmDk7kYuO6GOapNmXI
qUuqrQARTCsoAt6r5qMlyKw/yk4vjLdZ9VxRDtI4uz/P+WeWS4LTG8G9eMKjZ9Kd
rOaIlO7lA6vwFMd7Twe1p6y721yfhGyp6Jcz6UDOdh+cbxZ1fSg8/SsYA9NrbqOk
4bHt7s8YNI/V2kU+LxuZjtOFENa7XvsR/rURts2GvNGDusZJMNHl8aJvOCnF+ReO
M6Ayzx+91R3oRaRdmuuvwLdFStrnSfHp7XcZYOGvN8fUBocfB2c+yZR5H9MhWSnv
h1Gndj+lR7uvKg==
=/J/H
-----END PGP SIGNATURE-----
Merge tag 'v1.12.1'
Synapse 1.12.1 (2020-04-02)
===========================
No significant changes since 1.12.1rc1.
Synapse 1.12.1rc1 (2020-03-31)
==============================
Bugfixes
--------
- Fix starting workers when federation sending not split out. ([\#7133](https://github.com/matrix-org/synapse/issues/7133)). Introduced in v1.12.0.
- Avoid importing `sqlite3` when using the postgres backend. Contributed by David Vo. ([\#7155](https://github.com/matrix-org/synapse/issues/7155)). Introduced in v1.12.0rc1.
- Fix a bug which could cause outbound federation traffic to stop working if a client uploaded an incorrect e2e device signature. ([\#7177](https://github.com/matrix-org/synapse/issues/7177)). Introduced in v1.11.0.
* tag 'v1.12.1':
1.12.1
Note where bugs were introduced
1.12.1rc1
Newsfile
Rewrite changelog
Add changelog
Only import sqlite3 when type checking
Fix another instance
Only setdefault for signatures if device has key_json
Fix starting workers when federation sending not split out.
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
If there was an exception setting up one of the attributes of the Homeserver
god object, then future attempts to fetch that attribute would raise a
confusing "Cyclic dependency" error. Let's make sure that we clear the
`building` flag so that we just get the original exception.
Ref: #7169
* Remove `conn_id` usage for UserSyncCommand.
Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replicaiton commands.
This really only effects UserSyncCommand.
* Add CLEAR_USER_SYNCS command that is sent on shutdown.
This should help with the case where a synchrotron gets restarted
gracefully, rather than rely on 5 minute timeout.
That fallback sets the redirect URL to itself (so it can process the login
token then return gracefully to the client). This would make it pointless to
ask the user for confirmation, since the URL the confirmation page would be
showing wouldn't be the client's.
* Don't show the login forms if we're currently logging in with a
password or a token.
* Submit directly the SSO login form, showing only a spinner to the
user, in order to eliminate from the clunkiness of SSO through this
fallback.
* Don't show the login forms if we're currently logging in with a
password or a token.
* Submit directly the SSO login form, showing only a spinner to the
user, in order to eliminate from the clunkiness of SSO through this
fallback.
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
* Pull Sentinel out of LoggingContext
... and drop a few unnecessary references to it
* Factor out LoggingContext.current_context
move `current_context` and `set_context` out to top-level functions.
Mostly this means that I can more easily trace what's actually referring to
LoggingContext, but I think it's generally neater.
* move copy-to-parent into `stop`
this really just makes `start` and `stop` more symetric. It also means that it
behaves correctly if you manually `set_log_context` rather than using the
context manager.
* Replace `LoggingContext.alive` with `finished`
Turn `alive` into `finished` and make it a bit better defined.
* Add 'device_lists_outbound_pokes' as extra table.
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
* Change device lists stream to have one row per id.
This will make it possible to process the streams more incrementally,
avoiding having to process large chunks at once.
* Change device list replication to match new semantics.
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
* Newsfile
* Remove handling of multiple rows per ID
* Fix worker handling
* Comments from review
This should be safe to do on all workers/masters because it is guarded by
a config option which will ensure it is only actually done on the worker
assigned as a pusher.
It was originally implemented by pulling the full auth chain of all
state sets out of the database and doing set comparison. However, that
can take a lot work if the state and auth chains are large.
Instead, lets try and fetch the auth chains at the same time and
calculate the difference on the fly, allowing us to bail early if all
the auth chains converge. Assuming that the auth chains do converge more
often than not, this should improve performance. Hopefully.
Fixes#7065
This is basically the same as https://github.com/matrix-org/synapse/pull/6847 except it tries to populate events from `state_events` rather than `current_state_events`, since the latter might have been cleared from the state of some rooms too early, leaving them with a `NULL` room version.
If an error happened while processing a SAML AuthN response, or a client
ends up doing a `GET` request to `/authn_response`, then render a
customisable error page rather than a confusing error.
Fixes#7054
I also had a look at the rest of the functions in
`EventPushActionsStore` and in the push notifications send code and it
looks to me like there shouldn't be any other method with this issue in
this part of the codebase.
This is a bit fiddly because it all has to be done on one fell swoop:
* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
This currently causes presence notify code to log exceptions when there
is no state changes to process. This doesn't actually cause any problems
as we'd simply do nothing anyway.
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
Instead lets just warn if the worker has a media listener configured but
has the media repository disabled.
Previously non media repository workers would just ignore the media
listener.
When we get an invite over federation, store the room version in the rooms table.
The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
This is intended as a precursor to storing room versions when we receive an
invite over federation, but has the happy side-effect of fixing #3374 at last.
In short: change the store_room with try/except to a proper upsert which
updates the right columns.
* Give `notif_template_html`, `notif_template_text` default values (fixes#6960)
* Don't complain if `smtp_host` and `smtp_port` are unset, since they have sensible defaults (fixes#6961)
* Set the example for `enable_notifs` to `True`, for consistency and because it's more useful
* Raise errors as ConfigError rather than RuntimeError for nicer formatting
The state res v2 algorithm only cares about the difference between auth
chains, so we can pass in the known common state to the `get_auth_chain`
storage function so that it can ignore those events.
* Increase DB/CPU perf of `_is_server_still_joined` check.
For rooms with large amount of state a single user leaving could cause
us to go and load a lot of membership events and then pull out
membership state in a large number of batches.
* Newsfile
* Update synapse/storage/persist_events.py
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Fix adding if too soon
* Update docstring
* Review comments
* Woops typo
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
... and set it everywhere it's called.
while we're here, rename it for consistency with `check_user_in_room` (and to
help check that I haven't missed any instances)
* Reject device display names that are too long.
Too long is currently defined as 100 characters in length.
* Add a regression test for rejecting a too long device display name.
==============================
Features
--------
- Filter out m.room.aliases from the CS API to mitigate abuse while a better solution is specced. ([\#6878](https://github.com/matrix-org/synapse/issues/6878))
Internal Changes
----------------
- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](https://github.com/matrix-org/synapse/issues/6880))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl5BLBwACgkQOSor00I9
eP/9Ogf9HZ+XD5tt1uYkGqDSvUBS4sHdNcvxpNf1UMo6VaY4K5b9kF28MjY07ur2
nGq/nnLyLZZzwVXtV3q8upIdIK1hcMp2HCU0qDhEljepOkGmyQTqZooxrdrLTjPS
232bbifB6VFdK8P4gPIGopCRQI9a+TII4jODBobiJR43e0gEPfSREJKAnJs7DYaD
Quf3Svzu+jB+X8ymWLVbCFXuk4A4UqPzrRPb8TQBDN3w35G4JBDAgIpdxhQdJpdR
RIS1HgSm95P03doS23RLUjXuIpBfYveyStuGfPx4P9Lf30M2o6bxDeTPo2+IGbST
yhl1jB0l8Ve5/6veuTieA6vuXWU55g==
=2rgX
-----END PGP SIGNATURE-----
Merge tag 'v1.10.0rc3' into develop
Synapse 1.10.0rc3 (2020-02-10)
==============================
Features
--------
- Filter out m.room.aliases from the CS API to mitigate abuse while a better solution is specced. ([\#6878](https://github.com/matrix-org/synapse/issues/6878))
Internal Changes
----------------
- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](https://github.com/matrix-org/synapse/issues/6880))
We're in the middle of properly mitigating spam caused by malicious aliases being added to a room. However, until this work fully lands, we temporarily filter out all m.room.aliases events from /sync and /messages on the CS API, to remove abusive aliases. This is considered acceptable as m.room.aliases events were never a reliable record of the given alias->id mapping and were purely informational, and in their current state do more harm than good.
A lot of the things we log at INFO are now a bit superfluous, so lets
make them DEBUG logs to reduce the amount we log by default.
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Brendan Abolivier <github@brendanabolivier.com>
==============================
Bugfixes
--------
- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848))
Internal Changes
----------------
- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEumuwyPtYLL2OMhYdOtoG7cdT0R4FAl478l8QHGVyaWtAbWF0
cml4Lm9yZwAKCRA62gbtx1PRHmKfD/0ZpBV95gMk0iecfvO6IY8+0KigzDUypgXf
zp1k+k5DBUmH5/83h7+cXWHIJyofv1At1YdIq9J/nNeyqr3V3dcQwPabOKYbI96d
kVD0EZX+2NjuYejAuF9eJNdiwRq5yMLluCfnOXr5jCS0NNdpO1xVb4bWYm50RmcD
WJvBvmfwXwCC3AFyNb4IAipaVUXuVXXx1YhjFs6DsbtpPUG88e4uIhpfvrS4v0t8
GFJZNuVJStOvyKHHTfNRIMkhy4xidj3HRUMSmjNpx07WnPBnbv4LnUOD1boYxgaP
EoODYnoAPshdiKB0AywNNjue3TmFD/Z7vVEzlFP/lNZ4GDU4kN9C/EwFYw0C2Jqq
f9O/E2ZuP9Qqume5O9UwMQQAmhV5lMBaIsYRbixU+9bSB893zRHRffA11HypzD00
ZXj8QDOXpiXp2jpAwN1Rk9e/aZEX3qd+zUAgRmk+kYb0KTC2/gcY956rodNSSO/U
RFYg4DFvoCgCrRSxZ4LQMlFu3YY2E7qWH+p/OHk3WG79jW64VpEFaytvv8fR+GKq
g3EbtWy6mUzlKYqbjZ+ZTUDe5AWdv6ZX8xJqRD/S6cpiwyh6Gp89HHNvBThXoCmK
fxkgw8if7eIITuTlNuDrqunxyWqwd3oVlzd2mi2bg0yRfcqJ9C6OuBV1VTLFZeky
3sjCiU0IRw==
=kcSy
-----END PGP SIGNATURE-----
Merge tag 'v1.10.0rc2' into develop
Synapse 1.10.0rc2 (2020-02-06)
==============================
Bugfixes
--------
- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848))
Internal Changes
----------------
- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
We were looking at the wrong event type (`m.room.encryption` vs
`m.room.encrypted`).
Also fixup the duplicate `EvenTypes` entries.
Introduced in #6776.
I messed this up a bit in #6805, but fortunately we weren't actually doing
anything with the room_version so it didn't matter that it was a str not a RoomVersion.
When a server leaves a room it may stop sharing a room with remote
users, and thus not get any updates to their device lists. So we need to
check for this case and delete those device lists from the cache.
We don't need to do this if we stop sharing a room because the remote
user leaves the room, because we track that case via looking at
membership changes.
If we detect that the remote users' keys may have changed then we should
attempt to resync against the remote server rather than using the
(potentially) stale local cache.
We were sending device updates down both the federation stream and
device streams. This mean there was a race if the federation sender
worker processed the federation stream first, as when the sender checked
if there were new device updates the slaved ID generator hadn't been
updated with the new stream IDs and so returned nothing.
This situation is correctly handled by events/receipts/etc by not
sending updates down the federation stream and instead having the
federation sender worker listen on the other streams and poke the
transaction queues as appropriate.
Otherwise its just stale data, which may get deleted later anyway so
can't be relied on. It's also a bit of a shotgun if we're trying to get
the current state of a room we're not in.
These are easier to work with than the strings and we normally have one around.
This fixes `FederationHander._persist_auth_tree` which was passing a
RoomVersion object into event_auth.check instead of a string.
There are quite a few places that we assume that a redaction event has a
corresponding `redacts` key, which is not always the case. So lets
cheekily make it so that event.redacts just returns None instead.
Turns out that figuring out a remote user id for the SAML user isn't quite as obvious as it seems. Factor it out to the SamlMappingProvider so that it's easy to control.
Generally try to make this more comprehensible, and make it match the
conventions.
I've removed the documentation for all the settings which allow you to change
the names of the template files, because I can't really see why they are
useful.
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
When figuring out which topological token to start a purge job at, we
need to do the following:
1. Figure out a timestamp before which events will be purged
2. Select the first stream ordering after that timestamp
3. Select info about the first event after that stream ordering
4. Build a topological token from that info
In some situations (e.g. quiet rooms with a short max_lifetime), there
might not be an event after the stream ordering at step 3, therefore we
abort the purge with the error `No event found`. To mitigate that, this
patch fetches the first event _before_ the stream ordering, instead of
after.
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
AdditionalResource really doesn't add any value, and it gets in the way for
resources which want to support child resources or the like. So, if the
resource object already implements the IResource interface, don't bother
wrapping it.
Add some useful things, such as error types and logcontext handling, to the
API.
Make `hs` a private member to dissuade people from using it (hopefully
they aren't already).
Add a couple of new methods (`record_user_external_id` and
`generate_short_term_login_token`).
This was ill-advised. We can't modify verify_keys here, because the response
object has already been signed by the requested key.
Furthermore, it's somewhat unnecessary because existing versions of Synapse
(which get upset that the notary key isn't present in verify_keys) will fall
back to a direct fetch via `/key/v2/server`.
Also: more tests for fetching keys via perspectives: it would be nice if we actually tested when our fetcher can't talk to our notary impl.
* Kill off redundant SynapseRequestFactory
We already get the Site via the Channel, so there's no need for a dedicated
RequestFactory: we can just use the right constructor.
* Workaround for error when fetching notary's own key
As a notary server, when we return our own keys, include all of our signing
keys in verify_keys.
This is a workaround for #6596.
If acme was enabled, the sdnotify startup hook would never be run because we
would try to add it to a hook which had already fired.
There's no need to delay it: we can sdnotify as soon as we've started the
listeners.
* Remove redundant python2 support code
`str.decode()` doesn't exist on python3, so presumably this code was doing
nothing
* Filter out pushers with corrupt data
When we get a row with unparsable json, drop the row, rather than returning a
row with null `data`, which will then cause an explosion later on.
* Improve logging when we can't start a pusher
Log the ID to help us understand the problem
* Make email pusher setup more robust
We know we'll have a `data` member, since that comes from the database. What we
*don't* know is if that is a dict, and if that has a `brand` member, and if
that member is a string.
Previously we tried to be clever and filter out some unnecessary event
IDs to keep the auth chain small, but that had some annoying
interactions with state res v2 so we stop doing that for now.
Previously we tried to be clever and filter out some unnecessary event
IDs to keep the auth chain small, but that had some annoying
interactions with state res v2 so we stop doing that for now.
This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
When we request the state/auth_events to populate a backwards extremity (on
backfill or in the case of missing events in a transaction push), we should
check that the returned events are in the right room rather than blindly using
them in the room state or auth chain.
Given that _get_events_from_store_or_dest takes a room_id, it seems clear that
it should be sanity-checking the room_id of the requested events, so let's do
it there.
Make it return the state *after* the requested event, rather than the one
before it. This is a bit easier and requires fewer calls to
get_events_from_store_or_dest.
This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
Sometimes the filtering function can return a pruned version of an event (on top of either the event itself or an empty list), if it thinks the user should be able to see that there's an event there but not the content of that event. Therefore, the previous logic of 'if filtered is empty then we can use the event we retrieved from the database' is flawed, and we should use the event returned by the filtering function.
When we request the state/auth_events to populate a backwards extremity (on
backfill or in the case of missing events in a transaction push), we should
check that the returned events are in the right room rather than blindly using
them in the room state or auth chain.
Given that _get_events_from_store_or_dest takes a room_id, it seems clear that
it should be sanity-checking the room_id of the requested events, so let's do
it there.
Make it return the state *after* the requested event, rather than the one
before it. This is a bit easier and requires fewer calls to
get_events_from_store_or_dest.
PaginationHandler.get_messages is only called by RoomMessageListRestServlet,
which is async.
Chase the code path down from there:
- FederationHandler.maybe_backfill (and nested try_backfill)
- FederationHandler.backfill
=============================
Bugfixes
--------
- Fix incorrect error message for invalid requests when setting user's avatar URL. ([\#6497](https://github.com/matrix-org/synapse/issues/6497))
- Fix support for SQLite 3.7. ([\#6499](https://github.com/matrix-org/synapse/issues/6499))
- Fix regression where sending email push would not work when using a pusher worker. ([\#6507](https://github.com/matrix-org/synapse/issues/6507), [\#6509](https://github.com/matrix-org/synapse/issues/6509))
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEumuwyPtYLL2OMhYdOtoG7cdT0R4FAl3w+RwQHGVyaWtAbWF0
cml4Lm9yZwAKCRA62gbtx1PRHijfD/43EQ58jqKvD+qKZwVFOE6JJ3SCS7UJbi3f
zq9KWDuCB6EAFjAgmJikbBBqHPO5qq2WtNXaCpexx44s8Mk8SZKHg56dP6Fk651C
assAb/Nsh6CAlPUcRkx8I0L/kYXMPDyATlLVBHVOi3pFDJ093mdOQ4q8yP9iUTM+
OPsbT8k/pMhrhCH951bGmB6/SEcju+ubObW+bRFe8o3v1KE9jVYQjUGMhuoXp3pM
z/OB8idZcqOvCc6HMo83tg9FuI613Jy80PMIc1ofyJgvnu+aDBepWuldvFEMnmSR
D862jMor7+WdnDOTeWZrC+DXjl0qCP8F6ahs5rEllRglt/Ep2wEDPA/8YrRoEI3V
RWe9W7XDdFCXdzlvXheOfETqTu9kdsurTwBEeJrWQ0vOLY86hxt9KKKcHVhy5eKq
kNfRtvSVLmRhIssp7hVBcywRwnaxN7R2OoRq/TWnTZz+xEOPzYFU6r0l9kk5dbg6
fqYYxIXbgZlhjihLWNIMNwwZH9ll/eoPiinkRTZNz40THP/VDTR9DM4tQ6jrhrr6
sP0qM7LljvrdiimXtV00tUDyUspNgJl6xDJyGDHWM9uoCB7uorEpMQZSzlZhZe8s
6q+fQPHlfW4JYOejfihPkjrV1ViawvEucqWPaIsD+v26C4RP7qCTtYj3NPiA65of
zhOjomWWcg==
=ABoZ
-----END PGP SIGNATURE-----
Merge tag 'v1.7.0rc2' into develop
Synapse 1.7.0rc2 (2019-12-11)
=============================
Bugfixes
--------
- Fix incorrect error message for invalid requests when setting user's avatar URL. ([\#6497](https://github.com/matrix-org/synapse/issues/6497))
- Fix support for SQLite 3.7. ([\#6499](https://github.com/matrix-org/synapse/issues/6499))
- Fix regression where sending email push would not work when using a pusher worker. ([\#6507](https://github.com/matrix-org/synapse/issues/6507), [\#6509](https://github.com/matrix-org/synapse/issues/6509))
`Measure` incorrectly assumed that it was the only thing being done by the parent `LoggingContext`. For instance, during a "renew group attestations" operation, hundreds of `outbound_request` calls could take place in parallel, all using the same `LoggingContext`. This would mean that any resources used during *any* of those calls would be reported against *all* of them, producing wildly inaccurate results.
Instead, we now give each `Measure` block its own `LoggingContext` (using the parent `LoggingContext` mechanism to ensure that the log lines look correct and that the metrics are ultimately propogated to the top level for reporting against requests/backgrond tasks).
have_events was a map from event_id to rejection reason (or None) for events
which are in our local database. It was used as filter on the list of
event_ids being passed into get_events_as_list. However, since
get_events_as_list will ignore any event_ids that are unknown or rejected, we
can equivalently just leave it to get_events_as_list to do the filtering.
That means that we don't have to keep `have_events` up-to-date, and can use
`have_seen_events` instead of `get_seen_events_with_rejection` in the one place
we do need it.
Implement part [MSC2228](https://github.com/matrix-org/matrix-doc/pull/2228). The parts that differ are:
* the feature is hidden behind a configuration flag (`enable_ephemeral_messages`)
* self-destruction doesn't happen for state events
* only implement support for the `m.self_destruct_after` field (not the `m.self_destruct` one)
* doesn't send synthetic redactions to clients because for this specific case we consider the clients to be able to destroy an event themselves, instead we just censor it (by pruning its JSON) in the database
Purge jobs don't delete the latest event in a room in order to keep the forward extremity and not break the room. On the other hand, get_state_events, when given an at_token argument calls filter_events_for_client to know if the user can see the event that matches that (sync) token. That function uses the retention policies of the events it's given to filter out those that are too old from a client's view.
Some clients, such as Riot, when loading a room, request the list of members for the latest sync token it knows about, and get confused to the point of refusing to send any message if the server tells it that it can't get that information. This can happen very easily with the message retention feature turned on and a room with low activity so that the last event sent becomes too old according to the room's retention policy.
An easy and clean fix for that issue is to discard the room's retention policies when retrieving state.
=============================
Bugfixes
--------
- Fix a bug which could cause the background database update hander for event labels to get stuck in a loop raising exceptions. ([\#6407](https://github.com/matrix-org/synapse/issues/6407))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl3b1yUACgkQOSor00I9
eP/dWQf+ORS/B853qyH5KPZ66o6d7WudSewPmEkFD3747CBevxBsTPETijkqTBlo
WPOmQy9i5OUWpsFYrsrCH+ATpr0JYaIuuoHsIFq/BPFFUx64qrgDwL+X4QEShwAm
kjGNtCMP6VNGjM6MqFepRHSTbIEamCCS665CgVJtqgRYRaAJYI3SQDQ64+ALcbx3
clFZowKV2EtfqhYR7HuBUuxuRjRPGcciNVyjMQFkKq91gKsO4rjPttvE4Bok29ia
/uqFB6T0qty/81T708teZGgB/3/bYK4RtUA4lZCHBNeUejj26bESTI691RfBAEde
to+D7xjA5zaMP3atYNlrvRrqK7Mm3w==
=ucJN
-----END PGP SIGNATURE-----
Merge tag 'v1.6.0rc2' into develop
Synapse 1.6.0rc2 (2019-11-25)
=============================
Bugfixes
--------
- Fix a bug which could cause the background database update hander for event labels to get stuck in a loop raising exceptions. ([\#6407](https://github.com/matrix-org/synapse/issues/6407))
Doing a SELECT DISTINCT when paginating is quite expensive, because it requires the engine to do sorting on the entire events table. However, we only need to run it if we're filtering on 2+ labels, so this PR is changing the request so that DISTINCT is only used then.
We were doing this in a number of places which meant that some login
code paths incremented the counter multiple times.
It was also applying ratelimiting to UIA endpoints, which was probably
not intentional.
In particular, some custom auth modules were calling
`check_user_exists`, which incremented the counters, meaning that people
would fail to login sometimes.
Fixes a bug where rejected events were persisted with the wrong state group.
Also fixes an occasional internal-server-error when receiving events over
federation which are rejected and (possibly because they are
backwards-extremities) have no prev_group.
Fixes#6289.
* Raise an exception if accessing state for rejected events
Add some sanity checks on accessing state_group etc for
rejected events.
* Skip calculating push actions for rejected events
It didn't actually cause any bugs, because rejected events get filtered out at
various later points, but there's not point in trying to calculate the push
actions for a rejected event.
When the `/keys/query` API is hit on client_reader worker Synapse may
decide that it needs to resync some remote deivces. Usually this happens
on master, and then gets cached. However, that fails on workers and so
it falls back to fetching devices from remotes directly, which may in
turn fail if the remote is down.
While the current version of the spec doesn't say much about how this endpoint uses filters (see https://github.com/matrix-org/matrix-doc/issues/2338), the current implementation is that some fields of an EventFilter apply (the ones that are used when running the SQL query) and others don't (the ones that are used by the filter itself) because we don't call event_filter.filter(...). This seems counter-intuitive and probably not what we want so this commit fixes it.
The intention here is to make it clearer which fields we can expect to be
populated when: notably, that the _event_type etc aren't used for the
synchronous impl of EventContext.
The `http_proxy` and `HTTPS_PROXY` env vars can be set to a `host[:port]` value which should point to a proxy.
The address of the proxy should be excluded from IP blacklists such as the `url_preview_ip_range_blacklist`.
The proxy will then be used for
* push
* url previews
* phone-home stats
* recaptcha validation
* CAS auth validation
It will *not* be used for:
* Application Services
* Identity servers
* Outbound federation
* In worker configurations, connections from workers to masters
Fixes#4198.
* Offer the homeserver instance to the spam checker
* Newsfile
* Linting
* Expose a Spam Checker API instead of passing the homeserver object
* Alter changelog
* s/hs/api
This makes it easier to use in an async/await world.
Also fixes a bug where cache descriptors would occaisonally return a raw
value rather than a deferred.
This adds:
* a test sqlite database
* a configuration file for the sqlite database
* a configuration file for a postgresql database (using the credentials in `.buildkite/docker-compose.pyXX.pgXX.yaml`)
as well as a new script named `.buildkite/scripts/test_synapse_port_db.sh` that:
1. installs Synapse
2. updates the test sqlite database to the latest schema and runs background updates on it
3. creates an empty postgresql database
4. run the `synapse_port_db` script to migrate the test sqlite database to the empty postgresql database (with coverage)
Step `2` is done via a new script located at `scripts-dev/update_database`.
The test sqlite database is extracted from a SyTest run, so that it can be considered as an actual homeserver's database with actual data in it.
Turns out that loggers that are instantiated before the config is loaded get
turned off.
Also bring the logging config that is generated by --generate-config into line.
Fixes#6194.
* Fix presence timeouts when synchrotron restarts.
Handling timeouts would fail if there was an external process that had
timed out, e.g. a synchrotron restarting. This was due to a couple of
variable name typoes.
Fixes#3715.
=============================
Bugfixes
--------
- Fix bug where redacted events were sometimes incorrectly censored in the database, breaking APIs that attempted to fetch such events. ([\#6185](https://github.com/matrix-org/synapse/issues/6185), [5b0e9948](5b0e9948ea))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl2oj9sACgkQOSor00I9
eP+ZQQf9GQbjQhq92G1Mt1dI3HxXQnIvelGWaLC9rB++eGcafBkP0EJgCbQZTKda
WntghJib0oWVHVtPbwYKtwF3z365WbME3awO6RpRoMvrFL5PDqJ8Ma5LUgJ6/GuO
yDBfABbZk7PVS5WibftXXqibK8MkSrnId57VN3tc/SUHj3DGe4vk+A7plwjNckBj
2SLUA8DPw5I3Y4Z6peb6VweEA8M59+i3/0cZFAJ+4Ofy2Jad3z8kVXmJ1gTjBOdG
1bjBfnS2SrZLZwD79zJzaUR0zxfIvj62mL/fjrtR66oRmKF2T+9UnM7+9gH9JHWZ
ct3Mk4Nnr4VaBBWOWOcD2btvByckug==
=9lJk
-----END PGP SIGNATURE-----
Merge tag 'v1.4.1rc1' into develop
Synapse 1.4.1rc1 (2019-10-17)
=============================
Bugfixes
--------
- Fix bug where redacted events were sometimes incorrectly censored in the database, breaking APIs that attempted to fetch such events. ([\#6185](https://github.com/matrix-org/synapse/issues/6185), [5b0e9948](5b0e9948ea))
This fixed the weirdness of 400 vs 404 as http status code in the case
the filter id is not known by the server.
As e.g. matrix-js-sdk expects 404 to catch this situation this leads
to unwanted behaviour.
Hopefully this will fix the occasional failures we were seeing in the room directory.
The problem was that events are not necessarily persisted (and `current_state_delta_stream` updated) in the same order as their stream_id. So for instance current_state_delta 9 might be persisted *before* current_state_delta 8. Then, when the room stats saw stream_id 9, it assumed it had done everything up to 9, and never came back to do stream_id 8.
We can solve this easily by only processing up to the stream_id where we know all events have been persisted.
It turns out that _local_membership_update doesn't run when you join a new, remote room. It only runs if you're joining a room that your server already knows about. This would explain #4703 and #5295 and why the transfer would work in testing and some rooms, but not others. This would especially hit single-user homeservers.
The check has been moved to right after the room has been joined, and works much more reliably. (Though it may still be a bit awkward of a place).
More often than not passing bytes to `txn.execute` is a bug (where we
meant to pass a string) that just happens to work if `BYTEA_OUTPUT` is
set to `ESCAPE`. However, this is a bit of a footgun so we want to
instead error when this happens, and force using `bytearray` if we
actually want to use bytes.
* Fix /federation/v1/state for recent room versions
Turns out this endpoint was completely broken for v3 rooms. Hopefully this
re-signing code is irrelevant nowadays anyway.
Pillow will use nearest neighbour as the resampling algorithm if the
source image is either 1-bit or a color palette using 8 bits. If we
convert to RGB before scaling, we'll probably get a better result.
Replace `client_secret` query parameter values with `<redacted>` in the logs. Prevents a scenario where a MITM of server traffic can horde 3pids on their account.
We incorrectly used `room_id` as to bound the result set, even though we
order by `joined_members, room_id`, leading to incorrect results after
pagination.
Copy push rules during a room upgrade from the old room to the new room, instead of deleting them from the old room.
For instance, we've defined upgrading of a room multiple times to be possible, and push rules won't be transferred on the second upgrade if they're deleted during the first.
Also fix some missing yields that probably broke things quite a bit.
While this is not documented in the spec (but should be), Riot (and other clients) revoke 3PID invites by sending a m.room.third_party_invite event with an empty ({}) content to the room's state.
When the invited 3PID gets associated with a MXID, the identity server (which doesn't know about revocations) sends down to the MXID's homeserver all of the undelivered invites it has for this 3PID. The homeserver then tries to talk to the inviting homeserver in order to exchange these invite for m.room.member events.
When one of the invite is revoked, the inviting homeserver responds with a 500 error because it tries to extract a 'display_name' property from the content, which is empty. This might cause the invited server to consider that the server is down and not try to exchange other, valid invites (or at least delay it).
This fix handles the case of revoked invites by avoiding trying to fetch a 'display_name' from the original invite's content, and letting the m.room.member event fail the auth rules (because, since the original invite's content is empty, it doesn't have public keys), which results in sending a 403 with the correct error message to the invited server.
We have set the max retry interval to a value larger than a postgres or
sqlite int can hold, which caused exceptions when updating the
destinations table.
To fix postgres we need to change the column to a bigint, and for sqlite
we lower the max interval to 2**62 (which is still incredibly long).
Joining against `events` and ordering by `stream_ordering` is
inefficient as it forced scanning the entirety of the redactions table.
This isn't the case if we use `redactions.received_ts` column as we can
then use an index.
Currently we don't set `have_censored` column if we don't have the
target event of a redaction, which means we repeatedly attempt to censor
the same non-existant event.
When we persist non-redacted events we unset the `have_censored` column
for any redactions that target said event.
Doing a password reset via SMS has never worked, and in any case is a silly
idea because msisdn recycling is a thing.
See also matrix-org/matrix-doc#2303.
Pull the checkers out to their own classes, rather than having them lost in a
massive 1000-line class which does everything.
This is also preparation for some more intelligent advertising of flows, as per #6100
We don't actually care about what happens in `get_user_by_req` and
having it as a separate span means that the entity tag isn't added to
the servlet spans, making it harder to search.
because, frankly, it looked like it was written by an axe-murderer.
This should be a non-functional change, except that where `m.login.dummy` was
previously advertised *before* `m.login.terms`, it will now be advertised
afterwards. AFAICT that should have no effect, and will be more consistent with
the flows that involve passing a 3pid.
Second part of solving #6076Fixes#6076
We return a submit_url parameter on calls to POST */msisdn/requestToken so that clients know where to submit token information to.
Uses a SimpleHttpClient instance equipped with the federation_ip_range_blacklist list for requests to identity servers provided by user input. Does not use a blacklist when contacting identity servers specified by account_threepid_delegates. The homeserver trusts the latter and we don't want to prevent homeserver admins from specifying delegates that are on internal IP addresses.
Fixes#5935
As MSC2263 states, m.require_identity_server must be set to false when it does not require an identity server to be provided by the client for the purposes of email registration or password reset.
Adds an m.require_identity_server flag to /versionss unstable_flags section. This will advertise that Synapse no longer needs id_server as a parameter.
This is a) simpler than querying user_ips directly and b) means we can
purge older entries from user_ips without losing the required info.
The storage functions now no longer return the access_token, since it
was unused.
Implements MSC2290. This PR adds two new endpoints, /unstable/account/3pid/add and /unstable/account/3pid/bind. Depending on the progress of that MSC the unstable prefix may go away.
This PR also removes the blacklist on some 3PID tests which occurs in #6042, as the corresponding Sytest PR changes them to use the new endpoints.
Finally, it also modifies the account deactivation code such that it doesn't just try to deactivate 3PIDs that were bound to the user's account, but any 3PIDs that were bound through the homeserver on that user's account.
Fixes#6066
This register endpoint should be disabled if registration is disabled, otherwise we're giving anyone the ability to check if a username exists on a server when we don't need to be.
Error code is 403 (Forbidden) as that's the same returned by /register when registration is disabled.
In ancient times Synapse would only send emails when it was notifying a user about a message they received...
Now it can do all sorts of neat things!
Change the logging so it's not just about notifications.
Remove trailing slash ability from the password reset submit_token endpoint. Since we provide the link in an email, and have never sent it with a trailing slash, there's no point for us to accept them on the endpoint.
The validation links sent via email had their query parameters inserted without any URL-encoding. Surprisingly this didn't seem to cause any issues, but if a user were to put a `/` in their client_secret it could lead to problems.
Fixes a bug where the default attribute maps were prioritised over
user-specified ones, resulting in incorrect mappings.
The problem is that if you call SPConfig.load() multiple times, it adds new
attribute mappers to a list. So by calling it with the default config first,
and then the user-specified config, we would always get the default mappers
before the user-specified mappers.
To solve this, let's merge the config dicts first, and then pass them to
SPConfig.
This is a partial revert of #5893. The problem is that if we drop these tables
in the same release as removing the code that writes to them, it prevents users
users from being able to roll back to a previous release.
So let's leave the tables in place for now, and remember to drop them in a
subsequent release.
(Note that these tables haven't been *read* for *years*, so any missing rows
resulting from a temporary upgrade to vNext won't cause a problem.)
Removes the POST method from `/password_reset/<medium>/submit_token/` as it's only used by phone number verification which Synapse does not support yet.
This is a potential solution to https://github.com/vector-im/riot-web/issues/3374
and https://github.com/vector-im/riot-web/issues/5953
as raised by Mozilla at https://github.com/vector-im/riot-web/issues/10868.
This lets you define a push rule action which increases the badge count (unread notification)
count on a given room, but doesn't actually send a push for that notification via email or HTTP.
We might want to define this as the default behaviour for group chats in future
to solve https://github.com/vector-im/riot-web/issues/3268 at last.
This is implemented as a string action rather than a tweak because:
* Other pushers don't care about the tweak, given they won't ever get pushed
* The DB can store the tweak more efficiently using the existing `notify` table.
* It avoids breaking the default_notif/highlight_action optimisations.
Clients which generate their own notifs (e.g. desktop notifs from Riot/Web
would need to be aware of the new push action) to uphold it.
An alternative way to do this would be to maintain a `msg_count` alongside
`highlight_count` and `notification_count` in `unread_notifications` in sync responses.
However, doing this by counting the rows in `events` since the `stream_position`
of the user's last read receipt turns out to be painfully slow (~200ms), perhaps
due to the size of the events table. So instead, we use the highly optimised
existing event_push_actions (and event_push_actions_staging) table to maintain
the counts - using the code paths which already exist for tracking unread
notification counts efficiently. These queries are typically ~3ms or so.
The biggest issues I see here are:
* We're slightly repurposing the `notif` field on `event_push_actions` to
track whether a given action actually sent a `push` or not. This doesn't
seem unreasonable, but it's slightly naughty given that previously the
field explicitly tracked whether `notify` was true for the action (and
as a result, it was uselessly always set to 1 in the DB).
* We're going to put more load on the `event_push_actions` table for all the
random group chats which people had previously muted. In practice i don't
think there are many of these though.
* There isn't an MSC for this yet (although this comment could become one).
3PID invites require making a request to an identity server to check that the invited 3PID has an Matrix ID linked, and if so, what it is.
These requests are being made on behalf of a user. The user will supply an identity server and an access token for that identity server. The homeserver will then forward this request with the access token (using an `Authorization` header) and, if the given identity server doesn't support v2 endpoints, will fall back to v1 (which doesn't require any access tokens).
Requires: ~~#5976~~
Broke in #5971
Basically the bug is that if get_current_state_deltas returns no new updates and we then take the max pos, its possible that we miss an update that happens in between the two calls. (e.g. get_current_state_deltas looks up to stream pos 5, then an event persists and so getting the max stream pos returns 6, meaning that next time we check for things with a stream pos bigger than 6)
We want to assign unique mxids to saml users based on an incrementing
suffix. For that to work, we need to record the allocated mxid in a separate
table.
This allows support users to be created even on MAU limits via
the admin API. Support users are excluded from MAU after creation,
so it makes sense to exclude them in creation - except if the
whole host is in disabled state.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
Some small fixes to `room_member.py` found while doing other PRs.
1. Add requester to the base `_remote_reject_invite` method.
2. `send_membership_event`'s docstring was out of date and took in a `remote_room_hosts` arg that was not used and no calling function provided.
Another small fixup noticed during work on a larger PR. The `origin` field of `add_display_name_to_third_party_invite` is not used and likely was just carried over from the `on_PUT` method of `FederationThirdPartyInviteExchangeServlet` which, like all other servlets, provides an `origin` argument.
Since it's not used anywhere in the handler function though, we should remove it from the function arguments.
Previously if the first registered user was a "support" or "bot" user,
when the first real user registers, the auto-join rooms were not
created.
Fix to exclude non-real (ie users with a special user type) users
when counting how many users there are to determine whether we should
auto-create a room.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
This is a combination of a few different PRs, finally all being merged into `develop`:
* #5875
* #5876
* #5868 (This one added the `/versions` flag but the flag itself was actually [backed out](891afb57cb (diff-e591d42d30690ffb79f63bb726200891)) in #5969. What's left is just giving /versions access to the config file, which could be useful in the future)
* #5835
* #5969
* #5940
Clients should not actually use the new registration functionality until https://github.com/matrix-org/synapse/pull/5972 is merged.
UPGRADE.rst, changelog entries and config file changes should all be reviewed closely before this PR is merged.
* Ensure the list media admin API is always available
This API is required for some external media repo implementations to operate (mostly for doing quarantine operations on a room).
* changelog
Remove all the "double return" statements which were a result of us removing all the instances of
```
defer.returnValue(...)
return
```
statements when we switched to python3 fully.
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
Template config files
* Imagine a system composed entirely of x, y, z etc and the basic operations..
Wait George, why XOR? Why not just neq?
George: Eh, I didn't think of that..
Co-Authored-By: Erik Johnston <erik@matrix.org>
Some of the caches on worker processes were not being correctly invalidated
when a room's state was changed in a way that did not affect the membership
list of the room.
We need to make sure we send out cache invalidations even when no memberships
are changing.
Propagate opentracing contexts across workers
Also includes some Convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject
extract methods. - useful when injecting into carriers
locally. Otherwise we'd always have to include our
own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
from a request to a servlet
This type of registration was probably never used. It only includes the
user name in the HMAC but not the password.
Shared secret registration is still available via
client/r0/admin/register.
Signed-off-by: Manuel Stahl <manuel.stahl@awesome-technologies.de>
There's no point doing a raise_from here, because the exception is always
logged at warn with no stacktrace in the caller. Instead, let's try to give
better messages to reduce confusion.
In particular, this means that we won't log 'Failed to connect to remote
server' when we don't even attempt to connect to the remote server due to
blacklisting.
Get rid of the labyrinthine `recoverer_fn` code, and clean up the startup code
(it seemed to be previously inexplicably split between
`ApplicationServiceScheduler.start` and `_Recoverer.start`).
Add some docstrings too.
Hopefully, this will fix a stack overflow when recovering an appservice.
The recursion here leads to a huge chain of deferred callbacks, which then
overflows the stack when the chain completes. `inlineCallbacks` makes a better
job of this if we use iteration instead.
Clean up the code a bit too, while we're there.
Get rid of the labyrinthine `recoverer_fn` code, and clean up the startup code
(it seemed to be previously inexplicably split between
`ApplicationServiceScheduler.start` and `_Recoverer.start`).
Add some docstrings too.
Add authenticated_entity and servlet_names tags.
Functionally:
- Add a tag for authenticated_entity
- Add a tag for servlet_names
Stylistically:
Moved to importing methods directly from opentracing.
Fixes#5833
The emailconfig code was attempting to pull incorrect config file names. This corrects that, while also marking a difference between a config file variable that's a filepath versus a str containing HTML.
This refactors MatrixFederationAgent to move the SRV lookup into the
endpoint code, this has two benefits:
1. Its easier to retry different host/ports in the same way as
HostnameEndpoint.
2. We avoid SRV lookups if we have a free connection in the pool
If we have recently seen a valid well-known for a domain we want to
retry on (non-final) errors a few times, to handle temporary blips in
networking/etc.
is cached and so does not always return a `Deferred`.
`await` does not silently pass-through non-Deferreds like `yield` used to.
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
This gives a bit of a grace period where we can attempt to refetch a
remote `well-known`, while still using the cached result if that fails.
Hopefully this will make the well-known resolution a bit more torelant
of failures, rather than it immediately treating failures as "no result"
and caching that for an hour.
* allow devices to be marked as "hidden"
This is a prerequisite for cross-signing, as it allows us to create other things
that live within the device namespace, so they can be used for signatures.
It costs both us and the remote server for us to fetch the well known
for every single request we send, so we add a minimum cache period. This
is set to 5m so that we still honour the basic premise of "refetch
frequently".
When persisting events we calculate new stream orderings up front.
Before we notify about an event all events with lower stream orderings
must have finished being persisted.
This PR moves the assignment of stream ordering till *after* calculated
the new current state and split the batch of events into separate chunks
for persistence. This means that if it takes a long time to calculate
new current state then it will not block events in other rooms being
notified about.
This should help reduce some global pauses in the events stream which
can last for tens of seconds (if not longer), caused by some
particularly expensive state resolutions.
This hopefully addresses #5407 by gracefully handling an empty but
limited TimelineBatch. We also add some logging to figure out how this
is happening.
This is intended as an amendment to #5674 as using M_UNKNOWN as the errcode makes it hard for clients to differentiate between an invalid password and a deactivated user (the problem we were trying to solve in the first place).
M_UNKNOWN was originally chosen as it was presumed than an MSC would have to be carried out to add a new code, but as Synapse often is the testing bed for new MSC implementations, it makes sense to try it out first in the wild and then add it into the spec if it is successful. Thus this PR return a new M_USER_DEACTIVATED code when a deactivated user attempts to login.
The `expire_access_token` didn't do what it sounded like it should do. What it
actually did was make Synapse enforce the 'time' caveat on macaroons used as
access tokens, but since our access token macaroons never contained such a
caveat, it was always a no-op.
(The code to add 'time' caveats was removed back in v0.18.5, in #1656)
Annoyingly, `current_state_events` table can include rejected events,
in which case the membership column will be null. To work around this
lets just always filter out null membership for now.
There was some inconsistent behaviour in the caching layer around how
exceptions were handled - particularly synchronously-thrown ones.
This seems to be most easily handled by pushing the creation of
ObservableDeferreds down from CacheDescriptor to the Cache.