This makes it easier to use in an async/await world.
Also fixes a bug where cache descriptors would occaisonally return a raw
value rather than a deferred.
* Fix presence timeouts when synchrotron restarts.
Handling timeouts would fail if there was an external process that had
timed out, e.g. a synchrotron restarting. This was due to a couple of
variable name typoes.
Fixes#3715.
This fixed the weirdness of 400 vs 404 as http status code in the case
the filter id is not known by the server.
As e.g. matrix-js-sdk expects 404 to catch this situation this leads
to unwanted behaviour.
Hopefully this will fix the occasional failures we were seeing in the room directory.
The problem was that events are not necessarily persisted (and `current_state_delta_stream` updated) in the same order as their stream_id. So for instance current_state_delta 9 might be persisted *before* current_state_delta 8. Then, when the room stats saw stream_id 9, it assumed it had done everything up to 9, and never came back to do stream_id 8.
We can solve this easily by only processing up to the stream_id where we know all events have been persisted.
We have set the max retry interval to a value larger than a postgres or
sqlite int can hold, which caused exceptions when updating the
destinations table.
To fix postgres we need to change the column to a bigint, and for sqlite
we lower the max interval to 2**62 (which is still incredibly long).
Pull the checkers out to their own classes, rather than having them lost in a
massive 1000-line class which does everything.
This is also preparation for some more intelligent advertising of flows, as per #6100
because, frankly, it looked like it was written by an axe-murderer.
This should be a non-functional change, except that where `m.login.dummy` was
previously advertised *before* `m.login.terms`, it will now be advertised
afterwards. AFAICT that should have no effect, and will be more consistent with
the flows that involve passing a 3pid.
This is a) simpler than querying user_ips directly and b) means we can
purge older entries from user_ips without losing the required info.
The storage functions now no longer return the access_token, since it
was unused.
This allows support users to be created even on MAU limits via
the admin API. Support users are excluded from MAU after creation,
so it makes sense to exclude them in creation - except if the
whole host is in disabled state.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
Previously if the first registered user was a "support" or "bot" user,
when the first real user registers, the auto-join rooms were not
created.
Fix to exclude non-real (ie users with a special user type) users
when counting how many users there are to determine whether we should
auto-create a room.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
Template config files
* Imagine a system composed entirely of x, y, z etc and the basic operations..
Wait George, why XOR? Why not just neq?
George: Eh, I didn't think of that..
Co-Authored-By: Erik Johnston <erik@matrix.org>
Get rid of the labyrinthine `recoverer_fn` code, and clean up the startup code
(it seemed to be previously inexplicably split between
`ApplicationServiceScheduler.start` and `_Recoverer.start`).
Add some docstrings too.
This refactors MatrixFederationAgent to move the SRV lookup into the
endpoint code, this has two benefits:
1. Its easier to retry different host/ports in the same way as
HostnameEndpoint.
2. We avoid SRV lookups if we have a free connection in the pool
If we have recently seen a valid well-known for a domain we want to
retry on (non-final) errors a few times, to handle temporary blips in
networking/etc.
This gives a bit of a grace period where we can attempt to refetch a
remote `well-known`, while still using the cached result if that fails.
Hopefully this will make the well-known resolution a bit more torelant
of failures, rather than it immediately treating failures as "no result"
and caching that for an hour.
It costs both us and the remote server for us to fetch the well known
for every single request we send, so we add a minimum cache period. This
is set to 5m so that we still honour the basic premise of "refetch
frequently".
The `expire_access_token` didn't do what it sounded like it should do. What it
actually did was make Synapse enforce the 'time' caveat on macaroons used as
access tokens, but since our access token macaroons never contained such a
caveat, it was always a no-op.
(The code to add 'time' caveats was removed back in v0.18.5, in #1656)
There was some inconsistent behaviour in the caching layer around how
exceptions were handled - particularly synchronously-thrown ones.
This seems to be most easily handled by pushing the creation of
ObservableDeferreds down from CacheDescriptor to the Cache.
--------
- Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAl04Ur0THGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9F4oD/0TY6S/SEd2uAmzor64ojmbX5BOwPzf
j/wzUTrfvuf40EvkNPDpnejNZSvy/ysbaGQaQusv0SQKlV3xrvdn4RuMvnOWVWck
kBsO+lvzOaUTR0KHDxN4y9F5eI2NdPbub4847PPVzyqSIHAd+kolxXS8kSBBhwpL
yfaICWV/AOy5L7xN+JZ9IQpnegVAvUj5DmgXzDHd6VdeiHDVJuARaBgrR5uCkwVS
ZoLRqZ95XV/qiguMAUvPOwyEqht2mwO64989MswP16YYm8oMkB5QA6I5nYnACsTP
qk9YcN/oNvEfQXUhttku6MxK1/4yUMPUhEoDBDH7ebc0440QDtWN+IHTdA6oPVZB
IuStL9YGY16m7Ltx37ZUA4URfNMiSeLHo3zKc/mCAcwxN4HyOjJewtxbG5zKQAOZ
SMs8UcDwGR4zL1hnt8ZDNYtWwfzJBQIdGjoHvjXJEY7/1csTv2lmAwewFTXiqSAr
30GW5ews94kotqBK53zZT6V0F5gHNqgGHniOz1ZpqLLxYLqO3LSAGe97CrqlWUdX
GkhA9tZyweknociD9fyyBmKdcFJ4mL4a+oGI5CMnSMph8UvCY8Y5XMb1T+iYEABI
tA9G3mBvgkLPj+5V+8QggNkBafSigW2Q4FX7enGsDmiiskZOtfeKrAcVkapD4ooi
3I7IW5aetZr2IQ==
=+JBn
-----END PGP SIGNATURE-----
Merge tag 'v1.2.0rc2' into develop
Bugfixes
--------
- Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
* Fix servlet metric names
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Remove redundant check
* Cover all return paths
We can now use `_get_events_from_cache_or_db` rather than going right back to
the database, which means that (a) we can benefit from caching, and (b) it
opens the way forward to more extensive checks on the original event.
We now always require the original event to exist before we will serve up a
redaction.
First of all, let's get rid of `TOKEN_NOT_FOUND_HTTP_STATUS`. It was a hack we
did at one point when it was possible to return either a 403 or a 401 if the
creds were missing. We always return a 401 in these cases now (thankfully), so
it's not needed.
Let's also stop abusing `AuthError` for these cases. Honestly they have nothing
that relates them to the other places that `AuthError` is used, other than the
fact that they are loosely under the 'Auth' banner. It makes no sense for them
to share exception classes.
Instead, let's add a couple of new exception classes: `InvalidClientTokenError`
and `MissingClientTokenError`, for the `M_UNKNOWN_TOKEN` and `M_MISSING_TOKEN`
cases respectively - and an `InvalidClientCredentialsError` base class for the
two of them.
this is only used in one place, so it's clearer if we inline it and reduce the
API surface.
Also, fixes a buglet where we would create an access token even if we were
about to block the user (we would never return the AT, so the user could never
use it, but it was still created and added to the db.)
When asking for the relations of an event, include the original event in the response. This will mostly be used for efficiently showing edit history, but could be useful in other circumstances.
Nothing uses this now, so we can remove the dead code, and clean up the
API.
Since we're changing the shape of the return value anyway, we take the
opportunity to give the method a better name.
- Put the default window_size back to 1000ms (broken by #5181)
- Make the `rc_federation` config actually do something
- fix an off-by-one error in the 'concurrent' limit
- Avoid creating an unused `_PerHostRatelimiter` object for every single
incoming request
Adds new config option `cleanup_extremities_with_dummy_events` which
periodically sends dummy events to rooms with more than 10 extremities.
THIS IS REALLY EXPERIMENTAL.
Some keys are stored in the synapse database with a null valid_until_ms
which caused an exception to be thrown when using that key. We fix this
by treating nulls as zeroes, i.e. they keys will match verification
requests with a minimum_valid_until_ms of zero (i.e. don't validate ts)
but will not match requests with a non-zero minimum_valid_until_ms.
Fixes#5391.
Sends password reset emails from the homeserver instead of proxying to the identity server. This is now the default behaviour for security reasons. If you wish to continue proxying password reset requests to the identity server you must now enable the email.trust_identity_server_for_password_resets option.
This PR is a culmination of 3 smaller PRs which have each been separately reviewed:
* #5308
* #5345
* #5368
There are a few changes going on here:
* We make checking the signature on a key server response optional: if no
verify_keys are specified, we trust to TLS to validate the connection.
* We change the default config so that it does not require responses to be
signed by the old key.
* We replace the old 'perspectives' config with 'trusted_key_servers', which
is also formatted slightly differently.
* We emit a warning to the logs every time we trust a key server response
signed by the old key.
* Fix background updates to handle redactions/rejections
In background updates based on current state delta stream we need to
handle that we may not have all the events (or at least that
`get_events` may raise an exception).
Also:
* rename VerifyKeyRequest->VerifyJsonRequest
* calculate key_ids on VerifyJsonRequest construction
* refactor things to pass around VerifyJsonRequests instead of 4-tuples
When handling incoming federation requests, make sure that we have an
up-to-date copy of the signing key.
We do not yet enforce the validity period for event signatures.
When enabling the account validity feature, Synapse will look at startup for registered account without an expiration date, and will set one equals to 'now + validity_period' for them. On large servers, it can mean that a large number of users will have the same expiration date, which means that they will all be sent a renewal email at the same time, which isn't ideal.
In order to mitigate this, this PR allows server admins to define a 'max_delta' so that the expiration date is a random value in the [now + validity_period ; now + validity_period + max_delta] range. This allows renewal emails to be progressively sent over a configured period instead of being sent all in one big batch.
The list of server names was redundant, since it was equivalent to the keys on
the server_to_deferred map. This reduces the number of large lists being passed
around, and has the benefit of deduplicating the entries in `wait_on`.
Replaces DEFAULT_ROOM_VERSION constant with a method that first checks the config, then returns a hardcoded value if the option is not present.
That hardcoded value is now located in the server.py config file.
Rather than have three methods which have to have the same interface,
factor out a separate interface which is provided by three implementations.
I find it easier to grok the code this way.
This is a first step to checking that the key is valid at the required moment.
The idea here is that, rather than passing VerifyKey objects in and out of the
storage layer, we instead pass FetchKeyResult objects, which simply wrap the
VerifyKey and add a valid_until_ts field.
Storing server keys hammered the database a bit. This replaces the
implementation which stored a single key, with one which can do many updates at
once.
If account validity is enabled in the server's configuration, this job will run at startup as a background job and will stick an expiration date to any registered account missing one.
Prevents a SynapseError being raised inside of a IResolutionReceiver and instead opts to just return 0 results. This thus means that we have to lump a failed lookup and a blacklisted lookup together with the same error message, but the substitute should be generic enough to cover both cases.
This commit adds two config options:
* `restrict_public_rooms_to_local_users`
Requires auth to fetch the public rooms directory through the CS API and disables fetching it through the federation API.
* `require_auth_for_profile_requests`
When set to `true`, requires that requests to `/profile` over the CS API are authenticated, and only returns the user's profile if the requester shares a room with the profile's owner, as per MSC1301.
MSC1301 also specifies a behaviour for federation (only returning the profile if the server asking for it shares a room with the profile's owner), but that's currently really non-trivial to do in a not too expensive way. Next step is writing down a MSC that allows a HS to specify which user sent the profile query. In this implementation, Synapse won't send a profile query over federation if it doesn't believe it already shares a room with the profile's owner, though.
Groups have been intentionally omitted from this commit.
This endpoint isn't much use for its intended purpose if you first need to get
yourself an admin's auth token.
I've restricted it to the `/_synapse/admin` path to make it a bit easier to
lock down for those concerned about exposing this information. I don't imagine
anyone is using it in anger currently.
Hopefully this time we really will fix#4422.
We need to make sure that the cache on
`get_rooms_for_user_with_stream_ordering` is invalidated *before* the
SyncHandler is notified for the new events, and we can now do so reliably via
the `events` stream.
I don't have a database with the same name as my user, so leaving the database
name unset fails.
While we're at it, clear out some unused stuff in the test setup.
As per #3622, we remove trailing slashes from outbound federation requests. However, to ensure that we remain backwards compatible with previous versions of Synapse, if we receive a HTTP 400 with `M_UNRECOGNIZED`, then we are likely talking to an older version of Synapse in which case we retry with a trailing slash appended to the request path.
Rather than using a Mock for the homeserver config, use a genuine
HomeServerConfig object. This makes for a more realistic test, and means that
we don't have to keep remembering to add things to the mock config every time
we add a new config setting.
The Mailer expects the config object to have `email_smtp_pass` and
`email_riot_base_url` attributes (and it won't by default, because the default
config impl doesn't set any of the attributes unless email_enable_notifs is
set).
* Rate-limiting for registration
* Add unit test for registration rate limiting
* Add config parameters for rate limiting on auth endpoints
* Doc
* Fix doc of rate limiting function
Co-Authored-By: babolivier <contact@brendanabolivier.com>
* Incorporate review
* Fix config parsing
* Fix linting errors
* Set default config for auth rate limiting
* Fix tests
* Add changelog
* Advance reactor instead of mocked clock
* Move parameters to registration specific config and give them more sensible default values
* Remove unused config options
* Don't mock the rate limiter un MAU tests
* Rename _register_with_store into register_with_store
* Make CI happy
* Remove unused import
* Update sample config
* Fix ratelimiting test for py2
* Add non-guest test
We will later need also to import 'register_servlets' from the
'login' module, so we un-pollute the namespace now to keep the
logical changes separate.
two reasons for this. One, it saves a bunch of boilerplate. Two, it squashes
unicode to IDNA-in-a-`str` (even on python 3) in a way that it turns out we
rely on to give consistent behaviour between python 2 and 3.
Turns out that the library does a better job of parsing URIs than our
reinvented wheel. Who knew.
There are two things going on here. The first is that, unlike
parse_server_name, URI.fromBytes will strip off square brackets from IPv6
literals, which means that it is valid input to ClientTLSOptionsFactory and
HostnameEndpoint.
The second is that we stay in `bytes` throughout (except for the argument to
ClientTLSOptionsFactory), which avoids the weirdness of (sometimes) ending up
with idna-encoded values being held in `unicode` variables. TBH it probably
would have been ok but it made the tests fragile.
The problem here is that we have cut-and-pasted an impl from Twisted, and then
failed to maintain it. It was fixed in Twisted in
https://github.com/twisted/twisted/pull/1047/files; let's do the same here.
* Remove redundant WrappedConnection
The matrix federation client uses an HTTP connection pool, which times out its
idle HTTP connections, so there is no need for any of this business.
* Correctly retry and back off if we get a HTTPerror response
* Refactor request sending to have better excpetions
MatrixFederationHttpClient blindly reraised exceptions to the caller
without differentiating "expected" failures (e.g. connection timeouts
etc) versus more severe problems (e.g. programming errors).
This commit adds a RequestSendFailed exception that is raised when
"expected" failures happen, allowing the TransactionQueue to log them as
warnings while allowing us to log other exceptions as actual exceptions.
prometheus_client 0.5 has a named-tuple Sample type with more member
than the old plain tuple had. This commit makes sure the unit test
detects this and changes the way it reads the sample.
Signed-off-by: Maarten de Vries <maarten@de-vri.es>
Allow for the creation of a support user.
A support user can access the server, join rooms, interact with other users, but does not appear in the user directory nor does it contribute to monthly active user limits.
This implements both a SAML2 metadata endpoint (at
`/_matrix/saml2/metadata.xml`), and a SAML2 response receiver (at
`/_matrix/saml2/authn_response`). If the SAML2 response matches what's been
configured, we complete the SSO login flow by redirecting to the client url
(aka `RelayState` in SAML2 jargon) with a login token.
What we don't yet have is anything to build a SAML2 request and redirect the
user to the identity provider. That is left as an exercise for the reader.
This is mostly factoring out the post-CAS-login code to somewhere we can reuse
it for other SSO flows, but it also fixes the userid mapping while we're at it.
* Rip out half-implemented m.login.saml2 support
This was implemented in an odd way that left most of the work to the client, in
a way that I really didn't understand. It's going to be a pain to maintain, so
let's start by ripping it out.
* drop undocumented dependency on dateutil
It turns out we were relying on dateutil being pulled in transitively by
pysaml2. There's no need for that bloat.
* Add better diagnostics to flakey keyring test
* fix interpolation fail
* Check logcontexts before and after each test
* update changelog
* update changelog
* Some words about garbage collections and logcontexts
* Do a GC after each test to fix logcontext leaks
This feels like an awful hack, but...
* changelog
Currently when fetching state groups from the data store we make two
hits two the database: once for members and once for non-members (unless
request is filtered to one or the other). This adds needless load to the
datbase, so this PR refactors the lookup to make only a single database
hit.
Debug tests
Try printing the channel
fix
Import and use six
Remove debugging
Disable captcha
Add some mocks
Define the URL
Fix the clock?
Less rendering?
use the other render
Complete the dummy auth stage
Fix last stage of the test
Remove mocks we don't need
Fixes a bug introduced in https://github.com/matrix-org/synapse/pull/1783 which
meant that single backslashes were not allowed in event field filters.
The intention here is to allow single-backslashes, but disallow
double-backslashes.
Broadly three things here:
* disable W504 which seems a bit whacko
* remove a bunch of `as e` expressions from exception handlers that don't use
them
* use `r""` for strings which include backslashes
Also, we don't use pep8 any more, so we can get rid of the duplicate config
there.
This test didn't do what it claimed to do, and what it claimed to do was the
same as test_cant_hide_direct_ancestors anyway.
This stuff is tested by sytest anyway.
when processing incoming transactions, it can be hard to see what's going on,
because we process a bunch of stuff in parallel, and because we may end up
recursively working our way through a chain of three or four events.
This commit creates a way to use logcontexts to add the relevant event ids to
the log lines.
Older Twisted (18.4.0) returns TimeoutError instead of
ConnectingCancelledError when connection times out.
This change allows tests to be compatible with this behaviour.
Signed-off-by: Oleg Girko <ol@infoserver.lv>
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.
This PR removes `start` from ExpiringCache, and automatically starts
backround reaping process on creation instead.
We want to wait until we have read the response body before we log the request
as complete, otherwise a confusing thing happens where the request appears to
have completed, but we later fail it.
To do this, we factor the salient details of a request out to a separate
object, which can then keep track of the txn_id, so that it can be logged.
Splits the state_group_cache in two.
One half contains normal state events; the other contains member events.
The idea is that the lazyloading common case of: "I want a subset of member events plus all of the other state" can be accomplished efficiently by splitting the cache into two, and asking for "all events" from the non-members cache, and "just these keys" from the members cache. This means we can avoid having to make DictionaryCache aware of these sort of complicated queries, whilst letting LL requests benefit from the caching.
Previously we were unable to sensibly use the caching and had to pull all state from the DB irrespective of the filtering, which made things slow. Hopefully fixes https://github.com/matrix-org/synapse/issues/3720.