It's not really a problem to trust notary responses signed by the old key so
long as we are also doing TLS validation.
This commit adds a check to the config parsing code at startup to check that
we do not have the insecure matrix.org key without tls validation, and refuses
to start without it.
This allows us to remove the rather alarming-looking warning which happens at
runtime.
There are a few changes going on here:
* We make checking the signature on a key server response optional: if no
verify_keys are specified, we trust to TLS to validate the connection.
* We change the default config so that it does not require responses to be
signed by the old key.
* We replace the old 'perspectives' config with 'trusted_key_servers', which
is also formatted slightly differently.
* We emit a warning to the logs every time we trust a key server response
signed by the old key.
Also:
* rename VerifyKeyRequest->VerifyJsonRequest
* calculate key_ids on VerifyJsonRequest construction
* refactor things to pass around VerifyJsonRequests instead of 4-tuples
It takes at least 20 minutes to work through the long_retries schedule (11
attempts, each with a 60 second timeout, and 60 seconds between each request),
so if the notary server isn't returning within the timeout, we'll just end up
blocking whatever request is happening for 20 minutes.
Ain't nobody got time for that.
When handling incoming federation requests, make sure that we have an
up-to-date copy of the signing key.
We do not yet enforce the validity period for event signatures.
The verify_request deferred already returns a suitable SynapseError, so I don't
really know what we expect to achieve by doing more wrapping, other than log
spam.
Fixes#4278.
The list of server names was redundant, since it was equivalent to the keys on
the server_to_deferred map. This reduces the number of large lists being passed
around, and has the benefit of deduplicating the entries in `wait_on`.
Rather than have three methods which have to have the same interface,
factor out a separate interface which is provided by three implementations.
I find it easier to grok the code this way.
This is a first step to checking that the key is valid at the required moment.
The idea here is that, rather than passing VerifyKey objects in and out of the
storage layer, we instead pass FetchKeyResult objects, which simply wrap the
VerifyKey and add a valid_until_ts field.
* Pass time_added_ms into process_v2_response
* Simplify process_v2_response
We can merge old_verify_keys into verify_keys, and reduce the number of dicts
flying around.
Storing server keys hammered the database a bit. This replaces the
implementation which stored a single key, with one which can do many updates at
once.
There's no point in collecting a merged dict of keys: it is sufficient to
consider just the new keys which have been fetched by the most recent
key_fetch_fns.
Make this just return the key dict, rather than a single-entry dict mapping the
server name to the key dict. It's easy for the caller to get the server name
from from the response object anyway.
This is in preparation for making EventBuilder format agnostic, which
means event signing should be done against the event dict rather than
the EventBuilder object.
The problem here is that we have cut-and-pasted an impl from Twisted, and then
failed to maintain it. It was fixed in Twisted in
https://github.com/twisted/twisted/pull/1047/files; let's do the same here.
Broadly three things here:
* disable W504 which seems a bit whacko
* remove a bunch of `as e` expressions from exception handlers that don't use
them
* use `r""` for strings which include backslashes
Also, we don't use pep8 any more, so we can get rid of the duplicate config
there.
Firstly, don't swallow the reason for the failure
Secondly, don't assume all exceptions are verification failures
Thirdly, log a bit of info about the key being used if debug is enabled
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.
This is unsatisfactory for a number of reasons:
- logging on garbage collection is best-effort and may happen some time after
the error, if at all
- it can be hard to figure out where the error actually happened.
- it is logged as a scary CRITICAL error which (a) I always forget to grep for
and (b) it's not really CRITICAL if a background process we don't care about
fails.
So this is an attempt to add exception handling to everything we fire off into
the background.
Doing this I learned e.message was pretty shortlived, added in 2.6,
they realized it was a bad idea and deprecated it in 2.7
Signed-off-by: Adrian Tschira <nota@notafile.com>
matrix-dev has an event (`$/6ANj/9QWQyd71N6DpRQPf+SDUu11+HVMeKSpMzBCwM:zemos.net`)
which has no `hashes` member.
Check for missing `hashes` element in events.
preserve_context_over_fn is essentially broken, because (a) it pointlessly
drops the current logcontext before calling its wrapped function, which means
we don't get any useful logcontexts for _handle_key_deferred; (b) it wraps the
resulting deferred in a _PreservingContextDeferred, which is very dangerous
because you then can't yield on it without leaking context back into the
reactor.
Instead, let's specify that the resultant deferreds call their callbacks with
no logcontext.
... which means that logcontexts can be correctly preserved for the stuff it
does.
get_server_verify_keys is now called with the logcontext, so needs to
preserve_fn when it fires off its nested inlineCallbacks function.
Also renames get_server_verify_keys to reflect the fact it's meant to be
private.
If the verify_request.deferred has already completed, then `remove_deferreds`
will be called immediately. It therefore might resolve the server_to_deferred
deferred while there are still other requests for that server in flight.
To avoid that, we should build the complete list of requests, and *then* add the
callbacks.
Define that it is run with no log context, and make sure that happens.
If we aren't careful to reset the logcontext, we can't bung the deferreds into
defer.gatherResults etc. We don't actually do that directly, but we *do*
resolve other deferreds from affected callbacks (notably the server_to_deferred
map in _start_key_lookups), and those *do* get passed into
defer.gatherResults. It turns out that this way ends up being least confusing.
There's no need for this to be a nested definition; pulling it out not only
makes it more efficient, but makes it easier to check that it's not accessing
any local variables it shouldn't be.
I'm still unclear on what the intended behaviour for
`verify_json_objects_for_server` is, but at least I now understand the
behaviour of most of the things it calls...