(instead of everywhere that writes a response. Or rather, the subset of places
which write responses where we haven't forgotten it).
This also means that we don't have to have the mysterious version_string
attribute in anything with a request handler.
Unfortunately it does mean that we have to pass the version string wherever we
instantiate a SynapseSite, which has been c&ped 150 times, but that is code
that ought to be cleaned up anyway really.
This is useful in its own right, because server.py is full of stuff; but more
importantly, I want to do some refactoring that will cause a circular reference
as it is.
It is especially important that sync requests don't get cached, as if a
sync returns the same token given then the client will call sync with
the same parameters again. If the previous response was cached it will
get reused, resulting in the client tight looping making the same
request and never making any progress.
In general, clients will expect to get up to date data when requesting
APIs, and so its safer to do a blanket no cache policy than only
whitelisting APIs that we know will break things if they get cached.
It's useful to know when there are peaks in incoming requests - which isn't
quite the same as there being peaks in outgoing responses, due to the time
taken to handle requests.
Add federation_domain_whitelist
gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
Avoid throwing a (harmless) exception when we try to write an error response to
an http request where the client has disconnected.
This comes up as a CRITICAL error in the logs which tends to mislead people
into thinking there's an actual problem
For each request, track the amount of time spent waiting for a db
connection. This entails adding it to the LoggingContext and we may as well add
metrics for it while we are passing.
In order to circumvent the number of duplicate foo:count metrics increasing
without bounds, it's time for a rearrangement.
The following are all deprecated, and replaced with synapse_util_metrics_block_count:
synapse_util_metrics_block_timer:count
synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_db_txn_duration:count
The following are all deprecated, and replaced with synapse_http_server_response_count:
synapse_http_server_requests
synapse_http_server_response_time:count
synapse_http_server_response_ru_utime:count
synapse_http_server_response_ru_stime:count
synapse_http_server_response_db_txn_count:count
synapse_http_server_response_db_txn_duration:count
The following are renamed (the old metrics are kept for now, but deprecated):
synapse_util_metrics_block_timer:total ->
synapse_util_metrics_block_time_seconds
synapse_util_metrics_block_ru_utime:total ->
synapse_util_metrics_block_ru_utime_seconds
synapse_util_metrics_block_ru_stime:total ->
synapse_util_metrics_block_ru_stime_seconds
synapse_util_metrics_block_db_txn_count:total ->
synapse_util_metrics_block_db_txn_count
synapse_util_metrics_block_db_txn_duration:total ->
synapse_util_metrics_block_db_txn_duration_seconds
synapse_http_server_response_time:total ->
synapse_http_server_response_time_seconds
synapse_http_server_response_ru_utime:total ->
synapse_http_server_response_ru_utime_seconds
synapse_http_server_response_ru_stime:total ->
synapse_http_server_response_ru_stime_seconds
synapse_http_server_response_db_txn_count:total ->
synapse_http_server_response_db_txn_count
synapse_http_server_response_db_txn_duration:total
synapse_http_server_response_db_txn_duration_seconds
If somebody sends us a request where the the body is invalid utf-8, we should
return a 400 rather than a 500. (json.loads throws a UnicodeError in this
situation)
We might as well catch all Exceptions here: it seems very unlikely that we
would get a request that *isn't caused by invalid json.
preserve_context_over_fn uses a ContextPreservingDeferred, which only restores
context for the duration of its callbacks, which isn't really correct, and
means that subsequent operations in the same request can end up without their
logcontexts.
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.
This should fix a routing loop where the server keeps forwarding back to
itself.
Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
The abort() method calls loseConnection() which tries to shutdown the
TLS connection cleanly. We now call abortConnection() directly which
should promptly close both the TLS connection and the underlying TCP
connection.
I also added some TODO markers to consider cancelling the old previous
timeout rather than checking time.time(). But given how urgently we want
to get this code released I'd rather leave the existing code with the
duplicate timeouts and the time.time() check.