Avoid throwing a (harmless) exception when we try to write an error response to
an http request where the client has disconnected.
This comes up as a CRITICAL error in the logs which tends to mislead people
into thinking there's an actual problem
In order to circumvent the number of duplicate foo:count metrics increasing
without bounds, it's time for a rearrangement.
The following are all deprecated, and replaced with synapse_util_metrics_block_count:
synapse_util_metrics_block_timer:count
synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_db_txn_duration:count
The following are all deprecated, and replaced with synapse_http_server_response_count:
synapse_http_server_requests
synapse_http_server_response_time:count
synapse_http_server_response_ru_utime:count
synapse_http_server_response_ru_stime:count
synapse_http_server_response_db_txn_count:count
synapse_http_server_response_db_txn_duration:count
The following are renamed (the old metrics are kept for now, but deprecated):
synapse_util_metrics_block_timer:total ->
synapse_util_metrics_block_time_seconds
synapse_util_metrics_block_ru_utime:total ->
synapse_util_metrics_block_ru_utime_seconds
synapse_util_metrics_block_ru_stime:total ->
synapse_util_metrics_block_ru_stime_seconds
synapse_util_metrics_block_db_txn_count:total ->
synapse_util_metrics_block_db_txn_count
synapse_util_metrics_block_db_txn_duration:total ->
synapse_util_metrics_block_db_txn_duration_seconds
synapse_http_server_response_time:total ->
synapse_http_server_response_time_seconds
synapse_http_server_response_ru_utime:total ->
synapse_http_server_response_ru_utime_seconds
synapse_http_server_response_ru_stime:total ->
synapse_http_server_response_ru_stime_seconds
synapse_http_server_response_db_txn_count:total ->
synapse_http_server_response_db_txn_count
synapse_http_server_response_db_txn_duration:total
synapse_http_server_response_db_txn_duration_seconds
If somebody sends us a request where the the body is invalid utf-8, we should
return a 400 rather than a 500. (json.loads throws a UnicodeError in this
situation)
We might as well catch all Exceptions here: it seems very unlikely that we
would get a request that *isn't caused by invalid json.
preserve_context_over_fn uses a ContextPreservingDeferred, which only restores
context for the duration of its callbacks, which isn't really correct, and
means that subsequent operations in the same request can end up without their
logcontexts.
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.
This should fix a routing loop where the server keeps forwarding back to
itself.
Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
The abort() method calls loseConnection() which tries to shutdown the
TLS connection cleanly. We now call abortConnection() directly which
should promptly close both the TLS connection and the underlying TCP
connection.
I also added some TODO markers to consider cancelling the old previous
timeout rather than checking time.time(). But given how urgently we want
to get this code released I'd rather leave the existing code with the
duplicate timeouts and the time.time() check.