The problem with dumping all of the json response into the Request object at
once is that doing so starts the timeout for the next request to be received:
so if it takes longer than 60s to stream back the response to the client, the
client never gets it.
The correct solution is to use a Producer; then the timeout is only started
once all of the content is sent over the TCP connection.
This commit moves a bunch of the logic for deciding when to log the receipt and
completion of HTTP requests into SynapseRequest, rather than in the request
handling wrappers.
Advantages of this are:
* we get logs for *all* requests (including OPTIONS and HEADs), rather than
just those that end up hitting handlers we've remembered to decorate
correctly.
* when a request handler wires up a Producer (as the media stuff does
currently, and as other things will do soon), we log at the point that all
of the traffic has been sent to the client.
We really shouldn't be sending all CodeMessageExceptions back over the C-S API;
it will include things like 401s which we shouldn't proxy.
That means that we need to explicitly turn a few HttpResponseExceptions into
SynapseErrors in the federation layer.
The effect of the latter is that the matrix errcode will get passed through
correctly to calling clients, which might help with some of the random
M_UNKNOWN errors when trying to join rooms.
(instead of everywhere that writes a response. Or rather, the subset of places
which write responses where we haven't forgotten it).
This also means that we don't have to have the mysterious version_string
attribute in anything with a request handler.
Unfortunately it does mean that we have to pass the version string wherever we
instantiate a SynapseSite, which has been c&ped 150 times, but that is code
that ought to be cleaned up anyway really.
This is useful in its own right, because server.py is full of stuff; but more
importantly, I want to do some refactoring that will cause a circular reference
as it is.
It is especially important that sync requests don't get cached, as if a
sync returns the same token given then the client will call sync with
the same parameters again. If the previous response was cached it will
get reused, resulting in the client tight looping making the same
request and never making any progress.
In general, clients will expect to get up to date data when requesting
APIs, and so its safer to do a blanket no cache policy than only
whitelisting APIs that we know will break things if they get cached.
It's useful to know when there are peaks in incoming requests - which isn't
quite the same as there being peaks in outgoing responses, due to the time
taken to handle requests.
Avoid throwing a (harmless) exception when we try to write an error response to
an http request where the client has disconnected.
This comes up as a CRITICAL error in the logs which tends to mislead people
into thinking there's an actual problem
For each request, track the amount of time spent waiting for a db
connection. This entails adding it to the LoggingContext and we may as well add
metrics for it while we are passing.
In order to circumvent the number of duplicate foo:count metrics increasing
without bounds, it's time for a rearrangement.
The following are all deprecated, and replaced with synapse_util_metrics_block_count:
synapse_util_metrics_block_timer:count
synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_db_txn_duration:count
The following are all deprecated, and replaced with synapse_http_server_response_count:
synapse_http_server_requests
synapse_http_server_response_time:count
synapse_http_server_response_ru_utime:count
synapse_http_server_response_ru_stime:count
synapse_http_server_response_db_txn_count:count
synapse_http_server_response_db_txn_duration:count
The following are renamed (the old metrics are kept for now, but deprecated):
synapse_util_metrics_block_timer:total ->
synapse_util_metrics_block_time_seconds
synapse_util_metrics_block_ru_utime:total ->
synapse_util_metrics_block_ru_utime_seconds
synapse_util_metrics_block_ru_stime:total ->
synapse_util_metrics_block_ru_stime_seconds
synapse_util_metrics_block_db_txn_count:total ->
synapse_util_metrics_block_db_txn_count
synapse_util_metrics_block_db_txn_duration:total ->
synapse_util_metrics_block_db_txn_duration_seconds
synapse_http_server_response_time:total ->
synapse_http_server_response_time_seconds
synapse_http_server_response_ru_utime:total ->
synapse_http_server_response_ru_utime_seconds
synapse_http_server_response_ru_stime:total ->
synapse_http_server_response_ru_stime_seconds
synapse_http_server_response_db_txn_count:total ->
synapse_http_server_response_db_txn_count
synapse_http_server_response_db_txn_duration:total
synapse_http_server_response_db_txn_duration_seconds
* Now only the auth part goes to fallback, not the whole operation
* Auth fallback is a normal API endpoint, not a static page
* Params like the recaptcha pubkey can just live in the config
Involves a little engineering on JsonResource so its servlets aren't always forced to return JSON. I should document this more, in fact I'll do that now.