This is the first of two PRs which seek to address #8518. This first PR lays the groundwork by extending ResponseCache; a second PR (#10158) will update the SyncHandler to actually use it, and fix the bug.
The idea here is that we allow the callback given to ResponseCache.wrap to decide whether its result should be cached or not. We do that by (optionally) passing a ResponseCacheContext into it, which it can modify.
==============================
Bugfixes
--------
- Fix a bug which caused presence updates to stop working some time after a restart, when using a presence writer worker. Broke in v1.33.0. ([\#10149](https://github.com/matrix-org/synapse/issues/10149))
- Fix a bug when using federation sender worker where it would send out more presence updates than necessary, leading to high resource usage. Broke in v1.33.0. ([\#10163](https://github.com/matrix-org/synapse/issues/10163))
- Fix a bug where Synapse could send the same presence update to a remote twice. ([\#10165](https://github.com/matrix-org/synapse/issues/10165))
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEBTGR3/RnAzBGUif3pULk7RsPrAkFAmDDdrcQHGVyaWtAbWF0
cml4Lm9yZwAKCRClQuTtGw+sCTzfB/4qaTqW2mBiwjf52SmOu6HNyd8uQd6nLIAZ
mJC218Wakh2tT0W4iVkKwUgpuHFbtcy0rSNTlXtW4kG8XzhpTvT56RH9qls99aD3
SGKqYpOv6NkWZibN6NdVvLDW85ixficDTXco3BljRCIMlORhY0swy+LWLwksdjWj
6kQ+Gi/QAtKP3Pt5epYs0Ix5o1T94DfZOWE//mqBhG5cMDAw/K/G/c8tRfEjclt9
wACBjmt2fw/Lbn9j3b0feNVp+xnFcFNuAK2bSEd8Y3yph1mhjdsIszULnM7IFNsR
Q8zg+i7PJKNq8pQjei8j8T/aKscPTPH5XGqOSLlizj15snsiwlkz
=C6lV
-----END PGP SIGNATURE-----
Merge tag 'v1.36.0rc2' into develop
Synapse 1.36.0rc2 (2021-06-11)
==============================
Bugfixes
--------
- Fix a bug which caused presence updates to stop working some time after a restart, when using a presence writer worker. Broke in v1.33.0. ([\#10149](https://github.com/matrix-org/synapse/issues/10149))
- Fix a bug when using federation sender worker where it would send out more presence updates than necessary, leading to high resource usage. Broke in v1.33.0. ([\#10163](https://github.com/matrix-org/synapse/issues/10163))
- Fix a bug where Synapse could send the same presence update to a remote twice. ([\#10165](https://github.com/matrix-org/synapse/issues/10165))
This is essentially an implementation of the proposal made at https://hackmd.io/@richvdh/BJYXQMQHO, though the details have ended up looking slightly different.
Spawned from missing messages we were seeing on `matrix.org` from a
federated Gtiter bridged room, https://gitlab.com/gitterHQ/webapp/-/issues/2770.
The underlying issue in Synapse is tracked by https://github.com/matrix-org/synapse/issues/10066
where the message and join event race and the message is `soft_failed` before the
`join` event reaches the remote federated server.
Less soft_failed events = better and usually this should only trigger for events
where people are doing bad things and trying to fuzz and fake everything.
Fixes#1834.
`get_new_events_for_appservice` internally calls `get_events_as_list`, which will filter out any rejected events. If all returned events are filtered out, `_notify_interested_services` will return without updating the last handled stream position. If there are 100 consecutive such events, processing will halt altogether.
Breaking the loop is now done by checking whether we're up-to-date with `current_max` in the loop condition, instead of relying on an empty `events` list.
Signed-off-by: Willem Mulder <14mRh4X0r@gmail.com>
If backfilling is slow then the client may time out and retry, causing
Synapse to start a new `/backfill` before the existing backfill has
finished, duplicating work.
This adds quite a lot of OpenTracing decoration for database activity. Specifically it adds tracing at four different levels:
* emit a span for each "interaction" - ie, the top level database function that we tend to call "transaction", but isn't really, because it can end up as multiple transactions.
* emit a span while we hold a database connection open
* emit a span for each database transaction - actual actual transaction.
* emit a span for each database query.
I'm aware this might be quite a lot of overhead, but even just running it on a local Synapse it looks really interesting, and I hope the overhead can be offset just by turning down the sampling frequency and finding other ways of tracing requests of interest (eg, the `force_tracing_for_users` setting).
The existing tracing reports an error each time there is a timeout, which isn't
really representative.
Additionally, we log things about the way `wait_for_events` works
(eg, the result of the callback) to the *parent* span, which is confusing.
Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.