Put a cache on /state_ids (#7931)

If we send out an event whose `prev_events` are missing from other servers in
the federation, then (after a round or two of backfill attempts) those servers
will end up asking us for `/state_ids` at a particular point in the DAG.

As per https://github.com/matrix-org/synapse/issues/7893, this is quite
expensive, and we tend to see lots of very similar requests around the same
time.

We can therefore handle this much more efficiently by using a cache, which (a)
ensures that if we see the same request from multiple servers (or even the same
server, multiple times), then they share the result, and (b) lets any other
servers that miss the initial excitement also benefit from the work.
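
As a rough illustration of (a) and (b), here is a minimal sketch of a cache with that behaviour. `CoalescingCache` and its internals are hypothetical and are not Synapse's `ResponseCache`; only the `wrap(key, callback, *args)` calling shape and the idea of a retention timeout mirror what this commit uses, and the timeout here is in seconds rather than milliseconds.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable


class CoalescingCache:
    """Toy response cache (hypothetical; not Synapse's ResponseCache)."""

    def __init__(self, timeout_s: float) -> None:
        self._timeout_s = timeout_s
        # Maps a request key to the in-flight or recently completed computation.
        self._pending: Dict[Hashable, "asyncio.Future[Any]"] = {}

    async def wrap(
        self, key: Hashable, callback: Callable[..., Awaitable[Any]], *args: Any
    ) -> Any:
        existing = self._pending.get(key)
        if existing is not None:
            # (a) and (b): share a result that is in flight or was computed
            # within the timeout. shield() stops one cancelled caller from
            # cancelling the shared computation.
            return await asyncio.shield(existing)

        task = asyncio.ensure_future(callback(*args))
        self._pending[key] = task
        task.add_done_callback(lambda t: self._on_done(key, t))
        return await asyncio.shield(task)

    def _on_done(self, key: Hashable, task: "asyncio.Future[Any]") -> None:
        if task.cancelled() or task.exception() is not None:
            # Assumption of this sketch: failures are not retained, so the
            # next caller retries.
            self._pending.pop(key, None)
            return
        # Keep the successful result around for late arrivals, then drop it.
        task.get_loop().call_later(self._timeout_s, self._pending.pop, key, None)
```

With a key such as `(room_id, event_id)`, three concurrent `wrap` calls run the callback once, a call arriving shortly afterwards reuses the stored result, and a call arriving after the timeout triggers a fresh computation.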

[It's interesting to note that `/state` has a cache for exactly this
reason. `/state` is now essentially unused and replaced with `/state_ids`, but
evidently when we replaced it we forgot to add a cache to the new endpoint.]
Richard van der Hoff 2020-07-23 18:38:19 +01:00 committed by GitHub
parent 4876af06dd
commit 7078866969
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 12 additions and 2 deletions

changelog.d/7931.feature

@@ -0,0 +1 @@
+Cache responses to `/_matrix/federation/v1/state_ids` to reduce duplicated work.


@@ -109,6 +109,9 @@ class FederationServer(FederationBase):
         # We cache responses to state queries, as they take a while and often
         # come in waves.
         self._state_resp_cache = ResponseCache(hs, "state_resp", timeout_ms=30000)
+        self._state_ids_resp_cache = ResponseCache(
+            hs, "state_ids_resp", timeout_ms=30000
+        )
 
     async def on_backfill_request(
         self, origin: str, room_id: str, versions: List[str], limit: int
@@ -376,10 +379,16 @@ class FederationServer(FederationBase):
         if not in_room:
             raise AuthError(403, "Host not in room.")
 
+        resp = await self._state_ids_resp_cache.wrap(
+            (room_id, event_id), self._on_state_ids_request_compute, room_id, event_id,
+        )
+
+        return 200, resp
+
+    async def _on_state_ids_request_compute(self, room_id, event_id):
         state_ids = await self.handler.get_state_ids_for_pdu(room_id, event_id)
         auth_chain_ids = await self.store.get_auth_chain_ids(state_ids)
-
-        return 200, {"pdu_ids": state_ids, "auth_chain_ids": auth_chain_ids}
+        return {"pdu_ids": state_ids, "auth_chain_ids": auth_chain_ids}
 
     async def _on_context_state_request_compute(
         self, room_id: str, event_id: str