Handle federation inbound instances being killed more gracefully (#11262)

* Make lock better handle process being killed

  If the process gets killed and restarted (so that it didn't have a
  chance to drop its locks gracefully) then there may still be locks in
  the DB that are for the same instance that haven't yet timed out but
  are safe to delete.

  We handle this case by a) checking if the current instance already has
  taken out the lock, and b) if not then ignoring locks that are for the
  same instance.

* Periodically check for old staged events

  This is to protect against other instances dying and their locks
  timing out.
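
The two-step rule in the first bullet can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: `LockTable`, `LockRow`, `try_acquire` and `LOCK_TIMEOUT_MS` are hypothetical names, the timeout value is assumed, and a plain dict stands in for the database table of locks.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed timeout, chosen only for this illustration.
LOCK_TIMEOUT_MS = 2 * 60 * 1000

LockId = Tuple[str, str]  # (lock name, lock key)


@dataclass
class LockRow:
    instance_name: str
    token: str
    last_renewed_ms: int


class LockTable:
    """Hypothetical per-process view of a shared table of locks."""

    def __init__(self, instance_name: str, rows: Dict[LockId, LockRow]) -> None:
        self.instance_name = instance_name
        # Shared storage: survives a process restart, like rows in a DB.
        self.rows = rows
        # Locks taken by *this* process since it started; lost on restart.
        self.live_tokens: Dict[LockId, str] = {}

    def try_acquire(self, name: str, key: str, now_ms: int) -> Optional[str]:
        lock_id = (name, key)

        # (a) If this running process already holds the lock, refuse.
        if lock_id in self.live_tokens:
            return None

        row = self.rows.get(lock_id)
        if row is not None:
            expired = now_ms - row.last_renewed_ms > LOCK_TIMEOUT_MS
            same_instance = row.instance_name == self.instance_name
            # (b) A row for our own instance that we do not hold in memory
            # must be left over from before a restart, so it is ignored.
            # A live row from another instance still blocks us.
            if not expired and not same_instance:
                return None

        token = uuid.uuid4().hex
        self.rows[lock_id] = LockRow(self.instance_name, token, now_ms)
        self.live_tokens[lock_id] = token
        return token


# Usage: a restarted "worker1" reclaims its stale lock; "worker2" cannot.
rows: Dict[LockId, LockRow] = {}
before_restart = LockTable("worker1", rows)
first_token = before_restart.try_acquire("federation", "!room", now_ms=0)

other = LockTable("worker2", rows)
blocked = other.try_acquire("federation", "!room", now_ms=1)

after_restart = LockTable("worker1", rows)  # same instance, fresh memory
reclaimed = after_restart.try_acquire("federation", "!room", now_ms=2)
```

Step (a) has to come first: without it, step (b) would let a process take the same lock twice, because its own live row would look like a stale one.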
2025-08-06 11:44:11 -04:00 · 2021-11-08 09:54:47 +00:00 · 2021-11-08 09:54:47 +00:00 · 98c8fc6ce8
commit 98c8fc6ce8
parent 9799c569bb
3 changed files with 27 additions and 10 deletions
--- a/synapse/federation/federation_server.py
+++ b/synapse/federation/federation_server.py
@@ -213,6 +213,11 @@ class FederationServer(FederationBase):
            self._started_handling_of_staged_events = True
            self._handle_old_staged_events()

+            # Start a periodic check for old staged events. This is to handle
+            # the case where locks time out, e.g. if another process gets killed
+            # without dropping its locks.
+            self._clock.looping_call(self._handle_old_staged_events, 60 * 1000)
+
        # keep this as early as possible to make the calculated origin ts as
        # accurate as possible.
        request_time = self._clock.time_msec()