Handle federation inbound instances being killed more gracefully (#11262)

* Make lock better handle process being killed

If the process gets killed and restarted (so that it didn't have a
chance to drop its locks gracefully) then there may still be locks in
the DB that are for the same instance that haven't yet timed out but are
safe to delete.

We handle this case by a) checking if the current instance already has
taken out the lock, and b) if not then ignoring locks that are for the
same instance.

* Periodically check for old staged events

This is to protect against other instances dying and their locks timing
out.
This commit is contained in:
Erik Johnston 2021-11-08 09:54:47 +00:00 committed by GitHub
parent 9799c569bb
commit 98c8fc6ce8
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
3 changed files with 27 additions and 10 deletions

View file

@ -213,6 +213,11 @@ class FederationServer(FederationBase):
self._started_handling_of_staged_events = True
self._handle_old_staged_events()
# Start a periodic check for old staged events. This is to handle
# the case where locks time out, e.g. if another process gets killed
# without dropping its locks.
self._clock.looping_call(self._handle_old_staged_events, 60 * 1000)
# keep this as early as possible to make the calculated origin ts as
# accurate as possible.
request_time = self._clock.time_msec()