Fix HomeServers leaking during trial test runs (#15630)

This change fixes two memory leaks during `trial` test runs.

Garbage collection is disabled during each test case and a gen-0 GC is
run at the end of each test. However, when the gen-0 GC is run, the
`TestCase` object usually still holds references to the `HomeServer`
used during the test. As a result, the `HomeServer` gets promoted to
gen-1 and then never garbage collected.

Fix this by periodically running full GCs.

Additionally, fix `HomeServer`s leaking after tests that touch inbound
federation due to `FederationRateLimiter`s adding themselves to a global
set, by turning the set into a `WeakSet`.

Resolves #15622.

Signed-off-by: Sean Quah <seanq@matrix.org>
This commit is contained in:
Sean Quah 2023-05-19 11:17:12 +01:00 committed by GitHub
parent ad50510a06
commit d0de452d12
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 15 additions and 3 deletions

1
changelog.d/15630.misc Normal file
View File

@ -0,0 +1 @@
Fix two memory leaks in `trial` test runs.

View File

@ -25,10 +25,12 @@ from typing import (
Iterator,
List,
Mapping,
MutableSet,
Optional,
Set,
Tuple,
)
from weakref import WeakSet
from prometheus_client.core import Counter
from typing_extensions import ContextManager
@ -86,7 +88,9 @@ queue_wait_timer = Histogram(
)
_rate_limiter_instances: Set["FederationRateLimiter"] = set()
# This must be a `WeakSet`, otherwise we indirectly hold on to entire `HomeServer`s
# during trial test runs and leak a lot of memory.
_rate_limiter_instances: MutableSet["FederationRateLimiter"] = WeakSet()
# Protects the _rate_limiter_instances set from concurrent access
_rate_limiter_instances_lock = threading.Lock()

View File

@ -229,13 +229,20 @@ class TestCase(unittest.TestCase):
#
# The easiest way to do this would be to do a full GC after each test
# run, but that is very expensive. Instead, we disable GC (above) for
# the duration of the test so that we only need to run a gen-0 GC, which
# is a lot quicker.
# the duration of the test and only run a gen-0 GC, which is a lot
# quicker. This doesn't clean up everything, since the TestCase
# instance still holds references to objects created during the test,
# such as HomeServers, so we do a full GC every so often.
@around(self)
def tearDown(orig: Callable[[], R]) -> R:
ret = orig()
gc.collect(0)
# Run a full GC every 50 gen-0 GCs.
gen0_stats = gc.get_stats()[0]
gen0_collections = gen0_stats["collections"]
if gen0_collections % 50 == 0:
gc.collect()
gc.enable()
set_current_context(SENTINEL_CONTEXT)