Fix MSC3030 /timestamp_to_event returning outliers that it has no idea whether are near a gap or not (#14215)

Fix MSC3030 `/timestamp_to_event` endpoint returning `outliers` that it has no idea whether are near a gap or not (and therefore unable to determine whether it's actually the closest event). The reason Synapse doesn't know whether an `outlier` is next to a gap is because our gap checks rely on entries in the `event_edges`, `event_forward_extremeties`, and `event_backward_extremities` tables which is [not the case for `outliers`](2c63cdcc3f/docs/development/room-dag-concepts.md (outliers)).

Also fixes MSC3030 Complement `can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint` test flake.  Although this acted flakey in Complement, if `sync_partial_state` raced and beat us before `/timestamp_to_event`, then even if we retried the failing `/context` request it wouldn't work until we made this Synapse change. With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.

Fix  https://github.com/matrix-org/synapse/issues/13944


### Why did this fail before? Why was it flakey?

Sleuthing the server logs on the [CI failure](https://github.com/matrix-org/synapse/actions/runs/3149623842/jobs/5121449357#step:5:5805), it looks like `hs2:/timestamp_to_event` found `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` event locally. Then when we went and asked for it via `/context`, since it's an `outlier`, it was filtered out of the results -> `You don't have permission to access that event.`

This is reproducible when `sync_partial_state` races and persists `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` before we evaluate `get_event_for_timestamp(...)`. To consistently reproduce locally, just add a delay at the [start of `get_event_for_timestamp(...)`](cb20b885cb/synapse/handlers/room.py (L1470-L1496)) so it always runs after `sync_partial_state` completes.

```py
from twisted.internet import task as twisted_task
d = twisted_task.deferLater(self.hs.get_reactor(), 3.5)
await d
```

In a run where it passes, on `hs2`, `get_event_for_timestamp(...)` finds a different event locally which is next to a gap and we request from a closer one from `hs1` which gets backfilled. And since the backfilled event is not an `outlier`, it's returned as expected during `/context`.

With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
This commit is contained in:
Eric Eastwood 2022-10-18 19:46:25 -05:00 committed by GitHub
parent 2a76a7369f
commit fa8616e65c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 104 additions and 21 deletions

1
changelog.d/14215.bugfix Normal file
View File

@ -0,0 +1 @@
Fix [MSC3030](https://github.com/matrix-org/matrix-spec-proposals/pull/3030) `/timestamp_to_event` endpoint returning potentially inaccurate closest events with `outliers` present.

View File

@ -1971,12 +1971,17 @@ class EventsWorkerStore(SQLBaseStore):
Args: Args:
room_id: room where the event lives room_id: room where the event lives
event_id: event to check event: event to check (can't be an `outlier`)
Returns: Returns:
Boolean indicating whether it's an extremity Boolean indicating whether it's an extremity
""" """
assert not event.internal_metadata.is_outlier(), (
"is_event_next_to_backward_gap(...) can't be used with `outlier` events. "
"This function relies on `event_backward_extremities` which won't be filled in for `outliers`."
)
def is_event_next_to_backward_gap_txn(txn: LoggingTransaction) -> bool: def is_event_next_to_backward_gap_txn(txn: LoggingTransaction) -> bool:
# If the event in question has any of its prev_events listed as a # If the event in question has any of its prev_events listed as a
# backward extremity, it's next to a gap. # backward extremity, it's next to a gap.
@ -2026,12 +2031,17 @@ class EventsWorkerStore(SQLBaseStore):
Args: Args:
room_id: room where the event lives room_id: room where the event lives
event_id: event to check event: event to check (can't be an `outlier`)
Returns: Returns:
Boolean indicating whether it's an extremity Boolean indicating whether it's an extremity
""" """
assert not event.internal_metadata.is_outlier(), (
"is_event_next_to_forward_gap(...) can't be used with `outlier` events. "
"This function relies on `event_edges` and `event_forward_extremities` which won't be filled in for `outliers`."
)
def is_event_next_to_gap_txn(txn: LoggingTransaction) -> bool: def is_event_next_to_gap_txn(txn: LoggingTransaction) -> bool:
# If the event in question is a forward extremity, we will just # If the event in question is a forward extremity, we will just
# consider any potential forward gap as not a gap since it's one of # consider any potential forward gap as not a gap since it's one of
@ -2112,27 +2122,6 @@ class EventsWorkerStore(SQLBaseStore):
The closest event_id otherwise None if we can't find any event in The closest event_id otherwise None if we can't find any event in
the given direction. the given direction.
""" """
sql_template = """
SELECT event_id FROM events
LEFT JOIN rejections USING (event_id)
WHERE
origin_server_ts %s ?
AND room_id = ?
/* Make sure event is not rejected */
AND rejections.event_id IS NULL
/**
* First sort by the message timestamp. If the message timestamps are the
* same, we want the message that logically comes "next" (before/after
* the given timestamp) based on the DAG and its topological order (`depth`).
* Finally, we can tie-break based on when it was received on the server
* (`stream_ordering`).
*/
ORDER BY origin_server_ts %s, depth %s, stream_ordering %s
LIMIT 1;
"""
def get_event_id_for_timestamp_txn(txn: LoggingTransaction) -> Optional[str]:
if direction == "b": if direction == "b":
# Find closest event *before* a given timestamp. We use descending # Find closest event *before* a given timestamp. We use descending
# (which gives values largest to smallest) because we want the # (which gives values largest to smallest) because we want the
@ -2146,9 +2135,37 @@ class EventsWorkerStore(SQLBaseStore):
comparison_operator = ">=" comparison_operator = ">="
order = "ASC" order = "ASC"
sql_template = f"""
SELECT event_id FROM events
LEFT JOIN rejections USING (event_id)
WHERE
room_id = ?
AND origin_server_ts {comparison_operator} ?
/**
* Make sure the event isn't an `outlier` because we have no way
* to later check whether it's next to a gap. `outliers` do not
* have entries in the `event_edges`, `event_forward_extremeties`,
* and `event_backward_extremities` tables to check against
* (used by `is_event_next_to_backward_gap` and `is_event_next_to_forward_gap`).
*/
AND NOT outlier
/* Make sure event is not rejected */
AND rejections.event_id IS NULL
/**
* First sort by the message timestamp. If the message timestamps are the
* same, we want the message that logically comes "next" (before/after
* the given timestamp) based on the DAG and its topological order (`depth`).
* Finally, we can tie-break based on when it was received on the server
* (`stream_ordering`).
*/
ORDER BY origin_server_ts {order}, depth {order}, stream_ordering {order}
LIMIT 1;
"""
def get_event_id_for_timestamp_txn(txn: LoggingTransaction) -> Optional[str]:
txn.execute( txn.execute(
sql_template % (comparison_operator, order, order, order), sql_template,
(timestamp, room_id), (room_id, timestamp),
) )
row = txn.fetchone() row = txn.fetchone()
if row: if row:

View File

@ -39,6 +39,8 @@ from synapse.api.constants import (
) )
from synapse.api.errors import Codes, HttpResponseException from synapse.api.errors import Codes, HttpResponseException
from synapse.appservice import ApplicationService from synapse.appservice import ApplicationService
from synapse.events import EventBase
from synapse.events.snapshot import EventContext
from synapse.handlers.pagination import PurgeStatus from synapse.handlers.pagination import PurgeStatus
from synapse.rest import admin from synapse.rest import admin
from synapse.rest.client import account, directory, login, profile, register, room, sync from synapse.rest.client import account, directory, login, profile, register, room, sync
@ -51,6 +53,7 @@ from tests import unittest
from tests.http.server._base import make_request_with_cancellation_test from tests.http.server._base import make_request_with_cancellation_test
from tests.storage.test_stream import PaginationTestCase from tests.storage.test_stream import PaginationTestCase
from tests.test_utils import make_awaitable from tests.test_utils import make_awaitable
from tests.test_utils.event_injection import create_event
PATH_PREFIX = b"/_matrix/client/api/v1" PATH_PREFIX = b"/_matrix/client/api/v1"
@ -3486,3 +3489,65 @@ class ThreepidInviteTestCase(unittest.HomeserverTestCase):
) )
self.assertEqual(channel.code, 400) self.assertEqual(channel.code, 400)
self.assertEqual(channel.json_body["errcode"], "M_MISSING_PARAM") self.assertEqual(channel.json_body["errcode"], "M_MISSING_PARAM")
class TimestampLookupTestCase(unittest.HomeserverTestCase):
servlets = [
admin.register_servlets,
room.register_servlets,
login.register_servlets,
]
def default_config(self) -> JsonDict:
config = super().default_config()
config["experimental_features"] = {"msc3030_enabled": True}
return config
def prepare(self, reactor: MemoryReactor, clock: Clock, hs: HomeServer) -> None:
self._storage_controllers = self.hs.get_storage_controllers()
self.room_owner = self.register_user("room_owner", "test")
self.room_owner_tok = self.login("room_owner", "test")
def _inject_outlier(self, room_id: str) -> EventBase:
event, _context = self.get_success(
create_event(
self.hs,
room_id=room_id,
type="m.test",
sender="@test_remote_user:remote",
)
)
event.internal_metadata.outlier = True
self.get_success(
self._storage_controllers.persistence.persist_event(
event, EventContext.for_outlier(self._storage_controllers)
)
)
return event
def test_no_outliers(self) -> None:
"""
Test to make sure `/timestamp_to_event` does not return `outlier` events.
We're unable to determine whether an `outlier` is next to a gap so we
don't know whether it's actually the closest event. Instead, let's just
ignore `outliers` with this endpoint.
This test is really seeing that we choose the non-`outlier` event behind the
`outlier`. Since the gap checking logic considers the latest message in the room
as *not* next to a gap, asking over federation does not come into play here.
"""
room_id = self.helper.create_room_as(self.room_owner, tok=self.room_owner_tok)
outlier_event = self._inject_outlier(room_id)
channel = self.make_request(
"GET",
f"/_matrix/client/unstable/org.matrix.msc3030/rooms/{room_id}/timestamp_to_event?dir=b&ts={outlier_event.origin_server_ts}",
access_token=self.room_owner_tok,
)
self.assertEqual(HTTPStatus.OK, channel.code, msg=channel.json_body)
# Make sure the outlier event is not returned
self.assertNotEqual(channel.json_body["event_id"], outlier_event.event_id)