* `_auth_and_persist_outliers`: mark persisted events as outliers
Mark any events that get persisted via `_auth_and_persist_outliers` as, well,
outliers.
Currently this will be a no-op as everything will already be flagged as an
outlier, but I'm going to change that.
* `process_remote_join`: stop flagging as outlier
The events are now flagged as outliers later on, by `_auth_and_persist_outliers`.
* `send_join`: remove `outlier=True`
The events created here are returned in the result of `send_join` to
`FederationHandler.do_invite_join`. From there they are passed into
`FederationEventHandler.process_remote_join`, which passes them to
`_auth_and_persist_outliers`... which sets the `outlier` flag.
* `get_event_auth`: remove `outlier=True`
stop flagging the events returned by `get_event_auth` as outliers. This method
is only called by `_get_remote_auth_chain_for_event`, which passes the results
into `_auth_and_persist_outliers`, which will flag them as outliers.
* `_get_remote_auth_chain_for_event`: remove `outlier=True`
we pass all the events into `_auth_and_persist_outliers`, which will now flag
the events as outliers.
* `_check_sigs_and_hash_and_fetch`: remove unused `outlier` parameter
This param is now never set to True, so we can remove it.
* `_check_sigs_and_hash_and_fetch_one`: remove unused `outlier` param
This is no longer set anywhere, so we can remove it.
* `get_pdu`: remove unused `outlier` parameter
... and chase it down into `get_pdu_from_destination_raw`.
* `event_from_pdu_json`: remove redundant `outlier` param
This is never set to `True`, so can be removed.
* changelog
* update docstring
Events returned by `backfill` should not be flagged as outliers.
Fixes:
```
AssertionError: null
File "synapse/handlers/federation.py", line 313, in try_backfill
dom, room_id, limit=100, extremities=extremities
File "synapse/handlers/federation_event.py", line 517, in backfill
await self._process_pulled_events(dest, events, backfilled=True)
File "synapse/handlers/federation_event.py", line 642, in _process_pulled_events
await self._process_pulled_event(origin, ev, backfilled=backfilled)
File "synapse/handlers/federation_event.py", line 669, in _process_pulled_event
assert not event.internal_metadata.is_outlier()
```
See https://sentry.matrix.org/sentry/synapse-matrixorg/issues/231992Fixes#8894.
* remove `start_active_span_from_request`
Instead, pull out a separate function, `span_context_from_request`, to extract
the parent span, which we can then pass into `start_active_span` as
normal. This seems to be clearer all round.
* Remove redundant tags from `incoming-federation-request`
These are all wrapped up inside a parent span generated in AsyncResource, so
there's no point duplicating all the tags that are set there.
* Leave request spans open until the request completes
It may take some time for the response to be encoded into JSON, and that JSON
to be streamed back to the client, and really we want that inside the top-level
span, so let's hand responsibility for closure to the SynapseRequest.
* opentracing logs for HTTP request events
* changelog
MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030
Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Federation API endpoint:
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Co-authored-by: Erik Johnston <erik@matrix.org>
* Make lock better handle process being killed
If the process gets killed and restarted (so that it didn't have a
chance to drop its locks gracefully) then there may still be locks in
the DB that are for the same instance that haven't yet timed out but are
safe to delete.
We handle this case by a) checking if the current instance already has
taken out the lock, and b) if not then ignoring locks that are for the
same instance.
* Periodically check for old staged events
This is to protect against other instances dying and their locks timing
out.
This fixes a "Event not signed by authorising server" error when
transition room member from join -> join, e.g. when updating a
display name or avatar URL for restricted rooms.
* Factor more stuff out of `_get_events_and_persist`
It turns out that the event-sorting algorithm in `_get_events_and_persist` is
also useful in other circumstances. Here we move the current
`_auth_and_persist_fetched_events` to `_auth_and_persist_fetched_events_inner`,
and then factor the sorting part out to `_auth_and_persist_fetched_events`.
* `_get_remote_auth_chain_for_event`: remove redundant `outlier` assignment
`get_event_auth` returns events with the outlier flag already set, so this is
redundant (though we need to update a test where `get_event_auth` is mocked).
* `_get_remote_auth_chain_for_event`: move existing-event tests earlier
Move a couple of tests outside the loop. This is a bit inefficient for now, but
a future commit will make it better. It should be functionally identical.
* `_get_remote_auth_chain_for_event`: use `_auth_and_persist_fetched_events`
We can use the same codepath for persisting the events fetched as part of an
auth chain as for those fetched individually by `_get_events_and_persist` for
building the state at a backwards extremity.
* `_get_remote_auth_chain_for_event`: use a dict for efficiency
`_auth_and_persist_fetched_events` sorts the events itself, so we no longer
need to care about maintaining the ordering from `get_event_auth` (and no
longer need to sort by depth in `get_event_auth`).
That means that we can use a map, making it easier to filter out events we
already have, etc.
* changelog
* `_auth_and_persist_fetched_events`: improve docstring
Here we split on_receive_pdu into two functions (on_receive_pdu and process_pulled_event), rather than having both cases in the same method. There's a tiny bit of overlap, but not that much.
If the new /hierarchy API does not exist on all destinations,
fallback to querying the /spaces API and translating the results.
This is a backwards compatibility hack since not all of the
federated homeservers will update at the same time.
* Include outlier status in `str(event)`
In places where we log event objects, knowing whether or not you're dealing
with an outlier is super useful.
* Remove duplicated logging in get_missing_events
When we process events received from get_missing_events, we log them twice
(once in `_get_missing_events_for_pdu`, and once in `on_receive_pdu`). Reduce
the duplication by removing the logging in `on_receive_pdu`, and ensuring the
call sites do sensible logging.
* log in `on_receive_pdu` when we already have the event
* Log which prev_events we are missing
* changelog
Instead of wrapping the JSON into an object, this creates concrete
instances for Transaction and Edu. This allows for improved type
hints and simplified code.
If the federation client receives an M_UNABLE_TO_AUTHORISE_JOIN or
M_UNABLE_TO_GRANT_JOIN response it will attempt another server
before giving up completely.
Improves type hints for:
* parse_{boolean,integer}
* parse_{boolean,integer}_from_args
* parse_json_{value,object}_from_request
And fixes any incorrect calls that resulted from unknown types.
This is to help with performance, where trying to connect to thousands
of hosts at once can consume a lot of CPU (due to TLS etc).
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>