Fixes a bug where rejected events were persisted with the wrong state group.
Also fixes an occasional internal-server-error when receiving events over
federation which are rejected and (possibly because they are
backwards-extremities) have no prev_group.
Fixes#6289.
* Raise an exception if accessing state for rejected events
Add some sanity checks on accessing state_group etc for
rejected events.
* Skip calculating push actions for rejected events
It didn't actually cause any bugs, because rejected events get filtered out at
various later points, but there's not point in trying to calculate the push
actions for a rejected event.
The intention here is to make it clearer which fields we can expect to be
populated when: notably, that the _event_type etc aren't used for the
synchronous impl of EventContext.
While this is not documented in the spec (but should be), Riot (and other clients) revoke 3PID invites by sending a m.room.third_party_invite event with an empty ({}) content to the room's state.
When the invited 3PID gets associated with a MXID, the identity server (which doesn't know about revocations) sends down to the MXID's homeserver all of the undelivered invites it has for this 3PID. The homeserver then tries to talk to the inviting homeserver in order to exchange these invite for m.room.member events.
When one of the invite is revoked, the inviting homeserver responds with a 500 error because it tries to extract a 'display_name' property from the content, which is empty. This might cause the invited server to consider that the server is down and not try to exchange other, valid invites (or at least delay it).
This fix handles the case of revoked invites by avoiding trying to fetch a 'display_name' from the original invite's content, and letting the m.room.member event fail the auth rules (because, since the original invite's content is empty, it doesn't have public keys), which results in sending a 403 with the correct error message to the invited server.
Another small fixup noticed during work on a larger PR. The `origin` field of `add_display_name_to_third_party_invite` is not used and likely was just carried over from the `on_PUT` method of `FederationThirdPartyInviteExchangeServlet` which, like all other servlets, provides an `origin` argument.
Since it's not used anywhere in the handler function though, we should remove it from the function arguments.
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
I had to add quite a lot of logging to diagnose a problem with 3pid
invites - we only logged the one failure which isn't all that
informative.
NB. I'm not convinced the logic of this loop is right: I think it
should just accept a single valid signature from a trusted source
rather than fail if *any* signature is invalid. Also it should
probably not skip the rest of middle loop if a check fails? However,
I'm deliberately not changing the logic here.
Fixes that when a user exchanges a 3PID invite for a proper invite over
federation it does not include the `invite_room_state` key.
This was due to synapse incorrectly sending out two invite requests.
When processing an incoming event over federation, we may try and
resolve any unexpected differences in auth events. This is a
non-essential process and so should not stop the processing of the event
if it fails (e.g. due to the remote disappearing or not implementing the
necessary endpoints).
Fixes#3330
I was staring at this function trying to figure out wtf it was actually
doing. This is (hopefully) a non-functional refactor which makes it a bit
clearer.
Currently if a call to `/get_missing_events` fails we log an exception
and stop processing the top level event we received over federation.
Instead let's try and handle it sensibly given it is a somewhat expected
failure mode.
When filtering events to send to server we check more than just history
visibility. However when deciding whether to backfill or not we only
care about the history visibility.
The transaction queue only sends out events that we generate. This was
done by checking domain of event ID, but that can no longer be used.
Instead, we may as well use the sender field.
We currently pass FrozenEvent instead of `dict` to
`compute_event_signature`, which works by accident due to `dict(event)`
producing the correct result.
This fixes PR #4493 commit 855a151
The validator was being run on the EventBuilder objects, and so the
validator only checked a subset of fields. With the upcoming
EventBuilder refactor even fewer fields will be there to validate.
To get around this we split the validation into those that can be run
against an EventBuilder and those run against a fully fledged event.
Currently they're stored as non-outliers even though the server isn't in
the room, which can be problematic in places where the code assumes it
has the state for all non outlier events.
In particular, there is an edge case where persisting the leave event
triggers a state resolution, which requires looking up the room version
from state. Since the server doesn't have the state, this causes an
exception to be thrown.
`on_new_notifications` and `on_new_receipts` in `HttpPusher` and `EmailPusher`
now always return synchronously, so we can remove the `defer.gatherResults` on
their results, and the `run_as_background_process` wrappers can be removed too
because the PusherPool methods will now complete quickly enough.
It's quite important that get_missing_events returns the *latest* events in the
room; however we were pulling event ids out of the database until we got *at
least* 10, and then taking the *earliest* of the results.
We also shouldn't really be relying on depth, and should be checking the
room_id.
If we have a forward extremity for a room as `E`, and you receive `A`, `B`,
s.t. `A -> B -> E`, and `B` also points to an unknown event `X`, then we need
to do state res between `X` and `E`.
When that happens, we need to make sure we include `X` in the state that goes
into the state res alg.
Fixes#3934.
If we've fetched state events from remote servers in order to resolve the state
for a new event, we need to actually pass those events into
resolve_events_with_factory (so that it can do the state res) and then persist
the ones we need - otherwise other bits of the codebase get confused about why
we have state groups pointing to non-existent events.
get_state_groups returns a map from state_group_id to a list of FrozenEvents,
so was very much the wrong thing to be putting as one of the entries in the
list passed to resolve_events_with_factory (which expects maps from
(event_type, state_key) to event id).
We actually want get_state_groups_ids().values() rather than
get_state_groups().
This fixes the main problem in #3923, but there are other problems with this
bit of code which get discovered once you do so.
* add some comments on things that look a bit bogus
* rename this `state` variable to avoid confusion with the `state` used
elsewhere in this function. (There was no actual conflict, but it was
a confusing bit of spaghetti.)
when processing incoming transactions, it can be hard to see what's going on,
because we process a bunch of stuff in parallel, and because we may end up
recursively working our way through a chain of three or four events.
This commit creates a way to use logcontexts to add the relevant event ids to
the log lines.
Add some informative comments about what's going on here.
Also, `sent_to_us_directly` and `get_missing` were doing the same thing (apart
from in `_handle_queued_pdus`, which looks like a bug), so let's get rid of
`get_missing` and use `sent_to_us_directly` consistently.
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:
- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
them into other, more cryptic, exceptions.
First of all, avoid resetting the logcontext before running the pushers, to fix
the "Starting db txn 'get_all_updated_receipts' from sentinel context" warning.
Instead, give them their own "background process" logcontexts.
When we get a federation request which refers to an event id, make sure that
said event is in the room the caller claims it is in.
(patch supplied by @turt2live)
Most rooms have a trivial history visibility like "shared" or
"world_readable", especially large rooms, so lets not bother getting the
full membership of those rooms in that case.
These "temporary fixes" have been here three and a half years, and I can't find
any events in the matrix.org database where the calculated signature differs
from what's in the db. It's time for them to go away.
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
We need to be careful (under python 2, at least) that when we reraise an
exception after doing some error handling, we actually reraise the original
exception rather than anything that might have been raised (and handled) during
the error handling.
It turns out that most of the time we were calling have_events, we were only
using half of the result. Replace have_events with have_seen_events and
get_rejection_reasons, so that we can see what's going on a bit more clearly.
* Split state group persist into seperate storage func
* Add per database engine code for state group id gen
* Move store_state_group to StateReadStore
This allows other workers to use it, and so resolve state.
* Hook up store_state_group
* Fix tests
* Rename _store_mult_state_groups_txn
* Rename StateGroupReadStore
* Remove redundant _have_persisted_state_group_txn
* Update comments
* Comment compute_event_context
* Set start val for state_group_id_seq
... otherwise we try to recreate old state groups
* Update comments
* Don't store state for outliers
* Update comment
* Update docstring as state groups are ints
Add federation_domain_whitelist
gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
When synapse receives an event for a room its not in over federation, it
double checks with the remote server to see if it is in fact in the
room. This is done so that if the server has forgotten about the room
(usually as a result of the database being dropped) it can recover from
it.
However, in the presence of state resets in large rooms, this can cause
a lot of work for servers that have legitimately left. As a hacky
solution that supports both cases we drop incoming events for rooms that
we have explicitly left.
This means that we no longer support the case of servers having
forgotten that they've rejoined a room, but that is sufficiently rare
that we're not going to support it for now.
Since we didn't instansiate the PusherPool at start time it could fail
at run time, which it did for some users.
This may or may not fix things for those users, but it should happen at
start time and stop the server from starting.
The `except SynapseError` clauses were pointless because the wrapped functions
would never throw a `SynapseError` (they either throw a `CodeMessageException`
or a `RuntimeError`).
The `except CodeMessageException` is now also pointless because the caller
treats all exceptions equally, so we may as well just throw the
`CodeMessageException`.
During a rejection of an invite received over federation, we ask a remote
server to make us a `leave` event, then sign it, then send that with
`send_leave`.
We were saving the *unsigned* version of the event (which has a different event
id to the signed version) to our db (and sending it to the clients), whereas
other servers in the room will have seen the *signed* version. We're not aware
of any actual problems that caused, except that it makes the database confusing
to look at and generally leaves the room in a weird state.
Make sure that we accept join events from any server, rather than just the
origin server, to make the federation join dance work correctly.
(Fixes#1893).
Fix a bug in ``logcontext.preserve_fn`` which made it leak context into the
reactor, and add a test for it.
Also, get rid of ``logcontext.reset_context_after_deferred``, which tried to do
the same thing but had its own, different, set of bugs.
A few non-functional changes:
* A bunch of docstrings to document types
* Split `EventsStore._persist_events_txn` up a bit. Hopefully it's a bit more
readable.
* Rephrase `EventFederationStore._update_min_depth_for_room_txn` to avoid
mind-bending conditional.
* Rephrase rejected/outlier conditional in `_update_outliers_txn` to avoid
mind-bending conditional.
This just takes the existing `room_queues` logic and moves it out to
`on_receive_pdu` instead of `_process_received_pdu`, which ensures that we
don't start trying to fetch prev_events and whathaveyou until the join has
completed.
Unfortunately this significantly increases the size of the already-rather-big
FederationHandler, but the code fits more naturally here, and it paves the way
for the tighter integration that I need between handling incoming PDUs and
doing the join dance.
Other than renaming the existing `FederationHandler.on_receive_pdu` to
`_process_received_pdu` to make way for it, this just consists of the move, and
replacing `self.handler` with `self` and `self` with `self.replication_layer`.
When a server sends a third party invite another server may be the one
that the inviting user registers with. In this case it is that remote
server that will issue an actual invitation, and wants to do it "in the
name of" the original invitee. However, the new proper invite will not
be signed by the original server, and thus other servers would reject
the invite if it was seen as coming from the original user.
To fix this, a special case has been added to the auth rules whereby
another server can send an invite "in the name of" another server's
user, so long as that user had previously issued a third party invite
that is now being accepted.
Loading push rules now happens in the datastore, so we can remove
the methods that loaded them outside the datastore.
The ``waiting_for_join_list`` in federation handler is populated by
anything, so can be removed.
The ``_get_members_events_txn`` method isn't called from anywhere
so can be removed.
Move the functions inside the distributor and import them
where needed. This reduces duplication and makes it possible
for flake8 to detect when the functions aren't used in a
given file.
If rejecting a remote invite fails with an error response don't fail
the entire request; instead mark the invite as locally rejected.
This fixes the bug where users can get stuck invites which they can
neither accept nor reject.
pycharm supports them so there is no need to use the other format.
Might as well convert the existing strings to reduce the risk of
people accidentally cargo culting the wrong doc string format.
Move the checks for whether an event is new state inside persist
event itself.
This was harder than expected because there wasn't enough information
passed to persist event to correctly handle invites from remote servers
for new rooms.