It's quite important that get_missing_events returns the *latest* events in the
room; however we were pulling event ids out of the database until we got *at
least* 10, and then taking the *earliest* of the results.
We also shouldn't really be relying on depth, and should be checking the
room_id.
In particular, we assume that the name and canonical alias events in
the state have not been rejected. In practice this may not be the case
(though we should probably think about fixing that) so lets ensure that
we gracefully handle that case, rather than 404'ing the sync request
like we do now.
If we have a forward extremity for a room as `E`, and you receive `A`, `B`,
s.t. `A -> B -> E`, and `B` also points to an unknown event `X`, then we need
to do state res between `X` and `E`.
When that happens, we need to make sure we include `X` in the state that goes
into the state res alg.
Fixes#3934.
If we've fetched state events from remote servers in order to resolve the state
for a new event, we need to actually pass those events into
resolve_events_with_factory (so that it can do the state res) and then persist
the ones we need - otherwise other bits of the codebase get confused about why
we have state groups pointing to non-existent events.
get_state_groups returns a map from state_group_id to a list of FrozenEvents,
so was very much the wrong thing to be putting as one of the entries in the
list passed to resolve_events_with_factory (which expects maps from
(event_type, state_key) to event id).
We actually want get_state_groups_ids().values() rather than
get_state_groups().
This fixes the main problem in #3923, but there are other problems with this
bit of code which get discovered once you do so.
* add some comments on things that look a bit bogus
* rename this `state` variable to avoid confusion with the `state` used
elsewhere in this function. (There was no actual conflict, but it was
a confusing bit of spaghetti.)
when processing incoming transactions, it can be hard to see what's going on,
because we process a bunch of stuff in parallel, and because we may end up
recursively working our way through a chain of three or four events.
This commit creates a way to use logcontexts to add the relevant event ids to
the log lines.
Given we have disabled lazy loading for incr syncs in #3840, we can make self-LL more efficient by only doing it on initial sync. Also adds a bounds check for if/when we change our mind, so that we don't try to include LL members on sync responses with no timeline.
Add some informative comments about what's going on here.
Also, `sent_to_us_directly` and `get_missing` were doing the same thing (apart
from in `_handle_queued_pdus`, which looks like a bug), so let's get rid of
`get_missing` and use `sent_to_us_directly` consistently.
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:
- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
them into other, more cryptic, exceptions.
don't filter membership events based on history visibility
as we will already have filtered the messages in the timeline, and state events
are always visible.
and because @erikjohnston said so.
* speed up room summaries by pulling their data from room_memberships rather than room state
* disable LL for incr syncs, and log incr sync stats (#3840)
When a user joined a room any existing tags were not sent down the sync
stream. Ordinarily this isn't a problem because the user needs to be in
the room to have set tags in it, however synapse will sometimes add tags
for a user to a room, e.g. for server notices, which need to come down
sync.
Fetching the list of all new typing notifications involved iterating
over all rooms and comparing their serial. Lets move to using a stream
change cache, like we do for other streams.
Turns out that the user directory handling is fairly racey as a bunch
of stuff assumes that the processing happens on master, which it doesn't
when there is a synapse.app.user_dir worker. So lets just call the
function directly until we actually get round to fixing it, since it
doesn't make the situation any worse.