* attempt at deduplicating lazy-loaded members
As per the proposal, we can deduplicate redundant lazy-loaded members
which are sent in the same sync sequence. We do this heuristically,
rather than requiring the client to somehow tell us which members it
has chosen to cache, by instead caching the last N members sent to
a client and not sending them again. For now we hardcode N to 100.
Each cache for a given (user, device) tuple is in turn cached for up to
X minutes (to avoid the caches building up). For now we hardcode X to 30.
(A rough sketch of the heuristic follows this list.)
* add include_redundant_members filter option & make it work
* remove stale todo
* add tests for _get_some_state_from_cache
* incorporate review
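A minimal sketch of the heuristic described above, purely for illustration:
the class, method and field names here are assumptions, not the actual
implementation, which would use synapse's own cache utilities.

    # Illustrative sketch only: remember the last N member event IDs sent to
    # a given (user, device), and skip re-sending them within the same sync
    # sequence. Per-device caches are themselves dropped after X minutes.
    import time
    from collections import OrderedDict

    MEMBER_CACHE_SIZE = 100           # "N" above
    MEMBER_CACHE_MAX_AGE = 30 * 60    # "X" above, in seconds

    class LazyLoadedMembersCache(object):
        def __init__(self):
            # (user_id, device_id) -> (last_used_ts, OrderedDict of member key -> event_id)
            self._caches = {}

        def _get_cache(self, user_id, device_id):
            now = time.time()
            # drop per-device caches which haven't been used for X minutes
            for key, (ts, _) in list(self._caches.items()):
                if now - ts > MEMBER_CACHE_MAX_AGE:
                    del self._caches[key]
            _, cache = self._caches.setdefault((user_id, device_id), (now, OrderedDict()))
            self._caches[(user_id, device_id)] = (now, cache)
            return cache

        def filter_redundant(self, user_id, device_id, members):
            """members: dict of (room_id, state_key) -> member event_id.
            Returns only the members we haven't recently sent, and records
            everything we are about to send."""
            cache = self._get_cache(user_id, device_id)
            to_send = {}
            for key, event_id in members.items():
                if cache.get(key) != event_id:
                    to_send[key] = event_id
                cache[key] = event_id
                cache.move_to_end(key)
                if len(cache) > MEMBER_CACHE_SIZE:
                    cache.popitem(last=False)  # forget the least-recently-sent member
            return to_send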
This field is no longer read from, so we should stop populating it. Once we're
happy that this doesn't break everything, and a rollback is unlikely, we can
think about dropping the column.
We've long passed the point where it's possible to have the same event_id in
different tables, so these join conditions are redundant: we can just join on
event_id.
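To illustrate the kind of simplification meant here (the tables and queries
below are hypothetical stand-ins, not the actual statements being changed):

    # before: belt-and-braces join on both columns
    OLD_SQL = """
        SELECT e.event_id FROM events AS e
        INNER JOIN event_json AS ej
            ON e.event_id = ej.event_id AND e.room_id = ej.room_id
    """

    # after: event_id alone identifies the row, so the extra condition goes
    NEW_SQL = """
        SELECT e.event_id FROM events AS e
        INNER JOIN event_json AS ej ON e.event_id = ej.event_id
    """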
event_edges is of non-trivial size, and the room_id column is wasteful, so
let's stop reading from it. In future, we can stop writing to it, and then drop
it.
(since it uses methods therein)
Turns out that we had a bunch of things which were incorrectly importing
EventWorkerStore from events.py rather than events_worker.py, which broke once
I removed the import into events.py.
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips) we were relying on
this; I broke it in #3604.
Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
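Roughly the shape this takes, assuming a Clock.looping_call(f, interval_ms)
style API; the wrapper body and the store method are placeholders, not the
real code:

    from twisted.internet import defer

    def run_as_background_process(desc, func, *args, **kwargs):
        # illustrative shape only: the key point is that the resulting
        # deferred is *returned* rather than dropped, so callers can see it
        return defer.maybeDeferred(func, *args, **kwargs)

    def start_client_ip_updates(clock, store):
        def update():
            # hand the deferred straight back to looping_call so that any
            # failure gets logged by the LoopingCall machinery
            return run_as_background_process("update_client_ips", store.flush_client_ips)

        clock.looping_call(update, 15 * 1000)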
This fixes a bug in _delete_existing_rows_txn which was introduced in #3435
(though it's been on matrix-org-hotfixes for *years*). This code is only called
when there is some sort of conflict the first time we try to persist an event,
so it only happens rarely. Still, the exceptions are annoying.
We need to run the errback in the sentinel context to avoid losing our own
context.
Also: add logging to runInteraction to help identify where "Starting db
connection from sentinel context" warnings are coming from
_update_remote_profile_cache was missing its `defer.inlineCallbacks`, so when
it was called, it would just return a generator object without actually
running any of the method body.
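To illustrate the failure mode (the functions and store call here are made
up): without the decorator, calling a generator function just hands back a
generator and none of the body runs.

    from twisted.internet import defer

    def broken_update_cache(store):      # missing @defer.inlineCallbacks
        yield store.do_something()       # never runs: calling this only
                                         # creates a generator object

    @defer.inlineCallbacks
    def fixed_update_cache(store):       # with the decorator the body runs
        yield store.do_something()       # and the caller gets back a Deferred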
on_notifier_poke no longer runs synchronously, so we have to do a different hack
to make sure that the replication data has been sent. Let's actually listen for
its arrival.
We incorrectly asserted that all contexts must have a non-None state
group, without considering outliers. This would usually be fine, as the
assertion would never be hit: there is a shortcut during persistence
if the forward extremities don't change.
However, if the outlier is being persisted with non-outlier events, the
function would be called and the assertion would be hit.
Fixes #3601
This allows us to handle /context/ requests on the client_reader worker
without having to pull in all the various stream handlers (e.g.
presence, typing, pushers etc). The only thing the token gets used for
is pagination, and that ignores everything but the room portion of the
token.
First of all, fix the logic which looks for identical input state groups so
that we actually use them. This turned out to be most easily done by factoring
the relevant code out to a separate function so that we could do an early
return.
Secondly, avoid building the whole `conflicted_state` dict (which was only ever
used as a boolean flag).
Thirdly, replace the construction of the `state` dict (which mapped from keys
to events that set them) with an optimistic construction of the resolution
result, assuming there will be no conflicts (a rough sketch follows below).
This should be no slower than building the old `state` dict, and:
- in the conflicted case, we'll short-cut it, saving part of the work
- in the unconflicted case, it saves rebuilding the resolution from the
`state` dict.
Finally, do a couple of s/values/itervalues/.
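A loose sketch of that optimistic construction, with made-up names and a
`full_resolution` callback standing in for the real conflict-resolution path:

    def resolve_events_sketch(state_sets, full_resolution):
        """Illustrative only: state_sets is a list of dicts mapping state
        keys to event IDs; full_resolution is whatever does the real work
        when there is a conflict."""
        # identical input state groups: nothing to resolve at all
        if all(s == state_sets[0] for s in state_sets[1:]):
            return dict(state_sets[0])

        resolved = {}
        for state_set in state_sets:
            for key, event_id in state_set.items():
                if resolved.setdefault(key, event_id) != event_id:
                    # conflicted case: abandon the optimistic result part-way
                    # through, saving the rest of the work
                    return full_resolution(state_sets)

        # unconflicted case: `resolved` already *is* the answer, with no need
        # to rebuild it from a separate `state` dict
        return resolved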
We don't want to bother pulling out the current state from the DB
until we know we have to. Checking the context for state is just an
optimisation.
This is more involved than it might otherwise be, because the current
implementation just drops its logcontexts and runs everything in the sentinel
context.
It turns out that we aren't actually using a bunch of the functionality here
(notably suppress_failures and the fact that Distributor.fire returns a
deferred), so the easiest way to fix this is actually by simplifying a bunch of
code.
This fixes #3518, and ensures that we get useful logs and metrics for lots of
things that happen in the background.
(There are certainly more things that happen in the background; these are just
the common ones I've found running a single-process synapse locally).
This introduces a mechanism for tracking resource usage by background
processes, along with an example of how it will be used.
This will help address #3518, but more importantly will give us better insights
into things which are happening but not showing up in the request metrics
(see the sketch after the list below).
We *could* do this with Measure blocks, but:
- I think having them pulled out as a completely separate metric class will
make it easier to distinguish top-level processes from those which are
nested.
- I want to be able to report on in-flight background processes, and I don't
think we want to do this for *all* Measure blocks.
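As a flavour of the intended usage (the job and storage method are
hypothetical, and the import path is an assumption about where the wrapper
lives):

    from twisted.internet import defer
    from synapse.metrics.background_process_metrics import run_as_background_process

    @defer.inlineCallbacks
    def _prune_old_device_messages(store):
        # hypothetical periodic job whose CPU/db usage we want attributed to
        # a named background process instead of disappearing from the metrics
        yield store.delete_device_messages_before(days=30)

    def start_pruner(clock, store):
        clock.looping_call(
            lambda: run_as_background_process(
                "prune_old_device_messages", _prune_old_device_messages, store
            ),
            60 * 60 * 1000,  # hourly
        )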
The get_entities_changed function was changed to return all changed
entities since the given stream position, rather than only those changed
from a given list of entities. This resulted in the function incorrectly
returning large numbers of entities, which, for example, caused large
increases in database usage.
parse_integer and parse_string can take a request and raise errors
in case we have wrong or missing params.
This PR tries to use them more, to deduplicate some code and make it
more readable.
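For illustration, the sort of usage this enables (the servlet and its helper
method are made up; the import path is the usual one for these helpers):

    from synapse.http.servlet import parse_integer, parse_string

    class ExampleRestServlet(object):   # hypothetical servlet
        def on_GET(self, request):
            # raises a 400 error if "limit" is present but not an integer
            limit = parse_integer(request, "limit", default=10)
            # raises if "dir" is missing or not one of the allowed values
            direction = parse_string(
                request, "dir", required=True, allowed_values=["f", "b"]
            )
            return self._do_query(limit, direction)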
The stream cache keeps track of all entities that have changed since
a particular stream position, so get_entities_changed does not need to
return unknown entities when given a larger stream position.
This makes it consistent with the behaviour of has_entity_changed.
This line shows up as about 5% of CPU time on a synchrotron:
not_known_entities = set(entities) - set(self._entity_to_key)
Presumably the problem here is that _entity_to_key can be largeish, and
building a set for its keys every time this function is called is slow.
Here we rewrite the logic to avoid building so many sets.
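Roughly the shape of the rewrite, simplified to a standalone function (the
real StreamChangeCache has more bookkeeping than this):

    def get_entities_changed_sketch(entity_to_key, earliest_known_pos, entities, stream_pos):
        """Illustrative only: which of `entities` may have changed since
        `stream_pos`, without materialising set(entity_to_key)."""
        if stream_pos < earliest_known_pos:
            # the cache doesn't go back that far, so anything might have changed
            return set(entities)

        changed = set()
        for entity in entities:
            last_changed_pos = entity_to_key.get(entity)
            # entities the cache has never seen change need not be returned,
            # matching the behaviour of has_entity_changed
            if last_changed_pos is not None and last_changed_pos > stream_pos:
                changed.add(entity)
        return changed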