The _get_joined_users_from_context cache stores a mapping from user_id
to avatar_url and display_name. Instead of storing those in a dict,
store them in a namedtuple as that uses much less memory.
We also try converting the string to ascii to further reduce the size.
Currently the cache descriptors store deferreds rather than raw values,
this is a simple way of triggering only one database hit and sharing the
result if two callers attempt to get the same value.
However, there are a few caches that simply store a mapping from string
to string (or int). These caches can have a large number of entries,
under the assumption that each entry is small. However, the size of a
deferred (specifically the size of ObservableDeferred) is signigicantly
larger than that of the raw value, 2kb vs 32b.
This PR therefore changes the cache descriptors to store the raw values
rather than the deferreds.
As a side effect cached storage function now either return a deferred or
the actual value, as the cached list decriptor already does. This is
fine as we always end up just yield'ing on the returned value
eventually, which handles that case correctly.
Using _simple_select_list is fairly expensive for functions that return
a lot of rows and/or get called a lot. (This is because it carefully
constructs a list of dicts).
get_current_state_ids gets called a lot on startup and e.g. when the IRC
bridge decided to send tonnes of joins/leaves (as it invalidates the
cache). We therefore replace it with a custon txn function that builds
up the final result dict without building up and intermediate
representation.
Instead of using the cache invalidation replication stream to invalidate
the _get_presence_cache, we can instead rely on the presence replication
stream. This reduces the amount of replication traffic considerably.
background_updates was using `call_later` in a way that leaked the logcontext
into the reactor.
We could have rewritten it to do it properly, but given that we weren't using
the fancier facilities provided by `call_later`, we might as well just use
`async.sleep`, which does the logcontext stuff properly.
A few non-functional changes:
* A bunch of docstrings to document types
* Split `EventsStore._persist_events_txn` up a bit. Hopefully it's a bit more
readable.
* Rephrase `EventFederationStore._update_min_depth_for_room_txn` to avoid
mind-bending conditional.
* Rephrase rejected/outlier conditional in `_update_outliers_txn` to avoid
mind-bending conditional.
... and update some docstrings to correctly reflect the types being used.
get_new_device_msgs_for_remote can return a long under some circumstances,
which was being stored in last_device_list_stream_id_by_dest, and was then
upsetting things on the next loop.
Currently we fetch the list of current state events whenever we send
something in a room. This is overkill for the common case of persisting
a simple chain of non-state events, so lets handle that case specially.
* `get_forward_extremeties_for_room` takes a numeric `stream_ordering`. We were
passing a `RoomStreamToken`, which meant that it returned the *current*
extremities, rather than those corresponding to the `from_token`. However:
* `get_state_ids_for_events` required a second ('types') parameter; this meant
that a `TypeError` was thrown and we ended up acting as though there was *no*
prev state.
* `get_state_ids_for_events` actually returns a map from event_id to state
dictionary - just looking up the state keys in it again meant that we acted
as though there was no prev state. We now check if each member's state has
changed since *any* of the extremities.
Also add/fix some comments.