Commit Graph

37 Commits

Author SHA1 Message Date
Richard van der Hoff
13683a3a22
Extend StreamChangeCache to support multiple entities per stream ID (#7303)
First some background: StreamChangeCache is used to keep track of what "entities" have 
changed since a given stream ID. So for example, we might use it to keep track of when the last
to-device message for a given user was received [1], and hence whether we need to pull any to-device messages from the database on a sync [2].

Now, it turns out that StreamChangeCache didn't support more than one thing being changed at
a given stream_id (this was part of the problem with #7206). However, it's entirely valid to send
to-device messages to more than one user at a time.

As it turns out, this did in fact work, because *some* methods of StreamChangeCache coped
ok with having multiple things changing on the same stream ID, and it seems we never actually
use the methods which don't work on the stream change caches where we allow multiple
changes at the same stream ID. But that feels horribly fragile, hence: let's update
StreamChangeCache to properly support this, and add some typing and some more tests while
we're at it.

[1]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L301
[2]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L47-L51
2020-04-22 13:45:40 +01:00
Richard van der Hoff
0f8f02bc39
On catchup, process each row with its own stream id (#7286)
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.

This code was introduced in #7024, and I hope this fixes #7206.
2020-04-20 11:43:29 +01:00
Amber Brown
32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Amber Brown
e1728dfcbe
Make scripts/ and scripts-dev/ pass pyflakes (and the rest of the codebase on py3) (#4068) 2018-10-20 11:16:55 +11:00
Erik Johnston
b2aa05a8d6 Use efficient .intersection 2018-07-17 11:07:04 +01:00
Erik Johnston
547b1355d3 Fix perf regression in PR #3530
The get_entities_changed function was changed to return all changed
entities since the given stream position, rather than only those changed
from a given list of entities. This resulted in the function incorrectly
returning large numbers of entities that, for example, caused large
increases in database usage.
2018-07-17 10:27:51 +01:00
Erik Johnston
77b692e65d Don't return unknown entities in get_entities_changed
The stream cache keeps track of all entities that have changed since
a particular stream position, so get_entities_changed does not need to
return unknown entites when given a larger stream position.

This makes it consistent with the behaviour of has_entity_changed.
2018-07-13 15:26:10 +01:00
Richard van der Hoff
fa5c2bc082 Reduce set building in get_entities_changed
This line shows up as about 5% of cpu time on a synchrotron:

    not_known_entities = set(entities) - set(self._entity_to_key)

Presumably the problem here is that _entity_to_key can be largeish, and
building a set for its keys every time this function is called is slow.

Here we rewrite the logic to avoid building so many sets.
2018-07-12 11:37:44 +01:00
Amber Brown
49af402019 run isort 2018-07-09 16:09:20 +10:00
Amber Brown
72d2143ea8
Revert "Revert "Try to not use as much CPU in the StreamChangeCache"" (#3454) 2018-06-28 11:04:18 +01:00
Matthew Hodgson
8057489b26
Revert "Try to not use as much CPU in the StreamChangeCache" 2018-06-26 18:09:01 +01:00
Amber Brown
1202508067 fixes 2018-06-26 17:29:01 +01:00
Amber Brown
bd3d329c88 fixes 2018-06-26 17:28:12 +01:00
Amber Brown
abfe4b2957 try and make loading items from the cache faster 2018-06-26 17:25:34 +01:00
Amber Brown
f7869f8f8b
Port to sortedcontainers (with tests!) (#3332) 2018-06-06 00:13:57 +10:00
Amber Brown
df9f72d9e5 replacing portions 2018-05-21 19:47:37 -05:00
Richard van der Hoff
d3347ad485 Revert "Use sortedcontainers instead of blist"
This reverts commit 9fbe70a7dc.

It turns out that sortedcontainers.SortedDict is not an exact match for
blist.sorteddict; in particular, `popitem()` removes things from the opposite
end of the dict.

This is trivial to fix, but I want to add some unit tests, and potentially some
more thought about it, before we do so.
2018-04-13 11:16:43 +01:00
Vincent Breitmoser
9fbe70a7dc Use sortedcontainers instead of blist
This commit drop-in replaces blist with SortedContainers. They are
written in pure python so work with pypy, but perform as good as
native implementations, at least in a couple benchmarks:

http://www.grantjenks.com/docs/sortedcontainers/performance.html
2018-04-10 11:29:51 +02:00
Erik Johnston
b5e8d529e6 Define CACHE_SIZE_FACTOR once 2017-07-04 09:56:44 +01:00
Erik Johnston
efc2b7db95 Rewrite conditional 2017-06-09 13:35:15 +01:00
Erik Johnston
eed59dcc1e Fix has_any_entity_changed
Occaisonally has_any_entity_changed would throw the error: "Set changed
size during iteration" when taking the max of the `sorteddict`. While
its uncertain how that happens, its quite inefficient to iterate over
the entire dict anyway so we change to using the more traditional
`bisect_*` functions.
2017-06-09 11:44:01 +01:00
Erik Johnston
304880d185 Add stream change cache 2017-05-31 15:46:36 +01:00
Richard van der Hoff
29ed09e80a Fix assertion to stop transaction queue getting wedged
... and update some docstrings to correctly reflect the types being used.

get_new_device_msgs_for_remote can return a long under some circumstances,
which was being stored in last_device_list_stream_id_by_dest, and was then
upsetting things on the next loop.
2017-03-15 12:16:55 +00:00
Erik Johnston
955f34d23e Change get_pos_of_last_change to return upper bound 2016-09-15 15:12:07 +01:00
Erik Johnston
cb3edec6af Use stream_change cache to make get_forward_extremeties_for_room cache more effective 2016-09-15 14:28:13 +01:00
Erik Johnston
73c7112433 Change CacheMetrics to be quicker
We change it so that each cache has an individual CacheMetric, instead
of having one global CacheMetric. This means that when a cache tries to
increment a counter it does not need to go through so many indirections.
2016-06-03 11:26:52 +01:00
Erik Johnston
a547e2df85 Return list, not generator. 2016-03-14 15:30:19 +00:00
Erik Johnston
374f9b2f07 Limit stream change cache size too 2016-03-01 13:30:15 +00:00
Erik Johnston
c77dae7a1a Change the way we figure out presence updates for small deltas 2016-02-23 14:54:40 +00:00
Erik Johnston
e70165039c If stream pos is greater then earliest known key and entity hasn't changed, then entity hasn't changed 2016-01-29 16:41:32 +00:00
Erik Johnston
18579534ea Prefill stream change caches 2016-01-29 14:37:59 +00:00
Erik Johnston
3f5dd18bd4 If the same as the earliest key, assume nothing has changed. 2016-01-28 18:11:41 +00:00
Erik Johnston
40431251cb Correctly update _entity_to_key 2016-01-28 18:05:43 +00:00
Erik Johnston
82cf3a8043 Fix inequalities 2016-01-28 17:44:04 +00:00
Erik Johnston
0663c5bd52 Include cache hits with has_entity_changed 2016-01-28 17:27:28 +00:00
Erik Johnston
45cf827c8f Change name and doc has_entity_changed 2016-01-28 16:39:18 +00:00
Erik Johnston
00cb3eb24b Cache tags and account data 2016-01-28 16:37:41 +00:00