Commit Graph

54 Commits

Author SHA1 Message Date
Erik Johnston
3e6ee8ff88
Add optimisation to StreamChangeCache (#17130)
When there have been lots of changes compared with the number of
entities, we can do a fast(er) path.

Locally I ran some benchmarking, and the comparison seems to give the
best determination of which method we use.
2024-05-06 12:56:52 +01:00
Erik Johnston
7c9ac01eb5
Fix bug where StreamChangeCache would not respect cache factors (#17152)
Annoyingly mypy didn't pick up this typo.
2024-05-03 18:00:08 +01:00
Erik Johnston
23740eaa3d
Correctly mention previous copyright (#16820)
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
2024-01-23 11:26:48 +00:00
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Patrick Cloke
da77720752
Check the stream position before checking if the cache is empty. (#14639)
An empty cache does not mean the entity has no changed, if
it is earlier than the earliest known stream position return that
the entity *has* changed since the cache cannot accurately
answer that query.
2022-12-08 11:35:49 -05:00
Erik Johnston
cee9445884
Better return type for get_all_entities_changed (#14604)
Help callers from using the return value incorrectly by ensuring
that callers explicitly check if there was a cache hit or not.
2022-12-05 15:19:14 -05:00
Patrick Cloke
6a8310f3df
Compare to the earliest known stream pos in the stream change cache. (#14435)
The internal methods of the StreamChangeCache were inconsistently
treating the earliest known stream position as valid. It is now treated as
invalid, meaning the cache cannot determine if an entity at the earliest
known stream position has changed or not.
2022-12-05 09:00:59 -05:00
Patrick Cloke
13ca8bb2fc
Remove duplicated code to evict entries. (#14410)
This code was factored out to a method, but also left in-place.

Calling this twice in a row makes no sense: the first call will reduce
the size appropriately, but the loop will immediately exit since the
cache size was already reduced.
2022-11-10 15:33:34 -05:00
David Robertson
f8d0f72b27
More types for synapse.util, part 1 (#10888)
The following modules now pass `disallow_untyped_defs`:

* synapse.util.caches.cached_call 
* synapse.util.caches.lrucache
* synapse.util.caches.response_cache 
* synapse.util.caches.stream_change_cache
* synapse.util.caches.ttlcache pass
* synapse.util.daemonize
* synapse.util.patch_inline_callbacks pass `no-untyped-defs`
* synapse.util.versionstring

Additional typing in synapse.util.metrics. Didn't get this to pass `no-untyped-defs`, think I'll need to watch #10847
2021-10-06 11:20:49 +01:00
reivilibre
524b8ead77
Add types to synapse.util. (#10601) 2021-09-10 17:03:18 +01:00
Jonathan de Jong
bdfde6dca1
Use inline type hints in http/federation/, storage/ and util/ (#10381) 2021-07-15 12:46:54 -04:00
Richard van der Hoff
294c675033
Remove synapse.types.Collection (#9856)
This is no longer required, since we have dropped support for Python 3.5.
2021-04-22 16:43:50 +01:00
Jonathan de Jong
4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Eric Eastwood
0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Dagfinn Ilmari Mannsåker
a3f11567d9
Replace all remaining six usage with native Python 3 equivalents (#7704) 2020-06-16 08:51:47 -04:00
Amber Brown
7cb8b4bc67
Allow configuration of Synapse's cache without using synctl or environment variables (#6391) 2020-05-11 18:45:23 +01:00
Erik Johnston
f9073893af Speed up fetching device lists changes in sync.
Currently we copy `users_who_share_room` needlessly about three times,
which is expensive when the set is large (which it can easily be).
2020-05-05 17:40:29 +01:00
Richard van der Hoff
13683a3a22
Extend StreamChangeCache to support multiple entities per stream ID (#7303)
First some background: StreamChangeCache is used to keep track of what "entities" have 
changed since a given stream ID. So for example, we might use it to keep track of when the last
to-device message for a given user was received [1], and hence whether we need to pull any to-device messages from the database on a sync [2].

Now, it turns out that StreamChangeCache didn't support more than one thing being changed at
a given stream_id (this was part of the problem with #7206). However, it's entirely valid to send
to-device messages to more than one user at a time.

As it turns out, this did in fact work, because *some* methods of StreamChangeCache coped
ok with having multiple things changing on the same stream ID, and it seems we never actually
use the methods which don't work on the stream change caches where we allow multiple
changes at the same stream ID. But that feels horribly fragile, hence: let's update
StreamChangeCache to properly support this, and add some typing and some more tests while
we're at it.

[1]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L301
[2]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L47-L51
2020-04-22 13:45:40 +01:00
Richard van der Hoff
0f8f02bc39
On catchup, process each row with its own stream id (#7286)
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.

This code was introduced in #7024, and I hope this fixes #7206.
2020-04-20 11:43:29 +01:00
Amber Brown
32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Amber Brown
e1728dfcbe
Make scripts/ and scripts-dev/ pass pyflakes (and the rest of the codebase on py3) (#4068) 2018-10-20 11:16:55 +11:00
Erik Johnston
b2aa05a8d6 Use efficient .intersection 2018-07-17 11:07:04 +01:00
Erik Johnston
547b1355d3 Fix perf regression in PR #3530
The get_entities_changed function was changed to return all changed
entities since the given stream position, rather than only those changed
from a given list of entities. This resulted in the function incorrectly
returning large numbers of entities that, for example, caused large
increases in database usage.
2018-07-17 10:27:51 +01:00
Erik Johnston
77b692e65d Don't return unknown entities in get_entities_changed
The stream cache keeps track of all entities that have changed since
a particular stream position, so get_entities_changed does not need to
return unknown entites when given a larger stream position.

This makes it consistent with the behaviour of has_entity_changed.
2018-07-13 15:26:10 +01:00
Richard van der Hoff
fa5c2bc082 Reduce set building in get_entities_changed
This line shows up as about 5% of cpu time on a synchrotron:

    not_known_entities = set(entities) - set(self._entity_to_key)

Presumably the problem here is that _entity_to_key can be largeish, and
building a set for its keys every time this function is called is slow.

Here we rewrite the logic to avoid building so many sets.
2018-07-12 11:37:44 +01:00
Amber Brown
49af402019 run isort 2018-07-09 16:09:20 +10:00
Amber Brown
72d2143ea8
Revert "Revert "Try to not use as much CPU in the StreamChangeCache"" (#3454) 2018-06-28 11:04:18 +01:00
Matthew Hodgson
8057489b26
Revert "Try to not use as much CPU in the StreamChangeCache" 2018-06-26 18:09:01 +01:00
Amber Brown
1202508067 fixes 2018-06-26 17:29:01 +01:00
Amber Brown
bd3d329c88 fixes 2018-06-26 17:28:12 +01:00
Amber Brown
abfe4b2957 try and make loading items from the cache faster 2018-06-26 17:25:34 +01:00
Amber Brown
f7869f8f8b
Port to sortedcontainers (with tests!) (#3332) 2018-06-06 00:13:57 +10:00
Amber Brown
df9f72d9e5 replacing portions 2018-05-21 19:47:37 -05:00
Richard van der Hoff
d3347ad485 Revert "Use sortedcontainers instead of blist"
This reverts commit 9fbe70a7dc.

It turns out that sortedcontainers.SortedDict is not an exact match for
blist.sorteddict; in particular, `popitem()` removes things from the opposite
end of the dict.

This is trivial to fix, but I want to add some unit tests, and potentially some
more thought about it, before we do so.
2018-04-13 11:16:43 +01:00
Vincent Breitmoser
9fbe70a7dc Use sortedcontainers instead of blist
This commit drop-in replaces blist with SortedContainers. They are
written in pure python so work with pypy, but perform as good as
native implementations, at least in a couple benchmarks:

http://www.grantjenks.com/docs/sortedcontainers/performance.html
2018-04-10 11:29:51 +02:00
Erik Johnston
b5e8d529e6 Define CACHE_SIZE_FACTOR once 2017-07-04 09:56:44 +01:00
Erik Johnston
efc2b7db95 Rewrite conditional 2017-06-09 13:35:15 +01:00
Erik Johnston
eed59dcc1e Fix has_any_entity_changed
Occaisonally has_any_entity_changed would throw the error: "Set changed
size during iteration" when taking the max of the `sorteddict`. While
its uncertain how that happens, its quite inefficient to iterate over
the entire dict anyway so we change to using the more traditional
`bisect_*` functions.
2017-06-09 11:44:01 +01:00
Erik Johnston
304880d185 Add stream change cache 2017-05-31 15:46:36 +01:00
Richard van der Hoff
29ed09e80a Fix assertion to stop transaction queue getting wedged
... and update some docstrings to correctly reflect the types being used.

get_new_device_msgs_for_remote can return a long under some circumstances,
which was being stored in last_device_list_stream_id_by_dest, and was then
upsetting things on the next loop.
2017-03-15 12:16:55 +00:00
Erik Johnston
955f34d23e Change get_pos_of_last_change to return upper bound 2016-09-15 15:12:07 +01:00
Erik Johnston
cb3edec6af Use stream_change cache to make get_forward_extremeties_for_room cache more effective 2016-09-15 14:28:13 +01:00
Erik Johnston
73c7112433 Change CacheMetrics to be quicker
We change it so that each cache has an individual CacheMetric, instead
of having one global CacheMetric. This means that when a cache tries to
increment a counter it does not need to go through so many indirections.
2016-06-03 11:26:52 +01:00
Erik Johnston
a547e2df85 Return list, not generator. 2016-03-14 15:30:19 +00:00
Erik Johnston
374f9b2f07 Limit stream change cache size too 2016-03-01 13:30:15 +00:00
Erik Johnston
c77dae7a1a Change the way we figure out presence updates for small deltas 2016-02-23 14:54:40 +00:00
Erik Johnston
e70165039c If stream pos is greater then earliest known key and entity hasn't changed, then entity hasn't changed 2016-01-29 16:41:32 +00:00
Erik Johnston
18579534ea Prefill stream change caches 2016-01-29 14:37:59 +00:00
Erik Johnston
3f5dd18bd4 If the same as the earliest key, assume nothing has changed. 2016-01-28 18:11:41 +00:00
Erik Johnston
40431251cb Correctly update _entity_to_key 2016-01-28 18:05:43 +00:00