Commit Graph

672 Commits

Author SHA1 Message Date
Erik Johnston
fdd1a62e8d Add a five minute cache to get_destination_retry_timings
Hopefully helps with #3931
2018-09-21 14:56:12 +01:00
Erik Johnston
79eded1ae4 Make ExpiringCache slightly more performant 2018-09-21 14:52:21 +01:00
Erik Johnston
8601c24287 Fix some instances of ExpiringCache not expiring cache items
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.

This PR removes `start` from ExpiringCache, and automatically starts
backround reaping process on creation instead.
2018-09-21 14:19:46 +01:00
Richard van der Hoff
642199570c
Improve the logging when handling a federation transaction (#3904)
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:

- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
  them into other, more cryptic, exceptions.
2018-09-19 17:28:18 +01:00
Erik Johnston
9407bcf37a Replace custom DeferredTimeoutError with defer.TimeoutError 2018-09-19 11:07:29 +01:00
Erik Johnston
6c48aa0256 Run canceller first to allow it to generate correct error 2018-09-19 11:07:27 +01:00
Erik Johnston
a334e1cace Update to use new timeout function everywhere.
The existing deferred timeout helper function (and the one into twisted)
suffer from a bug when a deferred's canceller throws an exception, #3842.

The new helper function doesn't suffer from this problem.
2018-09-19 10:39:40 +01:00
Erik Johnston
24efb2a70d Fix timeout function
Turns out deferred.cancel sometimes throws, so we do that last to ensure
that we always do resolve the new deferred.
2018-09-15 11:38:39 +01:00
Erik Johnston
fcfe7a850d Add an awful secondary timeout to fix wedged requests
This is an attempt to mitigate #3842 by adding yet-another-timeout
2018-09-14 19:23:07 +01:00
Erik Johnston
0a81038ea0 Add in flight real time metrics for Measure blocks 2018-09-14 15:08:37 +01:00
Erik Johnston
9e05c8d309 Change the manhole SSH key to have more bits
Newer versions of openssh client refuse to connect to the old key due to
its length.
2018-09-11 10:42:10 +01:00
Richard van der Hoff
be6527325a Fix exceptions when a connection is closed before we read the headers
This fixes bugs introduced in #3700, by making sure that we behave sanely
when an incoming connection is closed before the headers are read.
2018-08-20 18:21:10 +01:00
Richard van der Hoff
55e6bdf287 Robustness fix for logcontext filter
Make the logcontext filter not explode if it somehow ends up with a logcontext
of None, since that infinite-loops the whole logging system.
2018-08-20 18:20:07 +01:00
Amber Brown
324525f40c
Port over enough to get some sytests running on Python 3 (#3668) 2018-08-20 23:54:49 +10:00
Richard van der Hoff
c31793a784 Merge branch 'rav/fix_linearizer_cancellation' into develop 2018-08-10 14:57:27 +01:00
Amber Brown
b37c472419
Rename async to async_helpers because async is a keyword on Python 3.7 (#3678) 2018-08-10 23:50:21 +10:00
Richard van der Hoff
638d35ef08 Fix linearizer cancellation on twisted < 18.7
Turns out that cancellation of inlineDeferreds didn't really work properly
until Twisted 18.7. This commit refactors Linearizer.queue to avoid
inlineCallbacks.
2018-08-10 10:59:09 +01:00
Amber Brown
da7785147d
Python 3: Convert some unicode/bytes uses (#3569) 2018-08-02 00:54:06 +10:00
Richard van der Hoff
a8cbce0ced fix invalidation 2018-07-27 16:17:17 +01:00
Richard van der Hoff
f102c05856 Rewrite cache list decorator
Because it was complicated and annoyed me. I suspect this will be more
efficient too.
2018-07-27 13:47:04 +01:00
Richard van der Hoff
03751a6420 Fix some looping_call calls which were broken in #3604
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips), we were relying on this,
and I broke it in #3604.

Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
2018-07-26 11:48:08 +01:00
Richard van der Hoff
3d6df84658 Test and fix support for cancellation in Linearizer 2018-07-20 13:59:55 +01:00
Richard van der Hoff
7c712f95bb Combine Limiter and Linearizer
Linearizer was effectively a Limiter with max_count=1, so rather than
maintaining two sets of code, let's combine them.
2018-07-20 13:11:43 +01:00
Richard van der Hoff
8462c26485 Improvements to the Limiter
* give them names, to improve logging
* use a deque rather than a list for efficiency
2018-07-20 12:50:27 +01:00
Richard van der Hoff
d7275eecf3 Add a sleep to the Limiter to fix stack overflows.
Fixes #3570
2018-07-20 12:37:12 +01:00
Amber Brown
95ccb6e2ec
Don't spew errors because we can't save metrics (#3563) 2018-07-19 20:58:18 +10:00
Richard van der Hoff
8c69b735e3 Make Distributor run its processes as a background process
This is more involved than it might otherwise be, because the current
implementation just drops its logcontexts and runs everything in the sentinel
context.

It turns out that we aren't actually using a bunch of the functionality here
(notably suppress_failures and the fact that Distributor.fire returns a
deferred), so the easiest way to fix this is actually by simplifying a bunch of
code.
2018-07-18 20:55:05 +01:00
Richard van der Hoff
667fba68f3 Run things as background processes
This fixes #3518, and ensures that we get useful logs and metrics for lots of
things that happen in the background.

(There are certainly more things that happen in the background; these are just
the common ones I've found running a single-process synapse locally).
2018-07-18 20:55:05 +01:00
Erik Johnston
b2aa05a8d6 Use efficient .intersection 2018-07-17 11:07:04 +01:00
Erik Johnston
547b1355d3 Fix perf regression in PR #3530
The get_entities_changed function was changed to return all changed
entities since the given stream position, rather than only those changed
from a given list of entities. This resulted in the function incorrectly
returning large numbers of entities that, for example, caused large
increases in database usage.
2018-07-17 10:27:51 +01:00
Amber Brown
3fe0938b76
Merge pull request #3530 from matrix-org/erikj/stream_cache
Don't return unknown entities in get_entities_changed
2018-07-17 13:44:46 +10:00
Richard van der Hoff
33b40d0a25 Make FederationRateLimiter queue requests properly
popitem removes the *most recent* item by default [1]. We want the oldest.

Fixes #3524

[1]: https://docs.python.org/2/library/collections.html#collections.OrderedDict.popitem
2018-07-13 16:19:40 +01:00
Erik Johnston
77b692e65d Don't return unknown entities in get_entities_changed
The stream cache keeps track of all entities that have changed since
a particular stream position, so get_entities_changed does not need to
return unknown entites when given a larger stream position.

This makes it consistent with the behaviour of has_entity_changed.
2018-07-13 15:26:10 +01:00
Richard van der Hoff
fa5c2bc082 Reduce set building in get_entities_changed
This line shows up as about 5% of cpu time on a synchrotron:

    not_known_entities = set(entities) - set(self._entity_to_key)

Presumably the problem here is that _entity_to_key can be largeish, and
building a set for its keys every time this function is called is slow.

Here we rewrite the logic to avoid building so many sets.
2018-07-12 11:37:44 +01:00
Richard van der Hoff
c3c29aa196
Attempt to include db threads in cpu usage stats (#3496)
Let's try to include time spent in the DB threads in the per-request/block cpu
usage metrics.
2018-07-10 16:12:36 +01:00
Richard van der Hoff
55370331da
Refactor logcontext resource usage tracking (#3501)
Factor out the resource usage tracking out to a separate object, which can be
passed around and copied independently of the logcontext itself.
2018-07-10 13:56:07 +01:00
Amber Brown
49af402019 run isort 2018-07-09 16:09:20 +10:00
Amber Brown
6350bf925e
Attempt to be more performant on PyPy (#3462) 2018-06-28 14:49:57 +01:00
Amber Brown
72d2143ea8
Revert "Revert "Try to not use as much CPU in the StreamChangeCache"" (#3454) 2018-06-28 11:04:18 +01:00
Matthew Hodgson
8057489b26
Revert "Try to not use as much CPU in the StreamChangeCache" 2018-06-26 18:09:01 +01:00
Amber Brown
1202508067 fixes 2018-06-26 17:29:01 +01:00
Amber Brown
bd3d329c88 fixes 2018-06-26 17:28:12 +01:00
Amber Brown
abfe4b2957 try and make loading items from the cache faster 2018-06-26 17:25:34 +01:00
Amber Brown
07cad26d65
Remove all global reactor imports & pass it around explicitly (#3424) 2018-06-25 14:08:28 +01:00
Richard van der Hoff
43e02c409d Disable partial state group caching for wildcard lookups
When _get_state_for_groups is given a wildcard filter, just do a complete
lookup. Hopefully this will give us the best of both worlds by not filling up
the ram if we only need one or two keys, but also making the cache still work
for the federation reader usecase.
2018-06-22 11:52:07 +01:00
Richard van der Hoff
70e6501913
Merge pull request #3419 from matrix-org/rav/events_per_request
Log number of events fetched from DB
2018-06-22 11:17:56 +01:00
Richard van der Hoff
0495fe0035 Indirect evt_count updates via method call
so that we can stub it for the sentinel and not have a billion failing UTs
2018-06-22 10:42:28 +01:00
Amber Brown
77ac14b960
Pass around the reactor explicitly (#3385) 2018-06-22 09:37:10 +01:00
Richard van der Hoff
b088aafcae Log number of events fetched from DB
When we finish processing a request, log the number of events we fetched from
the database to handle it.

[I'm trying to figure out which requests are responsible for large amounts of
event cache churn. It may turn out to be more helpful to add counts to the
prometheus per-request/block metrics, but that is an extension to this code
anyway.]
2018-06-21 06:15:03 +01:00
Amber Brown
a61738b316
Remove run_on_reactor (#3395) 2018-06-14 18:27:37 +10:00
Amber Brown
f7869f8f8b
Port to sortedcontainers (with tests!) (#3332) 2018-06-06 00:13:57 +10:00
Erik Johnston
042eedfa2b Add hacky cache factor override system 2018-06-04 15:39:28 +01:00
Amber Brown
c936a52a9e
Consistently use six's iteritems and wrap lazy keys/values in list() if they're not meant to be lazy (#3307) 2018-05-31 19:03:47 +10:00
Amber Brown
debff7ae09
Merge pull request #3281 from NotAFile/py3-six-isinstance
remaining isintance fixes
2018-05-30 12:44:46 +10:00
Adrian Tschira
7873cde526 pep8 2018-05-29 17:35:55 +02:00
Amber Brown
57ad76fa4a fix up tests 2018-05-28 19:51:53 +10:00
Amber Brown
3ef5cd74a6 update to more consistently use seconds in any metrics or logging 2018-05-28 19:39:27 +10:00
Amber Brown
357c74a50f add comment about why unreg 2018-05-28 19:14:41 +10:00
Amber Brown
754826a830 Merge remote-tracking branch 'origin/develop' into 3218-official-prom 2018-05-28 18:57:23 +10:00
Adrian Tschira
4ee4450d66 fix recursion error 2018-05-24 21:44:10 +02:00
Adrian Tschira
dd068ca979 remaining isintance fixes
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-05-24 20:55:08 +02:00
Amber Brown
36501068d8
Merge pull request #3247 from NotAFile/py3-misc
Misc Python3 fixes
2018-05-24 12:58:37 -05:00
Amber Brown
2aff6eab6d
Merge pull request #3245 from NotAFile/batch-iter
Add batch_iter to utils
2018-05-24 12:54:12 -05:00
Amber Brown
53cc2cde1f cleanup 2018-05-22 17:32:57 -05:00
Amber Brown
071206304d cleanup pep8 errors 2018-05-22 16:54:22 -05:00
Amber Brown
85ba83eb51 fixes 2018-05-22 16:28:23 -05:00
Amber Brown
a8990fa2ec Merge remote-tracking branch 'origin/develop' into 3218-official-prom 2018-05-22 10:50:26 -05:00
Erik Johnston
7948ecf234 Comment 2018-05-22 11:39:43 +01:00
Erik Johnston
020377a550 Fix logcontext resource usage tracking 2018-05-22 11:16:07 +01:00
Amber Brown
df9f72d9e5 replacing portions 2018-05-21 19:47:37 -05:00
Adrian Tschira
45b55e23d3 Add batch_iter to utils
There's a frequent idiom I noticed where an iterable is split up into a
number of chunks/batches. Unfortunately that method does not work with
iterators like dict.keys() in python3. This implementation works with
iterators.

Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-05-19 17:48:30 +02:00
Adrian Tschira
73cbdef5f7 fix py3 intern and remove unnecessary py3 encode
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-05-19 17:35:31 +02:00
Richard van der Hoff
093d8c415a Merge remote-tracking branch 'origin/develop' into rav/warn_on_logcontext_fail 2018-05-03 14:59:29 +01:00
Richard van der Hoff
a7fe62f0cb Fix logcontext leaks in rate limiter 2018-05-03 12:31:59 +01:00
Richard van der Hoff
415c6b672e Merge branch 'develop' into rav/more_logcontext_leaks 2018-05-02 16:16:01 +01:00
Richard van der Hoff
f22e7cda2c Fix a class of logcontext leaks
So, it turns out that if you have a first `Deferred` `D1`, you can add a
callback which returns another `Deferred` `D2`, and `D2` must then complete
before any further callbacks on `D1` will execute (and later callbacks on `D1`
get the *result* of `D2` rather than `D2` itself).

So, `D1` might have `called=True` (as in, it has started running its
callbacks), but any new callbacks added to `D1` won't get run until `D2`
completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield`
will 'block'.

In conclusion: some of our assumptions in `logcontext` were invalid. We need to
make sure that we don't optimise out the logcontext juggling when this
situation happens. Fortunately, it is easy to detect by checking `D1.paused`.
2018-05-02 11:58:00 +01:00
Richard van der Hoff
e482f8cd85 Fix incorrect reference to StringIO
This was introduced in 4f2f5171
2018-05-02 09:12:26 +01:00
Richard van der Hoff
fdb6849b81
Merge pull request #3144 from matrix-org/rav/run_in_background_exception_handling
Trap exceptions thrown within run_in_background
2018-04-30 10:23:02 +01:00
Richard van der Hoff
db75c86e84
Merge branch 'develop' into py3-xrange-1 2018-04-30 01:02:25 +01:00
Richard van der Hoff
049b0b5af2
Merge pull request #3154 from NotAFile/py3-stringio
Replace stringIO imports with six
2018-04-30 00:59:04 +01:00
Richard van der Hoff
dbf6f28d64
Merge pull request #3155 from NotAFile/py3-bytes-1
more bytes strings
2018-04-30 00:38:21 +01:00
Richard van der Hoff
aab2e4da60
Merge pull request #3140 from matrix-org/rav/use_run_in_background
Use run_in_background in preference to preserve_fn
2018-04-30 00:34:28 +01:00
Adrian Tschira
e9143b6593 more bytes strings
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-29 00:13:57 +02:00
Adrian Tschira
d82b6ea9e6 Move more xrange to six
plus a bonus next()

Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-28 13:57:00 +02:00
Adrian Tschira
4f2f5171b7 replace stringIO imports 2018-04-28 13:46:23 +02:00
Richard van der Hoff
fc149b4eeb Merge remote-tracking branch 'origin/develop' into rav/use_run_in_background 2018-04-27 14:31:23 +01:00
Richard van der Hoff
6146332387 Merge remote-tracking branch 'origin/develop' into rav/deferred_timeout 2018-04-27 14:18:00 +01:00
Richard van der Hoff
2a13af23bc Use run_in_background in preference to preserve_fn
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
2018-04-27 12:55:51 +01:00
Richard van der Hoff
9d2c1b8429 Backport deferred.addTimeout
Twisted 16.0 doesn't have addTimeout, so let's backport it.
2018-04-27 12:52:30 +01:00
Richard van der Hoff
13843f771e Trap exceptions thrown within run_in_background
Turn any exceptions that get thrown synchronously within run_in_background into
Failures instead.
2018-04-27 12:17:13 +01:00
Richard van der Hoff
9255a6cb17 Improve exception handling for background processes
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.

This is unsatisfactory for a number of reasons:
 - logging on garbage collection is best-effort and may happen some time after
   the error, if at all
 - it can be hard to figure out where the error actually happened.
 - it is logged as a scary CRITICAL error which (a) I always forget to grep for
   and (b) it's not really CRITICAL if a background process we don't care about
   fails.

So this is an attempt to add exception handling to everything we fire off into
the background.
2018-04-27 11:07:40 +01:00
Richard van der Hoff
1ea904b9f0 Use deferred.addTimeout instead of time_bound_deferred
This doesn't feel like a wheel we need to reinvent.
2018-04-23 00:53:18 +01:00
Richard van der Hoff
8dc4a6144b
Merge pull request #3107 from NotAFile/py3-bool-nonzero
add __bool__ alias to __nonzero__ methods
2018-04-20 15:43:39 +01:00
Richard van der Hoff
c09a6daf09
Merge pull request #3110 from NotAFile/py3-six-queue
Replace Queue with six.moves.queue
2018-04-20 15:35:00 +01:00
Richard van der Hoff
11a67b7c9d
Merge pull request #3093 from matrix-org/rav/response_cache_wrap
Refactor ResponseCache usage
2018-04-20 11:31:17 +01:00
Adrian Tschira
878995e660 Replace Queue with six.moves.queue
and a six.range change which I missed the last time

Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-16 00:46:21 +02:00
Adrian Tschira
f63ff73c7f add __bool__ alias to __nonzero__ methods
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-15 20:40:47 +02:00
Richard van der Hoff
d3347ad485 Revert "Use sortedcontainers instead of blist"
This reverts commit 9fbe70a7dc.

It turns out that sortedcontainers.SortedDict is not an exact match for
blist.sorteddict; in particular, `popitem()` removes things from the opposite
end of the dict.

This is trivial to fix, but I want to add some unit tests, and potentially some
more thought about it, before we do so.
2018-04-13 11:16:43 +01:00
Richard van der Hoff
60f6014bb7 ResponseCache: fix handling of completed results
Turns out that ObservableDeferred.observe doesn't return a deferred if the
result is already completed. Fix handling and improve documentation.
2018-04-13 07:32:29 +01:00
Richard van der Hoff
b78395b7fe Refactor ResponseCache usage
Adds a `.wrap` method to ResponseCache which wraps up the boilerplate of a
(get, set) pair, and then use it throughout the codebase.

This will be largely non-functional, but does include the following functional
changes:

* federation_server.on_context_state_request: drops use of _server_linearizer
  which looked redundant and could cause incorrect cache misses by yielding
  between the get and the set.
* RoomListHandler.get_remote_public_room_list(): fixes logcontext leaks
* the wrap function includes some logging. I'm hoping this won't be too noisy
  on production.
2018-04-12 13:02:15 +01:00
Richard van der Hoff
d5c74b9f6c
Merge pull request #3092 from matrix-org/rav/response_cache_metrics
Add metrics for ResponseCache
2018-04-12 12:59:36 +01:00
Richard van der Hoff
261124396e
Merge pull request #3059 from matrix-org/rav/doc_response_cache
Document the behaviour of ResponseCache
2018-04-12 11:22:30 +01:00
Richard van der Hoff
b3384232a0 Add metrics for ResponseCache 2018-04-10 23:14:47 +01:00
Vincent Breitmoser
9fbe70a7dc Use sortedcontainers instead of blist
This commit drop-in replaces blist with SortedContainers. They are
written in pure python so work with pypy, but perform as good as
native implementations, at least in a couple benchmarks:

http://www.grantjenks.com/docs/sortedcontainers/performance.html
2018-04-10 11:29:51 +02:00
Richard van der Hoff
13decdbf96 Revert "Merge pull request #3066 from matrix-org/rav/remove_redundant_metrics"
We aren't ready to release this yet, so I'm reverting it for now.

This reverts commit d1679a4ed7, reversing
changes made to e089100c62.
2018-04-09 12:59:12 +01:00
Richard van der Hoff
3449da3bc7
Merge pull request #3068 from matrix-org/rav/fix_cache_invalidation
Improve database cache performance
2018-04-05 17:21:44 +01:00
Richard van der Hoff
01afc563c3 Fix overzealous cache invalidation
Fixes an issue where a cache invalidation would invalidate *all* pending
entries, rather than just the entry that we intended to invalidate.
2018-04-05 16:24:04 +01:00
Richard van der Hoff
518f6de088 Remove redundant metrics which were deprecated in 0.27.0. 2018-04-04 19:46:28 +01:00
Richard van der Hoff
a9a74101a4 Document the behaviour of ResponseCache
it looks like everything that uses ResponseCache expects to have to
`make_deferred_yieldable` its results. It's debatable whether that is the best
approach, but let's document it for now to avoid further confusion.
2018-04-04 09:06:22 +01:00
Richard van der Hoff
05630758f2 Use static JSONEncoders
using json.dumps with custom options requires us to create a new JSONEncoder on
each call. It's more efficient to create one upfront and reuse it.
2018-03-29 23:13:33 +01:00
Matthew Hodgson
8cbbfaefc1 404 correctly on missing paths via NoResource
fixes https://github.com/matrix-org/synapse/issues/2043 and https://github.com/matrix-org/synapse/issues/2029
2018-03-23 10:32:50 +00:00
Erik Johnston
9a0d783c11 Add comments 2018-03-19 11:35:53 +00:00
Richard van der Hoff
5a6e54264d Make 'unexpected logging context' into warnings
I think we've now fixed enough of these that the rest can be logged at
warning.
2018-03-15 18:40:38 +00:00
Erik Johnston
7c7706f42b Fix bug where state cache used lots of memory
The state cache bases its size on the sum of the size of entries. The
size of the entry is calculated once on insertion, so it is important
that the size of entries does not change.

The DictionaryCache modified the entries size, which caused the state
cache to incorrectly think it was smaller than it actually was.
2018-03-15 15:46:54 +00:00
Richard van der Hoff
20f40348d4 Factor run_in_background out from preserve_fn
It annoys me that we create temporary function objects when there's really no
need for it. Let's factor the gubbins out of preserve_fn and start using it.
2018-03-08 11:50:11 +00:00
Richard van der Hoff
3a75de923b Rewrite make_deferred_yieldable avoiding inlineCallbacks
... because (a) it's actually simpler (b) it might be marginally more
performant?
2018-03-01 12:40:05 +00:00
Richard van der Hoff
bc496df192 report metrics on number of cache evictions 2018-02-05 15:34:01 +00:00
Matthew Hodgson
ab9f844aaf
Add federation_domain_whitelist option (#2820)
Add federation_domain_whitelist

gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
2018-01-22 19:11:18 +01:00
Matthew Hodgson
d84f65255e
Merge pull request #2813 from matrix-org/matthew/registrations_require_3pid
add registrations_require_3pid and allow_local_3pids
2018-01-22 13:57:22 +00:00
Matthew Hodgson
8fe253f19b fix PR nitpicking 2018-01-19 18:23:45 +00:00
Matthew Hodgson
447f4f0d5f rewrite based on PR feedback:
* [ ] split config options into allowed_local_3pids and registrations_require_3pid
 * [ ] simplify and comment logic for picking registration flows
 * [ ] fix docstring and move check_3pid_allowed into a new util module
 * [ ] use check_3pid_allowed everywhere

@erikjohnston PTAL
2018-01-19 15:33:55 +00:00
Erik Johnston
b6dc7044a9
Merge pull request #2804 from matrix-org/erikj/file_consumer
Add decent impl of a FileConsumer
2018-01-18 16:31:33 +00:00
Richard van der Hoff
d57765fc8a Fix bugs in block metrics
... which I introduced in #2785
2018-01-18 12:24:42 +00:00
Erik Johnston
be0dfcd4a2 Do logcontexts correctly 2018-01-18 11:57:57 +00:00
Erik Johnston
1432f7ccd5 Move test stuff to tests 2018-01-18 11:57:57 +00:00
Erik Johnston
2f18a2647b Make all fields private 2018-01-18 11:57:54 +00:00
Erik Johnston
dc519602ac Ensure we registerProducer isn't called twice 2018-01-18 11:07:17 +00:00
Erik Johnston
17b54389fe Fix _notify_empty typo 2018-01-18 11:05:34 +00:00
Erik Johnston
28b338ed9b Move definition of paused_producer to __init__ 2018-01-18 11:04:41 +00:00
Erik Johnston
a177325b49 Fix comments 2018-01-18 11:02:43 +00:00
Erik Johnston
bc67e7d260 Add decent impl of a FileConsumer
Twisted core doesn't have a general purpose one, so we need to write one
ourselves.

Features:
- All writing happens in background thread
- Supports both push and pull producers
- Push producers get paused if the consumer falls behind
2018-01-17 16:43:03 +00:00
Richard van der Hoff
3d12d97415 Track DB scheduling delay per-request
For each request, track the amount of time spent waiting for a db
connection. This entails adding it to the LoggingContext and we may as well add
metrics for it while we are passing.
2018-01-16 17:23:32 +00:00
Richard van der Hoff
6324b65f08 Track db txn time in millisecs
... to reduce the amount of floating-point foo we do.
2018-01-16 15:53:18 +00:00
Richard van der Hoff
44a498418c Optimise LoggingContext creation and copying
It turns out that the only thing we use the __dict__ of LoggingContext for is
`request`, and given we create lots of LoggingContexts and then copy them every
time we do a db transaction or log line, using the __dict__ seems a bit
redundant. Let's try to optimise things by making the request attribute
explicit.
2018-01-16 15:49:42 +00:00
Richard van der Hoff
39f4e29d01 Reorganise request and block metrics
In order to circumvent the number of duplicate foo:count metrics increasing
without bounds, it's time for a rearrangement.

The following are all deprecated, and replaced with synapse_util_metrics_block_count:
  synapse_util_metrics_block_timer:count
  synapse_util_metrics_block_ru_utime:count
  synapse_util_metrics_block_ru_stime:count
  synapse_util_metrics_block_db_txn_count:count
  synapse_util_metrics_block_db_txn_duration:count

The following are all deprecated, and replaced with synapse_http_server_response_count:
   synapse_http_server_requests
   synapse_http_server_response_time:count
   synapse_http_server_response_ru_utime:count
   synapse_http_server_response_ru_stime:count
   synapse_http_server_response_db_txn_count:count
   synapse_http_server_response_db_txn_duration:count

The following are renamed (the old metrics are kept for now, but deprecated):

  synapse_util_metrics_block_timer:total ->
     synapse_util_metrics_block_time_seconds

  synapse_util_metrics_block_ru_utime:total ->
     synapse_util_metrics_block_ru_utime_seconds

  synapse_util_metrics_block_ru_stime:total ->
     synapse_util_metrics_block_ru_stime_seconds

  synapse_util_metrics_block_db_txn_count:total ->
     synapse_util_metrics_block_db_txn_count

  synapse_util_metrics_block_db_txn_duration:total ->
     synapse_util_metrics_block_db_txn_duration_seconds

  synapse_http_server_response_time:total ->
     synapse_http_server_response_time_seconds

  synapse_http_server_response_ru_utime:total ->
     synapse_http_server_response_ru_utime_seconds

  synapse_http_server_response_ru_stime:total ->
     synapse_http_server_response_ru_stime_seconds

   synapse_http_server_response_db_txn_count:total ->
      synapse_http_server_response_db_txn_count

   synapse_http_server_response_db_txn_duration:total
      synapse_http_server_response_db_txn_duration_seconds
2018-01-15 17:09:44 +00:00
Richard van der Hoff
b2cd6accf5 Remove __PreservingContextDeferred too 2017-11-14 23:00:10 +00:00
Richard van der Hoff
7e6fa29cb5 Remove preserve_context_over_{fn, deferred}
Both of these functions ae known to leak logcontexts. Replace the remaining
calls to them and kill them off.
2017-11-14 11:22:42 +00:00
Richard van der Hoff
bf993db11c Logging and logcontext fixes for Limiter
Add some logging to the Limiter in a similar spirit to the Linearizer, to help
debug issues.

Also fix a logcontext leak.

Also refactor slightly to avoid throwing exceptions.
2017-11-07 00:48:57 +00:00
Richard van der Hoff
0be99858f3 fix vars named l
E741 says "do not use variables named ‘l’, ‘O’, or ‘I’".
2017-10-23 15:56:38 +01:00
Richard van der Hoff
eaaabc6c4f replace 'except:' with 'except Exception:'
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Richard van der Hoff
2e9f5ea31a Fix logcontext handling for persist_events
* don't use preserve_context_over_deferred, which is known broken.

* remove a redundant preserve_fn.

* add/improve some comments
2017-10-17 10:59:30 +01:00
Richard van der Hoff
cc794d60e7 Merge pull request #2532 from matrix-org/rav/fix_linearizer
Fix stackoverflow and logcontexts from linearizer
2017-10-11 17:29:32 +01:00
Richard van der Hoff
f30c4ed2bc logformatter: fix AttributeError
make sure we have the relevant fields before we try to log them.
2017-10-11 17:26:17 +01:00
Richard van der Hoff
4fad8efbfb Fix stackoverflow and logcontexts from linearizer
1. make it not blow out the stack when there are more than 50 things waiting
   for a lock. Fixes https://github.com/matrix-org/synapse/issues/2505.

2. Make it not mess up the log contexts.
2017-10-11 15:05:05 +01:00
Richard van der Hoff
3cc852d339 Fancy logformatter to format exceptions better
This is a bit of an experimental change at this point; the idea is to see if it
helps us track down where our stack overflows are coming from by logging the
stack when the exception was caught and turned into a Failure. (We'll also need
edf2704420).

If we deploy this, we'll be able to enable it via the log config yaml.
2017-10-09 17:44:42 +01:00
Richard van der Hoff
148428ce76 Fix logcontext handling for concurrently_execute
Avoid preserve_context_over_deferred, which is broken.
2017-10-06 22:24:28 +01:00
David Baker
8ad5f34908 pep8 2017-09-26 19:21:41 +01:00
David Baker
9fd086e506 unnecessary parens 2017-09-26 17:59:46 +01:00
David Baker
0b03a97708 Add module_loader.py 2017-09-26 17:56:41 +01:00
Erik Johnston
495f075b41 Increase default cache factor size. 2017-07-04 09:58:32 +01:00
Erik Johnston
b5e8d529e6 Define CACHE_SIZE_FACTOR once 2017-07-04 09:56:44 +01:00
Erik Johnston
c72058bcc6 Use an ExpiringCache for storing registration sessions
This is because pruning them was a significant performance drain on
matrix.org
2017-06-29 14:08:37 +01:00
Erik Johnston
efc2b7db95 Rewrite conditional 2017-06-09 13:35:15 +01:00
Erik Johnston
eed59dcc1e Fix has_any_entity_changed
Occaisonally has_any_entity_changed would throw the error: "Set changed
size during iteration" when taking the max of the `sorteddict`. While
its uncertain how that happens, its quite inefficient to iterate over
the entire dict anyway so we change to using the more traditional
`bisect_*` functions.
2017-06-09 11:44:01 +01:00
Erik Johnston
304880d185 Add stream change cache 2017-05-31 15:46:36 +01:00
Erik Johnston
bd7bb5df71 Pull out if statement from for loop 2017-05-22 15:12:19 +01:00
Erik Johnston
e3417a06e2 Update list cache to handle one arg case
We update the normal cache descriptors to handle caches with a single
argument specially so that the key wasn't a 1-tuple. We need to update
the cache list to be aware of this.
2017-05-22 15:04:42 +01:00
Erik Johnston
bbfe4e996c Make get_state_groups_from_groups faster.
Most of the time was spent copying a dict to filter out sentinel values
that indicated that keys did not exist in the dict. The sentinel values
were added to ensure that we cached the non-existence of keys.

By updating DictionaryCache to keep track of which keys were known to
not exist itself we can remove a dictionary copy.
2017-05-17 15:12:15 +01:00
Erik Johnston
ffad4fe35b Don't update event cache hit ratio from get_joined_users
Otherwise the hit ration of plain get_events gets completely skewed by
calls to get_joined_users* functions.
2017-05-08 16:06:17 +01:00
Erik Johnston
d2d8ed4884 Optimise caches with single key 2017-05-04 14:18:46 +01:00
Richard van der Hoff
2e996271fe Instantiate DeferredTimedOutError correctly
Call `super` correctly, so that we correctly initialise the `errcode` field.

Fixes https://github.com/matrix-org/synapse/issues/2179.
2017-05-02 13:26:17 +01:00
Erik Johnston
d9aa645f86 Reduce size of joined_user cache
The _get_joined_users_from_context cache stores a mapping from user_id
to avatar_url and display_name. Instead of storing those in a dict,
store them in a namedtuple as that uses much less memory.

We also try converting the string to ascii to further reduce the size.
2017-04-25 14:38:51 +01:00
Erik Johnston
efab1dadde Remove DEBUG_CACHES 2017-04-25 10:54:09 +01:00
Erik Johnston
119cb9bbcf Reduce cache size by not storing deferreds
Currently the cache descriptors store deferreds rather than raw values,
this is a simple way of triggering only one database hit and sharing the
result if two callers attempt to get the same value.

However, there are a few caches that simply store a mapping from string
to string (or int). These caches can have a large number of entries,
under the assumption that each entry is small. However, the size of a
deferred (specifically the size of ObservableDeferred) is signigicantly
larger than that of the raw value, 2kb vs 32b.

This PR therefore changes the cache descriptors to store the raw values
rather than the deferreds.

As a side effect cached storage function now either return a deferred or
the actual value, as the cached list decriptor already does. This is
fine as we always end up just yield'ing on the returned value
eventually, which handles that case correctly.
2017-04-25 10:23:11 +01:00
Erik Johnston
d134d0935e Only intern ascii strings 2017-04-24 14:07:48 +01:00
Richard van der Hoff
e2eebf1696 Fix fixme in preserve_fn
`preserve_fn` is no longer used as a decorator anywhere, so we can safely fix a
fixme therein.
2017-04-03 15:38:02 +01:00
Erik Johnston
4d17add8de Remove unused instance variable 2017-03-31 09:38:27 +01:00
Erik Johnston
5b5b171f3e Docs 2017-03-30 17:05:53 +01:00
Erik Johnston
b282fe7170 Revert log context change 2017-03-30 17:03:59 +01:00
Erik Johnston
6194a64ae9 Doc new instance variables 2017-03-30 14:19:10 +01:00
Erik Johnston
014fee93b3 Manually calculate cache key as getcallargs is expensive
This is because getcallargs recomputes the getargspec, amongst other
things, which we don't need to do as its already been done
2017-03-30 14:14:46 +01:00
Erik Johnston
86780a8bc3 Don't convert to deferreds when not necessary 2017-03-30 14:14:36 +01:00
Richard van der Hoff
f9b4bb05e0 Fix the logcontext handling in the cache wrappers (#2077)
The cache wrappers had a habit of leaking the logcontext into the reactor while
the lookup function was running, and then not restoring it correctly when the
lookup function had completed. It's all the fault of
`preserve_context_over_{fn,deferred}` which are basically a bit broken.
2017-03-30 13:22:24 +01:00
Richard van der Hoff
9397edb28b Merge pull request #2050 from matrix-org/rav/federation_backoff
push federation retry limiter down to matrixfederationclient
2017-03-23 22:27:01 +00:00
Richard van der Hoff
06ce7335e9 Merge pull request #2052 from matrix-org/rav/time_bound_deferred
Fix time_bound_deferred to throw the right exception
2017-03-23 22:22:54 +00:00
Richard van der Hoff
5a16cb4bf0 Ignore backoff history for invites, aliases, and roomdirs
Add a param to the federation client which lets us ignore historical backoff
data for federation queries, and set it for a handful of operations.
2017-03-23 12:23:22 +00:00
Richard van der Hoff
b88a323ffb Fix time_bound_deferred to throw the right exception
Due to a failure to instantiate DeferredTimedOutError, time_bound_deferred
would throw a CancelledError when the deferred timed out, which was rather
confusing.
2017-03-23 12:07:11 +00:00
Richard van der Hoff
4bd597d9fc push federation retry limiter down to matrixfederationclient
rather than having to instrument everywhere we make a federation call,
make the MatrixFederationHttpClient manage the retry limiter.
2017-03-23 09:28:46 +00:00
Richard van der Hoff
19b9366d73 Fix a couple of logcontext leaks
Use preserve_fn to correctly manage the logcontexts around things we don't want
to yield on.
2017-03-23 00:17:46 +00:00
Richard van der Hoff
95f21c7a66 Fix caching of remote servers' signature keys
The `@cached` decorator on `KeyStore._get_server_verify_key` was missing
its `num_args` parameter, which meant that it was returning the wrong key for
any server which had more than one recorded key.

By way of a fix, change the default for `num_args` to be *all* arguments. To
implement that, factor out a common base class for `CacheDescriptor` and `CacheListDescriptor`.
2017-03-22 15:11:30 +00:00
Richard van der Hoff
bd08ee7a46 Merge pull request #2026 from matrix-org/rav/logcontext_docs
Logcontext docs
2017-03-20 12:05:21 +00:00
Richard van der Hoff
f40c2db05a Stop preserve_fn leaking context into the reactor
Fix a bug in ``logcontext.preserve_fn`` which made it leak context into the
reactor, and add a test for it.

Also, get rid of ``logcontext.reset_context_after_deferred``, which tried to do
the same thing but had its own, different, set of bugs.
2017-03-18 00:07:43 +00:00
Richard van der Hoff
d2d146a314 Logcontext docs 2017-03-17 23:59:28 +00:00
Richard van der Hoff
2abe85d50e Merge pull request #2016 from matrix-org/rav/queue_pdus_during_join
Queue up federation PDUs while a room join is in progress
2017-03-17 11:32:44 +00:00
Richard van der Hoff
29ed09e80a Fix assertion to stop transaction queue getting wedged
... and update some docstrings to correctly reflect the types being used.

get_new_device_msgs_for_remote can return a long under some circumstances,
which was being stored in last_device_list_stream_id_by_dest, and was then
upsetting things on the next loop.
2017-03-15 12:16:55 +00:00
Richard van der Hoff
b5d1c68beb Implement reset_context_after_deferred
to correctly reset the context when we fire off a deferred we aren't going to
wait for.
2017-03-15 02:21:07 +00:00
David Baker
73a5f06652 Support registration / login with phone number
Changes from https://github.com/matrix-org/synapse/pull/1971
2017-03-13 17:27:51 +00:00
Erik Johnston
7eae6eaa2f Revert "Support registration & login with phone number" 2017-03-13 09:59:33 +00:00
Erik Johnston
3545e17f43 Add setdefault key to ExpiringCache 2017-03-10 10:30:49 +00:00
David Baker
9d0d40fc15 Docs 2017-03-08 19:05:29 +00:00
David Baker
3edc57296d Incorrectly copied copyright
This file post-dates OM
2017-03-08 19:00:51 +00:00
David Baker
1c99934b28 pep8 2017-03-08 11:58:20 +00:00
David Baker
a9e2b9ec16 Add msisdn util file 2017-03-08 11:53:36 +00:00
Erik Johnston
6b61060b51 Comment 2017-02-02 14:47:15 +00:00
Erik Johnston
9efcc3f3be Comment 2017-02-02 13:50:22 +00:00
Erik Johnston
df4ecff5a9 Correctly raise exceptions for ratelimitng. Ratelimit on 401 2017-02-01 15:42:19 +00:00
Erik Johnston
fe08db2713 Remove explicit < 400 check as apparently this is confusing 2017-01-31 15:21:32 +00:00
Erik Johnston
4c0ec15bdc Comment 2017-01-31 13:53:46 +00:00
Erik Johnston
85c590105f Comment 2017-01-31 13:46:38 +00:00
Erik Johnston
ae7a132f38 Better handle 404 response for federation /send/ 2017-01-31 13:40:09 +00:00
Erik Johnston
c430111d0e Update LruCache size estimate on clear 2017-01-18 14:55:23 +00:00
Erik Johnston
380dba1020 Measure metrics of string_cache 2017-01-17 17:04:46 +00:00
Erik Johnston
37b4c7d8a9 Fix typo in return type 2017-01-17 14:43:32 +00:00
Erik Johnston
d6c75cb7c2 Rename and comment tree_to_leaves_iterator 2017-01-17 11:47:03 +00:00
Erik Johnston
1ccd5676e3 Remove needless call to evict() 2017-01-17 11:42:26 +00:00
Erik Johnston
f85b6ca494 Speed up cache size calculation
Instead of calculating the size of the cache repeatedly, which can take
a long time now that it can use a callback, instead cache the size and
update that on insertion and deletion.

This requires changing the cache descriptors to have two caches, one for
pending deferreds and the other for the actual values. There's no reason
to evict from the pending deferreds as they won't take up any more
memory.
2017-01-17 11:18:13 +00:00
Erik Johnston
6d00213e80 Use OrderedDict in ExpiringCache 2017-01-16 15:33:22 +00:00
Erik Johnston
46aebbbcbf Add support for 'iterable' to ExpiringCache 2017-01-16 14:57:23 +00:00
Erik Johnston
2fae34bd2c Optionally measure size of cache by sum of length of values 2017-01-13 17:46:17 +00:00
Erik Johnston
bf5c9706d9 Remove full_twisted_stacktraces option
The debug 'full_twisted_stacktraces' flag caused synapse to rewrite
twisted deferreds to always fire the callback on the next reactor tick.
This was to force the deferred to always store the stacktraces on
exceptions, and thus be more likely to have a full stacktrace when it
reaches the final error handlers and gets printed to the logs.

Dynamically rewriting things is generally bad, and in particular this
change violates assumptions of various bits of Twisted. This wouldn't
necessarily be so bad, but it turns out this option has been turned on
on some production servers.

Turning the option can cause e.g. #1778.

For now, lets just entirely nuke this option.
2017-01-12 10:32:52 +00:00
Erik Johnston
f477370c0c Add paranoia exception catch in Linearizer 2017-01-10 14:04:13 +00:00
Mark Haines
dd3df11c55 More logging for the linearizer and for get_events 2017-01-05 12:32:47 +00:00
Mark Haines
62ce3034f3 s/aquire/acquire/g 2016-12-30 20:04:44 +00:00
Mark Haines
0aff09f6c9 Add more useful logging when we block fetching events 2016-12-30 20:00:44 +00:00
Erik Johnston
d53a80af25 Merge pull request #1620 from matrix-org/erikj/concurrent_room_access
Limit the number of events that can be created on a given room concurrently
2016-12-12 10:30:23 +00:00
Erik Johnston
fbaf868f62 Correctly handle timeout errors 2016-12-09 16:30:29 +00:00
Erik Johnston
11bfe438a2 Use correct var 2016-11-24 15:26:53 +00:00
Erik Johnston
aaecffba3a Correctly handle 500's and 429 on federation 2016-11-24 15:04:49 +00:00
Erik Johnston
90565d015e Invalidate retry cache in both directions 2016-11-22 17:45:44 +00:00
Erik Johnston
f8ee66250a Handle sending events and device messages over federation 2016-11-17 15:48:04 +00:00
Erik Johnston
d56c39cf24 Use external ldap auth pacakge 2016-11-15 13:03:19 +00:00
Kegan Dougal
3991b4cbdb Clean transactions based on time. Add HttpTransactionCache tests. 2016-11-14 11:19:24 +00:00
Erik Johnston
64038b806c Comments 2016-11-11 10:42:08 +00:00
Erik Johnston
d073cb7ead Add Limiter: limit concurrent access to resource 2016-11-10 16:29:51 +00:00
Erik Johnston
27d3f2e7ab Explicitly set authentication mode in ldap3
This only makes a difference for versions of ldap3 before 1.0, but a)
its best to be explicit and b) there are distributions that package
ancient versions for ldap3 (e.g. debian).
2016-11-08 14:35:25 +00:00
Erik Johnston
850b103b36 Implement pluggable password auth
Allows delegating the password auth to an external module. This also
moves the LDAP auth to using this system, allowing it to be removed from
the synapse tree entirely in the future.
2016-10-03 10:36:40 +01:00
Erik Johnston
955f34d23e Change get_pos_of_last_change to return upper bound 2016-09-15 15:12:07 +01:00
Erik Johnston
cb3edec6af Use stream_change cache to make get_forward_extremeties_for_room cache more effective 2016-09-15 14:28:13 +01:00
Erik Johnston
7356d52e73 Fix up push to use get_current_state_ids 2016-08-25 18:35:49 +01:00
Erik Johnston
9219139351 Preserve some logcontexts 2016-08-24 11:58:40 +01:00
Erik Johnston
e65bc7d315 Merge pull request #1031 from matrix-org/erikj/measure_notifier
Add more Measure blocks
2016-08-22 12:13:07 +01:00
Erik Johnston
8731197e54 Only abort Measure on Exceptions 2016-08-19 18:23:45 +01:00
Erik Johnston
45fd2c8942 Ensure invalidation list does not grow unboundedly 2016-08-19 16:09:16 +01:00
Erik Johnston
c0d7d9d642 Rename to on_invalidate 2016-08-19 15:13:58 +01:00
Erik Johnston
dc76a3e909 Make cache_context an explicit option 2016-08-19 15:02:38 +01:00
Erik Johnston
ba214a5e32 Remove lru option 2016-08-19 14:17:11 +01:00
Erik Johnston
4161ff2fc4 Add concept of cache contexts 2016-08-19 14:17:07 +01:00
Erik Johnston
ca8abfbf30 Clean up TransactionQueue 2016-08-10 16:24:16 +01:00
Erik Johnston
3bc9629be5 Measure federation send transaction resources 2016-08-10 10:56:38 +01:00
Erik Johnston
24f36469bc Add federation /version API 2016-08-05 16:36:07 +01:00
Erik Johnston
7b0f6293f2 Merge pull request #940 from matrix-org/erikj/fed_state_cache
Cache federation state responses
2016-08-02 15:21:37 +01:00
Mark Haines
bf81e38d36 Fix retry utils to check if the exception is a subclass of CME 2016-07-28 10:47:02 +01:00
David Baker
389c890f14 Don't include name of room for invites in push
Avoids insane pushes like, "Bob invited you to invite from Bob"
2016-07-28 10:20:47 +01:00
Matthew Hodgson
242c52d607 typo 2016-07-26 10:09:25 +02:00
Erik Johnston
248e6770ca Cache federation state responses 2016-07-21 10:30:12 +01:00
Erik Johnston
7335f0adda Add ReadWriteLock 2016-07-05 15:23:17 +01:00
David Baker
870c45913e Use similar naming we use in email notifs for push
Fixes https://github.com/vector-im/vector-web/issues/1654
2016-06-24 11:41:11 +01:00
Mark Haines
9f1800fba8 Remove registered_users from the distributor.
The only place that was observed was to set the profile. I've made it
so that the profile is set within store.register in the same transaction
that creates the user.

This required some slight changes to the registration code for upgrading
guest users, since it previously relied on the distributor swallowing errors
if the profile already existed.
2016-06-17 19:14:16 +01:00
Erik Johnston
3b096c5f5c Merge branch 'erikj/cache_perf' of github.com:matrix-org/synapse into develop 2016-06-03 12:00:33 +01:00
Erik Johnston
58a224a651 Pull out update_results_dict 2016-06-03 11:47:07 +01:00