There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.
This is unsatisfactory for a number of reasons:
- logging on garbage collection is best-effort and may happen some time after
the error, if at all
- it can be hard to figure out where the error actually happened.
- it is logged as a scary CRITICAL error which (a) I always forget to grep for
and (b) it's not really CRITICAL if a background process we don't care about
fails.
So this is an attempt to add exception handling to everything we fire off into
the background.
Changes in synapse v0.28.0-rc1 (2018-04-26)
===========================================
Bug Fixes:
* Fix quarantine media admin API and search reindex (PR #3130)
* Fix media admin APIs (PR #3134)
Changes in synapse v0.28.0-rc1 (2018-04-24)
===========================================
Minor performance improvement to federation sending and bug fixes.
(Note: This release does not include state resolutions discussed in matrix live)
Features:
* Add metrics for event processing lag (PR #3090)
* Add metrics for ResponseCache (PR #3092)
Changes:
* Synapse on PyPy (PR #2760) Thanks to @Valodim!
* move handling of auto_join_rooms to RegisterHandler (PR #2996) Thanks to @krombel!
* Improve handling of SRV records for federation connections (PR #3016) Thanks to @silkeh!
* Document the behaviour of ResponseCache (PR #3059)
* Preparation for py3 (PR #3061, #3073, #3074, #3075, #3103, #3104, #3106, #3107, #3109, #3110) Thanks to @NotAFile!
* update prometheus dashboard to use new metric names (PR #3069) Thanks to @krombel!
* use python3-compatible prints (PR #3074) Thanks to @NotAFile!
* Send federation events concurrently (PR #3078)
* Limit concurrent event sends for a room (PR #3079)
* Improve R30 stat definition (PR #3086)
* Send events to ASes concurrently (PR #3088)
* Refactor ResponseCache usage (PR #3093)
* Clarify that SRV may not point to a CNAME (PR #3100) Thanks to @silkeh!
* Use str(e) instead of e.message (PR #3103) Thanks to @NotAFile!
* Use six.itervalues in some places (PR #3106) Thanks to @NotAFile!
* Refactor store.have_events (PR #3117)
Bug Fixes:
* Return 401 for invalid access_token on logout (PR #2938) Thanks to @dklug!
* Return a 404 rather than a 500 on rejoining empty rooms (PR #3080)
* fix federation_domain_whitelist (PR #3099)
* Avoid creating events with huge numbers of prev_events (PR #3113)
* Reject events which have lots of prev_events (PR #3118)
It turns out that most of the time we were calling have_events, we were only
using half of the result. Replace have_events with have_seen_events and
get_rejection_reasons, so that we can see what's going on a bit more clearly.
In most cases, we limit the number of prev_events for a given event to 10
events. This fixes a particular code path which created events with huge
numbers of prev_events.
This is a mixed commit that fixes various small issues
* print parentheses
* 01 is invalid syntax (it was octal in py2)
* [x for i in 1, 2] is invalid syntax
* six moves
Signed-off-by: Adrian Tschira <nota@notafile.com>
These worked accidentally before (python2 doesn't complain if you
compare incompatible types) but under py3 this blows up spectacularly
Signed-off-by: Adrian Tschira <nota@notafile.com>
Doing this I learned e.message was pretty shortlived, added in 2.6,
they realized it was a bad idea and deprecated it in 2.7
Signed-off-by: Adrian Tschira <nota@notafile.com>
They raised KeyError before. I'm changing this because the code uses
hasattr() to check for the presence of a key. This worked accidentally
before, because hasattr() silences all exceptions in python 2. However,
in python3, this isn't the case anymore.
I had a look around to see if anything depended on this raising a
KeyError and I couldn't find anything. Of course, I could have simply
missed it.
Signed-off-by: Adrian Tschira <nota@notafile.com>