This is more involved than it might otherwise be, because the current
implementation just drops its logcontexts and runs everything in the sentinel
context.
It turns out that we aren't actually using a bunch of the functionality here
(notably suppress_failures and the fact that Distributor.fire returns a
deferred), so the easiest way to fix this is actually by simplifying a bunch of
code.
This fixes#3518, and ensures that we get useful logs and metrics for lots of
things that happen in the background.
(There are certainly more things that happen in the background; these are just
the common ones I've found running a single-process synapse locally).
This introduces a mechanism for tracking resource usage by background
processes, along with an example of how it will be used.
This will help address #3518, but more importantly will give us better insights
into things which are happening but not being shown up by the request metrics.
We *could* do this with Measure blocks, but:
- I think having them pulled out as a completely separate metric class will
make it easier to distinguish top-level processes from those which are
nested.
- I want to be able to report on in-flight background processes, and I don't
think we want to do this for *all* Measure blocks.
The get_entities_changed function was changed to return all changed
entities since the given stream position, rather than only those changed
from a given list of entities. This resulted in the function incorrectly
returning large numbers of entities that, for example, caused large
increases in database usage.
parse_integer and parse_string can take a request and raise errors
in case we have wrong or missing params.
This PR tries to use them more to deduplicate some code and make it
better readable
The stream cache keeps track of all entities that have changed since
a particular stream position, so get_entities_changed does not need to
return unknown entites when given a larger stream position.
This makes it consistent with the behaviour of has_entity_changed.
This line shows up as about 5% of cpu time on a synchrotron:
not_known_entities = set(entities) - set(self._entity_to_key)
Presumably the problem here is that _entity_to_key can be largeish, and
building a set for its keys every time this function is called is slow.
Here we rewrite the logic to avoid building so many sets.
Previously we queued up the poke correctly when the device was deleted,
but then the actual EDU wouldn't get sent, as the device was no longer known.
Instead, we now send EDUs for deleted devices too if there's a poke for them.
This default config is parsed and used a base before the actual
config is overlaid, so with these values not commented out, the
code to detect when no turn params were set and refuse to generate
credentials was never firing because the dummy default was always set.
We need to do a bit more validation when we get a server name, but don't want
to be re-doing it all over the shop, so factor out a separate
parse_and_validate_server_name, and do the extra validation.
Also, use it to verify the server name in the config file.
Make sure that server_names used in auth headers are sane, and reject them with
a sensible error code, before they disappear off into the depths of the system.
otherwise we explode with:
```
Traceback (most recent call last):
File /usr/lib/python2.7/logging/handlers.py, line 78, in emit
logging.FileHandler.emit(self, record)
File /usr/lib/python2.7/logging/__init__.py, line 950, in emit
StreamHandler.emit(self, record)
File /usr/lib/python2.7/logging/__init__.py, line 887, in emit
self.handleError(record)
File /usr/lib/python2.7/logging/__init__.py, line 810, in handleError
None, sys.stderr)
File /usr/lib/python2.7/traceback.py, line 124, in print_exception
_print(file, 'Traceback (most recent call last):')
File /usr/lib/python2.7/traceback.py, line 13, in _print
file.write(str+terminator)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_io.py, line 170, in write
self.log.emit(self.level, format=u{log_io}, log_io=line)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 144, in emit
self.observer(event)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 136, in __call__
errorLogger = self._errorLoggerForObserver(brokenObserver)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 156, in _errorLoggerForObserver
if obs is not observer
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 81, in __init__
self.log = Logger(observer=self)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 64, in __init__
namespace = self._namespaceFromCallingContext()
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 42, in _namespaceFromCallingContext
return currentframe(2).f_globals[__name__]
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/python/compat.py, line 93, in currentframe
for x in range(n + 1):
RuntimeError: maximum recursion depth exceeded while calling a Python object
Logged from file site.py, line 129
File /usr/lib/python2.7/logging/__init__.py, line 859, in emit
msg = self.format(record)
File /usr/lib/python2.7/logging/__init__.py, line 732, in format
return fmt.format(record)
File /usr/lib/python2.7/logging/__init__.py, line 471, in format
record.message = record.getMessage()
File /usr/lib/python2.7/logging/__init__.py, line 335, in getMessage
msg = msg % self.args
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
Logged from file site.py, line 129
```
...where the logger apparently recurses whilst trying to log the error, hitting the
maximum recursion depth and killing everything badly.
Most rooms have a trivial history visibility like "shared" or
"world_readable", especially large rooms, so lets not bother getting the
full membership of those rooms in that case.
When _get_state_for_groups is given a wildcard filter, just do a complete
lookup. Hopefully this will give us the best of both worlds by not filling up
the ram if we only need one or two keys, but also making the cache still work
for the federation reader usecase.
When we finish processing a request, log the number of events we fetched from
the database to handle it.
[I'm trying to figure out which requests are responsible for large amounts of
event cache churn. It may turn out to be more helpful to add counts to the
prometheus per-request/block metrics, but that is an extension to this code
anyway.]
when there is no `m.room.power_levels` event in force in the room. (PR #3397)
Discussion around the Matrix Spec change proposal for this change can be
followed at https://github.com/matrix-org/matrix-doc/issues/1304.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJbIop9AAoJEIofk9V1tejV9lsIAJVH0l5dXROmy1KH/zt16AUA
CXa6Vv4Vyo6hKad/fZ81OZVRr5ChK/TvbIJVn/SA/muCfdoIFdxhT8eo/pXzO2UW
zReuLsDhAg+gSvpNus37oWj2FVsAE1HYDZ60lfaapAdZnkFit68d5DQZjO6nZHHA
YUXcU3GUwj0ZYuUzFzYKMLu6uNNasNkN8h6SS2lF7Bm4JaKDW+mFMfCyJwdIVSEh
BGhHoVpXdxFysD9s6Mwxqrz3KKg1Jtp7idDkk0x2S2Eh+gxyiDQQokv0oQ3+0+HG
sgy5Iz2t2CkpS02/j+LOvAZljTmnD0bXu3srGR+25StsoDFP038Am3bfQwtD190=
=9jsT
-----END PGP SIGNATURE-----
Merge tag 'v0.31.2'
SECURITY UPDATE: Prevent unauthorised users from setting state events in a room
when there is no `m.room.power_levels` event in force in the room. (PR #3397)
Discussion around the Matrix Spec change proposal for this change can be
followed at https://github.com/matrix-org/matrix-doc/issues/1304.
This is only used by filter_events_for_client, so we can simplify the whole
thing by just doing one user at a time, and removing a dead storage function to
boot.
=======================================
v0.31.1 fixes a security bug in the ``get_missing_events`` federation API
where event visibility rules were not applied correctly.
We are not aware of it being actively exploited but please upgrade asap.
Bug Fixes:
* Fix event filtering in get_missing_events handler (PR #3371)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEETQ1YthIGLQRddG54CTxDAAxPS/QFAlsalP8ACgkQCTxDAAxP
S/RADw/+NeDu0LjVpS5Uc4ElHgRBuFSm6l2i4z8rZBBlKSYnuq0Em4WMvLloi/JF
iAvTOYE7OjmF+gNvmsdH1N7hc1lKdQ2gAlpvaQR/5Qz9NtOVmM3WPZxS7n5jZHvD
hVSxeO9+GQOwK7rJorqrrsnWHQt0OkLHV6WThFdrgZb1JjWCUDTvw+Hei2uMX2aq
y2mkMG4TLStHwMvL2qw0h+hFtXywXI796qJR73ZxbEn24YD+kOXeEVkIFi2LT0Pj
cgkg7WWT32JD43/ioumLupZuhCmpRyxn4fi5gIKpXe5kiLsxOdApzNQSwmoJ+WA8
7zlrWY+0QDN4pbA5ESLitWSWAT50Ul//uM4nwmM4xEBPHdljXvKyHPsaSCeKLvT9
RT8pc41TQAqSshlXF8zgIAtStnF3oGel3EBBl1mmM9Un1ULnBFwWDZhlIm+ZZGhJ
MWoAWNG7j8AQuy0BTUAUr76x7t+/cdSqDuyVl1GO1tbDh0DUWoHZGXCUKrAXnn2T
SbiFigwOLvEADbvkW7L9Je9CVOi2V5Pg/32X9O8YMEiSz+j5PQEiGefyVh/I/QvV
Ha/atRpZF2OZ+XUOO5DLZMP/XCXpVgvHuskzfU6LvvVQCXgExuJhRD1PrRGeBaWr
zjJW+rmY+VeredqX7QkTB3XOGLMLGJfx2FeSMb+j0w3a7/iFj+Q=
=J80S
-----END PGP SIGNATURE-----
Merge tag 'v0.31.1'
Changes in synapse v0.31.1 (2018-06-08)
=======================================
v0.31.1 fixes a security bug in the ``get_missing_events`` federation API
where event visibility rules were not applied correctly.
We are not aware of it being actively exploited but please upgrade asap.
Bug Fixes:
* Fix event filtering in get_missing_events handler (PR #3371)
Firstly, don't swallow the reason for the failure
Secondly, don't assume all exceptions are verification failures
Thirdly, log a bit of info about the key being used if debug is enabled
These "temporary fixes" have been here three and a half years, and I can't find
any events in the matrix.org database where the calculated signature differs
from what's in the db. It's time for them to go away.
======================================
Most notable change from v0.30.0 is to switch to python prometheus library to improve system
stats reporting. WARNING this changes a number of prometheus metrics in a
backwards-incompatible manner. For more details, see
`docs/metrics-howto.rst <docs/metrics-howto.rst#removal-of-deprecated-metrics--time-based-counters-becoming-histograms-in-0310>`_.
Bug Fixes:
* Fix metric documentation tables (PR #3341)
* Fix LaterGuage error handling (694968f)
* Fix replication metrics (b7e7fd2)
Changes in synapse v0.31.0-rc1 (2018-06-04)
==========================================
Features:
* Switch to the Python Prometheus library (PR #3256, #3274)
* Let users leave the server notice room after joining (PR #3287)
Changes:
* daily user type phone home stats (PR #3264)
* Use iter* methods for _filter_events_for_server (PR #3267)
* Docs on consent bits (PR #3268)
* Remove users from user directory on deactivate (PR #3277)
* Avoid sending consent notice to guest users (PR #3288)
* disable CPUMetrics if no /proc/self/stat (PR #3299)
* Add local and loopback IPv6 addresses to url_preview_ip_range_blacklist (PR #3312) Thanks to @thegcat!
* Consistently use six's iteritems and wrap lazy keys/values in list() if they're not meant to be lazy (PR #3307)
* Add private IPv6 addresses to example config for url preview blacklist (PR #3317) Thanks to @thegcat!
* Reduce stuck read-receipts: ignore depth when updating (PR #3318)
* Put python's logs into Trial when running unit tests (PR #3319)
Changes, python 3 migration:
* Replace some more comparisons with six (PR #3243) Thanks to @NotAFile!
* replace some iteritems with six (PR #3244) Thanks to @NotAFile!
* Add batch_iter to utils (PR #3245) Thanks to @NotAFile!
* use repr, not str (PR #3246) Thanks to @NotAFile!
* Misc Python3 fixes (PR #3247) Thanks to @NotAFile!
* Py3 storage/_base.py (PR #3278) Thanks to @NotAFile!
* more six iteritems (PR #3279) Thanks to @NotAFile!
* More Misc. py3 fixes (PR #3280) Thanks to @NotAFile!
* remaining isintance fixes (PR #3281) Thanks to @NotAFile!
* py3-ize state.py (PR #3283) Thanks to @NotAFile!
* extend tox testing for py3 to avoid regressions (PR #3302) Thanks to @krombel!
* use memoryview in py3 (PR #3303) Thanks to @NotAFile!
Bugs:
* Fix federation backfill bugs (PR #3261)
* federation: fix LaterGauge usage (PR #3328) Thanks to @intelfx!
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEETQ1YthIGLQRddG54CTxDAAxPS/QFAlsXxJAACgkQCTxDAAxP
S/SJTg//Wtr+Qop9LJh2/leAYXpyqW6P7Ftak0w3aJ3KL3+tYg32yYNoRADCqbp3
LkrHu8MwbZagHjRUyEWNfDk4jbfq5fwh0JVGmYuUKhG9aF0HYyytKkbW79YzuhdQ
dfHj9x0xSBOUvgt/husloZSDy0VHC6uyQSAFgFDyHS2y7RPAiGstqLGByv0ciZOk
pO7TdjkUQcx4Ps7Wgip31NuHy3GY2int6f540pUXoZHLXs7RkfqS2cpF9Z/sTXJ4
xDLiY7uYNsTcCblwqaiijY5c90xwRB2vLs5CJdKFgyB6PNgg/2wHJqP/WHHEj+F8
BoSm3Ts7NXQf23pP9CXICe7vXX3J+ruOnC7FOSRobr6KGjn6DUrIZxo1ZepTwpp9
DIq+1eOFKKjwLQM3Jdi8WBCP63LhYXrTZxreke3jpwdcD7oIO9v6/e9J1gU5xHWa
Izg+YnWn1JLfq/X8T7YTZddUXGGPkH5i6LZKKkyY8u7LkJ4WR7syuAceUzkOOIAq
UWO0uEV7IiLnZzZGTtNIlEqtuklmVQTm6bvAgTPabai2JQyngFpH5M/5mPpVSiLV
QRLwaM56c+5GGZJWef8vxdGeYn+8rFI/UUniJ7358kLJF2IHsxlZu8J0ZZO2HWI2
ze5Kz0AWRzXLhWzq62Qb2dsiGySrZ7hng1tDxLak2IiusY+9SjM=
=Mz9U
-----END PGP SIGNATURE-----
Merge tag 'v0.31.0'
Changes in synapse v0.31.0 (2018-06-06)
======================================
Most notable change from v0.30.0 is to switch to python prometheus library to improve system
stats reporting. WARNING this changes a number of prometheus metrics in a
backwards-incompatible manner. For more details, see
`docs/metrics-howto.rst <docs/metrics-howto.rst#removal-of-deprecated-metrics--time-based-counters-becoming-histograms-in-0310>`_.
Bug Fixes:
* Fix metric documentation tables (PR #3341)
* Fix LaterGuage error handling (694968f)
* Fix replication metrics (b7e7fd2)
Changes in synapse v0.31.0-rc1 (2018-06-04)
==========================================
Features:
* Switch to the Python Prometheus library (PR #3256, #3274)
* Let users leave the server notice room after joining (PR #3287)
Changes:
* daily user type phone home stats (PR #3264)
* Use iter* methods for _filter_events_for_server (PR #3267)
* Docs on consent bits (PR #3268)
* Remove users from user directory on deactivate (PR #3277)
* Avoid sending consent notice to guest users (PR #3288)
* disable CPUMetrics if no /proc/self/stat (PR #3299)
* Add local and loopback IPv6 addresses to url_preview_ip_range_blacklist (PR #3312) Thanks to @thegcat!
* Consistently use six's iteritems and wrap lazy keys/values in list() if they're not meant to be lazy (PR #3307)
* Add private IPv6 addresses to example config for url preview blacklist (PR #3317) Thanks to @thegcat!
* Reduce stuck read-receipts: ignore depth when updating (PR #3318)
* Put python's logs into Trial when running unit tests (PR #3319)
Changes, python 3 migration:
* Replace some more comparisons with six (PR #3243) Thanks to @NotAFile!
* replace some iteritems with six (PR #3244) Thanks to @NotAFile!
* Add batch_iter to utils (PR #3245) Thanks to @NotAFile!
* use repr, not str (PR #3246) Thanks to @NotAFile!
* Misc Python3 fixes (PR #3247) Thanks to @NotAFile!
* Py3 storage/_base.py (PR #3278) Thanks to @NotAFile!
* more six iteritems (PR #3279) Thanks to @NotAFile!
* More Misc. py3 fixes (PR #3280) Thanks to @NotAFile!
* remaining isintance fixes (PR #3281) Thanks to @NotAFile!
* py3-ize state.py (PR #3283) Thanks to @NotAFile!
* extend tox testing for py3 to avoid regressions (PR #3302) Thanks to @krombel!
* use memoryview in py3 (PR #3303) Thanks to @NotAFile!
Bugs:
* Fix federation backfill bugs (PR #3261)
* federation: fix LaterGauge usage (PR #3328) Thanks to @intelfx!
This is unused. IT MUST DIE!!!1
̧̪͈̱̹̳͖͙H̵̰̤̰͕̖e̛ ͚͉̗̼̞w̶̩̥͉̮h̩̺̪̩͘ͅọ͎͉̟ ̜̩͔̦̘ͅW̪̫̩̣̲͔̳a͏͔̳͖i͖͜t͓̤̠͓͙s̘̰̩̥̙̝ͅ ̲̠̬̥Be̡̙̫̦h̰̩i̛̫͙͔̭̤̗̲n̳͞d̸ ͎̻͘T̛͇̝̲̹̠̗ͅh̫̦̝ͅe̩̫͟ ͓͖̼W͕̳͎͚̙̥ą̙l̘͚̺͔͞ͅl̳͍̙̤̤̮̳.̢
̟̺̜̙͉Z̤̲̙̙͎̥̝A͎̣͔̙͘L̥̻̗̳̻̳̳͢G͉̖̯͓̞̩̦O̹̹̺!̙͈͎̞̬ *
The added addresses are expected to be local or loopback addresses and
shouldn't be spidered for previews.
Signed-off-by: Felix Schäfer <felix@thegcat.net>
The pagination storage function supported not specifiying a limit on the
number of events returned. This was triggered when using the search or
context API with a limit of zero, which the storage function took to
mean not being limited.
The transaction cache has some code which tries to stop it caching failures,
but if the callback function failed straight away, then things would happen
backwards and we'd end up with the failure stuck in the cache.
There's a frequent idiom I noticed where an iterable is split up into a
number of chunks/batches. Unfortunately that method does not work with
iterators like dict.keys() in python3. This implementation works with
iterators.
Signed-off-by: Adrian Tschira <nota@notafile.com>
Server Notices use a special room which the user can't dismiss. They are
created on demand when some other bit of the code calls send_notice.
(This doesn't actually do much yet becuse we don't call send_notice anywhere)
This simplifies things as it is, but will also allow us to change the
way we traverse topologically without having to update the way push
actions work.
(instead of everywhere that writes a response. Or rather, the subset of places
which write responses where we haven't forgotten it).
This also means that we don't have to have the mysterious version_string
attribute in anything with a request handler.
Unfortunately it does mean that we have to pass the version string wherever we
instantiate a SynapseSite, which has been c&ped 150 times, but that is code
that ought to be cleaned up anyway really.
This is useful in its own right, because server.py is full of stuff; but more
importantly, I want to do some refactoring that will cause a circular reference
as it is.
The sync API often returns events in a topological rather than stream
ordering, e.g. when the user joined the room or on initial sync. When
this happens we can reuse existing pagination storage functions.
There is no reason to return a tuple of tokens when the last token is
always the token passed as an argument. Changing it makes it consistent
with other storage APIs
This implements this very crudely: this probably isn't viable
because parting a user from all their rooms could take a long time,
and if the HS gets restarted in that time the process will be
aborted.
So, it turns out that if you have a first `Deferred` `D1`, you can add a
callback which returns another `Deferred` `D2`, and `D2` must then complete
before any further callbacks on `D1` will execute (and later callbacks on `D1`
get the *result* of `D2` rather than `D2` itself).
So, `D1` might have `called=True` (as in, it has started running its
callbacks), but any new callbacks added to `D1` won't get run until `D2`
completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield`
will 'block'.
In conclusion: some of our assumptions in `logcontext` were invalid. We need to
make sure that we don't optimise out the logcontext juggling when this
situation happens. Fortunately, it is easy to detect by checking `D1.paused`.