This commit replaces SynapseError.from_http_response_exception with
HttpResponseException.to_synapse_error.
The new method actually returns a ProxiedRequestError, which allows us to pass
through additional metadata from the API call.
We really shouldn't be sending all CodeMessageExceptions back over the C-S API;
it will include things like 401s which we shouldn't proxy.
That means that we need to explicitly turn a few HttpResponseExceptions into
SynapseErrors in the federation layer.
The effect of the latter is that the matrix errcode will get passed through
correctly to calling clients, which might help with some of the random
M_UNKNOWN errors when trying to join rooms.
secrets got introduced in python 3.6 so this class is not available
in 3.5 and before.
This now checks for the current running version and only tries using
secrets if the version is 3.6 or above
Signed-Off-By: Matthias Kesler <krombel@krombel.de>
* attempt at deduplicating lazy-loaded members
as per the proposal; we can deduplicate redundant lazy-loaded members
which are sent in the same sync sequence. we do this heuristically
rather than requiring the client to somehow tell us which members it
has chosen to cache, by instead caching the last N members sent to
a client, and not sending them again. For now we hardcode N to 100.
Each cache for a given (user,device) tuple is in turn cached for up to
X minutes (to avoid the caches building up). For now we hardcode X to 30.
* add include_redundant_members filter option & make it work
* remove stale todo
* add tests for _get_some_state_from_cache
* incorporate review
This field is no longer read from, so we should stop populating it. Once we're
happy that this doesn't break everything, and a rollback is unlikely, we can
think about dropping the column.
We've long passed the point where it's possible to have the same event_id in
different tables, so these join conditions are redundant: we can just join on
event_id.
event_edges is of non-trivial size, and the room_id column is wasteful, so
let's stop reading from it. In future, we can stop writing to it, and then drop
it.
(since it uses methods therein)
Turns out that we had a bunch of things which were incorrectly importing
EventWorkerStore from events.py rather than events_worker.py, which broke once
I removed the import into events.py.
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips), we were relying on this,
and I broke it in #3604.
Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
This fixes a bug in _delete_existing_rows_txn which was introduced in #3435
(though it's been on matrix-org-hotfixes for *years*). This code is only called
when there is some sort of conflict the first time we try to persist an event,
so it only happens rarely. Still, the exceptions are annoying.
We need to run the errback in the sentinel context to avoid losing our own
context.
Also: add logging to runInteraction to help identify where "Starting db
connection from sentinel context" warnings are coming from
_update_remote_profile_cache was missing its `defer.inlineCallbacks`, so when
it was called, would just return a generator object, without actually running
any of the method body.
on_notifier_poke no longer runs synchonously, so we have to do a different hack
to make sure that the replication data has been sent. Let's actually listen for
its arrival.
We incorrectly asserted that all contexts must have a non None state
group without consider outliers. This would usually be fine as the
assertion would never be hit, as there is a shortcut during persistence
if the forward extremities don't change.
However, if the outlier is being persisted with non-outlier events, the
function would be called and the assertion would be hit.
Fixes#3601
This allows us to handle /context/ requests on the client_reader worker
without having to pull in all the various stream handlers (e.g.
precence, typing, pushers etc). The only thing the token gets used for
is pagination, and that ignores everything but the room portion of the
token.
First of all, fix the logic which looks for identical input state groups so
that we actually use them. This turned out to be most easily done by factoring
the relevant code out to a separate function so that we could do an early
return.
Secondly, avoid building the whole `conflicted_state` dict (which was only ever
used as a boolean flag).
Thirdly, replace the construction of the `state` dict (which mapped from keys
to events that set them), with an optimistic construction of the resolution
result assuming there will be no conflicts. This should be no slower than
building the old `state` dict, and:
- in the conflicted case, we'll short-cut it, saving part of the work
- in the unconflicted case, it saves rebuilding the resolution from the
`state` dict.
Finally, do a couple of s/values/itervalues/.
We don't want to bother pulling out the current state from the DB since
until we know we have to. Checking the context for state is just an
optimisation.
This is more involved than it might otherwise be, because the current
implementation just drops its logcontexts and runs everything in the sentinel
context.
It turns out that we aren't actually using a bunch of the functionality here
(notably suppress_failures and the fact that Distributor.fire returns a
deferred), so the easiest way to fix this is actually by simplifying a bunch of
code.
This fixes#3518, and ensures that we get useful logs and metrics for lots of
things that happen in the background.
(There are certainly more things that happen in the background; these are just
the common ones I've found running a single-process synapse locally).
This introduces a mechanism for tracking resource usage by background
processes, along with an example of how it will be used.
This will help address #3518, but more importantly will give us better insights
into things which are happening but not being shown up by the request metrics.
We *could* do this with Measure blocks, but:
- I think having them pulled out as a completely separate metric class will
make it easier to distinguish top-level processes from those which are
nested.
- I want to be able to report on in-flight background processes, and I don't
think we want to do this for *all* Measure blocks.
The get_entities_changed function was changed to return all changed
entities since the given stream position, rather than only those changed
from a given list of entities. This resulted in the function incorrectly
returning large numbers of entities that, for example, caused large
increases in database usage.
parse_integer and parse_string can take a request and raise errors
in case we have wrong or missing params.
This PR tries to use them more to deduplicate some code and make it
better readable
The stream cache keeps track of all entities that have changed since
a particular stream position, so get_entities_changed does not need to
return unknown entites when given a larger stream position.
This makes it consistent with the behaviour of has_entity_changed.
This line shows up as about 5% of cpu time on a synchrotron:
not_known_entities = set(entities) - set(self._entity_to_key)
Presumably the problem here is that _entity_to_key can be largeish, and
building a set for its keys every time this function is called is slow.
Here we rewrite the logic to avoid building so many sets.
Previously we queued up the poke correctly when the device was deleted,
but then the actual EDU wouldn't get sent, as the device was no longer known.
Instead, we now send EDUs for deleted devices too if there's a poke for them.
This default config is parsed and used a base before the actual
config is overlaid, so with these values not commented out, the
code to detect when no turn params were set and refuse to generate
credentials was never firing because the dummy default was always set.
We need to do a bit more validation when we get a server name, but don't want
to be re-doing it all over the shop, so factor out a separate
parse_and_validate_server_name, and do the extra validation.
Also, use it to verify the server name in the config file.
Make sure that server_names used in auth headers are sane, and reject them with
a sensible error code, before they disappear off into the depths of the system.
otherwise we explode with:
```
Traceback (most recent call last):
File /usr/lib/python2.7/logging/handlers.py, line 78, in emit
logging.FileHandler.emit(self, record)
File /usr/lib/python2.7/logging/__init__.py, line 950, in emit
StreamHandler.emit(self, record)
File /usr/lib/python2.7/logging/__init__.py, line 887, in emit
self.handleError(record)
File /usr/lib/python2.7/logging/__init__.py, line 810, in handleError
None, sys.stderr)
File /usr/lib/python2.7/traceback.py, line 124, in print_exception
_print(file, 'Traceback (most recent call last):')
File /usr/lib/python2.7/traceback.py, line 13, in _print
file.write(str+terminator)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_io.py, line 170, in write
self.log.emit(self.level, format=u{log_io}, log_io=line)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 144, in emit
self.observer(event)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 136, in __call__
errorLogger = self._errorLoggerForObserver(brokenObserver)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 156, in _errorLoggerForObserver
if obs is not observer
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_observer.py, line 81, in __init__
self.log = Logger(observer=self)
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 64, in __init__
namespace = self._namespaceFromCallingContext()
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/logger/_logger.py, line 42, in _namespaceFromCallingContext
return currentframe(2).f_globals[__name__]
File /home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/python/compat.py, line 93, in currentframe
for x in range(n + 1):
RuntimeError: maximum recursion depth exceeded while calling a Python object
Logged from file site.py, line 129
File /usr/lib/python2.7/logging/__init__.py, line 859, in emit
msg = self.format(record)
File /usr/lib/python2.7/logging/__init__.py, line 732, in format
return fmt.format(record)
File /usr/lib/python2.7/logging/__init__.py, line 471, in format
record.message = record.getMessage()
File /usr/lib/python2.7/logging/__init__.py, line 335, in getMessage
msg = msg % self.args
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
Logged from file site.py, line 129
```
...where the logger apparently recurses whilst trying to log the error, hitting the
maximum recursion depth and killing everything badly.
Most rooms have a trivial history visibility like "shared" or
"world_readable", especially large rooms, so lets not bother getting the
full membership of those rooms in that case.
When _get_state_for_groups is given a wildcard filter, just do a complete
lookup. Hopefully this will give us the best of both worlds by not filling up
the ram if we only need one or two keys, but also making the cache still work
for the federation reader usecase.
When we finish processing a request, log the number of events we fetched from
the database to handle it.
[I'm trying to figure out which requests are responsible for large amounts of
event cache churn. It may turn out to be more helpful to add counts to the
prometheus per-request/block metrics, but that is an extension to this code
anyway.]
when there is no `m.room.power_levels` event in force in the room. (PR #3397)
Discussion around the Matrix Spec change proposal for this change can be
followed at https://github.com/matrix-org/matrix-doc/issues/1304.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJbIop9AAoJEIofk9V1tejV9lsIAJVH0l5dXROmy1KH/zt16AUA
CXa6Vv4Vyo6hKad/fZ81OZVRr5ChK/TvbIJVn/SA/muCfdoIFdxhT8eo/pXzO2UW
zReuLsDhAg+gSvpNus37oWj2FVsAE1HYDZ60lfaapAdZnkFit68d5DQZjO6nZHHA
YUXcU3GUwj0ZYuUzFzYKMLu6uNNasNkN8h6SS2lF7Bm4JaKDW+mFMfCyJwdIVSEh
BGhHoVpXdxFysD9s6Mwxqrz3KKg1Jtp7idDkk0x2S2Eh+gxyiDQQokv0oQ3+0+HG
sgy5Iz2t2CkpS02/j+LOvAZljTmnD0bXu3srGR+25StsoDFP038Am3bfQwtD190=
=9jsT
-----END PGP SIGNATURE-----
Merge tag 'v0.31.2'
SECURITY UPDATE: Prevent unauthorised users from setting state events in a room
when there is no `m.room.power_levels` event in force in the room. (PR #3397)
Discussion around the Matrix Spec change proposal for this change can be
followed at https://github.com/matrix-org/matrix-doc/issues/1304.
This is only used by filter_events_for_client, so we can simplify the whole
thing by just doing one user at a time, and removing a dead storage function to
boot.
=======================================
v0.31.1 fixes a security bug in the ``get_missing_events`` federation API
where event visibility rules were not applied correctly.
We are not aware of it being actively exploited but please upgrade asap.
Bug Fixes:
* Fix event filtering in get_missing_events handler (PR #3371)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEETQ1YthIGLQRddG54CTxDAAxPS/QFAlsalP8ACgkQCTxDAAxP
S/RADw/+NeDu0LjVpS5Uc4ElHgRBuFSm6l2i4z8rZBBlKSYnuq0Em4WMvLloi/JF
iAvTOYE7OjmF+gNvmsdH1N7hc1lKdQ2gAlpvaQR/5Qz9NtOVmM3WPZxS7n5jZHvD
hVSxeO9+GQOwK7rJorqrrsnWHQt0OkLHV6WThFdrgZb1JjWCUDTvw+Hei2uMX2aq
y2mkMG4TLStHwMvL2qw0h+hFtXywXI796qJR73ZxbEn24YD+kOXeEVkIFi2LT0Pj
cgkg7WWT32JD43/ioumLupZuhCmpRyxn4fi5gIKpXe5kiLsxOdApzNQSwmoJ+WA8
7zlrWY+0QDN4pbA5ESLitWSWAT50Ul//uM4nwmM4xEBPHdljXvKyHPsaSCeKLvT9
RT8pc41TQAqSshlXF8zgIAtStnF3oGel3EBBl1mmM9Un1ULnBFwWDZhlIm+ZZGhJ
MWoAWNG7j8AQuy0BTUAUr76x7t+/cdSqDuyVl1GO1tbDh0DUWoHZGXCUKrAXnn2T
SbiFigwOLvEADbvkW7L9Je9CVOi2V5Pg/32X9O8YMEiSz+j5PQEiGefyVh/I/QvV
Ha/atRpZF2OZ+XUOO5DLZMP/XCXpVgvHuskzfU6LvvVQCXgExuJhRD1PrRGeBaWr
zjJW+rmY+VeredqX7QkTB3XOGLMLGJfx2FeSMb+j0w3a7/iFj+Q=
=J80S
-----END PGP SIGNATURE-----
Merge tag 'v0.31.1'
Changes in synapse v0.31.1 (2018-06-08)
=======================================
v0.31.1 fixes a security bug in the ``get_missing_events`` federation API
where event visibility rules were not applied correctly.
We are not aware of it being actively exploited but please upgrade asap.
Bug Fixes:
* Fix event filtering in get_missing_events handler (PR #3371)
Firstly, don't swallow the reason for the failure
Secondly, don't assume all exceptions are verification failures
Thirdly, log a bit of info about the key being used if debug is enabled
These "temporary fixes" have been here three and a half years, and I can't find
any events in the matrix.org database where the calculated signature differs
from what's in the db. It's time for them to go away.
======================================
Most notable change from v0.30.0 is to switch to python prometheus library to improve system
stats reporting. WARNING this changes a number of prometheus metrics in a
backwards-incompatible manner. For more details, see
`docs/metrics-howto.rst <docs/metrics-howto.rst#removal-of-deprecated-metrics--time-based-counters-becoming-histograms-in-0310>`_.
Bug Fixes:
* Fix metric documentation tables (PR #3341)
* Fix LaterGuage error handling (694968f)
* Fix replication metrics (b7e7fd2)
Changes in synapse v0.31.0-rc1 (2018-06-04)
==========================================
Features:
* Switch to the Python Prometheus library (PR #3256, #3274)
* Let users leave the server notice room after joining (PR #3287)
Changes:
* daily user type phone home stats (PR #3264)
* Use iter* methods for _filter_events_for_server (PR #3267)
* Docs on consent bits (PR #3268)
* Remove users from user directory on deactivate (PR #3277)
* Avoid sending consent notice to guest users (PR #3288)
* disable CPUMetrics if no /proc/self/stat (PR #3299)
* Add local and loopback IPv6 addresses to url_preview_ip_range_blacklist (PR #3312) Thanks to @thegcat!
* Consistently use six's iteritems and wrap lazy keys/values in list() if they're not meant to be lazy (PR #3307)
* Add private IPv6 addresses to example config for url preview blacklist (PR #3317) Thanks to @thegcat!
* Reduce stuck read-receipts: ignore depth when updating (PR #3318)
* Put python's logs into Trial when running unit tests (PR #3319)
Changes, python 3 migration:
* Replace some more comparisons with six (PR #3243) Thanks to @NotAFile!
* replace some iteritems with six (PR #3244) Thanks to @NotAFile!
* Add batch_iter to utils (PR #3245) Thanks to @NotAFile!
* use repr, not str (PR #3246) Thanks to @NotAFile!
* Misc Python3 fixes (PR #3247) Thanks to @NotAFile!
* Py3 storage/_base.py (PR #3278) Thanks to @NotAFile!
* more six iteritems (PR #3279) Thanks to @NotAFile!
* More Misc. py3 fixes (PR #3280) Thanks to @NotAFile!
* remaining isintance fixes (PR #3281) Thanks to @NotAFile!
* py3-ize state.py (PR #3283) Thanks to @NotAFile!
* extend tox testing for py3 to avoid regressions (PR #3302) Thanks to @krombel!
* use memoryview in py3 (PR #3303) Thanks to @NotAFile!
Bugs:
* Fix federation backfill bugs (PR #3261)
* federation: fix LaterGauge usage (PR #3328) Thanks to @intelfx!
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEETQ1YthIGLQRddG54CTxDAAxPS/QFAlsXxJAACgkQCTxDAAxP
S/SJTg//Wtr+Qop9LJh2/leAYXpyqW6P7Ftak0w3aJ3KL3+tYg32yYNoRADCqbp3
LkrHu8MwbZagHjRUyEWNfDk4jbfq5fwh0JVGmYuUKhG9aF0HYyytKkbW79YzuhdQ
dfHj9x0xSBOUvgt/husloZSDy0VHC6uyQSAFgFDyHS2y7RPAiGstqLGByv0ciZOk
pO7TdjkUQcx4Ps7Wgip31NuHy3GY2int6f540pUXoZHLXs7RkfqS2cpF9Z/sTXJ4
xDLiY7uYNsTcCblwqaiijY5c90xwRB2vLs5CJdKFgyB6PNgg/2wHJqP/WHHEj+F8
BoSm3Ts7NXQf23pP9CXICe7vXX3J+ruOnC7FOSRobr6KGjn6DUrIZxo1ZepTwpp9
DIq+1eOFKKjwLQM3Jdi8WBCP63LhYXrTZxreke3jpwdcD7oIO9v6/e9J1gU5xHWa
Izg+YnWn1JLfq/X8T7YTZddUXGGPkH5i6LZKKkyY8u7LkJ4WR7syuAceUzkOOIAq
UWO0uEV7IiLnZzZGTtNIlEqtuklmVQTm6bvAgTPabai2JQyngFpH5M/5mPpVSiLV
QRLwaM56c+5GGZJWef8vxdGeYn+8rFI/UUniJ7358kLJF2IHsxlZu8J0ZZO2HWI2
ze5Kz0AWRzXLhWzq62Qb2dsiGySrZ7hng1tDxLak2IiusY+9SjM=
=Mz9U
-----END PGP SIGNATURE-----
Merge tag 'v0.31.0'
Changes in synapse v0.31.0 (2018-06-06)
======================================
Most notable change from v0.30.0 is to switch to python prometheus library to improve system
stats reporting. WARNING this changes a number of prometheus metrics in a
backwards-incompatible manner. For more details, see
`docs/metrics-howto.rst <docs/metrics-howto.rst#removal-of-deprecated-metrics--time-based-counters-becoming-histograms-in-0310>`_.
Bug Fixes:
* Fix metric documentation tables (PR #3341)
* Fix LaterGuage error handling (694968f)
* Fix replication metrics (b7e7fd2)
Changes in synapse v0.31.0-rc1 (2018-06-04)
==========================================
Features:
* Switch to the Python Prometheus library (PR #3256, #3274)
* Let users leave the server notice room after joining (PR #3287)
Changes:
* daily user type phone home stats (PR #3264)
* Use iter* methods for _filter_events_for_server (PR #3267)
* Docs on consent bits (PR #3268)
* Remove users from user directory on deactivate (PR #3277)
* Avoid sending consent notice to guest users (PR #3288)
* disable CPUMetrics if no /proc/self/stat (PR #3299)
* Add local and loopback IPv6 addresses to url_preview_ip_range_blacklist (PR #3312) Thanks to @thegcat!
* Consistently use six's iteritems and wrap lazy keys/values in list() if they're not meant to be lazy (PR #3307)
* Add private IPv6 addresses to example config for url preview blacklist (PR #3317) Thanks to @thegcat!
* Reduce stuck read-receipts: ignore depth when updating (PR #3318)
* Put python's logs into Trial when running unit tests (PR #3319)
Changes, python 3 migration:
* Replace some more comparisons with six (PR #3243) Thanks to @NotAFile!
* replace some iteritems with six (PR #3244) Thanks to @NotAFile!
* Add batch_iter to utils (PR #3245) Thanks to @NotAFile!
* use repr, not str (PR #3246) Thanks to @NotAFile!
* Misc Python3 fixes (PR #3247) Thanks to @NotAFile!
* Py3 storage/_base.py (PR #3278) Thanks to @NotAFile!
* more six iteritems (PR #3279) Thanks to @NotAFile!
* More Misc. py3 fixes (PR #3280) Thanks to @NotAFile!
* remaining isintance fixes (PR #3281) Thanks to @NotAFile!
* py3-ize state.py (PR #3283) Thanks to @NotAFile!
* extend tox testing for py3 to avoid regressions (PR #3302) Thanks to @krombel!
* use memoryview in py3 (PR #3303) Thanks to @NotAFile!
Bugs:
* Fix federation backfill bugs (PR #3261)
* federation: fix LaterGauge usage (PR #3328) Thanks to @intelfx!
This is unused. IT MUST DIE!!!1
̧̪͈̱̹̳͖͙H̵̰̤̰͕̖e̛ ͚͉̗̼̞w̶̩̥͉̮h̩̺̪̩͘ͅọ͎͉̟ ̜̩͔̦̘ͅW̪̫̩̣̲͔̳a͏͔̳͖i͖͜t͓̤̠͓͙s̘̰̩̥̙̝ͅ ̲̠̬̥Be̡̙̫̦h̰̩i̛̫͙͔̭̤̗̲n̳͞d̸ ͎̻͘T̛͇̝̲̹̠̗ͅh̫̦̝ͅe̩̫͟ ͓͖̼W͕̳͎͚̙̥ą̙l̘͚̺͔͞ͅl̳͍̙̤̤̮̳.̢
̟̺̜̙͉Z̤̲̙̙͎̥̝A͎̣͔̙͘L̥̻̗̳̻̳̳͢G͉̖̯͓̞̩̦O̹̹̺!̙͈͎̞̬ *
The added addresses are expected to be local or loopback addresses and
shouldn't be spidered for previews.
Signed-off-by: Felix Schäfer <felix@thegcat.net>
The pagination storage function supported not specifiying a limit on the
number of events returned. This was triggered when using the search or
context API with a limit of zero, which the storage function took to
mean not being limited.