This adds quite a lot of OpenTracing decoration for database activity. Specifically it adds tracing at four different levels:
* emit a span for each "interaction" - ie, the top level database function that we tend to call "transaction", but isn't really, because it can end up as multiple transactions.
* emit a span while we hold a database connection open
* emit a span for each database transaction - an actual transaction this time.
* emit a span for each database query.
I'm aware this might be quite a lot of overhead, but even just running it on a local Synapse the results look really interesting, and I hope the overhead can be offset by turning down the sampling frequency and finding other ways of tracing requests of interest (e.g. the `force_tracing_for_users` setting).
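For illustration, the innermost level (a span per query) looks roughly like the following sketch, written against the generic `opentracing` API rather than Synapse's actual tracing helpers:

```python
import opentracing

def run_query(conn, sql, args=()):
    # Wrap a single query in its own span, tagging it with the SQL so slow
    # statements show up individually in the trace.
    with opentracing.tracer.start_active_span("db.query") as scope:
        scope.span.set_tag("db.statement", sql)
        cursor = conn.cursor()
        try:
            cursor.execute(sql, args)
            return cursor.fetchall()
        finally:
            cursor.close()
```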
Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call `have_seen_events` for the same set of (50K or so) `auth_events`, each of which would take many minutes to complete, even though it's only an index scan.
* Make `invalidate` and `invalidate_many` do the same thing
... so that we can do either over the invalidation replication stream, and also
because they always confused me a bit.
* Kill off `invalidate_many`
* changelog
`keylen` seems to be a thing that is frequently incorrectly set, and we don't really need it.
The only time it was used was to figure out if we had removed a subtree in `del_multi`, which we can do better by changing `TreeCache.pop` to return a different type (`TreeCacheNode`).
Commits should be independently reviewable.
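Roughly, the shape of the change is something like this sketch (the traversal helper here is illustrative, not the real implementation):

```python
class TreeCacheNode(dict):
    """A distinct type for internal nodes of the tree, so that whatever
    `pop` returns tells the caller whether a whole subtree (rather than a
    single leaf) was removed."""

def iterate_tree_cache_entry(d):
    # Walk whatever `pop` returned, yielding every leaf value beneath it.
    if isinstance(d, TreeCacheNode):
        for value in d.values():
            yield from iterate_tree_cache_entry(value)
    else:
        yield d
```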
Fixes: https://github.com/matrix-org/synapse/issues/9962
This is a fix for the above problem.
I fixed it by swapping the order of insertion of new records and deletion of old ones: since the deletes now happen before the inserts, we no longer delete freshly inserted database records.
Signed-off-by: Marek Matys <themarcq@gmail.com>
- use a tuple rather than a list for the iterable that is passed into the
wrapped function, for performance
- test that we can pass an iterable and that keys are correctly deduped.
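For illustration, the wrapper now does something along these lines (a sketch, not the real decorator):

```python
from functools import wraps

def dedupe_keys(fn):
    # Illustrative wrapper: collapse duplicate keys (preserving order) and
    # hand the wrapped function a tuple rather than a list.
    @wraps(fn)
    def wrapper(self, keys):
        return fn(self, tuple(dict.fromkeys(keys)))

    return wrapper
```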
It's not obvious that instances of SQLBaseStore each need their own
instances of random.SystemRandom(); let's just use random directly.
Introduced by 52839886d6
Signed-off-by: Dan Callahan <danc@element.io>
Now that cross-signing exists, there is much less need for other people to look at devices and verify them individually. This PR adds a config option to allow you to prevent device display names from being shared with other servers.
Signed-off-by: Aaron Raimist <aaron@raim.ist>
The hope here is that by moving all the schema files into synapse/storage/schema, it gets a bit easier for newcomers to navigate.
It certainly got easier for me to write a helpful README. There's more to do on that front, but I'll follow up with other PRs for that.
This fixes a regression where the logging context for runWithConnection
was reported as runWithConnection instead of the connection name,
e.g. "POST-XYZ".
I went through and removed a bunch of cruft that was lying around for compatibility with old Python versions. This PR will also now prevent Synapse from starting unless you're running Python 3.6+.
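A minimal sketch of the kind of startup guard this implies (the real check lives in Synapse's entry point and is phrased differently):

```python
import sys

# Refuse to start on anything older than Python 3.6 (illustrative; the
# real check is in Synapse's entry point).
if sys.version_info < (3, 6):
    print("Synapse requires Python 3.6 or above.")
    sys.exit(1)
```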
This attempts to be a direct port of https://github.com/matrix-org/synapse-dinsic/pull/74 to mainline. There was some fiddling required to deal with the changes that have been made to mainline since (mainly dealing with the split of `RegistrationWorkerStore` from `RegistrationStore`, and the changes made to `self.make_request` in test code).
This basically speeds up federation by "squeezing" each individual dual database call (to `destinations` and `destination_rooms`), which previously happened for every event, into one call for an entire batch (100 max).
Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
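Roughly, the idea is to replace the per-event statements with a single batched write, something like this sketch (the exact SQL and plumbing are assumptions; treat it as illustrative):

```python
def _bulk_record_destination_rooms(txn, rows):
    # `rows` is a list of (destination, room_id, stream_ordering) tuples
    # covering the whole batch of events, written with one executemany
    # instead of a statement per event.
    txn.executemany(
        """
        INSERT INTO destination_rooms (destination, room_id, stream_ordering)
        VALUES (?, ?, ?)
        ON CONFLICT (destination, room_id)
        DO UPDATE SET stream_ordering = excluded.stream_ordering
        """,
        rows,
    )
```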
Part of #9744
Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.
`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
The `remote_media_cache_thumbnails_media_origin_media_id_thumbna_key`
constraint is superseded by
`remote_media_repository_thumbn_media_origin_id_width_height_met` (which adds
`thumbnail_method` to the unique key).
PR #7124 made an attempt to remove the old constraint, but got the name wrong,
so it didn't work. Here we update the bg update and rerun it.
Fixes #8649.
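For illustration, the corrected update amounts to something like the following sketch (the table name is inferred from the constraint name, and the real background update's mechanics differ):

```python
def _remove_superseded_thumbnail_constraint(txn):
    # Postgres-only sketch: drop the redundant constraint using its
    # correct name this time.
    txn.execute(
        """
        ALTER TABLE remote_media_cache_thumbnails
        DROP CONSTRAINT IF EXISTS
            remote_media_cache_thumbnails_media_origin_media_id_thumbna_key
        """
    )
```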
`room_invite_state_types` was inconvenient as a configuration setting, because
anyone that ever set it would not receive any new types that were added to the
defaults. Here, we deprecate the old setting, and replace it with a couple of
new settings under `room_prejoin_state`.
Split off from https://github.com/matrix-org/synapse/pull/9491
Adds a storage method for getting the current presence of all local users, optionally excluding those that are offline. This will be used by the code in #9491 when a PresenceRouter module informs Synapse that a given user should have `"ALL"` user presence updates routed to them. Specifically, it is used here: b588f16e39/synapse/handlers/presence.py (L1131-L1133)
Note that there is a `get_all_presence_updates` function just above. That function is intended to walk up the table through stream IDs, and is primarily used by the presence replication stream. I could possibly make use of it in the PresenceRouter-related code, but it would be a bit of a bodge.
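For illustration, the new storage method is roughly shaped like this (a sketch; the table/column names and the `runInteraction` plumbing are assumptions rather than the exact implementation):

```python
async def get_presence_for_all_users(self, include_offline: bool = True):
    # Pull the current presence state for every local user, optionally
    # skipping users whose state is "offline".
    def _get_presence_txn(txn):
        sql = "SELECT user_id, state, last_active_ts FROM presence_stream"
        args = []
        if not include_offline:
            sql += " WHERE state != ?"
            args.append("offline")
        txn.execute(sql, args)
        return {
            user_id: {"state": state, "last_active_ts": last_active_ts}
            for user_id, state, last_active_ts in txn
        }

    return await self.db_pool.runInteraction(
        "get_presence_for_all_users", _get_presence_txn
    )
```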
We had two functions named `get_forward_extremities_for_room` and
`get_forward_extremeties_for_room` that took different parameters. We
rename one of them to avoid confusion.
* Populate `internal_metadata.outlier` based on `events` table
Rather than relying on `outlier` being in the `internal_metadata` column,
populate it based on the `events.outlier` column.
* Move `outlier` out of InternalMetadata._dict
Ultimately, this will allow us to stop writing it to the database. For now, we
have to grandfather it back in so as to maintain compatibility with older
versions of Synapse.
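Very roughly, the shape of the change is (a sketch, not the real class):

```python
class EventInternalMetadata:
    def __init__(self, internal_metadata_dict: dict, outlier: bool = False):
        # `outlier` is now a plain attribute, populated by the storage layer
        # from the `events.outlier` column, rather than being read out of
        # the internal_metadata JSON dict.
        self._dict = dict(internal_metadata_dict)
        self.outlier = outlier

    def is_outlier(self) -> bool:
        return self.outlier
```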
Federation catch-up mode is very inefficient if the number of events
that the remote server has missed is small, since handling gaps can be
very expensive, cf. #9492.
Instead of going into catch-up mode whenever we see an error, we instead
do so only if we've backed off from trying the remote for more than an
hour (the assumption being that in such a case it is more than a
transient failure).
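A minimal sketch of the heuristic, assuming the backoff interval is tracked in milliseconds:

```python
CATCH_UP_BACKOFF_THRESHOLD_MS = 60 * 60 * 1000  # one hour

def should_enter_catch_up(retry_interval_ms: int) -> bool:
    # Only fall back to catch-up mode once the backoff interval for the
    # destination has grown past an hour, i.e. the failure no longer
    # looks transient.
    return retry_interval_ms > CATCH_UP_BACKOFF_THRESHOLD_MS
```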
Background: When we receive incoming federation traffic, and notice that we are missing prev_events from
the incoming traffic, first we do a `/get_missing_events` request, and then if we still have missing prev_events,
we set up new backwards-extremities. To do that, we need to make a `/state_ids` request to ask the remote
server for the state at those prev_events, and then we may need to then ask the remote server for any events
in that state which we don't already have, as well as the auth events for those missing state events, so that we
can auth them.
This PR attempts to optimise the processing of that state request. The `state_ids` API returns a list of the state
events, as well as a list of all the auth events for *all* of those state events. The optimisation comes from the
observation that we are currently loading all of those auth events into memory at the start of the operation, but
we almost certainly aren't going to need *all* of the auth events. Rather, we can check that we have them, and
leave the actual load into memory for later. (Ideally the federation API would tell us which auth events we're
actually going to need, but it doesn't.)
The effect of this is to reduce the number of events that I need to load for an event in Matrix HQ from about
60000 to about 22000, which means it can stay in my in-memory cache, whereas previously the sheer number
of events meant that all 60K events had to be loaded from db for each request, due to the amount of cache
churn. (NB I've already tripled the size of the cache from its default of 10K).
Unfortunately I've ended up basically C&Ping `_get_state_for_room` and `_get_events_from_store_or_dest` into
a new method, because `_get_state_for_room` is also called during backfill, which expects the auth events to be
returned, so the same tricks don't work. That said, I don't really know why that codepath is completely different
(ultimately we're doing the same thing in setting up a new backwards extremity) so I've left a TODO suggesting
that we clean it up.
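For illustration, the deferred-load idea boils down to something like this sketch (helper names are assumptions):

```python
async def _check_auth_events_available(self, destination, auth_event_ids):
    # Instead of loading every auth event into memory up front, check which
    # ones we already have and fetch just the missing ones from the remote.
    seen = await self._store.have_seen_events(auth_event_ids)
    missing = set(auth_event_ids) - set(seen)
    if missing:
        await self._get_events_and_persist(destination, missing)
```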
Turns out matrix.org has an event that has duplicate auth events (which really isn't supposed to happen, but here we are). This caused the background update to fail due to `UniqueViolation`.
It landed in schema version 58 after schema version 59 had been created, causing some
servers not to run it. The main effect was that not all rooms had
their chain cover calculated correctly. After the BG updates complete
the chain covers will get fixed when a new state event in the affected
rooms is received.
The idea here is to stop people forgetting to call `check_consistency`. Folks can still just pass in `None` to the new args in `build_sequence_generator`, but hopefully they won't.
This PR removes the cache for the `get_shared_rooms_for_users` storage method (the db method driving the experimental "what rooms do I share with this user?" feature: [MSC2666](https://github.com/matrix-org/matrix-doc/pull/2666)). Currently, subsequent requests to the endpoint will return the same result, even if your shared rooms with that user have changed.
The cache was added in https://github.com/matrix-org/synapse/pull/7785, but we forgot to ensure it was invalidated appropriately.
Upon attempting to invalidate it, I found that the cache had to be entirely invalidated whenever a user (remote or local) joined or left a room. This didn't make for a very useful cache, especially for a function that may or may not be called very often. Thus, I've opted to remove it instead of invalidating it.
This PR adds a homeserver config option, `user_directory.prefer_local_users`, that when enabled will show local users higher in user directory search results than remote users. This option is off by default.
Note that turning this on doesn't necessarily mean that remote users will always be put below local users, but they should be assuming all other ranking factors (search query match, profile information present etc) are identical.
This is useful for, say, University networks that are openly federating, but want to prioritise local students and staff in the user directory over other random users.
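For illustration, the effect is roughly the following (a sketch in Python; the real ranking happens inside the user directory search SQL):

```python
def rank_results(results, prefer_local_users, server_name):
    # All else being equal, float local users above remote ones when the
    # option is enabled.  `results` is a list of dicts with at least a
    # "user_id" and a relevance "rank".
    def sort_key(user):
        is_local = user["user_id"].endswith(":" + server_name)
        local_boost = 1 if (prefer_local_users and is_local) else 0
        return (local_boost, user["rank"])

    return sorted(results, key=sort_key, reverse=True)
```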
- Update black version to the latest
- Run black auto formatting over the codebase
- Run autoformatting according to [`docs/code_style.md`](80d6dc9783/docs/code_style.md)
- Update `code_style.md` docs around installing black to use the correct version
Remove conflicting sqlite tables that throw `sqlite3.OperationalError: object name reserved for internal use: event_search_content` when running the Twisted unit tests.
Fix #8996
Fixes #8966.
* Factor out build_synapse_client_resource_tree
Start a function which will mount resources common to all workers.
* Move sso init into build_synapse_client_resource_tree
... so that we don't have to do it for each worker
* Fix SSO-login-via-a-worker
Expose the SSO login endpoints on workers, like the documentation says.
* Update workers config for new endpoints
Add documentation for endpoints recently added (#8942, #9017, #9262)
* remove submit_token from workers endpoints list
this *doesn't* work on workers (yet).
* changelog
* Add a comment about the odd path for SAML2Resource
This expands the current shadow-banning feature to be usable via
the admin API and adds documentation for it.
A shadow-banned user receives successful responses to their
client-server API requests, but the events are not propagated into rooms.
Shadow-banning a user should be used as a tool of last resort and may lead
to confusing or broken behaviour for the client.
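A hedged usage example (the endpoint path is an assumption based on the documentation added in this PR):

```python
import requests
from urllib.parse import quote

# Shadow-ban a user via the admin API.
user_id = quote("@spammer:example.com", safe="")
resp = requests.post(
    f"https://synapse.example.com/_synapse/admin/v1/users/{user_id}/shadow_ban",
    headers={"Authorization": "Bearer <admin access token>"},
)
resp.raise_for_status()
```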
Merge tag 'v1.26.0rc2' into social_login
Synapse 1.26.0rc2 (2021-01-25)
==============================
Bugfixes
--------
- Fix receipts and account data not being sent down sync. Introduced in v1.26.0rc1. ([\#9193](https://github.com/matrix-org/synapse/issues/9193), [\#9195](https://github.com/matrix-org/synapse/issues/9195))
- Fix chain cover update to handle events with duplicate auth events. Introduced in v1.26.0rc1. ([\#9210](https://github.com/matrix-org/synapse/issues/9210))
Internal Changes
----------------
- Add an `oidc-` prefix to any `idp_id`s which are given in the `oidc_providers` configuration. ([\#9189](https://github.com/matrix-org/synapse/issues/9189))
- Bump minimum `psycopg2` version to v2.8. ([\#9204](https://github.com/matrix-org/synapse/issues/9204))
We have seen a failure mode here where, if there are many in-flight
unfinished IDs, marking an ID as finished takes a lot of CPU (since
calling `deque.remove` iterates over the whole list).
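For illustration, the fix amounts to tracking the in-flight IDs in a structure with cheap removal, along these lines (a sketch, not the real ID generator):

```python
from sortedcontainers import SortedSet

class UnfinishedIdTracker:
    """A SortedSet gives cheap removal when an ID finishes, plus cheap
    access to the smallest ID still in flight, avoiding deque.remove's
    O(n) scan."""

    def __init__(self):
        self._unfinished = SortedSet()

    def start(self, stream_id: int) -> None:
        self._unfinished.add(stream_id)

    def finish(self, stream_id: int) -> None:
        self._unfinished.discard(stream_id)

    def smallest_in_flight(self):
        return self._unfinished[0] if self._unfinished else None
```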
Introduced in #9104
This wasn't picked up by the tests as this is all fine the first time you run Synapse (after upgrading), but then when you restart the wrong value is pulled from `stream_positions`.
We do this by allowing a single iteration to process multiple rooms at a
time, as there are often a lot of really tiny rooms, which can massively
slow things down.
You can't continue using a transaction once an exception has been
raised, so catching and dropping the error here is pointless and just
causes more errors.
`GET /_synapse/admin/v1/rooms/<identifier>/forward_extremities` now gets forward extremities for a room, returning the count and the list of extremities.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
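A hedged usage example (the response keys are assumptions based on the description above):

```python
import requests
from urllib.parse import quote

# List the forward extremities of a room via the new admin endpoint.
room_id = quote("!someroom:example.com", safe="")
resp = requests.get(
    f"https://synapse.example.com/_synapse/admin/v1/rooms/{room_id}/forward_extremities",
    headers={"Authorization": "Bearer <admin access token>"},
)
body = resp.json()
print(body["count"], [e["event_id"] for e in body["results"]])
```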
It's important that we make sure our background updates happen in a defined
order, to avoid disasters like #6923.
Add an ordering to all of the background updates that have landed since #7190.