Before Synapse 1.31 (#9411), we relied on `outlier` being stored in the
`internal_metadata` column. We can now assume nobody will roll back their
deployment that far and drop the legacy support.
* Inline `_check_event_auth` for outliers
When we are persisting an outlier, most of `_check_event_auth` is redundant:
* `_update_auth_events_and_context_for_auth` does nothing, because the
`input_auth_events` are (now) exactly the event's auth_events,
which means that `missing_auth` is empty.
* we don't care about soft-fail, kicking guest users or `send_on_behalf_of`
for outliers
... so the only thing that matters is the auth itself, so let's just do that.
* `_auth_and_persist_fetched_events_inner`: de-async `prep`
`prep` no longer calls any `async` methods, so let's make it synchronous.
* Simplify `_check_event_auth`
We no longer need to support outliers here, which makes things rather simpler.
* changelog
* lint
Currently we use `JsonEncoder.iterencode` to write JSON responses, which ensures that we don't block the main reactor thread when encoding huge objects. The downside to this is that `iterencode` falls back to using a pure Python encoder that is *much* less efficient and can easily burn a lot of CPU for huge responses. To fix this, while still ensuring we don't block the reactor loop, we encode the JSON on a threadpool using the standard `JsonEncoder.encode` functions, which is backed by a C library.
Doing so, however, requires `respond_with_json` to have access to the reactor, which it previously didn't. There are two ways of doing this:
1. threading through the reactor object, which is a bit fiddly as e.g. `DirectServeJsonResource` doesn't currently take a reactor, but is exposed to modules and so is a PITA to change; or
2. expose the reactor in `SynapseRequest`, which requires updating a bunch of servlet types.
I went with the latter as that is just a mechanical change, and I think makes sense as a request already has a reactor associated with it (via its http channel).
This is in the context of creating new module callbacks that modules in https://github.com/matrix-org/synapse-dinsic can use, in an effort to reconcile the spam checker API in synapse-dinsic with the one in mainline.
This adds a callback that's fairly similar to user_may_create_room except it also allows processing based on the invites sent at room creation.
- Use sytest:bionic. Sytest:latest is two years old (do we want
CI to push out latest at all?) and comes with Python 3.5, which we
explictly no longer support. The script now runs under PostgreSQL 10
as a result.
- Advertise script in the docs
- Move pg testing script to scripts-dev directory
- Write to host as the script's exector, not root
A few changes to make it speedier to re-run the tests:
- Create blank DB in the container, not the script, so we don't have to
`initdb` each time
- Use a named volume to persist the tox environment, so we don't have to
fetch and install a bunch of packages from PyPI each time
Co-authored-by: reivilibre <olivier@librepush.net>
* Factor more stuff out of `_get_events_and_persist`
It turns out that the event-sorting algorithm in `_get_events_and_persist` is
also useful in other circumstances. Here we move the current
`_auth_and_persist_fetched_events` to `_auth_and_persist_fetched_events_inner`,
and then factor the sorting part out to `_auth_and_persist_fetched_events`.
* `_get_remote_auth_chain_for_event`: remove redundant `outlier` assignment
`get_event_auth` returns events with the outlier flag already set, so this is
redundant (though we need to update a test where `get_event_auth` is mocked).
* `_get_remote_auth_chain_for_event`: move existing-event tests earlier
Move a couple of tests outside the loop. This is a bit inefficient for now, but
a future commit will make it better. It should be functionally identical.
* `_get_remote_auth_chain_for_event`: use `_auth_and_persist_fetched_events`
We can use the same codepath for persisting the events fetched as part of an
auth chain as for those fetched individually by `_get_events_and_persist` for
building the state at a backwards extremity.
* `_get_remote_auth_chain_for_event`: use a dict for efficiency
`_auth_and_persist_fetched_events` sorts the events itself, so we no longer
need to care about maintaining the ordering from `get_event_auth` (and no
longer need to sort by depth in `get_event_auth`).
That means that we can use a map, making it easier to filter out events we
already have, etc.
* changelog
* `_auth_and_persist_fetched_events`: improve docstring
Combine the two loops over the list of events, and hence get rid of
`_NewEventInfo`. Also pass the event back alongside the context, so that it's
easier to process the result.
If the MAU count had been reached, Synapse incorrectly blocked appservice users even though they've been explicitly configured not to be tracked (the default). This was due to bypassing the relevant if as it was chained behind another earlier hit if as an elif.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
* Improve typing in user_directory files
This makes the user_directory.py in storage pass most of mypy's
checks (including `no-untyped-defs`). Unfortunately that file is in the
tangled web of Store class inheritance so doesn't pass mypy at the moment.
The handlers directory has already been mypyed.
Co-authored-by: reivilibre <olivier@librepush.net>
This change adds a check for row existence before accessing row element, this should fix issue #10669
Signed-off-by: Vasya Boytsov vasiliy.boytsov@phystech.edu
* Reload auth events from db after fetching and persisting
In `_update_auth_events_and_context_for_auth`, when we fetch the remote auth
tree and persist the returned events: load the missing events from the database
rather than using the copies we got from the remote server.
This is mostly in preparation for additional refactors, but does have an
advantage in that if we later get around to checking the rejected status, we'll
be able to make use of it.
* Factor out `_get_remote_auth_chain_for_event` from `_update_auth_events_and_context_for_auth`
* changelog
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
Co-authored-by: reivilibre <olivier@librepush.net>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
This avoids the overhead of searching through the various
configuration classes by directly referencing the class that
the attributes are in.
It also improves type hints since mypy can now resolve the
types of the configuration variables.
Constructing an EventContext for an outlier is actually really simple, and
there's no sense in going via an `async` method in the `StateHandler`.
This also means that we can resolve a bunch of FIXMEs.
* add test to check if null code points are being inserted
* add logic to detect and replace null code points before insertion into db
* lints
* add license to test
* change approach to null substitution
* add type hint for SearchEntry
* Add changelog entry
Signed-off-by: H.Shay <shaysquared@gmail.com>
* updated changelog
* update chanelog message
* remove duplicate changelog
* Update synapse/storage/databases/main/events.py remove extra space
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* rename and move test file, update tests, delete old test file
* fix typo in comments
* update _find_highlights_in_postgres to replace null byte with space
* replace null byte in sqlite search insertion
* beef up and reorganize test for this pr
* update changelog
* add type hints and update docstring
* check db engine directly vs using env variable
* refactor tests to be less repetetive
* move rplace logic into seperate function
* requested changes
* Fix typo.
* Update synapse/storage/databases/main/search.py
Co-authored-by: reivilibre <olivier@librepush.net>
* Update changelog.d/10820.misc
Co-authored-by: Aaron Raimist <aaron@raim.ist>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: reivilibre <olivier@librepush.net>
Co-authored-by: Aaron Raimist <aaron@raim.ist>
The invalidation was missing in `_claim_e2e_one_time_key_returning`,
which is used on SQLite 3.24+ and Postgres. This could break e2ee if
nothing else happened to invalidate the caches before the keys ran out.
Signed-off-by: Tulir Asokan <tulir@beeper.com>
* Improved titles (fall back to the author name if there's not title) and include the site name.
* Handle photo/video payloads.
* Include the original URL in the Open Graph response.
* Fix the expiration time (by properly converting from seconds to milliseconds).
The deprecated /initialSync endpoint maintains a cache of responses,
using parameter values as part of the cache key. When a `from` or `to`
parameter is specified, it gets converted into a `StreamToken`, which
contains a `RoomStreamToken` and forms part of the cache key.
`RoomStreamToken`s need to be made hashable for this to work.
I meant to do this before, in #10591, but because I'm stupid I forgot to do it
for V2 and V3 events.
I've factored the common code out to `EventBase` to save us having two copies
of it.
This means that for `FrozenEvent` we replace `self.get("event_id", None)` with
`self.event_id`, which I think is safe. `get()` is an alias for
`self._dict.get()`, whereas `event_id()` is an `@property` method which looks
up `self._event_id`, which is populated during construction from the same
dict. We don't seem to rely on the fallback, because if the `event_id` key is
absent from the dict then construction of the `EventBase` object will
fail.
Long story short, the only way this could change behaviour is if
`event_dict["event_id"]` is changed *after* the `EventBase` object is
constructed without updating the `_event_id` field, or vice versa - either of
which would be very problematic anyway and the behavior of `str(event)` is the
least of our worries.
The major change is moving the decision of whether to use oEmbed
further up the call-stack. This reverts the _download_url method to
being a "dumb" functionwhich takes a single URL and downloads it
(as it was before #7920).
This also makes more minor refactorings:
* Renames internal variables for clarity.
* Factors out shared code between the HTML and rich oEmbed
previews.
* Fixes tests to preview an oEmbed image.
* add tests for checking if room search works with non-ascii char
* change encoding on parse_string to UTF-8
* lints
* properly encode search term
* lints
* add changelog file
* update changelog number
* set changelog entry filetype to .bugfix
* Revert "set changelog entry filetype to .bugfix"
This reverts commit be8e5a314251438ec4ec7dbc59ba32162c93e550.
* update changelog message and file type
* change parse_string default encoding back to ascii and update room search admin api calll to parse string
* refactor tests
* Update tests/rest/admin/test_room.py
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
It's a simplification, but one that'll help make the user directory logic easier
to follow with the other changes upcoming. It's not strictly required for those
changes, but this will help simplify the resulting logic that listens for
`m.room.member` events and generally make the logic easier to follow.
This means the config option `search_all_users` ends up controlling the
search query only, and not the data we store. The cost of doing so is an
extra row in the `user_directory` and `user_directory_search` tables for
each local user which
- belongs to no public rooms
- belongs to no private rooms of size ≥ 2
I think the cost of this will be marginal (since they'll already have entries
in `users` and `profiles` anyway).
As a small upside, a homeserver whose directory was built with this
change can toggle `search_all_users` without having to rebuild their
directory.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Adds missing type hints to methods in the synapse.handlers
module and requires all methods to have type hints there.
This also removes the unused construct_auth_difference method
from the FederationHandler.
In `MatrixFederationHttpClient._send_request()`, we make a HTTP request
using an `Agent`, wrap that request in a timeout and await the resulting
`Deferred`. On its own, the `Agent` performing the HTTP request
correctly stashes and restores the logging context while waiting.
The addition of the timeout introduces a path where the logging context
is not restored when execution resumes.
To address this, we wrap the timeout `Deferred` in a
`make_deferred_yieldable()` to stash the logging context and restore it
on completion of the `await`. However this is not sufficient, since by
the time we construct the timeout `Deferred`, the `Agent` has already
stashed and cleared the logging context when using
`make_deferred_yieldable()` to produce its `Deferred` for the request.
Hence, we wrap the `Agent` request in a `run_in_background()` to "fork"
and preserve the logging context so that we can stash and restore it
when `await`ing the timeout `Deferred`.
This approach is similar to the one used with `defer.gatherResults`.
Note that the code is still not fully correct. When a timeout occurs,
the request remains running in the background (existing behavior which
is nothing to do with the new call to `run_in_background`) and may
re-start the logging context after it has finished.
I had one of these error messages yesterday and assumed it was an
invalid auth token (because that was an HTTP query parameter in the
test) I was working on. In fact, it was an invalid next batch token for
syncing.
We've already batched up the events previously, and assume in other
places in the events.py file that we have. Removing this makes it easier
to adjust the batch sizes in one place.
Hint to clients via the room capabilities API (MSC3244) that
room version 9 should be preferred for creating a room with
restricted join rules (instead of room version 8).
* Split up the documentation in several files rather than one huge one
* Add examples for each callback category
* Other niceties like fixing https://github.com/matrix-org/synapse/issues/10632
* Add titles to callbacks so they're easier to find in the navigation panels and link to
When releasing 1.42.0 with @Azrenbeth and talking with @clokep yesterday I realised doing the dch incantations related to releasing Synapse wasn't trivial on eg a macOS system, so this is a script to run in a Debian container to make things a bit easier.
This adds the format to the request arguments / URL to
ensure that JSON data is returned (which is all that
Synapse supports).
This also adds additional error checking / filtering to the
configuration file to ignore XML-only providers.
I think I have finally teased apart the codepaths which handle outliers, and those that handle non-outliers.
Let's add some assertions to demonstrate my newfound knowledge.
If we're persisting an event E which has auth_events A1, A2, then we ought to make sure that we correctly auth
and persist A1 and A2, before we blindly accept E.
This PR does part of that - it persists the auth events first - but it does not fully solve the problem, because we
still don't check that the auth events weren't rejected.
The full event content cannot be trusted from this API (as no auth
chain, etc.) is processed over federation. Returning the full event
content was a bug as MSC2946 specifies that only the stripped
state should be returned.
This also avoids calculating aggregations / annotations which go
unused.