Ideally we would replace this with parsing of the Accept header
or something else, but for now just make Synapse spec compliant
by ignoring the unspecced parameter.
It does not seem that this is ever sent by a client, and even if it is
there's a reasonable fallback.
* Change `create_room` return type
* Don't return room alias from /createRoom
* Update other callsites
* Fix up mypy complaints
It looks like new_room_user_id is None iff new_room_id is None. It's a
shame we haven't expressed this in a way that mypy can understand.
* Changelog
* Sort BOOLEAN_COLUMNS and APPEND_ONLY_TABLES
So I can see if a given table is present in logarithmic time, rather
than linear.
* Teach portdb about `un_partial_stated_event_streams`
* Comments comments comments
* Changelog
Previously, when creating a join event in /make_join, we would decide
whether to include additional fields to satisfy restricted room checks
based on the current state of the room. Then, when building the event,
we would capture the forward extremities of the room to use as prev
events.
This is subject to race conditions. For example, when leaving and
rejoining a room, the following sequence of events leads to a misleading
403 response:
1. /make_join reads the current state of the room and sees that the user
is still in the room. It decides to omit the field required for
restricted room joins.
2. The leave event is persisted and the room's forward extremities are
updated.
3. /make_join builds the event, using the post-leave forward extremities.
The event then fails the restricted room checks.
To mitigate the race, we move the read of the forward extremities closer
to the read of the current state. Ideally, we would compute the state
based off the chosen prev events, but that can involve state resolution,
which is expensive.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Update mypy and mypy-zope
* Remove unused ignores
These used to suppress
```
synapse/storage/engines/__init__.py:28: error: "__new__" must return a
class instance (got "NoReturn") [misc]
```
and
```
synapse/http/matrixfederationclient.py:1270: error: "BaseException" has no attribute "reasons" [attr-defined]
```
(note that we check `hasattr(e, "reasons")` above)
* Avoid empty body warnings, sometimes by marking methods as abstract
E.g.
```
tests/handlers/test_register.py:58: error: Missing return statement [empty-body]
tests/handlers/test_register.py:108: error: Missing return statement [empty-body]
```
* Suppress false positive about `JaegerConfig`
Complaint was
```
synapse/logging/opentracing.py:450: error: Function "Type[Config]" could always be true in boolean context [truthy-function]
```
* Fix not calling `is_state()`
Oops!
```
tests/rest/client/test_third_party_rules.py:428: error: Function "Callable[[], bool]" could always be true in boolean context [truthy-function]
```
* Suppress false positives from ParamSpecs
````
synapse/logging/opentracing.py:971: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]" [arg-type]
synapse/logging/opentracing.py:1017: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]" [arg-type]
````
* Drive-by improvement to `wrapping_logic` annotation
* Workaround false "unreachable" positives
See https://github.com/Shoobx/mypy-zope/issues/91
```
tests/http/test_proxyagent.py:626: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:762: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:826: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:838: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:845: error: Statement is unreachable [unreachable]
tests/http/federation/test_matrix_federation_agent.py:151: error: Statement is unreachable [unreachable]
tests/http/federation/test_matrix_federation_agent.py:452: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:60: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:93: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:127: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:152: error: Statement is unreachable [unreachable]
```
* Changelog
* Tweak DBAPI2 Protocol to be accepted by mypy 1.0
Some extra context in:
- https://github.com/matrix-org/python-canonicaljson/pull/57
- https://github.com/python/mypy/issues/6002
- https://mypy.readthedocs.io/en/latest/common_issues.html#covariant-subtyping-of-mutable-protocol-members-is-rejected
* Pull in updated canonicaljson lib
so the protocol check just works
* Improve comments in opentracing
I tried to workaround the ignores but found it too much trouble.
I think the corresponding issue is
https://github.com/python/mypy/issues/12909. The mypy repo has a PR
claiming to fix this (https://github.com/python/mypy/pull/14677) which
might mean this gets resolved soon?
* Better annotation for INTERACTIVE_AUTH_CHECKERS
* Drive-by AUTH_TYPE annotation, to remove an ignore
This replaces the specific `is_room_mention` push rule condition
used in MSC3952 with the generic `exact_event_match` push rule
condition from MSC3758.
No functionality changes due to this.
Previously we would give up upon receiving a 404 from the first server,
instead of trying the rest of the servers in the list.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Fix order of partial state tables when purging
`partial_state_rooms` has an FK on `events` pointing to the join event we
get from `/send_join`, so we must delete from that table before deleting
from `events`.
**NB:** It would be nice to cancel any resync processes for the room
being purged. We do not do this at present. To do so reliably we'd need
an internal HTTP "replication" endpoint, because the worker doing the
resync process may be different to that handling the purge request.
The first time the resync process tries to write data after the deletion
it will fail because we have deleted necessary data e.g. auth
events. AFAICS it will not retry the resync, so the only downside to
not cancelling the resync is a scary-looking traceback.
(This is presumably extremely race-sensitive.)
* Changelog
* admist(?) -> between
* Warn about a race
* Fix typo, thanks Sean
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
---------
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
...when lazy loading of members is not enabled. It's weird to notify
a client that another user's device list has changed when the client
doesn't think that they share a room.
Note that when a room is un-partial stated, device list updates are
emitted for every member in that room over /sync.
Signed-off-by: Sean Quah <seanq@matrix.org>
Fixes#12801.
Complement tests are at
https://github.com/matrix-org/complement/pull/567.
Avoid blocking on full state when handling a subsequent join into a
partial state room.
Also always perform a remote join into partial state rooms, since we do
not know whether the joining user has been banned and want to avoid
leaking history to banned users.
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
It's important that collections returned from `@cached` methods are not
modified, otherwise future retrievals from the cache will return the
modified collection.
This applies to the return values from `@cached` methods and the values
inside the dictionaries returned by `@cachedList` methods. It's not
necessary for the dictionaries returned by `@cachedList` methods
themselves to be read-only.
Signed-off-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
This specifies to search for an exact value match, instead of
string globbing. It only works across non-compound JSON values
(null, boolean, integer, and strings).
The per-room account data is no longer unconditionally
fetched, even if all rooms will be filtered out.
Global account data will not be fetched if it will all be
filtered out.
The previous version of the code could mutate a cached value,
but only if the input requested all devices of a user *and* a specific
device.
To avoid this nonsensical situation we no longer fetch a specific
device ID if all of a user's devices are returned.
This disambiguates keys which attempt to match fields
with a dot in them (e.g. m.relates_to).
Disabled by default behind an experimental configuration flag.
* Fix MediaStorage type hint
* Typecheck tests.rest.media.v1.test_media_storage
* Changelog
* Remove assert and make the comment succinct
* Fix syntax for olddeps
* Tweak http types in Synapse
AFACIS these are correct, and they make mypy happier on tests.http.
* Type hints for test_proxyagent
* type hints for test_srv_resolver
* test_matrix_federation_agent
* tests.http.server._base
* tests.http.__init__
* tests.http.test_additional_resource
* tests.http.test_client
* tests.http.test_endpoint
* tests.http.test_matrixfederationclient
* tests.http.test_servlet
* tests.http.test_simple_client
* tests.http.test_site
* One fixup in tests.server
* Untyped defs
* Changelog
* Fixup syntax for Python 3.7
* Fix olddeps syntax
* Use a twisted IPv4 addr for dummy_address
* Fix typo, thanks Sean
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Remove redundant `Optional`
---------
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
This adds an `event_stream_ordering` column to `current_state_events`,
`local_current_membership` and `room_memberships`. Each of these tables
is regularly joined with the `events` table to get the stream ordering
and denormalising this into each table will yield significant query
performance improvements once used. Includes a background job to
populate these values from the `events` table.
Same idea as https://github.com/matrix-org/synapse/pull/13703.
Signed off by Nick @ Beeper (@fizzadar).
* Accept a Sequence of events in synapse.appservice
This avoids some casts/ignores in the tests I'm about to fixup. It seems
that `List[Mock]` is not a subtype of `List[EventBase]`, but
`Sequence[Mock]` is a subtype of `Sequence[EventBase]`. So presumably
`Mock` is considered a subtype of anything, much like `Any`.
* make tests.appservice.test_scheduler pass mypy
* Extra hints in tests.appservice.test_scheduler
* Extra hints in tests.appservice.test_api
* Extra hints in tests.appservice.test_appservice
* Disallow untyped defs
* Changelog
Co-authored-by: Brad Murray <brad@beeper.com>
Co-authored-by: Nick Barrett <nick@beeper.com>
Copy the suppress_edits push rule from Beeper to implement MSC3958.
9415a1284b/rust/src/push/base_rules.rs (L98-L114)
Ensure that the list of servers in a partial state room always contains
the server we joined off.
Also refactor `get_partial_state_servers_at_join` to return `None` when
the given room is no longer partial stated, to explicitly indicate when
the room has partial state. Otherwise it's not clear whether an empty
list means that the room has full state, or the room is partial stated,
but the server we joined off told us that there are no servers in the
room.
Signed-off-by: Sean Quah <seanq@matrix.org>
Since pyo3-log is initialized very early in the Python start-up
it caches the state of the loggers before they're fully initialized
(and thus are essentially disabled). Whenever we reload the
logging configuration we now also tell pyo3-log to discard
any cached logging configuration it has; it will refetch the
current logging configuration from Python at the next point
it logs.
This fixes Rust log lines not appearing in the homeserver logs.
If a sync request does not need to calculate per-room entries &
is not generating presence & is not generating device list data
(e.g. during initial sync) avoid the expensive calculation of room
specific data.
This is a micro-optimisation for clients syncing simply to receive
to-device information.
This expands the previous optimisation from being only for initial
sync to being for all sync requests.
It also inverts some of the logic to be inclusive instead of exclusive.
The `parse_enum` helper pulls an enum value from the query string
(by delegating down to the parse_string helper with values generated
from the enum).
This is used to pull out "f" and "b" in most places and then we thread
the resulting Direction enum throughout more code.
The previous assumption was that the stream_id column was unique
(for a room ID, receipt type, user ID tuple), but this turned out to be
incorrect.
Now find the max stream ID, then map this back to a database-specific
row identifier and delete other rows which match the (room ID, receipt type,
user ID) tuple, but *not* the row ID.
`run_in_background` calls re-use the current logging context. When they
are not awaited, they can complete after the current logging context has
been marked as finished, which leads to log spam. Use
`run_as_background_process` instead.
Fixes one of the instances of #13090.
Signed-off-by: Sean Quah <seanq@matrix.org>
#14910 fixed the regression introduced by #13873 where sqlite database
migrations would no longer run inside a transaction. However, it
committed the transaction before Synapse updated its bookkeeping of
which migrations have been run, which means that migrations may be run
again after they have completed successfully.
Leave the transaction open at the end of `executescript`, to restore the
old, correct behaviour. Also make the PostgreSQL behaviour consistent
with SQLite.
Fixes#14909.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Better test for bad values in power levels events
The previous test only checked that Synapse didn't raise an exception,
but didn't check that we had correctly interpreted the value of the
dodgy power level.
It also conflated two things: bad room notification levels, and bad user
levels. There _is_ logic for converting the latter to integers, but we
should test it separately.
* Check we ignore types that don't convert to int
* Handle `None` values in `notifications.room`
* Changelog
* Also test that bad values are rejected by event auth
* Docstring
* linter scripttttttttt
* Test boolean values in PL content
* Reject boolean power levels
* Changelog
* Perfer `type(x) is int` to `isinstance(x, int)`
This covered all additional instances I could see where `x` was
user-controlled.
The remaining cases are
```
$ rg -s 'isinstance.*[^_]int'
tests/replication/_base.py
576: if isinstance(obj, int):
synapse/util/caches/stream_change_cache.py
136: assert isinstance(stream_pos, int)
214: assert isinstance(stream_pos, int)
246: assert isinstance(stream_pos, int)
267: assert isinstance(stream_pos, int)
synapse/replication/tcp/external_cache.py
133: if isinstance(result, int):
synapse/metrics/__init__.py
100: if isinstance(calls, (int, float)):
synapse/handlers/appservice.py
262: assert isinstance(new_token, int)
synapse/config/_util.py
62: if isinstance(p, int):
```
which cover metrics, logic related to `jsonschema`, and replication and
data streams. AFAICS these are all internal to Synapse
* Changelog
* Better test for bad values in power levels events
The previous test only checked that Synapse didn't raise an exception,
but didn't check that we had correctly interpreted the value of the
dodgy power level.
It also conflated two things: bad room notification levels, and bad user
levels. There _is_ logic for converting the latter to integers, but we
should test it separately.
* Check we ignore types that don't convert to int
* Handle `None` values in `notifications.room`
* Changelog
* Also test that bad values are rejected by event auth
* Docstring
* linter scripttttttttt
MSC3952 defines push rules which searches for mentions in a list of
Matrix IDs in the event body, instead of searching the entire event
body for display name / local part.
This is implemented behind an experimental configuration flag and
does not yet implement the backwards compatibility pieces of the MSC.
The `/relations` endpoint was not properly handle "live tokens"
(i.e sync tokens), to do this properly we abstract the code that
`/messages` has and re-use it.
* Batch look-ups to see if rooms are partial stated.
* Fix issues found in linting.
* Fix typo.
* Apply suggestions from code review
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Clarify comments.
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Also improve the cache size while we're at it
* is_partial_state_rooms -> is_partial_state_room_batched
* Run `black`
* Improve annotation for `simple_select_many_batch`
* Fix is_partial_state_room_batched impl
* Okay, _actually_ fix impl
* Update description.
* Update synapse/storage/databases/main/room.py
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Run black.
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: David Robertson <davidr@element.io>
On startup, the `_device_list_id_gen` stream id generator is initialized
using the maximum stream id seen in a list of tables. When we started
populating the `device_list_remote_pending` table in #13913, we forgot
to add it to the aforementioned list of tables, so the stream id
generator can hand out old stream ids after a restart. The end result is
that Synapse can fail to handle device list update EDUs after a restart
when a partial state join is in progress.
Add the `device_list_remote_pending` table to the list of tables to
consider when initializing the `_device_list_id_gen` stream id generator.
Signed-off-by: Sean Quah <seanq@matrix.org>
Destination was being used incorrectly (a single destination instead
of a list of destinations was being passed).
This also updates some of the types in the area to not use Collection[str],
which is a footgun.
* Bump the client-side timeout for /state
to allow faster joins resyncs the chance to complete for large rooms.
We have seen this fair poorly (~90s for Matrix HQ's /state) in testing,
causing the resync to advance to another HS who hasn't seen our join yet.
* Changelog
* Milliseconds!!!!
#13873 introduced a regression which causes sqlite database migrations
to no longer run inside a transaction. Wrap them in a transaction again,
to avoid database corruption when migrations are interrupted.
Fixes#14909.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Request partial joins by default
This is a little sloppy, but we are trying to gain confidence in faster
joins in the upcoming RC.
Admins can still opt out by adding the following to their Synapse
config:
```yaml
experimental:
faster_joins: false
```
We may revert this change before the release proper, depending on how
testing in the wild goes.
* Changelog
* Try to fix the backfill test failures
* Upgrade notes
* Postgres compat?
* Allow `AbstractSet` in `StrCollection`
Or else frozensets are excluded. This will be useful in an upcoming
commit where I plan to change a function that accepts `List[str]` to
accept `StrCollection` instead.
* `rooms_to_exclude` -> `rooms_to_exclude_globally`
I am about to make use of this exclusion mechanism to exclude rooms for
a specific user and a specific sync. This rename helps to clarify the
distinction between the global config and the rooms to exclude for a
specific sync.
* Better function names for internal sync methods
* Track a list of excluded rooms on SyncResultBuilder
I plan to feed a list of partially stated rooms for this sync to ignore
* Exclude partial state rooms during eager sync
using the mechanism established in the previous commit
* Track un-partial-state stream in sync tokens
So that we can work out which rooms have become fully-stated during a
given sync period.
* Fix mutation of `@cached` return value
This was fouling up a complement test added alongside this PR.
Excluding a room would mean the set of forgotten rooms in the cache
would be extended. This means that room could be erroneously considered
forgotten in the future.
Introduced in #12310, Synapse 1.57.0. I don't think this had any
user-visible side effects (until now).
* SyncResultBuilder: track rooms to force as newly joined
Similar plan as before. We've omitted rooms from certain sync responses;
now we establish the mechanism to reintroduce them into future syncs.
* Read new field, to present rooms as newly joined
* Force un-partial-stated rooms to be newly-joined
for eager incremental syncs only, provided they're still fully stated
* Notify user stream listeners to wake up long polling syncs
* Changelog
* Typo fix
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Unnecessary list cast
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Rephrase comment
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Another comment
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Fixup merge(?)
* Poke notifier when receiving un-partial-stated msg over replication
* Fixup merge whoops
Thanks MV :)
Co-authored-by: Mathieu Velen <mathieuv@matrix.org>
Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Faster joins: Update room stats and user directory on workers when done
When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.
We wake the room stats and user directory updaters at the appropriate
time in this commit.
Part of #12814 and #12815.
Signed-off-by: Sean Quah <seanq@matrix.org>
* fixup comment
Signed-off-by: Sean Quah <seanq@matrix.org>
* Enable Complement tests for Faster Remote Room Joins on worker-mode
* (dangerous) Add an override to allow Complement to use FRRJ under workers
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Fix race where we didn't send out replication notification
* MORE HACKS
* Fix get_un_partial_stated_rooms_token to take instance_name
* Fix bad merge
* Remove warning
* Correctly advance un_partial_stated_room_stream
* Fix merge
* Add another notify_replication
* Fixups
* Create a separate ReplicationNotifier
* Fix test
* Fix portdb
* Create a separate ReplicationNotifier
* Fix test
* Fix portdb
* Fix presence test
* Newsfile
* Apply suggestions from code review
* Update changelog.d/14752.misc
Co-authored-by: Erik Johnston <erik@matrix.org>
* lint
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Erik Johnston <erik@matrix.org>
* Avoid clearing out forward extremities when doing a second remote join
When joining a restricted room where the local homeserver does not have
a user able to issue invites, we perform a second remote join. We want
to avoid clearing out forward extremities in this case because the
forward extremities we have are up to date and clearing out forward
extremities creates a window in which the room can get bricked if
Synapse crashes.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Do a full join when doing a second remote join into a full state room
We cannot persist a partial state join event into a joined full state
room, so we perform a full state join for such rooms instead. As a
future optimization, we could always perform a partial state join and
compute or retrieve the full state ourselves if necessary.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Add lock around partial state flag for rooms
Signed-off-by: Sean Quah <seanq@matrix.org>
* Preserve partial state info when doing a second partial state join
Signed-off-by: Sean Quah <seanq@matrix.org>
* Add newsfile
* Add a TODO(faster_joins) marker
Signed-off-by: Sean Quah <seanq@matrix.org>
Now that we wait for stream positions whenever we do a HTTP replication
hit, we need to be less brutal in the case where we do timeout (as we
have bugs around this).
Currently, we will try to start a new partial state sync every time we
perform a remote join, which is undesirable if there is already one
running for a given room.
We intend to perform remote joins whenever additional local users wish
to join a partial state room, so let's ensure that we do not start more
than one concurrent partial state sync for any given room.
------------------------------------------------------------------------
There is a race condition where the homeserver leaves a room and later
rejoins while the partial state sync from the previous membership is
still running. There is no guarantee that the previous partial state
sync will process the latest join, so we restart it if needed.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Change Documentation to have v10 as default room version
* Change Default Room version to 10
* Add changelog entry for default room version swap
* Add changelog entry for v10 default room version in docs
* Clarify doc changelog entry
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Improve Documentation changes.
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Update Changelog entry to have correct format
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Update Spec Version to 1.5
* Only need 1 changelog.
* Fix test.
* Update "Changed in" line
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <patrickc@matrix.org>
Serving partial join responses is no longer experimental. They will only be served under the stable identifier if the the undocumented config flag experimental.msc3706_enabled is set to true.
Synapse continues to request a partial join only if the undocumented config flag experimental.faster_joins is set to true; this setting remains present and unaffected.
We were incorrectly checking if the *local* token had been advanced, rather than the token for the remote instance.
In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).
When the local homeserver is already joined to a room and wants to
perform another remote join, we may find it useful to do a non-partial
state join if we already have the full state for the room.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Also use stable name in SendJoinResponse struct
follow-up to #14832
* Changelog
* Fix a rename I missed
* Run black
* Update synapse/federation/federation_client.py
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Use new query param when requesting a partial join
* Read new query param when serving partial join
* Provide new field names when serving partial joins
* Read new field names from partial join response
* Changelog
When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.
A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.
Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.
Fixes#14480.
Signed-off-by: Sean Quah <seanq@matrix.org>
Two parts to this:
* Bundle the whole of the replacement with any edited events. This is backwards-compatible so I haven't put it behind a flag.
* Optionally, inhibit server-side replacement of edited events. This has scope to break things, so it is currently disabled by default.
It doesn't seem valid that HTML entities should appear in
the title field of oEmbed responses, but a popular WordPress
plug-in seems to do it.
There should not be harm in unescaping these.
This has two related changes:
* It enables fast-path processing for an empty filter (`[]`) which was
previously only used for wildcard not-filters (`["*"]`).
* It special cases a `/sync` filter with no-rooms to skip all room
processing, previously we would partially skip processing, but would
generally still calculate intermediate values for each room which were
then unused.
Future changes might consider further optimizations:
* Skip calculating per-room account data when all rooms are filtered (currently
this is thrown away).
* Make similar improvements to other endpoints which support filters.
* Fixes#12277 :Disable sending confirmation email when 3pid is disabled
* Fix test_add_email_if_disabled test case to reflect changes to enable_3pid_changes flag
* Add changelog file
* Rename newsfragment.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
PKCE can protect against certain attacks and is enabled by default. Support
can be controlled manually by setting the pkce_method of each oidc_providers
entry to 'auto' (default), 'always', or 'never'.
This is required by Twitter OAuth 2.0 support.
OpenID specifies the format of the user info endpoint and some
OAuth 2.0 IdPs do not follow it, e.g. NextCloud and Twitter.
This adds subject_template and picture_template options to the
default mapping provider for more flexibility in matching those user
info responses.
This creates a new store method, `process_replication_position` that
is called after `process_replication_rows`. By moving stream ID advances
here this guarantees any relevant cache invalidations will have been
applied before the stream is advanced.
This avoids race conditions where Python switches between threads mid
way through processing the `process_replication_rows` method where stream
IDs may be advanced before caches are invalidated due to class resolution
ordering.
See this comment/issue for further discussion:
https://github.com/matrix-org/synapse/issues/14158#issuecomment-1344048703
Then adapts calling code to retry when needed so it doesn't 500
to clients.
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
if a Synapse deployment upgraded (from < 1.62.0 to >= 1.70.0) then it
is possible for schema deltas to run before background updates causing
drift in the database schema due to:
1. A delta registered a background update to create an index.
2. A delta dropped the above index if it exists (but it yet exist won't since
the background job hasn't run).
3. The code assumed the index was dropped.
To fix this we:
1. Cancel the background update which could create the index.
2. Drop the index again.
3. Drop a related index which is dropped by the background update.
This avoids pulling additional state information (and events) from
the database for each item returned in the hierarchy response.
The room type might be out of date until a background update finishes
running, the worst impact of this would be spaces being treated as rooms
in the hierarchy response. This should self-heal once the background
update finishes.
* Declare new config
* Parse new config
* Read new config
* Don't use trial/our TestCase where it's not needed
Before:
```
$ time trial tests/events/test_utils.py > /dev/null
real 0m2.277s
user 0m2.186s
sys 0m0.083s
```
After:
```
$ time trial tests/events/test_utils.py > /dev/null
real 0m0.566s
user 0m0.508s
sys 0m0.056s
```
* Helper to upsert to event fields
without exceeding size limits.
* Use helper when adding invite/knock state
Now that we allow admins to include events in prejoin room state with
arbitrary state keys, be a good Matrix citizen and ensure they don't
accidentally create an oversized event.
* Changelog
* Move StateFilter tests
should have done this in #14668
* Add extra methods to StateFilter
* Use StateFilter
* Ensure test file enforces typed defs; alphabetise
* Workaround surprising get_current_state_ids
* Whoops, fix mypy
* Enable `--warn-redundant-casts` option in mypy
Doesn't do much but helps me sleep better at night.
* Changelog
* Fix name of the ignore
* Fix one more missed cast
Not sure why I didn't see this one locally, maybe I needed a poetry update
* Remove old comment
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
#11915 introduced the `@cached` `is_interested_in_room` method in
Synapse 1.55.0, which depends upon `get_aliases_for_room`. Add a missing
cache invalidation callback so that the `is_interested_in_room` cache is
invalidated when `get_aliases_for_room` is invalidated.
#13787 made `get_rooms_for_user` `@cached`. Add a missing cache
invalidation callback so that the `is_interested_in_presence` cache is
invalidated when `get_rooms_for_user` is invalidated.
Signed-off-by: Sean Quah <seanq@matrix.org>
Fixes#13655
This change uses ICU (International Components for Unicode) to improve boundary detection in user search.
This change also adds a new dependency on libicu-dev and pkg-config for the Debian packages, which are available in all supported distros.
When Synapse is terminated while running the background update to create
the `receipts_graph` or `receipts_linearized` indexes, the indexes may
be successfully created (or marked as invalid on postgres) while the
background update remains unfinished. When Synapse next starts up, the
background update will fail because the index already exists, or exists
but is invalid on postgres.
Use the existing code to create indices in background updates, since it
handles these edge cases.
Signed-off-by: Sean Quah <seanq@matrix.org>
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
This PR changes http-based image URLs to be https in html templates.
This impacts the Synapse SSO error page, where browsers report mixed
media content warnings.
Also, https://matrix.org/img/vector-logo-email.png is currently broken
but the URL has been updated to be https anyway.
Signed-off-by: Ashish Kumar <ashfame@users.noreply.github.com>
Due to the various fixes to the StreamChangeCache it is not
safe to trust the information in the user directory or room/user
stats tables. Rebuild them as background jobs.
In particular see da77720752 (#14639),
and 6a8310f3df (#14435).
Maybe also be related to fac8a38525
(#14592).
An empty cache does not mean the entity has no changed, if
it is earlier than the earliest known stream position return that
the entity *has* changed since the cache cannot accurately
answer that query.
A batch of changes intended to make it easier to trace to-device messages through the system.
The intention here is that a client can set a property org.matrix.msgid in any to-device message it sends. That ID is then included in any tracing or logging related to the message. (Suggestions as to where this field should be documented welcome. I'm not enthusiastic about speccing it - it's very much an optional extra to help with debugging.)
I've also generally improved the data we send to opentracing for these messages.
The internal methods of the StreamChangeCache were inconsistently
treating the earliest known stream position as valid. It is now treated as
invalid, meaning the cache cannot determine if an entity at the earliest
known stream position has changed or not.
Add logic to ClientRestResource to decide whether to mount servlets
or not based on whether the current process is a worker.
This is clearer to see what a worker runs than the completely separate /
copy & pasted list of servlets being mounted for workers.
StreamChangeCache.get_all_changed_entities can return None to signify
it does not have information at the given stream position. Two callers (related
to device lists and presence) were treating this response the same as an empty
list (i.e. there being no updates).
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Support MSC1767's `content.body` behaviour in push rules
* Add the base rules from MSC3933
* Changelog entry
* Flip condition around for finding `m.markup`
* Remove forgotten import
* Add support for MSC3931: Room Version Supports push rule condition
* Create experimental flag for future work, and use it to gate MSC3931
* Changelog entry
* Use `device_one_time_keys_count` to match MSC3202
Rename the `device_one_time_key_counts` key in responses to
`device_one_time_keys_count` to match the name specified by MSC3202.
Also change related variable/class names for consistency.
Signed-off-by: Andrew Ferrazzutti <andrewf@element.io>
* Update changelog.d/14565.misc
* Revert name change for `one_time_key_counts` key
as this is a different key altogether from `device_one_time_keys_count`,
which is used for `/sync` instead of appservice transactions.
Signed-off-by: Andrew Ferrazzutti <andrewf@element.io>
`setup()` is run under the sentinel context manager, so we wrap the
initial update in a background process. Before this change, Synapse
would log two warnings on startup:
Starting db txn 'count_daily_users' from sentinel context
Starting db connection from sentinel context: metrics will be lost
Signed-off-by: Sean Quah <seanq@matrix.org>
Include the thread_id field when sending read receipts over
federation. This might result in the same user having multiple
read receipts per-room, meaning multiple EDUs must be sent
to encapsulate those receipts.
This restructures the PerDestinationQueue APIs to support
multiple receipt EDUs, queue_read_receipt now becomes linear
time in the number of queued threaded receipts in the room for
the given user, it is expected this is a small number since receipt
EDUs are sent as filler in transactions.
To perform an emulated upsert into a table safely, we must either:
* lock the table,
* be the only writer upserting into the table
* or rely on another unique index being present.
When the 2nd or 3rd cases were applicable, we previously avoided locking
the table as an optimization. However, as seen in #14406, it is easy to
slip up when adding new schema deltas and corrupt the database.
The only time we lock when performing emulated upserts is while waiting
for background updates on postgres. On sqlite, we do no locking at all.
Let's remove the option to skip locking tables, so that we don't shoot
ourselves in the foot again.
Signed-off-by: Sean Quah <seanq@matrix.org>
This commit adds support for handling a provided avatar picture URL
when logging in via SSO.
Signed-off-by: Ashish Kumar <ashfame@users.noreply.github.com>
Fixes#9357.
This was the last untyped handler from the HomeServer object. Since
it was being treated as Any (and thus unchecked) it was being used
incorrectly in a few places.
When a local device list change is added to
`device_lists_changes_in_room`, the `converted_to_destinations` flag is
set to `FALSE` and the `_handle_new_device_update_async` background
process is started. This background process looks for unconverted rows
in `device_lists_changes_in_room`, copies them to
`device_lists_outbound_pokes` and updates the flag.
To update the `converted_to_destinations` flag, the database performs a
`DELETE` and `INSERT` internally, which fragments the table. To avoid
this, track unconverted rows using a `(stream ID, room ID)` position
instead of the flag.
From now on, the `converted_to_destinations` column indicates rows that
need converting to outbound pokes, but does not indicate whether the
conversion has already taken place.
Closes#14037.
Signed-off-by: Sean Quah <seanq@matrix.org>
Avoid an n+1 query problem and fetch the bundled aggregations for
m.reference relations in a single query instead of a query per event.
This applies similar logic for as was previously done for edits in
8b309adb43 (#11660; threads
in b65acead42 (#11752); and
annotations in 1799a54a54 (#14491).
Avoid an n+1 query problem and fetch the bundled aggregations for
m.annotation relations in a single query instead of a query per event.
This applies similar logic for as was previously done for edits in
8b309adb43 (#11660) and threads
in b65acead42 (#11752).
* Add tests for StreamIdGenerator
* Drive-by: annotate all defs
* Revert "Revert "Remove slaved id tracker (#14376)" (#14463)"
This reverts commit d63814fd73, which in
turn reverted 36097e88c4. This restores
the latter.
* Fix StreamIdGenerator not handling unpersisted IDs
Spotted by @erikjohnston.
Closes#14456.
* Changelog
Co-authored-by: Nick Mills-Barrett <nick@fizzadar.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.
Also adds some missing type hints which were simple to fill in.
As part of the database migration to support threaded receipts, there is
a possible window in between
`73/08thread_receipts_non_null.sql.postgres` removing the original
unique constraints on `receipts_linearized` and `receipts_graph` and the
`reeipts_linearized_unique_index` and `receipts_graph_unique_index`
background updates from `72/08thread_receipts.sql` completing where
the unique constraints on `receipts_linearized` and `receipts_graph` are
missing. Any emulated upserts on these tables must therefore be
performed with a lock held, otherwise duplicate rows can end up in the
tables when there are concurrent emulated upserts. Fix the missing lock.
Note that emulated upserts no longer happen by default on sqlite, since
the minimum supported version of sqlite supports native upserts by
default now.
Finally, clean up any duplicate receipts that may have crept in before
trying to create the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes.
Signed-off-by: Sean Quah <seanq@matrix.org>
We don't filter state usually, so doing so here is a waste of time. This is not much of an issue for clients that enable lazy loading of members, since there will be fewer state events.
This matches the multi instance writer ID generator class which can
both handle advancing the current token over replication and by calling
the database.
This code was factored out to a method, but also left in-place.
Calling this twice in a row makes no sense: the first call will reduce
the size appropriately, but the loop will immediately exit since the
cache size was already reduced.
PostgreSQL may underestimate the number of distinct `room_id`s in
`event_search`, which can cause it to use table scans for queries for
multiple rooms.
Fix this by setting `n_distinct` on the column.
Resolves#14402.
Signed-off-by: Sean Quah <seanq@matrix.org>
When this background update did its last batch, it would try to update all the
events that had been inserted since the bgupdate started, which could cause a
table-scan. Make sure we limit the update correctly.
For forward compatibility, Synapse needs to ignore fields it does not
recognise instead of raising an error.
Fixes#14365.
Signed-off-by: Sean Quah <seanq@matrix.org>
==============================
Please note that, as announced in the release notes for Synapse 1.69.0, legacy Prometheus metric names are now disabled by default.
They will be removed altogether in Synapse 1.73.0.
If not already done, server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.71/upgrade.html#upgrading-to-v1710) for more details.
Improved Documentation
----------------------
- Document the changes to monthly active user metrics due to deprecation of legacy Prometheus metric names. ([\#14358](https://github.com/matrix-org/synapse/issues/14358), [\#14360](https://github.com/matrix-org/synapse/issues/14360))
Deprecations and Removals
-------------------------
- Disable legacy Prometheus metric names by default. They can still be re-enabled for now, but they will be removed altogether in Synapse 1.73.0. ([\#14353](https://github.com/matrix-org/synapse/issues/14353))
Internal Changes
----------------
- Run unit tests against Python 3.11. ([\#13812](https://github.com/matrix-org/synapse/issues/13812))
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmNlD5YACgkQ+84Kzgcy
GG8T0Qv+PkTt9+lUbDAsqWnqWtSYCit4DEZU/g0G+/xUVdZypo+pIHYzzEHEEPoB
Siinh5sLqUagoOn0i+YU+D6WZw+20F5+qdQi6QSkpPDorReYBl+4MvyRNqjn3lAE
a/bfyHY3+88SfVMza+GRQS7pU06CCMD5aqUZABnYwV5fpKWdF2hOESSuzqEElQDR
7zaZuc7I/8d/VNz79FmyDfGzYnvWS+tr0BcNIyye6OIzZ2AS+q8loBzQiJ5rd1kY
ZiGeAHfeu/r05X25kzD1E1qr04ycM7mFjDgBa/aSdf5kkimNNUISo424sawoCVOU
oDm+QVefH4zKK9b+DVAJ7eaHIaQsRjJir+h/umXrX09I/x/eMB7Zll1wy1T+KzUS
Zf4GCVCI1kKY3FzcPWh2Jf1j899JtbkrKasA+UYf3+PEeFxQ4LyjYlVHC+J1sCXL
NGn5YDZkvhfpTOR300eDisUsxKIBFPxL6FjzW5whOnBddKC+VVnlSQkAbn6yHQGE
LU1xKhta
=m6ze
-----END PGP SIGNATURE-----
Merge tag 'v1.71.0rc2' into develop
Synapse 1.71.0rc2 (2022-11-04)
==============================
Please note that, as announced in the release notes for Synapse 1.69.0, legacy Prometheus metric names are now disabled by default.
They will be removed altogether in Synapse 1.73.0.
If not already done, server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.71/upgrade.html#upgrading-to-v1710) for more details.
Improved Documentation
----------------------
- Document the changes to monthly active user metrics due to deprecation of legacy Prometheus metric names. ([\#14358](https://github.com/matrix-org/synapse/issues/14358), [\#14360](https://github.com/matrix-org/synapse/issues/14360))
Deprecations and Removals
-------------------------
- Disable legacy Prometheus metric names by default. They can still be re-enabled for now, but they will be removed altogether in Synapse 1.73.0. ([\#14353](https://github.com/matrix-org/synapse/issues/14353))
Internal Changes
----------------
- Run unit tests against Python 3.11. ([\#13812](https://github.com/matrix-org/synapse/issues/13812))
If configured an OIDC IdP can log a user's session out of
Synapse when they log out of the identity provider.
The IdP sends a request directly to Synapse (and must be
configured with an endpoint) when a user logs out.