==============================
Bugfixes
--------
- Fix a bug introduced in Synapse 1.40.0 that caused Synapse to fail to process incoming federation traffic after handling a large amount of events in a v1 room. ([\#11806](https://github.com/matrix-org/synapse/issues/11806))
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAmHum5QTHGFuZHJld0Bh
bW9yZ2FuLnh5egAKCRCIhIgNLv5f9G6vD/9Dw6V0687vzahnU6aYWaecRSO1sbww
EtcCiXOh0r8HXPAwEIXJYomSTTbPl0eAwP8T1the1WZVQArvRW2VvMzDoBheO8bt
dCm4CTKFCGUsI/GrlFDPtjAEd6kruAETmpHZn7bAFSqtIpissD6FjUwg/ND/NJfM
zVM/bMgM0+js6eD9J5/k16E1ZWj7Lbp+/fKN+qTeQrXzIeT13WQZ4Nz1o/cqe/21
o/coI3FmYq8CzKpfA6qZMDd/OtYpYqwwr7otSmW+6qHeZI/yqeoxlpzfgSZGpRUe
mtXqbQllZQeqrbm8oK96GNmKcThy6awTwJPoD46b1AOkmFdf/lGSqO0lnTQRRPqR
hyPPdrx6Lt2t7DDbVVyUElzkqLPhVJtrItPDC8669sWvmSgsPQdUqIsRmhF+8aJe
ffjRvKbGRICnNZkT+qGf1HwuMHajZyAIHAS/kyFDKUZCvau3VQ1wlyOqVQ1hWr7+
3k3CobjSx4y1bYKXncAK6hWH6lE8M319jaTnVfYXocDLWRonyFf7o286Q+c8WBGF
tY0FzvUPb5S2kktC4WLwcqtcTWK1cu5MI6GfD69EqL7iifJRVyKDeUoD7tiUvgzN
O2wBl2soJIsAU+8y16WQu7p+k2nZIomCCAySnW3C8mlxvv7CKQu0Url8J7IH8uZE
ZVN2MMe7H+ZHJQ==
=Fpsa
-----END PGP SIGNATURE-----
Merge tag 'v1.51.0rc2' into develop
Synapse 1.51.0rc2 (2022-01-24)
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse 1.40.0 that caused Synapse to fail to process incoming federation traffic after handling a large amount of events in a v1 room. ([\#11806](https://github.com/matrix-org/synapse/issues/11806))
* remove reference in comments to python3.6
* upgrade tox python env in script
* bump python version in example for completeness
* upgrade python version requirement in setup doc
* upgrade necessary python version in __init__.py
* upgrade python version in setup.py
* newsfragment
* drops refs to bionic and replace with focal
* bump refs to postgres 9.6 to 10
* fix hanging ci
* try installing tzdata first
* revert change made in b979f336
* ignore new random mypy error while debugging other error
* fix lint error for temporary workaround
* revert change to install list
* try passing env var
* export debian frontend var?
* move line and add comment
* bump pillow dependency
* bump lxml depenency
* install libjpeg-dev for pillow
* bump automat version to one compatible with py3.8
* add libwebp for pillow
* bump twisted trunk python version
* change suffix of newsfragment
* remove redundant python 3.7 checks
* lint
Debug for #8631.
I'm having a hard time tracking down what's going wrong in that issue.
In the reported example, I could see server A sending federation traffic
to server B and all was well. Yet B reports out-of-sync device updates
from A.
I couldn't see what was _in_ the events being sent from A to B. So I
have added some crude logging to track
- when we have updates to send to a remote HS
- the edus we actually accumulate to send
- when a federation transaction includes a device list update edu
- when such an EDU is received
This is a bit of a sledgehammer.
By scraping Open Graph information from the HTML even
when an autodiscovery endpoint is found. The results are
then combined to capture as much information as possible
from the page.
I've never found this terribly useful. I think it was added in the early days
of Synapse, without much thought as to what would actually be useful to log,
and has just been cargo-culted ever since.
Rather, it tends to clutter up debug logs with useless information.
The existing implementation of the `python_twisted_reactor_tick_time` metric is pretty useless, because it *only*
measures the time taken to execute timed calls and callbacks from threads. That neglects everything that
happens off the back of I/O, which is obviously quite a lot for us.
To improve this, I've hooked into a different place in the reactor - in particular, where it calls `epoll`. That call is
the only place it should wait for something to happen - the rest of the loop *should* be quick.
I've also removed `python_twisted_reactor_pending_calls`, because I don't believe anyone ever looks at it, and
it's a nuisance to populate.
Always add state.room_id after the configurable ORDER BY. Otherwise,
for any sort, certain pages can contain results from
other pages. (Especially when sorting by creator, since there may
be many rooms by the same creator)
* Document different order direction of numerical fields
"joined_members", "joined_local_members", "version" and "state_events"
are ordered in descending direction by default (dir=f). Added a note
in tests to explain the differences in ordering.
Signed-off-by: Daniël Sonck <daniel@sonck.nl>
documentation claims that you can use the %(app)s variable in password_reset and email_validation subjects, but if you do you end up with an error 500
Co-authored-by: br4nnigan <10244835+br4nnigan@users.noreply.github.com>
Rather than hooking into the reactor loop, just add a timed task that runs every 100 ms to do the garbage collection.
Part 1 of a quest to simplify the reactor monkey-patching.
Currently when puppeting another user, the user doing the puppeting is
tracked for client IPs and MAU (if configured).
When tracking MAU is important, it becomes necessary to be possible to
also track the client IPs and MAU of puppeted users. As an example a
client that manages user creation and creation of tokens via the Synapse
admin API, passing those tokens for the client to use.
This PR adds optional configuration to enable tracking of puppeted users
into monthly active users. The default behaviour stays the same.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
By returning all of the m.space.child state of the space, not just
the first 50. The number of rooms returned is still capped at 50.
For the federation API this implies that the requesting server will
need to individually query for any other rooms it is not joined to.
This makes the serialization of events synchronous (and it no
longer access the database), but we must manually calculate and
provide the bundled aggregations.
Overall this should cause no change in behavior, but is prep work
for other improvements.
Fixes minor discrepancies between the /hierarchy endpoint described
in MSC2946 and the implementation.
Note that the changes impact the stable and unstable /hierarchy and
unstable /spaces endpoints for both client and federation APIs.
* `_auth_and_persist_outliers`: mark persisted events as outliers
Mark any events that get persisted via `_auth_and_persist_outliers` as, well,
outliers.
Currently this will be a no-op as everything will already be flagged as an
outlier, but I'm going to change that.
* `process_remote_join`: stop flagging as outlier
The events are now flagged as outliers later on, by `_auth_and_persist_outliers`.
* `send_join`: remove `outlier=True`
The events created here are returned in the result of `send_join` to
`FederationHandler.do_invite_join`. From there they are passed into
`FederationEventHandler.process_remote_join`, which passes them to
`_auth_and_persist_outliers`... which sets the `outlier` flag.
* `get_event_auth`: remove `outlier=True`
stop flagging the events returned by `get_event_auth` as outliers. This method
is only called by `_get_remote_auth_chain_for_event`, which passes the results
into `_auth_and_persist_outliers`, which will flag them as outliers.
* `_get_remote_auth_chain_for_event`: remove `outlier=True`
we pass all the events into `_auth_and_persist_outliers`, which will now flag
the events as outliers.
* `_check_sigs_and_hash_and_fetch`: remove unused `outlier` parameter
This param is now never set to True, so we can remove it.
* `_check_sigs_and_hash_and_fetch_one`: remove unused `outlier` param
This is no longer set anywhere, so we can remove it.
* `get_pdu`: remove unused `outlier` parameter
... and chase it down into `get_pdu_from_destination_raw`.
* `event_from_pdu_json`: remove redundant `outlier` param
This is never set to `True`, so can be removed.
* changelog
* update docstring
* Fix AssertionErrors after purging events
If you purged a bunch of events from your database, and then restarted synapse
without receiving more events, then you would get a bunch of AssertionErrors on
restart.
This fixes the situation by rewinding the stream processors.
* `check-newsfragment`: ignore deleted newsfiles
Events returned by `backfill` should not be flagged as outliers.
Fixes:
```
AssertionError: null
File "synapse/handlers/federation.py", line 313, in try_backfill
dom, room_id, limit=100, extremities=extremities
File "synapse/handlers/federation_event.py", line 517, in backfill
await self._process_pulled_events(dest, events, backfilled=True)
File "synapse/handlers/federation_event.py", line 642, in _process_pulled_events
await self._process_pulled_event(origin, ev, backfilled=backfilled)
File "synapse/handlers/federation_event.py", line 669, in _process_pulled_event
assert not event.internal_metadata.is_outlier()
```
See https://sentry.matrix.org/sentry/synapse-matrixorg/issues/231992Fixes#8894.
* Push `get_room_{min,max_stream_ordering}` into StreamStore
Both implementations of this are identical, so we may as well push it down and
get rid of the abstract base class nonsense.
* Remove redundant `StreamStore` class
This is empty now
* Remove redundant `get_current_events_token`
This was an exact duplicate of `get_room_max_stream_ordering`, so let's get rid
of it.
* newsfile
* Wrap `auth.get_user_by_req` in an opentracing span
give `get_user_by_req` its own opentracing span, since it can result in a
non-trivial number of sub-spans which it is useful to group together.
This requires a bit of reorganisation because it also sets some tags (and may
force tracing) on the servlet span.
* Emit opentracing span for encoding json responses
This can be a significant time sink.
* Rename all sync spans with a prefix
* Write an opentracing span for encoding sync response
* opentracing span to group generate_room_entries
* opentracing spans within sync.encode_response
* changelog
* Use the `trace` decorator instead of context managers
This adds some opentracing annotations to ResponseCache, to make it easier to see what's going on; in particular, it adds a link back to the initial trace which is actually doing the work of generating the response.
* remove `start_active_span_from_request`
Instead, pull out a separate function, `span_context_from_request`, to extract
the parent span, which we can then pass into `start_active_span` as
normal. This seems to be clearer all round.
* Remove redundant tags from `incoming-federation-request`
These are all wrapped up inside a parent span generated in AsyncResource, so
there's no point duplicating all the tags that are set there.
* Leave request spans open until the request completes
It may take some time for the response to be encoded into JSON, and that JSON
to be streamed back to the client, and really we want that inside the top-level
span, so let's hand responsibility for closure to the SynapseRequest.
* opentracing logs for HTTP request events
* changelog
* Disable aggregation bundling on `/sync` responses
A partial revert of #11478. This turns out to have had a significant CPU impact
on initial-sync handling. For now, let's disable it, until we find a more
efficient way of achieving this.
* Fix tests.
Co-authored-by: Patrick Cloke <patrickc@matrix.org>
A couple of safety-checks to hopefully stop people doing what I just did, and create a storage
function which only works the first time it is called (and not when it is re-run due to a database
concurrency error or similar).
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
* Make it clear which methods are "internal" to the module.
* Clarify what the methods do.
Create a new dict helper method `simple_insert_many_values_txn`, which takes
raw row values, rather than {key=>value} dicts. This saves us a bunch of dict
munging, and makes it easier to use generators rather than creating
intermediate lists and dicts.
* Move sync_token up to the top
* Pull out _get_ignored_users
* Try to signpost the body of `_generate_sync_entry_for_rooms`
* Pull out _calculate_user_changes
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
After #10847, `looping_background_call` would print an error in the logs
every time a non-async function was called. Since the error would be
caught and ignored immediately, there were no other side effects.
Due to updates to MSC2675 this includes a few fixes:
* Include bundled aggregations for /sync.
* Do not include bundled aggregations for /initialSync and /events.
* Do not bundle aggregations for state events.
* Clarifies comments and variable names.
This mainly consists of docstrings and inline comments. There are one or two type annotations and variable renames thrown in while I was here.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030
Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Federation API endpoint:
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Co-authored-by: Erik Johnston <erik@matrix.org>
* Add check to catch syanpse master process starting when workers are configured
* add test to verify that starting master process with worker config raises error
* newsfragment
* specify config.worker.worker_app in check
* update test
* report specific config option that triggered the error
Co-authored-by: reivilibre <oliverw@matrix.org>
* clarify error message
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: reivilibre <oliverw@matrix.org>
When all entries in an `LruCache` have a size of 0 according to the
provided `size_callback`, and `drop_from_cache` is called on a cache
node, the node would be unlinked from the LRU linked list but remain in
the cache dictionary. An assertion would be later be tripped due to the
inconsistency.
Avoid unintentionally calling `__len__` and use a strict `is None`
check instead when unwrapping the weak reference.
Part of https://github.com/matrix-org/synapse/issues/11300
Call stack:
- `_persist_events_and_state_updates` (added `use_negative_stream_ordering`)
- `_persist_events_txn`
- `_update_room_depths_txn` (added `update_room_forward_stream_ordering`)
- `_update_metadata_tables_txn`
- `_store_room_members_txn` (added `inhibit_local_membership_updates`)
Using keyword-only arguments (`*`) to reduce the mistakes from `backfilled` being left as a positional argument somewhere and being interpreted wrong by our new arguments.
Since e81fa92648, Synapse depends on
the use_float flag which has been introduced in ijson 3.1 and
is not available in 3.0. This is known to cause runtime errors
with send_join.
Signed-off-by: Daniel Molkentin <danimo@infra.run>
Co-authored-by: Daniel Molkentin <danimo@infra.run>
The previous fix for the ongoing event fetches counter
(8eec25a1d9) was both insufficient and
incorrect.
When the database is unreachable, `_do_fetch` never gets run and so
`_event_fetch_ongoing` is never decremented.
The previous fix also moved the `_event_fetch_ongoing` decrement outside
of the `_event_fetch_lock` which allowed race conditions to corrupt the
counter.
This change makes mypy complain if the constants are ever reassigned,
and, more usefully, makes mypy type them as `Literal`s instead of `str`s,
allowing code of the following form to pass mypy:
```py
def do_something(membership: Literal["join", "leave"], ...): ...
do_something(Membership.JOIN, ...)
```
* remove background update code related to deprecated config flag
* changelog entry
* update changelog
* Delete 11394.removal
Duplicate, wrong number
* add no-op background update and change newfragment so it will be consolidated with associated work
* remove unused code
* Remove code associated with deprecated flag from legacy docker dynamic config file
Co-authored-by: reivilibre <oliverw@matrix.org>
Instead of only known relation types. This also reworks the background
update for thread relations to crawl events and search for any relation
type, not just threaded relations.
If `room_list_publication_rules` was configured with a rule with a
non-wildcard alias and a room was created with an alias then an
internal server error would have been thrown.
This fixes the error and properly applies the publication rules
during room creation.
Fixes a bug introduced in #11129: objects signed by the local server, but with
keys other than the current one, could not be successfully verified.
We need to check the key id in the signature, and track down the right key.