Commit Graph

4695 Commits

Author SHA1 Message Date
Eric Eastwood
fa8616e65c
Fix MSC3030 /timestamp_to_event returning outliers that it has no idea whether are near a gap or not (#14215)
Fix MSC3030 `/timestamp_to_event` endpoint returning `outliers` that it has no idea whether are near a gap or not (and therefore unable to determine whether it's actually the closest event). The reason Synapse doesn't know whether an `outlier` is next to a gap is because our gap checks rely on entries in the `event_edges`, `event_forward_extremeties`, and `event_backward_extremities` tables which is [not the case for `outliers`](2c63cdcc3f/docs/development/room-dag-concepts.md (outliers)).

Also fixes MSC3030 Complement `can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint` test flake.  Although this acted flakey in Complement, if `sync_partial_state` raced and beat us before `/timestamp_to_event`, then even if we retried the failing `/context` request it wouldn't work until we made this Synapse change. With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.

Fix  https://github.com/matrix-org/synapse/issues/13944


### Why did this fail before? Why was it flakey?

Sleuthing the server logs on the [CI failure](https://github.com/matrix-org/synapse/actions/runs/3149623842/jobs/5121449357#step:5:5805), it looks like `hs2:/timestamp_to_event` found `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` event locally. Then when we went and asked for it via `/context`, since it's an `outlier`, it was filtered out of the results -> `You don't have permission to access that event.`

This is reproducible when `sync_partial_state` races and persists `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` before we evaluate `get_event_for_timestamp(...)`. To consistently reproduce locally, just add a delay at the [start of `get_event_for_timestamp(...)`](cb20b885cb/synapse/handlers/room.py (L1470-L1496)) so it always runs after `sync_partial_state` completes.

```py
from twisted.internet import task as twisted_task
d = twisted_task.deferLater(self.hs.get_reactor(), 3.5)
await d
```

In a run where it passes, on `hs2`, `get_event_for_timestamp(...)` finds a different event locally which is next to a gap and we request from a closer one from `hs1` which gets backfilled. And since the backfilled event is not an `outlier`, it's returned as expected during `/context`.

With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
2022-10-18 19:46:25 -05:00
Aaron Raimist
2a76a7369f
Fix hiding devices names over federation (#10015)
And don't include blank opentracing stuff in device list updates.

Signed-off-by: Aaron Raimist <aaron@raim.ist>
2022-10-18 20:54:27 +00:00
Patrick Cloke
dbf18f514e
Update the thread_id right before use (in case the bg update hasn't finished) (#14222)
This avoids running a forced-update of a null thread_id rows.

An index is added (in the background) to hopefully make this
easier in the future.
2022-10-18 14:55:41 +00:00
David Robertson
c3a4780080
When restarting a partial join resync, prioritise the server which actioned a partial join (#14126) 2022-10-18 12:33:18 +01:00
Andrew Morgan
dc02d9f8c5
Avoid checking the event cache when backfilling events (#14164) 2022-10-18 10:33:35 +01:00
Andrew Morgan
828b5502cf
Remove _get_events_cache check optimisation from _have_seen_events_dict (#14161) 2022-10-18 10:33:21 +01:00
Patrick Cloke
4283bd1cf9
Support filtering the /messages API by relation type (MSC3874). (#14148)
Gated behind an experimental configuration flag.
2022-10-17 11:32:11 -04:00
Nick Mills-Barrett
2c2c3f8b2c
Invalidate rooms for user caches when receiving membership events (#14155)
This should fix a race where the event notification comes in over
replication before the state replication, leaving a window during
which a sync may get an incorrect list of rooms for the user.
2022-10-17 13:27:51 +01:00
Eric Eastwood
40bb37eb27
Stop getting missing prev_events after we already know their signature is invalid (#13816)
While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place.

Related to
 - https://github.com/matrix-org/synapse/issues/13622
    - https://github.com/matrix-org/synapse/pull/13635
 - https://github.com/matrix-org/synapse/issues/13676

Part of https://github.com/matrix-org/synapse/issues/13356

Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures.

With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now.

For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761

To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.

 - `backfill`
    - `outgoing-federation-request` `/backfill`
    - `_check_sigs_and_hash_and_fetch`
       - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
          -  `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
          -  `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
   - `_process_pulled_events`
      - `_process_pulled_event` for each validated event
         -  Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
            - `_get_state_ids_after_missing_prev_event`
               - `outgoing-federation-request` `/state_ids`
               -  `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
               -  `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
2022-10-15 00:36:49 -05:00
Patrick Cloke
bc2bd92b93 Merge remote-tracking branch 'origin/release-v1.69' into develop 2022-10-14 14:11:27 -04:00
Patrick Cloke
d1bdeccb50
Accept threaded receipts for events related to the root event. (#14174)
The root node of a thread (and events related to it) are considered
"part of a thread" when validating receipts. This allows clients which
show the root node in both the main timeline and the threaded timeline
to easily send receipts in either.

Note that threaded notifications are not created for these events, these
events created notifications on the main timeline.
2022-10-14 18:05:25 +00:00
Erik Johnston
d241a1350d
Fix background update to use an index (#14181) 2022-10-14 13:46:23 +00:00
Patrick Cloke
126a15794c
Do not allow a None-limit on PaginationConfig. (#14146)
The callers either set a default limit or manually handle a None-limit
later on (by setting a default value).

Update the callers to always instantiate PaginationConfig with a default
limit and then assume the limit is non-None.
2022-10-14 12:30:05 +00:00
Patrick Cloke
9ff4155f6c
Properly invalidate get_thread_id cache. (#14163)
This was missed in 2b6d41ebd6 (#13824).
2022-10-14 07:10:44 -04:00
David Robertson
16c5d95b59
Optimise the event_push_backfill_thread_id bg job (#14172)
Co-authored-by: Erik Johnston <erik@matrix.org>
2022-10-13 17:32:16 +00:00
Patrick Cloke
2019b60f3b
Fix sqlite syntax for upserts. (#14171) 2022-10-13 12:53:24 -04:00
Patrick Cloke
7d59a515bb Properly return the thread ID down sync. (#14159)
Fix a broken conflict in e6e876b9b1,
by not stomping over a field right after creating it.
2022-10-13 12:15:41 -04:00
Patrick Cloke
3bbe532abb
Add an API for listing threads in a room. (#13394)
Implement the /threads endpoint from MSC3856.

This is currently unstable and behind an experimental configuration
flag.

It includes a background update to backfill data, results from
the /threads endpoint will be partial until that finishes.
2022-10-13 08:02:11 -04:00
Patrick Cloke
e6e876b9b1 Return the thread ID properly down sync. (#14159)
A receipt's thread ID, if one exists, should be added to the
body of a receipt.
2022-10-12 12:18:34 -04:00
Patrick Cloke
87099b6ea5
Return the main timeline for events which are not part of a thread. (#14140)
Fixes a bug where threaded receipts could not be sent for the
main timeline.
2022-10-12 12:15:52 -04:00
Nick Mills-Barrett
f9bc5428c4
Batch up calls to get_rooms_for_users (#14109) 2022-10-12 11:36:22 +01:00
Patrick Cloke
09be8ab5f9
Remove the experimental implementation of MSC3772. (#14094)
MSC3772 has been abandoned.
2022-10-12 06:26:39 -04:00
Shay
a86b2f6837
Fix a bug where redactions were not being sent over federation if we did not have the original event. (#13813) 2022-10-11 11:18:45 -07:00
Erik Johnston
02086e1da0
Fix rotating existing notifications in push summary (#14138)
Broke by #14045. Fixes #14120.

Introduced in v1.69.0rc2.
2022-10-11 15:13:32 +00:00
Patrick Cloke
ab8047b4bf
Apply & bundle edits for non-message events. (#14034)
Fixes two related bugs:

* No edit information was bundled for events which aren't `m.room.message`.
* `m.new_content` was not applied for those events.
2022-10-07 15:27:50 +00:00
Patrick Cloke
0b037d6c91
Fix handling of public rooms filter with a network tuple. (#14053)
Fixes two related bugs:

* The handling of `[null]` for a `room_types` filter was incorrect.
* The ordering of arguments when providing both a network tuple
  and room type field was incorrect.
2022-10-05 12:49:52 +00:00
Patrick Cloke
e3d4755454
Fix backwards compatibility with upcoming threads schema changes. (#14045)
Ensure that the upsert will work properly by first updating any existing
rows (in the same way that the background update to backfill data works).
2022-10-05 07:56:05 -04:00
Patrick Cloke
dcced5a8d7
Use threaded receipts when fetching events for push. (#13878)
Update the HTTP and email pushers to consider threaded read receipts
when fetching unread events.
2022-10-04 12:07:02 -04:00
Patrick Cloke
2b6d41ebd6
Recursively fetch the thread for receipts & notifications. (#13824)
Consider an event to be part of a thread if you can follow a
chain of relations up to a thread root.

Part of MSC3773 & MSC3771.
2022-10-04 11:36:16 -04:00
Patrick Cloke
a7ba457b2b
Mark events as read using threaded read receipts from MSC3771. (#13877)
Applies the proper logic for unthreaded and threaded receipts to either
apply to all events in the room or only events in the same thread, respectively.
2022-10-04 10:46:42 -04:00
Patrick Cloke
b4ec4f5e71
Track notification counts per thread (implement MSC3773). (#13776)
When retrieving counts of notifications segment the results based on the
thread ID, but choose whether to return them as individual threads or as
a single summed field by letting the client opt-in via a sync flag.

The summarization code is also updated to be per thread, instead of per
room.
2022-10-04 09:47:04 -04:00
Patrick Cloke
e70c6b720e
Disable pushing for server ACL events (MSC3786). (#13997)
Switches to the stable identifier for MSC3786 and enables it
by default.

This disables pushes of m.room.server_acl events.
2022-10-04 07:08:27 -04:00
Erik Johnston
5a6d025246
Clear out old rows from event_push_actions_staging (#14020)
On matrix.org we have ~5 million stale rows in `event_push_actions_staging`, let's add a background job to make sure we clear them out.
2022-10-03 18:44:44 +01:00
Erik Johnston
2c237debd3
Fix bug where we didn't delete staging push actions (#14014)
Introduced in #13719
2022-10-03 13:45:19 +00:00
Erik Johnston
606b2d9009
Add cache to get_partial_state_servers_at_join (#14013) 2022-10-03 13:13:11 +00:00
Sean Quah
d65862c41f
Refactor _get_e2e_device_keys_txn to split large queries (#13956)
Instead of running a single large query, run a single query for
user-only lookups and additional queries for batches of user device
lookups.

Resolves #13580.

Signed-off-by: Sean Quah <seanq@matrix.org>
2022-10-03 13:46:36 +01:00
David Robertson
285d72556b
Update mypy and mypy-zope, attempt 3 (#13993)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-09-30 17:36:28 +01:00
David Robertson
8e52cb0bce
Revert "Update mypy and mypy-zope (#13925)"
This reverts commit 6d543d6d9f.
2022-09-30 16:37:48 +01:00
David Robertson
6d543d6d9f
Update mypy and mypy-zope (#13925)
* Update mypy and mypy-zope

* Unignore assigning to LogRecord attributes

Presumably https://github.com/python/typeshed/pull/8064 makes this ok

Cherry-picked from #13521

* Remove unused ignores due to mypy ParamSpec fixes

https://github.com/python/mypy/pull/12668

Cherry-picked from #13521

* Remove additional unused ignores

* Fix new mypy complaints related to `assertGreater`

Presumably due to https://github.com/python/typeshed/pull/8077

* Changelog

* Reword changelog

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-09-30 16:34:47 +01:00
Erik Johnston
3dfc4a08dc
Fix performance regression in get_users_in_room (#13972)
Fixes #13942. Introduced in #13575.

Basically, let's only get the ordered set of hosts out of the DB if we need an ordered set of hosts. Since we split the function up the caching won't be as good, but I think it will still be fine as e.g. multiple backfill requests for the same room will hit the cache.
2022-09-30 13:15:32 +01:00
David Robertson
e8f30a76ca
Fix overflows in /messages backfill calculation (#13936)
* Reproduce bug
* Compute `least_function` first
* Substitute `least_function` with an f-string
* Bugfix: avoid overflow

Co-authored-by: Eric Eastwood <erice@element.io>
2022-09-30 11:54:53 +01:00
David Robertson
15754d720f
Update UPSERT comment now that native upserts are the default (#13924) 2022-09-29 19:10:47 +01:00
Nick Mills-Barrett
a466164647
Optimise get_rooms_for_user (drop with_stream_ordering) (#13787) 2022-09-29 13:55:12 +00:00
Brendan Abolivier
be76cd8200
Allow admins to require a manual approval process before new accounts can be used (using MSC3866) (#13556) 2022-09-29 15:23:24 +02:00
Patrick Cloke
8625ad8099
Explicit cast to enforce type hints. (#13939) 2022-09-29 07:22:41 -04:00
Patrick Cloke
568016929f
Clarify that a method returns only unthreaded receipts. (#13937)
By renaming it and updating the docstring.

Additionally, refactors a method which is used only by tests.
2022-09-29 07:07:31 -04:00
Erik Johnston
5f659d4a88
Handle local device list updates during partial join (#13934) 2022-09-28 23:22:35 +01:00
Eric Eastwood
df8b91ed2b
Limit and filter the number of backfill points to get from the database (#13879)
There is no need to grab thousands of backfill points when we only need 5 to make the `/backfill` request with. We need to grab a few extra in case the first few aren't visible in the history.

Previously, we grabbed thousands of backfill points from the database, then sorted and filtered them in the app. Fetching the 4.6k backfill points for `#matrix:matrix.org` from the database takes ~50ms - ~570ms so it's not like this saves a lot of time 🤷. But it might save us more time now that `get_backfill_points_in_room`/`get_insertion_event_backward_extremities_in_room` are more complicated after https://github.com/matrix-org/synapse/pull/13635 

This PR moves the filtering and limiting to the SQL query so we just have less data to work with in the first place.

Part of https://github.com/matrix-org/synapse/issues/13356
2022-09-28 15:26:16 -05:00
Patrick Cloke
1386ce4735
Revert "Stop returning an unused column when handling new receipts. (#13933)" (#13935)
This reverts commit 7766bd5b35 (#13933).

The unused column is actually used, but much further down in the function.
2022-09-28 11:01:41 -04:00
Patrick Cloke
7766bd5b35
Stop returning an unused column when handling new receipts. (#13933) 2022-09-28 10:58:25 -04:00