Commit Graph

124 Commits

Author SHA1 Message Date
Eric Eastwood
daf498e099
Fix 500 error on /messages when we accumulate more than 5 backward extremities (#11027)
Found while working on the Gitter backfill script and noticed
it only happened after we sent 7 batches, https://gitlab.com/gitterHQ/webapp/-/merge_requests/2229#note_665906390

When there are more than 5 backward extremities for a given depth,
backfill will throw an error because we sliced the extremity list
to 5 but then try to iterate over the full list. This causes
us to look for state that we never fetched and we get a `KeyError`.

Before when calling `/messages` when there are more than 5 backward extremities:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 258, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 446, in _async_render
    callback_return = await raw_callback_return
  File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/room.py", line 580, in on_GET
    msgs = await self.pagination_handler.get_messages(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/pagination.py", line 396, in get_messages
    await self.hs.get_federation_handler().maybe_backfill(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 133, in maybe_backfill
    return await self._maybe_backfill_inner(room_id, current_depth, limit)
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 386, in _maybe_backfill_inner
    likely_extremeties_domains = get_domains_from_state(states[e_id])
KeyError: '$zpFflMEBtZdgcMQWTakaVItTLMjLFdKcRWUPHbbSZJl'
```
2021-10-14 18:53:45 -05:00
Richard van der Hoff
96fe77c254
Improve the logging in _auth_and_persist_outliers (#11010)
Include the event ids being peristed
2021-10-07 11:43:25 +00:00
Richard van der Hoff
86af6b2f0e
Add a comment in _process_received_pdu (#11011) 2021-10-07 12:20:03 +01:00
Eric Eastwood
392863fbf1
Fix logic flaw preventing tracking of MSC2716 events in existing room versions (#10962)
We correctly allowed using the MSC2716 batch endpoint for
the room creator in existing room versions but accidentally didn't track
the events because of a logic flaw.

This prevented you from connecting subsequent chunks together because it would
throw the unknown batch ID error.

We only want to process MSC2716 events when:

 - The room version supports MSC2716
 - Any room where the homeserver has the `msc2716_enabled` experimental feature enabled and the event is from the room creator
2021-10-05 11:51:57 -05:00
Richard van der Hoff
787af4a106
Host cache_joined_hosts_for_event to caller (#10986)
`_check_event_auth` is only called in two places, and only one of those sets
`send_on_behalf_of`. Warming the cache isn't really part of auth anyway, so
moving it out makes a lot more sense.
2021-10-05 13:01:41 +01:00
Richard van der Hoff
d099535deb
_update_auth_events_and_context_for_auth: add some comments (#10987)
Add some more comments about wtf is going on here.
2021-10-05 12:50:38 +01:00
Richard van der Hoff
cb88ed912b
_check_event_auth: move event validation earlier (#10988)
There's little point in doing a fancy state reconciliation dance if the event
itself is invalid.

Likewise, there's no point checking it again in `_check_for_soft_fail`.
2021-10-05 12:50:07 +01:00
Richard van der Hoff
428174f902
Split event_auth.check into two parts (#10940)
Broadly, the existing `event_auth.check` function has two parts:
 * a validation section: checks that the event isn't too big, that it has the rught signatures, etc. 
   This bit is independent of the rest of the state in the room, and so need only be done once 
   for each event.
 * an auth section: ensures that the event is allowed, given the rest of the state in the room.
   This gets done multiple times, against various sets of room state, because it forms part of
   the state res algorithm.

Currently, this is implemented with `do_sig_check` and `do_size_check` parameters, but I think
that makes everything hard to follow. Instead, we split the function in two and call each part
separately where it is needed.
2021-09-29 18:59:15 +01:00
Richard van der Hoff
2622b28c5c
Inline _check_event_auth for outliers (#10926)
* Inline `_check_event_auth` for outliers

When we are persisting an outlier, most of `_check_event_auth` is redundant:

 * `_update_auth_events_and_context_for_auth` does nothing, because the
   `input_auth_events` are (now) exactly the event's auth_events,
   which means that `missing_auth` is empty.

 * we don't care about soft-fail, kicking guest users or `send_on_behalf_of`
   for outliers

... so the only thing that matters is the auth itself, so let's just do that.

* `_auth_and_persist_fetched_events_inner`: de-async `prep`

`prep` no longer calls any `async` methods, so let's make it synchronous.

* Simplify `_check_event_auth`

We no longer need to support outliers here, which makes things rather simpler.

* changelog

* lint
2021-09-28 15:25:07 +01:00
Richard van der Hoff
0420d4e6a5
Stop trying to auth/persist events whose auth events we do not have. (#10907) 2021-09-24 14:01:45 +01:00
Richard van der Hoff
85551b7a85
Factor out common code for persisting fetched auth events (#10896)
* Factor more stuff out of `_get_events_and_persist`

It turns out that the event-sorting algorithm in `_get_events_and_persist` is
also useful in other circumstances. Here we move the current
`_auth_and_persist_fetched_events` to `_auth_and_persist_fetched_events_inner`,
and then factor the sorting part out to `_auth_and_persist_fetched_events`.

* `_get_remote_auth_chain_for_event`: remove redundant `outlier` assignment

`get_event_auth` returns events with the outlier flag already set, so this is
redundant (though we need to update a test where `get_event_auth` is mocked).

* `_get_remote_auth_chain_for_event`: move existing-event tests earlier

Move a couple of tests outside the loop. This is a bit inefficient for now, but
a future commit will make it better. It should be functionally identical.

* `_get_remote_auth_chain_for_event`: use `_auth_and_persist_fetched_events`

We can use the same codepath for persisting the events fetched as part of an
auth chain as for those fetched individually by `_get_events_and_persist` for
building the state at a backwards extremity.

* `_get_remote_auth_chain_for_event`: use a dict for efficiency

`_auth_and_persist_fetched_events` sorts the events itself, so we no longer
need to care about maintaining the ordering from `get_event_auth` (and no
longer need to sort by depth in `get_event_auth`).

That means that we can use a map, making it easier to filter out events we
already have, etc.

* changelog

* `_auth_and_persist_fetched_events`: improve docstring
2021-09-24 11:56:33 +01:00
Richard van der Hoff
261c9763c4
Simplify _auth_and_persist_fetched_events (#10901)
Combine the two loops over the list of events, and hence get rid of
`_NewEventInfo`. Also pass the event back alongside the context, so that it's
easier to process the result.
2021-09-24 11:56:13 +01:00
Richard van der Hoff
a7304adc7d
Factor out _get_remote_auth_chain_for_event from _update_auth_events_and_context_for_auth (#10884)
* Reload auth events from db after fetching and persisting

In `_update_auth_events_and_context_for_auth`, when we fetch the remote auth
tree and persist the returned events: load the missing events from the database
rather than using the copies we got from the remote server.

This is mostly in preparation for additional refactors, but does have an
advantage in that if we later get around to checking the rejected status, we'll
be able to make use of it.

* Factor out `_get_remote_auth_chain_for_event` from `_update_auth_events_and_context_for_auth`

* changelog
2021-09-23 17:34:33 +01:00
Richard van der Hoff
26f2bfedbf
Factor out a separate EventContext.for_outlier (#10883)
Constructing an EventContext for an outlier is actually really simple, and
there's no sense in going via an `async` method in the `StateHandler`.

This also means that we can resolve a bunch of FIXMEs.
2021-09-22 17:58:57 +01:00
Patrick Cloke
b3590614da
Require type hints in the handlers module. (#10831)
Adds missing type hints to methods in the synapse.handlers
module and requires all methods to have type hints there.

This also removes the unused construct_auth_difference method
from the FederationHandler.
2021-09-20 08:56:23 -04:00
Patrick Cloke
01c88a09cd
Use direct references for some configuration variables (#10798)
Instead of proxying through the magic getter of the RootConfig
object. This should be more performant (and is more explicit).
2021-09-13 13:07:12 -04:00
Richard van der Hoff
abedf7d77f
Get rid of _auth_and_persist_event (#10781)
This is only called in two places, and the code seems much clearer without it.
2021-09-08 19:03:08 +01:00
Richard van der Hoff
aacdce8fc0
Add some assertions about outliers (#10773)
I think I have finally teased apart the codepaths which handle outliers, and those that handle non-outliers. 
Let's add some assertions to demonstrate my newfound knowledge.
2021-09-08 10:41:13 +01:00
Richard van der Hoff
5724883ac2
Persist auth events before the events that rely on them (#10771)
If we're persisting an event E which has auth_events A1, A2, then we ought to make sure that we correctly auth
and persist A1 and A2, before we blindly accept E.

This PR does part of that - it persists the auth events first - but it does not fully solve the problem, because we
still don't check that the auth events weren't rejected.
2021-09-08 10:37:50 +01:00
Richard van der Hoff
f30c9745ab
Underscore-prefix private fields in FederationEventHandler (#10746) 2021-09-07 11:15:51 +01:00
Richard van der Hoff
b298de780a
Stop using BaseHandler in FederationEventHandler (#10745)
It's now only used in a couple of places, so we can drop it altogether.
2021-09-06 14:49:33 +01:00
Richard van der Hoff
56e2a30634
Move maybe_kick_guest_users out of BaseHandler (#10744)
This is part of my ongoing war against BaseHandler. I've moved kick_guest_users into RoomMemberHandler (since it calls out to that handler anyway), and split maybe_kick_guest_users into the two places it is called.
2021-09-06 12:17:16 +01:00
Eric Eastwood
1ca70fd312
Allow room creator to send MSC2716 related events in existing room versions (#10566)
* Allow room creator to send MSC2716 related events in existing room versions

Discussed at https://github.com/matrix-org/matrix-doc/pull/2716/#discussion_r682474869

Restoring `get_create_event_for_room_txn` from,
44bb3f0cf5

* Add changelog

* Stop people from trying to redact MSC2716 events in unsupported room versions

* Populate rooms.creator column for easy lookup

> From some [out of band discussion](https://matrix.to/#/!UytJQHLQYfvYWsGrGY:jki.re/$p2fKESoFst038x6pOOmsY0C49S2gLKMr0jhNMz_JJz0?via=jki.re&via=matrix.org), my plan is to use `rooms.creator`. But currently, we don't fill in `creator` for remote rooms when a user is invited to a room for example. So we need to add some code to fill in `creator` wherever we add to the `rooms` table. And also add a background update to fill in the rows missing `creator` (we can use the same logic that `get_create_event_for_room_txn` is doing by looking in the state events to get the `creator`).
>
> https://github.com/matrix-org/synapse/pull/10566#issuecomment-901616642

* Remove and switch away from get_create_event_for_room_txn

* Fix no create event being found because no state events persisted yet

* Fix and add tests for rooms creator bg update

* Populate rooms.creator field for easy lookup

Part of https://github.com/matrix-org/synapse/pull/10566

 - Fill in creator whenever we insert into the rooms table
 - Add background update to backfill any missing creator values

* Add changelog

* Fix usage

* Remove extra delta already included in #10697

* Don't worry about setting creator for invite

* Only iterate over rows missing the creator

See https://github.com/matrix-org/synapse/pull/10697#discussion_r695940898

* Use constant to fetch room creator field

See https://github.com/matrix-org/synapse/pull/10697#discussion_r696803029

* More protection from other random types

See https://github.com/matrix-org/synapse/pull/10697#discussion_r696806853

* Move new background update to end of list

See https://github.com/matrix-org/synapse/pull/10697#discussion_r696814181

* Fix query casing

* Fix ambiguity iterating over cursor instead of list

Fix `psycopg2.ProgrammingError: no results to fetch` error
when tests run with Postgres.

```
SYNAPSE_POSTGRES=1 SYNAPSE_TEST_LOG_LEVEL=INFO python -m twisted.trial tests.storage.databases.main.test_room
```

---

We use `txn.fetchall` because it will return the results as a
list or an empty list when there are no results.

Docs:

> `cursor` objects are iterable, so, instead of calling explicitly fetchone() in a loop, the object itself can be used:
>
> https://www.psycopg.org/docs/cursor.html#cursor-iterable

And I'm guessing iterating over a raw cursor does something weird when there are no results.

---

Test CI failure: https://github.com/matrix-org/synapse/pull/10697/checks?check_run_id=3468916530
```
tests.test_visibility.FilterEventsForServerTestCase.test_large_room
===============================================================================
[FAIL]
Traceback (most recent call last):
  File "/home/runner/work/synapse/synapse/tests/storage/databases/main/test_room.py", line 85, in test_background_populate_rooms_creator_column
    self.get_success(
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 500, in get_success
    return self.successResultOf(d)
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/trial/_synctest.py", line 700, in successResultOf
    self.fail(
twisted.trial.unittest.FailTest: Success result expected on <Deferred at 0x7f4022f3eb50 current result: None>, found failure result instead:
Traceback (most recent call last):
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 701, in errback
    self._startRunCallbacks(fail)
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 764, in _startRunCallbacks
    self._runCallbacks()
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 1751, in gotResult
    current_context.run(_inlineCallbacks, r, gen, status)
--- <exception caught here> ---
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 1657, in _inlineCallbacks
    result = current_context.run(
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/python/failure.py", line 500, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/runner/work/synapse/synapse/synapse/storage/background_updates.py", line 224, in do_next_background_update
    await self._do_background_update(desired_duration_ms)
  File "/home/runner/work/synapse/synapse/synapse/storage/background_updates.py", line 261, in _do_background_update
    items_updated = await update_handler(progress, batch_size)
  File "/home/runner/work/synapse/synapse/synapse/storage/databases/main/room.py", line 1399, in _background_populate_rooms_creator_column
    end = await self.db_pool.runInteraction(
  File "/home/runner/work/synapse/synapse/synapse/storage/database.py", line 686, in runInteraction
    result = await self.runWithConnection(
  File "/home/runner/work/synapse/synapse/synapse/storage/database.py", line 791, in runWithConnection
    return await make_deferred_yieldable(
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/home/runner/work/synapse/synapse/tests/server.py", line 425, in <lambda>
    d.addCallback(lambda x: function(*args, **kwargs))
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection
    compat.reraise(excValue, excTraceback)
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction
    return function(*args, **kwargs)
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/python/compat.py", line 404, in reraise
    raise exception.with_traceback(traceback)
  File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/home/runner/work/synapse/synapse/synapse/storage/database.py", line 786, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/home/runner/work/synapse/synapse/synapse/storage/database.py", line 554, in new_transaction
    r = func(cursor, *args, **kwargs)
  File "/home/runner/work/synapse/synapse/synapse/storage/databases/main/room.py", line 1375, in _background_populate_rooms_creator_column_txn
    for room_id, event_json in txn:
psycopg2.ProgrammingError: no results to fetch
```

* Move code not under the MSC2716 room version underneath an experimental config option

See https://github.com/matrix-org/synapse/pull/10566#issuecomment-906437909

* Add ordering to rooms creator background update

See https://github.com/matrix-org/synapse/pull/10697#discussion_r696815277

* Add comment to better document constant

See https://github.com/matrix-org/synapse/pull/10697#discussion_r699674458

* Use constant field
2021-09-04 00:58:49 -05:00
Richard van der Hoff
1800aabfc2
Split FederationHandler in half (#10692)
The idea here is to take anything to do with incoming events and move it out to a separate handler, as a way of making FederationHandler smaller.
2021-08-26 21:41:44 +01:00