Commit Graph

837 Commits

Author SHA1 Message Date
Erik Johnston
d35bed8369
Don't wake up destination transaction queue if they're not due for retry. (#16223) 2023-09-04 17:14:09 +01:00
Erik Johnston
f84baecb6f
Don't reset retry timers on "valid" error codes (#16221) 2023-09-04 14:04:43 +01:00
David Robertson
62a1a9be52
Describe which rate limiter was hit in logs (#16135) 2023-08-30 00:39:39 +01:00
Patrick Cloke
9ec3da06da
Bump mypy-zope & mypy. (#16188) 2023-08-29 10:38:56 -04:00
Mathieu Velten
501da8ecd8
Task scheduler: add replication notify for new task to launch ASAP (#16184) 2023-08-28 14:03:51 +00:00
David Robertson
e691243e19
Fix typechecking with twisted trunk (#16121) 2023-08-24 14:53:07 +00:00
Mathieu Velten
873971a8b9
Task scheduler: mark task as active if we are scheduling ASAP (#16165) 2023-08-23 13:37:51 +02:00
Shay
69048f7b48
Add an admin endpoint to allow authorizing server to signal token revocations (#16125) 2023-08-22 14:15:34 +00:00
Mathieu Velten
358896e1b8
Implements a task scheduler for resumable potentially long running tasks (#15891) 2023-08-21 14:17:13 +02:00
David Robertson
47c629bb27
Attempt to fix twisted trunk (#16115) 2023-08-15 16:07:13 +00:00
Patrick Cloke
ad3f43be9a
Run pyupgrade for python 3.7 & 3.8. (#16110) 2023-08-15 08:11:20 -04:00
Mathieu Velten
f0a860908b
Allow config of the backoff algorithm for the federation client. (#15754)
Adds three new configuration variables:

* destination_min_retry_interval is identical to before (10mn).
* destination_retry_multiplier is now 2 instead of 5, the maximum value will
  be reached slower.
* destination_max_retry_interval is one day instead of (essentially) infinity.

Capping this will cause destinations to continue to be retried sometimes instead
of being lost forever. The previous value was 2 ^ 62 milliseconds.
2023-08-03 14:36:55 -04:00
Jason Little
7cbb2a00d1
Add metrics tracking for eviction to ResponseCache (#16028)
Track whether the ResponseCache is evicting due to invalidation
or due to time.
2023-08-01 08:10:49 -04:00
Eric Eastwood
561d06b481
Remove support for Python 3.7 (#15851)
Fix https://github.com/matrix-org/synapse/issues/15836
2023-07-05 18:45:42 -05:00
Jason Little
21fea6b749
Prefill events after invalidate not before when persisting events (#15758)
Fixes #15757
2023-06-14 09:42:18 +01:00
Eric Eastwood
8ddb2de553
Document looping_call() functionality that will wait for the given function to finish before scheduling another (#15772)
Thanks to @erikjohnston for clarifying, https://github.com/matrix-org/synapse/pull/15743#discussion_r1226544457

We don't have to worry about calls stacking up if the given function takes longer than the scheduled time.
2023-06-13 16:34:54 -05:00
Erik Johnston
c485ed1c5a
Clear event caches when we purge history (#15609)
This should help a little with #13476

---------

Co-authored-by: Patrick Cloke <patrickc@matrix.org>
2023-06-08 13:14:40 +01:00
Patrick Cloke
c01343de43
Add stricter mypy options (#15694)
Enable warn_unused_configs, strict_concatenate, disallow_subclassing_any,
and disallow_incomplete_defs.
2023-05-31 07:18:29 -04:00
Eric Eastwood
77156a4bc1
Process previously failed backfill events in the background (#15585)
Process previously failed backfill events in the background because they are bound to fail again and we don't need to waste time holding up the request for something that is bound to fail again.

Fix https://github.com/matrix-org/synapse/issues/13623

Follow-up to https://github.com/matrix-org/synapse/issues/13621 and https://github.com/matrix-org/synapse/issues/13622

Part of making `/messages` faster: https://github.com/matrix-org/synapse/issues/13356
2023-05-24 23:22:24 -05:00
Patrick Cloke
1f55c04cbc
Improve type hints for cached decorator. (#15658)
The cached decorators always return a Deferred, which was not
properly propagated. It was close enough when wrapping coroutines,
but failed if a bare function was wrapped.
2023-05-24 12:59:31 +00:00
Sean Quah
d0de452d12
Fix HomeServers leaking during trial test runs (#15630)
This change fixes two memory leaks during `trial` test runs.

Garbage collection is disabled during each test case and a gen-0 GC is
run at the end of each test. However, when the gen-0 GC is run, the
`TestCase` object usually still holds references to the `HomeServer`
used during the test. As a result, the `HomeServer` gets promoted to
gen-1 and then never garbage collected.

Fix this by periodically running full GCs.

Additionally, fix `HomeServer`s leaking after tests that touch inbound
federation due to `FederationRateLimiter`s adding themselves to a global
set, by turning the set into a `WeakSet`.

Resolves #15622.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-05-19 11:17:12 +01:00
Sean Quah
68dcd2cbcb
Re-type config paths in ConfigErrors to be StrSequences (#15615)
Part of #14809.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-05-18 11:11:30 +01:00
Andrew Morgan
7c95b65873
Clean up and clarify "Create or modify Account" Admin API documentation (#15544) 2023-05-05 15:51:46 +01:00
David Robertson
3b0083c92a
Use immutabledict instead of frozendict (#15113)
Additionally:

* Consistently use `freeze()` in test

---------

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: 6543 <6543@obermui.de>
2023-03-22 17:15:34 +00:00
dependabot[bot]
9bb2eac719
Bump black from 22.12.0 to 23.1.0 (#15103) 2023-02-22 15:29:09 -05:00
Sean Quah
a302d3ecf7
Remove unnecessary reactor reference from _PerHostRatelimiter (#14842)
Fix up #14812 to avoid introducing a reference to the reactor.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-16 13:16:19 +00:00
Sean Quah
772e8c2385
Fix stack overflow in _PerHostRatelimiter due to synchronous requests (#14812)
When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.

A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.

Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.

Fixes #14480.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-13 00:16:21 +00:00
reivilibre
ba4ea7d13f
Batch up replication requests to request the resyncing of remote users's devices. (#14716) 2023-01-10 11:17:59 +00:00
Patrick Cloke
630d0aeaf6
Support RFC7636 PKCE in the OAuth 2.0 flow. (#14750)
PKCE can protect against certain attacks and is enabled by default. Support
can be controlled manually by setting the pkce_method of each oidc_providers
entry to 'auto' (default), 'always', or 'never'.

This is required by Twitter OAuth 2.0 support.
2023-01-04 14:58:08 -05:00
Patrick Cloke
3aeca2588b
Add missing type hints to tests.config. (#14681) 2022-12-16 08:53:28 -05:00
reivilibre
864c3f85b0
Improve type annotations for the helper methods on a CachedFunction. (#14685) 2022-12-16 13:04:54 +00:00
Mathieu Velten
54c012c5a8
Make handle_new_client_event throws PartialStateConflictError (#14665)
Then adapts calling code to retry when needed so it doesn't 500
to clients.

Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2022-12-15 16:04:23 +00:00
Patrick Cloke
9d8a3234ba
Respond with proper error responses on unknown paths. (#14621)
Returns a proper 404 with an errcode of M_RECOGNIZED for
unknown endpoints per MSC3743.
2022-12-08 11:37:05 -05:00
Patrick Cloke
da77720752
Check the stream position before checking if the cache is empty. (#14639)
An empty cache does not mean the entity has no changed, if
it is earlier than the earliest known stream position return that
the entity *has* changed since the cache cannot accurately
answer that query.
2022-12-08 11:35:49 -05:00
Erik Johnston
cee9445884
Better return type for get_all_entities_changed (#14604)
Help callers from using the return value incorrectly by ensuring
that callers explicitly check if there was a cache hit or not.
2022-12-05 15:19:14 -05:00
Patrick Cloke
6a8310f3df
Compare to the earliest known stream pos in the stream change cache. (#14435)
The internal methods of the StreamChangeCache were inconsistently
treating the earliest known stream position as valid. It is now treated as
invalid, meaning the cache cannot determine if an entity at the earliest
known stream position has changed or not.
2022-12-05 09:00:59 -05:00
Patrick Cloke
1799a54a54
Batch fetch bundled annotations (#14491)
Avoid an n+1 query problem and fetch the bundled aggregations for
m.annotation relations in a single query instead of a query per event.

This applies similar logic for as was previously done for edits in
8b309adb43 (#11660) and threads
in b65acead42 (#11752).
2022-11-22 07:26:11 -05:00
Patrick Cloke
d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Patrick Cloke
13ca8bb2fc
Remove duplicated code to evict entries. (#14410)
This code was factored out to a method, but also left in-place.

Calling this twice in a row makes no sense: the first call will reduce
the size appropriately, but the loop will immediately exit since the
cache size was already reduced.
2022-11-10 15:33:34 -05:00
Eric Eastwood
40fa8294e3
Refactor MSC3030 /timestamp_to_event to move away from our snowflake pull from destination pattern (#14096)
1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
 2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
 3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
2022-10-26 16:10:55 -05:00
Quentin Gliech
8756d5c87e
Save login tokens in database (#13844)
* Save login tokens in database

Signed-off-by: Quentin Gliech <quenting@element.io>

* Add upgrade notes

* Track login token reuse in a Prometheus metric

Signed-off-by: Quentin Gliech <quenting@element.io>
2022-10-26 11:45:41 +01:00
Nick Mills-Barrett
c9dffd5b33
Remove unused @lru_cache decorator (#13595)
* Remove unused `@lru_cache` decorator

Spotted this working on something else.

Co-authored-by: David Robertson <davidr@element.io>
2022-10-25 11:39:25 +01:00
dependabot[bot]
0b7830e457
Bump flake8-bugbear from 21.3.2 to 22.9.23 (#14042)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
2022-10-19 19:38:24 +00:00
Abdullah Osama
a9934d48c1
Making parse_server_name more consistent (#14007)
Fixes #12122
2022-10-11 12:42:11 +00:00
David Robertson
cb20b885cb
Always close _all_ ijson coroutines, even if doing so raises Exceptions (#14065) 2022-10-06 18:17:50 +00:00
David Robertson
6f0c3e669d
Don't require setuptools_rust at runtime (#13952) 2022-09-29 20:16:08 +00:00
Eric Eastwood
29269d9d3f
Fix have_seen_event cache not being invalidated (#13863)
Fix https://github.com/matrix-org/synapse/issues/13856
Fix https://github.com/matrix-org/synapse/issues/13865

> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](https://github.com/matrix-org/complement/pull/214#discussion_r741678240). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to https://github.com/matrix-org/synapse/issues/13625) 
> 
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
> 
> *-- https://github.com/matrix-org/synapse/issues/13856*


### The problem

`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).

Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.

Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
2022-09-27 15:55:43 -05:00
Mathieu Velten
6bd8763804
Add cache invalidation across workers to module API (#13667)
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
2022-09-21 15:32:01 +02:00
reivilibre
cf65433de2
Fix a memory leak when running the unit tests. (#13798) 2022-09-14 15:29:05 +00:00
Erik Johnston
ebfeac7c5d
Check if Rust lib needs rebuilding. (#13759)
This protects against the common mistake of failing to remember to rebuild Rust code after making changes.
2022-09-12 10:03:42 +00:00