Commit Graph

61 Commits

Author SHA1 Message Date
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Erik Johnston
8c63e93286
Fix HTTP repl response to use minimum token (#16578) 2023-10-30 12:27:14 +00:00
Erik Johnston
8f35f8148e
Fix bug where a new writer advances their token too quickly (#16473)
* Fix bug where a new writer advances their token too quickly

When starting a new writer (for e.g. persisting events), the
`MultiWriterIdGenerator` doesn't have a minimum token for it as there
are no rows matching that new writer in the DB.

This results in the the first stream ID it acquired being announced as
persisted *before* it actually finishes persisting, if another writer
gets and persists a subsequent stream ID. This is due to the logic of
setting the minimum persisted position to the minimum known position of
across all writers, and the new writer starts off not being considered.

* Fix sending out POSITIONs when our token advances without update

Broke in #14820

* For replication HTTP requests, only wait for minimal position
2023-10-23 16:57:30 +01:00
Jason Little
1df0221bda
Use a custom scheme & the worker name for replication requests. (#15578)
All the information needed is already in the `instance_map`, so
use that instead of passing the hostname / IP & port manually
for each replication request.

This consolidates logic for future improvements of using e.g.
UNIX sockets for workers.
2023-05-23 09:05:30 -04:00
Jason Little
e4f545c452
Remove worker_replication_* settings (#15491)
* Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.

* Fix typo in drive by.

* Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place)

* Several updates:

1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
3. Clean up old comments/commented out code.
4. Adjust unit tests to match with new code.
5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.

* Initial Docs upload

* Changelog

* Missed some commented out code that can go now

* Remove TODO comment that no longer holds true.

* Fix links in docs

* More docs

* Remove debug logging

* Apply suggestions from code review

Co-authored-by: reivilibre <olivier@librepush.net>

* Apply suggestions from code review

Co-authored-by: reivilibre <olivier@librepush.net>

* Update version to latest, include completeish before/after examples in upgrade notes.

* Fix up and docs too

---------

Co-authored-by: reivilibre <olivier@librepush.net>
2023-05-11 11:30:56 +01:00
Jason Little
d3bd03559b
HTTP Replication Client (#15470)
Separate out a HTTP client for replication in preparation for
also supporting using UNIX sockets. The major difference from
the base class is that this does not use treq to handle HTTP
requests.
2023-05-09 14:25:20 -04:00
David Robertson
1bc9985eb7
Have replication clients remove _INT_STREAM_POS (#15309)
* Have replication clients remove _INT_STREAM_POS

Suppose worker A makes an internal http request from worker B. B may
make changes that A later learns about over replication. We want A's
request to block until it has seen those changes—mainly to ensure A's
caches are invalidated promptly. This helps provide read-after-write
consistency, eliminating entire categories of races and test flakes.

To implement this, B includes a top-level field `_INT_STREAM_POS` in its
response JSON. Roughly speaking, the field's value tells A what to wait
for. But we weren't removing that internal field before A's request
completed!

Introduced in https://github.com/matrix-org/synapse/pull/14820.
Fixes #15308.

* Changelog
2023-03-22 12:53:55 +00:00
Erik Johnston
c78c67c5a9
Fix bug in replication where response is cached (#15024) 2023-02-08 16:41:55 +00:00
Erik Johnston
0ec12a3753
Reduce max time we wait for stream positions (#14881)
Now that we wait for stream positions whenever we do a HTTP replication
hit, we need to be less brutal in the case where we do timeout (as we
have bugs around this).
2023-01-20 21:04:33 +00:00
Erik Johnston
9187fd940e
Wait for streams to catch up when processing HTTP replication. (#14820)
This should hopefully mitigate a class of races where data gets out of
sync due a HTTP replication request racing with the replication streams.
2023-01-18 19:35:29 +00:00
Patrick Cloke
d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Tuomas Ojamies
b5ab2c428a
Support using SSL on worker endpoints. (#14128)
* Fix missing SSL support in worker endpoints.

* Add changelog

* SSL for Replication endpoint

* Remove unit test change

* Refactor listener creation to reduce duplicated code

* Fix the logger message

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Add config documentation for new TLS option

Co-authored-by: Tuomas Ojamies <tojamies@palantir.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2022-11-15 12:55:00 +00:00
reivilibre
7bc110a19e
Generalise the @cancellable annotation so it can be used on functions other than just servlet methods. (#13662) 2022-08-31 11:16:05 +00:00
Patrick Cloke
a6895dd576
Add type annotations to trace decorator. (#13328)
Functions that are decorated with `trace` are now properly typed
and the type hints for them are fixed.
2022-07-19 14:14:30 -04:00
Sean Quah
a559c8b0d9
Respect the @cancellable flag for ReplicationEndpoints (#12700)
While `ReplicationEndpoint`s register themselves via `JsonResource`,
they pass a method that calls the handler, instead of the handler itself,
to `register_paths`. As a result, `JsonResource` will not correctly pick
up the `@cancellable` flag and we have to apply it ourselves.

Signed-off-by: Sean Quah <seanq@element.io>
2022-05-11 12:25:39 +01:00
David Robertson
a2b00a4486
Bump black and click versions (#12320) 2022-03-29 10:41:19 +00:00
Nick Mills-Barrett
180d8ff0d4
Retry some http replication failures (#12182)
This allows for the target process to be down for around a minute
which provides time for restarts during synapse upgrades/config updates.

Closes: #12178

Signed off by Nick Mills-Barrett nick@beeper.com
2022-03-09 14:53:28 +00:00
Erik Johnston
6d14b3dabf
Better error message when failing to request from another process (#12060) 2022-02-22 15:52:08 +00:00
Patrick Cloke
63d90f10ec
Add missing type hints to synapse.replication.http. (#11856) 2022-02-08 07:44:39 -05:00
Sean Quah
2b82ec425f
Add type hints for most HomeServer parameters (#11095) 2021-10-22 18:15:41 +01:00
Sean Quah
6b18eb4430
Fix opentracing and Prometheus metrics for replication requests (#10996)
This commit fixes two bugs to do with decorators not instrumenting
`ReplicationEndpoint`'s `send_request` correctly. There are two
decorators on `send_request`: Prometheus' `Gauge.track_inprogress()`
and Synapse's `opentracing.trace`.

`Gauge.track_inprogress()` does not have any support for async
functions when used as a decorator. Since async functions behave like
regular functions that return coroutines, only the creation of the
coroutine was covered by the metric and none of the actual body of
`send_request`.

`Gauge.track_inprogress()` returns a regular, non-async function
wrapping `send_request`, which is the source of the next bug.
The `opentracing.trace` decorator would normally handle async functions
correctly, but since the wrapped `send_request` is a non-async function,
the decorator ends up suffering from the same issue as
`Gauge.track_inprogress()`: the opentracing span only measures the
creation of the coroutine and none of the actual function body.

Using `Gauge.track_inprogress()` as a context manager instead of a
decorator resolves both bugs.
2021-10-12 11:23:46 +01:00
Patrick Cloke
bb7fdd821b
Use direct references for configuration variables (part 5). (#10897) 2021-09-24 07:25:21 -04:00
Jonathan de Jong
bf72d10dbf
Use inline type hints in various other places (in synapse/) (#10380) 2021-07-15 11:02:43 +01:00
Richard van der Hoff
d7808a2dde
Extend ResponseCache to pass a context object into the callback (#10157)
This is the first of two PRs which seek to address #8518. This first PR lays the groundwork by extending ResponseCache; a second PR (#10158) will update the SyncHandler to actually use it, and fix the bug.

The idea here is that we allow the callback given to ResponseCache.wrap to decide whether its result should be cached or not. We do that by (optionally) passing a ResponseCacheContext into it, which it can modify.
2021-06-14 10:26:09 +01:00
Richard van der Hoff
1bf83a191b
Clean up the interface for injecting opentracing over HTTP (#10143)
* Remove unused helper functions

* Clean up the interface for injecting opentracing over HTTP

* changelog
2021-06-09 11:33:00 +01:00
Erik Johnston
9d25a0ae65
Split presence out of master (#9820) 2021-04-23 12:21:55 +01:00
Jonathan de Jong
4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Jonathan de Jong
d6196efafc
Add ResponseCache tests. (#9458) 2021-03-08 14:00:07 -05:00
Eric Eastwood
0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Erik Johnston
f08ef64926
Enforce all replication HTTP clients calls use kwargs (#9144) 2021-01-18 15:24:04 +00:00
Patrick Cloke
96358cb424
Add authentication to replication endpoints. (#8853)
Authentication is done by checking a shared secret provided
in the Synapse configuration file.
2020-12-04 10:56:28 -05:00
Patrick Cloke
1781bbe319
Add type hints to response cache. (#8507) 2020-10-09 11:35:11 -04:00
Richard van der Hoff
866c84da8d
Add metrics to track success/otherwise of replication requests (#8406)
One hope is that this might provide some insights into #3365.
2020-09-29 11:06:11 +01:00
Jonathan de Jong
a3f124b821
Switch metaclass initialization to python 3-compatible syntax (#8326) 2020-09-16 15:15:55 -04:00
Patrick Cloke
c619253db8
Stop sub-classing object (#8249) 2020-09-04 06:54:56 -04:00
Patrick Cloke
3b415e23a5
Convert replication code to async/await. (#7987) 2020-08-03 07:12:55 -04:00
Patrick Cloke
38e1fac886
Fix some spelling mistakes / typos. (#7811) 2020-07-09 09:52:58 -04:00
Erik Johnston
5cdca53aa0
Merge different Resource implementation classes (#7732) 2020-07-03 19:02:19 +01:00
Dagfinn Ilmari Mannsåker
a3f11567d9
Replace all remaining six usage with native Python 3 equivalents (#7704) 2020-06-16 08:51:47 -04:00
Erik Johnston
e5c67d04db
Add option to move event persistence off master (#7517) 2020-05-22 16:11:35 +01:00
Erik Johnston
1de36407d1
Add instance_map config and route replication calls (#7495) 2020-05-14 14:00:58 +01:00
Erik Johnston
0e719f2398
Thread through instance name to replication client. (#7369)
For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
2020-05-01 17:19:56 +01:00
Richard van der Hoff
3e99528f2b
Store room version on invite (#6983)
When we get an invite over federation, store the room version in the rooms table.

The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
2020-02-26 16:58:33 +00:00
Erik Johnston
e8b68a4e4b
Fixup synapse.replication to pass mypy checks (#6667) 2020-01-14 14:08:06 +00:00
Andrew Morgan
54fef094b3
Remove usage of deprecated logger.warn method from codebase (#6271)
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
2019-10-31 10:23:24 +00:00
Erik Johnston
e577a4b2ad Port replication http server endpoints to async/await 2019-10-29 13:00:51 +00:00
Jorik Schellekens
f7c873a643
Trace how long it takes for the send trasaction to complete, including retrys (#5986) 2019-09-05 17:44:55 +01:00
Jorik Schellekens
909827b422
Add opentracing to all client servlets (#5983) 2019-09-05 14:46:04 +01:00
Jorik Schellekens
812ed6b0d5
Opentracing across workers (#5771)
Propagate opentracing contexts across workers


Also includes some Convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject
  extract methods. - useful when injecting into carriers
  locally. Otherwise we'd always have to include our
  own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
  from a request to a servlet
2019-08-22 18:08:07 +01:00
Andrew Morgan
baf081cd3b Bugfixes
--------
 
 - Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAl04Ur0THGFuZHJld0Bh
 bW9yZ2FuLnh5egAKCRCIhIgNLv5f9F4oD/0TY6S/SEd2uAmzor64ojmbX5BOwPzf
 j/wzUTrfvuf40EvkNPDpnejNZSvy/ysbaGQaQusv0SQKlV3xrvdn4RuMvnOWVWck
 kBsO+lvzOaUTR0KHDxN4y9F5eI2NdPbub4847PPVzyqSIHAd+kolxXS8kSBBhwpL
 yfaICWV/AOy5L7xN+JZ9IQpnegVAvUj5DmgXzDHd6VdeiHDVJuARaBgrR5uCkwVS
 ZoLRqZ95XV/qiguMAUvPOwyEqht2mwO64989MswP16YYm8oMkB5QA6I5nYnACsTP
 qk9YcN/oNvEfQXUhttku6MxK1/4yUMPUhEoDBDH7ebc0440QDtWN+IHTdA6oPVZB
 IuStL9YGY16m7Ltx37ZUA4URfNMiSeLHo3zKc/mCAcwxN4HyOjJewtxbG5zKQAOZ
 SMs8UcDwGR4zL1hnt8ZDNYtWwfzJBQIdGjoHvjXJEY7/1csTv2lmAwewFTXiqSAr
 30GW5ews94kotqBK53zZT6V0F5gHNqgGHniOz1ZpqLLxYLqO3LSAGe97CrqlWUdX
 GkhA9tZyweknociD9fyyBmKdcFJ4mL4a+oGI5CMnSMph8UvCY8Y5XMb1T+iYEABI
 tA9G3mBvgkLPj+5V+8QggNkBafSigW2Q4FX7enGsDmiiskZOtfeKrAcVkapD4ooi
 3I7IW5aetZr2IQ==
 =+JBn
 -----END PGP SIGNATURE-----

Merge tag 'v1.2.0rc2' into develop

Bugfixes
--------

- Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
2019-07-24 13:47:51 +01:00