Commit Graph

743 Commits

Author SHA1 Message Date
Erik Johnston
7620912d84
Add health check endpoint () 2020-08-07 14:21:24 +01:00
Erik Johnston
a7bdf98d01
Rename database classes to make some sense () 2020-08-05 21:38:57 +01:00
Richard van der Hoff
916cf2d439
re-implement daemonize ()
This has long been something I've wanted to do. Basically the `Daemonize` code
is both too flexible and not flexible enough, in that it offers a bunch of
features that we don't use (changing UID, closing FDs in the child, logging to
syslog) and doesn't offer a bunch that we could do with (redirecting stdout/err
to a file instead of /dev/null; having the parent not exit until the child is
running).

As a first step, I've lifted the Daemonize code and removed the bits we don't
use. This should be a non-functional change. Fixing everything else will come
later.
2020-08-04 10:03:41 +01:00
Patrick Cloke
db5970ac6d
Convert ACME code to async/await. () 2020-08-03 07:09:33 -04:00
Olivier Wilkinson (reivilibre)
3aa36b782c Merge branch 'master' into develop 2020-07-30 15:18:36 +01:00
Patrick Cloke
3950ae51ef
Ensure that remove_pusher is always async () 2020-07-30 06:56:55 -04:00
Erik Johnston
2c1b9d6763
Update worker docs with recent enhancements () 2020-07-29 23:22:13 +01:00
Erik Johnston
84d099ae11
Fix typing replication not being handled on master ()
Handling of incoming typing stream updates from replication was not
hooked up on master, affecting setups where typing was handled on a
different worker.

This is really only a problem if the master process is also handling
sync requests, which is unlikely for those that are at the stage of
moving typing off.

The other observable effect is that if a worker restarts or a
replication connection drops then the typing worker will issue a
`POSITION typing`, triggering the master process to try and stream *all*
typing updates from position 0.

Fixes 
2020-07-27 14:10:53 +01:00
Patrick Cloke
00e57b755c
Convert synapse.app to async/await. () 2020-07-17 07:08:56 -04:00
Erik Johnston
f2e38ca867
Allow moving typing off master () 2020-07-16 15:12:54 +01:00
Erik Johnston
f299441cc6
Add ability to shard the federation sender () 2020-07-10 18:26:36 +01:00
Patrick Cloke
8fa7fdd4cb
Pass original request headers from workers to the main process. () 2020-07-09 07:34:46 -04:00
Patrick Cloke
4d978d7db4 Merge branch 'master' into develop 2020-07-02 10:55:41 -04:00
Patrick Cloke
ea26e9a98b Ensure that HTML pages served from Synapse include headers to avoid embedding. 2020-07-02 09:58:31 -04:00
Richard van der Hoff
03619324fc
Create a ListenerConfig object ()
This ended up being a bit more invasive than I'd hoped for (not helped by
generic_worker duplicating some of the code from homeserver), but hopefully
it's an improvement.

The idea is that, rather than storing unstructured `dict`s in the config for
the listener configurations, we parse them into structured
`ListenerConfig` objects.
2020-06-16 12:44:07 +01:00
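A minimal sketch of the idea in the commit above: turn an unstructured listener `dict` into a validated object. The field names (`port`, `type`, `bind_addresses`, `tls`) and the `parse_listener` helper are assumptions made for illustration, not the actual Synapse definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ListenerConfig:
    """Structured view of one listener entry (field names are illustrative)."""

    port: int
    type: str
    bind_addresses: List[str] = field(default_factory=list)
    tls: bool = False


def parse_listener(raw: Dict[str, Any]) -> ListenerConfig:
    """Validate a raw config dict and convert it to a ListenerConfig."""
    port = raw.get("port")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(port, int) or isinstance(port, bool):
        raise ValueError("listener 'port' must be an integer")

    listener_type = raw.get("type", "http")
    if not isinstance(listener_type, str):
        raise ValueError("listener 'type' must be a string")

    bind_addresses = raw.get("bind_addresses", [])
    if not isinstance(bind_addresses, list) or not all(
        isinstance(addr, str) for addr in bind_addresses
    ):
        raise ValueError("listener 'bind_addresses' must be a list of strings")

    return ListenerConfig(
        port=port,
        type=listener_type,
        bind_addresses=list(bind_addresses),
        tls=bool(raw.get("tls", False)),
    )
```

Code that consumes the result can then use attribute access such as `listener.port` rather than `listener["port"]`, so a malformed entry fails once at parse time instead of somewhere deep in the startup path.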
Patrick Cloke
7d2532be36
Discard RDATA from already seen positions. () 2020-06-15 08:44:54 -04:00
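The discard rule named in the commit above could look roughly like the following. This is a sketch under assumptions: tokens are taken to be monotonically increasing integers per stream, and the `StreamPositionTracker` class and its method names are hypothetical.

```python
from typing import Dict, List, Tuple


class StreamPositionTracker:
    """Remembers the last token handled per stream and drops stale batches."""

    def __init__(self) -> None:
        self._positions: Dict[str, int] = {}
        self.processed: List[Tuple[str, int, str]] = []

    def on_rdata(self, stream_name: str, token: int, rows: List[str]) -> bool:
        """Process a batch of rows for `token`; return False if it was discarded."""
        current = self._positions.get(stream_name, 0)
        if token <= current:
            # We have already advanced to (or past) this position, so the
            # batch is a duplicate or arrived late: ignore it.
            return False

        for row in rows:
            self.processed.append((stream_name, token, row))
        self._positions[stream_name] = token
        return True
```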
Patrick Cloke
bd6dc17221
Replace iteritems/itervalues/iterkeys with native versions. () 2020-06-15 07:03:36 -04:00
Patrick Cloke
02f345d053
Attempt to fix PhoneHomeStatsTestCase.test_performance_100 being flaky. () 2020-06-05 07:36:47 -04:00
Andrew Morgan
e91abfd291
async/await get_user_id_by_threepid ()
Based on  

Makes `get_user_id_by_threepid` and its call stack async.
2020-06-03 17:15:57 +01:00
Erik Johnston
ef3934ec8f Ensure we persist and ack the same token 2020-05-27 19:45:42 +01:00
Erik Johnston
35c308731d Speed up processing of federation stream RDATA rows.
Instead of storing and sending an ACK for every single row we send
synchronously, we do it asynchronously while batching up
updates.
2020-05-27 19:34:07 +01:00
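One way to picture the batching described in the commit above: record only the newest token as rows go out, and let a separate step send a single ACK for it. The `AckBatcher` class, its method names, and the `send_ack` callback are hypothetical; in a real deployment `flush` would presumably run from a background task rather than being called by hand.

```python
from typing import Callable, Optional


class AckBatcher:
    """Coalesces per-row ACKs into one ACK for the highest token seen."""

    def __init__(self, send_ack: Callable[[int], None]) -> None:
        self._send_ack = send_ack
        self._latest_token: Optional[int] = None
        self._last_acked: Optional[int] = None

    def on_row_sent(self, token: int) -> None:
        """Cheap bookkeeping on the hot path: remember the newest token only."""
        if self._latest_token is None or token > self._latest_token:
            self._latest_token = token

    def flush(self) -> None:
        """Send one ACK covering everything recorded since the last flush."""
        if self._latest_token is None or self._latest_token == self._last_acked:
            return  # nothing new to acknowledge
        self._send_ack(self._latest_token)
        self._last_acked = self._latest_token
```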
Richard van der Hoff
04729b86f8
Fix incorrect exception handling in KeyUploadServlet.on_POST ()
Introduced in 
2020-05-26 11:42:22 +01:00
Richard van der Hoff
00db90f409
Fix recording of federation stream token ()
A couple of changes of significance:

 * remove the `_last_ack < federation_position` condition, so that
   updates will still be correctly processed after restart

 * Correctly wire up send_federation_ack to the right class.
2020-05-26 11:41:38 +01:00
Erik Johnston
e5c67d04db
Add option to move event persistence off master () 2020-05-22 16:11:35 +01:00
Patrick Cloke
4429764c9f
Return 200 OK for all OPTIONS requests () 2020-05-22 09:30:07 -04:00
Erik Johnston
547e4dd83e
Fix exception reporting due to HTTP request errors. ()
These are business as usual errors, rather than stuff we want to log at
error.
2020-05-22 11:39:20 +01:00
Richard van der Hoff
0bbbd10513
Stub out GET presence requests in the frontend proxy ()
We don't really make any promises about returning accurate presence data when
presence is disabled, so we may as well just return a static response, rather
than making the master handle a request.
2020-05-21 14:36:46 +01:00
Erik Johnston
51055c8c44
Allow ReplicationRestResource to be added to workers ()
This allows workers to talk to each other over HTTP replication.
2020-05-18 12:24:48 +01:00
Erik Johnston
03aff4c75e
Add a worker store for search insertion. ()
This is required as both event persistence and the background update need access to this function. It should be perfectly safe for two workers to write to that table at the same time.
2020-05-15 17:22:47 +01:00
Erik Johnston
4734a7bbe4
Move EventStream handling into default ReplicationDataHandler ()
This is so that the logic can happen on both master and workers when we move event persistence out.
2020-05-14 14:01:39 +01:00
Erik Johnston
1124111a12
Allow censoring of events to happen on workers. ()
This is safe as we can now write to cache invalidation stream on workers, and is required for when we move event persistence off master.
2020-05-13 17:15:40 +01:00
Erik Johnston
1a1da60ad2
Fix new flake8 errors () 2020-05-12 11:20:48 +01:00
Amber Brown
7cb8b4bc67
Allow configuration of Synapse's cache without using synctl or environment variables () 2020-05-11 18:45:23 +01:00
Quentin Gliech
616af44137
Implement OpenID Connect-based login () 2020-05-08 08:30:40 -04:00
Erik Johnston
0e719f2398
Thread through instance name to replication client. ()
For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
2020-05-01 17:19:56 +01:00
Erik Johnston
3085cde577
Use stream.current_token() and remove stream_positions() ()
We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
2020-05-01 15:21:35 +01:00
Patrick Cloke
627b0f5f27
Persist user interactive authentication sessions ()
By persisting the user interactive authentication sessions to the database, this fixes
situations where a user hits different workers throughout their auth session and also
allows sessions to persist through restarts of Synapse.
2020-04-30 13:47:49 -04:00
Erik Johnston
37f6823f5b
Add instance name to RDATA/POSITION commands ()
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
2020-04-29 16:23:08 +01:00
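The echo filtering described in the commit above can be sketched as follows. The `RdataCommand` fields and the `CommandHandler` class are illustrative assumptions, not the real command classes; the only idea taken from the commit message is that a command carries the name of the instance that sent it, so a receiver can drop its own echoes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RdataCommand:
    """Hypothetical RDATA command tagged with the sending instance."""

    stream_name: str
    instance_name: str
    token: int
    row: str


class CommandHandler:
    """Handles incoming commands, skipping ones this instance sent itself."""

    def __init__(self, instance_name: str) -> None:
        self.instance_name = instance_name
        self.handled: List[RdataCommand] = []

    def on_rdata(self, cmd: RdataCommand) -> None:
        if cmd.instance_name == self.instance_name:
            # With a pub/sub transport we receive our own messages back;
            # they carry no new information, so drop them quietly.
            return
        self.handled.append(cmd)
```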
Erik Johnston
38919b521e
Run replication streamers on workers ()
Currently we never write to streams from workers, but that will change soon.
2020-04-28 13:34:12 +01:00
Richard van der Hoff
71a1abb8a1
Stop the master relaying USER_SYNC for other workers ()
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.

In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.

Fixes (I hope) .
2020-04-22 22:39:04 +01:00
Richard van der Hoff
2aa5bf13c8 Merge branch 'release-v1.12.4' into develop 2020-04-22 13:09:23 +01:00
Erik Johnston
51f7eaf908
Add ability to run replication protocol over redis. ()
This is configured via the `redis` config options.
2020-04-22 13:07:41 +01:00
Richard van der Hoff
974c0d726a
Support GET account_data requests on a worker () 2020-04-21 10:46:30 +01:00
Erik Johnston
5016b162fc
Move client command handling out of TCP protocol ()
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`; a future PR will move the server paths too.
2020-04-06 09:58:42 +01:00
Martin Milata
b0db928c63
Extend web_client_location to handle absolute URLs ()
Log warning when filesystem path is used.

Signed-off-by: Martin Milata <martin@martinmilata.cz>
2020-04-03 11:57:34 -04:00
Richard van der Hoff
bae32740da
Remove some run_in_background calls in replication code ()
By running this stuff with `run_in_background`, it won't be correctly reported
against the relevant CPU usage stats.

Fixes 
2020-04-03 12:29:30 +01:00
Erik Johnston
db098ec994 Fix starting workers when federation sending not split out. 2020-03-31 11:25:21 +01:00
Erik Johnston
4f21c33be3
Remove usage of "conn_id" for presence. ()
* Remove `conn_id` usage for UserSyncCommand.

Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replication commands.

This really only affects UserSyncCommand.

* Add CLEAR_USER_SYNCS command that is sent on shutdown.

This should help with the case where a synchrotron gets restarted
gracefully, rather than relying on the 5 minute timeout.
2020-03-30 16:37:24 +01:00
Erik Johnston
4cff617df1
Move catchup of replication streams to worker. ()
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
2020-03-25 14:54:01 +00:00
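The catch-up flow in the commit above (the server announces a `POSITION`, the client fetches what it missed) might be sketched like this. The `CatchupClient` class, the `make_fetcher` helper, and the in-memory list standing in for the database are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # (stream token, payload)


def make_fetcher(log: List[Row]) -> Callable[[int, int], List[Row]]:
    """Hypothetical stand-in for a database query over a stream's rows."""

    def fetch_updates(from_token: int, to_token: int) -> List[Row]:
        # Rows strictly after from_token, up to and including to_token.
        return [r for r in log if from_token < r[0] <= to_token]

    return fetch_updates


class CatchupClient:
    """Catches up on missed rows when told the stream's current position."""

    def __init__(self, fetch_updates: Callable[[int, int], List[Row]]) -> None:
        self._fetch_updates = fetch_updates
        self.position = 0
        self.rows: List[Row] = []

    def on_position(self, new_position: int) -> None:
        if new_position <= self.position:
            return  # already up to date
        # The server does not replay old RDATA; we query for the gap ourselves.
        self.rows.extend(self._fetch_updates(self.position, new_position))
        self.position = new_position
```

The design choice this illustrates is that the sender no longer has to buffer or replay history for each new connection; it only needs to report where the stream currently is.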
Erik Johnston
b1cfaf08af
Merge pull request from matrix-org/erikj/fix_worker_startup
Fix starting workers when federation sending not split out.
2020-03-25 09:42:39 +00:00