Commit Graph

31 Commits

Author SHA1 Message Date
Erik Johnston
8de3703d21
Make event persisters periodically announce position over replication. (#8499)
Currently background proccesses stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress.

This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e using the latest token it sees if its not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.
2020-10-12 15:51:41 +01:00
Patrick Cloke
eebf52be06
Be stricter about JSON that is accepted by Synapse (#8106) 2020-08-19 07:26:03 -04:00
David Vo
4dd27e6d11
Reduce unnecessary whitespace in JSON. (#7372) 2020-08-07 08:02:55 -04:00
Erik Johnston
f299441cc6
Add ability to shard the federation sender (#7798) 2020-07-10 18:26:36 +01:00
Patrick Cloke
38e1fac886
Fix some spelling mistakes / typos. (#7811) 2020-07-09 09:52:58 -04:00
Patrick Cloke
e7efd8f827
Do not use simplejson in Synapse. (#7800) 2020-07-08 07:15:08 -04:00
Patrick Cloke
7d2532be36
Discard RDATA from already seen positions. (#7648) 2020-06-15 08:44:54 -04:00
Erik Johnston
d7983b63a6
Support any process writing to cache invalidation stream. (#7436) 2020-05-07 13:51:08 +01:00
Erik Johnston
37f6823f5b
Add instance name to RDATA/POSITION commands (#7364)
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
2020-04-29 16:23:08 +01:00
Richard van der Hoff
71a1abb8a1
Stop the master relaying USER_SYNC for other workers (#7318)
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.

In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.

Fixes (I hope) #7257.
2020-04-22 22:39:04 +01:00
Richard van der Hoff
82d8b1dd1f
Another go at fixing one-word commands (#7326)
I messed this up last time I tried (#7239 / e13c6c7).
2020-04-22 14:34:31 +01:00
Erik Johnston
51f7eaf908
Add ability to run replication protocol over redis. (#7040)
This is configured via the `redis` config options.
2020-04-22 13:07:41 +01:00
Richard van der Hoff
c3e4b4edb2 Fix warnings about not calling superclass constructor
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
2020-04-07 17:40:22 +01:00
Richard van der Hoff
6a519a0ca0 Remove vestigal references to SYNC replication command
We've ripped pretty much all of this out: let's remove the remains.
2020-04-07 17:40:07 +01:00
Erik Johnston
4f21c33be3
Remove usage of "conn_id" for presence. (#7128)
* Remove `conn_id` usage for UserSyncCommand.

Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replicaiton commands.

This really only effects UserSyncCommand.

* Add CLEAR_USER_SYNCS command that is sent on shutdown.

This should help with the case where a synchrotron gets restarted
gracefully, rather than rely on 5 minute timeout.
2020-03-30 16:37:24 +01:00
Erik Johnston
4cff617df1
Move catchup of replication streams to worker. (#7024)
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
2020-03-25 14:54:01 +00:00
Erik Johnston
a8a50f5b57
Wake up transaction queue when remote server comes back online (#6706)
This will be used to retry outbound transactions to a remote server if
we think it might have come back up.
2020-01-17 10:27:19 +00:00
Erik Johnston
e8b68a4e4b
Fixup synapse.replication to pass mypy checks (#6667) 2020-01-14 14:08:06 +00:00
Amber Brown
32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Erik Johnston
313987187e Fix tightloop over connecting to replication server
If the client failed to process incoming commands during the initial set
up of the replication connection it would immediately disconnect and
reconnect, resulting in a tightloop.

This can happen, for example, when subscribing to a stream that has a
row that is too long in the backlog.

The fix here is to not consider the connection successfully set up until
the client has succesfully subscribed and caught up with the streams.
This ensures that the retry logic timers aren't reset until then,
meaning that if an error does happen during start up the client will
continue backing off before retrying again.
2019-02-26 15:05:41 +00:00
Richard van der Hoff
0e8d78f6aa Logcontexts for replication command handlers
Run the handlers for replication commands as background processes. This should
improve the visibility in our metrics, and reduce the number of "running db
transaction from sentinel context" warnings.

Ideally it means converting the things that fire off deferreds into the night
into things that actually return a Deferred when they are done. I've made a bit
of a stab at this, but it will probably be leaky.
2018-08-17 00:43:43 +01:00
Amber Brown
6350bf925e
Attempt to be more performant on PyPy (#3462) 2018-06-28 14:49:57 +01:00
Richard van der Hoff
3ee4ad09eb Fix json encoding bug in replication
json encoders have an encode method, not a dumps method.
2018-04-03 15:09:48 +01:00
Richard van der Hoff
05630758f2 Use static JSONEncoders
using json.dumps with custom options requires us to create a new JSONEncoder on
each call. It's more efficient to create one upfront and reuse it.
2018-03-29 23:13:33 +01:00
Erik Johnston
9aa5a0af51 Explicitly use simplejson 2018-03-20 09:58:13 +00:00
Erik Johnston
610accbb7f Fix replication after switch to simplejson
Turns out that simplejson serialises namedtuple's as dictionaries rather
than tuples by default.
2018-03-19 16:12:48 +00:00
Erik Johnston
926ba76e23 Replace ujson with simplejson 2018-03-15 23:43:31 +00:00
Erik Johnston
27f26e48b7 Serialize user ip command as json 2017-06-27 16:25:38 +01:00
Erik Johnston
78cefd78d6 Make workers report to master for user ip updates 2017-06-27 14:58:10 +01:00
Erik Johnston
36d2b66f90 Add a timestamp to USER_SYNC command
This timestamp is used to indicate when the user last sync'd
2017-03-31 15:42:22 +01:00
Erik Johnston
7450693435 Initial TCP protocol implementation
This defines the low level TCP replication protocol
2017-03-30 12:54:46 +01:00