forked-synapse

mirror of https://mau.dev/maunium/synapse.git synced 2024-10-01 01:36:05 -04:00

Author	SHA1	Message	Date
Richard van der Hoff	ce428a1abe	Fix EventsStream raising assertions when it falls behind Figuring out how to correctly limit updates from this stream without dropping entries is far more complicated than just counting the number of rows being returned. We need to consider each query separately and, if any one query hits the limit, truncate the results from the others. I think this also fixes some potentially long-standing bugs where events or state changes could get missed if we hit the limit on either query.	2020-04-24 13:59:21 +01:00
Richard van der Hoff	9cbdfb3a2f	Make it clear that the limit for an update_function is a target	2020-04-23 15:45:12 +01:00
Richard van der Hoff	23b28266ac	Remove 'limit' param from `get_repl_stream_updates` API there doesn't seem to be much point in passing this limit all around, since both sides agree it's meant to be 100.	2020-04-23 15:44:35 +01:00
Richard van der Hoff	71a1abb8a1	Stop the master relaying USER_SYNC for other workers (#7318 ) Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication. In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits. Fixes (I hope) #7257.	2020-04-22 22:39:04 +01:00
Erik Johnston	841c581c40	Fix replication metrics when using redis (#7325 )	2020-04-22 16:26:19 +01:00
Richard van der Hoff	82d8b1dd1f	Another go at fixing one-word commands (#7326 ) I messed this up last time I tried (#7239 / `e13c6c7`).	2020-04-22 14:34:31 +01:00
Erik Johnston	51f7eaf908	Add ability to run replication protocol over redis. (#7040 ) This is configured via the `redis` config options.	2020-04-22 13:07:41 +01:00
Richard van der Hoff	0f8f02bc39	On catchup, process each row with its own stream id (#7286 ) Other parts of the code (such as the StreamChangeCache) assume that there will not be multiple changes with the same stream id. This code was introduced in #7024, and I hope this fixes #7206.	2020-04-20 11:43:29 +01:00
Richard van der Hoff	67ff7b8ba0	Improve type checking in `replication.tcp.Stream` (#7291 ) The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290. After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP. I've also introduced a couple of new types, to take out some duplication.	2020-04-17 14:49:55 +01:00
Richard van der Hoff	d7d42387f5	Fix 'generator object is not subscriptable' error (#7290 ) Some of the query functions return generators rather than lists, so we can't index into the result. Happily we already have a copy of the results. (think this was introduced in #7024)	2020-04-16 14:37:06 +01:00
Richard van der Hoff	e13c6c7a96	Handle one-word replication commands correctly `REPLICATE` is now a valid command, and it's nice if you can issue it from the console without remembering to call it `REPLICATE ` with a trailing space.	2020-04-07 17:43:46 +01:00
Richard van der Hoff	c3e4b4edb2	Fix warnings about not calling superclass constructor Separate `SimpleCommand` from `Command`, so that things which don't want to use the `data` property don't have to, and thus fix the warnings PyCharm was giving me about not calling `__init__` in the base class.	2020-04-07 17:40:22 +01:00
Richard van der Hoff	6a519a0ca0	Remove vestigial references to SYNC replication command We've ripped pretty much all of this out: let's remove the remains.	2020-04-07 17:40:07 +01:00
Erik Johnston	ce72355d7f	Fix race in replication (#7226 ) Fixes a race between handling `POSITION` and `RDATA` commands. We do this by simply linearizing handling of them.	2020-04-07 11:01:04 +01:00
Erik Johnston	82498ee901	Move server command handling out of TCP protocol (#7187 ) This completes the merging of server and client command processing.	2020-04-07 10:51:07 +01:00
Erik Johnston	5016b162fc	Move client command handling out of TCP protocol (#7185 ) The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.	2020-04-06 09:58:42 +01:00
Erik Johnston	dfa0782254	Remove connections per replication stream metric. (#7195 ) This broke in a recent PR (#7024) and is no longer useful due to all replication clients implicitly subscribing to all streams, so let's just remove it.	2020-04-01 10:40:46 +01:00
Erik Johnston	4f21c33be3	Remove usage of "conn_id" for presence. (#7128 ) * Remove `conn_id` usage for UserSyncCommand. Each tcp replication connection is assigned a "conn_id", which is used to give an ID to a remotely connected worker. In a redis world, there will no longer be a one to one mapping between connection and instance, so instead we need to replace such usages with an ID generated by the remote instances and included in the replication commands. This really only affects UserSyncCommand. * Add CLEAR_USER_SYNCS command that is sent on shutdown. This should help with the case where a synchrotron gets restarted gracefully, rather than relying on the 5 minute timeout.	2020-03-30 16:37:24 +01:00
Erik Johnston	4cff617df1	Move catchup of replication streams to worker. (#7024 ) This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.	2020-03-25 14:54:01 +00:00
Richard van der Hoff	a564b92d37	Convert `*StreamRow` classes to inner classes (#7116 ) This just helps keep the rows closer to their streams, so that it's easier to see what the format of each stream is.	2020-03-23 13:59:11 +00:00
Richard van der Hoff	b3cee0ce67	Fix processing of `groups` stream, and use symbolic names for streams (#7117 ) `groups` != `receipts` Introduced in #6964	2020-03-23 11:39:36 +00:00
Erik Johnston	fdb1344716	Remove concept of a non-limited stream. (#7011 )	2020-03-20 14:40:47 +00:00
Erik Johnston	9ce4e344a8	Change device list replication to match new semantics. Instead of sending down batches of user ID/host tuples, send down a row per entity (user ID or host).	2020-02-28 11:25:34 +00:00
Erik Johnston	1f773eec91	Port PresenceHandler to async/await (#6991 )	2020-02-26 15:33:26 +00:00
Erik Johnston	0bd8cf435e	Increase MAX_EVENTS_BEHIND for replication clients	2020-02-21 09:04:33 +00:00
Erik Johnston	c3d4ad8afd	Fix sending server up commands from workers (#6811 ) Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2020-01-30 16:42:11 +00:00
Erik Johnston	d5275fc55f	Propagate cache invalidates from workers to other workers. (#6748 ) Currently if a worker invalidates a cache it will be streamed to master, which then didn't forward those to other workers.	2020-01-27 13:47:50 +00:00
Erik Johnston	5d7a6ad223	Allow streaming cache invalidate all to workers. (#6749 )	2020-01-22 10:37:00 +00:00
Erik Johnston	a8a50f5b57	Wake up transaction queue when remote server comes back online (#6706 ) This will be used to retry outbound transactions to a remote server if we think it might have come back up.	2020-01-17 10:27:19 +00:00
Erik Johnston	48c3a96886	Port synapse.replication.tcp to async/await (#6666 ) * Port synapse.replication.tcp to async/await * Newsfile * Correctly document type of on_<FOO> functions as async * Don't be overenthusiastic with the asyncing....	2020-01-16 09:16:12 +00:00
Erik Johnston	e8b68a4e4b	Fixup synapse.replication to pass mypy checks (#6667 )	2020-01-14 14:08:06 +00:00
Richard van der Hoff	6964ea095b	Reduce the reconnect time when replication fails. (#6617 )	2020-01-03 14:19:09 +00:00
Andrew Morgan	cd96b4586f	lint	2019-11-08 15:45:45 +00:00
Andrew Morgan	c4bdf2d785	Remove content from being sent for account data rdata stream	2019-11-08 15:44:02 +00:00
Richard van der Hoff	cc6243b4c0	document the REPLICATE command a bit better (#6305 ) since I found myself wondering how it works	2019-11-04 12:40:18 +00:00
Hubert Chathi	9c94b48bf1	Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify	2019-10-31 12:32:07 -04:00
Andrew Morgan	54fef094b3	Remove usage of deprecated logger.warn method from codebase (#6271 ) Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.	2019-10-31 10:23:24 +00:00
Hubert Chathi	998f7fe7d4	make user signatures a separate stream	2019-10-30 17:22:52 -04:00
Andrew Morgan	4548d1f87e	Remove unnecessary parentheses around return statements (#5931 ) Python will return a tuple whether there are parentheses around the returned values or not. I'm just sick of my editor complaining about this all over the place :)	2019-08-30 16:28:26 +01:00
Amber Brown	4806651744	Replace returnValue with return (#5736 )	2019-07-23 23:00:55 +10:00
Amber Brown	463b072b12	Move logging utilities out of the side drawer of util/ and into logging/ (#5606 )	2019-07-04 00:07:04 +10:00
Amber Brown	32e7c9e7f2	Run Black. (#5482 )	2019-06-20 19:32:02 +10:00
Erik Johnston	b5c62c6b26	Fix relations in worker mode	2019-05-16 10:38:13 +01:00
Richard van der Hoff	4b91c313a9	Combine the CurrentStateDeltaStream into the EventStream	2019-03-27 22:07:05 +00:00
Richard van der Hoff	1f6d6f918a	Make EventStream rows have a type ... as a precursor to combining it with the CurrentStateDelta stream.	2019-03-27 22:07:05 +00:00
Richard van der Hoff	015b3622eb	Skip building a ROW_TYPE when building updates We're about to turn it straight into a JSON object anyway so building a ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.	2019-03-27 21:58:03 +00:00
Richard van der Hoff	f570916a3e	Add parse_row method to replication stream class This will allow individual stream classes to override how a row is parsed.	2019-03-27 21:32:33 +00:00
Richard van der Hoff	71dcb275f1	move FederationStream out to its own file	2019-03-27 21:13:14 +00:00
Richard van der Hoff	aa1e017864	move EventsStream out to its own file	2019-03-27 21:13:14 +00:00
Richard van der Hoff	a5798de067	Move replication.tcp.streams into a package	2019-03-27 21:13:14 +00:00
Richard van der Hoff	acaa18f7dd	Fix/improve some docstrings in the replication code. (#4949 )	2019-03-27 21:12:36 +00:00
Richard van der Hoff	8cbbedaa2b	Fix ClientReplicationStreamProtocol.__str__ (#4929 ) `__str__` depended on `self.addr`, which was absent from ClientReplicationStreamProtocol, so attempting to call str on such an object would raise an exception. We can calculate the peer addr from the transport, so there is no need for addr anyway.	2019-03-25 16:41:51 +00:00
Richard van der Hoff	9bde730ef8	Fix bug where read-receipts lost their timestamps (#4927 ) Make sure that they are sent correctly over the replication stream. Fixes: #4898	2019-03-25 16:38:05 +00:00
Richard van der Hoff	cdb8036161	Add a config option for torture-testing worker replication. (#4902 ) Setting this to 50 or so makes a bunch of sytests fail in worker mode.	2019-03-20 16:04:35 +00:00
Andrew Morgan	b9f6163092	Simplify token replication logic	2019-03-05 13:58:30 +00:00
Andrew Morgan	fe7bd23a85	Clean up logic and add comments	2019-03-04 15:08:15 +00:00
Andrew Morgan	9f7cdf3da1	Clearer branching, fix missing list clear	2019-03-04 14:36:52 +00:00
Andrew Morgan	5f0c449dd5	Prevent replication wedging	2019-03-04 14:03:18 +00:00
Erik Johnston	7590e9fa28	Merge pull request #4749 from matrix-org/erikj/replication_connection_backoff Fix tightloop over connecting to replication server	2019-02-27 11:00:59 +00:00
Erik Johnston	6bb1c028f1	Limit cache invalidation replication line length (#4748 )	2019-02-27 10:28:37 +00:00
Erik Johnston	6870fc496f	Move connecting logic into ClientReplicationStreamProtocol	2019-02-27 10:23:51 +00:00
Erik Johnston	25814921f1	Increase the max delay between retry attempts Otherwise if you have many workers they can easily take out master with their connection attempts	2019-02-26 15:12:33 +00:00
Erik Johnston	313987187e	Fix tightloop over connecting to replication server If the client failed to process incoming commands during the initial set up of the replication connection it would immediately disconnect and reconnect, resulting in a tightloop. This can happen, for example, when subscribing to a stream that has a row that is too long in the backlog. The fix here is to not consider the connection successfully set up until the client has successfully subscribed and caught up with the streams. This ensures that the retry logic timers aren't reset until then, meaning that if an error does happen during start up the client will continue backing off before retrying again.	2019-02-26 15:05:41 +00:00
Erik Johnston	a163b748a5	Don't truncate command name in metrics	2018-10-29 17:34:21 +00:00
Amber Brown	c4b3698a80	Make the replication logger quieter (#4108 )	2018-10-29 22:59:44 +11:00
Travis Ralston	f1a7264663	Fix minor typo in exception	2018-09-13 11:51:12 -06:00
Erik Johnston	3e242dc149	Remove conn_id	2018-09-04 11:45:52 +01:00
Erik Johnston	b13836da7f	Remove conn_id from repl prometheus metrics `conn_id` gets set to a random string, and so we end up filling up prometheus with tonnes of data series, which is bad.	2018-09-03 17:22:49 +01:00
Richard van der Hoff	0e8d78f6aa	Logcontexts for replication command handlers Run the handlers for replication commands as background processes. This should improve the visibility in our metrics, and reduce the number of "running db transaction from sentinel context" warnings. Ideally it means converting the things that fire off deferreds into the night into things that actually return a Deferred when they are done. I've made a bit of a stab at this, but it will probably be leaky.	2018-08-17 00:43:43 +01:00
Richard van der Hoff	f59be4eb0e	Fix unit tests on_notifier_poke no longer runs synchronously, so we have to do a different hack to make sure that the replication data has been sent. Let's actually listen for its arrival.	2018-07-25 10:30:36 +01:00
Richard van der Hoff	371da42ae4	Wrap a number of things that run in the background This will reduce the number of "Starting db connection from sentinel context" warnings, and will help with our metrics.	2018-07-25 09:41:12 +01:00
Amber Brown	49af402019	run isort	2018-07-09 16:09:20 +10:00
Amber Brown	6350bf925e	Attempt to be more performant on PyPy (#3462 )	2018-06-28 14:49:57 +01:00
Amber Brown	07cad26d65	Remove all global reactor imports & pass it around explicitly (#3424 )	2018-06-25 14:08:28 +01:00
Amber Brown	99b77aa829	Fix tcp protocol metrics naming (#3410 )	2018-06-21 09:39:27 +01:00
Richard van der Hoff	b7e7fd2d0e	Fix replication metrics fix bug introduced in #3256	2018-06-04 16:23:05 +01:00
Amber Brown	754826a830	Merge remote-tracking branch 'origin/develop' into 3218-official-prom	2018-05-28 18:57:23 +10:00
Amber Brown	1f69693347	Merge pull request #3244 from NotAFile/py3-six-4 replace some iteritems with six	2018-05-24 13:04:07 -05:00
Amber Brown	b6063631c3	more cleanup	2018-05-22 17:36:20 -05:00
Amber Brown	228f1f584e	fix the test failures	2018-05-22 15:02:38 -05:00
Amber Brown	8f5a688d42	cleanups, self-registration	2018-05-22 10:56:03 -05:00
Amber Brown	a8990fa2ec	Merge remote-tracking branch 'origin/develop' into 3218-official-prom	2018-05-22 10:50:26 -05:00
Richard van der Hoff	9ea219c514	Send users a server notice about consent When a user first syncs, we will send them a server notice asking them to consent to the privacy policy if they have not already done so.	2018-05-22 11:54:51 +01:00
Amber Brown	fcc525b0b7	rest of the changes	2018-05-21 19:48:57 -05:00
Amber Brown	df9f72d9e5	replacing portions	2018-05-21 19:47:37 -05:00
Adrian Tschira	933bf2dd35	replace some iteritems with six Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-05-19 17:59:26 +02:00
Adrian Tschira	57b58e2174	make imports local Signed-off-by: Adrian Tschira <nota@notafile.com>	2018-04-28 13:41:41 +02:00
Richard van der Hoff	3ee4ad09eb	Fix json encoding bug in replication json encoders have an encode method, not a dumps method.	2018-04-03 15:09:48 +01:00
Richard van der Hoff	05630758f2	Use static JSONEncoders using json.dumps with custom options requires us to create a new JSONEncoder on each call. It's more efficient to create one upfront and reuse it.	2018-03-29 23:13:33 +01:00
Erik Johnston	9aa5a0af51	Explicitly use simplejson	2018-03-20 09:58:13 +00:00
Erik Johnston	610accbb7f	Fix replication after switch to simplejson Turns out that simplejson serialises namedtuples as dictionaries rather than tuples by default.	2018-03-19 16:12:48 +00:00
Erik Johnston	fa72803490	Merge branch 'master' of github.com:matrix-org/synapse into develop	2018-03-19 11:41:01 +00:00
Erik Johnston	926ba76e23	Replace ujson with simplejson	2018-03-15 23:43:31 +00:00
Richard van der Hoff	5c3c32f16f	Metrics for number of RDATA commands received I found myself wishing we had this.	2018-01-15 17:45:55 +00:00
Richard van der Hoff	0edf085b68	Fix some logcontext leaks in replication resource The @measure_func annotations rely on the wrapped function respecting the logcontext rules. Add the necessary yields to make this work.	2017-11-23 23:19:43 +00:00
Richard van der Hoff	eaaabc6c4f	replace 'except:' with 'except Exception:' what could possibly go wrong	2017-10-23 15:52:32 +01:00
hera	f807f7f804	log when we get an exception handling replication updates	2017-10-12 11:51:24 +01:00
Erik Johnston	2cc998fed8	Fix replication. And notify	2017-07-20 17:13:18 +01:00
Erik Johnston	925b3638ff	Reduce log levels in tcp replication	2017-07-11 10:04:21 +01:00
Erik Johnston	27f26e48b7	Serialize user ip command as json	2017-06-27 16:25:38 +01:00
Erik Johnston	78cefd78d6	Make workers report to master for user ip updates	2017-06-27 14:58:10 +01:00
Erik Johnston	6aa5bc8635	Initial worker impl	2017-06-16 11:47:11 +01:00
Erik Johnston	2cac7623a5	Add missing notifier	2017-06-09 11:24:41 +01:00
Erik Johnston	2e6f5a4910	Typo	2017-04-10 16:17:40 +01:00
Erik Johnston	efcb6db688	Merge pull request #2109 from matrix-org/erikj/send_queue_fix Fix up federation SendQueue and document types	2017-04-10 13:09:25 +01:00
Erik Johnston	0364d23210	Up replication ping timeout	2017-04-10 11:32:05 +01:00
Erik Johnston	ab904caf33	Comments	2017-04-10 10:02:17 +01:00
Erik Johnston	98ce212093	Merge pull request #2103 from matrix-org/erikj/no-double-encode Don't double encode replication data	2017-04-07 09:39:52 +01:00
Erik Johnston	ad544c803a	Document types of the replication streams	2017-04-06 13:28:52 +01:00
Erik Johnston	69b3fd485d	Fix incorrect type when using InvalidateCacheCommand	2017-04-06 09:36:38 +01:00
Erik Johnston	fcc803b2bf	Add log lines	2017-04-05 17:13:44 +01:00
Erik Johnston	3f213d908d	Rearrange metrics	2017-04-05 14:15:09 +01:00
Erik Johnston	1ca0e78ca1	Fix typo	2017-04-05 13:43:39 +01:00
Erik Johnston	b43d3267e2	Fixup some metrics for tcp repl	2017-04-05 13:34:54 +01:00
Erik Johnston	a5c401bd12	Merge pull request #2097 from matrix-org/erikj/repl_tcp_client Move to using TCP replication	2017-04-05 09:36:21 +01:00
Erik Johnston	a76886726b	Merge pull request #2098 from matrix-org/erikj/repl_tcp_fix Advance replication streams even if nothing is listening	2017-04-04 15:40:51 +01:00
Erik Johnston	4264ceb31c	Fiddle tcp replication logging	2017-04-04 14:14:03 +01:00
Erik Johnston	023ee197be	Advance replication streams even if nothing is listening Otherwise the streams don't advance and steadily fall behind, so when a worker does connect either a) they'll be streamed lots of old updates or b) the connection will fail as the streams are too far behind.	2017-04-04 13:19:26 +01:00
Erik Johnston	52bfa604e1	Add basic replication client handler and factory	2017-04-03 15:34:13 +01:00
Erik Johnston	0a6a966e2b	Always advance stream tokens	2017-04-03 15:22:56 +01:00
Erik Johnston	1df7c28661	Use callbacks to notify tcp replication rather than deferreds	2017-03-31 15:42:51 +01:00
Erik Johnston	36d2b66f90	Add a timestamp to USER_SYNC command This timestamp is used to indicate when the user last sync'd	2017-03-31 15:42:22 +01:00
Erik Johnston	bfcf016714	Fix up docs	2017-03-31 11:19:24 +01:00
Erik Johnston	4d7fc7f977	Add server side resource for tcp replication	2017-03-30 13:24:45 +01:00
Erik Johnston	7450693435	Initial TCP protocol implementation This defines the low level TCP replication protocol	2017-03-30 12:54:46 +01:00
Erik Johnston	8da6f0be48	Define the various streams we will replicate	2017-03-30 12:54:46 +01:00

276 Commits