forked-synapse

mirror of https://mau.dev/maunium/synapse.git synced 2024-10-01 01:36:05 -04:00

Author	SHA1	Message	Date
Richard van der Hoff	aa07c37cf0	Move and rename `get_devices_with_keys_by_user` (#8204 ) * Move `get_devices_with_keys_by_user` to `EndToEndKeyWorkerStore` this seems a better fit for it. This commit simply moves the existing code: no other changes at all. * Rename `get_devices_with_keys_by_user` to better reflect what it does. * get_device_stream_token abstract method To avoid referencing fields which are declared in the derived classes, make `get_device_stream_token` abstract, and define that in the classes which define `_device_list_id_gen`.	2020-09-01 12:41:21 +01:00
Erik Johnston	3b4556cf87	Fix `wait_for_stream_position` for multiple waiters. (#8196 ) This fixes a bug where having multiple callers waiting on the same stream and position will cause it to try and compare two deferreds, which fails (due to the sorted list having an entry of `Tuple[int, Deferred]`).	2020-08-28 17:12:45 +01:00
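The failure mode described in 3b4556cf87 can be reproduced in isolation. The sketch below is illustrative only: `FakeDeferred` is a hypothetical stand-in for a Twisted `Deferred`, `bisect.insort` stands in for whatever sorted container the real code used, and `add_waiter` shows one possible workaround (ordering on the position alone), which is not necessarily the fix the commit applied.

```python
import bisect


class FakeDeferred:
    """Hypothetical stand-in for a Deferred: it defines no ordering."""


def insort_fails_on_equal_positions():
    """Return True if inserting two waiters at the same position raises."""
    waiters = []
    try:
        bisect.insort(waiters, (5, FakeDeferred()))
        # Same position: tuple comparison falls through to the second
        # element and tries to order the two FakeDeferred objects.
        bisect.insort(waiters, (5, FakeDeferred()))
    except TypeError:
        return True
    return False


def add_waiter(waiters, position, deferred):
    """Keep waiters ordered by position only, never comparing deferreds."""
    waiters.append((position, deferred))
    waiters.sort(key=lambda entry: entry[0])
```

Because `list.sort` is stable, waiters registered at the same position keep their insertion order.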
Erik Johnston	e3c91a3c55	Make SlavedIdTracker.advance have same interface as MultiWriterIDGenerator (#8171 )	2020-08-26 13:15:20 +01:00
Erik Johnston	c9c544cda5	Remove `ChainedIdGenerator`. (#8123 ) It's just a thin wrapper around two ID gens to make `get_current_token` and `get_next` return tuples. This can easily be replaced by calling the appropriate methods on the underlying ID gens directly.	2020-08-19 13:41:51 +01:00
Patrick Cloke	eebf52be06	Be stricter about JSON that is accepted by Synapse (#8106 )	2020-08-19 07:26:03 -04:00
Erik Johnston	76d21d14a0	Separate `get_current_token` into two. (#8113) The function is used for two purposes: 1) for subscribers of streams to get a token they can use to get further updates with, and 2) for replication to track position of the writers of the stream. For streams with a single writer the two scenarios produce the same result, however the situation becomes complicated for streams with multiple writers. The current `MultiWriterIdGenerator` does not correctly handle the first case (which is not an issue as it's only used for the `caches` stream which nothing subscribes to outside of replication).	2020-08-19 10:39:31 +01:00
Patrick Cloke	ac77cdb64e	Add a shadow-banned flag to users. (#8092 )	2020-08-14 12:37:59 -04:00
David Vo	4dd27e6d11	Reduce unnecessary whitespace in JSON. (#7372 )	2020-08-07 08:02:55 -04:00
Patrick Cloke	d4a7829b12	Convert synapse.api to async/await (#8031 )	2020-08-06 08:30:06 -04:00
Erik Johnston	a7bdf98d01	Rename database classes to make some sense (#8033 )	2020-08-05 21:38:57 +01:00
Patrick Cloke	3b415e23a5	Convert replication code to async/await. (#7987 )	2020-08-03 07:12:55 -04:00
Richard van der Hoff	349119a340	Synapse 1.18.0rc2 (2020-07-28) ============================== Bugfixes -------- - Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876)) - Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967)) Internal Changes ---------------- - Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876)) Merge tag 'v1.18.0rc2' into develop	2020-07-28 11:31:31 +01:00
Erik Johnston	a8f7ed28c6	Typing worker needs to handle stream update requests (#7967) IIRC this doesn't break tests because it's only hit on reconnection, or something. Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB). The problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.	2020-07-28 11:04:53 +01:00
Richard van der Hoff	f57b99af22	Handle replication commands synchronously where possible (#7876 ) Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.	2020-07-27 18:54:43 +01:00
Patrick Cloke	8553f46498	Convert synapse.events to async/await. (#7949)	2020-07-27 13:40:22 -04:00
Erik Johnston	84d099ae11	Fix typing replication not being handled on master (#7959) Handling of incoming typing stream updates from replication was not hooked up on master, affecting setups where typing was handled on a different worker. This is really only a problem if the master process is also handling sync requests, which is unlikely for those that are at the stage of moving typing off. The other observable effect is that if a worker restarts or a replication connection drops then the typing worker will issue a `POSITION typing`, triggering the master process to try and stream all typing updates from position 0. Fixes #7907	2020-07-27 14:10:53 +01:00
Richard van der Hoff	931b026844	Remove an unused prometheus metric (#7878 )	2020-07-22 00:40:55 +01:00
Richard van der Hoff	05060e0223	Track command processing as a background process (#7879 ) I'm going to be doing more stuff synchronously, and I don't want to lose the CPU metrics down the sofa.	2020-07-22 00:40:42 +01:00
Karthikeyan Singaravelan	a7b06a81f0	Fix deprecation warning: import ABC from collections.abc (#7892 )	2020-07-20 13:33:04 -04:00
Erik Johnston	2d2acc1cf2	Stop using 'device_max_stream_id' (#7882) It serves no purpose, and updating it every time we write to the device inbox stream means all such transactions will conflict, causing lots of transaction failures and retries.	2020-07-17 17:03:27 +01:00
Richard van der Hoff	e5300063ed	Optimise queueing of inbound replication commands (#7861 ) When we get behind on replication, we tend to stack up background processes behind a linearizer. Bg processes are heavy (particularly with respect to prometheus metrics) and linearizers aren't terribly efficient once the queue gets long either. A better approach is to maintain a queue of requests to be processed, and nominate a single process to work its way through the queue. Fixes: #7444	2020-07-16 15:49:37 +01:00
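The "single queue, single nominated processor" approach described in e5300063ed can be sketched roughly as follows. This is a simplified synchronous illustration, not Synapse's code: `CommandQueue` and its `handler` callback are hypothetical names, and the real implementation runs on Twisted with background processes rather than a plain loop.

```python
from collections import deque


class CommandQueue:
    """Queue commands and let exactly one caller drain the queue."""

    def __init__(self, handler):
        self._handler = handler  # hypothetical per-command callback
        self._queue = deque()
        self._processing = False

    def add(self, cmd):
        self._queue.append(cmd)
        if self._processing:
            # A drain loop is already running; it will pick this up.
            return
        self._processing = True
        try:
            while self._queue:
                self._handler(self._queue.popleft())
        finally:
            self._processing = False
```

Commands added while the drain loop is running are simply appended, so no second processor is started and arrival order is preserved.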
Erik Johnston	f2e38ca867	Allow moving typing off master (#7869 )	2020-07-16 15:12:54 +01:00
Erik Johnston	f299441cc6	Add ability to shard the federation sender (#7798 )	2020-07-10 18:26:36 +01:00
Patrick Cloke	38e1fac886	Fix some spelling mistakes / typos. (#7811 )	2020-07-09 09:52:58 -04:00
Richard van der Hoff	2ab0b021f1	Generate real events when we reject invites (#7804 ) Fixes #2181. The basic premise is that, when we fail to reject an invite via the remote server, we can generate our own out-of-band leave event and persist it as an outlier, so that we have something to send to the client.	2020-07-09 10:40:19 +01:00
Patrick Cloke	e7efd8f827	Do not use simplejson in Synapse. (#7800 )	2020-07-08 07:15:08 -04:00
Erik Johnston	67d7756fcf	Refactor getting replication updates from database v2. (#7740 )	2020-07-07 12:11:35 +01:00
Will Hunt	62b1ce8539	isort 5 compatibility (#7786 ) The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.	2020-07-05 16:32:02 +01:00
Erik Johnston	5cdca53aa0	Merge different Resource implementation classes (#7732 )	2020-07-03 19:02:19 +01:00
Richard van der Hoff	f01e2ca039	Use symbolic names for replication stream names (#7768 ) This makes it much easier to find where streams are referenced.	2020-07-01 16:35:40 +01:00
Erik Johnston	f6f7511a4c	Refactor getting replication updates from database. (#7636) The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill off the `db_query_to_update_function` function.	2020-06-16 17:10:28 +01:00
Dagfinn Ilmari Mannsåker	a3f11567d9	Replace all remaining six usage with native Python 3 equivalents (#7704 )	2020-06-16 08:51:47 -04:00
Patrick Cloke	7d2532be36	Discard RDATA from already seen positions. (#7648 )	2020-06-15 08:44:54 -04:00
Erik Johnston	664409b169	Fix bug in account data replication stream. (#7656) * Ensure account data stream IDs are unique. The account data stream is shared between three tables, and the maximum allocated ID was tracked in a dedicated table. Updating the max ID happened outside the transaction that allocated the ID, leading to a race where if the server was restarted then the same ID could be allocated but the max ID failed to be updated, leading it to be reused. The ID generators have support for tracking across multiple tables, so we may as well use that instead of a dedicated table. * Fix bug in account data replication stream. If the same stream ID was used in both global and room account data then getting updates for the replication stream would fail due to `heapq.merge(..)` trying to compare a `str` with a `None`. (This is because you'd have two rows like `(534, '!room')` and `(534, None)` from the room and global account data tables). Fix is just to order by stream ID, since we don't rely on the ordering beyond that. The bug where stream IDs can be reused should be fixed now, so this case shouldn't happen going forward. Fixes #7617	2020-06-09 16:28:57 +01:00
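The `heapq.merge` failure described in 664409b169 is easy to reproduce with the two example rows from the message. The sketch below is an illustration under assumptions: the helper names are hypothetical, and ordering via `key=` is one way to "order by stream ID" in Python, which may differ from how the commit itself implemented the fix.

```python
import heapq


def naive_merge_fails(room_rows, global_rows):
    """Return True if merging whole tuples raises a TypeError."""
    try:
        list(heapq.merge(room_rows, global_rows))
    except TypeError:
        # Equal stream IDs force a comparison of str against None.
        return True
    return False


def merge_by_stream_id(room_rows, global_rows):
    """Merge two sorted row lists, ordering on the stream ID only."""
    return list(heapq.merge(room_rows, global_rows, key=lambda row: row[0]))
```

With a `key`, rows that share a stream ID are never compared against each other directly, so the mixed `str`/`None` second column is harmless.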
Patrick Cloke	f1e61ef85c	Typo fixes.	2020-06-05 08:43:21 -04:00
Erik Johnston	9bac5d62b3	Ensure ReplicationStreamer is always started when replication enabled. (#7579 ) Fixes #7566.	2020-05-27 11:44:19 +01:00
Erik Johnston	e5c67d04db	Add option to move event persistence off master (#7517 )	2020-05-22 16:11:35 +01:00
Erik Johnston	1531b214fc	Add ability to wait for replication streams (#7542 ) The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room). Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on. People probably want to look at this commit by commit.	2020-05-22 14:21:54 +01:00
Erik Johnston	51055c8c44	Allow ReplicationRestResource to be added to workers (#7515 ) This allows workers to talk to each other over HTTP replication.	2020-05-18 12:24:48 +01:00
Richard van der Hoff	4d1afb1dfe	Merge pull request #7519 from matrix-org/rav/kill_py2_code Kill off some old python 2 code	2020-05-18 10:45:30 +01:00
Richard van der Hoff	91f51c611c	remove redundant `__func__` this is a no-op under python 3	2020-05-15 19:37:41 +01:00
Richard van der Hoff	6c1f7c722f	Fix limit logic for AccountDataStream (#7384 ) Make sure that the AccountDataStream presents complete updates, in the right order. This is much the same fix as #7337 and #7358, but applied to a different stream.	2020-05-15 19:03:25 +01:00
Erik Johnston	1f36ff69e8	Move event stream handling out of slave store. (#7491 ) This allows us to have the logic on both master and workers, which is necessary to move event persistence off master. We also combine the instantiation of ID generators from DataStore and slave stores to the base worker stores. This allows us to select which process writes events independently of the master/worker splits.	2020-05-15 16:43:59 +01:00
Erik Johnston	4734a7bbe4	Move EventStream handling into default ReplicationDataHandler (#7493 ) This is so that the logic can happen on both master and workers when we move event persistence out.	2020-05-14 14:01:39 +01:00
Erik Johnston	1de36407d1	Add `instance_map` config and route replication calls (#7495 )	2020-05-14 14:00:58 +01:00
Erik Johnston	7ee24c5674	Have all instances correctly respond to REPLICATE command. (#7475) Before, all streams were only written to from master, so only master needed to respond to `REPLICATE` commands. Before, all instances wrote to the cache invalidation stream, but didn't respond to `REPLICATE`. This was a bug, which could lead to missed rows from the cache invalidation stream if an instance is restarted; however, all the caches would be empty in that case so it wasn't a problem.	2020-05-13 10:27:02 +01:00
Erik Johnston	8ca79613e6	Fix Redis reconnection logic (#7482 ) Proactively send out `POSITION` commands (as if we had just received a `REPLICATE`) when we connect to Redis. This is important as other instances won't notice we've connected to issue a `REPLICATE` command (unlike for direct TCP connections). This is only currently an issue if master process reconnects without restarting (if it restarts then it won't have written anything and so other instances probably won't have missed anything).	2020-05-13 09:57:15 +01:00
Amber Brown	7cb8b4bc67	Allow configuration of Synapse's cache without using synctl or environment variables (#6391 )	2020-05-11 18:45:23 +01:00
Andrew Morgan	5cf758cdd6	Merge branch 'release-v1.13.0' into develop * release-v1.13.0: Don't UPGRADE database rows RST indenting Put rollback instructions in upgrade notes Fix changelog typo Oh yeah, RST Absolute URL it is then Fix upgrade notes link Provide summary of upgrade issues in changelog. Fix ) Move next version notes from changelog to upgrade notes Changelog fixes 1.13.0rc1 Documentation on setting up redis (#7446) Rework UI Auth session validation for registration (#7455) Fix errors from malformed log line (#7454) Drop support for redis.dbid (#7450)	2020-05-11 16:46:33 +01:00
Richard van der Hoff	aa5aa6f96a	Fix errors from malformed log line (#7454 )	2020-05-07 19:51:38 +01:00
Richard van der Hoff	da9b2db3af	Drop support for redis.dbid (#7450 ) Since we only use pubsub, the dbid is irrelevant.	2020-05-07 16:46:15 +01:00
Erik Johnston	d7983b63a6	Support any process writing to cache invalidation stream. (#7436 )	2020-05-07 13:51:08 +01:00
Richard van der Hoff	62ee862119	Merge branch 'release-v1.13.0' into develop	2020-05-06 15:56:03 +01:00
Richard van der Hoff	2e0c46ca07	Merge branch 'release-v1.13.0' into develop	2020-05-06 11:58:31 +01:00
Richard van der Hoff	a8c17da245	Merge branch 'release-v1.13.0' into rav/fix_dropped_messages	2020-05-05 23:01:12 +01:00
Richard van der Hoff	1242267316	Merge branch 'release-v1.13.0' into rav/fix_dropped_messages	2020-05-05 22:38:44 +01:00
Richard van der Hoff	7f7eedbebb	Wait for a POSITION on the right connection before accepting RDATA ... otherwise we can believe we're up to date when we're not.	2020-05-05 22:38:16 +01:00
Brendan Abolivier	5b8023dc7f	Move logs about discarded RDATA to debug (#7421 )	2020-05-05 21:07:33 +02:00
Richard van der Hoff	d78265af0c	Wait to subscribe before sending REPLICATE	2020-05-05 19:31:37 +01:00
Richard van der Hoff	d5aa7d93ed	Fix catchup-on-reconnect for the Federation Stream (#7374 ) looks like we managed to break this during the refactorathon.	2020-05-05 14:15:57 +01:00
Erik Johnston	350421e058	Fix redis password support. (#7401 ) We forgot to set the password on the subscriber connection, as well as not calling super methods for overridden connectionMade/connectionLost functions.	2020-05-04 14:04:09 +01:00
Erik Johnston	0e719f2398	Thread through instance name to replication client. (#7369 ) For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.	2020-05-01 17:19:56 +01:00
Erik Johnston	3085cde577	Use `stream.current_token()` and remove `stream_positions()` (#7172 ) We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.	2020-05-01 15:21:35 +01:00
Richard van der Hoff	b2dba06079	Workaround for assertion errors from db_query_to_update_function (#7378 ) Hopefully this is no worse than what we have on master...	2020-05-01 09:25:16 +01:00
Erik Johnston	37f6823f5b	Add instance name to RDATA/POSITION commands (#7364 ) This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.	2020-04-29 16:23:08 +01:00
Erik Johnston	3eab76ad43	Don't relay REMOTE_SERVER_UP cmds to same conn. (#7352 ) For direct TCP connections we need the master to relay REMOTE_SERVER_UP commands to the other connections so that all instances get notified about it. The old implementation just relayed to all connections, assuming that sending back to the original sender of the command was safe. This is not true for redis, where commands sent get echoed back to the sender, which was causing master to effectively infinite loop sending and then re-receiving REMOTE_SERVER_UP commands that it sent. The fix is to ensure that we only relay to other connections and not to the connection we received the notification from. Fixes #7334.	2020-04-29 14:10:59 +01:00
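The relay rule described in 3eab76ad43 (forward to every connection except the one the command arrived on) can be sketched as below. `Connection` and `relay_to_others` are hypothetical names for illustration and do not reflect Synapse's actual classes.

```python
class Connection:
    """Hypothetical connection that records what was sent on it."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def send_command(self, cmd):
        self.sent.append(cmd)


def relay_to_others(connections, source, cmd):
    """Relay cmd to all connections other than the one it came from."""
    for conn in connections:
        if conn is not source:
            conn.send_command(cmd)
```

Skipping the source connection is what breaks the loop when the transport echoes sent commands back to the sender.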
Richard van der Hoff	c2e1a2110f	Fix limit logic for EventsStream (#7358 ) * Factor out functions for injecting events into database I want to add some more flexibility to the tools for injecting events into the database, and I don't want to clutter up HomeserverTestCase with them, so let's factor them out to a new file. * Rework TestReplicationDataHandler This wasn't very easy to work with: the mock wrapping was largely superfluous, and it's useful to be able to inspect the received rows, and clear out the received list. * Fix AssertionErrors being thrown by EventsStream Part of the problem was that there was an off-by-one error in the assertion, but also the limit logic was too simple. Fix it all up and add some tests.	2020-04-29 12:30:36 +01:00
Erik Johnston	38919b521e	Run replication streamers on workers (#7146 ) Currently we never write to streams from workers, but that will change soon	2020-04-28 13:34:12 +01:00
Richard van der Hoff	ce428a1abe	Fix EventsStream raising assertions when it falls behind Figuring out how to correctly limit updates from this stream without dropping entries is far more complicated than just counting the number of rows being returned. We need to consider each query separately and, if any one query hits the limit, truncate the results from the others. I think this also fixes some potentially long-standing bugs where events or state changes could get missed if we hit the limit on either query.	2020-04-24 13:59:21 +01:00
Richard van der Hoff	9cbdfb3a2f	Make it clear that the limit for an update_function is a target	2020-04-23 15:45:12 +01:00
Richard van der Hoff	23b28266ac	Remove 'limit' param from `get_repl_stream_updates` API there doesn't seem to be much point in passing this limit all around, since both sides agree it's meant to be 100.	2020-04-23 15:44:35 +01:00
Richard van der Hoff	71a1abb8a1	Stop the master relaying USER_SYNC for other workers (#7318 ) Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication. In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits. Fixes (I hope) #7257.	2020-04-22 22:39:04 +01:00
Erik Johnston	841c581c40	Fix replication metrics when using redis (#7325 )	2020-04-22 16:26:19 +01:00
Richard van der Hoff	82d8b1dd1f	Another go at fixing one-word commands (#7326 ) I messed this up last time I tried (#7239 / `e13c6c7`).	2020-04-22 14:34:31 +01:00
Erik Johnston	51f7eaf908	Add ability to run replication protocol over redis. (#7040 ) This is configured via the `redis` config options.	2020-04-22 13:07:41 +01:00
Richard van der Hoff	0f8f02bc39	On catchup, process each row with its own stream id (#7286 ) Other parts of the code (such as the StreamChangeCache) assume that there will not be multiple changes with the same stream id. This code was introduced in #7024, and I hope this fixes #7206.	2020-04-20 11:43:29 +01:00
Richard van der Hoff	67ff7b8ba0	Improve type checking in `replication.tcp.Stream` (#7291 ) The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290. After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP. I've also introduced a couple of new types, to take out some duplication.	2020-04-17 14:49:55 +01:00
Richard van der Hoff	d7d42387f5	Fix 'generator object is not subscriptable' error (#7290 ) Some of the query functions return generators rather than lists, so we can't index into the result. Happily we already have a copy of the results. (think this was introduced in #7024)	2020-04-16 14:37:06 +01:00
Richard van der Hoff	e13c6c7a96	Handle one-word replication commands correctly `REPLICATE` is now a valid command, and it's nice if you can issue it from the console without remembering to call it `REPLICATE ` with a trailing space.	2020-04-07 17:43:46 +01:00
Richard van der Hoff	c3e4b4edb2	Fix warnings about not calling superclass constructor Separate `SimpleCommand` from `Command`, so that things which don't want to use the `data` property don't have to, and thus fix the warnings PyCharm was giving me about not calling `__init__` in the base class.	2020-04-07 17:40:22 +01:00
Richard van der Hoff	6a519a0ca0	Remove vestigial references to SYNC replication command We've ripped pretty much all of this out: let's remove the remains.	2020-04-07 17:40:07 +01:00
Erik Johnston	ce72355d7f	Fix race in replication (#7226 ) Fixes a race between handling `POSITION` and `RDATA` commands. We do this by simply linearizing handling of them.	2020-04-07 11:01:04 +01:00
Erik Johnston	82498ee901	Move server command handling out of TCP protocol (#7187 ) This completes the merging of server and client command processing.	2020-04-07 10:51:07 +01:00
Erik Johnston	5016b162fc	Move client command handling out of TCP protocol (#7185 ) The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.	2020-04-06 09:58:42 +01:00
Erik Johnston	dfa0782254	Remove connections per replication stream metric. (#7195 ) This broke in a recent PR (#7024) and is no longer useful due to all replication clients implicitly subscribing to all streams, so let's just remove it.	2020-04-01 10:40:46 +01:00
Erik Johnston	4f21c33be3	Remove usage of "conn_id" for presence. (#7128) * Remove `conn_id` usage for UserSyncCommand. Each tcp replication connection is assigned a "conn_id", which is used to give an ID to a remotely connected worker. In a redis world, there will no longer be a one to one mapping between connection and instance, so instead we need to replace such usages with an ID generated by the remote instances and included in the replication commands. This really only affects UserSyncCommand. * Add CLEAR_USER_SYNCS command that is sent on shutdown. This should help with the case where a synchrotron gets restarted gracefully, rather than rely on 5 minute timeout.	2020-03-30 16:37:24 +01:00
Erik Johnston	4cff617df1	Move catchup of replication streams to worker. (#7024 ) This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.	2020-03-25 14:54:01 +00:00
Richard van der Hoff	a564b92d37	Convert `*StreamRow` classes to inner classes (#7116 ) This just helps keep the rows closer to their streams, so that it's easier to see what the format of each stream is.	2020-03-23 13:59:11 +00:00
Richard van der Hoff	b3cee0ce67	Fix processing of `groups` stream, and use symbolic names for streams (#7117 ) `groups` != `receipts` Introduced in #6964	2020-03-23 11:39:36 +00:00
Erik Johnston	fdb1344716	Remove concept of a non-limited stream. (#7011 )	2020-03-20 14:40:47 +00:00
Erik Johnston	a319cb1dd1	Change device list streams to have one row per ID (#7010 ) * Add 'device_lists_outbound_pokes' as extra table. This makes sure we check all the relevant tables to get the current max stream ID. Currently not doing so isn't problematic as the max stream ID in `device_lists_outbound_pokes` is the same as in `device_lists_stream`, however that will change. * Change device lists stream to have one row per id. This will make it possible to process the streams more incrementally, avoiding having to process large chunks at once. * Change device list replication to match new semantics. Instead of sending down batches of user ID/host tuples, send down a row per entity (user ID or host). * Newsfile * Remove handling of multiple rows per ID * Fix worker handling * Comments from review	2020-03-19 11:36:53 +00:00
Erik Johnston	6e6476ef07	Comments from review	2020-03-18 10:13:55 +00:00
Richard van der Hoff	78a15b1f9d	Store room_versions in EventBase objects (#6875 ) This is a bit fiddly because it all has to be done on one fell swoop: * Wherever we create a new event, pass in the room version (and check it matches the format version) * When we prune an event, use the room version of the unpruned event to create the pruned version. * When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.	2020-03-05 15:46:44 +00:00
Erik Johnston	9ce4e344a8	Change device list replication to match new semantics. Instead of sending down batches of user ID/host tuples, send down a row per entity (user ID or host).	2020-02-28 11:25:34 +00:00
Erik Johnston	c3c6c0e622	Add 'device_lists_outbound_pokes' as extra table. This makes sure we check all the relevant tables to get the current max stream ID. Currently not doing so isn't problematic as the max stream ID in `device_lists_outbound_pokes` is the same as in `device_lists_stream`, however that will change.	2020-02-28 11:15:11 +00:00
Richard van der Hoff	3e99528f2b	Store room version on invite (#6983 ) When we get an invite over federation, store the room version in the rooms table. The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...	2020-02-26 16:58:33 +00:00
Erik Johnston	1f773eec91	Port PresenceHandler to async/await (#6991 )	2020-02-26 15:33:26 +00:00
Erik Johnston	bbf8886a05	Merge worker apps into one. (#6964 )	2020-02-25 16:56:55 +00:00
Erik Johnston	0bd8cf435e	Increase MAX_EVENTS_BEHIND for replication clients	2020-02-21 09:04:33 +00:00
Erik Johnston	de2d267375	Allow moving group read APIs to workers (#6866 )	2020-02-07 11:14:19 +00:00
Erik Johnston	c3d4ad8afd	Fix sending server up commands from workers (#6811 ) Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>	2020-01-30 16:42:11 +00:00
Erik Johnston	e17a110661	Detect unknown remote devices and mark cache as stale (#6776 ) We just mark the fact that the cache may be stale in the database for now.	2020-01-28 14:43:21 +00:00
Erik Johnston	d5275fc55f	Propagate cache invalidates from workers to other workers. (#6748 ) Currently if a worker invalidates a cache it will be streamed to master, which then didn't forward those to other workers.	2020-01-27 13:47:50 +00:00
Erik Johnston	5d7a6ad223	Allow streaming cache invalidate all to workers. (#6749 )	2020-01-22 10:37:00 +00:00
Erik Johnston	a8a50f5b57	Wake up transaction queue when remote server comes back online (#6706 ) This will be used to retry outbound transactions to a remote server if we think it might have come back up.	2020-01-17 10:27:19 +00:00
Erik Johnston	48c3a96886	Port synapse.replication.tcp to async/await (#6666 ) * Port synapse.replication.tcp to async/await * Newsfile * Correctly document type of on_<FOO> functions as async * Don't be overenthusiastic with the asyncing....	2020-01-16 09:16:12 +00:00
Erik Johnston	28c98e51ff	Add `local_current_membership` table (#6655) Currently we rely on `current_state_events` to figure out what rooms a user was in and their last membership event in there. However, if the server leaves the room then the table may be cleaned up and that information is lost. So let's add a table that separately holds that information.	2020-01-15 14:59:33 +00:00
Erik Johnston	e8b68a4e4b	Fixup synapse.replication to pass mypy checks (#6667 )	2020-01-14 14:08:06 +00:00
Richard van der Hoff	6964ea095b	Reduce the reconnect time when replication fails. (#6617 )	2020-01-03 14:19:09 +00:00
Erik Johnston	fa780e9721	Change EventContext to use the Storage class (#6564 )	2019-12-20 10:32:02 +00:00
Erik Johnston	9a4fb457cf	Change DataStores to accept 'database' param.	2019-12-06 13:30:06 +00:00
Erik Johnston	a7f20500ff	_CURRENT_STATE_CACHE_NAME is public	2019-12-04 15:45:42 +00:00
Erik Johnston	1056d6885a	Move cache invalidation to main data store	2019-12-04 15:21:14 +00:00
Erik Johnston	2173785f0d	Propagate reason in remotely rejected invites	2019-11-28 11:31:56 +00:00
Andrew Morgan	a8175d0f96	Prevent account_data content from being sent over TCP replication (#6333 )	2019-11-26 13:58:39 +00:00
Erik Johnston	f9f1c8acbb	Merge pull request #6332 from matrix-org/erikj/query_devices_fix Fix caching devices for remote servers in worker.	2019-11-26 12:56:05 +00:00
Erik Johnston	35f9165e96	Fixup docs	2019-11-26 12:04:48 +00:00
Andrew Morgan	cd96b4586f	lint	2019-11-08 15:45:45 +00:00
Andrew Morgan	c4bdf2d785	Remove content from being sent for account data rdata stream	2019-11-08 15:44:02 +00:00
Andrew Morgan	1fe3cc2c9c	Address review comments	2019-11-06 14:54:24 +00:00
Andrew Morgan	4059d61e26	Don't forget to ratelimit calls outside of RegistrationHandler	2019-11-06 12:01:54 +00:00
Erik Johnston	c16e192e2f	Fix caching devices for remote servers in worker. When the `/keys/query` API is hit on a client_reader worker, Synapse may decide that it needs to resync some remote devices. Usually this happens on master, and then gets cached. However, that fails on workers and so it falls back to fetching devices from remotes directly, which may in turn fail if the remote is down.	2019-11-05 15:49:43 +00:00
Richard van der Hoff	cc6243b4c0	document the REPLICATE command a bit better (#6305) since I found myself wondering how it works	2019-11-04 12:40:18 +00:00
Hubert Chathi	9c94b48bf1	Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify	2019-10-31 12:32:07 -04:00
Hubert Chathi	f7e4a582ef	clean up code a bit	2019-10-31 12:01:00 -04:00
Andrew Morgan	54fef094b3	Remove usage of deprecated logger.warn method from codebase (#6271 ) Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.	2019-10-31 10:23:24 +00:00
Hubert Chathi	998f7fe7d4	make user signatures a separate stream	2019-10-30 17:22:52 -04:00
Hubert Chathi	670972c0e1	Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify	2019-10-30 16:46:31 -04:00
Erik Johnston	e577a4b2ad	Port replication http server endpoints to async/await	2019-10-29 13:00:51 +00:00
Hubert Chathi	8ac766c44a	make notification of signatures work with workers	2019-10-24 22:14:58 -04:00
Erik Johnston	bb6264be0b	Merge branch 'develop' of github.com:matrix-org/synapse into erikj/refactor_stores	2019-10-22 10:41:18 +01:00
Erik Johnston	c66a06ac6b	Move storage classes into a main "data store". This is in preparation for having multiple data stores that offer different functionality, e.g. splitting out state or event storage.	2019-10-21 16:05:06 +01:00
Hubert Chathi	8e86f5b65c	Merge branch 'develop' into uhoreg/e2e_cross-signing_merged	2019-09-07 13:20:34 -04:00
Jorik Schellekens	f7c873a643	Trace how long it takes for the send transaction to complete, including retries (#5986 )	2019-09-05 17:44:55 +01:00
Jorik Schellekens	909827b422	Add opentracing to all client servlets (#5983 )	2019-09-05 14:46:04 +01:00
Hubert Chathi	a22d58c96c	add user signature stream change cache to slaved device store	2019-09-04 19:32:35 -04:00
Andrew Morgan	b736c6cd3a	Remove bind_email and bind_msisdn (#5964 ) Removes the `bind_email` and `bind_msisdn` parameters from the `/register` C/S API endpoint as per [MSC2140: Terms of Service for ISes and IMs](https://github.com/matrix-org/matrix-doc/pull/2140/files#diff-c03a26de5ac40fb532de19cb7fc2aaf7R107).	2019-09-04 18:24:23 +01:00
Andrew Morgan	4548d1f87e	Remove unnecessary parentheses around return statements (#5931 ) Python will return a tuple whether there are parentheses around the returned values or not. I'm just sick of my editor complaining about this all over the place :)	2019-08-30 16:28:26 +01:00
Jorik Schellekens	812ed6b0d5	Opentracing across workers (#5771 ) Propagate opentracing contexts across workers Also includes some Convenience modifications to opentracing for servlets, notably: - Add boolean to skip the whitelisting check on inject extract methods. - useful when injecting into carriers locally. Otherwise we'd always have to include our own servername and whitelist our servername - start_active_span_from_request instead of header - Add boolean to decide whether to extract context from a request to a servlet	2019-08-22 18:08:07 +01:00
Brendan Abolivier	1c5b8c6222	Revert "Add "require_consent" parameter for registration" This reverts commit `3320aaab3a`.	2019-08-22 14:47:34 +01:00
Half-Shot	3320aaab3a	Add "require_consent" parameter for registration	2019-08-22 14:21:54 +01:00
Andrew Morgan	baf081cd3b	Merge tag 'v1.2.0rc2' into develop Bugfixes -------- - Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))	2019-07-24 13:47:51 +01:00
Jorik Schellekens	cf2972c818	Fix servlet metric names (#5734 ) * Fix servlet metric names Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com> * Remove redundant check * Cover all return paths	2019-07-24 13:07:35 +01:00
Amber Brown	4806651744	Replace returnValue with return (#5736 )	2019-07-23 23:00:55 +10:00
Richard van der Hoff	824707383b	Remove access-token support from RegistrationHandler.register (#5641 ) Nothing uses this now, so we can remove the dead code, and clean up the API. Since we're changing the shape of the return value anyway, we take the opportunity to give the method a better name.	2019-07-08 19:01:08 +01:00
Richard van der Hoff	80cc82a445	Remove support for invite_3pid_guest. (#5625 ) This has never been documented, and I'm not sure it's ever been used outside sytest. It's quite a lot of poorly-maintained code, so I'd like to get rid of it. For now I haven't removed the database table; I suggest we leave that for a future clearout.	2019-07-05 16:47:58 +01:00
Amber Brown	463b072b12	Move logging utilities out of the side drawer of util/ and into logging/ (#5606 )	2019-07-04 00:07:04 +10:00
Amber Brown	32e7c9e7f2	Run Black. (#5482 )	2019-06-20 19:32:02 +10:00
Erik Johnston	6745b7de6d	Handle failing to talk to master over replication	2019-06-07 10:47:31 +01:00
Erik Johnston	5dbff34509	Fixup based on review comments	2019-05-17 15:48:04 +01:00
Erik Johnston	d46aab3fa8	Add basic editing support	2019-05-16 16:54:45 +01:00
Erik Johnston	b5c62c6b26	Fix relations in worker mode	2019-05-16 10:38:13 +01:00
Richard van der Hoff	f50efcb65d	Replace SlavedKeyStore with a shim since we're pulling everything out of KeyStore anyway, we may as well simplify it.	2019-04-08 23:59:07 +01:00
Richard van der Hoff	3352baac4b	Remove unused server_tls_certificates functions (#5028 ) These have been unused since #4120, and with the demise of perspectives, it is unlikely that they will ever be used again.	2019-04-08 21:50:18 +01:00
Neil Johnson	e8419554ff	Remove presence lists (#4989 ) Remove presence list support as per MSC 1819	2019-04-03 11:11:15 +01:00
Richard van der Hoff	297bf2547e	Fix sync bug when accepting invites (#4956 ) Hopefully this time we really will fix #4422. We need to make sure that the cache on `get_rooms_for_user_with_stream_ordering` is invalidated before the SyncHandler is notified for the new events, and we can now do so reliably via the `events` stream.	2019-04-02 12:42:39 +01:00
Richard van der Hoff	4b91c313a9	Combine the CurrentStateDeltaStream into the EventStream	2019-03-27 22:07:05 +00:00
Richard van der Hoff	1f6d6f918a	Make EventStream rows have a type ... as a precursor to combining it with the CurrentStateDelta stream.	2019-03-27 22:07:05 +00:00
Richard van der Hoff	015b3622eb	Skip building a ROW_TYPE when building updates We're about to turn it straight into a JSON object anyway so building a ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.	2019-03-27 21:58:03 +00:00
Richard van der Hoff	f570916a3e	Add parse_row method to replication stream class This will allow individual stream classes to override how a row is parsed.	2019-03-27 21:32:33 +00:00
Richard van der Hoff	71dcb275f1	move FederationStream out to its own file	2019-03-27 21:13:14 +00:00
Richard van der Hoff	aa1e017864	move EventsStream out to its own file	2019-03-27 21:13:14 +00:00
Richard van der Hoff	a5798de067	Move replication.tcp.streams into a package	2019-03-27 21:13:14 +00:00
Richard van der Hoff	acaa18f7dd	Fix/improve some docstrings in the replication code. (#4949 )	2019-03-27 21:12:36 +00:00
Richard van der Hoff	8cbbedaa2b	Fix ClientReplicationStreamProtocol.__str__ (#4929 ) `__str__` depended on `self.addr`, which was absent from ClientReplicationStreamProtocol, so attempting to call str on such an object would raise an exception. We can calculate the peer addr from the transport, so there is no need for addr anyway.	2019-03-25 16:41:51 +00:00
Richard van der Hoff	9bde730ef8	Fix bug where read-receipts lost their timestamps (#4927 ) Make sure that they are sent correctly over the replication stream. Fixes: #4898	2019-03-25 16:38:05 +00:00
Richard van der Hoff	cdb8036161	Add a config option for torture-testing worker replication. (#4902 ) Setting this to 50 or so makes a bunch of sytests fail in worker mode.	2019-03-20 16:04:35 +00:00
Erik Johnston	face0c5b3c	Prefill client IPs cache on workers	2019-03-06 17:39:32 +00:00
Andrew Morgan	7b8a157b79	Merge pull request #4792 from matrix-org/anoa/replication_tokens Support batch updates in the worker sender	2019-03-06 15:48:29 +00:00
Brendan Abolivier	a4c3a361b7	Add rate-limiting on registration (#4735 ) * Rate-limiting for registration * Add unit test for registration rate limiting * Add config parameters for rate limiting on auth endpoints * Doc * Fix doc of rate limiting function Co-Authored-By: babolivier <contact@brendanabolivier.com> * Incorporate review * Fix config parsing * Fix linting errors * Set default config for auth rate limiting * Fix tests * Add changelog * Advance reactor instead of mocked clock * Move parameters to registration specific config and give them more sensible default values * Remove unused config options * Don't mock the rate limiter in MAU tests * Rename _register_with_store into register_with_store * Make CI happy * Remove unused import * Update sample config * Fix ratelimiting test for py2 * Add non-guest test	2019-03-05 14:25:33 +00:00
Andrew Morgan	b9f6163092	Simplify token replication logic	2019-03-05 13:58:30 +00:00
Erik Johnston	a84b8d56c2	Fixup slave stores	2019-03-04 18:04:57 +00:00
Andrew Morgan	fe7bd23a85	Clean up logic and add comments	2019-03-04 15:08:15 +00:00
Andrew Morgan	9f7cdf3da1	Clearer branching, fix missing list clear	2019-03-04 14:36:52 +00:00
Andrew Morgan	5f0c449dd5	Prevent replication wedging	2019-03-04 14:03:18 +00:00
Erik Johnston	1e315017d3	When presence is enabled don't send over replication	2019-02-27 13:53:46 +00:00
Erik Johnston	7590e9fa28	Merge pull request #4749 from matrix-org/erikj/replication_connection_backoff Fix tightloop over connecting to replication server	2019-02-27 11:00:59 +00:00
Erik Johnston	6bb1c028f1	Limit cache invalidation replication line length (#4748 )	2019-02-27 10:28:37 +00:00
Erik Johnston	6870fc496f	Move connecting logic into ClientReplicationStreamProtocol	2019-02-27 10:23:51 +00:00
Erik Johnston	25814921f1	Increase the max delay between retry attempts Otherwise if you have many workers they can easily take out master with their connection attempts	2019-02-26 15:12:33 +00:00
Erik Johnston	313987187e	Fix tightloop over connecting to replication server If the client failed to process incoming commands during the initial set up of the replication connection it would immediately disconnect and reconnect, resulting in a tightloop. This can happen, for example, when subscribing to a stream that has a row that is too long in the backlog. The fix here is to not consider the connection successfully set up until the client has successfully subscribed and caught up with the streams. This ensures that the retry logic timers aren't reset until then, meaning that if an error does happen during start up the client will continue backing off before retrying again.	2019-02-26 15:05:41 +00:00
Erik Johnston	80467bbac3	Fix state cache invalidation on workers	2019-02-22 14:38:14 +00:00
Erik Johnston	dbdc565dfd	Fix registration on workers (#4682 ) * Move RegistrationHandler init to HomeServer * Move post registration actions to RegistrationHandler * Add post registration replication endpoint * Newsfile	2019-02-20 18:47:31 +11:00
Erik Johnston	a9b5ea6fc1	Batch cache invalidation over replication Currently, whenever the current state changes in a room we invalidate a lot of caches, which causes a lot of traffic over replication. Instead, let's batch up all those invalidations and send a single poke down the replication streams. Hopefully this will reduce load on the master process by substantially reducing traffic.	2019-02-18 17:53:31 +00:00
Erik Johnston	af691e415c	Move register_device into handler	2019-02-18 16:49:38 +00:00
Erik Johnston	eb2b8523ae	Split out registration to worker This allows registration to be handled by a worker, though the actual write to the database still happens on master. Note: due to the in-memory session map all registration requests must be handled by the same worker.	2019-02-18 12:12:57 +00:00
Erik Johnston	a4f52a33fe	Fix replication for room v3 (#4523 ) * Fix replication for room v3 We were not correctly quoting the path fragments over http replication, which meant that it exploded when the event IDs had a slash in them * Newsfile	2019-01-30 14:19:52 +00:00
Erik Johnston	b6b73a0bcf	Fix receiving events from federation via a worker This bug was introduced in PR #4470, commit `678a92cb56`	2019-01-29 10:30:26 +00:00
Erik Johnston	678a92cb56	Replace missed usages of FrozenEvent	2019-01-25 10:32:30 +00:00
Erik Johnston	be6a7e47fa	Revert "Require event format version to parse or create events"	2019-01-25 10:23:51 +00:00
Erik Johnston	e8c9f15397	Replace missed usages of FrozenEvent	2019-01-24 11:14:07 +00:00
Erik Johnston	a163b748a5	Don't truncate command name in metrics	2018-10-29 17:34:21 +00:00
Amber Brown	c4b3698a80	Make the replication logger quieter (#4108 )	2018-10-29 22:59:44 +11:00
Amber Brown	381d2cfdf0	Make workers work on Py3 (#4027 )	2018-10-13 00:14:08 +11:00
Travis Ralston	f1a7264663	Fix minor typo in exception	2018-09-13 11:51:12 -06:00
Amber Brown	7c27c4d51c	merge (#3576 )	2018-09-14 03:11:11 +10:00
Erik Johnston	3e242dc149	Remove conn_id	2018-09-04 11:45:52 +01:00
Erik Johnston	b13836da7f	Remove conn_id from repl prometheus metrics `conn_id` gets set to a random string, and so we end up filling up prometheus with tonnes of data series, which is bad.	2018-09-03 17:22:49 +01:00
Erik Johnston	2aa7cc6a46	Merge pull request #3713 from matrix-org/erikj/fixup_fed_logging Fix logging bug in EDU handling over replication	2018-08-20 10:51:45 +01:00
Erik Johnston	3b2dcfff78	Fix logging bug in EDU handling over replication	2018-08-17 11:11:06 +01:00

620 Commits