Commit Graph

435 Commits

Author SHA1 Message Date
Erik Johnston
4f21c33be3
Remove usage of "conn_id" for presence. (#7128)
* Remove `conn_id` usage for UserSyncCommand.

Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replication commands.

This really only affects UserSyncCommand.

* Add CLEAR_USER_SYNCS command that is sent on shutdown.

This should help with the case where a synchrotron gets restarted
gracefully, rather than relying on the 5 minute timeout.
2020-03-30 16:37:24 +01:00
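The idea behind #7128 can be pictured with a small sketch. This is not Synapse's code: `WorkerPresence`, `PresenceTracker`, and the tuple-shaped commands are hypothetical, and only the command names `USER_SYNC` and `CLEAR_USER_SYNCS` come from the commit message. It assumes each worker generates its own instance ID and includes it in every command, so the receiver no longer depends on a per-connection `conn_id`.

```python
import uuid
from collections import defaultdict


class WorkerPresence:
    """Hypothetical worker side: tags commands with a self-generated ID."""

    def __init__(self):
        # Generated by the worker itself, not assigned per TCP connection.
        self.instance_id = uuid.uuid4().hex

    def user_sync(self, user_id, is_syncing):
        return ("USER_SYNC", self.instance_id, user_id, is_syncing)

    def shutdown(self):
        # Sent on graceful shutdown so the receiver need not wait for a timeout.
        return ("CLEAR_USER_SYNCS", self.instance_id)


class PresenceTracker:
    """Hypothetical receiving side: tracks syncing users per instance ID."""

    def __init__(self):
        self.syncing = defaultdict(set)

    def handle(self, command):
        name, instance_id, *args = command
        if name == "USER_SYNC":
            user_id, is_syncing = args
            if is_syncing:
                self.syncing[instance_id].add(user_id)
            else:
                self.syncing[instance_id].discard(user_id)
        elif name == "CLEAR_USER_SYNCS":
            # Drop everything this instance reported in one step.
            self.syncing.pop(instance_id, None)

    def is_syncing(self, user_id):
        return any(user_id in users for users in self.syncing.values())
```

Because the key is the instance ID rather than the connection, the same bookkeeping would work whether commands arrive over a direct TCP connection or via a shared broker.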
Erik Johnston
4cff617df1
Move catchup of replication streams to worker. (#7024)
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
2020-03-25 14:54:01 +00:00
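A minimal sketch of the catch-up flow described in #7024, under stated assumptions: `CatchupClient` and the `fetch_rows` callback are hypothetical names, and tokens are assumed to be increasing integers. Only the `POSITION` and `RDATA` command names come from the commit message. On `POSITION`, the client fetches the rows it missed itself instead of having them sent down by the server.

```python
class CatchupClient:
    """Hypothetical replication client that catches up from the database."""

    def __init__(self, fetch_rows):
        # fetch_rows(stream_name, from_token, to_token) -> [(token, row), ...]
        # stands in for a database (or out-of-band master) query.
        self.fetch_rows = fetch_rows
        self.positions = {}  # stream name -> last token processed
        self.received = []

    def on_position(self, stream_name, token):
        current = self.positions.get(stream_name, 0)
        if token > current:
            # Rows in (current, token] happened before we connected,
            # so fetch them ourselves rather than waiting for RDATA.
            for row_token, row in self.fetch_rows(stream_name, current, token):
                self.on_rdata(stream_name, row_token, row)
        self.positions[stream_name] = max(current, token)

    def on_rdata(self, stream_name, token, row):
        self.received.append((stream_name, token, row))
        self.positions[stream_name] = token
```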
Richard van der Hoff
a564b92d37
Convert *StreamRow classes to inner classes (#7116)
This just helps keep the rows closer to their streams, so that it's easier to
see what the format of each stream is.
2020-03-23 13:59:11 +00:00
Richard van der Hoff
b3cee0ce67
Fix processing of groups stream, and use symbolic names for streams (#7117)
`groups` != `receipts`

Introduced in #6964
2020-03-23 11:39:36 +00:00
Erik Johnston
fdb1344716
Remove concept of a non-limited stream. (#7011) 2020-03-20 14:40:47 +00:00
Erik Johnston
a319cb1dd1
Change device list streams to have one row per ID (#7010)
* Add 'device_lists_outbound_pokes' as extra table.

This makes sure we check all the relevant tables to get the current max
stream ID.

Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.

* Change device lists stream to have one row per id.

This will make it possible to process the streams more incrementally,
avoiding having to process large chunks at once.

* Change device list replication to match new semantics.

Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).

* Newsfile

* Remove handling of multiple rows per ID

* Fix worker handling

* Comments from review
2020-03-19 11:36:53 +00:00
Erik Johnston
6e6476ef07 Comments from review 2020-03-18 10:13:55 +00:00
Richard van der Hoff
78a15b1f9d
Store room_versions in EventBase objects (#6875)
This is a bit fiddly because it all has to be done in one fell swoop:

* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
2020-03-05 15:46:44 +00:00
Erik Johnston
9ce4e344a8 Change device list replication to match new semantics.
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
2020-02-28 11:25:34 +00:00
Erik Johnston
c3c6c0e622 Add 'device_lists_outbound_pokes' as extra table.
This makes sure we check all the relevant tables to get the current max
stream ID.

Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
2020-02-28 11:15:11 +00:00
Richard van der Hoff
3e99528f2b
Store room version on invite (#6983)
When we get an invite over federation, store the room version in the rooms table.

The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
2020-02-26 16:58:33 +00:00
Erik Johnston
1f773eec91
Port PresenceHandler to async/await (#6991) 2020-02-26 15:33:26 +00:00
Erik Johnston
bbf8886a05
Merge worker apps into one. (#6964) 2020-02-25 16:56:55 +00:00
Erik Johnston
0bd8cf435e Increase MAX_EVENTS_BEHIND for replication clients 2020-02-21 09:04:33 +00:00
Erik Johnston
de2d267375
Allow moving group read APIs to workers (#6866) 2020-02-07 11:14:19 +00:00
Erik Johnston
c3d4ad8afd
Fix sending server up commands from workers (#6811)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2020-01-30 16:42:11 +00:00
Erik Johnston
e17a110661
Detect unknown remote devices and mark cache as stale (#6776)
We just mark the fact that the cache may be stale in the database for
now.
2020-01-28 14:43:21 +00:00
Erik Johnston
d5275fc55f
Propagate cache invalidates from workers to other workers. (#6748)
Currently, if a worker invalidates a cache, the invalidation is streamed to master, which does not forward it to other workers.
2020-01-27 13:47:50 +00:00
Erik Johnston
5d7a6ad223
Allow streaming cache invalidate all to workers. (#6749) 2020-01-22 10:37:00 +00:00
Erik Johnston
a8a50f5b57
Wake up transaction queue when remote server comes back online (#6706)
This will be used to retry outbound transactions to a remote server if
we think it might have come back up.
2020-01-17 10:27:19 +00:00
Erik Johnston
48c3a96886
Port synapse.replication.tcp to async/await (#6666)
* Port synapse.replication.tcp to async/await

* Newsfile

* Correctly document type of on_<FOO> functions as async

* Don't be overenthusiastic with the asyncing....
2020-01-16 09:16:12 +00:00
Erik Johnston
28c98e51ff
Add local_current_membership table (#6655)
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So let's add a table that separately holds that
information.
2020-01-15 14:59:33 +00:00
Erik Johnston
e8b68a4e4b
Fixup synapse.replication to pass mypy checks (#6667) 2020-01-14 14:08:06 +00:00
Richard van der Hoff
6964ea095b
Reduce the reconnect time when replication fails. (#6617) 2020-01-03 14:19:09 +00:00
Erik Johnston
fa780e9721
Change EventContext to use the Storage class (#6564) 2019-12-20 10:32:02 +00:00
Erik Johnston
9a4fb457cf Change DataStores to accept 'database' param. 2019-12-06 13:30:06 +00:00
Erik Johnston
a7f20500ff _CURRENT_STATE_CACHE_NAME is public 2019-12-04 15:45:42 +00:00
Erik Johnston
1056d6885a Move cache invalidation to main data store 2019-12-04 15:21:14 +00:00
Erik Johnston
2173785f0d Propagate reason in remotely rejected invites 2019-11-28 11:31:56 +00:00
Andrew Morgan
a8175d0f96
Prevent account_data content from being sent over TCP replication (#6333) 2019-11-26 13:58:39 +00:00
Erik Johnston
f9f1c8acbb
Merge pull request #6332 from matrix-org/erikj/query_devices_fix
Fix caching devices for remote servers in worker.
2019-11-26 12:56:05 +00:00
Erik Johnston
35f9165e96 Fixup docs 2019-11-26 12:04:48 +00:00
Andrew Morgan
cd96b4586f lint 2019-11-08 15:45:45 +00:00
Andrew Morgan
c4bdf2d785 Remove content from being sent for account data rdata stream 2019-11-08 15:44:02 +00:00
Andrew Morgan
1fe3cc2c9c Address review comments 2019-11-06 14:54:24 +00:00
Andrew Morgan
4059d61e26 Don't forget to ratelimit calls outside of RegistrationHandler 2019-11-06 12:01:54 +00:00
Erik Johnston
c16e192e2f Fix caching devices for remote servers in worker.
When the `/keys/query` API is hit on a client_reader worker, Synapse may
decide that it needs to resync some remote devices. Usually this happens
on master, and then gets cached. However, that fails on workers and so
it falls back to fetching devices from remotes directly, which may in
turn fail if the remote is down.
2019-11-05 15:49:43 +00:00
Richard van der Hoff
cc6243b4c0
document the REPLICATE command a bit better (#6305)
since I found myself wondering how it works
2019-11-04 12:40:18 +00:00
Hubert Chathi
9c94b48bf1 Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify 2019-10-31 12:32:07 -04:00
Hubert Chathi
f7e4a582ef clean up code a bit 2019-10-31 12:01:00 -04:00
Andrew Morgan
54fef094b3
Remove usage of deprecated logger.warn method from codebase (#6271)
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
2019-10-31 10:23:24 +00:00
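For reference, the replacement described in #6271 looks like this with the standard library's `logging` module. The logger name and message are made up for illustration; `Logger.warning` is the documented method and `Logger.warn` is its deprecated alias.

```python
import io
import logging

# Capture log output in memory so the example is self-contained.
stream = io.StringIO()
logger = logging.getLogger("example.replication")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)
logger.propagate = False

# Preferred spelling; `logger.warn(...)` is the deprecated form.
logger.warning("Failed to connect to %s", "replication server")

output = stream.getvalue()
```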
Hubert Chathi
998f7fe7d4 make user signatures a separate stream 2019-10-30 17:22:52 -04:00
Hubert Chathi
670972c0e1 Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify 2019-10-30 16:46:31 -04:00
Erik Johnston
e577a4b2ad Port replication http server endpoints to async/await 2019-10-29 13:00:51 +00:00
Hubert Chathi
8ac766c44a make notification of signatures work with workers 2019-10-24 22:14:58 -04:00
Erik Johnston
bb6264be0b Merge branch 'develop' of github.com:matrix-org/synapse into erikj/refactor_stores 2019-10-22 10:41:18 +01:00
Erik Johnston
c66a06ac6b Move storage classes into a main "data store".
This is in preparation for having multiple data stores that offer
different functionality, e.g. splitting out state or event storage.
2019-10-21 16:05:06 +01:00
Hubert Chathi
8e86f5b65c Merge branch 'develop' into uhoreg/e2e_cross-signing_merged 2019-09-07 13:20:34 -04:00
Jorik Schellekens
f7c873a643
Trace how long it takes for the send transaction to complete, including retries (#5986) 2019-09-05 17:44:55 +01:00
Jorik Schellekens
909827b422
Add opentracing to all client servlets (#5983) 2019-09-05 14:46:04 +01:00
Hubert Chathi
a22d58c96c add user signature stream change cache to slaved device store 2019-09-04 19:32:35 -04:00
Andrew Morgan
b736c6cd3a
Remove bind_email and bind_msisdn (#5964)
Removes the `bind_email` and `bind_msisdn` parameters from the `/register` C/S API endpoint as per [MSC2140: Terms of Service for ISes and IMs](https://github.com/matrix-org/matrix-doc/pull/2140/files#diff-c03a26de5ac40fb532de19cb7fc2aaf7R107).
2019-09-04 18:24:23 +01:00
Andrew Morgan
4548d1f87e
Remove unnecessary parentheses around return statements (#5931)
Python will return a tuple whether there are parentheses around the returned values or not.

I'm just sick of my editor complaining about this all over the place :)
2019-08-30 16:28:26 +01:00
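The claim in #5931 is easy to check: in Python it is the comma, not the parentheses, that builds the tuple. The function names below are illustrative only.

```python
def with_parens():
    return (1, "a")


def without_parens():
    return 1, "a"


# Both forms return the same tuple.
same_value = with_parens() == without_parens()
result_type = type(without_parens())
```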
Jorik Schellekens
812ed6b0d5
Opentracing across workers (#5771)
Propagate opentracing contexts across workers


Also includes some convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject and
  extract methods. This is useful when injecting into carriers
  locally; otherwise we'd always have to include our
  own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
  from a request to a servlet
2019-08-22 18:08:07 +01:00
Brendan Abolivier
1c5b8c6222 Revert "Add "require_consent" parameter for registration"
This reverts commit 3320aaab3a.
2019-08-22 14:47:34 +01:00
Half-Shot
3320aaab3a Add "require_consent" parameter for registration 2019-08-22 14:21:54 +01:00
Andrew Morgan
baf081cd3b Bugfixes
--------
 
 - Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))

Merge tag 'v1.2.0rc2' into develop

2019-07-24 13:47:51 +01:00
Jorik Schellekens
cf2972c818
Fix servlet metric names (#5734)
* Fix servlet metric names

Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

* Remove redundant check

* Cover all return paths
2019-07-24 13:07:35 +01:00
Amber Brown
4806651744
Replace returnValue with return (#5736) 2019-07-23 23:00:55 +10:00
Richard van der Hoff
824707383b
Remove access-token support from RegistrationHandler.register (#5641)
Nothing uses this now, so we can remove the dead code, and clean up the
API.

Since we're changing the shape of the return value anyway, we take the
opportunity to give the method a better name.
2019-07-08 19:01:08 +01:00
Richard van der Hoff
80cc82a445
Remove support for invite_3pid_guest. (#5625)
This has never been documented, and I'm not sure it's ever been used outside
sytest.

It's quite a lot of poorly-maintained code, so I'd like to get rid of it.

For now I haven't removed the database table; I suggest we leave that for a
future clearout.
2019-07-05 16:47:58 +01:00
Amber Brown
463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ (#5606) 2019-07-04 00:07:04 +10:00
Amber Brown
32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Erik Johnston
6745b7de6d Handle failing to talk to master over replication 2019-06-07 10:47:31 +01:00
Erik Johnston
5dbff34509 Fixup based on review comments 2019-05-17 15:48:04 +01:00
Erik Johnston
d46aab3fa8 Add basic editing support 2019-05-16 16:54:45 +01:00
Erik Johnston
b5c62c6b26 Fix relations in worker mode 2019-05-16 10:38:13 +01:00
Richard van der Hoff
f50efcb65d Replace SlavedKeyStore with a shim
since we're pulling everything out of KeyStore anyway, we may as well simplify
it.
2019-04-08 23:59:07 +01:00
Richard van der Hoff
3352baac4b
Remove unused server_tls_certificates functions (#5028)
These have been unused since #4120, and with the demise of perspectives, it is
unlikely that they will ever be used again.
2019-04-08 21:50:18 +01:00
Neil Johnson
e8419554ff
Remove presence lists (#4989)
Remove presence list support as per MSC 1819
2019-04-03 11:11:15 +01:00
Richard van der Hoff
297bf2547e
Fix sync bug when accepting invites (#4956)
Hopefully this time we really will fix #4422.

We need to make sure that the cache on
`get_rooms_for_user_with_stream_ordering` is invalidated *before* the
SyncHandler is notified for the new events, and we can now do so reliably via
the `events` stream.
2019-04-02 12:42:39 +01:00
Richard van der Hoff
4b91c313a9 Combine the CurrentStateDeltaStream into the EventStream 2019-03-27 22:07:05 +00:00
Richard van der Hoff
1f6d6f918a Make EventStream rows have a type
... as a precursor to combining it with the CurrentStateDelta stream.
2019-03-27 22:07:05 +00:00
Richard van der Hoff
015b3622eb Skip building a ROW_TYPE when building updates
We're about to turn it straight into a JSON object anyway so building a
ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.
2019-03-27 21:58:03 +00:00
Richard van der Hoff
f570916a3e Add parse_row method to replication stream class
This will allow individual stream classes to override how a row is parsed.
2019-03-27 21:32:33 +00:00
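A sketch of what an overridable `parse_row` can look like, assuming rows arrive as JSON-decoded lists. `Stream`, `SimpleStream`, `TypedStream`, and the row fields are hypothetical; only the `parse_row` and `ROW_TYPE` names come from the commit messages.

```python
from collections import namedtuple


class Stream:
    """Hypothetical base class for a replication stream."""

    ROW_TYPE = None

    @classmethod
    def parse_row(cls, row):
        # Default: unpack the list positionally into the stream's row type.
        return cls.ROW_TYPE(*row)


class SimpleStream(Stream):
    ROW_TYPE = namedtuple("SimpleStreamRow", ["room_id", "user_id"])


class TypedStream(Stream):
    """Overrides parsing: the first element selects the row type."""

    ROW_TYPES = {
        "ev": namedtuple("EventRow", ["event_id", "room_id"]),
        "state": namedtuple("StateRow", ["room_id", "state_key"]),
    }

    @classmethod
    def parse_row(cls, row):
        type_id, data = row
        return cls.ROW_TYPES[type_id](*data)
```

Keeping the parsing on the stream class means a stream whose rows carry a type tag can decode them differently without the caller needing to know.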
Richard van der Hoff
71dcb275f1 move FederationStream out to its own file 2019-03-27 21:13:14 +00:00
Richard van der Hoff
aa1e017864 move EventsStream out to its own file 2019-03-27 21:13:14 +00:00
Richard van der Hoff
a5798de067 Move replication.tcp.streams into a package 2019-03-27 21:13:14 +00:00
Richard van der Hoff
acaa18f7dd
Fix/improve some docstrings in the replication code. (#4949) 2019-03-27 21:12:36 +00:00
Richard van der Hoff
8cbbedaa2b
Fix ClientReplicationStreamProtocol.__str__ (#4929)
`__str__` depended on `self.addr`, which was absent from
ClientReplicationStreamProtocol, so attempting to call str on such an object
would raise an exception.

We can calculate the peer addr from the transport, so there is no need for addr
anyway.
2019-03-25 16:41:51 +00:00
Richard van der Hoff
9bde730ef8
Fix bug where read-receipts lost their timestamps (#4927)
Make sure that they are sent correctly over the replication stream.

Fixes: #4898
2019-03-25 16:38:05 +00:00
Richard van der Hoff
cdb8036161
Add a config option for torture-testing worker replication. (#4902)
Setting this to 50 or so makes a bunch of sytests fail in worker mode.
2019-03-20 16:04:35 +00:00
Erik Johnston
face0c5b3c Prefill client IPs cache on workers 2019-03-06 17:39:32 +00:00
Andrew Morgan
7b8a157b79
Merge pull request #4792 from matrix-org/anoa/replication_tokens
Support batch updates in the worker sender
2019-03-06 15:48:29 +00:00
Brendan Abolivier
a4c3a361b7
Add rate-limiting on registration (#4735)
* Rate-limiting for registration

* Add unit test for registration rate limiting

* Add config parameters for rate limiting on auth endpoints

* Doc

* Fix doc of rate limiting function

Co-Authored-By: babolivier <contact@brendanabolivier.com>

* Incorporate review

* Fix config parsing

* Fix linting errors

* Set default config for auth rate limiting

* Fix tests

* Add changelog

* Advance reactor instead of mocked clock

* Move parameters to registration specific config and give them more sensible default values

* Remove unused config options

* Don't mock the rate limiter in MAU tests

* Rename _register_with_store into register_with_store

* Make CI happy

* Remove unused import

* Update sample config

* Fix ratelimiting test for py2

* Add non-guest test
2019-03-05 14:25:33 +00:00
Andrew Morgan
b9f6163092 Simplify token replication logic 2019-03-05 13:58:30 +00:00
Erik Johnston
a84b8d56c2 Fixup slave stores 2019-03-04 18:04:57 +00:00
Andrew Morgan
fe7bd23a85 Clean up logic and add comments 2019-03-04 15:08:15 +00:00
Andrew Morgan
9f7cdf3da1 Clearer branching, fix missing list clear 2019-03-04 14:36:52 +00:00
Andrew Morgan
5f0c449dd5 Prevent replication wedging 2019-03-04 14:03:18 +00:00
Erik Johnston
1e315017d3 When presence is enabled don't send over replication 2019-02-27 13:53:46 +00:00
Erik Johnston
7590e9fa28
Merge pull request #4749 from matrix-org/erikj/replication_connection_backoff
Fix tightloop over connecting to replication server
2019-02-27 11:00:59 +00:00
Erik Johnston
6bb1c028f1 Limit cache invalidation replication line length (#4748) 2019-02-27 10:28:37 +00:00
Erik Johnston
6870fc496f Move connecting logic into ClientReplicationStreamProtocol 2019-02-27 10:23:51 +00:00
Erik Johnston
25814921f1 Increase the max delay between retry attempts
Otherwise if you have many workers they can easily take out master with
their connection attempts
2019-02-26 15:12:33 +00:00
Erik Johnston
313987187e Fix tightloop over connecting to replication server
If the client failed to process incoming commands during the initial set
up of the replication connection it would immediately disconnect and
reconnect, resulting in a tightloop.

This can happen, for example, when subscribing to a stream that has a
row that is too long in the backlog.

The fix here is to not consider the connection successfully set up until
the client has successfully subscribed and caught up with the streams.
This ensures that the retry logic timers aren't reset until then,
meaning that if an error does happen during start up the client will
continue backing off before retrying again.
2019-02-26 15:05:41 +00:00
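The fix described above can be sketched as a backoff policy whose delay is reset only once the client has caught up, not when the connection is merely established. `ReconnectBackoff` and its default values are hypothetical and chosen for illustration; this is not the code from the commit.

```python
class ReconnectBackoff:
    """Hypothetical exponential backoff for replication reconnects."""

    def __init__(self, initial_delay=1.0, factor=2.0, max_delay=60.0):
        self.initial_delay = initial_delay
        self.factor = factor
        self.max_delay = max_delay
        self._delay = initial_delay

    def next_delay(self):
        """Return how long to wait before the next attempt, then back off."""
        delay = self._delay
        self._delay = min(self._delay * self.factor, self.max_delay)
        return delay

    def on_connected(self):
        """Connection made: deliberately does NOT reset the delay."""

    def on_caught_up(self):
        """Subscribed and caught up: only now is the delay reset."""
        self._delay = self.initial_delay
```

If setup fails after the connection is made (for example on an over-long row in the backlog), the next attempt still waits for the backed-off delay instead of retrying immediately.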
Erik Johnston
80467bbac3 Fix state cache invalidation on workers 2019-02-22 14:38:14 +00:00
Erik Johnston
dbdc565dfd Fix registration on workers (#4682)
* Move RegistrationHandler init to HomeServer

* Move post registration actions to RegistrationHandler

* Add post registration replication endpoint

* Newsfile
2019-02-20 18:47:31 +11:00
Erik Johnston
a9b5ea6fc1 Batch cache invalidation over replication
Currently, whenever the current state changes in a room we invalidate a lot
of caches, which causes *a lot* of traffic over replication. Instead,
let's batch up all those invalidations and send a single poke down
the replication streams.

Hopefully this will reduce load on the master process by substantially
reducing traffic.
2019-02-18 17:53:31 +00:00
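One way to picture the batching is a collector that queues invalidations and emits them as a single poke. This is a generic sketch, not the mechanism the commit implements: `InvalidationBatcher`, `send_poke`, and the cache names are hypothetical.

```python
class InvalidationBatcher:
    """Hypothetical helper: queue invalidations, send them as one poke."""

    def __init__(self, send_poke):
        # send_poke(batch) stands in for writing one row to a replication stream.
        self.send_poke = send_poke
        self.pending = []

    def invalidate(self, cache_name, key):
        self.pending.append((cache_name, key))

    def flush(self):
        if not self.pending:
            return
        # Swap the list out first so new invalidations start a fresh batch.
        batch, self.pending = self.pending, []
        self.send_poke(batch)
```

A state change that touches several caches would then call `invalidate` once per cache and `flush` once at the end, producing one message instead of several.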
Erik Johnston
af691e415c Move register_device into handler 2019-02-18 16:49:38 +00:00