Commit Graph

716 Commits

Author SHA1 Message Date
Richard van der Hoff
a8c17da245 Merge branch 'release-v1.13.0' into rav/fix_dropped_messages 2020-05-05 23:01:12 +01:00
Richard van der Hoff
1242267316 Merge branch 'release-v1.13.0' into rav/fix_dropped_messages 2020-05-05 22:38:44 +01:00
Richard van der Hoff
7f7eedbebb Wait for a POSITION on the right connection before accepting RDATA
... otherwise we can believe we're up to date when we're not.
2020-05-05 22:38:16 +01:00
Brendan Abolivier
5b8023dc7f
Move logs about discarded RDATA to debug (#7421) 2020-05-05 21:07:33 +02:00
Richard van der Hoff
d78265af0c Wait to subscribe before sending REPLICATE 2020-05-05 19:31:37 +01:00
Richard van der Hoff
d5aa7d93ed
Fix catchup-on-reconnect for the Federation Stream (#7374)
looks like we managed to break this during the refactorathon.
2020-05-05 14:15:57 +01:00
Erik Johnston
350421e058
Fix redis password support. (#7401)
We forgot to set the password on the subscriber connection, as well as
not calling super methods for overridden connectionMade/connectionLost
functions.
2020-05-04 14:04:09 +01:00
Erik Johnston
0e719f2398
Thread through instance name to replication client. (#7369)
For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
2020-05-01 17:19:56 +01:00
Erik Johnston
3085cde577
Use stream.current_token() and remove stream_positions() (#7172)
We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
2020-05-01 15:21:35 +01:00
Richard van der Hoff
b2dba06079
Workaround for assertion errors from db_query_to_update_function (#7378)
Hopefully this is no worse than what we have on master...
2020-05-01 09:25:16 +01:00
Erik Johnston
37f6823f5b
Add instance name to RDATA/POSITION commands (#7364)
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
2020-04-29 16:23:08 +01:00
Erik Johnston
3eab76ad43
Don't relay REMOTE_SERVER_UP cmds to same conn. (#7352)
For direct TCP connections we need the master to relay REMOTE_SERVER_UP
commands to the other connections so that all instances get notified
about it. The old implementation just relayed to all connections,
assuming that sending back to the original sender of the command was
safe. This is not true for redis, where commands sent get echoed back to
the sender, which was causing master to effectively infinite loop
sending and then re-receiving REMOTE_SERVER_UP commands that it sent.

The fix is to ensure that we only relay to *other* connections and not
to the connection we received the notification from.

Fixes #7334.
2020-04-29 14:10:59 +01:00
Richard van der Hoff
c2e1a2110f
Fix limit logic for EventsStream (#7358)
* Factor out functions for injecting events into database

I want to add some more flexibility to the tools for injecting events into the
database, and I don't want to clutter up HomeserverTestCase with them, so let's
factor them out to a new file.

* Rework TestReplicationDataHandler

This wasn't very easy to work with: the mock wrapping was largely superfluous,
and it's useful to be able to inspect the received rows, and clear out the
received list.

* Fix AssertionErrors being thrown by EventsStream

Part of the problem was that there was an off-by-one error in the assertion,
but also the limit logic was too simple. Fix it all up and add some tests.
2020-04-29 12:30:36 +01:00
Erik Johnston
38919b521e
Run replication streamers on workers (#7146)
Currently we never write to streams from workers, but that will change soon
2020-04-28 13:34:12 +01:00
Richard van der Hoff
ce428a1abe Fix EventsStream raising assertions when it falls behind
Figuring out how to correctly limit updates from this stream without dropping
entries is far more complicated than just counting the number of rows being
returned. We need to consider each query separately and, if any one query hits
the limit, truncate the results from the others.

I think this also fixes some potentially long-standing bugs where events or
state changes could get missed if we hit the limit on either query.
2020-04-24 13:59:21 +01:00
Richard van der Hoff
9cbdfb3a2f Make it clear that the limit for an update_function is a target 2020-04-23 15:45:12 +01:00
Richard van der Hoff
23b28266ac Remove 'limit' param from get_repl_stream_updates API
there doesn't seem to be much point in passing this limit all around, since
both sides agree it's meant to be 100.
2020-04-23 15:44:35 +01:00
Richard van der Hoff
71a1abb8a1
Stop the master relaying USER_SYNC for other workers (#7318)
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.

In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.

Fixes (I hope) #7257.
2020-04-22 22:39:04 +01:00
Erik Johnston
841c581c40
Fix replication metrics when using redis (#7325) 2020-04-22 16:26:19 +01:00
Richard van der Hoff
82d8b1dd1f
Another go at fixing one-word commands (#7326)
I messed this up last time I tried (#7239 / e13c6c7).
2020-04-22 14:34:31 +01:00
Erik Johnston
51f7eaf908
Add ability to run replication protocol over redis. (#7040)
This is configured via the `redis` config options.
2020-04-22 13:07:41 +01:00
Richard van der Hoff
0f8f02bc39
On catchup, process each row with its own stream id (#7286)
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.

This code was introduced in #7024, and I hope this fixes #7206.
2020-04-20 11:43:29 +01:00
Richard van der Hoff
67ff7b8ba0
Improve type checking in replication.tcp.Stream (#7291)
The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290.

After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.

I've also introduced a couple of new types, to take out some duplication.
2020-04-17 14:49:55 +01:00
Richard van der Hoff
d7d42387f5
Fix 'generator object is not subscriptable' error (#7290)
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.

(think this was introduced in #7024)
2020-04-16 14:37:06 +01:00
Richard van der Hoff
e13c6c7a96 Handle one-word replication commands correctly
`REPLICATE` is now a valid command, and it's nice if you can issue it from the
console without remembering to call it `REPLICATE ` with a trailing space.
2020-04-07 17:43:46 +01:00
Richard van der Hoff
c3e4b4edb2 Fix warnings about not calling superclass constructor
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
2020-04-07 17:40:22 +01:00
Richard van der Hoff
6a519a0ca0 Remove vestigal references to SYNC replication command
We've ripped pretty much all of this out: let's remove the remains.
2020-04-07 17:40:07 +01:00
Erik Johnston
ce72355d7f
Fix race in replication (#7226)
Fixes a race between handling `POSITION` and `RDATA` commands. We do this by simply linearizing handling of them.
2020-04-07 11:01:04 +01:00
Erik Johnston
82498ee901
Move server command handling out of TCP protocol (#7187)
This completes the merging of server and client command processing.
2020-04-07 10:51:07 +01:00
Erik Johnston
5016b162fc
Move client command handling out of TCP protocol (#7185)
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
2020-04-06 09:58:42 +01:00
Erik Johnston
dfa0782254
Remove connections per replication stream metric. (#7195)
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
2020-04-01 10:40:46 +01:00
Erik Johnston
4f21c33be3
Remove usage of "conn_id" for presence. (#7128)
* Remove `conn_id` usage for UserSyncCommand.

Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replicaiton commands.

This really only effects UserSyncCommand.

* Add CLEAR_USER_SYNCS command that is sent on shutdown.

This should help with the case where a synchrotron gets restarted
gracefully, rather than rely on 5 minute timeout.
2020-03-30 16:37:24 +01:00
Erik Johnston
4cff617df1
Move catchup of replication streams to worker. (#7024)
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
2020-03-25 14:54:01 +00:00
Richard van der Hoff
a564b92d37
Convert *StreamRow classes to inner classes (#7116)
This just helps keep the rows closer to their streams, so that it's easier to
see what the format of each stream is.
2020-03-23 13:59:11 +00:00
Richard van der Hoff
b3cee0ce67
Fix processing of groups stream, and use symbolic names for streams (#7117)
`groups` != `receipts`

Introduced in #6964
2020-03-23 11:39:36 +00:00
Erik Johnston
fdb1344716
Remove concept of a non-limited stream. (#7011) 2020-03-20 14:40:47 +00:00
Erik Johnston
a319cb1dd1
Change device list streams to have one row per ID (#7010)
* Add 'device_lists_outbound_pokes' as extra table.

This makes sure we check all the relevant tables to get the current max
stream ID.

Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.

* Change device lists stream to have one row per id.

This will make it possible to process the streams more incrementally,
avoiding having to process large chunks at once.

* Change device list replication to match new semantics.

Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).

* Newsfile

* Remove handling of multiple rows per ID

* Fix worker handling

* Comments from review
2020-03-19 11:36:53 +00:00
Erik Johnston
6e6476ef07 Comments from review 2020-03-18 10:13:55 +00:00
Richard van der Hoff
78a15b1f9d
Store room_versions in EventBase objects (#6875)
This is a bit fiddly because it all has to be done on one fell swoop:

* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
2020-03-05 15:46:44 +00:00
Erik Johnston
9ce4e344a8 Change device list replication to match new semantics.
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
2020-02-28 11:25:34 +00:00
Erik Johnston
c3c6c0e622 Add 'device_lists_outbound_pokes' as extra table.
This makes sure we check all the relevant tables to get the current max
stream ID.

Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
2020-02-28 11:15:11 +00:00
Richard van der Hoff
3e99528f2b
Store room version on invite (#6983)
When we get an invite over federation, store the room version in the rooms table.

The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
2020-02-26 16:58:33 +00:00
Erik Johnston
1f773eec91
Port PresenceHandler to async/await (#6991) 2020-02-26 15:33:26 +00:00
Erik Johnston
bbf8886a05
Merge worker apps into one. (#6964) 2020-02-25 16:56:55 +00:00
Erik Johnston
0bd8cf435e Increase MAX_EVENTS_BEHIND for replication clients 2020-02-21 09:04:33 +00:00
Erik Johnston
de2d267375
Allow moving group read APIs to workers (#6866) 2020-02-07 11:14:19 +00:00
Erik Johnston
c3d4ad8afd
Fix sending server up commands from workers (#6811)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2020-01-30 16:42:11 +00:00
Erik Johnston
e17a110661
Detect unknown remote devices and mark cache as stale (#6776)
We just mark the fact that the cache may be stale in the database for
now.
2020-01-28 14:43:21 +00:00
Erik Johnston
d5275fc55f
Propagate cache invalidates from workers to other workers. (#6748)
Currently if a worker invalidates a cache it will be streamed to master, which then didn't forward those to other workers.
2020-01-27 13:47:50 +00:00
Erik Johnston
5d7a6ad223
Allow streaming cache invalidate all to workers. (#6749) 2020-01-22 10:37:00 +00:00
Erik Johnston
a8a50f5b57
Wake up transaction queue when remote server comes back online (#6706)
This will be used to retry outbound transactions to a remote server if
we think it might have come back up.
2020-01-17 10:27:19 +00:00
Erik Johnston
48c3a96886
Port synapse.replication.tcp to async/await (#6666)
* Port synapse.replication.tcp to async/await

* Newsfile

* Correctly document type of on_<FOO> functions as async

* Don't be overenthusiastic with the asyncing....
2020-01-16 09:16:12 +00:00
Erik Johnston
28c98e51ff
Add local_current_membership table (#6655)
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
2020-01-15 14:59:33 +00:00
Erik Johnston
e8b68a4e4b
Fixup synapse.replication to pass mypy checks (#6667) 2020-01-14 14:08:06 +00:00
Richard van der Hoff
6964ea095b
Reduce the reconnect time when replication fails. (#6617) 2020-01-03 14:19:09 +00:00
Erik Johnston
fa780e9721
Change EventContext to use the Storage class (#6564) 2019-12-20 10:32:02 +00:00
Erik Johnston
9a4fb457cf Change DataStores to accept 'database' param. 2019-12-06 13:30:06 +00:00
Erik Johnston
a7f20500ff _CURRENT_STATE_CACHE_NAME is public 2019-12-04 15:45:42 +00:00
Erik Johnston
1056d6885a Move cache invalidation to main data store 2019-12-04 15:21:14 +00:00
Erik Johnston
2173785f0d Propagate reason in remotely rejected invites 2019-11-28 11:31:56 +00:00
Andrew Morgan
a8175d0f96
Prevent account_data content from being sent over TCP replication (#6333) 2019-11-26 13:58:39 +00:00
Erik Johnston
f9f1c8acbb
Merge pull request #6332 from matrix-org/erikj/query_devices_fix
Fix caching devices for remote servers in worker.
2019-11-26 12:56:05 +00:00
Erik Johnston
35f9165e96 Fixup docs 2019-11-26 12:04:48 +00:00
Andrew Morgan
cd96b4586f lint 2019-11-08 15:45:45 +00:00
Andrew Morgan
c4bdf2d785 Remove content from being sent for account data rdata stream 2019-11-08 15:44:02 +00:00
Andrew Morgan
1fe3cc2c9c Address review comments 2019-11-06 14:54:24 +00:00
Andrew Morgan
4059d61e26 Don't forget to ratelimit calls outside of RegistrationHandler 2019-11-06 12:01:54 +00:00
Erik Johnston
c16e192e2f Fix caching devices for remote servers in worker.
When the `/keys/query` API is hit on client_reader worker Synapse may
decide that it needs to resync some remote deivces. Usually this happens
on master, and then gets cached. However, that fails on workers and so
it falls back to fetching devices from remotes directly, which may in
turn fail if the remote is down.
2019-11-05 15:49:43 +00:00
Richard van der Hoff
cc6243b4c0
document the REPLICATE command a bit better (#6305)
since I found myself wonder how it works
2019-11-04 12:40:18 +00:00
Hubert Chathi
9c94b48bf1 Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify 2019-10-31 12:32:07 -04:00
Hubert Chathi
f7e4a582ef clean up code a bit 2019-10-31 12:01:00 -04:00
Andrew Morgan
54fef094b3
Remove usage of deprecated logger.warn method from codebase (#6271)
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
2019-10-31 10:23:24 +00:00
Hubert Chathi
998f7fe7d4 make user signatures a separate stream 2019-10-30 17:22:52 -04:00
Hubert Chathi
670972c0e1 Merge branch 'develop' into uhoreg/cross_signing_fix_workers_notify 2019-10-30 16:46:31 -04:00
Erik Johnston
e577a4b2ad Port replication http server endpoints to async/await 2019-10-29 13:00:51 +00:00
Hubert Chathi
8ac766c44a make notification of signatures work with workers 2019-10-24 22:14:58 -04:00
Erik Johnston
bb6264be0b Merge branch 'develop' of github.com:matrix-org/synapse into erikj/refactor_stores 2019-10-22 10:41:18 +01:00
Erik Johnston
c66a06ac6b Move storage classes into a main "data store".
This is in preparation for having multiple data stores that offer
different functionality, e.g. splitting out state or event storage.
2019-10-21 16:05:06 +01:00
Hubert Chathi
8e86f5b65c Merge branch 'develop' into uhoreg/e2e_cross-signing_merged 2019-09-07 13:20:34 -04:00
Jorik Schellekens
f7c873a643
Trace how long it takes for the send trasaction to complete, including retrys (#5986) 2019-09-05 17:44:55 +01:00
Jorik Schellekens
909827b422
Add opentracing to all client servlets (#5983) 2019-09-05 14:46:04 +01:00
Hubert Chathi
a22d58c96c add user signature stream change cache to slaved device store 2019-09-04 19:32:35 -04:00
Andrew Morgan
b736c6cd3a
Remove bind_email and bind_msisdn (#5964)
Removes the `bind_email` and `bind_msisdn` parameters from the `/register` C/S API endpoint as per [MSC2140: Terms of Service for ISes and IMs](https://github.com/matrix-org/matrix-doc/pull/2140/files#diff-c03a26de5ac40fb532de19cb7fc2aaf7R107).
2019-09-04 18:24:23 +01:00
Andrew Morgan
4548d1f87e
Remove unnecessary parentheses around return statements (#5931)
Python will return a tuple whether there are parentheses around the returned values or not.

I'm just sick of my editor complaining about this all over the place :)
2019-08-30 16:28:26 +01:00
Jorik Schellekens
812ed6b0d5
Opentracing across workers (#5771)
Propagate opentracing contexts across workers


Also includes some Convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject
  extract methods. - useful when injecting into carriers
  locally. Otherwise we'd always have to include our
  own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
  from a request to a servlet
2019-08-22 18:08:07 +01:00
Brendan Abolivier
1c5b8c6222 Revert "Add "require_consent" parameter for registration"
This reverts commit 3320aaab3a.
2019-08-22 14:47:34 +01:00
Half-Shot
3320aaab3a Add "require_consent" parameter for registration 2019-08-22 14:21:54 +01:00
Andrew Morgan
baf081cd3b Bugfixes
--------
 
 - Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAl04Ur0THGFuZHJld0Bh
 bW9yZ2FuLnh5egAKCRCIhIgNLv5f9F4oD/0TY6S/SEd2uAmzor64ojmbX5BOwPzf
 j/wzUTrfvuf40EvkNPDpnejNZSvy/ysbaGQaQusv0SQKlV3xrvdn4RuMvnOWVWck
 kBsO+lvzOaUTR0KHDxN4y9F5eI2NdPbub4847PPVzyqSIHAd+kolxXS8kSBBhwpL
 yfaICWV/AOy5L7xN+JZ9IQpnegVAvUj5DmgXzDHd6VdeiHDVJuARaBgrR5uCkwVS
 ZoLRqZ95XV/qiguMAUvPOwyEqht2mwO64989MswP16YYm8oMkB5QA6I5nYnACsTP
 qk9YcN/oNvEfQXUhttku6MxK1/4yUMPUhEoDBDH7ebc0440QDtWN+IHTdA6oPVZB
 IuStL9YGY16m7Ltx37ZUA4URfNMiSeLHo3zKc/mCAcwxN4HyOjJewtxbG5zKQAOZ
 SMs8UcDwGR4zL1hnt8ZDNYtWwfzJBQIdGjoHvjXJEY7/1csTv2lmAwewFTXiqSAr
 30GW5ews94kotqBK53zZT6V0F5gHNqgGHniOz1ZpqLLxYLqO3LSAGe97CrqlWUdX
 GkhA9tZyweknociD9fyyBmKdcFJ4mL4a+oGI5CMnSMph8UvCY8Y5XMb1T+iYEABI
 tA9G3mBvgkLPj+5V+8QggNkBafSigW2Q4FX7enGsDmiiskZOtfeKrAcVkapD4ooi
 3I7IW5aetZr2IQ==
 =+JBn
 -----END PGP SIGNATURE-----

Merge tag 'v1.2.0rc2' into develop

Bugfixes
--------

- Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
2019-07-24 13:47:51 +01:00
Jorik Schellekens
cf2972c818
Fix servlet metric names (#5734)
* Fix servlet metric names

Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>

* Remove redundant check

* Cover all return paths
2019-07-24 13:07:35 +01:00
Amber Brown
4806651744
Replace returnValue with return (#5736) 2019-07-23 23:00:55 +10:00
Richard van der Hoff
824707383b
Remove access-token support from RegistrationHandler.register (#5641)
Nothing uses this now, so we can remove the dead code, and clean up the
API.

Since we're changing the shape of the return value anyway, we take the
opportunity to give the method a better name.
2019-07-08 19:01:08 +01:00
Richard van der Hoff
80cc82a445
Remove support for invite_3pid_guest. (#5625)
This has never been documented, and I'm not sure it's ever been used outside
sytest.

It's quite a lot of poorly-maintained code, so I'd like to get rid of it.

For now I haven't removed the database table; I suggest we leave that for a
future clearout.
2019-07-05 16:47:58 +01:00
Amber Brown
463b072b12
Move logging utilities out of the side drawer of util/ and into logging/ (#5606) 2019-07-04 00:07:04 +10:00
Amber Brown
32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Erik Johnston
6745b7de6d Handle failing to talk to master over replication 2019-06-07 10:47:31 +01:00
Erik Johnston
5dbff34509 Fixup bsaed on review comments 2019-05-17 15:48:04 +01:00
Erik Johnston
d46aab3fa8 Add basic editing support 2019-05-16 16:54:45 +01:00
Erik Johnston
b5c62c6b26 Fix relations in worker mode 2019-05-16 10:38:13 +01:00
Richard van der Hoff
f50efcb65d Replace SlavedKeyStore with a shim
since we're pulling everything out of KeyStore anyway, we may as well simplify
it.
2019-04-08 23:59:07 +01:00
Richard van der Hoff
3352baac4b
Remove unused server_tls_certificates functions (#5028)
These have been unused since #4120, and with the demise of perspectives, it is
unlikely that they will ever be used again.
2019-04-08 21:50:18 +01:00
Neil Johnson
e8419554ff
Remove presence lists (#4989)
Remove presence list support as per MSC 1819
2019-04-03 11:11:15 +01:00
Richard van der Hoff
297bf2547e
Fix sync bug when accepting invites (#4956)
Hopefully this time we really will fix #4422.

We need to make sure that the cache on
`get_rooms_for_user_with_stream_ordering` is invalidated *before* the
SyncHandler is notified for the new events, and we can now do so reliably via
the `events` stream.
2019-04-02 12:42:39 +01:00
Richard van der Hoff
4b91c313a9 Combine the CurrentStateDeltaStream into the EventStream 2019-03-27 22:07:05 +00:00
Richard van der Hoff
1f6d6f918a Make EventStream rows have a type
... as a precursor to combining it with the CurrentStateDelta stream.
2019-03-27 22:07:05 +00:00
Richard van der Hoff
015b3622eb Skip building a ROW_TYPE when building updates
We're about to turn it straight into a JSON object anyway so building a
ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.
2019-03-27 21:58:03 +00:00
Richard van der Hoff
f570916a3e Add parse_row method to replication stream class
This will allow individual stream classes to override how a row is parsed.
2019-03-27 21:32:33 +00:00
Richard van der Hoff
71dcb275f1 move FederationStream out to its own file 2019-03-27 21:13:14 +00:00
Richard van der Hoff
aa1e017864 move EventsStream out to its own file 2019-03-27 21:13:14 +00:00
Richard van der Hoff
a5798de067 Move replication.tcp.streams into a package 2019-03-27 21:13:14 +00:00
Richard van der Hoff
acaa18f7dd
Fix/improve some docstrings in the replication code. (#4949) 2019-03-27 21:12:36 +00:00
Richard van der Hoff
8cbbedaa2b
Fix ClientReplicationStreamProtocol.__str__ (#4929)
`__str__` depended on `self.addr`, which was absent from
ClientReplicationStreamProtocol, so attempting to call str on such an object
would raise an exception.

We can calculate the peer addr from the transport, so there is no need for addr
anyway.
2019-03-25 16:41:51 +00:00
Richard van der Hoff
9bde730ef8
Fix bug where read-receipts lost their timestamps (#4927)
Make sure that they are sent correctly over the replication stream.

Fixes: #4898
2019-03-25 16:38:05 +00:00
Richard van der Hoff
cdb8036161
Add a config option for torture-testing worker replication. (#4902)
Setting this to 50 or so makes a bunch of sytests fail in worker mode.
2019-03-20 16:04:35 +00:00
Erik Johnston
face0c5b3c Prefill client IPs cache on workers 2019-03-06 17:39:32 +00:00
Andrew Morgan
7b8a157b79
Merge pull request #4792 from matrix-org/anoa/replication_tokens
Support batch updates in the worker sender
2019-03-06 15:48:29 +00:00
Brendan Abolivier
a4c3a361b7
Add rate-limiting on registration (#4735)
* Rate-limiting for registration

* Add unit test for registration rate limiting

* Add config parameters for rate limiting on auth endpoints

* Doc

* Fix doc of rate limiting function

Co-Authored-By: babolivier <contact@brendanabolivier.com>

* Incorporate review

* Fix config parsing

* Fix linting errors

* Set default config for auth rate limiting

* Fix tests

* Add changelog

* Advance reactor instead of mocked clock

* Move parameters to registration specific config and give them more sensible default values

* Remove unused config options

* Don't mock the rate limiter un MAU tests

* Rename _register_with_store into register_with_store

* Make CI happy

* Remove unused import

* Update sample config

* Fix ratelimiting test for py2

* Add non-guest test
2019-03-05 14:25:33 +00:00
Andrew Morgan
b9f6163092 Simplify token replication logic 2019-03-05 13:58:30 +00:00
Erik Johnston
a84b8d56c2 Fixup slave stores 2019-03-04 18:04:57 +00:00
Andrew Morgan
fe7bd23a85 Clean up logic and add comments 2019-03-04 15:08:15 +00:00
Andrew Morgan
9f7cdf3da1 Clearer branching, fix missing list clear 2019-03-04 14:36:52 +00:00
Andrew Morgan
5f0c449dd5 Prevent replication wedging 2019-03-04 14:03:18 +00:00
Erik Johnston
1e315017d3 When presence is enabled don't send over replication 2019-02-27 13:53:46 +00:00
Erik Johnston
7590e9fa28
Merge pull request #4749 from matrix-org/erikj/replication_connection_backoff
Fix tightloop over connecting to replication server
2019-02-27 11:00:59 +00:00
Erik Johnston
6bb1c028f1 Limit cache invalidation replication line length (#4748) 2019-02-27 10:28:37 +00:00
Erik Johnston
6870fc496f Move connecting logic into ClientReplicationStreamProtocol 2019-02-27 10:23:51 +00:00
Erik Johnston
25814921f1 Increase the max delay between retry attempts
Otherwise if you have many workers they can easily take out master with
their connection attempts
2019-02-26 15:12:33 +00:00
Erik Johnston
313987187e Fix tightloop over connecting to replication server
If the client failed to process incoming commands during the initial set
up of the replication connection it would immediately disconnect and
reconnect, resulting in a tightloop.

This can happen, for example, when subscribing to a stream that has a
row that is too long in the backlog.

The fix here is to not consider the connection successfully set up until
the client has succesfully subscribed and caught up with the streams.
This ensures that the retry logic timers aren't reset until then,
meaning that if an error does happen during start up the client will
continue backing off before retrying again.
2019-02-26 15:05:41 +00:00
Erik Johnston
80467bbac3 Fix state cache invalidation on workers 2019-02-22 14:38:14 +00:00
Erik Johnston
dbdc565dfd Fix registration on workers (#4682)
* Move RegistrationHandler init to HomeServer

* Move post registration actions to RegistrationHandler

* Add post regisration replication endpoint

* Newsfile
2019-02-20 18:47:31 +11:00
Erik Johnston
a9b5ea6fc1 Batch cache invalidation over replication
Currently whenever the current state changes in a room invalidate a lot
of caches, which cause *a lot* of traffic over replication. Instead,
lets batch up all those invalidations and send a single poke down
the replication streams.

Hopefully this will reduce load on the master process by substantially
reducing traffic.
2019-02-18 17:53:31 +00:00
Erik Johnston
af691e415c Move register_device into handler 2019-02-18 16:49:38 +00:00
Erik Johnston
eb2b8523ae Split out registration to worker
This allows registration to be handled by a worker, though the actual
write to the database still happens on master.

Note: due to the in-memory session map all registration requests must be
handled by the same worker.
2019-02-18 12:12:57 +00:00
Erik Johnston
a4f52a33fe Fix replication for room v3 (#4523)
* Fix replication for room v3

We were not correctly quoting the path fragments over http replication,
which meant that it exploded when the event IDs had a slash in them

* Newsfile
2019-01-30 14:19:52 +00:00
Erik Johnston
b6b73a0bcf Fix receiving events from federation via a worker
This bug was introduced in PR #4470, commit 678a92cb56
2019-01-29 10:30:26 +00:00
Erik Johnston
678a92cb56 Replace missed usages of FrozenEvent 2019-01-25 10:32:30 +00:00
Erik Johnston
be6a7e47fa
Revert "Require event format version to parse or create events" 2019-01-25 10:23:51 +00:00
Erik Johnston
e8c9f15397 Replace missed usages of FrozenEvent 2019-01-24 11:14:07 +00:00
Erik Johnston
a163b748a5 Don't truncate command name in metrics 2018-10-29 17:34:21 +00:00
Amber Brown
c4b3698a80
Make the replication logger quieter (#4108) 2018-10-29 22:59:44 +11:00
Amber Brown
381d2cfdf0
Make workers work on Py3 (#4027) 2018-10-13 00:14:08 +11:00
Travis Ralston
f1a7264663
Fix minor typo in exception 2018-09-13 11:51:12 -06:00
Amber Brown
7c27c4d51c
merge (#3576) 2018-09-14 03:11:11 +10:00
Erik Johnston
3e242dc149 Remove conn_id 2018-09-04 11:45:52 +01:00
Erik Johnston
b13836da7f Remove conn_id from repl prometheus metrics
`conn_id` gets set to a random string, and so we end up filling up
prometheus with tonnes of data series, which is bad.
2018-09-03 17:22:49 +01:00
Erik Johnston
2aa7cc6a46
Merge pull request #3713 from matrix-org/erikj/fixup_fed_logging
Fix logging bug in EDU handling over replication
2018-08-20 10:51:45 +01:00
Erik Johnston
3b2dcfff78 Fix logging bug in EDU handling over replication 2018-08-17 11:11:06 +01:00
Richard van der Hoff
0e8d78f6aa Logcontexts for replication command handlers
Run the handlers for replication commands as background processes. This should
improve the visibility in our metrics, and reduce the number of "running db
transaction from sentinel context" warnings.

Ideally it means converting the things that fire off deferreds into the night
into things that actually return a Deferred when they are done. I've made a bit
of a stab at this, but it will probably be leaky.
2018-08-17 00:43:43 +01:00
Erik Johnston
488ffe6fdb Use federation handler function rather than duplicate
This involves renaming _persist_events to be a public function.
2018-08-15 14:17:18 +01:00
Erik Johnston
773db62a22 Rename slave TransactionStore to SlaveTransactionStore 2018-08-15 14:17:06 +01:00
Erik Johnston
b179537f2a Move clean_room_for_join to master 2018-08-09 10:37:38 +01:00
Erik Johnston
72d1902bbe Fixup doc comments 2018-08-09 10:23:49 +01:00
Erik Johnston
5785b93711 Merge branch 'develop' of github.com:matrix-org/synapse into erikj/split_federation 2018-08-09 10:16:16 +01:00
Erik Johnston
2bdafaf3c1
Merge pull request #3632 from matrix-org/erikj/refactor_repl_servlet
Add helper base class for generating new replication endpoints
2018-08-09 10:06:23 +01:00
Erik Johnston
62564797f5 Fixup wording and remove dead code 2018-08-09 09:56:10 +01:00
Erik Johnston
bebe325e6c Rename POST param to METHOD 2018-08-08 10:36:18 +01:00
Erik Johnston
5011417632 Fixup logging and docstrings 2018-08-08 10:29:58 +01:00
Erik Johnston
1e2bed9656 Import all functions from TransactionStore 2018-08-06 15:23:38 +01:00
Erik Johnston
a3f5bf79a0 Add EDU/query handling over replication 2018-08-06 15:23:31 +01:00
Erik Johnston
e26dbd82ef Add replication APIs for persisting federation events 2018-08-06 15:02:28 +01:00
Erik Johnston
051a99c400 Fix isort 2018-08-06 14:29:31 +01:00
Richard van der Hoff
0ca459ea33 Basic support for room versioning
This is the first tranche of support for room versioning. It includes:
 * setting the default room version in the config file
 * new room_version param on the createRoom API
 * storing the version of newly-created rooms in the m.room.create event
 * fishing the version of existing rooms out of the m.room.create event
2018-08-03 16:08:32 +01:00
Erik Johnston
cb298ff623 Merge branch 'develop' of github.com:matrix-org/synapse into erikj/refactor_repl_servlet 2018-08-03 09:25:15 +01:00
Richard van der Hoff
01e93f48ed Kill off MatrixCodeMessageException
This code brings the SimpleHttpClient into line with the
MatrixFederationHttpClient by having it raise HttpResponseExceptions when a
request fails (rather than trying to parse for matrix errors and maybe raising
MatrixCodeMessageException).

Then, whenever we were checking for MatrixCodeMessageException and turning them
into SynapseErrors, we now need to check for HttpResponseExceptions and call
to_synapse_error.
2018-08-01 16:02:46 +01:00
Erik Johnston
443da003bc Use new helper base class for membership requests 2018-07-31 14:32:23 +01:00
Erik Johnston
729b672823 Use new helper base class for ReplicationSendEventRestServlet 2018-07-31 14:32:23 +01:00
Erik Johnston
d81602b75a Add helper base class for generating new replication endpoints
This will hopefully reduce the boiler plate required to implement new
internal HTTP requests.
2018-07-31 14:32:20 +01:00
Richard van der Hoff
f59be4eb0e Fix unit tests
on_notifier_poke no longer runs synchonously, so we have to do a different hack
to make sure that the replication data has been sent. Let's actually listen for
its arrival.
2018-07-25 10:30:36 +01:00
Richard van der Hoff
371da42ae4 Wrap a number of things that run in the background
This will reduce the number of "Starting db connection from sentinel context"
warnings, and will help with our metrics.
2018-07-25 09:41:12 +01:00
Erik Johnston
0faa3223cd Fix missing attributes on workers.
This was missed during the transition from attribute to getter for
getting state from context.
2018-07-23 16:28:00 +01:00
Erik Johnston
05f5dabc10 Use stream cache in get_linearized_receipts_for_room
This avoids us from uncessarily hitting the database when there has been
no change for the room
2018-07-10 17:22:42 +01:00
Amber Brown
49af402019 run isort 2018-07-09 16:09:20 +10:00
Amber Brown
6350bf925e
Attempt to be more performant on PyPy (#3462) 2018-06-28 14:49:57 +01:00
Erik Johnston
33fdcfa957
Merge pull request #3441 from matrix-org/erikj/redo_erasure
Fix user erasure and re-enable
2018-06-25 14:37:01 +01:00
Erik Johnston
eb50c44eaf Add UserErasureWorkerStore to workers 2018-06-25 14:22:24 +01:00
Amber Brown
07cad26d65
Remove all global reactor imports & pass it around explicitly (#3424) 2018-06-25 14:08:28 +01:00
Amber Brown
77ac14b960
Pass around the reactor explicitly (#3385) 2018-06-22 09:37:10 +01:00
Amber Brown
99b77aa829
Fix tcp protocol metrics naming (#3410) 2018-06-21 09:39:27 +01:00
Richard van der Hoff
b7e7fd2d0e Fix replication metrics
fix bug introduced in #3256
2018-06-04 16:23:05 +01:00
Amber Brown
754826a830 Merge remote-tracking branch 'origin/develop' into 3218-official-prom 2018-05-28 18:57:23 +10:00
Amber Brown
1f69693347
Merge pull request #3244 from NotAFile/py3-six-4
replace some iteritems with six
2018-05-24 13:04:07 -05:00
Amber Brown
b6063631c3 more cleanup 2018-05-22 17:36:20 -05:00
Amber Brown
228f1f584e fix the test failures 2018-05-22 15:02:38 -05:00
Amber Brown
8f5a688d42 cleanups, self-registration 2018-05-22 10:56:03 -05:00
Amber Brown
a8990fa2ec Merge remote-tracking branch 'origin/develop' into 3218-official-prom 2018-05-22 10:50:26 -05:00
Richard van der Hoff
9ea219c514 Send users a server notice about consent
When a user first syncs, we will send them a server notice asking them to
consent to the privacy policy if they have not already done so.
2018-05-22 11:54:51 +01:00
Amber Brown
fcc525b0b7 rest of the changes 2018-05-21 19:48:57 -05:00
Amber Brown
df9f72d9e5 replacing portions 2018-05-21 19:47:37 -05:00
Adrian Tschira
933bf2dd35 replace some iteritems with six
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-05-19 17:59:26 +02:00
Adrian Tschira
57b58e2174 make imports local
Signed-off-by: Adrian Tschira <nota@notafile.com>
2018-04-28 13:41:41 +02:00
Richard van der Hoff
b78395b7fe Refactor ResponseCache usage
Adds a `.wrap` method to ResponseCache which wraps up the boilerplate of a
(get, set) pair, and then use it throughout the codebase.

This will be largely non-functional, but does include the following functional
changes:

* federation_server.on_context_state_request: drops use of _server_linearizer
  which looked redundant and could cause incorrect cache misses by yielding
  between the get and the set.
* RoomListHandler.get_remote_public_room_list(): fixes logcontext leaks
* the wrap function includes some logging. I'm hoping this won't be too noisy
  on production.
2018-04-12 13:02:15 +01:00
Richard van der Hoff
b3384232a0 Add metrics for ResponseCache 2018-04-10 23:14:47 +01:00
Richard van der Hoff
3ee4ad09eb Fix json encoding bug in replication
json encoders have an encode method, not a dumps method.
2018-04-03 15:09:48 +01:00
Richard van der Hoff
05630758f2 Use static JSONEncoders
using json.dumps with custom options requires us to create a new JSONEncoder on
each call. It's more efficient to create one upfront and reuse it.
2018-03-29 23:13:33 +01:00
Erik Johnston
9aa5a0af51 Explicitly use simplejson 2018-03-20 09:58:13 +00:00
Erik Johnston
610accbb7f Fix replication after switch to simplejson
Turns out that simplejson serialises namedtuple's as dictionaries rather
than tuples by default.
2018-03-19 16:12:48 +00:00
Erik Johnston
fa72803490 Merge branch 'master' of github.com:matrix-org/synapse into develop 2018-03-19 11:41:01 +00:00
Erik Johnston
926ba76e23 Replace ujson with simplejson 2018-03-15 23:43:31 +00:00
Erik Johnston
57db62e554
Merge pull request #2992 from matrix-org/erikj/implement_member_workre
Implement RoomMemberWorkerHandler
2018-03-14 14:29:33 +00:00
Erik Johnston
0011ede3b0 Fix imports 2018-03-14 14:19:23 +00:00
Erik Johnston
62ad701326 s/join/joined/ in notify_user_membership_change 2018-03-14 14:17:43 +00:00
Erik Johnston
b27320b550 Implement RoomMemberWorkerHandler 2018-03-13 18:26:00 +00:00
Erik Johnston
3518d0ea8f Split up ProfileStore 2018-03-13 17:36:50 +00:00
Erik Johnston
d0fcc48f9d extra_users is actually a list of UserIDs 2018-03-13 11:20:06 +00:00
Erik Johnston
2e223163ff Split Directory store 2018-03-05 15:11:30 +00:00
Erik Johnston
fafa3e7114 Split registration store 2018-03-02 13:48:27 +00:00
Erik Johnston
1a6c7cdf54
Merge pull request #2928 from matrix-org/erikj/read_marker_caches
Fix typo in getting replication account data processing
2018-03-01 17:56:14 +00:00
Erik Johnston
89b7232ff8 Fix typo in getting replication account data processing 2018-03-01 17:50:30 +00:00
Erik Johnston
1773df0632
Merge pull request #2925 from matrix-org/erikj/split_sig_fed
Split out SignatureStore and EventFederationStore
2018-03-01 17:32:58 +00:00
Erik Johnston
65cf454fd1 Remove unused DataStore 2018-03-01 17:27:53 +00:00
Erik Johnston
9e08a93a7b
Merge pull request #2927 from matrix-org/erikj/read_marker_caches
Improve caching for read_marker API
2018-03-01 17:12:34 +00:00
Erik Johnston
a83c514d1f Improve caching for read_marker API
We add a new storage function to get a paritcular type of room account
data. This allows us to prefill the cache when updating that acount
data.
2018-03-01 17:08:17 +00:00
Erik Johnston
33bebb63f3 Add some caches to help read marker API 2018-03-01 17:08:17 +00:00
Erik Johnston
2ad4d5b5bb Merge branch 'develop' of github.com:matrix-org/synapse into erikj/split_sig_fed 2018-03-01 16:59:39 +00:00
Erik Johnston
64346be26d Merge branch 'develop' of github.com:matrix-org/synapse into erikj/split_stream_store 2018-03-01 16:26:42 +00:00
Erik Johnston
22518e2833
Merge pull request #2923 from matrix-org/erikj/stream_ago_worker
Calculate stream_ordering_month_ago correctly on workers
2018-03-01 16:23:54 +00:00
Erik Johnston
f793bc3877 Split out stream store 2018-03-01 15:13:08 +00:00
Erik Johnston
6411f725be Calculate stream_ordering_month_ago correctly on workers 2018-03-01 14:20:53 +00:00
Erik Johnston
a9a2d66cdd Split out SignatureStore and EventFederationStore 2018-03-01 14:17:53 +00:00
Erik Johnston
0c8ba5dd1c Split up RoomStore 2018-03-01 14:01:19 +00:00
Erik Johnston
126b9bf96f Log in the correct places 2018-03-01 12:05:33 +00:00
Erik Johnston
157298f986 Don't do preserve_fn for every request 2018-03-01 11:59:45 +00:00
Erik Johnston
89f90d808a Add some logging 2018-03-01 11:59:16 +00:00
Erik Johnston
8ded8ba2c7 Make repl send_event idempotent and retry on timeouts
If we treated timeouts as failures on the worker we would attempt to
clean up e.g. push actions while the master might still process the
event.
2018-03-01 11:20:34 +00:00
Erik Johnston
6b8604239f Correctly send ratelimit and extra_users params 2018-03-01 10:08:39 +00:00
Erik Johnston
28e973ac11 Calculate push actions on worker 2018-02-28 18:02:30 +00:00
Erik Johnston
3594dbc6dc
Merge pull request #2904 from matrix-org/erikj/receipt_cache_invalidation
Fix missing invalidations for receipt storage
2018-02-27 11:34:26 +00:00
Erik Johnston
2311189ee4
Merge pull request #2903 from matrix-org/erikj/split_roommember_store
Split out RoomMemberStore
2018-02-27 11:32:10 +00:00
Erik Johnston
c57607874c
Merge pull request #2901 from matrix-org/erikj/split_as_stores
Split AS stores
2018-02-27 10:07:07 +00:00
Erik Johnston
d62ce972f8 Merge branch 'develop' of github.com:matrix-org/synapse into erikj/split_roommember_store 2018-02-23 11:46:24 +00:00
Erik Johnston
6ae9a3d2a6 Update copyright 2018-02-23 11:44:49 +00:00
Erik Johnston
a90c60912f Merge branch 'develop' of github.com:matrix-org/synapse into erikj/split_event_push_actions 2018-02-23 11:26:31 +00:00
Erik Johnston
50e8657867
Merge pull request #2902 from matrix-org/erikj/split_events_store
Split out get_events and co into a worker store
2018-02-23 11:23:52 +00:00
Erik Johnston
1cf9e071dd
Merge pull request #2899 from matrix-org/erikj/split_pushers
Split PusherStore
2018-02-23 11:23:35 +00:00
Erik Johnston
d0957753bf
Merge pull request #2898 from matrix-org/erikj/split_push_rules_store
Split PushRulesStore
2018-02-23 11:23:23 +00:00
Erik Johnston
70349872c2 Update copyright 2018-02-23 11:14:35 +00:00
Erik Johnston
eba93b05bf Split EventsWorkerStore into separate file 2018-02-23 11:01:21 +00:00
Erik Johnston
bf8a36e080 Update copyright 2018-02-23 10:52:10 +00:00
Erik Johnston
c2ecfcc3a4 Update copyright 2018-02-23 10:41:34 +00:00
Erik Johnston
7e6cf89dc2 Update copyright 2018-02-23 10:39:19 +00:00
Erik Johnston
26d37f7a63 Update copyright 2018-02-23 10:33:55 +00:00
Erik Johnston
bb73f55fc6 Use absolute imports 2018-02-23 10:31:16 +00:00
Erik Johnston
faeb369f15 Fix missing invalidations for receipt storage 2018-02-21 15:19:54 +00:00
Erik Johnston
3dec9c66b3 Split out RoomMemberStore 2018-02-21 12:07:26 +00:00
Erik Johnston
46244b2759 Split AS stores 2018-02-21 11:49:34 +00:00
Erik Johnston
27b094f382 Split out get_events and co into a worker store 2018-02-21 11:41:48 +00:00
Erik Johnston
d15d237b0d Split out EventPushActionWorkerStore 2018-02-21 11:01:13 +00:00
Erik Johnston
6f72765371 Split PusherStore 2018-02-21 10:54:21 +00:00
Erik Johnston
cbaad969f9 Split PushRulesStore 2018-02-21 10:43:31 +00:00
Erik Johnston
ca9b9d9703 Split AccountDataStore and TagStore 2018-02-21 10:15:04 +00:00
Erik Johnston
95e4cffd85 Fix comment 2018-02-20 17:58:40 +00:00
Erik Johnston
e316bbb4c0 Use abstract base class to access stream IDs 2018-02-20 17:43:57 +00:00
Erik Johnston
f5ac4dc2d4 Split ReceiptsStore 2018-02-20 16:28:28 +00:00
Erik Johnston
106906a65e Don't serialize current state over replication 2018-02-15 13:53:18 +00:00
Erik Johnston
ef344b10e5 Don't log errors propogated from send_event 2018-02-15 11:03:49 +00:00
Erik Johnston
8ec2e638be Add event_creator worker 2018-02-07 10:32:32 +00:00
Erik Johnston
24dd73028a Add replication http endpoint for event sending 2018-02-07 10:32:32 +00:00
Erik Johnston
3d33eef6fc
Store state groups separately from events (#2784)
* Split state group persist into seperate storage func

* Add per database engine code for state group id gen

* Move store_state_group to StateReadStore

This allows other workers to use it, and so resolve state.

* Hook up store_state_group

* Fix tests

* Rename _store_mult_state_groups_txn

* Rename StateGroupReadStore

* Remove redundant _have_persisted_state_group_txn

* Update comments

* Comment compute_event_context

* Set start val for state_group_id_seq

... otherwise we try to recreate old state groups

* Update comments

* Don't store state for outliers

* Update comment

* Update docstring as state groups are ints
2018-02-06 14:31:24 +00:00
Richard van der Hoff
5c3c32f16f Metrics for number of RDATA commands received
I found myself wishing we had this.
2018-01-15 17:45:55 +00:00
Richard van der Hoff
0edf085b68 Fix some logcontext leaks in replication resource
The @measure_func annotations rely on the wrapped function respecting the
logcontext rules. Add the necessary yields to make this work.
2017-11-23 23:19:43 +00:00
Richard van der Hoff
35a4b63240 Pull out bits of StateStore to a mixin
... so that we don't need to secretly gut-wrench it for use in the slaved
stores. I haven't done the other stores yet, but we should. I'm tired of the
workers breaking every time we tweak the stores because I forgot to gut-wrench
the right method.

fixes https://github.com/matrix-org/synapse/issues/2655.
2017-11-14 11:43:58 +00:00
Richard van der Hoff
6cfee09be9 Make __init__ consitstent across Store heirarchy
Add db_conn parameters to the `__init__` methods of the *Store classes, so that
they are all consistent, which makes the multiple inheritance work correctly
(and so that we can later extract mixins which can be used in the slavedstores)
2017-11-13 10:46:07 +00:00
Richard van der Hoff
eaaabc6c4f replace 'except:' with 'except Exception:'
what could possibly go wrong
2017-10-23 15:52:32 +01:00
hera
f807f7f804 log when we get an exception handling replication updates 2017-10-12 11:51:24 +01:00
Erik Johnston
2cc998fed8 Fix replication. And notify 2017-07-20 17:13:18 +01:00
Erik Johnston
925b3638ff Reduce log levels in tcp replication 2017-07-11 10:04:21 +01:00
Erik Johnston
27f26e48b7 Serialize user ip command as json 2017-06-27 16:25:38 +01:00
Erik Johnston
8c23221666 Fix up 2017-06-27 15:53:45 +01:00
Erik Johnston
78cefd78d6 Make workers report to master for user ip updates 2017-06-27 14:58:10 +01:00
Erik Johnston
dae9a00a28 Initialise exclusive_user_regex 2017-06-21 14:19:33 +01:00
Erik Johnston
8177563ebe Fix for workers 2017-06-21 13:57:49 +01:00
Erik Johnston
6aa5bc8635 Initial worker impl 2017-06-16 11:47:11 +01:00
Erik Johnston
d53fe399eb Add cache for is_host_joined 2017-06-13 09:56:18 +01:00
Erik Johnston
a837765e8c Merge pull request #2266 from matrix-org/erikj/host_in_room
Change is_host_joined to use current_state table
2017-06-12 09:49:51 +01:00
Erik Johnston
8060974344 Fix replication 2017-06-09 16:40:52 +01:00
Erik Johnston
2cac7623a5 Add missing notifier 2017-06-09 11:24:41 +01:00
Erik Johnston
298d83b340 Fix replication 2017-06-09 11:01:28 +01:00
Erik Johnston
dfbda5e025 Faster cache for get_joined_hosts 2017-05-25 17:24:44 +01:00
Erik Johnston
f85a415279 Add missing storage function to slave store 2017-05-22 16:31:24 +01:00
Erik Johnston
9ac263ed1b Add new storage functions to slave store 2017-05-04 14:29:03 +01:00
Erik Johnston
e4f3431116 Remove unused cache 2017-04-24 13:27:38 +01:00
Erik Johnston
247c736b9b Merge pull request #2115 from matrix-org/erikj/dedupe_federation_repl
Reduce federation replication traffic
2017-04-12 11:07:13 +01:00
Erik Johnston
9c712a366f Move get_presence_list_* to SlaveStore 2017-04-11 16:07:33 +01:00
Erik Johnston
28a4649785 Remove HTTP replication APIs 2017-04-11 09:52:11 +01:00
Erik Johnston
29574fd5b3 Reduce federation presence replication traffic
This is mainly done by moving the calculation of where to send presence
updates from the presence handler to the transaction queue, so we only
need to send the presence event (and not the destinations) across the
replication connection. Before we were duplicating by sending the full
state across once per destination.
2017-04-10 16:48:30 +01:00
Erik Johnston
2e6f5a4910 Typo 2017-04-10 16:17:40 +01:00
Erik Johnston
efcb6db688 Merge pull request #2109 from matrix-org/erikj/send_queue_fix
Fix up federation SendQueue and document types
2017-04-10 13:09:25 +01:00
Erik Johnston
0364d23210 Up replication ping timeout 2017-04-10 11:32:05 +01:00
Erik Johnston
ab904caf33 Comments 2017-04-10 10:02:17 +01:00
Erik Johnston
98ce212093 Merge pull request #2103 from matrix-org/erikj/no-double-encode
Don't double encode replication data
2017-04-07 09:39:52 +01:00
Erik Johnston
ad544c803a Document types of the replication streams 2017-04-06 13:28:52 +01:00
Erik Johnston
69b3fd485d Fix incorrect type when using InvalidateCacheCommand 2017-04-06 09:36:38 +01:00
Erik Johnston
fcc803b2bf Add log lines 2017-04-05 17:13:44 +01:00
Erik Johnston
3f213d908d Rearrange metrics 2017-04-05 14:15:09 +01:00
Erik Johnston
1ca0e78ca1 Fix typo 2017-04-05 13:43:39 +01:00
Erik Johnston
b43d3267e2 Fixup some metrics for tcp repl 2017-04-05 13:34:54 +01:00
Erik Johnston
a5c401bd12 Merge pull request #2097 from matrix-org/erikj/repl_tcp_client
Move to using TCP replication
2017-04-05 09:36:21 +01:00
Erik Johnston
a76886726b Merge pull request #2098 from matrix-org/erikj/repl_tcp_fix
Advance replication streams even if nothing is listening
2017-04-04 15:40:51 +01:00
Erik Johnston
4264ceb31c Fiddle tcp replication logging 2017-04-04 14:14:03 +01:00
Erik Johnston
023ee197be Advance replication streams even if nothing is listening
Otherwise the streams don't advance and steadily fall behind, so when a
worker does connect either a) they'll be streamed lots of old updates or
b) the connection will fail as the streams are too far behind.
2017-04-04 13:19:26 +01:00
Erik Johnston
3a1f3f8388 Change slave storage to use new replication interface
As the TCP replication uses a slightly different API and streams than
the HTTP replication.

This breaks HTTP replication.
2017-04-03 15:34:19 +01:00
Erik Johnston
52bfa604e1 Add basic replication client handler and factory 2017-04-03 15:34:13 +01:00
Erik Johnston
0a6a966e2b Always advance stream tokens 2017-04-03 15:22:56 +01:00
Erik Johnston
1df7c28661 Use callbacks to notify tcp replication rather than deferreds 2017-03-31 15:42:51 +01:00
Erik Johnston
36d2b66f90 Add a timestamp to USER_SYNC command
This timestamp is used to indicate when the user last sync'd
2017-03-31 15:42:22 +01:00
Erik Johnston
bfcf016714 Fix up docs 2017-03-31 11:19:24 +01:00
Erik Johnston
4d7fc7f977 Add server side resource for tcp replication 2017-03-30 13:24:45 +01:00
Erik Johnston
7450693435 Initial TCP protocol implementation
This defines the low level TCP replication protocol
2017-03-30 12:54:46 +01:00
Erik Johnston
8da6f0be48 Define the various streams we will replicate 2017-03-30 12:54:46 +01:00
Erik Johnston
11880103b1 Make federation send queue take the current position 2017-03-30 12:54:36 +01:00
Erik Johnston
24d35ab47b Add new storage functions for new replication
The new replication protocol will keep all the streams separate, rather
than muxing multiple streams into one.
2017-03-30 11:48:35 +01:00
Erik Johnston
09f79aaad0 Use presence replication stream to invalidate cache
Instead of using the cache invalidation replication stream to invalidate
the _get_presence_cache, we can instead rely on the presence replication
stream. This reduces the amount of replication traffic considerably.
2017-03-24 13:21:08 +00:00
Erik Johnston
d58b1ffe94 Replace some calls to cursor_to_dict
cursor_to_dict can be surprisinglh expensive for large result sets, so lets
only call it when we need to.
2017-03-24 11:07:02 +00:00
Erik Johnston
aac6d1fc9b PEP8 2017-03-20 13:47:56 +00:00
Erik Johnston
61f471f779 Don't send the full event json over replication 2017-03-17 15:50:01 +00:00
Richard van der Hoff
29ed09e80a Fix assertion to stop transaction queue getting wedged
... and update some docstrings to correctly reflect the types being used.

get_new_device_msgs_for_remote can return a long under some circumstances,
which was being stored in last_device_list_stream_id_by_dest, and was then
upsetting things on the next loop.
2017-03-15 12:16:55 +00:00
Erik Johnston
45c7f12d2a Add new storage function to slave store 2017-03-13 16:26:44 +00:00
Erik Johnston
8f267fa8a8 Fix it for the workers 2017-03-10 11:22:25 +00:00
Erik Johnston
e933a2712d Don't log unknown cache warnings in workers 2017-02-28 16:22:41 +00:00
Erik Johnston
095b45c165 Aggregate event push actions 2017-02-14 13:39:41 +00:00
Erik Johnston
9e617cd4c2 Cache get_presence storage 2017-02-13 13:50:03 +00:00
Erik Johnston
6bba80241c Merge pull request #1912 from matrix-org/markjh/roominitialsync
Add db functions needed for room initial sync to slave
2017-02-13 12:20:21 +01:00
Mark Haines
3a46280ca3 Add db functions needed for room initial sync to slave 2017-02-13 11:16:53 +00:00
Erik Johnston
0f3e296cb7 Fix replication 2017-02-02 15:02:03 +00:00
Erik Johnston
458b6f4733 Only invalidate membership caches based on the cache stream
Before we completely invalidated get_users_in_room whenever we updated
any current_state_events table. This was way too aggressive.
2017-01-31 16:09:03 +00:00
Erik Johnston
3670025e64 Rename func 2017-01-30 14:11:31 +00:00
Erik Johnston
252b503fc8 Hook device list updates to replication 2017-01-27 14:31:35 +00:00
Erik Johnston
a55fa2047f Insert delta of current_state_events to be more efficient 2017-01-20 17:10:18 +00:00
Erik Johnston
09cbcb78d3 Add cache to get_public_room_ids_at_stream_id 2016-12-12 14:41:51 +00:00
Erik Johnston
f32fb65552 Add new API appservice specific public room list 2016-12-06 16:12:27 +00:00
Erik Johnston
26072df6af Ensure only main or federation_sender process can send federation traffic 2016-11-23 14:09:47 +00:00
Erik Johnston
4c79a63fd7 Explicit federation ack 2016-11-23 10:40:44 +00:00
Erik Johnston
90565d015e Invalidate retry cache in both directions 2016-11-22 17:45:44 +00:00
Erik Johnston
7c9cdb2245 Store federation stream positions in the database 2016-11-21 11:33:08 +00:00
Erik Johnston
f8ee66250a Handle sending events and device messages over federation 2016-11-17 15:48:04 +00:00
Erik Johnston
ed787cf09e Hook up the send queue and create a federation sender worker 2016-11-16 17:34:44 +00:00
Erik Johnston
668f91d707 Fix check of wrong variable 2016-10-11 13:57:22 +01:00
Erik Johnston
748d8fdc7b Reduce DB hits for replication
Some streams will occaisonally advance their positions without actually
having any new rows to send over federation. Currently this means that
the token will not advance on the workers, leading to them repeatedly
sending a slightly out of date token. This in turns requires the master
to hit the DB to check if there are any new rows, rather than hitting
the no op logic where we check if the given token matches the current
token.

This commit changes the API to always return an entry if the position
for a stream has changed, allowing workers to advance their tokens
correctly.
2016-09-23 16:49:21 +01:00
Erik Johnston
995f2f032f Fix public room pagination for client_reader app 2016-09-16 14:48:21 +01:00
Erik Johnston
418bcd4309 Add new storage function to slave store 2016-09-16 08:37:39 +01:00
Erik Johnston
cb3edec6af Use stream_change cache to make get_forward_extremeties_for_room cache more effective 2016-09-15 14:28:13 +01:00
Erik Johnston
55e6fc917c Add cache to get_forward_extremeties_for_room 2016-09-15 14:04:28 +01:00
Erik Johnston
211786ecd6 Stream public room changes down replication 2016-09-15 11:47:23 +01:00
Erik Johnston
a4339de9de Correctly handle typing stream id resetting 2016-09-09 16:44:26 +01:00
Erik Johnston
ab80d5e0a9 Drop replication log levels 2016-09-09 14:56:50 +01:00
Mark Haines
6a6cbfcf1e Track the max_stream_device_id in a separate table, since we delete from the inbox table 2016-09-09 11:48:23 +01:00
Mark Haines
fa9d36e050 Merge branch 'develop' into markjh/direct_to_device_federation 2016-09-08 13:43:43 +01:00
Mark Haines
2a0159b8ae Fix the stream change cache to work over replication 2016-09-07 15:58:00 +01:00
Erik Johnston
a99e933550 Add upgrade script that will slowly prune state_groups_state entries 2016-09-05 10:05:36 +01:00
Erik Johnston
44982606ee Merge pull request #1060 from matrix-org/erikj/state_ids
Assign state groups in state handler.
2016-09-01 14:20:42 +01:00
Erik Johnston
826ca61745 Add storage function to SlaveStore 2016-08-31 14:45:04 +01:00