mirror of
https://git.anonymousland.org/anonymousland/synapse.git
synced 2025-01-13 13:09:29 -05:00
Add some design documentation for replication
This commit is contained in:
parent
d33d623f0d
commit
aa5ce4d450
58
docs/replication.rst
Normal file
58
docs/replication.rst
Normal file
@ -0,0 +1,58 @@
|
|||||||
|
Replication Architecture
|
||||||
|
========================
|
||||||
|
|
||||||
|
Motivation
|
||||||
|
----------
|
||||||
|
|
||||||
|
We'd like to be able to split some of the work that synapse does into multiple
|
||||||
|
python processes. In theory multiple synapse processes could share a single
|
||||||
|
postgresql database and we'd scale up by running more synapse processes.
|
||||||
|
However much of synapse assumes that only one process is interacting with the
|
||||||
|
database, both for assigning unique identifiers when inserting into tables,
|
||||||
|
notifying components about new updates, and for invalidating its caches.
|
||||||
|
|
||||||
|
So running multiple copies of the current code isn't an option. One way to
|
||||||
|
run multiple processes would be to have a single writer process and multiple
|
||||||
|
reader processes connected to the same database. In order to do this we'd need
|
||||||
|
a way for the reader process to invalidate its in-memory caches when an update
|
||||||
|
happens on the writer. One way to do this is for the writer to present an
|
||||||
|
append-only log of updates which the readers can consume to invalidate their
|
||||||
|
caches and to push updates to listening clients or pushers.
|
||||||
|
|
||||||
|
Synapse already stores much of its data as an append-only log so that it can
|
||||||
|
correctly respond to /sync requests so the amount of code changes needed to
|
||||||
|
expose the append-only log to the readers should be fairly minimal.
|
||||||
|
|
||||||
|
Architecture
|
||||||
|
------------
|
||||||
|
|
||||||
|
The Replication API
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Synapse will optionally expose a long poll HTTP API for extracting updates. The
|
||||||
|
API will have a similar shape to /sync in that clients provide tokens
|
||||||
|
indicating where in the log they have reached and a timeout. The synapse server
|
||||||
|
then either responds with updates immediately if it already has updates or it
|
||||||
|
waits until the timeout for more updates. If the timeout expires and nothing
|
||||||
|
happened then the server returns an empty response.
|
||||||
|
|
||||||
|
However until the /sync API this replication API is returning synapse specific
|
||||||
|
data rather than trying to implement a matrix specification. The replication
|
||||||
|
results are returned as arrays of rows where the rows are mostly lifted
|
||||||
|
directly from the database. This avoids unnecessary JSON parsing on the server
|
||||||
|
and hopefully avoids an impedance mismatch between the data returned and the
|
||||||
|
required updates to the datastore.
|
||||||
|
|
||||||
|
This does not replicate all the database tables as many of the database tables
|
||||||
|
are indexes that can be recovered from the contents of other tables.
|
||||||
|
|
||||||
|
The format and parameters for the api are documented in
|
||||||
|
``synapse/replication/resource.py``.
|
||||||
|
|
||||||
|
|
||||||
|
The Slaved DataStore
|
||||||
|
~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
There are read-only version of the synapse storage layer in
|
||||||
|
``synapse/replication/slave/storage`` that use the response of the replication
|
||||||
|
API to invalidate their caches.
|
Loading…
Reference in New Issue
Block a user