matrix-doc/attic/drafts/erik-model.rst

142 lines
5.7 KiB
ReStructuredText

This is a standalone description of the data architecture of Synapse. There is a
lot of overlap with the current specification, so it has been split out here for
posterity. Hopefully all the important bits have been merged into the relevant
places in the main spec.
Model
-----
Overview
~~~~~~~~
Matrix is used to reliably distribute data between sets of `users`.
Users are associated with one of many matrix `servers`. These distribute,
receive and store data on behalf of its registered users. Servers can be run on
any host accessible from the internet.
When a user wishes to send data to users on different servers the local server
will distribute the data to each remote server. These will in turn distribute
to their local users involved.
A user sends and receives data using one or more authenticated `clients`
connected to his server. Clients may persist data locally or request it when
required from the server.
Events
~~~~~~
An event is a collection of data (the `payload`) and metadata to be distributed
across servers and is the primary data unit in Matrix. Events are extensible
so that clients and servers can add extra arbitrary fields to both the payload
or metadata.
Events are distributed to interested servers upon creation. Historical events
may be requested from servers; servers are not required to produce all
or any events requested.
All events have a metadata `type` field that is used by client and servers to
determine how the payload should be processed and used. There are a number of
types reserved by the protocol for particular uses, but otherwise types may be
defined by applications, clients or servers for their own purposes.
.. TODO : Namespacing of new types.
Graph
+++++
Each event has a list of zero or more `parent` events. These relations form
directed acyclic graphs of events called `event graphs`. Every event graph has
a single root event, and each event graph forms the basis of the history of a
matrix room.
Event graphs give a partial ordering of events, i.e. given two events one may
be considered to have come before the other if one is an ancestor of the other.
Since two events may be on separate branches, not all events can be compared in
this manner.
Every event has a metadata `depth` field that is a positive integer that is
strictly greater than the depths of any of its parents. The root event should
have a depth of 1.
[Note: if one event is before another, then it must have a strictly smaller
depth]
Integrity
+++++++++
.. TODO: Specify the precise subset of essential fields
Portions of events will be signed by one or more servers or clients. The parent
relations, type, depth and payload (as well as other metadata fields that will
be specified) must be signed by the originating server. [Note: Thus, once an
event is distributed and referenced by later events, they effectively become
immutable].
The payload may also be encrypted by clients, except in the case where the
payload needs to be interpreted by the servers. A list of event types that
cannot have an encrypted payload are given later.
State
~~~~~
Event graphs may have meta information associated with them, called `state`.
State can be updated over time by servers or clients, subject to
authorisation.
The state of a graph is split into `sections` that can be atomically updated
independently of each other.
State is stored within the graph itself, and can be computed by looking at the
graph in its entirety. We define the state at a given event to be the state of
the sub graph of all events "before" and including that event.
Some sections of the state may determine behaviour of the protocol, including
authorisation and distribution. These sections must not be encrypted.
State Events
++++++++++++
`State events` are events that update a section of state data for a room. These
state events hold all the same properties of events, and are part of the event
graph. The payload of the event is the replacement value for the particular
section of state being updated.
State events must also include a `state_key` metadata field. The pair of fields
type and state_key uniquely defines the section of state that is to be updated.
State Resolution
++++++++++++++++
A given state section may have multiple state events associated with it in a
given graph. A consistent method of selecting which state event takes
precedence is therefore required.
This is done by taking the latest state events, i.e. the set of events that are
either incomparable or after every other event in the graph. A state resolution
algorithm is then applied to this set to select the single event that takes
precedence.
The state resolution algorithm must be transitive and not depend on server
state, as it must consistently select the same event irrespective of the server
or the order the events were received in.
State Dictionary
++++++++++++++++
The state dictionary is the mapping from sections of state to the state events
which set the section to its current value. The state dictionary, like the
state itself, depends on the events currently in the graph and so is updated
with each new event received.
Since the sections of the state are defined by the pair of strings from the
type and state_key of the events that update them, the state dictionary can be
defined as a mapping from the pair (type, state_key) to a state event with
those values in the graph.
Deleting State
++++++++++++++
State sections may also be deleted, i.e. removed from the state dictionary. The
state events will still be present in the event graph.
This is done by sending a special state event indicating that the given entry
should be removed from the dictionary. These events follow the same rules for
state resolution, with the added requirement that it loses all conflicts.
[Note: This is required to make the algorithm transitive.]