500 lines
21 KiB
Markdown
500 lines
21 KiB
Markdown
+++
|
|
title = "State Resolution v2 for the Hopelessly Unmathematical"
|
|
extra.author = "Neil Alexander Twigg"
|
|
aliases = ["/docs/guides/implementing-stateres"]
|
|
+++
|
|
|
|
So you don't have a PhD in Mathematics and you want to implement [State
|
|
Resolution v2](https://matrix.org/docs/spec/rooms/v2)! This document aims to
|
|
demystify not just the _how_ but also the _why_ of resolving state conflicts in
|
|
Matrix room versions 2 and later, along with a bit of pseudocode to demonstrate
|
|
the implementation.
|
|
|
|
In Matrix, rooms don't belong to a specific server, but instead are replicated
|
|
across all servers that are participating in a given room. As a result, each
|
|
server in a room needs to be able to work out the room state independently and
|
|
all servers in a room need to be able to arrive, as far as possible, at the same
|
|
state when doing so. Doing this is what allows a room to "appear" the same for
|
|
all users, regardless of the homeserver they are joining from.
|
|
|
|
State resolution is the act of taking two or more different sets of state and
|
|
working out the final consistent state that we deem truthful. The idea is that
|
|
all homeservers will agree on what state looks like for a given room by
|
|
following the same state resolution algorithms.
|
|
|
|
When resolving state, we adhere to the following guidelines:
|
|
|
|
1. Higher power levels should take precedence over lower ones, e.g. the actions
|
|
of an administrator should overrule those of a moderator
|
|
2. Earlier events should be processed before later ones where possible
|
|
|
|
It's worth mentioning that concepts such as "earlier" and "later" are difficult
|
|
to define in Matrix and this is a significant part of what makes state
|
|
resolution difficult to do. In State Res v2, we have a couple of new concepts
|
|
for defining when an event is "earlier" or "later" than another event - these
|
|
are covered in the sections below on ordering events.
|
|
|
|
### Defining state
|
|
|
|
First of all, a state event is a Matrix event which contains a `state_key`. The
|
|
state key defines _what_ the state applies to (or is often an empty string if
|
|
the state should apply to the entire room). Examples of state events are
|
|
`m.room.create`, `m.room.join_rules`, `m.room.power_level`, `m.room.member` etc.
|
|
|
|
Given that state events are also a part of the room history, state can be
|
|
different at different points in time. Therefore, the state at a specific event
|
|
contains a snapshot of all state events that apply to the room at the time of
|
|
that given event and won't include state events that came after it. In the case
|
|
of state events, the state at the time of that given event includes the changes
|
|
in state as provided by the event.
|
|
|
|
A state tuple is a `(event type, state key)` tuple. For example:
|
|
|
|
* `("m.room.power_levels", "")` for room power levels
|
|
* `("m.room.join_rules", "")` for room join rules
|
|
* `("m.room.member", "@neilalexander:matrix.org")` for `neilalexander`'s room
|
|
membership
|
|
|
|
Each state event attempts to update a `(event type, state key)->event` key-value
|
|
mapping in a given room, thus allowing for key-value data to be stored per room.
|
|
|
|
### Defining control events and normal state events
|
|
|
|
In addition to the above, we also define _control events_ as being a subset of
|
|
state events which specifically have the power to add or remove abilities from
|
|
another user. These are:
|
|
|
|
* **Setting room power levels:** `m.room.power_levels` where the `state_key`
|
|
equals `""`
|
|
* **Setting room join rules:** `m.room.join_rules` where the `state_key`
|
|
equals `""`
|
|
* **Kicking a user:** `m.room.member` with `"leave"` content but where the
|
|
`sender` (the admin/mod) does not match the `state_key` (the kicked user)
|
|
* **Banning a user:** `m.room.member` with `"ban"` content but where the
|
|
`sender` (the admin/mod) does not match the `state_key` (the banned user)
|
|
|
|
Any _state event_ that does not match the above definition is, instead, classed
|
|
as a normal state event. These include, for example:
|
|
|
|
* **Setting room topic:** `m.room.topic` where the `state_key` equals `""`
|
|
* **Setting room avatar:** `m.room.display_avatar` where the `state_key`
|
|
equals `""`
|
|
* **Users joining or leaving:** `m.room.member` with `"join"` or `"leave"`
|
|
content where the `sender` is equal to the `state_key`
|
|
|
|
### Conflicted and unconflicted state sets
|
|
|
|
The ultimate goal of state resolution is to take some _conflicted_ inputs and
|
|
give _unconflicted_ output, and to do so in a consistent way such that all
|
|
servers will agree on the same state. In effect, we may have been given multiple
|
|
sets of state due to forks in the room and, when we combine them together, we
|
|
can end up with multiple different values for one or more given state tuples.
|
|
The goal is to ensure that our final state only contains a single value for each
|
|
state tuple.
|
|
|
|
Forks typically happen when servers that are participating in a room are no
|
|
longer able to keep each other up-to-date over federation. It's possible that
|
|
users on different servers will continue to send any number of events into the
|
|
room, including state events, resulting in state drifting across those servers.
|
|
When the servers are able to federate again, i.e. after network connectivity is
|
|
restored, then we can work out the conflicted and unconflicted state by merging
|
|
the forks.
|
|
|
|
Conflicted state happens when:
|
|
|
|
* We have _more than one_ different value for a given state tuple in each fork
|
|
* We have _only one_ value for a given state tuple but it happened in only one
|
|
fork and not the other
|
|
|
|
Unconflicted state happens when:
|
|
|
|
* We have _only one_ value for a given state tuple because it is the same in all
|
|
forks
|
|
* We have _only one_ value for a given state tuple because it happened before
|
|
the fork
|
|
|
|
Events in the input set that are _unconflicted_ are considered to be truthful,
|
|
therefore apart from expecting that they pass auth, we don't do any further
|
|
state resolution on them. An _unconflicted_ event will _always_ appear in the
|
|
final resolved state.
|
|
|
|
Events in the input set that are _conflicted_ are the opposite - to begin with,
|
|
we have multiple events claiming different values and don't know which to
|
|
believe. We perform state resolution on them to decide which of those values we
|
|
wish to select, and _only_ the selected values appear in the final resolved
|
|
state.
|
|
|
|
Since we only want to perform state resolution on the _conflicted_ set, start by
|
|
working through the set of input events, including their auth events and sorting
|
|
them into _conflicted_ and _unconflicted_ buckets:
|
|
|
|
```
|
|
var conflicted_events: list(event)
|
|
var unconflicted_events: list(event)
|
|
```
|
|
|
|
### Partial state
|
|
|
|
We refer to the state as _partial_ when we have not yet completed the state
|
|
resolution algorithm, therefore the result is incomplete. In the first stage,
|
|
the partial state is equal to the unconflicted state:
|
|
|
|
```
|
|
var partial_state: map(event_type -> map(state_key -> event))
|
|
|
|
for each unconflicted_events as e:
|
|
partial_state[e.event_type()][e.state_key()] = e
|
|
```
|
|
|
|
Later, the resolved state blocks from the _conflicted state_ will be layered
|
|
onto the partial state, eventually bringing us to a complete _resolved state_.
|
|
|
|
### Resolved state
|
|
|
|
The final resolved state of a room is worked out from doing these things:
|
|
|
|
1. Starting with the _unconflicted_ state as our "base layer"
|
|
2. Working out the sequence of _control events_ using reverse topological
|
|
ordering and then layering those on
|
|
3. Working out the sequence of _normal state events_ using mainline ordering and
|
|
then layering those on
|
|
4. Reapplying any _unconflicted_ state keys which may have been overwritten in
|
|
the previous steps
|
|
|
|
### Auth differences
|
|
|
|
Now that we have a set of _conflicted_ events, we also need to make sure that we
|
|
have enough information to satisfy the auth checks, regardless of which event we
|
|
eventually resolve to.
|
|
|
|
For example, imagine the case that Alice gives some power to Bob (in event `A`),
|
|
Bob gives some power to Charlie (in event `B`) and then Charlie changes the ban
|
|
level (in event `C`):
|
|
|
|
![drawing showing an example flow of power level changes](/docs/legacy/2020-04-14-stateresv2-1.svg)
|
|
|
|
However, in another fork, event `A` is known (so Bob has received power from
|
|
Alice) but `B` was not, therefore in the fork, Charlie never received power and
|
|
therefore was not authorised to change the ban level. When the two forks
|
|
converge, we would need to know about `B` in order to successfully authorise
|
|
`C`.
|
|
|
|
The _auth difference_ therefore contains all auth events that were not common to
|
|
both forks - if an auth event appears in only one fork but not the other then
|
|
that auth event must appear in the _auth difference_:
|
|
|
|
```
|
|
var auth_difference: list(event)
|
|
var auth_sets: map(fork_id -> list(event))
|
|
|
|
for each conflicted_events as e:
|
|
found = true
|
|
for each auth_sets as auth_set:
|
|
if e not in auth_set:
|
|
found = false
|
|
|
|
if not found:
|
|
auth_difference.append(e)
|
|
```
|
|
|
|
The _full conflicted set_ is effectively the conflicted events combined with the
|
|
auth difference:
|
|
|
|
```
|
|
var full_conflicted_set: list(event)
|
|
|
|
full_conflicted_set = conflicted_events + auth_difference
|
|
```
|
|
|
|
### Reverse topological ordering of control events
|
|
|
|
Now that we have calculated the _full conflicted set_, which also includes the
|
|
_auth difference_, we now can start to work out the order in which power was
|
|
given or taken away from specific users.
|
|
|
|
When resolving state conflicts, it's important to work out who really had power
|
|
to do certain things at a given point in time. Working this out for control
|
|
events takes priority over working out the order of normal state events - those
|
|
will be sorted based on the power levels from this step.
|
|
|
|
To work out the timeline of control events, we implement reverse topological
|
|
ordering, which results in a list of events in which the order guarantees that
|
|
auth events _always_ come before the events that refer to them. This is done
|
|
using an adaptation of Kahn's algorithm.
|
|
|
|
#### **Kahn's algorithm**
|
|
|
|
Since an event's auth chain forms a directed acyclic graph (DAG), we can say
|
|
that each auth event is a "node" in the graph and each relationship between two
|
|
events is an "edge". In our case, two nodes in the auth DAG have a relationship
|
|
when one event refers to another event as an auth event.
|
|
|
|
More precisely, we would say that an event that _refers to_ other auth events
|
|
has one or more _outgoing edges_ and that an event that is relied on by other
|
|
events has one or more _incoming edges_.
|
|
|
|
This allows us to make a couple of assertions: that an event that has no
|
|
incoming edges happened most recently, and an event that has no outgoing edges
|
|
happened least recently.
|
|
|
|
In our case, Kahn's algorithm works on three main components:
|
|
|
|
1. The unsorted input list of events, including the auth events
|
|
2. A map, where we count how many incoming edges each node has
|
|
3. The sorted output list of events, effectively the result of the algorithm
|
|
|
|
Assuming the following data structures, we can populate the event map and
|
|
incoming edges with initial values:
|
|
|
|
```
|
|
var output_events: list(event)
|
|
var event_map: map(event_id -> event)
|
|
var incoming_edges: map(event_id -> integer)
|
|
|
|
for each full_conflicted_set as e:
|
|
event_map[e.event_id] = e
|
|
incoming_edges[e.event_id] = 0
|
|
```
|
|
|
|
The Matrix-specific details of our implementation of Kahn's algorithm lie in the
|
|
tie-break between events. The tie-break ensures that we get a deterministic
|
|
ordering and that we also meet our two guidelines from the beginning: that we
|
|
will always try to prefer events from the more powerful, and we will always try
|
|
to process older events first. When sorting, you can assert that **x < y** by
|
|
running the following tests in order:
|
|
|
|
1. The effective power level at **x** is greater than the effective power level
|
|
at **y**
|
|
2. The event origin server timestamp of **x** is less than the event origin
|
|
server timestamp of **y** (that is, event **x** is older than **y**)
|
|
3. The event ID of **x** is less than the event ID of **y**, as computed by a
|
|
lexicographical comparison, as a fallback
|
|
|
|
To ensure that the algorithm is deterministic (that is, running in the same
|
|
order) for a given set of inputs, we must ensure that the map of incoming edges
|
|
has been sorted using the above rules before we do anything with it:
|
|
|
|
```
|
|
func sorted_incoming_edges() returns list(event):
|
|
sort incoming_edges where x < y when:
|
|
x.power_level > y.power_level or
|
|
x.origin_server_ts < y.origin_server_ts or
|
|
lexographic_compare(x.event_id, y.event_id) < 0
|
|
return incoming_edges
|
|
```
|
|
|
|
Once the incoming edges have been sorted, iterate through the input list to
|
|
count how many incoming edges each event has and update the incoming edges map
|
|
with those values:
|
|
|
|
```
|
|
for each input_events as e:
|
|
for each e.auth_events as a:
|
|
incoming_edges[a.event_id] += 1
|
|
```
|
|
|
|
Then look for events that have no incoming edges and insert them into the
|
|
beginning of the output list—note that we differ from a pure implementation of
|
|
Kahn's algorithm here as we specifically require the incoming edges map to be
|
|
sorted (as above):
|
|
|
|
```
|
|
while len(incoming_edges) > 0:
|
|
for each sorted_incoming_edges() as id, count:
|
|
if count == 0:
|
|
output_event.prepend(event_map[id])
|
|
|
|
for each event_map[id].auth_events as auth_event:
|
|
incoming_edges[auth_event.event_id] -= 1
|
|
|
|
remove(incoming_edges, id)
|
|
```
|
|
|
|
Each time we add a node into the output list, we can "remove" its outgoing
|
|
edges. This means that every auth event referred to by this event has one less
|
|
incoming edge. Once it has been exhausted, we also remove the event from the
|
|
incoming edge map, so that we won't try to process it again.
|
|
|
|
Now we repeat the process, continuously, until the algorithm eventually exhausts
|
|
the entire incoming edges map. Once the map is empty, all edges have been
|
|
removed and the earliest events will appear at the beginning of the output list.
|
|
|
|
The output list is now sorted such that all referred events now appear before
|
|
the events that refer to them. It is sorted in _reverse topological order_.
|
|
|
|
### Iteratively apply resolved control events
|
|
|
|
Now that our control events have been sorted into reverse topological order,
|
|
where the first events in the list are the "oldest", we can now start to layer
|
|
these resolved control events in order to build the partial state.
|
|
|
|
To build up the partial state, start at the oldest event and loop through the
|
|
list, [performing auth checks on each
|
|
event](https://matrix.org/docs/spec/rooms/v1#authorization-rules) by using the
|
|
current `partial_state` as an allowed set of auth events. We define this process
|
|
as the function `auth_against_partial_state`.
|
|
|
|
It is expected that `partial_state` is an empty list for the first iteration, as
|
|
the oldest event (which is the room create event) does not require auth events.
|
|
The partial state (and, by extension, the selection of allowed auth events) will
|
|
grow with each iteration.
|
|
|
|
If the event passes auth checks then update the partial state to include the
|
|
newly authed event before moving onto the next iteration:
|
|
|
|
```
|
|
for each output_event as e:
|
|
if auth_against_partial_state(e):
|
|
partial_state[e.event_type][e.state_key] = e
|
|
```
|
|
|
|
Events that fail the auth check should be ignored and should not find their way
|
|
into the partial state.
|
|
|
|
### Creating the power level mainline
|
|
|
|
Once the control events have been applied to the partial state, the remainder of
|
|
the normal state events should be ordered and processed based on the ordering of
|
|
their power level auth events. This is done using a technique called _mainline
|
|
ordering_.
|
|
|
|
The _power level_ _mainline_ is created by following the path of power level
|
|
events, starting from the resolved `m.room.power_level` event from the partial
|
|
state, back to the point that the room was created:
|
|
|
|
```
|
|
var resolved_power_level_event: partial_state["m.room.power_level"][""] as event
|
|
var power_level_mainline: list(event)
|
|
|
|
func mainline_iterate(e as event):
|
|
power_level_mainline += e
|
|
for each e.auth_events as auth_event:
|
|
if auth_event.event_type == "m.room.power_level":
|
|
mainline_iterate(auth_event)
|
|
```
|
|
|
|
We refer to the time period between any two given power level events as an
|
|
_epoch_.
|
|
|
|
In the below illustration, the "resolved" event is the event that appears in the
|
|
partial state from the previous section.
|
|
|
|
![drawing showing partial state example](/docs/legacy/2020-04-14-stateresv2-2.svg)
|
|
|
|
### Mainline ordering of normal state events
|
|
|
|
Since the mainline will be ordered based on when the power level changes
|
|
happened, and other state events use power level events as auth events, it is
|
|
possible to sort the order of normal state events based on which mainline power
|
|
level event they most closely refer to, or by definition, which epoch the event
|
|
happened in.
|
|
|
|
Note that events may appear that refer to power level events which don't appear
|
|
in the mainline, in which case it is necessary to step through the power level
|
|
events iteratively until you encounter one that is on the mainline. It is
|
|
referred to as the _closest mainline event_:
|
|
|
|
```
|
|
func get_closest_mainline_event(e as event) returns event:
|
|
var closest_mainline_event: event
|
|
|
|
func closest_iterate(e as event):
|
|
if e in power_level_mainline:
|
|
closest_mainline_event = e
|
|
else:
|
|
for each e.auth_events as auth_event:
|
|
if auth_event.event_type == "m.room.power_level":
|
|
closest_iterate(auth_event)
|
|
|
|
closest_iterate(e)
|
|
return closest_mainline_event
|
|
```
|
|
|
|
In the following illustration, _Event B_ happens in a more recent epoch than
|
|
_Event A_ as the closest mainline event referred to by _Event A_ is older
|
|
(closer to the room's create event) than the closest mainline event referred to
|
|
by _Event B_:
|
|
|
|
![drawing showing an example of mainline ordering](/docs/legacy/2020-04-14-stateresv2-3.svg)
|
|
|
|
To sort the events, we use a method very similar to that used in the reverse
|
|
topological ordering of control events, only instead of initially sorting by
|
|
power level, we use the position of the nearest mainline power level event on
|
|
the mainline. When sorting, you can assert that **x < y** by running the
|
|
following tests in order:
|
|
|
|
1. The closest mainline event to **x** is later in the mainline then the closest
|
|
mainline event to **y** (for example, in the above diagram, **A** is less
|
|
than **B** as **B** refers to a more recent mainline power level event)
|
|
2. The event origin server timestamp of **x** is less than the event origin
|
|
server timestamp of **y**
|
|
3. The event ID of **x** is less than the event ID of **y**, as computed by a
|
|
lexicographical comparison
|
|
|
|
To ensure that the algorithm is deterministic (that is, running in the same
|
|
order) for a given set of inputs, we ensure that the normal state events have
|
|
been sorted using the above criteria:
|
|
|
|
```
|
|
func sorted_normal_state_events() as list(event):
|
|
sort normal_events where x < y when:
|
|
x.position_on_mainline < y.position_on_mainline or
|
|
x.origin_server_ts < y.origin_server_ts or
|
|
lexographic_compare(x.event_id, y.event_id) < 0
|
|
return normal_events
|
|
```
|
|
|
|
In the event that multiple candidate events refer to the same mainline power
|
|
level event, the remaining two tests will allow the sorting to be deterministic.
|
|
|
|
### Iteratively apply resolved normal state events
|
|
|
|
Now that our normal state events have been sorted into mainline order, where the
|
|
first events in the list are the "oldest", we can now start to layer these
|
|
resolved normal state events on top of the _partial state_.
|
|
|
|
Just as with the control events, each normal state event should be authed
|
|
against the partial state, before applying the valid events to the partial
|
|
state:
|
|
|
|
```
|
|
for each sorted_normal_state_events() as e:
|
|
if auth_against_partial_state(e):
|
|
partial_state[e.event_type][e.state_key] = e
|
|
```
|
|
|
|
Any event that fails to auth against the current partial state should be ignored
|
|
and should not be applied to the partial state.
|
|
|
|
### Re-apply unconflicted state
|
|
|
|
Now that all of the events from the original conflicted set have been resolved
|
|
and applied, we complete one final pass over the partial state by re-applying
|
|
the original unconflicted set. Ordinarily, nothing from the conflicted set
|
|
should overwrite anything from the unconflicted set, but pulling in auth events
|
|
may bring in additional state.
|
|
|
|
Reapply the unconflicted state events onto the partial state to complete the
|
|
state:
|
|
|
|
```
|
|
for each unconflicted_events as e:
|
|
partial_state[e.event_type][e.state_key] = e
|
|
```
|
|
|
|
At this point, the partial state is now effectively _complete_, therefore it now
|
|
represents the final resolved state.
|
|
|
|
### Finishing up
|
|
|
|
Now that you have completed all of the above steps, you have successfully
|
|
implemented State Resolution v2! If you have any questions, please feel free to
|
|
join us in the [Homeserver
|
|
Developers](https://matrix.to/#/#homeservers-dev:matrix.org) channel.
|
|
|
|
Further reading:
|
|
|
|
- [Matrix Specification - Room Version 2](https://matrix.org/docs/spec/rooms/v2)
|
|
- [State Resolution: Reloaded (in Haskell)](https://matrix.uhoreg.ca/stateres/reloaded.html)
|
|
- [MSC1442 - State Resolution: Reloaded](https://github.com/matrix-org/matrix-doc/blob/erikj/state_res_msc/proposals/1442-state-resolution.md)
|