matrix.org/content/docs/older/stateres-v2.md

21 KiB

+++ title = "State Resolution v2 for the Hopelessly Unmathematical" extra.author = "Neil Alexander Twigg" aliases = ["/docs/guides/implementing-stateres"] +++

So you don't have a PhD in Mathematics and you want to implement State Resolution v2! This document aims to demystify not just the how but also the why of resolving state conflicts in Matrix room versions 2 and later, along with a bit of pseudocode to demonstrate the implementation.

In Matrix, rooms don't belong to a specific server, but instead are replicated across all servers that are participating in a given room. As a result, each server in a room needs to be able to work out the room state independently and all servers in a room need to be able to arrive, as far as possible, at the same state when doing so. Doing this is what allows a room to "appear" the same for all users, regardless of the homeserver they are joining from.

State resolution is the act of taking two or more different sets of state and working out the final consistent state that we deem truthful. The idea is that all homeservers will agree on what state looks like for a given room by following the same state resolution algorithms.

When resolving state, we adhere to the following guidelines:

  1. Higher power levels should take precedence over lower ones, e.g. the actions of an administrator should overrule those of a moderator
  2. Earlier events should be processed before later ones where possible

It's worth mentioning that concepts such as "earlier" and "later" are difficult to define in Matrix and this is a significant part of what makes state resolution difficult to do. In State Res v2, we have a couple of new concepts for defining when an event is "earlier" or "later" than another event - these are covered in the sections below on ordering events.

Defining state

First of all, a state event is a Matrix event which contains a state_key. The state key defines what the state applies to (or is often an empty string if the state should apply to the entire room). Examples of state events are m.room.create, m.room.join_rules, m.room.power_level, m.room.member etc.

Given that state events are also a part of the room history, state can be different at different points in time. Therefore, the state at a specific event contains a snapshot of all state events that apply to the room at the time of that given event and won't include state events that came after it. In the case of state events, the state at the time of that given event includes the changes in state as provided by the event.

A state tuple is a (event type, state key) tuple. For example:

  • ("m.room.power_levels", "") for room power levels
  • ("m.room.join_rules", "") for room join rules
  • ("m.room.member", "@neilalexander:matrix.org") for neilalexander's room membership

Each state event attempts to update a (event type, state key)->event key-value mapping in a given room, thus allowing for key-value data to be stored per room.

Defining control events and normal state events

In addition to the above, we also define control events as being a subset of state events which specifically have the power to add or remove abilities from another user. These are:

  • Setting room power levels: m.room.power_levels where the state_key equals ""
  • Setting room join rules: m.room.join_rules where the state_key equals ""
  • Kicking a user: m.room.member with "leave" content but where the sender (the admin/mod) does not match the state_key (the kicked user)
  • Banning a user: m.room.member with "ban" content but where the sender (the admin/mod) does not match the state_key (the banned user)

Any state event that does not match the above definition is, instead, classed as a normal state event. These include, for example:

  • Setting room topic: m.room.topic where the state_key equals ""
  • Setting room avatar: m.room.display_avatar where the state_key equals ""
  • Users joining or leaving: m.room.member with "join" or "leave" content where the sender is equal to the state_key

Conflicted and unconflicted state sets

The ultimate goal of state resolution is to take some conflicted inputs and give unconflicted output, and to do so in a consistent way such that all servers will agree on the same state. In effect, we may have been given multiple sets of state due to forks in the room and, when we combine them together, we can end up with multiple different values for one or more given state tuples. The goal is to ensure that our final state only contains a single value for each state tuple.

Forks typically happen when servers that are participating in a room are no longer able to keep each other up-to-date over federation. It's possible that users on different servers will continue to send any number of events into the room, including state events, resulting in state drifting across those servers. When the servers are able to federate again, i.e. after network connectivity is restored, then we can work out the conflicted and unconflicted state by merging the forks.

Conflicted state happens when:

  • We have more than one different value for a given state tuple in each fork
  • We have only one value for a given state tuple but it happened in only one fork and not the other

Unconflicted state happens when:

  • We have only one value for a given state tuple because it is the same in all forks
  • We have only one value for a given state tuple because it happened before the fork

Events in the input set that are unconflicted are considered to be truthful, therefore apart from expecting that they pass auth, we don't do any further state resolution on them. An unconflicted event will always appear in the final resolved state.

Events in the input set that are conflicted are the opposite - to begin with, we have multiple events claiming different values and don't know which to believe. We perform state resolution on them to decide which of those values we wish to select, and only the selected values appear in the final resolved state.

Since we only want to perform state resolution on the conflicted set, start by working through the set of input events, including their auth events and sorting them into conflicted and unconflicted buckets:

var conflicted_events:   list(event)
var unconflicted_events: list(event)

Partial state

We refer to the state as partial when we have not yet completed the state resolution algorithm, therefore the result is incomplete. In the first stage, the partial state is equal to the unconflicted state:

var partial_state: map(event_type -> map(state_key -> event))

for each unconflicted_events as e:
    partial_state[e.event_type()][e.state_key()] = e

Later, the resolved state blocks from the conflicted state will be layered onto the partial state, eventually bringing us to a complete resolved state.

Resolved state

The final resolved state of a room is worked out from doing these things:

  1. Starting with the unconflicted state as our "base layer"
  2. Working out the sequence of control events using reverse topological ordering and then layering those on
  3. Working out the sequence of normal state events using mainline ordering and then layering those on
  4. Reapplying any unconflicted state keys which may have been overwritten in the previous steps

Auth differences

Now that we have a set of conflicted events, we also need to make sure that we have enough information to satisfy the auth checks, regardless of which event we eventually resolve to.

For example, imagine the case that Alice gives some power to Bob (in event A), Bob gives some power to Charlie (in event B) and then Charlie changes the ban level (in event C):

drawing showing an example flow of power level changes

However, in another fork, event A is known (so Bob has received power from Alice) but B was not, therefore in the fork, Charlie never received power and therefore was not authorised to change the ban level. When the two forks converge, we would need to know about B in order to successfully authorise C.

The auth difference therefore contains all auth events that were not common to both forks - if an auth event appears in only one fork but not the other then that auth event must appear in the auth difference:

var auth_difference:     list(event)
var auth_sets:           map(fork_id -> list(event))

for each conflicted_events as e:
    found = true
    for each auth_sets as auth_set:
        if e not in auth_set:
            found = false

    if not found:
        auth_difference.append(e)

The full conflicted set is effectively the conflicted events combined with the auth difference:

var full_conflicted_set: list(event)

full_conflicted_set = conflicted_events + auth_difference

Reverse topological ordering of control events

Now that we have calculated the full conflicted set, which also includes the auth difference, we now can start to work out the order in which power was given or taken away from specific users.

When resolving state conflicts, it's important to work out who really had power to do certain things at a given point in time. Working this out for control events takes priority over working out the order of normal state events - those will be sorted based on the power levels from this step.

To work out the timeline of control events, we implement reverse topological ordering, which results in a list of events in which the order guarantees that auth events always come before the events that refer to them. This is done using an adaptation of Kahn's algorithm.

Kahn's algorithm

Since an event's auth chain forms a directed acyclic graph (DAG), we can say that each auth event is a "node" in the graph and each relationship between two events is an "edge". In our case, two nodes in the auth DAG have a relationship when one event refers to another event as an auth event.

More precisely, we would say that an event that refers to other auth events has one or more outgoing edges and that an event that is relied on by other events has one or more incoming edges.

This allows us to make a couple of assertions: that an event that has no incoming edges happened most recently, and an event that has no outgoing edges happened least recently.

In our case, Kahn's algorithm works on three main components:

  1. The unsorted input list of events, including the auth events
  2. A map, where we count how many incoming edges each node has
  3. The sorted output list of events, effectively the result of the algorithm

Assuming the following data structures, we can populate the event map and incoming edges with initial values:

var output_events:  list(event)
var event_map:      map(event_id -> event)
var incoming_edges: map(event_id -> integer)

for each full_conflicted_set as e:
    event_map[e.event_id] = e
    incoming_edges[e.event_id] = 0

The Matrix-specific details of our implementation of Kahn's algorithm lie in the tie-break between events. The tie-break ensures that we get a deterministic ordering and that we also meet our two guidelines from the beginning: that we will always try to prefer events from the more powerful, and we will always try to process older events first. When sorting, you can assert that x < y by running the following tests in order:

  1. The effective power level at x is greater than the effective power level at y
  2. The event origin server timestamp of x is less than the event origin server timestamp of y (that is, event x is older than y)
  3. The event ID of x is less than the event ID of y, as computed by a lexicographical comparison, as a fallback

To ensure that the algorithm is deterministic (that is, running in the same order) for a given set of inputs, we must ensure that the map of incoming edges has been sorted using the above rules before we do anything with it:

func sorted_incoming_edges() returns list(event):
    sort incoming_edges where x < y when:
        x.power_level > y.power_level or
        x.origin_server_ts < y.origin_server_ts or
        lexographic_compare(x.event_id, y.event_id) < 0
    return incoming_edges

Once the incoming edges have been sorted, iterate through the input list to count how many incoming edges each event has and update the incoming edges map with those values:

for each input_events as e:
    for each e.auth_events as a:
        incoming_edges[a.event_id] += 1

Then look for events that have no incoming edges and insert them into the beginning of the output list—note that we differ from a pure implementation of Kahn's algorithm here as we specifically require the incoming edges map to be sorted (as above):

while len(incoming_edges) > 0:
    for each sorted_incoming_edges() as id, count:
        if count == 0:
            output_event.prepend(event_map[id])

            for each event_map[id].auth_events as auth_event:
                incoming_edges[auth_event.event_id] -= 1

            remove(incoming_edges, id)

Each time we add a node into the output list, we can "remove" its outgoing edges. This means that every auth event referred to by this event has one less incoming edge. Once it has been exhausted, we also remove the event from the incoming edge map, so that we won't try to process it again.

Now we repeat the process, continuously, until the algorithm eventually exhausts the entire incoming edges map. Once the map is empty, all edges have been removed and the earliest events will appear at the beginning of the output list.

The output list is now sorted such that all referred events now appear before the events that refer to them. It is sorted in reverse topological order.

Iteratively apply resolved control events

Now that our control events have been sorted into reverse topological order, where the first events in the list are the "oldest", we can now start to layer these resolved control events in order to build the partial state.

To build up the partial state, start at the oldest event and loop through the list, performing auth checks on each event by using the current partial_state as an allowed set of auth events. We define this process as the function auth_against_partial_state.

It is expected that partial_state is an empty list for the first iteration, as the oldest event (which is the room create event) does not require auth events. The partial state (and, by extension, the selection of allowed auth events) will grow with each iteration.

If the event passes auth checks then update the partial state to include the newly authed event before moving onto the next iteration:

for each output_event as e:
    if auth_against_partial_state(e):
        partial_state[e.event_type][e.state_key] = e

Events that fail the auth check should be ignored and should not find their way into the partial state.

Creating the power level mainline

Once the control events have been applied to the partial state, the remainder of the normal state events should be ordered and processed based on the ordering of their power level auth events. This is done using a technique called mainline ordering.

The power level mainline is created by following the path of power level events, starting from the resolved m.room.power_level event from the partial state, back to the point that the room was created:

var resolved_power_level_event: partial_state["m.room.power_level"][""] as event
var power_level_mainline:       list(event)

func mainline_iterate(e as event):
    power_level_mainline += e
    for each e.auth_events as auth_event:
        if auth_event.event_type == "m.room.power_level":
            mainline_iterate(auth_event)

We refer to the time period between any two given power level events as an epoch.

In the below illustration, the "resolved" event is the event that appears in the partial state from the previous section.

drawing showing partial state example

Mainline ordering of normal state events

Since the mainline will be ordered based on when the power level changes happened, and other state events use power level events as auth events, it is possible to sort the order of normal state events based on which mainline power level event they most closely refer to, or by definition, which epoch the event happened in.

Note that events may appear that refer to power level events which don't appear in the mainline, in which case it is necessary to step through the power level events iteratively until you encounter one that is on the mainline. It is referred to as the closest mainline event:

func get_closest_mainline_event(e as event) returns event:
    var closest_mainline_event: event

    func closest_iterate(e as event):
        if e in power_level_mainline:
            closest_mainline_event = e
        else:
            for each e.auth_events as auth_event:
                if auth_event.event_type == "m.room.power_level":
                    closest_iterate(auth_event)

    closest_iterate(e)
    return closest_mainline_event

In the following illustration, Event B happens in a more recent epoch than Event A as the closest mainline event referred to by Event A is older (closer to the room's create event) than the closest mainline event referred to by Event B:

drawing showing an example of mainline ordering

To sort the events, we use a method very similar to that used in the reverse topological ordering of control events, only instead of initially sorting by power level, we use the position of the nearest mainline power level event on the mainline. When sorting, you can assert that x < y by running the following tests in order:

  1. The closest mainline event to x is later in the mainline then the closest mainline event to y (for example, in the above diagram, A is less than B as B refers to a more recent mainline power level event)
  2. The event origin server timestamp of x is less than the event origin server timestamp of y
  3. The event ID of x is less than the event ID of y, as computed by a lexicographical comparison

To ensure that the algorithm is deterministic (that is, running in the same order) for a given set of inputs, we ensure that the normal state events have been sorted using the above criteria:

func sorted_normal_state_events() as list(event):
    sort normal_events where x < y when:
        x.position_on_mainline < y.position_on_mainline or
        x.origin_server_ts < y.origin_server_ts or
        lexographic_compare(x.event_id, y.event_id) < 0
    return normal_events

In the event that multiple candidate events refer to the same mainline power level event, the remaining two tests will allow the sorting to be deterministic.

Iteratively apply resolved normal state events

Now that our normal state events have been sorted into mainline order, where the first events in the list are the "oldest", we can now start to layer these resolved normal state events on top of the partial state.

Just as with the control events, each normal state event should be authed against the partial state, before applying the valid events to the partial state:

for each sorted_normal_state_events() as e:
    if auth_against_partial_state(e):
        partial_state[e.event_type][e.state_key] = e

Any event that fails to auth against the current partial state should be ignored and should not be applied to the partial state.

Re-apply unconflicted state

Now that all of the events from the original conflicted set have been resolved and applied, we complete one final pass over the partial state by re-applying the original unconflicted set. Ordinarily, nothing from the conflicted set should overwrite anything from the unconflicted set, but pulling in auth events may bring in additional state.

Reapply the unconflicted state events onto the partial state to complete the state:

for each unconflicted_events as e:
    partial_state[e.event_type][e.state_key] = e

At this point, the partial state is now effectively complete, therefore it now represents the final resolved state.

Finishing up

Now that you have completed all of the above steps, you have successfully implemented State Resolution v2! If you have any questions, please feel free to join us in the Homeserver Developers channel.

Further reading: