MSC3901: Deleting State

Tasks:

  • Create document
  • Intro and motivation
  • History
  • Room upgrades will help
  • Definition of obsolete state
  • Rename to "obsolete"
  • Brief summary of each sub-proposal
  • Consider changing "definition of obsolete state" into a sub-proposal
    • No, I think it is part of sub-proposal 1
  • Go through the meeting notes and transfer ideas into sub-proposals
  • Add thoughts from the Deleting state room on marking invitations so we know they come from an upgrade and should be auto-joined.
  • Complete tasks scattered through the doc
  • Add Travis' thought about bans [1]
  • Complete detailed definition of sub-proposals, with help from people who know about each area
  • Request review
  • Ask whether we can speed up faster remote joins by omitting obsolete state

Introduction

See also the video of Solving the Historical State Problem in Matrix in which Andrew Morgan introduces this proposal at the GPN21 event.

Why delete state?

Matrix rooms have an ever-growing list of state, caused by real-world events like someone joining a room or sharing their live location.

Even when this state is potentially of little interest (e.g. a person left the room a long time ago, or they stopped sharing their MSC3489 location), servers and clients must continue processing and passing around the state: once a state_key has been created it will always exist in a room.

This ever-increasing list of state causes load on servers and clients in terms of CPU, memory, disk and bandwidth. Since some of this state is of little interest to users, it would be good to reduce this load.

Further, some more recent spec proposals attempt to increase the number of state events in use (e.g. MSC3401, MSC3489), and give permission by default for less-privileged users to create state events (e.g. MSC3757, MSC3779). If these proposals are accepted, it will be easier for malicious or spammy users to flood a room with undeletable state, potentially mounting a denial of service attack against involved homeservers. So, some solution to "clean" an affected room is desirable.

Note that throughout this document we are only concerned with state events[^and-redactions]: other events are not relevant to this problem.

[^and-redactions] (and of course, events that redact state events.)

How this came about

Over several months in 2022 some interested people got together and discussed how to address this situation. There was much discussion of how to structure the room graph to allow "forgetting" old state, and not all ideas were fully explored, but all added complexity, and most ended up with some idea of a new root node, similar in character to a m.room.create event.

We already have a mechanism to start a new room based on an old one: room upgrades. So, we agreed to explore ideas about how to make room upgrades more seamless, in the hope that they will become good enough to allow "cleaning" rooms of unimportant state.

Improving room upgrades will help

We propose improving room upgrades in various ways, and offering an "Optimise Room" function in clients that allows room administrators to "upgrade" a room (manually) to a new one with the same version.

With enough improvements to upgrades, we believe this will materially improve the current situation, since there will be a viable plan for rooms that become difficult to use due to heavy activity or abuse.

We accept that an automatic process that was fully seamless would be better, but we were unable to design one, and we hope that:

a) improvements to room upgrades may eventually lead to a process that is so smooth it can be automatic, or at least very easy, and

b) improvements to room upgrades will bring benefits to Matrix even if they don't turn out to be the right way to solve the deleting state problem.

Also, reduce state sent to clients

In addition to improving room upgrades, we think we can improve the situation by shrinking the state that is sent to clients on initial sync. This should reduce unnecessary bandwidth use, and reduce storage use within clients.

Structure of this document

This MSC will probably eventually be split into several MSCs, but they are gathered together for now to ensure we keep their shared purpose in mind: reducing the burden of uninteresting state.

Additionally, this document contains a definition of "obsolete" state, which is referenced in several of the sub-proposals.

The sub-proposals are all believed to be independent[1], but they are listed in an order that we think makes sense to use, since those listed earlier will probably be simpler and help us think more clearly about the later ones.

[1] Although 3 (auto-accept invites) does not make a lot of sense without 2 (create invites).

Definition of obsolete state

Purpose of this definition

If we can define clearly what state we consider to be "obsolete", we can make decisions about what to do with it, including not sending it to clients on an initial sync, and not copying it across when a room is upgraded.

Motivation for the definition

Loosely, "obsolete" state is state that is not useful for understanding the state of the room at this point. For example, knowing that someone shared their location in the past is of historical interest, but is not useful for displaying a live indication of who is sharing now. Similarly, knowing that someone left the room is not useful for displaying a list of current room members.

Removing a piece of "obsolete" state does not materially change the actual condition of the room (again, speaking loosely).

Formal definition

An obsolete state event is a state event that has m.obsolete: true at the top level of its content.

For example, this event is an obsolete state event:

{
  "type": "m.beacon_info",
  "state_key": "@matthew:matrix.org_46583241",
  "content": {
    "description": "Matthew's other phone",
    "live": false,
    "m.obsolete": true,
    "m.ts": 1436829458432,
    "timeout": 86400000,
    "m.asset": { "type": "m.self" }
  }
}

(This example is from MSC3489, and in that specific case it would need to be considered whether m.obsolete makes the live property redundant.)

If a state event has m.obsolete: false or no m.obsolete property at all, it is not obsolete.

No event should ever have an m.obsolete property with any value other than true or false. (If a different value is encountered, it should be treated as false.)

To mark some state as obsolete, a client sends a state event with m.obsolete: true in its content. To "unobsolete" some state later, the client sends another state event with no m.obsolete property (or with m.obsolete: false).
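The rules above can be sketched as a small predicate. This is only an illustration: the function name `is_obsolete` and the plain-dict representation of an event are assumptions made for this example, not part of the proposal.

```python
from collections.abc import Mapping
from typing import Any


def is_obsolete(event: Mapping[str, Any]) -> bool:
    """Return True only for a state event whose content has m.obsolete: true."""
    # Only state events (events with a state_key) can be obsolete.
    if "state_key" not in event:
        return False
    content = event.get("content")
    if not isinstance(content, Mapping):
        return False
    # A missing property, false, or any unexpected value (e.g. "true" or 1)
    # is treated as "not obsolete".
    return content.get("m.obsolete") is True
```

Checking for the exact boolean `True` (rather than general truthiness) is what makes unexpected values such as the string "true" fall back to "not obsolete".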

Redacted state events are obsolete

We propose to update the definition of event redaction[^spec-redactions] to specify that all redacted state events contain m.obsolete: true in their content.

[^spec-redactions] https://spec.matrix.org/v1.4/rooms/v10/#redactions

Leave events are obsolete

We propose to update the definition of membership events so that every event with membership: "leave" must also have m.obsolete: true in its content.

Note: membership: "ban" events are not considered obsolete since this information is needed in future to prevent bad actors from re-entering a room. Similarly, invite rejections are not considered obsolete.

Encrypted obsolete state events

Currently, state events are not encrypted, but MSC3414 proposes allowing them to be encrypted.

If MSC3414 goes ahead, an obsolete encrypted state event should contain m.obsolete: true in its unencrypted content, as a sibling of e.g. algorithm and ciphertext.

When the ciphertext is decrypted, the content in the plaintext JSON should also contain m.obsolete: true. The unencrypted and encrypted information should always be identical (present in one if and only if it is present in the other, and with identical values if present). If a client encounters different values here, the unencrypted value should be considered the source of truth (since servers can't read the encrypted value and we want servers to agree with clients).
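As a rough sketch of how a client might apply this rule, assuming it already holds both the unencrypted content and the decrypted plaintext content as dicts (the function name and the returned pair are hypothetical, chosen only for this example):

```python
from typing import Any


def resolve_encrypted_obsolete(
    unencrypted: dict[str, Any], decrypted: dict[str, Any]
) -> tuple[bool, bool]:
    """Return (obsolete, consistent) for an encrypted state event.

    `obsolete` is always taken from the unencrypted content, because that is
    the only copy servers can read. `consistent` reports whether the two
    copies agree, so a client can log or flag a mismatch.
    """
    outer = unencrypted.get("m.obsolete")
    inner = decrypted.get("m.obsolete")
    # Present in one if and only if present in the other...
    same_presence = ("m.obsolete" in unencrypted) == ("m.obsolete" in decrypted)
    # ...and with identical values (comparing types avoids 1 == True).
    same_value = type(outer) is type(inner) and outer == inner
    return outer is True, same_presence and same_value
```

What a client does when `consistent` is false (ignore, warn, or report) is left open here; the text above only fixes which value wins.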

Alternative definitions

content: null

We considered defining an obsolete state event as an event with a state_key and null content.

However, some existing obsolete state events such as leaving events (membership events indicating that someone left the room) contain useful content, and there is no reason to assume that future ones won't also want to do something similar.

m.obsolete as a sibling of content

We could say that the m.obsolete property is not inside content, but alongside it.

This might make it easier for servers to find and index obsolete state.

However, it would require us to provide a special mechanism (e.g. a new endpoint) to allow clients to mark events as obsolete, making the implementation burden of this proposal much greater for both clients and servers.

Avoiding a new room version by adding special cases

Some state is already, loosely speaking, "obsolete" in the sense that new members don't really care about it. For example, leaving events.

It might be possible to define obsolete state as including these special cases, and this might allow us to avoid needing a new room version. It would also reduce unnecessary boilerplate (and hence bandwidth) in cases like membership: leave, where we will always require m.obsolete: true as well.

However, we believe that we need to change the rules around redacted events, meaning that we can't avoid a new room version. Since we need a new room version anyway, we have gone for a simpler definition of obsolete state with no special cases. We believe the extra boilerplate is worth it to avoid any chance of confusion.

Sub-proposal 1: Hide obsolete state from clients on initial sync

Proposal

Based on our definition of "obsolete" state, when sending room state to clients for an initial sync, do not include obsolete state.
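A minimal sketch of the server-side filtering, assuming the server holds current room state as a mapping from (event type, state_key) to the latest event for that key. The data shape and the name `initial_sync_state` are assumptions made for illustration, not how any particular homeserver stores state.

```python
from typing import Any

StateMap = dict[tuple[str, str], dict[str, Any]]


def initial_sync_state(current_state: StateMap) -> list[dict[str, Any]]:
    """Return the state events to send on an initial sync, skipping obsolete ones."""
    result = []
    for event in current_state.values():
        content = event.get("content")
        # Only an exact m.obsolete: true marks the state as obsolete.
        if isinstance(content, dict) and content.get("m.obsolete") is True:
            continue
        result.append(event)
    return result
```

Incremental syncs are untouched by this filter: a client that already holds non-obsolete state for a key still needs to receive the later obsolete event for it.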

Proposed spec wording change

In GET /_matrix/client/v3/sync, under "Responses", "Joined Room", in the Description of "state", should be updated to read:

Updates to the state in the form of state events. Only includes events that occurred before the events provided in timeline.

If since is not provided, or full_state is true, this includes one event for each non-obsolete state key that was updated before the start of the events provided in timeline.

Updates to the state, between the time indicated by the since parameter, and the start of the timeline (or all state up to the start of the timeline, if since is not given, or full_state is true).

N.B. state updates for m.room.member events will be incomplete if lazy_load_members is enabled in the /sync filter, and only return the member events required to display the senders of the timeline events in this response.

For reference, the current wording is:

Updates to the state, between the time indicated by the since parameter, and the start of the timeline (or all state up to the start of the timeline, if since is not given, or full_state is true).

N.B. state updates for m.room.member events will be incomplete if lazy_load_members is enabled in the /sync filter, and only return the member events required to display the senders of the timeline events in this response.

New room version

Since this depends on the definition of obsolete state, which requires changes to redaction logic, this proposal requires a new room version.

Potential issues

If clients actually need obsolete state to render properly, this would imply that events have been marked as obsolete when they should not have been. (Note: we are discussing current room state here, not state events. Obsolete state events should be returned as normal when the events timeline is requested. This allows users to explore historical events.)

The only time when an obsolete state event is needed to update room state is when a client has already received non-obsolete state for this state_key. Since this proposal only affects initial sync, clients have not received any state, so this does not apply.

Alternatives

We could simply not do this, and hope that the measures we will take to reduce the load of state on the server will also be enough to help clients.

However, this seems a relatively easy proposal, and we hope that implementing it will help us understand what we really mean by "obsolete" state, and flush out problems we have not yet considered.

Security considerations

If security-critical events were not sent to clients, this could cause security problems, but since only events that are irrelevant to clients should be marked as obsolete, this should not happen.

Dependencies

As soon as we can agree on a definition of obsolete state, we believe this proposal can be implemented.

We will want to adapt existing and proposed behaviour to mark obsolete events as such. (Examples: leave events, stopping live location sharing, ending a video call.) However, this does not need to be done at the same time as implementing the behaviour of not sending obsolete state to clients: we can create the behaviour first and gradually adapt events to fit with it later.

Sub-proposal 2: Invite users to an upgraded room

Currently, when an invite-only room is upgraded, all the users must be re-invited to the new room.

We propose to invite all users as part of the room upgrade process.

Proposal

Relevant spec section: 11.33.3 Room upgrades - Server Behaviour.

When a client requests to upgrade a room using POST /rooms/{roomId}/upgrade, this should be interpreted by the server as a request not only to create the room, but also to invite all members of the old room to the new one, with the same power level.

The server should send invitations on behalf of the user performing the upgrade. These invitations should contain a part_of property in their content, whose value is the ID of the m.room.create event of the new room. (This makes a later step, automatically accepting these invitations, possible - see sub-proposal 3).

Note that this behaviour does not affect the auth rules for either room in any way: the server simply sends invitations on behalf of the upgrading user.
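To make the shape of these invitations concrete, here is a hedged sketch of how a server might build them. It assumes old-room membership is available as a mapping from user ID to membership value, treats "currently a member" as membership "join", and uses a hypothetical function name; event signing, hashing and sending are deliberately left out.

```python
from typing import Any


def build_upgrade_invites(
    old_room_members: dict[str, str],
    upgrader: str,
    new_room_id: str,
    create_event_id: str,
) -> list[dict[str, Any]]:
    """Build invite events for every joined member of the old room."""
    invites = []
    for user_id, membership in old_room_members.items():
        # Assumption: only users whose membership is "join" count as members.
        # The upgrading user created the new room, so needs no invite.
        if membership != "join" or user_id == upgrader:
            continue
        invites.append(
            {
                "type": "m.room.member",
                "room_id": new_room_id,
                "sender": upgrader,
                "state_key": user_id,
                "content": {
                    "membership": "invite",
                    # Ties the invite to the upgrade (see sub-proposal 3).
                    "part_of": create_event_id,
                },
            }
        )
    return invites
```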

Specific spec wording changes

In point 3 of Server behaviour:

Before:

Membership events should not be transferred to the new room due to technical
limitations of servers not being able to impersonate people from other
homeservers. Additionally, servers should not transfer state events which are
sensitive to who sent them, such as events outside of the Matrix namespace where
clients may rely on the sender to match certain criteria.

After:

Servers should not transfer state events which are sensitive to who sent them,
such as events outside of the Matrix namespace where clients may rely on the
sender to match certain criteria.

Add a new point after point 3:

If the user upgrading the room is registered with this homeserver, create
invitation events on behalf of the upgrading user for every user who is
currently a member of the old room, inviting them to the new room. Also set the
room power levels to give the same power level to each user that they had in the
old room.

Only users who are currently members of the old room should be invited to the
new one.

`m.room.member` events should also be created for users who are banned from
the old room, banning them from the new room with the same information.

(Note: if the admin wishes to forget this ban state, they may unban the users in the usual way - setting their membership to leave, which will make the member state event obsolete, meaning it will be forgotten in any upgrade they perform later.)

In m.room.member, under "Content", add a property:

Name: part_of
Type: string
Description: The Event ID of the m.room.create event that this invitation is
part of, if any.

Potential issues

Invitations will not be generated if the upgrading user's homeserver is not participating in the room. However, since the user is in the room, their homeserver will be participating.

Alternatives

MSC3325 proposes that all users in the old room be allowed to join the new room by using a restricted join rule.

MSC3325 also mentions as an alternative that the room membership of each user could be set as invited without actually sending an invitation, to avoid invite spam.

Security considerations

This operation causes a homeserver to send out lots of invitations, which could be a cause of invite spam. It can only be caused by someone who is an admin of a room already containing the recipients, so that limits the scope.

Dependencies

No dependencies.

Sub-proposal 3: Auto-accept invitations to upgraded rooms

Currently, when a room is upgraded, users do not join the new room until their client follows the room link in the tombstone event. Some clients require users to perform this step manually, and others do it automatically.

This makes room upgrades clunky, and prevents users from receiving events for upgraded rooms until their client triggers the upgrade. This can cause users to miss important messages.

We propose to specify how servers can evaluate suggested room upgrades, and if they consider them valid, automatically join users from the old room to the upgraded one.

Proposal

When a homeserver observes that a room is being upgraded, we propose that it accepts the resulting invitation to that room on behalf of all users invited to the new room who are registered with this homeserver.

To do this safely, the server must check that the user was a member of the room before it was upgraded.

The server will begin this process if it finds a new m.room.member event that has its part_of property set. This should contain the event ID of an m.room.create event. If it does, the server should examine that event to find a predecessor room and event ID. If these exist, the server should validate that the predecessor event ID refers to a tombstone event in that room, that the tombstone event refers to the new room as successor, and that the user was a member of the old room at the time the tombstone was created. If all these are true, the server should auto-join the user to the new room by emitting an m.room.member event on their behalf whose properties match their membership of the old room (excluding join_authorised_via_users_server, which should be omitted since the user is invited, so does not need additional authorisation).

Note that this behaviour does not affect the auth rules for either room in any way: the server simply accepts invitations on behalf of the user under these circumstances.
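The validation steps described above can be sketched as a single check. The two callbacks, `get_event` and `membership_at`, are hypothetical stand-ins for whatever event store and state lookup a homeserver has; the `predecessor` and `replacement_room` fields are the existing ones on m.room.create and m.room.tombstone.

```python
from collections.abc import Callable
from typing import Any, Optional

Event = dict[str, Any]


def should_auto_join(
    invite: Event,
    new_room_create: Event,
    get_event: Callable[[str, str], Optional[Event]],
    membership_at: Callable[[str, str, str], Optional[str]],
) -> bool:
    """Decide whether an invite with part_of set may be auto-accepted."""
    part_of = invite.get("content", {}).get("part_of")
    # part_of must name the m.room.create event of the room we are invited to.
    if not part_of or part_of != new_room_create.get("event_id"):
        return False
    if new_room_create.get("type") != "m.room.create":
        return False
    if invite.get("room_id") != new_room_create.get("room_id"):
        return False
    predecessor = new_room_create.get("content", {}).get("predecessor") or {}
    old_room_id = predecessor.get("room_id")
    tombstone_id = predecessor.get("event_id")
    if not old_room_id or not tombstone_id:
        return False
    # The predecessor event must be a tombstone pointing back at the new room.
    tombstone = get_event(old_room_id, tombstone_id)
    if tombstone is None or tombstone.get("type") != "m.room.tombstone":
        return False
    replacement = tombstone.get("content", {}).get("replacement_room")
    if replacement != new_room_create.get("room_id"):
        return False
    # The user must have been joined to the old room at the tombstone.
    return membership_at(old_room_id, invite["state_key"], tombstone_id) == "join"
```

Every failed check returns False, so an invite that cannot be fully validated is simply left for the user to accept or reject manually.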

Potential issues

Alternatives

Security considerations

Joining a room automatically could very easily be problematic, so this proposal requires close scrutiny.

We believe that it is safe because the server must check back in the old room and validate both that there is a tombstone pointing at the new room and that the user was a member of the old room at the time of the tombstone. Together, these checks mean that this process can only be triggered by someone able to create a tombstone within a room of which the user is a member.

So only an admin of a room I am in can trigger me to auto-join a new room.

Dependencies

This depends on sub-proposal 2, because it requires that m.room.member events contain the part_of property.

Sub-proposal 4: Copy more state to upgraded rooms

Currently, when a room is upgraded, the new room is only somewhat similar to the old one.

We propose to expand the definition of a room upgrade to copy all useful information from the old to the new room.

This involves copying all non-obsolete, non-user-scoped room state by creating state events in the upgraded room.

Proposal

When upgrading a room, the homeserver should examine the state of the old room and create state events in the new room with the same state_key and contents, but with sender set to the mxid of the user performing the upgrade.

The server should copy all state except:

  • Obsolete state, as defined earlier in this proposal
  • User-scoped state i.e. any state whose state_key is equal to the sender's mxid. (If MSC3779 "Owned state events" is merged, user-scoped state will also include anything with a state_key that starts with the user's mxid plus underscore.)

Note: if a client creates custom state events that for some reason should not survive a room upgrade, the client should mark them as obsolete before the upgrade is performed.
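A sketch of the selection rule, applying the two exclusions literally. The function name and the list-of-dicts input are assumptions for illustration. Event types that the new room generates for itself (such as m.room.create) and membership events would need separate handling, which this sketch does not attempt.

```python
from typing import Any


def state_to_copy(old_state: list[dict[str, Any]], upgrader: str) -> list[dict[str, Any]]:
    """Return new-room state events copied from the old room's current state."""
    copies = []
    for event in old_state:
        content = event.get("content", {})
        # Exclusion 1: obsolete state.
        if content.get("m.obsolete") is True:
            continue
        # Exclusion 2: user-scoped state (state_key equals the sender's mxid).
        if event.get("state_key") == event.get("sender"):
            continue
        copies.append(
            {
                "type": event["type"],
                "state_key": event["state_key"],
                "content": dict(content),
                # The copy is sent by the user performing the upgrade.
                "sender": upgrader,
            }
        )
    return copies
```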

Proposed spec wording change

In 11.33.3 Server behaviour, under "Room Upgrades", step 3 should be updated to read:

Replicates transferable state events to the new room.

The homeserver should examine the state of the old room and create state events in the new room with the same state_key and contents, but with sender set to the mxid of the user performing the upgrade.

The server should copy all state except:

  • Obsolete state, as defined in section ...
  • User-scoped state i.e. any state whose state_key is equal to the sender's mxid.

(Note that if MSC3779 is merged, user-scoped state will need a different definition.)

For reference, the current wording is:

Replicates transferable state events to the new room. The exact details for what is transferred is left as an implementation detail, however the recommended state events to transfer are:

m.room.server_acl, m.room.encryption, m.room.name, m.room.avatar,
m.room.topic, m.room.guest_access, m.room.history_visibility,
m.room.join_rules,
m.room.power_levels

Membership events should not be transferred to the new room due to technical limitations of servers not being able to impersonate people from other homeservers. Additionally, servers should not transfer state events which are sensitive to who sent them, such as events outside of the Matrix namespace where clients may rely on the sender to match certain criteria.

Potential issues

Homeservers cannot impersonate users from other homeservers, so no single homeserver can copy all of the required state.

Part of the reason for this proposal is to reduce the amount of state that is held in a room, so we need to make sure we are not copying unnecessary state here, and that unwanted state such as spam or abuse can be excluded.

The existing spec states:

servers should not transfer state events which are sensitive to who sent them, such as events outside of the Matrix namespace where clients may rely on the sender to match certain criteria.

Instead, we propose including all events except those that are considered obsolete, and ones in the user's namespace. This change might be surprising to some clients who use custom state events, and rely on the sender property for their behaviour.

Alternatives

We could consider also copying user-scoped state, perhaps in a future MSC. One way to achieve this would be to allow a room founder special permission to create user-scoped events for users other than themselves under particular circumstances.

For example, we could permit this kind of not-my-user user-scoped event for the founder, if it occurs after their m.room.create event and before any m.room.member events. Of course, the definition of "between" needs to be carefully crafted, and, if possible, some provision to prevent the room founder from forking the room later and modifying the outcome would be useful.

An earlier draft proposed an additional exclude_from_upgrade property on state events to allow explicitly avoiding copying some events, but no clear use case could be found for this that is not covered by simply marking events that are no longer needed as obsolete.

Security considerations

New state events are created by the upgrading user, so it may be possible for that user to make it look like they were the initiators of events that were actually created by a different user in the previous room.

A room upgrade will change the sender of any maliciously-added event, making it harder to remove all state created by a malicious user.

Dependencies

In order to exclude obsolete state, the definition of obsolete from this proposal is required, but the main part of this sub-proposal does not depend on any others.

Sub-proposal 5: Upgraded rooms have the same room ID

Proposal

After a room is upgraded, links to the room still point at the old ID.

For example:

  • room aliases point at the old room ID
  • bot integrations (including moderation bots) refer to the old room ID
  • space hierarchies depend on room ID to represent parent/child space relationships
  • push rules refer to room IDs
  • sync filters are based on room IDs
  • 3PID invitations include room ID

We propose to make upgraded rooms keep the same room ID as the old version, by introducing a server-only sub-ID that represents the version of the room.

Clients and external systems continue to use the existing room ID, and servers use room ID + room version to identify the actual room.

When a client talks to a server using just room ID, the server automatically picks the most recent version of that room.

Potential issues

If servers disagree on which version is most recent, or on which versions exist, split brain situations could occur.

Alternatives

Security considerations

Unstable prefix

Dependencies

Future work

This section lists partially-formed ideas for further proposals that could complement or enhance this proposal.

Pruning bans of deactivated users

Some rooms have large numbers of bans, which normally need to be carried over on a room upgrade. However, it is common for accounts that have been banned in one room to end up deactivated on the homeserver.

If an account has been deactivated, the ban is no longer useful, so we could exclude it from the room state.

Risks include:

  • Malicious homeservers being able to reverse bans. We could mitigate this by restricting the behaviour to the homeserver that is doing the upgrade, and in the longer term federating deactivations and trusting some other homeservers.
  • Accounts may be reactivated, so this could only be implemented on homeservers that implement policies preventing this from happening in ways which would disrupt rooms.

Bulk invite events

When a room is upgraded and we invite all users to the new room, we expect to invite a lot of users. It would almost certainly improve performance to collect these invitations into larger events.

Events have a limited size, so we would need to allow sending multiple bulk events, not just one.