matrix-doc/proposals/1767-extensible-events.md

23 KiB

MSC1767: Extensible events in Matrix

While events are currently JSON blobs which accept additional metadata appended to them, there is no formal structure for how to represent this information or interpret it on the client side, particularly in the case of unknown event types.

When specifying new events, the proposals often reinvent the same wheel instead of reusing existing blocks or types, such as in cases where captions, thumbnails, etc need to be considered for an event. This has further issues of clients not knowing how to render these newly-specified events, leading to mixed compatibility within the ecosystem.

The above seriously hinders the uptake of new event types (and therefore features) within the Matrix ecosystem. In the current system, a new event type would be introduced and all implementations slowly gain support for it - if we instead had reusable types then clients could automatically support a "good enough" version of that new event type while "proper" support is written in over time. Such an example could be polls: not every client will want polls right away, but it would be quite limiting as a user experience if some users can't even see the question being posed.

This proposal introduces a structure for how extensible events are represented, using the existing extensible nature of events today, laying the groundwork for more reusable blocks of content in future events.

With text being the simplest form of representation for events today, this MSC also specifies a relatively basic text schema for room messages that can be reused in other events. Other building block types are specified by other MSCs:

Some examples of new features/events using extensible events are:

Note: Readers might find Andy's blog useful for understanding the problem space. Unfortunately, for those who need to understand the changes to the protocol/specification, the best option is to read this proposal.

Proposal

In a new room version (why is described later in this proposal), events are declared to be represented by their extensible form, as described by this MSC. m.room.message is formally deprecated by this MSC, with removal from the specification happening as part of a room version adopting the feature. Clients are expected to use extensible events only in rooms versions which explicitly declare such support (in both unstable and stable settings), except where noted later in this proposal.

An extensible event is made up of two critical parts: an event type and zero or more content blocks. The event type defines which content blocks a receiver can expect, and the content blocks carry the information needed to render the event (whether the client understands the event type or not).

Content blocks are simply any top-level key in content on the event. They can have any value type (that is also legal in an event generally: string, integer, etc), and are namespaced using the Matrix conventions for namespacing.

Content blocks can be invented independent of event types and should be reusable in nature. For example, this proposal introduces an m.text content block which can be reused by other event types to represent textual fallback.

When a client encounters an extensible event (any event sent in a supported room version) that it does not understand, the client begins searching for a best match based on event type schemas it does know. This may mean combining multiple different content blocks to match a suitable schema, such as in the case of MSC3553 video events. Which schemas to try, and in what order, is left as a deliberate implementation detail. A client might decide to try parsing the event as a video, then image, then file, then text message, for example.

It is generally not expected that a single content block will describe an entire event, except in the exceedingly trivial cases (like text messages in this proposal). Multiple content blocks will usually fully describe the information in the event, and mixins (described later) can further change how an event is represented or processed.

Note that a "client" in an extensible events sense will typically mean an application using the Client-Server API, however in reality a client will be anything which needs to parse and understand event contents (servers for some functions like push rules, application services, etc).

Per the introduction, text is the baseline format that most/all Matrix clients support today, often through use of HTML and m.room.message. Instead of using m.room.message to represent this content, clients would instead use an m.message event with, at a minimum, a m.text content block:

{
    // irrelevant fields not shown
    "type": "m.message",
    "content": {
        "m.text": [
            { "body": "<i>Hello world</i>", "mimetype": "text/html" },
            { "body": "Hello world" }
        ]
    }
}

m.text has the following definitions associated with it:

  • An ordered array of mimetypes and applicable string content to represent a single marked-up blob of text. Each element is known as a representation.
  • body in a representation is required, and must be a string.
  • mimetype is optional in a representation, and defaults to text/plain.
  • Zero representations are permitted, however senders should aim to always specify at least one.
  • Invalid representations are skipped by clients (missing body, not an object, etc).
  • The first representation a renderer understands should be used.
  • Senders are strongly encouraged to always include a plaintext representation.
  • The mimetype of a representation determines its body - no effort is made to limit what is allowed in the body, however clients are still strongly encouraged to validate/sanitize the content further, like in the existing spec for HTML.
  • Custom text formats in a representation are specified by a suitably custom mimetype. For example, a representation might use a text format extending HTML or XML, or an all-new markup. This can be used to create bridge-compatible clients where the destination network's markup is first in the array, followed by more common HTML and text formats.

Like with the event described above, all event types now describe which content blocks they expect to see on their events. These content blocks could be required, as is the case of m.text in m.message, or they could be optional depending on the situation. Of course, senders are welcome to send even more blocks which aren't specified in the schema for an event type, however clients which understand that event type might not consider them at all.

In m.message's case, m.text is the only required content block. The m.text block can be reused by other events to include a text-like format for the event, such as a text fallback for clients which do not understand how to render a custom event type.

To reiterate, when a client encounters an unknown event type it first tries to see if there's a set of content blocks present that it can associate with a known event type. If it finds suitable content blocks, it parses the event as though the event were of the known type. If it doesn't find anything useful, the event is left as unrenderable, just as it likely would today.

To avoid a situation where events end up being unrenderable, it is strongly recommended that all event types support at least an m.text content block in their schema, thus allowing all events to theoretically be rendered as message events (in a worst case scenario).

For clarity, events are not able to specify how they are handled when the receiver doesn't know how to render the event type: the sender simply includes all possible or feasible representations for the data, hoping the receiver will pick the richest form for the user. As an example, a special medical imaging event type might also be represented as a video, static image, or text (URL to some healthcare platform): the sender includes all 3 fallbacks by specifying the needed content blocks, and the receiver may pick the video, image, or text depending on its own rules.

Events must still only represent a single logical piece of information, thus encouraging sensible fallback options in the form of content blocks. The information being represented is described by the event type, as it always has been before this MSC. It is explicitly not permitted to represent two or more pieces of information in a single event, such as a livestream reference and poll: senders should look into relationships instead.

Worked example: Custom temperature event

In a hypothetical scenario, a temperature event might look as such:

{
    // irrelevant fields not shown
    "type": "org.example.temperature",
    "content": {
        "m.text": [{"body": "It is 22 degrees at Home"}],
        "org.example.probe_value": {
            "label": "Home",
            "units": "org.example.celsius",
            "value": 22
        }
    }
}

In this scenario, clients which understand how to render an org.example.temperature event might use the information in org.example.probe_value exclusively, leaving the m.text block for clients which don't understand the temperature event type.

Another event type might find inspiration and use the probe value block for their event as well. Such an example might be in a more industrial control application:

{
    // irrelevant fields not shown
    "type": "org.example.tank.level",
    "content": {
        "m.text": [{"body": "[Danger] The water tank is 90% full."}],
        "org.example.probe_value": {
            "label": "Tank 3",
            "units": "org.example.litres",
            "value": 9037
        },
        "org.example.danger_level": "alert"
    }
}

This event also demonstrates a org.example.danger_level block, which uses a string value type instead of the previously demonstrated objects and values - this is a legal content block, as blocks can be of any type.

Clients should be cautious and avoid reusing too many unspecified types as it can create opportunities for confusion and inconsistency. There should always be an effort to get useful event types into the Matrix spec for others to benefit from.

Room version

This MSC requires a room version to make the transition process clear and coordinated. Normally for a feature such as this, an effort would be made to attempt to support backwards compatibility for a duration of time, however for a feature that requires significant overhaul of clients, servers, and Matrix as a whole it feels more important to bias towards a clear switch between legacy and modern (extensible) events.

Note: A previous draft of this proposal (codenamed "v1 extensible events") did attempt to describe a timeline-based approach, allowing for event types to mix concepts of content blocks and legacy fields, however that approach did not give sufficient reason for clients to fully adopt the extensible events changes.

In room versions supporting extensible events, clients MUST only send extensible events. Deprecated event types (to be enumerated at the time of making the room version) MUST NOT be sent into extensible event-supporting room versions, and clients MUST treat deprecated event types as unrenderable by force. For example, if a client sees an m.room.message in an extensible event-supporting room version, it must not render it, even if it knows how to render that type.

While full enforcement of this restriction is not feasible, servers are encouraged to block Client-Server API requests for sending known-banned event types into applicable rooms. This obviously does not help when the room is encrypted, or the client is sending custom events in a non-extensible form, hence the requirement that clients treat the events as invalid too.

Using the usual MSC process, the Spec Core Team (SCT) will be responsible for determining the minimum scope of extensible events in a published (stable) room version.

Meanwhile, clients are welcome to use the unstable implementations of extensible event-supporting features, provided they are in an appropriate room version. Some event type MSCs declare explicit support for what would normally be an unsupported room version - client authors should check the applicable MSC or specification for the feature to determine if they are allowed to do this. Such examples include MSC3381 Polls and MSC3245 Voice Messages.

State events

Unknown state event types generally should not be parsed by clients. This is to prevent situations where the sender masks a state change as some other, non-state, event. For example, even if a state event has an m.text content block, it should not be treated as a room message.

Note that state events MUST still make use of content blocks in applicable room versions, and that any top-level key in content is defined as a content block under this proposal. As such, this MSC implicitly promotes all existing content fields of m.* state events to independent content blocks as needed. Other MSCs may override this decision on a per-event type basis (ie: redeclaring how room topics work to support content blocks, deprecating the existing m.room.topic event in the process, like in MSC3765). Unlike most content blocks, these promoted-to-content-blocks are not realistically meant to be reused: it is simply a formality given this MSC's scope.

Notifications

Currently push notifications describe how an event can cause a notification to the user, though it makes the assumption that there are m.room.message events flying around to denote "messages" which can trigger keyword/mention-style alerts. With extensible events, the same might not be possible as it relies on understanding how/when the client will render the event to cause notifications.

For simplicity, when content.body is used in an event_match condition, it now looks for an m.text block's text/plain representation (implied or explicit) in room versions supporting extensible events. This is not an easy rule to represent in the existing push rules schema, and this MSC has no interest in designing a better schema. Note that other conditions applied to push notifications, such as an event type check, are not affected by this: clients/servers will have to alter applicable push rules to handle the new event types (see also: MSC3933 and friends).

Power levels

This MSC proposes no changes to how power levels interact with events: they are still capable of restricting which users can send an event type. Though events might be rendered as a different logical type (ie: unknown event being rendered as a message), this does not materially impact the room's ability to function. Thus, considerations for how to handle power levels more intelligently are details left for a future MSC.

As of writing, most rooms fit into two categories: any event type is possible to send, or specific cherry-picked event types are allowed (announcement rooms: reactions & redactions). Extensible events don't materially change the situation implied by this power levels structure.

Mixins specifically allowed

A mixin is a specific type of content block which can be added to any type of event to change how that event is processed. Content blocks which are mixins will be called out as such in the spec. Mixins are meant to be purely additive, thus all event types MUST support being rendered/processed without the use of mixins.

See also the Wikipedia entry on mixins.

Note that mixins differ from optional content blocks in an event type's schema: a mixin is able to be applied to any event type sensibly while optional content blocks are generally only valuable to the applicable event types.

Though this MSC does not describe any such mixins itself, MSC3955 does by allowing any event to be flagged as "automated" - a strictly additive annotation on events.

Another possible mixin would be m.relates_to (not described by this MSC). Currently, some features like the key verification framework rely on relationships as part of making the feature work. The expectation is that these features would be adapted to meet the "purely additive" condition (assuming m.relates_to does actually end up being a mixin).

Uses of HTML & text throughout the spec

For an abundance of clarity, all functionality not explicitly called out in this MSC which relies on the formatted_body of an m.room.message is expected to transition to using an appropriate m.text representation instead. For example, the HTML representation of a mention will now appear under m.text's text/html representation (adding one if required).

A similar condition is applied to body in m.room.message: all existing functionality will instead use the text/plain representation within m.text, if not explicitly called out by this MSC.

Potential issues

It's a bit ugly to not know whether a given key in content will take a string, object, boolean, integer, or array.

It's a bit ugly to not know at a glance if a content block is a mixin or not.

It's a bit ugly that you have to look over the keys of contents to see what blocks are present, but better than duplicating this into an explicit blocks list within the event content (on balance).

We're skipping over defining rules for which fallback combinations to display (i.e. "display hints") for now; these can be added in a future MSC if needed. MSC1225 contains a proposal for this.

Placing content blocks at the top level of content is a bit unfortunate, though mixes nicely thanks to namespacing. Potentially conflicting cases in the wild would be namespaced fields, which would get translated as unrenderable events if the value type doesn't meet the client's known schema.

This MSC does not rewrite or redefine all possible events in the specification: this is deliberately left as an exercise for several future MSCs.

Security considerations

Like today, it's possible to have the different representations of an event not match, thus introducing a potential for malicious payloads (text-only clients seeing something different to HTML-friendly ones). Clients could try to do similarity comparisons, though this is complicated with features like HTML and arbitrary custom markup (markdown, etc) showing up in the plaintext or in tertiary formats on the events. Historically, room moderators have been pretty good about removing these malicious senders from their rooms when other users point out (quite quickly) that the event is appearing funky to them.

Note about spec process

Extensible events as a spec feature requires dozens of different MSCs, with this MSC being the structure definition and text baseline. It is not expected that this MSC will be written into spec once it has passed FCP. Instead, it is expected that all of the "core" extensible events MSCs will pass FCP and extensible events be assigned a stable room version before any spec authoring begins. Thus, this particular MSC should be anticipated to sit in accepted-but-not-merged (stable, not formal spec yet) for a while, and that's okay.

The Spec Core Team (SCT) has decision making power over what is considered core for extensible events, though the recommendation is to ensure replacements for all non-state m.room.* types have accepted (successful FCP) MSCs to replace them.

Unstable prefix

While this MSC is not considered stable by the specification, implementations must use org.matrix.msc1767 as a prefix to denote the unstable functionality. For example, sending an m.message event would mean sending an org.matrix.msc1767.message event instead.

For purposes of testing, implementations can use a dynamically-assigned unstable room version org.matrix.msc1767.<version> to use extensible events within. For example, org.matrix.msc1767.10 for room version 10 or org.matrix.msc1767.org.example.cool_ver for a hypothetical org.example.cool_ver room version. Any events sent in these room versions can use stable identifiers given the entire room version itself is unstable, however senders must take care to ensure stable identifiers do not leak out to other room versions - it may be simpler to not send stable identifiers at all.

Changes from MSC1225

  • converted from googledoc to MD, and to be a single PR rather than split PR/Issue.
  • simplifies it by removing displayhints (for now - deferred to a future MSC).
  • replaces the clunky m.text.1 idea with lists for types which support fallbacks.
  • removes the concept of optional compact form for m.text by instead having m.text always in expanded form.
  • tries to accommodate most of the feedback on GH and Google Docs from MSC1225.

Historical changes

  • Anything that wasn't simple text rendering was broken out to dedicated MSCs in an effort to get the structure approved and adopted while the more complex types get implemented independently.
  • Renamed subtypes/reusable types to just "content blocks".
  • Allow content blocks to be nested.
  • Fix push rules in the most basic sense, deferring to a future MSC on better support.
  • Explicitly make no changes to power levels, deferring to a future MSC on better support.
  • Drop timeline for transition in favour of an explicit room version.
  • Move most push rule changes and such into their own/future MSCs.
  • Move emotes, notices, and encryption out to their own dedicated MSCs.