MSC2675: Serverside aggregations of message relationships

It's common to want to send events in Matrix which relate to existing events - for instance, reactions, edits and even replies/threads.

Clients typically need to track the related events alongside the original event they relate to, in order to correctly display them. For instance, reaction events need to be aggregated together by summing and be shown next to the event they react to; edits need to be aggregated together by replacing the original event and subsequent edits, etc.

It is possible to treat relations as normal events and aggregate them clientside, but to do so comprehensively could be very resource intensive, as the client would need to spider all possible events in a room to find relationships and maintain a correct view.

Instead, this proposal seeks to solve this problem by defining APIs to let the server calculate the aggregations on behalf of the client, and so bundle the aggregated data with the original event where appropriate. It also proposes an API to let clients paginate through all relations of an event.

This proposal is one in a series of proposals that defines a mechanism for events to relate to each other. Together, these proposals replace MSC1849.

  • MSC2674 defines a standard shape for indicating events which relate to other events.
  • This proposal defines APIs to let the server calculate the aggregations on behalf of the client, and so bundle the aggregated data with the original event where appropriate.
  • MSC2676 defines how users can edit messages using this mechanism.
  • MSC2677 defines how users can annotate events, such as reacting to events with emoji, using this mechanism.

Proposal

Aggregations

Relation events can be aggregated per relation type by the server. The format of the aggregated value (hereafter called "aggregation") depends on the relation type.

Some relation types might group the aggregations by the key property in the relation and aggregate to an array, while others might aggregate to a single object or to any other value.

Here are some non-normative examples of what aggregations can look like:

Example aggregation for m.thread (which aggregates all relations into a single object):

{
  "latest_event": {
    "content": { ... },
    ...
  },
  "count": 7,
  "current_user_participated": true
}

Example aggregation for m.annotation (which aggregates relations into a list of objects, grouped by key):

[
  {
      "key": "👍",
      "origin_server_ts": 1562763768320,
      "count": 3
  },
  {
      "key": "👎",
      "origin_server_ts": 1562763768320,
      "count": 2
  }
]
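As an illustration of grouping by key, the following is a minimal sketch of how a list like the one above could be computed from discrete relation events. The function name `aggregate_annotations` is hypothetical, and the choice of the earliest `origin_server_ts` per key is an assumption made for this sketch; the normative algorithm for each relation type is left to the MSC that defines it.

```python
def aggregate_annotations(relation_events):
    """Group m.annotation relations by key and count them.

    Assumption: each group reports the earliest origin_server_ts seen
    for its key. This is illustrative, not normative.
    """
    groups = {}  # key -> aggregation entry, in first-seen order
    for event in relation_events:
        relates_to = event.get("content", {}).get("m.relates_to", {})
        if relates_to.get("rel_type") != "m.annotation":
            continue  # other relation types aggregate differently
        key = relates_to.get("key")
        if key is None:
            continue  # nothing to group on
        ts = event["origin_server_ts"]
        group = groups.setdefault(
            key, {"key": key, "origin_server_ts": ts, "count": 0}
        )
        group["origin_server_ts"] = min(group["origin_server_ts"], ts)
        group["count"] += 1
    return list(groups.values())
```

Relations of any other type are skipped, so the same input list can be passed to a separate aggregator per relation type.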

Bundling

Other than during non-gappy incremental syncs, timeline events that have other events relate to them should include the aggregation of those related events in the m.relations property of their unsigned data. This process is referred to as "bundling", and the aggregated relations included via this mechanism are called "bundled aggregations".

By sending a summary of the relations, bundling avoids having to send many individual relation events to the client.

Aggregations are never bundled into state events. This is a current implementation detail that could be revisited later, rather than a specific design decision.

The following client-server APIs should bundle aggregations with events they return:

  • GET /rooms/{roomId}/messages
  • GET /rooms/{roomId}/context/{eventId}
  • GET /rooms/{roomId}/event/{eventId}
  • GET /sync, only for room sections in the response where the limited field is true; this amounts to all rooms in the response if the since request parameter was not passed, also known as an initial sync.
  • GET /relations, as proposed in this MSC.

Deprecated APIs like /initialSync and /events/{eventId} are not required to bundle aggregations.

The bundled aggregations are grouped according to their relation type. The format of m.relations (here with non-normative examples of the m.replace and m.annotation relation types) is as follows:

{
  "event_id": "abc",
  "unsigned": {
    "m.relations": {
      "m.annotation": {
        "key": "👍",
        "origin_server_ts": 1562763768320,
        "count": 3
      },
      "m.replace": {
        "event_id": "$edit_event_id",
        "origin_server_ts": 1562763768320,
        "sender": "@alice:localhost"
      }
    }
  }
}

Client-side aggregation

Bundled aggregations on an event give a snapshot of the relations that were known at the time the event was received. When relations are received through /sync, clients should locally aggregate them (as they might have done already before supporting this MSC) on top of any bundled aggregation the server previously sent along with the target event, to get an up-to-date view of the aggregations for the target event. The aggregation algorithm is the same as the one described here for the server.
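A minimal sketch of folding one relation received through /sync into a previously bundled aggregation might look as follows. It assumes the list-of-groups shape shown for m.annotation in the Aggregations section; the function name `apply_annotation` is hypothetical, and de-duplication of relations the server already counted is deliberately left out.

```python
import copy


def apply_annotation(target_event, relation_event):
    """Return a copy of target_event with one m.annotation folded in.

    Assumption: unsigned["m.relations"]["m.annotation"] is a list of
    {"key", "origin_server_ts", "count"} groups.
    """
    updated = copy.deepcopy(target_event)  # leave the caller's event intact
    relations = updated.setdefault("unsigned", {}).setdefault("m.relations", {})
    groups = relations.setdefault("m.annotation", [])
    relates_to = relation_event["content"]["m.relates_to"]
    for group in groups:
        if group["key"] == relates_to["key"]:
            group["count"] += 1
            break
    else:
        # First relation seen for this key: start a new group.
        groups.append({
            "key": relates_to["key"],
            "origin_server_ts": relation_event["origin_server_ts"],
            "count": 1,
        })
    return updated
```

A real client would also need to remember which relation event IDs it has already aggregated, so that a relation is not counted twice.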

Querying relations

A single event can have lots of associated relations, and we do not want to overload the client by, for example, including them all bundled with the related-to event. Instead, we also provide a new /relations API to paginate over the relations. It behaves in a similar way to /messages, except that it uses the next_batch and prev_batch names (in line with the /sync API). Tokens from /sync or /messages can be passed to /relations to only get relating events from a section of the timeline.

The /relations API returns the discrete relation events associated with an event that the server is aware of, in standard topological order. Note that events may be missing; see Limitations. You can optionally filter by a given relation type and by the event type of the relating event:

GET /_matrix/client/v1/rooms/{roomID}/relations/{event_id}[/{rel_type}[/{event_type}]][?from=token][&to=token][&limit=amount]
{
  "chunk": [
    {
      "type": "m.reaction",
      "sender": "...",
      "content": {
        "m.relates_to": {
          "rel_type": "m.annotation",
          ...
        }
      }
    }
  ],
  "prev_batch": "some_token",
  "next_batch": "some_token"
}

The endpoint does not have any trailing slashes. It requires authentication and is not rate-limited.

The from and limit query parameters are used for pagination, and work as described for the /messages endpoint.
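The pagination described above can be sketched as a small client-side loop. Everything here is illustrative: `relations_path` and `iter_relations` are hypothetical helper names, `fetch_json` stands for whatever authenticated HTTP GET function the client already has, and treating a missing next_batch as the end of the results is an assumption of this sketch.

```python
from urllib.parse import quote, urlencode


def relations_path(room_id, event_id, rel_type=None, event_type=None,
                   from_token=None, limit=None):
    """Build the request path and query string for GET /relations."""
    if event_type is not None and rel_type is None:
        raise ValueError("event_type can only follow a rel_type in the path")
    parts = ["/_matrix/client/v1/rooms", quote(room_id, safe=""),
             "relations", quote(event_id, safe="")]
    if rel_type is not None:
        parts.append(quote(rel_type, safe=""))
    if event_type is not None:
        parts.append(quote(event_type, safe=""))
    query = {}
    if from_token is not None:
        query["from"] = from_token
    if limit is not None:
        query["limit"] = limit
    path = "/".join(parts)
    return path + "?" + urlencode(query) if query else path


def iter_relations(fetch_json, room_id, event_id, rel_type=None,
                   event_type=None, limit=None):
    """Yield relation events page by page.

    fetch_json is a caller-supplied function (hypothetical) that takes a
    path and returns the decoded JSON response body.
    """
    token = None
    while True:
        page = fetch_json(relations_path(
            room_id, event_id, rel_type, event_type, token, limit))
        yield from page.get("chunk", [])
        token = page.get("next_batch")
        if not token:  # assumption: no token means no further pages
            break
```

Because the HTTP layer is passed in as a function, the loop can be exercised against canned responses without a homeserver.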

Note that MSC2676 adds the related-to event in the original_event property of the response. This way the full history of the event (including the first, original event) is obtained without further requests. See that MSC for further details.

End to end encryption

Since the server has to be able to aggregate relation events, structural information about relations must be visible to the server, and so the m.relates_to field must be included in the plaintext.
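To make the plaintext requirement concrete, here is a rough sketch of how a client could keep m.relates_to outside the ciphertext when building the content of an encrypted event. The `encrypt` callable is a hypothetical stand-in for the client's real encryption routine, and removing m.relates_to from the encrypted payload (rather than duplicating it) is an assumption of this sketch.

```python
def wrap_encrypted(plaintext_content, encrypt):
    """Build encrypted event content with m.relates_to left in cleartext.

    encrypt is a hypothetical callable that takes the payload to protect
    and returns a dict of encrypted content fields (for example an
    algorithm name and a ciphertext).
    """
    payload = dict(plaintext_content)  # shallow copy; input is not mutated
    relates_to = payload.pop("m.relates_to", None)
    wrapped = dict(encrypt(payload))
    if relates_to is not None:
        # Structural relation data stays visible so the server can aggregate.
        wrapped["m.relates_to"] = relates_to
    return wrapped
```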

A future MSC may define a method for encrypting certain parts of the m.relates_to field that may contain sensitive information.

Redactions

Redacted relations should not be taken into consideration in bundled aggregations, nor should they be returned from /relations.

Requesting /relations on a redacted event should still return any existing relation events. This is in line with other APIs like /context and /messages.

Local echo

For the best possible user experience, clients should also include unsent relations in the client-side aggregation. When adding a relation to the send queue, clients should locally aggregate it into the relations of the target event, ideally regardless of whether the target event has already received an event_id or is still pending. If the client gives up on sending the relation for some reason, the relation should be de-aggregated from the relations of the target event. If the client offers the user a possibility of manually retrying to send the relation, it should be re-aggregated when the user does so.

De-aggregating a relation refers to rerunning the aggregation for a given target event while not considering the de-aggregated event any more.

Upon receiving the remote echo for any relations, a client is likely to remove the pending event from the send queue. Here, it should also de-aggregate the pending event from the target event's relations, and re-aggregate the received remote event from /sync to make sure the client-side aggregation happens with the same event data as on the server.

When adding a redaction for a relation to the send queue, the relation referred to should be de-aggregated from the relations of the target of the relation. Similar to a relation, when the sending of the redaction fails or is cancelled, the relation should be aggregated again.

If the target event is still pending and hasn't received its event_id yet, clients can locally relate relation events to their target by using transaction_id like they already do for detecting remote echoes when sending events.
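The local echo rules in this section can be sketched with a small per-target store that reruns the aggregation over whatever relations are currently tracked. The class `LocalAnnotations` and its method names are hypothetical, and for brevity the sketch only counts m.annotation relations from discrete events; it does not merge in server-side bundled counts.

```python
class LocalAnnotations:
    """Tracks annotation relations for one target event, sent or unsent."""

    def __init__(self):
        # Maps a local identifier (transaction ID while pending, event ID
        # once echoed back) to the relation event.
        self._relations = {}

    def aggregate(self, relation_id, relation_event):
        self._relations[relation_id] = relation_event

    def deaggregate(self, relation_id):
        # De-aggregating means the next counts() call simply no longer
        # considers this relation.
        self._relations.pop(relation_id, None)

    def on_remote_echo(self, txn_id, event_id, remote_event):
        # Swap the pending copy for the event data received from /sync.
        self.deaggregate(txn_id)
        self.aggregate(event_id, remote_event)

    def counts(self):
        result = {}
        for relation in self._relations.values():
            key = relation["content"]["m.relates_to"]["key"]
            result[key] = result.get(key, 0) + 1
        return result
```

Queuing a redaction of a relation would call `deaggregate` with that relation's ID, and a failed or cancelled redaction would call `aggregate` again with the same relation.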

Edge cases

How do you handle ignored users?

  • Information about relations sent from ignored users must never be sent to the client, either in aggregations or discrete relation events. This is to let you block someone from harassing you with emoji reactions (or using edits as a side-channel to harass you). Therefore, it is possible that different users will see different aggregations (a different last edit, or a different reaction count) on an event.
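One way to picture this is as a per-user filter applied before aggregation, so that both the discrete relations and any counts derived from them exclude ignored senders. The function name `visible_relations` is hypothetical and the sketch says nothing about how a server stores or looks up ignore lists.

```python
def visible_relations(relation_events, ignored_users):
    """Drop relations sent by users the requesting user has ignored."""
    ignored = set(ignored_users)
    return [event for event in relation_events
            if event.get("sender") not in ignored]
```

Because the filter depends on the requesting user, the same target event can yield different aggregations for different users, as noted above.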

Limitations

Relations can be missed while not being in the room

Relation events behave no differently from other events in terms of room history visibility, which means that some relations might not be visible to a user while they are not invited or have not joined the room. This can cause a user to see an incomplete edit history or reaction count based on discrete relation events upon (re)joining a room.

Ideally the server would not include these events in aggregations, as it would mean breaking the room history visibility rules, but this MSC defers addressing this limitation and specifying the exact server behaviour to MSC3570.

Servers might not be aware of all relations of an event

The response of /relations might be incomplete because the homeserver potentially doesn't have the full DAG of the room. The federation API doesn't have an equivalent of the /relations API, so a homeserver has no option but to fetch the full DAG over federation to assure itself that it is aware of all relations.

MSC2836 provided a proposal for following relationships over federation in the "Querying relationships over federation" section via a /_matrix/federation/v1/event_relationships API.

Event type based aggregation and filtering won't work well in encrypted rooms

The /relations endpoint allows filtering by event type, which for encrypted rooms will be m.room.encrypted, rendering this filtering less useful for encrypted rooms. Aggregation algorithms that take the type of the relating events they aggregate into account will suffer from the same limitation.

Future extensions

Handling limited (gappy) syncs

For the special case of a gappy incremental sync, many relations (particularly reactions) may have occurred during the gap. It would be inefficient to send each one individually to the client, but it would also be inefficient to send all possible bundled aggregations to the client.

The server could tell the client the event IDs of events which predate the gap which received relations during the gap. This means that the client could invalidate its copy of those events (if any) and then requery them (including their bundled relations) from the server if/when needed, for example using an extension of the /event API for batch requests.

The server could do this with a new stale_events field of each room object in the sync response. The stale_events field would list all the event IDs prior to the gap which had updated relations during the gap. The event IDs would be grouped by relation type, and paginated as per the normal Matrix pagination model.

This was originally part of this MSC but left out to limit the scope to what is implemented at the time of writing.

Prefix

While this MSC is not considered stable, the endpoint becomes:

  • GET /_matrix/client/unstable/rooms/{roomID}/relations/{eventID}[/{relationType}[/{eventType}]]

None of the newly introduced identifiers should use a prefix though, as this MSC tries to document relation support already being used in the wider Matrix ecosystem.