# MSC3871: Gappy timeline
`/messages` returns a linearized version of the event DAG. From any given
homeserver's perspective of the room, the DAG can have gaps where it is missing
events. This could be because the homeserver hasn't fetched them yet, or because
it tried and failed: the homeservers that have the events are unreachable and no
one else knows about them.

Currently, there is an unwritten rule between the server and client that the
server will always return all contiguous events in that part of the timeline.
But the server has to break this rule when it doesn't have an event and is
unable to get it from anyone else. This MSC aims to change the dynamic so the
server can tell the client where the gaps are.

This way, clients know where they are missing events and can even retry
fetching, perhaps by adding some UI to the timeline like "We failed to get some
messages in this gap, try again."

This can also make servers faster to respond to `/messages`. For example,
Synapse currently always tries to backfill and fill in the gap (even when it
has enough messages locally to respond). In big rooms like `#matrix:matrix.org`
(Matrix HQ), almost every place you ask for has gaps in it (thousands of
backwards extremities) and lots of those events are unreachable. So we try the
same thing over and over, hoping the response will be different this time, and
only succeed in making the `/messages` response slow. With this MSC, we can
instead be more intelligent about backfilling in the background and just tell
the client about the gap, which it can retry fetching a little later.
## Proposal
Add a `gaps` field to the response of [`GET
/_matrix/client/v3/rooms/{roomId}/messages`](https://spec.matrix.org/v1.1/client-server-api/#get_matrixclientv3roomsroomidmessages).
This field is an array of `GapEntry` objects, defined below, indicating where
events are missing in the timeline.
### 200 response
This describes the new `gaps` response field being added to the `200 response`
of `/messages`:
Name | Type | Description | Required
--- | --- | --- | ---
`gaps` | `[GapEntry]` | A list of gaps indicating where events are missing in the `chunk` | no
#### `GapEntry`
key | type | value | description | required
--- | --- | --- | --- | ---
`event_id` | string | Event ID | The event ID indicating the position in the `/messages` `chunk` response | yes
`prev_pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG immediately before the given `event_id`, in `chunk` order. If this field is omitted, there is no gap on this side. | no
`next_pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG immediately after the given `event_id`, in `chunk` order. If this field is omitted, there is no gap on this side. | no
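
As a rough illustration of the table above, the following Python sketch models a `GapEntry` and reads the `gaps` array out of a `/messages` response body. The class layout and the `parse_gaps` helper are hypothetical names chosen for this example; the proposal itself only defines the JSON fields.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class GapEntry:
    """One entry of the `gaps` array (illustrative model, not part of the MSC)."""

    event_id: str  # required: position in the `chunk`
    prev_pagination_token: Optional[str] = None  # omitted means no gap on this side
    next_pagination_token: Optional[str] = None  # omitted means no gap on this side


def parse_gaps(response: dict[str, Any]) -> list[GapEntry]:
    """Read `gaps` from a `/messages` response; the field is optional."""
    entries = []
    for raw in response.get("gaps", []):
        entries.append(
            GapEntry(
                event_id=raw["event_id"],  # KeyError here means a malformed entry
                prev_pagination_token=raw.get("prev_pagination_token"),
                next_pagination_token=raw.get("next_pagination_token"),
            )
        )
    return entries
```

A response without a `gaps` field simply yields an empty list, which matches the field being optional.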
### `/messages` response examples
The following mermaid diagram represents the room DAG snapshot used for the
`/messages` responses below. The slightly transparent events with no background are
events that the homeserver does not have and are in the gap.

Pagination tokens are positions between events. This is already an established
concept, but to illustrate it, see the `tX` pagination tokens in the diagram.
```mermaid
flowchart RL
after[newest events...]:::gap-event -->|t10| fred -->|t9| waldo:::gap-event -->|t8| garply -->|t7| grault:::gap-event -->|t6| corge -->|t5| qux:::gap-event -->|t4| baz -->|t3| bar:::gap-event -->|t2| foo -->|t1| before[oldest events...]:::gap-event
classDef gap-event opacity:0.8,fill:transparent;
```
The idea is to be able to keep paginating from
`prev_pagination_token`/`next_pagination_token` in the respective direction to fill in
the gap.
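
One possible reading of "the respective direction", inferred from the `/messages` examples in this section rather than stated outright, is that `next_pagination_token` continues in the direction of the original request while `prev_pagination_token` goes the opposite way. The sketch below encodes that assumption; `follow_up_requests` is a hypothetical helper name.

```python
def follow_up_requests(gap: dict, request_dir: str) -> list[tuple[str, str]]:
    """Return (from_token, dir) pairs a client could use to fill gaps around an event.

    Assumption: tokens are relative to `chunk` order, so the "next" token keeps
    going in `request_dir` and the "prev" token goes the other way.
    """
    if request_dir not in ("b", "f"):
        raise ValueError("dir must be 'b' or 'f'")
    opposite = "f" if request_dir == "b" else "b"
    requests = []
    if "prev_pagination_token" in gap:
        requests.append((gap["prev_pagination_token"], opposite))
    if "next_pagination_token" in gap:
        requests.append((gap["next_pagination_token"], request_dir))
    return requests
```

With the diagram above, the entry for `$corge` in a `dir=b` response would produce a forward request from `t6` (towards `grault`) and a backward request from `t5` (towards `qux`).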
#### `/messages?dir=b`
`/messages?dir=b` response example with gaps (`chunk` has events in
reverse-chronological order since we're paginating backwards):
`/messages?dir=b&from=t6`
```json5
{
  "chunk": [
    {
      "event_id": "$corge",
      "type": "m.room.message",
      "content": {
        "body": "corge",
      }
    },
    {
      "event_id": "$baz",
      "type": "m.room.message",
      "content": {
        "body": "baz",
      }
    },
    {
      "event_id": "$foo",
      "type": "m.room.message",
      "content": {
        "body": "foo",
      }
    }
  ],
  "gaps": [
    {
      "prev_pagination_token": "t6",
      "event_id": "$corge",
      "next_pagination_token": "t5",
    },
    {
      "prev_pagination_token": "t4",
      "event_id": "$baz",
      "next_pagination_token": "t3",
    },
    {
      "prev_pagination_token": "t2",
      "event_id": "$foo",
      "next_pagination_token": "t1",
    }
  ]
}
```
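
To show how a client might consume a response like the one above, here is a minimal Python sketch that interleaves gap markers with the events in `chunk` order. The `timeline_rows` name and the row format are assumptions made for this example; the proposal does not prescribe how clients render gaps.

```python
def timeline_rows(chunk: list[dict], gaps: list[dict]) -> list[tuple]:
    """Interleave chunk events with gap markers, in chunk order.

    Each row is ("event", event_id) or ("gap", [tokens]). A gap that sits between
    two returned events collects the token from each of its ends.
    """
    by_event = {g["event_id"]: g for g in gaps}
    rows = []
    for event in chunk:
        gap = by_event.get(event["event_id"], {})
        prev_token = gap.get("prev_pagination_token")
        if prev_token is not None:
            if rows and rows[-1][0] == "gap":
                rows[-1][1].append(prev_token)  # same gap, seen from its other end
            else:
                rows.append(("gap", [prev_token]))
        rows.append(("event", event["event_id"]))
        next_token = gap.get("next_pagination_token")
        if next_token is not None:
            rows.append(("gap", [next_token]))
    return rows
```

Applied to the `dir=b` example, this would give one gap marker before `$corge`, one between each pair of events (holding both tokens, such as `t5` and `t4`), and one after `$foo`. A client could attach its "try again" UI to each marker.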
#### `/messages?dir=f`
`/messages?dir=f` response example with gaps (`chunk` has events in
chronological order since we're paginating forwards):
`/messages?dir=f&from=t6`
```json5
{
  "chunk": [
    {
      "event_id": "$garply",
      "type": "m.room.message",
      "content": {
        "body": "garply",
      }
    },
    {
      "event_id": "$fred",
      "type": "m.room.message",
      "content": {
        "body": "fred",
      }
    },
  ],
  "gaps": [
    {
      "prev_pagination_token": "t7",
      "event_id": "$garply",
      "next_pagination_token": "t8",
    },
    {
      "prev_pagination_token": "t9",
      "event_id": "$fred",
      "next_pagination_token": "t10",
    }
  ]
}
```
## Potential issues
Lots of gaps/extremities are generated when a spam attack occurs and federation
falls behind. If clients start showing gaps with retry links, we might just be
exposing the spam more.
## Alternatives
As an alternative, we can continue to do nothing as we do today and not worry
about the occasional missing events. People don't seem to notice missing
messages anyway, but they probably do notice our slow `/messages` pagination.
### Expose `prev_events` to the client
One alternative is including the `prev_events` in the events that the client sees so
they can figure out the DAG chain themselves and see if there is a missing event in the
middle.
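
As a sketch of what "figuring out the DAG chain themselves" could look like on the client, the Python below reports which `prev_events` references point at events the client has not seen. It assumes each event dict carries an `event_id` and a `prev_events` list of event ID strings; `missing_prev_events` is a hypothetical helper name.

```python
def missing_prev_events(events: list[dict]) -> dict[str, list[str]]:
    """Map each event ID to the `prev_events` it references that are not in `events`."""
    known = {e["event_id"] for e in events}
    missing = {}
    for e in events:
        unknown = [p for p in e.get("prev_events", []) if p not in known]
        if unknown:
            missing[e["event_id"]] = unknown
    return missing
```

Note that the oldest event in any page will reference something outside that page, so a client would need to run this against everything it has stored for the room, not a single `chunk`, to tell a real gap from a page boundary.
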
There is an [unspecced `/messages?raw=true` query parameter in
Synapse](https://github.com/matrix-org/synapse/blob/20c76cecb9eb84dadfa7b2d25b436d3ab9218a1a/synapse/rest/client/room.py#L653)
that returns the full raw event as seen over federation which means it will include the
`prev_events`.
You can also specify `event_format: federation` directly in the JSON `filter` parameter
of `/messages`:

`/_matrix/client/v3/rooms/{roomId}/messages?dir=b&filter=%7B%22event_format%22%3A%20%22federation%22%7D`
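
For reference, the percent-encoded `filter` value above is just the JSON object `{"event_format": "federation"}`. A minimal Python sketch of building such a request path follows; `messages_url` is a hypothetical helper, and only standard-library calls are used.

```python
import json
from urllib.parse import quote


def messages_url(room_id: str, direction: str = "b") -> str:
    """Build a `/messages` path asking for federation-format events."""
    filter_json = json.dumps({"event_format": "federation"})
    return (
        f"/_matrix/client/v3/rooms/{quote(room_id, safe='')}/messages"
        f"?dir={direction}&filter={quote(filter_json, safe='')}"
    )
```

Passing `safe=''` makes `quote` escape characters such as `:` and `!` in the room ID as well as the braces and quotes in the filter.
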
Related to:
- https://github.com/matrix-org/matrix-spec/issues/859
- https://github.com/matrix-org/matrix-spec/issues/1047
### Synthetic `m.timeline.gap` event alternative
Another alternative is using synthetic events (things that look like events but
have no `event_id`) which the server inserts alongside other events in the
`chunk` to indicate where the gap is. But this has detractors since it's harder
to implement in strongly typed SDKs and makes it easy for a client to naively
display every "event" in the `chunk`.
`/messages` response example with a gap:
```json
{
  "chunk": [
    {
      "type": "m.room.message",
      "content": {
        "body": "foo"
      }
    },
    {
      "type": "m.timeline.gap",
      "content": {
        "gap_start_event_id": "$12345",
        "pagination_token": "t47409-4357353_219380_26003_2265"
      }
    },
    {
      "type": "m.room.message",
      "content": {
        "body": "baz"
      }
    }
  ]
}
```
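
To make the "naively display every event" concern concrete, this Python sketch shows the extra step a client would need under this alternative: separating synthetic gap markers from real events before rendering. `split_chunk` is a hypothetical helper; checking for a missing `event_id` in addition to the type is an assumption, made so that a real event which happens to use the `m.timeline.gap` type is not mistaken for a marker.

```python
def split_chunk(chunk: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate real events from synthetic `m.timeline.gap` markers."""
    events, gap_markers = [], []
    for item in chunk:
        # Synthetic markers are described as having no `event_id`.
        if item.get("type") == "m.timeline.gap" and "event_id" not in item:
            gap_markers.append(item)
        else:
            events.append(item)
    return events, gap_markers
```

A client that skips this step would pass the marker straight to its event renderer, which is the failure mode the paragraph above describes.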
### `GapEntry` alternative only indicating a gap `next_to_event_id` (only one side)
Same concept as the existing `GapEntry` proposal, but we only indicate the gap on one
side of an event (`next_to_event_id`), according to the direction that `/messages` is
already going.

The problem with this alternative is that clients store events differently and it's
valid to want to paginate in either direction from a given event. This alternative works
fine in the Element Web case, where you always paginate backwards in the scrollback and
store events as a whole timeline list. But another client like the [Trixnity
SDK](https://github.com/benkuly/trixnity) stores events individually in a linked list,
where each event could have a gap before and after it and a gap could be hundreds or
thousands of events wide. There, it would be useful to paginate from both ends to fill
the gap faster.
<details>
<summary>
Details for the <code>GapEntry</code> alternative only indicating a gap <code>next_to_event_id</code>
</summary>
#### `GapEntry`
key | type | value | description | required
--- | --- | --- | --- | ---
`next_to_event_id` | string | Event ID | The event ID indicating the position in the `/messages` `"chunk"` response where the gap starts after that position. This field can be `null` or completely omitted to indicate that the gap is at the start of the `/messages` `"chunk"` | no
`pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG to be able to continue paginating in the same direction as the request and fill in the gap from `next_to_event_id` to the next known event. | yes
### `/messages` response examples
The following mermaid diagram represents the room DAG snapshot used for the
`/messages` responses below. The slightly transparent events with no background are
events that the homeserver does not have and are in the gap.

Pagination tokens are positions between events. This is already an established
concept, but to illustrate it, see the `tX` pagination tokens in the diagram.
```mermaid
flowchart RL
after[newest events...]:::gap-event -->|t10| fred -->|t9| waldo:::gap-event -->|t8| garply -->|t7| grault:::gap-event -->|t6| corge -->|t5| qux:::gap-event -->|t4| baz -->|t3| bar:::gap-event -->|t2| foo -->|t1| before[oldest events...]:::gap-event
classDef gap-event opacity:0.8,fill:transparent;
```
The idea is to be able to keep paginating from `pagination_token` in the same
direction of the request to fill in the gap.
#### `/messages?dir=b`
`/messages?dir=b` response example with gaps (`chunk` has events in
reverse-chronological order since we're paginating backwards):
`/messages?dir=b&from=t6`
```json5
{
  "chunk": [
    // there is no gap from `t6` to `$corge` as expected
    {
      "event_id": "$corge",
      "type": "m.room.message",
      "content": {
        "body": "corge",
      }
    },
    // <the first `GapEntry` indicates a gap here>
    {
      "event_id": "$baz",
      "type": "m.room.message",
      "content": {
        "body": "baz",
      }
    },
    // <the second `GapEntry` indicates a gap here>
    {
      "event_id": "$foo",
      "type": "m.room.message",
      "content": {
        "body": "foo",
      }
    }
    // <the third `GapEntry` indicates a gap here>
  ],
  "gaps": [
    {
      "next_to_event_id": "$corge",
      "pagination_token": "t5",
    },
    {
      "next_to_event_id": "$baz",
      "pagination_token": "t3",
    },
    {
      "next_to_event_id": "$foo",
      "pagination_token": "t1",
    }
  ]
}
```
#### `/messages?dir=f`
`/messages?dir=f` response example with gaps (`chunk` has events in
chronological order since we're paginating forwards):
`/messages?dir=f&from=t6`
```json5
{
  "chunk": [
    // <the first `GapEntry` indicates a gap here>
    {
      "event_id": "$garply",
      "type": "m.room.message",
      "content": {
        "body": "garply",
      }
    },
    // <the second `GapEntry` indicates a gap here>
    {
      "event_id": "$fred",
      "type": "m.room.message",
      "content": {
        "body": "fred",
      }
    },
    // <the third `GapEntry` indicates a gap here>
  ],
  "gaps": [
    {
      "next_to_event_id": null,
      "pagination_token": "t6",
    },
    {
      "next_to_event_id": "$garply",
      "pagination_token": "t8",
    },
    {
      "next_to_event_id": "$fred",
      "pagination_token": "t10",
    }
  ]
}
```
</details>
## Future considerations
In the future, we should consider adding the same `gaps` field to `/context` because
it's another endpoint that returns a linearized version of the DAG.
It could make sense to roll this into this MSC, but it might make the proposal less
clear if we have to bulk it up by specifying the same details for `/context`. Leaving it
to a follow-up MSC for now.
## Security considerations
Only your own homeserver controls whether a gap is added to the `/messages`
response, so there shouldn't be any weird edge case where someone else can
control whether you fetch something.
## Unstable prefix
While this feature is in development, the `gaps` field can be used as
`org.matrix.msc3871.gaps`.
### While the MSC is unstable
During this period, to detect server support, clients should check for the
presence of the `org.matrix.msc3871` flag in `unstable_features` on `/versions`.
Clients are also required to use the unstable prefixes (see [unstable
prefix](#unstable-prefix)) during this time.
### Once the MSC is merged but not in a spec version
Once this MSC is merged, but is not yet part of the spec, clients should rely on
the presence of the `org.matrix.msc3871.stable` flag in `unstable_features` to
determine server support. If the flag is present, clients are required to use
stable prefixes (see [unstable prefix](#unstable-prefix)).
### Once the MSC is in a spec version
Once this MSC becomes a part of a spec version, clients should rely on the
presence of a spec version that supports the MSC in `versions` on `/versions`
to determine support. Servers are encouraged to keep the
`org.matrix.msc3871.stable` flag around for a reasonable amount of time
to help smooth over the transition for clients. "Reasonable" is intentionally
left as an implementation detail; however, the MSC process currently recommends
*at most* 2 months from the date of spec release.
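
The three phases above can be summarized as a single client-side check. The Python sketch below is one way to do it, under two assumptions: a flag counts as "present" when its value in `unstable_features` is truthy, and the set of spec versions that include this MSC is supplied by the caller (no such version exists yet, so `spec_versions_with_gaps` is a placeholder). `gaps_field_name` is a hypothetical helper name.

```python
from typing import Optional


def gaps_field_name(
    versions_response: dict,
    spec_versions_with_gaps: frozenset = frozenset(),
) -> Optional[str]:
    """Pick the response field to read, or None if the server lacks support."""
    advertised = set(versions_response.get("versions", []))
    if spec_versions_with_gaps & advertised:
        return "gaps"  # MSC is part of a spec version the server advertises
    unstable = versions_response.get("unstable_features", {})
    if unstable.get("org.matrix.msc3871.stable"):
        return "gaps"  # merged but not yet in a released spec version
    if unstable.get("org.matrix.msc3871"):
        return "org.matrix.msc3871.gaps"  # still in development
    return None
```

Checking in this order means a server that keeps the `.stable` flag around during the transition period is still handled correctly.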