matrix-doc/proposals/1659-event-id-as-hashes.md

131 lines
5.0 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Changing Event IDs to be Hashes
## Motivation
Having event IDs separate from the hashes leads to issues when a server receives
multiple events with the same event ID but different reference hashes. While
APIs could be changed to better support dealing with this situation, it is
easier and nicer to simply drop the idea of a separate event ID entirely, and
instead use the reference hash of an event as its ID.
## Identifier Format
Currently hashes in our event format include the hash name, allowing servers to
choose which hash functions to use. The idea here was to allow a gradual change
between hash functions without the need to globally coordinate shifting from one
hash function to another.
However now that room versions exist, changing hash functions can be achieved by
bumping the room version. Using this method would allow using a simple string as
the event ID rather than a full structure, significantly easing their usage.
One side effect of this would be that there would be no indication about which
hash function was actually used, and it would need to be inferred from the room
version. To aid debuggability it may be worth encoding the hash function into
the ID format.
**Conclusion:** Don't encode the hash function, since the hash will depend on
the version specific redaction algorithm anyway.
The proposal is therefore that the event IDs are a sha256 hash, encoded using
[unpadded
Base64](https://matrix.org/docs/spec/appendices.html#unpadded-base64), and
prefixed with `$` (to aid distinguishing different types of identifiers). For
example, an event ID might be: `$CD66HAED5npg6074c6pDtLKalHjVfYb2q4Q3LZgrW6o`.
The hash is calculated in the same way as previous event reference hashes were,
which is:
1. Redact the event
2. Remove `signatures` field from the event
3. Serialize the event to canonical JSON
4. Compute the hash of the JSON bytes
Event IDs will no longer be included as part of the event, and so must be
calculated by servers receiving the event.
## Changes to Event Formats
As well as changing the format of event IDs, we also change the format of the
`auth_events` and `prev_events` keys in events to simply be lists of event IDs
(rather than being lists of tuples).
A full event would therefore look something like (note that this is just an
illustrative example, and that the hashes are not correct):
```json
{
"auth_events": [
"$5hdALbO+xIhzcLTxCkspx5uqry9wO8322h/OI9ApnHE",
"$Ga0DBIICBsWIZbN292ATv8fTHIGGimwjb++w+zcHLRo",
"$zc4ip/DpPI9FZVLM1wN9RLqN19vuVBURmIqAohZ1HXg",
],
"content": {
"body": "Here is the message content",
"msgtype": "m.message"
},
"depth": 6,
"hashes": {
"sha256": "M6/LmcMMJKc1AZnNHsuzmf0PfwladVGK2Xbz+sUTN9k"
},
"origin": "localhost:8800",
"origin_server_ts": 1548094046693,
"prev_events": [
"$MoOzCuB/sacqHAvgBNOLICiGLZqGT4zB16MSFOuiO0s",
],
"room_id": "!eBrhCHJWOgqrOizwwW:localhost:8800",
"sender": "@anon-20190121_180719-33:localhost:8800",
"signatures": {
"localhost:8800": {
"ed25519:a_iIHH": "N7hwZjvHyH6r811ebZ4wwLzofKhJuIAtrQzaD3NZbf4WQNijXl5Z2BNB047aWIQCS1JyFOQKPVom4et0q9UOAA"
}
},
"type": "m.room.message"
}
```
## Changes to existing APIs
All APIs that accept event IDs must accept event IDs in the new format.
For S2S API, whenever a server needs to parse an event from a request or
response they must either already know the room version *or* be told the room
version in the request/response. There are separate MSCs to update APIs where
necessary.
For C2S API, the only change clients will see is that the event IDs have changed
format. Clients should already be treating event IDs as opaque strings, so no
changes should be required. Servers must add the `event_id` when sending the
event to clients, however.
Note that the `auth_events` and `prev_events` fields aren't sent to clients, and
so the changes proposed above won't affect clients.
## Protocol Changes
The `auth_events` and `prev_events` fields on an event need to be changed from a
list of tuples to a list of strings, i.e. remove the old event ID and simply
have the list of hashes.
The auth rules also need to change:
- The event no longer needs to be signed by the domain of the event ID (but
still needs to be signed by the senders domain)
- We currently allow redactions if the domain of the redaction event ID
matches the domain of the event ID it is redacting; which allows self
redaction. This check is removed and redaction events are always accepted.
Instead, the redaction event only takes effect and is sent down to clients
if/when the original event is received, and the domain of the events'
senders match. (While this is clearly suboptimal, it is the only practical
suggestion)
## Room Version
There will be a new room version v3 that is the same as v2 except uses the new
event format proposed above. v3 will be marked as 'stable' as defined in [MSC1804](https://github.com/matrix-org/matrix-doc/blob/travis/msc/room-version-client-advertising/proposals/1804-advertising-capable-room-versions.md)