matrix-doc/proposals/1659-event-id-as-hashes.md

Changing Event IDs to be Hashes

Motivation

Having event IDs separate from the hashes leads to issues when a server receives multiple events with the same event ID but different reference hashes. While APIs could be changed to better support dealing with this situation, it is easier and nicer to simply drop the idea of a separate event ID entirely, and instead use the reference hash of an event as its ID.

Identifier Format

Currently hashes in our event format include the hash name, allowing servers to choose which hash functions to use. The idea here was to allow a gradual change between hash functions without the need to globally coordinate shifting from one hash function to another.

However, now that room versions exist, changing hash functions can be achieved by bumping the room version. Using this method would allow event IDs to be simple strings rather than full structures, significantly easing their usage.

One side effect of this would be that there would be no indication about which hash function was actually used, and it would need to be inferred from the room version. To aid debuggability it may be worth encoding the hash function into the ID format.

Conclusion: Don't encode the hash function, since the hash will depend on the version specific redaction algorithm anyway.

The proposal is therefore that the event IDs are a sha256 hash, encoded using unpadded Base64, and prefixed with $ (to aid distinguishing different types of identifiers). For example, an event ID might be: $CD66HAED5npg6074c6pDtLKalHjVfYb2q4Q3LZgrW6o.
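As a rough illustration of the format described above, the following sketch checks whether a string has the shape of a new-style event ID. The function name `looks_like_hash_event_id` is hypothetical, and the sketch assumes the standard Base64 alphabet (including `+` and `/`), which matches the examples in this proposal but is not stated explicitly. A sha256 digest is 32 bytes, which encodes to 43 unpadded Base64 characters.

```python
import re

# Assumption: standard Base64 alphabet, 43 characters for a 32-byte sha256 digest.
NEW_EVENT_ID = re.compile(r"\$[A-Za-z0-9+/]{43}")


def looks_like_hash_event_id(event_id):
    """Return True if event_id is '$' followed by 43 unpadded Base64 characters."""
    return NEW_EVENT_ID.fullmatch(event_id) is not None
```

A check like this is only a debugging aid; since clients should treat event IDs as opaque strings, it is not something they would need.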

The hash is calculated in the same way as previous event reference hashes were, which is:

  1. Redact the event
  2. Remove the signatures field from the event
  3. Serialize the event to canonical JSON
  4. Compute the hash of the JSON bytes
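The four steps above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `simple_redact`, `KEPT_KEYS`, `canonical_json`, and `compute_event_id` are hypothetical names, the redaction shown is a deliberately simplified stand-in for the version-specific redaction algorithm, and `json.dumps` with sorted keys and compact separators is only an approximation of canonical JSON.

```python
import base64
import copy
import hashlib
import json

# Assumption: a simplified allow-list of top-level keys that survive redaction.
# The real set is defined by the room version's redaction algorithm.
KEPT_KEYS = {
    "auth_events", "content", "depth", "hashes", "origin",
    "origin_server_ts", "prev_events", "room_id", "sender",
    "signatures", "state_key", "type",
}


def simple_redact(event):
    """Hypothetical redaction: keep allow-listed keys and empty the content."""
    redacted = {k: copy.deepcopy(v) for k, v in event.items() if k in KEPT_KEYS}
    redacted["content"] = {}
    return redacted


def canonical_json(value):
    """Approximate canonical JSON: sorted keys, no whitespace, UTF-8 bytes."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def compute_event_id(event, redact=simple_redact):
    redacted = redact(event)                  # 1. redact the event
    redacted.pop("signatures", None)          # 2. remove the signatures field
    serialized = canonical_json(redacted)     # 3. serialize to canonical JSON
    digest = hashlib.sha256(serialized).digest()  # 4. hash the JSON bytes
    # Encode as unpadded Base64 and add the '$' prefix.
    return "$" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Because the signatures are removed before hashing, adding a signature to an event does not change its ID, whereas changing any field that survives redaction does.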

Event IDs will no longer be included as part of the event, and so must be calculated by servers receiving the event.

Changes to Event Formats

As well as changing the format of event IDs, we also change the format of the auth_events and prev_events keys in events to simply be lists of event IDs (rather than being lists of tuples).

A full event would therefore look something like (note that this is just an illustrative example, and that the hashes are not correct):

{
  "auth_events": [
    "$5hdALbO+xIhzcLTxCkspx5uqry9wO8322h/OI9ApnHE",
    "$Ga0DBIICBsWIZbN292ATv8fTHIGGimwjb++w+zcHLRo",
    "$zc4ip/DpPI9FZVLM1wN9RLqN19vuVBURmIqAohZ1HXg"
  ],
  "content": {
    "body": "Here is the message content",
    "msgtype": "m.message"
  },
  "depth": 6,
  "hashes": {
    "sha256": "M6/LmcMMJKc1AZnNHsuzmf0PfwladVGK2Xbz+sUTN9k"
  },
  "origin": "localhost:8800",
  "origin_server_ts": 1548094046693,
  "prev_events": [
    "$MoOzCuB/sacqHAvgBNOLICiGLZqGT4zB16MSFOuiO0s"
  ],
  "room_id": "!eBrhCHJWOgqrOizwwW:localhost:8800",
  "sender": "@anon-20190121_180719-33:localhost:8800",
  "signatures": {
    "localhost:8800": {
      "ed25519:a_iIHH": "N7hwZjvHyH6r811ebZ4wwLzofKhJuIAtrQzaD3NZbf4WQNijXl5Z2BNB047aWIQCS1JyFOQKPVom4et0q9UOAA"
    }
  },
  "type": "m.room.message"
}

Changes to existing APIs

All APIs that accept event IDs must accept event IDs in the new format.

For the S2S API, whenever a server needs to parse an event from a request or response, it must either already know the room version or be told the room version in the request/response. There are separate MSCs to update APIs where necessary.
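To illustrate why the room version must be known before an event can be parsed, here is a small sketch of how a server might pick the event ID depending on the room version. The names `LEGACY_ROOM_VERSIONS` and `event_id_for` are hypothetical, the hashing function is passed in as a parameter rather than defined here, and treating every version other than "1" and "2" as using the new format is a simplifying assumption.

```python
# Assumption: room versions "1" and "2" carry an explicit event_id field;
# every other version is treated as using hash-based event IDs.
LEGACY_ROOM_VERSIONS = {"1", "2"}


def event_id_for(event, room_version, compute_event_id):
    """Return the event ID, reading or computing it based on the room version."""
    if room_version in LEGACY_ROOM_VERSIONS:
        # Old format: the ID is part of the event itself.
        return event["event_id"]
    # New format: the ID is not in the event and must be calculated.
    return compute_event_id(event)
```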

For the C2S API, the only change clients will see is that the event IDs have changed format. Clients should already be treating event IDs as opaque strings, so no changes should be required. Servers must, however, add the event_id when sending the event to clients.

Note that the auth_events and prev_events fields aren't sent to clients, and so the changes proposed above won't affect clients.
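A server-side sketch of this behaviour might look like the following. The function name `format_for_client` is hypothetical, and a real server would likely strip or add other fields as well; only the two points made above are shown.

```python
def format_for_client(event, event_id):
    """Build the client-facing copy of an event (illustrative only)."""
    # auth_events and prev_events are not sent to clients.
    client_event = {
        key: value
        for key, value in event.items()
        if key not in ("auth_events", "prev_events")
    }
    # The event no longer carries its own ID, so the server adds it.
    client_event["event_id"] = event_id
    return client_event
```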

Protocol Changes

The auth_events and prev_events fields on an event need to be changed from a list of tuples to a list of strings, i.e. remove the old event ID and simply have the list of hashes.

The auth rules also need to change:

  • The event no longer needs to be signed by the domain of the event ID (but still needs to be signed by the sender's domain)

  • We currently allow redactions if the domain of the redaction event ID matches the domain of the event ID it is redacting, which allows self-redaction. This check is removed and redaction events are always accepted. Instead, the redaction event only takes effect and is sent down to clients if/when the original event is received and the domains of the two events' senders match. (While this is clearly suboptimal, it is the only practical suggestion.)
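The deferred redaction check in the second rule could be sketched as below. The names `sender_domain` and `redaction_takes_effect` are hypothetical, and the sketch covers only the sender-domain comparison described here, not any other checks a server applies to redactions. `original` is `None` while the redacted event has not yet been received.

```python
def sender_domain(user_id):
    """Return the server name part of a user ID such as '@alice:example.org'."""
    # Split on the first colon only, so that a port in the server name is kept.
    return user_id.split(":", 1)[1]


def redaction_takes_effect(redaction, original):
    """Decide whether an accepted redaction should be applied and sent to clients."""
    if original is None:
        # The original event has not arrived yet; hold the redaction back.
        return False
    return sender_domain(redaction["sender"]) == sender_domain(original["sender"])
```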

Room Version

There will be a new room version v3 that is the same as v2 except that it uses the new event format proposed above. v3 will be marked as 'stable' as defined in MSC1804.