274 lines
11 KiB
Markdown
274 lines
11 KiB
Markdown
# Grammars for identifiers in the Matrix protocol
|
|
|
|
## Background
|
|
|
|
Matrix uses client- or server-generated identifiers in a number of
|
|
places. Historically the grammars for these have been underspecified, which
|
|
leads to confusion about what is or is not a valid identifier with the
|
|
possibility of incompatability between implementations.
|
|
|
|
This proposal presents tightly-specified grammars for a number of
|
|
identifiers.
|
|
|
|
## Common Identifiers
|
|
|
|
[Spec](https://matrix.org/docs/spec/appendices.html#common-identifier-format)
|
|
|
|
Proposal:
|
|
|
|
> `localpart` may not include `:`. When parsing a Common Identifier, it should
|
|
> be split at the *leftmost* `:`.
|
|
|
|
Rationale: server names may contain multiple `:`s (think IPv6 literals), so the
|
|
first colon is the only sane place to split them. This is a Known Thing, but I
|
|
don't think we spell it out anywhere in the spec.
|
|
|
|
## User IDs
|
|
|
|
User IDs are
|
|
[well-specified](https://matrix.org/docs/spec/appendices.html#user-identifiers),
|
|
however we should consider dropping `/` from the list of allowed characters,
|
|
because HTTP proxies might rewrite
|
|
`/_matrix/client/r0/profile/@foo%25bar:matrix.org/displayname` to
|
|
`/_matrix/client/r0/profile/@foo/bar:matrix.org/displayname`, messing things
|
|
up.
|
|
|
|
History: `/` was introduced with the intention of acting as a hierarchical
|
|
namespacing character, particularly with consideration to the gitter protocol
|
|
which uses it as a hierarchical separator. However, this was not as effective
|
|
as hoped because `@foo/bar:example.com` looks like the ID is partitioned into
|
|
`@foo` and `bar:example.com`.
|
|
|
|
Proposal:
|
|
|
|
> Remove `/` from the list of allowed characters in User IDs.
|
|
|
|
`/` will of course be maintained under the grammar of "historical user
|
|
IDs". Sorting out that mess is a longer-term project.
|
|
|
|
## Room IDs and Event IDs
|
|
|
|
[Issue](https://github.com/matrix-org/matrix-doc/issues/667)
|
|
[Spec](https://matrix.org/docs/spec/appendices.html#room-ids-and-event-ids)
|
|
|
|
These currently have similar formats, though it is likely that event ids will
|
|
be replaced with something else due to
|
|
[#1127](https://github.com/matrix-org/matrix-doc/issues/1127).
|
|
|
|
Currently they are both specified as ``?opaque_id:domain``, without clues as to
|
|
what the opaque_id should be.
|
|
|
|
Synapse uses: `[A-Za-z]{18}`.
|
|
[Dendrite](https://github.com/matrix-org/dendrite/blob/b71d922/src/github.com/matrix-org/dendrite/clientapi/routing/createroom.go#L125)
|
|
uses (I think) `[A-Za-z0-9]{16}` via
|
|
[json.go](https://github.com/matrix-org/util/blob/master/json.go#L185). However,
|
|
some server implementations/forks are known to generate event IDs (and possibly
|
|
room IDs) using a wide alphabet, which means that there exist rooms that
|
|
include unusual event IDs.
|
|
|
|
Proposal:
|
|
|
|
> The opaque_id part must not be empty, and must consist entirely of the
|
|
> characters `[0-9a-zA-Z.=_-]`.
|
|
>
|
|
> The total length (including sigil and domain) must not exceed 255 characters.
|
|
>
|
|
> This is only enforced for v2 rooms - servers and clients wishing to support
|
|
> v1 rooms should be more tolerant.
|
|
|
|
|
|
## Key IDs (for federation, e2e, and identity servers)
|
|
|
|
These are always of the form `<algorithm>:<tok>`.
|
|
|
|
Valid algorithms are defined at
|
|
https://matrix.org/docs/spec/client_server/unstable.html#key-algorithms, though
|
|
we should define the alphabet for future algorithms.
|
|
|
|
Proposal:
|
|
|
|
> Future algorithm identifiers will be assigned from the alphabet `[a-z0-9_.]`
|
|
> and will be at most 31 characters in length.
|
|
|
|
For federation keys,
|
|
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/config/key.py#L159)
|
|
generates key ids as `ed25519:a_[A-Za-z]{4}`, though an HS admin can configure
|
|
them manually to be anything without whitespace.
|
|
|
|
Key IDs end up in an Authorization header which looks like `X-Matrix
|
|
origin=origin.example.com,key="keyId",sig="ABCDEF..."`. The Synapse
|
|
implementation splits on `,` and `=` without regard to quoting so this
|
|
currently precludes the use of `,` or `=` in a key ID.
|
|
|
|
For e2e, device keys have a `tok` corresponding to the device id, whilst
|
|
one-time keys are generated by libolm, which uses a base64-encoded 32-bit int, ie
|
|
`[A-Za-z0-9+/]{6}`.
|
|
|
|
A key ID needs to be unique over the lifetime of the server (for federation) or
|
|
the device (for e2e). However, they are used fairly widely, so making them long
|
|
is unattractive as they could significantly increase the amount of data being
|
|
transmitted. Let's limit the 'tok' part of the key to 31 characters too.
|
|
|
|
Proposal:
|
|
|
|
> Key IDs use the following BNF grammar:
|
|
>
|
|
> ```
|
|
> key_id = algorithm ":" tok
|
|
>
|
|
> algorithm = 1*31 alg_chars
|
|
>
|
|
> tok = 1*31 tok_chars
|
|
>
|
|
> alg_chars = %x61-7a / %30-39 / "_" / "."
|
|
> ; a-z 0-9 _ .
|
|
>
|
|
> tok_chars = ALPHA / DIGIT / "." / "=" / "_" / "-"
|
|
> ; A-Z a-z 0-9 . = _ -
|
|
> ```
|
|
>
|
|
|
|
Note that enforcing this grammar will mean:
|
|
|
|
* Making sure that synapse handles "=" characters in key IDs (easy).
|
|
|
|
* Making libolm not put + and / characters in key IDs (easy enough, but there
|
|
will be a bunch of malformed unique keys out there in the wild. Possibly they
|
|
would just get thrown away. Servers may need to continue to tolerate `+` and
|
|
`/` in e2e keys for a while.)
|
|
|
|
## Opaque IDs
|
|
|
|
[Issue](https://github.com/matrix-org/matrix-doc/issues/666)
|
|
|
|
This is a class of identifier types where nobody is really meant to parse any
|
|
part of the ID - they are just unique identifiers (with varying scopes of
|
|
uniqueness). See below for discussion on what is currently in use.
|
|
|
|
I propose to specify the almost the same grammar for all of these, for
|
|
simplicity and consistency.
|
|
|
|
Proposal:
|
|
|
|
> Opaque IDs must be strings consisting entirely of the characters
|
|
> `[0-9a-zA-Z.=_-]`. Their length must not exceed 255 characters and they must
|
|
> not be empty.
|
|
|
|
For almost all of the current implementations I have looked at (listed below),
|
|
the grammar above is a superset of the generated identifiers, and a subset of
|
|
the understood identifiers. There should therefore be no
|
|
backwards-compatibility problems with its introduction.
|
|
|
|
The exception is transaction IDs generated by some clients. I think that we'll
|
|
just have to fix those clients and accept that old versions may not work with
|
|
future servers.
|
|
|
|
### Call IDs
|
|
|
|
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#m-call-invite)
|
|
|
|
These are only used within the body of `m.call.*` events, as far as I am
|
|
aware. They should be unique within the lifetime of a room. (Some
|
|
implementations currently treat them as globally unique, but that is considered
|
|
an implementation bug.)
|
|
|
|
[matrix-js-sdk](https://github.com/matrix-org/matrix-js-sdk/blob/4d310cd4618db4e98a8e6b5eb812480102ee4dee/src/webrtc/call.js#L72) uses `c[0-9.]{32}`.
|
|
[matrix-android-sdk](https://github.com/matrix-org/matrix-android-sdk/blob/5c6f785e53632e7b6fb3f3859a90c3d85b040e7f/matrix-sdk/src/main/java/org/matrix/androidsdk/call/MXWebRtcCall.java#L221) uses `c[0-9]{13}`.
|
|
|
|
Additional proposal:
|
|
|
|
> Call IDs should be long enough to make clashes unlikely.
|
|
|
|
### Media IDs
|
|
|
|
[Spec](https://matrix.org/docs/spec/client_server/r0.3.0.html#id67)
|
|
|
|
These are generated by the server on upload, and then embedded in `mxc://` URIs
|
|
and used in the C-S API and the S-S API.
|
|
|
|
They must be URI-safe to be sensibly embedded in `mxc://` URIs.
|
|
|
|
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/rest/media/v1/media_repository.py#L153)
|
|
uses `[A-Za-z]{24}`, though it also uses `[0-9A-Za-z_-]{27}` for
|
|
[URL
|
|
previews](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/rest/media/v1/preview_url_resource.py#L285).
|
|
|
|
[matrix-media-repo](https://github.com/turt2live/matrix-media-repo/blob/539f25ee75ba6cdbb0410314b29978f4b8b1d7fe/src/github.com/turt2live/matrix-media-repo/controllers/upload_controller/upload_controller.go#L50)
|
|
uses `[A-Za-z0-9]{32}`, via [random.go](https://github.com/turt2live/matrix-media-repo/blob/539f25ee75ba6cdbb0410314b29978f4b8b1d7fe/src/github.com/turt2live/matrix-media-repo/util/random.go#L18-L27).
|
|
|
|
### Filter IDs
|
|
|
|
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#post-matrix-client-r0-user-userid-filter)
|
|
|
|
These are generated by the server and then used in the CS API. They are only
|
|
required to be unique for a given user. `{` is already forbidden by the spec.
|
|
|
|
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/storage/filtering.py#L70-L73)
|
|
uses a stringified int.
|
|
|
|
### Auth Session IDs
|
|
|
|
[Spec](https://matrix.org/docs/spec/client_server/r0.3.0.html#user-interactive-authentication-api)
|
|
|
|
These are generated by the server during auth, and then used in the CS
|
|
API. However, they need to be unique for a given server.
|
|
|
|
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/auth.py#L494) uses `[A-Za-z]{24}`.
|
|
|
|
### Transaction IDs (for federation)
|
|
|
|
[Spec](https://matrix.org/docs/spec/server_server/unstable.html#put-matrix-federation-v1-send-txnid)
|
|
|
|
Generated by sending server. Needs to be unique for a given pair of servers.
|
|
|
|
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/federation/transaction_queue.py#L593) uses a stringified int and accepts pretty much anything.
|
|
|
|
### Transaction IDs (for C-S API)
|
|
|
|
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#put-matrix-client-r0-rooms-roomid-send-eventtype-txnid)
|
|
|
|
These are generated by the client. They only need to be unique within the
|
|
context of a single access_token/device.
|
|
|
|
Synapse doesn't appear to do any sanity-checking here currently.
|
|
|
|
[matrix-js-sdk](https://github.com/matrix-org/matrix-js-sdk/blob/c6b500bc09994ab5924ef8aab9bd10fc7ded5dae/src/base-apis.js#L123)
|
|
uses `m[0-9]{13}.[0-9]{1,}`.
|
|
[matrix-android-sdk](https://github.com/matrix-org/matrix-android-sdk/blob/088414fb187cae341690c3a01493b87d97f0169f/matrix-sdk/src/main/java/org/matrix/androidsdk/rest/model/Event.java#L503)
|
|
uses a room ID plus a timestamp, hence kinda could be anything, but certainly
|
|
will include a `!`.
|
|
|
|
### Device IDs
|
|
|
|
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#relationship-between-access-tokens-and-devices)
|
|
|
|
These are normally generated by the server on login. It's possible for clients
|
|
to present their own device_ids, but we're not aware of this feature being
|
|
widely used.
|
|
|
|
They are used between users and across federation for E2E and to-device
|
|
messages. They need to be unique for a particular user. They also appear in key
|
|
IDs and must therefore be a subset of that grammar.
|
|
|
|
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/device.py#L89)
|
|
generates device IDs with `[A-Z]{10}`. It appears to do little sanity-checking
|
|
of client-generated device IDs currently.
|
|
|
|
Additional proposal:
|
|
|
|
> Device IDs must not exceed 31 characters in length.
|
|
|
|
### Message IDs
|
|
|
|
These are used in the server-server API for
|
|
[Send-to-device messaging](https://matrix.org/docs/spec/server_server/unstable.html#send-to-device-messaging).
|
|
|
|
Synapse uses `[A-Za-z]{16}`, and accepts anything that fits in a postgres TEXT
|
|
field. Ref: [devicemessage.py](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/devicemessage.py#L102).
|
|
|
|
|
|
## Room Aliases
|
|
|
|
These are a complex topic and are discussed in [MSC
|
|
1608](https://github.com/matrix-org/matrix-doc/issues/1608).
|