MSC2758: Common grammar for textual identifiers

The Matrix specification uses textual identifiers for a wide range of concepts. Examples include "event types" and "room versions".

In the past, these identifiers have often lacked a formal grammar, leaving servers and clients to make assumptions about questions such as which characters are permitted, minimum and maximum lengths, etc.

This proposal suggests a common grammar which can be used as a basis for future identifier types, reducing the effort involved in specifying them.

No attempt is made here to bring existing identifiers into line; however examples of identifiers which might have benefitted from such a grammar in the past include:

Proposal

We define a "common namespaced identifier grammar". This can then be referenced by other parts of the specification, in much the same way as Unpadded Base64 is defined today.

The grammar is defined as follows:

  • An identifier must be at least one character and at most 255 characters in length.
  • Identifiers must start with one of the characters [a-z], and be composed entirely of the characters [a-z], [0-9], - (hyphen), _ (underscore) and . (period).
  • Identifiers starting with the characters m. are reserved for use by the formal Matrix specification.
  • Implementations wishing to use identifiers that are not defined by the specification should follow the Java package naming convention of starting with a reversed domain name (with a dot after the domain name part). For example, for the organisation example.com, a valid identifier would be com.example.identifier.

This grammar is intended for use entirely by internal identifiers, and not for user-visible strings.
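
As an illustration only, the rules above could be checked with a single regular expression. The following Python sketch is not part of the proposal: the names `is_valid_identifier` and `is_spec_reserved` are hypothetical, and the reversed-domain convention is not enforced because the grammar only recommends it.

```python
import re

# One leading [a-z], followed by up to 254 characters drawn from
# [a-z], [0-9], '-', '_' and '.', for a total length of 1 to 255.
_IDENTIFIER_RE = re.compile(r"[a-z][a-z0-9._-]{0,254}")


def is_valid_identifier(identifier: str) -> bool:
    """Return True if the string satisfies the length and character rules."""
    # fullmatch() requires the whole string to match, so a trailing
    # newline or any other stray character causes rejection.
    return _IDENTIFIER_RE.fullmatch(identifier) is not None


def is_spec_reserved(identifier: str) -> bool:
    """Return True for valid identifiers in the reserved 'm.' namespace."""
    return is_valid_identifier(identifier) and identifier.startswith("m.")
```

Under these assumptions, `is_valid_identifier("com.example.identifier")` is true, while `"Com.Example"`, `"1abc"` and the empty string are all rejected.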

Rationale

  • Avoiding non-ASCII characters sidesteps any issues with homoglyphs or alternative encodings of the same characters.
  • Avoiding upper-case characters sidesteps any concerns over case-sensitivity.
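
To make the two concerns concrete, the short Python sketch below reports which characters of a candidate string fall outside the permitted alphabet. The helper name `outside_alphabet` and the sample strings are hypothetical and serve only as illustration.

```python
# The alphabet permitted by the grammar: [a-z], [0-9], '-', '_' and '.'.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-_.")


def outside_alphabet(candidate: str) -> list[str]:
    """Return the sorted distinct characters that the grammar does not permit."""
    return sorted(set(candidate) - ALLOWED)


# U+0430 (CYRILLIC SMALL LETTER A) renders much like the Latin 'a',
# but it is a different code point, so the two strings are not equal.
latin = "com.example.id"
lookalike = "com.ex\u0430mple.id"
```

Here `outside_alphabet(lookalike)` flags the Cyrillic letter even though the two strings may look identical on screen, and `outside_alphabet("com.example.ID")` flags `D` and `I`, so no question of case-insensitive comparison arises.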