matrix.org/static/jira/browse/SPEC-1

219 lines
8.0 KiB
Plaintext

---
summary: Need to specify the grammar for room aliases and user_ids.
---
assignee: richvdh
created: 2014-09-15 23:29:36.0
creator: matthew
description: |-
We need to specify the grammar for internal protocol identifiers:
* event types
* room IDs
We need a grammar for ids that are used both in the protocol and are exposed to humans:
* room aliases
* user IDs
Additionally we may need to restrict the allowed characters in human readable names:
* room display names
* user display names
id: '10006'
key: SPEC-1
number: '1'
priority: '1'
project: '10001'
reporter: matthew
resolution: '3'
resolutiondate: 2016-04-19 12:03:59.0
status: '5'
type: '2'
updated: 2016-06-01 09:53:54.0
votes: '0'
watches: '9'
workflowId: '10321'
---
actions:
- author: kegan
body: This is outlined in docs/human-id-rules.rst and basically follows NAMEPREP/STRINGPREP - This needs to be implemented on the HS and have a bit more discussion.
created: 2014-09-16 09:28:52.0
id: '10106'
issue: '10006'
type: comment
updateauthor: kegan
updated: 2014-09-16 09:28:52.0
- author: matthew
body: This is getting more urgent, with people managing to create aliases which include whitespace, and other such insanity :(
created: 2015-04-07 20:28:41.0
id: '11476'
issue: '10006'
type: comment
updateauthor: matthew
updated: 2015-04-07 20:28:41.0
- author: markjh
body: We urgently need to give a grammar for the user ID and for the room ID since the v2 filter API makes assumptions that '*" is not a valid character in a room id or a user ID so that it can use it as a wildcard.
created: 2015-09-23 16:09:08.0
id: '12157'
issue: '10006'
type: comment
updateauthor: markjh
updated: 2015-09-23 16:09:08.0
- author: leonerd
body: |-
Can we make some initial progress here? I'd personally like to suggest some partial rules on character sets and the like.
Definitely in:
* US-ASCII letters; lower and uppercase.
* Decimal digits
* Punctuation of _, -
Definitely out:
* Any kind of whitespace
* Punctuation of : or . except where required by string structure rules
This still leaves a great deal of punctuation characters undecided, not to mention any thoughts on Unicode...
created: 2015-09-23 16:14:50.0
id: '12158'
issue: '10006'
type: comment
updateauthor: leonerd
updated: 2015-09-23 16:14:50.0
- author: matthew
body: |-
NB that Kegan did start this at https://github.com/matrix-org/matrix-doc/blob/master/drafts/model/third-party-id.rst
Mark: thanks for clarifying and sanitizing the description
My thoughts for room alises and user ids: "utf8, with a blacklist of explicitly disallowed characters (all whitespace, *, /, :, ., any others we want to reserve). you're not allowed to mix charsets (fsvo charset), and possibly deny other homomorph attacks eg l v I" IDs are compared case insensitively.
The internal identifiers can be much more strict. Display Names etc could just be length-limited utf8 strings with no restrictions, unless we want to protect against homomorph attacks by disambiguation somehow.
created: 2015-09-23 17:31:25.0
id: '12159'
issue: '10006'
type: comment
updateauthor: matthew
updated: 2015-09-23 17:31:25.0
- author: matthew
body: oh, and no zero length ids
created: 2015-09-23 17:35:40.0
id: '12160'
issue: '10006'
type: comment
updateauthor: matthew
updated: 2015-09-23 17:35:40.0
- author: richvdh
body: Possibly matthew meant to link to https://github.com/matrix-org/matrix-doc/blob/master/drafts/human-id-rules.rst (plus https://github.com/matrix-org/matrix-doc/pull/3).
created: 2015-11-05 09:43:05.0
id: '12325'
issue: '10006'
type: comment
updateauthor: richvdh
updated: 2015-11-05 09:43:05.0
- author: matthew
body: er, yes. sorry. i'm a crank. but you knew that.
created: 2015-11-05 10:03:26.0
id: '12327'
issue: '10006'
type: comment
updateauthor: matthew
updated: 2015-11-05 10:03:26.0
- author: richvdh
body: |-
> Display Names etc could just be length-limited utf8 strings with no restrictions, unless we want to protect against homomorph attacks by disambiguation somehow.
I think we do want to protect against homomorph attacks, as per SPEC-221.
created: 2015-11-05 10:26:08.0
id: '12333'
issue: '10006'
type: comment
updateauthor: richvdh
updated: 2015-11-05 10:26:08.0
- author: kegan
body: https://github.com/matrix-org/matrix-doc/pull/3 is the proposal, the one currently on {{master}} is very old early notes.
created: 2015-11-06 10:11:21.0
id: '12345'
issue: '10006'
type: comment
updateauthor: kegan
updated: 2015-11-06 10:11:21.0
- author: xena
body: |-
I think a reasonable thing would be to allow anything allowed in an RFC 1459 channel name but modified for matrix:
- Spaces are not allowed
- Commas are not allowed
- Colons are not allowed
- \007 (BEL) is not allowed
created: 2015-12-09 19:28:34.0
id: '12452'
issue: '10006'
type: comment
updateauthor: xena
updated: 2015-12-09 19:28:34.0
- author: matthew
body: I've tried to incorporate tonight's discussions from HQ into https://github.com/matrix-org/matrix-doc/pull/3#issuecomment-163453706
created: 2015-12-10 01:05:00.0
id: '12453'
issue: '10006'
type: comment
updateauthor: matthew
updated: 2015-12-10 01:05:00.0
- author: neb
body: 'By @matthew:matrix.org: eternaleye suggests: Matthew: Unicode XID_Start XID_Continue* maybe? Matthew: Since those are meant to be identifiers, such as for programming languages to restrict variable names to. Matthew: (Rust does so, in fact, though it further restricts variables to ASCII unless you enable the unicode identifiers feature gate IIRC)'
created: 2016-01-04 23:06:08.0
id: '12498'
issue: '10006'
type: comment
updateauthor: neb
updated: 2016-01-04 23:06:08.0
- author: eternaleye
body: |-
A similar ticket for display names may need opened separately, but today in #matrix:matrix.org it was discovered that display names permit whitespace that they probably shouldn't (a trailing \n\t\t on a display name caused some confusion).
Some whitespace cannot be blocked - the zero-width joiner, in particular, is [needed to construct the letterforms of some languages that Unicode does not support sufficiently|https://modelviewculture.com/pieces/i-can-text-you-a-pile-of-poo-but-i-cant-write-my-name]. In addition, spaces are widely used, and blocking those would likely cause widespread breakage. However, it is entirely possible that all other whitespace can be banned without detrimental effect.
created: 2016-03-10 14:56:08.0
id: '12759'
issue: '10006'
type: comment
updateauthor: eternaleye
updated: 2016-03-10 14:59:29.0
- author: eternaleye
body: |-
From #matrix:matrix.org:
{quote}
\* eternaleye would personally support a hard ban on anything outside of \[0-9A-Za-z_.-], and if people want to be silly with it then they can use punycode and render in the client
The dot supports corporate-style firstname.lastname (in countries where that's used), the underscore supports IRC-like names, the dash is needed for punycode, and the rest just are baseline
Punycode also allows representing anything other networks do, without needing state, so IRC nicks with backticks could just get punycoded by the AS.
Same for square brackets
(Display Name is all that gets shown when set anyways)
{quote}
created: 2016-03-18 22:33:53.0
id: '12767'
issue: '10006'
type: comment
updateauthor: eternaleye
updated: 2016-03-18 22:33:53.0
- author: richvdh
body: This issue is trying to cover too many different things. I'm going to split it up.
created: 2016-04-19 10:16:14.0
id: '12845'
issue: '10006'
type: comment
updateauthor: richvdh
updated: 2016-04-19 10:16:14.0
- author: richvdh
body: |-
Now split up as follows:
* opaque IDs: SPEC-388
* room IDs and event IDs: SPEC-389
* user IDs: SPEC-390
* room aliases: SPEC-391
* display names: SPEC-392
created: 2016-04-19 12:03:44.0
id: '12850'
issue: '10006'
type: comment
updateauthor: richvdh
updated: 2016-04-19 12:03:44.0