# MSC4143: MatrixRTC

MatrixRTC is short for Matrix real time communication.
This MSC defines the modules with which the Matrix real time system is built.

MatrixRTC specifies how a real time session is described in a room and how Matrix users can connect to
a session.

The MatrixRTC specification is separated into different modules:

- The MatrixRTC room state that defines the state of the real time session.
  It is the source of truth for:
  - Who is part of a session
  - Who is connected via what technology/backend
  - Metadata per device used by other participants to decide whether the streams
    from this source are of interest / need to be subscribed to.
- The MatrixRTC backend.
  - Allows for multiple backend implementations to be used.
  - It defines how to discover the available backend(s).
  - It defines how to connect the participating peers.
  - It defines how to connect to a server/other peers, how to update the connection,
    how to subscribe to different streams...
  - A proposal utilising LiveKit is the standard for this as of writing.
  - Another planned backend is a full mesh implementation based on [MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401).
- The MatrixRTC application.
  - Each application type can have its own spec.
  - Voice and video conferencing can be done with an application of type `m.call`.
  - The application defines all the details of the RTC experience:
    - How to interpret the metadata of the member events.
    - What streams to connect to.
    - What data in which format to send over the RTC channels.
    - What MatrixRTC backends are supported.
- End-to-end encryption of media streams

This MSC focuses primarily on the Matrix room state, which is responsible for the high-level
signalling of an RTC session.

## Proposal

Each RTC session is made out of a collection of `m.rtc.member` room state events.
Each `m.rtc.member` event defines who (the `member`) is a participant of which session (the `session`).

### The MatrixRTC room state

All data related to a MatrixRTC session
(current session, session history, join/leave events, ...) only
requires one event type.

We use a set of `m.rtc.member` state events (one for each participant) to represent a session.

Based on its content, an `m.rtc.member` state event represents either a connected or a disconnected member.

#### Joining a session

Sending a well-formed `m.rtc.member` event that describes a connected state, for a state key that is either not yet used or contains a disconnected `m.rtc.member` event, represents a join action.

The fields are as follows:

- `member` required object - describes the participant of the RTC session:
  - `id` required string - a unique identifier for this session membership. Recommended to be a UUID. It can be reused if the user leaves and rejoins the session.
    It should be unique across all devices of the user. TODO: define grammar
  - `device_id` required string - the Matrix device ID of the device that is joining the session. This is used when sending
    [to-device messages](https://spec.matrix.org/v1.11/client-server-api/#send-to-device-messaging).
  - `user_id` required string - the Matrix user ID of the user that is joining the session. This is needed as we cannot rely
    on the sender of the state event, as it might have been modified by an admin or similar.
- `session` required object - an object that is used to uniquely identify this session across RTC member events
  of the Matrix room:
  - `application` required string - a recognised application type, e.g. `m.call` as linked below.
  - additional fields as defined by the application type
- `created_ts` optional integer - timestamp in milliseconds since the UNIX epoch.
  - This should **not** be present the first time that the `m.rtc.member` event is sent.
  - If the `m.rtc.member` event is sent again, the `created_ts` should be populated with the `origin_server_ts`
    that was given to the previous version of the state event.
- `focus_active` required Focus object - specifies the algorithm that defines how to choose a Focus for this member. See below for details.
- `foci_preferred` required array of Focus objects - specifies the input data for this algorithm contributed by this member. See below for details.

Additional fields may be added depending on the application type.

A full `m.rtc.member` state event for a joined member looks like this:

```json5
// event type: "m.rtc.member"
// state key: see next section for definition
{
  "session": {
    "application": "m.call"
    // further fields for the application
  },
  "member": {
    "id": "xyzABCDEF10123",
    "device_id": "DEVICEID",
    "user_id": "@user:matrix.domain"
  },
  "created_ts": Time | undefined,
  "focus_active": {...FOCUS_A},
  "foci_preferred": [
    {...FOCUS_1},
    {...FOCUS_2}
  ]
}
```

This tells us that user `@user:matrix.domain` with member ID `xyzABCDEF10123` (on device `DEVICEID`)
is part of the session identified by the `session` object, using an application of type `m.call` and connected over `FOCUS_A`.
This is sufficient information for another room member to detect the running session and join it.

`created_ts` is an optional property that caches the time of creation. It is not required
for an event that has not yet been updated; in that case the `origin_server_ts` is used.

> [!NOTE]
> We introduce `created_ts()` as the notation for `created_ts ?? origin_server_ts`

Once the event gets updated, the `origin_server_ts` needs to be copied into the `created_ts` field.
An existing `created_ts` field implies that this is a state event updating the current session,
and a missing `created_ts` field implies that it is a join state event.
All membership events that belong to one member session can be grouped with the index
`created_ts()`+`state_key`. This is why the `m.rtc.member` events deliberately do NOT include something akin to a `membership_id`.

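The `created_ts()` notation and the grouping index can be sketched in Python as follows. This is a minimal, non-authoritative sketch that assumes events are plain dicts shaped like client-server room events (`state_key`, `origin_server_ts`, `content`); the helper names `created_ts`, `membership_key` and `next_member_content` are hypothetical. On a re-send it copies `created_ts()` of the previous event: for the first update that equals the previous `origin_server_ts`, and for later updates it keeps the original join time, which is our reading of what grouping by `created_ts()`+`state_key` requires.

```python
from typing import Any, Optional

# Hypothetical model: a state event is a dict with "state_key",
# "origin_server_ts" and "content", as in the client-server API.
Event = dict[str, Any]


def created_ts(event: Event) -> int:
    """created_ts() := content.created_ts ?? origin_server_ts"""
    value = event.get("content", {}).get("created_ts")
    return value if value is not None else event["origin_server_ts"]


def membership_key(event: Event) -> tuple[int, str]:
    """Index that groups all events belonging to one member session."""
    return (created_ts(event), event["state_key"])


def next_member_content(content: dict[str, Any], previous: Optional[Event]) -> dict[str, Any]:
    """Content to send for a join (no live previous membership) or an update."""
    out = {k: v for k, v in content.items() if k != "created_ts"}
    prev_content = previous.get("content", {}) if previous else {}
    if prev_content.get("session") is not None:
        # Update of a live membership: keep the creation time of that membership.
        out["created_ts"] = created_ts(previous)
    return out
```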
Other than the membership events, there is **no event** to represent an RTC session (containing all members).
Such an event would include shared information where it is not trivial to decide who has authority over it.
Instead, the session is a computed value based on `m.rtc.member` events.
The list of events with the same `session` content represents one session.
This list allows computing fields like participant count, start time, etc.

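Computing sessions from the current room state might look like the Python sketch below, under the same dict-shaped event assumption; `compute_sessions` and `session_key` are hypothetical names. Equality of `session` contents is checked via a sorted-keys JSON serialization, which is one convenient choice rather than something the proposal mandates. The `start_ts` here is only the earliest `created_ts()` among members that are still connected, and `participant_count` counts memberships (devices), not distinct users.

```python
import json
from collections import defaultdict
from typing import Any

Event = dict[str, Any]


def session_key(session: dict[str, Any]) -> str:
    # Deterministic serialization so that equal `session` objects compare equal
    # regardless of key order (an assumption; any deep equality check works).
    return json.dumps(session, sort_keys=True, separators=(",", ":"))


def compute_sessions(current_state: list[Event]) -> dict[str, dict[str, Any]]:
    """Derive the running sessions from the current m.rtc.member state events."""
    groups: dict[str, list[Event]] = defaultdict(list)
    for event in current_state:
        session = event.get("content", {}).get("session")
        if session is None:
            continue  # empty content: a disconnected member
        groups[session_key(session)].append(event)

    sessions = {}
    for key, members in groups.items():
        sessions[key] = {
            "participant_count": len(members),  # counts memberships, i.e. devices
            # Earliest created_ts() among members still connected; a true start
            # time needs the history described under "Session history".
            "start_ts": min(
                e["content"].get("created_ts", e["origin_server_ts"]) for e in members
            ),
        }
    return sessions
```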
Depending on the value of `application`, the event might include additional
session parameters.

> A [Third Room](https://thirdroom.io) like experience could include the information of an approximate position
> on the map, so that clients can skip connecting to participants that are not in their
> area of interest.

#### State key for `m.rtc.member`

The state key is generated from the `member` field of the `m.rtc.member` event.

We want to choose a state key that is compatible with whichever state protection proposal is accepted, to ensure that
users cannot modify one another's sessions.

For [MSC3757](https://github.com/matrix-org/matrix-spec-proposals/pull/3757) we generate the state key by
concatenating the following strings:

- the Matrix ID of the user
- an `_` (underscore)
- the `member`.`id` field

For example, with a `member`.`id` of `xyzABCDEF10123` for user `@user:matrix.domain` the state key would be `@user:matrix.domain_xyzABCDEF10123`.

For a client parsing the state key, we would treat anything before the first `_` as the Matrix ID of the user
and anything after it as the `member`.`id` field.

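A minimal Python sketch of building and parsing the state key follows; the function names are hypothetical. One deliberate deviation, labelled as our assumption: the localpart of a Matrix user ID may itself contain `_` (e.g. `@some_user:matrix.domain`), while the server name grammar does not allow `_`, so the parser splits at the first `_` after the first `:` rather than at the very first `_`. A malformed key raises `ValueError`.

```python
def make_state_key(user_id: str, member_id: str) -> str:
    """MSC3757-style state key: user ID, "_", then member.id."""
    return f"{user_id}_{member_id}"


def parse_state_key(state_key: str) -> tuple[str, str]:
    """Split a state key back into (user_id, member_id).

    Assumption: split at the first "_" after the ":" that starts the server
    name, because localparts may contain "_" but server names cannot.
    """
    server_start = state_key.index(":")
    separator = state_key.index("_", server_start)
    return state_key[:separator], state_key[separator + 1:]
```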
#### Leaving a session

Sending an empty `m.rtc.member` event represents a leave action. The state key must be the same as the one used when joining.

There is an optional `leave_reason` field that can be used to provide a reason for leaving the session:

- `leave_reason` optional string - one of: `lost_connection`

An example of leaving a session where the user explicitly disconnects:

```json5
// event type: "m.rtc.member"
// state key: "@user:matrix.domain_xyzABCDEF10123"
{}
```

The client should use the `prev_content` field of the [room state event](https://spec.matrix.org/v1.11/client-server-api/#room-event-format)
to determine the details of the leave event.

For example:

```json5
// event type: "m.rtc.member"
// state key: "@user:matrix.domain_xyzABCDEF10123"
{
  "content": {
    "leave_reason": "lost_connection"
  },
  "prev_content": {
    "session": {
      "application": "m.call",
      "call_id": ""
    },
    "member": {
      "id": "xyzABCDEF10123",
      "device_id": "DEVICEID",
      "user_id": "@user:matrix.domain"
    },
    "created_ts": 123456,
    "focus_active": {...FOCUS_A},
    "foci_preferred": [
      {...FOCUS_1},
      {...FOCUS_2}
    ]
  }
}
```

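Extracting the leave details from `prev_content` could be sketched as below; `describe_leave` is a hypothetical helper. The example above shows `prev_content` at the top level, while the current client-server API delivers it inside `unsigned`, so this sketch accepts either location. It returns `None` when the event is not a leave (the member is still connected, or there was no previous membership).

```python
from typing import Any, Optional


def describe_leave(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return details of a leave event, or None if the event is not a leave."""
    content = event.get("content") or {}
    # Accept prev_content at the top level (as in the example) or under unsigned.
    prev = event.get("prev_content") or event.get("unsigned", {}).get("prev_content") or {}
    if content.get("session") is not None or prev.get("session") is None:
        return None  # still connected, or there was nothing to leave
    return {
        "session": prev["session"],
        "member": prev.get("member"),
        "leave_reason": content.get("leave_reason"),  # None for an explicit leave
    }
```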
#### Reliability requirements for the room state

Room state is well suited for storing the data of a MatrixRTC session.
It allows:

- The client to determine current ongoing sessions without loading history for every room,
  or doing any additional work beyond the sync loop that needs to run anyway.
- The client to compute/access data of past sessions without any additional redundant data.
- Sessions (start/end/participant count) to be federated without redundant data storage that
  could result in conflicts or get out of sync. The room state events are part of the DAG and this
  is solved like any other Persistent Data Unit (PDU) in Matrix.

However, a challenging aspect of using the room state to represent a session is
the disconnection behaviour. If the client disconnects from a call because of a network issue,
an application crash or a user forcefully quitting the client, the room state cannot be updated anymore.
The client is required to leave by sending a new empty state event, which cannot happen once the connection is lost.

If the state is not updated correctly, we end up with a room state that does not
correctly represent the current RTC session state. Historic and current MatrixRTC session data would be broken.

For an acceptable solution, the following requirements need to be taken into consideration:

- The room state is set to empty if the client loses connection. (A heartbeat-like system is desired.)
- The best source of truth for call participation is a working connection to the SFU.
  It is desired that a member's disconnection from the SFU gets propagated to the room state.
- It should be possible to update the room state without the client being online.
- All of this should remain compatible once Matrix uses cryptographic identities.

[MSC4140](https://github.com/matrix-org/matrix-spec-proposals/pull/4140) proposes a concept to
delay the leave events until one of the leave conditions (missed heartbeat or SFU disconnect) occurs,
which fulfils all of these requirements.

A MatrixRTC client has to first send/schedule the following delayed leave event:

```json5
// event type: "m.rtc.member"
// state key: "@user:matrix.domain_xyzABCDEF10123"
{
  "leave_reason": "lost_connection"
}
```

Only after that can the actual state event be sent, so that we guarantee that the state will eventually be empty.
The `leave_reason` is added so clients can be more informative about why a user disconnected from a call.
It lets other participants in the session know whether the user disconnected intentionally or lost connection.

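The ordering requirement (schedule the leave first, then publish the membership) can be sketched as follows. The `RtcClient` protocol and its three methods are hypothetical stand-ins for the MSC4140 and room state endpoints, not a real SDK API, and the 10 second delay is an arbitrary example value. The heartbeat is left to the caller, who must restart the delayed event more often than the delay for as long as it stays connected.

```python
from typing import Any, Protocol


class RtcClient(Protocol):
    """Hypothetical wrapper around the MSC4140 and room state endpoints."""

    def schedule_delayed_state(self, room_id: str, event_type: str, state_key: str,
                               content: dict[str, Any], delay_ms: int) -> str: ...

    def restart_delayed(self, delay_id: str) -> None: ...

    def send_state(self, room_id: str, event_type: str, state_key: str,
                   content: dict[str, Any]) -> None: ...


def join_session(client: RtcClient, room_id: str, state_key: str,
                 member_content: dict[str, Any], delay_ms: int = 10_000) -> str:
    # 1. Schedule the leave first, so the state is guaranteed to become empty eventually.
    delay_id = client.schedule_delayed_state(
        room_id, "m.rtc.member", state_key, {"leave_reason": "lost_connection"}, delay_ms)
    # 2. Only then publish the membership itself.
    client.send_state(room_id, "m.rtc.member", state_key, member_content)
    # 3. The caller keeps calling client.restart_delayed(delay_id) (the heartbeat)
    #    more often than delay_ms while connected.
    return delay_id
```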
#### Session history

Since there is no single entry for a historic session (because of the ownership discussion),
historic sessions need to be computed and most likely cached on the client.

Each state event can mark either a join or a leave:

- join: `prev_state.session != current_state.session` &&
  `current_state.session != undefined`
  (where an empty `m.rtc.member` event would imply `state.session == undefined`)
- leave: `prev_state.session != current_state.session` &&
  `current_state.session == undefined`

Based on this, one can find the user sessions (the range between a join and a leave
event) at specific times.
The collection of all overlapping user sessions with the same `session` contents
defines one MatrixRTC history entry.

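The join/leave rules and the merging of overlapping user sessions can be sketched in Python as below. This assumes a timeline-ordered list of dict-shaped `m.rtc.member` events carrying both `content` and `prev_content`; the function names are hypothetical. A switch from one `session` to another counts as a leave of the old one plus a join of the new one, and a still-open user session is treated as ending at infinity.

```python
import json
from typing import Any, Optional

Event = dict[str, Any]


def _session(content: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    return (content or {}).get("session")


def user_sessions(timeline: list[Event]) -> list[dict[str, Any]]:
    """Pair joins and leaves per state key; events must be in timeline order."""
    open_sessions: dict[str, dict[str, Any]] = {}
    finished: list[dict[str, Any]] = []
    for ev in timeline:
        cur, prev = _session(ev.get("content")), _session(ev.get("prev_content"))
        if cur == prev:
            continue  # neither join nor leave (e.g. a membership update)
        key, ts = ev["state_key"], ev["origin_server_ts"]
        if key in open_sessions:  # leave (or switch away from the old session)
            s = open_sessions.pop(key)
            s["end"] = ts
            finished.append(s)
        if cur is not None:  # join
            open_sessions[key] = {"state_key": key, "session": cur, "start": ts, "end": None}
    return finished + list(open_sessions.values())


def history(sessions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge overlapping user sessions with equal `session` content into history entries."""
    by_session: dict[str, list[dict[str, Any]]] = {}
    for s in sessions:
        by_session.setdefault(json.dumps(s["session"], sort_keys=True), []).append(s)
    entries: list[dict[str, Any]] = []
    for group in by_session.values():
        group.sort(key=lambda s: s["start"])
        current = None
        for s in group:
            end = s["end"] if s["end"] is not None else float("inf")
            if current is not None and s["start"] <= current["end"]:
                current["end"] = max(current["end"], end)
                current["participants"].add(s["state_key"])
            else:
                current = {"session": s["session"], "start": s["start"],
                           "end": end, "participants": {s["state_key"]}}
                entries.append(current)
    return entries
```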
### The RTC backend

Backend **infrastructure** in this context can be anything that can serve as the backend for a
MatrixRTC session. In most cases this is an SFU, but a full mesh implementation could also
be an infrastructure. Not all kinds of infrastructure require a way of sourcing a backend resource
(e.g. full mesh). In this MSC we only refer to infrastructure where it is necessary to have access to additional
data to participate in the MatrixRTC session.

The backend is referred to as a Focus, or as Foci in plural.

Note that these backends are independent of the application (e.g. `m.call`) being used in the session.

A Focus is represented as a JSON object with one mandatory field:

- `type` required string: The type of the Focus as defined by an RTC backend.

Additional fields will be present depending on `type`.

Only participants using the same Focus type can connect in one session. If a frontend does
not support the Focus type in use, it cannot connect.

Each Focus type will get its own MSC explaining the detailed procedure to get from
the Foci information to working WebRTC connections to the streams of all the
participants.

Foci are represented in three places:

- `focus_active` of the `m.rtc.member` state event - specifies the algorithm that defines how to choose a Focus for this member.
- `foci_preferred` of the `m.rtc.member` state event - specifies the input data for this algorithm contributed by this member.
- `m.rtc_foci` of the `.well-known/matrix/client` - specifies the list of available Foci for the homeserver.

The `focus_active` algorithm needs to be designed so that all participants converge on the same SFU/Focus.

The following Focus `type` values are defined:

- `livekit` - a backend using the [LiveKit](https://livekit.io/) SFU as described in
  [MSC4195](https://github.com/matrix-org/matrix-spec-proposals/pull/4195).
- `full_mesh` - a backend using a full-mesh approach based on [MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401).

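One way to meet the convergence requirement is a rule that every client evaluates over the same member events and that never depends on local configuration. The Python sketch below follows the first `foci_preferred` entry of the oldest membership, breaking ties by state key. This rule is an illustrative assumption only; the actual selection algorithms belong to each Focus type's MSC. The input is assumed to be the members of a single session, and a client that does not support the returned Focus `type` simply cannot join.

```python
from typing import Any, Optional

Event = dict[str, Any]


def created_ts(event: Event) -> int:
    value = event.get("content", {}).get("created_ts")
    return value if value is not None else event["origin_server_ts"]


def choose_focus(session_members: list[Event]) -> Optional[dict[str, Any]]:
    """Deterministically pick one Focus from the members of a single session.

    Illustrative rule: follow the first preferred Focus of the oldest membership,
    with the state key as a tie-breaker so every client picks the same entry.
    """
    candidates = [e for e in session_members if e.get("content", {}).get("foci_preferred")]
    if not candidates:
        return None  # nobody to follow yet: fall back to our own preferences
    oldest = min(candidates, key=lambda e: (created_ts(e), e["state_key"]))
    return oldest["content"]["foci_preferred"][0]
```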
#### Choosing the value of `foci_preferred` for the `m.rtc.member` state event

At some point session participants have to decide/propose which Focus they will use.

The method for choosing the contents of the `foci_preferred` field of the `m.rtc.member` event
can differ depending on the Focus type and application.

There are three guidelines which should be obeyed by a client when building the `foci_preferred` list:

1. It is always desired to have as few Focus switches as possible.

   If there are other participants in the session (i.e. other `m.rtc.member` events), the client should calculate which Focus it should connect to
   based on the `m.rtc.member` events of the existing participants.
   This should happen reactively on each `m.rtc.member` state event change.
   Each MatrixRTC frontend is responsible for gracefully handling Focus switches caused by changing state. This is part of the design of MatrixRTC and a requirement for an eventually consistent distributed system.

   The calculated Focus should then be present at the start of the `foci_preferred` list.

2. The client should look up the suggested Foci from the homeserver `.well-known/matrix/client` as defined below.

   MatrixRTC is designed around the same culture that makes Matrix possible: a large amount of infrastructure in the form of homeservers is provided by the users.

   To achieve a stable and healthy ecosystem, backend RTC infrastructure should be thought of as a part of a homeserver.

   It is very similar to a TURN server: mostly traffic and little CPU load.

   To avoid a world where every user relies on one central SFU, and instead split the traffic
   over multiple SFUs, it is important that we leverage the SFU distribution across the
   homeserver federation.

   These proposals from **your own** homeserver should come next in the `foci_preferred` list of the member event.

3. Clients should not use a hard-coded Focus.

   Sourcing the preferred Foci from the client is harmful to a federated system. If the majority of users
   decide to use the same client, all of those users will use one Focus. This destroys the passive security mechanism
   that each instance is not an interesting attack vector, since it is only a fraction of the network.
   Additionally, it will result in poor performance if every user on Matrix used the same Focus.

   However, there are cases where this is acceptable:

   - Transitioning to MatrixRTC. Here it might be beneficial to have a client with a fallback Focus,
     so that calls also work with homeservers not supporting it.
   - For testing purposes, where a different Focus should be tested but one does not want to touch the `.well-known`.
   - For custom deployments that benefit from having the Focus configuration on a per-client basis instead of per homeserver.

   Therefore, if a client does use a hard-coded Focus, it should come last in the `foci_preferred` list.

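The three guidelines translate into a simple ordering, sketched below in Python with a hypothetical `build_foci_preferred` helper: the Focus calculated from existing participants first, then the homeserver's `.well-known` Foci, then any hard-coded fallback last. Dropping duplicates while keeping the first position is our own choice, so that a Focus advertised in several places keeps its highest priority.

```python
from typing import Any, Optional

Focus = dict[str, Any]


def build_foci_preferred(session_focus: Optional[Focus], well_known_foci: list[Focus],
                         hard_coded: Optional[list[Focus]] = None) -> list[Focus]:
    """Order: calculated session Focus, own homeserver Foci, hard-coded fallback."""
    ordered = ([session_focus] if session_focus else []) + well_known_foci + (hard_coded or [])
    result: list[Focus] = []
    for focus in ordered:
        if focus not in result:  # drop duplicates, keep the first (highest-priority) position
            result.append(focus)
    return result
```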
#### Discovery of Foci using `.well-known/matrix/client`

We use an `m.rtc_foci` key in the homeserver `.well-known/matrix/client` that can be used to expose
a sorted (by priority) list of Focus description objects.

For example, in generic form:

```json5
{
  "m.rtc_foci": [
    {
      "type": "some-focus-type",
      "additional-type-specific-field": "https://my_focus.domain",
      "another-additional-type-specific-field": ["with", "Array", "type"]
    }
  ]
}
```

Or a concrete example for a `livekit` Focus:

```json5
{
  "m.rtc_foci": [
    {
      "type": "livekit",
      "livekit_service_url": "https://livekit-jwt.call.element.io"
    }
  ]
}
```

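Reading the Foci out of a fetched `.well-known/matrix/client` body could look like the Python sketch below; `foci_from_well_known` is a hypothetical name and the HTTP fetch itself is omitted. It falls back to the unstable key `org.matrix.msc4158.rtc_foci` listed under "Unstable prefix", keeps the server's priority order, and drops entries whose `type` the client does not support.

```python
import json
from typing import Any

STABLE_KEY = "m.rtc_foci"
UNSTABLE_KEY = "org.matrix.msc4158.rtc_foci"  # see "Unstable prefix"


def foci_from_well_known(body: str, supported_types: set[str]) -> list[dict[str, Any]]:
    """Extract usable Focus objects, keeping the server's priority order."""
    data = json.loads(body)
    if not isinstance(data, dict):
        return []
    foci = data.get(STABLE_KEY, data.get(UNSTABLE_KEY, []))
    if not isinstance(foci, list):
        return []
    return [f for f in foci if isinstance(f, dict) and f.get("type") in supported_types]
```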
### The RTC application types

Each application type might have its own specification for how the different streams
are interpreted and even what Focus type to use. This makes this proposal extremely
flexible. A Jitsi conference could be added by introducing a new `application`
and a new Focus type, and would be MatrixRTC compatible. It would not be compatible
with applications that do not use the Jitsi Focus, but clients would know that there
is an ongoing session of unknown type and unknown Focus and could display/represent
this in the user interface.

To make it easy for clients to support different application types, the recommended
approach is to provide a Matrix widget for each application type. This way
client developers can use the widget as the first implementation if they want to
support this RTC application type.

Each application should get its own MSC in which all the additional
fields are explained and the communication with the possible Foci is
defined:

- `m.call` - voice and video conferencing described by [MSC4196](https://github.com/matrix-org/matrix-spec-proposals/pull/4196).

#### Interoperability between applications

There is a use case in which an `m.call` app might want to participate in a session of type (application) `custom-call-with-more-features`. A native mobile Matrix client might support `m.call` and be at hand to join the feature-rich application/session.

There could be fallback mechanisms, but the most flexible approach is to treat it per application type. If it makes sense for an application type to fully conform to `m.call`, a client that can connect to an `m.call` RTC session (application) could claim that it is also compatible with `custom-call-with-more-features`. It is then the job of the `custom-call-with-more-features` session type (application) to define some kind of feature list, so that it can tell whether users are joining with an `m.call` client or a dedicated `custom-call-with-more-features` client.

### End-to-end encryption of media streams

We define how the key material is shared between the participants of the call to facilitate end-to-end encryption of the media streams.

The backend (e.g. LiveKit) MSC defines how the key material is actually used.

#### Shared password

A shared password that has been distributed to the participants ahead of time may be used to encrypt the media streams sent via the RTC backend.

For example, it could be in the query parameter of a private URL attached to a calendar invitation.

#### Per-participant sender key

A participant can share its chosen key with the other participants by sending them Matrix [to-device messages](https://spec.matrix.org/v1.11/client-server-api/#send-to-device-messaging).

The key is sent as an event of type `m.rtc.encryption_keys` in an encrypted to-device message.

The device ID that it is sent to is the `member`.`device_id` from the `m.rtc.member` events.

The event contains the following fields:

- `session` required object: The contents of the `session` from the `m.rtc.member` event.
- `member` required object: The contents of the `member` from the corresponding `m.rtc.member` event.
- `keys` required array of objects: The sender keys to be distributed to the participant:
  - `key` required string: The base64 encoded key material.
  - `index` required int: The index of the key to distinguish it from other keys. This must be between 0 and 255 inclusive.
    In some implementations of MatrixRTC this may correspond to the `keyID` field of the WebRTC [SFrame](https://www.w3.org/TR/webrtc-encoded-transform/#sframe) header.
  - `invalidates_key_index` optional int: The index of the key that is invalidated by this key. If this is set, the application should invalidate the key identified
    by `invalidates_key_index` once it receives a frame with the new `index`. This is to protect against an exfiltrated key being used to forge frames.
  - `invalidates_after_ms` optional int: The number of milliseconds after which the key identified by `invalidates_key_index` is invalidated by this key, even if no frames
    are received. Again, this is to protect against an exfiltrated key being used to forge frames.

Depending on the RTC application, additional fields may be added to this event.

An example to-device event:

```json5
// event type: "m.rtc.encryption_keys"
{
  "session": {
    "application": "m.call",
    "call_id": "",
    "scope": "m.room"
  },
  "member": {
    "id": "xyzABCDEF10123",
    "device_id": "DEVICEID",
    "user_id": "@user:matrix.domain"
  },
  "room_id": "!roomid:matrix.domain",
  "keys": [
    {
      "index": 10,
      "key": "base64encodedkey",
      "invalidates_key_index": 9,
      "invalidates_after_ms": 5000
    },
  ],
}
```

On receipt of the `m.rtc.encryption_keys` event, the application can associate the received key with the RTC session by matching the `session` and `member` contents with the corresponding `m.rtc.member` event.

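Matching a received key event to its membership and decoding the keys might be sketched as follows; both helper names are hypothetical. Two behaviours are our assumptions rather than rules stated above: the to-device `sender` must equal `member.user_id`, so one user cannot inject keys for another's membership, and unpadded base64 is tolerated in case senders use that form. Indices outside 0 to 255 are rejected as the field definition requires.

```python
import base64
from typing import Any, Optional

Event = dict[str, Any]


def match_key_event(sender: str, key_content: dict[str, Any],
                    member_events: list[Event]) -> Optional[Event]:
    """Find the m.rtc.member event the received keys belong to."""
    member = key_content.get("member") or {}
    if member.get("user_id") != sender:
        return None  # assumption: only accept keys for the sender's own membership
    for event in member_events:
        content = event.get("content", {})
        if content.get("session") == key_content.get("session") and content.get("member") == member:
            return event
    return None


def parse_keys(key_content: dict[str, Any]) -> dict[int, bytes]:
    """Decode `keys` into {index: key bytes}, enforcing the 0..255 index range."""
    keys: dict[int, bytes] = {}
    for entry in key_content["keys"]:
        index = entry["index"]
        if not isinstance(index, int) or not 0 <= index <= 255:
            raise ValueError(f"key index out of range: {index!r}")
        encoded = entry["key"]
        # Re-add any missing "=" padding before decoding.
        keys[index] = base64.b64decode(encoded + "=" * (-len(encoded) % 4))
    return keys
```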
When the application joins the session, it should send the key to all the existing participants.

To ensure forward secrecy and post-compromise security, the key material should be rotated (i.e. a new key generated) when a participant joins or leaves the session.

Key rotation is done as follows:

- The sending application generates the new key material for the participant.
- The sending application sends the new key material to all the participants with a new `index` value and `invalidates_key_index` set to the current `index`.
- The receiving application stores the new key material for the specified `index`.
- The sending application continues to use the old/current key to encrypt media.
- The sending application waits for a period of time. The default should be 3 seconds.
  It is possible to override this on a per-application basis in case an application has specific requirements on security or wants to minimize missed stream data.
  Negotiation approaches can also be defined, where the RTC application uses data channels to communicate whether everyone has received the next key.
- The sending application starts to use the new key to encrypt media.
- The receiving application invalidates the existing key with the `invalidates_key_index` value.

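The rotation steps can be modelled with two small Python classes, shown below as a sketch rather than a reference implementation. `SenderKeys` and `ReceiverKeys` are hypothetical names; the 16-byte key length and the wrap-around from index 255 to 0 are our assumptions, and the `invalidates_after_ms` timer is not modelled. The receiver retires the old key only when the first frame under the new `index` arrives, matching the last step above.

```python
import base64
import os
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_ROTATION_DELAY_MS = 3000  # "The default should be 3 seconds."


@dataclass
class SenderKeys:
    index: int
    key: bytes
    pending: Optional[tuple[int, bytes]] = None

    def start_rotation(self) -> dict:
        """Generate a new key and return the `keys` entry to send to every participant."""
        new_index = (self.index + 1) % 256  # wrap-around is an assumption
        new_key = os.urandom(16)            # key length is an assumption
        self.pending = (new_index, new_key)
        return {
            "index": new_index,
            "key": base64.b64encode(new_key).decode("ascii"),
            "invalidates_key_index": self.index,
        }

    def finish_rotation(self) -> None:
        """Call DEFAULT_ROTATION_DELAY_MS after start_rotation: switch media to the new key."""
        if self.pending is not None:
            self.index, self.key = self.pending
            self.pending = None


@dataclass
class ReceiverKeys:
    keys: dict[int, bytes] = field(default_factory=dict)
    retire_on_use: dict[int, int] = field(default_factory=dict)  # new index -> old index

    def on_keys_entry(self, index: int, key: bytes, invalidates: Optional[int] = None) -> None:
        self.keys[index] = key  # the old key stays valid until the new one is used
        if invalidates is not None:
            self.retire_on_use[index] = invalidates

    def key_for_frame(self, index: int) -> Optional[bytes]:
        old = self.retire_on_use.pop(index, None)
        if old is not None:
            self.keys.pop(old, None)  # first frame under the new index retires the old key
        return self.keys.get(index)
```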
### Discovery/negotiation of application types

Problem: If a user wants to make a call to a user or room, which call/application options should the client present to the user?

This should also take account of non-MatrixRTC calling: legacy 1:1 VoIP, room state widget for Jitsi.

TODO: write up notes.

## Potential issues

## Alternatives

### One state event per user

[MSC3401](https://github.com/matrix-org/matrix-spec-proposals/pull/3401) proposed to have one state event per user, with that state event containing an array of memberships.

This introduces two problems:

- Potential inconsistency where one of the user's devices overwrites the state of another device during a concurrent update.
- When handling client disconnects, the MSC3757 proposal could not be used, as you would not know what the correct
  state is at the time of the disconnect.

### One state event per device

This would mean no longer using `member`.`id` in the state key. Race conditions can be solved by the client, which would need to manage multiple sessions at once.

### A separate system not associated with Matrix accounts

This MSC proposes to combine the MatrixRTC backend infrastructure with the homeserver.
Other places the backend could be sourced from are:

- A separate system not associated with Matrix accounts
  (you would need a Matrix account + a "LiveKit provider" account, for example).
- The client could bring its own backend link.
- A centralized solution.

A centralized solution would not fit Matrix. A separate system would match the distributed
nature of Matrix but would not match the user experience goals for MatrixRTC calls.

Having the client define the SFU that is used is the current solution. The issue with this is that clients
are in general less distributed than homeservers: a large percentage of users use only a limited set of clients.
Using clients as the source for the infrastructure would result in just a handful of very large infrastructure
hosts.
This is harder to scale, and it is harder to justify who is covering the costs. (For Matrix homeservers this
is an already solved problem: individuals, communities and institutions have their own individual
solutions and answers for how and why they provide the infrastructure.)

### `m.rtc.encryption_keys` room event

Earlier iterations of this MSC used an encrypted `m.rtc.encryption_keys` room event to distribute the per-participant sender keys.

Whilst this reduces traffic by only needing to send one event per participant, this approach does not allow for perfect forward secrecy,
as the keys are stored in the room history.

The encrypted content of the `m.rtc.encryption_keys` event was as follows:

```json5
{
  "session": {
    "application": "m.call",
    "call_id": ""
  },
  "member": {
    "id": "xyzABCDEF10123",
    "device_id": "DEVICEID",
    "user_id": "@user:matrix.domain"
  },
  "keys": [
    {
      "index": 0,
      "key": "base64encodedkey"
    },
  ],
}
```

## Security considerations

### Discoverability of infrastructure

The `.well-known/matrix/client` is publicly readable, hence everyone can read and know
about the infrastructure, which could lead to resource "stealing".
However, each infrastructure has its own authentication mechanism defined in the infrastructure specification.
Those mechanisms can, for instance, use a service that interacts with the homeserver to decide whether to allow users
to use the infrastructure.

This is defined in the respective infrastructure MSC.

### Forward secrecy for end-to-end encryption of media streams

The considerations to ensure forward secrecy are described in the [End-to-end encryption of media streams](#end-to-end-encryption-of-media-streams)
section above.

### End-to-end media encryption key rotation lag

The proposed key rotation semantics do mean that a participant could continue to decrypt media that was sent during the rotation delay (three seconds by default) after
leaving the session.

## Unstable prefix

Use `org.matrix.msc3401.call.member` as the state event type in place of `m.rtc.member`.

For discovery via `.well-known/matrix/client` the prefix `org.matrix.msc4158.rtc_foci` is used in place of `m.rtc_foci`.

Use `io.element.call.encryption_keys` in place of the `m.rtc.encryption_keys` room event and to-device event types.

## Dependencies

This proposal depends on
[MSC3757: Restricting who can overwrite a state event](https://github.com/matrix-org/matrix-spec-proposals/pull/3757)
to provide access control for the decentralised management of call membership state. However, an alternative such
as [MSC3779: "Owned" State Events](https://github.com/matrix-org/matrix-spec-proposals/pull/3779) could be used instead with
some adaptations.

This proposal also depends on [MSC4140: Cancellable delayed events](https://github.com/matrix-org/matrix-spec-proposals/pull/4140)
to provide a mechanism for clients to ensure that they can update the room state even if they lose connection.