# MSC3202: Encrypted Appservices
Presently, appservices in Matrix can attach themselves to a homeserver for high-traffic,
bot-like use cases such as bridging and operationally expensive bots. Traditionally, these appservices
only work in unencrypted rooms because they do not receive enough of the encryption state to
function properly.
This MSC targets the missing bits to support encryption at the appservice level: other MSCs, such as
[MSC2409](https://github.com/matrix-org/matrix-doc/pull/2409) and [MSC2778](https://github.com/matrix-org/matrix-doc/pull/2778)
give appservices foundational pieces to get device IDs and to-device messages, as required by encryption.
## Proposal
This proposal takes inspiration from [MSC2409](https://github.com/matrix-org/matrix-doc/pull/2409) by
defining a new set of keys on the appservice `/transactions` endpoint, similar to sync:
```json5
{
  "events": [
    // as defined today
  ],
  "ephemeral": [
    // MSC2409
  ],
  "to_device": [
    // MSC2409
  ],
  "device_lists": {
    "changed": ["@alice:example.org"],
    "left": ["@bob:example.com"]
  },
  "device_one_time_keys_count": {
    "@_irc_bob:example.org": {
      "DEVICEID": {
        "curve25519": 10,
        "signed_curve25519": 20
      }
    }
  },
  "device_unused_fallback_key_types": {
    "@_irc_bob:example.org": {
      "DEVICEID": ["signed_curve25519"]
    }
  }
}
```
These fields are heavily inspired by [the extensions to /sync](https://matrix.org/docs/spec/client_server/r0.6.1#id84)
in the client-server API.
All the new fields can be omitted if there are no changes for the appservice to handle. For
`device_one_time_keys_count` and `device_unused_fallback_key_types`, the format is slightly different
from the client-server API to better map the appservice's user namespace users to the counts. Users
in the namespace without keys or which have unchanged keys since the last transaction can be omitted
(more details on this later on). Note that fallback keys are described in
[MSC2732](https://github.com/matrix-org/matrix-doc/pull/2732) as of writing.
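
As an illustration of the shape described above, the following is a minimal sketch of how an appservice might read the new fields out of a transaction body. The function names are hypothetical and not part of this proposal. The sketch accepts both the stable field names and the `org.matrix.msc3202.` names from the "Unstable prefix" section, and treats an omitted field as "no change" rather than as an empty value.

```python
from typing import Any, Dict, Optional

# Prefix taken from the "Unstable prefix" section of this proposal.
UNSTABLE_PREFIX = "org.matrix.msc3202."
FIELDS = (
    "device_lists",
    "device_one_time_keys_count",
    "device_unused_fallback_key_types",
)


def extract_encryption_fields(txn: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the encryption-related fields present in a transaction.

    Hypothetical helper: fields the server omitted are left out of the
    result, because omission means there is nothing new to handle.
    """
    found: Dict[str, Any] = {}
    for name in FIELDS:
        if name in txn:
            found[name] = txn[name]
        elif UNSTABLE_PREFIX + name in txn:
            found[name] = txn[UNSTABLE_PREFIX + name]
    return found


def otk_counts_for(
    txn: Dict[str, Any], user_id: str, device_id: str
) -> Optional[Dict[str, int]]:
    """Look up OTK counts for one device, or None if the server sent none."""
    counts = extract_encryption_fields(txn).get("device_one_time_keys_count", {})
    return counts.get(user_id, {}).get(device_id)
```

A `None` result here only means the server did not report counts in this transaction; it should not be read as "zero keys remaining".
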
Like MSC2409, any user the appservice would be considered "interested" in (user in the appservice's
namespace, or sharing a room with an appservice user/namespaced room) would qualify for the device
list changes section.
Note that it's typical for clients to pause sync loops when processing device list changes to avoid
a scenario where they are unable to decrypt/encrypt a message from/to a particular device. Appservices
are expected to mirror this by ensuring the transaction request does not complete until processing
is complete. In the worst case, the server will time out the request and retry it verbatim, so
appservices might wish to track which device list changes in which transaction they already processed
or keep processing transactions in the background while retries are attempted.
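
One way an appservice could track which transactions it has already processed is sketched below. It keys on the transaction ID from the existing `/transactions/{txnId}` path, so a verbatim retry does not reprocess the same device list changes. The class name and the in-memory set are assumptions for illustration only; a real appservice would likely persist this state.

```python
from typing import Any, Callable, Dict, Set


class TransactionTracker:
    """Hypothetical helper that processes each transaction ID at most once."""

    def __init__(self, on_device_lists: Callable[[Dict[str, Any]], None]) -> None:
        self._on_device_lists = on_device_lists
        self._processed: Set[str] = set()

    def handle(self, txn_id: str, body: Dict[str, Any]) -> bool:
        """Return True if the transaction was processed now, False on a retry."""
        if txn_id in self._processed:
            # The server retried a transaction we already finished.
            return False
        # Block until device list changes are handled, mirroring how clients
        # pause their sync loop.
        self._on_device_lists(body.get("device_lists", {}))
        # Only mark as done after processing succeeded, so that a failure
        # leaves the transaction eligible for the server's retry.
        self._processed.add(txn_id)
        return True
```
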
In order to allow the appservice to masquerade as its users, an extension to the existing
[identity assertion](https://matrix.org/docs/spec/application_service/r0.1.2#identity-assertion)
ability is proposed. To complement the (optional) `user_id` when using an `as_token` as an access
token, a similarly optional `device_id` query parameter is proposed. When provided, the server asserts
that the device ID is valid for the user, and that the appservice is able to masquerade as that user.
If valid, that device ID should be assumed as being used for that request. For many requests, this
means updating the "last seen IP" and "last seen timestamp" for the device. For some endpoints, however,
it means interacting with that device (such as when uploading keys).
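
For illustration, a request URL using device ID masquerading could be built roughly as follows. This is a sketch only: `masquerade_url` is a hypothetical helper, the homeserver URL and endpoint path are example values, and the `as_token` is assumed to be supplied separately as the access token for the request.

```python
from urllib.parse import urlencode


def masquerade_url(
    base_url: str, path: str, user_id: str, device_id: str, unstable: bool = True
) -> str:
    """Build a client-server API URL that asserts both a user and a device.

    While the MSC is unstable, the prefixed parameter name from the
    "Unstable prefix" section is used instead of `device_id`.
    """
    device_param = "org.matrix.msc3202.device_id" if unstable else "device_id"
    query = urlencode({"user_id": user_id, device_param: device_id})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Example values only; the appservice would send its as_token alongside this.
url = masquerade_url(
    "https://hs.example.org",
    "/_matrix/client/r0/keys/upload",
    "@_irc_bob:example.org",
    "DEVICEID",
)
```
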
### Optimization: when to send OTKs/fallback keys
As mentioned above, to keep the transaction byte size down the server can (and should) exclude
OTK counts and unused fallback keys when they haven't changed since the last transaction. Appservices
should, however, be tolerant of the server over-communicating the counts, as a quick/cheap approach would
be to *always* include the OTK counts/unused fallback keys for all known users rather than trying to
detect changes.
As a middle ground, servers might be interested in an algorithm which doesn't detect changes between
transactions but does attempt to reduce traffic. If the appservice is about to receive an event or
message typically associated with encryption, the counts for the affected users could be included. This
would result in the following rules:
* If an `m.room.encrypted` event is being included in the transaction's `events`, include OTK counts and
unused fallback key types for all appservice users which reside in that room.
* If an appservice user is receiving a to-device message in the transaction's `to_device` array, include
OTK counts and unused fallback key types for that user.
This approach has the advantage of typically minimal changes to the internals of the homeserver, works
similarly to `/sync`, and reduces noisy traffic in the transaction sending. This additionally still honours
the "when they change, send the counts" requirement to a reasonable degree: typically a use of an OTK will
be followed by a to-device message. It is theoretically possible for an appservice to run out of OTKs if
a remote user claims all OTKs without actually using them. Implementations may be interested in
[MSC2732: Fallback keys](https://github.com/matrix-org/matrix-doc/pull/2732) which will avoid a scenario
where the appservice can no longer decrypt messages.
However, as mentioned, servers are free to include this information as little or often as they'd like,
provided they send it at least as often as when it changes.
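
The middle-ground rules in this section could be sketched on the server side as shown below. This is not taken from any homeserver implementation: the function name, the `room_members` mapping, and the `is_appservice_user` callback are hypothetical, and the `to_user_id` field on to-device messages is assumed from MSC2409's format.

```python
from typing import Callable, Dict, Iterable, List, Set


def users_needing_key_counts(
    events: List[dict],
    to_device: List[dict],
    room_members: Dict[str, Iterable[str]],
    is_appservice_user: Callable[[str], bool],
) -> Set[str]:
    """Pick the appservice users whose OTK counts and unused fallback key
    types should be attached to this transaction."""
    users: Set[str] = set()

    # Rule 1: an encrypted event pulls in every appservice user in that room.
    for event in events:
        if event.get("type") != "m.room.encrypted":
            continue
        for member in room_members.get(event.get("room_id", ""), ()):
            if is_appservice_user(member):
                users.add(member)

    # Rule 2: a to-device message pulls in its recipient.
    # `to_user_id` is an assumed field name (see lead-in).
    for message in to_device:
        recipient = message.get("to_user_id")
        if recipient and is_appservice_user(recipient):
            users.add(recipient)

    return users
```

An empty result means the server can omit both fields from the transaction under this heuristic.
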
### Optimization: Don't encrypt as often
Appservices theoretically do not need to establish Olm sessions with other appservice users as the appservice
will typically be managing the devices in one place. In short, this means that a room with 10k appservice
users and only 1 non-appservice user can be sped up by only encrypting from the appservice's users to the
non-appservice user. The appservice would not need to set up 10k * 10k Olm sessions given the encryption
and decryption all happens in the same place. As an added bonus, this improves performance of the appservice
as it doesn't have to handle to-device messages sent to itself.
Some implementations might not be able to support this sort of optimization though, so it is still permitted
to establish sessions and such between appservice users.
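
A minimal sketch of this optimization is to filter the appservice's own namespace out of the recipient list before setting up Olm sessions. The helper name is hypothetical, and requiring the namespace regex to match the whole user ID is an assumption made for this example.

```python
import re
from typing import Iterable, List


def olm_recipients(room_members: Iterable[str], namespace_regexes: Iterable[str]) -> List[str]:
    """Return the room members that still need Olm sessions.

    Users matching the appservice's own user namespace are skipped, since
    the appservice handles their encryption and decryption in one place.
    """
    patterns = [re.compile(regex) for regex in namespace_regexes]
    return [
        user_id
        for user_id in room_members
        # Assumption: a namespace regex must match the entire user ID.
        if not any(pattern.fullmatch(user_id) for pattern in patterns)
    ]
```

In a room made up almost entirely of appservice users, this leaves only the non-appservice users as recipients.
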
## Potential issues
Servers would have to track and send this information to appservices. However, this is still perceived
to be more performant than appservices using potentially thousands of `/sync` streams.
Appservices additionally cannot opt-in (or out) of this functionality unlike with MSC2409. It is
expected that servers will optimize for not including/calculating the fields if the appservice has
no interest in the information. Specifically, appservices which don't have any keys under their user
namespace can be assumed to not need device list changes and thus can be optimized out.
## Alternatives
An endpoint for appservices to poll could work, though this is extra work for the appservice and would
likely need pagination and such, which is all heavyweight for the server. Instead, having the server
batch up updates and send them to the appservice is likely faster.
## Security considerations
None relevant - this is the same information the appservice would get if it spawned `/sync` streams for
all the users in its namespace.
## Unstable prefix
While this MSC is not considered stable for implementation, implementations should use `org.matrix.msc3202.`
as a prefix to the fields on the `/transactions` endpoint. For example:
* `device_lists` becomes `org.matrix.msc3202.device_lists`
* `device_one_time_keys_count` becomes `org.matrix.msc3202.device_one_time_keys_count`
* `device_unused_fallback_key_types` becomes `org.matrix.msc3202.device_unused_fallback_key_types`
Appservices which support encryption but never see these fields (i.e. the server is not implementing this in an
unstable capacity) should be fine, though encryption might not function properly for them. It is the
responsibility of the appservice to try and recover safely and sanely, if desired, in that case.
This is not a concern once the MSC becomes stable in a released
version of the specification, as servers will be required to implement it.
For servers wishing to force appservices to opt in to this behaviour, they may use `org.matrix.msc3202: true`
in the registration file. Servers will be able to check for "opt-in" behaviour once this MSC is stable by
seeing whether or not the appservice has an encryption-capable device recorded in its users namespaces.
To use device ID masquerading, implementations should use `org.matrix.msc3202.device_id` instead of `device_id`
in the query string while this MSC is considered unstable.