matrix-doc/proposals/3202-encrypted-appservices.md

8.8 KiB

MSC3202: Encrypted Appservices

Presently, appservices in Matrix are capable of attaching themselves to a homeserver for high-traffic bot-like usecases, such as bridging and operationally expensive bots. Traditionally, these appservices only work in unencrypted rooms due to not having enough context on the encryption state to actually function properly.

This MSC targets the missing bits to support encryption at the appservice level: other MSCs, such as MSC2409 and MSC2778 give appservices foundational pieces to get device IDs and to-device messages, as required by encryption.

Proposal

This proposal takes inspiration from MSC2409 by defining a new set of keys on the appservice /transactions endpoint, similar to sync:

{
  "events": [
    // as defined today
  ],
  "ephemeral": [
    // MSC2409
  ],
  "to_device": [
    // MSC2409
  ],
  "device_lists": {
    "changed": ["@alice:example.org"],
    "left": ["@bob:example.com"]
  },
  "device_one_time_keys_count": {
    "@_irc_bob:example.org": {
      "DEVICEID": {
        "curve25519": 10,
        "signed_curve25519": 20
      }
    }
  },
  "device_unused_fallback_key_types": {
    "@_irc_bob:example.org": {
      "DEVICEID": ["signed_curve25519"]
    }
  }
}

These fields are heavily inspired by the extensions to /sync in the client-server API.

All the new fields can be omitted if there are no changes for the appservice to handle. For device_one_time_keys_count and device_unused_fallback_key_types, the format is slightly different from the client-server API to better map the appservice's user namespace users to the counts. Users in the namespace without keys or which have unchanged keys since the last transaction can be omitted (more details on this later on). Note that fallback keys are described in MSC2732 as of writing.

Like MSC2409, any user the appservice would be considered "interested" in (user in the appservice's namespace, or sharing a room with an appservice user/namespaced room) would qualify for the device list changes section.

Note that it's typical for clients to pause sync loops when processing device list changes to avoid a scenario where they are unable to decrypt/encrypt a message from/to a particular device. Appservices are expected to mirror this by ensuring the transaction request does not complete until processing is complete. In the worst case, the server will time out the request and retry it verbatim, so appservices might wish to track which device list changes in which transaction they already processed or keep processing transactions in the background while retries are attempted.

In order to allow the appservice to masquerade as its users, an extension to the existing identity assertion ability is proposed. To compliment the (optional) user_id when using an as_token as an access token, a similarly optional device_id query parameter is proposed. When provided, the server asserts that the device ID is valid for the user, and that the appservice is able to masquerade as that user. If valid, that device ID should be assumed as being used for that request. For many requests, this means updating the "last seen IP" and "last seen timestamp" for the device, however for some endpoints it means interacting with that device (such as when uploading keys).

Optimization: when to send OTKs/fallback keys

As mentioned above, in order to keep the transaction byte size down the server can (and should) exclude OTK counts and unused fallback keys when they haven't changed since the last transaction. Appservices however should be tolerable of the server over-communicating the counts as a quick/cheap approach would be to always include the OTK counts/unused fallback keys for all known users rather than trying to detect changes.

As a middle ground, servers might be interested in an algorithm which doesn't detect changes between transactions but does attempt to reduce traffic. If the appservice is about to receive an event or message typically associated with encryption, the counts for the affected users could be included. This would result in the following rules:

  • If an m.room.encrypted event is being included in the transaction's events, include OTK counts and unused fallback key types for all appservice users which reside in that room.
  • If an appservice user is receiving a to-device message in the transaction's to_device array, include OTK counts and unused fallback key types for that user.

This approach has the advantage of typically minimal changes to the internals of the homeserver, works similar to /sync, and reduces noisy traffic in the transaction sending. This additionally still honours the "when they change, send the counts" requirement to a reasonable degree: typically a use of an OTK will be followed by a to-device message. It is theoretically possible for an appservice to run out of OTKs if a remote user claims all OTKs without actually using them. Implementations may be interested in MSC2732: Fallback keys which will avoid a scenario where the appservice can no longer decrypt messages.

However, as mentioned, servers are free to include this information as little or often as they'd like, provided they send it at least as often as when it changes.

Optimization: Don't encrypt as often

Appservices theoretically do not need to establish Olm sessions with other appservice users as the appservice will typically be managing the devices in one place. In short, this means that a room with 10k appservice users and only 1 non-appservice user can be sped up by only encrypting from the appservice's users to the non-appservice user. The appservice would not need to set up 10k * 10k Olm sessions given the encryption and decryption all happens in the same place. As an added bonus, this improves performance of the appservice as it doesn't have to handle to-device messages sent to itself.

Some implementations might not be able to support this sort of optimization though, so it is still permitted to establish sessions and such between appservice users.

Potential issues

Servers would have to track and send this information to appservices, however this is still perceived to be more performant than appservices using potentionally thousands of /sync streams.

Appservices additionally cannot opt-in (or out) of this functionality unlike with MSC2409. It is expected that servers will optimize for not including/calculating the fields if the appservice has no interest in the information. Specifically, appservices which don't have any keys under their user namespace can be assumed to not need device list changes and thus can be optimized out.

Alternatives

An endpoint for appservices to poll could work, though this is extra work for the appservice and would likely need pagination and such, which is all heavyweight for the server. Instead, having the server batch up updates and send them to the appservice is likely faster.

Security considerations

None relevant - this is the same information the appservice would get if it spawned /sync streams for all the users in its namespace.

Unstable prefix

While this MSC is not considered stable for implementation, implementations should use org.matrix.msc3202. as a prefix to the fields on the /transactions endpoint. For example:

  • device_lists becomes org.matrix.msc3202.device_lists
  • device_one_time_keys_count becomes org.matrix.msc3202.device_one_time_keys_count
  • device_unused_fallback_key_types becomes org.matrix.msc3202.device_unused_fallback_key_types

Appservices which support encryption but never see these fields (ie: server is not implementing this in an unstable capacity) should be fine, though encryption might not function properly for them. It is the responsibility of the appservice to try and recover safely and sanely, if desired, when the server is not implementing this in an unstable capacity. This is not a concern once the MSC becomes stable in a released version of the specification, as servers will be required to implement it.

For servers wishing to force appservices to opt-in to this behaviour, they may use org.matrix.msc3202: true in the registration file. Servers will be able to check for "opt-in" behaviour once this MSC is stable by seeing whether or not the appservice has an encryption-capable device recorded in its users namespaces.

To use device ID masquerading, implementations should use org.matrix.msc3202.device_id instead of device_id in the query string while this MSC is considered unstable.