matrix.org/content/blog/2022/03/2022-03-29-how-do-you-imple...

25 KiB
Raw Permalink Blame History

+++ title = "How do you implement interoperability in a DMA world?" date = "2022-03-29T17:57:10Z" updated = "2022-03-29T09:23:58Z" path = "/blog/2022/03/29/how-do-you-implement-interoperability-in-a-dma-world"

[taxonomies] author = ["Matthew Hodgson"] category = ["General"]

[extra] image = "https://matrix.org/blog/img/2022-03-29-chat-bubbles.jpg" +++

With last weeks revelation that the EU Digital Markets Act will require tech gatekeepers (companies valued at over $75B or with over $7.5B of turnover) to open their communication APIs for the purposes of interoperability, theres been a lot of speculation on what this could mean in practice, To try to ground the conversation a bit, weve had a go at outlining some concrete proposals for how it could work.

However, before we jump in, we should review how the DMA has come to pass.

Whats driven the DMA?

Todays gatekeepers all began with a great product, which got more and more popular until it grew to such a size that one of the biggest reasons to use the service is not necessarily the product any more, but the benefits of being able to talk to a large network of users. This rapidly becomes anti-competitive, however: the user becomes locked into the network and cant switch even if they want to. Even when people have a really good reason to move provider (e.g. WhatsApps terms of use changing to share user data with Facebook, Apple doing a 180 on end-to-end encrypting iCloud backups, or Telegram not actually being end-to-end encrypted), in practice hardly anything changes - because the users are socially obligated to keep using the service in order to talk to the rest of the users on it.

As a result, its literally harmful to the users. Even if a new service launches with a shiny new feature, there is enormous inertia that must be overcome for users to switch, thanks to the pull of their existing network - and even if they do, theyll just end up with their conversations haphazardly fragmented over multiple apps. This isnt accepted for email; it isnt accepted for the phone network; it isnt accepted on the Internet itself - and it shouldnt be accepted for messaging apps either.

Similarly: the closed networks of todays gatekeepers put a completely arbitrary limit on how users can extend and enrich their own conversations. On email, if you want to use a fancy new client like Superhuman - you can. If you want to hook up a digital assistant or translation service to help you wrangle your email - you can. If you want to hook up your emails to a CRM to help you run your business - you can. But with todays gatekeepers, you have literally no control: youre completely at the mercy of the service provider - and for something like WhatsApp or iMessage the options are limited at best.

Finally - all the users conversation metadata for that service (who talks to who, and when) ends up centralised in the gatekeepers databases, which then become an incredibly valuable and sensitive treasure trove, at risk of abuse. And if the service provider identifies users by phone number, the user is forced to disclose their phone number (a deeply sensitive personal identifier) to participate, whether they want to or not. Meanwhile the user is massively incentivised not to move away: effectively they are held hostage by the pull of the services network of users.

So, the DMA exists as a strategy to improve the situation for users and service providers alike by building a healthier dynamic ecosystem for communication apps; encouraging products to win by producing the best quality product rather than the biggest network. To quote Cédric O (Secretary of State for the Digital Sector of France), the strategy of the legislation came from Washington advice to address the anticompetitive behaviour of the gatekeepers “not by breaking them up… but by breaking them open.” By requiring the gatekeepers to open their APIs, the door has at last been opened to give users the option to pick whatever service they prefer to use, to choose who they trust with their data and control their conversations as they wish - without losing the ability to talk to their wider audience.

However, something as groundbreaking as this is never going to be completely straightforward. Of course while some basic use cases (i.e. non-E2EE chat) are easy to implement, they initially may not have a UX as smooth as a closed network which has ingested all your address book; and other use cases(eg E2EE support) may require some compromises at first. Its up to the industry to figure out how to make the most of that challenge, and how to do it in a way which minimises the impact on privacy - especially for end-to-end encrypted services.

What problems need to be solved?

Weve already written about this from a Matrix perspective, but to recap - the main challenge is the trade-off between interoperability and privacy for gatekeepers who provide end-to-end encryption, which at a rough estimate means: WhatsApp, iMessage, secret chats in Facebook Messenger, and Google Messages. The problem is that even with open APIs which correctly expose the end-to-end encrypted semantics (as DMA requires), the point where you interoperate with a different system inevitably means that youll have to re-encrypt the messages for that system, unless they speak precisely the same protocol - and by definition you end up trusting the different system to keep the messages safe. Therefore this increases the attack surface on the conversations, putting the end-to-end encryption at risk.

Alex Stamos (ex-CISO at Facebook) said that “WhatsApp rolling out mandatory end-to-end encryption was the largest improvement in communications privacy in human history and we agree. Guaranteed end-to-end encrypted conversations on WhatsApp is amazing, and should be protected at all costs. If users are talking to other users on WhatsApp (or any set of users communicating within the same E2EE messenger), E2EE should and must be maintained - and there is nothing in the DMA which says otherwise.

But what if the user consciously wants to prioritise interoperability over encryption? What if the user wants to hook their WhatsApp messages into a CRM, or run them through a machine translation service, or try to start a migration to an alternative service because they dont trust Meta? Should privacy really come so spectacularly at the expense of user freedom?

We also have the problem of figuring out how to reference users on other platforms. Say that I want to talk to a user with a given phone number, but theyre not on my platform - how do I locate them? What if my platform only knows about phone numbers, but youre trying to talk to a user on a platform which uses a different format for identifiers?

Finally, we have the problem of mitigating abuse: opening up APIs makes it easier for bad actors to try to spam or phish or otherwise abuse users within the gatekeepers. There are going to have to be changes in anti-abuse services/software, and some signals that the gatekeeper platforms currently use are going to go away or be less useful, but that doesn't mean the whole thing is intractable. It will require changes and innovative thinking, but weve been making steady progress (e.g. the work done by Elements trust and safety team). Meanwhile, the gatekeepers already have massive anti-abuse systems in place to handle the billions of users within their walled gardens, and unofficial APIs are already widespread: adding official APIs does not change the landscape significantly (assuming interoperability is implemented in such a way that the existing anti-abuse mechanisms still apply).

In the past, gatekeepers dismissed the effort of interop as not being worthwhile - after all, the default course of action is to build a walled garden, and having built one, the temptation is to try to trap as many users as possible. It was also not always clear that there were services worth interoperating with (thanks to the chilling effects of the gatekeepers themselves, in terms of stifling innovation for communication startups). Nowadays this situation has fundamentally changed however: there is a vibrant ecosystem of open communication startups out there, and a huge appetite to build a vibrant open ecosystem for interoperable communication, but like the open web itself.

What are the requirements?

Before going further in considering solutions, we need to review the actual requirements of the DMA. Our best understanding at this point is that the DMA will mandate that:

  • Gatekeepers will have to provide open and documented APIs to their services, on request, in order to facilitate interoperability (i.e. so that other services can communicate with their users).
  • These APIs must preserve the same level of end-to-end encryption (if any) to remote users as is available to local users.
  • This applies to 1:1 messaging and file transfer in the short term, and group messaging, file-transfer, 1:1 VoIP and group VoIP in the longer term.

So, what could this actually look like?

The DMA legislation deliberately doesnt focus on implementation, instead letting the industry figure out how this could actually work in practice. There are many different possible approaches, and so from our point of view as Matrix weve tried to sketch out some options to make the discussion more concrete. Please note these are preliminary thoughts, and are far from perfect - but hopefully useful as a starting point for discussion.

Finding Bob

Imagine that you have a user Alice on an existing gatekeeper, which well call AliceChat, who runs an E2EE messaging service which identifies users using phone numbers. Say that they want to start a 1-to-1 conversation with Bob, who doesnt use AliceChat, but Alice knows he is a keen user of BobChat. Today, youd have no choice but to send them an SMS and nag them to join AliceChat (sucks to be them if they dont want to use that service, or if theyre unable to for whatever reason - e.g. their platform isnt supported, or their government has blocked access, etc), or join BobChat yourself.


However, imagine if instead the gatekeeper app had a user experience where the app prompted you to talk to the user via a different platform instead. Itd be no different to your operating system prompting you to pick which app to use to open a given file extension (rather than the OS vendor hardcoding it to one of their own apps - another win for user rights led by the EU!).


Now, the simplest approach in the short term would be for each gatekeeper to pre-provision a set of options of possible alternative networks. (The DMA says that, on request, other service providers can ask to have access to the gatekeepers APIs for the purposes of interoperability, so the gatekeeper knows who the alternative networks may be). “Bob is not on AliceChat - do you want to try to reach him instead on BobChat, CharlieChat, DaveChat (etc)”.

Much like users can configure their preferred applications for file extensions in an operating system today, users would also be able to add their own preferred service providers - simply specifying their domain name.

Connecting to Bob

Now, AliceChat itself needs to figure out how to query the remote service provider to see if Bob actually exists there. Given the DMA requires that gatekeepers provide open APIs with the same level of security to remote users as their local ones using todays private APIs - and very deliberately doesnt mandate specific protocols for interoperability - they will need to locate a bridge which can connect to the other system.

In this thought experiment, the bridge used would be up to the destination provider. For instance, bobchat.com could announce that AliceChat users should connect to it via alicechat-bridge.bobchat.com using the AliceChat protocol(or matrix-bridge.bobchat.com via Matrix or xmpp-bridge.bobchat.com via XMPP) by a simple HTTP API or even a .well-known URL. Users might also be able to override the bridge used to connect to them (e.g. to point instead at a client-side bridge), and could sign the advertisement to prove that it hadnt been tampered with.

AliceChat would then connect to the discovered bridge using AliceChats vendor-specific newly opened API, and would then effectively treat Bob as if they were a real AliceChat user and client to all intents and purposes. In other words, Bob would effectively be a “ghost user” on AliceChat, and subject to all their existing anti-abuse mechanisms.

Meanwhile, the other side of the bridge converts through to whatever the target system is - be that XMPP, Matrix, a different proprietary API, etc. For Matrix, itd be chatting away to a homeserver via the Application Service API (using End-to-Bridge Encryption via MSC3202). Its also worth noting that the target might not even be a bridge - it could be a system which already natively speaks AliceChats end-to-end encrypted API, thus preserving end-to-end encryption without any need to re-encrypt. Its also worth noting that while historically bridges have had a bad reputation as being a second class (often a second class afterthought), Matrix has shown that by considering them as a first class citizen and really focusing on mapping the highest common denominator between services rather than lowest common denominator, its possible for them to work transparently in practice. Beeper is a great example of Matrix bridging being used for real in the wild (rather amusingly they just shipped emoji reactions for WhatsApp on iOS via their WhatsApp<->Matrix bridge before WhatsApp themselves did…)

Architecturally, it could look like this:

Or, more likely (given a dedicated bridge between two proprietary services would be a bit of a special case, and youd have to solve the dilemma of who hosts the bridge), both services could run a bridge to a common open standard protocol like Matrix or XMPP instead (thus immediately enabling interoperability with everyone else connected to that network):

Please note that while these examples show server-side bridges, in practice it would be infinitely preferable to use client-side bridges when connecting to E2EE services - meaning that decrypted message data would only ever be exposed on the client (which obviously has access to the decrypted data already). Client-side bridges are currently complicated by OS limits on background tasks and push notification semantics (on mobile, at least), but one could envisage a scenario where you install a little stub AliceChat client on your phone which auths you with AliceChat and then sits in the background receiving messages and bridging them through to Matrix or XMPP, like this:

Another possible architecture could be for the E2EE gatekeeper to expose their open APIs on the clients, rather than the server. DMA allows this, to the best of our knowledge - and would allow other apps on the device to access the message data locally (with appropriate authorisation, of course) - effectively doing a form of realtime data liberation from the closed service to an open system, looking something like this:

Finally, it's worth noting that when peer-to-peer decentralised protocols like P2P Matrix enter production, clientside bridges could bridge directly into a local communication server running on the handset - thus avoiding metadata being exposed on Matrix or XMPP servers providing a common language between the service providers.

Locating users

Now, the above describes the simplest and most naive directory lookup system imaginable - the problem of deciding which provider to use to connect to each user is shouldered by the users. This isnt that unreasonable - after all, users may have strong feelings about what providers to use to talk to a given user. Alice might be quite happy to talk to Bob via BobChat, but might be very deliberately avoiding talking to him on DaveChat, for whatever ominous reasons.

However, its likely in future we will also see other directory services appear in order to map phone numbers (or other identities) to providers - whether these piggyback on top of existing identity providers (gatekeepers, DNS, telcos, SSO providers, governments) or are decentralised through some other mechanism. For instance, Bob could send AliceChat a blinded proof that he authorises them to automatically route traffic to him over at BobChat, with BobChat maintaining a matching proof that Bob is who he claims to be (having gone through BobChats auth process) - and the proofs could be linked using a temporary key such that Bob doesnt even need to maintain a long-term one. (Thanks to James Monaghan for suggesting this one!)

Another alternative to having the user decide where to find each other could be to use a decentralised Keybase-style identity system to track verified mappings of identities (phone numbers, email addresses etc) through to service providers - perhaps something like IDX might fit the bill? While this decentralised identity lookups have historically been a hard problem, there is a lot of promising work happening in this space currently and the future looks promising.

Talking to Bob

Meanwhile, Alice still needs to talk to Bob. As already discussed, unless everyone speaks the same end-to-end encrypted protocol (be it Matrix, WhatsApp or anything else), we inevitably have a trade-off here between interoperability and privacy if Bob is not on the same system as Alice (assuming AliceChat is end-to-end encrypted) - and we will need to clearly warn Alice that the conversation is no longer end-to-end encrypted:


To be clear: right now, today, if Bob were on AliceChat, he could be copy-pasting all your messages into (say) Google Translate in a frantic effort to workaround the fact that his closed E2EE chat platform has no way to do machine translation. However, in a DMA world, Bob could legitimately loop a translation bot into the conversation… and Alice would be warned that the conversation was no longer secure (given the data is now being bridged over to Google).

This is a clear improvement in user experience and transparency. Likewise, if Im talking to a bridged user today on one of these platforms, I have no way of telling that they have chosen to prioritise interop over E2EE - which is frankly terrifying. If Im talking to someone on WhatsApp today I blindly assume that they are E2EE as they are on the same platform - and if theyre using an unofficial app or bridge, I have no way to tell. Whereas in a DMA world, you would expect the gatekeeper to transparently expose it.

If anything, this is good news for the gatekeeper in that it consciously advertises a big selling point for them: that for full E2EE, users need to talk to other users in the same walled garden (unless of course the platform speaks the same protocol). No more need for bus shelter adverts to remind everywhere that WhatsApp is E2EE - instead they can remind the user every time they talk to someone outside the walled garden!

Just to spell it out: the DMA does not require or encourage any reduction in end-to-end encryption for WhatsApp or similar: full end-to-end encryption will still be there for users in the same platform, including through to users on custom clients (assuming the gatekeeper doesnt flex and turn it off for other reasons).

Obviously, this flow only considers the simple case of Alice inviting Bob. The flow is of course symmetrical for Bob inviting Alice; AliceChat will need to advertise bridges which can be used to connect to its users. As Bob pops up from BobChat, the bridge would use AliceChats newly open APIs to provision a user for him, authing him as per any other user (thus ensuring that AliceChat doesnt need to have trusted BobChat to have authenticated the user). The bridge then sends/receives messages on Bobs behalf within AliceChat.

Group communication

This is all very well for 1:1 chats - which are the initial scope of the DMA. However, over the coming years, we expect group chats to also be in scope. The good news is that the same general architecture works for group chats too. We need a better source of identity though: AliceChat cant possibly independently authenticate all the new users which might be joining via group conversations on other servers (especially if they join indirectly via another server). This means adopting one of the decentralised identity lookup approaches outlined earlier to determine whether Charlie on CharlieChat is the real Charlie or an imposter.

Another problem which emerges with group chats which span multiple service providers is that of indirect routing, especially if the links between the providers use different protocols. What if AliceChat has a direct bridge to BobChat (a bit like AIM and ICQ both spoke OSCAR), BobChat and CharlieChat are connected by Matrix bridges, and AliceChat and CharlieChat are connected via XMPP bridges? We need a way for the bridges to decide who forwards traffic for each network, and who bridges the users for which network. If they were all on Matrix or XMPP this would happen automatically, but with mixed protocols wed probably have to extend the lookup protocol to establish a spanning tree for each conversation to prevent forwarding loops.

Heres a deliberately twisty example to illustrate the above thought experiment:

There is also a risk of bridge proliferation here - in the worst case, every service would have to source bridges to directly connect to every other service who came along, creating a nightmarish n-by-m problem. But in practice, we expect direct proprietary-to-proprietary bridges to be rare: instead, we already have open standard communication protocols like Matrix and XMPP which provide a common language between bridges - so in practice, you could just end up in a world where each service has to find a them-to-Matrix or them-to-XMPP bridge (which could be run by them, or whatever trusted party they delegate to).

Conclusion

A mesh of bridges which connect together the open APIs of proprietary vendors by converting them into open standards may seem unwieldy at first - but its precisely the sort of ductwork which links both phone networks and the Internet together in practice. As long as the bridging provides for highest common denominator fidelity at the best impedance ratio, then its conceptually no different to converting circuit switched phone calls to VoIP, or wired to wireless Ethernet, or any of the other bridges which we take entirely for granted in our lives thanks to their transparency.

Meanwhile, while this means a bit more user interface in the communication apps in order to select networks and warn about trustedness, the benefits to users are enormous as they put the user squarely back in control of their conversations. And the UX will improve as the tech evolves.

The bottom line is, we should not be scared of interoperability, just because weve grown used to a broken world where nothing can interconnect. There are tractable ways to solve it in a way that empowers and informs the user - and the DMA has now given the industry the opportunity to demonstrate that it can work.