matrix.org/content/blog/2021/06/2021-06-14-adventures-in-fu...

427 lines
22 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

+++
title = "Adventures in fuzzing libolm"
path = "/blog/2021/06/14/adventures-in-fuzzing-libolm"
[taxonomies]
author = ["Denis Kasak"]
category = ["Security", "Research"]
+++
## Introduction
Hi all! My name is Denis and I'm a [security
researcher](https://dkasak.github.io/). Six months ago, I started working for
Element on doing dedicated security research on important Matrix projects.
After some initial focus on Synapse, I decided to take a closer look at
[libolm](https://gitlab.matrix.org/matrix-org/olm). In this entry, I'd like to
present an overview of that work, along with some early fruits that came out of
it.
**TL;DR: we found some bugs which had crept in since libolm's original audit in
2016, thanks to properly overhauling our fuzzing capability, and we'd like to
tell you all about it! The bugs were not easily exploitable (if at all), and
have already been fixed.**
**Update:
[CVE-2021-34813](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-34813)
has now been assigned to this.**
To give a bit of a background, libolm is a cryptographic library implementing
the [Double Ratchet
Algorithm](https://en.wikipedia.org/wiki/Double_Ratchet_Algorithm) pioneered by
Signal and it is the cryptographic workhorse behind Matrix. The classic
algorithm is called Olm in Matrix land, but libolm also implements Megolm which
is a variant for efficient encrypted group chats between many participants.
Since libolm is currently used in all Matrix clients supporting end-to-end
encryption, it makes for a particularly juicy target. The present state of
libolm's monopoly on Matrix encryption is somewhat unfortunate -- luckily there
are some exciting new developments on the horizon, such as the
[vodozemac](https://github.com/matrix-org/matrix-rust-sdk/tree/vodozemac-bench)
implementation in Rust. But for now, we're stuck with libolm.
To start, I decided to do a bit of fuzzing. libolm already had a fuzzing setup
using AFL, but it was written a while ago. The state of the art in fuzzing had
advanced quite rapidly in the last few years, so the setup was missing many
modern features and techniques. As an example, the fuzzing setup was configured
to use the now ancient afl-gcc coverage mode, which can be slower than the more
modern LLVM-based coverage by a factor of 2.
I also noticed that the fuzzing was done with non-hardened binaries (instead of
using something like ASAN), so many memory errors could've gone unnoticed.
There were also no corpora available from previous fuzzing runs and some of the
newer code was not covered by the harnesses.
## Preparation
I decided to tackle these one by one, adding ASAN and MSAN builds as a first
step. I took the opportunity to switch to [AFL++](https://aflplus.plus/) since
it is a drop-in replacement and contains numerous improvements, notably
improved coverage modes which are either much faster (e.g. LLVM-PCGUARD) or
guaranteed to have no collisions (LTO)[^1]. AFL++ also optimizes mutation
scheduling (by using scheduling algorithms from
[AFLFast](https://github.com/mboehme/aflfast)) and mutation operator selection
(through [MOpt](https://github.com/puppet-meteor/MOpt-AFL)). All of this makes
it much more efficient at discovering bugs.
After this, I changed the existing harnesses to use AFL's persistent mode
(which lowers process creation overhead and thus increases fuzzing
performance). This change, combined with the switch to a newer coverage mode,
increased the fuzzing exec/s from ~2.5k to ~5.5k on my machine, so this is not
an insignificant gain!
[^1]: In the context of fuzzing, collisions are situations where
two different execution paths appear to the fuzzer as the same one due to
technical limitations.
Classically, AFL tracks coverage by tracking so-called "edges" (or "tuples").
Edges are really pairs of (A, B), where A and B represent basic blocks. Each
edge is meant to represent a different execution "jump", but sometimes, as
the number of basic blocks in a program grows, two different execution paths
can end up being encoded as the same edge. LTO mode in AFL++ does some magic
so that this is guaranteed not to happen.
After this preparatory work, I generated a small initial corpus and ran a small
fleet of fuzzers with varying parameters. Almost immediately, I started getting
heaps of crashes. Luckily, after some investigation, these turned out not to be
serious bugs in the library but a double-free in the fuzzing harness! The
double-free only got triggered when the input was of size 0. It also only
happened with AFL++ and not vanilla AFL, presumably due to differences in input
trimming logic, which must be the reason no one noticed this earlier. I quickly
came up with a patch and resumed.
## The plot thickens
I let the fuzzers run for a while. Since ASAN introduces a bit of a performance
overhead, I only run a single AFL instance with ASAN variant of the binary.
This is okay because all fuzzer instances actually synchronize their findings,
which means every instance gets to see every input which increases coverage.
When I came back to check, there was another crash waiting. This time the
crashing input wasn't being generated continually so it looked much more
promising -- and only the ASAN instance was crashing. *A-ha*!
Running the offending input on the ASAN variant of the harness revealed it was
an invalid read one byte past the end of a heap buffer. The read was happening
in the base64 decoder:
```
./build/fuzzers/fuzz_group_decrypt_asan "" pickled-inbound-group-session.txt <input
=================================================================
==1838065==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xf4a00795 at pc 0x56560660 bp 0xffff9df8 sp 0xffff9de8
READ of size 1 at 0xf4a00795 thread T0
#0 0x5656065f in olm::decode_base64(unsigned char const*, unsigned int, unsigned char*) src/base64.cpp:124
#1 0x565607b5 in _olm_decode_base64 src/base64.cpp:165
#2 0x565d5a9e in olm_group_decrypt_max_plaintext_length src/inbound_group_session.c:304
#3 0x56558e75 in main fuzzers/fuzz_group_decrypt.cpp:46
#4 0xf7509a0c in __libc_start_main (/usr/lib32/libc.so.6+0x1ea0c)
#5 0x5655a0f4 in _start (/home/dkasak/code/olm/build/fuzzers/fuzz_group_decrypt_asan+0x50f4)
0xf4a00795 is located 0 bytes to the right of 5-byte region [0xf4a00790,0xf4a00795)
allocated by thread T0 here:
#0 0xf7a985c5 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x56558ce3 in main fuzzers/fuzz_group_decrypt.cpp:32
#2 0xf7509a0c in __libc_start_main (/usr/lib32/libc.so.6+0x1ea0c)
SUMMARY: AddressSanitizer: heap-buffer-overflow src/base64.cpp:124 in olm::decode_base64(unsigned char const*, unsigned int, unsigned char*)
Shadow bytes around the buggy address:
0x3e9400a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x3e9400b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x3e9400c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x3e9400d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x3e9400e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x3e9400f0: fa fa[05]fa fa fa 05 fa fa fa fa fa fa fa fa fa
0x3e940100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x3e940110: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x3e940120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x3e940130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x3e940140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1838065==ABORTING
```
Following the stack trace, I quickly pinpointed the root of the bug: the logic
of the decoder was subtly flawed, unconditionally accessing a remainder
byte[^2] in the base64 input which might not actually be there.
This occurs when the input is 1 (mod 4) in length, which can never happen in
a *valid* base64 payload, but of course we cannot assume all inputs are
necessarily valid payloads. Specifically, if the payload was not 0 (mod 4) in
length, the code was assuming it was at least 2 (mod 4) or more in length and
immediately read the second byte. This spurious byte was then incorporated into
the output value.
[^2]: By *remainder byte*, I mean bytes which are not part of
a group of 4. These can only occur at the end of a base64 payload and they're
the ones that get suffixed with padding in padded base64.
I examined the code in an attempt to find a way to have it leak more than
a single byte, but it was impossible. As it turned out, not even the full byte
of useful information was encoded into the output -- due to the way the byte is
encoded, only about 6 bits of useful information ended up in the output value.
Still, even a single leaked bit is too much in a cryptographic context. Could
we do some heap hacking so that something of interest is placed there and then
have it be leaked to us?
I next tracked down all call sites of the vulnerable function
`olm::decode_base64`. Most of them were immune to the problem since they were
preceded with calls to another function, `olm::decode_base64_length`, which
checks that the base64 payload is of legal length. This left me with only a few
potentially vulnerable call sites, so I examined where their base64 inputs come
from. Promisingly, two of them received input from other conversation
participants, but they either had no way of leaking the information back to the
attacker or they hardcoded the number of bytes to be processed, after ensuring
the input was of some minimum length. The output of the remaining function
[`olm_pk_decrypt`][olm_pk_decrypt] is never sent anywhere externally, so there
was again no way of leaking the data to the attacker.
[olm_pk_decrypt]:
<https://gitlab.matrix.org/matrix-org/olm/-/blob/0a7b6da9a0959694303d5b0724a09e63e8c217ab/src/pk.cpp#L352>
In conclusion, even though this invalid read is a valid bug, I was not able to
find a working exploit for it.
But wait a second! Something was still bothering me about `olm_pk_decrypt`.
It's a fairly complex function, *receives several string inputs from the
homeserver* and it itself isn't tested by any of the harnesses. Furthermore,
the reason I started looking at it in the first place is that it was missing
the `olm::decode_base64_length` check. Perhaps it warrants a closer look?
## It does
And sure enough, there was something amiss. As `olm_pk_decrypt` receives three
base64 inputs from the homeserver: the ciphertext to decrypt, an ephemeral
public key and a MAC. All three are eventually passed to `olm::decode_base64`
to be decoded. Yet there was only a single length check there, to ensure the
decrypted ciphertext would fit its output buffer. What would happen if the
server returned a public key that was longer than expected?
```c++
struct _olm_curve25519_public_key ephemeral;
olm::decode_base64(
(const uint8_t*)ephemeral_key, ephemeral_key_length,
(uint8_t *)ephemeral.public_key
);
```
As can be seen from the snippet, the decoded version of public key gets written
to `ephemeral.public_key`, which is an array allocated on the stack. If the
input is longer than expected, this will become a stack buffer overflow.
The purpose of `olm_pk_decrypt` is to decrypt secrets previously stored by
a Matrix device on the homeserver. The point of encryption is to prevent the
server from learning these secrets since they're supposed to be known only by
your own devices. One use case for this mechanism is to allow one of your
devices to store encrypted end-to-end encryption keys on the homeserver. Your
other devices can then retrieve those keys from the homeserver, making it
possible to view all of your private conversations on each of your devices.
I decided to go for an end-to-end test to confirm the bug is triggerable by
connecting with the latest Element Android from my test phone to my homeserver,
with [mitmproxy](https://mitmproxy.org/) sitting in between. This allowed me to
write a small mitmproxy script which intercepts HTTP calls fetching the E2E
encryption keys from the homeserver and modifies the response so that the key
is longer than expected.
```python
import json
from mitmproxy import ctx, http
def response(flow: http.HTTPFlow) -> None:
if ("/_matrix/client/unstable/room_keys/keys" in flow.request.pretty_url
and flow.request.method == "GET"):
response_body = flow.response.content.decode("utf-8")
response_json = json.loads(response_body)
rooms = response_json["rooms"]
room_id = list(rooms.keys())[0]
sessions = rooms[room_id]["sessions"]
session = list(sessions.keys())[0]
session_data = sessions[session]["session_data"]
ephemeral = session_data["ephemeral"]
ctx.log.info(f"Replacing ephemeral key '{ephemeral}' with '{ephemeral * 10}'")
session_data["ephemeral"] = ephemeral * 10
modified_body = json.dumps(response_json).encode("utf-8")
flow.response.content = modified_body
```
This longer value is then eventually passed by Element Android to libolm's
`olm_pk_decrypt`, which triggers the buffer overflow. With all of that in
place, I deleted the local encryption key backup on my device and asked for it
to be restored from the server:
```
F libc : stack corruption detected (-fstack-protector)
F libc : Fatal signal 6 (SIGABRT), code -6 (SI_TKILL) in tid 24517 (DefaultDispatch), pid 24459 (im.vector.app)
F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
F DEBUG : Build fingerprint: 'xiaomi/tissot/tissot_sprout:9/PKQ1.180917.001/V10.0.24.0.PDHMIXM:user/release-keys'
F DEBUG : Revision: '0'
F DEBUG : ABI: 'arm64'
F DEBUG : pid: 24459, tid: 24517, name: DefaultDispatch >>> im.vector.app <<<
F DEBUG : signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
F DEBUG : Abort message: 'stack corruption detected (-fstack-protector)'
F DEBUG : x0 0000000000000000 x1 0000000000005fc5 x2 0000000000000006 x3 0000000000000008
F DEBUG : x4 0000000000000000 x5 0000000000000000 x6 0000000000000000 x7 0000000000000030
F DEBUG : x8 0000000000000083 x9 7d545b4513138652 x10 0000000000000000 x11 fffffffc7ffffbdf
F DEBUG : x12 0000000000000001 x13 0000000060b0f2a9 x14 0022ed916fede200 x15 0000d925cd93f18f
F DEBUG : x16 00000079e741b2b0 x17 00000079e733c9d8 x18 0000000000000000 x19 0000000000005f8b
F DEBUG : x20 0000000000005fc5 x21 0000007940e3c400 x22 000000000000026b x23 00000000000001d0
F DEBUG : x24 000000000000002f x25 000000793d9653f0 x26 0000007948303368 x27 0000007945dd5588
F DEBUG : x28 00000000000001d0 x29 0000007945dd37d0
F DEBUG : sp 0000007945dd3790 lr 00000079e732e00c pc 00000079e732e034
```
### Impact
This vulnerability is a server-controlled stack buffer overflow in Matrix
clients supporting room key backup.
Of course, the largest fear stemming from any remotely controlled stack buffer
overflow is code execution. This is perhaps even doubly so in a cryptographic
library, where we have the additional worry of an attacker being able to leak
our dearly protected conversations.
The federated architecture of Matrix may be somewhat of a mitigating
circumstance in this case, since users are much more likely to know and trust
the homeserver owner, but we don't want to have to rely on this trust.
#### Native binaries
Luckily, on its own, this bug is not enough to successfully execute code on
native binaries. By default, libolm is compiled for all supported targets with
stack canaries (also called stack protectors or stack cookies), which are magic
values unknown to the attacker, placed just before the current function's frame
on the stack. This value is checked upon returning from the function -- if its
value is changed, the process aborts itself to prevent further damage. This is
evident from the `Abort message: 'stack corruption detected
(-fstack-protector)'` message above. Besides canaries, other system-level
protections exist to make exploiting bugs such as this harder, such as ASLR.
Therefore, to achieve remote code execution, an attacker would need to find
additional vulnerabilities which would allow him to exfiltrate the stack canary
and addresses of key memory locations from the system.
#### WASM
With WASM, the analysis is much more complicated due to its very different
memory and execution model. In WASM, the unmanaged stack is generally much more
vulnerable due to it missing support for stack canaries. This implies a stack
buffer overflow can not only overwrite the frame of the function in which the
overflow occurred but also *all parent frames*.
On the other hand, due to typed calls and much stronger control-flow integrity
techniques, it's much harder for the attacker to make the code do something
that is (maliciously) useful. Notably, return addresses live outside unmanaged
memory and are out of reach to the attacker. Because of this, the primary way
of influencing code execution is by manipulating `call_indirect` instructions
in such a way as to call.
The analysis of the impact of this bug on the WASM binary is thus left as an
exercise to the reader. If you're interested, the 2020 USENIX paper [Everything
Old is New Again: Binary Security of WebAssembly][everything-old-is-new-again]
is a great starting point.
[everything-old-is-new-again]:
<https://www.usenix.org/conference/usenixsecurity20/presentation/lehmann>
## The fix
Once the problems were identified, [the patches][patch1] were [rather
trivial][patch2] and the issues were promptly resolved. The first libolm
release that includes the fix is 3.2.3 which was released on 2021-05-25.
[patch1]:
<https://gitlab.matrix.org/matrix-org/olm/-/commit/e82f2601b0dc2a69d3167499ee969555267b7aa6>
[patch2]:
<https://gitlab.matrix.org/matrix-org/olm/-/commit/ccc0d122ee1b4d5e5ca4ec1432086be17d5f901b>
We reached out to all Matrix clients which were determined to be affected. The
Element client versions which first fix the issue are as follows:
- Element Web/Desktop: v1.7.29
- Element Android: v1.1.9
- Element iOS: v1.4.0
For the mobile clients, these versions are already available in their
respective application stores at the time of publishing this post. If you
haven't already, please upgrade.
## Future work
Even though the fuzzing setup is in a much better shape now (or rather will be,
since I still have some PRs to merge upstream), there's still a lot that can be
done to further improve it.
Right now, there are undoubtedly parts of the codebase that are not fuzzed
well. The reasons for this range from the obvious, like some parts of the code
simply not being called by any the existing harnesses, to more subtle ones such
as the fact that cryptographic operations form a nearly-insurmountable natural
barrier for naive fuzzing operations[^3]. Finally, some of
the existing harnesses accept additional parameters as command-line arguments,
meaning we would have to re-run the same harness with different values of those
parameters in order to reach full coverage of the code. This is suboptimal.
So the plan for future work is roughly as follows:
1. Write missing harnesses to cover more portions of the codebase.
2. Write starting corpus generators. These should generate believable, valid
input for each of the harnesses. For example, for the decryption harness, we
should generate a variety of encrypted messages: empty, short, long, text,
binary, etc.
3. Modify the harnesses so that their extra parameters are determined from the
fuzzed input. This will allow the fuzzer to vary these itself, which reduces
the importance of the human in the loop and makes it harder to forget some
combination.
4. Fuzz for some time until coverage stops increasing. The corpora generated
should be saved so that future fuzzing attempts can resume from an earlier
point so that this work is not wasted.
5. Use afl-cov to investigate which parts of the code are not covered well or
at all. This should inform us what further changes are needed.
6. Write intelligent, custom mutators. These will allow the fuzzer to take
a valid input and easily produce another valid input instead of only
corrupting it with a high probability.
7. Design harnesses which test for wanted semantic properties instead of only
memory errors.
[^3]: Classic fuzzers famously have a hard time
circumventing magic values and checksums, and cryptography is full of these.
This is further complicated by the fact that the double ratchet algorithm is
very stateful and depends on the two ratchets evolving in lockstep. This
means that even if, for example, the decryption harness is supplied with
a corpus of valid encrypted messages, the mutations done by the fuzzer would
only manage to produce corrupted versions of those messages which will fail
to decrypt, but it will \~never manage to produce a different *valid*
encrypted message.
It's very exciting that we're able to do full-time security research on Matrix
these days (thanks to Element's funding), and going forwards we'll publish any
interesting discoveries for the visibility and education of the whole Matrix
community. We'd also like to remind everyone that we run an official [Security
Disclosure Policy](https://matrix.org/security-disclosure-policy/) for
Matrix.org and we'd welcome other researchers to come join our Hall of Fame!
(And hopefully we will get more bounty programmes running in future.)