Hey! I'm David, a security engineer at the Blockchain team of Facebook, previously a security consultant for the Cryptography Services of NCC Group. I'm also the author of the Real World Cryptography book. This is my blog about cryptography and security and other related topics that I find interesting.
I've talked about the SHA-3 standard FIPS 202 quite a lot, but haven't talked too much about the second function the standard introduces: SHAKE.
SHAKE is not a hash function, but an Extendable Output Function (or XOF). It behaves like a normal hash function except for the fact that it produces an “infinite” output. So you could decide to generate an output of one million bytes or an output of one byte. Obviously don't do the one byte output thing because it's not really secure. The other particularity of SHAKE is that it uses saner parameters that allow it to achieve the desired targets of 128-bit (for SHAKE128) or 256-bit (for SHAKE256) for security.
This makes it a faster alternative than SHA-3 while being a more flexible and versatile function.
SHAKE is intriguing enough that just a year following the standardization of SHA-3 (2016) another standard is released from the NIST's factory: Special Publication 800-185. Inside of it a new customizable version of SHAKE (named cSHAKE) is defined, the novelty: it takes an additional "customization string" as argument. This string can be anything from an empty string to the name of your protocol, but the slightest change will produce entirely different outputs for the same inputs. This
customization string is mostly used as domain separation for the other functions defined in the new document: KMAC, TupleHash and ParallelHash. The rest of this blogpost explains what these new functions are for.
Imagine that you want to send a message to your good friend Bob. You do not care about encrypting your message, but to make sure that nobody modifies the message in transit, you hash it with SHA-256 (the variant of SHA-2 with an output length of 256-bit) and append the hash to the message you're sending.
message || SHA-256(message)
On the other side, Bob detaches the last 256-bit of the message (the hash), and computes SHA-256 himself on the message. If the obtained result is different from the received hash, Bob will know that someone has modified the message.
Does this work? Is this secure?
Of course not, I hope you know that. A hash function is public, there are no secrets involved, someone who can modify the message can also recompute the hash and replace the original one with the new one.
Alright, so you might think that doing the following might work then:
message || SHA-256(key || message)
Both you and Bob now share that symmetric key which should prevent any man-in-the-middle attacker to recompute that hash.
Do you really think this is working?
Nope it doesn't. The reason, not always known, is that SHA-256 (and most variants of SHA-2) are vulnerable to what is called a length extension attack.
You see, unlike the sponge construction that releases just a part of its state as final output, SHA-256 is based on the Merkle–Damgård construction which outputs the entirity of its state as final output. If an attacker observes that hash, and pretends that the absorption of the input hasn't finished, he can continue hashing and obtain the hash of message || more (pretty much, I'm omitting some details like padding). This would allow the attacker to add more stuff to the original message without being detected by Bob:
message || more || SHA-256(key || message || more)
Fortunately, every SHA-3 participants (including the SHA-3 winner) were required to be resistant to these kind of attacks. Thus, KMAC is a Message Authentication Code leveraging the resistance of SHA-3 to length-extension attacks. The construction HASH(key || message) is now possible and the simplified idea of KMAC is to perform the following computation:
KMAC also uses a trick to allow pre-computation of the keyed-state: it pads the key up to the block size of cSHAKE. For that reason I would recommend not to come up with your own SHAKE-based MAC construction but to just use KMAC if you need such a function.
TupleHash is a construction allowing you to hash a structure in an non-ambiguous way. In the following example, concatenating together the parts of an RSA public key allows you to obtain a fingerprint.
A malicious attacker could compute a second public key, using the bits
of the first one, that would compute to the same fingerprint.
Ways to fix this issue are to include the type and length of each
element, or just the length, which is what TupleHash does. Simplified,
the idea is to compute:
ParallelHash makes use of a tree hashing construction to allow
faster processing of big inputs and large files. The input is first
divided in several chunks of B bytes (where B is an argument of your
choice), each chunk is then separately hashed with
cSHAKE(custom_string=“”, . ) producing as many 256-bit output as
there are chunks. This step can be parallelized with SIMD instructions
or other techniques available on your architecture. Finally each output
is concatenated and hashed a final time with
cSHAKE(custom_string=“ParallelHash”, . ). Again, details have
1. The Strobe protocol framework is a framework to build symmetric protocols. It's all based on the SHA-3 permutation (keccak-f) and the duplex construction. Codebase is tiny (~1000LOC) and it can also be used to build simple cryptographic operations.
2. The Noise protocol framework is a framework to build things like TLS. It's very simple and flexible, and I believe a good TLS alternative for today.
Looking at the previous diagram representing the NX handshake pattern of Noise (where a client is not authenticated and a server sends its long-term static key as part of the handshake) I thought to myself: I can simplify this. For example, you can see:
an h value absorbing every messages being sent and received, and being used to authenticate the transcript at some points in the handshake.
a ck value being used to derive keys from the different key exchanges happening during the handshake.
These things can be simplified greatly by using Strobe to get rid of all the symmetric tricks, while at the same time getting rid of all the symmetric primitives in use (AES-GCM, SHA-256, HMAC and HKDF).
This is exactly how I came up with Disco, merging Noise and Strobe to simplify the former.
Here is the simplification I made of the previous diagram. We're using Strobe's functions like send_CLR, recv_CLR and AD to absorb messages being sent or received as well as the output of the different key exchanges. We're also using send_AEAD and recv_AEAD to encrypt/decrypt and authenticate the whole transcript up to this point (these functions don't exist in Strobe, but they are basically send/recv_ENC followed by send/recv_MAC).
You can see that everything looks suddenly much more simple to implement or understand. send_CLR, recv_CLR and AD are all functions that do the same thing: they XOR the input with the rate (public part) of our strobe state. It is so elegant that I made another diagram showing what is really happening in this diagram with Strobe. (Something that I obviously couldn't have done with AES-GCM, SHA-256, HMAC and HKDF.)
You can see two lines here in the StrobeState. The capacity (secret part) is on the left and the rate (public part) is on the right. Most things get absorbed by just XORing the input with the public part (of course if we reach the end of the public part, we would permute and start on a new block like we do for hashing with the sponge construction).
When we send or receive encrypted data, we also need to do a little dance and first permute the state to produce something based on all of the data we've previously absorbed (including outputs of diffie-hellman key exchanges). This output is random enough to allow us to encrypt (or decrypt) by just imitating one-time pads and stream ciphers: XORing the randomized public part with a plaintext (or a ciphertext).
Once this is done, the state is permuted again to generate a new series of random numbers (in the public part) which will be the authentication tag, allowing us to authenticate everything that was absorbed previously.
After that the state can be cloned and differentiated to allow both sides to encrypt data on different channels (unless they want to use the same channel by taking turns). Strobe functions can continue to be used to continuously encrypt/decrypt application data and authenticate the whole transcript (starting from the first handshake message to the last message sent or received).
I thought the idea was worth exploring, and so I wrote a specification and proposed it as an extension to Noise. You can read it here). Details are still being actively discussed on the Noise mailing list. Major points of contention seem to be that the Strobe functions used do not introduce intra-handshake forward-secrecy, and that the post-handshake API does not mirror the Noise's post-handshake API one (nonce-based) by default. The latter is on purpose to avoid having to setup nonces and keeping track of them if not needed (because messages are expected to arrive in order thanks to the transport protocol used underneath disco).
After all of that, I figured out that I would probably have to be the first one to implement Disco. So I went ahead and first implemented a Noise-based protocol in Golang (that I call NoisePlugAndPlay). I tested it with test vectors and other libraries to get a minimum amount of confidence in what I did, then I decided to implement Disco on top of it. The protocol I created is called libdisco.
It's more than just a protocol to encrypt communications though. Since I'm using Strobe, I can also make it a symmetric cryptographic library without adding much lines of code (100 wrapping lines of code to be exact).
Of course it's all experimental. I will not recommend anyone to use this in production.
Instead, play with it and appreciate the concepts. Down the line, this could really be the modern alternative to TLS we've been waiting for (of course I'm biased here). But the road is long and paved with issues that need better be fixed before entering a stable version.
This is a walk through of the Ethernaut capture-the-flag competition where each challenge was an ethereum smart contract you had to break.
I did this at 2am in a hotel room in Romania and ended up not finishing the last challenge because I took too long and didn't want to re-record that part. Basically what I was missing in my malicious contract: a function to withdraw tokens from the victim contract (it would have work since I had a huge amount of token via the attack). I figured I should still upload that as it might be useful to someone.
It is now a real protocol built from the Noise protocol framework!
Noise doesn't work right off-the-bat because it does not have a length field in its messages. This means that two problems can arise:
if a noise message is fragmented, the receiver will not be able to know and will not be able to parse the received fragments
if noise messages are received concatenated to one another, noise will interpret that as one big message and the integrity verification of the encrypted message (via GCM or Poly1305) will fail
In different words, without an indication of a length, noise cannot know where a message stops. Messages can get fragmented by middleboxes, and can get concatenated just because of latency or the way the other peer send its messages. In lab condition this might not be a problem, but in real life without a framing protocol below Noise things will fail quickly.
This is why by default, you can't implement the components of the Noise specification and expect it to work. Having said that, with this minimal addition of a length field things do work!
But that's not the only problem that the specification fails to tackle. The other problem is the authentication of static public keys.
You see, in Noise you have many different ways of doing handshakes (named Noise_XX, Noise_KN, ...), and some of them do not require one of the peer's authentication. Kind of like the typical browser ↔ webserver scenario where only the webserver will authenticate itself via a certificate (and perhaps the browser will later authenticate itself via credentials in a form). Some patterns that you should never use fail to authenticate both side (Noise_NN) and that is why I haven't implemented them. But for patterns that do authenticate one of the side (or both), problems arise: the Noise specification does not have any safeguards in its algorithms to prevent you from failing to authenticate the other side of the connection.
This means that if you implement Noise following the specification, patterns like Noise_XX where both sides require authentication will happily go through without caring about authenticating anything. This leads to trivial active man-in-the-middle attacks.
What I've done is that:
if the pattern in use have your peer send a static public key to the other one, I require you to provide a proof when setting up your peer. Think about a signature from an authoritative root signing key.
if the pattern in use have your peer receive a static public key from the other one, I require you to provide a callback function to verify that key, optionally using payloads sent during the handshake (for example, a certificate containing a signature).
To make things truly plug-and-play, I've created helper functions that let you generate an authoritative root signing key and create proofs or callbacks for a set of parameters. I've written some documentation here that should get you started in no time. If things are broken (this is a beta) or not clear please let me now.
And again, don't use this in production.
The biggest achievement of this implementation though is the fact that it is implementing the net.Conn interface of the Golang standard library. This mean that if you're already using networking code in your Go application, you can just replace your net.Conn or tls.Conn with noise.Conn and things will continue to work seamlessly.