david wong

Hey ! I'm David, a security consultant at Cryptography Services, the crypto team of NCC Group . This is my blog about cryptography and security and other related topics that I find interesting.

How to obtain a nice trace in Tamarin last month

ugly trace

This is not a nice trace

Learning about the inners of Tamarin, I've came up to the point where I wanted to get a nice clean trace to convince myself that the protocol I wrote made sense.

The way to check for proper functioning is to create exist-trace lemmas.

Unfortunately, using Tamarin to prove such lemma will not always yield the nicest trace.

Here are some tricks I've used to obtain a simple trace. Note that these matter less when you're writing lemmas to "attack" the protocol (as you really don't want to impose too many restrictions that might not hold in practice).

1. Use a fresh "session id". A random session id can be added to different states and messages to identify a trace from another one.

rule client_start_handshake:
    [!Init($client, ~privkey, pubkey), !Init($server, ~priv, pub), Fr(~sid)]
    --[Start(~sid, $client, $server)]->
    [Out(<'1', ~sid, $client, $server, pubkey>)]

2. Use temporals to make sure that everything happens in order.

Ex sid, client, server, sessionkey #i #j.
      Start(sid, client, server) @ #i & 
      Session_Success(sid, 'server', server, client, sessionkey) @ #j &
    #i < #j

3. Make sure your identities are different. Start by defining your lemma by making sure a server cannot communicate with itself.

Ex sid, client, server, sessionkey #i #j.
      not(server = client) & ...

4. Use constraints to force peers to have a unique key per identity. Without that, you cannot force Tamarin to believe that one identity could create multiple keys. Of course you might want your model to allow many-keys per identity, but we're talking about a nice trace here :)

restriction one_key_per_identity: "All X key key2 #i #j. Gen(X, key) @ #i & Gen(X, key2) @ #j ==> #i = #j"

rule generate_identity_and_key:
    let pub = 'g'^~priv
    [ Fr(~priv) ]--[Gen($A, pub)]->[ !Init($A, ~priv, pub) ]

5. Disable the adversary. This might be the most important rule here, and it ended up being the breakthrough that allowed me to find a nice trace. Unfortunately you cannot disable the adversary, but you can force Ins and Outs to happen via facts (which are not seen by the adversary, and remove any possible interaction from the adversary). For this you can replace every In and Out with InSecure and OutSecure and have these rules:

#ifdef not_debug
[OutSecure(a)] -->[Out(a)]
[In(a)] -->[InSecure(a)]

#ifdef debug

You can now call tamarin like this:

tamarin interactive . -Dnot_debug

And we now have a nice trace!

nice trace

Our protocol is an opportunistic Diffie-Hellman. You can see the two generate_identity_and_key rules at the top that creates the two different identities: a client and a server.

The second rule client_start_handshake is used to start the handshake. The client sends the first message 1 with a fresh session id ~sid.

The third rule server_handshake propagates the received session id to the next message and to its action fact (so that the lemma can keep the session id consistent).

The protocol ends in the last rule client_handshake_finish being used.


PS: thanks to Katriel Cohn-Gordon for answering ten thousands of my questions about Tamarin.

comment on this story

Bug in KangarooTwelve's reference implementation last month

When playing around with KangarooTwelve I noticed that the python implementation's right_encode() function was not consistent with the C version:

python implementation of right_encode

λ python right_encode.py            
right_encode(0): [0]
right_encode(1): [1][3]
right_encode(2): [2][3]
right_encode(255): [255][5]
right_encode(256): [1][0][6]
right_encode(257): [1][1][6]

C implementation of right_encode:

λ ./right_encode        
right_encode(0): 00
right_encode(1): 0101
right_encode(2): 0201
right_encode(255): ff01
right_encode(256): 010002
right_encode(257): 010102

The problem was that the python implementation used the function bytes() to create byte strings, which has breaking behaviors between python2 and python3.


>>> bytes([1])


>>> bytes([1])

The fix was to use bytearray() instead which is consistent across the two versions of Python.

comment on this story

KangarooTwelve last month

kangaroo twelve

In late 2008, 64 candidates took part in NIST's hashing competition, fighting to replace the SHA-2 family of hash functions.

Keccak won the SHA-3 competition, and became the FIPS 202 standard on August 5, 2015.

All the other contenders lost. The final round included Skein, JH, Grøstl and BLAKE.

Interestingly in 2012, right in the middle of the competition, BLAKE2 was released. Being too late to take part in the fight to become SHA-3, it just stood on its own awkwardly.

Fast forward to this day, you can now read an impressive list of applications, libraries and standards making use of BLAKE2. Argon2 the winner of the Password Hashing Competition, the Noise protocol framework, libsodium, ...

On their page you can read:

BLAKE2 is a cryptographic hash function faster than MD5, SHA-1, SHA-2, and SHA-3, yet is at least as secure as the latest standard SHA-3. BLAKE2 has been adopted by many projects due to its high speed, security, and simplicity.

And it is indeed for the speed that BLAKE2 is praised. But what exactly is BLAKE2?

BLAKE2 relies on (essentially) the same core algorithm as BLAKE, which has been intensively analyzed since 2008 within the SHA-3 competition, and which was one of the 5 finalists. NIST's final report writes that BLAKE has a "very large security margin", and that the the cryptanalysis performed on it has "a great deal of depth". The best academic attack on BLAKE (and BLAKE2) works on a reduced version with 2.5 rounds, whereas BLAKE2b does 12 rounds, and BLAKE2s does 10 rounds. But even this attack is not practical: it only shows for example that with 2.5 rounds, the preimage security of BLAKE2b is downgraded from 512 bits to 481 bits, or that the collision security of BLAKE2s is downgraded from 128 bits to 112 bits (which is similar to the security of 2048-bit RSA).

And that's where BLAKE2 shines. It's a modern and secure hash function that performs better than SHA-3 in software. SHA-3 is fast in hardware, this is true, but Intel does not have a good track record of quickly implementing cryptography in hardware. It is only in 2013 that SHA-2 joined Intel's set of instructions, along with SHA-1...

What is being done about it?

Some people are recommending SHAKE. Here is what the Go SHA-3 library says on the subject:

If you aren't sure what function you need, use SHAKE256 with at least 64 bytes of output. The SHAKE instances are faster than the SHA3 instances; the latter have to allocate memory to conform to the hash.Hash interface.

But why is SHA-3 slow in software? How can SHAKE be faster?

This is because SHA-3's parameters are weirdly big. It is hard to know exactly why. The NIST provided some vague explanations. Other reasons might include:

  • Not to be less secure than SHA-2. SHA-2 had 256-bit security against pre-image attacks so SHA-3 had to provide the same (or at least a minimum of) 256-bit security against these attacks. (Thanks to Krisztián Pintér for pointing this out.)
  • To be more secure than SHA-2. Since SHA-2 revealed itself to be more secure than we previously thought, there was no reason to develop a similar replacement.
  • To give some credits to the NIST. In view of people's concerns with NIST's ties with the NSA, they could not afford to act lightly around security parameters.


The following table shows the parameters of the SHAKE and SHA-3 constructions. To understand them, you need to understand two of the parameters:

  • The rate (r): it is the block size of the hash function. The smaller, the more computation will have to be done to process the whole input (more calls to the permutation (f)).
  • The capacity (c): it is the secret part of the hash function. You never reveal this! The bigger the better, but the bigger the smaller is your rate :)
algorithmratecapacitymode (d)output length (bits)security level (bits)
SHA3-384832 7680x06384192
SHA3-512576 10240x06512256

The problem here is that SHA-3's capacities is way bigger than it has to be.

512 bits of capacity (secret) for SHA3-256 provides the square root of security against (second pre-image attacks (that is 256-bit security, assuming that the output used is long enough). SHA3-256's output is 256 bits in length, which gives us the square root of security (128-bit) against collision attacks (birthday bound). We end up with the same theoretical security properties of SHA-256, which is questionably large for most applications.

Keccak even has a page to tune its security requirements which doesn't advise such capacities.

tune keccak

Is this really the end?

Of course not. In the end, SHA-3 is a big name, people will use it because they hear about it being the standard. It's a good thing! SHA-3 is a good hash function. People who know what they are doing, and who care about speed, will use BLAKE2 which is equally as good of a choice.

But these are not the last words of the Keccak's team, and not the last one of this blog post either. Keccak is indeed a surprisingly well thought construction relying on a single permutation. Used in the right way it HAS to be fast.

Mid-2016, the Keccak team came back with a new algorithm called KangarooTwelve. And you might have guessed, it is to SHA-3 what BLAKE2 is to BLAKE.

It came with 12 rounds instead of 24 (which is still a huge number compared to current cryptanalysis) and some sane parameters: a capacity of 256 bits like the SHAKE functions.

Tree hashing, a feature allowing you to parallelize hashing of large files, was also added to the mix! Although sc00bz says that they do not provide a "tree hashing" construction but rather a" leaves stapled to a pole" construction.

leaves stapled to a pole construction

credit: sc00bz

Alright, how does KangarooTwelve work?

First your input is mixed with a "customization string" (which allows you to get different results in different contexts) and the encoding of the length of the customization string (done with right_encode())

input kangarootwelve

Second if the message is "short" (≤8192bytes), the tree hashing is not used. The previous blob is padded and permuted with the sponge. Here the sponge call is Keccak with its default multi-rate padding 101*. The output can be as long as you want like the SHAKE functions.

short message kangarootwelve

Third if the message is "long", it is divided into "chunks" of 8192 bytes. Each chunk (except the first one) is padded and permuted (so effectively hashed). The end result is structured/concatenated/padded and passed through the Sponge again.

long message kangarootwelve

CORRECTION: the last padding uses right_encode(number of chunks - 1) which in the example is right_encode(3) instead of right_encode(4)

This allows each different chunk to be hashed in parallel on different cores or via some SIMD magic.

Before you go, here is a simple python implementation and the reference C implementation.

Daemen gave some performance estimations for KangarooTwelve during the rump session of Crypto 2016:

  • Intel Core i5-4570 (Haswell) 4.15 c/b for short input and 1.44 c/b for long input
  • Intel Core i5-6500 (Skylake) 3.72 c/b for short input and 1.22 c/b for long input
  • Intel Xeon Phi 7250 (Knights Landing) 4.56 c/b for short input and 0.74 c/b for long input

Thank-you for reading!

comment on this story

Strobe's padding last month

Let me re-introduce STROBE: it's a protocol framework based on the Duplex construction. It's kind of the symmetric part of TLS if you will, where everything depends on what happened previously. Kind of like when Noise uses the hash of the entire transcript to derive new keys.


You can do different things with different "operations". Encrypt some plaintext, generate some MAC, generate some random numbers, etc... These different operations can be combined to form a protocol, or can be used as stand-alone in your logic. This is one of the beautiful aspect of STROBE (heavily based on the beauty of Keccak): it provides you with numerous primitives with only a few APIs, and thus a very small source code: count 1,046 lines of code for the current reference implementation in C.

But like anything based on Keccak, you need a padding and it has to be complicated :)

Duplex Calls

The initialization of a Strobe object is done with cSHAKE by XOR'ing this data: [1,self.R,1,0,1,12*8, "STROBEv1.0.2"] then permuting the state.

After the initialization, for every duplex call (before a permutation happens) a padding is applied. This successfully allows series of duplex call to be seen as one cSHAKE call. The rest of this article explains how duplex calls work. These duplex calls are made of potentially multiple operations combined with a padding function, which in STROBE is a combination of cSHAKE's padding and STROBE's own padding.

if we just started an operation:

[old_begin, flag, data up until the block size (strobe.R)]

flag is one byte computed from:

  • The direction of the message when sending or receiving data (I_0).
  • The type of operation.

old_begin is the start of the last operation in the current block. It can be seen as the pos_begin of the previous operation. It is computed as:

  • If we're starting an operation in a new block: 0. The older operation happened in a previous block.
  • If we’re starting an operation in the same block as the previous operation: the starting position of the previous operation in the current block.

if we have already started an operation:

[data up until the block size (strobe.R)]

when a permutation is called (forced or because we reached the blocksize) append:

[pos_begin, cSHAKE's padding = 0x04, 0x00 ... 0x00, 0x80]

pos_begin is the start of our operation in the current block, it is computed as:

  • If we're starting the operation on a fresh block: 1. (At index 0 is the value old_begin which can be seend as the padding of the previous operation.)
  • If we've already started the operation and already ran F once: 0, indicating that the operation started in a previous block.
  • if we're starting the operation, and we're in the same block of another operation: the position of the new operation (which starts at the flag, not the old_begin)


Example 1: the second operation starts on a new block

block1[old_begin=0, flag, data_1, ..., data_R-2, pos_begin=1, padding= 0x04, 0x80] // the first operation takes more than one block
block2[data_R-1, pos_begin=0, padding=0x04, 0, ..., 0, 0x80] // end of the first operation (and forced permutation)
block3[old_begin=0, flag, data_1, ..., data_R-3, pos_begin=1, padding=0x04, 0, 0x80] // start of the second operation

Example 2: the second operation starts on the same block as the previous operation

block1[old_begin=0, flag, data_1, ..., data_R-2, pos_begin=1, padding= 0x04, 0x80] // the first operation takes more than one block
block2[data_R-1, old_begin=1, flag, data2_1, pos_begin=2 , padding=0x04, 0, ..., 0, 0x80] 

Example 3: three operations in the same block

block1[old_begin=0, flag, data_1, ..., data_R-2, pos_begin=1, padding= 0x04, 0x80] // the first operation takes more than one block
block2[data_R-1, old_begin=1, flag, data2_1, pos_begin=2 , old_begin=2, flag, data3_1, pos_begin=5, padding=0x04, 0, ..., 0, 0x80]
comment on this story

Tamarin Prover last month

I was at Eurocrypt's workshop in Paris last week. One of the reason is because I really like workshops. It was good. Good because of the Tamarin workshop, less good because it ended up just being like a normal conference, but still good because it was incredibly close to Real World Crypto.

Tamarin is a "protocol" verification tool. You give it a protocol, a set of security properties it should respect and Tamarin gives you a formal proof (using constraints solving) that the security guarantees are indeed provided OR it gives you a counter example (an attack).

By default, the adversary is a Dolev-Yao adversary that controls the network and can delete, inject, modify and intercept messages on the network

The thing is realllyyyy easy to install.

In the morning, Cas Cremers gave us an intro to Tamarin and taught us how to read inputs and outputs of the tool. In the afternoon, Ralf Sasse told us about more complex concepts while forcing us to write some input for Tamarin.

It was a lot of fun. Way more fun than I expected. I have to admit that I came skeptical, and never really considered formal verification as "practical". But this day changed my mind.

In an afternoon I was capable of creating a pretty simple input to model an opportunistic ephemeral Diffie-Hellman key agreement. The input I wrote is available here. (I have made a better one here and Santiago Zanella made one here as well).

You can see a lot of rules describing the possible transitions of the state machine, and a final lemma which is saying "for any created session (where neither a key or a session key has been leaked) a man-in-the-middle attacker cannot influence the protocol to retrieve the session key". Of course this is ugly, and reading it again a week afterwards I see things I want to rewrite, and probably some "pros" are going to cringe... But it worked out! and Tamarin ended up finding the trivial man-in-the-middle attack.

If you want to play with it, check the online manual which is just great. It provides a few examples that quickly get you started. The hardest part, you must have guessed, is to translate your protocol into this Tamarin input language.

comment on this story

Keeping up with TLS 1.3 last month


Ekr kick started the TLS:DIV workshop last Sunday. "The number of changes since draft 13 is too damn high" read one of the slide. Not wrong I said to myself. I did read draft 18 in its entirety when we had to review Cloudflare's TLS 1.3 implementation, and I tried to keep up with the changes ever since but I can honestly say that I completely failed.

So I thought, why not creating a nice diff that would allow me to go through all these changes just by reading the spec one more time. With the magic of git diff --color-words --word-diff=porcelain -U1000000 and some python I created a nice spec that shows up differences between draft 18 and the latest commit on the github spec.


You can find it here

If you want the same thing for a different draft version say something in the comment section!

comments (2)

A hash function does not provide integrity! April 2017

Some of you might have seen the answer of this famous stack overflow question what are the differences between a digital signature, a mac and a hash?:

hash vs mac vs signatures

The above table is from the most upvoted answer --but it is false. A hash function does not provide integrity, a MAC provides integrity.

Instead a cryptographic hash function provides three properties, well defined in the world of cryptography: collision resistance, pre-image resistance and second pre-image resistance. Nothing else.

comments (2)

SHAKE, cSHAKE and some more bit ordering April 2017


cSHAKE is something to make your own SHAKE! It's defined in this NIST's doc SP 800-185 along other awesome Keccak structures like KMAC (a message authentication code a la hash(key + data) with no length-extension attacks!) and TuppleHash (a hash function to hash structures without ambiguity (fingerprints!)).

And without surprises, it suffers from the same SHA-3 bit ordering problems I talked about in this blogpost and this one as well.

So I thought it would be a good opportunity for me to talk more about SHAKE, cSHAKE and remind you how the bit ordering works!

What is SHAKE?

SHAKE (Secure Hash Algorithm and KECCAK), defined in FIPS 202, is an Extendable-Output Function (or XOF). It's like SHA-3 but with an infinite output. For whatever length you need. You can use that as a KDF, a PRNG, a stream cipher, etc ...

It's based on The Keccak permutation function, and it works pretty much like SHA-3 does. The two differences are that a different domain separator is used (1111 is appended to the input instead of 01 for SHA-3) and you can continue to squeeze the sponge as much as you like (whereas SHA-3 defines how much you should squeeze).

Of course the squeezing will affect the security of your output, if used as a hash you can expect to have 128-bit security against all attacks (including collision resistance) when using an output of 256-bit. If you truncate that down to 128-bit, you will get rid of the collision resistance property but still have (at least) 128-bit security against pre-image attacks. More information is available in the FIPS 202 document where you will find this table in Appendix A:

sha and shake security


SHA-3 and SHAKE were great constructions, but Keccak has so much to offer that the NIST had to release another document: SP 800-185 (for "special publication"). It covers some other constructions (as I said previously), but this is still not the extent of what is possible with Keccak. If you're interested, check KangarooTwelve (which is to SHA-3 what Blake2 is to Blake, a less secure but more efficient hash function), the duplex construction (use the sponge continuously!), Ketje and Keyak (two authenticated encryption modes), ...

SP 800-185 defines cSHAKE as a customable version SHAKE. It works the same except that you have one new input to play with: a customization string. If you've used HKDF before you will quickly get its purpose: avoid collisions between different usages of the Shake function. Here are two examples given by the NIST special publication:

  • cSHAKE128(input: public_key, output: 256 bits, "key fingerprint"), where "key fingerprint" is the customization string.
  • cSHAKE128(input: email_contents, output: 256 bits, "email signature")

how it works

As with SHAKE, you can choose an output length, here is how the input and the customization string gets fed to the cSHAKE function:

  1. Use an empty N string (this is for NIST use only)
  2. Choose a customization string S
  3. apply encode_string() on the empty string N and on your customization string S
  4. apply the bytepad function to both with the rate of the permutation in bytes (168 in the case of cSHAKE128).
  5. prepend this to your input
  6. append the bits 00 to your input (domain separator)
KECCAK(bytepad(encode_string(N) || encode_string(S), 168) || input || 00)

What are these functions?

encode_string: left_encode(len(input)) || input.

left_encode: a way to encode a length. It encodes the length of the length, then the length :) all of that in an LSB bit numbering.

bytepad: left_encode(block_size) | input | 0000...0000 with enough 0s for the resulting output to be a byte array multiple of the rate.

The above function can be rewritten:

KECCAK(left_encode(block_size) || left_encode(0) || "" || left_encode(len(S)) || S || 0000...0000 || input || 00)

(This input will then get padded inside of Keccak.)


These functions are defined in the special publication NIST document, but they are quite different in the reference implementation).


So here is another explanation on this bit re-ordering trick. Imagine that we have this input to the left_encode function:

0001 0000 | 0011 0001 | 0111 1110

As an hex table it would look like this:

0x10 | 0x31 | 0x7e

Now, the left_encode specification tells us to completely reverse this value, then add a reversed length on the left (here 3). It would look like this:

            0111 1110 | 1000 1100 | 0000 1000
1100 0000 | 0111 1110 | 1000 1100 | 0000 1000

This is quite annoying to code. Fortunately, the reference implementation has a trick: it takes the bytes as-is but re-position them following the little endian ordering convention, then add the padding in a natural MSB first bit numbering:

            0111 1110 | 0011 0001 | 0001 0000  
0000 0011 | 0111 1110 | 0011 0001 | 0001 0000

The Sponge code then re-interprets this in little endian convention:

0001 0000 | 0011 0001 | 0111 1110 | 0000 0011

Which is exactly what was expected, except in reversed order :) which is OK because if you remember correctly from my previous blogpost every operation of the spec is implemented in reversed in the reference implementations.

How to use cSHAKE?

The official Keccak Code Package contains all the keccak functions in C. I've written a Golang version of cShake here that you can put in your golang.org/x/crypto/sha3/ directory, as it hasn't been reviewed by anyone yet I would take this with chopsticks.

comment on this story

Byte ordering and bit numbering in Keccak and SHA-3 April 2017

As I talked about in a previous blogpost, Keccak and SHA3-3 are different in their bit convention, which gave birth to quite an overly complicated specification. The lack of good explanations in both the reference implementations and the reference documents led me to write this blogpost.


Before going further, let me briefly re-explain the multi-rate padding also called pad10*1:

  1. take the input and append the delimiter (01 for SHA-3)
  2. append a 1 to start the padding
  3. append enough 0s followed by a final 1 so that the resulting output is a multiple of the rate

Here is a graphical example with input 0x9c = 1001 1100 and where the rate of our permutation is only 3 bytes:


Now, I'll just acknowledge that most implementations, and even the specification, talk about using 0x80 for the final bit. This blogpost should answer why.

Theoretical bit numbering

Keccak is specified in term of bits, but is discreetly aware of byte arrays as well. This is important as the rounds of Keccak require us to XOR parts of the input with lanes of the state. These lanes can be represented as uint64 types for keccak-f[1600], which we'll be looking at in this blogpost.

It could be hard to understand what is really going on in FIPS 202, but a look at an older version of the specification shows that Keccak interprets byte arrays in big-endian, but with an LSB bit numbering.

If you look at the Appendix B.1 of FIPS 202. You can see that before an input can be interpreted by SHA-3, it needs to go through some transformations.

Here is our own example of how it works with an input of 0xaebcdf. First the bits are re-mapped, then a padding is added, finally it is split into as many lanes as possible (here we have two 16-bit lanes):


The same examples with bits:


Two things that you must understand here:

  • the padding is applied after re-mapping the bits, not before! Since the padding is already defined for Keccak's internal ordering of its bits.
  • all of this is done so that a bit string is indexed as LSB first, not MSB first. It's weird but Keccak did it this way.

Why is the indexing of a bitstring so important?

Because we later apply the permutation function on the state, and some of the operations need to know what are the indexes of each bits. More on that later.

How to implement the re-mapping

It must be annoying to reverse all these bits right? What about not doing it? Well brace yourself, there is indeed a clever trick that would allow you to avoid doing this. And this trick is what is used in most of SHA-3's implementations. Here is a graphical explanation of what they do:


By reading the byte array as little-endian, and using an already reversed delimiter + padding, some magic happens!

The result is exactly the same but in reverse order.

If you aren't convinced look at the following diagram (which shows the same thing with bits) and look at the previous section result. Same, but the bitstream is readable in the reversed direction.


This means that now the index 0 of Keccak's internal ordering is on the right, not on the left. How will it influence the round functions that apply on our lanes?


It turns out that a lot of operations are unchanged. Who would have known that XORs, ANDs and NOTs operations were not affected by the order of the bits! but only some rotations and bit positioning are. If you look at the reference implementations you will see that indeed, rotations have been reversed (compared to what the reference tells you to do).

And this is how a clever implementation trick made its way in a specification with no real explanation.

You're welcome!

comment on this story

SHA-3, Keccak and disturbing implementation stories April 2017

"Why is Java a big pile of crap?" said the article. And after reading through the argumentation, I would move to the comment section and read a divergent point of view. Who was right? I wondered. Although everyone knows that Java is indeed a huge pile of crap, many other articles follow the same path of disputing one biased opinion that might be right, wrong, but really is both. I often mused on if I would come to write such a piece, and I think today is the day.

Keccak's (and/or SHA-3's) specification makes no sense.

Yup, it makes no sense, and I have a list of points you'll have to read through to understand how exactly.

By the way, if you didn't know, Keccak was the winner of the SHA-3 competition and blablabla...


Chocoweird indexing

First, let me introduce you to the internal state of Keccak.

internal state keccak

It is some sort of rectangular 3D object, and its different sub-objects have different names. Each little cube is a bit (a 1 or a 0). Cute.

I mean why not, AES was a square right? If we make it 3D we augment the security by one dimension. Sounds logical.

It gets weirder.

This is how the little cubes are indexed.

indexing keccak

If you've started to reason on how you could translate that thing into a struct, stop. Don't overthink it. The x and y coordinates don't matter: all operations are done modulo the other columns or rows or you name it... you could have the indexes x = 0 and y = 0 anywhere really; it wouldn't change a thing. This picture doesn't even appear in Keccak's specification, only in FIPS 202. It is probably a joke from the NIST.

Multiple of bytes? Nopecheese

A uint8_t array

...you must have thought, still trying to have a head start, still shooting to figure out how you would stuck these bits in your code.

But wait. The NIST loves to think about bits though, not bytes. That's often surprising to people who end up implementing their specs with a byte length instead of a bit length and it led to what DJB calls a bit attack.

SHA-3 was no exception, NIST said to Keccak: you shall be able to handle the darkest desires of our stupid developers. You must account for ALL INPUTS. You must accept non-multiple of bytes.

Keccak provided. You can now hash 7-bit inputs with SHA-3.

The poor user is given enough rope with which to hang himself -- something a standard should not do. —Rivest, 1992, commenting on the NIST/NSA “DSA” proposal

Bit indexing brainfudge

Let's talk about bit numbering, not byte order like big-endian and little-endian, no-no, bit numbering.

There are two ways to order bits in a byte (and we'll say here that a byte is an octet: it contains 8 bits):

  • MSB first: 0x02 = 0000 0010, 0x01 = 0000 0001, ...
  • LSB first: 0x40 = 0000 0010, 0x80 = 0000 0001, ...

Now what does this have to do with Keccak?

Keccak's bit numbering is LSB first, whereas NIST's one is MSB first (no kidding).

This means a re-mapping needs to take place when converting an input to the internal state of Keccak. This was all explained in the old specification of Keccak (see section 5). Unfortunately these explanations disappeared in the latest version of the spec, and the NIST ended up writing up the conversion in an appendix (B.1) of their own specification FIPS 202.

Let me tell you: FIPS 202's explanation makes no sense. They end up mixing the theoretical internal state of Keccak with methods on how you can implement the re-mapping without having to inverse the bits of each bytes. It took me a while to figure it out and I am not the only one (most questions about Keccak/SHA-3 on internet end up being about their bit re-ordering).


Is SHA-3 Complicated? Some people would say no. But in reality there is no way to understand how to implement SHA-3 without looking at a reference implementation. NIST standards should seriously take a look at the process of TLS 1.3: no-one has been seen copying a reference implementation. Implementers are independently coding up what they understand of the draft specifications, and if cross-testing doesn't work it's either because they failed to understand something or because the specification needs some more love.

Having said that, Keccak is really cool and some of its applications look promising.

comments (1)

One GCM implementation pitfall March 2017

If you look at Go's implementation of GCM, in particular this, you can see that the counter is set to nonce||1:

if len(nonce) == gcmStandardNonceSize {
    // Init counter to nonce||1
    copy(counter[:], nonce)
    counter[gcmBlockSize-1] = 1

It needs to be. Without it, the first block of keystream is the encryption of 0 if the nonce is 0 (which can happen if nonces are generated from a counter). The encryption of 0 is also... the authentication key!

comment on this story

TLS 1.3 - Draft 19 March 2017

Draft 19 has been published, this is a quick post talking about what has changed. This is mosty taken from the draft changelog as well as reading the commits since version 18. I've tried to keep up with what was important, but I ignored the many typos and small fixes, as well as what I think doesn't matter from draft to draft (exporters, cookies, ...)

Add pre-extract Derive-Secret stages to key schedule

The key schedule part of TLS 1.3 is quite tricky. At different stage of the connection a "secret" is derived from different inputs, and used again to derive different keys and IVs (to encrypt/decrypt data). When a new phase begins (for example when handshake messages need to be encrypted, or when the handshake is over and application data messages need to be encrypted) a new secret is derived and new keys can be derived from that secret.

key schedule

Here you can see (the top arrow) that the Handshake secret is derived using the HKDF-Extract() function with the previous secret and the Diffie-Hellman key exchange output. Keys and IVs for both the Client and the Server are then derived from that secret (arrows pointing to the right). Next (all the way down) the Master secret is derived from that secret and the traffic keys, protecting the application data, can be derived from it.

Notice the Derive-Secret() function being used before re-using HKDF-Extract() again. This is new in Draft 19. This Derive-Secret() is the HKDF-Expand() function. If you know HKDF you know that it acts like a lot of different KDFs: in two steps. It extract the entropy given, and it then expands it. This was used to derive keys (you can see it with the arrows pointing on the right), but not to derive secrets. It is now fixed and that's why you can see it being used to derive the new Master secret. One of the positive outcome of this change is that HKDF can now more easily be replaced with another KDF.

Consolidate "ticket_early_data_info" and "early_data" into a single extension

This was an easy one.

The early_data extension was an empty extension used by a Client in the ClientHello message when it wanted to send 0-RTT data; and by a Server in the EncryptedExtensions message to confirm acceptance of the 0-RTT data.

The ticket_early_data_info was an extension that a Server would send in a ticket (for session resumption) to advertise to the Client that 0-RTT was available. It only contained one field: the maximum size of the data that should be sent as 0-RTT data.

Both are now merged under early_data which can be used for both purposes. Less extensions :) it's a good thing.

struct {} Empty;

struct {
   select (Handshake.msg_type) {
       case new_session_ticket:   uint32 max_early_data_size;
       case client_hello:         Empty;
       case encrypted_extensions: Empty;
} EarlyDataIndication;

Change end_of_early_data to be a handshake message

To use the 0-RTT feature of TLS 1.3, a client can start sending 0-RTT data right after a ClientHello message. It is encrypted with a set of keys and IVs derived from the PSK and the ClientHello. The PSK will typically be some resumption secret from a previous connection, but it can also be a pre-shared key (yeah, you can't do 0-RTT if you don't already share some secret with the server). The image below is taken from the key schedule section.

key schedule early data

When a client is done sending this 0-RTT data, it can then finish the handshake by sending the remaining handshake messages (Certificate, CertificateVerify, Finished). These remaining handshake messages are encrypted (new in TLS 1.3) under a handshake key. To warn the server that the client is done sending 0-RTT data and that it is now using the handshake key to encrypt messages, it would previously (draft 18) send a end_of_early_data alert. This alert is now a handshake message! Which is what it should have been all along :)

Add state machine diagram

New diagrams representing the different client/server state machines made their way in draft 19! TLS 1.3 will officially be the first TLS RFC to provide so many diagrams. And they're beautiful too, what an artistic performance.

server workflow

tls 1.3 state machine client

client workflow

tls 1.3 state machine server

That's about it

I haven't spotted anything else major. Some SHOULDs became MUSTs, and some MUSTs became SHOULDs. At the moment, I believe the PSK and 0-RTT flows are quite hard to understand. Perhaps a diagram representing the flow of PSK from a server or client would be nice to have. Eric Rescorla has pointed out a few questions that should be answered to move the draft forward. I've seen so many issues, PR and commits these last weeks that I'm wondering when TLS 1.3 will be considered ready... but it's moving forward :)

comments (1)

NCC Con and the NCC T-shirt January 2017

After Real World Crypto, like every year, NCC Con follows next.

NCC Con is the NCC Group conference, a conference for its employees only, with a bunch of talks and drinks. I gave a talk on TLS 1.3 that I hope I can translate to a blog post at some point.

This year I also designed the NCC Group's Mascot! Which was given away as a T-shirt and a sticker to all the employees at the conference. It was a pretty surreal moment for me to see people around me, at the conference and in the casinos (it was in Vegas) wearing that shirt I designed :D



And here is a wallpaper with my original submission

comments (2)

Real World Crypto 2017: Day 3 January 2017

(The notes for day 2 are here.)

The first talk on Quantum Computers was really interesting, but unfortunately mostly went over my head. Although I'm glad we had a pro who could come and talk to us on the subject. The take away I took from it was to go read the SMBC comics on the same subject.

After that there was a talk about TPMs and DAA for Direct Anonymous Attestation. I should probably read the wikipedia page because I have no idea what that was about.

Helena Handschuh from Cryptography Research talked about DPA Resistance for Real People. There are many techniques we use as DPA countermeasures but it seems like we still don't have the secret sauce to completely prevent that kind of issues, so what we really end up doing is rotating the keys every 3 encryption/decryption operations or so... That's what they do at Rambus, and at least what I've heard other people doing when I was at CHES this year. Mike Hamburg describes the way they rotate keys in his STROBE project a bit. Handschuh also talked about the two ways to certify a DPA-resistant product. There are evaluations like Common Criteria, which is usually the normal route, but now there is also validation. Don't ask me about that.

David Cash then entered the stage and delivered what I believe was the best talk of the conference. He started with a great explanation of ORE vs OPE. OPE (Order Preserving Encryption) allows you to encrypt data in a way that ciphertexts conserve the same lexicographic order, ORE (Order Revealing Encryption) does not, but some function over the ciphertexts end up revealing the order of the plaintexts. So they're kind of the same in the end and the distiction doesn't seem to really matter for his attacks. What matters is the distinction between Ideal ORE and normal ORE (and the obviously, the latter is what is used in the real world).

Ideal ORE only reveals the order while the ORE schemes we use also reveal other things, for example the MSDB (most significant different bits) which is the position of the first non-similar bit between two plaintexts.

Previous research focused on attacking a single column of encrypted data while their new research attacks columns of correlated data. David gives the example of coordinates and shows several illustrations of his attack revealing an encrypted image of the linux penguin, encrypted locations on a map or his go-about saved and encrypted by a GPS. Just by looking at the order of coordinates everything can be visually somewhat restored.

Just by analyzing the MSDB property, a bunch of bits from the plaintexts can be restored as well. It seemed like very bad news for the ORE schemes analyzed.

Finally, two points that seemed really important in this race for the perfect ORE scheme is that first: the security proofs of these constructions are considering any database data as uniformly random, whereas we know that we rarely need to store completely random data :) Especially columns are often correlated with one another. Second, even an (hypothetical) ideal ORE was vulnerable to their research and to previous research (he gave the example of encrypting the whole domain in which case the order would just reveal the plaintexts).

This is a pretty bad stab at ORE scheme in general, showing that it is intuitively limited.

Paul Grubbs followed with an explanation of BoPETS, a term that I believe he recently coined, meaning "Building on Property revealing EncrypTion". He gave a good list of products that I replicated below in a table.

Order Preserving Encryption SAP, Cipherbase, skyhigh, CipherCloud, CryptDB
Searchable Encryption Shadowcrypt, Mylar, kryptonostik, gitzero, ZeroDB, CryptDB
Deterministic Encryption Perspecsys, skyhigh, CipherCloud, CryptDB

They looked at Mylar and saw if they could break it from 3 different attacker setups: a snapshot setup (smash and grab attack), passive (attacker got in, and is now observing what is happening), active (attacker can do pretty much anything I believe).

Mylar uses some encrypted data blob/structure in addition to a "principal graph" containing some metadata, ACL, etc... related to the encrypted data. Grubbs showed how he could recover most of the plaintexts from all the different setups.

Tal Malkin interjected that these attacks would probably not apply to some different PPE systems like IBM OXT. Grubbs said it did not matter. Who's right, I don't know.

As for the active attacker problem, there seem to exist no intuitive solution there. If the attacker can do anything, you're probably screwed.

Raluca Ada Popa (Mylar) followed Grubbs by talking about her product Mylar and rejected all of Grubbs claims, saying that there were out of the scope of Mylar OR were attacking mis-use of the product. IIRC the same thing happened when CryptDB was "broken", CryptDB released a paper arguing that these were false claims.

After Mylar, Popa intend to release two new products with better security: Verena and Opaque.

David Mcgrew mentionned Joy and gave a long talk about detecting PRNG failures. Basically look for public values affected by a PRNG like signatures or the server/client random in TLS.

And that was it. See you next year.

If you have anything to say about my notes, the talks, or other people's notes, please do so in the comments :)

There was a HACS workshop right after RWC, and Tim Taubert wrote some notes on it here.

comment on this story

Real World Crypto 2017: Day 2 January 2017

Here we go again. Some really short notes as well for today. (The notes for day 1 are here.)

Trevor Perrin talked about Message Encryption from an historical point of view, from key directories to public key infrastructures and how to authenticate users to each other. Something interesting that Trevor talked about was CONIKS, some sort of Certificate Transparency-like protocol but for secure messaging (they call it key transparency).

when Alice wants to send a secure message to some other user, say Bob, her CONIKS client looks up Bob's key at the key directory, and verifies that this key has not changed unexpectedly over time. It also checks that this key is consistent with the key other clients are seeing for Bob. Only if these two consistency checks pass will the CONIKS client send Alice's message to Bob. The CONIKS client also performs these same checks for Alice's own key on a regular basis to ensure that the service provider is not tampering with Alice's key.

This sounds like an audit system (users can check what a key distribution server has been up to) + a gossip protocol (users can talk between them to verify consistency of the obtained public keys). Which seems like an excellent idea and makes me wonder why would Signal not use it.

djb mentioned the Self-Healing feature of ZRTP, similar to the recovery feature of Signal.

ZRTP caches symmetric key material used to compute secret session keys, and these values change with each session. If someone steals your ZRTP shared secret cache, they only get one chance to mount a MiTM attack, in the very next session. If they miss that chance, the retained shared secret is refreshed with a new value, and the window of vulnerability heals itself, which means they are locked out of any future opportunities to mount a MiTM attack. This gives ZRTP a "self-healing" feature if any cached key material is compromised.

Signal's self-healing property comes from the fact that an ephemeral Diffie-Hellman key agreement is continuously happening during communication. Like ZRTP it seems like it works out well only if the attacker is slow to act, thus it doesn't seem to be exactly comparable to backward secrecy (which might just be impossible in a protocol).

Later, someone (I don't know who from Felix Günther, Britta Hale, Tibor Jager and Sebastian Lauer because the program doesn't specify who the speaker is), presented on a 0-RTT system that would provide forward secrecy and anti-replayability. 0-RTT is one of the feature of TLS 1.3, which allows a client to start sending encrypted data to a server during its very first flight of messages. Unfortunately, and this was the topic of many discussions on TLS 1.3, these messages are replayable. The work builds on top of Math Green's work with Puncturable Encryption where the server (and the client?) use some key derivation system and remove parts of it after a message has been sent using the 0-RTT feature. I am not sure if this system is really efficient though, especially since the point of 0-RTT is to be able to be fast. If this solution isn't faster or, worse, slower than doing a normal TLS 1.3 handshake (1.5 round trips) then the 0-RTT has no meaning in life anymore.

It also seems like this wouldn't be applicable to the "ticket" way of doing 0-RTT in TLS 1.3, which basically encrypts the whole state and hand up the opaque blob to the client, this way the server doesn't store anything.

Hugo Krawczyk (the HKDF guy) talked about passwords and leaks with some Comic Sans MS (and there was this handy website to check if your username/password/... had been compromised). Hugo then presented some of his recent work on SPHINX, PPSS, X-PAKE, ... everything is listed with link to papers here.

SPHINX is a client-focused and transparent-to-the-server password manager (like all of them really). The desktop password manager uses some derivation parameter stored online or on a user's mobile phone to derive any website key from a master password. The online service or the mobile phone never sees anything (thanks to a simple blinding technique, reminding me of what Ari Juel did last year's RWC with PASS). Because of that, no offline attack are possible. The slides are here and are pretty self explanatory. I have to admit that the design makes a lot of sense to me. I dozed off for the second part of the talk but it was about "How to store a secret" and his PPSS thing (Password Protected Secret Sharing), same for the third part of the talk that was about X-PAKE, which I can imagine was a mix of his ideas with the PAKE protocol.

There were two talks about memory-hardness and proving that password hashing functions are memory-hard. It seemed like some people think it's important that these functions be data-independent as well (probably because in some cases cache attacks might be an issue). Most of the techniques here seemed to make sure that a minimum amount of memory was to be used at all time, and that this couldn't be reduced. I would have liked to see a comparison between Argon2 (the winner of the PHC), Blake 2 (which seems to be the thing people like and use) and Balloon Hashing (which seems to be Dan Boneh's new thing).

George Tankersley and Filippo Valsorda finished the day with a talk on Cloudflare and their CAPTCHA problem. A lot of attacks/spam seems to come from TOR, which has deteriorated the reputation of the TOR nodes' IPs. This means that when Cloudflare sees some traffic coming from TOR, it will present the user with a CAPTCHA to make sure they are dealing with a human. TOR users tend to strongly dislike Cloudflare because these CAPTCHA are shown for every different website, and for every time the TOR path is changed (10 minutes?). This, in addition to TOR already slowing down your browsing efficiency, has annoyed more than one person. Cloudflare is trying to remediate the problem by giving not one, but N tokens to the user for one CAPTCHA solved. By using blind signatures Cloudflare hopes to demonstrate its inability to deanonymize users by using a CAPTCHA token as a tracking cookie.

(I have been made aware of this problem in the past and have manually added TOR visitors as an "allowed country" in my Cloudflare's setup for cryptologie.net., which is one of the solution given to Cloudflare's customers.)

I believe the drafted solution is readable here if you're interested.

Here are more resources:

The notes for day 3 are here.

comment on this story