David Wong | Cryptologie | About my studies in Cryptography

# Go Assembly by Example

David Wong | Fri, 01 Sep 2017 | http://www.cryptologie.net/article/420/go-assembly-by-example/
[You can check it out here](https://davidwong.fr/goasm), and someone has already translated it into Chinese [here](http://colobu.com/goasm/).

![example](/upload/Screen_Shot_2017-09-01_at_3.43_.04_PM_.png)
# Zero'ing memory, compiler optimizations and memset_s

David Wong | Fri, 25 Aug 2017 | http://www.cryptologie.net/article/419/zeroing-memory-compiler-optimizations-and-memset_s/
When a program uses a **secret key** for some **cryptographic operation**, it will store it somewhere in memory. This is a problem because it is trivial for a different program to read what was previously stored in memory: just create something like this:

```c
#include <stdio.h>

int main() {
    unsigned char a[5000];
    for (int i = 0; i < 5000; i++) {
        printf("%x", a[i]);
    }
    printf("\n");
}
```

This will print out whatever was previously there in memory, because the buffer `a` is not initialized to zeros. C seldom initializes things to zero; it does in specific cases, for example if you use `calloc` instead of `malloc`, or for variables with static storage duration (globals, or locals declared `static`).

EDIT: as [Fred Akalin](https://twitter.com/fakalin/status/902120747472011265?refsrc=email&s=11) pointed out to me, it looks like this is [fixed in most modern OSes](https://softwareengineering.stackexchange.com/questions/181577/is-it-possible-to-read-memory-from-another-program-by-allocating-all-the-empty-s). [Colin Percival](http://www.daemonology.net/blog/2014-09-04-how-to-zero-a-buffer.html) notes that there are other issues with not zero'ing memory:

> if someone is able to exploit an unrelated problem — a vulnerability which yields remote code execution, or a feature which allows uninitialized memory to be read remotely, for example — then ensuring that sensitive data (e.g., cryptographic keys) is no longer accessible will reduce the impact of the attack. In short, zeroing buffers which contained sensitive information is an exploit mitigation technique.

**This is a problem**.

To remove a key from memory, developers tend to write something like this:

```c
memset(private_key, 0, sizeof(*private_key));
```

Unfortunately, when the compiler sees something like this, **it will remove it**. From the compiler's point of view this code is useless, since the variable is never read afterwards, so the call gets optimized out (dead store elimination).

**How to fix this issue?**

A [memset_s](http://en.cppreference.com/w/c/string/byte/memset) function was proposed and introduced in C11. It is basically a safe `memset` (you need to pass in the size of the buffer you're zero'ing as an argument) that will not get optimized out. Unfortunately, as [Martin Sebor](https://sourceware.org/bugzilla/show_bug.cgi?id=17879#c2) notes:

> memset_s is an optional feature of the C11 standard and as such isn't really portable. (AFAIK, there also are no conforming C11 implementations that provide the optional Annex K in which the function is defined.)

To use it, you define `__STDC_WANT_LIB_EXT1__` before including the header, and the implementation defines `__STDC_LIB_EXT1__` to signal that the `memset_s` function is available:

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>
#include <stdlib.h>

// ...

#ifdef __STDC_LIB_EXT1__
memset_s(pointer, size_data, 0, size_to_remove);
#endif
```

Unfortunately you cannot rely on this for portability. For example on macOS the two `#define`s are not supported and you need to call `memset_s` directly.

[Martin Sebor](https://sourceware.org/bugzilla/show_bug.cgi?id=17879#c2) adds in the same comment:

> The GCC -fno-builtin-memset option can be used to prevent compatible compilers from optimizing away calls to memset that aren't strictly speaking necessary.

Unfortunately, it seems like macOS' gcc (which is really clang) ignores this argument.

**What else can we do?**

I asked [Robert Seacord](https://en.wikipedia.org/wiki/Robert_C._Seacord), who always has all the answers; here's what he gave me in return:

```c
void *erase_from_memory(void *pointer, size_t size_data, size_t size_to_remove) {
    if (size_to_remove > size_data) size_to_remove = size_data;
    volatile unsigned char *p = pointer;
    while (size_to_remove--) {
        *p++ = 0;
    }
    return pointer;
}
```

**Does this `volatile` keyword work?**

Time to open `gdb` (or `lldb`) to verify what the compiler has done, after compiling with the different optimization levels (`-O1`, `-O2`, `-O3`).

Let's write a small program that uses this code and debug it:

```c
#include <stdio.h>

int main() {
    char a[6] = "hello";
    printf("%s\n", a);
    erase_from_memory(a, 6, 6);
}
```

![gdb](/upload/gdb.png)

1. we open gdb with the program we just compiled
2. we set a break point on `main`
3. we run the program which will stop in `main`

![disas](/upload/disas.png)

We notice a bunch of `movb $0x0 ...`

Is this it? Let's put a breakpoint on the first one and see what the stack pointer (`rsp`) is pointing to.

![b](/upload/b.png)

It's pointing to the string "hello" as we guessed.

![x](/upload/x.png)

Going to the next instruction via `ni`, we can then see that the first letter `h` has been removed. Going over the next instructions, we see that the full string ends up being zero'ed.

![success](/upload/success.png)

**It's a success!**

[The full code](https://gist.github.com/mimoo/4d7a898da2f0e05d1a6b2afe0e9289e1) is available as an `erase_from_memory.h` header file that you can just include in your codebase:

```c
#ifndef __ERASE_FROM_MEMORY_H__
#define __ERASE_FROM_MEMORY_H__ 1

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
#include <string.h>

void *erase_from_memory(void *pointer, size_t size_data, size_t size_to_remove) {
#ifdef __STDC_LIB_EXT1__
    memset_s(pointer, size_data, 0, size_to_remove);
#else
    if (size_to_remove > size_data) size_to_remove = size_data;
    volatile unsigned char *p = pointer;
    while (size_to_remove--) {
        *p++ = 0;
    }
#endif
    return pointer;
}

#endif // __ERASE_FROM_MEMORY_H__
```

Many thanks to Robert Seacord!

PS: [here is how libsodium does it](https://github.com/jedisct1/libsodium/blob/be58b2e6664389d9c7993b55291402934b43b3ca/src/libsodium/sodium/utils.c#L78:L101)

EDIT: As [Colin Percival wrote here](http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html), this problem is far from solved. Secrets can get copied around in (special) registers, which won't allow you to easily remove them.
# integer promotion in C

David Wong | Thu, 03 Aug 2017 | http://www.cryptologie.net/article/418/integer-promotion-in-c/
In spite of the obvious controversy of launching a new crypto library, I really like it. Note that this is not me officially endorsing the library; I just think it's cool, and I would only consider using it after it has matured a bit more.

[The whole thing is one ~1500LOC file](https://github.com/LoupVaillant/Monocypher/blob/master/src/monocypher.c) and is pretty clear to read. It only [implements](http://loup-vaillant.fr/projects/monocypher/manual) a few crypto functions.

The blog post mentions a few bugs that were found in his library (and I appreciate how open he is about it). Here's an interesting one:

> Bug 5: signed integer overflow
> This one was sneaky. I wouldn't have caught it without UBSan.
> I was shifting a uint8_t, 24 bits to the left. I failed to realise that integer promotion means this unsigned byte would be converted to a signed integer, and overflow if the byte exceeded 127. (Also, on crazy platforms where integers are smaller than 32 bits, this would never have worked.) An explicit conversion to uint32_t did the trick.
> At this point, I was running the various sanitisers just to increase confidence. Since I used Valgrind already, I didn't expect to actually catch a bug. Good thing I did it anyway.
> Lesson learned: Never try anything serious in C or C++ without sanitisers. They're not just for theatrics, they catch real bugs.

This is the [problem patched](https://github.com/LoupVaillant/Monocypher/commit/347189c50c053cf13ce1310818c2913f4904c1eb).

![patch](/upload/Screen_Shot_2017-08-03_at_2.08_.20_PM_.png)

Simplified, the bad code really looks like this:

```c
uint32_t result = byte << (8 * i); // byte is a uint8_t
```

All the theory behind the problem could have been dismissed if he had written his code defensively. When I see something like this, the first thing I think is that it should probably be written like this:

```c
uint32_t result = (uint32_t)byte << (8 * i); // byte is a uint8_t
```

This would avoid any weird C problems, as a cast (especially to a bigger type) usually goes fine.

OK but what was the problem with the above code?

Well, in C some operations will usually promote the type to something bigger. [See the C standard](http://c0x.coding-guidelines.com/6.5.7.html):

> shift-expression << additive-expression
> The **integer promotions** are performed on each of the operands

What is an integer promotion? [See the C standard](http://c0x.coding-guidelines.com/6.3.1.1.html):

> If an int can represent all values of the original type, the value is converted to an int;
> otherwise, it is converted to an unsigned int.
> These are called the **integer promotions**

So looking back at our bad snippet:

```c
uint32_t result = byte << (8 * i); // byte is a uint8_t
```

1. the maximum value of `uint8_t` is 255, which can easily be held in a `signed int` of 16 or 32 bits (depending on the architecture). So `01` is promoted to `00 00 00 01` if a `signed int` is 32-bit (which it probably is). (Had we been dealing with a `uint32_t`, there would have been no problem: "big" values that cannot be represented in a 32-bit `signed int` would have been promoted to an `unsigned int` instead of a `signed int`.)
2. the bits are shifted to the left. Shifting by 8 places, for example, gives `00 00 01 00`.
3. the result gets cast to a `uint32_t`. We still get `00 00 01 00`.

This doesn't look like an issue, and it probably isn't most of the time. Now imagine if in 1. our value was `80` (which is `1000 0000` in bits).

Imagine now that in 2. we shift it by 24 bits to the left. That gives us `80 00 00 00`, which is an all-zero bitstring except for the most significant bit (MSB). In an `int` type the MSB is the sign bit. I believe at this point the value will be automatically sign-extended to the size of the register, so on your 64-bit machine it will be saved as `ff ff ff ff 80 00 00 00`.

Now in 3. the result gets stored into a larger unsigned type (a `uint64_t` in the program below), keeping the sign-extended bits. We now have a wrong result! What we wanted here was `00 00 00 00 80 00 00 00`. If you're not convinced, you can run the following program on your computer:


```c
#include <stdio.h>
#include <stdint.h>

int main() {
    uint8_t start = -1;
    printf("%x\n", start); // prints ff
    uint64_t result = start << 24;
    printf("%llx\n", result); // should print ff000000, but will print ffffffffff000000
    result = (uint64_t)start << 24;
    printf("%llx\n", result); // prints ff000000
    return 0;
}
```
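The sign-extension mechanism can also be simulated in Python (where integers are unbounded) by applying C's rules by hand. This is just a sketch of the three steps above, assuming a 32-bit `int`; the function name is made up for illustration:

```python
def u8_shift_into_u64(byte, shift):
    # 1. integer promotion: the uint8_t becomes a 32-bit *signed* int
    promoted = byte  # 0..255 always fits in an int
    # 2. the shift happens on that signed 32-bit value
    shifted = (promoted << shift) & 0xFFFFFFFF
    # 3. if the sign bit ended up set, the value is negative, and
    #    converting it to a 64-bit type sign-extends it
    if shifted & 0x80000000:
        shifted |= 0xFFFFFFFF_00000000
    return shifted

print(hex(u8_shift_into_u64(0xFF, 24)))  # 0xffffffffff000000
print(hex(0xFF << 24))                   # 0xff000000, the intended value
```

As long as the shifted value keeps its sign bit clear (bytes up to `0x7F` shifted by 24, or any byte shifted by less), the bug never triggers, which is what made it so sneaky.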

Looking at the binary in Hopper we can see this:

![reverse](/upload/Screen_Shot_2017-08-03_at_2.23_.58_PM_.png)

And we notice the `movsxd` instruction, which is "move doubleword to quadword with sign-extension".
It moves the result of the shift left (`shl`) into a register, sign-extending it to fill the 64 bits the register can hold.
# How did length extension attacks make it into SHA-2?

David Wong | Thu, 03 Aug 2017 | http://www.cryptologie.net/article/417/how-did-length-extension-attacks-made-it-into-sha-2/
The attack targets hashes of the form `SHA-256(key | message)`, where the `key` is **secret** and where `|` means concatenation.

This is because a **SHA-2** hash (unless we're talking about the truncated versions) is literally a full copy of the internal state of the hash. More precisely, it is the state after hashing `key`, `message`, and some `padding`, because like everything in the symmetric crypto world the input needs to be padded to the block size (512 bits for SHA-256).

The attack lets you take such a hash, and continue the hashing to obtain the hash of `key | message | padding | more` where `more` is whatever you want. And all of this **without any knowledge of the secret key!**
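To see why the attack works, here is a toy Merkle-Damgård sketch in Python. The compression function is a stand-in (truncated SHA-256, purely for illustration) and the toy padding is a zero-fill without the usual length encoding, but the key property is the same: the digest IS the final chaining state, so anyone can resume hashing from it.

```python
import hashlib

def compress(state, block):
    # stand-in compression function, just for illustration
    return hashlib.sha256(state + block).digest()[:8]

def md_hash(msg, iv=b'\x00' * 8, blocksize=8):
    # toy padding: zero-fill to the block size (real padding also
    # encodes the length, but the attack works the same way)
    msg += b'\x00' * (-len(msg) % blocksize)
    state = iv
    for i in range(0, len(msg), blocksize):
        state = compress(state, msg[i:i + blocksize])
    return state  # the digest IS the final state

secret = b'key'  # unknown to the attacker
digest = md_hash(secret + b'message')

# knowing only the digest and the original length, the attacker
# continues hashing to obtain the hash of an extended message:
extension = b'more'.ljust(8, b'\x00')
forged = compress(digest, extension)

# it matches the honest hash of key | message | padding | more
padding = b'\x00' * 6  # pads the 10 bytes of b'keymessage' to 16
assert forged == md_hash(secret + b'message' + padding + b'more')
```

The attacker never touched `secret`: the digest alone was enough to keep absorbing blocks.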

![merkle damgard](/upload/fig2.png)

Interestingly, this comes from the way the [Merkle-Damgard](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction) construction is applied (without a good finalization function). Because of this, hash functions like **MD4, MD5, SHA-1** and **SHA-2** have all suffered from the same issue. You'd be glad to hear that it is fixed in the **SHA-3** contestants (read: **BLAKE2** and **SHAKE** and **SHA-3** are fine). **Keccak** (SHA-3's winner) fixes it by using a [Sponge construction](https://www.cryptologie.net/article/416/the-strobe-protocol-framework/), not letting you see a big part of the state (the capacity), while **BLAKE2** fixes it by using the [HAsh Iterative FrAmework](https://en.wikipedia.org/wiki/HAIFA_construction) (HAIFA), feeding a "number of bits hashed so far" counter (not including the padding) into the compression function.

![haifa](/upload/fig13.png)

While trying to find the exact date length extension attacks were discovered (which I couldn't), [Samuel Neves](https://twitter.com/sevenps?ref_src=twsrc%5Etfw&ref_url=https%3A%2F%2Fwww.cryptologie.net%2F) came up with an interesting response.

![twitter](/upload/Screen_Shot_2017-08-03_at_9.55_.39_AM_.png)

It looks like the NIST was made aware, during the standardization process of SHA-2, that simple fixes would **prevent** length extension attacks.

[This comment from John Kelsey](http://www.cs.utsa.edu/~wagner/CS4363/SHS/dfips-180-2-comments1.pdf) (who later joined the NIST) is from 28 August 2001 (by the way, it doesn't make sense to write dates as month/day/year; nobody can understand it outside of the US, and we have an ISO format that specifies a logical year-month-day). In it he talks about the attack and proposes a simple fix:

> Niels Ferguson suggested the following simple fix to me, some time ago: Choose some nonzero constant C0, of the same size as the hash function chaining variable. Hash messages normally, until we come to the last block in the padded message. XOR C0 into the chaining variable input into that last compression function computation. The resulting compression function output is used as the hash result. For concreteness, I propose C0 = 0xa5a5...a5, with the 0xa5 repeated until every byte is filled in. This should be interpreted in little-endian bit ordering.
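Here is a toy sketch of that fix in Python (stand-in compression function, toy zero padding without the length encoding, all hypothetical and just for illustration). Because C0 is XORed into the chaining variable right before the final block, the digest is no longer the plain running state, and an attacker who continues hashing from it no longer gets the honest hash of the extended message:

```python
import hashlib

C0 = bytes([0xa5]) * 8  # the repeated-0xa5 constant from the comment

def compress(state, block):
    # stand-in compression function, just for illustration
    return hashlib.sha256(state + block).digest()[:8]

def md_hash_fixed(msg, iv=b'\x00' * 8, blocksize=8):
    msg += b'\x00' * (-len(msg) % blocksize)
    blocks = [msg[i:i + blocksize] for i in range(0, len(msg), blocksize)]
    state = iv
    for block in blocks[:-1]:
        state = compress(state, block)
    # the fix: XOR C0 into the chaining variable before the last block
    state = bytes(s ^ c for s, c in zip(state, C0))
    return compress(state, blocks[-1])

digest = md_hash_fixed(b'keymessage')
# an attacker extending from the digest computes this:
forged = compress(digest, b'more'.ljust(8, b'\x00'))
# but the honest hash XORs C0 before its own (different) last block,
# so the forgery no longer matches:
assert forged != md_hash_fixed(b'keymessage' + b'\x00' * 6 + b'more')
```

The extension now fails because C0's position depends on where the message actually ends, which the attacker cannot replay.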

Why did the NIST ignore this when it could have modified the draft before publication? I have no idea. Is this one more fuck-up on their part?
# The Strobe Protocol Framework

David Wong | Wed, 02 Aug 2017 | http://www.cryptologie.net/article/416/the-strobe-protocol-framework/
## Introduction

The **Strobe Protocol Framework** is a specification, [available here](http://strobe.sourceforge.io/), which you can use to implement a primitive called the Strobe duplex construction. The implemented Strobe object responds to a dozen calls that can be combined to let you generate random numbers, derive keys, hash, encrypt, authenticate, and even build complex symmetric protocols.

The thing is sexy for several reasons:

1. you only use a single primitive to do all of your symmetric crypto
2. it makes the code size of your library extremely **small**, easy to fit in embedded devices and **easy to audit**
3. on top of that it allows you to create TLS-like protocols
4. every message/operation of your protocol depends on all the previous messages/operations

The last one might remind you of [Noise](http://noiseprotocol.org/), which is a protocol framework as well, but one that mostly focuses on the asymmetric part (the handshake). More on that later :)

## Overview

From a high level point of view, here is a very simple example of using it to **hash a string**:

```py
myHash = Strobe_init("hash")
myHash.AD("something to be hashed")
hash = myHash.PRF(outputLen=16)
```

You can see that you first instantiate a Strobe object with a custom name. I chose "hash" here but it could have been anything. The point is to personalize the result to your own protocol/system: initializing Strobe with a different name would give you a different hash function.

Here two functions are used: `AD` and `PRF`. The first one to insert the data you're about to hash, the second one to obtain a digest of 16 bytes. Easy right?

Another example to **derive keys**:

```py
KDF = Strobe_init("deriving keys for something")
KDF.KEY(keyInput)
key1 = KDF.PRF(outputLen=16)
key2 = KDF.PRF(outputLen=16)
```

Here we use a new call, `KEY`, which is similar to `AD` but provides [forward-secrecy](https://en.wikipedia.org/wiki/Forward_secrecy) as well. It is not needed here, but it looks nicer and so I'll use it. We then call `PRF` twice to derive two new keys from our first one.

Let me now give you a more complex example. So far we've only used Strobe to create primitives, what if I wanted to **create a protocol**? For example on the client side I could write:

```py
myProtocol = Strobe_init("my protocol v1.0")
myProtocol.KEY(sharedSecret)
buffer += myProtocol.send_ENC("GET /")
buffer += myProtocol.send_MAC(len=16)
# send the buffer
# receive a ciphertext
message = myProtocol.recv_ENC(ciphertext[:-16])
ok = myProtocol.recv_MAC(ciphertext[-16:])
if not ok:
    ...  # reset the connection
```

Since this is a symmetric protocol, something similar should be done on the server side.
The code above initializes an instance of Strobe called "my protocol v1.0", and then keys it with a pre-shared secret or some key exchange output. Whatever you like to put in there. Then it encrypts the GET request and sends the ciphertext along with an authentication tag of 16 bytes (should be enough). The client then receives some reply and uses the inverse operations to decrypt and verify the integrity of the message. This is what the server must have done when it received the GET request as well. This is pretty simple right?

There's so much more Strobe can do, it is up to you to build your own protocol using the different calls Strobe provides. Here is the full list:

* **AD**: Absorbs data to authenticate.
* **KEY**: Absorbs a key.
* **PRF**: Generates a random output (forward secure).
* **send_CLR**: Sends non-encrypted data.
* **recv_CLR**: Receives non-encrypted data.
* **send_ENC**: Encrypts data.
* **recv_ENC**: Decrypts data.
* **send_MAC**: Produces an authentication tag.
* **recv_MAC**: Verifies an authentication tag.
* **RATCHET**: Introduces forward secrecy.

There are also meta variants of some of these operations which allow you to specify that what you're operating on is some frame data and not the real data itself. But this is just a detail.

## How does it work?

Under its surface, Strobe is a duplex construction. Before I can explain that, let me first explain the **sponge construction**.

![permutation](/upload/Screen_Shot_2017-08-02_at_5.45_.06_PM_.png)

A sponge belongs to a field in cryptography called **permutation-based cryptography**. This is because at its core, it works on top of a permutation. The whole security of the thing is proven as long as your permutation is secure, meaning that it behaves like a random permutation. **What's a permutation?** Oh sorry, well, imagine the AES block cipher with a fixed all-zero key. It takes all the possible inputs of 128 bits, and it will give you all the possible outputs of 128 bits. It's a one-to-one mapping: for one plaintext there is always one ciphertext. That's a permutation.

**SHA-3** is based on the sponge construction by the way, and it uses the **Keccak-f[1600]** permutation at its core. Its security was assessed by long years of cryptanalysis (read: people trying to break it), and it works very similarly to AES: it has a series of steps that modify an input, and these steps are repeated many many times in what we call rounds. AES-128 has 10 rounds, Keccak-f[1600] has 24 rounds. The 1600 part of the name means that it has an input/output size of 1600 bits.

![public/secret](/upload/Screen_Shot_2017-08-02_at_5.54_.53_PM_.png)

So here our permutation is Keccak-f[1600], and we imagine that our input/output is divided into two parts: the public part (rate) and the secret part (capacity). Intuitively we'll say that the bigger the secret part is, the more secure the construction is. And indeed, SHA-3 has several flavors that will use different sizes according to the security advertised.

![absorb](/upload/Screen_Shot_2017-08-02_at_5.45_.17_PM_.png)

The message is padded and split into multiple blocks of the same size as the public part. To **absorb** them into our sponge, we just XOR each block with the public part of the state, then we permute the state.

![squeeze](/upload/Screen_Shot_2017-08-02_at_5.45_.23_PM_.png)

To obtain an output from this construction, we just retrieve the public part of our state. If it's not enough, we permute to modify the state of the sponge, then we collect the new public part so that it can be appended to the previous one. And we continue to do that until we have enough. If it's too much we truncate :)

And that's it! It's a sponge: we absorb and we squeeze. Makes sense, right?

This is exactly how SHA-3 works, and the output is your hash.

What if we're not done though? What if we want to continue absorbing, then squeeze again, then absorb again, etc.? This would give us a nice property: everything that we squeeze will depend on everything that has been absorbed and squeezed so far. This gives us **transcript consistency**.

![duplex](/upload/Screen_Shot_2017-08-02_at_6.00_.28_PM_.png)

The [Keccak team](http://keccak.noekeon.org/) said we can, and they created the **Duplex construction**. It's just something that allows us to absorb, to squeeze, to absorb, to squeeze, and on and on...
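As an illustration, here is a toy duplex object in Python. The permutation is a stand-in (hashing the state with SHA-256; NOT a real permutation and not secure, purely for demonstration), and the rate/capacity split is an arbitrary toy value rather than Strobe's real parameters, but the interleaved absorb/squeeze pattern is the idea:

```python
import hashlib

RATE = 16       # public part of the state, in bytes (toy value)
STATE_LEN = 32  # rate + capacity

def permute(state):
    # stand-in for Keccak-f: some fixed mixing of the whole state
    return bytearray(hashlib.sha256(bytes(state)).digest())

class ToyDuplex:
    def __init__(self):
        self.state = bytearray(STATE_LEN)

    def absorb(self, block):
        # XOR the input into the rate, then permute
        for i, b in enumerate(block[:RATE]):
            self.state[i] ^= b
        self.state = permute(self.state)

    def squeeze(self, n):
        # read the rate, permuting between reads if we need more
        out = b''
        while len(out) < n:
            out += bytes(self.state[:RATE])
            self.state = permute(self.state)
        return out[:n]

d = ToyDuplex()
d.absorb(b'some key')
tag1 = d.squeeze(16)
d.absorb(b'more data')  # we can keep absorbing after squeezing...
tag2 = d.squeeze(16)    # ...and each output depends on everything so far
```

Because the capacity (the bytes past `RATE`) is never exposed, seeing an output does not reveal the full state, which is what a real sponge's security argument rests on.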

## Building Strobe

"How is Strobe constructed on top of the Duplex construction?" you may ask. And I will give you an intuition of an answer.

Strobe has fundamentally 3 types of **internal** operations that are used to build the operations we've previously seen (`KEY`, `AD`, `send_ENC`, ...). They are the following:

* **default**: `state = input ⊕ state`
* **cbefore**: `state = input`
* **cafter**: `output, state = input ⊕ state`

The **default** one simply absorbs the input with the state. This is useful for any kind of operation since we want them to affect the outcome of the next ones.

The **cbefore** internal operation allows you to replace bits of the state with your input. This is useful when we want to provide forward-secrecy: if the state is later leaked, the attacker will not be able to recover a previous state, since bits of the rate have been erased. This is used to construct the `KEY`, `RATCHET` and `PRF` operations. While `KEY` replaces the state with bits from a key, `RATCHET` and `PRF` replace the state with zeros.

**cafter** is pretty much the same as the **default** operation, except that it also retrieves the output of the XOR. If you've seen how stream ciphers or one-time pads work, you might have recognized that this is how we can encrypt our plaintext. And if it wasn't more obvious to you, this is what will be used to construct the `Send_ENC` operations.

There is also one last thing: an internal flag called `forceF` that allows you to run the permutation **before** using any one of these internal operations. This is useful when you need to produce something from the Duplex construction: a ciphertext, a random number, a key, etc... Why? Because we want the result to depend on what happened previously, and since we can have many operations per block size we need to do this. You can imagine problems if we were not to do that: an encryption operation that would not depend on the previously inserted key for example.

Let's see some examples!

![KEY](/upload/Screen_Shot_2017-08-02_at_6.10_.10_PM_.png)

We'll start by keying our protocol. **We first absorb the name of the operation** (Strobe is verbose). We then permute (via the `forceF` flag) to start on a fresh block. Since the `KEY` operation also provides forward-secrecy, the `cbefore` internal operation is used to replace the bits of the state with the bits of the input (the key).

![Send_ENC](/upload/Screen_Shot_2017-08-02_at_6.10_.14_PM_.png)

After that we want to encrypt some data. We'll absorb the name of the operation (`send_ENC`), we'll permute (`forceF`) and we'll XOR our plaintext with the state to encrypt it. We can then send that ciphertext, which is coincidentally also part of the new state of our duplex construction.

I'll give you two more examples. We can't just send encrypted data like that, we need to protect its integrity. And why not include some additional data that we want to authenticate:

![AD, Send_MAC](/upload/Screen_Shot_2017-08-02_at_6.14_.03_PM_.png)

You'll notice that `AD` does not need to permute the Strobe state: this is because we're not sending anything (or obtaining an output from the construction), so we do not need to depend on what has happened previously yet. For the `send_MAC` operation we do need that, though, and we'll use the `cafter` internal operation with an input of 16 zeros to obtain the first 16 bytes of the state.

In these descriptions, I've simplified Strobe and omitted the padding. There is also a flag that is set differently depending on who sent the first message. All these details can be learned from [the specification](https://strobe.sourceforge.io/specs/).

## Now what?

Go play with it! Here is a list of things:

* [the specification is here](https://strobe.sourceforge.io/specs/)
* [the white paper is here](https://strobe.sourceforge.io/papers/)
* [you can subscribe to the mailing lists here](https://sourceforge.net/p/strobe/mailman/strobe-announce/)
* [Mike Hamburg's talk at Real World Crypto](https://www.youtube.com/watch?v=l7xV5z1eJLw)
* [the reference C and python implementations](https://strobe.sourceforge.io/code/)
* [my readable Go implementation](https://github.com/mimoo/StrobeGo/blob/master/strobe/strobe.go)

Note that this is still a beta, and it's still **experimental**.



# Interview: How to pique your curiosity in cryptography

David Wong | Sun, 30 Jul 2017 | http://www.cryptologie.net/article/415/interview-how-to-pique-your-curiosity-in-cryptography/
> We talked to the cryptographer David Wong about crypto-related blogs worth reading and exploring in an interview. We also asked him about the changing landscape of the crypto-world and the awareness of IT security issues.

[You can read the full interview here](https://netzpolitik.org/2017/interview-how-to-pique-your-curiosity-in-cryptography/).

The list of crypto/security blogs I maintain is available [here on Github](https://github.com/mimoo/crypto_blogs).
# SHA-3 vs the world: slides

David Wong | Sat, 29 Jul 2017 | http://www.cryptologie.net/article/414/sha-3-vs-the-world-slides/
# BEAST: An Explanation of the CBC Padding Oracle Attack on TLS

David Wong | Fri, 28 Jul 2017 | http://www.cryptologie.net/article/413/beast-an-explanation-of-the-cbc-padding-oracle-attack-on-tls/
<iframe width="853" height="480" src="https://www.youtube.com/embed/-_8-2pDFvmg" frameborder="0" allowfullscreen></iframe>
# Defcon: SHA-3 vs the world

David Wong | Mon, 10 Jul 2017 | http://www.cryptologie.net/article/412/defcon-sha-3-vs-the-world/
It will be about recent hash functions, it will focus a lot on SHA-3 and it will try to avoid any of the recent controversy on which hash function is better (it will be hard but I will try to be neutral and fair).

It'll be recorded if you can't make it. If you can make it, head to the crypto village at 11am on Friday. [See the Defcon Crypto Village schedule here](https://cryptovillage.org/dc25/schedule.html). And here is the abstract:

> Since Keccak has been selected as the winner of the SHA-3 competition in 2012, a myriad of different hash functions have been trending. From BLAKE2 to KangarooTwelve we'll cover what hash functions are out there, what is being used, and what you should use. Extending hash functions, we’ll also discover STROBE, a symmetric protocol framework derived from SHA-3.

![sponge](/upload/Screen_Shot_2017-07-10_at_2.22_.30_PM_.png)
# How big are TLS records during a handshake?

David Wong | Thu, 22 Jun 2017 | http://www.cryptologie.net/article/410/how-big-are-tls-records-during-a-handshake/
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Around how many bytes is a Client Hello in TLS 1.2?</p>— David Wong (@lyon01_david) <a href="https://twitter.com/lyon01_david/status/877843516960985088">June 22, 2017</a></blockquote>

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Approximately how many byte is a Server Hello in TLS 1.2?</p>— David Wong (@lyon01_david) <a href="https://twitter.com/lyon01_david/status/877843723870027776">June 22, 2017</a></blockquote>

People mostly got it right for the **Client Hello**. But it wasn't as easy for the **Server Hello**.

**Client Hello** → 212 bytes

**Server Hello** → 66 bytes

These are just numbers I got from a TLS 1.2 handshake with a random website. They are influenced by the browser I use and the configuration of the server, but they should be close to that range anyway, as the structure of a Client Hello or a Server Hello is quite simple.

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">In a TLS 1.2 handshake. What is bigger?</p>— David Wong (@lyon01_david) <a href="https://twitter.com/lyon01_david/status/877894618372878337">June 22, 2017</a></blockquote>
A better question would have been: which one is the bigger message? The **Client Hello** would always win. This is because the **Server Hello** only replies with one choice from each list of choices the client proposed. For example, the server will choose only one ciphersuite from the 13 suites the client proposed. The server will choose one curve from the 3 different curves proposed by the client. The server will choose a single signature algorithm from the client's 10 propositions. And on and on...

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Approximately how many bytes is a Server's Certificate in TLS 1.2?</p>— David Wong (@lyon01_david) <a href="https://twitter.com/lyon01_david/status/877844553402912772">June 22, 2017</a></blockquote>

Everyone (mostly) got this one!

**Certificate** → 2540 bytes

Obviously, this is the biggest message of the handshake by far. The number I got is from receiving two certificates, each about a thousand bytes. Servers tend to send the full chain of certificates to the client, so longer chains will increase the size of this message. This is probably why there are propositions for a [certificate compression extension](https://www.ietf.org/mail-archive/web/tls/current/msg22875.html).

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Approximately how many bytes is a ClientKeyExchange in TLS 1.2?</p>— David Wong (@lyon01_david) <a href="https://twitter.com/lyon01_david/status/877850352531509248">June 22, 2017</a></blockquote>

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Approximately how many bytes is a ServerKeyExchange in TLS 1.2?</p>— David Wong (@lyon01_david) <a href="https://twitter.com/lyon01_david/status/877850238203166720">June 22, 2017</a></blockquote>

**ServerKeyExchange** → 338 bytes

**ClientKeyExchange** → 75 bytes

Both of these messages include the peer's public key during ephemeral key exchanges. But the **ServerKeyExchange** additionally contains the parameters of the key exchange algorithm and a signature over the server's public key. In my case, the signature was done with RSA-2048 and took 256 bytes, while the NIST P-256 public keys took 65 bytes.

Using ECDSA for signing, signatures could have been smaller. Using FFDH for the key agreement, public keys could have been bigger.
[Tim Dierks](https://twitter.com/tdierks) also mentioned that using RSA-10000 would have drastically increased the size of the **ServerKeyExchange**.
Maybe a better question, again, would be which one is the bigger message.

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Which one is bigger in a TLS 1.2 handshake?</p>— David Wong (@lyon01_david) <a href="https://twitter.com/lyon01_david/status/877850494378455040">June 22, 2017</a></blockquote>

People mostly got it right here!

The rest of the handshake is negligible:

**ChangeCipherSpec** is just 6 bytes indicating a switch to encryption. It will always be the same size no matter what kind of handshake you went through; most of its length comes from the record's header.

**Finished** is 45 bytes. Its content is a MAC of the handshake transcript, but an additional MAC is added to protect the integrity of the ciphertext (ciphertext expansion). Remember, **Finished** is the first (and only) encrypted message in a handshake.

<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> ]]>
Crypto training at Black Hat USA David Wong Tue, 20 Jun 2017 12:02:19 +0200 http://www.cryptologie.net/article/409/crypto-training-at-black-hat-usa/ http://www.cryptologie.net/article/409/crypto-training-at-black-hat-usa/#comments
It will be a blend of culture, exercises and technical deep dives. For 2 days, students learn all the cool crypto attacks, dive deeply into some of them, and practice via numerous exercises. ]]>
Noise+Strobe=Disco David Wong Mon, 19 Jun 2017 13:39:46 +0200 http://www.cryptologie.net/article/408/noisestrobedisco/ http://www.cryptologie.net/article/408/noisestrobedisco/#comments
[Noise](http://noiseprotocol.org/) is a protocol framework allowing you to build different lightweight TLS-like handshakes depending on your use case. Its benefits are a small code size, very few dependencies, and simplicity of the security guarantees and analysis. It focuses primarily on the initial asymmetric phase of the setup of a secure channel, but leaves you with two ciphers that you can use to read and write on both sides of the connection. If you want to know more, I wrote a [readable implementation](https://github.com/mimoo/NoiseGo/blob/master/readable/noise.go), and have a [tutorial video](https://www.cryptologie.net/article/349/the-noise-protocol-framework/).

[Strobe](https://strobe.sourceforge.io/) is a protocol framework as well, focusing on the symmetric part of the protocol. Its simplicity boils down to using only one cryptographic primitive: the duplex construction. This allows developers to benefit from an ultra-short cryptographic code base supporting their custom-made symmetric protocols, as well as their different needs for cryptographic functions. Indeed, Strobe can also be used to instantiate a hash function, a key derivation function, a pseudo-random number generator, a message authentication code, an authenticated encryption with associated data cipher, etc. If you want to know more, I wrote a [readable implementation](https://www.cryptologie.net/article/398/strobego/) and [Mike Hamburg gave a talk at RWC](https://www.youtube.com/watch?v=l7xV5z1eJLw).

**Noise+Strobe=Disco**. One of Noise's major characteristics is that it keeps a running hash, digesting every message and allowing every new handshake message to mix the transcript into its encryption while authenticating previous messages received and sent. Strobe works like that naturally: its duplex function absorbs every call made to the underlying primitive (the Keccak permutation), to the extent that every new operation is influenced by every operation that happened previously. These common traits in Strobe and Noise led me to pursue a merge between the two: what if the running hash and symmetric state in Noise were simply Strobe's primitive? And what if, at the end of a handshake, Noise would just spit out two Strobe objects that also depend on the handshake transcript? I talked to Trevor Perrin about it, and his elegant suggestion for a name (Disco) and my curiosity led to [an implementation of what it would look like](https://github.com/mimoo/NoiseGo/blob/master/disco/disco.go).
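
As a toy illustration of this running-transcript idea (this is not Disco's actual construction, just the principle, using SHA-256 as the mixing function): chaining a hash over every message makes the final state depend on the entire conversation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// absorb mixes a new message into a running transcript hash:
// h' = SHA-256(h || msg). Every later state depends on every
// earlier message, like Noise's symmetric state or Strobe's duplex.
func absorb(h [32]byte, msg []byte) [32]byte {
	return sha256.Sum256(append(h[:], msg...))
}

func main() {
	var h1, h2 [32]byte
	msgs := [][]byte{[]byte("e"), []byte("e, ee, s, es"), []byte("payload")}
	for _, m := range msgs {
		h1 = absorb(h1, m)
	}
	// flip one byte in the very first message: the final state diverges
	msgs[0] = []byte("E")
	for _, m := range msgs {
		h2 = absorb(h2, m)
	}
	fmt.Println("transcripts equal?", h1 == h2) // false
}
```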

This is of course highly experimental. I [modified the Noise's specification](https://github.com/mimoo/NoiseGo/blob/master/disco/disco.md) to see how much I could remove/simplify from it and the result is already enjoyable.

I've discussed the changes on [the mailing list](https://moderncrypto.org/mail-archive/noise/2017/001122.html). But simply put: the CipherState has been removed, and the SymmetricState has been replaced by calls to Strobe. This leaves us with only one object: the HandshakeState. Every symmetric algorithm has been removed (HKDF, HMAC, HASH, AEAD). The specification looks way shorter, while the Disco implementation is more than half the size of the Noise implementation.

Strobe's calls naturally absorb every operation, and can encrypt/decrypt the handshake messages even if no shared secret has been negotiated yet (via a non-keyed duplex construction). This simplifies corner cases where you would otherwise have to test whether a shared secret has already been negotiated or not. ]]>
Readable implementation of the Noise protocol framework David Wong Mon, 19 Jun 2017 13:21:01 +0200 http://www.cryptologie.net/article/407/readable-implementation-of-the-noise-protocol-framework/ http://www.cryptologie.net/article/407/readable-implementation-of-the-noise-protocol-framework/#comments
To learn more about Noise you can also check this screencast I shot last year:

<iframe width="853" height="480" src="https://www.youtube.com/embed/ceGTgqypwnQ" frameborder="0" allowfullscreen></iframe>

My current research includes merging this framework with [the Strobe protocol framework](https://strobe.sourceforge.io/) [I've talked about previously](https://www.cryptologie.net/article/392/strobes-padding/).

This led me to first implement a readable and understandable version of Noise [here](https://github.com/mimoo/NoiseGo/blob/master/readable/noise.go).

Note that this is highly experimental and it has not been thoroughly tested.

I also had to deviate from the specification when naming things because Golang:

* doesn't use snake_case, but Noise does.
* capitalizes function names to make them public, while Noise capitalizes them for different reasons.
]]>
SIMD instructions in Go David Wong Mon, 19 Jun 2017 11:29:39 +0200 http://www.cryptologie.net/article/406/simd-instructions-in-go/ http://www.cryptologie.net/article/406/simd-instructions-in-go/#comments
Note that we do not need to use SSE3/SSE4 (or AVX2), as the interesting functions are already contained in SSE2 (respectively AVX), which has wider support and is included in later versions of SSE (respectively AVX) anyway.

[The official Blake2](https://github.com/golang/crypto/tree/master/blake2b) implementation in Go actually uses SIMD instructions. Looking at it is a good way to see how SIMD coding works in Go.

In [_amd64.go](https://github.com/golang/crypto/blob/master/blake2b/blake2bAVX2_amd64.go#L9:L12), they use the builtin `init()` function to figure out at runtime what is supported by the host architecture:

```go
func init() {
	useAVX2 = supportsAVX2()
	useAVX = supportsAVX()
	useSSE4 = supportsSSE4()
}
```

These are calls to assembly functions that detect what is supported, either via:

1. [a CPUID call directly](https://github.com/golang/crypto/blob/master/blake2b/blake2b_amd64.s#L284) for SSE4.
1. [calls to Golang's runtime library](https://github.com/golang/crypto/blob/master/blake2b/blake2bAVX2_amd64.s#L753) for AVX and AVX2.

In the second solution, the [runtime variables](https://golang.org/src/runtime/runtime2.go#L746) seem to be undocumented and only available since [go1.7](https://github.com/golang/crypto/blob/master/blake2b/blake2b_amd64.go#L5); they are probably filled via [cpuid](https://github.com/minio/blake2b-simd/blob/master/cpuid.go#L30:L60) calls as well. Surprisingly, the [internal/cpu package](https://github.com/golang/go/blob/master/src/internal/cpu/cpu.go#L15:L32) already has all the necessary functions to detect flavors of SIMD. See an example of use in the [bytes package](https://github.com/golang/go/blob/master/src/bytes/bytes_amd64.go#L19).

And that's it! Blake2's [hashBlocks()](https://github.com/golang/crypto/blob/master/blake2b/blake2bAVX2_amd64.go#L33LL43) function then dynamically decides which function to use at runtime:

```go
func hashBlocks(h *[8]uint64, c *[2]uint64, flag uint64, blocks []byte) {
	if useAVX2 {
		hashBlocksAVX2(h, c, flag, blocks)
	} else if useAVX {
		hashBlocksAVX(h, c, flag, blocks)
	} else if useSSE4 {
		hashBlocksSSE4(h, c, flag, blocks)
	} else {
		hashBlocksGeneric(h, c, flag, blocks)
	}
}
```

Because Go does not have [intrinsic functions](https://groups.google.com/forum/#!topic/golang-nuts/yVOfeHYCIT4) for SIMD, these are implemented directly in assembly. You can look at the code in the relevant [_amd64.s](https://github.com/golang/crypto/blob/master/blake2b/blake2bAVX2_amd64.s) file. Now it's kind of tricky because Go has invented its own assembly language (based on [Plan9](http://doc.cat-v.org/plan_9/4th_edition/papers/asm)) and you have to find things out the hard way. Instructions like [VINSERTI128](http://www.felixcloutier.com/x86/VINSERTI128.html) and [VPSHUFD](http://www.felixcloutier.com/x86/PSHUFD.html) are the SIMD instructions. MMX registers are M0...M7, SSE registers are X0...X15, AVX registers are Y0...Y15. [MOVDQA](http://www.felixcloutier.com/x86/MOVDQA.html) is called MOVO (or MOVOA) and [MOVDQU](http://www.felixcloutier.com/x86/MOVDQU.html) is called MOVOU. Things like that.

As for AVX-512, Go probably still doesn't have instructions for it, so you'll need to write the raw opcodes yourself using `BYTE` ([like here](https://github.com/golang/crypto/blob/master/blake2b/blake2bAVX2_amd64.s#L115)), [as explained here](https://golang.org/doc/asm#unsupported_opcodes).

]]>
SIMD instructions in crypto David Wong Mon, 19 Jun 2017 10:21:22 +0200 http://www.cryptologie.net/article/405/simd-instructions-in-crypto/ http://www.cryptologie.net/article/405/simd-instructions-in-crypto/#comments
## MMX, SSE, SSE2, AVX, AVX2, AVX-512

To support parallelization, a common way is to use [SIMD instructions](https://en.wikipedia.org/wiki/SIMD), a set of instructions generally available on any modern 64-bit architecture that allow computation on large blocks of data (64, 128, 256 or 512 bits). Using them to operate on blocks of data is what we often call [vector/array programming](https://en.wikipedia.org/wiki/Array_programming); the compiler will sometimes [optimize your code](https://en.wikipedia.org/wiki/Automatic_vectorization) by automatically using these large SIMD registers.

SIMD instructions have been around since the 70s and have become really common. This is one of the reasons why image, sound, video and games all work so well nowadays. Generally, if you're on a 64-bit architecture, your CPU will support SIMD instructions.

There are several versions of these instructions. On Intel's side these are called MMX, SSE and AVX instructions. AMD has SSE and AVX instructions as well. On ARM these are called NEON instructions.

MMX allows you to operate on 64-bit registers at once (called MM registers). SSE, SSE2, SSE3 and SSE4 all allow you to use 128-bit registers (XMM registers). AVX and AVX2 introduced 256-bit registers (YMM registers) and the more recent AVX-512 supports 512-bit registers (ZMM registers).

## How To Compile?

OK, looking back at the [Keccak Code Package](https://github.com/gvanas/KeccakCodePackage/), [I need to choose what architecture to compile my Keccak code for](https://gist.github.com/mimoo/746205bc29e171ba8ad5b75793283057) to take advantage of the parallelization. I have a MacBook Pro, but had no idea what version of SSE or AVX my CPU model supports. One way to find out is to use [www.everymac.com](http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i7-3.1-13-early-2015-retina-display-specs.html) → I have an Intel [Broadwell](https://en.wikipedia.org/wiki/Broadwell_\(microarchitecture\)) CPU, which seems to support AVX2!

Looking at the list of architectures supported by the Keccak Code Package I see Haswell, which is of the same family and supports AVX2 as well. Compiling with it, [I can run my KangarooTwelve code with AVX2 support](https://gist.github.com/mimoo/746205bc29e171ba8ad5b75793283057), which parallelizes four runs of the Keccak permutation at the same time using these 256-bit registers!

In more detail, the Keccak permutation goes through several rounds (12 for KangarooTwelve, 24 for ParallelHash) that need to serially operate on a succession of 64-bit lanes. AVX's 256-bit registers (no need for AVX2) allow four 64-bit lanes to be operated on at the same time. That's effectively four Keccak permutations running in parallel.
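
To see what a 256-bit register buys you: the same 64-bit operation is applied to four lanes at once, i.e. to one lane of four parallel Keccak states. Here is that idea as plain scalar Go (illustrative only, no SIMD involved; with AVX2 the CPU performs all four lanes with a couple of 256-bit instructions):

```go
package main

import (
	"fmt"
	"math/bits"
)

// rol64x4 applies the same 64-bit rotation to four lanes: the scalar
// equivalent of what the AVX2 Keccak code does on one YMM register.
func rol64x4(v [4]uint64, r int) [4]uint64 {
	for i := range v {
		v[i] = bits.RotateLeft64(v[i], r)
	}
	return v
}

func main() {
	lanes := [4]uint64{1, 2, 3, 4} // one lane from each of 4 Keccak states
	fmt.Println(rol64x4(lanes, 1)) // [2 4 6 8]
}
```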

## Intrinsic Instructions

[Intrinsic functions](https://en.wikipedia.org/wiki/Intrinsic_function) are functions you can use directly in code that are later recognized and handled by the compiler.

[Intel has an awesome guide on these here](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX2&expand=3). You just need to find out which function to use, which is pretty straightforward looking at the documentation.

![doc](/upload/Screen_Shot_2017-06-18_at_7.11_.02_PM_.png)

In C, if you're compiling with GCC on an Intel/AMD architecture, you can start using intrinsic functions for SIMD by including [x86intrin.h](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/x86intrin.h). Or you can use this snippet to include the correct file for different combinations of compilers and architectures:

```c
#if defined(_MSC_VER)
/* Microsoft C/C++-compatible compiler */
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC-compatible compiler, targeting x86/x86-64 */
#include <x86intrin.h>
#elif defined(__GNUC__) && defined(__ARM_NEON__)
/* GCC-compatible compiler, targeting ARM with NEON */
#include <arm_neon.h>
#elif defined(__GNUC__) && defined(__IWMMXT__)
/* GCC-compatible compiler, targeting ARM with WMMX */
#include <mmintrin.h>
#elif (defined(__GNUC__) || defined(__xlC__)) && (defined(__VEC__) || defined(__ALTIVEC__))
/* XLC or GCC-compatible compiler, targeting PowerPC with VMX/VSX */
#include <altivec.h>
#elif defined(__GNUC__) && defined(__SPE__)
/* GCC-compatible compiler, targeting PowerPC with SPE */
#include <spe.h>
#endif
```

If we look at the reference implementation of [KangarooTwelve in C](https://github.com/gvanas/KeccakCodePackage/blob/master/PlSnP/KeccakP-1600-times4/SIMD256/KeccakP-1600-times4-SIMD256.c) we can see how they decided to use the AVX2 instructions. [They first define](https://github.com/gvanas/KeccakCodePackage/blob/master/PlSnP/KeccakP-1600-times4/SIMD256/KeccakP-1600-times4-SIMD256.c#L876) a `__m256i` variable which will hold 4 lanes at the same time.

```c
typedef __m256i V256;
```

They then [declare a bunch of them](https://github.com/gvanas/KeccakCodePackage/blob/master/PlSnP/KeccakP-1600-times4/SIMD256/KeccakP-1600-times4-SIMD256.c#L444). Some of them will be used as temporary registers.

They then use [unrolling](https://www.cryptologie.net/article/399/loop-unrolling/) to write [the 12 rounds of Keccak](https://github.com/gvanas/KeccakCodePackage/blob/master/SnP/KeccakP-1600/Optimized/KeccakP-1600-unrolling.macros#L44). Which are [defined](https://github.com/gvanas/KeccakCodePackage/blob/master/PlSnP/KeccakP-1600-times4/SIMD256/KeccakP-1600-times4-SIMD256.c#L473) via relevant [AVX2 instructions](https://github.com/gvanas/KeccakCodePackage/blob/master/PlSnP/KeccakP-1600-times4/SIMD256/KeccakP-1600-times4-SIMD256.c#L39):

```c
#define ANDnu256(a, b) _mm256_andnot_si256(a, b)
#define CONST256(a) _mm256_load_si256((const V256 *)&(a))
#define CONST256_64(a) (V256)_mm256_broadcast_sd((const double*)(&a))
#define LOAD256(a) _mm256_load_si256((const V256 *)&(a))
#define LOAD256u(a) _mm256_loadu_si256((const V256 *)&(a))
#define LOAD4_64(a, b, c, d) _mm256_set_epi64x((UINT64)(a), (UINT64)(b), (UINT64)(c), (UINT64)(d))
#define ROL64in256(d, a, o) d = _mm256_or_si256(_mm256_slli_epi64(a, o), _mm256_srli_epi64(a, 64-(o)))
#define ROL64in256_8(d, a) d = _mm256_shuffle_epi8(a, CONST256(rho8))
#define ROL64in256_56(d, a) d = _mm256_shuffle_epi8(a, CONST256(rho56))
#define STORE256(a, b) _mm256_store_si256((V256 *)&(a), b)
#define STORE256u(a, b) _mm256_storeu_si256((V256 *)&(a), b)
#define STORE2_128(ah, al, v) _mm256_storeu2_m128d((V128*)&(ah), (V128*)&(al), v)
#define XOR256(a, b) _mm256_xor_si256(a, b)
#define XOReq256(a, b) a = _mm256_xor_si256(a, b)
#define UNPACKL( a, b ) _mm256_unpacklo_epi64((a), (b))
#define UNPACKH( a, b ) _mm256_unpackhi_epi64((a), (b))
#define PERM128( a, b, c ) (V256)_mm256_permute2f128_ps((__m256)(a), (__m256)(b), c)
#define SHUFFLE64( a, b, c ) (V256)_mm256_shuffle_pd((__m256d)(a), (__m256d)(b), c)
```

And if you're wondering how each of these `_mm256` functions is used, you can check the [same Intel documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3,307,3,3,4759&text=_mm256_shuffle_pd):

![avx shuffle](/upload/Screen_Shot_2017-06-19_at_9.29_.25_AM_.png)

Voila! ]]>
Tamarin Prover Introduction David Wong Wed, 14 Jun 2017 21:42:58 +0200 http://www.cryptologie.net/article/404/tamarin-prover-introduction/ http://www.cryptologie.net/article/404/tamarin-prover-introduction/#comments
<iframe width="853" height="480" src="https://www.youtube.com/embed/XptJG19hDcQ" frameborder="0" allowfullscreen></iframe> ]]>
A New Public-Key Cryptosystem via Mersenne Numbers David Wong Sun, 11 Jun 2017 20:47:04 +0200 http://www.cryptologie.net/article/403/a-new-public-key-cryptosystem-via-mersenne-numbers/ http://www.cryptologie.net/article/403/a-new-public-key-cryptosystem-via-mersenne-numbers/#comments
![paper](/upload/Screen_Shot_2017-06-11_at_8.24_.27_PM_.png)

A lot of keywords here are really interesting. But first, **what is a Mersenne prime?**

A Mersenne prime is simply a prime \\(p\\) such that \\(p=2^n - 1\\). The nice thing about it is that the programming way of writing such a number is

```c
(1 << n) - 1
```

which is a long series of `1`s.

![mersenne](/upload/Screen_Shot_2017-06-11_at_8.27_.14_PM_.png)

A number modulo this prime can be any bitstring of the Mersenne prime's length.
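
This shape also makes modular reduction cheap: since \\(2^n \equiv 1 \pmod{2^n-1}\\), the bits above position \\(n\\) can simply be folded back onto the low bits, with no division. A small sketch in Go:

```go
package main

import "fmt"

// mersenneReduce computes x mod (2^n - 1) without division:
// because 2^n ≡ 1 (mod 2^n - 1), the bits above position n
// can just be added back onto the low n bits.
func mersenneReduce(x uint64, n uint) uint64 {
	mask := uint64(1)<<n - 1
	for x>>n != 0 {
		x = x>>n + x&mask
	}
	if x == mask { // p itself reduces to 0
		return 0
	}
	return x
}

func main() {
	const n = 13 // p = 8191, a Mersenne prime
	x := uint64(123456789)
	fmt.Println(mersenneReduce(x, n), x%8191) // both print 2037
}
```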

OK, we now know what a Mersenne prime is. How do we build our new **public key cryptosystem** with it?

Let's start with a private key: `(secret, privkey)`, two bitstrings of low hamming weight, meaning that they do not have a lot of bits set to `1`.

![privkey](/upload/Screen_Shot_2017-06-11_at_8.33_.35_PM_.png)

Now something very intuitive happens: the inverse of such a bitstring will probably have a high hamming weight, which leads us to believe that \\(secret \cdot privkey^{-1} \pmod{p}\\) **looks random**. This will be our **public key**.
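
Here is a toy check of that intuition, with absurdly small parameters compared to the real scheme: modulo the Mersenne prime \\(2^{13}-1\\), the inverse of a weight-2 bitstring generally has a much higher hamming weight.

```go
package main

import (
	"fmt"
	"math/bits"
)

// inverseMod returns a^-1 mod p via the extended Euclidean algorithm.
func inverseMod(a, p int64) int64 {
	t, newt := int64(0), int64(1)
	r, newr := p, a
	for newr != 0 {
		q := r / newr
		t, newt = newt, t-q*newt
		r, newr = newr, r-q*newr
	}
	if t < 0 {
		t += p
	}
	return t
}

func main() {
	const p = 8191          // 2^13 - 1, a Mersenne prime
	privkey := int64(0x101) // hamming weight 2
	inv := inverseMod(privkey, p)
	fmt.Println(privkey * inv % p)             // 1, sanity check
	fmt.Println(bits.OnesCount64(uint64(inv))) // typically large, out of 13 bits
}
```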

Now that we have a `private key` and a `public key`. **How do we encrypt ?**

[The paper](http://eprint.iacr.org/2017/481) starts with a very simple scheme on how to encrypt a bit \\(b\\).

\\[ciphertext = (-1)^b \cdot ( A \cdot pubkey + B ) \pmod{p} \\]

with \\(A\\) and \\(B\\) two public numbers that have low hamming weights as well.

We can see intuitively that the ciphertext will have a high hamming weight (and thus might look random).

If you are not convinced, all of this is based on actual proofs that such operations between low and high hamming weight bitstrings will yield other low or high hamming weight bitstrings. All of this really works because we are working modulo a \\(1111\cdots\\) kind of number. The following lemmas, taken from the [paper](http://eprint.iacr.org/2017/481), are proven in section 2.1.

![lemma](/upload/Screen_Shot_2017-06-11_at_8.40_.07_PM_.png)

**How do you decrypt such an encrypted bit?**

This is how:

\\[ciphertext \cdot privkey \pmod{p}\\]

This will yield either a low hamming weight number → the original bit \\(b\\) was a \\(0\\),
or a high hamming weight number → the original bit \\(b\\) was a \\(1\\).

You can convince yourself by following the equation:

![decryption](/upload/Screen_Shot_2017-06-11_at_8.43_.18_PM_.png)
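
Written out, using \\(pubkey = secret \cdot privkey^{-1} \pmod{p}\\), the decryption computes:

\\[ciphertext \cdot privkey = (-1)^b \cdot ( A \cdot secret + B \cdot privkey ) \pmod{p} \\]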

And again, intuitively you can see that everything is low hamming weight except for the value of \\((-1)^b\\).

![value](/upload/Screen_Shot_2017-06-11_at_8.44_.19_PM_.png)

This scheme doesn't look CCA-secure nor practical. [The paper](http://eprint.iacr.org/2017/481) goes on with an explanation of a more involved cryptosystem in section 6.

**EDIT**: there is already a reduction of the security estimates [published on eprint](http://eprint.iacr.org/2017/522.pdf). ]]>
Is Symmetric Security Solved? David Wong Sun, 11 Jun 2017 16:35:14 +0200 http://www.cryptologie.net/article/402/is-symmetric-security-solved/ http://www.cryptologie.net/article/402/is-symmetric-security-solved/#comments
A few days ago, at the [Crypto SummerSchool in Croatia](https://summerschool-croatia.cs.ru.nl/2017/), **Nik Kinkel** told me that he would generally recommend against letting developers tweak the nonce value, based on how AES-GCM tends to be heavily misused in the wild. As a recap: if a nonce is used twice to encrypt two different messages, **AES-GCM will leak the authentication key**.

I think it's a fair improvement to AES-GCM to remove the nonce argument. By doing so, nonces have to be **randomly generated**. The new danger is that the same nonce is randomly generated twice for the same key. The [birthday bound](https://www.cryptologie.net/article/166/the-birthday-paradox/) tells us that after \\(2^{n/2}\\) messages, \\(n\\) being the bit-size of a nonce, you have great odds of repeating a previous nonce.

The optimal rekey point has been studied by [Abdalla and Bellare](https://cseweb.ucsd.edu/~mihir/papers/rekey.html) and can be computed as the cube root of the nonce space. If more nonces are generated after that, the chances of a nonce collision are too high. For AES-GCM this means that after \\(2^{96/3} = 2^{32} = 4294967296\\) different messages, the key should be rotated.

This is of course assuming that you use 96-bit nonces with 32-bit counters. Some protocols and implementations will actually fix the first 32 bits of these 96-bit nonces, reducing the birthday bound even further.

Isn't that a bit low?

Yes, it kinda is. An interesting construction by Daniel J. Bernstein called [XSalsa20](http://cr.yp.to/snuffle/xsalsa-20081128.pdf) (which can be extended to XChaCha20) allows us to use nonces of 192 bits. This means you should be able to use the same key for up to \\(2^{192/3} = 2^{64} = 18446744073709551616\\) messages, which is already twice what a signed `BIGINT` can store in a database.

It seems like Sponge-based AEADs should benefit from large nonces as well since their rate can store even more bits. This might be a turning point for these constructions in the last round of the [CAESAR competition](https://aezoo.compute.dtu.dk/doku.php). There are currently 4 of these: Ascon, Ketje, Keyak and NORX.

With that in mind, is nonce mis-use resistance now fixed?

**EDIT**: Here is a list of recent papers on the subject:

* [Reconsidering the Security Bound of AES-GCM-SIV](https://eprint.iacr.org/2017/708)
* [Better Bounds for Block Cipher Modes of Operation via Nonce-Based Key Derivation](https://eprint.iacr.org/2017/702)
* [Increasing the Lifetime of Symmetric Keys for the GCM Mode by Internal Re-keying](https://eprint.iacr.org/2017/697)
* [updatable Authenticated Encryption](http://eprint.iacr.org/2017/527) ]]>
Implementation of Kangaroo Twelve in Go David Wong Sun, 11 Jun 2017 15:58:02 +0200 http://www.cryptologie.net/article/401/implementation-of-kangaroo-twelve-in-go/ http://www.cryptologie.net/article/401/implementation-of-kangaroo-twelve-in-go/#comments
It is heavily based on the official Go [x/crypto/sha3](https://godoc.org/golang.org/x/crypto/sha3) library. But because of minor implementation details, the relevant files have been copied and modified, so you do not need Go's SHA-3 implementation to run this package. Hopefully one day Go's SHA-3 library will be flexible enough to allow other Keccak constructions to rely on it.

I have tested this implementation with different test vectors and it works fine. Note that it has not received proper peer review. If you look at the code and find issues (or not) please [let me know](https://www.cryptologie.net/contact/)!

See here [why you should use KangarooTwelve instead of SHA-3](https://www.cryptologie.net/article/393/kangarootwelve/). But [see here first why you should still not skip SHA-3](https://www.cryptologie.net/article/400/maybe-you-shouldnt-skip-sha-3/).

This implementation does not yet make use of **SIMD** to parallelize the implementation. But we can already see improvements due to the smaller number of rounds:

<table>
<tr>
<td></td><td>100 bytes</td><td>1000 bytes</td><td>10,000 bytes</td>
</tr>
<tr>
<td>K12</td><td>761 ns/op</td><td>1875 ns/op</td><td>15399 ns/op</td>
</tr>
<tr>
<td>SHA3</td><td>854 ns/op</td><td>3962 ns/op</td><td>34293 ns/op</td>
</tr>
<tr>
<td>SHAKE128</td><td>668 ns/op</td><td>2853 ns/op</td><td>29661 ns/op</td>
</tr>
</table>

This was done with a very simple [bench script](https://gist.github.com/mimoo/62c400454cee863bfbe41bc34f3b286a) on my 2-year-old MacBook Pro. ]]>
Maybe you shouldn't skip SHA-3 David Wong Fri, 02 Jun 2017 12:14:20 +0200 http://www.cryptologie.net/article/400/maybe-you-shouldnt-skip-sha-3/ http://www.cryptologie.net/article/400/maybe-you-shouldnt-skip-sha-3/#comments
Speed drives adoption, and [Daniel J. Bernstein](https://news.ycombinator.com/item?id=11355742) (djb) probably understands this more than anyone else. This is what led Adam Langley to decide to either stay on SHA-2 or move to BLAKE2. Is that good advice? Should we all follow in his steps?

I think it is important to remember that Google, as well as the other big players, have an agenda for speed. I'm not saying that they do not care about security; it would be stupid for me to say that, especially seeing all of the improvements we've had in the field thanks to them in the last few years (Project Zero, Android, Chrome, [Wycheproof](https://github.com/google/wycheproof), [Tink](https://github.com/google/tink), [BoringSSL](https://boringssl.googlesource.com/boringssl/), email encryption, [Key Transparency](https://github.com/google/keytransparency), [Certificate Transparency](https://www.certificate-transparency.org/), ...).

What I'm saying is that although they care deeply about security, they are also looking for compromises to gain speed. This can be seen with the push for [0-RTT](https://blog.cloudflare.com/introducing-0-rtt/) in TLS, but I believe we're seeing it here as well with a push for either KangarooTwelve (K12) or BLAKE2 instead of SHA-3/SHAKE and BLAKE (which are more secure versions of K12 and BLAKE2).

Adam Langley even went as far as to recommend folks to stay on SHA-2.

But how can we keep advising SHA-2 when we know it can be **badly** misused? Yes, I'm talking about length extension attacks. These attacks prevent you from using `SHA-2(key|data)` to create a Message Authentication Code (MAC).

We recently cared so much about **misuses** of [AES-GCM](https://eprint.iacr.org/2016/475) (**documented misuses!**) that Adam Langley's blog post preceding the SHA-3 one is about [AES-GCM-SIV](https://www.imperialviolet.org/2017/05/14/aesgcmsiv.html).
We cared so much about **simple APIs** that [Tink](https://github.com/google/tink) recently removed the nonce argument from AES-GCM's API entirely.

If we cared so much about documented misuses of crypto APIs, how can we not care about this **undocumented** misuse of SHA-2? Intuitively, if SHA-2 behaved like a random oracle, there would be no problem at all with the `SHA-2(key|data)` construction. Actually, none of the more secure hash functions like BLAKE and SHA-3 have this problem.
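
To see concretely what length extension is, here is a toy Merkle-Damgård hash: this is NOT SHA-2, just the same iterated structure with a made-up 64-bit compression function. Given only `H(key || msg)` and the length of `key || msg`, an attacker can compute a valid tag for an extended message without ever knowing the key:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// compress is a made-up 64-bit compression function; only the
// iterated Merkle-Damgård structure matters for the attack.
func compress(s, block uint64) uint64 {
	return bits.RotateLeft64(s^block, 13) * 0x9E3779B97F4A7C15
}

// toyHash absorbs 8-byte little-endian blocks (zero-padding the last
// partial block) and finishes with a block encoding the total length.
func toyHash(m []byte) uint64 {
	s := uint64(0x0123456789ABCDEF)
	total := uint64(len(m))
	for len(m) > 0 {
		var buf [8]byte
		n := copy(buf[:], m)
		s = compress(s, binary.LittleEndian.Uint64(buf[:]))
		m = m[n:]
	}
	return compress(s, total)
}

// extend forges a tag for key||msg||glue||ext given only the tag of
// key||msg and its length: the digest IS the internal state, so the
// attacker simply resumes absorbing from it. No key required.
func extend(tag uint64, knownLen int, ext []byte) (uint64, []byte) {
	pad := (8 - knownLen%8) % 8
	glue := make([]byte, pad+8) // zero padding + original length block
	binary.LittleEndian.PutUint64(glue[pad:], uint64(knownLen))
	s, total := tag, uint64(knownLen+len(glue)+len(ext))
	for len(ext) > 0 {
		var buf [8]byte
		n := copy(buf[:], ext)
		s = compress(s, binary.LittleEndian.Uint64(buf[:]))
		ext = ext[n:]
	}
	return compress(s, total), glue
}

func main() {
	key, msg := []byte("topsecret"), []byte("amount=100")
	tag := toyHash(append(append([]byte{}, key...), msg...)) // the flawed MAC

	// the attacker knows only tag and len(key||msg), and picks a suffix
	forgedTag, glue := extend(tag, len(key)+len(msg), []byte("&admin=true"))

	// an honest recomputation over the extended message agrees with it
	forged := append(append(append(append([]byte{}, key...), msg...), glue...), []byte("&admin=true")...)
	fmt.Println("forgery valid?", forgedTag == toyHash(forged)) // true
}
```

The same resume-from-the-digest trick works on SHA-256 and SHA-512 because their digest is their full internal state; sponges like SHA-3 keep part of the state hidden, which is why they don't have this problem.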

If we cared so much about simple APIs, why would we advise people to "fix" SHA-2 by truncating SHA-512's output to 256 bits (SHA-512/256)?

The reality is that **you should use SHA-3**. I'm making this as a broad recommendation for people who do not know much about cryptography. You can't go wrong with NIST's standard.

Now let's see how we can dilute a good advice.

If you care about speed you should use SHAKE or BLAKE.

If you really care about speed you should use KangarooTwelve or BLAKE2.

If you really really care about speed you should use SHA-512/256. (**edit**: people are pointing to me that Blake2 is also faster than that)

If you really really really care about speed you should use CRC32 (don't, it's a joke).

How big of a **security compromise** are you willing to make? We know the big players have decided, but have they decided for you?

**Is SHA-3 that slow?**

Where is a hash function usually used? One major use is in signing messages. You hash the message before signing it because you can't sign something that is too big.

Here is a number taken from [djb's benchmarks](http://bench.cr.yp.to/results-sign.html) on how many cycles it takes to sign with ed25519: `187746`

Here is a number taken from [djb's benchmarks](https://bench.cr.yp.to/results-hash.html) on how many cycles it takes to hash a byte with keccak512: `15`

[Keccak has a page with more numbers for software performance](http://keccak.noekeon.org/sw_performance.html)

Spoiler Alert: you probably don't care about a few cycles.

Not to say there are no cases where this is relevant, but if you're doing a huge amount of hashing and you're hashing big messages, you should rather switch to a tree-hashing mode. KangarooTwelve, BLAKE2bp and BLAKE2sp all support that.

**EDIT**: The Keccak team just released [a response](http://keccak.noekeon.org/is_sha3_slow.html) as well. ]]>