david wong

Hey! I'm David, the author of the Real-World Cryptography book. I'm a crypto engineer at O(1) Labs on the Mina cryptocurrency, previously I was the security lead for Diem (formerly Libra) at Novi (Facebook), and a security consultant for the Cryptography Services of NCC Group. This is my blog about cryptography and security and other related topics that I find interesting.

Quick access to articles on this page:

more on the next page...

What's my birthday? posted September 2019

My colleague Mesut asked me if using random identifiers of 128-bit would be enough to avoid collisions.

I've been asked similar questions, and every time my answer goes something like this:

you need to calculate the number of outputs you need to generate in order to get good odds of finding collisions. If that number is impressively large, then it's fine.

The birthday bound is often used to calculate this. If you crypto, you must have heard of something like this:

with the SHA-256 hash function, you need to generate at least 2128 hashes in order to have more than 50% chance of finding collisions.

And you know that usually, you can just divide the exponent of your domain space by two to find out how much output you need to generate to reach such a collision.

Now, this figure is a bit deceiving when it comes to real world cryptography. This is because we probably don't want to define "OK, this is bad" as someone reaching the point of having 50% chance of finding a collision. Rather, we want to say:

someone reaching one in a billion chance (or something much lower) to find a collision would be bad.

In addition, what does it mean for us? How many identifiers are we going to generate per second? How much time are we willing to keep this thing secure?

To truly answer this question, one needs to plug in the correct numbers and play with the birthday bound formula. Since this is not the first time I had to do this, I thought to myself "why don't I create an app for this?" and voila.

birthday bound

You can play with it here.

Thanks to my tool, I can now answer Mesut's question:

If you generate one million identifiers per second, in 26 years you will have one in a billion chance to generate a collision. Is this enough? If this is not adversary-controlled, or it is rate-limited, you will probably not generate millions of identifiers per second though, but rather thousands, in this case it will take 265 centuries to get these odds.

3 comments

My book Real World Cryptography is out in pre-access posted June 2019

Manning Publications reached out to me last year with an opportunity for a book. I had been thinking of a book for quite some time, as I felt that the landscape lacked a book targeted to developers, students and engineers who did not want to learn about the history of cryptography, or have to sort through too many technical details and mathematic formulas, and wanted an up to date survey of modern applied cryptography. In addition, I love diagrams. I don’t understand why most books underuse them. When you think of AES-CTR what do you think about? I bet the excellent diagrams from Wikipedia just flash in your mind.

The book Real World Cryptography was released today in pre-access. This means you’ll be able to read new chapters as I write them, and be able to provide feedback on topics you wished I would include and questions you wish I would answer.

5 comments

Libra: a usable cryptocurrency posted June 2019

libra

At 2am this morning Libra was released.

and it seems to have broken the internet (sorry about that ^.^")

I've never worked on something this big, and I'm overwhelmed by all this reception. This is honestly pretty surreal from where I'm standing.

Libra is a cryptocurrency, which is on-par with other state-of-the-art blockchains. Meaning that it attempts to solve a lot of the problems Bitcoin originally had:

  • Energy Waste. The biggest reproach that people have on Bitcoin, is that it wastes a lot of our electricity. Indeed, because of the proof of work mechanism people constantly use machines to hash useless data in order to find new blocks. Newer cryptocurrencies, including Libra, make use of Byzantine Fault Tolerance (BFT) consensus protocols, which are pretty green by definition.
  • Efficiency. Bitcoin is notably slow, with a block being mined every 10 minutes, and a minimum confirmation time of one hour. BFT allows us to "mine" a new block every 3 seconds (in reality it can even go much faster).
  • Safety. Another problem with Bitcoin (or proof of work-based cryptocurrencies) is that it forks, constantly, and then re-organize itself around the "main chain". This is why one must wait several blocks to confirm that their transaction has been included. This concept is not great at all, as we've seen with Ethereum Classic which was forked (not so long ago) with more than 100 block in the past! BFT protocols never fork once they commit a block. What you see on the chain, is the final chain always. This is why it is so fast (and so sexy).
  • Stability. This one is pretty self-explanatory. Bitcoin's price has been anything but stable. Gamblers actually strive on that. But for a global currency to be useful, it has to keep a certain rate for people to use it safely. Libra uses a reserve of real assets to back the currency. This is the most conservative way to achieve stability, and it is probably the most contentious point about Libra, but one needs to remember that this is all in order to achieve stability. Stability is required if we want this to be useful for everyone.
  • Adoption. This final point is the most important in my opinion, and this is the reason I've joined Facebook on this journey. Adoption is the largest problem to all cryptocurrencies right now, even though you hear about them in the news very few people use them to actually transact (and most people use them to speculate instead). The mere size of the association (which is planned to reach 100 members from all around the world) and the user-base of Facebook is going to be a big factor in adoption. That's the most exciting thing about the project.

On top of that, it is probably one of the most interesting projects in cryptography right now. The codebase is in Rust, it uses the Noise Protocol Framework, it will include BLS signatures and formally verified smart contracts. And there's a bunch of other exciting stuff to discover!

If you're interested you should definitely check the many papers we've published:

I've read many comments about this project, and here's how I would summarize my point of view: this is a crazy and world-scale project. There are not many projects with such an impact, and we'll have to be very careful about how we walk towards that goal. How will it change the world? Like a lot of global projects, it will have its ups and downs, but I believe that this is a positive net worth project for the world (if it works). We're in a unique position to change the status quo for the better. It's going to be exciting :)

If you're having trouble understanding why this could work, think about it this way. You currently can't transact money easily as soon as you're crossing a border, and actually, for a lot of countries (like the US) even intra-border money transfers are a pain. Currently the best banks in the world are probably Monzo and Revolut, and they're not available everywhere. Why? Because the banking system is very complex. By using a cryptocurrency, you are skipping decades of progress and setting up a interoperable network. Any banks and custody wallets can now use this network. You literally get the same thing you would get with your normal bank (same privacy, same usability, etc.) except that now banks themselves have access to a cheap and global network. The cherry on top is that normal users can bypass banks and use it directly, and you can monitor the total amount of money on the network. No more random printing of money.

A friend compared this project to nuclear energy: you can debate about it long and large, but there's no doubt it has advanced humanity. I feel the same way about this one. This is a clear improvement.

2 comments

A book in preparation posted June 2019

I've started writing a book on applied cryptography at the beginning of 2019, and I will soon release a pre-access version. I will talk about that soon on this blog!

Alice and Bob

(picture taken from the book)

The book's audience is for students, developers, product managers, engineers, security consultants, curious people, etc. It tries to avoid the history of cryptography (which seems to be unavoidable in any book about cryptography these days), and shy away from mathematical formulas. Instead, it relies heavily on diagrams! A lot of them! As such, it is a broad introduction to what is useful in cryptography and how one can use the different primitives if seen as black boxes. It attempts to also serve the right amount of details, to satisfy the reader's curiosity. I'm hopping for it to be a good book for quickly getting introduced to different concepts going from TLS to PAKE. It will also include more modern topics like post-quantum cryptography and cryptocurrencies.

I don't think there's anything like this yet. the classic Applied Cryptography is quite old now and did not do much to encourage best practices or discourage rolling your own. The excellent Serious Cryptography is more technical and has more depth than what I'm aiming for. My book will rather be something in between, or something that would (hopefully) look like Matthew Green's blog if it was a book (minus a lot of the humor, because I suck at making jokes).

More to come!

1 comment

Why 2f+1 posted May 2019

Have you ever wondered why byzantine agreement protocols seem to all start with the assumptions that less than a third of the participants can be malicious?

This axiom is useful down the line when you want to prove safety, or in other words that your protocol can't fork. I made a diagram to show that, an instance of the protocol that forks (2 proposals have been committed with 2f+1 votes from the participants) is absurd.

2f+1 byzantine agreement

comment on this story

New Job, New Design posted April 2019

As I'm transitioning to the Blockchain team of Facebook, I decided it would be a good time to give this place a bit of a refresh :)

I've started blogging here in 2013:

2013

I quickly found out a layout I liked and sticked with it. In 2014 it looked like this:

2014

In 2018 it looked a bit different:

2018

And finally the new re-design which should be brighter:

2019

Hope you like it :)

1 comment

Disco: whitepaper posted February 2019

I already wrote a lot about Disco:

Today, I released the white paper that introduces the construction. Check it out on ePrint.

disco paper

It's a combination of everything I've talked about, plus:

  • some experimental results on embedded devices done by Matteo Bocchi and Ruggero Susella
  • some preliminary results on formal verification with Tamarin Prover

I unfortunately did not have the time to complete the formal verification of several important lemmas. But I am hopping that I can achieve this in a later paper. The formal verification code is on github and anybody is welcome to help :)

comment on this story

Bits and Bytes ordering in 5 minutes posted February 2019

1. Bits and Their Encoding

Imagine that I generate a key to encrypt with AES. I use AES-128, instead of AES-256, so I need a 128-bit key.

I use whatever mechanism my OS gives me to generate a long string of bits. For example in python:

>>> import os;
>>> random_number = os.urandom(16)
>>> print(bin(int(random_number, 16))[2:])
11111010110001111100010010101111110101101111111011100001110000001000010100001000000010001001000110111000000111101101000000101011

These bits, can be interpreted as a large number in base 2. Exactly how you would interpret 18 as "eighteen" in base 10.

This is the large number in base 10:

>>> print(int(random_number, 16))
333344255304826079991460895939740225579

According to wolframalpha it reads like so in english:

333 undecillion 344 decillion 255 nonillion 304 octillion 826 septillion 79 sextillion 991 quintillion 460 quadrillion 895 trillion 939 billion 740 million 225 thousand 579.

This number can be quite large sometimes, and we can make use of letters to shorten it into something more human readable. Let's try base 16 which is hexadecimal:

>>> print(random_number.encode('hex'))
fac7c4afd6fee1c085080891b81ed02b

You often see this method of displaying binary strings to a more human readable format. Another popular one is base64 which is using, you guessed it, base 64:

>>> import base64
>>> print(base64.b64encode(random_number))
+sfEr9b+4cCFCAiRuB7QKw==

And as you can see, the bigger the base, the shorter the string we get. That is quite useful to keep something human readable and short.

2. Bytes and Bit-wise Operations

Let's go back to our bitstring

11111010110001111100010010101111110101101111111011100001110000001000010100001000000010001001000110111000000111101101000000101011

this is quite a lot of bits, and we need to find a way to store that in our computer memory.

The most common way, is to pack these bits into bytes of 8 bits (also called one octet):

11111010 11000111 11000100 10101111 11010110 11111110 11100001 11000000 10000101 00001000 00001000 10010001 10111000 00011110 11010000 00101011

As you can see, we just split things every 8 bits. In each bundle of 8 bits, we keep the bit-numbering with the most significant bit (MSB) first. We could have had the least significant bit (LSB) first instead, but since our larger binary string already had MSB first, it makes sense to keep it this way. It's also more "human" as we are used to read numbers from left to right (at least in English, French, Chinese, etc.)

Most programming languages let you access octets instead of bits directly. For example in Golang:

a := []byte{98, 99} // an array of bytes
b := a[0] // the byte represented by the base 10 number '98'

To act on a specific bit, it's a bit more effort as we need to segregate it via bitwise operations like NOT, AND, OR, XOR, SHIFTs, ROTATIONs, etc.

For example in Golang:

a := byte(98)
firstBit := a >> 7 // shifting 7 bits to the right, leaving the MSB intact and zero'ing the others

So far, all of these things can be learned and anchored in your brain by writing code for something like cryptopals for example.

3. Memory

OK. How do we store these octets in memory? Unfortunately, because of historical reasons, we have two ways of doing this:

  1. Big-Endian: from low memory address (00000....) to high memory address (999999...) in that order.
  2. Little-Endian: from high memory address (9999999...) to lower memory address (0000000...) in that order.

We call this Endianness.

I'm sorry, but to understand the rest of this article, you are going to have to parse this small snippet of C first:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

int main(){
  uint8_t a[] = {1, 255};         // storing [1, 255]
  printf("%p: x\n", a, *a);       // 0x7ffdc5e78a70: 01
  printf("%p: x\n", a+1, *(a+1)); // 0x7ffdc5e78a71: ff
}

As we can see, everything works as expected:

  • a points to an address in memory (0x7ffdc5e78a70) containing $1$
  • the next address (0x7ffdc5e78a71) points to the value $255$ (displayed in hexadecimal)

The number 0x01ff (the 0x is a nice way to indicate that it is hexadecimal) represents the number $1 \times 16^2 + 15 \times 16^1 + 15 \times 16^0 = 511$ (remember, f represents the number 15 in hexadecimal).

So let's try to store that number in a different way in C:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

int main(){
  uint16_t b = 0x01ff;               // storing [1, 255] ?
//uint16_t b = 511                   // these two lines are equivalent

  uint8_t *a = (uint8_t*)&b;         // getting octet pointer on b
  printf("%p: x\n", a, *a);       // 0x7ffd78106986: ff
  printf("%p: x\n", a+1, *(a+1)); // 0x7ffd78106987: 01
}

Wait what? Why is the order of 01 and ff reversed?

This is because the machine I used to run this uses little-endianness to map values to memory (like most machines nowadays).

If you didn't know about this, it should freak you out.

But relax, this weirdness almost NEVER matters. Because:

  1. in most languages, you do not do pointer arithmetic (what I just did when I incremented a)
  2. in most scenarios, you do not convert back and forth between bytestrings and number types (like int or uint16_t).

And this is pretty much why most systems don't care too much about using little-endian instead of big-endian.

4. Network

Networking is usually the first challenge someone unfamiliar with endianness encounters.

When receiving bytes from a TCP socket, one usually stores them into an array. Here is a simple example in C where we receive a string from the network:

char *a = readFromNetwork() // [104, 101, 108, 108, 111, 0]
printf("%s\n", a);          // hello

Notice that we do not necessarily know in which order (endianness) the bytes were sent, but protocols usually agree to use network byte order which is big-endian. This works pretty well for strings, but when it comes to number larger than 8-bit, you need to know how to re-assemble it in memory depending on your machine.

Let's see why this is a problem. Imagine that we want to transmit the number $511$. We need two bytes: 0x01 and 0x0ff. We transmit them in this order since it is big-endian which is the prefered network-byte order. On the other side, here is how we can receive the two bytes, and convert them back into a number type:

uint8_t a1[] = {1, 255};      // storing the received octets as-is (from left to right)
uint8_t a2[] = {255, 1};      // storing the octets from right to left after reversing them
uint16_t *b1 = (uint16_t*)a1;
uint16_t *b2 = (uint16_t*)a2;
printf("%"PRIu16"\n", *b1);    // 65281
printf("%"PRIu16"\n", *b2);    // 511

In this case, we see that to collect the correct number $511$ on the other end of the connection, we had to reverse the order of the bytes in memory. This is because our machine is little-endian.

This is what confuses most people!

In reality, it shouldn't. And this should re-assure you, because trying to figure out the endianness of your machine before converting a series of bytes received from the network into a number can be daunting.

Instead, we can rely on bitwise operations that are always emulating big-endianness! Let's take a deep look at this short snippet of code:

uint8_t* a[] = {1, 255};          // the number 511 encoded in network byte-order
uint16_t b = (a[0] << 8) | a[1];
printf("%"PRIu16"\n", b);         // 511

Here, we placed the received big-endian numbers in the correct big-endian order via the left shift operation. This code works on any machine. It is the key to understanding why endianness doesn't matter in most cases: bit-wise operations are endianness-independent.

Unless your job is to implement low-level stuff like cryptogaphy, you do not care about endianness. This is because you will almost never convert a series of bytes to a number, or a number to a series of bytes.

If you do, because of networking perhaps, you use the built-in functions of the language (see Golang or C for example) and endianness-independent operations (like left shift), but never pointer arithmetic.

4 comments