david wong

Hey! I'm David, a security consultant at Cryptography Services, the crypto team of NCC Group . This is my blog about cryptography and security and other related topics that I find interesting.

Myths about /dev/urandom

posted December 2014

In one of my class the teacher advise against /dev/urandom.

I wondered why. I remembered reading some articles about random vs urandom and urandom being better, but that was years ago and my memory is not fresh. Wikipedia does advise against it, as does the manpage if you want to generate a long term key.

I also stumbled on one of Thomas Pornin's answer on the security SO also pointing to a blogpost from Thomas Hühn

tl;dr:

Fact: /dev/urandom is the preferred source of cryptographic randomness on UNIX-like systems.

comment on this story

Differential Fault Analysis

posted December 2014

I wrote about Differential Power Analysis (DPA) but haven't said that there were way more efficient attacks (although that might be more costy to setup). Differential Fault Analysis is a kind of differential cryptanalysis: you analyse the difference between blocks of the internal state and try to extract a subkey or a key. Here we do a fault injection on the internal state of the smartcard during an encryption operation (usually with lasers (photons have the property of igniting a curant in a circuit), or by quickly changing the temperature). The attack presented in http://eprint.iacr.org/2010/440.pdf and https://eprint.iacr.org/2003/010.pdf is targeting the last subkey.

aes fault

We inject a fault on 1 byte of AES (in the picture we consider the internal state of AES to be a 4x4 matrix of bytes) at a particular spot (before the last round) and we see that at one point it creates a diagonal of errors. We can XOR the internal state without fault with the faulty one to display only the propagation of the fault.

aes fault

Here, by doing an hypothesis on keys and seeing how the Addkey operation is modifying this difference we can compute the last subkey.

On AES-128, it is sufficient to know K10 to find the cipher key, but on AES-256, you must know K13 and K14

Although this is only my understanding of the DFA. It also seems to be easier to produce on RSA (and it was originally found by Shamir on RSA).

comment on this story

The Birthday Paradox

posted November 2014

I'm studying the internals of hash functions and MACs right now. One-way Compression Functions, Sponge functions, CBC-MAC and... the Merkle–Damgård construction. Trying to find a youtube video about it I run into... The Cryptography course of Dan Boneh I already took 3 years ago. I have a feeling I will forever return to that course during my career as a cryptographer.

The whole playlist is here on youtube and since his course is awesome I just watched again the whole part about MACs. And I thought I should post this explanation of the birthday paradox since as he says:

Everybody should see a proof of the birthday paradox at least once in their life

Something that always bugged me though is that he says the formula for the birthday is 1.2 sqrt(365) whereas it should be square root of 366 since there are indeed 366 different birthdays possible.

comment on this story

ROP and ROPGadget

posted November 2014

This morning I had a course on Return Oriented Programming given by Jonathan Salwan, a classmate of mine also famous inventor of RopGadget.

The slides are here.

A lot of interesting things there. Apparently it's still kind of impossible to completely protect your C code against that kind of attack. Even with all the ASLR, PIE, NX bit and other protections... There is also an awesome lecture about ROP on Coursera I linked to in the previous post here.

Basically, since you can't execute code in the stack, and since the addresses of libraries are randomized because of ASLR, you can find bits of codes ending with a return (called gadgets) and chain them since you control the stack (thus the saved EIPs). What I learned by doing was that it gets complicated if it's 64bits (since a lot of address will have a lot of 0x00 and you can't point to those doing a buffer overflow through a strcpy or something similar) and you won't get a lot of those gadgets if you have dynamically loaded libraries. Static libraries are loaded in the .text section (which is executable of course), so that's all good. Also a good way to store strings of data are in the .data section since it is untouched by the randomization contrarily to the stack.

A lot of researches is done on the subject and new tools like RopGadget are coming, using an old concept (but still actively researched): the SAT solvers. There seems to be a problem though, those SAT solvers yield a set of gadgets to be used for some action you want to accomplish with your shellcode, but you have to do the work of putting them in the right order.

This is what I took from that talk, you can question the guy if that interests you!

comment on this story

Software Security course on Coursera

posted November 2014

I've already talked about Coursera before, and how much I liked it.

The Cryptography course by Dan Boneh is amazing and I often come back to it when I need a reminder. For example, even today I rewatched his video on AES because I was studying Differential Fault Analysis on AES (which is changing bits of the state during one round of AES to leak information about the last round subkey).

So if I could give you another course recommendation, it would be Software Security by Michael Hicks. It looks ultra complete and the few videos I've watched (to complete the security course I'm taking at the University of Bordeaux by Emmanuel Fleury) are top notch.

comment on this story

DPA: Differential Power Analysis

posted November 2014

Studying about smartcard there seem to be a lot about whitboxes to learn, since it is indeed a whitebox: the encryption/decryption that are done inside the cards can be analyzed since you own the card. Analysis are separated in different categories like non-intrusive and intrusive. Intrusive because for efficient analysis you would have to remove some part of the plastic covering the interesting parts and directly plug yourself on the chip. This is what Differential Power Analysis (DPA) do, it's a stronger kind of Simple Power Analaysis (SPA).

Kocher & al found out about this in 1998 and released a paper that is still very useful today: http://www.cryptography.com/public/pdf/DPA.pdf

The idea is to record the power consumption of the chip along multiple encryptions. You then obtain curves with pics that you can correlate to XORs operations being performed. You can guess what cipher is used, and where are the known rounds/operations of the cipher from the intensities of some peaks, and the periodicity of some patterns. In the paper they study DES which is still the state of the art for block ciphers then.

Looking at a big number of such curves, along with the messages (or ciphertexts) they encrypted, you can focus on one operation and one bit of the internal state to find out one bit of one of the subkey. One bit should affect the number of XORs being performed thus you should find a correlation between the bit you're looking for and the power consumption at one point. Repeat and find all the other ones. It's powerful because you only need to find one bit of the subkey, one after the other.

It's pretty hard to explain it without pictures (and a video would be even better, that's always something I have been wanting to do, if I dig deeper into it maybe I'll try that). But the basic idea is here, if you want more info check the original paper

comment on this story

Pseudo Random Number Generators using a block cipher in CTR mode

posted November 2014

I was wondering why Randomized Algorithm were often more efficient than non-randomized algorithm.

Then I looked at a list of random number generators (or RNG).

Of course we usually talk about PRNG (Pseudo Random Number Generator) since "truly random" is impossible/hard to achieve.

An interesting thing I stumbled into is that you can create a PRNG using a block cipher in counter mode, by iterating the counter and always encrypting the same thing, if the block cipher used is good, it should look random.

This sounds solid since ciphers sometimes need to have Ciphertext Indistinguishability from random noise.

To support such deniable encryption systems, a few cryptographic algorithms are specifically designed to make ciphertext messages indistinguishable from random bit strings

Also under the Ciphertext indistinguishability property that a cipher should respect, you shouldn't be able to find any relations between the ciphertexts coming from the same input but encrypted with an increasing counter.

1 comment

MicroCorruption

posted November 2014

microcorruption

MicroCorruption is a "game" made by Matasano in which you will have to debug some programs in assembly. There is a total of 19 levels and it gets harder and harder by the number. The first levels are made for beginners though! So it seems like a great tool to learn, and that's what our prof Emmanuel Fleury must have thought when he gave us this as homework. One rule: every level is worth one point.

I started writing the solutions here but as people told me "it was unethical to post solutions of a challenge online", I promptly removed them. If someday the challenge will shut down I will post the write ups online so that people can still learn from it :)

comment on this story

Elliptic Curves

posted October 2014

I feel like I don't write much about my formation, and that it could be useful to people who are wondering about studying Cryptography at Bordeaux University.

There is a good article from a M1 student here: http://journaldumaster.stats.yt/master-csi-presentation/

And as it says there, the master 1 is do-able both for maths and CS people as long as you're willing to catch up in the other subject. There's a lot of theory that will allow you to study more interesting subjects in the second year of Master.

I've talked about some of the subjects but one subject I forgot to talk about was a M1 class: Elliptic Curves, taught by Fabien Pazuki and if you have the chance of taking a class from the guy just do it. He's one of the best math teacher I have had in my life, along with Vincent Borrelli (Surfaces & Curves at Lyon 1) and some dude I can't remember the name of. Each one of them were both really passionate and making true efforts to be pedagogical.

comment on this story

Bruteforce Apr1 hashes.

posted May 2014

One of my professor organized a Hacking Week this semester but I didn't have time to do it. Since I'm in holidays I thought I would take a look at it and write a bit about how I solved them.

Here's the Crypto Challenge number 2 (out of 5) from this CTF (Capture The Flag):

user0:$apr1$oTsx8NNn$bAjDZHpM7tCvHermlXKfZ0 user1:$apr1$UxOdpNtW$funTxZxL/8y3m8STvonWj0
user2:$apr1$w7YNTrjQ$0/71H7ze5o9/jCnKLt0mj0 user3:$apr1$AIw2h09/$Ti0TRlU9mDpCGm5zg.ZDP. user4:$apr1$048HynE6$io7TkN7FwrBk6PmMzMuyC. user5:$apr1$T2QG6cUw$eIPlGIXG6KZsn4ht/Kpff0 user6:$apr1$2aLkQ0oD$YRb6aFYMkzPoUCj70lsdX0

You have 7 different users with their respective password hashed and you have to find them. It's just the 2nd out of 5 crypto problems, it's pretty basic, but I never brute forced passwords for real before (I remember using John The Ripper when I was in middle school but that's for script kiddies).

What's Apr1 ? It's a hash function that uses md5. And md5 is pretty weak, lots of rainbow tables on google.

This is how Apr1 looks in PHP according to Wikipedia, also the passwords are supposed to be alpha (a to z) in lowercase.

function apr1($mdp, $salt) {
    $max = strlen($mdp);
    $context = $mdp.'$apr1$'.$salt;
    $binary = pack('H32', md5($mdp.$salt.$mdp));
    for($i=$max; $i>0; $i-=16)
        $context .= substr($binary, 0, min(16, $i));
    for($i=$max; $i>0; $i>>=1)
        $context .= ($i & 1) ? chr(0) : $mdp{0};
    $binary = pack('H32', md5($context));
    for($i=0; $i<1000; $i++) {
        $new = ($i & 1) ? $mdp : $binary;
        if($i % 3) $new .= $salt;
        if($i % 7) $new .= $mdp;
        $new .= ($i & 1) ? $binary : $mdp;
        $binary = pack('H32', md5($new));
    }
     $hash = '';
    for ($i = 0; $i < 5; $i++) {
        $k = $i+6;
        $j = $i+12;
        if($j == 16) $j = 5;
        $hash = $binary{$i}.$binary{$k}.$binary{$j}.$hash;
    }
    $hash = chr(0).chr(0).$binary{11}.$hash;
    $hash = strtr(
        strrev(substr(base64_encode($hash), 2)),
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
        './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
    );
    return '$apr1$'.$salt.'$'.$hash;
}

It seems pretty difficult to reverse. Let's not forget that hashes are one-way functions and that they also lose information. I don't know if they do lose information on a 7-letters-password though, but it seemed quite stupid to go down this road when I could just brute force it.

What language offers a good library to hash with Apr1? Well I didn't know, and I felt like maybe Unix could do it well for me.

Turns out that OpenSSL has a command line for it:

openssl passwd -apr1 -salt SALT PASSWD

A quick bash script later:

#!/bin/bash

test[1]='$apr1$oTsx8NNn$bAjDZHpM7tCvHermlXKfZ0'
salt[1]='oTsx8NNn'

test[2]='$apr1$UxOdpNtW$funTxZxL/8y3m8STvonWj0'
salt[2]='UxOdpNtW'

test[3]='$apr1$w7YNTrjQ$0/71H7ze5o9/jCnKLt0mj0'
salt[3]='w7YNTrjQ'

test[4]='$apr1$AIw2h09/$Ti0TRlU9mDpCGm5zg.ZDP.'
salt[4]='AIw2h09/'

test[5]='$apr1$048HynE6$io7TkN7FwrBk6PmMzMuyC.'
salt[5]='048HynE6'

test[6]='$apr1$T2QG6cUw$eIPlGIXG6KZsn4ht/Kpff0'
salt[6]='T2QG6cUw'

test[7]='$apr1$2aLkQ0oD$YRb6aFYMkzPoUCj70lsdX0'
salt[7]='2aLkQ0oD'

while read line          
do          
    if [ "${#line}" == 7 ]
    then
    for num in {1..7}
    do
        noob=$(openssl passwd -apr1 -salt $salt[$num] $line)
        if [ "$noob" == "$test[$num]" ];
        then
        echo $line;
        fi
    done
    fi
done < /usr/share/dict/words

I read the /user/share/dict/words that contains a simple dictionary of words on Unix, I try only the 7-letters-words.

The test ran in a few minutes and gave me nothing.

Well, I guess with a 7 letters password they must have used gibberish words. Let's try a real bruteforce:

for a in {a..z}
do
    for b in {a..z}
    do
        for c in {a..z}
        do
            for d in {a..z}
            do
                for e in {a..z}
                do
                    for f in {a..z}
                    do
                        for g in {a..z}
                        do
                            truc=$a$b$c$d$e$f$g;

                            for num in {1..7}
                            do
                            noob=$(openssl passwd -apr1 -salt $salt[$num] $truc)
                            if [ "$noob" == "$test[$num]" ];
                            then
                                echo $truc;
                            fi
                            done
                        done
                    done
                done
            done
        done
    done
done

It ran and ran and... nothing.

Well. Let's not spend too much on this. There is John The Ripper that does this well and even oclHashcat that does this with the GPU.

Let's create a john.conf with the following to limit the password to 7 letters:

[Incremental:Alpha7]
File = $JOHN/alpha.chr
MinLen = 7
MaxLen = 7
CharCount = 26

Let's launch John:

john -i=Alpha7 hackingweek.txt

(don't forget to put the hashed password in hackingweek.txt).

Wait and wait and wait.. and get the passwords =)

comment on this story

Find all the pairs in a list that are summing to a known number

posted May 2014

I got asked this question in an interview. And I knew this question beforehands, and that it had to deal with hashtables, but never got to dig into it since I thought nobody would asked me that for a simple internship.

I didn't know how to answer, in my mind I just had a simple php script that would have looked like this:

$arr = array(-5, 5, 3, 1, 7, 8);
$target = 8;

for($i = 0; $i < sizeof($arr) - 1; $i++)
{
    for($j = $i + 1; $j < sizeof($arr); $j++)
    {
    if($arr[$i] + $arr[$j] == $target)
        echo "pair found: ${arr[i]}, ${arr[j]}";
    }    
}

But it's pretty slow, it's mathematically correct, but it's more of a CS-oriented question. How to implement that quickly for machines? The answer is hash tables. Which are implemented as arrays in PHP (well, arrays are like super hash tables) and as dictionaries in Python.

I came up with this simple example in python:

arr = (-5, 5, 3, 1, 7, 8)
target = 8

dic = {}

for i, item in enumerate(arr):
    dic[item] = i

    if dic.has_key(target - item) and dic[target - item] != i:
        print item, (target - item)
  1. iterate the list
  2. assign the hash of the value to the index of the value in the array
  3. to avoid finding a pair twice, we do this in the same for loop:
    we do the difference of the target sum and the number we're on, we hash it, if we find that in the hash table that's good!
  4. but it could also be the number itself, so we check for its index, and it has to be different than its own index.

Voilà! We avoid the n-1! additions and comparisons of the first idea with hash tables (I actually have no idea how fast they are but since most things use hash tables in IT, I guess that it is pretty fast).

comment on this story

Notes on ECC (Elliptic Curve Cryptography) & Internship progress

posted May 2014

One last exam, ECC, and then I'm free to do whatever I want (no I still haven't found an internship, but I talked with TrueVault, Cloudflare, MatterMark, Spotify and maybe Matasano so this has been a good experience nonetheless).

I stumbled upon the notes of Ben Lynn an ex Stanford's student that took an ECC class there. They're pretty awesome and I kinda want to do something like that on this blog. Maybe next year it's a bit late for that :)

The notes are here

comment on this story