SHA-3, Keccak and disturbing implementation stories last month
"Why is Java a big pile of crap?" said the article. And after reading through the argumentation, I would move to the comment section and read a divergent point of view. Who was right? I wondered. Although everyone knows that Java is indeed a huge pile of crap, many other articles follow the same path of disputing one biased opinion that might be right, wrong, but really is both. I often mused on if I would come to write such a piece, and I think today is the day.
Keccak's (and/or SHA-3's) specification makes no sense.
Yup, it makes no sense, and I have a list of points you'll have to read through to understand how exactly.
By the way, if you didn't know, Keccak was the winner of the SHA-3 competition and blablabla...
First, let me introduce you to the internal state of Keccak.
It is some sort of rectangular 3D object, and its different sub-objects have different names. Each little cube is a bit (a
1 or a
I mean why not, AES was a square right? If we make it 3D we augment the security by one dimension. Sounds logical.
It gets weirder.
This is how the little cubes are indexed.
If you've started to reason on how you could translate that thing into a struct, stop. Don't overthink it. The
y coordinates don't matter: all operations are done modulo the other columns or rows or you name it... you could have the indexes
x = 0 and
y = 0 anywhere really; it wouldn't change a thing. This picture doesn't even appear in Keccak's specification, only in FIPS 202. It is probably a joke from the NIST.
Multiple of bytes? Nopecheese
A uint8_t array
...you must have thought, still trying to have a head start, still shooting to figure out how you would stuck these bits in your code.
But wait. The NIST loves to think about bits though, not bytes. That's often surprising to people who end up implementing their specs with a byte length instead of a bit length and it led to what DJB calls a bit attack.
SHA-3 was no exception, NIST said to Keccak: you shall be able to handle the darkest desires of our stupid developers. You must account for ALL INPUTS. You must accept non-multiple of bytes.
Keccak provided. You can now hash 7-bit inputs with SHA-3.
The poor user is given enough rope with which to hang himself -- something a standard should not do. —Rivest, 1992, commenting on the NIST/NSA “DSA” proposal
Bit indexing brainfudge
- MSB first:
0x02 = 0000 0010,
0x01 = 0000 0001, ...
- LSB first:
0x40 = 0000 0010,
0x80 = 0000 0001, ...
Now what does this have to do with Keccak?
Keccak's bit numbering is LSB first, whereas NIST's one is MSB first (no kidding).
This means a re-mapping needs to take place when converting an input to the internal state of Keccak. This was all explained in the old specification of Keccak (see section 5). Unfortunately these explanations disappeared in the latest version of the spec, and the NIST ended up writing up the conversion in an appendix (B.1) of their own specification FIPS 202.
Let me tell you: FIPS 202's explanation makes no sense. They end up mixing the theoretical internal state of Keccak with methods on how you can implement the re-mapping without having to inverse the bits of each bytes. It took me a while to figure it out and I am not the only one (most questions about Keccak/SHA-3 on internet end up being about their bit re-ordering).
Is SHA-3 Complicated? Some people would say no. But in reality there is no way to understand how to implement SHA-3 without looking at a reference implementation. NIST standards should seriously take a look at the process of TLS 1.3: no-one has been seen copying a reference implementation. Implementers are independently coding up what they understand of the draft specifications, and if cross-testing doesn't work it's either because they failed to understand something or because the specification needs some more love.
Having said that, Keccak is really cool and some of its applications look promising.