Hey! I'm David, cofounder of zkSecurity and the author of the Real-World Cryptography book. I was previously a crypto architect at O(1) Labs (working on the Mina cryptocurrency), before that I was the security lead for Diem (formerly Libra) at Novi (Facebook), and a security consultant for the Cryptography Services of NCC Group. This is my blog about cryptography and security and other related topics that I find interesting.

It's my first survey ever and I had much fun writing it! I don't really know if I can call it a survey, it reads like a vulgarization/explanation of the papers from Coppersmith, Howgrave-Graham, Boneh and Durfee, Herrmann and May. There is a short table of the running times at the end of each sections. There is also the code of the implementations I coded at the end of the survey.

If you spot a typo or something weird, wrong, or badly explained. Please tell me!

I've Implemented a Coppersmith-type attack (using LLL reductions of lattice basis). It was done by Boneh and Durfee and later simplified by Herrmann and May. The program can be found on my github.

The attack allows us to break RSA and the private exponent d.
Here's why RSA works (where e is the public exponent, phi is euler's totient function, N is the public modulus):

\[ ed = 1 \pmod{\varphi(N)} \]
\[ \implies ed = k \cdot \varphi(N) + 1 \text{ over } \mathbb{Z} \]
\[ \implies k \cdot \varphi(N) + 1 = 0 \pmod{e} \]
\[ \implies k \cdot (N + 1 - p - q) + 1 = 0 \pmod{e} \]
\[ \implies 2k \cdot (\frac{N + 1}{2} + \frac{-p -q}{2}) + 1 = 0 \pmod{e} \]

The last equation gives us a bivariate polynomial \( f(x,y) = 1 + x \cdot (A + y) \). Finding the roots of this polynomial will allow us to easily compute the private exponent d.

The attack works if the private exponent d is too small compared to the modulus: \( d < N^{0.292} \).

To use it:

look at the tests in boneh_durfee.sage and make your own with your own values for the public exponent e and the public modulus N.

guess how small the private exponent d is and modify delta so you have d < N^delta

tweak m and t until you find something. You can use Herrmann and May optimized t = tau * m with tau = 1-2*delta. Keep in mind that the bigger they are, the better it is, but the longer it will take. Also we must have 1 <= t <= m.

you can also decrease X as it might be too high compared to the root of x you are trying to find. This is a last recourse tweak though.

Here is the tweakable part in the code:

# Tweak values here !
delta = 0.26 # so that d < N^delta
m = 3 # x-shifts
t = 1 # y-shifts # we must have 1 <= t <= m

Because studying Cryptography is also about using LaTeX, it's nice to spend a bit of time understanding how to make pretty documents. Because, you know, it's nicer to read.

Here's an awesome quick introduction of Tikz that allows to make beautiful diagram with great precision in a short time:

And I'm bookmarking one more that seems go way further.

I've used Cmder for a while on Windows. Which is a pretty terminal that brings a lot of tools and shortcuts from the linux world. I also have Chocolatey as packet manager. And all in all it works pretty great except Cmder is pretty slow.

I've ran into Babun yesterday, that seems to be kind of the same thing, but with zsh, oh-my-zsh and another packet manager: pact. The first thing I did was downloading tmux and learning how to use it. It works pretty well and I think I have found a replacement for Cmder =)

I won't go too much into the details because this is for a later post, but you can use such an attack on several relaxed RSA models (meaning you have partial information, you are not totally in the dark).

I've used it in two examples in the above code:

Stereotyped messages

For example if you know the most significant bits of the message. You can find the rest of the message with this method.

The usual RSA model is this one: you have a ciphertext c a modulus N and a public exponent e. Find m such that m^e = c mod N.

Now, this is the relaxed model we can solve: you have c = (m + x)^e, you know a part of the message, m, but you don't know x.
For example the message is always something like "the password today is: [password]".
Coppersmith says that if you are looking for N^1/e of the message it is then a small root and you should be able to find it pretty quickly.

let our polynomial be f(x) = (m + x)^e - c which has a root we want to find modulo N. Here's how to do it with my implementation:

dd = f.degree()
beta = 1
epsilon = beta / 7
mm = ceil(beta**2 / (dd * epsilon))
tt = floor(dd * mm * ((1/beta) - 1))
XX = ceil(N**((beta**2/dd) - epsilon))
roots = coppersmith_howgrave_univariate(f, N, beta, mm, tt, XX)

You can play with the values until it finds the root. The default values should be a good start. If you want to tweak:

beta is always 1 in this case.

XX is your upper bound on the root. The bigger is the unknown, the bigger XX should be. And the bigger it is... the more time it takes.

Factoring with high bits known

Another case is factoring N knowing high bits of q.

The Factorization problem normally is: give N = pq, find q. In our relaxed model we know an approximation q' of q.

Here's how to do it with my implementation:

let f(x) = x - q' which has a root modulo q.
This is because x - q' = x - ( q + diff ) = x - diff mod q with the difference being diff = | q - q' |.

beta = 0.5
dd = f.degree()
epsilon = beta / 7
mm = ceil(beta**2 / (dd * epsilon))
tt = floor(dd * mm * ((1/beta) - 1))
XX = ceil(N**((beta**2/dd) - epsilon)) + 1000000000000000000000000000000000
roots = coppersmith_howgrave_univariate(f, N, beta, mm, tt, XX)

What is important here if you want to find a solution:

we should have q >= N^beta

as usual XX is the upper bound of the root, so the difference should be: |diff| < XX

The constant values used are chosen to be nothing up my sleeve numbers: the four round constants k are 230 times the square roots of 2, 3, 5 and 10. The first four starting values for h0 through h3 are the same with the MD5 algorithm, and the fifth (for h4) is similar.

In cryptography, nothing up my sleeve numbers are any numbers which, by their construction, are above suspicion of hidden properties. They are used in creating cryptographic functions such as hashes and ciphers. These algorithms often need randomized constants for mixing or initialization purposes. The cryptographer may wish to pick these values in a way that demonstrates the constants were not selected for a nefarious purpose, for example, to create a backdoor to the algorithm. These fears can be allayed by using numbers created in a way that leaves little room for adjustment. An example would be the use of initial digits from the number π as the constants. Using digits of π millions of places into its definition would not be considered as trustworthy because the algorithm designer might have selected that starting point because it created a secret weakness the designer could later exploit.

I'm digging into the code source of Sage and I see that a lot of functions are implemented with Shoup's NTL. There is also FLINT used. I was wondering what were the differences. I can see that NTL is in c++ and FLINT is in C. On wikipedia:

It is developed by William Hart of the University of Warwick and David Harvey of Harvard University to address the speed limitations of the Pari and NTL libraries.

Although in the code source of Sage I'm looking at they use FLINT by default and switch to NTL when the modulus is getting too large.

By the way, all of that is possible because Sage uses Cython, which allows it to use C in python. I really should learn that...

EDIT:

This implementation is generally slower than the FLINT implementation in :mod:~sage.rings.polynomial.polynomial_zmod_flint, so we use FLINT by default when the modulus is small enough; but NTL does not require that n be `int`-sized, so we use it as default when n is too large for FLINT.

So the reason behind it seems to be that NTL is better for large numbers.