Hardware Solutions To Highly-Adversarial Environments Part 2: HSM vs TPM vs Secure Enclave posted April 2020
In the previous post (part 1) you learned about:
- The threat today is not just an attacker intercepting messages over the wire, but an attacker stealing or tampering with the device that runs your cryptography. So-called Internet of Things (IoT) devices often run into this type of threat and are by default unprotected against sophisticated attackers.
- Hardware can help protect cryptographic applications in highly-adversarial environments. One idea is to provide a device with a tamper-resistant chip to store keys and perform crypto operations. That is, if the device falls into the hands of an attacker, extracting keys or modifying the behavior of the chip will be hard. But hardware-protected crypto is not a panacea: it is merely defense-in-depth, effectively slowing down and increasing the cost of an attack.
- Smart cards were among the first such secure microcontrollers that could be used as a microcomputer to store secrets and perform cryptographic operations with them. They are supposed to use a number of techniques to discourage physical attackers.
- The concept of a smart card was generalized as a secure element, a term employed differently in different domains, but which boils down to a smart card that can be used as a coprocessor in a greater system that already has a main processor.
- Because Google had trouble dealing with the telecoms to host credit card information on SIM cards (which are secure elements), the concept of a secure element in the cloud was born. In the payment space this is called host card emulation (HCE). It works simply by storing the credit card information (which is a 3DES symmetric key shared with the bank) in a secure element in the cloud, and only giving a single-use token to the user: if the phone is compromised, the attacker can only use it to pay once.
All good?
In this part 2 of our blog series you will learn about more hardware that supports cryptographic operations! These are all secure elements in concept, and are all doing sort of the same things but in different contexts. Let’s get started!
Hardware Security Module (HSM)
If you understood what a secure element is, well, a hardware security module (HSM) is pretty much a bigger secure element. Not only does the form factor of secure elements require specific ports, but secure elements are also slow and low on memory. (Note that being low on memory is sometimes OK, as you can encrypt keys with a secure element master key, and then store the encrypted keys outside of the secure element.) So an HSM is a solution for a more portable, more efficient, more multi-purpose secure element. Like some secure elements, some HSMs can run arbitrary code as well.
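The memory trick mentioned in the note above (wrap keys under a master key that never leaves the chip, and park the wrapped blobs outside) can be sketched roughly as follows. This is a toy illustration only: the `ToySecureElement` class and its method names are hypothetical, and because Python's standard library has no AES, the sketch uses a stand-in authenticated cipher built from HMAC-SHA256 (a keystream in counter mode, then a MAC over the ciphertext). A real secure element would more likely use a standard construction such as AES key wrap or AES-GCM.

```python
import hashlib
import hmac
import secrets


class ToySecureElement:
    """Hypothetical secure element: the master key never leaves this object."""

    def __init__(self):
        master = secrets.token_bytes(32)
        # Derive independent encryption and MAC keys from the master key.
        self._enc_key = hmac.new(master, b"enc", hashlib.sha256).digest()
        self._mac_key = hmac.new(master, b"mac", hashlib.sha256).digest()

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        # Toy stream cipher: HMAC-SHA256 used as a PRF in counter mode.
        out = b""
        counter = 0
        while len(out) < length:
            block_input = nonce + counter.to_bytes(4, "big")
            out += hmac.new(self._enc_key, block_input, hashlib.sha256).digest()
            counter += 1
        return out[:length]

    def _wrap(self, key: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        stream = self._keystream(nonce, len(key))
        ct = bytes(a ^ b for a, b in zip(key, stream))
        tag = hmac.new(self._mac_key, nonce + ct, hashlib.sha256).digest()
        return nonce + ct + tag

    def _unwrap(self, blob: bytes) -> bytes:
        if len(blob) < 48:
            raise ValueError("wrapped key is too short")
        nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
        expected = hmac.new(self._mac_key, nonce + ct, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("wrapped key was tampered with")
        stream = self._keystream(nonce, len(ct))
        return bytes(a ^ b for a, b in zip(ct, stream))

    def generate_wrapped_key(self) -> bytes:
        # The fresh key only ever exists in the clear inside the element.
        return self._wrap(secrets.token_bytes(32))

    def mac_with_wrapped_key(self, blob: bytes, message: bytes) -> bytes:
        # Unwrap internally, use the key, and return only the result.
        key = self._unwrap(blob)
        return hmac.new(key, message, hashlib.sha256).digest()


if __name__ == "__main__":
    se = ToySecureElement()
    # The host keeps as many wrapped keys as it likes in ordinary storage.
    external_storage = {"key-1": se.generate_wrapped_key()}
    tag = se.mac_with_wrapped_key(external_storage["key-1"], b"hello")
    print(tag.hex())
```

The point of the design is that the element only needs room for one master key, while the host can store an unbounded number of wrapped blobs that are useless without the chip.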
HSMs are also subject to their own set of standards and security levels. One of the most widely accepted standards is FIPS 140-2: Security Requirements for Cryptographic Modules, which defines security levels from 1 to 4. Level 1 requires no physical security mechanisms beyond production-grade components, while level 4 modules must detect penetration attempts from any direction and immediately zeroize (wipe) their plaintext secrets!
Typically, you find an HSM as an external device with its own shelf on a rack (see the picture of a Luna HSM below), plugged into an enterprise server in a data center.
(To go full circle, some of these HSMs can be administered using smart cards.)
Sometimes you can also find an HSM as a PCIe card plugged into a server’s motherboard, like the IBM Crypto Express in the picture below.
Or even as small dongles that you can plug via USB (if you don’t care about performance), see the picture of a YubiHSM below.
HSMs are heavily used in some industries. Every time you enter your PIN in an ATM or a payment terminal, the PIN ends up being verified by an HSM somewhere. Whenever you connect to a website via HTTPS, the root of trust comes from a Certificate Authority (CA) that stores its private key in an HSM, and the TLS connection is possibly terminated by an HSM. You have an Android or iPhone? Chances are Google or Apple are keeping a backup of your phone safe with a fleet of HSMs. This last case is interesting because the threat model is reversed: the user does not trust the cloud with their data, and thus the cloud service provider claims that its service can neither read the user’s encrypted backup nor access the keys used to encrypt it.
HSMs don’t really have a standard interface, but most of them will at least implement the Public-Key Cryptography Standards #11 (PKCS#11), one of those old standards that were started by the RSA company and progressively moved to the OASIS organization (2012) in order to facilitate their adoption.
While the latest version of PKCS#11 (2.40) was released in 2015, it is merely an update of a standard that originally started in 1994. For this reason it specifies a number of old cryptographic algorithms, or old ways of doing things. Nevertheless, it is good enough for many uses, and specifies an interface that allows different systems to easily interoperate with each other.
While the real goal of an HSM is to make sure nobody can extract key material from it, HSM security is not always stellar. A lot of the security of these hardware solutions really relies on their high price, on the protection techniques not being disclosed, and on certifications (like FIPS and Common Criteria) that mostly focus on the hardware side of things. In practice, devastating software bugs have been found, and it is not always straightforward to know if the HSM you use is affected by any of these vulnerabilities (Cryptosense has a good summary of known attacks against HSMs).
By the way, not only is the price of one HSM high (it can easily be tens of thousands of dollars depending on the security level), but in addition to that HSM you often have another one you use for testing, and another one you use for backup (in case your first HSM dies with its keys in it). It can add up!
Furthermore, I still haven’t touched on the elephant in the room with all of these solutions: while you might prevent most attackers from reaching your secret keys, you can't prevent attackers from compromising the system and making their own calls to the secure hardware module (be it a secure element or an HSM). Again, these hardware solutions are not a panacea and depending on the scenario they provide more or less defense-in-depth.
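To make the elephant in the room concrete, here is a minimal sketch using a hypothetical `ToyHSM` class. HMAC-SHA256 from Python's standard library stands in for whatever signing operation a real HSM would expose, and the messages are made up. The interface offers no way to read the key, yet code running on a compromised host can still ask for valid tags over messages of its choice.

```python
import hashlib
import hmac
import secrets


class ToyHSM:
    """Hypothetical HSM: exposes operations on a key, never the key itself."""

    def __init__(self):
        # Python cannot truly hide this attribute; real hardware can.
        # What matters here is that the interface has no export operation.
        self._key = secrets.token_bytes(32)

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        return hmac.compare_digest(self.sign(message), tag)


def compromised_host(hsm: ToyHSM):
    """The attacker never learns the key, but uses the HSM as a signing oracle."""
    message = b"pay 9999 USD to mallory"
    return message, hsm.sign(message)


if __name__ == "__main__":
    hsm = ToyHSM()
    message, tag = compromised_host(hsm)
    # The forged request is indistinguishable from a legitimate one.
    print(hsm.verify(message, tag))
```

Limiting the damage therefore takes more than key storage, for example access control and auditing around who may call the module.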
By the way, if it applies to your situation, modern cryptography can offer better ways of reducing the consequences of key material compromise and misuse, for example multi-signatures! Check my blog post on the subject.
Trusted Platform Module (TPM)
A Trusted Platform Module (TPM) is first and foremost a standard (unlike HSMs) developed in the open by the non-profit Trusted Computing Group (TCG). The latest version is TPM 2.0, published with ISO/IEC (the International Organization for Standardization and the International Electrotechnical Commission).
A TPM complying with the TPM 2.0 standard is a tamper-resistant secure microcontroller that carries a hardware random number generator (also called a true random number generator, or TRNG), secure memory for storing secrets, and support for cryptographic operations. If this description reminds you of smart cards, secure elements, and HSMs, well… I told you that everything we were going to talk about in this chapter was going to be a secure element of some form. (And actually, it’s common to see TPMs implemented as repackaged secure elements.)
You usually find a TPM directly soldered to the motherboard of many enterprise servers, laptops, and desktop computers (see picture below).
Unlike solutions that we’ve seen previously though, a TPM does not run arbitrary code. It offers a well-defined interface that a greater system can take advantage of. Due to these limitations, a TPM is usually pretty cheap (even cheap enough that some IoT devices will ship with one!).
Here is a non-exhaustive list of interesting applications that a TPM can enable:
- User authentication. Ever heard of the FBI iPhone fiasco? TPMs can be used to require a user PIN or password. To prevent low-entropy credentials from being easily brute-forced, a TPM can rate-limit or even count the number of failed attempts.
- Secure boot. Secure boot is about starting a system in a known trusted state in order to avoid tampering of the OS by malware or physical intrusion. This can be done by using a platform’s TPM and the Unified Extensible Firmware Interface (UEFI), which is the piece of code that launches an operating system. Whenever the image of a new boot loader, OS, or driver is loaded, the TPM can store the associated expected hash and compare it before running the code, failing if the hash of the image is different. If you hold a public key you can also verify that a piece of code has been signed before running it. This is a gross over-simplification of how secure boot works in practice, but the crypto is pretty straightforward.
- Full disk encryption (FDE). A TPM can store the key (or encrypt the key) that encrypts all data on the device at rest. If the device has been proven to be in a known good state (via secure boot) and the user authenticates correctly, the key can be released to decrypt data. When the device is locked or shut down, the key vanishes from memory and has to be released by the TPM again. This is a must-have feature if you lose your device or it gets stolen.
- Remote attestation. This allows a device to authenticate itself or prove that it is running specific software. In other words, a TPM can sign a random challenge and/or metadata with a key that can be tied to a unique per-TPM key (and is signed by the TPM vendor). Every TPM comes with such a unique key (called an endorsement key) along with the vendor’s certificate authority signature on the public key part. For example, during employee onboarding a company can add a new employee’s laptop’s TPM endorsement key to a whitelist of approved devices. Later, if the user wants to access one of the company’s services, the service can request the TPM to sign a random challenge along with hashes of what OS was booted, in order to authenticate the user and prove the well-being of the user’s device.
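The secure boot and remote attestation bullets above can be tied together in a short sketch. A TPM records boot measurements in platform configuration registers (PCRs) that can only be extended, new = H(old || measurement), never set directly, and it can then sign a verifier's random challenge together with the PCR value. Note that this is closer to what is usually called measured boot than to the compare-and-refuse flow described above. Everything named here (`ToyTPM`, `quote`, `verify_quote`, the boot stage strings) is hypothetical, and HMAC under a shared key is only a stand-in for the asymmetric signature a real TPM would produce with a key tied to its endorsement key.

```python
import hashlib
import hmac
import secrets


class ToyTPM:
    """Hypothetical TPM with a single PCR and a stand-in attestation key."""

    def __init__(self, attestation_key: bytes):
        self._ak = attestation_key
        self.pcr = bytes(32)  # in this sketch the register starts at all zeros

    def extend(self, image: bytes) -> None:
        # The register can only be folded forward, never overwritten.
        measurement = hashlib.sha256(image).digest()
        self.pcr = hashlib.sha256(self.pcr + measurement).digest()

    def quote(self, nonce: bytes):
        # Assumption: HMAC replaces the real asymmetric attestation signature.
        sig = hmac.new(self._ak, nonce + self.pcr, hashlib.sha256).digest()
        return self.pcr, sig


def expected_pcr(images) -> bytes:
    """Replay the known-good boot chain to get the PCR value it should yield."""
    pcr = bytes(32)
    for image in images:
        measurement = hashlib.sha256(image).digest()
        pcr = hashlib.sha256(pcr + measurement).digest()
    return pcr


def verify_quote(ak: bytes, nonce: bytes, pcr: bytes, sig: bytes, good_images) -> bool:
    expected_sig = hmac.new(ak, nonce + pcr, hashlib.sha256).digest()
    sig_ok = hmac.compare_digest(sig, expected_sig)
    state_ok = hmac.compare_digest(pcr, expected_pcr(good_images))
    return sig_ok and state_ok


if __name__ == "__main__":
    ak = secrets.token_bytes(32)
    boot_chain = [b"bootloader v1", b"kernel v1"]  # made-up stage images
    tpm = ToyTPM(ak)
    for stage in boot_chain:
        tpm.extend(stage)
    nonce = secrets.token_bytes(16)  # fresh challenge from the verifier
    pcr, sig = tpm.quote(nonce)
    print(verify_quote(ak, nonce, pcr, sig, boot_chain))
```

Because each extend hashes in the previous register value, changing any stage of the boot chain changes the final PCR, and the fresh nonce keeps an old quote from being replayed.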
There are more functionalities that a TPM can enable (there are, after all, hundreds of commands that a TPM implements), which might even benefit user applications (which should be able to call the TPM).
Note that having a standard is great for interoperability, and for us to understand what is going on, but unfortunately not everyone uses TPMs. Apple has the secure enclave, Microsoft has Pluton, Google has Titan.
On a darker note, TPMs have their own controversies and have also been subject to devastating vulnerabilities. For example, the ROCA attack found that an estimated million TPMs (and even smart cards) from the popular vendor Infineon had been wrongly generating RSA private keys for years (the prime generation was flawed).
To recap, you’ve learned about:
- HSMs. They are external, bigger and faster secure elements. They do not follow any standard interface, but usually implement the PKCS#11 standard for cryptographic operations. HSMs can be certified at different levels of security via a NIST standard (FIPS 140-2).
- TPMs. They are chips that follow the TPM standard; more specifically, they are a type of secure element with a specified interface. A TPM is usually a secure chip directly linked to the motherboard and perhaps implemented using a secure element. While it does not allow running arbitrary programs like some secure elements, smart cards, and HSMs do, it enables a number of interesting applications for devices as well as user applications.
That’s it for now, check this blog again to read part 3 which will be about TEEs!
Many thanks to Jeremy O'Donoghue, Thomas Duboucher, Charles Guillemet, and Ryan Sleevi who provided help and reviews!
Comments
Neil Madden
This is a really good series, thanks.
Both PKCS#11 and FIPS 140 are in the process of being updated:
- PKCS#11 3.0 (https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=pkcs11#technical) which includes a lot of modern crypto: https://docs.oasis-open.org/pkcs11/pkcs11-curr/v3.0/cs01/pkcs11-curr-v3.0-cs01.html (Curve25519, Curve448, EdDSA - even XEdDSA, x3dh, etc from Signal, SHAKE, Blake2b, ChaPoly, etc). I was very pleasantly surprised. I don't know enough about the OASIS process to know how far from a final standard this is.
- FIPS 140-3 (https://csrc.nist.gov/publications/detail/fips/140/3/final) is currently being rolled out and will replace 140-2 this year.
Mat
I think you mixed up the FIPS 140-2 levels. The memory wiping is already done by level 3 devices iirc. Level 4 adds strong requirements to the physical security of the device environment.
david
Thanks for the pointer Neil!
I'm wondering if HSMs vendors are really going to update to the latest PKCS#11, they seem pretty old school.
FIPS 140-3 seems to have started in 2007, so I'm not sure if it'll ever see the light of day xD
Mat: https://csrc.nist.gov/csrc/media/projects/cryptographic-module-validation-program/documents/fips140-2/fips1402ig.pdf
level 3 provides protection against:
> Observable evidence of tampering.
> Physical boundary of the module is opaque to prevent direct observation of internal security components.
> Direct entry/probing attacks prevented.
> Strong tamper resistant enclosure or encapsulation material.
> If applicable, active zeroization if covers or doors opened.
> Software: logical access protection of the cryptographic modules unprotected CSPs and data is provided by the evaluated operating system at EAL3.
level 4:
> Observable evidence of tampering.
> Physical boundary of the module is opaque to prevent direct observation of internal security components.
> Direct entry/probing attacks prevented.
> Strong tamper resistant enclosure or encapsulation material.
> If applicable, active zeroization if covers or doors opened.
> A complete envelope of protection around the module preventing unauthorized attempts at physical access.
> Penetration of the module’s enclosure from any direction had a very high probability of being detected resulting in immediate zeroization of plaintext CSPs or severe damage to the module rendering it inoperable.
> Non-direct attacks prevented.
> Software: logical access protection of the cryptographic modules unprotected CSPs and data is provided by the evaluated operating system at EAL4.
further, the document emphasizes the value of level 4:
> The module shall zeroize all unprotected CSPs before an attacker can compromise the module. An attack is premeditated, well-funded, organized and determined.
Neil Madden
The timeline for FIPS 140-3 rollout is at https://csrc.nist.gov/projects/fips-140-3-transition-effort . Supposedly stopping new 140-2 certifications in Sept 2021, although they will still be valid until 2026.
Re: PKCS#11 3.0, the PKCS#11 standards make very few guarantees about what mechanisms or object types will be supported (e.g., AWS CloudHSM cannot even store certificates). There are some standard "profiles" but they make very minimal requirements about what a conforming HSM has to support.
My guess is that they might add the new functions introduced in V3 but only selectively adopt a handful of new mechanisms. I guess TLS 1.3 will drive some adoption around e.g. ChaPoly, Ed25519, etc.
JNR Management
Hi
Thanks for such great thoughts.
I found it very informative.
Krypto Agile
Thank you for an informative article.
I must say this is really very good series.