david wong

Hey! I'm David, a security consultant at Cryptography Services, the crypto team of NCC Group. This is my blog about cryptography, security and other related topics that I find interesting.

What are x509 certificates? RFC? ASN.1? DER? April 2015

RFC

So, RFC means Request For Comments. RFCs are a bunch of text files that describe different protocols. If you want to understand how SSL, TLS (the new SSL) and x509 certificates (the certificates used for SSL and TLS) all work, for example because you want to code your own OpenSSL, then you will have to read the corresponding RFCs: rfc5280 for x509 certificates and rfc5246 for TLS 1.2, the latest version of TLS at the time of writing.

x509

x509 is the name for certificates which are defined for:

informal internet electronic mail, IPsec, and WWW applications

There used to be a version 1, and then a version 2, but we now use version 3. Reading the corresponding RFC, you will come across structures like this one:

Certificate  ::=  SEQUENCE  {
    tbsCertificate       TBSCertificate,
    signatureAlgorithm   AlgorithmIdentifier,
    signatureValue       BIT STRING  }

These are ASN.1 structures. This is what a certificate actually looks like: a SEQUENCE of three objects.

  • The first object contains everything of interest that will be signed, which is why we call it a To Be Signed Certificate
  • The second object identifies the signature algorithm the CA used to sign this certificate (e.g. sha256WithRSAEncryption)
  • The last object is not a structure, it's just a string of bits: the signature computed over the TBSCertificate after it has been encoded with DER

ASN.1

It looks small, but each object has some depth to it.

The TBSCertificate is the biggest one. It contains a bunch of information about the client (the subject of the certificate), the CA (the issuer), the public key of the client, etc.

TBSCertificate  ::=  SEQUENCE  {
    version         [0]  EXPLICIT Version DEFAULT v1,
    serialNumber         CertificateSerialNumber,
    signature            AlgorithmIdentifier,
    issuer               Name,
    validity             Validity,
    subject              Name,
    subjectPublicKeyInfo SubjectPublicKeyInfo,
    issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    extensions      [3]  EXPLICIT Extensions OPTIONAL
                         -- If present, version MUST be v3
}

DER

A certificate is of course not sent like this. We use DER to encode it in a binary format.

Field names are not encoded at all, which means that if we don't know which structure the certificate was built from, it is impossible to tell what each value means.

Every value is encoded as a TLV triplet: [TAG, LENGTH, VALUE]
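To make the TLV idea concrete, here is a minimal Python sketch that splits one triplet into its three parts. The function name split_tlv is made up for this illustration, and the sketch assumes the length fits in a single byte (less than 128); longer values use a different length form that it deliberately rejects.

```python
def split_tlv(data: bytes) -> tuple:
    """Split one TLV triplet into (tag, length, value).

    Assumption for this sketch: the length is below 128, so it is
    stored directly in the second byte.
    """
    tag = data[0]
    length = data[1]
    if length & 0x80:
        raise ValueError("long-form length is not handled by this sketch")
    value = data[2:2 + length]
    if len(value) != length:
        raise ValueError("truncated value")
    return tag, length, value


# 02 01 02 is an INTEGER (tag 02) of length 1 holding the value 2
tag, length, value = split_tlv(bytes.fromhex("020102"))
print(hex(tag), length, value.hex())  # 0x2 1 02
```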

For example, you can check the GitHub certificate here

[image: the GitHub certificate, ASN.1 view next to the DER hexdump]

On the right is the hexdump of the DER encoded certificate, on the left is its translation in ASN.1 format.

As you can see, without the RFC nearby we don't really know what each value corresponds to. For completeness, here's the same certificate parsed by the openssl x509 command-line tool:

[image: the same certificate parsed by openssl x509]

How to read the DER encoded certificate

So go back and check the hexdump of the GitHub certificate. Here is the beginning:

30 82 05 E0 30 82 04 C8 A0 03 02 01 02

As we saw in the RFC for x509 certificates, we start with a SEQUENCE.

Certificate  ::=  SEQUENCE  {

Microsoft has documentation that explains pretty well how each ASN.1 TAG is encoded in DER; here's the page on SEQUENCE

30 82 05 E0

So 30 means SEQUENCE. Since we have a huge sequence (more than 127 bytes), its length can't be coded in the single byte that follows:

If it is more than 127 bytes, bit 7 of the Length field is set to 1 and bits 6 through 0 specify the number of additional bytes used to identify the content length.

(in their documentation the least significant bit on the far right is bit zero)

So the following byte 82, or 1000 0010 in binary, tells us that the length of the SEQUENCE is written in the following 2 bytes: 05 E0 (1504 bytes)
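The length rule quoted above can be sketched in a few lines of Python. The helper name read_length is hypothetical, and this is only an illustration of the rule, not a full DER parser.

```python
def read_length(data: bytes, offset: int = 1) -> tuple:
    """Decode the DER length field that starts at `offset`.

    Returns (content_length, number_of_bytes_used_by_the_length_field).
    """
    first = data[offset]
    if first & 0x80 == 0:
        # Short form: bit 7 is clear, the byte itself is the length
        return first, 1
    # Long form: bits 6..0 say how many length bytes follow
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    length = int.from_bytes(data[offset + 1:offset + 1 + count], "big")
    return length, 1 + count


header = bytes.fromhex("308205E0")
print(read_length(header))  # (1504, 3)
```

The 3 in the result counts the 82 byte plus the two bytes 05 E0, so together with the tag the whole header is 4 bytes long.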

We can keep reading:

30 82 04 C8 A0 03 02 01 02

Another SEQUENCE embedded in the first one: the TBSCertificate SEQUENCE

TBSCertificate  ::=  SEQUENCE  {
    version         [0]  EXPLICIT Version DEFAULT v1,

The first value should be the version of the certificate:

A0 03

Now this is a different kind of TAG. There are 4 classes of TAGs in ASN.1: UNIVERSAL, APPLICATION, PRIVATE, and context-specific. Most of the tags we use are UNIVERSAL: they can be understood by any application that knows ASN.1. The A0 is the [0] (and the following 03 is the length). [0] is a context-specific TAG, and it is used as an index when you have a series of objects. The GitHub certificate is a good example of this, because you can see that the next index used is [3], the extensions object:

TBSCertificate  ::=  SEQUENCE  {
    version         [0]  EXPLICIT Version DEFAULT v1,
    serialNumber         CertificateSerialNumber,
    signature            AlgorithmIdentifier,
    issuer               Name,
    validity             Validity,
    subject              Name,
    subjectPublicKeyInfo SubjectPublicKeyInfo,
    issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    extensions      [3]  EXPLICIT Extensions OPTIONAL
                         -- If present, version MUST be v3
}

Since those objects can all be omitted, skipping some of them without properly indexing them would have caused trouble when parsing the certificate.
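If you want to see where the class and the index live inside a tag byte, here is a small Python sketch. The name describe_tag is made up for this example. It relies on the standard layout of a tag byte (bits 7 and 6 for the class, bit 5 for the constructed flag, bits 4 to 0 for the tag number) and does not handle tag numbers above 30, which need extra bytes.

```python
# Index in this tuple = value of the two class bits (bits 7 and 6)
CLASSES = ("UNIVERSAL", "APPLICATION", "context-specific", "PRIVATE")


def describe_tag(tag_byte: int) -> tuple:
    """Return (class_name, is_constructed, tag_number) for one tag byte."""
    number = tag_byte & 0x1F
    if number == 0x1F:
        raise ValueError("multi-byte tag numbers are not handled by this sketch")
    class_name = CLASSES[tag_byte >> 6]
    is_constructed = bool(tag_byte & 0x20)
    return class_name, is_constructed, number


for byte in (0x30, 0xA0, 0xA3, 0x02):
    print(f"{byte:02X}", describe_tag(byte))
# 30 ('UNIVERSAL', True, 16)          SEQUENCE
# A0 ('context-specific', True, 0)    [0], the version wrapper
# A3 ('context-specific', True, 3)    [3], the extensions wrapper
# 02 ('UNIVERSAL', False, 2)          INTEGER
```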

Next comes:

02 01 02

Here's how it reads:

  _______ tag:      integer
 |   ____ length: 1 byte
 |  |   _ value:  2
 |  |  |
 |  |  |
 v  v  v
02 01 02 
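Putting the pieces together, here is a rough Python sketch that walks the 13 bytes we have read so far. The helper read_header is hypothetical, and the loop hardcodes the fact that the first three elements are containers we step into; a real parser would decide that from the tag. The mapping from the integer to a version name comes from the Version definition in rfc5280 (v1 is 0, v2 is 1, v3 is 2).

```python
def read_header(data: bytes, offset: int) -> tuple:
    """Return (tag, content_length, header_size) for the TLV at `offset`."""
    tag = data[offset]
    first = data[offset + 1]
    if first & 0x80 == 0:
        return tag, first, 2  # short form: one length byte
    count = first & 0x7F      # long form: `count` length bytes follow
    length = int.from_bytes(data[offset + 2:offset + 2 + count], "big")
    return tag, length, 2 + count


prefix = bytes.fromhex("308205E0308204C8A003020102")

offset = 0
headers = []
# Certificate SEQUENCE, TBSCertificate SEQUENCE, then the [0] wrapper:
# all three are containers, so we step inside each one.
for _ in range(3):
    tag, length, size = read_header(prefix, offset)
    headers.append((tag, length))
    offset += size

# What remains is the INTEGER 02 01 02
tag, length, size = read_header(prefix, offset)
value = prefix[offset + size:offset + size + length]
version = int.from_bytes(value, "big", signed=True)

VERSION_NAMES = {0: "v1", 1: "v2", 2: "v3"}
print([(hex(t), n) for t, n in headers])  # [('0x30', 1504), ('0x30', 1224), ('0xa0', 3)]
print(version, VERSION_NAMES[version])    # 2 v3
```

So the value 2 means this is a version 3 certificate, which is consistent with it carrying extensions.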

The rest is pretty straightforward, except for OIDs: Object Identifiers.

Object Identifiers

They are basically strings of integers that read from left to right, like a path down a tree.

So in our GitHub cert example, we can see that the first OID is 1.2.840.113549.1.1.11, and it is supposed to represent the signature algorithm.

So go to http://www.alvestrand.no/objectid/top.html and click on 1, and then 1.2, and then 1.2.840, etc. until you get down to the last branch of our tree, and you will end up on sha256WithRSAEncryption.

Here's a more detailed explanation of OIDs, and here's the Microsoft doc on how to encode OIDs in DER.
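As a rough illustration of that encoding, here is a Python sketch that turns the content bytes of an OID back into its dotted form. The function name decode_oid is made up, and the bytes used below are the standard DER encoding of 1.2.840.113549.1.1.11 written out by hand, not copied from the hexdump above. The rules it follows: each number is stored in base 128 with bit 7 set on every byte except the last, and the first two numbers are packed together as 40 * first + second.

```python
def decode_oid(content: bytes) -> str:
    """Decode the VALUE part of a DER OBJECT IDENTIFIER (tag 06)."""
    subidentifiers = []
    current = 0
    for byte in content:
        # Each byte contributes 7 bits; bit 7 set means "more bytes follow"
        current = (current << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:
            subidentifiers.append(current)
            current = 0

    # The first subidentifier packs the first two numbers of the OID
    first = subidentifiers[0]
    if first < 80:
        arcs = [first // 40, first % 40]
    else:
        arcs = [2, first - 80]
    arcs += subidentifiers[1:]
    return ".".join(str(arc) for arc in arcs)


print(decode_oid(bytes.fromhex("2A864886F70D01010B")))  # 1.2.840.113549.1.1.11
```

For instance, 2A is 42, which is 40 * 1 + 2, giving the leading 1.2, and 86 48 is 6 * 128 + 72 = 840.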
