21 oct 2013

Parsing X509v3 certificates and PKCS7 messages with Python


Introduction


Recently I needed to extract the following information from certificates and PKCS7 messages:

  • A certificate's validity period (notBefore, notAfter attributes)
  • A PKCS7 digital signature's author and signing time

Some basics


Digital certificates are ASN.1 (Abstract Syntax Notation One) structures encoded with DER (Distinguished Encoding Rules).

ASN.1 is something like Backus-Naur Form used for describing data structures, e.g.:

 MyType ::= SEQUENCE {  
   myObjectIdentifier OBJECT IDENTIFIER,  
   myNumbers SEQUENCE OF MyNumber,  
   myMessage VisibleString  
 }  
 MyNumber ::= INTEGER (0..255)  

Although it's nearly 30 years old (being originally part of the CCITT X.409:1984 spec), it's still often used in the Public Key Infrastructure world. For example, X.509 digital certificates and the PKCS (Public Key Cryptography Standards) formats are described in ASN.1. Put simply, it's a common way to define data structures.

Besides primitive data types such as booleans, integers, real numbers, date-times, strings and null, ASN.1 includes keywords to build complex data types. In the above example, SEQUENCE was used to build a C-like struct and SEQUENCE OF to build a list of numbers between 0 and 255.
The CHOICE keyword acts much like C's union keyword: a value of a CHOICE type holds exactly one of several alternative types. One very special primitive type is OBJECT IDENTIFIER. It's used to reference a globally registered data type or semantic interpretation. For example, the commonName attribute used in the Subject field (of type Name) of digital certificates has the globally registered ID 2.5.4.3.

On the other hand, DER is a way to encode ASN.1 data structures as bytes, so that the information can be transferred to some other party.
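To make the encoding a bit more concrete, here is a minimal sketch of how DER turns the commonName Object Identifier 2.5.4.3 into bytes. Every DER element is a tag, a length and the content octets. The helper names `encode_oid_body` and `der_tlv` are made up for this illustration, and only short lengths (below 128 bytes) are handled.

```python
def encode_oid_body(dotted):
    """Encode the content octets of an OBJECT IDENTIFIER."""
    arcs = [int(a) for a in dotted.split(".")]
    # The first two arcs share one value: 40 * first + second.
    values = [40 * arcs[0] + arcs[1]] + arcs[2:]
    out = bytearray()
    for v in values:
        # Base-128, most significant group first; every byte except the
        # last one of each arc has its high bit set.
        chunk = [v & 0x7F]
        v >>= 7
        while v:
            chunk.append((v & 0x7F) | 0x80)
            v >>= 7
        out.extend(reversed(chunk))
    return bytes(out)


def der_tlv(tag, body):
    """Wrap content octets in tag + length (short form only, body < 128)."""
    if len(body) >= 128:
        raise ValueError("long-form lengths are left out of this sketch")
    return bytes([tag, len(body)]) + body


OID_TAG = 0x06  # universal tag number of OBJECT IDENTIFIER
common_name = der_tlv(OID_TAG, encode_oid_body("2.5.4.3"))
print(common_name.hex())  # 0603550403
```

The same five bytes (06 03 55 04 03) show up inside every certificate that has a commonName in its subject or issuer.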

If you have a certificate in PEM format, it's easy to convert it to DER with OpenSSL:

openssl x509 -in cert.pem -out cert.der -outform DER
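PEM is little more than base64-encoded DER between a header and a footer line, so the same conversion can be sketched in a few lines of Python. The function names `pem_to_der` and `der_to_pem` are hypothetical helpers for this example; the standard library's `ssl.PEM_cert_to_DER_cert` does essentially the same job.

```python
import base64
import re

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def pem_to_der(pem_text):
    """Return the DER bytes of the first certificate found in pem_text."""
    match = PEM_BLOCK.search(pem_text)
    if match is None:
        raise ValueError("no PEM certificate block found")
    # The body is plain base64 split across lines.
    return base64.b64decode("".join(match.group(1).split()))


def der_to_pem(der_bytes):
    """Inverse operation: wrap DER bytes in a PEM certificate block."""
    b64 = base64.b64encode(der_bytes).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join(
        ["-----BEGIN CERTIFICATE-----"] + lines + ["-----END CERTIFICATE-----", ""]
    )


# Round trip on a tiny stand-in blob (not a real certificate).
sample = bytes.fromhex("3003020105")
assert pem_to_der(der_to_pem(sample)) == sample
```

For a real file you would read cert.pem as text, pass the contents to `pem_to_der` and write the result to cert.der in binary mode.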


The digital certificate data structure


The X.509v3 digital certificate data structure is quite complex. The IETF has published its format as used on the Internet, which has evolved over time: RFC 2459  ->  RFC 3280  ->  RFC 5280.

Here is the ASN.1 description of the first two hierarchy levels:

Certificate ::= SEQUENCE {  
     tbsCertificate       TBSCertificate,  
     signatureAlgorithm   AlgorithmIdentifier,  
     signatureValue       BIT STRING }  
   
TBSCertificate ::= SEQUENCE {  
     version         [0]  EXPLICIT Version DEFAULT v1,  
     serialNumber         CertificateSerialNumber,  
     signature            AlgorithmIdentifier,  
     issuer               Name,  
     validity             Validity,  
     subject              Name,  
     subjectPublicKeyInfo SubjectPublicKeyInfo,  
     issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,  
                          -- If present, version shall be v2 or v3  
     subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,  
                          -- If present, version shall be v2 or v3  
     extensions      [3]  EXPLICIT Extensions OPTIONAL  
                          -- If present, version shall be v3  
     }


Well, this doesn't seem all that complex. Name is basically a collection of (Object Identifier, Value) tuples, where:

  • Object Identifier is a globally registered identifier which you can look up on the Internet, e.g. at oid-info.com. One example is "commonName", which is 2.5.4.3.
  • Value is normally a string. (There are several types of strings in ASN.1.)

The "not so trivial" part of this data structure is the extensions part, which may only be present in X.509 certificates of version 3. The original RFC states:

   The extensions defined for X.509 v3 certificates provide methods for
   associating additional attributes with users or public keys and for
   managing the certification hierarchy.

One of the more interesting standard extensions is the Subject Alternative Names (aka SubjectAltName) extension:

   The subject alternative names extension allows additional identities
   to be bound to the subject of the certificate.  Defined options
   include an Internet electronic mail address, a DNS name, an IP
   address, and a uniform resource identifier (URI).  Other options
   exist, including completely local definitions.  Multiple name forms,
   and multiple instances of each name form, may be included.  Whenever
   such identities are to be bound into a certificate, the subject
   alternative name (or issuer alternative name) extension MUST be used.

   Because the subject alternative name is considered to be
   definitively bound to the public key, all parts of the subject
   alternative name MUST be verified by the CA.

As it happens, some of our officially recognized Spanish Certificate Authorities pack non-standard attributes into SubjectAltName. The extension data is itself a DER-encoded ASN.1 structure, so you have to feed it through the appropriate parser.

About reading X.509 digital certificates with Python


Now, we already know that X.509 certificates are DER-encoded ASN.1 data structures. Thanks to the excellent PyASN1 library we can read those data structures. But something is still missing. DER-encoded ASN.1 data is not fully self-describing: the bytes carry type tags and lengths but no field names, so we must have a data structure description, just like a C typedef struct or a Python class definition.
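What "not self-describing" means in practice can be seen with a schema-less walk over the bytes. The following is only a rough sketch with made-up helper names (`read_tlv`, `dump`); it assumes single-byte tags and definite lengths. It recovers the nesting and the raw values, but nothing tells us that the two inner elements are called "type" and "value".

```python
def read_tlv(data, offset=0):
    """Return (tag, body, next_offset) for the DER element at offset."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes.
        n = length & 0x7F
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    return tag, data[offset:offset + length], offset + length


def dump(data, depth=0):
    """Collect one indented line per element, recursing into constructed ones."""
    lines = []
    offset = 0
    while offset < len(data):
        tag, body, offset = read_tlv(data, offset)
        if tag & 0x20:  # constructed bit set (SEQUENCE, SET, ...)
            lines.append("%stag 0x%02x, constructed" % ("  " * depth, tag))
            lines.extend(dump(body, depth + 1))
        else:
            lines.append("%stag 0x%02x, value %s" % ("  " * depth, tag, body.hex()))
    return lines


# Built by hand: SEQUENCE { OID 2.5.4.6, PrintableString "ES" }
sample = bytes.fromhex("3009" "0603550406" "13024553")
print("\n".join(dump(sample)))
```

A schema such as the PyASN1 classes discussed next is what attaches names and expected types to these anonymous elements.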

It would be great to have an ASN.1 to PyASN1 compiler. Then we could take the X509v3 ASN.1 description and translate it to Python. Until recently there was none, but now there is an attempt to fill this gap: asn1ate. Before that, most existing data model descriptions for PyASN1 were translated by hand. The separate pyasn1-modules package includes common data structures like PKCS12, X509v3 (RFC2459), etc.

Here is one example: the PKCS12 data structure translated to PyASN1:

#
# PKCS#12 syntax
#
# ASN.1 source from:
# ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-12/pkcs-12.asn
#
# Sample captures could be obtained with "openssl pkcs12" command
#
from pyasn1.type import tag, namedtype, namedval, univ, constraint
from pyasn1_modules.rfc2459 import *
from pyasn1_modules import rfc2251


class Attributes(univ.SetOf):
    componentType = rfc2251.Attribute()


class Version(univ.Integer):
    pass


class CertificationRequestInfo(univ.Sequence):
    componentType = namedtype.NamedTypes(
        namedtype.NamedType('version', Version()),
        namedtype.NamedType('subject', Name()),
        namedtype.NamedType('subjectPublicKeyInfo', SubjectPublicKeyInfo()),
        namedtype.NamedType('attributes',
            Attributes().subtype(implicitTag=tag.Tag(
                tag.tagClassContext, tag.tagFormatConstructed, 0)))
    )

Hands on with PyASN1


Now, let's try to parse a certificate. You can find the test certificate used in this example in the pyx509 package described below. You can also generate your own certificate with OpenSSL:

openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.der -days 1000 -outform DER -nodes


First, we'll install pyasn1 and pyasn1-modules:

$ sudo pip install pyasn1
Downloading/unpacking pyasn1
  Downloading pyasn1-0.1.7.tar.gz (68kB): 68kB downloaded
  Running setup.py egg_info for package pyasn1
    
Installing collected packages: pyasn1
  Running setup.py install for pyasn1
    
Successfully installed pyasn1
Cleaning up...
$ sudo pip install pyasn1-modules
Downloading/unpacking pyasn1-modules
  Downloading pyasn1-modules-0.0.5.tar.gz
  Running setup.py egg_info for package pyasn1-modules
    
Requirement already satisfied (use --upgrade to upgrade): pyasn1>=0.1.4 in /Library/Python/2.7/site-packages (from pyasn1-modules)
Installing collected packages: pyasn1-modules
  Running setup.py install for pyasn1-modules
    
Successfully installed pyasn1-modules
Cleaning up...

Now we'll go ahead and read a certificate:

$ python
>>> from pyasn1.codec.der.decoder import decode
>>> from pyasn1_modules import rfc2459
>>> derData = open('cert.der', 'rb').read()
>>> cert, rest = decode(derData, asn1Spec=rfc2459.Certificate())
>>> print(cert.prettyPrint())
Certificate:
 tbsCertificate=TBSCertificate:
  version='v3'
  serialNumber=1019333950
  signature=AlgorithmIdentifier:
   algorithm=1.2.840.113549.1.1.5
   parameters=0x0500

  issuer=Name:
   =RDNSequence:
    RelativeDistinguishedName:
     AttributeTypeAndValue:
      type=2.5.4.6
      value=0x13024553
    RelativeDistinguishedName:
     AttributeTypeAndValue:
      type=2.5.4.10
      value=0x1304464e4d54
    RelativeDistinguishedName:
     AttributeTypeAndValue:
      type=2.5.4.11
      value=0x130f464e4d5420436c6173652032204341

  validity=Validity:
   notBefore=Time:
    utcTime=100903074356Z

   notAfter=Time:
    utcTime=130903074356Z

  subject=Name:
   =RDNSequence:
    RelativeDistinguishedName:
     AttributeTypeAndValue:
      type=2.5.4.6
      value=0x13024553
    RelativeDistinguishedName:
     AttributeTypeAndValue:
      type=2.5.4.10
      value=0x1304464e4d54
    RelativeDistinguishedName:
...

Not bad for a first attempt. But nearly all attribute values still seem to be encoded. Let's get the subject and see if we can transform it into a readable string.

>>> cert = cert['tbsCertificate'] # just get the core part of the certificate
>>> subject = cert['subject']
>>> rdnsequence = subject[0] # the subject consists of a single component
>>> for rdn in rdnsequence:
...    oid, value = rdn[0]  # each rdn has one component: an (object id, value) pair
...    print('%s : %s' % (oid, str(value)))
...
2.5.4.6 : ES
2.5.4.10 : FNMT
2.5.4.11 : FNMT Clase 2 CA
2.5.4.11 :  703002474
2.5.4.3 : 8NOMBRE REVILLA DERKSEN ALEJANDRO ERNESTO - NIF ...

Now we have some readable output. The Object Identifiers have the following meaning:

  • 2.5.4.6: countryName, abbreviated: C
  • 2.5.4.10: organizationName, abbreviated: O
  • 2.5.4.11: organizationalUnitName, abbreviated: OU
  • 2.5.4.3: commonName, abbreviated: CN

With OpenSSL, it is normally displayed like this:

$ openssl x509 -in ub1204/svn/pyx509/exampledata/cert.der -inform DER -subject -noout
subject= /C=ES/O=FNMT/OU=FNMT Clase 2 CA/OU=703002474/CN=NOMBRE REVILLA DERKSEN ALEJANDRO ERNESTO - NIF ...
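Note that str(value) in the session above just dumps the raw DER bytes of each value, tag and length byte included. That is presumably where the stray "8" in front of the common name comes from: it would be the length byte 0x38 shown as a character. A cleaner approach is to decode the value a second time. With PyASN1 that would likely be another decode() call with a string type as asn1Spec; the sketch below instead strips tag and length by hand so that it runs without any dependency. The names `decode_string` and `format_name` are made up for this example, and the input bytes are copied from the issuer in the prettyPrint() output.

```python
SHORT_NAMES = {
    "2.5.4.6": "C",
    "2.5.4.10": "O",
    "2.5.4.11": "OU",
    "2.5.4.3": "CN",
}

# (OID, raw DER value) pairs copied from the issuer in the prettyPrint() output.
ISSUER = [
    ("2.5.4.6", bytes.fromhex("13024553")),
    ("2.5.4.10", bytes.fromhex("1304464e4d54")),
    ("2.5.4.11", bytes.fromhex("130f464e4d5420436c6173652032204341")),
]


def decode_string(der):
    """Strip tag and short-form length from a DER string value."""
    tag, length = der[0], der[1]
    if length & 0x80:
        raise ValueError("long-form lengths are left out of this sketch")
    body = der[2:2 + length]
    # 0x0c is UTF8String; PrintableString (0x13) and IA5String (0x16) are ASCII.
    return body.decode("utf-8" if tag == 0x0C else "ascii")


def format_name(pairs):
    """Render (OID, DER value) pairs in the OpenSSL one-line style."""
    return "".join(
        "/%s=%s" % (SHORT_NAMES.get(oid, oid), decode_string(value))
        for oid, value in pairs
    )


print(format_name(ISSUER))  # /C=ES/O=FNMT/OU=FNMT Clase 2 CA
```

Unknown OIDs fall back to their dotted form, which keeps the output usable for the non-standard attributes some CAs add.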


Using pyx509 to parse X.509 certificates


The pyx509 library is an attempt to offer a more Pythonic data structure. It brings its own model of X.509 for PyASN1. My fork of pyx509 adds the ability to parse and display the directory name (dirName) parts of SubjectAltName. Sorry, there is still no PyPI package or setup.py, so you have to download the zip/tarball and uncompress it.

Here is an example:

./x509_parse exampledata/cert.der
=== X509 Certificate ===
X.509 version: 3 (0x2)
Serial no: 0x3cc1cd3e
Signature algorithm: SHA1/RSA
Issuer: C=ES, O=FNMT, OU=FNMT Clase 2 CA
Validity:
 Not Before: 2010-09-03 07:43:56
 Not After: 2013-09-03 07:43:56
Subject: C=ES, CN=NOMBRE REVILLA DERKSEN ALEJANDRO ERNESTO - NIF ..., O=FNMT, OU=703002474, OU=FNMT Clase 2 CA
Subject Public Key Info:
 Public Key Algorithm: RSA
  Modulus: (b64)
...
  Exponent: 65537

Extensions:
...
 Subject Alternative Name: is_critical: False
  email: ernesto.revilla@gmail.com
  dirName: Apellido1=REVILLA, Apellido2=DERKSEN, DNI=..., Nombre=ALEJANDRO ERNESTO
...
=== EOF X509 Certificate ===

This seems to give us a much more usable output and may be a good alternative to parsing OpenSSL output.
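The validity dates above are derived from strings such as utcTime=100903074356Z in the raw certificate. If you only have those strings, converting them yourself is straightforward. The sketch below uses a hypothetical helper `parse_asn1_time` and applies the century rule from RFC 5280 (two-digit years 50 to 99 mean 19YY, 00 to 49 mean 20YY) rather than relying on strptime's own handling of two-digit years, which uses a different pivot. It only accepts the "Z" (UTC) forms.

```python
from datetime import datetime, timezone


def parse_asn1_time(text):
    """Parse a UTCTime (YYMMDDHHMMSSZ) or GeneralizedTime (YYYYMMDDHHMMSSZ)."""
    if not text.endswith("Z"):
        raise ValueError("only Zulu (UTC) times are handled in this sketch")
    digits = text[:-1]
    if len(digits) == 12:
        # UTCTime: two-digit year, pivot at 50 (50-99 -> 19YY, 00-49 -> 20YY).
        yy = int(digits[:2])
        digits = ("19" if yy >= 50 else "20") + digits
    elif len(digits) != 14:
        raise ValueError("unexpected time format: %r" % text)
    parsed = datetime.strptime(digits, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def is_within_validity(not_before, not_after, when):
    """True if 'when' lies inside the validity period (bounds included)."""
    return not_before <= when <= not_after


not_before = parse_asn1_time("100903074356Z")
not_after = parse_asn1_time("130903074356Z")
print(not_before.isoformat())  # 2010-09-03T07:43:56+00:00
print(is_within_validity(not_before, not_after, datetime.now(timezone.utc)))
```

Returning timezone-aware datetimes avoids accidental comparisons against local time.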

Displaying digital signatures / timestamps with pyx509

With pyx509 we can also display some data from digital signatures that comply with PKCS7:


= PKCS7 signature block =
PKCS7 Version: 1
== Encapsulated content Info ==
ContentType: data
Content: None
== Signer info ==
Certificate serial number: 0x89bbba0749918db3
Issuer: C=es, ...
Digest Algorithm: SHA-1
Signature: (b64)
 gXpU5jadSY+FVBoeCdvn1/m5bzEMzN3ZKuiN9sPk79iJgX+DDDOMH6K5Scnh
 wLL7nHRT983GlhTY1A2QE1VryWTbuBGK08oalKIM8QZs3UfZa5dXsx83eS4b
 /M/icfIf6CHu1fWZ4VBJ4mva2N3nh2r0FV09bvuj1bodl4kXJAs=
Attributes:
     contentType: data
     serialNumber: 0x89bbba0749918db3
     signingTime: 2011-10-04 14:36:51
     messageDigest: y8OX3qoZBY4/Cc6/w0xuRqzzQzU=
     signingCertificate: 0x89bbba0749918db3
== EOF Signer info ==
=== X509 Certificate ===
X.509 version: 3 (0x2)
Serial no: 0x89bbba0749918db3
Signature algorithm: SHA1/RSA
Issuer: C=es, ...
Validity:
 Not Before: 2011-09-17 00:00:00
 Not After: 2031-09-12 11:09:54
Subject: C=es, CN=REVILLA DERKSEN, ALEJANDRO ERNESTO...
Subject Public Key Info:
 Public Key Algorithm: RSA
  Modulus: (b64)
   AOs2/Pip46F5BJPBQd/5bwS1HO97lJ74ZjJfGtvEH831d6Ld4bsF9jdFOjlx
   mv+kxYNFryZZFWM109+zng/PiU8NZPRZt4XlTO7qb3r2g5AR17EQWJNokQto
   s3w3cXSEDPxxFmTHEhGarTLddEg2o1v9/UIlMS8mzHej0Q9uBuuh
  Exponent: 65537

Extensions:
 Authority Key Id Ext: is_critical: False
  key id: (b64)
   AiuDvGb4bxWnCsZJ9/RHNrRhSxk=
 Basic Constraints Ext: is_critical: False
  CA: False
  max_path_len: None
 Subject Alternative Name: is_critical: False
  email: tramitacion.electronica@telefonica.es
  dirName: Apellido1=REVILLA, Apellido2=DERKSEN, DNI=..., Nombre=ALEJANDRO ERNESTO
 Subject Key Id: is_critical: False
  key id: (b64)
   Zh0L6JJSz+GgiCimE4U7s5PHH+g=
Signature: (b64)
 k1OVoQyNZv0ASor/bitI6JgJm37piIheIzwdKSgEtKeQuIXfA5V5rclPVUg7
 PW71JTQyY8iDbvJB4sb4FH5XyjOXUmf3CXiG7ppS48cQXSf1k3wHWZB0neTE
 V3XxZnPjqWvv0x0ScsOGKxpHjyy8SFZMKR6tnfQ4TXfHMxid7dw=
=== EOF X509 Certificate ===


We can clearly see that there is one signature block (Signer Info) which specifies the original message's digest, the digest algorithm used (SHA-1), the signature, a reference to the certificate, the certificate itself and the signing time.
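The messageDigest attribute can be checked independently of the signature: it should equal the digest of the signed content, computed with the digest algorithm named in the signer info. A minimal sketch follows, with a hypothetical helper `digest_matches`. The original content behind the output above is not available here, so the comparison is demonstrated on made-up data. A matching digest alone does not verify the signature.

```python
import base64
import hashlib


def digest_matches(content, message_digest_b64, algorithm="sha1"):
    """Compare the digest of content with a base64 messageDigest value."""
    expected = base64.b64decode(message_digest_b64)
    actual = hashlib.new(algorithm, content).digest()
    return actual == expected


# The messageDigest shown above decodes to 20 bytes, the size of a SHA-1 digest.
shown = base64.b64decode("y8OX3qoZBY4/Cc6/w0xuRqzzQzU=")
print(len(shown))  # 20

# Self-contained demonstration with made-up content.
content = b"example content"
b64 = base64.b64encode(hashlib.sha1(content).digest()).decode("ascii")
print(digest_matches(content, b64))              # True
print(digest_matches(b"tampered content", b64))  # False
```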

Here is an example of a time stamp token obtained from a public Time Stamp Authority (TSA):

./pkcs7_parse.py exampledata/timestamp.tst
= PKCS7 signature block =
PKCS7 Version: 3
== Encapsulated content Info ==
ContentType: TimeStampToken
=== Timestamp Info ===
Version: 1
Policy: 1.3.4.6.1.3.4.6
msgImprint:
 Algorithm Id: 1.3.14.3.2.26
 Value: (b64)
  rnLdD3molzRsebPvq7oOSG9n8fU=
Serial number: 134059559
Time: 20131011084712Z
==== Accuracy ====
Seconds: 1
Milis: 1
Micros 2
==== EOF Accuracy ====
TSA:
=== EOF Timestamp Info ===
== Signer info ==
Certificate serial number: 0x5079e
Issuer: C=ES, CN=MINISDEF-EC-WPG, O=MDEF, OU=PKI
Digest Algorithm: SHA-1
Signature: (b64)
...
Attributes:
     contentType: TimeStampToken
     messageDigest: KpRSk0vbBke+8G40MIII9NNb51E=
     signingCertificate: 0x5079e
== EOF Signer info ==
=== X509 Certificate ===
X.509 version: 3 (0x2)
Serial no: 0x5079e
Signature algorithm: SHA1/RSA
Issuer: C=ES, CN=MINISDEF-EC-WPG, O=MDEF, OU=PKI
Validity:
 Not Before: 2011-08-17 09:50:22
 Not After: 2021-08-17 09:50:22
Subject: C=ES, CN=Sello de tiempo TS@ - @firma - desarrollo, O=MDEF, OU=PKI, serialNumber=S2833002E
Subject Public Key Info:
 Public Key Algorithm: RSA
  Modulus: (b64)
...
  Exponent: 65537

Extensions:
...
 Extended Key Usage: is_critical: True
  timeStamping
 Key Usage: is_critical: True
  digitalSignature,nonRepudiation
 Subject Alternative Name: is_critical: False
  email: soporte.afirma5@mpt.es
  dirName: CN=TS@- Autoridad Sellado de tiempo-desarrollo, O=Ministerio de la Política Territorial y Administración Pública, certType=sello de tiempo, serialNumber=S2833002E
...
=== EOF X509 Certificate ===
= EOF PKCS7 signature block =


Conclusions


Although pyx509 is rather incomplete, it may fulfill your needs and can be an alternative to parsing certificates, digital signatures and timestamps with OpenSSL.




6 comments:

  1. Excellent article full of useful information. Thanks, Deny.

  2. Nice article Erny. Your little module looks very promising.

    Just a small question: are the timestamps in UTC or is there a way to specify a timezone?

    Keep the good work!

  3. Hi Lorenzo.

    RFC 5280 (x509v3) states for Validity datetimes:

    CAs conforming to this profile MUST always encode certificate validity dates through the year 2049 as UTCTime; certificate validity dates in 2050 or later MUST be encoded as GeneralizedTime. Conforming applications MUST be able to process validity dates that are encoded in either UTCTime or GeneralizedTime.


    RFC 2985 (PKCS#9 Selected Object Classes and Attribute Types) states for the SigningTime syntax the following quoted text:
    "Dates between 1 January 1950 and 31 December 2049 (inclusive) MUST be encoded as UTCTime. Any dates with year values before 1950 or after 2049 MUST be encoded as GeneralizedTime. [Further,] UTCTime values MUST be expressed in Greenwich Mean Time (Zulu) and MUST include seconds (i.e., times are YYMMDDHHMMSSZ), even where the number of seconds is zero. Midnight (GMT) must be represented as "YYMMDD000000Z". Century information is implicit, and the century shall be determined as follows:

    - Where YY is greater than or equal to 50, the year shall be interpreted as 19YY; and
    - Where YY is less than 50, the year shall be interpreted as 20YY.

    GeneralizedTime values shall be expressed in Greenwich Mean Time (Zulu) and must include seconds (i.e., times are YYYYMMDDHHMMSSZ), even where the number of seconds is zero. GeneralizedTime values must not include fractional seconds."


    Thanks for your nice comments. :-)

  4. Cool, I didn't know about the GeneralizedTime but it makes perfect sense

  5. Thanks for your time in writing this, Erny. However, pyx509 seems broken. There are errors at both the command-line, as well as importing and calling the functionality:

    https://github.com/hiviah/pyx509/issues/1
    https://github.com/hiviah/pyx509/issues/2

    Replies
    1. Thanks for your feedback. I'll try to find some time in the next few weeks to check "otherName" extension. I would be glad if you could provide some test certificate with this extension.
