SecureBlackbox 16: Securing PDF documents

Note: This article applies only to SecureBlackbox Legacy. For future development please consider using the latest version.

Portable Document Format (PDF) has gained wide popularity as a standard for the distribution of printable documents and presentations: manuals for computer hardware and cars are distributed in PDF format; internet standards and standard proposals are now offered as portable documents; search engines process PDF files and index them. Among its other uses, PDF has become a standard for reporting and form-filling activities.

Of course, not all documents published in PDF format are released for public use. Privately distributed documents may contain commercial secrets, personal information, or other data that should not be disclosed. Even public documents may not be intended for unrestricted use: some publishers may decide to let users read documents but not print them, and other limitations are also possible. The PDF format defines certain Digital Rights Management (DRM) techniques to enforce such limitations. Unfortunately, third-party software can circumvent most of them, and no suitable technical solution has been found so far.

The most important factor for e-commerce is that the document can be encrypted in order to prevent access by unauthorized parties. The PDF specification defines several ways to encrypt the document. Today, the two schemes most often used are symmetric and asymmetric encryption. We will review them below in detail.

Another common task is proof of authenticity. Publishers often need to confirm that the document was really published by them. This is done by electronically signing the document. The user of the document can later check the signature and ensure that the document was published by a certain publisher and that it has not been tampered with since it has been published. The PDF specification defines a certificate-based signing scheme and enables third parties to extend the specification with more schemes. When reporting and form-filling activities are undertaken, the electronic signature becomes an integral part of the document being created.

One great benefit (and at the same time a certain disadvantage) is that PDF security is built into the specification. In other words, the encrypted or signed document is still a PDF document and can be opened with PDF management software without taking extra action. Of course, this software will need to decrypt the document if it is encrypted, but in all cases the software can reach the metadata (information about the document) and document structure. The signer of a document can fill document forms while not breaking the integrity of the document. This benefit is not available with other document formats so far.

Standard Encryption Measures: Symmetric Encryption

Currently, most PDF publishers use built-in symmetric encryption of the document. This encryption is supported by the Acrobat software and by many other solutions for creating and viewing PDFs.

Symmetric encryption is often called password-based. With symmetric encryption, the publishers and intended users of the documents must know some secret key (a password or pass-phrase). This key is used both to protect the document and to access the protected data. Once the document has been protected (encrypted), the secret key is required to get access to the document contents. The scheme is easy to use since the secret key is usually text; however, the disadvantage is also significant -- the password or secret key must be passed from the publisher to the intended users. Also, the password is the same for each user, so when documents are published on a regular basis and some user must be excluded from the list, the password must be changed and the remaining users must be notified of the new password. Password management in this case can become a nightmare.

Another disadvantage of password-based encryption is that passwords chosen by people are often weak. A detailed analysis of attacks on password-based encryption is beyond the scope of this article, but password-based encryption is often vulnerable to the following types of attacks:

  • A dictionary attack: This attack uses a dictionary of words.
  • A brute-force attack: This attack constructs the passwords one by one and tries them on the document.
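
The difference between the two attacks can be illustrated with a minimal Python sketch. The `verifier` function below is a hypothetical stand-in for a password check (a plain SHA-256 hash); it is not the key-derivation procedure from the PDF specification, and the word list and alphabet are made up for the example.

```python
import hashlib
import itertools


def verifier(password: str) -> bytes:
    # Hypothetical stand-in for a password check; NOT the PDF algorithm.
    return hashlib.sha256(password.encode("utf-8")).digest()


def dictionary_attack(target: bytes, words):
    """Try each word from a prepared list; return the match or None."""
    for word in words:
        if verifier(word) == target:
            return word
    return None


def brute_force_attack(target: bytes, alphabet: str, max_len: int):
    """Construct every password up to max_len characters and try each one."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if verifier(candidate) == target:
                return candidate
    return None


if __name__ == "__main__":
    target = verifier("cab")
    print(dictionary_attack(target, ["secret", "letmein", "cab"]))
    print(brute_force_attack(target, "abc", 3))
```

The dictionary attack is fast but finds only passwords that are in the list; the brute-force attack finds any password eventually, but its cost grows exponentially with the password length.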

The PDF specification defines the use of RC4 and AES symmetric algorithms to encrypt the data. For RC4, the length of the encryption key (the encryption key is usually derived from the password) is 40 or 128 bits. For AES, the length of the encryption key is 128 bits. 40-bit keys are very weak and do not provide the desired level of protection. However, 128-bit symmetric encryption is subject to software export regulations in many countries; this may cause problems if you develop software that creates or manages encrypted PDF documents.

As mentioned earlier, the encryption key is derived from the secret key or password in some way. This means that no matter how long your password is, security will not be stronger than that defined by the length of the encryption key. 128-bit security, on the other hand, provides the necessary level of confidence (if you take measures to counteract the above-mentioned attacks).
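
A rough calculation makes this trade-off concrete. The sketch below assumes a password chosen uniformly at random from a given alphabet and a hypothetical attacker speed; both the function names and the guess rate are illustrative assumptions, not figures from the PDF specification.

```python
import math


def password_bits(alphabet_size: int, length: int) -> float:
    """Entropy, in bits, of a uniformly random password."""
    return length * math.log2(alphabet_size)


def effective_bits(alphabet_size: int, length: int, key_bits: int) -> float:
    """Security is capped by the weaker of the password and the key length."""
    return min(password_bits(alphabet_size, length), key_bits)


def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Time to try every value in a space of 2**bits candidates."""
    seconds = 2 ** bits / guesses_per_second
    return seconds / (365 * 24 * 3600)


if __name__ == "__main__":
    # Eight random lowercase letters protected by a 128-bit key.
    print(effective_bits(26, 8, 128))
    # A long random password protected by a 40-bit key.
    print(effective_bits(95, 40, 40))
    # Exhausting a 40-bit key space at an assumed 1e9 guesses per second.
    print(years_to_exhaust(40, 1e9))
```

Under these assumptions, eight random lowercase letters give about 37.6 bits regardless of the 128-bit key, while a 40-bit key space is exhausted in roughly 18 minutes at one billion guesses per second, however long the password is.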

When encryption is used, the data chunks are encrypted. However, the document structure generally remains readable because it is not encrypted. Also, by default, the same encryption method is used for the whole document.

Asymmetric Encryption

Asymmetric encryption is often called public-key encryption. It uses a pair of keys: a public key and a secret (private) key. Each intended user of the PDF document must have a key pair. A user, Alice, for example, gives her public key to the publisher and keeps her private key in a safe place. The publisher uses Alice's public key to encrypt the document. Each user applies their own private key to decrypt the document and read it.

Consider an example of a client, Bob, opening a bank account. To open the account, he must fill out an application form and send it to the bank. In the case of symmetric encryption, Bob has to encrypt the application form with a password and then send both the form and the password to the bank. Both of these information blocks can be intercepted, analyzed, and used against the bank or the client. Also, the bank must manage a myriad of passwords from all of its clients. With asymmetric encryption, the bank maintains just one pair of keys, and each client uses the bank's public key to encrypt the application.

From a technical point of view, the public and private keys are used to encrypt or decrypt a symmetric encryption key, which in turn is used with the above-mentioned encryption algorithms to encrypt the actual data. Since this symmetric key is created randomly for each encryption operation, it is not vulnerable to guessing (dictionary) attacks or to most brute-force attacks.

The scheme with asymmetric encryption is also easier to use when the document series are distributed. If the user is to be excluded from the distribution, the document is just not encrypted with this user's key. There is no need to distribute new passwords, as in the case of symmetric encryption.
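
The following Python sketch illustrates the idea of a random per-document key wrapped separately for each recipient. It is a toy under loud assumptions: the key pairs are textbook RSA built from well-known Mersenne primes (trivially breakable, no padding), the function names are hypothetical, and the layout has nothing to do with the actual PDF file structure. RC4 is implemented inline only because the article names it.

```python
import secrets


def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def toy_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA from two known primes. Illustration only, NOT secure."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)


def encrypt_for(recipients: dict, document: bytes):
    """Encrypt once with a random key, then wrap that key per recipient."""
    session_key = secrets.token_bytes(16)  # fresh random 128-bit key
    ciphertext = rc4(session_key, document)
    k = int.from_bytes(session_key, "big")
    wrapped = {name: pow(k, e, n) for name, (n, e) in recipients.items()}
    return ciphertext, wrapped


def decrypt_as(name: str, private_key, ciphertext: bytes, wrapped: dict) -> bytes:
    """Unwrap the session key with a private key and decrypt the data."""
    if name not in wrapped:
        raise PermissionError(name + " is not a recipient of this document")
    n, d = private_key
    session_key = pow(wrapped[name], d, n).to_bytes(16, "big")
    return rc4(session_key, ciphertext)


if __name__ == "__main__":
    alice_pub, alice_priv = toy_keypair(2**521 - 1, 2**607 - 1)
    ct, wrapped = encrypt_for({"alice": alice_pub}, b"Quarterly report")
    print(decrypt_as("alice", alice_priv, ct, wrapped))
```

Excluding a user from the next issue means leaving that user's public key out of `recipients`; nothing has to be redistributed to the remaining readers.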

Unlike passwords, public and secret keys in asymmetric encryption are quite long (each key is 128 bytes or longer). This makes it harder to manage those keys. To solve the problem of managing such long keys, X.509 certificates from PKI were employed. (Public key infrastructure, or PKI, is a set of standards that define the creation, use, and management of key pairs in asymmetric encryption.) With an X.509 certificate, the public key is contained in the certificate structure, which, besides the key itself, also contains information about who created the key and who is authorized to use the key, the validity period for the key, the intended use of the certificate, and more. The private key is linked to the X.509 certificate.

Certificate management is a large topic, which is briefly discussed in other articles written by EldoS. In relation to PDF security, you need to know that certificates are a useful way to solve certain security problems. Also note that certificates involve the use of certificate authorities, which issue certificates and vouch for their validity. Certificate validation requires checking each certificate with the authority to ensure that the certificate has not been revoked (canceled) and that it has not been tampered with or forged. This is a purely technical step, but it must not be skipped if PKI is to be implemented properly.
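
As a rough illustration of what such validation involves, the sketch below models a few of the certificate fields mentioned above and checks them. The `Certificate` class is a hypothetical, heavily simplified stand-in for a real X.509 structure, the revocation list is just a set of serial numbers, and verification of the issuer's signature (the check against tampering and forgery) is deliberately omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    """Hypothetical, simplified stand-in for an X.509 certificate."""
    subject: str
    issuer: str
    serial: int
    not_before: datetime
    not_after: datetime
    key_usage: frozenset


def check_certificate(cert, revoked_serials, required_usage, now=None):
    """Return a list of problems; an empty list means these checks passed.

    The issuer's signature is NOT verified here, so this is only part of
    a real validation procedure.
    """
    now = now or datetime.now(timezone.utc)
    problems = []
    if not (cert.not_before <= now <= cert.not_after):
        problems.append("outside validity period")
    if cert.serial in revoked_serials:
        problems.append("revoked by the issuing authority")
    if required_usage not in cert.key_usage:
        problems.append("not issued for " + required_usage)
    return problems


if __name__ == "__main__":
    cert = Certificate(
        subject="Alice",
        issuer="Example CA",
        serial=1001,
        not_before=datetime(2020, 1, 1, tzinfo=timezone.utc),
        not_after=datetime(2021, 1, 1, tzinfo=timezone.utc),
        key_usage=frozenset({"encryption"}),
    )
    print(check_certificate(cert, {1001}, "signing"))
```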

As with symmetric encryption, public-key based encryption encrypts the data chunks in the whole document.

Signing the Documents

As mentioned above, signing the document is necessary to prove that the document was really created by the person claiming to be its author and also to ensure that the document was not altered in any way. For example, when sending a tax report, Alice, as the reporter, must be able to prove that the report data came from her. There must also be protection against third parties who could alter the report and harm the reporter.

Due to its nature, data signing involves PKI and key pairs. The signing process works as follows:

  1. Bob, the publisher, calculates the hash (a special number, derived from the document data) and uses his private key to encrypt the calculated hash.
  2. Bob encloses the encrypted hash and the public key with the document.
  3. Alice, the user of the document, checks the validity period of the enclosed public key and decrypts the enclosed hash with the enclosed public key.
  4. Alice calculates the hash too and compares it with the hash enclosed with the document.
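
The four steps above can be sketched in Python as follows. This is a toy under stated assumptions: the key pair is textbook RSA built from two well-known Mersenne primes (not secure, no padding), the function names are hypothetical, and the validity-period check from step 3 is left out because the sketch uses a bare key rather than a certificate.

```python
import hashlib


def toy_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA from two known primes. Illustration only, NOT secure."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)


def document_hash(document: bytes) -> int:
    """SHA-256 hash of the document data, as an integer."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big")


def sign(document: bytes, private_key) -> int:
    """Steps 1-2: hash the document and transform the hash with the private key."""
    n, d = private_key
    return pow(document_hash(document), d, n)


def verify(document: bytes, signature: int, public_key) -> bool:
    """Steps 3-4: recover the enclosed hash and compare it with a fresh one."""
    n, e = public_key
    return pow(signature, e, n) == document_hash(document)


if __name__ == "__main__":
    bob_public, bob_private = toy_keypair(2**521 - 1, 2**607 - 1)
    report = b"Annual report, final version"
    signature = sign(report, bob_private)
    print(verify(report, signature, bob_public))
    print(verify(report + b" (edited)", signature, bob_public))
```

Any change to the document changes its hash, so the comparison in the last step fails for a tampered copy.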

As with asymmetric encryption, the management of the key pairs becomes much easier if X.509 certificates are employed. The PDF specification defines two signing schemes that both use X.509 certificates.

The document signature is applied to the document as a whole. It is possible to exclude PDF forms from the signing process; each form can be excluded separately, either completely or only for certain fields. This lets the user fill in an already signed document without breaking the signature.

Time-Stamping

Sometimes it is important to know exactly when the document was issued. For example, if a tax report is sent, it is important to ensure that it was sent within the period defined by the law. If the publisher (the reporter in our example) simply writes the time into the document, this is not proof, since the value can be incorrect. To ensure the correct value, a third-party time-stamping service must be used. This service takes the document (or the hash of the document data) and issues another data block (a time-stamp). This block can be verified later to ensure the correctness of the time-stamp. In this scenario the time is actually recorded by the trusted third-party service.
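
A minimal sketch of the idea, assuming the service signs the document hash together with its own clock reading. The token layout, the function names, and the toy textbook-RSA key (built from well-known Mersenne primes, not secure) are all illustrative assumptions; real services follow a standardized protocol such as RFC 3161, which this sketch does not implement.

```python
import hashlib
import time


def toy_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA from two known primes. Illustration only, NOT secure."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)


def _imprint(doc_hash: bytes, issued_at: int) -> int:
    # Bind the document hash and the time together in one value.
    data = doc_hash + issued_at.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def issue_timestamp(doc_hash: bytes, service_private, now=None) -> dict:
    """Run by the time-stamping service: sign the hash plus the service's time."""
    n, d = service_private
    issued_at = int(time.time()) if now is None else now
    return {"time": issued_at, "signature": pow(_imprint(doc_hash, issued_at), d, n)}


def verify_timestamp(doc_hash: bytes, token: dict, service_public) -> bool:
    """Run by anyone: check that the service vouched for this hash and time."""
    n, e = service_public
    return pow(token["signature"], e, n) == _imprint(doc_hash, token["time"])


if __name__ == "__main__":
    service_public, service_private = toy_keypair(2**521 - 1, 2**607 - 1)
    digest = hashlib.sha256(b"tax report").digest()
    token = issue_timestamp(digest, service_private)
    print(verify_timestamp(digest, token, service_public))
```

Because the time is covered by the service's signature, the publisher cannot later change it without invalidating the token.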

Pluggable Security

The PDF specification does not limit publishers and software developers to predefined encryption, signing, and compression schemes, but lets them extend the functionality by introducing other schemes, using so-called document handlers. For example, one may introduce biometric signing or PGP-based encryption by creating an add-in module for Acrobat and defining its own scheme as a specification for developers of third-party software. Existing add-ins include handlers for the PKI-based and biometric signing of documents.

If you introduce your own document-processing scheme, it is not strictly necessary to create an Acrobat add-in. A security handler implemented in your own software might be enough for your needs.

If you decide to create an Acrobat security handler, you need to know that Acrobat supports signed and unsigned add-ins. Add-ins are usually signed by Adobe, after which they become trusted add-ins. If the add-in is not trusted, it can be blocked by Acrobat. So it makes sense to have the add-in signed by Adobe.

PDF Security in Your Applications

When you create a software product that displays, creates, or in some other way manages PDF documents, you will have to deal with PDF security. Most software components and libraries used for PDF creation and management support the symmetric encryption of documents. However, the vast majority of PDF management solutions do not support PKI-based encryption and signing.

SecureBlackbox (with its PDFBlackbox package) provides support for both symmetric and certificate-based encryption and for certificate-based signing. SecureBlackbox is a collection of components for Windows and .NET development.

We appreciate your feedback. If you have any questions, comments, or suggestions about this article, please contact our support team at kb@nsoftware.com.