Articles

VPN implementation pitfalls on routers

Although IPSec and IKE have been out for a long time, and considered by the Internet community to be mature protocols, you can still find low-level issues in certain designs. When an implementation represents internal keying material that is shorter than expected, it can cause interoperability problems with other implementations. Developers that used draft versions of the IKE NAT traversal specification nay have code causing IKE tunnels to never initiate. This can cause big headaches for SOHO router users trying to use VPN functionality.

Example IKE traffic flow

vpn ipsec ike example flow

Here’s a simple example of a gateway setting up a VPN tunnel due to a trigger from the LAN (from our ike_1.tcl test case). The LAN client starts by attempting an ICMP echo (ping). The gateway begins the IKE sequence, going through both main and quick modes of operation.

When the tunnel is established, the ICMP echo is passed through the tunnel but it’s encrypted in the ESP encapsulated packets.

Short Diffie-Hellman secrets in IKE

IKE employs the Diffie-Hellman key agreement protocol to choose a shared key used in protecting Phase 1 (and Phase 2 when using PFS) exchanges. For mathematical reasons we will not get into, the shared key is expected to have the same number of bits as the Diffie-Hellman group used to generate it. This is not always the case.

There is a small chance that in some instances, the shared key can be several bits shorter than expected. Since it is used inside a hashing function to generate keying material, it is very important that interoperable implementations handle these short keys the same way.

From RFC 2409 - section 5:

The Diffie-Hellman public value passed in a KE payload, in either a phase 1 or phase 2 exchange, MUST be the length of the negotiated Diffie-Hellman group enforced, if necessary, by pre-pending the value with zeros.

RFC 2409 makes no mention of the shared secret or what to do in the case where it is shorter than the Diffie-Hellman group. Should it be padded up to the expected length like the public value is?

Was this handled in IPSec?

A thread posted to the IPsec mailing list (ipsec@lists.tislabs.com) in April of 1999 brought up the topic of how to handle a short shared secret. Archived at www.sandelman.ottawa.on.ca/ipsec/1999/04/threads.html in the discussion, titled: “representation of IKE DH shared secret”

IPsec mailing list participant Tero Kivinen suggested the following rewrite of the specification:

gxy is the Diffie-Hellman shared secret. When this value is included in the SKEYID calculation as a input for PRF it MUST be prepended with zero bits up to 8-bit boundary, so that it has same length in octects [sic] than group prime number p. When this value is used in the hash calculation it MUST be in network byte order.

From: www.sandelman.ottawa.on.ca/ipsec/1999/04/msg00092.html

Dan Harkins, co-author of the IKE RFC 2409, agreed with Kivinen in a reply to the mailing list. Saying,

Sounds good. I will incorporate this and similar verbage on EC into the document.

From: www.sandelman.ottawa.on.ca/ipsec/1999/04/msg00097.html

The IKE RFC was last updated in November 1998 and this discussion didn’t occur until April of 1999. That is most likely the reason the updated wording never made it into the RFC.

Further support for this position can be found in RFC 2631 (Diffie-Hellman Key Agreement Method) under section 2.1.2 where it discusses the shared key ZZ:

Leading zeros MUST be preserved, so that ZZ occupies as many octets as p._ - RFC2631

At the time, early IKE Bakeoffs were discovering that all the interoperable implementations pre-pended zeros to short shared keys prior to using them in the hash function. They made sure that the shared key followed the same length requirements as KE did in the RFC.

How Diffie-Hellman math works

At its core, the Diffie-Hellman key agreement protocol is nothing more than a bit of multiplication. To quickly demonstrate the underlying principles, consider the following situation between Alice and Bob.

Alice and Bob each choose their own separate private values. Alice chooses a and Bob chooses b. They have already agreed on values for parameters g and p. Both parties compute their public keys (Ya and Yb respectively) as follows:

    alice public key:   Ya = g^a mod p
    bob public key:   Yb = g^b mod p

They may safely exchange the values of Ya and Yb in the clear, without any concern about compromising the secrecy of K, the shared secret key. Once they have exchanged their public keys, Alice computes:

    K = Yb^a mod p

And Bob:

   K = Ya^b mod p

Since gab = gba, Alice and Bob are each able to compute the same value for K.

In order for the math to protect our shared key K, the choices for a, b, and p are very important. They need to be Big Numbers. Bigger than the majority of programming languages can support by themselves. Thankfully, we have some help in that area.

Does OpenSSL solve the problem?

The folks working on the OpenSSL project have given us not only a library that helps compute these Big Numbers, but an entire API that takes care of the Diffie-Hellman key agreement protocol.

First, we look at the DH group of functions provided by OpenSSL. These functions implement the Diffie-Hellman key agreement protocol, and handle all related exponentiation and factoring.

At the heart of OpenSSL’s Diffie-Hellman support is a struct that has the following members: (There is more to it than this, but these are the ones important to this discussion)

    struct DH {
        BIGNUM *p;
        BIGNUM *g;
        BIGNUM *pub_key;
        BIGNUM *priv_key;
    }

Look familiar? The DH struct holds all of the necessary information to compute and share Diffie-Hellman keys. We install values for p and g, and OpenSSL takes care of the rest.

    DH *dh;
    dh = DH_new();
    dh->p = p;
    dh->g = g;  
    DH_generate_key(dh);

DH_generate_key produces values for the private-key which we keep to ourselves, and the public-key which we share with the other peer. Once the IKE peer responds with their own public-key value (pub), we can use it to compute the shared key. This time we put our private-key (priv) from above into the struct as well.

    int i = 0;
    dh->p = p;
    dh->g = g;
    dh->priv_key = priv;

    i = DH_compute_key(shared_key, pub, dh);

The variable shared_key should be an unsigned char* that will hold the secret shared-key. The length of shared_key in bytes is the return value of DH_compute_key, stored in the variable i.

As was mentioned earlier, this shared key should be the same length as the Diffie-Hellman group that was used to compute it. In our example, that group means the Prime Number p. OpenSSL provides us with another function useful here: DH_size(dh) which returns the expected length in bytes of the shared key.

It is very important to note that DH_size() doesn’t know what the shared key will actually be, just how long it is expected to be. When we use DH_compute_key() to actually handle the math, its return value (i in our example) will be the actual length of the shared key.

What’s The Significance?

When OpenSSL computes a shared key, it only counts the significant bits in that number. Consider using the 768-bit prime number defined in Oakley group 1 in the Diffie-Hellman protocol. The resulting shared key should also have 768 bits. However, generating a key with some specific number of bits doesn’t always set the highest bits in that key. There is a 1256 chance that the result can have no significant bits set in the first byte of the bit stream. Statically, 1 in 256 keys will be short. Because OpenSSL may not include bytes with leading zeros, the shared key may not be the expected length.

At this point we have the secret key stored in the variable shared_key, and from DH_compute_key() we have the length of that key stored in the variable i. As we now know, the actual length of the shared key is not always the same as the expected length.

It is important for the implementation to realize that the key may be shorter than expected, and compensate by pre-pending zeros to the key before applying a hash function to it.

Since the public transmitted in KE is also required to have zeros pre-pended to it, it is suggested that the same method used there is applied to the shared key to enforce its length.

Because this problem has a fairly good probability of occurring, it is not uncommon. Several vendors have managed to avoid the leading-zero problem altogether. In the middle of the DH-Key negotiation process if a private key that is shorter than the Diffie-Hellman group is calculated, the current Phase 1 or Phase 2 exchange is abandoned and the device will reinitiate a new one. The chances of the next exchange also having a troublesome key are much lower, so this second attempt should succeed, and does not negatively affect the overall establishment of the IKE phase.

Is My Device Affected?

This problem can be difficult to diagnose in existing implementations. Many vendors test their devices back-to-back with the same identical device. In that scenario, both devices will use the same short key to generate the same keying material, and the connection will always succeed.

If the device is tested against another implementation that correctly adds leading zeros to a short shared key, then the keying material they use for encryption of future IKE payloads will not match. Upon receipt by the other side, the payload will not be able to be decrypted. This can appear in logs and test results as anything from a PAYLOAD-MALFORMED to an INVALID-PAYLOAD-TYPE error. Errors such as those can easily be attributed to a mangled or fragmented packet or other common network interference.

Typically in that situation, the packet is dropped, and the Phase 1 or 2 negotiation is restarted eventually. Usually sooner than later, and the VPN connection is established. Since the problem is actually in the internal representation of the shared key, it is difficult to track down. Combined with the infrequency of short keys, this error could easily slip through interoperability testing.

Testing the device against a tool that is able to highlight specific decryption errors and the related keying material will help greatly in tracking down this problem. It is also helpful if the device has a log that displays internal variables for debugging purposes.

IKE NAT traversal

If you’re building an IKE implementation, there’s other common bugs, security, and interoperability issues that can arise. We often find these bugs while testing devices in our own lab here at QA Cafe.

When performing tests using the CDRouter IKE add-on, CDRouter and the DUT go through the steps of setting up an IKE tunnel as part of the test setup. During this process, the DUT conveys version and feature support information for the IKE tunnel, including its method for NAT traversal in the IKE. This is performed through an exchange of vendor-ID payloads using the NAT-D packets.

This interesting bug came as result of a change in the Payload Type identifier in the published version of IETF RFC 3947, Negotiation of NAT-Traversal in the IKE, versus its value in draft versions of the document. The standard specifies the Payload Type value to be 20. In earlier versions of the draft, it was specified to be 15. Protocol stacks that implemented IKE NAT traversal before the standard was completed may be using the wrong Payload Type value, which will cause the IKE tunnel to not successfully initiate.

The moral of this story: be careful when implementing a standard before it is finished! You might miss important changes later.

Testing in CDRouter

QA Cafe’s CDRouter IKE add-on provides a straight forward testing environment for detecting these problems. Prior to the release of QA Cafe’s CDRouter-IKE test suite, automatic testing for DH-key issues was not available. CDRouter IKE offers a quick and deterministic way to verify if an implementation is at risk for DH-key issues.

CDRouter IKE offers over 50 unique test cases that range from basic IKE behavior to specific interoperability problems such as the DH-key issue. During IKE Phase 1 and 2 exchanges, once CDRouter IKE learns the peer’s public key, it searches for a corresponding key pair that will produce a zero-leading private key. This allows CDRouter IKE to force this condition to occur and quickly verify if an IKE implementation is at risk.