Domain separation

Domain separation refers to separating objects into well-defined, distinct domains. One example of this is metadata and plaintext: during cryptographic processing, it is crucial that we treat metadata and plaintext as two distinct types of strings, otherwise attacks may become possible.

Simple example

Consider this simple authenticated encryption scheme:

E1(key K, nonce N, metadata A, plaintext P):
    C ← P + FK(N)
    T ← 0t + FK(C ∘ A ∘ N)
    return (C,T)

D1(key K, nonce N, metadata A, ciphertext C, tag T):
    T' ← 0t + FK(C ∘ A ∘ N)
    if T' = T then
        P ← C - FK(N)
	return P
    return ⊥

What if we wanted to extend the capabilities of this mode to allow for optional metadata/plaintext? That is, what if we wish to save on the cost of absorbing a blank string by adding some conditional logic?

E2(key K, nonce N, metadata A, plaintext P):
    C ← P + FK(N)
    if A ≠ null and P ≠ null then
        T ← 0t + FK(C ∘ A ∘ N)
    else if P ≠ null then
        T ← 0t + FK(C ∘ N)
    else
        T ← 0t + FK(A ∘ N)
    return (C,T)

D2(key K, nonce N, metadata A, ciphertext C, tag T):
    if A ≠ null and P ≠ null then
        T' ← 0t + FK(C ∘ A ∘ N)
    else if P ≠ null then
        T' ← 0t + FK(C ∘ N)
    else
        T' ← 0t + FK(A ∘ N)

    if T' = T then
        P ← C - FK(N)
	return P
    return ⊥

This mode may seem secure at first glance, but it is actually broken due to an ambiguity between string types. There are multiple ways of arriving at an identical internal state.

Scenario one:

E2(K, N, A, "")
Final state: A ∘ N

Scenario two:

E2(K, N, "", A)
Final state: A ∘ N

Both of the above produce an identical authentication tag, which means that an adversary could swap the strings around in transit and the receiver would not detect a problem. This is due to the fact that both the metadata and ciphertext strings are being treated identically when they are input into the Deck function. The solution to this problem is to use domain separation. One simple way to achieve this is to mark the strings with a frame bit to indicate whether it is a metadata string or a ciphertext string:

E3(key K, nonce N, metadata A, plaintext P):
    C ← P + FK(N)
    if A ≠ null and P ≠ null then
        T ← 0t + FK(C||1 ∘ A||0 ∘ N)
    else if P ≠ null then
        T ← 0t + FK(C||1 ∘ N)
    else
        T ← 0t + FK(A||0 ∘ N)
    return (C,T)

D3(key K, nonce N, metadata A, ciphertext C, tag T):
    if A ≠ null and P ≠ null then
        T' ← 0t + FK(C||1 ∘ A||0 ∘ N)
    else if P ≠ null then
        T' ← 0t + FK(C||1 ∘ N)
    else
        T' ← 0t + FK(A||0 ∘ N)

    if T' = T then
        P ← C - FK(N)
	return P
    return ⊥

Now if the adversary swaps the fields in the message, the authentication tag will not match on the receiving end thanks to the frame bits.

Scenario one:

E3(K, N, A, "")
Final state: A||0 ∘ N

Scenario two:

E3(K, N, "", A)
Final state: A||1 ∘ N

Subtle example

Now that we have solved the optional metadata/plaintext problem, let's now extend the mode to support sessions (for cleanliness, I omitted the if/else branches, but the metadata/plaintext strings are optional just like in the previous mode):

init(key K, nonce N):
    history ← N
    T ← 0t + FK(history)
    return T

wrap(metadata A, plaintext P):
    C ← P + FK(history) << t
    history ← C||1 ∘ A||0 ∘ history
    T ← 0t + FK(history)
    return (C,T)

unwrap(metadata A, ciphertext C, tag T):
    T' ← 0t + FK(C||1 ∘ A||0 ∘ history)
    if T' = T then
        P ← C - FK(history) << t
    	history ← C||1 ∘ A||0 ∘ history
	return P
    return ⊥

Obviously this mode is secure since we are domain separating the metadata and ciphertext string types, right? Actually, it is broken, but in a subtle way. We have failed to domain separate (metadata,ciphertext) pairs from one another. Once again, there are multiple ways of arriving at an identical internal state.

Scenario one:

init(K, N)
wrap(A, "")
wrap("", P)
Final state: C||1 ∘ A||0 ∘ N

Scenario two:

init(K, N)
wrap(A, P)
Final state: C||1 ∘ A||0 ∘ N

Now it is clear that there is nothing telling us whether a pair of strings goes together (e.g. (A,P)) or whether they are part of separate messages (e.g. (A,"") and ("",P)). This means that an active adversary could intercept two messages and merge them into a new message that was never intended, and the receiver does not detect the tampering because of a domain separation ambiguity. We need a way to group the strings together; we can add another frame bit to resolve this ambiguity. There are actually two ways of looking at this problem:

I tend to prefer the even/odd parity approach because it is very simple:

init(key K, nonce N):
    e ← 01
    history ← N
    T ← 0t + FK(history)
    return T

wrap(metadata A, plaintext P):
    C ← P + FK(history) << t
    history ← C||1||e ∘ A||0||e ∘ history
    T ← 0t + FK(history)
    e ← e + 11
    return (C,T)

unwrap(metadata A, ciphertext C, tag T):
    T' ← 0t + FK(C||1||e ∘ A||0||e ∘ history)
    if T' = T then
        P ← C - FK(history) << t
    	history ← C||1||e ∘ A||0||e ∘ history
        e ← e + 11
	return P
    return ⊥

The ambiguity is resolved thanks to the parity bit being different between adjacent messages.

Scenario one:

init(K, N)
wrap(A, "")
wrap("", P)
Final state: C||11 ∘ A||00 ∘ N

Scenario two:

init(K, N)
wrap(A, P)
Final state: C||10 ∘ A||00 ∘ N

The domain separation of messages exchanged as part of a session can be cast in the light of a graph coloring problem. Consider the following list of messages exchanged between a sender and a receiver:

  1. (A,C)
  2. (A)
  3. (A)
  4. (C)
  5. (C)
  6. (A,C)

Visualizing the internal state as a graph, we have:

Note that without the parity information, the above is ambiguous in that there is not a single unique way to recover the message stream:

If we add the parity bit and color even nodes as black text on a white background, and odd nodes as white text on a black background, we get:

Now the session has only one unique interpretation; all ambiguity is removed:

  1. (A,C)
  2. (A)
  3. (A)
  4. (C)
  5. (C)
  6. (A,C)

As mentioned earlier, an alternative to the parity bit is a bit that encodes whether a string is the last one in a message:

The parity bit and last string bit methods are intimately related: the last string bit approach can be thought of as the mathematical derivative of the parity bit approach.