Domain separation

Domain separation refers to separating objects into well-defined, distinct domains. One example of this is metadata and plaintext: during cryptographic processing, it is crucial that we treat metadata and plaintext as two distinct types of strings, otherwise attacks may become possible.

Simple example

Consider this simple authenticated encryption scheme:

AEAD

What if we wanted to extend the capabilities of this mode to allow for optional metadata/plaintext? That is, what if we wish to save on the cost of absorbing a blank string by adding some conditional logic?

AEAD

This mode may seem secure at first glance, but it is actually broken due to an ambiguity between string types. There are multiple ways of arriving at an identical internal state.

Scenario one:

E2(K, N, A, "")
Final state: A ∘ N

Scenario two:

E2(K, N, "", A)
Final state: A ∘ N

Both of the above produce an identical authentication tag, which means that an adversary could swap the strings around in transit and the receiver would not detect a problem. This is due to the fact that both the metadata and ciphertext strings are being treated identically when they are input into the Farfalle instance. The solution to this problem is to use domain separation. One simple way to achieve this is to mark the strings with a frame bit to indicate whether it is a metadata string or a ciphertext string:

AEAD

Now if the adversary swaps the fields in the message, the authentication tag will not match on the receiving end thanks to the frame bits.

Scenario one:

E3(K, N, A, "")
Final state: A||0 ∘ N

Scenario two:

E3(K, N, "", A)
Final state: A||1 ∘ N

Subtle example

Now that we have solved the optional metadata/plaintext problem, let's now extend the mode to support sessions:

AEAD

Obviously this mode is secure since we are domain separating the metadata and ciphertext string types, right? Actually, it is broken, but in a subtle way. We have failed to domain separate (metadata,ciphertext) pairs from one another. Once again, there are multiple ways of arriving at an identical internal state.

Scenario one:

init(K, N)
wrap(A, "")
wrap("", P)
Final state: C||1 ∘ A||0 ∘ N

Scenario two:

init(K, N)
wrap(A, P)
Final state: C||1 ∘ A||0 ∘ N

Now it is clear that there is nothing telling us whether a pair of strings goes together (e.g. (A,P)) or whether they are part of separate messages (e.g. (A,"") and ("",P)). This means that an active adversary could intercept two messages and merge them into a new message that was never intended, and the receiver does not detect the tampering because of a domain separation ambiguity. We need a way to group the strings together; we can add another frame bit to resolve this ambiguity. There are multiple ways of approaching this problem:

  1. Indicate whether a string is the first one in the tuple of strings
  2. Indicate whether a string is the last one in the tuple of strings
  3. Indicate whether a tuple of strings is "even" or "odd" relative to other tuples

For example, here is the even/odd parity approach:

AEAD

The ambiguity is resolved thanks to the parity bit being different between adjacent messages.

Scenario one:

init(K, N)
wrap(A, "")
wrap("", P)
Final state: C||11 ∘ A||00 ∘ N

Scenario two:

init(K, N)
wrap(A, P)
Final state: C||10 ∘ A||00 ∘ N

The domain separation of messages exchanged as part of a session can be cast in the light of a graph coloring problem. Consider the following list of messages exchanged between a sender and a receiver:

  1. (A,C)
  2. (A)
  3. (A)
  4. (C)
  5. (C)
  6. (A,C)

Visualizing the internal state as a graph, we have:

Domain Separation

Note that without the parity information, the above is ambiguous in that there is not a single unique way to recover the message stream:

We can use a frame bit to encode whether a string is the first one in a message:

Domain Separation

Now the session has only one unique interpretation; all ambiguity is removed:

  1. (A,C)
  2. (A)
  3. (A)
  4. (C)
  5. (C)
  6. (A,C)

Alternatively, we can use a frame bit to encode whether a string is the last one in a message:

Domain Separation

Another approach is to encode the parity of the tuple:

Domain Separation

The first/last string bit and parity bit methods are intimately related: the parity bit approach can be thought of as the mathematical integral of the first/last string bit approach. One interesting thing to note is that in the "first string" encoding, we can apply a mode-specific optimization to arrive at yet another valid encoding. The metadata string's "first string" indicator is always 1; when we rearrange the type/indicator bits and strip out the redundant indicator bit from the metadata string, we end up with an encoding that indicates whether a ciphertext string is the first string:

Domain Separation

Let's take a look at the algorithm for the "first ciphertext string" approach:

AEAD

The ambiguity is resolved thanks to the indicator bit on the ciphertext string type. If two metadata strings are adjacent, then we immediately know they are members of two separate messages. If a metadata and ciphertext string are adjacent, then the indicator bit on the ciphertext string tells us whether it begins a new message or is part of the existing one.

Scenario one:

init(K, N)
wrap(A, "")
wrap("", P)
Final state: C||11 ∘ A||0 ∘ N

Scenario two:

init(K, N)
wrap(A, P)
Final state: C||01 ∘ A||0 ∘ N

A similar specialization could be applied to the "last string" encoding. I tend to prefer the "first string" encoding due to its nice properties: