Consider this simple authenticated encryption scheme:
What if we wanted to extend the capabilities of this mode to allow for optional metadata/plaintext? That is, what if we wish to save on the cost of absorbing a blank string by adding some conditional logic?
This mode may seem secure at first glance, but it is actually broken due to an ambiguity between string types. There are multiple ways of arriving at an identical internal state.
E2(K, N, A, "") Final state: A ∘ N
E2(K, N, "", A) Final state: A ∘ N
Both of the above produce an identical authentication tag, which means that an adversary could swap the strings around in transit and the receiver would not detect a problem. This is due to the fact that both the metadata and ciphertext strings are being treated identically when they are input into the Farfalle instance. The solution to this problem is to use domain separation. One simple way to achieve this is to mark the strings with a frame bit to indicate whether it is a metadata string or a ciphertext string:
Now if the adversary swaps the fields in the message, the authentication tag will not match on the receiving end thanks to the frame bits.
E3(K, N, A, "") Final state: A||0 ∘ N
E3(K, N, "", A) Final state: A||1 ∘ N
Now that we have solved the optional metadata/plaintext problem, let's now extend the mode to support sessions:
Obviously this mode is secure since we are domain separating the metadata and ciphertext string types, right? Actually, it is broken, but in a subtle way. We have failed to domain separate (metadata,ciphertext) pairs from one another. Once again, there are multiple ways of arriving at an identical internal state.
init(K, N) wrap(A, "") wrap("", P) Final state: C||1 ∘ A||0 ∘ N
init(K, N) wrap(A, P) Final state: C||1 ∘ A||0 ∘ N
Now it is clear that there is nothing telling us whether a pair of strings goes together (e.g. (A,P)) or whether they are part of separate messages (e.g. (A,"") and ("",P)). This means that an active adversary could intercept two messages and merge them into a new message that was never intended, and the receiver does not detect the tampering because of a domain separation ambiguity. We need a way to group the strings together; we can add another frame bit to resolve this ambiguity. There are actually three ways of looking at this problem:
I tend to prefer the even/odd parity approach because it is branchless:
The ambiguity is resolved thanks to the parity bit being different between adjacent messages.
init(K, N) wrap(A, "") wrap("", P) Final state: C||11 ∘ A||00 ∘ N
init(K, N) wrap(A, P) Final state: C||10 ∘ A||00 ∘ N
The domain separation of messages exchanged as part of a session can be cast in the light of a graph coloring problem. Consider the following list of messages exchanged between a sender and a receiver:
Visualizing the internal state as a graph, we have:
Note that without the parity information, the above is ambiguous in that there is not a single unique way to recover the message stream:
If we add the parity bit and color even nodes as black text on a white background, and odd nodes as white text on a black background, we get:
Now the session has only one unique interpretation; all ambiguity is removed:
As mentioned earlier, an alternative to the parity bit is a bit that encodes whether a string is the first one in a message:
Or a bit that encodes whether a string is the last one in a message:
The parity bit and first/last string bit methods are intimately related: the first/last string bit approach can be thought of as the mathematical derivative of the parity bit approach.