A stream cipher generates an arbitrary-length stream of bits as a function of a secret key K and a public nonce N. This stream of bits, known as the keystream, has the property of being computationally indistinguishable from randomness to an adversary who does not know the secret key. To perform the actual encryption, we simply add the plaintext P and the keystream together in GF(2), which is just XOR. To build a stream cipher on top of Farfalle instance F, we plug in nonce N as input and request |P| bits of output, which are then added to P:
To demonstrate, let's encrypt some text:
As shown above, the two ciphertexts appear completely uncorrelated thanks to the nonce providing diversification. It is very important that a nonce is never reused, otherwise the adversary can calculate the sum of the plaintexts simply by summing the ciphertexts together to cancel out the keystream.
A message authentication code (MAC) generates a short authentication tag T as a function of a secret key K and an arbitrary-length message M. The tag T has the property of being computationally indistinguishable from randomness. To build a MAC on top of Farfalle instance F, we plug in message M as input and request t bits of output:
To demonstrate, let's calculate the MAC of P1 and represent the authentication tag in hexadecimal. To make things interesting, let's create a P2 such that it differs from P1 by only a single byte and MAC that as well for comparison:
As demonstrated above, it does not matter how small of a change is made to the input message; the MAC function maps every possible input down to a much smaller space in a random-looking way. Due to the pigeonhole principle, there are an infinite number of inputs that map to each of the 2128 output buckets. How much effort would it take to craft a message that MACs to the same tag as P1? It would take 2127 attempts on average!
In any situation where an adversary can actively tamper with a message, which I would argue is the vast majority of situations, a stream cipher by itself does not provide adequate security. The biggest problem is malleability: flipping a bit of ciphertext at location n flips the corresponding bit of plaintext at location n. The adversary can flip individual bits at any location, and the recipient has no robust and consistent way of detecting that this happened. To demonstrate this, I have crafted ciphertext C1 such that its decrypted plaintext specifies a very different transfer amount, using only Estream(K1, 0, P1):
Stated another way, plain encryption provides confidentiality, but does not provide any integrity or authenticity guarantees; we have no way of determining whether the ciphertext has been tampered with. The solution is to use authenticated encryption, which provides confidentiality, integrity and authenticity guarantees. We have the following inputs:
This can be as easy as combining the stream cipher and MAC together:
The sender transmits the ciphertext and the receiver authenticates and decrypts it. If the ciphertext is not authentic, then it is thrown out and an error is returned. If the ciphertext is authentic, then it is decrypted and used. To demonstrate this, let's encrypt and authenticate P1:
Just like with the MAC example, it would take about 2127 attempts on average to achieve a successful forgery. Why is this? Going back to the plain stream cipher, the space of valid ciphertexts is very dense: every ciphertext is considered valid and will decipher without error. In this authenticated encryption mode, we are effectively making the space of valid ciphertexts very sparse by adding an authentication tag. Because the tag is 128 bits in length, in the space of all possible ciphertexts, approximately only 1 in every 2128 ciphertexts is considered valid!
This mode has a number of things going for it:
However, not all is perfect:
We could work around these issues by breaking up the plaintext into smaller pieces and then encrypting/authenticating each piece by appending a counter to the nonce. This would work fine and is secure, but is there a more robust solution?
We can use the incrementality property of Farfalle to our advantage. Consider the following mode:
At first glance, this may appear to be a totally impractical mode: do we really need to store a growing list of strings in memory? Not at all! Thanks to the incrementality property of Farfalle, strings are compressed into a fixed-size state. You could exchange a single message or a million messages and the state size remains constant. This stateful mode allows us to exchange a sequence of messages, and produce an intermediate authentication tag after each message. Thanks to the beauty of incrementality, each authentication tag authenticates not only the most recent message, but all messages up to that point in the session. This means that if any messages are dropped, duplicated, reordered or tampered with in any way, the authentication tag will not match and we can terminate the session immediately. Not only that, but the state would become corrupt as well, which scrambles all subsequent messages.
This mode solves the problems with the previous mode, and actually introduces an additional feature: a startup tag is generated at session initialization, which authenticates the nonce used to initialize the session. If an adversary tries establishing a session with a server, the tag will inevitably not match, and the server can drop the connection before any data is processed. This saves valuable CPU time.
Farfalle provides a powerful design tool. For example, consider that most applications need only process metadata at the beginning of a session. We can use this observation to simplify the mode:
Now, the metadata and nonce are combined into a single string provided at session startup. The metadata is compressed into the state, which diversifies it and also produces the startup tag. To demonstrate, let's initialize a session and wrap an identical plaintext repeatedly:
One of the nice things about a stateful session mode is that inputting a nonce once at the beginning of the session diversifies the entire session. In addition, note how each ciphertext is different despite wrapping an identical plaintext repeatedly. This is thanks to the fact that strings are incrementally accumulated into the state. This means that each authentication tag authenticates everything absorbed up to that point. It is also possible to allow for optional metadata or optional plaintext throughout a session by using frame bits; this is left as an exercise to the reader. Hopefully it is clear by now why I claimed Farfalle provides the ideal level of abstraction.