We have the secret key and the plaintext as input. It seems logical to plug the secret key into the Farfalle instances, but how do we reversibly map the plaintext to a random-looking ciphertext? One idea that comes to mind is to split the plaintext into two pieces, and use one piece to encrypt the other and vice versa, in a reversible way. This approach is known as a Feistel network. We define E1(K, P) as:
To demonstrate, let's use an uncompressed image as the plaintext and visualize the pixels:
It looks promising, but this is not a secure construction. There is a distinguisher in the encipher oracle. We craft P2 by cloning P1 and modifying the right half, and then observe that the sum of the right halves of the plaintexts is equal to the sum of the right halves of the enciphered plaintexts:
There is a very similar distinguisher in the decipher oracle as well, but with the left half:
What is the big deal? The problem is that this is very unlikely to occur with an ideal pseudorandom permutation, and if the adversary is able to distinguish our construction from an ideal pseudorandom permutation, then we have failed our goal.
The key realization regarding the aforementioned distinguishers is that we need to ensure that every bit of output is a pseudorandom function of all of the input bits, and vice versa. One way to do this is to add a couple of additional rounds, but this is excessive. We really just need to communicate the digest of PL over into PR, and the digest of CR over into CL. We define E2(K, P) as:
At this point, we have achieved a full-blown pseudorandom permutation: flipping just a single bit of plaintext causes every bit of the ciphertext to flip with 50% probability, and vice versa. And this process is fully reversible thanks to how the rounds are stacked sequentially. The distinguishers are no longer present:
It is important to note that this is a deterministic mapping: plugging in an identical key and plaintext produces an identical ciphertext. To achieve semantic security, it is desirable to turn this into a probabilistic mapping. We could add a diversifier (e.g., a nonce) to the plaintext, but this would expand the block size of the wide block cipher. We could change the secret key per encipherment, but this is inconvenient and inelegant. It would be much better if we could instead tweak this mapping with a public diversifier such as a nonce. We can simply input the tweak into the Farfalle instances, here denoted as W. We define E3(K, W, P) as:
Now we are able to give every encipherment its own independent mapping by plugging the nonce in as the tweak. If a plaintext is enciphered with different tweaks, the resulting ciphertexts have no similarities. Likewise, if a ciphertext is deciphered with different tweaks, the resulting plaintexts have no similarities. The tweak allows us to achieve semantic security. To demonstrate this, I have enciphered image P1 with an identical key but two different 1-bit tweaks. The neat thing about this is that I can make the tweak public and it does not reduce the security of the scheme. In addition, had I not labeled the tweaks below, the adversary would not be able to figure out which ciphertext was created with which tweak better than a complete guess:
Not only is it possible to build an authenticated encryption scheme on top of a wide block cipher, but it takes very little effort. As shown above, we can take the plaintext and plug it into the block cipher directly, but this does not quite give us integrity and authenticity guarantees.
To validate integrity and authenticity, we need known redundancy that we can check in the decipher oracle. The simplest way to achieve this is to simply append e.g. 128 0-bits onto the end of the plaintext before running it through the wide block cipher. Then, in the decipher oracle, we just need to validate that all 128 0-bits are intact after decipherment. This works because the wide block cipher is non-malleable: any modification of the ciphertext results in every bit of plaintext flipping with 50% probability, and vice versa. This means that the adversary has only a 2-128 chance of all 128 0-bits being intact when tampering with the ciphertext.
This is great, but we have not yet handled metadata. We could of course slam the metadata alongside the plaintext (with proper domain separation), but this expands the block width and is excessive since we do not want to encrypt the metadata. The elegant solution is to plug the metadata into the tweak input. This way, the block width is unaffected and if the metadata is tampered with in any way, the mapping of the block cipher changes and the redundancy is damaged.
Remarkably, building an authenticated encryption mode on top of a tweakable wide block cipher gives us misuse resistance in the strongest sense:
Another benefit of the wide block cipher is that it allows us to take advantage of redundancy that may already be present in the plaintext. If you are only ever enciphering PNG images for example, then the decipher oracle can verify known PNG header bits. Even by just validating the string "PNG" in the header, 24 bits of integrity/authenticity is achieved. In other words, an adversary would have to make approximately 223 attempts to achieve a 50% chance of successful forgery.