We have the secret key and the plaintext as input. It seems logical to plug the secret key into the Deck function, but how do we reversibly map the plaintext to a random-looking cryptogram? One idea that comes to mind is to split the plaintext into two pieces, and use one piece to encrypt the other and vice versa, in a reversible way. This approach is known as a Feistel network. We define E1(K, P) as:
To demonstrate, let's use an uncompressed image as the plaintext and visualize the pixels:
It looks promising, but is it secure? It turns out this is not a secure construction. There is a distinguisher in the encipher oracle. We craft P2 by cloning P1 and modifying the left half, and then observe that the sum of the left halves of the plaintexts is equal to the sum of the left halves of the enciphered plaintexts:
There is a very similar distinguisher in the decipher oracle as well, but with the right half:
What is the big deal? The problem is that this is very unlikely to occur with an ideal pseudorandom permutation, and if the adversary is able to distinguish our construction from an ideal pseudorandom permutation, then we have failed our goal.
The key realization about the aforementioned distinguishers is that we need to prevent direct access to PL from the encipher direction and CR from the decipher direction. One way to do this is to add a couple of additional rounds, but this is excessive. We really just need to communicate the digest of PL over into PR, and the digest of CR over into CL. Think of this digest like a MAC, but with less stringent requirements: this digest is in essence encrypted by a future round, and so the adversary can never observe it directly. As a result, we just need a differentially uniform function, here denoted as H1 and H4. We define E2(K, P) as:
At this point, we have achieved a full-blown pseudorandom permutation: flipping just a single bit of plaintext causes every bit of cryptogram to flip with 50% probability, and vice versa. And this process is fully reversible thanks to how the rounds are stacked sequentially. The distinguishers are no longer present:
It is important to note that this is a deterministic mapping: plugging in an identical key and plaintext produces an identical cryptogram. To achieve semantic security, it is desirable to turn this into a probabilistic mapping. We could add some diversifier (e.g., a nonce) to the plaintext, but this would expand the block size of the wide block cipher. We could change the secret key per encipherment, but this is inconvenient and inelegant. It would be much better if we could instead tweak this mapping with a public diversifier such as a nonce. Since the inner two rounds are full-blown pseudorandom functions that touch every bit, we simply need to inject the tweak there, here denoted as W. We define E3(K, W, P) as:
Now we are able to give every encipherment its own independent mapping by plugging the nonce in as the tweak. If a plaintext is enciphered with different tweaks, the resulting cryptograms have no similarities. Likewise, if a cryptogram is deciphered with different tweaks, the resulting plaintexts have no similarities. The tweak allows us to achieve semantic security. To demonstrate this, I have enciphered image P1 with an identical key but two different 1-bit tweaks. The neat thing about this is that I can make the tweak public and it does not reduce the security of the scheme. In addition, had I not labeled the tweaks below, the adversary would not be able to figure out which cryptogram was created with which tweak better than a complete guess:
Is it possible to build an authenticated encryption scheme on top of the wide block cipher? Not only is it possible, but it takes very little effort. As shown above, we can take the plaintext and plug it into the block cipher directly... but is this really giving us integrity and authenticity guarantees? Not quite.
To validate integrity and authenticity, we need known redundancy that we can check in the decipher oracle. The simplest way to achieve this is to simply append e.g. 128 0-bits onto the end of the plaintext before running it through the wide block cipher. Then, in the decipher oracle, we just need to validate that all 128 0-bits are intact after decipherment. This works because the wide block cipher is non-malleable: any modification of the cryptogram results in every bit of plaintext flipping with 50% probability, and vice versa. This means that the adversary has only a 1 in 2128 chance of all 128 0-bits being intact when tampering with the cryptogram.
This is great, but we have not yet handled metadata. We could of course slam the metadata alongside the plaintext (with proper domain separation), but this expands the block width and is excessive since we do not want to encrypt the metadata. The elegant solution is to plug the metadata into the tweak input. This way, the block width is unaffected and if the metadata is tampered with in any way, the mapping of the block cipher changes and the redundancy is damaged.
Remarkably, building an authenticated encryption mode on top of a tweakable wide block cipher gives us misuse resistance in the strongest sense:
Another benefit of the wide block cipher is that it allows us to take advantage of redundancy that may already be present in the plaintext. If you are only ever enciphering PNG images for example, then the decipher oracle can verify known PNG header bits. Even by just validating the string "PNG" in the header, 24 bits of integrity/authenticity is achieved. In other words, an adversary would have to make approximately 223 attempts to achieve a 50% chance of successful forgery.