The basic idea behind a VPN is to intercept IP packets, encipher/decipher them, and then pass them along. The VPN client and VPN server both listen for activity from two sources: an Internet-facing socket and a virtual network interface TUN. The TUN device is assigned an IP address and a route is configured to direct traffic to it. For example, when browsing the web on the client, the IP packets from the browser are routed to the TUN device. The VPN client reads IP packets from TUN, enciphers them, then sends them to the VPN server via a regular UDP or TCP socket over the Internet. The VPN server receives the enciphered IP packets over the socket, deciphers them, then writes them to its TUN device. The opposite direction is similar: the VPN server reads IP packets from TUN, enciphers them, then sends them to the VPN client. The VPN client receives the enciphered IP packets, deciphers them, then writes them to its TUN device.
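The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `relay_once`, `encipher` and `decipher` are hypothetical names, the XOR "cipher" is a placeholder with no security value, and the TUN device is represented by an already-open file descriptor (opening and configuring a real TUN device is platform-specific and is left out).

```python
import os
import select
import socket

MAX_PACKET = 65535  # upper bound for a single IP packet or UDP datagram

def encipher(packet: bytes) -> bytes:
    # Placeholder only: XOR with a constant is NOT encryption.
    return bytes(b ^ 0x5A for b in packet)

# XOR with a constant is its own inverse, so the placeholder works both ways.
decipher = encipher

def relay_once(tun_fd: int, sock: socket.socket, timeout: float = 1.0) -> list:
    """Wait for activity on the TUN descriptor or the socket, then forward.

    Assumes tun_fd yields one whole IP packet per read and that sock is a
    connected datagram socket. Returns the directions that were serviced.
    """
    ready, _, _ = select.select([tun_fd, sock], [], [], timeout)
    serviced = []
    for source in ready:
        if source is sock:
            # Enciphered packet from the peer: decipher, hand to the TUN device.
            datagram = sock.recv(MAX_PACKET)
            os.write(tun_fd, decipher(datagram))
            serviced.append("net->tun")
        else:
            # Plain IP packet from the local stack: encipher, send to the peer.
            packet = os.read(tun_fd, MAX_PACKET)
            sock.send(encipher(packet))
            serviced.append("tun->net")
    return serviced
```

Both the client and the server would run the same loop; only the address of the peer differs.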
In the latest revision of my VPN software, I have eliminated the handshake entirely to achieve total statelessness. This allows for more robust roaming and simplifies the protocol. VPNCrypt uses a custom mode on top of Xoofff that resembles SIV. In a nutshell, the mode is efficient and has a robustness feature: if packet metadata is accidentally reused, the packet plaintext is not immediately leaked. The authentication tag is 128 bits long, which means identical metadata would need to be reused about 2^64 times for there to be a 50% chance of keystream reuse.
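To make the robustness property concrete, here is a minimal sketch of a generic SIV-style construction. It is not the actual VPNCrypt mode: Xoofff is not available in the Python standard library, so keyed BLAKE2b and SHAKE256 are used purely as stand-ins, and the names `seal` and `open_sealed` as well as the exact way the inputs are combined are my assumptions. The point is only the structure: the tag depends on the metadata and the plaintext, and the keystream depends on the metadata and the tag, so two different plaintexts under the same metadata still get different keystreams.

```python
import hashlib
import hmac

TAG_LEN = 16  # 128-bit authentication tag

def _tag(key: bytes, metadata: bytes, plaintext: bytes) -> bytes:
    # Stand-in PRF (keyed BLAKE2b). The metadata is length-prefixed so the
    # boundary between metadata and plaintext is unambiguous.
    h = hashlib.blake2b(key=key, digest_size=TAG_LEN, person=b"tag")
    h.update(len(metadata).to_bytes(4, "big") + metadata + plaintext)
    return h.digest()

def _keystream(key: bytes, metadata: bytes, tag: bytes, n: int) -> bytes:
    # Stand-in stream generator (SHAKE256 over key, metadata and tag).
    h = hashlib.shake_256()
    h.update(b"stream" + key + len(metadata).to_bytes(4, "big") + metadata + tag)
    return h.digest(n)

def seal(key: bytes, metadata: bytes, plaintext: bytes) -> bytes:
    tag = _tag(key, metadata, plaintext)
    stream = _keystream(key, metadata, tag, len(plaintext))
    return tag + bytes(p ^ s for p, s in zip(plaintext, stream))

def open_sealed(key: bytes, metadata: bytes, sealed: bytes):
    # Returns the plaintext, or None if authentication fails.
    if len(sealed) < TAG_LEN:
        return None
    tag, body = sealed[:TAG_LEN], sealed[TAG_LEN:]
    stream = _keystream(key, metadata, tag, len(body))
    plaintext = bytes(c ^ s for c, s in zip(body, stream))
    if hmac.compare_digest(_tag(key, metadata, plaintext), tag):
        return plaintext
    return None
```

In this sketch the keystream repeats under reused metadata only when two tags collide, which is where a birthday-style bound on a 128-bit tag comes from.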
The client and server communicate via UDP packets. UDP is lightweight, stateless, unordered and potentially unreliable. Therefore, both the client and server must be able to exchange packets in a stateless way. Each packet is enciphered/deciphered using a SIV-like mode with the packet type, client ID, server ID, timestamp and counter as metadata. If the packet's timestamp is outside of a +/- 128 second window relative to the receiver's clock, then the packet is dropped. To further protect against replay attacks, the newest packet's timestamp is cached at the receiver's end and any packet more than 5 seconds older than the newest packet is dropped. The 32-bit counter is initialized to a random value at startup and is incremented once per packet to ensure semantic security.
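The two timestamp checks can be expressed as a small filter. The sketch below is my own illustration, not code from the VPN: the class name `ReplayFilter`, the use of whole seconds, and passing the receiver's clock in as `now` are all assumptions. It also assumes the packet has already been authenticated before its timestamp is trusted.

```python
CLOCK_WINDOW = 128  # seconds of allowed skew relative to the receiver's clock
MAX_LAG = 5         # seconds a packet may trail the newest accepted packet

class ReplayFilter:
    def __init__(self):
        self.newest = None  # timestamp of the newest accepted packet

    def accept(self, timestamp: int, now: int) -> bool:
        # Check 1: drop packets outside the +/- 128 second window.
        if abs(timestamp - now) > CLOCK_WINDOW:
            return False
        # Check 2: drop packets more than 5 seconds older than the newest one.
        if self.newest is not None and self.newest - timestamp > MAX_LAG:
            return False
        # Remember the newest timestamp seen so far.
        if self.newest is None or timestamp > self.newest:
            self.newest = timestamp
        return True
```

Because UDP may reorder packets, a packet that is slightly older than the newest one is still accepted; only packets that lag by more than 5 seconds are rejected.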
At first sight, it may seem that using a protocol in which packets can be lost is a bad thing. But packet loss is simply a consequence of UDP being a very lightweight layer on top of IP: it inherits the properties of raw IP packets. A lost UDP packet does not cause any sort of malfunction: if a client application is performing TCP communications over the UDP-based VPN and a UDP packet is lost, then the TCP protocol uses its built-in mechanisms to retransmit the lost data. If a client application is performing UDP communications over the UDP-based VPN and a UDP packet is lost, then it is no different than if not using a VPN at all. It is for this reason that UDP is desirable for VPN communications: it is the closest you can get to raw IP packets while still being accessible from user space.