
MTU and MSS for IPsec Overhead

A Deep Dive and Byte-for-Byte breakdown of IPsec overhead to aid in calculating MSS Clamping, understanding why it’s needed, and its effects.

What’s the difference between MTU and MSS?

MTU or Maximum Transmission Unit is the largest IP Packet an interface can accept. It applies to the whole IP Packet, headers included. It does not include Layer 2 Protocols/Overhead such as Ethernet, 802.1q VLAN tagging, PPPoE, MPLS, etc. 1500 is the typical MTU.

MSS on the other hand is the maximum TCP Payload. It does not include the TCP or IP Headers. Nor does it include any other protocols such as ESP encapsulation.

The MTU and MSS boundaries are depicted below…

                                        |=======MSS======|
           |======================MTU====================|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethernet | IP Header | TCP/UDP Header | Payload / Data | FCS|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

During a TCP Handshake, the MSS is negotiated using the TCP Option for MSS [1] in each SYN Packet.
TCP Option – Maximum segment size: 1460 bytes
Both the initiator (SYN) and responder (SYN-ACK) send their own MSS and the lowest value is used because even if a recipient may support a larger MSS, the sender can only send payloads as large as it’s configured for.

On a server, the MSS value is configured locally and by default on most Operating Systems is 1460. In most scenarios, this is OK.

IP Header = 20 Bytes (Most Often, but additional header Options can increase this to 60 Bytes)
TCP Header = 20 Bytes (Most Often, but additional header Options can increase this to 60 Bytes)

      20            20                  1460
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IP Header | TCP/UDP Header |     Payload / Data    |
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|====================1500 Byte MTU===================|
                             |=====1460 Byte MSS=====|

If MSS only accounts for the TCP Payload, and we have to ensure the total packet size is equal-to or less-than the smallest MTU in a path, we can typically rely on the TCP/IP headers adding another 40 bytes to a Payload, and assume an MSS of 1460 will be adequate for a 1500 MTU.
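The relationship above is plain subtraction, which the following minimal sketch illustrates. The function names are made up for this example, and the 20 byte defaults assume IP and TCP headers without options.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int, ip_header: int = IPV4_HEADER, tcp_header: int = TCP_HEADER) -> int:
    """Largest TCP payload that still fits in a single packet of the given MTU."""
    return mtu - ip_header - tcp_header


def effective_mss(initiator_mss: int, responder_mss: int) -> int:
    """The lower of the two values exchanged in the SYN and SYN-ACK is the one used."""
    return min(initiator_mss, responder_mss)


print(mss_for_mtu(1500))          # 1460
print(effective_mss(1460, 1380))  # 1380
```

Passing larger header sizes (up to 60 bytes each) shows how header Options eat into the payload.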

What if an interface in between two hosts/servers has an MTU smaller than 1500?

If traffic arrives at an interface/router that is too large for its MTU, and can’t be fragmented, that hop can respond with an ICMP Type-3 Code-4 message [2]. This response includes the supported MTU of the hop where the traffic has stopped, allowing the sending host to adjust the TCP Payload size of its messages.
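As a rough illustration of what the sender reads out of that response, the sketch below pulls the next-hop MTU from the first 8 bytes of an ICMP message. It assumes the RFC 1191 layout (type, code, checksum, 2 unused bytes, then a 16 bit next-hop MTU). The function name and the hand-built sample are hypothetical, not taken from a real capture.

```python
import struct


def parse_frag_needed(icmp: bytes):
    """Return the next-hop MTU from an ICMP Type 3 / Code 4 message, else None."""
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, next_hop_mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type == 3 and code == 4:
        return next_hop_mtu
    return None


# Hand-built sample header: type 3, code 4, checksum left at 0, next-hop MTU 1400
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1400)
print(parse_frag_needed(sample))  # 1400
```

With 20 byte IP and TCP headers, a reported MTU of 1400 would leave room for a 1360 byte TCP Payload.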

This solution depends on ICMP being enabled on the hop dropping the packet and on Path-MTU-Discovery being supported, not to mention the inefficient back-and-forth of packets. In addition, in the circumstance of a Site-to-Site VPN, a Hop will not be able to send this ICMP message to the original sender, because the payload along with the original Source/Destination IP is encrypted. The result is dropped packets, which show up as TCP Timeouts and SYN/ACK Sequence number gaps.

What is MSS Clamping?

MSS Clamping is a solution deployed on a Router/Firewall in a network path that intercepts TCP SYN Packets, and adjusts the MSS Option to a value specified. The reason for this is to accommodate scenarios like or similar to those mentioned above. By controlling the MSS value of all TCP Conversations that traverse a particular network hop, it’s possible to ensure that no fragmentation or drops occur, providing a reliable network path. This behavior occurs bidirectionally, meaning both initiator and receiver will see a manipulated MSS value during the TCP handshake, controlling their maximum packet size for all TCP flows that traverse the hop implementing the MSS Clamp.
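To make the rewrite concrete, here is a minimal sketch of clamping the MSS Option inside the raw TCP options bytes of a SYN. It is not how any particular router implements it: the function name is hypothetical, it only lowers values (never raises them), and it leaves out the TCP checksum update a real device must also perform.

```python
import struct

TCP_OPT_EOL = 0  # end of option list
TCP_OPT_NOP = 1  # single-byte padding option
TCP_OPT_MSS = 2  # MSS option: kind (1), length (1, always 4), value (2)


def clamp_mss_option(options: bytes, clamp: int) -> bytes:
    """Return a copy of the TCP options with any MSS above `clamp` lowered to it."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCP_OPT_EOL:
            break
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, stop parsing
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option, stop parsing
        if kind == TCP_OPT_MSS and length == 4:
            (advertised,) = struct.unpack_from("!H", out, i + 2)
            if advertised > clamp:
                struct.pack_into("!H", out, i + 2, clamp)
        i += length
    return bytes(out)


# Sample options: MSS 1460, NOP, NOP, SACK-permitted (kind 4, length 2)
syn_options = bytes([2, 4]) + struct.pack("!H", 1460) + bytes([1, 1, 4, 2])
clamped = clamp_mss_option(syn_options, 1382)
print(struct.unpack_from("!H", clamped, 2)[0])  # 1382
```

Applying the same function to the SYN and the SYN-ACK is what makes the clamp visible to both ends.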

How to calculate the MSS Clamp for IPsec Tunnels

Finding the MSS for an IPsec tunnel is the process of seeing how large a TCP payload can be, when the following additional overhead is factored:

  • Original TCP Headers
  • Original IP Headers
  • ESP encapsulation
    • ESP Header
      • SPI number
      • Sequence Number
      • Initialization Vector (Presence depends on Encryption Cipher)
    • ESP Trailer
      • Payload Padding
      • Payload Pad value
      • Next Header
    • Integrity Check Value
  • New IP Header (If in Tunnel Mode)
  • New UDP Header (If using NAT-T)

Determining the size of each of the protocol fields above and subtracting from a standard MTU of 1500 will determine how large of a TCP payload the connection can support. In other words, the Maximum Segment Size. Several of the parameters above depend on the IPsec configuration, however, and differ from connection to connection. Others are constant.

Constants that add overhead:

  • 4 Byte ESP SPI (Security Parameters Index) Number: An identity value to let the recipient know which Security Association to use to decrypt the payload.
  • 4 Byte ESP Sequence Number: This value prevents replay attacks, and must be used.
  • 1 Byte ESP Padding Value: This value indicates to the recipient how much padding was appended to the original payload. (Why we might need padding is coming next, see ‘Encryption Cipher’)
  • 1 Byte ESP Next Header: This value indicates the Payload type. The term “Next” header is a bit misleading, as this field describes what the encrypted payload contains. IPv4/TCP for example.

Parameters that variably affect protocol overhead size:

  • Encryption Cipher: Different Ciphers have different Byte Boundaries. AES 128 for example can only encrypt data in 16 Byte blocks. If the payload that needs to be encrypted is not divisible by 16, padding is added until it is. Additionally, different ciphers require different Optional overhead. AES in CBC mode requires the Initialization Vector be included in each packet, an additional 16 Bytes.
  • Integrity Check Value (ICV) / HMAC Cipher: Similar to the Encryption Cipher, the Integrity Cipher has a fixed output size as well. HMAC_SHA1_96 for example, has a 96 bit, or 12 byte size.
0               2               4               6               8
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Security Parameters Index(SPI) |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               IV if needed, 16 Bytes for AES128               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Payload Data* (variable)                  |
|                                                               ~
~                                                               |
|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |     Padding (0-255 bytes) (variable)          |
+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |   Pad Length  |  Next Header  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Integrity Check Value-ICV   (variable)                |
~                                                               ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Additional Considerations:

20 Byte IPv4 Header: When using ESP in Tunnel Mode, not only is the data encrypted but the original Source/Destination IP parameters are as well. A new IP Header is applied with the Source/Destination of the VPN Endpoints.

NAT Traversal: If NAT Traversal is used, ESP is encapsulated in UDP, which has a header size of 8 Bytes that must be accounted for.

How to assemble an ESP Encapsulated Packet.

Determining if padding is needed must be done after the ESP Trailer is applied. Everything that is to be encrypted must fall on a 16 Byte Boundary, and the ESP Trailer is encrypted.
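The padding rule can be sketched in a few lines. This assumes the 2 byte ESP Trailer and the 16 byte AES boundary described above. The function name is hypothetical.

```python
ESP_TRAILER = 2  # Pad Length (1 byte) + Next Header (1 byte)


def esp_padding(inner_packet_len: int, block_size: int = 16) -> int:
    """Padding bytes needed so that inner packet + trailer lands on a block boundary."""
    return -(inner_packet_len + ESP_TRAILER) % block_size


print(esp_padding(1422))  # 0
print(esp_padding(1040))  # 14
```

Note that the trailer is counted before the boundary check, so an inner packet that is already a multiple of 16 still needs 14 bytes of padding.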

Using the following IPsec Tunnel settings and ciphers…

TUNNEL-in-UDP (NAT-T), ESP:AES_CBC-128/HMAC_SHA1_96/MODP_1024

  1. Original IP Packet (Payload) has the ESP Trailer value appended, adding 1 byte for Padding Value and 1 Byte for NextHeader
  2. If the resulting Payload + Trailer Size is not divisible by 16, padding is added
  3. In the case of AES, the initialization vector or IV is prepended to the encrypted Payload; the IV itself is not encrypted.
  4. The ESP Headers, totaling 8 bytes are added
  5. The 12 Byte ICV is calculated and added

The Integrity Check Value has byte boundaries similar to AES: SHA1 can only hash data in 64 Byte blocks, so additional zero’d bytes are appended to the ESP encapsulation when the hash is calculated. Unlike padding, however, this data does not persist and the actual transmitted payload does not increase in size. Additionally, the entire SHA1 Hash (20 Bytes) isn’t included when using SHA1 for ICV, only the first 96 bits, or 12 bytes.
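The truncation is easy to see with Python’s standard hmac module. This is only a sketch of the 96 bit truncation, not of ESP processing as a whole. The key and data below are arbitrary placeholders.

```python
import hashlib
import hmac


def hmac_sha1_96(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA1 truncated to its first 96 bits (12 bytes)."""
    full = hmac.new(key, data, hashlib.sha1).digest()  # full SHA1 output is 20 bytes
    return full[:12]


icv = hmac_sha1_96(b"\x0b" * 20, b"Hi There")
print(len(icv))  # 12
```

Whatever the length of the data being authenticated, the ICV added to the packet stays at 12 bytes.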

What results are the following Integrity Protected (Hashed) and Confidentiality Protected (Encrypted) Boundaries of the ESP payload. Encryption is reliant on the IV, while Integrity is reliant on the ICV.

       0               2               4               6               8
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +----
       |         SPI (4 Bytes)         |    Sequence Number (4 Bytes)  | |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
       |                   IV for AES128 (16 Bytes)                    | |
       |                                                               | |
-----+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ICV
     | |                     Payload Data* (variable)                  | | calcu-
Encr-| |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | lated
ypted| |               |     Padding (0-255 bytes) (variable)          | |
     | +-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
     | |                               |  Pad Length(1)| Next Header(1)| |
-----+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +----
       |         Integrity Check Value-ICV   (12 Bytes)                |
       |                                                               |
       |                                                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Payload Size Examples

TUNNEL-in-UDP (NAT-T), ESP:AES_CBC-128/HMAC_SHA1_96/MODP_1024


Original DATA Payload : 1382

Resulting IP Packet includes IP and TCP Headers (40Bytes) : 1422

ESP Trailer is added (2Bytes) : 1424

Payload is divisible by 16 : 1424 / 16 = 89

No Padding needed : 1424

Add Initialization Vector (16Bytes) : 1440

Add ESP Headers (8Bytes) : 1448

Add ICV hash (12Bytes) : 1460

Add Outer IP Header (20Bytes) : 1480

Add UDP Header (8 Bytes, NAT-Traversal is enabled) : 1488

Final packet size will be a 1488 byte, IPv4 UDP packet.


Original DATA Payload : 1000

Resulting IP Packet includes IP and TCP Headers (40Bytes) : 1040

ESP Trailer is added (2Bytes) : 1042

Payload is not divisible by 16 : 1042 / 16 = 65.125

Determine needed padding : 66*16 = 1056 – 1042 (14 additional bytes)

Padding added : 1056

Add Initialization Vector (16Bytes) : 1072

Add ESP Headers (8Bytes) : 1080

Add ICV hash (12Bytes) : 1092

Add Outer IP Header (20Bytes) : 1112

Add UDP Header (8 Bytes, NAT-Traversal is enabled) : 1120

Final packet size will be a 1120 byte, IPv4 UDP packet.


Original DATA Payload : 1200

Resulting IP Packet includes IP and TCP Headers (40Bytes) : 1240

ESP Trailer is added (2Bytes) : 1242

Payload is not divisible by 16 : 1242 / 16 = 77.625

Determine needed padding : 78*16 = 1248 – 1242 (6 additional bytes)

Padding added : 1248

Add Initialization Vector (16Bytes) : 1264

Add ESP Headers (8Bytes) : 1272

Add ICV hash (12Bytes) : 1284

Add Outer IP Header (20Bytes) : 1304

Add UDP Header (8 Bytes, NAT-Traversal is enabled) : 1312

Final packet size will be a 1312 byte, IPv4 UDP packet.


Original DATA Payload : 536

Resulting IP Packet includes IP and TCP Headers (40Bytes) : 576

ESP Trailer is added (2Bytes) : 578

Payload is not divisible by 16 : 578 / 16 = 36.125

Determine needed padding : 37*16 = 592 – 578 (14 additional bytes)

Padding added : 592

Add Initialization Vector (16Bytes) : 608

Add ESP Headers (8Bytes) : 616

Add ICV hash (12Bytes) : 628

Add Outer IP Header (20Bytes) : 648

Add UDP Header (8 Bytes, NAT-Traversal is enabled) : 656

Final packet size will be a 656 byte, IPv4 UDP packet.
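The four walkthroughs above follow the same steps, so they can be folded into one small calculator. The sketch below hard-codes the tunnel settings used in the examples (AES_CBC-128, HMAC_SHA1_96, Tunnel Mode, NAT-T) as defaults. The function and parameter names are made up for illustration. max_mss simply walks downward until the final packet fits, which is one possible way to derive a clamp value under these assumptions.

```python
ESP_HEADER = 8   # SPI (4 bytes) + Sequence Number (4 bytes)
ESP_TRAILER = 2  # Pad Length (1 byte) + Next Header (1 byte)


def esp_packet_size(tcp_payload: int, *, inner_headers: int = 40, block_size: int = 16,
                    iv_len: int = 16, icv_len: int = 12, outer_ip: int = 20,
                    nat_t_udp: int = 8) -> int:
    """Final on-the-wire IP packet size for a given TCP payload size."""
    to_encrypt = tcp_payload + inner_headers + ESP_TRAILER
    to_encrypt += -to_encrypt % block_size  # pad up to the cipher block boundary
    return to_encrypt + iv_len + ESP_HEADER + icv_len + outer_ip + nat_t_udp


def max_mss(mtu: int = 1500, **tunnel_settings: int) -> int:
    """Largest TCP payload whose encapsulated packet still fits in `mtu`."""
    mss = mtu
    while mss > 0 and esp_packet_size(mss, **tunnel_settings) > mtu:
        mss -= 1
    return mss


for payload in (1382, 1000, 1200, 536):
    print(payload, esp_packet_size(payload))
# 1382 1488
# 1000 1120
# 1200 1312
# 536 656

print(max_mss(1500))  # 1382
```

One byte more than 1382 pushes the encrypted portion onto the next 16 byte boundary and the final packet to 1504 bytes, which is why the first example stops at 1382. Setting nat_t_udp=0 models the same tunnel without NAT Traversal.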


What does Packet-Loss due to MTU/MSS mismatch look like?

This problem typically only exhibits symptoms during large data transfers, causing what may seem like intermittent network path loss. For example, a TCP connection will succeed, a TLS handshake will likely succeed as well. General transfers such as light weight HTTP GET and response traffic will also likely pass without issue. However, when large payloads have to be broken up into multi-packet responses, some or all of the data will be dropped in transit.

From the perspective of a Client:

The Client requesting and receiving data will see a jump in TCP Sequence Numbers after an Application Timeout. (The following is an SFTP transfer, client IP 192.168.20.5, server IP 192.168.10.5)

12:10:53.564 192.168.10.5 192.168.20.5	TCP, Src Port: 22, Dst Port: 34672, Seq: 9479, Ack: 4989, Len: 336
12:10:53.564 192.168.20.5 192.168.10.5	TCP, Src Port: 34672, Dst Port: 22, Seq: 4989, Ack: 9815, Len: 0
12:11:53.559 192.168.20.5 192.168.10.5	TCP, Src Port: 34672, Dst Port: 22, Seq: 4989, Ack: 9815, Len: 68
12:11:53.565 192.168.10.5 192.168.20.5	TCP, Src Port: 22, Dst Port: 34672, Seq: 24940, Ack: 5057, Len: 0

In the above, notice the 60 second interval between the first ACK for Seq# 9815 and the second. This behavior indicates that the Client is encountering a Timeout and reaching back to the Server to indicate what Sequence number it has received so far, again 9815.

The Server then responds with Sequence Number 24940, indicating that 15125 bytes (24940 - 9815), or roughly 15 kilobytes, were transferred. All of which are missing from the Client Pcap above.

From the perspective of a Server:

Most often, splitting a payload into packets (TCP segmentation) is today offloaded to the NIC, so we don’t see the actual packet size that a client/server is sending when performing a pcap on a host. The server in this case might see the following from the same conversation.

12:10:53.551 192.168.10.5 192.168.20.5	TCP, Src Port: 22, Dst Port: 34672, Seq: 9479, Ack: 4989, Len: 336
12:10:53.572 192.168.20.5 192.168.10.5	TCP, Src Port: 34672, Dst Port: 22, Seq: 4989, Ack: 9815, Len: 0
12:10:53.981 192.168.10.5 192.168.20.5	TCP, Src Port: 22, Dst Port: 34672, Seq: 24940, Ack: 4989, Len: 15125
12:11:53.570 192.168.20.5 192.168.10.5	TCP, Src Port: 34672, Dst Port: 22, Seq: 4989, Ack: 9815, Len: 68
12:11:53.553 192.168.10.5 192.168.20.5	TCP, Src Port: 22, Dst Port: 34672, Seq: 24940, Ack: 5057, Len: 0

From the network path:

However, taking a pcap downstream from the server (on a network hop in the path), we can see the resulting fragmentation. The example that follows is not the same transfer, but demonstrates what MTU/MSS caused packet-loss looks like from an intermediate device between the two hosts, after all payload fragmentation at the NIC has completed.

14:30:38.307 192.168.10.5 192.168.20.5 TCP 54 0.043807000 22 → 45278 [ACK] Seq=2498974799 Ack=3159960368 Win=37760 Len=0
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 106 0.059541000 Server: Encrypted packet (len=52)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000108000 Server: Encrypted packet (len=1375)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000049000 Server: Encrypted packet (len=1375)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000004000 Server: Encrypted packet (len=1375)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000005000 Server: Encrypted packet (len=1375)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000005000 Server: Encrypted packet (len=1375)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000002000 Server: Encrypted packet (len=1375)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000003000 Server: Encrypted packet (len=1375)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000002000 Server: Encrypted packet (len=1375)
14:30:38.366 192.168.10.5 192.168.20.5 SSHv2 1429 0.000003000 Server: Encrypted packet (len=1375)
14:30:38.399 192.168.10.5 192.168.20.5 SSHv2 1429 0.032402000 Server: Encrypted packet (len=1375)
14:30:38.427 192.168.20.5 192.168.10.5 TCP 54 0.027887000 45278 → 22 [ACK] Seq=3159960368 Ack=2498974851 Win=262400 Len=0
14:30:38.427 192.168.10.5 192.168.20.5 SSHv2 1429 0.000406000 Server: Encrypted packet (len=1375)
14:30:38.591 192.168.10.5 192.168.20.5 TCP 1429 0.163606000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960368 Win=37760 Len=1375
14:30:39.059 192.168.10.5 192.168.20.5 TCP 1429 0.468082000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960368 Win=37760 Len=1375
14:30:39.955 192.168.10.5 192.168.20.5 TCP 1429 0.896035000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960368 Win=37760 Len=1375
14:30:41.747 192.168.10.5 192.168.20.5 TCP 1429 1.792011000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960368 Win=37760 Len=1375
14:30:45.299 192.168.10.5 192.168.20.5 TCP 1429 3.551950000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960368 Win=37760 Len=1375
14:30:52.467 192.168.10.5 192.168.20.5 TCP 1429 7.168095000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960368 Win=37760 Len=1375
14:31:06.803 192.168.10.5 192.168.20.5 TCP 1429 14.335945000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960368 Win=37760 Len=1375
14:31:36.499 192.168.10.5 192.168.20.5 TCP 1429 29.695978000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960368 Win=37760 Len=1375
14:31:38.372 192.168.20.5 192.168.10.5 SSHv2 122 1.873535000 Client: Encrypted packet (len=68)
14:31:38.373 192.168.10.5 192.168.20.5 TCP 54 0.000623000 22 → 45278 [ACK] Seq=2498989976 Ack=3159960436 Win=37760 Len=0
14:32:33.843 192.168.10.5 192.168.20.5 TCP 1429 55.470098000 [TCP Retransmission] 22 → 45278 [ACK] Seq=2498974851 Ack=3159960436 Win=37760 Len=1375
14:32:38.382 192.168.20.5 192.168.10.5 SSHv2 106 4.539239000 Client: Encrypted packet (len=52)
14:32:38.383 192.168.20.5 192.168.10.5 TCP 54 0.000529000 45278 → 22 [FIN, ACK] Seq=3159960488 Ack=2498974851 Win=262400 Len=0

In the pcap above, notice the same 60 second timeout/retry by the Client Application. Also note that all TCP Payloads with a length of 1375 are not being ACK’d by the client, indicating that an MSS of 1375 is too large for this network path.

How to find MTU/MSS without Path MTU Discovery

I previously mentioned that Path MTU Discovery doesn’t work with IPsec encapsulated payloads due to the underlay network (Public Internet or similar path) not being able to communicate with the hosts within the ESP payload, or even knowing their IPs. While it is possible to calculate the MTU/MSS by using the details above, it’s not always (how about never) practical. Also mentioned was that we can’t account for Layer2 overhead. Lastly, it’s possible for an IPsec Peer to have multiple ECMP routes to its IP, each of which may have different MTUs.

I’ve found that the most reliable way to validate MTU is to not use ping/icmp with length and DF bit set, but rather use hping3. Using hping we can fully simulate a TCP payload of a particular size, circumventing any need to calculate ICMP maximum size Vs. TCP. In summary, using the following command, if the payload is delivered, then the DataLength (-d) value can be used as the MSS Clamp on a router/firewall.

sudo hping 172.17.0.25 -d 1200 -y -S -p 443

hping $targethost -d $TCP_Payload_Size -y (Don’t Frag) -S (TCP-SYN_Packet) -p $Port
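Rather than guessing sizes by hand, the probe can be wrapped in a binary search. The sketch below is only an outline under some assumptions: `probe` is a hypothetical callable you would supply that returns True when a payload of that size was delivered (for example by running the command above and checking for a reply), and the search assumes that every size below a passing size also passes, which may not hold across ECMP paths with different MTUs.

```python
from typing import Callable, List


def hping_command(target: str, size: int, port: int) -> List[str]:
    """Build the command line shown above for a given payload size."""
    return ["sudo", "hping", target, "-d", str(size), "-y", "-S", "-p", str(port)]


def largest_passing_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Largest size in [low, high] for which probe(size) is True, or low - 1 if none."""
    best = low - 1
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            best = mid      # delivered, try something larger
            low = mid + 1
        else:
            high = mid - 1  # dropped, try something smaller
    return best


# Stand-in probe that pretends the path passes payloads up to 1382 bytes
fake_path_limit = 1382
print(largest_passing_size(lambda size: size <= fake_path_limit, 536, 1460))  # 1382
print(" ".join(hping_command("172.17.0.25", 1200, 443)))
# sudo hping 172.17.0.25 -d 1200 -y -S -p 443
```

How a reply is detected is left to the probe, since that depends on the target and on what the path allows back.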

Resources:

AES_CBC-128
https://www.ietf.org/rfc/rfc3602.txt
HMAC_SHA1_96
https://www.ietf.org/rfc/rfc2404.txt
ESP
https://www.ietf.org/rfc/rfc2406.txt

[1] https://datatracker.ietf.org/doc/html/rfc6691
[2] https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml#icmp-parameters-codes-3