This topic covers one of the most common issues seen with communications between your cloud network and on-premises network: a hanging connection, even though you can ping hosts across the connection.
Summary of Problem and Solutions
Symptom: Your virtual cloud network (VCN) is connected to your existing on-premises network via an IPSec VPN, or Oracle Cloud Infrastructure FastConnect. Hosts on one side of the connection can ping hosts on the other side, but the connection hangs. For example:
- You can SSH to a host across the connection, but after you log in to the host, the connection hangs.
- You can start a Virtual Network Computing (VNC) connection, but the session hangs.
- You can start an SFTP download, but the download hangs.
General problem: Path Maximum Transmission Unit Discovery (PMTUD) is probably not working on one or both sides of the connection. It must be working on both sides of the connection so that both sides can know if they're trying to send packets that are too large for the connection and adjust accordingly. For a brief overview of Maximum Transmission Unit (MTU) and PMTUD, see Overview of MTU and Overview of PMTUD.
Solutions for fixing PMTUD:
- Ensure that your hosts are configured to use PMTUD: If the hosts in your on-premises network don't use PMTUD (that is, if they don't set the Don't Fragment flag in the packets), they have no way to discover if they're sending packets that are too large for the connection. Your instances on the Oracle side of the connection use PMTUD by default. Do not change that configuration on the instances.
- Ensure both the VCN security lists and the instance firewalls allow ICMP type 3 code 4 messages: When PMTUD is in use, the sending hosts receive a special ICMP message if they send packets that are too large for the connection. Upon receipt of the message, the host can dynamically update the size of the packets to fit the connection. However, your instances can receive these important ICMP messages only if both the security lists for the subnet in the VCN and the instance firewalls are configured to accept them.
If you're using stateful security list rules (for TCP, UDP, or ICMP traffic), you don't need to ensure that your security list has an explicit rule to allow ICMP type 3 code 4 messages because the Networking service tracks the connections and automatically allows those messages. Stateless rules require an explicit ingress security list rule for ICMP type 3 code 4 messages. Confirm that the instance firewalls are set up correctly.
To check to see if a host is receiving the messages, see Finding Where PMTUD Is Broken.
- Ensure that your router honors the Don't Fragment flag: If the router doesn't honor the flag and thus ignores the use of PMTUD, it sends fragmented packets to the instances in the VCN, which is bad (see Why Avoid Fragmentation?). The VCN's security lists are most likely configured in such a way that they recognize only the initial fragment, and the remaining ones are dropped, causing the connection to hang. Instead, your router should use PMTUD and honor the Don't Fragment flag to determine the correct size of unfragmented packets to send through the connection.
The parts of the solution are numbered and called out in red italics in the following diagram. It shows an example scenario with your on-premises network connected to your VCN over an IPSec VPN.
Keep reading for a brief overview of MTU and PMTUD, and how to check if PMTUD is working on both sides of the network connection.
Why Avoid Fragmentation?
You may be wondering why you want to avoid fragmentation. First, it adversely affects the performance of your application. Fragmentation requires reassembly of the fragments and retransmission if fragments are lost. Reassembly and retransmission require time and CPU resources.
Second, only the first fragment contains the source and destination port information. This means that firewalls or your VCN's security lists will probably drop the other packets, because they are typically configured to evaluate the port information. For fragmentation to work with your firewalls and security lists, you would have to configure them to be more permissive than usual, which is not desirable.
Overview of MTU
The communications between any two hosts across an Internet Protocol (IP) network use packets. Each packet has a source and destination IP address and a payload of data. Every network segment between the two hosts has a Maximum Transmission Unit (MTU) that represents the number of bytes that a single packet can carry.
The standard internet MTU size is 1500 bytes. This is also true for most home networks and many corporate networks (and their Wi-Fi networks). Some data centers, including those for Oracle Cloud Infrastructure, can have a larger MTU. The Compute instances use an MTU of 9000 by default. On a Linux host, you can use the ifconfig command to display the MTU of the host's network connection. For example, here's the ifconfig output from an Ubuntu instance (the MTU appears in the last line):
ifconfig
ens3      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          inet addr:10.0.6.9  Bcast:10.0.6.31  Mask:255.255.255.224
          inet6 addr: 2001:db8::/32 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
For comparison, here's the output from a machine connected to a corporate network:
ifconfig
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
Notice that its MTU is the more typical 1500 bytes.
If the host is connected through a corporate VPN, the MTU is even smaller, because the VPN tunnel must encapsulate the traffic inside an IPSec packet and send it across the local network. For example:
ifconfig
utun0: flags=81d1<UP,POINTOPOINT,RUNNING,NOARP,PROMISC,MULTICAST> mtu 1300
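On many current Linux distributions the ifconfig command is not installed by default, and the ip command reports the same value. The following is a minimal sketch, not part of the original topic, of pulling the MTU for one interface out of `ip -o link show` output. The function name `mtu_of` is hypothetical, and the sketch assumes the usual one-line-per-interface layout in which the interface name is the second field.

```shell
# Hypothetical helper: print the MTU of one interface from
# `ip -o link show` style text supplied on stdin.
# Usage (on a Linux host): ip -o link show | mtu_of ens3
mtu_of() {
  awk -v ifc="$1" '
    # Field 2 is the interface name followed by a colon, for example "ens3:".
    $2 == ifc ":" {
      # The MTU value is the field right after the literal word "mtu".
      for (i = 3; i < NF; i++) {
        if ($i == "mtu") { print $(i + 1); exit }
      }
    }'
}
```

On Linux, the same number can usually also be read directly from /sys/class/net/<interface>/mtu.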
How do the two hosts figure out how large a packet they can send to each other? For many types of network traffic, such as HTTP, SSH, and FTP, the hosts use TCP to establish new connections. During the initial three-way handshake between two hosts, they each send the Maximum Segment Size (MSS) for how large their payload can be. This is smaller than the MTU. (TCP runs inside the Internet Protocol (IP), which is why it's referred to as TCP/IP. Segments are to TCP what packets are to IP.)
Using the tcpdump application, you can see the MSS value shared during the handshake. Here's an example from tcpdump (the MSS is the mss value in the options list):
12:11:58.846890 IP 192.168.0.25.22 > 10.197.176.19.58824: Flags [S.], seq 2799552952, ack 2580095593, win 26844, options [mss 1260,sackOK,TS val 44858491 ecr 1321638674,nop,wscale 7], length 0
The preceding packet is from an SSH connection to an instance from a laptop connected to a corporate VPN. The local network the laptop uses for its internet connection has an MTU of 1500 bytes. The VPN tunnel enforces an MTU of 1300 bytes. Then when the SSH connection is attempted, TCP (running inside the IP connection) tells the Oracle Cloud Infrastructure instance that it supports TCP segments that are less than or equal to 1260 bytes. With a corporate VPN connection, the laptop connected to the VPN typically has the smallest MTU and MSS compared to anything it's communicating with across the internet.
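The relationship between the 1300-byte tunnel MTU and the 1260-byte MSS above is plain subtraction. As a small sketch (the function name `mss_for_mtu` is hypothetical, and it assumes IPv4 with a 20-byte IP header and a 20-byte TCP header, with no IP options):

```shell
# Hypothetical helper: the MSS a host would typically advertise for a given
# MTU over IPv4. Assumes a 20-byte IPv4 header and a 20-byte TCP header.
mss_for_mtu() {
  echo $(( $1 - 20 - 20 ))
}

# mss_for_mtu 1300   prints 1260 (the VPN-connected laptop above)
# mss_for_mtu 1500   prints 1460 (a typical internet link)
# mss_for_mtu 9000   prints 8960 (a host with a 9000-byte MTU)
```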
A more complex case is when the two hosts have a larger MTU than some network link between them that is not directly connected to either of them. The following diagram illustrates an example.
The example shows two servers, each directly connected to its own routed network that supports a 9000-byte MTU. The servers are in different data centers. Each data center is connected to the internet, which supports a 1500-byte MTU. An IPSec VPN tunnel connects the two data centers. That tunnel crosses the internet, so the inside of the tunnel has a smaller MTU than the internet. In this diagram, the MTU is 1380 bytes.
If the two servers try to communicate (with SSH, for example), during the three-way handshake, they agree on an MSS around 8960. The initial SSH connection might succeed, because the maximum packet sizes during the initial SSH connection setup are usually less than 1380 bytes. When one side tries to send a packet larger than the smallest MTU along the path between the two endpoints, Path MTU Discovery (PMTUD) becomes critical.
Overview of PMTUD
Path MTU Discovery is defined in RFC 1191. It works by requiring the two communicating hosts to set a Don't Fragment flag in the packets they each send. If a packet from one of these hosts reaches a router where the egress (or outbound) interface has an MTU smaller than the packet length, the router drops that packet. The router also returns an ICMP type 3 code 4 message to the host. This message specifically says "Destination Unreachable, Fragmentation Needed and Don't Fragment Was Set" (defined in RFC 792). Effectively the router tells the host: "You told me not to fragment packets that are too large, and this one's too large. I'm not sending it." The router also tells the host the maximum packet size allowed through that egress interface. The sending host then adjusts the size of its outbound packets so they're no larger than the value the router provided in the message.
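The drop-and-report cycle described above can be illustrated with a toy simulation. This is only a sketch of the idea, not how any operating system implements PMTUD: the function name `simulate_pmtud` is hypothetical, the hop MTUs are supplied by hand, and every ICMP message is assumed to reach the sender.

```shell
# Hypothetical simulation: simulate_pmtud PACKET_SIZE HOP_MTU...
# Walks the hops in order. If the packet (Don't Fragment set) is larger than
# a hop's MTU, that hop "drops" it and reports its MTU; the sender shrinks
# the packet to that MTU and retries from the first hop.
# Prints the packet size that finally fits the whole path.
simulate_pmtud() {
  size=$1
  shift
  retry=1
  while [ "$retry" -eq 1 ]; do
    retry=0
    for mtu in "$@"; do
      if [ "$size" -gt "$mtu" ]; then
        # Stand-in for the ICMP type 3 code 4 message (sent to stderr).
        echo "Frag needed and DF set (mtu = $mtu), dropped $size-byte packet" >&2
        size=$mtu
        retry=1
        break
      fi
    done
  done
  echo "$size"
}
```

For example, `simulate_pmtud 9000 9000 1500 1380 1500 9000` models a path like the one in the preceding two-data-center example: the sender is told first about the 1500-byte link, then about the 1380-byte tunnel, and settles on 1380. If either message were blocked, the real sender would keep retransmitting the oversized packet, which is the hang this topic describes.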
Here's an example that shows the results when an instance tries to ping a host (203.0.113.2) over the internet with an 8000-byte packet and the Don't Fragment flag set (that is, with PMTUD in use). The returned ICMP message is the last line of the output:
ping 203.0.113.2 -M do -s 8000
PING 203.0.113.2 (203.0.113.2) 8000(8028) bytes of data.
From 10.0.0.2 icmp_seq=1 Frag needed and DF set (mtu = 1500)
The response is exactly what's expected. The destination host is across the internet, which has an MTU of 1500 bytes. Even though the sending host's local network connection has an MTU of 9000 bytes, the host can't reach the destination host with the 8000-byte packet and gets an ICMP message accordingly. PMTUD is working correctly.
For comparison, here's the same ping, but the destination host is across an IPSec VPN tunnel:
ping 192.168.6.130 -M do -s 8000
PING 192.168.6.130 (192.168.6.130) 8000(8028) bytes of data.
From 192.0.2.2 icmp_seq=1 Frag needed and DF set (mtu = 1358)
Here the VPN router sees that to send this packet to its destination, the outbound interface is a VPN tunnel. That tunnel goes across the internet, so the tunnel must fit inside the internet's 1500-byte MTU link. The result is that the inside of the tunnel only allows packets up to 1360 bytes (which the router then lowered to 1358, which can make things more confusing).
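The "8000(8028)" in the ping output reflects the headers that ping adds to the payload given with -s. A short sketch of that arithmetic follows (the function names are hypothetical, and the numbers assume IPv4 with a 20-byte IP header and an 8-byte ICMP header):

```shell
# Hypothetical helpers for sizing ping tests over IPv4.
# Assumes a 20-byte IPv4 header (no options) plus an 8-byte ICMP header.

# Total IP packet size produced by `ping -s PAYLOAD`.
packet_size_for_payload() {
  echo $(( $1 + 20 + 8 ))
}

# Largest -s value that still fits in a link with the given MTU.
ping_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

# packet_size_for_payload 8000   prints 8028
# ping_payload_for_mtu 1500      prints 1472
# ping_payload_for_mtu 1358      prints 1330
```

Under those assumptions, a ping with -M do and -s 1330 would be expected to pass through a tunnel that reports an MTU of 1358, while -s 1331 would be expected to trigger the "Frag needed and DF set" message.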
Finding Where PMTUD Is Broken
If PMTUD isn't working somewhere along the connection, you need to figure out why and where. Typically it's because the ICMP type 3 code 4 packet (from the router with the constrained link that can't fit the packet) never gets back to the sending host. This can happen if there's something blocking that kind of traffic between the host and the router. And it can happen on either side of the VPN tunnel (or other constrained MTU link).
Try Pinging From Each Side of the Connection
To troubleshoot the broken PMTUD, you must determine if PMTUD is working on each side of the connection. In this scenario, let's assume the connection is an IPSec VPN.
How to ping: As in Overview of PMTUD, ping a host on the other side of the connection with a packet that you know is too large to fit through the VPN tunnel (for example, 1500 bytes or larger). Depending on which operating system the sending host uses, you might need to format the ping command slightly differently to ensure the Don't Fragment flag is set. For both Ubuntu and Oracle Linux, you use the -M flag with the ping command.
Here's information about the -M flag:
-M pmtudisc_opt Select Path MTU Discovery strategy. pmtudisc_option may be either do (prohibit fragmentation, even local one), want (do PMTU discovery, fragment locally when packet size is large), or dont (do not set DF flag).
Here's an example ping (the -M flag is in the command, and the resulting ICMP message is the last line of the output):
ping -M do -s 1500 192.168.6.130
PING 192.168.6.130 (192.168.6.130) 1500(1528) bytes of data.
From 203.0.113.13 icmp_seq=1 Frag needed and DF set (mtu = 1358)
If the result includes the line "From x.x.x.x icmp_seq=1 Frag needed and DF set (mtu = xxxx)", then PMTUD is working on that side of the tunnel. Note that the source address of the ICMP message is the public IP address of the tunnel that the traffic is trying to leave through (for example, 203.0.113.13 in the preceding Ubuntu example).
Also, ping from the other side of the connection to confirm PMTUD is working from that side. Both sides of the connection must recognize that there is a tunnel between them that can't fit the large packets.
If you're sending the ping from a host in your on-premises network, and the ping succeeds, that probably means your edge router is not honoring the Don't Fragment flag. Instead the router is fragmenting the large packet. The first fragment reaches the destination host, so the ping succeeds, which is misleading. If you try to do more than just ping, the fragments after the first get dropped, and the connection will hang.
Verify that your router configuration honors the Don't Fragment flag. The router's default configuration is to honor it, but someone might have changed the default.
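The possible outcomes of the oversized ping can be told apart mechanically. The following is a rough sketch that reads ping output on stdin and labels it. The function name `classify_ping_output` and the three labels are hypothetical, and the patterns assume the Linux ping output format shown in this topic; other ping implementations word their messages differently.

```shell
# Hypothetical helper: classify the output of an oversized `ping -M do` test.
# Prints one of:
#   pmtud-ok MTU    the "Frag needed and DF set" ICMP message came back
#   reply-received  the oversized ping was answered, so a device on the path
#                   probably fragmented it instead of honoring Don't Fragment
#   no-answer       neither a reply nor the ICMP message was seen
classify_ping_output() {
  awk '
    /Frag needed and DF set/ {
      mtu = "unknown"
      # "mtu = " is 6 characters; keep only the digits that follow it.
      if (match($0, /mtu = [0-9]+/)) mtu = substr($0, RSTART + 6, RLENGTH - 6)
      print "pmtud-ok " mtu
      found = 1
      exit
    }
    /bytes from/ { reply = 1 }
    END {
      if (found) exit
      print (reply ? "reply-received" : "no-answer")
    }'
}
```

A possible invocation is `ping -M do -c 3 -s 1500 192.168.6.130 2>&1 | classify_ping_output`. A "no-answer" result is ambiguous on its own: it can mean the ICMP message was dropped on the way back, or simply that the destination is unreachable.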
When testing from the VCN side of the connection, if you don't see the ICMP message in the response, there is probably something dropping the ICMP packet before it reaches your instance.
There could be two issues:
- Security list: The Networking security list could be missing an ingress rule that allows ICMP type 3 code 4 messages to reach the instance. This is an issue only if you're using stateless security list rules. If you're using stateful rules, your connections are tracked and the ICMP message is automatically allowed without needing a specific security list rule to allow it. If you're using stateless rules, ensure that the subnet the instance is in has a security list with an ingress rule that allows ICMP traffic type 3 code 4 from source 0.0.0.0/0 and any source port. For more information, see Security Lists, and specifically To update rules in an existing security list.
- Instance firewall: The instance's firewall rules (set in the OS) could be missing a rule that allows ICMP type 3 code 4 messages to reach the instance. Specifically for a Linux instance, ensure that iptables or firewalld is configured to allow the ICMP type 3 code 4 messages.
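For the instance firewall case, the following sketch shows what an iptables rule for these messages might look like. It is an illustration under assumptions, not a recommended configuration: the function name `icmp_frag_needed_rule` is hypothetical, the rule is inserted at the top of the chain (INPUT by default), and the function only prints the command so that it can be reviewed before anything is changed. In iptables, fragmentation-needed is the name for ICMP type 3 code 4.

```shell
# Hypothetical helper: print (do not run) an iptables command that accepts
# inbound ICMP type 3 code 4 ("fragmentation-needed") messages.
# Usage: icmp_frag_needed_rule [CHAIN]   (CHAIN defaults to INPUT)
icmp_frag_needed_rule() {
  chain=${1:-INPUT}
  echo "iptables -I $chain -p icmp --icmp-type fragmentation-needed -j ACCEPT"
}
```

After reviewing the printed command, it could be run as root, and `iptables -L INPUT -n -v` lists the chain to confirm the rule is present. Rules added this way generally do not survive a reboot unless they are saved with the distribution's own mechanism, and hosts managed by firewalld should be changed through firewalld instead.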
Avoiding the Need for PMTUD
Oracle recommends using PMTUD. However, in some situations it's possible to configure servers so they don't need to rely on it. Consider the case of the instances in your VCN communicating across an IPSec VPN to hosts in your on-premises network. You know the range of IP addresses for your on-premises network. You can add a special route to your instances that specifies the maximum MTU to use when communicating with hosts in that address range. The instance-to-instance communication within the VCN still uses an MTU of 9000 bytes.
The following information shows how to set that route on a Linux instance.
The default route table on the instance typically has two routes: the default route (for the default gateway), and a local route (for the local subnet). For example:
ip route show
default via 10.0.6.1 dev ens3
10.0.6.0/27 dev ens3 proto kernel scope link src 10.0.6.9
You can add another route that points to the same default gateway, but with the address range of the on-premises network and a smaller MTU. For example, in the following command, the on-premises network is 1.0.0.0/8, the default gateway is 10.0.6.1, and the maximum MTU size is 1300 for packets being sent to the on-premises network.
ip route add 1.0.0.0/8 via 10.0.6.1 mtu 1300
The updated route table looks like this:
ip route show
default via 10.0.6.1 dev ens3
1.0.0.0/8 via 10.0.6.1 dev ens3 mtu 1300
10.0.6.0/27 dev ens3 proto kernel scope link src 10.0.6.9
Within the VCN, the instance-to-instance communication continues to use an MTU of 9000 bytes. However, communication to the on-premises network uses a maximum MTU of 1300 bytes. This example assumes there's no part of the connection between the on-premises network and VCN that uses an MTU smaller than 1300.
The preceding commands do not persist if you reboot the instance. You can make the route permanent by adding it to a configuration file in the OS. Oracle Linux, for example, uses an interface-specific file called /etc/sysconfig/network-scripts/route-<interface>. For more information, see the documentation for your variant of Linux.
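As a rough sketch of how such a route entry could be generated consistently for both the ip route add command and a persistent route file, the hypothetical helper below builds the argument string and rejects obviously invalid MTU values. The function name `mtu_route_line` and the 172.16.0.0/12 range are placeholders, and the assumption that the route file accepts the same "network via gateway" arguments as ip route add should be checked against the documentation for your distribution.

```shell
# Hypothetical helper: mtu_route_line CIDR GATEWAY MTU
# Prints "CIDR via GATEWAY mtu MTU", or fails if MTU is not a number in the
# range 68 (the minimum every IPv4 link must carry) through 9000 (the
# instance default mentioned earlier in this topic).
mtu_route_line() {
  case "$3" in
    ''|*[!0-9]*) echo "MTU must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$3" -lt 68 ] || [ "$3" -gt 9000 ]; then
    echo "MTU $3 is outside the range 68-9000" >&2
    return 1
  fi
  echo "$1 via $2 mtu $3"
}

# Example (placeholder network, run as root on the instance):
#   ip route add $(mtu_route_line 172.16.0.0/12 10.0.6.1 1300)
```

After adding the route, `ip route get` followed by an address in the on-premises range shows which route, and which MTU, the kernel selects for that destination.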