Why kubectl Hangs for 30 Seconds on VPN (IPv6 + EKS Explained)
We first noticed this on an on-prem cluster after a VPN configuration change. kubectl get pods, which had been instant for months, suddenly took 27 seconds. Every command, every time, but only on VPN.
$ time kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
api-server-6d4f7b8c-x2k9   1/1     Running   0          3d
worker-84c7d6f5b-tm4p      1/1     Running   0          3d
real 0m27.350s
Same cluster, same kubeconfig, no VPN: instant. We blamed DNS for a while. Then the VPN provider. Turns out the problem was much simpler, and it affects EKS, GKE, AKS, and on-prem dual-stack clusters equally.
It's a TCP Timeout on a Dead IPv6 Path
Most Kubernetes API endpoints on dual-stack clusters publish both A (IPv4) and AAAA (IPv6) DNS records. When your system resolves the endpoint hostname, it gets both addresses and, per RFC 6724 address selection, typically prefers IPv6.
Many corporate VPNs don't route IPv6. They don't reject it, they just silently drop the packets. So the TCP SYN goes out over IPv6, nothing comes back, and the kernel retries the SYN following the standard exponential backoff (1s, 2s, 4s, 8s...) until the connection attempt times out somewhere around 20 to 30 seconds. Only then does the system fall back to IPv4, which connects instantly.
What happens (sequential):
kubectl ──► resolve endpoint ──► IPv6 SYN ──── timeout (~25s) ✗
                                     │
                                     └──► IPv4 SYN ──► connected (0.05s) ✓
What should happen (concurrent):
kubectl ──► resolve endpoint ──┬── IPv6 SYN ──── timeout...
                               │
                               └── IPv4 SYN ──► connected (0.05s) ✓  ← wins
That's the whole problem. The "slow VPN" is just your machine waiting for an IPv6 connection that will never arrive.
Why It's Hard to Catch
No tool surfaces this clearly. kubectl doesn't log address family selection. Your VPN client reports a healthy tunnel. ping to the endpoint usually defaults to IPv4, so it looks fine. curl -v will eventually succeed, but the 25-second gap before the TLS handshake starts is easy to miss in verbose output.
The diagnostic is simple though:
# Does the endpoint have AAAA records?
$ dig +short AAAA your-cluster-endpoint
$ dig +short A your-cluster-endpoint
# Force IPv4 and compare
$ time curl -4 -sk https://your-cluster-endpoint:443/healthz
If forced IPv4 is instant but the default path takes ~25 seconds, you have an IPv6 black hole.
"Just Disable IPv6" (And Why That's a Trap)
The Stack Overflow answer is always:
# Linux
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
# macOS
sudo networksetup -setv6off Wi-Fi
This fixes it. But you're killing IPv6 system-wide for one broken path. You'll forget about it, and months later you'll waste a day debugging something unrelated that needs IPv6. We've done exactly this.
Stripping AAAA records at the DNS level is cleaner but requires DNS infrastructure control most teams don't have. Setting --request-timeout on kubectl just makes the failure faster without fixing the cause.
A Note on Go's Built-in Happy Eyeballs
Go's net.Dialer does implement RFC 6555 (Happy Eyeballs v1) with a default FallbackDelay of 300ms. In theory, this should limit the delay to around 350ms even on a broken IPv6 path: wait 300ms, start IPv4 in parallel, connect in ~50ms.
Whether that protection reaches kubectl depends on how client-go configures its transport. client-go constructs its own HTTP transport with custom dial functions, connection pooling, and HTTP/2 negotiation. If those layers use a dialer that doesn't go through net.Dialer's Happy Eyeballs path, or if they wrap the context in a way that changes timeout behavior, you may not get the fallback. We haven't traced every client-go code path, and the behavior likely varies across versions and configurations. What we do know is that the upstream Go issue tracking full RFC 8305 (Happy Eyeballs v2) support has been open since 2018, and reports of kubectl hanging on VPN remain common enough that this is clearly not fully solved in the standard toolchain.
This isn't kubectl-specific either. Any Go-based tool that connects to dual-stack Kubernetes endpoints can be affected: helm, flux, argocd, k9s, terraform with the Kubernetes provider.
How We Handle It in Kunobi
In Kunobi v0.1.0-beta.23, we implemented connection racing at the transport layer for all cluster endpoints. When connecting to any dual-stack API server, Kunobi starts IPv4 and IPv6 attempts concurrently and uses whichever completes first. If IPv6 is blackholed, IPv4 wins in milliseconds.
No sysctl changes. No DNS hacks. No configuration.
This isn't limited to VPN scenarios. The same timeout pattern shows up on home networks with partially broken IPv6, cloud regions with inconsistent dual-stack routing, and on-prem load balancers that advertise AAAA records but don't actually serve IPv6 traffic. Connection racing handles all of it transparently.
Takeaway
If kubectl hangs on VPN, check your IPv6 path before blaming bandwidth. And keep in mind it's not just a kubectl problem. It's a networking layer issue that affects the entire Go-based Kubernetes toolchain on dual-stack endpoints.
Browsers figured this out in 2012. Kubernetes tooling is catching up.
Kunobi v0.1.0-beta.23 includes connection racing for cluster endpoints, smarter shell reconnection, and fixes for AWS EKS OIDC detection. Full changelog · Download