minikube DNS fails after SRV query
(Jin Qing's Column, Dec., 2021)
My program uses Kubernetes DNS SRV queries for service discovery,
and when it is deployed on minikube, DNS starts to fail.
I can reproduce the failure with nslookup.
An SRV query for a non-existent fully qualified name returns NXDOMAIN and does no harm. But after an SRV query for a non-existent short name, which returns SERVFAIL, ping can no longer resolve google.com.
root@web-0:/# ping google.com
PING google.com (142.250.66.110) 56(84) bytes of data.
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=1 ttl=108 time=33.7 ms
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=2 ttl=108 time=33.8 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 33.779/33.834/33.889/0.055 ms
root@web-0:/# nslookup
> set type=srv
> nosuch-nosuch-nosuch-1234567890abcdefg.cn
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find nosuch-nosuch-nosuch-1234567890abcdefg.cn: NXDOMAIN
> exit
root@web-0:/# ping google.com
PING google.com (142.250.66.110) 56(84) bytes of data.
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=1 ttl=108 time=33.7 ms
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=2 ttl=108 time=33.7 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 33.730/33.735/33.741/0.183 ms
root@web-0:/# nslookup
> set type=srv
> nginx-wrong
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find nginx-wrong: SERVFAIL
> exit
root@web-0:/# ping google.com
ping: unknown host google.com
root@web-0:/#
The ping recovers by itself after about 1 minute.
If I query an existing internal service name and nslookup returns the records correctly, DNS still works after I quit nslookup.
root@web-0:/# ping google.com
PING google.com (142.250.66.110) 56(84) bytes of data.
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=1 ttl=108 time=33.6 ms
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=2 ttl=108 time=34.8 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 33.648/34.260/34.872/0.612 ms
root@web-0:/# nslookup
> set type=srv
> nginx
Server: 10.96.0.10
Address: 10.96.0.10#53
nginx.default.svc.cluster.local service = 0 25 80 web-1.nginx.default.svc.cluster.local.
nginx.default.svc.cluster.local service = 0 25 80 web-2.nginx.default.svc.cluster.local.
nginx.default.svc.cluster.local service = 0 25 80 web-0.nginx.default.svc.cluster.local.
nginx.default.svc.cluster.local service = 0 25 80 web-3.nginx.default.svc.cluster.local.
> exit
root@web-0:/# ping google.com
PING google.com (142.250.66.110) 56(84) bytes of data.
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=1 ttl=108 time=33.5 ms
^C
--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 33.529/33.529/33.529/0.000 ms
root@web-0:/#
While DNS is failing, nothing in the cluster can resolve external domain names,
but internal names still resolve.
https://github.com/kubernetes/minikube/issues/13137