15.6.17

Local-up-cluster: Get that damn DNS working!

How, in the simplest case, does Kubernetes set up internal networking?

The Kubernetes local-up-cluster script curates a microcosm of Kubernetes which you can use to set up a cluster with Kubernetes-native DNS.  This is done by setting the API_HOST variable, turning on the KUBE_ENABLE_CLUSTER_DNS variable, and then simply running hack/local-up-cluster.sh.
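Concretely, a minimal startup might look like this (the docker0 address shown is an assumption; substitute whatever your bridge actually has):

```shell
# Minimal sketch of a local-up-cluster start. The API_HOST value is an
# assumption -- use whatever address your docker0 bridge actually has.
export API_HOST=172.17.0.1
export KUBE_ENABLE_CLUSTER_DNS=true

# Then, from the root of a kubernetes source checkout:
#   ./hack/local-up-cluster.sh
```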

However: when DNS breaks, it can be tricky to debug.  Is it breaking at the pod level (i.e. is pod DNS somehow configured wrong)?  Is it happening because of the DNS machinery itself?  Or is your machine simply broken and not NATting things properly?

Diagnosis:  

The simplest way to debug this is to first look in the container.  In a Kubernetes cluster, you can inspect how the SDN is working by looking at /etc/resolv.conf in your individual containers (docker exec -t -i containerid cat /etc/resolv.conf).

Typically, you should see something like this:

~ $ cat /etc/resolv.conf
nameserver 10.0.2.3
search bds-ad.lc

1) You should be able to ping that nameserver.
2) That nameserver should be a pod on the internal network, i.e. something that docker knows about.
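Those two checks can be scripted; a sketch (the container name below is a placeholder, and the awk helper is purely illustrative):

```shell
# Pull the first nameserver line out of resolv.conf content.
first_nameserver() { awk '/^nameserver/ {print $2; exit}'; }

# On a real host you would feed it a container's resolv.conf and ping
# the result (container name is a placeholder):
#   ns="$(docker exec mypod cat /etc/resolv.conf | first_nameserver)"
#   ping -c 1 "$ns"

printf 'nameserver 10.0.2.3\nsearch bds-ad.lc\n' | first_nameserver
# -> 10.0.2.3
```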

How can you tell whether that IP is in the right ballpark?

You can easily run docker inspect on your existing containers to see their IP addresses.  Those IPs should be on the same basic network as the nameserver you saw inside the individual container you created.
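A sketch of that check: list every running container's name and IP, then eyeball whether the nameserver shares the prefix (the docker invocation requires a running daemon; the same_24 helper is a crude illustration, not a real subnet calculation):

```shell
# List each running container's name and IP address (requires docker;
# the Go-template fields are the documented docker inspect ones):
#   docker ps -q | xargs -r docker inspect -f \
#     '{{.Name}} {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'

# Crude helper to sanity-check that two addresses share a /24 prefix.
same_24() { [ "${1%.*}" = "${2%.*}" ]; }
same_24 172.17.0.2 172.17.0.3 && echo "same /24"
# prints: same /24
```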

Quick fixes if you did something crazy.

Make sure you exported your docker0 address as API_HOST properly:

API_HOST=172.17.0.1

(or whatever docker0's address actually is), set as an environment variable before startup.

Again, also make sure you enabled the KUBE_ENABLE_CLUSTER_DNS environment variable.
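You can derive API_HOST from the bridge instead of hard-coding it; a sketch (docker0 as the bridge name is an assumption, and parse_inet just pulls the address out of `ip` output):

```shell
# Extract the IPv4 address from `ip addr` output.
parse_inet() { awk '/inet / {split($2, a, "/"); print a[1]; exit}'; }
docker0_ip() { ip -4 addr show docker0 2>/dev/null | parse_inet; }

# On the actual host, before running hack/local-up-cluster.sh:
#   export API_HOST="$(docker0_ip)"
#   export KUBE_ENABLE_CLUSTER_DNS=true
```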

What's going on with all these DNS pods?

The 10.x network is Kubernetes' internal network.  So, your nameserver is associated with a pod on Kubernetes that does internal resolution of IPs.  There are three containers for DNS that will spin up and provide services on this network.  The two you care about when debugging are the dnsmasq nanny and the SkyDNS server.

The kube-system namespace is spun up with "cluster.local" as the domain and a "10.0.0.10" address by default.  You can dive into this in the script, where the enable_dns setup function runs.
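A quick way to confirm those defaults actually took effect (these need a running cluster, so they are shown commented; the k8s-app=kube-dns label is the conventional one, and the FQDN helper below is just an illustration of the naming scheme):

```shell
# Check the DNS service and pods against the defaults above:
#   kubectl get svc kube-dns -n kube-system   # CLUSTER-IP should be 10.0.0.10
#   kubectl get pods -n kube-system -l k8s-app=kube-dns

# Service DNS names follow <service>.<namespace>.svc.<domain>:
svc_fqdn() { printf '%s.%s.svc.%s\n' "$1" "$2" "$3"; }
svc_fqdn kube-dns kube-system cluster.local
# -> kube-dns.kube-system.svc.cluster.local
```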

The nanny
The nanny's job is to optimize DNS queries, so that they can rapidly return results from its cache.  When you run the dnsmasq nanny, what you get is a faster way to resolve internal IP addresses on the Kubernetes network, running as a pod.
I0616 14:42:56.875227       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0616 14:42:56.941215       1 nanny.go:111]
W0616 14:42:56.941298       1 nanny.go:112] Got EOF from stdout
I0616 14:42:56.941346       1 nanny.go:108] dnsmasq[8]: started, version 2.76 cachesize 1000
I0616 14:42:56.941371       1 nanny.go:108] dnsmasq[8]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0616 14:42:56.941391       1 nanny.go:108] dnsmasq[8]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0616 14:42:56.941408       1 nanny.go:108] dnsmasq[8]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0616 14:42:56.941424       1 nanny.go:108] dnsmasq[8]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0616 14:42:56.941461       1 nanny.go:108] dnsmasq[8]: reading /etc/resolv.conf
I0616 14:42:56.941480       1 nanny.go:108] dnsmasq[8]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0616 14:42:56.941496       1 nanny.go:108] dnsmasq[8]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0616 14:42:56.941516       1 nanny.go:108] dnsmasq[8]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0616 14:42:56.941532       1 nanny.go:108] dnsmasq[8]: using nameserver 10.0.2.3#53
I0616 14:42:56.941581       1 nanny.go:108] dnsmasq[8]: read /etc/hosts - 7 addresses


SkyDNS

SkyDNS is the abstraction that kube provides when you add new services.  Note that you can hack this by sending in a resolv.conf to the kubelet, which will get mounted into individual pods when they start.  For most users, you probably don't need to hack this, though, unless you want to run a specially networked set of pods or something.
I0616 14:42:56.489106       1 server.go:113] FLAG: --version="false"
I0616 14:42:56.489118       1 server.go:113] FLAG: --vmodule=""
I0616 14:42:56.489150       1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0616 14:42:56.491983       1 server.go:198] Skydns metrics enabled (/metrics:10055)
I0616 14:42:56.492015       1 dns.go:147] Starting endpointsController
I0616 14:42:56.492049       1 dns.go:150] Starting serviceController
I0616 14:42:56.492593       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0616 14:42:56.492661       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0616 14:42:56.604077       1 dns.go:264] New service: kubernetes
I0616 14:42:56.604268       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0616 14:42:56.604334       1 dns.go:264] New service: kube-dns
I0616 14:42:56.604378       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0616 14:42:56.604418       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0616 14:42:57.000516       1 dns.go:171] Initialized services and endpoints from apiserver
I0616 14:42:57.000675       1 server.go:129] Setting up Healthz Handler (/readiness)
I0616 14:42:57.000705       1 server.go:134] Setting up cache handler (/cache)
I0616 14:42:57.000731       1 server.go:120] Status HTTP port 8081
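Given those logs, you can interrogate each layer directly; a sketch (the DNS pod IP is a placeholder, and dig being installed is an assumption):

```shell
# Query SkyDNS on 10053 and the cluster DNS service on 53 separately,
# to see which layer is failing (pod IP is a placeholder):
#   dig @<dns-pod-ip> -p 10053 kubernetes.default.svc.cluster.local
#   dig @10.0.0.10 kubernetes.default.svc.cluster.local

# And the resolv.conf hack mentioned above is the kubelet's
# --resolv-conf flag, mounted into pods as they start:
#   kubelet --resolv-conf=/path/to/custom/resolv.conf ...
```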

And finally, the sidecar

Which simply does health checks and so on for the DNS package: https://github.com/kubernetes/dns/blob/master/docs/sidecar/README.md .  AFAIK your sidecar can die off without affecting things in any meaningful way.  Also, its logs are noisy and confusing and need to be cleaned up #helpwanted :).

Finally a look at IP tables.

My good friend Patrick Jordan showed me how to diagnose some iptables-related issues today: specifically, looking at the final portion of the chain information (after INPUT).
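The listings below came from commands along these lines (shown commented since they need root; the nat table is where the per-service forwarding rules actually live):

```shell
# Dump the filter-table chains (what's pasted below):
#   sudo iptables -L
# Dump just the service-forwarding rules from the nat table, numerically:
#   sudo iptables -t nat -L KUBE-SERVICES -n
```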

FEDORA

Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- anywhere anywhere
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- anywhere anywhere
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */


Chain DOCKER (0 references)
target prot opt source destination

Chain DOCKER-ISOLATION (0 references)
target prot opt source destination

Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-SERVICES (2 references)
target prot opt source destination

UBUNTU ON EC2
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- anywhere anywhere
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- anywhere anywhere
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */

Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-SERVICES (2 references)
target prot opt source destination



In any case, it appears that Fedora on VirtualBox is much happier.  I doubt it has anything to do with the unused DOCKER and DOCKER-ISOLATION chains.  So I conclude that iptables rules on cloud instances forward to IPs that don't always work quite the same as on simpler virtual machines.  Running a "local" Kubernetes instance in a cloud you don't understand can be dangerous.


I also noted that on VirtualBox, the forwarded IPs in KUBE-SERVICES are real IP addresses, whereas on EC2 they aren't real IPs.  No idea if that helps or not; assume it might.


Learning more

Beeps @ Google has written a great proposal tying together the way dnsmasq and nslookup failures work in the wild.  It's fairly advanced, but it's both an interesting idea and a good reference:

  • https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
  • https://github.com/kubernetes/kubernetes/issues/32749


