21.6.15

Flannel and network interfaces, heads up!

Note: I got pinged about this recently, and there were a lot of questions about the routing.
If you don't understand how routing works, or have never used dig or scapy, consider watching https://www.youtube.com/watch?v=hcPWAyxjd6E before reading further :).  A lot of new Kubernetes users don't realize that, when you're setting up a kube cluster, you need to understand a little bit about the networking model in order to have the intuition necessary to maintain your cluster (i.e. you need at least a basic understanding of HTTPS, TCP, kube-proxy, flanneld, CNI, and Docker-assigned IP addresses for any of this to make sense).

anyways...

I've been debugging a flannel connectivity issue in a Vagrant recipe this week.  I noticed there aren't a whole lot of docs around flannel, so here are some snippets that might help folks.

How flannel works

There aren't a lot of great recipes for getting started with flannel.  I use Eric Paris's https://github.com/eparis/kubernetes-ansible playbooks as a reference.  From these, we can see how it's really meant to work.


ETCD + Two layers of routing

Flannel is actually pretty easy to understand at a high level.  There are really just two major points.

1) Flannel uses a distributed k/v store to write a subnet out for each machine in your cluster.  This means you don't need to use some complex networking tool from the dark ages to store data about how you're carving up your network.  Why?  Because flannel will make its OWN network on top of your existing network and store the metadata about it in the k/v store.  In this case, the k/v store is etcd, which is basically a distributed hashmap with strong consistency, watches, and directory semantics.

2) Flannel tells the Docker daemons running on individual machines to look up their subnet before assigning IPs.  This means no more 172.x IPs that need to be port-forwarded and so on.  EACH container has an IP that can be routed first based on the MACHINE it's running on, and then based on its exact container IP, and this is what flannel does for you.
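To make point 1 concrete, here's roughly what the etcd side looks like.  This is a sketch: the key path matches the FLANNEL_ETCD_KEY in the config at the end of this post, and the 80.0.0.0/8 network with /16 leases is an assumption chosen to line up with the example logs below -- adjust for your cluster.

```shell
# Write the overlay network config that flanneld reads at startup
# (assumed values; adjust Network/SubnetLen/Backend for your cluster):
etcdctl set /cluster.local/network/config \
  '{ "Network": "80.0.0.0/8", "SubnetLen": 16, "Backend": { "Type": "vxlan" } }'

# Once flanneld starts on each machine, it acquires a lease and writes it
# back; you should see one subnet key per machine:
etcdctl ls /cluster.local/network/subnets
```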


Network interfaces

I like to do EVERYTHING in Vagrant ALL THE TIME!  Well, guess what, sometimes you get in trouble.  For example, in Vagrant VMs, often eth0 is the BRIDGE.

By default, flannel selects the first network interface it finds.  So, if you have 3 nodes, each node will advertise that interface's IP as the public IP for its subnet.

This leads to a scenario where ALL nodes are using the SAME SUBNET for assigning IPs, which totally breaks the entire model of flannel subnets.

What you should do

Make sure you look at eth0, eth1, and so on, on each of your machines.  Then set up your FLANNEL_OPTIONS (on Fedora this will be in /etc/sysconfig/flanneld), like so:

FLANNEL_OPTIONS="--iface=eth1"

If you have an internal or other network which you are assigning to your machines, use THAT, not the first eth0, as flannel's preferred iface.
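To figure out which interface to pass, list the IPv4 addresses on each box.  This is a sketch: the interface names and addresses below are typical Vagrant values, not guaranteed.

```shell
# List IPv4 addresses, one line per interface; pick the one on your
# private/internal network.
ip -4 -o addr show
# Typical Vagrant output (illustrative):
#   1: lo    inet 127.0.0.1/8 ...
#   2: eth0  inet 10.0.2.15/24 ...      <- NAT/bridge, same on every VM: avoid
#   3: eth1  inet 192.168.4.101/24 ...  <- private network: use --iface=eth1
```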

What you should see 

To be sure: if you have two machines, when you bring them up, watching the flannel logs (journalctl -f -u flanneld) should show something similar to what we have below...

Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.005921   23344 main.go:247] Installing signal handlers
Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.006228   23344 main.go:205] Using 192.168.4.101 as external interface
Jun 22 03:39:30 kube1.ha flanneld[23344]: W0622 03:39:30.006264   23344 device.go:83] "flannel.1" already exists with incompatable configuration: vtep (external) interface: 3 vs 2; recreating device
Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.017812   23344 subnet.go:320] Picking subnet in range 80.1.0.0 ... 80.255.0.0
Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.020405   23344 subnet.go:83] Subnet lease acquired: 80.99.0.0/16
Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.020588   23344 main.go:215] VXLAN mode initialized
Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.020597   23344 vxlan.go:115] Watching for L2/L3 misses
Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.020604   23344 vxlan.go:121] Watching for new subnet leases
Jun 22 03:39:30 kube1.ha systemd[1]: Started Flanneld overlay address etcd agent.
Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.026963   23344 vxlan.go:184] Subnet added: 80.54.0.0/16
Jun 22 03:39:30 kube1.ha flanneld[23344]: I0622 03:39:30.026995   23344 vxlan.go:184] Subnet added: 80.1.0.0/16


Then a quick test

Finally, you can do a quick test.  Spinning up two Docker containers, one on each node, you should see that the first has an IP like 80.1.0.3, and the second has 80.x.y.z (where x.y isn't 1.0) :).
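Here's a quick way to sanity-check the two IPs.  The docker commands in the comments are how you'd fetch them; the two addresses assigned below are hypothetical stand-ins for what each node returns.

```shell
# On each node, you'd grab a container IP roughly like:
#   docker run -d --name t busybox sleep 3600
#   docker inspect -f '{{.NetworkSettings.IPAddress}}' t
ip_node1=80.1.0.3    # hypothetical result from node 1
ip_node2=80.54.0.2   # hypothetical result from node 2

# Compare the /16 prefixes -- they MUST differ across machines.
prefix1=$(echo "$ip_node1" | cut -d. -f1-2)
prefix2=$(echo "$ip_node2" | cut -d. -f1-2)
if [ "$prefix1" != "$prefix2" ]; then
  echo "OK: nodes are on different /16 flannel subnets"
else
  echo "BROKEN: both nodes assign IPs from the same subnet"
fi
```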

WHY IS THIS IMPORTANT 

The reason this is important is that flannel NEEDS A DIFFERENT SUBNET PER machine: the entire basis of flannel routing is that packets for a given subnet all go to the same machine first, and are then decomposed into local packets.  So if all machines have the same subnet, docker connections on the flannel overlay won't be routable.
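On a healthy node this shows up directly in the route table: one local route via docker0 for the machine's own lease, plus one route over flannel.1 per remote machine.  The output below is illustrative only, with subnets taken from the logs above and default device names assumed.

```shell
# Run: ip route
# Illustrative output on the node that leased 80.99.0.0/16:
#   80.99.0.0/16 dev docker0  proto kernel  scope link  src 80.99.0.1
#   80.54.0.0/16 via 80.54.0.0 dev flannel.1 onlink
#   80.1.0.0/16  via 80.1.0.0  dev flannel.1 onlink
```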

By the way: to quick-start, just run etcd from a container one-liner.

If all this makes your head spin, and you're having etcd issues, the best thing you can do for testing is just run etcd from a container, so that you know it's working perfectly as a single-node service:

sudo docker run -t -i -p 8001:8001 -p 4001:4001 -p 2379:2379 quay.io/coreos/etcd:v2.0.10 --addr 0.0.0.0:4001 --name etcd-node1 --data-dir=/tmp/etcd4FlannelKube

Running etcd this way in a new cluster works fine; it's simple and allows you to focus on other parts of your distributed system.
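Before pointing flanneld at it, you can sanity-check that the single-node etcd is actually up.  A sketch: port 4001 matches the run command above, and the key path matches the FLANNEL_ETCD_KEY config below.

```shell
# Should return the etcd version if the container is healthy:
curl -L http://127.0.0.1:4001/version

# Confirm flannel's network config key is readable via the v2 keys API:
curl -L http://127.0.0.1:4001/v2/keys/cluster.local/network/config
```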

FYI, a working flannel cluster config might look like this (generated using contrib/ansible).

/etc/sysconfig/flanneld:# Flanneld configuration options
/etc/sysconfig/flanneld:FLANNEL_ETCD="http://kube-master:2379"
/etc/sysconfig/flanneld:# etcd config key.  This is the configuration key that flannel queries
/etc/sysconfig/flanneld:FLANNEL_ETCD_KEY="/cluster.local/network"
/etc/sysconfig/flanneld:# By default, we just add a good guess for the network interface on Vbox.  Otherwise, Flannel will probably make the right guess.
/etc/sysconfig/flanneld:FLANNEL_OPTIONS="--iface=eth1"
/etc/firewalld/direct.xml:  <rule priority="1" table="filter" ipv="ipv4" chain="FORWARD">-i flannel.1 -o docker0 -j ACCEPT -m comment --comment 'flannel subnet'</rule>


Hacking around with Flannel

Flannel is surprisingly easy to hack around with and build for yourself.  You can just clone it down from GitHub and then run:

rm -rf bin/flanneld ; ./build ; bin/flanneld

That's pretty much all you need to do.  Associated with this post is a minor PR to clean up flanneld options.




