Kubernetes uses an elegant "watch"-based model to coordinate the construction and maintenance of distributed services and replication controllers. This model is also used for other components, like the kubelets themselves. To understand the watch+controller pattern is to understand Kubernetes... and fortunately, over the last couple of days, while the world was off hiding at OSCON, I got a chance to dive into replicationController internals.
I always knew there was a "watch" being used for communication, but I had never witnessed it first hand and isolated it in the code until now. Hope this helps clarify things for folks wondering how all these autonomous daemons ultimately use etcd to coordinate containers in Kubernetes.
So in this post, I will show you how to look under the hood and see what's happening when you run kubectl commands against Kubernetes, and in particular, how etcd responds.
Before we look at the code for the controller+watch pattern, let's do an experiment!
First, we can go into etcd and watch a replication controller definition.
etcdctl watch /registry/controllers/default/redissc
While that runs, let's resize the replication controller from one replica to two in another terminal (with kubectl scale --replicas=2, or whatever resize command your kubectl version provides).
Right when this occurs, we can see that the watch returns a value to us...
There we go! In the JSON that the watch prints, we can see that the replicas field was resized from 1 to 2.
This is the fundamental building block that Kubernetes uses to implement state changes. Every time you make an API request, data is updated in etcd. The watches running inside the Kubernetes controllers then receive this update and act on it.
So, how does this work in the code?
In Kubernetes, there is a general concept called a controller. Every controller in Kubernetes needs to provide functionality for dealing with (1) resources being added, (2) resources being updated, and (3) resources being deleted.
There is a data structure which represents this; see pkg/controller/framework/controller.go.
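Paraphrasing that file (the exact location has moved around between releases; in modern Kubernetes these hooks live in client-go's cache package), the handler struct looks roughly like this:

type ResourceEventHandlerFuncs struct {
	AddFunc    func(obj interface{})            // called when a resource is created
	UpdateFunc func(oldObj, newObj interface{}) // called when a resource is modified
	DeleteFunc func(obj interface{})            // called when a resource is deleted
}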
This generic struct is filled in by any component in Kubernetes that needs to watch etcd and respond to events. In other words, pretty much every API call you make leads to an update of etcd, which triggers a watch, which in turn leads to a resource being added, updated, or deleted. When a resource is modified, the watch calls back into the Go functions provided in this struct.
Example 1: The ReplicationController
In the replicationController, we can see that essentially any event leads to an "enqueueController" operation.
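The wiring in the replication manager looks roughly like the snippet below. This is a paraphrased sketch rather than a verbatim excerpt, so the names used here (rm, enqueueController, framework.ResourceEventHandlerFuncs) should be checked against the source tree you are reading:

framework.ResourceEventHandlerFuncs{
	// A new replication controller was created: queue it for a sync.
	AddFunc: rm.enqueueController,
	// The spec changed (for example, the replica count was resized):
	// queue the current version for a sync.
	UpdateFunc: func(old, cur interface{}) {
		rm.enqueueController(cur)
	},
	// The controller was deleted: queue it so its pods get cleaned up.
	DeleteFunc: rm.enqueueController,
}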
Example 2: Services
In the service endpoints controller, we again see a similar pattern: the "enqueueService" operation is called.
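The enqueue functions themselves are tiny. Here is a paraphrased sketch of what enqueueService does (again, type and field names are approximate): it converts the object into a namespace/name key and pushes that key onto the controller's work queue.

func (e *EndpointController) enqueueService(obj interface{}) {
	// Turn the service object into a key like "default/redissc".
	key, err := cache.MetaNamespaceKeyFunc(obj)
	if err != nil {
		glog.Errorf("couldn't get key for object %+v: %v", obj, err)
		return
	}
	// Hand the key to the work queue; a worker goroutine will pick it up.
	e.queue.Add(key)
}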
In any case, each of these queues is processed by a simple worker function that looks much the same in the replication controller and in the endpoints controller.
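A paraphrased sketch of the replication manager's version (method and field names are from the code of that era, so treat them as approximate rather than a verbatim excerpt):

func (rm *ReplicationManager) worker() {
	for {
		// Block until a key (e.g. "default/redissc") is available on the queue.
		key, quit := rm.queue.Get()
		if quit {
			return
		}
		// Reconcile: compare how many pods are actually running for this
		// controller with how many the spec asks for, and create or delete
		// pods to close the gap.
		if err := rm.syncReplicationController(key.(string)); err != nil {
			glog.Errorf("error syncing replication controller %v: %v", key, err)
		}
		rm.queue.Done(key)
	}
}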
In both cases, each event simply adds a key to a workqueue, and the workqueue handling is almost identical in each controller. The worker ultimately calls a sync function (like syncReplicationController) which does all the specific work of confirming how many pods are running and increasing or decreasing the number of pods. Similarly for services, the worker() implementation reads work items and calls syncService, which handles the logic for maintaining/creating service endpoints for the key that was updated.
Hope this clarifies the generic watch pattern that is used to manage Kubernetes cluster state.



