28.3.17

Etcd: sandboxing a cluster using kubernetes for DNS

Lately, to stress test etcd, I've been playing with the idea of running it as a replication controller, for sandboxing.

It's pretty easy to do this locally in kubernetes:

First, export:

KUBE_ENABLE_CLUSTER_DNS=true
API_HOST=
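Spelled out as a shell snippet (hack/local-up-cluster.sh is my assumption for the bring-up script here; adjust to however you start your local cluster):

```shell
# Enable cluster DNS before bringing up the local cluster.
export KUBE_ENABLE_CLUSTER_DNS=true
export API_HOST=    # left empty, as in the notes above; set it to pin the API server address
echo "KUBE_ENABLE_CLUSTER_DNS=$KUBE_ENABLE_CLUSTER_DNS"
# Then, from the kubernetes repo root, bring the cluster up, e.g.:
#   hack/local-up-cluster.sh
```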

Now, just grab https://github.com/coreos/etcd/blob/master/hack/kubernetes-deploy/etcd.yml, and run it as a replication controller.

Modifying it to provide a load-balanced endpoint

Note that, if you want, you can set up a load-balanced endpoint by simply setting 'nodePort' and type: LoadBalancer on the client Service.


apiVersion: v1
kind: Service
metadata:
  name: etcd-client
spec:
  ports:
  - name: etcd-client-port
    port: 2379
    protocol: TCP
    targetPort: 2379
    nodePort: 30000
  selector:
    app: etcd
  type: LoadBalancer


Then, you can simply exec etcdctl cluster-health to see if everybody is happy:


cluster/kubectl.sh exec etcd0 -- /usr/local/bin/etcdctl cluster-health

And the logs will look something like this:

2017-03-30 20:46:27.754253 I | etcdserver: setting up the initial cluster version to 3.1
2017-03-30 20:46:27.757384 N | etcdserver/membership: set the initial cluster version to 3.1
2017-03-30 20:46:27.757450 I | etcdserver/api: enabled capabilities for version 3.1
2017-03-30 20:46:28.373507 E | rafthttp: failed to dial ade526d28b1f92f7 on stream Message (dial tcp 10.0.0.120:2380: i/o timeout)
2017-03-30 20:46:28.373535 I | rafthttp: peer ade526d28b1f92f7 became inactive
2017-03-30 20:46:28.475156 I | rafthttp: peer ade526d28b1f92f7 became active
2017-03-30 20:46:28.475193 I | rafthttp: established a TCP streaming connection with peer ade526d28b1f92f7 (stream Message reader)
2017-03-30 20:46:28.475427 I | rafthttp: established a TCP streaming connection with peer ade526d28b1f92f7 (stream MsgApp v2 reader)


 Getting the Endpoints

In my case, the goal here is to get endpoints for individual etcd instances for scraping and monitoring metrics.  This can now be easily done as follows:


 cluster/kubectl.sh get endpoints


NAME          ENDPOINTS                                         AGE
etcd-client   172.17.0.3:2379,172.17.0.4:2379,172.17.0.5:2379   23m
etcd0         172.17.0.3:2380,172.17.0.3:2379                   23m
etcd1         172.17.0.4:2380,172.17.0.4:2379                   23m
etcd2         172.17.0.5:2380,172.17.0.5:2379                   23m
kubernetes    10.240.0.3:6443                                   29m


Note that in doing this, kubernetes returns the docker bridge (pod) IP addresses.
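To feed these into a scraper, it helps to have one target per line. A minimal sketch (the sample string below matches the ENDPOINTS column above; in a live cluster you'd take it from `cluster/kubectl.sh get endpoints etcd-client` instead of hard-coding it):

```shell
# Split the comma-separated etcd-client endpoint list into one
# scrape target per line.
endpoints="172.17.0.3:2379,172.17.0.4:2379,172.17.0.5:2379"
echo "$endpoints" | tr ',' '\n'
```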

Debugging: Grabbing the local endpoints.

Interestingly, here we can see 10.* addresses were given as variables to the etcd containers in kube, while kubectl get endpoints returned 172.* addresses. The 10.* addresses are the service cluster IPs, injected as docker-link-style service environment variables, while kubectl get endpoints lists the pod IPs on the docker bridge.

 
➜  kubernetes git:(local-up-conformance) ✗ sudo docker inspect 16c5d46a36c8 | grep ETCD1
                "ETCD1_PORT_2380_TCP=tcp://10.0.0.223:2380",
                "ETCD1_PORT_2380_TCP_ADDR=10.0.0.223",
                "ETCD1_PORT_2380_TCP_PROTO=tcp",
                "ETCD1_SERVICE_PORT_SERVER=2380",
                "ETCD1_PORT_2379_TCP=tcp://10.0.0.223:2379",
                "ETCD1_PORT=tcp://10.0.0.223:2379",
                "ETCD1_PORT_2380_TCP_PORT=2380",
                "ETCD1_SERVICE_HOST=10.0.0.223",
                "ETCD1_SERVICE_PORT=2379",
                "ETCD1_SERVICE_PORT_CLIENT=2379",
                "ETCD1_PORT_2379_TCP_PROTO=tcp",
                "ETCD1_PORT_2379_TCP_ADDR=10.0.0.223",
                "ETCD1_PORT_2379_TCP_PORT=2379",
➜  kubernetes git:(local-up-conformance) ✗ sudo docker inspect 16c5d46a36c8 | grep ETCD2
                "ETCD2_SERVICE_HOST=10.0.0.92",
                "ETCD2_SERVICE_PORT_CLIENT=2379",
                "ETCD2_PORT=tcp://10.0.0.92:2379",
                "ETCD2_PORT_2379_TCP_PROTO=tcp",
                "ETCD2_PORT_2379_TCP_ADDR=10.0.0.92",
                "ETCD2_PORT_2379_TCP_PORT=2379",
                "ETCD2_PORT_2380_TCP=tcp://10.0.0.92:2380",
                "ETCD2_PORT_2380_TCP_ADDR=10.0.0.92",
                "ETCD2_SERVICE_PORT_SERVER=2380",
                "ETCD2_PORT_2379_TCP=tcp://10.0.0.92:2379",
                "ETCD2_PORT_2380_TCP_PORT=2380",
                "ETCD2_PORT_2380_TCP_PROTO=tcp",
                "ETCD2_SERVICE_PORT=2379",

Getting individual endpoint data

One of the most interesting things you can do with a local etcd kubernetes deployment is watch how it scales locally.  To do this, you can pull the metrics endpoints independently, i.e.

curl 172.17.0.3:2379/metrics
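For example, to pick out a couple of interesting series without dumping the whole payload (the sample text below is an abbreviated, hypothetical stand-in for real /metrics output):

```shell
# Abbreviated, hypothetical /metrics payload; in practice you'd capture it with:
#   metrics=$(curl -s 172.17.0.3:2379/metrics)
metrics='# HELP etcd_server_proposals_pending The current number of pending proposals.
etcd_server_proposals_pending 0
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.002"} 12'
# Drop comment lines and keep just the series we care about:
echo "$metrics" | grep -v '^#' | grep '^etcd_server_proposals_pending'
```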
 

WARNING

Depending on the fedora/centos version you're using, if you try to reach the endpoints and get something like "404 not found", you may need to restart networking on your system (https://bugzilla.redhat.com/show_bug.cgi?id=1183973).



Measuring ETCD using metrics/

So now that we have a 'real' cluster running, let's start measuring stuff.

First, fire up prometheus. Create a configuration file that scrapes your etcd /metrics endpoints:

➜  docker-locust git:(master) ✗ cat /home/jayunit100/work/prometheus/conf.yml
# my global config
global:
  scrape_interval:     2s
  evaluation_interval: 10s
  # scrape_timeout is set to the global default (10s).

scrape_configs:
- job_name: prometheus

  honor_labels: true
  # scrape_interval is defined by the configured global (2s).
  # scrape_timeout is defined by the global default (10s).

  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.

  static_configs:
  - targets: ['172.17.0.3:2379','172.17.0.4:2379','172.17.0.5:2379']

Note that the targets are exactly the client endpoints from kubectl.sh get endpoints.

Now, mount that file into /etc/prometheus and start it:

sudo docker run -p 9090:9090 -v /home/jayunit100/work/prometheus/conf.yml:/etc/prometheus/prometheus.yml prom/prometheus

Side note: In GCE, you can open the prometheus port for your etcd cluster like so. Note that --source-ranges=0.0.0.0/0 makes it world-reachable; tighten the range if you don't want to expose it to the outside world:

gcloud compute firewall-rules create prometheus --allow tcp:9090 --source-tags="jayunit100-scale-perf-devserv-0" --source-ranges=0.0.0.0/0 --description="expose prometheus"

Now, do something interesting!


Then use etcdeath, a really simple suite of shell scripts that can increase load, break networking, and do other stuff to make etcd go crazy. In particular, busycluster.sh and keyflood.sh.

This is super easy: just clone down etcdeath (https://github.com/jdumars/etcdeath/blob/master/stress/busycluster.sh), export a VICTIM, and get started:

VICTIM=172.17.0.5 ./keyflood.sh

There is also ./busycluster.sh, which doesn't destroy your disk i/o but simulates a large load.  Once you start stressing your cluster out, you'll see some changes in snapshotting intervals etc...  This can be seen with "etcd_debugging_snap_save_marshalling_duration_seconds_bucket", which is a good metric of how long your snapshotting is taking.  Over time, if this continues to increase unbounded, it can lead to pauses in etcd availability.
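The essence of a keyflood-style stressor is just hammering the victim with lots of small writes. A dry-run sketch (the real etcdctl line is commented out; the endpoint and key names are illustrative, not the actual script's):

```shell
# Hammer the victim with writes; this version only echoes what it
# would do, so it runs without a cluster.
VICTIM=${VICTIM:-172.17.0.5}
for i in $(seq 1 3); do
  # ETCDCTL_API=3 etcdctl --endpoints="http://$VICTIM:2379" put "stress-key-$i" "v$i"
  echo "put stress-key-$i -> http://$VICTIM:2379"
done
```

Crank the seq bound way up (and uncomment the put) to generate real load.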

Results

Indeed, you can easily destroy a distributed etcd in containers and learn about how it behaves at scale, without a large cluster.

Etcd marshalling: higher bins increase once write strain exceeds disk i/o.




In another experiment (not visualized here), I also noticed that the etcd_server_proposals_pending metric was useful when the cluster was busy.  Other metrics worth looking at are listed at https://github.com/coreos/etcd/blob/master/Documentation/metrics.md.

More hacking... 

If your etcd is growing, you can quickly grab all the keys using the v3 API.  This has changed since v2, so it's worth posting here.  Note that "ETCDCTL_API=3 etcdctl get / --prefix --keys-only" pulls *all* keys down with no need for recursion, because the keyspace is flattened (post 2.x).  So, the two options we use here to get all keys are quite simple:

1) --prefix can be anything (i.e. /r, /e, /...).  If you just use / as the prefix, it will return everything in the entire database (again, due to the flat structure).

2) The --keys-only part is important here, since the values are binary encoded, and you probably don't want all that muck in your terminal or in a plain text file.
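Putting that together, a quick way to see where the keyspace is growing (the sample keys below are hypothetical; swap in real etcdctl output):

```shell
# Hypothetical --keys-only output; in practice:
#   keys=$(ETCDCTL_API=3 etcdctl get / --prefix --keys-only)
keys='/registry/pods/default/etcd0
/registry/pods/default/etcd1
/registry/services/specs/default/etcd-client'
# The keyspace is flat, so "which prefix is growing?" is just string slicing:
echo "$keys" | awk -F/ '{print "/"$2"/"$3}' | sort | uniq -c | sort -rn
```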
