6.10.22

How to figure out what's wrong in your K8s cluster, one step at a time

TLDR: Kubernetes e2e tests can rapidly surface a problem in any K8s cluster. Running just the sig-network and sig-storage tests, which are the most likely to highlight the important issues in a broken cluster (since these areas are commonly implemented as PLUGINS), takes around 20 minutes, whereas it takes maybe 2 hours to run the entire suite....

Thus, you DON'T need to wait 2 hours for k8s conformance to complete in order to use Sonobuoy to triage cluster infra issues!
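
In practice, that means a run like the following (a minimal sketch using Sonobuoy's e2e focus flag; the regex just selects the two sigs called out above, with brackets escaped because the focus string is a regular expression):

sonobuoy run --e2e-focus='\[sig-network\]|\[sig-storage\]'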

As you know, 

- K8s, unlike most other distributed systems, ships a great big, massive set of conformance and end-to-end tests that you can run on any cluster.

- All K8s e2es are broken up by sig, which means you can choose to run the sig-network, sig-storage, sig-scheduler, etc... tests separately, instead of all at once.

- This allows people to submit their results to https://github.com/cncf/k8s-conformance and thus have certified Kubernetes products. But more importantly, it gives us a standard definition of what a K8s cluster is.

So, should you be running all of these tests when you're triaging a broken cluster issue?

... Well, don't worry, you don't need to run all of them. To quantify just how much of the K8s conformance suite relates to common failure points (storage/network plugins) compared to other, stabler things (like the apiserver, scheduler, ...), I wrote up a tiny program that prints a table of the proportion of time your conformance tests take, broken down by sig tag: https://github.com/jayunit100/k8sprototypes/blob/master/sonoeasy/sonoeasy.py (grab a sample junit test run from the k8s-conformance site, i.e. at https://raw.githubusercontent.com/cncf/k8s-conformance/master/v1.23/eks/junit_01.xml)
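
If you just want a rough per-sig breakdown without running that script, a quick grep over the junit file gets you most of the way there (a sketch; it relies only on the fact that every conformance test name starts with a [sig-...] tag):

# download a sample conformance junit result
curl -sO https://raw.githubusercontent.com/cncf/k8s-conformance/master/v1.23/eks/junit_01.xml

# count how many tests each sig contributes
grep -o '\[sig-[a-z-]*\]' junit_01.xml | sort | uniq -c | sort -rn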



That said, you SHOULD definitely always still run this one sig-apps test:
<testcase name="[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications [Conformance]" classname="Kubernetes e2e suite" time="91.213657066"></testcase>

Because this test is likely to catch CNI-related errors, for example, around IP addresses not being recycled correctly.

REWIND... why are we talking about this?

Ok, so... why are we bothering with this? Because there's a logical, hierarchical way to triage problems in a Kubernetes cluster, and you can easily use tools like Sonobuoy to do it. Think of complaints like:

- "I think DNS is broken inside of my contianers"
- "The nodes in my cluster cannot make tmp/ or emptyDirs for pods"
- "There is a bug in the CNIs ability to route traffic"
- "Loadbalancer outages happen for some of my workloads"
- "I keep getting NXDomain errors for my pods"

Anyway, if you have a bug in a cluster, or think you do, walk through the RED arrows above, one check at a time, like so (you can look at the tags in your tests by simply grepping them out of a file such as https://raw.githubusercontent.com/cncf/k8s-conformance/master/v1.23/eks/junit_01.xml):

i.e. sonobuoy run --e2e-focus='\[sig-apps\] StatefulSet Basic StatefulSet functionality'
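
Once that's kicked off, Sonobuoy's own subcommands will tell you when the run finishes and let you pull the results down:

sonobuoy status                # poll until the run shows "complete"
results=$(sonobuoy retrieve)   # download the results tarball
sonobuoy results $results      # print a pass/fail summary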

PRE_CHECKS

- First, check APIServer connectivity. This is easy with or without Sonobuoy running - kubectl get pods is enough to confirm you can access your APIServer. Without this, you can't even run Sonobuoy to begin with.

- Next, check that CoreDNS is up. If CoreDNS isn't up, then you may not have a working CNI provider! In that case, almost nothing else will work. While you're at it, try scaling your CoreDNS pods from 2 to 3 (kubectl scale deployment coredns -n kube-system --replicas 3).
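
Put together, a minimal pre-check session might look like this (a sketch, assuming a standard cluster where CoreDNS lives in kube-system under the usual k8s-app=kube-dns label):

# confirm the APIServer is reachable at all
kubectl get pods -A

# confirm CoreDNS pods are Running and that the CNI gave them IPs
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

# exercise the scheduler + CNI by adding a replica
kubectl scale deployment coredns -n kube-system --replicas 3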

KUBERNETES E2E CHECKS


(Note: in the examples below I use e2e.test, but the same thing can be done with Sonobuoy; just replace --ginkgo.focus with --e2e-focus and --ginkgo.skip with --e2e-skip. If you want to run the raw e2es instead of Sonobuoy, I maintain an installer, which I stole from Antonio, over here: https://github.com/jayunit100/k8sprototypes/blob/master/e2e/e2e.sh)

- After you know the APIServers are contactable and the CNI is running, you have a few e2e tests which target the kube-proxy. The kube-proxy's job is to write rules that forward traffic from k8s services to pods. A quick smoke test for it is sketched below, and the heavier layer 7 tests are covered in the next bullet.
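
For a direct kube-proxy check, one option is to focus on the basic Services endpoint tests (a sketch; the focus string just matches the service/endpoint tests you can see in the junit listing further down):

e2e.test --provider=local --kubeconfig=/home/ubuntu/.kube/config --dump-logs-on-failure=false --ginkgo.focus="Services should serve" --ginkgo.dryRun=true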

- LoadBalancer or Layer 7 tests - these confirm that ingress or gateway APIs work. If these tests fail, you'll see that your ingress solution (i.e. Contour, or AVI, or ...) isn't working. Or maybe your datacenter has firewall rules that are preventing external IPs from being routed. Some people run ingress controllers on host ports, which requires that your hosts be accessible. If your ingress's binding to a host port fails, then its ability to route incoming traffic from a layer 7 perspective into your cluster will fail, because the ingress itself is not accessible.

The ginkgo tag for these is L7, and there are about 10 of them, i.e.

e2e.test --provider=local --kubeconfig=/home/ubuntu/.kube/config --dump-logs-on-failure=false --ginkgo.focus="L7" --ginkgo.dryRun=true
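
(Note that --ginkgo.dryRun=true just prints the names of the tests that match your focus regex without running anything, so it's a cheap way to sanity-check your focus string before committing to a real run; drop the flag to actually execute the tests.)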

and there are also another 20 or so general load-balancing tests...

e2e.test --provider=local --kubeconfig=/home/ubuntu/.kube/config --dump-logs-on-failure=false --ginkgo.focus="LoadBalancer" --ginkgo.dryRun=true

- EmptyDir and StatefulSets are next up. StatefulSet tests can fail if your CNI doesn't do a good job of reusing IP addresses and rapidly finding them on the same node. EmptyDir failures are often related to things like SELinux labels or other security tools blocking access to kernel functionality. The focus flag for this type of test was shown above.

There are around 30 or so emptyDir/SubPath/fsGroup/tmpfs types of tests that you can run.

e2e.test --provider=local --kubeconfig=/home/ubuntu/.kube/config --dump-logs-on-failure=false --ginkgo.focus="mptyDir" --ginkgo.dryRun=true
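
(The odd-looking "mptyDir" focus is deliberate: since the focus string is a regex, it matches both the "EmptyDir" and "emptyDir" spellings that show up in test names.)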

e2e.test --provider=local --kubeconfig=/home/ubuntu/.kube/config --dump-logs-on-failure=false --ginkgo.focus="Subpath" --ginkgo.dryRun=true

- Finally, there's CSI. Sometimes a CSI will seem to fail because you haven't installed a StorageClass, or you haven't installed all the CSI components (there are many... it's a complicated spec). Other times, the CSI components are installed but can't make API calls to your cloud (i.e. can't make the underlying EBS or vSphere or whatever... types of volumes). The tests with the PVC ginkgo tags are a good measurement for these.

For these, you want to skip the driver-specific tests, so you just run the generic PV and PVC types of tests (the skip regex "river" below matches both "Driver" and "driver")...

 ./e2e.test  --provider=local --kubeconfig=/home/kubo/.kube/config --dump-logs-on-failure=false --ginkgo.focus="VolumeClaim" --ginkgo.skip="river"

You'll see persistent volumes coming up if CSI is properly set up...
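
While those tests run, you can watch the volumes materialize from another terminal (nothing fancy, just stock kubectl):

# is there a (default) StorageClass for the tests to bind against?
kubectl get storageclass

# watch claims get bound and volumes get provisioned
kubectl get pvc -A
kubectl get pv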


MORE RANDOM DATA I COLLECTED ON THIS STUFF..


So, from parsing recent results, it seems as though on EKS you can run 20% of the conformance suite in about a minute, depending on how you select your tests. That is, if you look at https://raw.githubusercontent.com/cncf/k8s-conformance/master/v1.23/eks/junit_01.xml for EKS, you can easily run A TON of conformance tests extremely quickly... for example, the tests below could collectively evaluate aspects of sig-node, sig-api-machinery, sig-network, sig-cli, and sig-auth in something like 30 seconds...

The python program at the top of this post will reproduce these stats for you, but you can just look at them below (I've pasted the sample results from a look at the eks/ runs above)....



So, as an example of how fast many tests run, look at all these tests that run in under 1 second:



[sig-node] Secrets should fail to create secret due to empty secret key [Conformance]  0.155663036
[sig-api-machinery] server version should find the server version [Conformance]  0.157615981
[sig-node] ConfigMap should fail to create ConfigMap with empty key [Conformance]  0.164172996
[sig-network] Services should provide secure master service [Conformance]  0.166908595
[sig-network] Services should find a service from listing all namespaces [Conformance]  0.17186666
[sig-api-machinery] Servers with support for Table transformation should return a 406 for a backend which does not implement metadata [Conformance]  0.172212388
[sig-api-machinery] CustomResourceDefinition resources [Privileged:ClusterAdmin] should include custom resource definition resources in discovery documents [Conformance]  0.178700311
[sig-cli] Kubectl client Proxy server should support --unix-socket=/path [Conformance]  0.179111051
[sig-node] Sysctls [LinuxOnly] [NodeConformance] should reject invalid sysctls [MinimumKubeletVersion:1.21] [Conformance]  0.184012463
[sig-node] Kubelet when scheduling a busybox command that always fails in a pod should be possible to delete [NodeConformance] [Conformance]  0.188583903
[sig-node] Pods Extended Pods Set QOS Class should be set on Pods with matching resource requests and limits for memory and cpu [Conformance]  0.191343331
[sig-cli] Kubectl client Proxy server should support proxy with --port 0 [Conformance]  0.200247087
[sig-cli] Kubectl client Kubectl version should check is all data is printed [Conformance]  0.200461962
[sig-network] EndpointSlice should have Endpoints and EndpointSlices pointing to API Server [Conformance]  0.20276523
[sig-node] ConfigMap should run through a ConfigMap lifecycle [Conformance]  0.202904434
[sig-node] Secrets should patch a secret [Conformance]  0.210351735
[sig-instrumentation] Events should ensure that an event can be fetched  0.210512033
[sig-api-machinery] ResourceQuota should be able to update and delete ResourceQuota. [Conformance]  0.211759457
[sig-api-machinery] Watchers should be able to restart watching from the last resource version observed by the previous watch [Conformance]  0.212458945
[sig-cli] Kubectl client Kubectl api-versions should check if v1 is in available api versions [Conformance]  0.213944743
[sig-api-machinery] Watchers should be able to start watching from a specific resource version [Conformance]  0.216483074
[sig-node] PodTemplates should run the lifecycle of PodTemplates [Conformance]  0.2165391
[sig-node] PodTemplates should delete a collection of pod templates [Conformance]  0.217690885
[sig-storage] ConfigMap should be immutable if `immutable` field is set [Conformance]  0.218364388
[sig-storage] Secrets should be immutable if `immutable` field is set [Conformance]  0.223649127
[sig-node] Lease lease API should be available [Conformance]  0.22748327
[sig-instrumentation] Events API should delete a collection of events [Conformance]  0.228360465
[sig-instrumentation] Events API should ensure that an event can be fetched  0.234803596
[sig-auth] ServiceAccounts should run through the lifecycle of a ServiceAccount [Conformance]  0.235691026
[sig-network] Services should test the lifecycle of an Endpoint [Conformance]  0.238658028
[sig-instrumentation] Events should delete a collection of events [Conformance]  0.241981978
[sig-network] Services should delete a collection of services [Conformance]  0.25657928
[sig-network] IngressClass API should support creating IngressClass API operations [Conformance]  0.257101148
[sig-node] RuntimeClass should support RuntimeClasses API operations [Conformance]  0.265997477
[sig-network] Services should complete a service status lifecycle [Conformance]  0.270138599
[sig-network] EndpointSlice should support creating EndpointSlice API operations [Conformance]  0.285983347
[sig-cli] Kubectl client Kubectl cluster-info should check if Kubernetes control plane services is included in cluster-info [Conformance]  0.304683995
[sig-network] Ingress API should support creating Ingress API operations [Conformance]  0.319593101
[sig-api-machinery] Namespaces [Serial] should patch a Namespace [Conformance]  0.323458295
[sig-apps] CronJob should support CronJob API operations [Conformance]  0.370868758
[sig-api-machinery] Discovery should validate PreferredVersion for each APIGroup [Conformance]  0.637044312
[sig-api-machinery] CustomResourceDefinition resources [Privileged:ClusterAdmin] Simple CustomResourceDefinition getting/updating/patching custom resource definition status sub-resource works [Conformance]  0.707887152
[sig-cli] Kubectl client Kubectl diff should check if kubectl diff finds a difference for Deployments [Conformance]  0.816452287
[sig-auth] ServiceAccounts should allow opting out of API token automount [Conformance]  0.856893351
[sig-auth] Certificates API [Privileged:ClusterAdmin] should support CSR API operations [Conformance]  0.86773032

LONG RUNNING TESTS


[sig-apps] Job should run a job to completion when tasks sometimes fail and are locally restarted [Conformance]  10.17500616
[sig-apps] ReplicaSet should serve a basic image on each replica with a public image [Conformance]  10.21267254
[sig-api-machinery] Garbage collector should delete pods created by rc when not orphaning [Conformance]  10.21614485
[sig-apps] ReplicationController should serve a basic image on each replica with a public image [Conformance]  10.23349802
[sig-api-machinery] Watchers should observe an object deletion if it stops meeting the requirements of the selector [Conformance]  10.24681852
[sig-apps] Daemon set [Serial] should rollback without unnecessary restarts [Conformance]  10.31510448
[sig-apps] DisruptionController should block an eviction until the PDB is updated to allow it [Conformance]  10.390905
[sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of same group and version but different kinds [Conformance]  11.06984923
[sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of different groups [Conformance]  11.08131781
[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a replication controller. [Conformance]  11.21695587
[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a replica set. [Conformance]  11.22082791
[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a service. [Conformance]  11.38315219
[sig-network] DNS should provide DNS for ExternalName services [Conformance]  11.46801213
[sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] removes definition from spec when one version gets changed to not be served [Conformance]  11.93622644
[sig-network] Services should have session affinity work for NodePort service [LinuxOnly] [Conformance]  12.20273115
[sig-api-machinery] Garbage collector should keep the rc around until all its pods are deleted if the deleteOptions says so [Conformance]  12.21750599
[sig-network] Services should serve a basic endpoint from pods [Conformance]  12.32925287
[sig-network] DNS should resolve DNS of partial qualified names for services [LinuxOnly] [Conformance]  12.64054188
[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a pod. [Conformance]  13.31501141
[sig-api-machinery] Namespaces [Serial] should ensure that all pods are removed when a namespace is deleted [Conformance]  13.48462069
[sig-cli] Kubectl client Update Demo should scale a replication controller [Conformance]  13.68950916
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should be able to deny pod and configmap creation [Conformance]  13.87319204
[sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] updates the published spec when one version gets renamed [Conformance]  14.15743553
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] Should recreate evicted statefulset [Conformance]  14.42719602
[sig-network] Networking Granular Checks: Pods should function for intra-pod communication: udp [NodeConformance] [Conformance]  15.33207411
[sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]  15.52827386
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should honor timeout [Conformance]  15.87803072
[sig-api-machinery] ResourceQuota should verify ResourceQuota with best effort scope. [Conformance]  16.31443937
[sig-api-machinery] ResourceQuota should verify ResourceQuota with terminating scopes. [Conformance]  16.32086439
[sig-api-machinery] Garbage collector should not delete dependents that have both valid owner and owner that's waiting for dependents to be deleted [Conformance]  16.8917778
[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a secret. [Conformance]  17.24929131
[sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of same group but different versions [Conformance]  19.82316572
[sig-api-machinery] Watchers should observe add  20.22437891
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should have a working scale subresource [Conformance]  20.27918763
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should list  20.28188958
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should validate Statefulset Status endpoints [Conformance]  20.33999605
[sig-apps] Deployment deployment should support rollover [Conformance]  21.42048895
[sig-node] Probing container with readiness probe should not be ready before initial delay and never restart [NodeConformance] [Conformance]  22.20182905
[sig-node] Probing container should be restarted with a /healthz http liveness probe [NodeConformance] [Conformance]  22.31884852
[sig-storage] Subpath Atomic writer volumes should support subpaths with secret pod [Excluded:WindowsDocker] [Conformance]  24.3370857
[sig-storage] Subpath Atomic writer volumes should support subpaths with downward pod [Excluded:WindowsDocker] [Conformance]  24.366907
[sig-storage] Subpath Atomic writer volumes should support subpaths with projected pod [Excluded:WindowsDocker] [Conformance]  24.37687616
[sig-storage] Subpath Atomic writer volumes should support subpaths with configmap pod [Excluded:WindowsDocker] [Conformance]  24.38001167
[sig-storage] Subpath Atomic writer volumes should support subpaths with configmap pod with mountPath of existing file [Excluded:WindowsDocker] [Conformance]  24.39562544
[sig-network] Networking Granular Checks: Pods should function for intra-pod communication: http [NodeConformance] [Conformance]  25.32674205
[sig-network] Networking Granular Checks: Pods should function for node-pod communication: http [LinuxOnly] [NodeConformance] [Conformance]  25.43242362
[sig-node] Container Runtime blackbox test when starting a container that exits should run with the expected status [NodeConformance] [Conformance]  27.57432488
[sig-api-machinery] ResourceQuota should create a ResourceQuota and capture the life of a configMap. [Conformance]  28.23998371
[sig-network] EndpointSlice should create Endpoints and EndpointSlices for Pods matching a Service [Conformance]  30.38298475
[sig-network] Services should have session affinity timeout work for NodePort service [LinuxOnly] [Conformance]  31.56373826
[sig-network] Services should have session affinity timeout work for service with type clusterIP [LinuxOnly] [Conformance]  31.65262952
[sig-auth] ServiceAccounts ServiceAccountIssuerDiscovery should support OIDC discovery of service account issuer [Conformance]  34.26180581
[sig-apps] Job should delete a job [Conformance]  34.55714629
[sig-node] Variable Expansion should succeed in writing subpaths in container [Slow] [Conformance]  34.9252411
[sig-network] Networking Granular Checks: Pods should function for node-pod communication: udp [LinuxOnly] [NodeConformance] [Conformance]  35.5197319
[sig-apps] Daemon set [Serial] should update pod when spec was updated and update strategy is RollingUpdate [Conformance]  35.72246217
[sig-node] InitContainer [NodeConformance] should not start app containers if init containers fail on a RestartAlways pod [Conformance]  41.06649579
[sig-api-machinery] Garbage collector should orphan pods created by rc if delete options say so [Conformance]  42.88550429
[sig-node] Probing container should be restarted with a exec "cat /tmp/health" liveness probe [NodeConformance] [Conformance]  52.5085755
[sig-storage] EmptyDir wrapper volumes should not cause race condition when used for configmaps [Serial] [Conformance]  57.8745077
[sig-node] Probing container with readiness probe that fails should never be ready and never restart [NodeConformance] [Conformance]  60.17892246
[sig-scheduling] SchedulerPreemption [Serial] PriorityClass endpoints verify PriorityClass endpoints can be operated with different HTTP methods [Conformance]  60.57668914
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] Burst scaling should run to completion even with unhealthy pods [Slow] [Conformance]  61.63529355
[sig-api-machinery] CustomResourceDefinition Watch [Privileged:ClusterAdmin] CustomResourceDefinition Watch watch on custom resource definition objects [Conformance]  63.36940407
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should perform canary updates and phased rolling updates of template modifications [Conformance]  70.6210827
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] Scaling should happen in predictable order and halt if any stateful pod is unhealthy [Slow] [Conformance]  71.78964682
[sig-apps] CronJob should schedule multiple jobs concurrently [Conformance]  72.1990144
[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance]  77.04147942
[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance]  77.04958792
[sig-apps] CronJob should replace jobs when ReplaceConcurrent [Conformance]  82.20141649
[sig-storage] Projected configMap optional updates should be reflected in volume [NodeConformance] [Conformance]  82.89887111
[sig-scheduling] SchedulerPreemption [Serial] PreemptionExecutionPath runs ReplicaSets to verify preemption running path [Conformance]  83.76246018
[sig-node] NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds [Disruptive] [Conformance]  88.48009067
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications [Conformance]  91.21365707
[sig-node] NoExecuteTaintManager Single Pod [Serial] removing taint cancels eviction [Disruptive] [Conformance]  135.5091157
[sig-node] Probing container should have monotonically increasing restart count [NodeConformance] [Conformance]  142.8706649
[sig-node] Variable Expansion should verify that a failing subpath expansion can be modified during the lifecycle of a container [Slow] [Conformance]  154.8033392
[sig-node] Probing container should *not* be restarted with a tcp:8080 liveness probe [NodeConformance] [Conformance]  243.2846658
[sig-node] Probing container should *not* be restarted with a exec "cat /tmp/health" liveness probe [NodeConformance] [Conformance]  243.363388
[sig-node] Probing container should *not* be restarted with a /healthz http liveness probe [NodeConformance] [Conformance]  243.5150772
[sig-apps] CronJob should not schedule jobs when suspended [Slow] [Conformance]  300.194509
[sig-scheduling] SchedulerPredicates [Serial] validates that there exists conflict between pods with same hostPort and protocol but one using 0.0.0.0 hostIP [Conformance]  304.4533643
[sig-apps] CronJob should not schedule new jobs when ForbidConcurrent [Slow] [Conformance]  326.2101454




NEXT?


It would be interesting to compare vendors' conformance submissions and see how much variation there is in the performance of the tests across different types of infrastructure.


..... So, what are the SLOWEST TESTS, grouped by sig? We can run the sonoeasy.py program to see, looking at the EKS conformance results (columns: shortest, average, standard deviation, longest, test count, and total time per sig):


➜  sonoeasy git:(5a27286) ✗ python3 sonoeasy.py 

#document

*****************************

                shrt     avg     dev      long    cnt     totaltime(s)
sig-cli          0.2s    3.8     3.7      13.7s   17      65.3
sig-node         0.2s    22.9    53.9     243.5s  77      1760.1
sig-apps         0.4s    30.6    64.7     326.2s  47      1439.6
sig-auth         0.2s    6.4     12.4     34.3s   7       44.8
sig-api-machin   0.2s    9.1     10.2     63.4s   63      571.0
sig-instrum      0.2s    0.2     0.0      0.2s    4       0.9
sig-scheduling   1.3s    69.1    95.1     304.5s  9       622.0
sig-storage      0.2s    7.1     11.5     82.9s   81      574.0
sig-netw         0.2s    9.4     9.8      35.5s   41      385.1


longest sig-cli : [sig-cli] Kubectl client Update Demo should scale a replication controller [Conformance] 13.689509157
longest sig-node : [sig-node] Probing container should *not* be restarted with a /healthz http liveness probe [NodeConformance] [Conformance] 243.515077214
longest sig-apps : [sig-apps] CronJob should not schedule new jobs when ForbidConcurrent [Slow] [Conformance] 326.2
longest sig-auth : [sig-auth] ServiceAccounts ServiceAccountIssuerDiscovery should support OIDC discovery of service account issuer [Conformance] 34.261805814
longest sig-api-machin : [sig-api-machinery] CustomResourceDefinition Watch [Privileged:ClusterAdmin] CustomResourceDefinition Watch watch on custom resource definition objects [Conformance] 63.369404072
longest sig-instrum : [sig-instrumentation] Events should delete a collection of events [Conformance] 0.241981978
longest sig-scheduling : [sig-scheduling] SchedulerPredicates [Serial] validates that there exists conflict between pods with same hostPort and protocol but one using 0.0.0.0 hostIP [Conformance] 304.453364309
longest sig-storage : [sig-storage] Projected configMap optional updates should be reflected in volume [NodeConformance] [Conformance] 82.898871112
longest sig-netw : [sig-network] Networking Granular Checks: Pods should function for node-pod communication: udp [LinuxOnly] [NodeConformance] [Conformance] 35.519731895
