Update!!! After writing this, I learned that you need to enable proxyAll for Antrea to proxy all traffic. Turns out (thanks to Quan Tian on the Antrea team) that:
- AntreaProxy handles Service traffic for regular (CNI-managed) pods by default.
- It CAN handle hostNetwork pods and external traffic too, though.
- kube-proxy's iptables rules will intercept that traffic before it ever gets to OVS, so if kube-proxy is enabled, AntreaProxy load balancing for it is redundant and irrelevant.
- To replace kube-proxy you need to set kubeAPIServerOverride and proxyAll: true for Antrea, so that it can (1) talk directly to the apiserver instead of using the internal kubernetes Service endpoint and (2) proxy all traffic instead of only internal ClusterIP traffic (see the sketch after this list).
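Here's a minimal sketch of what that looks like, assuming an upstream-style Antrea install (the ConfigMap and DaemonSet names, the apiserver address, and the exact field placement are assumptions that vary by Antrea version; in recent releases proxyAll is nested under an antreaProxy section):

kubectl -n kube-system edit configmap antrea-config
#   antrea-agent.conf: |
#     # Talk straight to the apiserver, not the kubernetes ClusterIP Service:
#     kubeAPIServerOverride: "https://<control-plane-ip>:6443"
#     antreaProxy:
#       # Proxy ALL Service traffic, incl. hostNetwork pods and external:
#       proxyAll: true
# Restart the agents so they pick up the new config:
kubectl -n kube-system rollout restart daemonset antrea-agent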
NOW THAT SAID
If you have OLD IPTABLES RULES in a cluster for Services, then those Services might accidentally load balance you to pods that don't exist anymore. Deleting kube-proxy won't automatically delete its iptables rules!!!
-A KUBE-SVC-SAL3JMSY3XQSA64S -m comment --comment "tkg-system/tkr-resolver-cluster-webhook-service -> 100.96.1.15:9443" -j KUBE-SEP-APPCGWWBW4GQN2HK
-A KUBE-SVC-SZBZVVMNBX2D3VFK ! -s 100.96.0.0/11 -d 100.68.215.151/32 -p tcp -m comment --comment "capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-webhook-service cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-SZBZVVMNBX2D3VFK -m comment --comment "capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-webhook-service -> 100.96.1.8:9443" -j KUBE-SEP-Z5LDPDXWJ7G5IJIQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU ! -s 100.96.0.0/11 -d 100.64.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns -> 100.96.0.3:53" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ONT5M3GKZCUX63CA
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns -> 100.96.0.4:53" -j KUBE-SEP-X6Z4D2PLM3E4AFD7
-A KUBE-SVC-XI3EV6PAOLGPKNCY ! -s 100.96.0.0/11 -d 100.68.64.20/32 -p tcp -m comment --comment "tkg-system/tkr-conversion-webhook-service cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-XI3EV6PAOLGPKNCY -m comment --comment "tkg-system/tkr-conversion-webhook-service -> 100.96.1.14:9443" -j KUBE-SEP-EMM6VPN6RQ4BTZVT
-A KUBE-SVC-Z4ANX4WAEWEBLCTM ! -s 100.96.0.0/11 -d 100.70.166.34/32 -p tcp -m comment --comment "kube-system/metrics-server:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-Z4ANX4WAEWEBLCTM -m comment --comment "kube-system/metrics-server:https -> 100.96.1.4:4443" -j KUBE-SEP-FY6XAK6CPXJEDB4L
-A KUBE-SVC-ZUD4L6KQKCHD52W4 ! -s 100.96.0.0/11 -d 100.71.102.80/32 -p tcp -m comment --comment "cert-manager/cert-manager-webhook:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-ZUD4L6KQKCHD52W4 -m comment --comment "cert-manager/cert-manager-webhook:https -> 100.96.1.5:10250" -j KUBE-SEP-WTGQJIGDMYOJ6KTB
These were all still floating around in my cluster AFTER I deleted kube-proxy.
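To actually get rid of them, here's a hedged cleanup sketch: kube-proxy ships a --cleanup mode that flushes the KUBE-* chains it created. The binary path, and whether you run it directly or from the kube-proxy container image, varies by distro.

# Run once on each node, after deleting the kube-proxy DaemonSet:
kube-proxy --cleanup
# Verify nothing is left over:
iptables-save | grep -c KUBE-SVC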
Here's my original debugging below; you might find it useful if you're trying to investigate failures in kube-proxy, kpng, AntreaProxy, or other changes to your Kubernetes Service proxying architecture.
I was seeing this in a cluster today:
E0712 12:12:14.883784 1 cacher.go:450] cacher (vspheremachinetemplates.infrastructure.cluster.x-k8s.io): unexpected ListAndWatch error: failed to list infrastructure.cluster.x-k8s.io/v1alpha4, Kind=VSphereMachineTemplate: conversion webhook for infrastructure.cluster.x-k8s.io/v1beta1, Kind=VSphereMachineTemplate failed: Post "https://capv-webhook-service.capv-system.svc:443/convert?timeout=30s": context deadline exceeded (Client.Timeout exceeded while awaiting headers); reinitializing...
W0712 12:12:18.570987 1 dispatcher.go:187] Failed calling webhook, failing closed validation.vspheremachinetemplate.infrastructure.x-k8s.io: failed calling webhook "validation.vspheremachinetemplate.infrastructure.x-k8s.io": failed to call webhook: Post "
These errors seemed to only happen on:
- Post "https://capv-webhook-service.capv-system.svc:443/convert?timeout=30s" ...
- Post "https://capi-webhook-service.capi-system.svc:443/convert?timeout=30s"...
So what is going on here? Why are these webhooks failing? The Services exist:
capi-system capi-webhook-service ClusterIP 100.70.159.142 <none> 443/TCP 3d9h cluster.x-k8s.io/provider=cluster-api
capv-system capv-webhook-service ClusterIP 100.69.250.28 <none> 443/TCP 3d9h cluster.x-k8s.io/provider=infrastructure-vsphere
Let's look at the Endpoints...
kubo@uOFLhGS9YBJ3y:~$ kubectl get endpoints -o wide -A
E0712 12:16:36.748680 513955 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0712 12:16:36.752015 513955 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0712 12:16:36.755788 513955 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0712 12:16:36.759763 513955 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAMESPACE     NAME                   ENDPOINTS           AGE
...
capi-system capi-webhook-service 100.96.0.142:9443 3d10h
capv-system capv-webhook-service 100.96.0.134:9443 3d10h
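The Endpoints exist, so the next thing I'd check is whether a leftover kube-proxy rule is still grabbing these Services' ClusterIPs before the traffic ever reaches OVS (the IPs below are the ones from the Service listing above):

iptables-save -t nat | grep 100.70.159.142   # capi-webhook-service ClusterIP
iptables-save -t nat | grep 100.69.250.28    # capv-webhook-service ClusterIP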
Side note: to clear the metrics.k8s.io errors polluting the output above (that APIService no longer had a healthy backend), I deleted the stale APIService:
kubectl delete apiservice v1beta1.metrics.k8s.io