Updating Tanzu/CAPI to support wide deployments: 3 parameters
Preventing MHC from killing a healthy node on a disconnected or slow network

Setting MHC_FALSE_STATUS_TIMEOUT

The cluster configuration file below adds a new parameter, MHC_FALSE_STATUS_TIMEOUT, set to a large value such as 40 minutes. This quadruples the amount of time CAPI machine health checks (MHCs) wait before assuming a node is unhealthy and recreating it, which increases tolerance for long disconnections or flaky networks. The parameter goes in the YAML file that you use to define your cluster, i.e., the YAML file that you pass as input to tanzu cluster create.

    CLUSTER_PLAN: prod
    CNI: antrea
    INFRASTRUCTURE_PROVIDER: vsphere
    KUBERNETES_VERSION: v1.23.8+vmware.2
    OS_ARCH: amd64
    OS_NAME: photon
    OS_VERSION: '3'
    _VSPHERE_CONTROL_PLANE_ENDPOINT: 10.92.160.149
    MHC_FALSE_STATUS_TIMEOUT: 40m
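To make the effect concrete, here is a rough sketch of the MachineHealthCheck object that a setting like this would be expected to produce on the management cluster. It assumes (this is not stated above) that Tanzu renders MHC_FALSE_STATUS_TIMEOUT into the timeout of the Ready=False unhealthy condition. The cluster name my-edge-cluster, the namespace, and the selector label are hypothetical placeholders.

```yaml
# Sketch only: assumes MHC_FALSE_STATUS_TIMEOUT is rendered into the
# Ready=False condition below. Names, namespace, and labels are hypothetical.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: my-edge-cluster            # hypothetical
  namespace: default               # hypothetical
spec:
  clusterName: my-edge-cluster     # hypothetical
  selector:
    matchLabels:
      node-pool: my-edge-cluster-worker-pool   # hypothetical label
  unhealthyConditions:
  - type: Ready
    status: "False"
    timeout: 40m                   # corresponds to MHC_FALSE_STATUS_TIMEOUT: 40m
```

If the rendering works as assumed, listing the MachineHealthCheck objects on the management cluster (for example with kubectl get machinehealthcheck -A -o yaml) after creating the cluster should show the 40 minute timeout on the Ready=False condition.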
Preventing MHC from killing a node before it is born

Setting NODE_STARTUP_TIMEOUT

Raising MHC_FALSE_STATUS_TIMEOUT was important to prevent machine health checks from deleting a healthy node, but sometimes a node simply takes a long time to come up. In that case, MHCs can delete machines pre-emptively (on the assumption that something went wrong during bootstrapping). In edge scenarios, you may want this timeout to be more forgiving.

Below, we quadruple the timeout from 15 minutes to 60, similar to what we did for MHC_FALSE_STATUS_TIMEOUT.

    CLUSTER_PLAN: prod
    CNI: antrea
    INFRASTRUCTURE_PROVIDER: vsphere
    KUBERNETES_VERSION: v1.23.8+vmware.2
    OS_ARCH: amd64
    OS_NAME: photon
    OS_VERSION: '3'
    _VSPHERE_CONTROL_PLANE_ENDPOINT: 10.92.160.149
    NODE_STARTUP_TIMEOUT: 60m
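For orientation, the Cluster API MachineHealthCheck field that governs how long a machine may take to join as a node is spec.nodeStartupTimeout. The fragment below is a minimal sketch, assuming Tanzu passes NODE_STARTUP_TIMEOUT straight through to that field. The cluster name is a hypothetical placeholder, and the selector and unhealthyConditions fields that a complete object needs are left out.

```yaml
# Fragment only: assumes NODE_STARTUP_TIMEOUT maps to spec.nodeStartupTimeout.
# selector and unhealthyConditions are omitted; the name is hypothetical.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: my-edge-cluster            # hypothetical
spec:
  clusterName: my-edge-cluster     # hypothetical
  nodeStartupTimeout: 60m          # corresponds to NODE_STARTUP_TIMEOUT: 60m
```

With a value like this, a machine that has not registered as a node would only be treated as failed, and become a candidate for remediation, after the full 60 minutes have passed.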
Preventing etcd clients on the management cluster from prematurely failing while checking the health of etcd on the workload clusters

Setting etcd-dial-timeout-duration in CAPV management clusters

Edit the kubeadm control plane controller deployment on the management cluster:

    kubectl edit deployment capi-kubeadm-control-plane-controller-manager -n capi-kubeadm-control-plane-system

Set the --etcd-dial-timeout-duration argument to a long value, e.g. 40s. The default is normally 10 seconds, so this quadruples the value, similar to what was done for the other parameters above.

    - args:
      - --leader-elect
      - --metrics-bind-addr=localhost:8080
      - --feature-gates=ClusterTopology=false
      - --etcd-dial-timeout-duration=40s
      command:
      - /manager
      image: projects.registry.vmware.com/tkg/cluster-api/kubeadm-control-plane-controller:v1.0.1_vmware.1
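When editing the deployment it can be easy to lose track of where the argument list sits. The sketch below shows only the path down to the args of the controller container, with everything else omitted. The namespace shown is the upstream Cluster API default and is an assumption; use whichever namespace the deployment actually lives in on your management cluster.

```yaml
# Sketch of where the flag lives inside the Deployment; unrelated fields omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capi-kubeadm-control-plane-controller-manager
  namespace: capi-kubeadm-control-plane-system   # assumed upstream CAPI default
spec:
  template:
    spec:
      containers:
      - name: manager
        command:
        - /manager
        args:
        - --leader-elect
        - --etcd-dial-timeout-duration=40s       # default is 10s
```

Because this edits a live object rather than the cluster configuration file, the change may be overwritten when the management cluster is upgraded or the providers are reinstalled, so it is worth checking the argument again afterwards.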