Possible Issues and Solutions in Kubernetes, Docker, and Containerd
During Docker, Containerd, and Kubernetes installation and usage, you may encounter various issues. This page collects examples of commonly encountered situations and their solutions.
Problem | A server removed from a Kubernetes Cluster does not get the correct overlay network IP when added to another Kubernetes Cluster. |
---|---|
Reason/Cause | The Flannel files and settings used for the pod overlay network remain on the server and must be deleted manually. |
Solution | On the Control-plane Kubernetes node, remove the server from the cluster with "kubectl delete node <NODENAME>", then clear the cluster settings on the server being removed with the "sudo kubeadm reset" command. Then run the following:
systemctl stop kubelet && systemctl stop containerd
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down && ip link delete cni0
ifconfig flannel.1 down && ip link delete flannel.1
systemctl restart containerd && systemctl restart kubelet |
Problem | Error while installing docker on Centos 8.3.x servers |
---|---|
Reason/Cause | With the release of RHEL 8 and CentOS 8, the docker package was removed from the default package repositories and replaced by podman and buildah. Red Hat decided not to provide official support for Docker, and these pre-installed packages block the docker installation. |
Solution | yum remove podman* -y
yum remove buildah* -y |
Problem | kubeadm error: "kubelet isn't running or healthy and connection refused" |
---|---|
Reason/Cause | On Linux operating systems, "swap" and "selinux", which are usually enabled by default, must be turned off for kubelet to run. |
Solution | sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
sudo reboot
kubeadm reset
kubeadm init --ignore-preflight-errors all |
Problem | deleting namespace stuck at "Terminating" state |
---|---|
Reason/Cause | Deleting a namespace gets stuck in the "Terminating" state. |
Solution | kubectl get namespace "<NAMESPACE>" -o json | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" | kubectl replace --raw /api/v1/namespaces/<NAMESPACE>/finalize -f - |
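The sed filter in the pipeline above can be sanity-checked locally on sample namespace JSON before running it against the API server. The JSON below is an illustrative minimal namespace object, not taken from a real cluster:

```shell
# Strip the finalizers array from sample namespace JSON (GNU sed),
# mirroring the transformation used in the solution above
echo '{"kind":"Namespace","spec":{"finalizers": ["kubernetes"]}}' \
  | tr -d "\n" \
  | sed 's/"finalizers": \[[^]]\+\]/"finalizers": []/'
# prints {"kind":"Namespace","spec":{"finalizers": []}}
```

Once the filter behaves as expected, the full pipeline replaces the namespace's finalize subresource so Kubernetes can complete the deletion.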
Problem | "x509 certificate" issue during docker pull |
---|---|
Reason/Cause | If the institution does not use HTTPS, the registry must be listed as an insecure registry in Docker's daemon.json file. This process is repeated on all nodes that use Docker. |
Solution | $ sudo vi /etc/docker/daemon.json |
Reason/Cause | If the institution uses HTTPS, its SSL certificate ("crt") must be added to the servers' trusted certificate store. |
Solution | cp ssl.crt /usr/local/share/ca-certificates/
sudo update-ca-trust extract |
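For the non-HTTPS case above, the daemon.json entry itself is not shown in the original. A minimal sketch is below; the registry hostname is an illustrative assumption:

```json
{
  "insecure-registries": ["registry.institutionaddress.com.tr"]
}
```

After editing /etc/docker/daemon.json, restart Docker with `sudo systemctl restart docker` so the change takes effect.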
Problem | If Nexus proxy is in use |
---|---|
Reason/Cause | If the institution uses a Nexus proxy, servers running Docker are directed to this address. |
Solution | $ sudo vi /etc/docker/daemon.json
{
  "data-root": "/docker-data",
  "insecure-registries": ["nexusdocker.institutionaddress.com.tr"],
  "registry-mirrors": ["https://nexusdocker.institutionaddress.com.tr"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
# Restart Docker after the change:
sudo systemctl restart docker |
Problem | Kubernetes DNS Problem (connection timed out; no servers could be reached) |
---|---|
Reason/Cause | The node stays in the Ready,SchedulingDisabled state and DNS queries time out. |
Test | Run a DNS lookup from inside a pod. If the reply comes from the cluster DNS service (Server: 10.96.0.10), DNS is working; if the lookup times out, there is an error and the steps below need to be checked. Also check the pod's /etc/resolv.conf file: the correct entry is "nameserver 10.96.0.10"; any other nameserver value indicates the problem. |
Solution | At one installation, the problem was resolved by adding the institution's domain to the /etc/resolv.conf file: search institution.gov.tr |
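The resolv.conf check described above can be scripted. The helper below is a hypothetical illustration, not from the original; the cluster DNS IP 10.96.0.10 is taken from the test above:

```shell
# Hypothetical helper: report whether a resolv.conf file points at the
# cluster DNS service IP (10.96.0.10, as in the test above)
check_resolv() {
  if grep -q '^nameserver 10\.96\.0\.10$' "$1"; then
    echo "correct"
  else
    echo "false"
  fi
}

# Example usage (run inside a pod):
# check_resolv /etc/resolv.conf
```

A "false" result means the nameserver entry needs to be corrected as described in the solution.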
Problem | On Ubuntu servers with Kubernetes Clusters, HOST names cannot be resolved due to changes in DNS settings not being reflected in the /etc/resolv.conf file |
---|---|
Reason/Cause | On Ubuntu servers, changes made to the DNS server settings may not always be reflected in /etc/resolv.conf. Since Kubernetes by default falls back to the node's /etc/resolv.conf after its own internal DNS, you should make sure this file is correct. |
Solution | On all servers:
sudo rm /etc/resolv.conf
sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
sudo systemctl restart systemd-resolved
ls -l /etc/resolv.conf
cat /etc/resolv.conf
Only on the master server:
kubectl -n kube-system rollout restart deployment coredns |
Problem | docker: Error response from daemon: Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "institutionCertificateName-CA"). |
---|---|
Reason/Cause | The firewall performs SSL inspection and injects its own certificate into the connection. |
Solution | Add docker.io to the "SSL inspection exception" list on the firewall. |
Problem | Node stuck at status NotReady with the error message: "Unable to update cni config: no networks found in /etc/cni/net.d" |
---|---|
Reason/Cause | On the master, kube-flannel fails to create the required folders and files. |
Solution | (Alternative solutions: https://github.com/kubernetes/kubernetes/issues/54918)
$ sudo mkdir -p /etc/cni/net.d
$ sudo vi /etc/cni/net.d/10-flannel.conflist
# Add one of the following (the second variant sets cniVersion explicitly):
{"name": "cbr0","plugins": [{"type": "flannel","delegate": {"hairpinMode": true,"isDefaultGateway": true}},{"type": "portmap","capabilities": {"portMappings": true}}]}
{"name": "cbr0","cniVersion": "0.3.1","plugins": [{"type": "flannel","delegate": {"isDefaultGateway": true}},{"type": "portmap","capabilities": {"portMappings": true}}]}
sudo chmod -Rf 777 /etc/cni /etc/cni/*
sudo chown -Rf apinizer:apinizer /etc/cni /etc/cni/*
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Check whether there is still a pod that cannot pull an image:
kubectl get pods -n kube-system
kubectl describe pod <podName> -n kube-system |
Problem | Client certificates generated by kubeadm expire after 1 year - "internal server error. Error Detail: operation: [list] for kind: [pod] with name: [null] in namespace: [prod] failed" |
---|---|
Reason/Cause | Unable to connect to the server: x509: certificate has expired or is not yet valid |
Solution | #These operations should be done on all master servers. |
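The renewal commands themselves are missing above. A typical sequence for a kubeadm cluster (v1.19 or newer; older versions use "kubeadm alpha certs ...") is sketched below as an assumption, not taken from the original:

```shell
# Back up the existing certificates first
sudo cp -r /etc/kubernetes/pki /etc/kubernetes/pki.bak

# See which certificates have expired
sudo kubeadm certs check-expiration

# Renew all certificates
sudo kubeadm certs renew all

# Restart the control-plane static pods so they pick up the new certificates
# (moving the manifests out and back forces kubelet to recreate them)
sudo mv /etc/kubernetes/manifests /etc/kubernetes/manifests.tmp
sleep 20
sudo mv /etc/kubernetes/manifests.tmp /etc/kubernetes/manifests

# Refresh the admin kubeconfig used by kubectl
mkdir -p $HOME/.kube
sudo cp -f /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

Repeat these steps on every master server, as the note above says.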
Problem | The connection to the server x.x.x.:6443 was refused - did you specify the right host or port? |
---|---|
Reason/Cause | This problem can occur for several reasons; most commonly the kube-apiserver is unreachable because swap was re-enabled (for example after a reboot) or the kubeconfig under $HOME/.kube is missing. |
Solution | sudo swapoff -a
sudo vi /etc/fstab   # the swap line must be commented out or deleted
mkdir -p $HOME/.kube
sudo reboot (optional) |
Problem | kubelet.service: Main process exited, code=exited, status=255 |
---|---|
Reason/Cause | Although this problem has various causes, if the error says that /etc/kubernetes/bootstrap-kubelet.conf or another .conf file cannot be found, all configs can be recreated from scratch with the procedure below. |
Solution | # Existing configs and certificates are backed up, then the operations below are performed:
cd /etc/kubernetes/pki/
sudo kubeadm init phase certs all --apiserver-advertise-address <MasterIP>
sudo kubeadm init phase kubeconfig all |
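The backup step mentioned above is not spelled out. The helper below is a hypothetical illustration of one way to set existing kubeconfig files aside before regenerating them; the file names follow kubeadm's defaults:

```shell
# Hypothetical helper (not from the original): move existing kubeconfig
# files aside with a timestamp suffix before regenerating them with kubeadm
backup_kubeconfigs() {
  local dir=${1:-/etc/kubernetes} ts
  ts=$(date +%Y%m%d%H%M%S)
  for f in admin.conf kubelet.conf controller-manager.conf scheduler.conf; do
    if [ -f "$dir/$f" ]; then
      mv "$dir/$f" "$dir/$f.$ts.bak"
    fi
  done
}

# Example usage on a master node:
# sudo bash -c "$(declare -f backup_kubeconfigs); backup_kubeconfigs"
```

With the old files out of the way, "kubeadm init phase kubeconfig all" writes fresh configs without clobbering the backups.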
Problem | ctr: failed to verify certificate: x509: certificate is not valid |
---|---|
Reason/Cause | This problem occurs when images are pulled from a private registry without a trusted certificate. |
Solution | The workaround is ctr's --skip-verify parameter. For example, the command to pull into the "k8s.io" namespace: ctr --namespace k8s.io images pull --skip-verify xxx.harbor.com/apinizercloud/managerxxxx |
Problem | Failure to distribute pods evenly |
---|---|
Reason/Cause | Kubernetes does not distribute pods evenly by default: the scheduler places each pod on the node it deems most suitable based on available resources, without any spreading strategy or constraint. |
Solution | Distribute the pods evenly using Pod Topology Spread Constraints by adding a topologySpreadConstraints block to the pod template (the second "spec" section) of the Deployment YAML. |
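A minimal sketch of such a Deployment is below; the name, labels, image, and replica count are illustrative assumptions, not taken from the original:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:                                # the "second spec" section mentioned above
      topologySpreadConstraints:
        - maxSkew: 1                     # max allowed difference in pod count between nodes
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: myapp
      containers:
        - name: myapp
          image: nginx                   # illustrative image
```

With maxSkew: 1 and topologyKey: kubernetes.io/hostname, the scheduler keeps the pod count per node within one of each other; DoNotSchedule makes the constraint hard rather than best-effort.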
Problem | Non-Graceful Node Shutdown in Kubernetes |
---|---|
Reason/Cause | When a node in Kubernetes shuts down unexpectedly (Non-Graceful Shutdown), the Kubernetes Master detects this situation and takes necessary actions. However, this detection process may be delayed because it depends on the timeout parameters of the system. |
Solution | The main parameters to take into account when tuning this duration are:
1. Node Status Update Frequency - how often the kubelet reports node status to the control plane (kubelet setting, default 10s)
2. Node Monitor Grace Period - how long the kube-controller-manager waits before marking an unresponsive node NotReady (default 40s)
3. Pod Eviction Timeout - how long the kube-controller-manager waits before evicting pods from an unreachable node (default 5m) |
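As a sketch for a kubeadm cluster (the values below are examples, with the defaults noted in comments; file paths assume kubeadm's standard layout):

```yaml
# 1. Node Status Update Frequency - kubelet config (/var/lib/kubelet/config.yaml)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
nodeStatusUpdateFrequency: 5s   # default: 10s
---
# 2./3. kube-controller-manager flags, set under "command" in
# /etc/kubernetes/manifests/kube-controller-manager.yaml:
#   - --node-monitor-grace-period=20s   # default: 40s
#   - --pod-eviction-timeout=1m         # default: 5m; on clusters using taint-based
#                                       # evictions, the pods' tolerationSeconds
#                                       # (default 300s) controls the actual delay
```

After editing, restart the kubelet with `sudo systemctl restart kubelet`; the kube-controller-manager static pod is recreated automatically when its manifest changes.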
The problems addressed on this page are:

- A server removed from a Kubernetes Cluster does not get the correct overlay network IP when added to another Kubernetes Cluster.
- Error while installing docker on Centos 8.3.x servers
- kubeadm error: "kubelet isn’t running or healthy and connection refused"
- deleting namespace stuck at "Terminating" state
- "x509 certificate" issue during docker pull
- If Nexus proxy is in use
- Kubernetes DNS Problem (connection timed out; no servers could be reached)
- On Ubuntu servers with Kubernetes Clusters, HOST names cannot be resolved due to changes in DNS settings not being reflected in the /etc/resolv.conf file
- docker: Error response from daemon: Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "institutionCertificateName-CA").
- Node gets stuck at status NotReady and the error message is as follows: "Unable to update cni config: no networks found in /etc/cni/net.d"
- Client certificates generated by kubeadm expire after 1 year - "internal server error. Error Detail: operation: [list] for kind: [pod] with name: [null] in namespace: [prod] failed"
- The connection to the server x.x.x.:6443 was refused - did you specify the right host or port?
- kubelet.service: Main process exited, code=exited, status=255
- ctr: failed to verify certificate: x509: certificate is not valid
- Failure to distribute pods evenly
- Non-Graceful Node Shutdown in Kubernetes