Possible Issues and Solutions in Kubernetes, Docker, and Containerd

You may encounter various issues while installing and using Docker, containerd, and Kubernetes. This page lists commonly encountered problems together with their causes and solutions.

Problem	Error while installing Docker on CentOS 8.3.x servers
Reason/Cause	With the release of RHEL 8 and CentOS 8, the docker package was removed from the default package repositories and replaced by podman and buildah. Red Hat has decided not to provide official support for Docker. The preinstalled podman and buildah packages therefore block the Docker installation.
Solution	yum remove podman* -y
yum remove buildah* -y

Problem

kubeadm error: "kubelet isn’t running or healthy and connection refused"

Reason/Cause

Swap and SELinux are usually enabled by default on Linux and should be turned off. By default, the kubelet refuses to start while swap is active.

Solution

sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab

sudo reboot

kubeadm reset
kubeadm init --ignore-preflight-errors all
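
The sed expression in the solution comments out every line that contains " swap " between spaces. It also prefixes lines that are already commented and skips entries whose fields are separated by tabs. A slightly more defensive variant is sketched below. The helper name comment_swap_entries is hypothetical, and the sketch assumes GNU sed.

```shell
# Comment out every active swap entry in an fstab-style file.
# comment_swap_entries is a hypothetical helper, not part of kubeadm.
comment_swap_entries() {
  local fstab="$1"
  # Skip lines that are already comments, then prefix "#" to any line that
  # has "swap" as a whitespace-delimited field (spaces or tabs).
  # -i.bak edits in place and keeps a backup copy next to the original.
  sed -E -i.bak '/^[[:space:]]*#/! { /[[:space:]]swap[[:space:]]/ s/^/#/ }' "$fstab"
}
```

On a node this would be called as "sudo bash -c" with the function defined, or the sed line can be run directly against /etc/fstab. The backup file /etc/fstab.bak allows the change to be reverted.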

Problem	Deleting a namespace is stuck in the "Terminating" state
Reason/Cause	The namespace still lists finalizers that have not been cleared, so the deletion cannot complete.
Solution	kubectl get namespace "env-name" -o json | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" | kubectl replace --raw /api/v1/namespaces/env-name/finalize -f -
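
To see what the text-processing part of the command above does, it can be isolated into a small filter and tried on a sample manifest before it touches the cluster. This is only a sketch: strip_finalizers is a hypothetical name, and the "\+" in the pattern assumes GNU sed.

```shell
# Read a namespace manifest (JSON) on stdin, join it onto one line, and
# replace the first non-empty finalizers list with an empty one.
# strip_finalizers is a hypothetical helper name.
strip_finalizers() {
  tr -d '\n' | sed 's/"finalizers": \[[^]]\+\]/"finalizers": []/'
}

# Hypothetical usage against a live cluster (NS is a placeholder):
# NS=env-name
# kubectl get namespace "$NS" -o json | strip_finalizers \
#   | kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f -
```

Clearing finalizers forces the deletion through, so any cleanup those finalizers were waiting for does not take place.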

Problem	"x509 certificate" issue during docker pull
Reason/Cause	If the relevant institution does not use HTTPS, the following line is added to Docker's daemon file. This is repeated on every node that uses Docker.
Solution	$ sudo vi /etc/docker/daemon.json
"insecure-registries" : ["hub.docker.com:443", "registry-1.docker.io:443", "quay.io"]
sudo systemctl daemon-reload
sudo systemctl restart docker
#Verify the setting with the following command.
docker info
Reason/Cause	If the relevant institution uses HTTPS, it must add its SSL certificate ("crt") to the servers.
Solution	cp ssl.crt /usr/local/share/ca-certificates/
update-ca-certificates
service docker restart
#Centos 7
sudo cp -p ssl.crt /etc/pki/ca-trust/source
sudo cp ssl.crt /etc/pki/ca-trust/source/anchors/myregistrydomain.com.crt
sudo update-ca-trust extract
sudo systemctl daemon-reload
sudo systemctl restart docker

Problem

If Nexus proxy is in use

Reason/Cause

If the relevant institution uses a Nexus proxy, the servers running Docker must be pointed at its address.

Solution

$ sudo vi /etc/docker/daemon.json

{
  "data-root": "/docker-data",
  "insecure-registries": ["nexusdocker.institutionaddress.com.tr"],
  "registry-mirrors": ["https://nexusdocker.institutionaddress.com.tr"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
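
Docker does not start if daemon.json is not valid JSON, so it can be worth checking the file before restarting the service. The sketch below assumes python3 is installed on the node (jq would serve the same purpose). The helper name validate_daemon_json is hypothetical.

```shell
# Return 0 if the given file parses as JSON, non-zero otherwise.
# Assumption: python3 is available; validate_daemon_json is a hypothetical name.
validate_daemon_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Hypothetical usage: restart Docker only when the file is well-formed.
# validate_daemon_json /etc/docker/daemon.json && sudo systemctl restart docker
```

This only checks the syntax. It does not verify that the keys are ones Docker recognizes.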

Problem

Kubernetes DNS Problem (connection timed out; no servers could be reached)

Reason/Cause

The node stays in the Ready,SchedulingDisabled state.

Test

kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml


kubectl get pods dnsutils

kubectl exec -i -t dnsutils -- nslookup kubernetes.default

If the result looks like the following, DNS is working correctly.

Server:    10.0.0.10
Address 1: 10.0.0.10

Name:      kubernetes.default
Address 1: 10.0.0.1

If the result looks like the following, name resolution is failing and the steps below need to be checked.

Server: 10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

Check the resolv.conf file inside the pod.

$ kubectl exec -ti dnsutils -- cat /etc/resolv.conf

(correct)

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local institution.gov.tr
options ndots:5

(incorrect: the institution's domain is missing from the search line)

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

kubectl rollout restart -n kube-system deployment/coredns

Solution

At one client site, the problem was resolved by adding the institution's domain to the search line of the /etc/resolv.conf file.

search institution.gov.tr
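
The comparison between the correct and incorrect resolv.conf contents can be scripted. The sketch below checks whether a given domain is present on a search line. The helper name has_search_domain is hypothetical, and institution.gov.tr stands in for the real domain.

```shell
# Return 0 if DOMAIN is listed on a "search" line of a resolv.conf-style file.
# has_search_domain is a hypothetical helper name.
has_search_domain() {
  local file="$1" domain="$2"
  awk -v d="$domain" '
    $1 == "search" { for (i = 2; i <= NF; i++) if ($i == d) found = 1 }
    END { exit !found }
  ' "$file"
}

# Hypothetical usage on a node:
# has_search_domain /etc/resolv.conf institution.gov.tr || echo "search domain missing"
```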

Problem	docker: Error response from daemon: Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "institutionCertificateName-CA").
Reason/Cause	The firewall performs SSL inspection and substitutes its own certificate.
Solution	Add docker.io to the "SSL inspection exception" list on the firewall.

Problem

Node is stuck in NotReady status with the following error message: "Unable to update cni config: no networks found in /etc/cni/net.d"

Reason/Cause

On the master node, kube-flannel fails to create the required directory and files.

Solution

(Alternative solutions: https://github.com/kubernetes/kubernetes/issues/54918)

$ sudo mkdir -p /etc/cni/net.d

$ sudo vi /etc/cni/net.d/10-flannel.conflist

#Add one of the two variants below (the second one also sets cniVersion).

{"name": "cbr0","plugins": [{"type": "flannel","delegate": {"hairpinMode": true,"isDefaultGateway": true}},{"type": "portmap","capabilities": {"portMappings": true}}]}

----------

{"name": "cbr0","cniVersion": "0.3.1","plugins": [{"type": "flannel","delegate": {"isDefaultGateway": true}},{"type": "portmap","capabilities": {"portMappings": true}}]}

------------

sudo chmod -Rf 777 /etc/cni /etc/cni/*

sudo chown -Rf apinizer:apinizer /etc/cni /etc/cni/*

sudo systemctl daemon-reload

sudo systemctl restart kubelet

#Check whether there is still a pod that cannot pull its image:

kubectl get pods -n kube-system

kubectl describe pod <podName> -n kube-system
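
Before restarting the kubelet, a quick check can confirm that the directory now contains a network configuration. The sketch below assumes that CNI configuration files end in .conf, .conflist, or .json. The helper name cni_config_present is hypothetical.

```shell
# Return 0 if DIR contains at least one CNI configuration file.
# Assumption: configs are recognized by the .conf, .conflist, or .json suffix.
# cni_config_present is a hypothetical helper name.
cni_config_present() {
  local dir="${1:-/etc/cni/net.d}"
  [ -n "$(find "$dir" -maxdepth 1 -type f \
      \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) \
      2>/dev/null | head -n 1)" ]
}

# Hypothetical usage:
# cni_config_present || echo "no CNI config in /etc/cni/net.d"
```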

Problem

Client certificates generated by kubeadm expire after 1 year - "internal server error. Error Detail: operation: [list] for kind: [pod] with name: [null] in namespace: [prod] failed"

Reason/Cause

Unable to connect to the server: x509: certificate has expired or is not yet valid

Solution

#These operations should be done on all master servers.

#On kubeadm v1.20 and later the subcommand is "kubeadm certs" (without "alpha").
sudo kubeadm alpha certs check-expiration
sudo kubeadm alpha certs renew all

mkdir -p $HOME/.kube 
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config 
sudo chown $(id -u):$(id -g) $HOME/.kube/config

#On all nodes
sudo reboot -i

#Further reading:
https://serverfault.com/questions/1065444/how-can-i-find-which-kubernetes-certificate-has-expired
https://www.oak-tree.tech/blog/k8s-cert-yearly-renewwal
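
Expiry can also be checked per certificate file with openssl, which helps catch the problem before the cluster stops responding. The sketch below assumes openssl is installed. The helper name cert_expires_within_days is hypothetical.

```shell
# Return 0 if the certificate expires within the given number of days.
# "openssl x509 -checkend N" exits 0 when the certificate is still valid
# N seconds from now, so the result is negated here.
# Note: an unreadable file also makes openssl fail, which this helper
# reports as "expiring". cert_expires_within_days is a hypothetical name.
cert_expires_within_days() {
  local cert="$1" days="$2"
  ! openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" > /dev/null 2>&1
}

# Hypothetical usage on a master node:
# cert_expires_within_days /etc/kubernetes/pki/apiserver.crt 30 && echo "renew soon"
```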

Problem

The connection to the server x.x.x.:6443 was refused - did you specify the right host or port?

Reason/Cause

This problem can have the following causes:

If the disk capacity was extended, swap might have been re-enabled.
The user may not have authorization.
You may not be on the Kubernetes master server.

Solution

sudo swapoff -a

sudo vi /etc/fstab #the swap line must be commented out or deleted

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

sudo reboot #optional
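
To find out whether swap is the cause before changing anything, the kernel's list of active swap areas can be inspected. The sketch below reads /proc/swaps, which contains one header line followed by one line per active swap area. The helper name swap_is_active is hypothetical.

```shell
# Return 0 if at least one swap area is active.
# The optional argument lets the check run against a copy of /proc/swaps.
# swap_is_active is a hypothetical helper name.
swap_is_active() {
  local swaps_file="${1:-/proc/swaps}"
  # Drop the header line and count the remaining non-empty lines.
  [ "$(tail -n +2 "$swaps_file" | grep -c .)" -gt 0 ]
}

# Hypothetical usage:
# swap_is_active && echo "swap is on; run: sudo swapoff -a"
```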

Problem

kubelet.service: Main process exited, code=exited, status=255

Reason/Cause

This problem has various causes. If the error says that a .conf file cannot be found, all configs can be recreated from scratch by following the procedure below.

Solution

#Back up the existing certificates and configs, then regenerate them.

cd /etc/kubernetes/pki/
mkdir -p /tmp/backup /tmp/backup2
mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} /tmp/backup/

kubeadm init phase certs all --apiserver-advertise-address <MasterIP>
cd /etc/kubernetes/
mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} /tmp/backup2

kubeadm init phase kubeconfig all
reboot
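
To confirm that missing kubeconfig files really are the cause, the four files moved in the procedure above can be checked first. This is only a sketch, and the helper name missing_kubeconfigs is hypothetical.

```shell
# Print each expected kubeconfig file that is absent from DIR.
# The list matches the four files handled in the procedure above.
# missing_kubeconfigs is a hypothetical helper name.
missing_kubeconfigs() {
  local dir="${1:-/etc/kubernetes}" name
  for name in admin.conf controller-manager.conf kubelet.conf scheduler.conf; do
    [ -f "$dir/$name" ] || echo "$name"
  done
}

# Hypothetical usage:
# missing_kubeconfigs /etc/kubernetes
```

Empty output means all four files exist, in which case the kubelet logs are the better place to look for the cause.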

Problem

ctr: failed to verify certificate: x509: certificate is not valid

Reason/Cause

This problem occurs when images are pulled from a private registry whose certificate is not trusted.

Solution

Skip certificate verification with the --skip-verify flag.

For example, to pull the image into the "k8s.io" namespace:

ctr --namespace k8s.io images pull --skip-verify xxx.harbor.com/apinizercloud/managerxxxx
