Kubernetes & Docker
Problems can occur while installing and using Docker and Kubernetes. The examples below are based on Linux systems.
Problem | Error while installing Docker on CentOS 8.3.x servers |
---|---|
Reason/Cause | With the release of RHEL 8 and CentOS 8, the docker package was removed from the default package repositories and replaced by podman and buildah. Red Hat decided not to provide official support for Docker. When present, these packages conflict with Docker and prevent its installation. |
Solution | yum remove podman* -y
yum remove buildah* -y |
Problem | kubeadm error: "kubelet isn’t running or healthy and connection refused" |
---|---|
Reason/Cause | On Linux systems, swap and SELinux are usually enabled and should be turned off before running kubeadm. |
Solution | sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
sudo reboot
kubeadm reset
kubeadm init --ignore-preflight-errors all |
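The `sed` expression in this solution prefixes every line containing ` swap ` with `#`, including lines that are already commented out. As a minimal sketch, the same edit can be previewed without modifying `/etc/fstab`. The helper name `comment_out_swap` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print an fstab-style file with active swap entries
# commented out. The input file itself is not modified.
comment_out_swap() {
  # Match only lines with no '#' before the swap field, so entries that
  # are already commented out are left unchanged.
  sed '/^[^#]*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}
```

Running `comment_out_swap /etc/fstab | diff /etc/fstab -` shows which lines would change before the file is edited for real.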
Problem | Namespace deletion is stuck in the "Terminating" state |
---|---|
Reason/Cause | The namespace still has finalizers set, so its deletion never completes and it remains in the "Terminating" state. |
Solution | kubectl get namespace "env-name" -o json | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" | kubectl replace --raw /api/v1/namespaces/env-name/finalize -f - |
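The one-liner in this solution flattens the namespace JSON onto a single line and empties its `finalizers` array before sending it to the finalize endpoint. A minimal sketch of only the text transformation follows, wrapped in a hypothetical `strip_finalizers` function so it can be tried on sample JSON without a cluster. It assumes the `"finalizers": [` formatting that `kubectl ... -o json` prints.

```shell
# Hypothetical helper: read namespace JSON on stdin, join it into one line,
# and replace the first non-empty finalizers array with an empty one.
strip_finalizers() {
  tr -d '\n' | sed -E 's/"finalizers": *\[[^]]+\]/"finalizers": []/'
}
```

With a cluster available, the function slots into the same pipeline: `kubectl get namespace env-name -o json | strip_finalizers | kubectl replace --raw /api/v1/namespaces/env-name/finalize -f -`.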
Problem | "x509 certificate" issue during docker pull |
---|---|
Reason/Cause | If the institution does not use HTTPS, the following line is added to Docker's daemon configuration file. This is repeated on every node that runs Docker. |
Solution | $ sudo vi /etc/docker/daemon.json
|
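Reason/Cause | If the institution uses HTTPS, its SSL certificate (".crt") must be added to the trusted certificate store of the servers. |
Solution | On CentOS/RHEL:
sudo cp ssl.crt /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
On Debian/Ubuntu:
sudo cp ssl.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates |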
Problem | If Nexus proxy is in use |
---|---|
Reason/Cause | If the institution uses a Nexus proxy, the servers running Docker are pointed to its address. |
Solution | $ sudo vi /etc/docker/daemon.json
{
  "data-root": "/docker-data",
  "insecure-registries": ["nexusdocker.institutionaddress.com.tr"],
  "registry-mirrors": ["https://nexusdocker.institutionaddress.com.tr"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
} |
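As a sketch, the two registry-related keys of this configuration could be generated for a given registry host instead of being typed by hand on every node. `write_registry_config` and its arguments are illustrative names, not part of Docker. The output path is passed in so the file can be reviewed and merged with the other settings before it replaces `/etc/docker/daemon.json`.

```shell
# Hypothetical helper: write only the registry-related daemon settings.
#   $1: registry host, for example nexusdocker.institutionaddress.com.tr
#   $2: file to write, for example ./daemon.json
write_registry_config() {
  registry="$1"
  out="$2"
  cat > "$out" <<EOF
{
  "insecure-registries": ["$registry"],
  "registry-mirrors": ["https://$registry"]
}
EOF
}
```

After the reviewed file is in place, Docker has to be restarted (`sudo systemctl restart docker`) for the change to take effect.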
Problem | Kubernetes DNS Problem (connection timed out; no servers could be reached) |
---|---|
Reason/Cause | Node stays on Ready,SchedulingDisabled |
Test |
If the result is as below everything is correct.
If the result is as follows, there is an error and the following steps need to be checked.
Check the /etc/resolv.conf file.
(correct) nameserver 10.96.0.10 (false) nameserver 10.96.0.10 - Note: one client that experienced this issue had IPv6 enabled. |
Solution | For one client, the issue was resolved by adding the institution's domain to the /etc/resolv.conf file: search institution.gov.tr |
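Whether a `search` entry is already present can be checked before editing the file. The following is a minimal sketch; `has_search_domain` is a hypothetical helper, and the file path is passed as an argument so the check can be run against a copy.

```shell
# Hypothetical helper: succeed if the resolv.conf-style file in $1 lists
# the domain in $2 on one of its "search" lines.
has_search_domain() {
  awk -v d="$2" '
    $1 == "search" { for (i = 2; i <= NF; i++) if ($i == d) found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}
```

For example, `has_search_domain /etc/resolv.conf institution.gov.tr || echo "search domain missing"` reports when the entry still needs to be added.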
Problem | docker: Error response from daemon: Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "institutionCertificateName-CA"). |
---|---|
Reason/Cause | The firewall performs SSL inspection and inserts its own certificate. |
Solution | Add docker.io to the "SSL inspection exception" list on the firewall. |
Problem | Node is stuck in NotReady status with the following error message: "Unable to update cni config: no networks found in /etc/cni/net.d" |
---|---|
Reason/Cause | On the master node, kube-flannel fails to create the required directory and files. |
Solution | (Alternative solutions: https://github.com/kubernetes/kubernetes/issues/54918)
$ sudo mkdir -p /etc/cni/net.d
$ sudo vi /etc/cni/net.d/10-flannel.conflist
#Add the following.
{"name": "cbr0","plugins": [{"type": "flannel","delegate": {"hairpinMode": true,"isDefaultGateway": true}},{"type": "portmap","capabilities": {"portMappings": true}}]}
----------
{"name": "cbr0","cniVersion": "0.3.1","plugins": [{"type": "flannel","delegate": {"isDefaultGateway": true}},{"type": "portmap","capabilities": {"portMappings": true}}]}
------------
sudo chmod -Rf 777 /etc/cni /etc/cni/*
sudo chown -Rf apinizer:apinizer /etc/cni /etc/cni/*
sudo systemctl daemon-reload
sudo systemctl restart kubelet
#Check whether any pod still cannot pull its image:
kubectl get pods -n kube-system
kubectl describe pod podName -n kube-system |
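The error message means that no network configuration was found in the CNI directory. A quick check of that directory, before and after applying the fix, can be sketched as follows. `cni_config_present` is a hypothetical helper, and the list of file extensions (`.conf`, `.conflist`, `.json`) is an assumption about what the CNI loader accepts.

```shell
# Hypothetical helper: succeed if the directory in $1 (default
# /etc/cni/net.d) contains at least one CNI configuration file.
cni_config_present() {
  dir="${1:-/etc/cni/net.d}"
  [ -d "$dir" ] || return 1
  # Assumed extensions; adjust if your CNI setup uses other names.
  find "$dir" -maxdepth 1 -type f \
    \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) | grep -q .
}
```

`cni_config_present || echo "no CNI config found"` gives a one-line confirmation on each node.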
Problem | Client certificates generated by kubeadm expire after 1 year - "internal server error. Error Detail: operation: [list] for kind: [pod] with name: [null] in namespace: [prod] failed" |
---|---|
Reason/Cause | Unable to connect to the server: x509: certificate has expired or is not yet valid |
Solution | #These operations should be done on all master servers. |
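Because the underlying error is an expired certificate, it can help to confirm the expiry date before renewing anything. The sketch below uses `openssl x509 -checkend`; `cert_expires_within` is a hypothetical helper, and the certificate path in the usage note assumes the default kubeadm layout. Recent kubeadm versions also provide `kubeadm certs check-expiration` and `kubeadm certs renew all`; check the kubeadm documentation for the version in use.

```shell
# Hypothetical helper: succeed if the certificate in $1 has expired or
# will expire within $2 seconds. Returns 2 if the file is not readable.
cert_expires_within() {
  [ -r "$1" ] || return 2
  # openssl exits 0 when the certificate is still valid after $2 seconds,
  # so the result is inverted here.
  ! openssl x509 -checkend "$2" -noout -in "$1" >/dev/null 2>&1
}
```

For example, `cert_expires_within /etc/kubernetes/pki/apiserver.crt $((30 * 24 * 3600)) && echo "renew soon"` flags a certificate with fewer than 30 days left.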
Problem | The connection to the server x.x.x.:6443 was refused - did you specify the right host or port? |
---|---|
Reason/Cause | This problem can have the following causes:
|
Solution | sudo swapoff -a
sudo vi /etc/fstab (the swap line must be commented out or deleted)
mkdir -p $HOME/.kube
sudo reboot (optional) |
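Whether swap is still active after these steps can be verified from `/proc/swaps`, which lists one line per active swap device below a header line. A minimal sketch, with `swap_is_active` as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the swaps file in $1 (default
# /proc/swaps) lists at least one active swap device.
swap_is_active() {
  swaps_file="${1:-/proc/swaps}"
  # Line 1 is the column header; any further non-empty line is a device.
  awk 'NR > 1 && NF > 0 { n++ } END { exit n ? 0 : 1 }' "$swaps_file"
}
```

`swap_is_active && echo "swap is still on"` should print nothing once `swapoff -a` has run and the fstab entry is disabled.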
Problem | kubelet.service: Main process exited, code=exited, status=255 |
---|---|
Reason/Cause | Although there are various reasons for this problem, if the error says that no .conf file can be found, all configs can be created from scratch by following the procedures below. |
Solution | #Back up the existing configs and certificates before performing these operations
cd /etc/kubernetes/pki/
kubeadm init phase certs all --apiserver-advertise-address <MasterIP>
kubeadm init phase kubeconfig all |
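The backup mentioned in this solution is not spelled out. One way to do it, sketched under the assumption that a timestamped copy next to the original is sufficient, is shown below. `backup_dir` is a hypothetical helper and the paths in the usage note are examples.

```shell
# Hypothetical helper: copy the directory in $1 into the parent directory
# in $2 under a timestamped name, and print the path of the copy.
backup_dir() {
  src="$1"
  dest_parent="$2"
  dest="$dest_parent/$(basename "$src").bak.$(date +%Y%m%d%H%M%S)"
  # -a preserves ownership, permissions and timestamps where possible.
  cp -a "$src" "$dest" && printf '%s\n' "$dest"
}
```

For example, `sudo sh -c '. ./backup.sh; backup_dir /etc/kubernetes /root'` (with the function saved in a file named `backup.sh`, also an assumption) keeps a copy of the configs and certificates before the kubeadm phases regenerate them.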