Upgrading a k8s Cluster (1.19 -> 1.20)

Official documentation: https://v1-20.docs.kubernetes.io/zh/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#%E5%8D%87%E7%BA%A7-kubelet-%E5%92%8C-kubectl-1

Upgrade the masters (from the official documentation, not verified)

Run “kubeadm upgrade”

Upgrade the first master node

  1. Upgrade kubeadm
yum install -y kubeadm-1.20.15-0 --disableexcludes=kubernetes
  2. Verify that the download works and that kubeadm reports the correct version
[root@xdf-14-python-base-57 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:26:37Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
  3. Verify the upgrade plan
[root@xdf-14-python-base-57 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[preflight] The corefile contains plugins that kubeadm/CoreDNS does not know how to migrate. Each plugin listed should be manually verified for compatibility with the newer version of CoreDNS. Once ready, the upgrade can be initiated by skipping the preflight check. During the upgrade, kubeadm will migrate the configuration while leaving the listed plugin configs untouched, but cannot guarantee that they will work with the newer version of CoreDNS.
[preflight] Some fatal errors occurred:
	[ERROR CoreDNSUnsupportedPlugins]: CoreDNS cannot migrate the following plugins:
[Plugin "template" is unsupported by this migration tool in 1.7.0.]
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
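The fatal preflight error above comes from the `template` plugin in the CoreDNS Corefile, which the migration tool bundled with this kubeadm version cannot handle. As the output itself suggests, once the plugin has been manually verified against the newer CoreDNS version, the check can be made non-fatal (the check name below is copied from the error output; adjust it to match your own):

kubeadm upgrade plan --ignore-preflight-errors=CoreDNSUnsupportedPlugins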
  4. Choose the target version to upgrade to and run the appropriate command
kubeadm upgrade apply v1.20.15

When the command finishes, it prints the following message:

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.20.15". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
  5. Manually upgrade your CNI provider plugin

If the CNI provider runs as a DaemonSet, this step is not required on the other control-plane nodes.
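To confirm whether the CNI provider is deployed as a DaemonSet, and which image version it runs, a check along these lines can help (assuming, as is common for add-ons like flannel or Calico, that it lives in the kube-system namespace):

kubectl get ds -n kube-system -o wide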

Upgrade the other master nodes

Same as on the first control-plane node, but use:

kubeadm upgrade node

Drain the node

Prepare the node for the upgrade by marking it unschedulable and evicting its workloads:

kubectl drain <node-to-drain> --ignore-daemonsets

Upgrade kubelet and kubectl

yum install -y kubelet-1.20.15-0 kubectl-1.20.15-0 --disableexcludes=kubernetes

Restart kubelet

systemctl daemon-reload
systemctl restart kubelet

Uncordon the node

kubectl uncordon <node-to-drain>

Upgrade the worker nodes

Upgrade kubeadm

yum install -y kubeadm-1.20.15-0 --disableexcludes=kubernetes

Run “kubeadm upgrade”

On worker nodes, the following command upgrades the local kubelet configuration:

kubeadm upgrade node

Output:

[root@xdf-52-python-177 ~]# kubeadm upgrade node
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

Drain the node

Mark the node unschedulable and evict all workloads to prepare it for maintenance:

kubectl drain <node-to-drain> --ignore-daemonsets

Output:

➜  ~  kubectl drain <node-to-drain> --ignore-daemonsets
node/xdf-52-python-177 cordoned
DEPRECATED WARNING: Aborting the drain command in a list of nodes will be deprecated in v1.23.
The new behavior will make the drain command go through all nodes even if one or more nodes failed during the drain.
For now, users can try such experience via: --ignore-errors
error: unable to drain node "xdf-52-python-177", aborting command...

There are pending nodes to be drained:
 xdf-52-python-177
error: cannot delete Pods with local storage (use --delete-emptydir-data to override): ailearn-dev/ailearn-freestyle-interface-v1-cddc6f5cd-vrzjd, ailearn-dev/ailearn-gray-admin-v1-75c69f884d-qwpkh, ailearn-dev/ailearn-instruction-core-svr-v1-fc4df7bf7-7w5js, ailearn-dev/ailearn-instruction-router-svr-v1-776d9c66b5-ctn5q, ailearn-dev/ailearn-okminicourse-task-v1-74f7c9b464-q86zm, ailearn-dev/ailearn-rule-svr-v1-7f765b98d9-f7j7d, ailearn-dev/ailearn-work-svr-v1-656c696449-mw5v5, argocd/argocd-dex-server-5665ffc49-cd4lc, kube-system/metrics-server-799d467fd5-tw7ws
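The drain aborts because several Pods use emptyDir local storage. As the error message itself suggests, once it is acceptable to discard that data on eviction, the drain can be re-run with the override flag:

kubectl drain <node-to-drain> --ignore-daemonsets --delete-emptydir-data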

Upgrade kubelet and kubectl

  1. Upgrade kubelet and kubectl
yum install -y kubelet-1.20.15-0 kubectl-1.20.15-0 --disableexcludes=kubernetes
  2. Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet

Uncordon the node

Bring the node back online by marking it schedulable again:

# Replace <node-to-drain> with the name of the current node
kubectl uncordon <node-to-drain>

Verify the status of the cluster

kubectl get nodes
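Every upgraded node should show STATUS Ready and VERSION v1.20.15. To list just the kubelet versions, a custom-columns query like the following can be used (the field paths are from the standard Node API):

kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion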

Problems encountered

Running kubectl version on a node about to be upgraded produced the following error:

Error from server (InternalError): an error on the server ("unknown") has prevented the request from succeeding

To use kubectl on the node, admin.conf must be configured in the Kubernetes configuration directory so that kubectl can connect to the apiserver.
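A minimal sketch of that setup, following the usual kubeadm convention (this assumes admin.conf is already present under /etc/kubernetes, as on a master; on a worker node it would first have to be copied over from a control-plane node):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config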

After upgrading kubelet and kubectl, restarting the kubelet reported the following error:

failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"

Cause:

docker and the kubelet are configured with different cgroup drivers.

Check the cgroup driver docker is using:

# docker info |grep "Cgroup Driver"
 Cgroup Driver: cgroupfs

Check the cgroup driver the kubelet is using:

# kubectl -n kube-system get cm kubelet-config-1.19 -o yaml | grep "cgroupDriver"
    cgroupDriver: systemd
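The ConfigMap reflects the cluster-level setting; what each kubelet actually reads is the local /var/lib/kubelet/config.yaml (the same file `kubeadm upgrade node` writes, as seen in the output earlier), so it is worth checking there too:

grep cgroupDriver /var/lib/kubelet/config.yaml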

Solution

Modify the kubelet ConfigMap. Note that from Kubernetes 1.22 onward, if the user does not set the cgroupDriver field in KubeletConfiguration, kubeadm init will set it to the default value systemd, so we standardize on systemd.

  • Option 1

    Edit the kubelet ConfigMap with the following command (note: on clusters older than 1.24 the ConfigMap name is versioned, e.g. kubelet-config-1.20):

    kubectl edit cm kubelet-config -n kube-system
    

    Modify the existing cgroupDriver value, or add a field like the following:

    cgroupDriver: systemd
    
  • Option 2 (this is the approach we use)

    Change docker's cgroup driver, then restart docker:

    cat /etc/docker/daemon.json
    {
      "exec-opts": ["native.cgroupdriver=systemd"]
    }
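
    A minimal sketch of applying the change, assuming docker and the kubelet are both managed by systemd (the kubelet is restarted as well, since it runs on top of docker):

    systemctl daemon-reload
    systemctl restart docker
    systemctl restart kubelet
    # verify: should now report "Cgroup Driver: systemd"
    docker info | grep "Cgroup Driver"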