Kubernetes Upgrade Steps
Upgrading a k8s cluster (1.19 -> 1.20)
Upgrading the masters (from the official docs, not verified here)
Run "kubeadm upgrade"
Upgrade the first master node
- Upgrade kubeadm
yum install -y kubeadm-1.20.15-0 --disableexcludes=kubernetes
- Verify that the download works and that kubeadm reports the expected version
[root@xdf-14-python-base-57 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:26:37Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
- Check the upgrade plan
[root@xdf-14-python-base-57 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[preflight] The corefile contains plugins that kubeadm/CoreDNS does not know how to migrate. Each plugin listed should be manually verified for compatibility with the newer version of CoreDNS. Once ready, the upgrade can be initiated by skipping the preflight check. During the upgrade, kubeadm will migrate the configuration while leaving the listed plugin configs untouched, but cannot guarantee that they will work with the newer version of CoreDNS.
[preflight] Some fatal errors occurred:
[ERROR CoreDNSUnsupportedPlugins]: CoreDNS cannot migrate the following plugins:
[Plugin "template" is unsupported by this migration tool in 1.7.0.]
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
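The plan aborts its preflight check because our Corefile uses the template plugin, which the bundled CoreDNS migration tool cannot migrate. After manually verifying that the listed plugins still work with the target CoreDNS version, the check can be made non-fatal exactly as the message suggests, e.g.:
kubeadm upgrade plan --ignore-preflight-errors=CoreDNSUnsupportedPlugins
(the same flag also applies to kubeadm upgrade apply)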
- Choose the target version to upgrade to and run the matching command
kubeadm upgrade apply v1.20.15
When the command finishes, it prints:
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.20.15". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
- Manually upgrade your CNI provider plugin
If the CNI provider runs as a DaemonSet, this step is not required on the other control-plane nodes
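To check whether the CNI provider runs as a DaemonSet, list the DaemonSets in kube-system (the CNI's name, e.g. flannel or calico, varies by cluster):
kubectl get ds -n kube-system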
Upgrade the other master nodes
Same as for the first control-plane node, but use:
kubeadm upgrade node
Drain the node
Prepare the node for the upgrade by marking it unschedulable and evicting its workloads
kubectl drain <node-to-drain> --ignore-daemonsets
Upgrade kubelet and kubectl
yum install -y kubelet-1.20.15-0 kubectl-1.20.15-0 --disableexcludes=kubernetes
Restart the kubelet
systemctl daemon-reload
systemctl restart kubelet
Uncordon the node
kubectl uncordon <node-to-drain>
Upgrading the worker nodes
Upgrade kubeadm
yum install -y kubeadm-1.20.15-0 --disableexcludes=kubernetes
Run "kubeadm upgrade"
On worker nodes, the following command upgrades the local kubelet configuration
kubeadm upgrade node
Output:
[root@xdf-52-python-177 ~]# kubeadm upgrade node
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.
Drain the node
Mark the node unschedulable and evict all workloads to prepare it for maintenance
kubectl drain <node-to-drain> --ignore-daemonsets
Output:
➜ ~ kubectl drain <node-to-drain> --ignore-daemonsets
node/xdf-52-python-177 cordoned
DEPRECATED WARNING: Aborting the drain command in a list of nodes will be deprecated in v1.23.
The new behavior will make the drain command go through all nodes even if one or more nodes failed during the drain.
For now, users can try such experience via: --ignore-errors
error: unable to drain node "xdf-52-python-177", aborting command...
There are pending nodes to be drained:
xdf-52-python-177
error: cannot delete Pods with local storage (use --delete-emptydir-data to override): ailearn-dev/ailearn-freestyle-interface-v1-cddc6f5cd-vrzjd, ailearn-dev/ailearn-gray-admin-v1-75c69f884d-qwpkh, ailearn-dev/ailearn-instruction-core-svr-v1-fc4df7bf7-7w5js, ailearn-dev/ailearn-instruction-router-svr-v1-776d9c66b5-ctn5q, ailearn-dev/ailearn-okminicourse-task-v1-74f7c9b464-q86zm, ailearn-dev/ailearn-rule-svr-v1-7f765b98d9-f7j7d, ailearn-dev/ailearn-work-svr-v1-656c696449-mw5v5, argocd/argocd-dex-server-5665ffc49-cd4lc, kube-system/metrics-server-799d467fd5-tw7ws
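The drain aborts because several pods use emptyDir local storage. As the error message itself says, the check can be overridden; note that emptyDir data is lost when a pod is evicted, so confirm that is acceptable before re-running:
kubectl drain <node-to-drain> --ignore-daemonsets --delete-emptydir-data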
Upgrade kubelet and kubectl
- Upgrade kubelet and kubectl
yum install -y kubelet-1.20.15-0 kubectl-1.20.15-0 --disableexcludes=kubernetes
- Restart the kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
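Optionally, confirm the kubelet came back up healthy before returning the node to service:
systemctl status kubelet
journalctl -u kubelet -f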
Uncordon the node
Bring the node back online by marking it schedulable again
# replace <node-to-drain> with the name of the node
kubectl uncordon <node-to-drain>
Verify the cluster status
kubectl get nodes
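Every upgraded node should report STATUS Ready and VERSION v1.20.15. To also confirm the client and apiserver versions:
kubectl version --short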
Problems encountered
Running kubectl version on a worker node that was about to be upgraded failed with:
Error from server (InternalError): an error on the server ("unknown") has prevented the request from succeeding
kubectl needs a kubeconfig to reach the apiserver; place admin.conf in the Kubernetes configuration directory on that node (or point kubectl at it).
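The standard kubeadm way to set this up (copying admin.conf over from a control-plane node first, if you are on a worker):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config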
After the upgrade, restarting the kubelet failed with the following error:
failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Cause:
docker and the kubelet are configured with different cgroup drivers
Check which cgroup driver docker uses
# docker info |grep "Cgroup Driver"
Cgroup Driver: cgroupfs
Check which cgroup driver the kubelet uses
# kubectl -n kube-system get cm kubelet-config-1.19 -o yaml | grep "cgroupDriver"
cgroupDriver: systemd
Resolution
Modify the kubelet ConfigMap. Note that as of Kubernetes 1.22, if the user does not set the cgroupDriver field in KubeletConfiguration, kubeadm init sets it to the default value systemd, so we standardize on systemd here.
- Option 1
Edit the kubelet ConfigMap with the command below (note: in this cluster the ConfigMap name is versioned, e.g. kubelet-config-1.19 as shown above, so adjust the name accordingly):
kubectl edit cm kubelet-config -n kube-system
Change the existing cgroupDriver value, or add a field like: cgroupDriver: systemd
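For reference, the relevant fragment of the KubeletConfiguration stored in that ConfigMap looks roughly like this (a sketch, not the full object):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd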
- Option 2 (the approach we used)
Change docker's cgroup driver, then restart docker:
cat /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
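After writing daemon.json, restart docker and then the kubelet so both pick up the systemd driver:
sudo systemctl restart docker
sudo systemctl daemon-reload
sudo systemctl restart kubelet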