Deleting a node from a Kubernetes cluster

February 15, 2023

Environment

In Kubernetes, node scale-out on public clouds is largely automated, but when scaling back in, the cloud cannot automatically reclaim the pods scheduled onto the node being removed, so a manual procedure is needed. I am using a self-built environment here; the procedure is the same on public and private clouds, the public cloud merely wraps it in automation. My cluster has three nodes, one master and two workers. First, a look at my Kubernetes environment:

[root@master ~]# kubectl get node -o wide
NAME     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION         CONTAINER-RUNTIME
master   Ready    control-plane,master   8d    v1.23.0   10.211.55.5   <none>        CentOS Linux 8 (Core)   4.18.0-80.el8.x86_64   docker://23.0.0
node1    Ready    <none>                 8d    v1.23.0   10.211.55.6   <none>        CentOS Linux 8 (Core)   4.18.0-80.el8.x86_64   docker://23.0.0
node2    Ready    <none>                 8d    v1.23.0   10.211.55.7   <none>        CentOS Linux 8 (Core)   4.18.0-80.el8.x86_64   docker://23.0.0

What I want to do is delete node2. Since node2 had no pods on it, I created one manually so there is something to evict. The deployment can be created like this:

[root@master ~]# kubectl create deployment nginx-v5 --image=nginx
deployment.apps/nginx-v5 created
[root@master ~]# kubectl expose deployment nginx-v5 --port=80 --type=NodePort

I don't actually use the Service here, so exposing the deployment with a NodePort (or not) makes no difference; the only goal is to have a workload on the node so the drain has something to evict. You can see that the nginx pod I created was scheduled onto node2:

[root@master ~]# kubectl get pod -A -o wide
NAMESPACE     NAME                                       READY   STATUS    RESTARTS         AGE    IP               NODE     NOMINATED NODE   READINESS GATES
default       nginx-85b98978db-brlfd                     1/1     Running   1 (6d23h ago)    7d2h   10.244.166.135   node1    <none>           <none>
default       nginx-v1-7d48f885fb-225sg                  1/1     Running   2 (6d23h ago)    8d     10.244.166.136   node1    <none>           <none>
default       nginx-v2-5f45d8768c-8bm48                  1/1     Running   1 (6d23h ago)    7d2h   10.244.166.137   node1    <none>           <none>
default       nginx-v3-7599d6fb5d-zpdr6                  1/1     Running   2 (6d23h ago)    8d     10.244.166.138   node1    <none>           <none>
default       nginx-v4-6989b5cbbf-72jfd                  1/1     Running   0                11m    10.244.104.3     node2    <none>           <none>
default       nginx-v5-8454d48d76-w66rm                  1/1     Running   0                20s    10.244.104.4     node2    <none>           <none>
kube-system   calico-kube-controllers-64cc74d646-jkfkj   1/1     Running   5 (6d23h ago)    8d     10.244.219.69    master   <none>           <none>
kube-system   calico-node-8wwhz                          1/1     Running   2 (6d23h ago)    8d     10.211.55.7      node2    <none>           <none>
kube-system   calico-node-pncp5                          1/1     Running   2 (6d23h ago)    8d     10.211.55.6      node1    <none>           <none>
kube-system   calico-node-tssn2                          1/1     Running   2 (6d23h ago)    8d     10.211.55.5      master   <none>           <none>
kube-system   coredns-6d8c4cb4d-2mz79                    1/1     Running   4 (6d23h ago)    8d     10.244.219.70    master   <none>           <none>
kube-system   coredns-6d8c4cb4d-8p8ld                    1/1     Running   4 (6d23h ago)    8d     10.244.219.71    master   <none>           <none>
kube-system   etcd-master                                1/1     Running   2 (6d23h ago)    8d     10.211.55.5      master   <none>           <none>
kube-system   kube-apiserver-master                      1/1     Running   4 (6d23h ago)    8d     10.211.55.5      master   <none>           <none>
kube-system   kube-controller-manager-master             1/1     Running   18 (6d23h ago)   8d     10.211.55.5      master   <none>           <none>
kube-system   kube-proxy-4w7k2                           1/1     Running   3 (6d23h ago)    8d     10.211.55.7      node2    <none>           <none>
kube-system   kube-proxy-7xgll                           1/1     Running   2 (6d23h ago)    8d     10.211.55.6      node1    <none>           <none>
kube-system   kube-proxy-b2ghj                           1/1     Running   2 (6d23h ago)    8d     10.211.55.5      master   <none>           <none>
kube-system   kube-scheduler-master                      1/1     Running   19 (73m ago)     8d     10.211.55.5      master   <none>           <none>
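Before touching a node it helps to see exactly which pods run on it. Rather than eyeballing the full listing above, you can filter server-side with a field selector; a small helper (the name `pods_on_node` is mine, not from the post):

```shell
# pods_on_node <node>: list every pod scheduled on the given node.
# --field-selector filters on the API server, so it stays cheap on big clusters.
pods_on_node() {
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}
```

`pods_on_node node2` would print only the node2 rows of the table above.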

Deleting the node

Before deleting a node we want to avoid disrupting the applications running on it, so we first drain it: the node is cordoned (marked unschedulable) and its pods are evicted and rescheduled onto other nodes, and only then do we delete the node. Strictly speaking this is a drain, not a taint, although the two are often conflated.

[root@master ~]# kubectl drain node2 --delete-local-data --force --ignore-daemonsets
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/node2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-8wwhz, kube-system/kube-proxy-4w7k2
evicting pod default/nginx-v5-8454d48d76-w66rm
evicting pod default/nginx-v4-6989b5cbbf-72jfd
pod/nginx-v5-8454d48d76-w66rm evicted
pod/nginx-v4-6989b5cbbf-72jfd evicted
node/node2 drained
[root@master ~]# kubectl get pod -A -o wide
NAMESPACE     NAME                                       READY   STATUS    RESTARTS         AGE    IP               NODE     NOMINATED NODE   READINESS GATES
default       nginx-85b98978db-brlfd                     1/1     Running   1 (6d23h ago)    7d3h   10.244.166.135   node1    <none>           <none>
default       nginx-v1-7d48f885fb-225sg                  1/1     Running   2 (6d23h ago)    8d     10.244.166.136   node1    <none>           <none>
default       nginx-v2-5f45d8768c-8bm48                  1/1     Running   1 (6d23h ago)    7d3h   10.244.166.137   node1    <none>           <none>
default       nginx-v3-7599d6fb5d-zpdr6                  1/1     Running   2 (6d23h ago)    8d     10.244.166.138   node1    <none>           <none>
default       nginx-v4-6989b5cbbf-nflhw                  1/1     Running   0                12s    10.244.166.140   node1    <none>           <none>
default       nginx-v5-8454d48d76-8jfq5                  1/1     Running   0                12s    10.244.166.139   node1    <none>           <none>
kube-system   calico-kube-controllers-64cc74d646-jkfkj   1/1     Running   5 (6d23h ago)    8d     10.244.219.69    master   <none>           <none>
kube-system   calico-node-8wwhz                          1/1     Running   2 (6d23h ago)    8d     10.211.55.7      node2    <none>           <none>
kube-system   calico-node-pncp5                          1/1     Running   2 (6d23h ago)    8d     10.211.55.6      node1    <none>           <none>
kube-system   calico-node-tssn2                          1/1     Running   2 (6d23h ago)    8d     10.211.55.5      master   <none>           <none>
kube-system   coredns-6d8c4cb4d-2mz79                    1/1     Running   4 (6d23h ago)    8d     10.244.219.70    master   <none>           <none>
kube-system   coredns-6d8c4cb4d-8p8ld                    1/1     Running   4 (6d23h ago)    8d     10.244.219.71    master   <none>           <none>
kube-system   etcd-master                                1/1     Running   2 (6d23h ago)    8d     10.211.55.5      master   <none>           <none>
kube-system   kube-apiserver-master                      1/1     Running   4 (6d23h ago)    8d     10.211.55.5      master   <none>           <none>
kube-system   kube-controller-manager-master             1/1     Running   18 (6d23h ago)   8d     10.211.55.5      master   <none>           <none>
kube-system   kube-proxy-4w7k2                           1/1     Running   3 (6d23h ago)    8d     10.211.55.7      node2    <none>           <none>
kube-system   kube-proxy-7xgll                           1/1     Running   2 (6d23h ago)    8d     10.211.55.6      node1    <none>           <none>
kube-system   kube-proxy-b2ghj                           1/1     Running   2 (6d23h ago)    8d     10.211.55.5      master   <none>           <none>
kube-system   kube-scheduler-master                      1/1     Running   19 (84m ago)     8d     10.211.55.5      master   <none>           <none>

After draining node2 there are no application pods left on it; what remains are Kubernetes components (the DaemonSet-managed calico-node and kube-proxy pods), so the running applications are unaffected. Now that node2 is empty, all that's left is to delete it from the cluster:

[root@master ~]# kubectl delete nodes node2
node "node2" deleted
[root@master ~]# kubectl get node
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   8d    v1.23.0
node1    Ready    <none>                 8d    v1.23.0

With that, node2 has been completely removed from the cluster, and the node-deletion procedure is done. Below are a few extensions.
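The two steps above (drain, then delete) can be wrapped into one small function. This is my sketch, not from the post, and it uses `--delete-emptydir-data`, the replacement for the deprecated `--delete-local-data` flag that the drain output warned about:

```shell
# remove_node <node>: evict workloads from the node, then remove the Node object.
remove_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data --force &&
    kubectl delete node "$1"
}
```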

Going further

This section puts the deleted node2 back into the cluster. I did nothing to node2 after deleting it, so let's first try joining it directly and, if that fails, see what needs attention. Before adding the node we need to create a new token on the master, because a bootstrap token is only valid for 24 hours; we create it first, then join from node2.
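A sketch of the token step; `--ttl` lets you override the default 24-hour lifetime (the function name `new_join_cmd` is mine):

```shell
# new_join_cmd [ttl]: print a fresh 'kubeadm join ...' command.
# TTL defaults to 24h; pass e.g. 2h for a shorter-lived token.
new_join_cmd() {
  kubeadm token create --ttl "${1:-24h}" --print-join-command
}
```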

[root@master ~]# kubeadm token create --print-join-command
kubeadm join 10.211.55.5:6443 --token olvh4t.rzflkeyrmceemscc --discovery-token-ca-cert-hash sha256:b748add9c4d2077777c1ff3c283cceec928f504647ca704106da8e887151b8f7
[root@node2 ~]# kubeadm join 10.211.55.5:6443 --token olvh4t.rzflkeyrmceemscc --discovery-token-ca-cert-hash sha256:b748add9c4d2077777c1ff3c283cceec928f504647ca704106da8e887151b8f7
[preflight] Running pre-flight checks
  [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 23.0.0. Latest validated version: 20.10
error execution phase preflight: [preflight] Some fatal errors occurred:
  [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
  [ERROR Port-10250]: Port 10250 is in use
  [ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

The join on node2 failed during preflight: port 10250 is already in use and the old kubelet configuration and CA certificate files still exist. Let's reset the node with kubeadm, restart docker and kubelet, and delete the leftover files.

[root@node2 ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0215 11:57:14.531663   28038 removeetcdmember.go:80] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[root@node2 ~]# systemctl stop kubelet
[root@node2 ~]# systemctl stop docker
Warning: Stopping docker.service, but it can still be activated by:
  docker.socket
[root@node2 ~]# rm -rf /var/lib/cni/
[root@node2 ~]# rm -rf /var/lib/kubelet/*
[root@node2 ~]# rm -rf /etc/cni/
[root@node2 ~]# systemctl start docker
[root@node2 ~]# systemctl start kubelet
[root@node2 ~]# kubeadm join 10.211.55.5:6443 --token olvh4t.rzflkeyrmceemscc --discovery-token-ca-cert-hash sha256:b748add9c4d2077777c1ff3c283cceec928f504647ca704106da8e887151b8f7
[preflight] Running pre-flight checks
  [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 23.0.0. Latest validated version: 20.10
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
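The node-side cleanup performed above can be condensed into one function; a sketch under the assumption of a Docker-based kubeadm node like this one (`-f` simply skips the interactive y/N prompt):

```shell
# reset_node_state: wipe kubeadm/CNI/kubelet leftovers so a rejoin passes preflight.
reset_node_state() {
  kubeadm reset -f
  systemctl stop kubelet docker
  rm -rf /var/lib/cni/ /var/lib/kubelet/* /etc/cni/
  systemctl start docker kubelet
}
```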

Here we deleted the CNI and kubelet state, restarted the services, and rejoined the cluster. node2 reports that it joined successfully; let's confirm from the master that it really did:

[root@master ~]# kubectl get node
NAME     STATUS     ROLES                  AGE   VERSION
master   Ready      control-plane,master   8d    v1.23.0
node1    Ready      <none>                 8d    v1.23.0
node2    NotReady   <none>                 20s   v1.23.0
[root@master ~]# kubectl get node
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   8d    v1.23.0
node1    Ready    <none>                 8d    v1.23.0
node2    Ready    <none>                 21s   v1.23.0

At first node2's status is NotReady; that's fine, it just needs a moment to come up. If problems remain after joining, check whether the network plugin (calico here) is healthy on node2, or run kubectl describe node node2 for details. Common kubernetes commands are covered at: https://www.wulaoer.org/?p=2736 With that, deleting (and re-adding) a node in Kubernetes is complete. That's all for this one.
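Instead of polling `kubectl get node` until NotReady flips to Ready, you can block on the condition directly; a sketch (the name `wait_ready` and the 120s timeout are my choices):

```shell
# wait_ready <node>: block until the node reports the Ready condition,
# failing if it does not become Ready within the timeout.
wait_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout=120s
}
```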
