Deploying a Highly Available Kubernetes Cluster


The main Kubernetes control-plane components are etcd, kube-apiserver, kube-scheduler and kube-controller-manager. If these components run on three master nodes behind a load balancer, the control plane becomes highly available. There are many ways to provide that load balancing, such as HAProxy or nginx plus keepalived, but none of them are used here; this guide uses kube-vip. kube-vip is a tool for building highly available Kubernetes clusters: it provides a way to run a highly available control plane without an external load balancer.

kube-vip uses Linux Virtual Server (LVS) technology to provide load balancing and high availability. It advertises a virtual IP address from the control-plane nodes and forwards traffic arriving at that address to the control plane. If the node currently holding the virtual IP fails, kube-vip automatically moves the address to another healthy node, keeping the cluster reachable.

With kube-vip you get the following in a Kubernetes cluster:

  1. A highly available control plane: kube-vip keeps the control-plane components (etcd, the API server, the controller manager and the scheduler) available, so the cluster keeps running even if a node fails.

  2. Load balancing: kube-vip uses LVS to spread traffic across multiple control-plane nodes, giving better performance and scalability.

  3. Rolling upgrades: during Kubernetes version upgrades or node maintenance, kube-vip migrates the virtual IP automatically, so you can roll through nodes without interrupting the cluster.

Environment
 VIP 172.16.10.54
 k8s-master01  172.16.10.50
 k8s-master02  172.16.10.51
 k8s-master03  172.16.10.52
 k8s-node02    172.16.10.53
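
Several steps below (the kubeadm node names, certSANs, and kube-vip) depend on these names being resolvable. A minimal /etc/hosts sketch for every node, based on the table above; the api.k8s.local alias for the VIP is an assumption and is only needed if you later switch to a DNS-style control-plane endpoint:

 # cat >> /etc/hosts << EOF
 172.16.10.54 api.k8s.local   # optional alias for the VIP
 172.16.10.50 k8s-master01
 172.16.10.51 k8s-master02
 172.16.10.52 k8s-master03
 172.16.10.53 k8s-node02
 EOF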
Basic configuration

First, apply some basic configuration to all nodes: disable the firewall, set up bridge packet filtering, enable ipvs, configure time synchronization, and so on. Two things deserve attention: modprobe br_netfilter is needed for the bridge sysctls below, and ipvs is a Linux kernel module that provides high-performance load balancing at the transport layer (Layer 4). Modules loaded this way are lost after a reboot; to persist them, write them to a module-load file, for example echo "br_netfilter" > /etc/modules-load.d/br_netfilter.conf (see Issue 3 at the end of this post).

 Disable the firewall:
 # systemctl stop firewalld
 # systemctl disable firewalld
 Disable selinux:
 # sed -i 's/enforcing/disabled/' /etc/selinux/config  # permanent; use 'setenforce 0' for a temporary change
 Disable swap: swapoff -a  # temporary; for a permanent change comment out the swap line in /etc/fstab
 Pass bridged IPv4 traffic to the iptables chains (run on every machine) so packets crossing the bridge are filtered:
 # cat > /etc/sysctl.d/k8s.conf << EOF
 net.bridge.bridge-nf-call-ip6tables = 1
 net.bridge.bridge-nf-call-iptables = 1
 net.ipv4.ip_forward = 1
 EOF
 # modprobe br_netfilter  # the br_netfilter module must be loaded for the bridge sysctls and IPv4 forwarding to take effect
 # sysctl --system  # apply
 Install ipvs
 # cat > /etc/sysconfig/modules/ipvs.modules <<EOF
 #!/bin/bash
 modprobe -- ip_vs
 modprobe -- ip_vs_rr
 modprobe -- ip_vs_wrr
 modprobe -- ip_vs_sh
 modprobe -- nf_conntrack_ipv4
 EOF
 # chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
 nf_conntrack_ipv4      19149  0
 nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
 ip_vs_sh               12688  0
 ip_vs_wrr              12697  0
 ip_vs_rr               12600  0
 ip_vs                 145458  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
 nf_conntrack          143360  2 ip_vs,nf_conntrack_ipv4
 libcrc32c              12644  3 xfs,ip_vs,nf_conntrack
 # yum install ipset -y
 # yum install ipvsadm -y
 
 Time synchronization
 # yum install chrony -y
 # systemctl enable chronyd
 # systemctl start chronyd
 # chronyc sources
 210 Number of sources = 4
 MS Name/IP address         Stratum Poll Reach LastRx Last sample
 ===============================================================================
 ^- ntp7.flashdance.cx            2  10   333   725    -33ms[  -33ms] +/-  135ms
 ^- ntp6.flashdance.cx            2  10    73   607    -16ms[  -16ms] +/-  142ms
 ^+ dns1.synet.edu.cn             2  10   357   402   -360us[ -360us] +/- 8084us
 ^* dns2.synet.edu.cn             1  10   367   650   +177us[ +222us] +/- 7121us
 # echo "vm.swappiness = 0" >> /etc/sysctl.d/k8s.conf
 # sysctl -p /etc/sysctl.d/k8s.conf
 net.bridge.bridge-nf-call-ip6tables = 1
 net.bridge.bridge-nf-call-iptables = 1
 net.ipv4.ip_forward = 1
 vm.swappiness = 0
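
The sysctl settings are persisted in /etc/sysctl.d/k8s.conf, but the modprobe calls above are not and will be lost on reboot (see Issue 3 at the end of this post). A minimal sketch for loading the modules at boot, assuming a systemd-based distribution such as CentOS 7:

 # cat > /etc/modules-load.d/k8s.conf << EOF
 br_netfilter
 ip_vs
 ip_vs_rr
 ip_vs_wrr
 ip_vs_sh
 nf_conntrack_ipv4
 EOF
 # systemctl restart systemd-modules-load  # load them now; they will also be loaded on every boot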
Install containerd

With the basic configuration done, we can install containerd. Kubernetes 1.20 deprecated Docker (dockershim) as the built-in container runtime, and support was removed in 1.24; containerd is used here because it keeps things simple.

 Install containerd
 Install with yum:
 # yum install -y yum-utils device-mapper-persistent-data lvm2
 # yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo  # add the yum repo
 # yum install containerd.io jq -y  # containerd is packaged as containerd.io in the docker-ce repo; crictl comes from the cri-tools package in the kubernetes repo added below
 # containerd config default > /etc/containerd/config.toml
 # ctr version
 Client:
   Version:  1.6.22
   Revision: 8165feabfdfe38c65b599c4993d227328c231fca
   Go version: go1.19.11
 
 Server:
   Version:  1.6.22
   Revision: 8165feabfdfe38c65b599c4993d227328c231fca
   UUID: 7fa75843-35f9-4f26-b50a-33ed04e64f26
 
 Alternatively, download cri-containerd-cni-1.6.22-linux-amd64.tar.gz from https://github.com/containerd/containerd/releases/, upload it to the server, extract it, and add the binaries to the PATH.
 # wget https://download.fastgit.org/containerd/containerd/releases/download/v1.6.22/cri-containerd-cni-1.6.22-linux-amd64.tar.gz
 # tar -C / -xzf cri-containerd-cni-1.6.22-linux-amd64.tar.gz
 # echo 'export PATH=$PATH:/usr/local/bin:/usr/local/sbin' >> /etc/profile
 # source /etc/profile
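
Once crictl is available (it comes with the cri-tools package installed alongside kubeadm below), it should be pointed at containerd's CRI socket, otherwise it falls back to deprecated endpoints and prints warnings. A small sketch of /etc/crictl.yaml, assuming the default containerd socket path:

 # cat > /etc/crictl.yaml << EOF
 runtime-endpoint: unix:///run/containerd/containerd.sock
 image-endpoint: unix:///run/containerd/containerd.sock
 timeout: 10
 EOF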
Configure containerd

Just like with Docker, we need to configure a registry mirror for containerd so that pulling images hosted abroad does not time out.

 # mkdir -p /etc/containerd
 # containerd config default > /etc/containerd/config.toml  # generate the default config file
 
 Edit the config file:
 [plugins."io.containerd.grpc.v1.cri"]
   ......................
   sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"  # replace the default pause image with one from the Aliyun mirror
 ........................
     [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        ....................
        SystemdCgroup = true  # change false to true
     [plugins."io.containerd.grpc.v1.cri".registry]
       config_path = ""
 
       [plugins."io.containerd.grpc.v1.cri".registry.auths]
 
       [plugins."io.containerd.grpc.v1.cri".registry.configs]
 
       [plugins."io.containerd.grpc.v1.cri".registry.headers]
 
       [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
          [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]  # registry mirror
            endpoint = ["https://bqr1dr1n.mirror.aliyuncs.com"]
          [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]  # registry mirror
            endpoint = ["https://registry.aliyuncs.com/k8sxio"]
 # restart containerd so the new configuration takes effect
 # systemctl enable --now containerd
 # systemctl restart containerd
 # systemctl daemon-reload
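
Before moving on, it is worth confirming that containerd picked up the edited config and that the pause image configured above can actually be pulled; a quick check (the grep pattern assumes the config keys shown above):

 # containerd config dump | grep SystemdCgroup
 # ctr -n k8s.io image pull registry.aliyuncs.com/google_containers/pause:3.6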
Install kubeadm, kubelet and kubectl

The versions of kubeadm, kubelet and kubectl must match to avoid trouble later. Install them with yum and enable kubelet to start at boot.

 # add the Kubernetes yum repo
 # cat <<EOF > /etc/yum.repos.d/kubernetes.repo
 [kubernetes]
 name=Kubernetes
 baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
 enabled=1
 gpgcheck=1
 repo_gpgcheck=1
 gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
 EOF
 # yum makecache fast -y
 # yum install -y --nogpgcheck kubelet kubeadm kubectl  # installs the latest version
 # --disableexcludes=kubernetes: ignore every repo other than the kubernetes one
 # yum install -y kubelet-1.24.1 kubeadm-1.24.1 kubectl-1.24.1 --disableexcludes=kubernetes  # or pin the version used in this post
 # yum --showduplicates list kubectl  # list the versions available in the repo; if already installed, check with kubectl version
 # systemctl enable --now kubelet  # enable kubelet and start it at boot
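
Because the three packages have to stay in lockstep, it can also help to pin them so a later yum update does not upgrade one of them on its own; an optional sketch using the versionlock plugin:

 # yum install -y yum-plugin-versionlock
 # yum versionlock add kubelet-1.24.1 kubeadm-1.24.1 kubectl-1.24.1
 # yum versionlock list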

All of the above must be installed on every node: master and worker nodes alike need containerd as well as kubeadm, kubelet and kubectl.

The load balancer

We first generate the kube-vip static pod manifest, defining the VIP and network interface as variables. Remember that kubeadm reset wipes /etc/kubernetes/manifests, so the kube-vip manifest must be regenerated after every reset, before re-initializing or re-joining.

 # mkdir -p /etc/kubernetes/manifests/
 # export VIP=172.16.10.54
 # export INTERFACE=eth0
 # ctr image pull docker.io/plndr/kube-vip:v0.6.2
 # ctr run --rm --net-host docker.io/plndr/kube-vip:v0.6.2 vip \
 /kube-vip manifest pod \
 --interface $INTERFACE \
 --vip $VIP \
 --controlplane \
 --services \
 --arp \
 --leaderElection | tee  /etc/kubernetes/manifests/kube-vip.yaml
 apiVersion: v1
 kind: Pod
 metadata:
   creationTimestamp: null
   name: kube-vip
   namespace: kube-system
 spec:
   containers:
   - args:
     - manager
     env:
     - name: vip_arp
       value: "true"
     - name: vip_interface
       value: eth0
     - name: port
       value: "6443"
     - name: vip_cidr
       value: "32"
     - name: cp_enable
       value: "true"
     - name: cp_namespace
       value: kube-system
     - name: vip_ddns
       value: "false"
     - name: svc_enable
       value: "true"
     - name: vip_leaderelection
       value: "true"
     - name: vip_leaseduration
       value: "5"
     - name: vip_renewdeadline
       value: "3"
     - name: vip_retryperiod
       value: "1"
     - name: vip_address
       value: 172.16.10.54
     image: ghcr.io/kube-vip/kube-vip:v0.6.2
     imagePullPolicy: Always
     name: kube-vip
     resources: {}
     securityContext:
       capabilities:
         add:
         - NET_ADMIN
         - NET_RAW
         - SYS_TIME
     volumeMounts:
     - mountPath: /etc/kubernetes/admin.conf
       name: kubeconfig
   hostNetwork: true
   volumes:
   - hostPath:
       path: /etc/kubernetes/admin.conf
     name: kubeconfig
 status: {}
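
The static pod defined by this manifest only starts once kubelet is running, i.e. after kubeadm init below. At that point you can check, roughly like this, that the pod is up and the VIP is bound to the chosen interface:

 # crictl ps | grep kube-vip
 # ip addr show eth0 | grep 172.16.10.54
 # kubectl -n kube-system get pods | grep kube-vip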

The only difference between a master node and a worker node here is that masters run the load balancer (kube-vip) and workers do not. With that, the setup of one master node is complete. Note that the second master must generate its kube-vip manifest before joining the cluster. Now we can initialize the cluster.

Adjust the initialization configuration

We generate an initialization file with kubeadm and then edit it; pay attention to the comments added in the file below. This template uses the flannel network plugin; if you use the Calico plugin instead, just leave the podSubnet field empty: podSubnet: ""

 Initialize the master node
 ######################################
 # kubeadm config print init-defaults --component-configs KubeletConfiguration > kubeadm.yaml  # generate the default init configuration
 # cat kubeadm.yaml
 apiVersion: kubeadm.k8s.io/v1beta3
 bootstrapTokens:
 - groups:
   - system:bootstrappers:kubeadm:default-node-token
   token: abcdef.0123456789abcdef
   ttl: 24h0m0s
   usages:
   - signing
   - authentication
 kind: InitConfiguration
 localAPIEndpoint:
   advertiseAddress: 172.16.10.50  # IP of this node
   bindPort: 6443
 nodeRegistration:
   criSocket: unix:///var/run/containerd/containerd.sock
   imagePullPolicy: IfNotPresent
   name: 172.16.10.50  # if a hostname is used here instead, it must resolve via /etc/hosts
   taints:  # taint the master so ordinary workloads are not scheduled onto it
   - effect: "NoSchedule"
     key: "node-role.kubernetes.io/master"
 ---
 apiVersion: kubeproxy.config.k8s.io/v1alpha1
 kind: KubeProxyConfiguration
 mode: ipvs  # kube-proxy mode
 ---
 apiVersion: kubeadm.k8s.io/v1beta3
 certificatesDir: /etc/kubernetes/pki
 clusterName: kubernetes
 controllerManager: {}
 dns: {}
 etcd:
   local:
     dataDir: /var/lib/etcd
 imageRepository: registry.aliyuncs.com/google_containers
 kind: ClusterConfiguration
 kubernetesVersion: 1.24.1  # Kubernetes version
 controlPlaneEndpoint: 172.16.10.54:6443  # control-plane endpoint: the VIP
 apiServer:
   extraArgs:
     authorization-mode: Node,RBAC
   timeoutForControlPlane: 4m0s
   certSANs:  # add the other master nodes; any names listed here must be resolvable via /etc/hosts
   - 172.16.10.54
   - 172.16.10.50
   - 172.16.10.51
 networking:
   dnsDomain: cluster.local
   serviceSubnet: 10.96.0.0/12
   podSubnet: 10.244.0.0/16  # pod subnet; if using the Calico plugin, leave this empty (podSubnet: "")
 scheduler: {}
 ---
 apiVersion: kubelet.config.k8s.io/v1beta1
 authentication:
   anonymous:
     enabled: false
   webhook:
     cacheTTL: 0s
     enabled: true
   x509:
     clientCAFile: /etc/kubernetes/pki/ca.crt
 authorization:
   mode: Webhook
   webhook:
     cacheAuthorizedTTL: 0s
     cacheUnauthorizedTTL: 0s
 cgroupDriver: systemd
 clusterDNS:
 - 10.96.0.10
 clusterDomain: cluster.local
 cpuManagerReconcilePeriod: 0s
 evictionPressureTransitionPeriod: 0s
 fileCheckFrequency: 0s
 healthzBindAddress: 127.0.0.1
 healthzPort: 10248
 httpCheckFrequency: 0s
 imageMinimumGCAge: 0s
 kind: KubeletConfiguration
 logging:
   flushFrequency: 0
   options:
     json:
       infoBufferSize: "0"
   verbosity: 0
 memorySwap: {}
 nodeStatusReportFrequency: 0s
 nodeStatusUpdateFrequency: 0s
 rotateCertificates: true
 runtimeRequestTimeout: 0s
 shutdownGracePeriod: 0s
 shutdownGracePeriodCriticalPods: 0s
 staticPodPath: /etc/kubernetes/manifests
 streamingConnectionIdleTimeout: 0s
 syncFrequency: 0s
 volumeStatsAggPeriod: 0s
 ###########################################
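
Before running the real initialization you can let kubeadm validate the file and walk through everything without changing the node; an optional dry run against the same config:

 # kubeadm init --config kubeadm.yaml --dry-run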

Note that if you put service names in certSANs and reach the VIP by domain name, those hostnames must be resolvable via /etc/hosts, and the same hostnames must also be used in kubeadm.yaml, otherwise resolution fails. When hostnames are used, kubectl get node shows the hostname in the NAME column; when IPs are used, it shows the IP.

 .....................
 nodeRegistration:
   criSocket: /run/containerd/containerd.sock  # containerd Unix socket path
   imagePullPolicy: IfNotPresent
   name: k8s-master01
 ..........................
 controlPlaneEndpoint: api.k8s.local:6443  # control-plane endpoint, resolved via /etc/hosts
 apiServer:
   extraArgs:
     authorization-mode: Node,RBAC
   timeoutForControlPlane: 4m0s
   certSANs:  # add the other master nodes
   - api.k8s.local
   - k8s-master01
   - k8s-master02
   - k8s-master03
   - 192.168.31.30
   - 192.168.31.31
   - 192.168.31.32

The /etc/hosts entries must correspond exactly to the names used here; otherwise they cannot be resolved during initialization.
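
For this hostname-based variant, the matching /etc/hosts entries on every node would look roughly like the sketch below; <VIP> stands for whatever address kube-vip advertises in that environment:

 # cat >> /etc/hosts << EOF
 <VIP>         api.k8s.local
 192.168.31.30 k8s-master01
 192.168.31.31 k8s-master02
 192.168.31.32 k8s-master03
 EOF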

Pull the images needed for initialization:

 # kubeadm config images pull --config kubeadm.yaml
 [config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.1
 [config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.24.1
 [config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.24.1
 [config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.24.1
 [config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.7
 [config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.3-0
 [config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.8.6
Initialize the cluster

Initialize the cluster with the yaml file we prepared.

 # kubeadm init --upload-certs --config kubeadm.yaml
 [init] Using Kubernetes version: v1.24.1
 [preflight] Running pre-flight checks
 [preflight] Pulling images required for setting up a Kubernetes cluster
 [preflight] This might take a minute or two, depending on the speed of your internet connection
 [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
 [certs] Using certificateDir folder "/etc/kubernetes/pki"
 [certs] Generating "ca" certificate and key
 [certs] Generating "apiserver" certificate and key
 [certs] apiserver serving cert is signed for DNS names [172.16.10.50 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.16.10.50 172.16.10.54 172.16.10.51]
 [certs] Generating "apiserver-kubelet-client" certificate and key
 [certs] Generating "front-proxy-ca" certificate and key
 [certs] Generating "front-proxy-client" certificate and key
 [certs] Generating "etcd/ca" certificate and key
 [certs] Generating "etcd/server" certificate and key
 [certs] etcd/server serving cert is signed for DNS names [172.16.10.50 localhost] and IPs [172.16.10.50 127.0.0.1 ::1]
 [certs] Generating "etcd/peer" certificate and key
 [certs] etcd/peer serving cert is signed for DNS names [172.16.10.50 localhost] and IPs [172.16.10.50 127.0.0.1 ::1]
 [certs] Generating "etcd/healthcheck-client" certificate and key
 [certs] Generating "apiserver-etcd-client" certificate and key
 [certs] Generating "sa" key and public key
 [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
 [kubeconfig] Writing "admin.conf" kubeconfig file
 [kubeconfig] Writing "kubelet.conf" kubeconfig file
 [kubeconfig] Writing "controller-manager.conf" kubeconfig file
 [kubeconfig] Writing "scheduler.conf" kubeconfig file
 [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
 [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
 [kubelet-start] Starting the kubelet
 [control-plane] Using manifest folder "/etc/kubernetes/manifests"
 [control-plane] Creating static Pod manifest for "kube-apiserver"
 [control-plane] Creating static Pod manifest for "kube-controller-manager"
 [control-plane] Creating static Pod manifest for "kube-scheduler"
 [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
 [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
 [apiclient] All control plane components are healthy after 10.011741 seconds
 [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
 [kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
 [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
 [upload-certs] Using certificate key:
 3a356484fecaaea190f52c359c6182e08297f742dd1cda3fd8054b8b0558c08c
 [mark-control-plane] Marking the node 172.16.10.50 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
 [mark-control-plane] Marking the node 172.16.10.50 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
 [bootstrap-token] Using token: abcdef.0123456789abcdef
 [bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
 [bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
 [bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
 [bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
 [bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
 [bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
 [kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
 [addons] Applied essential addon: CoreDNS
 [addons] Applied essential addon: kube-proxy
 
 Your Kubernetes control-plane has initialized successfully!
 
 To start using your cluster, you need to run the following as a regular user:
 
   mkdir -p $HOME/.kube
   sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
   sudo chown $(id -u):$(id -g) $HOME/.kube/config
 
 Alternatively, if you are the root user, you can run:
 
   export KUBECONFIG=/etc/kubernetes/admin.conf
 
 You should now deploy a pod network to the cluster.
 Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
   https://kubernetes.io/docs/concepts/cluster-administration/addons/
 
 You can now join any number of the control-plane node running the following command on each as root:
 # join additional master (control-plane) nodes:
   kubeadm join 172.16.10.54:6443 --token abcdef.0123456789abcdef \
   --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408 \
   --control-plane --certificate-key 3a356484fecaaea190f52c359c6182e08297f742dd1cda3fd8054b8b0558c08c
 
 Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
 As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
 "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
 
 Then you can join any number of worker nodes by running the following on each as root:
 # join worker nodes:
 kubeadm join 172.16.10.54:6443 --token abcdef.0123456789abcdef \
   --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408

Initialization succeeded. Follow the printed instructions on this node to set up the kubeconfig, and you can then see the Kubernetes cluster.

 # mkdir -p $HOME/.kube
 # sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
 # sudo chown $(id -u):$(id -g) $HOME/.kube/config
 # kubectl get node
 NAME           STATUS     ROLES           AGE    VERSION
 172.16.10.50   Ready      control-plane   175m   v1.24.1

The node shows up as an IP address here because I used the IP as the node name at initialization time.

Join additional master nodes

Because this is a high-availability setup, k8s-master02 also has to join the cluster. You can use the join command printed during initialization above, or generate a new one. Note that the join resolves the local hostname and may report that it cannot be found; in that case add a hosts entry mapping k8s-master02 to 172.16.10.51 (the node's own IP), then join. If the key uploaded at init time is older than 24 hours, create a new token and certificate key and combine them:

 # kubeadm token create --print-join-command --ttl=0
 kubeadm join 172.16.10.54:6443 --token nmw4yn.5dv52o9s8gcrzip5 --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408
 # kubeadm init phase upload-certs --upload-certs
 I0907 12:10:57.554477   28856 version.go:255] remote version is much newer: v1.28.1; falling back to: stable-1.24
 [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
 [upload-certs] Using certificate key:
 1de3f4baf520e102ef1836e3e774edf46c0ebcba047f475931344ffc9b9bcbd0
 Combined, this gives:
 # kubeadm join 172.16.10.54:6443 --token nmw4yn.5dv52o9s8gcrzip5 --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408 --control-plane --certificate-key  1de3f4baf520e102ef1836e3e774edf46c0ebcba047f475931344ffc9b9bcbd0

Run the combined command on the new master node. k8s-master03 is added the same way. That completes the three master nodes.

Install the network plugin

Because we specified the flannel plugin when initializing the cluster, we now install flannel (Calico is shown as an alternative below).

 At this point the three master nodes are installed, but no CNI plugin is running yet; kubeadm.yaml was configured for flannel.
 # wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
 # if any node has multiple network interfaces, specify the internal interface in the manifest
 # find the DaemonSet named kube-flannel-ds and edit the args of the kube-flannel container
 # vi kube-flannel.yml
 ......
 containers:
 - name: kube-flannel
   image: quay.io/coreos/flannel:v0.14.0 
   command:
   - /opt/bin/flanneld
   args:
   - --ip-masq
   - --kube-subnet-mgr
   - --iface=eth0  # with multiple NICs, set this to the name of the internal interface
 ......
 # kubectl apply -f kube-flannel.yml  # install the flannel network plugin
 You can also apply the manifest unmodified, depending on your environment.
 # the Calico network plugin, as an alternative:
 # curl https://docs.projectcalico.org/manifests/calico.yaml -O
 # note: the newest calico supported on k8s v1.20 is v3.20,
 # so on that version fetch the calico yaml from:
 # https://docs.projectcalico.org/archive/v3.20/manifests/calico.yaml
 # kubectl apply -f calico.yaml
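
After applying the CNI manifest, the nodes should move from NotReady to Ready once the flannel (or calico) pods come up; a quick way to watch that (newer flannel manifests deploy into a kube-flannel namespace rather than kube-system):

 # kubectl get pods -A -o wide | grep -E 'flannel|calico'
 # kubectl get nodes -w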
Join worker nodes

The master nodes are meant to run the cluster components, while worker nodes run the applications, so we add worker nodes to the cluster. You can use the join command generated at init time, or create one manually with kubeadm token create --print-join-command and run the result on the worker node.

 # on a master node
 # kubeadm token create --print-join-command
 kubeadm join 172.16.10.54:6443 --token ncoknr.k0i67be6yhw0m27s --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408
 # on the worker node
 # kubeadm join 172.16.10.54:6443 --token ncoknr.k0i67be6yhw0m27s --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408
 [preflight] Running pre-flight checks
   [WARNING Hostname]: hostname "k8s-node02" could not be reached
   [WARNING Hostname]: hostname "k8s-node02": lookup k8s-node02 on 211.167.230.100:53: no such host
 [preflight] Reading configuration from the cluster...
 [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
 [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
 [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
 [kubelet-start] Starting the kubelet
 [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
 
 This node has joined the cluster:
 * Certificate signing request was sent to apiserver and a response was received.
 * The Kubelet was informed of the new secure connection details.
 
 Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
 # kubectl get node
 NAME           STATUS     ROLES           AGE     VERSION
 172.16.10.50   Ready      control-plane   3h25m   v1.24.1
 k8s-master02   Ready      control-plane   178m    v1.24.1
 k8s-master03   Ready      control-plane   96m     v1.24.1
 k8s-node02     NotReady   <none>          6m32s   v1.24.1

With that, the highly available Kubernetes cluster is deployed. Next, let's test whether losing a node affects the cluster as a whole.

Test the load balancer

The point of testing high availability is to make sure the cluster stays usable when one node stops. Here is a manual test: since I used the IP as the node name for 172.16.10.50 at init time, which bothers me, I will delete that node and re-join it, and see whether doing so affects the cluster. Before touching anything, check which node the VIP leader election has chosen.

 # kubectl logs -f kube-vip-k8s-master02 -n kube-system
 time="2023-09-07T02:54:23Z" level=info msg="Starting kube-vip.io [v0.6.2]"
 time="2023-09-07T02:54:23Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
 time="2023-09-07T02:54:23Z" level=info msg="prometheus HTTP server started"
 time="2023-09-07T02:54:23Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
 time="2023-09-07T02:54:23Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [k8s-master02]"
 I0907 02:54:23.710713       1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-svcs-lock...
 time="2023-09-07T02:54:23Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [k8s-master02]"
 I0907 02:54:23.712441       1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-cp-lock...
 E0907 02:54:33.722376       1 leaderelection.go:327] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": net/http: TLS handshake timeout
 E0907 02:54:33.722386       1 leaderelection.go:327] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": net/http: TLS handshake timeout
 time="2023-09-07T02:54:44Z" level=info msg="Node [k8s-master01] is assuming leadership of the cluster"
 time="2023-09-07T02:54:44Z" level=info msg="new leader elected: k8s-master01"
 The current VIP leader is k8s-master01, i.e. 172.16.10.50. I first ran kubeadm reset on 172.16.10.50, then deleted the node from another master with 'kubectl delete nodes 172.16.10.50'. Looking at the kube-vip log afterwards, leadership has jumped to another node.
 # kubectl logs -f kube-vip-k8s-master02 -n kube-system
 time="2023-09-07T02:54:23Z" level=info msg="Starting kube-vip.io [v0.6.2]"
 time="2023-09-07T02:54:23Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
 time="2023-09-07T02:54:23Z" level=info msg="prometheus HTTP server started"
 time="2023-09-07T02:54:23Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
 time="2023-09-07T02:54:23Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [k8s-master02]"
 I0907 02:54:23.710713       1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-svcs-lock...
 time="2023-09-07T02:54:23Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [k8s-master02]"
 I0907 02:54:23.712441       1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-cp-lock...
 E0907 02:54:33.722376       1 leaderelection.go:327] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": net/http: TLS handshake timeout
 E0907 02:54:33.722386       1 leaderelection.go:327] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": net/http: TLS handshake timeout
 time="2023-09-07T02:54:44Z" level=info msg="Node [k8s-master01] is assuming leadership of the cluster"
 time="2023-09-07T05:55:27Z" level=info msg="new leader elected: k8s-master03"
 I0907 05:55:29.053911       1 leaderelection.go:255] successfully acquired lease kube-system/plndr-cp-lock
 time="2023-09-07T05:55:29Z" level=info msg="Gratuitous Arp broadcast will repeat every 3 seconds for [172.16.10.54]"
 time="2023-09-07T05:55:29Z" level=info msg="Node [k8s-master02] is assuming leadership of the cluster"

Then I re-joined 172.16.10.50 to the cluster using the procedure above: regenerate the kube-vip manifest, create a new token and certificate key, combine them into a join command, and run it on 172.16.10.50.

 # kubectl get node
 NAME           STATUS   ROLES           AGE     VERSION
 k8s-master01   Ready    control-plane   19m     v1.24.1
 k8s-master02   Ready    control-plane   3h34m   v1.24.1
 k8s-master03   Ready    control-plane   132m    v1.24.1
 k8s-node02     Ready    <none>          42m     v1.24.1

The nodes are all back and healthy. Now look at the network interface on the master holding the VIP: we chose eth0 during setup, and the virtual VIP address is now bound to eth0 on that master. As long as the virtual IP exists somewhere, communication with the control plane keeps working.

 # ip addr
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     link/ether 52:54:00:2b:e4:b6 brd ff:ff:ff:ff:ff:ff
     inet 172.16.10.50/16 brd 172.16.255.255 scope global eth0
        valid_lft forever preferred_lft forever
     inet 172.16.10.54/32 scope global eth0
        valid_lft forever preferred_lft forever
     inet6 fe80::5054:ff:fe2b:e4b6/64 scope link
        valid_lft forever preferred_lft forever

You can also create a test application and then reboot or shut down one master node to see whether the cluster is affected (a sketch of such a test is shown below); in my tests there were no problems at all. The remaining sections cover problems encountered during installation and how to resolve them.
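
A rough sketch of that test: run a workload and keep querying the API server through the VIP while a master is rebooted (the nginx deployment is just an example):

 # kubectl create deployment nginx --image=nginx --replicas=2
 # kubectl expose deployment nginx --port=80 --type=NodePort
 # now reboot or shut down one master, and keep checking the API through the VIP:
 # while true; do kubectl --server=https://172.16.10.54:6443 get nodes > /dev/null && echo "$(date) API reachable"; sleep 2; done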

Issue 1

A master node failed while joining the cluster (when generating the kube-vip manifest with ctr), with the error below. The cause is an outdated libseccomp; upgrading it resolves the problem.

 ctr: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/default/vip/log.json: no such file or directory): runc did not terminate successfully: exit status 127: unknown
 The libseccomp version is too old, so upgrade it:
 #  rpm -qa | grep libseccomp
 libseccomp-2.3.1-4.el7.x86_64
 # rpm -e libseccomp-2.3.1-4.el7.x86_64 --nodeps
 # rpm -qa | grep libseccomp
 # wget https://rpmfind.net/linux/centos/8-stream/BaseOS/x86_64/os/Packages/libseccomp-2.5.1-1.el8.x86_64.rpm
 # rpm -ivh libseccomp-2.5.1-1.el8.x86_64.rpm
 warning: libseccomp-2.5.1-1.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 8483c65d: NOKEY
 Preparing...                          ################################# [100%]
 Updating / installing...
    1:libseccomp-2.5.1-1.el8           ################################# [100%]

Issue 2

Master initialization kept timing out on its checks, reporting that kubelet was not running. systemctl status kubelet showed the node could not be found, and the kube-vip log under the pod log directory showed that the current node could not be reached by its hostname. Either change the yaml to use IPs instead of hostnames, or add the hostnames to /etc/hosts.

 [kubelet-start] Starting the kubelet
 [control-plane] Using manifest folder "/etc/kubernetes/manifests"
 [control-plane] Creating static Pod manifest for "kube-apiserver"
 [control-plane] Creating static Pod manifest for "kube-controller-manager"
 [control-plane] Creating static Pod manifest for "kube-scheduler"
 [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
 [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
 [kubelet-check] Initial timeout of 40s passed.
 
 Unfortunately, an error has occurred:
   timed out waiting for the condition
 
 This error is likely caused by:
   - The kubelet is not running
   - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
 
 If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
   - 'systemctl status kubelet'
   - 'journalctl -xeu kubelet'
 
 Additionally, a control plane component may have crashed or exited when started by the container runtime.
 To troubleshoot, list all containers using your preferred container runtimes CLI.
 Here is one example how you may list all running Kubernetes containers by using crictl:
   - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
   Once you have found the failing container, you can inspect its logs with:
   - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
 error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
 To see the stack trace of this error execute with --v=5 or higher
 
 # The message says kubelet is not running. You can get more detail by appending --v=5 to the init command. If you re-initialize, clean up first with kubeadm reset (answer 'y' when prompted). The kube-vip log showed that requests to k8s-master01 were failing, which a local hosts entry fixes.
 2023-09-06T18:00:49.873411929+08:00 stderr F E0906 10:00:49.873310       1 leaderelection.go:325] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://k8s-master01:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": x509: certificate is valid for 172.16.10.50, api.k8s.local, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not k8s-master01
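
In practice the hosts fix is just one line on the affected node, for example (mapping taken from the environment table above):

 # echo "172.16.10.50 k8s-master01" >> /etc/hosts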

Issue 3

Finally, an error encountered when rejoining the cluster after a reboot; the preflight check fails as shown below.

 [preflight] Running pre-flight checks
 error execution phase preflight: [preflight] Some fatal errors occurred:
   [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
 [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
 To see the stack trace of this error execute with --v=5 or higher
 
 The cause is that the br_netfilter module was no longer loaded after the reboot. To keep it from being lost on reboot, persist it in a module-load file, then load it and re-apply the sysctls:
 # echo "br_netfilter" > /etc/modules-load.d/br_netfilter.conf
 # modprobe br_netfilter && sysctl --system