It has been a while since I last tinkered with Kubernetes. I recently got hold of enough resources to deploy a small cluster, so I put them to use. The approach taken here is to install a highly available cluster with kubeadm.

1 Version Information

First, the physical server specs:

    Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz x 2 Sockets
    80 vCPUs in total:
    NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
    NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
    
    196G of memory in total, of which 128G is allocated as 1G HugePages.
    786G of disk in total.

    Host OS version: Ubuntu 18.04 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Based on these resources, the VMs are allocated as follows:

VM name   Controller-0   Controller-1   Controller-2   Node-0   Node-1   Node-2
CPU       4              4              4              16       16       16
Memory    8G             8G             8G             16G      16G      16G
Disk      64G            64G            64G            64G      64G      64G
Role      Controller     Controller     Controller     Node     Node     Node

The VMs are KVM/QEMU guests deployed with virt-install and virsh; all of them are backed by HugePage memory and have their vCPUs statically pinned. The details are not covered here.
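
For reference, each guest was created with a command roughly along these lines (only a sketch: the disk path, bridge name, CPU set, and other values below are placeholders, and the exact flags depend on the virt-install version):

# example: create controller-0 with HugePage-backed memory and pinned vCPUs
virt-install \
  --name controller-0 \
  --memory 8192 \
  --memorybacking hugepages=on \
  --vcpus 4,cpuset=0-3 \
  --disk path=/var/lib/libvirt/images/controller-0.qcow2,size=64 \
  --network bridge=br0 \
  --os-variant ubuntu20.04 \
  --graphics none \
  --import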

The operating system installed in the VMs is: Ubuntu 20.04 5.8.0-55-generic #62~20.04.1-Ubuntu SMP Wed Jun 2 08:55:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

The Kubernetes version installed is v1.21.1.

The Docker version used is: Docker version 20.10.7, build f0df350.

2 Installation Process

2.1 Kubernetes Cluster Network Planning

(Figure: Kubernetes cluster network layout)

  • The Service address of the HA cluster uses the 10.164.128.x/24 subnet; haproxy and keepalived are deployed on the three Controllers to make this address highly available.
  • The default Kubernetes cluster network is the 192.168.100.x/24 subnet and is implemented with Calico. Pod-to-Pod traffic across worker nodes goes over the Calico network.
  • The Data Network is for high-performance data traffic. Multus CNI is used so that Pods support multiple network planes; the Data Network can be an ordinary network such as Flannel or MACVLAN, or a high-performance SR-IOV network.

2.2 Preparing the VM System Environment

2.2.1 Configuring the System Environment

First, make sure Ubuntu 20.04 has been upgraded to the latest version. After running "sudo apt upgrade -y", confirm that all nodes run the same kernel version, e.g.:

cifangzi@controller-0:~$ uname -a
Linux controller-0 5.8.0-55-generic #62~20.04.1-Ubuntu SMP Wed Jun 2 08:55:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Then, on every VM, edit /etc/hostname and /etc/hosts to set the host name and add the host entries, e.g.:

cifangzi@controller-0:~$ cat /etc/hostname
controller-0
cifangzi@controller-0:~$ cat /etc/hosts
...
10.164.128.188  vip.cluster.local
10.164.128.180  controller-0
10.164.128.181  controller-1
10.164.128.182  controller-2
10.164.128.183  node-0
10.164.128.184  node-1
10.164.128.185  node-2
10.164.128.187  node-3
...

Here, 10.164.128.188 vip.cluster.local is the highly available virtual address of the API Server and will be needed later during installation. Reboot after the change and confirm it took effect.

Finally, set up the prerequisites. kubeadm depends on the br_netfilter module and the corresponding sysctl settings:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

# then reload the sysctl settings
sudo sysctl --system
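
To double-check that the module and the sysctls are in effect, something like the following can be run:

lsmod | grep br_netfilter
sudo sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables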

2.2.2 Installing Docker

The Docker installation procedure follows the official documentation:

https://docs.docker.com/engine/install/ubuntu/

The steps, simplified, are as follows.

First, install the prerequisite packages via apt:

 sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

Then download and install the GPG key for the Docker apt archive:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Next, add Docker's apt repository:

echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Finally, install Docker:

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

After Docker is installed, it still needs to be configured to use systemd as its cgroup driver:

sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

Because Docker Hub is not reachable from mainland China, a proxy can optionally be configured for Docker (skip this step if Docker can already reach Docker Hub and other external registries):

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo vim /etc/systemd/system/docker.service.d/http-proxy.conf

# add the following content
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:80"
Environment="HTTPS_PROXY=https://proxy.example.com:443"
Environment="NO_PROXY=localhost,127.0.0.1,docker-registry.example.com,.corp"

Finally, restart the Docker service:

sudo systemctl daemon-reload
sudo systemctl restart docker
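
After the restart, it is worth confirming that Docker has picked up the systemd cgroup driver:

sudo docker info | grep -i "cgroup driver"
# expected output: Cgroup Driver: systemd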

2.3 Installing Kubernetes

2.3.1 Installing kubeadm, kubelet, and kubectl

This procedure follows the official documentation:

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

The basic steps are as follows.

First, add the GPG key needed for the Kubernetes packages:

sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg

Add the Kubernetes apt repository:

echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Then install kubeadm, kubelet, and kubectl:

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl

To keep kubelet, kubeadm, and kubectl from being upgraded automatically, mark the packages as held:

sudo apt-mark hold kubelet kubeadm kubectl
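
Note that "apt-get install kubelet kubeadm kubectl" pulls the newest packages available in the repository. To reproduce exactly the v1.21.1 setup described here, the versions can be pinned explicitly (assuming the 1.21.1-00 package revision is still published in the repo):

sudo apt-get install -y kubelet=1.21.1-00 kubeadm=1.21.1-00 kubectl=1.21.1-00
sudo apt-mark hold kubelet kubeadm kubectl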

2.3.2 Installing the First Controller Node

Because haproxy and keepalived provide the control-plane high availability here, they are installed as static Pods while the Controller node is being set up, and the haproxy/keepalived VIP is used as the API Server address.

First, create the configuration file kubeadm_config.yaml that kubeadm will use to install Kubernetes:

# kubeadm_config.yaml
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta2
kubernetesVersion: v1.21.1
controlPlaneEndpoint: vip.cluster.local:8443
networking:
  podSubnet: 10.244.0.0/16
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd

Three parameters matter here: controlPlaneEndpoint is the API Server address, which uses the domain vip.cluster.local and port 8443; because Calico is used as the Pod network CNI, podSubnet sets the Pod IP CIDR, here 10.244.0.0/16; and cgroupDriver sets the kubelet's cgroup driver to systemd.

Because the haproxy and keepalived Pods are created statically, two YAML files, keepalived.yaml and haproxy.yaml, need to be placed under /etc/kubernetes/manifests/:

# keepalived.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: keepalived
  namespace: kube-system
spec:
  containers:
  - image: osixia/keepalived:2.0.17
    name: keepalived
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_BROADCAST
        - NET_RAW
    volumeMounts:
    - mountPath: /usr/local/etc/keepalived/keepalived.conf
      name: config
    - mountPath: /etc/keepalived/check_apiserver.sh
      name: check
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/keepalived/keepalived.conf
    name: config
  - hostPath:
      path: /etc/keepalived/check_apiserver.sh
    name: check
status: {}

# haproxy.yaml
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  containers:
  - image: haproxy:2.1.4
    name: haproxy
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: localhost
        path: /healthz
        port: 8443
        scheme: HTTPS
    volumeMounts:
    - mountPath: /usr/local/etc/haproxy/haproxy.cfg
      name: haproxyconf
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/haproxy/haproxy.cfg
      type: FileOrCreate
    name: haproxyconf
status: {}

Key points:

In keepalived.yaml, the keepalived Pod mounts two files directly from the host: the configuration file /etc/keepalived/keepalived.conf and the health-check script /etc/keepalived/check_apiserver.sh.

In haproxy.yaml, the haproxy Pod mounts the host configuration file /etc/haproxy/haproxy.cfg directly.

Create the keepalived configuration file /etc/keepalived/keepalived.conf:

! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 3
  weight -2
  fall 10
  rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens7
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.164.128.188
    }
    track_script {
        check_apiserver
    }
}

This sets the VIP to 10.164.128.188, configures the VRRP parameters, and points the API Server track script to /etc/keepalived/check_apiserver.sh.
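
Note that the interface name (ens7 here) has to match the NIC that carries the 10.164.128.x addresses on each controller; it can be looked up with, for example:

ip -br addr show | grep 10.164.128.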

Create the track script /etc/keepalived/check_apiserver.sh used by keepalived:

#!/bin/sh

APISERVER_VIP="10.164.128.188"
APISERVER_DEST_PORT=8443

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
if ip addr | grep -q ${APISERVER_VIP}; then
    curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
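
The script is executed by keepalived, so make sure it is executable:

sudo chmod +x /etc/keepalived/check_apiserver.sh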

Next, create the haproxy configuration file /etc/haproxy/haproxy.cfg:

# /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 1
    timeout http-request    10s
    timeout queue           20s
    timeout connect         5s
    timeout client          20s
    timeout server          20s
    timeout http-keep-alive 10s
    timeout check           10s

#---------------------------------------------------------------------
# apiserver frontend which proxies to the control plane nodes
#---------------------------------------------------------------------
frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend apiserver

#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance     roundrobin
        server srv1 10.164.128.180:6443 check
        server srv2 10.164.128.181:6443 check
        server srv3 10.164.128.182:6443 check
        # [...]

The backend apiserver section lists the addresses of the three Controllers that are about to be installed.

Then install Kubernetes with the following command:

sudo kubeadm init --upload-certs --config ./kubeadm_config.yaml

P.S. Kubernetes does not support swap, so disable swap before installing Kubernetes. https://github.com/kubernetes/kubeadm/issues/610

Swap can be disabled by commenting out the swap entry in /etc/fstab and rebooting.
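
For example (one common way to do it; the sed pattern is only an illustration and your fstab layout may differ):

sudo swapoff -a                               # turn swap off immediately
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab    # comment out the swap entry so it stays off after reboot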

If everything goes well, the first Controller node is now essentially installed.
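
kubeadm init also prints the usual instructions for giving the current user kubectl access; they boil down to copying the admin kubeconfig, which is needed before the kubectl commands below will work:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config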

cifangzi@controller-0:~$ kubectl get pods -A
NAMESPACE     NAME                                   READY   STATUS    RESTARTS   AGE
kube-system   coredns-558bd4d5db-6zvpw               0/1     Pending   0          11m
kube-system   coredns-558bd4d5db-kb2bh               0/1     Pending   0          11m
kube-system   etcd-controller-0                      1/1     Running   0          11m
kube-system   haproxy-controller-0                   1/1     Running   0          11m
kube-system   keepalived-controller-0                1/1     Running   0          11m
kube-system   kube-apiserver-controller-0            1/1     Running   0          11m
kube-system   kube-controller-manager-controller-0   1/1     Running   0          11m
kube-system   kube-proxy-2tg9m                       1/1     Running   0          11m
kube-system   kube-scheduler-controller-0            1/1     Running   0          11m

Before installing the other Controller nodes and the worker nodes, Calico has to be installed so that Pods on different nodes can communicate. Helm is used to install Calico here, so install Helm first:

curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

Download the Calico Helm chart from the following link, then install Calico:

https://docs.projectcalico.org/getting-started/kubernetes/helm

helm install calico tigera-operator-v3.19.1-2.tgz

After Calico is installed, the following additional Pods appear:

calico-system     calico-kube-controllers-7f58dbcbbd-blg4t   1/1     Running   0          20m
calico-system     calico-node-tfqdf                          1/1     Running   0          20m
calico-system     calico-typha-8d6bdd5d5-fzjpn               1/1     Running   0          20m
tigera-operator   tigera-operator-86c4fc874f-g5t9d           1/1     Running   1          20m

2.3.3 Installing the Other Two Controller Nodes

When the first Controller node finishes installing, kubeadm prints the commands for joining additional Controller and worker nodes. For another Controller node, the command looks like:

kubeadm join vip.cluster.local:8443 --token llcqaz.lnhg7qwpyieb85y1 \
        --discovery-token-ca-cert-hash sha256:3d057e4477f58e7f99cefcf3eeae39e19f9c7058ff7431742fda6c1910e7b3d8 \
        --control-plane --certificate-key d2be6f1c641cd6244d3a62a57d4c6545dd89b9631a1ace54b57916fc5ebd4478

Because haproxy and keepalived provide the high availability here, their static Pods also need to be added on these nodes. As on the first Controller node, create the haproxy and keepalived configuration files, change the VRRP state in the keepalived configuration to BACKUP (keepalived's term for the non-master role), and create the track script used by keepalived. Then create the haproxy and keepalived YAML files and copy them to /etc/kubernetes/manifests/. Finally, restart kubelet with the following command:

sudo systemctl restart kubelet
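
For the backup controllers, the keepalived settings that differ from controller-0 are the VRRP state and, typically, a lower priority; for instance:

# on controller-1 and controller-2 only
sudo sed -i 's/state MASTER/state BACKUP/' /etc/keepalived/keepalived.conf
sudo sed -i 's/priority 101/priority 100/' /etc/keepalived/keepalived.conf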

Check the node status:

cifangzi@controller-0:~$ kubectl get nodes
NAME           STATUS   ROLES                  AGE   VERSION
controller-0   Ready    control-plane,master   22h   v1.21.1
controller-1   Ready    control-plane,master   21h   v1.21.1

At this point the coredns Pods should all be Running:

kube-system       coredns-558bd4d5db-k8h55                   1/1     Running   0          20h
kube-system       coredns-558bd4d5db-svp75                   1/1     Running   0          20h

The third Controller node can be created the same way.
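
If the original bootstrap token has expired by then (kubeadm tokens are valid for 24 hours and the uploaded certificates for 2 hours by default), fresh join credentials can be generated on an existing Controller node:

# print a new worker join command
sudo kubeadm token create --print-join-command
# re-upload the control-plane certificates and print a new certificate key for --control-plane joins
sudo kubeadm init phase upload-certs --upload-certs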

2.3.4 Installing the Worker Nodes

Worker nodes are easier to install: just run the join command printed when the first node was set up, e.g.:

kubeadm join vip.cluster.local:8443 --token llcqaz.lnhg7qwpyieb85y1 \
        --discovery-token-ca-cert-hash sha256:3d057e4477f58e7f99cefcf3eeae39e19f9c7058ff7431742fda6c1910e7b3d8

Install the other worker nodes the same way, then check the Pod and node status:

cifangzi@controller-0:~$ kubectl get pods -A
NAMESPACE         NAME                                       READY   STATUS    RESTARTS   AGE
calico-system     calico-kube-controllers-7f58dbcbbd-blg4t   1/1     Running   0          20h
calico-system     calico-node-d94vp                          1/1     Running   2          20h
calico-system     calico-node-jt269                          1/1     Running   2          20h
calico-system     calico-node-n2rjj                          1/1     Running   2          20h
calico-system     calico-node-p49lm                          1/1     Running   0          20h
calico-system     calico-node-pnq4z                          1/1     Running   0          20h
calico-system     calico-node-tfqdf                          1/1     Running   0          20h
calico-system     calico-typha-8d6bdd5d5-fzjpn               1/1     Running   0          20h
calico-system     calico-typha-8d6bdd5d5-gtbsw               1/1     Running   0          20h
calico-system     calico-typha-8d6bdd5d5-tnjcz               1/1     Running   0          20h
kube-system       coredns-558bd4d5db-k8h55                   1/1     Running   0          20h
kube-system       coredns-558bd4d5db-svp75                   1/1     Running   0          20h
kube-system       etcd-controller-0                          1/1     Running   0          20h
kube-system       etcd-controller-1                          1/1     Running   0          20h
kube-system       etcd-controller-2                          1/1     Running   0          20h
kube-system       haproxy-controller-0                       1/1     Running   0          20h
kube-system       haproxy-controller-1                       1/1     Running   0          20h
kube-system       haproxy-controller-2                       1/1     Running   0          20h
kube-system       keepalived-controller-0                    1/1     Running   0          20h
kube-system       keepalived-controller-1                    1/1     Running   0          20h
kube-system       keepalived-controller-2                    1/1     Running   0          20h
kube-system       kube-apiserver-controller-0                1/1     Running   0          20h
kube-system       kube-apiserver-controller-1                1/1     Running   0          20h
kube-system       kube-apiserver-controller-2                1/1     Running   0          20h
kube-system       kube-controller-manager-controller-0       1/1     Running   1          20h
kube-system       kube-controller-manager-controller-1       1/1     Running   0          20h
kube-system       kube-controller-manager-controller-2       1/1     Running   0          20h
kube-system       kube-proxy-5pwws                           1/1     Running   0          20h
kube-system       kube-proxy-b2qrg                           1/1     Running   0          20h
kube-system       kube-proxy-c22tz                           1/1     Running   0          20h
kube-system       kube-proxy-hnd7s                           1/1     Running   0          20h
kube-system       kube-proxy-pkdkp                           1/1     Running   0          20h
kube-system       kube-proxy-tnbmj                           1/1     Running   0          20h
kube-system       kube-scheduler-controller-0                1/1     Running   1          20h
kube-system       kube-scheduler-controller-1                1/1     Running   0          20h
kube-system       kube-scheduler-controller-2                1/1     Running   0          20h
tigera-operator   tigera-operator-86c4fc874f-g5t9d           1/1     Running   1          20h

cifangzi@controller-0:~$ kubectl get nodes
NAME           STATUS   ROLES                  AGE   VERSION
controller-0   Ready    control-plane,master   22h   v1.21.1
controller-1   Ready    control-plane,master   21h   v1.21.1
controller-2   Ready    control-plane,master   21h   v1.21.1
node0          Ready    <none>                 21h   v1.21.1
node1          Ready    <none>                 21h   v1.21.1
node2          Ready    <none>                 21h   v1.21.1

2.4 Installing Multus CNI

Installing Multus CNI is also straightforward: clone the Multus source repository and apply its DaemonSet manifest.

git clone https://github.com/k8snetworkplumbingwg/multus-cni.git
cd multus-cni
cat ./images/multus-daemonset.yml | kubectl apply -f -
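
Once Multus is running, additional network planes (the Data Network from section 2.1) are attached to Pods through NetworkAttachmentDefinition objects. A minimal macvlan example might look like the following; the attachment name data-net, the master interface ens8, and the subnet are placeholders for illustration only:

cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: data-net                 # hypothetical attachment name
  namespace: default
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "ens8",
    "ipam": { "type": "host-local", "subnet": "192.168.200.0/24" }
  }'
EOF

# a Pod then requests the extra interface with an annotation:
#   metadata:
#     annotations:
#       k8s.v1.cni.cncf.io/networks: data-net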

With that, the installation of the highly available Kubernetes cluster is complete.