Tuesday, September 5, 2017

Kubernetes 1.7 on CentOS: This is How We Nailed It!

Please Note: If you are looking for a comprehensive, step-by-step guide on installing Kubernetes on CentOS, check out this guide (composed by one of my colleagues) instead.

Caution: This blog was originally composed several months ago, when the K8s release cycle was at v1.7.0-alpha.3. As always, K8s has zoomed past us, so much of what follows is likely obsolete by now!


If you are a fan of Kubernetes (K8s) and the ton of amazing features it has in store for developers, you would surely agree that getting a local K8s setup up and running is one of the most fundamental steps in learning K8s.

Since we have been waiting for quite some time to re-spin our Integration Platform (IPS) product on top of a new K8s version, we decided it was high time to get a brand new cluster up and running with 1.7.0 (which is still in beta as of this writing, but would become stable by the time we migrate our own code on top of it). Due to some stability requirements we wanted a setup running a RedHat-compatible base OS, so we decided to go ahead with CentOS.

While we could always have followed one of the ready-made guides from the K8s docs, such as this, we wanted to be a bit more adventurous and get things done straight from the K8s source; besides, we wanted to be able to upgrade our set-up when required, and the best way to do that was to properly understand how the K8s authors have nailed it in the original source.

First of all, we got hold of a clone of the K8s source from the master branch (as we needed the latest code base), checked out at tag v1.7.0-alpha.3 (since we were looking for that specific release).
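In practice, that boils down to something like the following (using the standard upstream clone URL; adjust if you work off a fork):

# Grab the K8s source and check out the exact tag we were targeting
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
git checkout v1.7.0-alpha.3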

Next, we built the K8s binary artifacts by running make at the source root. (You can utilize the Docker-based cross-compilation approach mentioned in the above guide as well, but note that it downloads a 1.8+ GB cross-compiler image and builds artifacts for all platforms, resulting in a 20+ GB total size.)
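For reference, here is a minimal sketch of the two build routes; the quick-release target is the Docker-based cross-compilation path, and its exact name and behaviour should be verified against your checkout, since the build tooling evolves over time:

# Native build: compiles the client/server binaries for the local platform only
make

# Docker-based cross-compilation: pulls the large kube-cross builder image
# and produces release artifacts for all platforms (the 20+ GB option)
make quick-release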

We set up a VirtualBox virtual machine running CentOS 7, and installed docker, etcd and flanneld afresh. Docker was installed via yum (sudo yum install docker) and configured for use by our default, non-root user. The others were installed via yum as well, at versions etcd-3.1.3-1.el7.x86_64 and flannel-0.7.0-1.el7.x86_64, respectively.
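The relevant commands were roughly as follows; the group handling is only one common way of granting a non-root user access to the Docker daemon, and note that the CentOS docker package may use the group dockerroot instead of docker (in which case the daemon socket has to be configured to match):

# Install the three components from the stock CentOS 7 repositories
sudo yum install -y docker etcd flannel

# Let our non-root user (adrt) talk to the Docker daemon without sudo;
# log out and back in afterwards for the group change to take effect
sudo usermod -aG dockerroot adrt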

We were hoping to clone the machine in order to obtain a multi-node set-up (without undertaking the overhead of installing the individual components on each machine separately). As it turned out, however, this complicated the whole process.

The default K8s installer for CentOS (located at cluster/centos) downloads archives of the K8s client/server, etcd, flanneld and docker binaries as part of the configuration process. Since we needed to install the K8s binaries from our custom build instead, and didn't want to download the other binaries at all (as they were already installed), we had to customize the corresponding build and deployment scripts: first to skip downloading the artifacts and use our build outputs instead, and second to control how etcd, flanneld and docker get reconfigured.

What follows is a breakdown of the changes we had to perform on top of the v1.7.0-alpha.3 tag to get things working with native (yum-based) installations of etcd, flanneld and docker, with some attempts to explain the rationale behind each change.

Tip: If you wish to utilize any of these patches, you should be able to save them into a plaintext file (with or without the .patch extension) and apply them on top of the v1.7.0-alpha.3 tag of the K8s source using the git apply command.
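For example (the file name centos-native.patch is purely illustrative):

# From the root of the K8s source tree, with the right tag checked out
git checkout v1.7.0-alpha.3
git apply --check centos-native.patch   # dry run: verify the patch applies cleanly
git apply centos-native.patch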

Tip: If only a subset of the above services (etcd, flanneld and docker) is already installed natively, apply only the patch files/sections corresponding to those services.

  1. Avoiding download of binary artifacts:
    diff --git a/cluster/centos/build.sh b/cluster/centos/build.sh
    index 18bbe6f..09a3631 100755
    --- a/cluster/centos/build.sh
    +++ b/cluster/centos/build.sh
    @@ -42,19 +42,6 @@ function clean-up() {
     function download-releases() {
       rm -rf ${RELEASES_DIR}
       mkdir -p ${RELEASES_DIR}
    -
    -  echo "Download flannel release v${FLANNEL_VERSION} ..."
    -  curl -L ${FLANNEL_DOWNLOAD_URL} -o ${RELEASES_DIR}/flannel.tar.gz
    -
    -  echo "Download etcd release v${ETCD_VERSION} ..."
    -  curl -L ${ETCD_DOWNLOAD_URL} -o ${RELEASES_DIR}/etcd.tar.gz
    -
    -  echo "Download kubernetes release v${K8S_VERSION} ..."
    -  curl -L ${K8S_CLIENT_DOWNLOAD_URL} -o ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz
    -  curl -L ${K8S_SERVER_DOWNLOAD_URL} -o ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz
    -
    -  echo "Download docker release v${DOCKER_VERSION} ..."
    -  curl -L ${DOCKER_DOWNLOAD_URL} -o ${RELEASES_DIR}/docker.tar.gz
     }
     
     function unpack-releases() {
    @@ -80,19 +67,12 @@ function unpack-releases() {
       fi
     
       # k8s
    -  if [[ -f ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz ]] ; then
    -    tar xzf ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz -C ${RELEASES_DIR}
         cp ${RELEASES_DIR}/kubernetes/client/bin/kubectl ${BINARY_DIR}
    -  fi
    -
    -  if [[ -f ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz ]] ; then
    -    tar xzf ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz -C ${RELEASES_DIR}
         cp ${RELEASES_DIR}/kubernetes/server/bin/kube-apiserver \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-controller-manager \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-scheduler ${BINARY_DIR}/master/bin
         cp ${RELEASES_DIR}/kubernetes/server/bin/kubelet \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-proxy ${BINARY_DIR}/node/bin
    -  fi
     
       # docker
       if [[ -f ${RELEASES_DIR}/docker.tar.gz ]]; then
    
  2. Avoiding verification of artifacts (since we no longer download them):
    diff --git a/cluster/kube-up.sh b/cluster/kube-up.sh
    index 7877fb9..9e793ce 100755
    --- a/cluster/kube-up.sh
    +++ b/cluster/kube-up.sh
    @@ -40,8 +40,6 @@ fi
     
     echo "... calling verify-prereqs" >&2
     verify-prereqs
    -echo "... calling verify-kube-binaries" >&2
    -verify-kube-binaries
     
     if [[ "${KUBE_STAGE_IMAGES:-}" == "true" ]]; then
       echo "... staging images" >&2
    
  3. Coping with natively installed etcd, flanneld and docker binaries, which are located (or rather symlinked) at /usr/bin instead of /opt/kubernetes/bin where K8s expects them to be (see the quick sanity check after these patches):
    diff --git a/cluster/centos/master/scripts/etcd.sh b/cluster/centos/master/scripts/etcd.sh
    index aa73b57..6b575da 100755
    --- a/cluster/centos/master/scripts/etcd.sh
    +++ b/cluster/centos/master/scripts/etcd.sh
    @@ -74,7 +74,7 @@ Type=simple
     WorkingDirectory=${etcd_data_dir}
     EnvironmentFile=-/opt/kubernetes/cfg/etcd.conf
     # set GOMAXPROCS to number of processors
    -ExecStart=/bin/bash -c "GOMAXPROCS=\$(nproc) /opt/kubernetes/bin/etcd"
    +ExecStart=/bin/bash -c "GOMAXPROCS=\$(nproc) /usr/bin/etcd"
     Type=notify
     
     [Install]
    diff --git a/cluster/centos/master/scripts/flannel.sh b/cluster/centos/master/scripts/flannel.sh
    index 092fcd8..5d9630d 100644
    --- a/cluster/centos/master/scripts/flannel.sh
    +++ b/cluster/centos/master/scripts/flannel.sh
    @@ -37,7 +37,7 @@ After=network.target
     
     [Service]
     EnvironmentFile=-/opt/kubernetes/cfg/flannel
    -ExecStart=/opt/kubernetes/bin/flanneld --ip-masq \${FLANNEL_ETCD} \${FLANNEL_ETCD_KEY} \${FLANNEL_ETCD_CAFILE} \${FLANNEL_ETCD_CERTFILE} \${FLANNEL_ETCD_KEYFILE}
    +ExecStart=/usr/bin/flanneld --ip-masq \${FLANNEL_ETCD} \${FLANNEL_ETCD_KEY} \${FLANNEL_ETCD_CAFILE} \${FLANNEL_ETCD_CERTFILE} \${FLANNEL_ETCD_KEYFILE}
     
     Type=notify
     
    @@ -48,7 +48,7 @@ EOF
     # Store FLANNEL_NET to etcd.
     attempt=0
     while true; do
    -  /opt/kubernetes/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
    +  /usr/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
         --no-sync -C ${ETCD_SERVERS} \
         get /coreos.com/network/config >/dev/null 2>&1
       if [[ "$?" == 0 ]]; then
    @@ -59,7 +59,7 @@ while true; do
           exit 2
         fi
     
    -    /opt/kubernetes/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
    +    /usr/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
           --no-sync -C ${ETCD_SERVERS} \
           mk /coreos.com/network/config "{\"Network\":\"${FLANNEL_NET}\"}" >/dev/null 2>&1
         attempt=$((attempt+1))
    diff --git a/cluster/centos/node/bin/mk-docker-opts.sh b/cluster/centos/node/bin/mk-docker-opts.sh
    index 041d977..177ee9f 100755
    --- a/cluster/centos/node/bin/mk-docker-opts.sh
    +++ b/cluster/centos/node/bin/mk-docker-opts.sh
    @@ -69,7 +69,6 @@ done
     
     if [[ $indiv_opts = false ]] && [[ $combined_opts = false ]]; then
       indiv_opts=true
    -  combined_opts=true
     fi
     
     if [[ -f "$flannel_env" ]]; then
    diff --git a/cluster/centos/node/scripts/docker.sh b/cluster/centos/node/scripts/docker.sh
    index 320446a..3f38f3e 100755
    --- a/cluster/centos/node/scripts/docker.sh
    +++ b/cluster/centos/node/scripts/docker.sh
    @@ -35,7 +35,7 @@ Type=notify
     EnvironmentFile=-/run/flannel/docker
     EnvironmentFile=-/opt/kubernetes/cfg/docker
     WorkingDirectory=/opt/kubernetes/bin
    -ExecStart=/opt/kubernetes/bin/dockerd \$DOCKER_OPT_BIP \$DOCKER_OPT_MTU \$DOCKER_OPTS
    +ExecStart=/usr/bin/dockerd \$DOCKER_OPT_BIP \$DOCKER_OPT_MTU \$DOCKER_OPTS
     LimitNOFILE=1048576
     LimitNPROC=1048576
     
    diff --git a/cluster/centos/node/scripts/flannel.sh b/cluster/centos/node/scripts/flannel.sh
    index 2830dae..384788f 100755
    --- a/cluster/centos/node/scripts/flannel.sh
    +++ b/cluster/centos/node/scripts/flannel.sh
    @@ -39,7 +39,7 @@ Before=docker.service
     [Service]
     EnvironmentFile=-/opt/kubernetes/cfg/flannel
     ExecStartPre=/opt/kubernetes/bin/remove-docker0.sh
    -ExecStart=/opt/kubernetes/bin/flanneld --ip-masq \${FLANNEL_ETCD} \${FLANNEL_ETCD_KEY} \${FLANNEL_ETCD_CAFILE} \${FLANNEL_ETCD_CERTFILE} \${FLANNEL_ETCD_KEYFILE}
    +ExecStart=/usr/bin/flanneld --ip-masq \${FLANNEL_ETCD} \${FLANNEL_ETCD_KEY} \${FLANNEL_ETCD_CAFILE} \${FLANNEL_ETCD_CERTFILE} \${FLANNEL_ETCD_KEYFILE}
     ExecStartPost=/opt/kubernetes/bin/mk-docker-opts.sh -d /run/flannel/docker
     
     Type=notify
    @@ -52,7 +52,7 @@ EOF
     # Store FLANNEL_NET to etcd.
     attempt=0
     while true; do
    -  /opt/kubernetes/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
    +  /usr/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
         --no-sync -C ${ETCD_SERVERS} \
         get /coreos.com/network/config >/dev/null 2>&1
       if [[ "$?" == 0 ]]; then
    @@ -63,7 +63,7 @@ while true; do
           exit 2
         fi
     
    -    /opt/kubernetes/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
    +    /usr/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
           --no-sync -C ${ETCD_SERVERS} \
           mk /coreos.com/network/config "{\"Network\":\"${FLANNEL_NET}\"}" >/dev/null 2>&1
         attempt=$((attempt+1))
    

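Before relying on the path changes in item 3, it is worth a quick sanity check of where the yum-installed binaries actually ended up; on our machines they resolved to /usr/bin (some of them as symlinks), but this can vary with the packaging:

# Confirm the locations that the patched systemd unit files will point at
which etcd etcdctl flanneld dockerd
ls -l /usr/bin/etcd /usr/bin/etcdctl /usr/bin/flanneld /usr/bin/dockerd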
Once all fixes were in place, all we needed to do to get the ball rolling was to run

MASTER=adrt@192.168.1.192 \
NODES="adrt@192.168.1.192 adrt@192.168.1.193 adrt@192.168.1.194" \
DOCKER_OPTS="--insecure-registry=hub.adroitlogic.com:5000" \
KUBERNETES_PROVIDER=centos \
CERT_GROUP=janaka \
./kube-up.sh

within the cluster directory of the K8s source.

  • Our master node is 192.168.1.192.
  • We have 3 worker nodes, 192.168.1.192, 192.168.1.193 and 192.168.1.194.
  • Our master node itself is a worker node (since we often don't have enough resources on our machines to run a dedicated master).
  • adrt is the CentOS user on each node (with superuser privileges).
  • janaka is the username on my local machine (where the kube-up.sh script actually gets executed).
  • We have a local Docker registry for holding our IPS images, at hub.adroitlogic.com:5000 (resolved via internal hostname mappings), which we need to pass into the Docker startup script (so that we can utilize our own images within the future IPS set-up).

And within minutes, we got a working K8s cluster!
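A rough way to verify the result (assuming kube-up.sh has configured a local kubeconfig context for the new cluster, which the centos provider scripts attempt to do; otherwise point kubectl at the master explicitly via --server):

# All three nodes should report Ready once their kubelets have registered
kubectl get nodes

# System components (and any add-ons, if deployed) show up here
kubectl get pods --all-namespaces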


P.S.: All this happened a long time ago, and we utilized the cluster in developing and testing our Integration Platform (IPS), whose latest 17.07 release is now available for download. Check it out; you may happen to like it!
