Wednesday, September 13, 2017

Apps Script Navigator: UrlFetchApp Empowered with Cookie Management, Re-login and Much More!

If you are a fan of Google Apps Script, you would definitely have played around with its UrlFetchApp module, which offers a convenient HTTP(S) client interface. As a frequent user of the module, I admit that it is really cool, but I always felt I needed a richer client, capable of handling stuff like automatic cookie management, proper redirection, and logout detection with automatic re-login.

Hence came into being Navigator, a wrapper for UrlFetchApp that bundles the basic HTTP client operations into convenient methods while addressing some of the pain points of the original module.

Unfortunately, due to a bug in the Apps Script framework, Navigator is not yet fully compatible with the Apps Script editor's autocompletion feature. Hence, for now, we will have to depend on the comments in the source as documentation. Here is a summary of the features and utilities of the module:

A Navigator can be constructed using:

/**
 * a Navigator
 * invocation: Navigator.navigator(baseUrl)
 * returns a new Navigator object based on the given base URL (protocol://host:port/resource/path).
 * @class 
 * @param {baseUrl} default base URL for outbound requests
 * @implements {Navigator}
 * @return {Navigator} a new Navigator
 */
function Navigator(baseUrl)

The Navigator object currently supports the following methods:

/**
 * executes a GET request
 * @param {path} the destination path (relative or absolute)
 * @return the response payload
 */
Navigator.prototype.doGet

/**
 * executes a POST request
 * @param {path} the destination path (relative or absolute)
 * @param {payload} the payload (will be {@link UrlFetchApp}-escaped unless a String) to be sent with the
 *                  request; i.e. sent verbatim in case it is a string, or with escaping otherwise
 * @param {headers} key-value headers (as an object) to be sent with the request
 * @return the response payload
 */
Navigator.prototype.doPost

/**
 * executes an arbitrary request in {@link UrlFetchApp} style, for cases where you want to directly
 * manipulate certain options being passed to UrlFetchApp.fetch. However this still provides the
 * built-in enhancements of Navigator such as automatic cookie management.
 * @param {path} the destination path (relative or absolute)
 * @param {options} a {@link UrlFetchApp}-compatible options object
 * @return the response payload
 */
Navigator.prototype.sendRequest

The following configurator methods decide the behaviour of various features of Navigator:

/**
 * if set, cookies will be saved in {@link PropertiesService.getScriptProperties()}
 * @param {saveCookies} true if cookies should be saved
 */
Navigator.prototype.setSaveCookies

/**
 * if saveCookies is set, decides the base username for saving cookies in the properties store (key {cookieusername}_cookie_{cookiename})
 * @param {cookieUsername} base username for cookies
 */
Navigator.prototype.setCookieUsername

/**
 * updates the local cookie cache with cookies received in a response, and returns the computed 'Cookie' header
 * @param {cookie} the current 'Cookie' header (string)
 * @param {rawCook} the cookie string ('Set-Cookie' header) received in a response
 * @return the updated 'Cookie' header string
 */
Navigator.prototype.updateCookies

/**
 * sets an absolute (starting with protocol://) or relative path for login requests to base website
 * @param {loginPath} path for login requests
 */
Navigator.prototype.setLoginPath

/**
 * sets the payload to be submitted during login (for automatic relogin)
 * @param {loginPayload} the login request payload
 */
Navigator.prototype.setLoginPayload

/**
 * if set, an automatic relogin will be performed whenever this content fragment is encountered in the response body
 * @param {logoutIndicator} content indicating a logout, for attempting relogin
 */
Navigator.prototype.setLogoutIndicator

/**
 * if set, when an automatic login is executed during a URL request, the original request will be replayed after login
 * @param {refetchOnLogin} true if refetch is required in case of a relogin
 */
Navigator.prototype.setRefetchOnLogin

/**
 * if set, logs would be generated for each request
 * @param {debug} true if request debug logging should be enabled
 */
Navigator.prototype.setDebug

The internal state of Navigator (such as the currently active cookies) can be obtained via the following methods:

/**
 * returns current 'Cookie' header
 * @return current 'Cookie' header string
 */
Navigator.prototype.getCookies

/**
 * returns headers received in the last navigation
 * @return headers from the last navigation
 */
Navigator.prototype.getLastHeaders

Navigator also provides some handy utility functions for extracting content from navigated pages, including those in the vicinity of HTML tags:

/**
 * similar to {@link extract} but is specialized for extracting form field values ("value" attributes)
 * @param {body} the HTML payload string
 * @param {locator} locator of the form field to be extracted (appearing before value)
 * @return value of the form field
 */
function getFormParam(body, locator)

/**
 * extracts a given tag attribute from an HTML payload based on a given locator; assumes the locator appears before the attribute
 * @param {body} the HTML payload string
 * @param {key} key (name) of the tag attribute
 * @param {locator} locator of the tag whose attribute is to be extracted (appearing before key)
 * @return value of the tag attribute
 */
function extract(body, key, locator)

/**
 * similar to {@link extract} but performs a reverse match (for cases where the locator appears after the attribute)
 * @param {body} the HTML payload string
 * @param {key} key of the tag attribute
 * @param {locator} locator of the tag whose attribute is to be extracted (appearing after key)
 * @return value of the tag attribute
 */
function extractReverse(body, key, locator)
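
To make the semantics of these helpers concrete, here is a toy example (the HTML fragment is made up, and I am assuming the helpers return the bare attribute values, as documented above):

// a made-up fragment of a fetched page
var body = '<input type="hidden" name="authenticity_token" value="abc123">' +
    '<input class="js-form-login" name="user[email]">' +
    '<meta content="xyz789" itemprop="csrf-token">';

Navigator.getFormParam(body, 'name="authenticity_token"');          // "abc123"
Navigator.extract(body, "name", "js-form-login");                   // "user[email]" (locator appears before the attribute)
Navigator.extractReverse(body, "content", 'itemprop="csrf-token"'); // "xyz789" (locator appears after the attribute)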

Here are a few snippets utilizing the above features. I have masked the actual URLs and some of the parameters being passed, but they should give you the hang of Navigator's usage.

Setting up automatic login and cookie saving:

var nav = new Navigator.Navigator("http://www.example.com");
nav.setSaveCookies(true);
nav.setCookieUsername(email);

// login form at http://www.example.com/login
nav.setLoginPath("login");

// will output all URLs, headers, cookies and payloads via Logger
nav.setDebug(true);

// for static login forms, this payload will be submitted during automatic log-in (if enabled)
nav.setLoginPayload({
	email: email,
	password: password
});

// only logged-out pages contain this; if we see this, we know we're logged out
nav.setLogoutIndicator("password_reset");

// if you request /home, and Navigator happens to find itself logged out and logs in again,
// setting this to true will make Navigator re-fetch /home right after the re-login
nav.setRefetchOnLogin(true);

// try #1; will automatically log in if stored cookies are already expired
var str = nav.doGet("daily");
if (str.indexOf("Get daily prize now") > 0) {
	str = nav.doGet("daily"); // try #2
	if (str.indexOf("Get daily prize now") > 0) {
		// notify failure
	}
}

Extracting HTML and form parameters for composing complex payloads:

// n is a Navigator

/* this will encode the payload as a regular form submission, just like UrlFetchApp
   Content-Type has no significance; if you want to send JSON you have to
   JSON.stringify() the payload yourself and pass the resulting string as the payload */

r = n.doPost("submit_path", {
	name: "name",
	email: email,
	"X-CSRF-Token": Navigator.extractReverse(s, "content", 'itemprop="csrf-token"')
}, {
	"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
	"X-Requested-With": "XMLHttpRequest"
});

/* sometimes, due to issues like empty param values (https://issuetracker.google.com/issues/36762225)
   you will have to do the encoding on your own, in which case you can directly submit a string payload
   which will be passed to UrlFetchApp verbatim (with escaping disabled) */

var payload = "authenticity_token=" +
	encodeURIComponent(Navigator.getFormParam(s, 'name="authenticity_token"')) +
	"&timestamp=" + Navigator.getFormParam(s, 'name="timestamp"') +
	"&spinner=" + Navigator.getFormParam(s, "spinner") + "&" +
	Navigator.extract(s, "name", "js-form-login") + "=USERNAME&login=&" +
	Navigator.extract(s, "name", "js-form-password") + "=PASSWORD&password=";

s = n.doPost("user_sessions", payload, {
	"Origin": "https://www.example.com",
	"X-CSRF-Token": Navigator.extractReverse(s, "content", 'itemprop="csrf-token"'),
	"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
	"X-Requested-With": "XMLHttpRequest"
});

Obtaining headers and cookies retrieved in the last response:

Logger.log(n.getLastHeaders());
Logger.log(n.getCookies());

You can use Navigator in your own Apps Script project by adding it as an external library (Project key: MjQYH5uJpvHxeouBGoyDzheZkF1ZLDPsS). The source is also available via this Apps Script project, in case you are interested in peeking at the innards, or want to customize it with your own gimmicks. Should you decide that you need more, which may also be useful for others (including me), please do share your idea here (along with the implementation, if you already have one) so that I can incorporate it into Navigator.

Happy scripting!

Monday, September 11, 2017

Logging in Style: log4j 2, Contextuality, Auto-cleanup... All with No Strings Attached!

Logging—maintaining a temporal trace of operations—is vital for any mission-critical system, no matter how big or small. Same was the case with our Project-X framework, which is why we wanted to get it right, right from the beginning.

Contextual logging—where each log line automatically records its originating logical context, such as whether it came from a specific unit or from the base framework—was something we had been looking forward to, based on our experience with logging on the legendary UltraESB.

We already knew that log4j2 offered contextual logging with its CloseableThreadContext implementation, covering almost everything that we needed; but we needed more:

  1. We needed a proper log code governance mechanism where each log line contains a unique log code, indicating the subsystem, module (package) and even an exact "index" of the specific log statement, so that we would no longer need to grep through the whole codebase to find out where the bugger came from.
  2. We needed environment variables and system properties with a certain prefix to be automatically injected into the logging context, so that specific applications could expose their runtime parameters in the logs (such as the cluster ID in the case of our Integration Platform).

We also needed to be API-independent of log4j2, as we wanted to retain the freedom to detach from log4j2 and switch to a different logging framework (such as logback) if the need arose. While we could have utilized a third-party wrapper such as SLF4J, we couldn't find one that could readily fulfill all our demands.

Hence, as with the previous UltraESB, we wrapped log4j2 with x-logging, our own logging implementation. x-logging consists of an API and a set of bindings to real logging frameworks (like log4j2 and logback), one of which can be plugged in dynamically at server startup time with Java's dear old ServiceLoader mechanism. This helped us avoid leaking log4j2-specific stuff into our implementations, as the log4j2-based implementation (and hence log4j2 itself) could be completely removed from the set of compile-time dependencies.
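
To give a feel for that separation, here is a rough sketch (an illustration only, not the actual x-logging API; the method names are my own) of what a code-aware, log4j2-free logger facade could look like:

// hypothetical facade living in the x-logging API module; note that no log4j2 types appear here
public interface XLogger {

    // every call carries a numeric log code, which the binding combines with the
    // logger's pre-computed code prefix before the line is actually published
    void debug(int code, String message, Object... params);

    void info(int code, String message, Object... params);

    void warn(int code, String message, Object... params);

    void error(int code, String message, Object... params);
}

A log4j2 binding would implement such an interface on top of org.apache.logging.log4j.Logger, while a logback binding could do the same against ch.qos.logback.classic.Logger, without the API module ever knowing the difference.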

Ruwan from our team, who was also the originator of Project-X, hacked around with log4j2 for some time, and finally came up with a cool design to automatically propagate the current context of a log line, i.e. whether it originated from the platform (system, a.k.a. engine) or from a deployed project, and if it's the latter, other metadata of the project (such as the version). The coolest part was that this context automatically gets cleaned up once execution leaves that particular context.

If you are familiar with CloseableThreadContext, this may all sound quite simple. For the rest of the crowd, it would be enough to mention that CloseableThreadContext facilitates injecting key-value pairs into the logging context, such that when the context is closed, only the ones injected in the current context get cleaned up. The injected values are automatically made available to the logging context (ThreadContext) of the calling thread; or, in plain English, every log line printed by that thread sees the parameter in its thread context (or MDC, in old-school jargon).

Okay, I admit the above might have been a bit hard to understand. Perhaps a sample snippet may do a better job:

// assume we are walking in, with nothing useful inside the context

try (CloseableThreadContext.Instance level1 = CloseableThreadContext.put("level", "1")) {

    // now the context has "1" as "level"
    logger.debug("Commencing operation"); // will see {level=1} as the context
    // let's also put in a "clearance" value
    level1.put("clearance", "nypd");
    // now, any log lines would see {level=1,clearance=nypd}

    // let's go deeper
    try (CloseableThreadContext.Instance level2 = CloseableThreadContext.put("level", "2").put("clearance", "fbi")) {

        // now both of the above "level" and "clearance" values are "masked" by the new ones
        // and yes, you can chain together the context mutations
        logger.debug("Commencing investigation"); // will see {level=2,clearance=fbi}

        // putting in some more
        level2.put("access", "privileged");
        // now context is {level=2,clearance=fbi,access=privileged}

        // still deeper...
        try (CloseableThreadContext.Instance level3 = CloseableThreadContext.put("level", "3").put("clearance", "cia")) {

            // "level" and "clearance" are overridden, but "access" remains unchanged
            logger.debug("Commencing consipracy"); // {level=3,clearance=cia,access=privileged}

        }

        // cool thing is, once you're out of the level3 block, the context will be restored to that of level2
        // (thanks to the AutoCloseable nature of CloseableThreadContext.Instance)

        logger.debug("Back to investigation"); // {level=2,clearance=fbi,access=privileged}
    }

    // same for exiting level 2
    logger.debug("Back to operation"); // {level=1,clearance=nypd}; access is gone!

}

logger.debug("Back to square one"); // {}; oh no, all gone!

This mechanism was ideal for our use, as we needed to include the current execution context of a thread along with every log line generated by that thread:

  1. In Project-X, the underlying engine of UltraESB-X, a worker threadpool maintained at the base framework level is responsible for processing inbound messages on behalf of an integration flow belonging to a particular project.
  2. We consider the thread to be in the project's context only after the message has been injected to the ingress connector of a particular integration flow. The worker thread is supposed to do quite a bit of work before that, all of which would be considered to belong to a system context.
  3. We generate logs throughout the whole process, so they should automatically get tagged with the appropriate context.
  4. Moreover, since we have specific log codes for each log line, we need to open up a new context each time we actually output a log line, which would contain the required log code in addition to the other context parameters.

So, the life of a thread in the pool would be an endless loop of something like:

// wake up from thread pool

// do system level stuff

loggerA.debug(143, "Now I'm doing this cool thing : {}", param);

try (CloseableThreadContext.Instance projectCtx = CloseableThreadContext.put("project", project.getName())
     .put("version", project.getVersion())) {

    // do project level stuff

    loggerM.debug(78, "About to get busy : {}", param);

    // more stuff, tra la la la
}

// back to system level, do still more stuff

// jump back to thread pool and have some sleep

Internally, loggerA, loggerM and others will ultimately invoke a logImpl(code, message, params) method:

// context already has system/project info,
// logger already has a pre-computed codePrefix

try (CloseableThreadContext.Instance logCtx = CloseableThreadContext.put("logcode", codePrefix + code)) {
    // publish the actual log line
}

// only "logcode" cleared from the context, others remain intact

We simulated this behaviour without binding to log4j2, by introducing a CloseableContext interface whose log4j2 variant (Log4j2CloseableContext, obviously) will manipulate CloseableThreadContext instances in the same manner:

import java.io.Closeable;

public interface CloseableContext extends Closeable {

    CloseableContext append(final String key, final String value);

    void close();
}

and:

import org.adroitlogic.x.logging.CloseableContext;
import org.apache.logging.log4j.CloseableThreadContext;

public class Log4j2CloseableContext implements CloseableContext {

    private final CloseableThreadContext.Instance ctx;

    /**
     * Creates an instance wrapping a new default MDC instance
     */
    Log4j2CloseableContext() {
        this.ctx = CloseableThreadContext.put("impl", "project-x");
    }

    /**
     * Adds the provided key-value pair to the currently active log4j logging (thread) context
     *
     * @param key   the key to be inserted into the context
     * @param value the value to be inserted, corresponding to {@code key}
     * @return the current instance, wrapping the same logging context
     */
    @Override
    public CloseableContext append(String key, String value) {
        ctx.put(key, value);
        return this;
    }

    /**
     * Closes the log4j logging context wrapped by the current instance
     */
    @Override
    public void close() {
        ctx.close();
    }
}

Now, all we have to do is to open up an appropriate context via a nice management interface, LogContextProvider:

// system context is active by default

...

try (CloseableContext projectCtx = LogContextProvider.forProject(project.getName(), project.getVersion())) {

    // now in project context
}

// back to system context

And in logImpl:

try (CloseableContext logCtx = LogContextProvider.overlayContext("logcode", codePrefix + code)) {
    // call the underlying logging framework
}

Since we load the CloseableContext implementation together with the logger binding (via ServiceLoader), we know that LogContextProvider will ultimately end up invoking the correct implementation.
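
For the curious, here is a minimal sketch of how that lookup could work (the CloseableContextFactory SPI, its shape and the registration details below are my own assumptions, not the actual Project-X code):

import java.util.ServiceLoader;

import org.adroitlogic.x.logging.CloseableContext;

public final class LogContextProvider {

    // hypothetical SPI: each binding ships one implementation, registered through a
    // META-INF/services provider-configuration file in its jar; the log4j2 binding's
    // factory would hand out Log4j2CloseableContext instances
    public interface CloseableContextFactory {
        CloseableContext newContext();
    }

    // resolve whichever binding happens to be on the classpath at startup
    private static final CloseableContextFactory FACTORY =
            ServiceLoader.load(CloseableContextFactory.class).iterator().next();

    private LogContextProvider() {}

    public static CloseableContext forProject(String name, String version) {
        return FACTORY.newContext().append("project", name).append("version", version);
    }

    public static CloseableContext overlayContext(String key, String value) {
        return FACTORY.newContext().append(key, value);
    }
}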

And that's the story of contextual logging in our x-logging framework.

Maybe we could also explain our log code governance approach in a future post; until then, happy logging!

Sunday, September 10, 2017

How to Get Kubernetes Running - On Your Own Ubuntu Machine!

Note: This article largely borrows from my previous writeup on installing K8s 1.7 on CentOS.

Getting a local K8s cluster up and running is one of the first "baby steps" of toddling into the K8s ecosystem. While it would often be considered easier (and safer) to get K8s set up in a cluster of virtual machines (VMs), it does not really give you the same degree of advantage and flexibility as running K8s "natively" on your own host.

That is why, when we decided to upgrade our Integration Platform for K8s 1.7 compatibility, I decided to go with a native K8s installation for my dev environment, a laptop running Ubuntu 16.04 on 16GB RAM and a ?? CPU. While I could run three—at most four—reasonably powerful VMs in there as my dev K8s cluster, I could do much better in terms of resource saving (the cluster would be just a few low-footprint services, rather than resource-eating VMs) as well as ease of operation and management (start the services, and the cluster is ready in seconds). Whenever I wanted to try multi-node stuff like node groups or zones with fail-over, I could simply hook up one or two secondary (worker) VMs to get things going, and shut them down when I was done.

I had already been running K8s 1.2 on my machine, but the upgrade would not have been easy as it's a hyperjump from 1.2 to 1.7. Luckily, the K8s guys had written their installation scripts in an amazingly structured and intuitive way, so I could get everything running with around one day's struggle (most of which went into understanding the command flow and modifying it to suit Ubuntu, and my particular requirements).

I could easily have utilized the official kubeadm guide or the Juju-based installer for Ubuntu, but I wanted to get at least a basic idea of how things get glued together; additionally, I wanted an easily reproducible installation from scratch, with minimal dependencies on external packages or installers, so that I could upgrade my cluster any time I desired, by building the latest stable—or beta, or even alpha, if it comes to that—directly off the K8s source.

I started with the CentOS cluster installer, and gradually modified it to suit Ubuntu 16.04 (luckily the changes were minimal). Most of the things were in line with the installation for CentOS, including the artifact build (make at the source root) and modifications to exclude custom downloads of etcd, flanneld and docker (I built the former two from their sources, and the latter I had already installed via the apt package manager). However, the service configuration scripts had to be modified slightly to suit the Ubuntu (more precisely, Debian) filesystem.

In my case I run the apiserver on port 8090 rather than 8080 (to leave room for other application servers that are fond of 8080), hence I had to make some additional changes to propagate the port change throughout the K8s platform.

So, in summary, I had to apply the following set of patches to get everything in place. If necessary, you could use them by applying them on top of the v1.7.2-beta.0 tag (possibly even a different tag) of the K8s source with the git apply command.
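
For instance, assuming you have saved the patches as ubuntu-k8s-01.patch and ubuntu-k8s-02.patch (the file names are just placeholders) at the root of your K8s clone, the application would look something like:

cd kubernetes
git checkout v1.7.2-beta.0
git apply ubuntu-k8s-01.patch
git apply ubuntu-k8s-02.patch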

  1. Skipping etcd, flannel and docker binaries:

    diff --git a/cluster/centos/build.sh b/cluster/centos/build.sh
    index 5d31437..df057e4 100755
    --- a/cluster/centos/build.sh
    +++ b/cluster/centos/build.sh
    @@ -42,19 +42,6 @@ function clean-up() {
     function download-releases() {
       rm -rf ${RELEASES_DIR}
       mkdir -p ${RELEASES_DIR}
    -
    -  echo "Download flannel release v${FLANNEL_VERSION} ..."
    -  curl -L ${FLANNEL_DOWNLOAD_URL} -o ${RELEASES_DIR}/flannel.tar.gz
    -
    -  echo "Download etcd release v${ETCD_VERSION} ..."
    -  curl -L ${ETCD_DOWNLOAD_URL} -o ${RELEASES_DIR}/etcd.tar.gz
    -
    -  echo "Download kubernetes release v${K8S_VERSION} ..."
    -  curl -L ${K8S_CLIENT_DOWNLOAD_URL} -o ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz
    -  curl -L ${K8S_SERVER_DOWNLOAD_URL} -o ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz
    -
    -  echo "Download docker release v${DOCKER_VERSION} ..."
    -  curl -L ${DOCKER_DOWNLOAD_URL} -o ${RELEASES_DIR}/docker.tar.gz
     }
     
     function unpack-releases() {
    @@ -80,19 +67,12 @@ function unpack-releases() {
       fi
     
       # k8s
    -  if [[ -f ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz ]] ; then
    -    tar xzf ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz -C ${RELEASES_DIR}
         cp ${RELEASES_DIR}/kubernetes/client/bin/kubectl ${BINARY_DIR}
    -  fi
    -
    -  if [[ -f ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz ]] ; then
    -    tar xzf ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz -C ${RELEASES_DIR}
         cp ${RELEASES_DIR}/kubernetes/server/bin/kube-apiserver \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-controller-manager \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-scheduler ${BINARY_DIR}/master/bin
         cp ${RELEASES_DIR}/kubernetes/server/bin/kubelet \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-proxy ${BINARY_DIR}/node/bin
    -  fi
     
       # docker
       if [[ -f ${RELEASES_DIR}/docker.tar.gz ]]; then
    diff --git a/cluster/centos/config-build.sh b/cluster/centos/config-build.sh
    index 4887bc1..39a2b25 100755
    --- a/cluster/centos/config-build.sh
    +++ b/cluster/centos/config-build.sh
    @@ -23,13 +23,13 @@ RELEASES_DIR=${RELEASES_DIR:-/tmp/downloads}
     DOCKER_VERSION=${DOCKER_VERSION:-"1.12.1"}
     
     # Define flannel version to use.
    -FLANNEL_VERSION=${FLANNEL_VERSION:-"0.6.1"}
    +FLANNEL_VERSION=${FLANNEL_VERSION:-"0.8.0"}
     
     # Define etcd version to use.
    -ETCD_VERSION=${ETCD_VERSION:-"3.0.9"}
    +ETCD_VERSION=${ETCD_VERSION:-"3.2.2"}
     
     # Define k8s version to use.
    -K8S_VERSION=${K8S_VERSION:-"1.3.7"}
    +K8S_VERSION=${K8S_VERSION:-"1.7.0"}
     
     DOCKER_DOWNLOAD_URL=\
     "https://get.docker.com/builds/Linux/x86_64/docker-${DOCKER_VERSION}.tgz"
    diff --git a/cluster/kube-up.sh b/cluster/kube-up.sh
    index 7877fb9..9e793ce 100755
    --- a/cluster/kube-up.sh
    +++ b/cluster/kube-up.sh
    @@ -40,8 +40,6 @@ fi
     
     echo "... calling verify-prereqs" >&2
     verify-prereqs
    -echo "... calling verify-kube-binaries" >&2
    -verify-kube-binaries
     
     if [[ "${KUBE_STAGE_IMAGES:-}" == "true" ]]; then
       echo "... staging images" >&2
    
  2. Changing apiserver listen port to 8090, and binary (symlink) and service configuration file locations to Debian defaults:

    diff --git a/cluster/centos/master/scripts/apiserver.sh b/cluster/centos/master/scripts/apiserver.sh
    index 6b7b1c2..62d24fd 100755
    --- a/cluster/centos/master/scripts/apiserver.sh
    +++ b/cluster/centos/master/scripts/apiserver.sh
    @@ -43,8 +43,8 @@ KUBE_ETCD_KEYFILE="--etcd-keyfile=/srv/kubernetes/etcd/client-key.pem"
     # --insecure-bind-address=127.0.0.1: The IP address on which to serve the --insecure-port.
     KUBE_API_ADDRESS="--insecure-bind-address=0.0.0.0"
     
    -# --insecure-port=8080: The port on which to serve unsecured, unauthenticated access.
    -KUBE_API_PORT="--insecure-port=8080"
    +# --insecure-port=8090: The port on which to serve unsecured, unauthenticated access.
    +KUBE_API_PORT="--insecure-port=8090"
     
     # --kubelet-port=10250: Kubelet port
     NODE_PORT="--kubelet-port=10250"
    @@ -101,7 +101,7 @@ KUBE_APISERVER_OPTS="   \${KUBE_LOGTOSTDERR}         \\
                             \${KUBE_API_TLS_PRIVATE_KEY_FILE}"
     
     
    -cat <<EOF >/usr/lib/systemd/system/kube-apiserver.service
    +cat <<EOF >/lib/systemd/system/kube-apiserver.service
     [Unit]
     Description=Kubernetes API Server
     Documentation=https://github.com/kubernetes/kubernetes
    diff --git a/cluster/centos/master/scripts/controller-manager.sh b/cluster/centos/master/scripts/controller-manager.sh
    index 3025d06..5aa0f12 100755
    --- a/cluster/centos/master/scripts/controller-manager.sh
    +++ b/cluster/centos/master/scripts/controller-manager.sh
    @@ -20,7 +20,7 @@ MASTER_ADDRESS=${1:-"8.8.8.18"}
     cat <<EOF >/opt/kubernetes/cfg/kube-controller-manager
     KUBE_LOGTOSTDERR="--logtostderr=true"
     KUBE_LOG_LEVEL="--v=4"
    -KUBE_MASTER="--master=${MASTER_ADDRESS}:8080"
    +KUBE_MASTER="--master=${MASTER_ADDRESS}:8090"
     
     # --root-ca-file="": If set, this root certificate authority will be included in
     # service account's token secret. This must be a valid PEM-encoded CA bundle.
    @@ -41,7 +41,7 @@ KUBE_CONTROLLER_MANAGER_OPTS="  \${KUBE_LOGTOSTDERR} \\
                                     \${KUBE_CONTROLLER_MANAGER_SERVICE_ACCOUNT_PRIVATE_KEY_FILE}\\
                                     \${KUBE_LEADER_ELECT}"
     
    -cat <<EOF >/usr/lib/systemd/system/kube-controller-manager.service
    +cat <<EOF >/lib/systemd/system/kube-controller-manager.service
     [Unit]
     Description=Kubernetes Controller Manager
     Documentation=https://github.com/kubernetes/kubernetes
    diff --git a/cluster/centos/master/scripts/etcd.sh b/cluster/centos/master/scripts/etcd.sh
    index aa73b57..34eff5c 100755
    --- a/cluster/centos/master/scripts/etcd.sh
    +++ b/cluster/centos/master/scripts/etcd.sh
    @@ -64,7 +64,7 @@ ETCD_PEER_CERT_FILE="/srv/kubernetes/etcd/peer-${ETCD_NAME}.pem"
     ETCD_PEER_KEY_FILE="/srv/kubernetes/etcd/peer-${ETCD_NAME}-key.pem"
     EOF
     
    -cat <<EOF >//usr/lib/systemd/system/etcd.service
    +cat <<EOF >/lib/systemd/system/etcd.service
     [Unit]
     Description=Etcd Server
     After=network.target
    diff --git a/cluster/centos/master/scripts/flannel.sh b/cluster/centos/master/scripts/flannel.sh
    index 092fcd8..21e2bbe 100644
    --- a/cluster/centos/master/scripts/flannel.sh
    +++ b/cluster/centos/master/scripts/flannel.sh
    @@ -30,7 +30,7 @@ FLANNEL_ETCD_CERTFILE="--etcd-certfile=${CERT_FILE}"
     FLANNEL_ETCD_KEYFILE="--etcd-keyfile=${KEY_FILE}"
     EOF
     
    -cat <<EOF >/usr/lib/systemd/system/flannel.service
    +cat <<EOF >/lib/systemd/system/flannel.service
     [Unit]
     Description=Flanneld overlay address etcd agent
     After=network.target
    diff --git a/cluster/centos/master/scripts/scheduler.sh b/cluster/centos/master/scripts/scheduler.sh
    index 1a68d71..3b444bf 100755
    --- a/cluster/centos/master/scripts/scheduler.sh
    +++ b/cluster/centos/master/scripts/scheduler.sh
    @@ -27,7 +27,7 @@ KUBE_LOGTOSTDERR="--logtostderr=true"
     # --v=0: log level for V logs
     KUBE_LOG_LEVEL="--v=4"
     
    -KUBE_MASTER="--master=${MASTER_ADDRESS}:8080"
    +KUBE_MASTER="--master=${MASTER_ADDRESS}:8090"
     
     # --leader-elect
     KUBE_LEADER_ELECT="--leader-elect"
    @@ -43,7 +43,7 @@ KUBE_SCHEDULER_OPTS="   \${KUBE_LOGTOSTDERR}     \\
                             \${KUBE_LEADER_ELECT}    \\
                             \${KUBE_SCHEDULER_ARGS}"
     
    -cat <<EOF >/usr/lib/systemd/system/kube-scheduler.service
    +cat <<EOF >/lib/systemd/system/kube-scheduler.service
     [Unit]
     Description=Kubernetes Scheduler
     Documentation=https://github.com/kubernetes/kubernetes
    diff --git a/cluster/centos/node/bin/mk-docker-opts.sh b/cluster/centos/node/bin/mk-docker-opts.sh
    index 041d977..177ee9f 100755
    --- a/cluster/centos/node/bin/mk-docker-opts.sh
    +++ b/cluster/centos/node/bin/mk-docker-opts.sh
    @@ -69,7 +69,6 @@ done
     
     if [[ $indiv_opts = false ]] && [[ $combined_opts = false ]]; then
       indiv_opts=true
    -  combined_opts=true
     fi
     
     if [[ -f "$flannel_env" ]]; then
    diff --git a/cluster/centos/node/scripts/docker.sh b/cluster/centos/node/scripts/docker.sh
    index 320446a..b0312fc 100755
    --- a/cluster/centos/node/scripts/docker.sh
    +++ b/cluster/centos/node/scripts/docker.sh
    @@ -20,10 +20,10 @@ DOCKER_OPTS=${1:-""}
     DOCKER_CONFIG=/opt/kubernetes/cfg/docker
     
     cat <<EOF >$DOCKER_CONFIG
    -DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock -s overlay --selinux-enabled=false ${DOCKER_OPTS}"
    +DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock -s aufs --selinux-enabled=false ${DOCKER_OPTS}"
     EOF
     
    -cat <<EOF >/usr/lib/systemd/system/docker.service
    +cat <<EOF >/lib/systemd/system/docker.service
     [Unit]
     Description=Docker Application Container Engine
     Documentation=http://docs.docker.com
    @@ -35,7 +35,7 @@ Type=notify
     EnvironmentFile=-/run/flannel/docker
     EnvironmentFile=-/opt/kubernetes/cfg/docker
     WorkingDirectory=/opt/kubernetes/bin
    -ExecStart=/opt/kubernetes/bin/dockerd \$DOCKER_OPT_BIP \$DOCKER_OPT_MTU \$DOCKER_OPTS
    +ExecStart=/usr/bin/dockerd \$DOCKER_OPT_BIP \$DOCKER_OPT_MTU \$DOCKER_OPTS
     LimitNOFILE=1048576
     LimitNPROC=1048576
     
    diff --git a/cluster/centos/node/scripts/flannel.sh b/cluster/centos/node/scripts/flannel.sh
    index 2830dae..a927bb2 100755
    --- a/cluster/centos/node/scripts/flannel.sh
    +++ b/cluster/centos/node/scripts/flannel.sh
    @@ -30,7 +30,7 @@ FLANNEL_ETCD_CERTFILE="--etcd-certfile=${CERT_FILE}"
     FLANNEL_ETCD_KEYFILE="--etcd-keyfile=${KEY_FILE}"
     EOF
     
    -cat <<EOF >/usr/lib/systemd/system/flannel.service
    +cat <<EOF >/lib/systemd/system/flannel.service
     [Unit]
     Description=Flanneld overlay address etcd agent
     After=network.target
    diff --git a/cluster/centos/node/scripts/kubelet.sh b/cluster/centos/node/scripts/kubelet.sh
    index 323a03e..4c93015 100755
    --- a/cluster/centos/node/scripts/kubelet.sh
    +++ b/cluster/centos/node/scripts/kubelet.sh
    @@ -39,7 +39,7 @@ NODE_HOSTNAME="--hostname-override=${NODE_ADDRESS}"
     
     # --api-servers=[]: List of Kubernetes API servers for publishing events,
     # and reading pods and services. (ip:port), comma separated.
    -KUBELET_API_SERVER="--api-servers=${MASTER_ADDRESS}:8080"
    +KUBELET_API_SERVER="--api-servers=${MASTER_ADDRESS}:8090"
     
     # --allow-privileged=false: If true, allow containers to request privileged mode. [default=false]
     KUBE_ALLOW_PRIV="--allow-privileged=false"
    @@ -63,7 +63,7 @@ KUBE_PROXY_OPTS="   \${KUBE_LOGTOSTDERR}     \\
                         \${KUBELET_DNS_DOMAIN}      \\
                         \${KUBELET_ARGS}"
     
    -cat <<EOF >/usr/lib/systemd/system/kubelet.service
    +cat <<EOF >/lib/systemd/system/kubelet.service
     [Unit]
     Description=Kubernetes Kubelet
     After=docker.service
    diff --git a/cluster/centos/node/scripts/proxy.sh b/cluster/centos/node/scripts/proxy.sh
    index 584987b..1f365fb 100755
    --- a/cluster/centos/node/scripts/proxy.sh
    +++ b/cluster/centos/node/scripts/proxy.sh
    @@ -29,7 +29,7 @@ KUBE_LOG_LEVEL="--v=4"
     NODE_HOSTNAME="--hostname-override=${NODE_ADDRESS}"
     
     # --master="": The address of the Kubernetes API server (overrides any value in kubeconfig)
    -KUBE_MASTER="--master=http://${MASTER_ADDRESS}:8080"
    +KUBE_MASTER="--master=http://${MASTER_ADDRESS}:8090"
     EOF
     
     KUBE_PROXY_OPTS="   \${KUBE_LOGTOSTDERR} \\
    @@ -37,7 +37,7 @@ KUBE_PROXY_OPTS="   \${KUBE_LOGTOSTDERR} \\
                         \${NODE_HOSTNAME}    \\
                         \${KUBE_MASTER}"
     
    -cat <<EOF >/usr/lib/systemd/system/kube-proxy.service
    +cat <<EOF >/lib/systemd/system/kube-proxy.service
     [Unit]
     Description=Kubernetes Proxy
     After=network.target
    diff --git a/cluster/centos/util.sh b/cluster/centos/util.sh
    index 88302a3..dbb3ca5 100755
    --- a/cluster/centos/util.sh
    +++ b/cluster/centos/util.sh
    @@ -136,7 +136,7 @@ function kube-up() {
     
       # set CONTEXT and KUBE_SERVER values for create-kubeconfig() and get-password()
       export CONTEXT="centos"
    -  export KUBE_SERVER="http://${MASTER_ADVERTISE_ADDRESS}:8080"
    +  export KUBE_SERVER="http://${MASTER_ADVERTISE_ADDRESS}:8090"
       source "${KUBE_ROOT}/cluster/common.sh"
     
       # set kubernetes user and password
    @@ -199,7 +199,7 @@ function troubleshoot-node() {
     function tear-down-master() {
     echo "[INFO] tear-down-master on $1"
       for service_name in etcd kube-apiserver kube-controller-manager kube-scheduler ; do
    -      service_file="/usr/lib/systemd/system/${service_name}.service"
    +      service_file="/lib/systemd/system/${service_name}.service"
           kube-ssh "$1" " \
             if [[ -f $service_file ]]; then \
               sudo systemctl stop $service_name; \
    @@ -217,7 +217,7 @@ echo "[INFO] tear-down-master on $1"
     function tear-down-node() {
     echo "[INFO] tear-down-node on $1"
       for service_name in kube-proxy kubelet docker flannel ; do
    -      service_file="/usr/lib/systemd/system/${service_name}.service"
    +      service_file="/lib/systemd/system/${service_name}.service"
           kube-ssh "$1" " \
             if [[ -f $service_file ]]; then \
               sudo systemctl stop $service_name; \
    

With these changes in place, installing K8s on my machine was straightforward, which included simply running this command from within the cluster directory of the patched source:

MASTER=janaka@10.0.0.1 \
NODES=janaka@10.0.0.1 \
DOCKER_OPTS="--insecure-registry=hub.adroitlogic.com:5000" \
KUBERNETES_PROVIDER=centos \
CERT_GROUP=janaka \
./kube-up.sh

A few notes on the above:

  1. Because the certificate generation process has problems with localhost or 127.0.0.1, I had to use a static IP (10.0.0.1) assigned to one of my network interfaces.
  2. I have one master and one worker node, both of which are my own machine itself.
  3. janaka is the username on my local machine.
  4. We have a local Docker hub for holding our IPS images, at hub.adroitlogic.com:5000 (resolved via internal hostname mappings), which we need to inject to the Docker startup script (in order to be able to utilize our own images within the future IPS set-up).

Takeaway?

  • learned a bit about how K8s components fit together,
  • added quite a bit of shell scripting tips and tricks to my book of knowledge,
  • had my 101 on Linux services (systemd),
  • skipped one lunch,
  • fought several hours of frustration,
  • and finally felt a truckload of satisfaction, which flushed out the frustration without a trace.

Friday, September 8, 2017

Gracefully Shutting Down Java in Containers: Why You Should Double-Check!

Gracefulness is not only an admirable human quality: it is also a must-have for any application program, especially when it is bearing the burden of mission-critical domains.

UltraESB has had a good history of maintaining gracefulness throughout its runtime, including shutdown. The new UltraESB-X honoured the tradition and implemented graceful shutdown in its 17.07 release.

When we composed the ips-worker Docker image for our Integration Platform (IPS) as a tailored version of UltraESB-X, we could guarantee that ESBs running in the platform would shut down gracefully--or so we thought.

Unfortunately not.

As soon as we redeploy or change the replication count of a cluster, all ESB instances running under the cluster get terminated (and new instances get spawned to take their place). The termination is supposed to be graceful: the ESBs would first stop accepting any new incoming messages, and hold off the internal shutdown sequence for a few seconds until processing of in-flight messages completes, or a timeout ends the hold-off.

On our Kubernetes-based mainstream IPS release, we retrieve logs of ESB instances (pods) via the K8s API as well as via a database appender, so that we can analyze them later. In analyzing the logs, we noticed that we were never seeing any ESB shutdown logs, no matter how big the log store had grown. It was as if the ESBs were getting brutally killed as soon as the termination signal was received.

To investigate the issue, I started off with a simplified Java program: one that registers a shutdown hook--the world-famous way of implementing graceful shutdown in Java, which we had utilized in both our ESBs--and keeps running forever, printing some text periodically (to indicate that the main thread is active). As soon as the shutdown hook is triggered, it interrupts the main thread, switches the output to indicate that we are shutting down, and lets the handler run for a few more seconds before exiting (consistent with a "mock" graceful shutdown).

class Kill {

	private static Thread main;

	public static void main(String[] a) throws Exception {

		// the shutdown hook: our stand-in for a graceful shutdown sequence
		Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
			public void run() {
				System.out.println("TERM");
				// break the main thread out of its sleep-and-print loop
				main.interrupt();
				// pretend to be busy wrapping up in-flight work for a few seconds
				for (int i = 0; i < 4; i++) {
					System.out.println("busy");
					try {
						Thread.sleep(1000);
					} catch (Exception e) {}
				}
				System.out.println("exit");
			}
		}));

		// main loop: keep printing to show that the main thread is alive
		main = Thread.currentThread();
		while (true) {
			Thread.sleep(1000);
			System.out.println("run");
		}
	}
}

Testing it is pretty easy:

javac Kill.java
java Kill

While the program keeps on printing:

run
run
run
...

press Ctrl+C to see what happens:

...
run
run
^CTERM
busy
Exception in thread "main" java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at Kill.main(Kill.java:22)
busy
busy
busy
exit

Looks good.

That done, converting this into a fully-fledged Docker container took only a few minutes, with the following Dockerfile:

FROM openjdk:8-jre-alpine
ADD Kill*.class /
ENTRYPOINT ["java", "Kill"]

docker build -t kill:v1 .

Next I ran a container with the new image:

docker run -it --rm kill:v1

which gave the expected output:

run
run
run
...

Then I sent a TERM signal (a close cousin of the INT signal that Ctrl+C sends, and, like INT, a default trigger for Java's shutdown hooks) to the process, using the kill command:

# pardon the fancy functions;
# they are quite useful for me when dealing with processes

function pid() {
    ps -ef | grep $1 | grep -v grep | awk '{print $2}'
}

function killsig() {
    for i in $(pid $2); do
        sudo kill $1 $i
    done
}

alias termit='killsig -15'

# with all the above in place, I just have to run:
termit Kill

As expected, the shutdown hook got invoked and executed smoothly.

Going a step further, I made the whole thing into a standalone K8s pod (backed by a single-replica Deployment):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kill
spec:
  selector:
    matchLabels:
      k8s-app: kill
  template:
    metadata:
      labels:
        k8s-app: kill
    spec:
      containers:
      - name: kill
        image: kill:v1

and tried out the same thing, this time by zeroing-out spec.replicas (same as we do in IPS) via the kubectl edit deployment command, instead of a manual kill -TERM:

kubectl edit deployment kill

# vi is my default editor
# set "replicas" to 0 (line 20 in my case)
# <ESC>:wq<ENTER>

while having a console tail of the pod in a separate window:

# fancy stuff again

function findapp() {
    kubectl get pod -l k8s-app=$1 -oname | cut -b 6-;
}

function klog() {
    kubectl logs -f $(findapp $1);
}

# the final command
klog kill

showing the output:

run
run
...
run
TERM
busy
Exception in thread "main" java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at Kill.main(Kill.java:22)
busy
busy
busy
exit

Damn, it still shuts down gracefully!

So what's wrong with my ips-worker?

Just to verify, I got a single-replica cluster running on IPS, manually changed the image (spec.template.spec.containers[0].image) and startup command (spec.template.spec.containers[0].command) of the K8s deployment via kubectl edit (keeping all other factors--such as environment variables and volume mounts--unchanged), and tried out the same zero-out sequence.

Same result! Graceful shutdown!

Then it occurred to me that, while my kill container simply uses a java Kill command, ips-worker uses a bit more complicated command:

/bin/sh -c <copy some files> && <run some custom command> && <run ultraesb-x.sh>

where, in the last part, we construct (with a specially fabricated classpath, and some JVM parameters) and execute a pretty long java command that starts up the UltraESB-X beast.

So ultimately, the final live command in the container boils down to:

/bin/sh -c <basepath>/ultraesb-x.sh

Hence I tried a shell command on my kill container, by slightly changing the Dockerfile:

FROM openjdk:8-jre-alpine
ADD Kill*.class /
# note the missing brackets and quotes, so that the command gets the default /bin/sh -c prefix
ENTRYPOINT java Kill

and yay! Graceful shutdown was no more. The Java process got killed brutally, on Docker (docker stop) as well as in K8s (replica zero-out).

Investigating further, I was guided by Google to this popular SE post, which basically said that the shell (sh) does not pass received signals to its child processes by default. The suggested alternative was to run the internal command via exec, which would basically replace the parent process (sh) with the child (java, in the case of kill):

FROM openjdk:8-jre-alpine
ADD Kill*.class /
ENTRYPOINT exec java Kill

For kill, that did the trick right away.

For ips-worker things were a bit different, as there were two levels of invocation: the container's command invoking a chain of commands via /bin/sh -c, and the built-in ultraesb-x.sh invoking the ultimate java command. Hence I had to include exec at two places:

Once at the end of the command chain:

/bin/sh -c \
<copy some files> && \
<run some custom command> && \
exec <basepath>/ultraesb-x.sh

And again at the end of ultraesb-x.sh:

# do some magic to compose the classpath and other info for ESB startup

exec $JAVA_HOME/bin/java <classpath and other params>

Simple as it may seem, those two execs were enough to bring back graceful shutdown to ips-worker, and hence to our Integration Platform.

Tuesday, September 5, 2017

Kubernetes 1.7 on CentOS: This is How We Nailed It!

Please Note: If you are looking for a comprehensive, step-by-step guide on installing Kubernetes on CentOS, check out this guide (composed by one of my colleagues) instead.

Caution: This blog was originally composed several months ago, when the K8s release cycle was at v1.7.0-alpha.3. As always, K8s has zoomed past us, hence most of the content would be downright obsolete by now!


If you are a fan of Kubernetes (K8s) and the ton of amazing features it has in store for developers, you would surely agree with me on the fact that getting a local K8s setup up and running is one of the most fundamental steps of learning K8s.

Since we have been waiting for quite some time to re-spin our Integration Platform (IPS) product on top of a new K8s version, we decided it was high time to get a brand new cluster up and running with 1.7.0 (which is still in beta as of this writing, but would become stable by the time we migrate our own code on top of it). Due to some stability requirements we wanted a setup running a RedHat-compatible base OS, so we decided to go ahead with CentOS.

While we could always have followed one of the pre-compiled guides from the K8s docs, such as this, we wanted to be a bit more adventurous and get things done by way of the K8s source; besides, we wanted to be able to upgrade our set-up when required, for which the best way was to properly understand how the K8s authors had nailed it in the original source.

First of all, we got hold of a clone of the K8s source at the master branch (as we needed the latest code base) and tag v1.7.0-alpha.3 (since we were looking for that specific release).

Next we built the K8s binary artifacts using the make command at the source root. (You can utilize the Docker-based cross-compilation approach mentioned in the above guide as well, but note that it would download a 1.8+ GB cross-compiler image and build artifacts for all platforms, resulting in a 20+ GB total size.)
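
In concrete terms, the checkout and build boiled down to roughly the following (assuming a machine that already satisfies the K8s build prerequisites, such as a suitable Go toolchain):

git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
git checkout v1.7.0-alpha.3
make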

We set up a VirtualBox virtual machine running CentOS 7, and installed docker, etcd and flanneld afresh. Docker was installed via yum (sudo yum install docker) and configured for use by our default, non-root user. Others were installed via yum as well, with versions etcd-3.1.3-1.el7.x86_64.rpm and flannel-0.7.0-1.el7.x86_64.rpm, respectively.

We were hoping to clone the machine in order to obtain a multi-node set-up (without undertaking the overhead of installing the individual components on each machine separately). However, this seemingly complicated the whole process.

The default K8s installer for CentOS (located at cluster/centos) downloads archives of the K8s client/server, etcd, flanneld and docker binaries as part of the configuration process. Since we needed to install K8s binaries from our custom build instead, and didn't want to download the other binaries at all (as they were already installed), we had to customize the corresponding build and deployment scripts to skip the downloads and use our build artifacts instead, and to control the reconfiguration of etcd, flanneld and docker.

What follows is a breakdown of the changes we had to perform on top of the v1.7.0-alpha.3 tag to get things working with native (yum-based) installations of etcd, flanneld and docker, with some attempts to explain the rationale behind each change.

Tip: If you wish to utilize any of these patches, it should be possible to save them into a plaintext file (with or without the .patch extension) and apply them on top of the v1.7.0-alpha.3 tag of the K8s source using the git apply command.

Tip: If only a subset of the above services (etcd, flanneld and docker) is already installed on your machine, apply only the appropriate files/sections of the patches.

  1. Avoiding download of binary artifacts:
    diff --git a/cluster/centos/build.sh b/cluster/centos/build.sh
    index 18bbe6f..09a3631 100755
    --- a/cluster/centos/build.sh
    +++ b/cluster/centos/build.sh
    @@ -42,19 +42,6 @@ function clean-up() {
     function download-releases() {
       rm -rf ${RELEASES_DIR}
       mkdir -p ${RELEASES_DIR}
    -
    -  echo "Download flannel release v${FLANNEL_VERSION} ..."
    -  curl -L ${FLANNEL_DOWNLOAD_URL} -o ${RELEASES_DIR}/flannel.tar.gz
    -
    -  echo "Download etcd release v${ETCD_VERSION} ..."
    -  curl -L ${ETCD_DOWNLOAD_URL} -o ${RELEASES_DIR}/etcd.tar.gz
    -
    -  echo "Download kubernetes release v${K8S_VERSION} ..."
    -  curl -L ${K8S_CLIENT_DOWNLOAD_URL} -o ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz
    -  curl -L ${K8S_SERVER_DOWNLOAD_URL} -o ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz
    -
    -  echo "Download docker release v${DOCKER_VERSION} ..."
    -  curl -L ${DOCKER_DOWNLOAD_URL} -o ${RELEASES_DIR}/docker.tar.gz
     }
     
     function unpack-releases() {
    @@ -80,19 +67,12 @@ function unpack-releases() {
       fi
     
       # k8s
    -  if [[ -f ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz ]] ; then
    -    tar xzf ${RELEASES_DIR}/kubernetes-client-linux-amd64.tar.gz -C ${RELEASES_DIR}
         cp ${RELEASES_DIR}/kubernetes/client/bin/kubectl ${BINARY_DIR}
    -  fi
    -
    -  if [[ -f ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz ]] ; then
    -    tar xzf ${RELEASES_DIR}/kubernetes-server-linux-amd64.tar.gz -C ${RELEASES_DIR}
         cp ${RELEASES_DIR}/kubernetes/server/bin/kube-apiserver \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-controller-manager \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-scheduler ${BINARY_DIR}/master/bin
         cp ${RELEASES_DIR}/kubernetes/server/bin/kubelet \
            ${RELEASES_DIR}/kubernetes/server/bin/kube-proxy ${BINARY_DIR}/node/bin
    -  fi
     
       # docker
       if [[ -f ${RELEASES_DIR}/docker.tar.gz ]]; then
    
  2. Avoiding verification of artifacts (since we no longer download them):
    diff --git a/cluster/kube-up.sh b/cluster/kube-up.sh
    index 7877fb9..9e793ce 100755
    --- a/cluster/kube-up.sh
    +++ b/cluster/kube-up.sh
    @@ -40,8 +40,6 @@ fi
     
     echo "... calling verify-prereqs" >&2
     verify-prereqs
    -echo "... calling verify-kube-binaries" >&2
    -verify-kube-binaries
     
     if [[ "${KUBE_STAGE_IMAGES:-}" == "true" ]]; then
       echo "... staging images" >&2
    
  3. Catering for natively-installed etcd and flanneld binaries (which are located (rather, symlinked) at /usr/bin instead of /opt/kubernetes/bin, where K8s expects them to be):
    diff --git a/cluster/centos/master/scripts/etcd.sh b/cluster/centos/master/scripts/etcd.sh
    index aa73b57..6b575da 100755
    --- a/cluster/centos/master/scripts/etcd.sh
    +++ b/cluster/centos/master/scripts/etcd.sh
    @@ -74,7 +74,7 @@ Type=simple
     WorkingDirectory=${etcd_data_dir}
     EnvironmentFile=-/opt/kubernetes/cfg/etcd.conf
     # set GOMAXPROCS to number of processors
    -ExecStart=/bin/bash -c "GOMAXPROCS=\$(nproc) /opt/kubernetes/bin/etcd"
    +ExecStart=/bin/bash -c "GOMAXPROCS=\$(nproc) /usr/bin/etcd"
     Type=notify
     
     [Install]
    diff --git a/cluster/centos/master/scripts/flannel.sh b/cluster/centos/master/scripts/flannel.sh
    index 092fcd8..5d9630d 100644
    --- a/cluster/centos/master/scripts/flannel.sh
    +++ b/cluster/centos/master/scripts/flannel.sh
    @@ -37,7 +37,7 @@ After=network.target
     
     [Service]
     EnvironmentFile=-/opt/kubernetes/cfg/flannel
    -ExecStart=/opt/kubernetes/bin/flanneld --ip-masq \${FLANNEL_ETCD} \${FLANNEL_ETCD_KEY} \${FLANNEL_ETCD_CAFILE} \${FLANNEL_ETCD_CERTFILE} \${FLANNEL_ETCD_KEYFILE}
    +ExecStart=/usr/bin/flanneld --ip-masq \${FLANNEL_ETCD} \${FLANNEL_ETCD_KEY} \${FLANNEL_ETCD_CAFILE} \${FLANNEL_ETCD_CERTFILE} \${FLANNEL_ETCD_KEYFILE}
     
     Type=notify
     
    @@ -48,7 +48,7 @@ EOF
     # Store FLANNEL_NET to etcd.
     attempt=0
     while true; do
    -  /opt/kubernetes/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
    +  /usr/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
         --no-sync -C ${ETCD_SERVERS} \
         get /coreos.com/network/config >/dev/null 2>&1
       if [[ "$?" == 0 ]]; then
    @@ -59,7 +59,7 @@ while true; do
           exit 2
         fi
     
    -    /opt/kubernetes/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
    +    /usr/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
           --no-sync -C ${ETCD_SERVERS} \
           mk /coreos.com/network/config "{\"Network\":\"${FLANNEL_NET}\"}" >/dev/null 2>&1
         attempt=$((attempt+1))
    diff --git a/cluster/centos/node/bin/mk-docker-opts.sh b/cluster/centos/node/bin/mk-docker-opts.sh
    index 041d977..177ee9f 100755
    --- a/cluster/centos/node/bin/mk-docker-opts.sh
    +++ b/cluster/centos/node/bin/mk-docker-opts.sh
    @@ -69,7 +69,6 @@ done
     
     if [[ $indiv_opts = false ]] && [[ $combined_opts = false ]]; then
       indiv_opts=true
    -  combined_opts=true
     fi
     
     if [[ -f "$flannel_env" ]]; then
    diff --git a/cluster/centos/node/scripts/docker.sh b/cluster/centos/node/scripts/docker.sh
    index 320446a..3f38f3e 100755
    --- a/cluster/centos/node/scripts/docker.sh
    +++ b/cluster/centos/node/scripts/docker.sh
    @@ -35,7 +35,7 @@ Type=notify
     EnvironmentFile=-/run/flannel/docker
     EnvironmentFile=-/opt/kubernetes/cfg/docker
     WorkingDirectory=/opt/kubernetes/bin
    -ExecStart=/opt/kubernetes/bin/dockerd \$DOCKER_OPT_BIP \$DOCKER_OPT_MTU \$DOCKER_OPTS
    +ExecStart=/usr/bin/dockerd \$DOCKER_OPT_BIP \$DOCKER_OPT_MTU \$DOCKER_OPTS
     LimitNOFILE=1048576
     LimitNPROC=1048576
     
    diff --git a/cluster/centos/node/scripts/flannel.sh b/cluster/centos/node/scripts/flannel.sh
    index 2830dae..384788f 100755
    --- a/cluster/centos/node/scripts/flannel.sh
    +++ b/cluster/centos/node/scripts/flannel.sh
    @@ -39,7 +39,7 @@ Before=docker.service
     [Service]
     EnvironmentFile=-/opt/kubernetes/cfg/flannel
     ExecStartPre=/opt/kubernetes/bin/remove-docker0.sh
    -ExecStart=/opt/kubernetes/bin/flanneld --ip-masq \${FLANNEL_ETCD} \${FLANNEL_ETCD_KEY} \${FLANNEL_ETCD_CAFILE} \${FLANNEL_ETCD_CERTFILE} \${FLANNEL_ETCD_KEYFILE}
    +ExecStart=/usr/bin/flanneld --ip-masq \${FLANNEL_ETCD} \${FLANNEL_ETCD_KEY} \${FLANNEL_ETCD_CAFILE} \${FLANNEL_ETCD_CERTFILE} \${FLANNEL_ETCD_KEYFILE}
     ExecStartPost=/opt/kubernetes/bin/mk-docker-opts.sh -d /run/flannel/docker
     
     Type=notify
    @@ -52,7 +52,7 @@ EOF
     # Store FLANNEL_NET to etcd.
     attempt=0
     while true; do
    -  /opt/kubernetes/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
    +  /usr/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
         --no-sync -C ${ETCD_SERVERS} \
         get /coreos.com/network/config >/dev/null 2>&1
       if [[ "$?" == 0 ]]; then
    @@ -63,7 +63,7 @@ while true; do
           exit 2
         fi
     
    -    /opt/kubernetes/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
    +    /usr/bin/etcdctl --ca-file ${CA_FILE} --cert-file ${CERT_FILE} --key-file ${KEY_FILE} \
           --no-sync -C ${ETCD_SERVERS} \
           mk /coreos.com/network/config "{\"Network\":\"${FLANNEL_NET}\"}" >/dev/null 2>&1
         attempt=$((attempt+1))
    

Once all fixes were in place, all we needed to do to get the ball rolling was to run

MASTER=adrt@192.168.1.192 \
NODES="adrt@192.168.1.192 adrt@192.168.1.193 adrt@192.168.1.194" \
DOCKER_OPTS="--insecure-registry=hub.adroitlogic.com:5000" \
KUBERNETES_PROVIDER=centos \
CERT_GROUP=janaka \
./kube-up.sh

within the cluster directory of the K8s source.

  • Our master node is 192.168.1.192.
  • We have 3 worker nodes, 192.168.1.192, 192.168.1.193 and 192.168.1.194.
  • Our master node itself is a worker node (since we often don't have enough resources on our machines, to run a dedicated master).
  • adrt is the CentOS user on each node (with superuser privileges).
  • janaka is the username on my local machine (where the kube-up.sh script actually gets executed).
  • We have a local Docker hub for holding our IPS images, at hub.adroitlogic.com:5000 (resolved via internal hostname mappings), which we need to inject to the Docker startup script (in order to be able to utilize our own images within the future IPS set-up).

And within minutes, we got a working K8s cluster!


P.S.: All this happened a long time ago, and we utilized the cluster in developing and testing our Integration Platform (IPS), whose latest 17.07 release is now available for download. Check it out; you may happen to like it!