Monday, May 20, 2019

How I made AWS CLI 300% faster! [FULL disclosure]

Yeah yeah, it's "highly experimental" and all, but it's still 3 times faster than simply running aws bla bla bla - the "plain" aws-cli.

Full speed ahead - with aws-cli! (sashkin7)

And yes, it won't always be that fast, especially if you run AWS CLI only about once a fortnight. But it will certainly have a clear impact once you start batching up your AWS CLI calls: maybe routine account checks/cleanups, maybe extracting tons of CloudWatch Metrics records, or maybe a totally different, unheard-of use case.

Whatever it is, I guess it would be useful for the masses some day.

Plus, as one of the authors and maintainers of the world's first serverless IDE, I have certainly had several chances to put it to good use!

The problem: why AWS CLI is "slow" for me

(Let's just call it "CLI", shall we?)

It actually has nothing to do with the CLI itself; rather, it's the fact that each CLI invocation is a completely new program execution cycle.

This means that every invocation pays the full start-up cost all over again: launching a new Python interpreter, loading the CLI modules, and rebuilding parsers, command hierarchies and API specs from scratch.

But, as usual, the highest impact comes via network I/O (see the sketch after this list):

  • the CLI has to create an API client from scratch (the previous one was lost when the command execution completed)
  • since the network connection to AWS is managed by the client, this means that each command creates (and then destroys) a fresh TCP connection to the AWS endpoint; which involves a DNS lookup as well (although later lookups may be served from the system cache)
  • since AWS APIs almost always use SSL, every new connection results in a full SSL handshake (client hello, server hello, server cert, yada yada yada)
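You can put a rough number on the connection-setup part of this cost with a few lines of Python. Here is a quick sketch (assuming boto3 is installed and credentials are configured) that contrasts a throwaway client per call with a single reused client:

import time

import boto3


def fresh_client_calls(n):
    # a new client per call: a new connection pool, hence a new SSL handshake each time
    for _ in range(n):
        boto3.client("sts").get_caller_identity()


def reused_client_calls(n):
    # one client: the underlying kept-alive connection gets reused
    client = boto3.client("sts")
    for _ in range(n):
        client.get_caller_identity()


for fn in (fresh_client_calls, reused_client_calls):
    start = time.time()
    fn(10)
    print("%s: %.2f seconds" % (fn.__name__, time.time() - start))

The gap between the two numbers is, roughly, the connection and handshake overhead that every fresh CLI run pays - and it doesn't even include the interpreter start-up and module loading mentioned above.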

Now, assume you have 20 CloudWatch Log Groups to be deleted. Since the Logs API does not offer a bulk deletion option, the cheapest way to do this would be to run a simple shell script - looping aws logs delete-log-group over all groups:

for i in $(aws logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
    aws logs delete-log-group --log-group-name $i
done

This would run the CLI 20 times (21 to be precise, if you count the initial list API call); meaning that all of the above overhead is repeated 20 times. Clearly a waste of time and resources, since we already know that the same endpoint is going to be invoked in every one of those runs.

Maybe it's just repetition, after all. (MemeGenerator)

Try scaling this up to hundreds or thousands of batched operations; and see where it takes you!

And no, aws-shell does not cut it.

Not yet, at least.

Leaving aside the nice and cozy REPL interface (interactive user prompt), handy autocompletion, syntax coloring and inline docs, aws-shell does not give you any performance advantage over aws-cli. Every command in the shell is executed in a new AWS CLI instance - with parsers, command hierarchies, API specs and - more importantly - API clients getting recreated for every command.

Skeptical? Peek at the aws-shell sources; or better still, fire up Wireshark (or tcpdump if you dare), run a few commands in the shell REPL, and see how each command initializes a fresh SSL channel from scratch.

The proposal: what can we do?

Obviously, the CLI itself cannot do much about it. It's a simple program, and whatever improvements we make won't last beyond the next invocation. The OS would rudely wipe them out and start the next CLI with a clean slate; unless we use some spooky (and rather discouraged) memory persistence magic to serialize and reload the CLI's state. Even then, the other OS-level stuff (network sockets etc.) would be gone, and our effort would be pretty much fruitless.

If we are going to make any impactful changes, we need to make the CLI stateful; a long-running process.

The d(a)emon

In the OS world, this usually means setting up a daemon - a background process that waits for and processes events like user commands. (A popular example is MySQL, with its mysql-server daemon and mysql-client packages.)

In our case, we don't want a fully-fledged "managed" daemon - like a system service. For example, there's no point in starting our daemon before we actually start making our CLI calls; likewise, if our daemon dies, there's no point in restarting it right away, since we cannot recover the lost state anyway.

So we have a simple plan:

  • break the CLI into a "client" and daemon
  • every time we run the CLI,
    • check for the presence of the daemon, and
    • spawn the daemon if it is not already running

This way, if the daemon dies, the next CLI invocation will auto-start it. Nothing to worry about, nothing to manage.

Our fast AWS CLI daemon - it's all in a subprocess!

It is easy to handle the daemon spawn without having the trouble of maintaining a second program or script; simply use subprocess.Popen to launch another instance of the program, and instruct it to run the daemon's code path, rather than the client's.
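In skeleton form, the pattern looks something like this (a bare-bones sketch with stubbed-out code paths; the env var name matches the real code below):

import os
import subprocess
import sys


def run_daemon():
    pass  # the long-running daemon loop would go here


def run_client():
    pass  # the short-lived client logic would go here


if __name__ == "__main__":
    if os.environ.get("AWSR_DAEMON") == "True":
        run_daemon()
    else:
        # re-launch this very program, marked (via an env var) to act as the daemon
        env = dict(os.environ)
        env["AWSR_DAEMON"] = "True"
        subprocess.Popen(sys.argv, env=env)
        run_client()

The real thing just adds the pipes, the daemon-detection logic, and the patched AWS CLI driver on top of this skeleton.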

Enough talk; show me the code!

Enough talk; let's fight! (KFP, YouTube)

Here you go:

#!/usr/bin/python

import os
import subprocess
import sys
import tempfile

import psutil

# named pipes (FIFOs): the client writes commands to rd, the daemon writes results to wr
rd = tempfile.gettempdir() + "/awsr_rd"
wr = tempfile.gettempdir() + "/awsr_wr"


def run_client():
    # hand the raw command line over to the daemon...
    out = open(rd, "w")
    out.write(" ".join(sys.argv))
    out.write("\n")
    out.close()

    # ...and block until the daemon writes the result back
    inp = open(wr, "r")
    result = inp.read()
    inp.close()

    sys.stdout.write(result)


def run_daemon():
    from awscli.clidriver import CLIOperationCaller, LOG, create_clidriver, HISTORY_RECORDER

    def patchedInit(self, session):
        self._session = session
        self._client = None

    # patched CLIOperationCaller.invoke: create the API client just once,
    # then keep reusing it (and its live SSL connection) across invocations
    def patchedInvoke(self, service_name, operation_name, parameters, parsed_globals):
        if self._client is None:
            LOG.debug("Creating new %s client" % service_name)
            self._client = self._session.create_client(
                service_name, region_name=parsed_globals.region,
                endpoint_url=parsed_globals.endpoint_url,
                verify=parsed_globals.verify_ssl)
        client = self._client

        response = self._make_client_call(
            client, operation_name, parameters, parsed_globals)
        self._display_response(operation_name, response, parsed_globals)
        return 0

    CLIOperationCaller.__init__ = patchedInit
    CLIOperationCaller.invoke = patchedInvoke

    # one long-lived CLI driver, serving commands until told to exit
    driver = create_clidriver()
    while True:
        # read one command line from the request pipe;
        # drop the trailing newline and the leading program name ("awsr")
        inp = open(rd, "r")
        args = inp.read()[:-1].split(" ")[1:]
        inp.close()

        if len(args) > 0 and args[0] == "exit":
            sys.exit(0)

        # redirect our stdout into the response pipe for this invocation
        sys.stdout = open(wr, "w")
        rc = driver.main(args)

        HISTORY_RECORDER.record('CLI_RC', rc, 'CLI')
        sys.stdout.close()


if __name__ == "__main__":
    # create the two pipes, if they are not already in place
    if not os.access(rd, os.R_OK | os.W_OK):
        os.mkfifo(rd)
    if not os.access(wr, os.R_OK | os.W_OK):
        os.mkfifo(wr)

    # fork if awsr daemon is not already running
    ps = psutil.process_iter(attrs=["cmdline"])
    procs = 0
    for p in ps:
        cmd = p.info["cmdline"]
        # cmdline can be None (or empty) for processes we cannot inspect
        if cmd and len(cmd) > 1 and cmd[0].endswith("python") and cmd[1] == sys.argv[0]:
            procs += 1
    if procs < 2:
        sys.stderr.write("Forking new awsr background process\n")
        with open(os.devnull, 'r+b', 0) as DEVNULL:
            # the new instance will see the env var, and run itself as the daemon;
            # pass the current environment along too, so the daemon sees the
            # same AWS_* variables, HOME, PATH etc. as the client
            env = dict(os.environ)
            env["AWSR_DAEMON"] = "True"
            subprocess.Popen(sys.argv, stdin=DEVNULL, stdout=DEVNULL,
                             stderr=DEVNULL, close_fds=True, env=env)
            run_client()

    elif os.environ.get("AWSR_DAEMON") == "True":
        run_daemon()
    else:
        run_client()

Yep, just a hundred-odd lines of rather primitive code - of course it's also on GitHub, in case you were wondering.

Some statistics - if you're still not buying it

"Lies, damn lies and statistics", they say. But sometimes, statistics can do wonders when you are trying to prove a point.

As you might expect, our new REPL really shines when there are more and more individual invocations (API calls); so that's what we will compare.

S3 API: s3api, not s3

Let's upload some files (via put-object):

date

for file in $(find . -type f -name "*.sha1"); do
    aws s3api put-object --acl public-read --body $file --bucket target.bucket.name --key base/path/$file
done

date
  • Bucket region: us-east-1
  • File type: fixed-length checksums
  • File size: 40 bytes each
  • Additional: public-read ACL

Uploading 70 such files via aws s3api put-object takes:

  • 4 minutes 35 seconds
  • 473.5 KB data (319.5 KB downlink + 154 KB uplink)
  • 70 DNS lookups + SSL handshakes (one for each file)

In comparison, uploading 72 files via awsr s3api put-object takes:

  • 1 minute 28 seconds
  • 115.5 KB data (43.5 KB downlink + 72 KB uplink)
  • 1 DNS lookup + SSL handshake for the whole operation

A 320% improvement on latency (or 420%, if you consider bandwidth).

If you feel like it, watch the outputs (stdout) of the two runs in real time. You would notice how awsr shows a low and consistent latency from the second output onwards; while plain aws shows the same, much higher latency between every pair of outputs - apparently because almost everything gets re-initialized for each call.

If you monitor your network interface (say, with Wireshark), you will see the real deal: aws continuously makes DNS queries and SSL handshakes, while awsr makes just one every minute or so.
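If you want to put rough numbers on that per-call latency yourself, a quick (and admittedly crude) sketch like this one will do - assuming both aws and awsr are on your PATH:

import os
import subprocess
import time

# time a handful of identical, cheap calls through each CLI flavor
with open(os.devnull, "w") as devnull:
    for cli in ("aws", "awsr"):
        timings = []
        for _ in range(5):
            start = time.time()
            # a cheap, side-effect-free call; swap in any command you like
            subprocess.call([cli, "sts", "get-caller-identity"],
                            stdout=devnull, stderr=devnull)
            timings.append(time.time() - start)
        print("%s: %s" % (cli, ", ".join("%.2fs" % t for t in timings)))

Expect the very first awsr call to cost about as much as a plain aws call (it has to spawn the daemon), with the savings showing up from the second call onwards.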

Counterargument #1: If your files are all in one place or directory hierarchy, you could just use aws s3 cp or aws s3 sync in one go. These will be as performant as awsr, if not more so. However, in my case, I wanted to pick 'n' choose only a subset of files in the hierarchy; and there was no easy way of doing that with the aws command alone.

Counterargument #2: If you want to upload to multiple buckets in different regions, you will have to batch up the calls region by region (us-east-1 first, ap-southeast-2 next, etc.); and kill awsr after each batch - more on that later.

CloudWatch logs

Our serverless IDE Sigma generates quite a lot of CloudWatch Logs - especially when our QA battalion is testing it. To keep things tidy, I prefer to occasionally clean up these logs, via aws logs delete-log-group.

date

for i in $(aws logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
    echo $i
    aws logs delete-log-group --log-group-name $i
done

date

Cleaning up 172 such log groups on us-east-1, via plain aws, takes:

  • 5 minutes 44 seconds
  • 1.51 MB bandwidth (1133 KB downlink, 381 KB uplink)
  • 173 (1 + 172) DNS lookups + SSL handshakes; one for each log group, plus one for the initial listing

In contrast, deleting 252 groups via our new REPL awsr takes just:

  • 2 minutes 41 seconds
  • 382 KB bandwidth (177 KB downlink, 205 KB uplink)
  • 4 DNS lookups + SSL handshakes (about 1 in each 60 seconds)

This time, a 310% improvement on latency; or 580% on bandwidth.

CloudWatch metrics

I use this script to occasionally check the sizes of our S3 buckets - to track down and remove any garbage; playing the "scavenger" role:

Okay, maybe not that much, but... (CartoonStock)

for bucket in $(awsr s3api list-buckets --query 'Buckets[*].Name' --output text); do
    size=$(awsr cloudwatch get-metric-statistics --namespace AWS/S3 \
        --start-time $(date -d @$((($(date +%s)-86400))) +%F)T00:00:00 --end-time $(date +%F)T00:00:00 \
        --period 86400 --metric-name BucketSizeBytes \
        --dimensions Name=StorageType,Value=StandardStorage Name=BucketName,Value=$bucket \
        --statistics Average --output text --query 'Datapoints[0].Average')
    if [ "$size" = "None" ]; then size=0; fi
    printf "%8.3f  %s\n" $(echo $size/1048576 | bc -l) $bucket
done

Checking 45 buckets via aws (one list-buckets call, plus 45 get-metric-statistics calls to the same CloudWatch endpoint) takes:

94 seconds

Checking 61 buckets (62 API calls) via awsr, takes:

44 seconds

A 288% improvement.

The catch

There are many; more unknowns than knowns, in fact:

  • The REPL depends on serial communication via pipes; so you cannot run things in parallel - e.g. invoke several commands and wait for all of them to complete. (This, however, should not affect any internal parallelizations of aws-cli itself.)
  • awsr may start acting up if you cancel or terminate an already running command - also a side effect of using pipes.
  • awsr reuses internal client objects across invocations (sessions), so it is, let's say, "sticky"; it "remembers" - and does not allow you to override - the profile, region etc. across invocations. In order to start working with a new configuration, you should (see the reset sketch after this list):
    • terminate the existing daemon:
      kill $(ps -ef -C /usr/bin/python | grep -v grep | grep awsr | awk '{print $2}')
    • delete the pipes /tmp/awsr_rd and /tmp/awsr_wr, in case the daemon was processing a command when it was brutally massacred
    • run a new awsr with the correct profile (--profile), region (--region) etc.
  • awsr cannot produce interactive output - at least not yet - as it simply reads/writes from/to each pipe exactly once in a single invocation. So commands like ec2 wait and cloudformation deploy will not work as you expect.
  • Currently the pipes only carry the command line and standard output; so, unless you initially launched awsr in the current console/tty, you won't see any error messages (written to standard error) from the underlying AWS API call/command.
  • Some extensions like s3 don't seem to benefit from the caching - even when invoked against the same bucket. This needs further investigation. (Luckily, s3api works fine - as we saw earlier.)
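For convenience, here is that reset as a small helper - just a sketch, reusing the same psutil dependency; the awsr_reset.py name and the "kill everything with awsr in its command line" heuristic are my own, so adapt to taste:

#!/usr/bin/python
# awsr_reset.py - kill any running awsr daemon and remove the stale pipes,
# so that the next awsr invocation starts with a clean slate
import os
import tempfile

import psutil

for p in psutil.process_iter(attrs=["cmdline"]):
    if p.pid == os.getpid():
        continue  # don't kill ourselves ("awsr" appears in our own cmdline!)
    cmd = p.info["cmdline"]
    if cmd and any("awsr" in c for c in cmd):
        try:
            p.terminate()  # SIGTERM; use p.kill() if you want to be ruthless
        except psutil.NoSuchProcess:
            pass  # it died on its own in the meantime

for name in ("awsr_rd", "awsr_wr"):
    path = os.path.join(tempfile.gettempdir(), name)
    if os.path.exists(path):
        os.remove(path)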

Bonus: hands-on AWS CLI fast automation example, FTW!

I run this occasionally to clean up our AWS accounts of old logs and build data. If you are curious, replace the awsr occurrences with aws (and remove the daemon-killing magic), and witness the difference in speed!

Caution: If there are ongoing CodeBuild builds, the last step may keep on looping - possibly even indefinitely, if a build is stuck in BUILD_IN_PROGRESS status. If you run this from a fully automated context, you may need to enhance the script to handle such cases as well (one rough approach is sketched right after the script).

for p in araProfile meProfile podiProfile thadiProfile ; do
    for r in us-east-1 us-east-2 us-west-1 us-west-2 ca-central-1 eu-west-1 eu-west-2 eu-central-1 \
        ap-northeast-1 ap-northeast-2 ap-southeast-1 ap-southeast-2 sa-east-1 ap-south-1 ; do

        # profile and region changed, so kill any existing daemon before starting
        arg="--profile $p --region $r"
        kill $(ps -ef -C /usr/bin/python | grep -v grep | grep awsr | awk '{print $2}') 2>/dev/null
        rm -f /tmp/awsr_rd /tmp/awsr_wr

        # log groups
        for i in $(awsr $arg logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
            echo $i
            awsr $arg logs delete-log-group --log-group-name $i
        done

        # CodeBuild projects
        for i in $(awsr $arg codebuild list-projects --query 'projects[*]' --output text); do
            echo $i
            awsr $arg codebuild delete-project --name $i
        done

        # CodeBuild builds; strangely these don't get deleted when we delete the parent project...
        while true; do
            builds=$(awsr $arg codebuild list-builds --query 'ids[*]' --output text --no-paginate)
            if [[ $builds = "" ]]; then break; fi
            awsr $arg codebuild batch-delete-builds --ids $builds
        done

    done
done
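For instance, you could cap the number of attempts instead of looping blindly. Here is a rough Python equivalent of that last step - the drain_builds helper and its max_attempts bound are hypothetical, not part of awsr:

import subprocess


def drain_builds(arg, max_attempts=10):
    # repeatedly list and batch-delete CodeBuild builds, but give up
    # after max_attempts rounds instead of spinning forever
    base = ["awsr"] + arg.split() + ["codebuild"]
    for _ in range(max_attempts):
        builds = subprocess.check_output(
            base + ["list-builds", "--query", "ids[*]",
                    "--output", "text", "--no-paginate"]).decode().strip()
        if not builds:
            return True  # nothing left to delete
        subprocess.call(base + ["batch-delete-builds", "--ids"] + builds.split())
    return False  # gave up; some builds are probably still in progress


if __name__ == "__main__":
    if not drain_builds("--profile myProfile --region us-east-1"):
        print("some builds could not be deleted; try again later")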

Automation FTW! (CartoonStock)

In closing: so, there it is!

Feel free to install and try out awsr; after all there's just one file, with only a hundred-odd lines of code!

Although I cannot make any guarantees, I'll try to eventually hunt down and fix the gaping holes and shortcomings; and any other issues that you or I come across along the way.

Over to you, soldier/beta user!
