Showing posts with label EC2. Show all posts

Wednesday, June 24, 2020

One Bite of Real-world Serverless: Controlling an EC2 with Lambda, API Gateway and Sigma

Originally written for The SLAppForge Blog; Jun 19, 2020

I have been developing and blogging about Sigma, the world's first serverless IDE for serverless developers - but haven't really been using it for my non-serverless work. That was why, when a (somewhat) peculiar situation came up recently, I decided to give Sigma a full-scale spin.

The Situation: a third party needs to control one of our EC2 instances

Our parent company, AdroitLogic, sells an enterprise B2B messaging platform called AS2 Gateway - which comes as a simple SaaS subscription as well as an on-premise or cloud-installable dedicated deployment. (Meanwhile, part of our own team is also working on making it a completely serverless solution - we'll probably be bothering you with a whole lotta blog posts on that too, pretty soon!)

One of our potential clients needed a customized copy of the platform, first as a staging instance in our own AWS account; they would configure and test their integrations against it, before deciding on a production deployment - under a different cloud platform of their choice, in their own realm.

Their work time zone is several hours ahead of ours; keeping aside the clock skew on emails and Zoom calls, the staging instance had to be made available during their working hours, not ours.

Managing the EC2 across time zones: the Options

Obviously, we did have a few choices:

  • keep the instance running 24/7, so our client can access it anytime they want - obviously the simplest but also the costliest choice. True, one hour of EC2 time is pretty cheap - less than half a dollar - but it tends to add up pretty fast; while we continue to waste precious resources on a mostly-idling EC2 VM instance.
  • get up at 3 AM (figure of speech) every morning and launch the instance; and shut it down when we sign off - won't work if our client wishes to work late nights; besides they don't get the chance to do the testing every day, so there's still room for significant waste
  • fix up some automated schedule to start and stop the instance - pretty much the same caveats as before (minus the "getting up at 3 AM" part)
  • delegate control of the instance to our client, so they can start and stop it at their convenience

Evidently, the last option was the most economical for us (remember, the client is still in evaluation stage - and may decide not to go with us, after all), and also fairly convenient for them (just two extra steps, before and after work, plus a few seconds' startup delay).

Client-controlled EC2: how to KISS it, the right way

But on the other hand, we didn't want to overcomplicate the process either:

  • Giving them access to our AWS console was out of the question - even with highly constrained access.
  • A key pair with just ec2:StartInstances and ec2:StopInstances IAM permissions on the respective instance ID, would have been ideal; but it would still mean they would have to either install the AWS CLI, or write (or run) some custom code snippets every time they wanted to control the instance.
  • AWS isn't, and wasn't going to be, their favorite cloud platform anyway; so any AWS-specific steps would have been an unnecessary overhead for them.

KISS, FTW!

Serverless to the rescue!

Most probably, you are already screaming out the solution: a pair of custom HTTP (API Gateway) endpoints backed by dedicated Lambdas (we're thinking serverless, after all!) that would do that very specific job - and have just that permission, nothing else, keeping with the preached-by-everybody, least privilege principle.

Our client would just have to invoke the start/stop URL (with a simple, random auth token that you choose - for extra safety), and EC2 will obey promptly.

  • No more AWS or EC2 semantics for them,
  • our budget runs smooth,
  • they have full control over the testing cycles, and
  • I get to have a good night's sleep!

ec2-control: writing it with Sigma

There were a few points in this project that required some advanced voodoo on the Sigma side:

  • Sigma does not natively support EC2 APIs (why should it; it's supposed to be for serverless computing 😎) so, in addition to writing the EC2 SDK calls, we would need to add a custom permission for each function policy; to compensate for the automatic policy generation aspect.
  • The custom policy would need to be as narrow as possible: just ec2:StartInstances and ec2:StopInstances actions, on just our client's staging instance. (If the URL somehow gets out and some remote hacker out there gains control of our function, we don't want them to be able to start and stop random - or perhaps not-so-random - instances in our AWS account!)
  • Both the IAM role and the function itself, would need access to the instance ID (for policy minimization and the actual API call, respectively).
  • For reusability (we devs really love that, don't we? 😎) it should be possible to specify the instance ID (and the auth token) on a per-deployment basis - without embedding the values in the code or configurations, which would get checked into version control.
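To make the "as narrow as possible" requirement concrete, here is a minimal sketch of the kind of policy document we are after: one action, one instance. The function name buildInstancePolicy and all the identifier values are placeholders made up for illustration; in the actual project the ARN is assembled by CloudFormation, not by code like this.

```javascript
// Builds a least-privilege IAM policy document: one EC2 action on one instance.
// The region, account ID and instance ID used below are placeholders.
function buildInstancePolicy(action, region, accountId, instanceId) {
    return {
        Version: "2012-10-17",
        Statement: [
            {
                Effect: "Allow",
                Action: [`ec2:${action}`],
                Resource: `arn:aws:ec2:${region}:${accountId}:instance/${instanceId}`
            }
        ]
    };
}

const policy = buildInstancePolicy("StartInstances", "us-east-1", "123456789012", "i-0123456789abcdef0");
console.log(JSON.stringify(policy, null, 2));
```

Anything outside that single ARN (or any other EC2 action) stays denied by default, which is exactly what we want if the URL ever leaks.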

Template Editor FTW

Since Sigma uses CloudFormation under the hood, the solution is pretty obvious: define two template parameters for the instance ID and token, and refer to them in the functions' environment variables and the IAM roles' policy statements.

Sigma does not natively support CloudFormation parameters (our team recently started working on it, so perhaps it may actually be supported at the time you read this!) but it surely allows you to specify them in your custom deployment template - which would get nicely merged into the final deployment template that Sigma would run.

Some premium bad news, and then some free good news

At the time of this writing, both the template editor and the permission manager were premium features of Sigma IDE. So if you start writing this on your own, you would either need to pay a few bucks and upgrade your account, or mess around with Sigma's configuration files to hack those pieces in (which I won't say is impossible 😎).

(After writing this project, I managed to convince our team to enable the permission manager and template editor for the free tier as well 🤗 so, by the time you read this, things may have taken a better light!)

But, as part of the way that Sigma actually works, not having a premium account does not mean that you cannot deploy an already template- or permission-customized project written by someone else; and my project is already in GitHub so you can simply open it in your Sigma IDE and deploy it, straightaway.

"But how do I provide my own instance ID and token when deploying?"

Patience. Read on.

"Old, but not obsolete" (a.k.a. more limitations, but not impossible)

As I said before, Sigma didn't natively support CloudFormation parameters; so even if you add them to the custom template, Sigma would just blindly merge and deploy the whole thing - without asking for actual values of the parameters!

While this could have been a cause for deployment failures in some cases, lucky for us, here it doesn't cause any trouble. But still, we need to provide correct, custom values for that instance ID and protection token!

Amazingly, CloudFormation allows you to just update the input parameters of an already completed deployment - without having to touch or even re-submit the deployment template:

aws cloudformation update-stack --stack-name Whatever-Stack \
  --use-previous-template --capabilities CAPABILITY_IAM \
  --parameters \
  ParameterKey=SomeKey,ParameterValue=SomeValue ...

(That command is already there, in my project's README.)

So our plan is simple:

  1. Deploy the project via Sigma, as usual.
  2. Run an update from CloudFormation side, providing just the correct instance ID and your own secret token value.
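If you would rather script step 2 than type it, the same request can be expressed as a plain object. This is only a sketch: the helper name buildUpdateStackParams is made up, the stack name and values are placeholders, and the assumption is that the object would be handed to the updateStack operation of the AWS SDK for JavaScript (v2) CloudFormation client, which is not invoked here.

```javascript
// Builds the request for a "parameters only" stack update:
// same template, same IAM capability acknowledgement, new parameter values.
function buildUpdateStackParams(stackName, values) {
    return {
        StackName: stackName,
        UsePreviousTemplate: true,
        Capabilities: ["CAPABILITY_IAM"],
        Parameters: Object.entries(values).map(([key, value]) => ({
            ParameterKey: key,
            ParameterValue: value
        }))
    };
}

const params = buildUpdateStackParams("ec2-control-Stack", {
    EC2ID: "i-0123456789abcdef0",      // placeholder instance ID
    TOKEN: "your-token-goes-here"      // placeholder token
});
console.log(JSON.stringify(params, null, 2));
// Assumed usage (not executed here):
//   new (require("aws-sdk")).CloudFormation().updateStack(params).promise()
```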

Enough talk, let's code!

Warning: You may not actually be able to write the complete project on your own, unless we have enabled custom template editing for free accounts - or you already have a premium account.

If you are just looking to deploy a copy on your own, simply open my already existing public project from https://github.com/janakaud/ec2-control - and skip over to the Ready to Deploy section.

1. ec2-start.js, a NodeJS Lambda

Note: If you use a different name for the file, your custom template would need to be adjusted - don't forget to check the details when you get to that point.

const {ec2} = require("./util");
exports.handler = async (event) => ec2(event, "startInstances", "StartingInstances");

API Gateway trigger

After writing the code,

  1. drag-n-drop an API Gateway entry from the left-side Resources pane, on to the event variable of the function,
  2. enter a few details -
    1. an API name (say EC2Control),
    2. path (say /start, or /ec2/start),
    3. HTTP method (GET would be easiest for the user - they can just paste a link into a browser!)
    4. and a stage name (say prod)
  3. under Show Advanced, turn on Enable Lambda Proxy Integration so that we will receive the query parameters (including the auth token) in the request
  4. and click Inject.

Custom permissions tab

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": {
                "Fn::Sub": "arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:instance/${EC2ID}"
            },
            "Action": [
                "ec2:StartInstances"
            ]
        }
    ]
}

2. ec2-stop.js, a NodeJS Lambda

Note: As before, if your filename is different, update the key in your custom template accordingly - details later.

const {ec2} = require("./util");
exports.handler = async (event) => ec2(event, "stopInstances", "StoppingInstances");

API Gateway trigger

Just like before, drag-n-drop and configure an APIG trigger.

  1. But this time, make sure that you select the API name and deployment stage via the Existing tabs - instead of typing in new values.
  2. Resource path would still be a new one; pick a suitable pathname as before, like /ec2/stop (consistent with the previous).
  3. Method is also your choice; natural is to stick to the previously used one.
  4. Don't forget to Enable Lambda Proxy Integration too.

Custom permissions tab

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": {
                "Fn::Sub": "arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:instance/${EC2ID}"
            },
            "Action": [
                "ec2:StopInstances"
            ]
        }
    ]
}

3. util.js, just a NodeJS file

const ec2 = new (require("aws-sdk")).EC2();

const EC2_ID = process.env.EC2_ID;
if (!EC2_ID) {
    throw new Error("EC2_ID unavailable");
}
const TOKEN = process.env.TOKEN;
if (!TOKEN) {
    throw new Error("TOKEN unavailable");
}

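exports.ec2 = async (event, method, resultKey) => {
    // reject requests that do not carry the expected token as a query parameter
    let tok = (event.queryStringParameters || {}).token;
    if (tok !== TOKEN) {
        return {statusCode: 401};
    }
    // invoke the requested EC2 operation (startInstances/stopInstances) on our instance
    let data = await ec2[method]({InstanceIds: [EC2_ID]}).promise();
    // report each state transition, e.g. "stopped -> pending"
    return {
        statusCode: 200,
        headers: {"Content-Type": "text/plain"},
        body: data[resultKey].map(si => `${si.PreviousState.Name} -> ${si.CurrentState.Name}`).join("\n")
    };
};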

The code is pretty simple - we aren't doing much, just validating the incoming token, calling the EC2 API, and returning the state transition result (e.g. running -> stopping) back to the caller as confirmation; e.g. it will appear in our client's browser window.
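If you want to poke at that flow without touching a real instance, one option is a factory variant of the same logic that takes the EC2 client, instance ID and token as arguments. The sketch below is not part of the project: makeHandler and stubClient are made-up names, the stub only mimics the response shape used above, and an explicit statusCode: 200 is included on success.

```javascript
// Same validate -> call -> report flow as util.js, with dependencies injected
// so it can be exercised locally against a stub instead of the real EC2 API.
function makeHandler(client, instanceId, token) {
    return async (event, method, resultKey) => {
        const tok = (event.queryStringParameters || {}).token;
        if (tok !== token) {
            return {statusCode: 401};
        }
        const data = await client[method]({InstanceIds: [instanceId]}).promise();
        return {
            statusCode: 200,
            headers: {"Content-Type": "text/plain"},
            body: data[resultKey]
                .map(si => `${si.PreviousState.Name} -> ${si.CurrentState.Name}`)
                .join("\n")
        };
    };
}

// Hypothetical stub: returns a canned "stopped -> pending" transition.
const stubClient = {
    startInstances: () => ({
        promise: async () => ({
            StartingInstances: [{PreviousState: {Name: "stopped"}, CurrentState: {Name: "pending"}}]
        })
    })
};

const handler = makeHandler(stubClient, "i-0123456789abcdef0", "s3cret");

// One call with the right token, one without any query parameters at all.
const results = Promise.all([
    handler({queryStringParameters: {token: "s3cret"}}, "startInstances", "StartingInstances"),
    handler({queryStringParameters: null}, "startInstances", "StartingInstances")
]);
results.then(([ok, denied]) => {
    console.log(ok.body);           // stopped -> pending
    console.log(denied.statusCode); // 401
});
```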

(If you were wondering why we didn't add aws-sdk as a dependency despite require()ing it; that's because aws-sdk is already available in the standard NodeJS Lambda environment. No need to bloat up our deployment package with a redundant copy - unless you wish to use some cutting-edge feature or SDK component that was released just last week.)

The better part of the coordinating fat and glue, is in the custom permissions and the template:

4. Custom template

{
  "Parameters": {
    "EC2ID": {
      "Type": "String",
      "Default": ""
    },
    "TOKEN": {
      "Type": "String",
      "Default": ""
    }
  },
  "Resources": {
    "ec2Start": {
      "Properties": {
        "Environment": {
          "Variables": {
            "EC2_ID": {
              "Ref": "EC2ID"
            },
            "TOKEN": {
              "Ref": "TOKEN"
            }
          }
        }
      }
    },
    "ec2Stop": {
      "Properties": {
        "Environment": {
          "Variables": {
            "EC2_ID": {
              "Ref": "EC2ID"
            },
            "TOKEN": {
              "Ref": "TOKEN"
            }
          }
        }
      }
    }
  }
}

Note: If you used some other/custom names for the Lambda code files, two object keys (ec2Start, ec2Stop) under Resources would be different - it's always better to double-check with the auto-generated template and ensure that the merged template also displays the properly-merged final version.

Deriving that one on your own, isn't total voodoo magic either; after writing the rest of the project, just have a look at the auto-generated template tab, and write up a custom JSON - whose pieces would merge themselves into the right places, yielding the expected final template.

We accept the EC2ID and TOKEN as parameters, and merge them into the Environment.Variables property of the Lambda definitions. (The customized IAM policies are already referencing the parameters via Fn::Sub so we don't need to do anything for them here.)
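To picture how the pieces "merge themselves into the right places", here is a rough sketch of a recursive object merge. It only illustrates the idea: Sigma's actual merge logic is not shown in this post, deepMerge is a made-up helper, and the "generated" fragment below is a hypothetical stand-in for the auto-generated template.

```javascript
function isPlainObject(v) {
    return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Recursively overlays `custom` on top of `base`; nested objects are merged,
// everything else (strings, arrays, ...) is replaced by the custom value.
function deepMerge(base, custom) {
    const out = {...base};
    for (const [key, value] of Object.entries(custom)) {
        const both = isPlainObject(out[key]) && isPlainObject(value);
        out[key] = both ? deepMerge(out[key], value) : value;
    }
    return out;
}

// Hypothetical fragment of an auto-generated template.
const generated = {
    Resources: {
        ec2Start: {
            Type: "AWS::Lambda::Function",
            Properties: {Handler: "ec2-start.handler"}
        }
    }
};
// Trimmed-down version of the custom template from this section.
const custom = {
    Parameters: {EC2ID: {Type: "String", Default: ""}},
    Resources: {
        ec2Start: {
            Properties: {Environment: {Variables: {EC2_ID: {Ref: "EC2ID"}}}}
        }
    }
};

const merged = deepMerge(generated, custom);
console.log(JSON.stringify(merged, null, 2));
```

The merged result keeps the generated Type and Handler, and gains the Parameters block and the Environment.Variables entry.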

Once we have the template editor in the free tier, you would certainly have much more cool concepts to play around with - and probably also figure out so many bugs (full disclaimer: I was the one that initially wrote that feature!) which you would promptly report to us! 🤗

Ready to Deploy

When all is ready, click Deploy Project on the toolbar (or Project menu).

(If you came here on the fast-track (by directly opening my project from GitHub), Sigma may prompt you to enter values for the EC2_ID and TOKEN environment variables - just enter some dummy values; we are changing them later anyways.)

If all goes well, Sigma will build the project and deploy it, and you would end up with a Changes Summary popup with an outputs section at the bottom containing the URLs of your API Gateway endpoints.

If you accidentally closed the popup, you can get the outputs back via the Deployment tab of the Project Info window.

Copy both URLs - you would be sending these to your client.

Sigma's work is done - but we're not done yet!

Update the parameters to real values

Grab the EC2-generated identifier of your instance, and find a suitable value for the auth token (perhaps a uuid -v4?).
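If you don't have a UUID tool at hand, any sufficiently long random string will do for the token. A minimal sketch using Node's built-in crypto module (the generateToken name and the 32-byte length are arbitrary choices, not something the project prescribes):

```javascript
const crypto = require("crypto");

// Returns a random, URL-safe token: `bytes` random bytes, hex-encoded.
function generateToken(bytes = 32) {
    return crypto.randomBytes(bytes).toString("hex");
}

const token = generateToken();
console.log(token); // 64 hex characters, different on every run
```

Hex output avoids any characters that would need escaping in a query string.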

Via AWS CLI

If you have AWS CLI - which is really awesome, by the way - the next step is just one command; as mentioned in the README as well:

aws cloudformation update-stack --stack-name ec2-control-Stack \
  --use-previous-template --capabilities CAPABILITY_IAM --parameters \
  ParameterKey=EC2ID,ParameterValue=i-0123456789abcdef \
  ParameterKey=TOKEN,ParameterValue=your-token-goes-here

(If you copy-paste, remember to change the parameter values!)

We tell CloudFormation "hey, I don't need to change my deployment definitions but want to change the input parameters; so go and do it for me".

The update usually takes just a few seconds; if needed, you can confirm its success by calling aws cloudformation describe-stacks --stack-name ec2-control-Stack and checking the Stacks.0.StackStatus field.
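If you capture the describe-stacks output as JSON, the check boils down to reading one field. A small sketch, assuming the output has already been parsed into an object; the helper names and the sample object are made up for illustration:

```javascript
// Extracts Stacks[0].StackStatus from a parsed describe-stacks response.
function stackStatus(describeStacksOutput) {
    const stack = (describeStacksOutput.Stacks || [])[0];
    return stack ? stack.StackStatus : undefined;
}

// A parameter-only update is done once the stack reports UPDATE_COMPLETE.
function updateSucceeded(describeStacksOutput) {
    return stackStatus(describeStacksOutput) === "UPDATE_COMPLETE";
}

// Hand-written sample, trimmed to the fields used here.
const sample = {Stacks: [{StackName: "ec2-control-Stack", StackStatus: "UPDATE_COMPLETE"}]};
console.log(stackStatus(sample), updateSucceeded(sample));
```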

Via the AWS Console

If you don't have the CLI, you can still do the update via the AWS Console; while it is a bit overkill, the console provides more intuitive (and colorful) feedback regarding the progress and success of the stack update.

Complete the URLs - plus one round of testing

Add the token (?token=the-token-you-picked) to the two URLs you copied from Sigma's deployment outputs. Now they are ready to be shared with your client.
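Appending the token by hand is fine for two URLs; if you prefer to script it (and get the escaping right for free), a sketch with Node's built-in URL class could look like this. The withToken name is made up, and the endpoint is the same placeholder hostname used in the test section of this post.

```javascript
// Returns the endpoint with ?token=... added (or replaced, if already present).
function withToken(endpoint, token) {
    const url = new URL(endpoint);
    url.searchParams.set("token", token);
    return url.toString();
}

const stopUrl = withToken(
    "https://foobarbaz0.execute-api.us-east-1.amazonaws.com/ec2/stop",
    "the-token-you-picked"
);
console.log(stopUrl);
```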

1. Test: starting up

Finally, just to make sure everything works (and avoid any unpleasant or awkward moments), open the starter-up URL in your browser.

Assuming your instance was already stopped, you would get a plaintext response:

stopped -> pending

Within a few seconds, the instance will enter running status and become ready (obviously, this transition won't be visible to the user; but that shouldn't really matter).

2. Test: stopping

Now open the stopper URL:

running -> stopping

As before, stopped status will be reached in background within a few seconds.

0. Test: does it work without the token - hopefully not?

The "unauthorized" response doesn't have a payload, so you may want to use curl or wget to verify this one:

janaka@DESKTOP-M314LAB:~$ curl -v https://foobarbaz0.execute-api.us-east-1.amazonaws.com/ec2/stop
*   Trying 13.225.2.77...
* ...
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
* ...
* ALPN, server accepted to use http/1.1

> GET /ec2/stop HTTP/1.1
> Host: foobarbaz0.execute-api.us-east-1.amazonaws.com
> User-Agent: curl/7.47.0
> Accept: */*
>

< HTTP/1.1 401 Unauthorized
< Content-Type: application/json
< Content-Length: 0
< Connection: keep-alive
< Date: Thu, 18 Jun 2020 06:14:58 GMT
< x-amzn-RequestId: ...

All good!

Now go ahead - share just those two token-included URLs with your client - or whatever third party that you wish to delegate the EC2 control; and ask them to use 'em wisely and keep 'em safe.

If the third party loses the URL(s); and the bad guy who got them, starts playing with them unnecessarily (stopping and starting things rapidly - or at random hours - for example): just run an aws cloudformation update-stack with a new TOKEN - to cut out old access! Then share the new token with your partner, obviously warning them to be a lot more careful.

You can also tear down the whole thing in seconds - without a trace of existence (except for the CloudWatch logs from previous runs) - via:

  • Sigma's Undeploy Project toolbar button or Project menu item,
  • aws cloudformation delete-stack on the CLI, or
  • the AWS console.

Lastly, don't forget to stay tuned for more serverless bites, snacks and full-course meals from our team!

Wednesday, November 28, 2018

AWS: Some Tips for Avoiding Those "Holy Bill" Moments

Cloud is awesome: almost-100% availability, near-zero maintenance, pay-as-you-go, and above all, infinitely scalable.

But the last two can easily bite you back, turning that awesomeness into a billing nightmare.

And occasionally you see stories like:

Within a week we accumulated a bill close to $10K.

Holy Bill!

And here I unveil a few tips that we learned from our not-so-smooth journey of building the world's first serverless IDE, that could help others to avoid some "interesting" pitfalls.

Careful with that config!

One thing we learned was to never underestimate the power of a configuration.

If you read the above linked article you would have noticed that it was a simple misconfiguration: a CloudTrail logging config that was writing logs to one of the buckets it was already monitoring.

You could certainly come up with more elaborate and creative examples of creating "service loops" yielding billing black-holes, but the idea is simple: AWS is only as intelligent as the person who configures it.

Infinite loop

(Well, in the above case it was one of my colleagues who configured it, and I was the one who validated it; so you can stop here if you feel like it ;) )

So, when you're about to submit a new config update, try to rethink the consequences. You won't regret it.

It's S3, not your attic.

AWS has estimated that 7% of cloud billing is wasted on "unused" storage - space taken up by content of no practical use: obsolete bundles, temporary uploads, old hostings, and the like.

Life in a bucket

However, it is true that cleaning things up is easier said than done. It is far easier to forget about an abandoned file than to keep track of it and delete it when the time comes.

Probably for the same reason, S3 has provided lifecycle configurations - time-based automated cleanup scheduling. You can simply say "delete this if it is older than 7 days", and it will be gone in 7 days.

This is an ideal way to keep temporary storage (build artifacts, one-time shares etc.) in check, hands-free.

Like the daily garbage truck.

Lifecycle configs can also become handy when you want to delete a huge volume of files from your bucket; rather than deleting individual files (which in itself would incur API costs - while deletes are free, listing is not!), you can simply set up a lifecycle config rule to expire everything in 1 day. Sit back and relax, while S3 does the job for you!

{
    "Rules": [
        {
            "Status": "Enabled",
            "Prefix": "",
            "Expiration": {
                "Days": 1
            }
        }
    ]
}

Alternatively you can move the no-longer-needed-but-not-quite-ready-to-let-go stuff into Glacier, for a fraction of the storage cost; say, for stuff under the subpath archived:

{
    "Rules": [
        {
            "Filter": {
                "Prefix": "archived"
            },
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 1,
                    "StorageClass": "GLACIER"
                }
            ]
        }
    ]
}

But before you do that...

Ouch, it's versioned!

(Inspired by true events.)

I put up a lifecycle config to delete about 3GB of bucket access logs (millions of files, obviously), and thought everything was good - until, a month later, I got the same S3 bill as the previous month :(

Turns out that the bucket had had versioning enabled, so deletion does not really delete the object.

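So with versioning enabled, you need to explicitly tell the S3 lifecycle logic to:

  • permanently delete noncurrent (previous) object versions, and
  • remove expired object delete markers,

in order to completely get rid of the "deleted" content and the associated delete markers.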
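For a versioned bucket, a cleanup configuration along those lines might look like the sketch below. The field names (NoncurrentVersionExpiration, ExpiredObjectDeleteMarker, AbortIncompleteMultipartUpload) are S3 lifecycle elements; the helper name, rule IDs and day counts are made up. It uses two rules because ExpiredObjectDeleteMarker cannot be combined with Days inside the same Expiration element. Treat it as a starting point and double-check it against the S3 documentation before pointing it at a real bucket.

```javascript
// Builds a lifecycle configuration that fully empties a versioned prefix:
// rule 1 expires current versions and permanently deletes noncurrent ones,
// rule 2 removes leftover delete markers and stale multipart uploads.
function versionedCleanupConfig(prefix, days) {
    return {
        Rules: [
            {
                ID: "expire-all-versions",          // made-up rule name
                Status: "Enabled",
                Filter: {Prefix: prefix},
                Expiration: {Days: days},
                NoncurrentVersionExpiration: {NoncurrentDays: days}
            },
            {
                ID: "remove-delete-markers",        // made-up rule name
                Status: "Enabled",
                Filter: {Prefix: prefix},
                Expiration: {ExpiredObjectDeleteMarker: true},
                AbortIncompleteMultipartUpload: {DaysAfterInitiation: days}
            }
        ]
    };
}

const config = versionedCleanupConfig("", 1);
console.log(JSON.stringify(config, null, 4));
```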

So much for "simple" storage service ;)

CloudWatch is your pal

Whenever you want to find out the total sizes occupied by your buckets, just iterate through your AWS/S3 CloudWatch Metrics namespace. There's no way - surprise, surprise - to check bucket size natively from S3; even the S3 dashboard relies on CloudWatch, so why not you?

Quick snippet to view everything? (uses aws-cli and bc on bash)

# yesterday's date (24 hours ago) as YYYY-MM-DD; relies on GNU date
yesterday=$(date -d @$(($(date +%s)-86400)) +%F)
for bucket in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do
        size=$(aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time "${yesterday}T00:00:00" --end-time "$(date +%F)T00:00:00" --period 86400 --metric-name BucketSizeBytes --dimensions Name=StorageType,Value=StandardStorage "Name=BucketName,Value=$bucket" --statistics Average --output text --query 'Datapoints[0].Average')
        # buckets without a datapoint report "None"; treat them as empty
        if [ "$size" = "None" ]; then size=0; fi
        # print the size in MiB, followed by the bucket name
        printf "%8.3f  %s\n" "$(echo "$size/1048576" | bc -l)" "$bucket"
done

EC2: sweep the garbage, plug the holes

EC2 makes it trivial to manage your virtual machines - compute, storage and networking. However, its simplicity also means that it can leave a trail of unnoticed garbage and billing leaks.

Pick your instance type

There's a plethora of settings when creating a new instance. Unless there are specific performance requirements, picking a T2-class instance type with Elastic Block Store (EBS)-backed storage and 2-4 GB of RAM would suffice for most needs.

Despite being free tier-eligible, t2.micro can be a PITA if your server could receive compute- or memory-intensive loads at some point; in these cases t2.micro tends to simply freeze (probably has to do with running out of CPU credits?), causing more trouble than it's worth.

Clean up AMIs and snapshots

We habitually tend to take periodic snapshots of our EC2 instances as backups. Some of these are made into Machine Images (AMIs) for reuse or sharing with other AWS users.

We easily forget about the other snapshots.

While snapshots don't get billed for their full volume sizes, they can add up to significant garbage over time. So it is important to periodically visit and clean up your EC2 snapshots tab.

Moreover, creating new AMIs would usually mean that older ones become obsolete; they can be "deregistered" from the AMIs tab as well.

But...

Who's the culprit - AMI or snapshot?

The actual charges are on snapshots, not on AMIs themselves.

And it gets tricky because deregistering an AMI does not automatically delete the corresponding snapshot.

You usually have to copy the AMI ID, go to snapshots, look for the ID in the description field, and nuke the matching snapshot. Or, if you are brave (and lazy), select and delete all snapshots; AWS will prevent you from deleting the ones that are being used by an AMI.
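The "look for the ID in the description field" step is easy to script once you have the snapshot list as JSON (for example from aws ec2 describe-snapshots). A minimal sketch: the snapshotsForAmi name and the sample data are made up, and the description format shown is an assumption based on what AMI-created snapshots typically look like.

```javascript
// Returns the IDs of snapshots whose description mentions the given AMI ID.
function snapshotsForAmi(snapshots, amiId) {
    return snapshots
        .filter(s => (s.Description || "").includes(amiId))
        .map(s => s.SnapshotId);
}

// Hand-written sample, trimmed to the fields used here.
const snapshots = [
    {SnapshotId: "snap-0aaa", Description: "Created by CreateImage(i-0123) for ami-0abc from vol-0111"},
    {SnapshotId: "snap-0bbb", Description: "manual backup before upgrade"},
    {SnapshotId: "snap-0ccc"}
];

console.log(snapshotsForAmi(snapshots, "ami-0abc")); // [ 'snap-0aaa' ]
```

Since this is a plain substring match, always pass the full AMI ID and review the result before deleting anything.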

Likewise, for instances and volumes

Compute is billed while an EC2 instance is running; but its storage volume is billed all the time - right up to deletion.

Volumes usually get nuked when you terminate an instance; however, if you've played around with volume attachment settings, there's a chance that detached volumes are left behind in your account. Although not attached to an instance, these still occupy space; and so AWS charges for them.

Again, simply go to the volumes tab, select the volumes in "available" state, and hit delete to get rid of them for good.
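The same filter is straightforward to apply to the JSON returned by aws ec2 describe-volumes, in case you would rather review the list offline first. A small sketch; unattachedVolumes and the sample data are made up for illustration:

```javascript
// Lists volumes in "available" state, i.e. not attached to any instance.
function unattachedVolumes(volumes) {
    return volumes
        .filter(v => v.State === "available")
        .map(v => ({id: v.VolumeId, sizeGiB: v.Size}));
}

// Hand-written sample, trimmed to the fields used here.
const volumes = [
    {VolumeId: "vol-0111", State: "in-use", Size: 8},
    {VolumeId: "vol-0222", State: "available", Size: 100}
];

console.log(unattachedVolumes(volumes)); // [ { id: 'vol-0222', sizeGiB: 100 } ]
```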

Tag your EC2 stuff: instances, volumes, snapshots, AMIs and whatnot

Tag 'em

It's very easy to forget what state the instance was in at the time a snapshot was made; or the purpose of that running/stopped instance which nobody seems to take ownership of or responsibility for.

Naming and tagging can help avoid unpleasant surprises ("Why on earth did you delete that last month's prod snapshot?!"); and also help you quickly decide what to toss ("We already have an 11-05 master snapshot, so just delete everything older than that").

You stop using, and we start billing!

Sometimes, the AWS Lords work in mysterious ways.

For example, Elastic IP Addresses (EIPs) are free as long as they are attached to a running instance. But they start getting charged by the hour, as soon as the instance is stopped; or if they get into a "detached" state (not attached to a running instance) in some way.

Some prior knowledge about the service you're about to sign up for can prevent nasty surprises of this kind. A quick look at the pricing page, or a quick Google search, can make all the difference.

Pay-per-use vs pay-per-allocation

Many AWS services follow one or both of the above patterns. The former is trivial (you simply pay for the time/resources you actually use, and enjoy a zero bill for the rest of the time) and hard to miss; but the latter can be a bit obscure and quite easily go unnoticed.

Consider EC2: you mainly pay for instance runtime but you also pay for the storage (volumes, snapshots, AMIs) and network allocations (like inactive Elastic IPs) even if your instance has been stopped for months.

There are many more examples, especially in the serverless domain (which we ourselves are incidentally more familiar with):

Each block adds a bit more to your cost.

Meanwhile, some services secretly set up their own monitoring, backup and other "utility" entities. These, although (probably!) meant to do good, can secretly seep into your bill:

These are the main culprits that often appear in our AWS bills; certainly there are better examples, but you get the point.

CloudWatch (yeah, again)

Many services already—or can be configured to—report usage metrics to CloudWatch. Hence, with some domain knowledge of which metric maps into which billing component (e.g. S3 storage cost is represented by the summation of the BucketSizeBytes metric across all entries of the AWS/S3 namespace), you can build a complete billing and monitoring solution around CloudWatch Metrics (or delegate the job to a third-party service like DataDog).

CloudWatch in itself is mostly free, and its metrics have automatic summarization mechanisms so you don't have to worry about overwhelming it with age-old garbage—or getting overwhelmed with off-the-limit capacity bills.

The Billing API

Although AWS does have a dedicated Billing Dashboard, logging in and checking it every single day is not something you would add to your agenda (at least not for API/CLI minds like you and me).

Luckily, AWS offers a billing API whereby you can obtain a fairly granular view of your current outstanding bill, over any preferred time period - broken down by services or actual API operations.

The catch is that this API is not free: each invocation costs you $0.01. Of course this is negligible: considering the risk of an unexpected bill of several dozen (or even hundreds or thousands, in some cases) dollars, it is worth having a $0.30/month billing monitor (one call per day) to track down any anomalies before it's too late.
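The arithmetic, plus the kind of check such a monitor could run on the numbers it gets back, fits in a few lines. This is a sketch only: it makes no API calls, the function names are made up, and "today is more than twice the recent daily average" is an arbitrary anomaly rule chosen for illustration.

```javascript
const COST_PER_REQUEST_CENTS = 1; // $0.01 per invocation, as quoted above

// Monthly cost (in cents) of polling the billing API a fixed number of times a day.
function monthlyPollingCostCents(requestsPerDay, daysPerMonth = 30) {
    return requestsPerDay * daysPerMonth * COST_PER_REQUEST_CENTS;
}

// Flags today's spend if it exceeds `factor` times the average of previous days.
function isAnomalous(todayCost, previousDailyCosts, factor = 2) {
    if (previousDailyCosts.length === 0) return false;
    const total = previousDailyCosts.reduce((a, b) => a + b, 0);
    return todayCost > (total / previousDailyCosts.length) * factor;
}

console.log((monthlyPollingCostCents(1) / 100).toFixed(2)); // 0.30
console.log(isAnomalous(40, [10, 12, 11]));                 // true
console.log(isAnomalous(12, [10, 12, 11]));                 // false
```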

Food for thought: with support for headless Chrome offered for Google Cloud Functions, one might be able to set up a serverless workflow that logs into the AWS dashboard and checks the bill for you. Something to try out during free time (if some ingenious folk hasn't hacked it together already).

Billing alerts

Strangely (or perhaps not ;)) AWS doesn't offer a way to put up a hard limit for billing; despite the numerous user requests and disturbing incident reports all over the web. Instead, they offer alerts for various billing "levels"; you can subscribe for notifications like "bill at x% of the limit" and "limit exceeded", via email or SNS (handy for automation via Lambda!).

My advice: this is a must-have for every AWS account. If we had had one in place, we could have saved thousands of dollars by now.

Organizational accounts

If you want to delegate AWS access to third parties (testing teams, contract-basis devs, demo users etc.), it might be a good idea to create a sub-account by converting your root account into an AWS organization with consolidated billing enabled.

(While it is possible to do almost the same using an IAM user, it will not provide resource isolation; everything would be stuffed in the same account, and painstakingly complex IAM policies may be required to isolate entities across users.)

Our CEO and colleague Asankha has written about this quite comprehensively so I'm gonna stop at that.

And finally: Monitor. Monitor. Monitor.

No need to emphasize this - my endless ramblings should already have conveyed its importance.

So, good luck with that!