Monday, September 16, 2019

Sigma IDE now supports Python serverless Lambda functions!

Think Serverless, go Pythonic - all in your browser!

Python. The coolest, craziest, sexiest, nerdiest, most awesome language in the world.

(Okay, this news is several weeks stale, but still...)

If you are into this whole serverless "thing", you might have noticed us, a notorious bunch at SLAppForge, blabbering about a "serverless IDE". Yeah, we have been operating the Sigma IDE - the first of its kind - for quite some time now, getting mixed feedback from users all over the world.

Our standard feedback form had a question, "What is your preferred language to develop serverless applications?", with the options Node, Java, Go, C#, and a suggestion box. Surprisingly (or perhaps not), the suggestion box was the most popular option; and all but two of the suggestions named the same language - Python.

User is king; Python it is!

We even had some users who wanted to cancel their brand new subscription, because Sigma did not support Python as they expected.

So, in one of our roadmap meetings, the whole Python story came out; and we decided to give it a shot.

Yep, Python it is!

Before the story, some credits are in order.

Hasangi, one of our former devs, was initially in charge of evaluating the feasibility of supporting Python in Sigma. After she left, I took over. Now, at this moment of triumph, I would like to thank you, Hasangi, for spearheading the whole Pythonic move. 👏

Chathura, another of our former wizards, had tackled the whole NodeJS code analysis part of the IDE - using Babel. Although I had had some lessons on abstract syntax trees (ASTs) in my compiler theory lectures, it was after going through his code that I really "felt" the power of an AST. So this is to you, Chathura, for giving life to the core of our IDE - and making our Python journey much, much faster! 🖖

And thank you Matt - for filbert.js!

Chathura's work was awesome; yet, it was like, say, "water inside water" (heck, what kind of analogy is that?). In other words, we were basically parsing (Node)JS code inside a ReactJS (yeah, JS) app.

So, naturally, our first question - and the million-dollar one, back then - was: can we parse Python inside our JS app? And do all our magic - rendering nice popups for API calls, autodetecting resource use, autogenerating IAM permissions, and so on?

Hasangi had already hunted down filbert.js, a derivative of acorn that could parse Python. Unfortunately, before long, she and I learned that it could not understand the standard (and most popular) format of AWS SDK API calls - namely named params:

s3.put_object(
  Bucket="foo",
  Key="bar",
  Body=our_data
)

If we were to switch to the "fluent" format instead:

boto.connect_s3() \
  .get_bucket("foo") \
  .new_key("bar") \
  .set_contents_from_string(our_data)

we would have to rewrite a whole lotta AST parsing logic; maybe a whole new AST interpreter for Python-based userland code. We didn't want that much of an adventure - not yet, at least.
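To make the "named params" idea concrete, here is a minimal sketch of spotting such calls in an AST. It uses Python's own built-in ast module rather than filbert.js (which is what Sigma actually runs in the browser), and the function name find_named_param_calls is made up for this illustration:

```python
import ast

# The same call shape as above; our_data is never evaluated, only parsed.
SOURCE = '''
s3.put_object(
  Bucket="foo",
  Key="bar",
  Body=our_data
)
'''

def find_named_param_calls(source):
    """Return (method name, [named param names]) for each call using named params."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and node.keywords:
            func = node.func
            # s3.put_object -> "put_object"; a bare function call -> its own name
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", "<unknown>")
            # kw.arg is None for **kwargs unpacking, so skip those
            names = [kw.arg for kw in node.keywords if kw.arg is not None]
            calls.append((name, names))
    return calls

print(find_named_param_calls(SOURCE))
```

Once the parser hands over the parameter names like this, things like popups and permission detection have something to hang on to.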

Doctor Watson, c'mere! (IT WORKS!!)

One fine evening, I went ahead to play around with filbert.js. Glancing at the parsing path, I noticed:

...
    } else if (!noCalls && eat(_parenL)) {
      if (scope.isUserFunction(base.name)) {
        // Unpack parameters into JavaScript-friendly parameters, further processed at runtime
        var pl = parseParamsList();
...
        node.arguments = args;
      } else node.arguments = parseExprList(_parenR, false);
...

Wait... are they deliberately skipping the named params thingy?

What if I comment out that condition check?

...
    } else if (!noCalls && eat(_parenL)) {
//    if (scope.isUserFunction(base.name)) {
        // Unpack parameters into JavaScript-friendly parameters, further processed at runtime
        var pl = parseParamsList();
...
        node.arguments = args;
//    } else node.arguments = parseExprList(_parenR, false);
...

And then... well, I just couldn't believe my eyes.

Two lines commented out, and it already started working!

That was my moment of truth. I am gonna bring Python into Sigma. No matter what.

Yep. A Moment of Truth.

I just can't give up. Not after what I just saw.

The Great Refactor

When we gave birth to Sigma, it was supposed to be more of a PoC - to prove that we can do serverless development without a local dev set-up, dashboard and documentation round-trips, and a mountain of configurations.

As a result, extensibility and customizability weren't quite on our plate back then. Things were pretty much bound to AWS and NodeJS. (And to think that we still call 'em "JavaScript" files... 😁)

So, starting from the parser, a truckload of refactoring was awaiting my eager fingers. Starting with a Language abstraction, I gradually worked my way through editor and pop-up rendering, code snippet generation, building the artifacts, deployment, and so forth.

(I had tackled a similar challenge when bringing in Google Cloud support to Sigma - so I had a bit of an idea on how to approach the whole thing.)

Test environment

Ever since Chathura - our ex-Adroit wizard - implemented it single-handedly, the test environment has been one of the most important features of Sigma. If Python were to make an impact, we were also gonna need a test environment for Python.

Things start getting a bit funky here; thanks to its somewhat awkward history, Python has two distinct "flavours": 2.7 and 3.x. So, in effect, we need to maintain two separate environments - one for each version - and invoke the correct one based on the current function's runtime setting.

(Well now, in fact we have the same problem for NodeJS too (6.x, 8.x, 10.x, ...); but apparently we haven't given it much thought, and it hasn't caused any major problems either! 🙏)

pip install

We also needed a new contraption for handling Python (pip) dependencies. Luckily pip was already available on the Lambda container, so installation wasn't a major issue; the real problem was that the packages had to be extracted right into the project root directory in the test environment. (Contrary to npm, where everything goes into a nice and manageable node_modules directory - so that we can extract and clean up things in one go.) Fortunately, a little bit of (hopefully stable!) code took us through.
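For the curious, here is a rough sketch of one way to get pip to drop packages straight into a chosen directory, using pip's --target option. This is not Sigma's actual code; the helper names pip_install_command and install_into are invented for this example:

```python
import subprocess
import sys

def pip_install_command(packages, target_dir):
    """Build a pip command that installs packages directly into target_dir."""
    # "python -m pip" makes sure pip runs under the same interpreter as this script
    return [sys.executable, "-m", "pip", "install", "--target", target_dir, *packages]

def install_into(packages, target_dir):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(packages, target_dir), check=True)
```

Something like install_into(["requests"], "/path/to/project-root") would then leave the package sitting next to your function files, which is exactly why cleaning up afterwards is messier than wiping a single node_modules directory.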

`pip`, and the Python Package Index

Life without __init__.py

Everything was running smoothly, until...

from subdirectory.util_file import util_func
  File "/tmp/pypy/ding.py", line 1, in <module>
    from subdirectory.util_file import util_func
ImportError: No module named subdirectory.util_file

This happened only in Python 2.7, so it was easy to figure out - we needed an __init__.py inside subdirectory to mark it as an importable package.

Rather than relying on the user to create one, we decided to do it ourselves; whenever a Python file gets created, we now ensure that an __init__.py also exists in its parent directory; creating an empty file if one is absent.
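The idea boils down to something like the following sketch. Sigma's real implementation lives in the IDE, not in a Python script, so the function name create_python_file and its signature are purely illustrative:

```python
from pathlib import Path

def create_python_file(path, content=""):
    """Write a Python file and make sure its parent directory has an __init__.py."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)

    init_file = path.parent / "__init__.py"
    if not init_file.exists():
        # An empty file is enough to make Python 2.7 treat the directory as a package
        init_file.touch()
    return init_file
```

An existing __init__.py is left untouched, so anything the user has put in there survives.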

Dammit, the logs - they are dysfunctional!

SigmaTrail is another gem of our Sigma IDE. When writing a Lambda piece by piece, it really helps to have a logs pane next to your code window. Besides, what good is a test environment, if you cannot see the logs of what you just ran?

Once again, Chathura was the mastermind behind SigmaTrail. (Well, yeah, he wrote more than half of the IDE, after all!) His code was humbly parsing CloudWatch logs and merging them with LogResults returned by Lambda invocations; so I thought I could just plug it into the Python runtime, sit back, and enjoy the view.

I was terribly wrong.

Raise your hand, those who use logging in Python!

In Node, the only (obvious) way you're gonna get something out in the console (or stdout, technically) is via one of those console.{level}() calls.

But Python gives you options - say the builtin print, vs the logging module.

If you go with logging, you have to:

  1. import logging,
  2. get a Logger and set its level - if you want to generate debug logs etc., and
  3. invoke the appropriate logger.{level} or logging.{level} method, when it comes to that.
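Put together, those steps look roughly like this minimal handler sketch. It assumes the root logger already has a handler attached, as is usually the case on Lambda's Python runtime; the log messages and the return value are just placeholders:

```python
import logging

# Steps 1 and 2: grab the root logger and lower its level,
# since DEBUG and INFO messages are filtered out by default.
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def handler(event, context):
    # Step 3: call the method matching the level you want
    logger.debug("event received: %s", event)
    logger.info("processing started")
    return {"ok": True}
```

Three steps is not exactly a lot, but it is still three more than a bare print.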

Yeah, on Lambda you could also

context.log("your log message\n")

if you have your context lying around - still, you need that extra \n at the end, to get it to log stuff to its own line.

But it's way easier to just print("your log message") - heck, if you are on 2.x, you don't even need those parentheses!

Good for you.

But that poses a serious problem to SigmaTrail.

Yeah. We have a serious problem.

All those print lines, in one gook of text. Yuck.

For console.log in Node, Lambda automagically prepends each log with the current timestamp and request ID (context.awsRequestId). Chathura had leveraged this data to separate out the log lines and display them as a nice trail in SigmaTrail.

But now, with print, there were no prefixes. Nothing was getting picked up.

Fixing this was perhaps the hardest part of the job. I spent about a week trying to understand the code (thanks to the workers-based pattern); and then another week trying to fix it without breaking the NodeJS flow.

By now, it should be fairly stable - and capable of handling any other languages that could be thrown at it as time passes by.
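To give a feel for the problem, here is a heavily simplified sketch of telling prefixed and unprefixed lines apart. It is not SigmaTrail's code (which is JavaScript, and worker-based); the tab-separated "timestamp, request ID, message" layout and the name group_log_lines are assumptions made for this example:

```python
import re

# Assumed prefix layout: ISO timestamp, tab, request ID (UUID), tab, message
PREFIX = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\t"
    r"(?P<request_id>[0-9a-f-]{36})\t"
    r"(?P<message>.*)$"
)

def group_log_lines(lines):
    """Turn raw log lines into entries, keeping lines that carry no prefix."""
    entries = []
    for line in lines:
        match = PREFIX.match(line)
        if match:
            entries.append(match.groupdict())
        else:
            # A bare print() line: no timestamp or request ID to go by,
            # but it still deserves its own entry instead of being dropped.
            entries.append({"timestamp": None, "request_id": None, "message": line})
    return entries
```

The fallback branch is the whole point: a parser that only understands prefixed lines silently swallows everything print produces.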

The "real" runtime: messing with PYTHONPATH

After the test environment came to life, I thought all my troubles were over. The "legacy" build (CodeBuild-driven) and deployment were rather straightforward to refactor, so I was happy - and even about to raise the green flag for an initial release.

But I was making a serious mistake.

I didn't realize it, until I actually invoked a deployed Lambda via an API Gateway trigger.

{"errorMessage": "Unable to import module 'project-name/func'"}

What the...

Unable to import module 'project-name/func': No module named 'subdirectory'

Where's ma module?

The tests work fine! So why not production?

After a couple of random experiments, and inspecting Python bundles generated by other frameworks, I realized the culprit was our deployment archive (zipfile) structure.

All other bundles have the functions at top level, but ours has them inside a directory (our "project root"). This wasn't a problem for NodeJS so far; but now, no matter how I define the handler path, AWS's Python runtime fails to find it!

Changing the project structure would have been a disaster; too much risk in breaking, well, almost everything else. A safer idea would be to override one of the available settings - like a Python-specific environment variable - to somehow get our root directory onto PYTHONPATH.

A simple hack

Yeah, the answer is right there, PYTHONPATH; but I didn't want to override a hand-down from AWS Gods, just like that.

So I began digging into the Lambda runtime (yeah, again) to see if there was something I could use:

import os

def handler(event, context):
    print(os.environ)

Gives:

{'PATH': '/var/lang/bin:/usr/local/bin:/usr/bin/:/bin:/opt/bin',
'LD_LIBRARY_PATH': '/var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib',
...
'LAMBDA_TASK_ROOT': '/var/task',
'LAMBDA_RUNTIME_DIR': '/var/runtime',
...
'AWS_EXECUTION_ENV': 'AWS_Lambda_python3.6', '_HANDLER': 'runner_python36.handler',
...
'PYTHONPATH': '/var/runtime',
'SIGMA_AWS_ACC_ID': 'nnnnnnnnnnnn'}

LAMBDA_RUNTIME_DIR looked like a promising alternative; but unfortunately, AWS was rejecting it. Each deployment failed with the long, mean error:

Lambda was unable to configure your environment variables because the environment variables
you have provided contains reserved keys that are currently not supported for modification.
Reserved keys used in this request: LAMBDA_RUNTIME_DIR

Nevertheless, that investigation revealed something important: PYTHONPATH in Lambda wasn't as complex or crowded as I imagined.

'PYTHONPATH': '/var/runtime'

And apparently, Lambda's internal agents don't mess around too much with it. Just pull out and read /var/runtime/awslambda/bootstrap.py and see for yourself. 😎

PYTHONPATH works. Phew.

It finally works!!!

So I ended up overriding PYTHONPATH, to include the project's root directory, /var/task/project-name (in addition to /var/runtime). If you want something else to appear there, feel free to modify the environment variable - but leave our fragment behind!
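If you want to see the effect without deploying anything, the following sketch recreates the nested layout in a temp directory and tries the import with and without PYTHONPATH. The directory names mirror the ones in this post; the helpers build_layout and try_import are made up for the demo, and on Lambda the value would be along the lines of /var/task/project-name:/var/runtime:

```python
import os
import subprocess
import sys
from pathlib import Path

IMPORT_SNIPPET = "from subdirectory.util_file import util_func; print(util_func())"

def build_layout(task_root):
    """Create task_root/project-name/subdirectory/util_file.py, like our zip structure."""
    project_root = Path(task_root) / "project-name"
    package = project_root / "subdirectory"
    package.mkdir(parents=True)
    (package / "__init__.py").touch()
    (package / "util_file.py").write_text("def util_func():\n    return 'ok'\n")
    return project_root

def try_import(task_root, pythonpath=None):
    """Run the import in a fresh interpreter started from task_root; True if it works."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", IMPORT_SNIPPET],
        cwd=task_root, env=env, capture_output=True, text=True,
    )
    return result.returncode == 0
```

Calling try_import(task_root) on that layout fails just like the deployed Lambda did, while try_import(task_root, str(project_root)) succeeds.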

On the bright side, this should mean that my functions will work on other platforms as well - since PYTHONPATH is supposed to be cross-platform.

Google Cloud for Python - Coming soon!

With a few tune-ups, we could get Python working on Google Cloud Functions as well. It's already in our staging environment; and as soon as it goes live, you GCP fellas will be in luck! 🎉

Still a long way to go... But Python is already alive and kicking!

You can enjoy writing Python functions in our current version of the IDE. Just click the plus (+) button on the top right of the Projects pane, select New Python Function File (or New Python File), and let the magic begin!

And of course, let us - and the world - know how it goes!
