Thursday, November 23, 2017

Connecting the dots in style: Build your own Dropbox Sync in 10 minutes!

Integration, or "connecting the dots", is something that is quite difficult to avoid in the modern era of highly globalized business domains. Fortunately, integration, or "enterprise integration" in more "enterprise-y" terms, no longer has to be something that makes your hair stand on end, thanks to advanced yet user-friendly enterprise integration frameworks such as Project-X.

Today, we shall extend our helping hand to Jane, a nice Public Relations officer of the HappiShoppin supermarket service (never heard the name? yup, neither have I :)) in setting up a portion of her latest customer feedback aggregation mechanism. No worries, though, since I will be helping and guiding you all the way to the end!

The PR Department of the HappiShoppin supermarket service has opened up new channels for receiving customer feedback. In addition to the conventional, paper-based feedback drop-ins they had before, they now accept electronic feedback via their website as well as via a public Dropbox folder (in addition to social media, Google Drive, Google Forms, etc.). Jane, who is heading the Dropbox-driven feedback initiative, would like to set up an automated system to sync any newly added Dropbox feedback to her computer so that she can check it offline whenever it is convenient for her, rather than having to keep an eye on the Dropbox folder all the time.

Jane has decided to compose a simple "Dropbox sync" integration flow that would periodically sync new content from the feedback accumulation Dropbox folder, to a local folder on her computer.

  • On HappiShoppin's shared Dropbox account, /Feedback/Inbox is the folder where customers can place feedback documents, and Jane hopes to sync the new arrivals into /home/jane/dropbox-feedback on her computer.
  • Jane has estimated that it is sufficient to sync content once a day, as the company receives only a limited number of feedback submissions over a given day; however, during the coming Christmas season, the company is expecting a spike in customer purchases, which would probably mean an accompanying increase in feedback submissions as well.
  • For easier tracking and maintenance, she wants the feedback files to be organized into daily subfolders.
  • In order to avoid repeatedly syncing the same feedback file, Jane has to ensure that successfully synced files are removed from the inbox, which she hopes to address by moving them to a different Dropbox folder: /Feedback/Synced.

Design of the Dropbox Sync solution

Now, before we begin, a bit about what Project-X is and what we are about to do with it:

  • Project-X is a messaging engine, which one could also call an enterprise service bus (a label that also fits the scenario we are about to tackle).
  • Project-X ingests events (or messages) from ingress connectors, subjects them to various transformations via processing elements, and emits them to other systems via egress connectors. For a single message, any number of such transformations and emissions can happen, in any order.
  • The message lifecycle described above, is represented as an integration flow. It is somewhat similar to a conveyor belt in a production line, although it can be much more flexible with stuff like cloning, conditional branching, looping and try-catch flows.
  • A set of integration flows makes up an integration project, which is the basic deployment unit when it comes to Project-X runtimes such as UltraESB-X.

So, in our case, we should:

  • create a new integration project
  • create an integration flow inside the project, to represent Jane's scenario
  • add the necessary connectors and processors, and configure and wire them together
  • test the flow to see if what we assembled is actually capable of doing what Jane is expecting
  • build the project into a deployable artifact, ready to be deployed in UltraESB-X

While the above may sound like quite a bit of work, we already have a cool IDE UltraStudio that can do most of the work for us. With UltraStudio on your side, all you have to do is to drag, drop and connect the required connectors and processing elements, and everything else will be magically done for you. You can even try out your brand-new solution right there, inside the IDE, and trace your events or messages real-time as they pass through your integration flow.

So, before we begin, let's get UltraStudio installed on your system (unless you already have it, of course!).

Once you are ready, create a new Ultra Project using the File → New → Project... option on the menu bar and selecting Empty Ultra Project. While creating the project, select the following components on the respective wizard pages (don't worry, in a moment we'll get to know what they actually are):

  • Timer Task Connector and Dropbox Connector on the Connectors page
  • JSON Processor and Flow Control processor on the Processors page

If you were impatient and had already created a project, you could always add the above components later on via the menu option Tools → Ultra Studio → Component Registry.

Now we can start by creating a new integration flow dropbox-sync-flow, by opening the Project side pane and right-clicking the src/main/conf directory.

Again, a few tips on using the graphical flow UI (in case you're wondering where on earth it is) before you begin:

  • Under the hood, an integration flow is an XML (Spring) configuration, which UltraStudio can alternatively represent as a composable diagram for your convenience.
  • You can switch between the XML and graphical views using the two small tabs that would appear at the bottom of an integration flow file while it is opened in the IDE. (These tabs might be missing at certain times, e.g. when the IDE is performing indexing or Maven dependency resolution; at such times, patience is a virtue!)
  • The graphical view contains a side palette with all the components (connectors and processors) that have currently been added to your project (at creation or through the Component Registry). You can browse them by clicking on the collapsible labels on the palette, and add them to the flow by simply dragging-and-dropping them into the canvas.
  • In order to mimic the message flow, components should be connected together using lines drawn between their ports (small dots of different colors that appear around the component's icon). You will get the hang of it once you have had a look at some of the existing integration flows, or at the image of the flow that we will be developing (which appears later in this article).
  • When a component requires configuration parameters, a configuration pane gets automatically opened as soon as you drop an element into the canvas (you can also open it by clicking on the component later on). If the labels or descriptions on the configuration pane are not clear enough, just switch to the Documentation tab and click on the "Read more" URL to visit the complete documentation of the element (on your favourite web browser). Also, make sure that you click the Save button (at the bottom or on the side pane) once you have made any changes.

Start the flow with a Timer Ingress Connector. This is a connector used to trigger a periodic event (similar to a clock tick) for a time-driven message flow. Let's configure it to trigger an event that would set the sync process in motion. For flexibility, we will use a cron expression instead of a simple periodic trigger.

Scheduling tab:

Polling CRON Expression 0/30 * * ? * *

Although Jane wanted to run the check only at 6 PM each day, we have set the polling time to every 30 seconds, for the sake of convenience; otherwise you'll simply have to wait until 6 PM to see if things are working :)
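
For reference, once testing is done, Jane could switch to a once-a-day 6 PM trigger with a cron expression along the lines of the one below (assuming the connector follows the usual Quartz-style, seconds-first cron format, as the every-30-seconds expression above suggests):

Polling CRON Expression 0 0 18 ? * *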

Next, add a Dropbox Egress Connector with a List Entities connector operation element attached to its side port. You can find the connector operations by clicking on the down-arrow icon against the Dropbox Connector on the component palette, which will expand a list of available connector operations.

A connector operation is an appendage that you can, well, append to a connector, which will perform some additional processing on the outgoing message in a connector-specific way. For example, for Dropbox we have a main connector, with a bunch of connector operations that represent different API operations that you can perform against your Dropbox account, such as managing files, searching, downloading, etc.

Configure the Dropbox Connector with the shared Dropbox account credentials (App ID and Access Token), and the connector operation with the Path /Feedback/Inbox.

Basic tab:

Client ID
{client ID for your Dropbox app;
visit https://www.dropbox.com/developers/apps/create to create a new app}
Access Token
{access token for your Dropbox account, under the above app;
follow https://blogs.dropbox.com/developers/2014/05/generate-an-access-token-for-your-own-account/
to obtain an access token for personal use against your own app}

List Entities, Basic tab:

Path /Feedback/Inbox

The above contraption will return a List Folder response, containing all files that are currently inside /Feedback/Inbox, as a wrapped JSON payload:

{
    "entries": [
        {
            ".tag": "file",
            "name": "johndoe.docx",
            "id": "id:12345_67_890ABCDEFGHIJ",
            ...
        }, {
            ".tag": "file",
            "name": "janedoe.txt",
            "id": "id:JIHGF_ED_CBA9876543210",
            ...
        }
    ],
    ...
}

Ah, now there's the info that we have been looking for: the name fields of the entries. Now we need to somehow pull them out.

Next, add a JSON Path Extractor processor to extract the list of file names from the above JSON response, using a JSON Path pattern: $.entries[*].name. This will store the resulting file name list in a scope variable named files, for further processing. A scope variable is a kind of temporary storage where you can retain simple values for referring to later in the flow.

Variable Name files
JSON Path $.entries[*].name
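
For instance, against the sample List Folder response shown above, the files variable would end up holding a list like the following:

["johndoe.docx", "janedoe.txt"]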

Then add a ForEach Loop to iterate over the previously mentioned scope variable, so that we can process each of the observed files separately. The next few processing steps will take place within each iteration of the loop.

Basic tab:

Collection Variable Name files
Collection Type COLLECTION
Iterating Variable Name file

Now add a new Dropbox Connector (configured with your app and account credentials as before), along with a Download Entity connector operation, to download the file corresponding to the current iteration (held in the file scope variable) from Dropbox into the local directory.

Tip: When you are drawing outgoing connections from ForEach Loop, note that the topmost out port is for the loop termination (exit) path, and not for the next iteration!

Basic tab:

Client ID {client ID for your Dropbox app}
Access Token {access token for your Dropbox account, under the above app}

Advanced tab:

Retry Count 3

Download Entity, Basic tab:

Path /Feedback/Inbox/@{variable.file}
Destination /home/jane/dropbox-feedback/@{current.timestamp.yyyy-MM-dd_HH-mm}
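
To make the placeholders concrete: if the flow fires around 6 PM on 23 November 2017 while the current iteration holds johndoe.docx, the two values above would resolve to something like the following (the subfolder name following the yyyy-MM-dd_HH-mm pattern given in the destination):

Path /Feedback/Inbox/johndoe.docx
Destination /home/jane/dropbox-feedback/2017-11-23_18-00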

Next, add another Dropbox Connector (configured with your app and account credentials) with a Move Entity connector operation, to move the original file to /Feedback/Synced so that we would not process it again. We will set the Retry Count property of the connector to 3, to make a best effort to move the file (in case we face any temporary errors, such as network failures, during the initial move). We will also enable Auto-Rename on the connector operation to avoid any possible issues resulting from files with the same name being placed at /Feedback/Inbox at different times (which could cause conflicts during the move).

Move Entity, Basic tab:

Path /Feedback/Inbox/@{variable.file}
Destination /Feedback/Synced/@{variable.file}
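
Continuing the same example, johndoe.docx would be moved from /Feedback/Inbox/johndoe.docx to /Feedback/Synced/johndoe.docx; and if a file with that name is already sitting in /Feedback/Synced from an earlier sync, Auto-Rename lets Dropbox store the new copy under an adjusted name (the exact renaming scheme is up to Dropbox) instead of failing the move.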

Now add a Successful Flow End element to signify that the message flow has completed successfully.

Now we need to connect the processing elements together, to resemble the following final flow diagram:

Dropbox Sync: Sample Flow

Finally, now we are ready to test our brand new Dropbox sync flow!

Before proceeding, ensure that your Dropbox account contains the /Feedback/Inbox and /Feedback/Synced directories.

Create an UltraStudio run configuration by clicking Run → Edit Configurations... on the menu, and selecting UltraESB-X Server under the Add New Configuration (+) button on the top left.

Now, with everything in place, select Run → Run configuration name from the menu to launch your project!

If everything goes fine, after a series of blue-colored logs, you'll see the following line at the end of the Run window:

2017-11-23T11:45:27,554 [127.0.1.1-janaka-ENVY] [main] [system-] [XEN45001I013]
INFO XContainer AdroitLogic UltraStudio UltraESB-X server started successfully in 1 seconds and 650 milliseconds

If you get any errors (red) or warnings (yellow) before this, you will have to click Stop (red square) on the Run window to stop the project, and dig into the logs to get a clue as to what might have gone wrong.

Once you have things up and running, open your Dropbox account on your favourite web browser, and drop some files into the /Feedback/Inbox directory.

After a few seconds (depending on the cron expression that you provided above), the files you dropped there will magically appear inside a timestamped subfolder under /home/jane/dropbox-feedback/. After this, if you check the Dropbox account again, you will notice that the original files have been moved from /Feedback/Inbox to /Feedback/Synced, as we expected.

Now, if you drop some more files into /Feedback/Inbox, they will appear under a different folder (named with the new timestamp) under /home/jane/dropbox-feedback. This would not be a problem for Jane, as in her case the flow will only be triggered once a day, resulting in a single directory for each day.

See? That's all!

Now, all that is left is to call Jane and let her know that her Dropbox integration task is ready to go live!

Sunday, November 19, 2017

Out, you wretched, corrupted cache entry... OUT! (exclusively for the Fox on Fire)

While I'm a Firefox fan, I often run into tiny issues with the browser, many of which cannot be reproduced in clean environments (and hence are somehow related to the dozens of customizations and the horde of add-ons that I take for granted).

I recently nailed one that had been bugging me for well over three years—practically ever since I discovered FF's offline mode.

While the offline mode does an excellent job almost all the time, sometimes it can screw up your cache entries so badly that the only way out is a full cache clear. This often happens if you place the browser in offline mode while a resource (CSS, JS, font,... and sometimes even the main HTML page, especially in the case of Wikipedia) is still being downloaded.

If you are unfortunate enough to run into such a mess, from then onwards, whenever you load the page from cache, the cache responds with the partially fetched (hence partially cached) broken resource—apparently a known bug. No matter how many times you refresh—even in online mode—the full version of the resource will not get cached (the browser would fetch the full resource and just discard it secretly, coughing up the corrupted entry right away during the next offline fetch).

Although FF has a "Forget about this site" option that could have shed some light (as you could simply ask the browser to clear just that page from the cache), the feature is bugged as well, and ends up clearing your whole cache anyway; so you have no easy way of discarding the corrupted entry in isolation.

And the ultimate and unfortunate solution, for getting the site to work again, would be to drop several hundred megabytes of cache, so that the browser could start from zero; or to stop using the site until the expiry time of the resource is hit, which could potentially be months ahead in the future.

The good news is, FF's Cache2 API allows you to access the offending resource by URL, and kick it out of the cache. The bad news, on the other hand, is that although there are a few plugins that allow you to do this by hand, all of them are generic cache-browsing solutions, so they take forever to iterate through the browser cache and build the entry index, during which you cannot practically do anything useful. I don't know how things would be on a fast disk like an SSD, but on my 5400-RPM magnetic disk it takes well over 5 minutes to populate the list.

But since you already know the URL of the resource, why not invoke the Cache2 API directly with a few lines of code, and kick the bugger out yourself?

// load the disk cache
var cacheservice = Components.classes["@mozilla.org/netwerk/cache-storage-service;1"]
    .getService(Components.interfaces.nsICacheStorageService);
var {LoadContextInfo} = Components.utils.import("resource://gre/modules/LoadContextInfo.jsm",{})
var hdcache = cacheservice.diskCacheStorage(LoadContextInfo.default, true);

// compose the URL and submit it for dooming
var uri = Components.classes["@mozilla.org/network/io-service;1"]
    .getService(Components.interfaces.nsIIOService).newURI(prompt("Enter the URL to kick out:"), null, null);
hdcache.asyncDoomURI(uri, null, null);
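
If you would also like a confirmation that the entry actually got doomed, you could pass a callback object instead of the final null; if I remember the Cache2 callback interface (nsICacheEntryDoomCallback) correctly, it would look roughly like this:

// optional: log the outcome of the doom operation (interface name per my recollection; treat as a sketch)
hdcache.asyncDoomURI(uri, null, {
    onCacheEntryDoomed: function(aResult) {
        console.log("doom status: " + aResult); // 0 (NS_OK) means the entry got kicked out
    }
});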

Yes, that's all. Once the script is run on the browser console, with uri populated with the URL of the offending resource (which in this case is read in using a JS prompt()), poof! You just have to reload the resource (usually by loading the parent HTML page), taking care not to hit the offline mode prematurely, to get the site working fine again.

And that's the absolute beauty of Firefox.

Expires? Pragma? Cache-Control? Anybody home?... Yay! (exclusively for the Fox on Fire)

As you may already have noticed from my previous articles and my (limited) GitHub contributions, I am an absolute Firefox (FF) fan (though I cannot really call myself a Mozillian yet). Some of my recent endeavors with FF brought me closer to FF's internal and add-on APIs, which happen to be somewhat tough but quite interesting to work with.

I had been running an ancient FF version (45.0) until recently, as I had too much to lose (and migrate) in terms of customizations if I decided to upgrade. Besides, I loved the single-process elegance of FF, amidst the endless "multiprocess Chrome is eating up my RAM!" complaints from the "Chromians" all around. I even downloaded a Nightly several months ago, but did not proceed to install it as it would simply involve too much hassle. Meanwhile, needless to say, I was continually being bombarded with sites howling "WTF is your browser? It's stone-age!" (in more civilized jargon, of course).

About a month ago I finally gave in, and installed the old nightly just to get the hang of the new FF. I must say I wasn't disappointed—in fact, I was a bit impressed. The multiprocess version didn't seem to be as bad as Chrome in terms of memory footprints (although I had to keep on restarting the browser every week; possibly due to some memory leaks introduced by my customizations?), and the addons too had matured to be e10s compatible. All was going fine...

... until I tried to reload the Gmail mobile page that I just visited, in offline mode.

I was baffled when, instead of the cached page, I was smacked with an "Offline Mode" error message.

What the... has FF stopped caching pages?

Nope, some pages still get loaded perfectly under offline mode.

Then where's the problem?

Maybe Gmail has set some brand-new cache-prevention header, right by the time I was busy setting up my new browser?

Luckily I had left my old browser intact; and no, it continued to cache the same page just fine.

Maybe the actual response from mail.google.com would give a clue.

Well, that was it. Gmail had been sending an Expires: Mon, 01 Jan 1990 00:00:00 GMT header, and my dear old FF 45.0 seemed to have somehow been neglecting it all this time, hence unintentionally offering me the luxury of being able to view cached Gmail mobile pages all the way until the end of the current session.

Now that the "feature" was gone, I was basically doomed.

Worse still, the new "compliance" had rendered several other sites uncacheable, including Facebook, Twitter and even Google Search.

Of course you realize, this means war.

Reading a few MDN docs and browsing the FF Addons site, I soon realized that I was going to be all alone on this one. So I set forth, writing a response interceptor based on the Observer_Notifications framework, to strip the expiration-related headers off all responses before they got a chance to stop those responses from being admitted into the Cache2.

Cc["@mozilla.org/observer-service;1"].getService(Ci.nsIObserverService).addObserver({
	observe: function(aSubject, aTopic, aData) {
		var channel = aSubject.QueryInterface(Ci.nsIHttpChannel);
		channel.setResponseHeader("Expires", "", false);
		channel.setResponseHeader("expires", "", false);
		channel.setResponseHeader("cache-control", "", false);
		channel.setResponseHeader("Cache-Control", "", false);
		channel.setResponseHeader("pragma", "", false);
		channel.setResponseHeader("Pragma", "", false);
	}
}, "http-on-modify-request", false);

That's all. Just 11 lines of code, a copy-paste into the browser console (Ctrl+Shift+F12), and a gentle touch on the Enter key.

No, hit it down hard, because you're going to nail it, once and for all!

I registered the handler on the browser, with a handy KeyConfig shortcut to toggle it when required (with some help from my own ToggleService framework), and all was back to normal. In fact, it was better than normal: some sites that had been skipping the cache so far started submitting to my desires right away, and some self-destruct pages started to live across browser sessions, so I could restart the browser and enjoy viewing the Facebook, Gmail and other pages that usually kept disappearing from the cache after each restart.
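
By the way, if you don't have KeyConfig or ToggleService at hand, a bare-bones way to toggle the interceptor is to keep a reference to the observer object and register/unregister it yourself; here is a minimal sketch of the same idea (the names are my own):

var headerStripper = {
    observe: function(aSubject, aTopic, aData) {
        try {
            var channel = aSubject.QueryInterface(Ci.nsIHttpChannel);
            // header names are case-insensitive, so one call per header is enough
            ["Expires", "Cache-Control", "Pragma"].forEach(function(h) {
                try { channel.setResponseHeader(h, "", false); } catch (e) { /* header absent; ignore */ }
            });
        } catch (e) { /* not an HTTP channel; ignore */ }
    }
};
var obs = Cc["@mozilla.org/observer-service;1"].getService(Ci.nsIObserverService);

// enable
obs.addObserver(headerStripper, "http-on-examine-response", false);

// ...and later, to disable:
obs.removeObserver(headerStripper, "http-on-examine-response");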

All of it, thanks to the amazing extensibility and customizability of Firefox.

Beating the GAS clock: Say Hello to MemsheetApp!

Google's Apps Script framework is really awesome as it helps—newbies and experts alike—to leverage the power of Google (as well as external) services for their day-to-day gimmicks—and sometimes even for enterprise-level integration. SpreadsheetApp is one of its best-known features, which allows one to create and manage Google spreadsheet documents via simple JS calls.

As simple as it may seem, misuse of SpreadsheetApp can easily lead to execution timeouts and fast exhaustion of your daily execution time quota (which is quite precious, especially when you are on the free plan). This is because most of the SpreadsheetApp operations take a considerable time to complete (possibly because they internally boil down to Google API calls? IDK), often irrespective of the amount of data read or written in each call.

In several of my projects, where huge amounts of results had to be dumped into GSheets in this manner, I ran into an impassable time barrier: no matter how much I optimized, the scripts kept on shooting beyond the 5-minute time limit. I had to bring in-memory caching to the picture, first per row, then per logical row set and finally for the whole spreadsheet (at which point the delays virtually disappeared).

  // in-memory grid standing in for the target sheet range
  matrix = [];
  ...

      // accumulate each computed row into the grid, instead of writing it to the sheet immediately
      if (!matrix[row]) {
        matrix[row] = new Array(colCount);
      }
      for (k = 0; k < cols.length; k++) {
        matrix[row][k] = cols[k];
      }
  ...
  // a single setValues() call pushes the whole grid to the sheet at the end
  sheet.getRange(2, 2, rowCount, colCount).setValues(matrix);

Then, recently, I happened to run into a refactoring task on a GSheets script written by a different developer. This time it was a different story, as every cell was referenced by name:

  for (i = 0; i < data.length; i++){
    spreadsheet.getRange("A" + (i + 2)).setValue((i + 2) % data.length);
    spreadsheet.getRange("B" + (i + 2)).setValue(data[i].sum);
    ...
  }

And there were simply too many references to fix by hand, and too much runtime data to push through the default SpreadsheetApp calls without running into a timeout.

Then I had an idea: why couldn't I have an in-memory wrapper for SpreadsheetApp, which would give us the speed advantage without having to change the existing code?

So I wrote my own MemsheetApp that uses a simple 2-D in-memory array to mimic a spreadsheet, without writing-through every operation to the API.

One problem I faced was that there is no specific way (call or event) to "flush" the data accumulated in-memory while retaining compatibility with the SpreadsheetApp API. The best thing I could find was SpreadsheetApp.flush() which, in normal use, would flush data of all open spreadsheets. In my case I had to explicitly retain references to all MemsheetApp instances created through my app, and flush them all during the global MemsheetApp.flush() call.

So, here goes the MemsheetApp source (hopefully I'll make it a GitHub gist soon):

MemsheetApp = {
  list: [],
  create: function(_name) {
    var sheet = {
      // the real backing spreadsheet; getId() and flush() below rely on this being present
      sheet: SpreadsheetApp.create(_name),
      name: _name,
      rows: [],
      maxRow: 0,
      maxCol: 0,
      getId: function() {
        return this.sheet.getId();
      },
      getRange: function(col, row) {
        if (!row) {
          row = col.substring(1);
          col = col.substring(0, 1);
        }
        
        if (isNaN(row)) {
          throw new Error("Multicell ranges not supported unless separating col and row in separate parameters");
        }
        
        c = col;
        
        if (typeof col  === "string"){
          c = col.charCodeAt(0) - 65;
        
          // this supports 2 letters in col
          if (col.length > 1) {
            //"AB": 1 * (26) + 1 = 27 
            c = ( (c + 1) * ("Z".charCodeAt(0) - 64)) + (col.charCodeAt(1) - 65);
          }
        }
        
        if (this.maxCol < c) {
          this.maxCol = c;
        }
        r = parseInt(row) - 1;
        if (this.maxRow < r) {
          this.maxRow = r;
        }
        
        if (!this.rows[r]) {
          this.rows[r] = [];
        }
        if (!this.rows[r][c]) {
          this.rows[r][c] = 0;
        }
        
        return {
          rows: this.rows,
          getValue: function() {
            return this.rows[r][c];
          },
          setValue: function(value) {
            this.rows[r][c] = value;
          }
        }
      }
    };
    this.list.push(sheet);
    return sheet;
  },
  flush: function() {
    for (i in this.list) {
      l = this.list[i];
      rowDiff = l.rows.length - Object.keys(l.rows).length;
      if (rowDiff > 0) {
        // insert empty rows at missing row entries
        emptyRow = [];
        for (c = 0; c < l.rows[0].length; c++) {
          emptyRow.push("");
        }
        for (j = 0; j < l.rows.length && rowDiff > 0; j++) {
          if (!l.rows[j]) {
            l.rows[j] = emptyRow;
            rowDiff--;
          }
        }
      }

      l.sheet.getActiveSheet().getRange(1, 1, l.maxRow + 1, l.maxCol + 1).setValues(l.rows);
    }
  }
}

As you may notice, it offers an extremely trimmed-down version of the SpreadsheetApp API, currently supporting only getValue(), setValue() and setNumberFormat() methods of Range and create() and flush() of SpreadsheetApp. One could simply add new functionalities by creating implementations (or wrappers) for additional methods at appropriate places in the returned object hierarchy.

If you are hoping to utilize MemsheetApp in your own Apps Script project, the only extra thing you have to do is ensure that you call MemsheetApp.flush() once you are done inserting your data. This method is safe to call on the regular SpreadsheetApp module as well, which means that you can make your existing SpreadsheetApp-based code compatible with just one extra, harmless line of code.

However, the coolest thing is that you can switch between SpreadsheetApp and MemsheetApp once you have refactored the code accordingly:

SheetApp = MemsheetApp;
// uncomment next line to switch back to SpreadsheetApp
// SheetApp = SpreadsheetApp;

// "SpreadsheetApp" in implementation code has been replaced with "SheetApp"
var ss1 = SheetApp.create("book1").getActiveSheet();

ss1.getRange(2, 2, 10, 3).setNumberFormat(".00");
ss1.getRange("A2").setValue(10);
...

var ss2 = SheetApp.create("book2").getActiveSheet();

ss2.getRange(2, 1, 1000, 1).setNumberFormat("yyyy-MM-dd");
ss2.getRange(2, 2, 1000, 1).setNumberFormat(".0");

// assume "inputs" is a grid of data, with dates in first column
// and 1-decimal-place precision numbers in second column
inputs.forEach(function(value, index) {
    ss2.getRange("A" + (index + 1)).setValue(value[0]);
    ss2.getRange("B" + (index + 1)).setValue(value[1]);
});
...

// this will push cached data to "ss1" and "ss2", from respective in-memory grids;
// and will have a similar effect (flushing all pending changes) when SpreadsheetApp is in use
SheetApp.flush();

MemsheetApp is a long way from being a fully-fledged wrapper, so feel free to improve it as you see fit; and share it here or somewhere public for the benefit of the Apps Script community.

Stop pulling out your (JSON-ey) hair; just drag, drop and connect!

The app is finally taking shape.

Data is sitting in your datastore.

Users are about to start bombarding the front-end with requests.

Quite a familiar scenario for any standard web/mobile app developer.

You have approached the Big Question:

How to get the ball rolling?

How to transform user actions into actual backend datastore operations?

One (obvious) way would be to build an ORM, configure a persistence provider (such as Hibernate-JPA) and link the pieces together through an MVC-style contraption.

But what if you don't want all those bells and whistles?

Or, what if all you need is just a quick 'n dirty PoC to impress your client/team/boss, while you are struggling to get the real thing rolling?

Either way, what you need is the "glue" between the frontend and the data model; or "integration" in more techy jargon.

UltraESB-X, successor of the record-breaking UltraESB, is an ideal candidate for both your requirements. Being a standalone yet lean runtime—just 9 MB in size, and runnable with well below 100 MB of heap—you could easily deploy one in your own dev machine, prod server, cloud VM, Docker, or on IPS, the dedicated lifecycle manager for on-premise and (coming up) cloud deployments.

UltraESB-X logo

As if that wasn't enough, building your backend becomes a simple drag-and-drop game, with the cool UltraStudio IDE for integration project development. Just pick the pieces, wire them together under a set of integration flows—one per each of your workflows, with interleaving subflows where necessary—and have your entire backend tested, verified and ready for deployment within minutes.

UltraStudio logo

We have internally used UltraESB-X seamlessly with JPA/Hibernate, whose details we hope to publish soon—in fact, there's nothing much to publish, as it all just works out of the box, thanks to the Spring-driven Project-X engine powering the beast.

Project-X logo

That being said, all you need right now is that QnD solution to wow your boss, right?

That's where the JSON Data Service utility comes into play.

JSON data service

Tiny as it may seem, the JSON Data Service is a powerful REST-to-CRUD mapper. It simply maps incoming REST API requests into SQL, executing them against a configured database and returning the results as JSON. Exactly what you need for a quick PoC or demo of your app!

We have a simple yet detailed sample demonstrating how to use the mapper, but all in all it's just a matter of specifying a set of path-to-query mappings. The queries can utilize HTTP path and query parameters to obtain inputs. SQL column name aliases can be used to control what fields would be returned in the response. The HTTP method of the inbound request (GET, POST, PUT, DELETE) decides what type of operation (create, read, update, delete) would be invoked. Of course, you can achieve further customization (adding/modifying fields, transforming the result to a different format such as XML, as well as audit actions such as logging the request) by simply enhancing the integration flow before the response is returned to the caller.

For example, here are some REST API operations, with their corresponding JSON Data Service configurations (all of which could be merged into a single integration flow, to share aspects like authentication and rate limiting):

Assuming

  • a book API entity to be returned to the frontend:
    {
    	"name": "book_name",
    	"author": "author_name",
    	"category": "category_name"
    }
  • a BOOK table:
    (
    	ID SMALLINT,
    	NAME VARCHAR(25),
    	AUTHOR_ID SMALLINT,
    	CATEGORY VARCHAR(25)
    )
  • and an associated AUTHOR table:
    (
    	ID SMALLINT,
    	NAME VARCHAR(25)
    )

The following API endpoints:

  • GET /books?start={offset}&limit={count} : return all books, with pagination (not including author details)
  • GET /books/{id} : return a specific book by ID, with author details
  • GET /books/search?author={author} : return all books of a given author

could be set up with just the following data service configuration mapping (the rest of the steps being identical to those in our dedicated sample; just ensure that you maintain the order, and note the extra SINGLE: in front of the 2nd query):

  • key: /books/search?author={author:VARCHAR}
    value:
      SELECT B.NAME AS name, B.CATEGORY AS category
          FROM BOOK B, AUTHOR A
          WHERE B.AUTHOR_ID = A.ID AND A.NAME = :author

  • key: /books/{id:INTEGER}
    value:
      SINGLE: SELECT B.NAME AS name, A.NAME AS author, B.CATEGORY AS category
          FROM BOOK B, AUTHOR A
          WHERE B.AUTHOR_ID = A.ID AND B.ID = :id

  • key: /books?start={offset:INTEGER}&limit={count:INTEGER}
    value:
      SELECT NAME AS name, CATEGORY AS category
          FROM BOOK LIMIT :offset, :count
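
For instance, once the above mapping is in place, a request like GET /books/search?author=Jane%20Doe would run the first query and return a JSON array built from the aliased columns, roughly along the following lines (the author and book data are made up, and the exact wrapping may vary with your data service configuration):

[
    { "name": "My First Book", "category": "fiction" },
    { "name": "My Second Book", "category": "biography" }
]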

See? Anybody with a basic SQL knowledge can now set up a fairly complex REST API, without writing a single line of code, thanks to the JSON Data Service!

Project-X S01E01: Pilot

P.S.: Okay, while the Pilot is supposed to arouse curiosity and keep you on the edge of your seat for S01E02, I'm not sure how well I've done that, rereading what I have just finished writing. Anyway, see for yourself!

In February 2017, something happened. Something that has never been seen, and rarely been heard, ever before.

Project-X.

Project-X movie poster

No, not the movie! That was in 2012!

Engine. Messaging Engine.

Project-X logo

A lean (~9 MB), fast (yup, benchmarked) and reliable messaging engine.

But not just a messaging engine.

The original authors supposed it to be an Enterprise Service Bus (ESB). But, over time, it evolved.

Into a simple yet formidable integration middleware product.

Capable of connecting the dots: you, your customers, your partners, and the wide, wide world (WWW).

Project-X to the rescue!

Proven by powering the cores of many other integration solutions, including the famous B2B AS2 trading platform.

And, most importantly, something that can help you—and your company—tackle its integration challenges at a fraction of the time, effort and cost. Something that would allow you to draw your solution rather than coding or configuring it. Something that would make your PoC as good as your final draft, as they would essentially be the same.

Something that would make you enjoy integration, rather than hating it.

While you can always explore our official documentation to unravel how all this works (yup, I heard that yawn :)) the following couple of facts is all you need to know in order to grab Project-X by its horns:

Project-X deals with messages, or distinct and quantifiable bits of information (a clock tick, a HTTP request, an event/message over JMS/Kafka, a file dropped into your SFTP folder, a neutron emitted from a U-235, an interplanetary collision,... you name it).

U-235 fission

The beauty is the fact that almost every bit of interaction in an integration (connecting-the-dots) scenario can be mapped into a message. Care to see some examples?

  • "A consumer is calling your API" maps into "a HTTP request"
  • "Your customer just bought 3 techno-thrillers from your ebook store" could map to "an event over Kafka"
  • "Your partner just sent you an invoice" could map into "a file dropped into your SFTP folder"

Being the expert on messages, Project-X can take up the rest.

Project-X magical expertise

Project-X can consume messages from an insanely wide variety of sources. Ingress connectors bring in these messages into the Project-X engine. We already have connectors for all of the above examples, except maybe for the last (left as an exercise for the reader).

NIO HTTP ingress connector

Project-X can emit messages into an equally wide array of destinations over different media. Egress connectors are the ones that do this.

NIO HTTP egress connector

In between consumption and emission, a message can go through all sorts of weird experiences, which could consist of transformation, extraction, enrichment, conditional branching, looping, split-aggregation, throttling, exceptions,... and a lot more of other unimaginable stuff. Processing elements perform all this magic.

header conditions evaluator

(By the way, this document deals with all the gruesome details, in case you are interested.)

An ingress connector, a chain of processing elements and an egress connector, together make up an integration flow that represents everything that a message is destined to go through, given that it is sucked in by the ingress connector. A simple analogy is a conveyor belt on a production line, where something comes in and something (similar or different) comes out. Forget not, however, that depending on your requirement, an integration flow can be made as complex, rich and flexible as you like; your imagination being the only limit.

integration flow using the above components

Related integration flows are often bundled into a single integration project (yup, we love that word "integration"), which can be deployed as a single unit of functionality in a Project-X engine such as UltraESB-X.

integration architecture

Project-X keeps track of each and every message flying through the integration flows in every single project that is deployed in its runtime, and fully manages the lifecycle of each message (including allocating its required resources, routing it to the correct connector, pumping it through the flow, handling failures, gathering metrics, and cleaning up after its death/completion).

On top of all this, Project-X has its own bag of tricks, goodies and other cool stuff:

For you, the tech-savvy:

zero-copy proxying

For you, developers and architects:

  • A five-minute prototyping platform for your next integration project, whose thin-slice PoC outcome would eventually evolve into a full-blown production deployment
  • An intuitive, DIY integration surface with the familiar notions of messages, events, connectors and flows, instead of painful XML configurations
  • A warehouse of ready-made connectors and processing elements to choose from
  • A simple and flexible way to create your own connectors and processors, and share/reuse them across projects (so you never have to reinvent the wheel across your colleagues, teams or departments... or even companies)
  • A super-cool IDE where you can graphically compose your integration solution (drag-and-drop, set-up and wire-together) and test, trace and debug it right away

UltraStudio integration flow message tracing

For you, deployers and sysadmins:

  • A pure Java-based runtime (no native libs, OSGi, voodoo or animal sacrifices)
  • Pluggable, REST-based, secure management API for remote administration
  • Pluggable analytics via Elasticsearch metrics collectors
  • Ready-made runtime bundled with the IDE for seamless dev-test cycles
  • Quick-and-dirty testing with ready-made Docker images: choose between slim and rich
  • A production bundle complete with a daemon (service-compatible) deployment, and in-built statistics and management servers
  • A tooling distribution for management operations: a CLI for engine governance, and ZooKeeper client/server bundles for clustering
  • A fully-fledged management console for tracking, monitoring and managing your deployment, complete with statistics, alerting and role-based access control
  • A fully-fledged containerized deployment platform, that can run in-cloud or on-premise, for deploying and managing Project-X clusters with just a few mouse clicks (and keystrokes)

Project-X integration solution development cycle

To be continued...

(Keep in touch for S01E02!)

Wednesday, September 13, 2017

Apps Script Navigator: UrlFetchApp Empowered with Cookie Management, Re-login and Much More!

If you are a fan of Google Apps Script, you would definitely have played around with its UrlFetchApp module, which offers a convenient HTTP(S) client interface. As a frequent user of the module, although I admit that it is really cool, I always felt I needed a richer client, capable of handling stuff like automatic cookie management, proper redirection, and logout detection with automatic re-login.

Hence came into being Navigator, a wrapper for UrlFetchApp that incorporates some of the basic HTTP client operations into convenient wrapper methods while addressing some of the pain points of the original UrlFetchApp module.

Unfortunately, due to a bug in the Apps Script framework, Navigator is not yet fully compatible with the Apps Script editor's autocompletion feature, so for now we will have to depend on the comments in the source as documentation. Here is a summary of the features and utilities of the module:

A Navigator can be constructed using:

/**
 * a Navigator
 * invocation: Navigator.navigator(baseUrl)
 * returns a new Navigator object based on the given base URL (protocol://host:port/resource/path).
 * @class 
 * @param {baseUrl} default base URL for outbound requests
 * @implements {Navigator}
 * @return {Navigator} a new Navigator
 */
function Navigator(baseUrl)

The Navigator object currently supports the following methods:

/**
 * executes a GET request
 * @param {path} the destination path (relative or absolute)
 * @return the response payload
 */
Navigator.prototype.doGet

/**
 * executes a POST request
 * @param {path} the destination path (relative or absolute)
 * @param {payload} the payload (will be {@link UrlFetchApp}-escaped unless a String) to be sent with the
 *                  request; i.e. sent verbatim in case it is a string, or with escaping otherwise
 * @param {headers} an array of key-value pair headers to be sent with the request
 * @return the response payload
 */
Navigator.prototype.doPost

/**
 * executes an arbitrary request in {@link UrlFetchApp} style, for cases where you want to directly
 * manipulate certain options being passed to UrlFetchApp.fetch. However this still provides the
 * built-in enhancements of Navigator such as automatic cookie management.
 * @param {path} the destination path (relative or absolute)
 * @param {options} a {@link UrlFetchApp}-compatible options object
 * @return the response payload
 */
Navigator.prototype.sendRequest
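
None of the usage snippets further below happen to use sendRequest, so here is a rough sketch of how a call could look; the path, headers and options here are of my own choosing, and the options object is simply whatever you would normally hand to UrlFetchApp.fetch():

// n is a Navigator; fetch a (hypothetical) JSON resource with UrlFetchApp-style options
var body = n.sendRequest("reports/latest", {
	method: "get",
	headers: {"Accept": "application/json"},
	muteHttpExceptions: true
});
Logger.log(body);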

The following configurator methods decide the behaviour of various features of Navigator:

/**
 * if set, cookies will be saved in {@link PropertiesService.getScriptProperties()}
 * @param {saveCookies} true if properties should be saved
 */
Navigator.prototype.setSaveCookies

/**
 * if saveCookies is set, decides the base username for saving cookies in the properties store (key {cookieusername}_cookie_{cookiename})
 * @param {cookieUsername} base username for cookies
 */
Navigator.prototype.setCookieUsername
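
(For example, with setCookieUsername("jane@example.com"), a cookie named SESSIONID would end up under the script property key jane@example.com_cookie_SESSIONID; the address here is just a made-up illustration.)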

/**
 * updates the local cookie cache with cookies received from a request, and returns the computed 'Cookie' header
 * @param {cookie} the current 'Cookie' header (string)
 * @param {rawCook} the cookie string ('Set-Cookie' header) received in a request
 * @return the updated 'Cookie' header string
 */
Navigator.prototype.updateCookies

/**
 * sets an absolute (starting with protocol://) or relative path for login requests to base website
 * @param {loginPath} path for login requests
 */
Navigator.prototype.setLoginPath

/**
 * sets the payload to be submitted during login (for automatic relogin)
 * @param {loginPayload} the login request payload
 */
Navigator.prototype.setLoginPayload

/**
 * if set, an automatic relogin will be performed whenever this content fragment is encountered in the response body
 * @param {logoutIndicator} content indicating a logout, for attempting relogin
 */
Navigator.prototype.setLogoutIndicator

/**
 * if set, when an automatic login is executed during a URL request, the original request will be replayed after login
 * @param {refetchOnLogin} true if refetch is required in case of a relogin
 */
Navigator.prototype.setRefetchOnLogin

/**
 * if set, logs would be generated for each request
 * @param {debug} true if request debug logging should be enabled
 */
Navigator.prototype.setDebug

The internal state of Navigator (such as the currently active cookies) can be obtained via the following methods:

/**
 * returns current 'Cookie' header
 * @return current 'Cookie' header string
 */
Navigator.prototype.getCookies

/**
 * returns headers received in the last navigation
 * @return headers from the last navigations
 */
Navigator.prototype.getLastHeaders

Navigator also provides some handy utility functions for extracting content from navigated pages, including those in the vicinity of HTML tags:

/**
 * similar to {@link extract} but is specialized for extracting form field values ("value" attributes)
 * @param {body} the HTML payload string
 * @param {locator} locator of the form field to be extracted (appearing before value)
 * @return value of the form field
 */
function getFormParam(body, locator)

/**
 * extracts a given tag attribute from a HTML payload based on a given locator; assumes locator appears before the attribute
 * @param {body} the HTML payload string
 * @param {key} key of the tag attribute
 * @param {locator} locator of the form field to be extracted (appearing before key)
 * @return value of the form field
 */
function extract(body, key, locator)

/**
 * similar to {@link extract} but performs a reverse match (for cases where the locator appears after the attribute)
 * @param {body} the HTML payload string
 * @param {key} key of the tag attribute
 * @param {locator} locator of the form field to be extracted (appearing after key)
 * @return value of the form field
 */
function extractReverse(body, key, locator)

Here are a few snippets utilizing the above features. I have masked the actual URLs and some of the parameters being passed, but you could get the hang of Navigator's usage.

Setting up automatic login and cookie saving:

var nav = new Navigator.Navigator("http://www.example.com");
nav.setSaveCookies(true);
nav.setCookieUsername(email);

// login form at http://www.example.com/login
nav.setLoginPath("login");

// will output all URLs, headers, cookies and payloads via Logger
nav.setDebug(true);

// for static login forms, this payload will be submitted during automatic log-in (if enabled)
nav.setLoginPayload({
	email: email,
	password: password
});

// only logged-out pages contain this; if we see this, we know we're logged out
nav.setLogoutIndicator("password_reset");

// if you request /home, and Navigator happens to find itself logged out and logs in again,
// setting this to true will make Navigator re-fetch /home right after the re-login
nav.setRefetchOnLogin(true);

// try #1; will automatically log in if stored cookies are already expired
var str = nav.doGet("daily");
if (str.indexOf("Get daily prize now") > 0) {
	str = nav.doGet("daily"); // try #2
	if (str.indexOf("Get daily prize now") > 0) {
		// notify failure
	}
}

Extracting HTML and form parameters for composing complex payloads:

// n is a Navigator

/* this will encode the payload as a regular form submission, just like UrlFetchApp
   Content-Type has no significance; if you want to send JSON you have to
   JSON.stringify() the payload yourself and pass the resulting string as the payload */

r = n.doPost("submit_path", {
	name: "name",
	email: email,
	"X-CSRF-Token": Navigator.extractReverse(s, "content", 'itemprop="csrf-token"')
}, {
	"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
	"X-Requested-With": "XMLHttpRequest"
});

/* sometimes, due to issues like empty param values (https://issuetracker.google.com/issues/36762225)
   you will have to do the encoding on your own, in which case you can directly submit a string payload
   which will be passed to UrlFetchApp verbatim (with escaping disabled) */

var payload = "authenticity_token=" +
	encodeURIComponent(Navigator.getFormParam(s, 'name="authenticity_token"')) +
	"&timestamp=" + Navigator.getFormParam(s, 'name="timestamp"') +
	"&spinner=" + Navigator.getFormParam(s, "spinner") + "&" +
	Navigator.extract(s, "name", "js-form-login") + "=USERNAME&login=&" +
	Navigator.extract(s, "name", "js-form-password") + "=PASSWORD&password=";

s = n.doPost("user_sessions", payload, {
	"Origin": "https://www.example.com",
	"X-CSRF-Token": Navigator.extractReverse(s, "content", 'itemprop="csrf-token"'),
	"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
	"X-Requested-With": "XMLHttpRequest"
});

Obtaining headers and cookies retrieved in the last response:

Logger.log(n.getLastHeaders());
Logger.log(n.getCookies());

You can use Navigator in your own Apps Script project, by adding it as an external library (Project key: MjQYH5uJpvHxeouBGoyDzheZkF1ZLDPsS). The source is also available via this Apps Script project, in case you are interested in peeking at the innards, or want to customize it with your own gimmicks. Should you decide that you need more, which may also be useful for others (including me), please do share your idea here (along with the implementation, if you already have one) so that I can incorporate it into Navigator.

Happy scripting!