Sunday, November 19, 2017

Beating the GAS clock: Say Hello to MemsheetApp!

Google's Apps Script framework is really awesome as it helps—newbies and experts alike—to leverage the power of Google (as well as external) services for their day-to-day gimmicks—and sometimes even for enterprise-level integration. SpreadsheetApp is one of its best-known features, which allows one to create and manage Google spreadsheet documents via simple JS calls.

As simple as it may seem, misuse of SpreadsheetApp can easily lead to execution timeouts and fast exhaustion of your daily execution time quota (which is quite precious, especially when you are on the free plan). This is because most of the SpreadsheetApp operations take a considerable time to complee (possibly because they internally boil down to Google API calls? IDK) often irrespective of the amount of data read/written in each call.

In several of my projects, where huge amounts of results had to be dumped into GSheets in this manner, I ran into an impassable time barrier: no matter how much I optimized, the scripts kept on shooting beyond the 5-minute time limit. I had to bring in-memory caching to the picture, first per row, then per logical row set and finally for the whole spreadsheet (at which point the delays virtually disappeared).

  matrix = [];
  ...

      if (!matrix[row]) {
        matrix[row] = new Array(colCount);
      }
      for (k = 0; k < cols.length; k++) {
        matrix[row][k] = cols[k];
      }
  ...
  sheet.getRange(2, 2, rowCount, colCount).setValues(matrix);

Then, recently, I happened to run into a refactoring task on a GSheetd script written by a different developer. This time it was a different story, as every cell was referenced by name:

  for (i = 0; i < data.length; i++){
    spreadsheet.getRange("A" + (i + 2)).setValue((i + 2) % data.length);
    spreadsheet.getRange("B" + (i + 2)).setValue(data[i].sum);
    ...
  }

And there were simply too many references to fix by hand, and too many runtime data to utilize the default SpreadsheetApp calls without running into a timeout.

Then I had an idea, why can't I have an in-memory wrapper for SpreadsheetApp, which would give us the speed advantage without having to change existing code?

So I wrote my own MemsheetApp that uses a simple 2-D in-memory array to mimic a spreadsheet, without writing-through every operation to the API.

One problem I faced was that there is no specific way (call or event) to "flush" the data accumulated in-memory while retaining compatibility with the SpreadsheetApp API. The best thing I could find was SpreadsheetApp.flush() which, in normal use, would flush data of all open spreadsheets. In my case I had to explicitly retain references to all MemsheetApp instances created through my app, and flush them all during the global MemsheetApp.flush() call.

So, here goes the MemsheetApp source (hopefully I'll make it a GitHub gist soon):

MemsheetApp = {
  list: [],
  create: function(_name) {
    sheet = {
      //sheet: SpreadsheetApp.create(_name),
      name: _name,
      rows: [],
      maxRow: 0,
      maxCol: 0,
      getId: function() {
        return this.sheet.getId();
      },
      getRange: function(col, row) {
        if (!row) {
          row = col.substring(1);
          col = col.substring(0, 1);
        }
        
        if (isNaN(row)) {
          throw new Error("Multicell ranges not supported unless separating col and row in separate parameters");
        }
        
        c = col;
        
        if (typeof col  === "string"){
          c = col.charCodeAt(0) - 65;
        
          // this supports 2 letters in col
          if (col.length > 1) {
            //"AB": 1 * (26) + 1 = 27 
            c = ( (c + 1) * ("Z".charCodeAt(0) - 64)) + (col.charCodeAt(1) - 65);
          }
        }
        
        if (this.maxCol < c) {
          this.maxCol = c;
        }
        r = parseInt(row) - 1;
        if (this.maxRow < r) {
          this.maxRow = r;
        }
        
        if (!this.rows[r]) {
          this.rows[r] = [];
        }
        if (!this.rows[r][c]) {
          this.rows[r][c] = 0;
        }
        
        return {
          rows: this.rows,
          getValue: function() {
            return this.rows[r][c];
          },
          setValue: function(value) {
            this.rows[r][c] = value;
          }
        }
      }
    };
    this.list.push(sheet);
    return sheet;
  },
  flush: function() {
    for (i in this.list) {
      l = this.list[i];
      rowDiff = l.rows.length - Object.keys(l.rows).length;
      if (rowDiff > 0) {
        // insert empty rows at missing row entries
        emptyRow = [];
        for (c = 0; c < l.rows[0].length; c++) {
          emptyRow.push("");
        }
        for (j = 0; j < l.rows.length && rowDiff > 0; j++) {
          if (!l.rows[j]) {
            l.rows[j] = emptyRow;
            rowDiff--;
          }
        }
      }

      l.sheet.getActiveSheet().getRange(1, 1, l.maxRow + 1, l.maxCol + 1).setValues(l.rows);
    }
  }
}

As you may notice, it offers an extremely trimmed-down version of the SpreadsheetApp API, currently supporting only getValue(), setValue() and setNumberFormat() methods of Range and create() and flush() of SpreadsheetApp. One could simply add new functionalities by creating implementations (or wrappers) for additional methods at appropriate places in the returned object hierarchy.

If you are hoping to utilize MemsheetApp in your own Apps Script project, all you have to do extra is to ensure that you call MemsheetApp.flush() once you are done with inserting your data. This method is safe to call on regular SpreadsheetApp module as well, which means that you can convert your existing SpreadsheetApp-based code to be compatible with just one extra harmless line of code.

However, the coolest thing is that you can switch between SpreadsheetApp and MemsheetApp once you have refactored the code accordingly:

SheetApp = MemsheetApp;
// uncomment next line to switch back to SpreadsheetApp
// SheetApp = SpreadsheetApp;

// "SpreadsheetApp" in implementation code has been replaced with "SheetApp"
var ss1 = SheetApp.create("book1").getActiveSheet();

ss1.getRange(2, 2, 10, 3).setNumberFormat(".00");
ss1.getRange("A2").setValue(10);
...

var ss2 = SheetApp.create("book2").getActiveSheet();

ss2.getRange(2, 1, 1000, 1).setNumberFormat("yyyy-MM-dd");
ss2.getRange(2, 2, 1000, 1).setNumberFormat(".0");

// assume "inputs" is a grid of data, with dates in first column
// and 1-decimal-place precision numbers in second column
inputs.forEach(function(value, index) {
    ss2.getRange("A" + (index + 1)).setValue(value[0]);
    ss2.getRange("B" + (index + 1)).setValue(value[1]);
});
...

// this will push cached data to "ss1" and "ss2", from respective in-memory grids;
// and will have a similar effect (flushing all pending changes) when SpreadsheetApp is in use
SheetApp.flush();

MemsheetApp is a long way from being a fully-fledged wrapper, so feel free to improve it as you see fit; and share it here or somewhere public for the benefit of the Apps Script community.

No comments: