Wednesday, September 13, 2017

Apps Script Navigator: UrlFetchApp Empowered with Cookie Management, Re-login and Much More!

If you are a fan of Google Apps Script, you have probably played around with its UrlFetchApp module, which offers a convenient HTTP(S) client interface. As a frequent user of the module, I admit that it is really cool, but I have always felt the need for a richer client, capable of handling things like automatic cookie management, proper redirection, and logout detection with automatic re-login.

Hence Navigator came into being: a wrapper for UrlFetchApp that packages some of the basic HTTP client operations into convenient methods while addressing some of the pain points of the original UrlFetchApp module.

Unfortunately, due to a bug in the Apps Script framework, Navigator is not yet fully compatible with the Apps Script editor's autocompletion feature. For now, we will have to depend on the comments in the source as documentation. Here is a summary of the features and utilities of the module:

A Navigator can be constructed using:

/**
 * a Navigator
 * invocation: Navigator.navigator(baseUrl)
 * returns a new Navigator object based on the given base URL (protocol://host:port/resource/path).
 * @class 
 * @param {baseUrl} default base URL for outbound requests
 * @implements {Navigator}
 * @return {Navigator} a new Navigator
 */
function Navigator(baseUrl)

The Navigator object currently supports the following methods:

/**
 * executes a GET request
 * @param {path} the destination path (relative or absolute)
 * @return the response payload
 */
Navigator.prototype.doGet

/**
 * executes a POST request
 * @param {path} the destination path (relative or absolute)
 * @param {payload} the payload (will be {@link UrlFetchApp}-escaped unless a String) to be sent with the
 *                  request; i.e. sent verbatim in case it is a string, or with escaping otherwise
 * @param {headers} an array of key-value pair headers to be sent with the request
 * @return the response payload
 */
Navigator.prototype.doPost

/**
 * executes an arbitrary request in {@link UrlFetchApp} style, for cases where you want to directly
 * manipulate certain options being passed to UrlFetchApp.fetch. However this still provides the
 * built-in enhancements of Navigator such as automatic cookie management.
 * @param {path} the destination path (relative or absolute)
 * @param {options} a {@link UrlFetchApp}-compatible options object
 * @return the response payload
 */
Navigator.prototype.sendRequest
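
As a rough illustration of what "relative or absolute" means for the path parameter above, here is a minimal sketch of how a path could be resolved against the base URL. resolvePath is a hypothetical helper written for this post's explanation, not a function exposed by Navigator, and the library's own resolution rules may differ.

```javascript
// Hypothetical helper (an assumption, not part of Navigator's API): shows one
// way a "relative or absolute" path could be turned into a full URL.
function resolvePath(baseUrl, path) {
  // Absolute paths start with a scheme such as http:// or https://
  if (/^[a-z][a-z0-9+.\-]*:\/\//i.test(path)) {
    return path;
  }
  // Otherwise join base and path with exactly one slash between them
  return baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}
```

With a base URL of "http://www.example.com", the relative path "login" would resolve to "http://www.example.com/login", while a path that already starts with protocol:// would be used unchanged.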

The following configurator methods decide the behaviour of various features of Navigator:

/**
 * if set, cookies will be saved in {@link PropertiesService.getScriptProperties()}
 * @param {saveCookies} true if cookies should be saved
 */
Navigator.prototype.setSaveCookies

/**
 * if saveCookies is set, decides the base username for saving cookies in the properties store (key {cookieusername}_cookie_{cookiename})
 * @param {cookieUsername} base username for cookies
 */
Navigator.prototype.setCookieUsername

/**
 * updates the local cookie cache with cookies received from a request, and returns the computed 'Cookie' header
 * @param {cookie} the current 'Cookie' header (string)
 * @param {rawCook} the cookie string ('Set-Cookie' header) received in a request
 * @return the updated 'Cookie' header string
 */
Navigator.prototype.updateCookies

/**
 * sets an absolute (starting with protocol://) or relative path for login requests to base website
 * @param {loginPath} path for login requests
 */
Navigator.prototype.setLoginPath

/**
 * sets the payload to be submitted during login (for automatic relogin)
 * @param {loginPayload} the login request payload
 */
Navigator.prototype.setLoginPayload

/**
 * if set, an automatic relogin will be performed whenever this content fragment is encountered in the response body
 * @param {logoutIndicator} content indicating a logout, for attempting relogin
 */
Navigator.prototype.setLogoutIndicator

/**
 * if set, when an automatic login is executed during a URL request, the original request will be replayed after login
 * @param {refetchOnLogin} true if refetch is required in case of a relogin
 */
Navigator.prototype.setRefetchOnLogin

/**
 * if set, logs would be generated for each request
 * @param {debug} true if request debug logging should be enabled
 */
Navigator.prototype.setDebug
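
The login-related settings above (login path, login payload, logout indicator and refetch-on-login) work together. The following sketch shows roughly how such a flow could be wired up. It is not Navigator's actual code: fetchWithRelogin, the config object and the injected fetchFn(path, payload) are all hypothetical names, and returning the login response when refetching is disabled is an assumption on my part.

```javascript
// Hypothetical sketch of a relogin flow; `fetchFn(path, payload)` stands in
// for the real HTTP call and returns the response body as a string.
function fetchWithRelogin(fetchFn, config, path) {
  var body = fetchFn(path, null);

  // No indicator configured, or the page does not look logged out: done
  if (!config.logoutIndicator || body.indexOf(config.logoutIndicator) === -1) {
    return body;
  }

  // Logged out: submit the stored login payload to the login path
  var loginBody = fetchFn(config.loginPath, config.loginPayload);

  // Replay the original request only if refetchOnLogin is set
  // (assumption: otherwise the login response is returned)
  return config.refetchOnLogin ? fetchFn(path, null) : loginBody;
}
```

Passing the fetch function in as a parameter keeps the sketch independent of UrlFetchApp, so the decision logic can be tried out with a stub.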

The internal state of Navigator (such as the currently active cookies) can be obtained via the following methods:

/**
 * returns current 'Cookie' header
 * @return current 'Cookie' header string
 */
Navigator.prototype.getCookies

/**
 * returns headers received in the last navigation
 * @return headers from the last navigation
 */
Navigator.prototype.getLastHeaders
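
To make the cookie handling more concrete, here is a minimal sketch of how received 'Set-Cookie' values could be folded into a 'Cookie' header string, plus the property key format described for setCookieUsername. Both mergeCookies and cookiePropertyKey are hypothetical names used for illustration; the real updateCookies may handle attributes, expiry and edge cases differently.

```javascript
// Hypothetical sketch, not Navigator's actual code: merges received
// 'Set-Cookie' values into an existing 'Cookie' header string.
function mergeCookies(cookieHeader, setCookie) {
  var names = [];   // preserves first-seen order of cookie names
  var values = {};

  function store(pair) {
    var eq = pair.indexOf("=");
    if (eq <= 0) return;                       // skip empty or nameless fragments
    var name = pair.substring(0, eq).trim();
    if (!Object.prototype.hasOwnProperty.call(values, name)) names.push(name);
    values[name] = pair.substring(eq + 1).trim();
  }

  // The existing header looks like "a=1; b=2"
  (cookieHeader || "").split(";").forEach(store);

  // 'Set-Cookie' may be one string or an array of strings; only the first
  // "name=value" segment matters, the rest are attributes (Path, Expires, ...)
  var received = setCookie ? [].concat(setCookie) : [];
  received.forEach(function (raw) {
    store(raw.split(";")[0]);
  });

  return names.map(function (n) { return n + "=" + values[n]; }).join("; ");
}

// Key format taken from the setCookieUsername comment:
// {cookieusername}_cookie_{cookiename}
function cookiePropertyKey(cookieUsername, cookieName) {
  return cookieUsername + "_cookie_" + cookieName;
}
```

Note that this sketch never drops a cookie; a fuller version would also honour expiry and deletion.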

Navigator also provides some handy utility functions for extracting content from navigated pages, including those in the vicinity of HTML tags:

/**
 * similar to {@link extract} but is specialized for extracting form field values ("value" attributes)
 * @param {body} the HTML payload string
 * @param {locator} locator of the form field to be extracted (appearing before value)
 * @return value of the form field
 */
function getFormParam(body, locator)

/**
 * extracts a given tag attribute from a HTML payload based on a given locator; assumes locator appears before the attribute
 * @param {body} the HTML payload string
 * @param {key} key of the tag attribute
 * @param {locator} locator of the tag whose attribute is to be extracted (appearing before key)
 * @return value of the tag attribute
 */
function extract(body, key, locator)

/**
 * similar to {@link extract} but performs a reverse match (for cases where the locator appears after the attribute)
 * @param {body} the HTML payload string
 * @param {key} key of the tag attribute
 * @param {locator} locator of the tag whose attribute is to be extracted (appearing after key)
 * @return value of the tag attribute
 */
function extractReverse(body, key, locator)
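
The difference between the forward and reverse variants is easiest to see with a small re-implementation. The functions below are hypothetical sketches (the "Sketch" suffix marks them as such) that assume double-quoted attributes; the library's own matching rules may differ.

```javascript
// Hypothetical re-implementations for illustration only; they assume
// attributes are written as key="value" with double quotes.

// Reads the quoted value that follows the marker starting at `start`.
function readQuoted(body, start, marker) {
  if (start === -1) return null;
  var from = start + marker.length;
  var end = body.indexOf('"', from);
  return end === -1 ? null : body.substring(from, end);
}

// Returns the value of key="..." that first appears AFTER the locator.
function extractSketch(body, key, locator) {
  var at = body.indexOf(locator);
  if (at === -1) return null;
  var marker = key + '="';
  return readQuoted(body, body.indexOf(marker, at + locator.length), marker);
}

// Returns the value of key="..." that last appears BEFORE the locator.
function extractReverseSketch(body, key, locator) {
  var at = body.indexOf(locator);
  if (at === -1) return null;
  var marker = key + '="';
  return readQuoted(body, body.lastIndexOf(marker, at), marker);
}

// Form fields are the special case where the attribute is "value".
function getFormParamSketch(body, locator) {
  return extractSketch(body, "value", locator);
}
```

For a tag such as `<meta content="tok-9" itemprop="csrf-token">`, the locator `itemprop="csrf-token"` comes after the content attribute, which is why the reverse variant is the one to use there.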

Here are a few snippets utilizing the above features. I have masked the actual URLs and some of the parameters being passed, but you should still be able to get the hang of Navigator's usage.

Setting up automatic login and cookie saving:

var nav = new Navigator.Navigator("http://www.example.com");
nav.setSaveCookies(true);
nav.setCookieUsername(email);

// login form at http://www.example.com/login
nav.setLoginPath("login");

// will output all URLs, headers, cookies and payloads via Logger
nav.setDebug(true);

// for static login forms, this payload will be submitted during automatic log-in (if enabled)
nav.setLoginPayload({
	email: email,
	password: password
});

// only logged-out pages contain this; if we see this, we know we're logged out
nav.setLogoutIndicator("password_reset");

// if you request /home, and Navigator happens to find itself logged out and logs in again,
// setting this to true will make Navigator re-fetch /home right after the re-login
nav.setRefetchOnLogin(true);

// try #1; will automatically log in if stored cookies are already expired
var str = nav.doGet("daily");
if (str.indexOf("Get daily prize now") !== -1) {
	str = nav.doGet("daily"); // try #2
	if (str.indexOf("Get daily prize now") !== -1) {
		// notify failure
	}
}

Extracting HTML and form parameters for composing complex payloads:

// n is a Navigator; s holds the HTML body of a previously fetched page

/* this will encode the payload as a regular form submission, just like UrlFetchApp;
   Content-Type has no significance here, so if you want to send JSON you have to
   JSON.stringify() the payload yourself and pass the resulting string as the payload */

r = n.doPost("submit_path", {
	name: "name",
	email: email,
	"X-CSRF-Token": Navigator.extractReverse(s, "content", 'itemprop="csrf-token"')
}, {
	"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
	"X-Requested-With": "XMLHttpRequest"
});

/* sometimes, due to issues like empty param values (https://issuetracker.google.com/issues/36762225)
   you will have to do the encoding on your own, in which case you can directly submit a string payload
   which will be passed to UrlFetchApp verbatim (with escaping disabled) */

var payload = "authenticity_token=" +
	encodeURIComponent(Navigator.getFormParam(s, 'name="authenticity_token"')) +
	"&timestamp=" + Navigator.getFormParam(s, 'name="timestamp"') +
	"&spinner=" + Navigator.getFormParam(s, "spinner") + "&" +
	Navigator.extract(s, "name", "js-form-login") + "=USERNAME&login=&" +
	Navigator.extract(s, "name", "js-form-password") + "=PASSWORD&password=";

s = n.doPost("user_sessions", payload, {
	"Origin": "https://www.example.com",
	"X-CSRF-Token": Navigator.extractReverse(s, "content", 'itemprop="csrf-token"'),
	"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
	"X-Requested-With": "XMLHttpRequest"
});
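
If you would rather not concatenate the string payload by hand as above, a small helper can do the encoding while keeping parameters with empty values. encodeForm is a hypothetical helper of my own for this example, not something Navigator provides.

```javascript
// Hypothetical helper: encodes an object as application/x-www-form-urlencoded,
// keeping keys whose values are empty (emitted as "key=").
function encodeForm(params) {
  return Object.keys(params).map(function (key) {
    // Treat null/undefined as an empty value instead of dropping the key
    var value = params[key] == null ? "" : String(params[key]);
    return encodeURIComponent(key) + "=" + encodeURIComponent(value);
  }).join("&");
}
```

The resulting string can be passed to doPost as the payload, in which case it is sent verbatim as described above. Note that encodeURIComponent encodes spaces as %20 rather than +, which most servers accept for form submissions.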

Obtaining headers and cookies retrieved in the last response:

Logger.log(n.getLastHeaders());
Logger.log(n.getCookies());

You can use Navigator in your own Apps Script project by adding it as an external library (Project key: MjQYH5uJpvHxeouBGoyDzheZkF1ZLDPsS). The source is also available via this Apps Script project, in case you are interested in peeking at the innards or want to customize it with your own gimmicks. Should you need a feature that may also be useful for others (including me), please do share your idea here (along with the implementation, if you already have one) so that I can incorporate it into Navigator.

Happy scripting!

1 comment:

Dilip said...

We have a limit of around 100k read/write for properties. How does this solution scale for large number of users?