Saturday, November 7, 2015

A Word About Using External Web Resources

Today one can make a surprisingly attractive and responsive website with a couple hundred lines of code. This is really cool, but one should not forget that most of the hard work is done by the plethora of libraries (JS) and style definitions (CSS) that the developer (sometimes unconsciously) incorporates into the code.

Content delivery networks (CDNs) publicly serve static web resources (e.g. the jQuery and Bootstrap libraries) for use by other web sites. This has multiple advantages:

  • Users do not have to download the resources from scratch whenever they visit a new web site, as long as the resources remain cached in their browsers. (CDN-hosted resources are often deliberately served with long expiry times to support this.)
  • Resources are standardized across web sites, leading to an "unconscious agreement" among different developers regarding usage of external resources.

Unfortunately, many developers have grown accustomed to hosting their own copies of static web resources within their respective site domains, rather than including them from CDN sources. This is mainly done for convenience during development: developers avoid the bandwidth usage and latency of fetching CDN resources over the wire, and can stick to the same set of resource versions for all development work.

An even worse approach is seen on many other sites, including some highly recognized ones such as certain Google Apps domains, where the URL of the same resource is changed dynamically to force the browser to download a fresh copy, even if one is already present in its local cache. For example, the resource http://www.example.com/static/style.css may be requested as http://www.example.com/static/style.css?_=xxxxxx, where xxxxxx is something like a timestamp, a version number or a randomly generated string.

Visiting several such sites leaves the browser retaining multiple copies of the same resource in memory (or on disk), despite their content being identical, since it has to allow for the possibility that distinct URIs may correspond to distinct content at any time.
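The duplication follows from how HTTP caches are keyed: per the HTTP caching specification (RFC 7234), the primary cache key is the request method plus the full target URI, so a different query string means a different cache entry. The toy model below is a simplification for illustration, not how any particular browser stores its cache; the `ToyCache` class and the timestamps are made up. It stores one stylesheet under three cache-busting URIs:

```python
import hashlib


class ToyCache:
    """A deliberately simplified cache that, like an HTTP cache, uses the full
    request URI (query string included) as its lookup key."""

    def __init__(self):
        self.entries = {}  # URI -> response body

    def store(self, uri, body):
        self.entries[uri] = body

    def distinct_bodies(self):
        # Number of genuinely different payloads among the stored entries.
        return len({hashlib.sha256(b).hexdigest() for b in self.entries.values()})


css = b"body { margin: 0; }"
cache = ToyCache()
# The same stylesheet, fetched under three different cache-busting query strings.
for stamp in ("1446854400", "1446858000", "1446861600"):
    cache.store(f"http://www.example.com/static/style.css?_={stamp}", css)

print(len(cache.entries))       # 3 entries stored...
print(cache.distinct_bodies())  # ...but only 1 distinct payload
```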

Another trick used by websites is to mutate the URL path itself, rather than the query parameters, using hashes and other means. This is typically seen in resource URLs on sites like Facebook, Google Apps (Groups, Apps Script etc.), LinkedIn and eBay. For example, the URL shown above may be transformed into http://www.example.com/static/style-xxxxxx.css, where xxxxxx is some alphanumeric string (which may at times contain dashes and other symbols as well).
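How these names are generated is not stated here; one common scheme (assumed for illustration, not confirmed for any of the sites above) is to hash the file's content at build time and splice a short digest into the file name, so the name changes exactly when the content does. A minimal sketch, where `fingerprinted_name` and the 8-character SHA-256 prefix are illustrative choices:

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path, content, length=8):
    """Map 'static/style.css' to 'static/style-<digest>.css'.

    The digest is the first `length` hex characters of the SHA-256 of the
    content; both the hash function and the length are arbitrary choices.
    """
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:length]
    return str(p.with_name(f"{p.stem}-{digest}{p.suffix}"))


old = fingerprinted_name("static/style.css", b"body { margin: 0; }")
new = fingerprinted_name("static/style.css", b"body { margin: 1px; }")
print(old)
print(new)
print(old != new)  # any change to the content yields a different URL
```

To the browser, each such name is simply a different resource, so every new build adds another entry to the cache.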

Since all the above-mentioned malpractices increase latency and bandwidth usage on the user's side, some better design approaches would be to:

  • use public CDN resource URLs whenever possible
    ◦ While this may be disadvantageous from the developer's side (page loading slows down the moment you switch from local to remote resources), the impact can be minimized by devising a scheme that uses local resources during development and switches everything to remote on deployment.

  • avoid dynamic or parametric URIs (e.g. with timestamps) for supposedly static resources that are updated only infrequently
    ◦ HTTP provides mechanisms for propagating updates to the client side: expiration headers in responses (Expires, Cache-Control: max-age), and validators such as Last-Modified and ETag, which the browser sends back in conditional requests (If-Modified-Since, If-None-Match). Although these can be tricky to master, they can be quite efficient in terms of cache and bandwidth usage when it comes to static resources.

  • set up the expiration headers of your web server properly
    ◦ This is essential to avoid frequent cache-entry expirations that lead to redownloads of the same resource, straining both the client (browser) and the server.
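The first recommendation mentions switching from local to remote resources on deployment. One way to do that is to route every resource reference through a single lookup. This is a sketch only: the `RESOURCES` table, the `resource_url` helper and the `APP_ENV` variable are hypothetical names, and the CDN URLs are examples from the jQuery and Bootstrap CDNs that should be checked against each project's own documentation.

```python
import os

# Logical name -> (local development copy, public CDN copy).
# The local paths are hypothetical; the CDN URLs are examples to be verified.
RESOURCES = {
    "jquery": (
        "/static/vendor/jquery-1.11.3.min.js",
        "https://code.jquery.com/jquery-1.11.3.min.js",
    ),
    "bootstrap-css": (
        "/static/vendor/bootstrap-3.3.5.min.css",
        "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css",
    ),
}


def resource_url(name, env=None):
    """Return the local copy in development and the CDN copy otherwise.

    If `env` is not given, it is read from the (hypothetical) APP_ENV
    environment variable, defaulting to "production".
    """
    env = env or os.environ.get("APP_ENV", "production")
    local, cdn = RESOURCES[name]
    return local if env == "development" else cdn


print(resource_url("jquery", env="development"))  # /static/vendor/jquery-1.11.3.min.js
print(resource_url("jquery", env="production"))   # https://code.jquery.com/jquery-1.11.3.min.js
```

Page templates would then build their `<script>` and `<link>` tags from `resource_url(...)`, so moving between local and CDN copies touches one setting rather than every page. Pinning the same version in both columns keeps development and production consistent.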
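The last two recommendations can be illustrated with a framework-agnostic sketch of how a server might answer a request for a static file. It sends expiration headers so that a fresh cached copy is reused without any request at all, plus `Last-Modified` and `ETag` validators so that a stale copy can be revalidated with a body-less `304 Not Modified` instead of being downloaded again. The function name and signature are hypothetical, the one-year lifetime is an arbitrary example to be tuned to how often the resource really changes, and the conditional-request handling is simplified (for instance, weak `W/` ETags are not recognized).

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime

MAX_AGE = 365 * 24 * 60 * 60  # one year in seconds; an arbitrary example lifetime


def static_response(body, mtime, request_headers, now):
    """Return (status, headers, body) for a static resource.

    body: the file contents (bytes); mtime, now: POSIX timestamps in seconds;
    request_headers: dict of request header names to values.
    """
    etag = '"%s"' % hashlib.sha256(body).hexdigest()[:16]
    headers = {
        # HTTP/1.1 caches prefer Cache-Control max-age; Expires serves older clients.
        "Cache-Control": "public, max-age=%d" % MAX_AGE,
        "Expires": formatdate(now + MAX_AGE, usegmt=True),
        "Last-Modified": formatdate(int(mtime), usegmt=True),
        "ETag": etag,
    }

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        tags = [t.strip() for t in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""
        return 200, headers, body

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: fall through to a full response
        if since is not None and int(mtime) <= since:
            return 304, headers, b""

    return 200, headers, body


body = b"body { margin: 0; }"
t = 1446854400  # Sat, 07 Nov 2015 00:00:00 GMT
status, headers, _ = static_response(body, t, {}, now=t)
print(status, headers["Last-Modified"])  # 200 Sat, 07 Nov 2015 00:00:00 GMT
status, _, payload = static_response(body, t, {"If-None-Match": headers["ETag"]}, now=t)
print(status, payload)  # 304 b''
```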
