Saturday, November 7, 2015

A Word About Using External Web Resources

Today one can make a surprisingly attractive and responsive website with a couple hundred lines of code. This is really cool; but one should not forget that most of the hard work is being done by the plethora of libraries (JS) and style definitions (CSS) that the developer (sometimes unknowingly) incorporates into the code.

Content delivery networks (CDNs) serve static web resources (e.g. the jQuery and Bootstrap libraries) for public use in other web sites. This has multiple advantages:

  • Users do not have to download the resources from scratch whenever they visit a new web site, as long as the resources remain cached in their browsers. (CDN-based resources are often deliberately served with long expiry times to meet this requirement.)
  • Resources are standardized across web sites, leading to an "unconscious agreement" among different developers regarding the usage of external resources.

Unfortunately, many developers have fallen into the habit of hosting their own copies of static web resources within their respective site domains, rather than including them from CDN sources. This is mainly done for development convenience, as developers can cut down the bandwidth usage and latency of over-the-wire CDN resource fetches, and stick to the same set of resource versions for all development work.

An even worse approach is observed on many other sites, even some highly recognized ones like certain Google Apps domains, where the URL of the same resource is dynamically changed to force a redownload of a fresh copy, even if it is already present in the local browser cache. For example, the URI for the resource http://www.example.com/static/style.css may become http://www.example.com/static/style.css?_=xxxxxx where xxxxxx is something like a timestamp, a version number or a randomly generated string.

Visiting several such sites results in the browser having to retain multiple copies of the same resource in memory (or disk), despite their content being identical, since it has to honor the possibility that distinct URIs may correspond to distinct content at any time.

Another trick used by websites is the mutation of the URL itself, rather than the query parameters, using hashes and other means. This is typically seen among resource URLs used in sites like Facebook, Google Apps (Groups, Apps Script etc.), LinkedIn and eBay. For example, the URL we demonstrated above may be transformed into http://www.example.com/static/style-xxxxxx.css where xxxxxx may correspond to some alphanumeric string (which may, at times, contain dashes and other symbols as well).

As all the abovementioned malpractices lead to increased latency and bandwidth usage on the user's side, some better design approaches would be to:

  • use public CDN resource URLs whenever possible
  • While this may be disadvantageous during development (page loads slow down the moment you switch from local to remote resources), the impact can be minimized by devising a scheme that uses local resources during development and switches everything to the remote URLs on deployment.

  • avoid using dynamic or parametric URIs (e.g. with timestamps) for supposedly static resources that are updated only infrequently
  • HTTP provides techniques like expiration headers (Expires in responses, If-Modified-Since in requests, etc.) and ETags for propagating updates to the client side. Although these can be tricky to master, they can be quite efficient in terms of cache and bandwidth usage when it comes to static resources; a minimal sketch follows this list.

  • set up the expiration headers of your web server properly
  • This is essential for avoiding frequent cache entry expirations that force redownloads of the same resource, straining the client (browser) as well as the server.
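
To make the revalidation idea above concrete, here is a minimal Java sketch of a conditional request (not part of the original discussion; the URL is just an example, and the cached ETag and Last-Modified values are placeholders you would have stored from an earlier response):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ConditionalFetch {
	public static void main(String[] args) throws IOException {
		URL url = new URL("http://www.example.com/static/style.css");
		String cachedETag = "\"abc123\"";	// placeholder: ETag remembered from an earlier response
		long cachedLastModified = 1436000000000L;	// placeholder: Last-Modified remembered earlier (ms since epoch)

		HttpURLConnection conn = (HttpURLConnection) url.openConnection();
		// ask the server to send the body only if the resource has actually changed
		conn.setRequestProperty("If-None-Match", cachedETag);
		conn.setIfModifiedSince(cachedLastModified);

		if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
			System.out.println("304: cached copy is still valid; nothing was re-downloaded");
		} else {
			System.out.println("Fresh copy received; new ETag: " + conn.getHeaderField("ETag"));
		}
		conn.disconnect();
	}
}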

Know Your Limits: Check File Size Before You Download!

At times it may be necessary to check the size of a downloadable file before actually downloading it. Although many well-designed websites display file sizes alongside their download links, there are ample cases where there is no such indication. A careless mouse click may cost you a significant amount of data before you realize that the download cannot proceed under your current data quota, or that you have simply picked the wrong download link.

While browsers like Firefox can be configured to display a confirmation dialog stating the file size before the download starts, this is largely an illusion: modern browsers often start downloading the file in the background while the confirmation dialog is still in the foreground.

Fortunately, in most cases you can use an HTTP tool (like wget or curl) to check the size of the associated file in advance, before actually initiating the content download. This works not only for files, but for most other kinds of resources as well.

Here's how you can use wget to check the size of a download without initiating it. It uses the HTTP HEAD method to restrict the server response to headers, avoiding the actual payload (download content).

wget -S --method=HEAD -O - your_url_goes_here

-S asks wget to print the server response, while -O writes the output to a file, with - indicating that the output file is standard output (in our case, the terminal).

On a Linux machine with wget version 1.15, this would provide an output similar to what follows:

$ wget -S --method=HEAD -O - http://jflex.de/release/jflex-1.6.1.tar.gz
Resolving jflex.de (jflex.de)... 65.19.178.144
Connecting to jflex.de (jflex.de)|65.19.178.144|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Sat, 07 Nov 2015 16:05:43 GMT
  Server: Apache/2.4.7 (Ubuntu)
  Last-Modified: Sat, 11 Apr 2015 02:44:19 GMT
  ETag: "2e334f-51369db8566c0"
  Accept-Ranges: bytes
  Content-Length: 3027791
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: application/x-gzip
Length: 3027791 (2.9M) [application/x-gzip]
Remote file exists.

As seen above, the download size is indicated by the Content-Length header. Unless packet transmission errors occur during the download, this is roughly the amount of data that will be consumed (plus a small margin for headers and other lower-level synchronization and acknowledgement traffic).
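
If you would rather script the same check, something along these lines works in Java too (a minimal sketch using the standard HttpURLConnection class, reusing the JFlex URL from the example above):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class SizeCheck {
	public static void main(String[] args) throws IOException {
		URL url = new URL("http://jflex.de/release/jflex-1.6.1.tar.gz");
		HttpURLConnection conn = (HttpURLConnection) url.openConnection();
		conn.setRequestMethod("HEAD");	// headers only; the payload is never transferred
		long length = conn.getContentLengthLong();	// -1 if the server sent no Content-Length
		System.out.println("Status: " + conn.getResponseCode() + ", Content-Length: " + length);
		conn.disconnect();
	}
}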

Unfortunately some servers may not provide the Content-Length header, in which case the length would either appear as unspecified or not appear at all in the output. In such cases an attempt via a browser would produce the same result. As of now I haven't been able to find a workaround for this issue.

Sunday, November 1, 2015

A Digital Odyssey: Sailing the Firefox HTTP Cache

The cache is one great feature of Firefox which brings life to the concept of offline browsing, allowing you to browse the pages you have already visited without reconnecting to the Internet. It does this by saving (caching) the content when it is fetched for the first time, indexed under the corresponding resource URI, so that it can later be served directly from memory (RAM or disk) without downloading a fresh copy. (Well, other browsers do this as well; but in some, like Google Chrome, it's pretty hard for a mere mortal to use offline browsing, unlike in Firefox, IE or Opera.)

The cache gives you multiple advantages, the most prominent being that you do not have to hook up to the Internet whenever you want to visit a page, as long as the page (resource) has been visited previously within a reasonable time window. Even under normal (online) browsing, a resource would not be re-fetched if a valid (unexpired) copy already exists in the cache.

Traversing the Firefox cache is not a very common use case, but it has some interesting applications. One is full text search. Say, for instance, you want to find out that great article you looked up some day last week, but forgot to bookmark or save; you don't remember much, only that its body text (not the title or URL, unfortunately) contained the word "Android". Without going for a web search and skimming through dozens of irrelevant results in sequence, you can simply do a text search against the browser's cache which would return all articles containing the word (assuming it hasn't been cleared away by either an expiration or a crash). Sophisticated tools like CacheViewer are quite good at this.

However, if you wish to implement your own cache traversal piece of code (maybe for an add-on), you might easily get frustrated over your first few attempts. Unfortunately, Mozilla's detailed Firefox documentation does not seem to cover the matter adequately, especially after the cache management mechanisms were updated last year.

Cache traversal logic for Firefox is mostly asynchronous, implemented using the visitor design pattern. It requires that you obtain an instance of nsICacheStorageService via Components.classes["..."].getService() and invoke asyncVisitStorage on a diskCacheStorage or memoryCacheStorage retrieved through it, passing an instance of nsICacheStorageVisitor as a parameter. The instance should define an onCacheEntryInfo() event callback which is invoked whenever a cached resource entry is visited, and an onCacheEntryVisitCompleted() event callback that gets invoked when all visits have been completed.

onCacheEntryInfo() receives a parameter containing attributes of the resource visited (such as URI, size and last visited date) and a stream reference that can be used to read the resource content. Logic can be included to operate on these attributes on a per-resource basis (since the callbacks would be invoked independently). Normal JS tricks like closures can be used to accumulate such results, perhaps to be combined at the traversal completion event.

For example, the following snippet of code (adapted from my own "aggressive URL remapper for Firefox" here) will traverse the cache, looking for resources corresponding to partial URLs listed in f, and accumulate all matches into a separate array r to be processed when the traversal is completed:

var cacheService = Components.classes["@mozilla.org/netwerk/cache-storage-service;1"]
	.getService(Components.interfaces.nsICacheStorageService);	//note the "netwerk"!
var {LoadContextInfo} = Components.utils.import("resource://gre/modules/LoadContextInfo.jsm",{});
var cache = cacheService.diskCacheStorage(LoadContextInfo.default, true);
cache.asyncVisitStorage({
	f: [ /*list of URL segments to be searched*/ ],
	r: [ /*list of URL segments found*/ ],

	// this will run for each cache entry
	onCacheEntryInfo: function(entryInfo) {
		var url = entryInfo.key;
		for (var i in this.f) {
			if (url.indexOf(this.f[i]) > 0) {
				/* do your stuff */
				this.r.push(url);
			}
		}
	},

	// this will run when traversal is over
	onCacheEntryVisitCompleted: function() {
		var diff = this.f.length - this.r.length;
		if (diff > 0) {
			alert('Warning: ' + diff + ' URLs missing');
		}
		/* process this.r here */
	}
}, true);

Please note, however, that traversing a large cache may hang your browser for quite some time, as a huge number of operations may be carried out in the process. If your callback methods use up too much memory (e.g. by accumulating data in memory), it is even possible that the browser will crash altogether. So beware!

Some of the interface definitions used for composing the above code were obtained from the mozilla/newtab-dev GitHub repository.

Tuesday, October 27, 2015

Ending the Ad Regime: Removing Ads on Mobile Partner

If you're a mobile broadband (dongle) user like me, you'll probably have a relatively new version of Mobile Partner or a comparable application installed on your system. While the newer versions are really nice (they allow voice calls, USSD operations, direct top-ups and whatnot), providers like Mobitel and Dialog seem to plague their customized (dongle-embedded) versions with ads. Every time you connect to the Internet, the software interface gets updated with an ad, which can sometimes consume as much as 1 MB. While this may not seem like a lot, it certainly adds up if you have the habit of making and breaking your connection frequently.

Fortunately, getting rid of these annoying ads is also quite easy (at least for Mobitel users). Here I outline the method to disable ads on a typical Mobitel Broadband software installation (bundled with the Huawei E3131 dongle sold by Mobitel) on a Windows system:

  1. Open the Mobile Partner folder at the installation location. This would usually be C:\Program Files (x86)\Mobitel Broadband.
  2. Open SysSettings.xml using a text editor.
  3. Locate the following section at the end of the file:

       <webview_flash>
         <zone1>http://selfcare.mobitel.lk/MyAccount/linkfive.html</zone1>
         <zone4_1>http://selfcare.mobitel.lk/MyAccount/linkone.html</zone4_1>
         <zone4_2>http://selfcare.mobitel.lk/MyAccount/linktwo.html</zone4_2>
         <zone4_3>http://selfcare.mobitel.lk/MyAccount/linkthree.html</zone4_3>
         <zone4_4>http://selfcare.mobitel.lk/MyAccount/linkfour.html</zone4_4>
       </webview_flash>

     and comment out the contents of the <webview_flash> tag by adding <!-- and --> at the start and the end, so that it reads:

       <webview_flash>
         <!--<zone1>http://selfcare.mobitel.lk/MyAccount/linkfive.html</zone1>
         <zone4_1>http://selfcare.mobitel.lk/MyAccount/linkone.html</zone4_1>
         <zone4_2>http://selfcare.mobitel.lk/MyAccount/linktwo.html</zone4_2>
         <zone4_3>http://selfcare.mobitel.lk/MyAccount/linkthree.html</zone4_3>
         <zone4_4>http://selfcare.mobitel.lk/MyAccount/linkfour.html</zone4_4>-->
       </webview_flash>

  4. Save the file, and close and reopen Mobile Partner if it's already running.

After this change, the connection tab shows a blank space instead of an ad once you initiate the connection. If you don't like that, you can put any preferred URLs inside the <zone> tags instead of commenting them out altogether. For example, the following configuration would display the HTML file located at C:\connected.html in the software's connection tab when a data connection is made. (Don't forget to replace the \ characters in the path with /.):

  <webview_flash>
    <zone1>file:///C:/connected.html</zone1>
  </webview_flash>

Tuesday, October 13, 2015

Finally, Now You Can Root Your HTC Desire 820s Dual SIM!

The HTC Desire series appears to be notoriously resistant to rooting. Rooting techniques that work for one device are often incompatible with most others. For example, while there are ample successful techniques for rooting the 820, none of them work for the 820s Dual SIM, despite all the similarities between the two models.

Kingo Android Root was the only tool I came across that was capable of rooting the 820s Dual SIM. Unfortunately, as with most other rooting and hacking tools, it's available only for Windows, so the Linux geeks would have to seek help from the Windows guys.

Using the software is quite straightforward: make sure that you're connected to the internet, connect your phone to the computer via a USB cable (with USB Debugging enabled) and launch the tool. It will automatically detect the device model and give you a confirmation message. Just click the "ROOT" button, and the tool will start working, downloading the necessary dependencies. The rooting process should complete within a few minutes, at the end displaying a final confirmation message.

As the tool uses an on-the-fly custom dependency download, I was unable to track down the actual rooting method it had exploited. However it should probably be straightforward if you can check the temporary folders of the machine at the right time.

Unfortunately the superuser management app installed by the tool seems to be a bit crappy; some applications like ES File Explorer seem to be unable to access root privileges via it. However, the su binary installed by the tool works fine, so you can still access root privileges via the ADB shell or a terminal emulator app.

Sunday, June 21, 2015

Nobody would cache your torrent? Not to worry! | Torrent Web Seeding

Have you ever had trouble downloading large torrent files? It may be that the download is too slow, or that the network you are on (e.g. a school or company wireless network) has blocked torrent downloads. Most of the free file caching services out there don't accept torrents larger than 1 GB, unless the torrent contains multiple small (< 1 GB) files that can be downloaded over several attempts. Even those that offer 'unlimited' sizes have certain limitations on the contents of the torrent, such as hive.im, which only accepts torrents with multimedia content.

I recently had to download a 3.9 GB torrent containing a software package and, no matter how thoroughly I searched, I could not find a torrent caching service that would accept it.

Finally I came across the BitTorrent protocol page and started reading it, hoping to find a clue for writing some kind of mechanism that would bypass the university network limitations (maybe by port forwarding via a cloud-hosted app). There I learned about the concept of web seeding, where some distributors expose directly downloadable files as torrent seeds, in the hope of reducing the load on the original file server as more clients download the torrent and become its seeds.

Just out of curiosity, I loaded the torrent to a torrent client on my regular mobile data connection, allowed the peer discovery to proceed, and checked the seeds. Guess what? There were just two pure web seed URLs! I gave one URL to wget, and got a resumable direct download that completed in a little over an hour on an average 3-4 Mbps wireless network!

So, next time, before you go around complaining that your torrent is too big or too slow for downloading, just have a look at the torrent's seeds; if you have a web seed, you're most probably in luck!

(This, unfortunately, won't work for multimedia torrents most of the time, as they are generally first seeded by generous individuals who don't own web servers to host such files directly; but give it a try anyway before going for expensive alternatives, just in case!)

Thursday, April 23, 2015

Discover the Hidden Face of Your Android!

If you have an Android device with a customized OS or ROM, your UI would be significantly different from what a genuine Android version would offer. However, unless the developers of your device are clever enough to get rid of all the old stuff, you can easily get an idea of what your device had to offer before it was customized, and maybe even gain access to some useful apps or functions that have been disabled in the customized version.

You will have to write a small Android app, just a few lines of code. Create a standard Android app project with a main activity, and include the following code in the onCreate() method (or in a UI event handler, if you like to have some fancy UI):

	// create an implicit intent for the MAIN action, with no component or category restrictions
	Intent mainIntent = new Intent("android.intent.action.MAIN");
	startActivity(mainIntent);

When the code runs, an app picker (intent chooser dialog) will come up, showing all activities on your phone that are registered for the MAIN intent action. You will most probably see a lot of options you have never seen before; they are registered for MAIN but do not belong to the LAUNCHER category, so they never show up in the app drawer and were, well, not launchable.
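
If you would rather list the matching activities programmatically instead of eyeballing the chooser, a sketch along the following lines should also work (this is an addition of mine, not from the original trick; it assumes the method lives inside the same Activity, and the log tag is arbitrary):

	// imports assumed: android.content.Intent, android.content.pm.ResolveInfo, android.util.Log, java.util.List
	private void listMainActivities() {
		Intent mainIntent = new Intent("android.intent.action.MAIN");
		// every activity that declares an intent filter for the MAIN action, launchable or not
		List<ResolveInfo> activities = getPackageManager().queryIntentActivities(mainIntent, 0);
		for (ResolveInfo info : activities) {
			Log.i("HiddenActivities", info.activityInfo.packageName + "/" + info.activityInfo.name);
		}
	}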

In my case (a Mlais MX28 with a factory-installed modded ROM), using this trick, I discovered some totally unexpected, quite useful stuff:

  • a logger app (MTKLogger) that allows generation and saving of mobile, modem and network logs (including TCP dumps); I had seen this app earlier (although it didn't appear in the app list), but it was not launchable, even via the OS app manager.
  • a built-in testing tool that allows direct access to the phone's Wi-Fi API functions and configuration, battery information and network details including a ping test
  • a viewer showing SIM information for each of my SIMs (although the device is dual SIM, there was no built-in facility to check their individual IMEI numbers)
  • the genuine About Phone page that had been replaced with a custom version; I had been missing the famous JellyBean splitter ever since I got my hands on my phone, simply because there was no Android version entry to tap 7 times in the About Phone section

I came across a load of other activities, many of which didn't show anything, and some of which crashed on launch. There were also some installer and configuration actions, which had probably been left behind by the phone's modding process.

Ironically, there were also NFC and Android Beam pages, although my device has no such capabilities; maybe the developer just left the activities behind and only removed the references from the modded Settings page.

Remember that all these things came up for just the MAIN intent; I'm dying to find out what the load of other Android intents—official and unofficial, documented and undocumented—would bring up. It should be quite simple to find out; just a change in the intent name string literal in the above code.

So, go ahead and try it out if you have a customized phone (most of you would, since almost every smartphone manufacturer offers a proprietary UI these days); your phone too may have some nice surprises in store for you!

Sunday, March 15, 2015

The Java NIO WatchService: Pitfalls to watchOut() for

My intention here is not to delve into the API or internals of the Java NIO watch service; in a nutshell, it allows an application to register for filesystem event notifications on a set of arbitrary paths. This is quite useful in text editors, filesystem monitors and other applications which require immediate notifications without incurring the overhead of repetitively polling the file tree. You can find good explanations of it on the internet, such as this article, accompanied by a plethora of examples and tutorials.

My article is dedicated solely to explaining the various pitfalls I came across when writing my first WatchService-based application—an event-driven file transport for the UltraESB, the best open source ESB ever.

If you are familiar with inotify watches on Linux, you won't need any further introduction to the NIO watch API. Although I have not used inotify directly, judging by the content of inotify's man page I would say that the NIO watch service, on Linux, is just a simplified wrapper around inotify. The architecture is quite identical, with the WatchService corresponding to the inotify instance, the WatchKey corresponding to the watch descriptor and the WatchEvent corresponding to the inotify_event entity; and so are the processes of watch registration and retrieval of events (blocking and non-blocking). NIO seems to have introduced the concept of queueing events under WatchKeys and requiring the user to reset() the WatchKey before it can gather further events, but almost everything else is quite the same as in inotify.
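
For reference in the discussion below, here is roughly what the basic setup looks like (a minimal sketch of my own, not taken from the UltraESB transport; the watched path is arbitrary and assumed to exist):

import java.nio.file.*;
import static java.nio.file.StandardWatchEventKinds.*;

public class WatchSetup {
	public static void main(String[] args) throws Exception {
		// the WatchService plays the role of the inotify instance
		WatchService watcher = FileSystems.getDefault().newWatchService();

		// registering a directory is the analogue of inotify_add_watch();
		// the returned WatchKey corresponds to the watch descriptor
		Path dir = Paths.get("/tmp/watched");
		WatchKey key = dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);

		System.out.println("Watching " + key.watchable());
		watcher.close();	// a WatchService is an I/O resource; close it when done
	}
}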

First, it should be noted that in some systems (such as Mac OS X) the watch service may fall back internally to a polling approach, if notifications are not natively supported by the underlying OS. It is also known that the watch service has certain incompatibilities with Windows-based operating systems. From what I have read so far, *NIX environments are the ones which can reap the true benefits of the watch service.

But that's just the beginning.

This extract from the inotify manual itself, titled Limitations and caveats, gives us an idea of how careful we should be when using inotify-based implementations:


Inotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional watches must be created. This can take a significant amount of time for large directory trees.

The inotify API provides no information about the user or process that triggered the inotify event. In particular, there is no easy way for a process that is monitoring events via inotify to distinguish events that it triggers itself from those that are triggered by other processes.

Note that the event queue can overflow. In this case, events are lost. Robust applications should handle the possibility of lost events gracefully.

The inotify API identifies affected files by filename. However, by the time an application processes an inotify event, the filename may already have been deleted or renamed.

If monitoring an entire directory subtree, and a new subdirectory is created in that tree, be aware that by the time you create a watch for the new subdirectory, new files may already have been created in the subdirectory. Therefore, you may want to scan the contents of the subdirectory immediately after adding the watch.


On most systems, the number of WatchService instances you can open is limited. Exceeding this limit can sometimes produce vague errors like "network BIOS command limit reached". However, you'll almost always work with a single WatchService per application, so this should rarely become an actual problem. Nevertheless, keep in mind the fact that the WatchService is another I/O resource (just like a Stream or Channel) and make sure that you invoke close() on it after use, to avoid resource exhaustion.

In addition to the basic watch event types (ENTRY_CREATE, ENTRY_MODIFY and ENTRY_DELETE), there's a fourth event type, OVERFLOW. This corresponds to the case where the event queue for a given WatchKey has overflowed, causing the loss of events as explained in the inotify manual. For example, under default Linux settings, a WatchKey will produce an OVERFLOW if its queued event count exceeds 512. Although this limit is configurable, it should be remembered that an overflow can never be entirely ruled out, and should be taken into consideration in all mission-critical implementations.
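
A typical event-draining loop therefore has to handle OVERFLOW explicitly, and reset() each key after processing its batch. A rough sketch (reusing the watcher and static imports from the earlier snippet, and assuming it runs inside a method that may throw InterruptedException):

while (true) {
	WatchKey key = watcher.take();	// blocks until at least one event is queued
	for (WatchEvent<?> event : key.pollEvents()) {
		if (event.kind() == OVERFLOW) {
			// events were lost; fall back to rescanning the watched directory
			continue;
		}
		Path relative = (Path) event.context();	// path relative to the watched directory
		System.out.println(event.kind() + ": " + relative);
	}
	// the key queues no further events until it is reset
	if (!key.reset()) {
		break;	// key invalidated, e.g. the watched directory was deleted
	}
}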

The Path (more correctly, the Watchable instance) associated with a WatchKey can be retrieved via its watchable() method. However, watchable() is bound to the system representation of the directory (e.g. the inode on Linux), and not to the actual (literal/textual) path. Hence, even if the name of a directory in the path changes, the value of watchable() remains at the old value, eventually leading to failure when we try to use it as a valid Path. It is therefore necessary to constantly keep track of changes in directory paths and update the local 'image' of the filesystem accordingly. Fortunately, a folder rename or move (which would invalidate the corresponding watchable() value) is always accompanied by an ENTRY_DELETE event for that directory, so we can quickly identify it and take action before things get messed up.

Events are not guaranteed to arrive in any particular order. For example, say you delete a directory which contains a subdirectory and is being watched by watch service W, and which itself sits inside another directory also watched by W (so there is a 3-level directory hierarchy). The deletion of the middle directory may be notified before that of the inner directory, although logically they should have happened in the opposite order. To make things worse, if the innermost directory was also being watched, it might generate a bogus notification (i.e. one containing zero events) during deletion.

Every operation is decomposed into the 3 basic event types. For example, renaming a file or directory inside a watched parent directory will trigger an ENTRY_DELETE (oops, that file is gone!) followed by an ENTRY_CREATE (hey, a new file is here!). This can be thought of as a new way of looking at the renaming process (deletion followed by creation), but it does not reflect the actual operation (inode metadata update) that would be happening behind the scenes.

A stranger (albeit perfectly valid) scenario occurs when a file or directory is moved into or out of a watched directory. As only the outermost directory entry gets changed 'effectively', you only get a notification for the outermost entry that got moved. Now, if there were any watched subdirectories inside the moved directory, they would still be 'active' but their watchable()s would no longer be valid; from that point onwards, only the relative paths provided by their watch events (via context()) would be valid, and you will have to manually prepend the new parent path to get the correct full path.
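
In code, that boils down to never trusting key.watchable() after a move, and resolving each event's relative context() against whatever you currently believe the parent path to be. A tiny sketch (keyToDir is a hypothetical Map<WatchKey, Path> that you would have to maintain yourself as renames and moves are detected):

Path parent = keyToDir.get(key);	// NOT key.watchable(), which may be stale after a move
Path relative = (Path) event.context();	// e.g. "h.log"
Path fullPath = parent.resolve(relative);	// correct full path even after the parent was moved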

Enough talking; let's demonstrate all this Greek with a simple directory hierarchy!

a
+--b
|  +--d
|  |  +--g.txt
|  |
|  +--e
|     +--h.log
+--c
   +--f
      +--i.sh

Assume that all directories except f are being watched. See if you can deduce what is going on behind the scenes.

Operation                                     WatchKey's       WatchEvent's     WatchEvent's
                                              watchable()      kind()           context()
---------------------------------------------------------------------------------------------
delete c                                      a                ENTRY_DELETE     c
                                              c                ENTRY_DELETE     f
                                              (f deleted silently; order of the other events is unpredictable)

move c into b                                 a                ENTRY_DELETE     c
                                              b                ENTRY_CREATE     c

move e into f (not included here),            c                ENTRY_DELETE     f
then delete f                                 e                ENTRY_DELETE     h.log
                                              (f deleted silently; event order unpredictable)

rename c as j                                 a                ENTRY_DELETE     c
                                              a                ENTRY_CREATE     j

delete d                                      d                ENTRY_DELETE     g.txt
                                              b                ENTRY_DELETE     d

Moral of the story? Be careful and thoughtful when using the WatchService API. It's a golden sword that can cut you badly if handled in the wrong way.

JS Obfuscation (Part 2) - Countermeasures

Despite how cryptic JS obfuscation may sound, deobfuscating JS is only slightly more difficult than the obfuscating part.

  • Decompaction
  • Since this is exactly the job of a beautifier, any popular JS beautifier can get a whitespace-trimmed, cryptic JS one-liner back in good shape in a split second. Online Javascript Beautifier is my favourite, as it can process entire webpages (HTML+CSS+JS) as well, and even supports some JS deobfuscation features.

  • Packer
  • Ironically enough, packer itself has a deobfuscation feature. Although it's disabled by default, you can easily enable it by stripping the disabled attribute off the lower textarea control, using the Inspect Element feature available on web browsers. After this hack, you can paste the packed code into the lower textarea and use the Decode button to get it de-obfuscated.

  • Token replacement
  • I haven't yet come across a straightforward way to defend against this trick, although there's most probably an extremely simple one. Still, the JS console of most web browsers can show you what each set of tokens would concatenate into. All I can say is that it's just dead painful to copy and paste each set of tokens to see what they mean, so some regex and text manipulation magic may often come in handy.

  • Experience: It's the Key!
  • As with anything else, when it comes to deciphering a JS puzzle, what matters most is prior experience and, more importantly, attention to the tiniest levels of detail. Taking a 'snapshot' of the current JS context of the browser (window) can sometimes come in handy. The following script can do the trick, when run in the browser console:

    var keys = Object.keys(window);
    var backup = {};
    for (var i in keys) {
    	var key = keys[i];
    	backup[key] = window[key];
    }
    

    If you want to compare this snapshot with a later state of the context, the following script will help:

    var keys = Object.keys(window);
    for (var i in keys) {
    	var key = keys[i];
    	if (backup[key] != window[key])
    		console.log(key + ': ' + backup[key] + ' -> ' + window[key]);
    }
    

    Examining HTTP requests sent out by the web page can also provide hints regarding what might be going on inside the page, behind the curtain of obfuscation. Most modern browsers provide a network request tab in their developer consoles, while there are popular analyzer plug-ins like FireBug as well.

  • Circumvention
  • Although JS is nearly taken for granted in modern web apps, some old (or more "compatible") pages still contain <noscript> tags to tackle non-JS environments. No matter how complicated the corresponding JS may look, the <noscript> tag always has to use plain old HTML (and CSS) to accomplish the same goal. An excellent example is the CAPTCHA component used on many pages; the cryptic JS fragment used to dynamically load and validate the CAPTCHA is often accompanied by a plain <noscript> tag that simply contains the URL of a CAPTCHA image (usually in an <img> tag) and a form that simply POSTs the user input to the solution URL.

  • Situation-specific Strategies (or, as the Sinhala idiom goes, "cutting the pig's flesh on the pig's own body")
  • Once I came across a page (on a popular PTC site) that had gone to such lengths as to dynamically generate a JS fragment by concatenating an array of numbers (ASCII codes); that fragment then sent another HTTP request for a different obfuscated script file and evaluated the resulting content to obtain the actual JS source, which then got evaluated yet again to bring about the desired functionality. I simply copied the initial script and ran it 'as-is', removing only the outermost (last) eval(), to obtain the final JS source that had at first appeared impossible to intercept. :) Sometimes, all that is required to defeat JS obfuscation is a little bit of strategy.

    Please note that some keywords used in this article may not correspond to their actual technical meanings, as most of the scenarios are my personal experiences and not validated against standard 'hacking' literature.

Monday, March 2, 2015

JS Obfuscation - The Nuts and Bolts

Whether you're a professional WWW hacker or just an average WWW kiddie like me, JavaScript (JS) will definitely play a huge role in your line of work. JS is what gives life to almost anything on a web page—save for some fancy CSS, a web page without JS would be plain lifeless.

The great thing about JS is that it's browser-interpreted, so the source code is already in your browser, ready for examination and sometimes even modification. This may sound cool to the client-side hacker but it's a nuisance for the server-side programmer and security team; a part of the application logic is getting leaked to the client side, all ready for exploitation!

As a result, many websites now use obfuscation techniques to scramble their JS sources. JS programmers use the concept of metaprogramming—creating a program that mutates itself to become something else—to devise elaborate tricks for hiding their logic inside cryptic, uncommented, space-stripped blocks of spaghetti code. The browser doesn't give a damn, however, and gracefully executes this spaghetti metaprogram which eventually does the entire job as nicely as the original code (perhaps at the cost of a few extra milliseconds, but who cares?).

These are some of the obfuscation 'techniques' I have come across while examining obfuscated JS:

  • eval()
  • This native JS function executes whatever string is passed to it on the JS engine, just as if it were regular inline JS code. eval(['a','l','e','r','t','(','"','H','e','l','l','o','"',')'].join('')) will display an alert "Hello" on the browser. This is often the core trick behind more advanced techniques.

  • Packer, and similar code compacting tools
  • /packer/ is a service that can convert an arbitrary JS fragment to a "packer" function, often with the signature function(p,a,c,k,e,r) which, when eval()'d, produces the same result as the original fragment. Inside the function is a total mess for the naked eye, with a mixture of math operations, concatenations, arrays of keyword and function names, nested eval()s, and more. There are other similar converters as well, doing the same thing with slightly different algorithms. Many PTC websites including Neobux and the late Probux use this technique to shield their ad view and validation logic from prying eyes (but not from mine ;-)).

  • Token replacement
  • A list of textual tokens of 2-3 characters each, often with cryptic names, is declared. Subsequent code uses various combinations of these tokens to invoke the desired operations. For example,

    a2x = 'do'
    //...
    n9p = 'ran'
    //...
    bb7 = 'm'
    //...
    alert(Math[n9p + a2x + bb7]())

    will produce an alert with a random number. I have observed this technique extensively in AdFly.

  • Plain compaction
  • To be honest, this is not obfuscation. However, given the possibility of complex syntactic tricks of JS, removing whitespace can make JS code highly unreadable and cryptic; yet it's just an illusion devised to frighten away inexperienced script kiddies.

Of course, this list is not exhaustive; given the flexibility of JS, there are infinitely many ways of obfuscating the same piece of code. Nevertheless, all of them are based on the same fundamentals—confusion, illusion and eval()—as they are meant for the man, not for the beast machine.

Saturday, February 28, 2015

Running Out of Data? Mobile Sites to the Rescue!

You may certainly have faced situations where you have only a few megabytes left on your internet plan. However, you may not yet be aware that you can survive on the internet for several hours using those few megabytes!

Currently, many of the leading web services maintain mobile-friendly versions of their websites, which work well under low bandwidth. Here is a list of some such websites which may come in handy at the hour of need.

  • Gmail Mobile - https://mail.google.com/mail/u/0/x
  • This is one of the many Gmail sites available for mobile viewing. It lacks all the fancy features of automatic email notifications, HTML-formatted email views and such, but it's a handy tool for viewing your emails at the cost of just a few kilobytes.

  • Gmail Basic HTML View - https://mail.google.com/mail/u/0/h

    This is the old Gmail desktop site (the "basic HTML" view). It's only slightly heavier than the genuine mobile site, but offers almost all Gmail functionalities and settings, minus the fancy AJAX stuff and the chat interface. In my opinion, it's more than sufficient for all your daily Gmail chores.

  • Facebook Mobile - https://m.facebook.com
  • Although this is officially for mobile devices, it can easily be accessed even on a desktop computer using the above link.

    Facebook Mobile comes in two main versions; on some browsers (Opera and some versions of Firefox) it renders as a plain site, with much less interactivity. But the other version (especially on Google Chrome) is a nearly fully-functional website which even has real-time chat and a high degree of responsiveness (AJAX). (Of course, the responsive version consumes significantly more data than the plain one (a page load takes roughly ~100 KB versus ~10 KB), but that is still much less than the actual desktop site; with images turned off, you can survive for a whole day on much less than 20 MB.)

  • Twitter Mobile - https://mobile.twitter.com
  • This is similar in functionality to the responsive Facebook mobile site. It offers a limited set of basic Twitter functionalities.

  • Google Drive Mobile - https://drive.google.com/m
  • Although this used to be a plain site, it has recently been converted into a responsive version. It's similar to the old Google Drive desktop site in functionality, and especially by the fact that you can preview files without actually opening them. However, data usage for this site is significantly high.

  • Wikipedia Mobile - http://en.m.wikipedia.org
  • This is a lightweight version of the mother site, with simple navigation, content folding etc. There are no dedicated sidebars or table-of-contents sections, but the content is almost as descriptive as the original articles.

  • LinkedIn Mobile - https://touch.www.linkedin.com
  • This is similar to the Twitter Mobile website, and provides a sufficiently rich set of functionalities compared to the desktop version.

  • PayPal Mobile - https://mobile.paypal.com
  • This is extremely lightweight, at the cost of the fact that it only allows you to view your existing PayPal balance. Although there's a "send money" option, currently it is linked back to the facility on the desktop site.

  • eBay Mobile
  • The WAP site http://wap.ebay.com is the default eBay mobile site for WAP clients, but it works on desktops as well. It has low responsiveness but still provides basic functionalities.

    There is a more sophisticated version at http://m.ebay.com for smartphones, but it consumes more data than the WAP version. Nevertheless it provides a significantly close-to-the-real-thing experience at a fraction of the data cost.

  • oDesk Mobile - https://www.odesk.com/m
  • This is the mobile optimized version of the famous freelancing site, and offers messaging and job search features. However at the time of this writing, it does not permit you to submit job applications.

  • Blogger Mobile
  • Unless the blog admin has either disabled mobile view or has used a non-mobile-friendly theme, you can view the mobile version of any blog or post by appending "?m=1" to the relevant URL (or, if "m=0" is already in the URL, by changing it to "m=1").

  • Google Mobile Site Viewer - http://www.google.com/gwt/n
  • This site can convert almost any public web page into a mobile-friendly, plain version. However, the converted page loses its interactivity and styling. It is also not applicable for pages or sites that rely on cookies for proper functionality. Still, it's a great resource when you want to check out the bare-bones content of a really bulky website, stripping off the fancy styling and JavaScript stuff.

  • Stack Exchange
  • Stack Exchange offers an option for you to switch between desktop and mobile views when visiting their websites. This switch is persistent until you decide to switch back again manually.

    ...and the list keeps growing...

Apart from these, most websites (such as Google Search, Fiverr, Facebook, LinkedIn and Atlassian Confluence-powered websites) can now recognize which type of device you are using to access them (via the HTTP User-Agent header), and serve a mobile-friendly version accordingly. (Check this out for more information; unfortunately there's no English version of the article available yet.)

However, it should be noted that mobile sites may not always be more lightweight than their desktop counterparts; the default Gmail Mobile site (https://mail.google.com/mail/mu/mp), to which smartphones are automatically redirected, can be cited as an example; at times it can consume more data than the actual default desktop site (https://mail.google.com/mail/u/0). So, keep an eye on your gauges when surfing unforeseen mobile sites!

Friday, February 27, 2015

xcalib: Say No to Eye Strain - Exclusively for You on Linux!

Eyes getting tired after staring at the computer screen for a couple of hours? Don't take it lightly; eye strain, if neglected, can eventually lead to catastrophic conditions like glaucoma and even total blindness. And it's quite common among members of the IT sector, who have no option other than staring at their computer screens for long periods each day.

Eye strain often traces back to computer screens that are too bright (although lack of brightness can also cause problems at times). If you are a frequent computer user, you would have noticed that reading text on white or other light-coloured backgrounds over prolonged times can lead to eye strain quite quickly.

As a result, many applications now provide themes and configurations to ensure minimum eye strain. Ubuntu Linux, for example, offers a (PDF) Document Viewer featuring an 'Inverted Colors' option, while IDEs like NetBeans and IntelliJ IDEA have specialized 'dark themes' that provide similar colour configurations. Even the default text editor, gedit, can be configured to display light-coloured text on a black background.

However, if you are frequently switching between the 'light' and 'dark' modes, you will soon realize how painful it is to configure each application individually during the transition process. Besides, there's no guarantee that every arbitrary application would support colour inversion, so you may very well end up in situations where you are stuck with black text on white on one application and the eye-friendly inverse on all the others.

On Ubuntu Linux, as with anything else, there's a workaround: xcalib.

Installation takes the usual, friendly path: sudo apt-get install xcalib

This utility can be used to completely invert screen colours with a single command, so you never have to mess with individual configurations again:

xcalib -i -a

Repeating the same command will bring the screen back to normal.

The catch, however, is that it will also swap the other screen colours (red becomes cyan, blue becomes yellow, and so on). It takes some time for the brain to get used to this strangeness, but after that it just becomes normal.

xcalib only seems to trigger the colour swap on the display; if you take a screenshot with xcalib executed, you will notice that it would still be a 'natural colour' screenshot.

Still, it may be required to configure some colour schemes, such as that on the terminal, so that whether you are in the inverted mode or not, you would be seeing a 'natural' (light text on dark background) terminal.

Here are two shell scripts, zkreen and swap, that I use for inverting my screen and the terminal colours together so that I always see a green-text-on-black terminal but everything else gets swapped (please note that you need gconftool-2 installed, for the swap operation to work).


zkreen:

xcalib -i -a
swap


swap:

# read the current terminal foreground colour
fore=`gconftool-2 --get "/apps/gnome-terminal/profiles/Profile0/foreground_color"`;
if [ "$fore" = "#0000FFFF0000" ]; then
	# terminal is literally green-on-black: switch to magenta-on-white,
	# which appears as green-on-black while the screen is inverted by xcalib
	gconftool-2 --set "/apps/gnome-terminal/profiles/Profile0/background_color" --type string "#FFFFFFFFFFFF";
	gconftool-2 --set "/apps/gnome-terminal/profiles/Profile0/foreground_color" --type string "#FFFF0000FFFF";
else
	# otherwise switch back to a literal green-on-black terminal
	gconftool-2 --set "/apps/gnome-terminal/profiles/Profile0/background_color" --type string "#000000000000";
	gconftool-2 --set "/apps/gnome-terminal/profiles/Profile0/foreground_color" --type string "#0000FFFF0000";
fi;


(Make sure that you set executable permissions on both scripts, using chmod +x.)

I leave other configurations untouched so that I can switch back to the dark-on-light scheme easily when I get bored with light-on-dark, or have some special requirement such as viewing a colour image.

On both Ubuntu systems that I use, the effect of xcalib seems to wear off suddenly during certain operations (such as pressing Ctrl+C or Ctrl+V, in the terminal or in other applications). The issue is intermittent, and I believe it is a bug (not a serious one, though). The second script (swap) is for such situations: its sole job is to swap the terminal colours back when this happens (otherwise the terminal would retain a funny-looking pink-text-on-white scheme, which we had actually set during the xcalib swap so as to get a natural-looking terminal under colour inversion).

Using dconf and other related utility commands, you will be able to customize the script to trigger colour changes on other selected applications such as gedit, and maybe even get rid of xcalib and rely solely on application configurations for colour inversions.

MyLightBlog - Lightweight Blogging Champion

Blogger is becoming increasingly popular as a feature-rich, free and smart blogging platform available to everyone.

However, one soon realizes that the Blogger web interface is quite complicated and, above all, bulky. It may contain a million smart features, but most of the time an average blogger will just step in to compose and post an article, and nothing beyond. The interface takes nearly a megabyte of bandwidth on first load, and continues to communicate with Blogger for post draft auto-saving and such. Moreover, it does not get cached properly in the browser (at least in Firefox), so it practically loads everything from scratch during the next visit if you happen to close the browser between sessions.

So, rather than sitting back and criticizing, I wrote a Blogger browser client on my own, using the Blogger API v3.

It's now public, available right here on GitHub.

The catch is that it requires a web server running on the machine (a simple HTTP server would do; there's no need to go for a full stack with PHP/MySQL).

Put more precisely, it needs to be able to navigate to the address http://127.0.0.1/blogger/oauth (whose corresponding web page is included in the project itself) for Google OAuth authorization.

You can simply clone the repository into the web data folder of your web server (www on Apache, unless you're using a specialized stack like XAMPP where the corresponding folder is named htdocs; for example, in Linux it's the famous /var/www/html directory).

Note that you may also have to set executable permissions on the two HTML files on Linux, which is easily done by cd-ing into the blogger directory and running chmod +x blogger.html oauth.html.

Now, simply navigate to http://127.0.0.1/blogger/blogger.html to access the one-page client.

You can load the blogs associated with your Blogger account, and their posts, using the Blogs and Posts buttons. You can use the corresponding lists to switch among blogs and posts, and the All and Draft buttons to load either all posts or just the draft posts, respectively.

At first use, you will have to authorize the backend application MyLightBlog, which is hosted on Google App Engine, by clicking the Authorize button. In this case a pop-up will open, asking you to authorize the application for your Google account. You will have to click 'Accept' so that the client is granted permission to manage blogs on your Google account. Authorization is granted merely for Blogger-related features (the dialog will clearly state so) so you don't have to worry about your account or any unintended parts of it falling into evil hands. (Refer to this nice post for more information on how OAuth works.)

The page layout is quite crude at the moment, but it features all basic Blogger operations required by the average blogger, like listing your blogs and posts, creating new posts, and editing, publishing, unpublishing and deleting existing posts.

It may seem inconvenient that you have to edit the post as raw HTML, but this eliminates a great deal of redundancy that would otherwise creep into the actual blog if you used a rich HTML editor such as the one on Blogger.

Just as an example, each <p> is automatically given a dir="ltr" attribute by Google's editor, which increases the number of characters used for the tag alone roughly five-fold. If you blog in a non-ASCII character set like Sinhala, each letter is separately escaped (as &xxxx;), bloating the post content dramatically. (While these 'intelligent' features may be useful for writing truly international blogs, I don't believe they are worth the obvious overhead in our case.)

A good alternative is to create the document in a simple HTML-supporting editor (word processing software will almost always produce bulky HTML requiring heavy cleansing) and then paste the HTML content into the post content field.

To compensate for the lack of a rich-text editor, the middle-right region of the page displays an HTML preview of the post as it is being edited.

You can add or modify the title and labels (tags) of the post using the text boxes above and below the main content text area, respectively. You can use the Update button to post your changes to the actual blog post online on Blogger, or the New Post button to post the current content as a new blog post.

(WARNING: Be sure to refresh the posts list and load the new post, if you happen to make a new post; as of now the client will not automatically refresh the list, so if you click Update again, the currently selected post from the list will get overwritten automatically!)

My main intention is to keep this client as light as possible, hence future improvements will focus on removing jQuery altogether, as soon as possible. No external CSS or JS would be included unless absolutely necessary.

Feedback on the client is always welcome, preferably as comments here itself, or (if more critical) as issues here.

P.S.: This article was composed, formatted, finalized and published using the Blogger client itself.

Saturday, January 31, 2015

Farewell, apt-get install

One reason many people like Ubuntu is the availability of a tremendous amount of free software, and the relative ease of installing it (most of the time, all it takes is a simple apt-get install).

However, things get nasty when you need to install software on a machine with limited or no internet connectivity. The repository URLs may be inaccessible, or downloads may be restricted on the network. Even if everything goes fine, in the end you will have spent a lot of data and time (since everything has to be redownloaded separately for every installation on each different machine).

Nevertheless, if you are not too uncomfortable with digging around a bit, you will be able to copy most of the already installed applications from machine to machine, in no time.

(Please note that this approach is not recommended (and probably not applicable) for services like mysql or apache, which usually inject configurations to /etc and other locations.)

Generally, the bulk of an application installed via apt-get goes into /usr/lib, and a launcher script is created in /usr/bin. Some more stuff may go into /usr/share and /usr/share/doc, but you can live without them most of the time.

Configuration and customization settings are usually confined to your user directory (/home/[username]), which can usually be ignored. If you want an exact clone of the application, you can copy them as well.

Locating the necessary files correctly and completely is the next biggest challenge. However, thanks to proper naming conventions used by Linux packages, the directory names (and executable or launcher script names, if any) are almost always identical to the base package name (e.g. eclipse, firefox, chromium-browser, etc). Besides, you can always use a command like sudo find /usr -name chromium* to locate the relevant directories and files (in this case, for an application having a name starting with 'chromium').

For example, I once managed to get a working copy of chromium-browser (which would consume a few hundred megabytes of bandwidth if retrieved via apt-get), just by cloning the /usr/lib/chromium-browser directory and the /usr/bin/chromium-browser script.

For licensed software such as IntelliJ IDEA, if you copy the .IntelliJIDEA directory inside your user directory, you will get a working licensed copy on the target machine as well! (Please note, however, that such copying may violate the license terms and conditions.)

The copying process can be facilitated by any of the popular Linux remote copy tools, such as scp, rsync, or even netcat.

Depending on any errors you get upon invoking the newly copied application's launcher, you may also need to set executable permissions on certain files such as scripts and executables (in case they do not get replicated correctly during copying).

So, that's it, folks! No more apt-get!

[Please note that the content in this article is based solely on my experience with Ubuntu Linux. There is no guarantee that this approach would work on a given Linux distro or version, regardless of the consistency of Linux culture across platforms.]