Sunday, March 15, 2015

The Java NIO WatchService: Pitfalls to watchOut() for

My intention here is not to delve into the API or the ins and outs of the Java NIO watch service. In a nutshell, it allows an application to register for filesystem event notifications on a set of arbitrary paths. This is quite useful in text editors, filesystem monitors and other applications that need immediate notifications without incurring the overhead of repeatedly polling the file tree. You can find excellent explanations of it on the internet, such as this article, accompanied by a plethora of examples and tutorials.

My article is dedicated solely to explaining the various pitfalls I came across when writing my first WatchService-based application: an event-driven file transport for the UltraESB, the best open source ESB ever.

If you are familiar with inotify watches on Linux, you won't need any further introduction to the NIO watch API. Although I have not used inotify directly, judging by the content of inotify's man page I would say that the NIO watch service, on Linux, is just a simplified wrapper around inotify. The architecture is almost identical: the WatchService corresponds to the inotify instance, the WatchKey to the watch descriptor, and the WatchEvent to the inotify_event structure. Watch registration and event retrieval (blocking and non-blocking) map just as closely. NIO seems to have introduced the concept of queueing events under WatchKeys. It also requires the user to reset() a signalled WatchKey before the service will hand it out again with further events. Almost everything else is the same as in inotify.

First, it should be noted that in some systems (such as Mac OS X) the watch service may fall back internally to a polling approach, if notifications are not natively supported by the underlying OS. It is also known that the watch service has certain incompatibilities with Windows-based operating systems. From what I have read so far, *NIX environments are the ones which can reap the true benefits of the watch service.

But that's just the beginning.

This extract from the inotify manual itself, titled Limitations and caveats, gives us an idea of how careful we should be when using inotify-based implementations:

Inotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional watches must be created. This can take a significant amount time for large directory trees.

The inotify API provides no information about the user or process that triggered the inotify event. In particular, there is no easy way for a process that is monitoring events via inotify to distinguish events that it triggers itself from those that are triggered by other processes.

Note that the event queue can overflow. In this case, events are lost. Robust applications should handle the possibility of lost events gracefully.

The inotify API identifies affected files by filename. However, by the time an application processes an inotify event, the filename may already have been deleted or renamed.

If monitoring an entire directory subtree, and a new subdirectory is created in that tree, be aware that by the time you create a watch for the new subdirectory, new files may already have been created in the subdirectory. Therefore, you may want to scan the contents of the subdirectory immediately after adding the watch.

On most systems, the number of WatchService instances you can open is limited. Exceeding this limit can sometimes produce vague errors like "network BIOS command limit reached". However, you'll almost always work with a single WatchService per application, so this should rarely become an actual problem. Nevertheless, keep in mind that the WatchService is another I/O resource (just like a Stream or Channel). Make sure you invoke close() on it after use to avoid resource exhaustion. Since WatchService extends Closeable, a try-with-resources block takes care of this for you.

In addition to the basic watch event types (ENTRY_CREATE, ENTRY_MODIFY and ENTRY_DELETE), there's a fourth event type, OVERFLOW. It corresponds to the case where the event queue for a given WatchKey has overflowed, causing events to be lost as described in the inotify manual. You will receive OVERFLOW events even if you never registered for them. For example, under default Linux settings, a WatchKey will produce an OVERFLOW if its queued event count exceeds 512. Although this limit is configurable, an overflow can never be ruled out entirely, so it should be taken into consideration in all mission-critical implementations.

The Path (more correctly, the Watchable instance) associated with a WatchKey can be retrieved via its watchable() method. However, the WatchKey stays bound to the system representation of the directory (e.g. its inode on Linux), while watchable() simply keeps returning the path that was originally registered. Hence, if we rename a directory along the path, watchable() still returns the old value. This eventually leads to failures when we try to use that value as a valid Path. It is therefore necessary to keep track of changes in directory paths and update the local 'image' of the filesystem accordingly. Fortunately, a folder rename or move (which invalidates the corresponding watchable() value) is always accompanied by an ENTRY_DELETE event for that directory, provided its parent is also being watched. We can therefore quickly identify the change and take action before things get messed up.

Events are not guaranteed to arrive in any particular order. For example, take a 3-level directory hierarchy: a middle directory contains a subdirectory and sits inside an outer directory, and both the middle and outer directories are watched by the same watch service W. If you delete the middle directory, its deletion may be notified before that of the inner directory, although logically they should have happened in the opposite order. To make things worse, if the innermost directory was also being watched, it might generate a bogus notification (i.e. one containing zero events) during deletion.

Every operation is decomposed into the 3 basic event types. For example, renaming a file or directory inside a watched parent directory will trigger an ENTRY_DELETE (oops, that file is gone!) followed by an ENTRY_CREATE (hey, a new file is here!). This can be thought of as a new way of looking at the renaming process (deletion followed by creation), but it does not reflect the actual operation (inode metadata update) that would be happening behind the scenes.

A stranger (albeit perfectly OK) scenario arises when a directory is moved into or out of a watched directory. As only the outermost directory entry effectively changes, you only get a notification for the outermost directory that got moved. If there were any watched subdirectories inside the moved directory, they would still be 'active', but their watchable()s would no longer be valid. From that point onwards, only the relative paths reported in the corresponding watch key's events would be valid, and you would have to manually prepend them with the new parent path to get the correct full path.

Enough talking; let's demonstrate all this Greek with a simple directory hierarchy!

|  +--d
|  |  +--g.txt
|  |
|  +--e
|     +--h.log

Assume that all directories except f are being watched. See if you can deduce what is going on behind the scenes.

Operation                            Resulting events
                                     WatchKey's watchable()   WatchEvent's kind()   WatchEvent's context()
delete c                             a                        ENTRY_DELETE          c
                                     (f deleted silently; order of other events is unpredictable)
move c into b                        a                        ENTRY_DELETE          c
move e into f (not included here),   c                        ENTRY_DELETE          f
then delete f                        (f deleted silently; event order unpredictable)
rename c as j                        a                        ENTRY_DELETE          c
delete d                             d                        ENTRY_DELETE          g.txt

Moral of the story? Be careful and thoughtful when using the WatchService API. It's a golden sword that can cut you badly if handled in the wrong way.

JS Obfuscation (Part 2) - Countermeasures

Despite how cryptic obfuscated JS may look, deobfuscating it is only slightly more difficult than obfuscating it.

  • Decompaction
  • Since this is exactly the job of a beautifier, any popular JS beautifier can get a whitespace-trimmed, cryptic JS one-liner back in good shape in a split second. Online Javascript Beautifier is my favourite, as it can process entire webpages (HTML+CSS+JS) as well, and even supports some JS deobfuscation features.

  • Packer
  • Ironically enough, packer itself has a deobfuscation feature. Although it's disabled by default, you can easily enable it by stripping the disabled attribute off the lower textarea control, using the Inspect Element feature available in web browsers. After this hack, you can paste the packed code into the lower textarea and use the Decode button to get it deobfuscated.

  • Token replacement
  • I haven't yet come across a straightforward way to defeat this trick, although there's most probably an extremely simple one. Still, the JS consoles of most web browsers can show you what each set of tokens concatenates into. All I can say is that it's dead painful to copy and paste each set of tokens to see what it means, so some regex and text manipulation magic often comes in handy.

  • Experience: It's the Key!
  • As with anything else, when it comes to deciphering a JS puzzle, what matters most is prior experience and, more importantly, attention to the tiniest details. Taking a 'snapshot' of the current JS context of the browser (window) can sometimes come in handy. The following script can do the trick when run in the browser console:

    // Snapshot every enumerable property currently on window
    var keys = Object.keys(window);
    var backup = {};
    for (var i = 0; i < keys.length; i++) {
    	var key = keys[i];
    	backup[key] = window[key];
    }

    If you want to compare this snapshot with a later state of the context, the following script will help:

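    // Log every property whose value differs from the snapshot (new properties included)
    var keys = Object.keys(window);
    for (var i = 0; i < keys.length; i++) {
    	var key = keys[i];
    	if (backup[key] !== window[key])
    		console.log(key + ': ' + backup[key] + ' -> ' + window[key]);
    }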

    Examining HTTP requests sent out by the web page can also provide hints about what might be going on inside the page, behind the curtain of obfuscation. Most modern browsers provide a network request tab in their developer consoles, and there are popular analyzer plug-ins like Firebug as well.

  • Circumvention
  • Although JS is nearly taken for granted in modern web apps, some old (or more "compatible") pages still contain <noscript> tags to tackle non-JS environments. No matter how complicated the corresponding JS may look, the <noscript> tag always has to use plain old HTML (and CSS) to accomplish the same goal. An excellent example is the CAPTCHA component used on many pages; the cryptic JS fragment used to dynamically load and validate the CAPTCHA is often accompanied by a plain <noscript> tag that simply contains the URL of a CAPTCHA image (usually in an <img> tag) and a form that simply POSTs the user input to the solution URL.

  • Situation-specific Strategies (or, as a Sinhala saying goes, "carving the pig's meat on the pig's own body")
  • Once I came across a page (on a popular PTC site) that had gone to great lengths. It dynamically generated a JS fragment by concatenating an array of numbers (ASCII codes). That fragment sent another HTTP request for a different obfuscated script file and evaluated the response to obtain the actual JS source, which was then evaluated yet again to bring about the desired functionality. I simply copied the initial script and ran it 'as-is', removing only the outermost (last) eval(). That gave me the final JS source, which at first seemed impossible to intercept. :) Sometimes, all it takes to defeat JS obfuscation is a little bit of strategy.

    Please note that some keywords used in this article may not correspond to their actual technical meanings, as most of the scenarios are my personal experiences and not validated against standard 'hacking' literature.
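For the token-replacement trick, the "regex and text manipulation magic" can start out as small as the following JavaScript sketch. resolveTokens is a hypothetical helper (not from any library). It assumes the tokens are declared as simple single-quoted string assignments without escapes, and that they are combined only with + inside a bracket lookup, as in the a2x/n9p/bb7 example from the first part of this series. It only rewrites text; nothing from the obfuscated page is executed.

```javascript
// resolveTokens is a hypothetical helper, not part of any library.
// Assumption: tokens are declared as plain single-quoted strings (no escapes),
// e.g.  a2x = 'do', and combined with '+' inside a bracket lookup.
function resolveTokens(code) {
  // 1. Collect assignments such as  a2x = 'do'  into a lookup table.
  const tokens = {};
  for (const m of code.matchAll(/(\w+)\s*=\s*'([^'\\]*)'/g)) {
    tokens[m[1]] = m[2];
  }
  const isToken = (name) => Object.prototype.hasOwnProperty.call(tokens, name);

  // 2. Rewrite [tok + tok + ...] lookups that consist only of known tokens.
  return code.replace(/\[([^\[\]]+)\]/g, (whole, expr) => {
    const parts = expr.split('+').map((part) => part.trim());
    if (!parts.every((part) => isToken(part))) return whole; // leave others alone
    const name = parts.map((part) => tokens[part]).join('');
    // Use dot notation when the result is a plain identifier.
    return /^[A-Za-z_$][\w$]*$/.test(name) ? '.' + name : "['" + name + "']";
  });
}

const sample = "a2x = 'do'\nn9p = 'ran'\nbb7 = 'm'\nalert(Math[n9p + a2x + bb7]())";
console.log(resolveTokens(sample).split('\n').pop()); // alert(Math.random())
```

Anything that does not fit the assumed pattern (numeric indices, unknown names, nested expressions) is left untouched, so the output can be re-read and refined by hand.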
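The 'remove only the outermost eval()' strategy can also be automated by shadowing eval instead of editing the script. The following JavaScript sketch shows one possible way to do it; it is not the exact procedure I followed on that PTC site. captureEval is a hypothetical helper, and it makes two assumptions. First, the snippet calls eval by that name (not window.eval or the Function constructor). Second, the snippet contains no "use strict" directive, because a parameter named eval is a syntax error in strict mode. Every other statement in the snippet still runs for real, so use a throwaway environment.

```javascript
// captureEval is a hypothetical helper, not part of any library.
// It runs an obfuscated snippet with `eval` shadowed by a recorder, so the
// string that would have been executed is captured instead of run.
// WARNING: every other statement in the snippet still executes for real.
function captureEval(snippet) {
  const captured = [];
  // A function built with the Function constructor is sloppy-mode unless its
  // own body says "use strict", so a parameter may be named `eval`. Calls
  // written as eval(...) inside the snippet then reach the recorder.
  const run = new Function('eval', snippet);
  run((source) => {
    captured.push(String(source));
  });
  return captured;
}

// An outer layer built from ASCII codes, like the PTC page described above.
const layer = 'eval(String.fromCharCode(97, 108, 101, 114, 116, 40, 49, 41))';
console.log(captureEval(layer)); // [ 'alert(1)' ]
```

Feeding each captured string back into captureEval peels off the next layer. A layer that fetches more code over HTTP would still need that request to succeed, so the snippet has to be run somewhere the request can actually be made.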

Monday, March 2, 2015

JS Obfuscation - The Nuts and Bolts

Whether you're a professional WWW hacker or just an average WWW kiddie like me, JavaScript (JS) will definitely play a huge role in your line of work. JS is what gives life to almost anything on a web page—save for some fancy CSS, a web page without JS would be plain lifeless.

The great thing about JS is that it's browser-interpreted, so the source code is already in your browser, ready for examination and sometimes even modification. This may sound cool to the client-side hacker but it's a nuisance for the server-side programmer and security team; a part of the application logic is getting leaked to the client side, all ready for exploitation!

As a result, many websites now use obfuscation techniques to scramble their JS sources. JS programmers use the concept of metaprogramming (writing a program that mutates itself into something else) to devise elaborate tricks for hiding their logic inside cryptic, uncommented, whitespace-stripped blocks of spaghetti code. The browser doesn't give a damn, however, and gracefully executes this spaghetti metaprogram, which eventually does the entire job as nicely as the original code (perhaps at the cost of a few extra milliseconds, but who cares?).

These are some of the obfuscation 'techniques' I have come across while examining obfuscated JS:

  • eval()
  • This native JS function executes whatever string is passed to it on the JS engine, just as if it were regular inline JS code. For example, eval(['a','l','e','r','t','(','"','H','e','l','l','o','"',')'].join('')) will display an alert saying "Hello" in the browser. This is often the core trick behind more advanced techniques.

  • Packer, and similar code compacting tools
  • /packer/ is a service that can convert an arbitrary JS fragment to a "packer" function, often with the signature function(p,a,c,k,e,r) which, when eval()'d, produces the same result as the original fragment. Inside the function is a total mess for the naked eye, with a mixture of math operations, concatenations, arrays of keyword and function names, nested eval()s, and more. There are other similar converters as well, doing the same thing with slightly different algorithms. Many PTC websites including Neobux and the late Probux use this technique to shield their ad view and validation logic from prying eyes (but not from mine ;-)).

  • Token replacement
  • A set of short textual tokens (2-3 characters each) is declared, usually under cryptic variable names. Subsequent code combines these tokens in various ways to invoke the desired operations. For example,

    var a2x = 'do';
    var n9p = 'ran';
    var bb7 = 'm';
    alert(Math[n9p + a2x + bb7]()); // same as Math['random']()
    will produce an alert with a random number. I have observed this technique extensively in AdFly.

  • Plain compaction
  • To be honest, this is not obfuscation. However, given how many complex syntactic tricks JS allows, removing whitespace can make JS code highly unreadable and cryptic; yet it's just an illusion devised to frighten away inexperienced script kiddies.
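To get a feel for why packed code looks like such a mess, the following JavaScript sketch implements a toy dictionary packer. Every word in the source is replaced by its index in a keyword list (written in base 36), and the decoder maps the indices back. pack, unpack and the p/a/k naming are my own simplified illustration. They are loosely modelled on the payload, radix and keyword arguments of packer-style output, but the real packer differs in detail. The stub at the end mimics the "eval(function(...){...}(...))" shape described above, without the extra arguments.

```javascript
// Toy dictionary packer: a simplified illustration, NOT the real packer algorithm.
// p = payload, a = radix, k = keyword list (loosely mirroring packer-style output).
function pack(source) {
  const keywords = [];         // k: each distinct word, in order of first appearance
  const positions = new Map(); // word -> its index in keywords
  const payload = source.replace(/\w+/g, (word) => {
    if (!positions.has(word)) {
      positions.set(word, keywords.length);
      keywords.push(word);
    }
    return positions.get(word).toString(36); // index written in base 36
  });
  return { p: payload, a: 36, k: keywords };
}

// Decoder: every word-like token in the payload is a base-a index into k.
function unpack(p, a, k) {
  return p.replace(/\w+/g, (token) => k[parseInt(token, a)]);
}

const original = 'alert("Hello " + name)';
const packed = pack(original);
console.log(packed.p);           // 0("1 " + 2)
console.log(packed.k.join('|')); // alert|Hello|name
console.log(unpack(packed.p, packed.a, packed.k) === original); // true

// What a page would ship: a self-decoding stub that eval()s the restored source.
const stub = 'eval((' + unpack.toString() + ')(' +
  JSON.stringify(packed.p) + ', ' + packed.a + ', ' + JSON.stringify(packed.k) + '))';
console.log(stub);
```

Because only word characters are substituted, the punctuation and operators survive intact, which is exactly why packed output looks like a mixture of stray symbols and meaningless short tokens.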

Of course, this list is not exhaustive; given the flexibility of JS, there are infinitely many ways to obfuscate the same piece of code. Nevertheless, all of them rely on the same fundamentals (confusion, illusion and eval()), as they are meant to fool the man, not the machine.