Tuesday, October 15, 2019

JAR File Handles: Clean Up After Your Mess!

In Ultra ESB we use a special hot-swap classloader that allows us to reload Java classes on demand. This allows us to literally hot-swap our deployment units - load, unload, reload with updated classes, and phase-out gracefully - without restarting the JVM.

Hot swap, baby!

Windows: supporting the forbidden land

In Ultra ESB Legacy the loader was working fine on Windows, but on the newer X-version it seemed to be having some hiccups. We are not supporting Windows as a target platform, so it didn't matter much - until recently, when we decided to support non-production distros on Windows. (Our enterprise integration IDE UltraStudio runs fine on Windows, so Windows devs, you are all covered.)

AdroitLogic UltraStudio - Your Enterprise Integration IDE

TDD FTW

Fixing the classloader was a breeze, and all tests were passing; but I wanted to back my fixes up with some extra tests, so I wrote a few new ones. Most of these involved creating a new JAR file in a subditrectory under the system temp directory, and using the hot-swap classloader to load different artifacts that were placed inside the JAR. For extra credit on best practices, I also made sure to add some cleanup logic to delete the temp subdirectory via FileUtils.deleteDirectory().

And then, things went nuts.

And the tear-down was no more.

All tests were passing, in both Linux and Windows; but the final tear-down logic was failing in Windows, right at the point where I delete the temp subdirectory.

Being on Windows, I didn't have the luxury of lsof; fortunately, Sysinternals already had just the thing I needed: handle64.

Finding the culprit was pretty easy: hit a breakpoint in tearDown() just before the directory tree deletion call, and run a handle64 {my-jar-name}.jar.

Bummer.

My test Java process, was holding a handle to the test JAR file.

Hunting for the leak

No. Seriously. I didn't.

No. Seriously. I didn't.

Naturally, my first suspect was the classloader itself. I spent almost half an hour going over the classloader codebase again and again. No luck. Everything seemed rock solid.

The "leak dumper"; a.k.a my Grim Reaper for file handles

My best shot was to see what piece of code had opened the handler to the JAR file. So I wrote a quick-n-dirty patch to Java's FileInputStream and FilterInputStream that would dump acquire-time stacktrace snapshots; whenever a thread holds a stream open for too long.

This "leak dumper" was partly inspired by our JDBC connection pool that detects unreleased connections (subject to a grace period) and then dumps the stacktrace of the thread that borrowed it - back at the time it was borrowed. (Kudos to Sachini, my former colleague-intern at AdroitLogic.)

The leak, exposed!

Sure enough, the stacktrace revealed the culprit:

id: 174 created: 1570560438355
--filter--

  java.io.FilterInputStream.<init>(FilterInputStream.java:13)
  java.util.zip.InflaterInputStream.<init>(InflaterInputStream.java:81)
  java.util.zip.ZipFile$ZipFileInflaterInputStream.<init>(ZipFile.java:408)
  java.util.zip.ZipFile.getInputStream(ZipFile.java:389)
  java.util.jar.JarFile.getInputStream(JarFile.java:447)
  sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:162)
  java.net.URL.openStream(URL.java:1045)
  org.adroitlogic.x.base.util.HotSwapClassLoader.loadSwappableClass(HotSwapClassLoader.java:175)
  org.adroitlogic.x.base.util.HotSwapClassLoader.loadClass(HotSwapClassLoader.java:110)
  org.adroitlogic.x.base.util.HotSwapClassLoaderTest.testServiceLoader(HotSwapClassLoaderTest.java:128)
  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  java.lang.reflect.Method.invoke(Method.java:498)
  org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:86)
  org.testng.internal.Invoker.invokeMethod(Invoker.java:643)
  org.testng.internal.Invoker.invokeTestMethod(Invoker.java:820)
  org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1128)
  org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
  org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
  org.testng.TestRunner.privateRun(TestRunner.java:782)
  org.testng.TestRunner.run(TestRunner.java:632)
  org.testng.SuiteRunner.runTest(SuiteRunner.java:366)
  org.testng.SuiteRunner.runSequentially(SuiteRunner.java:361)
  org.testng.SuiteRunner.privateRun(SuiteRunner.java:319)
  org.testng.SuiteRunner.run(SuiteRunner.java:268)
  org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
  org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
  org.testng.TestNG.runSuitesSequentially(TestNG.java:1244)
  org.testng.TestNG.runSuitesLocally(TestNG.java:1169)
  org.testng.TestNG.run(TestNG.java:1064)
  org.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:72)
  org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:123)

Gotcha!

  java.io.FilterInputStream.<init>(FilterInputStream.java:13)
  ...
  sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:162)
  java.net.URL.openStream(URL.java:1045)
  org.adroitlogic.x.base.util.HotSwapClassLoader.loadSwappableClass(HotSwapClassLoader.java:175)

But still, that didn't tell the whole story. If URL.openStream() opens the JAR, why does it not get closed when we return from the try-with-resources block?

        try (InputStream is = jarURI.toURL().openStream()) {
            byte[] bytes = IOUtils.toByteArray(is);
            Class<?> clazz = defineClass(className, bytes, 0, bytes.length);
            ...
            logger.trace(15, "Loaded class {} as a swappable class", className);
            return clazz;

        } catch (IOException e) {
            logger.warn(16, "Class {} located as a swappable class, but couldn't be loaded due to : {}, " +
                    "trying to load the class as a usual class", className, e.getMessage());
            ...
        }

Into the wild: JarURLConnection, URLConnection, and beyond

Thanks to Sun Microsystems who made it OSS, I could browse through the JDK source, right up to this shocking comment - all the way down, in java.net.URLConnection:

    private static boolean defaultUseCaches = true;

   /**
     * If <code>true</code>, the protocol is allowed to use caching
     * whenever it can. If <code>false</code>, the protocol must always
     * try to get a fresh copy of the object.
     * <p>
     * This field is set by the <code>setUseCaches</code> method. Its
     * value is returned by the <code>getUseCaches</code> method.
     * <p>
     * Its default value is the value given in the last invocation of the
     * <code>setDefaultUseCaches</code> method.
     *
     * @see     java.net.URLConnection#setUseCaches(boolean)
     * @see     java.net.URLConnection#getUseCaches()
     * @see     java.net.URLConnection#setDefaultUseCaches(boolean)
     */
    protected boolean useCaches = defaultUseCaches;

Yep, Java does cache JAR streams!

From sun.net.www.protocol.jar.JarURLConnection:

    class JarURLInputStream extends FilterInputStream {
        JarURLInputStream(InputStream var2) {
            super(var2);
        }

        public void close() throws IOException {
            try {
                super.close();
            } finally {
                if (!JarURLConnection.this.getUseCaches()) {
                    JarURLConnection.this.jarFile.close();
                }
            }

        }
    }

Gotcha!

If (well, because) useCaches is true by default, we're in for a big surprise!

Let Java cache its JARs, but don't break my test!

JAR caching would probably improve performance; but does that mean I should stop cleaning up after - and leave behind stray files after each test?

(Of course I could say file.deleteOnExit(); but since I was dealing with a directory hierarchy, there was no guarantee that things would get deleted in order, and undeleted directories would be left behind.)

So I wanted a way to clean up the JAR cache - or at least purge just my JAR entry; after I am done, but before the JVM shuts down.

Disabling JAR caching altogether - probably not a good idea!

URLConnection does offer an option to avoid caching connection entries:

   /**
     * Sets the default value of the <code>useCaches</code> field to the
     * specified value.
     *
     * @param   defaultusecaches   the new value.
     * @see     #getDefaultUseCaches()
     */
    public void setDefaultUseCaches(boolean defaultusecaches) {
        defaultUseCaches = defaultusecaches;
    }

It would have been perfect if caching could be disabled per file/URL, as above; our classloader caches all entries as soon as it opens a JAR, so it never needs to open/read that file again. However, once a JAR is open, caching cannot be disabled on it; so once our classloader has opened the JAR, there's no getting rid of the cached file handle - until the JVM itself shuts down!

URLConnection also allows you to disable caching by default for all subsequent connections:

   /**
     * Sets the default value of the <code>useCaches</code> field to the
     * specified value.
     *
     * @param   defaultusecaches   the new value.
     * @see     #getDefaultUseCaches()
     */
    public void setDefaultUseCaches(boolean defaultusecaches) {
        defaultUseCaches = defaultusecaches;
    }

However, if you disable it once, the whole JVM could be affected from that moment onwards - since it probably applies to all URLConnection-based implementations. As I said before, that could hinder performance - not to mention deviating my test from cache-enabled, real-world behavior.

Down the rabbit hole (again!): purging manually from the JarFileFactory

The least-invasive option is to remove my own JAR from the cache, when I know I'm done.

And good news, the cache - sun.net.www.protocol.jar.JarFileFactory - already has a close(JarFile) method that does the job.

But sadly, the cache class is package-private; meaning there's no way to manipulate it from within my test code.

Reflection to the rescue!

Reflection it is.

Thanks to reflection, all I needed was one little "bridge" that would access and invoke jarFactory.close(jarFile) on behalf of me:

class JarBridge {

    static void closeJar(URL url) throws Exception {

        // JarFileFactory jarFactory = JarFileFactory.getInstance();
        Class<?> jarFactoryClazz = Class.forName("sun.net.www.protocol.jar.JarFileFactory");
        Method getInstance = jarFactoryClazz.getMethod("getInstance");
        getInstance.setAccessible(true);
        Object jarFactory = getInstance.invoke(jarFactoryClazz);

        // JarFile jarFile = jarFactory.get(url);
        Method get = jarFactoryClazz.getMethod("get", URL.class);
        get.setAccessible(true);
        Object jarFile = get.invoke(jarFactory, url);

        // jarFactory.close(jarFile);
        Method close = jarFactoryClazz.getMethod("close", JarFile.class);
        close.setAccessible(true);
        //noinspection JavaReflectionInvocation
        close.invoke(jarFactory, jarFile);

        // jarFile.close();
        ((JarFile) jarFile).close();
    }
}

And in my test, I just have to say:

        JarBridge.closeJar(jarPath.toUri().toURL());

Right before deleting the temp directory.

Yay!

So, what's the take-away?

Nothing much for you, if you are not directly dealing with JAR files; but if you are, you might run into this kind of obscure "file in use" errors. (That would hold true for other URLConnection-based streams as well.)

If you happen to be as (un)lucky as I was, just recall that some notorious blogger had written some hacky "leak dumper" patch JAR that would show you exactly where your JAR (or non-JAR) leak is.

Adieu!

No comments: