Foreword: The two issues described here were discovered and fixed more than a year ago. This article serves only as historical proof, and as a beginner's guide to tackling file descriptor leaks in Java.
In Ultra ESB we use an in-memory RAM disk file cache for fast and garbage-free payload handling. Some time back, we faced an issue on our shared SaaS AS2 Gateway where this cache was leaking file descriptors over time, eventually leading to too many open files errors once the system ulimit was hit.
The Legion of the Bouncy Castle: leftovers from your stream-backed MIME parts?
One culprit, we found, was Bouncy Castle - the famous security provider that had been our profound love since the Ultra ESB Legacy days.
With some simple tooling we found that BC had the habit of calling getContent() on MIME parts in order to determine their type (say, for instanceof checks). True, this wasn't a crime in itself; but most of our MIME parts were file-backed, with a file-cache file on the other end, meaning that every getContent() call opened a new stream to the file. So now there were stray streams (and hence file descriptors) pointing to our file cache.
Enough of these, and we would exhaust the file descriptor quota allocated to the Ultra ESB (Java) process.
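To see the problem in miniature, here is roughly what that pattern looks like with a plain javax.activation FileDataSource backing a MIME part (a simplified, hypothetical reconstruction, not BC's or Ultra ESB's actual code):

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import javax.activation.DataSource;
import javax.activation.FileDataSource;

public class StrayStreamDemo {

    // Simulates a library peeking at a file-backed MIME part's content:
    // every getInputStream() call opens a brand-new FileInputStream
    // (and hence a new file descriptor) against the cache file.
    static void peekTwice(File cacheEntry) throws IOException {
        DataSource ds = new FileDataSource(cacheEntry);
        InputStream first = ds.getInputStream();   // descriptor #1
        InputStream second = ds.getInputStream();  // descriptor #2
        // Neither stream is closed here; both descriptors linger until
        // the streams are eventually garbage-collected.
    }
}
```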
Solution? Make 'em lazy!
We didn't want to mess with the BC codebase. So we found a simple solution: create all file-backed MIME parts with "lazy" streams. Our (former) colleague Rajind wrote a LazyFileInputStream - inspired by LazyInputStream from jboss-vfs - that opens the actual file only when a read is attempted.
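For the curious, here is a minimal sketch of the idea (not the actual Ultra ESB or jboss-vfs implementation, which would also deal with concerns like thread safety and mark/reset):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Opens the underlying file only when the first read is attempted,
 * so merely constructing the stream costs no file descriptor.
 */
public class LazyFileInputStream extends InputStream {

    private final File file;
    private InputStream delegate;

    public LazyFileInputStream(File file) {
        this.file = file;
    }

    // Open the real FileInputStream on first use only.
    private InputStream delegate() throws IOException {
        if (delegate == null) {
            delegate = new FileInputStream(file);
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate().read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        return delegate().skip(n);
    }

    @Override
    public int available() throws IOException {
        return delegate().available();
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }
}
```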
BC was happy, and so was the file cache; but we were the happiest.
Hibernate JPA: cleaning up after supper, a.k.a. closing consumed streams
Another bug we spotted was that some database operations were leaving behind unclosed file handles. Apparently this happened only when we were feeding stream-backed blobs to Hibernate, where the streams often came from file cache entries.
After some digging, we came up with a theory that Hibernate was not closing the underlying streams of these blob entries. (It made sense, because the java.sql.Blob interface does not expose any methods that Hibernate could use to manipulate the underlying data sources.) This was a problem, though, because the discarded streams (and the associated file handles) would not get released until the next GC.
This would have been fine for a short-lived app, but a long-running one like ours could easily run out of file descriptors, say, in the case of a sudden and persistent spike.
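For context, the feeding side looked roughly like this (a simplified, hypothetical sketch; the Payload entity and method names are made up): we wrap a cache-backed stream in a Blob via Hibernate's LobHelper and hand it over, after which the stream's fate is out of our hands.

```java
import java.io.InputStream;
import java.sql.Blob;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;
import org.hibernate.Session;

// Hypothetical entity holding the streamed payload as a BLOB column.
@Entity
class Payload {
    @Id @GeneratedValue
    Long id;

    @Lob
    Blob content;
}

public class PayloadDao {

    // Wrap a file-cache-backed stream in a Blob (no in-memory buffering)
    // and hand it to Hibernate. The stream is only consumed later, when
    // the session flushes inside its own transactional scope - and the
    // java.sql.Blob contract gives Hibernate no way to close the stream
    // we passed in once it is done reading.
    public void savePayload(Session session, InputStream cacheStream, long length) {
        Payload payload = new Payload();
        payload.content = session.getLobHelper().createBlob(cacheStream, length);
        session.persist(payload);
    }
}
```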
Solution? Make 'em self-closing!
We didn't want to lose the benefits of streaming, but we didn't have control over our streams either. You might say we should have placed our streams in auto-closeable constructs (say, try-with-resources). Nice try; but sadly, Hibernate was reading them outside of our execution scope (especially in @Transactional flows). As soon as we started closing the streams within our code scope, our database operations started to fail miserably - screaming "stream already closed!".
When in Rome, do as the Romans do, they say.
So, instead of messing with Hibernate, we decided we would take care of the streams ourselves.
Rajind (yeah, him again) hacked together a SelfClosingInputStream wrapper. This would keep track of the amount of data read from the underlying stream, and close it up as soon as the last byte was read. (We did consider existing options like AutoCloseInputStream from Apache commons-io; but it turned out that we needed a few customizations here and there - like detailed trace logging.)
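Here is a minimal sketch of the idea, closing on end-of-stream rather than tracking the exact byte count like the real implementation did:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Closes the underlying stream as soon as the consumer hits end-of-stream,
 * so the file descriptor is released even if the caller never calls close().
 */
public class SelfClosingInputStream extends FilterInputStream {

    private boolean closed;

    public SelfClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b == -1) {
            close();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n == -1) {
            close();
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true;
            in.close();
        }
    }
}
```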
Bottom line
When it comes to resource management in Java, it is quite easy to over-focus on memory and CPU (processing), and forget about the rest. But virtual resources - like ephemeral ports and per-process file descriptors - can be just as important, if not more so.
Especially on long-running processes like our AS2 Gateway SaaS application, they can become silent killers.
You can detect this type of leak in two main ways:
- "single-cycle" resource analysis: run a single, complete processing cycle, comparing resource usage before and after
- long-term monitoring: continuously recording and analyzing resource metrics to identify trends and anomalies
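For the long-term monitoring route, one crude but handy trick on Linux (assuming a HotSpot/OpenJDK JVM that exposes com.sun.management.UnixOperatingSystemMXBean) is to periodically sample the process's open descriptor count and feed it into whatever metrics pipeline you already have. A hypothetical sketch:

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdMonitor {

    public static void main(String[] args) {
        // On Linux/HotSpot the platform MXBean implements the Unix-specific
        // interface that exposes file descriptor counts.
        UnixOperatingSystemMXBean os =
                (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Sample descriptor usage every minute; an open-count that climbs
        // steadily and never comes back down is the classic leak signature.
        scheduler.scheduleAtFixedRate(() -> System.out.printf(
                "open fds: %d / %d%n",
                os.getOpenFileDescriptorCount(),
                os.getMaxFileDescriptorCount()),
                0, 1, TimeUnit.MINUTES);
    }
}
```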
In any case, fixing the leak is not too difficult once you have a clear picture of what you are dealing with.
Good luck with hunting down your resource-hog d(a)emons!