Java Garbage Collector and Reference Objects

garbagecollection4

Introduction

In this article, we will discuss a few memory management concepts in Java with a heavy focus on the interaction between the Garbage Collector and the different reference objects available.

This is no introduction, let us mutually agree that you have Java Heap and GC basics down. Many articles cover this topic quite well, you may actually wonder why am covering something that is already well discussed over the www.

#1. I found that most articles introduce Java memory very well, but start choking later on stuff like reference objects. It is either a variant of “Who doesn’t know that stuff?” attitude or author fatigue. I would like to give my shot and, hopefully, not fall in the same pool.

#2. Memory is a gold mine of interview questions for Senior Dev positions. “Java manages its own memory, I really don’t have to know how it does it”, nice story for your neighbor or close friend…good luck convincing that mean looking interviewer. Morale: You must understand memory.

Analogy

Analogies make understanding computing concepts short and sweet, full of oooh! and uh huh! moments, I hope you experience nothing less.

Picture your college cafeteria. Plates are scarce, but the manager is smart, he derives a strategy together with his staff with one aim: to feed all the hungry students in time without anyone missing food due to shortage of plates.

Strategy0: they plan to collect all used plates for washing whenever a student finishes eating and leaves the cafeteria, such that the same plates are reused by those that have not yet eaten.

Therefore, whenever the serving team reports that plates are running out, a dedicated waitress is sent out to collect all used plates. A plate will be collected as long as the student that was using it has left the cafeteria. These plates are then washed and added to the pile to serve more students.

This strategy works very well and Mr. Manager is pleased with himself and his staff. Soon, he realizes that some students finish eating but stick around chatting and laughing with friends. He had told his collector waitress to only collect a plate when a student has left the table. The result, a lot of dirty plates stuck on tables simply because satisfied students are still seated there meanwhile the cafeteria is frequently running into plate crisis.

Strategy1: Mr. Smart manager has another trick up his sleeve. His instruction to the collector waitress changes: collect a plate as long as a student has finished eating, regardless of whether he/she is still seated on the table or not. Word reaches students of the new and “hostile” changes, a lot of special groups come with complaints seeking for a compromise with Mr. Manager. This strategy is marked a failure.

Strategy2: Being the smart man he is, he comes up with yet more brilliant ideas and he now draws up an instruction set for any collector waitress:

  • If a student is a prefect or guild rep, allow them to retain their plate for as long as they like until they explicitly call you to collect it or they leave the table.
  • If a student is a girl friend or boyfriend to a prefect or guild rep, allow them privileges of their partner… meaning, even if they have finished eating but are still on the table, don’t rush to collect their plates unless there is absolutely no alternative and the cafeteria has literally reached its limit in terms of available plates.
  • If the student is no prefect and not in a relationship with any prefect as is the case with most freshers, be very vigilant about collecting his/her plate, no delays, no compromise. Whether they are still seated on the table or not, as long as they have finished eating…even if they beg and cry. They are not allowed any kind of privilege whatsoever.
  • Finally, the college doc has sent me a list of diabetic students and needs to know the exact time they had their last meal every day in order to determine when to do the routine blood sugar level check. Collect their plates as you do for freshers but always note down the exact time when you pick the plate and notify me.

This turned out to be an excellent strategy, talk about iterative problem solving πŸ™‚ .

References to objects in Java follow a certain hierarchy of strength and privilege as do the students in the cafeteria. Let us dive into the technical rundown.

Strong Reference

In every Java program, objects are the way to hold and manipulate data:

StringBuilder sb = new StringBuilder();

In the above snippet, the new key word creates a StringBuilder object and stores a strong reference to it in variable sb. This is the default level of strength for all objects we create, so we don’t use any special label to identify them as we do for the rest.

To create the rest of the references in the hierarchy, we need special wrapper objects which reside in the java.lang.ref package.

Any object with at least one strong reference is not eligible for garbage collection. In our analogy, prefects hold strong references to their plates. In technical terms, we say the object is strongly reachable.

It only becomes eligible for collection when we nullify its reference, akin to a prefect leaving the table:

sb = null;

Soft Reference

A soft reference can be created to an object by wrapping its instance in a java.lang.ref.SoftReference object:


StringBuilder sb = new StringBuilder();

SoftReference<StringBuilder> sbSoftRef = new SoftReference<>(sb);

sb = null;

In the above snippet, the first line creates a string builder object with a strong reference stored in sb. The second line creates a soft reference to it inside sbSoftRef so that now the string builder object has two references.

At this stage, the string builder is not eligible for collection whatsoever. However, the third line nullifies the strong reference and now the object only boasts of a soft reference.

The object is now similar to a used plate sitting infront of the boyfriend/girlfriend of a prefect, it can be collected only as a last resort when the cafeteria staff is absolutely sure there are no more plates available. Technically, we say the object is softly reachable.

In this phase, we can still retrieve a strong reference to the object by calling the get method of the SoftReference object, which returns null if the object is already collected:

sb = sbSoftRef.get();

Objects with only soft references to it are collected if and only if the JVM has concluded that there is no more memory to allocate to new objects and is on the brink of throwing an OutOfMemory error.

Soft references are intended for use in memory-sensitive caches. As the cache grows, available memory for new objects reduces yet you need the cache, so the JVM compromises with “you” till it can’t anymore.

Weak Reference

A weak reference can be created to an object by wrapping its instance in a java.lang.ref.WeakReference object:


StringBuilder sb = new StringBuilder();

WeakReference<StringBuilder> sbWeakRef = new WeakReference<>(sb);

sb = null;

In the above snippet, after nullifying the strong reference in the third line, the object immediately becomes eligible for GC.

The object is now similar to a used plate sitting infront of a fresher with no privileges, it can be collected right away without any regard for whether the student is still seated at the table or not, his luck is that the waitress has not come his way. Technically, we say the object is weakly reachable.

Though we can still retrieve a strong reference to the object, the window of opportunity is way smaller than that of soft reference, we will get a null much more often than in the other cases:

sb = sbWeakRef.get();

Objects with only weak references are collected eagerly by the the GC whether memory is tight or not.

Weak references are intended for use in Canonicalized mapping. Just 5 minutes of googling will give you understanding of this use case, so I will not dwell…

Phantom Reference

A phantom reference can be created to an object by wrapping its instance in a java.lang.ref.PhantomReference object:


StringBuilder sb = new StringBuilder();

ReferenceQueue<StringBuilder> refQ = new ReferenceQueue<>();

PhantomReference<StringBuilder> sbPhantomRef = new PhantomReference<>(sb, refQ);

sb = null;

In the above snippet, after nullifying the strong reference in the fourth line, the object immediately becomes eligible for GC. Ignore the ReferenceQueue object, that’s for later. For now just remember that PhantomReference unlike soft and weak reference, is useless without a ReferenceQueue.

The object is now similar to a used plate sitting infront of a fresher who is diabetic, it can be collected right away without any regard for whether the student is still seated at the table or not, albeit with one condition: the exact time of collection is noted and the doc notified.

For as long as the piece of paper on which the time has been noted has not yet been used and discarded by the doctor, we refer to the object as phantom reachable.(Major point of confusion, continue reading but watch out πŸ™‚).

The get method of PhantomReference object is useless because it always returns null. This further strengthens the uniqueness of this reference object vis-a-vis soft and weak refs. The next sections will make this uniqueness even more crystal clear.

Phantom references are intended for use as a more flexible alternative to the the Object.finalize() method…read on!

Reference Queue

True to its name, a reference queue is a data structure that enqueues reference objects namely: WeakReference, SoftReference and PhantomReference.

Whether a reference object is enqueued at any time or not depends on whether we provide a ReferenceQueue object on creation of the reference object. Apart from PhantomReference, it is not mandatory or even useful to provide one.

Depending on the type of reference object, the exact point at which it is enqueued varies. I don’t intend to dwell much on this subject apart from furthering the coverage of phantom references. Read the next paragraph very keenly πŸ™‚.

A phantom reference is enqueued as soon as the object it is referring to has been finalized by the the garbage collector. “Finalized” means its finalize() method has been called. The GC calls the finalize() method of objects just before collecting them for the benefit of the application that created them. The benefit is the opportunity to release non “GC’eable” resources that the object must have created or used during its earthly life πŸ™‚ (read Java heap life). Non “GC’eable” resources are those that don’t leave on Java heap and therefore, are inaccessible to the GC. One obvious example is a file handle provided by the OS. To demonstrate, take a look at the finalize method of FileInputStream class in all its glory:


protected void finalize() throws IOException {
if ((fd != null) &&Β  (fd != FileDescriptor.in)) {
/* if fd is shared, the references in FileDescriptor
* will ensure that finalizer is only called when
* safe to do so. All references using the fd have
* become unreachable. We can call close()
*/
close();
}
}

Notice the close() method call at the end, look familiar??? yes, that fis.close() call your Java teacher told you to always place in the finally clause of try/catch/finally blocks. Wonder whether it is a duplicate call since you always close()‘d your file resources? Great, read on!

Now notice the if check at the top of the method body. If you already called close() inside your code, then fd in the above code would be null and the rest of the method body would not be executed. Phewww!!!! (did that come too late???). Read on!

It so happens that, these days, the finalize() method, especially in library/platform classes such as FileInputStream above, acts as a safety net for cases where we as developers are reckless and forget to release non-heap resources.(I may already by whispering to you that you may never have to use phantom refs πŸ™‚ but read on).

So what the heck does all this have to do with phantom references and reference queues, am I losing focus??? you ask and I will now answer:

The finalize() method is plagued with numerous issues that counter almost all the advantages it sought to offer. There is actually a detailed coverage in Joshua Bloch’s Effective Java book plus several blogs talk about its problems extensively. Don’t think that they are just staining the image of an otherwise brilliant API. Unless you are an upcoming James Gosling, I would highly recommend that you just take their word and make peace with other alternatives for cleaning up after your resources…READ ON, WE ARE ALMOST DONE :).

We can now finalize() this section with why phantom refs and ref queues come in. PhantomReference object was created to “provide a more flexible alternative to finalize()”. Whether it has succeeded or not is quite a huge topic, but my bet is NO.

How we as programmers do this is by supplying a ReferenceQueue instance while creating a PhantomReference instance as we saw earlier. As soon as the finalize() method of the object is called, its phantom ref is enqueued. It is upon us to keep polling the reference queue to keep track of when this event happens. Herein, is where we manually release resources including a final call to phantomRef.clear() method to nullify the underlying reference.

Differences Put Together

I know the title of this section is kinda lame but bear with me. I will try to put together the differences of the three reference objects just to clear any doubts:

  • Enqueuing: Soft and weak reference objects are enqueued as soon as the GC, from its algorithms, has “marked” their memory objects for collection, but has not yet finalized them. On the other hand, phantom references are enqueued as soon as the memory objects have been finalized.
  • Reference.get(): The get method of both soft and weak references returns a strong reference to the underlying object or null if that object has already been marked for collection. On the other hand, the get method of phantom reference always returns null (whether you are Josh Bloch or Jon Skeet calling it, kidding ;)). This difference strikes a key point about their use cases, for weak and soft refs to be of any use, one should be able to recreate a strong ref to their objects within the “window of opportunity”. But phantom refs don’t need this quality as we explored earlier.
  • Reference.clear(): This method is available in all the three ref objects. It sets the underlying reference to null hence explicitly availing the memory object for collection before its time. This is a useless facility for weak and soft refs as it would counter their benefit.Β  But it is key for phantom refs as you have to call it on Mr. phantom at the end of your clean up steps.
  • ReferenceQueue: it is useless for soft and weak refs but without it, phantom refs are useless.
  • Use cases: Soft refs: memory-sensitive cache, weak-refs: canonicalized mapping, phantom-refs: more flexible substitute for finalize().

Conclusion

In this article, albeit very long, I have tried to discuss my findings on the relationship between the GC and reference objects. I hope it has, at least, given you a better understanding of the reference objects specifically and the GC generally.

It goes without saying that you may never have to deal with these reference types directly:

  • Need memory-sensitive caching? (a very complex and meticulous endeavor btw to get right), use solid and battle-hardened libs like ehcache.
  • Need canonicalized mapping? simply use java.util.WeakHashMap and save yourself hair loss.
  • Need to do pre-mortem clean up? don’t be deceived by Mr. phantom’s offering as opposed to finalize(), stick to age-old try/catch/finally.