Built-in Garbage Collection in Java:

Rupam Jha
8 min readNov 28, 2021

This article will be covering a complete functionality of how the garbage collector works in a JVM. Moreover it will also brief in about the types and algorithms used by Garbage collectors.

Hello Comrades,

To begin with, let us understand that the Garbage Collection [GC] is a built-in process by which the Java Virtual Machine [JVM] automatically clears the unused object from the heap to reclaim the heap space; there by enabling faster development with less boilerplate code. One might be thinking here that garbage collection collects and discards dead objects; whereas in reality, GC is doing the opposite! Live objects are tracked and everything else designated as garbage.

The Heap space :

As mentioned above; that GC clears the unused objects from “ heap space” , lets understand here what a heap space is…Heap space is a space in memory for the dynamic memory allocation for Java objects and JRE classes at the runtime. New objects are always created in heap space and their references are stored in stack memory. These objects have global access and can be accessed from anywhere in the entire of application.[ for complete understanding on memory allocation refer- https://medium.com/javamadetranquil/memory-allocation-in-java-fb2d4498a3ed ].In most configurations the OS allocates the heap in advance to be managed by the JVM while the program is running; which has ramifications as mentioned — here the object creation gets faster as the global synchronisation with the OS is not needed for every single object. A single allocation simply claims some portion of a memory array and moves the offset pointer ahead and the next allocation starts at this particular mentioned offset and thereby it claims the next portion of the memory array. When an object is no longer used, GC reclaims the underlying memory and reuses it for future object allocation; which means there is no explicit deletion and no portion of memory is given back to the OS . All objects are therefore allocated on the heap space managed by the JVM. As long as an object is being referenced, the JVM considers it alive. Once an object is no longer referenced and therefore is not reachable by the application code, GC removes it and reclaims the unused memory.

It henceforth might raise a question: >>what is the first reference to this entire tree of GC ?

Every object must have one or more root objects. As long as the application can reach to those roots, the whole tree becomes reachable. So how and when are those root objects considered reachable? Special objects called garbage-collection roots (GC roots) are always reachable and so is any object that has a garbage-collection root at its own root. A simple Java application has the following GC roots — Local variables in the main method ; The main thread and the Static variables of the main class.

Role of JIT Compiler in assisting GC:

GC is usually implemented as a single-thread or a set of threads; it requires extra memory and consumes CPU resources just like any other applications. In the traditional implementation of GC , when GC threads work, JVM pauses all application threads until the collection is completed. This is known as a stop-the-world collection. If the collection time is too long, it will make a great impact on the executing efficiency of applications. In order to reduce the space and time cost of GC, other than further improving the garbage collection algorithms, a more effective approach is presented to provide runtime and compiler support for garbage collector. That is to say, Just-in-time (JIT) compiler analyses the applications to instrument the explicit “free” instructions and even the special object allocation instructions to assist GC in improving the object allocation and reclamation.

Working of GC in JVM:

GC follows the basic process wherein the first step, the unreferenced objects are identified and marked as ready for garbage collection. In the second step, marked objects are deleted. Optionally, memory can be compacted after the GC deletes objects, so remaining objects are in a contiguous block at the start of the heap. The compaction process makes it easier to allocate memory to new objects sequentially after the block of memory allocated to existing objects. GC implements a generational garbage collection strategy that categorizes objects by “age”. The rational behind generational garbage collection is that most objects are short-lived and will be ready for garbage collection soon after creation. The generation concept is a portion of heap space ,where the heap space is divided into:

  • Young Generation: Newly created objects start in the Young Generation. The Young Generation is further subdivided into an Eden space, where all new objects start, and two Survivor spaces, where objects are moved from Eden after surviving one garbage collection cycle. When objects are garbage collected from the Young Generation, it is a minor garbage collection event.
  • Old Generation: Objects that are long-lived are eventually moved from the Young Generation to the Old Generation. When objects are garbage collected from the Old Generation, it is a major garbage collection event.
  • Permanent Generation: Metadata such as classes and methods are stored in the Permanent Generation. Classes that are no longer in use may be garbage collected from the Permanent Generation.

Types of Garbage Collector implementation: The JVM has five types of implementations of GC

  • Serial Garbage Collector — This is the simplest GC implementation, as it basically works using a single thread. As a result, this GC implementation freezes all application threads when it runs. Hence, it is not a recommended one to use in a multi-threaded applications. The Serial GC is the GC of choice for most applications that do not have small pause time requirements and run on client-style machines.
  • Parallel Garbage Collector — It is the default GC of the JVM and sometimes called “Throughput Collectors”. Unlike Serial Garbage Collector, this uses multiple threads for managing heap space. But it also freezes other application threads while performing GC. If we use this GC , we can specify maximum garbage collection threads and pause time, throughput, and heap size.The time spent doing garbage collection versus the time spent outside of garbage collection is called the maximum throughput target.
  • CMS Garbage Collector — The Concurrent Mark Sweep (CMS) implementation uses multiple garbage collector threads for garbage collection. It is typically designed for applications that prefer shorter garbage collection pauses, and can afford to share processor resources with the GC while the application is running; which means applications using this type of GC respond slower on average but do not stop responding to perform garbage collection. A quick point to note here is that since this GC is concurrent, an invocation of explicit garbage collection such as using System.gc() while the concurrent process is working, will result in “Concurrent Mode failure ”
  • G1 Garbage Collector — G1 (Garbage First) Garbage Collector is designed for applications running on multi-processor machines with a large memory space. It’s available since JDK7 Update 4 and in later releases. It has replaced the CMS collector as it’s more performance efficient. Unlike other collectors, the G1 collector partitions the heap into a set of equal-sized heap regions, each a contiguous range of virtual memory. When performing garbage collections, G1 shows a concurrent global marking phase -phase 1 known as “Marking” to determine the liveness of objects throughout the heap. After the mark phase is completed, G1 knows which regions are mostly empty. It collects in these areas first, which usually yields a significant amount of free space -phase 2 known as “Sweeping”. It is why this method of garbage collection is called Garbage-First. The G1 GC uses concurrent and parallel phases to achieve its target pause time and to maintain good throughput.
  • Z Garbage Collector —It is a scalable low-latency GC which is introduced in Java 11 as an experimental option for Linux. JDK 14 introduced ZGC under the Windows and macOS operating systems. ZGC has obtained the production status from Java 15 onwards. ZGC performs all expensive work concurrently, without stopping the execution of application threads for any longer than 10 ms, which makes it suitable for applications that require low latency. It uses load barriers with colored pointers to perform concurrent operations when the threads are running and they are used to keep track of heap usage. Similar to G1, ZGC partitions the heap, except that heap regions can have different sizes.

Java Garbage Collection Algorithm:

  • Mark-compact collection method Algorithm: This algorithm is used by the Serial Garbage Collector and Parallel Garbage Collector. The goal of mark-compact algorithm is to shift the live objects in memory together so the fragmentation is eliminated. Mark-compact algorithms can be regarded as a combination of the mark- sweep algorithm and Cheney’s copying algorithm. Here firstly the reachable objects are marked, then a compacting step relocates the reachable (marked) objects towards the beginning of the heap area.
  • Mark and sweep Algorithm: This algorithm is used by CMS Garbage Collector. It consists of a mark phase that is followed up by a sweep phase. During mark phase, the collector walks over all the GC roots (global variables, local variables, stack frames, virtual and hardware registers etc.) and marks every object that it encounters by setting a bit somewhere in/around that object. And during the sweep phase, it walks over the heap and reclaims memory from all the unmarked objects.
  • Snapshot at the beginning Algorithm : This algorithm is used by G1 collector. It takes a snapshot of the set of live objects in the heap at the start of a marking cycle. So the logical conclusion would be: these buffers are used by that algorithm in order to store that snapshot. Snapshot-at-the-beginning algorithm is for tracing, incremental GC note changes made by the mutator to the graph of objects and update the collector state to make it trace relevant edges that the mutator deletes. In order for the collector to miss a reachable object, the following two conditions need to hold at some point during tracing>>The mutator stores a reference to a white object into a black object.>> All paths from any gray objects to that white object are destroyed. Snapshot-at-the-beginning algorithms ensure the second condition [All paths from any gray objects to that white object are destroyed] cannot occur, by causing the collector to process any reference that the mutator overwrites and that might be part of such a path. They are so called because they keep track of references that existed at the beginning of the collection cycle.

Hope this article brings in clarity to you for how the built-in Garbage collector functions.

Rupam Pawan Jha

--

--