What is Java GC (Garbage Collection)?

RyanGomdoriPooh 2021. 10. 5. 02:02

In Java, the developer does not explicitly release memory. Instead, the JVM manages it.

The JVM removes unused objects from the Heap area through GC.

The Heap is the memory area used for dynamic allocation.

The Heap stores objects (reference-type data), typically instances of classes such as String and collections.

The key question is which objects in the Heap to keep and which ones to collect with GC.

GC is designed around the following hypotheses.

The hypotheses behind the design of GC (Garbage Collection)

* Most objects quickly become unreachable. ⇒ They are cleaned up by "Minor GC" in the Young area.
* There are very few references from old objects to young objects. ⇒ This makes it possible to collect young objects ("Minor GC") separately from old objects ("Major GC").

So, which objects does GC collect?

An object is called Reachable if there is a valid chain of references to it from the GC Roots; otherwise it is called Unreachable.

* Unreachable object: an object that cannot be reached from any GC Root.
* Unreachable objects are the targets of GC.
* GC Roots: references in the stack area (local variables and parameters), static variables in the method area, and objects referenced from JNI (native) code.
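To make the Reachable/Unreachable definition concrete, here is a small simulation. It is a sketch only: the heap is modeled as a hypothetical map from object names to the names they reference, and `Reachability` is an illustrative class, not how the JVM stores objects. Objects C and D reference each other, yet they are still Unreachable because no GC Root leads to them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each object is a name, and the heap maps it to the names it references.
class Reachability {

    // Returns every object reachable from the given GC Roots by following references.
    static Set<String> reachableFrom(List<String> roots, Map<String, List<String>> heap) {
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (visited.add(current)) {
                // Follow outgoing references; objects without an entry reference nothing.
                for (String next : heap.getOrDefault(current, List.of())) {
                    pending.push(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = Map.of(
                "A", List.of("B"),
                "B", List.of(),
                "C", List.of("D"),  // C and D reference each other...
                "D", List.of("C")); // ...but no root leads to them.
        // "A" stands in for an object referenced by a local variable on the stack.
        Set<String> live = reachableFrom(List.of("A"), heap);
        for (String obj : heap.keySet()) {
            System.out.println(obj + (live.contains(obj) ? " reachable" : " unreachable"));
        }
    }
}
```

This is also why cycles are not a problem for tracing GC: only the path from the roots matters, not whether an object is referenced at all.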

How does GC work?

The basic principle of GC operation is as follows.

"Mark and Sweep"

1. "Mark": starting from the GC Roots, GC traverses all references and marks every object it can reach. (This is the step that separates Reachable objects from Unreachable ones.)
2. "Sweep": Unreachable (unmarked) objects are removed from the Heap.

--
3. (Added by some algorithms) "Compact": after Sweep, the surviving objects scattered across the Heap are moved together toward the start of the Heap, dividing it into a part where memory is allocated and a part that is free. This prevents memory fragmentation.
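The three steps can be sketched on a toy heap. This is a simplified model, assuming the heap is an array of slots holding object names (`null` = free slot) plus a separate reference map; `MarkSweepCompact` and its methods are hypothetical names for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy heap: an array of slots holding object names (null = free slot).
class MarkSweepCompact {

    // Mark: collect every object reachable from the roots.
    static Set<String> mark(List<String> roots, Map<String, List<String>> refs) {
        Set<String> marked = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            if (marked.add(obj)) {
                refs.getOrDefault(obj, List.of()).forEach(stack::push);
            }
        }
        return marked;
    }

    // Sweep: free every slot whose object was not marked.
    static void sweep(String[] heap, Set<String> marked) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !marked.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    // Compact: slide surviving objects to the start so free space becomes one contiguous block.
    static void compact(String[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                String obj = heap[i];
                heap[i] = null;
                heap[next++] = obj;
            }
        }
    }

    public static void main(String[] args) {
        String[] heap = {"A", "X", "B", null, "Y", "C"};
        Map<String, List<String>> refs = Map.of("A", List.of("C"), "X", List.of("Y"));
        sweep(heap, mark(List.of("A", "B"), refs));
        System.out.println("after sweep:   " + Arrays.toString(heap)); // [A, null, B, null, null, C]
        compact(heap);
        System.out.println("after compact: " + Arrays.toString(heap)); // [A, B, C, null, null, null]
    }
}
```

After Sweep alone the free slots are scattered (fragmentation); Compact turns them into one free block at the end.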

When does GC happen, and how is it handled?

First of all, we need to understand how the Heap is organized.

The Heap is structured as follows.

The architecture of the Heap

* Young Generation (Eden, Survivor 0, Survivor 1): newly created objects are allocated here. Because most objects become unreachable quickly, many objects are created and discarded without ever leaving the Young area. Collecting objects in this area is called Minor GC.
* Old Generation: objects that survive long enough in the Young area are copied here. This area is usually larger than the Young Generation, and because it is larger, GC occurs here less often than in the Young area. Collecting objects in this area is called Major GC.

Process of GC

Eden => Survivor (0, 1) => Old

What is Minor GC?

* Occurs when the Eden area is full.

* In the same Minor GC, the Survivor area that currently holds objects is collected as well.

Sequence of Minor GC

1-1. New objects are created in the Eden area.

1-2. Objects keep being created, and when the Eden area is full, "Minor GC" is executed.

1-3. When "Minor GC" occurs, "Mark" runs over the objects in "Eden".

1-4. "Mark" distinguishes reachable objects from unreachable objects in "Eden".

1-5. Reachable objects are moved to one of the Survivor areas (say, "Survivor 0"), and their age increases by 1. (*Aging)

1-6. After "Mark", "Sweep" runs and removes the unreachable objects from the "Eden" area.

Important concept

* Objects are kept in only one of the two Survivor areas (Survivor 0 or Survivor 1) at a time: when "Minor GC" is executed, the other Survivor area must be empty.

2-1. New objects are again created in the now-empty Eden area.

2-2. Step 1 repeats. When "Eden" fills up again, "Minor GC" runs on "Eden": reachable objects are moved to the Survivor area that was not used in the previous step, and unreachable objects in "Eden" are removed. "Minor GC" also runs on the Survivor area used in the previous step: its reachable objects are moved to that same, previously empty, Survivor area, and its unreachable objects are removed.

2-3. As before, the age of each reachable object that survives "Minor GC" increases by 1. (*Aging)

3-1. When an object in a Survivor area reaches the age threshold, it is moved to the "Old Generation". (*Promotion)

3-2. If the "Old Generation" becomes full, "Major GC (Full GC)" is executed.

The process of generational GC.
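The Minor GC sequence (Eden, two Survivor spaces, aging, promotion) can be simulated in a few lines. This is a hypothetical model for illustration: the `live` set stands in for the Mark step, capacities are counted in objects rather than bytes, and Major GC is not modeled. In HotSpot the age limit is controlled by the real flag `-XX:MaxTenuringThreshold`; the threshold of 2 below is only chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the Minor GC steps: Eden, two Survivor spaces, aging and promotion.
class GenerationalHeap {

    static final class Obj {
        final String name;
        int age; // number of Minor GCs this object has survived
        Obj(String name) { this.name = name; }
        @Override public String toString() { return name + "(age " + age + ")"; }
    }

    final int edenCapacity;      // how many objects fit in Eden
    final int tenuringThreshold; // age at which an object is promoted to Old
    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>(); // Survivor space currently holding objects
    List<Obj> toSurvivor = new ArrayList<>();   // Survivor space that stays empty between GCs
    final List<Obj> old = new ArrayList<>();

    GenerationalHeap(int edenCapacity, int tenuringThreshold) {
        this.edenCapacity = edenCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    // Allocates in Eden, running a Minor GC first if Eden is full.
    // `live` stands in for the Mark step: it names the objects that are still reachable.
    void allocate(String name, Set<String> live) {
        if (eden.size() == edenCapacity) {
            minorGc(live);
        }
        eden.add(new Obj(name));
    }

    void minorGc(Set<String> live) {
        evacuate(eden, live);
        evacuate(fromSurvivor, live);
        eden.clear();          // unreachable objects are left behind and dropped
        fromSurvivor.clear();
        // Swap roles: the filled space now holds objects, the emptied one waits for the next GC.
        List<Obj> emptied = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = emptied;
    }

    // Copies reachable objects out of `space`, aging them and promoting old enough ones.
    private void evacuate(List<Obj> space, Set<String> live) {
        for (Obj o : space) {
            if (!live.contains(o.name)) continue;
            o.age++;                                    // aging
            if (o.age >= tenuringThreshold) old.add(o); // promotion
            else toSurvivor.add(o);
        }
    }

    public static void main(String[] args) {
        GenerationalHeap heap = new GenerationalHeap(2, 2);
        Set<String> live = Set.of("a", "c"); // b and d die young
        for (String name : new String[]{"a", "b", "c", "d", "e"}) {
            heap.allocate(name, live);
        }
        System.out.println("Eden:     " + heap.eden);         // [e(age 0)]
        System.out.println("Survivor: " + heap.fromSurvivor); // [c(age 1)]
        System.out.println("Old:      " + heap.old);          // [a(age 2)]
    }
}
```

Object "a" survives two Minor GCs and is promoted, "c" survives one and stays in a Survivor space, and "b" and "d" never leave Eden.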

What is Major GC?

: When the "Old Generation" is full, a "Stop-the-world" pause occurs and then "Major GC (Full GC)" is executed.

Stop-the-world

* The JVM stops running the application in order to execute GC.
* All threads except the GC threads stop working.
* After the GC work completes, the suspended work resumes.
* In most cases, GC tuning means reducing this stop-the-world time.
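Before tuning, it helps to measure how much time GC actually takes. A minimal sketch using the standard `java.lang.management` API (`ManagementFactory.getGarbageCollectorMXBeans()`): the collector names it prints depend on the JVM and the GC in use, and the reported collection time is an approximate accumulated elapsed time, which for concurrent collectors does not map one-to-one to stop-the-world time. `GcTime` and `totalGcMillis` are illustrative names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads how many collections each collector has run and their accumulated time.
class GcTime {

    // Sum of accumulated collection time (ms) over all collectors; -1 (unsupported) is skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // Allocate lots of short-lived garbage, which should trigger some Minor GCs.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            byte[] garbage = new byte[128];
            sink += garbage.length;
        }
        System.out.println("allocated bytes: " + sink);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("GC time during the loop: " + (totalGcMillis() - before) + " ms");
    }
}
```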

Types of GC (as of JDK 7)

* Serial GC
* Parallel GC
* Parallel Old GC(Parallel Compacting GC)
* Concurrent Mark & Sweep GC (CMS, or Mark & Sweep)
* G1(Garbage First) GC

Serial GC

JVM option: -XX:+UseSerialGC

* Serial GC should not be used on production servers.
* Because it was designed for single-core machines (it collects with a single thread), it wastes processing power on multi-core machines.

* Serial GC is suitable for machines with little memory and few cores.
* The "Young area" uses the bump-the-pointer and TLAB (Thread-Local Allocation Buffer) techniques.

* The "Old area" uses the Mark-Sweep-Compact algorithm.
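Bump-the-pointer and TLABs are allocation techniques for the Young area. The sketch below is a hypothetical model using plain numbers as addresses: Eden keeps a single `top` pointer that allocation simply moves forward, and a TLAB is a chunk of Eden handed to one thread so it can bump its own pointer without taking a lock. The class names and sizes are illustrative, not HotSpot internals.

```java
// Hypothetical sketch of bump-the-pointer allocation in Eden, plus TLABs carved out of it.
class BumpAllocator {

    // Eden is the address range [start, end); `top` marks the first free address.
    static final class Eden {
        private final long end;
        private long top;
        Eden(long start, long size) { this.top = start; this.end = start + size; }

        // Bump-the-pointer: check the space, move `top`. Shared by all threads, so it is synchronized.
        synchronized long allocate(long size) {
            if (top + size > end) return -1; // Eden full: a Minor GC would run here
            long address = top;
            top += size;
            return address;
        }
    }

    // A TLAB is a chunk of Eden owned by one thread, so bumping inside it needs no lock.
    static final class Tlab {
        private final long end;
        private long top;
        Tlab(long start, long size) { this.top = start; this.end = start + size; }

        long allocate(long size) {
            if (top + size > end) return -1; // TLAB exhausted: the thread asks Eden for a new one
            long address = top;
            top += size;
            return address;
        }
    }

    public static void main(String[] args) {
        Eden eden = new Eden(0, 1024);
        long tlabStart = eden.allocate(256);    // a thread takes a 256-byte TLAB from Eden
        Tlab tlab = new Tlab(tlabStart, 256);
        System.out.println(tlab.allocate(16));  // 0
        System.out.println(tlab.allocate(32));  // 16
        System.out.println(eden.allocate(100)); // 256: shared Eden continues after the TLAB
    }
}
```

Because Eden is emptied by every Minor GC, free space in it is always one contiguous block, which is what makes this simple pointer bump possible.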

Parallel GC

JVM option: -XX:+UseParallelGC, -XX:ParallelGCThreads (Minor GC Thread Count)

* Parallel GC uses the same basic algorithm as Serial GC.

* Parallel GC uses multiple threads for GC.
* Parallel GC is suitable for machines with many cores and large amounts of memory.

* Use Parallel GC when throughput is more important than low latency (which CMS GC provides).

Parallel Old GC

JVM option: -XX:+UseParallelOldGC, -XX:ParallelGCThreads (Minor, Full GC Thread Count)

* In Parallel Old GC, the Old area is collected with a Mark-Summary-Compaction algorithm.

* The Summary step differs from Sweep in that it separately identifies the live objects in the areas where GC was previously performed.
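One way to picture the Summary step: once Mark has recorded how much live data each region holds, a summary pass can compute in advance where each region's live objects will land during Compaction, so compaction threads can then work on regions independently. This is a simplified model under stated assumptions (equal-size regions, live bytes per region already known); HotSpot's real summary phase is more involved, and `SummaryPhase` is a hypothetical name.

```java
import java.util.Arrays;

// Simplified model of the idea behind Summary: from per-region live data found by Mark,
// compute where each region's live objects will start after compaction.
class SummaryPhase {

    // liveBytes[i] = bytes of live objects in region i.
    // Returns dest[i] = heap offset where region i's live objects will be moved to.
    static long[] destinations(long[] liveBytes) {
        long[] dest = new long[liveBytes.length];
        long next = 0;
        for (int i = 0; i < liveBytes.length; i++) {
            dest[i] = next;       // region i's survivors slide down to this offset
            next += liveBytes[i]; // the following region's survivors go right after them
        }
        return dest;
    }

    public static void main(String[] args) {
        long regionSize = 1024;
        long[] live = {1024, 300, 0, 700};
        long used = Arrays.stream(live).sum();
        System.out.println("destinations: " + Arrays.toString(destinations(live))); // [0, 1024, 1324, 1324]
        System.out.println("free after compaction: " + (regionSize * live.length - used) + " bytes"); // 2072
    }
}
```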

Concurrent Mark & Sweep GC(CMS GC)

JVM option: -XX:+UseConcMarkSweepGC

* This collector improves on the GCs discussed earlier.

* Although its performance has improved, its GC process is more complex.

* CMS GC focuses on minimizing the STW (stop-the-world) time during the GC process.
* The idea of CMS GC is to identify GC targets as thoroughly as possible while the application keeps running, so that the STW pauses are short.

* Its CPU usage is higher than that of other GCs, because identifying GC targets goes through several complex steps.

CMS GC : Initial Mark -> Concurrent Mark -> Remark -> Concurrent Sweep

Step "Initial Mark"

- It finds only the reachable objects closest to the class loader (the objects directly referenced from the GC Roots), so the stop-the-world pause is very short.

Step "Concurrent Mark"

- It follows the references from the reachable objects found in "Initial Mark" and checks the objects they point to.

- This step runs concurrently while other threads are running.

Step "Remark"

- It identifies objects that were newly added or whose references were removed during the "Concurrent Mark" step. This step is stop-the-world.

Step "Concurrent Sweep"

- The garbage is cleaned up.

- This step also runs while other threads are running.

* Because GC proceeds through these steps, the stop-the-world time is very short.

* CMS GC is used when application response time is critical, and it is also called Low Latency GC.
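Why does CMS need a Remark step? Because the application keeps changing references during Concurrent Mark, marking can miss an object that became reachable in the meantime. The simulation below is a hypothetical sketch: a simple `dirty` set stands in for the card marking real CMS uses to notice such changes, and the application's write is shown right after Concurrent Mark, standing in for a write made while it runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of CMS phases; `dirty` stands in for card marking.
class CmsMarking {
    final Map<String, List<String>> refs = new HashMap<>();
    final Set<String> marked = new HashSet<>();
    final Set<String> dirty = new HashSet<>();

    // Reference store performed by the application, with a write barrier recording the change.
    void writeRef(String from, String to) {
        refs.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        dirty.add(from);
    }

    // Initial Mark (STW, short): mark only the objects the GC Roots point to directly.
    void initialMark(List<String> rootObjects) {
        marked.addAll(rootObjects);
    }

    // Follows references from already-marked objects and marks everything they reach.
    private void trace(Collection<String> start) {
        Deque<String> stack = new ArrayDeque<>(start);
        while (!stack.isEmpty()) {
            for (String next : refs.getOrDefault(stack.pop(), List.of())) {
                if (marked.add(next)) stack.push(next);
            }
        }
    }

    // Concurrent Mark: in real CMS this runs alongside the application threads.
    void concurrentMark() {
        dirty.clear(); // only changes made from now on matter for Remark
        trace(new ArrayList<>(marked));
    }

    // Remark (STW): re-trace from marked objects the application modified meanwhile.
    void remark() {
        List<String> rescan = new ArrayList<>();
        for (String obj : dirty) {
            if (marked.contains(obj)) rescan.add(obj);
        }
        trace(rescan);
        dirty.clear();
    }

    // Concurrent Sweep: every known object that is still unmarked is garbage.
    Set<String> garbage() {
        Set<String> all = new HashSet<>(refs.keySet());
        refs.values().forEach(all::addAll);
        all.removeAll(marked);
        return all;
    }

    public static void main(String[] args) {
        CmsMarking cms = new CmsMarking();
        cms.refs.put("A", new ArrayList<>(List.of("B")));
        cms.refs.put("G", new ArrayList<>(List.of("H"))); // G and H are unreachable
        cms.initialMark(List.of("A"));
        cms.concurrentMark();   // marks A and B
        cms.writeRef("B", "N"); // the application links a new object N from B
        System.out.println("N marked before Remark: " + cms.marked.contains("N")); // false
        cms.remark();
        System.out.println("N marked after Remark:  " + cms.marked.contains("N")); // true
        System.out.println("garbage: " + cms.garbage()); // G and H
    }
}
```

Without Remark, N would be swept while still in use; Remark only has to rescan the few changed objects, which is why its pause stays short.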

However, while CMS GC has the advantage of short stop-the-world time, the following disadvantages exist.

* It uses more memory and CPU than other GC methods.

* It does not perform compaction by default.
* Therefore, CMS GC should be adopted only after careful review.

* If memory becomes heavily fragmented and compaction is needed, the STW pause for it is longer than in other GCs, so you have to check how often and for how long compaction runs.

G1 GC (G1: Garbage First)

JVM option: -XX:+UseG1GC

G1 manages heap memory differently from the collectors discussed earlier.

Unlike the Parallel and CMS collectors, G1 does not give each generation a contiguous memory space with fixed boundaries.

It also does not use a hemispherical heap layout.

The G1 heap is divided into logical units called Regions.

Regions allow the generations to be placed non-contiguously.

And the collector does not need to collect garbage from the entire heap every time it runs.

Characteristics of G1 GC
* It was designed for multiprocessor systems with large amounts of memory.
* It is much easier to tune than CMS.
* It is less vulnerable to premature promotion. (Premature promotion is the problem of objects being promoted to the Old area too early, for example because the allocation rate is too high.)
* It scales very well on large heaps (especially in terms of pause time).
* It has been the default GC since Java 9.

The G1 collector keeps a Remembered Set (RSet) for each region, which tracks the objects in other regions that hold references pointing into that region.

Thanks to this, to find the references pointing into a region, G1 only needs to examine that region's RSet instead of scanning the entire heap.
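The RSet (Remembered Set) idea can be sketched as follows. This is a hypothetical object-level model: real G1 records card-sized chunks of memory rather than individual objects, and `RememberedSets` is an illustrative class. A write barrier records every cross-region reference in the target region's RSet, so finding the live objects of one region only needs the GC Roots plus that region's RSet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: objects live in numbered regions, and each region keeps an RSet of
// objects in *other* regions that reference into it.
class RememberedSets {
    final Map<String, Integer> regionOf = new HashMap<>();
    final Map<String, List<String>> refs = new HashMap<>();
    final Map<Integer, Set<String>> rset = new HashMap<>();

    void place(String obj, int region) { regionOf.put(obj, region); }

    // Reference store with a write barrier: cross-region references go into the target's RSet.
    void writeRef(String from, String to) {
        refs.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        int source = regionOf.get(from);
        int target = regionOf.get(to);
        if (source != target) {
            rset.computeIfAbsent(target, k -> new HashSet<>()).add(from);
        }
    }

    // Live objects of one region, found from root objects plus that region's RSet only.
    Set<String> liveIn(int region, Set<String> rootObjects) {
        Deque<String> stack = new ArrayDeque<>();
        for (String r : rootObjects) {
            if (regionOf.get(r) == region) stack.push(r);
        }
        for (String from : rset.getOrDefault(region, Set.of())) {
            for (String to : refs.get(from)) {
                if (regionOf.get(to) == region) stack.push(to);
            }
        }
        Set<String> live = new HashSet<>();
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            if (!live.add(obj)) continue;
            for (String next : refs.getOrDefault(obj, List.of())) {
                if (regionOf.get(next) == region) stack.push(next); // never leave the region
            }
        }
        return live;
    }

    public static void main(String[] args) {
        RememberedSets heap = new RememberedSets();
        heap.place("a", 0); heap.place("b", 0);
        heap.place("c", 1); heap.place("d", 1); heap.place("e", 1);
        heap.writeRef("a", "c"); // cross-region: recorded in region 1's RSet
        heap.writeRef("c", "d"); // same region: no RSet entry
        System.out.println("RSet of region 1: " + heap.rset.get(1));
        System.out.println("live in region 1: " + heap.liveIn(1, Set.of("a"))); // c and d; e is garbage
    }
}
```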

The collection steps of G1 are as follows.

"Initial Mark-STW"

- STW occurs.

- Mark the Survivor regions (root regions) that may hold references to objects in the Old area. This step piggybacks on a regular Young GC.

"Root Region Scanning"

- Scan the Survivor regions marked in the "Initial Mark-STW" step for references pointing into the Old area. This step runs while the application keeps running.

"Concurrent Mark"

- Find reachable objects across the entire heap, concurrently with the application.

"Remark-STW"

- STW occurs.

- Complete the work of the "Concurrent Mark" step and determine the objects that will finally survive. This step uses the snapshot-at-the-beginning (SATB) algorithm.

"Cleanup-STW"

- Stop the application (STW) and remove unused objects, starting with the regions that have the fewest reachable objects.

- After the STW ends, regions that were completely emptied during GC are added to the free list so they can be reused.

"Copy"

- Live objects in regions that were collected during Cleanup but not completely emptied are copied into new regions, which compacts the heap.
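The last two steps can be sketched as operations on a region table. This is a hypothetical model: regions map an id to the objects they contain, `live` is the result of marking, and all survivors are assumed to fit in a single fresh region taken from the free list. `G1Evacuation` and its methods are illustrative names, not G1's actual code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of Cleanup (free empty regions) and Copy (evacuate survivors).
class G1Evacuation {

    // Cleanup: regions with no live objects go straight back to the free list.
    static void cleanup(Map<Integer, List<String>> regions, Set<String> live, Deque<Integer> freeList) {
        Iterator<Map.Entry<Integer, List<String>>> it = regions.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, List<String>> e = it.next();
            if (e.getValue().stream().noneMatch(live::contains)) {
                freeList.addLast(e.getKey());
                it.remove();
            }
        }
    }

    // Copy: live objects of the remaining regions are packed into one region taken from the
    // free list (assumes one is available and everything fits); the sources are then freed.
    static void copy(Map<Integer, List<String>> regions, Set<String> live, Deque<Integer> freeList) {
        int target = freeList.removeFirst();
        List<String> packed = new ArrayList<>();
        for (List<String> objs : regions.values()) {
            for (String obj : objs) {
                if (live.contains(obj)) packed.add(obj);
            }
        }
        for (Integer id : regions.keySet()) freeList.addLast(id);
        regions.clear();
        regions.put(target, packed);
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> regions = new TreeMap<>();
        regions.put(1, List.of("a", "x"));
        regions.put(2, List.of("y", "z"));
        regions.put(3, List.of("b", "w"));
        Deque<Integer> freeList = new ArrayDeque<>(List.of(9));
        Set<String> live = Set.of("a", "b");

        cleanup(regions, live, freeList);
        System.out.println("after Cleanup: " + regions + ", free " + freeList); // {1=[a, x], 3=[b, w]}, free [9, 2]
        copy(regions, live, freeList);
        System.out.println("after Copy:    " + regions + ", free " + freeList); // {9=[a, b]}, free [2, 1, 3]
    }
}
```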

G1 GC has been officially available since JDK 7 and is now widely used, so let's take a closer look at its options.

-XX:G1HeapRegionSize
Default: none (the JVM chooses it based on the heap size)
Region size. It must be a power of two between 1MB and 32MB. The goal is to have about 2048 regions based on the minimum heap size.
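A rough sketch of how a default region size follows from the "about 2048 regions" goal: divide the heap size by 2048, round down to a power of two, and clamp to 1MB..32MB. This is an assumption about HotSpot's ergonomics, not its exact code: which heap figure is used (minimum, maximum, or an average) has varied between JDK versions, and newer JDKs allow larger regions.

```java
// Hypothetical sketch of deriving a default G1 region size from the heap size.
class G1RegionSize {
    static final long MB = 1024L * 1024;
    static final long TARGET_REGIONS = 2048;

    static long defaultRegionSize(long heapBytes) {
        long size = Math.max(heapBytes / TARGET_REGIONS, MB);
        size = Long.highestOneBit(size);           // round down to a power of two
        return Math.min(Math.max(size, MB), 32 * MB); // clamp to 1MB..32MB
    }

    public static void main(String[] args) {
        for (long heapMb : new long[]{1024, 4096, 6144, 102400}) {
            long region = defaultRegionSize(heapMb * MB);
            System.out.printf("heap %6d MB -> region %2d MB, %d regions%n",
                    heapMb, region / MB, heapMb * MB / region);
        }
    }
}
```

Note that the result is only "about" 2048 regions: rounding down gives 3072 regions for a 6GB heap, and the 32MB cap gives 3200 for a 100GB heap.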

-XX:MaxGCPauseMillis
Default: 200
Target maximum STW time caused by G1 GC. G1 tries to meet this value as closely as possible, but it is not guaranteed.

-XX:DefaultMinNewGenPercent
Default: 5
Minimum size of the heap to be used as the Young area (ratio to the total heap size, %). Later JDK versions replace this option with the experimental -XX:G1NewSizePercent.

-XX:DefaultMaxNewGenPercent
Default: 60
Maximum size of the heap to be used as the Young area (ratio to the total heap size, %). Later JDK versions replace this option with the experimental -XX:G1MaxNewSizePercent.

-XX:ParallelGCThreads
Default: depends on the number of CPU cores
Number of threads that perform GC during STW. If the machine has 8 or fewer CPU cores, it is recommended to set it to the number of cores.

-XX:ConcGCThreads
Default: derived from ParallelGCThreads
Number of threads that perform Concurrent Mark. It is recommended to set it to about 25% of ParallelGCThreads.
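The two recommendations above line up with how HotSpot commonly derives the defaults. The formulas below are an assumption based on typical HotSpot behavior (one GC thread per core up to 8 cores, then 5/8 of each extra core; G1's concurrent threads at roughly a quarter of that, at least 1); exact rules can differ between JDK versions and platforms.

```java
// Hypothetical sketch of common default thread counts for ParallelGCThreads and ConcGCThreads.
class GcThreadDefaults {

    // ParallelGCThreads: one thread per core up to 8 cores, then 5/8 of each additional core.
    static int parallelGcThreads(int cores) {
        return cores <= 8 ? cores : 8 + (cores - 8) * 5 / 8;
    }

    // ConcGCThreads for G1: roughly a quarter of ParallelGCThreads, at least 1.
    static int concGcThreads(int parallelThreads) {
        return Math.max((parallelThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int parallel = parallelGcThreads(cores);
        System.out.println("cores=" + cores + ", ParallelGCThreads=" + parallel
                + ", ConcGCThreads=" + concGcThreads(parallel));
    }
}
```

For example, a 16-core machine would get 13 parallel GC threads and 3 concurrent marking threads under these formulas.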

-XX:InitiatingHeapOccupancyPercent
Default: 45
Heap occupancy, as a percentage (%) of the total heap size, at which G1 starts a concurrent marking cycle.
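The threshold check itself is simple arithmetic: with an 8GB heap and the default of 45, marking starts once occupancy passes about 3.6GB. A minimal sketch (`IhopCheck` is a hypothetical name); what counts as "occupied" has differed between JDK versions, and since JDK 9 G1 by default adapts this threshold at runtime (Adaptive IHOP), so the fixed percentage acts as a starting value.

```java
// Hypothetical sketch of the InitiatingHeapOccupancyPercent check.
class IhopCheck {

    static boolean shouldStartMarking(long occupiedBytes, long heapBytes, int ihopPercent) {
        long threshold = heapBytes * ihopPercent / 100; // integer math on bytes
        return occupiedBytes > threshold;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        System.out.println(shouldStartMarking(3 * gb, 8 * gb, 45)); // false: 3GB is below ~3.6GB
        System.out.println(shouldStartMarking(4 * gb, 8 * gb, 45)); // true: 4GB is above ~3.6GB
    }
}
```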

-XX:G1OldCSetRegionLiveThresholdPercent
Default: 65
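Live-data threshold for an Old region to be included in a mixed GC: only Old regions whose live objects occupy less than this percentage of the region are collected. Later JDK versions replace this option with the experimental -XX:G1MixedGCLiveThresholdPercent.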
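How the live threshold and the "Garbage First" idea combine can be sketched as a selection step: keep only Old regions whose live percentage is below the threshold, and order them so the regions with the least live data (the most garbage) come first. This is a hypothetical sketch; real G1 also limits how many regions it takes per mixed GC based on the pause-time goal, and defaults for the threshold differ between JDK versions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of choosing Old regions for a mixed GC.
class MixedGcCandidates {

    // livePercent: region id -> percentage of the region occupied by live objects.
    static List<Integer> candidates(Map<Integer, Integer> livePercent, int thresholdPercent) {
        return livePercent.entrySet().stream()
                .filter(e -> e.getValue() < thresholdPercent)           // too full of live data: skip
                .sorted(Map.Entry.<Integer, Integer>comparingByValue()  // most garbage first
                        .thenComparing(Map.Entry.<Integer, Integer>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, Integer> live = Map.of(1, 90, 2, 10, 3, 60, 4, 30);
        System.out.println(candidates(live, 65)); // [2, 4, 3]: region 1 is 90% live, so it is skipped
    }
}
```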

Memory Management

Java does not access the OS memory area directly; it uses memory indirectly through a virtual machine, the JVM.

All the tricky parts of memory management are handed over to the Java virtual machine.

* Memory leaks in Java?

- A memory leak is a situation in which objects that are no longer used are not reclaimed by GC and keep accumulating. As a result, Major GC runs frequently because of the objects piling up in the Old area, the program's response time gets worse, and eventually the program terminates with an OutOfMemoryError (OOM).

Because Java runs on a virtual machine, it has the advantage (besides being independent of the operating system) that memory leaks at the OS level are impossible. However, memory leaks can still occur.

The reason is that if you hold a reference to an object that the program no longer actually uses, that object keeps occupying memory (address space) as garbage: GC cannot reclaim it, yet the program will never access or use it again.

In other words, the memory Java considers "in use" is only memory that might still be used, so it can include garbage objects that are not logically in use, and these objects cause memory leaks.
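A minimal example of such a "logical" leak: the buffers below stay reachable through a static list (static data is a GC Root), so GC can never reclaim them even though the program never reads them again. `StaticListLeak` and `REQUEST_LOG` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: objects kept reachable through a static collection are never collected.
class StaticListLeak {
    // Reachable for as long as the class is loaded, because static data is a GC Root.
    static final List<byte[]> REQUEST_LOG = new ArrayList<>();

    static void handleRequest(boolean leaky) {
        byte[] buffer = new byte[1024]; // per-request working data
        if (leaky) {
            REQUEST_LOG.add(buffer);    // bug: never removed, so it never becomes garbage
        }
        // Without the add, `buffer` becomes unreachable when this method returns.
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            handleRequest(true);
        }
        System.out.println("retained buffers: " + REQUEST_LOG.size()); // 10000
        REQUEST_LOG.clear(); // dropping the references lets GC reclaim the buffers
    }
}
```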

Memory leak examples
- Frequently declaring global (static) variables, or keeping objects in collections such as Lists or HashMaps without ever removing them.
- Creating needless objects with wrapper classes such as Integer and Long.
- Storing cache data in a Map without ever releasing it.
- Using a stream object without closing it.
- Using a user-defined object as a Map key without overriding equals() and hashCode(), so that keys meant to be equal are treated as different and entries keep accumulating.
- Using a user-defined object as a Map key that overrides equals() and hashCode() but is not immutable: if the key's fields keep changing, existing entries can no longer be found, so they keep accumulating.
- Memory not being released because of an implementation error in a data structure you wrote yourself.
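The two Map-key cases can be reproduced directly. In this sketch, `BadKey` and `MutableKey` are hypothetical classes: the first has no equals()/hashCode(), so every instance is a different key; the second has them, but changing its field after insertion changes its hash code, so the entry can no longer be found or removed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key classes reproducing two Map-related leak patterns.
class MapKeyLeaks {

    // Key without equals()/hashCode(): identity is used, so equal-looking keys never match.
    static final class BadKey {
        final String id;
        BadKey(String id) { this.id = id; }
    }

    // Key with equals()/hashCode() but a mutable field.
    static final class MutableKey {
        String id;
        MutableKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id.equals(id);
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<BadKey, String> cache = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            cache.put(new BadKey("user-1"), "profile"); // meant to overwrite a single entry
        }
        System.out.println("BadKey entries: " + cache.size()); // 1000, not 1

        Map<MutableKey, String> sessions = new HashMap<>();
        MutableKey key = new MutableKey("session-1");
        sessions.put(key, "data");
        key.id = "session-2";                         // hash code changes after insertion
        sessions.remove(new MutableKey("session-1")); // not found: equals() no longer matches
        sessions.remove(key);                         // not found: lookup uses the new hash code
        System.out.println("MutableKey entries: " + sessions.size()); // 1, the entry is stuck
    }
}
```

Making key classes immutable (final fields, set only in the constructor) and always pairing equals() with hashCode() avoids both problems.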

Conclusion

I haven't worked on a large-scale project yet, so I haven't had much hands-on experience with GC tuning, but I think a deep understanding of GC is necessary to become a great Java developer.

I think I need to know how GC works, how to tune it, and what kinds of memory leaks occur, so that I can write better-performing code on large-scale projects.

Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)