Garbage Collector
- Introduction
- Terms
- Code
- Data Structures
- Minor Collection
- Major Collection
- Looking Ahead
- Further Reading
Introduction
The goal of this paper is to provide a high-level overview of Cyclone's garbage collector. The collector has the following requirements:
- Automatically free allocated memory.
- Allow the language implementation to support tail calls and continuations.
- Allow the language to support native multithreading.
Cyclone uses generational garbage collection (GC) to automatically free allocated memory using two types of collection. In practice, most allocations consist of short-lived objects such as temporary variables. Minor GC is done frequently to clean up most of these short-lived objects. Some objects will survive this collection because they are still referenced in memory. A major collection runs less often to free longer-lived objects that are no longer being used by the application.
Cheney on the MTA, a technique introduced by Henry Baker, is used to implement the first generation of our garbage collector. Objects are allocated directly on the stack using alloca, so allocations are very fast, do not cause fragmentation, and do not require a special pass to free unused objects. Baker's technique uses a copying collector for both the minor and major generations of collection. One of the drawbacks of using a copying collector for major GC is that it relocates all the live objects during collection. This is problematic for supporting native threads because an object can be relocated at any time, invalidating any references to the object. To prevent this, either all threads must be stopped while major GC is running or a read barrier must be used each time an object is accessed. Both options add potentially significant overhead, so instead another type of collector is used for the second generation.
Cyclone supports native threads by using a tracing collector based on the Doligez-Leroy-Gonthier (DLG) algorithm for major collections. An advantage of this approach is that objects are not relocated once they are placed on the heap. In addition, major GC executes asynchronously so threads can continue to run concurrently even during collections.
Terms
- Collector - A thread running the garbage collection code. The collector is responsible for coordinating and performing most of the work for major garbage collections.
- Continuation - With respect to the collectors, this is a function that is called to resume execution. For more information see this article on continuation passing style.
- Forwarding Pointer - When a copying collector relocates an object it leaves one of these pointers behind with the object's new address.
- Mutation - A modification to an object. For example, changing a vector (array) entry.
- Mutator - A thread running user (or "application") code; there may be more than one mutator running concurrently.
- Read Barrier - Code that is executed before reading an object. Read barriers have a larger overhead than write barriers because object reads are much more common.
- Root - During tracing the collector uses these objects as the starting point to find all reachable data.
- Write Barrier - Code that is executed before writing to an object.
Code
The implementation code is available here:
- runtime.c contains most of the runtime system, including code to perform minor GC. A good place to start would be the GC and gc_minor functions.
- gc.c contains the major GC code.
Data Structures
Heap
The heap consists of a linked list of pages. Each page contains a contiguous block of memory and a linked list of free chunks. When a new chunk is requested the first free chunk large enough to meet the request is found and either returned directly or carved up into a smaller chunk to return to the caller.
Memory is always allocated in multiples of 32 bytes. On the one hand, this helps prevent external fragmentation by allocating many objects of the same size. On the other hand, it incurs internal fragmentation because an object will not always fill all of its allocated memory.
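The first-fit search, chunk splitting and 32-byte rounding described above can be sketched in C as follows. This is a minimal illustration, not Cyclone's allocator (which is derived from chibi scheme): the free_chunk layout and the names round_up, page_new and page_alloc are hypothetical, a page is modeled as a single malloc'd block, and heap locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GRANULE 32  /* allocation unit: requests are rounded up to 32 bytes */

/* Hypothetical free-chunk header, stored inside the free memory itself. */
typedef struct free_chunk {
  size_t size;              /* chunk size in bytes, a multiple of GRANULE */
  struct free_chunk *next;  /* next free chunk in the same page */
} free_chunk;

/* Round a request up to the next multiple of GRANULE (minimum one granule). */
static size_t round_up(size_t n) {
  if (n == 0) n = 1;
  return (n + GRANULE - 1) / GRANULE * GRANULE;
}

/* Create a page whose free list is one chunk covering the whole page.
   `bytes` must be a multiple of GRANULE. */
static free_chunk *page_new(size_t bytes) {
  assert(bytes % GRANULE == 0 && bytes >= sizeof(free_chunk));
  free_chunk *c = malloc(bytes);
  if (c) {
    c->size = bytes;
    c->next = NULL;
  }
  return c;
}

/* First fit: return the first free chunk that is large enough, splitting off
   any remainder as a new free chunk. Returns NULL if nothing fits, in which
   case the caller would add a new page to the heap. */
static void *page_alloc(free_chunk **free_list, size_t request) {
  size_t size = round_up(request);
  for (free_chunk **link = free_list; *link != NULL; link = &(*link)->next) {
    free_chunk *c = *link;
    if (c->size < size) continue;
    if (c->size > size) {                 /* carve the request off the front */
      free_chunk *rest = (free_chunk *)((char *)c + size);
      rest->size = c->size - size;
      rest->next = c->next;
      *link = rest;
    } else {                              /* exact fit: unlink the whole chunk */
      *link = c->next;
    }
    return c;
  }
  return NULL;
}
```

Because every chunk size is a multiple of 32 bytes, any remainder left after a split is at least one granule, which is always large enough to hold a free_chunk header.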
The heap is locked during allocation and sweep operations to protect against concurrent access.
If there is not enough free memory to fulfill a request, a new page is allocated and added to the heap. Unfortunately, there is no alternative: the collection process is asynchronous, so memory cannot be freed immediately to make room.
Cyclone's heap is based on the implementation from chibi scheme.
Thread Data
At runtime Cyclone passes the current continuation, number of arguments, and a thread data parameter to each compiled C function. The thread data contains all of the necessary information to perform collections, including:
- Thread state
- Stack boundaries
- Current continuation and arguments
- Jump buffer
- Call history buffer
- Exception handler stack
- Contents of the minor GC write barrier
- Major GC parameters
Object Header
Each object contains a header with the following information:
- Tag - A number indicating the object type: cons, vector, string, etc.
- Mark - The status of the object's memory. Possible values are:
  - Blue - Unallocated memory.
  - Red - Objects on the stack.
  - White - Heap memory that has not been scanned by the collector.
  - Gray - Objects marked by the collector that may still have child objects that must be marked.
  - Black - Objects marked by the collector whose immediate child objects have also been marked.
- Grayed - A field indicating the object has been grayed but has not been added to a mark buffer yet. This is only applicable for objects on the stack.
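A header carrying these fields might be declared as below. This is a hedged sketch: the type names, the small set of tags, the one-byte field widths and the helper functions are assumptions for illustration rather than Cyclone's actual definitions. Gray deliberately has no stored value, since gray objects are tracked by placing them in mark buffers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of object type tags. */
typedef enum { TAG_CONS, TAG_VECTOR, TAG_STRING } object_tag;

/* Stored mark values. Gray is not stored: gray objects live in mark buffers.
   The numbers meaning white and black trade places each cycle; fixed names
   are used here only for readability. */
typedef enum {
  MARK_BLUE,   /* unallocated memory */
  MARK_RED,    /* object on a thread's stack */
  MARK_WHITE,  /* heap memory not yet scanned by the collector */
  MARK_BLACK   /* heap object whose immediate children are marked */
} mark_color;

typedef struct {
  uint8_t tag;     /* an object_tag value */
  uint8_t mark;    /* a mark_color value */
  uint8_t grayed;  /* stack objects only: gray once moved to the heap */
} object_header;

/* Hypothetical pair object: every object starts with the header. */
typedef struct {
  object_header hdr;
  void *car;
  void *cdr;
} cons_cell;

static int is_stack_object(const object_header *h) {
  return h->mark == MARK_RED;
}

/* True for a stack object that must be grayed when it is relocated. */
static int needs_graying_after_move(const object_header *h) {
  return is_stack_object(h) && h->grayed;
}
```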
Mark Buffers
Mark buffers are used to hold gray objects instead of explicitly marking objects gray. Each mutator has a reference to a mark buffer holding its gray objects. A last write variable is used to keep track of the buffer size.
The collector updates the mutator's last read variable each time it marks an object from the mark buffer. Marking is finished when last read and last write are equal. The collector also maintains a single mark stack of objects that the collector has marked gray.
These mark buffers consist of fixed-size pointer arrays that are increased in size as necessary using realloc.
Finally, an object on the stack cannot be added to a mark buffer because the reference may become invalid before it can be processed by the collector.
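A single-threaded sketch of one mutator's mark buffer follows. The struct layout, the initial capacity of 128 entries and the doubling growth policy are assumptions, and all synchronization between the mutator (which advances last_write) and the collector (which advances last_read) is omitted; in particular, a real implementation must not let the collector read the array while realloc is moving it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-mutator mark buffer (no synchronization shown). */
typedef struct {
  void **items;
  size_t capacity;
  size_t last_write;  /* advanced by the mutator when it grays an object */
  size_t last_read;   /* advanced by the collector as it marks objects */
} mark_buffer;

/* Mutator side: gray a heap object by appending it, growing with realloc.
   Stack objects must never be pushed here. Returns 0 if out of memory. */
static int mark_buffer_push(mark_buffer *mb, void *obj) {
  if (mb->last_write == mb->capacity) {
    size_t cap = mb->capacity ? mb->capacity * 2 : 128;
    void **grown = realloc(mb->items, cap * sizeof *grown);
    if (grown == NULL) return 0;
    mb->items = grown;
    mb->capacity = cap;
  }
  mb->items[mb->last_write++] = obj;
  return 1;
}

/* Collector side: next unread gray object, or NULL once last_read has
   caught up with last_write (marking of this buffer is finished). */
static void *mark_buffer_next(mark_buffer *mb) {
  return mb->last_read < mb->last_write ? mb->items[mb->last_read++] : NULL;
}
```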
Minor Collection
Cyclone converts the original program to continuation passing style (CPS) and compiles it as a series of C functions that never return. At runtime the code periodically checks to see if the executing thread's stack has exceeded a certain size. When this happens a minor GC is started and all live stack objects are copied to the heap.
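The periodic stack check can be sketched as a comparison between the address of a local variable and a base address recorded when the thread started. This sketch assumes a downward-growing stack, as on most mainstream platforms; the 500 KiB limit and the function name are hypothetical. Converting the addresses to uintptr_t avoids relational comparison of unrelated pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack budget before a minor GC is triggered. */
#define STACK_LIMIT_BYTES ((size_t)500 * 1024)

/* Assumes the stack grows downward: `base` is an address recorded near the
   top of the thread's stack, `here` is the address of a local variable in
   the currently executing function. */
static int stack_limit_exceeded(uintptr_t base, uintptr_t here, size_t limit) {
  return base > here && (size_t)(base - here) > limit;
}
```

A compiled function would take the address of one of its locals, pass it as `here`, and start a minor GC whenever the check returns true.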
Root objects are "live" objects the collector uses to begin the tracing process. Cyclone's minor collector treats the following as roots:
- The current continuation
- Arguments to the current continuation
- Mutations contained in the write barrier
- Closures from the exception stack
- Global variables
A minor collection is always performed for a single mutator thread, usually by the thread itself. The algorithm is based on Cheney on the MTA and consists of the following:
- Move any root objects on the stack to the heap.
- Replace the stack object with a forwarding pointer. The forwarding pointer ensures all references to a stack object refer to the same heap object, and allows minor GC to handle cycles.
- Record each moved object in a buffer to serve as the Cheney "to-space".
- Loop over the "to-space" buffer and check each object moved to the heap. Move any child objects that are still on the stack. This loop continues until all live objects are moved.
- Cooperate with the collection thread (see next section).
- Perform a longjmp to reset the stack and call into the current continuation.
Any objects left on the stack after longjmp are considered garbage. There is no need to clean them up because the stack will just re-use the memory as it grows.
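The steps above can be illustrated with a toy version of the Cheney loop. In this hedged sketch every object is a pair, an on_stack flag stands in for the red mark, malloc stands in for heap allocation, and the names obj, move_to_heap and minor_gc are hypothetical; cooperation with the collector and the final longjmp are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy object: every object is a pair. */
typedef struct obj {
  int on_stack;         /* stands in for the red mark */
  struct obj *forward;  /* forwarding pointer left on a moved stack object */
  struct obj *car, *cdr;
  int value;
} obj;

/* Move a stack object to the heap at most once and return its heap address.
   Heap objects (and NULL) are returned unchanged. */
static obj *move_to_heap(obj *o, obj **to_space, size_t *n) {
  if (o == NULL || !o->on_stack) return o;
  if (o->forward != NULL) return o->forward;  /* already moved: share the copy */
  obj *h = malloc(sizeof *h);
  if (h == NULL) abort();
  memcpy(h, o, sizeof *h);
  h->on_stack = 0;
  h->forward = NULL;
  o->forward = h;
  to_space[(*n)++] = h;                       /* record for the Cheney scan */
  return h;
}

/* Minor GC over `nroots` root slots. `to_space` needs room for every live
   stack object. Returns the number of objects moved. */
static size_t minor_gc(obj **roots, size_t nroots, obj **to_space) {
  size_t n = 0, scan = 0;
  for (size_t i = 0; i < nroots; i++)
    roots[i] = move_to_heap(roots[i], to_space, &n);
  while (scan < n) {                          /* loop over the "to-space" */
    obj *h = to_space[scan++];
    h->car = move_to_heap(h->car, to_space, &n);
    h->cdr = move_to_heap(h->cdr, to_space, &n);
  }
  return n;  /* a real runtime would now longjmp into the continuation */
}
```

Checking the forwarding pointer before copying is what lets a cycle of stack objects be moved without looping forever, and it guarantees that every reference to a given stack object ends up pointing at the same heap copy.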
Finally, although not mentioned in Baker's paper, a heap object can be modified to contain a reference to a stack object, for example by using set-car! to change the head of a list. This is problematic since stack references are no longer valid after a minor GC. We account for these "mutations" by using a write barrier to maintain a list of each modified object. During GC, these modified objects are treated as roots to avoid dangling references.
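A minimal sketch of such a write barrier, using a toy pair object with an on_stack flag, is shown below. The names set_car, remember and mutation_list are hypothetical, and recording only heap objects that end up pointing at stack objects is an optimization assumed here; the text above only says that each modified object is listed. A minor GC would treat every recorded object as a root and then empty the list.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy pair object: on_stack stands in for the red mark. */
typedef struct obj {
  int on_stack;
  struct obj *car, *cdr;
} obj;

/* Growable per-thread list of mutated heap objects. */
typedef struct {
  obj **items;
  size_t count, capacity;
} mutation_list;

static void remember(mutation_list *ml, obj *o) {
  if (ml->count == ml->capacity) {
    size_t cap = ml->capacity ? ml->capacity * 2 : 16;
    obj **grown = realloc(ml->items, cap * sizeof *grown);
    if (grown == NULL) abort();
    ml->items = grown;
    ml->capacity = cap;
  }
  ml->items[ml->count++] = o;
}

/* set-car! with a minor-GC write barrier: a heap pair that now references
   a stack object is remembered so the next minor GC treats it as a root. */
static void set_car(mutation_list *ml, obj *pair, obj *value) {
  if (!pair->on_stack && value != NULL && value->on_stack)
    remember(ml, pair);
  pair->car = value;
}
```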
Major Collection
A single heap is used to store objects relocated from the various thread stacks. Eventually the heap will run too low on space and a collection is required to reclaim unused memory. The collector thread is used to perform a major GC with cooperation from the mutator threads.
Tri-color Marking
Only objects marked as white, gray, and black participate in major collections:
- White objects are freed during the sweep state. White is sometimes also referred to as the clear color.
- Gray is never explicitly assigned to an object. Instead, objects are grayed by being added to lists of gray objects awaiting marking. This improves performance by avoiding repeated passes over the heap to search for gray objects.
- Black objects survive the collection cycle. Black is sometimes referred to as the mark color as live objects are ultimately marked black.
Handshakes
Instead of "stopping the world" and pausing all threads, when the collector needs to coordinate with the mutators it performs a handshake.
Each of the mutator threads, and the collector itself, has a status variable:
```c
typedef enum { STATUS_ASYNC
             , STATUS_SYNC1
             , STATUS_SYNC2
             } gc_status_type;
```
The collector will update its status variable and then wait for all of the mutators to change their status before continuing. The mutators periodically call a cooperate function to check in and update their status to match the collector's. A handshake is complete once all mutators have updated their status.
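The handshake can be sketched with C11 atomics. This illustration rests on assumptions: the gc_statuses structure and the functions collector_post, mutator_cooperate and handshake_complete are hypothetical, and a real collector would poll handshake_complete in a loop, sleeping between checks, rather than call it once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef enum { STATUS_ASYNC, STATUS_SYNC1, STATUS_SYNC2 } gc_status_type;

/* Hypothetical shared state: one status for the collector, one per mutator. */
typedef struct {
  _Atomic gc_status_type collector;
  _Atomic gc_status_type *mutators;
  size_t nmutators;
} gc_statuses;

/* Collector: publish the status that every mutator must acknowledge. */
static void collector_post(gc_statuses *s, gc_status_type next) {
  atomic_store(&s->collector, next);
}

/* Mutator i, from its periodic cooperate call: match the collector. */
static void mutator_cooperate(gc_statuses *s, size_t i) {
  atomic_store(&s->mutators[i], atomic_load(&s->collector));
}

/* Collector: the handshake is complete once every mutator has caught up. */
static int handshake_complete(gc_statuses *s) {
  gc_status_type want = atomic_load(&s->collector);
  for (size_t i = 0; i < s->nmutators; i++)
    if (atomic_load(&s->mutators[i]) != want) return 0;
  return 1;
}
```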
Collection Cycle
During a GC cycle the collector thread transitions through the following states:
Clear
The collector swaps the values of the clear color (white) and the mark color (black). This is more efficient than modifying the color on each object in the heap. The collector then transitions to sync 1.
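The swap itself is cheap, as this sketch shows. The variable names and the 0/1 encodings are hypothetical, and a multithreaded runtime would also have to publish the new values safely (for example with atomics) so that mutators observe a consistent pair.

```c
#include <assert.h>

/* Hypothetical encodings; only the swap matters. */
static unsigned char gc_color_clear = 0;  /* "white": freed by the sweep */
static unsigned char gc_color_mark  = 1;  /* "black": survives the cycle */

/* Clear stage: swap the meaning of the two colors instead of visiting every
   heap object. Last cycle's black survivors are white again immediately. */
static void gc_swap_colors(void) {
  unsigned char tmp = gc_color_clear;
  gc_color_clear = gc_color_mark;
  gc_color_mark = tmp;
}

static int is_white(unsigned char mark) { return mark == gc_color_clear; }
```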
Mark
The collector transitions to sync 2 and then async. At this point it marks the global variables and waits for the mutators to also transition to async.
Trace
The collector finds all live objects using a breadth-first search and marks them black.
Sweep
The collector scans the heap and frees memory used by all white objects. If the heap is still low on memory at this point the heap will be increased in size.
Also, to ensure a complete collection, data for any terminated threads is not freed until now.
Resting
The collector cycle is complete and it rests until it is triggered again.
Collector Functions
Mark Gray
Mutators call this function to add an object to their mark buffer.
Collector Mark Gray
The collector calls this function to add an object to the mark stack.
Mark Black
The collector calls this function to mark an object black and mark all of the object's children gray using Collector Mark Gray.
Collector Trace
This function performs tracing for the collector by looping over all of the mutator mark buffers. All of the remaining objects in each buffer are marked black, as well as all the remaining objects on the collector's mark stack. This function continues looping until there are no more objects to mark.
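The interplay of Mark Black, Collector Mark Gray and Collector Trace can be sketched single-threaded as follows. The object layout (at most two children), the WHITE/BLACK constants, the fixed-capacity mark stack and all function names are assumptions made for a compact example; concurrency with running mutators is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define WHITE 0
#define BLACK 1
#define MAX_STACK 64  /* hypothetical fixed capacity of the mark stack */

typedef struct obj {
  int color;
  struct obj *children[2];
} obj;

typedef struct { obj **items; size_t last_read, last_write; } mark_buffer;
typedef struct { obj *items[MAX_STACK]; size_t top; } mark_stack;

/* Collector Mark Gray: push a white object onto the collector's mark stack. */
static void collector_mark_gray(mark_stack *s, obj *o) {
  if (o != NULL && o->color == WHITE) {
    assert(s->top < MAX_STACK);
    s->items[s->top++] = o;
  }
}

/* Mark Black: blacken an object and gray its white children. */
static void mark_black(mark_stack *s, obj *o) {
  if (o->color == BLACK) return;
  o->color = BLACK;
  for (int i = 0; i < 2; i++) collector_mark_gray(s, o->children[i]);
}

static void empty_mark_stack(mark_stack *s) {
  while (s->top > 0) mark_black(s, s->items[--s->top]);
}

/* Collector Trace: drain every mutator mark buffer and the mark stack,
   repeating until a full pass finds nothing new to mark. */
static void collector_trace(mark_buffer *bufs, size_t nbufs, mark_stack *s) {
  int more;
  do {
    more = 0;
    for (size_t i = 0; i < nbufs; i++) {
      while (bufs[i].last_read < bufs[i].last_write) {
        mark_black(s, bufs[i].items[bufs[i].last_read++]);
        empty_mark_stack(s);
        more = 1;
      }
    }
    empty_mark_stack(s);
  } while (more);
}
```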
Mutator Functions
Each mutator calls the following functions to coordinate with the collector.
Create
This function is called by a mutator to allocate memory on the heap for an object. This is generally only done during a minor GC when each object is relocated to the heap.
Update
A write barrier is used to ensure any modified objects are properly marked for the current collection cycle. There are two cases:
- Gray the object's new and old values if the mutator is in a synchronous status. Graying of the new value is a special case since it may still be on the stack. Instead of marking it directly, the object is tagged to be grayed when it is relocated to the heap.
- Gray the object's old value if the collector is in the tracing stage.
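Both cases can be sketched as a barrier wrapped around set-car!. The toy obj layout, the fixed-size gray_list standing in for the mark buffer, and the names mark_gray and update_car are hypothetical; a real barrier would also skip objects that are already marked.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STATUS_ASYNC, STATUS_SYNC1, STATUS_SYNC2 } gc_status_type;

/* Toy pair object: on_stack stands in for the red mark. */
typedef struct obj {
  int on_stack;
  int grayed;       /* stack objects only: gray when relocated to the heap */
  struct obj *car;
} obj;

#define MAX_GRAY 64   /* hypothetical capacity of the stand-in mark buffer */
typedef struct { obj *items[MAX_GRAY]; size_t count; } gray_list;

/* Mutator-side Mark Gray. A stack object cannot be buffered, so it is
   only tagged to be grayed when it is moved to the heap. */
static void mark_gray(gray_list *g, obj *o) {
  if (o == NULL) return;
  if (o->on_stack) {
    o->grayed = 1;
    return;
  }
  assert(g->count < MAX_GRAY);
  g->items[g->count++] = o;
}

/* set-car! with the major-GC write barrier. */
static void update_car(gray_list *g, gc_status_type mutator_status,
                       int collector_tracing, obj *pair, obj *new_value) {
  obj *old_value = pair->car;
  if (mutator_status != STATUS_ASYNC) {  /* sync 1 or sync 2 */
    mark_gray(g, old_value);
    mark_gray(g, new_value);
  } else if (collector_tracing) {
    mark_gray(g, old_value);
  }
  pair->car = new_value;
}
```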
Cooperate
Each mutator is required to periodically call this function to cooperate with the collector. During cooperation a mutator will update its status to match the collector's status, to handshake with the collector.
In addition when a mutator transitions to async it will:
- Mark all of its roots gray
- Use black as the allocation color for any new objects to prevent them from being collected during this cycle.
Cyclone's mutators cooperate after each minor GC, for two reasons: minor GCs are frequent, and immediately afterwards all of the mutator's live objects can be marked because they are on the heap.
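A single-threaded sketch of the cooperate step follows. The mutator structure, the fixed-size gray array standing in for the mark buffer, and the mark_color parameter are hypothetical. In this sketch the roots are grayed and the allocation color is switched before the mutator's status is updated to async.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STATUS_ASYNC, STATUS_SYNC1, STATUS_SYNC2 } gc_status_type;

#define MAX_GRAY 64  /* hypothetical capacity of the stand-in mark buffer */

typedef struct {
  gc_status_type status;
  unsigned char alloc_color;  /* color given to newly allocated heap objects */
  void **roots;               /* after a minor GC these are all on the heap */
  size_t nroots;
  void *gray[MAX_GRAY];       /* stands in for the mutator's mark buffer */
  size_t ngray;
} mutator;

/* Called by the mutator right after each minor GC. */
static void cooperate(mutator *m, gc_status_type collector_status,
                      unsigned char mark_color) {
  if (m->status == collector_status) return;  /* nothing new to acknowledge */
  if (collector_status == STATUS_ASYNC) {
    for (size_t i = 0; i < m->nroots; i++) {  /* mark all roots gray */
      assert(m->ngray < MAX_GRAY);
      m->gray[m->ngray++] = m->roots[i];
    }
    m->alloc_color = mark_color;              /* allocate new objects black */
  }
  m->status = collector_status;               /* this mutator's half of the handshake */
}
```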
Cooperation by the Collector
In practice a mutator will not always be able to cooperate in a timely manner. For example, a thread can block indefinitely waiting for user input or reading from a network port. In the meantime the collector will never be able to complete a handshake with this mutator and major GC will never be performed.
Cyclone solves this problem by requiring that a mutator let the collector know that it is (or could be) blocking. The mutator will call a function to update its thread state to CYC_THREAD_STATE_BLOCKED.
With this information the collector can cooperate on behalf of a blocked mutator and do the work itself instead of waiting for the mutator. The possible thread states are:
```c
typedef enum { CYC_THREAD_STATE_NEW
             , CYC_THREAD_STATE_RUNNABLE
             , CYC_THREAD_STATE_BLOCKED
             , CYC_THREAD_STATE_BLOCKED_COOPERATING
             , CYC_THREAD_STATE_TERMINATED
             } cyc_thread_state_type;
```
By now you might be wondering about BLOCKED_COOPERATING. Unfortunately, if the mutator is transitioning to async, all of its objects need to be relocated from the stack so they can be marked. In this case the collector changes the thread's state to CYC_THREAD_STATE_BLOCKED_COOPERATING and performs a minor collection for the thread. The mutator's objects can then be marked gray and its allocation color can be flipped.
When a mutator exits a (potentially) blocking section of code, it must call another function to update its thread state to CYC_THREAD_STATE_RUNNABLE. In addition, the function will detect if the collector cooperated for this mutator. If so, the mutator will perform a minor GC again to ensure any additional objects, such as results from the blocking code, are moved to the heap, and then it will longjmp back to the beginning of its stack. Either way, the mutator now calls into its continuation and resumes normal operations.
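These transitions can be sketched with compare-and-swap so that a mutator leaving a blocking section and the collector claiming it cannot both win. The function names are hypothetical, and a real runtime must also make the mutator wait until the collector has finished working on its behalf; that step is only noted in a comment here.

```c
#include <assert.h>
#include <stdatomic.h>

typedef enum { CYC_THREAD_STATE_NEW
             , CYC_THREAD_STATE_RUNNABLE
             , CYC_THREAD_STATE_BLOCKED
             , CYC_THREAD_STATE_BLOCKED_COOPERATING
             , CYC_THREAD_STATE_TERMINATED
             } cyc_thread_state_type;

/* Mutator: about to run code that may block. */
static void mutator_set_blocked(_Atomic cyc_thread_state_type *st) {
  atomic_store(st, CYC_THREAD_STATE_BLOCKED);
}

/* Collector: claim a blocked mutator in order to cooperate for it.
   Fails if the mutator is no longer blocked. */
static int collector_try_cooperate_for(_Atomic cyc_thread_state_type *st) {
  cyc_thread_state_type expected = CYC_THREAD_STATE_BLOCKED;
  return atomic_compare_exchange_strong(st, &expected,
                                        CYC_THREAD_STATE_BLOCKED_COOPERATING);
}

/* Mutator: leaving the blocking section. Returns 1 if the collector
   cooperated for this thread, meaning the caller must run a minor GC and
   longjmp before resuming its continuation. */
static int mutator_set_runnable(_Atomic cyc_thread_state_type *st) {
  cyc_thread_state_type expected = CYC_THREAD_STATE_BLOCKED;
  if (atomic_compare_exchange_strong(st, &expected, CYC_THREAD_STATE_RUNNABLE))
    return 0;
  /* State is BLOCKED_COOPERATING. A real runtime would first wait here
     until the collector has finished with this thread. */
  atomic_store(st, CYC_THREAD_STATE_RUNNABLE);
  return 1;
}
```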
Other Considerations
Garbage collection papers are generally silent on when to start the collection cycle, presumably leaving this up to the implementation. Cyclone checks the amount of free memory as part of its cooperation code. A major GC cycle is started if the amount of free memory dips below a threshold.
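A trigger of this kind reduces to a simple comparison. The 20% threshold, the function name and the extra condition that the collector must be resting (so a cycle already in progress is not restarted) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold: start a major GC when under 20% of the heap is free. */
#define GC_FREE_THRESHOLD_PERCENT 20

/* Checked from the cooperation code. */
static int should_start_major_gc(size_t free_bytes, size_t total_bytes,
                                 int collector_resting) {
  return collector_resting &&
         free_bytes * 100 < total_bytes * GC_FREE_THRESHOLD_PERCENT;
}
```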
Looking Ahead
Motivations:
- Extend Baker's approach to support multiple mutators
- Position to potentially support state-of-the-art GCs built on top of DLG (Stopless, Chicken, Clover)
Limitations or potential issues:
- Heap memory fragmentation has not been addressed and could be an issue for long-running programs. Traditionally, a compaction process is used to defragment a heap. An alternative strategy has also been suggested by Pizlo:
  "instead of copying objects to evacuate fragmented regions of the heap, fragmentation is instead embraced. A fragmented heap is allowed to stay fragmented, but the collector ensures that it can still satisfy allocation requests even if no large enough contiguous free region of space exists."
- Accordingly, the runtime needs to be able to handle large objects that could potentially span one or more pages.
- There is probably too much heap locking going on, and this could be an issue for a large heap and/or a large number of mutators. Improvements can likely be made in this area.
Ultimately, a garbage collector is tricky to implement and the focus must primarily be on correctness first, with an eye towards performance.
TODO: measure the performance of Cyclone's collector and improve it over time.
Further Reading
- CHICKEN internals: the garbage collector, by Peter Bex
- CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A., by Henry Baker
- Fragmentation Tolerant Real Time Garbage Collection (PhD Dissertation), by Filip Pizlo
- Implementing an on-the-fly garbage collector for Java, by Domani et al
- Incremental Parallel Garbage Collection, by Paul Thomas
- Portable, Unobtrusive Garbage Collection for Multiprocessor Systems, by Damien Doligez and Georges Gonthier