Justin Ethier 98eadc2be8 WIP
2016-01-20 21:40:55 -05:00

Garbage Collector

Introduction

Cyclone uses generational garbage collection (GC) to automatically free allocated memory. In practice, most allocations consist of short-lived objects such as temporary variables. Our generational collector performs two types of collection. Minor GC is done frequently to clean up most of these short-lived objects. Some objects will survive this collection and remain in memory. A major collection runs less often to free longer-lived objects that are no longer being used by the application.

Cheney on the MTA, a technique introduced by Henry Baker, is used to implement the first generation of our garbage collector. Objects are allocated directly on the stack using alloca, so allocations are very fast, do not cause fragmentation, and do not require a special pass to free unused objects. Baker's technique uses a copying collector for both the minor and major generations of collection. One drawback of using a copying collector for major GC is that it relocates all live objects during collection. This is problematic for supporting native threads because an object can be relocated at any time, invalidating any references to it. To prevent this, either all threads must be stopped while major GC is running or a read barrier must be used each time an object is accessed. Both options add potentially significant overhead, so another type of collector is used for the second generation instead.

Cyclone supports native threads by using a tracing collector based on the Doligez-Leroy-Gonthier (DLG) algorithm for major collections. An advantage of this approach is that objects are not relocated once they are placed on the heap. In addition, major GC executes asynchronously, so threads can continue to run concurrently even during collections.

The goal of this paper is to provide a high-level overview of Cyclone's garbage collector.

Terms

  • Collector - A thread running the garbage collection code. The collector is responsible for coordinating and performing most of the work for major garbage collections.
  • Continuation - With respect to the collectors, this is a function that is called to resume execution. For more information see this article on continuation passing style.
  • Forwarding Pointer - When a copying collector relocates an object it leaves one of these pointers behind with the object's new address.
  • Mutation - A modification to an object. For example, changing a vector (array) entry.
  • Mutator - A thread running user (or "application") code; there may be more than one mutator running concurrently.
  • Read Barrier - Code that is executed before reading an object. Read barriers have a larger overhead than write barriers because object reads are much more common.
  • Root - During tracing the collector uses these objects as the starting point to find all reachable data.
  • Write Barrier - Code that is executed before writing to an object.

Code

The implementation code is available here:

  • runtime.c contains most of the runtime system, including code to perform minor GC. A good place to start would be the GC and gc_minor functions.
  • gc.c contains the major GC code.

Data Structures

Heap

Cyclone's heap is based on the implementation from Chibi Scheme.

The heap consists of a linked list of pages. Each page contains a contiguous block of memory and a linked list of free chunks. When a new chunk is requested the first free chunk large enough to meet the request is found and either returned directly or carved up into a smaller chunk to return to the caller.

Memory is always allocated in multiples of 32 bytes. On the one hand, this helps prevent external fragmentation by allocating many objects of the same size. On the other hand, it incurs internal fragmentation because an object will not always fill all of its allocated memory.

The heap is locked during allocation and sweep operations to protect against concurrent access.

If there is not enough free memory to fulfill a request, a new page is allocated and added to the heap. Unfortunately, there is no alternative: the collection process is asynchronous, so memory cannot be freed immediately to make room.
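The allocation path described above can be sketched in C: sizes are rounded up to 32 bytes, pages are searched first-fit, a larger chunk is carved and its remainder stays on the free list, the whole operation runs under a lock, and a new page is added when nothing fits. This is a minimal illustration, not Cyclone's or Chibi Scheme's code. The names heap, heap_page, free_chunk, heap_alloc and HEAP_PAGE_SIZE (64 KiB) are hypothetical, and freeing, coalescing and sweeping are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of a paged first-fit heap; not Cyclone's gc.c. */
#define GRANULE 32                  /* every request is rounded up to a multiple of this */
#define HEAP_PAGE_SIZE (64 * 1024)  /* hypothetical default page size */

typedef struct free_chunk {
  size_t size;                /* bytes in this free chunk (a multiple of GRANULE) */
  struct free_chunk *next;    /* next free chunk in the same page */
} free_chunk;

typedef struct heap_page {
  struct heap_page *next;     /* next page in the heap */
  free_chunk *free_list;      /* free chunks inside this page */
} heap_page;

typedef struct {
  heap_page *pages;
  pthread_mutex_t lock;       /* held during allocation (the text says sweeps take it too) */
} heap;

static size_t round_up(size_t n) {
  return (n + GRANULE - 1) / GRANULE * GRANULE;
}

/* Create a page whose memory starts out as one big free chunk. */
static heap_page *page_new(size_t min_size) {
  size_t size = min_size > HEAP_PAGE_SIZE ? min_size : HEAP_PAGE_SIZE;
  heap_page *p = malloc(sizeof *p);
  free_chunk *c = malloc(size);   /* malloc's alignment suits the chunk header */
  if (!p || !c) { free(p); free(c); return NULL; }
  c->size = size;
  c->next = NULL;
  p->free_list = c;
  p->next = NULL;
  return p;
}

/* First fit within one page: take the first chunk that is large enough,
   carving off the remainder when the chunk is bigger than needed. */
static void *page_alloc(heap_page *p, size_t size) {
  free_chunk **prev = &p->free_list;
  for (free_chunk *c = p->free_list; c; prev = &c->next, c = c->next) {
    if (c->size < size) continue;
    if (c->size - size >= GRANULE) {
      free_chunk *rest = (free_chunk *)((char *)c + size);
      rest->size = c->size - size;  /* remainder stays on the free list */
      rest->next = c->next;
      *prev = rest;
    } else {
      *prev = c->next;              /* exact fit: unlink the whole chunk */
    }
    return c;
  }
  return NULL;
}

void *heap_alloc(heap *h, size_t request) {
  size_t size = round_up(request ? request : 1);
  void *result = NULL;
  pthread_mutex_lock(&h->lock);
  for (heap_page *p = h->pages; p && !result; p = p->next)
    result = page_alloc(p, size);
  if (!result) {
    /* Nothing fits: grow the heap rather than wait for the
       asynchronous collector to free memory. */
    heap_page *p = page_new(size);
    if (p) {
      p->next = h->pages;
      h->pages = p;
      result = page_alloc(p, size);
    }
  }
  pthread_mutex_unlock(&h->lock);
  return result;
}
```

Rounding to a fixed granule means a freed chunk can be reused by any later request of the same rounded size. The cost is internal fragmentation: round_up(33) is 64, so a 33-byte object wastes 31 bytes.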

Thread Data

At runtime Cyclone passes the current continuation, number of arguments, and a thread data parameter to each compiled C function. The thread data contains all of the necessary information to perform collections, including:

  • Thread state
  • Stack boundaries
  • Current continuation and arguments
  • Jump buffer
  • Call history buffer
  • Exception handler stack
  • Contents of the minor GC write barrier
  • Major GC parameters

TODO: anything else? mutator/collector mark lists? write barrier lists?
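One way to picture the thread data is as a C struct with a field for each item in the list. The sketch below is hypothetical. Its field names, types and array sizes (MAX_ARGS, HISTORY_LEN) are assumptions rather than Cyclone's actual declaration, and the two enums are copied from later sections of this document.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout for illustration; not Cyclone's real thread data. */
typedef enum { CYC_THREAD_STATE_NEW, CYC_THREAD_STATE_RUNNABLE,
               CYC_THREAD_STATE_BLOCKED, CYC_THREAD_STATE_BLOCKED_COOPERATING,
               CYC_THREAD_STATE_TERMINATED } cyc_thread_state_type;

typedef enum { STATUS_ASYNC, STATUS_SYNC1, STATUS_SYNC2 } gc_status_type;

#define MAX_ARGS 32      /* assumed maximum continuation arguments */
#define HISTORY_LEN 10   /* assumed call history length */

typedef struct {
  cyc_thread_state_type thread_state;     /* thread state */
  char *stack_start, *stack_limit;        /* stack boundaries */
  void *gc_cont;                          /* current continuation ... */
  void *gc_args[MAX_ARGS];                /* ... and its arguments */
  int gc_num_args;
  jmp_buf *jmp_start;                     /* jump buffer used to reset the stack */
  const char *call_history[HISTORY_LEN];  /* call history buffer (used as a ring) */
  int call_history_idx;
  void *exception_handler_stack;          /* exception handler stack */
  void **mutations;                       /* minor GC write barrier contents */
  size_t mutation_count;
  gc_status_type gc_status;               /* major GC parameters: status ... */
  int gc_alloc_color;                     /* ... and allocation color */
} gc_thread_data;

static void thread_data_init(gc_thread_data *td) {
  memset(td, 0, sizeof *td);              /* zero counters and pointers */
  td->thread_state = CYC_THREAD_STATE_NEW;
  td->gc_status = STATUS_ASYNC;
}
```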

Minor Collection

Cyclone converts the original program to continuation passing style (CPS) and compiles it as a series of C functions that never return. At runtime the code periodically checks to see if the executing thread's stack has exceeded a certain size. When this happens a minor GC is started and all live stack objects are copied to the heap.
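The stack check amounts to comparing the address of a local variable with an address recorded when the thread started. This sketch assumes the stack grows toward lower addresses. That holds on mainstream platforms but is not guaranteed by C. It also uses a hypothetical 500 KiB limit, and Cyclone's real check and limit may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; the limit and function names are assumptions. */
#define STACK_LIMIT (500 * 1024)   /* bytes of stack allowed before a minor GC */

/* Nonzero when the stack has grown more than `limit` bytes below
   `stack_start`, assuming a downward-growing stack. */
static int stack_overflowed(uintptr_t stack_start, uintptr_t current, size_t limit) {
  return stack_start > current && stack_start - current > limit;
}

/* Generated code would take the address of a local variable as the
   current stack position. */
static int check_stack(uintptr_t stack_start) {
  char here;
  return stack_overflowed(stack_start, (uintptr_t)&here, STACK_LIMIT);
}
```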

Root objects are "live" objects the collector uses to begin the tracing process. Cyclone's minor collector treats the following as roots:

  • The current continuation
  • Arguments to the current continuation
  • Mutations contained in the write barrier
  • Closures from the exception stack
  • Global variables

A minor collection is always performed for a single mutator thread, usually by the thread itself. The algorithm is based on Cheney on the MTA and consists of the following:

  • Move any root objects on the stack to the heap.
    • Replace the stack object with a forwarding pointer. The forwarding pointer ensures all references to a stack object refer to the same heap object, and allows minor GC to handle cycles.
    • Record each moved object in a buffer to serve as the Cheney "to-space".
  • Loop over the "to-space" buffer and check each object moved to the heap. Move any child objects that are still on the stack. This loop continues until all live objects are moved.
  • Cooperate with the collection thread (see next section).
  • Perform a longjmp to reset the stack and call into the current continuation.
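The steps above can be simulated in a few dozen lines of C. In this sketch, a flag on each object stands in for the address-range test a real runtime would use, and a fixed array stands in for the heap. The to_space array is the buffer of moved objects that the Cheney scan walks. All names are hypothetical, and the cooperate and longjmp steps are only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulation of a Cheney-style minor GC; not runtime.c. */
typedef enum { PAIR, FORWARD } tag_t;

typedef struct obj {
  tag_t tag;
  int on_stack;            /* a real runtime compares the address with the stack bounds */
  struct obj *car, *cdr;   /* child references (NULL when absent) */
  struct obj *fwd;         /* heap copy, valid once tag == FORWARD */
  int value;
} obj;

#define HEAP_MAX 64
static obj heap_objs[HEAP_MAX];   /* stand-in for the shared heap */
static size_t heap_used;
static obj *to_space[HEAP_MAX];   /* objects moved during the current minor GC */
static size_t to_space_count;

/* Move a stack object to the heap and leave a forwarding pointer behind,
   so every reference (including cyclic ones) resolves to the same copy. */
static obj *move(obj *o) {
  if (o == NULL || !o->on_stack) return o;   /* already on the heap */
  if (o->tag == FORWARD) return o->fwd;      /* moved earlier in this GC */
  assert(heap_used < HEAP_MAX);
  obj *h = &heap_objs[heap_used++];
  *h = *o;
  h->on_stack = 0;
  o->tag = FORWARD;
  o->fwd = h;
  to_space[to_space_count++] = h;
  return h;
}

/* roots: continuation, its arguments, exception-stack closures, globals.
   mutations: heap objects recorded by the write barrier. */
static void minor_gc(obj **roots, size_t nroots, obj **mutations, size_t nmut) {
  to_space_count = 0;
  for (size_t i = 0; i < nroots; i++)
    roots[i] = move(roots[i]);
  for (size_t i = 0; i < nmut; i++) {
    mutations[i]->car = move(mutations[i]->car);
    mutations[i]->cdr = move(mutations[i]->cdr);
  }
  /* Cheney scan: children of moved objects may still be on the stack. */
  for (size_t scan = 0; scan < to_space_count; scan++) {
    to_space[scan]->car = move(to_space[scan]->car);
    to_space[scan]->cdr = move(to_space[scan]->cdr);
  }
  /* Next: cooperate with the collector, then longjmp to reset the
     stack and call the continuation (omitted here). */
}
```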

Finally, although not mentioned in Baker's paper, a heap object can be modified to contain a reference to a stack object, for example by using set-car! to change the head of a list. This is problematic because stack references are no longer valid after a minor GC. We account for these "mutations" by using a write barrier to maintain a list of each modified object. During GC, these modified objects are treated as roots to avoid dangling references.
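Here is a sketch of such a write barrier for set-car!. It records only the case this paragraph identifies as dangerous: a heap object being made to point at a stack object. Cyclone's barrier may record modified objects more broadly. The on_stack flag, the write_barrier type and the MAX_MUTATIONS capacity are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a minor-GC write barrier; not Cyclone's code. */
typedef struct pair {
  int on_stack;            /* stands in for checking the address against the stack */
  struct pair *car, *cdr;
} pair;

#define MAX_MUTATIONS 128
typedef struct {
  pair *mutations[MAX_MUTATIONS];  /* heap objects that may point into the stack */
  size_t count;
} write_barrier;

/* set-car! with a barrier: when a heap object is made to reference a
   stack object, remember it so the next minor GC treats it as a root. */
static void set_car(write_barrier *wb, pair *p, pair *value) {
  if (!p->on_stack && value && value->on_stack) {
    assert(wb->count < MAX_MUTATIONS);
    wb->mutations[wb->count++] = p;
  }
  p->car = value;
}
```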

Major Collection

A single heap is used to store objects relocated from the various thread stacks. Eventually the heap will run too low on space and a collection is required to reclaim unused memory. The collector thread is used to perform a major GC with cooperation from the mutator threads.

Each object is assigned a color to indicate the status of its memory:

  • Blue - Unallocated memory.
  • Red - Objects on the stack.
  • White - Heap memory that has not been scanned by the collector. Memory that is still white after the collector finishes tracing is garbage.
  • Gray - Objects marked by the collector that may still have child objects that must be marked.
  • Black - Objects marked by the collector whose immediate child objects have also been marked.

White is also referred to as the clear color and black as the mark color. Gray is never explicitly assigned to an object. Instead, objects are grayed by being added to lists of gray objects awaiting marking. This improves performance by avoiding repeated passes over the heap to search for gray objects.

Each of the mutator threads, and the collector itself, has a status variable:

 typedef enum { STATUS_ASYNC 
              , STATUS_SYNC1 
              , STATUS_SYNC2 
              } gc_status_type;

The collector performs a handshake with the mutators to change status. The collector updates its status variable and then waits for all of the mutators to change their status before continuing. The mutators periodically call a cooperate function to check in and update their status to match the collector's. A handshake is complete once all mutators have updated their status.
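The handshake can be sketched with C11 atomics. The collector publishes a status, and each mutator copies it when it cooperates. The handshake is complete when every mutator's status matches the collector's. This single-threaded sketch uses hypothetical names (collector_post, handshake_complete, mutator_cooperate) and a fixed MAX_MUTATORS. A real collector would wait between polls rather than return immediately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sketch of the status handshake; not Cyclone's gc.c. */
typedef enum { STATUS_ASYNC, STATUS_SYNC1, STATUS_SYNC2 } gc_status_type;

#define MAX_MUTATORS 8
static _Atomic gc_status_type collector_status = STATUS_ASYNC;
static _Atomic gc_status_type mutator_status[MAX_MUTATORS];  /* zero = STATUS_ASYNC */
static size_t num_mutators;

/* Collector: publish the new status ... */
static void collector_post(gc_status_type s) {
  atomic_store(&collector_status, s);
}

/* ... then poll until every mutator has caught up. */
static int handshake_complete(void) {
  gc_status_type s = atomic_load(&collector_status);
  for (size_t i = 0; i < num_mutators; i++)
    if (atomic_load(&mutator_status[i]) != s) return 0;
  return 1;
}

/* Mutator, called periodically: adopt the collector's status. */
static void mutator_cooperate(size_t id) {
  atomic_store(&mutator_status[id], atomic_load(&collector_status));
}
```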

Collection Cycle

During a GC cycle the collector thread transitions through the following states:

Clear

The collector swaps the values of the clear color (white) and the mark color (black). This is more efficient than modifying the color on each object. The collector then transitions to sync 1.
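The swap is cheap because each object stores only a color value. Whether that value means white or black depends on two global variables. After the sweep, the surviving objects are black, so swapping the globals turns all of them white without visiting any object. The numeric color values and names below are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical color encoding; only the two heap colors are swapped. */
enum { COLOR_BLUE = 0, COLOR_RED = 1, COLOR_A = 2, COLOR_B = 3 };

static int gc_color_clear = COLOR_A;  /* currently means "white" */
static int gc_color_mark  = COLOR_B;  /* currently means "black" */

typedef struct { int color; } object_header;

static int is_white(const object_header *h) { return h->color == gc_color_clear; }
static int is_black(const object_header *h) { return h->color == gc_color_mark; }

/* Clear stage: every object that was black becomes white at once. */
static void gc_swap_colors(void) {
  int tmp = gc_color_clear;
  gc_color_clear = gc_color_mark;
  gc_color_mark = tmp;
}
```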

Mark

The collector transitions to sync 2 and then async. At this point it marks the global variables and waits for the mutators to also transition to async.

Trace

The collector finds all live objects and marks them black.
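Tracing with an explicit gray list might look like the following. An object is grayed by pushing it onto a mark stack. The collector pops objects, blackens them, and grays their white children until the stack is empty. The two-child layout, the stack capacity, and the constant WHITE/BLACK values (rather than swapped colors) are simplifications, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of tracing with a gray list; not Cyclone's gc.c. */
enum { WHITE, BLACK };   /* gray is implicit: being on the mark stack */

typedef struct node {
  int color;
  struct node *children[2];   /* fixed fan-out for brevity */
} node;

#define MARK_STACK_MAX 256
static node *mark_stack[MARK_STACK_MAX];
static size_t mark_top;

/* Gray an object by queueing it; its color is not changed yet. */
static void mark_gray(node *n) {
  if (n && n->color == WHITE) {
    assert(mark_top < MARK_STACK_MAX);
    mark_stack[mark_top++] = n;
  }
}

/* Pop gray objects, blacken them, and gray their children. Anything
   still white when the stack is empty is garbage. */
static void collector_trace(void) {
  while (mark_top > 0) {
    node *n = mark_stack[--mark_top];
    if (n->color != WHITE) continue;   /* an object may be queued twice */
    n->color = BLACK;
    for (int i = 0; i < 2; i++) mark_gray(n->children[i]);
  }
}
```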

Sweep

The collector scans the heap and frees memory used by all white objects. If the heap is still low on memory at this point the heap will be increased in size.
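Here is a minimal sweep over an array of heap slots. Every object that is still white after tracing returns to the unallocated (blue) state, and its size is counted so the caller can decide whether the heap must grow. The slot layout and names are hypothetical, and the heap lock mentioned earlier is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a sweep; FREE plays the role of "blue". */
enum { FREE = 0, WHITE = 1, BLACK = 2 };

typedef struct {
  int color;
  size_t size;   /* bytes occupied by this slot */
} slot;

/* Free every white slot and return the number of bytes reclaimed. */
static size_t sweep(slot *slots, size_t n) {
  size_t freed = 0;
  for (size_t i = 0; i < n; i++) {
    if (slots[i].color == WHITE) {
      slots[i].color = FREE;
      freed += slots[i].size;
    }
  }
  return freed;
}
```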

Any terminated threads will have their thread data freed now. (TODO: Thread data is kept through the collection cycle to ... ensure live objects are not missed? double-check this)

Resting

The collector cycle is complete and it rests until it is triggered again.
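The cycle can be summarized as a small state machine. The stage names mirror the headings above. The enum and next_stage function are hypothetical, and the work done inside each stage is not shown.

```c
#include <assert.h>

/* Hypothetical model of the collector's stages; not Cyclone's gc.c. */
typedef enum { STAGE_CLEAR, STAGE_MARK, STAGE_TRACE,
               STAGE_SWEEP, STAGE_RESTING } gc_stage_type;

/* Advance one stage; a resting collector stays put until triggered. */
static gc_stage_type next_stage(gc_stage_type s, int triggered) {
  switch (s) {
    case STAGE_RESTING: return triggered ? STAGE_CLEAR : STAGE_RESTING;
    case STAGE_CLEAR:   return STAGE_MARK;    /* colors swapped, now sync 1 */
    case STAGE_MARK:    return STAGE_TRACE;   /* globals marked, now async */
    case STAGE_TRACE:   return STAGE_SWEEP;
    case STAGE_SWEEP:   return STAGE_RESTING;
  }
  return STAGE_RESTING;   /* not reached for valid input */
}
```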

Collector Functions

Mark Gray

TODO: data structure used instead of explicit marking, to improve performance

Collector Mark Gray

Mark Black

Collector Trace

TODO: needed? should this just be part of the collector trace section?

Mutator Functions

Each mutator calls the following functions to coordinate with the collector.

Create

This function is called by a mutator to allocate memory on the heap for an object. This is generally only done during a minor GC when each object is relocated to the heap.

Update

A write barrier is used to ensure any modified objects are properly marked for the current collection cycle. There are two cases:

  • Gray the object's new and old values if the mutator is in a synchronous status. Graying of the new value is a special case since it may still be on the stack. Instead of marking it directly, the object is tagged to be grayed when it is relocated to the heap.
  • Gray the object's old value if the collector is in the tracing stage.
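The two cases translate into a short barrier function. In this sketch, mark_gray is a stub that sets a flag, and an on_stack flag replaces the real address check. The gray_on_move field is a hypothetical tag that tells the next minor GC to gray the object once it reaches the heap. The status enum repeats the one in this document, and the stage enum is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the major-GC update barrier; not Cyclone's gc.c. */
typedef enum { STATUS_ASYNC, STATUS_SYNC1, STATUS_SYNC2 } gc_status_type;
typedef enum { STAGE_CLEAR, STAGE_MARK, STAGE_TRACE,
               STAGE_SWEEP, STAGE_RESTING } gc_stage_type;   /* assumed */

typedef struct object {
  int on_stack;       /* stands in for checking the object's address */
  int grayed;         /* set by the stub mark_gray below */
  int gray_on_move;   /* gray this object when minor GC moves it to the heap */
} object;

static void mark_gray(object *o) {   /* stub: a real one queues on a gray list */
  if (o) o->grayed = 1;
}

/* Barrier run before a heap field changes from old_val to new_val. */
static void gc_update(gc_status_type mutator_status, gc_stage_type stage,
                      object *old_val, object *new_val) {
  if (mutator_status == STATUS_SYNC1 || mutator_status == STATUS_SYNC2) {
    mark_gray(old_val);
    if (new_val && new_val->on_stack)
      new_val->gray_on_move = 1;     /* a stack object cannot be marked directly */
    else
      mark_gray(new_val);
  } else if (stage == STAGE_TRACE) {
    mark_gray(old_val);
  }
}
```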

Cooperate

Each mutator is required to periodically call this function to cooperate with the collector. During cooperation, a mutator updates its status to match the collector's status, completing its side of the handshake.

In addition, when a mutator transitions to async it will:

  • Mark all of its roots gray
  • Use black as the allocation color for any new objects to prevent them from being collected during this cycle.

Cyclone's mutators cooperate after each minor GC for two reasons: minor GCs are frequent, and immediately afterwards all of the mutator's live objects can be marked because they are on the heap.
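Here is a sketch of the mutator side of cooperation. Because it runs right after a minor GC, every root is already on the heap and can be pushed onto the mutator's mark buffer. The mutator struct, the mark_buffer field, mutator_cooperate, and the fixed color constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of mutator cooperation; not Cyclone's gc.c. */
typedef enum { STATUS_ASYNC, STATUS_SYNC1, STATUS_SYNC2 } gc_status_type;
enum { COLOR_WHITE, COLOR_BLACK };   /* fixed here; real colors swap each cycle */

typedef struct obj { int color; } obj;

#define MARK_BUF_MAX 64
typedef struct {
  gc_status_type status;
  int alloc_color;                 /* color given to newly allocated heap objects */
  obj **roots;                     /* already on the heap right after a minor GC */
  size_t num_roots;
  obj *mark_buffer[MARK_BUF_MAX];  /* grayed objects for the collector to trace */
  size_t mark_count;
} mutator;

static void mark_gray(mutator *m, obj *o) {
  if (o && o->color == COLOR_WHITE) {
    assert(m->mark_count < MARK_BUF_MAX);
    m->mark_buffer[m->mark_count++] = o;
  }
}

/* Called by the mutator after every minor GC. */
static void mutator_cooperate(mutator *m, gc_status_type collector_status) {
  if (m->status == collector_status) return;   /* already in step */
  if (collector_status == STATUS_ASYNC) {
    /* Transition to async: gray all roots, then allocate black so new
       objects survive this cycle. */
    for (size_t i = 0; i < m->num_roots; i++)
      mark_gray(m, m->roots[i]);
    m->alloc_color = COLOR_BLACK;
  }
  m->status = collector_status;                 /* completes this handshake */
}
```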

Cooperation by the Collector

In practice a mutator will not always be able to cooperate in a timely manner. For example, a thread can block indefinitely waiting for user input or reading from a network port. In the meantime the collector will never be able to complete a handshake with this mutator and major GC will never be performed.

Cyclone solves this problem by requiring that a mutator let the collector know that it is (or could be) blocking. The mutator will call a function to update its thread state to CYC_THREAD_STATE_BLOCKED.

With this information the collector can cooperate on behalf of a blocked mutator and do the work itself instead of waiting for the mutator. The possible thread states are:

typedef enum { CYC_THREAD_STATE_NEW
             , CYC_THREAD_STATE_RUNNABLE
             , CYC_THREAD_STATE_BLOCKED
             , CYC_THREAD_STATE_BLOCKED_COOPERATING
             , CYC_THREAD_STATE_TERMINATED
             } cyc_thread_state_type;

By now you might be wondering about BLOCKED_COOPERATING. Unfortunately, if the mutator is transitioning to async, all of its objects need to be relocated from the stack so they can be marked. In this case the collector changes the thread's state to CYC_THREAD_STATE_BLOCKED_COOPERATING and performs a minor collection for the thread. The mutator's objects can then be marked gray and its allocation color can be flipped.

When a mutator exits a (potentially) blocking section of code, it must call another function to update its thread state to CYC_THREAD_STATE_RUNNABLE. In addition, this function detects whether the collector cooperated on the mutator's behalf. If so, the mutator performs another minor GC to ensure that any additional objects, such as results from the blocking code, are moved to the heap, and then it performs a longjmp back to the beginning of its stack. Either way, the mutator then calls into its continuation and resumes normal operation.
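This protocol maps naturally onto compare-and-swap (CAS) on the thread state. In this sketch, the collector claims a thread only while its state is still BLOCKED. The mutator learns that the collector stepped in when its own CAS from BLOCKED to RUNNABLE fails. The function names are hypothetical. A real implementation must also wait until the collector has finished cooperating before the mutator resumes, and that synchronization is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical sketch of blocked-thread cooperation; not Cyclone's code. */
typedef enum { CYC_THREAD_STATE_NEW, CYC_THREAD_STATE_RUNNABLE,
               CYC_THREAD_STATE_BLOCKED, CYC_THREAD_STATE_BLOCKED_COOPERATING,
               CYC_THREAD_STATE_TERMINATED } cyc_thread_state_type;

typedef struct { _Atomic cyc_thread_state_type state; } thread_data;

/* Mutator: about to run code that may block (I/O, waiting on input). */
static void set_thread_blocked(thread_data *td) {
  atomic_store(&td->state, CYC_THREAD_STATE_BLOCKED);
}

/* Collector: claim a blocked mutator to cooperate on its behalf.
   Fails if the mutator has already returned to running. */
static int collector_claim_blocked(thread_data *td) {
  cyc_thread_state_type expected = CYC_THREAD_STATE_BLOCKED;
  return atomic_compare_exchange_strong(&td->state, &expected,
                                        CYC_THREAD_STATE_BLOCKED_COOPERATING);
}

/* Mutator: leaving the blocking section. Returns 1 if the collector
   cooperated meanwhile; the caller must then run a minor GC and longjmp. */
static int set_thread_runnable(thread_data *td) {
  cyc_thread_state_type expected = CYC_THREAD_STATE_BLOCKED;
  if (atomic_compare_exchange_strong(&td->state, &expected,
                                     CYC_THREAD_STATE_RUNNABLE))
    return 0;
  /* State was BLOCKED_COOPERATING: the collector did the work for us. */
  atomic_store(&td->state, CYC_THREAD_STATE_RUNNABLE);
  return 1;
}
```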

Other Considerations

Garbage collection papers are generally silent on when to start the collection cycle, presumably leaving this up to the implementation. Cyclone checks the amount of free memory as part of its cooperation code. A major GC cycle is started if the amount of free memory dips below a threshold.
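A threshold check of this kind can be as simple as comparing free memory to a fraction of the heap. The 25% figure and the requirement that the collector be resting are assumptions for illustration, not Cyclone's actual values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy; the threshold value is an assumption. */
#define GC_FREE_THRESHOLD_PCT 25   /* start a cycle when less than 25% is free */

/* Called from the cooperation code: begin a major GC only if the
   collector is idle and free memory has dipped below the threshold. */
static int should_start_major_gc(size_t free_bytes, size_t total_bytes,
                                 int collector_resting) {
  return collector_resting &&
         free_bytes * 100 < total_bytes * GC_FREE_THRESHOLD_PCT;
}
```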

Looking Ahead

Motivations:

  • Extend Baker's approach to support multiple mutators
  • Position Cyclone to potentially support state-of-the-art GCs built on top of DLG (Stopless, Chicken, Clover)

Limitations or potential issues:

  • DLG memory fragmentation could be an issue for long-running programs. This requires either a compaction process or a way to handle larger objects by breaking them up (see Pizlo's PhD thesis).
  • Locking, atomics, etc.
  • Improve performance?
  • Handle large objects by splitting them across pages

Cyclone's implementation generally tries to use atomic operations, but there is some locking. In particular, the heap is protected by a lock during object allocation and deallocation. This is one area that could probably be improved.

quote on this? - first priority must be correctness, can address performance over time

Further Reading