Justin Ethier 2018-07-19 19:45:34 +00:00
parent 0379ad6907
commit a746ce3323

One of the basic improvements to the mark-sweep algorithm suggested by the Garbage Collection Handbook is lazy sweeping.
With this approach, instead of waiting until tracing is finished and having the collector thread sweep the entire heap at once, each thread sweeps its own heaps as part of allocation. When no more free space is available to meet a request, the allocator checks whether there are unswept heap pages. If so, the next one is selected, the mutator sweeps it to free up space, and the new page is used for allocation. If insufficient space is available, a major collection is triggered.
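A minimal sketch of this allocation path follows. It uses a hypothetical, much-simplified heap model rather than Cyclone's real structures: each page is a fixed array of slots storing one color per object, and 0 marks an empty slot. The names page_t, heap_t, sweep_page, try_alloc_on_page, and lazy_alloc are made up for illustration, and sweeping here keeps only mark-colored objects (the two-color check discussed under Sweeping is left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified heap model (not Cyclone's data structures):
   each page holds a fixed number of slots, and each slot stores the color
   of the object in it. A color of 0 means the slot is empty. */
#define SLOTS_PER_PAGE 4
#define NUM_PAGES 3

typedef struct {
  unsigned char color[SLOTS_PER_PAGE];
  bool swept;                /* already swept during this cycle? */
} page_t;

typedef struct {
  page_t pages[NUM_PAGES];
  int current;               /* page currently used for allocation */
  unsigned char mark;        /* color of live and newly allocated objects */
  bool gc_requested;         /* set when a major collection is needed */
} heap_t;

/* Sweep one page: free every occupied slot not holding the mark color. */
static void sweep_page(page_t *pg, unsigned char mark) {
  for (int i = 0; i < SLOTS_PER_PAGE; i++)
    if (pg->color[i] != 0 && pg->color[i] != mark)
      pg->color[i] = 0;
  pg->swept = true;
}

/* Claim the first empty slot on a page. Returns the slot index or -1. */
static int try_alloc_on_page(page_t *pg, unsigned char mark) {
  for (int i = 0; i < SLOTS_PER_PAGE; i++) {
    if (pg->color[i] == 0) {
      pg->color[i] = mark;   /* new objects get the mark color */
      return i;
    }
  }
  return -1;
}

/* Allocate one object. Returns the page index used, or -1 after
   requesting a major collection because no page had free space. */
static int lazy_alloc(heap_t *h) {
  assert(h->current >= 0 && h->current < NUM_PAGES);
  /* Fast path: room on the current page. */
  if (try_alloc_on_page(&h->pages[h->current], h->mark) >= 0)
    return h->current;
  /* Slow path: the mutator sweeps the next unswept page and moves to it. */
  for (int p = 0; p < NUM_PAGES; p++) {
    page_t *pg = &h->pages[p];
    if (pg->swept)
      continue;
    sweep_page(pg, h->mark);
    if (try_alloc_on_page(pg, h->mark) >= 0) {
      h->current = p;
      return p;
    }
  }
  h->gc_requested = true;    /* insufficient space: trigger a major GC */
  return -1;
}
```

For example, if page 0 is full of live objects and page 1 mixes live and garbage slots, the first call sweeps page 1 on demand and allocates there. Pages are only swept when an allocation actually needs the space.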
The main goal of this process is to improve performance:
Older notes:
Ideally we want to improve performance with this approach. Hopefully it improves cache locality, since we only sweep a little bit at a time and then use the newly swept portion for allocations.
Can we minimize the size of gc_try_alloc to allow inlining of this heavily-used function?
Original notes:
We should consider lazy sweeping; Riptide does this. Perhaps it would improve cache locality when sweeping fixed-size heaps.
Pseudocode is on page 25 of the GC Handbook, but we need to adapt it for Cyclone.
How do we handle heap growth and GC initiation when we are doing partial, lazy sweeps?
# Object Coloring
# Object Coloring with the old collector
Before this change, an object could be marked using any of the following colors to indicate the status of its memory:
- Blue - Unallocated memory.
- Red - An object on the stack.
- White - Heap memory that has not been scanned by the collector.
- Gray - Objects marked by the collector that may still have child objects that must be marked.
- Black - Objects marked by the collector whose immediate child objects have also been marked.
Only objects marked as white, gray, or black participate in major collections:
- White objects are freed during the sweep phase. White is sometimes also referred to as the clear color.
- Gray is never explicitly assigned to an object. Instead, objects are grayed by being added to lists of gray objects awaiting marking. This improves performance by avoiding repeated passes over the heap to search for gray objects.
- Black objects survive the collection cycle. Black is sometimes referred to as the mark color, since live objects are ultimately marked black.
After a major GC is completed, the collector thread swaps the values of the black and white colors. This simple optimization avoids revisiting surviving objects to reset their colors, while still allowing the next cycle to start with a fresh set of white objects.
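To illustrate why the swap is cheap, here is a small sketch. The color values 5 and 3 are borrowed from the globals listed later in this document, and the function name swap_black_and_white is hypothetical. A survivor's stored color is never rewritten, yet after the swap it reads as white.

```c
#include <assert.h>

/* Illustrative color values; the function name is hypothetical. */
static unsigned char color_black = 5;  /* mark color */
static unsigned char color_white = 3;  /* clear color */

/* Run once a major GC completes. Every surviving object is black; after
   the swap its stored color equals the new white value, so the next
   cycle starts with all objects white and no object is touched. */
static void swap_black_and_white(void) {
  unsigned char tmp = color_black;
  color_black = color_white;
  color_white = tmp;
}
```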
# Object Coloring with Lazy Sweeping
The current set of colors is insufficient for lazy sweeping because parts of the heap may not be swept during a collection cycle.
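For example, suppose a heap page is not swept during a cycle and still holds a garbage object colored white when the cycle ends. When the collector swaps the black and white values, the color stored in that object now equals the new black value.
Thus an object that is really garbage could accidentally be treated as black, that is, as live.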
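The hazard can be reproduced in a few lines. This is a hypothetical sketch of the old two-color scheme with illustrative values, not Cyclone code: a garbage object on an unswept page keeps its stale white value, and once the colors are swapped that value matches black.

```c
#include <assert.h>
#include <stdbool.h>

/* Old two-color scheme, illustrative values only: objects colored black
   survive the cycle, objects colored white are garbage once tracing ends. */
static unsigned char black = 5;
static unsigned char white = 3;

static void end_of_cycle_swap(void) {
  unsigned char tmp = black;
  black = white;
  white = tmp;
}

/* Hypothetical helper: does an object with this color look live? */
static bool appears_live(unsigned char color) { return color == black; }
```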
The solution is to add a new color (purple) to indicate garbage objects on the heap. That way we can sweep while the collector is busy doing other work such as mark/trace.
We can assign a new purple color after tracing is finished. At this point the clear color and the purple color are (essentially) the same, and any new objects are allocated using the mark color. When the GC starts back up, the clear and mark colors are each incremented by 2. We then have purple (assigned the previous clear color), clear (assigned the previous mark color), and mark (assigned a new number). All of these numbers must be odd so they never conflict with the red (stack) color or the blue color (though blue is presently unused).
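A sketch of the color rotation described above. The initial values match the globals gc_color_mark, gc_color_clear, and gc_color_purple listed below; the function name gc_rotate_colors and the even values assumed for red and blue are illustrative. The sketch relies on the invariant that mark is always clear + 2, so incrementing clear by 2 yields the previous mark value.

```c
#include <assert.h>

/* Assumed values for the non-heap colors; both are even. */
#define COLOR_RED  0   /* object on the stack */
#define COLOR_BLUE 2   /* unallocated memory */

/* Initial values match the globals listed in this document. */
static unsigned char gc_color_mark = 5;
static unsigned char gc_color_clear = 3;
static unsigned char gc_color_purple = 1;

/* Hypothetical routine run when the GC starts back up. Because mark is
   always clear + 2, incrementing clear by 2 yields the previous mark. */
static void gc_rotate_colors(void) {
  gc_color_purple = gc_color_clear;                       /* old clear   */
  gc_color_clear = (unsigned char)(gc_color_clear + 2);   /* old mark    */
  gc_color_mark = (unsigned char)(gc_color_mark + 2);     /* fresh value */
  /* unsigned char wraps modulo 256, which is even, so values stay odd. */
  assert(gc_color_mark % 2 == 1 && gc_color_clear % 2 == 1);
}
```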
Notes:
- If we now have two alloc colors:
  - One is the existing alloc color.
  - The other is the previous clear color, when we cooperate. We can't free objects of this color because the collector is tracing over them.
- After tracing is finished, we would want to remove this color, because at that point objects that still have it need to become garbage.
Globals (collector? who sets these?):
```c
static unsigned char gc_color_mark = 5;   // Black, is swapped during GC
static unsigned char gc_color_clear = 3;  // White, is swapped during GC
static unsigned char gc_color_purple = 1; // There are many "shades" of purple, this is the most recent one
```
Mutator data:
```c
unsigned char cached_free_size_status;
```
Plan B - Do not force any sweeps
# Allocation
With this strategy we could use a global for the purple (garbage) color. But do we even need to track purple at all? I don't think so: all of the shades of purple are implicit. They are just the odd-numbered colors that are not the mark color or (sometimes) the clear color.
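To make the "implicit purple" idea concrete, here is a hypothetical predicate (is_purple and its parameters are not from the Cyclone source). A color counts as garbage if it is odd and is neither the mark color nor, while the collector is tracing, the clear color.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: purple is never stored anywhere. Any odd color
   other than the mark color (and, while the collector is tracing, the
   clear color) is simply treated as a shade of purple, i.e. garbage. */
static bool is_purple(unsigned char color, unsigned char mark,
                      unsigned char clear, bool tracing) {
  if (color % 2 == 0)
    return false;            /* even: red or blue, never heap garbage */
  if (color == mark)
    return false;            /* live or newly allocated */
  if (tracing && color == clear)
    return false;            /* not traced yet, must be kept */
  return true;
}
```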
TODO: discuss fast path (slot on current heap page) and slow path (page full, need to find another one)
Each heap will have to maintain a “full” bit. This avoids the wasted work of re-examining heaps that we already know to be full.
- The bit is set by the allocate function when no more allocations are possible.
- The bit is cleared by the collector after tracing is complete. It would be better if the mutator could do this, to avoid contention.
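A small sketch of how the full bit lets the allocator skip pages, under a hypothetical page model where each page only tracks a count of free slots (page_t, alloc_from_pages, and clear_full_bits are illustrative names). Here the collector clears the bits after tracing, as in the notes; moving that work to the mutator would be a separate change.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PAGES 4

/* Hypothetical page descriptor: only a free-slot count and the full bit. */
typedef struct {
  int free_slots;
  bool full;   /* set once no more allocations are possible */
} page_t;

/* Allocate one slot from the first page not known to be full. Pages with
   the full bit set are skipped without re-examining their contents.
   Returns the page index, or -1 if every page is full. */
static int alloc_from_pages(page_t pages[], int n) {
  for (int i = 0; i < n; i++) {
    if (pages[i].full)
      continue;
    if (pages[i].free_slots > 0) {
      pages[i].free_slots--;
      if (pages[i].free_slots == 0)
        pages[i].full = true;   /* the allocator sets the bit */
      return i;
    }
    pages[i].full = true;       /* found to be full: remember that */
  }
  return -1;                    /* caller would start a major GC */
}

/* Collector side: clear every full bit once tracing is complete, so the
   pages are considered again (after sweeping) in the next round. */
static void clear_full_bits(page_t pages[], int n) {
  for (int i = 0; i < n; i++)
    pages[i].full = false;
}
```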
Initialize it to the same as the allocation color.
We want to assign this during cooperation, in preparation for tracing. This can be done using the existing code (note there are two places, in case the collector cooperates on behalf of a mutator). Actually, during cooperation this value can remain unchanged, since it is already assigned properly (i.e., it is the white color).
After tracing is finished, we want to set the white color to the same value as the new allocation color. gc_collector_sweep already loops over all mutators, so we can keep doing that and simply atomically update each mutator's second alloc color (i.e., set it to the mark color) so that objects with the old white color can be freed again.
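The end-of-tracing update might look roughly like the following, using C11 atomics. The mutator_t struct and finish_tracing function are hypothetical stand-ins for Cyclone's per-thread data and gc_collector_sweep; only the idea of atomically setting the second color to the allocation color comes from the notes above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, trimmed-down stand-in for per-mutator GC data. */
typedef struct {
  _Atomic unsigned char alloc_color;  /* color given to new objects */
  _Atomic unsigned char trace_color;  /* second color kept while tracing */
} mutator_t;

/* Run by the collector once tracing is finished: make each mutator's
   second color equal to its allocation color, so objects still holding
   the old clear (white) color become freeable on the next sweep. */
static void finish_tracing(mutator_t *muts, int n) {
  for (int i = 0; i < n; i++) {
    unsigned char mark = atomic_load(&muts[i].alloc_color);
    atomic_store(&muts[i].trace_color, mark);
  }
}
```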
# Sweeping
```c
// Use the object's mark to determine if we keep it.
// Need to check for both colors because:
// - Objects that are either newly-allocated or recently traced are given
//   the alloc color, and we need to keep them.
// - If the collector is currently tracing, objects not traced yet will
//   have the trace/clear color. We need to keep any of those to make sure
//   the collector has a chance to trace the entire heap.
if (mark(p) != thd->gc_alloc_color &&
    mark(p) != thd->gc_trace_color) {
  // Neither color matches, so p is a purple (garbage) object: free it.
}
```
This makes sweeping slightly more expensive, because to determine whether an object is garbage it now needs to make sure the object is using neither the allocation color nor the white color (remember, we only want to free purple objects, but that color value changes each GC cycle). I think this is acceptable because it allows us to sweep only when necessary (i.e., a heap does not need to be swept at all during a GC cycle if we don't need the space), and when we do sweep we only iterate over a single heap page.
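For a fuller picture, here is a hypothetical sketch of sweeping a single page with the two-color check. The page is modeled as an array of per-object colors in which 0 marks an empty slot (an assumption of this sketch); a real sweep would also need to make the freed memory available for allocation, for example via a free list.

```c
#include <assert.h>
#include <stddef.h>

#define FREE_SLOT 0   /* hypothetical marker for an empty slot */

/* Sweep a single page of n object colors. An object survives if its color
   is the allocation color or the trace color; anything else is a purple
   object and is freed. Returns the number of slots freed. */
static size_t sweep_page(unsigned char *colors, size_t n,
                         unsigned char alloc_color, unsigned char trace_color) {
  size_t freed = 0;
  for (size_t i = 0; i < n; i++) {
    unsigned char c = colors[i];
    if (c == FREE_SLOT)
      continue;                      /* nothing to free */
    if (c != alloc_color && c != trace_color) {
      colors[i] = FREE_SLOT;         /* purple: reclaim the slot */
      freed++;
    }
  }
  return freed;
}
```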
# Free Space Tracking
TBD
# Starting a Major Collection
The existing GC tracked free space and would start a major GC once the amount of available heap memory fell below a threshold. We continue to use the same strategy with lazy sweeping, but on the slow allocation path the mutators also check how many heap pages are still free. If that number is too low, we trigger a new GC cycle.
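The trigger condition can be sketched as a single predicate checked on the slow allocation path. The function name and both thresholds (20% free heap memory, 2 free pages) are placeholders chosen for illustration, not Cyclone's actual settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder thresholds for illustration only. */
#define MIN_FREE_PERCENT 20   /* start a GC below 20% free heap memory */
#define MIN_FREE_PAGES 2      /* or when fewer than 2 pages are free */

/* Hypothetical check made on the slow allocation path. */
static bool should_start_major_gc(size_t free_bytes, size_t total_bytes,
                                  size_t free_pages) {
  if (free_bytes * 100 < total_bytes * MIN_FREE_PERCENT)
    return true;                       /* existing free-space threshold */
  return free_pages < MIN_FREE_PAGES;  /* new page-count check */
}
```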
# Results
TODO: compare performance of new GC to old one, perhaps with benchmarks (compare 0.8.1 release with current 0.9 branch)
# Conclusion
wrap this up...
# References
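- The Garbage Collection Handbook: The Art of Automatic Memory Management, by Richard Jones, Antony Hosking, and Eliot Moss
- Riptide, WebKit's concurrent garbage collector (see the WebKit blog post)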