cyclone/gc-notes.txt
2016-01-03 23:02:11 -05:00

Phase 1 (gc-dev) - Add gc.h, make sure it compiles.
Phase 2 (gc-dev2) - Change how strings are allocated, to clean up the code and be compatible with a new GC algorithm.
Phase 3 (gc-dev3) - Change from using a Cheney-style copying collector to a naive mark&sweep algorithm.
Phase 4 (gc-dev4) - Integrate new tracing GC algorithm, add new thread data argument to runtime.
Phase 5 (gc-dev5) - Require pthreads library, stand cyclone back up using new GC algorithm.
Phase 6 (gc-dev6) - Multiple mutators (application threads)
Phase 7 (TBD) - Sharing of variables between threads (ideally without limitation, but that might not be realistic)
TODO:
- merge everything back to master??? I think it's just about time
- move mutations into local thread data
- multiple mutators, and threading functions/types. probably want this on a new branch, when ready
  part of this is implementing the beginnings of srfi-18, to create multiple threads, sync them, etc
  next steps:
  - start making core stuff thread safe. for example, test.scm sometimes
    crashes, I think while printing out the result from (read)
  - assume I/O and eval both have threading issues
  - bring exceptions into local thread data? anything else?
    also, will probably need to lock shared resources such as I/O...
  - need a legitimate test program that uses mutexes. am worried that when lock calls into a cont, the program will crash because it returns a boolean object, which the runtime does not handle
- user manual
  need to document everything, including:
  - how to use cyclone (meta files, compiling modules, etc)
  - what to be cognizant of when writing threading code. esp, how to deal with stack objects (initiating minor GCs, etc)
DONE:
- need to cooperate when a mutator is blocked
IMPLEMENTATION NOTES:
these become gc_cont and gc_args, so we need them for the wrapper:
  GC(td,cfn,buf,1); return;
also need the result of the primitive, although that obviously is not
available until after it finishes blocking. will just have to live with that
constraint.
requirements:
- collector detects that the mutator is blocked when it initiates the async transition
- collector will need to perform a minor GC on behalf of this mutator
  will need to pass in top of stack then, since collector won't have that.
  can use address of continuation, if we can guarantee it will always be
  allocated on the stack prior to wrapper call. or can just let the wrapper
  do it, and stash it somewhere collector can get to it
- collector must set flag immediately to let mutator know what happened
- mutator must know when the transition occurs, and wait for it to finish
  - can use mutator lock
will cont always be called via closcall1?
maybe we need to require that the prim accepts cont as an arg. that might simplify
calling the wrapper.
then instead of a wrapper, the prim can call functions to set initial state and clean up. it already does this to set thread state, so this isn't that big of a change (just call 2 other functions):
before_blocking {
  set thread state ==> BLOCKING
  set thd->gc_cont to cont, in case collector needs to use it
  set stack_top to new field in "thd", again in case collector needs it
    OR NOT, I think we can use stack_limit for this, to define the
    range of stack addresses
}
after_blocking {
  set thread state ==> RUNNABLE
  check async flag
  if set:
    wait for thd->lock
    unset async flag
    transport result to heap, if necessary (not a value type)
    set gc_args[0] to result
    longjmp. assumes gc_cont already set by collector
  else:
    call into cont with result, just like today (see Cyc_io_read_line)
}
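The before_blocking / after_blocking outline above could look roughly like this in C. This is only a sketch under assumptions: the thread_data layout, the state constants, resume_kind and collector_cooperate_for are all hypothetical names made up for illustration, not the runtime's real API. One deliberate deviation from the outline: the state change in after_blocking happens while holding the lock, so the collector cannot set the flag after the mutator has already checked it. The longjmp and the copy of the result to the heap are left to the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread states, named after the outline. */
enum { THREAD_RUNNABLE = 0, THREAD_BLOCKING = 1 };

/* Tells the caller how to resume once the primitive has returned. */
typedef enum { RESUME_CALL_CONT, RESUME_LONGJMP } resume_kind;

/* Hypothetical, trimmed-down thread data record. */
typedef struct {
  atomic_int state;        /* THREAD_RUNNABLE or THREAD_BLOCKING */
  atomic_bool async_flag;  /* set by the collector if it cooperated for us */
  pthread_mutex_t lock;    /* held by the collector while it works on us */
  void *gc_cont;           /* continuation to resume after the primitive */
  void *gc_args[1];        /* arguments to pass to gc_cont */
} thread_data;

void thread_data_init(thread_data *thd) {
  atomic_init(&thd->state, THREAD_RUNNABLE);
  atomic_init(&thd->async_flag, false);
  pthread_mutex_init(&thd->lock, NULL);
  thd->gc_cont = NULL;
  thd->gc_args[0] = NULL;
}

/* Mutator: call right before a potentially blocking operation. */
void before_blocking(thread_data *thd, void *cont) {
  thd->gc_cont = cont;  /* publish cont before announcing the state change */
  atomic_store(&thd->state, THREAD_BLOCKING);
}

/* Stand-in for the collector side. Returns true if it cooperated on
   behalf of a blocked mutator. */
bool collector_cooperate_for(thread_data *thd) {
  bool cooperated = false;
  pthread_mutex_lock(&thd->lock);
  if (atomic_load(&thd->state) == THREAD_BLOCKING) {
    atomic_store(&thd->async_flag, true);  /* tell the mutator right away */
    /* a real collector would run the minor GC for this thread here */
    cooperated = true;
  }
  pthread_mutex_unlock(&thd->lock);
  return cooperated;
}

/* Mutator: call right after the blocking operation returns. */
resume_kind after_blocking(thread_data *thd, void *result) {
  pthread_mutex_lock(&thd->lock);  /* waits while the collector works on us */
  atomic_store(&thd->state, THREAD_RUNNABLE);
  bool cooperated = atomic_exchange(&thd->async_flag, false);
  pthread_mutex_unlock(&thd->lock);
  if (cooperated) {
    /* a real runtime would first move a stack-allocated result to the heap */
    thd->gc_args[0] = result;
    return RESUME_LONGJMP;   /* caller longjmps and resumes via gc_cont */
  }
  return RESUME_CALL_CONT;   /* caller invokes cont directly, as today */
}
```

A primitive would call before_blocking(thd, cont), perform the blocking call, then branch on the value returned by after_blocking(thd, result).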
OLDER NOTES:
might be able to stop a thread and do a minor GC on it, but no longjmp until after major GC.
would need to figure out how to repack gc_cont and args
optionally, some primitives can accept a cont, how to handle? I guess we would have to
call the primitive with a wrapper instead of the real cont.
worse, how to handle args to a possibly blocking cont? maybe use some kind of proxy
objects? do these primitives need to use a read barrier?? ideally want low overhead...
at the end of the day, obviously need to use a wrapper function to call the primitive,
instead of calling it directly.
how to stop a thread? suppose collector would set a member in thread data (need to mutex/atomic
that, and be careful about doing that for any shared members), and mutator would need to
lock somehow if that is set upon return.
bottom line, only have to worry about this when calling potentially-blocking primitives.
and if one is blocked when collector is active, then need the collector to cooperate
instead of the blocked mutator. overall this seems do-able, though there are many details
to consider.
- how to share variables between threads?
obviously need to use mutexes (on the application side) to handle access.
but how to handle the case where an object from one thread is added to
a list that belongs to another (IE, queueing an object)? because the
other thread's object might be added as a stack object.
keep in mind referenced obj may be a list or such that contains many other
refs to stack objects on another thread
how can a variable be shared? - cons, vector, set!, define (?), set-car!, set-cdr!
can we detect if there will be a problem?
* adding var to something in this thread - can tell that obj is red and not on this stack
* modifying list on another thread - if list is on heap, how do we know the 'owning' thread is
not this one? we would have no idea
very concerned about how to make this work
since we only need a minor GC to put the var in the heap, might be able to add a function to trigger a minor GC. could call this function, then it would be safe to move a var to another thread (I think).
might also need to expose a function that would determine whether any given object lives on the stack, and which thread it is on (or at least, if it belongs to the current one).
neither is ideal, but might make the whole thing workable. ideally application code would not need to know about stack vs heap
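A function that tells whether an object lives on a given thread's stack could be a plain address range check. The sketch below assumes each thread records two stack boundary addresses; stack_bounds and is_object_on_stack are hypothetical names for illustration only. Because the stack may grow in either direction, the two bounds are ordered before comparing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread stack bounds. Either field may hold the
   numerically larger address, depending on stack growth direction. */
typedef struct {
  const char *stack_start;
  const char *stack_limit;
} stack_bounds;

/* Returns true if obj's address falls inside [low, high) of the bounds. */
bool is_object_on_stack(const stack_bounds *b, const void *obj) {
  uintptr_t a = (uintptr_t)b->stack_start;
  uintptr_t z = (uintptr_t)b->stack_limit;
  uintptr_t p = (uintptr_t)obj;
  uintptr_t low = a < z ? a : z;
  uintptr_t high = a < z ? z : a;
  return p >= low && p < high;
}
```

Code about to hand an object to another thread could call this first and trigger a minor GC only when it returns true.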
this feature might end up being gc-dev7 (possibly the final phase)
ORIGINAL notes migrated here from gc.c:
/*
Rough plan for how to implement new GC algorithm. We need to do this in
phases in order to have any hope of getting everything working. Let's prove
the algorithm out, then extend support to multiple mutators if everything
looks good.
PHASE 1 - separation of mutator and collector into separate threads
need to synchronize access (preferably via atomics) for anything shared between the
collector and mutator threads.
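As a rough illustration of sharing state via atomics, the collector could post a status word that each mutator acknowledges at a safe point. This is a sketch under assumptions: the status names, MAX_MUTATORS and all function names are hypothetical and not taken from the runtime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical collector phases.
enum { STATUS_ASYNC = 0, STATUS_SYNC1 = 1, STATUS_SYNC2 = 2 };

#define MAX_MUTATORS 4

// Shared between collector and mutators, so both are atomic.
// Static storage is zero-initialized, which is STATUS_ASYNC.
atomic_int collector_status;
atomic_int mutator_status[MAX_MUTATORS];

// Collector: announce a new phase.
void collector_post(int status) {
  atomic_store(&collector_status, status);
}

// Mutator i: call at a safe point. Returns true if it had to catch up.
bool mutator_cooperate(int i) {
  int want = atomic_load(&collector_status);
  if (atomic_load(&mutator_status[i]) == want) {
    return false;
  }
  // a real mutator would do its share of marking here before acknowledging
  atomic_store(&mutator_status[i], want);
  return true;
}

// Collector: have the first n mutators acknowledged the current phase?
bool all_acknowledged(int n) {
  int want = atomic_load(&collector_status);
  for (int i = 0; i < n; i++) {
    if (atomic_load(&mutator_status[i]) != want) {
      return false;
    }
  }
  return true;
}
```

The collector would call collector_post and then poll all_acknowledged before moving on to its next phase.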
can cooperate be part of a minor gc? in that case, the
marking could be done as part of allocation
but then what exactly does that mean, to mark gray? because
objects moved to the heap will be set to mark color at that
point (until collector thread finishes). but would want
objects on the heap referenced by them to be traced, so
I suppose that is the purpose of the gray, to indicate
those still need to be traced. but need to think this through,
do we need the markbuffer and last read/write? do those make
sense with mta approach (assume so)???
ONLY CONCERN - what happens if an object on the stack
has a reference to an object on the heap that is collected?
but how would this happen? collector marks global roots before
telling mutators to go to async, and once mutators go async
any allocations will not be collected. also once mutators go
async they have a chance to markgray, which will include the write
barrier. so given that, is it still possible for an old heap ref to
sneak into a stack object during the async phase?
more questions on above point:
- figure out how/if after cooperation/async, can a stack object pick
up a reference to a heap object that will be collected during that GC cycle?
need to be able to prevent this somehow...
- need to figure out real world use case(s) where this could happen, to try and
figure out how to address this problem
from my understanding of the paper, the write barrier prevents this. consider, at the
start of async, the mutator's roots, global roots, and anything on the write barrier
have been marked. any new objects will be allocated as marked. that way, anything the
mutator could later access is either marked or will be after tracing. the only exception
is if the mutator changes a reference such that tracing will no longer find an object.
but the write barrier prevents this - during tracing a heap update causes the old
object to be marked as well. so it will eventually be traced, and there should be no
dangling objects after GC completes.
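The write barrier argument above can be shown with a toy single-threaded model. This is a sketch only: the object layout, the colors and every function name are assumptions made for illustration, and a real collector would run tracing on its own thread. While tracing is active, overwriting a heap field first grays the old target, so an object the mutator still holds is traced even after its only heap reference is removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Toy mark colors: WHITE = unmarked, GRAY = marked but not yet traced,
// BLACK = marked and traced.
typedef enum { WHITE, GRAY, BLACK } color;

typedef struct object {
  color mark;
  struct object *field;  // single outgoing reference, may be NULL
} object;

#define MARK_STACK_MAX 64
object *mark_stack[MARK_STACK_MAX];
size_t mark_top = 0;
bool tracing = false;

// Gray an unmarked object and queue it for tracing.
void mark_gray(object *o) {
  if (o != NULL && o->mark == WHITE) {
    assert(mark_top < MARK_STACK_MAX);
    o->mark = GRAY;
    mark_stack[mark_top++] = o;
  }
}

// Collector: start a cycle from a single root.
void begin_tracing(object *root) {
  tracing = true;
  mark_gray(root);
}

// Mutator: every heap update goes through this barrier. During tracing
// the old target is grayed before the reference to it is overwritten.
void write_field(object *holder, object *new_value) {
  if (tracing) {
    mark_gray(holder->field);
  }
  holder->field = new_value;
}

// Collector: trace until no gray objects remain.
void trace_all(void) {
  while (mark_top > 0) {
    object *o = mark_stack[--mark_top];
    mark_gray(o->field);
    o->mark = BLACK;
  }
  tracing = false;
}
```

With a chain root -> a -> b, a mutator that reads b from a and then calls write_field(&a, NULL) before the collector reaches a still leaves b marked once trace_all returns.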
PHASE 2 - multi-threaded mutator (IE, more than one stack thread):
- how does the collector handle stack objects that reference objects from
another thread's stack?
* minor GC will only relocate that thread's objects, so another thread's would not
be moved. however, if another thread references one of the GC'd thread's
stack objects, it will now get a forwarding pointer. even worse, what if the
other thread is blocked and the reference becomes corrupt due to the stack
longjmp? there are major issues with one thread referencing another thread's
objects.
* had considered adding a stack bit to the object header. if we do this and
initialize it during object creation, a thread could in theory detect
if an object belongs to another thread. but it might be expensive because
a read barrier would have to be used to check the object's stack bit and
address (to see if it is on this stack).
* alternatively, how would one thread pick up a reference to another one's
objects? are there any ways to detect these events and deal with them?
it might be possible to detect such a case and allocate the object on the heap,
replacing it with a fwd pointer. unfortunately that means we need a read
barrier (ick) to handle forwarding pointers in arbitrary places
* but does that mean we need a fwd pointer to be live for awhile? do we need
a read barrier to get this to work? obviously we want to avoid a read barrier
at all costs.
- what are the real costs of allowing forwarding pointers to exist outside of just
minor GC? assume each runtime primitive would need to be updated to handle the
case where the obj is a fwd pointer - is it just a matter of each function
detecting this and (possibly) calling itself again with the 'real' address?
obviously that makes the runtime slower due to more checks, but maybe it is
not *so* bad?
*/
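The cost question about long-lived forwarding pointers can be made concrete with a small sketch. The tags, struct layout and function names here are hypothetical, chosen only for illustration. Each primitive resolves its argument before touching it, which is the extra check the notes worry about; a loop is used in place of the primitive calling itself again, which has the same effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object tags. */
typedef enum { TAG_BOX, TAG_FORWARD } object_tag;

typedef struct object {
  object_tag tag;
  struct object *forward;  /* valid only when tag == TAG_FORWARD */
  int value;               /* payload, valid only when tag == TAG_BOX */
} object;

/* Follow forwarding pointers until the real object is reached. */
object *resolve(object *obj) {
  while (obj != NULL && obj->tag == TAG_FORWARD) {
    obj = obj->forward;
  }
  return obj;
}

/* Example primitive: every access pays for one resolve() check. */
int box_value(object *obj) {
  obj = resolve(obj);
  assert(obj != NULL && obj->tag == TAG_BOX);
  return obj->value;
}

/* Relocate old_obj into new_obj and leave a forwarding pointer behind,
   as a minor GC would when moving a stack object to the heap. */
void forward_to(object *old_obj, object *new_obj) {
  new_obj->tag = TAG_BOX;
  new_obj->forward = NULL;
  new_obj->value = old_obj->value;
  old_obj->tag = TAG_FORWARD;
  old_obj->forward = new_obj;
}
```

A thread still holding the old address keeps working after the move, at the price of the tag check on every primitive call.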