cyclone/gc-notes.txt

Phase 1 (gc-dev) - Add gc.h, make sure it compiles.
Phase 2 (gc-dev2) - Change how strings are allocated, to clean up the code and be compatible with a new GC algorithm.
Phase 3 (gc-dev3) - Change from using a Cheney-style copying collector to a naive mark&sweep algorithm.
Phase 4 (gc-dev4) - Integrate the new tracing GC algorithm, add a new thread data argument to the runtime.
Phase 5 (gc-dev5) - Require pthreads library, stand cyclone back up using new GC algorithm.
Phase 6 (gc-dev6) - Multiple mutators (application threads)
Phase 7 (TBD) - Sharing of variables between threads (ideally without limitation, but that might not be realistic)

TODO:
- create a branch and try to use CK atomics. - seems done, just keep an eye on this

- multiple mutators, and threading functions/types.  probably want this on a new branch, when ready
  part of this is implementing the beginnings of srfi-18, to create multiple threads, sync them, etc
  next steps:
  - add mutex type, and associated functions from SRFI-18
    when allocating a mutex, probably should do it on the heap (not a thread's stack)
    since by definition these are shared among multiple threads
  - may be able to free a mutex by calling pthread_mutex_destroy from within gc_sweep.
    unfortunately it requires type checking each object before free, which is not ideal

  - start making core stuff thread safe. for example, test.scm sometimes
    crashes, I think while printing out the result from (read)
  - assume I/O and eval both have threading issues

- bring exceptions into local thread data? anything else?
  also, will probably need to lock shared resources such as I/O...
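
  rough sketch of the mutex cleanup idea from the next-steps list above: a
  sweep that checks each dead object's type so a mutex can be destroyed
  before its memory is freed. everything here is hypothetical (obj_header,
  mutex_obj, TAG_MUTEX, alloc_mutex, the linked list of objects); the real
  heap layout and gc_sweep are different. the only real API used is pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical tags and layouts; the real runtime's object headers differ. */
enum obj_tag { TAG_PAIR, TAG_MUTEX };

struct obj_header {
  enum obj_tag tag;
  int marked;               /* set by the mark phase */
  struct obj_header *next;  /* simple all-objects list, for illustration only */
};

struct mutex_obj {
  struct obj_header hdr;    /* must stay first so the cast below is valid */
  pthread_mutex_t lock;
};

/* Allocate a mutex object on the heap and push it onto the object list. */
struct obj_header *alloc_mutex(struct obj_header *list) {
  struct mutex_obj *m = malloc(sizeof *m);
  assert(m != NULL);
  m->hdr.tag = TAG_MUTEX;
  m->hdr.marked = 0;
  m->hdr.next = list;
  pthread_mutex_init(&m->lock, NULL);
  return &m->hdr;
}

/* Free every unmarked object and return how many were freed.
 * The tag check on each dead object is the extra cost the note mentions. */
int sweep(struct obj_header **list) {
  int freed = 0;
  struct obj_header **link = list;
  while (*link != NULL) {
    struct obj_header *o = *link;
    if (o->marked) {
      o->marked = 0;        /* reset for the next collection cycle */
      link = &o->next;
    } else {
      *link = o->next;      /* unlink before freeing */
      if (o->tag == TAG_MUTEX) {
        pthread_mutex_destroy(&((struct mutex_obj *)o)->lock);
      }
      free(o);
      freed++;
    }
  }
  return freed;
}
```

  the check only runs for objects that are about to be freed, so live
  objects pay nothing extra in this version.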


DONE:
- need to cooperate when a mutator is blocked
 IMPLEMENTATION NOTES:

 these become gc_cont and gc_args, so we need them for the wrapper:
     GC(td,cfn,buf,1); return;
 also need the result of the primitive, although that obviously is not
 available until after it finishes blocking. will just have to live with that
 constraint.

 requirements:
  - collector detects that the mutator is blocked and initiates the async transition
  - collector will need to perform a minor GC on behalf of this mutator.
    will need to pass in top of stack then, since collector won't have that.
    can use address of continuation, if we can guarantee it will always be
    allocated on the stack prior to wrapper call. or can just let the wrapper
    do it, and stash it somewhere collector can get to it
  - collector must set flag immediately to let mutator know what happened
  - mutator must know when the transition occurs, and wait for it to finish
  - can use mutator lock

  will cont always be called via closcall1?
  maybe we need to require prim accepts cont as an arg. might simplify
  calling the wrapper.

  then instead of a wrapper, the prim can call functions to set initial state and cleanup. it already does this to set thread state, so this isn't that big of a change (just call 2 other functions):

  before_blocking {
    set thread state ==> BLOCKING
    set thd->gc_cont to cont, in case collector needs to use it
    set stack_top to new field in "thd", again in case collector needs it
      OR NOT, I think we can use stack_limit for this, to define the
      range of stack addresses
  }

  after_blocking {
    set thread state ==> RUNNABLE
    check async flag
    if set:
      wait for thd->lock
      unset async flag
      transport result to heap, if necessary (not a value type)
      set gc_args[0] to result
      longjmp. assumes gc_cont already set by collector
    else:
      call into cont with result, just like today (see Cyc_io_read_line)
  }
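
  the two helpers above could look roughly like this in C. this is a sketch
  under assumptions, not the runtime's code: the struct layout, the field
  name async_flag, the enum names, and collector_cooperate are made up for
  illustration. instead of doing the longjmp or calling cont directly,
  after_blocking just reports which path the caller should take, so the
  logic can be exercised on its own.

```c
#include <pthread.h>
#include <stdatomic.h>

typedef void *object;

enum thread_state { THREAD_RUNNABLE, THREAD_BLOCKING };
enum resume_path { RESUME_CALL_CONT, RESUME_LONGJMP };

/* Hypothetical subset of the per-thread data. */
struct thread_data {
  atomic_int state;        /* enum thread_state */
  atomic_int async_flag;   /* set by the collector if it cooperated */
  pthread_mutex_t lock;    /* held by the collector while it works */
  object gc_cont;
  object gc_args[1];
};

void thread_data_init(struct thread_data *thd) {
  atomic_init(&thd->state, THREAD_RUNNABLE);
  atomic_init(&thd->async_flag, 0);
  pthread_mutex_init(&thd->lock, NULL);
  thd->gc_cont = NULL;
  thd->gc_args[0] = NULL;
}

/* Called by a primitive just before it may block. */
void before_blocking(struct thread_data *thd, object cont) {
  thd->gc_cont = cont;     /* in case the collector needs it */
  atomic_store(&thd->state, THREAD_BLOCKING);
}

/* Stand-in for the collector: cooperate only if the mutator is blocked.
 * Returns 1 if it took over for the mutator. */
int collector_cooperate(struct thread_data *thd) {
  int cooperated = 0;
  pthread_mutex_lock(&thd->lock);
  if (atomic_load(&thd->state) == THREAD_BLOCKING) {
    atomic_store(&thd->async_flag, 1);  /* set the flag immediately */
    /* a real collector would run the minor GC here and update gc_cont */
    cooperated = 1;
  }
  pthread_mutex_unlock(&thd->lock);
  return cooperated;
}

/* Called by the primitive once the blocking call returns.
 * NOTE: the gap between the state store and the flag check is a race
 * that a real implementation would have to close. */
enum resume_path after_blocking(struct thread_data *thd, object result) {
  atomic_store(&thd->state, THREAD_RUNNABLE);
  if (atomic_load(&thd->async_flag)) {
    pthread_mutex_lock(&thd->lock);     /* wait for the collector to finish */
    atomic_store(&thd->async_flag, 0);
    /* a non-value result would be moved to the heap here */
    thd->gc_args[0] = result;
    pthread_mutex_unlock(&thd->lock);
    return RESUME_LONGJMP;              /* caller longjmps; gc_cont is set */
  }
  return RESUME_CALL_CONT;              /* caller invokes cont as usual */
}
```

  a primitive would call before_blocking(thd, cont), do its blocking work,
  then switch on the value returned by after_blocking(thd, result).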

 OLDER NOTES:
  might be able to stop a thread and do a minor GC on it, but no longjmp until after major GC.
  would need to figure out how to repack gc_cont and args
  optionally, some primitives can accept a cont, how to handle? I guess we would have to
  call the primitive with a wrapper instead of the real cont.
  worse, how to handle args to a possibly blocking cont? maybe use some kind of proxy
  objects? do these primitives need to use a read barrier?? ideally want low overhead...

  at the end of the day, obviously need to use a wrapper function to call the primitive,
  instead of calling it directly.

  how to stop a thread? suppose mutator would set a member in thread data (need to mutex/atomic
  that, and be careful about doing that for any shared members), and mutator would need to
  lock somehow if that is set upon return.

  bottom line, only have to worry about this when calling potentially-blocking primitives.
  and if one is blocked when collector is active, then need the collector to cooperate
  instead of the blocked mutator. overall this seems do-able, though there are many details
  to consider.
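
  one possible way to handle the "need to mutex/atomic that" part is a single
  atomic status word updated with compare-and-swap, so the collector and the
  returning mutator cannot both think they own the thread. the names below
  (thread_status, ST_*, collector_try_claim, mutator_try_resume) are invented
  for this sketch, and it uses C11 <stdatomic.h> rather than CK.

```c
#include <stdatomic.h>

/* Hypothetical states for a mutator around a blocking call. */
enum { ST_RUNNABLE = 0, ST_BLOCKED = 1, ST_BLOCKED_COOPERATING = 2 };

struct thread_status {
  atomic_int state;
};

/* Collector side: try to take over a blocked mutator.
 * Returns 1 if the collector now owns it, 0 if it was not blocked. */
int collector_try_claim(struct thread_status *t) {
  int expected = ST_BLOCKED;
  return atomic_compare_exchange_strong(&t->state, &expected,
                                        ST_BLOCKED_COOPERATING);
}

/* Mutator side, on return from the blocking call.
 * Returns 1 if it resumed normally, 0 if the collector claimed it first
 * (in which case the mutator must wait for the collector to finish). */
int mutator_try_resume(struct thread_status *t) {
  int expected = ST_BLOCKED;
  return atomic_compare_exchange_strong(&t->state, &expected, ST_RUNNABLE);
}
```

  exactly one of the two CAS calls can succeed for a given blocked period,
  which is what decides who performs the minor GC.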

- how to share variables between threads?
  obviously need to use mutexes (on the application side) to handle access.
  but how to handle the case where an object from one thread is added to
  a list that belongs to another (IE, queueing an object)? because the
  other thread's object might be added as a stack object.

  keep in mind referenced obj may be a list or such that contains many other
  refs to stack objects on another thread

  how can a variable be shared? - cons, vector, set!, define (?), set-car, set-cdr
  can we detect if there will be a problem?
  * adding var to something in this thread - can tell that obj is red and not on this stack
  * modifying list on another thread - if list is on heap, how do we know the 'owning' thread is
    not this one? we would have no idea

  very concerned about how to make this work

  since we only need a minor GC to put the var in the heap, might be able to add a function to trigger a minor GC. could call this function, then it would be safe to move a var to another thread (I think).

  might also need to expose a function that would determine whether any given object lives on the stack, and which thread it is on (or at least, if it belongs to the current one).

  neither is ideal, but might make the whole thing workable. ideally application code would not need to know about stack vs heap
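
  the "does this object live on my stack" check could be a plain address
  range test, assuming each thread records the two ends of its stack. the
  struct and function names here are hypothetical; the notes only mention
  stack_limit, and the real thread data may store this differently.

```c
#include <stdint.h>

/* Hypothetical record of one thread's stack bounds. */
struct stack_range {
  uintptr_t start;  /* address near the base of this thread's stack */
  uintptr_t limit;  /* furthest address the stack is allowed to reach */
};

/* Returns 1 if obj lies between limit and start (inclusive), 0 otherwise.
 * Works whether the stack grows up or down, since it orders the bounds. */
int is_on_stack(const struct stack_range *r, const void *obj) {
  uintptr_t addr = (uintptr_t)obj;
  uintptr_t lo = r->start < r->limit ? r->start : r->limit;
  uintptr_t hi = r->start < r->limit ? r->limit : r->start;
  return addr >= lo && addr <= hi;
}
```

  this only answers "is it on the current thread's stack". finding which
  other thread owns an object would mean testing it against every thread's
  range.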

  this feature might end up being gc-dev7 (possibly the final phase)