cyclone/TODO
2015-07-24 17:47:01 -04:00

Roadmap:
- Add macro support (instead of current kludge)
- Target r7rs support (coordinate with feature list)
- User manual (or at least API docs)
Working TODO list. Should start creating issues for these to get them out of here:
- self-hosting: there are a lot of accumulated TODOs that need to be addressed
- improved error handling:
  - param count checks
    is what we have now robust enough to prevent segfaults?
  - type checking
    ideally want to do this in a way that minimizes performance impacts.
    will probably require extensive checks within apply() though, since that
    all happens at runtime.
  without these, it will be impossible (or at least time-consuming) to debug issues going forward
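The param count and type checks could share one guard that apply() runs before dispatching. Below is a minimal C sketch, assuming a hypothetical object header with a type tag and a closure that records its arity in `num_args` (with -1 meaning variadic); these names are illustrative, not the runtime's actual types. Each check is a single compare on data already in hand, which is one way to keep the performance impact small.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags; the real runtime's tags may differ. */
typedef enum { INTEGER_TAG, PAIR_TAG, CLOSURE_TAG } tag_type;

typedef struct { tag_type tag; } object_header;

/* Hypothetical closure layout: header first, then its arity. */
typedef struct {
    object_header hdr;
    int num_args;            /* required arg count; -1 means variadic */
} closure_type;

typedef enum { APPLY_OK, ERR_NOT_PROCEDURE, ERR_ARG_COUNT } apply_status;

/* Validate a call before dispatch, so a bad call reports an error
   instead of reading past the argument list and segfaulting. */
static apply_status check_apply(const object_header *fn, int argc) {
    if (fn == NULL || fn->tag != CLOSURE_TAG)
        return ERR_NOT_PROCEDURE;            /* type check */
    const closure_type *c = (const closure_type *)fn;
    if (c->num_args >= 0 && c->num_args != argc)
        return ERR_ARG_COUNT;                /* param count check */
    return APPLY_OK;
}
```

The cast is valid because `hdr` is the first member of `closure_type`, so a pointer to a closure's header also points to the closure itself.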
2) Need to either allow code to read an import after macro expansion, or have another main module for self-hosting
- Documentation improvements
  - create a getting started page to go into more detail (the build section could move to that page; it could go over build options, rlwrap, etc.)
  - create a 'how this was built' page to go into more detail about which references were used where
  - the features section is not accurate; need to assess what is implemented
- eval
  there is no concept of macro expansion, probably other deficiencies as well
  almost certainly will break when running the self-hosted compiler...
  no, I think this is ok. may want to unify macros with compiler side though
- vectors
  - make-vector should have an optional 'fill' arg in compiled code
  - note: allocation functions can be functions instead of macros if they accept a
    cont arg, so they do not have to return
  - should make GC more efficient: only transport the mutated vector index, not the whole vector
  - After all this works, make sure to add tests from r7rs to the test suite
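The note about allocation functions taking a cont arg can be illustrated with a Cheney-on-the-MTA style sketch in C: the vector is allocated in the allocating function's own stack frame, so rather than returning it (which would leave a dangling pointer) the function passes it to a continuation. `make_vector_k`, `vector_type`, the callback signature and the `count_fill` example continuation are all hypothetical; the `fill` parameter shows where the optional fill arg would go.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector layout. */
typedef struct {
    size_t num_elt;
    void **elts;
} vector_type;

/* Continuation: receives caller data plus the freshly allocated vector. */
typedef void (*vector_cont)(void *data, vector_type *v);

/* Allocate a vector in this frame and hand it to k. The object is only
   valid while k runs; in Cheney on the MTA the continuation never returns
   normally, and a minor GC copies live stack objects to the heap before
   the stack is reset. */
static void make_vector_k(size_t len, void *fill, vector_cont k, void *data) {
    void *elts[len > 0 ? len : 1];   /* VLA on the C stack; never size 0 */
    vector_type v = { len, elts };
    for (size_t i = 0; i < len; i++)
        elts[i] = fill;               /* the optional 'fill' arg */
    k(data, &v);
}

/* Example continuation: count how many slots hold the fill value. */
struct fill_count { void *fill; size_t count; size_t len; };

static void count_fill(void *data, vector_type *v) {
    struct fill_count *fc = data;
    fc->len = v->num_elt;
    fc->count = 0;
    for (size_t i = 0; i < v->num_elt; i++)
        if (v->elts[i] == fc->fill)
            fc->count++;
}
```

Because the allocating frame stays live while the continuation runs, no macro is needed to keep the allocation in the caller's frame.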
- I/O
  - may be able to use fmemopen to implement output strings, although it is not supported on Windows
  - what else? there must be more stuff
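A minimal sketch of an output string built on POSIX `fmemopen`, which is not available on Windows. `write_datum_to_string` is a hypothetical helper that prints into a fixed, caller-supplied buffer; output longer than the buffer is truncated.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fmemopen in strict C modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print a value into buf through a FILE* backed by memory, the way an
   output string port might. Returns bytes written, or -1 on failure. */
static int write_datum_to_string(char *buf, size_t size, long n) {
    memset(buf, 0, size);                  /* keep the result NUL-terminated */
    FILE *port = fmemopen(buf, size, "w");
    if (port == NULL)
        return -1;
    int written = fprintf(port, "(value %ld)", n);
    if (fclose(port) != 0 || written < 0)  /* fclose flushes into buf */
        return -1;
    return written;
}
```

`open_memstream`, also from POSIX.1-2008 and likewise missing on Windows, grows its buffer automatically, which may suit output strings of unknown length better than a fixed `fmemopen` buffer.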
- Reduction in size of generated code
  is there anything we can do?
  are closures being packed/unpacked unnecessarily?
  right now, bundling this with the item below and attempting to optimize the CPS conversion process to generate less code
- Compiler efficiency
  Is there a more efficient way to CPS convert expressions? For example, can these primitive applications be inlined, instead of creating wrapping lambdas?
    #;10> (cps-convert '(+ 1 (+ 2 2) (+ 3 3)))
    ((lambda (r$632) ((lambda (r$633) (%halt (+ 1 r$632 r$633))) (+ 3 3))) (+ 2 2))
  yes, but what can we do when a function is applied to more than just literals? are there still opportunities for optimization?
  may need to look through icyc.c for examples
  Are there instances where full CPS is not needed, and multiple expressions can be contained within a single lambda?
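One common answer to the inlining question is to treat an expression as trivial when it is a literal, a variable, or a primitive application whose arguments are all trivial; such expressions cannot capture the continuation, so the converter can leave them inline instead of wrapping them in lambdas. Under that rule the cps-convert example would reduce to `(%halt (+ 1 (+ 2 2) (+ 3 3)))`. The C sketch below only shows the triviality test on a hypothetical tiny AST (the real converter is written in Scheme); applications of non-primitive functions still need full CPS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical AST for a tiny subset of Scheme. */
typedef enum { LIT, VAR, PRIM_APP, FUNC_APP } node_kind;

typedef struct node {
    node_kind kind;
    size_t argc;                        /* only used by applications */
    const struct node *const *args;
} node;

/* A node is trivial if evaluating it cannot capture or escape the
   continuation, so no wrapping lambda is needed for it. */
static bool is_trivial(const node *n) {
    switch (n->kind) {
    case LIT:
    case VAR:
        return true;
    case PRIM_APP:
        for (size_t i = 0; i < n->argc; i++)
            if (!is_trivial(n->args[i]))
                return false;
        return true;
    case FUNC_APP:
    default:
        return false;                   /* user function: full CPS */
    }
}
```

A converter using this test would CPS-convert only the non-trivial subexpressions and splice trivial ones directly into the enclosing call.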
- Simple idea about local variable elimination:
  http://www.pvk.ca/Blog/2012/02/19/fixed-points-and-strike-mandates/
- String support
  issue is how to support strings themselves in memory. Can add them directly to the string_type, but then apply won't work
  because it could return an unknown number of bytes. On the other hand, could use a separate data heap that is mirrored during GC.
  May need some extra buffer for that, because technically it could overflow any time a new string is allocated, not just during
  function calls. But this would work for apply as well as everything else, I believe. Obviously it makes GC a bit harder because
  there is another pair of heaps to deal with, but all that would be done is that strings from heap A would be copied to B during GC.
  GC would need to keep track of a pointer to each one. Sounds straightforward, but have to be careful of complications.
  Initial plan:
  - Add two "data" heap sections, and vars for each (head ptr, pos ptr [active only?], size)
  - Allocate string on active data heap via make_string
  - Initiate GC when stack exceeded or data heap under certain threshold
  - Need adequate extra space in data heap (100K? make config), since we only check it upon function call
  - Need to update GC to copy strings to other heap
  - Wait, this is broken if anything is pointing to one of these strings, since memory location changes upon GC!
    Is that a fatal issue? How to handle? Could write string operations such that they operate on copies of
    strings rather than pointing to another string. Not nearly as efficient, but avoids this problem. Could revisit
    other solutions down the road.
  - Anything else? Probably want to branch for this development, just in case there are complications
  COMPLICATION - only need to memcpy strings on data heap during a major collection. During a minor collection the strings are already where they need to be
  need to fully implement this in the runtime by passing a minor/major flag to transport
  TODO: trigger GC if data heap too low
  TODO: once this works but before moving all, consolidate all this in docs/strings.txt or such. Would be useful to keep these notes
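The two-data-heap plan can be sketched in C as follows. All names are hypothetical (`string_type` here is reduced to a length and a pointer into a data heap), and the sketch follows the plan's own choices: bump allocation on the active heap, a low-space check against a configurable reserve, and copying only during a major collection. As the plan warns, only the string object handed to `transport_string` is updated, so any other raw pointer into the old data heap would go stale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object: characters live in a separate data heap. */
typedef struct { size_t len; char *str; } string_type;

typedef struct { char *base; size_t pos, size; } data_heap;

static data_heap heaps[2];   /* heaps A and B; only one is active */
static int active = 0;

static int data_heap_init(size_t size) {
    for (int i = 0; i < 2; i++) {
        heaps[i].base = malloc(size);
        heaps[i].pos = 0;
        heaps[i].size = size;
    }
    active = 0;
    if (heaps[0].base == NULL || heaps[1].base == NULL) {
        free(heaps[0].base);             /* free(NULL) is a no-op */
        free(heaps[1].base);
        heaps[0].base = heaps[1].base = NULL;
        return -1;
    }
    return 0;
}

/* True when free space is below the reserve. Space is only checked at
   function calls, so the reserve must cover strings allocated in between. */
static int data_heap_low(size_t reserve) {
    const data_heap *h = &heaps[active];
    return h->size - h->pos < reserve;
}

/* make_string: bump-allocate a copy of src on the active data heap. */
static int make_string(string_type *s, const char *src) {
    data_heap *h = &heaps[active];
    size_t n = strlen(src) + 1;
    if (h->size - h->pos < n)
        return -1;                       /* caller must trigger a GC first */
    s->str = memcpy(h->base + h->pos, src, n);
    s->len = n - 1;
    h->pos += n;
    return 0;
}

/* Minor GC: characters are already on the data heap, nothing to do.
   Major GC: copy the live string to the inactive heap. */
static void transport_string(string_type *s, int major) {
    if (!major)
        return;
    data_heap *to = &heaps[!active];
    char *dst = to->base + to->pos;
    memcpy(dst, s->str, s->len + 1);
    to->pos += s->len + 1;
    s->str = dst;
}

static void major_gc_begin(void) { heaps[!active].pos = 0; }
static void major_gc_end(void) { heaps[active].pos = 0; active = !active; }
```

Since only live strings are copied and both heaps are the same size, the copy always fits in the inactive heap.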
- Think about consolidating the list of primitives in (prim?) and (c-compile-prim?). Should also include other information, such as the number of args (or variable args), for error handling.
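One possible shape for that consolidated list is a single table that both the primitive check and code generation consult, carrying an argument count range for error handling. A C sketch with hypothetical entries: the C names such as `prim_car` are placeholders, not the compiler's actual generated identifiers. A `max_args` of -1 marks a variadic primitive, and `make-vector` accepts 1 or 2 args to allow the optional fill.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single source of truth for primitives. */
typedef struct {
    const char *name;     /* Scheme name, what (prim?) would check */
    const char *c_name;   /* C function emitted by (c-compile-prim?) */
    int min_args;
    int max_args;         /* -1 means variadic */
} prim_info;

static const prim_info primitives[] = {
    { "+",           "prim_sum",         0, -1 },
    { "car",         "prim_car",         1,  1 },
    { "cons",        "prim_cons",        2,  2 },
    { "make-vector", "prim_make_vector", 1,  2 },
    { "vector-set!", "prim_vector_set",  3,  3 },
};

/* Returns the entry for name, or NULL if it is not a primitive. */
static const prim_info *prim_lookup(const char *name) {
    for (size_t i = 0; i < sizeof primitives / sizeof primitives[0]; i++)
        if (strcmp(primitives[i].name, name) == 0)
            return &primitives[i];
    return NULL;
}

/* Compile-time arity check driven by the same table. */
static int prim_arity_ok(const prim_info *p, int argc) {
    return argc >= p->min_args && (p->max_args < 0 || argc <= p->max_args);
}
```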
- Notes on implementing variables
  * lexical addressing (see chapter 5 of SICP) can be used to find a variable in nested environments, so you can access it directly instead of having to recursively search through the environments.
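Lexical addressing in the SICP style can be sketched in C: the compiler resolves each name once to a (depth, index) pair, and at runtime a lookup walks exactly `depth` parent links and indexes the frame, with no name comparisons. The frame layouts and function names below are hypothetical, and ints stand in for Scheme objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lexical address: frames to walk out, then slot within that frame. */
typedef struct { size_t depth, index; } lex_addr;

/* Compile-time environment: variable names per frame. */
typedef struct ct_frame {
    const struct ct_frame *parent;
    size_t count;
    const char *const *names;
} ct_frame;

/* Runtime environment: values laid out in the same order as the names. */
typedef struct frame {
    const struct frame *parent;
    size_t count;
    const int *values;
} frame;

/* Compile time: returns 1 and fills *out if name is bound, 0 if free. */
static int find_variable(const ct_frame *env, const char *name, lex_addr *out) {
    for (size_t depth = 0; env != NULL; env = env->parent, depth++)
        for (size_t i = 0; i < env->count; i++)
            if (strcmp(env->names[i], name) == 0) {
                out->depth = depth;
                out->index = i;
                return 1;
            }
    return 0;
}

/* Runtime: direct access using the precomputed address. */
static int lookup(const frame *env, lex_addr a) {
    for (size_t d = 0; d < a.depth; d++)
        env = env->parent;
    assert(env != NULL && a.index < env->count);
    return env->values[a.index];
}
```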
- Question about closures and continuations:
  Presumably every function will receive a closure. Do we have to differentiate between the continuation (which every
  function must have) and the closure (which can be empty if there are no free variables)? Right now the MTA runtime combines the two by
  having an fn argument to each closure. Is that OK?
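To make the question concrete, here is a C sketch of the combined representation: every closure carries its code pointer (`fn`) plus its free variables, and a continuation is simply another closure passed as an argument. The calling convention, the fixed two-slot free-variable array, and all names are hypothetical. Under this scheme a closure with no free variables still carries its `fn`, which is the price of not differentiating the two.

```c
#include <assert.h>
#include <stddef.h>

struct closure;

/* Every function receives itself (for free variables), one argument,
   and the continuation to call with its result. */
typedef void (*function_type)(struct closure *self, long arg, struct closure *k);

typedef struct closure {
    function_type fn;     /* code pointer, as in the MTA-style runtime */
    size_t num_fv;
    long fv[2];           /* free variables; unused when num_fv == 0 */
} closure;

static long halt_value;

/* Final continuation: no free variables, just records the result. */
static void halt_fn(closure *self, long arg, closure *k) {
    (void)self;
    (void)k;
    halt_value = arg;
}

/* (lambda (x) (+ x n)) with n free: reads n from self, passes result on. */
static void add_n_fn(closure *self, long arg, closure *k) {
    k->fn(k, arg + self->fv[0], NULL);
}
```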
- Keeping this in here; should mention this in the 'how this works' doc:
GC - notes from: http://www.more-magic.net/posts/internals-gc.html
JAE - Good notes here about mutations (use a write barrier to keep track of changes, e.g. vector-set!). Remember changes so they can be handled properly during the next GC:
Another major oversight is the assumption that objects can only point from the stack into the heap. If Scheme was a purely functional language, this would be entirely accurate: new objects can refer to old objects, but there is no way that a preexisting object can be made to refer to a newly created object. For that, you need to support mutation.
But Scheme does support mutation! So what happens when you use vector-set! to store a newly created, stack-allocated value in an old, heap-allocated vector? If we used the above algorithm, the newly created element would either be part of the live set and get copied, but the vector's pointer would not be updated, or it wouldn't be part of the live set and the object would be lost in the stack reset.
The answer to this problem is also pretty simple: we add a so-called write barrier. Whenever a value is written to an object, it is remembered. Then, when performing a GC, these remembered values are considered to be part of the live set, just like the addresses in the saved call. This is also the reason CHICKEN always shows the number of mutations when you're asking for GC statistics: mutation may slow down a program because GCs might take longer.
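A write barrier for `vector-set!` could look like the C sketch below: the setter stores the value and logs the (vector, index) pair in a remembered set, and the next minor GC treats each logged slot as an extra root. All names are hypothetical, `copy_long` is a toy stand-in for transporting an object, and a real collector would use forwarding pointers so that a slot logged twice is not copied twice. Logging the slot rather than the whole vector is also one way to get the "only transport the mutated vector index" behavior from the vectors item.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t num_elt; void **elts; } vector_type;

/* Remembered set: slots written since the last GC. */
typedef struct { vector_type *vec; size_t index; } mutation;

#define MAX_MUTATIONS 1024
static mutation mutations[MAX_MUTATIONS];
static size_t num_mutations = 0;

/* vector-set! with a write barrier. Returns -1 if the index is out of
   range or the log is full (a real runtime would grow it or GC instead). */
static int vector_set(vector_type *v, size_t i, void *value) {
    if (i >= v->num_elt || num_mutations == MAX_MUTATIONS)
        return -1;
    v->elts[i] = value;
    mutations[num_mutations].vec = v;
    mutations[num_mutations].index = i;
    num_mutations++;
    return 0;
}

/* Minor GC: visit each remembered slot like a root; 'move' transports
   the object and returns its new address. */
static void gc_scan_mutations(void *(*move)(void *)) {
    for (size_t m = 0; m < num_mutations; m++) {
        mutation *mu = &mutations[m];
        mu->vec->elts[mu->index] = move(mu->vec->elts[mu->index]);
    }
    num_mutations = 0;   /* the log is cleared once the GC has handled it */
}

/* Toy transport for demonstration: copy a long into a small to-space. */
static long to_space[16];
static size_t to_space_used = 0;

static void *copy_long(void *obj) {
    assert(to_space_used < 16);
    to_space[to_space_used] = *(long *)obj;
    return &to_space[to_space_used++];
}
```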
JAE - Important point: the heap must be reallocated during a major GC if there is too much data in the stack / old heap. Considered this, but not sure if Cyclone's GC does that right now:
The smart reader might have noticed a small problem here: what if the amount of garbage cleaned up is less than the data on the stack? Then, the stack data can't be copied to the new heap because it simply is too small. Well, this is when a third GC mode is triggered: a reallocating GC. This causes a new heap to be allocated, twice as big as the current heap. This is also split in from- and tospace. Then, Cheney's algorithm is performed on the old heap's fromspace, using one half of the new heap as tospace. When it's finished, the new tospace is called fromspace, and the other half of the new heap is called tospace. Then, the old heap is de-allocated.
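The reallocating GC described in the quote can be sketched in C: allocate a new heap whose semispaces are twice as large, copy the live data into one half, call that half fromspace and the other tospace, then free the old heap. The live data is modeled as one contiguous block so `memcpy` stands in for Cheney's scan, the names are hypothetical, and the loop that keeps doubling until the data fits is a small extension beyond the quoted description.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *mem;          /* one allocation holding both semispaces */
    size_t half;        /* size of each semispace */
    char *fromspace;
    char *tospace;
    size_t used;        /* bytes in use in fromspace */
} heap_type;

static int heap_init(heap_type *h, size_t half) {
    h->mem = malloc(2 * half);
    if (h->mem == NULL)
        return -1;
    h->half = half;
    h->fromspace = h->mem;
    h->tospace = h->mem + half;
    h->used = 0;
    return 0;
}

/* Reallocating GC: 'live' is the surviving heap data plus the stack data
   being promoted; it may point into the old heap, so copy before freeing. */
static int reallocating_gc(heap_type *h, const char *live, size_t live_bytes) {
    size_t new_half = 2 * h->half;
    while (new_half < live_bytes)
        new_half *= 2;
    char *mem = malloc(2 * new_half);
    if (mem == NULL)
        return -1;
    memcpy(mem, live, live_bytes);      /* first half acts as tospace */
    free(h->mem);                       /* old heap is de-allocated */
    h->mem = mem;
    h->half = new_half;
    h->fromspace = mem;                 /* filled half becomes fromspace */
    h->tospace = mem + new_half;
    h->used = live_bytes;
    return 0;
}
```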