cyclone/docs/History.md
2015-07-31 22:08:07 -04:00

Background on Writing Cyclone Scheme

This document covers some of the background on how Cyclone was written, including aspects of the compiler and runtime system.

Before we get started, I want to say thank you to everyone who has contributed to the Scheme community. At the end of this document is a list of online resources that were the most helpful and influential in writing Cyclone. Without quality Scheme resources like these it would not have been possible to write Cyclone.

In addition to those resources, developing Husk Scheme helped me discover many of the resources that would later be used to build Cyclone. In fact, the primary motivation in building Cyclone was to go a step further and understand not only how to write a Scheme compiler but also how to build a runtime system. Over time some of the features and understanding gained in Cyclone may be folded back into Husk.

Source-to-Source Transformations

My primary inspiration for Cyclone was Marc Feeley's The 90 minute Scheme to C compiler (also video and code). Over the course of 90 minutes, Feeley demonstrates how to compile Scheme to C code using source-to-source transformations, including closure and continuation-passing-style (CPS) conversions.

As outlined in the presentation, some of the difficulties in compiling to C are:

Scheme has, and C does not have

  • tail-calls a.k.a. tail-recursion optimization
  • first-class continuations
  • closures of indefinite extent
  • automatic memory management i.e. garbage collection (GC)

Implications

  • cannot translate (all) Scheme calls into C calls
  • have to implement continuations
  • have to implement closures
  • have to organize things to allow GC

The rest is easy!

To overcome these difficulties, a series of source-to-source transformations is used to remove powerful features not provided by C, add constructs required by the C code, and restructure/relabel the code in preparation for generating C. The final code may be compiled directly to C. Cyclone also includes many other intermediate transformations.

Since Scheme represents both code and data using S-Expressions, our compiler does not have to use abstract data types to store the code as would be the case with many other languages.

The 90-minute scc ultimately compiles the code down to a single function and uses jumps to support continuations. This is a bit too limiting for a production compiler, so that part was not used.
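To make the single-function strategy concrete, here is a small hand-written sketch. It is not output from Feeley's compiler, and every name in it (`run_fact`, the labels, the explicit stacks) is hypothetical. Each "lambda" becomes a case in one dispatch loop, a call is a jump, and a continuation is just a label saved on an explicit stack.

```c
#include <stddef.h>

/* Hypothetical labels: one per "lambda" in the compiled program. */
enum label { FACT, FACT_RETURN, DONE };

/* Computes n! inside a single C function.
   Assumes 0 <= n < 64 so the fixed-size stacks below cannot overflow. */
long run_fact(long n) {
  long stack[64];         /* saved values of n awaiting multiplication */
  enum label kstack[64];  /* saved continuation labels */
  size_t sp = 0;
  long result = 0;
  enum label pc = FACT;   /* which "lambda" runs next */
  enum label k = DONE;    /* current continuation */

  for (;;) {
    switch (pc) {
    case FACT:
      if (n == 0) {
        result = 1;
        pc = k;           /* "return" is a jump to the continuation */
      } else {
        stack[sp] = n;    /* remember n and the old continuation */
        kstack[sp] = k;
        sp++;
        n = n - 1;
        k = FACT_RETURN;
        pc = FACT;        /* "call" is a jump, not a C call */
      }
      break;
    case FACT_RETURN:
      sp--;
      result = result * stack[sp];
      k = kstack[sp];     /* restore the caller's continuation */
      pc = k;
      break;
    case DONE:
      return result;
    }
  }
}
```

Because everything lives in one function, the whole program has to be visible at once, which hints at why this layout is awkward for separately compiled libraries.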

C Code Generation

anything worth mentioning here? mention converting scheme sexp => strings? tuples of allocations/code??

C Runtime

A runtime based on Henry Baker's paper CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A. was used as it provides fast code while meeting all of the fundamental requirements for a Scheme runtime: tail calls, garbage collection, and continuations.

Baker explains how it works:

We propose to compile Scheme by converting it into continuation-passing style (CPS), and then compile the resulting lambda expressions into individual C functions. Arguments are passed as normal C arguments, and function calls are normal C calls. Continuation closures and closure environments are passed as extra C arguments. Such a Scheme never executes a C return, so the stack will grow and grow ... eventually, the C "stack" will overflow the space assigned to it, and we must perform garbage collection.
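As a rough illustration of the scheme Baker describes, the sketch below hand-compiles `(+ (* a b) c)` into CPS-style C. It is not code generated by Cyclone; the closure layout and all names (`struct closure`, `k_add`, `cps_mul`, `eval_mul_add`) are assumptions made for this example. Each lambda is a C function, the continuation closure is passed as an extra argument, and no function hands a value back to its caller.

```c
#include <stddef.h>

/* Hypothetical closure layout: a code pointer, one captured value,
   and the continuation to invoke afterwards. */
struct closure;
typedef void (*code_ptr)(struct closure *self, long value);

struct closure {
  code_ptr fn;
  long captured;
  struct closure *next;
};

static long final_result;

/* Final continuation: records the answer instead of returning it. */
static void k_halt(struct closure *self, long value) {
  (void)self;
  final_result = value;
}

/* Continuation for (lambda (t) (k (+ t c))), with c captured. */
static void k_add(struct closure *self, long value) {
  self->next->fn(self->next, value + self->captured);
}

/* CPS multiply: passes the product to its continuation k. */
static void cps_mul(long a, long b, struct closure *k) {
  k->fn(k, a * b);
}

/* Entry point for the sketch. Both closures are allocated on the C stack. */
long eval_mul_add(long a, long b, long c) {
  struct closure halt = { k_halt, 0, NULL };
  struct closure add = { k_add, c, &halt };
  cps_mul(a, b, &add);
  return final_result;
}
```

In this tiny program the C calls do eventually unwind, because `k_halt` simply returns. In the runtime Baker describes they never would, which is why the stack has to be reclaimed by garbage collection instead.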

Cheney on the M.T.A. uses a copying garbage collector. By using static roots and the current continuation closure, the GC is able to copy objects from the stack to a pre-allocated heap without having to know the format of C stack frames. To quote Baker:

the entire C "stack" is effectively the youngest generation in a generational garbage collector!
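The core move of a copying collector, evacuating a live object and leaving a forwarding pointer behind, can be sketched in a few lines. This is a simplified illustration and not Cyclone's collector: the `struct pair` layout, the fixed-size `heap` array, and `copy_to_heap` are all hypothetical, and the copy is recursive rather than using the scan pointer of Cheney's algorithm.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object layout: a list cell holding an integer. */
struct pair {
  struct pair *forward;  /* NULL until the cell has a heap copy */
  long car;
  struct pair *cdr;      /* NULL marks the end of the list */
};

#define HEAP_PAIRS 1024
static struct pair heap[HEAP_PAIRS];  /* pre-allocated heap */
static size_t heap_used = 0;

/* Copy a cell, and everything reachable through cdr, into the heap.
   Heap cells forward to themselves, so they are never copied twice. */
struct pair *copy_to_heap(struct pair *p) {
  if (p == NULL) return NULL;
  if (p->forward != NULL) return p->forward;  /* already moved */
  if (heap_used == HEAP_PAIRS) abort();       /* sketch only: no heap growth */

  struct pair *copy = &heap[heap_used++];
  copy->forward = copy;
  copy->car = p->car;
  p->forward = copy;  /* set before following cdr so shared cells copy once */
  copy->cdr = copy_to_heap(p->cdr);
  return copy;
}
```

Calling `copy_to_heap` on a root, such as the current continuation closure, is enough to rescue everything reachable from it; nothing in the routine needs to know how C lays out its stack frames.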

After GC is finished, the C stack pointer is reset using longjmp and the GC calls its continuation.
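The stack reset can be sketched with the standard `setjmp`/`longjmp` pair. This is a minimal illustration under several assumptions: the names (`run_sum`, `sum_loop`, `stack_exhausted`, `STACK_LIMIT`) are hypothetical, the "collection" only saves two integers instead of copying objects, and measuring stack use by comparing the addresses of locals is a common but non-portable trick.

```c
#include <setjmp.h>
#include <stdint.h>

#define STACK_LIMIT (64 * 1024)  /* hypothetical stack budget in bytes */

static jmp_buf trampoline;
static char *stack_base;

/* Stands in for the live data a real collector would move to the heap. */
static long long saved_n, saved_acc;
static long long result;

/* Estimate stack use from the distance between two local addresses. */
static int stack_exhausted(void) {
  char marker;
  uintptr_t here = (uintptr_t)&marker;
  uintptr_t base = (uintptr_t)stack_base;
  uintptr_t used = here > base ? here - base : base - here;
  return used > STACK_LIMIT;
}

/* CPS-style loop summing 1..n; it always exits through longjmp. */
static void sum_loop(long long n, long long acc) {
  if (stack_exhausted()) {
    saved_n = n;             /* "GC": save what is still live ... */
    saved_acc = acc;
    longjmp(trampoline, 1);  /* ... then discard the whole C stack */
  }
  if (n == 0) {
    result = acc;
    longjmp(trampoline, 2);  /* final continuation */
  }
  sum_loop(n - 1, acc + n);
}

long long run_sum(long long n) {
  char base_marker;
  stack_base = &base_marker;
  saved_n = n;
  saved_acc = 0;
  if (setjmp(trampoline) == 2) {
    return result;
  }
  /* Reached on first entry and again after every stack reset. */
  sum_loop(saved_n, saved_acc);
  return result;  /* not reached: sum_loop never returns normally */
}
```

Without optimization, `run_sum(1000000)` would need about a million nested frames, but each reset drops the stack back to `run_sum`, so the depth stays near `STACK_LIMIT`. An optimizing compiler may turn the recursion into a loop, in which case the reset path simply never fires.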

Here is a snippet demonstrating how C functions may be written using Baker's approach:

object Cyc_make_vector(object cont, object len, object fill) {
  object v = nil;
  int i;
  Cyc_check_int(len);

  // Memory for vector can be allocated directly on the stack
  v = alloca(sizeof(vector_type));

  // Populate vector object
  ((vector)v)->tag = vector_tag;
  ... 

  // Check if GC is needed, then call into continuation with the new vector
  return_funcall1(cont, v);
}

CHICKEN was the first Scheme compiler to use Baker's approach.

Data Types

also mention object types and value types from lisp in small pieces

Interpreter

The Metacircular Evaluator from SICP was used as a starting point for eval.

Macros

Explicit renaming (ER) macros provide a simple, low-level macro system without much more code than eval. Many ER macros from Chibi Scheme were used to implement the built-in macros in Cyclone.

Scheme Standards

Cyclone targets the R7RS-small specification. This spec is relatively new and provides several incremental improvements over the popular R5RS spec. Library (C module) support is the most important, but there are also exceptions, more system interfaces, and a more consistent API.

Future

Andrew Appel used a similar runtime for Standard ML of New Jersey, which is referenced by Baker's paper. Appel's book Compiling with Continuations includes a section on how to implement compiler optimizations, many of which could be applied to Cyclone.

Conclusion

From Feeley's presentation:

Performance is not so bad with NO optimizations (about 6 times slower than Gambit-C with full optimization)

Compared to a similar compiler (CHICKEN), Cyclone's performance is worse but also "not so bad":

$ time cyclone transforms.sld

real    0m22.657s
user    0m13.220s
sys     0m8.320s

$ time csc transforms.scm

real    0m9.305s
user    0m3.732s
sys     0m5.064s

References