The original Scheme implementation is astonishingly slow. Rewriting
SHA-2 in C yields around x10000 speed boost for premade strings and
bytevectors. For input ports this is alleviated to x100 boost.
The implementation is divided into two parts: native computational
backend and thin Scheme interface. It is tedious to properly do IO
from C, so the Scheme code handles reading data from an input port
and the C code performs actual computations on byte buffers (which
is also used to handle strings and bytevectors directly).
Scheme wrapper reads data in chunked manner with 'read-bytevector'.
Currently, the chunk size has insignificant impact on performance
as soon as it is bigger than 64. Also, using simply read-bytevector
turned out to be 33% faster than preallocating a buffer and filling
it with read-bytevector!
One tricky part is how to get exact 32-bit integers in C89. We have
no <inttypes.h> there, so instead we use <limits.h> to see whether
we have a standard type with suitable boundaries.
The other one is how to return a properly tagged sha_context from C.
Chibi FFI currently cannot handle the case when a procedure returns
either a C pointer (which needs to be boxed) or an exception (which
should be left as is). To workaround this sexp_start_sha() receives
a dummy argument of type sha_context; this makes Chibi FFI to put a
proper type tag into 'self', which is then extracted in the C code.
This commits adds a new shared library 'crypto$(SO)' with intent to
keep there all native code of (chibi crypto) libraries. This allows
to simply put any future native implementation of SHA-512 or MD5 in
some md5.c and just include those files into crypto.stub.