The checks for VRAM access accounted for image columns intersecting the
longword before the start of a VRAM line, but not the longword after the
start of a VRAM line. This is now fixed.
Known limitation: OpenLibm can't be installed out of the compiler prefix
yet (because of that pesky openlibm/ prefix that it installs to but does
not use when including).
Nothing particular to change, simply make sure that the DMA channels
have higher priority than the USB module, otherwise the BEMP interrupt
might be executed before the DMA frees the channel, resulting in the
transfer failing because the channel is still busy.
Also reduce BUSWAIT since it works even on high overclock levels, and
keeping it high won't help increase performance.
This change fixes the way gint uses the FIFO controllers D0F and D1F
to access the FIFO. It previously used D0F in the main thread and D1F
during interrupt handling, but this is incorrect for several reasons,
mainly the possible change of controllers between a write and a commit,
and numerous instances of two FIFOs managing the same pipe caused by
the constant switching.
gint now treats FIFO controllers as resources allocated to pipes for
the duration of a commit-terminated sequence of writes. The same
controller is used for a single pipe in both normal and interrupt
modes, and released when the pipe is committed. If no controller is
available, asynchronous writes fail and synchronous ones wait.
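
The allocation policy described above can be sketched as follows. This
is only an illustration, not gint's actual code: the names fifo_acquire()
and fifo_release(), the FIFO_* constants and the owner table are all
assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical identifiers for the two controllers discussed above. */
enum { FIFO_D0F = 0, FIFO_D1F = 1, FIFO_COUNT = 2, FIFO_NONE = -1 };

/* For each controller, the pipe it is bound to, or -1 if it is free. */
static int fifo_owner[FIFO_COUNT] = { -1, -1 };

/* Return the controller already bound to this pipe, or bind a free one.
   Returns FIFO_NONE when all controllers are busy with other pipes; an
   asynchronous write would then fail, a synchronous one would retry. */
int fifo_acquire(int pipe)
{
    assert(pipe >= 0);
    int free_ct = FIFO_NONE;

    for(int ct = 0; ct < FIFO_COUNT; ct++) {
        if(fifo_owner[ct] == pipe)
            return ct;
        if(fifo_owner[ct] < 0 && free_ct == FIFO_NONE)
            free_ct = ct;
    }

    if(free_ct != FIFO_NONE)
        fifo_owner[free_ct] = pipe;
    return free_ct;
}

/* Called once the pipe is committed: its controller becomes free. */
void fifo_release(int pipe)
{
    for(int ct = 0; ct < FIFO_COUNT; ct++) {
        if(fifo_owner[ct] == pipe)
            fifo_owner[ct] = -1;
    }
}
```

Because the binding is looked up by pipe rather than by execution
context, the main thread and an interrupt handler writing to the same
pipe get the same controller in this sketch.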
The fxlink API is also added with a small number of functions, namely
to transfer screenshots and raw text. Currently these are synchronous
and do not use the DMA; this will be improved later.
Finally:
* Removed pipe logic from src/usb/setup.c, instead letting pipes.c
handle the special case of the DCP (which might be regularized later)
* Removed the usb_pipe_mode_{read,write} functions as they're actually
about FIFO controllers and it's not clear yet how a pipe with both
read and write should be handled. This is left for the future.
* Clarified end-of-sequence semantics after a successful commit.
The function was designed with multi-threaded concurrency in mind,
where threads can take over while the lock is held and simply block
trying to acquire it, which allows the lock holder to proceed.
However, interrupt handlers are different; they have priority, so once
they start they must complete immediately. They cannot afford to block
on the lock as the program would simply freeze. In exchange, they clean
up before they leave, so there are some guarantees on the execution
state even when interrupted.
The correct protection is therefore not a lock but a temporary block on
interrupts. There is no data race on the value of the saved IMASK
because it is preserved during interrupt handling.
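
A minimal sketch of such a reentrant interrupt block is given below. It
runs on any machine because the IMASK field is replaced by a plain
variable; on the calculator it would be read from and written to the
CPU's status register. The names atomic_begin() and atomic_end() are
hypothetical.

```c
#include <assert.h>

/* Stand-in for the CPU's interrupt mask field (IMASK, 0 to 15). This is
   an assumption made so that the logic can be followed on a host. */
static int cpu_imask = 0;

static int atomic_depth = 0;  /* nesting level of atomic sections */
static int saved_imask = 0;   /* IMASK to restore when the last one ends */

void atomic_begin(void)
{
    int previous = cpu_imask;
    cpu_imask = 15;           /* mask all maskable interrupts first */

    /* Only the outermost section records the value to restore. No
       handler can run between the two steps since IMASK is already 15. */
    if(atomic_depth++ == 0)
        saved_imask = previous;
}

void atomic_end(void)
{
    assert(atomic_depth > 0);
    if(--atomic_depth == 0)
        cpu_imask = saved_imask;
}
```

An interrupt handler that fires before atomic_begin() raises the mask
leaves IMASK as it found it when it returns, which is why the value read
into previous is still valid in this sketch.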
This change introduces new sleep_block() and sleep_unblock() functions
that control whether the sleep() function actually sleeps. This type of
behavior was already implemented in the DMA driver, since DMA access to
on-chip memory is paused when sleeping (on-chip memory being paused
itself), which would make waiting for a DMA transfer a freeze.
Because DMA transfers are now asynchronous, and USB transfers that may
involve on-chip memory are coming, this API change allows the DMA and
USB drivers to block the sleep() function so that user code can sleep()
for interrupts without having to worry about asynchronous tasks
requiring on-chip memory to complete.
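
The mechanism can be pictured as a counter of active blockers. The
sketch below is an assumption about how this could look, not the actual
implementation; sleep() is called do_sleep() here to avoid clashing with
the POSIX function on a host, and cpu_sleep() stands in for the real
sleep instruction.

```c
#include <assert.h>

/* Number of tasks that currently forbid sleeping. In a real kernel this
   counter would have to be updated atomically. */
static int sleep_block_counter = 0;

/* Stand-in for the CPU sleep instruction; counts calls for the demo. */
static int times_slept = 0;
static void cpu_sleep(void)
{
    times_slept++;
}

/* A driver calls this before starting an asynchronous task that needs
   on-chip memory... */
void sleep_block(void)
{
    sleep_block_counter++;
}

/* ... and this when the task completes. */
void sleep_unblock(void)
{
    assert(sleep_block_counter > 0);
    sleep_block_counter--;
}

/* Sleeps only if nobody blocks it; otherwise returns immediately, so a
   loop that sleeps while waiting for an interrupt degrades to polling. */
void do_sleep(void)
{
    if(sleep_block_counter == 0)
        cpu_sleep();
}
```

Using a counter rather than a flag lets the DMA and USB drivers block
sleeping independently of each other.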
This change introduces the global "feature function" that can be
enabled in getkey() to receive events, and use them for
application-wide features. This would be useful, for instance, to
toggle screen backlight with a different key combination than the
default, to capture screenshots, or to implement a catalog.
When enabled, the feature function is presented with all new events and
can perform actions, then decide whether or not to return them from
getkey().
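
As an illustration, the filtering step could look like the sketch
below. The event type, the function names and the key code are all
hypothetical and heavily simplified; only the idea of a global function
that sees every event and decides whether to hide it comes from the
description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event: just a key code. */
typedef struct { int key; } event_t;

/* A feature function returns true if it consumed the event, in which
   case the event is not returned to the caller. */
typedef bool (*feature_fn_t)(event_t ev);

static feature_fn_t feature_function = NULL;

void set_feature_function(feature_fn_t fn)
{
    feature_function = fn;
}

/* Index of the first pending event that should be returned to the
   caller, or -1 if the feature function consumed all of them. */
int next_returned_event(event_t const *events, int count)
{
    for(int i = 0; i < count; i++) {
        if(feature_function && feature_function(events[i]))
            continue;
        return i;
    }
    return -1;
}

/* Example feature: toggle the backlight on a made-up key code. */
enum { KEY_DEMO_BACKLIGHT = 99 };
static int backlight_toggles = 0;

bool backlight_feature(event_t ev)
{
    if(ev.key != KEY_DEMO_BACKLIGHT)
        return false;
    backlight_toggles++;
    return true;
}
```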
Bounds would be moved before drawing the border, therefore displacing
the border. Since drect() already performs all the necessary checks,
this change doesn't try to save a couple of function calls and drops the
redundant checks.
* Properly define the callback time of a write/commit as the time when
the pipe is available again for further writing.
* Refuse commits when writes are pending; instead, enforce a strict
order of finishing writes before committing, which makes sense since
consecutive writes are ordered this way already.
* Properly support callbacks for writes and for commits.
* Define the synchronous APIs in terms of waiting until the callbacks
for equivalent asynchronous functions are invoked (plus initial
waiting for pipes to be ready).
This change adds asynchronous capabilities to the DMA API. Previously,
transfers would start asynchronously but could only be completed by a
call to dma_transfer_wait(). The API now supports a callback, as well
as the dma_transfer_sync() variant, to be consistent with the upcoming
USB API that has both _sync and _async versions of functions.
The interrupt handler of the DMA was changed to include a return to
userland, which is required to perform the callback.
* dma_transfer() is now an obsolete synonym for dma_transfer_async()
with no callback.
* dma_transfer_noint() is now a synonym for dma_transfer_atomic(), for
consistency with the upcoming USB API.
* Change gint_inth_callback()
* Add intc_handler_function() to use C functions as handlers instead of
writing assembler, and use it in the RTC and USB
* Revisit the TMU handlers, which after moving out the callbacks, now
fit into 3 gates (great!), and adapt the ETMU handler
* Improve the timer driver (less code = better code, removed magic
constants assuming the VBR layout on SH3/SH4, etc.)
* Remove 2 gates and a gap from the compact scheme on SH3
* Define timer_configure() to replace timer_setup(), which could not be
cleanly updated to support GINT_CALL()
* Replace rtc_start/stop_timer with rtc_periodic_enable/disable, which
is less confusing because of ETMU being "RTC timers"
Changes in the driver and world system:
* Rewrite driver logic to include more advanced concepts. The notion of
binding a driver to a device is introduced to formalize wait(); power
management is now built-in instead of being handled by the drivers
(for instance DMA). The new driver model is described in great detail
in <gint/drivers.h>
* Formalized the concept of "world switch" where the hardware state is
saved and later restored. As a tool, the world switch turns out to be
very stable, and allows a lot of hardware manipulation that would be
edgy at best when running in the OS world.
* Added a GINT_DRV_SHARED flag for drivers to specify that their state
is shared between worlds and not saved/restored. This has a couple of
uses.
* Exposed a lot more of the internal driver/world system as there is no
particular downside to it. This includes stuff in <gint/drivers.h>
and the driver's state structures in <gint/drivers/states.h>. This is
useful for debugging and for cracked concepts, but there is no
API stability guarantee.
* Added a more flexible driver level system that allows any 2-digit
level to be used.
Feature changes:
* Added a CPU driver that provides the VBR change as its state save.
Because the whole context switch relied on interrupts being disabled
anyway, there is no longer an inversion of control when setting the
VBR; this is just part of the CPU driver's configuration. The CPU
driver may also support other features such as XYRAM block transfer
in the future.
* Moved gint_inthandler() to the INTC driver under the name
intc_handler(), pairing up again with intc_priority().
* Added a reentrant atomic lock based on the test-and-set primitive.
Interrupts are disabled with IMASK=15 for the duration of atomic
operations.
* Enabled the DMA driver on SH7305-based fx-9860G. The DMA provides
little benefit on this platform because the RAM is generally faster
and buffers are ultimately small. The DMA is still not available on
SH3-based fx-9860G models.
* Solved an extremely obnoxious bug in timer_spin_wait() where the
timer is not freed, causing the callback to be called when interrupts
are re-enabled. This increments a random value on the stack. As a
consequence of the change, removed the long delays in the USB driver
since they are not actually needed.
Minor changes:
* Deprecated some of the elements in <gint/hardware.h>. There really is
no good way to "enumerate" devices yet.
* Deprecated gint_switch() in favor of a new function
gint_world_switch() which uses the GINT_CALL abstraction.
* Made the fx-9860G VRAM 32-aligned so that it can be used for tests
with the DMA.
Some features of the driver and world systems have not been implemented
yet, but may be in the future:
* Some driver flags should be per-world in order to create multiple
gint worlds. This would be useful in Yatis' hypervisor.
* A GINT_DRV_LAZY flag would be useful for drivers that don't want to
be started up automatically during a world switch. This is relevant
for drivers that have a slow start/stop sequence. However, this is
tricky to do correctly as it requires dynamic start/stop and also
tracking which world the current hardware state belongs to.
* Add the power management functions (mostly stable even under
overclock; requires some testing, but no known issue)
* Add a dynamic configuration system where interfaces can declare
descriptors with arbitrary endpoint numbers and additional
parameters, and the driver allocates USB resources (endpoints, pipes
and FIFO memory) between interfaces at startup. This allows
implementations of different classes to be independent from each
other.
* Add responses to common SETUP requests.
* Add pipe logic that allows programs to write data synchronously or
asynchronously to pipes, in a single or several fragments, regardless
of the buffer size (still WIP with a few details to polish and the
API is not public yet).
* Add a WIP bulk IN interface that allows sending data to the host.
This will eventually support the fxlink protocol.
This mechanism allows callbacks to be defined with up to 4 32-bit
arguments, and could be extended later. This will hopefully replace the
timer_callback_t used in timers and RTC, and will be added to the DMA
and USB APIs -- the hard part is to not break source compatibility with
previous versions.
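
To make the idea concrete, here is a portable sketch of a callback
object carrying a function and four 32-bit arguments. The names call_t,
CALL() and call_invoke() are invented for this example and are not the
actual macros; the sketch also requires every callback to take exactly
four arguments, which keeps it valid standard C.

```c
#include <assert.h>
#include <stdint.h>

/* A deferred call: a function and up to four 32-bit arguments. On the
   32-bit target, pointers would fit in these slots as well; that is not
   true on a 64-bit host, so this sketch only passes integers. */
typedef struct {
    int (*function)(uint32_t, uint32_t, uint32_t, uint32_t);
    uint32_t args[4];
} call_t;

/* Build a call_t from a function and 1 to 4 arguments. Arguments that
   are not specified are zero-initialized by the compound literal. */
#define CALL(fn, ...) ((call_t){ (fn), { __VA_ARGS__ } })

/* Invoke the stored function with its stored arguments. */
int call_invoke(call_t const *c)
{
    return c->function(c->args[0], c->args[1], c->args[2], c->args[3]);
}

/* Example callback: adds its arguments. */
int sum4(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return (int)(a + b + c + d);
}
```

A driver would store the call_t when the operation is started and run
call_invoke() on it from its interrupt handler or completion path.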
The question of how to handle a partially-restored world state begs for
an elegant symmetrical answer, but that doesn't work unless both kernels
do the save/restore for themselves. So far, things have worked out
because any order works: interrupts are disabled, so
partially-restored drivers are inactive.
However the USB module requires waits that are best performed with
timers, so the order cannot be chosen arbitrarily. This commit enforces
a gint-centric order where code from a gint driver is only run when all
lower-level drivers are active. This solves some pretty bad freezes with
the USB module.
The new allocator uses a segregated best-fit algorithm with exact-size
lists for all sizes between 8 bytes (the minimum) and 60 bytes, one list
for blocks of size 64-252 and one for larger blocks.
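
The mapping from block size to free list can be sketched as below. It
assumes that block sizes are multiples of 4 bytes, which gives 14
exact-size lists plus the two range lists; this granularity, the class
numbering and all names are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Classes 0 to 13: exact sizes 8, 12, ..., 60. Then two range lists. */
enum { CLASS_MEDIUM = 14, CLASS_LARGE = 15, CLASS_COUNT = 16 };

/* Round a request up to a valid block size (minimum 8, multiple of 4). */
size_t round_size(size_t size)
{
    if(size < 8)
        return 8;
    return (size + 3) & ~(size_t)3;
}

/* Free list index for a block of the given (already rounded) size. */
int size_class(size_t block_size)
{
    assert(block_size >= 8 && block_size % 4 == 0);
    if(block_size <= 60)
        return (int)((block_size - 8) / 4);
    if(block_size <= 252)
        return CLASS_MEDIUM;
    return CLASS_LARGE;
}

/* Hypothetical free block header, linked into one of the lists. */
typedef struct block {
    size_t size;
    struct block *next;
} block_t;

/* Best fit: the smallest block of the list that can hold the request.
   In an exact-size list every block qualifies equally. */
block_t *best_fit(block_t *list, size_t size)
{
    block_t *best = NULL;
    for(block_t *b = list; b != NULL; b = b->next) {
        if(b->size >= size && (best == NULL || b->size < best->size))
            best = b;
    }
    return best;
}
```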
Arenas managed by this allocator have built-in statistics that track
used and free memory (accounting for block headers), peak memory, and
various allocation results.
In addition, the allocator has self-checks in the form of integrity
verifications, that can be enabled with -DGINT_KMALLOC_DEBUG=1 at
configuration time or with the :dev configuration for GiteaPC. This is
used by gintctl.
The kmalloc interface is extended with a new arena covering all unused
memory in user RAM, managed by gint's allocator. It spans about 4 kB on
SH3 fx-9860G, 16 kB on SH4 fx-9860G, and 500 kB on fx-CG 50, in addition
to the OS heap. This new arena is now the default arena for malloc(),
except on SH3 where some heap problems are currently known.
This change introduces a centralized memory allocator in the kernel.
This interface can call into multiple arenas, including the default OS
heap and planned arenas managed by a gint algorithm.
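
A rough sketch of such an arena-dispatching interface is shown below.
Everything in it is hypothetical (the arena_t layout, the arena names
"user" and "os", the kmalloc_sketch() signature); the host's malloc()
stands in for the OS heap, and a trivial bump allocator stands in for an
arena managed by gint. Freeing is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* An arena: a name and an allocation function with private data. */
typedef struct {
    char const *name;
    void *(*alloc)(size_t size, void *data);
    void *data;
} arena_t;

#define ARENA_MAX 4
static arena_t *arenas[ARENA_MAX];
static int arena_count = 0;

int arena_add(arena_t *a)
{
    if(arena_count >= ARENA_MAX)
        return -1;
    arenas[arena_count++] = a;
    return 0;
}

/* Allocate from the named arena, or from the first arena that succeeds
   (in registration order) if arena_name is NULL. */
void *kmalloc_sketch(size_t size, char const *arena_name)
{
    for(int i = 0; i < arena_count; i++) {
        if(arena_name && strcmp(arenas[i]->name, arena_name) != 0)
            continue;
        void *ptr = arenas[i]->alloc(size, arenas[i]->data);
        if(ptr)
            return ptr;
    }
    return NULL;
}

/* Demo arena 1: bump allocator over a fixed region of memory. */
typedef struct { char *start; size_t size, used; } bump_t;

static void *bump_alloc(size_t size, void *data)
{
    bump_t *b = data;
    size = (size + 3) & ~(size_t)3;
    if(size > b->size - b->used)
        return NULL;
    void *ptr = b->start + b->used;
    b->used += size;
    return ptr;
}

/* Demo arena 2: the host heap, standing in for the OS heap. */
static void *os_alloc(size_t size, void *data)
{
    (void)data;
    return malloc(size);
}

static char demo_ram[64];
static bump_t demo_bump = { demo_ram, sizeof demo_ram, 0 };
static arena_t arena_user = { "user", bump_alloc, &demo_bump };
static arena_t arena_os = { "os", os_alloc, NULL };
```

With both arenas registered in this order, small requests are served
from the fixed region first and fall back to the OS heap once it is
full.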
The main advantage of this method is that it allows the heap to be
extended over previously-unused areas of RAM such as the end of the
static RAM region (apart from where the stack resides). Not using the OS
heap is also sometimes a matter of correctness since on some OS versions
the heap is known to fragment badly and degrade over time.
I hope the deep control this interface gives over memory allocation
will allow very particular applications like object-specific allocators
in fragmented SPU memory.
This change does not introduce any new algorithm or arena so programs
should behave exactly as before.