libprof/README.md
2021-01-25 15:58:52 +01:00


# libprof: A performance profiling library for gint
libprof is a small gint library that times and profiles the execution of an
add-in. It is useful to record the time spent in one or several functions and
to identify performance bottlenecks in an application.
libprof's measurements are accurate down to the microsecond level thanks to
precise hardware timers, so it can also be used to time small portions of
code.
## Installing with GiteaPC
libprof can be built and installed with [GiteaPC](https://gitea.planet-casio.com/Lephenixnoir/GiteaPC), an automation tool for the fxSDK.
```sh
% giteapc install Lephenixnoir/libprof
```
## Building
libprof is built and installed only once for both fx-9860G and fx-CG 50. The
dependencies are:
* A GCC cross-compiler for a SuperH architecture
* The [gint kernel](/Lephenixnoir/gint)
The Makefile will build and install the library without further instructions.
```sh
% make
% make install
```
By default `sh-elf` is used to build; you can override this by setting the
`target` variable.
```sh
% make target=sh4eb-elf
% make install target=sh4eb-elf
```
If you have the older setup with two toolchains (`sh3eb-elf` and `sh4eb-elf`),
instead of the new one with two targets on the same toolchain (`sh-elf`), you
will need to make and install twice.
## Basic use
To access the library, include the `<libprof.h>` header file and call
`prof_init()` somewhere so that libprof has access to a precise timer. If no
such timer is available, `prof_init()` returns non-zero (but normally 2 or 3
of the TMU timers are available when an add-in starts).
```c
#include <libprof.h>

/* Call once at startup, e.g. in main(); non-zero means no timer was available */
prof_init();
```
To measure execution time, create a profiling context with `prof_make()`, then
call `prof_enter()` at the beginning of the code to time and `prof_leave()` at
the end.
```c
void function_to_time(void)
{
    prof_t prof = prof_make();
    prof_enter(prof);
    /* Do stuff... */
    prof_leave(prof);
}
```
The context records the time spent between calls to `prof_enter()` and
`prof_leave()`. It can be entered multiple times and will simply accumulate the
time spent in its counter. When the counter is not running, you can query
recorded time with `prof_time()`.
```c
uint32_t total_function_us = prof_time(prof);
```
This time is returned in microseconds, even though the timers are slightly more
precise than this. Note that the overhead of `prof_enter()` and `prof_leave()`
is usually less than 1 microsecond, so the measurement is very close to the
wall-clock time spent in the function even if the context is frequently used.
At the end of the program or whenever you need the timer that libprof is
holding, call `prof_quit()` to free the resources. This will make the timer
available to `timer_setup()` again.
```c
prof_quit();
```
## Recursive functions
The profiler context keeps track of recursive calls so that functions that
enter and leave recursively can be timed as well. The only difference with the
basic example is that we need to make sure a single context is used (instead
of creating a new one in each stack frame). Making it static is enough.
```c
void recursive_function_to_time(void)
{
    static prof_t prof = prof_make();
    prof_enter(prof);
    /* Stuff... */
    recursive_function_to_time();
    /* Stuff... */
    prof_leave(prof);
}
```
However, this makes it somewhat difficult to retrieve the elapsed time because
the context is hidden within the function's scope. Making it global can help.
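To illustrate why a single shared context matters, here is a small host-side model of a nesting-aware counter. It is not libprof's implementation: `model_ctx`, `model_enter()`, `model_leave()`, `model_recurse()` and the `fake_ticks` counter are hypothetical names standing in for the real context and the hardware timer.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware tick counter */
uint32_t fake_ticks = 0;

typedef struct {
    uint32_t depth;   /* current nesting level */
    uint32_t start;   /* tick value at the outermost enter */
    uint32_t elapsed; /* ticks accumulated over all outermost intervals */
} model_ctx;

/* Only the outermost enter records the start time */
void model_enter(model_ctx *c)
{
    if (c->depth++ == 0) c->start = fake_ticks;
}

/* Only the outermost leave adds the interval to the total */
void model_leave(model_ctx *c)
{
    if (--c->depth == 0) c->elapsed += fake_ticks - c->start;
}

/* Recursive function doing 10 ticks of "work" before and after recursing */
void model_recurse(model_ctx *c, int levels)
{
    model_enter(c);
    fake_ticks += 10;
    if (levels > 1) model_recurse(c, levels - 1);
    fake_ticks += 10;
    model_leave(c);
}
```

With three levels of recursion the shared context ends up with 60 ticks, the full span of the outermost call, with no time counted twice. A fresh context in each stack frame would instead produce three separate, overlapping totals.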
## Timing a single execution
In many cases applications just need to measure a single piece of code once and
get the resulting time. `prof_exec()` is a shortcut macro that does exactly
that, creating a temporary context and returning elapsed time.
```c
uint32_t elapsed_us = prof_exec({
    scene_graph_render();
});
```
The macro expands to a short code block that wraps the argument and returns
the `prof_time()` of the temporary context.
## Using the timers' full precision
Hardware timers run at 7-15 MHz depending on the calculator model, so the time
measurement is slightly more precise than what `prof_time()` shows you. You can
access that value through the `elapsed` field of a context object.
The value is a count of ticks that occur at a frequency of PΦ/4, where the
value of PΦ can be obtained by querying gint's CPG driver:
```c
#include <gint/clock.h>
uint32_t tick_freq = clock_freq()->Pphi_f / 4;
```
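As a rough sketch of how a raw tick count could be turned into a duration finer than a microsecond, the helper below converts ticks to nanoseconds. `ticks_to_ns()` is a hypothetical name, not part of libprof; it assumes you have already read the tick count from the context and computed the tick frequency as shown above.

```c
#include <stdint.h>

/* Convert a tick count to nanoseconds given the tick frequency in Hz.
   The 64-bit intermediate keeps ticks * 10^9 from overflowing; the result
   is truncated, not rounded. Returns 0 if the frequency is 0. */
uint64_t ticks_to_ns(uint32_t ticks, uint32_t tick_freq_hz)
{
    if (tick_freq_hz == 0) return 0;
    return ((uint64_t)ticks * UINT64_C(1000000000)) / tick_freq_hz;
}
```

For instance, with an illustrative tick frequency of 7372800 Hz, a single tick comes out as 135 ns, which shows how much resolution is lost when rounding to whole microseconds.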
Note that the overhead of `prof_enter()` and `prof_leave()` is usually less
than a microsecond, but more than a few timer ticks.
Due in part to caching effects, the first measurement for a given code sequence
is likely to be larger than the other ones. I have seen effects such as 3 µs
for a no-op (cache misses in libprof's code) and 100 µs for real cases (cache
misses in the code itself). Make sure to make several measurements and use
serious statistical methods!
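One simple way to keep a cold first run from skewing the result is to collect several measurements and report the median instead of the mean. The sketch below is plain C with no libprof dependency; `median_u32()` is a hypothetical helper, and the samples are assumed to be values you collected yourself, for example from `prof_time()`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint32_t values for qsort() */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sort the samples in place and return the median (the upper middle
   element when n is even). Returns 0 for an empty array. */
uint32_t median_u32(uint32_t *samples, size_t n)
{
    if (n == 0) return 0;
    qsort(samples, n, sizeof *samples, cmp_u32);
    return samples[n / 2];
}
```

To drop the warm-up run explicitly, pass `samples + 1` and `n - 1` before the array has been sorted.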
## Overclock interference
Contexts store timer tick counts, which are converted to microsecond delays
only when `prof_time()` is called. Do not mix measurements performed at
different overclock settings as the results will be erroneous.
What you can do is call `prof_time()` and reset the context (by assigning it
`prof_make()`) before switching clock settings, then add up the microsecond
delays when the execution is over.
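The arithmetic behind this advice can be modeled in a few lines of host-side C. This is only a sketch with hypothetical names (`segment`, `segments_total_us()`): each segment stands for one context that was read and reset before a clock change, and the conversion is done per segment with the tick frequency that was in effect at the time.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t ticks;        /* ticks counted during this segment */
    uint32_t tick_freq_hz; /* tick frequency in effect during it */
} segment;

/* Convert each segment to microseconds at its own frequency, then sum.
   Segments with a zero frequency are skipped. */
uint64_t segments_total_us(const segment *segs, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].tick_freq_hz == 0) continue;
        total += ((uint64_t)segs[i].ticks * 1000000u) / segs[i].tick_freq_hz;
    }
    return total;
}
```

With one second measured at an illustrative 7372800 Hz and one second at 14745600 Hz, the per-segment sum is 2000000 µs. Adding the raw ticks first (22118400) and converting at the second frequency would give 1.5 s instead, which is the kind of error mixed measurements produce.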