mirror of
https://git.planet-casio.com/Lephenixnoir/libprof.git
synced 2024-12-27 20:13:43 +01:00
# libprof: A performance profiling library for gint

libprof is a small gint library that times and profiles the execution of an
add-in. It is useful for recording the time spent in one or several functions
and for identifying performance bottlenecks in an application.

libprof's measurements are accurate down to the microsecond level thanks to
precise hardware timers, so it can also be used to time even small portions of
code.

## Installing with GiteaPC

libprof can be built and installed with [GiteaPC](https://gitea.planet-casio.com/Lephenixnoir/GiteaPC), an automation tool for the fxSDK.

```sh
% giteapc install Lephenixnoir/libprof
```

## Building

libprof is built and installed only once for both fx-9860G and fx-CG 50. The
dependencies are:

* A GCC cross-compiler for a SuperH architecture
* The [gint kernel](/Lephenixnoir/gint)

The Makefile will build and install the library without further instructions.

```sh
% make
% make install
```

By default `sh-elf` is used to build; you can override this by setting the
`target` variable.

```sh
% make target=sh4eb-elf
% make install target=sh4eb-elf
```

If you have the older setup with two toolchains (`sh3eb-elf` and `sh4eb-elf`),
instead of the new one with two targets on the same toolchain (`sh-elf`), you
will need to make and install twice, once per toolchain.

## Basic use

To access the library, include the `<libprof.h>` header file and call
`prof_init()` somewhere so that libprof has access to a precise timer. If no
such timer is available, `prof_init()` returns non-zero (but normally either 2
or 3 of the TMU timers are available when an add-in starts).

```c
#include <libprof.h>
prof_init();
```

To measure execution time, create a profiling context with `prof_make()`, then
call `prof_enter()` at the beginning of the code to time and `prof_leave()` at
the end.

```c
void function_to_time(void)
{
    prof_t prof = prof_make();
    prof_enter(prof);

    /* Do stuff... */

    prof_leave(prof);
}
```

The context records the time spent between calls to `prof_enter()` and
`prof_leave()`. It can be entered multiple times and will simply accumulate the
time spent in its counter. When the counter is not running, you can query the
recorded time with `prof_time()`.

```c
uint32_t total_function_us = prof_time(prof);
```

This time is returned in microseconds, even though the timers are slightly more
precise than this. Note that the overhead of `prof_enter()` and `prof_leave()`
is usually less than 1 microsecond, so the measurement is very close to the
wall-clock time spent in the function even if the context is frequently used.

At the end of the program or whenever you need the timer that libprof is
holding, call `prof_quit()` to free the resources. This will make the timer
available to `timer_setup()` again.

```c
prof_quit();
```

## Recursive functions

The profiler context keeps track of recursive calls so that functions that
enter and leave recursively can be timed as well. The only difference from the
basic example is that we need to make sure a single context is used (instead
of creating a new one in each stack frame). Making it static is enough.

```c
void recursive_function_to_time(void)
{
    static prof_t prof = prof_make();
    prof_enter(prof);

    /* Stuff... */
    recursive_function_to_time();
    /* Stuff... */

    prof_leave(prof);
}
```

However, this makes it somewhat difficult to retrieve the elapsed time because
the context is hidden within the function's scope. Making it global can help.
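
To illustrate why a single shared context is enough for recursion, here is a minimal host-side model of a nesting-aware counter. It is not libprof's code: `model_ctx_t`, `model_enter()`, `model_leave()` and the `fake_ticks` counter are hypothetical names invented for this sketch, and the "timer" is just a variable that the workload increments by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a free-running hardware tick counter. */
static uint32_t fake_ticks = 0;

/* Illustrative context: nesting depth plus accumulated ticks. */
typedef struct {
    uint32_t depth;    /* number of enters not yet matched by a leave */
    uint32_t start;    /* tick value at the outermost enter */
    uint32_t elapsed;  /* total ticks accumulated so far */
} model_ctx_t;

static void model_enter(model_ctx_t *ctx)
{
    /* Only the outermost enter starts the measurement. */
    if(ctx->depth++ == 0)
        ctx->start = fake_ticks;
}

static void model_leave(model_ctx_t *ctx)
{
    assert(ctx->depth > 0);
    /* Only the leave that matches the outermost enter stops it. */
    if(--ctx->depth == 0)
        ctx->elapsed += fake_ticks - ctx->start;
}

/* Recursive workload: each level "costs" 10 fake ticks. */
static void model_recurse(model_ctx_t *ctx, int levels)
{
    model_enter(ctx);
    fake_ticks += 10;
    if(levels > 1)
        model_recurse(ctx, levels - 1);
    model_leave(ctx);
}
```

Calling `model_recurse(&ctx, 3)` on a zeroed context leaves `ctx.elapsed` at 30 fake ticks: the time is counted once for the whole call tree, not once per stack frame.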

## Timing a single execution

In many cases applications just need to measure a single piece of code once and
get the resulting time. `prof_exec()` is a shortcut macro that does exactly
that, creating a temporary context and returning the elapsed time.

```c
uint32_t elapsed_us = prof_exec({
    scene_graph_render();
});
```

The macro expands to a short code block that wraps the argument and returns
the `prof_time()` of the temporary context.

## Using the timers' full precision

Hardware timers run at 7-15 MHz depending on the calculator model, so the time
measurement is slightly more precise than what `prof_time()` shows you. You can
access that value through the `elapsed` field of a context object.

The value is a count of ticks that occur at a frequency of PΦ/4, where the
value of PΦ can be obtained by querying gint's CPG driver:

```c
#include <gint/clock.h>
uint32_t tick_freq = clock_freq()->Pphi_f / 4;
```
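
Given a tick count and the tick frequency, converting to a finer unit than microseconds is plain arithmetic. The helper below is a sketch in standard C, not part of libprof: `ticks_to_ns()` is a hypothetical name, and it assumes the tick count is a 32-bit value and the frequency is expressed in Hz. For instance, 25 ticks at a hypothetical 10 MHz tick frequency come out as 2500 ns.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to nanoseconds for a tick frequency given in Hz.
   The 64-bit intermediate keeps ticks * 1e9 from overflowing. */
static uint64_t ticks_to_ns(uint32_t ticks, uint32_t tick_freq)
{
    assert(tick_freq > 0);
    return ((uint64_t)ticks * 1000000000u) / tick_freq;
}
```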

Note that the overhead of `prof_enter()` and `prof_leave()` is usually less
than a microsecond, but more than a few timer ticks.

Due in part to caching effects, the first measurement for a given code sequence
is likely to be larger than the other ones. I have seen effects such as 3 µs
for a no-op (cache misses in libprof's code) and 100 µs for real cases (cache
misses in the code itself). Make sure to take several measurements and use
serious statistical methods!
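
One simple way to act on this advice is to discard the first (cold-cache) sample and summarize the rest with a median, which is less sensitive to outliers than a mean. The sketch below is standard C and only one possible choice of statistic; `median_after_warmup()` and `cmp_u32()` are hypothetical helpers, not part of libprof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint32_t values for qsort(). */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Median of samples[1..n-1], skipping samples[0] as a cache warm-up run.
   Sorts the retained samples in place. Requires n >= 2. */
static uint32_t median_after_warmup(uint32_t *samples, size_t n)
{
    assert(n >= 2);
    uint32_t *kept = samples + 1;
    size_t count = n - 1;

    qsort(kept, count, sizeof *kept, cmp_u32);

    if(count % 2 == 1)
        return kept[count / 2];

    /* Even count: average the two middle values without overflow. */
    uint32_t lo = kept[count / 2 - 1], hi = kept[count / 2];
    return lo + (hi - lo) / 2;
}
```

The samples would typically be microsecond values collected from repeated runs of the same code, for example one `prof_exec()` result per run.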

## Overclock interference

Contexts store timer tick counts, which are converted to microsecond delays
only when `prof_time()` is called. Do not mix measurements performed at
different overclock settings, as the results will be erroneous.

What you can do is call `prof_time()` and reset the context (by assigning it
`prof_make()`) before switching clock settings, then add up the microsecond
delays when the execution is over.
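
The arithmetic behind this advice can be modeled on the host. The sketch below is not libprof code: `segment_t` and `segments_total_us()` are hypothetical, and each segment stands for one stretch of execution at a fixed clock setting, with its tick count and the tick frequency (in Hz) that applied at the time. Each segment is converted at its own frequency before the microseconds are added, which is what calling `prof_time()` before each clock switch achieves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One stretch of execution at a fixed clock setting (hypothetical type). */
typedef struct {
    uint32_t ticks;      /* tick count recorded during the stretch */
    uint32_t tick_freq;  /* tick frequency in Hz during the stretch */
} segment_t;

/* Convert each segment at its own frequency, then add the microseconds. */
static uint64_t segments_total_us(const segment_t *segs, size_t n)
{
    uint64_t total_us = 0;
    for(size_t i = 0; i < n; i++) {
        assert(segs[i].tick_freq > 0);
        total_us += ((uint64_t)segs[i].ticks * 1000000u) / segs[i].tick_freq;
    }
    return total_us;
}
```

With one second spent at a hypothetical 7 MHz tick rate and one second at 14 MHz, the per-segment sum is 2,000,000 µs. Adding the raw tick counts first and converting the total at 14 MHz would instead give 1,500,000 µs, which is the kind of error the warning above describes.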