mirror of
https://git.planet-casio.com/Lephenixnoir/libprof.git
synced 2024-12-29 13:03:42 +01:00
189 lines
5.7 KiB
Markdown
189 lines
5.7 KiB
Markdown
# libprof: A performance profiling library for gint
|
|
|
|
libprof is a small gint library that can be used to time and profile the
|
|
execution of an add-in. Using it, one can record the time spent in one or
|
|
several functions to identify performance bottlenecks in an application.
|
|
|
|
libprof's measurements are accurate down to the microsecond-level thanks to
|
|
precise hardware timers, so it can also be used to time even small portions of
|
|
code.
|
|
|
|
## Building
|
|
|
|
libprof is built and installed only once for both fx-9860G and fx-CG 50. The
|
|
The dependencies are:
|
|
|
|
* A GCC cross-compiler for a SuperH architecture
|
|
* The [gint kernel](/Lephenixnoir/gint)
|
|
|
|
The Makefile will build and install the library without further instructions.
|
|
|
|
```sh
|
|
% make
|
|
% make install
|
|
```
|
|
|
|
By default `sh-elf` is used to build; you can override this by setting the
|
|
`target` variable.
|
|
|
|
```sh
|
|
% make target=sh4eb-elf
|
|
% make install target=sh4eb-elf
|
|
```
|
|
|
|
If you have the older setup with two toolchains (`sh3eb-elf` and `sh4eb-elf`),
|
|
instead of the new one with two targets on the same toolchain (`sh-elf`), you
|
|
will need to make and install twice.
|
|
|
|
## Basic use
|
|
|
|
To access the library, include the `<libprof.h>` header file.
|
|
|
|
```c
|
|
#include <libprof.h>
|
|
```
|
|
|
|
For each function you want to time, libprof will create a counter. At the start
|
|
of the program, you need to specify how many functions (libprof calls them
|
|
*contexts*) you will be timing, so that libprof can allocate enough memory.
|
|
|
|
libprof also needs one of gint's timer to actually measure time; it must be one
|
|
of timers 0, 1 and 2, which are the only one precise enough to do this job. You
|
|
can use any timer which you are not already using for something else.
|
|
|
|
These settings are specified with the `prof_init()` function.
|
|
|
|
```c
|
|
/* Initialize libprof for 13 contexts using timer 0 */
|
|
prof_init(13, 0);
|
|
```
|
|
|
|
You can then measure the execution time of a function by calling `prof_enter()`
|
|
at the beginning and `prof_end()` at the end. You just need to "name" the
|
|
function by giving its context ID, which is any number between 0 and the number
|
|
of contexts passed to `prof_init()` (here 0 to 12).
|
|
|
|
```c
|
|
void function5(void)
|
|
{
|
|
prof_enter(5);
|
|
/* Do stuff... */
|
|
prof_leave(5);
|
|
}
|
|
```
|
|
|
|
This will add `function5()`'s execution time to the 5th counter, so if the
|
|
function is called several times the total execution time will be recorded.
|
|
This way, at the end of the program, you can look at the counters to see where
|
|
most of the time has been spent.
|
|
|
|
To retrieve the total execution time of a function, use `prof_time()` :
|
|
|
|
```c
|
|
uint32_t total_function5_us = prof_time(5);
|
|
```
|
|
|
|
This time is measured in microseconds, even though the timers are actually more
|
|
precise than this. Note that the overhead of `prof_enter()` and `prof_leave()`
|
|
is usually less than 1 microsecond, so the time is very close to the actual
|
|
time spent in the function even if the context is frequently entered and left.
|
|
|
|
At the end of the program, free the resources of the library by calling
|
|
`prof_quit()`.
|
|
|
|
```c
|
|
prof_quit();
|
|
```
|
|
|
|
## Managing context numbers
|
|
|
|
The number of contexts must be set for all execution and all context IDs must
|
|
be between 0 and this number (excluded). Managing the numbers by hand is error-
|
|
prone and can lead to memory errors.
|
|
|
|
A simple way of managing context numbers without risking an error is to use an
|
|
enumeration.
|
|
|
|
```c
|
|
enum {
|
|
/* Whatever function you need */
|
|
PROFCTX_FUNCTION1 = 0,
|
|
PROFCTX_FUNCTION2,
|
|
PROFCTX_FUNCTION3,
|
|
|
|
PROFCTX_COUNT,
|
|
};
|
|
```
|
|
|
|
Enumerations will assign a value to all the provided names, and increment by
|
|
one each time. So for example here `PROFCTX_FUNCTION2` is equal to `1` and
|
|
`PROFCTX_COUNT` is equal to `3`. As you can see this is conveniently equal to
|
|
the number of contexts, which makes it simple to initialize the library:
|
|
|
|
```c
|
|
prof_init(PROFCTX_COUNT, 0);
|
|
```
|
|
|
|
Then you can use context names instead of numbers:
|
|
|
|
```c
|
|
prof_enter(PROFCTX_FUNCTION1);
|
|
/* Do stuff... */
|
|
prof_leave(PROFCTX_FUNCTION1);
|
|
```
|
|
|
|
If you want to use a new context, you just need to add a name in the
|
|
enumeration (anywhere but after `PROFCTX_COUNT`) and all IDs plus the
|
|
initialization call will be updated automatically.
|
|
|
|
## Timing a single execution
|
|
|
|
`prof_enter()` and `prof_leave()` will add the measured execution time to the
|
|
context counter. Sometimes you want to make individual measurements instead of
|
|
adding all calls together. To achieve this effect, clear the counter before
|
|
the measure using `prof_clear()`.
|
|
|
|
Here is an example of a function `exec_time_us()` that times the execution of a
|
|
function `f` passed as parameter.
|
|
|
|
```c
|
|
uint32_t exec_time_us(void (*f)(void))
|
|
{
|
|
int ctx = PROFCTX_EXEC_TIME_US;
|
|
prof_clear(ctx);
|
|
prof_enter(ctx);
|
|
|
|
f();
|
|
|
|
prof_leave(ctx);
|
|
return prof_time(ctx);
|
|
}
|
|
```
|
|
|
|
## Exploiting the measure's precision
|
|
|
|
The overhead of `prof_enter()` and `prof_leave()` is usually less than a
|
|
microsecond, but the starting time of your benchmark might count (loading data
|
|
from memory to initialize arrays, performing function calls...). In this case,
|
|
the best you can do is measure the time difference between two similar calls.
|
|
|
|
If you need something even more precise then you can access libprof's counter
|
|
array directly to get the timer-tick value itself:
|
|
|
|
```c
|
|
uint32_t elapsed_timer_tick = prof_elapsed[ctx];
|
|
```
|
|
|
|
The frequency of this tick is PΦ/4, where the value of PΦ can be obtained by
|
|
querying gint's clock module:
|
|
|
|
```c
|
|
#include <gint/clock.h>
|
|
uint32_t tick_freq = clock_freq()->Pphi_f / 4;
|
|
```
|
|
|
|
One noteworthy phenomenon is the startup cost. The first few measurements are
|
|
always less precise, probably due to cache effects. I frequently have a first
|
|
measurement with an additional 100 us of execution time and 3 us of overhead,
|
|
which subsequent tests remove. So it is expected for the first few points of
|
|
data to lie outside the range of the next.
|