mirror of
https://git.planet-casio.com/Lephenixnoir/libprof.git
synced 2024-12-26 19:43:41 +01:00
add README
This commit is contained in:
parent
2c78a2cfeb
commit
0566fad0c5
1 changed files with 191 additions and 0 deletions
191
README.md
Normal file
# libprof: A performance profiling library for gint

libprof is a small gint library that can be used to time and profile the
execution of an add-in. Using it, one can record the time spent in one or
several functions to identify performance bottlenecks in an application.

libprof's measurements are accurate down to the microsecond level thanks to
precise hardware timers, so it can also be used to time even small portions of
code.

## Building

libprof is built only once for both fx-9860G and fx-CG 50, but if you use a
different compiler for each platform you will need to install it once per
compiler. The dependencies are:

* A GCC cross-compiler for a SuperH architecture
* The [gint kernel](/Lephenixnoir/gint)

The Makefile will build the library without further instructions.

```sh
% make
```

By default `sh3eb-elf` is used to build; you can override this by setting the
`target` variable.

```sh
% make target=sh4eb-elf
```

Install as usual:

```sh
% make install
# or
% make install target=sh4eb-elf
```

## Basic use

To access the library, include the `<libprof.h>` header file.

```c
#include <libprof.h>
```

For each function you want to time, libprof will create a counter. At the start
of the program, you need to specify how many functions (libprof calls them
*contexts*) you will be timing, so that libprof can allocate enough memory.

libprof also needs one of gint's timers to actually measure time; it must be one
of timers 0, 1 and 2, which are the only ones precise enough to do this job. You
can use any of these timers that you are not already using for something else.

These settings are specified with the `prof_init()` function.

```c
/* Initialize libprof for 13 contexts using timer 0 */
prof_init(13, 0);
```

You can then measure the execution time of a function by calling `prof_enter()`
at the beginning and `prof_leave()` at the end. You just need to "name" the
function by giving its context ID, which is any number from 0 up to, but not
including, the number of contexts passed to `prof_init()` (here 0 to 12).

```c
void function5(void)
{
    prof_enter(5);
    /* Do stuff... */
    prof_leave(5);
}
```

This will add `function5()`'s execution time to the counter of context 5, so if
the function is called several times the total execution time will be recorded.
This way, at the end of the program, you can look at the counters to see where
most of the time has been spent.

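To make the accumulation behaviour concrete, here is a small host-side model of
per-context counters. It is only a sketch and not libprof's implementation: the
`model_*` names are hypothetical, and timestamps are passed in by hand (counting
upwards) instead of being read from a hardware timer.

```c
#include <stdint.h>

/* Hypothetical model, not part of libprof: 13 contexts as in the example */
#define MODEL_CONTEXTS 13

/* Accumulated time per context, and timestamp of the last enter */
uint32_t model_elapsed[MODEL_CONTEXTS];
uint32_t model_start[MODEL_CONTEXTS];

/* Remember when the context was entered */
void model_enter(int ctx, uint32_t now)
{
    model_start[ctx] = now;
}

/* Add the time spent since the matching enter to the context's counter */
void model_leave(int ctx, uint32_t now)
{
    model_elapsed[ctx] += now - model_start[ctx];
}
```
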
To retrieve the total execution time of a function, use `prof_time()`:

```c
uint32_t total_function5_us = prof_time(5);
```

This time is measured in microseconds, even though the timers are actually more
precise than this. Note that the overhead of `prof_enter()` and `prof_leave()`
is usually less than 1 microsecond, so the time is very close to the actual
time spent in the function even if the context is frequently entered and left.

At the end of the program, free the resources of the library by calling
`prof_quit()`.

```c
prof_quit();
```

## Managing context numbers

The number of contexts is fixed for the whole execution, and all context IDs
must be between 0 (included) and this number (excluded). Managing the numbers
by hand is error-prone and can lead to memory errors.

A simple way of managing context numbers without risking an error is to use an
enumeration.

```c
enum {
    /* Whatever function you need */
    PROFCTX_FUNCTION1 = 0,
    PROFCTX_FUNCTION2,
    PROFCTX_FUNCTION3,

    PROFCTX_COUNT,
};
```

Enumerations assign a value to each of the provided names, incrementing by one
each time. So for example here `PROFCTX_FUNCTION2` is equal to `1` and
`PROFCTX_COUNT` is equal to `3`. As you can see this is conveniently equal to
the number of contexts, which makes it simple to initialize the library:

```c
prof_init(PROFCTX_COUNT, 0);
```

Then you can use context names instead of numbers:

```c
prof_enter(PROFCTX_FUNCTION1);
/* Do stuff... */
prof_leave(PROFCTX_FUNCTION1);
```

If you want to use a new context, you just need to add a name in the
enumeration (anywhere before `PROFCTX_COUNT`) and all IDs plus the
initialization call will be updated automatically.

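The same enumeration can also index a table of display names, which is handy
when printing a report of all counters. The sketch below is an assumption on
top of the README: `profctx_names` and `profctx_name()` are hypothetical
helpers, not part of libprof.

```c
#include <stddef.h>

enum {
    PROFCTX_FUNCTION1 = 0,
    PROFCTX_FUNCTION2,
    PROFCTX_FUNCTION3,

    PROFCTX_COUNT,
};

/* Hypothetical table of display names, indexed by context ID */
const char *const profctx_names[] = {
    [PROFCTX_FUNCTION1] = "function1",
    [PROFCTX_FUNCTION2] = "function2",
    [PROFCTX_FUNCTION3] = "function3",
};

/* Compilation fails if the table and the enumeration have different sizes,
   for example when a context is appended without giving it a name */
_Static_assert(sizeof(profctx_names) / sizeof(profctx_names[0])
    == PROFCTX_COUNT, "profctx_names is out of sync with the enumeration");

/* Returns a printable name for any integer, valid context or not */
const char *profctx_name(int ctx)
{
    if(ctx < 0 || ctx >= PROFCTX_COUNT)
        return "(invalid)";
    /* A context added in the middle without a name leaves a NULL entry */
    return profctx_names[ctx] ? profctx_names[ctx] : "(unnamed)";
}
```

A report loop can then run `ctx` from 0 to `PROFCTX_COUNT - 1` and print
`profctx_name(ctx)` next to the time recorded for that context.
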
## Timing a single execution

`prof_enter()` and `prof_leave()` will add the measured execution time to the
context counter. Sometimes you want to make individual measurements instead of
adding all calls together. To achieve this effect, clear the counter before
the measurement using `prof_clear()`.

Here is an example of a function `exec_time_us()` that times the execution of a
function `f` passed as parameter.

```c
uint32_t exec_time_us(void (*f)(void))
{
    /* A context reserved for this function in the enumeration */
    int ctx = PROFCTX_EXEC_TIME_US;
    /* Reset the counter so that only this call is measured */
    prof_clear(ctx);
    prof_enter(ctx);

    f();

    prof_leave(ctx);
    return prof_time(ctx);
}
```

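Single measurements are often repeated to see how much they vary. The sketch
below collects several runs through a timing routine with the same signature as
`exec_time_us()` and returns the fastest one. Everything in it is hypothetical:
`sample_times()` is not a libprof function, and `fake_exec_time_us()` is a
host-side stand-in that returns canned values so the code can be tried without
a calculator.

```c
#include <stddef.h>
#include <stdint.h>

/* Type of a single-shot timing routine such as exec_time_us() */
typedef uint32_t (*timing_fn)(void (*f)(void));

/* Times f() `runs` times with `measure`, stores every result in `out`, and
   returns the smallest one (UINT32_MAX if runs is 0) */
uint32_t sample_times(timing_fn measure, void (*f)(void), uint32_t *out,
    size_t runs)
{
    uint32_t best = UINT32_MAX;
    for(size_t i = 0; i < runs; i++) {
        out[i] = measure(f);
        if(out[i] < best)
            best = out[i];
    }
    return best;
}

/* Host-side stand-in for exec_time_us(): calls f() and returns canned
   values that cycle through 140, 42, 40, 41 microseconds */
uint32_t fake_exec_time_us(void (*f)(void))
{
    static const uint32_t canned[] = { 140, 42, 40, 41 };
    static size_t call = 0;
    f();
    return canned[call++ % 4];
}

/* Empty workload used with the stand-in above */
void do_nothing(void)
{
}
```

On the calculator, `exec_time_us` itself would be passed as `measure`.
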
## Exploiting the measurement's precision

The overhead of `prof_enter()` and `prof_leave()` is usually less than a
microsecond, but the startup time of your benchmark might count (loading data
from memory to initialize arrays, performing function calls...). In this case,
the best you can do is measure the time difference between two similar calls.

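One possible reading of "two similar calls" is to time the same code with two
different iteration counts and subtract: any fixed startup cost appears in both
measurements and cancels out. The helper below is a hypothetical sketch of that
arithmetic, not something libprof provides.

```c
#include <stdint.h>

/* Estimates the time of one iteration from two measurements of the same
   code: t_small for n_small iterations and t_large for n_large iterations.
   Returns 0 if the inputs cannot be used (n_large <= n_small, or the larger
   run measured faster than the smaller one). */
uint32_t per_iteration_time(uint32_t t_small, uint32_t n_small,
    uint32_t t_large, uint32_t n_large)
{
    if(n_large <= n_small || t_large < t_small)
        return 0;
    /* The fixed startup cost is in both timings, so it cancels here */
    return (t_large - t_small) / (n_large - n_small);
}
```
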
If you need something even more precise then you can access libprof's counter
array directly to get the timer-tick value itself:

```c
uint32_t elapsed_timer_tick = prof_elapsed[ctx];
```

The frequency of this tick is PΦ/4, where the value of PΦ can be obtained by
querying gint's clock module:

```c
#include <gint/clock.h>
uint32_t tick_freq = clock_freq()->Pphi_f / 4;
```

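With a tick count and the tick frequency in hand, converting to a time unit is
plain arithmetic. The helper below is a hypothetical sketch (not a libprof
function) that converts ticks to nanoseconds, using a 64-bit intermediate so
the multiplication cannot overflow.

```c
#include <stdint.h>

/* Converts a number of timer ticks to nanoseconds, given the tick frequency
   in Hz. Returns 0 if tick_freq is 0. The product of a 32-bit tick count
   and 10^9 always fits in 64 bits. */
uint64_t ticks_to_ns(uint32_t ticks, uint32_t tick_freq)
{
    if(tick_freq == 0)
        return 0;
    return ((uint64_t)ticks * 1000000000ull) / tick_freq;
}
```
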
One noteworthy phenomenon is the startup cost. The first few measurements are
always less precise, probably due to cache effects. I frequently have a first
measurement with an additional 100 us of execution time and 3 us of overhead,
which subsequent tests remove. So it is expected for the first few data points
to lie outside the range of the following ones.

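A simple way to account for this when summarizing a series of measurements is
to drop the first few samples before averaging. The function below is a
hypothetical sketch, not part of libprof; how many samples to skip is an
assumption you would tune by looking at your own data.

```c
#include <stddef.h>
#include <stdint.h>

/* Averages samples[skip] to samples[count - 1], ignoring the first `skip`
   warm-up measurements. Returns 0 if no sample is left after skipping. */
uint32_t mean_after_warmup(const uint32_t *samples, size_t count, size_t skip)
{
    if(skip >= count)
        return 0;

    /* 64-bit sum so that many large samples cannot overflow */
    uint64_t sum = 0;
    for(size_t i = skip; i < count; i++)
        sum += samples[i];

    return (uint32_t)(sum / (count - skip));
}
```

For instance, with samples of 140, 42, 40 and 41 us and `skip` set to 1, the
140 us outlier is ignored and the mean of the remaining three is 41 us.
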