mirror of
https://git.planet-casio.com/Lephenixnoir/libprof.git
synced 2024-12-26 19:43:41 +01:00
let the user allocate contexts locally
This incompatible change modifies the allocation method for contexts. Instead of managing a set of IDs in an malloc()-backed array, contexts can now be allocated on the stack or static RAM. Contexts are values of type [prof_t] and can be initialized to the value provided by prof_make(). lvalues to contexts replace context IDs. Finally, prof_exec() no longer requires a context to be specified, it creates one on the fly.
This commit is contained in:
parent
8a0bcfe903
commit
5be56fe2eb
3 changed files with 133 additions and 176 deletions
191
README.md
191
README.md
|
@ -1,8 +1,8 @@
|
|||
# libprof: A performance profiling library for gint
|
||||
|
||||
libprof is a small gint library that can be used to time and profile the
|
||||
execution of an add-in. Using it, one can record the time spent in one or
|
||||
several functions to identify performance bottlenecks in an application.
|
||||
libprof is a small gint library that times and profiles the execution of an
|
||||
add-in. It is useful to record the time spent in one or several functions and
|
||||
to identify performance bottlenecks in an application.
|
||||
|
||||
libprof's measurements are accurate down to the microsecond-level thanks to
|
||||
precise hardware timers, so it can also be used to time even small portions of
|
||||
|
@ -11,7 +11,7 @@ code.
|
|||
## Building
|
||||
|
||||
libprof is built and installed only once for both fx-9860G and fx-CG 50. The
|
||||
The dependencies are:
|
||||
dependencies are:
|
||||
|
||||
* A GCC cross-compiler for a SuperH architecture
|
||||
* The [gint kernel](/Lephenixnoir/gint)
|
||||
|
@ -37,153 +37,122 @@ will need to make and install twice.
|
|||
|
||||
## Basic use
|
||||
|
||||
To access the library, include the `<libprof.h>` header file.
|
||||
To access the library, include the `<libprof.h>` header file and call
|
||||
`prof_init()` somewhere so that libprof has access to a precise timer. If no
|
||||
such timer is available, `prof_init()` returns non-zero (but normally either 2
|
||||
or 3 of the TMU are available when an add-in starts).
|
||||
|
||||
```c
|
||||
#include <libprof.h>
|
||||
prof_init();
|
||||
```
|
||||
|
||||
For each function you want to time, libprof will create a counter. At the start
|
||||
of the program, you need to specify how many functions (libprof calls them
|
||||
*contexts*) you will be timing, so that libprof can allocate enough memory.
|
||||
|
||||
libprof also needs one of gint's timer to actually measure time; it selects one
|
||||
of TMU0, TMU1 and TMU2 which are the only ones suited for this precise job. The
|
||||
available timer with the smallest interrupt priority is selected.
|
||||
|
||||
This initialization happens in the `prof_init()` function.
|
||||
To measure execution time, create a profiling context with `prof_make()`, then
|
||||
call `prof_enter()` at the beginning of the code to time and `prof_leave()` at
|
||||
the end.
|
||||
|
||||
```c
|
||||
/* Initialize libprof for 13 contexts (automatically selects a timer) */
|
||||
prof_init(13);
|
||||
```
|
||||
|
||||
You can then measure the execution time of a function by calling `prof_enter()`
|
||||
at the beginning and `prof_end()` at the end. You just need to "name" the
|
||||
function by giving its context ID, which is any number between 0 and the number
|
||||
of contexts passed to `prof_init()` (here 0 to 12).
|
||||
|
||||
```c
|
||||
void function5(void)
|
||||
void function_to_time(void)
|
||||
{
|
||||
prof_enter(5);
|
||||
prof_t prof = prof_make();
|
||||
prof_enter(prof);
|
||||
|
||||
/* Do stuff... */
|
||||
prof_leave(5);
|
||||
|
||||
prof_leave(prof);
|
||||
}
|
||||
```
|
||||
|
||||
This will add `function5()`'s execution time to the 5th counter, so if the
|
||||
function is called several times the total execution time will be recorded.
|
||||
This way, at the end of the program, you can look at the counters to see where
|
||||
most of the time has been spent.
|
||||
|
||||
To retrieve the total execution time of a function, use `prof_time()` :
|
||||
The context records the time spent between calls to `prof_enter()` and
|
||||
`prof_leave()`. It can be entered multiple times and will simply accumulate the
|
||||
time spent in its counter. When the counter is not running, you can query
|
||||
recorded time with `prof_time()`.
|
||||
|
||||
```c
|
||||
uint32_t total_function5_us = prof_time(5);
|
||||
uint32_t total_function_us = prof_time(prof);
|
||||
```
|
||||
|
||||
This time is measured in microseconds, even though the timers are actually more
|
||||
This time is returned in microseconds, even though the timers are slightly more
|
||||
precise than this. Note that the overhead of `prof_enter()` and `prof_leave()`
|
||||
is usually less than 1 microsecond, so the time is very close to the actual
|
||||
time spent in the function even if the context is frequently entered and left.
|
||||
is usually less than 1 microsecond, so the measure is very close to the
|
||||
wall-clock time spent in the function even if the context is frequently used.
|
||||
|
||||
At the end of the program, free the resources of the library by calling
|
||||
`prof_quit()`.
|
||||
At the end of the program or whenever you need the timer that libprof is
|
||||
holding, call `prof_quit()` to free the resources. This will make the timer
|
||||
available to `timer_setup()` again.
|
||||
|
||||
```c
|
||||
prof_quit();
|
||||
```
|
||||
|
||||
## Managing context numbers
|
||||
## Recursive functions
|
||||
|
||||
The number of contexts must be set for all execution and all context IDs must
|
||||
be between 0 and this number (excluded). Managing the numbers by hand is error-
|
||||
prone and can lead to memory errors.
|
||||
|
||||
A simple way of managing context numbers without risking an error is to use an
|
||||
enumeration.
|
||||
The profiler context keeps track of recursive calls so that functions that
|
||||
enter and leave recursively can be timed as well. The only difference with the
|
||||
basic example is that we need to make sure a single context is used (instead
|
||||
of creating a new one in each stack frame). Making it static is enough.
|
||||
|
||||
```c
|
||||
enum {
|
||||
/* Whatever function you need */
|
||||
PROFCTX_FUNCTION1 = 0,
|
||||
PROFCTX_FUNCTION2,
|
||||
PROFCTX_FUNCTION3,
|
||||
|
||||
PROFCTX_COUNT,
|
||||
};
|
||||
```
|
||||
|
||||
Enumerations will assign a value to all the provided names, and increment by
|
||||
one each time. So for example here `PROFCTX_FUNCTION2` is equal to `1` and
|
||||
`PROFCTX_COUNT` is equal to `3`. As you can see this is conveniently equal to
|
||||
the number of contexts, which makes it simple to initialize the library:
|
||||
|
||||
```c
|
||||
prof_init(PROFCTX_COUNT);
|
||||
```
|
||||
|
||||
Then you can use context names instead of numbers:
|
||||
|
||||
```c
|
||||
prof_enter(PROFCTX_FUNCTION1);
|
||||
/* Do stuff... */
|
||||
prof_leave(PROFCTX_FUNCTION1);
|
||||
```
|
||||
|
||||
If you want to use a new context, you just need to add a name in the
|
||||
enumeration (anywhere but after `PROFCTX_COUNT`) and all IDs plus the
|
||||
initialization call will be updated automatically.
|
||||
|
||||
## Timing a single execution
|
||||
|
||||
`prof_enter()` and `prof_leave()` will add the measured execution time to the
|
||||
context counter. Sometimes you want to make individual measurements instead of
|
||||
adding all calls together. To achieve this effect, clear the counter before
|
||||
the measure using `prof_clear()`.
|
||||
|
||||
Here is an example of a function `exec_time_us()` that times the execution of a
|
||||
function `f` passed as parameter.
|
||||
|
||||
```c
|
||||
uint32_t exec_time_us(void (*f)(void))
|
||||
void recursive_function_to_time(void)
|
||||
{
|
||||
int ctx = PROFCTX_EXEC_TIME_US;
|
||||
prof_clear(ctx);
|
||||
prof_enter(ctx);
|
||||
static prof_t prof = prof_make();
|
||||
prof_enter(prof);
|
||||
|
||||
f();
|
||||
/* Stuff... */
|
||||
recursive_function_to_time();
|
||||
/* Stuff... */
|
||||
|
||||
prof_leave(ctx);
|
||||
return prof_time(ctx);
|
||||
prof_leave(prof);
|
||||
}
|
||||
```
|
||||
|
||||
## Exploiting the measure's precision
|
||||
However it makes it somewhat difficult to retrieve the elapsed time because it
|
||||
is hidden withing the function's name scope. Making it global can help.
|
||||
|
||||
The overhead of `prof_enter()` and `prof_leave()` is usually less than a
|
||||
microsecond, but the starting time of your benchmark might count (loading data
|
||||
from memory to initialize arrays, performing function calls...). In this case,
|
||||
the best you can do is measure the time difference between two similar calls.
|
||||
## Timing a single execution
|
||||
|
||||
If you need something even more precise then you can access libprof's counter
|
||||
array directly to get the timer-tick value itself:
|
||||
In many cases applications just need to measure a single piece of code once and
|
||||
get the resulting time. `prof_exec()` is a shortcut macro that does exactly
|
||||
that, creating a temporary context and returning elapsed time.
|
||||
|
||||
```c
|
||||
uint32_t elapsed_timer_tick = prof_elapsed[ctx];
|
||||
```
|
||||
uint32_t elapsed_us = prof_exec({
|
||||
scene_graph_render();
|
||||
});
|
||||
```
|
||||
|
||||
The frequency of this tick is PΦ/4, where the value of PΦ can be obtained by
|
||||
querying gint's clock module:
|
||||
The macro expands to a short code block that wraps the argument and returns
|
||||
the `prof_time()` of the temporary context.
|
||||
|
||||
## Using the timers's full precision
|
||||
|
||||
Hardware timers run at 7-15 MHz depending on the calculator model, so the time
|
||||
measure it slightly more precise than what `prof_time()` shows you. You can
|
||||
access that value through the `elapsed` field of a context object.
|
||||
|
||||
The value is a count of ticks that occur at a frequency of PΦ/4, where the
|
||||
value of PΦ can be obtained by querying gint's CPG driver:
|
||||
|
||||
```c
|
||||
#include <gint/clock.h>
|
||||
uint32_t tick_freq = clock_freq()->Pphi_f / 4;
|
||||
```
|
||||
|
||||
One noteworthy phenomenon is the startup cost. The first few measurements are
|
||||
always less precise, probably due to cache effects. I frequently have a first
|
||||
measurement with an additional 100 us of execution time and 3 us of overhead,
|
||||
which subsequent tests remove. So it is expected for the first few points of
|
||||
data to lie outside the range of the next.
|
||||
Note that the overhead of `prof_enter()` and `prof_leave()` is usually less
|
||||
than a microsecond, but more than a few timer ticks.
|
||||
|
||||
Due in part to caching effects, the first measurement for a given code sequence
|
||||
is likely to be larger than the other ones. I have seen effects such as 3 µs
|
||||
for a no-op (cache misses in libprof's code) and 100 µs for real cases (cache
|
||||
misses in the code itself). Make sure to make several measurements and use
|
||||
serious statistical methods!
|
||||
|
||||
## Overclock interference
|
||||
|
||||
Contexts store timer tick counts, which are converted to microsecond delays
|
||||
only when `prof_time()` is called. Do not mix measurements performed at
|
||||
different overclock settings as the results will be erroneous.
|
||||
|
||||
What you can do is call `prof_time()` and reset the context (by assigning it
|
||||
`prof_make()`) before switching clock settings, then add up the microsecond
|
||||
delays when the execution is over.
|
||||
|
|
48
libprof.c
48
libprof.c
|
@ -1,51 +1,34 @@
|
|||
#include <gint/timer.h>
|
||||
#include <gint/clock.h>
|
||||
#include <gint/mpu/tmu.h>
|
||||
#include <gint/std/stdlib.h>
|
||||
#include <gint/hardware.h>
|
||||
|
||||
#include <libprof.h>
|
||||
|
||||
/* Recursion depth of each function currently being executed */
|
||||
uint8_t *prof_rec = NULL;
|
||||
/* Time that has elapsed within each function; the value for a given function
|
||||
is only relevant when it is not executing, due to optimizations */
|
||||
uint32_t *prof_elapsed = NULL;
|
||||
/* Timer counter */
|
||||
uint32_t volatile *prof_tcnt = NULL;
|
||||
/* Timer ID */
|
||||
static int prof_timer = -1;
|
||||
|
||||
/* prof_init(): Initialize the profiler's data and timer */
|
||||
int prof_init(int context_count)
|
||||
/* prof_init(): Initialize the profiler's timer */
|
||||
int prof_init(void)
|
||||
{
|
||||
prof_rec = malloc(context_count * sizeof *prof_rec);
|
||||
prof_elapsed = malloc(context_count * sizeof *prof_elapsed);
|
||||
|
||||
/* Get a TMU with the exact constant 0xffffffff */
|
||||
int timer = -1;
|
||||
for(int t = 2; t >= 0 && timer == -1; t--)
|
||||
{
|
||||
timer = timer_setup(t, 0xffffffff, NULL);
|
||||
}
|
||||
|
||||
if(!prof_rec || !prof_elapsed || timer == -1)
|
||||
if(timer == -1)
|
||||
{
|
||||
prof_quit();
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Fix the configuration done by gint by disabling the interrupt */
|
||||
if(isSH3())
|
||||
{
|
||||
SH7705_TMU.TMU[timer].TCR.UNIE = 0;
|
||||
prof_tcnt = &SH7705_TMU.TMU[timer].TCNT;
|
||||
}
|
||||
else
|
||||
{
|
||||
SH7305_TMU.TMU[timer].TCR.UNIE = 0;
|
||||
prof_tcnt = &SH7305_TMU.TMU[timer].TCNT;
|
||||
}
|
||||
/* Keep the address of the TCNT register */
|
||||
prof_tcnt = isSH3()
|
||||
? &SH7705_TMU.TMU[timer].TCNT
|
||||
: &SH7305_TMU.TMU[timer].TCNT;
|
||||
|
||||
timer_start(timer);
|
||||
prof_timer = timer;
|
||||
|
@ -53,27 +36,16 @@ int prof_init(int context_count)
|
|||
return 0;
|
||||
}
|
||||
|
||||
/* prof_quit(): Free the profiler's data and timer */
|
||||
/* prof_quit(): Free the profiler's timer */
|
||||
void prof_quit(void)
|
||||
{
|
||||
if(prof_timer >= 0) timer_stop(prof_timer);
|
||||
if(prof_rec) free(prof_rec);
|
||||
if(prof_elapsed) free(prof_elapsed);
|
||||
|
||||
prof_timer = -1;
|
||||
prof_rec = NULL;
|
||||
prof_elapsed = NULL;
|
||||
}
|
||||
|
||||
//---
|
||||
// Post-measurement analysis
|
||||
//---
|
||||
|
||||
/* prof_time(): Time spent in a given context, in microseconds */
|
||||
uint32_t prof_time(int ctx)
|
||||
uint32_t prof_time(prof_t prof)
|
||||
{
|
||||
int Pphi = clock_freq()->Pphi_f;
|
||||
uint64_t time = prof_elapsed[ctx];
|
||||
|
||||
return (time * 4 * 1000000) / Pphi;
|
||||
return ((uint64_t)prof.elapsed * 4 * 1000000) / Pphi;
|
||||
}
|
||||
|
|
70
libprof.h
70
libprof.h
|
@ -6,37 +6,43 @@
|
|||
#define LIBPROF_LIBPROF
|
||||
|
||||
#include <stdint.h>
|
||||
#include <gint/defs/attributes.h>
|
||||
|
||||
//---
|
||||
// Initialization
|
||||
//---
|
||||
|
||||
/* prof_init(): Initialize the profiler's data and timer
|
||||
/* prof_init(): Initialize the profiler's timer
|
||||
|
||||
Initializes [prof_rec] and [prof_time] (see below) with enough elements to
|
||||
hold all the context IDs. Context IDs should be numbered from 0 to [n-1];
|
||||
due to speed requirements array bounds are not checked so be careful.
|
||||
Starts a timer to count time. libprof automatically selects a TMU and tries
|
||||
to use TMU2 before TMU1 before TMU0 so that high-priority interrupts remain
|
||||
available, and sets an accurate clock configuration.
|
||||
|
||||
Also starts a timer to count time. libprof automatically selects a TMU and
|
||||
tries to use TMU2 before TMU1 before TMU0 so that high-priority interrupts
|
||||
remain available, and sets an accurate clock configuration.
|
||||
Returns non-zero if no timer is available. */
|
||||
int prof_init(void);
|
||||
|
||||
@context_count Number of different contexts that will be measured
|
||||
Returns non-zero if a setup error occurs or no timer is available. */
|
||||
int prof_init(int context_count);
|
||||
|
||||
/* prof_quit(): Free the profiler's data and timer */
|
||||
/* prof_quit(): Free the profiler's timer */
|
||||
void prof_quit(void);
|
||||
|
||||
//---
|
||||
// Runtime time measurement
|
||||
//---
|
||||
|
||||
/* Recursion depth of each function currently being executed */
|
||||
extern uint8_t *prof_rec;
|
||||
/* Time that has elapsed within each function; the value for a given function
|
||||
is only relevant when it is not executing, due to optimizations */
|
||||
extern uint32_t *prof_elapsed;
|
||||
/* Context object, has an elasped delay and a recursion level. This object can
|
||||
be created on the stack of a function that measures its execution time,
|
||||
except if the function is recursive, in which case it should be either
|
||||
static or global. */
|
||||
typedef struct prof_t
|
||||
{
|
||||
uint32_t rec;
|
||||
uint32_t elapsed;
|
||||
|
||||
} GPACKED(4) prof_t;
|
||||
|
||||
/* prof_make(): Create a new context object
|
||||
A context can be cleared by assigning it prof_make() again. */
|
||||
#define prof_make() ((prof_t){ 0, 0 })
|
||||
|
||||
/* Timer counter */
|
||||
extern uint32_t volatile *prof_tcnt;
|
||||
|
||||
|
@ -44,8 +50,8 @@ extern uint32_t volatile *prof_tcnt;
|
|||
This macro should be called at the start of the context scope. If the
|
||||
function was already executing then the deepest instance in the stack is
|
||||
used instead of creating a new counter. */
|
||||
#define prof_enter(ctx) { \
|
||||
if(!prof_rec[ctx]++) prof_elapsed[ctx] += *prof_tcnt; \
|
||||
#define prof_enter(prof) { \
|
||||
if(!prof.rec++) prof.elapsed += *prof_tcnt; \
|
||||
}
|
||||
|
||||
/* prof_leave(): Stop counting time for a function
|
||||
|
@ -53,15 +59,25 @@ extern uint32_t volatile *prof_tcnt;
|
|||
stops if there is no deeper instance of the context in the stack. If there
|
||||
are not as exactly as many prof_leave()'s as prof_enter()'s then the
|
||||
resulting time measure will not be relevant at all. */
|
||||
#define prof_leave(ctx) { \
|
||||
if(!--prof_rec[ctx]) prof_elapsed[ctx] -= *prof_tcnt; \
|
||||
#define prof_leave(prof) { \
|
||||
if(!--prof.rec) prof.elapsed -= *prof_tcnt; \
|
||||
}
|
||||
|
||||
/* prof_clear(): Clear a context's counter
|
||||
This operation is defined only if the context is not being profiled. */
|
||||
#define prof_clear(ctx) { \
|
||||
prof_elapsed[ctx] = 0; \
|
||||
}
|
||||
/* prof_exec(): Measure a single block of code
|
||||
This operation can be used when profiling is not required, and instead
|
||||
libprof is used to measure the performance of a single bit of code. Use it
|
||||
like this:
|
||||
|
||||
uint32_t elasped_us = prof_exec({
|
||||
exec_code();
|
||||
}); */
|
||||
#define prof_exec(code) ({ \
|
||||
prof_t prof = prof_make(); \
|
||||
prof_enter(prof); \
|
||||
code; \
|
||||
prof_leave(prof); \
|
||||
prof_time(prof); \
|
||||
})
|
||||
|
||||
//---
|
||||
// Post-measurement analysis
|
||||
|
@ -69,6 +85,6 @@ extern uint32_t volatile *prof_tcnt;
|
|||
|
||||
/* prof_time(): Time spent in a given context, in microseconds
|
||||
Should only be called when the context is not currently executing. */
|
||||
uint32_t prof_time(int ctx);
|
||||
uint32_t prof_time(prof_t prof);
|
||||
|
||||
#endif /* LIBPROF_LIBPROF */
|
||||
|
|
Loading…
Reference in a new issue