let the user allocate contexts locally

This incompatible change modifies the allocation method for contexts.
Instead of managing a set of IDs in an malloc()-backed array, contexts
can now be allocated on the stack or static RAM.

Contexts are values of type [prof_t] and can be initialized to the value
provided by prof_make(). lvalues to contexts replace context IDs.

Finally, prof_exec() no longer requires a context to be specified, it
creates one on the fly.
This commit is contained in:
Lephenixnoir 2020-10-13 19:30:45 +02:00
parent 8a0bcfe903
commit 5be56fe2eb
No known key found for this signature in database
GPG key ID: 1BBA026E13FC0495
3 changed files with 133 additions and 176 deletions

191
README.md
View file

@ -1,8 +1,8 @@
# libprof: A performance profiling library for gint # libprof: A performance profiling library for gint
libprof is a small gint library that can be used to time and profile the libprof is a small gint library that times and profiles the execution of an
execution of an add-in. Using it, one can record the time spent in one or add-in. It is useful to record the time spent in one or several functions and
several functions to identify performance bottlenecks in an application. to identify performance bottlenecks in an application.
libprof's measurements are accurate down to the microsecond-level thanks to libprof's measurements are accurate down to the microsecond-level thanks to
precise hardware timers, so it can also be used to time even small portions of precise hardware timers, so it can also be used to time even small portions of
@ -11,7 +11,7 @@ code.
## Building ## Building
libprof is built and installed only once for both fx-9860G and fx-CG 50. The libprof is built and installed only once for both fx-9860G and fx-CG 50. The
The dependencies are: dependencies are:
* A GCC cross-compiler for a SuperH architecture * A GCC cross-compiler for a SuperH architecture
* The [gint kernel](/Lephenixnoir/gint) * The [gint kernel](/Lephenixnoir/gint)
@ -37,153 +37,122 @@ will need to make and install twice.
## Basic use ## Basic use
To access the library, include the `<libprof.h>` header file. To access the library, include the `<libprof.h>` header file and call
`prof_init()` somewhere so that libprof has access to a precise timer. If no
such timer is available, `prof_init()` returns non-zero (but normally either 2
or 3 of the TMU are available when an add-in starts).
```c ```c
#include <libprof.h> #include <libprof.h>
prof_init();
``` ```
For each function you want to time, libprof will create a counter. At the start To measure execution time, create a profiling context with `prof_make()`, then
of the program, you need to specify how many functions (libprof calls them call `prof_enter()` at the beginning of the code to time and `prof_leave()` at
*contexts*) you will be timing, so that libprof can allocate enough memory. the end.
libprof also needs one of gint's timer to actually measure time; it selects one
of TMU0, TMU1 and TMU2 which are the only ones suited for this precise job. The
available timer with the smallest interrupt priority is selected.
This initialization happens in the `prof_init()` function.
```c ```c
/* Initialize libprof for 13 contexts (automatically selects a timer) */ void function_to_time(void)
prof_init(13);
```
You can then measure the execution time of a function by calling `prof_enter()`
at the beginning and `prof_end()` at the end. You just need to "name" the
function by giving its context ID, which is any number between 0 and the number
of contexts passed to `prof_init()` (here 0 to 12).
```c
void function5(void)
{ {
prof_enter(5); prof_t prof = prof_make();
prof_enter(prof);
/* Do stuff... */ /* Do stuff... */
prof_leave(5);
prof_leave(prof);
} }
``` ```
This will add `function5()`'s execution time to the 5th counter, so if the The context records the time spent between calls to `prof_enter()` and
function is called several times the total execution time will be recorded. `prof_leave()`. It can be entered multiple times and will simply accumulate the
This way, at the end of the program, you can look at the counters to see where time spent in its counter. When the counter is not running, you can query
most of the time has been spent. recorded time with `prof_time()`.
To retrieve the total execution time of a function, use `prof_time()` :
```c ```c
uint32_t total_function5_us = prof_time(5); uint32_t total_function_us = prof_time(prof);
``` ```
This time is measured in microseconds, even though the timers are actually more This time is returned in microseconds, even though the timers are slightly more
precise than this. Note that the overhead of `prof_enter()` and `prof_leave()` precise than this. Note that the overhead of `prof_enter()` and `prof_leave()`
is usually less than 1 microsecond, so the time is very close to the actual is usually less than 1 microsecond, so the measure is very close to the
time spent in the function even if the context is frequently entered and left. wall-clock time spent in the function even if the context is frequently used.
At the end of the program, free the resources of the library by calling At the end of the program or whenever you need the timer that libprof is
`prof_quit()`. holding, call `prof_quit()` to free the resources. This will make the timer
available to `timer_setup()` again.
```c ```c
prof_quit(); prof_quit();
``` ```
## Managing context numbers ## Recursive functions
The number of contexts must be set for all execution and all context IDs must The profiler context keeps track of recursive calls so that functions that
be between 0 and this number (excluded). Managing the numbers by hand is error- enter and leave recursively can be timed as well. The only difference with the
prone and can lead to memory errors. basic example is that we need to make sure a single context is used (instead
of creating a new one in each stack frame). Making it static is enough.
A simple way of managing context numbers without risking an error is to use an
enumeration.
```c ```c
enum { void recursive_function_to_time(void)
/* Whatever function you need */
PROFCTX_FUNCTION1 = 0,
PROFCTX_FUNCTION2,
PROFCTX_FUNCTION3,
PROFCTX_COUNT,
};
```
Enumerations will assign a value to all the provided names, and increment by
one each time. So for example here `PROFCTX_FUNCTION2` is equal to `1` and
`PROFCTX_COUNT` is equal to `3`. As you can see this is conveniently equal to
the number of contexts, which makes it simple to initialize the library:
```c
prof_init(PROFCTX_COUNT);
```
Then you can use context names instead of numbers:
```c
prof_enter(PROFCTX_FUNCTION1);
/* Do stuff... */
prof_leave(PROFCTX_FUNCTION1);
```
If you want to use a new context, you just need to add a name in the
enumeration (anywhere but after `PROFCTX_COUNT`) and all IDs plus the
initialization call will be updated automatically.
## Timing a single execution
`prof_enter()` and `prof_leave()` will add the measured execution time to the
context counter. Sometimes you want to make individual measurements instead of
adding all calls together. To achieve this effect, clear the counter before
the measure using `prof_clear()`.
Here is an example of a function `exec_time_us()` that times the execution of a
function `f` passed as parameter.
```c
uint32_t exec_time_us(void (*f)(void))
{ {
int ctx = PROFCTX_EXEC_TIME_US; static prof_t prof = prof_make();
prof_clear(ctx); prof_enter(prof);
prof_enter(ctx);
f(); /* Stuff... */
recursive_function_to_time();
/* Stuff... */
prof_leave(ctx); prof_leave(prof);
return prof_time(ctx);
} }
``` ```
## Exploiting the measure's precision However it makes it somewhat difficult to retrieve the elapsed time because it
is hidden withing the function's name scope. Making it global can help.
The overhead of `prof_enter()` and `prof_leave()` is usually less than a ## Timing a single execution
microsecond, but the starting time of your benchmark might count (loading data
from memory to initialize arrays, performing function calls...). In this case,
the best you can do is measure the time difference between two similar calls.
If you need something even more precise then you can access libprof's counter In many cases applications just need to measure a single piece of code once and
array directly to get the timer-tick value itself: get the resulting time. `prof_exec()` is a shortcut macro that does exactly
that, creating a temporary context and returning elapsed time.
```c ```
uint32_t elapsed_timer_tick = prof_elapsed[ctx]; uint32_t elapsed_us = prof_exec({
scene_graph_render();
});
``` ```
The frequency of this tick is PΦ/4, where the value of PΦ can be obtained by The macro expands to a short code block that wraps the argument and returns
querying gint's clock module: the `prof_time()` of the temporary context.
## Using the timers's full precision
Hardware timers run at 7-15 MHz depending on the calculator model, so the time
measure it slightly more precise than what `prof_time()` shows you. You can
access that value through the `elapsed` field of a context object.
The value is a count of ticks that occur at a frequency of PΦ/4, where the
value of PΦ can be obtained by querying gint's CPG driver:
```c ```c
#include <gint/clock.h> #include <gint/clock.h>
uint32_t tick_freq = clock_freq()->Pphi_f / 4; uint32_t tick_freq = clock_freq()->Pphi_f / 4;
``` ```
One noteworthy phenomenon is the startup cost. The first few measurements are Note that the overhead of `prof_enter()` and `prof_leave()` is usually less
always less precise, probably due to cache effects. I frequently have a first than a microsecond, but more than a few timer ticks.
measurement with an additional 100 us of execution time and 3 us of overhead,
which subsequent tests remove. So it is expected for the first few points of Due in part to caching effects, the first measurement for a given code sequence
data to lie outside the range of the next. is likely to be larger than the other ones. I have seen effects such as 3 µs
for a no-op (cache misses in libprof's code) and 100 µs for real cases (cache
misses in the code itself). Make sure to make several measurements and use
serious statistical methods!
## Overclock interference
Contexts store timer tick counts, which are converted to microsecond delays
only when `prof_time()` is called. Do not mix measurements performed at
different overclock settings as the results will be erroneous.
What you can do is call `prof_time()` and reset the context (by assigning it
`prof_make()`) before switching clock settings, then add up the microsecond
delays when the execution is over.

View file

@ -1,51 +1,34 @@
#include <gint/timer.h> #include <gint/timer.h>
#include <gint/clock.h> #include <gint/clock.h>
#include <gint/mpu/tmu.h> #include <gint/mpu/tmu.h>
#include <gint/std/stdlib.h>
#include <gint/hardware.h> #include <gint/hardware.h>
#include <libprof.h> #include <libprof.h>
/* Recursion depth of each function currently being executed */
uint8_t *prof_rec = NULL;
/* Time that has elapsed within each function; the value for a given function
is only relevant when it is not executing, due to optimizations */
uint32_t *prof_elapsed = NULL;
/* Timer counter */ /* Timer counter */
uint32_t volatile *prof_tcnt = NULL; uint32_t volatile *prof_tcnt = NULL;
/* Timer ID */ /* Timer ID */
static int prof_timer = -1; static int prof_timer = -1;
/* prof_init(): Initialize the profiler's data and timer */ /* prof_init(): Initialize the profiler's timer */
int prof_init(int context_count) int prof_init(void)
{ {
prof_rec = malloc(context_count * sizeof *prof_rec);
prof_elapsed = malloc(context_count * sizeof *prof_elapsed);
/* Get a TMU with the exact constant 0xffffffff */ /* Get a TMU with the exact constant 0xffffffff */
int timer = -1; int timer = -1;
for(int t = 2; t >= 0 && timer == -1; t--) for(int t = 2; t >= 0 && timer == -1; t--)
{ {
timer = timer_setup(t, 0xffffffff, NULL); timer = timer_setup(t, 0xffffffff, NULL);
} }
if(timer == -1)
if(!prof_rec || !prof_elapsed || timer == -1)
{ {
prof_quit(); prof_quit();
return 1; return 1;
} }
/* Fix the configuration done by gint by disabling the interrupt */ /* Keep the address of the TCNT register */
if(isSH3()) prof_tcnt = isSH3()
{ ? &SH7705_TMU.TMU[timer].TCNT
SH7705_TMU.TMU[timer].TCR.UNIE = 0; : &SH7305_TMU.TMU[timer].TCNT;
prof_tcnt = &SH7705_TMU.TMU[timer].TCNT;
}
else
{
SH7305_TMU.TMU[timer].TCR.UNIE = 0;
prof_tcnt = &SH7305_TMU.TMU[timer].TCNT;
}
timer_start(timer); timer_start(timer);
prof_timer = timer; prof_timer = timer;
@ -53,27 +36,16 @@ int prof_init(int context_count)
return 0; return 0;
} }
/* prof_quit(): Free the profiler's data and timer */ /* prof_quit(): Free the profiler's timer */
void prof_quit(void) void prof_quit(void)
{ {
if(prof_timer >= 0) timer_stop(prof_timer); if(prof_timer >= 0) timer_stop(prof_timer);
if(prof_rec) free(prof_rec);
if(prof_elapsed) free(prof_elapsed);
prof_timer = -1; prof_timer = -1;
prof_rec = NULL;
prof_elapsed = NULL;
} }
//---
// Post-measurement analysis
//---
/* prof_time(): Time spent in a given context, in microseconds */ /* prof_time(): Time spent in a given context, in microseconds */
uint32_t prof_time(int ctx) uint32_t prof_time(prof_t prof)
{ {
int Pphi = clock_freq()->Pphi_f; int Pphi = clock_freq()->Pphi_f;
uint64_t time = prof_elapsed[ctx]; return ((uint64_t)prof.elapsed * 4 * 1000000) / Pphi;
return (time * 4 * 1000000) / Pphi;
} }

View file

@ -6,37 +6,43 @@
#define LIBPROF_LIBPROF #define LIBPROF_LIBPROF
#include <stdint.h> #include <stdint.h>
#include <gint/defs/attributes.h>
//--- //---
// Initialization // Initialization
//--- //---
/* prof_init(): Initialize the profiler's data and timer /* prof_init(): Initialize the profiler's timer
Initializes [prof_rec] and [prof_time] (see below) with enough elements to Starts a timer to count time. libprof automatically selects a TMU and tries
hold all the context IDs. Context IDs should be numbered from 0 to [n-1]; to use TMU2 before TMU1 before TMU0 so that high-priority interrupts remain
due to speed requirements array bounds are not checked so be careful. available, and sets an accurate clock configuration.
Also starts a timer to count time. libprof automatically selects a TMU and Returns non-zero if no timer is available. */
tries to use TMU2 before TMU1 before TMU0 so that high-priority interrupts int prof_init(void);
remain available, and sets an accurate clock configuration.
@context_count Number of different contexts that will be measured /* prof_quit(): Free the profiler's timer */
Returns non-zero if a setup error occurs or no timer is available. */
int prof_init(int context_count);
/* prof_quit(): Free the profiler's data and timer */
void prof_quit(void); void prof_quit(void);
//--- //---
// Runtime time measurement // Runtime time measurement
//--- //---
/* Recursion depth of each function currently being executed */ /* Context object, has an elasped delay and a recursion level. This object can
extern uint8_t *prof_rec; be created on the stack of a function that measures its execution time,
/* Time that has elapsed within each function; the value for a given function except if the function is recursive, in which case it should be either
is only relevant when it is not executing, due to optimizations */ static or global. */
extern uint32_t *prof_elapsed; typedef struct prof_t
{
uint32_t rec;
uint32_t elapsed;
} GPACKED(4) prof_t;
/* prof_make(): Create a new context object
A context can be cleared by assigning it prof_make() again. */
#define prof_make() ((prof_t){ 0, 0 })
/* Timer counter */ /* Timer counter */
extern uint32_t volatile *prof_tcnt; extern uint32_t volatile *prof_tcnt;
@ -44,8 +50,8 @@ extern uint32_t volatile *prof_tcnt;
This macro should be called at the start of the context scope. If the This macro should be called at the start of the context scope. If the
function was already executing then the deepest instance in the stack is function was already executing then the deepest instance in the stack is
used instead of creating a new counter. */ used instead of creating a new counter. */
#define prof_enter(ctx) { \ #define prof_enter(prof) { \
if(!prof_rec[ctx]++) prof_elapsed[ctx] += *prof_tcnt; \ if(!prof.rec++) prof.elapsed += *prof_tcnt; \
} }
/* prof_leave(): Stop counting time for a function /* prof_leave(): Stop counting time for a function
@ -53,15 +59,25 @@ extern uint32_t volatile *prof_tcnt;
stops if there is no deeper instance of the context in the stack. If there stops if there is no deeper instance of the context in the stack. If there
are not as exactly as many prof_leave()'s as prof_enter()'s then the are not as exactly as many prof_leave()'s as prof_enter()'s then the
resulting time measure will not be relevant at all. */ resulting time measure will not be relevant at all. */
#define prof_leave(ctx) { \ #define prof_leave(prof) { \
if(!--prof_rec[ctx]) prof_elapsed[ctx] -= *prof_tcnt; \ if(!--prof.rec) prof.elapsed -= *prof_tcnt; \
} }
/* prof_clear(): Clear a context's counter /* prof_exec(): Measure a single block of code
This operation is defined only if the context is not being profiled. */ This operation can be used when profiling is not required, and instead
#define prof_clear(ctx) { \ libprof is used to measure the performance of a single bit of code. Use it
prof_elapsed[ctx] = 0; \ like this:
}
uint32_t elasped_us = prof_exec({
exec_code();
}); */
#define prof_exec(code) ({ \
prof_t prof = prof_make(); \
prof_enter(prof); \
code; \
prof_leave(prof); \
prof_time(prof); \
})
//--- //---
// Post-measurement analysis // Post-measurement analysis
@ -69,6 +85,6 @@ extern uint32_t volatile *prof_tcnt;
/* prof_time(): Time spent in a given context, in microseconds /* prof_time(): Time spent in a given context, in microseconds
Should only be called when the context is not currently executing. */ Should only be called when the context is not currently executing. */
uint32_t prof_time(int ctx); uint32_t prof_time(prof_t prof);
#endif /* LIBPROF_LIBPROF */ #endif /* LIBPROF_LIBPROF */