From 0566fad0c51998c288596c6bdbd7b66572a61779 Mon Sep 17 00:00:00 2001
From: Lephenixnoir
Date: Wed, 17 Jul 2019 14:34:25 -0400
Subject: [PATCH] add README

---
 README.md | 191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 191 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0490661
--- /dev/null
+++ b/README.md
@@ -0,0 +1,191 @@
+# libprof: A performance profiling library for gint
+
+libprof is a small gint library that can be used to time and profile the
+execution of an add-in. Using it, one can record the time spent in one or
+several functions to identify performance bottlenecks in an application.
+
+libprof's measurements are accurate down to the microsecond level thanks to
+precise hardware timers, so it can also be used to time even small portions of
+code.
+
+## Building
+
+libprof is built only once for both fx-9860G and fx-CG 50, but if you use
+different compilers you will need to install it twice. The dependencies are:
+
+* A GCC cross-compiler for a SuperH architecture
+* The [gint kernel](/Lephenixnoir/gint)
+
+The Makefile will build the library without further instructions.
+
+```sh
+% make
+```
+
+By default `sh3eb-elf` is used to build; you can override this by setting the
+`target` variable.
+
+```sh
+% make target=sh4eb-elf
+```
+
+Install as usual:
+
+```sh
+% make install
+# or
+% make install target=sh4eb-elf
+```
+
+## Basic use
+
+To access the library, include the `<libprof.h>` header file.
+
+```c
+#include <libprof.h>
+```
+
+For each function you want to time, libprof will create a counter. At the start
+of the program, you need to specify how many functions (libprof calls them
+*contexts*) you will be timing, so that libprof can allocate enough memory.
+
+libprof also needs one of gint's timers to actually measure time; it must be one
+of timers 0, 1 and 2, which are the only ones precise enough to do this job.
+You can use any timer that you are not already using for something else.
+
+These settings are specified with the `prof_init()` function.
+
+```c
+/* Initialize libprof for 13 contexts using timer 0 */
+prof_init(13, 0);
+```
+
+You can then measure the execution time of a function by calling `prof_enter()`
+at the beginning and `prof_leave()` at the end. You just need to "name" the
+function by giving its context ID, which is any number from 0 up to, but not
+including, the number of contexts passed to `prof_init()` (here 0 to 12).
+
+```c
+void function5(void)
+{
+    prof_enter(5);
+    /* Do stuff... */
+    prof_leave(5);
+}
+```
+
+This will add `function5()`'s execution time to the counter of context 5, so if
+the function is called several times the total execution time will be recorded.
+This way, at the end of the program, you can look at the counters to see where
+most of the time has been spent.
+
+To retrieve the total execution time of a function, use `prof_time()`:
+
+```c
+uint32_t total_function5_us = prof_time(5);
+```
+
+This time is measured in microseconds, even though the timers are actually more
+precise than this. Note that the overhead of `prof_enter()` and `prof_leave()`
+is usually less than 1 microsecond, so the time is very close to the actual
+time spent in the function even if the context is frequently entered and left.
+
+At the end of the program, free the resources of the library by calling
+`prof_quit()`.
+
+```c
+prof_quit();
+```
+
+## Managing context numbers
+
+The number of contexts is fixed for the whole execution, and all context IDs
+must be between 0 and this number (excluded). Managing the numbers by hand is
+error-prone and can lead to memory errors.
+
+A simple way of managing context numbers without risking an error is to use an
+enumeration.
+
+```c
+enum {
+    /* Whatever function you need */
+    PROFCTX_FUNCTION1 = 0,
+    PROFCTX_FUNCTION2,
+    PROFCTX_FUNCTION3,
+
+    PROFCTX_COUNT,
+};
+```
+
+An enumeration assigns a value to each of the provided names, incrementing by
+one each time. So for example here `PROFCTX_FUNCTION2` is equal to `1` and
+`PROFCTX_COUNT` is equal to `3`. As you can see this is conveniently equal to
+the number of contexts, which makes it simple to initialize the library:
+
+```c
+prof_init(PROFCTX_COUNT, 0);
+```
+
+Then you can use context names instead of numbers:
+
+```c
+prof_enter(PROFCTX_FUNCTION1);
+/* Do stuff... */
+prof_leave(PROFCTX_FUNCTION1);
+```
+
+If you want to use a new context, you just need to add a name in the
+enumeration (anywhere before `PROFCTX_COUNT`) and all IDs plus the
+initialization call will be updated automatically.
+
+## Timing a single execution
+
+`prof_enter()` and `prof_leave()` will add the measured execution time to the
+context counter. Sometimes you want to make individual measurements instead of
+adding all calls together. To achieve this effect, clear the counter before
+the measurement using `prof_clear()`.
+
+Here is an example of a function `exec_time_us()` that times the execution of a
+function `f` passed as parameter.
+
+```c
+uint32_t exec_time_us(void (*f)(void))
+{
+    int ctx = PROFCTX_EXEC_TIME_US;
+    prof_clear(ctx);
+    prof_enter(ctx);
+
+    f();
+
+    prof_leave(ctx);
+    return prof_time(ctx);
+}
+```
+
+## Exploiting the measure's precision
+
+The overhead of `prof_enter()` and `prof_leave()` is usually less than a
+microsecond, but the startup time of your benchmark might count (loading data
+from memory to initialize arrays, performing function calls...). In this case,
+the best you can do is measure the time difference between two similar calls.
+
+If you need something even more precise then you can access libprof's counter
+array directly to get the timer-tick value itself:
+
+```c
+uint32_t elapsed_timer_tick = prof_elapsed[ctx];
+```
+
+The frequency of this tick is PΦ/4, where the value of PΦ can be obtained by
+querying gint's clock module:
+
+```c
+#include <gint/clock.h>
+uint32_t tick_freq = clock_freq()->Pphi_f / 4;
+```
+
+One noteworthy phenomenon is the startup cost. The first few measurements are
+always less precise, probably due to cache effects. I frequently have a first
+measurement with an additional 100 us of execution time and 3 us of overhead,
+which subsequent tests remove. So it is expected for the first few data points
+to lie outside the range of the following ones.
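
The tick-to-time conversion described in the last section of the README can be sketched as follows. This is only an illustration: `ticks_to_ns()` is a hypothetical helper that does not exist in libprof, and it assumes the tick frequency (PΦ/4) is passed in Hz.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libprof: convert a raw tick count to
   nanoseconds, given the tick frequency in Hz (Pphi / 4 in the README). */
static uint64_t ticks_to_ns(uint32_t ticks, uint32_t tick_freq_hz)
{
    /* Avoid dividing by zero if the frequency is unknown */
    if(tick_freq_hz == 0) return 0;

    /* Widen to 64 bits before multiplying: ticks * 10^9 overflows 32 bits */
    return ((uint64_t)ticks * 1000000000ull) / tick_freq_hz;
}
```

On the calculator this would presumably be called as `ticks_to_ns(prof_elapsed[ctx], clock_freq()->Pphi_f / 4)`. The multiplication is done in 64 bits so that large tick counts do not wrap around.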