A microsecond-level performance profiling library for gint.
Go to file
2022-08-19 16:38:35 +02:00
cmake cmake: install in the fxSDK sysroot 2022-08-19 16:38:35 +02:00
.gitignore switch build system to CMake 2021-01-27 15:04:26 +01:00
CMakeLists.txt cmake: install in the fxSDK sysroot 2022-08-19 16:38:35 +02:00
giteapc.make build: fix uninstall command for sh-based OSes 2021-01-29 19:05:11 +01:00
libprof.c udpate to gint 2.4.0 2021-04-27 16:25:48 +02:00
libprof.h harden macros 2022-08-16 09:07:12 +02:00
README.md update README 2021-01-28 22:53:20 +01:00

libprof: A performance profiling library for gint

libprof is a small gint library that times and profiles the execution of an add-in. It is useful to record the time spent in one or several functions and to identify performance bottlenecks in an application.

libprof's measurements are accurate down to the microsecond-level thanks to precise hardware timers, so it can also be used to time even small portions of code.

Installing and using libprof in a program

Installing with GiteaPC

libprof can be built and installed with GiteaPC, an automation tool for the fxSDK.

% giteapc install Lephenixnoir/libprof

Installing manually

libprof should be built with the fxSDK, which provides compiler settings and library interfaces to build on the calculator.

Simply run fxsdk build-fx or fxsdk build-cg. The fxSDK will invoke CMake with a suitable toolchain file while exposing CMake modules for the calculator.

% fxsdk build-fx install
% fxsdk build-cg install

Using in a CMake-based add-in

Find the LibProf package, which has a LibProf target. Versions are numbered like gint, so aim for your version of gint.

find_package(LibProf 2.1 REQUIRED)
target_link_libraries(<target_name> LibProf::LibProf)

Using in a Makefile-based add-in

Link with -lprof-fx on fx-9860G and -lprof-cg on fx-CG 50. In project.cfg:

LIBS_FX := -lprof-fx
LIBS_CG := -lprof-cg

Basic use

To access the library, include the <libprof.h> header file and call prof_init() somewhere so that libprof has access to a precise timer. If no such timer is available, prof_init() returns non-zero (but normally either 2 or 3 of the TMU are available when an add-in starts).

#include <libprof.h>
prof_init();

To measure execution time, create a profiling context with prof_make(), then call prof_enter() at the beginning of the code to time and prof_leave() at the end.

void function_to_time(void)
{
	prof_t prof = prof_make();
	prof_enter(prof);

	/* Do stuff... */

	prof_leave(prof);
}

The context records the time spent between calls to prof_enter() and prof_leave(). It can be entered multiple times and will simply accumulate the time spent in its counter. When the counter is not running, you can query recorded time with prof_time().

uint32_t total_function_us = prof_time(prof);

This time is returned in microseconds, even though the timers are slightly more precise than this. Note that the overhead of prof_enter() and prof_leave() is usually less than 1 microsecond, so the measure is very close to the wall-clock time spent in the function even if the context is frequently used.

At the end of the program or whenever you need the timer that libprof is holding, call prof_quit() to free the resources. This will make the timer available to timer_setup() again.

prof_quit();

Recursive functions

The profiler context keeps track of recursive calls so that functions that enter and leave recursively can be timed as well. The only difference with the basic example is that we need to make sure a single context is used (instead of creating a new one in each stack frame). Making it static is enough.

void recursive_function_to_time(void)
{
	static prof_t prof = prof_make();
	prof_enter(prof);

	/* Stuff... */
	recursive_function_to_time();
	/* Stuff... */

	prof_leave(prof);
}

However it makes it somewhat difficult to retrieve the elapsed time because it is hidden withing the function's name scope. Making it global can help.

Timing a single execution

In many cases applications just need to measure a single piece of code once and get the resulting time. prof_exec() is a shortcut macro that does exactly that, creating a temporary context and returning elapsed time.

uint32_t elapsed_us = prof_exec({
	scene_graph_render();
});

The macro expands to a short code block that wraps the argument and returns the prof_time() of the temporary context.

Using the timers's full precision

Hardware timers run at 7-15 MHz depending on the calculator model, so the time measure it slightly more precise than what prof_time() shows you. You can access that value through the elapsed field of a context object.

The value is a count of ticks that occur at a frequency of PΦ/4, where the value of PΦ can be obtained by querying gint's CPG driver:

#include <gint/clock.h>
uint32_t tick_freq = clock_freq()->Pphi_f / 4;

Note that the overhead of prof_enter() and prof_leave() is usually less than a microsecond, but more than a few timer ticks.

Due in part to caching effects, the first measurement for a given code sequence is likely to be larger than the other ones. I have seen effects such as 3 µs for a no-op (cache misses in libprof's code) and 100 µs for real cases (cache misses in the code itself). Make sure to make several measurements and use serious statistical methods!

Overclock interference

Contexts store timer tick counts, which are converted to microsecond delays only when prof_time() is called. Do not mix measurements performed at different overclock settings as the results will be erroneous.

What you can do is call prof_time() and reset the context (by assigning it prof_make()) before switching clock settings, then add up the microsecond delays when the execution is over.