Systemtap OMAP Profiler
From OMAPpedia
Contents |
Goal
- Easy to use systemtap tapset for benchmarking and timestamping kernel functions on OMAP devices
- Minimise systemtap overhead
- Systemtap uses 64-bit types for variables and so design a tapset to minimise use of 64-bit types.
- Ability to use 32kHz timer or a GP timer for benchmarking
- GP timer can run at a derivative of the OMAP SYS_CLK and so are very accurate for benchmarking.
- For example, the OMAP4 blaze has a SYS_CLK of 38.4MHz.
Benchmarking Functions
1. Download and build systemtap as described here.
2. Copy the file Omap_timestamp.stp into your systemtap tapset directory.
cp omap_timestamp.stp <stp-install-dir>/systemtap-1.x-target/share/systemtap/tapset/
3. Download File:Omap profile.tar.gz, uncompress and copy the file to your systemtap tapset directory.
cp omap_profile.stp <stp-install-dir>/systemtap-1.x-target/share/systemtap/tapset/
4. Create a new systemtap script file.
vim <stp-install-dir>/systemtap-1.x-target/share/systemtap/myscript.stp
5. In your script define if you want to use the 32kHz clock or a GP timer.
- For a 32kHz clock add the following.
%{
#define TIMER_USE_32K 1
%}
- For a GP timer add the following.
%{
#define TIMER_USE_GPTIMER 1
%}
6. Define the maximum number of benchmarks you wish to collect.
%{
#define PROFILE_BENCHMARK_MAX_NUM 10
%}
7. Optionally, for each benchmark you can call profile_benchmark_init() to specify a name for the benchmark. This name will be used for displaying the results. Typically this would be called from the begin probe.
probe begin {
profile_benchmark_init(“benchmark-name”, 0)
}
8. For each function you wish to benchmark add a kprobe to call profile_start() on entry. Where “func_name” is the kernel function name and the value passed is an index that is used to identify each benchmark. The index MUST be less than PROFILE_BENCHMARK_MAX_NUM.
probe kernel.function("func_name”).call {
profile_start(0)
}
9. For each function you wish to benchmark add a kprobe to call profile_end() on exit. Note that the value passed should be the same as passed to the profile_start().
probe kernel.function("func_name").return {
profile_end(0)
}
10. To display the results simply call profile_show_benchmarks() function. This would typically done in the end probe.
probe end {
profile_show_benchmarks()
}
Benchmarking Example
Here is a example that will use a GP timer to benchmark the functions “omap_dsi_prepare_update()” and “omap_dsi_update()" for 30 seconds and displays the results.
%{
#define TIMER_USE_GPTIMER 1
#define PROFILE_BENCHMARK_MAX_NUM 2
%}
probe timer.s(30) {
exit()
}
probe begin {
profile_benchmark_init("omap_dsi_prepare_update", 0)
profile_benchmark_init("omap_dsi_update", 1)
}
probe kernel.function("omap_dsi_prepare_update").call {
profile_start(0)
}
probe kernel.function("omap_dsi_prepare_update").return {
profile_end(0);
}
probe kernel.function(“omap_dsi_update").call {
profile_start(1)
}
probe kernel.function("omap_dsi_update").return {
profile_end(1);
}
probe end {
profile_show_benchmarks()
}
Here is an example of the results output from the previous example.
OMAP Profiler: Total profile time 30080973104 nsecs (985712 ticks @ 32kHz)
Profiling clock frequency 19200000
Function Count Min Max Total Avg
omap_dsi_prepare_update (ns) 801 25064 941720 52629356 65676
omap_dsi_prepare_update (ticks) 801 482 18110 1012103 1263
omap_dsi_update (ns) 801 31460 1172808 68042104 84916
omap_dsi_update (ticks) 801 605 22554 1308502 1633
- Where
- Results are expressed in timer ticks as well as time (ns) for each function
- Total profiling time shows how long the benchmark ran
- Profiling clock frequency is shown to be 19.2MHz
- Count shows number of times the function was called
- Min is the minimum time for function call
- Max is the maximum time for function call
- Total is the total time spent in the function
- Avg is the average time spent in the function
Timestamping Functions
1. Download and build systemtap as described here.
2. Copy the file Omap_timestamp.stp into your systemtap tapset directory.
cp omap_timestamp.stp <stp-install-dir>/systemtap-1.x-target/share/systemtap/tapset/
3. Download File:Omap profile.tar.gz, uncompress and copy the file to your systemtap tapset directory.
cp omap_profile.stp <stp-install-dir>/systemtap-1.x-target/share/systemtap/tapset/
4. Create a new systemtap script file.
vim <stp-install-dir>/systemtap-1.x-target/share/systemtap/myscript.stp
5. In your script define if you want to use the 32kHz clock or a GP timer.
- For a 32kHz clock add the following.
%{
#define TIMER_USE_32K 1
%}
- For a GP timer add the following.
%{
#define TIMER_USE_GPTIMER 1
%}
6. Define the maximum number of functions you wish to timestamp.
%{
#define PROFILE_TIMESTAMP_MAX_NUM 10
%}
7. Define the maximum number of timestamps you wish to collect for each function.
%{
#define PROFILE_TIMESTAMP_MAX_SAMPLES 10
%}
8. Optionally, for each benchmark you can call profile_timestamp_init() to specify a name for the benchmark. This name will be used for displaying the results. Typically this would be called from the begin probe.
probe begin {
profile_timestamp_init(“timestamp-name”, 0)
}
9. For each function you wish to timestamp add a kprobe to call profile_timestamp() on entry. Where “func_name” is the kernel function name and the value passed is an index that is used to identify each benchmark. The index MUST be less than PROFILE_TIMESTAMP_MAX_NUM.
probe kernel.function("func_name”).call {
profile_timestamp(0)
}
10. To display the results simply call profile_show_benchmarks() function. This would typically done in the end probe.
probe end {
profile_show_benchmarks()
}
Timestamping Example
- Here is a example that will use a GP timer to timestamp the “musb_interrupt” function for 30 seconds and displays the results.
%{
#define TIMER_USE_GPTIMER 1
#define PROFILE_TIMESTAMP_MAX_NUM 1
#define PROFILE_TIMESTAMP_MAX_SAMPLES 10
%}
probe timer.s(30) {
exit()
}
probe begin {
profile_timestamp_init(“musb_interrupt", 0)
}
probe kernel.function("musb_interrupt").call {
profile_timestamp(0)
}
probe end {
profile_show_timestamps()
}
Here is an example of the results output from the previous example.
OMAP Profiler: Total profile time 30090646993 nsecs (986029 ticks @ 32kHz)
Profiling clock frequency 19200000 Hz
Function Index Timestamp (ticks) Delta (ticks) Timestamp (ns) Delta (ns)
musb_interrupt 0 164721061 0 8565495172 0
musb_interrupt 1 164731562 10501 8566041224 546052
musb_interrupt 2 164794507 62945 8569314364 3273140
musb_interrupt 3 165149452 354945 8587771504 18457140
musb_interrupt 4 165159885 10433 8588314020 542516
musb_interrupt 5 165170432 10547 8588862464 548444
musb_interrupt 6 165197199 26767 8590254348 1391884
musb_interrupt 7 165207346 10147 8590781992 527644
…
- Where …
- Results are expressed in timer ticks as well as time (ns) for each timestamp and each line represents a single call to that function.
- Total profiling time show how long the benchmark ran
- Profiling clock frequency is shown to be 19.2MHz
- Index is a number representing each time the function was called
- Timestamp (ticks) is the timestamp expressed in timer ticks
- Delta (ticks) is the difference between the current timestamp and the last expressed in ticks.
- Timestamp (ns) is the timestamp expressed in nsecs
- Delta (ticks) is the difference between the current timestamp and the last expressed in nsecs.
Notes
- 32kHz timer is only accurate to 32usecs
- GP timer is accurate to 52ns (@ 19.2MHz)
- OMAP profiler tapset uses 32-bit unsigned integers to store tick counts and totals. So when using GP timer if benchmarking for minutes you could saturate total tick count.
Known Issues
According to the systemtap language reference “all SystemTap functions and probes run with interrupts disabled, thus you cannot call functions that might sleep within the embedded C.” This makes sense of probing functions, however, for begin and end probe points it would be nice to have the ability to call functions that might sleep. Due to this limitation, when using a GP timer, when you will see the following BUG report when the GP timer is initialised at the very start of the benchmarking. There should be no impact from this but it is annoying. A bug has been filed against systemtap here.
[ 242.426544] omap4_dsi_update: systemtap: 1.7/0.143, base: bf013000, memory: 19data/29text/3ctx/10net/17alloc kb, probes: 14 [ 242.438507] BUG: sleeping function called from invalid context at kernel/mutex.c:85 [ 242.446899] in_atomic(): 1, irqs_disabled(): 0, pid: 735, name: stapio [ 242.454193] Backtrace: [ 242.456970] [<c005abe4>] (dump_backtrace+0x0/0x110) from [<c06467b8>] (dump_stack+0x18/0x1c) [ 242.466033] r6:00000348 r5:ffffe337 r4:ffff9676 r3:60000013 [ 242.472503] [<c06467a0>] (dump_stack+0x0/0x1c) from [<c0093fb4>] (__might_sleep+0x130/0x134) [ 242.481567] [<c0093e84>] (__might_sleep+0x0/0x134) from [<c0648edc>] (mutex_lock+0x24/0x44) [ 242.490539] r5:bf01c720 r4:c0877640 [ 242.494537] [<c0648eb8>] (mutex_lock+0x0/0x44) from [<c008e17c>] (omap_dm_timer_request+0x1c/0xac) [ 242.504180] r4:c0877640 r3:ffffffff [ 242.508178] [<c008e160>] (omap_dm_timer_request+0x0/0xac) from [<bf0130cc>] (timer_init_gptimer+0x10/0x8c [omap4_dsi_update]) …