Systemtap OMAP Profiler

From OMAPpedia

Revision as of 15:28, 24 April 2012 by Joe (Talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Contents

[edit] Goal

  • Systemtap uses 64-bit types for variables and so design a tapset to minimise use of 64-bit types.
  • GP timer can run at a derivative of the OMAP SYS_CLK and so are very accurate for benchmarking.
  • For example, the OMAP4 blaze has a SYS_CLK of 38.4MHz.


[edit] Benchmarking Functions

1. Download and build systemtap as described here.

2. Copy the file Omap_timestamp.stp into your systemtap tapset directory.

cp omap_timestamp.stp <stp-install-dir>/systemtap-1.x-target/share/systemtap/tapset/

3. Download File:Omap profile.tar.gz, uncompress and copy the file to your systemtap tapset directory.

cp omap_profile.stp <stp-install-dir>/systemtap-1.x-target/share/systemtap/tapset/

4. Create a new systemtap script file.

vim <stp-install-dir>/systemtap-1.x-target/share/systemtap/myscript.stp

5. In your script define if you want to use the 32kHz clock or a GP timer.

  • For a 32kHz clock add the following.
%{
#define TIMER_USE_32K 1
%}
  • For a GP timer add the following.
%{
#define TIMER_USE_GPTIMER 1
%}

6. Define the maximum number of benchmarks you wish to collect.

%{
#define PROFILE_BENCHMARK_MAX_NUM 10
%}

7. Optionally, for each benchmark you can call profile_benchmark_init() to specify a name for the benchmark. This name will be used for displaying the results. Typically this would be called from the begin probe.

probe begin {
	profile_benchmark_init("benchmark-name”, 0)
}

8. For each function you wish to benchmark add a kprobe to call profile_start() on entry. Where “func_name” is the kernel function name and the value passed is an index that is used to identify each benchmark. The index MUST be less than PROFILE_BENCHMARK_MAX_NUM.

probe kernel.function("func_name”).call {
	profile_start(0)
}

9. For each function you wish to benchmark add a kprobe to call profile_end() on exit. Note that the value passed should be the same as passed to the profile_start().

probe kernel.function("func_name").return {
	profile_end(0)
}

10. To display the results simply call profile_show_benchmarks() function. This would typically done in the end probe.

probe end {
	profile_show_benchmarks()
}

[edit] Benchmarking Example

Here is a example that will use a GP timer to benchmark the functions “omap_dsi_prepare_update()” and “omap_dsi_update()" for 30 seconds and displays the results.

%{
#define TIMER_USE_GPTIMER 1
#define	PROFILE_BENCHMARK_MAX_NUM 2
%}

probe timer.s(30) {
	exit()
}

probe begin {
	profile_benchmark_init("omap_dsi_prepare_update", 0)
	profile_benchmark_init("omap_dsi_update", 1)
}

probe kernel.function("omap_dsi_prepare_update").call {
	profile_start(0)
}

probe kernel.function("omap_dsi_prepare_update").return {
	profile_end(0);
}

probe kernel.function("omap_dsi_update").call {
	profile_start(1)
}

probe kernel.function("omap_dsi_update").return {
	profile_end(1);
}

probe end {
	profile_show_benchmarks()
}

Here is an example of the results output from the previous example.

OMAP Profiler:       Total profile time 30080973104 nsecs (985712 ticks @ 32kHz)
                     Profiling clock frequency 19200000

Function                       Count                Min                  Max                  Total                Avg                 
omap_dsi_prepare_update (ns)   801                  25064                941720               52629356             65676               
omap_dsi_prepare_update (ticks) 801                  482                  18110                1012103              1263                
omap_dsi_update (ns)           801                  31460                1172808              68042104             84916               
omap_dsi_update (ticks)        801                  605                  22554                1308502              1633
  • Results are expressed in timer ticks as well as time (ns) for each function
  • Total profiling time shows how long the benchmark ran
  • Profiling clock frequency is shown to be 19.2MHz
  • Count shows number of times the function was called
  • Min is the minimum time for function call
  • Max is the maximum time for function call
  • Total is the total time spent in the function
  • Avg is the average time spent in the function


[edit] Timestamping Functions

1. Download and build systemtap as described here.

2. Copy the file Omap_timestamp.stp into your systemtap tapset directory.

cp omap_timestamp.stp <stp-install-dir>/systemtap-1.x-target/share/systemtap/tapset/

3. Download File:Omap profile.tar.gz, uncompress and copy the file to your systemtap tapset directory.

cp omap_profile.stp <stp-install-dir>/systemtap-1.x-target/share/systemtap/tapset/

4. Create a new systemtap script file.

vim <stp-install-dir>/systemtap-1.x-target/share/systemtap/myscript.stp

5. In your script define if you want to use the 32kHz clock or a GP timer.

  • For a 32kHz clock add the following.
%{
#define TIMER_USE_32K 1
%}
  • For a GP timer add the following.
%{
#define TIMER_USE_GPTIMER 1
%}

6. Define the maximum number of functions you wish to timestamp.

%{
#define PROFILE_TIMESTAMP_MAX_NUM 10
%}

7. Define the maximum number of timestamps you wish to collect for each function.

%{
#define PROFILE_TIMESTAMP_MAX_SAMPLES 10
%}

8. Optionally, for each benchmark you can call profile_timestamp_init() to specify a name for the benchmark. This name will be used for displaying the results. Typically this would be called from the begin probe.

probe begin {
	profile_timestamp_init("timestamp-name”, 0)
}

9. For each function you wish to timestamp add a kprobe to call profile_timestamp() on entry. Where “func_name” is the kernel function name and the value passed is an index that is used to identify each benchmark. The index MUST be less than PROFILE_TIMESTAMP_MAX_NUM.

probe kernel.function("func_name”).call {
	profile_timestamp(0)
}

10. To display the results simply call profile_show_benchmarks() function. This would typically done in the end probe.

probe end {
	profile_show_benchmarks()
}

[edit] Timestamping Example

%{
#define TIMER_USE_GPTIMER 1
#define PROFILE_TIMESTAMP_MAX_NUM 1
#define PROFILE_TIMESTAMP_MAX_SAMPLES 10
%}

probe timer.s(30) {
	exit()
}

probe begin {
	profile_timestamp_init("musb_interrupt", 0)
}

probe kernel.function("musb_interrupt").call {
	profile_timestamp(0)
}

probe end {
	profile_show_timestamps()
}

Here is an example of the results output from the previous example.

OMAP Profiler:       Total profile time 30090646993 nsecs (986029 ticks @ 32kHz)
                     Profiling clock frequency 19200000 Hz

Function                       Index                Timestamp (ticks)    Delta (ticks)        Timestamp (ns)       Delta (ns)          
musb_interrupt                 0                    164721061            0                    8565495172           0                   
musb_interrupt                 1                    164731562            10501                8566041224           546052              
musb_interrupt                 2                    164794507            62945                8569314364           3273140             
musb_interrupt                 3                    165149452            354945               8587771504           18457140            
musb_interrupt                 4                    165159885            10433                8588314020           542516              
musb_interrupt                 5                    165170432            10547                8588862464           548444              
musb_interrupt                 6                    165197199            26767                8590254348           1391884             
musb_interrupt                 7                    165207346            10147                8590781992           527644
…
  • Results are expressed in timer ticks as well as time (ns) for each timestamp and each line represents a single call to that function.
  • Total profiling time show how long the benchmark ran
  • Profiling clock frequency is shown to be 19.2MHz
  • Index is a number representing each time the function was called
  • Timestamp (ticks) is the timestamp expressed in timer ticks
  • Delta (ticks) is the difference between the current timestamp and the last expressed in ticks.
  • Timestamp (ns) is the timestamp expressed in nsecs
  • Delta (ticks) is the difference between the current timestamp and the last expressed in nsecs.

[edit] Notes


[edit] Known Issues

According to the systemtap language reference “all SystemTap functions and probes run with interrupts disabled, thus you cannot call functions that might sleep within the embedded C.” This makes sense of probing functions, however, for begin and end probe points it would be nice to have the ability to call functions that might sleep. Due to this limitation, when using a GP timer, when you will see the following BUG report when the GP timer is initialised at the very start of the benchmarking. There should be no impact from this but it is annoying. A bug has been filed against systemtap here.

[  242.426544] omap4_dsi_update: systemtap: 1.7/0.143, base: bf013000, memory: 19data/29text/3ctx/10net/17alloc kb, probes: 14                                 
[  242.438507] BUG: sleeping function called from invalid context at kernel/mutex.c:85                                                                         
[  242.446899] in_atomic(): 1, irqs_disabled(): 0, pid: 735, name: stapio                                                                                      
[  242.454193] Backtrace:                                                                                                                                      
[  242.456970] [<c005abe4>] (dump_backtrace+0x0/0x110) from [<c06467b8>] (dump_stack+0x18/0x1c)                                                                
[  242.466033]  r6:00000348 r5:ffffe337 r4:ffff9676 r3:60000013                                                                                                
[  242.472503] [<c06467a0>] (dump_stack+0x0/0x1c) from [<c0093fb4>] (__might_sleep+0x130/0x134)                                                                
[  242.481567] [<c0093e84>] (__might_sleep+0x0/0x134) from [<c0648edc>] (mutex_lock+0x24/0x44)                                                                 
[  242.490539]  r5:bf01c720 r4:c0877640                                                                                                                        
[  242.494537] [<c0648eb8>] (mutex_lock+0x0/0x44) from [<c008e17c>] (omap_dm_timer_request+0x1c/0xac)                                                          
[  242.504180]  r4:c0877640 r3:ffffffff                                                                                                                        
[  242.508178] [<c008e160>] (omap_dm_timer_request+0x0/0xac) from [<bf0130cc>] (timer_init_gptimer+0x10/0x8c [omap4_dsi_update])                               
…
Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox