Power Management Device Latencies Measurement
From OMAPpedia
Revision as of 12:33, 2 September 2011
PM Devices constraints measurements
Introduction
Correct implementation of the device latency constraint support requires accurate measurements of the overhead of the system low power modes:
- Total amount of time taken for a device to become accessible, and so the time for the device to wake-up from a given low power mode.
- It includes turning the clocks on, bringing the clock domain out of inactive, power domain out of RET or OFF (with context restore) state.
- This constraint mainly governs the deepest device idle state (only clocks cut, clock domain in inactive, power domain in RET or OFF) acceptable to the device at any given time.
This wiki page details the measurements setup and the results. The latency data is to be fed into the constraints latency patches.
Kernel patches & build
Some kernel changes are required for the kernel instrumentation. The patches and config are attached to this page.
- Starting point: linux-omap master branch as of Sep 2 2011.
- GPIO instrumentation
- e27b7a5dbb8cbc126b332e7e89b4e01e3d0aa286 OMAP3: Add HW tracing code
- GPT instrumentation
- c8ae9658b20f76ce2eb69d796b400668dce6339a OMAP3: Use GPT12 timer for low level PM instrumentation
- Kernel config for Beagleboard
Changes: enable IDLE, DSS for Beagle, Initramfs Busybox root FS
HW traces details
The trace points are connected on Beagleboard rev B7.
- Trace A: on the USER button, at the connection to R36. This signal is the system wake-up event. The trigger is set on the rising edge of the signal.
- Trace B: USR1 LED (GPIO_149). This signal is set at the end of omap_sram_idle, along with trace_power_start(POWER_WAKEN, 7, smp_processor_id());. This makes it possible to synchronize the time between the HW and the SW traces.
!Warning! The HW power supplies and external clocks are not cut off in this config (no support for System OFF in l-o), so the HW latencies are lower than expected. The HW measurements need to be performed as soon as l-o supports the System OFF. The measurements from TI are used for the real HW latency.
Here are some scope screenshots showing the time delta between the wake-up event (USER button press, trace A) and the end of omap_sram_idle (USR1 LED, trace B).
For RET mode, showing a delta of 408us:
For OFF mode, showing a delta of 2700us:
GPT tracer
Since GPT12 is used as a wake-up source from the idle mode, it can be used to track the timings during the wake-up sequence. A patch is needed to let the timer keep counting after it has overflowed and woken up the system.
The GPT runs on the 32 kHz clock, so the resolution is limited to 30.518us. Given the latencies to measure for OFF mode, this resolution is acceptable.
4 GPT measurements are performed during the wake-up:
- At the wake-up event the GPT overflows and the counter value is 0,
- At the time the WFI instruction is done, before the MPU context restore code (in ASM),
- At the same time as the SW tracers 1 and 7. This makes it possible to synchronize the HW and SW tracers.
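The resolution quoted above, and the GPT12 deltas reported in the trace dumps on this page, can be cross-checked with a small conversion helper. This is a minimal sketch that assumes the 32 kHz clock is a 32768 Hz source (an assumption, although it is consistent with the 30.518us figure); the function names are illustrative only and do not come from the patches.

```python
# Assumption: the "32 kHz" GPT12 clock is a 32768 Hz source, which is
# consistent with the 30.518 us resolution quoted in the text.
GPT_CLOCK_HZ = 32768


def ticks_to_us(ticks):
    """Convert a GPT12 counter value (in ticks) to microseconds."""
    return ticks * 1_000_000 / GPT_CLOCK_HZ


def us_to_ticks(us):
    """Nearest whole number of GPT12 ticks for a duration in microseconds."""
    return round(us * GPT_CLOCK_HZ / 1_000_000)


print(f"resolution: {ticks_to_us(1):.3f} us")

# GPT12 deltas reported in the RET and OFF trace dumps on this page.
for reported_us in (183, 244, 1495, 915, 488):
    ticks = us_to_ticks(reported_us)
    print(f"{reported_us} us ~ {ticks} ticks ({ticks_to_us(ticks):.1f} us)")
```

Every reported delta lands within one microsecond of a whole number of ticks, which is what one would expect if the values were read from the counter and truncated to microseconds.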
SW trace usage
Enable the power events and dump the trace:
# echo 1 > /debug/tracing/events/power/enable
# cat /debug/tracing/trace_pipe &
Enable the system idle in RET mode:
# echo 5 > /sys/devices/platform/omap/omap-hsuart.0/sleep_timeout
# echo 5 > /sys/devices/platform/omap/omap-hsuart.1/sleep_timeout
# echo 5 > /sys/devices/platform/omap/omap-hsuart.2/sleep_timeout
# echo 0 > /debug/pm_debug/enable_off_mode
# echo 1 > /debug/pm_debug/sleep_while_idle
Trace output:
[ 62.311462] *** GPT12 wake-up (HW wake-up, ASM restore, delta trace1-7): 183, 0, 244 us => Dump of GPT timing deltas
<idle>-0 [000] 62.241608: power_start: type=1 state=1 cpu_id=0 => Idle start
<idle>-0 [000] 62.241608: power_start: type=4 state=1 cpu_id=0 => First suspend SW trace in omap_sram_idle
<idle>-0 [000] 62.241638: power_start: type=4 state=2 cpu_id=0 => ...
<idle>-0 [000] 62.241669: power_start: type=4 state=3 cpu_id=0
<idle>-0 [000] 62.241699: power_domain_target: name=neon_pwrdm state=1 cpu_id=0
<idle>-0 [000] 62.241699: power_start: type=4 state=4 cpu_id=0
<idle>-0 [000] 62.241699: clock_disable: name=uart3_fck state=0 cpu_id=0
<idle>-0 [000] 62.241730: power_start: type=4 state=5 cpu_id=0
<idle>-0 [000] 62.241730: clock_disable: name=uart1_fck state=0 cpu_id=0
<idle>-0 [000] 62.241730: clock_disable: name=uart2_fck state=0 cpu_id=0
<idle>-0 [000] 62.241760: power_start: type=4 state=6 cpu_id=0
<idle>-0 [000] 62.241760: power_start: type=4 state=7 cpu_id=0
<idle>-0 [000] 62.241760: power_start: type=4 state=8 cpu_id=0 => Last suspend SW trace in omap_sram_idle
<idle>-0 [000] 62.311188: power_start: type=5 state=1 cpu_id=0 => First resume SW trace in omap_sram_idle
<idle>-0 [000] 62.311188: power_start: type=5 state=2 cpu_id=0 => ...
<idle>-0 [000] 62.311188: power_start: type=5 state=3 cpu_id=0
<idle>-0 [000] 62.311188: power_start: type=5 state=4 cpu_id=0
<idle>-0 [000] 62.311218: clock_enable: name=uart1_fck state=1 cpu_id=0
<idle>-0 [000] 62.311310: clock_enable: name=uart2_fck state=1 cpu_id=0
<idle>-0 [000] 62.311310: power_start: type=5 state=5 cpu_id=0
<idle>-0 [000] 62.311340: clock_enable: name=uart3_fck state=1 cpu_id=0
<idle>-0 [000] 62.311340: power_start: type=5 state=6 cpu_id=0
<idle>-0 [000] 62.311432: power_start: type=5 state=7 cpu_id=0 => Last resume SW trace in omap_sram_idle
<idle>-0 [000] 62.311462: power_end: cpu_id=0 => Idle end
Enable the system idle in OFF mode:
# echo 5 > /sys/devices/platform/omap/omap-hsuart.0/sleep_timeout
# echo 5 > /sys/devices/platform/omap/omap-hsuart.1/sleep_timeout
# echo 5 > /sys/devices/platform/omap/omap-hsuart.2/sleep_timeout
# echo 1 > /debug/pm_debug/enable_off_mode
# echo 1 > /debug/pm_debug/sleep_while_idle
Trace output:
/ # echo 1 > /debug/pm_debug/enable_off_mode
/ #
sh-503 [000] 70.862366: power_domain_target: name=iva2_pwrdm state=0 cpu_id=0
sh-503 [000] 70.862396: power_domain_target: name=mpu_pwrdm state=0 cpu_id=0
sh-503 [000] 70.862396: power_domain_target: name=neon_pwrdm state=0 cpu_id=0
sh-503 [000] 70.862396: power_domain_target: name=core_pwrdm state=0 cpu_id=0
sh-503 [000] 70.862427: power_domain_target: name=cam_pwrdm state=0 cpu_id=0
sh-503 [000] 70.862457: power_domain_target: name=dss_pwrdm state=0 cpu_id=0
sh-503 [000] 70.862488: power_domain_target: name=per_pwrdm state=0 cpu_id=0
sh-503 [000] 70.862488: power_domain_target: name=usbhost_pwrdm state=0 cpu_id=0
/ #
[ 557.240020] *** GPT12 wake-up (HW wake-up, ASM restore, delta trace1-7): 1495, 915, 488 us => Dump of GPT timing deltas
<idle>-0 [000] 557.156769: power_start: type=1 state=1 cpu_id=0 => Idle start
<idle>-0 [000] 557.156769: power_start: type=4 state=1 cpu_id=0 => First suspend SW trace in omap_sram_idle
<idle>-0 [000] 557.156769: power_start: type=4 state=2 cpu_id=0 => ...
<idle>-0 [000] 557.156830: power_start: type=4 state=3 cpu_id=0
<idle>-0 [000] 557.156830: power_domain_target: name=neon_pwrdm state=0 cpu_id=0
<idle>-0 [000] 557.156830: power_start: type=4 state=4 cpu_id=0
<idle>-0 [000] 557.156860: clock_disable: name=uart3_fck state=0 cpu_id=0
<idle>-0 [000] 557.156891: power_start: type=4 state=5 cpu_id=0
<idle>-0 [000] 557.156891: clock_disable: name=uart1_fck state=0 cpu_id=0
<idle>-0 [000] 557.156921: clock_disable: name=uart2_fck state=0 cpu_id=0
<idle>-0 [000] 557.157013: power_start: type=4 state=6 cpu_id=0
<idle>-0 [000] 557.157013: power_start: type=4 state=7 cpu_id=0
<idle>-0 [000] 557.157898: power_start: type=4 state=8 cpu_id=0 => Last suspend SW trace in omap_sram_idle
<idle>-0 [000] 557.236084: power_start: type=5 state=1 cpu_id=0 => First resume SW trace in omap_sram_idle
<idle>-0 [000] 557.236145: power_start: type=5 state=2 cpu_id=0 => ...
<idle>-0 [000] 557.236206: power_start: type=5 state=3 cpu_id=0
<idle>-0 [000] 557.236267: power_start: type=5 state=4 cpu_id=0
<idle>-0 [000] 557.236389: clock_enable: name=uart1_fck state=1 cpu_id=0
<idle>-0 [000] 557.236450: clock_enable: name=uart2_fck state=1 cpu_id=0
<idle>-0 [000] 557.236450: power_start: type=5 state=5 cpu_id=0
<idle>-0 [000] 557.236481: clock_enable: name=uart3_fck state=1 cpu_id=0
<idle>-0 [000] 557.236511: power_start: type=5 state=6 cpu_id=0
<idle>-0 [000] 557.236572: power_start: type=5 state=7 cpu_id=0 => Last resume SW trace in omap_sram_idle
<idle>-0 [000] 557.236602: power_end: cpu_id=0 => Idle end
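The durations between trace points can be extracted from such a dump with a short script. The following is a sketch only: it assumes the line format shown above, and it reads type=4 as the suspend trace points and type=5 as the resume trace points, as the annotations in the dumps suggest. The helper names are illustrative.

```python
import re

# Matches the power_start events as they appear in the dumps on this page.
EVENT_RE = re.compile(
    r"(?P<sec>\d+)\.(?P<usec>\d{6}): power_start: "
    r"type=(?P<type>\d+) state=(?P<state>\d+)"
)


def parse_power_start(lines):
    """Return {(type, state): timestamp in us} for every power_start event."""
    events = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            # Integer arithmetic avoids floating point rounding on timestamps.
            ts_us = int(match["sec"]) * 1_000_000 + int(match["usec"])
            events[(int(match["type"]), int(match["state"]))] = ts_us
    return events


def delta_us(events, start, end):
    """Elapsed time in us between two (type, state) trace points."""
    return events[end] - events[start]


# Excerpt of the OFF mode trace (first and last suspend/resume trace points).
OFF_TRACE = """\
<idle>-0 [000] 557.156769: power_start: type=4 state=1 cpu_id=0
<idle>-0 [000] 557.157898: power_start: type=4 state=8 cpu_id=0
<idle>-0 [000] 557.236084: power_start: type=5 state=1 cpu_id=0
<idle>-0 [000] 557.236572: power_start: type=5 state=7 cpu_id=0
""".splitlines()

events = parse_power_start(OFF_TRACE)
print("omap_sram_idle entry till WFI:", delta_us(events, (4, 1), (4, 8)), "us")
print("resume trace points 1 to 7:", delta_us(events, (5, 1), (5, 7)), "us")
```

For the OFF mode excerpt this gives 1129 us on the suspend side and 488 us between resume trace points 1 and 7, the latter matching the GPT12 "delta trace1-7" value in the same dump.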
Results interpretation
The low power transition sequence is pictured as nested calls to functions:
The measured results (from the HW and SW traces) are mapped to the pictured states according to the following table:
| Pictured state | Trace point | Performed SW action |
|---|---|---|
| Idle enter | start suspend | System ready to enter idle |
| omap_sram_idle 1 | suspend trace point 1 | Enter omap_sram_idle |
| omap_sram_idle 2 | suspend trace point 2 | calculation of next power domains modes |
| omap_sram_idle 3 | suspend trace point 3 | Power domains pre-transition: program power domains current state, clear status |
| omap_sram_idle 4 | suspend trace point 4 | Context save for NEON; IO pad and chain new state programmed |
| omap_sram_idle 5 | suspend trace point 5 | Context save for PER, GPIO; prepare UARTs 2&3 |
| omap_sram_idle 6 | suspend trace point 6 | Context save for CORE and PRCM; prepare UARTs 0&1 |
| omap_sram_idle 7 | suspend trace point 7 | Context save for INTC; program SDRC |
| WFI enter | suspend trace point 8 | GPIO HW trace; MPU context save in ASM (caches, registers, disable cache & prediction) |
| System OFF active | - sys_off_mode, external clocks and power supplies to be measured with System OFF support | - |
| Wake-up event: IO or GPT12 | HW trace A (if IO wake-up); GPT12=0 (if GPT wake-up) | - |
| System OFF inactive | - sys_off_mode, external clocks and power supplies to be measured with System OFF support | - |
| WFI exit | GPT12 sampling right after WFI | - |
| omap_sram_idle 1 | GPT12 sampling at return from ASM code; wake-up trace point 1 | SDRC errata for ES3.1; MPU context restore; MMU restore and enable |
| omap_sram_idle 2 | wake-up trace point 2 | cpu_init |
| omap_sram_idle 3 | wake-up trace point 3 | SDRC settings restore |
| omap_sram_idle 4 | wake-up trace point 4 | Restore MMU tables; enable caches and prediction |
| omap_sram_idle 5 | wake-up trace point 5 | Context restore for CORE, PRCM, SRAM, SMS; resume UARTs 0&1 |
| omap_sram_idle 6 | wake-up trace point 6 | Context restore for PER, INTC, GPIO; IO pad & chain; resume UARTs 2&3 |
| omap_sram_idle 7 | wake-up trace point 7; GPT sampling; HW trace B | Power domains post-transition: program power domains current state, clear status; restore SDRC settings |
| Idle exit | exit suspend | System out of idle |
cpuidle results
PSI measurements results
Some timing measurements have been made by the TI PSI team. The following tables give the results for the sleep and wake-up latencies for the C-states:
Note: in the Linux code there is no C7/C8/C9 as in the table. C7 in the code is MPU OFF + CORE OFF, which is identical to C9 in the table.
A model with the energy spent in the C-states has been built from the measured numbers. Here is the graph of the energy vs time:
Taking the minimum energy from the graph identifies the 4 energy-wise interesting C-states (C1, C3, C5, C9) and the threshold time above which each of those C-states is efficient:
Notes:
- The measurements have been performed at OPP50
- No data has been measured for C9 (MPU OFF + CORE OFF). Data from the HW and SW trace points are used to fill in the results.
- The sys_offmode signal is not supported and so not used for the measurements. A value of 8ms is used in the table. From the T2 scripts page the value should be 11.5ms. The measurements data and the threshold for C9 need to be corrected.
- The sys_clkreq signal is not used and so a correction is needed. ToBeDone
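The way a threshold falls out of the intersection of two energy curves can be illustrated with a simple linear model. This is a sketch under stated assumptions, not the PSI team's model: each C-state is reduced to a fixed transition energy plus a constant power draw while idle, and all numbers below are hypothetical placeholders rather than measured values.

```python
def break_even_us(shallow, deep):
    """Idle duration (us) above which `deep` costs less energy than `shallow`.

    Each state is a tuple (overhead_uj, idle_power_w): a fixed enter/exit
    energy cost in microjoules plus a constant power draw in watts, so that
    E(t) = overhead_uj + idle_power_w * t, with t in us (W * us = uJ).
    """
    overhead_shallow, power_shallow = shallow
    overhead_deep, power_deep = deep
    if power_deep >= power_shallow:
        raise ValueError("the deeper state must draw less power while idle")
    # E_shallow(t) == E_deep(t)  =>  t = (ov_deep - ov_shallow) / (p_shallow - p_deep)
    return (overhead_deep - overhead_shallow) / (power_shallow - power_deep)


# Hypothetical numbers, for illustration only (not taken from the measurements).
shallow_state = (10.0, 0.050)   # 10 uJ transition cost, 50 mW while idle
deep_state = (100.0, 0.005)     # 100 uJ transition cost, 5 mW while idle

print(f"break-even idle time: {break_even_us(shallow_state, deep_state):.0f} us")
```

Below the break-even time the shallower state is cheaper because its transition cost is lower; above it the deeper state wins because of its lower idle power.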
HW and SW measurements results
Here are the results for full RET and full OFF modes:
| Sequence | Time (us) - RET = C5 | Time (us) - OFF = C9 |
|---|---|---|
| From idle start till omap_sram_idle entry | 0 | 0 |
| From omap_sram_idle entry till WFI | 152 | 1129 |
| ... HW sleep ... | | |
| From WKUP event till WFI (HW wake-up - GPT12) | 183 | 1495 |
| From WFI till return from omap34xx_save_cpu_context_wfi (MPU context restore in ASM) | 0 | 915 |
| From return from omap34xx_save_cpu_context_wfi till end of omap_sram_idle (System restore) | 244 | 488 |
| From end of omap_sram_idle till return from idle | 30 | 30 |
Aggregated timings results
From the various sources of data the following figures are derived for all C-states (timings in us). The results are used in the cpuidle table (in arch/arm/mach-omap2/cpuidle34xx.c).
| C-state | Sleep lat | Wake-up lat | Threshold |
|---|---|---|---|
| C1: MPU WFI/ON - CORE ON | 73.6 | 78 | 151.6 |
| C2: MPU WFI - CORE INA | 165 | 88.16 | 345 (1) |
| C3: MPU CSWR - CORE INA | 163 | 182 | 345 |
| C4: MPU OFF - CORE INA | 2852 | 605 | 150000 (2) |
| C5: MPU CSWR - CORE CSWR | 800 | 366 (3) | 2120 |
| C6: MPU OFF - CORE CSWR | 4080 | 801 | 215000 (1) |
| C7: MPU OFF - CORE OFF | 4300 | 12933 (4) | 215000 (5) |
Notes:
- The power efficient C-states are identified as C1, C3, C5, C7
- (1) To force the cpuidle algorithm to choose the power efficient C-states, the other C-states have a threshold value equal to that of the next power efficient C-state
- (2) The threshold value is derived using the intersection of C3 and C4 in the graph
- (3) sys_clkoff is not supported; this value needs to be corrected with the correct value of the SYSCLK on/off timings (1ms for sysclk on, 2.5ms for sysclk off)
- (4) From the 'HW and SW measurements results' above and the T2 scripts page, this value is equal to the sum of the HW and SW parts, so 11500 + (915 + 488 + 30)
- (5) The new threshold value is derived using the intersection of C5 and C9 in the graph. However since the sleep and wake-up values are different, C9 is offset in time and in energy by a constant factor (from the initial value of (3760 + 8794) to the new value (4300 + 12933)), and the intersection gives the new threshold
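The arithmetic of note (4) and the effect of the threshold values on the state selection can be checked with a short script. The selection function below is a simplified sketch of how a cpuidle governor could use the table (deepest state whose threshold fits the expected idle time and whose wake-up latency fits a latency limit); it is an assumption for illustration, not the kernel's implementation.

```python
# (name, sleep latency, wake-up latency, threshold), all in us, from the table.
C_STATES = [
    ("C1", 73.6, 78, 151.6),
    ("C2", 165, 88.16, 345),
    ("C3", 163, 182, 345),
    ("C4", 2852, 605, 150000),
    ("C5", 800, 366, 2120),
    ("C6", 4080, 801, 215000),
    ("C7", 4300, 12933, 215000),
]

# Note (4): T2 scripts value plus the measured SW parts for OFF mode.
C7_WAKEUP_US = 11500 + (915 + 488 + 30)


def select_c_state(expected_idle_us, max_wakeup_us):
    """Pick the deepest C-state that fits both the idle time and the latency limit."""
    chosen = C_STATES[0][0]  # C1 is kept as the fallback
    for name, _sleep, wakeup, threshold in C_STATES:
        if threshold <= expected_idle_us and wakeup <= max_wakeup_us:
            chosen = name
    return chosen


print("C7 wake-up latency:", C7_WAKEUP_US, "us")
for idle_us, limit_us in ((1000, 10**9), (5000, 10**9), (300000, 1000), (300000, 20000)):
    print(f"idle {idle_us} us, limit {limit_us} us ->", select_c_state(idle_us, limit_us))
```

With no latency limit in effect only C1, C3, C5 and C7 are ever selected, which is the purpose of note (1); C2 and C6 only appear when a wake-up latency limit excludes the deeper efficient state.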
Results for individual power domains
Since cpuidle only manages the MPU (and the dependent power domains), the wake-up latency values for the other power domains must be measured separately. By adjusting the target states of the power domains (in /debug/pm_debug/xxxx_pwrdm/suspend) the following combinations have been measured. All values are in us:
HW and SW measurements results
Notes:
- sys_clkreq and sys_offmode are not supported, only the SW restore timing values are relevant.
The significant power domain latencies are derived from the table as follows:
| Power Domain | RET latency | OFF latency | Table location |
|---|---|---|---|
| MPU | 121 | 1830 | (5), (6) |
| NEON | 0 | 0 | Included in MPU transitions? |
| CORE | 153 | 3082 | (3), (4) |
| PER | 0 | 671 | (1), (2) |
Those figures are used in the code as the power domains wake-up latencies for RET and OFF, cf. arch/arm/mach-omap2/powerdomains3xxx_data.c.
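As an illustration of how such per-domain figures can serve a latency constraint, the sketch below picks the deepest power domain state whose wake-up latency still fits a given limit. The dictionary reuses the values from the table above; the function is a hypothetical helper and does not reflect the structure of the kernel code in powerdomains3xxx_data.c.

```python
# Wake-up latencies in us, taken from the power domain table above.
PWRDM_WAKEUP_LAT_US = {
    "MPU": {"RET": 121, "OFF": 1830},
    "CORE": {"RET": 153, "OFF": 3082},
    "PER": {"RET": 0, "OFF": 671},
}


def deepest_allowed_state(pwrdm, max_wakeup_us):
    """Deepest state of `pwrdm` whose wake-up latency fits `max_wakeup_us`."""
    latencies = PWRDM_WAKEUP_LAT_US[pwrdm]
    for state in ("OFF", "RET"):  # try the deepest state first
        if latencies[state] <= max_wakeup_us:
            return state
    return "ON"  # no low power state satisfies the constraint


for pwrdm, limit_us in (("MPU", 2000), ("CORE", 1000), ("MPU", 100)):
    print(f"{pwrdm} with a {limit_us} us limit ->", deepest_allowed_state(pwrdm, limit_us))
```

A device that needs CORE back within 1000 us would, under this sketch, restrict CORE to RET, while a 100 us limit on MPU would keep it ON.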
ToDo
- Measure the wake-up latencies for all power domains for OMAP3
- Measure and add figures for OMAP4
- Correct some numbers when sys_clkreq and sys_offmode are supported
Links
Device latency patches
PM QoS device constraint code patches
Attachments
Kernel patches and config
File:OMAP latency measurements patches and config.tar.gz
--Jpihet 2 Sep 2011


