Power Management Device Latencies Measurement

From OMAPpedia

(Difference between revisions)
Jump to: navigation, search
(Created page with '=PM Devices constraintes measurements= ==Introduction== To correctly implmeent the device latency constraint support it is needed to get accurate measurements of the system low…')
(C1 performance problem: analysis)
 
(37 intermediate revisions not shown)
Line 2: Line 2:
==Introduction==
==Introduction==
-
 
+
To correctly implement the device latency constraint support it is needed to get accurate measurements of the system low power modes overhead:
-
To correctly implmeent the device latency constraint support it is needed to get accurate measurements of the system low power modes overhead:
+
* Total amount of time taken for a device to become accessible, and so the time for the device to wake-up from a given low power mode.
* Total amount of time taken for a device to become accessible, and so the time for the device to wake-up from a given low power mode.
-
* It includes turning the clocks on, bringing the clockdomain out of inactive, power domain out of RET or OFF (with context restore) state.
+
* It includes turning the clocks on, bringing the clock domain out of inactive, power domain out of RET or OFF (with context restore) state.
-
* This constraint mainly governs the deepest device idle state (only clocks cut, clockdomain in inactive, Powerdomain in RET or off) acceptable to the device at any given time.  
+
* This constraint mainly governs the deepest device idle state (only clocks cut, clock domain in inactive, power domain in RET or off) acceptable to the device at any given time.  
This wiki page details the measurements setup and the results. The latency data is to be fed into the constraints latency patches.
This wiki page details the measurements setup and the results. The latency data is to be fed into the constraints latency patches.
==Kernel patches & build==
==Kernel patches & build==
 +
Some kernel changes are required for the kernel instrumentation. The patches and config are attached to this page.
-
Some kernel changes are required for the kernel instrumentation. The patches and config are attached to this page
+
* Starting point: linux-omap master branch as of Sep 2 2011.
-
* Starting point: linux-omap master branch at commit [[http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap-2.6.git;a=commit;h=a83d12a47c9a8c78a184910150797045d69a1570][a83d12a47c9a8c78a184910150797045d69a1570 Linux-omap rebuilt: Updated to v2.6.36, add 24xx uart fix]]
+
* GPIO instrumentation
-
 
+
** ''e27b7a5dbb8cbc126b332e7e89b4e01e3d0aa286 OMAP3: Add HW tracing code''
-
* khilman's patch to fix the low power mode: [[http://git.kernel.org/?p=linux/kernel/git/khilman/linux-omap-pm.git;a=commitdiff;h=20b9e9aa97cf811e799420b9096235eb67ef6191][1e91c5f70da4d7d108cbcf026164d001e0e688b3 OMAP: bus-level PM: enable use of runtime PM API for suspend/resume]]
+
-
* Tracing instrumentation patches
+
* GPT instrumentation
-
<verbatim>
+
** ''c8ae9658b20f76ce2eb69d796b400668dce6339a OMAP3: Use GPT12 timer for low level PM instrumentation''
-
2ae0c90dc76c7b96c192be1a6b7180016ecafc57 tracing, perf: add more power related events
+
-
e44b2e1cdf19039771e02b3f1a021f9c3c43cf68 perf: add suspend tracepoint calls
+
-
2e7387e37300f490de6a29c231bf61849f0a8a2e OMAP3: clean up ASM idle code
+
-
e7eeedd4c3a5d102d57d53707b322806cd21365e OMAP3: add low power entry/exit latency trace points
+
-
</verbatim>
+
-
 
+
-
* GPIO instrumentation
+
-
<verbatim>
+
-
17f9faf3bc2c3d643a48624779acd90d2d499f2c OMAP3: Add HW tracing code
+
-
</verbatim>
+
* Kernel config for Beagleboard
* Kernel config for Beagleboard
-
Changes: DSS for Beagle, Initramfs Busybox root FS
+
Changes: enable IDLE, DSS for Beagle, Initramfs Busybox root FS
==HW traces details==
==HW traces details==
The trace points are connected on Beagleboard rev B7.
The trace points are connected on Beagleboard rev B7.
-
* Trace 1: on the USER button, at the connection to R36. The trigger is set on the raising edge of the signal.
+
* Trace A: on the USER button, at the connection to R36. This signal is the system wake-up event. The trigger is set on the raising edge of the signal.
-
 
+
* Trace B: USR1 LED (GPIO_149). This signal is set at the end of omap_sram_idle, along with ''trace_power_start(POWER_WAKEN, 7, smp_processor_id());''. This allows to synchronize the time between the HW and the SW traces.
-
* Trace 2: USR1 LED (GPIO_149). This signal is set at the end of omap_sram_idle, along with ''trace_power_start(POWER_WAKEN, 7, smp_processor_id());''. This allows to synchronize the time between the HW and the SW traces.
+
-
 
+
-
Because of the difficulty to add trace code for HW/SW tracing in the early wake-up code path, the MPU context restore code in ASM (triggered by the ROM code) until the MMU is re-enabled is considered as part of the HW wake-up sequence. This is valid for OFF mode only.
+
!Warning!
!Warning!
-
The HW power supplies and external clocks are not cut off in this config (no support in l-o), so the HW latencies are lower than expected.
+
The HW power supplies and external clocks are not cut off in this config (no support for System OFF in l-o), so the HW latencies are lower than expected. The HW measurements need to be performed as soon as l-o supports the System OFF.
The measurements from TI are used for the real HW latency.
The measurements from TI are used for the real HW latency.
-
Here are some scope screenshots showing the time delta between the wake-up event (USER button press, trace A) and the end of omap_sram_idle (USR1 Led), for both RET and OFF mode.
+
Here are some scope screenshots showing the time delta between the wake-up event (USER button press, trace A) and the end of omap_sram_idle (USR1 Led).
 +
 
 +
For RET mode, showing a delta of 408us:
[[File:scope_capture_ret.jpg|center|thumb|320px]]
[[File:scope_capture_ret.jpg|center|thumb|320px]]
 +
 +
For OFF mode, showing a delta of 2700us:
[[File:scope_capture_off.jpg|center|thumb|320px]]
[[File:scope_capture_off.jpg|center|thumb|320px]]
 +
 +
== GPT tracer==
 +
Since GPT12 is used as a wake-up source from the idle mode, it can be used to track the timings during the wake-up sequence.
 +
A patch is needed to let the timer count after it overflowed and woke up the system.
 +
 +
The GPT runs on 32KHz clock and so the resolution is limited to 30.518us. Given the latencies to measure for OFF mode, the resolution is accpetable.
 +
 +
4 GPT measurements are performed during the wake-up:
 +
* At the wake-up event the GPT overflows and the counter value is 0,
 +
* At the time the WFI instrcution is done, before the MPU context restore code (in ASM),
 +
* At the same time as the SW tracers 1 and 7. This allows to synchronize the HW and SW tracers.
== SW trace usage==
== SW trace usage==
Enable the power events and dump the trace:
Enable the power events and dump the trace:
-
<verbatim>
+
<pre>
# echo 1 > /debug/tracing/events/power/enable
# echo 1 > /debug/tracing/events/power/enable
# cat /debug/tracing/trace_pipe &
# cat /debug/tracing/trace_pipe &
-
</verbatim>
+
</pre>
 +
 
 +
Enable the system idle in RET mode:
 +
<pre>
 +
# echo 5 > /sys/devices/platform/omap/omap-hsuart.0/sleep_timeout
 +
# echo 5 > /sys/devices/platform/omap/omap-hsuart.1/sleep_timeout
 +
# echo 5 > /sys/devices/platform/omap/omap-hsuart.2/sleep_timeout
-
Trigger the suspend, RET mode:
 
-
<verbatim>
 
# echo 0 > /debug/pm_debug/enable_off_mode
# echo 0 > /debug/pm_debug/enable_off_mode
-
# echo mem > /sys/power/state
+
# echo 1 > /debug/pm_debug/sleep_while_idle
-
[ 3966.216278] PM: Syncing filesystems ... done.
+
</pre>
-
[ 3966.221557] Freezing user space processes ... (elapsed 0.02 seconds) done.
+
 
-
[ 3966.249267] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
+
Trace output:
-
[ 3966.281188] Suspending console(s) (use no_console_suspend to debug)
+
<pre>
-
[ 3966.404327] PM: suspend of devices complete after 112.426 msecs
+
[   62.311462] *** GPT12 wake-up (HW wake-up, ASM restore, delta trace1-7): 183, 0, 244 us      => Dump of GPT timing deltas
-
[ 3966.406127] PM: late suspend of devices complete after 1.739 msecs
+
          <idle>-0    [000]   62.241608: power_start: type=1 state=1 cpu_id=0                 => Idle start
-
[ 3966.406158] Disabling non-boot CPUs ...
+
          <idle>-0    [000]   62.241608: power_start: type=4 state=1 cpu_id=0                 => First suspend SW trace in omap_sram_idle
-
[ 3967.267150] Successfully put all powerdomains to target state
+
          <idle>-0    [000]   62.241638: power_start: type=4 state=2 cpu_id=0                 => ...
-
[ 3967.267791] PM: early resume of devices complete after 0.396 msecs
+
          <idle>-0    [000]   62.241669: power_start: type=4 state=3 cpu_id=0
-
[ 3967.686279] PM: resume of devices complete after 418.243 msecs
+
          <idle>-0    [000]   62.241699: power_domain_target: name=neon_pwrdm state=1 cpu_id=0
-
[ 3967.726715] Restarting tasks ... done.
+
          <idle>-0    [000]    62.241699: power_start: type=4 state=4 cpu_id=0
-
              sh-503  [000] 3966.406433: power_start: type=3 state=3 cpu_id=0   => Enter OMAP idle function
+
          <idle>-0    [000]   62.241699: clock_disable: name=uart3_fck state=0 cpu_id=0
-
              sh-503  [000] 3966.406494: power_start: type=4 state=1 cpu_id=0   => First suspend trace point
+
          <idle>-0    [000]    62.241730: power_start: type=4 state=5 cpu_id=0
-
              sh-503  [000] 3966.406525: power_start: type=4 state=2 cpu_id=0   => ...
+
          <idle>-0    [000]   62.241730: clock_disable: name=uart1_fck state=0 cpu_id=0
-
              sh-503  [000] 3966.406555: power_start: type=4 state=3 cpu_id=0
+
          <idle>-0    [000]    62.241730: clock_disable: name=uart2_fck state=0 cpu_id=0
-
              sh-503  [000] 3966.406586: power_start: type=4 state=4 cpu_id=0
+
          <idle>-0    [000]    62.241760: power_start: type=4 state=6 cpu_id=0
-
              sh-503  [000] 3966.406616: power_start: type=4 state=5 cpu_id=0
+
          <idle>-0    [000]   62.241760: power_start: type=4 state=7 cpu_id=0
-
              sh-503  [000] 3966.406647: power_start: type=4 state=6 cpu_id=0
+
          <idle>-0    [000]   62.241760: power_start: type=4 state=8 cpu_id=0                 => Last suspend SW trace in omap_sram_idle
-
              sh-503  [000] 3966.406647: power_start: type=4 state=7 cpu_id=0
+
          <idle>-0    [000]   62.311188: power_start: type=5 state=1 cpu_id=0                 => First resume SW trace in omap_sram_idle
-
              sh-503  [000] 3966.406647: power_start: type=4 state=8 cpu_id=0   => Last trace for low power enter, just before WFI
+
          <idle>-0    [000]   62.311188: power_start: type=5 state=2 cpu_id=0                 => ...
-
              sh-503  [000] 3967.266937: power_start: type=5 state=1 cpu_id=0   => First wake-up trace point, at return from omap34xx_save_cpu_context_wfi
+
          <idle>-0    [000]   62.311188: power_start: type=5 state=3 cpu_id=0
-
              sh-503  [000] 3967.266937: power_start: type=5 state=2 cpu_id=0   => ...
+
          <idle>-0    [000]   62.311188: power_start: type=5 state=4 cpu_id=0
-
              sh-503  [000] 3967.266937: power_start: type=5 state=3 cpu_id=0
+
          <idle>-0    [000]   62.311218: clock_enable: name=uart1_fck state=1 cpu_id=0
-
              sh-503  [000] 3967.266937: power_start: type=5 state=4 cpu_id=0
+
          <idle>-0    [000]    62.311310: clock_enable: name=uart2_fck state=1 cpu_id=0
-
              sh-503  [000] 3967.267029: power_start: type=5 state=5 cpu_id=0
+
          <idle>-0    [000]    62.311310: power_start: type=5 state=5 cpu_id=0
-
              sh-503  [000] 3967.267059: power_start: type=5 state=6 cpu_id=0
+
          <idle>-0    [000]   62.311340: clock_enable: name=uart3_fck state=1 cpu_id=0
-
              sh-503  [000] 3967.267120: power_start: type=5 state=7 cpu_id=0   => Last wake-up trace point, at the end of omap_sram_idle
+
          <idle>-0    [000]    62.311340: power_start: type=5 state=6 cpu_id=0
-
              sh-503  [000] 3967.267181: power_start: type=3 state=0 cpu_id=0   => Exit OMAP idle function
+
          <idle>-0    [000]   62.311432: power_start: type=5 state=7 cpu_id=0                 => Last resume SW trace in omap_sram_idle
-
</verbatim>
+
          <idle>-0    [000]   62.311462: power_end: cpu_id=0                                   => Idle end
 +
</pre>
 +
 
 +
Enable the system idle in OFF mode:
 +
<pre>
 +
# echo 5 > /sys/devices/platform/omap/omap-hsuart.0/sleep_timeout
 +
# echo 5 > /sys/devices/platform/omap/omap-hsuart.1/sleep_timeout
 +
# echo 5 > /sys/devices/platform/omap/omap-hsuart.2/sleep_timeout
-
With OFF mode enabled:
 
-
<verbatim>
 
# echo 1 > /debug/pm_debug/enable_off_mode
# echo 1 > /debug/pm_debug/enable_off_mode
-
# echo mem > /sys/power/state
+
# echo 1 > /debug/pm_debug/sleep_while_idle
-
[ 1898.722778] PM: Syncing filesystems ... done.
+
</pre>
-
[ 1898.728057] Freezing user space processes ... (elapsed 0.02 seconds) done.
+
 
-
[ 1898.755950] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
+
Trace output:
-
[ 1898.787872] Suspending console(s) (use no_console_suspend to debug)
+
<pre>
-
[ 1898.910980] PM: suspend of devices complete after 112.396 msecs
+
/ # echo 1 > /debug/pm_debug/enable_off_mode
-
[ 1898.912780] PM: late suspend of devices complete after 1.739 msecs
+
/ #         
-
[ 1898.912811] Disabling non-boot CPUs ...
+
              sh-503  [000]   70.862366: power_domain_target: name=iva2_pwrdm state=0 cpu_id=0
-
[ 1900.083923] Successfully put all powerdomains to target state
+
              sh-503  [000]   70.862396: power_domain_target: name=mpu_pwrdm state=0 cpu_id=0
-
[ 1900.084686] PM: early resume of devices complete after 0.488 msecs
+
              sh-503  [000]   70.862396: power_domain_target: name=neon_pwrdm state=0 cpu_id=0
-
[ 1900.514953] PM: resume of devices complete after 429.962 msecs
+
              sh-503  [000]   70.862396: power_domain_target: name=core_pwrdm state=0 cpu_id=0
-
[ 1900.555725] Restarting tasks ... done.
+
              sh-503  [000]   70.862427: power_domain_target: name=cam_pwrdm state=0 cpu_id=0
-
              sh-503  [000] 1898.913086: power_start: type=3 state=3 cpu_id=0   => Enter OMAP idle function
+
              sh-503  [000]   70.862457: power_domain_target: name=dss_pwrdm state=0 cpu_id=0
-
              sh-503  [000] 1898.913147: power_start: type=4 state=1 cpu_id=0   => First suspend trace point
+
              sh-503  [000]   70.862488: power_domain_target: name=per_pwrdm state=0 cpu_id=0
-
              sh-503  [000] 1898.913147: power_start: type=4 state=2 cpu_id=0   => ...
+
              sh-503  [000]   70.862488: power_domain_target: name=usbhost_pwrdm state=0 cpu_id=0
-
              sh-503  [000] 1898.913208: power_start: type=4 state=3 cpu_id=0
+
/ #
-
              sh-503   [000] 1898.913239: power_start: type=4 state=4 cpu_id=0
+
[ 557.240020] *** GPT12 wake-up (HW wake-up, ASM restore, delta trace1-7): 1495, 915, 488 us    => Dump of GPT timing deltas
-
              sh-503   [000] 1898.913269: power_start: type=4 state=5 cpu_id=0
+
          <idle>-0    [000]   557.156769: power_start: type=1 state=1 cpu_id=0                 => Idle start
-
              sh-503   [000] 1898.913391: power_start: type=4 state=6 cpu_id=0
+
          <idle>-0    [000]   557.156769: power_start: type=4 state=1 cpu_id=0                 => First suspend SW trace in omap_sram_idle
-
              sh-503  [000] 1898.913391: power_start: type=4 state=7 cpu_id=0
+
          <idle>-0    [000]   557.156769: power_start: type=4 state=2 cpu_id=0                 => ...
-
              sh-503  [000] 1898.914337: power_start: type=4 state=8 cpu_id=0   => Last trace for low power enter, just before WFI
+
          <idle>-0    [000]   557.156830: power_start: type=4 state=3 cpu_id=0
-
              sh-503  [000] 1900.083405: power_start: type=5 state=1 cpu_id=0   => First wake-up trace point, at return from omap34xx_save_cpu_context_wfi
+
          <idle>-0    [000]   557.156830: power_domain_target: name=neon_pwrdm state=0 cpu_id=0
-
              sh-503  [000] 1900.083466: power_start: type=5 state=2 cpu_id=0   => ...
+
          <idle>-0    [000]   557.156830: power_start: type=4 state=4 cpu_id=0
-
              sh-503  [000] 1900.083496: power_start: type=5 state=3 cpu_id=0
+
          <idle>-0    [000]   557.156860: clock_disable: name=uart3_fck state=0 cpu_id=0
-
              sh-503  [000] 1900.083588: power_start: type=5 state=4 cpu_id=0
+
          <idle>-0    [000]   557.156891: power_start: type=4 state=5 cpu_id=0
-
              sh-503   [000] 1900.083771: power_start: type=5 state=5 cpu_id=0
+
          <idle>-0    [000]   557.156891: clock_disable: name=uart1_fck state=0 cpu_id=0
-
              sh-503   [000] 1900.083801: power_start: type=5 state=6 cpu_id=0
+
          <idle>-0    [000]   557.156921: clock_disable: name=uart2_fck state=0 cpu_id=0
-
              sh-503  [000] 1900.083862: power_start: type=5 state=7 cpu_id=0   => Last wake-up trace point, at the end of omap_sram_idle
+
          <idle>-0    [000]  557.157013: power_start: type=4 state=6 cpu_id=0
-
              sh-503  [0001900.083954: power_start: type=3 state=0 cpu_id=0   => Exit OMAP idle function
+
          <idle>-0    [000]   557.157013: power_start: type=4 state=7 cpu_id=0
-
</verbatim>
+
          <idle>-0    [000]   557.157898: power_start: type=4 state=8 cpu_id=0                 => Last suspend SW trace in omap_sram_idle
 +
          <idle>-0    [000]   557.236084: power_start: type=5 state=1 cpu_id=0                 => First resume SW trace in omap_sram_idle
 +
          <idle>-0    [000]   557.236145: power_start: type=5 state=2 cpu_id=0                 => ...
 +
          <idle>-0    [000]   557.236206: power_start: type=5 state=3 cpu_id=0
 +
          <idle>-0    [000]   557.236267: power_start: type=5 state=4 cpu_id=0
 +
          <idle>-0    [000]   557.236389: clock_enable: name=uart1_fck state=1 cpu_id=0
 +
          <idle>-0    [000]   557.236450: clock_enable: name=uart2_fck state=1 cpu_id=0
 +
          <idle>-0    [000]  557.236450: power_start: type=5 state=5 cpu_id=0
 +
          <idle>-0    [000]   557.236481: clock_enable: name=uart3_fck state=1 cpu_id=0
 +
          <idle>-0    [000]   557.236511: power_start: type=5 state=6 cpu_id=0
 +
          <idle>-0    [000]   557.236572: power_start: type=5 state=7 cpu_id=0                 => Last resume SW trace in omap_sram_idle
 +
          <idle>-0    [000]  557.236602: power_end: cpu_id=0                                  => Idle end
 +
</pre>
 +
 
 +
==Results interpretation==
 +
The low power transition sequence is pictured as nested calls to functions:
 +
[[File:Low power transition sequence.png|center|thumb|840px]]
 +
 
 +
The measured results (from the HW and SW traces) are mapped to the pictured states according to the following table:
 +
{| border="1"
 +
!Pictured state
 +
!Trace point
 +
!Performed SW action
 +
|- style="color:green;"
 +
|Idle enter
 +
|start suspend
 +
|System ready to enter idle
 +
|-
 +
|omap_sram_idle 1
 +
|suspend trace point 1
 +
|Enter omap_sram_idle
 +
|-
 +
|omap_sram_idle 2
 +
|suspend trace point 2
 +
|calculation of next power domains modes
 +
|-
 +
|omap_sram_idle 3
 +
|suspend trace point 3
 +
|Power domains pre-transition: program power domains current state, clear status
 +
|-
 +
|omap_sram_idle 4
 +
|suspend trace point 4
 +
|Context save for NEON<br>IO pad and chain new state programmed
 +
|-
 +
|omap_sram_idle 5
 +
|suspend trace point 5
 +
|Context save for PER, GPIO<br>Prepare UARTs 2&3
 +
|-
 +
|omap_sram_idle 6
 +
|suspend trace point 6
 +
|Context save for CORE and PRCM<br>Prepare UARTs 0&1
 +
|-
 +
|omap_sram_idle 7
 +
|suspend trace point 7
 +
|Context save for INTC<br>Program SDRC
 +
|- style="color:orange;"
 +
|WFI enter
 +
|suspend trace point 8
 +
|GPIO HW trace<br>MPU context save in ASM (caches, registers, disable cache & prediction)
 +
|- style="color:red;"
 +
|System OFF active
 +
| - sys_off_mode, external clocks and power supplies to be measured with System OFF support
 +
| -
 +
|- style="color:red;"
 +
|Wake-up event: IO or GPT12
 +
|HW trace A (if IO wake-up)<br>GPT12=0 (if GPT wake-up)
 +
| -
 +
|- style="color:red;"
 +
|System OFF inactive
 +
| - sys_off_mode, external clocks and power supplies to be measured with System OFF support
 +
| -
 +
|- style="color:orange;"
 +
|WFI exit
 +
|GPT12 sampling right after WFI
 +
| -
 +
|-
 +
|omap_sram_idle 1
 +
|GPT12 sampling at return from ASM code<br>Wake-up trace point 1
 +
|SDRC errata for ES3.1<br>MPU context restore<br>MMU restore and enable
 +
|-
 +
|omap_sram_idle 2
 +
|wake-up trace point 2
 +
|cpu_init
 +
|-
 +
|omap_sram_idle 3
 +
|wake-up trace point 3
 +
|SDRC settings restore
 +
|-
 +
|omap_sram_idle 4
 +
|wake-up trace point 4
 +
|Restore MMU tables<br>Enable caches and prediction
 +
|-
 +
|omap_sram_idle 5
 +
|wake-up trace point 5
 +
|Context restore for CORE, PRCM, SRAM, SMS<br>Resume UARTs 0&1
 +
|-
 +
|omap_sram_idle 6
 +
|wake-up trace point 6
 +
|Context restore for PER, INTC, GPIO<br>IO pad & chain<br>Resume UARTS 2&3
 +
|-
 +
|omap_sram_idle 7
 +
|wake-up trace point 7<br>GPT sampling<br>HW trace B
 +
|Power domains post-transition: program power domains current state, clear status<br>Restore SDRC settings
 +
|- style="color:green;"
 +
|Idle exit
 +
|exit suspend
 +
|System out of idle
 +
|}
 +
 
 +
==cpuidle results==
 +
===PSI measurements results===
 +
Some timings measurements have been made by the TI PSI team.
 +
The following tables gives the results for the sleep and wake-up latencies for the C-states:
 +
 
 +
[[File:C_states_sleep_latency.png|center|thumb|840px]]
 +
[[File:C_states_wake_up_latency.png|center|thumb|840px]]
 +
 
 +
Note: in the linux code there is no C7/C8/C9 as in the table. C7 is MPU OFF + CORE OFF, which is identical to C9 in the table.
 +
 
 +
A model with the energy spent in the C-states has been built from the measured numbers. Here is the graph of the energy vs time:
 +
 
 +
[[File:C_states_os_idle_energy.png|center|thumb|840px]]
 +
 
 +
Taking the minimum energy from the graph allows to identify the 4 energy-wise interesting C-states: C1, C3, C5, C9 and the threshold time for those C-states to be efficient:
 +
 
 +
[[File:C_states_data.png|center|thumb|840px]]
 +
 
 +
Notes:
 +
* The measurements have been performed at OPP50
 +
* No data has been measured for C9 (MPU OFF + CORE OFF). Data from the HW and SW trace points are used to fill in the results.
 +
* The sys_offmode signal is not supported and so not used for the measurements. A value of 8ms is used in the table. From the T2 scripts page the value should be 11.5ms. The measurements data and the threshold for C9 need to be corrected.
 +
* The sys_clkreq signal is not used and so a correction is needed. ToBeDone
 +
 
 +
===HW and SW measurements results===
 +
Here are the results for full RET and full OFF modes:
 +
{|border="1"
 +
!Sequence
 +
!Time (us) - RET = C5
 +
!Time (us) - OFF = C9
 +
|-
 +
|From idle start till omap_sram_idle entry
 +
|0
 +
|0
 +
|-
 +
|From omap_sram_idle entry till WFI
 +
|152
 +
|1129
 +
|-
 +
|... HW sleep...
 +
|
 +
|
 +
|-
 +
|From WKUP event till WFI<br>(HW wake-up - GPT12)
 +
|183
 +
|1495
 +
|-
 +
|From WFI till return from omap34xx_save_cpu_context_wfi<br>(MPU context restore in ASM)
 +
|0
 +
|915
 +
|-
 +
|From return from omap34xx_save_cpu_context_wfi till end of omap_sram_idle<br>(System restore)
 +
|244
 +
|488
 +
|-
 +
|From end of omap_sram_idle till return from idle
 +
|30
 +
|30
 +
|-
 +
|}
 +
 
 +
===Aggregated timings results===
 +
From the various sources of data the following figures are derived for all C-states (timings in us).
 +
The results are used in the cpuidle table (in arch/arm/mach-omap2/cpuidle34xx.c).
 +
 
 +
{|border="1"
 +
!C-state
 +
!Sleep lat
 +
!Wake-up lat
 +
!Threshold
 +
|-
 +
|C1: MPU WFI/ON - CORE ON
 +
|73.6
 +
|78
 +
|151.6
 +
|-
 +
|C2: MPU WFI - CORE INA
 +
|165
 +
|88.16
 +
|345 (1)
 +
|-
 +
|C3: MPU CSWR - CORE INA
 +
|163
 +
|182
 +
|345
 +
|-
 +
|C4: MPU OFF - CORE INA
 +
|2852
 +
|605
 +
|150000 (2)
 +
|-
 +
|C5: MPU CSWR - CORE CSWR
 +
|800
 +
|366 (3)
 +
|2120
 +
|-
 +
|C6: MPU OFF - CORE CSWR
 +
|4080
 +
|801
 +
|215000 (1)
 +
|-
 +
|C7: MPU OFF - CORE OFF
 +
|4300
 +
|12933 (4)
 +
|215000 (5)
 +
|-
 +
|}
 +
 
 +
Notes:
 +
* The power efficient C-states are identifed as C1, C3, C5, C7
 +
* (1) To force the cpuidle algorithm to chose the power efficient C-states, the other C-states have a threshold value equal to the next power efficient C-state
 +
* (2) The threshold value is derived using the intersection of C3 and C4 in the graph
 +
* (3) No sys_clkoff is supported, this value need to be corrected with the correct value of SYSCLK on/off timings (1ms for sysclk on, 2.5ms for sysclk off)
 +
* (4) From the 'HW and SW measurements results' here above and the T2 scripts page, this value is equal to the HW and SW parts, so 11500 + (915 + 488 + 30)
 +
* (5) The new threshold value is derived using the intersection of C5 and C9 in the graph. However since the sleep and wake-up values are different, C9 is offset in time and in energy by a constant factor (from the initial value of (3760 + 8794) to the new value (4300 + 12933)), and the intersection gives the new threshold
 +
 
 +
==Results for individual power domains==
 +
Since cpuidle only manages the MPU (and depending power domains), the wake-up latency values for the other power domains must be measured separately.
 +
By adjusting the target states of the power domains (in /debug/pm_debug/xxxx_pwrdm/suspend) the following combinations have been measured. All values are in us:
 +
 
 +
===HW and SW measurements results===
 +
The HW and SW tracers are used to measure the wake-up latencies of the power domains. The results are in the table:
 +
[[File:PD_measurements.png‎|center|thumb|840px]]
 +
Notes:
 +
* sys_clkreq and sys_offmode are not supported, only the SW restore timing values are relevant.
 +
 
 +
The significative power domains latencies are derived from the table as follows:
 +
{|border="1"
 +
  !Power Domain
 +
!RET latency
 +
!OFF latency
 +
!Table location
 +
|-
 +
|MPU
 +
|121
 +
|1830
 +
|(5), (6)
 +
|-
 +
|NEON
 +
|0
 +
|0
 +
|Included in MPU transitions?
 +
|-
 +
|CORE
 +
|153
 +
|3082
 +
|(3), (4)
 +
|-
 +
|PER
 +
|0
 +
|671
 +
|(1), (2)
 +
|-
 +
|}
 +
 
 +
Those figures are used in the code as the power domains wake-up latencies for RET and OFF, cf. arch/arm/mach-omap2/powerdomains3xxx_data.c.
 +
 
 +
==ToDo==
 +
* Measure the wake-up latencies for all power domains for OMAP3
 +
* Measure and add figures for OMAP4
 +
* Correct some numbers when sys_clkreq and sys_offmode are supported
 +
 
 +
== C1 performance problem: analysis ==
 +
A serious performance degradation has been noticed during transfers from the NAND device using DMA, cf. http://marc.info/?l=linux-omap&m=133467316214021&w=2 for detailed discussion and patches [1]->[6].
 +
The C1 C-state has a very big latency and degrades the use case performance.
 +
 
 +
===Setup===
 +
* Beagleboard (OMAP3530) at 500MHz,
 +
* l-o master kernel + functional power states + per-device PM QoS. It has been checked that the changes from l-o master do not have an impact on the performance.
 +
* The data transfer is performed using dd from a file in JFFS2 to /dev/null: 'dd if=/tmp/mnt/a of=/dev/null bs=1M count=32'.
 +
 
 +
===Results===
 +
Here are the results on Beagleboard:
 +
* Without using DMA: 4.7MB/s,
 +
* Using DMA
 +
{|border="1"
 +
!Patches applied
 +
!Description
 +
!Measured BW
 +
|-
 +
|[0]
 +
|Initial code
 +
|2.1MB/s
 +
|-
 +
|[1]
 +
|C1 Only
 +
|2.1MB/s
 +
|-
 +
|[1]+[2]
 +
|No pre_ post_
 +
|2.6MB/s
 +
|-
 +
|[1]+[5]
 +
|No pwrdm_for_each_clkdm
 +
|2.3MB/s
 +
|-
 +
|[1]+[5]+[2]
 +
|
 +
|2.8MB/s
 +
|-
 +
|[1]+[7]
 +
|Regs cache (invalidate current states after WFI, invalidate prev states in clear_all_prev_pwrst)
 +
!2.2MB/s
 +
|-
 +
|[1]+[7]+[8]
 +
|khilman's optimizations on pre_ post_ transitions + Regs cache
 +
!2.6MB/s
 +
|-
 +
|[1]+[7]+[8]+[9]
 +
|per=core in C1 + khilman's optimizations on pre_ post_ transitions + Regs cache
 +
!2.8MB/s
 +
|-
 +
|[1]+[7]+[8]+[9]+[10]
 +
|allow/deny_idle on pwrdm->clkdm[0] for mpu, core + per=core in C1 + khilman's optimizations on pre_ post_ transitions + Regs cache
 +
!3.0MB/s
 +
|-
 +
|[1]+[8]+[9]+[10]
 +
|allow/deny_idle on pwrdm->clkdm[0] for mpu, core + per>=core in C1 + khilman's optimizations on pre_ post_ transitions
 +
!3.1MB/s
 +
|-
 +
|[1]+[5]+[6]
 +
|No omap_sram_idle
 +
|3.1MB/s
 +
|-
 +
|[1]+[5]+[6]+No IDLE
 +
|No CPUIDLE, No omap_sram_idle, all pwrdms to ON
 +
|3.1MB/s
 +
|-
 +
|}
 +
 
 +
This shows there is some serious performance issue with the C1 C-state but also that the patches [7]->[10] are providing some solutions.
 +
 
 +
Notes:
 +
* Patches for [7] are at http://marc.info/?l=linux-omap&m=133587781712039&w=2
 +
* Patches for [8] are at http://marc.info/?l=linux-omap&m=133527749024432&w=2
 +
* Patch for [9] is at http://marc.info/?l=linux-omap&m=133656106811605&w=2
 +
* Patch for [10] is at http://marc.info/?l=linux-omap&m=133656106911606&w=2
 +
 
 +
===Main contributors===
 +
Here are the contributors inside __omap3_enter_idle (averaged over 30 samples):
 +
 
 +
[[File:Omap sram idle latency table.png|center|thumb|540px]]
 +
 
 +
The main contributors are:
 +
* (140us) pwrdm_pre_transition and pwrdm_post_transition,
 +
* (105us) omap2_gpio_prepare_for_idle and omap2_gpio_resume_after_idle. This could be avoided if PER stays ON in the latency-critical C-states,
 +
* (78us) pwrdm_for_each_clkdm(mpu, core, deny_idle/allow_idle),
 +
* (33us estimated) omap_set_pwrdm_state(mpu, core, neon),
 +
* (11 us) clkdm_allow_idle(mpu). Is this needed?
 +
 
 +
The HW idle time is 6.5us, which is neglectable compared to the SW overhead required to reach the idle state.
 +
 
 +
===Use case idle stats===
 +
Using only the cpuidle tracepoints, the average times in idle are (averaged over 60 samples):
 +
{|border="1"
 +
!Use case
 +
!Measured BW MB/s
 +
!Description
 +
!Idle us
 +
!Active us
 +
|-
 +
|[1]
 +
|2.1
 +
|Cpuidle, omap_sram_idle, only C1
 +
!397
 +
|394
 +
|-
 +
|[1]+[7]
 +
|2.2
 +
|Regs cache, only C1
 +
!349
 +
|397
 +
|-
 +
|[1]+[7]+[8]
 +
|2.6
 +
|khilman optims + Regs cache, only C1
 +
!246
 +
|364
 +
|-
 +
|[1]+[7]+[8]+[9]+[10]
 +
|3.0
 +
|allow/deny_idle on pwrdm->clkdm[0] for mpu, core + pre=core in C1 + khilman's optims + Regs cache, only C1
 +
!178
 +
|403
 +
|-
 +
|[1]+[8]+[9]+[10]
 +
|3.1
 +
|allow/deny_idle on pwrdm->clkdm[0] for mpu, core + per>=core in C1 + khilman's optims, only C1
 +
!152
 +
|259
 +
|-
 +
|[1]+[5]+[6]+No IDLE
 +
|3.1
 +
|No cpuidle, no omap_sram_idle, all pwrdms to ON
 +
!113
 +
|477
 +
|-
 +
|}
-
==Timings results==
+
Notes:
-
===Full RET mode===
+
* From the above stats, the average latencies in C1 (397us;349us;246us;178us) exceed the idle duration without cpuidle (113us), hence the performance degradation.
-
* Suspend of devices: 112.426 ms
+
* The registers cache optimizes the low power mode transitions, but is not sufficient to obtain a big gain. A few unused domains are transitioning, which causes a big penalty in the idle path.
-
* Late suspend of devices: 1.739 ms
+
* khilman's optimizations are really helpful. Furthermore it optimizes farther the registers cache statistics accesses.
 +
* The [1]+[8]+[9]+[10] combination brings the performance close to the non CPUIDLE case (no IDLE, no omap_sram_idle, all pwrdms to ON).
-
* From idle start till omap_sram_idle entry: 61us
+
===Registers cache accesses stats===
-
* From omap_sram_idle entry till WFI: 153us
+
The number of registers accesses are shown in PM debug using a registers cache statistics patch.
 +
The debug log shows the big number of accesses in this optimized use case ([0]+[7]+[8]) and the cache efficiency:
-
* ... HW sleep... including MPU context restore in ASM
+
/ # cat /debug/pm_debug/count
 +
usbhost_pwrdm (ON),OFF:719,OSWR:0,CSWR:578,INA:0,ON:1298,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 8279, hit 29, rate 0%
 +
sgx_pwrdm (OFF),OFF:1,OSWR:0,CSWR:0,INA:0,ON:1,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 8275, hit 26, rate 0%
 +
core_pwrdm (ON),OFF:19,OSWR:0,CSWR:20,INA:14,ON:54,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0,RET-MEMBANK2-OFF:0. Cache access 14960, hit 3966, rate 26%
 +
per_pwrdm (ON),OFF:33,OSWR:0,CSWR:20,INA:0,ON:54,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 18811, hit 7817, rate 41%
 +
dss_pwrdm (ON),OFF:719,OSWR:0,CSWR:578,INA:0,ON:1298,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 8279, hit 29, rate 0%
 +
cam_pwrdm (OFF),OFF:1,OSWR:0,CSWR:1,INA:0,ON:1,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 10907, hit 2657, rate 24%
 +
neon_pwrdm (ON),OFF:19,OSWR:0,CSWR:1271,INA:7,ON:1298,RET-LOGIC-OFF:0. Cache access 12611, hit 2885, rate 22%
 +
mpu_pwrdm (ON),OFF:19,OSWR:0,CSWR:1271,INA:7,ON:1298,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 18390, hit 7396, rate 40%
 +
iva2_pwrdm (OFF),OFF:1,OSWR:0,CSWR:1,INA:0,ON:1,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0,RET-MEMBANK2-OFF:0,RET-MEMBANK3-OFF:0,RET-MEMBANK4-OFF:0. Cache access 8281, hit 31, rate 0%
 +
usbhost_clkdm->usbhost_pwrdm (1)
 +
sgx_clkdm->sgx_pwrdm (0)
 +
per_clkdm->per_pwrdm (16)
 +
cam_clkdm->cam_pwrdm (0)
 +
dss_clkdm->dss_pwrdm (1)
 +
core_l4_clkdm->core_pwrdm (21)
 +
core_l3_clkdm->core_pwrdm (4)
 +
d2d_clkdm->core_pwrdm (0)
 +
iva2_clkdm->iva2_pwrdm (0)
 +
neon_clkdm->neon_pwrdm (0)
 +
mpu_clkdm->mpu_pwrdm (0)
 +
prm_clkdm->wkup_pwrdm (0)
 +
cm_clkdm->core_pwrdm (0)
 +
/ #
-
* From WKUP event (USER button) till return from omap34xx_save_cpu_context_wfi: 194us
+
It can be noted that some power domains have a cache hit rate of 0%, because they are unused (i.e. not controlled by any driver). Still a lot of registers accesses are performed in the idle path.
-
* From return from omap34xx_save_cpu_context_wfi till end of omap_sram_idle: 214us
+
-
* From end of omap_sram_idle till end of idle: 30us
+
-
* Early resume of devices: 0.396 ms
+
Here is the statistics patch:
-
* Resume of devices: 418.243 msecs
+
diff --git a/arch/arm/mach-omap2/pm-debug.c b/arch/arm/mach-omap2/pm-debug.c
 +
index ed9846e..632db47 100644
 +
--- a/arch/arm/mach-omap2/pm-debug.c
 +
+++ b/arch/arm/mach-omap2/pm-debug.c
 +
@@ -119,6 +119,9 @@ static int pwrdm_dbg_show_counter(struct powerdomain *pwrdm, void *user)
 +
seq_printf(s, ",RET-MEMBANK%d-OFF:%d", i + 1,
 +
pwrdm->ret_mem_off_counter[i]);
 +
 +
+ seq_printf(s, ". Cache access %d, hit %d, rate %d%%",
 +
+ pwrdm->pwrdm_cache_access, pwrdm->pwrdm_cache_hit,
 +
+ (100 * pwrdm->pwrdm_cache_hit)/pwrdm->pwrdm_cache_access);
 +
seq_printf(s, "\n");
 +
 +
return 0;
 +
diff --git a/arch/arm/mach-omap2/powerdomain.c b/arch/arm/mach-omap2/powerdomain.c
 +
index 537595c..1b80ee9 100644
 +
--- a/arch/arm/mach-omap2/powerdomain.c
 +
+++ b/arch/arm/mach-omap2/powerdomain.c
 +
@@ -711,10 +711,12 @@ static int pwrdm_cache_read(struct powerdomain *pwrdm, int index, int *value)
 +
if (index >= PWRDM_CACHE_SIZE)
 +
return -EINVAL;
 +
 +
+ pwrdm->pwrdm_cache_access++;
 +
if (!(pwrdm->cache_state & (1 << index)))
 +
return -ENODATA;
 +
 +
*value = pwrdm->cache[index];
 +
+ pwrdm->pwrdm_cache_hit++;
 +
return 0;
 +
}
 +
 +
diff --git a/arch/arm/mach-omap2/powerdomain.h b/arch/arm/mach-omap2/powerdomain.h
 +
index 92386bd..a9eae1c 100644
 +
--- a/arch/arm/mach-omap2/powerdomain.h
 +
+++ b/arch/arm/mach-omap2/powerdomain.h
 +
@@ -172,6 +172,10 @@ struct powerdomain {
 +
struct mutex lock;
 +
int state;
 +
int cache[PWRDM_CACHE_SIZE];
 +
+
 +
+ int pwrdm_cache_access;
 +
+ int pwrdm_cache_hit;
 +
+
 +
long cache_state;
 +
unsigned state_counter[PWRDM_MAX_FUNC_PWRSTS];
 +
unsigned ret_logic_off_counter;
-
===Full OFF mode===
+
==Links==
-
* Suspend of devices: 112.396 ms
+
===Device latency patches===
-
* Late suspend of devices: 1.739 ms
+
[http://marc.info/?l=linux-omap&m=133475685213067&w=2 PM QoS device constraint code patches]
-
* From idle start till omap_sram_idle entry: 61us
+
[http://omappedia.org/wiki/TWL4030_power_scripts T2 scripts information page]
-
* From omap_sram_idle entry till WFI: 1190us
+
-
* ... HW sleep... including MPU context restore
+
==Attachments==
 +
===Kernel patches and config===
 +
[[File:OMAP_latency_measurements_patches_and_config.tar.gz|center|thumb|320px]]
-
* From WKUP event (USER button) till return from omap34xx_save_cpu_context_wfi: 2151us
+
==Presentation slides: Fosdem/ELC 2012==
-
* From return from omap34xx_save_cpu_context_wfi till end of omap_sram_idle: 457us
+
[[File:ELC-2012-jpihet-DeviceLatencyModel.pdf]]
-
* From end of omap_sram_idle till end of idle: 92us
+
-
* Early resume of devices: 0.488 ms
+
--[[User:Jpihet|Jpihet]] 24 Apr 2012
-
* Resume of devices: 429.962 msecs
+

Latest revision as of 11:09, 9 May 2012

Contents

[edit] PM Devices constraintes measurements

[edit] Introduction

To correctly implement the device latency constraint support it is needed to get accurate measurements of the system low power modes overhead:

This wiki page details the measurements setup and the results. The latency data is to be fed into the constraints latency patches.

[edit] Kernel patches & build

Some kernel changes are required for the kernel instrumentation. The patches and config are attached to this page.

Changes: enable IDLE, DSS for Beagle, Initramfs Busybox root FS

[edit] HW traces details

The trace points are connected on Beagleboard rev B7.

!Warning! The HW power supplies and external clocks are not cut off in this config (no support for System OFF in l-o), so the HW latencies are lower than expected. The HW measurements need to be performed as soon as l-o supports the System OFF. The measurements from TI are used for the real HW latency.

Here are some scope screenshots showing the time delta between the wake-up event (USER button press, trace A) and the end of omap_sram_idle (USR1 Led).

For RET mode, showing a delta of 408us:

Scope capture ret.jpg

For OFF mode, showing a delta of 2700us:

Scope capture off.jpg

[edit] GPT tracer

Since GPT12 is used as a wake-up source from the idle mode, it can be used to track the timings during the wake-up sequence. A patch is needed to let the timer count after it overflowed and woke up the system.

The GPT runs on 32KHz clock and so the resolution is limited to 30.518us. Given the latencies to measure for OFF mode, the resolution is accpetable.

4 GPT measurements are performed during the wake-up:

[edit] SW trace usage

Enable the power events and dump the trace:

# echo 1 > /debug/tracing/events/power/enable
# cat /debug/tracing/trace_pipe &

Enable the system idle in RET mode:

# echo 5 > /sys/devices/platform/omap/omap-hsuart.0/sleep_timeout 
# echo 5 > /sys/devices/platform/omap/omap-hsuart.1/sleep_timeout 
# echo 5 > /sys/devices/platform/omap/omap-hsuart.2/sleep_timeout 

# echo 0 > /debug/pm_debug/enable_off_mode
# echo 1 > /debug/pm_debug/sleep_while_idle

Trace output:

[   62.311462] *** GPT12 wake-up (HW wake-up, ASM restore, delta trace1-7): 183, 0, 244 us       => Dump of GPT timing deltas
          <idle>-0     [000]    62.241608: power_start: type=1 state=1 cpu_id=0                  => Idle start
          <idle>-0     [000]    62.241608: power_start: type=4 state=1 cpu_id=0                  => First suspend SW trace in omap_sram_idle
          <idle>-0     [000]    62.241638: power_start: type=4 state=2 cpu_id=0                  => ...
          <idle>-0     [000]    62.241669: power_start: type=4 state=3 cpu_id=0
          <idle>-0     [000]    62.241699: power_domain_target: name=neon_pwrdm state=1 cpu_id=0
          <idle>-0     [000]    62.241699: power_start: type=4 state=4 cpu_id=0
          <idle>-0     [000]    62.241699: clock_disable: name=uart3_fck state=0 cpu_id=0
          <idle>-0     [000]    62.241730: power_start: type=4 state=5 cpu_id=0
          <idle>-0     [000]    62.241730: clock_disable: name=uart1_fck state=0 cpu_id=0
          <idle>-0     [000]    62.241730: clock_disable: name=uart2_fck state=0 cpu_id=0
          <idle>-0     [000]    62.241760: power_start: type=4 state=6 cpu_id=0
          <idle>-0     [000]    62.241760: power_start: type=4 state=7 cpu_id=0
          <idle>-0     [000]    62.241760: power_start: type=4 state=8 cpu_id=0                  => Last suspend SW trace in omap_sram_idle
          <idle>-0     [000]    62.311188: power_start: type=5 state=1 cpu_id=0                  => First resume SW trace in omap_sram_idle
          <idle>-0     [000]    62.311188: power_start: type=5 state=2 cpu_id=0                  => ...
          <idle>-0     [000]    62.311188: power_start: type=5 state=3 cpu_id=0
          <idle>-0     [000]    62.311188: power_start: type=5 state=4 cpu_id=0
          <idle>-0     [000]    62.311218: clock_enable: name=uart1_fck state=1 cpu_id=0
          <idle>-0     [000]    62.311310: clock_enable: name=uart2_fck state=1 cpu_id=0
          <idle>-0     [000]    62.311310: power_start: type=5 state=5 cpu_id=0
          <idle>-0     [000]    62.311340: clock_enable: name=uart3_fck state=1 cpu_id=0
          <idle>-0     [000]    62.311340: power_start: type=5 state=6 cpu_id=0
          <idle>-0     [000]    62.311432: power_start: type=5 state=7 cpu_id=0                  => Last resume SW trace in omap_sram_idle
          <idle>-0     [000]    62.311462: power_end: cpu_id=0                                   => Idle end

Enable the system idle in OFF mode:

# echo 5 > /sys/devices/platform/omap/omap-hsuart.0/sleep_timeout 
# echo 5 > /sys/devices/platform/omap/omap-hsuart.1/sleep_timeout 
# echo 5 > /sys/devices/platform/omap/omap-hsuart.2/sleep_timeout 

# echo 1 > /debug/pm_debug/enable_off_mode
# echo 1 > /debug/pm_debug/sleep_while_idle

Trace output:

/ # echo 1 > /debug/pm_debug/enable_off_mode
/ #           
              sh-503   [000]    70.862366: power_domain_target: name=iva2_pwrdm state=0 cpu_id=0
              sh-503   [000]    70.862396: power_domain_target: name=mpu_pwrdm state=0 cpu_id=0
              sh-503   [000]    70.862396: power_domain_target: name=neon_pwrdm state=0 cpu_id=0
              sh-503   [000]    70.862396: power_domain_target: name=core_pwrdm state=0 cpu_id=0
              sh-503   [000]    70.862427: power_domain_target: name=cam_pwrdm state=0 cpu_id=0
              sh-503   [000]    70.862457: power_domain_target: name=dss_pwrdm state=0 cpu_id=0
              sh-503   [000]    70.862488: power_domain_target: name=per_pwrdm state=0 cpu_id=0
              sh-503   [000]    70.862488: power_domain_target: name=usbhost_pwrdm state=0 cpu_id=0
/ # 
[  557.240020] *** GPT12 wake-up (HW wake-up, ASM restore, delta trace1-7): 1495, 915, 488 us    => Dump of GPT timing deltas
          <idle>-0     [000]   557.156769: power_start: type=1 state=1 cpu_id=0                  => Idle start
          <idle>-0     [000]   557.156769: power_start: type=4 state=1 cpu_id=0                  => First suspend SW trace in omap_sram_idle
          <idle>-0     [000]   557.156769: power_start: type=4 state=2 cpu_id=0                  => ...
          <idle>-0     [000]   557.156830: power_start: type=4 state=3 cpu_id=0
          <idle>-0     [000]   557.156830: power_domain_target: name=neon_pwrdm state=0 cpu_id=0
          <idle>-0     [000]   557.156830: power_start: type=4 state=4 cpu_id=0
          <idle>-0     [000]   557.156860: clock_disable: name=uart3_fck state=0 cpu_id=0
          <idle>-0     [000]   557.156891: power_start: type=4 state=5 cpu_id=0
          <idle>-0     [000]   557.156891: clock_disable: name=uart1_fck state=0 cpu_id=0
          <idle>-0     [000]   557.156921: clock_disable: name=uart2_fck state=0 cpu_id=0
          <idle>-0     [000]   557.157013: power_start: type=4 state=6 cpu_id=0
          <idle>-0     [000]   557.157013: power_start: type=4 state=7 cpu_id=0
          <idle>-0     [000]   557.157898: power_start: type=4 state=8 cpu_id=0                  => Last suspend SW trace in omap_sram_idle
          <idle>-0     [000]   557.236084: power_start: type=5 state=1 cpu_id=0                  => First resume SW trace in omap_sram_idle
          <idle>-0     [000]   557.236145: power_start: type=5 state=2 cpu_id=0                  => ...
          <idle>-0     [000]   557.236206: power_start: type=5 state=3 cpu_id=0
          <idle>-0     [000]   557.236267: power_start: type=5 state=4 cpu_id=0
          <idle>-0     [000]   557.236389: clock_enable: name=uart1_fck state=1 cpu_id=0
          <idle>-0     [000]   557.236450: clock_enable: name=uart2_fck state=1 cpu_id=0
          <idle>-0     [000]   557.236450: power_start: type=5 state=5 cpu_id=0
          <idle>-0     [000]   557.236481: clock_enable: name=uart3_fck state=1 cpu_id=0
          <idle>-0     [000]   557.236511: power_start: type=5 state=6 cpu_id=0
          <idle>-0     [000]   557.236572: power_start: type=5 state=7 cpu_id=0                  => Last resume SW trace in omap_sram_idle
          <idle>-0     [000]   557.236602: power_end: cpu_id=0                                   => Idle end

[edit] Results interpretation

The low power transition sequence is pictured as nested calls to functions:

Low power transition sequence.png

The measured results (from the HW and SW traces) are mapped to the pictured states according to the following table:

Pictured state Trace point Performed SW action
Idle enter start suspend System ready to enter idle
omap_sram_idle 1 suspend trace point 1 Enter omap_sram_idle
omap_sram_idle 2 suspend trace point 2 calculation of next power domains modes
omap_sram_idle 3 suspend trace point 3 Power domains pre-transition: program power domains current state, clear status
omap_sram_idle 4 suspend trace point 4 Context save for NEON
IO pad and chain new state programmed
omap_sram_idle 5 suspend trace point 5 Context save for PER, GPIO
Prepare UARTs 2&3
omap_sram_idle 6 suspend trace point 6 Context save for CORE and PRCM
Prepare UARTs 0&1
omap_sram_idle 7 suspend trace point 7 Context save for INTC
Program SDRC
WFI enter suspend trace point 8 GPIO HW trace
MPU context save in ASM (caches, registers, disable cache & prediction)
System OFF active - sys_off_mode, external clocks and power supplies to be measured with System OFF support -
Wake-up event: IO or GPT12 HW trace A (if IO wake-up)
GPT12=0 (if GPT wake-up)
-
System OFF inactive - sys_off_mode, external clocks and power supplies to be measured with System OFF support -
WFI exit GPT12 sampling right after WFI -
omap_sram_idle 1 GPT12 sampling at return from ASM code
Wake-up trace point 1
SDRC errata for ES3.1
MPU context restore
MMU restore and enable
omap_sram_idle 2 wake-up trace point 2 cpu_init
omap_sram_idle 3 wake-up trace point 3 SDRC settings restore
omap_sram_idle 4 wake-up trace point 4 Restore MMU tables
Enable caches and prediction
omap_sram_idle 5 wake-up trace point 5 Context restore for CORE, PRCM, SRAM, SMS
Resume UARTs 0&1
omap_sram_idle 6 wake-up trace point 6 Context restore for PER, INTC, GPIO
IO pad & chain
Resume UARTS 2&3
omap_sram_idle 7 wake-up trace point 7
GPT sampling
HW trace B
Power domains post-transition: program power domains current state, clear status
Restore SDRC settings
Idle exit exit suspend System out of idle

[edit] cpuidle results

[edit] PSI measurements results

Some timings measurements have been made by the TI PSI team. The following tables gives the results for the sleep and wake-up latencies for the C-states:

C states sleep latency.png
C states wake up latency.png

Note: in the linux code there is no C7/C8/C9 as in the table. C7 is MPU OFF + CORE OFF, which is identical to C9 in the table.

A model with the energy spent in the C-states has been built from the measured numbers. Here is the graph of the energy vs time:

C states os idle energy.png

Taking the minimum energy from the graph allows to identify the 4 energy-wise interesting C-states: C1, C3, C5, C9 and the threshold time for those C-states to be efficient:

C states data.png

Notes:

[edit] HW and SW measurements results

Here are the results for full RET and full OFF modes:

Sequence Time (us) - RET = C5 Time (us) - OFF = C9
From idle start till omap_sram_idle entry 0 0
From omap_sram_idle entry till WFI 152 1129
... HW sleep...
From WKUP event till WFI
(HW wake-up - GPT12)
183 1495
From WFI till return from omap34xx_save_cpu_context_wfi
(MPU context restore in ASM)
0 915
From return from omap34xx_save_cpu_context_wfi till end of omap_sram_idle
(System restore)
244 488
From end of omap_sram_idle till return from idle 30 30

[edit] Aggregated timings results

From the various sources of data the following figures are derived for all C-states (timings in us). The results are used in the cpuidle table (in arch/arm/mach-omap2/cpuidle34xx.c).

C-state Sleep lat Wake-up lat Threshold
C1: MPU WFI/ON - CORE ON 73.6 78 151.6
C2: MPU WFI - CORE INA 165 88.16 345 (1)
C3: MPU CSWR - CORE INA 163 182 345
C4: MPU OFF - CORE INA 2852 605 150000 (2)
C5: MPU CSWR - CORE CSWR 800 366 (3) 2120
C6: MPU OFF - CORE CSWR 4080 801 215000 (1)
C7: MPU OFF - CORE OFF 4300 12933 (4) 215000 (5)

Notes:

[edit] Results for individual power domains

Since cpuidle only manages the MPU (and depending power domains), the wake-up latency values for the other power domains must be measured separately. By adjusting the target states of the power domains (in /debug/pm_debug/xxxx_pwrdm/suspend) the following combinations have been measured. All values are in us:

[edit] HW and SW measurements results

The HW and SW tracers are used to measure the wake-up latencies of the power domains. The results are in the table:

PD measurements.png

Notes:

The significative power domains latencies are derived from the table as follows:

Power Domain RET latency OFF latency Table location
MPU 121 1830 (5), (6)
NEON 0 0 Included in MPU transitions?
CORE 153 3082 (3), (4)
PER 0 671 (1), (2)

Those figures are used in the code as the power domains wake-up latencies for RET and OFF, cf. arch/arm/mach-omap2/powerdomains3xxx_data.c.

[edit] ToDo

[edit] C1 performance problem: analysis

A serious performance degradation has been noticed during transfers from the NAND device using DMA, cf. http://marc.info/?l=linux-omap&m=133467316214021&w=2 for detailed discussion and patches [1]->[6]. The C1 C-state has a very big latency and degrades the use case performance.

[edit] Setup

[edit] Results

Here are the results on Beagleboard:

Patches applied Description Measured BW
[0] Initial code 2.1MB/s
[1] C1 Only 2.1MB/s
[1]+[2] No pre_ post_ 2.6MB/s
[1]+[5] No pwrdm_for_each_clkdm 2.3MB/s
[1]+[5]+[2] 2.8MB/s
[1]+[7] Regs cache (invalidate current states after WFI, invalidate prev states in clear_all_prev_pwrst) 2.2MB/s
[1]+[7]+[8] khilman's optimizations on pre_ post_ transitions + Regs cache 2.6MB/s
[1]+[7]+[8]+[9] per=core in C1 + khilman's optimizations on pre_ post_ transitions + Regs cache 2.8MB/s
[1]+[7]+[8]+[9]+[10] allow/deny_idle on pwrdm->clkdm[0] for mpu, core + per=core in C1 + khilman's optimizations on pre_ post_ transitions + Regs cache 3.0MB/s
[1]+[8]+[9]+[10] allow/deny_idle on pwrdm->clkdm[0] for mpu, core + per>=core in C1 + khilman's optimizations on pre_ post_ transitions 3.1MB/s
[1]+[5]+[6] No omap_sram_idle 3.1MB/s
[1]+[5]+[6]+No IDLE No CPUIDLE, No omap_sram_idle, all pwrdms to ON 3.1MB/s

This shows there is some serious performance issue with the C1 C-state but also that the patches [7]->[10] are providing some solutions.

Notes:

[edit] Main contributors

Here are the contributors inside __omap3_enter_idle (averaged over 30 samples):

Omap sram idle latency table.png

The main contributors are:

The HW idle time is 6.5us, which is neglectable compared to the SW overhead required to reach the idle state.

[edit] Use case idle stats

Using only the cpuidle tracepoints, the average times in idle are (averaged over 60 samples):

Use case Measured BW MB/s Description Idle us Active us
[1] 2.1 Cpuidle, omap_sram_idle, only C1 397 394
[1]+[7] 2.2 Regs cache, only C1 349 397
[1]+[7]+[8] 2.6 khilman optims + Regs cache, only C1 246 364
[1]+[7]+[8]+[9]+[10] 3.0 allow/deny_idle on pwrdm->clkdm[0] for mpu, core + pre=core in C1 + khilman's optims + Regs cache, only C1 178 403
[1]+[8]+[9]+[10] 3.1 allow/deny_idle on pwrdm->clkdm[0] for mpu, core + per>=core in C1 + khilman's optims, only C1 152 259
[1]+[5]+[6]+No IDLE 3.1 No cpuidle, no omap_sram_idle, all pwrdms to ON 113 477

Notes:

[edit] Registers cache accesses stats

The number of registers accesses are shown in PM debug using a registers cache statistics patch. The debug log shows the big number of accesses in this optimized use case ([0]+[7]+[8]) and the cache efficiency:

/ # cat /debug/pm_debug/count 
usbhost_pwrdm (ON),OFF:719,OSWR:0,CSWR:578,INA:0,ON:1298,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 8279, hit 29, rate 0%
sgx_pwrdm (OFF),OFF:1,OSWR:0,CSWR:0,INA:0,ON:1,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 8275, hit 26, rate 0%
core_pwrdm (ON),OFF:19,OSWR:0,CSWR:20,INA:14,ON:54,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0,RET-MEMBANK2-OFF:0. Cache access 14960, hit 3966, rate 26%
per_pwrdm (ON),OFF:33,OSWR:0,CSWR:20,INA:0,ON:54,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 18811, hit 7817, rate 41%
dss_pwrdm (ON),OFF:719,OSWR:0,CSWR:578,INA:0,ON:1298,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 8279, hit 29, rate 0%
cam_pwrdm (OFF),OFF:1,OSWR:0,CSWR:1,INA:0,ON:1,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 10907, hit 2657, rate 24%
neon_pwrdm (ON),OFF:19,OSWR:0,CSWR:1271,INA:7,ON:1298,RET-LOGIC-OFF:0. Cache access 12611, hit 2885, rate 22%
mpu_pwrdm (ON),OFF:19,OSWR:0,CSWR:1271,INA:7,ON:1298,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0. Cache access 18390, hit 7396, rate 40%
iva2_pwrdm (OFF),OFF:1,OSWR:0,CSWR:1,INA:0,ON:1,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0,RET-MEMBANK2-OFF:0,RET-MEMBANK3-OFF:0,RET-MEMBANK4-OFF:0. Cache access 8281, hit 31, rate 0%
usbhost_clkdm->usbhost_pwrdm (1)
sgx_clkdm->sgx_pwrdm (0)
per_clkdm->per_pwrdm (16)
cam_clkdm->cam_pwrdm (0)
dss_clkdm->dss_pwrdm (1)
core_l4_clkdm->core_pwrdm (21)
core_l3_clkdm->core_pwrdm (4)
d2d_clkdm->core_pwrdm (0)
iva2_clkdm->iva2_pwrdm (0)
neon_clkdm->neon_pwrdm (0)
mpu_clkdm->mpu_pwrdm (0)
prm_clkdm->wkup_pwrdm (0)
cm_clkdm->core_pwrdm (0)
/ # 

It can be noted that some power domains have a cache hit rate of 0%, because they are unused (i.e. not controlled by any driver). Still a lot of registers accesses are performed in the idle path.

Here is the statistics patch:

diff --git a/arch/arm/mach-omap2/pm-debug.c b/arch/arm/mach-omap2/pm-debug.c
index ed9846e..632db47 100644
--- a/arch/arm/mach-omap2/pm-debug.c
+++ b/arch/arm/mach-omap2/pm-debug.c
@@ -119,6 +119,9 @@ static int pwrdm_dbg_show_counter(struct powerdomain *pwrdm, void *user)
		seq_printf(s, ",RET-MEMBANK%d-OFF:%d", i + 1,
				pwrdm->ret_mem_off_counter[i]);

+	seq_printf(s, ". Cache access %d, hit %d, rate %d%%",
+			pwrdm->pwrdm_cache_access, pwrdm->pwrdm_cache_hit,
+			(100 * pwrdm->pwrdm_cache_hit)/pwrdm->pwrdm_cache_access);
	seq_printf(s, "\n");

	return 0;
diff --git a/arch/arm/mach-omap2/powerdomain.c b/arch/arm/mach-omap2/powerdomain.c
index 537595c..1b80ee9 100644
--- a/arch/arm/mach-omap2/powerdomain.c
+++ b/arch/arm/mach-omap2/powerdomain.c
@@ -711,10 +711,12 @@ static int pwrdm_cache_read(struct powerdomain *pwrdm, int index, int *value)
	if (index >= PWRDM_CACHE_SIZE)
		return -EINVAL;

+	pwrdm->pwrdm_cache_access++;
	if (!(pwrdm->cache_state & (1 << index)))
		return -ENODATA;

	*value = pwrdm->cache[index];
+	pwrdm->pwrdm_cache_hit++;
	return 0;
}

diff --git a/arch/arm/mach-omap2/powerdomain.h b/arch/arm/mach-omap2/powerdomain.h
index 92386bd..a9eae1c 100644
--- a/arch/arm/mach-omap2/powerdomain.h
+++ b/arch/arm/mach-omap2/powerdomain.h
@@ -172,6 +172,10 @@ struct powerdomain {
	struct mutex lock;
	int state;
	int cache[PWRDM_CACHE_SIZE];
+
+	int pwrdm_cache_access;
+	int pwrdm_cache_hit;
+
	long cache_state;
	unsigned state_counter[PWRDM_MAX_FUNC_PWRSTS];
	unsigned ret_logic_off_counter;

[edit] Links

[edit] Device latency patches

PM QoS device constraint code patches

T2 scripts information page

[edit] Attachments

[edit] Kernel patches and config

File:OMAP latency measurements patches and config.tar.gz

[edit] Presentation slides: Fosdem/ELC 2012

File:ELC-2012-jpihet-DeviceLatencyModel.pdf

--Jpihet 24 Apr 2012

Personal tools
Namespaces
Variants
Actions
Navigation
Toolbox