Using the Linux Kernel Latency Histograms This document gives a short explanation how to enable, configure and use latency histograms. Latency histograms are primarily relevant in the context of real-time enabled kernels (CONFIG_PREEMPT/CONFIG_PREEMPT_RT) and are used in the quality management of the Linux real-time capabilities. * Purpose of latency histograms A latency histogram continuously accumulates the frequencies of latency data. There are two types of histograms - potential sources of latencies - effective latencies * Potential sources of latencies Potential sources of latencies are code segments where interrupts, preemption or both are disabled (aka critical sections). To create histograms of potential sources of latency, the kernel stores the time stamp at the start of a critical section, determines the time elapsed when the end of the section is reached, and increments the frequency counter of that latency value - irrespective of whether any concurrently running process is affected by latency or not. - Configuration items (in the Kernel hacking/Tracers submenu) CONFIG_INTERRUPT_OFF_LATENCY CONFIG_PREEMPT_OFF_LATENCY * Effective latencies Effective latencies are actually occuring during wakeup of a process. To determine effective latencies, the kernel stores the time stamp when a process is scheduled to be woken up, and determines the duration of the wakeup time shortly before control is passed over to this process. Note that the apparent latency in user space may be somewhat longer, since the process may be interrupted after control is passed over to it but before the execution in user space takes place. Simply measuring the interval between enqueuing and wakeup may also not appropriate in cases when a process is scheduled as a result of a timer expiration. The timer may have missed its deadline, e.g. due to disabled interrupts, but this latency would not be registered. Therefore, the offsets of missed timers are recorded in a separate histogram. If both wakeup latency and missed timer offsets are configured and enabled, a third histogram may be enabled that records the overall latency as a sum of the timer latency, if any, and the wakeup latency. This histogram is called "timerandwakeup". - Configuration items (in the Kernel hacking/Tracers submenu) CONFIG_WAKEUP_LATENCY CONFIG_MISSED_TIMER_OFSETS * Usage The interface to the administration of the latency histograms is located in the debugfs file system. To mount it, either enter mount -t sysfs nodev /sys mount -t debugfs nodev /sys/kernel/debug from shell command line level, or add nodev /sys sysfs defaults 0 0 nodev /sys/kernel/debug debugfs defaults 0 0 to the file /etc/fstab. All latency histogram related files are then available in the directory /sys/kernel/debug/tracing/latency_hist. A particular histogram type is enabled by writing non-zero to the related variable in the /sys/kernel/debug/tracing/latency_hist/enable directory. Select "preemptirqsoff" for the histograms of potential sources of latencies and "wakeup" for histograms of effective latencies etc. The histogram data - one per CPU - are available in the files /sys/kernel/debug/tracing/latency_hist/preemptoff/CPUx /sys/kernel/debug/tracing/latency_hist/irqsoff/CPUx /sys/kernel/debug/tracing/latency_hist/preemptirqsoff/CPUx /sys/kernel/debug/tracing/latency_hist/wakeup/CPUx /sys/kernel/debug/tracing/latency_hist/wakeup/sharedprio/CPUx /sys/kernel/debug/tracing/latency_hist/missed_timer_offsets/CPUx /sys/kernel/debug/tracing/latency_hist/timerandwakeup/CPUx The histograms are reset by writing non-zero to the file "reset" in a particular latency directory. To reset all latency data, use #!/bin/sh TRACINGDIR=/sys/kernel/debug/tracing HISTDIR=$TRACINGDIR/latency_hist if test -d $HISTDIR then cd $HISTDIR for i in `find . | grep /reset$` do echo 1 >$i done fi * Data format Latency data are stored with a resolution of one microsecond. The maximum latency is 10,240 microseconds. The data are only valid, if the overflow register is empty. Every output line contains the latency in microseconds in the first row and the number of samples in the second row. To display only lines with a positive latency count, use, for example, grep -v " 0$" /sys/kernel/debug/tracing/latency_hist/preemptoff/CPU0 #Minimum latency: 0 microseconds. #Average latency: 0 microseconds. #Maximum latency: 25 microseconds. #Total samples: 3104770694 #There are 0 samples greater or equal than 10240 microseconds #usecs samples 0 2984486876 1 49843506 2 58219047 3 5348126 4 2187960 5 3388262 6 959289 7 208294 8 40420 9 4485 10 14918 11 18340 12 25052 13 19455 14 5602 15 969 16 47 17 18 18 14 19 1 20 3 21 2 22 5 23 2 25 1 * Wakeup latency of a selected process To only collect wakeup latency data of a particular process, write the PID of the requested process to /sys/kernel/debug/tracing/latency_hist/wakeup/pid PIDs are not considered, if this variable is set to 0. * Details of the process with the highest wakeup latency so far Selected data of the process that suffered from the highest wakeup latency that occurred in a particular CPU are available in the file /sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPUx. In addition, other relevant system data at the time when the latency occurred are given. The format of the data is (all in one line): () \ <- The value of is only relevant in the combined timer and wakeup latency recording. In the wakeup recording, it is always 0, in the missed_timer_offsets recording, it is the same as . When retrospectively searching for the origin of a latency and tracing was not enabled, it may be helpful to know the name and some basic data of the task that (finally) was switching to the late real-tlme task. In addition to the victim's data, also the data of the possible culprit are therefore displayed after the "<-" symbol. Finally, the timestamp of the time when the latency occurred in . after the most recent system boot is provided. These data are also reset when the wakeup histogram is reset.