Linux Performance Bottleneck Cheatsheet: The Top Utilities for Every Situation

A structured guide to diagnosing Linux performance problems layer by layer — from application and kernel, through filesystem and block devices, to the network stack, CPU scheduler, memory, and hardware. Covers classic tools alongside eBPF/BCC utilities and notable open-source monitoring projects.

Many people have seen Brendan Gregg's classic diagram mapping CLI utilities to each Linux subsystem. It is a useful reference, but when things are on fire you tend to search the internet rather than methodically work through a diagram. This guide compiles those tools plus additional open-source utilities for monitoring, organised by subsystem so you can work top-down.

Linux performance tools overview diagram based on Brendan Gregg's work

Each section ends with a brief list of third-party open-source utilities; details are available on their respective GitHub pages.

Applications and System Libraries

Performance problems often originate here, because applications determine what load the kernel and hardware see. Common issues at this layer include:

A single process consuming CPU with no obvious reason why
An application that freezes periodically, stuck in a system call
DNS resolution delays that appear to be network problems
Resource leaks — typically file descriptors and sockets
Short-lived child processes that complete between top updates

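strace traces system calls, showing which calls are slow and how much time is spent in the kernel. The most practical invocation prints aggregate statistics per system call. By default the time column is CPU time spent in the kernel; adding -w summarises wall-clock latency instead, which exposes calls that block: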

strace -c -p <PID>
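strace -c prints its table only when tracing stops, so on a long-running service it helps to bound the capture. A minimal sketch follows; strace_summary and top_syscalls are hypothetical helper names, and the parser assumes the usual strace -c layout (percentage of time in the first column, syscall name in the last, a final total row). strace sorts the table by time by default, so the first rows are the most expensive calls:

```shell
# Attach to a running process for a fixed number of seconds and print the
# strace -c summary table. timeout sends SIGINT, which makes strace detach and
# write its summary just as Ctrl-C would. With -f, strace attaches to every
# thread of the process. The summary goes to a temporary file via -o.
strace_summary() {
    pid=${1:?usage: strace_summary PID [SECONDS]}
    secs=${2:-10}
    out=$(mktemp)
    timeout -s INT "$secs" strace -c -f -o "$out" -p "$pid"
    cat "$out"
    rm -f "$out"
}

# Read a strace -c table on stdin and print "syscall percent" for the first N
# data rows. Header and separator rows are skipped because their first field
# is not a plain number; the final row is skipped because it ends in "total".
top_syscalls() {
    n=${1:-5}
    awk '$1 ~ /^[0-9.]+$/ && $NF != "total" { print $NF, $1 }' | head -n "$n"
}
```

For example, strace_summary 1234 15 | top_syscalls 5 attaches to PID 1234 for 15 seconds and prints the five syscalls with the largest share of time. Keep the window short: strace slows the traced process considerably.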

ltrace does the same for dynamic library calls. Useful when strace shows nothing unusual but the application still consumes CPU or runs slowly:

ltrace -f ./myapp

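gethostlatency (BCC) traces host-name resolution calls such as getaddrinfo() and gethostbyname() and reports how long each took, catching cases where slowness looks like a network problem but is really DNS. The package is bcc-tools on RHEL-family systems (tools in /usr/share/bcc/tools) and bpfcc-tools on Debian and Ubuntu (tools carry a -bpfcc suffix, e.g. gethostlatency-bpfcc):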

sudo gethostlatency

In systemd environments, service logs live in the journal, so use journalctl rather than dmesg (which shows only the kernel ring buffer):

journalctl -u <service> -n 200 --no-pager

If the problem is still unresolved at this layer, these third-party tools are worth considering:

rr — record/replay debugger that records and replays execution of an application and its process tree
bpftrace — tracing language and tool for writing custom eBPF observations
sysdig — universal system event sniffer
lnav — log viewer and journal analyser with filtering and search
procps-ng — process monitoring suite

System Call Interface and Kernel

When application-level analysis looks normal, go deeper. Kernel issues — a blocked driver, a problematic subsystem — manifest as slow system calls or a high percentage of time spent in kernel mode.

perf is the primary profiling tool. It exposes hardware CPU counters and shows which functions consume time. If kernel or driver functions dominate, the bottleneck is not application logic.

View overall CPU statistics:

perf stat -a -- sleep 2

Profile a specific process with call stacks:

perf top -g -p <PID>

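ftrace is the built-in kernel tracer. With its trace-cmd front end you can record the entry and exit events of a system call; the timestamps in the report show how often the call runs and how long each invocation spends in the kernel. Without a command argument, trace-cmd record traces until you press Ctrl-C and writes trace.dat for trace-cmd report: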

trace-cmd record -e sys_enter_read -e sys_exit_read
trace-cmd report

LTTng is a kernel and user-space tracing framework for correlating events over time in detail:

# Create a session
sudo lttng create syswide --output=/tmp/lttng-trace

# Enable kernel events
sudo lttng enable-event -k sched_switch,sched_wakeup
sudo lttng enable-event -k block_rq_issue,block_rq_complete
sudo lttng enable-event -k net_dev_queue,netif_receive_skb

# Start tracing
sudo lttng start

# Collect a sample (e.g. 30 seconds)
sleep 30

# Stop and finish
sudo lttng stop
sudo lttng destroy

# View the output
babeltrace /tmp/lttng-trace

eBPF/BCC utilities answer specific questions without modifying the kernel. Notable examples: opensnoop (traces open() and openat() calls), profile (samples stack traces at a fixed frequency), offcputime (summarises time threads spend blocked off-CPU, by stack trace), softirqs (time spent in soft interrupts), runqlen (scheduler run-queue length histogram), execsnoop (traces new process execution, including short-lived processes).

Third-party tools:

Inspektor Gadget — system call and kernel behaviour analysis utilities for containers
SystemTap — powerful kernel tracer with its own scripting language

File System and VFS

Problems at this layer typically involve how applications handle files, not the disks themselves. Mass open/close operations, constant metadata writes, thousands of small files, and descriptor leaks all stress the VFS layer and kernel caches, even when disk utilisation appears normal.

Start by checking what is open. lsof shows files, sockets, and devices opened by a process, and which processes are accessing a directory:

lsof -p PID
lsof +D /path/to/dir

The -p form is useful when a process hits its descriptor limit or when deleted log files still occupy disk space. The +D form reveals which processes are accessing a specific directory.
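When lsof is not installed, or you only need numbers, the same two questions can be answered from /proc. A minimal sketch, assuming the Linux /proc layout; fd_usage and deleted_fds are hypothetical helper names:

```shell
# Print "open/soft-limit" file descriptors for a PID by reading /proc directly.
# The soft limit is the fourth field of the "Max open files" line.
fd_usage() {
    pid=${1:?usage: fd_usage PID}
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open/$limit"
}

# List descriptors that still point at deleted files: the usual reason df
# reports a full disk while du cannot find the space.
deleted_fds() {
    pid=${1:?usage: deleted_fds PID}
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd") || continue
        case $target in
            *" (deleted)") echo "$fd -> $target" ;;
        esac
    done
}
```

fd_usage 1234 prints the open count and soft limit as two numbers separated by a slash; deleted_fds 1234 lists descriptors such as a rotated log file that is still held open. Reading another user's process requires root.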

fatrace reports file access events (open, read, write, close) system-wide in real time, which identifies processes generating continuous small I/O or repeatedly rewriting files. With -c it records only events on the mount that contains the current directory:

sudo fatrace -c

For short-lived files, filelife (BCC) shows processes creating and deleting files with very brief lifespans:

sudo filelife

pcstat shows whether a file's data is in the page cache or must be read from disk:

pcstat somefile

Monitor SLAB cache memory with slabtop. Rapidly growing inode or dentry caches indicate heavy file operations:

slabtop

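For deeper ext4 diagnostics, the BCC tools ext4slower and ext4dist trace ext4 operations slower than a threshold (10 ms by default) and show the latency distribution of reads, writes, opens, and fsyncs. Equivalent tools exist for XFS and Btrfs (xfsslower, xfsdist, btrfsslower, btrfsdist).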

Third-party tools:

Linux System Monitor — console and GUI monitor
Baobab — scans local and remote folders and volumes
QDirStat — cross-platform disk space visualiser with GUI
Filelight — KDE tool displaying space usage as ring diagrams

Block Devices (Disks and Volumes)

Once the VFS layer shows high operation counts, determine whether the disk subsystem is keeping up. Queue depth, latency, throughput, and device health all matter here.

lsblk is the first command to run for orientation — it shows all block devices hierarchically:

lsblk -f

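iostat shows per-device load and latency. The extended report (-x) includes reads and writes per second (r/s, w/s), throughput (rkB/s, wkB/s), average request latency (r_await, w_await), average queue size (aqu-sz), and utilisation (%util); -z hides devices with no activity. Update every 2 seconds: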

iostat -x -z 2
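During an incident it can help to filter iostat output down to the devices that look saturated. A minimal sketch, assuming sysstat's extended report, where the device table starts with a line beginning Device and %util is the last column of each device row; busy_disks is a hypothetical name and 80 an arbitrary default threshold:

```shell
# Read "iostat -x" output on stdin and print devices whose %util (last column)
# is at or above THRESHOLD. Only rows inside a device table are considered, so
# the avg-cpu block (whose last column is %idle) is never matched.
busy_disks() {
    thr=${1:-80}
    awk -v thr="$thr" '
        /^Device/             { dev = 1; next }   # start of a device table
        NF == 0               { dev = 0; next }   # blank line ends the table
        dev && $NF + 0 >= thr { print $1, $NF "%" }'
}
```

Usage: LC_ALL=C iostat -x -z 2 10 | busy_disks 80 (the C locale keeps decimal points rather than commas). Treat %util with care on NVMe drives and RAID arrays: they serve requests in parallel, so a device can show 100% while still having spare capacity, and r_await and w_await are the better signal there.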

iotop immediately shows which process is doing disk I/O — often the fastest way to identify a culprit during an incident:

sudo iotop

hdparm queries and sets low-level parameters of SATA/ATA drives; with -t it measures sequential read speed from the device without using data already in the page cache:

sudo hdparm -t /dev/sda

nvme (from nvme-cli) queries NVMe drives for SMART/health data, error logs, and controller details:

sudo nvme smart-log /dev/nvme0n1

For detailed latency characteristics, eBPF/BCC tools are essential: biolatency (histogram of block I/O latencies), biosnoop (individual I/O events with PID and device details), biotop (processes ranked by I/O volume).

Third-party tools:

Netdata — full-featured real-time monitoring with a web dashboard
SysUsage — CPU, memory, I/O, network, and disk graphs via rrdtool
Monres — lightweight VPS resource monitor with alerting
Bandwidth Monitor NG — simple console network and disk I/O monitor
tophat — GNOME Shell extension for resource monitoring

Network Stack

Network problems almost always manifest as slowness, but the cause varies widely: interface errors, TCP retransmissions, connection queue overflows, DNS delays. The diagnostic approach should start broad and narrow down.

ss replaces netstat and shows TCP and UDP sockets. It quickly reveals who is holding connections, which ports are listening, and whether listen (accept) queues are filling up:

ss -tunap
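Two quick filters over ss output, as a minimal sketch. For listening sockets ss reports the current accept-queue length in Recv-Q and the configured backlog in Send-Q, so a Recv-Q close to Send-Q means the application is not accepting connections fast enough. full_listeners and tcp_states are hypothetical names, and the 80% threshold is an arbitrary default:

```shell
# Read "ss -ltn" output on stdin and print listeners whose accept queue
# (Recv-Q, field 2) is at least PCT percent of the backlog (Send-Q, field 3).
full_listeners() {
    pct=${1:-80}
    awk -v pct="$pct" '$1 == "LISTEN" && $3 > 0 && $2 * 100 >= $3 * pct { print $4, $2 "/" $3 }'
}

# Read "ss -tan" output on stdin and count sockets per TCP state
# (ESTAB, TIME-WAIT, CLOSE-WAIT, ...), most common first.
tcp_states() {
    awk 'NR > 1 { print $1 }' | sort | uniq -c | sort -rn
}
```

Usage: ss -ltn | full_listeners 80 and ss -tan | tcp_states. A large CLOSE-WAIT count usually means the application is not closing sockets after the peer has closed its side.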

ip from iproute2 shows per-interface counters, including errors and drops, and the routing table:

ip -s link
ip route show

ethtool shows hardware-level network card properties and error counters:

ethtool -S eth0
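ethtool -S often prints hundreds of counters, most of them zero. A minimal filter that keeps only non-zero counters whose names look like errors or drops; counter names are driver-specific, so the name pattern is a heuristic, and nic_errors is a hypothetical name:

```shell
# Read "ethtool -S IFACE" output on stdin and print "name: value" for
# non-zero counters whose names contain err, drop, discard, miss or fifo.
nic_errors() {
    awk -F: '$1 ~ /err|drop|discard|miss|fifo/ && $2 + 0 > 0 {
        gsub(/^[ \t]+/, "", $1)   # strip the indentation ethtool adds
        print $1 ":" $2
    }'
}
```

Usage: ethtool -S eth0 | nic_errors. Running it twice a few seconds apart is more informative than a single run, because counters that are still growing matter more than totals accumulated since boot.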

nicstat provides interface statistics: packets per second, bitrate, and average packet size. Run without parameters for a summary of all interfaces.
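If nicstat is not available, a rough equivalent of its byte and packet rates can be computed from two samples of /proc/net/dev, the kernel's per-interface counter file. A minimal sketch; net_rates is a hypothetical name, and the column positions assume the standard /proc/net/dev layout (eight receive columns followed by eight transmit columns):

```shell
# Approximate per-interface byte and packet rates from two samples of
# /proc/net/dev taken SECONDS apart. After the colon is turned into a space,
# field 1 is the interface, fields 2-9 are receive counters (bytes, packets,
# errs, drop, fifo, frame, compressed, multicast) and fields 10-17 transmit
# counters, starting with bytes and packets.
net_rates() {
    secs=${1:-1}
    s1=$(tail -n +3 /proc/net/dev)
    sleep "$secs"
    s2=$(tail -n +3 /proc/net/dev)
    printf '%s\n%s\n' "$s1" "$s2" | tr ':' ' ' | awk -v secs="$secs" '
        ($1 in rxb) {
            printf "%s rx %.0f B/s %.0f pkt/s tx %.0f B/s %.0f pkt/s\n", $1,
                ($2 - rxb[$1]) / secs, ($3 - rxp[$1]) / secs,
                ($10 - txb[$1]) / secs, ($11 - txp[$1]) / secs
            next
        }
        { rxb[$1] = $2; rxp[$1] = $3; txb[$1] = $10; txp[$1] = $11 }'
}
```

Usage: net_rates 5 prints one line per interface averaged over five seconds.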

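nstat reads the kernel's SNMP and extended network counters (the same data as /proc/net/snmp and /proc/net/netstat). By default it prints increments since its previous run; -a prints absolute values, and -z also includes counters that are zero: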

nstat -z
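A retransmission ratio is often easier to judge than the raw counters. A minimal sketch computing TcpRetransSegs as a percentage of TcpOutSegs from nstat output; tcp_retrans_pct is a hypothetical name. What counts as too high depends on the network, so compare the result against a known-good baseline:

```shell
# Read nstat output on stdin and print retransmitted segments as a percentage
# of all sent segments, or "n/a" if no segments were sent.
tcp_retrans_pct() {
    awk '$1 == "TcpOutSegs"     { out = $2 }
         $1 == "TcpRetransSegs" { re = $2 }
         END { if (out > 0) printf "%.2f\n", 100 * re / out; else print "n/a" }'
}
```

nstat -az TcpOutSegs TcpRetransSegs | tcp_retrans_pct gives the ratio since boot; without -a, it covers the interval since the previous nstat run.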

lldptool shows which switch port the server is connected to and what capabilities the switch advertises. It queries the lldpad daemon, which must be running; -n asks for the neighbour's TLVs:

sudo lldptool -t -n -i eth0

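dropwatch is invaluable when packets disappear for no clear reason. It reports the kernel location where each packet is dropped, which helps tell driver drops apart from drops in the protocol stack or in packet filtering. Start it with symbol lookup via /proc/kallsyms, then type start at its prompt:

sudo dropwatch -l kas
dropwatch> start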

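solisten (BCC) traces listen() calls as they happen, showing the process, protocol, address, port, and requested backlog, which confirms what backlog a service actually asked for:

sudo solisten

Backlog overflows themselves are counted by the kernel: check TcpExtListenOverflows and TcpExtListenDrops with nstat.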

BCC TCP tools: tcplife (summarises TCP sessions with their duration and bytes transferred) and tcpretrans (prints each retransmission with its addresses and TCP state). For full packet capture on the command line, use tcpdump.

Third-party tools:

EtherApe — graphical network monitor with node and traffic visualisation
Simon — shows network activity on status panels
Linux-Resource-Monitor — browser-based live traffic graphs
ntopng — web-based network traffic monitoring

Scheduler and CPU

Scheduler and CPU problems manifest as rising load averages, long run queues, many processes in R state, or unexpected load concentration on specific cores. The cause may be application logic, interrupt storms, or hardware-specific behaviour.

Basic CPU Picture

mpstat shows per-core load and immediately reveals whether load is spread evenly or concentrated on one or two cores:

mpstat -P ALL 1
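On minimal systems without sysstat, the per-core picture from mpstat can be approximated from /proc/stat. A minimal sketch, assuming the standard /proc/stat field order; cpu_busy is a hypothetical name, and iowait is counted as not busy here:

```shell
# Per-CPU busy percentage from two /proc/stat samples taken INTERVAL seconds
# apart. Fields on each "cpuN" line: user nice system idle iowait irq softirq
# steal guest guest_nice. Guest time is already included in user/nice, so only
# user..steal (fields 2-9) are summed; idle + iowait (fields 5-6) is idle.
cpu_busy() {
    interval=${1:-1}
    snap1=$(grep '^cpu[0-9]' /proc/stat)
    sleep "$interval"
    snap2=$(grep '^cpu[0-9]' /proc/stat)
    printf '%s\n%s\n' "$snap1" "$snap2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            idle = $5 + $6
            if ($1 in t0) {
                dt = total - t0[$1]
                busy = (dt > 0) ? 100 * (dt - (idle - i0[$1])) / dt : 0
                if (busy < 0) busy = 0          # clamp counter quirks
                if (busy > 100) busy = 100
                printf "%s %.1f\n", $1, busy
            } else {
                t0[$1] = total
                i0[$1] = idle
            }
        }'
}
```

cpu_busy 1 | sort -k2 -rn | head shows the busiest cores first; one or two cores near 100% while the rest idle points at a single-threaded hot spot or at interrupt affinity.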

pidstat breaks CPU usage down by process, which usually pinpoints where the load comes from:

pidstat -u 1

Real-time monitoring: top, atop, htop, btop. Also worth knowing: tiptop — a top variant that shows hardware performance counters per process.

Queues and Scheduler

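When the load average is high but CPU utilisation is not maxed out, either tasks are stuck in uninterruptible sleep (D state, which Linux includes in the load average) or runnable tasks are piling up on some cores while others idle. BCC's runqlen shows the scheduler run-queue length as a histogram (add -C for a per-CPU breakdown):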

sudo runqlen 1

If processes regularly queue up, either the CPU cannot service them fast enough or there is a core imbalance issue.
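Without BCC, a coarse substitute for runqlen is the kernel's instantaneous count of runnable tasks, procs_running in /proc/stat. It is a point sample rather than a histogram, and it includes the task reading it. A minimal sketch; runnable_vs_cpus is a hypothetical name:

```shell
# Sample procs_running from /proc/stat once per second, SAMPLES times, and
# print it next to the number of online CPUs reported by nproc.
runnable_vs_cpus() {
    samples=${1:-5}
    cpus=$(nproc)
    i=0
    while [ "$i" -lt "$samples" ]; do
        running=$(awk '$1 == "procs_running" { print $2 }' /proc/stat)
        echo "runnable=$running cpus=$cpus"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep 1
        fi
    done
}
```

Usage: runnable_vs_cpus 10. If runnable stays well above cpus across samples, tasks are waiting for a core.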

Profiling

Once it is clear that the CPU is busy, profile with perf. Show time-consuming functions with call stacks:

perf top -g -p PID

Hardware counters — cycles, instructions, and cache misses:

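perf stat -e cycles,instructions,cache-misses -p PID -- sleep 1

A low instructions-per-cycle ratio (perf prints it as insn per cycle) or a high cache-miss rate means the CPU spends much of its time stalled, typically waiting on memory, rather than executing instructions. That points to a hardware-level or data-layout problem rather than an algorithmic one.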
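perf stat already prints IPC, but when counts are collected from a script it is easier to use its CSV mode (-x,) and compute the ratios yourself. A minimal sketch, assuming the CSV layout where field 1 is the counter value and field 3 the event name; ipc_report is a hypothetical name, and cache misses per 1000 instructions is one common way to normalise the miss count:

```shell
# Read "perf stat -x," output and print IPC and cache misses per 1000
# instructions. Rows whose value is not numeric ("<not counted>",
# "<not supported>", "# started on ..." comments, blank lines) are skipped.
ipc_report() {
    awk -F, '
        $1 !~ /^[0-9]/       { next }
        $3 ~ /^cycles/       { cyc += $1 }
        $3 ~ /^instructions/ { ins += $1 }
        $3 ~ /^cache-misses/ { miss += $1 }
        END {
            if (cyc > 0) printf "IPC %.2f\n", ins / cyc
            if (ins > 0) printf "cache-misses per 1k instructions %.2f\n", 1000 * miss / ins
        }'
}
```

Usage (perf stat writes its report to stderr, so -o is used to send it to a file):

perf stat -x, -o stat.csv -e cycles,instructions,cache-misses -p PID -- sleep 5
ipc_report < stat.csv

On hybrid CPUs perf may report events per core type (for example cpu_core/cycles/), which these simple patterns do not match.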

Interrupts

Unusual load distribution or suspected interrupt storms: use BCC utilities to summarise softirq and hardware IRQ processing time:

sudo softirqs
sudo hardirqs

CPU Hardware Level

turbostat displays per-core frequency, C-state residency, temperature, and power consumption in real time (it needs root):

sudo turbostat --interval 1

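rdmsr (msr-tools package) reads CPU model-specific registers for low-level diagnostics. It needs the msr kernel module; register 0x10A below is IA32_ARCH_CAPABILITIES on Intel CPUs:

sudo modprobe msr
sudo rdmsr 0x10A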

showboost (from Brendan Gregg's msr-cloud-tools) shows whether Turbo Boost is actually engaging under load by reporting the effective clock rate:

sudo showboost

Third-party tools:

Parca Agent — continuously samples user and kernel stack traces at 19 Hz, producing pprof profiles
FlameGraph — builds flame graphs from perf, eBPF, and other profiling sources
Scalene — advanced profiler that separately attributes CPU time, system time, and memory pressure

Memory and Cache

Typical memory problems: insufficient memory, active swapping, fragmentation, bad pages, or incorrect NUMA distribution on multi-socket systems.

free for a quick overview:

free -h

vmstat provides dynamic process, memory, swap, I/O, and CPU statistics together. Set a 2-second interval:

vmstat 2

Sustained non-zero values in the si (swap in) and so (swap out) columns mean the system is actively swapping, which very often explains response-time degradation.
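Over a long vmstat run, a filter that prints only the samples with swap traffic is easier to read. A minimal sketch; swap_activity is a hypothetical name, and columns are located by header name so extra columns in newer vmstat versions do not shift si and so. Note that the first data row vmstat prints is an average since boot:

```shell
# Read vmstat output on stdin and print the data rows (numbered from 1) that
# show swap-in or swap-out traffic. The header row starting with "r" maps
# column names to positions; data rows start with a number.
swap_activity() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        ("si" in col) && $1 ~ /^[0-9]+$/ {
            rows++
            si = $(col["si"]); so = $(col["so"])
            if (si + so > 0) printf "sample %d: si=%s so=%s\n", rows, si, so
        }'
}
```

Usage: vmstat 2 30 | swap_activity (a fixed sample count lets the pipeline finish on its own).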

On multi-socket systems, understand where memory lives with numastat:

numastat
numastat -p PID

Growing dentry, inode, or other kernel structure caches show up in slabtop (covered above). For behaviour over time, use sar:

sar -r 1   # memory statistics
sar -W 1   # swap statistics

Useful for distinguishing one-time spikes from sustained trends.

When memory "vanishes" and standard tools give no answer, use memleak (BCC) to find non-freed allocations with call stacks:

sudo memleak

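oomkill (BCC) prints a line each time the OOM killer kills a process while the tool is running, showing which process triggered it and which was killed; for earlier kills, search the kernel log (journalctl -k):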

sudo oomkill

Additional useful tools: mdflush (traces mdraid flush events, useful in RAID diagnostics), swapin (counts swap-ins per process), and, without any tracing, sorting processes by resident memory with ps:

ps aux --sort=-rss
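The same sort can be trimmed to the top entries and converted to MiB (ps reports rss in KiB). A minimal sketch assuming procps-ng ps (BusyBox ps lacks --sort); top_rss is a hypothetical name:

```shell
# List the N processes with the largest resident set size, in MiB.
# "pid=,rss=,comm=" sets empty column headers, so ps prints no header line.
top_rss() {
    n=${1:-10}
    ps -eo pid=,rss=,comm= --sort=-rss | head -n "$n" |
        awk '{
            pid = $1; rss = $2
            $1 = ""; $2 = ""; sub(/^ +/, "")   # keep command names with spaces
            printf "%-8s %8.1f MiB  %s\n", pid, rss / 1024, $0
        }'
}
```

Usage: top_rss 10. RSS counts shared pages in every process that maps them, so the column can add up to more than physical memory.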

Third-party tools:

memray — Python memory profiler with flame graph output
heaptrack — tracks all C/C++ memory allocations with full stack traces

Hardware Level

When all metrics look normal but the system still misbehaves, the problem may lie lower: CPU frequency scaling, thermal throttling, memory bus saturation, or device health issues.

Hardware-level perf counters give the first signal:

perf stat -e cycles,instructions,cache-misses -p PID -- sleep 1

Check the current CPU frequency scaling policy:

cpupower frequency-info

Device health via SMART:

sudo smartctl -a /dev/sda
sudo nvme smart-log /dev/nvme0n1

Temperature, voltages, and power status via BMC/IPMI:

ipmitool sensor

Or via lm-sensors:

sensors

Overheating or voltage drops sometimes explain unusual behaviour that appears only under load.

For a quick snapshot of the whole system in a single output, dstat combines CPU, disk, network, and memory metrics:

dstat -tcmnd

Third-party tools:

Open Hardware Monitor — popular cross-platform hardware monitor
hwinfo — hardware detection library and CLI tool
lshw — detailed computer hardware configuration reporter

Conclusion

The diagnostic approach is straightforward in principle: start at the application layer, confirm or rule out problems at each level, then go one layer deeper. Use standard tools first — they are fast, widely available, and usually sufficient. When they give no clear answer, reach for the eBPF/BCC utilities, which can answer very specific questions with minimal overhead. For deeper study of this methodology, Brendan Gregg's website and books are the definitive reference — though be warned: it is easy to spend hours there.
