Measuring IO Performance and IOWaits

Disk subsystems can be a significant performance bottleneck on an SDElements system if incorrectly configured or under-provisioned.

When working with disk systems there are two important performance metrics:

  1. IOPS
  2. Throughput

Throughput is the speed of sequential data transfer. While this is an important metric for an application like a file server, it is mostly irrelevant for systems that depend on databases and non-sequential data access. Random seek operations are measured in Input/Output Operations Per Second (IOPS) and are a much better indicator of SDElements performance.
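IOPS can be measured directly with a disk benchmarking tool such as fio. The following is a sketch of a fio job file for a common 4k random-read IOPS test; the job name, file size, and target directory are illustrative, and it assumes fio is installed separately (it is not part of sysstat) and that the target directory has enough free space:

```ini
; Illustrative fio job: 4k random reads, a common IOPS test profile.
; Run with: fio randread.fio — the "read: IOPS=" line in the output
; is the number of interest.
[randread-test]
rw=randread
bs=4k
size=256m
directory=/tmp
ioengine=libaio
direct=1
runtime=30
time_based
```

Note that direct=1 bypasses the page cache so the result reflects the disk rather than memory.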

When a server reaches a bottleneck in total throughput, this will show up in the data rates being transferred to disk. If random seek operations are the cause, it will instead show up as CPU load: the CPU spends its time waiting for IO. This can be determined by looking at the IOWAIT on the server. Note that this needs to be done as close to peak load as possible.

The simplest way to determine IOWAIT across the server is to use the "top" utility, where the "wa" percentage indicates how much of the CPU time is spent waiting for disk IO.

[sde_admin@server ~]$ top
top - 15:30:53 up 173 days, 18:53, 1 user, load average: 0.40, 0.18, 0.12
Tasks: 183 total, 1 running, 182 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.3%us, 2.5%sy, 0.0%ni, 96.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3924512k total, 2963652k used, 960860k free, 335512k buffers
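For scripted or repeated sampling, the same iowait figure can be computed without top by reading /proc/stat, where the sixth field of the "cpu" line is the cumulative iowait tick count on Linux. A minimal sketch, sampling over a one-second interval:

```shell
#!/bin/sh
# Minimal sketch: system-wide iowait percentage from /proc/stat.
# On the "cpu" line, field 6 is cumulative iowait ticks; the sum of
# fields 2-9 is total CPU time.
read_cpu() { awk '/^cpu /{print $2+$3+$4+$5+$6+$7+$8+$9, $6}' /proc/stat; }
set -- $(read_cpu); t1=$1; w1=$2
sleep 1
set -- $(read_cpu); t2=$1; w2=$2
awk -v dt=$((t2 - t1)) -v dw=$((w2 - w1)) \
  'BEGIN { if (dt > 0) printf "iowait: %.1f%%\n", 100 * dw / dt }'
```

Because the counters are cumulative, the percentage is always computed as the difference between two samples.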

For more detailed analysis, tools such as iostat and iotop will need to be installed; these provide per-device/disk and per-process information respectively.

To install the iostat and iotop utilities on CentOS/RHEL use the following command:

  • sudo yum install sysstat iotop

To install the iostat and iotop utilities on Debian/Ubuntu use the following command:

  • sudo apt-get install sysstat iotop

Once the package has been installed, iostat can be started with the "-x 1" parameters to show extended statistics updated once per second.

[sde_admin@server ~]$ iostat -x 1
Linux 2.6.32-573.18.1.el6.x86_64 ( )  24/02/17  _x86_64_  (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.89    0.02    0.65    0.04    0.00   96.40

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda 0.05 3.86 0.15 2.24 6.53 48.86 23.12 0.01 3.81 5.70 3.68 0.73 0.18
dm-0 0.00 0.00 0.17 6.07 4.08 48.56 8.44 0.01 1.38 8.62 1.18 0.27 0.17
dm-1 0.00 0.00 0.02 0.04 0.17 0.30 8.00 0.00 23.29 6.28 32.72 0.70 0.00
dm-2 0.00 0.00 0.01 2.60 0.69 20.63 8.15 0.00 0.96 18.14 0.86 0.27 0.07

The relevant columns here are await, r_await and w_await, which show the average time (in milliseconds) that read and write requests spent being served on each device, including time spent waiting in the queue.
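Scanning these columns by eye gets tedious on a host with many devices. A minimal sketch of filtering saved iostat output with awk, assuming the extended layout shown above (await is the 10th column) and an illustrative 20 ms threshold; the sample rows here are taken from the output above:

```shell
#!/bin/sh
# Minimal sketch: flag devices whose await exceeds a threshold (ms).
# Assumes the extended `iostat -x` layout above, where await is the
# 10th column. In practice, pipe `iostat -x` output in instead of
# the embedded sample rows.
threshold=20
high=$(awk -v t="$threshold" '$10+0 > t {print $1}' <<'EOF'
sda 0.05 3.86 0.15 2.24 6.53 48.86 23.12 0.01 3.81 5.70 3.68 0.73 0.18
dm-1 0.00 0.00 0.02 0.04 0.17 0.30 8.00 0.00 23.29 6.28 32.72 0.70 0.00
EOF
)
echo "$high"
```

With the sample rows above, only dm-1 (await 23.29 ms) exceeds the threshold.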

If it is determined that disk load is causing issues, the iotop command can be used to determine which processes are responsible.

# iotop
Total DISK READ: 8.00 M/s | Total DISK WRITE: 20.36 M/s
  TID  PRIO  USER      DISK READ  DISK WRITE  SWAPIN      IO>    COMMAND
15758  be/4  root       7.99 M/s    8.01 M/s  0.00 %  61.97 %    postgres
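If iotop is unavailable or a non-interactive check is needed, the kernel also exposes cumulative per-process IO counters in /proc/&lt;pid&gt;/io (requires a kernel built with task IO accounting). A minimal sketch, reading the counters of the current shell as an illustration; these are cumulative bytes since process start, not rates:

```shell
#!/bin/sh
# Minimal sketch: per-process IO counters from /proc/<pid>/io.
# read_bytes / write_bytes count actual storage IO; values are
# cumulative, so rates require sampling twice and subtracting.
pid=$$
rb=$(awk -F': ' '/^read_bytes/ {print $2}' /proc/$pid/io)
wb=$(awk -F': ' '/^write_bytes/ {print $2}' /proc/$pid/io)
echo "pid $pid read_bytes=$rb write_bytes=$wb"
```

Reading another user's process requires root, which is also why iotop itself is normally run as root.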