Jeff Sauro • June 14, 2004

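The benefit of using a z-score in usability metrics was explained in "What's a Z-Score and why use it in Usability Testing?" This article discusses different ways of calculating a z-score. The short answer is: it depends on your data and what you're looking for. If you've encountered the z-score in a statistics book, you usually get a formula like:

$$z = \frac{x - \bar{x}}{s}$$

where x is an individual score, x̄ is the sample mean and s is the sample standard deviation. For example, suppose you scored 630 on the verbal section and 700 on the quantitative section of a test, and the scores of all test takers had these means and standard deviations: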

| | Verbal | Quantitative |
| --- | --- | --- |
| Mean | 469 | 591 |
| StDev | 119 | 148 |

By plugging in your scores you get the following:

Verbal z = (630 - 469) ÷ 119 = 1.35σ

Quantitative z = (700 - 591) ÷ 148 = 0.736σ
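The same arithmetic as a small Python sketch. The function name `z_score` is a hypothetical helper; the means, standard deviations and the scores 630 and 700 are the example values from the text.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize a raw score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Example scores and the test takers' summary statistics from the table.
verbal_z = z_score(630, mean=469, sd=119)
quant_z = z_score(700, mean=591, sd=148)

print(f"Verbal z       = {verbal_z:.2f}")
print(f"Quantitative z = {quant_z:.3f}")
```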

To convert these sigma values into percentages, you can look them up in a standard z-table, use the Excel formula =NORMSDIST(1.35), or use the Z-Score to Percentile Calculator (choose 1-sided). You get 91% for Verbal and 77% for Quantitative. This shows where each of your scores falls within the sample of other test takers, and that your verbal score was relatively better than your quantitative score even though its raw value was lower. Assuming the sample data were normally distributed, here's how the scores would look graphically:

[Figure: the verbal (1.35σ) and quantitative (0.736σ) scores on a normal distribution]
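A minimal Python equivalent of the one-sided z-table lookup, offered as a sketch rather than a replacement for the Excel formula or the calculator. It assumes Python 3.8+ for `statistics.NormalDist`, whose `cdf` gives the area to the left of z under the standard normal curve, the same quantity NORMSDIST returns. The helper name `z_to_percentile` is hypothetical.

```python
from statistics import NormalDist

def z_to_percentile(z: float) -> float:
    """One-sided (left-tail) area under the standard normal curve, like NORMSDIST(z)."""
    return NormalDist().cdf(z)  # default NormalDist has mean 0 and standard deviation 1

for label, z in [("Verbal", 1.35), ("Quantitative", 0.736)]:
    print(f"{label}: {z_to_percentile(z):.0%}")
```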

| Sample 1 |
| --- |
| USL: 120 |

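To calculate the process sigma, subtract the target, here the upper specification limit (USL) of 120, from the sample mean (104) and divide by the sample standard deviation (12). For Sample 1 the process sigma is about -1.32σ (with the rounded values shown, (104 - 120) ÷ 12 ≈ -1.33σ). Here is the visual representation of the data:

[Figure: Sample 1 task times on a normal distribution relative to the USL of 120]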
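A short Python sketch of the process sigma calculation from summary statistics. Note that the rounded inputs (mean 104, standard deviation 12) give about -1.33σ rather than the -1.32σ quoted for the unrounded data. The `defect_area` helper is an added illustration, not from the article: it estimates the share of users slower than the USL and assumes task times are normally distributed. Both function names are hypothetical.

```python
from statistics import NormalDist

def process_sigma(mean: float, sd: float, usl: float) -> float:
    """Signed distance of the sample mean from the USL, in standard deviations.
    Negative means the mean sits below the upper spec limit (desirable for task times)."""
    return (mean - usl) / sd

def defect_area(mean: float, sd: float, usl: float) -> float:
    """Estimated share of users exceeding the USL, assuming normally distributed times."""
    return 1 - NormalDist(mean, sd).cdf(usl)

# Rounded Sample 1 summary values from the text.
z1 = process_sigma(mean=104, sd=12, usl=120)
print(f"Process sigma: {z1:.2f}")
print(f"Defect area:   {defect_area(104, 12, 120):.1%}")
```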

For task times, a negative process sigma is ideal, because you want users to finish the task in less time than the spec limit, not more. If the negative sign causes confusion, you can drop it when communicating the results. Suppose you made radical improvements to the UI and then tested another sample of ten users. Here are the results:

| Sample 2 |
| --- |
| 60 75 99 88 65 72 75 72 87 65 |
| USL: 120 Mean: 75.8 StDev: 12.14 |

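In the redesign, the mean of the new sample is well below the spec limit, and the process sigma is now (75.8 - 120) ÷ 12.14 ≈ -3.64σ, far beyond Sample 1's -1.32σ. The corresponding defect area (the proportion of users expected to exceed the USL) is now only about 0.01%, and the quality area is about 99.99%.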
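Starting from the raw Sample 2 task times, the whole calculation fits in a few lines of Python. This is a sketch assuming the times are normally distributed; `statistics.stdev` uses the n - 1 (sample) denominator, which reproduces the 12.14 shown in the table. The variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev

# Sample 2 task times and the upper specification limit from the text.
times = [60, 75, 99, 88, 65, 72, 75, 72, 87, 65]
USL = 120

m = mean(times)
s = stdev(times)                          # sample standard deviation (n - 1 denominator)
z = (m - USL) / s                         # process sigma; negative = mean below the USL
defect = 1 - NormalDist(m, s).cdf(USL)    # share of users expected to exceed the USL
quality = 1 - defect

print(f"Mean {m:.1f}, StDev {s:.2f}, process sigma {z:.2f}")
print(f"Defect area {defect:.2%}, quality area {quality:.2%}")
```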

Of course, because user performance is inherently variable, it's uncommon for a sample of users to perform this far below the spec limit.

If you need more help with z-scores, see the Crash course in Z-scores, a tutorial with plenty of pictures, examples and review questions to help you grasp the concept.

