Usability, Customer Experience & Statistics

10 Tips for Benchmark Usability Tests

Jeff Sauro • February 14, 2011

How usable is a website or software application? 

To know if design changes improved the usability of an application, you first need a baseline measure of usability from a benchmark test.

Here are 10 tips to use when planning your next benchmark test.

  1. Recruit for representativeness over randomness: It will be difficult to select a truly random group of users for your tests. Worry less about random selection than about how well your test users match your entire user population. If you need to compare new versus existing users or domestic versus international users, it is more important to have these groups proportionally represented than to select them randomly. Even clinical trials have problems with random selection, and most usability tests don't involve life-and-death decisions.

  2. Triangulate using multiple metrics: Usability is the combination of effectiveness, efficiency and satisfaction (ISO 9241-11). Your metrics should tap into each of these constructs. Typically this is done using a combination of completion rates, errors, time on task, and task-level and test-level satisfaction measures.

  3. Estimate Sample Size using the desired margin of error: For tests where no comparisons are made, the needed sample size is derived from how precise you want your measures to be. You can use the 20/20 rule for quick calculations: to achieve a 20% margin of error you need about 20 users. To cut the margin of error in half you need to quadruple the sample size, so a 10% margin of error requires approximately 80 users and a 5% margin of error requires approximately 320 users. With 320 users, when you report your average completion rates, times and satisfaction scores, you will have confidence intervals that are about 10 percentage points wide (a margin of error of plus or minus 5%).

  4. Counterbalance tasks: Alternate the presentation of the tasks to minimize undesirable sequence effects. Often the first one or two tasks have lower performance metrics because users are still getting acquainted with the test and application. The more time they spend completing tasks, the more their performance (time, completion rates) improves. You will want to spread the learning effects evenly across tasks by counterbalancing or randomizing the task order.

  5. Collect both Post-Test and Post-Task Satisfaction: Usability questionnaires like the SUS provide more stable estimates of users' overall impressions of the application's usability. They are less sensitive to task performance. Post-task questions like the Single Ease Question (SEQ) are more sensitive to usability problems, errors and longer task times. Task-level satisfaction can be combined with the other task-level metrics into a single usability metric.

  6. Combine measures into a Single Usability Metric for reporting: When you record multiple metrics to measure usability, you can standardize them and combine them into a Single Usability Metric (SUM). Having a single score makes it easier to convey the usability of a task or system on dashboards and in reports, and you still retain all the information in the component metrics for more detailed analysis.

  7. Use confidence intervals around all your metrics: Data from any sample will deviate from the entire user population by some amount.  This difference is called sampling error and is quantified using the margin of error. Margins of error are found by computing confidence intervals around all your measures. They tell you the most likely range of the total user-population average. If you need help with the computations, the Quantitative Starter Package will do the work for you.

  8. Conduct a pilot test: Even having one or two people complete your usability test can reveal obvious flaws with your test design or with the application prior to the full test. A pilot test can reduce ambiguities in task scenarios, prevent embarrassing system problems and improve the quality of your analysis.

  9. Include some cheater/speeder detection for remote usability tests: Around 10% of online usability test-takers and survey participants will rush through your study just to collect the honorarium.  You will want to identify these speeders using some sort of question and remove them from your analysis. Having a high percentage of speeders/cheaters (>20%) suggests your tasks are too complex or the test is too long.

  10. Don't throw away failed task times: When you record task time, keep the times from failed attempts as well. You can report average task completion time, average time on task and average time to failure. All three can be valuable diagnostic tools and can be used for comparisons in your next benchmark test or after design changes.
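
The 20/20 rule in tip 3 can be sketched in a few lines of Python. This is a rough illustration only: the function name `users_needed` and its parameters are hypothetical, and the rule is a quick approximation for planning, not an exact sample size calculation.

```python
import math

def users_needed(margin_of_error, base_margin=0.20, base_users=20):
    """Scale the 20/20 rule to another margin of error (given as a proportion)."""
    if margin_of_error <= 0:
        raise ValueError("margin_of_error must be positive")
    # Sample size grows with the square of the improvement in precision:
    # halving the margin of error quadruples the number of users.
    ratio = base_margin / margin_of_error
    # Round first so floating-point noise cannot push an exact answer up by one.
    return math.ceil(round(base_users * ratio ** 2, 6))

for moe in (0.20, 0.10, 0.05):
    print(f"+/-{moe:.0%} margin of error: about {users_needed(moe)} users")
```

The loop reproduces the three figures quoted in tip 3 (20, 80 and 320 users).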
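
One simple way to counterbalance tasks (tip 4) is to rotate the task list so that each task appears in each position equally often. The sketch below assumes participants are numbered from zero; the function names are hypothetical. Note that plain rotation balances position only, not which task comes immediately before which, so full randomization or a balanced Latin square may be preferable when carry-over between specific tasks is a concern.

```python
def rotated_orders(tasks):
    """Return one order per rotation; each task occupies each position once."""
    return [tasks[i:] + tasks[:i] for i in range(len(tasks))]

def order_for_participant(tasks, participant_index):
    """Assign rotations to participants in a repeating cycle."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    orders = rotated_orders(tasks)
    return orders[participant_index % len(orders)]

tasks = ["Find a product", "Check out", "Contact support"]
for participant in range(4):
    print(participant, order_for_participant(tasks, participant))
```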
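
To make tip 7 concrete, here is a minimal sketch of a confidence interval around a completion rate. It uses the adjusted Wald (Agresti-Coull) interval, which is a common choice for small-sample proportions; that choice is an assumption for illustration, since the article does not name a specific method, and the function name is hypothetical.

```python
import math

def adjusted_wald_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a completion rate (z=1.96)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Add z^2/2 successes and z^2 trials before computing the usual Wald interval.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range for a proportion.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

low, high = adjusted_wald_interval(successes=10, n=20)
print(f"Completion rate is plausibly between {low:.0%} and {high:.0%}")
```

With 10 of 20 users completing a task, the margin of error comes out close to 20 percentage points, which is consistent with the 20/20 rule in tip 3.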
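
The three time measures in tip 10 can be computed from the same raw data. The sketch below assumes each attempt is recorded as a (seconds, completed) pair and reads the three measures as: completion time from successful attempts only, time on task from all attempts, and time to failure from failed attempts only. That reading, the data layout, and the names used are assumptions for illustration.

```python
from statistics import mean

def task_time_summary(attempts):
    """Summarize task times from (seconds, completed) pairs."""
    successes = [seconds for seconds, completed in attempts if completed]
    failures = [seconds for seconds, completed in attempts if not completed]
    everyone = [seconds for seconds, _ in attempts]
    return {
        # None marks a measure that cannot be computed (no matching attempts).
        "avg_completion_time": mean(successes) if successes else None,
        "avg_time_on_task": mean(everyone) if everyone else None,
        "avg_time_to_failure": mean(failures) if failures else None,
    }

attempts = [(60, True), (90, True), (120, False)]
print(task_time_summary(attempts))
```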

About Jeff Sauro

Jeff Sauro is the founding principal of MeasuringU, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 5 books on statistics and the user-experience.

Posted Comments

There are 3 Comments

August 31, 2016 | KL83 wrote:

Can someone please clarify whether this article is saying to use the SUS and SUM in conjunction? Or is it just giving two different ways to report on the data, with the SUM being better for giving an overall report of system performance?

August 4, 2011 | Usertesting wrote:

Great post, Jeff. Really interesting information here 

June 28, 2011 | Jacob from IntuitionHQ wrote:

Great post, Jeff. Really interesting information here. With our testing tool, we've had a very similar experience, although since we don't generally reward users for taking the test, we don't have an issue with people rushing through.

Even with a simple tool like ours, however, we do notice a learning curve for the first one or two questions. These points really do match up with our own experience, which I guess is a good thing.

Thanks very much for sharing. 
