Usability, Customer Experience & Statistics

Report Usability Issues in a User by Problem Matrix

Jeff Sauro • June 6, 2012

A lot happens when you observe a user during a usability test.

There's the interface, the utterances, the body language and the metrics.

This rich experience can result in a long list of usability issues.

Issues can range from cosmetic (a user finds the type in ALL CAPS a bit much) to catastrophic (data loss). They can also be suggestions (a new feature) or positives (finding defaults helpful).

These usability issues should be described and usually accompanied by a screenshot or short video clip to illustrate their impact. But don't stop there. Usability issues should be converted into a user by problem matrix.

User by Problem Matrix

A user by problem matrix organizes all the issues found by users. Issues include problems, suggestions and positives, but we still call it a problem matrix because most issues are problems.

An example of a user by problem matrix is shown below from an evaluation of the Enterprise Rental Car website.

Figure 1: A User by Problem matrix for renting a car on the Enterprise website.

The users are numbered on the Y axis and each issue has been given a number. Each black square shows that a particular user encountered a particular problem. You can compute such matrices by task or across tasks for the whole usability test.

For this particular task, we observed 30 users encountering 28 unique issues while attempting to rent a car. We sort our problem matrices with the most frequently occurring problems on the left and the users having the most problems on top. This puts the greatest problem density in the upper left corner.
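
A matrix with this ordering can be built from a simple record of which problems each user hit. The following is a minimal sketch, not the tooling used for the study: the user and problem IDs are invented for illustration, and `build_sorted_matrix` is a hypothetical helper name.

```python
# Hypothetical observations: the set of problem IDs each user ran into.
# These values are made up for illustration; they are not data from the study.
observations = {
    "U1": {"P1"},
    "U2": {"P1", "P2", "P3"},
    "U3": set(),
    "U4": {"P1", "P2"},
}


def build_sorted_matrix(observations):
    """Return (users, problems, matrix) with the densest cells in the upper left."""
    # Count how many users encountered each problem.
    problem_counts = {}
    for hits in observations.values():
        for problem in hits:
            problem_counts[problem] = problem_counts.get(problem, 0) + 1

    # Most frequent problems first (left); ties broken by ID for stable output.
    problems = sorted(problem_counts, key=lambda p: (-problem_counts[p], p))
    # Users with the most problems first (top); ties broken by ID.
    users = sorted(observations, key=lambda u: (-len(observations[u]), u))

    # A 1 is a filled square: this user encountered this problem.
    matrix = [[1 if p in observations[u] else 0 for p in problems] for u in users]
    return users, problems, matrix


users, problems, matrix = build_sorted_matrix(observations)
for user, row in zip(users, matrix):
    print(user, " ".join("X" if cell else "." for cell in row))
```

Row sums then give the frequency by user and column sums give the frequency by problem.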

A User by Problem matrix provides:
  1. Frequency by Problem: For example, the most common problem (far left) involved users having trouble adding GPS navigation to their rental (they had to do it after they entered their personal details). This problem affected 24 out of 30 users (80%). We can be 95% confident that between 62% and 91% of all users would also have this problem.

  2. Frequency by User:  The first user in the matrix encountered 9 of the 28 problems (32%) while user 28 encountered only 1 problem (3%) and users 29 and 30 encountered no problems while renting the car.

  3. Problems that affected only one user (the long tail of usability problems): The last nine problems were encountered by only a single user (3%).

  4. The average problem frequency: By averaging together the problem frequencies we get a sense of how common problems are. For this task, the average adjusted problem frequency is 12%, meaning users have about a 12% chance of encountering a usability problem given this task scenario. Technical Note: See the discussion on why and how to adjust Average Problem Frequencies.

  5. The percent of problems likely discovered: Given the total number of problems and the number encountered only once, after observing 30 users we've seen about 97% of the problems for this task, interface and type of user. See Chapter 7 of Quantifying the User Experience and the calculator for how to compute this figure, as well as a discussion of the limitations of this approach.
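
As a rough check on items 1 and 4, the sketch below rests on two assumptions. First, it treats the interval in item 1 as an adjusted-Wald (Agresti-Coull) interval; the article does not name the method, but this one reproduces the 62% to 91% range. Second, it treats the adjustment in item 4 as the average of a normalization and a Good-Turing correction (the GT/Norm approach). The function names are illustrative. The inputs are the rounded figures quoted on this page: 24 of 30 users, 28 problems, 9 seen only once, and an unadjusted average of 0.15.

```python
import math


def adjusted_wald_ci(x, n, z=1.96):
    """Adjusted-Wald (Agresti-Coull) interval for x occurrences among n users."""
    n_adj = n + z ** 2
    p_adj = (x + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


def adjusted_average_p(p_est, n_users, n_problems, n_singletons):
    """Average of the normalization and Good-Turing adjustments to p_est."""
    normalized = (p_est - 1 / n_users) * (1 - 1 / n_users)
    good_turing = p_est / (1 + n_singletons / n_problems)
    return (normalized + good_turing) / 2


low, high = adjusted_wald_ci(24, 30)
print(f"{low:.0%} to {high:.0%}")  # 62% to 91%

p_adjusted = adjusted_average_p(0.15, n_users=30, n_problems=28, n_singletons=9)
print(round(p_adjusted, 3))
```

With these rounded inputs the adjusted average comes out near 0.11 rather than the 12% reported, a gap that rounding of the unadjusted average could account for.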

Problem matrices illustrate the point that some problems affect a lot of users and are quickly revealed with just the first 2-3 users. Other problems, which can have just as large an impact, are often less common and don't show up until you've tested more users.
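
This effect can be illustrated with the binomial formula 1 - (1 - p)^n, the chance of seeing a problem at least once in n users. The sketch assumes every user has the same independent chance p of hitting a given problem, an assumption debated in the comments below. The name `chance_seen` is illustrative, and the values of p are the observed frequencies of the most common problem (24 of 30) and of a single-user problem (1 of 30).

```python
def chance_seen(p, n_users):
    """Chance that a problem with per-user frequency p appears at least once."""
    return 1 - (1 - p) ** n_users


for label, p in (("common problem", 0.80), ("single-user problem", 0.03)):
    for n_users in (3, 30):
        print(f"{label}, p={p:.2f}, n={n_users}: {chance_seen(p, n_users):.1%}")

# The same formula applied to the adjusted average frequency of 0.12.
print(f"average problem, n=30: {chance_seen(0.12, 30):.1%}")
```

Under these assumptions, the common problem is almost certain to appear within three users, while a 3% problem has under a 10% chance of appearing by then. Applied to the rounded 12% average at 30 users, the formula gives close to 98%, in line with the roughly 97% of problems reported as discovered.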

You can also compare problem matrices. The following image shows four matrices for two tasks on two different rental car websites: Enterprise and Budget. 

Figure 2: User by Problem matrices and average problem occurrences (P) for two tasks and two rental car websites.

For renting a car, Enterprise had an adjusted average problem frequency of 12% compared to Budget's 9% for the same task. For finding the nearest rental location for a specified address, the average problem frequency was 15% for Enterprise and 7% for Budget.

For both tasks, Enterprise had a higher frequency of usability problems and this was reflected in the lower task completion rates, longer task times and lower satisfaction scores we recorded in the comparative usability evaluation.

Taking that additional step of converting long lists of usability problems into a more numeric and visually digestible format provides both more clarity and credibility to usability testing.

About Jeff Sauro

Jeff Sauro is the founding principal of MeasuringU, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 5 books on statistics and the user-experience.

Posted Comments

There are 6 Comments

May 8, 2014 | sumesh wrote:

how to prove 1-cos2x-10cos3x=0 

May 7, 2014 | George Adamides wrote:

How do you actually build a problem matrix? What about when there are several evaluators? 

June 24, 2012 | Bex Tindle wrote:

Thanks Jeff. Your site is my first port of call for all things quant and I've found it invaluable. I also recently bought Quantifying the User Experience and am working my way through - it's exactly the resource I need to help me become a quant and qual UX researcher. I really like the idea of a usability problem matrix and am doing one for my current round of testing. Unless I find a better way I'm just going to use Excel - can you suggest a better tool? 

June 8, 2012 | Martin Schmettow wrote:


glad to hear that the editors of CACM followed my suggestion.

I understand that my answer is unsettling, and my statistical model may turn out over-conservative. But, this is totally different from accepting the binomial model (with adjustments) as a good predictor. I provided quite some evidence against the binomial estimator in earlier papers.

However, what really worries me is that just a few days ago Mr. Nielsen revived his "Five users" claim. And here we are on the same page, I hope.

Thanks for discussing this issue.


"Just when I thought I was out...they pull me back in." (Michael Corleone)

June 7, 2012 | Jeff Sauro wrote:


Thanks for your comment. A couple notes. I was using the adjusted average (GT/Norm) of p to estimate the average problem occurrence, not the simple average (I added a note of clarification) which we discuss in the book and elsewhere on the site. So the unadjusted p is .15 but the adjusted p is .12.

I think you bring a good perspective to this debate and am familiar with the paper (given I was one of the reviewers). We agree there’s no magic number and the binomial will underestimate, but the GT/Norm approach, while not as mathematically elegant, seems to provide the best number so far. I think your model has promise but it’s not quite there yet. It probably overestimates the number of undiscovered problems, and more data and testing are needed to tweak your approach.

For example, the matrix I showed in Figure 1 was part of a larger test. We were actually able to watch 48 videos of users attempting this task. After watching 18 more users we only observed 5 more new issues. So at 30 users we predicted seeing 97% of problems, which is itself an estimate that while clearly not right on the money, speaks to the massive diminishing return.

A model that predicts we saw less than half the problems says that we should continue testing a lot more users (a practical interpretation). It became a stretch to find even those 5 new issues after watching those additional 18 users.

Statistics and probability are useful in UX insofar as they lead to better decisions. They become less useful if the recommendations become divorced from the real constraints of applying them: the testing must stop at some point, budgets are limited and resources are often better allocated elsewhere.

While this might become more of an academic exercise I’d love to see more of your work with new sets of data to help refine your model. In the real world, when you’re running a usability study to find and fix problems it makes a lot more sense to stop after a handful of users and fix those major issues instead of spending more time trying to find every last possible problem.

June 7, 2012 | Martin Schmettow wrote:

Hi Jeff,

it is exactly the long tail of rarely occurring usability problems that renders the binomial formula useless for estimating completeness. By virtue of the advanced model that I describe in [1], I come to a very different conclusion: the test shown in Fig.1 uncovered only 45% of problems, not 97% as you claim.

The reason: One third of problems are singletons, you've only seen them once. What do you think? How many of those might still be out there? Plot a histogram on the bottom margin sum and you'll see what I mean. Compare this histogram to the binomial distribution with p=0.15* and n=30 and you'll see why it's a bad choice. Just watch how many singletons it predicts (answer: 1).


[1] Schmettow, M. (2012). Sample size in usability studies. Communications of the ACM, 55(4), 64. doi:10.1145/2133806.2133824

*p is 0.15 in Fig.1 
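
The singleton count in the comment above can be checked with a short sketch. It assumes, as the plain binomial model does, that all 28 problems share the same frequency p = 0.15 across 30 independent users. The name `expected_singletons` is illustrative and is not taken from the cited paper.

```python
def expected_singletons(p, n_users, n_problems):
    """Expected number of problems seen by exactly one user under a binomial model."""
    # Binomial probability of exactly one occurrence in n_users trials.
    prob_exactly_one = n_users * p * (1 - p) ** (n_users - 1)
    return n_problems * prob_exactly_one


predicted = expected_singletons(0.15, n_users=30, n_problems=28)
print(round(predicted, 1))
```

The model predicts about one singleton, against the nine observed in Figure 1.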
