Usability, Customer Experience & Statistics

What five users can tell you that 5000 cannot

Web-Analytics and User-testing

Jeff Sauro • June 16, 2010

With usability testing it used to be that we had to make our best guess as to how users actually interacted with software outside a contrived lab setting. We didn't have all the information we needed; knowing what users did was, in a sense, a puzzle with a lot of missing pieces. Web analytics now provides a wealth of data about actual usage that we simply never had before. With real-time access to click paths, time on pages, and navigation paths, we know much more about what users have done. Where we once didn't have enough information, we now have a new problem--too much information. Web analytics is transforming user behavior from a puzzle to a mystery, and mysteries require judgment and the assessment of uncertainty.

To solve the mysteries of why users are doing what they're doing, we still need to observe users and ask them about their intentions and expectations. A small lab-based study with a handful of users can tell us things analytic data from 5,000 cannot.

Why were users downloading the wrong version?

Recently I was assisting a team working on a consumer software product. There were problems with a trial version available for download from the company website. Users were calling tech support because the 64-bit version they downloaded was incompatible with their operating system. As you can imagine, getting users to install a trial and then convert to paying customers is an important business strategy, so any impediment to installation hits the bottom line.

The analytic data provided a partial answer to the mystery. There was plenty of data showing users downloading different versions of the software (some 32-bit, but most 64-bit), but you can't tell what users intended to download. Were users who mistakenly downloaded the 64-bit version misled by what they saw on the page? Did they understand the difference between the two versions? Deeper analytic mining of the users' operating systems revealed that many more of them should have been downloading the 32-bit version.

An observational study was conducted to see what users might be doing. Eleven users were observed as they browsed the website, picked their products, and went for the download. Three of the eleven downloaded the 64-bit version of the product. A few minutes into the installation, these users got the operating system error.
They needed the 32-bit version but instead downloaded and attempted to install the wrong one. Why?


Watching the users' mouse movements and asking them why they were confused made the answer obvious: a design element on the download page was luring some people to the 64-bit download. The mystery generated from the web logs was easily solved by watching and asking a few users.

Ah, but only 3 out of 11 users had the problem. You can hear the analytics team and marketing department dismissing this result as not being "statistically significant." And yet it is. If we see 3 out of 11 users have a problem, we can be 95% confident that between 9% and 52% of all users will have that problem downloading the correct version.
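The math behind that claim is a binomial confidence interval. One common choice for small usability samples is the adjusted Wald interval; the sketch below is my own illustration, not necessarily the exact calculator used for the article, so its upper bound comes out slightly higher than the 52% quoted above (exact bounds depend on the interval method and rounding):

```python
import math

def adjusted_wald_ci(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """95% adjusted Wald confidence interval for a binomial proportion.

    Adds z^2/2 pseudo-successes and z^2 pseudo-trials before applying the
    normal approximation, which behaves well even for very small n.
    """
    n_adj = n + z * z
    p_adj = (successes + z * z / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 3 of 11 users downloaded the wrong version
lo, hi = adjusted_wald_ci(3, 11)
print(f"{lo:.0%} to {hi:.0%}")  # roughly 9% to 57% of all users
```

The lower bound of roughly 9% matches the figure in the article; the point is that even 3 observations out of 11 rule out "this is a negligible problem" at the 95% level.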

It is easy to prove something is NOT usable with small sample sizes

The problem with small sample sizes is that we're only able to reliably detect major issues (issues that affect a lot of users). The good news about small sample sizes is that we're detecting issues that matter!  So when you see a problem occur repeatedly with a small sample test, it means a problem is probably affecting a lot of your users. Small sample sizes don't do a good job of finding problems that only affect a small portion of the users. As Jim Lewis likes to say:  "It is easy to prove something is NOT usable with small sample sizes. It is hard to show that something IS usable with small sample size."
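Lewis's point can be made concrete with the standard problem-discovery model: if a problem affects a proportion p of users, the chance that at least one of n test participants encounters it is 1 - (1 - p)^n. This is a textbook formula offered here as an illustration, not a calculation from the article; the example values are hypothetical:

```python
def p_at_least_once(p: float, n: int) -> float:
    """Probability that a problem affecting proportion p of users
    is observed at least once in a test with n participants."""
    return 1 - (1 - p) ** n

# A frequent problem (affecting 31% of users) will likely surface
# even with only 5 participants...
print(f"{p_at_least_once(0.31, 5):.0%}")  # ~84%

# ...but a rare problem (affecting 5% of users) will usually be missed.
print(f"{p_at_least_once(0.05, 5):.0%}")  # ~23%
```

That asymmetry is exactly why small samples are good at proving something is NOT usable: the problems they do surface are, with high probability, the frequent ones.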

Within hours a new design element was mocked up, approved, and uploaded to the web. Within 24 hours the live A/B testing results showed a 4-percentage-point increase in the download rate of the 32-bit trial version. This was definitely an improvement, but the observational study showed that even the most conservative estimate put at least 9% of users clicking the wrong trial. Apparently the new design solved only part of the problem; there's still more to fix. And more is being done: eliminating the choice altogether. A new version of the trial page will detect the correct operating system based on the user's web-signature and suggest the correct version for download (after all, not everyone knows whether their system is 32- or 64-bit).
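A server-side version of that auto-detect step might inspect the browser's User-Agent header. The tokens below are common conventions browsers use on 64-bit systems, not the company's actual implementation, and User-Agent sniffing is inherently best-effort:

```python
def suggest_installer(user_agent: str) -> str:
    """Suggest a 32- or 64-bit installer from a User-Agent string (best effort)."""
    ua = user_agent.lower()
    # Tokens browsers commonly send on 64-bit operating systems.
    tokens_64 = ("win64", "wow64", "x86_64", "amd64", "aarch64")
    if any(token in ua for token in tokens_64):
        return "64-bit"
    # Default to 32-bit: a 32-bit installer also runs on 64-bit Windows,
    # so it avoids the incompatibility error that triggered the support calls.
    return "32-bit"

print(suggest_installer("Mozilla/5.0 (Windows NT 6.1; WOW64)"))   # 64-bit
print(suggest_installer("Mozilla/5.0 (Windows NT 5.1)"))          # 32-bit
```

Note the asymmetric default: guessing 32-bit when unsure fails gracefully, which is why the article's team could still offer the explicit choice for edge cases such as network deployments.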

Analytic data is not a replacement for user testing--but it's a good place to start

Analytic data is an easy place to start understanding user behavior, but it is not a replacement for user testing. While having more information may reduce the puzzle problem, it doesn't address the mystery of why. Just like the classic whodunit, with murder weapons, motives, and suspects, we need to solve the mystery of why users do what they do. To answer that we need the classic tools of user research--the small-sample observational studies that tell us so much about user intent. There will be a continued demand for user researchers who can quantify observational data and make the most of analytic data--the Quantitative Starter package can help you get started.

About Jeff Sauro

Jeff Sauro is the founding principal of MeasuringU, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 5 books on statistics and the user experience.


Related Topics

Sample Size, Analytics

Posted Comments

There are 10 Comments

June 10, 2014 | Robert wrote:

Thanks for this insight!

On another note, as Susan pointed out, the error message is horrible. It is incomprehensible even for tech-savvy users (why isn't it supported?), is rude, blames the user, and does not offer help.

The provided solution is not a redesign of the error message, but a change in the underlying system: it now helps the user automatically download the right version. This is the best approach, as Don Norman pointed out:

"Error messages punish people for not behaving like machines. It is time we let people behave like people. When a problem arises, we should call it machine error, not human error: the machine was designed wrong, demanding that we conform to its peculiar requirements. It is time to design and build machines that conform to our requirements. Stop confronting us: Collaborate with us."

It is probably a standard error message used by the framework the software is written in, anyway.

May 29, 2012 | Lucas wrote:

Hey Jeff,
One thing that I'm having a hard time understanding is: how do you determine if a 9-to-52% issue is a BIG problem or a "NiceToHave"? I mean, if the issue you found only occurs to 9 in 100 people it's not a big deal, but what if it happens to 1 out of 2? That'll be a mess!
So, I understand the value of understanding the "Why?" of an issue but I'm not sure you can determine the impact of it like this...
What do you think?


June 23, 2010 | Jeff Sauro wrote:

I like your depiction of user-researchers changing a light-bulb—I think there's a good joke in there somewhere. Problems can appear to be so obvious after the fact that a little bit of common sense would have seemed to catch them. Such hindsight bias is common with usability problems, but if there is no evidence that users will have a problem with a design element, it becomes harder to make changes in light of other business rules. One complication in this situation was that users can be downloading for their own machine or downloading to install on a network—meaning the auto-detect isn't a no-brainer and will also likely introduce new issues for users who need to deploy the software on a network.

June 23, 2010 | Jeff Sauro wrote:

Heuristic Evaluations can catch problems in designs and are best when performed by two to three trained professionals. Problems have a way of seeming patently obvious after they've been discovered. With the number of eyes that had seen and approved this screen, I suspect this one would also have fallen through the cracks even with 1-3 experts reviewing it.

June 22, 2010 | scott wrote:

I like how the article points out how small samples can effectively highlight problems with a product. The "insufficient sample" argument is one that I hear from time to time.

June 18, 2010 | Beverly Freeman wrote:

Great article. However, if there was a design element that was luring people to the wrong download, couldn't that have been caught during a heuristic evaluation? A trained human factors professional could look at the download screen and predict the issue without running a single participant. 

June 18, 2010 | Wendy Castleman wrote:

Insightful as always, Jeff. I think the key is that most people don't understand the difference between reliability (repeatability) and validity. So, if in a small sample you find a usability problem, that is a valid problem. It may not be a reliable one, but it is still valid, and should therefore be taken seriously.

June 17, 2010 | Susan wrote:

The error message could be greatly improved to explain the problem and offer the correct link too. 

June 16, 2010 | Briac wrote:

You're absolutely right on the "missing why"; this is how user testing completes analytics. However, I don't think this example really supports the whole research process, whether analytics or user testing. Indeed, you really could have thought of that in the first place. You state that downloading the right version is the business objective. Detecting the operating system and offering the right version sounds like a no-brainer; the cost involved in developing the necessary code and design is probably less than doing the research. Before thinking of your conversion rate and failure points, you think of designing a pleasant user experience, and going the extra mile by suggesting the right version is value-added anyway.
I mean, it's not even solving a problem; even before knowing that there is a problem, you know it's a really nice feature!
Now, for management to make the decision to invest in that feature, you have two choices. Either you have the opinion that it's a good design choice, or you try to find out if there really is a need for it (a problem) and then justify by the numbers why the solution is needed. The first approach requires faith-based decisions made by a good designer, the second a team of analysts/researchers. I'm not fond of opinionated decisions but seriously, in this case research is overkill.

When management comes to you with this information--"Users were calling tech-support because the 64-bit version they downloaded was incompatible with their operating system"--it's gold!
It should tell you right away: "I can't find the right link because either I have no clue what to choose or you have a bad design making the choice unclear." What else?
You even make the assumption yourself: "after all, not everyone knows whether their system is 32 or 64bit," so why not offer this feature in the first place?

This is the kind of example that makes me picture a team of researchers trying to change a lightbulb: "we tried to understand why people can't walk around in the dark but really, they might just need light."

June 16, 2010 | Ben Shaw wrote:

Thanks Jeff, it really helps having you continue to arm us with arguments re the benefits and validity of doing small-number studies. 


