Usability, Customer Experience & Statistics

What happens to task-ratings when you interrupt users?

Jeff Sauro • November 30, 2010

In usability testing we ask users to complete tasks and often ask them to rate how difficult or easy the task was. Does it matter when you ask this question?

What happens if we interrupt users during the task instead of asking it after the task experience is over?

Almost ten years ago, researchers at Intel asked 28 users to complete various tasks across several websites.

Half the users were interrupted during the task (after 30 or 120 seconds) and asked to rate the task ease on a 7-point scale (concurrent ratings). The other half attempted the same tasks and answered the same question, but only after the task was completed (post-task ratings).

They found users rated tasks 26% easier after the task than during it. The concurrent group's post-task ratings were also lower than those of the post-task-only group--suggesting that the act of rating concurrently itself lowers post-task ratings.

5 Seconds Tests using SUS

In earlier research, I found that when users are interrupted after only five seconds, their System Usability Scale (SUS) scores are the same as those of users who had no time limit. However, users who were interrupted after 60 seconds had higher SUS ratings than both the 5-second and no-time-limit groups.

These results suggested interruption either had no effect or actually increased ratings on post-test questionnaires.

Task-Level Ratings are Different than Overall Impressions

One possibility for the differences is simply that users respond to task-level questions differently than they do to overall website usability questions. To investigate this phenomenon further, I recruited 66 users and interrupted them during website task attempts.

Tasks Interrupted at 5 and 60 seconds

I randomly assigned users to one of two retail websites and to one of three interruption conditions: 5 seconds, 60 seconds, or no interruption. All users answered the 7-point Single Ease Question (SEQ) after attempting to locate an item in the stores.

The graph below shows that the results concurred with the earlier research. Users who were interrupted at 5 and 60 seconds rated the task as more difficult than users who had no time limit [F(2, 65) = 3.45, p < .04].

Figure 1: Task ease ratings were lower when rated after only having 5 or 60 seconds to complete the task

The no-time-limit group's average rating was 50% higher than the 5-second group's and 26% higher than the 60-second group's. In the earlier research conducted at Intel, the 30- and 120-second groups were combined, so the 60-second group is likely the best comparison--both were 26% higher.

Users also completed the SUS, and the differences between groups were not statistically significant (p > .10).

With this sample size there is a definite difference between the 5-second and no-time-limit groups; however, the 60-second group is only marginally different from each adjacent group. There also appears to be a linear pattern: ratings improve with more time. A larger sample size would be needed to confirm this pattern.
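For readers who want to run this kind of three-group comparison on their own data, the one-way ANOVA F statistic can be computed directly. The ratings below are purely hypothetical, illustrative numbers (not the study's actual data), one list per interruption condition:

```python
# Hypothetical 7-point SEQ ratings for three interruption conditions
# (illustrative numbers only -- not the study's data).
groups = {
    "5s":   [3, 4, 3, 5, 4, 3, 4, 2, 3, 4],
    "60s":  [4, 5, 4, 5, 3, 5, 4, 4, 5, 4],
    "full": [6, 5, 6, 5, 6, 7, 5, 6, 5, 6],
}

def one_way_anova_f(samples):
    """Return the F statistic and degrees of freedom for a one-way ANOVA."""
    k = len(samples)                     # number of groups
    n = sum(len(g) for g in samples)     # total observations
    grand_mean = sum(sum(g) for g in samples) / n
    # Between-group sum of squares: group sizes times squared mean deviations
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in samples)
    # Within-group sum of squares: squared deviations from each group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in samples)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df1, df2 = one_way_anova_f(list(groups.values()))
print(f"F({df1}, {df2}) = {f:.2f}")  # prints: F(2, 27) = 22.78
```

With real data, `scipy.stats.f_oneway` gives the same F statistic along with its p-value; the hand-rolled version above just makes the between/within decomposition explicit.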

When should task ratings be taken?

Are users artificially rating the tasks as easier at the end of the task, or artificially rating them as too hard during the task? It's hard to say, and I can think of compelling explanations for both theories.

Give me more time!: Interrupted users might rate tasks as more difficult because they didn't feel they had enough time to complete them.

Oh, that task was easy because I completed it: Post-task ratings might be inflated because users give less weight to the negative aspects of the task and are affected by a sense of accomplishment.


A few things to consider about interrupting users to obtain concurrent ratings:
  • Interrupted users will rate tasks as more difficult, but they don't necessarily consider the overall website more difficult.
  • Concurrent ratings will add time to the test session and to task time (around 15% more).
  • Don't mix ratings: compare post-task ratings to other post-task ratings and concurrent ratings to other concurrent ratings.
  • Concurrent ratings might be better for diagnosing interaction problems (much like eye-tracking) than post-task ratings, since they allow you to identify more precise problem points than the task level does.

About Jeff Sauro

Jeff Sauro is the founding principal of MeasuringU, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 5 books on statistics and the user experience.
More about Jeff...

Posted Comments

There are 5 Comments

December 8, 2010 | Jennifer Romano Bergstrom wrote:

What about waiting until the entire session is complete, after 5-10 tasks, and asking participants to rate satisfaction with the interface? 

December 1, 2010 | Jeff Sauro wrote:

I generally take an "all of the above approach" to usability measurement. Errors are great and underused metrics (mostly because they take more effort to collect and interpret).
Ease ratings tell you something that errors, time and completion rates don't tell you and they are generally easy to collect on both prototypes and completed systems. They provide a great measure for making comparisons to benchmarks or previous iterations and as this data suggests, tell you something post-test ease ratings don't. 

December 1, 2010 | Jeff Sauro wrote:


You're right, getting interrupted while trying to do something else is annoying. Why would we not expect this to carry over to those ratings?

December 1, 2010 | Whitney wrote:

How about, "I'm annoyed by being interrupted, so I'll punish you for doing so."

I'll confess now that when I find a satisfaction questionnaire that has required questions or other rude, annoying features, I routinely give those questions the worst possible rating.  

December 1, 2010 | Michael Van der Gaag wrote:

What happens to ratings if you describe to users the difficulty you observed... or the divergence from the optimal path?
Anecdotally, while I have not used this approach to the exclusion of users' initial ratings, I do find that users will reconsider their rating after divergent paths or difficulties are described. While the rating may be considered "tainted," it does help point out areas of difficulty in an interaction.
Some clients are opting out of ease-of-use ratings because they are less reliable than error rates and observed difficulty. Thoughts?
