Jeff Sauro • December 6, 2016

Researchers rely heavily on sampling. It's rarely possible, or even sensible, to measure every single member of a population (all customers, all prospects, all homeowners, etc.).

But when you use a sample, the average value you observe (e.g., a completion rate or average satisfaction) will almost always differ from the actual population average.

Consequently, differences between designs, or in attitudes measured in a questionnaire, may be the result of random noise rather than an actual difference. This noise is referred to as sampling error.
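To make sampling error concrete, here's a minimal simulation sketch. It assumes a hypothetical population whose true completion rate is exactly 60% and draws several independent samples of 75 participants from it. Every sample comes from the same population, so there's no real difference between them, yet the observed rates still vary. The function name `simulate_sample_rates` and all parameter values are illustrative, not from the article.

```python
import random

def simulate_sample_rates(true_rate, n, num_samples, seed=1):
    """Draw num_samples independent samples of size n from a population
    with a known completion rate; return each sample's observed rate."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rates = []
    for _ in range(num_samples):
        # Each participant completes the task with probability true_rate.
        completions = sum(rng.random() < true_rate for _ in range(n))
        rates.append(completions / n)
    return rates

# Hypothetical population: true completion rate of 60%, samples of 75.
rates = simulate_sample_rates(true_rate=0.60, n=75, num_samples=5)
for r in rates:
    print(f"Observed completion rate: {r:.0%}")
```

Any spread you see among the printed rates is pure sampling error, because the underlying population never changes.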

Understanding and appreciating the consequences of sampling error and statistical significance is one thing. Conveying them to a reader is another challenge, especially if the reader is less quantitatively inclined.

Picking the "right" visualization is a balance between knowing your audience, working with conventions in your field, and not overwhelming your reader. Here are six ways to indicate sampling error and statistical significance to the consumers of your research.

When two confidence intervals don't overlap, the difference between them is statistically significant at that level of confidence (in most cases). For example, Figure 1 shows the findability rates for different products on two websites, along with 90% confidence intervals depicted as black "whisker" error bars.

Almost 60% of 75 participants found the sewing machine on website B compared to only 4% of a different group of 75 participants on website A. The lower boundary of website B's findability rate (49%) is well above the upper boundary of website A's findability rate (12%). This difference is statistically significant at p < .10.
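The article doesn't state the raw counts or which interval method produced Figure 1, so the sketch below rests on two assumptions: hypothetical counts consistent with the rounded percentages (3 of 75 on website A, 44 of 75 on website B), and the adjusted-Wald (Agresti-Coull) interval, a common choice for completion rates. It checks the intervals for overlap and then confirms the result with a pooled two-proportion z-test, which is one of several reasonable tests here. Because of these assumptions, the boundaries it prints won't necessarily match the figure's 49% and 12%.

```python
from statistics import NormalDist

def adjusted_wald_ci(successes, n, confidence=0.90):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    margin = z * (p_adj * (1 - p_adj) / n_adj) ** 0.5
    # Clamp to [0, 1]: a proportion can't fall outside that range.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)  # overall rate if there's no real difference
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (x2 / n2 - x1 / n1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts consistent with the rounded percentages:
# 3 of 75 on website A (4%), 44 of 75 on website B (about 59%).
ci_a = adjusted_wald_ci(3, 75)
ci_b = adjusted_wald_ci(44, 75)
print(f"Website A 90% CI: {ci_a[0]:.0%} to {ci_a[1]:.0%}")
print(f"Website B 90% CI: {ci_b[0]:.0%} to {ci_b[1]:.0%}")
print("Intervals overlap:", intervals_overlap(ci_a, ci_b))
print("Significant at p < .10:", two_proportion_p_value(3, 75, 44, 75) < 0.10)
```

The overlap check is the visual shorthand; the z-test is the formal test it stands in for. When the intervals are far apart, as here, both lead to the same conclusion.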

You can also see that website A's findability rate is unlikely to exceed 15%. Its upper boundary is 12%, and with a two-sided 90% confidence interval there's only about a 5% chance that the true rate lies above the upper boundary. So with a sample size of 75, there's less than a 5% chance that the findability rate exceeds 15%. Of course, a 15% findability rate would still be abysmally low: only about 1 in 7 people would ever find the sewing machine.
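The "about 5%" figure comes from how a two-sided interval splits its leftover probability: a 90% interval leaves 10% outside, half below the lower boundary and half above the upper one. Here's a small sketch of that arithmetic using Python's standard `statistics.NormalDist`, assuming the usual normal approximation behind the interval.

```python
from statistics import NormalDist

confidence = 0.90
# A two-sided interval splits the remaining 10% evenly between both tails.
tail_above_upper = (1 - confidence) / 2
# Critical value that leaves that tail area above it.
z_crit = NormalDist().inv_cdf(1 - tail_above_upper)

print(f"Chance above the upper boundary: {tail_above_upper:.0%}")  # → 5%
print(f"Critical z for a two-sided 90% interval: {z_crit:.3f}")    # → 1.645
print(f"1 in 7 expressed as a rate: {1 / 7:.1%}")                  # → 14.3%
```

Since 15% sits above the 12% upper boundary, the chance the true rate exceeds 15% is smaller still than this 5% tail.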

This is my preferred method for displaying statistical significance. But even experienced researchers with strong statistics backgrounds have trouble interpreting confidence intervals, and they aren't always the best option, as we'll see below.

The standard error feeds into many statistical calculations (e.g., computing confidence intervals and testing for statistical significance), so one advantage of showing just the standard error is that other researchers can more easily derive their own computations from it.
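As an illustration of those derived computations, the sketch below starts from a reported standard error and rebuilds confidence intervals at several levels. It assumes a hypothetical 60% completion rate from 75 participants and uses the simple Wald interval (estimate plus or minus z times the standard error) for clarity; for small samples or rates near 0% or 100%, an adjusted method is usually preferable. The helper names are illustrative.

```python
from statistics import NormalDist

def standard_error_proportion(p, n):
    """Standard error of an observed proportion p from a sample of size n."""
    return (p * (1 - p) / n) ** 0.5

def ci_from_se(estimate, se, confidence):
    """Wald-style interval: estimate +/- z * SE at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return estimate - z * se, estimate + z * se

# Hypothetical: a 60% completion rate observed with 75 participants.
se = standard_error_proportion(0.60, 75)
print(f"Standard error: {se:.3f}")
for level in (0.68, 0.90, 0.95):
    low, high = ci_from_se(0.60, se, level)
    print(f"{level:.0%} CI: {low:.1%} to {high:.1%}")
```

Anyone who has the estimate and its standard error can reproduce these intervals, which is exactly what makes reporting the standard error convenient for other researchers.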

The main disadvantage I see is that people still read standard error bars as confidence intervals, but their non-overlap no longer corresponds to the typical thresholds of statistical significance. Error bars of plus or minus one standard error are roughly equivalent to a 68% confidence interval. Figure 3 shows the 90% confidence intervals for the same data. There you can see that R1 and R2 overlap (they are NOT statistically different), whereas the lack of a statistically significant difference is harder to spot with standard error bars (Figure 2).
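To see how the two displays can tell different stories, here's a sketch with hypothetical means and standard errors for two items labeled R1 and R2 (these are not the data behind Figures 2 and 3). It assumes normally distributed estimates and independent samples, confirms that plus or minus one standard error covers about 68%, and shows a case where the standard error bars don't overlap, the 90% confidence intervals do, and a two-sample z-test finds no significant difference at p < .10.

```python
from statistics import NormalDist

std_normal = NormalDist()
# Share of a normal distribution within +/- 1 standard error of the mean.
coverage_1se = std_normal.cdf(1) - std_normal.cdf(-1)
print(f"+/-1 SE covers {coverage_1se:.1%}")

def interval(mean, se, z):
    """Error bar spanning mean +/- z standard errors."""
    return mean - z * se, mean + z * se

def overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical means and standard errors for R1 and R2.
r1_mean, r1_se = 4.00, 0.10
r2_mean, r2_se = 4.22, 0.10
z90 = std_normal.inv_cdf(0.95)  # critical value for a two-sided 90% interval

se_bars_overlap = overlap(interval(r1_mean, r1_se, 1), interval(r2_mean, r2_se, 1))
ci90_overlap = overlap(interval(r1_mean, r1_se, z90), interval(r2_mean, r2_se, z90))

# Two-sample z-test on the difference (assumes independent samples).
z = (r2_mean - r1_mean) / (r1_se**2 + r2_se**2) ** 0.5
p = 2 * (1 - std_normal.cdf(abs(z)))
print(f"SE bars overlap: {se_bars_overlap}")
print(f"90% CIs overlap: {ci90_overlap}")
print(f"p = {p:.3f}")
```

A reader scanning the standard error bars alone would likely conclude the items differ, even though the test says otherwise.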

