# Confidence Interval Calculator for a Completion Rate

Jeff Sauro • October 1, 2005

Use this calculator to calculate a confidence interval and best point estimate for an observed completion rate. This calculator provides the Adjusted Wald, Exact, Score and Wald intervals.

Inputs: Passed, Total Tested, Confidence Level (99%, 98%, 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%), and Likely Population Completion Rate (Between .5 and 1, or Unknown).

Results: confidence intervals (Low, High, Margin of Error*) for the Adj. Wald, Exact, Score and Wald methods, and point estimates (Best Estimate, MLE, LaPlace, Jeffreys, Wilson).

### Explanation

The Adjusted Wald method should be used almost all the time. For exceptions, see below.
For a detailed discussion of binomial confidence intervals with small samples, see the HFES paper and for a discussion on the best point estimate see the JUS paper.

The adjusted Wald interval (also called the modified Wald interval) provides the best coverage for the specified interval when samples are smaller than about 150. In other words, if you want a 95% confidence interval, this formula will produce an interval that contains the population proportion, on average, about 95 percent of the time. It uses the Wald formula but is "adjusted" in that it adds half of the squared z-critical value to the numerator and the entire squared critical value to the denominator before computing the interval, i.e. (x + z²/2)/(n + z²). For example, a 95% confidence level uses the z-critical value of 1.96, or approximately 2. If you observe 9 out of 10 users completing a task, this formula computes the proportion as (9 + 1.96²/2)/(10 + 1.96²) = approx. 11/14 and builds the interval using the Wald formula. Note: Prior to March 1st 2006, this calculator computed this interval by adding one z-value to the numerator and a squared z-value to the denominator.
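
The adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code: the function name `adjusted_wald` is hypothetical, the bounds are simply clamped to the range 0 to 1 rather than displayed as ">.999", and the one-sided critical value used for 0% or 100% results is not handled.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald(x, n, confidence=0.95):
    """Return (low, high, adjusted proportion) for x successes in n trials."""
    # Two-sided z-critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z**2                 # add z^2 to the denominator
    p_adj = (x + z**2 / 2) / n_adj   # add z^2 / 2 to the numerator
    # Ordinary Wald margin, computed on the adjusted values.
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Assumption: clamp the crude bounds to [0, 1].
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin), p_adj


low, high, p_adj = adjusted_wald(9, 10)
print(f"adjusted proportion = {p_adj:.3f}, interval = ({low:.3f}, {high:.3f})")
```

For 9 out of 10, the adjusted proportion comes out near 0.789 (roughly 11/14), and the crude upper bound exceeds 1 before clamping.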

Exact Method

The Exact method was designed to guarantee at least 95% coverage, whereas the approximate methods (adjusted Wald and Score) provide an average coverage of 95% only in the long run. Use the Exact method when you need to be sure you are calculating a 95% or greater interval - erring on the conservative side. For example, at the population completion rate of 97.8% both the Score and adjusted Wald methods had actual coverage that fell to 89%. When the risk of this level of actual coverage is inappropriate for an application, then the Exact method provides the necessary precision.
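
The page does not say which exact method the calculator implements. As a sketch, assuming it is the common Clopper-Pearson interval, the bounds can be found by searching for the proportions at which the binomial tail probabilities equal alpha/2. The names `binom_cdf` and `clopper_pearson` are hypothetical, the search uses plain bisection instead of a beta-distribution inverse, and the one-sided treatment of 0% and 100% results is not implemented.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, iterations=100):
    """Root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, confidence=0.95):
    """Two-sided Clopper-Pearson interval for x successes in n trials."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


print(clopper_pearson(9, 10))
```

Summing the binomial terms directly is fine for small usability samples but becomes slow and numerically fragile for very large n, where a beta-quantile routine from a statistics library is the safer choice.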

Score Method

The Score method provides coverage better than the Exact and Wald methods but falls short of the adjusted Wald method. Its drawbacks are its computational difficulty and its poor coverage for some values when the population completion rate is around 98% or 2%, regardless of sample size (Agresti and Coull, 1998). The only advantage in using the Score method is that it provides more precise endpoints when the ends of the intervals are close to 0 or 1. For some values (e.g. 9/10) the adjusted Wald's crude interval goes beyond 0 or 1 and a substitution of >.999 is used. For the Score method, the upper limit is .9975.
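
For reference, here is a sketch of the standard Wilson score interval without a continuity correction. Whether the calculator uses exactly this variant is an assumption, so its output may differ from this sketch; the function name `score_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(x, n, confidence=0.95):
    """Wilson score interval (no continuity correction) for x successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # The score interval stays inside [0, 1] analytically; the clamp only
    # guards against floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


print(score_interval(7, 10))
```

The center of this interval equals the adjusted proportion (x + z²/2)/(n + z²), which is why the adjusted Wald interval is often described as a simpler approximation to the score interval.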

Wald Method

The Wald method should be avoided if calculating confidence intervals for completion rates with sample sizes less than 100. Its coverage is too far from the nominal level to provide a reliable estimate of the population completion rate. As the sample size increases above 100, all four methods converge to similar intervals. Use the Wald as a point of reference or for larger sample sizes.
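
A short sketch of the plain Wald interval shows why it struggles with small samples. The function name `wald` is hypothetical, and the bounds are deliberately left unclamped so the problems are visible.

```python
from math import sqrt
from statistics import NormalDist


def wald(x, n, confidence=0.95):
    """Plain Wald interval: p +/- z * sqrt(p * (1 - p) / n), left unclamped."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


print(wald(9, 10))      # upper bound runs past 1
print(wald(10, 10))     # zero-width interval at exactly 1
print(wald(900, 1000))  # with a large sample the interval is narrow and well behaved
```

With 10 out of 10 the standard error is zero, so the interval collapses to a single point and conveys no uncertainty at all.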

* The "Margin of Error" values are half the width of the confidence intervals. For the adjusted Wald and Wald formulas, you can report the interval as the proportion +/- the margin of error. For the Exact method, the intervals become less symmetrical as the proportion complete gets further from 50% (e.g. 90% or 15%). Therefore the margin of error should be used only as an approximation for the Exact method, and the actual values above and below the proportion should be reported.

When All Users Pass or Fail

With small sample sizes, it is a common occurrence that all users in the sample will complete a task (100% completion rate) or all will fail the task (0% completion rate). For these scenarios, it is often unpalatable to report 100% or 0%. After all, how likely is it that the true population parameter is as extreme as 100% or 0%? The Best Estimate box provides the best point estimate under these conditions and uses the LaPlace method for calculation. While this value may seem too far from the observed 100%, its attractiveness is that it is a function of the sample size-- the greater the sample size, the closer this value will be to 100%.
Calculation Note: When the observed completion rate is 100% or 0% there cannot be a two sided confidence interval (since you cannot have more than 100% or less than 0%). In these cases it is necessary to use a z-critical value for a one-sided confidence interval. For example, a 95% two sided confidence interval uses the z-score of approximately 1.96, a one sided interval uses a z-score of approximately 1.64.
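
The two critical values quoted in the note can be reproduced with the Python standard library. This is only an illustration of the one-sided versus two-sided distinction; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence, two_sided=True):
    """z-critical value for the given confidence level."""
    alpha = 1 - confidence
    # A two-sided interval splits alpha across both tails;
    # a one-sided interval puts all of alpha in one tail.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


print(round(z_critical(0.95), 2))                   # two-sided
print(round(z_critical(0.95, two_sided=False), 2))  # one-sided
```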

Likely Population Completion Rate

This drop-down has two options:

Between .5 and 1
If you conduct usability tests in which your task completion rates are roughly restricted to the range of .5 to 1.0, then select "Between .5 and 1" in the drop-down. See the Best Estimate section below for how the point estimate is calculated with this option.

Unknown
If your task completion rates typically take a wide range of values, uniformly distributed between 0 and 1, then select "Unknown" from the drop-down. If you don't know either way then leave it at "Unknown." This selection will use the LaPlace method for the best estimate of the completion rate.

Point Estimates

Whereas a confidence interval describes a likely range or interval of values, a point estimate describes a single value (a point) as an estimate of an unknown parameter in the population. It is extremely unlikely that the sample point estimate is exactly the same as the unknown population completion rate. For that reason, you should always compute a confidence interval when reporting a completion rate. It is much more informative than a point estimate since it provides a reasonably likely boundary for the population completion rate.
Although it receives little attention in introductory statistics classes and has had little influence on measurement practices in the field of usability engineering, there is a rich history of alternative methods developed to achieve a more accurate point estimate of p than simply dividing the number of successes by the number of attempts (for example, see Chew, 1971; Laplace, 1812; Manning & Schutze, 1999). This need is most evident when there is an extreme outcome, specifically, when x=0 (0%) or x=n (100%) - especially, but not exclusively, when sample sizes are small. Four estimation methods that pertain to situations more common in usability testing are detailed below:

MLE (Maximum Likelihood Estimate) x/n

The MLE is the sample proportion or the number of users succeeding divided by the total attempting. It is the most common point estimate reported.

LaPlace (x+1)/(n+2)

A famous large-sample problem comes from the seminal work of Laplace in the early 1800s. He posed the question of how certain you can be that the sun will rise tomorrow, given that you know that it has risen every day for the past 5000 years (1,825,000 days). You can be pretty sure that it will rise, but you can't be absolutely sure. The sun might explode, or a large asteroid might smash the Earth into pieces. In response to this question, he proposed the Laplace Law of Succession, which is to add one to the numerator and two to the denominator ((x+1)/(n+2)). Applying this procedure, you'd be 99.999945% sure that the sun will rise tomorrow - close to 100%, but slightly backed away from that extreme. The magnitude of the adjustment is greater when sample sizes are small. For example, if you observe two out of two successes and apply the LaPlace procedure, then your estimate of p is 75% (x+1=3, n+2=4, p=3/4) rather than 100%. If you had observed two failures, then your estimate of p is 25% (x+1=1, n+2=4, p=1/4) rather than 0%. LaPlace in essence is saying, the next result is a toss up, so give each alternative an equally likely chance of occurring.
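
The arithmetic in this section is easy to check. The sketch below uses exact fractions; the function name `laplace` and the 365-day year are assumptions taken from the figures quoted above.

```python
from fractions import Fraction


def laplace(x, n):
    """Laplace Law of Succession: (x + 1) / (n + 2)."""
    return Fraction(x + 1, n + 2)


days = 5000 * 365                        # 1,825,000 sunrises observed
sunrise = laplace(days, days)
print(f"{float(sunrise) * 100:.6f}%")    # close to, but below, 100%

print(laplace(2, 2))                     # two successes out of two
print(laplace(0, 2))                     # two failures out of two
```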

Wilson (x+z²/2)/(n+z²)

Wilson's point estimate is the midpoint of the adjusted Wald interval. It is derived by adding half of the squared critical value to the numerator and the full squared critical value to the denominator. Wilson's is the more conservative approach.

Jeffreys (x+.5)/(n+1)

Jeffreys (1961) provided a compromise between the LaPlace and MLE methods. See reference for technical details.
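
To compare the four estimators side by side, the formulas in the headings above can be evaluated for a single result. This is a sketch: the function name `point_estimates` is hypothetical, and the Wilson estimate assumes the two-sided critical value for the chosen confidence level.

```python
from statistics import NormalDist


def point_estimates(x, n, confidence=0.95):
    """Return the four point estimates for x successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "MLE": x / n,
        "LaPlace": (x + 1) / (n + 2),
        "Wilson": (x + z**2 / 2) / (n + z**2),
        "Jeffreys": (x + 0.5) / (n + 1),
    }


for name, value in point_estimates(9, 10).items():
    print(f"{name:9s} {value:.4f}")
```

For 9 out of 10, Jeffreys falls between the MLE and LaPlace estimates, and Wilson pulls the estimate furthest toward .5.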

Best Estimate

The best point estimate is calculated using the following logic: If "Unknown" is selected from the Likely Population Completion Rate drop-down, the LaPlace method is used. The smaller your sample size and the farther your initial estimate of p is from .5, the greater the benefit over the MLE.

If "Between .5 and 1" is selected from the Likely Population Completion Rate drop-down and the observed completion rate is:

1. Less than or equal to .5: the Wilson method is used.
2. Between .5 and .9: the MLE is used.
3. Greater than .9: the LaPlace method is used (Note, if 1 > x > .9 the Jeffreys method is also a viable alternative).
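
The selection logic above can be sketched as a small function. The name `best_estimate` and the `likely_range` argument values are hypothetical stand-ins for the drop-down, and treating an observed rate of exactly .9 as belonging to the MLE case is an assumption based on a literal reading of the list; the calculator may treat that boundary differently.

```python
from statistics import NormalDist


def best_estimate(x, n, likely_range="unknown", confidence=0.95):
    """Pick a point estimate for x successes in n trials.

    likely_range is "unknown" or "between_.5_and_1" (hypothetical labels).
    """
    laplace = (x + 1) / (n + 2)
    if likely_range == "unknown":
        return laplace
    # Integer comparisons avoid floating-point trouble at the .5 and .9 cut-offs.
    if 2 * x <= n:                           # observed rate <= .5
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        return (x + z**2 / 2) / (n + z**2)   # Wilson
    if 10 * x <= 9 * n:                      # observed rate above .5, up to .9
        return x / n                         # MLE
    return laplace                           # observed rate > .9


print(best_estimate(4, 10, "between_.5_and_1"))
print(best_estimate(8, 10, "between_.5_and_1"))
print(best_estimate(10, 10, "between_.5_and_1"))
```
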
Need more information? Be sure to check out the online confidence interval tutorial.

References

1. Agresti, A., and Coull, B. (1998). Approximate is better than 'exact' for interval estimation of binomial proportions. The American Statistician, 52, 119-126.

2. Chew, V. (1971). Point estimation of the parameter of the binomial distribution. The American Statistician, 25, 47-50.

3. Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford: Clarendon Press, pp. 179-192.

4. Laplace, P. S. (1812). Théorie analytique des probabilités. Paris, France: Courcier.

5. Lewis, J.R. & Sauro, J. (2006) "When 100% Really Isn't 100%: Improving the Accuracy of Small-Sample Estimates of Completion Rates" in Journal of Usability Studies Issue 3, Vol. 1, May 2006, pp. 136-150

6. Manning, C. D., & Schutze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

7. Sauro, J. & Lewis, J.R. (2005) "Estimating Completion Rates from Small Samples using Binomial Confidence Intervals: Comparisons and Recommendations" in Proceedings of the Human Factors and Ergonomics Society Annual Meeting (HFES 2005) Orlando, FL

Jeff Sauro is the founding principal of MeasuringU, a company providing statistics and usability consulting to Fortune 1000 companies.
He is the author of over 20 journal articles and 5 books on statistics and the user-experience.


October 25, 2016 | Alex wrote:

Hello,
How to calculate confidence interval for a function of two proportions f(p1,p2), for instance f(p1,p2)=p1*p2 or =(p1*p2)^0.5, where p1 and p2 could be both - small sample size and marginal (close to 100% or 0%).

March 29, 2016 | Khalil wrote:

Hi, thanks for very informative blog. Wilson point estimate is computed as a part of the Adjusted Wald confidence interval. My question is, would it be ok to use a different point estimate - not Wilson but use with it an adjusted wald CI? Example: 7/10=0.7 MLE is used as best point estimate but can we use Adjusted Wald CI with it? for instance 0.7 +/- 0.2522

January 7, 2016 | sivakumar sivaraj wrote:

I want to decode the Transport (BUS/TRAIN/BIKE/CAR/OTHER) variable as numbers for analysis. How do I decode this nominal variable for regression analysis, multivariate analysis and logistic regression analysis? Please advise me on the decoding for each of the analyses mentioned above.

I already coded the above variable for multivariate analysis as (BUS=1/TRAIN=2/BIKE=3/CAR=3/OTHER=3).

For logistic:

| ID | BUS | TRAIN | BIKE | CAR |
|---|---|---|---|---|
| 101 | 1 | 0 | 0 | 0 |
| 102 | 0 | 1 | 0 | 0 |
| 103 | 1 | 0 | 1 | 1 |
| 104 | 0 | 0 | 0 | 1 |
| 105 | 0 | 1 | 0 | 0 |

How should I decode it for a multiple regression model? Is the above coding right or wrong? Help me. Thanks, siva!

July 28, 2015 | Greg Asztalos wrote:

Hi Jeff, Thank you for your online contribution to binomial confidence interval calculations. The biologic industry, including our company has been using your calculator since at least 2009, as referenced in “Validation of Automation Systems for Immunohematological Testing Before Implementation; Approved Guideline”. Clinical and Laboratory Standards Institute, I/LA33-A, Vol. 29 No. 28 (http://clsi.org/).

I was comparing your calculator with Statpages and Minitab and observed some differences with the Lower Bound calculation using the Exact method. For 5000 Total Tested Samples, your calculator seems to have a jump in the lower bound confidence (see below). Does your calculator have a sample size limitation?

*http://statpages.org/confint.html#Binomial
**http://www.measuringu.com/wald.htm

***Minitab > Stats > Basic Statistics > 1 Proportion

| Confidence | Pass | Tot | Prop | *Statpages | **MeasuringU | ***Minitab |
|---|---|---|---|---|---|---|
| 0.9 | 1 | 5 | 0.2 | 0.0102 | 0.0102 | 0.0102 |
| 0.9 | 2 | 5 | 0.4 | 0.0764 | 0.0764 | 0.0764 |
| 0.9 | 3 | 5 | 0.6 | 0.1893 | 0.1893 | 0.1893 |
| 0.9 | 4 | 5 | 0.8 | 0.3426 | 0.3426 | 0.3426 |
| 0.9 | 5 | 5 | 1 | 0.5493 | 0.6310 | 0.6310 |
| 0.9 | 10 | 50 | 0.2 | 0.1127 | 0.1127 | 0.1127 |
| 0.9 | 20 | 50 | 0.4 | 0.2831 | 0.2831 | 0.2831 |
| 0.9 | 30 | 50 | 0.6 | 0.4739 | 0.4739 | 0.4739 |
| 0.9 | 40 | 50 | 0.8 | 0.6844 | 0.6844 | 0.6844 |
| 0.9 | 50 | 50 | 1 | 0.9418 | 0.9550 | 0.9550 |
| 0.9 | 100 | 500 | 0.2 | 0.1710 | 0.1710 | 0.1710 |
| 0.9 | 200 | 500 | 0.4 | 0.3635 | 0.3635 | 0.3635 |
| 0.9 | 300 | 500 | 0.6 | 0.5626 | 0.5626 | 0.5626 |
| 0.9 | 400 | 500 | 0.8 | 0.7683 | 0.7683 | 0.7683 |
| 0.9 | 500 | 500 | 1 | 0.9940 | 0.9954 | 0.9954 |
| 0.9 | 1000 | 5000 | 0.2 | 0.1907 | 0.4747 | 0.1907 |
| 0.9 | 2000 | 5000 | 0.4 | 0.3885 | 0.6890 | 0.3885 |
| 0.9 | 3000 | 5000 | 0.6 | 0.5885 | 0.7801 | 0.5885 |
| 0.9 | 4000 | 5000 | 0.8 | 0.7905 | 0.8300 | 0.7905 |
| 0.9 | 5000 | 5000 | 1 | 0.9994 | 0.9995 | 0.9995 |

April 30, 2015 | Shawn wrote:

Would adjusted wald be okay to use for ab testing for a low traffic site? For example, let's say 3 out of 10 visits result in a conversion (ie: form submission) for an adjusted wald interval of 10% - 61%. If I decide to test an alternative form, would it be valid to say the alternative form is better if the adjusted wald interval is higher than the original (ie: I find 7 out 10 visits convert on alternative form for an adjusted wald interval of 39% - 90%)?

January 20, 2015 | Roul wrote:

Your online calculator gives the wrong upper limit for the score method when passed = 0.

I tried the following:
N = 50 at 95% confidence level.
The limits your calculator gives are 0 and 0.0513..., whereas the correct limits are 0 and 0.0714...

I double-checked the limits manually in Excel using the score formula, and also using another online tool:
http://epitools.ausvet.com.au/content.php?page=CIProportion

However, the limits match when passed != 0.
Can you please rectify this?
Thanks :-)

July 1, 2013 | Milton wrote:

I ran a calculation in the Confidence Interval Calculator for a Completion Rate with the following values: passed = 780, total = 5662. For the Exact method, the low value is very different from the results of the other methods.

June 2, 2013 | Marina Treneva wrote:

Thank you for the explanation of CI variations. Your calculator works with denominators greater than 2000. The QuickCalcs calculator in GraphPad is not valid for denominators greater than 2000.

February 4, 2013 | Katie wrote:

Can you use this calculator for finding confidence intervals for multiple choice (radio or check box) survey questions?

For example, how many people said they would vote for Candidate A, B, or C? Or how many people in the past month have recently visited sites A, B, or C?

Thanks!

January 23, 2013 | Chandrasekhar wrote:

A sound mathematical reasoning with simple examples and description.

May 11, 2012 | Martin Raic wrote:

As to the case when none or all of the trials succeed, I agree with Charles Bedard. In particular, the exact interval then fails to guarantee the 95% coverage. For example, in the case of 5 trials, the calculator yields [0, 0.4507) for no success and (0.5493, 1] for 5 successes. Therefore, if the actual success probability equals 1/2, the coverage probability equals Bin(5, 1/2){1,2,3,4} = 0.9375.

The point: when considering one or two sides, the actual and the estimated success probability should not be confused.

July 7, 2011 | Michelle wrote:

I like this

June 22, 2011 | Mikael Goldstein wrote:

When 8 passed (out of 10 (p=0.8)) LaPlace turned out to be the best point estimate, when in fact it should be MLE!
When 4 out of 10 passed LaPlace turned out to be the best pe, when in fact it should be Wilson!
for 9 out of 10, LaPlace is OK.
For 10 out of 10, LaPlace is OK

In your paper The Wilson Method is displayed as (x+2)/(n+4) but the computations are done with the x + c square estimator. Which one is the correct to use?

"A 95% confidence level uses the Z-critical value of 1.96 or approximately 2." It feels odd to use the z value when dealing with small samples?

Regards,
Mikael Goldstein

January 3, 2011 | MaN wrote:

Great stuff, easy and handy

December 3, 2010 | Monica wrote:

Thank you very much, this is helpful! Could you also provide the formulas used for each?

October 11, 2010 | Nestor Garcia wrote:

Excellent summary of information related to confidence interval.
I have used widely the exact method to support risk analysis

February 22, 2010 | anonymous wrote:

very easy to use.

November 6, 2009 | Jim Hodges wrote:

Which exact method is your exact method? I can't find it here now, but I recall being able to find it on a previous visit to this page. Your link to the confidence interval tutorial is dead.

August 10, 2009 | Greg wrote:

would you use the laplace interval for fast-time modeling results that yield 0 "successes" out of 5 million runs (treating the 5 million runs as a sample)?

June 3, 2009 | B Joseph wrote:

In 1992, the FAA conducted 86,991 pre-employment drug tests on job applicants who were to be engaged in safety and security-related jobs, and found that 1,143 were positive. (a) Construct a 95 percent confidence interval for the population proportion of positive drug tests. (b) Why is the normality assumption not a problem, despite the very small value of p

May 25, 2009 | Sujan Karki wrote:

I want to calculate the confidence interval of a cluster sample. How can I use this calculator for a CI with a cluster effect? Is any modification needed, or can this calculator not be used?
thanks
sujan

February 8, 2009 | Alexandre miranda wrote:

cant find how to calculate the exercise on page 66 -confidence interval based on binomial distribution- (figure 4.1)

May 20, 2008 | Charles Bedard wrote:

Not sure if my comment went through. Instead of 2 as an answer to the question "What is 1+1", I entered 1.999999..., which is mathematically equivalent. My joke. To repeat my comments:
----------------------------------------------
I find that all the estimators have one fatal flaw. A two-sided confidence interval is specified with the presumption that the error in each tail is alpha/2. When the number of successes is equal to zero or to the number of trials, all the stated CIs take either 0 or 1 as one end of the CI and put ALL the error in the inside tail, making the CI a one-sided confidence interval with alpha (not alpha/2) in the tail. I prefer a modified Agresti CI (using an unassumed prior to keep frequentists happy, or a simple uniform over .5 to 1 (or .5 to 0)). The modified Agresti CI is based on the Beta distribution, since the distribution of the proportion is a continuous distribution. More honest, especially in one-shot (non-production) situations.

May 14, 2008 | Pieter Johnson wrote:

This website is excellent! Very helpful.
