Why 5 Participants Are Okay in a Qualitative Study, but Not in a Quantitative One

Summary: Qualitative usability testing aims to identify issues in an interface, while quantitative usability testing is meant to provide metrics that capture the behavior of your whole user population.

In our quantitative-usability classes (Measuring UX and ROI and Statistics for UX) we often recommend a sizeable number of participants for quantitative studies, usually more than 30. We’ve said again and again that metrics collected in qualitative usability testing are often misleading and do not generalize to the whole user population. (There could be exceptions, but you always need to check by calculating confidence intervals and statistical significance.) And, almost inevitably, the retort comes back: Didn’t Jakob Nielsen recommend 5 users for usability studies? If you need more users for statistical reasons, then surely that means the results obtained with 5 users aren’t valid, doesn’t it?

This question comes up so frequently that we need to address the misunderstanding.

Quantitative Usability Studies: More than 5 Participants

Quantitative usability studies are usually summative in nature: their goal is to measure the usability of a system (site, application, or some other product), arriving at one or more numbers. These studies attempt to get a sense of how good an interface is for its users by looking at a variety of metrics: how many users from the general population can complete one or more top tasks, how long it takes them, how many errors they make, and how satisfied they are with their experience. They usually involve collecting values for each participant, aggregating those values into summary statistics such as averages or success rates, calculating confidence intervals for those aggregates, and reporting likely ranges for the true score for the whole population. The results of such a study may indicate that, with a 95% confidence level, the success rate for a top task for the whole population is somewhere between 75% and 90%, and that the task time is between 2.3 and 2.6 minutes. These ranges (in effect, confidence intervals) should be fairly narrow to convey any interesting information (knowing that a success rate is between 5% and 95% is not very helpful, is it?), and they usually are narrow only if you include a large number of participants (40 or more). That is why we recommend calculating confidence intervals for all metrics collected and not relying on summary statistics alone when a study contains just a few users.
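
To make the link between sample size and interval width concrete, here is a minimal Python sketch. It assumes a binary success/failure task metric and uses the Wilson score interval, which is one common way to compute a confidence interval for a proportion; the article does not say which method to use, so treat that choice, the function name wilson_interval, and the made-up 80% observed success rate as illustrative assumptions rather than a prescribed procedure.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a success rate.

    successes: number of participants who completed the task
    n:         total number of participants
    z:         normal quantile; 1.96 corresponds to roughly 95% confidence
    Returns (low, high) as proportions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p_hat = successes / n                      # observed success rate
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical data: the same 80% observed success rate at several sample sizes.
for n in (5, 10, 40, 100):
    successes = round(0.8 * n)
    low, high = wilson_interval(successes, n)
    print(f"n={n:3d}: {successes}/{n} succeeded, "
          f"95% CI = {low:.0%} to {high:.0%} (width {high - low:.0%})")
```

Under these assumptions, 4 successes out of 5 participants give an interval of roughly 38% to 96%, which says very little about the whole population, while 32 out of 40 narrows it to roughly 65% to 90%. The observed rate is identical in both cases; only the number of participants changes how much the number can be trusted.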
