If you're doing qualitative design research, don't worry about sample size. Sample size and statistical significance don't matter*.
The only thing that matters is how confident your team is about the next decision they need to make**
You might get that confidence from watching 2 people.
You might need to watch 20.
You might need to watch 20, then check your hypothesis at a larger scale with a survey or analytics.
It all depends on the confidence of your team.
There are only 2 things you need to do to start building confidence in your next decision:
- Start watching end users using the thing you're making. (Do research in every iteration.)***
- Make sure enough of your team observe the research. (Research is a team sport, help your team get their exposure hours.)
As soon as you start doing this, you'll be so busy working out what to do with everything you're learning, you'll have no time to worry about statistics.
You'll know what is broken and what is not, and you'll know what you need to spend more time learning about.
It's as simple as that.
* Obviously it's better to see a lot of people than very few
We recommend you see a handful in each iteration - keep iterating, and you'll gradually get a larger and broader sample. Also, the 'Is 5 people enough?' question often discussed by usability people is irrelevant when you're seeing 5 people in every iteration.
** Your next decision might be...
- How do we fix the question in this form so that people can answer it better?
- Is this service meeting the user needs?
- What is the next most important thing we should prioritise in the backlog?
- Does this work better than what we've currently got live?
*** Who do you watch?
It's tempting to launch into a complicated segmentation exercise. Don't do that. There will be some obvious segments – just start with those. Start talking to people and the segmentation that really matters will emerge from those discussions.
The important segments will sometimes be things you couldn't have guessed at before you started meeting people. Start talking to the easiest people. Pick the people who are most likely to be able to (and want to) use your service and test with them. If your service doesn't work for them, it won't work for anybody.
As you begin to solve problems, make it harder for yourself by inviting people who are going to be more difficult – those with edge cases and complex user needs, and those with lower (or no) digital literacy.
Comment by Chris Newell
"If you are doing qualitative design research, don’t worry about sample size.
Sample size and statistical significance don’t matter"
I'd argue it would depend on what you're trying to determine. If it's identifying bugs or whether an interface is confusing then it's significant if one person finds a bug or a few people find the interface confusing. However, if you're trying to decide whether people prefer interface A over interface B then sample size is significant, as is whether your panel reflects the demographic composition of your target audience.
Comment by Leisa Reichelt
You missed the important part in the bit you chose to quote - the thing that is important, the confidence of the team!
And yes, to get confidence for different kinds of research questions, you need different sample sizes and compositions.
At GDS we focus on the research that you do to design a service that meets user needs and is easy to use. Marketing research is a different fish entirely.
It's worth adding that we're very rarely interested in what people tell us they prefer and always very interested in what interfaces actually help people get the job done.
Comment by Chris Newell
If confidence is based on observations which aren't statistically significant isn't there a danger that it could be misplaced confidence? Teams are prone to confirmation bias in the same way as individuals. I think we'll have to agree to disagree on this because it comes down to how much one trusts human opinions.
Comment by Leisa Reichelt
Have you watched much qualitative research Chris? What I would usually say to team members who have similar objections is 'come watch half a dozen sessions and then tell me if you think you need to watch any more sessions before you feel like you know what we need to do'. Almost always, after four sessions, we know what needs doing. (And we still have a session or two left in the bag).
Remember, we're not doing one piece of research on which we bet the farm. We're doing this every fortnight, gradually iterating, seeing more and more people and more diverse segments over time. It's low risk and it works.
Also, remember that statistical significance doesn't necessarily require large numbers; it's about the probability that the event we are observing is not caused by chance. The intention is to help you understand whether or not you should act on that observation. If you watch four out of four people fail to understand a question in a form, it is usually not the fact that it was four people that makes you change the form; it is what you now understand about WHY people are failing that allows you to improve the question so that fewer people will fail.
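An aside on the arithmetic here (my illustration, not part of the original comment): the "probability of the event not being caused by chance" can be made concrete with a simple binomial calculation. If the true failure rate in the population were only, say, 20%, watching all four of four users fail would be very unlikely:

```python
# Illustrative sketch: probability of seeing exactly k failures out of
# n users, assuming each user independently fails with probability p.
from math import comb

def binomial_prob(k: int, n: int, p: float) -> float:
    """P(exactly k failures in n independent trials, failure rate p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# If only 20% of users would really fail, seeing 4 out of 4 fail
# by chance is very unlikely (0.20 to the 4th power):
print(f"{binomial_prob(4, 4, 0.20):.4f}")  # prints 0.0016
```

So even four observations can be strong evidence against a low failure rate – though, as the comment says, the real value is in understanding why people fail.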
You could try to test everything with very large numbers, but if you did that, you'd be so busy testing and analysing that you'd never get anything fixed.
By continuing to conduct research and measuring other data, like web analytics, we can reduce confirmation bias and ensure accuracy.
Comment by Dominic Hurst
Great post Leisa. Like you say, research for every iteration is more important than one big piece of research. It's about what everyone is happy with, so if everyone isn't in agreement then iterate and test again. Let's not forget you have the quantitative insights from the live site which, if the iteration research is right, will collectively show an effective solution.
Comment by Simon H
I think the other side to it is that we also look at the data side of things to help support us. This is something Ben Holliday does so well on Carers. We will see something during research that we believe is an issue, we can then tie this in to what our analytics data shows us. For example, we may see through analytics a number of people dropping out on a certain page. We can then consider what we have seen in the lab. If people have been struggling to answer a particular question then we can make an assumption that it is causing the drop out we see.
We will then reconsider the design of that particular question and test it in the lab. If it improves things then we can add it into the backlog.
Once it is released we will then monitor it in live to see if it has the desired effect.
I think Ben plans to do a blog post about how we run experiments on carers that would explain this far better than I can!
Comment by Phil
Wow, just wow!
I must say I'm staggered by the arrogance coming out of these blogs (and some of the defensive responses to valid comments).
I stumbled across this by accident whilst researching significance testing, and having read a few more posts it very much feels to me that this research arm of the government is doing exactly what it professes not to do: telling and deciding what users' needs and behaviours are, and not actually speaking to broad and representative groups at all.
The most important thing is not getting the confidence of the team (that's important to your career, not the end user!); it's understanding how, why and who uses something, what you need to do to make it better for ALL the users, and then communicating that out.
I'm sorry but for some parts of government, speaking to 30 or so people across a development cycle is not going to cut it to understand the behaviours of the rest of the population. I second Chris' comments!
Comment by Leisa Reichelt
I'm sorry this comes across as arrogant, it is certainly not intended that way. What we are trying to do is to simplify what we are asking people to do for user research so that they're more able to do some, rather than none. Or doing a huge survey in place of doing any qualitative research. Or using a focus group to ask whether people like the interaction design.
By starting to engage with users in a way that is appropriate for the research questions you have to answer, we are able to help service designers and product owners build better experiences for end users. Once you start doing this, there is no reason to stop. We don't just talk to thirty people and leave it at that – we do continuous improvement and, as a part of that, we talk to more and more people, and more diverse people, over time.
For most projects in the early stages, the problems with the service design are so significant that you can identify them and start to fix them with very few people. The trick is not to stop there, but then to introduce more diverse groups of people so that you do get proper representation from the breadth of the key segments of your audience.
Also, just to be clear, we're not saying that the goal is for the researcher to win the confidence of the team, but for the team, through observing the user research and the real people actually trying to use the product, to be confident of what they should do next to help improve the experience. So, nothing to do with our careers, everything to do with improving the experience for end users.
Hope this helps to clarify things a little more.
Comment by Mike Ryan
Great post on an often debated subject. For qualitative research, having large numbers shows diminishing returns on investment. Depending on the type of issue, the diminishing returns set in around the 5-10 participant range. This has been confirmed in a number of research studies on usability tests.
* Nielsen http://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
* Faulkner http://www.simplifyinginterfaces.com/wp-content/uploads/2008/07/faulkner_brmic_vol35.pdf
* Lewis http://drjim.0catch.com/2006_DeterminingUsabilityTestSampleSize.pdf
Running qual research is expensive per participant because the sessions are long and the data is messy. When I need additional confidence I pair the qual study with a similar unmoderated study to get some quant.
But the trade-off is time and money. It is much better to run smaller, frequent tests than no tests. Remember, many projects do no testing with users! If you are running 2-week sprints, tests have to be smaller and faster or you are S.O.L.
We like to test 10 users and feel confident that we are finding most of the important issues. And those patterns tend to emerge around 4 or 5 participants.
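The diminishing-returns pattern described here is usually modelled with the cumulative problem-discovery formula discussed in the Nielsen and Lewis pieces linked above: the share of problems found after n participants is 1 − (1 − p)^n. A small sketch, where the per-participant discovery rate p ≈ 0.31 is an assumption commonly quoted in that literature, not a figure from this comment:

```python
# Sketch of the cumulative problem-discovery model: the expected share
# of usability problems found after n participants, assuming each
# participant independently reveals a given problem with probability p.
# p = 0.31 is the rate often quoted in the Nielsen/Lewis literature.

def problems_found(n: int, p: float = 0.31) -> float:
    """Expected proportion of problems uncovered after n participants."""
    return 1 - (1 - p) ** n

for n in (1, 5, 10, 15):
    print(f"{n:2d} participants -> {problems_found(n):.1%} of problems found")
```

Under that assumption, around 5 participants uncover over 80% of problems and 10 participants approach saturation, which matches the 5-10 participant range mentioned above.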