Survey Sample Sizes: How Many Respondents Do You Really Need?


Today’s guest post is by Gary Angel, who co-founded Semphonic and serves as its president and chief technology officer. You can read more articles by Gary at the Semphonic Blog.

We just finished our annual web analytics conference – the Web Analytics X Change. The conference is unusual in that it’s all small group discussions with enterprise practitioners. It’s a great format for hearing what’s really bugging people. As part of the conference, I attended a session focused on survey research and online behavioral integration. One of the most intensely debated topics was online survey sample size.

It’s a big issue because many organizations find themselves deploying almost as many different surveys as tags and they don’t want to suffer too much from “uncertainty principle syndrome” – damaging the user experience that they’re trying to measure.

How many respondents are really enough?

There are two schools of thought about sample size – one is that as long as a survey is representative, a relatively small sample size is adequate. Perhaps 300-500 respondents can work. The other point of view is that while maintaining a representative sample is essential, the more respondents you have the better.

This is a big issue because it impacts all sorts of decisions, including the length of your survey, your collection mechanism, and, of course, your sampling rate.

So what’s the right answer?

If you aren’t integrating your survey data with web behavioral data, then a relatively small sample size might be okay (I’d emphasize the might). But if you want to combine behavioral analysis and survey data, then forget a sample of 300 or 500 respondents. Those numbers simply won’t work.

Let me give you a real-world example showing why that’s true. We’re working right now with a client that samples approximately 1,000 site visitors a month for their satisfaction survey. They asked us to study how using either of the site’s two internal search tools affected both overall site satisfaction and visit accomplishment.

On this site, search is used in about 10% of visits. Since the site gets more than 10 million visits a month, that still yields a heckuva lot of behavior to study – more than 1 million search visits every month. No problem there.

But our representative sample only captured about 100 respondents who’d used search.

Between the two search tools, one served about 70% of the queries. So for the second search tool, we had about 30 respondents to deal with. Getting the picture?

For our analysis, we wanted to track visit reason vs. satisfaction vs. outcomes for searchers. With some visit reasons only accounting for about 10% of visits, there were cases where we were supposed to analyze the outcomes for all of 3 respondents.
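The arithmetic behind that shrinking sample is worth making explicit. Here is a quick sketch; the 1,000 respondents, 10% search share, 70/30 tool split, and 10% visit-reason share come from the example above, treated as exact round numbers for simplicity:

```python
# Sample "funnel" from the example: each behavioral slice multiplies
# the respondent pool by its share of traffic.
monthly_respondents = 1000
search_share = 0.10        # ~10% of visits use internal search
tool_b_share = 0.30        # the second search tool serves ~30% of queries
visit_reason_share = 0.10  # some visit reasons cover only ~10% of visits

search_respondents = round(monthly_respondents * search_share)  # ~100
tool_b_respondents = round(search_respondents * tool_b_share)   # ~30
cell_size = round(tool_b_respondents * visit_reason_share)      # ~3

print(search_respondents, tool_b_respondents, cell_size)
```

Three respondents per cell is far below anything you can meaningfully cross-tabulate, which is the whole point of the example.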

And that’s with a survey size of 1000 and a relatively simple cross-tabulation of visit intent and one fairly common behavior. Sure, we could add lots more months to the picture. But tracking behavior over extended periods of time adds all sorts of complications to the analysis. The combination of seasonality, site change and macro-economic change make this dangerous. Very few of our client sites remain constant for six months.

If doing behavioral analysis with 1000 survey respondents is challenging, imagine what it would be like with a sample size of 300. Impossible.

So what’s the right answer?

If I had my druthers, I’d recommend that high volume sites strive for a much higher sample size – something like 15K would be nice on a monthly basis.

Is that too much for your user experience to bear? Obviously, the answer depends on your site volume and your take-up and completion rates.

You can’t do much about volume, but if your survey length is impacting your take-ups or completion rates, then I’d be willing to sacrifice a whole bunch of questions to get to the increased size. The fact is that on many 30-40 question surveys, we’d only expect to use at most 5-10 of those questions in a behavioral analysis. I’d bet even money that your analysts feel that same way and that a heavy majority of questions on many long surveys hardly ever get studied at all.

At some point you may have to make a decision: do you want a whole lot of really shallow information or do you actually want to do analysis on a narrower set of data?

So when it comes to behavioral analysis combined with survey integration, the right answer is pretty obvious. A representative sample is essential, but size really does matter. Let yourself get talked into a 300 person sample, and you might as well throw all that work you did to integrate online survey data with behavioral data in the junk pile.


    I think there may be a couple of different things going on here.

    First, I hate to sound like a math professor but there shouldn’t be schools of thought on what constitutes a readable sample size. That’s a function of mathematics and using a truly random sample. Google central limit theorem if you want to see the mathematical proof.

    (Since I’m in professor mode though, in the 4th paragraph there’s a statement about the survey being representative. That should probably be sample; survey instruments in and of themselves can’t really be representative of anything but survey instruments.)

    Then there is the law of large numbers. Everybody knows this one. Essentially it states that as the number of flips of a coin grows, the proportion of heads and the proportion of tails each converge to 50%, given an undoctored coin.
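    A quick simulation makes that convergence concrete (a sketch; the seed and flip counts are arbitrary choices):

```python
import random

random.seed(42)  # arbitrary seed so the run is repeatable

def heads_fraction(n):
    """Fraction of heads in n simulated fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# The proportion of heads wanders for small n and settles near 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```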

    So you want the biggest possible sample, right? Well, no, you probably don’t. Small samples work because they can be designed to render the information you want. Your findings can then be verified with a larger data extract, or through bootstrapping or other techniques that let you confirm that your finding is really consequential.

    One under-reported side effect of using large sample sizes is that statistically significant but otherwise meaningless results tend to emerge all over the place. Plus there are all sorts of complicating factors. What controls are in place to avoid duplicates in the sample when drawing from a vast pool of visitors, some of whom may visit the site multiple times in a month? Or alternatively, what analytical techniques are you going to apply to what could end up being a non-standard distribution?
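    That side effect is easy to demonstrate with a pooled two-proportion z-test (a sketch using the normal approximation; the 50.0% vs. 50.3% figures are invented purely for illustration):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_p_value(p1, p2, n):
    """Two-sided p-value for a pooled two-proportion z-test with n
    respondents per group (normal approximation)."""
    pooled = (p1 + p2) / 2
    se = sqrt(2 * pooled * (1 - pooled) / n)
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

# A 0.3-point difference is nowhere near significant with 1,000 per group,
# but becomes "highly significant" with 500,000 per group.
print(two_prop_p_value(0.500, 0.503, 1_000))
print(two_prop_p_value(0.500, 0.503, 500_000))
```

    Whether a 0.3-point gap matters to the business is a separate question from whether it clears p < 0.05.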

    I understand that you are talking about integrating survey data with behavior. In many ways that’s the holy grail. But I wonder if matters aren’t needlessly complicated by that end goal.

    It is notoriously difficult to marry stated intentions with outcomes. There are all sorts of dynamics in play when a human being interacts with a survey tool that emerge differently, if at all, when engaged in commercial activity. That we can in a wired world collect more information than we ever have makes it seem like we can be more certain.

    The events in the financial sector roughly 12 months ago ought to remind us that we are dealing with probability not certainty.

    A representative and large enough sample size is crucial, yes, but the way to find out how large a sample you need is not just to count how many respondents are left in each cell of a contingency table.

    The thing to consider is effect size: how large are the effects or differences you want to find, and how certain do you want to be that you will actually find them in the sample when they exist in the population?

    Using that, a typical sample size of a couple of thousand cases is very often enough for most practical social science (including Internet behavior) purposes. And even with 300 you can still find large effects. Try G*Power or PS (both are free) to get an idea about the right sample size. Often you might find that 15k is really overdoing it.
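    For readers who want the flavor of what tools like G*Power compute, here is a rough stdlib-only sketch of the standard normal-approximation formula for the sample size needed to compare two proportions (the satisfaction percentages are made-up examples, and alpha/power use conventional defaults):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Rough per-group sample size for detecting a difference between two
    proportions with a two-sided z-test (normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_b = z.inv_cdf(power)           # value for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# A large effect (60% vs 40% satisfied) needs on the order of ~100 per
# group; a small one (52% vs 48%) runs into the thousands.
print(n_per_group(0.60, 0.40))
print(n_per_group(0.52, 0.48))
```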

    I need to carry out a survey and the population is small… 25 individuals. No sampling is required then… but is this possible with a small number of respondents??

    Everyone is making this too complicated (apart from Tim, who mentioned the need to address sampling error).

    If you have a sample of 25, this is really qualitative research, so don’t worry about selecting respondents on some sort of random basis; it is more important that you get a good spread of demographics or any other metric that is relevant.

    On a more general front, for surveys that need some statistical reliability (i.e. normally all quantified surveys), think of ensuring that you have around 100 respondents for each subgroup that you want to look at individually. Depending on the range of answers you get, the error on this would average around +/-5%, which is reasonable for most studies. While there is a need to sample randomly from a known universe (i.e. don’t use an access panel), you can have booster samples for those minority groups that would take too many contact points to reach 100, provided you weight them down to their correct proportion when looking at the overall sample.

    If someone tries to convince you that randomness is an old-fashioned concept that no longer applies, tell them to leave research and take up astrology or online gambling.
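    The error figure for subgroups of about 100 respondents can be checked with the usual margin-of-error formula for a proportion (a sketch using the normal approximation; the example proportions are arbitrary):

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Half-width of a 95% confidence interval for a proportion p
    estimated from a simple random sample of size n (normal approximation)."""
    return z * sqrt(p * (1 - p) / n)

# At n=100 the margin ranges from about +/-4% for lopsided answers
# up to nearly +/-10% at p=0.5.
for p in (0.05, 0.10, 0.30, 0.50):
    print(f"p={p:.2f}  n=100  +/-{margin_of_error(p, 100):.1%}")
```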

    Sir, I performed a survey and collected 20 samples from sugarcane farms. What I did to determine the respondents was a draw by lots: I made a list of all the sugarcane farmers in the place where the samples were collected and picked the 20 farmers by drawing lots, so that the respondents would be selected randomly and would not be biased. My adviser corrected me that this is not the right and scientific way of determining the respondents; there is a certain formula or calculation to be followed. I became really confused about those scientific methods, population, sample size, error and the like. That’s why this blog really gave me some clarifications. I still find it a little bit complicated, but at least I now have ideas, the right ideas, on how to determine respondents in a survey. Thank you.

