[This is a guest post from Simon Chadwick, CEO of Peanut Labs, Managing Partner of Cambiar and Editor-in-Chief of Research World.]
This is a continuation of my last post, “Digital Fingerprinting and Sample Quality”…
What, then, can we do about sample quality? As usual where multivariate problems are concerned, the answer is ‘a lot of things’. There is no one magic silver bullet to solve the data quality problem, but a series of bullets that, combined, will serve to make things a whole lot better. To name just a few:
- We can come together as an industry to lay down guidelines and standards for online research and data gathering. An enormous amount of work has gone into this, including the study by ORQC, the ACE collaborative effort between industry associations and the recently issued ISO standard. These, in addition to the ESOMAR 26 questions, lay the groundwork for real professionalism in the industry, together with learning that can be passed down through the ranks as to how to do good online research in which we can have confidence.
- As part of this, we can tighten up our panel recruitment practices. It is painful to see reputable firms subscribe to “cash for surveys” websites that register people for multiple panels, all in the service of getting “bodies”. The example below is just one of a multitude that exist out there.
- We can start to eat our own dog food. By that I mean that we can knuckle down and start designing surveys that are not only a whole lot shorter, but also more engaging to the respondent. We have known about Flash and other techniques for making surveys more interesting and involving for ages, yet how many of us truly use them on a regular basis? And, please, don’t use the excuse that we can’t do that on tracking or syndicated studies because we don’t want to risk breaking the data trends. Is it better to give clients reliable data or trendable data? And who has not heard of side-by-side trials?
All of that being said, these are longer term solutions. They require long-term adoption and education and will not solve the quality issue as it confronts us today. So what can we do that will move the needle right here and now?
Part of the answer lies in the very thing that gave rise to the problem itself. Data and sample quality issues online are the result of the technology that gave birth to this means of data collection. If we were not online, we would not have the problems with which online presents us. These include problems that are both quantifiable and a little more qualitative or “fuzzy”.
Quantifiable issues that we know about and can deal with include:
- Duplicates: a “duplicate” is someone who tries, knowingly or unknowingly, to take the same survey more than once. At its most innocent, a duplicate is where someone may be a member of more than one panel and be presented with the same survey on both. At the other end of the spectrum are people – and, indeed, survey factories – who deliberately enter surveys multiple times in order to maximize the cash value of their participation. The fact that they can do so is a function of their ability to get around most of the safeguards that are normally put in place. For example, they can delete cookies, come in under different email addresses or change their IP address.
- Geo-IP violation: simply stated, this involves a person who takes a survey as if they were in one country, but is in fact in another. So, for example, if your survey looks for people in the US, a person from China could take the survey and claim to be in the US. At its simplest (and most innocent), this could be a traveling business person. At its most deviant (and more usual) it could be a survey factory in China using either people or bots to take surveys on a fraudulent basis. Simple IP checking can eliminate this unless, of course, the factory is using proxy servers.
- Speedsters: these are people who take a survey and zip through it in record time. Very often, they will straightline through complex grids (“satisficing”) or deploy patterns to try to avoid being caught. They will also usually provide garbage answers to open-ended questions.
Then there are the more qualitative or “fuzzy” quality problems:
- Hyperactives: people who take a very large number of surveys in a given period of time. A now infamous comScore report suggested that 32% of surveys were being taken by 0.25% of the online panel population. Is this a bad thing or not? Do we want to put limits on this type of behavior or not?
- Cross-Panel Accounts: does it matter if someone is on 6 panels? Some data say ‘yes’, others (including ORQC) say ‘no’. Do we want to flag such people to see if their data differ from others?
- Repeat Offenders: people who have been flagged in the past as having engaged repeatedly in ‘suspect’ behavior. Do we want these people in our surveys? Should they be flagged for separate analysis?
These are issues that exist in the here and now. No matter how much education and standards-setting goes on, we have to deal with them today. This is where technology comes in and puts power in the hands of the researcher to make key decisions as to what constitutes quality and what does not.
Enter digital fingerprinting. Digital fingerprinting is not a new technology. It has been used by the financial services sector for some time and, indeed, has been present in research, on a proprietary basis, for a few years. But it was when Peanut Labs introduced its Optimus™ technology in 2008 as an industry-wide solution that it really started to gain attention. Indeed, since then, multiple companies have launched their own versions of the technology, mostly on a proprietary basis (i.e. you have to use our sample or our hosting to get the benefit), and have made DF a standard offering in the quality debate.
Digital fingerprinting is not rocket science (otherwise it could not have been copied so quickly!). All it does is take the 100-150 data points that a computer puts out when it connects via a browser to the Internet and combine them via an algorithm to produce a unique identifier for that computer that can be referenced every time it seeks to take a survey. These data points include such items as:
• your browser and its version
• your IP address
• your computer’s configuration
• the software you are using and its various versions
• the plug-ins that you have and their versions
• other downloads that you may have on your machine.
The combination of these data points produces a machine ID that is unique. So unique, in fact, that Optimus™ can detect a machine 98.8% of the time that it comes back and tries to take a survey. Even if you delete your cookies, change your browser and change your IP address, a DF technology such as Optimus™ will nail you.
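To make the combination step concrete, here is a minimal Python sketch. It is not how Optimus™ or any commercial product works; the attribute names, the SHA-256 hash, the `similarity` measure and the 0.8 threshold are all assumptions chosen for illustration. It shows why an exact hash alone is not enough: changing a single attribute such as the IP address changes the hash, so recognizing a returning machine requires some form of tolerant matching across the remaining attributes.

```python
import hashlib
from typing import Mapping


def fingerprint(attrs: Mapping[str, str]) -> str:
    """Hash the attributes, sorted by name, into a stable hex identifier."""
    canonical = "\n".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def similarity(a: Mapping[str, str], b: Mapping[str, str]) -> float:
    """Fraction of attributes (over all names seen) whose values are identical."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    same = sum(1 for key in keys if a.get(key) == b.get(key))
    return same / len(keys)


def same_machine(a: Mapping[str, str], b: Mapping[str, str],
                 threshold: float = 0.8) -> bool:
    """Treat two visits as one machine if enough attributes agree.

    The threshold is an arbitrary illustrative value, not a vendor setting.
    """
    return similarity(a, b) >= threshold


# Hypothetical attribute names and values; real systems collect far more signals.
first_visit = {
    "browser": "Firefox 3.5",
    "ip": "203.0.113.7",
    "screen": "1280x800",
    "os": "Windows XP SP3",
    "plugins": "Flash 10.0;Java 1.6",
    "fonts": "Arial;Calibri;Verdana",
}
# Same machine returning with a different IP address.
return_visit = dict(first_visit, ip="198.51.100.22")

print(fingerprint(first_visit) == fingerprint(return_visit))  # exact hashes differ
print(same_machine(first_visit, return_visit))                # tolerant match still succeeds
```

With only six attributes, one change already removes a sixth of the evidence; with the 100-150 data points mentioned above, a handful of changed values would leave most of the profile intact.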
Why is this important? For starters, it means that we can detect duplicates straight away. If a machine tries to come in and take a survey more than once, DF will know. If a machine comes in and tries to pretend – even via a proxy server – that it is from a geographic location that it is not, DF will know. But there is more than that. By allowing the researcher to set certain variables, the technology will know if a machine is trying to speed through a survey, straightline through answers or provide poor quality open responses. The researcher can set the lowest amount of time that is respectable to take the survey, ask DF to look for satisficing, inspect opens for length and other variables and look out for repeat offenders.
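The researcher-set checks described above might look something like the following Python sketch. Everything here is an assumption for illustration: the `Thresholds` fields, their default values, the flag names and the definition of straightlining (every item in a grid given the same answer) are hypothetical, not settings from any real product.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical researcher-set limits; the defaults are arbitrary."""
    min_seconds: float = 180.0   # shortest "respectable" completion time
    min_open_chars: int = 15     # shortest acceptable open-ended answer
    min_grid_items: int = 4      # ignore grids too small to judge


def quality_flags(duration_seconds: float,
                  grids: Sequence[Sequence[int]],
                  open_ends: Sequence[str],
                  limits: Thresholds = Thresholds()) -> List[str]:
    """Return the list of suspect behaviors found in one completed survey."""
    flags = []
    if duration_seconds < limits.min_seconds:
        flags.append("speeder")
    # Straightlining: an entire grid answered with one identical value.
    if any(len(grid) >= limits.min_grid_items and len(set(grid)) == 1
           for grid in grids):
        flags.append("straightliner")
    if any(len(text.strip()) < limits.min_open_chars for text in open_ends):
        flags.append("poor_open_end")
    return flags


print(quality_flags(95, [[3, 3, 3, 3, 3]], ["ok"]))
print(quality_flags(400, [[1, 4, 2, 5]], ["I mostly buy it for the price."]))
```

The first call trips all three checks; the second trips none. A length check is only a crude proxy for open-end quality, which is why the text above speaks of "length and other variables".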
More importantly, DF technologies such as Optimus™ that are applied across multiple data collection sources can enable a researcher to decide whether he or she wants to block machines that have been identified as engaging in suspect behavior, or merely tag them so that they can assess the quality of data that they have provided at the back end of the survey. Peanut Labs’ Optimus™ Research Database (ORD) now has the results of 21 million respondents from across the entire spectrum of the survey industry. If one of these has been identified in the past as having engaged in suspect behavior, then they can be eliminated up front from participating in a survey, thus saving time and cost at the back end and improving the de facto quality of the survey itself.
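The block-or-tag choice can be pictured as a simple lookup at the survey's front door. The Python sketch below is purely illustrative: the `Policy` names, the `screen` function and the in-memory dictionary standing in for a shared history database are assumptions, not the ORD interface.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Policy(Enum):
    BLOCK = "block"  # turn suspect machines away up front
    TAG = "tag"      # admit them, but mark their data for later analysis


def screen(machine_id: str,
           suspect_history: Dict[str, Set[str]],
           policy: Policy) -> Tuple[str, List[str]]:
    """Decide what to do with an arriving machine, given its past flags."""
    past_flags = sorted(suspect_history.get(machine_id, set()))
    if not past_flags:
        return ("admit", [])
    if policy is Policy.BLOCK:
        return ("reject", past_flags)
    return ("admit_tagged", past_flags)


# Hypothetical history keyed by machine ID.
history = {"machine-a1": {"duplicate", "speeder"}}

print(screen("machine-a1", history, Policy.BLOCK))
print(screen("machine-a1", history, Policy.TAG))
print(screen("machine-b2", history, Policy.BLOCK))
```

Tagging keeps the decision with the researcher: flagged completes can be compared with the rest of the sample before anything is discarded.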
What does ORD tell us about these 21 million respondents? Well, 1.5 million were duplicates, 500,000 were Geo-IP violators and 250,000 were speedsters. The overall ‘suspect’ rate was – you guessed it – 15%.
Digital Fingerprinting is not the solution to online data and sample quality problems. It is a solution, available right now, that can combine with industry initiatives, guidelines, standards and training to produce a quality product. And that is what all of us, research companies, data collectors and clients alike, want.
- Peanut Labs: http://peanutlabs.com
- Optimus Digital Fingerprinting: http://www.peanutlabs.com/optimus/technology
[Simon Chadwick is the CEO of Peanut Labs, Managing Partner of Cambiar and Editor-in-Chief of Research World. He has over 30 years’ experience in the research profession, both corporately and as an entrepreneur.]