QuestionPro


Propensity Score: How to Construct and Evaluate It

The propensity score is the chance that a person will get a particular treatment based on what is known about them at the start.

A propensity score is the probability that a person, customer, patient, or other unit receives a treatment based on observed characteristics measured before the treatment. Researchers use it in observational studies when treatment is not randomly assigned and selection bias may affect the results.

Selection bias happens when the treatment group and comparison group differ in ways that also affect the outcome. For example, in healthcare research, patients who receive a treatment may be older, sicker, or more engaged with care than patients who do not. In market research, customers who receive an offer may already be more likely to buy.

Propensity score methods help researchers make treated and untreated groups more comparable based on measured variables. They do not make observational data the same as randomized data, but they can reduce bias from observed confounders when used carefully.

Content Index
1. What is a propensity score?
2. When should you use a propensity score?
3. How do you construct a propensity score?
4. How do you evaluate a propensity score?
5. What methods use propensity scores?
6. What are the limitations of propensity scores?
7. What mistakes should you avoid?
8. Final thoughts
9. Frequently Asked Questions (FAQs)

What is a propensity score?

A propensity score is the conditional probability of receiving a treatment given observed baseline covariates.

In plain language, it estimates how likely someone was to receive the treatment based on what was known before the treatment happened. Rosenbaum and Rubin introduced the propensity score in 1983 as a way to adjust for observed covariates in observational studies.
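In notation, the definition above can be written as follows. The symbols are conventional labels chosen here for illustration rather than taken from the article: T is the treatment indicator (1 for treated, 0 for untreated), X is the vector of observed baseline covariates, and e(x) is the propensity score.

```latex
e(x) = \Pr(T = 1 \mid X = x), \qquad 0 \le e(x) \le 1
```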

A covariate is a variable measured before treatment that may be related to treatment assignment, the outcome, or both. Examples include age, income, location, health status, prior purchase history, education level, or product usage.

A treatment can mean many things depending on the study, such as:

  • Receiving a medical treatment.
  • Seeing a marketing campaign.
  • Joining a loyalty program.
  • Using a product feature.
  • Receiving a discount.
  • Participating in a training program.

The outcome is the result the researcher wants to study, such as recovery, churn, purchase behavior, satisfaction, retention, or performance.

When should you use a propensity score?

Use a propensity score when you want to estimate a treatment effect from observational data and the treatment was not assigned at random.

A treatment effect is the estimated impact of the treatment on an outcome. In a randomized controlled trial, random assignment helps make groups comparable before treatment. In observational studies, people or units often select into treatment or are selected because of existing characteristics.

This analysis is useful when:

  • Random assignment is not possible or ethical.
  • The treatment and control groups differ at baseline.
  • You have enough pre-treatment covariates to model treatment assignment.
  • You want to reduce confounding from measured variables.
  • You need a transparent way to compare treated and untreated groups.

Propensity scores are common in healthcare, education, economics, social science, and applied quantitative research. They can also be useful in business research when analysts compare customers exposed to an intervention with similar customers who were not exposed.

How do you construct a propensity score?

To construct a propensity score, estimate each unit’s probability of receiving the treatment using observed pre-treatment covariates. The treatment indicator is the dependent variable in the propensity model, and the covariates are the predictors.

1. Define the treatment and comparison groups

Start by defining who received the treatment and who did not.

The treatment group includes units exposed to the intervention. The comparison group includes units not exposed to it. This definition must be clear before modeling begins.

For example:

  • Treatment group: customers who received a promotional email.
  • Comparison group: similar customers who did not receive that email.
  • Outcome: purchase within 30 days.

A vague treatment definition will create weak analysis. Be specific about timing, exposure, eligibility, and outcome measurement.
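As a rough illustration of how specific these definitions need to be, the sketch below labels each customer with a treatment flag and a 30-day purchase outcome. The records, field names, and the shared index date for the comparison group are all hypothetical assumptions made for this example.

```python
from datetime import date, timedelta

# Hypothetical customer records; field names and dates are illustrative only.
customers = [
    {"id": 1, "email_sent": date(2024, 3, 1), "purchases": [date(2024, 3, 10)]},
    {"id": 2, "email_sent": None, "purchases": [date(2024, 3, 20)]},
    {"id": 3, "email_sent": date(2024, 3, 1), "purchases": [date(2024, 5, 2)]},
    {"id": 4, "email_sent": None, "purchases": []},
]

INDEX_DATE = date(2024, 3, 1)  # assumed start of follow-up for the comparison group
WINDOW = timedelta(days=30)    # outcome window: purchase within 30 days


def label(customer):
    """Return (treated, outcome) flags for one customer."""
    treated = int(customer["email_sent"] is not None)
    # Follow-up starts at the email date for treated customers and at the
    # shared index date for customers who never received the email.
    start = customer["email_sent"] or INDEX_DATE
    outcome = int(any(start <= p <= start + WINDOW for p in customer["purchases"]))
    return treated, outcome


labels = {c["id"]: label(c) for c in customers}
print(labels)
```

Customer 3 received the email but bought outside the window, so the outcome flag is 0. Writing the rule down this explicitly makes it easier to spot timing and eligibility problems before modeling.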

2. Choose pre-treatment covariates

Choose covariates measured before the treatment. These variables should be related to treatment assignment, the outcome, or both.

Good covariates may include:

  • Demographics.
  • Prior behavior.
  • Baseline health or performance measures.
  • Purchase history.
  • Engagement level.
  • Location.
  • Device type.
  • Customer segment.
  • Prior satisfaction scores.

Do not include variables caused by the treatment. A post-treatment variable can hide part of the treatment effect and distort the analysis.

Also avoid variables that perfectly predict treatment assignment. Propensity score methods need overlap between treatment and comparison groups. If one covariate perfectly separates the groups, there may be no fair comparison for some units.

3. Estimate the propensity score

Researchers often estimate propensity scores using logistic regression or probit regression.

Logistic regression is a statistical model used when the outcome is binary, such as treated versus untreated. Probit regression is another binary-outcome model that uses a different statistical link function.

The model estimates a probability between 0 and 1 for each unit. A score of 0.80 means the model estimates an 80% probability that the unit would receive the treatment based on observed covariates.

The goal is not to predict treatment assignment perfectly. The goal is to create a score that helps balance measured covariates between treated and untreated groups.
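To make the estimation step concrete, here is a minimal sketch of a logistic regression fitted by gradient descent using only the Python standard library. The toy covariates, the learning rate, and the number of iterations are assumptions chosen for illustration; in practice analysts would normally use an established statistics library such as scikit-learn or statsmodels rather than hand-written code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(w, row):
    """Intercept plus the weighted sum of covariates."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))


def fit_logistic(X, t, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent.

    X: list of covariate rows (assumed to be on comparable scales).
    t: list of 0/1 treatment indicators.
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, ti in zip(X, t):
            err = sigmoid(linear_predictor(w, row)) - ti
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity_scores(X, w):
    """Estimated probability of treatment for each row."""
    return [sigmoid(linear_predictor(w, row)) for row in X]


# Toy pre-treatment covariates: [prior purchases, engagement], both scaled 0-1.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.4, 0.3], [0.5, 0.6],
     [0.6, 0.5], [0.7, 0.8], [0.8, 0.7], [0.9, 0.6], [0.3, 0.9]]
t = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]  # 1 = received the treatment

w = fit_logistic(X, t)
scores = propensity_scores(X, w)
for row, ti, s in zip(X, t, scores):
    print(row, ti, round(s, 2))
```

The treatment indicator is the dependent variable and the covariates are the predictors, as described above. The fitted scores are only an intermediate product; what matters is whether they help balance the covariates afterwards.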

4. Check overlap and common support

After estimating the propensity score, check whether treated and untreated groups have overlapping score distributions.

Common support means there are treated and untreated units with similar propensity scores. Without overlap, the data cannot support a credible comparison for those units.

For example, if some treated customers have propensity scores near 0.95 but no untreated customers have similar scores, the analysis cannot estimate the treatment effect well for that group.

Researchers often inspect overlap using histograms, density plots, or side-by-side score distributions. Units outside the common support may need to be excluded, but that changes the population to which the results apply.
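A simple numeric version of this check is sketched below. It uses one common convention, the range of scores shared by both groups, and the scores are made-up values; other definitions of common support exist, and plots remain the more informative diagnostic.

```python
def common_support(scores, treated):
    """Return (low, high) bounds of the score range shared by both groups."""
    t_scores = [s for s, t in zip(scores, treated) if t == 1]
    c_scores = [s for s, t in zip(scores, treated) if t == 0]
    low = max(min(t_scores), min(c_scores))
    high = min(max(t_scores), max(c_scores))
    return low, high


def inside_support(scores, bounds):
    """Flag units whose score falls within the shared range."""
    low, high = bounds
    return [low <= s <= high for s in scores]


# Hypothetical estimated propensity scores and treatment flags.
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.50, 0.30, 0.10]
treated = [1, 1, 1, 1, 0, 0, 0, 0]

bounds = common_support(scores, treated)
keep = inside_support(scores, bounds)
print(bounds)  # (0.4, 0.7)
print(keep)
```

Here the treated units with scores of 0.95 and 0.80 have no untreated counterparts, and the untreated units at 0.30 and 0.10 have no treated counterparts. Dropping them narrows the population the estimate describes.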

How do you evaluate a propensity score?

You evaluate a propensity score by checking whether it balances baseline covariates between the treatment and comparison groups. A good propensity score process should make the groups look more similar on measured pre-treatment variables.

Check covariate balance

Covariate balance means the treated and comparison groups have similar distributions of baseline variables after matching, weighting, stratification, or adjustment.

You should check balance before and after applying the method. The analysis should show whether imbalance was reduced.

Useful balance checks include:

  • Means or proportions by group.
  • Standardized mean differences.
  • Variance ratios.
  • Distribution plots.
  • Balance tables.
  • Love plots.

A love plot is a chart that shows covariate imbalance before and after adjustment, often using standardized mean differences.

Use standardized mean differences

A standardized mean difference, or SMD, measures the difference in a covariate between groups in standard deviation units.

SMDs are widely used because they do not depend heavily on sample size. A p-value can look significant in a large sample even when the difference is small, and non-significant in a small sample even when the difference is large, so p-values are not the best balance diagnostic.

Many applied studies treat an SMD below 0.10 as a rough sign of acceptable balance, but the threshold should not be used blindly. Researchers should still consider the study context and the importance of each covariate.
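For a continuous covariate, one widely used form of the SMD divides the difference in group means by a pooled standard deviation. The sketch below assumes that form and uses made-up ages; binary covariates are usually handled with a variant based on proportions, which is not shown here.

```python
import math
from statistics import mean, variance


def smd(treated_values, control_values):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated_values) + variance(control_values)) / 2)
    return (mean(treated_values) - mean(control_values)) / pooled_sd


# Hypothetical ages in each group before any adjustment.
age_treated = [50, 60, 70]
age_control = [45, 55, 65]

value = smd(age_treated, age_control)
print(round(value, 2))    # 0.5
print(abs(value) < 0.10)  # False: imbalance exceeds the rough 0.10 guideline
```

The same function can be run again on the matched sample to see whether the imbalance shrank.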

Avoid relying only on AUC or c-statistics

AUC and c-statistics measure how well a model distinguishes treated from untreated units. They are useful for prediction, but propensity score analysis is mainly about balance.

A model with high predictive power may separate groups too strongly, which can reduce overlap. A model with a lower AUC may still produce better covariate balance.

The better question is not, “Did the model predict treatment well?”
The better question is, “Did the method balance the groups on important measured covariates?”
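To show why a high c-statistic is not reassuring on its own, the sketch below computes it as the share of treated and untreated pairs in which the treated unit has the higher score. The two sets of scores are invented for illustration.

```python
def c_statistic(scores, treated):
    """Share of treated/untreated pairs where the treated unit scores higher.

    Ties count as half. This is equivalent to the area under the ROC curve.
    """
    t_scores = [s for s, t in zip(scores, treated) if t == 1]
    c_scores = [s for s, t in zip(scores, treated) if t == 0]
    wins = sum((ts > cs) + 0.5 * (ts == cs) for ts in t_scores for cs in c_scores)
    return wins / (len(t_scores) * len(c_scores))


treated = [1, 1, 0, 0]
separated = c_statistic([0.9, 0.8, 0.2, 0.1], treated)
overlapping = c_statistic([0.6, 0.4, 0.5, 0.3], treated)
print(separated, overlapping)  # 1.0 0.75
```

The first model discriminates perfectly, yet no treated unit has an untreated unit with a similar score, so there is nothing to compare. The second model predicts less well but leaves the groups overlapping.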

What methods use propensity scores?

Common propensity score methods include matching, weighting, stratification, and covariate adjustment. Austin’s review of propensity score methods explains these approaches as ways to reduce confounding in observational studies.

Method | How it works | Best use case
Propensity score matching | Matches treated units with untreated units that have similar scores | Creating comparable pairs or groups
Propensity score weighting | Weights observations based on treatment probability | Estimating effects in a weighted sample
Propensity score stratification | Divides units into score-based groups or strata | Comparing outcomes within similar score ranges
Covariate adjustment | Adds the propensity score to an outcome model | Simple adjustment, but often less robust than matching or weighting

Propensity score matching

Propensity score matching pairs treated and untreated units with similar propensity scores. The goal is to create groups that are more comparable than the original sample.

A caliper is a maximum allowed distance between matched propensity scores. Tight calipers can improve match quality, but may exclude more units.

After matching, check covariate balance again. Matching is not successful just because every treated unit has a match. It is successful only if balance improves.
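One simple way to implement this is greedy one-to-one nearest-neighbor matching without replacement, sketched below. The scores, the caliper of 0.05, and the processing order are assumptions for illustration; other matching algorithms (for example optimal matching or matching with replacement) can give different pairs.

```python
def match_with_caliper(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns a list of (treated_index, control_index) pairs. A treated unit
    whose nearest unused control is farther than the caliper stays unmatched.
    """
    treated_idx = [i for i, t in enumerate(treated) if t == 1]
    available = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i in treated_idx:
        if not available:
            break
        # Nearest unused control; ties are broken by the lower index.
        j = min(available, key=lambda c: (abs(scores[i] - scores[c]), c))
        if abs(scores[i] - scores[j]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores and treatment flags.
scores = [0.30, 0.55, 0.90, 0.32, 0.50, 0.58, 0.10]
treated = [1, 1, 1, 0, 0, 0, 0]

pairs = match_with_caliper(scores, treated, caliper=0.05)
print(pairs)  # [(0, 3), (1, 5)]
```

The treated unit with a score of 0.90 has no control within the caliper and is left out, which is exactly the trade-off between match quality and sample size described above.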

Propensity score weighting

Propensity score weighting uses the score to assign weights to units. One common approach is inverse probability of treatment weighting, often called IPTW.

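IPTW weights each unit by the inverse of the probability of the treatment it actually received: 1 / e for treated units and 1 / (1 − e) for untreated units, where e is the unit’s propensity score. Units whose treatment status was less likely given their covariates therefore receive more weight. This can create a weighted sample where measured covariates are more balanced across groups.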

Weighting can be powerful, but extreme weights can create unstable estimates. Researchers should inspect weight distributions and consider trimming or alternative weighting methods when needed.
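The sketch below illustrates IPTW with weights aimed at the average treatment effect, plus an optional cap on large weights. The data are constructed so that the outcome depends only on the propensity score and the treatment has no effect; the cap is one simple way to limit extreme weights and is an assumption of this example, not a recommendation from the article.

```python
def iptw_weights(scores, treated):
    """Weight 1/e for treated units and 1/(1 - e) for untreated units."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for e, t in zip(scores, treated)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def iptw_effect(outcomes, scores, treated, max_weight=None):
    """Weighted difference in mean outcomes, with optional weight truncation."""
    w = iptw_weights(scores, treated)
    if max_weight is not None:
        w = [min(wi, max_weight) for wi in w]
    y_t = [y for y, t in zip(outcomes, treated) if t == 1]
    w_t = [wi for wi, t in zip(w, treated) if t == 1]
    y_c = [y for y, t in zip(outcomes, treated) if t == 0]
    w_c = [wi for wi, t in zip(w, treated) if t == 0]
    return weighted_mean(y_t, w_t) - weighted_mean(y_c, w_c)


# Hypothetical data: high-propensity units always have outcome 1,
# low-propensity units always have outcome 0, whatever their treatment.
scores = [0.8] * 5 + [0.2] * 5
treated = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
outcomes = [1] * 5 + [0] * 5

naive = (sum(y for y, t in zip(outcomes, treated) if t == 1) / 5
         - sum(y for y, t in zip(outcomes, treated) if t == 0) / 5)
effect = iptw_effect(outcomes, scores, treated)
print(round(naive, 2), round(effect, 2))
```

The unweighted comparison suggests a difference of 0.6, while the weighted comparison is essentially zero, because the weights undo the imbalance in who received the treatment.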

Propensity score stratification

Propensity score stratification divides the sample into groups based on score ranges, such as quintiles.

Within each stratum, treated and untreated units should have similar propensity scores. Researchers then compare outcomes within strata and combine the results.

Stratification is easier to explain than some weighting methods, but it still requires balance checks inside each stratum.
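A minimal sketch of this approach follows. It ranks units by propensity score, cuts them into equal-sized strata, takes the treated-minus-untreated difference in each stratum, and averages those differences weighted by stratum size. The data are invented, and weighting by stratum size is one common choice among several.

```python
from statistics import mean


def stratified_effect(outcomes, scores, treated, n_strata=5):
    """Average within-stratum outcome differences, weighted by stratum size.

    Returns (effect, skipped), where skipped lists strata that lacked
    either treated or untreated units and could not be compared.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) / n_strata
    total, used, skipped = 0.0, 0, []
    for s in range(n_strata):
        members = order[round(s * size):round((s + 1) * size)]
        y_t = [outcomes[i] for i in members if treated[i] == 1]
        y_c = [outcomes[i] for i in members if treated[i] == 0]
        if not y_t or not y_c:
            skipped.append(s)
            continue
        total += (mean(y_t) - mean(y_c)) * len(members)
        used += len(members)
    if used == 0:
        raise ValueError("no stratum contains both treated and untreated units")
    return total / used, skipped


# Hypothetical data: the baseline outcome rises with the score,
# and treated units score 2 points higher within each stratum.
scores = [0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75, 0.90, 0.95]
treated = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
outcomes = [1, 3, 2, 4, 3, 5, 4, 6, 5, 7]

effect, skipped = stratified_effect(outcomes, scores, treated)
print(effect, skipped)  # 2.0 []
```

Any stratum that ends up in the skipped list is a sign of poor overlap in that score range and deserves a closer look.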

Covariate adjustment using the propensity score

Covariate adjustment includes the propensity score as a predictor in an outcome model.

This method is simple, but it may not balance covariates as well as matching or weighting. It can be useful in some settings, but researchers should be cautious and still check diagnostics.
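As a rough sketch, the adjustment can be written as an ordinary least squares regression of the outcome on the treatment indicator and the propensity score. The example below assumes a linear outcome model, which is a strong assumption, and uses data constructed to follow that model exactly so the result is easy to verify.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def adjusted_effect(outcomes, treated, scores):
    """OLS of outcome on [1, treatment, score]; returns the treatment coefficient."""
    rows = [[1.0, float(t), float(e)] for t, e in zip(treated, scores)]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(3)]
    return solve(XtX, Xty)[1]


# Hypothetical data generated as outcome = 1 + 2 * treatment + 3 * score.
scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
treated = [0, 0, 1, 0, 1, 0, 1, 1]
outcomes = [1 + 2 * t + 3 * e for t, e in zip(treated, scores)]

effect = adjusted_effect(outcomes, treated, scores)
print(round(effect, 4))
```

With real data the relationship between the score and the outcome is rarely this tidy, which is one reason this approach can be less robust than matching or weighting.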

What are the limitations of propensity scores?

Propensity scores can reduce bias from measured covariates, but they cannot fix unmeasured confounding.

Unmeasured confounding happens when an important variable affects both treatment assignment and the outcome but is missing from the data. If that variable is not measured, the propensity score cannot balance it.

Other limitations include:

  • Poor overlap between groups.
  • Missing important baseline covariates.
  • Incorrect model specification.
  • Extreme weights.
  • Loss of sample after matching.
  • Results that apply only to the matched or weighted population.
  • Overinterpretation of observational findings as proof of causality.

A propensity score can support causal reasoning, but it does not prove causality on its own.

What mistakes should you avoid?

The most common mistake is treating the propensity score model like a prediction model. The goal is balance, not prediction.

Avoid these mistakes:

  • Including post-treatment variables in the propensity score model.
  • Ignoring overlap and common support.
  • Reporting treatment effects without balance diagnostics.
  • Using p-values alone to check balance.
  • Relying only on AUC or c-statistics.
  • Matching without checking whether covariates are balanced.
  • Forgetting that unmeasured confounders may still bias results.
  • Making claims that sound stronger than the design supports.

A clean analysis should explain the covariates used, the method chosen, the balance diagnostics, the excluded observations, and the target population for the estimate.

Final thoughts

A propensity score is useful when researchers need to compare treated and untreated groups in observational data. It helps reduce imbalance in measured baseline covariates, which can make treatment effect estimates more credible.

The strongest analyses do not stop after estimating the score. They check overlap, evaluate covariate balance, choose an appropriate method, and explain the limits of the design.

Propensity score analysis works best when it is planned carefully, reported transparently, and interpreted with caution. It is a helpful tool, but it is not a substitute for good research design.


Frequently Asked Questions (FAQs)

What is a propensity score in simple terms?

A propensity score is the estimated probability that someone receives a treatment based on observed characteristics. It helps researchers compare treated and untreated groups that may differ before treatment.

Is propensity score matching the same as propensity score analysis?

No. Propensity score matching is one method within propensity score analysis. Other methods include weighting, stratification, and covariate adjustment using the estimated propensity score.

What variables should be included in a propensity score model?

A propensity score model should include pre-treatment covariates related to treatment assignment, the outcome, or both. It should not include variables caused by the treatment or variables measured after treatment.

Can propensity scores prove causation?

No. Propensity scores can reduce bias from measured confounders, but they cannot remove bias from unmeasured variables. They support causal inference only when assumptions, design, and diagnostics are reasonable.

How do you know if a propensity score worked?

A propensity score method worked if treated and untreated groups are better balanced on important baseline covariates after matching, weighting, stratification, or adjustment than they were before. Standardized mean differences are commonly used to check this.

Why is common support important?

Common support is important because treatment effects cannot be estimated reliably when treated units have no comparable untreated units. Without overlap, the analysis depends on unsupported comparisons.


About the author
Anas Al Masud
Digital Marketing Lead at QuestionPro. SEO-driven content strategist specializing in content that ranks, engages, and converts, while boosting online visibility through hands-on digital marketing expertise.