Positivity: what it is and why it matters for data science
A causal inference explainer
Jan 26, 2019
If you are interested in understanding causal effects from data, then you need to understand the concept of positivity. This article explains what this means and why it’s important.
First, what it’s not: the type of positivity I’m going to talk about is not a positive mental attitude!
The technical definition of positivity is that the probability of having a particular level of exposure, conditional on your covariates, is greater than 0 and less than 1, for all strata and exposure levels of interest:
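Written out in one common notation (the symbols are an assumption here: A for the exposure, L for the covariates, and a and l for particular values of each), the condition reads:

$$
0 < \Pr(A = a \mid L = l) < 1 \quad \text{for every exposure level } a \text{ of interest and every stratum } l \text{ with } \Pr(L = l) > 0
$$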
But what does that actually mean?
In simple English, positivity means that if you want to compare two types of exposures, then your data must include people who are able to receive, and sometimes do receive, each of the relevant exposure values!
This is a fundamental requirement for causal inference, but it’s also just plain common sense. Think about it: you can’t compare apples to oranges if you’ve only ever seen apples! Positivity tells us that if we want to compare apples and oranges we should talk to people who are able to eat both apples and oranges.
Now that we know what positivity is, what are the implications? When we want to understand causal effects, we need to check for potential positivity violations because if we don’t have positivity then we can’t compare the exposure levels.
There are two types of violations which can happen:
Random non-positivity happens when, by chance or because of a small sample size, some types of people in your study are only exposed or only unexposed.
Random non-positivity is most likely for strata of continuous variables. For example, suppose we enroll 45–65 year olds but by chance recruit only four 52 year olds, none of whom are exposed. Luckily, for continuous variables, it’s usually OK to borrow information from similar people (i.e. interpolation).
Structural non-positivity is trickier, because in your data it can look the same but the meaning is different. Imagine that in the scenario above the exposure of interest was “is on a 2019 50-under-50 list” vs “is not on a 2019 50-under-50 list”. Now 52 year olds can’t be exposed because they aren’t eligible for 50-under-50 lists. That means that there is structural non-positivity.
When we have structural non-positivity, the causal effect in that group is meaningless because they can’t ever (or will always) have exposure.
The solution is to exclude them from our data and inference entirely! For example, to learn about how being included on a 50-under-50 list affects income the following year, we should only include people currently under 50!
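The two kinds of violation above can be sketched in a few lines of Python. This is only an illustration on made-up data: the `people` records, the field names, and the `eligible` rule are all hypothetical. The data alone flags both kinds of stratum in the same way; it takes subject-matter knowledge (here, the eligibility rule) to say which one is structural.

```python
from collections import defaultdict

def exposure_counts(rows, stratum_key, exposure_key):
    """Count exposed and unexposed people within each stratum."""
    counts = defaultdict(lambda: {"exposed": 0, "unexposed": 0})
    for row in rows:
        arm = "exposed" if row[exposure_key] else "unexposed"
        counts[row[stratum_key]][arm] += 1
    return dict(counts)

def positivity_violations(counts):
    """Strata where everyone is exposed or everyone is unexposed."""
    return sorted(
        stratum for stratum, c in counts.items()
        if c["exposed"] == 0 or c["unexposed"] == 0
    )

def eligible(age):
    """Hypothetical eligibility rule for a 50-under-50 list."""
    return age < 50

# Made-up toy data: age and whether the person is on a 50-under-50 list.
people = [
    {"age": 47, "on_list": True},  {"age": 47, "on_list": False},
    {"age": 48, "on_list": False}, {"age": 48, "on_list": False},
    {"age": 49, "on_list": True},  {"age": 49, "on_list": False},
    {"age": 52, "on_list": False}, {"age": 52, "on_list": False},
    {"age": 52, "on_list": False}, {"age": 52, "on_list": False},
]

violations = positivity_violations(exposure_counts(people, "age", "on_list"))
# Both flagged strata look identical in the data; only the rule separates them.
structural = [age for age in violations if not eligible(age)]
possibly_random = [age for age in violations if eligible(age)]

# Handle structural non-positivity by restricting to eligible people.
restricted = [p for p in people if eligible(p["age"])]

print(violations)       # [48, 52]
print(structural)       # [52]
print(possibly_random)  # [48]
print(len(restricted))  # 6
```

In this toy example the 48 year olds are a candidate for borrowing information from 47 and 49 year olds, while the 52 year olds are dropped from the analysis altogether.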
How does this play out in designing studies?
If we are going to conduct a randomized experiment, like a randomized controlled trial or an A/B test, the random assignment process performs 2 important functions. The first, and most commonly discussed, is that it removes confounding. The second is that it ensures positivity: everyone has a chance of being assigned to all exposures because you get to choose the probability of each exposure level.
Importantly, this is true even if we don’t assign people in a 1:1 ratio. We still have positivity if we assign twice as many people to treatment compared to control, or four times as many people to scenario A as to scenario B.
But we don’t have positivity if we assign everyone to a single exposure level, as is sometimes done in so-called “single arm” trials (which, I argue, aren’t really trials, don’t @ me!). We also don’t have positivity if we compare pre- and post-experiment levels within each randomly assigned group, which is often how “placebo effects” are estimated.
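As a small sketch of the point about allocation ratios, the snippet below converts a ratio into assignment probabilities and checks that each one is strictly between 0 and 1. The function names and arm labels are made up for illustration.

```python
from fractions import Fraction

def assignment_probabilities(allocation):
    """Turn an allocation ratio, e.g. {"treatment": 2, "control": 1}, into probabilities."""
    total = sum(allocation.values())
    return {arm: Fraction(weight, total) for arm, weight in allocation.items()}

def has_positivity(probabilities):
    """Positivity holds when every arm has probability strictly between 0 and 1."""
    return all(0 < p < 1 for p in probabilities.values())

two_to_one = assignment_probabilities({"treatment": 2, "control": 1})
four_to_one = assignment_probabilities({"scenario_a": 4, "scenario_b": 1})
single_arm = assignment_probabilities({"treatment": 1, "control": 0})

print(has_positivity(two_to_one))   # True
print(has_positivity(four_to_one))  # True
print(has_positivity(single_arm))   # False
```

Unequal ratios change the probabilities (2/3 vs 1/3, or 4/5 vs 1/5) but never push them to 0 or 1; the single-arm design does.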
So, one reason experiments work is because they have positivity. But, just like with confounding, random assignment only guarantees positivity at baseline.
When we have sustained treatments, like medication use or continuing to visit a website over time, we can get post-randomization positivity violations!
For example, imagine we want to look at the use of statins over time versus no use ever. At baseline, we enroll people with no contraindications (i.e. they have no medical conditions that would stop a doctor from giving them statins) and assign them to either the statins group or no statins group.
People with contraindications are excluded before randomization and can’t be in either the treatment or control arm. But, life happens, and some people will develop contraindications over followup. If we just compare the outcome (dependent variable) between the two groups — called the intention-to-treat analysis or ITT — then post-randomization positivity violations are fine because we had positivity at the time of randomization, and the ITT is the effect of randomization.
But if we want to estimate the effect of actually using statins over time, we need to build in rules for how to handle people who develop contraindications after randomization.
If everyone who develops a contraindication stops statins, we have structural non-positivity for statins among people with contraindications. In this case, we probably want to exclude people after they develop contraindications, but this forces us to think carefully about how we are defining our comparisons.
For example, it wouldn’t make sense to estimate a per-protocol effect for “continuous use of statins even if contraindications develop” vs “no statins ever”.
On the other hand, we could probably compare “take statins unless a contraindication develops” vs “no statins except if strong indication occurs”. This would let us include those people who stop statins after contraindications because they are still following the strategy we’re interested in.
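To make the difference between these strategy definitions concrete, here is a minimal sketch that checks whether one person's follow-up is consistent with each strategy. The visit records and field names are hypothetical, and I am assuming that once a contraindication has developed, the "take statins unless a contraindication develops" strategy places no further requirement on statin use.

```python
def follows_take_unless_contraindicated(visits):
    """Strategy: take statins at every visit until a contraindication develops."""
    contraindicated = False
    for visit in visits:
        contraindicated = contraindicated or visit["contraindication"]
        # Before any contraindication, the strategy requires statin use.
        if not contraindicated and not visit["on_statins"]:
            return False
    return True

def follows_continuous_use(visits):
    """Strategy: take statins at every visit, even if contraindications develop."""
    return all(visit["on_statins"] for visit in visits)

# Made-up person who stops statins when a contraindication appears at visit 3.
stops_for_contraindication = [
    {"contraindication": False, "on_statins": True},
    {"contraindication": False, "on_statins": True},
    {"contraindication": True,  "on_statins": False},
    {"contraindication": True,  "on_statins": False},
]

# Made-up person who stops statins with no contraindication.
stops_without_reason = [
    {"contraindication": False, "on_statins": True},
    {"contraindication": False, "on_statins": False},
]

print(follows_take_unless_contraindicated(stops_for_contraindication))  # True
print(follows_continuous_use(stops_for_contraindication))               # False
print(follows_take_unless_contraindicated(stops_without_reason))        # False
```

The first person still counts as following the well-defined strategy, while nobody with a contraindication could ever follow "continuous use", which is exactly the structural non-positivity problem.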
So to recap, for experiments we have positivity for the intention-to-treat analysis and for some but not all definitions of the effect of exposure (called the ‘per-protocol effect’). What about observational studies?
Unlike in an experiment, in an observational study we don’t necessarily have positivity at baseline because we don’t get to control who gets which exposure level, and we need to worry about positivity over follow-up.
So, we need to add two things to our design when we are looking at data that doesn’t come from an experiment:
(1) when groups can’t ever be, or will always be, exposed at baseline, we should exclude them from our study and target populations.
(2) when people enter these groups over follow-up, we should exclude them or specify a rule in our exposure definition.
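The two rules above could be applied to person-time data along these lines. This is a sketch under stated assumptions: the records, the `cannot_vary` flag (True when the person is in a group that is always or never exposed), and the choice to drop follow-up from the first flagged record onward are all hypothetical, and it illustrates the "exclude" option rather than a rule built into the exposure definition.

```python
from itertools import groupby
from operator import itemgetter

def apply_design_rules(records):
    """Drop people flagged at baseline; stop others' follow-up when first flagged."""
    kept = []
    ordered = sorted(records, key=itemgetter("id", "time"))
    for _, person_rows in groupby(ordered, key=itemgetter("id")):
        person_rows = list(person_rows)
        if person_rows[0]["cannot_vary"]:
            continue  # rule (1): exposure cannot vary at baseline, exclude the person
        for row in person_rows:
            if row["cannot_vary"]:
                break  # rule (2): entered such a group during follow-up, stop here
            kept.append(row)
    return kept

# Made-up person-time records.
records = [
    {"id": 1, "time": 0, "cannot_vary": False},
    {"id": 1, "time": 1, "cannot_vary": False},
    {"id": 1, "time": 2, "cannot_vary": False},
    {"id": 2, "time": 0, "cannot_vary": True},
    {"id": 2, "time": 1, "cannot_vary": True},
    {"id": 3, "time": 0, "cannot_vary": False},
    {"id": 3, "time": 1, "cannot_vary": False},
    {"id": 3, "time": 2, "cannot_vary": True},
    {"id": 3, "time": 3, "cannot_vary": True},
]

kept = apply_design_rules(records)
print(sorted({row["id"] for row in kept}))  # [1, 3]
print(len(kept))                            # 5
```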
So, in summary, positivity in causal inference means we only assess causal effects in people who are eligible for all levels of exposure we care about. Anyone who would always or never get the exposure should not be included in our study or our target population.
Positivity means that if we want to understand causal effects of an exposure we should only include people who have a chance of having every level of exposure we care about. For example, if we always treat pregnant women, then we should not include these women in our study. Similarly if we never treat those who are allergic, we shouldn’t include them either. But if we sometimes treat people who are in pain and sometimes don’t, that’s the right set of people to look at!
Thanks for reading! This article started life as a twitter thread filled with gifs, which you can read here: positivity #tweetorial.
If you want to know more about causal inference, follow me here and on Twitter (Ellie Murray). I tweet and blog about methods for causal inference that can help you make better data-informed decisions.