RCTs, surveys, and the search for causality — without jargon

Published

February 16, 2026

The causal question in applied work is disarmingly simple: what would have happened to the same people under a different action? We never observe both realities for the same unit, so we need a device that makes the treated world and the untreated world comparable. A randomised controlled trial (RCT) does exactly that. By allocating treatment by chance (a lottery, a coin flip, a pre‑registered algorithm), we ensure that, on average, the treated and the control groups are alike in everything except the treatment itself. This is true for the variables we can see and, crucially, for those we cannot. When such an assignment mechanism is in place and followed, any systematic difference in outcomes between the two groups can be attributed to the treatment. That is what people mean when they say RCTs identify causality “by construction”: the counterfactual comparison is built into the assignment rule rather than reconstructed after the fact.
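
To make the “by construction” point concrete, here is a minimal simulation sketch with made-up numbers (nothing from a real trial): treatment is assigned by a coin flip, and the simple difference in mean outcomes recovers the true effect even though one of the confounders is never measured.

```python
# A minimal sketch with hypothetical data: random assignment balances both an
# observed covariate and an unobserved one, so the simple treated-vs-control
# difference recovers the true effect on average.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
observed = rng.normal(size=n)          # a covariate we can measure
unobserved = rng.normal(size=n)        # a covariate we cannot measure
true_effect = 2.0

treated = rng.integers(0, 2, size=n)   # the coin flip: assignment by chance
outcome = 1.0 + true_effect * treated + observed + unobserved + rng.normal(size=n)

# Balance: group means of both covariates are nearly identical by construction.
for name, x in [("observed", observed), ("unobserved", unobserved)]:
    print(name, x[treated == 1].mean().round(3), x[treated == 0].mean().round(3))

# The simple difference in mean outcomes estimates the true effect.
print("difference in means:",
      (outcome[treated == 1].mean() - outcome[treated == 0].mean()).round(3))
```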

Uncertainty in RCTs has a precise interpretation. The p‑values and confidence intervals reported in an RCT do not depend on whether the realised split looks “balanced” to our eyes; they are statements about the many other assignments we could have made but did not. If we were to re‑run the same randomisation repeatedly, the procedure would behave in a predictable way: sometimes we would be lucky, sometimes not, but the long‑run properties are known. That is why an apparently lopsided allocation on some covariates does not, by itself, destroy validity. It is simply one draw from a known mechanism.
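
One way to see this long-run interpretation in action is randomisation inference: hold the outcomes fixed, re-draw the assignment many times, and ask how unusual the observed difference is among the differences those other assignments would have produced. A minimal sketch with made-up data:

```python
# A sketch of the "many other assignments" idea (hypothetical data): re-draw
# the assignment under the null of no effect and see where the observed
# difference in means falls among the differences those re-runs produce.
import numpy as np

rng = np.random.default_rng(1)
n = 200
treated = rng.integers(0, 2, size=n)
outcome = 0.3 * treated + rng.normal(size=n)   # small true effect of 0.3

observed_diff = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Re-randomise many times, holding outcomes fixed (the sharp null of no effect).
reshuffled_diffs = []
for _ in range(5_000):
    fake = rng.permutation(treated)
    reshuffled_diffs.append(outcome[fake == 1].mean() - outcome[fake == 0].mean())

p_value = np.mean(np.abs(reshuffled_diffs) >= abs(observed_diff))
print(f"observed difference: {observed_diff:.3f}, randomisation p-value: {p_value:.3f}")
```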

This does not mean RCTs are magic or painless. Non‑compliance (when those assigned to treatment do not take it) changes which effect you are estimating. You can still report the intention‑to‑treat effect, which is the effect of being offered the program, and, using the random assignment as an instrument, you can recover the effect for those who comply with the offer; but you should say so explicitly. Attrition can distort the effective assignment if outcomes go missing in a way related to treatment and prognosis; here transparency and simple sensitivity analyses go a long way. Spillovers mean that one unit’s treatment changes another unit’s outcome; in that case, the usual treated‑vs‑control contrast no longer equals the effect you had in mind, and cluster‑level designs or alternative estimators are more appropriate. Finally, effects are rarely constant across people. What an RCT gives you is an average over those who entered the trial, so it is good practice to say who they are.
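
A small sketch of the compliance point, again with made-up numbers: the intention-to-treat effect is the raw difference in outcomes by assignment, and dividing it by the gap in take-up (using the random offer as an instrument) recovers the effect for those who comply.

```python
# A sketch of intention-to-treat vs. the effect for compliers (hypothetical data).
# Assignment is random, but only some of those offered actually take the treatment.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
assigned = rng.integers(0, 2, size=n)                 # random offer
complier = rng.random(n) < 0.6                        # 60% take it up if offered
took = assigned * complier                            # no one takes it without the offer
outcome = 1.0 + 2.0 * took + rng.normal(size=n)       # true effect of taking it = 2.0

itt = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()
first_stage = took[assigned == 1].mean() - took[assigned == 0].mean()

print(f"intention-to-treat (effect of the offer): {itt:.2f}")
print(f"effect for compliers (ITT / take-up gap): {itt / first_stage:.2f}")  # ~2.0
```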

Surveys live in a different part of the world. A high‑quality probability survey randomises sampling, not assignment. It gives each member of a defined population a known chance of being included. This is perfect for describing that population: means, proportions, distributions and changes over time come with uncertainty statements that have the same long‑run interpretation as in RCTs, but now the imagined re‑runs are the many samples we could have drawn but did not. What a survey does not give you “for free” is causality. No one inside the survey was randomly assigned to policy A rather than policy B, so the missing counterfactual has to be reconstructed with design and assumptions.
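
A minimal sketch of that descriptive strength, with an invented population and design: when each person's chance of being included is known, weighting by its inverse recovers a population mean that the naive sample average gets wrong.

```python
# A sketch of design-based description (hypothetical population and design):
# two strata with known, unequal inclusion probabilities; weighting by the
# inverse of those probabilities recovers the population mean.
import numpy as np

rng = np.random.default_rng(3)
pop_a = rng.normal(10, 2, size=80_000)   # stratum A
pop_b = rng.normal(20, 2, size=20_000)   # stratum B
true_mean = np.concatenate([pop_a, pop_b]).mean()

p_a, p_b = 0.01, 0.05                    # known inclusion probabilities
sample_a = pop_a[rng.random(pop_a.size) < p_a]
sample_b = pop_b[rng.random(pop_b.size) < p_b]

values = np.concatenate([sample_a, sample_b])
weights = np.concatenate([np.full(sample_a.size, 1 / p_a),
                          np.full(sample_b.size, 1 / p_b)])

naive = values.mean()                               # over-represents stratum B
weighted = np.average(values, weights=weights)      # design-weighted estimate
print(f"true {true_mean:.2f}  naive {naive:.2f}  weighted {weighted:.2f}")
```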

It helps to anchor this discussion in the linear model most students carry around. Write the outcome as a constant plus a coefficient on treatment, a bundle of pre‑treatment controls, and an error term:

\[ \text{Outcome} \;=\; \alpha + \tau\,\text{Treatment} + \beta'\,\text{Controls} + \text{error}. \]

The object of interest is \(\tau\), the average difference caused by the treatment. Ordinary least squares will estimate \(\tau\) correctly when two things are true. First, after you condition on the right pre‑treatment variables, the remaining error term is not related to who received the treatment. Second, there is genuine overlap: for the kinds of people you observe, some did and some did not receive treatment. Random assignment guarantees both conditions by design. In observational data the treatment is often chosen or targeted, so it tends to move with unmeasured factors that also affect the outcome, such as motivation, ability, need, or policy targeting, violating the first condition. In that case, the treatment coefficient absorbs part of those other influences, and the estimate is biased. The familiar omitted‑variable story is a useful intuition: if a genuine driver of the outcome is left out and that driver is correlated with treatment, the “treatment effect” is dragged towards the influence of that missing driver. Larger samples do not fix this; they make you precisely wrong.
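
Here is a minimal simulation of that omitted‑variable story, with made-up data: an unmeasured “motivation” variable drives both take-up and the outcome, so the treatment coefficient is badly off unless motivation is included, and the bias does not shrink as the sample grows.

```python
# A sketch of omitted-variable bias with hypothetical data: treatment is taken
# up more often by people with high (unobserved) motivation, which also raises
# the outcome. Leaving motivation out biases the treatment coefficient, and a
# bigger sample only makes the wrong number more precise.
import numpy as np

rng = np.random.default_rng(4)

def ols_treatment_coef(n, include_confounder):
    motivation = rng.normal(size=n)
    treatment = (motivation + rng.normal(size=n) > 0).astype(float)  # targeted, not random
    outcome = 1.0 + 2.0 * treatment + 3.0 * motivation + rng.normal(size=n)
    cols = [np.ones(n), treatment] + ([motivation] if include_confounder else [])
    X = np.column_stack(cols)
    coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coefs[1]                                   # coefficient on treatment

for n in (1_000, 1_000_000):
    print(f"n={n:>9}: omitted {ols_treatment_coef(n, False):.2f}"
          f"  adjusted {ols_treatment_coef(n, True):.2f}  (true 2.00)")
```

The adjusted column behaves only because motivation happens to be recorded in this toy world; in real observational data it is precisely the variable you do not have.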

Because we cannot randomise assignment inside a survey, the strategy is to recreate the consequences of randomisation as best we can. The aim is simple to state: make treatment status behave as if random once we condition on credible pre‑treatment variables. If we succeed, the error term is no longer linked to treatment and the linear comparison isolates the effect of interest.

There are several ways to pursue this aim. Regression adjustment includes the pre‑treatment variables that jointly predict who gets treatment and the outcome, allowing flexible forms and interactions where needed. Matching, stratification and propensity scores compare like with like by building treated and untreated groups with similar profiles; this attacks the link between treatment and the error by balancing what we can observe. Inverse‑probability weighting re‑weights observations by the inverse of their estimated treatment probability to create a pseudo‑sample where treated and untreated look comparable. Instrumental variables tackle unobserved confounding by using a variable that nudges treatment but cannot affect the outcome except through that treatment; draft lotteries and deterministic cut‑offs are classic examples. In linear settings, two‑stage least squares then recovers a local effect for those whose treatment status actually moves with the instrument, and it is important to report that local nature. Regression discontinuity uses discontinuous assignment rules to form a comparison that is almost random in a narrow window around the cut‑off. Difference‑in‑differences exploits before‑after contrasts between treated and comparison groups, under the assumption that, absent the treatment, their trends would have been parallel.
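
To make one of these strategies concrete, here is a sketch of inverse‑probability weighting with made-up data: the propensity is estimated within an observed group, and the re-weighted contrast removes the confounding that the naive comparison suffers from.

```python
# A sketch of inverse-probability weighting with hypothetical data: treatment
# is more likely in one (observed) group, and that group also has higher
# outcomes. Weighting each person by the inverse of their estimated probability
# of the treatment state they actually got removes the imbalance.
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
group = rng.integers(0, 2, size=n)                    # observed pre-treatment covariate
p_treat = np.where(group == 1, 0.8, 0.2)              # treatment is targeted at group 1
treated = (rng.random(n) < p_treat).astype(float)
outcome = 1.0 + 2.0 * treated + 3.0 * group + rng.normal(size=n)

# The naive contrast is confounded by group membership.
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Estimate the propensity within each group, then weight by its inverse.
p_hat = np.array([treated[group == g].mean() for g in (0, 1)])[group]
w = np.where(treated == 1, 1 / p_hat, 1 / (1 - p_hat))
ipw = (np.average(outcome[treated == 1], weights=w[treated == 1])
       - np.average(outcome[treated == 0], weights=w[treated == 0]))

print(f"naive {naive:.2f}  ipw {ipw:.2f}  (true 2.00)")
```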

All of these methods amount to the same conceptual move: after what I have done, the remaining treatment variation behaves as if random. In the linear model, this is exactly the condition OLS needs. But none of these is a mechanical checklist; each requires a story about the world. Even a sketchy causal model is better than none. You should be clear about which variables are genuine pre‑treatment drivers of both treatment and outcome and therefore belong in the analysis, and which variables are downstream consequences of treatment and therefore should be left out. Adjusting for consequences of treatment (the “bad controls” problem) can wash away part of the very effect you want to measure. The choice of functional form is critical. If the real-world relationship is curved or involves interactions, forcing it into a linear model can inadvertently introduce bias. The “missing” structure, the part the model failed to capture, gets pushed into the error term. If that unmodeled structure correlates with your treatment, you have essentially reintroduced the very confounding you were trying to eliminate.
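
A small sketch of the bad-controls point, with invented numbers: a treatment that raises earnings partly through hours worked looks much weaker once hours, a downstream consequence of the treatment, is included as a control.

```python
# A sketch of the "bad controls" problem (hypothetical data): the treatment
# raises earnings partly by raising hours worked. Controlling for hours, a
# consequence of treatment, washes away the part of the effect that runs
# through it.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
treated = rng.integers(0, 2, size=n).astype(float)
hours = 30 + 5 * treated + rng.normal(size=n)                   # post-treatment variable
earnings = 100 + 10 * treated + 4 * hours + rng.normal(size=n)  # total effect = 10 + 4*5 = 30

def coef_on_treatment(controls):
    X = np.column_stack([np.ones(n), treated] + controls)
    beta, *_ = np.linalg.lstsq(X, earnings, rcond=None)
    return beta[1]

print(f"without the bad control: {coef_on_treatment([]):.1f}   (total effect, ~30)")
print(f"with hours as a control: {coef_on_treatment([hours]):.1f}   (only the direct part, ~10)")
```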

The difference between RCTs and surveys, then, can be said in one breath. RCTs place randomness where causality lives: in the assignment. That is why treated and control groups are comparable by design and why the simple difference is interpreted as a cause for the trial’s participants. Surveys place randomness where description lives: in the sampling. That is why they are excellent for population facts but do not deliver causal contrasts on their own. To talk about causes with survey or other observational data, you need to create or find treatment variation that is as‑if random given a defensible set of pre‑treatment variables, or you need to lean on a clear design that mimics randomisation. In the linear model, that means ensuring the treatment is unrelated to the regression’s leftover error after you control for the right, truly pre‑treatment variables, and that there is genuine overlap. Without a model in mind these conditions rarely hold by accident. Controls and big data will not save you.