As‑If by Instrument: What IV Really Buys You
Up to now, the “as‑if random” story has been about what you can do with rich pre‑treatment information. You match, you weight, you regress, and you hope that, after conditioning on the right variables, the remaining differences in treatment are determined by luck. But sometimes you simply do not believe that conditioning will ever be enough. Motivation, ability, networks, expectations, local enforcement, health, bargaining power: these are precisely the forces that drive both treatment and outcomes, and they are either unmeasured or badly measured. In those situations, there is a different route to “as‑if”: you stop trying to explain treatment away with controls, and instead look for a source of variation that moves treatment for reasons unrelated to the outcome. That is the idea of an instrument.
Instrumental variables (IV) is often taught as a two‑stage trick. That is misleading. IV is not a trick. It is a very specific claim about a mechanism. If it is credible, it buys you something powerful: a causal effect identified by a piece of variation that behaves like an assignment device. If it is not credible, it buys you very little besides a false sense of rigor. The goal of this post is to make clear what IV really buys you, and what it does not.
Suppose you would like to estimate the effect of education on wages. You run a regression of wages on schooling and some controls. The worry is not that the equation is wrong algebraically; the worry is that schooling is chosen. People with higher ability, better family background, or stronger ambition tend to get more education and also earn more even without extra schooling. Those forces sit in the “leftover” part of the wage equation and are correlated with education. Hence, the regression’s leftover term is linked to treatment, and the treatment coefficient mixes the effect of schooling with the effect of those unobservables.
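In regression notation, the problem has a one‑line summary. Writing a minimal sketch of the wage equation, with schooling S and a leftover term u that absorbs ability, background, and ambition:

\[
\text{wage} = \alpha + \beta\, S + u, \qquad \operatorname{Cov}(S, u) \neq 0,
\]

so the OLS coefficient converges not to \(\beta\) but to

\[
\beta + \frac{\operatorname{Cov}(S, u)}{\operatorname{Var}(S)}.
\]

If the forces above push \(\operatorname{Cov}(S, u)\) upward, the regression overstates the return to schooling.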
IV enters with a different thought experiment. Instead of trying to measure all the forces that make education endogenous, you look for a variable that changes education but is otherwise unrelated to wages except through education. If you can find such a variable and defend it, then the variation in education that it generates behaves as if it were assigned. You can then estimate the effect of education using only that “as‑if assigned” component.
This is the core promise of IV: it tries to reintroduce an assignment‑like mechanism into an observational world.
An instrument is a variable that does two jobs at once.
First, it must move treatment. If the nudge does not actually change who takes the programme or who gets more schooling, it cannot reveal anything about the treatment’s effect. In practice this is called relevance, but the plain meaning is: if the instrument barely changes treatment, then you are trying to estimate a causal effect using almost no real variation.
Second, the instrument must be as‑good‑as random with respect to the outcome, once you account for basic pre‑treatment factors. More precisely, the instrument must affect the outcome only through its effect on treatment, not through any other channel that reaches the outcome. This is the famous exclusion restriction, but again the plain meaning is: the instrument should not have its own direct effect on wages, nor should it be a marker for unmeasured wage‑relevant factors.
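In the same notation, the two jobs are two conditions on an instrument Z, and together they pin down the coefficient:

\[
\operatorname{Cov}(Z, S) \neq 0 \quad \text{(relevance)}, \qquad \operatorname{Cov}(Z, u) = 0 \quad \text{(exclusion)},
\]

which gives

\[
\beta_{\text{IV}} = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, S)}.
\]

With a binary instrument this is the Wald estimator: the difference in average outcomes between the two instrument groups, divided by the difference in average treatment.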
When both claims are credible, IV creates a very specific comparison. It compares people whose treatment status differs because of the instrument, not because of their unobserved characteristics.
IV buys you a causal effect for the people whose treatment choice is actually changed by the instrument, using only the variation in treatment generated by that instrument.
Notice that IV does not, in general, identify “the” average causal effect for everyone. It identifies an effect for a particular group, often called compliers, and that group depends on the instrument.
This is not a technical detail. It is the central interpretation.
Consider a policy that changes the distance cost of attending university by opening or closing a local campus. Some people would go to university regardless. Some people would not go regardless. Some people are on the margin: they go when the campus is nearby and do not go when it is far. IV learns about the return to education from this marginal group, because they are the ones whose education is actually changed by the instrument.
That is what IV “sees.” It does not see the return to education for the always‑go group, because their education does not change with the instrument. It does not see the return for the never‑go group, for the same reason. IV estimates the effect for those who respond to the instrument. The estimate is “local” to the instrument, not universal.
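A small simulation makes the “local” point concrete. Everything below is invented for illustration: the strata shares (20/30/50) and the effects (4, 1, 2) are made‑up numbers, not estimates from any real study. The Wald ratio recovers the complier effect, not the population average.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population: three latent groups with different true effects.
stratum = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.3, 0.5])
z = rng.integers(0, 2, size=n)  # binary instrument, randomly assigned

# Always-goers take treatment regardless; never-goers never do;
# compliers follow the instrument.
d = np.where(stratum == "always", 1, np.where(stratum == "never", 0, z))

effect = np.where(stratum == "always", 4.0, np.where(stratum == "never", 1.0, 2.0))
y = 10 + effect * d + rng.normal(size=n)

# Wald / IV estimate: reduced form divided by first stage.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

print(f"IV (Wald) estimate: {wald:.2f}")           # close to 2.0, the complier effect
print(f"Population average: {effect.mean():.2f}")  # close to 2.1, a different number
```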
Once you accept this, many debates about IV become easier. The right question is not “is IV estimating the average treatment effect?” but “who are the compliers in this setting, and do I care about their effect?” In policy terms, that is often exactly what you want: a reform changes behaviour only for some people, and the relevant effect is the effect for those whose behaviour actually changes.
IV is often presented as a way to escape modelling. In reality, IV requires a model, just not the same model as regression adjustment.
To defend an instrument, you must be able to tell a coherent story about why the instrument shifts treatment and why it does not reach the outcome through other channels. That story is always context‑dependent. Distance to a college is more plausible as an instrument when it mainly affects the cost of attendance, and less plausible when distance is also a proxy for local labour markets, school quality, family sorting, or regional economic opportunities. A draft lottery number is more plausible when it changes military service assignment, and less plausible when it also changes later schooling decisions, migration, or health in ways not mediated by the treatment you claim to study.
The point is not that these instruments are “good” or “bad” in the abstract. The point is that the exclusion claim is a mechanism claim, and it lives or dies on institutional details. Without a model of the world (i.e. who sorts where, what channels exist, what timing is plausible), you cannot defend the idea that the instrument is as‑good‑as random with respect to the outcome.
“As‑if” is always a mechanism statement. IV simply changes which mechanism you must defend.
Three failures that matter.
The first failure is the obvious one: the instrument does not really move treatment. When the first stage is weak, IV becomes unstable. Estimates jump wildly across specifications, standard errors blow up, and, more importantly, small violations of the exclusion story can generate large distortions. Weak instruments make IV fragile. They do not merely reduce precision; they can invalidate the comforting intuition that “IV fixes endogeneity.”
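One line of algebra shows why. Fold any exclusion violation into the leftover term u, so that a failure means \(\operatorname{Cov}(Z, u) \neq 0\). Then

\[
\hat{\beta}_{\text{IV}} \xrightarrow{\;p\;} \beta + \frac{\operatorname{Cov}(Z, u)}{\operatorname{Cov}(Z, S)}.
\]

The violation sits in the numerator, and the first stage sits in the denominator. When the instrument is weak, \(\operatorname{Cov}(Z, S)\) is close to zero, and even a tiny violation is amplified into a large bias.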
The second failure is the subtle one: the instrument reaches the outcome through channels other than treatment. This can happen in mundane ways. A policy rule used as an instrument might also change expectations, prices, or local labour demand. A geographic instrument might also capture regional differences that directly influence the outcome. A scheduling rule might affect who applies, not just who gets treated. These are not technical quibbles; they are alternative causal pathways. If they exist, the IV estimate is no longer an effect of treatment alone.
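A short continuation of the earlier simulation shows the amplification numerically. Again, all the numbers are invented: a weak first stage (the 0.05 nudge on treatment) combined with a small direct channel from the instrument to the outcome (the 0.1 term).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

z = rng.integers(0, 2, size=n)  # instrument, randomly assigned
u = rng.normal(size=n)          # unobserved confounder

# Weak first stage: the instrument barely nudges treatment take-up.
d = (0.05 * z + 0.5 * u + rng.normal(size=n) > 0).astype(float)

# The 0.1 * z term is a small exclusion violation: a direct channel to y.
y = 10 + 2.0 * d + u + 0.1 * z + rng.normal(size=n)

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

# The estimate lands far from 2.0: the small direct effect is divided by a
# near-zero first stage and blows up.
print(f"true effect: 2.0, IV estimate: {wald:.1f}")
```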
The third failure is interpretational: even when IV is valid, people talk about it as if it were estimating a general effect for everyone. It rarely is. The estimate is tied to the instrument and to the group of people whose treatment status the instrument changes. If you report an IV estimate without saying who it pertains to, you are asking the reader to pretend that treatment effects are the same for everyone or that the complier group is “representative.” That is an additional assumption, and it is often untrue.
It is tempting to say: first regress treatment on the instrument, then regress outcome on the predicted treatment. This is useful as a computational recipe, but it is not the conceptual content. The conceptual content is that the instrument extracts the component of treatment variation that is “as‑if assigned,” and then asks how the outcome moves with that component.
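For completeness, here is the recipe as a minimal numpy sketch; the function name and the reuse of y, d, z from either simulation above are my own framing. With one binary instrument it reproduces the Wald ratio, and as the comments note, the naive second‑stage standard errors would be wrong, which is itself a warning against mistaking the recipe for the substance.

```python
import numpy as np

def two_stage_least_squares(y, d, z):
    """Manual 2SLS with an intercept; point estimate only.

    The naive second-stage standard errors are invalid because they
    ignore that d_hat is itself estimated; real 2SLS software corrects this.
    """
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    Z = np.column_stack([np.ones_like(z), z])
    # First stage: project treatment on the instrument.
    d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
    # Second stage: regress the outcome on predicted treatment.
    X = np.column_stack([np.ones_like(d_hat), d_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# With one binary instrument, this equals the Wald ratio computed earlier.
```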
The danger of focusing on the recipe is that it encourages mechanical behaviour: “I need an IV; I will find something correlated with treatment.” Correlation is not enough. The exclusion story is the whole game.
Matching, weighting, and regression adjustment try to make treatment “as‑if random” by conditioning on observed pre‑treatment information. IV tries to make treatment “as‑if random” by isolating a component of treatment variation generated by an external nudge. Both approaches are attempts to restore a comparison that resembles random assignment, but they rely on different credibility pillars.
When conditioning is plausible because you measured the right drivers and have overlap, the selection‑on‑observables road can be persuasive and transparent. When conditioning is implausible because crucial drivers are unobserved or because treatment selection is deeply endogenous, IV can be the cleaner road (if you can defend the instrument).
In practice, the best papers treat IV as part of a broader argument. They show that the instrument truly shifts treatment; they make the exclusion story plausible with institutional details and auxiliary checks; they describe who the compliers are; and they avoid claiming more than the design supports.