Design‑ versus model‑based inference

Published

February 22, 2026

Econometrics has two main ways to justify what we say with data. One is to lean on design: the way the data were collected or treatments were assigned supplies the guarantees. The other is to lean on a model: a set of assumptions about how the data were generated supplies the guarantees. Both routes can be rigorous. Both can fail. Being clear which road you are on—design‑based or model‑based—prevents a lot of mistakes.

Design‑based inference takes the sampling or assignment mechanism as the source of validity. In a probability survey, random sampling gives every unit in the population a known non‑zero chance to be included. In a randomised experiment, random assignment puts units into treatment and control groups by chance. In both cases, uncertainty statements do not come from an assumed distribution for the data. They come from imagining the other samples or assignments the same procedure could have produced but did not.

This way of thinking is concrete. If a labour‑force survey uses stratification and clustering with published selection probabilities, you can estimate a city‑wide employment rate and attach an interval whose long‑run coverage is guaranteed by the sampling design. If a training program is rationed by lottery, the simple difference in average outcomes between winners and losers is a causal effect for the applicants, and a permutation (randomisation) p‑value is available with no distributional assumptions. The numbers you report are tied to the actual mechanism that produced the dataset. If you can defend the mechanism, you can defend the inference.
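
To make the thought experiment concrete, here is a minimal sketch of randomisation inference for the lottery example, assuming a handful of made‑up outcomes for winners and losers (only NumPy is used; all numbers are illustrative). The p‑value asks how often re‑running the lottery under the sharp null of no effect would produce a gap at least as large as the one observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: outcomes for lottery winners (treated) and losers (control).
y_treat = np.array([5.1, 6.3, 4.8, 7.0, 5.9, 6.4])
y_ctrl = np.array([4.2, 5.0, 4.9, 5.3, 4.1, 4.7])

observed = y_treat.mean() - y_ctrl.mean()

# Randomisation inference: re-run the lottery many times under the sharp null of
# no effect, holding the pooled outcomes fixed, and record the difference in
# means each alternative assignment would have produced.
pooled = np.concatenate([y_treat, y_ctrl])
n_treat = len(y_treat)

draws = 10_000
diffs = np.empty(draws)
for i in range(draws):
    perm = rng.permutation(pooled)
    diffs[i] = perm[:n_treat].mean() - perm[n_treat:].mean()

# Two-sided p-value: how often would chance alone give a gap this large?
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(f"difference in means = {observed:.2f}, randomisation p = {p_value:.3f}")
```

No distribution is assumed anywhere; the reference distribution is generated by the assignment mechanism itself.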

Design has two obvious strengths. First, it keeps you honest about what the data represent. A probability sample lets you speak about a defined population. A randomised assignment lets you speak about a defined causal contrast. Second, its uncertainty statements are robust to the look and shape of the realised data. Odd samples and unbalanced covariates sometimes occur by chance. The guarantees are about the procedure across the paths not taken.

Design also has limits. You get what the design gives you, not what you wish you had. A beautifully randomised training trial among volunteers does not, by itself, tell you the effect for non‑volunteers. A carefully designed survey that never measured the right covariates will not let you adjust for them after the fact. Design can be extended and combined with modelling, but its core discipline is to say what randomised, and for which claims the randomisation is relevant.

Model‑based inference takes a different stance. Here one specifies a probabilistic description of how outcomes and covariates are generated—independence, identical distributions, functional forms, error structures—and this description is used to justify estimation, standard errors and tests. In a linear regression, for example, one assumes that once a set of pre‑treatment variables is controlled for, the remaining variation is unrelated to treatment. In time series one argues for stationarity and a particular error process. In hierarchical settings, distributions are placed on unit‑level effects so that estimates borrow strength across groups.

When the model is well‑chosen for the question and the data, you can gain efficiency and answer queries that design alone cannot handle. You can interpolate between sparse cells, adjust for many covariates compactly, and make predictions for units that were not sampled at all.

But models are theoretical constructs. They are only as credible as their assumptions. If “treatment given covariates is as good as random” is wrong, the treatment coefficient inherits bias from omitted drivers. If the functional form is misspecified, the structure left out slides into the error term, and if that structure is related to the regressor of interest there is omitted‑variable bias. If one fits a neat normal‑errors model to heavy‑tailed outcomes, the resulting intervals are too optimistic. Model‑based inference is powerful precisely because it lets one go beyond what the design alone guarantees. The price is that one must guard the assumptions that power those steps.

It helps to see the contrast in a familiar setting. Suppose you have a randomised controlled trial (RCT) with a recorded randomisation protocol. A design‑based analysis can stop at the difference in means and a randomisation‑based p‑value. A model‑based analysis might run a regression of the outcome on treatment and a set of baseline covariates, add interactions to improve precision, and use robust standard errors. Both are defensible. The first gets validity from the assignment mechanism; the second still gets validity from the mechanism (randomisation ensures the key exogeneity condition) but uses a model to reduce noise.
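
A hedged sketch of both analyses on simulated data may help; the data‑generating process, the effect size, and the use of a centred covariate with an interaction (plus HC2 robust standard errors via statsmodels) are illustrative choices, not the only defensible ones.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical RCT: treatment d is randomised; x is a baseline covariate.
n = 500
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * d + 0.8 * x + rng.normal(size=n)

# Design-based answer: the simple difference in means.
diff = y[d == 1].mean() - y[d == 0].mean()

# Model-assisted answer: regress on treatment, the centred covariate, and their
# interaction, with heteroskedasticity-robust (HC2) standard errors. The
# randomisation still supplies the exogeneity; the covariate only soaks up noise.
xc = x - x.mean()
X = sm.add_constant(np.column_stack([d, xc, d * xc]))
fit = sm.OLS(y, X).fit(cov_type="HC2")

print(f"difference in means:     {diff:.3f}")
print(f"regression-adjusted ATE: {fit.params[1]:.3f} (se {fit.bse[1]:.3f})")
```

The two point estimates should be close; the adjusted one is usually more precise because the covariate explains part of the outcome variance.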

Now flip the situation. You have an observational survey with rich covariates but no random assignment. A design‑based analysis can give you population averages and trends, with correct uncertainty, because the sampling was probabilistic. A model‑based causal analysis can go further using matching, regression adjustment, inverse‑probability weighting, or instrumental variables. But its validity depends on whether, after all this, the remaining treatment variation behaves as if random. In plain linear language: after one includes genuinely pre‑treatment variables and respects overlap, is the leftover regression error unrelated to treatment? If yes, the treatment coefficient is interpretable as a causal effect. If no, the coefficient mixes treatment with other forces.
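
As one example of such a model‑based step, the sketch below fits a logit propensity score and forms an inverse‑probability‑weighted contrast on simulated data in which the single confounder x is observed. Everything here is invented for illustration; the estimate is only trustworthy if x really captures the confounding and overlap holds.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Hypothetical observational data: take-up of treatment d depends on x,
# and x also drives the outcome, so the naive comparison is confounded.
n = 2000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))       # true propensity
d = rng.binomial(1, p_true)
y = 2.0 + 1.0 * d + 1.5 * x + rng.normal(size=n)  # true effect of d is 1.0

# Step 1: model the treatment, i.e. estimate the propensity score from x.
ps = sm.Logit(d, sm.add_constant(x)).fit(disp=0).predict()

# Step 2: inverse-probability-weighted difference in means.
w1 = d / ps
w0 = (1 - d) / (1 - ps)
ipw = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

naive = y[d == 1].mean() - y[d == 0].mean()
print(f"naive difference: {naive:.2f}, IPW estimate: {ipw:.2f}")
```

The naive contrast is biased upwards because treated units have higher x; the weighted contrast recovers something close to the true effect, but only because the simulation honours the very assumption the method needs.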

In both cases the same theme returns: design gives you ground to stand on; modelling lets you reach further. The stronger the design, the less fragile the modelling. The weaker the design, the more your modelling must carry and justify.

Design‑based inference often treats the population as finite and fixed. The only randomness comes from which units you sampled or how you assigned treatment. A 95% interval then refers to coverage across the samples or assignments your design could have produced. Model‑based inference typically imagines a super‑population: the data are a draw from an underlying stochastic process. A 95% interval then refers to repeated draws from that process under the model. The numbers “95%” look the same on the page but the thought experiment behind them is different. That difference is not pedantry. It determines which departures matter, which checks are relevant, and how you should generalise.
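
A tiny numerical illustration of the two thought experiments, assuming simple random sampling without replacement from a simulated finite population (the population size, sample size, and lognormal outcome are arbitrary choices): the design‑based standard error carries a finite‑population correction, while the model‑based, super‑population one does not.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical finite population of N = 1,000 incomes; we sample n = 400 of them.
N, n = 1_000, 400
population = rng.lognormal(mean=3.0, sigma=0.5, size=N)
sample = rng.choice(population, size=n, replace=False)

mean = sample.mean()
s2 = sample.var(ddof=1)

# Model-based (super-population) standard error: the data are treated as iid
# draws from an underlying process, so N never enters.
se_model = np.sqrt(s2 / n)

# Design-based standard error under simple random sampling without replacement:
# the only randomness is which of the N fixed units landed in the sample, which
# adds the finite-population correction (1 - n/N).
se_design = np.sqrt((1 - n / N) * s2 / n)

print(f"mean {mean:.2f}, model-based se {se_model:.3f}, design-based se {se_design:.3f}")
```

When the sampling fraction is small the two coincide; when you sample a large share of a finite population, the design‑based interval is genuinely narrower.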

Most good empirical work is a blend. In experiments, analysts often report both a design‑based difference in means and a regression‑adjusted estimate that improves precision while respecting the randomisation. In surveys, analysts use design weights and replicate methods for uncertainty, then fit models that are design‑consistent (that is, they acknowledge the sampling scheme so that population claims remain valid). In observational causal questions, design thinking comes first: who could have been compared to whom, what overlap is there, what variation is plausibly as‑if random? Model thinking comes next: what functional form, which interactions, what robustness to alternative specifications?
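
A hedged sketch of why the weights matter, using an invented stratified survey in which one stratum is deliberately oversampled: the unweighted mean describes the sample you happened to draw, while the design‑weighted mean targets the population the survey was built to represent.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stratified survey: two strata of different sizes, with the small
# stratum oversampled, so inclusion probabilities differ across strata.
N_strata = np.array([8_000, 2_000])   # population sizes
n_strata = np.array([200, 200])       # sample sizes (stratum 2 oversampled)
mu_strata = [1.0, 3.0]                # stratum-level outcome means

samples, weights = [], []
for Nh, nh, mu in zip(N_strata, n_strata, mu_strata):
    samples.append(rng.normal(loc=mu, scale=1.0, size=nh))
    weights.append(np.full(nh, Nh / nh))   # design weight = 1 / inclusion prob.

y = np.concatenate(samples)
w = np.concatenate(weights)

# The unweighted mean over-represents the oversampled stratum; the weighted
# (Horvitz-Thompson style) mean recovers the population target of about 1.4.
print(f"unweighted mean:      {y.mean():.2f}")
print(f"design-weighted mean: {np.sum(w * y) / np.sum(w):.2f}")
```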

A useful habit is to keep a “design first, model second” workflow, even when the final analysis is model‑heavy. Start by writing down the mechanism that put units into your dataset or into treatment. Which claims does that mechanism justify without further assumptions? Then write the smallest set of additional assumptions you need to make the next claim. Check whether those assumptions are visible in the data (overlap), plausible in the setting (timing, institutions), and stable across perturbations (alternative functional forms, different bandwidths, leave‑one‑cluster‑out). If a result flips under small, reasonable changes, you are probably asking the model to do the design’s job.
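
One of those perturbation checks, leave‑one‑cluster‑out, is easy to sketch. The simulated clustered data and the 0.4 effect below are purely illustrative; the point is the shape of the check, not the numbers.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Hypothetical clustered data: 20 clusters of 50 units, treatment varying at
# the cluster level, plus cluster-level shocks.
clusters = np.repeat(np.arange(20), 50)
d = (clusters % 2).astype(float)
u = rng.normal(size=20)[clusters]
y = 1.0 + 0.4 * d + u + rng.normal(size=len(d))

def estimate(mask):
    """OLS coefficient on treatment for the subsample selected by mask."""
    X = sm.add_constant(d[mask])
    return sm.OLS(y[mask], X).fit().params[1]

full = estimate(np.ones_like(d, dtype=bool))

# Leave-one-cluster-out: does the headline estimate survive dropping any single
# cluster? Large swings suggest the result leans on a handful of units.
loco = [estimate(clusters != c) for c in np.unique(clusters)]
print(f"full-sample estimate:        {full:.3f}")
print(f"leave-one-cluster-out range: [{min(loco):.3f}, {max(loco):.3f}]")
```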

What goes wrong when you mix the logics?

The most common failure is to ignore design while doing model‑based analysis. Fitting an unweighted regression to complex survey data, for example, can change the target implicitly: you may be estimating a relationship for the sample you happened to observe rather than for the population the survey was built to represent, and your standard errors will usually be wrong if you forget stratification and clustering. A second failure is to over‑trust design while making claims the design does not support. A volunteer‑based RCT can identify a clean effect for the volunteers; it does not license a sweeping statement about everyone else without a transportability story. A third failure is to skip the identification step in observational work. Beautiful tables with robust standard errors cannot rescue a comparison that is not a causal contrast to begin with.

In summary: design‑based inference ties your claims to the mechanism that brought data into your hands or assigned treatments. Model‑based inference ties your claims to a transparent description of how the data behave. The first keeps you anchored; the second lets you range further. Use design whenever you can, use models where you must, and make it obvious to the reader which is doing the heavy lifting. Above all, remember that your numbers are about the paths you could have seen but did not, either under the design you actually used or under the model you are willing to defend.