This simulation study explores whether formal sampling strategies for selecting districts and schools improve the generalizability of impact evidence from experimental studies. Specifically, the simulation evaluates a hypothetical intervention targeting K–5 schools. The authors constructed a national target population of schools from the Common Core of Data and generated simulated impacts of the intervention for the entire population. From this population, they selected a sample of districts and schools, simulated district and school decisions about whether to participate, and simulated replacing districts and schools that declined to participate. They then calculated the average school-level impact for the resulting sample of schools and compared it with the average impact for the target population. The simulation repeated this procedure many times, each time selecting a different sample.
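A minimal sketch of that repeated-sampling loop is below, assuming simplified stand-ins for the CCD population, the impact model, and the participation model; all names, distributions, and parameters here are hypothetical illustrations, not the study's actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SCHOOLS = 50_000       # stand-in for the national K-5 school population
SAMPLE_SIZE = 60         # schools recruited per simulated study
N_REPLICATIONS = 1_000   # number of simulated studies

# Simulated school-level impacts for the full target population.
population_impacts = rng.normal(loc=0.20, scale=0.10, size=N_SCHOOLS)
population_mean = population_impacts.mean()

def recruit_sample(impacts, size, participation_rate=0.5):
    """Recruit schools one at a time; a school that declines is replaced
    by the next candidate in the selection order."""
    order = rng.permutation(len(impacts))  # placeholder selection order
    sample = []
    for idx in order:
        if rng.random() < participation_rate:  # school agrees to join
            sample.append(impacts[idx])
        if len(sample) == size:
            break
    return np.array(sample)

# Repeat the procedure many times, each time drawing a fresh sample,
# and compare each sample's average impact to the population average.
errors = [
    recruit_sample(population_impacts, SAMPLE_SIZE).mean() - population_mean
    for _ in range(N_REPLICATIONS)
]
print(f"mean bias: {np.mean(errors):+.4f}, "
      f"RMSE: {np.sqrt(np.mean(np.square(errors))):.4f}")
```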
The authors tested three selection strategies: (1) a stylized approach that recruits districts and schools in order from largest to smallest; (2) random selection with probabilities proportional to district size, as used in some surveys; and (3) balanced selection, which prioritizes the most typical districts and schools based on their characteristics. They tested all nine combinations of these three approaches applied at the district and school levels.
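The sketch below illustrates one way the three selection rules could be expressed at the district level, using a hypothetical table of district characteristics; the study's actual definitions of size and "typicality" are more detailed than shown here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical district frame; "pct_frpl" is an illustrative characteristic.
districts = pd.DataFrame({
    "district_id": range(500),
    "enrollment": rng.lognormal(mean=8, sigma=1, size=500),
    "pct_frpl": rng.uniform(0, 1, size=500),
})

def select_largest(df, k):
    # (1) Stylized rule: recruit in order from largest to smallest.
    return df.nlargest(k, "enrollment")

def select_pps(df, k):
    # (2) Random selection with probability proportional to size.
    probs = df["enrollment"] / df["enrollment"].sum()
    idx = rng.choice(df.index, size=k, replace=False, p=probs)
    return df.loc[idx]

def select_balanced(df, k, cols=("enrollment", "pct_frpl")):
    # (3) Balanced selection: prioritize districts closest to the
    # population mean on the chosen characteristics (standardized distance).
    z = (df[list(cols)] - df[list(cols)].mean()) / df[list(cols)].std()
    dist = np.sqrt((z ** 2).sum(axis=1))
    return df.loc[dist.nsmallest(k).index]

for pick in (select_largest, select_pps, select_balanced):
    chosen = pick(districts, 10)
    print(pick.__name__, sorted(chosen["district_id"])[:5], "...")
```

Pairing any of the three district rules with any of the three school rules yields the nine combinations the authors compare.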
The study finds that random selection of districts, combined with either balanced or random selection of schools, produced samples with the most consistently strong generalizability. The study also examines recruitment burden, the selection of replacement districts under random selection, and the sensitivity of the findings to the simulation parameters.