Open Access
Integrating Probability and Nonprobability Survey Samples for Robust Population Inference: Theoretical Foundations, Methodological Innovations, and Practical Implications
Dr. Alejandro M. Ríos, Department of Statistics and Data Science, Universidad Nacional de Córdoba, Argentina
Abstract
The rapid expansion of digital data sources, online panels, and administrative records has profoundly transformed the landscape of survey research. Traditional probability sampling, long regarded as the gold standard for population inference, is increasingly complemented or even supplanted by nonprobability samples due to cost, timeliness, and operational constraints. However, nonprobability samples pose substantial challenges for valid statistical inference, primarily because of unknown selection mechanisms and systematic selection biases. This article develops an extensive theoretical and methodological examination of data integration strategies that combine probability and nonprobability samples to support robust population-level inference. Drawing strictly on foundational and contemporary literature in survey statistics, the study synthesizes classical sampling theory with modern approaches such as mass imputation, propensity score weighting, doubly robust estimation, and statistical learning–based adjustments. The article elaborates on the conceptual underpinnings of these methods, the assumptions required for their validity, and the practical consequences of assumption violations, particularly focusing on common support, ignorability, and nonresponse mechanisms. Using the National Health and Nutrition Examination Survey as a conceptual reference framework, the paper explores how probability samples can serve as calibration anchors for integrating rich but biased nonprobability data. Rather than presenting numerical results, the analysis emphasizes interpretive insights, methodological trade-offs, and inferential implications. The discussion critically evaluates the limits of existing methods, highlighting the persistent risks of overconfidence in hybrid estimators and the need for transparency in uncertainty assessment. 
The article concludes by outlining future research directions, including the integration of machine learning with survey theory and the development of principled diagnostics for assessing inferential validity. Overall, this work offers a comprehensive synthesis for the evolving field of survey data integration.
Keywords
Nonprobability samples, probability sampling, data integration
Copyright License
Copyright (c) 2026 Dr. Alejandro M. Ríos

This work is licensed under a Creative Commons Attribution 4.0 International License.