
Machine Learning–Augmented ETL Pipelines for Fraud–Resistant Insurance Claims Processing
Kawaljeet Singh Chadha, University of the Cumberlands, Williamsburg, KY, USA
Abstract
Insurance fraud imposes massive financial losses and operational inefficiencies on the insurance industry. Current fraud detection methods tend to rely on rule-based systems and static Extract, Transform, Load (ETL) pipelines, which cannot keep pace with rapidly evolving fraud tactics. These conventional approaches exhibit high false-positive rates, offer limited flexibility, and cannot perform real-time analysis, which delays detection and increases operational costs. This article describes the integration of machine learning (ML) techniques into ETL pipelines to enable real-time, data-driven fraud identification during insurance claims processing. The system embeds supervised ML classifiers within the ETL workflow, enabling dynamic analysis of claims data during ingestion and transformation. Temporal behavior modeling and enrichment from external data sources, combined with a fraud auto-registry, allow the system to improve its detection of complex behaviors over time. Pipeline orchestration supports scalability and near real-time processing, resulting in timely fraud risk scoring. Experimental results demonstrate that the proposed methods significantly improve detection accuracy and reduce latency compared to traditional methods. Incorporating dimensionality reduction techniques can further optimize model performance. With this approach, claims processing can adapt to dynamic, ever-changing scale without compromising efficiency or resiliency. Ultimately, the proposed ML-augmented ETL pipeline provides insurers with a powerful tool for reducing fraud losses while maintaining agility and compliance.
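To make the idea of scoring claims inside the ETL flow concrete, the following is a minimal sketch and not the article's implementation. It assumes hypothetical claim fields (`policy_id`, `claim_date`, `amount`, `insured_value`, `policy_start`) and uses a logistic scoring function with made-up weights as a stand-in for a trained supervised classifier. The transform step derives one simple temporal feature: the number of prior claims on the same policy in the preceding 90 days.

```python
import math
from datetime import date

# Hypothetical weights standing in for a trained supervised classifier.
WEIGHTS = {"amount_ratio": 1.2, "claims_last_90d": 1.5, "days_since_policy_start": -0.01}
BIAS = -2.5


def extract(raw_rows):
    """Extract: yield a copy of each raw claim record."""
    for row in raw_rows:
        yield dict(row)


def transform(claim, history):
    """Transform: derive model features, including a simple temporal one."""
    claim_date = claim["claim_date"]
    prior_dates = history.get(claim["policy_id"], [])
    recent = [d for d in prior_dates if 0 <= (claim_date - d).days <= 90]
    return {
        "amount_ratio": claim["amount"] / claim["insured_value"],
        "claims_last_90d": len(recent),
        "days_since_policy_start": (claim_date - claim["policy_start"]).days,
    }


def score(features):
    """Score: logistic function over the weighted features (risk in 0..1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def run_pipeline(raw_rows, threshold=0.5):
    """Run extract, transform, score, and load for each claim in arrival order."""
    history = {}  # policy_id -> dates of claims already processed
    loaded = []   # stand-in for the load target (e.g. a warehouse table)
    for claim in extract(raw_rows):
        risk = score(transform(claim, history))
        loaded.append({**claim, "fraud_risk": risk, "flagged": risk >= threshold})
        history.setdefault(claim["policy_id"], []).append(claim["claim_date"])
    return loaded


# Illustrative data: three large claims on one policy within a few weeks.
claims = [
    {"policy_id": "P1", "claim_date": date(2024, 1, 10), "amount": 9000,
     "insured_value": 10000, "policy_start": date(2024, 1, 1)},
    {"policy_id": "P1", "claim_date": date(2024, 1, 20), "amount": 9000,
     "insured_value": 10000, "policy_start": date(2024, 1, 1)},
    {"policy_id": "P1", "claim_date": date(2024, 2, 1), "amount": 9800,
     "insured_value": 10000, "policy_start": date(2024, 1, 1)},
]
results = run_pipeline(claims)
for r in results:
    print(r["claim_date"], round(r["fraud_risk"], 3), r["flagged"])
```

In this sketch the risk score rises with each repeated claim on the same policy, and only the third claim crosses the illustrative 0.5 threshold. In a real deployment the `score` step would call a trained model, and the threshold would be tuned against the false-positive rate.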
Keywords
Insurance fraud detection, Machine learning, ETL pipeline, Temporal behavior modeling, Real-time fraud scoring.
Copyright License
Copyright (c) 2025 Kawaljeet Singh Chadha

This work is licensed under a Creative Commons Attribution 4.0 International License.