
Scalable Data Quality Frameworks for Record Keeper Aggregation in Financial Platforms - Proposes a framework to standardize and enhance the quality of financial datasets across heterogeneous record keepers
Santosh Durgam, Manager of Software Engineering, Morningstar Investments LLC, Chicago, Illinois, USA
Abstract
With the rapid development of financial services, platforms need efficient recording and data aggregation so that information from varying record keepers (banks, custodians, pension administrators) can be combined. Bringing disparate datasets into one unified place exposes glaring differences in data formats and reporting standards, as well as missing metadata. Inaccurate, untimely, and unreliable financial data threatens business operations and compliance requirements. This paper proposes an aggregation framework for financial datasets that produces a scalable, standardized aggregated dataset of improved quality. The framework addresses data quality through a modular architecture, real-time validation, and centralized monitoring. It also uses metadata-driven rule handling and automation via ETL pipelines to ensure data integrity and compliance. A case study of a multi-manager pension platform using the proposed framework demonstrates improved data consistency, more timely reporting, and fewer reconciliation errors. The paper ends by discussing ethical issues, explaining how to put the framework into practice, and looking at future trends: AI for predictive error models, blockchain for data lineage and auditability, and regulators' use of RegTech to automate compliance reporting. Taken together, the proposed framework offers financial institutions, fintech platforms, and asset managers a comprehensive way to make operations more efficient and to build trust in financial data across the industry.
Keywords
Data Aggregation, Record Keepers, Data Quality Framework, Compliance, AI and ML, Blockchain.
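As an illustration of the metadata-driven rule handling mentioned in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes rules are stored as data (field, check name, parameters) and applied to incoming records at validation time. The field names, check names, and the `Rule` and `validate` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical check names mapped to predicates; none of these come from the paper.
CHECKS: dict[str, Callable[[Any, dict], bool]] = {
    "required": lambda value, params: value not in (None, ""),
    "min": lambda value, params: value is not None and value >= params["limit"],
    "one_of": lambda value, params: value in params["allowed"],
}


@dataclass(frozen=True)
class Rule:
    """One validation rule expressed as metadata rather than code."""
    field: str
    check: str
    params: dict


# Assumed rule set for an illustrative account record.
RULES = [
    Rule("account_id", "required", {}),
    Rule("balance", "min", {"limit": 0}),
    Rule("currency", "one_of", {"allowed": {"USD", "EUR", "GBP"}}),
]


def validate(record: dict, rules: list[Rule]) -> list[str]:
    """Return a list of violations for one record (empty if it passes)."""
    violations = []
    for rule in rules:
        value = record.get(rule.field)
        if not CHECKS[rule.check](value, rule.params):
            violations.append(f"{rule.field}: failed '{rule.check}'")
    return violations


# Illustrative records from two hypothetical record keepers.
good = {"account_id": "A-100", "balance": 2500.0, "currency": "USD"}
bad = {"account_id": "", "balance": -10.0, "currency": "XYZ"}

if __name__ == "__main__":
    print(validate(good, RULES))
    print(validate(bad, RULES))
```

Keeping rules as data means a new record keeper's constraints could, under these assumptions, be added by editing the rule list rather than the validation code.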
Copyright License
Copyright (c) 2025 Santosh Durgam

This work is licensed under a Creative Commons Attribution 4.0 International License.