Event Sourcing for Retail Inventory: Kafka + BigQuery Real-Time Analytics

Sandeep Reddy Gundla, Lead Software Engineer, MACYS Inc, GA, USA

Abstract

Retailers struggle to maintain accurate, low-latency inventory levels because they still rely on legacy batch processing systems. This paper proposes a real-time streaming architecture based on event sourcing, using Apache Kafka for event transport and Google BigQuery for streaming analytics. Inventory changes such as sales, returns, and stock adjustments are recorded as immutable events and streamed reliably to Kafka. Kafka's scalable, fault-tolerant design and BigQuery's serverless model together provide a durable foundation for real-time analytics with sub-second query response. Key contributions include a detailed schema design, an optimized Kafka topic configuration, and real-time dashboards that give insight into inventory. The results show update latency below one second at up to 100,000 events per second, together with cost benchmarks that indicate efficient use of cloud infrastructure. The solution improves real-time stock monitoring, which reduces out-of-stock incidents and supports better decision making. For retail operations, it offers tighter control over inventory management, pricing, and demand forecasting. The model marks a significant step away from legacy batch ETL processes and towards having correct data available when needed, in support of a high-velocity, globalised retail sector. Future work will focus on multi-region deployment, automated schema evolution, and edge computing for offline store scenarios.
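The event-sourcing idea summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `InventoryEvent` fields, the `SALE`/`RETURN`/`ADJUSTMENT` type names, the `store_id:sku` partition key, and the JSON encoding are all assumptions made for illustration, and no Kafka or BigQuery client is used. The sketch only shows how immutable events can be replayed to derive current stock, and how a per-SKU key would keep one item's events in order if they were produced to a Kafka topic.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class InventoryEvent:
    """Hypothetical event shape; the paper's actual schema is not shown here."""
    event_id: str     # unique id, used to ignore redelivered duplicates
    store_id: str
    sku: str
    event_type: str   # assumed values: "SALE", "RETURN", "ADJUSTMENT"
    quantity: int     # positive for SALE/RETURN, signed for ADJUSTMENT
    occurred_at: str  # ISO-8601 timestamp


def signed_delta(event: InventoryEvent) -> int:
    """Translate an event into the change it makes to stock on hand."""
    if event.event_type == "SALE":
        return -event.quantity
    if event.event_type in ("RETURN", "ADJUSTMENT"):
        return event.quantity
    raise ValueError(f"unknown event type: {event.event_type}")


def replay(events: Iterable[InventoryEvent]) -> Dict[Tuple[str, str], int]:
    """Rebuild stock per (store, SKU) by folding over the event log in order."""
    stock: Dict[Tuple[str, str], int] = defaultdict(int)
    seen = set()
    for event in events:
        if event.event_id in seen:  # skip duplicates from at-least-once delivery
            continue
        seen.add(event.event_id)
        stock[(event.store_id, event.sku)] += signed_delta(event)
    return dict(stock)


def partition_key(event: InventoryEvent) -> bytes:
    """Assumed message key: same store and SKU always map to the same key."""
    return f"{event.store_id}:{event.sku}".encode("utf-8")


def serialize(event: InventoryEvent) -> bytes:
    """Assumed wire format: JSON with sorted keys for stable output."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


# Illustrative log: initial count, a sale, a return, and a redelivered sale.
EXAMPLE_LOG = [
    InventoryEvent("e1", "store-1", "sku-42", "ADJUSTMENT", 100, "2024-01-01T09:00:00Z"),
    InventoryEvent("e2", "store-1", "sku-42", "SALE", 3, "2024-01-01T09:05:00Z"),
    InventoryEvent("e3", "store-1", "sku-42", "RETURN", 1, "2024-01-01T09:30:00Z"),
    InventoryEvent("e2", "store-1", "sku-42", "SALE", 3, "2024-01-01T09:05:00Z"),
]
```

Calling `replay(EXAMPLE_LOG)` yields a stock level of 98 for `("store-1", "sku-42")`: the duplicate `e2` is ignored, so the state is 100 - 3 + 1. Because state is always derived from the log rather than updated in place, the same events can be replayed into an analytical store to rebuild or audit inventory at any point.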

Keywords

Event Sourcing, Kafka, BigQuery, Real-Time Analytics, Retail Inventory.

How to Cite

Event Sourcing for Retail Inventory: Kafka + BigQuery Real-Time Analytics. (2024). International Journal of Data Science and Machine Learning, 4(01), 11-36. https://www.academicpublishers.org/journals/index.php/ijdsml/article/view/6119