Articles | Open Access | https://doi.org/10.55640/ijdsml-05-01-25

A Review of Large Language Models in Edge Computing: Applications, Challenges, Benefits, and Deployment Strategies

Venkata Srinivas Kompally, Northeastern University, Boston, MA

Abstract

Large Language Models (LLMs) have achieved considerable success in natural language processing, but deploying these powerful models on edge computing devices across domains presents unique challenges. This paper reviews the state of LLMs in edge computing, focusing on four key aspects: their emerging applications across various sectors, the technical challenges of running LLMs on resource-constrained edge devices, the potential benefits of bringing LLM capabilities closer to data sources, and effective deployment strategies for enabling LLMs at the edge. We discuss how edge deployment of LLMs could offer low-latency, privacy-preserving intelligent assistance across a range of domains, such as healthcare, IoT, and industrial automation. We also examine techniques and architectures that can overcome the limitations of edge devices, such as cloud-edge collaboration, federated learning, model compression, and on-device inference. By examining current practices and their trade-offs, this review identifies practical ways to integrate LLMs into edge environments. It also provides guidance for future research to address the remaining open issues in this rapidly expanding field.

Keywords

Edge computing, Large language models (LLMs), Edge AI, On-device inference, IoT, Federated learning, Model compression, Deployment strategies

How to Cite

Kompally, V. S. (2025). A Review of Large Language Models in Edge Computing: Applications, Challenges, Benefits, and Deployment Strategies. International Journal of Data Science and Machine Learning, 5(01), 300-322. https://doi.org/10.55640/ijdsml-05-01-25