Improving Model Deployment Pipelines for Efficiency in Cloud-Based Machine Learning Platforms

Sanjeev Kumar

Pages - 16 - 25 | Revised - 31-01-2025 | Published - 28-02-2025

Published in International Journal of Software Engineering (IJSE)

Volume - 12 Issue - 1 | Publication Date - February 2025

KEYWORDS

Model Deployment, Cloud-based Machine Learning, CI/CD Pipelines, Serverless Computing, Resource Optimization.

ABSTRACT

Growing demand for cloud-based machine learning solutions has sharpened the focus on making model deployment pipelines efficient. These pipelines are essential for scaling trained models, serving real-time predictions, and managing the complexity of cloud infrastructure. This paper reports on strategies for improving model deployment pipelines on cloud-based ML platforms, centered on automation, monitoring, and resource optimization. We investigate current tools, such as containerization, serverless computing, and CI/CD frameworks, that streamline the transition from development to production. We also investigate how advanced monitoring tools support efficient resource allocation while keeping downtime and latency low. We discuss case studies from leading cloud providers and propose an optimized architecture model suited to varied applications. Our experiments demonstrate that the optimized pipelines can yield up to an order-of-magnitude improvement in deployment speed, model performance, and cost effectiveness, providing a robust basis for scaling ML solutions in the cloud. Finally, we point out some limitations of current approaches and outline areas for future research on extending deployment pipelines to increasingly complex cloud environments.

MANUSCRIPT AUTHORS

Mr. Sanjeev Kumar

Independent researcher, SME in Cloud Engineering, Georgia - United States of America

sanjeevkumar.sk@ieee.org
