Improving Model Deployment Pipelines for Efficiency in Cloud-Based Machine Learning Platforms
Sanjeev Kumar
Pages: 16-25 | Revised: 31-01-2025 | Published: 28-02-2025
Published in International Journal of Software Engineering (IJSE)
KEYWORDS
Model Deployment, Cloud-based Machine Learning, CI/CD Pipelines, Serverless
Computing, Resource Optimization.
ABSTRACT
The growing demand for cloud-based machine learning solutions has sharpened the focus on making
model deployment pipelines efficient. These pipelines are essential for scaling trained models,
serving real-time predictions, and managing the complexity of cloud infrastructure. This paper
reports on strategies for improving model deployment pipelines on cloud-based ML platforms,
centered on automation, monitoring, and resource optimization. We investigate current tools,
such as containerization, serverless computing, and CI/CD frameworks, for streamlining the
transition from development to production. We also investigate how advanced monitoring tools
support efficient resource allocation while keeping downtime and latency low. We discuss case
studies from leading cloud providers and propose an optimized architecture model suited to
varied applications. Our experiments demonstrate that the optimized pipelines can achieve up to an
order-of-magnitude improvement in deployment speed, model performance, and cost
effectiveness, providing a robust basis for scaling ML solutions in the cloud. Finally, we point out
some limitations of current approaches and outline areas for future research on extending
deployment pipelines to increasingly complex cloud environments.
Mr. Sanjeev Kumar
Independent researcher, SME in Cloud Engineering, Georgia - United States of America
sanjeevkumar.sk@ieee.org