Design and Evaluation of Resource-Aware AI Services Using Serverless Functions on the Cloud
Abstract
As the demand for artificial intelligence (AI) services continues to grow, cloud-native paradigms such as serverless computing have emerged as critical enablers of efficient, elastic, and cost-effective AI deployments. This study investigates the design and evaluation of resource-aware AI services using serverless functions in the cloud. We explore architectural models, resource allocation mechanisms, and scheduling techniques tailored to dynamic AI workloads. Using published benchmarks and container orchestration logs, we compare function performance under different resource-aware policies. Our findings indicate that resource-aware adaptation can reduce cold-start latency by 35% and improve execution throughput by up to 42% across heterogeneous workloads. The paper contributes a lightweight evaluation framework and discusses implications for sustainability in large-scale AI inference environments.