Research Article, September 2023

Quantitative Assessment of Edge AI Model Compression Techniques to Enhance Performance of On-Device Natural Language Processing Applications

Abstract

Edge Artificial Intelligence (Edge AI) offers significant potential for real-time, private, and efficient execution of Natural Language Processing (NLP) tasks directly on mobile and embedded devices. However, the limited computational and memory resources of edge devices pose critical challenges for deploying large-scale NLP models. This study quantitatively evaluates state-of-the-art model compression techniques, including pruning, quantization, and knowledge distillation, for enhancing on-device NLP performance. Using benchmark datasets and representative NLP tasks, it measures inference time, memory footprint, and accuracy trade-offs, offering a comparative analysis to determine optimal strategies for different hardware scenarios. Results show that hybrid compression methods consistently outperform individual approaches in balancing efficiency and model fidelity, paving the way for practical deployment of NLP solutions on edge devices.
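Two of the compression techniques the abstract names, magnitude pruning and uniform quantization, can be illustrated with a minimal toy sketch. This is hypothetical illustrative code, not the study's implementation; the function names, the 50% sparsity target, and the symmetric int8 scheme are assumptions for the example only.

```python
def magnitude_prune(weights, sparsity):
    """Toy magnitude pruning: zero out the fraction `sparsity`
    of weights with the smallest absolute values."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Toy symmetric uniform quantization: map floats to int8
    codes in [-127, 127] with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

weights = [0.1, -0.5, 0.05, 0.9]
pruned = magnitude_prune(weights, 0.5)      # half the weights become 0.0
codes, scale = quantize_int8(weights)       # 8-bit codes plus one float scale
approx = dequantize(codes, scale)           # reconstruction error is at most ~scale/2
```

In this sketch the memory saving comes from storing sparse or 8-bit values instead of 32-bit floats, and the accuracy cost appears as the gap between `weights` and `approx`, the same efficiency/fidelity trade-off the study measures at full model scale.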

Keywords

edge AI; model compression; natural language processing; quantization; pruning; knowledge distillation; on-device inference
Details
Volume 4
Issue 2
Pages 1-7
ISSN 1248-5632