Research Article, September 2023

Quantitative Assessment of Edge AI Model Compression Techniques to Enhance Performance of On-Device Natural Language Processing Applications

Abstract

Edge Artificial Intelligence (Edge AI) offers significant potential for real-time, private, and efficient execution of Natural Language Processing (NLP) tasks directly on mobile and embedded devices. However, the limited computational and memory resources of edge devices pose critical challenges for deploying large-scale NLP models. This study quantitatively evaluates state-of-the-art model compression techniques, including pruning, quantization, and knowledge distillation, for enhancing on-device NLP performance. Using benchmark datasets and representative NLP tasks, it measures inference time, memory footprint, and accuracy trade-offs, offering a comparative analysis to determine optimal strategies for different hardware scenarios. Results show that hybrid compression methods consistently outperform individual approaches in balancing efficiency and model fidelity, paving the way for practical deployment of NLP solutions on edge devices.
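Two of the compression techniques the abstract names, magnitude pruning and uniform quantization, can be illustrated with a minimal toy sketch. This is hypothetical illustrative code, not the study's implementation; the function names, the 50% sparsity target, and the symmetric int8 scheme are assumptions for the example only.

```python
def magnitude_prune(weights, sparsity):
    """Toy magnitude pruning: zero out the fraction `sparsity`
    of weights with the smallest absolute values."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Toy symmetric uniform quantization: map floats to int8
    codes in [-127, 127] with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

weights = [0.1, -0.5, 0.05, 0.9]
pruned = magnitude_prune(weights, 0.5)      # half the weights become 0.0
codes, scale = quantize_int8(weights)       # 8-bit codes plus one float scale
approx = dequantize(codes, scale)           # reconstruction error is at most ~scale/2
```

In this sketch the memory saving comes from storing sparse or 8-bit values instead of 32-bit floats, and the accuracy cost appears as the gap between `weights` and `approx`, the same efficiency/fidelity trade-off the study measures at full model scale.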

Keywords

edge AI; model compression; natural language processing; quantization; pruning; knowledge distillation; on-device inference
Details
Volume 4
Issue 2
Pages 1-7
ISSN 1248-5632