Research Article, May 2023

MAKING GENERATIVE AI MODELS SMALLER: MODEL COMPRESSION TECHNIQUES AND THEIR CHALLENGES

Abstract

From computer vision and NLP to speech recognition and autonomous driving, deep learning models have proven remarkably effective across many domains. The size of these models, however, grows with the complexity of the tasks and the amount of information they encode. Several strong reasons motivate shrinking deep learning models: reducing model size is critical for overcoming storage and compute constraints, improving efficiency and inference speed, lowering environmental impact, and addressing privacy and security concerns. As deep learning advances and finds applications in new fields, efforts to reduce model size will be central to building scalable, accessible, and sustainable deep learning systems. In this work, we explore model compression techniques such as quantization, model pruning, and low-rank factorization, and highlight their limitations. Beyond building more efficient models, it is also paramount to make them sustainable, and model compression plays a critical role in that.
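As a concrete illustration of the three techniques named in the abstract, the PyTorch sketch below applies dynamic quantization, magnitude pruning, and a truncated-SVD low-rank factorization to a toy model. This is a minimal sketch for orientation only: the model architecture, the 30% pruning ratio, and the rank of 32 are arbitrary assumptions for demonstration, not the configurations studied in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical toy model, used only for illustration.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# 1) Quantization: store Linear weights as 8-bit integers,
#    dequantizing on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# 2) Pruning: zero out the 30% smallest-magnitude weights of the
#    first layer (the ratio is an arbitrary choice here).
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")  # make the sparsity permanent

# 3) Low-rank factorization: approximate a weight matrix W by the
#    product of two thin matrices from a truncated SVD.
W = model[0].weight.detach()              # shape (256, 512)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
rank = 32                                 # assumed rank; a tunable trade-off
A = U[:, :rank] * S[:rank]                # (256, rank)
B = Vh[:rank, :]                          # (rank, 512)
print(torch.dist(W, A @ B))               # reconstruction error of A @ B
```

In the factorization step, the two thin matrices store 256 x 32 + 32 x 512 = 24,576 parameters in place of the original 256 x 512 = 131,072, roughly a 5.3x reduction; the cost is an approximation error that grows as the chosen rank shrinks.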

Keywords

model size reduction; quantization; pruning; low-rank factorization; hybrid-precision
Details
Volume 4, Issue 1, Pages 65–71
ISSN 2251-2809