Abstract
The rapid expansion of deep learning applications has driven significant interest in optimizing the execution of convolutional neural networks (CNNs), particularly on edge and embedded devices. The convolutional layer, being the computational backbone of CNNs, is highly resource-intensive and requires efficient implementation strategies. This paper proposes a hardware-software co-optimization framework that jointly tunes computational graph mappings and hardware accelerator configurations to maximize throughput and minimize energy consumption. The design leverages parameter-aware scheduling and layer-specific profiling to bridge the performance-efficiency gap observed in traditional accelerator deployments. Empirical results demonstrate up to a 2.4× improvement in latency and a 1.9× reduction in energy usage over baseline FPGA-based implementations.