Research Article May, 2025

Contrastive and Generative Self-Supervised Learning for Robust Feature Extraction in Cross-Modal and Low-Label Regimes

Abstract

Learning meaningful representations without large amounts of labeled data has become a central challenge in machine learning, especially for multimodal data with sparse annotation. This paper explores a hybrid approach that combines contrastive and generative self-supervised learning for robust feature extraction in cross-modal settings under low-label regimes. The proposed framework jointly optimizes two objectives: a contrastive objective that aligns representations across modalities, and a latent reconstruction objective that preserves sample diversity. Empirical evaluation on image-text and audio-visual datasets shows improved performance on downstream classification and transfer learning tasks. The findings support the potential of integrated self-supervision for scalable, data-efficient representation learning.
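
The abstract does not state the exact loss functions, so the following is only an illustrative sketch of how a joint contrastive and reconstruction objective might be written. It assumes a symmetric InfoNCE term for cross-modal alignment and a mean-squared-error term for reconstruction. The function names, the temperature, and the weighting parameter `recon_weight` are hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    # InfoNCE: for each anchor i, cross-entropy of picking positive i
    # among all positives in the batch, using cosine similarity logits.
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)

def reconstruction_mse(targets, reconstructions):
    # Mean squared error over all elements of all samples.
    n = sum(len(t) for t in targets)
    sq = sum((x - y) ** 2
             for t, r in zip(targets, reconstructions)
             for x, y in zip(t, r))
    return sq / n

def hybrid_loss(z_a, z_b, targets, reconstructions,
                temperature=0.1, recon_weight=1.0):
    # Assumed combination: symmetric contrastive term plus weighted
    # reconstruction term. The paper's actual weighting is not specified.
    contrastive = 0.5 * (info_nce(z_a, z_b, temperature)
                         + info_nce(z_b, z_a, temperature))
    return contrastive + recon_weight * reconstruction_mse(targets, reconstructions)
```

Here `z_a` and `z_b` stand for paired embeddings from two modalities (for example image and text), and `targets` and `reconstructions` for a decoder's inputs and outputs. In this sketch, correctly paired embeddings yield a lower contrastive term than mismatched ones, and `recon_weight` trades off alignment against reconstruction fidelity.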

Keywords

Self-Supervised Learning; Contrastive Learning; Generative Learning; Cross-Modal Representation; Low-Label Regimes; Unsupervised Learning; Feature Extraction
Details
Volume 15
Issue 3
Pages 7-12
ISSN 2223-1331