AnnoGCD: a generalized category discovery framework for automatic cell type annotation

Pietro Liò; Francesco Ceccarelli

doi:10.1093/nargab/lqae166

Research Article, December 2024

NAR Genomics and Bioinformatics

Abstract

The identification of cell types in single-cell RNA sequencing (scRNA-seq) data is a critical task in understanding complex biological systems. Traditional supervised machine learning methods rely on large, well-labeled datasets, which are often impractical to obtain in open-world scenarios due to budget constraints and incomplete information. To address these challenges, we propose a novel computational framework, named AnnoGCD, building on Generalized Category Discovery (GCD) and Anomaly Detection (AD) for automatic cell type annotation. Our semi-supervised method combines labeled and unlabeled data to accurately classify known cell types and to discover novel ones, even in imbalanced datasets. AnnoGCD includes a semi-supervised block to first classify known cell types, followed by an unsupervised block aimed at identifying and clustering novel cell types. We evaluated our approach on five human scRNA-seq datasets and a mouse model atlas, demonstrating superior performance in both known and novel cell type identification compared to existing methods. Our model also exhibited robustness in datasets with significant class imbalance. The results suggest that AnnoGCD is a powerful tool for the automatic annotation of cell types in scRNA-seq data, providing a scalable solution for biological research and clinical applications. Our code and the datasets used for evaluations are publicly available on GitHub: https://github.com/cecca46/AnnoGCD/.
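The two-block design summarized above can be illustrated with a toy sketch. The first block classifies cells into known types. The second block flags cells that fit none of those types and clusters them. This is not AnnoGCD's implementation, which is in the linked repository. The sketch swaps in three simpler techniques: a nearest-centroid classifier stands in for the semi-supervised block, a distance-to-centroid threshold stands in for the anomaly detector, and plain k-means stands in for novel-type clustering. The function names (`fit_centroids`, `kmeans`, `annotate`), the 0.99 quantile, and the synthetic 2-D data are hypothetical choices made for illustration.

```python
import numpy as np

# Illustrative sketch only: NOT the AnnoGCD model. Nearest-centroid
# classification + distance-threshold anomaly flagging + k-means are
# stand-ins for the paper's semi-supervised and unsupervised blocks.

def fit_centroids(X_lab, y_lab):
    """Mean profile of each known (labeled) cell type; classes are sorted."""
    classes = np.unique(y_lab)
    centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
    return classes, centroids

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, used to group the cells flagged as novel."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        new_centers = np.stack([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

def annotate(X_lab, y_lab, X_unlab, n_novel=1, quantile=0.99):
    """Block 1: give each unlabeled cell its nearest known type.
    Block 2: flag cells farther from every known centroid than the given
    quantile of labeled cells' distances to their own centroid, then
    cluster the flagged cells into n_novel putative new types."""
    classes, centroids = fit_centroids(X_lab, y_lab)
    own = centroids[np.searchsorted(classes, y_lab)]
    threshold = np.quantile(np.linalg.norm(X_lab - own, axis=1), quantile)

    dist = np.linalg.norm(X_unlab[:, None, :] - centroids[None, :, :], axis=2)
    labels = classes[dist.argmin(axis=1)].astype(object)
    is_novel = dist.min(axis=1) > threshold
    if is_novel.any():
        k = min(n_novel, int(is_novel.sum()))
        clusters = kmeans(X_unlab[is_novel], k)
        labels[is_novel] = [f"novel_{c}" for c in clusters]
    return labels

# Toy usage: two labeled types "A" and "B", plus an unseen type near (0, 10)
rng = np.random.default_rng(42)

def blob(center, n):
    return rng.normal(loc=center, scale=0.5, size=(n, 2))

X_lab = np.vstack([blob([0, 0], 100), blob([10, 0], 100)])
y_lab = np.array(["A"] * 100 + ["B"] * 100)
X_unlab = np.vstack([blob([0, 0], 50), blob([10, 0], 50), blob([0, 10], 50)])
pred = annotate(X_lab, y_lab, X_unlab, n_novel=1)
```

Real inputs would be normalized and usually dimensionality-reduced expression matrices. The number of novel types would typically have to be estimated rather than passed in. Both are simplifications made for this sketch.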

Volume 6, Issue 4, Pages 1-10

ISSN 2631-9268