Research Article August, 2023

Unsupervised Query Reformulation through Latent Concept Induction in Large-Scale Heterogeneous Information Retrieval Environments

Abstract

In large-scale heterogeneous information retrieval (IR) environments, user queries are often semantically ambiguous or structurally sparse, which limits retrieval effectiveness. This paper proposes a novel unsupervised query reformulation framework based on latent concept induction (LCI), which learns implicit semantic structures from retrieved document sets. Unlike supervised approaches, the proposed model uncovers latent concepts autonomously through document co-occurrence and context propagation techniques. Experiments on the TREC and ClueWeb datasets show significant improvements in mean average precision (MAP) and normalized discounted cumulative gain (nDCG) over baseline and supervised models. The LCI framework enhances retrieval effectiveness without requiring annotated query reformulation data, making it scalable across domains and languages.
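For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAP and nDCG are commonly computed. It is not the evaluation code used in the paper. The function names, the binary relevance flags for average precision, the linear-gain form of DCG, and the choice to build the ideal ranking from the retrieved list alone are all assumptions made for illustration.

```python
import math


def average_precision(ranked_rels):
    """Average precision for one query.

    ranked_rels: list of 0/1 relevance flags in rank order (assumed input format).
    Assumes every relevant document appears somewhere in the ranked list.
    """
    total_relevant = sum(ranked_rels)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this relevant rank
    return score / total_relevant


def mean_average_precision(runs):
    """MAP: mean of per-query average precision over a list of queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg(gains, k=None):
    """Discounted cumulative gain with linear gains and a log2 rank discount."""
    if k is not None:
        gains = gains[:k]
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains, k=None):
    """nDCG: DCG divided by the DCG of the ideal (descending) ordering.

    The ideal ordering here is built from the retrieved list only (an assumption).
    """
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

A ranking that places all relevant documents first scores 1.0 on both metrics; lower values indicate that relevant documents appear further down the list.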

Keywords

query reformulation, latent concept induction, unsupervised learning, information retrieval, semantic matching, large-scale retrieval
Details
Volume 4
Issue 2
Pages 1-6
ISSN 1551-1116