Research Article, September 2023

Architecting Scalable Feature Engineering Pipelines through Automated Machine Learning and Data Mining Techniques in Heterogeneous Data Ecosystems

Abstract

Feature engineering remains a critical and resource-intensive phase in the machine learning (ML) lifecycle, especially within large-scale, heterogeneous data ecosystems. This paper investigates how automated machine learning (AutoML) and data mining techniques can be systematically orchestrated to develop scalable and adaptive feature engineering pipelines. We present a synthesis of existing literature and introduce architectural strategies that ensure both computational scalability and semantic alignment across disparate data sources. Visual artifacts such as flowcharts and tabular summaries aid in illustrating the challenges and solutions in constructing robust, automated feature transformation pipelines. Our findings suggest that the integration of AutoML with knowledge-driven feature selection leads to enhanced model performance and generalization across diverse domains.

Keywords

AutoML, feature engineering, data mining, scalable pipelines, heterogeneous data, high-dimensional data, ML automation, feature transformation
Details
Volume 4
Issue 2
Pages 1-7
ISSN 1142-4177