Transparent Peer Review By Scholar9

ASD-Pipeline: An Ensemble Machine Learning Framework Integrating Feature Selection, Behavioural Clustering, and Class Rebalancing for Accurate Autism Spectrum Disorder Prediction

Abstract

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition characterized by a variety of behavioral and cognitive patterns. Early and precise detection is critical for enabling timely interventions. Conventional classification models frequently generalize poorly because of irrelevant features, unstructured behavioral data, and severe class imbalance. Despite recent advances in machine learning for ASD detection, existing models do not integrate adaptive feature selection, behavioral grouping, and class-imbalance handling in a unified, end-to-end pipeline. This lack of integration frequently results in suboptimal performance and limited interpretability. This study proposes a new ensemble-based framework called ASD-Pipeline, which integrates flexible feature selection, hybrid clustering, synthetic minority oversampling, and ensemble voting classification to improve predictive performance for ASD identification. The framework uses a five-stage process. First, the dataset is normalized using Min-Max scaling so that feature ranges remain consistent. Next, feature selection is performed using FlexiFeat, an ensemble method that combines filter-based (CfsSubsetEval with BestFirst), wrapper-based (WrapperSubsetEval with GreedyStepwise), and embedded (ReliefF with Ranker) techniques to retain only the most pertinent features. The ClusterGroup stage applies K-Means clustering (k=5), followed by DBSCAN refinement (ε=0.5, minPts=3) within each cluster, to create behavioral groups and remove outliers. The ReBalance stage uses Cluster-SMOTE to tackle class imbalance by generating synthetic samples for the minority class, producing a balanced dataset. Finally, the ASDClassifier stage trains an ensemble of Logistic Regression, Support Vector Machine, and Gradient Boosting classifiers that are combined using soft voting.
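
As a rough illustration of the first and last stages described above, the following plain-Python sketch shows Min-Max scaling of one feature column and soft voting over class probabilities. The function names, the feature values, and the probabilities are hypothetical and are not taken from the paper, which presumably relies on library implementations of both steps.

```python
from typing import List, Sequence


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def soft_vote(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across classifiers (equal weights)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    return [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]


def predict(probabilities: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(probabilities)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical raw values of a single feature before scaling.
scaled = min_max_scale([2.0, 4.0, 10.0])

# Hypothetical [P(non-ASD), P(ASD)] outputs of three classifiers for one sample.
# Two models lean slightly towards ASD, one is confident in non-ASD.
model_probs = [[0.45, 0.55], [0.45, 0.55], [0.90, 0.10]]
label = predict(model_probs)
```

In this made-up case a hard majority vote would pick class 1 (two of three models), while soft voting picks class 0 because the third model's confidence dominates the average. That sensitivity to confidence is the usual reason for preferring soft voting when the base classifiers produce reasonably calibrated probabilities.
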
Metrics used to assess the model include accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC). The proposed ASD-Pipeline surpassed existing models, achieving an accuracy of 96.18% compared with 76.80% to 90.60% for previous techniques. It also scored 91.51% precision, 91.63% recall, 95.57% F1-score, and 92.51% specificity. These findings emphasize the pipeline's efficacy in improving generalization and in addressing feature relevance, behavioral variability, and class imbalance for ASD prediction. The ASD-Pipeline offers a reliable, interpretable, and modular machine learning solution, making it a promising tool for healthcare practitioners and researchers seeking data-driven insights into early ASD detection.
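
For readers less familiar with the metrics listed above, the sketch below computes them from the four cells of a binary confusion matrix. The counts are invented for illustration and do not come from the paper; the helper name `binary_metrics` is likewise hypothetical.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes the precision, recall, and specificity denominators are non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for a 100-sample test set.
m = binary_metrics(tp=45, fp=5, tn=40, fn=10)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which makes it a quick sanity check when reading a table of reported scores. MCC uses all four cells and ranges from -1 to 1, so it is less easily inflated by class imbalance than accuracy.
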

Rahul Arulkumaran Reviewer

30 May 2025 01:32 PM

Approved

Relevance and Originality

The proposed work addresses a key issue in autism diagnostics: the fragmented nature of existing machine learning models for ASD detection. By consolidating multiple underutilized strategies (adaptive feature selection, behavioral clustering, and synthetic data augmentation), the study offers a cohesive and forward-thinking framework. This integrative approach stands out for its originality, especially in a field where such comprehensive models remain rare. The focus on real-world diagnostic limitations elevates its practical importance and scientific contribution.

Methodology

The methodology demonstrates strong conceptual depth and technical precision. Each of the five stages is well-justified and sequentially supports the pipeline’s goal of improving ASD prediction. The combination of feature selection layers in FlexiFeat is particularly notable, enhancing data quality before downstream modeling. Using hybrid clustering techniques to identify behavior patterns adds interpretability, while Cluster-SMOTE for addressing class imbalance demonstrates thoughtful problem framing. The final ensemble classifier setup ensures predictive consistency and model resilience, confirming a robust engineering design.
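
To make the class-rebalancing idea concrete, here is a minimal sketch of the interpolation step at the core of SMOTE-style oversampling: each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' Cluster-SMOTE implementation; the function names, the choice of k, and the toy points are assumptions, and a cluster-based variant would presumably run this step on the minority samples inside each cluster.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def nearest_neighbors(target: Point, pool: Sequence[Point], k: int) -> List[Point]:
    """Return the k points in pool closest to target (squared Euclidean distance)."""
    others = [p for p in pool if p is not target]
    return sorted(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(target, p)))[:k]


def smote_samples(minority: List[Point], n_new: int, k: int = 2, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(base, minority, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class samples, already Min-Max scaled to the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_samples(minority, n_new=5)
```

Since every synthetic point is a convex combination of two real minority samples, it stays inside the region those samples span, which is why outlier removal before oversampling matters: interpolating towards an outlier would place synthetic points in sparsely supported regions.
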

Validity & Reliability

The high accuracy, specificity, and F1-score suggest strong internal validity, supported by the use of a wide range of evaluation metrics, including MCC. The model’s structure inherently supports generalization by eliminating irrelevant data and emphasizing class balance. However, reliability could be reinforced by reporting results on multiple datasets or through cross-institutional validation. The pipeline’s modular design increases confidence in its adaptability, but external testing would further confirm its clinical or cross-domain applicability.
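
One simple way to approximate the cross-institutional validation suggested above is a leave-one-site-out split, where all records from one institution are held out for testing in turn. The sketch below only builds the index splits; the site labels are hypothetical, and the paper does not state whether its dataset carries any institution identifier.

```python
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def leave_one_site_out(sites: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_site, train_indices, test_indices) for each distinct site."""
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        train_idx = sorted(
            i for site, indices in by_site.items() if site != held_out for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical site label for each of five records.
sites = ["clinic_a", "clinic_a", "clinic_b", "clinic_c", "clinic_b"]
splits = list(leave_one_site_out(sites))
```

If preprocessing such as scaling, feature selection, or oversampling is fitted only on the training indices of each split, the held-out site gives an estimate of performance on data from an unseen institution.
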

Clarity and Structure

The research is structured with clear logic and technical coherence. Each phase of the pipeline is explained with appropriate terminology and flow, facilitating reader understanding even for interdisciplinary audiences. The narrative remains grounded in both the medical and machine learning context, which improves its accessibility. A slight improvement could be made by including illustrative elements such as diagrams or flowcharts to depict the end-to-end architecture of the ASD-Pipeline, especially to show interaction between feature selection, clustering, and classification stages.

Results and Analysis

The outcome discussion is strong and effectively illustrates the superiority of the ASD-Pipeline over prior models. The reported improvements across all metrics are not only statistically impressive but also practically meaningful in clinical prediction tasks. The model’s consistent performance across accuracy, precision, and specificity reflects a balanced and reliable prediction framework. The interpretation of these results is aligned with the initial research objectives and affirms the strength of integrating behavioral insights with advanced machine learning methods.

IJ Publication Publisher

Respected Sir,

Thank you for your insightful and balanced evaluation. We appreciate your recognition of the pipeline’s originality and robust methodology. Your suggestion to include visual aids and pursue external validations is valuable for enhancing clarity and reliability. We acknowledge the need for broader testing to confirm adaptability and will consider this for future work.

Thank you once again for your thoughtful feedback.

Publisher

Publisher: IJ Publication

Reviewer: Rahul Arulkumaran

More Detail

Paper Category: Computer Engineering

Journal Name: IJNRD - INTERNATIONAL JOURNAL OF NOVEL RESEARCH AND DEVELOPMENT

p-ISSN: (not given)

e-ISSN: 2456-4184
