Transparent Peer Review By Scholar9

ASD-Pipeline: An Ensemble Machine Learning Framework Integrating Feature Selection, Behavioural Clustering, and Class Rebalancing for Accurate Autism Spectrum Disorder Prediction

Abstract

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition characterized by a variety of behavioral and cognitive patterns. Early and precise detection is critical for enabling timely interventions. Conventional classification models frequently generalize poorly because of irrelevant features, unstructured behavioral data, and severe class imbalance. Despite recent advances in machine learning for ASD detection, existing models do not integrate adaptive feature selection, behavioral grouping, and class-imbalance handling in a unified, end-to-end pipeline. This lack of integration frequently results in suboptimal performance and limited interpretability. This study proposes a new ensemble-based framework called ASD-Pipeline, which integrates flexible feature selection, hybrid clustering, synthetic minority oversampling, and ensemble voting classification to improve predictive performance for ASD identification. The framework uses a five-stage process. First, the dataset is normalized using Min-Max scaling so that feature ranges remain consistent. Next, feature selection is performed using FlexiFeat, an ensemble method integrating filter-based (CfsSubsetEval with BestFirst), wrapper-based (WrapperSubsetEval with GreedyStepwise), and embedded (ReliefF with Ranker) techniques to retain only the most pertinent features. The ClusterGroup stage applies K-Means clustering (k=5) followed by DBSCAN refinement (ε=0.5, minPts=3) within each cluster to create behavioral groups and remove outliers. The ReBalance stage uses Cluster-SMOTE to tackle class imbalance by generating synthetic samples for the minority class, yielding a balanced dataset. Finally, the ASDClassifier stage trains an ensemble of Logistic Regression, Support Vector Machine, and Gradient Boosting classifiers that are combined using soft voting.

Metrics used to assess the model include accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC). The proposed ASD-Pipeline surpassed existing models, achieving an accuracy of 96.18% compared to previous techniques ranging from 76.80% to 90.60%. It also scored 91.51% precision, 91.63% recall, 95.57% F1-score, and 92.51% specificity. These findings emphasize the pipeline's efficacy in improving generalization and in addressing feature relevance, behavioral grouping, and class imbalance in ASD prediction. The ASD-Pipeline offers a reliable, interpretable, and modular machine learning solution, making it a promising tool for healthcare practitioners and researchers seeking data-driven insights into early ASD detection.
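
For readers who want to relate the reported figures to one another, the evaluation metrics named in the abstract can all be derived from the four confusion-matrix counts. The following is a minimal sketch using only the Python standard library; the function name `confusion_metrics` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn - fp * fn) / denom) if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
print({name: round(value, 4) for name, value in example.items()})
```

Because F1 is a harmonic mean, it always lies between precision and recall when all three are computed on the same class and the same data split, which makes it a quick consistency check on any reported set of metrics.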

Hrishikesh Rajesh Mane Reviewer

Review Request Accepted

30 May 2025 01:26 PM

Approved

Relevance and Originality

The study introduces a compelling solution to the well-documented challenges in autism spectrum disorder prediction by merging multiple machine learning techniques into one structured pipeline. What sets it apart is the integration of adaptive feature selection and behavioral pattern clustering within the same framework, addressing known weaknesses in generalization and data imbalance. This approach significantly enhances relevance in the healthcare analytics domain and provides a meaningful contribution by filling a methodological gap in ASD detection models through ensemble learning and synthetic data augmentation.

Methodology

The research adopts a detailed five-phase design, starting with Min-Max scaling for normalization, which ensures consistency in data range. The use of FlexiFeat to blend filter-based, wrapper-based, and embedded feature selection techniques is a strong methodological highlight, reducing dimensionality and increasing relevance. Incorporating both K-Means and DBSCAN for behavioral grouping captures complex patterns, while the application of Cluster-SMOTE strategically balances the dataset. Finally, the ensemble classifier leverages diverse algorithms for improved robustness, making this methodology both exhaustive and technically sound.
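
Two of the stages named above, Min-Max scaling and SMOTE-style oversampling, rest on simple arithmetic that can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: the function names and sample values are hypothetical, and the oversampling function shows only the basic SMOTE interpolation step. The nearest-neighbour search and the per-cluster restriction that Cluster-SMOTE would add are omitted.

```python
import random


def min_max_scale(rows):
    """Scale each column of a list of numeric rows to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no range, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def smote_like_sample(point, neighbor, rng):
    """Create one synthetic sample on the segment between a minority point and a neighbor."""
    gap = rng.random()  # interpolation factor in [0, 1)
    return [p + gap * (n - p) for p, n in zip(point, neighbor)]


# Hypothetical feature rows, used only for illustration.
rows = [[10, 200], [20, 400], [30, 300]]
scaled = min_max_scale(rows)
print(scaled)

# Interpolate between the first two scaled rows as if both were minority samples.
synthetic = smote_like_sample(scaled[0], scaled[1], random.Random(0))
print(synthetic)
```

Scaling before oversampling matters in a sketch like this because the interpolation is distance-based: without a common range, features with large numeric scales would dominate the choice of neighbours.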

Validity & Reliability

The study presents a convincing argument for the robustness of the ASD-Pipeline, reporting high performance across a variety of standard evaluation metrics. The use of soft voting among diverse classifiers ensures the system captures varying data behaviors, supporting model reliability. Furthermore, the inclusion of Matthews Correlation Coefficient alongside F1-score, precision, and specificity provides a balanced validation strategy. Nevertheless, the findings would be even more compelling if tested against independent datasets to strengthen claims of generalizability and confirm applicability in varied clinical settings.
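
The soft-voting step mentioned above can be illustrated with a short sketch: each classifier outputs class probabilities, the probabilities are averaged, and the class with the highest mean probability is chosen. The function name `soft_vote` and the probability values below are hypothetical and serve only to show the mechanism; they are not results from the paper.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities from several classifiers and pick the top class."""
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models  # equal weight for every classifier
    total_weight = sum(weights)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total_weight
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical [P(non-ASD), P(ASD)] outputs for one sample from three classifiers.
logistic_regression = [0.55, 0.45]
support_vector_machine = [0.55, 0.45]
gradient_boosting = [0.10, 0.90]

label, averaged = soft_vote(
    [logistic_regression, support_vector_machine, gradient_boosting]
)
print(label, [round(p, 3) for p in averaged])
```

In this example two of the three classifiers narrowly favour class 0, so a majority (hard) vote would return class 0, while soft voting returns class 1 because the third classifier is far more confident. This sensitivity to confidence is the usual motivation for soft voting, and it also means the result depends on how well calibrated each classifier's probabilities are.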

Clarity and Structure

The progression from problem statement to technical implementation is clear, and the logical sequencing of the pipeline’s stages ensures reader comprehension. The research is well-articulated, and technical components are described with clarity and appropriate depth. Each technique used is justified within the broader context of autism prediction. While the narrative is precise, the article could benefit from the inclusion of visual process flows or schematic overviews to reinforce the modular architecture of the proposed system.

Results and Analysis

The performance gains reported over previous methods are well-articulated and supported with a comprehensive set of metrics. The superiority of the ASD-Pipeline is effectively demonstrated through comparative accuracy, specificity, and F1-score. The results validate the strategic inclusion of behavioral clustering and synthetic oversampling as performance-enhancing components, and the conclusions are logically consistent with the data presented.

IJ Publication Publisher

Respected Sir,

Thank you for your detailed and constructive review. We are pleased that the integrated approach and methodological rigor of the ASD-Pipeline were well received. Your point about external validation and the potential benefit of visual aids for clarity is well taken and will guide future improvements. We appreciate your acknowledgment of the model’s strong performance and balanced evaluation.

Thank you once again for your valuable feedback.

Publisher: IJ Publication

Reviewer: Hrishikesh Rajesh Mane

Paper Category: Computer Engineering

Journal Name: IJNRD - INTERNATIONAL JOURNAL OF NOVEL RESEARCH AND DEVELOPMENT

e-ISSN: 2456-4184
