Transparent Peer Review By Scholar9

ASD-Pipeline: An Ensemble Machine Learning Framework Integrating Feature Selection, Behavioural Clustering, and Class Rebalancing for Accurate Autism Spectrum Disorder Prediction

Abstract

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition characterized by a variety of behavioral and cognitive patterns. Early and precise detection is critical for enabling timely interventions. Conventional classification models frequently generalize poorly because of irrelevant features, unstructured behavioral data, and severe class imbalance. Despite recent advances in machine learning for ASD detection, existing models do not integrate adaptive feature selection, behavioral grouping, and imbalanced-class handling in a unified, end-to-end pipeline. This lack of integration frequently results in suboptimal performance and limited interpretability. This study proposes a new ensemble-based framework called ASD-Pipeline, which integrates flexible feature selection, hybrid clustering, synthetic minority oversampling, and ensemble voting classification to improve predictive performance for ASD identification. The proposed ASD-Pipeline framework uses a five-stage process. First, the dataset is normalized using Min-Max scaling so that feature ranges remain consistent. Next, feature selection is performed using FlexiFeat, an ensemble method integrating filter-based (CfsSubsetEval with BestFirst), wrapper-based (WrapperSubsetEval with GreedyStepwise), and embedded (ReliefF with Ranker) techniques to retain only the most pertinent features. The ClusterGroup stage uses K-Means clustering (k=5) followed by DBSCAN refinement (ε=0.5, minPts=3) within each cluster to create behavioral groups and remove outliers. The ReBalance stage uses Cluster-SMOTE to tackle class imbalance by producing synthetic samples for the minority class, yielding a balanced dataset. Finally, the ASDClassifier stage trains an ensemble of Logistic Regression, Support Vector Machine, and Gradient Boosting classifiers that are combined using soft voting.
The model is assessed using accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC). The proposed ASD-Pipeline surpassed existing models, achieving an accuracy of 96.18%, compared with 76.80% to 90.60% for previous techniques. It also scored 91.51% precision, 91.63% recall, 95.57% F1-score, and 92.51% specificity. These findings emphasize the pipeline's efficacy in enhancing generalization and tackling difficulties such as feature relevance, behavioral grouping, and class imbalance for ASD prediction. The ASD-Pipeline offers a reliable, interpretable, and modular machine learning solution for ASD prediction. Its integrated approach tackles critical challenges in feature relevance, behavioral variability, and data imbalance, making it a promising tool for healthcare practitioners and researchers seeking data-driven insights into early ASD detection.
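For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch of how they are conventionally derived from binary confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, which makes it informative under class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; not results from the paper.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, it always lies between the precision and recall values it is computed from, which can serve as a quick consistency check when reading reported figures.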

Rajkumar Kyadasu Reviewer

Review Request Accepted

Rajkumar Kyadasu Reviewer

30 May 2025 01:34 PM

Approved

Relevance and Originality

The research presents a meaningful advancement by targeting long-standing challenges in ASD prediction, namely the lack of integrated solutions for feature selection, behavioral patterning, and class imbalance. The ASD-Pipeline introduces a conceptually sound and practically necessary innovation that combines these aspects within a unified structure. This kind of consolidated approach is not only novel but aligns well with the increasing demand for interpretable, scalable tools in digital health diagnostics. The emphasis on early detection underscores the article's relevance to public health priorities.

Methodology

The framework's five-stage design showcases a careful orchestration of preprocessing, analysis, and classification techniques. It begins with normalization to standardize inputs, followed by a thoughtfully layered feature selection using FlexiFeat, which blends filter, wrapper, and embedded methods for relevance optimization. The dual clustering approach using K-Means and DBSCAN contributes to meaningful behavioral grouping, while Cluster-SMOTE offers a targeted strategy to rectify class imbalance. Final model training through ensemble voting strikes a balance between model bias and variance. Overall, the methodology reflects both technical depth and practical foresight.

Validity & Reliability

The reported evaluation metrics, especially the substantial improvements in accuracy and F1-score, suggest strong internal validity. The diverse set of classifiers and feature engineering techniques helps reduce overfitting and increase robustness. However, details about cross-validation methods or external dataset testing would strengthen the case for broader reliability. The design choices indicate a high likelihood of reproducibility, but empirical testing on heterogeneous datasets would confirm generalizability.
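To make the cross-validation point concrete, the sketch below shows one simple way to build stratified folds, in which each fold keeps roughly the same class ratio as the full dataset. It is an illustrative assumption about what such a protocol could look like, not a description of the authors' evaluation; `stratified_folds` and the toy labels are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds while roughly preserving class ratios."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    # Deal each class's samples round-robin across the folds.
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy labels for illustration: 4 positive and 8 negative samples.
labels = [1] * 4 + [0] * 8
folds = stratified_folds(labels, k=4)

for i, test_idx in enumerate(folds):
    train_idx = [j for f in folds if f is not test_idx for j in f]
    positives = sum(labels[j] for j in test_idx)
    print(f"fold {i}: {len(train_idx)} train, {len(test_idx)} test, "
          f"{positives} positive in test")
```

When oversampling is part of a pipeline, it is usually applied to the training folds only, so that synthetic samples do not leak into the held-out fold and inflate the reported scores.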

Clarity and Structure

The research is clearly structured and well-articulated. The description of each component in the pipeline flows logically, offering a smooth narrative from problem identification to solution implementation. The authors maintain a balance between technical specificity and readability, making the article accessible to both machine learning practitioners and medical professionals. Slight improvements could be made by offering real-case deployment examples or user-centric evaluations to enhance contextual understanding.

Results and Analysis

The performance metrics are not only well selected but comprehensively discussed, offering strong evidence for the system's efficacy. The comparison against previous models is well-executed, underscoring clear advancements in all evaluated areas. The emphasis on interpretability and modular design supports the claim of practical usability. These results collectively validate the framework's goal to deliver a holistic and high-performing ASD prediction model.

IJ Publication Publisher

Respected Sir,

Thank you for your comprehensive and balanced review. We appreciate your acknowledgment of our pipeline's innovative integration of feature selection, behavioral grouping, and class imbalance handling. Your points regarding the need for external validation and practical deployment insights are well taken and will be valuable in guiding future work to enhance the model's reliability and applicability.

Thank you again for your thoughtful comments.

Publisher

IJ Publication

Reviewer

Rajkumar Kyadasu

More Detail

Paper Category

Computer Engineering

Journal Name

IJNRD - INTERNATIONAL JOURNAL OF NOVEL RESEARCH AND DEVELOPMENT

p-ISSN

e-ISSN

2456-4184
