Predictive Modeling of Software Defects Using Ensemble Machine Learning Techniques and Feature Extraction from Static Code Metrics and Version Control Histories
Abstract
Predicting defects early in the development lifecycle is essential to improving code quality, reducing maintenance costs, and enhancing software reliability. This study investigates ensemble machine learning techniques for building software defect prediction models, using features extracted from both static code metrics and version control histories. By combining these data sources, we improve predictive performance beyond what traditional defect prediction approaches offer. Our empirical evaluation on open-source software projects shows that ensemble models, particularly Random Forest and Gradient Boosting Machines, outperform individual learners in precision, recall, and F1-score. The resulting framework supports early defect prediction and can be integrated into modern DevOps pipelines to manage software quality proactively.
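
As a minimal illustration of the evaluation metrics named above, the following Python sketch computes precision, recall, and F1-score for the defective class from binary labels. The function name, the label encoding (1 = defective, 0 = clean), and the sample labels are assumptions made for illustration; they are not taken from the study's data or tooling.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class.

    Assumed encoding: 1 = defective module, 0 = clean module.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean modules flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed

    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels for eight modules (illustrative only).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

p, r, f = precision_recall_f1(actual, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# prints: precision=0.75 recall=0.75 f1=0.75
```

Reporting all three metrics matters in defect prediction because defective modules are typically a minority class, so accuracy alone can look high even when most defects are missed.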