Abstract
This study investigates the predictive performance of logistic regression models with varying parameter specifications in classifying binary outcomes. Using SAS software, the analysis focuses on key predictive metrics, including sensitivity, specificity, and overall classification accuracy. Predictive strength is quantified by the concordance index (c), which for a binary outcome equals the area under the receiver operating characteristic (ROC) curve; the value of 0.738 indicates acceptable discrimination. Goodness-of-fit assessments, including the Hosmer-Lemeshow and Pearson tests, show no significant lack of fit, supporting the model's adequacy. Backward elimination is used to refine the model, balancing predictive power with interpretability by selecting a parsimonious set of main effects and interaction terms. Parameter estimates, confidence intervals, and significance levels are reported for key predictors, including smoking and alcohol use, which are significantly associated with binary health outcomes. The analysis also examines the sensitivity of parameter estimates to unbalanced data, demonstrating how changing a single observation can influence model outcomes. The study emphasizes the role of model selection and fit diagnostics in logistic regression and offers guidance for optimizing predictive models in the classification of categorical data.