Research Article October, 2023
International Journal of Emerging Trends in Computer Science and Information Technology

Integrating Site Reliability Engineering (SRE) Principles into Enterprise Architecture for Predictive Resilience

Abstract

Modern enterprises increasingly depend on complex distributed software systems in which small faults can cascade into large customer impact. Site Reliability Engineering (SRE) provides a disciplined approach to reliability through explicit service level objectives, error budgets, automation, incident response and continuous learning. Enterprise Architecture provides an enterprise-wide design view that connects business capabilities, information flows and technology platforms. This paper proposes an integrated framework that makes reliability a first-class architecture concern and links architecture decisions to runtime evidence. The framework introduces an artifact mapping between Enterprise Architecture models and SRE primitives such as service level indicators, service level objectives and error budgets. It also defines a predictive resilience loop that combines observability telemetry with architecture context and change signals to anticipate degradation risk before user impact occurs. The paper synthesizes related work on resilience, observability, trace and log anomaly detection, and interpretable root cause analysis, then proposes implementation patterns for SLO hierarchies, drift detection, policy-driven release governance and chaos experiments. Finally, it defines evaluation metrics and presents an illustrative enterprise scenario that demonstrates how predictive signals can trigger targeted governance actions and architecture updates.
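
To make the SRE primitives named in the abstract concrete, the following minimal Python sketch shows how a service level objective implies an error budget, and how a burn rate computed from a service level indicator could be mapped to a governance action. The `Slo` class, the function names, the `checkout-availability` example and the thresholds are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A service level objective over a fixed window (hypothetical structure)."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must be good


def error_budget(slo: Slo, total_requests: int) -> float:
    """Number of bad requests the SLO tolerates for the given request volume."""
    return (1.0 - slo.target) * total_requests


def burn_rate(slo: Slo, good: int, total: int) -> float:
    """Observed error ratio divided by the error ratio the SLO allows.

    A value of 1.0 means the budget is consumed exactly at the sustainable
    pace; values above 1.0 mean it will run out before the window ends.
    """
    if total == 0:
        return 0.0  # no traffic observed, so no budget consumed
    observed_error_ratio = (total - good) / total
    return observed_error_ratio / (1.0 - slo.target)


def governance_action(rate: float, freeze_threshold: float = 2.0) -> str:
    """Map a burn rate to an action. Thresholds are assumed for illustration."""
    if rate >= freeze_threshold:
        return "freeze-releases"
    if rate >= 1.0:
        return "review-changes"
    return "proceed"


if __name__ == "__main__":
    slo = Slo(name="checkout-availability", target=0.999)
    rate = burn_rate(slo, good=9_970, total=10_000)
    print(slo.name, round(rate, 2), governance_action(rate))
```

In this sketch a burn rate above 1.0 is treated as an early warning signal; a real deployment would likely evaluate several windows and combine the result with architecture context and change signals, as the abstract describes.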

Details
Volume 4
Issue 3
ISSN 3050-9416