PLATFORM RELIABILITY IN MICROSOFT AZURE: ARCHITECTURE PATTERNS AND FAULT TOLERANCE FOR ENTERPRISE WORKLOADS
Abstract
In cloud computing, achieving platform reliability has become a mission-critical concern for businesses that run demanding workloads. As a major cloud service provider, Microsoft Azure offers a wide range of services and architectural patterns aimed at improving fault tolerance, scalability, and business continuity. This paper examines the fundamental reliability features that Azure has built into its infrastructure, including availability zones, load-balancing strategies, data replication, and data recovery models. By analyzing the platform at the level of architectural configuration and fault handling, the paper determines how Azure withstands service interruptions and helps workloads meet enterprise Service-Level Agreements (SLAs). The practical aspects of implementing fault-tolerant strategies are illustrated through a simulated enterprise deployment scenario that combines Azure Site Recovery, assessed against its recovery point objective (RPO), with geo-redundant storage options. The results indicate that reliability measures should be matched to the criticality of each workload, and the paper proposes a cost-conscious model for introducing robust fault-tolerance solutions in Azure. The study concludes by evaluating current trends and future improvements in automated recovery and AI-based monitoring systems that could further enhance platform durability.