AZ-104 Skill 5: Monitor and Maintain Azure Resources

Deep-dive guide | Monitoring, Backup, and Disaster Recovery for the Azure Administrator

Cloud operations do not end when a resource is deployed. The operational lifecycle—Build → Operate → Monitor → Recover—continues as long as the workload runs. Microsoft allocates 10–15% of the AZ‑104 exam to monitoring and maintenance because an administrator who cannot see, alert on, or recover from failures is operating blind. This skill area completes the operational loop, ensuring that the identity, storage, compute, and networking foundations you built remain healthy, performant, and restorable.

This guide maps every tested monitoring and recovery skill to the underlying domains, the architecture patterns that drive operational excellence, and the real‑world scenarios where visibility and resilience are non‑negotiable.


1. Overview

Monitoring and maintenance are the feedback system of cloud operations. They answer four essential questions:

  1. Is it working? — Metrics, health probes, availability checks.
  2. What happened? — Logs, diagnostic data, change history.
  3. Who needs to know? — Alerts, action groups, notifications.
  4. How do we recover? — Backups, snapshots, failover procedures.

As an administrator, you configure Azure Monitor to collect metrics and logs, set up alerts that trigger action groups, use Network Watcher to diagnose connectivity issues, and implement Azure Backup and Azure Site Recovery to protect data and workloads against accidental deletion, corruption, or regional disasters.

From an architectural perspective, monitoring and disaster recovery are not afterthoughts. They are core to the Well‑Architected Framework’s Operational Excellence and Reliability pillars. The skills you learn here form the operational backbone that supports everything else you deploy.


2. Skill Area Breakdown

Monitoring and Observability

  • Metrics: Numeric data collected at regular intervals (CPU percentage, network in/out, storage transactions). Stored for 93 days; used for dashboards and alert rules.
  • Logs: Detailed event records, typically sent to a Log Analytics workspace. Queryable using Kusto Query Language (KQL).
  • Alert rules: Conditions based on metrics or log queries that trigger actions.
  • Action groups: Lists of actions to take when an alert fires (email, SMS, webhook, ITSM, Azure Function).
  • Alert processing rules: Suppress or modify alert notifications during maintenance windows or for specific conditions.
  • Azure Monitor Insights: Curated monitoring experiences for VMs, storage accounts, networks, containers, and more.
  • Network Watcher: Tools for diagnosing network‑level issues: IP flow verify, next hop, VPN troubleshoot, packet capture, NSG flow logs.
  • Connection Monitor: End‑to‑end connectivity monitoring across VNets, regions, and on‑premises.

Backup and Disaster Recovery

  • Recovery Services vault: A storage entity in Azure that holds backup data and Site Recovery replication policies. Used for Azure Backup (VMs, SQL, SAP HANA, Azure Files) and Azure Site Recovery.
  • Backup vault: A newer vault type for workloads such as Azure Blobs, Azure Disks, and PostgreSQL. It does not yet support every workload that a Recovery Services vault does.
  • Backup policies: Define frequency (daily, weekly), retention (daily, weekly, monthly, yearly), and consistency type (crash‑consistent, app‑consistent).
  • Restore operations: Restore VMs (create new or replace), file/folder recovery, database restores.
  • Azure Site Recovery (ASR): Orchestrates replication, failover, and failback of VMs and physical servers between Azure regions or from on‑premises to Azure.
  • Failover operations: Test failover (non‑disruptive) and planned/unplanned failover.
  • Backup reporting: Backup Explorer, Backup Reports (Power BI template), and alerts for failed backup jobs.

3. Azure Domains Mapping

| Azure Domain | What it encompasses | Key services in this skill area |
| --- | --- | --- |
| Observability | Visibility into system health, performance, and behaviour | Azure Monitor, Log Analytics, Application Insights, Insights (VM, Storage, Network) |
| Monitoring | Metrics collection, alerting, operational awareness | Azure Monitor Metrics, Alert Rules, Action Groups, Alert Processing Rules |
| Backup & Disaster Recovery | Data and workload protection, recovery strategies, business continuity | Azure Backup, Recovery Services Vault, Backup Vault, Azure Site Recovery |
| Networking Operations | Network‑level diagnostics, connectivity monitoring, troubleshooting | Network Watcher, Connection Monitor |

Understanding these domains helps you connect exam tasks with the higher‑level goal: a question about configuring a Log Analytics workspace is an Observability task; a question about failing over to a secondary region is a Backup & DR task.


4. Azure Architecture Mapping

Observability Architecture

Observability is the strategy of inferring the internal state of a system from its external outputs. In Azure, that strategy is built on a unified data plane:

graph LR
  Resources[Azure Resources] --> Metrics[Azure Monitor Metrics]
  Resources --> Logs[Diagnostic Settings]
  Logs --> LA[Log Analytics Workspace]
  Metrics --> Alerts[Alert Rules]
  LA --> Alerts
  Alerts --> AG[Action Groups]
  AG --> Notify[Email / SMS / Webhook / ITSM]
  LA --> Dashboards[Dashboards & Workbooks]
  Metrics --> Dashboards

  • Metrics pipeline: Fast, lightweight, used for near‑real‑time dashboards and alerting.
  • Logs pipeline: Flexible, queryable, used for deep diagnostics, trend analysis, and complex alert conditions.
  • Log Analytics workspace: The central hub for log data from multiple subscriptions and regions. Design consideration: a single tenant‑wide workspace vs. one workspace per region, weighed against data residency and access control requirements.

Monitoring Architecture

Effective monitoring moves from reactive to proactive. The architecture includes:

  • Health model: Define what “healthy” means for each resource (e.g., CPU < 80%, response time < 200ms).
  • Alert strategy: Alerts should be actionable, not noisy. Use dynamic thresholds where possible, and set up suppression during maintenance with alert processing rules.
  • Automated response: Use action groups to trigger runbooks or webhooks for auto‑remediation (e.g., restart a VM, scale out a scale set).

Backup Architecture

Backup is about protecting against data loss. Architecture decisions revolve around recovery objectives:

  • Recovery Point Objective (RPO): Maximum acceptable data loss measured in time. Determines backup frequency.
  • Recovery Time Objective (RTO): Maximum acceptable downtime. Determines restore speed and process.
  • Backup design: Use Azure Backup for VMs, files, and databases. For VMs, the backup extension takes a snapshot, then transfers the data to the Recovery Services vault. Application‑consistent backups use VSS writers.
  • Retention: Define retention policies based on compliance needs (e.g., 30 daily, 12 monthly, 5 yearly). Long‑term retention can use GRS vault storage.
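To make the link between RPO and backup frequency concrete, here is a minimal sketch. It assumes the worst case, where a failure happens just before the next backup runs, so the data lost equals one full backup interval. The function name is hypothetical.

```python
from datetime import timedelta

def meets_rpo(backup_interval, rpo):
    """Worst-case data loss is one full backup interval, so it must not exceed the RPO."""
    return backup_interval <= rpo

daily_backup = timedelta(hours=24)
ok_for_24h_rpo = meets_rpo(daily_backup, rpo=timedelta(hours=24))
ok_for_4h_rpo = meets_rpo(daily_backup, rpo=timedelta(hours=4))
print(ok_for_24h_rpo, ok_for_4h_rpo)
```

A daily backup satisfies a 24-hour RPO but not a 4-hour one. When the RPO is tighter than any backup schedule can deliver, replication is usually the better fit.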

Disaster Recovery Architecture

Azure Site Recovery (ASR) replicates entire VM workloads to a secondary region. Key design points:

  • Replication policy: Defines how long recovery points are retained and how often app‑consistent snapshots are taken, which together shape the achievable RPO (as low as 30 seconds for certain workloads).
  • Failover: Can be planned (no data loss, primary region is still up), unplanned (primary region down, may lose recent data), or test failover (validates setup without impacting production).
  • Recovery plans: Ordered groups of VMs that fail over together, with custom actions (runbooks, manual steps) between groups.
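A recovery plan is essentially an ordered list of groups, each with optional follow-up actions. The sketch below models that ordering in plain Python. The `Group` class, VM names, and action names are hypothetical, and nothing here calls Site Recovery itself.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    name: str
    vms: list
    post_actions: list = field(default_factory=list)  # e.g. runbooks or manual steps

def run_recovery_plan(groups):
    """Return the steps in execution order: each group's VMs, then its post actions."""
    steps = []
    for group in groups:
        for vm in group.vms:
            steps.append(f"start {vm}")
        for action in group.post_actions:
            steps.append(f"run {action}")
    return steps

# Hypothetical three-tier plan: dependencies (database) come up before their consumers.
plan = [
    Group("data", ["sql-vm"]),
    Group("app", ["app-vm-1", "app-vm-2"]),
    Group("web", ["web-vm-1"], post_actions=["update-dns"]),
]
steps = run_recovery_plan(plan)
print(steps)
```

Placing the DNS update after the last group means traffic is only redirected once every tier is running.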

Resilience and Operational Excellence

Monitoring and recovery feed into the broader architecture’s resilience. By combining health metrics, automated failover, and tested recovery plans, you create a system that can withstand failure gracefully. Operational excellence is achieved by continuously refining alerts, automating responses, and learning from incidents.


5. Azure Services Deep Dive

Azure Monitor

  • Metrics: Platform metrics automatically collected from most Azure resources (CPU, disk IOPS, network bytes). Custom metrics can be sent via API.
  • Logs: Requires configuring diagnostic settings to send resource logs and platform metrics to a Log Analytics workspace, storage account, or Event Hub. Not all logs are collected by default.
  • Insights: Pre‑built monitoring experiences that combine metrics and logs. VM Insights (guest‑level performance, processes, dependencies), Network Insights (topology, connectivity), Storage Insights (capacity, transactions).

Log Analytics

  • Workspace: A container for log data. Query with KQL. Workspaces can aggregate data from multiple subscriptions if they share the same Entra ID tenant.
  • KQL basics: Interpreting a query’s output is a typical exam skill. Example: Perf | where TimeGenerated > ago(1h) | summarize avg(CounterValue) by Computer
  • Agents: Log Analytics agent (legacy) and Azure Monitor Agent (new). The exam may reference both; know that VM Insights requires an agent.
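For readers more comfortable with general-purpose code than KQL, the Perf query above (filter to the last hour, then average CounterValue per Computer) can be mirrored in Python. This is only an illustration of what the query returns; the sample rows are made up, and the column names are taken from the query itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def avg_counter_by_computer(rows, now):
    """Mirror of: where TimeGenerated > ago(1h) | summarize avg(CounterValue) by Computer."""
    cutoff = now - timedelta(hours=1)
    totals = defaultdict(lambda: [0.0, 0])  # Computer -> [sum, count]
    for row in rows:
        if row["TimeGenerated"] > cutoff:      # the "where" step
            acc = totals[row["Computer"]]
            acc[0] += row["CounterValue"]
            acc[1] += 1
    return {computer: s / n for computer, (s, n) in totals.items()}  # the "summarize" step

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [  # hypothetical Perf records
    {"Computer": "web-01", "TimeGenerated": now - timedelta(minutes=10), "CounterValue": 40.0},
    {"Computer": "web-01", "TimeGenerated": now - timedelta(minutes=20), "CounterValue": 60.0},
    {"Computer": "web-01", "TimeGenerated": now - timedelta(hours=2), "CounterValue": 99.0},
    {"Computer": "web-02", "TimeGenerated": now - timedelta(minutes=5), "CounterValue": 30.0},
]
result = avg_counter_by_computer(rows, now)
print(result)
```

The two-hour-old record is dropped by the time filter, so the result has one row per computer with the average of its recent samples only.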

Alert Rules & Action Groups

  • Alert rule types: Metric alerts (fixed or dynamic thresholds), log search alerts (KQL query result), activity log alerts (administrative events).
  • Action groups: Reusable list of notification and action channels. Can trigger email, SMS, voice, push notification, webhook, Azure Function, Logic App, ITSM connector.
  • Alert processing rules: Applied after alert firing to suppress alerts during maintenance or apply additional actions without modifying the alert rule.
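The relationship between a fired alert, its action groups, and an alert processing rule can be sketched as follows. This is a simplified model with hypothetical names: it only shows the idea that suppression happens after the alert fires and leaves the alert rule untouched.

```python
from datetime import datetime

def notify_targets(alert_time, action_groups, maintenance_windows):
    """Return the action groups to notify, or an empty list inside a suppression window."""
    for start, end in maintenance_windows:
        if start <= alert_time < end:
            return []  # the alert still fires; only its notifications are suppressed
    return list(action_groups)

# Hypothetical weekend maintenance window and action group names.
windows = [(datetime(2024, 1, 6, 22, 0), datetime(2024, 1, 7, 2, 0))]
groups = ["ops-email", "itsm"]
during = notify_targets(datetime(2024, 1, 6, 23, 30), groups, windows)
after = notify_targets(datetime(2024, 1, 7, 9, 0), groups, windows)
print(during, after)
```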

Network Watcher

  • IP flow verify: Checks whether a packet to or from a VM is allowed or denied by NSG rules, and reports which rule made the decision.
  • Next hop: Shows the next hop type (Internet, VirtualAppliance, etc.) for a destination IP.
  • Connection troubleshoot: Comprehensive end‑to‑end test (NSGs, routing, platform).
  • NSG flow logs: Log allowed/denied traffic for analysis.
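The idea behind IP flow verify is that NSG rules are evaluated in priority order and the first match decides the outcome. The sketch below is a simplified model of that evaluation, not the Network Watcher API. The `Rule` class and the custom deny rule are hypothetical, and matching is reduced to direction, destination prefix, and destination port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass
class Rule:
    name: str
    priority: int             # lower number = evaluated first
    direction: str            # "Inbound" or "Outbound"
    access: str               # "Allow" or "Deny"
    dest_prefix: str          # CIDR block
    dest_port: Optional[int]  # None means any port

def ip_flow_verify(rules, direction, dest_ip, dest_port):
    """Return (access, rule name) for the first rule that matches, by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction != direction:
            continue
        if ip_address(dest_ip) not in ip_network(rule.dest_prefix):
            continue
        if rule.dest_port is not None and rule.dest_port != dest_port:
            continue
        return rule.access, rule.name
    return "Deny", "no matching rule"

rules = [
    Rule("AllowInternetOutBound", 65001, "Outbound", "Allow", "0.0.0.0/0", None),
    Rule("DenySqlOutbound", 200, "Outbound", "Deny", "0.0.0.0/0", 1433),  # hypothetical custom rule
]
sql_result = ip_flow_verify(rules, "Outbound", "203.0.113.10", 1433)
https_result = ip_flow_verify(rules, "Outbound", "203.0.113.10", 443)
print(sql_result, https_result)
```

The useful part of the real tool is the same as in this sketch: it reports not just allow or deny, but the name of the rule that made the decision.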

Connection Monitor

  • Purpose: Periodic probes from source to destination over TCP, HTTP, or ICMP, across VNets, regions, and hybrid networks. Provides latency, hop‑by‑hop topology, and trend analysis.

Recovery Services Vault

  • Use cases: Backup for Azure VMs, SQL in Azure VM, SAP HANA, Azure Files, and on‑premises workloads. Also used for Azure Site Recovery.
  • Storage: Redundancy (LRS, ZRS, or GRS) is set on the vault. Choose it before protecting the first item, because it generally cannot be changed once backups are configured.
  • Soft delete: Enabled by default for vault backups, retaining deleted backup data for 14 days.

Azure Backup

  • VM backup: Every backup job starts with a snapshot. The first backup transfers the full disks; subsequent backups are incremental and transfer only changed blocks.
  • Restore: VM restore (create new VM or replace existing), disk restore, file/folder recovery (mount snapshot).
  • Backup center: Unified management for backup at scale across subscriptions.
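The incremental behaviour of VM backup can be illustrated with a small sketch. Comparing block hashes is a stand-in chosen for clarity; it is an assumption for illustration, not a description of how Azure Backup actually detects changed blocks.

```python
import hashlib

def changed_blocks(previous, current):
    """Return indexes of blocks that are new or differ from the previous backup."""
    def digest(block):
        return hashlib.sha256(block).hexdigest()

    changed = []
    for index, block in enumerate(current):
        is_new = index >= len(previous)
        if is_new or digest(previous[index]) != digest(block):
            changed.append(index)
    return changed

# Hypothetical disk contents split into blocks.
first_backup = [b"boot", b"config-v1", b"data"]
second_backup = [b"boot", b"config-v2", b"data", b"logs"]
to_transfer = changed_blocks(first_backup, second_backup)
print(to_transfer)
```

Only the modified block and the newly added block are transferred, which is why incremental backups finish faster and consume less vault storage than repeated full copies.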

Azure Site Recovery

  • Replication: Continuous replication of Azure VMs from one region to another. Uses a cache storage account in source region, then sends data to target region.
  • Failover: Unplanned failover selects a recovery point (latest, latest processed, latest app‑consistent, or custom). Planned failover ensures zero data loss by shutting down source VMs first.
  • Failback: After the primary region recovers, re‑protect the target VMs and fail back.

6. Monitoring & Recovery Decision Framework

| Decision | Recommended Service | Key Considerations |
| --- | --- | --- |
| Near‑real‑time performance monitoring and alerting on CPU, memory, disk, network | Azure Monitor Metrics | Fast, free platform metrics; 93‑day retention. Use dynamic thresholds for intelligent alerting. |
| Deep diagnostics, correlation across resources, long‑term log retention | Log Analytics / KQL | Pay per GB ingested and retained; powerful query language. |
| Pre‑built health and performance views for VMs, storage, networks | Insights (VM, Storage, Network) | Enable with a few clicks; may require agents. |
| Notify on‑call team when a critical metric crosses a threshold | Alert Rules + Action Groups | Action groups reusable; multiple notification types. |
| Diagnose network connectivity issues between VMs | Network Watcher (IP flow verify, next hop) | No additional cost for basic features. |
| Monitor latency and connectivity across regions/hybrid | Connection Monitor | Agents or VM extensions required. |
| Protect Azure VMs from accidental deletion, corruption, or ransomware | Azure Backup (Recovery Services vault) | Policy‑based, app‑consistent or crash‑consistent, incremental forever. |
| Ensure business continuity if an entire Azure region goes down | Azure Site Recovery | Continuous replication, orchestrated failover, recovery plans. |
| Back up Azure Blob storage against deletion/overwriting | Operational backup (Backup vault) | Point‑in‑time restore for block blobs. |

Cost considerations:

  • Azure Monitor platform metrics are free; logs are charged primarily by data ingestion and retention.
  • Azure Backup is charged per protected instance size and storage consumed in vault.
  • Azure Site Recovery is charged per protected instance and storage consumed.

7. Real‑World Enterprise Scenario

Company: Adatum Corp runs a multi‑tier application in Azure.

Environment:

  • Primary region: East US with production VMs (web, app, SQL), App Services, and Azure SQL Database.
  • DR region: West Europe for business continuity.

Requirements:

  1. Proactively monitor application health and performance.
  2. Automatically alert the operations team when CPU exceeds 90% for 10 minutes on any web VM.
  3. Protect all VMs from accidental deletion and allow point‑in‑time restore for at least 30 days.
  4. Ensure the entire application can failover to West Europe within 4 hours with minimal data loss.
  5. Diagnose network connectivity issues quickly.

Solution design:

  • Monitoring: VM Insights enabled on all VMs. Azure Monitor collects platform metrics. Log Analytics workspace aggregates logs from all resources, including App Service and SQL Database diagnostics.
  • Alerting: A metric alert rule with a static threshold (average CPU above 90% over a 10‑minute window) monitors the web VMs. An action group emails the on‑call team and creates an ITSM ticket.
  • Backup: All production VMs are backed up daily using a Recovery Services vault in East US with GRS redundancy. The backup policy retains 30 daily restore points. Azure SQL Database has built‑in automated backups, and Adatum also configures long‑term retention (LTR) to keep selected backups beyond the default retention period.
  • Disaster Recovery: Azure Site Recovery replicates all VMs from East US to West Europe with an RPO of 5 minutes. A recovery plan orchestrates the failover order so that each tier starts after its dependencies: SQL VM → App VMs → Web VMs, with a post‑failover script to update DNS. Test failovers are performed quarterly.
  • Network diagnostics: Network Watcher is deployed in the East US region. Connection Monitor probes connectivity between the web tier and API layer to detect latency issues early.

Operational workflow:

  1. Proactive: Dashboards show real‑time health. Insights surface anomalies.
  2. Reactive: CPU alert triggers → on‑call engineer investigates → if the VM is unresponsive and the cause is a regional issue, failover with ASR can be invoked.
  3. Recovery: If a VM is accidentally deleted, restore from backup within hours. If a regional disaster is declared, failover to West Europe using Site Recovery.

8. AZ‑104 Exam Thinking

Monitoring Questions

  • Metrics interpretation: You might be shown a chart of metric values and asked what configuration change would resolve the issue (e.g., CPU spiking → scale out).
  • Log queries: You could be presented with a KQL query and asked what data it returns, or which query would produce a specific result. Understand basic operations: where, summarize, count, timechart.
  • Alert configuration: “You need to be notified if the number of failed sign‑ins exceeds 10 in 5 minutes.” Use a log search alert rule or metric alert (if available). Know which alert type fits the signal type.

Network Monitoring Questions

  • Connectivity troubleshooting: “A VM cannot reach an external service.” Use IP flow verify to check NSG rules, next hop to check routing, and connection troubleshoot for an end‑to‑end test.
  • Network Watcher tools are region‑based: Network Watcher must be enabled in the region where the resource resides.
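The routing half of that troubleshooting flow, next hop, comes down to picking the most specific route for a destination. Here is a minimal sketch of longest-prefix matching using the standard `ipaddress` module. The route table is hypothetical, and tie-breaking between route sources is deliberately left out.

```python
from ipaddress import ip_address, ip_network

def next_hop(routes, dest_ip):
    """Return the next hop type of the most specific (longest-prefix) matching route."""
    dest = ip_address(dest_ip)
    matches = [
        (ip_network(prefix), hop_type)
        for prefix, hop_type in routes
        if dest in ip_network(prefix)
    ]
    if not matches:
        return "None"
    return max(matches, key=lambda match: match[0].prefixlen)[1]

# Hypothetical effective routes for a subnet.
routes = [
    ("10.0.0.0/16", "VnetLocal"),
    ("0.0.0.0/0", "Internet"),
    ("10.0.2.0/24", "VirtualAppliance"),  # user-defined route through a firewall
]
hop_firewalled = next_hop(routes, "10.0.2.4")
hop_local = next_hop(routes, "10.0.1.4")
hop_internet = next_hop(routes, "8.8.8.8")
print(hop_firewalled, hop_local, hop_internet)
```

If a VM unexpectedly routes through a virtual appliance, a more specific user-defined route like the /24 above is a common cause.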

Backup Questions

  • Vault selection: “You need to back up an Azure Blob storage account.” → Use Backup vault, not Recovery Services vault. (Exam tip: Recovery Services vault for VMs, SQL, SAP; Backup vault for blobs, disks.)
  • Restore scenarios: Replace existing VM (must stop VM and select overwrite), create new VM (different name, location), or file‑folder recovery (mount snapshot).
  • Policies: Understand retention settings and consistency options.

Disaster Recovery Questions

  • Failover procedure: Unplanned failover from Region A to Region B. What steps? Start failover, select recovery point, and then commit after validation.
  • RPO/RTO: Matching replication policy to business requirement.
  • Recovery plans: grouping VMs and custom scripts.

General reasoning approach:

  1. Identify the operational requirement (detect, notify, restore, failover).
  2. Match to the correct service (Azure Monitor, Backup, ASR, Network Watcher).
  3. Consider constraints: cost, time, compliance.

9. Practice Scenarios

Scenario 1 – Metric Alert

Your production VM Scale Set runs a web application. You need an alert that triggers when the average CPU percentage across the scale set exceeds 85% for 10 minutes.

What type of alert rule should you create?

A. A log search alert rule based on performance counters.
B. A metric alert rule with a static threshold.
C. A metric alert rule with a dynamic threshold.
D. An activity log alert rule.

Correct answer: B. The requirement names an explicit limit (85%), so a metric alert with a static threshold is the most direct fit. A dynamic threshold (C) suits metrics whose baseline varies, because it derives the limit from history instead of using a fixed value.
Explanation: Metric alert rules can evaluate any platform or custom metric. A static threshold of 85% with an aggregation of “Average” and a 10‑minute window meets the requirement.
Architecture reasoning: Use static thresholds when you know the exact limit; dynamic thresholds are better for unpredictable patterns.
Related services: Azure Monitor Metrics, Alert Rules.
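The contrast between static and dynamic thresholds can be shown with a toy baseline. The sketch uses mean plus a multiple of the standard deviation as the learned upper bound. That formula is an assumption for illustration only, since Azure Monitor's dynamic threshold model is not described in this guide. The history values are made up.

```python
from statistics import mean, stdev

def dynamic_upper_bound(history, sensitivity=2.0):
    """Toy baseline: values above mean + sensitivity * stdev count as anomalous."""
    return mean(history) + sensitivity * stdev(history)

# Hypothetical CPU history for a workload that normally sits around 40%.
history = [40, 42, 38, 41, 39, 40, 43, 37]
upper = dynamic_upper_bound(history)

current_cpu = 60
static_breach = current_cpu > 85      # fixed limit from the scenario
dynamic_breach = current_cpu > upper  # limit learned from the workload's own history
print(upper, static_breach, dynamic_breach)
```

A reading of 60% is far below the fixed 85% limit, yet clearly abnormal for this workload. That is the situation dynamic thresholds are meant to catch, while a static threshold remains the right tool when the requirement states an exact number.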

Scenario 2 – Log Query

You need to find all events from the last hour where a VM was stopped and list who initiated the stop.

Which solution should you use?

A. Query the Event table in Log Analytics with a filter on EventID 1074.
B. View the VM’s Metrics blade for stop events.
C. Run a query on AzureActivity table filtering by OperationName and Caller.
D. Use Network Watcher’s IP flow verify.

Correct answer: C
Explanation: The AzureActivity table in Log Analytics contains administrative operations, including VM start/stop events and the caller. The Event table holds guest OS logs, not platform management actions.
Architecture reasoning: Platform‑level operations are sent to the Activity Log, which can be streamed to Log Analytics for querying.
Related services: Log Analytics, Azure Activity Log.
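The filtering logic behind answer C can be sketched in Python as a filter over activity records. The field names follow the answer text (OperationName, Caller). The operation name strings and the sample records are assumptions for illustration and should be checked against real AzureActivity data.

```python
from datetime import datetime, timedelta, timezone

# Assumed operation names for VM stop actions; verify against your own activity log.
STOP_OPERATIONS = {
    "microsoft.compute/virtualmachines/deallocate/action",
    "microsoft.compute/virtualmachines/poweroff/action",
}

def vm_stop_events(records, now):
    """Return (time, caller) for VM stop operations recorded in the last hour."""
    cutoff = now - timedelta(hours=1)
    return [
        (record["TimeGenerated"], record["Caller"])
        for record in records
        if record["TimeGenerated"] > cutoff
        and record["OperationName"].lower() in STOP_OPERATIONS
    ]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [  # hypothetical activity log entries
    {"TimeGenerated": now - timedelta(minutes=15), "Caller": "alice@adatum.com",
     "OperationName": "Microsoft.Compute/virtualMachines/deallocate/action"},
    {"TimeGenerated": now - timedelta(minutes=30), "Caller": "bob@adatum.com",
     "OperationName": "Microsoft.Compute/virtualMachines/start/action"},
    {"TimeGenerated": now - timedelta(hours=3), "Caller": "carol@adatum.com",
     "OperationName": "Microsoft.Compute/virtualMachines/powerOff/action"},
]
stops = vm_stop_events(records, now)
print(stops)
```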

Scenario 3 – Backup Vault Selection

You need to configure backup for an Azure Blob storage account to protect against accidental deletion of blobs.

What should you create?

A. A Recovery Services vault and configure a VM backup policy.
B. A Backup vault and configure a backup policy for Azure Blobs.
C. A Log Analytics workspace and enable diagnostic settings.
D. An Azure Site Recovery vault.

Correct answer: B
Explanation: Backup vaults support Azure Blob operational backup, which provides point‑in‑time restore for block blobs. Recovery Services vaults do not support blobs.
Architecture reasoning: Different vault types serve different workloads; choosing the correct one is an exam focus.
Related services: Backup vault, Azure Blob Storage.

Scenario 4 – Site Recovery Failover

Your primary Azure region experiences a widespread outage. You decide to initiate a failover of critical VMs to the secondary region using Azure Site Recovery.

What must you do to make the failed‑over VMs accessible with the same public IP addresses?

A. Configure a traffic manager endpoint and update DNS manually.
B. Nothing; ASR preserves public IPs automatically.
C. Create a public IP in the secondary region and reassign DNS records; public IPs cannot be preserved natively across regions.
D. Use an Azure Load Balancer with a floating IP.

Correct answer: C
Explanation: Azure public IPs are regional. ASR replicates VMs but cannot preserve the same public IP across regions. You must create new public IPs and update DNS records, or use a global load balancer like Front Door or Traffic Manager to abstract the IP.
Architecture reasoning: Disaster recovery plans must account for DNS changes. Recovery plans can include scripts to update DNS.
Related services: Azure Site Recovery, Azure DNS, Traffic Manager.

Scenario 5 – Network Watcher

A VM named VM‑App cannot connect to an Azure SQL Database. You suspect an NSG is blocking outbound traffic on port 1433.

Which Network Watcher tool should you use to verify this?

A. Connection troubleshoot.
B. IP flow verify.
C. NSG flow logs.
D. Next hop.

Correct answer: B
Explanation: IP flow verify checks whether a packet from a specific source VM to a destination IP and port is allowed or denied by NSG rules, and which rule is responsible. Connection troubleshoot runs a more comprehensive end‑to‑end test, but IP flow verify is the precise tool for NSG evaluation.
Architecture reasoning: Use IP flow verify for quick NSG rule diagnostics. Connection troubleshoot is better for overall connectivity.
Related services: Network Watcher, IP flow verify.


10. Common Exam Mistakes

  • Confusing metrics and logs: Metrics are numeric time‑series (CPU, memory); logs are text records (events, traces). Alert rules can be based on either, but the type determines configuration.
  • Confusing Azure Backup and Azure Site Recovery: Backup protects against data loss (deletion, corruption); ASR protects against infrastructure failure (regional outage) and provides workload mobility.
  • Ignoring RPO/RTO requirements: When a scenario specifies a data loss tolerance of 5 minutes, ASR must be configured with a replication policy meeting that RPO; backup alone may not suffice.
  • Configuring alerts without action groups: An alert rule without an action group still fires, but nobody is notified. The exam checks whether you know that notifications require an action group.
  • Misunderstanding backup vault types: Recovery Services vault for VMs, SQL, SAP HANA, and Azure Files. Backup vault for Blobs, Managed Disks, and PostgreSQL. A scenario about backing up a storage account blob requires a Backup vault.
  • Overlooking diagnostic settings: Resource logs are not collected by default. If a question asks you to analyze resource logs such as storage access logs, you must first configure diagnostic settings to send them to Log Analytics. Guest OS logs from a VM, such as sign‑in events, are collected separately through an agent.
  • Misinterpreting monitoring data: An exam might show a metric graph with a sudden spike and ask what likely caused it. Know the typical patterns (e.g., high CPU → need to scale; network in drop → NSG rule change).
  • Network Watcher regionality: If the VM is in East US, Network Watcher must be enabled in East US to use its tools on that VM.

11. Skill 5 Learning Checklist

Must Know (exam‑critical)

  • Interpret Azure Monitor metrics and create metric alert rules (static thresholds)
  • Configure diagnostic settings to send logs to Log Analytics workspace
  • Query logs using basic KQL (where, summarize, timechart, count)
  • Create and configure alert rules (metric, log search, activity log)
  • Create and manage action groups (notifications: email, SMS, webhook)
  • Configure alert processing rules to suppress alerts during maintenance
  • Use VM Insights, Storage Insights, and Network Insights to monitor resources
  • Use Network Watcher tools: IP flow verify, next hop, connection troubleshoot
  • Create a Recovery Services vault and configure backup policies for Azure VMs
  • Perform VM backup and restore operations (create new VM, replace existing, file recovery)
  • Create a Backup vault and configure operational backup for blobs
  • Understand the differences between Recovery Services vault and Backup vault
  • Configure Azure Site Recovery for Azure VMs (replication, failover)
  • Perform test failover and unplanned failover
  • Configure backup reports and backup alerts

Should Know (strong working knowledge)

  • Use dynamic thresholds in metric alerts for automatic baselining
  • Create log search alert rules based on custom KQL queries
  • Configure NSG flow logs and traffic analytics
  • Use Connection Monitor for end‑to‑end network monitoring
  • Understand application‑consistent vs. crash‑consistent backups and when to use each
  • Configure backup retention policies with GRS vault storage
  • Create and run recovery plans in Site Recovery (group VMs, scripts)

Nice to Know (architecture context)

  • Design a multi‑workspace Log Analytics strategy (single tenant‑wide workspace vs. regional workspaces)
  • Integrate Azure Monitor with Microsoft Sentinel (formerly Azure Sentinel) for security monitoring
  • Understand how to use Azure Workbooks to create custom dashboards
  • Know the role of Azure Backup Server for on‑premises workloads
  • Familiarity with Azure Backup Center for enterprise‑scale backup management

12. What Skill 5 Really Means in AZ-104

AZ-104 Skill 5 (Monitor and maintain Azure resources) is not just about monitoring—it represents the operational feedback loop of cloud systems.

This skill covers:

  • Azure Monitor for metrics and logs
  • Log Analytics for query and diagnostics
  • Application Insights for application performance monitoring
  • Azure Backup for data protection
  • Azure Site Recovery for disaster recovery
  • Update Manager for patching and lifecycle maintenance

Together, these capabilities form the operational foundation of Azure systems reliability.

13. From Operations to Architecture Thinking

Skill 5 is where AZ-104 transitions from administration into architecture thinking.

Operational signals such as:

  • VM performance metrics
  • Network latency and failures
  • Backup success/failure rates
  • Region failover behavior

become inputs for architectural decisions such as:

  • High availability vs cost trade-offs
  • Multi-region design
  • Disaster recovery strategy (RPO/RTO)
  • Observability architecture design

14. AZ‑104 Certification Summary

The AZ‑104 exam validates your ability to operate Azure environments end‑to‑end. Across the five skill areas, you’ve built a complete operational foundation:

  1. Manage Azure Identities and Governance (20–25%): You control who can access what, enforce compliance with policy, and organize the enterprise’s resource hierarchy. Every other skill area depends on this security and organizational bedrock.

  2. Implement and Manage Storage (15–20%): You provide the persistent data layer—blobs, files, and disks—secured with access controls, lifecycle management, and geo‑redundancy. Compute workloads consume this storage.

  3. Deploy and Manage Azure Compute Resources (20–25%): You run the applications—VMs, containers, App Services—scaling them to meet demand and automating deployment with Infrastructure as Code. This is where business value runs.

  4. Implement and Manage Virtual Networking (15–20%): You connect everything: VNets, subnets, peering, NSGs, private endpoints, load balancers, and DNS. Networking is the fabric that makes compute and storage reachable and secure.

  5. Monitor and Maintain Azure Resources (10–15%): You close the loop with visibility, alerting, backup, and disaster recovery. Monitoring tells you what’s happening; backup and recovery ensure you can restore service when things go wrong.

Together, these five skill areas form the Azure operational architecture. An administrator who masters them can run production workloads confidently. And that operational knowledge is exactly what the Azure Solutions Architect Expert (AZ‑305) builds upon.


15. Transition to AZ-305

Once you understand how Azure systems behave in production, the next step is learning how to design systems before they are built.

This is where AZ-305 (Azure Solutions Architect Expert) begins:

  • Designing scalable architectures
  • Selecting appropriate Azure services
  • Defining identity, networking, and security boundaries
  • Making trade-offs between cost, reliability, and performance

Next Step Learning Path

Recommended progression after AZ-104:

  • AZ-305 → Azure Solutions Architecture & Design
  • AZ-104 Deep Dive → Domain & Service-level mastery
  • Azure Architecture Hub → Design patterns & reference architectures



Related Architecture and Domain Content

  • Observability domain: /azure/domains/observability
  • Security domain (backup encryption, access): /azure/domains/security
  • Governance (policy for backup compliance): /azure/domains/governance
  • Observability architecture: /azure/architecture/observability-architecture
  • Disaster recovery architecture: /azure/architecture/disaster-recovery-architecture
  • Resilience architecture: /azure/architecture/resilience-architecture
  • Operational excellence architecture: /azure/architecture/operational-excellence
  • Azure Monitor service: /services/azure-monitor
  • Log Analytics: /services/log-analytics
  • Application Insights (for deeper app monitoring): /services/application-insights
  • Network Watcher: /services/network-watcher
  • Azure Backup: /services/backup
  • Azure Site Recovery: /services/site-recovery