Azure Compute Domain
Azure Compute is the execution fabric of the cloud—the set of services that provide processing power for applications, workflows, and AI workloads. This domain covers the full spectrum from raw virtual machines to serverless functions, container orchestrators, and batch processing. This page explains compute as an architectural decision domain, not a feature catalogue, and connects it to scalability, reliability, cost, and the specific demands of modern AI and agent systems.
1. Overview
What Is Compute in Cloud Architecture
In cloud architecture, compute refers to the resources that execute instructions: CPU cycles, memory, GPU, and the orchestration layer that manages lifecycle, scaling, and availability. It is the layer where business logic runs, data is processed, and models infer.
Compute abstracts physical hardware into consumable units—virtual machines, containers, or code execution environments—and provides mechanisms for scaling, networking, identity integration, and health monitoring.
Compute as the Execution Layer of Distributed Systems
Every cloud application is a distributed system. The compute layer is the active substrate where services process requests, run algorithms, and transform data. It interacts with storage (state), networking (communication), and identity (access). In well-architected systems, compute is decoupled from these concerns; it owns no durable state, relies on managed identities for authentication, and scales independently.
Why Compute Is Central to System Scalability and Design
Compute choices directly impact:
- Throughput and latency under load.
- Cost efficiency—idle resources vs. pay-per-use models.
- Operational overhead—patching, scaling, OS management.
- Resilience—how quickly the system recovers from failures.
- Flexibility to adopt new technologies like AI models and agents.
Selecting the right compute model for each workload is among the most consequential architecture decisions, second only to identity design.
2. Core Compute Models in Azure
Azure provides a compute continuum from full-control IaaS to fully-managed serverless.
Virtual Machines (IaaS)
Virtual Machines give you full control over the OS, installed software, and kernel configuration. They are used for:
- Legacy lift-and-shift applications that cannot be refactored.
- Custom OS requirements, third‑party licensing, or specialized networking.
- High‑performance computing (HPC) with InfiniBand and GPU types.
VMs require manual configuration for high availability (Availability Sets, Availability Zones), scaling (through VM Scale Sets), and patch management. They are best for steady, predictable workloads and environments demanding OS‑level customization.
Virtual Machine Scale Sets (VMSS)
VM Scale Sets build on VMs, offering automated horizontal scaling based on metrics, schedule, or custom rules. They distribute identical VM instances across fault domains and can integrate with load balancers or Application Gateway. Use cases:
- Stateless web/API front‑ends.
- Worker pools for queue‑based processing.
- Container hosts for self‑managed clusters.
VMSS still requires OS and application maintenance, but eliminates manual scale‑out.
App Service (PaaS)
App Service is a fully‑managed platform for web apps, REST APIs, and mobile backends. It abstracts infrastructure, provides auto‑scaling, deployment slots, and native authentication. It supports multiple languages and frameworks on Windows and Linux.
App Service is ideal for HTTP‑based workloads that fit standard web patterns. It is a poor fit for long‑running background tasks or for workloads that require raw network control.
Azure Functions (Serverless)
Azure Functions is an event‑driven compute service that scales per event. Functions run in response to triggers (HTTP, queues, timers, events) and are billed per execution or on dedicated plans. They are inherently stateless; the Durable Functions extension adds orchestration for stateful workflows.
Use Functions for lightweight, bursty workloads like file processing, webhooks, and API glue. Avoid them for long‑running, high‑throughput stream processing unless you use the Premium plan with always‑warm instances.
Containers (AKS, Container Instances, Container Apps)
Azure provides a spectrum of container runtimes:
- Azure Kubernetes Service (AKS): fully managed Kubernetes for microservices at scale. It provides fine‑grained scheduling, networking, and orchestration. Best for teams with Kubernetes expertise running complex distributed applications.
- Azure Container Instances (ACI): single‑instance, pay‑per‑second containers with no orchestration. Useful for burst jobs or simple testing.
- Azure Container Apps (ACA): a serverless container platform that abstracts Kubernetes. It offers auto‑scaling, zero‑scale, revisions, and native event‑driven capabilities (via KEDA). Suited for teams that want container benefits without cluster management.
Batch Processing Workloads
Azure Batch manages large‑scale, parallel job execution across pools of VMs. It handles job scheduling, task distribution, retries, and lifecycle. Common for media rendering, financial modeling, and data preparation pipelines. Batch can also orchestrate training jobs for AI models.
3. Compute Architecture Patterns
Stateless vs Stateful Workloads
- Stateless compute instances handle any request without relying on local data that survives a restart. They scale horizontally easily; sessions are stored externally (Redis, database). Most PaaS and serverless services assume stateless design.
- Stateful workloads hold data locally (in‑memory, local SSD) and require sticky sessions or partition awareness. Stateful patterns are implemented in VMs with persistent disks, or in AKS using StatefulSets and Persistent Volume Claims. Always prefer pushing state to external stores to enable resilience and scaling.
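The stateless pattern above can be sketched as a request handler that reads and writes session data only through an external store. This is a minimal illustration, not a prescribed implementation: `SessionStore`, `InMemoryStore`, and `handle_request` are hypothetical names, and the in-memory class merely stands in for a service such as Redis so the sketch runs locally.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical interface for an external session store (e.g. Redis)."""

    def get(self, key: str) -> Optional[dict]: ...

    def set(self, key: str, value: dict) -> None: ...


class InMemoryStore:
    """Local stand-in for the external store, used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = dict(value)


def handle_request(store: SessionStore, session_id: str, item: str) -> int:
    """Add an item to the session cart; nothing is kept on the compute instance."""
    session = store.get(session_id) or {"cart": []}
    session["cart"] = session["cart"] + [item]
    store.set(session_id, session)
    return len(session["cart"])


# Two calls model two different instances serving the same session.
shared_store = InMemoryStore()
handle_request(shared_store, "session-1", "book")        # "instance A"
cart_size = handle_request(shared_store, "session-1", "pen")  # "instance B"
```

Because the handler holds no state between calls, any instance can serve any request, which is what makes horizontal scaling and instance replacement safe.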
Horizontal vs Vertical Scaling
- Vertical (scale up/down): increase CPU/RAM of an existing node. Simple but limited by hardware caps; often requires a restart.
- Horizontal (scale out/in): add/remove nodes. This is the cloud‑native scaling mode, providing elasticity and fault isolation.
Azure services natively support horizontal scaling: VMSS, App Service, Functions, AKS Pod autoscaling, Container Apps.
Auto-Scaling Strategies
Auto‑scaling aligns cost with demand. Key strategies:
- Metric‑based: scale on CPU, memory, HTTP queue length, or custom metrics. Include cool‑down periods to avoid flapping.
- Scheduled: pre‑scale for known peaks (e.g., 9 AM login surge).
- Event‑driven: native to Functions and Container Apps (KEDA), scaling based on queue depth, Kafka lag, or service invocation rate.
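Scaling can be reactive (a metric breaches a threshold) or predictive (machine‑learning forecasts of load). Azure Monitor predictive autoscale targets VM Scale Sets; App Service and Functions scale reactively, with App Service also supporting schedule‑based rules.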
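A reactive, metric-based rule with a cool-down period can be sketched as a pure decision function. The thresholds, the step size of one instance, and the names `AutoscaleRule` and `decide_instances` are illustrative assumptions; this is not how Azure Monitor autoscale is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleRule:
    """Hypothetical rule: all values are illustrative defaults."""
    scale_out_above: float = 70.0   # CPU percent that triggers scale-out
    scale_in_below: float = 30.0    # CPU percent that triggers scale-in
    cooldown_seconds: int = 300     # minimum gap between scale actions
    min_instances: int = 2
    max_instances: int = 10


def decide_instances(rule: AutoscaleRule, current: int, cpu_percent: float,
                     seconds_since_last_scale: float) -> int:
    """Return the desired instance count for one evaluation cycle."""
    # Inside the cool-down window, hold steady to avoid flapping.
    if seconds_since_last_scale < rule.cooldown_seconds:
        return current
    if cpu_percent > rule.scale_out_above:
        return min(current + 1, rule.max_instances)
    if cpu_percent < rule.scale_in_below:
        return max(current - 1, rule.min_instances)
    return current


rule = AutoscaleRule()
after_spike = decide_instances(rule, current=3, cpu_percent=85.0,
                               seconds_since_last_scale=600)
during_cooldown = decide_instances(rule, current=3, cpu_percent=85.0,
                                   seconds_since_last_scale=60)
```

The gap between the scale-out and scale-in thresholds, together with the cool-down, is what keeps the instance count from oscillating around a single boundary.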
High Availability Compute Design
HA design ensures the application remains available despite failures. Core techniques:
- Deploy instances across Availability Zones for zone redundancy.
- Within a region, use Availability Sets or VMSS to spread across fault/update domains.
- For global availability, deploy to multiple regions and use Azure Front Door or Traffic Manager for failover.
- Architect stateless compute for instant redirection; use geo‑replicated storage for state.
Multi-Region Compute Deployment
Multi‑region architectures serve users globally and provide disaster recovery. Compute stacks are replicated in active/active or active/passive configurations. Synchronization of identity, data, and configuration is essential. Use Azure Front Door for global load‑balancing; CI/CD pipelines must promote consistent deployments across regions.
4. Compute Decision Framework
VM vs Container vs Serverless
| Criterion | VM (IaaS) | Containers (AKS/ACA) | Serverless (Functions) |
|---|---|---|---|
| Control | Full OS, kernel, drivers | Application dependencies only | Code and configuration |
| Operational overhead | High (patching, scaling) | Medium (AKS) / Low (ACA) | Minimal |
| Startup time | Minutes | Seconds (containers) | Sub‑second to seconds |
| Scaling model | Manual or VMSS | HPA (AKS), auto‑scale (ACA) | Automatic per event |
| Cost model | Fixed, reserved instances | Cluster cost + per‑node | Pay per execution or plan |
| Best for | Legacy, HPC, custom networking | Microservices, complex topologies | Event‑driven, intermittent use |
Guidance: Use VMs only when you must control the runtime. For new workloads, containers or serverless reduce operational burden and improve elasticity.
When to Use AKS vs Container Apps
- AKS: you need the full Kubernetes surface (custom scheduling, network policies, Operators, StatefulSets, multi‑tenant namespaces). AKS is ideal for large teams with platform engineering practices.
- Container Apps: you want serverless containers without managing nodes. Perfect for web APIs, event‑driven processing, and background jobs. It provides revisions, ingress, and auto‑scaling out of the box.
When to Use App Service vs Functions
- App Service: continuous HTTP workloads with stable performance, deployment slots, and integrated authentication. Not ideal for sporadic or event‑driven processing.
- Functions: fine‑grained execution triggered by events. Use for decoupled, async tasks. Use the Premium plan if you need VNet integration and longer timeouts.
Often, the two are combined: App Service handles the synchronous API, while Functions process background events.
Cost vs Control vs Scalability Trade-offs
More abstraction lowers operational cost but limits customisation. The decision matrix:
- High control, high cost: VMs (manage OS, scaling logic).
- Medium control, lower cost: Containers (packaged dependencies, platform scales out).
- No infrastructure, variable cost: Serverless (no idle cost, but cold starts and execution limits).
Match the billing model to the workload pattern: steady traffic → reserved capacity; bursty → consumption‑based.
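The steady-versus-bursty rule of thumb can be made concrete with a break-even calculation. The sketch below assumes a simplified model (a flat hourly rate for always-on capacity versus a flat price per million executions); the prices are hypothetical placeholders, not Azure list prices, and real consumption billing also depends on execution time and memory.

```python
def monthly_cost_fixed(hourly_rate: float, hours: float = 730.0) -> float:
    """Cost of always-on capacity for one month (730 hours assumed)."""
    return hourly_rate * hours


def monthly_cost_consumption(executions: int, price_per_million: float) -> float:
    """Cost of pay-per-use capacity for a given monthly execution count."""
    return executions / 1_000_000 * price_per_million


def break_even_executions(hourly_rate: float, price_per_million: float,
                          hours: float = 730.0) -> float:
    """Monthly executions at which both billing models cost the same."""
    return monthly_cost_fixed(hourly_rate, hours) / price_per_million * 1_000_000


# Hypothetical prices chosen only to illustrate the arithmetic.
HOURLY_RATE = 0.125         # currency units per instance-hour
PRICE_PER_MILLION = 20.0    # currency units per million executions

threshold = break_even_executions(HOURLY_RATE, PRICE_PER_MILLION)
bursty = monthly_cost_consumption(500_000, PRICE_PER_MILLION)
steady = monthly_cost_consumption(10_000_000, PRICE_PER_MILLION)
```

Under these assumed prices, workloads below roughly 4.56 million executions per month are cheaper on consumption billing, and workloads above it are cheaper on fixed capacity.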
Long-Running vs Event-Driven Workloads
- Long‑running (web servers, stream processors, agent loops): use App Service, AKS, Container Apps with minimum replica counts, or VMs.
- Event‑driven (queues, timers, file uploads): use Functions or Container Apps with KEDA scaling, scaling down to zero when inactive.
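Event-driven scaling on queue depth, including scale-to-zero, roughly follows a "backlog divided by per-replica target" calculation. The function below is a simplified sketch of that idea, not KEDA's actual implementation; `desired_replicas` and its parameters are hypothetical names.

```python
import math


def desired_replicas(queue_length: int, messages_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replica count needed so each replica handles at most the target backlog."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    wanted = math.ceil(queue_length / messages_per_replica)
    # Clamp to the configured bounds; min_replicas=0 allows scale-to-zero.
    return max(min_replicas, min(wanted, max_replicas))


idle = desired_replicas(queue_length=0, messages_per_replica=5)
backlog = desired_replicas(queue_length=12, messages_per_replica=5)
flood = desired_replicas(queue_length=1000, messages_per_replica=5)
```

Setting `min_replicas` to 1 or more trades idle cost for the removal of cold starts, which is the usual choice for the long-running workloads in the first bullet.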
5. Compute in Enterprise Architecture
Microservices Architectures
Microservices demand independent deployability and scaling. AKS is the go‑to for large‑scale microservices with service meshes, mTLS, and canary deployments. Container Apps offer a simpler alternative when full Kubernetes is overkill.
Each microservice runs with its own managed identity and scales on its own metrics. State is always external (Redis, Cosmos DB, SQL). Asynchronous messaging (Service Bus, Event Hubs) decouples services and allows independent compute scaling.
Multi-Tier Web Applications
A classic three‑tier app maps to:
- Web tier: App Service (or static site in Storage/CDN) for the front‑end.
- API tier: App Service, Container Apps, or AKS for REST/GraphQL APIs.
- Background tier: Functions or Container Apps consuming queues for order fulfillment, emails, reporting.
Identity tokens flow from the web tier through the API tier; managed identities secure service‑to‑service calls to databases and storage.
Hybrid Cloud Systems
With Azure Arc, on‑premises VMs and Kubernetes clusters become Azure‑managed resources. This enables consistent governance, monitoring, and identity across environments. Compute bursting from on‑premises to Azure for peak demand can use Arc‑enabled Kubernetes or Azure Stack HCI.
Enterprise Integration Workloads
Integration logic often involves protocol mediation, message translation, and long‑running workflows. Azure Logic Apps provides managed workflows; Azure Functions adds custom transformation steps; AKS hosts full integration engines when needed. Compute choices must ensure reliability and transactional consistency, often using the outbox pattern and idempotent handlers.
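An idempotent handler can be sketched as a consumer that records the IDs of messages it has already applied and ignores redeliveries. `IdempotentHandler` is a hypothetical class; the in-memory set stands in for a durable store, and in a real system the ID record and the side effect would need to be committed in the same transaction.

```python
from typing import Set


class IdempotentHandler:
    """Applies each message at most once, keyed by message ID."""

    def __init__(self) -> None:
        # Assumption: an in-memory set replaces a durable deduplication table.
        self._processed: Set[str] = set()
        self.total = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self._processed:
            return False
        self.total += amount              # the side effect
        self._processed.add(message_id)   # remember that it happened
        return True


handler = IdempotentHandler()
first = handler.handle("msg-1", 100)
redelivered = handler.handle("msg-1", 100)   # at-least-once delivery retry
second = handler.handle("msg-2", 50)
```

This is what makes at-least-once delivery from a queue or event stream safe: a retry after a crash or timeout does not apply the same change twice.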
6. Compute for AI & Agent Systems
Compute is the runtime engine for AI applications. It hosts the logic that orchestrates model calls, enforces safety, and manages state.
Hosting LLM Applications (Azure OpenAI Apps)
An LLM‑powered application runs on compute that:
- Accepts user prompts and manages conversations.
- Calls Azure OpenAI (via SDK or REST) with managed identity.
- Applies guardrails, filters, and business logic.
- Orchestrates calls to knowledge bases (RAG) and tools.
This compute typically takes the form of stateless web services (App Service, Container Apps) that scale based on request rate. These services can cache embeddings or responses and use Redis for session state.
RAG Pipelines Execution Layer
Retrieval‑Augmented Generation involves both ingestion and query execution:
- Ingestion: compute processes documents, chunks text, generates embeddings, and indexes them. Functions or Container Apps triggered by blob events are a natural fit.
- Query: the runtime retrieves relevant chunks, assembles a prompt, calls the LLM, and formats the result. Low latency is critical, so keep instances warm: Container Apps with a minimum replica count of at least one, or Functions Premium with always‑warm instances, avoid the cold starts that scale‑to‑zero introduces.
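The chunking step of ingestion can be sketched as a fixed-size sliding window with overlap, so that text cut at a boundary still appears whole in a neighbouring chunk. This is one simple strategy among several (sentence- or token-based splitting are common alternatives); `chunk_text` and its default sizes are illustrative assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing fragment is repeated.
        if start + chunk_size >= len(text):
            break
    return chunks


sample_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
# sample_chunks == ["abcd", "cdef", "efgh", "ghij"]
```

Each chunk would then be passed to an embedding model and written to the index; those calls are omitted here because they depend on the chosen SDK.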
Agent Runtime Environments
AI agents (multi‑step planners that call tools) require compute that supports:
- Stateful loops: the agent plans, executes, observes, and replans. Durable Functions or Container Apps with Dapr workflow provide long‑running orchestration.
- Tool execution: the runtime invokes APIs, databases, and code interpreters. It must do so with scoped, user‑delegated permissions (OBO tokens) to prevent over‑privileged actions.
- Concurrency and isolation: each agent session may be a separate container instance or actor, ensuring resource isolation and secure context.
The compute layer for agents must be resilient to tool call failures, support timeouts, and log every action for auditability.
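The three requirements above (tolerate tool failures, enforce timeouts, log every action) can be sketched as a single wrapper around each tool call. This is a minimal, thread-based illustration: `call_tool`, `flaky_lookup`, and the audit entry fields are hypothetical, and a production runtime would more likely rely on the orchestration framework's own retry and timeout policies.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Dict, List


def call_tool(tool: Callable[..., Any], args: Dict[str, Any], *,
              timeout_s: float, max_attempts: int,
              audit_log: List[Dict[str, Any]]) -> Any:
    """Run a tool with a per-attempt timeout, bounded retries, and an audit trail."""
    for attempt in range(1, max_attempts + 1):
        entry = {"tool": tool.__name__, "args": args, "attempt": attempt}
        executor = ThreadPoolExecutor(max_workers=1)
        try:
            result = executor.submit(tool, **args).result(timeout=timeout_s)
            audit_log.append({**entry, "status": "ok"})
            return result
        except FutureTimeout:
            audit_log.append({**entry, "status": "timeout"})
        except Exception as exc:
            audit_log.append({**entry, "status": "error", "detail": str(exc)})
        finally:
            # Do not wait for a hung call; the worker thread is abandoned, not killed.
            executor.shutdown(wait=False)
    raise RuntimeError(f"{tool.__name__} failed after {max_attempts} attempts")


# Hypothetical tool that fails once with a transient error, then succeeds.
_calls = {"count": 0}


def flaky_lookup(order_id: str) -> Dict[str, str]:
    _calls["count"] += 1
    if _calls["count"] == 1:
        raise ConnectionError("transient failure")
    return {"order_id": order_id, "status": "shipped"}


audit: List[Dict[str, Any]] = []
outcome = call_tool(flaky_lookup, {"order_id": "A1"},
                    timeout_s=1.0, max_attempts=3, audit_log=audit)
```

Every attempt, successful or not, leaves an audit entry, so the log shows one `error` followed by one `ok` for this call. Retrying is only safe for tools whose actions are idempotent.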
Inference Workloads and Scaling Patterns
When serving custom AI models (fine‑tuned LLMs, vision models) on AKS or Azure ML managed online endpoints:
- GPU sizing: choose SKUs (NCasT4_v3, ND A100 v4) matching model precision and latency targets.
- Scaling: AKS uses Horizontal Pod Autoscaler with GPU utilization or custom metrics; Azure ML endpoints support automatic scale‑out.
- Cold start mitigation: keep at least one replica warm, use Triton Inference Server for fast model loading, or trade cost for pre‑provisioned capacity.
Low-Latency API Serving for AI Systems
For real‑time AI features (chat, recommendation, search), compute must minimize latency:
- Co‑locate compute in the same region as the AI endpoint.
- Use streaming protocols (SSE, WebSockets) and keep connections alive.
- Offload heavy post‑processing to async workers, acknowledging the user quickly.
- Monitor end‑to‑end latency and adjust instance sizes and scaling thresholds.
7. Reliability, Scaling & Performance
Auto-Scaling Strategies
Effective auto‑scaling requires:
- Statelessness: any instance can serve any request.
- Health probes: only healthy instances receive traffic.
- Scale‑out speed: fast‑starting instances (containers, Functions) react quickly.
- Scale‑in protection: drain connections before termination.
Use predictive scaling and schedule‑based scaling for predictable patterns to avoid cold start impact.
Availability Zones and Redundancy
Deploying compute across Availability Zones protects against the failure of a single zone (one or more datacenters). App Service (zone‑redundant SKU), AKS (zonal node pools), VMSS, and Container Apps support zone redundancy. In a zone‑redundant setup, the platform automatically load‑balances traffic across the healthy zones.
Stateless Compute Design
Stateless compute improves reliability and simplifies scaling. Best practices:
- Externalize session state (Redis, Cosmos DB).
- Store all durable data in managed storage (Blob, SQL), never on an instance's local disk.
- Use durable execution frameworks for workflows that must survive crashes (Durable Functions, Temporal, Dapr Workflows).
Performance Optimization Patterns
- Choose the right plan: App Service Premium v3 for higher performance per core; Functions Premium for always‑warm workers and longer execution; Container Apps dedicated workload profiles for predictable performance.
- Async everywhere: decouple request handling from processing via queues.
- Caching: CDN, in‑memory cache, and response caching reduce load on compute.
- Connection pooling: reuse database connections and HTTP connections to AI services instead of opening a new one per request.
- Resource limits: configure memory and CPU limits to prevent noisy‑neighbor issues in containers.
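The caching pattern in the list above can be sketched as a small time-to-live cache in front of an expensive call. `TTLCache` is a hypothetical in-process helper; in a multi-instance deployment a shared cache such as Redis would usually play this role, and the injectable clock exists only to make the behaviour easy to verify.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Caches computed values for ttl_seconds, then recomputes on next access."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # still fresh: skip the backend
        value = compute()                         # expired or missing: recompute
        self._entries[key] = (now + self._ttl, value)
        return value


# A fake clock and a call counter make the cache behaviour observable.
fake_now = [0.0]
backend_calls = [0]


def expensive_lookup() -> str:
    backend_calls[0] += 1
    return f"result-{backend_calls[0]}"


cache = TTLCache(ttl_seconds=30.0, clock=lambda: fake_now[0])
first_read = cache.get_or_compute("profile:42", expensive_lookup)
cached_read = cache.get_or_compute("profile:42", expensive_lookup)
fake_now[0] = 31.0                                # advance past the TTL
refreshed_read = cache.get_or_compute("profile:42", expensive_lookup)
```

The TTL bounds how stale a response can be, so it should be chosen per data type rather than set once for the whole service.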
8. Certification Mapping
Compute is a foundational domain across Azure certifications:
| Certification | Compute Relevance |
|---|---|
| AZ-104 | Configure and manage VMs, VMSS, App Service, Functions, containers. Implement scaling and networking for compute. |
| AZ-305 | Design compute solutions: choose services, design for HA/DR, size and scale, container orchestration strategy. |
| AI-900 | Basic understanding of compute for AI: where models run, serverless AI integration. |
| AI-103 | Build and deploy AI application compute: host LLM apps, agent runtimes, configure scaling for AI APIs. |
| AI-300 | Architect MLOps compute: training clusters, inference endpoints, GPU sizing, and scaling for AI workloads. |
| GH-600 | Design agent execution environments: secure tool invocation runtime, container isolation, identity boundaries for autonomous agents. |
9. Real-World Architecture Example
Scenario: A multi‑tenant SaaS platform with an integrated AI co‑pilot.
Compute components:
- Tenant web portal: Azure App Service (zone‑redundant, auto‑scaled by HTTP queue length). Serves React front‑end and APIs for tenant management.
- Core microservices: AKS cluster hosting services for billing, user management, and reporting. Each service runs in its own namespace, with HPA scaling on CPU and custom metrics (requests per second). Istio service mesh handles mutual TLS and retries.
- Async job processor: Azure Functions (Premium) triggered by Event Hubs processes usage events, enriches data, and writes to Azure Cosmos DB. The Premium plan ensures warm workers and VNet integration to access private endpoints.
- AI co‑pilot service: Azure Container Apps hosts a Python agent that supports chat and “take action” commands. It uses Dapr for state management and service invocation. The agent calls Azure OpenAI (GPT‑4) and a RAG index (Azure AI Search). It obtains an on‑behalf‑of token from the user’s session to execute backend actions with user‑scoped permissions.
- Custom model inference: A fine‑tuned summarization model runs on an AKS GPU node pool (ND A100 v4) with KEDA‑based auto‑scaling on GPU utilization. The agent service calls this endpoint for document summarization tasks.
- Batch training pipeline: Azure Machine Learning pipelines run on ephemeral compute clusters, provisioned only during training, using managed identity for data access.
Resilience: The AKS cluster spans three Availability Zones; App Service and Container Apps are zone‑redundant. Azure Front Door distributes global traffic. Functions are deployed to a secondary region for disaster recovery.
Identity: All components use managed identities (system‑assigned for App Service and Functions, user‑assigned for AKS workloads via workload identity). The AI agent’s tool access is scoped to the user’s permissions through token exchange.
This architecture demonstrates how compute decisions are workload‑driven: PaaS for standard web apps, serverless containers for agent flexibility, AKS for complex microservices, and dedicated GPU nodes for high‑throughput inference—all bound together by identity and event‑driven messaging.