Azure DevOps Domain
Azure DevOps is the automation and delivery fabric of the cloud—the practices, services, and pipelines that transform code and infrastructure definitions into running, observable workloads. This domain page defines DevOps as a first‑class architectural concern, not a tooling catalogue, and covers CI/CD patterns, Infrastructure as Code, security integration, and the specific requirements of AI and agent system delivery. It is structured for reuse across multiple Azure certifications.
1. Overview #
What Is DevOps in Cloud Architecture #
In cloud architecture, DevOps is the integration of development, operations, and quality engineering through automated pipelines that build, test, deploy, and monitor applications and infrastructure. It is the control plane that translates source code and configuration into production services, enforcing consistency, repeatability, and auditability.
DevOps spans the entire lifecycle: planning, coding, building, testing, releasing, deploying, operating, and feeding back telemetry into the next iteration. It is not a single tool; it is a system‑level capability that enables continuous delivery of value while maintaining stability and security.
DevOps as the Automation and Delivery Layer of Cloud Systems #
Every cloud resource—compute, storage, networking, AI services—must be provisioned and updated. DevOps provides the automation layer that orchestrates these changes. It replaces manual, error‑prone processes with declarative definitions and automated workflows, enabling:
- Infrastructure as Code (IaC) to version and replicate environments.
- CI/CD pipelines to build, validate, and deploy application code.
- Policy‑driven governance to ensure compliance at every step.
DevOps sits above the individual domains, coordinating changes across them. A single pipeline may provision a VNet, deploy an AKS cluster, update a Cosmos DB container, and roll out an AI model endpoint—all governed by the same automation logic.
Why DevOps Is Critical for Scalability, Reliability, and AI Systems #
Without DevOps, scaling becomes manual and brittle. With it:
- Scalability: environments can be replicated to new regions or scaled out via parameterised IaC.
- Reliability: deployments follow consistent, tested paths; rollbacks are automated.
- AI systems: model training, evaluation, and deployment are integrated into pipelines (MLOps), ensuring models are versioned, tested, and promoted with the same rigour as application code.
- Agents: autonomous systems require continuous updates to tools, prompts, and models; DevOps provides the controlled delivery mechanism.
DevOps is the engine that makes the cloud manageable at enterprise scale.
2. Core DevOps Components in Azure #
Azure DevOps Services #
Azure DevOps is a suite of services that supports the entire application lifecycle:
- Azure Repos: Git repositories for source control, with branch policies and pull request workflows.
- Azure Pipelines: CI/CD service that builds, tests, and deploys to any cloud or on‑premises target. Supports YAML and classic pipelines.
- Azure Boards: work item tracking for agile planning, with Kanban boards, sprints, and backlog management.
- Azure Artifacts: package management for NuGet, npm, Maven, Python, and universal packages.
- Azure Test Plans: manual and exploratory testing tools integrated into the pipeline.
GitHub Actions for Azure #
GitHub Actions is a deeply integrated CI/CD platform for GitHub repositories. It provides:
- Workflows: YAML‑defined automation triggered by events (push, PR, schedule).
- Azure‑specific actions: pre‑built actions for Azure CLI, `azure/login` with federated credentials, and deployment to App Service, AKS, Functions, etc.
- Marketplace: thousands of community actions for testing, security scanning, and notifications.
For Azure‑centric teams, GitHub Actions offers tight integration with the developer experience already in GitHub, while Azure Pipelines provides deeper enterprise governance features.
Infrastructure as Code (IaC) #
IaC is the practice of managing and provisioning infrastructure through machine‑readable definition files:
- Bicep: Azure‑native DSL that compiles to ARM templates. Provides a clean syntax, modularity, and strong type checking. Recommended for Azure‑only deployments.
- ARM templates: JSON‑based declarative definitions; mature but verbose.
- Terraform: HashiCorp’s multi‑cloud IaC tool. Declarative, with a large provider ecosystem. Use when managing resources across multiple clouds or when teams already standardise on Terraform.
IaC files are versioned alongside application code, enabling repeatable environment creation and drift detection.
Release Pipelines and Deployment Strategies #
Release pipelines manage the progression of a build through environments (dev → test → staging → production). They define:
- Stages: logical divisions (build, deploy to dev, deploy to prod).
- Approvals: manual gates before production deployment.
- Deployment jobs: execute the actual deployment steps.
- Strategies: blue‑green, canary, rolling (often implemented via the deployment environment rather than the pipeline itself, but orchestrated by it).
Modern pipelines often combine build and release into a single multi‑stage YAML pipeline for simplicity.
3. CI/CD Architecture Patterns #
Continuous Integration vs Continuous Delivery vs Continuous Deployment #
- Continuous Integration (CI): frequently merging code changes into a shared branch, followed by automated builds and tests. Catches integration issues early.
- Continuous Delivery (CD): every CI‑verified change is automatically prepared for release (packaged, IaC generated), but production deployment requires manual approval.
- Continuous Deployment (CD): every change that passes automated gates is automatically deployed to production without manual intervention.
Most enterprise environments adopt Continuous Delivery with manual approval for production, while less critical internal services may use full Continuous Deployment.
Blue‑Green Deployments #
Two identical production environments (blue and green) run simultaneously. At deployment time, the new version is deployed to the inactive environment, validated, and then traffic is switched. Benefits:
- Instant rollback by switching back to the previous environment.
- Zero‑downtime deployments.
In Azure, this can be implemented with App Service deployment slots, AKS with two identical deployments and a service selector, or API Management version switching.
Canary Releases #
A new version is deployed to a subset of production instances or users, monitored, and then progressively rolled out to the entire fleet. This limits the blast radius of a bad release. Implementation:
- In AKS, using a service mesh (Istio) for traffic splitting.
- In App Service, using deployment slots with traffic percentage control.
- In Functions, using deployment slots (Premium plan).
- Canary analysis can be automated by observing metrics (e.g., error rate, latency) and deciding to proceed or roll back.
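The automated canary analysis described in the last bullet can be sketched as a comparison between baseline and canary metrics. This is a minimal illustration, not a production analyser: the `Metrics` type, the `canary_decision` function, and both threshold values are hypothetical, and a real pipeline would read the numbers from a monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Aggregated health signals for one version over the observation window."""
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float  # 95th percentile latency in milliseconds


def canary_decision(
    baseline: Metrics,
    canary: Metrics,
    max_error_rate_increase: float = 0.01,  # hypothetical absolute threshold
    max_latency_ratio: float = 1.2,         # hypothetical relative threshold
) -> str:
    """Return "promote" or "rollback" by comparing the canary to the baseline."""
    if canary.error_rate - baseline.error_rate > max_error_rate_increase:
        return "rollback"
    if canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        return "rollback"
    return "promote"


baseline = Metrics(error_rate=0.002, p95_latency_ms=180.0)
print(canary_decision(baseline, Metrics(0.003, 190.0)))  # within both thresholds
print(canary_decision(baseline, Metrics(0.050, 185.0)))  # error rate regressed
```

Comparing against the live baseline rather than a fixed number keeps the gate meaningful when overall traffic conditions change during the rollout.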
Rolling Deployments #
Instances are updated incrementally, one batch at a time, ensuring that some instances always serve traffic. Native to VMSS, AKS rolling update strategy, and Container Apps revisions. No additional infrastructure required; simpler than blue‑green but rollback is slower (must deploy previous version again).
Feature Flags and Progressive Delivery #
Feature flags decouple deployment from release. A feature can be deployed to production but hidden behind a flag, then gradually enabled for specific users or percentages. Azure App Configuration integrates with feature management libraries (.NET, JavaScript). This enables:
- Dark launching: deploying code without exposing it.
- A/B testing.
- Instant kill switches for problematic features.
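Gradual enablement for a percentage of users is commonly done by hashing a stable user identifier into a bucket. The sketch below shows that general idea only; it is not the algorithm used by Azure App Configuration or its feature management libraries, and `is_enabled` is a hypothetical helper.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets for this flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    # A user stays in the same bucket, so raising the percentage only adds users.
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(1000)]
enabled_at_20 = [u for u in users if is_enabled("new-checkout", u, 20)]
print(f"{len(enabled_at_20)} of {len(users)} users see the feature at 20%")
```

Setting `rollout_percent` to 0 acts as the kill switch, and including the flag name in the hash avoids exposing the same users to every experiment.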
4. Infrastructure as Code (IaC) Design #
Declarative vs Imperative Infrastructure #
- Declarative (Bicep, ARM, Terraform): defines the desired state; the tool computes the changes. Idempotent, easier to review, preferred for production IaC.
- Imperative (Azure CLI, PowerShell scripts): specifies the steps to achieve a state. Harder to maintain idempotency; useful for quick automation or tasks not supported by declarative tools.
Declarative IaC is the standard for environment provisioning; imperative scripts are used for operational runbooks.
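To illustrate why declarative definitions are idempotent, the following toy reconciler computes a plan from a desired state and an actual state, then applies it. It is a simplified model, not how Bicep, ARM, or Terraform are implemented; `plan`, `apply`, and the resource names are hypothetical, and the delete step mirrors tools or modes that remove resources missing from the definition.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Compute the actions needed to move `actual` to `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


def apply(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Apply the plan in place; afterwards `actual` equals `desired`."""
    for action, name in plan(desired, actual):
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actual


desired = {"vnet": {"cidr": "10.0.0.0/16"}, "storage": {"sku": "Standard_LRS"}}
actual = {"storage": {"sku": "Standard_GRS"}, "old-vm": {"size": "B1s"}}

first_plan = plan(desired, actual)
apply(desired, actual)
second_plan = plan(desired, actual)  # empty: re-running changes nothing
print(first_plan)
print(second_plan)
```

An imperative script would have to encode the create, update, and delete checks by hand for every resource, which is where idempotency usually breaks down.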
Modular IaC Design #
IaC should be composed of reusable modules:
- Bicep modules: small, parameterised files that deploy a set of related resources (e.g., a secure VNet module).
- Terraform modules: root modules that call child modules from a private registry.
- ARM template linked templates: allow decomposition but are less clean than Bicep modules.
Modules encapsulate best practices (e.g., a “storage account with private endpoint” module) and enforce consistency.
Environment Separation #
Environments (dev, test, prod) should be isolated using separate subscriptions (recommended) or resource groups at minimum. IaC parameter files define environment‑specific values (SKU size, instance count, region). This prevents accidental changes to production and enables cost tracking per environment.
State Management Strategies #
- Bicep/ARM: state is managed by Azure Resource Manager; no external state file required. Simpler, but drift detection is limited to what‑if previews and comparing deployments.
- Terraform: state file (`terraform.tfstate`) stored in Azure Storage with locking. Enables drift detection and `terraform plan`. Requires careful access control to the state file.
For multi‑cloud or complex drift management, Terraform’s state is an advantage; for Azure‑only, Bicep reduces operational overhead.
Reusable Infrastructure Components #
Design IaC to create a platform that application teams can consume:
- A networking module that creates a VNet, subnets, and NSGs.
- A compute module that deploys an App Service Plan and a web app.
- A database module that provisions Azure SQL with private endpoint.
These components are published in a private registry (a Bicep private module registry hosted in Azure Container Registry, or a Terraform private registry) or kept in a shared repository, enabling self‑service provisioning with guardrails.
5. DevOps Design Decisions #
Azure DevOps vs GitHub Actions Selection #
| Factor | Azure DevOps | GitHub Actions |
|---|---|---|
| Ecosystem | Deep Azure integration, Boards, Artifacts, Test Plans. | Native to GitHub repositories, large community actions. |
| Enterprise governance | Service connections, agent pools, approval gates, environments. | Environments with protection rules, OIDC federation. |
| YAML pipelines | Multi‑stage YAML with templates and expressions. | Workflow YAML with composite actions and reusable workflows. |
| Hosted agents | Microsoft‑hosted (Windows, Linux, macOS), self‑hosted agents. | GitHub‑hosted runners, self‑hosted runners. |
| Auditing | Rich audit logs, retention policies. | Audit logs in GitHub Enterprise. |
| Use case | Organisations already using Azure DevOps for work tracking and requiring integrated test plans. | Teams on GitHub for source control, wanting a unified DevOps experience. |
Guidance: Use GitHub Actions if the organisation is on GitHub and prefers a single platform. Use Azure DevOps if you need tight integration with Boards, Test Plans, or have existing investments. They can coexist; for example, source in GitHub, pipelines in Azure DevOps.
When to Use Terraform vs Bicep #
- Bicep: Azure‑native, no state file to manage, tight Azure integration, faster support for new Azure features. Recommended for Azure‑only landscapes with teams familiar with Azure.
- Terraform: multi‑cloud, large module ecosystem, provider abstraction. Use when managing resources outside Azure or when Terraform is an enterprise standard.
Bicep is simpler operationally; Terraform offers greater flexibility for hybrid/multi‑cloud.
Pipeline Design for Microservices vs Monoliths #
- Monolith: a single pipeline builds and deploys the entire application. Simpler but forces redeployment of everything on any change.
- Microservices: each service has its own repository and pipeline, deploying independently. Requires careful versioning and contract testing. Infrastructure may still be managed centrally via a platform pipeline.
The pipeline architecture should mirror the system architecture: independent deployment units → independent pipelines.
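When several deployment units share one repository instead of having a repository each, the same independence is usually achieved with path filters on pipeline triggers. The sketch below models that mapping; the folder layout, `SERVICE_ROOTS`, and `pipelines_to_trigger` are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical repository layout: folder -> independently deployable unit.
SERVICE_ROOTS = {
    "services/cart": "cart",
    "services/catalog": "catalog",
    "iac": "platform-infrastructure",
}


def pipelines_to_trigger(changed_files: list[str]) -> set[str]:
    """Map the files changed in a commit to the pipelines that must run."""
    triggered = set()
    for changed in changed_files:
        parents = PurePosixPath(changed).parents
        for root, unit in SERVICE_ROOTS.items():
            # Match whole path segments, so "services/cartography" is not "cart".
            if PurePosixPath(root) in parents:
                triggered.add(unit)
    return triggered


print(pipelines_to_trigger(["services/cart/src/app.py", "README.md"]))
```

A change that touches only documentation triggers nothing, while a change under `iac/` runs the platform pipeline without redeploying any service.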
Build vs Release Pipeline Separation #
- Unified multi‑stage pipeline: build, test, and deploy stages in a single YAML definition. Simpler, easier to trace end‑to‑end.
- Separate build and release: classic approach; build produces an artifact, release consumes it and deploys across environments. More complex but offers independent lifecycle and approval gates.
Modern practice favours multi‑stage pipelines for most scenarios, with approval gates at deployment stages.
Manual Approval vs Fully Automated Deployment #
- Manual approval: required for production deployments in regulated industries, or when verification cannot be fully automated. Provides a safety net but introduces latency.
- Fully automated: after passing all automated tests (unit, integration, security scans, canary analysis), changes are deployed to production. Faster, but requires high confidence in testing.
Many enterprises automate deployments to dev/test, and require approval for production, gradually extending automation as telemetry and confidence improve.
6. DevOps in Enterprise Architecture #
Microservices Delivery Pipelines #
Each microservice has a pipeline that builds a container image, runs tests, and deploys to an AKS cluster or Container Apps environment. A central platform pipeline manages cluster infrastructure. Service pipelines use infrastructure as code (e.g., Helm charts, Kustomize) for the application configuration. Deployment strategies (canary, blue‑green) are applied per service using the orchestrator’s native features.
Multi-Team Enterprise Engineering Workflows #
Central platform teams provide reusable pipeline templates, IaC modules, and shared services (logging, monitoring, service mesh). Application teams consume these templates, inheriting governance (e.g., security scans, approval gates) without defining them from scratch. Azure DevOps YAML templates or GitHub Actions reusable workflows implement this pattern.
Hybrid Cloud Deployments #
Pipelines can target both Azure and on‑premises environments. Self‑hosted agents or runners deployed on‑premises execute deployment tasks inside the private network. IaC can manage Azure resources while configuration management (e.g., Ansible, DSC) handles on‑premises servers.
Large-Scale System Integration #
For systems spanning many components, a delivery platform orchestrates deployment order and dependencies. Azure DevOps environments with multiple resources model this; tools like Azure Managed DevOps Pools (or self‑hosted pools) provide scalable execution capacity.
Governance and Compliance Automation #
Pipelines become enforcement points:
- Branch policies require code review and passing builds.
- Security scanning (dependency scanning, SAST, DAST) runs automatically.
- Policy compliance using Azure Policy extension tasks checks IaC against corporate standards before deployment.
- Change tracking is automatic: each deployment is logged with who, what, and when.
7. DevOps for AI & ML Systems #
AI and ML systems introduce additional lifecycle stages that must be integrated into DevOps.
MLOps Pipelines #
MLOps extends CI/CD to machine learning. A typical pipeline:
- Data preparation: preprocess datasets, version them (DVC or Azure ML datasets).
- Model training: Azure ML training pipeline (or Databricks) triggered on code change or schedule.
- Model evaluation: compute metrics, compare with baseline.
- Model registration: if evaluation passes, register model in Azure ML Registry with version.
- Deployment: to staging (managed online endpoint) for validation, then promote to production after automated tests or manual approval.
- Monitoring: inference telemetry, data drift, and model performance feed back to trigger retraining.
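The evaluation and registration steps above amount to a gate that compares a candidate model with the current baseline. A minimal sketch follows, assuming all metrics are higher-is-better; `should_register`, the metric names, and the tolerance are hypothetical and not part of the Azure ML SDK.

```python
def should_register(
    candidate: dict[str, float],
    baseline: dict[str, float],
    primary_metric: str = "auc",   # hypothetical choice of headline metric
    max_regression: float = 0.01,  # hypothetical tolerance on other metrics
) -> bool:
    """Register only if the primary metric improves and nothing regresses badly."""
    if candidate[primary_metric] <= baseline[primary_metric]:
        return False
    return all(
        candidate[name] >= value - max_regression
        for name, value in baseline.items()
    )


baseline = {"auc": 0.90, "recall": 0.80}
print(should_register({"auc": 0.92, "recall": 0.795}, baseline))  # small dip tolerated
print(should_register({"auc": 0.92, "recall": 0.70}, baseline))   # recall collapsed
```

In a pipeline, a `False` result would fail the stage so that the registration and deployment steps never run for a weaker model.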
Model Versioning and Rollback Strategies #
Models are versioned artifacts. Azure ML Model Registry tracks versions and metadata. Deployment targets (endpoints) support traffic splitting between model versions (blue‑green or canary at model level). Rollback is switching traffic to the previous model version instantly.
CI/CD for LLM Applications #
LLM‑based applications (RAG, agents) have pipeline needs beyond traditional apps:
- Prompt versioning: prompts are assets stored in Git and deployed as part of the application configuration. A change to system prompt triggers a pipeline that runs evaluation against a test set before deployment.
- RAG index update: when documents change, a pipeline triggers re‑indexing (embedding generation + AI Search indexing) and, if quality tests pass, updates the search index or swaps aliases.
- Tool updates: agent tools are code; their deployment follows standard CI/CD. New tool endpoints are deployed, then the agent tool registry is updated.
RAG Pipeline Deployment Automation #
The RAG ingestion pipeline is automated as part of the data pipeline or triggered by document uploads. IaC provisions AI Search, Azure OpenAI, storage, and the Function/Container App that performs ingestion. The pipeline:
- Validates document format.
- Generates embeddings.
- Indexes to staging index.
- Runs retrieval quality tests.
- Swaps staging index to production alias.
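The last two steps, testing retrieval quality and then swapping the alias, can be sketched with in-memory stand-ins. Everything here is illustrative: `hit_rate`, `promote_if_healthy`, the index names, and the keyword search that replaces a real vector query are assumptions, and no Azure AI Search API is called.

```python
from typing import Callable


def hit_rate(search: Callable[[str], list[str]], test_cases: list[tuple[str, str]]) -> float:
    """Fraction of test queries whose expected document ID appears in the results."""
    hits = sum(1 for query, expected_id in test_cases if expected_id in search(query))
    return hits / len(test_cases)


def promote_if_healthy(aliases: dict[str, str], alias: str, staging_index: str,
                       search_staging: Callable[[str], list[str]],
                       test_cases: list[tuple[str, str]],
                       min_hit_rate: float = 0.9) -> bool:
    """Point `alias` at the staging index only when the quality gate passes."""
    if hit_rate(search_staging, test_cases) >= min_hit_rate:
        aliases[alias] = staging_index
        return True
    return False


staging_docs = {"doc-1": "return policy for shoes", "doc-2": "shipping times to europe"}


def search_staging(query: str) -> list[str]:
    # Stand-in for a real vector or hybrid query against the staging index.
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in staging_docs.items() if terms & set(text.split())]


aliases = {"kb-prod": "kb-index-v1"}
tests = [("shoes return", "doc-1"), ("shipping europe", "doc-2")]
promoted = promote_if_healthy(aliases, "kb-prod", "kb-index-v2", search_staging, tests)
print(promoted, aliases)
```

Because the application only ever queries the alias, a failed gate leaves production pointing at the previous index with no further action required.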
Agent System Deployment and Updates #
Agents are containers or functions that execute reasoning loops. Their deployment includes:
- Agent code: the orchestrator and tool implementations.
- Agent configuration: tool catalog, memory store connection strings, model endpoint.
- Safety guardrails: updated content filtering settings, approval workflow definitions.
Pipelines for agents should include integration tests that simulate user conversations with mock tools, verifying that the agent chooses tools correctly and handles failures gracefully.
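A minimal sketch of such an integration test is shown below. The `run_agent` function is a toy stand-in whose keyword routing replaces the model's reasoning, and `MockTool` and the tool names are hypothetical; the point is the shape of the assertions, not the agent itself.

```python
from typing import Callable

Tool = Callable[[str], str]


def run_agent(message: str, tools: dict[str, Tool]) -> dict:
    """Toy agent loop: pick one tool, call it, and handle failure."""
    tool_name = "order_status" if "order" in message.lower() else "search_kb"
    try:
        reply = tools[tool_name](message)
        return {"tool": tool_name, "reply": reply, "escalated": False}
    except Exception:
        # Degrade gracefully instead of surfacing a stack trace to the user.
        return {"tool": tool_name, "reply": "Handing over to a person.", "escalated": True}


class MockTool:
    """Records calls and returns a canned answer, or raises to simulate an outage."""

    def __init__(self, reply: str = "", fail: bool = False):
        self.reply, self.fail, self.calls = reply, fail, []

    def __call__(self, message: str) -> str:
        self.calls.append(message)
        if self.fail:
            raise TimeoutError("simulated tool outage")
        return self.reply


tools = {"order_status": MockTool("Shipped"), "search_kb": MockTool("See FAQ")}
happy = run_agent("Where is my order 123?", tools)

broken = {"order_status": MockTool(fail=True), "search_kb": MockTool("See FAQ")}
degraded = run_agent("Where is my order 123?", broken)
print(happy, degraded)
```

The first case checks tool selection (only `order_status` is called); the second checks that an outage leads to an escalation rather than an unhandled error.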
Prompt and Model Lifecycle Management #
Prompts and model configurations are treated as code:
- Prompts stored in Git, with version tags.
- Evaluations run as part of CI; if a prompt change degrades quality, the pipeline fails.
- Deploying a new model version (e.g., GPT‑4 Turbo to GPT‑4o) follows a canary pattern: deploy alongside existing model, compare quality/cost, then cut over.
8. Security in DevOps (DevSecOps) #
Secure Pipeline Design #
Pipelines are attack vectors; secure them by:
- Using service connections with federated credentials (OIDC) instead of secrets where possible.
- Minimising permissions: grant pipeline identity only necessary RBAC roles at resource group scope.
- Environment protection: lock production environments with required approvals and exclusive deployment locks.
- Pipeline logging: avoid echoing secrets; use secret variables.
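Azure Pipelines and GitHub Actions mask secret variables in their own logs, but custom scripts that write to other sinks need the same discipline. The helper below is a hedged sketch of that idea; `mask_secrets` is a hypothetical function, not part of either platform.

```python
def mask_secrets(line: str, secrets: list[str], mask: str = "***") -> str:
    """Replace every occurrence of a known secret value before the line is logged."""
    # Longest first, so a secret that contains another one is masked in full.
    # Empty strings are skipped because replacing "" would corrupt the line.
    for secret in sorted(filter(None, secrets), key=len, reverse=True):
        line = line.replace(secret, mask)
    return line


print(mask_secrets("curl -H 'x-api-key: abc123' https://example.test", ["abc123"]))
```

Masking is a last line of defence: it only catches exact values, so encoded or truncated secrets can still leak if a script prints them.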
Secret Management (Key Vault Integration) #
All secrets (connection strings, API keys, certificates) are stored in Azure Key Vault. Pipelines access secrets at runtime via:
- Azure DevOps Variable Groups linked to Key Vault.
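- GitHub Actions: after `azure/login`, direct Azure CLI calls such as `az keyvault secret show`; the older `Azure/get-keyvault-secrets` action is deprecated in favour of the CLI.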
Never store secrets in plain‑text pipeline variables or in code.
Dependency Scanning and Vulnerability Detection #
- Dependency scanning: tools like GitHub Dependabot, Microsoft Security DevOps extension, or Snyk scan package dependencies for known vulnerabilities (CVEs) during CI.
- Container scanning: Microsoft Defender for Containers or open‑source tools (Trivy) scan container images in the registry.
- IaC scanning: check Bicep/Terraform code against security best practices using tools like `checkov`, `tfsec`, or the Azure Policy extension.
Policy Enforcement in CI/CD #
Azure Policy can evaluate IaC templates before deployment in a pipeline. The “Azure Policy Check” task in Azure Pipelines or azure/policy action in GitHub runs compliance checks. Non‑compliant deployments can be blocked, ensuring that production resources always meet governance standards.
Identity-Based Pipeline Access Control #
- Azure DevOps: use service connections with workload identity federation (instead of secrets), scoped to specific subscriptions/resource groups.
- GitHub Actions: use OIDC federation to connect to Azure, removing the need for static credentials.
- Use managed identities for agents/runners when possible (e.g., self‑hosted agents on Azure VMs with managed identity).
9. Observability in DevOps Pipelines #
Deployment Monitoring #
Every deployment should emit telemetry:
- Log deployment start, duration, and status to Application Insights or a custom dashboard.
- Track deployment frequency, lead time for changes, change failure rate, and time to restore service (the DORA metrics).
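Given one record per production deployment, the delivery metrics above reduce to simple arithmetic. The sketch below assumes a hypothetical `Deployment` record and computes three of the metrics; a real implementation would read these records from pipeline and incident data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # whether it caused a failure needing remediation


def dora_summary(deployments: list[Deployment], window_days: int) -> dict[str, float]:
    """Summarise delivery performance over a reporting window."""
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
    }


start = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(start + timedelta(days=i), start + timedelta(days=i, hours=2 * (i + 1)), i == 1)
    for i in range(4)
]
summary = dora_summary(history, window_days=4)
print(summary)
```

The median is used for lead time so that a single slow change does not dominate the figure.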
Release Health Tracking #
After deployment, monitor application health metrics (error rate, latency, exception count) for a defined stabilization period. If metrics breach thresholds, automatically trigger a rollback or alert the on‑call team.
Rollback Triggers Based on Telemetry #
Implement automated rollback logic:
- In Azure Pipelines, use gates (Azure Monitor alerts) before proceeding to the next stage.
- A monitoring tool can initiate a rollback deployment via an Azure Function or webhook if health signals degrade.
Integration with Azure Monitor and Application Insights #
Application Insights can be queried during pipeline stages to evaluate test results or production health. Use the Application Insights REST API or Azure Monitor metrics actions in pipelines to incorporate telemetry into deployment decisions.
Feedback Loop from Production to Development #
Telemetry from production (errors, performance, user feedback) should automatically generate work items or bugs in Azure Boards or GitHub Issues. This closes the loop, ensuring that operations data directly informs development priorities.
10. Reliability, Scaling & Performance #
High Availability Pipeline Design #
Pipelines themselves must be reliable:
- Use multiple agent pools/runners across zones for self‑hosted infrastructure.
- Employ retry policies for transient failures within jobs.
- Store pipeline outputs (artifacts) in geo‑redundant storage.
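A retry policy for transient failures can be sketched as a wrapper with exponential backoff. The names `with_retries` and `TransientError` are hypothetical, and the `sleep` parameter exists only so the example runs without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def with_retries(step: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `step`, retrying transient failures with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the pipeline
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


calls = {"count": 0}
delays: list[float] = []


def flaky_publish() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("artifact feed throttled the request")
    return "published"


result = with_retries(flaky_publish, sleep=delays.append)
print(result, delays)
```

Only errors explicitly marked as transient are retried; a genuine build or test failure propagates immediately so that it is not hidden by repeated attempts.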
Parallel Builds and Distributed Execution #
Break large pipelines into parallel jobs to reduce execution time:
- Split test suites across multiple agents.
- Build container images and service binaries in parallel.
- Use matrix strategies to test across multiple configurations simultaneously.
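Splitting a test suite across agents works when every agent can compute its own share without coordination. A minimal sketch, assuming each agent knows its index and the total number of agents; `shard` and the test names are hypothetical.

```python
def shard(tests: list[str], total_shards: int, shard_index: int) -> list[str]:
    """Return the slice of tests that agent `shard_index` (0-based) should run."""
    ordered = sorted(tests)  # same order on every agent, so the split is consistent
    return ordered[shard_index::total_shards]


suite = ["test_cart", "test_auth", "test_search", "test_pay", "test_ship"]
for index in range(2):
    print(index, shard(suite, total_shards=2, shard_index=index))
```

Round-robin slicing keeps the shards the same size to within one test; balancing by historical duration would be a natural refinement.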
Artifact Caching Strategies #
- Use package caching (Azure Artifacts, GitHub Packages) to avoid rebuilding dependencies.
- Layer Docker images to leverage cache, only rebuilding changed layers.
- Cache external dependencies (NuGet, npm) using pipeline caching mechanisms.
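Pipeline caches are typically keyed on a hash of the dependency lockfile, so the cache is reused until the dependencies change. The sketch below shows that keying scheme in isolation; `cache_key` and the key layout are assumptions, not the format used by any particular CI service.

```python
import hashlib


def cache_key(prefix: str, os_name: str, lockfile_bytes: bytes) -> str:
    """Build a cache key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}|{os_name}|{digest}"


before = cache_key("npm", "linux", b'{"lodash": "4.17.20"}')
after = cache_key("npm", "linux", b'{"lodash": "4.17.21"}')
print(before)
print(after)
```

Including the operating system in the key prevents a cache built on one agent image from being restored on an incompatible one.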
Multi-Region Deployment Strategies #
Pipelines can orchestrate deployments to multiple regions sequentially or in parallel:
- Sequential: deploy to primary region, validate, then deploy to secondary. Reduces blast radius.
- Parallel: deploy to all regions simultaneously for faster global rollout. Needs strong confidence in the release.
Traffic routing (Front Door, Traffic Manager) is updated as part of the deployment pipeline.
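The sequential strategy can be sketched as a loop that stops at the first region failing validation, which is what limits the blast radius. The `rollout` function, its callbacks, and the region list are hypothetical; real deploy and validate steps would be pipeline stages.

```python
from typing import Callable


def rollout(regions: list[str], deploy: Callable[[str], None],
            validate: Callable[[str], bool]) -> dict:
    """Deploy region by region; stop at the first region that fails validation."""
    completed: list[str] = []
    for position, region in enumerate(regions):
        deploy(region)
        if not validate(region):
            return {"completed": completed, "failed": region,
                    "skipped": regions[position + 1:]}
        completed.append(region)
    return {"completed": completed, "failed": None, "skipped": []}


deployed: list[str] = []
outcome = rollout(
    ["westeurope", "northeurope", "eastus"],
    deploy=deployed.append,
    validate=lambda region: region != "northeurope",  # simulate one bad region
)
print(outcome)
```

In this run the third region is never touched, so it keeps serving traffic on the previous version while the failure is investigated.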
11. Certification Mapping #
| Certification | DevOps Domain Relevance |
|---|---|
| AZ-104 | Deploy resources using ARM templates, configure basic CI/CD with Azure DevOps, manage artifacts. |
| AZ-305 | Design deployment architecture: IaC strategy, CI/CD patterns, environment management, governance automation, and multi‑region delivery. |
| AI-900 | Understand the role of pipelines in AI service integration, basic lifecycle concepts. |
| AI-103 | Build CI/CD for AI applications: deploy LLM apps, RAG pipeline updates, prompt versioning, agent tool delivery. |
| AI-300 | Architect MLOps: training pipeline automation, model registry, deployment strategies, drift monitoring integration, and feedback loops. |
| GH-600 | Design agent delivery pipelines: containerised agent deployment, tool registry updates, safety guardrail updates, multi‑agent system rollout. |
12. Real-World Architecture Example #
Scenario: A team developing a microservices‑based e‑commerce platform with an AI‑powered product recommendation service and a customer service agent.
Source control and planning:
- All code stored in GitHub repositories: one for each microservice, one for infrastructure, one for the AI agent.
- Work items tracked in Azure Boards (or GitHub Issues) with automated linking from PRs.
CI/CD pipelines:
- Microservices CI: each push triggers a GitHub Actions workflow that:
- Builds a container image and runs unit + integration tests.
- Scans for vulnerabilities (container and dependencies).
- Pushes the image to Azure Container Registry.
- Deploys to the dev environment (Container Apps) automatically.
- On PR to main, a preview environment is created temporarily for review.
- Microservices CD: merging to main triggers deployment to staging (another Container Apps environment), runs smoke tests. After manual approval (for production), deploys to production using a blue‑green strategy with traffic splitting in Container Apps revisions.
- Infrastructure pipeline: a separate GitHub Actions workflow for infrastructure, triggered on changes to the `iac/` folder. Uses Bicep to deploy shared resources (VNet, private endpoints, database) and per‑environment parameter files. The pipeline includes a compliance check using Azure Policy before apply.
- AI recommendation service:
- MLOps pipeline in Azure Machine Learning, triggered by new training data or schedule.
- Trains model, evaluates, registers, and deploys to a staging endpoint.
- After manual approval, promotes to production endpoint, which is consumed by the product service.
- AI agent (customer service agent):
- The agent code is in a separate repo. CI builds the container, runs integration tests that simulate conversations with mock tools, then deploys to a Container Apps dev environment.
- The tool catalog is stored as a configuration file in the repo; changes to tool definitions trigger the same CI, which validates tool schemas against the actual tool APIs.
- Deployment to staging and production follows the same pipeline with approval gates.
- The RAG index update pipeline is a separate Azure Data Factory pipeline (or GitHub workflow) triggered when new knowledge base documents are uploaded to blob storage. It regenerates embeddings and swaps the search index alias only if quality tests pass.
Secret management:
- All secrets stored in Azure Key Vault. GitHub Actions uses OIDC to authenticate to Azure and fetch secrets at runtime, with no static credentials.
Observability:
- Application Insights tracks all applications; deployment events are logged via Azure DevOps/GitHub annotations.
- A deployment dashboard (Power BI, built on Log Analytics) shows DORA metrics and release health.
- Automatic rollback: if production error rate exceeds threshold within 15 minutes of deployment, an Azure Monitor alert triggers a GitHub Actions workflow that deploys the previous known‑good version.
Multi‑environment setup:
- Three Azure subscriptions: dev, test, prod.
- IaC parameter files define smaller SKUs in dev/test, and zone‑redundant, high‑capacity SKUs in prod.
- Network policies ensure dev/test cannot access prod resources.
This architecture demonstrates DevOps as an integrated system that automates delivery across compute, data, AI, and infrastructure, ensuring speed, consistency, and safety at scale.