Well-Architected Framework
Deploy applications with zero downtime
Application deployments can cause downtime and increase risk when you deploy updates directly to production. Zero-downtime strategies like blue/green, canary, and rolling deployments maintain availability during updates and support fast rollback.
Your deployment strategy depends on your infrastructure, such as virtual machines or containers, and orchestration tools, like Nomad or Kubernetes. Use load balancers and orchestrators to gradually shift traffic, test changes with production load, and roll back instantly when issues occur.
Why deploy applications with zero downtime
Deploying applications with zero-downtime strategies addresses the following operational challenges:
Reduce service disruptions and revenue loss: Application downtime during deployments causes lost revenue, frustrated users, and damaged reputation. Zero-downtime deployments maintain service availability throughout updates, so users experience no interruptions.
Reduce deployment risk with gradual rollouts: Deploying application changes to all users simultaneously creates high risk. If issues occur, all users are affected. Canary and rolling deployments gradually shift traffic, limiting the affected scope and allowing you to catch issues before full rollout.
Enable instant rollback capabilities: When application updates cause bugs or performance issues, traditional deployments require time-consuming rollback procedures. Blue/green deployments keep the previous version running, so you can switch traffic back to the working version immediately.
Test changes with production traffic: Canary deployments let you test changes with real production traffic on a small user subset, validating performance and functionality before full deployment.
Choose a deployment strategy
Select your deployment strategy based on application requirements, infrastructure constraints, and risk tolerance.
Use the following criteria to choose a deployment strategy:
Use blue/green deployments when you need:
- Instant rollback capability for critical applications
- Complete validation before switching traffic
- Ability to maintain two full environments simultaneously
- Predictable cutover timing
Use canary deployments when you need:
- Risk reduction for high-impact changes
- Gradual validation with real production traffic
- Early detection of issues before full rollout
- Ability to test with a subset of users first
Use rolling deployments when you need:
- Resource efficiency with minimal overhead
- Gradual replacement without double infrastructure costs
- Continuous availability during updates
- Automated orchestration with Kubernetes or Nomad
Combine strategies for comprehensive safety. For example, you can use blue/green deployment with canary testing: deploy to the green environment, route 10% of traffic to it for canary validation, then switch all traffic if the canary succeeds.
Deploy applications on virtual machines with load balancers
Blue/green and canary deployments work well for applications on virtual machines. Load balancers and reverse proxies manage traffic between blue and green environments, enabling you to direct a subset of users for canary testing and control traffic for rolling deployments.
Load balancers route traffic between application environments during updates, supporting blue/green deployments and canary releases. They allow you to gradually shift users to new versions while maintaining the ability to roll back if issues occur. By continuously monitoring application health and automatically routing around failed instances, load balancers increase service availability throughout the deployment process.
Regardless of your cloud provider, you can use Terraform to manage load balancers and proxies. Using Terraform for infrastructure as code allows you to version control your load balancer configurations alongside your application code, ensuring that changes are tracked, reviewed, and rolled back if needed. You can define target groups, health check parameters, routing rules, and SSL certificates declaratively, then apply these configurations automatically as part of your CI/CD pipeline.
The following example shows Terraform configuration for canary deployment using AWS Application Load Balancer with weighted target groups:
# Create target group for stable (blue) version
resource "aws_lb_target_group" "blue" {
  name     = "app-blue"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled           = true
    healthy_threshold = 2
    interval          = 30
    path              = "/health"
    timeout           = 5
  }
}

# Create target group for new (green) version
resource "aws_lb_target_group" "green" {
  name     = "app-green"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    enabled           = true
    healthy_threshold = 2
    interval          = 30
    path              = "/health"
    timeout           = 5
  }
}

# Configure listener with weighted traffic distribution
resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 90 # 90% of traffic to stable version
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 10 # 10% of traffic to canary version
      }

      stickiness {
        enabled  = false
        duration = 600
      }
    }
  }
}
The Terraform configuration creates two target groups and distributes traffic with a 90/10 split for canary testing. To gradually shift traffic, update the weight values (for example, to 50/50, then 0/100) and run terraform apply. The load balancer immediately adjusts traffic distribution without downtime.
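Rather than editing the listener resource for each traffic shift, you can parameterize the weights with input variables. The following is a minimal sketch; the variable names are illustrative, not part of the example above:

```hcl
# Illustrative variables for the canary split; defaults match the 90/10 example
variable "blue_weight" {
  description = "Percentage of traffic routed to the stable (blue) target group"
  type        = number
  default     = 90
}

variable "green_weight" {
  description = "Percentage of traffic routed to the canary (green) target group"
  type        = number
  default     = 10
}
```

Reference the variables in the listener's target_group blocks (weight = var.blue_weight and weight = var.green_weight), then shift traffic from the command line or your CI/CD pipeline with, for example, terraform apply -var="blue_weight=50" -var="green_weight=50". This keeps the traffic split auditable as a pipeline parameter instead of a manual code edit.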
Canary testing workflow
After the green environment is ready, the load balancer sends a small fraction of traffic to the green environment (in this example, 10%).

If the canary test succeeds without errors, you can incrementally direct traffic to the green environment over time. In the end state, you redirect all traffic to the green environment. After verifying the new deployment, you can destroy the old blue environment. The green environment is now your current production service.

To learn how to implement canary deployments with AWS Application Load Balancers, follow the blue-green and canary deployments tutorial.
Deploy containerized applications with orchestration tools
Containers support rolling, blue/green, and canary deployments through orchestration tools like Nomad and Kubernetes. Orchestrators automate the deployment process, manage health checks, and handle traffic routing during updates.
The following deployment strategies lower downtime risk:
- Blue/green deployments: Provide instant rollback capability by maintaining two identical environments and switching traffic between them, ensuring zero downtime but requiring double the resources.
- Rolling deployments: Gradually replace instances one by one, minimizing resource usage while maintaining availability, making them efficient for resource-constrained environments.
- Canary deployments: Mitigate risk by releasing to a small subset of users first, allowing you to validate changes and catch issues before full rollout.
Rolling deployments with Nomad
Nomad supports rolling updates as a first-class feature. Use the update block to control how Nomad replaces old allocations with new ones during deployment.
The following example shows a Nomad job specification with rolling update configuration:
job "web-app" {
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel      = 2        # Update 2 instances at a time
    health_check      = "checks" # Wait for health checks to pass
    min_healthy_time  = "10s"    # Minimum time to be healthy
    healthy_deadline  = "5m"     # Maximum time to become healthy
    progress_deadline = "10m"    # Overall deployment timeout
    auto_revert       = true     # Automatically revert on failure
    canary            = 2        # Deploy 2 canary instances first
  }

  group "web" {
    count = 6 # Total 6 instances

    network {
      port "http" {
        to = 8080
      }
    }

    service {
      name = "web-app"
      port = "http"

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "myregistry/myapp:1.0.0"
        ports = ["http"]
      }
    }
  }
}
The Nomad job specification deploys 6 instances with rolling updates. Nomad first deploys 2 canary instances and waits for them to pass health checks for at least 10 seconds. Because the update block does not set auto_promote, the deployment then pauses until you promote it (for example, with nomad deployment promote), after which Nomad progressively updates 2 instances at a time. If an instance fails health checks, auto_revert returns the job to the previous version. The progress_deadline fails the deployment if it does not complete within 10 minutes.
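Canary deployments in Nomad pause until they are promoted. If you prefer Nomad to promote the deployment on its own once all canaries are healthy, add auto_promote to the update block. The following is a minimal sketch of that variation, showing only the fields that change from the example above:

```hcl
update {
  max_parallel = 2
  canary       = 2
  auto_promote = true # Promote automatically once all canaries pass health checks
  auto_revert  = true # Still revert automatically if the rollout fails
}
```

With auto_promote enabled, the deployment proceeds without manual intervention, which suits fully automated pipelines; leave it unset when you want a human or an external monitoring check to gate promotion.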
To learn how to implement rolling and canary deployments with Nomad, follow the Nomad job updates tutorials.
Rolling deployments with Kubernetes
Kubernetes uses rolling updates by default. Kubernetes incrementally replaces current pods with new ones, scheduling new pods on nodes with available resources and waiting for them to become ready before removing old pods.
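Because this guide manages infrastructure with Terraform, you can also define the rolling update strategy through the Terraform Kubernetes provider instead of raw YAML. The following is a minimal sketch; the resource names, image, and surge values are illustrative:

```hcl
# Illustrative Deployment managed via the Terraform Kubernetes provider
resource "kubernetes_deployment" "web_app" {
  metadata {
    name = "web-app"
  }

  spec {
    replicas = 6

    strategy {
      type = "RollingUpdate"

      rolling_update {
        max_surge       = "25%" # Extra pods allowed during the rollout
        max_unavailable = "25%" # Pods that may be unavailable at once
      }
    }

    selector {
      match_labels = { app = "web-app" }
    }

    template {
      metadata {
        labels = { app = "web-app" }
      }

      spec {
        container {
          name  = "app"
          image = "myregistry/myapp:1.0.0"

          # Gate traffic on application health before old pods are removed
          readiness_probe {
            http_get {
              path = "/health"
              port = 8080
            }
          }
        }
      }
    }
  }
}
```

The readiness probe is what makes the rollout zero-downtime: Kubernetes only removes an old pod after a replacement reports ready.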
Both Nomad and Kubernetes support blue/green deployments. Before sending all traffic to your new deployment, use canary testing to validate the new version works correctly with production traffic.
HashiCorp resources
- Learn about zero-downtime deployment strategies for an overview of progressive delivery options.
- Deploy blue/green infrastructure when you need instant rollback for infrastructure changes.
- Deploy with traffic splitting using service mesh when you need service-level traffic control.
- Implement atomic deployments to deploy infrastructure changes as a single unit.
- Learn how to package applications so deployments promote immutable artifacts.
- Implement automated testing so rollouts stop before users see failures.
Terraform for load balancers:
- Follow the blue-green and canary deployments tutorial for a weighted target group example.
- Read the AWS Load Balancer target group documentation for health checks and target registration.
- Read the AWS Load Balancer listener documentation for weighted routing configuration.
- Read the AWS Load Balancer listener rule documentation for path-based and header-based routing.
Nomad for rolling and canary deployments:
- Follow the Nomad blue/green and canary deployments tutorial for end-to-end rollout patterns.
- Follow the Nomad rolling updates tutorial to control batch size and health checks.
- Read the Nomad update block reference for canary, auto_revert, and deployment deadlines.
- Read the Nomad job specification documentation for service definitions and health checks.
External resources
Kubernetes rolling updates:
- Read the Kubernetes Deployments documentation for rollout strategy options.
- Read the Kubernetes readiness probes documentation to gate traffic on application health.
- Read the kubectl rollout documentation to pause, resume, and undo a rollout.
Next steps
In this section of Zero-downtime deployments, you learned about methods to deploy application changes with zero downtime. Zero-downtime deployments is part of the Define and automate processes pillar.
Start with zero-downtime deployment strategies and then choose an implementation path:
- Deploy blue/green infrastructure when you deploy changes to virtual machines.
- Deploy with traffic splitting using service mesh when you deploy microservices and need service-level traffic control.