Lesson 1: Deployment Strategies

Production-ready deployment strategies: Blue/Green, Canary, and real-world patterns.

Real-World Problem: Black Friday Deployment

Scenario: You need to deploy a critical payment fix on Black Friday morning. The system handles $10M/hour in transactions. How do you deploy without risking downtime?

Wrong approach: Deploy directly to production and hope nothing breaks.

Right approach: Use a proven deployment strategy that minimizes risk.

Why Deployment Strategies Matter

Industry statistics:

60% of outages are caused by bad deployments (Gartner, 2023)
Average cost of downtime: $5,600/minute for large enterprises
99.9% uptime = 8.76 hours downtime/year (still too much for critical systems)
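The uptime figure above is plain arithmetic, and it is worth being able to reproduce it for other targets. A minimal sketch, assuming a 365-day year; the function name is illustrative only:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return HOURS_PER_YEAR * unavailable_fraction


for target in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime -> {hours:.2f} hours of downtime per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why critical systems aim well beyond 99.9%.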

Product team perspective: Every minute of downtime means lost revenue, frustrated customers, and damaged reputation.

DevOps perspective: Deployments need to be automated, repeatable, and safe.

Blue/Green Deployment

Concept

You have two identical environments (Blue and Green). One is live, the other is idle. You deploy to the idle one, test it, and then switch traffic.

Real-World Example: E-Commerce Platform

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce Platform" {
    API = container "REST API" {
        technology "Go"
        scale {
            min 10
            max 200
        }
    }
    PaymentService = container "Payment Service" {
        technology "Go"
        description "Critical: Processes all payments"
    }
    OrderDB = database "Order Database" {
        technology "PostgreSQL"
    }
}

deployment Production "Production Environment" {
    node Blue "Active Cluster (Blue)" {
        containerInstance API {
            replicas 50
            traffic 100
            status "active"
        }
        containerInstance PaymentService {
            replicas 20
            traffic 100
        }
        containerInstance OrderDB {
            role "primary"
        }
    }

    node Green "Idle Cluster (Green)" {
        containerInstance API {
            replicas 50
            traffic 0
            status "ready"
        }
        containerInstance PaymentService {
            replicas 20
            traffic 0
            status "ready"
        }
        containerInstance OrderDB {
            role "standby"
            description "Synced from Blue, ready for switch"
        }
    }
}

view index {
    include *
}

DevOps Workflow

Deploy to Green: Deploy new version to idle Green environment
Smoke Tests: Run automated health checks and integration tests
Load Testing: Verify Green can handle production load
Switch Traffic: Use the load balancer to route 100% of traffic to Green
Monitor: Watch metrics for 30 minutes
Rollback Plan: Keep Blue ready for instant rollback if issues occur
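The workflow above can be sketched in a few lines of Python. This is a toy model rather than a real load balancer integration: the `Environment` and `BlueGreenRouter` names, the version strings, and the smoke test are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Environment:
    name: str
    version: str


class BlueGreenRouter:
    """Tracks which environment is live and which one is idle."""

    def __init__(self, live: Environment, idle: Environment) -> None:
        self.live = live
        self.idle = idle

    def deploy_to_idle(self, version: str) -> None:
        # Only the idle environment changes; live traffic is untouched.
        self.idle.version = version

    def switch(self, smoke_test: Callable[[Environment], bool]) -> bool:
        # Refuse to switch if the idle environment fails its checks.
        if not smoke_test(self.idle):
            return False
        self.live, self.idle = self.idle, self.live
        return True

    def rollback(self) -> None:
        # The previous live environment is still running, so rolling
        # back is just another swap.
        self.live, self.idle = self.idle, self.live


router = BlueGreenRouter(Environment("blue", "1.0.0"), Environment("green", "1.0.0"))
router.deploy_to_idle("1.1.0")
switched = router.switch(lambda env: env.version == "1.1.0")
print(switched, router.live.name, router.live.version)
router.rollback()
print(router.live.name, router.live.version)
```

The point of the design is that both the switch and the rollback are a single swap, so neither requires a new deployment.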

When to Use Blue/Green

✅ Good for:

Critical services (payment, authentication)
Stateful applications with database replication
Zero-downtime requirements
Large, infrequent deployments

❌ Not ideal for:

Frequent small deployments (duplicating the environment each time is wasteful)
Stateless services (Canary is usually a better fit)
Limited infrastructure budget

Cost Consideration

Example: Running duplicate production environment

Cost: 2x infrastructure during deployment window
Typical window: 1-2 hours
Trade-off: Higher cost for lower risk
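To put a number on the trade-off, multiply the hourly cost of production by the length of the deployment window. The sketch below uses a hypothetical figure of $400/hour, which is an assumption for illustration and not a benchmark:

```python
def extra_deployment_cost(hourly_cost: float, window_hours: float) -> float:
    """Cost of running a second, idle copy of production for the window."""
    return hourly_cost * window_hours


# Hypothetical figure: production costs $400/hour to run.
HOURLY_COST = 400.0

for window in (1, 2):
    cost = extra_deployment_cost(HOURLY_COST, window)
    print(f"{window}h window -> ${cost:,.0f} extra")
```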

Canary Deployment

Concept

You roll out the new version to a small percentage of users (e.g., 5%) and monitor for errors. Gradually increase if metrics look good.
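One common way to pick "a small percentage of users" is to hash each user ID into a stable bucket, so the same user always sees the same version. The following is a minimal sketch under that assumption; the `bucket` and `route` helpers are hypothetical, and real load balancers or service meshes usually do this for you with weighted routing.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-9999."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10_000


def route(user_id: str, canary_percent: float) -> str:
    """Return 'canary' for roughly canary_percent of users, else 'stable'."""
    return "canary" if bucket(user_id) < canary_percent * 100 else "stable"


users = [f"user-{i}" for i in range(10_000)]
canary_share = sum(route(u, 5) == "canary" for u in users) / len(users)
print(f"canary share: {canary_share:.1%}")
```

Because the bucket is fixed per user, raising the percentage only adds users to the canary group; nobody who already saw the new version is sent back to the old one.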

Real-World Example: API Service

import { * } from 'sruja.ai/stdlib'


API = system "REST API" {
    APIv1 = container "API v1.2.3" {
        technology "Go"
        description "Current stable version"
    }
    APIv2 = container "API v1.2.4" {
        technology "Go"
        description "New version with performance improvements"
    }
}

deployment Production "Production Environment" {
    node Canary "Canary Cluster" {
        containerInstance APIv2 {
            replicas 2
            traffic 5
            description "5% of traffic, monitoring error rate"
            metadata {
                maxErrorRate "1%"
                rollbackTrigger "error_rate > 1% or latency_p95 > 500ms"
            }
        }
    }

    node Stable "Stable Cluster" {
        containerInstance APIv1 {
            replicas 38
            traffic 95
        }
    }
}

view index {
    include *
}

Gradual Rollout Strategy

Document the rollout plan in metadata:

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce Platform" {
    API = container "API Service" {
        metadata {
            deploymentStrategy "Canary"
            rolloutSteps "5% → 25% → 50% → 100%"
            stepDuration "15 minutes per step"
            monitoringWindow "15 minutes between steps"
            rollbackCriteria "error_rate > 1% OR latency_p95 > 500ms OR cpu > 90%"
        }
    }
}

view index {
    include *
}
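The rollbackCriteria string is documentation; a pipeline still needs to evaluate it. Here is a minimal sketch of that check in Python, assuming metrics arrive as percentages and milliseconds. The `Metrics` class and the sample CPU values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float      # percent of failed requests
    latency_p95_ms: float  # 95th percentile latency in milliseconds
    cpu: float             # CPU utilisation in percent


def should_rollback(m: Metrics) -> bool:
    """Mirror of: error_rate > 1% OR latency_p95 > 500ms OR cpu > 90%."""
    return m.error_rate > 1.0 or m.latency_p95_ms > 500 or m.cpu > 90


print(should_rollback(Metrics(error_rate=0.2, latency_p95_ms=180, cpu=55)))
print(should_rollback(Metrics(error_rate=0.2, latency_p95_ms=620, cpu=55)))
```

The first call represents a healthy canary; the second breaches only the latency threshold, which is enough to trigger a rollback because the criteria are joined with OR.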

Real-World Rollout Timeline

Example: Deploying new API version

10:00 AM - Deploy to Canary (5% traffic)
10:15 AM - Monitor: Error rate 0.2%, Latency p95: 180ms ✅
10:15 AM - Increase to 25% traffic
10:30 AM - Monitor: Error rate 0.3%, Latency p95: 195ms ✅
10:30 AM - Increase to 50% traffic
10:45 AM - Monitor: Error rate 0.4%, Latency p95: 210ms ✅
10:45 AM - Increase to 100% traffic
11:00 AM - Deployment complete
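The timeline above follows directly from the rollout steps and the 15-minute window. A short sketch that derives it, assuming no step fails its checks; the calendar date passed in is arbitrary and only the time of day matters:

```python
from datetime import datetime, timedelta

STEPS = [5, 25, 50, 100]  # traffic percentages from the rollout plan
STEP_DURATION = timedelta(minutes=15)


def rollout_schedule(start: datetime) -> list[tuple[str, str]]:
    """Return (time, action) pairs for a rollout with no rollbacks."""
    schedule = []
    current = start
    for percent in STEPS:
        schedule.append((current.strftime("%H:%M"), f"shift to {percent}% traffic"))
        current += STEP_DURATION  # monitoring window before the next step
    schedule.append((current.strftime("%H:%M"), "deployment complete"))
    return schedule


for time, action in rollout_schedule(datetime(2024, 1, 1, 10, 0)):
    print(time, action)
```

Four steps with a 15-minute window each give a one-hour rollout, which is the minimum duration; any failed check adds a rollback instead of the next step.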

When to Use Canary

✅ Good for:

Stateless services
Frequent deployments (multiple per day)
A/B testing new features
Performance-sensitive changes
Limited infrastructure budget

❌ Not ideal for:

Database schema changes (requires coordination)
Breaking API changes (incompatible versions)
Services with complex state

Rolling Deployment

Concept

Gradually replace old instances with new ones in small batches, so that most of the fleet keeps serving traffic throughout.

deployment Production "Production Environment" {
    node Cluster "Kubernetes Cluster" {
        containerInstance API {
            replicas 20
            strategy "rolling"
            maxUnavailable 1
            maxSurge 2
            description "Up to 2 extra pods during rollout, at most 1 unavailable"
        }
    }
}
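To see how maxUnavailable and maxSurge bound a rollout, it helps to simulate one. The sketch below is a simplified model and not Kubernetes' actual controller logic: it assumes every new pod becomes ready exactly one round after it is created, and the function name is hypothetical.

```python
def simulate_rolling_update(desired: int, max_unavailable: int, max_surge: int) -> list[dict]:
    """Simplified model of a rolling update; returns one snapshot per round."""
    if max_unavailable == 0 and max_surge == 0:
        raise ValueError("at least one of max_unavailable and max_surge must be > 0")
    old, new_ready = desired, 0
    history = []
    while old > 0 or new_ready < desired:
        # Surge: start new pods without exceeding desired + max_surge in total.
        to_create = min(desired + max_surge - (old + new_ready), desired - new_ready)
        # Scale down: remove old pods while keeping enough pods available.
        available = old + new_ready
        to_remove = min(old, available - (desired - max_unavailable))
        old -= to_remove
        history.append({
            "total": old + to_remove + new_ready + to_create,  # peak pod count this round
            "available": old + new_ready,  # ready pods before the new ones come up
        })
        new_ready += to_create  # new pods pass their readiness checks
    return history


history = simulate_rolling_update(desired=20, max_unavailable=1, max_surge=2)
print("rounds:", len(history))
print("peak pods:", max(h["total"] for h in history))
print("minimum available:", min(h["available"] for h in history))
```

With the values from the model, the pod count never exceeds 22 and availability never drops below 19, so the cluster needs only a small amount of spare capacity instead of a duplicate environment.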

When to Use Rolling

✅ Good for:

Kubernetes-native deployments
Stateless microservices
Cost-sensitive environments (no duplicate infrastructure)
Rollouts gated by health checks (a failing new pod halts the rollout before it spreads)

Feature Flags: Deployment Strategy Alternative

Sometimes the safest option is to separate releasing a feature from deploying the code. Feature flags let you do that:

import { * } from 'sruja.ai/stdlib'


Platform = system "Platform" {
    FeatureFlags = container "Feature Flag Service" {
        technology "LaunchDarkly, Split.io"
        description "Controls feature rollout without deployment"
    }

    API = container "API Service" {
        // Feature flags: newPaymentFlow (10% rollout), experimentalSearch (5% rollout)
    }
}

view index {
    include *
}

Use case: Deploy code with new feature disabled, then gradually enable via feature flags.
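In application code, a percentage rollout usually comes down to a single check around the new code path. The sketch below is a hand-rolled illustration, not the LaunchDarkly or Split.io SDK; the `FLAGS` table, `is_enabled`, and `checkout` are hypothetical, with the flag names and percentages taken from the model above.

```python
import hashlib

# Rollout percentages per flag; unknown flags default to 0% (off).
FLAGS = {"newPaymentFlow": 10, "experimentalSearch": 5}


def is_enabled(flag: str, user_id: str) -> bool:
    """Return True if the flag is on for this user."""
    percent = FLAGS.get(flag, 0)
    # Salting with the flag name gives each flag an independent user sample.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def checkout(user_id: str) -> str:
    # Both code paths are deployed; the flag decides which one runs.
    if is_enabled("newPaymentFlow", user_id):
        return "new payment flow"
    return "legacy payment flow"


enabled = sum(is_enabled("newPaymentFlow", f"user-{i}") for i in range(10_000))
print(f"newPaymentFlow enabled for {enabled / 10_000:.1%} of users")
```

Raising the percentage in `FLAGS` widens the rollout without touching the deployed binaries, and setting it to 0 acts as an instant kill switch.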

Monitoring During Deployment

Model your observability during deployments:

import { * } from 'sruja.ai/stdlib'


Observability = system "Observability Stack" {
    Prometheus = container "Metrics" {
        description "Tracks error rate, latency, throughput during deployment"
    }
    AlertManager = container "Alerting" {
        description "Alerts on deployment issues"
    }
}

// Link monitoring to deployment
deployment Production "Production Environment" {
    // Monitoring: error_rate, latency_p95, cpu_usage, request_rate
    // Alert thresholds: errorRate > 1%, latencyP95 > 500ms, cpuUsage > 90%
    // Rollback automation enabled
}
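The thresholds in the comments above depend on two derived metrics, error rate and p95 latency. In practice your metrics system computes these for you; the sketch below only shows what the numbers mean, using the nearest-rank percentile method and a hypothetical sample of 20 requests.

```python
import math


def error_rate(status_codes: list[int]) -> float:
    """Percentage of responses with a 5xx status code."""
    failures = sum(1 for code in status_codes if code >= 500)
    return 100 * failures / len(status_codes)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical sample of 20 requests observed during a deployment.
statuses = [200] * 19 + [503]
latencies_ms = [120.0] * 18 + [450.0, 900.0]

print(error_rate(statuses))          # 5.0
print(percentile(latencies_ms, 95))  # 450.0
```

In this sample the error rate of 5% breaches the 1% threshold while the p95 latency of 450 ms stays under 500 ms, so the deployment would be flagged on error rate alone.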

Real-World Case Study: Netflix Canary Deployment

Challenge: Deploy to 100M+ users without downtime

Solution:

Canary deployment to 1% of users
Automated analysis of 50+ metrics
Automatic rollback if any metric degrades
Gradual rollout over 6 hours

Result: 99.99% deployment success rate

Key Takeaways

Choose the right strategy: Blue/Green for critical services, Canary for frequent releases, Rolling for cost-sensitive deployments
Automate everything: Use CI/CD pipelines to automate deployment and rollback
Monitor aggressively: Track error rates, latency, and resource usage during deployment
Have a rollback plan: Always be ready to roll back within minutes
Document in Sruja: Model your deployment strategy so teams understand the process

Exercise: Design a Deployment Strategy

Scenario: You’re deploying a new checkout flow for an e-commerce platform. The system processes $1M/hour.

Tasks:

Choose a deployment strategy (Blue/Green, Canary, or Rolling)
Model it in Sruja with deployment nodes
Add monitoring and rollback criteria
Document the rollout timeline

Time: 15 minutes
