Lesson 1: Interview Question - Design a Video Streaming Platform
Master scaling questions by designing YouTube/Netflix-style systems.
The Interview Question
“Design a video streaming platform like YouTube or Netflix that can handle millions of concurrent viewers.”
This is a classic system design interview question asked at Google, Netflix, and other top companies. Let’s break it down step by step.
Step 1: Clarify Requirements (What Interviewers Want to Hear)
Before jumping into design, always clarify the requirements. You should ask:
- “What’s the scale? How many concurrent viewers?”
- “What’s the latency requirement? How fast should videos start?”
- “What types of videos? Short clips or full movies?”
- “Do we need to support live streaming or just on-demand?”
Interviewer’s typical answer:
- “Let’s say 10 million concurrent viewers”
- “Videos should start within 2 seconds”
- “Both short clips and full movies”
- “Focus on on-demand for now”
Step 2: Design the High-Level Architecture
Start with the core components:
- Client (mobile app, web browser)
- CDN (Content Delivery Network) - serves videos
- Origin Server - stores original videos
- API Server - handles metadata, user requests
- Database - stores video metadata, user data
Step 3: Model with Sruja
Let’s model this architecture:
import { * } from 'sruja.ai/stdlib'

Viewer = person "Video Viewer"

StreamingPlatform = system "Video Streaming Service" {
  CDN = container "Content Delivery Network" {
    technology "Cloudflare, AWS CloudFront"
    description "Serves videos from edge locations worldwide"
  }
  OriginServer = container "Origin Server" {
    technology "S3, GCS"
    description "Stores original video files"
  }
  VideoAPI = container "Video API" {
    technology "Go, gRPC"
    description "Handles video metadata, user requests"
  }
  TranscodingService = container "Video Transcoding" {
    technology "FFmpeg, Kubernetes"
    description "Converts videos to different formats/qualities"
  }
  VideoDB = database "Video Metadata Database" {
    technology "PostgreSQL"
  }
  UserDB = database "User Database" {
    technology "PostgreSQL"
  }
}

Viewer -> StreamingPlatform.CDN "Streams video"
StreamingPlatform.CDN -> StreamingPlatform.OriginServer "Fetches on cache miss"
Viewer -> StreamingPlatform.VideoAPI "Requests video info"
StreamingPlatform.VideoAPI -> StreamingPlatform.VideoDB "Queries metadata"
StreamingPlatform.VideoAPI -> StreamingPlatform.UserDB "Queries user data"
StreamingPlatform.OriginServer -> StreamingPlatform.TranscodingService "Processes videos"

view index {
  include *
}
Step 4: Address Scaling (The Key Part)
The interviewer will ask: “How does this handle 10 million concurrent viewers?”
This is where you show your scaling knowledge. Let’s add scaling configuration:
import { * } from 'sruja.ai/stdlib'

Viewer = person "Video Viewer"

StreamingPlatform = system "Video Streaming Service" {
  CDN = container "Content Delivery Network" {
    technology "Cloudflare, AWS CloudFront"
    // CDN scales automatically - no need to configure
    description "Serves videos from edge locations worldwide"
  }
  VideoAPI = container "Video API" {
    technology "Go, gRPC"
    // This is what interviewers want to see!
    scale {
      min 10
      max 1000
      metric "cpu > 75% or requests_per_second > 10000"
    }
    description "Handles video metadata, user requests"
  }
  TranscodingService = container "Video Transcoding" {
    technology "FFmpeg, Kubernetes"
    scale {
      min 5
      max 100
      metric "queue_length > 50"
    }
    description "Converts videos to different formats/qualities"
  }
  VideoDB = database "Video Metadata Database" {
    technology "PostgreSQL"
    // Database scaling: read replicas
    description "Primary database with 5 read replicas for scaling reads"
  }
}

view index {
  include *
}
What Interviewers Look For
✅ Good Answer (What You Just Did)
- Clarified requirements before designing
- Started with high-level architecture
- Modeled with Sruja to visualize
- Addressed scaling with specific numbers
- Explained trade-offs (CDN vs origin server)
❌ Bad Answer (Common Mistakes)
- Jumping straight to code/implementation details
- Not asking clarifying questions
- Designing for small scale only
- Not mentioning CDN or caching
- Ignoring database scaling
Key Points to Mention in Interview
1. CDN for Video Delivery
Say: “We use a CDN to serve videos from edge locations close to users. This reduces latency and offloads traffic from origin servers.”
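A quick back-of-envelope estimate can make the offloading claim concrete. The sketch below assumes an average stream bitrate of 5 Mbps and a 95% CDN cache hit ratio; both numbers are illustrative assumptions, not figures from the question, and the function names are hypothetical.

```python
def egress_tbps(concurrent_viewers: int, avg_bitrate_mbps: float) -> float:
    """Total video egress in terabits per second (1 Tbps = 1,000,000 Mbps)."""
    return concurrent_viewers * avg_bitrate_mbps / 1_000_000


def origin_tbps(total_tbps: float, cdn_hit_ratio: float) -> float:
    """Traffic that still reaches the origin after CDN cache hits are served at the edge."""
    return total_tbps * (1.0 - cdn_hit_ratio)


if __name__ == "__main__":
    # Assumed inputs: 10M viewers (from the question), 5 Mbps and 95% hit ratio (assumptions).
    total = egress_tbps(10_000_000, 5.0)
    origin = origin_tbps(total, 0.95)
    print(f"total egress: {total:.1f} Tbps, origin load: {origin:.1f} Tbps")
```

Under these assumptions the origin sees only a small fraction of total egress, which is the point to make when explaining why the CDN sits in front of the origin server.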
2. Horizontal Scaling for API
Say: “The API server scales horizontally from 10 to 1000 instances based on CPU and request rate. This handles traffic spikes during peak hours.”
3. Database Read Replicas
Say: “We use read replicas for the database to scale read operations. Writes go to primary, reads can go to any replica.”
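The read/write split described here can be sketched as a small router. This is a minimal illustration, assuming round-robin selection across replicas; the class name and the string endpoints are hypothetical stand-ins for real database connections.

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Sends writes to the primary and spreads reads across replicas (round-robin)."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # cycle() yields replicas in order, forever; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, is_write: bool) -> str:
        # Writes always go to the primary; reads fall back to it if there are no replicas.
        if is_write or self._replicas is None:
            return self.primary
        return next(self._replicas)


if __name__ == "__main__":
    router = ReplicaRouter("primary", ["replica-1", "replica-2"])
    print(router.route(is_write=True))
    print([router.route(is_write=False) for _ in range(3)])
```

With asynchronous replication a replica can briefly lag the primary, so a read that must see a just-written row would typically be routed to the primary instead.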
4. Caching Strategy
Say: “We cache frequently accessed video metadata in Redis to reduce database load.”
Interview Practice: Add Caching
The interviewer might ask: “How do you reduce database load?”
Add caching to your design:
import { * } from 'sruja.ai/stdlib'

StreamingPlatform = system "Video Streaming Service" {
  VideoAPI = container "Video API" {
    technology "Go, gRPC"
    scale {
      min 10
      max 1000
      metric "cpu > 75%"
    }
  }
  VideoDB = database "Video Metadata Database" {
    technology "PostgreSQL"
  }
  Cache = database "Video Metadata Cache" {
    technology "Redis"
    description "Caches frequently accessed video metadata"
  }
}

StreamingPlatform.VideoAPI -> StreamingPlatform.Cache "Reads metadata (cache hit)"
StreamingPlatform.VideoAPI -> StreamingPlatform.VideoDB "Reads metadata (cache miss)"
StreamingPlatform.VideoAPI -> StreamingPlatform.Cache "Writes to cache"

view index {
  include *
}
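The three relations in this model (read on hit, fall back to the database on miss, then write to the cache) describe what is commonly called a cache-aside read path. The sketch below illustrates that flow under simplifying assumptions: a plain dict stands in for Redis, a lookup function stands in for the PostgreSQL query, and the class name is hypothetical.

```python
from typing import Callable, Dict, Optional


class MetadataCache:
    """Cache-aside read path; an in-memory dict stands in for Redis."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]]):
        self._load_from_db = load_from_db
        self._cache: Dict[str, dict] = {}
        self.hits = 0
        self.misses = 0

    def get(self, video_id: str) -> Optional[dict]:
        cached = self._cache.get(video_id)
        if cached is not None:
            self.hits += 1           # cache hit: no database query
            return cached
        self.misses += 1             # cache miss: read from the database
        row = self._load_from_db(video_id)
        if row is not None:
            self._cache[video_id] = row  # write back so the next read is a hit
        return row


if __name__ == "__main__":
    fake_db = {"v1": {"title": "Intro"}}   # stand-in for the metadata table
    cache = MetadataCache(fake_db.get)
    cache.get("v1")
    cache.get("v1")
    print(cache.hits, cache.misses)
```

A real deployment would usually also set an expiry on cached entries and invalidate them when metadata changes; the sketch leaves both out to keep the read path visible.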
Understanding Scale Block Fields
min - Minimum Replicas
Interview tip: “We keep at least 10 instances running to handle baseline traffic and provide fault tolerance.”
max - Maximum Replicas
Interview tip: “We cap at 1000 instances to control costs. If we need more, we’d need to optimize the architecture first.”
metric - Scaling Trigger
Interview tip: “We scale based on CPU usage and request rate. When CPU exceeds 75% or requests exceed 10k/sec, we add more instances.”
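To show how the three fields interact, here is a minimal sketch of a scale-out decision. It is an illustration only, not how Sruja or any particular autoscaler evaluates a scale block: the `ScalePolicy` and `desired_replicas` names are hypothetical, the step size of one instance is an assumption, and the lesson does not say whether the request-rate threshold is per instance or aggregate, so the function simply compares whatever value it is given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    """Mirrors the min, max, and metric thresholds from the Video API scale block."""
    min_replicas: int = 10
    max_replicas: int = 1000
    cpu_threshold_pct: float = 75.0
    rps_threshold: float = 10_000.0


def desired_replicas(policy: ScalePolicy, current: int,
                     cpu_pct: float, rps: float, step: int = 1) -> int:
    """Add `step` replicas when either threshold is exceeded, then clamp to [min, max]."""
    target = current
    if cpu_pct > policy.cpu_threshold_pct or rps > policy.rps_threshold:
        target = current + step
    return max(policy.min_replicas, min(policy.max_replicas, target))


if __name__ == "__main__":
    policy = ScalePolicy()
    print(desired_replicas(policy, current=10, cpu_pct=80.0, rps=5_000.0))
    print(desired_replicas(policy, current=1000, cpu_pct=90.0, rps=20_000.0))
```

The sketch only scales out; scale-in and cooldown periods are omitted. The clamp is what enforces the floor for fault tolerance and the ceiling for cost control.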
Real Interview Example: Capacity Estimation
Interviewer: “How many API servers do you need for 10M concurrent users?”
Your answer:
- “Assume each user makes 1 request per minute = 10M requests/minute = ~167k requests/second”
- “Each API server handles ~1000 requests/second”
- “We need ~167 servers at peak”
- “With 2x headroom for spikes: ~334 servers, which we’d round up to ~350”
- “Our scale block allows 10-1000, so we’re covered”
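The arithmetic in this answer can be checked with a few lines of code. The function name is hypothetical, and the inputs (1 request per user per minute, ~1000 requests/second per server, 2x headroom) are the same assumptions stated in the answer.

```python
import math
from typing import Tuple


def api_servers_needed(concurrent_users: int,
                       requests_per_user_per_min: float,
                       rps_per_server: float,
                       headroom: float = 2.0) -> Tuple[float, int, int]:
    """Return (peak requests/second, servers at peak, servers with headroom)."""
    peak_rps = concurrent_users * requests_per_user_per_min / 60
    base = math.ceil(peak_rps / rps_per_server)
    return peak_rps, base, math.ceil(base * headroom)


if __name__ == "__main__":
    peak_rps, base, with_headroom = api_servers_needed(10_000_000, 1, 1000)
    print(f"{peak_rps:,.0f} req/s -> {base} servers, {with_headroom} with 2x headroom")
```

With these inputs the headroom figure comes out to 334, which rounds up to the ~350 quoted in the answer and sits well inside the 10-1000 range of the scale block.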
Exercise: Practice This Question
Design a video streaming platform and be ready to explain:
- Why you chose CDN
- How scaling works
- Database scaling strategy
- Caching approach
Practice tip: Time yourself (30-40 minutes) and explain out loud as if in an interview.
Common Follow-Up Questions
Be prepared for:
- “How do you handle video uploads?” (Add upload service, queue for processing)
- “What about live streaming?” (Add live streaming infrastructure)
- “How do you ensure availability?” (Add redundancy, health checks)
- “What’s the cost?” (Estimate based on scale)
Next Steps
In the next lesson, we’ll learn about SLOs (Service Level Objectives) - another common interview topic about defining performance targets.