WPP Software Engineer
System Design & Scalability
1. Real-Time Campaign Management System Architecture
Level: Senior Software Engineer
Difficulty: Hard
Source: Extrapolated from GroupM/Choreograph Technology Stack
Team: Platform Engineering, Ad Tech
Interview Round: System Design
Question: “Design a real-time campaign management system handling 100,000 requests/second during product launches. Track impressions, clicks, conversions across display, social, and search with near-real-time reporting.”
Concise Answer:
Architecture Overview:
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐
│ Load │ -> │ API Gateway │ -> │ Event Ingestion │
│ Balancer │ │ (Rate Limit) │ │ (Kafka/Kinesis) │
└─────────────┘ └──────────────┘ └─────────────────┘
│
┌─────────────────────────┼─────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌─────────────────┐ ┌──────────┐
│ Stream Proc │ │ Redis │ │ S3/Data │
│ (Flink/Spark) │ ----> │ (Real-time) │ │ Lake │
└───────────────┘ └─────────────────┘ └──────────┘
│ │
▼ ▼
┌───────────────┐ ┌─────────────────┐
│ Cassandra/ │ │ PostgreSQL │
│ DynamoDB │ <---- │ (Aggregated) │
│ (Events) │ │ │
└───────────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ Reporting API │
│ + Dashboard │
└─────────────────┘

Technology Stack:
Ingestion Layer:
// API Gateway with rate limiting
const rateLimit = require('express-rate-limit');

const apiLimiter = rateLimit({
  windowMs: 1000,        // 1-second window
  max: 100000,           // 100K requests per second
  standardHeaders: true
});

app.post('/api/events', apiLimiter, async (req, res) => {
  const event = req.body;
  // Async publish to Kafka (don't block the response)
  kafkaProducer.send({
    topic: 'campaign-events',
    messages: [{
      key: event.campaign_id,
      value: JSON.stringify(event)
    }]
  }).catch(err => logger.error('Kafka publish failed', err));
  // Immediate 202 response
  res.status(202).send();
});

Stream Processing:
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col, count, sum, when, from_json

spark = SparkSession.builder.appName("CampaignMetrics").getOrCreate()

# Read from Kafka
events = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "campaign-events") \
    .load()

# Real-time aggregation (5-second windows)
metrics = events \
    .selectExpr("CAST(value AS STRING)") \
    .select(from_json(col("value"), event_schema).alias("data")) \
    .groupBy(
        window(col("data.timestamp"), "5 seconds"),
        col("data.campaign_id")
    ) \
    .agg(
        count("*").alias("impressions"),
        sum("data.cost").alias("spend"),
        count(when(col("data.event_type") == "click", 1)).alias("clicks")
    )

# Write to Redis for the real-time dashboard
metrics.writeStream \
    .foreach(RedisWriter()) \
    .start()

Data Storage Strategy:
-- Hot Path: Redis (sub-second access)
Key: campaign:{campaign_id}:realtime
Value: {
  "impressions": 15234,
  "clicks": 456,
  "spend": 1234.56,
  "updated_at": "2025-01-15T10:30:00Z"
}
TTL: 3600 seconds
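The hot-path contract above can be sketched in-process. This is a stand-in for Redis hash commands (HINCRBY plus EXPIRE), not a real client; the `RealtimeCounters` class and its method names are illustrative only:

```python
# In-process stand-in for the Redis hot-path counters: one hash per campaign,
# a 1-hour TTL refreshed on every write (as EXPIRE would).
import time

TTL_SECONDS = 3600

class RealtimeCounters:
    def __init__(self):
        self._store = {}  # key -> (expires_at, counters dict)

    def incr(self, campaign_id, impressions=0, clicks=0, spend=0.0):
        key = f"campaign:{campaign_id}:realtime"
        now = time.time()
        expires_at, counters = self._store.get(key, (0.0, None))
        if counters is None or now >= expires_at:
            # Expired or missing: start a fresh counter set
            counters = {"impressions": 0, "clicks": 0, "spend": 0.0}
        counters["impressions"] += impressions
        counters["clicks"] += clicks
        counters["spend"] += spend
        self._store[key] = (now + TTL_SECONDS, counters)
        return counters

counters = RealtimeCounters()
counters.incr("789", impressions=1, spend=0.25)
snapshot = counters.incr("789", clicks=1)
# snapshot -> {"impressions": 1, "clicks": 1, "spend": 0.25}
```

With a real Redis client the same shape maps onto a pipeline of HINCRBY/HINCRBYFLOAT followed by EXPIRE.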
-- Warm Path: Cassandra (recent events, high write throughput)
CREATE TABLE campaign_events (
campaign_id UUID,
event_date DATE,
event_id TIMEUUID,
event_type TEXT,
user_id UUID,
cost DECIMAL,
PRIMARY KEY ((campaign_id, event_date), event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);
-- Cold Path: S3/Parquet (historical analysis)
s3://campaign-data-lake/year=2025/month=01/day=15/events.parquet

Scalability Approach:
- Horizontal Scaling: Stateless API services behind ALB, auto-scale based on CPU/requests
- Partitioning: Kafka partitions keyed by campaign_id for parallel processing
- Caching: Redis for real-time counters (5-second refresh)
- Database Sharding: Cassandra sharded by campaign_id
- Asynchronous Processing: Non-critical aggregations in batch jobs
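The partitioning bullet can be made concrete: keying every event by campaign_id means all of a campaign's events land on one partition, so per-campaign aggregation runs in parallel with no cross-partition coordination. A minimal sketch of key-based partition assignment (illustrative hashing, not Kafka's actual murmur2 partitioner; the partition count is an assumption):

```python
# Stable partition choice from the record key (campaign_id). Any deterministic
# hash works for the illustration; Kafka itself uses murmur2 on the key bytes.
import hashlib

NUM_PARTITIONS = 12  # assumed partition count for the campaign-events topic

def partition_for(campaign_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    digest = hashlib.md5(campaign_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event for the same campaign maps to the same partition:
p1 = partition_for("campaign-789")
p2 = partition_for("campaign-789")
assert p1 == p2
```

The same keying choice is what makes the Flink/Spark groupBy on campaign_id shuffle-free within a partition.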
Performance Optimizations:
// Batch writes to reduce database pressure
class EventBatcher {
  constructor(maxSize = 1000, maxWait = 1000) {
    this.batch = [];
    this.maxSize = maxSize;
    this.maxWait = maxWait;
    this.timer = null;
  }

  add(event) {
    this.batch.push(event);
    if (this.batch.length >= this.maxSize) {
      this.flush();
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.maxWait);
    }
  }

  async flush() {
    if (this.batch.length === 0) return;
    const toWrite = this.batch;
    this.batch = [];
    clearTimeout(this.timer);
    this.timer = null;
    await cassandra.batchWrite(toWrite);
  }
}

Trade-offs Addressed:
| Consideration | Choice | Rationale |
|---|---|---|
| Consistency | Eventual consistency | Real-time analytics can tolerate 5-10s delay |
| Storage | Tiered (Redis/Cassandra/S3) | Balance cost and performance |
| Processing | Stream + Batch hybrid | Real-time for dashboard, batch for complex reports |
| Database | Cassandra for events, PostgreSQL for aggregates | Write-heavy vs. read-heavy optimization |
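The trade-offs above imply a capacity budget. A back-of-envelope sizing check (event size and per-partition throughput are assumptions, not measurements):

```python
# Capacity sketch for the 100K req/s target: ingest bandwidth, raw storage
# per day, and a lower bound on Kafka partition count.
EVENTS_PER_SEC = 100_000
AVG_EVENT_BYTES = 500         # assumed serialized event size
PARTITION_MB_PER_SEC = 10     # assumed sustainable write rate per partition

ingest_mb_per_sec = EVENTS_PER_SEC * AVG_EVENT_BYTES / 1_000_000  # MB/s
raw_tb_per_day = ingest_mb_per_sec * 86_400 / 1_000_000           # TB/day
min_partitions = -(-ingest_mb_per_sec // PARTITION_MB_PER_SEC)    # ceil

print(ingest_mb_per_sec, round(raw_tb_per_day, 2), int(min_partitions))
# → 50.0 4.32 5
```

~50 MB/s and ~4.3 TB/day of raw events is why the tiered Redis/Cassandra/S3 storage (and far more than 5 partitions, for headroom and consumer parallelism) is the sensible default.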
Monitoring:
// Distributed tracing
const { trace, SpanStatusCode } = require('@opentelemetry/api');

app.post('/api/events', async (req, res) => {
  const span = trace.getTracer('api').startSpan('process_event');
  try {
    await processEvent(req.body);
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (error) {
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR });
  } finally {
    span.end();
  }
});

// Metrics
metrics.histogram('api.latency', Date.now() - startTime);
metrics.increment('events.processed');

Expected Outcomes:
- Throughput: 100K+ requests/second with auto-scaling
- Latency: <100ms API response, <5s dashboard updates
- Availability: 99.95% uptime with multi-AZ deployment
- Cost: ~$15K/month at scale (reserved instances + spot for batch)
Algorithms & Data Structures
2. Ad Frequency Capping at Scale
Level: Mid-Senior Software Engineer
Difficulty: Moderate-Hard
Source: Digital Advertising Best Practices
Team: Ad Tech, Platform Engineering
Interview Round: Technical Coding
Question: “Implement frequency capping ensuring users see the same ad ≤N times/day across devices. Handle 50M active users. Optimize for speed and memory.”
Concise Answer:
Approach 1: Exact Counting (Small Scale)
from collections import defaultdict
from datetime import datetime, timedelta

class FrequencyCapper:
    def __init__(self, max_impressions_per_day):
        self.max_impressions = max_impressions_per_day
        self.user_impressions = defaultdict(list)  # {user_ad: [timestamps]}

    def can_show_ad(self, user_id, ad_id):
        key = f"{user_id}:{ad_id}"
        now = datetime.now()
        cutoff = now - timedelta(days=1)
        # Drop impressions older than 24 hours
        self.user_impressions[key] = [
            ts for ts in self.user_impressions[key] if ts > cutoff
        ]
        return len(self.user_impressions[key]) < self.max_impressions

    def record_impression(self, user_id, ad_id):
        key = f"{user_id}:{ad_id}"
        self.user_impressions[key].append(datetime.now())

# Complexity: O(k) time, O(n*k) space where k = impressions/user
# Problem: ~10GB+ memory for 50M users

Approach 2: Probabilistic (Production Scale)
import mmh3
import numpy as np
from datetime import datetime

class ScalableFrequencyCapper:
    """Count-Min Sketch for memory-efficient counting"""

    def __init__(self, max_impressions, width=100000, depth=5):
        self.max_impressions = max_impressions
        self.width = width
        self.depth = depth
        self.counts = np.zeros((depth, width), dtype=np.int16)
        self.last_reset = datetime.now().date()

    def _check_daily_reset(self):
        today = datetime.now().date()
        if today > self.last_reset:
            self.counts.fill(0)
            self.last_reset = today

    def _hash(self, user_id, ad_id, seed):
        key = f"{user_id}:{ad_id}"
        return mmh3.hash(key, seed) % self.width

    def can_show_ad(self, user_id, ad_id):
        self._check_daily_reset()
        # Take the minimum count across hash functions
        min_count = min(
            self.counts[i][self._hash(user_id, ad_id, i)]
            for i in range(self.depth)
        )
        return min_count < self.max_impressions

    def record_impression(self, user_id, ad_id):
        for i in range(self.depth):
            idx = self._hash(user_id, ad_id, i)
            self.counts[i][idx] += 1

# Complexity: O(d) time where d = depth (constant, ~5)
# Space: O(width * depth) = ~1MB for width 100K, depth 5
# Handles 50M+ users with minimal memory

Approach 3: Distributed (Redis)
import redis
from datetime import datetime, timedelta

class DistributedFrequencyCapper:
    def __init__(self, redis_client, max_impressions):
        self.redis = redis_client
        self.max_impressions = max_impressions

    def can_show_ad(self, user_id, ad_id):
        key = f"freq:{user_id}:{ad_id}"
        count = self.redis.get(key)
        if count is None:
            return True
        return int(count) < self.max_impressions

    def record_impression(self, user_id, ad_id):
        key = f"freq:{user_id}:{ad_id}"
        pipe = self.redis.pipeline()
        # Atomic increment
        pipe.incr(key)
        # Expire at end of day, so the daily cap resets at midnight
        seconds_until_midnight = int((
            datetime.combine(datetime.now().date() + timedelta(days=1),
                             datetime.min.time()) - datetime.now()
        ).total_seconds())
        pipe.expire(key, seconds_until_midnight)
        pipe.execute()
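The Redis approach above is a fixed daily window that resets at midnight, so a user could in principle see 2N impressions straddling the reset. A sliding 24-hour window avoids that; in Redis it maps to a sorted set of timestamps (ZADD plus ZREMRANGEBYSCORE), sketched here in-process for illustration (the class and its names are hypothetical):

```python
# Sliding-window cap: keep per-(user, ad) timestamps, prune anything older
# than 24 hours before answering. The `now` parameter exists for testability.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 86_400  # sliding 24-hour window

class SlidingWindowCapper:
    def __init__(self, max_impressions):
        self.max_impressions = max_impressions
        self.events = defaultdict(deque)  # key -> timestamps, oldest first

    def _prune(self, key, now):
        q = self.events[key]
        while q and q[0] <= now - WINDOW_SECONDS:
            q.popleft()

    def can_show_ad(self, user_id, ad_id, now=None):
        now = time.time() if now is None else now
        key = f"{user_id}:{ad_id}"
        self._prune(key, now)
        return len(self.events[key]) < self.max_impressions

    def record_impression(self, user_id, ad_id, now=None):
        now = time.time() if now is None else now
        self.events[f"{user_id}:{ad_id}"].append(now)

capper = SlidingWindowCapper(max_impressions=2)
capper.record_impression("u1", "ad1", now=0)
capper.record_impression("u1", "ad1", now=10)
assert capper.can_show_ad("u1", "ad1", now=20) is False
assert capper.can_show_ad("u1", "ad1", now=WINDOW_SECONDS + 5) is True
```

The trade-off versus the plain counter is memory (one timestamp per impression instead of one integer per key), which is why fixed windows are often accepted in practice.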
# Scales horizontally with Redis Cluster
# Trade-off: network latency vs. memory efficiency

Comparison:
| Approach | Memory (50M users) | Latency | Accuracy | Best For |
|---|---|---|---|---|
| Exact | ~10GB | O(k) | 100% | Small scale |
| Count-Min Sketch | ~1MB | O(1) | 95-99% | High memory constraints |
| Redis | Distributed | <1ms | 100% | Production (horizontal scale) |
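The Count-Min Sketch row's accuracy figure follows from the standard sizing formulas (width = ⌈e/ε⌉, depth = ⌈ln(1/δ)⌉, giving overestimates of at most ε·N with probability ≥ 1 − δ). A quick check that the 100K × 5 table above is in the right ballpark:

```python
# Size a Count-Min Sketch from target error bounds. epsilon bounds the
# overcount as a fraction of total insertions N; delta bounds the failure
# probability of that guarantee.
import math

def cms_dimensions(epsilon, delta):
    width = math.ceil(math.e / epsilon)       # counters per row
    depth = math.ceil(math.log(1 / delta))    # independent hash rows
    return width, depth

# Tolerate overcounting by 0.003% of the daily event total, 99% confidence:
width, depth = cms_dimensions(epsilon=0.00003, delta=0.01)
# width = 90610, depth = 5 -> close to the 100K x 5 int16 table used above
```

Note the one-sided error: a CMS only ever overcounts, which matches the product requirement that over-capping is acceptable but under-capping is not.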
Production Implementation:
class HybridFrequencyCapper:
"""Combine local cache + distributed store""" def __init__(self, redis_client, max_impressions):
self.redis = redis_client
self.local_cache = {} # LRU cache self.cache_size = 10000 self.max_impressions = max_impressions
def can_show_ad(self, user_id, ad_id):
key = f"{user_id}:{ad_id}" # Check local cache first if key in self.local_cache:
return self.local_cache[key] < self.max_impressions
# Fallback to Redis count = self.redis.get(f"freq:{key}")
count = int(count) if count else 0 # Update local cache if len(self.local_cache) >= self.cache_size:
self.local_cache.popitem() # Remove oldest self.local_cache[key] = count
return count < self.max_impressions
def record_impression(self, user_id, ad_id):
key = f"{user_id}:{ad_id}" # Update both cache and Redis self.local_cache[key] = self.local_cache.get(key, 0) + 1 self.redis.incr(f"freq:{key}")
# Reduces Redis calls by 80%+ with local cachingCross-Device Tracking:
# Probabilistic user matching
def get_unified_user_id(user_identifiers):
    """Combine device IDs, cookie IDs, and login IDs."""
    deterministic_ids = [
        id for id in user_identifiers
        if id['type'] in ['email_hash', 'login_id']
    ]
    if deterministic_ids:
        return deterministic_ids[0]['value']
    # Probabilistic matching via the device graph
    return device_graph.match(user_identifiers)

Expected Outcomes:
- Memory: <5MB for Count-Min Sketch, distributed with Redis
- Latency: <1ms lookup, <2ms update
- Accuracy: 99%+ (over-capping acceptable, under-capping not)
- Scalability: Linear with Redis Cluster sharding
API Design & Backend Development
3. Campaign Management RESTful API
Level: Mid-Senior Software Engineer
Difficulty: Moderate
Source: Marketing Platform Best Practices
Team: Platform Engineering, Backend
Interview Round: Technical Design
Question: “Design a RESTful API for campaign management supporting CRUD operations, asset uploads, targeting, scheduling, and performance reports. Define endpoints, auth, rate limiting, and versioning.”
Concise Answer:
Core Endpoints:
Authentication:
POST /api/v1/auth/login
POST /api/v1/auth/refresh
POST /api/v1/auth/logout
Campaigns:
GET /api/v1/campaigns # List (paginated, filtered)
POST /api/v1/campaigns # Create
GET /api/v1/campaigns/{id} # Get details
PUT /api/v1/campaigns/{id} # Full update
PATCH /api/v1/campaigns/{id} # Partial update
DELETE /api/v1/campaigns/{id} # Delete
PATCH /api/v1/campaigns/{id}/status # Activate/pause
Assets:
POST /api/v1/campaigns/{id}/assets # Upload (multipart/form-data)
GET /api/v1/campaigns/{id}/assets
DELETE /api/v1/campaigns/{id}/assets/{assetId}
Targeting:
PUT /api/v1/campaigns/{id}/targeting
GET /api/v1/campaigns/{id}/targeting
Reports:
GET /api/v1/campaigns/{id}/reports?start_date=...&end_date=...
GET /api/v1/campaigns/{id}/reports/export?format=csv

Request/Response Format:
// POST /api/v1/campaigns
{
  "name": "Summer Sale 2025",
  "brand_id": "brand-123",
  "budget": {
    "total": 50000,
    "currency": "USD",
    "daily_cap": 2000
  },
  "schedule": {
    "start_date": "2025-06-01T00:00:00Z",
    "end_date": "2025-08-31T23:59:59Z"
  },
  "objectives": ["awareness", "conversions"]
}

// Response: 201 Created
{
  "id": "campaign-789",
  "name": "Summer Sale 2025",
  "status": "draft",
  "created_at": "2025-05-15T10:30:00Z",
  "created_by": "user-456",
  "budget": { ... },
  "_links": {
    "self": "/api/v1/campaigns/campaign-789",
    "assets": "/api/v1/campaigns/campaign-789/assets",
    "reports": "/api/v1/campaigns/campaign-789/reports"
  }
}

Pagination & Filtering:
// GET /api/v1/campaigns?page=2&limit=50&status=active&sort=-created_at
app.get('/api/v1/campaigns', authenticate, async (req, res) => {
  const {
    page = 1,
    limit = 50,
    status,
    brand_id,
    sort = '-created_at'
  } = req.query;

  const query = { tenant_id: req.user.tenant_id };
  if (status) query.status = status;
  if (brand_id) query.brand_id = brand_id;

  const sortField = sort.startsWith('-') ? sort.slice(1) : sort;
  const sortOrder = sort.startsWith('-') ? -1 : 1;

  const [campaigns, total] = await Promise.all([
    Campaign.find(query)
      .sort({ [sortField]: sortOrder })
      .skip((page - 1) * limit)
      .limit(limit)
      .lean(),
    Campaign.countDocuments(query)
  ]);

  res.json({
    data: campaigns,
    pagination: {
      page: parseInt(page),
      limit: parseInt(limit),
      total_pages: Math.ceil(total / limit),
      total_count: total
    },
    _links: {
      next: page * limit < total ? `/api/v1/campaigns?page=${parseInt(page) + 1}&limit=${limit}` : null,
      prev: page > 1 ? `/api/v1/campaigns?page=${parseInt(page) - 1}&limit=${limit}` : null
    }
  });
});

Authentication (JWT):
const jwt = require('jsonwebtoken');
const bcrypt = require('bcrypt');

// Login
app.post('/api/v1/auth/login', async (req, res) => {
  const { email, password } = req.body;
  const user = await User.findOne({ email });
  if (!user || !await bcrypt.compare(password, user.password_hash)) {
    return res.status(401).json({ error: 'Invalid credentials' });
  }
  const accessToken = jwt.sign(
    { user_id: user.id, tenant_id: user.tenant_id, role: user.role },
    process.env.JWT_SECRET,
    { expiresIn: '15m' }
  );
  const refreshToken = jwt.sign(
    { user_id: user.id },
    process.env.JWT_REFRESH_SECRET,
    { expiresIn: '7d' }
  );
  await RefreshToken.create({ token: refreshToken, user_id: user.id });
  res.json({ access_token: accessToken, refresh_token: refreshToken });
});

// Middleware
function authenticate(req, res, next) {
  const token = req.headers.authorization?.replace('Bearer ', '');
  if (!token) return res.status(401).json({ error: 'No token' });
  try {
    req.user = jwt.verify(token, process.env.JWT_SECRET);
    next();
  } catch (error) {
    res.status(401).json({ error: 'Invalid token' });
  }
}

Rate Limiting:
const rateLimit = require('express-rate-limit');

const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 1000,                 // 1000 requests per window
  standardHeaders: true,
  keyGenerator: (req) => req.user?.user_id || req.ip,
  handler: (req, res) => {
    res.status(429).json({
      error: 'Rate limit exceeded',
      retry_after: Math.ceil(req.rateLimit.resetTime / 1000)
    });
  }
});

app.use('/api', authenticate, apiLimiter);

Multi-Tenancy:
// Middleware for tenant isolation
function enforceTenancy(req, res, next) {
  req.tenantId = req.user.tenant_id;
  next();
}

// All queries auto-filter by tenant
async function getCampaign(req, res) {
  const campaign = await Campaign.findOne({
    _id: req.params.id,
    tenant_id: req.tenantId  // Automatic isolation
  });
  if (!campaign) {
    return res.status(404).json({ error: 'Campaign not found' });
  }
  res.json(campaign);
}

Error Handling:
// 400 Bad Request
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Request validation failed",
    "errors": [
      { "field": "budget.total", "message": "Must be a positive number" },
      { "field": "schedule.start_date", "message": "Must be a future date" }
    ]
  }
}

// Validation middleware
const { body, validationResult } = require('express-validator');

app.post('/api/v1/campaigns',
  authenticate,
  body('name').notEmpty().trim(),
  body('budget.total').isFloat({ gt: 0 }),
  body('schedule.start_date').isISO8601(),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) {
      return res.status(400).json({ error: { code: 'VALIDATION_ERROR', errors: errors.array() } });
    }
    // ... create campaign
  }
);

API Versioning:
// URL-based versioning
app.use('/api/v1', routerV1);
app.use('/api/v2', routerV2);

// Deprecation headers
app.use('/api/v1', (req, res, next) => {
  res.set('Sunset', 'Sat, 31 Dec 2025 23:59:59 GMT');
  res.set('Link', '</api/v2>; rel="successor-version"');
  next();
});

Expected Outcomes:
- Consistency: RESTful conventions, predictable responses
- Security: JWT auth, rate limiting, tenant isolation
- Performance: Pagination, caching headers, efficient queries
- Developer Experience: Clear errors, HATEOAS links, OpenAPI docs
Frontend Development
4. Asset Management Dashboard Performance
Level: Frontend Engineer
Difficulty: Moderate-Hard
Source: Hogarth Digital Asset Management
Team: Creative Technology, Frontend
Interview Round: Technical Coding
Question: “Optimize a React dashboard displaying 10,000+ image thumbnails. Users experience slow load times. How do you fix performance?”
Concise Answer:
Problem Diagnosis:
- Initial State: 10,000 DOM nodes, 5s load time, 500MB memory
- Root Causes: Rendering all items, loading full images, no virtualization
Solution 1: Virtualization (react-window)
import { FixedSizeGrid } from 'react-window';
import AutoSizer from 'react-virtualized-auto-sizer';

function AssetGrid({ assets }) {
  const COLUMN_COUNT = 5;
  const COLUMN_WIDTH = 200;
  const ROW_HEIGHT = 200;

  const Cell = ({ columnIndex, rowIndex, style }) => {
    const index = rowIndex * COLUMN_COUNT + columnIndex;
    if (index >= assets.length) return null;
    return (
      <div style={style}>
        <AssetCard asset={assets[index]} />
      </div>
    );
  };

  return (
    <AutoSizer>
      {({ height, width }) => (
        <FixedSizeGrid
          columnCount={COLUMN_COUNT}
          columnWidth={COLUMN_WIDTH}
          height={height}
          rowCount={Math.ceil(assets.length / COLUMN_COUNT)}
          rowHeight={ROW_HEIGHT}
          width={width}
        >
          {Cell}
        </FixedSizeGrid>
      )}
    </AutoSizer>
  );
}

// Only renders ~30 visible items instead of 10,000

Solution 2: Image Optimization
function AssetCard({ asset }) {
  return (
    <img
      src={asset.thumbnail_url}  // Serve 150x150px, not the 4K original
      srcSet={`${asset.thumbnail_small} 150w, ${asset.thumbnail_medium} 300w`}
      sizes="(max-width: 768px) 150px, 200px"
      loading="lazy"             // Native lazy loading
      alt={asset.name}
      onError={(e) => e.target.src = '/fallback-thumbnail.png'}
    />
  );
}

// Backend: generate thumbnails on upload
async function processUpload(file) {
  const original = await uploadToS3(file);
  // Generate WebP thumbnails
  const thumbnail = await sharp(file.buffer)
    .resize(150, 150)
    .webp({ quality: 80 })
    .toBuffer();
  const thumbnailUrl = await uploadToS3(thumbnail, 'thumbnail_');
  return { original_url: original, thumbnail_url: thumbnailUrl };
}

Solution 3: Data Fetching Optimization
import { useInfiniteQuery } from '@tanstack/react-query';

function useAssets() {
  return useInfiniteQuery({
    queryKey: ['assets'],
    queryFn: ({ pageParam = 0 }) =>
      fetch(`/api/assets?offset=${pageParam}&limit=100`).then(r => r.json()),
    getNextPageParam: (lastPage) => lastPage.next_offset,
    staleTime: 5 * 60 * 1000,  // Cache for 5 minutes
  });
}

function AssetManager() {
  const { data, fetchNextPage, hasNextPage, isFetchingNextPage } = useAssets();
  const assets = data?.pages.flatMap(page => page.assets) || [];
  return (
    <InfiniteLoader
      isItemLoaded={index => index < assets.length}
      loadMoreItems={fetchNextPage}
      itemCount={hasNextPage ? assets.length + 1 : assets.length}
    >
      {({ onItemsRendered, ref }) => (
        <AssetGrid assets={assets} onItemsRendered={onItemsRendered} ref={ref} />
      )}
    </InfiniteLoader>
  );
}

Solution 4: Component Optimization
// Memoize expensive components
const AssetCard = React.memo(({ asset }) => {
  return (
    <div className="asset-card">
      <img src={asset.thumbnail_url} alt={asset.name} />
      <p>{asset.name}</p>
    </div>
  );
}, (prev, next) => prev.asset.id === next.asset.id);

// useMemo for expensive computations (copy before sorting - sort mutates)
const sortedAssets = useMemo(() => {
  return [...assets].sort((a, b) => b.created_at - a.created_at);
}, [assets]);

// useCallback for stable callbacks
const handleAssetClick = useCallback((assetId) => {
  navigate(`/assets/${assetId}`);
}, [navigate]);

Solution 5: Code Splitting
import { lazy, Suspense } from 'react';

// Lazy-load the asset detail view
const AssetDetail = lazy(() => import('./AssetDetail'));

function App() {
  return (
    <Suspense fallback={<Spinner />}>
      <AssetDetail />
    </Suspense>
  );
}

Performance Monitoring:
// Web Vitals tracking
import { onCLS, onFID, onLCP } from 'web-vitals';

onLCP(metric => analytics.track('LCP', metric.value));
onFID(metric => analytics.track('FID', metric.value));
onCLS(metric => analytics.track('CLS', metric.value));

// Performance budget in CI
// lighthouse-ci.json
{
  "ci": {
    "assert": {
      "assertions": {
        "first-contentful-paint": ["error", { "maxNumericValue": 2000 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "total-blocking-time": ["error", { "maxNumericValue": 300 }]
      }
    }
  }
}

Expected Outcomes:
- Load Time: 5s → 0.8s (84% improvement)
- Memory: 500MB → 50MB (90% reduction)
- DOM Nodes: 10,000 → 30 (99.7% reduction)
- FPS: Smooth 60fps scrolling
Database Optimization
5. Campaign Reporting Query Performance
Level: Backend Engineer, Data Engineer
Difficulty: Hard
Source: Marketing Analytics Platforms
Team: Data Platform, Backend
Interview Round: Technical Deep Dive
Question: “This query takes 45 seconds on 50M campaign rows. Optimize it.”
SELECT
  c.campaign_name,
  COUNT(i.id) AS impressions,
  SUM(i.cost) AS spend,
  COUNT(DISTINCT i.user_id) AS unique_users
FROM campaigns c
LEFT JOIN impressions i ON c.id = i.campaign_id
WHERE c.status = 'active'
  AND i.impression_date BETWEEN '2025-06-01' AND '2025-06-30'
GROUP BY c.id, c.campaign_name
ORDER BY spend DESC
LIMIT 100;

Concise Answer:
Step 1: Analyze Execution Plan
EXPLAIN ANALYZE [query];
-- Likely issues:
-- 1. Sequential scan on campaigns (no index on status)
-- 2. Sequential scan on impressions (no index on date/campaign_id)
-- 3. Full table JOIN before filtering

Step 2: Add Indexes
-- Partial index for active campaigns
CREATE INDEX idx_campaigns_active ON campaigns(status, start_date)
WHERE status = 'active';

-- Composite index for impressions
CREATE INDEX idx_impressions_campaign_date ON impressions(campaign_id, impression_date)
INCLUDE (cost, user_id);

-- Or a covering index
CREATE INDEX idx_impressions_covering ON impressions(
  campaign_id, impression_date, id, cost, user_id
) WHERE impression_date >= '2025-01-01';

Step 3: Rewrite Query
-- Optimized version with CTEs
WITH active_campaigns AS (
  SELECT id, campaign_name
  FROM campaigns
  WHERE status = 'active'
    AND start_date >= '2025-01-01'
),
impression_stats AS (
SELECT
campaign_id,
COUNT(*) as impression_count,
SUM(cost) as total_spend,
COUNT(DISTINCT user_id) as unique_users
FROM impressions
  WHERE impression_date BETWEEN '2025-06-01' AND '2025-06-30'
    AND campaign_id IN (SELECT id FROM active_campaigns)
GROUP BY campaign_id
)
SELECT
ac.campaign_name,
COALESCE(ist.impression_count, 0) as impressions,
COALESCE(ist.total_spend, 0) as spend,
COALESCE(ist.unique_users, 0) as unique_users
FROM active_campaigns ac
LEFT JOIN impression_stats ist ON ac.id = ist.campaign_id
ORDER BY spend DESC NULLS LAST
LIMIT 100;

Step 4: Materialized Views
-- Pre-aggregate daily stats
CREATE MATERIALIZED VIEW daily_campaign_stats AS
SELECT
campaign_id,
DATE(impression_date) as day,
COUNT(*) as impressions,
SUM(cost) as spend,
COUNT(DISTINCT user_id) as unique_users
FROM impressions
GROUP BY campaign_id, DATE(impression_date);
-- Refresh nightly
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_campaign_stats;

-- The query becomes trivial
SELECT
c.campaign_name,
SUM(dcs.impressions) as impressions,
SUM(dcs.spend) as spend,
SUM(dcs.unique_users) as unique_users
FROM campaigns c
JOIN daily_campaign_stats dcs ON c.id = dcs.campaign_id
WHERE c.status = 'active'
  AND dcs.day BETWEEN '2025-06-01' AND '2025-06-30'
GROUP BY c.id, c.campaign_name
ORDER BY spend DESC
LIMIT 100;
-- Now executes in <200ms
-- Caveat: SUM(unique_users) across days overcounts users active on multiple
-- days; keep a COUNT(DISTINCT) path (or HLL sketches) where exactness matters

Step 5: Partitioning
-- Partition impressions by month
CREATE TABLE impressions (
id BIGSERIAL,
campaign_id BIGINT,
impression_date DATE,
cost DECIMAL(10,2),
user_id BIGINT
) PARTITION BY RANGE (impression_date);
CREATE TABLE impressions_2025_06 PARTITION OF impressions
FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');
CREATE INDEX ON impressions_2025_06(campaign_id, impression_date);
-- Queries automatically scan only the relevant partition

Step 6: Application-Level Caching
const redis = require('redis');

async function getCampaignReport(startDate, endDate) {
  const cacheKey = `report:${startDate}:${endDate}`;
  // Check the cache
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);
  // Query the database
  const result = await db.query(optimizedQuery, [startDate, endDate]);
  // Cache for 5 minutes
  await redis.setex(cacheKey, 300, JSON.stringify(result));
  return result;
}

Performance Comparison:
| Optimization | Query Time | Improvement |
|---|---|---|
| Original | 45s | Baseline |
| + Indexes | 8s | 82% |
| + Query Rewrite | 2s | 96% |
| + Materialized View | 200ms | 99.6% |
| + Caching | <10ms | 99.98% |
Expected Outcomes:
- Query Time: 45s → 200ms (99.6% faster)
- Database Load: 80% reduction in CPU usage
- Scalability: Handles 10x data growth with same performance
- Cost: Lower RDS instance size saves $500/month
Microservices & Distributed Systems
6. Resilient Microservices Communication
Level: Senior Software Engineer
Difficulty: Hard
Source: Distributed Systems Best Practices
Team: Platform Engineering, DevOps
Interview Round: System Design
Question: “Service A (campaign management) calls Service B (targeting) and Service C (asset delivery). How do you handle failures when B or C are down? Design a resilient system.”
Concise Answer:
Resilience Patterns:
1. Circuit Breaker Pattern
class CircuitBreaker {
  constructor(service, options = {}) {
    this.service = service;
    this.failureThreshold = options.failureThreshold || 5;
    this.timeout = options.timeout || 3000;
    this.resetTimeout = options.resetTimeout || 60000;
    this.state = 'CLOSED';  // CLOSED, OPEN, HALF_OPEN
    this.failureCount = 0;
    this.nextAttempt = Date.now();
  }

  async call(method, ...args) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker OPEN');
      }
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await Promise.race([
        this.service[method](...args),
        this._timeoutAfter(this.timeout)
      ]);
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  _timeoutAfter(ms) {
    return new Promise((_, reject) =>
      setTimeout(() => reject(new Error('Timeout')), ms));
  }

  onSuccess() {
    this.failureCount = 0;
    if (this.state === 'HALF_OPEN') this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.resetTimeout;
    }
  }
}

// Usage
const targetingBreaker = new CircuitBreaker(targetingService, {
  failureThreshold: 5,
  timeout: 3000,
  resetTimeout: 30000
});

2. Retry with Exponential Backoff
async function retryWithBackoff(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      const delay = Math.min(1000 * Math.pow(2, i), 10000);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

3. Fallback Strategies
async function getCampaignData(campaignId) {
  try {
    const [targeting, assets] = await Promise.all([
      targetingBreaker.call('getAudience', campaignId),
      assetBreaker.call('getAssets', campaignId)
    ]);
    return { targeting, assets, source: 'live' };
  } catch (error) {
    logger.warn('Live services failed', { error, campaignId });
    // Fallback 1: cached data
    const cached = await cache.get(`campaign:${campaignId}`);
    if (cached) return { ...cached, source: 'cache' };
    // Fallback 2: default/degraded data
    return {
      targeting: { audience: 'broad', segments: [] },
      assets: { creative_id: 'default' },
      source: 'default',
      degraded: true
    };
  }
}

4. Event-Driven Async Communication
// Don't wait for downstream processing - publish events
class CampaignService {
  async createCampaign(data) {
    // Create the campaign in the local DB
    const campaign = await db.campaigns.create(data);
    // Publish an event (fire and forget)
    await eventBus.publish('campaign.created', {
      campaign_id: campaign.id,
      timestamp: new Date()
    });
    return campaign;  // Return immediately
  }
}

// Subscribers process asynchronously
eventBus.subscribe('campaign.created', async (event) => {
  try {
    await targetingService.initializeAudience(event.campaign_id);
  } catch (error) {
    // Dead-letter queue for retries
    await dlq.enqueue('campaign.created', event);
  }
});

5. Health Checks & Monitoring
// Service health endpoint
app.get('/health', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    targeting_service: await checkService('targeting'),
    asset_service: await checkService('assets')
  };
  const healthy = Object.values(checks).every(c => c.status === 'ok');
  res.status(healthy ? 200 : 503).json(checks);
});

// Distributed tracing
const { trace, SpanStatusCode } = require('@opentelemetry/api');

async function callService(serviceName, fn) {
  const span = trace.getTracer('api').startSpan(`call_${serviceName}`);
  try {
    const result = await fn();
    span.setStatus({ code: SpanStatusCode.OK });
    return result;
  } catch (error) {
    span.recordException(error);
    metrics.increment(`service.${serviceName}.error`);
    throw error;
  } finally {
    span.end();
  }
}

Architectural Recommendations:
# Kubernetes deployment with service mesh (Istio)
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: campaign-service
spec:
  hosts:
    - campaign-service
  http:
    - route:
        - destination:
            host: campaign-service
      retries:
        attempts: 3
        perTryTimeout: 2s
      timeout: 10s

Expected Outcomes:
- Availability: 99.9%+ despite downstream failures
- Latency: <100ms with circuit breaker (vs. 30s timeout)
- Error Recovery: Automatic retry with exponential backoff
- Observability: Full request tracing across services
DevOps & CI/CD
7. Zero-Downtime Deployment Pipeline
Level: Senior Software Engineer, DevOps Engineer
Difficulty: Moderate-Hard
Source: AWS Best Practices
Team: Platform Engineering, Infrastructure
Interview Round: System Design
Question: “Design a CI/CD pipeline for microservices on AWS with automated testing, security scanning, and zero-downtime deployments.”
Concise Answer:
Pipeline Architecture:
# .github/workflows/deploy.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install & Test
        run: |
          npm ci
          npm run lint
          npm run test:unit
          npm run test:integration
      - name: Code Coverage
        run: npm run coverage

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Security Scan
        run: |
          npm audit --audit-level=high
          docker build -t app:${{ github.sha }} .
          trivy image app:${{ github.sha }}

  build:
    needs: [test, security]
    runs-on: ubuntu-latest
    steps:
      - name: Build & Push to ECR
        run: |
          aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_REGISTRY
          docker build -t $ECR_REGISTRY/campaign-service:${{ github.sha }} .
          docker push $ECR_REGISTRY/campaign-service:${{ github.sha }}

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Blue-Green Deployment
        run: |
          # Deploy to the green environment
          aws ecs update-service \
            --cluster prod \
            --service campaign-green \
            --task-definition campaign:${{ github.sha }} \
            --force-new-deployment

          # Wait for green to be healthy
          aws ecs wait services-stable --cluster prod --services campaign-green

          # Run smoke tests
          ./scripts/smoke-test.sh https://green.api.com

          # Switch traffic (update the ALB listener)
          aws elbv2 modify-listener \
            --listener-arn $LISTENER_ARN \
            --default-actions Type=forward,TargetGroupArn=$GREEN_TG

          # Monitor for 10 minutes
          ./scripts/monitor.sh --duration=10m
      - name: Rollback on Failure
        if: failure()
        run: |
          aws elbv2 modify-listener \
            --listener-arn $LISTENER_ARN \
            --default-actions Type=forward,TargetGroupArn=$BLUE_TG

Infrastructure as Code (Terraform):
resource "aws_ecs_service" "campaign" {
name = "campaign-service"
cluster = aws_ecs_cluster.main.id
task_definition = aws_ecs_task_definition.campaign.arn
desired_count = 3
  deployment_maximum_percent         = 200  # Allow double capacity during deploy
  deployment_minimum_healthy_percent = 100  # Never drop below 100%
deployment_circuit_breaker {
enable = true
rollback = true # Auto-rollback on failure
}
load_balancer {
target_group_arn = aws_lb_target_group.campaign.arn
container_name = "campaign"
container_port = 3000
}
}

Deployment Strategies:
BLUE-GREEN DEPLOYMENT:
┌─────────────┐ ┌─────────────┐
│ Blue │ │ Green │
│ (Current) │ │ (New) │
└──────┬──────┘ └──────┬──────┘
│ │
▼ ▼
┌─────────────────────────────┐
│ Load Balancer │
│ Traffic: 100% Blue │
└─────────────────────────────┘
↓
After validation
↓
┌─────────────────────────────┐
│ Load Balancer │
│ Traffic: 100% Green │
└─────────────────────────────┘
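The traffic switch above is essentially a pointer update behind the load balancer. A minimal in-memory sketch of that state machine (the class and helper names here are illustrative; a real deployment drives the ALB through the AWS SDK):

```javascript
// Minimal blue-green switcher: validate the idle environment, then flip
// the active pointer; roll back by flipping it again.
class BlueGreenSwitcher {
  constructor() {
    this.active = 'blue';  // environment currently receiving traffic
    this.idle = 'green';   // environment staged with the new version
  }

  // healthCheck is an async fn that returns true when the idle env is ready
  async promote(healthCheck) {
    const healthy = await healthCheck(this.idle);
    if (!healthy) {
      throw new Error(`${this.idle} failed validation; traffic stays on ${this.active}`);
    }
    [this.active, this.idle] = [this.idle, this.active]; // single atomic flip
    return this.active;
  }

  rollback() {
    [this.active, this.idle] = [this.idle, this.active];
    return this.active;
  }
}

async function demo() {
  const lb = new BlueGreenSwitcher();
  await lb.promote(async () => true); // green validated, takes traffic
  console.log(lb.active); // "green"
  lb.rollback();          // bad metrics: flip back
  console.log(lb.active); // "blue"
}
demo();
```

Because the switch is a single pointer flip, rollback is as fast as the original cutover, which is what keeps the "<2 minutes" rollback target realistic.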
CANARY DEPLOYMENT:
5% → 25% → 50% → 100% traffic shift
Monitor metrics at each stage

Testing Strategy:
// Unit Tests (80%+ coverage)
describe('Campaign Service', () => {
  it('should create campaign', async () => {
    const campaign = await service.create({ name: 'Test' });
    expect(campaign.id).toBeDefined();
  });
});

// Integration Tests
describe('Campaign API', () => {
  it('should persist to database', async () => {
    const res = await request(app).post('/campaigns').send(data);
    expect(res.status).toBe(201);
    const dbRecord = await db.campaigns.findById(res.body.id);
    expect(dbRecord).toBeDefined();
  });
});

// Contract Tests (Pact)
describe('Targeting Service Contract', () => {
  it('should return audience data', () => {
    provider.addInteraction({
      uponReceiving: 'request for audience',
      withRequest: { method: 'GET', path: '/audience/123' },
      willRespondWith: { status: 200, body: { segments: [] } }
    });
  });
});

Monitoring & Rollback:
// Automated rollback on errors
class DeploymentMonitor {
  async monitor(duration = 600000) { // 10 minutes
    const startTime = Date.now();
    while (Date.now() - startTime < duration) {
      const metrics = await getMetrics();
      if (metrics.errorRate > 0.01) { // >1% error rate
        throw new Error('Error rate threshold exceeded');
      }
      if (metrics.p99Latency > 1000) { // >1s p99
        throw new Error('Latency threshold exceeded');
      }
      await sleep(30000); // Check every 30s
    }
  }
}

Expected Outcomes:
- Deployment Time: <10 minutes end-to-end
- Downtime: 0 seconds (blue-green deployment)
- Rollback Time: <2 minutes (switch traffic back)
- Failure Rate: <0.1% (automated rollback prevents incidents)
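The canary progression described above (5% → 25% → 50% → 100%, checking metrics at each stage) can be sketched as a loop that bails out on degradation. `shiftTraffic` and `getMetrics` are hypothetical stand-ins for ALB weight updates and CloudWatch queries:

```javascript
// Canary rollout: shift traffic in stages, check metrics after each stage,
// and abort (resetting to 0%) on any threshold breach.
async function canaryRollout({ shiftTraffic, getMetrics, stages = [5, 25, 50, 100] }) {
  for (const percent of stages) {
    await shiftTraffic(percent);
    const { errorRate, p99Latency } = await getMetrics();
    if (errorRate > 0.01 || p99Latency > 1000) {
      await shiftTraffic(0); // roll back: all traffic to the stable version
      return { status: 'rolled_back', failedAt: percent };
    }
  }
  return { status: 'complete' };
}

// Example run against mocked infrastructure
canaryRollout({
  shiftTraffic: async (p) => console.log(`canary at ${p}%`),
  getMetrics: async () => ({ errorRate: 0.002, p99Latency: 350 })
}).then(result => console.log(result.status)); // "complete"
```

Compared with blue-green, canary trades a slower rollout for a smaller blast radius: only a sliver of users sees a bad release before the automated rollback fires.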
Real-Time Systems
8. Campaign Alert Notification System
Level: Senior Software Engineer
Difficulty: Hard
Source: Event-Driven Architecture Patterns
Team: Platform Engineering
Interview Round: System Design
Question: “Design a notification system sending real-time alerts when campaigns reach thresholds (budget exhausted, performance targets met). Handle 1000+ clients with custom rules.”
Concise Answer:
Architecture:
Campaign Events → Kinesis → Lambda → Rule Engine → SQS → Notification Workers
↓
DynamoDB (Rules)

Rule Storage:
// DynamoDB Rule Schema
{
  rule_id: "rule-123",
  client_id: "client-456",
  campaign_id: "campaign-789",
  conditions: [
    { metric: "spend", operator: ">=", threshold: 10000 },
    { metric: "ctr", operator: "<", threshold: 1.0 }
  ],
  logic: "OR", // AND or OR
  channels: ["email", "webhook"],
  cooldown: 3600, // Don't re-trigger for 1 hour
  notification: {
    email: ["manager@client.com"],
    webhook_url: "https://client.com/webhook",
    template: "Campaign {{name}} has spent ${{spend}}"
  }
}

Event Processing:
// Lambda function processes campaign events
exports.handler = async (event) => {
  for (const record of event.Records) {
    const campaignEvent = JSON.parse(
      Buffer.from(record.kinesis.data, 'base64').toString()
    );

    // Fetch rules for this campaign
    const rules = await getRulesForCampaign(campaignEvent.campaign_id);

    for (const rule of rules) {
      if (evaluateRule(rule, campaignEvent.metrics)) {
        // Skip rules still inside their cooldown window
        if (await isCooledDown(rule.rule_id)) continue;

        // Queue notification
        await sqs.sendMessage({
          QueueUrl: NOTIFICATION_QUEUE,
          MessageBody: JSON.stringify({
            rule_id: rule.rule_id,
            campaign_id: campaignEvent.campaign_id,
            metrics: campaignEvent.metrics,
            channels: rule.channels, // needed by the delivery worker
            notification: rule.notification
          })
        });

        // Set cooldown
        await setCooldown(rule.rule_id, rule.cooldown);
      }
    }
  }
};

function evaluateRule(rule, metrics) {
  const results = rule.conditions.map(cond => {
    const value = metrics[cond.metric];
    switch (cond.operator) {
      case '>=': return value >= cond.threshold;
      case '<': return value < cond.threshold;
      case '==': return value === cond.threshold;
      default: return false;
    }
  });
  return rule.logic === 'AND'
    ? results.every(r => r)
    : results.some(r => r);
}

Notification Delivery:
// Worker processes notification queue
exports.handler = async (event) => {
  for (const record of event.Records) {
    const notification = JSON.parse(record.body);

    // Render message
    const message = renderTemplate(
      notification.notification.template,
      notification.metrics
    );

    // Send via multiple channels
    const promises = notification.channels.map(channel => {
      switch (channel) {
        case 'email':
          return ses.sendEmail({
            To: notification.notification.email,
            Subject: 'Campaign Alert',
            Body: message
          });
        case 'webhook':
          return axios.post(notification.notification.webhook_url, notification);
        case 'slack':
          return axios.post(SLACK_WEBHOOK, { text: message });
      }
    });

    await Promise.allSettled(promises); // Don't fail if one channel fails
  }
};

User Interface:
// React rule builder
function RuleBuilder({ campaignId }) {
  const [conditions, setConditions] = useState([{
    metric: 'spend',
    operator: '>=',
    threshold: 0
  }]);

  const saveRule = async () => {
    await api.post('/rules', {
      campaign_id: campaignId,
      conditions,
      logic: 'OR',
      channels: ['email'],
      notification: { email: [user.email] }
    });
  };

  return (
    <div>
      {conditions.map((cond, i) => (
        <div key={i}>
          <select value={cond.metric} onChange={e => updateCondition(i, 'metric', e.target.value)}>
            <option value="spend">Spend</option>
            <option value="ctr">CTR</option>
            <option value="conversions">Conversions</option>
          </select>
          <select value={cond.operator} onChange={e => updateCondition(i, 'operator', e.target.value)}>
            <option value=">=">≥</option>
            <option value="<">{'<'}</option>
          </select>
          <input
            type="number"
            value={cond.threshold}
            onChange={e => updateCondition(i, 'threshold', Number(e.target.value))}
          />
        </div>
      ))}
      <button onClick={saveRule}>Save Rule</button>
    </div>
  );
}

Expected Outcomes:
- Latency: <10s from event to notification delivery
- Throughput: 10,000+ events/second processed
- Scalability: Lambda auto-scales, SQS buffers bursts
- Reliability: Dead letter queue for failed notifications
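The delivery worker above calls a `renderTemplate` helper that is not shown. A minimal sketch of `{{placeholder}}` substitution (illustrative only; a production system would more likely use a templating library such as Handlebars):

```javascript
// Replace {{key}} placeholders with values from a data object,
// leaving unknown placeholders intact so missing data is visible.
function renderTemplate(template, data) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in data ? String(data[key]) : match
  );
}

const msg = renderTemplate(
  'Campaign {{name}} has spent ${{spend}}',
  { name: 'Spring Launch', spend: 10250 }
);
console.log(msg); // "Campaign Spring Launch has spent $10250"
```

Leaving unresolved placeholders in place (rather than substituting an empty string) makes template/data mismatches obvious in the delivered alert instead of silently dropping information.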
Security & Authentication
9. Multi-Tenant JWT Authentication
Level: Mid-Senior Software Engineer
Difficulty: Moderate
Source: Security Best Practices
Team: Backend Engineering
Interview Round: Technical Design
Question: “Implement JWT authentication for multi-tenant API. Each tenant accesses only their data. Include token refresh and rate limiting.”
Concise Answer:
JWT Implementation:
const jwt = require('jsonwebtoken');
const bcrypt = require('bcrypt');

// Login
app.post('/api/auth/login', async (req, res) => {
  const { email, password } = req.body;
  const user = await User.findOne({ email });
  if (!user || !await bcrypt.compare(password, user.password_hash)) {
    return res.status(401).json({ error: 'Invalid credentials' });
  }

  // Generate tokens
  const accessToken = jwt.sign(
    {
      user_id: user.id,
      tenant_id: user.tenant_id,
      role: user.role
    },
    process.env.JWT_SECRET,
    { expiresIn: '15m' }
  );
  const refreshToken = jwt.sign(
    { user_id: user.id },
    process.env.JWT_REFRESH_SECRET,
    { expiresIn: '7d' }
  );

  await RefreshToken.create({
    token: refreshToken,
    user_id: user.id,
    expires_at: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000)
  });

  res.json({
    access_token: accessToken,
    refresh_token: refreshToken,
    expires_in: 900
  });
});

// Refresh endpoint
app.post('/api/auth/refresh', async (req, res) => {
  const { refresh_token } = req.body;
  try {
    const decoded = jwt.verify(refresh_token, process.env.JWT_REFRESH_SECRET);
    const storedToken = await RefreshToken.findOne({
      token: refresh_token,
      user_id: decoded.user_id,
      revoked: false
    });
    if (!storedToken || storedToken.expires_at < new Date()) {
      return res.status(401).json({ error: 'Invalid refresh token' });
    }

    const user = await User.findById(decoded.user_id);
    const accessToken = jwt.sign(
      { user_id: user.id, tenant_id: user.tenant_id, role: user.role },
      process.env.JWT_SECRET,
      { expiresIn: '15m' }
    );
    res.json({ access_token: accessToken });
  } catch (error) {
    res.status(401).json({ error: 'Invalid refresh token' });
  }
});

Authentication Middleware:
function authenticate(req, res, next) {
  const token = req.headers.authorization?.replace('Bearer ', '');
  if (!token) return res.status(401).json({ error: 'No token' });
  try {
    req.user = jwt.verify(token, process.env.JWT_SECRET);
    next();
  } catch (error) {
    res.status(401).json({ error: 'Invalid or expired token' });
  }
}

Multi-Tenant Isolation:
// Automatic tenant filtering
function enforceTenancy(req, res, next) {
  req.tenantId = req.user.tenant_id;
  next();
}

app.use('/api', authenticate, enforceTenancy);

// All queries auto-filter by tenant
app.get('/api/campaigns', async (req, res) => {
  const campaigns = await Campaign.find({
    tenant_id: req.tenantId // Automatic isolation
  });
  res.json(campaigns);
});

// Prevent cross-tenant access
app.get('/api/campaigns/:id', async (req, res) => {
  const campaign = await Campaign.findOne({
    _id: req.params.id,
    tenant_id: req.tenantId
  });
  if (!campaign) {
    return res.status(404).json({ error: 'Not found' });
  }
  res.json(campaign);
});

Rate Limiting:
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');

const limiter = rateLimit({
  store: new RedisStore({ client: redisClient }),
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 1000,
  keyGenerator: (req) => req.user?.user_id || req.ip,
  handler: (req, res) => {
    res.status(429).json({
      error: 'Rate limit exceeded',
      retry_after: Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000) // seconds until window resets
    });
  }
});

app.use('/api', authenticate, limiter);

RBAC (Role-Based Access Control):
function authorize(...allowedRoles) {
  return (req, res, next) => {
    if (!allowedRoles.includes(req.user.role)) {
      return res.status(403).json({ error: 'Insufficient permissions' });
    }
    next();
  };
}

// Usage
app.delete('/api/campaigns/:id',
  authenticate,
  authorize('admin', 'campaign_manager'),
  deleteCampaign
);

Security Best Practices:
// Helmet.js for security headers
const helmet = require('helmet');
app.use(helmet());

// Input validation
const { body, validationResult } = require('express-validator');
app.post('/api/campaigns',
  authenticate,
  body('name').trim().isLength({ min: 1, max: 100 }),
  body('budget').isFloat({ gt: 0 }),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) {
      return res.status(400).json({ errors: errors.array() });
    }
    // ...
  }
);

// CORS configuration
const cors = require('cors');
app.use(cors({
  origin: process.env.ALLOWED_ORIGINS.split(','),
  credentials: true
}));

Expected Outcomes:
- Security: JWT tokens, tenant isolation, rate limiting
- User Experience: Token refresh for seamless sessions
- Scalability: Redis-backed rate limiting
- Compliance: RBAC for fine-grained permissions
Problem Solving & Debugging
10. Production Incident Response
Level: All Levels
Difficulty: Moderate (Behavioral)
Source: Standard Behavioral Question
Team: All Teams
Interview Round: Behavioral Assessment
Question: “Describe a time you debugged a critical production issue under pressure. Walk through your process from alert to resolution.”
Concise Answer (STAR Method):
Situation:
“In my previous role, I received a 2 AM alert that our campaign API response times had spiked from 200ms to 15+ seconds. This API serves real-time ad requests, so the degradation was causing roughly $5,000/hour in lost revenue.”
Task:
“As the on-call engineer, I needed to quickly diagnose and restore service within our 99.9% uptime SLA.”
Action:
1. Confirm & Scope (5 minutes)
- Verified alert in monitoring dashboards (DataDog)
- Confirmed: p99 latency at 18s, error rate 12%
- Affected all API endpoints
- No recent deployments (ruled out bad code)
2. Form Hypotheses
- Database slowdown (most common)
- External service degradation
- Memory leak causing GC pauses
- Network issues
3. Gather Data (10 minutes)
# Database health
SELECT * FROM pg_stat_activity WHERE state = 'active';
# Result: Normal query times, no locks, CPU 40%

# Application logs
tail -f /var/log/app.log | grep ERROR
# Result: Timeout errors from external targeting service

# Application metrics
curl localhost:9090/metrics | grep memory
# Result: Memory usage normal

4. Root Cause Identified
- External targeting service responding in 30+ seconds
- Our API waited synchronously (no timeout configured)
- Blocked threads caused request queue buildup
5. Immediate Mitigation (2 minutes)
// Deployed feature flag to disable external calls
config.targeting.enabled = false;

// Fallback to cached targeting data
const targeting = await cache.get(`targeting:${userId}`)
  || getDefaultTargeting();

Response times dropped to 300ms within 2 minutes.
6. Communication
- Posted status updates in Slack incident channel every 10 minutes
- Updated status page for external clients
- Notified manager once the incident extended beyond 30 minutes
7. Long-Term Fix (Next Day)
// Implemented circuit breaker
const targetingBreaker = new CircuitBreaker(targetingService, {
  failureThreshold: 3,
  timeout: 3000, // Fail fast after 3s
  fallback: getCachedTargeting
});

// Added monitoring
metrics.histogram('external.targeting.latency');
alerting.addRule('targeting.latency > 1000ms for 2m');

Result:
Immediate:
- Restored service in 15 minutes from alert
- Prevented estimated $1,250 revenue loss
- No customer data loss
Long-Term:
- Circuit breaker prevented 2 subsequent outages in following months
- Received VP Engineering recognition for incident response
Lessons Learned:
1. Always implement timeouts for external dependencies
2. Design for failure with fallbacks and graceful degradation
3. Clear communication reduces stakeholder anxiety
4. Blameless post-mortems drive continuous improvement
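The first lesson (always implement timeouts for external dependencies) can be applied with a small wrapper. This sketch uses `Promise.race`; note that racing alone does not cancel the in-flight request, so `AbortController` is the better tool when the underlying HTTP client supports it:

```javascript
// Wrap any promise with a hard deadline; rejects after `ms` milliseconds
// so callers fail fast instead of holding a connection open indefinitely.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Example: a targeting call that would take 5s now fails after 500ms
// (unref lets the process exit without waiting for the slow timer)
const slowCall = new Promise(resolve => setTimeout(resolve, 5000, 'data').unref());
withTimeout(slowCall, 500, 'targeting')
  .catch(err => console.log(err.message)); // "targeting timed out after 500ms"
```

Had the targeting call been wrapped this way, the 30-second upstream responses would have failed in 3 seconds instead of blocking the request queue.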
Post-Mortem Actions:
- Documented timeline and root cause
- Added circuit breakers to all external service calls
- Implemented synthetic monitoring for external dependencies
- Reduced default timeout from 30s to 3s across all services
Key Takeaway: Assume all dependencies will fail and design accordingly.
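The circuit breaker referenced above can be sketched in a few lines. This is a simplified illustration of the pattern (consecutive-failure counting, open state, fallback); a production service would more likely use a library such as opossum, which adds half-open trials, rolling failure windows, and metrics:

```javascript
// Minimal circuit breaker: after `failureThreshold` consecutive failures
// the circuit opens and calls go straight to the fallback until
// `resetAfter` ms pass, when the next call is allowed through as a trial.
class CircuitBreaker {
  constructor(fn, { failureThreshold = 3, resetAfter = 30000, fallback }) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.resetAfter = resetAfter;
    this.fallback = fallback;
    this.failures = 0;
    this.openedAt = null;
  }

  get isOpen() {
    return this.openedAt !== null && Date.now() - this.openedAt < this.resetAfter;
  }

  async call(...args) {
    if (this.isOpen) return this.fallback(...args); // fail fast, no upstream call
    try {
      const result = await this.fn(...args);
      this.failures = 0;            // success closes the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      if (++this.failures >= this.failureThreshold) {
        this.openedAt = Date.now(); // trip the breaker
      }
      return this.fallback(...args);
    }
  }
}

// Usage: a flaky dependency with a cached fallback
const breaker = new CircuitBreaker(
  async () => { throw new Error('upstream down'); },
  { failureThreshold: 3, fallback: () => 'cached-targeting' }
);
```

Once open, the breaker converts a slow failure (waiting on a dead dependency) into a fast one (immediate fallback), which is exactly the behavior that prevented the follow-up outages described above.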
End of WPP Software Engineer Interview Guide
This comprehensive guide covers essential skills for WPP software engineering roles across Platform Engineering, Backend Development, Frontend Development, DevOps, and Data Engineering teams at agencies including Hogarth Technology, AKQA Engineering, Choreograph, GroupM Technology, and VML.