Walmart Software Engineer
1. Design a high-scale, consistent, distributed inventory system supporting BOPIS for 5,000+ locations under peak load
Difficulty Level: Extreme
Engineering Level: Senior Software Engineer, Staff Engineer, Principal Engineer
Source: GeeksforGeeks (2024), SystemDesignHandbook.com (November 2024), Reddit r/cscareerquestions
Team: Supply Chain Technology, Store Systems
Interview Round: System Design Round (Round 2 or 3)
Question: “Design a distributed system that tracks inventory in real-time across Walmart’s 5,000+ stores and distribution centers. The system must handle millions of concurrent updates, ensure consistency between online and physical inventory, support BOPIS (Buy-Online-Pickup-In-Store), and maintain high availability during Black Friday traffic spikes.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Real-time inventory updates across all channels (in-store POS, online orders, warehouse transfers)
- Support for 5,000+ stores and 150 distribution centers
- BOPIS reservation system with 15-min hold periods
- Multi-channel inventory visibility (web, mobile, in-store kiosks)
- RFID integration for automated tracking (16% stockout reduction)
Non-Functional Requirements:
- Scale: 10M+ updates/day, 100K+ concurrent requests, P95 latency <150ms reads/<800ms writes
- Availability: 99.99% uptime with auto-scaling during Black Friday (10x traffic)
- Consistency: Eventual consistency for reads (5-min cache), strong consistency for reservations
- Cost: ~$50K/month infrastructure (PostgreSQL, Redis, Kafka)
Key Design Decisions:
- CAP Theorem: AP (availability + partition tolerance) for inventory reads and cross-channel visibility; reservations keep the strong consistency called out above
- Database: Shard by store_id across 10 nodes; optimistic locking with a version field (sketched after this list)
- Caching: Redis with 5-min TTL, 85%+ hit rate, cache-aside pattern
- Concurrency: Distributed locks (Redis) for updates, timeout 5s
- Monitoring: Track inventory accuracy >99%, cache hit rate, consumer lag <1 min
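A minimal sketch of the optimistic-locking decision above, assuming a JPA entity (the field names mirror the inventory schema later in this answer; nothing here is from the original code): the JPA provider appends WHERE version = ? to each UPDATE, increments the value on success, and raises OptimisticLockException when a concurrent writer got there first.

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Version;

@Entity
public class Inventory {
    @Id
    private Long id;

    private int availableQuantity;

    // Compared-and-incremented by JPA on every UPDATE; a stale write matches
    // zero rows and surfaces as an OptimisticLockException to retry or fail.
    @Version
    private int version;
}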
System Architecture
High-Level Design:
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Web App] [Mobile App] [In-Store POS] [Store Associate Devices]│
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ API GATEWAY LAYER │
│ [Load Balancer] → [API Gateway Cluster] │
│ Rate Limiting (100 req/s/user) | Auth | Routing │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Inventory │ │ Reservation │ │ Analytics │
│ Service │ │ Service │ │ Service │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────────────┐
│ MESSAGE QUEUE (Kafka) │
│ Topics: inventory-updates, reservations │
│ 10 consumers/group, 30-day retention │
└────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Warehouse │ │ Real-Time │ │ Supplier │
│ Management │ │ Processor │ │ Integration │
└──────────────┘ └──────────────┘ └──────────────┘
│
▼
┌────────────────────────────────────────────┐
│ DATA PERSISTENCE LAYER │
│ [PostgreSQL Cluster - 10 shards by store] │
│ [Redis Cache - 5min TTL] [Cassandra] │
└────────────────────────────────────────────┘
Scalability & Performance:
- Throughput: 100K reads/sec (cached), 10K writes/sec (Kafka buffered)
- Auto-scaling: Kubernetes HPA, pre-warm 2hrs before Black Friday
- Degraded mode: Serve cached (possibly stale) data if the database is unavailable (see the sketch after this list)
- Event sourcing: Complete audit trail in Cassandra, replay capability
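A hedged sketch of the degraded-mode bullet above: a read path that falls back to the last cached value when the database is down. It is written as a method that could sit in the InventoryService shown below; DataAccessException is Spring's root data-access exception, and the cache types match that service.

// Degraded-mode read for the InventoryService below: if the primary database
// is unreachable, serve the last cached value instead of failing the request.
public InventoryResponse getInventoryWithFallback(String productId, String storeId) {
    String cacheKey = "inventory:" + productId + ":" + storeId;
    try {
        Inventory fresh = inventoryRepo.findByProductAndStore(productId, storeId);
        redisTemplate.opsForValue().set(cacheKey, fresh, 300, TimeUnit.SECONDS);
        return InventoryResponse.from(fresh);
    } catch (DataAccessException dbUnavailable) {
        InventoryCache cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            return InventoryResponse.fromCache(cached); // stale but available
        }
        throw dbUnavailable; // nothing cached either; surface the outage
    }
}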
Code
Inventory Service (Java + Spring Boot):
@Service
public class InventoryService {
    @Autowired private InventoryRepository inventoryRepo;
    @Autowired private RedisTemplate redisTemplate;
    @Autowired private DistributedLockManager lockManager;
    @Autowired private KafkaTemplate<String, InventoryEvent> kafkaTemplate;

    // Get inventory with cache-aside pattern
    public InventoryResponse getInventory(String productId, String storeId) {
        String cacheKey = "inventory:" + productId + ":" + storeId;
        InventoryCache cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null && !cached.isExpired()) {
            return InventoryResponse.fromCache(cached);
        }
        Inventory inventory = inventoryRepo.findByProductAndStore(productId, storeId);
        redisTemplate.opsForValue().set(cacheKey, inventory, 300, TimeUnit.SECONDS);
        return InventoryResponse.from(inventory);
    }

    // Update inventory with distributed locking
    @Transactional
    public void updateInventory(InventoryUpdateRequest request) {
        String lockKey = "inventory:lock:" + request.getProductId() + ":" + request.getStoreId();
        if (!lockManager.tryLock(lockKey, 5, TimeUnit.SECONDS)) {
            throw new InventoryLockException("Could not acquire lock");
        }
        try {
            Inventory inventory = inventoryRepo.findByProductAndStoreForUpdate(
                request.getProductId(), request.getStoreId());
            int newQuantity = inventory.getAvailableQuantity() + request.getQuantityDelta();
            if (newQuantity < 0) throw new InsufficientInventoryException();
            inventory.setAvailableQuantity(newQuantity);
            inventory.setVersion(inventory.getVersion() + 1);
            inventoryRepo.save(inventory);
            // Publish event to Kafka
            kafkaTemplate.send("inventory-updates", createEvent(request, newQuantity));
            // Invalidate cache
            redisTemplate.delete("inventory:" + request.getProductId() + ":" + request.getStoreId());
        } finally {
            lockManager.unlock(lockKey);
        }
    }

    // Reserve inventory for BOPIS with TTL
    @Transactional
    public ReservationResponse reserveInventory(ReservationRequest request) {
        Inventory inventory = inventoryRepo.findByProductAndStoreForUpdate(
            request.getProductId(), request.getStoreId());
        if (inventory.getAvailableQuantity() < request.getQuantity()) {
            return ReservationResponse.failed("Insufficient inventory");
        }
        Reservation reservation = Reservation.builder()
            .reservationId(UUID.randomUUID().toString())
            .expiresAt(Instant.now().plus(15, ChronoUnit.MINUTES))
            .build();
        inventory.setAvailableQuantity(inventory.getAvailableQuantity() - request.getQuantity());
        inventory.setReservedQuantity(inventory.getReservedQuantity() + request.getQuantity());
        inventoryRepo.save(inventory);
        redisTemplate.opsForValue().set("reservation:" + reservation.getReservationId(),
            reservation, 15, TimeUnit.MINUTES);
        return ReservationResponse.success(reservation.getReservationId());
    }
}
Database Schema (PostgreSQL):
-- Inventory table, hash-partitioned by store_id for horizontal scaling
CREATE TABLE inventory (
id BIGSERIAL,
product_id VARCHAR(50) NOT NULL,
store_id VARCHAR(20) NOT NULL,
available_quantity INT NOT NULL DEFAULT 0,
reserved_quantity INT NOT NULL DEFAULT 0,
total_quantity INT GENERATED ALWAYS AS (available_quantity + reserved_quantity) STORED,
last_updated TIMESTAMP NOT NULL DEFAULT NOW(),
version INT NOT NULL DEFAULT 1, -- For optimistic locking
PRIMARY KEY (id, store_id), -- Postgres requires the partition key in the PK
CONSTRAINT positive_quantity CHECK (available_quantity >= 0),
CONSTRAINT unique_product_store UNIQUE (product_id, store_id)
) PARTITION BY HASH (store_id);
-- Create partitions for horizontal scaling
CREATE TABLE inventory_p0 PARTITION OF inventory FOR VALUES WITH (MODULUS 10, REMAINDER 0);
CREATE TABLE inventory_p1 PARTITION OF inventory FOR VALUES WITH (MODULUS 10, REMAINDER 1);
-- ... up to p9
-- Indexes for fast lookups
CREATE INDEX idx_inventory_product ON inventory (product_id);
CREATE INDEX idx_inventory_store ON inventory (store_id);
CREATE INDEX idx_inventory_updated ON inventory (last_updated);
-- Event sourcing table for audit trail
CREATE TABLE inventory_events (
event_id UUID,
event_type VARCHAR(20) NOT NULL, -- SALE, RESTOCK, RETURN, TRANSFER
product_id VARCHAR(50) NOT NULL,
store_id VARCHAR(20) NOT NULL,
quantity_delta INT NOT NULL,
new_quantity INT NOT NULL,
source VARCHAR(20) NOT NULL, -- POS, ONLINE, WAREHOUSE, RFID
timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
metadata JSONB,
PRIMARY KEY (event_id, timestamp) -- includes the range partition key
) PARTITION BY RANGE (timestamp);
-- Partition by month for efficient archival
CREATE TABLE inventory_events_2025_01 PARTITION OF inventory_events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
Kafka Event Consumer:
@Service
public class InventoryEventConsumer {
    @Autowired private AnalyticsService analyticsService;
    @Autowired private NotificationService notificationService;
    @Autowired private SupplierService supplierService;
    @Autowired private KafkaTemplate<String, InventoryEvent> kafkaTemplate;

    @KafkaListener(topics = "inventory-updates", concurrency = "10")
    public void processInventoryEvent(InventoryEvent event) {
        try {
            analyticsService.recordInventoryChange(event);
            // Low stock alert and auto-reorder
            if (event.getNewQuantity() <= 10) {
                notificationService.sendLowStockAlert(event);
                if (isAutoReorderEnabled(event.getProductId())) {
                    supplierService.createPurchaseOrder(event);
                }
            }
            updateSearchIndex(event);
        } catch (Exception e) {
            // Dead-letter queue for poison messages
            kafkaTemplate.send("inventory-updates-dlq", event);
        }
    }
}
RFID Integration:
@Service
public class RFIDInventoryService {
    @Autowired private InventoryService inventoryService;
    @Autowired private AuditService auditService;

    // Process RFID scans for real-time tracking (16% stockout reduction)
    public void processRFIDScan(RFIDScanEvent scanEvent) {
        Map<String, Integer> productCounts = groupByProduct(scanEvent.getElectronicProductCodes());
        for (Map.Entry<String, Integer> entry : productCounts.entrySet()) {
            InventoryReconciliation reconciliation = reconcileInventory(
                entry.getKey(), scanEvent.getStoreId(), entry.getValue());
            if (reconciliation.hasDrift()) {
                inventoryService.updateInventory(
                    InventoryUpdateRequest.rfidReconciliation(entry.getKey(),
                        scanEvent.getStoreId(), reconciliation.getDrift()));
                auditService.logInventoryDrift(reconciliation);
            }
        }
    }
}
2. Design a rating and review system for e-commerce that prevents fake reviews at scale
Difficulty Level: Very Hard
Engineering Level: Software Engineer III (SDE-3), Senior Engineer
Source: YouTube interview by Sanket Singh (March 2024), Reddit r/developersIndia
Team: E-commerce Platform, Marketplace
Interview Round: Technical Round 2 - High Level Design (45 minutes)
Question: “Design a complete rating and review system for Walmart’s e-commerce platform handling millions of products. The system must prevent fake reviews, scale to support different product categories (groceries to electronics), allow verified purchase reviews, implement fraud detection mechanisms, and be extensible for Walmart Marketplace integration.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Submit reviews with star rating (1-5), text, images/videos
- Verified purchase badge for authenticated buyers
- Review helpfulness voting (upvote/downvote)
- Seller response capability
- Filter/sort reviews (most helpful, recent, rating)
- Report inappropriate reviews
Non-Functional Requirements:
- Scale: Support 10M+ products, 100M+ reviews, 1M writes/day
- Fraud Detection: ML-based fake review detection (>95% accuracy)
- Latency: <100ms review retrieval, <2s review submission with fraud check
- Availability: 99.9% uptime, eventual consistency acceptable
- Cost: ~$30K/month (database, ML inference, storage)
Key Design Decisions:
- Database: PostgreSQL for reviews + Elasticsearch for search/filtering
- Fraud Detection: Real-time ML scoring, block suspicious patterns (IP clustering, burst reviews)
- Storage: S3 for images/videos with CDN caching
- Rate Limiting: 5 reviews/product/user, 10 reviews/day/user (Redis-counter enforcement sketched after this list)
- Moderation: Hybrid approach (ML auto-moderation + human review queue)
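One way to enforce the rate limits above with plain Redis counters; a sketch assuming Spring Data Redis (the key layout and RateLimitExceededException are illustrative, not part of the review service below):

import java.time.Duration;
import java.time.LocalDate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class ReviewRateLimiter {
    @Autowired private StringRedisTemplate redis;

    public void enforceLimits(String userId, String productId) {
        // Lifetime cap: at most 5 reviews per (user, product) pair
        Long perProduct = redis.opsForValue()
            .increment("rl:reviews:" + userId + ":" + productId, 1);
        if (perProduct != null && perProduct > 5) {
            throw new RateLimitExceededException("Per-product review limit reached");
        }
        // Daily cap: at most 10 reviews per user per calendar day
        String dailyKey = "rl:reviews:" + userId + ":" + LocalDate.now();
        Long daily = redis.opsForValue().increment(dailyKey, 1);
        redis.expire(dailyKey, Duration.ofDays(1));
        if (daily != null && daily > 10) {
            throw new RateLimitExceededException("Daily review limit reached");
        }
    }
}

class RateLimitExceededException extends RuntimeException {
    RateLimitExceededException(String message) { super(message); }
}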
System Architecture
High-Level Design:
┌──────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Web App] [Mobile App] [Seller Dashboard] │
└──────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ API GATEWAY │
│ Rate Limiting | Authentication | Request Validation │
└──────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Review │ │ Fraud │ │ Search │
│ Service │ │ Detection │ │ Service │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────────┐
│ MESSAGE QUEUE (Kafka) │
│ Topics: review-submitted, moderation │
└────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ ML Fraud │ │ Moderation │ │ Analytics │
│ Pipeline │ │ Queue │ │ Service │
└──────────────┘ └──────────────┘ └──────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ [PostgreSQL - Reviews] [Elasticsearch - Search] │
│ [Redis - Cache] [S3 - Media] [ML Models - SageMaker] │
└──────────────────────────────────────────────────────────┘
Scalability & Performance:
- Read Optimization: Elasticsearch with aggregations for filtering, Redis cache for popular products
- Write Throughput: Async processing via Kafka, batch ML inference
- Media Handling: Direct S3 upload with pre-signed URLs (sketched after this list), CloudFront CDN
- Fraud Detection: Real-time scoring <500ms, batch analysis nightly
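A sketch of the pre-signed upload flow using the AWS SDK v1 API (the bucket name and 10-minute expiry are assumptions): the browser PUTs the image straight to S3, so media bytes never transit the review service.

import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;
import java.net.URL;
import java.util.Date;

public class MediaUploadUrls {
    private final AmazonS3 s3;

    public MediaUploadUrls(AmazonS3 s3) { this.s3 = s3; }

    // Returns a short-lived URL the client can PUT the image to directly.
    public URL presignedUploadUrl(String reviewId, String fileName) {
        Date expiry = new Date(System.currentTimeMillis() + 10 * 60 * 1000); // 10 min
        GeneratePresignedUrlRequest request =
            new GeneratePresignedUrlRequest("review-media-bucket", // assumed bucket
                                            "reviews/" + reviewId + "/" + fileName)
                .withMethod(HttpMethod.PUT)
                .withExpiration(expiry);
        return s3.generatePresignedUrl(request);
    }
}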
Code
Review Service (Java + Spring Boot):
@Service
public class ReviewService {
    @Autowired private ReviewRepository reviewRepo;
    @Autowired private FraudDetectionService fraudService;
    @Autowired private OrderService orderService;
    @Autowired private KafkaTemplate<String, ReviewEvent> kafkaTemplate;
    @Autowired private RedisTemplate redisTemplate;
    @Autowired private ElasticsearchOperations elasticsearchTemplate;

    // Submit review with fraud detection
    @Transactional
    public ReviewResponse submitReview(ReviewRequest request) {
        validateReviewEligibility(request.getUserId(), request.getProductId());
        Review review = Review.builder()
            .id(UUID.randomUUID().toString())
            .productId(request.getProductId())
            .userId(request.getUserId())
            .rating(request.getRating())
            .content(request.getContent())
            .isVerifiedPurchase(orderService.hasUserPurchased(request.getUserId(), request.getProductId()))
            .status(ReviewStatus.PENDING)
            .build();

        // Fraud detection scoring
        FraudScore fraudScore = fraudService.calculateFraudScore(review, request.getMetadata());
        review.setFraudScore(fraudScore.getScore());
        if (fraudScore.getScore() > 0.8) {
            review.setStatus(ReviewStatus.REJECTED);
        } else if (fraudScore.getScore() > 0.5) {
            review.setStatus(ReviewStatus.PENDING_MANUAL_REVIEW);
        } else {
            review.setStatus(ReviewStatus.APPROVED);
        }
        reviewRepo.save(review);
        kafkaTemplate.send("review-submitted", ReviewEvent.from(review));
        redisTemplate.delete("reviews:" + request.getProductId());
        return ReviewResponse.success(review.getId(), review.getStatus());
    }

    // Get reviews with caching
    public ReviewListResponse getReviews(String productId, ReviewFilter filter) {
        String cacheKey = "reviews:" + productId + ":" + filter;
        ReviewStats cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) return ReviewListResponse.fromCache(cached);
        SearchHits<Review> hits = elasticsearchTemplate.search(buildQuery(productId, filter), Review.class);
        ReviewStats stats = calculateReviewStats(productId, hits);
        redisTemplate.opsForValue().set(cacheKey, stats, 10, TimeUnit.MINUTES);
        return ReviewListResponse.from(stats);
    }
}
Fraud Detection Service (Python + ML):
from sklearn.ensemble import RandomForestClassifier
import redis

class FraudDetectionService:
    def __init__(self):
        self.model = self.load_trained_model()
        self.redis = redis.Redis()

    def calculate_fraud_score(self, review, metadata):
        # Blend the ML model's output with rule-based pattern scoring
        ml_score = self.model.predict_proba([self.extract_features(review)])[0][1]
        pattern_score = self.check_suspicious_patterns(review, metadata)
        return FraudScore(score=0.7 * ml_score + 0.3 * pattern_score)

    def extract_features(self, review):
        return [
            len(review.content), review.rating,
            self.calculate_sentiment_score(review.content),
            self.get_user_review_count(review.user_id),
            self.get_account_age_days(review.user_id),
            int(review.is_verified_purchase),
            self.calculate_text_similarity(review),
            self.get_review_velocity(review.user_id)
        ]

    def check_suspicious_patterns(self, review, metadata):
        score = 0.0
        # IP burst detection: more than 10 reviews from one IP in 24h
        ip_key = f"reviews:ip:{metadata.ip_address}:24h"
        if self.redis.incr(ip_key) > 10:
            score += 0.4
        self.redis.expire(ip_key, 86400)
        # New account with many reviews
        if self.get_account_age_days(review.user_id) < 7 and \
                self.get_user_review_count(review.user_id) > 5:
            score += 0.3
        # Template text detection
        if self.is_template_text(review.content):
            score += 0.2
        # Extreme sentiment pattern: identical rating on every review
        ratings = self.get_user_rating_history(review.user_id)
        if len(ratings) > 3 and len(set(ratings)) == 1:
            score += 0.2
        return min(score, 1.0)
Database Schema (PostgreSQL):
-- Reviews table
CREATE TABLE reviews (
id VARCHAR(36) PRIMARY KEY,
product_id VARCHAR(50) NOT NULL,
user_id VARCHAR(36) NOT NULL,
rating INT NOT NULL CHECK (rating >= 1 AND rating <= 5),
title VARCHAR(200),
content TEXT NOT NULL,
is_verified_purchase BOOLEAN DEFAULT FALSE,
status VARCHAR(20) NOT NULL, -- PENDING, APPROVED, REJECTED, PENDING_MANUAL_REVIEW
fraud_score DECIMAL(3,2),
rejection_reason TEXT,
helpful_count INT DEFAULT 0,
not_helpful_count INT DEFAULT 0,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
CONSTRAINT unique_user_product UNIQUE (user_id, product_id)
);
CREATE INDEX idx_reviews_product ON reviews (product_id, status, created_at DESC);
CREATE INDEX idx_reviews_user ON reviews (user_id);
CREATE INDEX idx_reviews_fraud ON reviews (fraud_score) WHERE fraud_score > 0.5;
-- Review media (images/videos)
CREATE TABLE review_media (
id BIGSERIAL PRIMARY KEY,
review_id VARCHAR(36) NOT NULL REFERENCES reviews(id) ON DELETE CASCADE,
media_type VARCHAR(10) NOT NULL, -- IMAGE, VIDEO
s3_key VARCHAR(500) NOT NULL,
cdn_url VARCHAR(500) NOT NULL,
uploaded_at TIMESTAMP NOT NULL DEFAULT NOW()
);
-- Review helpfulness votes
CREATE TABLE review_votes (
review_id VARCHAR(36) NOT NULL REFERENCES reviews(id) ON DELETE CASCADE,
user_id VARCHAR(36) NOT NULL,
vote_type VARCHAR(10) NOT NULL, -- HELPFUL, NOT_HELPFUL
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (review_id, user_id)
);
-- Seller responses
CREATE TABLE seller_responses (
id BIGSERIAL PRIMARY KEY,
review_id VARCHAR(36) NOT NULL REFERENCES reviews(id) ON DELETE CASCADE,
seller_id VARCHAR(36) NOT NULL,
response_text TEXT NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
CONSTRAINT unique_review_response UNIQUE (review_id)
);
3. Implement a concurrent thread pool executor with dynamic task scheduling
Difficulty Level: Very Hard
Engineering Level: Staff Engineer, Principal Engineer
Source: LinkedIn post documenting Walmart Staff Engineer interview (September 2024)
Team: Walmart Global Tech, Platform Engineering
Interview Round: Low Level Design Round
Question: “Design and implement from scratch a thread pool executor that schedules tasks dynamically as soon as a thread becomes free. The implementation must handle thread lifecycle management, task queuing, load balancing across threads, graceful shutdown, exception handling, and support for different task priorities.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Fixed-size thread pool with configurable threads (e.g., 10 threads)
- Task queue with priority support (HIGH, NORMAL, LOW)
- Dynamic task scheduling (assign task when thread becomes free)
- Graceful shutdown (finish running tasks, reject new tasks)
- Task rejection policy when queue is full
- Task timeout support
Non-Functional Requirements:
- Thread Safety: No race conditions, proper synchronization
- Performance: Minimal locking overhead, <1ms task dispatch latency
- Memory: Bounded queue to prevent OOM (max 10,000 tasks)
- Monitoring: Track completed tasks, queue size, thread utilization
Key Design Decisions:
- Queue: PriorityBlockingQueue for task ordering
- Synchronization: ReentrantLock with Condition variables for thread coordination
- Rejection Policy: Throw exception when queue full (configurable)
- Shutdown: Two-phase (shutdown() vs shutdownNow())
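For orientation before the design, a short usage sketch of the executor implemented below (the class and enum names match that implementation):

import java.util.concurrent.Future;

public class ThreadPoolDemo {
    public static void main(String[] args) {
        CustomThreadPoolExecutor pool = new CustomThreadPoolExecutor(4, 1_000);
        Future<?> high = pool.submit(() -> System.out.println("high-priority task"),
                                     CustomThreadPoolExecutor.TaskPriority.HIGH);
        pool.submit(() -> System.out.println("low-priority task"),
                    CustomThreadPoolExecutor.TaskPriority.LOW);
        pool.shutdown(); // reject new submissions; workers drain the queue, then exit
    }
}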
System Architecture
Component Design:
┌────────────────────────────────────────────────────────────┐
│ ThreadPoolExecutor │
│ - Worker threads pool │
│ - Task queue (PriorityBlockingQueue) │
│ - Lifecycle management (RUNNING, SHUTDOWN, TERMINATED) │
└────────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker N │
│ (Thread) │ │ (Thread) │ │ (Thread) │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────┐
│ PriorityBlockingQueue │
│ [Task 1] [Task 2] ... [Task N] │
└────────────────────────────────────┘
Code
Thread Pool Executor Implementation (Java):
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
import java.util.concurrent.locks.ReentrantLock;

public class CustomThreadPoolExecutor {
    private final PriorityBlockingQueue<PriorityTask> taskQueue;
    private final List<Worker> workers;
    private final AtomicInteger activeThreads = new AtomicInteger(0);
    private final AtomicLong completedTasks = new AtomicLong(0);
    private volatile ExecutorState state = ExecutorState.RUNNING;
    private final ReentrantLock mainLock = new ReentrantLock();

    enum ExecutorState { RUNNING, SHUTDOWN, TERMINATED }

    public CustomThreadPoolExecutor(int corePoolSize, int maxQueueSize) {
        this.taskQueue = new PriorityBlockingQueue<>(maxQueueSize);
        this.workers = new ArrayList<>();
        for (int i = 0; i < corePoolSize; i++) {
            Worker worker = new Worker();
            workers.add(worker);
            worker.thread.start();
        }
    }

    public Future<?> submit(Runnable task, TaskPriority priority) {
        if (state != ExecutorState.RUNNING) {
            throw new RejectedExecutionException("Executor shutting down");
        }
        PriorityTask priorityTask = new PriorityTask(task, priority);
        if (!taskQueue.offer(priorityTask)) {
            throw new RejectedExecutionException("Queue full");
        }
        return priorityTask.future;
    }

    public void shutdown() {
        mainLock.lock();
        try {
            state = ExecutorState.SHUTDOWN;
            workers.forEach(w -> w.thread.interrupt());
        } finally {
            mainLock.unlock();
        }
    }

    // Worker thread: pulls the highest-priority task as soon as it is free
    private class Worker implements Runnable {
        final Thread thread = new Thread(this);

        @Override
        public void run() {
            while (state != ExecutorState.TERMINATED) {
                try {
                    PriorityTask task = (state == ExecutorState.RUNNING)
                        ? taskQueue.take()
                        : taskQueue.poll(100, TimeUnit.MILLISECONDS);
                    if (task != null) {
                        activeThreads.incrementAndGet();
                        try {
                            task.run();
                            completedTasks.incrementAndGet();
                        } catch (Exception e) {
                            task.future.completeExceptionally(e);
                        } finally {
                            activeThreads.decrementAndGet();
                        }
                    } else if (state == ExecutorState.SHUTDOWN) {
                        break; // queue drained after shutdown
                    }
                } catch (InterruptedException e) {
                    if (state == ExecutorState.TERMINATED) break;
                }
            }
        }
    }

    // Priority task wrapper: orders by priority, then FIFO within a priority
    static class PriorityTask implements Runnable, Comparable<PriorityTask> {
        final Runnable task;
        final TaskPriority priority;
        final long sequenceNumber;
        final CompletableFuture<Void> future = new CompletableFuture<>();
        static final AtomicLong sequencer = new AtomicLong(0);

        PriorityTask(Runnable task, TaskPriority priority) {
            this.task = task;
            this.priority = priority;
            this.sequenceNumber = sequencer.getAndIncrement();
        }

        @Override
        public void run() {
            try {
                task.run();
                future.complete(null);
            } catch (Exception e) {
                future.completeExceptionally(e);
                throw e;
            }
        }

        @Override
        public int compareTo(PriorityTask other) {
            int result = Integer.compare(other.priority.value, this.priority.value);
            return (result != 0) ? result : Long.compare(this.sequenceNumber, other.sequenceNumber);
        }
    }

    enum TaskPriority {
        LOW(1), NORMAL(5), HIGH(10);
        final int value;
        TaskPriority(int value) { this.value = value; }
    }
}
4. Design a high-concurrency ticket booking system handling race conditions and payment workflows
Difficulty Level: Very Hard
Engineering Level: Staff Engineer
Source: LinkedIn post on Walmart Staff Engineer interview (September 2024)
Team: Walmart Global Tech
Interview Round: System Design/High Level Design Round
Question: “Design a complete ticket booking system similar to BookMyShow in 45 minutes, covering seat selection with real-time availability, booking workflows, payment processing, handling high concurrency for popular shows, scalability for millions of users, and performance optimization. The system must prevent double-booking, handle payment failures gracefully, and provide responsive user experience.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Browse shows, theaters, and showtimes
- View seat map with real-time availability
- Reserve seats (hold for 10 minutes)
- Process payment and confirm booking
- Cancel bookings with refunds
Non-Functional Requirements:
- Concurrency: Handle 10K concurrent users, prevent double-booking
- Latency: <500ms seat selection, <3s booking confirmation
- Availability: 99.9% uptime, strong consistency for reservations
- Scale: 1M bookings/day, 100K shows
- Cost: ~$40K/month (DB, Redis, compute)
Key Design Decisions:
- Locking: Pessimistic locks using Redis distributed locks with 10-min TTL
- Payment: Two-phase commit (reserve → pay → confirm)
- Idempotency: Unique booking IDs plus client idempotency keys to make payment retries safe (sketched after this list)
- Cleanup: Scheduled job to release expired reservations
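A sketch of the idempotency decision above (the facade, key layout, and BookingResponse helpers are assumptions layered on the BookingService below): the first attempt records its booking under the client's idempotency key, so a retried payment callback returns the same booking instead of charging twice.

import java.util.concurrent.TimeUnit;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class IdempotentBookingFacade {
    @Autowired private RedisTemplate<String, String> redisTemplate;
    @Autowired private BookingService bookingService;

    public BookingResponse confirmOnce(String idempotencyKey,
                                       String reservationId,
                                       PaymentResponse payment) {
        String key = "idem:booking:" + idempotencyKey;
        // First caller wins; retries of the same request see the stored marker.
        Boolean first = redisTemplate.opsForValue()
            .setIfAbsent(key, "PENDING", 24, TimeUnit.HOURS);
        if (Boolean.FALSE.equals(first)) {
            // Duplicate request: return the previously created booking id.
            return BookingResponse.alreadyProcessed(redisTemplate.opsForValue().get(key));
        }
        BookingResponse response = bookingService.confirmBooking(reservationId, payment);
        redisTemplate.opsForValue().set(key, response.getBookingId(), 24, TimeUnit.HOURS);
        return response;
    }
}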
System Architecture
High-Level Design:
┌────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Web App] [Mobile App] [Booking Kiosk] │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ API GATEWAY + LOAD BALANCER │
│ Rate Limiting (100 req/s) | Session Management │
└────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Show │ │ Booking │ │ Payment │
│ Service │ │ Service │ │ Service │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────┐
│ MESSAGE QUEUE (Kafka) │
│ Topics: bookings, payments │
└────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ [PostgreSQL - Bookings] [Redis - Locks/Cache] │
│ [Elasticsearch - Search] [S3 - Tickets] │
└────────────────────────────────────────────────────────┘
Scalability & Performance:
- Seat Locking: Redis SETNX with automatic expiry prevents double-booking
- Read Optimization: Cache seat maps, invalidate on booking
- Write Optimization: Async notifications via Kafka
- Payment Retry: Idempotent APIs with deduplication keys
Code
Booking Service (Java + Spring Boot):
@Service
public class BookingService {
    @Autowired private RedisTemplate<String, String> redisTemplate;
    @Autowired private BookingRepository bookingRepo;
    @Autowired private PaymentService paymentService;
    @Autowired private KafkaTemplate<String, BookingEvent> kafkaTemplate;

    private static final int SEAT_LOCK_DURATION_MINUTES = 10;

    // Reserve seats with distributed locking
    @Transactional
    public ReservationResponse reserveSeats(ReservationRequest request) {
        List<String> lockedSeats = new ArrayList<>();
        try {
            // Try to acquire locks for all seats
            for (String seatId : request.getSeatIds()) {
                String lockKey = "seat:lock:" + request.getShowId() + ":" + seatId;
                Boolean lockAcquired = redisTemplate.opsForValue().setIfAbsent(
                    lockKey, request.getUserId(), SEAT_LOCK_DURATION_MINUTES, TimeUnit.MINUTES);
                if (Boolean.FALSE.equals(lockAcquired)) {
                    releaseLocks(request.getShowId(), lockedSeats);
                    return ReservationResponse.failed("Seat " + seatId + " already reserved");
                }
                lockedSeats.add(seatId);
            }
            // Create reservation
            Reservation reservation = Reservation.builder()
                .id(UUID.randomUUID().toString())
                .showId(request.getShowId())
                .userId(request.getUserId())
                .seatIds(request.getSeatIds())
                .status(ReservationStatus.PENDING_PAYMENT)
                .expiresAt(Instant.now().plus(SEAT_LOCK_DURATION_MINUTES, ChronoUnit.MINUTES))
                .build();
            bookingRepo.save(reservation);
            return ReservationResponse.success(reservation.getId(), reservation.getExpiresAt());
        } catch (Exception e) {
            releaseLocks(request.getShowId(), lockedSeats);
            throw new BookingException("Failed to reserve seats", e);
        }
    }

    // Confirm booking after payment
    @Transactional
    public BookingResponse confirmBooking(String reservationId, PaymentResponse payment) {
        Reservation reservation = bookingRepo.findById(reservationId)
            .orElseThrow(() -> new NotFoundException("Reservation not found"));
        if (reservation.getExpiresAt().isBefore(Instant.now())) {
            releaseLocks(reservation.getShowId(), reservation.getSeatIds());
            throw new ReservationExpiredException("Reservation expired");
        }
        if (!paymentService.verifyPayment(payment.getPaymentId())) {
            throw new PaymentVerificationException("Payment failed");
        }
        Booking booking = Booking.builder()
            .id(UUID.randomUUID().toString())
            .showId(reservation.getShowId())
            .userId(reservation.getUserId())
            .seatIds(reservation.getSeatIds())
            .paymentId(payment.getPaymentId())
            .status(BookingStatus.CONFIRMED)
            .build();
        bookingRepo.save(booking);
        updateSeatStatus(reservation.getShowId(), reservation.getSeatIds(), SeatStatus.BOOKED);
        releaseLocks(reservation.getShowId(), reservation.getSeatIds());
        kafkaTemplate.send("bookings", BookingEvent.from(booking));
        return BookingResponse.success(booking.getId());
    }

    // Cleanup expired reservations every minute
    @Scheduled(fixedRate = 60000)
    public void cleanupExpiredReservations() {
        List<Reservation> expired = bookingRepo.findExpiredReservations(Instant.now());
        expired.forEach(r -> {
            releaseLocks(r.getShowId(), r.getSeatIds());
            r.setStatus(ReservationStatus.EXPIRED);
            bookingRepo.save(r);
        });
    }

    private void releaseLocks(String showId, List<String> seatIds) {
        seatIds.forEach(seatId ->
            redisTemplate.delete("seat:lock:" + showId + ":" + seatId));
    }
}
Database Schema (PostgreSQL):
CREATE TABLE shows (
id VARCHAR(36) PRIMARY KEY,
movie_id VARCHAR(36) NOT NULL,
theater_id VARCHAR(36) NOT NULL,
show_time TIMESTAMP NOT NULL,
available_seats INT NOT NULL,
total_seats INT NOT NULL
);
CREATE TABLE seats (
id VARCHAR(36) PRIMARY KEY,
show_id VARCHAR(36) NOT NULL REFERENCES shows(id),
row_number VARCHAR(5) NOT NULL,
seat_number INT NOT NULL,
status VARCHAR(20) NOT NULL, -- AVAILABLE, RESERVED, BOOKED
CONSTRAINT unique_seat UNIQUE (show_id, row_number, seat_number)
);
CREATE TABLE reservations (
id VARCHAR(36) PRIMARY KEY,
show_id VARCHAR(36) NOT NULL,
user_id VARCHAR(36) NOT NULL,
seat_ids TEXT[] NOT NULL,
status VARCHAR(20) NOT NULL,
expires_at TIMESTAMP NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_reservations_expires ON reservations (expires_at)
WHERE status = 'PENDING_PAYMENT';
CREATE TABLE bookings (
id VARCHAR(36) PRIMARY KEY,
show_id VARCHAR(36) NOT NULL,
user_id VARCHAR(36) NOT NULL,
seat_ids TEXT[] NOT NULL,
payment_id VARCHAR(36) NOT NULL,
status VARCHAR(20) NOT NULL,
booked_at TIMESTAMP NOT NULL DEFAULT NOW()
);
5. Design a secure document vault with granular access controls and encryption
Difficulty Level: Hard
Engineering Level: Senior Software Engineer (SDE-3)
Source: Reddit r/developersIndia (September 2024)
Team: Walmart US - Full Stack Engineering
Interview Round: System Design Round 2 (conducted by Distinguished Engineer)
Question: “Design a secure document storage system similar to Google Drive or Dropbox with focus on multi-user access control. The system must support granular permissions (read, write, delete, share), hierarchical folder structures, sharing documents with different permission levels, encryption at rest and in transit, audit logging for compliance, and version control.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Upload/download documents and folders
- Hierarchical folder structure (nested folders)
- Granular permissions (READ, WRITE, DELETE, SHARE)
- Share files/folders with specific users or groups
- Version control (track document changes)
- Search documents by name, content, tags
Non-Functional Requirements:
- Security: Encryption at rest (AES-256), in transit (TLS 1.3)
- Scale: 1M users, 100M documents, 10PB storage
- Latency: <100ms permission checks, <1s file access
- Compliance: SOX, GDPR, HIPAA audit trails
- Cost: ~$35K/month (S3, RDS, KMS)
Key Design Decisions:
- Storage: S3 for files, PostgreSQL for metadata
- Access Control: RBAC with permission inheritance from parent folders
- Encryption: Envelope encryption (DEK + CMK via AWS KMS)
- Versioning: S3 versioning + metadata tracking
System Architecture
High-Level Design:
┌────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Web App] [Mobile App] [Desktop Sync] │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ API GATEWAY + WAF │
│ TLS Termination | Rate Limiting | DDoS Protection │
└────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Document │ │ Access │ │ Search │
│ Service │ │ Control │ │ Service │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────┐
│ ENCRYPTION SERVICE (KMS) │
│ Key Management | Encrypt/Decrypt │
└────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ [PostgreSQL - Metadata] [S3 - Files + Versioning] │
│ [Redis - Permission Cache] [Elasticsearch - Search] │
└────────────────────────────────────────────────────────┘
Scalability & Performance:
- Caching: Redis for permission checks (5-min TTL), CloudFront for downloads
- Sharding: Partition metadata by user_id for horizontal scaling
- Async Processing: Kafka for audit logs, virus scanning, thumbnail generation
Code
Access Control Service (Java + Spring Boot):
@Service
public class AccessControlService {
    @Autowired private PermissionRepository permissionRepo;
    @Autowired private RedisTemplate<String, Boolean> redisTemplate;
    @Autowired private AuditService auditService;
    @Autowired private NotificationService notificationService;

    // Check permission with inheritance
    public boolean hasPermission(String userId, String documentId, PermissionType required) {
        String cacheKey = "perm:" + userId + ":" + documentId + ":" + required;
        Boolean cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) return cached;

        // Check direct permissions
        Optional<Permission> direct = permissionRepo.findByUserAndDocument(userId, documentId);
        if (direct.isPresent() && direct.get().hasPermission(required)) {
            cacheResult(cacheKey, true);
            return true;
        }
        // Check inherited permissions from parent folders
        List<String> parentFolders = getParentFolderHierarchy(documentId);
        for (String folderId : parentFolders) {
            Optional<Permission> folderPerm = permissionRepo.findByUserAndDocument(userId, folderId);
            if (folderPerm.isPresent() && folderPerm.get().hasPermission(required)) {
                cacheResult(cacheKey, true);
                return true;
            }
        }
        // Check group permissions
        List<String> userGroups = getUserGroups(userId);
        for (String groupId : userGroups) {
            Optional<Permission> groupPerm = permissionRepo.findByGroupAndDocument(groupId, documentId);
            if (groupPerm.isPresent() && groupPerm.get().hasPermission(required)) {
                cacheResult(cacheKey, true);
                return true;
            }
        }
        cacheResult(cacheKey, false);
        return false;
    }

    // Share document with another user
    @Transactional
    public ShareResponse shareDocument(ShareRequest request) {
        if (!hasPermission(request.getOwnerId(), request.getDocumentId(), PermissionType.SHARE)) {
            throw new UnauthorizedException("No share permission");
        }
        Permission permission = Permission.builder()
            .id(UUID.randomUUID().toString())
            .documentId(request.getDocumentId())
            .userId(request.getTargetUserId())
            .permissionType(request.getPermissionType())
            .grantedBy(request.getOwnerId())
            .grantedAt(Instant.now())
            .build();
        permissionRepo.save(permission);
        invalidatePermissionCache(request.getTargetUserId(), request.getDocumentId());
        auditService.logShare(request);
        notificationService.sendShareNotification(request.getTargetUserId(), request.getDocumentId());
        return ShareResponse.success(permission.getId());
    }

    private void cacheResult(String key, boolean result) {
        redisTemplate.opsForValue().set(key, result, 5, TimeUnit.MINUTES);
    }
}

enum PermissionType {
    READ(1), WRITE(2), DELETE(3), SHARE(4), OWNER(5);
    final int level;
    PermissionType(int level) { this.level = level; }
    public boolean isAtLeast(PermissionType other) { return this.level >= other.level; }
}
Encryption Service:
@Service
public class EncryptionService {
    @Autowired private KMSClient kmsClient;

    // Encrypt file using envelope encryption (AES-256-GCM for data, KMS for the key)
    public EncryptedFile encryptFile(byte[] fileData, String userId) throws GeneralSecurityException {
        // Generate Data Encryption Key (DEK)
        byte[] dataKey = kmsClient.generateDataKey(userId);
        // Encrypt file with DEK using AES-256-GCM
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        SecretKeySpec keySpec = new SecretKeySpec(dataKey, "AES");
        byte[] iv = generateIV();
        GCMParameterSpec gcmSpec = new GCMParameterSpec(128, iv);
        cipher.init(Cipher.ENCRYPT_MODE, keySpec, gcmSpec);
        byte[] encryptedData = cipher.doFinal(fileData);
        // Encrypt DEK with Customer Master Key (CMK)
        byte[] encryptedDEK = kmsClient.encrypt(dataKey, userId);
        // Clear plaintext DEK from memory
        Arrays.fill(dataKey, (byte) 0);
        return EncryptedFile.builder()
            .encryptedData(encryptedData)
            .encryptedDEK(encryptedDEK)
            .iv(iv)
            .build();
    }

    // Decrypt file
    public byte[] decryptFile(EncryptedFile encryptedFile, String userId) throws GeneralSecurityException {
        byte[] dataKey = kmsClient.decrypt(encryptedFile.getEncryptedDEK(), userId);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        SecretKeySpec keySpec = new SecretKeySpec(dataKey, "AES");
        GCMParameterSpec gcmSpec = new GCMParameterSpec(128, encryptedFile.getIv());
        cipher.init(Cipher.DECRYPT_MODE, keySpec, gcmSpec);
        byte[] decryptedData = cipher.doFinal(encryptedFile.getEncryptedData());
        Arrays.fill(dataKey, (byte) 0);
        return decryptedData;
    }
}
Database Schema (PostgreSQL):
CREATE TABLE documents (
id VARCHAR(36) PRIMARY KEY,
name VARCHAR(255) NOT NULL,
type VARCHAR(20) NOT NULL, -- FILE, FOLDER
parent_folder_id VARCHAR(36) REFERENCES documents(id),
owner_id VARCHAR(36) NOT NULL,
s3_key VARCHAR(500),
encrypted_dek TEXT,
file_size BIGINT,
version INT DEFAULT 1,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_documents_parent ON documents (parent_folder_id);
CREATE INDEX idx_documents_owner ON documents (owner_id);
CREATE TABLE permissions (
id VARCHAR(36) PRIMARY KEY,
document_id VARCHAR(36) NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
user_id VARCHAR(36),
group_id VARCHAR(36),
permission_type VARCHAR(20) NOT NULL, -- READ, WRITE, DELETE, SHARE, OWNER
granted_by VARCHAR(36) NOT NULL,
granted_at TIMESTAMP NOT NULL DEFAULT NOW(),
CONSTRAINT user_or_group CHECK (
(user_id IS NOT NULL AND group_id IS NULL) OR (user_id IS NULL AND group_id IS NOT NULL)
)
);
CREATE INDEX idx_permissions_doc_user ON permissions (document_id, user_id);
CREATE TABLE audit_logs (
id BIGSERIAL,
user_id VARCHAR(36) NOT NULL,
document_id VARCHAR(36) NOT NULL,
action VARCHAR(50) NOT NULL, -- VIEW, DOWNLOAD, UPLOAD, DELETE, SHARE
ip_address INET,
timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, timestamp) -- Postgres requires the partition key in the PK
) PARTITION BY RANGE (timestamp);
6. Design a container image storage system similar to Amazon ECR
Difficulty Level: Hard
Engineering Level: Senior Software Engineer
Source: Reddit r/developersIndia (September 2024)
Team: Walmart Global Tech - Infrastructure
Interview Round: System Design Round 3 (conducted by Staff Engineer)
Question: “Design a container image storage and distribution system like Amazon Elastic Container Registry (ECR). The system must handle container image layers, implement deduplication to save storage, support versioning and tagging, provide fast image pull/push operations at scale, implement access control and vulnerability scanning, and optimize for both regional and global distribution.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Push/pull container images (Docker, OCI format)
- Image layer deduplication (same layers shared across images)
- Tagging and versioning (latest, v1.0, etc.)
- Registry API compatibility (Docker Registry v2)
- Vulnerability scanning integration
- Multi-region distribution
Non-Functional Requirements:
- Scale: 100K images, 1M layers, 1PB storage
- Latency: <2s image push, <5s pull (depending on size)
- Availability: 99.99% uptime
- Cost: ~$25K/month (S3, compute, data transfer)
Key Design Decisions:
- Storage: Content-addressable storage using SHA256 hashing (see the sketch after this list)
- Deduplication: Layer-level deduplication (save 60-70% storage)
- Distribution: Multi-region S3 + CloudFront CDN
- Access Control: IAM integration with registry tokens
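The content-addressing decision in miniature (a Java sketch to match the other examples in this write-up; the registry code below does the same thing in Go): because the key is a pure function of the bytes, identical layers collapse to a single stored object.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class ContentAddress {
    // A layer shared by 100 images hashes to the same digest, so it is stored
    // exactly once under layers/<digest>.
    public static String digestKey(byte[] layerBytes) throws NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(layerBytes);
        return "layers/sha256:" + HexFormat.of().formatHex(hash);
    }
}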
System Architecture
High-Level Design:
┌────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Docker CLI] [K8s] [CI/CD Pipelines] │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ API GATEWAY + AUTH │
│ Registry API v2 | Token Auth | Rate Limiting │
└────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Registry │ │ Scan │ │ Replica │
│ Service │ │ Service │ │ Service │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ [S3 - Content-Addressable Storage] [RDS - Metadata] │
│ [Redis - Cache] [CloudFront - CDN] │
└────────────────────────────────────────────────────────┘
Scalability & Performance:
- Content Addressing: Store layers by SHA256 digest for deduplication
- CDN: CloudFront for fast global pulls
- Garbage Collection: Remove unreferenced layers periodically
- Multi-region: S3 cross-region replication for HA
Code
Registry Service (Go):
package registry

import (
    "bytes"
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "io"
    "time"

    "github.com/google/uuid"
)

type RegistryService struct {
    s3Client     *S3Client
    metadataRepo *MetadataRepository
    cache        *RedisCache
}

// countingReader tracks bytes read so the layer size can be recorded.
type countingReader struct {
    r io.Reader
    n int64
}

func (c *countingReader) Read(p []byte) (int, error) {
    n, err := c.r.Read(p)
    c.n += int64(n)
    return n, err
}

// PushLayer stores a layer under a key derived from its digest (content-addressable storage).
func (r *RegistryService) PushLayer(imageRepo string, layerData io.Reader) (*LayerResponse, error) {
    // Calculate the SHA256 digest and size while streaming to S3
    hash := sha256.New()
    counted := &countingReader{r: io.TeeReader(layerData, hash)}

    tempKey := fmt.Sprintf("temp/%s", uuid.New().String())
    if err := r.s3Client.Upload(tempKey, counted); err != nil {
        return nil, err
    }

    digest := "sha256:" + hex.EncodeToString(hash.Sum(nil))
    layerKey := fmt.Sprintf("layers/%s", digest)

    // Deduplication: only move the upload if this digest is new
    if !r.layerExists(layerKey) {
        if err := r.s3Client.Move(tempKey, layerKey); err != nil {
            return nil, err
        }
    } else {
        r.s3Client.Delete(tempKey) // layer already exists
    }

    layer := &Layer{
        Digest:     digest,
        Repository: imageRepo,
        Size:       counted.n,
        UploadedAt: time.Now(),
    }
    r.metadataRepo.SaveLayer(layer)
    return &LayerResponse{Digest: digest, Location: layerKey}, nil
}

// PullLayer returns a short-lived CDN URL for a layer.
func (r *RegistryService) PullLayer(digest string) (*LayerData, error) {
    cacheKey := fmt.Sprintf("layer:%s", digest)
    if cached, err := r.cache.Get(cacheKey); err == nil {
        return cached, nil
    }
    layerKey := fmt.Sprintf("layers/%s", digest)
    url, err := r.generateCDNUrl(layerKey)
    if err != nil {
        return nil, err
    }
    layerData := &LayerData{
        Digest:      digest,
        DownloadURL: url,
        Expires:     time.Now().Add(15 * time.Minute),
    }
    // Cache for future requests
    r.cache.Set(cacheKey, layerData, 10*time.Minute)
    return layerData, nil
}

// PushManifest validates layer references, stores the manifest, and moves the tag pointer.
func (r *RegistryService) PushManifest(imageRepo, tag string, manifest *Manifest) error {
    // Validate all layers exist
    for _, layer := range manifest.Layers {
        if !r.layerExists(fmt.Sprintf("layers/%s", layer.Digest)) {
            return fmt.Errorf("layer %s not found", layer.Digest)
        }
    }
    // Calculate the manifest digest
    manifestJSON, _ := json.Marshal(manifest)
    sum := sha256.Sum256(manifestJSON)
    manifestDigest := "sha256:" + hex.EncodeToString(sum[:])

    // Store the manifest
    manifestKey := fmt.Sprintf("manifests/%s/%s", imageRepo, manifestDigest)
    if err := r.s3Client.Upload(manifestKey, bytes.NewReader(manifestJSON)); err != nil {
        return err
    }
    // Update the tag pointer
    return r.metadataRepo.UpdateTag(imageRepo, tag, manifestDigest)
}

// GarbageCollect removes layers no manifest references anymore.
func (r *RegistryService) GarbageCollect() error {
    allLayers, err := r.metadataRepo.GetAllLayerDigests()
    if err != nil {
        return err
    }
    // Collect referenced layers from all manifests
    referencedLayers := make(map[string]bool)
    manifests, _ := r.metadataRepo.GetAllManifests()
    for _, manifest := range manifests {
        for _, layer := range manifest.Layers {
            referencedLayers[layer.Digest] = true
        }
    }
    // Delete unreferenced layers
    for _, layerDigest := range allLayers {
        if !referencedLayers[layerDigest] {
            r.s3Client.Delete(fmt.Sprintf("layers/%s", layerDigest))
            r.metadataRepo.DeleteLayer(layerDigest)
        }
    }
    return nil
}
Database Schema (PostgreSQL):
CREATE TABLE repositories (
id VARCHAR(36) PRIMARY KEY,
name VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE TABLE layers (
digest VARCHAR(71) PRIMARY KEY, -- sha256:... (71 chars)
size BIGINT NOT NULL,
media_type VARCHAR(100),
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE TABLE manifests (
digest VARCHAR(71) PRIMARY KEY,
repository_id VARCHAR(36) NOT NULL REFERENCES repositories(id),
manifest_json JSONB NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_manifests_repo ON manifests (repository_id);
CREATE TABLE tags (
id BIGSERIAL PRIMARY KEY,
repository_id VARCHAR(36) NOT NULL REFERENCES repositories(id),
tag_name VARCHAR(128) NOT NULL,
manifest_digest VARCHAR(71) NOT NULL REFERENCES manifests(digest),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
CONSTRAINT unique_repo_tag UNIQUE (repository_id, tag_name)
);
CREATE TABLE layer_references (
manifest_digest VARCHAR(71) NOT NULL REFERENCES manifests(digest) ON DELETE CASCADE,
layer_digest VARCHAR(71) NOT NULL REFERENCES layers(digest),
layer_index INT NOT NULL,
PRIMARY KEY (manifest_digest, layer_digest)
);
CREATE INDEX idx_layer_refs_layer ON layer_references (layer_digest);
7. Design a product recommendation engine at scale for e-commerce
Difficulty Level: Extreme
Engineering Level: Senior Engineer, Staff Engineer, ML Engineers
Source: SystemDesignHandbook.com (November 2024), Emerj AI interview with Walmart (January 2023)
Team: E-commerce, Data & Analytics, ML Engineering
Interview Round: System Design Round
Question: “Design Walmart’s product recommendation system that personalizes suggestions for millions of users across billions of products. The system must implement collaborative filtering (users who bought X also bought Y), content-based filtering (based on product attributes), hybrid approaches combining multiple techniques, handle cold start problems for new users/products, support real-time personalization based on current session, integrate A/B testing framework, and scale to handle Walmart’s traffic.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Recommend products based on user history and behavior
- Support multiple recommendation types (you may like, frequently bought together, trending)
- Real-time personalization during browsing session
- A/B testing for recommendation algorithms
- Handle cold start (new users/products)
Non-Functional Requirements:
- Scale: 270M weekly users, billions of products, 100M recommendations/day
- Latency: <100ms for recommendation generation
- Accuracy: recommendation-model accuracy >95%, conversion lift >10%
- Cost: ~$60K/month (compute, storage, ML infrastructure)
Key Design Decisions:
- Hybrid Approach: Collaborative filtering + content-based + contextual signals
- Architecture: Batch processing (Spark) + real-time serving (low-latency API)
- Cold Start: Content-based fallback + popularity-based recommendations
- A/B Testing: Multi-armed bandit for algorithm selection
System Architecture
High-Level Design:
┌────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Web App] [Mobile App] [Email] │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ API GATEWAY │
│ Recommendation API | Personalization │
└────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Online │ │ Model │ │ A/B │
│ Serving │ │ Selection│ │ Testing │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────┐
│ FEATURE STORE (Redis) │
│ User Embeddings | Product Vectors│
└────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Offline │ │ Event │ │ Product │
│ Training │ │ Tracking │ │ Catalog │
│ (Spark) │ │ (Kafka) │ │ (RDS) │
└──────────┘ └──────────┘ └──────────┘
Scalability & Performance:
- Batch Processing: Nightly model training on Spark, pre-compute top-N recommendations (serving path sketched after this list)
- Real-time: Stream processing with Kafka for immediate behavior updates
- Caching: Redis for user/product embeddings, 1-hour TTL
- A/B Testing: Thompson sampling for algorithm selection
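A sketch of the batch/online split above, assuming the nightly job materializes each user's top-N list into a Redis sorted set (the key layout is an assumption): the online read is then a single ZREVRANGE.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class PrecomputedRecommendations {
    @Autowired private StringRedisTemplate redis;

    // Batch side: called once per (user, product, score) row by the nightly job.
    public void store(String userId, String productId, double score) {
        redis.opsForZSet().add("recs:user:" + userId, productId, score);
    }

    // Online side: one sorted-set read, highest scores first.
    public List<String> topN(String userId, int n) {
        Set<String> ids = redis.opsForZSet().reverseRange("recs:user:" + userId, 0, n - 1);
        return ids == null ? List.of() : new ArrayList<>(ids);
    }
}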
Code
Recommendation Service (Python + FastAPI):
from typing import List

import numpy as np
import redis
from fastapi import FastAPI

app = FastAPI()

class RecommendationService:
    # Product, ProductCatalog, and the helper methods below are assumed domain
    # classes; the sketch focuses on the hybrid scoring flow.
    def __init__(self):
        self.redis = redis.Redis()
        self.product_catalog = ProductCatalog()
        self.ab_tester = ABTestingService()

    # Get recommendations with the hybrid approach
    async def get_recommendations(self, user_id: str, context: dict) -> List[Product]:
        # Select algorithm via A/B testing
        algorithm = self.ab_tester.select_algorithm(user_id)

        # Get user embedding
        user_embedding = self.get_user_embedding(user_id)
        if user_embedding is None:
            # Cold start: use content-based + popularity
            return self.cold_start_recommendations(context)

        # Collaborative filtering
        collab_scores = self.collaborative_filtering(user_embedding)
        # Content-based filtering
        content_scores = self.content_based_filtering(user_id, context)
        # Contextual signals (time, location, device)
        contextual_boost = self.apply_contextual_signals(context)

        # Hybrid scoring: weighted blend per product id
        all_ids = set(collab_scores) | set(content_scores) | set(contextual_boost)
        final_scores = {
            pid: 0.5 * collab_scores.get(pid, 0.0)
               + 0.3 * content_scores.get(pid, 0.0)
               + 0.2 * contextual_boost.get(pid, 0.0)
            for pid in all_ids
        }

        # Get top-N products
        top_products = self.get_top_n_products(final_scores, n=20)
        # Apply business rules (inventory, profitability)
        filtered_products = self.apply_business_rules(top_products)
        # Track recommendation for the feedback loop
        self.track_recommendation(user_id, filtered_products, algorithm)
        return filtered_products[:10]

    def get_user_embedding(self, user_id: str) -> np.ndarray:
        # Try cache first
        cached = self.redis.get(f"user:embedding:{user_id}")
        if cached:
            return np.frombuffer(cached, dtype=np.float32)
        # Fall back to database
        embedding = self.load_user_embedding_from_db(user_id)
        if embedding is not None:
            self.redis.setex(
                f"user:embedding:{user_id}",
                3600,  # 1 hour TTL
                embedding.tobytes()
            )
        return embedding

    def collaborative_filtering(self, user_embedding: np.ndarray) -> dict:
        # Get similar users
        similar_users = self.find_similar_users(user_embedding, top_k=100)
        # Aggregate their purchases, weighted by similarity
        product_scores = {}
        for similar_user, similarity in similar_users:
            user_purchases = self.get_user_purchases(similar_user)
            for product_id in user_purchases:
                product_scores[product_id] = product_scores.get(product_id, 0) + similarity
        return product_scores

    def content_based_filtering(self, user_id: str, context: dict) -> dict:
        # Get the user's historical preferences
        user_preferences = self.get_user_preferences(user_id)
        # Current browsing context
        current_product_id = context.get('current_product')
        if current_product_id:
            # Find similar products
            similar_products = self.find_similar_products(current_product_id, top_k=50)
            return {p.id: p.similarity_score for p in similar_products}
        # Match user preferences with product attributes
        return self.match_preferences_to_products(user_preferences)

    def cold_start_recommendations(self, context: dict) -> List[Product]:
        # For new users: trending + context-based
        category = context.get('category')
        if category:
            # Popular products in the category
            return self.get_trending_in_category(category, limit=10)
        else:
            # Overall trending products
            return self.get_trending_products(limit=10)

    def apply_business_rules(self, products: List[Product]) -> List[Product]:
        filtered = []
        for product in products:
            # Check inventory
            if product.in_stock:
                # Boost profitable items
                if product.margin > 0.25:
                    product.score *= 1.2
                filtered.append(product)
        return sorted(filtered, key=lambda p: p.score, reverse=True)

# Offline training job (Spark); assumes a live SparkSession `spark` and
# ALS from pyspark.ml.recommendation
class ModelTrainer:
    def train_user_embeddings(self):
        # Load interaction data
        interactions = spark.read.parquet("s3://data/interactions/")
        # Matrix factorization (ALS algorithm)
        als = ALS(
            rank=128,
            maxIter=10,
            regParam=0.01,
            userCol="user_id",
            itemCol="product_id",
            ratingCol="implicit_rating"
        )
        model = als.fit(interactions)
        # Extract embeddings
        user_embeddings = model.userFactors
        product_embeddings = model.itemFactors
        # Save to the feature store
        user_embeddings.write.format("redis").save()
        product_embeddings.write.format("redis").save()
A/B Testing Service:
import numpy as np

class ABTestingService:
    def __init__(self):
        # Candidate weightings for the hybrid scorer (collab / content / context)
        self.algorithms = {
            'collab_heavy':  {'collab': 0.7, 'content': 0.2, 'context': 0.1},
            'content_heavy': {'collab': 0.3, 'content': 0.5, 'context': 0.2},
            'balanced':      {'collab': 0.5, 'content': 0.3, 'context': 0.2}
        }
        self.performance = {alg: {'successes': 0, 'trials': 0} for alg in self.algorithms}

    def select_algorithm(self, user_id: str) -> str:
        # Thompson sampling (multi-armed bandit): sample each arm's CTR from a
        # Beta posterior and serve the arm with the highest draw
        samples = {}
        for alg in self.algorithms:
            successes = self.performance[alg]['successes']
            trials = self.performance[alg]['trials']
            alpha = successes + 1              # Beta(1, 1) prior
            beta = trials - successes + 1
            samples[alg] = np.random.beta(alpha, beta)
        return max(samples, key=samples.get)

    def record_outcome(self, user_id: str, algorithm: str, clicked: bool):
        self.performance[algorithm]['trials'] += 1
        if clicked:
            self.performance[algorithm]['successes'] += 1
Database Schema (PostgreSQL):
CREATE TABLE user_interactions (
id BIGSERIAL,
user_id VARCHAR(36) NOT NULL,
product_id VARCHAR(36) NOT NULL,
interaction_type VARCHAR(20) NOT NULL, -- VIEW, CART, PURCHASE
timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, timestamp) -- Postgres requires the partition key in the PK
) PARTITION BY RANGE (timestamp);
CREATE INDEX idx_interactions_user ON user_interactions (user_id, timestamp DESC);
CREATE INDEX idx_interactions_product ON user_interactions (product_id);
CREATE TABLE recommendations_served (
id BIGSERIAL,
user_id VARCHAR(36) NOT NULL,
product_ids TEXT[] NOT NULL,
algorithm_version VARCHAR(50) NOT NULL,
context JSONB,
served_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, served_at) -- includes the range partition key
) PARTITION BY RANGE (served_at);
CREATE TABLE recommendation_feedback (
id BIGSERIAL PRIMARY KEY,
recommendation_id BIGINT NOT NULL,
clicked BOOLEAN DEFAULT FALSE,
purchased BOOLEAN DEFAULT FALSE,
feedback_at TIMESTAMP NOT NULL DEFAULT NOW()
);
8. Design Walmart’s last-mile delivery system with multi-carrier orchestration
Difficulty Level: Extreme
Engineering Level: Senior Engineer, Staff Engineer, Distinguished Engineer
Source: DELIVER Event 2025 (November 2025), YouTube interview
Team: Supply Chain Technology, Walmart GoLocal, Last Mile Delivery
Interview Round: System Design/High Level Design Round
Question: “Design Walmart’s last-mile delivery orchestration platform that routes orders across 5,000+ stores using multiple delivery modes: in-house delivery vans, Spark Driver (Walmart’s gig platform), third-party delivery service providers (DSPs), and drone delivery (300,000+ deliveries to date). The system must provide real-time tracking, optimize delivery routes, handle carrier failover, scale to 100,000+ deliveries per day (1,000+ routes of ~100 stops each), achieve 98%+ on-time delivery rate, and support same-day, next-day, and scheduled deliveries.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Assign orders to optimal delivery carriers (in-house, gig, DSP, drone)
- Optimize delivery routes (minimize time/cost)
- Real-time tracking and ETA updates
- Handle carrier failures (automatic reassignment)
- Support multiple delivery speeds (same-day, next-day, 2-hour)
Non-Functional Requirements:
- Scale: 100K+ deliveries/day, 1K+ routes, 5K+ stores
- Latency: <500ms carrier assignment, <2s route optimization
- Availability: 99.99% uptime, carrier failover <1min
- Accuracy: 98%+ on-time delivery, <0.5% lost packages
- Cost: ~$55K/month (compute, routing API, carriers)
Key Design Decisions:
- Carrier Selection: Multi-criteria optimization (cost, ETA, capacity, reliability)
- Routing: Google Maps API + internal routing engine
- Failover: Automatic reassignment with priority queue
- Tracking: Geofencing + periodic location updates (geofence test sketched after this list)
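The geofence test behind the tracking bullet reduces to a great-circle distance check; a minimal haversine sketch (the 200 m radius is an assumption):

public class Geofence {
    private static final double EARTH_RADIUS_M = 6_371_000;

    // Great-circle distance between two lat/lng points in meters (haversine).
    public static double distanceMeters(double lat1, double lng1,
                                        double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // An "arriving" event fires when the driver enters the 200 m geofence.
    public static boolean insideGeofence(double driverLat, double driverLng,
                                         double stopLat, double stopLng) {
        return distanceMeters(driverLat, driverLng, stopLat, stopLng) < 200;
    }
}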
System Architecture
High-Level Design:
┌────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Customer App] [Driver App] [Store Associate App] │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ API GATEWAY │
│ Tracking API | Driver API | Assignment API │
└────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Order │ │ Route │ │ Tracking │
│Assignment│ │Optimizer │ │ Service │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────┐
│ MESSAGE QUEUE (Kafka) │
│ Topics: deliveries, locations │
└────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Carrier │ │ ETA │ │ Failover │
│ Integr. │ │Calculator│ │ Manager │
└──────────┘ └──────────┘ └──────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ [PostgreSQL - Orders] [Redis - Live Tracking] │
│ [Google Maps API - Routing] [MongoDB - Routes] │
└────────────────────────────────────────────────────────┘
Scalability & Performance:
- Carrier Selection: Real-time scoring algorithm considers cost, ETA, driver rating
- Routing: Batch optimization every 5 minutes, incremental updates for new orders
- Tracking: WebSocket connections for live updates, Redis pub/sub
- Failover: Automatic detection (timeout, driver decline), priority reassignment queue (see the sketch below)
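Before the full code, a brief sketch of the failover path called out above. This is a hedged illustration, not the production design: FailoverManager, the 60-second timeout, and assign_fn are assumed names. The priority queue is keyed by delivery deadline so the most urgent reassignments happen first, which is what keeps failover inside the <1 min budget:

import heapq
import time

class FailoverManager:
    ASSIGNMENT_TIMEOUT_SEC = 60  # assumed budget; tune per carrier SLA

    def __init__(self):
        # Min-heap keyed by delivery deadline: most urgent reassigned first
        self.reassignment_queue = []

    def on_driver_decline(self, delivery_id: str, deadline_ts: float):
        heapq.heappush(self.reassignment_queue, (deadline_ts, delivery_id))

    def check_timeout(self, delivery_id: str, assigned_ts: float, deadline_ts: float):
        # Automatic detection: no driver acceptance within the timeout window
        if time.time() - assigned_ts > self.ASSIGNMENT_TIMEOUT_SEC:
            heapq.heappush(self.reassignment_queue, (deadline_ts, delivery_id))

    def drain(self, assign_fn):
        # assign_fn would call back into the Order Assignment service
        while self.reassignment_queue:
            _, delivery_id = heapq.heappop(self.reassignment_queue)
            assign_fn(delivery_id)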
Code
Order Assignment Service (Java + Spring Boot):
@Service
public class OrderAssignmentService {
    @Autowired private CarrierService carrierService;
    @Autowired private RoutingService routingService;
    @Autowired private NotificationService notificationService;

    // Assign order to optimal carrier
    public AssignmentResponse assignOrder(Order order) {
        // Get available carriers near store
        List<Carrier> availableCarriers = carrierService.getAvailableCarriers(
            order.getStoreLocation(), order.getDeliveryWindow());
        if (availableCarriers.isEmpty()) {
            return AssignmentResponse.failed("No carriers available");
        }
        // Score each carrier
        List<CarrierScore> scores = availableCarriers.stream()
            .map(carrier -> scoreCarrier(carrier, order))
            .sorted(Comparator.comparingDouble(CarrierScore::getScore).reversed())
            .collect(Collectors.toList());
        // Try to assign to best carrier; fall through on declines
        for (CarrierScore score : scores) {
            AssignmentResult result = attemptAssignment(score.getCarrier(), order);
            if (result.isSuccess()) {
                // Update routing
                routingService.addStopToRoute(score.getCarrier().getRouteId(), order);
                // Send notification
                notificationService.notifyDriver(score.getCarrier().getDriverId(), order);
                return AssignmentResponse.success(result);
            }
        }
        return AssignmentResponse.failed("All carriers declined");
    }

    // Multi-criteria scoring
    private CarrierScore scoreCarrier(Carrier carrier, Order order) {
        double costScore = calculateCostScore(carrier, order);
        double etaScore = calculateETAScore(carrier, order);
        double capacityScore = calculateCapacityScore(carrier);
        double reliabilityScore = carrier.getOnTimeRate();
        // Weighted scoring (weights sum to 1.0)
        double finalScore =
            0.3 * costScore + 0.4 * etaScore + 0.2 * capacityScore + 0.1 * reliabilityScore;
        return new CarrierScore(carrier, finalScore);
    }

    private double calculateETAScore(Carrier carrier, Order order) {
        Duration eta = routingService.calculateETA(
            carrier.getCurrentLocation(), order.getDeliveryAddress());
        // Score: 1.0 if within delivery window, decreases linearly outside window
        long minutesToDeadline =
            Duration.between(Instant.now(), order.getDeliveryBy()).toMinutes();
        long etaMinutes = eta.toMinutes();
        if (etaMinutes <= minutesToDeadline) {
            return 1.0;
        }
        return Math.max(0, 1.0 - (etaMinutes - minutesToDeadline) / 60.0);
    }
}

Route Optimizer (Python):
from typing import List

from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp

class RouteOptimizer:
    def __init__(self):
        self.google_maps = GoogleMapsClient()

    # Optimize routes for all carriers
    def optimize_routes(self, orders: List[Order], carriers: List[Carrier]) -> List[Route]:
        # Group orders by store/zone
        order_clusters = self.cluster_orders(orders)
        optimized_routes = []
        for cluster in order_clusters:
            routes = self.solve_vrp(cluster, carriers)
            optimized_routes.extend(routes)
        return optimized_routes

    def solve_vrp(self, orders: List[Order], carriers: List[Carrier]) -> List[Route]:
        # Vehicle Routing Problem with Time Windows
        num_vehicles = len(carriers)
        depot = orders[0].store_location

        # Create routing model
        manager = pywrapcp.RoutingIndexManager(len(orders), num_vehicles, depot)
        routing = pywrapcp.RoutingModel(manager)

        # Distance callback
        def distance_callback(from_index, to_index):
            from_node = manager.IndexToNode(from_index)
            to_node = manager.IndexToNode(to_index)
            return self.get_distance(orders[from_node], orders[to_node])

        transit_callback_index = routing.RegisterTransitCallback(distance_callback)
        routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)

        # Add capacity constraints
        def demand_callback(from_index):
            from_node = manager.IndexToNode(from_index)
            return orders[from_node].weight

        demand_callback_index = routing.RegisterUnaryTransitCallback(demand_callback)
        routing.AddDimensionWithVehicleCapacity(
            demand_callback_index,
            0,  # null capacity slack
            [carrier.capacity for carrier in carriers],
            True,  # start cumul to zero
            'Capacity'
        )

        # Add time window constraints
        def time_callback(from_index, to_index):
            from_node = manager.IndexToNode(from_index)
            to_node = manager.IndexToNode(to_index)
            return self.get_travel_time(orders[from_node], orders[to_node])

        time_callback_index = routing.RegisterTransitCallback(time_callback)
        routing.AddDimension(
            time_callback_index,
            30,     # allow waiting time
            180,    # maximum time per vehicle (3 hours)
            False,  # don't force start cumul to zero
            'Time'
        )
        time_dimension = routing.GetDimensionOrDie('Time')
        for i, order in enumerate(orders):
            index = manager.NodeToIndex(i)
            time_dimension.CumulVar(index).SetRange(
                order.delivery_window_start,
                order.delivery_window_end
            )

        # Solve
        search_parameters = pywrapcp.DefaultRoutingSearchParameters()
        search_parameters.first_solution_strategy = (
            routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC
        )
        search_parameters.local_search_metaheuristic = (
            routing_enums_pb2.LocalSearchMetaheuristic.GUIDED_LOCAL_SEARCH
        )
        search_parameters.time_limit.seconds = 30
        solution = routing.SolveWithParameters(search_parameters)
        return self.extract_routes(solution, routing, manager, orders, carriers)

Tracking Service (Real-time):
@Service
public class TrackingService {
    @Autowired private RedisTemplate<String, Location> redisTemplate;
    @Autowired private SimpMessagingTemplate websocketTemplate;

    // Update driver location
    public void updateDriverLocation(String driverId, Location location) {
        // Store in Redis
        redisTemplate.opsForValue().set(
            "driver:location:" + driverId, location, 5, TimeUnit.MINUTES);
        // Get orders assigned to this driver
        List<Order> assignedOrders = getAssignedOrders(driverId);
        // Update ETA for each order
        for (Order order : assignedOrders) {
            Duration newETA = calculateETA(location, order.getDeliveryAddress());
            order.setEstimatedDelivery(Instant.now().plus(newETA));
            // Push update to customer via WebSocket
            websocketTemplate.convertAndSend(
                "/topic/order/" + order.getId(),
                new TrackingUpdate(driverId, location, newETA));
            // Check geofence (delivery imminent)
            if (isNearDestination(location, order.getDeliveryAddress())) {
                notificationService.sendArrivalNotification(order.getCustomerId());
            }
        }
    }

    private boolean isNearDestination(Location current, Address destination) {
        double distance = calculateDistance(current, destination);
        return distance < 500; // 500 meters
    }
}

Database Schema (PostgreSQL):
CREATE TABLE deliveries (
id VARCHAR(36) PRIMARY KEY,
order_id VARCHAR(36) NOT NULL,
store_id VARCHAR(36) NOT NULL,
carrier_type VARCHAR(20) NOT NULL, -- IN_HOUSE, SPARK, DSP, DRONE
driver_id VARCHAR(36),
status VARCHAR(20) NOT NULL, -- ASSIGNED, IN_TRANSIT, DELIVERED, FAILED
assigned_at TIMESTAMP,
delivered_at TIMESTAMP,
estimated_delivery TIMESTAMP
);
CREATE INDEX idx_deliveries_driver ON deliveries (driver_id, status);
CREATE INDEX idx_deliveries_status ON deliveries (status, assigned_at);
CREATE TABLE delivery_routes (
id VARCHAR(36) PRIMARY KEY,
driver_id VARCHAR(36) NOT NULL,
route_date DATE NOT NULL,
stops JSONB NOT NULL, -- Array of delivery IDs with sequence
optimized_at TIMESTAMP NOT NULL,
total_distance_km DECIMAL(10,2),
estimated_duration_min INT
);
CREATE TABLE carrier_performance (
carrier_id VARCHAR(36) PRIMARY KEY,
on_time_rate DECIMAL(5,4),
avg_delivery_time_min INT,
total_deliveries INT,
last_updated TIMESTAMP NOT NULL DEFAULT NOW()
);

9. Design an efficient tag management system for multi-million product catalog
Difficulty Level: Hard
Engineering Level: Mid-Level Engineer, Senior Engineer
Source: Reddit r/developersIndia (November 2024)
Team: Product Catalog Engineering
Interview Round: System Design Round
Question: “Design an efficient tag management system for Walmart’s product catalog with billions of products. The system must support flexible tagging (categories, attributes, promotions, seasonal), efficient search and filtering (find all products tagged ‘organic’ AND ‘gluten-free’), tag hierarchies (Electronics > Mobile > Smartphones), bulk tag operations, and ensure consistency across distributed systems. Handle 10M+ tag updates per day while maintaining <100ms search latency.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Add/update/delete tags on products
- Hierarchical tags (parent-child relationships)
- Multi-tag search (AND, OR, NOT operations)
- Bulk tag operations (tag 10K products at once)
- Tag analytics (most used tags, trending)
Non-Functional Requirements:
- Scale: Billions of products, 100K+ unique tags, 10M updates/day
- Latency: <100ms tag search, <500ms bulk operations
- Consistency: Eventually consistent, no lost updates
- Cost: ~$15K/month (DB, cache, search)
Key Design Decisions:
- Storage: PostgreSQL for tags + Elasticsearch for search
- Indexing: Inverted index for fast tag lookups
- Hierarchies: Materialized path pattern for efficient traversal (see the sketch after this list)
- Bulk Ops: Async processing via message queue
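To make the materialized-path item above concrete, here is a small, pure-Python illustration (hypothetical data; in production this is a SQL prefix query against the tags.path column defined in the schema later in this answer). The point is that a subtree lookup becomes a string-prefix match instead of a recursive join:

# Each tag stores its full ancestry as a path string.
tags = {
    'electronics': '/electronics/',
    'mobile': '/electronics/mobile/',
    'smartphones': '/electronics/mobile/smartphones/',
    'grocery': '/grocery/',
}

def descendants(parent_slug: str) -> list:
    prefix = tags[parent_slug]
    return [slug for slug, path in tags.items()
            if path.startswith(prefix) and slug != parent_slug]

# SQL equivalent: SELECT * FROM tags WHERE path LIKE '/electronics/%'
print(descendants('electronics'))  # ['mobile', 'smartphones']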
System Architecture
High-Level Design:
┌────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Catalog Manager] [Search API] [Bulk Import Tool] │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ API GATEWAY │
│ Tag API | Search API | Bulk API │
└────────────────────────────────────────────────────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Tag │ │ Search │ │ Bulk │
│ Service │ │ Service │ │Processor │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└─────────────────┼─────────────────┘
▼
┌────────────────────────────────────┐
│ MESSAGE QUEUE (Kafka) │
│ Topics: tag-updates, bulk-ops │
└────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ [PostgreSQL - Tags/Products] [Elasticsearch - Search] │
│ [Redis - Cache] [S3 - Bulk Import Files] │
└────────────────────────────────────────────────────────┘
Scalability & Performance:
- Caching: Redis for frequently accessed tags, 1-hour TTL
- Search: Elasticsearch with custom analyzers for tag matching
- Bulk Operations: Async processing with progress tracking
- Consistency: Event sourcing for tag changes, idempotent operations (see the sketch below)
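A sketch of the idempotency guard mentioned above (names are illustrative; a production version would keep processed IDs in Redis or behind a DB unique constraint rather than in process memory). Each tag event carries a unique event_id, so a Kafka redelivery is detected and skipped instead of double-applying the change:

processed_event_ids = set()  # illustration only; use Redis SETNX or a unique key in production

def handle_tag_event(event: dict, apply_change) -> bool:
    event_id = event['event_id']
    if event_id in processed_event_ids:
        return False  # duplicate delivery: already applied, safely skipped
    apply_change(event['product_id'], event['tag_id'], event['op'])
    processed_event_ids.add(event_id)
    return True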
Code
Tag Service (Java + Spring Boot):
@Service
public class TagService {
    @Autowired private TagRepository tagRepo;
    @Autowired private ProductTagRepository productTagRepo;
    @Autowired private ElasticsearchTemplate esTemplate;
    @Autowired private RedisTemplate<String, Object> redisTemplate;
    @Autowired private KafkaTemplate<String, Object> kafkaTemplate;

    // Add tag to product
    @Transactional
    public void addTagToProduct(String productId, String tagId) {
        Tag tag = tagRepo.findById(tagId)
            .orElseThrow(() -> new NotFoundException("Tag not found"));
        // Add tag and all parent tags (hierarchy)
        List<Tag> tagsToAdd = getTagHierarchy(tag);
        for (Tag t : tagsToAdd) {
            ProductTag productTag = ProductTag.builder()
                .productId(productId)
                .tagId(t.getId())
                .createdAt(Instant.now())
                .build();
            productTagRepo.save(productTag);
        }
        // Update search index
        esTemplate.update(productId, doc -> doc.addTags(tagsToAdd));
        // Invalidate cache
        redisTemplate.delete("product:tags:" + productId);
        // Publish event
        kafkaTemplate.send("tag-updates", TagEvent.added(productId, tagId));
    }

    // Get tag hierarchy (including parents)
    private List<Tag> getTagHierarchy(Tag tag) {
        List<Tag> hierarchy = new ArrayList<>();
        Tag current = tag;
        while (current != null) {
            hierarchy.add(current);
            current = current.getParentId() != null
                ? tagRepo.findById(current.getParentId()).orElse(null)
                : null;
        }
        return hierarchy;
    }

    // Search products by tags (AND/OR operations)
    public List<Product> searchByTags(List<String> tagIds, TagOperator operator) {
        String cacheKey = "search:tags:" + String.join(",", tagIds) + ":" + operator;
        List<Product> cached = (List<Product>) redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) return cached;
        // Build Elasticsearch query
        BoolQueryBuilder query = QueryBuilders.boolQuery();
        if (operator == TagOperator.AND) {
            for (String tagId : tagIds) {
                query.must(QueryBuilders.termQuery("tags.id", tagId));
            }
        } else if (operator == TagOperator.OR) {
            BoolQueryBuilder orQuery = QueryBuilders.boolQuery();
            for (String tagId : tagIds) {
                orQuery.should(QueryBuilders.termQuery("tags.id", tagId));
            }
            query.must(orQuery);
        }
        SearchHits<Product> results = esTemplate.search(
            new NativeSearchQueryBuilder()
                .withQuery(query)
                .withPageable(PageRequest.of(0, 100))
                .build(),
            Product.class);
        List<Product> products = results.stream()
            .map(SearchHit::getContent)
            .collect(Collectors.toList());
        // Cache results
        redisTemplate.opsForValue().set(cacheKey, products, 10, TimeUnit.MINUTES);
        return products;
    }

    // Bulk tag operation: enqueue and return immediately
    @Async
    public CompletableFuture<BulkOperationResult> bulkAddTags(
            List<String> productIds, List<String> tagIds) {
        String operationId = UUID.randomUUID().toString();
        // Publish to Kafka for async processing
        BulkTagOperation operation = BulkTagOperation.builder()
            .id(operationId)
            .productIds(productIds)
            .tagIds(tagIds)
            .operation(BulkOperation.ADD)
            .status(OperationStatus.PENDING)
            .createdAt(Instant.now())
            .build();
        kafkaTemplate.send("bulk-tag-ops", operation);
        return CompletableFuture.completedFuture(
            BulkOperationResult.pending(operationId));
    }
}

enum TagOperator { AND, OR, NOT }
enum BulkOperation { ADD, REMOVE, REPLACE }
enum OperationStatus { PENDING, IN_PROGRESS, COMPLETED, FAILED }

Bulk Tag Processor (Kafka Consumer):
@Service
public class BulkTagProcessor {
    @Autowired private TagService tagService;
    @Autowired private BulkOperationRepository bulkOpRepo;

    @KafkaListener(topics = "bulk-tag-ops", concurrency = "5")
    public void processBulkOperation(BulkTagOperation operation) {
        operation.setStatus(OperationStatus.IN_PROGRESS);
        bulkOpRepo.save(operation);
        int processed = 0;
        int failed = 0;
        try {
            // Process in batches of 1000
            for (int i = 0; i < operation.getProductIds().size(); i += 1000) {
                int end = Math.min(i + 1000, operation.getProductIds().size());
                List<String> batch = operation.getProductIds().subList(i, end);
                for (String productId : batch) {
                    try {
                        for (String tagId : operation.getTagIds()) {
                            if (operation.getOperation() == BulkOperation.ADD) {
                                tagService.addTagToProduct(productId, tagId);
                            } else if (operation.getOperation() == BulkOperation.REMOVE) {
                                tagService.removeTagFromProduct(productId, tagId);
                            }
                        }
                        processed++;
                    } catch (Exception e) {
                        failed++;
                        log.error("Failed to process product {}: {}", productId, e.getMessage());
                    }
                }
                // Update progress
                operation.setProgress((double) processed / operation.getProductIds().size());
                bulkOpRepo.save(operation);
            }
            operation.setStatus(OperationStatus.COMPLETED);
        } catch (Exception e) {
            operation.setStatus(OperationStatus.FAILED);
            operation.setErrorMessage(e.getMessage());
        }
        operation.setProcessedCount(processed);
        operation.setFailedCount(failed);
        operation.setCompletedAt(Instant.now());
        bulkOpRepo.save(operation);
    }
}

Database Schema (PostgreSQL):
-- Tags table with materialized path for hierarchy
CREATE TABLE tags (
id VARCHAR(36) PRIMARY KEY,
name VARCHAR(100) NOT NULL,
slug VARCHAR(100) NOT NULL UNIQUE,
parent_id VARCHAR(36) REFERENCES tags(id),
path VARCHAR(500), -- Materialized path (e.g., "/electronics/mobile/")
level INT NOT NULL DEFAULT 0,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_tags_path ON tags USING GIN (path gin_trgm_ops);
CREATE INDEX idx_tags_parent ON tags (parent_id);
-- Product-Tag junction table
CREATE TABLE product_tags (
product_id VARCHAR(36) NOT NULL,
tag_id VARCHAR(36) NOT NULL REFERENCES tags(id),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (product_id, tag_id)
);
CREATE INDEX idx_product_tags_tag ON product_tags (tag_id);
-- Bulk operations tracking
CREATE TABLE bulk_tag_operations (
id VARCHAR(36) PRIMARY KEY,
operation_type VARCHAR(20) NOT NULL,
status VARCHAR(20) NOT NULL,
total_count INT NOT NULL,
processed_count INT DEFAULT 0,
failed_count INT DEFAULT 0,
progress DECIMAL(5,4) DEFAULT 0,
error_message TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
completed_at TIMESTAMP
);
-- Tag analytics
CREATE TABLE tag_usage_stats (
tag_id VARCHAR(36) PRIMARY KEY REFERENCES tags(id),
product_count INT NOT NULL DEFAULT 0,
search_count INT NOT NULL DEFAULT 0,
last_used TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

Elasticsearch Index Mapping:
{
  "mappings": {
    "properties": {
      "product_id": { "type": "keyword" },
      "name": { "type": "text" },
      "tags": {
        "type": "nested",
        "properties": {
          "id": { "type": "keyword" },
          "name": { "type": "text" },
          "slug": { "type": "keyword" },
          "path": { "type": "keyword" }
        }
      }
    }
  }
}

10. Design Walmart’s omnichannel architecture supporting online, in-store, and mobile
Difficulty Level: Extreme
Engineering Level: Staff Engineer, Distinguished Engineer, Principal Architect
Source: Glassdoor Walmart interview experiences (2024-2025), Architecture Summit talks
Team: Enterprise Architecture, Platform Engineering
Interview Round: System Design Round (senior roles)
Question: “Design Walmart’s omnichannel platform architecture that seamlessly integrates online shopping (Walmart.com, app), in-store experiences (POS, kiosks, Scan & Go), BOPIS (Buy Online Pickup In-Store), curbside pickup, and delivery services. The system must maintain consistent inventory visibility, unified customer profiles, synchronized pricing and promotions, order fulfillment across channels, and real-time data synchronization. Handle 5,000+ physical stores, 270M+ weekly customers, peak traffic of 1M concurrent users, and ensure 99.99% uptime across all channels.”
Answer Framework
Requirements Clarification
Functional Requirements:
- Unified product catalog across all channels
- Real-time inventory sync (online, in-store, warehouse)
- Unified customer profile and cart across devices
- Order fulfillment options (ship, pickup, delivery)
- Consistent pricing and promotions
- Support BOPIS, curbside pickup, Scan & Go
Non-Functional Requirements:
- Scale: 5K+ stores, 270M+ users, 1M concurrent, 100M orders/day
- Latency: <200ms API responses, <1s inventory sync
- Availability: 99.99% uptime, zero downtime deployments
- Consistency: Strong consistency for orders, eventual for inventory
- Cost: ~$200K/month (cloud, data sync, CDN)
Key Design Decisions:
- Architecture: Event-driven microservices with domain-driven design
- Data Sync: CDC (Change Data Capture) + event streaming (Kafka)
- API Gateway: GraphQL federation for unified API layer
- Consistency: SAGA pattern for distributed transactions
System Architecture
High-Level Design:
┌────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ [Walmart.com] [Mobile App] [In-Store POS] [Scan & Go] │
└────────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────┐
│ API GATEWAY (GraphQL Federation) │
│ Authentication | Rate Limiting | Request Routing │
└────────────────────────────────────────────────────────────────┘
│
┌───────────────────────┼───────────────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Product │ │ Customer │ │ Order │
│ Service │ │ Service │ │ Service │
└────────────┘ └────────────┘ └────────────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Inventory │ │ Cart │ │ Fulfillment│
│ Service │ │ Service │ │ Service │
└────────────┘ └────────────┘ └────────────┘
│
▼
┌───────────────────────────────────────────────┐
│ EVENT STREAMING (Kafka) │
│ Topics: inventory, orders, customers, carts │
└───────────────────────────────────────────────┘
│
┌───────────────────────┼───────────────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│Store POS │ │ Warehouse │ │ Analytics │
│Integration │ │Management │ │ Service │
└────────────┘ └────────────┘ └────────────┘
│
▼
┌────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ [Product DB] [Customer DB] [Order DB] [Inventory DB] │
│ [Redis Cache] [Elasticsearch] [Data Warehouse] │
└────────────────────────────────────────────────────────────────┘
Scalability & Performance:
- Microservices: Domain-driven, independently deployable services
- Event Streaming: Kafka for async communication, CDC for data sync
- Caching: Multi-layer (CDN, API Gateway, Redis, App Cache); see the read-path sketch after this list
- Database: Polyglot persistence (PostgreSQL, MongoDB, Cassandra)
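A hedged sketch of the multi-layer read path from the caching item above (client names, TTLs, and serialization are illustrative assumptions): an in-process cache absorbs the hottest keys, Redis absorbs the rest, and the database is only hit on a double miss, with the cache-aside fill writing back to Redis:

import time

local_cache = {}   # key -> (fetched_at, value); per-instance hot cache
LOCAL_TTL_SEC = 30

def get_product(product_id: str, redis_client, db) -> str:
    key = f"product:{product_id}"
    # Layer 1: in-process cache
    entry = local_cache.get(key)
    if entry and time.time() - entry[0] < LOCAL_TTL_SEC:
        return entry[1]
    # Layer 2: Redis
    value = redis_client.get(key)
    if value is None:
        # Layer 3: database, then cache-aside fill (5-min Redis TTL)
        value = db.fetch_product_json(product_id)
        redis_client.setex(key, 300, value)
    local_cache[key] = (time.time(), value)
    return value

CDN and API-gateway caching sit in front of this path for anonymous, cacheable responses.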
Code
Order Orchestration Service (Java + Spring Boot):
@Service
public class OrderOrchestrationService {
    @Autowired private InventoryService inventoryService;
    @Autowired private PaymentService paymentService;
    @Autowired private FulfillmentService fulfillmentService;

    // Place order using SAGA pattern
    @Transactional
    public OrderResponse placeOrder(OrderRequest request) {
        String orderId = UUID.randomUUID().toString();
        Order order = Order.builder()
            .id(orderId)
            .customerId(request.getCustomerId())
            .items(request.getItems())
            .fulfillmentType(request.getFulfillmentType()) // SHIP, PICKUP, DELIVERY
            .status(OrderStatus.PENDING)
            .build();
        orderRepository.save(order);
        try {
            // Step 1: Reserve inventory
            InventoryReservation reservation = inventoryService.reserveItems(
                order.getItems(), order.getStoreId());
            order.setReservationId(reservation.getId()); // recorded for compensation
            // Step 2: Process payment
            PaymentResult payment = paymentService.processPayment(
                order.getId(), order.getTotalAmount(), request.getPaymentMethod());
            order.setPaymentId(payment.getPaymentId()); // recorded for compensation
            // Step 3: Create fulfillment
            FulfillmentRequest fulfillment = fulfillmentService.createFulfillment(
                order.getId(), order.getFulfillmentType(), order.getStoreId());
            order.setFulfillmentId(fulfillment.getId()); // recorded for compensation
            // Confirm order
            order.setStatus(OrderStatus.CONFIRMED);
            orderRepository.save(order);
            kafkaTemplate.send("orders", OrderEvent.confirmed(order));
            return OrderResponse.success(orderId);
        } catch (Exception e) {
            // SAGA compensation (roll back completed steps in reverse)
            compensateOrder(order, e);
            return OrderResponse.failed(e.getMessage());
        }
    }

    private void compensateOrder(Order order, Exception error) {
        if (order.getFulfillmentId() != null) {
            fulfillmentService.cancelFulfillment(order.getFulfillmentId());
        }
        if (order.getPaymentId() != null) {
            paymentService.refundPayment(order.getPaymentId());
        }
        if (order.getReservationId() != null) {
            inventoryService.releaseReservation(order.getReservationId());
        }
        order.setStatus(OrderStatus.FAILED);
        orderRepository.save(order);
    }
}

Unified Cart Service (synced across channels):
@Service
public class CartService {
    @Autowired private RedisTemplate<String, Cart> redisTemplate;

    // Add item to cart (works across all channels)
    public Cart addItem(String userId, String channel, CartItem item) {
        String cartKey = "cart:" + userId;
        Cart cart = redisTemplate.opsForValue().get(cartKey);
        if (cart == null) {
            cart = cartRepo.findByUserId(userId)
                .orElse(Cart.builder().userId(userId).items(new ArrayList<>()).build());
        }
        // Check inventory
        boolean available = inventoryService.checkAvailability(
            item.getProductId(), item.getQuantity(), channel);
        if (!available) {
            throw new OutOfStockException("Product not available");
        }
        cart.addItem(item);
        cart.setLastModified(Instant.now());
        // Update Redis
        redisTemplate.opsForValue().set(cartKey, cart, 7, TimeUnit.DAYS);
        // Async persist to DB
        cartRepo.save(cart);
        // Publish cart update event
        kafkaTemplate.send("cart-updates", CartEvent.itemAdded(userId, item, channel));
        return cart;
    }

    // Sync cart across devices
    @KafkaListener(topics = "cart-updates")
    public void syncCart(CartEvent event) {
        websocketTemplate.convertAndSend("/topic/cart/" + event.getUserId(), event);
    }
}

Inventory Sync Service (CDC + Kafka):
@Service
public class InventorySyncService {
    // Listen to inventory changes from stores (CDC)
    @KafkaListener(topics = "store-inventory-cdc", concurrency = "10")
    public void processInventoryChange(InventoryChangeEvent event) {
        Inventory inventory = inventoryRepo.findByProductAndStore(
            event.getProductId(), event.getStoreId());
        inventory.setQuantity(event.getNewQuantity());
        inventory.setLastUpdated(Instant.now());
        inventoryRepo.save(inventory);
        // Update cache
        redisTemplate.opsForValue().set(
            "inventory:" + event.getProductId() + ":" + event.getStoreId(),
            inventory, 5, TimeUnit.MINUTES);
        // Publish to all channels
        kafkaTemplate.send("inventory-updates", InventoryEvent.updated(inventory));
        // Update online availability
        if (inventory.getQuantity() == 0) {
            productService.markOutOfStock(event.getProductId(), event.getStoreId());
        }
    }

    // Propagate to online systems
    @KafkaListener(topics = "inventory-updates")
    public void propagateToOnline(InventoryEvent event) {
        // Update Elasticsearch
        elasticsearchService.updateProductAvailability(
            event.getProductId(), event.getStoreId(), event.getQuantity());
        // Notify subscribed customers
        if (event.getQuantity() > 0) {
            notificationService.notifyBackInStock(event.getProductId(), event.getStoreId());
        }
    }
}

Database Schema (PostgreSQL):
-- Orders table
CREATE TABLE orders (
id VARCHAR(36) PRIMARY KEY,
customer_id VARCHAR(36) NOT NULL,
store_id VARCHAR(36),
fulfillment_type VARCHAR(30) NOT NULL,
status VARCHAR(20) NOT NULL,
total_amount DECIMAL(12,2) NOT NULL,
payment_id VARCHAR(36),
created_at TIMESTAMP NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
CREATE INDEX idx_orders_customer ON orders (customer_id, created_at DESC);
-- Unified cart
CREATE TABLE carts (
id BIGSERIAL PRIMARY KEY,
user_id VARCHAR(36) NOT NULL UNIQUE,
items JSONB NOT NULL,
channel VARCHAR(20), -- WEB, MOBILE, INSTORE
last_modified TIMESTAMP NOT NULL
);

-- Multi-channel inventory
CREATE TABLE omnichannel_inventory (
id BIGSERIAL PRIMARY KEY,
product_id VARCHAR(36) NOT NULL,
store_id VARCHAR(36) NOT NULL,
online_quantity INT NOT NULL DEFAULT 0,
instore_quantity INT NOT NULL DEFAULT 0,
reserved_quantity INT NOT NULL DEFAULT 0,
last_synced TIMESTAMP NOT NULL DEFAULT NOW(),
CONSTRAINT unique_product_store UNIQUE (product_id, store_id)
);
CREATE INDEX idx_inventory_product ON omnichannel_inventory (product_id);
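Finally, a hedged sketch of how the reserved_quantity column above supports strongly consistent reservations (psycopg2-style placeholders; the function and helper names are assumptions). A single conditional UPDATE makes the check-and-reserve atomic, so concurrent BOPIS orders cannot oversell:

RESERVE_SQL = """
UPDATE omnichannel_inventory
   SET reserved_quantity = reserved_quantity + %(qty)s,
       last_synced = NOW()
 WHERE product_id = %(product_id)s
   AND store_id = %(store_id)s
   AND online_quantity - reserved_quantity >= %(qty)s
"""

def reserve_for_pickup(conn, product_id: str, store_id: str, qty: int) -> bool:
    # rowcount == 1 means the reservation succeeded; 0 means insufficient stock
    with conn.cursor() as cur:
        cur.execute(RESERVE_SQL, {'product_id': product_id,
                                  'store_id': store_id,
                                  'qty': qty})
        conn.commit()
        return cur.rowcount == 1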