Flipkart Software Engineer
This guide features 10 challenging Software Engineer interview questions for Flipkart (SDE-1 to SDE-3 levels), covering machine coding, data structures, system design, distributed systems, and backend architecture aligned with Flipkart’s e-commerce platform challenges and Big Billion Days scale.
1. Logger Library System Design
Difficulty Level: High
Role: Software Engineer / Senior Software Engineer (SDE-2/3)
Source: LinkedIn (Manish Ranjan Behera), Flipkart Machine Coding
Topic: Design Patterns & Infrastructure
Interview Round: Machine Coding Round (90 min coding + 60 min discussion)
Engineering Domain: Backend Engineering / Infrastructure
Question: “Design and implement a logger library that applications can use to log messages to multiple sinks with hierarchical message levels. Support message levels in priority order: FATAL, ERROR, WARN, INFO, DEBUG. Route messages to appropriate sinks based on configured levels. Enrich messages with additional information (timestamps, namespaces). Support multiple message level-to-sink configurations. A message level tied to a sink should log that level and all higher-priority levels.”
Answer Framework
STAR Method Structure:
- Situation: Applications need flexible logging infrastructure supporting multiple outputs (console, file, remote) with configurable verbosity
- Task: Design extensible logger with hierarchical levels, multiple sinks, message enrichment, and clean separation of concerns
- Action: Implement Chain of Responsibility pattern for level filtering, Observer pattern for sink notifications, decorator for message enrichment
- Result: Extensible logger supporting new levels/sinks without code changes, SOLID principles adherence, production-ready error handling
Key Competencies Evaluated:
- Design Patterns: Chain of Responsibility for level propagation, Observer for sink management
- SOLID Principles: Single Responsibility (separate logger, sink, formatter), Open/Closed (extensible without modification)
- Code Quality: Clean separation of concerns, proper abstraction, error handling
- Extensibility: Adding new message levels or sinks requires minimal changes
Logger Library Implementation
// Message Level Enum (Hierarchical)
enum LogLevel {
DEBUG(1), INFO(2), WARN(3), ERROR(4), FATAL(5);
private final int priority;
LogLevel(int priority) {
this.priority = priority;
}
public boolean shouldLog(LogLevel configuredLevel) {
return this.priority >= configuredLevel.priority;
}
}
// Log Message (Enriched with metadata)
class LogMessage {
private final LogLevel level;
private final String message;
private final String namespace;
private final long timestamp;
public LogMessage(LogLevel level, String message, String namespace) {
this.level = level;
this.message = message;
this.namespace = namespace;
this.timestamp = System.currentTimeMillis();
}
public String format() {
return String.format("[%s] [%s]%s:%s",
new Date(timestamp), level, namespace, message);
}
public LogLevel getLevel() { return level; }
}
// Sink Interface (Observer Pattern)
interface LogSink {
void write(LogMessage message);
LogLevel getConfiguredLevel();
}
// Console Sink Implementation
class ConsoleSink implements LogSink {
private final LogLevel configuredLevel;
public ConsoleSink(LogLevel configuredLevel) {
this.configuredLevel = configuredLevel;
}
@Override
public void write(LogMessage message) {
if (message.getLevel().shouldLog(configuredLevel)) {
System.out.println(message.format());
}
}
@Override
public LogLevel getConfiguredLevel() {
return configuredLevel;
}
}
// File Sink Implementation
class FileSink implements LogSink {
private final LogLevel configuredLevel;
private final String filePath;
private final BufferedWriter writer;
public FileSink(LogLevel configuredLevel, String filePath) throws IOException {
this.configuredLevel = configuredLevel;
this.filePath = filePath;
this.writer = new BufferedWriter(new FileWriter(filePath, true));
}
@Override
public void write(LogMessage message) {
if (message.getLevel().shouldLog(configuredLevel)) {
try {
writer.write(message.format());
writer.newLine();
writer.flush();
} catch (IOException e) {
System.err.println("Failed to write to file: " + e.getMessage());
}
}
}
@Override
public LogLevel getConfiguredLevel() {
return configuredLevel;
}
}
// Logger (Chain of Responsibility + Observer)
class Logger {
private final String namespace;
private final List<LogSink> sinks;
public Logger(String namespace) {
this.namespace = namespace;
this.sinks = new ArrayList<>();
}
public void addSink(LogSink sink) {
sinks.add(sink);
}
public void log(LogLevel level, String message) {
LogMessage logMessage = new LogMessage(level, message, namespace);
// Notify all sinks (Observer pattern)
for (LogSink sink : sinks) {
sink.write(logMessage);
}
}
// Convenience methods
public void debug(String message) { log(LogLevel.DEBUG, message); }
public void info(String message) { log(LogLevel.INFO, message); }
public void warn(String message) { log(LogLevel.WARN, message); }
public void error(String message) { log(LogLevel.ERROR, message); }
public void fatal(String message) { log(LogLevel.FATAL, message); }
}
// Usage Example
public class LoggerDemo {
public static void main(String[] args) throws IOException {
Logger logger = new Logger("com.flipkart.inventory");
// Configure sinks with different levels
logger.addSink(new ConsoleSink(LogLevel.INFO)); // Console shows INFO+
logger.addSink(new FileSink(LogLevel.DEBUG, "app.log")); // File shows DEBUG+
logger.addSink(new FileSink(LogLevel.ERROR, "error.log")); // Error file shows ERROR+
// Log messages
logger.debug("Inventory check started"); // Only in app.log
logger.info("Processing 100 items"); // Console + app.log
logger.warn("Low stock detected"); // Console + app.log
logger.error("Failed to update item"); // Console + app.log + error.log
logger.fatal("Database connection lost"); // All sinks
}
}Answer (Part 1 of 3): Design Pattern Application
Implementation uses Chain of Responsibility pattern where each log level filters messages based on priority (FATAL > ERROR > WARN > INFO > DEBUG), with shouldLog() method checking if message priority ≥ configured sink level enabling hierarchical propagation—Observer pattern manages multiple sinks where Logger maintains list of LogSink observers, log() method notifies all sinks allowing each to independently decide whether to write based on configured level, enabling flexible routing (console for INFO+, file for DEBUG+, remote for ERROR+) without tight coupling. Decorator pattern enriches messages with metadata (timestamp, namespace, level) via LogMessage class separating raw message from formatted output, demonstrating SOLID principles: Single Responsibility (Logger handles distribution, Sink handles writing, LogMessage handles formatting), Open/Closed (new sinks added without modifying Logger), Liskov Substitution (any LogSink implementation works), Interface Segregation (minimal sink interface), Dependency Inversion (Logger depends on LogSink abstraction not concrete implementations).
Answer (Part 2 of 3): Extensibility & Error Handling
Extensibility achieved through abstraction where adding new sink types (DatabaseSink, RemoteSink, CloudWatchSink) requires implementing LogSink interface without modifying Logger class, adding new log levels requires updating LogLevel enum and priority comparison logic (centralized in one place), and adding message enrichment (thread ID, stack trace, correlation ID) requires extending LogMessage class not changing Logger/Sink contracts—error handling prevents logger failures from crashing application via try-catch in FileSink.write() logging errors to stderr not throwing exceptions, graceful degradation where if one sink fails (file permission denied) other sinks continue working, and defensive programming checking null messages, validating file paths, handling IOException during file operations. Production considerations include thread safety (synchronize write() methods for concurrent logging), buffering (batch writes reducing I/O overhead), log rotation (size-based or time-based file rotation preventing unbounded growth), and performance optimization (async logging via queue + background thread preventing blocking on slow sinks).
Answer (Part 3 of 3): Production Enhancements
Production enhancements implement async logging where Logger.log() enqueues messages to BlockingQueue, background thread consumes queue writing to sinks asynchronously preventing application threads from blocking on I/O, with bounded queue size (10,000 messages) and overflow policy (drop oldest or block caller)—log rotation adds RotatingFileSink extending FileSink with size-based rotation (when file exceeds 100MB, rename to app.log.1, create new app.log) and time-based rotation (daily rotation at midnight), maintaining configurable number of backup files (keep last 7 days)—structured logging supports JSON format via JsonLogMessage extending LogMessage formatting as {“timestamp”: 1234567890, “level”: “ERROR”, “namespace”: “inventory”, “message”: “…”, “metadata”: {…}} enabling log aggregation tools (ELK, Splunk) to parse and index efficiently—sampling adds SampledSink wrapper implementing probabilistic logging (log 1% of DEBUG messages, 100% of ERROR+) reducing volume while maintaining error visibility, critical for high-throughput services generating millions of logs daily—correlation enriches messages with request ID/trace ID enabling distributed tracing across microservices, demonstrating systems thinking beyond pure coding exercise considering scalability, observability, and operational requirements.
2. Longest Valid Parentheses + Maximize Product of Worker Speeds
Difficulty Level: Very High
Role: Software Engineer (SDE-2)
Source: GeeksforGeeks Interview Experience, Flipkart DSA Round
Topic: Dynamic Programming & Greedy Algorithms
Interview Round: Problem Solving & Data Structures (60 min, 2 problems)
Engineering Domain: Backend Engineering / Algorithms
Question: “Solve two hard-level problems: (1) Longest Valid Parentheses: Given a string containing only parentheses, find the length of the longest valid (well-formed) parentheses substring. (2) Maximize Product: Given an array of worker speeds and efficiencies, select K workers to maximize the product of (sum of speeds × minimum efficiency).”
Answer Framework
STAR Method Structure:
- Situation: Technical screening requiring efficient solutions to hard algorithmic problems under time pressure
- Task: Implement optimal solutions with correct time complexity (O(n) for parentheses, O(n log n) for workers)
- Action: Use dynamic programming + stack for parentheses, sorting + greedy for worker selection
- Result: Working code for both problems demonstrating strong DSA fundamentals and optimization skills
Key Competencies Evaluated:
- Dynamic Programming: State definition, transition, optimization
- Stack-Based Algorithms: Matching parentheses, index tracking
- Greedy Algorithms: Sorting, priority selection
- Time Complexity: Achieving optimal O(n) and O(n log n) solutions
Problem 1: Longest Valid Parentheses
def longestValidParentheses(s: str) -> int:
"""
Find longest valid parentheses substring using stack.
Time: O(n), Space: O(n)
"""
stack = [-1] # Initialize with base index
max_length = 0
for i, char in enumerate(s):
if char == '(':
stack.append(i)
else: # char == ')'
stack.pop()
if not stack:
# No matching '(', push current index as new base
stack.append(i)
else:
# Calculate length from last unmatched index
max_length = max(max_length, i - stack[-1])
return max_length
# Alternative DP solution
def longestValidParenthesesDP(s: str) -> int:
"""
DP approach: dp[i] = length of longest valid ending at i
Time: O(n), Space: O(n)
"""
n = len(s)
if n < 2:
return 0
dp = [0] * n
max_length = 0
for i in range(1, n):
if s[i] == ')':
if s[i-1] == '(':
# Case: ...()
dp[i] = (dp[i-2] if i >= 2 else 0) + 2
elif i - dp[i-1] > 0 and s[i - dp[i-1] - 1] == '(':
# Case: ...))
dp[i] = dp[i-1] + 2 + (dp[i - dp[i-1] - 2] if i - dp[i-1] >= 2 else 0)
max_length = max(max_length, dp[i])
return max_length
# Test cases
print(longestValidParentheses("(()")) # Output: 2
print(longestValidParentheses(")()())")) # Output: 4
print(longestValidParentheses("")) # Output: 0Problem 2: Maximize Product of Worker Speeds
import heapq
def maxPerformance(n: int, speed: list[int], efficiency: list[int], k: int) -> int:
"""
Select k workers maximizing (sum of speeds) × (min efficiency).
Time: O(n log n), Space: O(n)
"""
MOD = 10**9 + 7
# Create (efficiency, speed) pairs and sort by efficiency descending
workers = sorted(zip(efficiency, speed), reverse=True)
max_product = 0
speed_sum = 0
min_heap = [] # Min heap to maintain k highest speeds
for eff, spd in workers:
# Add current worker's speed
heapq.heappush(min_heap, spd)
speed_sum += spd
# If we have more than k workers, remove slowest
if len(min_heap) > k:
speed_sum -= heapq.heappop(min_heap)
# Current efficiency is minimum (sorted descending)
max_product = max(max_product, speed_sum * eff)
return max_product % MOD
# Test case
n = 6
speed = [2, 10, 3, 1, 5, 8]
efficiency = [5, 4, 3, 9, 7, 2]
k = 2
print(maxPerformance(n, speed, efficiency, k)) # Output: 60 (workers 1,5: (10+5)*min(4,7)=60)Answer (Part 1 of 3): Parentheses Stack Approach
Stack solution maintains stack of indices tracking unmatched parentheses where for ‘(’ push index onto stack, for ‘)’ pop from stack (matching previous ‘(’), if stack empty after pop then current ‘)’ has no match so push its index as new base, otherwise calculate valid length as current index minus top of stack (last unmatched position)—key insight is stack top always represents last unmatched index enabling O(1) length calculation, with initialization stack=[-1] providing base for first valid substring. Time complexity O(n) single pass, space O(n) worst case all characters unmatched, demonstrating efficient solution avoiding nested loops or backtracking.
Answer (Part 2 of 3): DP Transition Logic
Dynamic programming defines dp[i] = length of longest valid parentheses ending at index i, with transition cases: if s[i]=‘(’ then dp[i]=0 (cannot end with opening), if s[i]=‘)’ and s[i-1]=‘(’ then dp[i] = dp[i-2] + 2 (matching pair plus previous valid), if s[i]=‘)’ and s[i-1]=‘)’ then check if s[i-dp[i-1]-1]=‘(’ (matching opening for current closing after nested valid substring), if match then dp[i] = dp[i-1] + 2 + dp[i-dp[i-1]-2] (nested valid + matching pair + previous valid before nested)—this handles complex cases like “()(())” where nested parentheses extend previous valid substrings, with max_length tracking global maximum across all positions.
Answer (Part 3 of 3): Worker Selection Greedy Strategy
Greedy approach sorts workers by efficiency descending ensuring when processing worker i, all previous workers have efficiency ≥ current, making current efficiency the minimum for any subset including current worker—maintains min heap of k highest speeds seen so far, for each worker add speed to heap and sum, if heap size exceeds k remove slowest worker (heap pop), calculate product speed_sum × current_efficiency (current is minimum due to sorting), track maximum product across all iterations—correctness proven by observation that optimal solution must include worker with minimum efficiency in selected set, and for that worker we want k-1 highest speeds among workers with efficiency ≥ minimum, which greedy approach guarantees by processing in descending efficiency order. Time complexity O(n log n) for sorting + O(n log k) for heap operations, space O(n) for worker pairs + O(k) for heap, demonstrating understanding of greedy algorithm design and heap-based optimization.
3. FlipMed: Doctor’s Appointment Booking System
Difficulty Level: High
Role: Software Engineer (SDE-2)
Source: YouTube (Shubh Patel), Flipkart Machine Coding Round
Topic: Low-Level Design & Strategy Pattern
Interview Round: Machine Coding Round (90 min)
Engineering Domain: Backend Engineering / Service Architecture
Question: “Design and implement a doctor appointment booking system with multiple doctors, available time slots, and dynamic sorting strategies. Core features: Register doctors with specialties and available time slots, search available doctors by specialty, book appointments with conflict detection, implement rating-based and start-time-based sorting strategies, implement waitlist feature if slots are booked, display available slots based on selected sorting strategy.”
Answer Framework
STAR Method Structure:
- Situation: Healthcare appointment system requiring flexible doctor search, conflict-free booking, and dynamic ranking
- Task: Design modular service layer with Strategy pattern for sorting, Repository pattern for data access
- Action: Implement Doctor/Patient/Slot/Appointment entities, service layer with booking logic, strategy interface for ranking algorithms
- Result: Extensible system supporting new sorting strategies without code changes, clean separation of concerns, waitlist conflict resolution
Key Competencies Evaluated:
- Design Patterns: Strategy pattern for dynamic sorting, Repository for data access
- Service Layer Architecture: Separation of business logic from data layer
- Entity Relationships: Doctor-Slot-Appointment modeling, conflict detection
- Code Quality: Minimal hardcoding, extensibility, clean abstractions
FlipMed System Implementation
// Entities
class Doctor {
private String id;
private String name;
private String specialty;
private double rating;
private List<TimeSlot> availableSlots;
public Doctor(String id, String name, String specialty) {
this.id = id;
this.name = name;
this.specialty = specialty;
this.rating = 0.0;
this.availableSlots = new ArrayList<>();
}
public void addSlot(TimeSlot slot) {
availableSlots.add(slot);
}
// Getters
public String getId() { return id; }
public String getSpecialty() { return specialty; }
public double getRating() { return rating; }
public List<TimeSlot> getAvailableSlots() { return availableSlots; }
}
class TimeSlot {
private LocalDateTime startTime;
private LocalDateTime endTime;
private boolean isBooked;
public TimeSlot(LocalDateTime startTime, LocalDateTime endTime) {
this.startTime = startTime;
this.endTime = endTime;
this.isBooked = false;
}
public boolean isAvailable() { return !isBooked; }
public void book() { this.isBooked = true; }
public LocalDateTime getStartTime() { return startTime; }
}
class Appointment {
private String appointmentId;
private Doctor doctor;
private Patient patient;
private TimeSlot slot;
public Appointment(String id, Doctor doctor, Patient patient, TimeSlot slot) {
this.appointmentId = id;
this.doctor = doctor;
this.patient = patient;
this.slot = slot;
}
}
class Patient {
private String id;
private String name;
public Patient(String id, String name) {
this.id = id;
this.name = name;
}
}
// Strategy Pattern for Doctor Sorting
interface DoctorSortStrategy {
List<Doctor> sort(List<Doctor> doctors);
}
class RatingBasedSort implements DoctorSortStrategy {
@Override
public List<Doctor> sort(List<Doctor> doctors) {
return doctors.stream()
.sorted(Comparator.comparingDouble(Doctor::getRating).reversed())
.collect(Collectors.toList());
}
}
class StartTimeBasedSort implements DoctorSortStrategy {
@Override
public List<Doctor> sort(List<Doctor> doctors) {
return doctors.stream()
.sorted((d1, d2) -> {
LocalDateTime t1 = d1.getAvailableSlots().stream()
.filter(TimeSlot::isAvailable)
.map(TimeSlot::getStartTime)
.min(LocalDateTime::compareTo)
.orElse(LocalDateTime.MAX);
LocalDateTime t2 = d2.getAvailableSlots().stream()
.filter(TimeSlot::isAvailable)
.map(TimeSlot::getStartTime)
.min(LocalDateTime::compareTo)
.orElse(LocalDateTime.MAX);
return t1.compareTo(t2);
})
.collect(Collectors.toList());
}
}
// Service Layer
class AppointmentService {
private Map<String, Doctor> doctors;
private Map<String, Appointment> appointments;
private Queue<Patient> waitlist;
private DoctorSortStrategy sortStrategy;
public AppointmentService() {
this.doctors = new HashMap<>();
this.appointments = new HashMap<>();
this.waitlist = new LinkedList<>();
this.sortStrategy = new RatingBasedSort(); // Default
}
public void setSortStrategy(DoctorSortStrategy strategy) {
this.sortStrategy = strategy;
}
public void registerDoctor(Doctor doctor) {
doctors.put(doctor.getId(), doctor);
}
public List<Doctor> searchBySpecialty(String specialty) {
List<Doctor> filtered = doctors.values().stream()
.filter(d -> d.getSpecialty().equalsIgnoreCase(specialty))
.collect(Collectors.toList());
return sortStrategy.sort(filtered);
}
public Appointment bookAppointment(String doctorId, Patient patient, TimeSlot slot) {
Doctor doctor = doctors.get(doctorId);
if (doctor == null) {
throw new IllegalArgumentException("Doctor not found");
}
// Check slot availability
TimeSlot doctorSlot = doctor.getAvailableSlots().stream()
.filter(s -> s.equals(slot) && s.isAvailable())
.findFirst()
.orElse(null);
if (doctorSlot == null) {
// Add to waitlist
waitlist.offer(patient);
return null;
}
// Book slot
doctorSlot.book();
String appointmentId = UUID.randomUUID().toString();
Appointment appointment = new Appointment(appointmentId, doctor, patient, doctorSlot);
appointments.put(appointmentId, appointment);
return appointment;
}
public List<TimeSlot> getAvailableSlots(String doctorId) {
Doctor doctor = doctors.get(doctorId);
return doctor.getAvailableSlots().stream()
.filter(TimeSlot::isAvailable)
.collect(Collectors.toList());
}
}Answer (Part 1 of 3): Strategy Pattern Implementation
Strategy pattern enables dynamic doctor sorting without modifying AppointmentService by defining DoctorSortStrategy interface with sort() method, implementing concrete strategies (RatingBasedSort sorting by rating descending, StartTimeBasedSort sorting by earliest available slot), and allowing runtime strategy selection via setSortStrategy()—this follows Open/Closed Principle where adding new sorting criteria (distance-based, cost-based, availability-based) requires creating new strategy class without changing existing code. Service layer uses sortStrategy.sort() delegating sorting responsibility to strategy object, demonstrating proper separation of concerns where AppointmentService handles business logic (booking, conflict detection) while strategy handles presentation logic (ordering).
Answer (Part 2 of 3): Conflict Detection & Waitlist
Conflict detection implemented in bookAppointment() by searching doctor’s available slots for matching slot with isAvailable() check, preventing double-booking via slot state management where TimeSlot.book() sets isBooked=true making slot unavailable for subsequent bookings—waitlist mechanism uses Queue storing patients when requested slot unavailable, with future enhancement supporting waitlist notifications when slot becomes available (cancellation scenario) via observer pattern where Appointment.cancel() triggers waitlist.poll() and automatic rebooking. Edge cases handled include doctor not found (IllegalArgumentException), slot not belonging to doctor (null check), and concurrent booking attempts (requires synchronization or optimistic locking in production).
Answer (Part 3 of 3): Production Enhancements
Production enhancements add concurrency control via ReentrantLock per doctor preventing race conditions during slot booking, optimistic locking using version field on TimeSlot detecting concurrent modifications, and distributed locking (Redis) for multi-instance deployments—notification system implements observer pattern where Appointment events (booked, cancelled, rescheduled) trigger notifications to patients via email/SMS, waitlist patients notified when slots available—analytics track doctor utilization (booked slots / total slots), popular specialties, peak booking times enabling capacity planning—caching stores frequently accessed data (doctor profiles, available slots) in Redis with TTL-based invalidation, reducing database load—audit trail logs all booking operations (who, when, what) for compliance and debugging, demonstrating systems thinking beyond pure coding exercise considering scalability, reliability, and operational requirements.
4. Inventory Management System with Alert Mechanisms
Difficulty Level: Very High
Role: Software Engineer / Senior Software Engineer (SDE-2/3)
Source: YouTube (Sanket Singh), Flipkart System Design Round
Topic: System Design - HLD & LLD
Interview Round: System Design Round (60+ min)
Engineering Domain: Backend Engineering / Platform Engineering
Question: “Design a scalable e-commerce inventory management system for sellers with real-time alert mechanisms, monitoring dashboards, and handling high-volume concurrent updates during flash sales like Big Billion Days. Requirements: Support product inventory tracking and updates, alert mechanisms for low stock, upcoming products, and out-of-stock scenarios, monitoring dashboard with real-time metrics (stock availability > 95%, stockout incidents < 5% of SKUs), consistency requirements HIGH (orders, payments, inventory), availability requirements HIGH (sellers must always be able to upload/update inventory), horizontal scalability for traffic spikes, event-driven architecture with message queues (Kafka).”
Answer Framework
STAR Method Structure:
- Situation: E-commerce platform requiring real-time inventory tracking for millions of SKUs across thousands of sellers during high-traffic events
- Task: Design distributed system balancing consistency (prevent overselling) with availability (seller operations) and scalability (Big Billion Days traffic)
- Action: Microservices architecture with Inventory Service, Alert Service, Monitoring Service, Kafka for events, Redis caching, database sharding
- Result: System handling 100k+ inventory updates/sec, <100ms p95 latency, 99.9% availability, eventual consistency for dashboards, strong consistency for orders
Key Competencies Evaluated:
- System Design: HLD component identification, data flow, API design
- Distributed Systems: Consistency vs availability trade-offs, CAP theorem application
- Scalability: Horizontal scaling, sharding, caching strategies
- Event-Driven Architecture: Kafka for async processing, event sourcing
Inventory System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ API Gateway │
│ (Rate Limiting, Auth) │
└────────────┬────────────────────────────────────┬────────────────┘
│ │
┌────────▼────────┐ ┌───────▼────────┐
│ Inventory │ │ Order │
│ Service │◄─────────────────┤ Service │
│ (Write/Read) │ Reserve Stock │ │
└────────┬────────┘ └────────────────┘
│
│ Publish Events
▼
┌─────────────────┐
│ Kafka Topics │
│ - inventory.updated
│ - inventory.low_stock
│ - inventory.out_of_stock
└────────┬────────┘
│
┌──────┴──────┬─────────────┐
│ │ │
┌─────▼─────┐ ┌────▼─────┐ ┌────▼──────┐
│ Alert │ │ Monitoring│ │ Analytics │
│ Service │ │ Service │ │ Service │
│ │ │ (Metrics) │ │ │
└───────────┘ └───────────┘ └───────────┘Data Model & Sharding Strategy
-- Inventory Table (Sharded by seller_id)
CREATE TABLE inventory (
product_id VARCHAR(50) PRIMARY KEY,
seller_id VARCHAR(50) NOT NULL,
sku VARCHAR(100) NOT NULL,
quantity INT NOT NULL,
reserved_quantity INT DEFAULT 0,
available_quantity INT GENERATED ALWAYS AS (quantity - reserved_quantity),
low_stock_threshold INT DEFAULT 10,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
version INT DEFAULT 0, -- Optimistic locking
INDEX idx_seller (seller_id),
INDEX idx_available (available_quantity)
) PARTITION BY HASH(seller_id);
-- Inventory Events (Event Sourcing)
CREATE TABLE inventory_events (
event_id BIGINT AUTO_INCREMENT PRIMARY KEY,
product_id VARCHAR(50) NOT NULL,
event_type ENUM('CREATED', 'UPDATED', 'RESERVED', 'RELEASED', 'SOLD'),
quantity_delta INT,
previous_quantity INT,
new_quantity INT,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSON,
INDEX idx_product_time (product_id, timestamp)
);Core Service Implementation
from dataclasses import dataclass
from enum import Enum
import redis
from kafka import KafkaProducer
import json
class InventoryEventType(Enum):
UPDATED = "UPDATED"
LOW_STOCK = "LOW_STOCK"
OUT_OF_STOCK = "OUT_OF_STOCK"
RESERVED = "RESERVED"
@dataclass
class InventoryUpdate:
product_id: str
seller_id: str
quantity_delta: int
operation: str # 'ADD', 'REMOVE', 'RESERVE', 'RELEASE'
class InventoryService:
def __init__(self, db, cache: redis.Redis, kafka: KafkaProducer):
self.db = db
self.cache = cache
self.kafka = kafka
def update_inventory(self, update: InventoryUpdate) -> dict:
"""
Update inventory with optimistic locking and event publishing.
Strong consistency for inventory updates.
"""
cache_key = f"inventory:{update.product_id}"
# Try cache first (read-through)
cached = self.cache.get(cache_key)
if cached:
inventory = json.loads(cached)
else:
inventory = self._fetch_from_db(update.product_id)
# Optimistic locking
current_version = inventory['version']
new_quantity = inventory['quantity'] + update.quantity_delta
if new_quantity < 0:
raise ValueError("Insufficient inventory")
# Database update with version check
query = """
UPDATE inventory
SET quantity =%s, version = version + 1, updated_at = NOW()
WHERE product_id =%s AND version =%s
"""
result = self.db.execute(query, (new_quantity, update.product_id, current_version))
if result.rowcount == 0:
# Version mismatch - concurrent update detected
raise ConcurrentModificationError("Inventory updated by another transaction")
# Update cache
inventory['quantity'] = new_quantity
inventory['version'] = current_version + 1
self.cache.setex(cache_key, 300, json.dumps(inventory)) # 5 min TTL
# Publish event to Kafka (async)
event = {
'product_id': update.product_id,
'seller_id': update.seller_id,
'event_type': InventoryEventType.UPDATED.value,
'new_quantity': new_quantity,
'timestamp': datetime.now().isoformat()
}
# Check thresholds and publish alerts
if new_quantity == 0:
event['event_type'] = InventoryEventType.OUT_OF_STOCK.value
self.kafka.send('inventory.out_of_stock', event)
elif new_quantity <= inventory.get('low_stock_threshold', 10):
event['event_type'] = InventoryEventType.LOW_STOCK.value
self.kafka.send('inventory.low_stock', event)
self.kafka.send('inventory.updated', event)
return {'success': True, 'new_quantity': new_quantity}
def reserve_inventory(self, product_id: str, quantity: int) -> bool:
"""
Reserve inventory for order (strong consistency required).
Uses database transaction to prevent overselling.
"""
with self.db.transaction():
# Lock row for update (pessimistic locking)
query = """
SELECT quantity, reserved_quantity, available_quantity
FROM inventory
WHERE product_id =%s
FOR UPDATE
"""
inventory = self.db.execute(query, (product_id,)).fetchone()
if inventory['available_quantity'] < quantity:
return False # Insufficient stock
# Update reserved quantity
update_query = """
UPDATE inventory
SET reserved_quantity = reserved_quantity +%s
WHERE product_id =%s
"""
self.db.execute(update_query, (quantity, product_id))
# Invalidate cache
self.cache.delete(f"inventory:{product_id}")
# Publish reservation event
event = {
'product_id': product_id,
'event_type': InventoryEventType.RESERVED.value,
'quantity': quantity,
'timestamp': datetime.now().isoformat()
}
self.kafka.send('inventory.reserved', event)
return TrueAnswer (Part 1 of 3): Architecture & Consistency Model
Microservices architecture separates Inventory Service (write/read operations, cache management, event publishing), Alert Service (consumes Kafka events, sends notifications for low stock/out-of-stock), Monitoring Service (aggregates metrics, dashboard APIs), and Order Service (reserves inventory before payment)—consistency model uses strong consistency for inventory updates (ACID transactions, optimistic locking via version field preventing lost updates) and order reservations (pessimistic locking via SELECT FOR UPDATE preventing overselling), while accepting eventual consistency for monitoring dashboards (Kafka consumers update metrics asynchronously, acceptable 1-2 second lag)—sharding strategy partitions inventory table by seller_id enabling horizontal scaling where each shard handles subset of sellers, with consistent hashing for even distribution, and shard-local transactions avoiding distributed transaction overhead.
Answer (Part 2 of 3): Caching & Event-Driven Design
Caching strategy implements write-through cache where inventory updates write to database first then update Redis cache (cache_key = “inventory:{product_id}”), with 5-minute TTL preventing stale data, and cache invalidation on reservations ensuring consistency—event-driven architecture publishes inventory events to Kafka topics (inventory.updated, inventory.low_stock, inventory.out_of_stock) enabling async processing where Alert Service consumes events triggering seller notifications (email/SMS for low stock), Monitoring Service aggregates metrics (stock availability %, stockout incidents), and Analytics Service tracks inventory trends—idempotency ensures duplicate event processing safe via event_id deduplication in consumers, critical for exactly-once semantics during Kafka rebalancing or retries.
Answer (Part 3 of 3): Scalability & Big Billion Days Handling
Horizontal scaling achieved through stateless Inventory Service instances behind load balancer, database read replicas for query scaling (eventual consistency acceptable for seller dashboard reads), and Kafka partitioning by product_id enabling parallel event processing—Big Billion Days optimizations include pre-warming cache with hot products (top 10k SKUs loaded into Redis before sale), rate limiting per seller (1000 updates/min preventing abuse), batch updates (group multiple SKU updates into single transaction reducing database load), and circuit breakers (fallback to degraded mode if database slow, accept writes to queue for async processing)—monitoring tracks QPS (target 100k updates/sec), p95 latency (<100ms), cache hit rate (>90%), stockout incidents (<5% of SKUs), and Kafka lag (<1 second), with auto-scaling triggers adding instances when CPU >70% or latency >200ms, demonstrating production-ready system design considering reliability, performance, and operational excellence.
5. Real-Time Order Tracking & Logistics System
Difficulty Level: Very High
Role: Data Engineer / Software Engineer (SDE-2/3)
Source: LinkedIn (Sameer Bhardwaj), Flipkart Data Engineering
Topic: Event Streaming & Data Pipelines
Interview Round: System Design + Data Pipeline Discussion (60 min)
Engineering Domain: Data Engineering / Backend Engineering
Question: “Design a real-time order tracking system that captures shipment location data, enriches it with geographical information, and processes it efficiently through both streaming and batch pipelines. Key challenges: Real-time event streaming from shipments, latitude/longitude data enrichment and persistence, batch job optimization and latency reduction, pipeline resilience and failure handling, transitioning from pure batch processing to real-time event-driven architecture.”
Answer Framework
STAR Method Structure:
- Situation: Logistics system processing millions of shipment location updates daily, batch processing causing hours of delay
- Task: Redesign pipeline from batch to real-time streaming, reduce latency from hours to seconds, ensure data enrichment reliability
- Action: Implement Kafka for event streaming, Spark Streaming for real-time processing, pre-enrichment strategy, failure handling
- Result: Latency reduced from 4+ hours to <10 seconds, pipeline resilience improved, batch job execution time reduced 80%
Key Competencies Evaluated:
- Stream Processing: Kafka, Spark Streaming, event-driven architecture
- Data Pipeline Design: Batch vs streaming trade-offs, lambda architecture
- Performance Optimization: Identifying bottlenecks, pre-enrichment strategies
- Failure Handling: Dead letter queues, retry mechanisms, idempotency
Real-Time Tracking Architecture
┌──────────────────────────────────────────────────────────────┐
│ Shipment Events │
│ (Delivery partners scanning packages at checkpoints) │
└────────────┬─────────────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Kafka Topic │
│ shipment.events│
│ (Partitioned │
│ by order_id) │
└────────┬────────┘
│
┌──────┴──────┬─────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌──────────┐ ┌─────────────────┐
│ Spark │ │ Flink │ │ Batch Job │
│ Streaming │ │ Consumer │ │ (Hourly │
│ (Real-time│ │ │ │ Aggregation) │
│ Enrich) │ │ │ │ │
└─────┬─────┘ └────┬─────┘ └────────┬────────┘
│ │ │
│ Enriched │ │
│ Events │ │
▼ ▼ ▼
┌──────────────────────────────────────┐
│ Data Lake (S3/HDFS) │
│ - Raw events (Parquet) │
│ - Enriched events (lat/long added) │
│ - Aggregated metrics │
└──────────────┬───────────────────────┘
│
▼
┌────────────────┐
│ PostgreSQL │
│ (Queryable │
│ tracking DB) │
└────────────────┘Streaming Pipeline Implementation
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType
import requests
# Initialize Spark Streaming
spark = SparkSession.builder \
.appName("ShipmentTracking") \
.config("spark.streaming.kafka.maxRatePerPartition", "1000") \
.getOrCreate()
# Define schema for shipment events
shipment_schema = StructType([
StructField("order_id", StringType(), False),
StructField("shipment_id", StringType(), False),
StructField("checkpoint_id", StringType(), False),
StructField("timestamp", StringType(), False),
StructField("status", StringType(), False)
])
# Read from Kafka
shipment_stream = spark.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers", "localhost:9092") \
.option("subscribe", "shipment.events") \
.option("startingOffsets", "latest") \
.load()
# Parse JSON events
parsed_events = shipment_stream.select(
from_json(col("value").cast("string"), shipment_schema).alias("data")
).select("data.*")
# Geo-enrichment UDF (critical optimization)
@udf(returnType=StructType([
StructField("latitude", DoubleType()),
StructField("longitude", DoubleType()),
StructField("city", StringType())
]))
def enrich_location(checkpoint_id):
"""
Pre-fetch lat/long from checkpoint master table (cached).
OLD APPROACH: API call per event → 4+ hours batch processing
NEW APPROACH: Lookup from cache → <10 sec streaming
"""
# Cache checkpoint locations in Redis/local dict
checkpoint_cache = get_checkpoint_cache()
if checkpoint_id in checkpoint_cache:
return checkpoint_cache[checkpoint_id]
else:
# Fallback to API (rare for new checkpoints)
location = fetch_from_geo_api(checkpoint_id)
checkpoint_cache[checkpoint_id] = location
return location
# Apply enrichment
enriched_stream = parsed_events.withColumn(
"location",
enrich_location(col("checkpoint_id"))
).select(
col("order_id"),
col("shipment_id"),
col("checkpoint_id"),
col("timestamp"),
col("status"),
col("location.latitude").alias("latitude"),
col("location.longitude").alias("longitude"),
col("location.city").alias("city")
)
# Write to multiple sinks
# 1. Real-time database (PostgreSQL) for customer tracking
enriched_stream.writeStream \
.format("jdbc") \
.option("url", "jdbc:postgresql://localhost:5432/tracking") \
.option("dbtable", "shipment_locations") \
.option("checkpointLocation", "/tmp/checkpoint/realtime") \
.start()
# 2. Data lake (Parquet) for analytics
enriched_stream.writeStream \
.format("parquet") \
.option("path", "s3://flipkart-datalake/shipments/") \
.option("checkpointLocation", "/tmp/checkpoint/datalake") \
.partitionBy("date") \
.start()
# 3. Dead letter queue for failed enrichments
failed_enrichments = enriched_stream.filter(col("latitude").isNull())
failed_enrichments.writeStream \
.format("kafka") \
.option("kafka.bootstrap.servers", "localhost:9092") \
.option("topic", "shipment.failed") \
.option("checkpointLocation", "/tmp/checkpoint/dlq") \
.start()
spark.streams.awaitAnyTermination()Batch Optimization Strategy
# OLD APPROACH (Problematic)
def batch_process_shipments_OLD():
"""
Accumulated 24 hours of events, then:
1. Fetch all events from database
2. For each event, call geo API for lat/long
3. Update database with enriched data
Problem: 1M events × 500ms API call = 138 hours!
"""
events = db.query("SELECT * FROM shipment_events WHERE enriched = FALSE")
for event in events:
# API call per event (BOTTLENECK)
location = requests.get(f"https://geo-api.com/checkpoint/{event.checkpoint_id}")
db.update(
"UPDATE shipment_events SET lat=%s, lng=%s WHERE id=%s",
(location.lat, location.lng, event.id)
)
# NEW APPROACH (Optimized)
def batch_process_shipments_NEW():
"""
Pre-enrichment strategy:
1. Maintain checkpoint master table with lat/long
2. Stream processing enriches events in real-time
3. Batch job only handles aggregations (no enrichment)
Result: Batch job reduced from 4+ hours to 15 minutes
"""
# Checkpoint master table (updated weekly)
checkpoint_master = db.query("""
SELECT checkpoint_id, latitude, longitude, city
FROM checkpoint_locations
""")
# Load into distributed cache
checkpoint_cache = {row['checkpoint_id']: row for row in checkpoint_master}
# Batch aggregation (no API calls)
aggregated = spark.sql("""
SELECT
order_id,
MAX(timestamp) as last_update,
COLLECT_LIST(STRUCT(checkpoint_id, latitude, longitude, timestamp)) as journey
FROM shipment_events
WHERE date = CURRENT_DATE - 1
GROUP BY order_id
""")
aggregated.write.mode("overwrite").parquet("s3://analytics/daily_shipments/")Answer (Part 1 of 3): Streaming vs Batch Trade-offs
Critical insight from interview: batch processing at scale hides dangerous inefficiencies where accumulating events then processing in bulk creates illusion of efficiency (single batch job vs continuous processing) but reality is sequential API calls become bottleneck (1M events × 500ms = 138 hours theoretical, 4+ hours actual with parallelization)—solution shifts enrichment from batch to stream by pre-fetching checkpoint locations into master table (updated weekly via separate pipeline), caching in Redis/memory, and enriching events in real-time as they arrive (Spark Streaming UDF lookup from cache <1ms vs API call 500ms), reducing per-event processing from 500ms to <1ms enabling true real-time tracking—batch job now only handles aggregations (daily summaries, analytics) not enrichment, reducing execution time from 4+ hours to 15 minutes.
Answer (Part 2 of 3): Failure Handling & Idempotency
Failure handling implements dead letter queue (DLQ) where events failing enrichment (checkpoint_id not in cache, null lat/long) written to shipment.failed Kafka topic for manual review and reprocessing, preventing pipeline blockage—retry mechanism uses exponential backoff for transient failures (network timeout, database unavailable) with max 3 retries before DLQ, and circuit breaker pattern stopping retries if failure rate >10% preventing cascade failures—idempotency ensures duplicate event processing safe via unique event_id deduplication in database (UPSERT based on shipment_id + checkpoint_id + timestamp), critical for exactly-once semantics when Kafka consumers rebalance or Spark checkpoints recover from failure—checkpointing stores Spark Streaming state to HDFS enabling recovery from last committed offset after crash, with checkpoint interval 10 seconds balancing recovery granularity vs overhead.
Answer (Part 3 of 3): Production Monitoring & Optimization
Monitoring tracks stream processing lag (Kafka consumer lag <1000 messages indicating real-time processing), enrichment success rate (>99.5% events successfully enriched), DLQ volume (spike indicates checkpoint master table outdated), and end-to-end latency (event timestamp to database write <10 seconds p95)—optimization includes Kafka partitioning by order_id enabling parallel processing while maintaining order within shipment, Spark Streaming micro-batching (5-second intervals balancing latency vs throughput), and database write batching (accumulate 1000 enriched events before JDBC write reducing connection overhead)—cost optimization uses S3 lifecycle policies archiving old tracking data to Glacier after 90 days, Parquet compression reducing storage 70%, and spot instances for batch aggregation jobs (non-critical, can tolerate interruptions), demonstrating data engineering maturity considering performance, reliability, and cost trade-offs.
6. E-Commerce Platform Design with Search, Cart, Checkout & Payment
Difficulty Level: Very High
Role: Senior Software Engineer (SDE-2/3)
Source: YouTube System Design Interviews, Flipkart HLD Round
Topic: High-Level System Design
Interview Round: System Design Round (60+ min)
Engineering Domain: Backend Engineering / Platform Architecture
Question: “Design a complete e-commerce platform (similar to Flipkart) covering product search, cart management, checkout process, payment integration, and order fulfillment. Core components: Search Service (handle millions of products with low latency and typo tolerance), Catalog Service (product management and metadata), Cart Service (add/remove items, cart persistence), Checkout Service (order placement with inventory reservation), Payment Service (integration with payment gateways with failure handling), Inventory Service (real-time stock management and reservation), Order Management Service (order tracking and fulfillment), Serviceability Service (delivery feasibility check, pin code availability), Reconciliation Service (consistency checks across orders and inventory).”
Answer Framework
STAR Method Structure:
- Situation: E-commerce platform requiring sub-second search, reliable checkout, payment processing, and inventory consistency
- Task: Design distributed system handling millions of products, thousands of concurrent checkouts, payment failures, inventory synchronization
- Action: Microservices architecture with Elasticsearch for search, RDBMS for transactions, Redis for caching, message queues for async processing
- Result: System supporting 10k+ orders/min, <200ms search latency, 99.99% payment success rate, zero overselling via inventory reservation
Key Competencies Evaluated:
- Microservices Design: Service boundaries, API contracts, inter-service communication
- Transactional Consistency: ACID for orders/payments, eventual consistency for search
- Failure Handling: Payment timeouts, inventory rollback, idempotency
- Scalability: Horizontal scaling, caching, database sharding
E-Commerce System Architecture
┌──────────────┐
│ API Gateway │
│ (Auth, Rate │
│ Limiting) │
└──────┬───────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ Search │ │ Cart │ │ Checkout │
│ Service │ │ Service │ │ Service │
│(Elastic) │ │ (Redis) │ │ │
└──────────┘ └──────────┘ └────┬─────┘
│
┌─────────────────┼─────────────┐
│ │ │
┌────▼─────┐ ┌────▼────┐ ┌───▼──────┐
│Inventory │ │ Payment │ │Serviceab.│
│ Service │ │ Service │ │ Service │
│ │ │ │ │ │
└────┬─────┘ └────┬────┘ └──────────┘
│ │
└────────┬───────┘
│
┌─────▼──────┐
│ Order │
│ Management │
│ Service │
└────────────┘Critical Design Decision: Inventory-Payment Ordering
# WRONG APPROACH (Overselling Risk)
def checkout_WRONG(cart_items, payment_info):
# 1. Process payment first
payment_result = payment_service.charge(payment_info)
if payment_result.success:
# 2. Reserve inventory (TOO LATE!)
inventory_result = inventory_service.reserve(cart_items)
if not inventory_result.success:
# PROBLEM: Payment succeeded but no inventory
# Now must refund (slow, poor UX)
payment_service.refund(payment_result.transaction_id)
return {"error": "Out of stock"}
return {"order_id": create_order()}
# CORRECT APPROACH (Prevent Overselling)
def checkout_CORRECT(cart_items, payment_info):
# 1. Reserve inventory FIRST (pessimistic lock)
reservation_id = inventory_service.reserve(cart_items, ttl=300) # 5 min
if not reservation_id:
return {"error": "Out of stock"}
try:
# 2. Process payment (inventory already reserved)
payment_result = payment_service.charge(payment_info)
if payment_result.success:
# 3. Confirm reservation → convert to sale
inventory_service.confirm_reservation(reservation_id)
order_id = create_order(payment_result, reservation_id)
return {"order_id": order_id}
else:
# Payment failed → release reservation
inventory_service.release_reservation(reservation_id)
return {"error": "Payment failed"}
except PaymentTimeoutException:
# Timeout → release reservation
inventory_service.release_reservation(reservation_id)
return {"error": "Payment timeout"}Inventory Service Implementation
class InventoryService:
def __init__(self, db, cache):
self.db = db
self.cache = cache # Redis
def reserve(self, items: List[CartItem], ttl: int = 300) -> str:
"""
Reserve inventory with TTL (auto-release if payment times out).
Uses pessimistic locking to prevent overselling.
"""
reservation_id = generate_uuid()
with self.db.transaction():
for item in items:
# Lock row for update (prevents concurrent reservations)
query = """
SELECT product_id, available_quantity
FROM inventory
WHERE product_id =%s
FOR UPDATE
"""
inventory = self.db.execute(query, (item.product_id,)).fetchone()
if inventory['available_quantity'] < item.quantity:
raise InsufficientStockError(f"Product{item.product_id} out of stock")
# Update available quantity
self.db.execute("""
UPDATE inventory
SET reserved_quantity = reserved_quantity +%s,
available_quantity = available_quantity -%s
WHERE product_id =%s
""", (item.quantity, item.quantity, item.product_id))
# Store reservation in Redis with TTL
reservation_data = {
'items': [{'product_id': i.product_id, 'quantity': i.quantity} for i in items],
'created_at': datetime.now().isoformat()
}
self.cache.setex(
f"reservation:{reservation_id}",
ttl,
json.dumps(reservation_data)
)
# Schedule auto-release job (if payment doesn't confirm in 5 min)
self.schedule_auto_release(reservation_id, ttl)
return reservation_id
def confirm_reservation(self, reservation_id: str):
"""Convert reservation to confirmed sale."""
reservation = self.cache.get(f"reservation:{reservation_id}")
if not reservation:
raise ReservationExpiredError("Reservation not found or expired")
items = json.loads(reservation)['items']
with self.db.transaction():
for item in items:
# Move from reserved to sold
self.db.execute("""
UPDATE inventory
SET reserved_quantity = reserved_quantity -%s,
sold_quantity = sold_quantity +%s
WHERE product_id =%s
""", (item['quantity'], item['quantity'], item['product_id']))
# Delete reservation from cache
self.cache.delete(f"reservation:{reservation_id}")
def release_reservation(self, reservation_id: str):
"""Release reservation (payment failed/timeout)."""
reservation = self.cache.get(f"reservation:{reservation_id}")
if not reservation:
return # Already released or expired
items = json.loads(reservation)['items']
with self.db.transaction():
for item in items:
# Return to available quantity
self.db.execute("""
UPDATE inventory
SET reserved_quantity = reserved_quantity -%s,
available_quantity = available_quantity +%s
WHERE product_id =%s
""", (item['quantity'], item['quantity'], item['product_id']))
self.cache.delete(f"reservation:{reservation_id}")Payment Service with Idempotency
class PaymentService:
def __init__(self, payment_gateway, db):
self.gateway = payment_gateway
self.db = db
def charge(self, payment_info: PaymentInfo, idempotency_key: str) -> PaymentResult:
"""
Process payment with idempotency (prevent double charging on retry).
"""
# Check if payment already processed
existing = self.db.execute("""
SELECT transaction_id, status
FROM payments
WHERE idempotency_key =%s
""", (idempotency_key,)).fetchone()
if existing:
# Return cached result (idempotent)
return PaymentResult(
transaction_id=existing['transaction_id'],
status=existing['status']
)
try:
# Call payment gateway (Razorpay, Stripe, etc.)
result = self.gateway.charge(
amount=payment_info.amount,
currency=payment_info.currency,
payment_method=payment_info.method,
timeout=10 # 10 sec timeout
)
# Store result
self.db.execute("""
INSERT INTO payments (
transaction_id, idempotency_key, amount, status, created_at
) VALUES (%s,%s,%s,%s, NOW())
""", (result.transaction_id, idempotency_key, payment_info.amount, result.status))
return result
except PaymentGatewayTimeout:
# Timeout → mark as pending, reconcile later
self.db.execute("""
INSERT INTO payments (
transaction_id, idempotency_key, amount, status, created_at
) VALUES (%s,%s,%s, 'PENDING', NOW())
""", (generate_uuid(), idempotency_key, payment_info.amount))
raise PaymentTimeoutException("Payment gateway timeout")Answer (Part 1 of 3): Microservices & Data Consistency
Microservices architecture separates concerns where Search Service uses Elasticsearch for full-text search with typo tolerance (fuzzy matching, n-grams) and faceted filtering, accepting eventual consistency (product updates propagate via Kafka in 1-2 seconds)—Cart Service uses Redis for fast read/write with TTL-based expiration (abandoned carts deleted after 7 days), persisting to database for logged-in users enabling cross-device access—Checkout Service orchestrates workflow (validate cart → check serviceability → reserve inventory → process payment → create order) using saga pattern for distributed transactions where each step compensates on failure (release inventory if payment fails)—data consistency uses RDBMS (MySQL/PostgreSQL) for orders, payments, and inventory requiring ACID guarantees preventing overselling and payment inconsistencies, while using NoSQL (MongoDB) for product catalog and user reviews accepting eventual consistency for better scalability.
Answer (Part 2 of 3): Inventory Reservation & Payment Ordering
Critical design decision reserves inventory BEFORE payment preventing overselling where concurrent checkouts for last item both reserve inventory (pessimistic lock via SELECT FOR UPDATE), first succeeds and second fails immediately with “out of stock” error (better UX than payment succeeding then refunding)—reservation TTL auto-releases inventory after 5 minutes if payment not confirmed preventing inventory lockup from abandoned checkouts, with background job scanning expired reservations and returning stock to available pool—payment idempotency prevents double charging on retry using idempotency_key (generated client-side, typically order_id + timestamp) where duplicate charge requests return cached result without hitting payment gateway, critical for network failures where client retries uncertain payment status—reconciliation service runs hourly comparing payment gateway transactions with database records, identifying discrepancies (payment succeeded but order not created, payment pending for >1 hour) and triggering manual review or automatic refunds.
Answer (Part 3 of 3): Scalability & Performance Optimization
Horizontal scaling achieved through stateless services behind load balancer, database read replicas for product catalog queries (90% reads, 10% writes), and Redis cluster for distributed caching—search optimization uses Elasticsearch with index sharding by category (electronics, fashion, home) enabling parallel search, autocomplete via edge n-grams (prefix matching), and result caching for popular queries (top 10k searches cached for 5 minutes)—checkout optimization implements optimistic locking for low-contention items (available_quantity > 100) avoiding SELECT FOR UPDATE overhead, falling back to pessimistic locking for high-contention items (last few units during flash sale)—payment optimization batches refunds (process every 15 minutes instead of real-time) reducing payment gateway API calls, and uses webhook callbacks for async payment confirmation instead of polling—monitoring tracks checkout conversion rate (cart → order), payment success rate (>99.9%), inventory accuracy (physical vs system stock), and search relevance (click-through rate on top results), with alerts for anomalies (sudden drop in conversion indicating bug), demonstrating production-ready e-commerce platform design.
7. Product Search System for Millions of Products
Difficulty Level: High
Role: Software Engineer (SDE-2/3)
Source: Internshala, Flipkart System Design Guides
Topic: Search & Information Retrieval
Interview Round: System Design (45-60 min)
Engineering Domain: Backend Engineering / Search Infrastructure
Question: “Design a scalable search system that can handle millions of products on the Flipkart platform with support for fast retrieval, faceted search, and typo tolerance. Requirements: Support search across millions of products, fast query response times (sub-second), faceted search (filter by brand, price, rating, etc.), typo tolerance and fuzzy matching, real-time index updates as inventory changes, horizontal scaling and replication.”
Answer Framework
STAR Method Structure:
- Situation: E-commerce search requiring sub-second latency across 100M+ products with complex filtering and typo handling
- Task: Design search infrastructure balancing relevance, performance, and real-time updates
- Action: Elasticsearch cluster with inverted indexes, TF-IDF ranking, fuzzy matching, faceted aggregations, Redis caching
- Result: <200ms p95 search latency, 95%+ relevance for top 10 results, real-time inventory updates, horizontal scalability
Key Competencies Evaluated:
- Search Technology: Elasticsearch/Solr architecture, inverted indexes, ranking algorithms
- Performance Optimization: Caching, index sharding, query optimization
- Relevance Engineering: TF-IDF, BM25, personalization, A/B testing
- Scalability: Horizontal scaling, replication, failover
Search System Architecture
┌────────────────────────────────────────────────────────────┐
│ Search API Layer │
│ (Query parsing, spell check, autocomplete) │
└────────────┬───────────────────────────────────────────────┘
│
┌──────┴──────┬─────────────┐
│ │ │
┌─────▼─────┐ ┌────▼─────┐ ┌────▼──────┐
│ Redis │ │Elastics- │ │ Personali-│
│ Cache │ │ earch │ │ zation │
│ (Popular │ │ Cluster │ │ Service │
│ queries) │ │ │ │ │
└───────────┘ └────┬─────┘ └───────────┘
│
┌────────┼────────┐
│ │ │
┌────▼───┐ ┌─▼────┐ ┌─▼────┐
│ Shard │ │Shard │ │Shard │
│ 0 │ │ 1 │ │ 2 │
│(0-33M) │ │(33- │ │(66- │
│ │ │ 66M) │ │100M) │
└────────┘ └──────┘ └──────┘Elasticsearch Index Design
{
"settings": {
"number_of_shards": 10,
"number_of_replicas": 2,
"analysis": {
"analyzer": {
"product_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"product_synonym",
"edge_ngram_filter"
]
},
"autocomplete_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "edge_ngram_filter"]
}
},
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
},
"product_synonym": {
"type": "synonym",
"synonyms": [
"mobile, phone, smartphone",
"laptop, notebook, computer",
"tv, television"
]
}
}
}
},
"mappings": {
"properties": {
"product_id": {"type": "keyword"},
"title": {
"type": "text",
"analyzer": "product_analyzer",
"fields": {
"keyword": {"type": "keyword"},
"autocomplete": {"type": "text", "analyzer": "autocomplete_analyzer"}
}
},
"description": {"type": "text", "analyzer": "product_analyzer"},
"brand": {"type": "keyword"},
"category": {"type": "keyword"},
"price": {"type": "float"},
"rating": {"type": "float"},
"num_reviews": {"type": "integer"},
"in_stock": {"type": "boolean"},
"created_at": {"type": "date"}
}
}
}Search Query Implementation
from elasticsearch import Elasticsearch
class ProductSearchService:
def __init__(self, es_client: Elasticsearch, cache):
self.es = es_client
self.cache = cache # Redis
def search(self, query: str, filters: dict, page: int = 1, size: int = 20) -> dict:
"""
Multi-field search with faceted filtering and typo tolerance.
"""
cache_key = f"search:{query}:{json.dumps(filters)}:{page}"
# Check cache for popular queries
cached = self.cache.get(cache_key)
if cached:
return json.loads(cached)
# Build Elasticsearch query
must_clauses = [
{
"multi_match": {
"query": query,
"fields": ["title^3", "description^1", "brand^2"],
"type": "best_fields",
"fuzziness": "AUTO", # Typo tolerance
"prefix_length": 2
}
}
]
# Apply filters
filter_clauses = []
if filters.get('brand'):
filter_clauses.append({"terms": {"brand": filters['brand']}})
if filters.get('price_min') or filters.get('price_max'):
filter_clauses.append({
"range": {
"price": {
"gte": filters.get('price_min', 0),
"lte": filters.get('price_max', 999999)
}
}
})
if filters.get('rating_min'):
filter_clauses.append({"range": {"rating": {"gte": filters['rating_min']}}})
if filters.get('in_stock_only'):
filter_clauses.append({"term": {"in_stock": True}})
# Execute search
response = self.es.search(
index="products",
body={
"query": {
"bool": {
"must": must_clauses,
"filter": filter_clauses
}
},
"sort": [
{"_score": {"order": "desc"}},
{"rating": {"order": "desc"}},
{"num_reviews": {"order": "desc"}}
],
"from": (page - 1) * size,
"size": size,
"aggs": {
"brands": {"terms": {"field": "brand", "size": 50}},
"price_ranges": {
"range": {
"field": "price",
"ranges": [
{"to": 1000},
{"from": 1000, "to": 5000},
{"from": 5000, "to": 10000},
{"from": 10000}
]
}
},
"avg_rating": {"avg": {"field": "rating"}}
}
}
)
result = {
"total": response['hits']['total']['value'],
"products": [hit['_source'] for hit in response['hits']['hits']],
"facets": {
"brands": response['aggregations']['brands']['buckets'],
"price_ranges": response['aggregations']['price_ranges']['buckets']
}
}
# Cache popular queries (TTL 5 min)
if page == 1:
self.cache.setex(cache_key, 300, json.dumps(result))
return result
def autocomplete(self, prefix: str, limit: int = 10) -> List[str]:
"""Autocomplete suggestions using edge n-grams."""
response = self.es.search(
index="products",
body={
"query": {
"match": {
"title.autocomplete": {
"query": prefix,
"operator": "and"
}
}
},
"size": limit,
"_source": ["title"]
}
)
return [hit['_source']['title'] for hit in response['hits']['hits']]Answer (Part 1 of 3): Elasticsearch Architecture & Indexing
Elasticsearch cluster uses inverted index data structure mapping terms → document IDs enabling fast full-text search where query “samsung phone” tokenizes to [“samsung”, “phone”], looks up posting lists (samsung: [doc1, doc5, doc9], phone: [doc1, doc3, doc5]), intersects lists finding matching documents [doc1, doc5], and ranks by relevance score (BM25 algorithm considering term frequency, inverse document frequency, field length normalization)—index sharding partitions 100M products across 10 shards (10M products each) enabling parallel search where query executes on all shards simultaneously, results merged and sorted by coordinator node, with 2 replicas per shard providing high availability (total 30 nodes: 10 primary + 20 replicas) and read scalability (queries distributed across replicas)—custom analyzer implements edge n-grams for autocomplete (tokenizing “smartphone” → [“sm”, “sma”, “smar”, “smart”, “smartp”, “smartph”, “smartpho”, “smartphon”, “smartphone”]), synonym filter (mobile = phone = smartphone), and ASCII folding (café → cafe) improving search recall.
Answer (Part 2 of 3): Relevance Engineering & Typo Tolerance
Relevance ranking uses multi-field search with boosting where title^3 (3x weight), brand^2 (2x weight), description^1 (1x weight) prioritizing title matches over description, combined with BM25 scoring and secondary sorting by rating and review count breaking ties for equally relevant products—typo tolerance implements fuzzy matching with fuzziness=“AUTO” allowing 1-character edit distance for words 3-5 characters (e.g., “phon” matches “phone”), 2-character edit distance for words >5 characters (e.g., “samsng” matches “samsung”), with prefix_length=2 requiring first 2 characters exact match preventing excessive false positives—faceted search uses aggregations computing brand distribution, price ranges, average rating in single query enabling filter UI without additional requests, with terms aggregation for categorical facets (brand, category) and range aggregation for numerical facets (price, rating).
Answer (Part 3 of 3): Caching & Real-Time Updates
Caching strategy stores popular search results in Redis with 5-minute TTL where cache key includes query + filters + page, cache hit rate >60% for top 1000 queries reducing Elasticsearch load, and cache invalidation on product updates (price change, stock update) via Kafka event triggering selective cache deletion—real-time indexing uses near-real-time (NRT) search where product updates written to Elasticsearch refresh every 1 second (configurable trade-off between freshness and indexing overhead), with bulk indexing API batching 1000 updates reducing HTTP overhead, and update-by-query for mass updates (price changes during sale)—performance optimization includes result pagination using search_after cursor (more efficient than from/size for deep pagination), query result caching (identical queries return cached results), and index warming (preloading frequently accessed data into memory), achieving <200ms p95 latency for 95% of queries, demonstrating production-ready search system design.
8. Recommendation System Design
Difficulty Level: Very High
Role: Software Engineer / ML Engineer (SDE-2/3)
Source: Internshala, Flipkart ML System Design
Topic: Machine Learning Systems & Recommendations
Interview Round: System Design (60+ min)
Engineering Domain: Search & Recommendations / ML Infrastructure
Question: “Design a scalable recommendation engine for an e-commerce platform using both collaborative filtering and content-based approaches. Key components: Collaborative Filtering (recommend based on user behavior patterns and similar users), Content-Based Filtering (recommend based on product attributes and user preferences), Machine Learning Models (improve recommendations over time with feedback loops), User Segmentation (demographic-based recommendations), Real-time Updates (fresh recommendations based on recent browsing history). Additional complexity: Handling cold-start problem for new users, explainability of recommendations, A/B testing framework, performance optimization for real-time serving (<100ms latency).”
Answer Framework
STAR Method Structure:
- Situation: E-commerce platform needing personalized recommendations driving 30%+ of revenue, requiring real-time serving at scale
- Task: Design hybrid recommendation system combining collaborative filtering, content-based filtering, and contextual signals
- Action: Offline training (matrix factorization, deep learning), online serving (feature store, model serving), A/B testing framework
- Result: 25% increase in click-through rate, <100ms p95 serving latency, cold-start handling via content-based fallback
Key Competencies Evaluated:
- ML System Design: Offline training vs online serving, feature engineering, model deployment
- Recommendation Algorithms: Collaborative filtering (matrix factorization, ALS), content-based (TF-IDF, embeddings)
- Scalability: Real-time feature computation, distributed model serving, caching
- Metrics & Experimentation: A/B testing, CTR, conversion rate, diversity metrics
Recommendation System Architecture
┌────────────────────────────────────────────────────────────┐
│ Recommendation API │
│ (User context, real-time features, model serving) │
└────────────┬───────────────────────────────────────────────┘
│
┌──────┴──────┬─────────────┬──────────────┐
│ │ │ │
┌─────▼─────┐ ┌────▼─────┐ ┌────▼──────┐ ┌────▼──────┐
│Collabor- │ │ Content- │ │ Contextual│ │ Ranking │
│ ative │ │ Based │ │ Bandits │ │ Model │
│ Filtering │ │ Filtering│ │ │ │ (XGBoost)│
│ (ALS) │ │ (Embed.) │ │ │ │ │
└─────┬─────┘ └────┬─────┘ └────┬──────┘ └────┬──────┘
│ │ │ │
└────────────┴─────────────┴─────────────┘
│
┌──────▼──────┐
│ Feature │
│ Store │
│ (Redis) │
└──────┬──────┘
│
┌────────────┴────────────┐
│ │
┌─────▼─────┐ ┌───────▼────────┐
│ User │ │ Product │
│ Features │ │ Features │
│ (Offline) │ │ (Offline) │
└───────────┘ └────────────────┘Collaborative Filtering Implementation
from pyspark.ml.recommendation import ALS
from pyspark.sql import SparkSession
class CollaborativeFilteringModel:
def __init__(self, spark: SparkSession):
self.spark = spark
self.model = None
def train(self, interactions_df):
"""
Train ALS (Alternating Least Squares) model on user-item interactions.
interactions_df: (user_id, product_id, rating/implicit_score)
"""
# ALS hyperparameters
als = ALS(
maxIter=10,
regParam=0.1,
userCol="user_id",
itemCol="product_id",
ratingCol="rating",
coldStartStrategy="drop", # Handle cold-start
implicitPrefs=True # Implicit feedback (views, clicks, purchases)
)
# Train model
self.model = als.fit(interactions_df)
# Generate user and item embeddings
user_factors = self.model.userFactors # (user_id, features[])
item_factors = self.model.itemFactors # (product_id, features[])
# Store embeddings in feature store
self._store_embeddings(user_factors, item_factors)
def recommend_for_user(self, user_id: int, num_recommendations: int = 10):
"""Generate top-N recommendations for user."""
user_df = self.spark.createDataFrame([(user_id,)], ["user_id"])
recommendations = self.model.recommendForUserSubset(user_df, num_recommendations)
return recommendations.collect()[0]['recommendations']
def _store_embeddings(self, user_factors, item_factors):
"""Store embeddings in Redis for real-time serving."""
# Convert to dict and store in Redis
for row in user_factors.collect():
redis_client.hset(
f"user_embedding:{row.id}",
mapping={"embedding": json.dumps(row.features.tolist())}
)
for row in item_factors.collect():
redis_client.hset(
f"item_embedding:{row.id}",
mapping={"embedding": json.dumps(row.features.tolist())}
)Content-Based Filtering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
class ContentBasedRecommender:
def __init__(self):
self.vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
self.product_vectors = None
self.product_ids = None
def fit(self, products_df):
"""
Build product embeddings from text features (title, description, category).
"""
# Combine text features
products_df['combined_features'] = (
products_df['title'] + ' ' +
products_df['description'] + ' ' +
products_df['category'] + ' ' +
products_df['brand']
)
# Generate TF-IDF vectors
self.product_vectors = self.vectorizer.fit_transform(
products_df['combined_features']
)
self.product_ids = products_df['product_id'].values
def recommend_similar_products(self, product_id: str, num_recommendations: int = 10):
"""Find similar products based on content similarity."""
# Get product index
product_idx = np.where(self.product_ids == product_id)[0][0]
# Compute cosine similarity
similarities = cosine_similarity(
self.product_vectors[product_idx],
self.product_vectors
).flatten()
# Get top-N similar products (excluding self)
similar_indices = similarities.argsort()[-num_recommendations-1:-1][::-1]
return [
{'product_id': self.product_ids[idx], 'similarity': similarities[idx]}
for idx in similar_indices
]
def recommend_for_user_profile(self, user_profile: dict, num_recommendations: int = 10):
"""
Recommend based on user preferences (cold-start handling).
user_profile: {'preferred_categories': [...], 'preferred_brands': [...]}
"""
# Build user profile vector
profile_text = ' '.join(
user_profile.get('preferred_categories', []) +
user_profile.get('preferred_brands', [])
)
user_vector = self.vectorizer.transform([profile_text])
# Find matching products
similarities = cosine_similarity(user_vector, self.product_vectors).flatten()
top_indices = similarities.argsort()[-num_recommendations:][::-1]
return [
{'product_id': self.product_ids[idx], 'score': similarities[idx]}
for idx in top_indices
]Hybrid Recommendation Service
class HybridRecommendationService:
def __init__(self, cf_model, cb_model, ranking_model, feature_store):
self.cf_model = cf_model
self.cb_model = cb_model
self.ranking_model = ranking_model
self.feature_store = feature_store # Redis
def get_recommendations(self, user_id: int, context: dict, num_recommendations: int = 10):
"""
Hybrid recommendations combining multiple signals.
context: {'device': 'mobile', 'time_of_day': 'evening', 'location': 'Bangalore'}
"""
# 1. Collaborative Filtering candidates
cf_candidates = self.cf_model.recommend_for_user(user_id, num_recommendations=50)
# 2. Content-Based candidates (based on recent views)
recent_views = self._get_recent_views(user_id, limit=5)
cb_candidates = []
for product_id in recent_views:
cb_candidates.extend(
self.cb_model.recommend_similar_products(product_id, num_recommendations=10)
)
# 3. Merge candidates (deduplication)
all_candidates = self._merge_candidates(cf_candidates, cb_candidates)
# 4. Feature engineering for ranking
features = self._extract_features(user_id, all_candidates, context)
# 5. Ranking model (XGBoost/LightGBM)
scores = self.ranking_model.predict(features)
# 6. Sort by score and return top-N
ranked_candidates = sorted(
zip(all_candidates, scores),
key=lambda x: x[1],
reverse=True
)[:num_recommendations]
return [candidate for candidate, score in ranked_candidates]
def _extract_features(self, user_id, candidates, context):
"""Extract features for ranking model."""
features = []
# User features (from feature store)
user_features = self.feature_store.hgetall(f"user_features:{user_id}")
for candidate in candidates:
# Product features
product_features = self.feature_store.hgetall(f"product_features:{candidate}")
# Interaction features
user_product_features = {
'user_category_affinity': self._get_category_affinity(user_id, product_features['category']),
'user_brand_affinity': self._get_brand_affinity(user_id, product_features['brand']),
'price_vs_user_avg': product_features['price'] / user_features.get('avg_purchase_price', 1000)
}
# Contextual features
contextual_features = {
'is_mobile': context['device'] == 'mobile',
'is_evening': context['time_of_day'] == 'evening',
'is_weekend': context.get('is_weekend', False)
}
# Combine all features
combined = {**user_features, **product_features, **user_product_features, **contextual_features}
features.append(combined)
return features
def _get_recent_views(self, user_id, limit=5):
"""Get user's recent product views from Redis."""
return self.feature_store.lrange(f"user_recent_views:{user_id}", 0, limit-1)Answer (Part 1 of 3): Collaborative vs Content-Based Filtering
Collaborative filtering uses matrix factorization (ALS) learning latent user and item embeddings where user_embedding · item_embedding predicts interaction score, trained on implicit feedback (views, clicks, purchases) not explicit ratings, handling sparsity via regularization and low-rank approximation (50-100 dimensions)—content-based filtering uses TF-IDF vectorization of product text (title, description, category, brand) computing cosine similarity between products, enabling “similar items” recommendations and cold-start handling for new users (recommend based on stated preferences not interaction history)—hybrid approach combines both where collaborative filtering provides personalization (users who bought X also bought Y), content-based provides diversity (similar to items you viewed), and ranking model (XGBoost) learns optimal weighting of signals based on context (mobile users prefer visual products, evening users prefer entertainment).
Answer (Part 2 of 3): Cold-Start & Real-Time Serving
Cold-start handling uses content-based fallback for new users (recommend popular items in preferred categories), demographic-based recommendations (users in same age/location segment), and contextual bandits (explore-exploit trade-off learning user preferences quickly via multi-armed bandit algorithms)—real-time serving stores precomputed embeddings in Redis feature store (user_embedding, item_embedding updated daily via batch job), computes dot product user · item for candidate generation (<10ms for 1000 candidates), and applies ranking model for final ordering (<50ms for 100 candidates), achieving <100ms p95 end-to-end latency—feature store maintains user features (purchase history, category affinity, avg price), product features (category, brand, price, rating), and real-time features (recent views, cart items, current session behavior) with TTL-based expiration preventing stale data.
Answer (Part 3 of 3): A/B Testing & Metrics
A/B testing framework randomly assigns users to control (existing recommendations) vs treatment (new algorithm), tracking metrics (CTR, conversion rate, revenue per user, diversity, serendipity) with statistical significance testing (t-test, p-value <0.05) and minimum detectable effect (5% CTR improvement)—evaluation metrics include online metrics (CTR, conversion rate, revenue, user engagement time) and offline metrics (precision@K, recall@K, NDCG, diversity measuring catalog coverage, serendipity measuring unexpected but relevant recommendations)—model retraining runs daily batch job training on previous 30 days of interactions, validates on holdout set (last 7 days), deploys if offline metrics improve >2%, with gradual rollout (10% → 50% → 100% traffic) monitoring for regressions—explainability provides recommendation reasons (“Customers who bought X also bought Y”, “Based on your recent views”, “Popular in Electronics”) improving user trust and engagement, demonstrating production ML system design considering accuracy, latency, explainability, and business impact.
9. Financial Database Schema & Query Optimization
Difficulty Level: High
Role: Data Engineer / Software Engineer (SDE-2)
Source: GeeksforGeeks Interview Experience, Flipkart Data Engineering
Topic: Database Design & Query Optimization
Interview Round: System Design (Design Round, 60 min)
Engineering Domain: Data Engineering / Backend Engineering
Question: “Design a financial database system for managing transactions, orders, and financial reconciliation with optimized query performance. Design challenges: Schema Design (create normalized data models for financial transactions), Partition Keys (choose appropriate keys for horizontal partitioning), Query Optimization (write efficient queries on large financial datasets), ACID Compliance (ensure transaction integrity), Indexing Strategy (optimize for read-heavy financial queries).”
Answer Framework
STAR Method Structure:
- Situation: Financial system requiring ACID guarantees, audit trails, and fast query performance on billions of transactions
- Task: Design normalized schema with proper partitioning, indexing, and query optimization for financial workloads
- Action: Implement 3NF schema, partition by date, create composite indexes, use materialized views for aggregations
- Result: <500ms p95 query latency on 1B+ transactions, zero data loss via ACID compliance, audit trail for compliance
Key Competencies Evaluated:
- Database Design: Normalization (3NF), entity relationships, foreign keys
- Partitioning Strategy: Time-based partitioning, partition pruning
- Query Optimization: Index selection, query rewriting, execution plan analysis
- ACID Compliance: Transaction isolation, consistency guarantees
Financial Database Schema
-- Orders Table (Partitioned by order_date)
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
order_date DATE NOT NULL,
total_amount DECIMAL(12, 2) NOT NULL,
status ENUM('PENDING', 'CONFIRMED', 'SHIPPED', 'DELIVERED', 'CANCELLED'),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX idx_user_date (user_id, order_date),
INDEX idx_status_date (status, order_date)
) PARTITION BY RANGE (YEAR(order_date) * 100 + MONTH(order_date)) (
PARTITION p202401 VALUES LESS THAN (202402),
PARTITION p202402 VALUES LESS THAN (202403),
PARTITION p202403 VALUES LESS THAN (202404),
-- ... monthly partitions
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Order Items (Normalized)
CREATE TABLE order_items (
order_item_id BIGINT PRIMARY KEY AUTO_INCREMENT,
order_id BIGINT NOT NULL,
product_id BIGINT NOT NULL,
quantity INT NOT NULL,
unit_price DECIMAL(10, 2) NOT NULL,
subtotal DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED,
FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
INDEX idx_order (order_id),
INDEX idx_product (product_id)
);
-- Payments Table (Financial Transactions)
CREATE TABLE payments (
payment_id BIGINT PRIMARY KEY AUTO_INCREMENT,
order_id BIGINT NOT NULL,
transaction_id VARCHAR(100) UNIQUE NOT NULL,
payment_method ENUM('CREDIT_CARD', 'DEBIT_CARD', 'UPI', 'WALLET', 'COD'),
amount DECIMAL(12, 2) NOT NULL,
status ENUM('PENDING', 'SUCCESS', 'FAILED', 'REFUNDED'),
gateway_response JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
FOREIGN KEY (order_id) REFERENCES orders(order_id),
INDEX idx_transaction (transaction_id),
INDEX idx_order_status (order_id, status),
INDEX idx_created_at (created_at)
) PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
PARTITION p202401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01')),
PARTITION p202402 VALUES LESS THAN (UNIX_TIMESTAMP('2024-03-01')),
-- ... monthly partitions
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Refunds Table (Audit Trail)
CREATE TABLE refunds (
refund_id BIGINT PRIMARY KEY AUTO_INCREMENT,
payment_id BIGINT NOT NULL,
order_id BIGINT NOT NULL,
refund_amount DECIMAL(12, 2) NOT NULL,
reason VARCHAR(500),
status ENUM('INITIATED', 'PROCESSING', 'COMPLETED', 'FAILED'),
initiated_by BIGINT, -- user_id or admin_id
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
completed_at TIMESTAMP NULL,
FOREIGN KEY (payment_id) REFERENCES payments(payment_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
INDEX idx_payment (payment_id),
INDEX idx_order (order_id),
INDEX idx_status_date (status, created_at)
);
-- Financial Reconciliation Table
CREATE TABLE financial_reconciliation (
reconciliation_id BIGINT PRIMARY KEY AUTO_INCREMENT,
reconciliation_date DATE NOT NULL,
total_orders BIGINT,
total_revenue DECIMAL(15, 2),
total_refunds DECIMAL(15, 2),
net_revenue DECIMAL(15, 2),
discrepancies JSON,
status ENUM('PENDING', 'RECONCILED', 'DISCREPANCY'),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY unique_date (reconciliation_date),
INDEX idx_status_date (status, reconciliation_date)
);Optimized Query Examples
-- Query 1: User's order history (Optimized with covering index)
-- Uses idx_user_date covering index avoiding table lookup
SELECT
order_id,
order_date,
total_amount,
status
FROM orders
WHERE user_id = 12345
AND order_date >= '2024-01-01'
ORDER BY order_date DESC
LIMIT 20;
-- Execution Plan: Index scan on idx_user_date (covering index)
-- Partition pruning: Only scans partitions >= 2024-01
-- Query 2: Daily revenue aggregation (Materialized view)
CREATE MATERIALIZED VIEW daily_revenue_summary AS
SELECT
DATE(created_at) as revenue_date,
COUNT(DISTINCT order_id) as total_orders,
SUM(CASE WHEN status = 'SUCCESS' THEN amount ELSE 0 END) as total_revenue,
SUM(CASE WHEN status = 'REFUNDED' THEN amount ELSE 0 END) as total_refunds,
COUNT(CASE WHEN status = 'FAILED' THEN 1 END) as failed_transactions
FROM payments
WHERE created_at >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAYS)
GROUP BY DATE(created_at);
-- Refresh materialized view daily
REFRESH MATERIALIZED VIEW daily_revenue_summary;
-- Query 3: Payment reconciliation (Optimized join)
SELECT
o.order_id,
o.total_amount as order_amount,
COALESCE(SUM(p.amount), 0) as paid_amount,
o.total_amount - COALESCE(SUM(p.amount), 0) as discrepancy
FROM orders o
LEFT JOIN payments p ON o.order_id = p.order_id AND p.status = 'SUCCESS'
WHERE o.order_date = '2024-01-15'
AND o.status != 'CANCELLED'
GROUP BY o.order_id, o.total_amount
HAVING discrepancy != 0;
-- Execution Plan:
-- 1. Partition pruning on orders (p202401)
-- 2. Index scan on payments.idx_order_status
-- 3. Hash join (efficient for equality join)
-- Query 4: Top products by revenue (Optimized aggregation)
SELECT
oi.product_id,
SUM(oi.subtotal) as total_revenue,
COUNT(DISTINCT oi.order_id) as order_count,
AVG(oi.unit_price) as avg_price
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= '2024-01-01'
AND o.status IN ('CONFIRMED', 'SHIPPED', 'DELIVERED')
GROUP BY oi.product_id
ORDER BY total_revenue DESC
LIMIT 100;
-- Optimization: Create composite index on (order_id, status) in orders
-- Enables index-only scan without table accessPartition Key Selection Strategy
def choose_partition_key(table_name: str, query_patterns: list) -> str:
"""
Choose optimal partition key based on query access patterns.
Financial tables: Partition by time (date/month)
- Most queries filter by date range
- Enables partition pruning (scan only relevant partitions)
- Supports data archival (drop old partitions)
User tables: Partition by user_id hash
- Distributes load evenly across partitions
- Enables parallel query execution
- Avoids hotspots
"""
if table_name in ['orders', 'payments', 'refunds']:
# Time-based partitioning
return 'PARTITION BY RANGE (YEAR(created_at) * 100 + MONTH(created_at))'
elif table_name in ['users', 'user_profiles']:
# Hash partitioning for even distribution
return 'PARTITION BY HASH(user_id) PARTITIONS 16'
else:
# No partitioning for small tables
return None
# Partition Pruning Example
query = """
SELECT * FROM payments
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01'
"""
# MySQL optimizer automatically prunes to p202401 partition only
# Scans 1/12 of data instead of entire tableAnswer (Part 1 of 3): Schema Normalization & Partitioning
Normalized schema follows 3NF (Third Normal Form) eliminating redundancy where orders table stores order-level data (order_id, user_id, total_amount, status), order_items table stores line-item details (product_id, quantity, unit_price) with foreign key to orders preventing data duplication, and payments table stores transaction details with foreign key to orders enabling multiple payment attempts per order—partitioning strategy uses range partitioning by date for time-series tables (orders partitioned by order_date, payments by created_at) enabling partition pruning where queries filtering by date scan only relevant partitions (1 month partition vs entire table), and supports data lifecycle management (archive old partitions to cold storage after 2 years, drop after 7 years per compliance requirements)—partition key selection considers query patterns where 90% of financial queries filter by date range making date-based partitioning optimal, with monthly granularity balancing partition count (12 partitions/year manageable) vs partition size (millions of rows per partition acceptable).
Answer (Part 2 of 3): Indexing Strategy & Query Optimization
Composite indexes optimize common query patterns where idx_user_date (user_id, order_date) on orders table enables efficient user order history queries (covering index avoiding table lookup), idx_order_status (order_id, status) on payments table supports payment reconciliation queries, and idx_status_date (status, created_at) enables status-based filtering with date range—covering indexes include all columns needed by query in index itself (SELECT order_id, order_date, total_amount WHERE user_id = X uses idx_user_date covering index) avoiding expensive table lookups, reducing query time from 500ms to 50ms for high-cardinality user_id—query optimization uses EXPLAIN ANALYZE identifying bottlenecks (full table scan vs index scan, nested loop vs hash join), rewriting queries to leverage indexes (WHERE created_at >= ‘2024-01-01’ enables partition pruning vs WHERE YEAR(created_at) = 2024 prevents it), and materialized views for expensive aggregations (daily revenue summary precomputed and refreshed nightly instead of computed on every query).
Answer (Part 3 of 3): ACID Compliance & Reconciliation
ACID guarantees ensure transaction integrity where atomicity (payment + order update succeed together or both fail via database transaction), consistency (foreign key constraints prevent orphaned payments, check constraints ensure amount > 0), isolation (READ COMMITTED prevents dirty reads, SERIALIZABLE for critical financial operations), and durability (write-ahead logging ensures committed transactions survive crashes)—reconciliation process runs daily comparing order totals vs payment totals identifying discrepancies (order amount ≠ sum of successful payments), detecting duplicate payments (same transaction_id processed twice), and flagging anomalies (refund > original payment) with automated alerts for manual review—audit trail maintains immutable log of all financial operations (payments, refunds, adjustments) with created_at timestamp, initiated_by user tracking who made changes, and JSON gateway_response storing raw payment gateway data for debugging, enabling compliance with financial regulations (PCI-DSS, SOX) requiring 7-year retention and tamper-proof audit logs, demonstrating production database design for financial systems.
10. Complex Inventory System with Filtering & Sorting
Difficulty Level: Medium-High
Role: Software Engineer (SDE-1/2)
Source: YouTube Interview Walkthroughs, Flipkart Machine Coding
Topic: Low-Level Design & SOLID Principles
Interview Round: Machine Coding Round (90 min)
Engineering Domain: Backend Engineering / Object-Oriented Design
Question: “Design and implement an inventory management system that supports dynamic product listing, filtering by brand/type, and sorting by price/quantity. Functional requirements: Add products to inventory with attributes (name, brand, type, price, quantity), update product quantities, fetch all products, filter products by brand, type, or quantity threshold, sort products by price (ascending/descending) or quantity, ensure code extensibility for new filter/sort criteria. Critical evaluation criteria: Separation of concerns (inventory management logic separate from UI), extensibility (new filters/sorts should require minimal code changes), SOLID principles (Single Responsibility, Open/Closed Principle), error handling (proper exception handling and validation), code quality (clean, readable, well-documented code), test cases (demonstrate functionality with multiple scenarios).”
Answer Framework
STAR Method Structure:
- Situation: Machine coding round requiring clean OOD with extensible filtering and sorting
- Task: Implement inventory system demonstrating Strategy pattern, SOLID principles, and minimal hardcoding
- Action: Define Product entity, InventoryService with business logic, FilterStrategy and SortStrategy interfaces, concrete implementations
- Result: Extensible system where adding new filter/sort requires creating new strategy class without modifying existing code
Key Competencies Evaluated:
- Design Patterns: Strategy pattern for dynamic filtering/sorting
- SOLID Principles: Single Responsibility, Open/Closed, Dependency Inversion
- Code Quality: Clean abstractions, minimal hardcoding, proper error handling
- Extensibility: New requirements accommodated without code changes
Inventory System Implementation
// Product Entity
class Product {
private String id;
private String name;
private String brand;
private String type;
private double price;
private int quantity;
public Product(String id, String name, String brand, String type, double price, int quantity) {
this.id = id;
this.name = name;
this.brand = brand;
this.type = type;
this.price = price;
this.quantity = quantity;
}
// Getters and setters
public String getId() { return id; }
public String getBrand() { return brand; }
public String getType() { return type; }
public double getPrice() { return price; }
public int getQuantity() { return quantity; }
public void setQuantity(int quantity) { this.quantity = quantity; }
@Override
public String toString() {
return String.format("Product{id='%s', name='%s', brand='%s', type='%s', price=%.2f, quantity=%d}",
id, name, brand, type, price, quantity);
}
}
// Filter Strategy Interface (Strategy Pattern)
interface FilterStrategy {
boolean matches(Product product);
}
// Concrete Filter Implementations
class BrandFilter implements FilterStrategy {
private String brand;
public BrandFilter(String brand) {
this.brand = brand;
}
@Override
public boolean matches(Product product) {
return product.getBrand().equalsIgnoreCase(brand);
}
}
class TypeFilter implements FilterStrategy {
private String type;
public TypeFilter(String type) {
this.type = type;
}
@Override
public boolean matches(Product product) {
return product.getType().equalsIgnoreCase(type);
}
}
class QuantityThresholdFilter implements FilterStrategy {
private int threshold;
public QuantityThresholdFilter(int threshold) {
this.threshold = threshold;
}
@Override
public boolean matches(Product product) {
return product.getQuantity() >= threshold;
}
}
// Composite Filter (Combine multiple filters with AND logic)
class CompositeFilter implements FilterStrategy {
private List<FilterStrategy> filters;
public CompositeFilter(List<FilterStrategy> filters) {
this.filters = filters;
}
@Override
public boolean matches(Product product) {
return filters.stream().allMatch(filter -> filter.matches(product));
}
}
// Sort Strategy Interface
interface SortStrategy {
List<Product> sort(List<Product> products);
}
// Concrete Sort Implementations
class PriceAscendingSort implements SortStrategy {
@Override
public List<Product> sort(List<Product> products) {
return products.stream()
.sorted(Comparator.comparingDouble(Product::getPrice))
.collect(Collectors.toList());
}
}
class PriceDescendingSort implements SortStrategy {
@Override
public List<Product> sort(List<Product> products) {
return products.stream()
.sorted(Comparator.comparingDouble(Product::getPrice).reversed())
.collect(Collectors.toList());
}
}
class QuantitySort implements SortStrategy {
@Override
public List<Product> sort(List<Product> products) {
return products.stream()
.sorted(Comparator.comparingInt(Product::getQuantity).reversed())
.collect(Collectors.toList());
}
}
// Inventory Service (Business Logic)
class InventoryService {
private Map<String, Product> inventory;
public InventoryService() {
this.inventory = new HashMap<>();
}
public void addProduct(Product product) {
if (product == null || product.getId() == null) {
throw new IllegalArgumentException("Product and product ID cannot be null");
}
if (inventory.containsKey(product.getId())) {
throw new IllegalStateException("Product with ID " + product.getId() + " already exists");
}
inventory.put(product.getId(), product);
}
public void updateQuantity(String productId, int quantity) {
Product product = inventory.get(productId);
if (product == null) {
throw new IllegalArgumentException("Product not found: " + productId);
}
if (quantity < 0) {
throw new IllegalArgumentException("Quantity cannot be negative");
}
product.setQuantity(quantity);
}
public List<Product> getAllProducts() {
return new ArrayList<>(inventory.values());
}
public List<Product> getFilteredProducts(FilterStrategy filter) {
return inventory.values().stream()
.filter(filter::matches)
.collect(Collectors.toList());
}
public List<Product> getSortedProducts(SortStrategy sortStrategy) {
List<Product> products = new ArrayList<>(inventory.values());
return sortStrategy.sort(products);
}
public List<Product> getFilteredAndSortedProducts(FilterStrategy filter, SortStrategy sortStrategy) {
List<Product> filtered = getFilteredProducts(filter);
return sortStrategy.sort(filtered);
}
}
// Demo Application
public class InventoryManagementDemo {
public static void main(String[] args) {
InventoryService inventory = new InventoryService();
// Add products
inventory.addProduct(new Product("P1", "iPhone 15", "Apple", "Mobile", 79999, 50));
inventory.addProduct(new Product("P2", "Galaxy S24", "Samsung", "Mobile", 74999, 30));
inventory.addProduct(new Product("P3", "MacBook Pro", "Apple", "Laptop", 199999, 10));
inventory.addProduct(new Product("P4", "ThinkPad X1", "Lenovo", "Laptop", 129999, 5));
inventory.addProduct(new Product("P5", "iPad Air", "Apple", "Tablet", 59999, 20));
// Test 1: Filter by brand
System.out.println("=== Apple Products ===");
List<Product> appleProducts = inventory.getFilteredProducts(new BrandFilter("Apple"));
appleProducts.forEach(System.out::println);
// Test 2: Filter by type and sort by price
System.out.println("\n=== Mobile Products (Price Ascending) ===");
List<Product> mobiles = inventory.getFilteredAndSortedProducts(
new TypeFilter("Mobile"),
new PriceAscendingSort()
);
mobiles.forEach(System.out::println);
// Test 3: Composite filter (Apple + Quantity >= 15)
System.out.println("\n=== Apple Products with Quantity >= 15 ===");
List<Product> filtered = inventory.getFilteredProducts(
new CompositeFilter(Arrays.asList(
new BrandFilter("Apple"),
new QuantityThresholdFilter(15)
))
);
filtered.forEach(System.out::println);
// Test 4: Sort all products by quantity
System.out.println("\n=== All Products (Sorted by Quantity) ===");
List<Product> sortedByQuantity = inventory.getSortedProducts(new QuantitySort());
sortedByQuantity.forEach(System.out::println);
// Test 5: Update quantity
System.out.println("\n=== After Updating iPhone Quantity ===");
inventory.updateQuantity("P1", 100);
System.out.println(inventory.getAllProducts().stream()
.filter(p -> p.getId().equals("P1"))
.findFirst()
.orElse(null));
}
}Answer (Part 1 of 3): Strategy Pattern Application
Strategy pattern enables dynamic filtering and sorting without modifying InventoryService by defining FilterStrategy and SortStrategy interfaces, implementing concrete strategies (BrandFilter, TypeFilter, PriceAscendingSort, QuantitySort), and allowing runtime strategy selection via getFilteredProducts(FilterStrategy) and getSortedProducts(SortStrategy)—this follows Open/Closed Principle where adding new filter criteria (PriceRangeFilter, RatingFilter) or sort criteria (NameSort, BrandSort) requires creating new strategy class implementing interface without changing existing code, demonstrating extensibility. InventoryService depends on abstractions (FilterStrategy, SortStrategy) not concrete implementations following Dependency Inversion Principle, enabling easy testing via mock strategies and flexible composition (CompositeFilter combining multiple filters with AND logic).
Answer (Part 2 of 3): SOLID Principles Adherence
Single Responsibility Principle separates concerns where Product class handles product data, FilterStrategy handles filtering logic, SortStrategy handles sorting logic, and InventoryService handles inventory management (add, update, fetch) with each class having single reason to change—Open/Closed Principle achieved via strategy pattern where system open for extension (new strategies) but closed for modification (InventoryService unchanged when adding new filter/sort)—Liskov Substitution Principle ensures any FilterStrategy implementation (BrandFilter, TypeFilter, CompositeFilter) can substitute FilterStrategy interface without breaking functionality, same for SortStrategy—Interface Segregation uses minimal interfaces (FilterStrategy with single matches() method, SortStrategy with single sort() method) avoiding fat interfaces forcing implementations to implement unused methods—Dependency Inversion where InventoryService depends on FilterStrategy/SortStrategy abstractions not concrete BrandFilter/PriceAscendingSort classes, enabling loose coupling and testability.
Answer (Part 3 of 3): Error Handling & Extensibility
Error handling validates inputs throwing IllegalArgumentException for null products, duplicate product IDs, negative quantities, and non-existent product updates, with descriptive error messages aiding debugging—extensibility examples demonstrate adding new filter (PriceRangeFilter checking price between min and max), new sort (MultiCriteriaSort sorting by price then quantity), and composite operations (filter by brand AND type, sort by price DESC then name ASC) all requiring zero changes to InventoryService, only new strategy classes—production enhancements add builder pattern for Product construction (Product.builder().id(“P1”).name(“iPhone”).build()), repository pattern for data persistence (InventoryRepository interface with InMemoryRepository, DatabaseRepository implementations), and pagination for large inventories (getFilteredProducts(filter, page, size) returning Page), demonstrating clean OOD beyond pure coding exercise considering maintainability, testability, and real-world requirements.