IPG Data Analyst & Business Intelligence Analyst
Marketing Analytics & Project Experience
1. Walk Me Through a Data Analytics Project You’ve Worked On
Level: Data Analyst to Senior Data Analyst
Difficulty: Medium-High
Source: InterviewQuery IPG Mediabrands Data Analyst Interview Guide (February 2024)
Team: IPG Mediabrands Analytics, Initiative, UM, Mediahub
Interview Round: Technical + Behavioral
Question: “Walk me through a data analytics project you’ve worked on. What was the objective, what methods did you use, and what results did you achieve?”
Answer:
Why This Question Matters at IPG:
IPG Mediabrands manages over $47 billion in marketing investments globally. This question tests your ability to demonstrate end-to-end analytical thinking, technical proficiency, and business impact—critical for optimizing advertising strategies across major global brands.
STAR Framework for Strong Responses:
Situation & Objective:
Business Context:
- Client/stakeholder need
- Business problem to solve
- Success criteria defined upfront
- Budget/timeline constraints

Data & Methods:
```python
# Example Project: Campaign Performance Optimization

# 1. DATA COLLECTION
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

# Connect to marketing data warehouse
engine = create_engine('postgresql://user:pass@host/marketing_db')

# Extract campaign data
query = """
SELECT campaign_id, campaign_name, channel, impressions,
       clicks, conversions, ad_spend, DATE(timestamp) AS date
FROM campaigns
WHERE DATE(timestamp) BETWEEN '2024-01-01' AND '2024-03-31'
"""
df = pd.read_sql(query, engine)

# 2. DATA CLEANING
# Handle missing values
df['conversions'] = df['conversions'].fillna(0)

# Remove outliers (spend more than 3 standard deviations from the mean)
df = df[np.abs(df['ad_spend'] - df['ad_spend'].mean()) <= (3 * df['ad_spend'].std())]

# 3. FEATURE ENGINEERING
# Calculate key marketing metrics
df['ctr'] = (df['clicks'] / df['impressions']) * 100
df['conversion_rate'] = (df['conversions'] / df['clicks']) * 100
df['cpc'] = df['ad_spend'] / df['clicks']
df['cpa'] = df['ad_spend'] / df['conversions']
avg_order_value = 85  # assumed AOV for illustration; derive from order data in practice
df['roas'] = (df['conversions'] * avg_order_value) / df['ad_spend']

# 4. ANALYSIS: Regression for channel effectiveness
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Prepare features
X = pd.get_dummies(df[['channel', 'impressions', 'clicks']], drop_first=True)
y = df['conversions']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# Identify most effective channels
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'coefficient': model.coef_
}).sort_values('coefficient', ascending=False)

print("Top performing channels:")
print(feature_importance.head())
```

Tools & Techniques:
- SQL: Data extraction from data warehouse
- Python (Pandas): Data cleaning, transformation, feature engineering
- Statistical Analysis: Regression analysis to identify channel effectiveness
- Visualization: Tableau dashboard for stakeholder presentation
- A/B Testing: Statistical significance testing for creative variants
Insights Discovered:
Key Findings:
1. Social media had 2.3x higher CTR than display ads
2. Video campaigns drove 40% lower CPA than static ads
3. Morning time slots (9-11 AM) had 18% higher conversion rates
4. Mobile traffic had higher volume but desktop had 25% better conversion rate
Strategic Recommendations:
- Reallocate 30% of display budget to social media
- Increase video creative production by 50%
- Implement dayparting strategy focusing on morning hours
- Create device-specific landing pages

Results & Business Impact:
Quantified Outcomes (3-month period):
- Campaign ROAS improved from 2.1x to 3.4x (+62% increase)
- Cost per acquisition decreased from $45 to $32 (-29% reduction)
- Overall conversion rate increased from 2.3% to 3.1% (+35% improvement)
- Client satisfaction score: 9.2/10
- Budget savings reinvested: $180K reallocated to high-performing channels
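The percentage changes above are simple arithmetic worth being able to derive on the spot in an interview; a quick check of the claimed figures:

```python
def pct_change(before, after):
    """Percent change relative to the baseline value."""
    return (after - before) / before * 100

print(round(pct_change(2.1, 3.4)))   # ROAS: 62 (% improvement)
print(round(pct_change(45, 32)))     # CPA: -29 (% reduction)
print(round(pct_change(2.3, 3.1)))   # conversion rate: 35 (% improvement)
```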
Long-term Impact:
- Methodology adopted across 5 additional client accounts
- Predictive model built for future campaign optimization
- Automated dashboard deployed for real-time monitoring

Sample Strong Response:
> “I led a campaign optimization project for a major retail client spending $2M annually across digital channels. Using SQL to extract data from our marketing data warehouse, I analyzed 150K transactions across paid search, social media, and display ads over 3 months. I applied regression analysis to identify channel effectiveness and discovered that social media delivered 2.3x higher CTR with 40% lower CPA than display ads. I built a Tableau dashboard showing real-time performance by channel, device, and time of day. Based on my recommendations, we reallocated 30% of budget from display to social and implemented dayparting strategies. Results: ROAS improved 62% from 2.1x to 3.4x, CPA decreased 29%, and the client saved $180K which was reinvested into high-performing channels. The methodology was subsequently adopted across 5 additional client accounts.”
IPG-Specific Context:
- Focus on marketing metrics: CTR, conversion rate, ROAS, CPA
- Demonstrate understanding of media channels: TV, digital, social, OOH, programmatic
- Emphasize client impact: Budget optimization, performance improvement
- Show cross-functional collaboration: Working with creative, media planning, account teams
What Interviewers Assess:
1. Technical Skills: SQL proficiency, statistical methods, tool usage
2. Business Acumen: Understanding marketing objectives, ROI focus
3. Communication: Clear storytelling from problem to solution
4. Impact Orientation: Quantified results, measurable outcomes
5. Client Mindset: Understanding stakeholder needs (critical at IPG)
2. Statistical Methods Application in Marketing Analytics
Level: Data Analyst to Senior Data Analyst
Difficulty: Medium
Source: InterviewQuery IPG Mediabrands Data Analyst Interview Guide (February 2024)
Team: IPG Mediabrands Analytics, Kinesso, Initiative
Interview Round: Technical Assessment
Question: “What statistical methods are you familiar with, and how have you applied them in your analyses?”
Answer:
Why This Question Matters at IPG:
IPG Mediabrands uses statistical methods to optimize $47B in annual marketing spend. This tests your theoretical knowledge AND practical application—critical for data-driven campaign decisions, A/B testing, and audience segmentation.
Core Statistical Methods for Marketing Analytics:
1. A/B Testing & Hypothesis Testing:
```python
import scipy.stats as stats
import numpy as np

class ABTestAnalyzer:
    def __init__(self, control, treatment, alpha=0.05):
        self.control = control
        self.treatment = treatment
        self.alpha = alpha

    def two_sample_t_test(self):
        """Test if two campaign variants have different conversion rates"""
        # Calculate metrics
        control_mean = np.mean(self.control)
        treatment_mean = np.mean(self.treatment)

        # Perform t-test
        t_stat, p_value = stats.ttest_ind(self.control, self.treatment)

        # Calculate confidence interval for the difference
        diff = treatment_mean - control_mean
        se = np.sqrt(np.var(self.control) / len(self.control) +
                     np.var(self.treatment) / len(self.treatment))
        ci_lower = diff - 1.96 * se
        ci_upper = diff + 1.96 * se

        return {
            'control_mean': control_mean,
            'treatment_mean': treatment_mean,
            'difference': diff,
            'p_value': p_value,
            'significant': p_value < self.alpha,
            'confidence_interval': (ci_lower, ci_upper),
            'recommendation': 'Launch treatment' if p_value < self.alpha and diff > 0
                              else 'Keep control'
        }

# Example: Testing two ad creatives
control_conversions = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]    # 50% conversion
treatment_conversions = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]  # 80% conversion

tester = ABTestAnalyzer(control_conversions, treatment_conversions)
results = tester.two_sample_t_test()
print(f"Control CVR: {results['control_mean']:.2%}")
print(f"Treatment CVR: {results['treatment_mean']:.2%}")
print(f"P-value: {results['p_value']:.4f}")
print(f"Significant: {results['significant']}")
```

When to Use:
- Testing ad creative variants (headlines, images, CTAs)
- Channel performance comparison
- Landing page optimization
- Audience targeting experiments
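A t-test on 0/1 outcomes works in practice, but for binary conversion data the more conventional choice is a two-proportion z-test. A minimal stdlib-only sketch (the counts below are made up for illustration):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided
    return z, p_value

# Hypothetical creative test: 500 vs. 590 conversions out of 10,000 visitors each
z, p = two_proportion_ztest(500, 10_000, 590, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```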
2. Regression Analysis:
```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
import pandas as pd

class MediaMixModeling:
    def __init__(self):
        self.model = LinearRegression()
        self.scaler = StandardScaler()

    def fit_media_mix_model(self, df):
        """Predict sales based on media spend across channels"""
        # Features: spend by channel
        X = df[['tv_spend', 'digital_spend', 'social_spend',
                'ooh_spend', 'radio_spend']]
        y = df['conversions']

        # Standardize features
        X_scaled = self.scaler.fit_transform(X)

        # Fit model
        self.model.fit(X_scaled, y)

        # Calculate channel effectiveness
        coefficients = pd.DataFrame({
            'channel': X.columns,
            'coefficient': self.model.coef_,
            'roi': self.model.coef_ / X.mean().values  # ROI per dollar spent
        }).sort_values('roi', ascending=False)

        return coefficients

# Example output:
#          channel  coefficient   roi
# 0  digital_spend        450.2  2.8x
# 1   social_spend        380.5  2.3x
# 2       tv_spend        290.1  1.1x
```

When to Use:
- Media mix modeling (budget allocation)
- Predicting campaign performance
- Understanding driver variables (what impacts conversions?)
- Forecasting sales based on ad spend
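The intuition behind those regression coefficients can be shown with a one-feature least-squares fit in pure Python (toy numbers, not real campaign data):

```python
def fit_line(xs, ys):
    """Closed-form simple linear regression: y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) /
             sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data: digital spend ($K) vs. conversions, built as y = 2x + 5
spend = [10, 20, 30, 40]
conversions = [25, 45, 65, 85]
slope, intercept = fit_line(spend, conversions)
print(slope, intercept)   # → 2.0 5.0 (each extra $1K of spend ≈ 2 more conversions)
```

With many correlated channels the same idea is fit jointly, which is why standardizing features before comparing coefficients (as the class above does) matters.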
3. Clustering & Segmentation:
```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

class CustomerSegmentation:
    def __init__(self, n_clusters=4):
        self.n_clusters = n_clusters
        self.model = KMeans(n_clusters=n_clusters, random_state=42)
        self.scaler = StandardScaler()

    def segment_customers(self, df):
        """Segment customers for targeted marketing"""
        # Features for segmentation
        features = df[['avg_order_value', 'purchase_frequency',
                       'recency_days', 'lifetime_value']]

        # Standardize
        features_scaled = self.scaler.fit_transform(features)

        # Cluster
        df['segment'] = self.model.fit_predict(features_scaled)

        # Profile segments
        segment_profiles = df.groupby('segment').agg({
            'avg_order_value': 'mean',
            'purchase_frequency': 'mean',
            'lifetime_value': 'mean',
            'customer_id': 'count'
        }).rename(columns={'customer_id': 'segment_size'})

        return segment_profiles

# Example segments:
# Segment 0: "High-Value Frequent"     - AOV $250, Freq 8x/year,  LTV $2000
# Segment 1: "Occasional Big Spenders" - AOV $400, Freq 2x/year,  LTV $800
# Segment 2: "Frequent Low-Spenders"   - AOV $50,  Freq 12x/year, LTV $600
# Segment 3: "At-Risk Lapsed"          - AOV $150, Freq 1x/year,  LTV $150
```

When to Use:
- Audience segmentation for targeted campaigns
- Customer lifetime value prediction
- Identifying “whale customers” (high-value clients)
- Personalization strategies
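Under the hood, K-means simply alternates between assigning points to the nearest centroid and recomputing centroid means. A pure-Python one-dimensional sketch on made-up lifetime values (deterministic min/max initialization, for illustration only):

```python
def kmeans_1d(values, k=2, iters=20):
    """Lloyd's algorithm in one dimension."""
    centroids = [min(values), max(values)] if k == 2 else sorted(values)[:k]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Update step: recompute each centroid as its cluster mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

ltv = [100, 120, 110, 900, 950, 1000]   # two obvious value tiers
centroids, clusters = kmeans_1d(ltv)
print(centroids)                        # → [110.0, 950.0]
```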
4. Time Series Analysis:
```python
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import ExponentialSmoothing

class TrendAnalyzer:
    def decompose_trends(self, df, date_col, metric_col):
        """Identify seasonality, trends, and anomalies"""
        # Set date index
        df = df.set_index(date_col)

        # Decompose time series (weekly seasonality)
        decomposition = seasonal_decompose(df[metric_col],
                                           model='multiplicative',
                                           period=7)

        # Forecast next 30 days
        model = ExponentialSmoothing(df[metric_col],
                                     seasonal_periods=7,
                                     trend='add',
                                     seasonal='add')
        fitted = model.fit()
        forecast = fitted.forecast(steps=30)

        return {
            'trend': decomposition.trend,
            'seasonal': decomposition.seasonal,
            'residual': decomposition.resid,
            'forecast': forecast
        }
```

When to Use:
- Campaign performance trending
- Budget pacing and forecasting
- Seasonality identification (holiday shopping, back-to-school)
- Anomaly detection (unexpected spikes/drops)
5. Chi-Square Test for Categorical Data:
```python
from scipy.stats import chi2_contingency
import pandas as pd

def test_channel_performance(df):
    """Test if conversion rates differ significantly by channel"""
    # Create contingency table
    contingency = pd.crosstab(df['channel'], df['converted'])

    # Chi-square test
    chi2, p_value, dof, expected = chi2_contingency(contingency)

    print(f"Chi-square statistic: {chi2:.2f}")
    print(f"P-value: {p_value:.4f}")
    print(f"Significant difference: {p_value < 0.05}")
    return p_value < 0.05

# Example contingency table:
# converted      0    1
# channel
# display      850  150   (15% conversion)
# email        600  400   (40% conversion)
# social       700  300   (30% conversion)
```

When to Use:
- Testing if conversion rates differ by channel
- Analyzing demographic differences in campaign response
- Evaluating creative performance across segments
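The `chi2_contingency` result can be verified by hand: expected counts come from row and column totals, and the statistic sums (observed − expected)²/expected over all cells. A stdlib-only check on the example contingency table above:

```python
# Contingency table from the example: rows = channel, cols = (not converted, converted)
observed = {
    'display': (850, 150),
    'email':   (600, 400),
    'social':  (700, 300),
}

rows = list(observed.values())
col_totals = [sum(r[j] for r in rows) for j in range(2)]
grand = sum(col_totals)

# Expected count for each cell: row_total * col_total / grand_total
chi2 = 0.0
for r in rows:
    row_total = sum(r)
    for j in range(2):
        expected = row_total * col_totals[j] / grand
        chi2 += (r[j] - expected) ** 2 / expected

dof = (len(rows) - 1) * (2 - 1)
print(f"chi2 = {chi2:.2f}, dof = {dof}")   # chi2 ≈ 155.95, far above the 5.99 cutoff at p=0.05
```

With differences this large the channels clearly convert at different rates; in borderline cases, also check that every expected cell count is at least 5 before trusting the chi-square approximation.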
Sample Strong Response:
> “I’m proficient in several statistical methods that I apply regularly in marketing analytics. For A/B testing, I use two-sample t-tests to evaluate campaign variants—recently I tested two email subject lines for a client and found treatment variant increased open rates by 18% (p<0.01), which we scaled to the full audience. I use regression analysis for media mix modeling to understand channel ROI; last quarter I built a model showing that digital spend had 2.8x ROI vs. 1.1x for TV, informing a $500K budget reallocation. For customer segmentation, I apply K-means clustering to identify high-value audiences—this helped a retail client identify ‘whale customers’ representing 15% of the base but 60% of revenue. I also use time series analysis to forecast campaign performance and detect anomalies, which helped catch a tracking issue that was underreporting conversions by 12%.”
Statistical Concepts Critical at IPG:
| Method | Marketing Use Case | Business Impact |
|---|---|---|
| A/B Testing | Creative optimization, audience testing | Improve CTR/CVR by 15-40% |
| Regression | Media mix modeling, attribution | Optimize budget allocation |
| Clustering | Customer segmentation, targeting | Increase ROAS by 25-50% |
| Time Series | Forecasting, trend analysis | Better budget pacing |
| Chi-Square | Channel performance comparison | Identify best channels |
What Interviewers Assess:
1. Theoretical Knowledge: Can you explain methods correctly?
2. Practical Application: Real examples from your experience
3. Business Context: How did analysis drive decisions?
4. Marketing Relevance: Understanding of advertising metrics
5. Tool Proficiency: Python/R, statistical software
SQL & Technical Proficiency
3. Identify Whale Customers with Complex SQL Logic
Level: Data Analyst to Senior Data Analyst
Difficulty: Hard
Source: DataLemur Interpublic Group SQL Interview Questions (April 2025)
Team: IPG Mediabrands Analytics, Client Strategy
Interview Round: SQL Technical Assessment
Question: “For Interpublic Group, write a SQL query to identify ‘whale customers’ who have either launched more than five marketing campaigns in the last six months OR have an average ad spend of over $10,000 per month.”
Answer:
Why This Question Matters at IPG:
“Whale customers” are high-value clients who drive significant revenue. At IPG Mediabrands ($47B in marketing investments), identifying and retaining these clients is critical for business growth. This SQL question tests your ability to work with CTEs, aggregations, date filtering, and complex business logic—all essential for marketing analytics.
Database Schema:
campaigns table:
```sql
CREATE TABLE campaigns (
    campaign_id INT PRIMARY KEY,
    customer_id INT,
    campaign_name VARCHAR(255),
    launch_date DATE
);
```

ad_spend table:
```sql
CREATE TABLE ad_spend (
    spend_id INT PRIMARY KEY,
    customer_id INT,
    month_year DATE,
    amount_spent DECIMAL(10, 2)
);
```

Sample Data:
campaigns:

| campaign_id | customer_id | campaign_name | launch_date |
|---|---|---|---|
| 1001 | 501 | Summer Sale 2024 | 2024-06-15 |
| 1002 | 501 | Back to School | 2024-08-01 |
| 1003 | 502 | Holiday Campaign | 2024-11-01 |

ad_spend:

| spend_id | customer_id | month_year | amount_spent |
|---|---|---|---|
| 2001 | 501 | 2024-06-01 | 12500.00 |
| 2002 | 501 | 2024-07-01 | 15000.00 |
| 2003 | 502 | 2024-06-01 | 5000.00 |

Solution with CTEs:
```sql
-- Step 1: Campaign count per customer (last 6 months)
WITH campaign_count AS (
    SELECT
        customer_id,
        COUNT(campaign_id) AS num_campaigns
    FROM campaigns
    WHERE launch_date >= CURRENT_DATE - INTERVAL '6 months'
    GROUP BY customer_id
),
-- Step 2: Average monthly ad spend per customer (last 6 months)
avg_spend AS (
    SELECT
        customer_id,
        AVG(amount_spent) AS avg_monthly_spend
    FROM ad_spend
    WHERE month_year >= CURRENT_DATE - INTERVAL '6 months'
    GROUP BY customer_id
)
-- Step 3: Combine both criteria with OR logic
SELECT
    COALESCE(cc.customer_id, as_table.customer_id) AS customer_id,
    cc.num_campaigns,
    as_table.avg_monthly_spend,
    CASE
        WHEN cc.num_campaigns > 5 AND as_table.avg_monthly_spend > 10000
            THEN 'Both Criteria'
        WHEN cc.num_campaigns > 5
            THEN 'High Campaign Volume'
        WHEN as_table.avg_monthly_spend > 10000
            THEN 'High Spend'
        ELSE 'Unknown'
    END AS whale_type
FROM campaign_count cc
FULL OUTER JOIN avg_spend as_table
    ON cc.customer_id = as_table.customer_id
WHERE cc.num_campaigns > 5
   OR as_table.avg_monthly_spend > 10000
ORDER BY
    COALESCE(cc.num_campaigns, 0) DESC,
    COALESCE(as_table.avg_monthly_spend, 0) DESC;
```

Expected Output:
| customer_id | num_campaigns | avg_monthly_spend | whale_type |
|-------------|---------------|-------------------|---------------------|
| 501 | 8 | 13250.00 | Both Criteria |
| 503 | 7 | 8500.00 | High Campaign Volume |
| 502 | 3 | 12000.00 | High Spend |

Key SQL Concepts Tested:
1. Common Table Expressions (CTEs):
```sql
WITH cte_name AS (
    -- Subquery logic
)
```
- Purpose: Organize complex queries into readable chunks
- Benefits: Reusable, easier debugging, often clearer than deeply nested subqueries
- IPG Context: Analyze campaigns separately from spend data
2. Date Filtering:
```sql
WHERE launch_date >= CURRENT_DATE - INTERVAL '6 months'
```
- CURRENT_DATE: Dynamic date reference (no hardcoding)
- INTERVAL: PostgreSQL syntax for date arithmetic
- Alternatives: `DATEADD()` (SQL Server), `DATE_SUB()` (MySQL)
3. Aggregation Functions:
```sql
COUNT(campaign_id) AS num_campaigns
AVG(amount_spent) AS avg_monthly_spend
```
- COUNT(): Number of campaigns per customer
- AVG(): Average monthly spend per customer
- GROUP BY: Required for the non-aggregated columns when using aggregations
4. FULL OUTER JOIN:
```sql
FROM campaign_count cc
FULL OUTER JOIN avg_spend as_table
    ON cc.customer_id = as_table.customer_id
```
- Why FULL OUTER JOIN: Some customers may appear only in campaigns OR ad_spend
- Captures both criteria: Customers with high campaigns OR high spend (not AND)
- COALESCE(): Handle NULL values when a customer appears in only one table
5. COALESCE for NULL Handling:
```sql
COALESCE(cc.customer_id, as_table.customer_id) AS customer_id
```
- Purpose: Return the first non-NULL value
- Critical for FULL OUTER JOIN: Customer ID may be NULL in one CTE
- Alternatives: `IFNULL()` (MySQL), `ISNULL()` (SQL Server)
6. Complex WHERE Clause with OR Logic:
```sql
WHERE cc.num_campaigns > 5 OR as_table.avg_monthly_spend > 10000
```
- Business Logic: Either condition qualifies a customer as a "whale"
- Performance: Ensure indexes exist on both filter columns
Alternative Solutions:
Solution 2: Using UNION (Separate Queries):
```sql
-- Customers with >5 campaigns
SELECT customer_id, 'High Campaign Volume' AS whale_type
FROM campaigns
WHERE launch_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY customer_id
HAVING COUNT(campaign_id) > 5

UNION

-- Customers with >$10K avg spend
SELECT customer_id, 'High Spend' AS whale_type
FROM ad_spend
WHERE month_year >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY customer_id
HAVING AVG(amount_spent) > 10000;
```

Solution 3: Subquery Approach:
```sql
-- Wrap the correlated subqueries in a derived table so the column
-- aliases can be referenced in the outer WHERE clause
SELECT *
FROM (
    SELECT
        customers.customer_id,
        (SELECT COUNT(*)
         FROM campaigns c
         WHERE c.customer_id = customers.customer_id
           AND c.launch_date >= CURRENT_DATE - INTERVAL '6 months') AS num_campaigns,
        (SELECT AVG(amount_spent)
         FROM ad_spend a
         WHERE a.customer_id = customers.customer_id
           AND a.month_year >= CURRENT_DATE - INTERVAL '6 months') AS avg_monthly_spend
    FROM (SELECT customer_id FROM campaigns
          UNION
          SELECT customer_id FROM ad_spend) customers
) metrics
WHERE num_campaigns > 5 OR avg_monthly_spend > 10000;
```

Performance Optimization:
```sql
-- Add indexes for faster queries
CREATE INDEX idx_campaigns_customer_date
    ON campaigns(customer_id, launch_date);

CREATE INDEX idx_spend_customer_date
    ON ad_spend(customer_id, month_year);

-- Partitioning for large tables
CREATE TABLE campaigns_partitioned (
    campaign_id INT,
    customer_id INT,
    launch_date DATE
) PARTITION BY RANGE (launch_date);
```

Extended Analysis - Additional Whale Metrics:
```sql
WITH campaign_metrics AS (
    SELECT
        customer_id,
        COUNT(campaign_id) AS num_campaigns,
        MIN(launch_date) AS first_campaign,
        MAX(launch_date) AS last_campaign,
        COUNT(DISTINCT DATE_TRUNC('month', launch_date)) AS active_months
    FROM campaigns
    WHERE launch_date >= CURRENT_DATE - INTERVAL '6 months'
    GROUP BY customer_id
),
spend_metrics AS (
    SELECT
        customer_id,
        AVG(amount_spent) AS avg_monthly_spend,
        SUM(amount_spent) AS total_spend,
        STDDEV(amount_spent) AS spend_volatility,
        MAX(amount_spent) AS max_monthly_spend
    FROM ad_spend
    WHERE month_year >= CURRENT_DATE - INTERVAL '6 months'
    GROUP BY customer_id
)
SELECT
    COALESCE(c.customer_id, s.customer_id) AS customer_id,
    c.num_campaigns,
    s.avg_monthly_spend,
    s.total_spend,
    c.active_months,
    -- Whale score calculation
    (COALESCE(c.num_campaigns, 0) * 10 +
     COALESCE(s.avg_monthly_spend, 0) / 1000) AS whale_score,
    -- Customer lifetime value estimate
    s.total_spend * (c.active_months / 6.0) * 2 AS estimated_annual_value
FROM campaign_metrics c
FULL OUTER JOIN spend_metrics s ON c.customer_id = s.customer_id
WHERE c.num_campaigns > 5 OR s.avg_monthly_spend > 10000
ORDER BY whale_score DESC
LIMIT 100;  -- Top 100 whales
```

Business Applications at IPG:
- Account Prioritization: Sales team focuses on top whales
- Retention Strategies: Dedicated account managers for high-value clients
- Churn Prevention: Monitor engagement drops in whale accounts
- Upsell Opportunities: Identify whales with room to grow
- Resource Allocation: Assign best creative teams to whale campaigns
Sample Strong Response:
> “This query identifies high-value ‘whale customers’ using two criteria: campaign volume and ad spend. I use CTEs to calculate each metric separately for clarity—one CTE counts campaigns in the last 6 months, the other calculates average monthly spend. The key challenge is the OR logic: customers qualify if they meet EITHER criterion, not both. I use a FULL OUTER JOIN to capture all customers regardless of which criterion they meet, with COALESCE handling cases where a customer only appears in one table. I add a whale_type column to segment customers by whether they’re high-volume, high-spend, or both, which helps the account management team tailor their approach. For performance, I’d add indexes on customer_id and date columns. This analysis would feed into IPG’s account prioritization, helping identify the 15-20% of clients driving 60-80% of revenue.”
What Interviewers Assess:
1. SQL Proficiency: CTEs, joins, aggregations, date logic
2. Problem Decomposition: Breaking complex logic into steps
3. Business Understanding: Why “whale customers” matter
4. Edge Case Handling: NULL values, missing data
5. Performance Awareness: Index optimization, query efficiency
4. Handling Missing Data in Marketing Datasets
Level: Data Analyst
Difficulty: Medium
Source: InterviewQuery IPG Mediabrands Data Analyst Interview Guide (February 2024)
Team: IPG Mediabrands Analytics, Data Quality
Interview Round: Technical Problem Solving
Question: “How do you handle missing data in your analyses?”
Answer:
Why This Question Matters at IPG:
Marketing campaign data is notoriously messy—tracking pixels fail, attribution breaks, third-party data has gaps. At IPG, making wrong decisions due to poor data handling can cost millions in client spend. This tests your practical data cleaning skills and understanding of when different approaches are appropriate.
Decision Framework for Missing Data:
```python
import pandas as pd
import numpy as np
from sklearn.impute import KNNImputer

class MissingDataHandler:
    def __init__(self, df):
        self.df = df

    def assess_missing_data(self):
        """Step 1: Understand the extent and pattern of missing data"""
        missing_summary = pd.DataFrame({
            'column': self.df.columns,
            'missing_count': self.df.isnull().sum(),
            'missing_pct': (self.df.isnull().sum() / len(self.df)) * 100,
            'data_type': self.df.dtypes
        }).sort_values('missing_pct', ascending=False)

        print("Missing Data Summary:")
        print(missing_summary[missing_summary['missing_pct'] > 0])

        # Test if missing is random (MCAR) or systematic (MAR/MNAR)
        self.test_missing_pattern()
        return missing_summary

    def test_missing_pattern(self):
        """Determine if data is Missing Completely At Random (MCAR)"""
        # Example: Check if missing conversion data correlates with channel
        if 'channel' in self.df.columns and 'conversions' in self.df.columns:
            missing_by_channel = self.df.groupby('channel')['conversions'].apply(
                lambda x: x.isnull().sum() / len(x) * 100
            )
            print("\nMissing conversions by channel:")
            print(missing_by_channel)

            # If one channel has >20pp more missing than others, it's systematic
            if missing_by_channel.max() - missing_by_channel.min() > 20:
                print("⚠️ Missing data is NOT random - systematic issue detected")
            else:
                print("✓ Missing data appears random")

    def handle_missing_strategy(self, column, method='auto'):
        """Step 2: Choose an appropriate handling strategy"""
        missing_pct = (self.df[column].isnull().sum() / len(self.df)) * 100

        # Strategy decision tree
        if missing_pct < 5:
            return self.drop_missing(column)
        elif missing_pct < 30:
            if self.df[column].dtype in ['int64', 'float64']:
                return self.impute_numerical(column, method)
            else:
                return self.impute_categorical(column)
        else:
            return self.create_missing_indicator(column)

    def drop_missing(self, column):
        """Method 1: Drop rows with missing values (<5% missing)"""
        original_len = len(self.df)
        self.df = self.df.dropna(subset=[column])
        dropped = original_len - len(self.df)
        print(f"Dropped {dropped} rows ({dropped/original_len*100:.1f}%)")
        return self.df

    def impute_numerical(self, column, method='median'):
        """Method 2: Imputation for numerical data"""
        if method == 'median':
            # Median imputation (robust to outliers)
            fill_value = self.df[column].median()
            self.df[column] = self.df[column].fillna(fill_value)
            print(f"Filled {column} with median: {fill_value:.2f}")
        elif method == 'mean':
            # Mean imputation (use if data is normally distributed)
            fill_value = self.df[column].mean()
            self.df[column] = self.df[column].fillna(fill_value)
        elif method == 'forward_fill':
            # Forward fill (time series data)
            self.df[column] = self.df[column].ffill()
        elif method == 'knn':
            # KNN imputation; pass multiple related columns in practice so the
            # nearest-neighbor search has signal to work with
            imputer = KNNImputer(n_neighbors=5)
            self.df[[column]] = imputer.fit_transform(self.df[[column]])
            print(f"Filled {column} using KNN imputation")
        return self.df

    def impute_categorical(self, column):
        """Method 3: Imputation for categorical data"""
        # Mode imputation (most frequent value)
        mode_value = self.df[column].mode()[0]
        self.df[column] = self.df[column].fillna(mode_value)
        print(f"Filled {column} with mode: {mode_value}")
        return self.df

    def create_missing_indicator(self, column):
        """Method 4: Create an indicator variable (>30% missing)"""
        # Binary column indicating missingness
        self.df[f'{column}_missing'] = self.df[column].isnull().astype(int)

        # Fill with a placeholder
        if self.df[column].dtype in ['int64', 'float64']:
            self.df[column] = self.df[column].fillna(-999)
        else:
            self.df[column] = self.df[column].fillna('MISSING')
        print(f"Created missing indicator for {column}")
        return self.df

# Example: Handling missing campaign data
campaign_data = pd.DataFrame({
    'campaign_id': range(1, 101),
    'impressions': np.random.randint(1000, 10000, 100),
    'clicks': [np.random.randint(10, 200) if np.random.random() > 0.1
               else np.nan for _ in range(100)],        # ~10% missing
    'conversions': [np.random.randint(1, 50) if np.random.random() > 0.25
                    else np.nan for _ in range(100)],   # ~25% missing
    'channel': np.random.choice(['Social', 'Display', 'Search'], 100)
})

handler = MissingDataHandler(campaign_data)
missing_summary = handler.assess_missing_data()

# Handle clicks (~10% missing) - use median
handler.impute_numerical('clicks', method='median')

# Handle conversions (~25% missing) - use KNN imputation
handler.impute_numerical('conversions', method='knn')
```

When to Use Each Method:
| Missing % | Method | Use Case |
|---|---|---|
| <5% | Drop rows | Minimal impact, preserves data quality |
| 5-15% | Simple imputation (mean/median/mode) | Quick, reasonable for random missing |
| 15-30% | Advanced imputation (KNN, regression) | Preserve data, use patterns to estimate |
| >30% | Missing indicator + flag | Too much missing to reliably impute |
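The decision table above can be condensed into a small helper (the thresholds are the rule-of-thumb values from the table, not hard rules):

```python
def missing_data_strategy(missing_pct):
    """Map a column's missing percentage to a handling strategy."""
    if missing_pct < 5:
        return "drop rows"
    elif missing_pct < 15:
        return "simple imputation (mean/median/mode)"
    elif missing_pct < 30:
        return "advanced imputation (KNN, regression)"
    else:
        return "missing indicator + flag"

for pct in (2, 10, 22, 45):
    print(f"{pct}% missing -> {missing_data_strategy(pct)}")
```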
Marketing-Specific Examples:
1. Missing Conversion Data:
```python
# Conversion tracking often fails - don't just drop!
def handle_missing_conversions(df):
    """Estimate conversions based on similar campaigns"""
    # If conversions are missing but clicks exist, apply the channel's conversion rate
    avg_cvr_by_channel = (df.groupby('channel')['conversions'].sum() /
                          df.groupby('channel')['clicks'].sum())

    def estimate_conversion(row):
        if pd.isnull(row['conversions']) and not pd.isnull(row['clicks']):
            channel_cvr = avg_cvr_by_channel[row['channel']]
            return row['clicks'] * channel_cvr
        return row['conversions']

    df['conversions'] = df.apply(estimate_conversion, axis=1)
    return df
```

2. Missing Demographics:
```python
# Third-party data providers often have gaps
def handle_missing_demographics(df):
    """Use customer behavior to infer demographics"""
    from sklearn.ensemble import RandomForestClassifier

    feature_cols = ['purchase_amount', 'purchase_frequency', 'device_type']

    # Train on complete cases (one-hot encode the categorical device_type)
    complete = df.dropna(subset=['age'] + feature_cols)
    X_train = pd.get_dummies(complete[feature_cols])
    y_train = pd.cut(complete['age'], bins=[0, 25, 35, 50, 100],
                     labels=['18-25', '26-35', '36-50', '50+'])

    model = RandomForestClassifier()
    model.fit(X_train, y_train)

    # Predict an age group for rows with missing age
    missing = df[df['age'].isnull()]
    X_missing = pd.get_dummies(missing[feature_cols]).reindex(
        columns=X_train.columns, fill_value=0)
    df.loc[df['age'].isnull(), 'age_group'] = model.predict(X_missing)
    return df
```

3. Missing Attribution Data:
```python
# Last-touch attribution is often missing for cross-device journeys
def handle_missing_attribution(df):
    """Use probabilistic attribution when last-touch is unavailable"""
    def probabilistic_attribution(row):
        if pd.isnull(row['last_touch_channel']):
            # Distribute credit evenly across the touchpoint sequence (linear attribution)
            touchpoints = row['touchpoint_sequence'].split(',')
            credit_per_touch = 1.0 / len(touchpoints)
            return touchpoints, [credit_per_touch] * len(touchpoints)
        else:
            return [row['last_touch_channel']], [1.0]

    df[['attributed_channels', 'attribution_credit']] = df.apply(
        probabilistic_attribution, axis=1, result_type='expand')
    return df
```

Documentation Best Practices:
```python
def document_missing_data_handling(df, output_file='data_quality_report.md'):
    """Create documentation of all missing data decisions"""
    report = []
    report.append("# Data Quality Report\n")
    report.append(f"**Date:** {pd.Timestamp.now()}\n")
    report.append(f"**Total Records:** {len(df)}\n\n")
    report.append("## Missing Data Summary\n")

    for col in df.columns:
        missing_pct = df[col].isnull().sum() / len(df) * 100
        if missing_pct > 0:
            report.append(f"- **{col}:** {missing_pct:.1f}% missing\n")
            report.append("  - **Action:** [Document what you did]\n")
            report.append("  - **Rationale:** [Why this approach]\n")

    with open(output_file, 'w') as f:
        f.writelines(report)
```

Sample Strong Response:
> “When faced with missing data, I follow a systematic approach. First, I assess the extent and pattern—is it missing completely at random or systematically? Recently I worked with campaign conversion data where 25% was missing. I discovered tracking pixels were failing on Safari browsers, so the missingness wasn’t random. For numerical metrics like conversions with 5-15% missing, I use median imputation or KNN imputation which uses similar campaigns to estimate values. For categorical data like demographics, I use mode imputation. If more than 30% is missing, I create a ‘missing’ indicator column rather than impute, as the estimate would be unreliable. I always document my decisions and assess the impact—for example, after imputing, I compare statistical properties (mean, std dev) before and after to ensure I haven’t introduced bias. In one case, proper handling of missing conversion data prevented us from incorrectly concluding that Safari traffic was unprofitable, when actually it was just a tracking issue.”
What Interviewers Assess:
1. Systematic Thinking: Do you assess before acting?
2. Method Knowledge: Understand multiple approaches
3. Practical Judgment: When to use each method
4. Business Impact: How does handling affect conclusions?
5. Documentation: Transparent about assumptions
5. Calculate Click-Through Rate (CTR) with SQL
Level: Data Analyst
Difficulty: Easy-Medium
Source: DataLemur Interpublic Group SQL Interview Questions (April 2025)
Team: IPG Mediabrands Analytics, Campaign Measurement
Interview Round: SQL Screening
Question: “Write a SQL query to calculate the click-through-rate (CTR) for each ad campaign. CTR = (total clicks / total impressions) × 100.”
Answer:
Why This Question Matters at IPG:
CTR is a fundamental digital marketing metric. At IPG, you’ll calculate CTR daily across thousands of campaigns to optimize creative, targeting, and budget allocation. This tests basic SQL aggregation and your understanding of marketing KPIs.
Database Schema:
CREATE TABLE ad_campaigns (
campaign_id INT,
campaign_name VARCHAR(255),
date DATE,
impressions INT,
clicks INT);
Sample Data:
INSERT INTO ad_campaigns VALUES
(101, 'Summer Sale 2024', '2024-06-01', 5000, 400),
(101, 'Summer Sale 2024', '2024-06-02', 4500, 380),
(202, 'Back to School', '2024-08-01', 7500, 330),
(202, 'Back to School', '2024-08-02', 6800, 310);
Solution:
SELECT
campaign_name,
SUM(impressions) AS total_impressions,
SUM(clicks) AS total_clicks,
ROUND((SUM(clicks)::DECIMAL / SUM(impressions)) * 100, 2) AS ctr
FROM ad_campaigns
GROUP BY campaign_name
ORDER BY ctr DESC;
Expected Output:
| campaign_name | total_impressions | total_clicks | ctr |
|--------------------|-------------------|--------------|------|
| Summer Sale 2024 | 9500 | 780 | 8.21 |
| Back to School     | 14300             | 640          | 4.48 |
Key Concepts:
1. Type Casting to Avoid Integer Division:
-- ❌ WRONG: Integer division truncates
SUM(clicks) / SUM(impressions)  -- Returns 0 if clicks < impressions

-- ✓ CORRECT: Cast to DECIMAL for precision
SUM(clicks)::DECIMAL / SUM(impressions)
2. Aggregation with GROUP BY:
GROUP BY campaign_name  -- Aggregate metrics per campaign
3. Sorting by Performance:
ORDER BY ctr DESC  -- Best-performing campaigns first
Extended Analysis - Campaign Performance Dashboard:
-- Comprehensive campaign metrics
SELECT
    campaign_name,
    COUNT(DISTINCT date) AS days_active,
    SUM(impressions) AS total_impressions,
    SUM(clicks) AS total_clicks,
    ROUND((SUM(clicks)::DECIMAL / SUM(impressions)) * 100, 2) AS ctr,
    -- Performance benchmarks
    CASE
        WHEN (SUM(clicks)::DECIMAL / SUM(impressions)) * 100 >= 5.0 THEN 'Excellent'
        WHEN (SUM(clicks)::DECIMAL / SUM(impressions)) * 100 >= 2.0 THEN 'Good'
        WHEN (SUM(clicks)::DECIMAL / SUM(impressions)) * 100 >= 1.0 THEN 'Average'
        ELSE 'Poor'
    END AS performance_tier,
    -- Trend analysis
    ROUND((SUM(clicks)::DECIMAL / SUM(impressions)) * 100, 2) -
        LAG(ROUND((SUM(clicks)::DECIMAL / SUM(impressions)) * 100, 2))
        OVER (ORDER BY campaign_name) AS ctr_change
FROM ad_campaigns
GROUP BY campaign_name
ORDER BY ctr DESC;
Sample Strong Response:
> “CTR is calculated as total clicks divided by total impressions times 100. The key challenge in SQL is avoiding integer division—I use type casting (::DECIMAL) to ensure precision. I aggregate using SUM() with GROUP BY campaign_name to get totals per campaign, then order by CTR descending to identify top performers. In production, I’d extend this to include benchmarks (5%+ is excellent for display ads), date ranges for trend analysis, and segment by channel since CTR benchmarks vary—search ads typically get 3-5% CTR while display averages 0.5-1%.”
What Interviewers Assess:
1. SQL Fundamentals: Aggregation, type casting
2. Marketing Knowledge: Understanding CTR formula
3. Attention to Detail: Proper calculation (avoiding integer division)
4. Business Context: CTR benchmarks and interpretation
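As a sanity check, the same aggregation can be reproduced in pandas against the sample rows above — a quick habit for validating SQL output:

```python
import pandas as pd

# Same sample rows as the SQL INSERT above
rows = [
    (101, 'Summer Sale 2024', '2024-06-01', 5000, 400),
    (101, 'Summer Sale 2024', '2024-06-02', 4500, 380),
    (202, 'Back to School',   '2024-08-01', 7500, 330),
    (202, 'Back to School',   '2024-08-02', 6800, 310),
]
df = pd.DataFrame(rows, columns=['campaign_id', 'campaign_name', 'date',
                                 'impressions', 'clicks'])

# Sum per campaign, then compute CTR on the totals (not the daily rows)
ctr = (df.groupby('campaign_name')[['impressions', 'clicks']].sum()
         .assign(ctr=lambda g: (g['clicks'] / g['impressions'] * 100).round(2))
         .sort_values('ctr', ascending=False))
print(ctr)
```

The result matches the expected output: 8.21 for Summer Sale 2024 and 4.48 for Back to School.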
Project Management & Communication
6. Prioritizing Multiple Competing Deadlines
Level: Data Analyst to Business Intelligence Analyst
Difficulty: Medium
Source: InterviewQuery IPG Mediabrands Business Analyst Interview Guide (February 2024)
Team: IPG Mediabrands Analytics, Multiple Client Teams
Interview Round: Behavioral Assessment
Question: “How do you prioritize multiple data platform requests or tasks when you have competing deadlines?”
Answer:
Why This Question Matters at IPG:
IPG Mediabrands manages $47B in marketing investments across hundreds of clients. You’ll constantly face competing priorities: urgent client dashboards, quarterly business reviews, ad-hoc analyses, and platform maintenance. This tests organizational skills and stakeholder management—critical for success in a fast-paced agency environment.
Prioritization Framework:
1. Eisenhower Matrix (Urgent vs. Important):
                    HIGH IMPORTANCE
                          ↑
   QUADRANT 2:            |   QUADRANT 1:
   Strategic              |   Crisis/Urgent
   - Quarterly business   |   - Client escalation
     review prep          |   - Campaign going live
   - Dashboard automation |   - Data pipeline broken
   - Process improvement  |   - Executive presentation
  ────────────────────────┼──────────────────────────→
   QUADRANT 4:            |   QUADRANT 3:   HIGH URGENCY
   Distractions           |   Busywork
   - Non-critical emails  |   - Routine reports
   - Nice-to-have analyses|   - Status meetings
   - Scope creep          |   - Low-value requests
                          |
                    LOW IMPORTANCE
Action Plan:
- Quadrant 1 (Urgent + Important): Do immediately
- Quadrant 2 (Important, Not Urgent): Schedule dedicated time
- Quadrant 3 (Urgent, Not Important): Delegate or automate
- Quadrant 4 (Neither): Eliminate
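The action plan above reduces to a small lookup, sketched here:

```python
def eisenhower_action(urgent: bool, important: bool) -> str:
    """Map a task's urgency/importance to the recommended action."""
    if urgent and important:
        return 'Do immediately'           # Quadrant 1
    if important:
        return 'Schedule dedicated time'  # Quadrant 2
    if urgent:
        return 'Delegate or automate'     # Quadrant 3
    return 'Eliminate'                    # Quadrant 4

print(eisenhower_action(urgent=True, important=False))  # Delegate or automate
```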
2. Impact × Urgency Scoring:
import pandas as pd

class TaskPrioritizer:
    def __init__(self):
        self.tasks = []

    def score_task(self, task_name, business_impact, urgency, effort):
        """
        Score tasks based on impact, urgency, and effort.
        business_impact: 1-10 (revenue impact, strategic value)
        urgency: 1-10 (how soon needed)
        effort: 1-10 (time required, lower is better)
        """
        # Priority score = (Impact × Urgency) / Effort
        priority_score = (business_impact * urgency) / effort
        task = {
            'task': task_name,
            'impact': business_impact,
            'urgency': urgency,
            'effort': effort,
            'priority_score': priority_score
        }
        self.tasks.append(task)
        return priority_score

    def get_prioritized_list(self):
        """Return tasks sorted by priority"""
        df = pd.DataFrame(self.tasks)
        return df.sort_values('priority_score', ascending=False)

# Example: Multiple competing requests
prioritizer = TaskPrioritizer()

# Task 1: Client dashboard for quarterly business review
prioritizer.score_task('Q4 QBR Dashboard', business_impact=9, urgency=10, effort=8)
# Task 2: Ad-hoc analysis for account team
prioritizer.score_task('Ad-hoc Audience Analysis', business_impact=5, urgency=8, effort=3)
# Task 3: Automate weekly reporting
prioritizer.score_task('Automate Weekly Report', business_impact=7, urgency=4, effort=6)
# Task 4: Fix data pipeline issue
prioritizer.score_task('Fix Broken ETL Pipeline', business_impact=10, urgency=10, effort=5)
# Task 5: Exploratory analysis
prioritizer.score_task('Explore New Data Source', business_impact=6, urgency=2, effort=7)

print(prioritizer.get_prioritized_list())
# Output:
#                        task  impact  urgency  effort  priority_score
# 3   Fix Broken ETL Pipeline      10       10       5           20.00
# 1  Ad-hoc Audience Analysis       5        8       3           13.33
# 0          Q4 QBR Dashboard       9       10       8           11.25
# 2    Automate Weekly Report       7        4       6            4.67
# 4   Explore New Data Source       6        2       7            1.71
3. Stakeholder Communication:
def communicate_priorities(tasks_df, stakeholder_email):
    """Template for communicating priorities transparently"""
    email_template = f"""
    Hi [Stakeholder],

    I've received your request for [Task Name]. To ensure I'm prioritizing
    effectively across all stakeholders, here's my current queue:

    **This Week (High Priority):**
    {tasks_df[tasks_df['priority_score'] >= 10][['task', 'deadline']].to_string()}

    **Next Week (Medium Priority):**
    {tasks_df[(tasks_df['priority_score'] >= 5) & (tasks_df['priority_score'] < 10)][['task', 'deadline']].to_string()}

    **Your Request:** [Task Name]
    - Estimated effort: [X hours/days]
    - Proposed start date: [Date]
    - Estimated completion: [Date]

    Does this timeline work for you? If this is more urgent, please let me
    know the business context so I can re-prioritize accordingly.

    Best,
    [Your name]
    """
    return email_template
4. Tools for Task Management:
- Jira / Asana: Track all requests in one place
- Calendar blocking: Dedicated time for deep work
- Kanban board: Visualize workflow (To Do → In Progress → Done)
- Daily standup: 15-min sync with team on priorities
Sample Strong Response:
> “I use a priority matrix to categorize tasks based on urgency and business impact. When I receive competing requests, I assess three factors: business impact (is this revenue-driving or strategic?), urgency (what’s the real deadline?), and effort (how long will it take?). I calculate a priority score as (Impact × Urgency) / Effort. Recently, I had five competing requests: a quarterly business review dashboard, an ad-hoc audience analysis, automating weekly reporting, fixing a broken data pipeline, and exploratory analysis. The broken pipeline scored highest (10×10/5 = 20) because it was blocking multiple teams, so I fixed that first. The ad-hoc analysis was quick (5×8/3 = 13.3) so I knocked it out next. The QBR dashboard was high-impact but took longer (9×10/8 = 11.25), so I scheduled dedicated time for it. I regularly communicate my priorities to stakeholders transparently, including proposed timelines, and ask if anything is more urgent than I realize. I also use Jira to track all requests and calendar blocking to protect time for deep work. This approach helped me manage 15-20 concurrent requests last quarter while maintaining quality and meeting all critical deadlines.”
What Interviewers Assess:
1. Organizational Skills: Systematic approach to prioritization
2. Stakeholder Management: Communication and expectation-setting
3. Judgment: Understanding business impact vs. busywork
4. Adaptability: Flexibility when priorities shift
5. Tools: Use of project management systems
7. Data Visualization and Storytelling
Level: Data Analyst to Senior Data Analyst
Difficulty: Medium-High
Source: InterviewQuery IPG Mediabrands Data Analyst Interview Guide (February 2024)
Team: IPG Mediabrands Analytics, Client Insights
Interview Round: Technical + Communication
Question: “What data visualization tools have you used, and how have you applied them to communicate findings?”
Answer:
Why This Question Matters at IPG:
At IPG, you’ll create client-facing dashboards and present to Fortune 500 CMOs. This tests both technical tool proficiency (Tableau, Power BI) and data storytelling abilities—translating complex analyses into actionable insights for non-technical stakeholders.
Tool Proficiency Overview:
| Tool | Strength | Best For | IPG Use Case |
|---|---|---|---|
| Tableau | Interactive dashboards, visual appeal | Client-facing presentations | Campaign performance tracking, QBRs |
| Power BI | Microsoft integration, enterprise scale | Internal reporting, executive dashboards | Budget pacing, ROI analysis |
| Looker | SQL-based, data governance | Centralized metrics, self-serve analytics | Standardized KPI definitions |
| Google Data Studio | Free, Google Ads integration | Quick campaign reports | Paid search performance |
| Python (Matplotlib/Seaborn) | Custom, reproducible | Ad-hoc analysis, statistical visualization | A/B test results, regression analysis |
Dashboard Design Principles:
1. Understand Your Audience:
Executive Dashboard (C-Suite):
- High-level KPIs (3-5 metrics max)
- Trend over time (YoY, MoM)
- Simple visuals (bar, line, single number)
- Action-oriented (what decision to make?)
Analyst Dashboard (Media Planners):
- Detailed breakdowns (by channel, demo, geography)
- Multiple filters and drill-downs
- Technical metrics (CTR, CPC, CVR, ROAS)
- Exploration-focused (answer "why?")
Client Dashboard (CMO):
- Business outcomes (revenue, brand lift, market share)
- Competitive benchmarking
- Clear narratives and insights
- Visually polished (brand colors)
2. Chart Type Selection:
def choose_chart_type(data_type, purpose):
    """Select appropriate chart based on data and purpose"""
    chart_guide = {
        ('time_series', 'trend'): 'Line chart',
        ('categorical', 'comparison'): 'Bar chart (horizontal)',
        ('part_to_whole', 'composition'): 'Stacked bar or pie chart',
        ('two_variables', 'relationship'): 'Scatter plot',
        ('geographic', 'location'): 'Map',
        ('single_number', 'highlight'): 'KPI card / Big Number',
        ('distribution', 'spread'): 'Histogram or box plot',
        ('ranking', 'top_performers'): 'Bar chart (sorted)',
        ('flow', 'journey'): 'Sankey diagram',
        ('multiple_metrics', 'comparison'): 'Bullet chart or small multiples'
    }
    return chart_guide.get((data_type, purpose), 'Table (fallback)')

# Examples:
print(choose_chart_type('time_series', 'trend'))       # Line chart
print(choose_chart_type('categorical', 'comparison'))  # Bar chart (horizontal)
3. Color Strategy:
Color Best Practices:
- Use brand colors for consistency
- Limit to 5-7 colors max
- Color blindness-friendly palettes (avoid red-green)
- Reserve red for alerts/negative, green for positive
- Gray out less important elements
- Consistent color = consistent meaning across dashboard
Example Tableau Dashboard Structure:
┌─────────────────────────────────────────────────────────────┐
│ Campaign Performance Dashboard - Q4 2024 │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Total Spend │ │ Conversions │ │ ROAS │ │
│ │ $2.4M │ │ 15,234 │ │ 3.2x │ │
│ │ ↑ 12% MoM │ │ ↑ 8% MoM │ │ ↓ 2% MoM │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌────────────────────────────────┐ ┌──────────────────┐ │
│ │ Conversion Trend (Line Chart) │ │ Channel Mix │ │
│ │ [Line graph showing daily │ │ (Stacked Bar) │ │
│ │ conversions over time] │ │ [Bar showing │ │
│ │ │ │ spend by │ │
│ │ │ │ channel] │ │
│ └────────────────────────────────┘ └──────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Performance by Channel (Table with Heat Map) │ │
│ │ Channel | Spend | Impressions | CTR | ROAS │ │
│ │ Social | $800K | 12M | 2.3% | 4.1x 🟢 │ │
│ │ Search | $900K | 5M | 4.5% | 3.8x 🟢 │ │
│ │ Display | $700K | 20M | 0.8% | 1.9x 🔴 │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ Filters: Date Range | Channel | Campaign | Geography │
└─────────────────────────────────────────────────────────────┘
4. Interactivity Best Practices:
// Tableau Dashboard Actions

// Action 1: Filter on click
//   Click a bar chart → filters entire dashboard

// Action 2: Drill-down hierarchy
//   Campaign → Ad Group → Creative
//   User clicks through levels for detail

// Action 3: Parameter controls
//   Dropdown: Select metric (CTR vs. CVR vs. ROAS)
//   Date range picker: Custom time periods

// Action 4: Tooltips with context
//   Hover over data point → show:
//   - Exact values
//   - YoY comparison
//   - Benchmark
5. Performance Optimization:
-- Tableau data extracts for speed
-- Instead of live query every time:

-- Option 1: Aggregated extract (smaller, faster)
CREATE TABLE campaign_summary_daily AS
SELECT
    date,
    campaign_name,
    channel,
    SUM(impressions) AS impressions,
    SUM(clicks) AS clicks,
    SUM(conversions) AS conversions,
    SUM(spend) AS spend
FROM raw_campaign_data
WHERE date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY date, campaign_name, channel;

-- Option 2: Incremental refresh (only new data)
-- Schedule daily extract refresh at 2 AM
Real-World Example - Campaign QBR Presentation:
Scenario: Presenting Q4 campaign results to client CMO
Slide 1: Executive Summary
- Big number: "Achieved 3.2x ROAS, exceeding 2.8x goal"
- Simple visual: Line chart showing ROAS trend improving over quarter
- Insight: "Social media drove 60% of incremental conversions"
Slide 2: Channel Performance
- Stacked bar chart: Budget allocation vs. actual spend
- Table with conditional formatting: Green/yellow/red based on ROAS
- Insight: "Display underperformed; recommend shift $200K to social"
Slide 3: Audience Insights
- Scatter plot: Cost per acquisition vs. conversion volume by segment
- Highlight: "25-34 age group = sweet spot (low CPA, high volume)"
- Recommendation: "Increase targeting weight for 25-34 by 30%"
Slide 4: Next Steps
- Bullet points with clear actions
- Expected impact quantified
- Timeline for implementation
Sample Strong Response:
> “I’m proficient in Tableau and Power BI, which I use for different purposes at my current role. For client-facing presentations, I prefer Tableau because of its visual polish and interactivity—I recently built a quarterly business review dashboard showing campaign performance across 15 channels. I used a hierarchy structure allowing drill-down from campaign → ad group → creative, with filters for date range and geography. The top section featured three KPI cards showing total spend, conversions, and ROAS with month-over-month trends. Below that, I included a line chart showing daily conversion trends with annotations marking major events like product launches. I used a stacked bar chart to show budget allocation vs. actual spend by channel, with conditional formatting highlighting channels exceeding or missing targets. The key was designing for my audience—the CMO wanted high-level insights, so I kept it to 5-7 visuals max, used the brand’s color palette, and focused every visual on answering ‘so what?’—what decision should we make based on this data? The dashboard helped identify that social media was outperforming with 4.1x ROAS vs. 1.9x for display, leading to a $200K budget reallocation that improved overall campaign ROAS by 15%. For internal analytics, I use Power BI integrated with our SQL database for real-time reporting. I also use Python (Matplotlib/Seaborn) for statistical visualizations like A/B test results where I need to show confidence intervals and p-values. My philosophy is: every visualization should have a clear purpose, reduce cognitive load by keeping it simple, and tell a story that drives action.”
What Interviewers Assess:
1. Tool Proficiency: Hands-on experience with Tableau/Power BI
2. Design Thinking: Audience-appropriate visualizations
3. Storytelling: Translating data to insights
4. Business Impact: How dashboards drove decisions
5. Best Practices: Color, chart types, interactivity
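For the A/B test visuals mentioned in the response, the confidence intervals behind the error bars can be computed with a normal approximation (the counts below are hypothetical):

```python
import math

def conversion_ci(conversions: int, clicks: int, z: float = 1.96):
    """95% CI for a conversion rate via the normal approximation."""
    p = conversions / clicks
    se = math.sqrt(p * (1 - p) / clicks)  # standard error of a proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical variants from an ad-creative A/B test
rate_a = conversion_ci(230, 10_000)  # control
rate_b = conversion_ci(290, 10_000)  # challenger
```

Plotting the two point estimates with these interval bounds as error bars (e.g., Matplotlib's `errorbar`) is the kind of statistical visualization the response describes.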
Behavioral & Impact
8. Data-Driven Decision Making and Business Influence
Level: Data Analyst to Senior Data Analyst
Difficulty: High
Source: InterviewQuery IPG Mediabrands Business Analyst Interview Guide (February 2024)
Team: IPG Mediabrands Analytics, Strategy
Interview Round: Behavioral + Case Assessment
Question: “Tell me about a time when you used data to influence a major business decision. What was the situation and what was the outcome?”
Answer:
Why This Question Matters at IPG:
IPG Mediabrands advises clients on $47B in marketing investments annually. This tests your ability to translate analytics into actionable insights that drive business results—demonstrating strategic thinking beyond tactical analysis and the influence/persuasion skills needed in a client-facing agency environment.
STAR Method Framework:
Situation:
- Context: What was the business challenge or opportunity?
- Stakes: Why did this matter? (revenue, efficiency, client relationship)
- Initial State: What was the prevailing assumption or current approach?
Task:
- Your Role: What were you asked to do?
- Analytical Challenge: What data/methodology questions needed answering?
- Stakeholders: Who needed to be influenced?
Action:
- Data Collection: What data did you gather and from where?
- Analysis Performed: What methods/tools did you use?
- Insights Discovered: What did the data reveal?
- Communication Strategy: How did you present findings?
- Overcoming Objections: How did you address skepticism?
Result:
- Decision Made: What action was taken based on your analysis?
- Quantified Outcome: Specific metrics (revenue, efficiency, satisfaction)
- Long-term Impact: Lasting changes to processes or strategy
Example 1: Campaign Budget Reallocation
Situation:
> “At my previous agency, a major retail client was splitting their $5M annual digital marketing budget evenly across social media, display advertising, and paid search. The account team believed this balanced approach was safe, but campaign performance had plateaued for three quarters running. The client’s CMO was considering pulling business to a competitor due to stagnant results.”
Task:
> “I was tasked with conducting a comprehensive performance analysis to identify optimization opportunities. The challenge was that different channels had different attribution models and conversion windows, making direct comparison difficult. I needed to present findings to both the internal account team (who were defensive about current strategy) and the client CMO (who was skeptical of our agency’s capabilities).”
Action:
# Analysis approach:

# 1. DATA COLLECTION
# Pulled 12 months of campaign data from:
query = """
SELECT
    channel,
    campaign_name,
    date,
    SUM(spend) AS spend,
    SUM(impressions) AS impressions,
    SUM(clicks) AS clicks,
    SUM(conversions) AS conversions,
    SUM(revenue) AS revenue
FROM campaign_performance
WHERE client_id = 'retail_client_X'
  AND date >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY channel, campaign_name, date
"""

# 2. CALCULATED KEY METRICS
# ROAS (Return on Ad Spend) = Revenue / Spend
# CPA (Cost Per Acquisition) = Spend / Conversions
# Customer Lifetime Value by channel
metrics_by_channel = df.groupby('channel').agg({
    'spend': 'sum',
    'conversions': 'sum',
    'revenue': 'sum'
})
metrics_by_channel['roas'] = metrics_by_channel['revenue'] / metrics_by_channel['spend']
metrics_by_channel['cpa'] = metrics_by_channel['spend'] / metrics_by_channel['conversions']

# Results revealed:
# - Paid Search: 4.2x ROAS (excellent)
# - Social Media: 2.8x ROAS (good)
# - Display Ads: 1.3x ROAS (poor - below breakeven)

# 3. STATISTICAL SIGNIFICANCE TESTING
from scipy import stats

# A/B test comparing display vs. social conversion rates
display_cvr = 0.008  # 0.8% conversion rate
social_cvr = 0.023   # 2.3% conversion rate

# Two-sample t-test (display_conversions / social_conversions:
# arrays of per-campaign conversion rates for each channel)
t_stat, p_value = stats.ttest_ind(display_conversions, social_conversions)
print(f"P-value: {p_value:.4f}")  # p < 0.001 - highly significant

# 4. FORECASTING IMPACT
# Built model to project outcomes of reallocation
# Scenario: Shift $1M from display to social media
projected_impact = {
    'current_total_roas': 2.8,
    'projected_total_roas': 3.4,
    'incremental_revenue': '$720K annually',
    'risk_assessment': 'Low - social already proven performer'
}
Communication Strategy:
Internal Stakeholder Meeting (Account Team):
- Started with empathy: "Our current strategy WAS the right call 2 years ago"
- Showed data: "But consumer behavior has shifted - social engagement up 40%"
- Framed positively: "We have an opportunity to show client proactive optimization"
Client Presentation (CMO):
Slide 1: "We found $720K in incremental revenue opportunity"
- Big number first - grabbed attention
- Simple visual: 3 bars showing ROAS by channel
Slide 2: "Display ads are 3x less effective than social media"
- Bar chart: $100 spend → display generates $130, social generates $280
- Insight: "Your customers engage on social, but ads show on display"
Slide 3: "Recommendation: Reallocate $1M from display to social"
- Before/after pie charts showing budget mix
- Projected impact table with confidence intervals
- Risk mitigation plan
Slide 4: "Pilot approach to minimize risk"
- 3-month test period
- Monthly check-ins
- Rollback plan if underperforming
Overcoming Objections:
Objection 1: "Display ads build brand awareness - you're only looking at direct conversions"
Response: "I ran a brand lift study using search volume data. Display ads showed no
measurable brand lift, while social campaigns correlated with 15% increase in branded
search queries."
Objection 2: "Social media performance might be inflated due to attribution issues"
Response: "I used multi-touch attribution modeling, not just last-click. Even with
conservative linear attribution, social still outperforms display 2.1x to 1.3x."
Objection 3: "What if display performance improves with creative refresh?"
Response: "We tested 8 different display creatives over the past year - none exceeded
1.5x ROAS. I recommend we pilot social reallocation while continuing creative testing
on remaining display budget."
Result:
Immediate Outcomes (3 months):
- CMO approved pilot budget reallocation
- Shifted $1M from display to social media
- ROAS improved from 2.8x to 3.2x (+14% improvement)
- Client satisfaction score increased from 6.5/10 to 8.5/10
Long-term Impact (12 months):
- Incremental revenue: $680K (95% of projection)
- Full reallocation implemented: $1.5M moved from display to social+search
- Overall campaign ROAS improved to 3.6x (+29% vs. original)
- Client renewed contract and increased annual spend by $2M
- Case study used in new business pitches, helped win 2 additional clients
Process Changes:
- Instituted quarterly performance reviews for all clients
- Built automated ROAS tracking dashboard (Tableau)
- Channel optimization now standard part of strategy discussions
Example 2: Customer Segmentation for Targeting
Situation:
> “A B2B software client was targeting all businesses with 50-500 employees equally, spending $200K/month on LinkedIn ads with 1.2% conversion rate. Marketing team believed their product was universally valuable to mid-market companies.”
Task:
> “Analyze conversion data to identify if certain customer segments were more profitable. Stakeholders were skeptical that segmentation would improve results enough to justify complexity.”
Action:
# K-means clustering on customer characteristics
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = [
    'company_size',
    'industry_code',
    'revenue_estimate',
    'employee_growth_rate',
    'tech_stack_size',
    'decision_maker_level'
]

# Identified 4 distinct segments:
# Segment 1: "Fast-growing tech startups"  - 8.5% conversion rate, $2K LTV
# Segment 2: "Established enterprises"     - 2.1% conversion rate, $1.5K LTV
# Segment 3: "Traditional small business"  - 0.4% conversion rate, $500 LTV
# Segment 4: "Professional services"       - 4.2% conversion rate, $3K LTV

# Recommendation: Focus 70% of budget on Segments 1 & 4
Result:
- Implemented segment-focused targeting in LinkedIn campaigns
- Overall conversion rate improved from 1.2% to 2.8% (+133%)
- Cost per acquisition decreased from $850 to $410 (-52%)
- ROI improved from 1.4x to 3.6x
- Client expanded program budget by $150K/month
Sample Strong Response:
> “At my previous agency, I used data analysis to drive a major campaign reallocation that generated $680K in incremental revenue for a retail client. The situation was that they were splitting their $5M annual budget evenly across social, display, and search, but performance had plateaued. The account team believed the balanced approach was safe, and the client CMO was considering pulling business. I analyzed 12 months of performance data and discovered display ads had 1.3x ROAS while social had 2.8x ROAS—a statistically significant difference (p<0.001). I built a financial model projecting that reallocating $1M from display to social would generate $720K additional revenue annually. The challenge was convincing both the internal account team, who were defensive about current strategy, and the skeptical client. I addressed objections by running a brand lift study showing display wasn’t building brand awareness, using multi-touch attribution to validate social’s performance, and proposing a 3-month pilot to minimize risk. The CMO approved the pilot. Results after 3 months showed ROAS improved from 2.8x to 3.2x, and after 12 months we achieved $680K incremental revenue (95% of projection). The client renewed and increased their annual spend by $2M. Long-term, we instituted quarterly performance reviews across all clients and built an automated ROAS tracking dashboard. The approach helped win 2 additional clients who saw the case study.”
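The reallocation math in this response can be checked with a back-of-envelope model that assumes each channel's ROAS stays constant at its observed value — a simplification; a real projection (like the $720K figure above) would also model diminishing returns:

```python
# Channel ROAS from the 12-month analysis in the example above
roas = {'search': 4.2, 'social': 2.8, 'display': 1.3}

def blended_roas(budget: dict) -> float:
    """Spend-weighted ROAS across channels (budget in $M)."""
    revenue = sum(spend * roas[ch] for ch, spend in budget.items())
    return revenue / sum(budget.values())

# $5M split evenly vs. $1M shifted from display to social
even_split = {'search': 5/3, 'social': 5/3, 'display': 5/3}
reallocated = {'search': 5/3, 'social': 5/3 + 1.0, 'display': 5/3 - 1.0}

print(round(blended_roas(even_split), 2))   # ≈ 2.77, matching the ~2.8x baseline
print(round(blended_roas(reallocated), 2))  # ≈ 3.07 under the constant-ROAS assumption
```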
What Interviewers Assess:
1. Strategic Thinking: Beyond tactical analysis to business impact
2. Quantified Results: Specific metrics, not vague outcomes
3. Influence Skills: How you convinced skeptical stakeholders
4. Problem Solving: Addressing objections with data
5. Communication: Translating technical analysis for non-technical audience
6. Client Mindset: Understanding business needs, not just running queries
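Example 2 names the clustering features but elides the fit itself; a minimal k-means sketch (NumPy only, with synthetic standardized features standing in for the real B2B traits) looks like:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic standardized features (e.g., employee growth rate vs. tech-stack
# size) with two planted clusters -- stand-ins for the real customer data
X = np.vstack([
    rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(50, 2)),  # slower-growth firms
    rng.normal(loc=[2.0, 2.0], scale=0.3, size=(50, 2)),    # fast-growing firms
])

def kmeans(X, k=2, n_iter=20):
    """Minimal Lloyd's algorithm with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Pick the point farthest from all chosen centers
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute centers
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=2)
```

In practice the example's `sklearn.cluster.KMeans` (with `StandardScaler` applied first) does the same job with better initialization and convergence checks.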
9. SQL Joins and Data Relationships
Level: Data Analyst
Difficulty: Easy-Medium
Source: InterviewQuery IPG Mediabrands Business Analyst Interview Guide (February 2024)
Team: IPG Mediabrands Analytics, Data Engineering
Interview Round: Technical Fundamentals
Question: “Explain the difference between an inner join and an outer join in SQL. When would you use each?”
Answer:
Why This Question Matters at IPG:
At IPG, you’ll constantly join campaign data with customer data, performance metrics with budget data, and impressions with conversions. Understanding joins is fundamental to accurately analyzing marketing data and avoiding incorrect conclusions from incomplete datasets.
SQL Joins Overview:
1. INNER JOIN - Only Matching Records:
-- Returns only rows where there's a match in BOTH tables
SELECT
c.campaign_name,
c.impressions,
p.conversions,
p.revenue
FROM campaigns c
INNER JOIN performance p
ON c.campaign_id = p.campaign_id;
-- Result: Only campaigns that HAVE performance data
-- Excludes: Newly launched campaigns with no performance yet
Venn Diagram:
Campaigns Performance
┌─────────┐ ┌─────────┐
│ │ │ │
│ ┌────┴───────┴────┐ │
│ │ INNER JOIN │ │
└────┴────────────────┬┴────┘
│ (Only this) │
         └────────────────┘
2. LEFT OUTER JOIN (LEFT JOIN) - All From Left Table:
-- Returns ALL rows from left table + matching rows from right table
-- If no match, right table columns are NULL
SELECT
c.campaign_name,
c.impressions,
p.conversions,
p.revenue
FROM campaigns c
LEFT JOIN performance p
ON c.campaign_id = p.campaign_id;
-- Result: ALL campaigns (even those with no performance data yet)
-- Use case: Show all campaigns including newly launched ones
Venn Diagram:
Campaigns Performance
┌─────────┐ ┌─────────┐
│█████████│ │ │
│█████┌───┴───────┴────┐ │
│█████│ LEFT JOIN │ │
└─────┴────────────────┬┴────┘
│ (All left + │
│ matching) │
         └────────────────┘
3. RIGHT OUTER JOIN (RIGHT JOIN) - All From Right Table:
-- Returns ALL rows from right table + matching rows from left table
-- Rarely used (can rewrite as LEFT JOIN by swapping table order)
SELECT
c.campaign_name,
c.impressions,
p.conversions,
p.revenue
FROM campaigns c
RIGHT JOIN performance p
ON c.campaign_id = p.campaign_id;
-- Result: ALL performance records (even if campaign was deleted)
4. FULL OUTER JOIN - All From Both Tables:
-- Returns ALL rows from BOTH tables
-- If no match, fills with NULL
SELECT
COALESCE(c.campaign_id, p.campaign_id) AS campaign_id,
c.campaign_name,
p.conversions
FROM campaigns c
FULL OUTER JOIN performance p
ON c.campaign_id = p.campaign_id;
-- Result: All campaigns AND all performance records
-- Use case: Identify campaigns with no performance AND performance with no campaign
IPG Marketing Analytics Use Cases:
Use Case 1: Campaign Performance Report (LEFT JOIN)
-- Show ALL campaigns, including those with zero conversions
SELECT
c.campaign_name,
c.start_date,
c.budget,
COALESCE(p.conversions, 0) AS conversions,
COALESCE(p.revenue, 0) AS revenue,
CASE
        WHEN p.conversions IS NULL THEN 'No Data Yet'
        WHEN p.conversions = 0 THEN 'Zero Conversions'
        ELSE 'Active'
    END AS status
FROM campaigns c
LEFT JOIN performance p
ON c.campaign_id = p.campaign_id
WHERE c.start_date >= '2024-01-01'
ORDER BY c.start_date DESC;
Why LEFT JOIN here?
- Want to see ALL campaigns, even newly launched ones without performance
- Helps identify campaigns that might have tracking issues (launched but no data)
- Provides complete picture of campaign inventory
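The same LEFT JOIN pattern carries over to pandas, which is handy when the campaign report is assembled in a notebook (table contents hypothetical):

```python
import pandas as pd

campaigns = pd.DataFrame({
    'campaign_id': [1, 2, 3],
    'campaign_name': ['Spring Promo', 'Summer Sale', 'New Launch'],
})
performance = pd.DataFrame({
    'campaign_id': [1, 2],
    'conversions': [120, 85],
})

# LEFT JOIN: keep every campaign; unmatched rows get NaN, like SQL NULL
report = campaigns.merge(performance, on='campaign_id', how='left')
# COALESCE(p.conversions, 0) equivalent
report['conversions'] = report['conversions'].fillna(0).astype(int)
```

'New Launch' survives the join with 0 conversions instead of silently disappearing, which is exactly the behavior the use case calls for.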
Use Case 2: Analyzing Engaged Users Only (INNER JOIN)
-- Only analyze users who BOTH clicked ads AND made purchases
SELECT
u.user_id,
u.demographics,
c.channel,
p.purchase_amount,
p.purchase_date
FROM users u
INNER JOIN clicks c ON u.user_id = c.user_id
INNER JOIN purchases p ON u.user_id = p.user_id
WHERE p.purchase_date >= '2024-01-01';
Why INNER JOIN here?
- Only care about users in all three tables (users who clicked AND purchased)
- Analyzing conversion funnel, not all users
- Excludes users who only clicked but didn’t purchase
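The funnel-overlap behavior can be sketched the same way with sqlite3; the tables and values below are illustrative only:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (user_id INT);
    CREATE TABLE clicks (user_id INT);
    CREATE TABLE purchases (user_id INT, purchase_amount REAL);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO clicks VALUES (1), (2);      -- user 3 never clicked
    INSERT INTO purchases VALUES (1, 99.0);  -- only user 1 purchased
""")
# Chained INNER JOINs keep only users present in all three tables
rows = con.execute("""
    SELECT u.user_id, p.purchase_amount
    FROM users u
    INNER JOIN clicks c ON u.user_id = c.user_id
    INNER JOIN purchases p ON u.user_id = p.user_id
""").fetchall()
print(rows)  # [(1, 99.0)] -- only the fully engaged user survives both joins
```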
Use Case 3: Identifying Data Gaps (FULL OUTER JOIN)
-- Find campaigns with performance data but no campaign record (data quality issue)
-- OR campaigns with no performance data (tracking issue)
SELECT
COALESCE(c.campaign_id, p.campaign_id) AS campaign_id,
c.campaign_name,
p.impressions,
CASE
    WHEN c.campaign_id IS NULL THEN 'Performance data without campaign (ERROR)'
    WHEN p.campaign_id IS NULL THEN 'Campaign without performance (Tracking issue?)'
    ELSE 'OK'
END AS data_quality_flag
FROM campaigns c
FULL OUTER JOIN performance p
ON c.campaign_id = p.campaign_id
WHERE c.campaign_id IS NULL OR p.campaign_id IS NULL;
Why FULL OUTER JOIN here?
- Identifying data quality issues requires seeing both sides
- Want to catch orphaned records in either table
- Data validation use case
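A sketch of the orphan check, again with invented data. Because some engines (MySQL, SQLite before 3.39) lack FULL OUTER JOIN, this version emulates it with two LEFT JOIN anti-joins combined by UNION ALL, which finds the same orphaned records:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE campaigns (campaign_id INT, campaign_name TEXT);
    CREATE TABLE performance (campaign_id INT, impressions INT);
    INSERT INTO campaigns VALUES (1, 'Tracked'), (2, 'Untracked');
    INSERT INTO performance VALUES (1, 1000), (99, 500);  -- 99 is orphaned
""")
rows = con.execute("""
    SELECT c.campaign_id, 'Campaign without performance' AS flag
    FROM campaigns c
    LEFT JOIN performance p ON c.campaign_id = p.campaign_id
    WHERE p.campaign_id IS NULL
    UNION ALL
    SELECT p.campaign_id, 'Performance without campaign' AS flag
    FROM performance p
    LEFT JOIN campaigns c ON p.campaign_id = c.campaign_id
    WHERE c.campaign_id IS NULL
    ORDER BY 1
""").fetchall()
print(rows)  # [(2, 'Campaign without performance'), (99, 'Performance without campaign')]
```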
Performance Considerations:
-- INNER JOIN is typically fastest
--   - Smaller result set (only matches)
--   - Can use indexes on both sides efficiently
-- LEFT JOIN considerations
--   - Returns at least as many rows as left table
--   - Be careful with large left tables
-- FULL OUTER JOIN is slowest
--   - Returns potentially all rows from both tables
--   - Use sparingly on large tables

-- Optimization: Add indexes on join keys
CREATE INDEX idx_campaigns_id ON campaigns(campaign_id);
CREATE INDEX idx_performance_campaign_id ON performance(campaign_id);
Common Mistakes to Avoid:
Mistake 1: Using INNER JOIN when you need LEFT JOIN
-- ❌ WRONG: This excludes campaigns with no clicks (looks like they don't exist)
SELECT c.campaign_name, COUNT(cl.click_id) AS clicks
FROM campaigns c
INNER JOIN clicks cl ON c.campaign_id = cl.campaign_id
GROUP BY c.campaign_name;
-- ✓ CORRECT: Show ALL campaigns, even those with 0 clicks
SELECT c.campaign_name, COUNT(cl.click_id) AS clicks
FROM campaigns c
LEFT JOIN clicks cl ON c.campaign_id = cl.campaign_id
GROUP BY c.campaign_name;
Mistake 2: Forgetting COALESCE with Outer Joins
-- ❌ WRONG: campaigns with no performance rows return NULL instead of 0, breaking downstream calculations
SELECT campaign_name, SUM(conversions) AS total_conversions
FROM campaigns c
LEFT JOIN performance p ON c.campaign_id = p.campaign_id
GROUP BY campaign_name;
-- ✓ CORRECT: Handle NULLs appropriately
SELECT campaign_name, SUM(COALESCE(conversions, 0)) AS total_conversions
FROM campaigns c
LEFT JOIN performance p ON c.campaign_id = p.campaign_id
GROUP BY campaign_name;
Decision Tree for Choosing Join Type:
Question 1: Do you need ALL records from one table?
│
├─ YES → Question 2: Which table do you need ALL records from?
│ │
│ ├─ Left table → Use LEFT JOIN
│ ├─ Right table → Use RIGHT JOIN (or swap tables + LEFT JOIN)
│ └─ Both tables → Use FULL OUTER JOIN
│
└─ NO → Do you only want records that match in BOTH tables?
│
├─ YES → Use INNER JOIN
└─ NO → Reconsider what you're trying to answer
Sample Strong Response:
> “An INNER JOIN returns only rows where there’s a match in BOTH tables—it’s the intersection of the two datasets. A LEFT OUTER JOIN returns ALL rows from the left table plus matching rows from the right table, filling with NULLs where there’s no match. RIGHT JOIN is the opposite, and FULL OUTER JOIN returns all rows from both tables. At IPG, I use LEFT JOIN most frequently—for example, when generating campaign performance reports where I want to show ALL campaigns including newly launched ones that don’t have performance data yet. I use INNER JOIN when analyzing only engaged users who both clicked ads AND made purchases, since I only care about the overlap. I use FULL OUTER JOIN rarely, mainly for data quality checks to identify orphaned records in either table. The key consideration is: do you want to see ALL records from one side, or only where there’s a match on both sides? For example, if I used INNER JOIN when showing campaign performance, it would make newly launched campaigns invisible, which would be misleading to the account team. Performance-wise, INNER JOIN is fastest since it returns the smallest result set, and I always add indexes on join keys for large tables.”
What Interviewers Assess:
1. Conceptual Understanding: Can you explain the difference clearly?
2. Practical Application: When to use each type
3. Marketing Context: Real examples from advertising analytics
4. Edge Cases: Understanding NULL handling with outer joins
5. Performance Awareness: Impact on query speed
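If you prototype in pandas before writing SQL, the four join types map directly onto `DataFrame.merge`'s `how` parameter. A minimal sketch with made-up data shows the row counts each type produces:

```python
import pandas as pd

# Invented data: campaign 1 has no performance row,
# performance row 3 has no campaign record
campaigns = pd.DataFrame({"campaign_id": [1, 2], "name": ["A", "B"]})
performance = pd.DataFrame({"campaign_id": [2, 3], "conversions": [10, 20]})

# SQL join -> merge's `how`: INNER->"inner", LEFT->"left",
# RIGHT->"right", FULL OUTER->"outer"
counts = {how: len(campaigns.merge(performance, on="campaign_id", how=how))
          for how in ("inner", "left", "right", "outer")}
print(counts)  # {'inner': 1, 'left': 2, 'right': 2, 'outer': 3}
```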
10. Advanced SQL Window Functions for Performance Analysis
Level: Data Analyst to Senior Data Analyst
Difficulty: Hard
Source: DataLemur Interpublic Group SQL Interview Questions (April 2025)
Team: IPG Mediabrands Analytics, Performance Optimization
Interview Round: Advanced SQL Technical
Question: “Write a SQL query using window functions to calculate the average performance score for each employee over their last three projects.”
Answer:
Why This Question Matters at IPG:
Window functions are essential for advanced analytics at IPG—calculating rolling averages for campaign performance, ranking top performers, computing running totals for budget pacing, and trend analysis. This tests your ability to work with ordered data and perform calculations across row ranges.
Database Schema:
CREATE TABLE employee_projects (
employee_id INT,
project_id INT,
completion_date DATE,
    performance_score INT  -- Score from 1-10
);
Sample Data:
INSERT INTO employee_projects VALUES
(1001, 5001, '2022-01-10', 8),
(1001, 5002, '2022-05-02', 9),
(1001, 5003, '2022-09-15', 7),
(1001, 5004, '2022-12-20', 10),
(1002, 6001, '2022-02-15', 6),
(1002, 6002, '2022-06-10', 8),
(1002, 6003, '2022-10-05', 9);
Solution Using Window Functions:
SELECT
employee_id,
project_id,
completion_date,
performance_score,
    -- Calculate average over last 3 projects (current + 2 preceding)
    ROUND(AVG(performance_score) OVER (
PARTITION BY employee_id
ORDER BY completion_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS avg_last_3_projects,
    -- Bonus: Count how many projects included in average (1-3)
    COUNT(*) OVER (
PARTITION BY employee_id
ORDER BY completion_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS projects_in_average
FROM employee_projects
ORDER BY employee_id, completion_date;
Expected Output:
| employee_id | project_id | completion_date | performance_score | avg_last_3_projects | projects_in_average |
|-------------|------------|-----------------|-------------------|---------------------|---------------------|
| 1001 | 5001 | 2022-01-10 | 8 | 8.00 | 1 |
| 1001 | 5002 | 2022-05-02 | 9 | 8.50 | 2 |
| 1001 | 5003 | 2022-09-15 | 7 | 8.00 | 3 |
| 1001 | 5004 | 2022-12-20 | 10 | 8.67 | 3 |
| 1002 | 6001 | 2022-02-15 | 6 | 6.00 | 1 |
| 1002 | 6002 | 2022-06-10 | 8 | 7.00 | 2 |
| 1002 | 6003 | 2022-10-05 | 9 | 7.67 | 3 |
Window Function Components Explained:
1. PARTITION BY - Create Groups:
PARTITION BY employee_id
-- Separate calculation for each employee
-- Like GROUP BY but doesn't collapse rows
2. ORDER BY - Define Order Within Partition:
ORDER BY completion_date
-- Chronological order within each employee
-- Defines what "last three" means
3. ROWS BETWEEN - Define Window Frame:
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
-- Current row + 2 rows before it = 3 rows total
-- "Rolling 3-project window"
IPG Marketing Analytics Applications:
Application 1: Rolling 7-Day Campaign Performance:
-- Calculate 7-day moving average of CTR
SELECT
date,
campaign_name,
clicks,
impressions,
(clicks::DECIMAL / impressions) * 100 AS daily_ctr,
    -- 7-day rolling average CTR
    ROUND(AVG(clicks::DECIMAL / impressions) OVER (
PARTITION BY campaign_name
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) * 100, 2) AS rolling_7day_ctr
FROM campaign_daily_performance
WHERE date >= '2024-01-01'
ORDER BY campaign_name, date;
Why this matters:
- Smooth out daily volatility
- Identify trends vs. one-day spikes
- Better signal-to-noise ratio for optimization decisions
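The same rolling average can be prototyped in pandas with `groupby(...).rolling(...)`. This sketch uses invented data and a 3-row window for brevity (the SQL above uses 7); `min_periods=1` mirrors how the SQL frame shrinks at the start of each partition:

```python
import pandas as pd

# Invented daily CTR values for a single campaign
df = pd.DataFrame({
    "campaign": ["A"] * 4,
    "ctr": [2.0, 4.0, 6.0, 8.0],
})
# Rolling mean per campaign; frame grows from 1 to 3 rows, like the SQL frame
df["rolling_ctr"] = (df.groupby("campaign")["ctr"]
                       .rolling(window=3, min_periods=1)
                       .mean()
                       .reset_index(level=0, drop=True))
print(df["rolling_ctr"].tolist())  # [2.0, 3.0, 4.0, 6.0]
```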
Application 2: Budget Pacing with Running Total:
-- Track cumulative spend vs. budget throughout the month
SELECT
date,
campaign_name,
daily_spend,
monthly_budget,
    -- Running total of spend so far this month
    SUM(daily_spend) OVER (
PARTITION BY campaign_name, DATE_TRUNC('month', date)
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_spend_mtd,
    -- Budget pacing: % of month elapsed vs. % of budget spent
    (SUM(daily_spend) OVER (
PARTITION BY campaign_name, DATE_TRUNC('month', date)
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) / monthly_budget * 100) AS pct_budget_spent,
(EXTRACT(DAY FROM date) /
EXTRACT(DAY FROM DATE_TRUNC('month', date) + INTERVAL '1 month' - INTERVAL '1 day')
* 100) AS pct_month_elapsed
FROM campaign_spend
WHERE date >= DATE_TRUNC('month', CURRENT_DATE)
ORDER BY campaign_name, date;
Use case: Alert if campaign is pacing 20% ahead or behind budget
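A minimal pandas sketch of the pacing logic, assuming invented spend figures and a straight-line calendar pace; the 20-point threshold follows the alert rule above:

```python
import pandas as pd

# Invented spend for the first 4 days of a 30-day month
df = pd.DataFrame({"day": [1, 2, 3, 4], "daily_spend": [100, 150, 50, 200]})
monthly_budget, days_in_month = 3000, 30

df["cumulative_spend"] = df["daily_spend"].cumsum()  # running total (SUM OVER)
df["pct_budget_spent"] = df["cumulative_spend"] / monthly_budget * 100
df["pct_month_elapsed"] = df["day"] / days_in_month * 100
# Flag when spend pace runs 20+ points ahead of calendar pace
df["overpacing"] = (df["pct_budget_spent"] - df["pct_month_elapsed"]) > 20
print(df["cumulative_spend"].tolist())  # [100, 250, 300, 500]
```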
Application 3: Ranking Top Performers:
-- Identify top 3 campaigns by ROAS within each channel
SELECT
channel,
campaign_name,
roas,
    -- Rank within channel (1 = best)
    RANK() OVER (
PARTITION BY channel
        ORDER BY roas DESC
    ) AS rank_in_channel,
    -- Alternative: ROW_NUMBER (no ties)
    ROW_NUMBER() OVER (
PARTITION BY channel
        ORDER BY roas DESC
    ) AS row_num,
    -- Percentile ranking (0-1)
    PERCENT_RANK() OVER (
PARTITION BY channel
        ORDER BY roas DESC
    ) AS percentile_rank
FROM campaign_performance
WHERE date_range = 'Q4 2024'
QUALIFY rank_in_channel <= 3  -- Snowflake syntax to filter window results
ORDER BY channel, rank_in_channel;
Application 4: Period-Over-Period Comparison:
-- Compare current week performance to previous week
SELECT
week,
conversions,
    -- Previous week's conversions
    LAG(conversions, 1) OVER (ORDER BY week) AS prev_week_conversions,
    -- Week-over-week change
    conversions - LAG(conversions, 1) OVER (ORDER BY week) AS wow_change,
    -- Percentage change
    ROUND((conversions - LAG(conversions, 1) OVER (ORDER BY week))::DECIMAL /
LAG(conversions, 1) OVER (ORDER BY week) * 100, 1) AS wow_pct_change
FROM weekly_campaign_summary
WHERE week >= DATE_TRUNC('week', CURRENT_DATE - INTERVAL '8 weeks')
ORDER BY week;
Window Function Types:
| Function | Purpose | Example Use at IPG |
|---|---|---|
| AVG(), SUM(), COUNT() | Rolling aggregates | 7-day moving average CTR |
| LAG(), LEAD() | Access previous/next row | Week-over-week comparison |
| RANK(), ROW_NUMBER(), DENSE_RANK() | Ranking | Top 10 campaigns by ROAS |
| FIRST_VALUE(), LAST_VALUE() | Access first/last in window | Compare to best performer |
| PERCENT_RANK(), NTILE() | Distribution analysis | Quartile segmentation |
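For reference, LAG() and LEAD() correspond to pandas `shift(1)` and `shift(-1)` on a sorted frame; a tiny sketch of the week-over-week pattern from Application 4, using invented weekly totals:

```python
import pandas as pd

# LAG(conversions, 1) OVER (ORDER BY week) == shift(1) on a week-sorted frame
weekly = pd.DataFrame({"week": [1, 2, 3], "conversions": [100, 120, 90]})
weekly["prev_week"] = weekly["conversions"].shift(1)  # LAG: previous row's value
weekly["wow_change"] = weekly["conversions"] - weekly["prev_week"]
print(weekly["wow_change"].tolist())  # first week has no prior row, so NaN
```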
Frame Types Comparison:
-- ROWS: Physical number of rows
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
-- Always includes exactly 3 rows (or fewer at start)

-- RANGE: Logical range based on value
RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW
-- Includes all rows within 7 days (could be 1 row or 100 rows)

-- UNBOUNDED: Include all rows from start/end
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Running total from beginning
Common Mistakes to Avoid:
Mistake 1: Forgetting ORDER BY in Window
-- ❌ WRONG: No ORDER BY means arbitrary order
AVG(performance_score) OVER (PARTITION BY employee_id)
-- Result: Average of ALL projects (not rolling)

-- ✓ CORRECT: ORDER BY defines what "last 3" means
AVG(performance_score) OVER (
PARTITION BY employee_id
ORDER BY completion_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
Mistake 2: Using WHERE to Filter Window Results
-- ❌ WRONG: WHERE filters before window calculation
SELECT *, RANK() OVER (ORDER BY roas DESC) AS rank
FROM campaigns
WHERE rank <= 10;  -- Error: Can't reference window function in WHERE

-- ✓ CORRECT: Use subquery or QUALIFY
SELECT * FROM (
    SELECT *, RANK() OVER (ORDER BY roas DESC) AS rank
    FROM campaigns
) sub
WHERE rank <= 10;
Performance Optimization:
-- Add indexes on PARTITION BY and ORDER BY columns
CREATE INDEX idx_projects_employee_date
ON employee_projects(employee_id, completion_date);
-- For large datasets, consider materialized views
CREATE MATERIALIZED VIEW campaign_rolling_metrics AS
SELECT
date,
campaign_name,
AVG(ctr) OVER (
PARTITION BY campaign_name
ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7day_ctr
FROM campaign_daily_performance;
-- Refresh daily
REFRESH MATERIALIZED VIEW campaign_rolling_metrics;
Sample Strong Response:
> “Window functions allow you to perform calculations across a set of rows related to the current row without collapsing the result set like GROUP BY does. For this question, I use AVG() as a window function with three key components: PARTITION BY employee_id to calculate separately for each employee, ORDER BY completion_date to define chronological order, and ROWS BETWEEN 2 PRECEDING AND CURRENT ROW to create a rolling 3-project window. This gives the average of the current project plus the two immediately preceding projects. At IPG, I use window functions extensively—for example, calculating 7-day rolling average CTR to smooth out daily volatility in campaign performance, or using LAG() to compute week-over-week changes in conversions. I also use RANK() to identify top-performing campaigns within each channel, and running totals with SUM() OVER to track budget pacing throughout the month. The key advantage of window functions is you can perform these calculations without losing row-level detail, unlike GROUP BY which collapses rows. For performance, I ensure indexes exist on the PARTITION BY and ORDER BY columns, and for frequently-used rolling metrics, I create materialized views that refresh daily.”
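The interview query itself can be verified end to end with Python's built-in sqlite3 (window functions require SQLite 3.25 or later); this sketch loads employee 1001's sample rows and reproduces the expected output:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee_projects (
        employee_id INT, project_id INT,
        completion_date DATE, performance_score INT);
    INSERT INTO employee_projects VALUES
        (1001, 5001, '2022-01-10', 8), (1001, 5002, '2022-05-02', 9),
        (1001, 5003, '2022-09-15', 7), (1001, 5004, '2022-12-20', 10);
""")
# Rolling 3-project average: current row + 2 preceding, per employee
rows = con.execute("""
    SELECT project_id,
           ROUND(AVG(performance_score) OVER (
               PARTITION BY employee_id
               ORDER BY completion_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS avg_3
    FROM employee_projects
    ORDER BY completion_date
""").fetchall()
print(rows)  # [(5001, 8.0), (5002, 8.5), (5003, 8.0), (5004, 8.67)]
```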
What Interviewers Assess:
1. Advanced SQL Skills: Understanding of window functions vs. GROUP BY
2. Frame Specification: ROWS BETWEEN syntax and logic
3. Practical Application: Real marketing analytics use cases
4. Performance Awareness: Indexing and optimization
5. Business Context: Why rolling averages matter for campaigns
This comprehensive guide covers the 10 most challenging Data Analyst and Business Intelligence Analyst interview questions at IPG Mediabrands, with detailed answers, code examples, and practical applications specific to marketing analytics. Each answer demonstrates technical proficiency, business acumen, and communication skills critical for success in IPG’s client-facing analytics roles.