CRED — Data Scientist Interview Questions

CRED's data scientist role (under Prefr, their lending platform) sits at the intersection of credit risk, growth, and product analytics. You are not running ad-hoc queries — you are building propensity models that determine who gets a credit offer and cohort analyses that shape how the product retains users at scale. The interview tests whether you can move from a vague business question to a structured, opinionated recommendation fast, and whether your thinking survives contact with a sceptical PM or risk lead who will push back on every assumption.

CRED's Interview Process for Data Scientist

The process typically runs 3–4 rounds: an initial SQL/Python screen on fundamentals, a take-home analytics case (usually a funnel or retention problem on synthetic data), a technical deep-dive on modelling and experimentation, and a final stakeholder round testing how clearly you communicate insight to non-technical partners. The role requires 4+ years of hands-on data science or analytics experience.

Question 1: Churn Propensity Modelling

Prefr's monthly active repayers have dropped 8% over the last 6 weeks. Your PM asks: "Which users are about to churn, and what should we do about them?" Walk through how you'd build a churn propensity model end-to-end — from defining churn to deploying an intervention.

Why interviewers ask this

Tests whether you can operationalise a vague business concern into a complete modelling workflow — defining the label, engineering relevant features, choosing evaluation metrics that match the business problem, and connecting model output to an actual action. Weak candidates describe a generic scikit-learn pipeline with no domain grounding. Strong candidates define churn specifically in a repayment context, acknowledge the class imbalance problem, and design the intervention alongside the model rather than as an afterthought.

Example strong answer

The first thing I would do is resist the urge to build a model immediately. The PM's question has three embedded assumptions that need unpacking before any modelling starts: that churn is well-defined, that it is predictable from signals we already have, and that there is an intervention available once we identify at-risk users. None of these can be taken for granted in a lending context.

So I would start by defining churn precisely. In a repayment product, churn is not simply "stopped using the app." For Prefr, I would define it as a user who has not made a scheduled repayment and has not opened the app in 30 consecutive days, where their historical pattern shows at least weekly activity. That 30-day window is meaningful because it aligns with the DPD30 trigger for a collections handoff — so our intervention window is really the 7 to 22 days before that threshold.

Critically, I would separate voluntary churn (the user actively disengages, perhaps because they are unhappy with the product or found a better rate elsewhere) from involuntary churn (a payment failed due to insufficient funds). The interventions are entirely different — one needs a product or trust nudge, the other needs a payment facilitation solution like an EMI restructuring offer or a payment date change.

With the label defined, I would pull a training dataset from 60 to 90 days prior, labelling users as churned or active. Class imbalance is real here — in most fintech retention contexts, churn rates sit at 5 to 15%, so the negative class dominates. I would use class weighting in XGBoost or LightGBM rather than oversampling, to preserve the original distribution's statistical properties when the model scores live users.
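As a rough illustration of the class-weighting choice, the sketch below computes the positive-class weight as the ratio of negative to positive examples, which is the value commonly passed to the scale_pos_weight parameter in XGBoost or LightGBM. The label counts are hypothetical, chosen only to sit inside the 5 to 15% churn range mentioned above.

```python
def positive_class_weight(labels):
    """Return n_negative / n_positive for binary labels (1 = churned, 0 = active)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive (churned) examples in the training window")
    return n_neg / n_pos


# Hypothetical training window: 1,000 churners among 10,000 users (10% churn).
labels = [1] * 1_000 + [0] * 9_000
weight = positive_class_weight(labels)
print(weight)  # 9.0
```

The training rows are left untouched, and only the loss contribution of the minority class is scaled.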

Feature engineering from three sources. Loan behaviour: days since last repayment, number of missed EMIs in last 90 days, loan utilisation rate, whether the user ever requested a restructuring. App engagement: days since last login, frequency of reward redemptions (a proxy for CRED-native engagement beyond just repayment), notification open rate in the last 30 days. Bureau signals, if accessible: whether a hard enquiry appeared at another lender in the last 60 days, which is a strong signal the user is shopping for credit elsewhere and often precedes voluntary churn by 3 to 4 weeks.

For evaluation, I would use AUC and precision in the top deciles rather than overall accuracy. In a 200,000-user portfolio with a 2% churn rate, there are 4,000 users who will churn, but the intervention budget realistically covers reaching only 800 to 1,000 users. So I need the model to rank-order risk accurately at the very top of the list, not classify the full population correctly. Accuracy is uninformative at this base rate: a model that predicts nobody churns is already 98% accurate, and a 95% accurate model that buries the at-risk users in the middle deciles is useless for this problem.
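The ranking-focused evaluation described above can be sketched with two small helpers: precision among the top k scored users, and the share of all churners captured in that top k. The scores and labels are toy values made up for illustration, and k would be set to the number of users the intervention budget can reach.

```python
def precision_at_k(scores, labels, k):
    """Fraction of true churners among the k highest-scored users."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of users")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def capture_rate_at_k(scores, labels, k):
    """Share of all churners that fall inside the top-k ranked users (recall at k)."""
    total = sum(labels)
    if total == 0:
        raise ValueError("no churners in the evaluation set")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / total


# Toy example with hypothetical scores: 10 users, 2 of whom churned.
scores = [0.91, 0.15, 0.40, 0.78, 0.05, 0.33, 0.62, 0.10, 0.22, 0.48]
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(precision_at_k(scores, labels, k=2))     # 1.0
print(capture_rate_at_k(scores, labels, k=2))  # 1.0
print(precision_at_k(scores, labels, k=4))     # 0.5
```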

Deployment: a daily batch scoring job feeding a CRM segment. Intervention for the top decile: a personalised push notification at day 7 of inactivity, anchored to the user's specific loan balance and upcoming EMI date, not a generic reward nudge. For users in the 60th to 80th percentile, a lower-touch automated email.

Measurement: pre-registered randomisation of the at-risk population into treatment and holdout before the campaign launches. Primary metric is 30-day repayment rate in treatment versus control. If the lift is statistically significant and the intervention cost — campaign operations plus any cashback incentive — is lower than the avoided expected credit loss on those loans, I would recommend scaling. I would also track the 60-day repayment rate separately, because some interventions shift repayment timing rather than preventing churn, and reporting that as a win would be misleading to the business.
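A minimal sketch of the scaling decision, assuming a pooled two-proportion z-test is an acceptable significance check and that cost and avoided loss are expressed per treated user. The function names and every number in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_t, n_t, success_c, n_c):
    """Two-sided z-test for a difference in repayment rates (pooled variance)."""
    p_t, p_c = success_t / n_t, success_c / n_c
    pooled = (success_t + success_c) / (n_t + n_c)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, z, p_value


def worth_scaling(lift, p_value, cost_per_user, avoided_loss_per_retained_user, alpha=0.05):
    """Scale only if the lift is significant and pays for itself per treated user."""
    return p_value < alpha and lift * avoided_loss_per_retained_user > cost_per_user


# Hypothetical read-out: 5,000 users per arm, 82% vs 79% 30-day repayment rate.
lift, z, p_value = two_proportion_z_test(4_100, 5_000, 3_950, 5_000)
decision = worth_scaling(lift, p_value, cost_per_user=40, avoided_loss_per_retained_user=5_000)
print(round(lift, 3), round(z, 2), decision)
```

Running the same check on the 60-day repayment rate helps separate prevented churn from repayments that were only shifted in time.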

Follow-up questions

Your model has an AUC of 0.78 but the intervention campaign shows zero lift in a 14-day holdout test. What do you investigate first?
Product says churn has two types — voluntary and involuntary. How does your model handle them differently, and does that change your feature set or your deployment logic?

Question 2: Campaign Analytics

You ran a push notification campaign targeting 200,000 Prefr users offering ₹500 cashback on their next EMI repayment. Open rate was 18%, click-through was 4%, but repayment conversion lifted by only 0.3% vs. control. The business team says the campaign "worked." Do you agree?

Why interviewers ask this

Tests ability to distinguish vanity metrics from business outcomes, and whether you can form and defend a clear point of view rather than presenting numbers and letting the room decide. Weak candidates say the results are mixed and need further analysis. Strong candidates quantify the outcome in rupees, identify the segmentation failure, and propose a specific next experiment with a testable hypothesis.

Example strong answer

My answer is no — but the reasoning matters more than the conclusion, and I would walk the business team through the numbers rather than simply disagreeing.

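A 0.3% conversion lift on 200,000 users, read as 0.3 percentage points, means 600 incremental repayments. At an average EMI of ₹8,000, that is ₹48 lakh in repayments triggered. The cashback cost is ₹500 per converter, so ₹3 lakh in direct spend on those 600 incremental converters, plus campaign operations. On paper, that looks positive. Note, though, that ₹3 lakh covers only the incremental converters; the total cashback bill also includes every treated user who repaid and claimed the offer.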

But the critical question is whether these were truly incremental repayments or whether we paid ₹500 to users who would have repaid regardless. To test this, I would look at the repayment propensity distribution of the users who converted in the treatment group. If they are concentrated in the top two propensity deciles — the users with an 80%+ baseline repayment probability — then we have confirmed the campaign primarily cannibalised organic conversions rather than preventing genuine defaults. The marginal unit economics are then deeply negative: we spent ₹500 to move forward a repayment by a few days for a user who was never at risk of not repaying.
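To make the cannibalisation point concrete, here is a small sketch that splits cashback spend between incremental repayments and repayments that would have happened anyway. It assumes every treated user who repays and claims the offer receives the cashback. The organic claim rate of 3% is a made-up placeholder, not a figure from the campaign.

```python
def campaign_economics(treated_users, organic_claim_rate, lift, cashback):
    """Split cashback spend into incremental and would-have-repaid-anyway repayments.

    organic_claim_rate: share of treated users who would have repaid without the
    offer and still claim the cashback (hypothetical input).
    lift: absolute difference in repayment rate versus control.
    """
    incremental = treated_users * lift
    organic = treated_users * organic_claim_rate
    total_cashback = (incremental + organic) * cashback
    return {
        "incremental_repayments": incremental,
        "total_cashback": total_cashback,
        "cost_per_incremental_repayment": total_cashback / incremental,
        "share_of_budget_on_organic": organic / (organic + incremental),
    }


result = campaign_economics(
    treated_users=200_000, organic_claim_rate=0.03, lift=0.003, cashback=500
)
for name, value in result.items():
    print(name, round(value, 2))
```

Under these placeholder inputs most of the budget goes to organic repayers, so the cost per incremental repayment is several times the ₹500 face value of the offer.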

The deeper problem is segmentation. A blanket cashback offer on 200,000 users is a blunt instrument applied to a population that is mostly not the target. The users who most need an incentive to repay — the borderline 40 to 65% propensity band, roughly 30,000 to 40,000 users in this population — are exactly the ones for whom a ₹500 cashback might actually change behaviour. The top-propensity users simply pocketed the cashback on a repayment they were going to make anyway.

When the business team says the campaign worked, I would reframe it specifically: "It generated positive ROI on paper at the population level, but the targeting efficiency was low. We spent 60% of our cashback budget on users who did not need it. The next test should restrict the cashback offer to the 40 to 65% propensity band — I would estimate we can achieve 2 to 3x the conversion lift at less than half the current budget, because we are reaching the population where the offer actually changes behaviour."

I would also flag a measurement risk. If we are only measuring repayment in the 14 days post-notification, we may be missing repayments that the campaign inadvertently delayed — users who planned to pay early but waited for the cashback window. A 60-day measurement window with a breakdown of repayment timing by day would give a cleaner read on whether we are genuinely preventing defaults or simply shifting their timing.

Follow-up questions

How would you design the next campaign iteration to improve ROI by 3x without increasing the total cashback budget?
The business team wants to scale to 1 million users next month. What data would you need before giving a recommendation?

Question 3: A/B Experiment Validity

A PM ran an A/B test on a new loan eligibility UI. Variant B showed 12% higher approval click-through after 3 days, and the PM wants to ship. You're asked to review. What do you check?

Why interviewers ask this

Tests statistical rigour and the willingness to push back on premature conclusions without being obstructionist. Weak candidates validate the p-value and sign off. Strong candidates interrogate the experiment design, flag the novelty effect risk, distinguish between click-through as a proxy and the actual business metric, and give a clear recommendation rather than hedging indefinitely.

Example strong answer

I would run four checks in sequence before giving the PM a recommendation.

First, power and sample size. Was the experiment actually powered to detect a 12% relative lift at 80% power and a 5% significance level? If the baseline click-through rate is around 55%, a 12% relative improvement means moving from 55% to roughly 61.6%, a 6.6 percentage point absolute difference. Detecting that reliably requires approximately 900 users per arm in a two-tailed test (closer to 1,200 per arm at 90% power). If the PM ran the test on a small traffic segment and has far fewer than that, the observed lift could easily be noise, even if it looks directionally positive. I would calculate the required sample size from the baseline rate and the claimed effect size before looking at any other output.
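The sample-size check can be reproduced in a few lines. This is a sketch using the standard normal-approximation formula for comparing two proportions with unpooled variance; the function name is illustrative, and the inputs are the 55% baseline and 12% relative lift from the example above.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Users per arm for a two-sided two-proportion z-test (unpooled variance)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n_80 = sample_size_per_arm(0.55, 0.12)              # 80% power
n_90 = sample_size_per_arm(0.55, 0.12, power=0.90)  # 90% power
print(n_80, n_90)
```

The requirement is sensitive to the chosen power, so it is worth agreeing on both power and significance level before the test starts.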

Second, novelty effect. This is the most common trap in short-duration UI tests, and 3 days is far too short to rule it out. When users encounter a new interface, they often engage more simply because it is different, not because the design is genuinely better. This effect typically decays within 5 to 10 days as the novelty wears off. At 3 days, there is a meaningful probability that what we are seeing is novelty-driven click inflation rather than a durable improvement in user intent. I would recommend extending the test to at least 14 days to see whether the lift holds.

Third, randomisation integrity. Is the variant-to-control split truly random, and are the two arms comparable? I would check whether bureau score distribution, requested loan amount, device type — Android versus iOS — and partner channel source are balanced between arms. If one arm happens to be overrepresented in high-bureau-score users who are already more likely to click through, the observed lift is partly confounded by pre-existing differences between the groups, not the UI change. This is a 15-minute SQL check that should run before any test result is interpreted.
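One way to run the balance check on a numeric covariate such as bureau score is the standardised mean difference between arms. The sketch below assumes the common rule of thumb that an absolute value above roughly 0.1 deserves a closer look; the scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(control, variant):
    """(mean_variant - mean_control) divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(control) + variance(variant)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(variant) - mean(control)) / pooled_sd


# Hypothetical bureau scores sampled from each arm.
control_scores = [700, 720, 710, 690, 730]
variant_scores = [705, 715, 712, 695, 728]
smd = standardized_mean_difference(control_scores, variant_scores)
print(round(smd, 3), "imbalanced" if abs(smd) > 0.1 else "balanced")
```

Categorical covariates such as device type or partner channel would need a proportion-based comparison instead.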

Fourth, and most importantly, the business metric. Click-through is a proxy, not the outcome we care about. The real question is whether Variant B leads to more completed loan applications and ultimately more disbursements. A UI that makes users click "Check Eligibility" more aggressively might be generating false intent — users who click but do not complete the application because they find the terms or the next screen unattractive. I would pull the funnel continuation rate for both arms: of users who clicked through, what fraction completed the full application? If Variant B's continuation rate is lower than Variant A's, the higher click rate is misleading and could actually indicate the UI is setting incorrect expectations.

My recommendation to the PM: extend to 14 days, verify the randomisation, and add a funnel continuation rate as a co-primary metric. If at day 14 the lift holds and funnel rates are equivalent, ship. If there is business urgency, a staged rollout to 20% of traffic with a 7-day monitoring window is a reasonable middle path.

Follow-up questions

The PM says 3 days is enough because they have seen this pattern hold before. How do you handle that pushback without blocking the roadmap unnecessarily?

Question 4: Cohort Retention SQL

Build a SQL query that shows monthly repayment retention by loan disbursement cohort for the last 6 months. Describe the logic and what you'd look for in the output.

Why interviewers ask this

Tests practical SQL grounded in a real lending context — not academic joins but cohort logic that a risk lead or PM would actually request in a Monday morning review meeting. Strong candidates write correct, clean SQL, explain the key implementation choices, and immediately connect the output to business meaning rather than describing the query mechanically.

Example strong answer

The core logic is a standard cohort retention structure: group users by the month they first received a disbursement, then for each subsequent calendar month, count how many users from that cohort made at least one repayment. Dividing by cohort size gives the retention rate per month-since-disbursement.

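-- Monthly repayment retention by disbursement cohort.
-- Date functions follow the DATE_TRUNC / DATEDIFF / DATEADD style; adjust for your warehouse's dialect.
SELECT
  DATE_TRUNC('month', l.disbursement_date)        AS cohort_month,
  DATE_TRUNC('month', r.repayment_date)            AS repayment_month,
  DATEDIFF('month',
    DATE_TRUNC('month', l.disbursement_date),
    DATE_TRUNC('month', r.repayment_date))         AS months_since_disbursement,
  COUNT(DISTINCT r.user_id)                        AS active_repayers,
  -- Multiply by 1.0 to force decimal division; cohort_size is constant within a cohort.
  COUNT(DISTINCT r.user_id) * 1.0
    / MAX(cs.cohort_size)                          AS retention_rate
FROM repayments r
JOIN loans l
  ON r.user_id = l.user_id
  -- Ignore repayments that predate the loan they are matched to.
  AND r.repayment_date >= l.disbursement_date
JOIN (
  -- Cohort size: distinct users disbursed in each month of the window.
  SELECT
    DATE_TRUNC('month', disbursement_date) AS cohort_month,
    COUNT(DISTINCT user_id)                AS cohort_size
  FROM loans
  -- Start the window on a month boundary so the oldest cohort is not partial.
  WHERE disbursement_date >= DATE_TRUNC('month', DATEADD('month', -6, CURRENT_DATE))
  GROUP BY 1
) cs
  ON DATE_TRUNC('month', l.disbursement_date) = cs.cohort_month
WHERE l.disbursement_date >= DATE_TRUNC('month', DATEADD('month', -6, CURRENT_DATE))
GROUP BY 1, 2, 3
ORDER BY 1, 3;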

Two implementation choices worth flagging explicitly. First, I am joining repayments to loans via user_id to anchor each repayment to a disbursement cohort — but if users can have multiple loans across different months, I would want to anchor to their first disbursement date to avoid counting the same user in multiple cohorts and inflating retention. I would add a subquery using MIN(disbursement_date) per user, or filter using a loan_sequence flag if the schema supports it. Second, I am using COUNT(DISTINCT user_id) rather than COUNT(repayment_id) because retention means a user was active that month, not how many payment events they generated. A user making three partial payments in one month should count as one retained user.
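The two choices above, anchoring each user to their first disbursement and counting distinct users rather than payment events, can be checked on toy data. This is a plain-Python sketch rather than warehouse SQL; the tuple layout and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Running month count, so differences give whole calendar months."""
    return d.year * 12 + d.month


def cohort_retention(loans, repayments):
    """loans and repayments are lists of (user_id, date).

    Returns {((cohort_year, cohort_month), months_since_disbursement): retention_rate}.
    """
    # Anchor each user to their FIRST disbursement so they sit in exactly one cohort.
    first_disbursement = {}
    for user_id, d in loans:
        if user_id not in first_disbursement or d < first_disbursement[user_id]:
            first_disbursement[user_id] = d

    cohort_users = defaultdict(set)
    for user_id, d in first_disbursement.items():
        cohort_users[(d.year, d.month)].add(user_id)

    # Sets give distinct users, so several partial payments in a month count once.
    active = defaultdict(set)
    for user_id, d in repayments:
        start = first_disbursement.get(user_id)
        if start is None or d < start:
            continue
        offset = month_index(d) - month_index(start)
        active[((start.year, start.month), offset)].add(user_id)

    return {key: len(users) / len(cohort_users[key[0]]) for key, users in active.items()}


loans = [("u1", date(2024, 1, 10)), ("u1", date(2024, 3, 5)), ("u2", date(2024, 1, 20))]
repayments = [
    ("u1", date(2024, 2, 10)), ("u1", date(2024, 2, 25)),  # two partial payments
    ("u2", date(2024, 2, 12)), ("u1", date(2024, 3, 10)),
]
retention = cohort_retention(loans, repayments)
print(retention)
```

User u1's second loan in March does not create a March cohort entry, and the two February payments count as one retained user.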

In the output, I would be looking for three specific patterns. The first is month-1 retention across all cohorts: if it is consistently below 70%, that signals an onboarding problem where users receive their loan but do not establish a repayment habit with the app. The second is any cohort-specific anomaly: if the March cohort drops sharply at month 2 while the February and April cohorts hold steady, something changed in March, such as underwriting quality, a new partner channel, a product change, or a bureau data issue affecting that cohort specifically. That is the signal the risk team needs to investigate before it shows up in DPD numbers. The third is whether recent cohorts are tracking below older cohorts at equivalent months-since-disbursement points, which would indicate deteriorating portfolio quality in the most recent originations and warrants an immediate deep-dive into the latest underwriting vintages.

Follow-up questions

Retention is declining for the March cohort starting specifically at month 2. What are your top three hypotheses and how would you test each one efficiently?

Question 5: Cross-Functional Trade-off

The risk team flags that a new user segment you're recommending for a credit limit increase has a predicted default rate of 4.2%, vs. the current portfolio average of 2.8%. The business team wants to proceed. How do you navigate this?

Why interviewers ask this

Tests whether you can structure genuinely ambiguous trade-off decisions and communicate them clearly to stakeholders with competing incentives, without deferring entirely to whoever argues most forcefully or blocking a business decision with a theoretical concern. This is a core competency at Prefr where data scientists regularly sit between the commercial team and the risk function and are expected to drive the decision, not just document the disagreement.

Example strong answer

The first thing I would resist is framing this as a risk-versus-business conflict, because that framing guarantees that someone loses and no one actually makes a well-reasoned decision. Instead, I would convert it into a P&L question that both sides can engage with on the same terms.

Here is the calculation I would bring to the room. On a 10,000-user rollout, a 4.2% predicted default rate means 420 expected defaults versus 280 at the portfolio average — 140 incremental defaults. At an average outstanding loan of ₹1.5 lakh and a 55% LGD after collections recovery, the incremental expected credit loss is approximately ₹1.16 crore. That is the cost side of the equation.

Now the upside. If this segment has a 30% higher origination acceptance rate than our current declined population, meaning we are approving users we previously turned away, and their average loan is ₹2 lakh over a 12-month tenor at an 18% annualised rate, the incremental gross interest income on ₹6 crore of new principal is approximately ₹1.08 crore. That is 18% of ₹6 crore, treating the full principal as outstanding for the year; on an amortising EMI schedule the figure would be lower. Against roughly ₹1.16 crore of incremental expected loss, the math is nearly break-even and slightly negative at current rates, which means the decision hinges on two things: whether our 4.2% default prediction is accurate for this segment, and whether our LGD estimate reflects how collections actually performs on this demographic.
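The two sides of this calculation can be laid out as a short script so that either team can change an input and see the effect. All figures are the ones quoted in the answer. The interest line uses simple interest on the full principal, which is an assumption; an amortising schedule would yield less.

```python
LAKH = 100_000
CRORE = 10_000_000


def incremental_expected_loss(users, segment_pd, portfolio_pd, avg_exposure, lgd):
    """Extra expected credit loss caused by the segment's higher default rate."""
    return users * (segment_pd - portfolio_pd) * avg_exposure * lgd


def simple_interest_income(principal, annual_rate, years=1.0):
    """Gross interest assuming the full principal stays outstanding (no amortisation)."""
    return principal * annual_rate * years


loss = incremental_expected_loss(10_000, 0.042, 0.028, 1.5 * LAKH, 0.55)
income = simple_interest_income(6 * CRORE, 0.18)
print(round(loss / CRORE, 3), round(income / CRORE, 3))  # 1.155 1.08
print("net (crore):", round((income - loss) / CRORE, 3))
```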

Those two uncertainties are exactly why I would not recommend a full rollout, and I would also not block the expansion. I would propose a controlled pilot: 2,000 users from the segment, randomised against a 2,000-user holdout that continues to be declined, with a 60-day monitoring window and a pre-agreed kill switch triggered if DPD30+ in the pilot exceeds 6% at the day-30 interim read. This structure gives the business team the experiment they want, gives the risk team a circuit breaker with a clear numeric threshold, and gives us 60 days of empirical data on whether our model's prediction is accurate before we have meaningful credit exposure.
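To judge how much a 2,000-user pilot can actually tell us, a confidence interval on the observed default rate is useful. The sketch below uses the Wilson score interval, one standard choice for binomial proportions, and assumes for illustration that the pilot lands exactly on the predicted 4.2% (84 defaults).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(defaults, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = defaults / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(defaults=84, n=2_000)  # 84 / 2,000 = 4.2%
print(f"{low:.2%} to {high:.2%}")
```

With these inputs the interval runs from roughly 3.4% to 5.2%. That excludes both the 2.8% portfolio average and the 6% kill threshold, but it is still too wide to settle pricing on its own.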

When presenting this to both teams, I would frame it explicitly: if the pilot confirms a default rate near 4.2%, the economics are marginal at current pricing and we would need to either increase the rate by approximately 150 basis points or reduce the maximum tenor from 18 to 12 months to make the segment profitable. That is a business decision for the risk and commercial leads — but I have structured it so they are choosing between options with quantified trade-offs, not debating who is being more conservative.

Follow-up questions

The pilot comes back with a 3.6% default rate at day 60, lower than the predicted 4.2%. Does that automatically mean you approve the full rollout?
The risk lead wants to see confidence intervals on the 4.2% prediction before approving the pilot. How do you produce that and what range would give you concern?

Preparation tip

CRED's data science interviews consistently reward one habit: forming a point of view and defending it with numbers. Every answer should end with a concrete recommendation — what you would do, why, what would change your mind, and what the measurement plan looks like. Candidates who present options without recommending one rarely advance past the technical round.
