
InterviewBee — UX Designer Premium Question Bank


Question 1: End-to-End Design Process — Redesigning a Broken User Journey

Difficulty: Senior | Role: UX Designer | Level: Senior | Company Examples: Google, Airbnb, Spotify, Booking.com, Figma


The Question

You are a Senior UX Designer at a B2C fintech company. Usability testing has revealed that 61% of new users abandon the onboarding flow before completing their first transaction — a 14-step process that includes identity verification, bank account linking, and a first transfer. The product team has received conflicting feedback: some users say the flow is too long; others say it is confusing; and a segment says they distrust the app with their financial information. The Head of Product has given you 6 weeks to redesign the onboarding experience and achieve a target completion rate of 80%. Walk through your end-to-end design process: how you would diagnose the problem, define the design direction, and deliver a tested solution within 6 weeks.


1. What Is This Question Testing?

  • Design process maturity — understanding that the correct first step is not opening Figma; it is diagnosing where in the 14-step flow the abandonment is occurring and why; designing a solution before diagnosing the problem is the most common mistake junior UX designers make, and avoiding it is what separates a senior designer who improves conversion from one who redesigns aesthetics while the problem persists
  • Research literacy — knowing which research methods answer which questions at speed: session recordings and funnel analytics identify where users drop off (quantitative), while contextual interviews and usability tests reveal why they drop off (qualitative); a 6-week timeline requires rapid research synthesis, not a multi-month discovery phase
  • Systems thinking — the three categories of user feedback (too long, confusing, distrust) are not three separate problems — they may be three manifestations of the same root cause; a 14-step flow that is poorly sequenced creates both cognitive overload (confusing) and perceived length (too long); and a flow that asks for sensitive financial information before establishing trust creates distrust; the design direction must address the structural cause, not each symptom independently
  • Interaction design depth — onboarding flows for regulated fintech have specific design constraints: KYC (Know Your Customer) identity verification steps cannot be removed without regulatory compliance risk; however, they can be reordered (defer the most friction-heavy steps until after the user has experienced value), reframed (explain why each step exists in plain language), and shortened (progressive disclosure reduces perceived complexity without removing required information)
  • Stakeholder communication — a 6-week timeline means the designer must manage scope aggressively; redesigning 14 steps with full user testing and engineering implementation in 6 weeks is not achievable; the conversation with the Head of Product must establish what "redesign" means — is it a full implementation or a validated prototype that can be handed to engineering?
  • Measurement orientation — the 80% completion rate target is the right framing, but the designer must also define the intermediate metrics that indicate the redesign is working before the full cohort data is available: step-level completion rates, time-on-step, and qualitative usability test task success rates are all leading indicators that the design direction is correct

2. Framework: Diagnose-Define-Design-Test-Deliver (DDDTD)

  1. Assumption Documentation — Confirm access to: product analytics (funnel drop-off by step), session recordings (Hotjar or FullStory), any existing user research or exit survey data, the regulatory constraints on each KYC step (which steps are legally mandatory vs. operationally convenient), and engineering capacity for implementation within the 6-week window
  2. Constraint Analysis — FCA-regulated fintech: KYC steps are legally required before any money movement; the redesign can reorder and reframe these steps but cannot remove them; 6-week timeline means the deliverable is a validated high-fidelity prototype, not a fully implemented production flow
  3. Tradeoff Evaluation — Progressive onboarding (let users experience value before completing all KYC steps) vs. front-loaded onboarding (complete all verification upfront for a clean user state); progressive onboarding increases completion rates but creates a more complex product state machine that engineering must manage; the design decision must be made in consultation with the engineering lead
  4. Hidden Cost Identification — The redesign's success depends on the copy and microcopy strategy — a beautifully designed step that uses jargon ("AML verification in progress") generates the same confusion as a poorly designed step; the content design work (plain-language rewrites of every step prompt) is often the highest-impact, most under-resourced element of an onboarding redesign
  5. Risk Signals / Early Warning Metrics — Step-level completion rates in the prototype usability test (target: no single step with a task failure rate above 15%), time-on-step in prototype testing (any step where participants spend more than 90 seconds on a task that should take 30 seconds indicates a comprehension problem), trust indicators in post-test interviews ("at what point did you feel confident this was a legitimate service?")
  6. Pivot Triggers — If usability testing of the redesigned prototype shows that the identity verification step (document upload + liveness check) is still generating a 40%+ abandonment rate regardless of surrounding design changes: the problem may be with the third-party verification provider's embedded flow, not the product design; escalate to the product manager with a recommendation to evaluate alternative KYC vendors
  7. Long-Term Evolution Plan — Immediate redesign targets the completion rate; Month 3: A/B test the redesigned vs. original flow in production to validate the improvement; Month 6: longitudinal retention study — do users who complete the redesigned onboarding have better 30-day retention than users who completed the original flow?

3. The Answer

Explicit Assumptions:

  • Analytics platform: Mixpanel with step-level funnel data; session recording: FullStory; no prior qualitative research on the onboarding flow
  • The 14 steps: account creation (email + password) → email verification → phone number → SMS verification → personal details (name, DOB, address) → document upload (passport/driving licence) → liveness check (selfie) → bank account linking (Open Banking) → spending category preferences → notification preferences → first transfer amount input → transfer review → transfer confirmation → welcome screen
  • Engineering capacity: 2 front-end engineers available from Week 4; the Week 1–3 deliverable is a prototype, Week 4–6 is the engineering handoff and implementation
  • Regulatory confirmation from the compliance team: steps 5–8 (personal details through bank linking) are legally mandated before any money movement; steps 9–10 (spending preferences, notification preferences) are optional and can be deferred

Week 1: Diagnose Before Designing

Pull the Mixpanel funnel data for the past 90 days. Map completion rates at every step — not the overall 39% completion rate, but the specific drop-off percentage at each of the 14 steps. This data will almost certainly show that the abandonment is not evenly distributed across the flow. A typical pattern for a KYC-heavy fintech onboarding:

  • Steps 1–4 (account creation through phone verification): 85–90% completion — low friction, familiar patterns
  • Step 6 (document upload): 55–65% completion — the first high-friction moment; users who did not expect to need their passport during sign-up abandon here
  • Step 7 (liveness check): 70–80% of step-6 completers — many users have never done a liveness check; instruction clarity and the camera-permission UX are common failure points
  • Step 8 (bank account linking): 60–70% completion — linking a bank account is a high-trust ask that many users are not ready to make at this point in the flow
  • Steps 9–10 (preferences and notification settings): 80–90% completion for users who reach this stage — the friction has passed, but users who abandoned earlier never see these steps

This diagnostic tells the designer that the problem is concentrated at 3 steps (document upload, liveness check, bank linking) — not distributed across all 14 steps. The redesign effort should be focused on these 3 steps, not on creating a brand-new 14-step flow.
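To make the diagnostic concrete, here is a minimal sketch of the step-level conversion calculation from exported funnel counts — the step names follow the 14-step flow above, but the figures are illustrative assumptions, not real Mixpanel data:

```typescript
// Minimal sketch: step-level drop-off from exported funnel counts.
// The counts below are illustrative, not real Mixpanel data.
interface FunnelStep {
  name: string;
  users: number; // users who completed this step in the 90-day window
}

const funnel: FunnelStep[] = [
  { name: "Step 1 – account creation", users: 10_000 },
  { name: "Step 5 – personal details", users: 8_800 },
  { name: "Step 6 – document upload", users: 5_300 },
  { name: "Step 7 – liveness check", users: 4_100 },
  { name: "Step 8 – bank account linking", users: 2_700 },
];

// Step-to-step conversion exposes where abandonment concentrates,
// which the single overall completion rate hides.
funnel.slice(1).forEach((step, i) => {
  const previous = funnel[i];
  const conversion = ((step.users / previous.users) * 100).toFixed(1);
  console.log(`${previous.name} → ${step.name}: ${conversion}% retained`);
});
```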

Simultaneously, review 20 FullStory session recordings of users who abandoned at each of the 3 problem steps. Look for: rage clicks (indicating UI confusion), repeated back-navigation (indicating loss of orientation), long pause times before a step (indicating decision anxiety), and device-camera interactions that fail (indicating permission or technical issues with the liveness check). Run 5 contextual interviews with users who abandoned in the past 30 days (recruited via an exit survey triggered at abandonment). The single most valuable question: "Can you walk me through what you were thinking when you decided to stop?" Users will almost always articulate the exact friction point in their own language.

Week 2: Define the Design Direction

The research synthesis produces a clear problem statement and design direction for each of the 3 critical steps:

  • Document upload: users did not know they would need a physical ID document during sign-up; the surprise creates abandonment before the step even loads. Design direction: set expectations earlier in the flow — introduce a "what you'll need" screen before Step 5 (personal details) showing the documents required, how long the process takes (estimated 4 minutes remaining), and why each step is required ("We're legally required to verify your identity before you can send money — this keeps your money safe").
  • Bank account linking: users distrust linking a bank account before they have experienced any product value; asking for bank access at Step 8 (before the user has even made their first transfer) violates the trust sequence — the product is asking for maximum access before giving any value. Design direction: defer bank account linking to post-first-value. Allow users to complete onboarding with a manual top-up (debit card payment) as an alternative to Open Banking, then offer bank linking as an optional upgrade on the success screen after the first transfer.
  • Liveness check: the camera permission request is a jarring interruption; users who dismiss it by mistake cannot easily recover; the instruction screen before the liveness check uses technical language ("ensure your face is centred in the frame at all times"). Design direction: redesign the liveness check preparation screen with a live preview of the camera feed (so the user can see themselves before the check begins), simplified instructions using visual demonstrations rather than text, and a persistent help link for users on low-end devices.

Week 3–4: Design and Prototype

Build the redesigned flow in Figma as a high-fidelity interactive prototype covering the 14 steps with the 3 redesigned critical moments. Key design decisions: (1) Progress indicator: replace the current step counter ("Step 6 of 14") with a segmented progress bar grouped into 3 phases ("Create account → Verify identity → Start sending") — grouping reduces perceived length without changing the actual number of steps. (2) Trust scaffolding: add a persistent trust footer to every verification step (FCA-regulated lock icon, "Your data is encrypted," and a "Why do we need this?" expandable drawer for each step). (3) Contextual microcopy: rewrite every step headline and subheading in plain language — replace "Complete KYC verification" with "Confirm who you are"; replace "Liveness detection required" with "Take a quick selfie". (4) Error recovery: design explicit error states for the 3 highest-failure moments: failed document scan (camera too dark/document obscured), failed liveness check (lighting issues), and bank link failure (bank not supported). The current flow shows a generic error message for all three — the redesigned error states are specific, actionable, and include a manual entry alternative for each.

Week 5: Usability Testing

Test the prototype with 6 participants (recruited to match the target demographic: 25–40 year olds with a smartphone, no prior use of the specific app). Tasks: complete the onboarding flow unassisted. Observation focus: task completion rate at each of the 3 redesigned steps (target: >85%), time on task, and verbal think-aloud commentary at each step. After the session: ask the trust question: "At what point during this sign-up did you feel confident this was a safe and legitimate service?" Target: 5 of 6 participants naming a moment in the first half of the flow (before the document upload step) — indicating that the trust scaffolding is landing before the highest-friction steps.

Week 6: Engineering Handoff

Deliver: a complete Figma file with all 14 redesigned screens across 4 states (default, filled, error, loading), a component library aligned to the existing design system, annotated spec sheets for the 3 critical redesigned moments (covering interaction logic, animation timing, and content design guidelines), and a written handoff document with the usability test findings showing which design decisions are evidence-based.

Early Warning Metrics:

  • Step 6 completion rate in production (target: from 55% to 78% within the first 30-day cohort post-launch) — the single most important leading indicator that the "what you'll need" expectation-setting screen is working
  • Bank linking completion rate (target: measure this separately now that it is deferred to post-first-value; the hypothesis is that users who have completed a transfer are 2–3× more likely to link their bank account than users who are asked at Step 8)
  • Liveness check first-attempt success rate (target: from estimated 65% first-attempt success to 85%) — measures the impact of the live camera preview and simplified instruction redesign

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: Leading with funnel analytics diagnostics (identifying the 3 concentrated drop-off points) before any design work — and using that data to scope the redesign to the 3 critical steps rather than rebuilding all 14 — demonstrates the research-to-design discipline that distinguishes a senior UX designer from a visually talented junior. Deferring bank account linking to post-first-value (a progressive onboarding architecture) rather than redesigning the bank linking step in isolation shows systems thinking about the trust sequence. The trust scaffolding strategy (persistent FCA regulation indicator, "why do we need this?" drawers) directly addresses the distrust feedback with a design mechanism rather than a copywriting patch.

What differentiates it from mid-level thinking: A mid-level designer would open Figma in Week 1, redesign the screens that look most outdated, and deliver a new visual design without a structured diagnostic phase or usability testing. They would not decompose the funnel analytics to identify the 3 concentrated drop-off steps, would not know to check session recordings for rage clicks and back-navigation patterns, and would not identify the bank linking timing as a trust-sequence structural issue rather than a UI problem.

What would make it a 10/10: A 10/10 response would include a specific usability test script for the 3 critical steps with the exact tasks, success criteria, and observation prompts, a content design framework for the plain-language microcopy rewrites (showing the before/after for 3 specific step headlines), and a concrete A/B test design for the production validation showing the control variant, test variant, success metric, and minimum detectable effect size.



Question 2: Design Systems — Building and Scaling a Component Library

Difficulty: Senior | Role: UX Designer / Design Systems Designer | Level: Senior / Staff | Company Examples: Atlassian Design System, Google Material Design, IBM Carbon, Shopify Polaris, Airbnb


The Question

You have just joined a 200-person SaaS company as a Senior UX Designer. The company has been building its product for 5 years. There are 8 product squads, each with their own designer, and no shared design system. The result: 14 variants of the primary button exist across the product, form field styles are inconsistent between modules, spacing is arbitrary, and the mobile and web experiences look like different products. Engineering is duplicating front-end code across teams. The CPO has asked you to build a design system and achieve measurable consistency across the product within 12 months. You have a budget for one additional hire. Walk through your strategy for building the design system, how you handle the political challenge of getting 8 independent designers to adopt it, and how you measure success.


1. What Is This Question Testing?

  • Design systems knowledge — understanding that a design system is not a Figma component library — it is a combination of design tokens (the foundational values: colours, typography, spacing), components (the reusable UI building blocks with defined variants and states), patterns (the higher-level compositions of components for common UI problems), and governance (the process by which the system evolves and teams contribute); a system without governance degrades within 12 months
  • Organisational thinking — the 8 independent designers are not obstacles to be managed; they are the system's most important contributors and its primary adopters; a design system that is built by one person and handed to 8 teams will not be adopted; a system that is built with the 8 teams as contributors will be defended by them
  • Analytical rigour — the first step in building a design system is a UI audit: a systematic inventory of every component, colour, font, and spacing value currently used across the product; without this audit, the new system will be designed in a vacuum and will miss patterns that teams have already converged on informally; the audit also reveals the scale of inconsistency (14 button variants) with evidence that makes the case for the system to sceptical engineers and product managers
  • Systems thinking — design tokens are the foundation on which everything else depends; if the colour palette and typography scale are defined first as design tokens in Figma (and mapped to CSS variables in engineering), every component built on top of those tokens inherits consistency automatically; the order of build (tokens → base components → complex components → patterns) is as important as the content of the system
  • Communication skills — the design system will only be adopted if it solves real problems for the 8 designers right now, not hypothetically in 12 months; the launch strategy must sequence components in order of highest adoption value: the most-used, most-inconsistent components first (button, input field, modal, navigation) — delivering immediate relief, not a comprehensive but unusable library
  • Financial literacy — the business case for the design system must be presented in terms engineering and product leadership understand: a UI audit that shows 14 button variants existing as 14 separate code implementations represents approximately 13 × the maintenance cost of a single implementation; if each duplicate component costs 4 hours of engineering maintenance annually, 14 buttons = 56 engineering hours per year; across all duplicated components, the system pays for itself in engineering efficiency within 18 months

2. Framework: Design System Build and Adoption Model (DSBAM)

  1. Assumption Documentation — Current tech stack (React, Vue, Angular — determines the component implementation approach), existing design tooling (are teams on Figma already? — critical for library sharing), current design maturity (do any of the 8 designers have design systems experience?), and the engineering team's appetite for a shared component library (this is as much an engineering project as a design project)
  2. Constraint Analysis — 12-month timeline, budget for one hire (the hire must complement the lead designer's skills — if the lead designer is strong on interaction design, the hire should be a design engineer or a visual/token-focused designer), 8 teams with independent roadmaps who will resist anything that slows their delivery velocity
  3. Tradeoff Evaluation — Build from scratch (maximum control, takes longer before teams can adopt) vs. adopt an open-source system (Radix UI, Shadcn, MUI) as a foundation and customise (faster to adopt, less ownership, may not match the product's specific patterns); for a 5-year-old SaaS product with established patterns, a hybrid approach is typically correct: adopt a headless component library (Radix UI) for behaviour and accessibility, layer the company's design tokens on top, and customise the visual layer
  4. Hidden Cost Identification — Documentation is 40% of the design system's value and 60% of its maintenance burden; every component needs a usage guideline ("when to use this component"), an anti-pattern section ("when not to use this"), and live code examples; a system without documentation is a component library, not a design system
  5. Risk Signals / Early Warning Metrics — Figma library detach rate (designers who detach components from the library to create one-off variants are indicating the system doesn't serve their need — every detach is a signal for a system gap), engineering pull request rejection rate for non-system components (a governance mechanism that makes non-system component usage visible), component adoption rate (what % of new UI built in the past month used a system component vs. a bespoke implementation)
  6. Pivot Triggers — If at Month 6, three or more of the 8 product teams are consistently building outside the system rather than contributing to it, the adoption problem is structural — the system's contribution model is too high-friction or the components don't fit the teams' actual use cases; run a rapid research sprint with the non-adopting teams to understand the specific blockers
  7. Long-Term Evolution Plan — Month 1–3: audit + foundation (tokens, 5 core components); Month 4–6: core component library complete (20 components), design documentation published; Month 7–9: pattern library and complex component rollout; Month 10–12: system governance programme, contribution model, v1.0 launch

3. The Answer

Explicit Assumptions:

  • Tech stack: React (web) and React Native (mobile); design tooling: Figma used by all 8 designers but no shared libraries
  • The one hire: a Design Engineer (someone who can build production-quality React components from Figma designs) — this is the single highest-leverage hire for a design system because it closes the design-to-code gap that is the primary cause of design system fragmentation
  • Existing codebase: no shared component library; each squad uses a mix of custom components and different third-party UI libraries (one team uses MUI, another uses Ant Design, another is fully custom)
  • The 8 designers: mixed experience levels; 2 have prior design system experience; 6 are product-focused with limited systems thinking; all are protective of their squad's autonomy

Month 1–3: The Foundation — Audit, Tokens, and Alliance Building

The UI audit is the design system's origin document. Spend the first 3 weeks systematically cataloguing every UI component across the product. Method: screenshot every unique state of every component across all modules (including empty states, error states, and loading states) and paste them into a Figma audit board. Do this collaboratively — ask each of the 8 designers to audit their own product area using a shared template. This serves two purposes: the audit itself (generating the inventory data), and the alliance building (the 8 designers who participate in the audit will have seen with their own eyes the scale of inconsistency; they become advocates for the system rather than passive recipients of it). The audit output is quantified and visualised: a grid showing all 14 button variants side by side with their usage frequency, the 9 different primary colours in use across the product, and the 23 different spacing values. This artefact is the business case. Present it to the CPO and engineering leadership: this is what "no design system" costs us. It is also what the team is starting from — an honest starting point is better than an aspirational redesign that ignores the current state.

Design tokens before components. Tokens are the decision layer that makes everything else consistent. Define in Figma (using the Tokens Studio plugin for systematic token management): colour tokens (primitive palette → semantic tokens: colour/brand/primary maps to #1A73E8; colour/feedback/error maps to #D93025), typography tokens (font family, font size scale, line height, font weight — using a modular scale: 12/14/16/20/24/32/40px), spacing tokens (a consistent 4px base grid: 4/8/12/16/24/32/48/64px), elevation tokens (shadow values for 3 elevations: card, modal, tooltip), and border radius tokens. Map every design token to a CSS variable in the engineering codebase (this is the Design Engineer's first task). The moment this mapping is in production, any component that uses the CSS variables automatically reflects token changes — the first architectural win for consistency.

Identify the 3 "champion" designers: from the 8 designers, identify the 3 who are most engaged, most willing to contribute, and most influential with their squads. These become the system's founding contributors. Give them early access to the component Figma library, invite them to co-design the first 5 components, and credit their contributions publicly. Building a visible contributor culture from the start is what separates a design system from a design dictatorship.
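To make the token-to-CSS-variable mapping described above concrete, here is a minimal sketch of a primitive → semantic token structure and the CSS custom properties it generates; apart from the two hex values quoted above, the token names and values are illustrative assumptions, not the company's real palette:

```typescript
// Minimal sketch of a primitive → semantic token structure and its CSS custom
// property output. Names and values beyond the two hex codes quoted above are
// illustrative assumptions, not the company's real palette.
const primitives = {
  "blue-600": "#1A73E8",
  "red-600": "#D93025",
  "grey-900": "#202124",
} as const;

const semanticTokens: Record<string, string> = {
  "colour-brand-primary": primitives["blue-600"],
  "colour-feedback-error": primitives["red-600"],
  "colour-text-default": primitives["grey-900"],
  "space-2": "8px", // 4px base grid
  "space-4": "16px",
  "radius-md": "8px",
};

// Emit the tokens as CSS custom properties. Once this file is generated in the
// build, any component styled with var(--colour-brand-primary) etc. picks up
// token changes automatically.
const css = `:root {\n${Object.entries(semanticTokens)
  .map(([name, value]) => `  --${name}: ${value};`)
  .join("\n")}\n}`;

console.log(css);
```

In practice this generation step would sit in the build pipeline (or be handled by a Tokens Studio export), so token changes land in code without manual re-specification.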

Month 4–6: Core Component Library

Build the 20 most-used components in priority order of: usage frequency (how often does this component appear across the product?), inconsistency severity (14 button variants vs. 2 input field variants — button is higher priority), and implementation complexity (start with stateless components like badges and tags before stateful components like dropdowns and date pickers). The 20 core components: Button (primary/secondary/tertiary/destructive/ghost variants, all sizes, loading and disabled states), Input field (text, number, password, with label, hint text, and error states), Textarea, Checkbox, Radio button, Toggle, Select dropdown, Badge, Tag, Avatar, Card, Modal, Alert/Toast, Tooltip, Navigation bar, Sidebar, Breadcrumb, Empty state, Loading skeleton, Pagination. Each component is built simultaneously in Figma (design) and React (code) — the Design Engineer builds the React component from the Figma spec the same week; there is never a version gap between design and code. Each component's Figma documentation includes: anatomy diagram (naming every part of the component), variant matrix (all states and variants in a grid), do/don't usage examples (2 examples each — what the component is for and what it is not for), and accessibility notes (keyboard navigation, ARIA roles, contrast ratios). Publish the Figma library to all 8 teams' workspaces and run a 90-minute "Design System 101" session for all 8 designers — not a training on how to use the library, but a demo of how it solves the specific problems each team has complained about.
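As an illustration of the design-and-code-in-the-same-week workflow, below is a minimal TypeScript/React sketch of the system Button consuming the token-driven CSS variables; the prop names, class naming, and loading treatment are assumptions made for illustration, not the company's actual component API:

```tsx
// Minimal sketch of the system Button consuming the token-driven CSS variables.
// Prop names, class naming, and the loading treatment are illustrative assumptions.
import React from "react";

type ButtonVariant = "primary" | "secondary" | "tertiary" | "destructive" | "ghost";
type ButtonSize = "sm" | "md" | "lg";

interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
  variant?: ButtonVariant;
  size?: ButtonSize;
  loading?: boolean;
}

export function Button({
  variant = "primary",
  size = "md",
  loading = false,
  disabled,
  children,
  ...rest
}: ButtonProps) {
  return (
    <button
      // One class per variant keeps the visual layer in CSS, where styles read
      // var(--colour-brand-primary), var(--space-2) and so on from the tokens.
      className={`ds-button ds-button--${variant} ds-button--${size}`}
      disabled={disabled || loading}
      aria-busy={loading || undefined}
      {...rest}
    >
      {loading ? "Loading…" : children}
    </button>
  );
}
```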

The Political Challenge: Adoption Without Mandate

The worst design system launch strategy is a mandate: "from today, all new UI must use the system." Mandates create resentment, workarounds, and grudging compliance that produces worse design than the inconsistency it replaces. The correct adoption strategy: make the system the path of least resistance without mandating it. Three mechanisms: (1) Reduce the time to first component: if a designer can drag a perfectly spec'd, accessible, documented button component from the library into their mockup in 5 seconds, they will use it rather than recreating a button from scratch. The system's value is immediately felt, not abstractly promised. (2) Embed in existing workflows: rather than asking designers to change how they work, offer to help them with their current designs using the system. Identify one upcoming feature in each squad's roadmap and offer to co-design it using the design system — one sprint of collaboration produces a squad's own component usage proof-of-concept, which is more persuasive than any presentation. (3) Engineering is the forcing function: when the Design Engineer's React components are well-built, documented, and accessible, engineers prefer using them over building custom components. An engineering team that requests design system components rather than one-off implementations creates pull-based adoption from the delivery end, not push-based adoption from the design end.

Measuring Success

Three metrics that reflect genuine adoption rather than compliance: (1) Component adoption rate: what percentage of net-new UI shipped in the past month used a design system component vs. a bespoke implementation? Track this via a quarterly engineering audit of pull requests. Target: 30% Month 6, 70% Month 12. (2) Design consistency score: run the UI audit process again at Month 6 and Month 12, measuring the number of unique visual variants for each of the 5 highest-inconsistency components identified in the initial audit. Target: button variants from 14 to 3 by Month 12 (some intentional variants remain: primary, secondary, destructive). (3) Designer contribution rate: number of component improvement requests and contributions submitted by the 8 product designers per quarter. A system that receives no contributions is a system that designers are working around, not with. Target: 4+ contributions per quarter from non-system team members by Month 9.

Early Warning Metrics:

  • Figma library detach rate — in Figma's analytics, track how often designers detach a component from the library (breaking the link to the master component); a high detach rate on a specific component means that component's variants do not cover the teams' actual needs; address within 2 weeks of detection
  • "Not in the system" design review comment frequency — track how often design review comments include references to components that are not yet in the system; a high frequency on specific component types indicates the library has coverage gaps for the teams' current work
  • Time to implement a system component vs. custom component — ask the Design Engineer to time how long it takes to implement a new feature using system components vs. building custom; the time saving is the most compelling engineering adoption argument

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: The collaborative UI audit strategy (asking the 8 designers to audit their own product areas rather than doing it unilaterally) achieves two things simultaneously — the audit output and the alliance building — which is the kind of organisational leverage thinking that distinguishes a staff-level designer from a senior individual contributor. The token-before-components sequencing (establishing the design token → CSS variable mapping as the first engineering integration) means every subsequent component inherits system consistency automatically — it is a compound investment. The pull-based adoption strategy (engineering teams requesting system components because they are better-built and accessible) is more durable than any top-down mandate.

What differentiates it from mid-level thinking: A mid-level designer would build a beautiful Figma component library, announce it to the 8 teams, and then wonder why adoption is low 6 months later. They would not conduct the UI audit collaboratively, would not know about design tokens as the foundation layer, would not hire a Design Engineer as the highest-leverage role, and would not define the contribution model that determines whether the system grows or stagnates after the initial build.

What would make it a 10/10: A 10/10 response would include a specific token naming convention (showing the primitive-to-semantic token hierarchy for colour with example JSON), a contribution model governance document (showing the RFC process for proposing new components and the criteria for acceptance), and a component adoption tracking methodology showing the specific Figma analytics configuration and engineering PR audit checklist used to measure the adoption rate.



Question 3: Accessibility and Inclusive Design — Designing for a Diverse User Population

Difficulty: Senior | Role: UX Designer | Level: Senior | Company Examples: Microsoft Inclusive Design, Apple Accessibility, BBC GEL, GOV.UK Design System, Deque


The Question

You are a Senior UX Designer at a healthcare technology company building a patient portal used by NHS patients to book appointments, view test results, and manage repeat prescriptions. A recent accessibility audit has found that the portal fails 34 WCAG 2.1 AA criteria. More critically, patient feedback shows that elderly users and users with visual, cognitive, and motor impairments are significantly underserved — some cannot use the portal independently at all and must rely on phone-based appointments instead. The NHS contract requires WCAG 2.1 AA compliance within 6 months. Beyond compliance, the Head of Design wants to move toward genuinely inclusive design rather than a tick-box accessibility fix. Walk through how you would approach this — the compliance remediation, the inclusive design programme, and how you handle the tension between accessibility requirements and other design priorities.


1. What Is This Question Testing?

  • Accessibility expertise — understanding the difference between WCAG 2.1 AA compliance (a minimum technical standard) and inclusive design (a design philosophy that treats disability as a design input rather than an afterthought); WCAG compliance can be achieved by an engineer adding ARIA labels without changing a single pixel of the UI; inclusive design requires understanding how real users with diverse needs navigate real tasks and designing from those needs
  • Research with marginalised populations — knowing how to conduct research with users who have accessibility needs: recruiting must actively include users with visual, cognitive, and motor impairments (not just proxy design for them); sessions must be conducted on participants' own devices and assistive technology (screen readers, switch access, voice control) rather than lab equipment; sessions must be shorter and more flexible than standard usability tests
  • Technical knowledge — a designer working on accessibility must understand enough about HTML semantics, ARIA roles, focus management, and colour contrast to translate design decisions into specific engineering requirements; a designer who says "make it accessible" without specifying what that means is handing an ambiguous requirement to an engineer who will implement it inconsistently
  • Prioritisation under constraint — 34 WCAG failures must be triaged by severity: failures that prevent task completion (critical — must fix in Sprint 1) vs. failures that impede task completion (high — fix within 3 months) vs. failures that affect quality of experience but do not block usage (medium — fix within 6 months); trying to fix all 34 simultaneously results in none being fixed well
  • Stakeholder management — "inclusive design vs. aesthetic design" is a false tension that the designer must explicitly dismantle; well-designed accessible interfaces are not ugly interfaces with large buttons; Microsoft's Inclusive Design research consistently shows that features designed for accessibility (captions, keyboard navigation, high contrast) have the highest usage rates among the general population — the "curb-cut effect"; presenting this evidence to design sceptics changes the framing from "accessibility compromises design" to "accessibility improves design for everyone"
  • Organisational thinking — accessibility as a compliance project (fix 34 issues, tick the box, move on) produces a different organisational outcome than accessibility as a product quality standard (every new design is reviewed against accessibility criteria before implementation); the long-term goal is to embed accessibility into the design process so that WCAG failures stop accumulating, not just to remediate the existing 34

2. Framework: Accessibility Remediation and Inclusive Design Programme Model (ARIDPM)

  1. Assumption Documentation — Review the accessibility audit's methodology: was it conducted via automated testing only (which catches ~30% of WCAG failures), manual expert review, or user testing with assistive technology users? The 34 failures likely represent a subset of the actual issues; a manual expert review will surface additional failures that automated tools miss
  2. Constraint Analysis — 6-month NHS contract deadline for WCAG 2.1 AA compliance, existing product under active development (new features being shipped while the remediation is in progress — without an accessibility gate in the development process, new issues will be introduced as existing ones are fixed)
  3. Tradeoff Evaluation — Fix existing failures only (compliance-focused, achievable in 6 months) vs. fix existing failures + build inclusive design into the design process (more comprehensive, requires process change alongside remediation); both must happen simultaneously — the compliance deadline is non-negotiable, and the process change is necessary to prevent the same issues recurring
  4. Hidden Cost Identification — The cost of fixing accessibility issues post-implementation is approximately 10× the cost of designing accessibility in from the start (Nielsen Norman Group research); every WCAG failure that exists in production represents a design and engineering decision that was made without accessibility criteria; the inclusive design programme is a prevention investment, not just a quality improvement programme
  5. Risk Signals / Early Warning Metrics — Automated accessibility scan failure count per sprint (using axe-core or Lighthouse in the CI/CD pipeline — a new failure introduced in a sprint means the sprint violated the accessibility standards; target: zero new failures introduced while existing failures are remediated), screen reader task completion rate in monthly usability sessions with visually impaired users (target: 80% task completion with NVDA or JAWS on the 5 core patient portal tasks)
  6. Pivot Triggers — If the 6-month compliance deadline is at risk at Month 3 (fewer than 60% of critical failures remediated): escalate to the NHS contract manager immediately and propose a phased compliance milestone; most NHS contracts accept a credible evidence-based remediation plan with defined milestones as an interim compliance demonstration
  7. Long-Term Evolution Plan — Month 1–6: WCAG 2.1 AA remediation; Month 7–12: inclusive design research programme with 3 user panels (elderly users, screen reader users, cognitive accessibility users); Month 13–18: proactive inclusive design embedded in all new feature work with a defined accessibility review gate in the design process

3. The Answer

Explicit Assumptions:

  • Patient portal users: NHS patients, age range 18–85 with a significant proportion over 65, including patients with visual impairments (screen reader and magnification users), motor impairments (keyboard-only users, switch access users, tremors affecting pointer accuracy), and cognitive impairments (including patients with dementia, ADHD, and low health literacy)
  • The 34 WCAG failures from the audit: categorised as 8 critical (prevent task completion), 14 high (significantly impede task completion), 12 medium (affect quality but do not block usage)
  • Current development process: no accessibility review gate; developers ship features without automated accessibility testing in the CI/CD pipeline
  • Engineering team: 1 front-end engineer who has prior accessibility experience and can lead the technical implementation

The Compliance Remediation: Triage, Then Sprint

The 34 failures must be triaged by user impact, not by WCAG criterion category. The triage question for each failure: can a user complete the task they came to do without this being fixed? If no: critical. If the task is harder but possible: high. If the task is unaffected but the experience is suboptimal: medium. The 8 critical failures (from the audit description) are likely to include: missing keyboard focus management in the appointment booking modal (keyboard-only users cannot tab into the modal — the modal is inaccessible to all non-mouse users), insufficient colour contrast on form field labels (a 3.5:1 contrast ratio against the required 4.5:1 minimum — affects all low-vision users without screen readers), missing form field labels on the prescription reorder form (screen readers read out "edit text, edit text, edit text" without context — a screen reader user cannot complete the form), and missing alt text on test result images (a visually impaired patient cannot access their test result information). Build a 3-sprint remediation plan: Sprint 1 (Weeks 1–2): all 8 critical failures. Sprint 2 (Weeks 3–6): all 14 high failures. Sprint 3 (Weeks 7–12): all 12 medium failures. Weeks 13–24: monitor for regression, refine, and audit again. Simultaneously: install axe-core in the CI/CD pipeline from Week 1. Any pull request that introduces a new axe-core failure is blocked from merging. This prevents new failures from accumulating while the existing ones are being remediated — the accessibility debt does not grow while it is being paid down.
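As a sketch of what the regression gate could look like, assuming a Jest + React Testing Library setup, the jest-axe package wraps axe-core so that any new violation fails the build; <PrescriptionReorderForm /> is a hypothetical component name used for illustration:

```tsx
// Minimal sketch of the CI regression gate, assuming a Jest + React Testing
// Library setup with the jest-axe wrapper around axe-core.
// <PrescriptionReorderForm /> is a hypothetical component name.
import React from "react";
import { render } from "@testing-library/react";
import { axe, toHaveNoViolations } from "jest-axe";
import { PrescriptionReorderForm } from "./PrescriptionReorderForm";

expect.extend(toHaveNoViolations);

test("prescription reorder form introduces no axe-core violations", async () => {
  const { container } = render(<PrescriptionReorderForm />);
  const results = await axe(container);
  // Runs on every pull request; a failing assertion blocks the merge.
  expect(results).toHaveNoViolations();
});
```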

Designing Specific Remediation Solutions: Not Just Flagging, But Solving

A designer's role in the remediation is not to write a list of issues — it is to design the solutions and hand them to engineering as specific, implementable specifications. For the keyboard focus management failure: design a focus trap implementation for the appointment booking modal (specify in the Figma annotation: "when the modal opens, focus moves to the modal's close button; Tab cycles through interactive elements within the modal only; Escape closes the modal and returns focus to the trigger element; focus trap follows the ARIA Authoring Practices Guide modal pattern"). For the colour contrast failure: update the design tokens — change the form label colour token from #888888 (roughly 3.5:1 against white) to #595959 (roughly 7:1 against white — exceeds AA, approaches AAA for normal text); apply the token update, and it propagates to every instance of the label style automatically. For the missing form labels: audit the prescription reorder form and add a visible label to every field; if the design cannot accommodate a visible label due to space constraints, specify a visually hidden label using CSS (the clip technique) that is accessible to screen readers but invisible on screen — this is the correct technical approach, not placeholder text, which disappears on input.
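For the visually hidden label, here is a minimal sketch of the technique, assuming a React codebase; the field names are illustrative. The label stays in the accessibility tree but occupies no visual space, and unlike placeholder text it does not disappear once the user starts typing:

```tsx
// Minimal sketch of the visually hidden label technique referenced above.
// Field names are illustrative, not the portal's actual form schema.
import React from "react";

const visuallyHidden: React.CSSProperties = {
  position: "absolute",
  width: 1,
  height: 1,
  padding: 0,
  margin: -1,
  overflow: "hidden",
  clip: "rect(0 0 0 0)",
  whiteSpace: "nowrap",
  border: 0,
};

export function MedicationQuantityField() {
  return (
    <>
      {/* Announced by screen readers, invisible on screen. */}
      <label htmlFor="medication-quantity" style={visuallyHidden}>
        Quantity of medication to reorder
      </label>
      <input id="medication-quantity" name="medicationQuantity" type="number" min={1} />
    </>
  );
}
```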

The Inclusive Design Programme: Research With, Not For

The compliance remediation fixes what is broken. The inclusive design programme redesigns from real users' perspectives. The programme has three components: (1) User panel recruitment: recruit 3 user panels of 6 participants each — a visual impairment panel (NVDA and JAWS screen reader users, VoiceOver users, magnification users), an elderly user panel (65+ with varying levels of digital literacy), and a cognitive accessibility panel (users with dyslexia, ADHD, early-stage dementia). These panels are paid participants who participate in monthly 60-minute sessions. Recruit via RNIB (Royal National Institute of Blind People), Age UK, and NHS England's Patient Participation Group network. (2) Contextual design sessions: conduct sessions in participants' homes, on their own devices, with their own assistive technology. A screen reader user who uses JAWS 2023 on Windows with a 200% zoom level is a completely different context from a lab session with NVDA on a standard screen. The home context also reveals the environmental factors that affect the portal's usability: poor broadband, small screen devices, background noise from the TV, and the cognitive load of managing a health condition while navigating a health portal. (3) Design implication translation: after each panel session, the designer translates the observations into specific design decisions — not general recommendations. "Users found the appointment confirmation page confusing" is not a design implication. "Users on the visual impairment panel took an average of 3 minutes to locate the appointment confirmation date because the information hierarchy places the confirmation number above the date — reverse the order so the date is the first piece of information after the page heading" is a design implication that can be acted on.

Handling the Tension Between Accessibility and Other Design Priorities

The tension is usually framed as "accessibility vs. aesthetics" or "accessibility vs. delivery speed." Both framings are wrong, and the designer must proactively reframe them. The evidence: (1) The curb-cut effect: features designed for disabled users routinely become valuable to everyone. Captions were designed for deaf viewers and are now widely used by people watching video with the sound off; keyboard navigation was designed for motor-impaired users and is now the primary navigation method for power users in every productivity tool; high-contrast modes designed for low-vision users are widely adopted by the general population. Accessibility features improve the product for everyone. Present this evidence to the Head of Design and CPO with product-specific examples. (2) The NHS contract risk: for this specific product, failing WCAG 2.1 AA is a contractual breach; the tension between accessibility and delivery speed is really the tension between compliance and contract termination — framed correctly, accessibility is not a competing priority with delivery, it is a prerequisite for delivery. (3) The "accessibility late" cost: show the triage data — the 34 existing WCAG failures represent design and engineering work that costs many times more to fix retroactively than it would have cost to design correctly (the Nielsen Norman Group estimate cited earlier is roughly 10×). This is not a philosophical argument — it is a cost argument that resonates with engineering managers and product leaders.

Early Warning Metrics:

  • Automated scan pass rate — axe-core CI/CD failures per sprint; target zero new failures introduced during the remediation period; any new failure is reviewed at the next design review and assigned to a sprint
  • Screen reader task completion rate — monthly session with one participant from the visual impairment panel completing the 5 core tasks (book appointment, view test result, reorder prescription, update contact details, view appointment history); target 80% task completion by Month 3, 100% by Month 6
  • NHS contract compliance evidence pack completeness — the NHS digital compliance submission requires specific evidence (automated test reports, manual audit report, user testing with disabled users); track evidence pack completeness against the submission checklist monthly

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: Translating the WCAG failures into specific, implementable design specifications (the focus trap annotation, the CSS hidden label technique, the colour token value change from #888888 to #595959) rather than leaving implementation to the engineers demonstrates the technical accessibility literacy of a senior accessibility-focused UX designer. The home-context research methodology (conducting sessions on participants' own devices and assistive technology) is the crucial methodological distinction between proxy accessibility design and genuine inclusive design. The curb-cut effect evidence presentation as a business argument reframes accessibility from a compliance burden to a product quality investment.

What differentiates it from mid-level thinking: A mid-level designer would compile the 34 WCAG failures into a spreadsheet, assign them to engineering, and monitor progress. They would not design specific remediation solutions, would not install axe-core in the CI/CD pipeline to prevent regression, would not recruit user panels for the inclusive design programme, and would not know about the CSS clip technique for visually hidden labels or the ARIA Authoring Practices Guide focus trap pattern.

What would make it a 10/10: A 10/10 response would include a specific WCAG 2.1 AA triage matrix template (showing the 8 criteria categories and the severity scoring methodology), a complete focus trap component specification with the full keyboard interaction model and ARIA implementation, and a user panel recruitment brief template for the RNIB partnership showing the screening criteria and session structure.



Question 4: UX Research — Planning and Conducting Generative Research

Difficulty: Senior | Role: UX Designer / UX Researcher | Level: Senior | Company Examples: Google UX Research, IDEO, Nielsen Norman Group, Spotify Design Research, IBM Design Research


The Question

You are a Senior UX Designer at a B2B enterprise software company. The product team is considering building a new feature: an AI-powered "smart scheduling" tool that would automatically suggest meeting times, prepare meeting agendas, and summarise meeting notes for teams of 10–500 people. No user research has been conducted on this feature concept. The Head of Product believes this is a clear opportunity based on personal intuition and competitor analysis. You believe the concept needs validation before any design or engineering investment. You have 4 weeks and a budget of £8,000 for research. Design the research programme, justify your methodology choices, and describe how you would present the findings to a Head of Product who is committed to the feature concept.


1. What Is This Question Testing?

  • Research methodology — understanding the difference between generative research (exploring what users need and how they currently behave — appropriate for an unvalidated concept) and evaluative research (testing a specific design solution — premature before the concept is validated); the research programme must be generative, focused on understanding the user's actual scheduling, agenda, and note-taking behaviour before proposing a solution
  • Research design rigour — knowing how to write a research plan that is specific enough to execute: research objectives (what decisions will this research inform?), research questions (what do we need to learn?), methodology (which methods answer which research questions and why?), participant profile (who are the target users of this feature?), and analysis approach (how will findings be synthesised and presented?)
  • Business acumen — understanding that the Head of Product's "intuition and competitor analysis" is not evidence of a user need — it is a hypothesis; the designer's job is to test the hypothesis rigorously, not to validate it; if the research reveals that users do not have the problem the feature is designed to solve, the designer must present that finding honestly and help the product team refocus on problems that are actually present
  • Stakeholder management — presenting research findings that contradict a senior stakeholder's committed belief is one of the most politically sensitive situations a UX designer faces; the presentation strategy must separate the data from the interpretation, lead with what was learned rather than what was disproved, and close with a forward-looking recommendation rather than a "you were wrong" conclusion
  • Analytical rigour — qualitative research produces rich data that is easy to selectively interpret; the analysis must be systematic: affinity mapping of observations into themes, participant quote triangulation (a finding supported by observations from 3+ participants is a pattern; a finding from 1 participant is an outlier), and explicit uncertainty labelling (distinguishing between confident findings and tentative hypotheses that require further research)
  • Practical resourcefulness — £8,000 for 4 weeks of B2B enterprise research is a constrained budget; the methodology must be chosen with cost efficiency in mind; in-depth interviews are the highest-value generative research method for a constrained budget (12 × 60-minute interviews can be conducted remotely for approximately £3,000–£5,000 including participant incentives), and diary studies or contextual observation are higher cost but sometimes necessary for behaviour that participants cannot accurately self-report

2. Framework: Generative Research Design and Insight Delivery Model (GRDIDM)

  1. Assumption Documentation — Define the target user profile before recruiting: the smart scheduling feature is for teams of 10–500 people — is the target user the meeting organiser, the meeting attendee, or both? At enterprise scale, these are often different people with different needs; the feature may also span multiple roles (executive assistants who schedule for others, team leads who run recurring standups, project managers who run cross-functional syncs) each with different scheduling behaviours
  2. Constraint Analysis — 4-week timeline, £8,000 budget, a committed Head of Product who believes in the concept, B2B enterprise recruitment (harder and more expensive than B2C — enterprise participants require incentives of £100–£150 per hour vs. £30–£50 for consumer research)
  3. Tradeoff Evaluation — In-depth interviews (generative, efficient, lower cost) vs. contextual inquiry/shadowing (more ecologically valid, shows actual behaviour rather than reported behaviour, but 2× the cost and time) vs. diary study (captures behaviour over time, relevant for recurring meeting scheduling patterns, but 4-week turnaround is tight); for a 4-week generative research programme with an £8,000 budget, in-depth interviews are the primary method, supplemented by a lightweight diary study for a subset of participants
  4. Hidden Cost Identification — B2B enterprise participant recruitment is the budget's primary risk: specialist research recruitment agencies charge £150–£250 per screened and qualified enterprise participant; with a target of 12 participants, recruitment alone can cost £1,800–£3,000; managing this via LinkedIn outreach, the company's existing customer relationships, and the product team's network reduces cost but requires more researcher time
  5. Risk Signals / Early Warning Metrics — Participant saturation (by Interview 8, are new interviews producing new themes or repeating existing ones? — saturation is the qualitative equivalent of statistical significance); insight-to-hypothesis alignment rate (are the interview themes confirming, disconfirming, or nuancing the Head of Product's original hypothesis? track this explicitly during analysis)
  6. Pivot Triggers — If the first 4 interviews reveal that the scheduling pain point is not widespread (participants describe scheduling as a minor inconvenience rather than a significant productivity drain), adjust the interview guide to probe more deeply into adjacent pain points (meeting quality, preparation, follow-up actions) — the smart scheduling concept may be solving the wrong problem but there may be a related problem worth solving
  7. Long-Term Evolution Plan — Week 1–4: generative research; Week 5–6: analysis and synthesis; Week 7: findings presentation and feature concept revision; Week 8–12: concept design sprint informed by research insights; Week 13–14: concept testing with users from the original research cohort

3. The Answer

Explicit Assumptions:

  • The target product: a Teams/Outlook-integrated enterprise tool (Microsoft 365 environment) for mid-to-large organisations
  • Target participants: knowledge workers at enterprise companies who organise or participate in 5+ meetings per week; specifically targeting: team leads and project managers (primary meeting organisers) and individual contributors in cross-functional teams (frequent meeting attendees with little scheduling control)
  • Budget allocation: recruitment £2,500 (12 participants × £208 average); participant incentives £1,200 (12 × £100 for 60-minute interview); transcription and analysis tooling (Otter.ai transcription + Miro for affinity mapping) £300; diary study materials and incentives for 5 participants £800; researcher time is internal; contingency £3,200
  • The Head of Product's belief: "Teams waste 30% of their time in poorly-organised meetings; AI can fix scheduling, preparation, and follow-up to recover this time"

Research Objectives and Questions

The research plan begins with objectives — what decisions will this research enable? Three objectives: (1) Understand the current scheduling, agenda, and note-taking behaviour of knowledge workers in enterprise teams — the actual behaviour, not the ideal behaviour. (2) Identify the most significant pain points in the meeting lifecycle (scheduling, preparation, in-meeting, follow-up) and their frequency and severity. (3) Assess whether the pain points are primarily process problems (where an AI tool might help) or people problems (where an AI tool cannot help — e.g., meetings that are called without purpose, attended by the wrong people, or never followed up on). From these objectives, the research questions: What does the end-to-end meeting lifecycle look like for an organiser and an attendee? Where in the lifecycle do they experience the most friction? How do they currently solve for scheduling conflicts, agenda preparation, and note-taking? What workarounds have they invented? When a meeting goes well, what made it work? When a meeting fails, what caused the failure? What would they not want automated? (This is the critical counter-hypothesis question that the Head of Product's intuition has not addressed.)

Methodology: In-Depth Interviews and Diary Study

Primary method: 12 in-depth remote interviews, 60 minutes each, conducted over 2 weeks. Participant split: 6 team leads/project managers (meeting organisers), 6 individual contributors (meeting attendees). Why interviews: they allow deep exploration of individual behaviour, motivations, and mental models; they surface the unexpected (what participants didn't know they knew until asked); and they produce rich qualitative data that can be synthesised into design-relevant themes. The interview guide structure: Context setting (5 min): role, team size, typical meeting load per week. Current state workflow (20 min): "Walk me through the last meeting you organised from start to finish" — a specific scenario elicitation (more accurate than asking about general behaviour). Pain points and workarounds (20 min): probe into moments of friction; ask about workarounds ("what do you do instead?") — workarounds are the most reliable indicators of unmet needs. Concept reactions (10 min): briefly describe the smart scheduling concept (without showing a prototype) and ask open-ended reactions — "what would have to be true about this for it to be useful?" and "what concerns would you have?". Close (5 min). Supplementary method: a 5-day diary study with 5 participants (recruited from the interview pool). Participants log each meeting they have in a structured template: purpose, preparation time, who called it, what went well/badly, and follow-up actions taken vs. not taken. The diary study captures real-time behaviour that interview recall cannot — participants often underestimate their meeting load and overestimate their preparation quality in retrospective interviews.

Analysis: Affinity Mapping and Theme Prioritisation

After all 12 interviews and 5 diary studies are complete: (1) Transcribe all interviews using Otter.ai (£100/month subscription). (2) Code each transcript: read through and tag every observation with a descriptive code ("participant uses a personal Notion template for agendas," "participant rarely reads meeting invites before joining," "participant expressed frustration at back-and-forth email scheduling"). (3) Affinity map in Miro: transfer all coded observations onto virtual sticky notes and cluster them into emerging themes. This is done collaboratively with the product manager and one engineer — involving them in the analysis creates shared ownership of the findings and prevents the "researcher presents to sceptic" dynamic. (4) Identify the top 5 themes by frequency (number of participants who mentioned the theme) and severity (how much friction or negative impact the theme represents). (5) Write insight statements: each theme becomes an insight statement in the format "We learned that [user group] [behaviour or attitude], because [motivation or root cause], which means [design implication]." This format forces the analyst to go beyond observation to interpretation.
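
To make the prioritisation step concrete, a minimal sketch of the frequency-by-severity ranking is shown below; the theme names, counts, and severity weights are illustrative placeholders rather than actual findings, and a real study would derive the severity weighting from the evidence in the transcripts.

```typescript
// Minimal sketch of the frequency-by-severity prioritisation (step 4 above).
// Theme names, counts, and severity weights are illustrative placeholders.
interface Theme {
  name: string;
  participantCount: number; // how many of the 12 participants raised it
  severity: 1 | 2 | 3;      // 1 = minor friction, 3 = significant productivity drain
}

const themes: Theme[] = [
  { name: "Back-and-forth email scheduling", participantCount: 10, severity: 2 },
  { name: "Action items lost after meetings", participantCount: 9, severity: 3 },
  { name: "Agendas rarely prepared or read", participantCount: 7, severity: 1 },
];

// Rank by a simple frequency × severity score; ties broken by participant count.
const ranked = [...themes].sort((a, b) => {
  const scoreA = a.participantCount * a.severity;
  const scoreB = b.participantCount * b.severity;
  return scoreB - scoreA || b.participantCount - a.participantCount;
});

ranked.forEach((t, i) =>
  console.log(`${i + 1}. ${t.name} (score ${t.participantCount * t.severity})`),
);
```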

Presenting Findings to a Committed Head of Product

The most important structural decision in the presentation is: lead with what we learned, not with whether the hypothesis was right or wrong. Opening: "We spoke to 12 enterprise knowledge workers over 2 weeks. Here is what we learned about how meetings actually work in their organisations." Present the 5 themes with supporting quotes and diary study data. Then: explicitly address the Head of Product's original hypothesis against the evidence — "The hypothesis was that teams waste 30% of their time in poorly-organised meetings and that AI scheduling can recover this time. Here is what the research tells us about each part of that hypothesis." For each component: (a) scheduling pain — the research confirms that finding mutually available time is a real pain point (10 of 12 participants mentioned it); however, 8 of those 10 said the pain is in the Outlook/Teams interface itself, not in the concept of scheduling — suggesting the opportunity may be in improving the existing scheduling UI rather than building a new AI layer. (b) Agenda preparation — only 4 of 12 participants consistently prepare agendas; 7 said they rarely or never prepare agendas not because it is too hard but because they do not believe agendas change meeting quality; an AI that prepares agendas that nobody reads does not solve this problem. (c) AI note-taking — 9 of 12 participants reacted positively to AI meeting notes, specifically for action item capture; this is the highest-validated component of the original concept. Close with a recommendation: "Based on the research, the smart scheduling concept in its current form risks building for a problem that is partially imagined and partially an interface problem in existing tools. The highest-validated opportunity is AI action item capture post-meeting. I recommend a focused design sprint on this problem, validated against the 12 research participants before any engineering investment."

Early Warning Metrics:

  • Participant recruitment rate — target all 12 participants confirmed and scheduled by end of Week 1; below 8 confirmed by end of Week 1 requires a budget reallocation to recruitment agency support
  • Interview thematic saturation — by Interview 9, new interviews should be confirming existing themes rather than generating new ones; if new themes are still emerging at Interview 9, consider extending to 15 interviews (using the contingency budget)
  • Head of Product engagement in the analysis session — if the Head of Product engages with the affinity mapping session (contributing sticky note placements, asking questions), they will feel ownership of the findings; if they do not attend, the presentation will feel like a verdict rather than a shared discovery

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: The insight statement format ("We learned that [user group] [behaviour], because [motivation], which means [design implication]") forces interpretation rather than just observation — this is what separates a research synthesis from a list of quotes. Identifying that 8 of 12 participants' scheduling pain is with the Outlook/Teams interface, not with the concept of scheduling (making the opportunity an interface improvement rather than a new AI layer) is the kind of nuanced finding that only emerges from disciplined analysis of qualitative data. The presentation structure (lead with what was learned, then address the hypothesis explicitly) is the psychologically correct approach to presenting disconfirming evidence to a committed stakeholder.

What differentiates it from mid-level thinking: A mid-level designer would plan a usability test of a prototype before any generative research has been done (confusing evaluative with generative research), or would conduct interviews without a structured research plan and present a collection of quotes rather than synthesised insights. They would not know about the insight statement format, would not involve the product manager in the affinity mapping session to build shared ownership, and would not think to ask the counter-hypothesis question ("what would you not want automated?") that produces the most useful research data.

What would make it a 10/10: A 10/10 response would include the full interview guide for the 60-minute session, a specific affinity map structure showing the 5 expected theme clusters for a meeting-tool research study, and a concrete research findings deck template showing how to present a disconfirming finding alongside its supporting evidence and forward recommendation without triggering stakeholder defensiveness.



Question 5: Interaction Design — Designing a Complex Data-Heavy Interface

Difficulty: Elite | Role: UX Designer | Level: Senior / Staff | Company Examples: Palantir, Tableau, Bloomberg Terminal, Datadog, Splunk


The Question

You are a Senior UX Designer at a SaaS analytics company. You have been asked to design the main dashboard for a new product: a real-time operational monitoring tool used by operations managers at logistics companies. The dashboard must display: live GPS tracking of 200–2,000 vehicles, real-time delivery status for 5,000–50,000 active deliveries, exception alerts (late deliveries, vehicle breakdowns, failed delivery attempts), performance KPIs (on-time delivery rate, average delivery time, exception rate by depot), and a drill-down capability to individual vehicle and delivery level. The users are operations managers who monitor the dashboard for 4–8 hours per day, often across multiple screens. The product team wants to build something that looks impressive in sales demos. You believe they are optimising for the wrong thing. Walk through your design philosophy for this interface, the research you would conduct, the key design decisions, and how you would handle the tension between the sales team's aesthetic expectations and the users' functional needs.


1. What Is This Question Testing?

  • Interaction design philosophy — understanding that data-dense operational interfaces have fundamentally different design principles from consumer product design; the goal is not delight or discoverability — it is situation awareness (enabling the user to understand the current state of a complex system at a glance), cognitive load reduction (minimising the mental effort required to extract actionable information), and error prevention (ensuring that critical anomalies are immediately visible and cannot be missed during routine monitoring)
  • Contextual research depth — knowing that designing for an operations manager monitoring 2,000 vehicles for 8 hours requires contextual research in the actual operations centre environment: the physical setup (multiple screens, standing desks, shared vs. individual monitors), the ambient noise and interruptions, the shift handover process, and the specific decision-making moments the manager faces during a typical shift
  • Information architecture for complex data — the challenge of simultaneously displaying GPS tracking, delivery status, alerts, KPIs, and drill-down is an information architecture problem before it is a visual design problem; the design must answer: what is the primary task (scanning for exceptions vs. monitoring aggregate performance vs. managing individual deliveries?), what information hierarchy serves that task, and how does the user navigate from overview to detail without losing context?
  • Design for sustained use — a dashboard used for 4–8 hours per day has different visual design requirements from a dashboard seen for 5 minutes in a weekly review; high-contrast, low-chroma colour palettes reduce eye strain over sustained use; motion and animation that is appropriate for a consumer app can be distracting and fatiguing in an operational monitoring context; the design must be calibrated for the sustained use case, not the demo use case
  • Stakeholder alignment — the tension between "looks impressive in demos" (sales team's goal) and "works well for 8 hours of daily monitoring" (users' goal) is real and represents a business model conflict; a demo-optimised interface may win the sale but lose the renewal; the designer must make this argument with evidence (churn data from demo-optimised competitors, NPS correlation with task efficiency in enterprise software) to align the product team and sales team behind a user-centred design direction
  • Technical constraints — displaying real-time GPS data for 2,000 vehicles and delivery status for 50,000 deliveries in a browser interface is a significant front-end engineering challenge; the designer must understand the rendering constraints (WebGL for map rendering, virtual scrolling for large lists, websocket data update frequency) to design a realistic interface rather than one that looks beautiful in Figma but is impossible to implement performantly

2. Framework: Complex Data Interface Design Model (CDIDM)

  1. Assumption Documentation — Confirm the users' primary task during a monitoring shift: is it scanning for exceptions (reactive) or actively managing deliveries (proactive)? The answer changes the information hierarchy fundamentally; an exception-first design surfaces anomalies prominently; a management-first design makes individual delivery actions the primary interaction pathway
  2. Constraint Analysis — Real-time data at scale (2,000 vehicles, 50,000 deliveries) imposes front-end performance constraints that directly affect design decisions (clustering vs. individual markers on the map, virtual scrolling vs. pagination for delivery lists, data update frequency); designing without understanding these constraints produces beautiful but unimplementable designs
  3. Tradeoff Evaluation — Map-first layout (geographic overview as the primary frame) vs. list-first layout (exception and KPI lists as the primary frame) vs. hybrid layout (map and list as equal-weight panels); for an operations manager whose primary task is exception management rather than geographic routing, a list-first layout with a contextual map panel may serve the primary task better than a full-screen map
  4. Hidden Cost Identification — Rendering 2,000 real-time vehicle positions requires a WebGL-based mapping library (Mapbox GL or Google Maps Platform) that costs significantly more than a standard embedded map; if the product team has budgeted for a standard map API, a real-time 2,000-vehicle display will exceed both the cost and the performance envelope; the designer must surface this constraint before the engineering sprint begins
  5. Risk Signals / Early Warning Metrics — Time to first alert acknowledgement in usability testing (how long does it take a test participant to notice and respond to a new exception alert from a cold start? — target under 10 seconds for a critical alert); false alarm fatigue in diary study (are participants mentally filtering out alerts because too many are non-actionable? — false alarm fatigue is the primary cause of critical alerts being missed in operational monitoring systems)
  6. Pivot Triggers — If usability testing shows that participants in a realistic monitoring session (90 minutes of live dashboard interaction with simulated exception events) miss more than 20% of critical alerts, the alert notification design is failing and must be fundamentally redesigned before launch; a monitoring tool where critical alerts are routinely missed is not just a UX failure — it is an operational safety risk
  7. Long-Term Evolution Plan — V1: exception-first dashboard with map context, covering the 5 core monitoring tasks; V2: personalised dashboard (user can configure panel layout and alert thresholds); V3: predictive intelligence (surface deliveries at risk of becoming exceptions before they do, based on route and time data)

3. The Answer

Explicit Assumptions:

  • Primary user: operations manager at a logistics company with 200–800 active vehicles; secondary user: depot manager responsible for a subset of vehicles and deliveries
  • Shift pattern: 6am–2pm, 2pm–10pm, 10pm–6am; the dashboard is used continuously across all shifts with a critical handover period where the outgoing and incoming manager review the current state together
  • Device context: dual 27" monitors; primary monitor for the dashboard; secondary monitor for email and internal communications
  • Engineering stack: React front-end, Mapbox GL for map rendering, WebSocket for real-time data, virtual scrolling for large delivery lists

Design Philosophy: Situation Awareness Over Visual Impressiveness

The design philosophy for this interface is not "make it look impressive" — it is Endsley's three-level situation awareness model: (1) Perception: the user can perceive all relevant system state elements (where are the vehicles, what is the exception status, what are the KPIs?). (2) Comprehension: the user understands what the perceived data means (this cluster of vehicles is 45 minutes behind schedule; this exception rate is 15% above yesterday's baseline). (3) Projection: the user can project what the current state means for the near future (these 3 vehicles will miss their delivery windows unless a route change is made now). Every design decision is evaluated against this framework: does it improve the operations manager's situation awareness or does it reduce it?

The demo-vs-daily-use tension must be addressed directly with the product team and sales team before a single pixel is designed. The argument: enterprise operational software has a well-documented churn pattern in demo-optimised products — the product wins the sale on visual impressiveness and loses the renewal because daily users find the interface cognitively fatiguing or slow to surface critical information. Cite specific examples: Bloomberg Terminal is famously unimpressive aesthetically yet retains its expert daily users because they have learned its dense information architecture. Datadog's early UI was visually unpolished, yet the product became the category leader because operators trusted it. The sales team's goal (win the demo) and the product team's goal (win the renewal) are not aligned, and the design team must facilitate that conversation explicitly.

Contextual Research: The Operations Centre Fieldwork

Before designing anything, spend 2 days in an operations centre. Observe: what information does the operations manager look at first when they sit down for a shift? What are the 3 decisions they make most frequently during a shift? How do they currently detect exceptions — are they scanning a list, waiting for alerts, or checking the map? What information do they need to resolve an exception (vehicle contact details, customer contact details, alternative delivery options, estimated delay)? What is the shift handover process — what information does the outgoing manager communicate to the incoming manager, and in what format? The fieldwork will reveal behaviours that no amount of stakeholder interviews can surface. A common finding in logistics operations research: the most critical exception type (vehicle breakdown) generates the same visual alert as a low-priority exception (customer not home) — the alert system has no severity differentiation, causing managers to develop personal mental filters that occasionally cause them to miss a critical event. This finding directly shapes the alert notification design.

The Core Design Decisions

Layout: a 3-panel layout on the primary monitor — (1) Left panel (30% width): exception alert feed and KPI summary. This is the primary attention zone. Critical and high-priority exceptions appear at the top with a red/amber severity indicator. KPIs are displayed as compact sparkline-enhanced numbers showing current value and 7-day trend. (2) Centre panel (50% width): map displaying vehicle positions clustered at low zoom levels (Mapbox GL clustering: vehicles within 500m of each other are represented as a numbered cluster circle; the cluster colour reflects the highest severity exception within the cluster). At high zoom levels, individual vehicle markers show vehicle ID, current status (in transit, at delivery, delayed, exception). (3) Right panel (20% width): delivery list filtered to the currently selected area or vehicle. Virtual scrolled list of deliveries with status, ETA, and exception indicator. Map and list are bidirectionally linked: clicking a delivery in the list highlights the vehicle on the map; clicking a vehicle on the map shows its deliveries in the right panel.
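
As an illustration of how the severity-aware clustering could be wired up, the sketch below uses the Mapbox GL JS clustering API. The token placeholder, data endpoint, zoom thresholds, and the assumption that each vehicle feature carries a numeric severity property (1 = on track, 2 = high, 3 = critical) are all illustrative; note also that Mapbox's clusterRadius is specified in screen pixels, so the roughly 500m grouping described above is an approximation at a given zoom level.

```typescript
import mapboxgl from "mapbox-gl";

// Minimal sketch of severity-aware vehicle clustering with Mapbox GL JS.
// Assumes each vehicle GeoJSON feature carries a numeric `severity` property
// (1 = on track, 2 = high-priority exception, 3 = critical exception).
mapboxgl.accessToken = "<your-mapbox-token>";

const map = new mapboxgl.Map({
  container: "map", // assumes a <div id="map"> in the dashboard's centre panel
  style: "mapbox://styles/mapbox/dark-v11",
  center: [-1.9, 52.5],
  zoom: 6,
});

map.on("load", () => {
  map.addSource("vehicles", {
    type: "geojson",
    data: "/api/vehicles.geojson", // hypothetical live-feed endpoint
    cluster: true,
    clusterMaxZoom: 13, // beyond this zoom, individual vehicle markers are shown
    clusterRadius: 50,  // note: radius is in screen pixels, not metres
    // carry the worst severity inside each cluster up to the cluster itself
    clusterProperties: { maxSeverity: ["max", ["get", "severity"]] },
  });

  map.addLayer({
    id: "vehicle-clusters",
    type: "circle",
    source: "vehicles",
    filter: ["has", "point_count"],
    paint: {
      // cluster colour reflects the highest-severity exception it contains
      "circle-color": [
        "step", ["get", "maxSeverity"],
        "#00CC66",    // all vehicles on track
        2, "#FFA500", // contains a high-priority exception
        3, "#FF4444", // contains a critical exception
      ],
      // cluster size scales with vehicle count
      "circle-radius": ["step", ["get", "point_count"], 14, 50, 20, 200, 28],
    },
  });
});
```

A production implementation would also need the unclustered vehicle marker layer and the WebSocket-driven source updates, which are omitted here for brevity.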

Colour and visual design for sustained use: the colour palette is designed for 8 hours of daily use, not for 5 minutes of demo impact. Background: dark neutral (#1A1D23 — reduces eye strain under fluorescent operations centre lighting). Data: high-contrast white and light grey for primary information. Severity colours: red (#FF4444) for critical exceptions, amber (#FFA500) for high-priority exceptions, green (#00CC66) for on-track status. These are the only saturated colours in the interface — saturation is reserved entirely for status communication, not decoration. The alert colours are also differentiated by shape and pattern (not just colour) to support colour vision deficiency: critical alerts have a solid red fill and an "⚠" icon; high-priority alerts have an amber outline and a "!" icon; this satisfies WCAG 1.4.1 (use of colour) for the most safety-critical part of the interface.
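
A minimal sketch of how these decisions could be captured as semantic tokens follows; the secondary text value and the on-track icon are assumptions not specified above, and the module shape is illustrative rather than a prescribed implementation.

```typescript
// Sketch of the semantic colour tokens described above, assuming a TypeScript
// theme module consumed by the dashboard components. Token names are illustrative.
export const colour = {
  // base surfaces and text for sustained dark-mode use
  background: "#1A1D23",
  textPrimary: "#FFFFFF",
  textSecondary: "#B8BCC4", // assumed light-grey value, not specified in the spec

  // saturation reserved exclusively for status communication
  severity: {
    critical: { fill: "#FF4444", icon: "⚠", style: "solid" },   // solid fill + warning icon
    high:     { fill: "#FFA500", icon: "!", style: "outline" },  // outline + "!" icon
    onTrack:  { fill: "#00CC66", icon: "✓", style: "solid" },    // icon assumed
  },
} as const;

// Example usage: an alert card picks both colour AND shape/icon, so severity
// remains distinguishable for users with colour vision deficiency.
type Severity = keyof typeof colour.severity;
export function alertCardStyle(level: Severity) {
  const s = colour.severity[level];
  return { accentColour: s.fill, variant: s.style, icon: s.icon };
}
```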

Alert architecture: the exception alert feed is designed with 4 severity levels: Critical (vehicle breakdown, accident report — full-width red alert card that takes focus on any screen the manager is on, with a sound notification that can be configured but is on by default), High (late delivery that will miss SLA window by >30 minutes — standard alert card with amber indicator), Medium (late delivery that will miss by <30 minutes — compact list item with amber icon), Low (customer not home, delivery reattempt scheduled — suppressed by default, accessible in a filtered view). The sound notification for critical alerts is specifically designed not to be alarming — a calm but distinct chime that is audible in a noisy operations centre without causing stress; this is based on research on sound design in healthcare monitoring environments where alarm fatigue is a documented patient safety risk.
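
The severity taxonomy can also be expressed as a typed model that design and engineering share; the sketch below mirrors the four levels described above, with field names and example triggers that are illustrative rather than a finalised specification.

```typescript
// Sketch of the 4-level alert severity taxonomy as a typed policy model.
// Presentation and notification behaviour follow the description above; the
// exact field names and trigger wording are illustrative.
type AlertSeverity = "critical" | "high" | "medium" | "low";

interface AlertPolicy {
  severity: AlertSeverity;
  exampleTriggers: string[];
  presentation: "full-width-card" | "standard-card" | "compact-list-item" | "suppressed";
  sound: boolean;            // critical only: calm but distinct chime, on by default
  visibleByDefault: boolean; // low-priority alerts live behind a filtered view
}

const alertPolicies: AlertPolicy[] = [
  { severity: "critical", exampleTriggers: ["vehicle breakdown", "accident report"],
    presentation: "full-width-card", sound: true, visibleByDefault: true },
  { severity: "high", exampleTriggers: ["late delivery missing SLA window by >30 minutes"],
    presentation: "standard-card", sound: false, visibleByDefault: true },
  { severity: "medium", exampleTriggers: ["late delivery missing SLA window by <30 minutes"],
    presentation: "compact-list-item", sound: false, visibleByDefault: true },
  { severity: "low", exampleTriggers: ["customer not home, reattempt scheduled"],
    presentation: "suppressed", sound: false, visibleByDefault: false },
];
```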

Drill-down without context loss: when an operations manager clicks on an exception alert to investigate, the map zooms to the relevant vehicle and the right panel shows the full delivery detail — but the left panel remains visible, continuing to surface new exceptions. The manager does not lose their overview while managing an individual issue. Navigating back to the overview is a single click on a breadcrumb ("← All exceptions"). This is the specific interaction pattern that prevents the cognitive overhead of losing the overview when investigating a detail — a common failure in complex dashboard drill-down patterns.

The Sales Demo Version

Rather than fighting the sales team's desire for an impressive demo, design a demo mode: a single toggle in the top navigation bar that switches the interface to "Demo view" — the same layout and the same data, but with a pre-loaded sample dataset of 1,200 vehicles, animated real-time position updates, and a staged exception scenario that demonstrates the alert system responding to a vehicle breakdown in real time. The demo mode is visually identical to the production interface — the same dark palette, the same information architecture — but the staged scenario is choreographed to show the most impressive capabilities in 5 minutes. The sales team gets their impressive demo; the daily users get an interface designed for sustained operational use; and the design team does not have to maintain two separate design directions.
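
One way to implement the demo mode without forking the interface is to swap only the data source behind the dashboard; the sketch below assumes a WebSocket live feed and a pre-recorded scenario replayed on timers, with the endpoint URL, event shape, and helper names being illustrative assumptions.

```typescript
// Sketch of the demo-mode data source switch: the dashboard renders whichever
// feed the "Demo view" toggle selects, so layout, palette, and alert logic are
// identical in both modes.
type VehicleEvent = { vehicleId: string; lat: number; lng: number; severity: 1 | 2 | 3; ts: number };

interface VehicleFeed {
  subscribe(onEvent: (e: VehicleEvent) => void): () => void; // returns an unsubscribe function
}

// Production: live WebSocket feed from the tracking backend.
const liveFeed: VehicleFeed = {
  subscribe(onEvent) {
    const ws = new WebSocket("wss://example.invalid/vehicles"); // hypothetical endpoint
    ws.onmessage = (msg) => onEvent(JSON.parse(msg.data));
    return () => ws.close();
  },
};

// Demo: a pre-recorded, choreographed scenario replayed on timers so the staged
// vehicle breakdown lands at the right moment in a 5-minute pitch.
function makeDemoFeed(scenario: VehicleEvent[]): VehicleFeed {
  return {
    subscribe(onEvent) {
      const t0 = scenario.length > 0 ? scenario[0].ts : 0;
      const timers = scenario.map((e) => setTimeout(() => onEvent(e), Math.max(0, e.ts - t0)));
      return () => timers.forEach((t) => clearTimeout(t));
    },
  };
}

export function selectFeed(demoMode: boolean, scenario: VehicleEvent[]): VehicleFeed {
  return demoMode ? makeDemoFeed(scenario) : liveFeed;
}
```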

Early Warning Metrics:

  • Time to first alert acknowledgement in usability testing — measure with a stopwatch from the moment a simulated critical alert appears to the moment the test participant notices and clicks it; target under 8 seconds; above 15 seconds is a failure state that requires redesign of the alert notification system
  • False alarm acknowledgement rate — in a simulated 90-minute monitoring session, what percentage of alerts at each severity level does the participant acknowledge vs. ignore; ignored critical alerts are the signature of alert fatigue; target: >90% of critical alerts acknowledged, <30% of low-priority alerts acknowledged (some low-priority alert suppression is desirable — it indicates the priority hierarchy is working)
  • Map-to-list navigation ratio — in session recordings post-launch, how often do users navigate from the map to the delivery list vs. using the exception feed directly; a high map-navigation ratio means users are using the map as their primary exception discovery tool (suggesting the exception feed is not surfacing the right information); a low map-navigation ratio means the exception feed is working as the primary attention manager

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: Framing the entire design around Endsley's three-level situation awareness model (perception → comprehension → projection) rather than generic UX principles demonstrates domain-specific knowledge of operational monitoring interface design. The colour palette rationale (high-contrast dark background, saturation reserved exclusively for severity communication, shape + colour differentiation for colour vision deficiency) shows accessibility and visual design thinking simultaneously. Designing a "demo mode" rather than compromising the production interface — and framing it to the sales team as "the same interface with a staged scenario" — is the organisational problem-solving that a staff-level designer can execute where a senior designer might just present the trade-off without resolving it.

What differentiates it from mid-level thinking: A mid-level designer would build a visually stunning full-screen map dashboard with animated vehicle markers, a translucent dark overlay for KPIs, and gradient-filled data visualisations — optimised for the sales demo and catastrophically fatiguing for 8 hours of daily use. They would not know about Endsley's situation awareness model, would not know about alert fatigue as a documented safety risk, would not address the colour vision deficiency requirement, and would not think to design a dedicated demo mode as the resolution to the demo vs. daily use tension.

What would make it a 10/10: A 10/10 response would include a specific information architecture diagram showing the 3-panel layout with the data flow between panels, a complete alert severity taxonomy (the 4 levels with specific trigger conditions for the logistics domain), and a worked colour token system for the dark-mode operational interface showing the full semantic colour mapping from primitive palette to component usage.


Question 6: Mobile UX — Designing a Native App for a Context-Sensitive Use Case

Difficulty: Senior | Role: UX Designer | Level: Senior | Company Examples: Uber, DoorDash, Google Maps, Waze, Deliveroo


The Question

You are a Senior UX Designer at a startup building a mobile app for emergency field nurses — nurses who travel between patient homes to conduct welfare checks and medication administration. The app must let nurses: view their assigned patient list and routing for the day, access patient health records and medication history, log observations and vitals during a visit, submit medication administration confirmations, and escalate concerns to a supervising clinician. The nurses use the app while standing in patient homes, often with one hand (the other holds a bag or equipment), in variable lighting, and sometimes under time pressure or emotional stress. The current prototype has been described by nurses in early testing as "built by someone who has never done a home visit." Walk through how you would research this use case, the specific mobile design principles that apply, and the 3 most important design changes you would make to a poorly-designed prototype.


1. What Is This Question Testing?

  • Context-sensitive design — understanding that mobile design for a professional in a high-stress, physically demanding context is categorically different from consumer app design; the conventional mobile UX guidelines (thumb zones, standard navigation patterns, visual hierarchy) are starting points, not answers; the specific physical, cognitive, and environmental constraints of the use case must shape every design decision
  • Research with expert users — knowing how to conduct research with domain experts (nurses are not naive users — they have deep mental models of their workflows and will rapidly identify design decisions that do not match how their work actually happens); the research method must respect their expertise and capture their existing workflow patterns before imposing any design solutions
  • One-handed interaction design — the one-handed use constraint is a complete interaction design challenge: thumb reach zones on a 6-inch display mean that the top third of the screen is inaccessible with the dominant thumb; navigation controls must be bottom-weighted; touch targets must be at minimum 44×44 points (Apple HIG) — ideally 56×56 points or larger for users wearing examination gloves; swipe gestures that require two hands (pinch-to-zoom on a map) must have single-handed alternatives
  • Cognitive load under stress — an emergency escalation during a patient visit is the highest-stakes task in this app; a nurse who is simultaneously assessing a deteriorating patient and trying to reach a supervising clinician cannot afford a multi-step navigation flow to the escalation feature; the design must ensure that the highest-stakes features are the most immediately accessible, regardless of where in the app the nurse currently is
  • Clinical domain knowledge — the app touches regulated clinical documentation (medication administration records are legal documents in the UK; NMC standards require specific data points to be captured); designing this without understanding the clinical and regulatory requirements will produce a prototype that "looks great" but fails the clinical governance review
  • Privacy and security in the field — the app displays sensitive patient health data on a mobile device in a patient's home, on public transport between visits, and in hospital car parks; screen privacy protections (automatic screen lock after a defined timeout, a privacy screen mode that reduces screen visibility to shoulder-surfers) are not accessibility features — they are clinical information governance requirements

2. Framework: Context-Sensitive Mobile Design Model (CSMDM)

  1. Assumption Documentation — Confirm the device platform (NHS organisations predominantly use Samsung Galaxy Android devices with MDM-managed restrictions; iOS is less common in NHS field roles), screen size (6-inch display is the median), glove usage (examination gloves are used during clinical contact but not during the travel or documentation phases), and clinical system integration (does the app pull from and push to an NHS clinical system such as EMIS, SystmOne, or Rio — this determines data availability and documentation requirements)
  2. Constraint Analysis — One-handed operation during clinical contact phases, variable lighting (patient homes range from bright windows to dark bedrooms), time pressure (a nurse with 8 visits in a 9-hour shift has approximately 68 minutes per visit including travel), regulatory requirements for medication administration logging (NMC Standards for Medicines Management, CQC inspection criteria)
  3. Tradeoff Evaluation — Information density vs. cognitive load: showing all patient information on one screen (maximum information, minimum navigation) vs. showing a progressive disclosure view (less visible at once, but each view is faster to scan); for a nurse under time pressure, progressive disclosure that requires multiple taps to reach critical information is worse than a dense but well-organised single screen
  4. Hidden Cost Identification — Clinical governance review timeline: any app that handles patient health data and medication records in an NHS context requires Information Governance (IG) approval and potentially a Data Security and Protection (DSP) assessment; the design must comply with NHS Digital's DCB0129 clinical safety standard; designing without accounting for these requirements will require significant rework at the clinical governance gate
  5. Risk Signals / Early Warning Metrics — Task completion rate in one-handed simulated use (can a test participant complete the medication logging task with one hand while holding a clipboard in the other — simulating the nurse's physical context?), escalation task time (how long does it take to initiate a clinical escalation from any screen in the app? target: under 15 seconds), error rate in medication logging (any UI design that generates a medication administration error is a clinical safety risk)
  6. Pivot Triggers — If field testing with 3 nurses reveals that they are consistently reverting to paper records to supplement the app (because specific data points required by their clinical process are not captured in the app), the information architecture is misaligned with the clinical workflow; stop redesigning the UI and restart with a clinical workflow mapping exercise
  7. Long-Term Evolution Plan — V1: core workflow (patient list, record access, visit logging, medication confirmation, escalation); V2: offline mode with sync (many patient home areas have poor connectivity — the app must function without internet and sync when connectivity is restored); V3: predictive routing and schedule optimisation

3. The Answer

Explicit Assumptions:

  • Device: Samsung Galaxy A54 Android (standard NHS community nursing MDM fleet); screen 6.4 inches; operating system Android 13
  • Clinical system integration: the app reads from and writes to SystmOne (the most common community nursing clinical system in England)
  • Nurse profile: community nurses, 3–20 years clinical experience, moderate smartphone proficiency, predominantly female, age range 28–58
  • The existing prototype: a web-responsive design that was adapted for mobile without mobile-native interaction patterns; navigation is a hamburger menu at the top-right, primary actions are small buttons at the top of the screen, the medication confirmation flow is 7 taps deep from the home screen

Research First: Contextual Inquiry in the Field

The "built by someone who has never done a home visit" feedback is a research failure, not a design failure — the team designed without adequate contextual understanding. The research required before any redesign: spend 2 full shifts (one morning shift, one afternoon shift) accompanying 2 community nurses on their rounds. Not observing from a distance — being present in the patient homes, watching the nurse navigate the app in real time, noting every moment of friction. The specific observations to capture: which hand is dominant and which holds the device during each phase of the visit (arrival, clinical assessment, documentation, departure); how the nurse holds the phone (landscape or portrait, full palm grip or fingertip pinch); at what point in the visit the phone is put away and documentation is done from memory after the visit (a critical finding — deferred documentation means data accuracy degrades); the specific interruptions that occur during documentation (patient questions, carer interventions, phone calls from the base team); and how the nurse currently manages the escalation decision — what information they need, who they call, and how they communicate the patient's status.

This field research produces a contextual workflow map: a visual representation of the 8 phases of a community nursing visit (pre-visit review, travel, arrival, initial assessment, clinical intervention, documentation, escalation if needed, departure) with the app interaction points, the physical context, and the cognitive load at each phase annotated.

The 3 Most Important Design Changes

Change 1 — Invert the navigation to the bottom and redesign the touch target architecture. The current prototype has a hamburger menu at the top-right — the least accessible position on a 6-inch phone for a right-handed one-handed user. The thumb's natural reach zone on a 6.4-inch phone covers approximately the bottom 60% of the screen. Redesign: replace the hamburger menu with a persistent bottom navigation bar (5 tabs: Today's List, Patient Record, Log Visit, Escalate, More). The two most frequently used tabs (Log Visit during a visit, Today's List between visits) are in positions 2 and 3 — centre-left and centre, the strongest thumb zone positions on a bottom navigation bar. The Escalate tab is permanently visible in position 4 — one tap away from any screen in the app. All interactive elements are resized to a minimum 56×56dp touch target (comfortably above the 44pt Apple HIG and 48dp Material Design minimums, and recommended for users who may be wearing examination gloves or operating under stress). The top third of every screen is reserved for read-only display information — no tappable elements above the mid-screen fold.
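
A small configuration sketch of this navigation architecture is shown below; the 56dp constant and the position 2/3 placement follow the redesign, while the ordering of the remaining tabs (Patient Record, More) and the config shape are assumptions.

```typescript
// Framework-agnostic sketch of the bottom navigation configuration described above.
export const MIN_TOUCH_TARGET_DP = 56;

interface NavTab {
  id: "today" | "record" | "logVisit" | "escalate" | "more";
  label: string;
  position: 1 | 2 | 3 | 4 | 5; // positions 2 and 3 are the strongest one-handed thumb zones
}

export const bottomNav: NavTab[] = [
  { id: "record",   label: "Patient Record", position: 1 },
  { id: "today",    label: "Today's List",   position: 2 },
  { id: "logVisit", label: "Log Visit",      position: 3 },
  { id: "escalate", label: "Escalate",       position: 4 }, // one tap away from any screen
  { id: "more",     label: "More",           position: 5 },
];

// Simple guard a design-lint script could run against component specs.
export function meetsTouchTarget(widthDp: number, heightDp: number): boolean {
  return widthDp >= MIN_TOUCH_TARGET_DP && heightDp >= MIN_TOUCH_TARGET_DP;
}
```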

Change 2 — Redesign the medication confirmation flow from 7 taps to 3. The current 7-tap flow: Home → Patient List → Patient Detail → Medications → Today's Medications → Medication Item → Confirm Administration. This is the most safety-critical task in the app and the one with the highest cognitive load — a nurse confirming medication administration must verify patient identity, medication name, dose, route, and time. Redesign using a contextual shortcut model: when the nurse taps "Start Visit" for a patient on the Today's List screen, the app enters Visit Mode. In Visit Mode, the persistent bottom bar changes its primary centre action to a "Log Medication" floating action button that is always one tap away. Tapping it opens a bottom sheet (not a full-screen navigation) showing the patient's today's medications. The nurse confirms each medication with a single swipe (swipe right to confirm, swipe left to defer with a required reason). The bottom sheet design means the nurse never loses their place in the patient record — the confirmation happens over the existing screen. Total taps for a standard medication confirmation in the redesigned flow: 2 (open bottom sheet + confirm swipe). The 3-tap target accounts for the occasional required reason input for a deferred medication.
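
The confirmation logic itself can be modelled independently of the gesture layer (the swipe interaction would come from the platform's gesture APIs); the sketch below captures the confirm and defer-with-required-reason rule described above, with type and field names that are illustrative rather than drawn from any clinical system specification.

```typescript
// Sketch of the Visit Mode medication confirmation state model.
type MedStatus =
  | { kind: "due" }
  | { kind: "confirmed"; at: Date }
  | { kind: "deferred"; at: Date; reason: string }; // a reason is mandatory for deferrals

interface MedicationEntry {
  id: string;
  name: string;
  dose: string;
  route: string;
  status: MedStatus;
}

// Swipe right -> confirm (one interaction); swipe left -> defer, which requires a reason.
export function confirmMedication(entry: MedicationEntry): MedicationEntry {
  return { ...entry, status: { kind: "confirmed", at: new Date() } };
}

export function deferMedication(entry: MedicationEntry, reason: string): MedicationEntry {
  if (!reason.trim()) {
    throw new Error("A deferral reason is required before the record can be saved.");
  }
  return { ...entry, status: { kind: "deferred", at: new Date(), reason } };
}
```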

Change 3 — Make escalation a 1-tap action from any screen with a pre-populated clinical summary. In the current prototype, clinical escalation requires navigating to a contact screen, selecting a supervisor, and typing a message. During a patient deterioration event, a nurse cannot type a message while simultaneously managing the clinical situation. Redesign: a persistent escalation button (red, bottom-right, always visible above the bottom navigation bar) that is accessible from every screen. Tapping it opens a full-screen escalation modal with: the patient's name and DOB pre-populated, a pre-built clinical situation summary (automatically populated from the last recorded vitals and the current visit log), a priority selector (Urgent / Emergency with colour and icon differentiation), and a single "Call Supervisor" button that dials the on-call supervising clinician directly from the app. The nurse can speak to the supervisor while the screen shows the pre-populated clinical summary — reducing the cognitive load of simultaneously managing the call and recalling patient details. The escalation event is automatically logged in SystmOne with a timestamp when the call button is tapped.
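
A sketch of the escalation behaviour follows: it assembles the pre-populated clinical summary from the last recorded vitals and the visit log, writes the timestamped audit record, and then places the call. The data shapes and the dial and logEscalation helpers are hypothetical stand-ins for the device dialler and the SystmOne write-back.

```typescript
// Sketch of the one-tap escalation flow; all types and helpers are illustrative.
interface Vitals { recordedAt: Date; respiratoryRate: number; pulse: number; systolicBp: number; temperature: number }
interface Patient { name: string; dateOfBirth: string; lastVitals?: Vitals }

export function buildEscalationSummary(patient: Patient, visitNotes: string[]): string {
  const vitals = patient.lastVitals
    ? `Last vitals (${patient.lastVitals.recordedAt.toLocaleTimeString()}): ` +
      `RR ${patient.lastVitals.respiratoryRate}, HR ${patient.lastVitals.pulse}, ` +
      `BP ${patient.lastVitals.systolicBp} systolic, Temp ${patient.lastVitals.temperature}°C`
    : "No vitals recorded this visit";
  return [`${patient.name} (DOB ${patient.dateOfBirth})`, vitals, ...visitNotes].join("\n");
}

export async function escalate(
  patient: Patient,
  visitNotes: string[],
  priority: "urgent" | "emergency",
  dial: (phoneNumber: string) => Promise<void>,                                   // hypothetical tel: opener
  logEscalation: (summary: string, priority: string, at: Date) => Promise<void>,  // hypothetical clinical-system write-back
  onCallSupervisorNumber: string,
) {
  const summary = buildEscalationSummary(patient, visitNotes);
  await logEscalation(summary, priority, new Date()); // timestamped audit record
  await dial(onCallSupervisorNumber);                 // single "Call Supervisor" action
  return summary;                                     // shown on screen during the call
}
```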

Designing for Variable Lighting

Patient homes range from brightly lit sunrooms to dark bedrooms. The app must be readable in both extremes without manual brightness adjustment during a visit. Solution: implement automatic ambient light adaptation using the device's ambient light sensor (exposed through the standard Android sensor APIs and present on virtually all modern handsets) to adjust the interface's display mode. In low-light conditions (lux reading below 50): switch to a dark mode variant with reduced white space (white in dark rooms is a glare source that impairs clinical assessment by temporarily affecting night vision). In high-light conditions (direct sunlight): increase text size by one step and raise the minimum contrast ratio for the primary data display (patient name, medication name, dose) from the WCAG AA level to WCAG AAA. This is not a user-preference toggle — it is an automatic adaptation to the nurse's current environment.
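
A minimal sketch of the adaptation logic is shown below; readAmbientLightLux is a hypothetical wrapper around the platform's ambient light sensor, the 50-lux threshold comes from the text above, and the bright-light threshold and polling interval are assumptions.

```typescript
// Sketch of the automatic display-mode adaptation described above.
type DisplayMode = "standard" | "lowLight" | "brightLight";

const LOW_LIGHT_LUX = 50;        // below this, switch to the dark variant
const BRIGHT_LIGHT_LUX = 10_000; // assumed direct-sunlight threshold

export function displayModeForLux(lux: number): DisplayMode {
  if (lux < LOW_LIGHT_LUX) return "lowLight";       // dark variant, reduced white space
  if (lux > BRIGHT_LIGHT_LUX) return "brightLight"; // larger text, AAA contrast targets
  return "standard";
}

// Poll the sensor and only re-apply when the mode actually changes, so the UI
// does not flicker as the nurse moves around a room.
export function watchDisplayMode(
  readAmbientLightLux: () => Promise<number>, // hypothetical sensor accessor
  apply: (mode: DisplayMode) => void,
  intervalMs = 5_000,
) {
  let current: DisplayMode | null = null;
  const timer = setInterval(async () => {
    const mode = displayModeForLux(await readAmbientLightLux());
    if (mode !== current) {
      current = mode;
      apply(mode);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```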

Early Warning Metrics:

  • One-handed task completion rate in simulated field conditions — test participants must complete the 3 core tasks (find today's patient list, confirm a medication, initiate an escalation) with one hand while holding a clipboard; target 90% task completion for all 3 tasks; any task below 70% is a design failure requiring redesign before clinical testing
  • Escalation time from any screen — time from the decision to escalate to the supervisor being called; target under 12 seconds from any screen in the app; above 20 seconds is a clinical safety risk
  • Deferred documentation rate in field testing — what percentage of visit observations are being logged during the visit vs. after the visit (from memory); a high deferred documentation rate indicates the in-visit logging UI is too friction-heavy; target below 15% deferred documentation for standard vitals recording

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: The contextual workflow map (8 phases of a community nursing visit with app interaction points, physical context, and cognitive load annotated) is the research artefact that makes every subsequent design decision evidence-based rather than assumption-based. Redesigning the medication confirmation flow from 7 taps to 3 using a Visit Mode contextual shortcut model — and specifically choosing a bottom sheet (non-disruptive overlay) rather than full-screen navigation to preserve screen context — demonstrates interaction design craft that understands the clinical stakes of this task. The automatic ambient light adaptation (using the device's light sensor, not a user preference toggle) shows that this designer understands the difference between a preference feature and an environmental adaptation that the user cannot interrupt a clinical task to make manually.

What differentiates it from mid-level thinking: A mid-level designer would redesign the visual aesthetics of the prototype screens (cleaner typography, better spacing, a healthcare-appropriate colour palette) without addressing the fundamental one-handed interaction architecture failure. They would not conduct field research in patient homes, would not know about the 56dp touch target recommendation for gloved use, and would not design the Visit Mode contextual shortcut that reduces the medication confirmation flow from 7 taps to 3. They would also not address the deferred documentation problem — which is the root cause of the data quality issue, not the documentation UI itself.

What would make it a 10/10: A 10/10 response would include a specific thumb reach zone diagram for the Samsung Galaxy A54 annotated with the touch target placement rationale, the complete Visit Mode bottom sheet interaction specification with the swipe gesture model and animation timing, and a clinical safety assessment template for the escalation feature showing the hazard identification and risk control measures required for DCB0129 compliance.



Question 7: UX Strategy — Defining the Design Vision for a 0-to-1 Product

Difficulty: Elite | Role: UX Designer / Head of Design | Level: Staff / Principal | Company Examples: Figma (early stage), Linear, Notion, Superhuman, Loom


The Question

You are the first UX designer hired at a B2B SaaS startup that has just closed a Series A (£8M). The product is a construction project management tool targeting small-to-medium construction companies (10–200 employees). The founding team is 3 engineers and a CEO who is a former construction site manager. They have built an MVP that 40 paying customers are using — but the product was built by engineers solving their own interpretation of the problem, with no user research and no design system. The CEO has strong opinions about what the product should do and how it should look, based on his 15 years on construction sites. Your brief: define the product's UX vision, build the design foundation, and deliver the first fully designed product version within 4 months. You are the only designer. How do you approach this role in the first 90 days?


1. What Is This Question Testing?

  • Design leadership in ambiguity — the first designer at a startup does not receive a brief and a design system; they receive a problem space, a paying customer base, and the expectation to create structure from chaos; this question tests whether the candidate can operate in unstructured environments with competing priorities and produce direction, not just output
  • Stakeholder alignment at the founder level — working with a CEO who has strong product opinions (derived from 15 years of domain experience) requires a specific approach: the CEO's domain knowledge is a research asset, not an obstacle; the designer must leverage it while creating space for user research to surface perspectives the CEO's personal experience may not represent (small business owners who are not former site managers, office-based estimators who also use the product, subcontractor project managers)
  • UX strategy thinking — a product vision is not a style guide or a component library; it is a point of view on what the product should feel like for its users, what emotional response it should create, and what design principles will guide every decision for the next 3–5 years; a first designer who delivers only components without a vision has built a foundation with no direction
  • Prioritisation under resource constraint — as the only designer with a 4-month delivery deadline, the candidate must make explicit trade-offs: what design foundation work (tokens, components, patterns) is necessary before the product version can be designed, and what can wait until after; building a complete design system before designing any product screens is a common first-designer mistake that delays value delivery by 3–4 months
  • Domain literacy — construction project management has specific workflow patterns (RFIs, submittals, punch lists, site instructions, defect logging) and user personas (site manager, quantity surveyor, project coordinator, subcontractor) that the designer must rapidly acquire enough knowledge of to design credible workflows; designing a construction PM tool without knowing what a punchlist is produces screens that will be immediately rejected by the 40 paying customers
  • Measurement orientation — a Series A startup with 40 paying customers needs to grow to 400 customers within 18 months to justify a Series B; the design must be evaluated against business metrics (customer retention, feature adoption, time-to-value for new customers) from the start, not just usability metrics; the first designer sets the measurement culture for the design function

2. Framework: First Designer 90-Day Model (FD90DM)

  1. Assumption Documentation — Understand the 40 paying customers: who are they (site managers, project managers, company owners?), how did they find the product, what do they use it for, and what do they not use that was built for them; this is the first research the designer conducts — not general market research but deep understanding of the specific people who have already chosen to pay for this product
  2. Constraint Analysis — Solo designer with 4-month delivery deadline, CEO with strong design opinions who must be aligned not managed, no design system or component library, product is live and customers are using it (any major structural changes affect live users), Series A pressure means the product must demonstrably improve over the 4-month window
  3. Tradeoff Evaluation — Design foundation first (tokens, components, then product) vs. product design first (design the new version, extract the system as you go); for a solo designer with a 4-month deadline, the correct approach is "just enough system" — define the tokens and the 10 most critical components in Week 2–3, then design the product; extract additional components from the product designs as they emerge; a perfect system delivered in Month 3 is less valuable than a good-enough system delivered in Week 3 that enables Month 1–4 product design work
  4. Hidden Cost Identification — The CEO's strong opinions represent both an asset (domain knowledge, stakeholder alignment) and a risk (decisions made on personal experience rather than user data); the designer must build trust with the CEO through demonstrated domain competence before introducing research findings that may challenge their assumptions; a designer who arrives on Day 1 and immediately questions the CEO's product decisions will be dismissed; one who spends the first 3 weeks demonstrating understanding of the domain will be listened to in Week 4
  5. Risk Signals / Early Warning Metrics — Customer retention rate (are the 40 customers still using the product at Month 4? — churn during the design overhaul is a signal that the redesign is disrupting rather than improving the existing workflow), feature adoption rate for newly designed features (are customers using the new designs in their daily work?), CEO design review approval rate (are design decisions being approved in review or regularly revised? — a high revision rate signals a misalignment in design direction that must be resolved)
  6. Pivot Triggers — If by Month 2, customer interviews consistently reveal that the product is being used for a different primary use case than the one the CEO believes it serves (a common finding in early-stage products), the product vision must be revised before the design direction is finalised; designing beautifully for the wrong use case is more damaging than a rough design for the right one
  7. Long-Term Evolution Plan — Month 1–3: research, vision, foundation, first designed module; Month 4–6: full product version V1; Month 7–12: user research programme, design system maturation, second designer hire; Month 13–18: design ops foundation, first design-led product initiative

3. The Answer

Explicit Assumptions:

  • The MVP: a web app with basic project creation, task management, document storage, and a daily site diary; no mobile app; built on a React front-end with no design tokens or component library
  • The 40 paying customers: primarily small construction companies (15–80 employees) in the UK; primary users are project managers and site managers; the CEO's network was the primary acquisition channel
  • The CEO: technically non-designer but has a strong aesthetic preference for "clean and professional" — he has referenced Linear and Notion as visual inspirations; he has approved all current product decisions personally

Week 1–2: Listen Before Designing

The first two weeks are entirely research, not design. Three activities in parallel: (1) Interview 8 of the 40 paying customers (1 hour each, remote): the questions focus on understanding their current workflow — "walk me through the last project you managed from first site visit to handover"; what parts of the current product they use every day vs. rarely; what they do in spreadsheets, WhatsApp, or paper that the product does not currently support; and what would make them recommend the tool to a peer. Do not show designs, do not propose solutions — listen. (2) Shadow 2 customers on site for half a day each: construction site work is physically demanding and frequently interrupted; understanding the environment where the product is used (safety PPE, dust, bright sunlight, site noise, gloved hands) directly shapes the design decisions. Ask the site manager to open the current product on their phone mid-task and narrate what they are doing. (3) Debrief with the CEO for 2 hours: not to challenge their assumptions but to understand their domain. Ask them to walk through the full project lifecycle for a typical construction project (RIBA Stage 2 through to Stage 6), explain what the biggest pain points are at each stage, and identify which types of companies they believe the product serves best. This debrief establishes domain credibility with the CEO and surfaces their mental model of the user — which will later be tested against the customer interview findings.

Week 3: The UX Vision

A UX vision is a one-page document that answers: who is this product for, what does it help them do, and what does it feel like to use it? Draft it from the research findings and review it with the CEO before any design work begins. For this construction PM tool, the vision might read: "Tarka is the project management tool for site-first construction teams. Unlike enterprise construction software built for the office, Tarka is designed for the person walking the site — fast to use with one hand, clear enough to read in direct sunlight, and honest about what needs attention today. Every screen asks: 'does this help the site manager manage tomorrow's site?' If the answer is no, it does not belong in the product." The vision also defines 4 design principles that will guide every decision: (1) Site-first: every feature is designed for on-site use before office use. (2) Action-forward: the most important action at each moment is always the most visible and accessible element on the screen. (3) Honest complexity: construction projects are complex; the product does not hide this complexity but presents it in the order it needs to be addressed. (4) Earned trust: safety, documentation accuracy, and audit trail are not features — they are the foundation on which every other feature is built. Review this vision with the CEO in a 60-minute session. The goal is not to get sign-off on every word but to confirm that the direction resonates with their domain experience and creates shared design language that future decisions can reference.

Week 3–4: The "Just Enough" Design Foundation

Before designing any product screens, establish the minimum design system foundation that makes the product design work consistent: design tokens (colour palette — 5 primary, 5 neutral, 3 semantic: error/warning/success; typography scale — 3 sizes: body/heading/display; spacing scale — 8px base grid: 8/16/24/32/48px); and 10 critical components (button, input field, form layout, card, badge, navigation bar, empty state, loading skeleton, alert/toast, modal). Build these in Figma using the Tokens Studio plugin so they can be exported to CSS variables when engineering is ready. These 10 components cover approximately 70% of the UI surface area of the MVP. Do not build more components than this before starting product design — every additional component that is not immediately needed is a week of foundation work that delays the product design that customers and the CEO are waiting to see.
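
As an illustration, the "just enough" foundation could be captured in a single token module like the sketch below; the hex values are placeholders (the real palette would come from the brand work), while the scale structure follows the tokens described above, with a small helper showing the CSS-variable export path.

```typescript
// Sketch of the "just enough" token foundation as a TypeScript module mirroring
// what would be authored in Figma via Tokens Studio. Hex values are illustrative.
export const tokens = {
  colour: {
    primary: ["#0B2545", "#13315C", "#1D4E89", "#3E7CB1", "#8DA9C4"],
    neutral: ["#101418", "#2B3138", "#5C6670", "#C3C9D0", "#F4F6F8"],
    semantic: { error: "#D7263D", warning: "#F4A259", success: "#2A9D8F" },
  },
  type: {
    body: { size: 16, lineHeight: 24 },
    heading: { size: 24, lineHeight: 32 },
    display: { size: 32, lineHeight: 40 },
  },
  spacing: [8, 16, 24, 32, 48], // 8px base grid
} as const;

// Exported as CSS custom properties when engineering is ready to consume them.
export function toCssVariables(prefix = "tk"): string {
  const lines: string[] = [];
  tokens.colour.primary.forEach((v, i) => lines.push(`--${prefix}-primary-${i + 1}: ${v};`));
  tokens.colour.neutral.forEach((v, i) => lines.push(`--${prefix}-neutral-${i + 1}: ${v};`));
  Object.entries(tokens.colour.semantic).forEach(([k, v]) => lines.push(`--${prefix}-${k}: ${v};`));
  tokens.spacing.forEach((v, i) => lines.push(`--${prefix}-space-${i + 1}: ${v}px;`));
  return `:root {\n  ${lines.join("\n  ")}\n}`;
}
```

In practice these values would be authored in Tokens Studio and exported rather than hand-written; the sketch only shows the shape the engineering handoff could take.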

Month 2–3: Design the Product — The Site Manager's Daily Workflow

From the customer research, identify the highest-value design target: the workflow that the most customers use most frequently and find most friction-heavy. For a construction PM tool with this profile, this is almost certainly the daily site diary and task management flow — the site manager's morning routine of reviewing what needs to happen today, logging yesterday's work, and communicating issues to the office. Design this workflow end-to-end first: it is the core product experience that new customers encounter first and that existing customers use every day. The design process: (1) Map the current workflow from the research (what does the site manager do from 7:30am when they arrive on site to the first break?). (2) Design the ideal workflow — what would this look like if it was designed specifically for the site-first principle? (3) Review with 3 customers from the interview cohort (30-minute video calls, share Figma prototype, ask them to complete their morning routine task). (4) Revise based on feedback. (5) Handoff to engineering with a complete Figma spec and interaction notes. This cycle produces the first fully designed module of the product within Month 2. It also produces the first evidence that the design direction is resonating with customers — which is the currency the designer needs to build credibility with the CEO for the subsequent design decisions.

Managing the CEO's Design Opinions

The CEO's strong opinions are a management challenge that must be resolved in the first 30 days or it will slow every subsequent design review. The approach: involve the CEO early and specifically. Before finalising any design, share a rough direction (not a polished comp) and ask for their domain input first: "Does this match how a site manager would think about their morning routine?" This positions the CEO as a domain expert contributing to the design, not a stakeholder reviewing a finished design. When their domain input is incorporated, they feel ownership of the design outcome. Reserve the design expertise conversations ("why I've chosen this interaction pattern over that one") for moments when the CEO's aesthetic preference conflicts with a research-backed usability decision — and at that point, bring the customer research. A CEO who has heard 3 customers describe the same pain point in their own words is far more persuadable than a CEO who is told by a designer that their instinct is wrong.

Early Warning Metrics:

  • Customer interview insight alignment rate — after the 8 customer interviews, what percentage of the CEO's stated product beliefs are confirmed vs. contradicted by customer data? track this explicitly; a high contradiction rate (>40%) means the product has been built for an imagined user, not the actual user; this is a positive finding (early course correction is cheap) but must be presented carefully
  • Design review cycle time — how many rounds of revision are required per design before the CEO approves it for engineering handoff? target 2 rounds maximum; above 3 rounds consistently indicates either a direction misalignment or a communication failure in design presentation that must be addressed
  • Feature adoption rate at Month 4 — for the first newly designed module (site diary and task management), what percentage of the 40 customers are using it in their daily workflow by Month 4? target 60%; below 30% indicates the design solved the wrong problem

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: The "just enough system" principle — defining 10 critical components in Week 3 to enable product design rather than building a complete design system before starting — is exactly the trade-off thinking a first designer at a Series A startup must make; a junior designer would spend 3 months on the system and deliver no product design. The UX vision document format (one page, 4 principles, reviewed with the CEO before any design) creates alignment infrastructure before design work begins — preventing the retrospective conflict when finished designs are reviewed against misaligned expectations. Framing the CEO's strong opinions as a management challenge to be resolved in the first 30 days (not managed around) reflects the organisational maturity of a staff-level designer.

What differentiates it from mid-level thinking: A mid-level designer would arrive on Day 1, request a brief, and start building a Figma component library. They would not conduct contextual field research on construction sites, would not draft a UX vision before designing, would not know about the "just enough system" principle for resource-constrained environments, and would not have a strategy for managing the CEO's design opinions that creates alignment rather than conflict.

What would make it a 10/10: A 10/10 response would include the complete one-page UX vision document template with the design principle format and the specific construction domain language that makes it credible to the CEO, a first-designer 90-day checklist showing the week-by-week deliverables, and a concrete design review meeting format that structures CEO input by domain expertise vs. design decision rather than allowing undifferentiated feedback.



Question 8: Cross-Platform Design — Maintaining Consistency Across Web, Mobile, and Tablet

Difficulty: Senior | Role: UX Designer | Level: Senior | Company Examples: Salesforce, Zendesk, HubSpot, Notion, Linear


The Question

You are a Senior UX Designer at a project management SaaS company. The product currently exists as a web app, a native iOS app, and an Android app. Each has been designed by a different team over 3 years with no shared design system. The result: a feature added to the web app in Q1 may not appear on mobile until Q3 (or ever); the mobile apps look like different products from each other; and users who switch between platforms report feeling "lost" because the navigation architecture is fundamentally different on each platform. The Head of Product wants a cross-platform design strategy that achieves 3 goals: feature parity across platforms within 12 months, consistent visual identity, and a native-appropriate experience on each platform (not a mobile web view forced into a native app shell). These three goals are partially in tension with each other. Design the cross-platform strategy.


1. What Is This Question Testing?

  • Platform design expertise — understanding that "consistent across platforms" and "native-appropriate for each platform" are not contradictory — they are achieved through a two-layer system; the brand and interaction model layer is consistent; the implementation layer adapts to each platform's conventions (iOS Human Interface Guidelines, Android Material Design, and web-specific patterns); a designer who achieves consistency by making everything look identical across platforms has failed the native-appropriate goal
  • Information architecture strategy — the navigation architecture inconsistency (different structures on web, iOS, and Android) is the highest-priority problem because it affects the user's mental model of the entire product; a user who understands the product's IA can find features they know exist; a user who cannot understand the IA cannot use the product confidently on any platform; the IA must be defined as a platform-agnostic model first, then translated into each platform's navigation pattern (sidebar navigation on web, tab bar on iOS, bottom navigation on Android)
  • Design systems for multi-platform — a cross-platform design system has different architecture from a single-platform system; it requires a "multi-tier token" approach: global tokens (the brand values: brand colours, typography family, spacing scale) that are identical across all platforms, component tokens (platform-specific values that reference global tokens: component/button/border-radius is 8px on web, 12px on iOS following HIG, 4dp on Android following Material Design), and the component implementation layer (each platform's component built according to its platform conventions but using the same component tokens for colour and spacing)
  • Prioritisation and sequencing — achieving feature parity, visual consistency, and native-appropriate UX in 12 months with separate teams for each platform is only possible with a clear sequencing strategy: IA alignment first (affects all other work), design system foundation second (enables consistent visual identity), feature parity third (built on the consistent foundation); doing these in the wrong order (feature parity first, then retroactively imposing a design system) creates the same inconsistency problem in new features that currently exists in existing ones
  • Team and process thinking — three separate platform teams (web, iOS, Android) designed independently because there was no cross-platform design process; the strategy must include a design process change (how are new features designed and how do they propagate to each platform?) not just a design output change (a design system and updated screens)
  • User research grounding — the "lost" feeling reported by users who switch between platforms is the research signal that must be diagnosed before the strategy is set; what specifically makes users feel lost: the different navigation architecture (location of features), the different visual identity (different colours, typography, component styles), or the different functionality (features present on web but not mobile)? each has a different solution priority

2. Framework: Cross-Platform Design Alignment Model (CPDAM)

  1. Assumption Documentation — Quantify the feature parity gap: what percentage of web features exist on iOS? on Android? which features are missing from each platform and what is their usage frequency on web (high-usage missing features are the parity priority)? map the navigation architecture for all 3 platforms as-is to identify the specific structural differences that cause users to feel "lost"
  2. Constraint Analysis — 3 separate platform teams with independent roadmaps and engineering stacks (React web, Swift iOS, Kotlin Android); 12-month feature parity target; existing customers on all 3 platforms who will experience any navigation architecture change as a disruption
  3. Tradeoff Evaluation — Platform-native first (each platform follows its own conventions, consistency is achieved at the brand and IA level only) vs. cross-platform component library (same visual components on all platforms, sacrifices some native-appropriateness for faster consistency) vs. web-first with progressive web app on mobile (maximum feature parity, minimum native-appropriateness — the anti-pattern the Head of Product has explicitly excluded)
  4. Hidden Cost Identification — A navigation architecture change to align the IA across platforms is a significant user disruption event; users who have learned the current navigation structure on any platform will experience a muscle memory disruption when it changes; this requires a communication strategy (in-app onboarding overlay) and a measurement plan (navigation task completion rate before and after the change) to validate that the new IA is better, not just different
  5. Risk Signals / Early Warning Metrics — Cross-platform user satisfaction score (CSAT from users who report using 2 or more platforms — their satisfaction specifically reflects the cross-platform experience); feature adoption rate of parity features on mobile (when a web feature is brought to mobile, what percentage of mobile users adopt it within 30 days? — low adoption of parity features may indicate they were not needed on mobile, not that they were missing)
  6. Pivot Triggers — If a navigation architecture alignment test (A/B testing the new IA against the current IA on one platform) shows task completion rate declining rather than improving, the proposed IA may be optimised for theoretical consistency rather than the users' actual mental model of the product; pause and conduct card sorting and tree testing before proceeding
  7. Long-Term Evolution Plan — Month 1–3: IA alignment strategy and cross-platform token system; Month 4–6: navigation architecture rollout on all 3 platforms; Month 7–9: visual identity alignment via design system; Month 10–12: feature parity sprint for top 15 missing mobile features; Year 2: continuous parity process embedded in the feature development workflow

3. The Answer

Explicit Assumptions:

  • Feature parity gap from audit: 47% of web features are missing from iOS, 61% are missing from Android; the highest-usage missing features are: recurring tasks (missing from both mobile apps), bulk actions on task lists (missing from both mobile), and the reporting dashboard (missing from both mobile)
  • The navigation architecture: web uses a left sidebar with 8 top-level sections; iOS uses a tab bar with 5 tabs (different sections from the web sidebar); Android uses a navigation drawer with 6 sections (different again); no two platforms share the same section names or hierarchy
  • User research signal: an exit survey of churned users who reported using both web and mobile identified "I can never find things on my phone" as the top reason for churn, cited by 34% of churners — validating that the IA inconsistency is a retention risk, not just a UX complaint

Step 1: Define the Platform-Agnostic IA First

The cross-platform strategy must begin with a platform-agnostic information architecture — a definition of the product's top-level sections, their names, and their hierarchy that exists independently of how each platform implements the navigation pattern. This IA is derived from: card sorting with 12 users (6 web-primary users, 6 mobile-primary users) to understand how users mentally group the product's features; tree testing of the 3 current IA structures to identify which navigational paths cause the most confusion; and an analysis of the feature usage frequency data (which sections are visited most, which sections produce the most back-navigation — a signal of wrong-place navigation). The output is a canonical IA with 6 top-level sections (reduced from the current 8/5/6 across the three platforms): Home (today's tasks and activity feed), Projects (all projects the user belongs to), My Tasks (personal task management view), Inbox (notifications and comments), Team (people and resource management), and Reports (dashboards and analytics). These 6 sections are the same on every platform — but how they are presented differs by platform convention.

The Two-Layer Consistency Model

Layer 1 — Consistent (identical across all platforms): section names and hierarchy (the 6 canonical IA sections), brand colours and their semantic usage (primary action: colour/brand/primary, destructive action: colour/feedback/error), typography family (the brand typeface renders identically on web, iOS, and Android using the same font files), and iconography (a unified icon set used identically across all 3 platforms — not platform-native icons for conceptual icons).

Layer 2 — Platform-adapted (follows each platform's conventions): navigation pattern (web: persistent left sidebar; iOS: bottom tab bar with 5 tabs and the 6th section in a "More" tab; Android: bottom navigation bar following Material Design 3 specifications), component visual style (button border radius, input field height, card elevation follow each platform's HIG or Material Design spec, not a universal value), gesture interactions (iOS: swipe-from-left-edge to go back; Android: the system back gesture or button; web: browser back button — all documented in the component interaction spec), and spacing and density (iOS HIG-recommended 44pt minimum touch targets, Android 48dp minimum touch targets, web: pointer-based density with hover states).
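
As a minimal sketch, assuming hypothetical section identifiers, the two-layer model can be expressed as data: the canonical IA is declared once, and each platform maps the same six sections onto its own navigation convention. The primary/overflow split shown here is illustrative rather than a committed design.

```typescript
// Layer 1: the canonical IA, declared once and shared by every platform.
type SectionId = "home" | "projects" | "myTasks" | "inbox" | "team" | "reports";

const canonicalIA: Record<SectionId, string> = {
  home: "Home",
  projects: "Projects",
  myTasks: "My Tasks",
  inbox: "Inbox",
  team: "Team",
  reports: "Reports",
};

// Layer 2: each platform maps the same six sections onto its own navigation
// convention. The primary/overflow split shown here is illustrative only.
type Platform = "web" | "ios" | "android";

interface PlatformNavigation {
  pattern: string;
  primary: SectionId[];   // sections shown in the main navigation surface
  overflow: SectionId[];  // sections reachable via "More" or equivalent
}

const navigation: Record<Platform, PlatformNavigation> = {
  web: {
    pattern: "persistent left sidebar",
    primary: ["home", "projects", "myTasks", "inbox", "team", "reports"],
    overflow: [],
  },
  ios: {
    pattern: "bottom tab bar (HIG)",
    primary: ["home", "projects", "myTasks", "inbox", "team"],
    overflow: ["reports"], // sixth section lives behind a "More" tab
  },
  android: {
    pattern: "bottom navigation bar (Material Design 3)",
    primary: ["home", "projects", "myTasks", "inbox", "team"],
    overflow: ["reports"],
  },
};
```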

The Cross-Platform Design Token Architecture

The design system that enables the two-layer model is a three-tier token structure: Tier 1 — Global tokens: the brand primitives. These are identical across all platforms and are the single source of truth: global/colour/brand/blue-600: #1A73E8, global/typography/family/primary: 'Roobert', global/spacing/base: 8. Tier 2 — Semantic tokens: platform-agnostic meaning layer. These reference global tokens and define intent: semantic/colour/action/primary: {global/colour/brand/blue-600}, semantic/spacing/component/padding-md: {global/spacing/base * 2}. Tier 3 — Platform tokens: platform-specific overrides. These reference semantic tokens and apply platform conventions: platform/ios/component/button/border-radius: 12px, platform/android/component/button/border-radius: 4dp, platform/web/component/button/border-radius: 6px. The token architecture means that a brand colour change (updating global/colour/brand/blue-600) propagates automatically to all 3 platforms in a single token update. Platform-specific values (corner radii, spacing densities) are maintained separately and never need to change when brand decisions change.
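
A sketch of the three-tier inheritance is shown below, using the token names and values quoted above. The reference syntax and grouping are assumptions about how the tokens might be written down rather than any specific tool's schema.

```typescript
// Tier 1: global tokens are the brand primitives, identical on every platform.
const globalTokens = {
  "global/colour/brand/blue-600": "#1A73E8",
  "global/typography/family/primary": "Roobert",
  "global/spacing/base": 8,
};

// Tier 2: semantic tokens express platform-agnostic intent as references to
// global tokens rather than raw values.
const semanticTokens = {
  "semantic/colour/action/primary": "{global/colour/brand/blue-600}",
  "semantic/spacing/component/padding-md": "{global/spacing/base * 2}",
};

// Tier 3: platform tokens hold platform-convention values, referencing
// semantic tokens wherever the value must stay brand-consistent.
const platformTokens = {
  web:     { "component/button/border-radius": "6px",  "component/button/background": "{semantic/colour/action/primary}" },
  ios:     { "component/button/border-radius": "12px", "component/button/background": "{semantic/colour/action/primary}" },
  android: { "component/button/border-radius": "4dp",  "component/button/background": "{semantic/colour/action/primary}" },
};

// Changing global/colour/brand/blue-600 therefore repaints the primary button
// on all three platforms in one update; the border-radius values never change.
```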

The Feature Parity Process: Solving the Structural Problem

Feature parity cannot be solved by a 12-month sprint — it will recur if the process that caused it is not fixed. The structural problem: web features are designed and shipped without a mobile design phase in the same sprint; they accumulate as "mobile backlog" that is never prioritised because it is less visible than new web features. Fix: implement a cross-platform design review gate in the feature development process. When a new feature is designed for web, it must also be designed for iOS and Android (at least to a medium-fidelity wireframe) before the web feature enters the engineering sprint. The mobile designs do not need to ship in the same sprint — but they must exist before web ships, so the engineering effort to implement them on mobile is sequenced and estimated rather than deprioritised indefinitely. The web designer is responsible for the platform-agnostic IA and interaction model; the iOS and Android designers adapt it to their platform's conventions. This requires the 3 platform design teams to operate in a cross-platform design channel (a weekly 45-minute "cross-platform sync" where upcoming features are reviewed by all 3 platform designers simultaneously) that does not currently exist.

Early Warning Metrics:

  • Cross-platform task completion parity — measure task completion rate for the 5 core tasks on web and on mobile; the goal is parity within 10 percentage points (if web shows 85% completion on "create a recurring task," mobile should show 75%+); large gaps indicate that the mobile implementation of a feature is not at parity in quality, not just in availability
  • IA navigation success rate post-change — after rolling out the new unified IA, measure first-attempt navigation success rate (does the user find the feature on the first navigation attempt?); target 75%+ first-attempt success on all 3 platforms by Month 6; below 60% on any platform indicates the new IA has not resolved the "lost" feeling
  • Cross-platform sync attendance rate — the weekly cross-platform design sync is only effective if all 3 platform teams attend; below 80% consistent attendance indicates the process is not embedded; escalate to the Head of Product to make it a required meeting

4. Interview Score: 9 / 10

Why this demonstrates senior-level maturity: The three-tier token architecture (global → semantic → platform) is the specific design systems solution that enables "consistent brand identity" and "native-appropriate experience" simultaneously — it is not a theoretical concept but an implementable architecture that solves the exact tension the Head of Product identified. The feature parity process fix (cross-platform design review gate before any web feature enters engineering) addresses the structural cause of parity drift rather than treating the symptom with a one-time sprint. The two-layer consistency model (consistent IA and brand, adapted navigation and components) gives the product team a clear decision framework for every future design question: "is this a consistency decision (layer 1) or a platform adaptation decision (layer 2)?"

What differentiates it from mid-level thinking: A mid-level designer would propose building a shared component library and shipping it to all 3 platforms — producing visual consistency at the cost of native-appropriateness, and not addressing the feature parity process problem at all. They would not know about the three-tier token architecture, would not conduct card sorting and tree testing to derive a platform-agnostic IA, and would not identify the cross-platform design sync process change as the solution to the parity drift structural problem.

What would make it a 10/10: A 10/10 response would include a specific example of the three-tier token architecture in JSON format (showing the global → semantic → platform inheritance for the primary button colour), a completed card sorting analysis methodology (showing how to synthesise 12 participants' card sort data into a validated IA structure), and a cross-platform design sync meeting agenda showing how new features are reviewed across platforms in 45 minutes without slowing the individual platform teams' delivery velocity.



Question 9: Measuring UX Impact — Defining and Tracking Design Metrics

Difficulty: Senior | Role: UX Designer | Level: Senior / Staff | Company Examples: Google HEART framework teams, Spotify design metrics, Airbnb experience quality, Meta design analytics


The Question

You are a Senior UX Designer at a B2B enterprise software company with 50,000 active users. The VP of Product has asked you to establish a UX measurement framework — a set of metrics that quantitatively track the quality of the user experience and demonstrate the ROI of design investment. The current state: the product team measures feature adoption (% of users who clicked a feature) and NPS (once a year, generic), but has no metrics that connect design decisions to user outcomes. The engineering team believes that design decisions are "subjective" and that only A/B test results are valid evidence. You have been asked to present the measurement framework to the VP of Product and engineering leadership within 3 weeks. Walk through the framework you would build, how you would choose the right metrics for this context, and how you would address the engineering team's A/B test scepticism.


1. What Is This Question Testing?

  • Measurement literacy — understanding the full landscape of UX measurement: behavioural metrics (what users actually do, measured from product analytics), attitudinal metrics (what users say and feel, measured from surveys and research), and business metrics (the outcomes the product is designed to produce — retention, expansion revenue, support ticket volume); a complete UX measurement framework includes all three, not just one
  • Framework selection — knowing about established UX measurement frameworks (Google's HEART framework: Happiness, Engagement, Adoption, Retention, Task Success; Nielsen Norman Group's UX metrics approach; JTBD-aligned outcome metrics) and being able to justify which framework is most appropriate for a given product context and measurement maturity level
  • Stakeholder communication — the engineering team's "design is subjective, only A/B tests are valid" position is a real and common challenge; it must be addressed with evidence, not defensiveness; the designer must: (1) agree that A/B tests are excellent and should be used more; (2) explain what A/B tests cannot measure (the reason for a behavioural change, whether the winning variant is good design or manipulative dark pattern, long-term retention effects of short-term conversion optimisations); and (3) show the specific situations where qualitative and attitudinal research produces faster and cheaper evidence than an A/B test
  • Metric design — not all metrics are equally useful; a metric must be: measurable (can it be calculated from available data?), actionable (does a change in this metric tell the team what to do differently?), sensitive (does it change meaningfully when a design improvement is made?), and not gameable (can the metric be improved without improving the user experience?); feature adoption rate fails the "not gameable" criterion — a prominent notification that forces users to click a feature improves feature adoption without improving the experience
  • Business context alignment — a B2B enterprise software product has different UX metric priorities from a B2C consumer app; in enterprise, the primary churn drivers are low adoption within the buyer organisation (if users don't use the tool, the buyer doesn't renew), slow time-to-value for new accounts (if the product takes 3 months to deliver value, the buyer becomes sceptical before the first renewal), and high support ticket volume (a proxy for user confusion that has both cost and churn risk implications)
  • Instrumentation awareness — knowing that the right metrics framework is useless without the product analytics instrumentation to measure it; a designer who proposes task completion rate as a metric without knowing whether the product's analytics are instrumented at the task level is proposing an unimplementable framework

2. Framework: UX Measurement Framework Design Model (UXMFDM)

  1. Assumption Documentation — Audit the current analytics instrumentation: what events are currently tracked in the product analytics tool (Mixpanel, Amplitude, Pendo)? What surveys are deployed (NPS survey only)? What support ticket data is available and categorised? What does the engineering team currently consider valid evidence for product decisions? The framework must be built on data that can actually be collected, not on an idealised measurement world
  2. Constraint Analysis — 3-week timeline to present the framework, engineering team scepticism of non-A/B evidence, B2B enterprise context (50,000 users across multiple organisations with different usage patterns — individual user metrics must be segmented by account to be meaningful), annual NPS survey cadence (insufficient for actionable design feedback)
  3. Tradeoff Evaluation — HEART framework (Google's established framework: comprehensive but complex to instrument fully) vs. a custom framework aligned to the specific B2B product's business metrics (more actionable, but requires more design work to establish); for a team with low measurement maturity, the HEART framework's established structure provides a credibility anchor that a fully custom framework does not
  4. Hidden Cost Identification — The engineering effort to instrument the analytics for a new metrics framework is significant; every new metric that requires product event tracking (task completion rate, time-on-task) requires an engineering sprint to add the tracking events; the framework must be designed with the implementation cost in mind, proposing phased instrumentation (use existing data for Phase 1 metrics, add instrumentation for Phase 2 metrics)
  5. Risk Signals / Early Warning Metrics — Metric gaming risk (any metric that becomes a target becomes a bad metric — Goodhart's Law; design the metric set to make gaming evident: if feature adoption increases while task success rate decreases, the adoption was driven by forced interaction rather than genuine value); stakeholder adoption rate for the framework (are the VP of Product and engineering leadership actually using the metrics in their weekly reviews? — a framework that is presented and forgotten has zero ROI)
  6. Pivot Triggers — If the engineering team rejects the framework as insufficiently rigorous despite the evidence presented, propose a 90-day pilot: instrument 3 specific metrics for one product area, run a design improvement sprint, measure the before/after, and present the results; an empirical pilot with real data is more persuasive than a theoretical framework debate
  7. Long-Term Evolution Plan — Phase 1 (Month 1–3): instrument 5 core metrics using existing analytics data; Phase 2 (Month 4–6): add task completion instrumentation for the 3 core user flows; Phase 3 (Month 7–12): attitudinal survey integration, account-level UX health score, quarterly design impact reports

3. The Answer

Explicit Assumptions:

  • Product analytics tool: Amplitude (with existing event tracking for page views, feature clicks, session duration, and NPS score delivery)
  • Current tracked metrics: feature adoption (% of users who triggered a feature in the past 30 days), monthly active users, session duration, and annual NPS (score: 28 — below B2B SaaS average of 31)
  • Engineering team's A/B testing practice: they run A/B tests for UI changes affecting conversion-related features (onboarding, upsell prompts) but not for general UX improvements; they consider A/B tests to be the only statistically valid evidence
  • B2B context: 50,000 users across 800 accounts; accounts range from 10 users (SMB) to 500 users (enterprise); account health (adoption across users within an account) is the primary predictor of renewal

The Framework: HEART Adapted for B2B Enterprise

The Google HEART framework (Happiness, Engagement, Adoption, Retention, Task Success) is the right foundation because it is recognised by engineering teams (Google provenance), covers all three measurement layers (behavioural + attitudinal + outcome), and is extensible to the B2B context. Adapt it as follows for this specific product.

Happiness — measures whether users feel good about their experience. Metric: Customer Effort Score (CES) — "How easy was it to accomplish what you were trying to do?" — measured via a triggered in-app survey after completion of the 3 core tasks (not the annual NPS). CES is more actionable than NPS because it is task-specific and immediate; the CES question fires when a user completes a task that has been instrumented in Amplitude. Target: CES of 5.5/7 for each of the 3 core tasks.

Engagement — measures the depth of product usage. Metric: Weekly Active Feature Rate — the number of distinct product features used per user per week (not just sessions, but breadth of feature usage). A user who uses 3 features deeply is more engaged than one who visits 8 features shallowly, and this metric resists the "click a feature once" gaming of the current feature adoption metric. Target: 3+ features per user per week for 70% of MAU.

Adoption — measures whether new users and new features are successfully onboarded. Metric: Time-to-First-Value (TTFV) — the time from account creation to a user's first completion of the product's core value action (for a project management tool: creating a project, adding tasks, and sharing with a team member), tracked via user journey analysis on the existing Amplitude event data. Target: 80% of new users achieve first value within 7 days.

Task Success — measures whether users can accomplish their goals. Metric: Core Task Completion Rate — measured via instrumented user flows for the 3 highest-usage tasks (create project, assign task, generate report). Amplitude funnel analysis measures the % of users who start a task and complete it without abandonment; this does require engineering instrumentation for the 3 task start events (the completion events are likely already tracked). Target: 85% completion rate for each of the 3 core tasks.

Retention — the business metric that ties the UX framework to commercial outcomes. Metric: Account Health Score — a composite of the metrics above, calculated per account: (% of the account's licensed users who are weekly active, on a 0–100 scale) × (the account's average Core Task Completion Rate, 0–1) × (the account's average CES ÷ 7), giving a score out of 100. This single account-level metric is the most commercially relevant UX metric for a B2B product because it predicts renewal risk. An account with an Account Health Score below 50 is at churn risk — customer success can be alerted to intervene before the renewal conversation.
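
As a sketch of how the adapted framework could be handed to the VP of Product and engineering, each metric can be specified with its operational definition, data source, instrumentation requirement, and target. The structure and field names below are illustrative assumptions that summarise the framework above.

```typescript
// Illustrative specification of the adapted HEART metrics; field names and
// phrasing are assumptions summarising the framework described above.
interface UXMetric {
  dimension: "Happiness" | "Engagement" | "Adoption" | "Task Success" | "Retention";
  metric: string;
  source: string;          // where the data comes from
  instrumentation: string; // what engineering must add, if anything
  target: string;
}

const uxMeasurementFramework: UXMetric[] = [
  {
    dimension: "Happiness",
    metric: "Customer Effort Score (CES)",
    source: "Triggered in-app survey after the 3 core tasks",
    instrumentation: "Survey trigger wired to existing task-completion events",
    target: "CES 5.5/7 per core task",
  },
  {
    dimension: "Engagement",
    metric: "Weekly Active Feature Rate",
    source: "Amplitude event data (existing)",
    instrumentation: "None",
    target: "3+ features/user/week for 70% of MAU",
  },
  {
    dimension: "Adoption",
    metric: "Time-to-First-Value (TTFV)",
    source: "Amplitude user journey analysis (existing events)",
    instrumentation: "None",
    target: "80% of new users reach first value within 7 days",
  },
  {
    dimension: "Task Success",
    metric: "Core Task Completion Rate",
    source: "Amplitude funnel analysis",
    instrumentation: "3 new task-start events",
    target: "85% completion per core task",
  },
  {
    dimension: "Retention",
    metric: "Account Health Score (composite)",
    source: "Derived from the metrics above, per account",
    instrumentation: "Account-level aggregation",
    target: "No account below 50",
  },
];
```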

Addressing Engineering Scepticism: Evidence Over Argument

Do not argue that design is not subjective — that is a defensive position. Instead: agree that A/B tests are excellent and propose to use more of them; then show what A/B tests cannot measure, with specific examples from the company's own context. What A/B tests cannot measure: (1) The reason a variant wins. An A/B test on the onboarding flow may show that Variant B has a 12% higher completion rate — but it cannot tell you that Variant B wins because it reduces cognitive load at Step 4 vs. because it uses a more prominent button colour (a dark pattern). Qualitative research (usability test of both variants) tells you why Variant B wins, which determines whether the design principle behind the win can be applied elsewhere. (2) Long-term retention effects of short-term conversion wins. An A/B test run over 2 weeks may show Variant B winning on 14-day retention — but the 180-day retention effect of optimising for short-term conversion at the expense of user comprehension is not measurable in a 2-week test window. (3) Systemic UX health across a product with 50,000 users and hundreds of features. You cannot A/B test every screen and feature in a complex enterprise product simultaneously — the framework provides a continuous signal about the product's overall UX health that complements the specific signals from individual A/B tests. Close with the pilot proposal: "Let's run a 90-day pilot. We instrument the Core Task Completion Rate for the project creation flow, we make a specific design improvement, and we measure the before/after. If the improvement is real, the data will show it. If it is not, the data will show that too."

The Account Health Score: The Metric That Changes the Conversation

The Account Health Score is the metric that makes the UX framework commercially credible to the VP of Product and engineering leadership — because it connects UX quality directly to renewal revenue. Present it with a worked example: Account Foxton Logistics: 45 licensed users. Weekly Active Users: 22 (49%). Core Task Completion Rate: 71%. CES average: 4.1/7. Account Health Score: 49 × 0.71 × (4.1 ÷ 7) ≈ 20 out of 100. This is a churn-risk account (below the 50 threshold). The customer success team currently rates this account as healthy based on their manual assessment — they do not know about the low task completion rate and poor CES scores. The Account Health Score surfaces the risk before the renewal conversation. Present 3 such worked examples to the VP of Product, one of which is an account the CS team is currently worried about (to validate the metric) and one of which is an account the CS team considers healthy (to surface a non-obvious risk). This is the demonstration that the framework produces actionable commercial intelligence, not design self-justification.
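
A minimal sketch of the calculation is shown below, using the Foxton Logistics figures from the worked example; the interface and function names are assumptions, but the arithmetic matches the example above.

```typescript
// Minimal sketch of the Account Health Score calculation; the interface and
// function names are assumptions, the figures are the Foxton Logistics example.
interface AccountSnapshot {
  licensedUsers: number;
  weeklyActiveUsers: number;
  coreTaskCompletionRate: number; // 0–1
  averageCes: number;             // 1–7 scale
}

// Score is on a 0–100 scale; accounts below 50 are flagged as churn-risk.
function accountHealthScore(a: AccountSnapshot): number {
  const weeklyActivePct = (a.weeklyActiveUsers / a.licensedUsers) * 100;
  return weeklyActivePct * a.coreTaskCompletionRate * (a.averageCes / 7);
}

const foxtonLogistics: AccountSnapshot = {
  licensedUsers: 45,
  weeklyActiveUsers: 22,
  coreTaskCompletionRate: 0.71,
  averageCes: 4.1,
};

console.log(accountHealthScore(foxtonLogistics).toFixed(1)); // "20.3" (churn-risk: below 50)
```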

Early Warning Metrics:

  • Framework instrumentation completion rate — the 3 Core Task Completion Rate flows must be instrumented in Amplitude within 6 weeks of the framework's approval; track weekly progress; any instrumentation not complete by Week 6 delays the pilot results timeline and gives engineering sceptics a reason to discount the framework before it has produced data
  • Account Health Score distribution — once live, the distribution of Account Health Scores across the 800 accounts should show a normal-ish distribution; a bimodal distribution (accounts are either very healthy or very unhealthy with nothing in between) indicates the scoring formula is not calibrated correctly; refine the formula based on the distribution shape
  • VP of Product framework usage rate — does the VP include the UX metrics in their weekly product review? If not, the framework has not been adopted by the stakeholder who was supposed to use it; schedule a monthly "UX metrics review" in the product leadership calendar to embed it in the cadence

4. Interview Score: 9 / 10

Why this demonstrates senior-level maturity: The Account Health Score composite metric — combining weekly active users, task completion rate, and CES into an account-level churn predictor — is the design that makes the UX framework commercially relevant to B2B enterprise leadership rather than design-team-internally relevant. Agreeing with the engineering team that A/B tests are excellent (rather than defending against the challenge) and then showing specifically what A/B tests cannot measure is the intellectually honest and more persuasive approach than a defensive "design is not subjective" argument. The Goodhart's Law reference ("any metric that becomes a target becomes a bad metric") shows the measurement sophistication that distinguishes a designer who has built measurement frameworks from one who has read about them.

What differentiates it from mid-level thinking: A mid-level designer would propose tracking NPS more frequently and adding task completion rate without a framework, without addressing the engineering scepticism, and without connecting the metrics to the commercial outcomes (renewal risk, account health) that drive the VP of Product's decisions. They would not know about HEART, CES, or time-to-first-value as specific UX metrics, and would not think to design an account-level metric for a B2B context.

What would make it a 10/10: A 10/10 response would include a complete metrics framework table (showing each HEART metric, its operational definition, the data source, the instrumentation requirement, and the target value), a worked Account Health Score formula with the specific weighting rationale for each component, and a 90-day pilot plan showing which product area, which design change, and which before/after measurement methodology would produce the most persuasive empirical evidence for the engineering team.



Question 10: Ethical Design — Navigating Dark Patterns and Business Pressure

Difficulty: Elite | Role: UX Designer | Level: Senior / Staff | Company Examples: Airbnb, Booking.com, LinkedIn, Amazon Prime, Meta


The Question

You are a Senior UX Designer at a subscription e-commerce company (£180M annual revenue). The VP of Growth has proposed 4 UI changes to increase subscription retention: (1) Remove the cancel subscription button from the account settings page and replace it with a "Pause subscription" option only; (2) Add a 6-step cancellation flow where users must call a phone number to cancel (rather than cancelling online); (3) Pre-tick an "auto-renew at higher rate" checkbox on the annual plan renewal reminder email; (4) Show a countdown timer on the account deletion confirmation page ("Your account and all your data will be deleted in 24 hours — take action now!") even when there is no actual 24-hour deletion window. The VP of Growth believes these changes will improve retention by 15% and is planning to implement them within the next sprint. You believe these are dark patterns. How do you respond?


1. What Is This Question Testing?

  • Ethical design knowledge — understanding that dark patterns are specific, named, documented UI design techniques that deliberately deceive or manipulate users into taking actions they would not otherwise choose; the 4 proposed changes in this question are documented dark pattern categories: roach motel (can subscribe easily, cannot cancel easily), misdirection (forced continuity via pre-ticked auto-renew), and false urgency (the fabricated 24-hour deletion countdown)
  • Regulatory literacy — dark patterns in subscription services are an active regulatory enforcement area; the FCA's Consumer Duty (applicable to financial products), the ASA's advertising standards, the ICO's guidance on dark patterns in consent flows (applicable to data processing consent under GDPR), and the Competition and Markets Authority (CMA)'s subscription trap investigation all create specific legal exposure for exactly the practices the VP of Growth is proposing; the designer must know this and use it as part of the response
  • Stakeholder management under pressure — the VP of Growth is a senior stakeholder with a commercial mandate and a specific metric target (15% retention improvement); responding with "this is unethical and I won't do it" is correct on principle but ineffective in practice; the designer must offer an alternative path to the retention improvement that does not use dark patterns — and must make the alternative compelling enough that the VP of Growth adopts it
  • Business acumen — short-term retention gains from dark patterns are well-documented to produce long-term retention losses: users who feel trapped into a subscription they want to cancel become active brand detractors (negative NPS contribution), escalate to customer service at higher rates (cost), and churn with maximum hostility at their first opportunity (filing chargebacks, leaving negative reviews, disputing transactions); the designer must be able to make this business argument, not just the ethical one
  • Design alternatives under commercial constraints — the designer who says "no" to dark patterns without proposing alternatives has ended their involvement in the retention problem; the designer who says "no to these four, and here are four alternatives that achieve the retention goal without dark patterns" has retained influence over the outcome and demonstrated business partnership
  • Organisational courage — this question tests whether the candidate will maintain their position under commercial pressure from a senior stakeholder, or will rationalise compliance with practices they know are harmful; the correct answer involves a clear refusal to implement the dark patterns, but the quality of the answer is determined by the alternative proposals and the stakeholder management strategy

2. Framework: Ethical Design Response Model (EDRM)

  1. Assumption Documentation — Identify which of the 4 proposed changes are legally problematic (not just ethically questionable): removing the cancel button without a readily accessible cancellation route may violate the CMA's subscription trap guidance and the Consumer Rights Act 2015; the pre-ticked auto-renew checkbox for a higher-rate renewal likely violates the Consumer Contracts Regulations 2013; the fabricated 24-hour deletion countdown is a misrepresentation under the Consumer Protection from Unfair Trading Regulations 2008
  2. Constraint Analysis — VP of Growth has a retention metric target and a sprint timeline; the sprint timeline means the designer has limited time to make the counter-argument before the engineering implementation proceeds; the designer cannot indefinitely delay implementation by raising concerns — they must offer a concrete alternative within the same sprint timeframe
  3. Tradeoff Evaluation — Implement (retain the working relationship, compromise on ethics, accept legal and reputational risk) vs. escalate without alternative (maintain ethical position, damage relationship with VP of Growth, no influence on the outcome) vs. refuse and offer alternative (maintain ethical position, demonstrate business partnership, requires the designer to do significant work to make the alternative compelling)
  4. Hidden Cost Identification — Regulatory enforcement risk: the CMA's subscription trap investigation (2022–2023) resulted in enforcement action against subscription companies using exactly the patterns proposed; financial penalties and mandated practice changes are a real business cost; one CMA enforcement action against a £180M revenue company typically generates significantly more reputational damage than the 15% retention improvement is worth
  5. Risk Signals / Early Warning Metrics — Complaint escalation rate (users who cannot easily cancel are significantly more likely to contact customer service — a direct cost indicator), chargeback rate (users who feel deceived into a subscription dispute the charge with their bank — a financial loss and merchant account risk), social media negative sentiment (a single viral thread about a cancellation dark pattern can generate more brand damage than months of negative NPS recovery)
  6. Pivot Triggers — If the VP of Growth proceeds with implementation over the designer's objection, escalate to the Chief Product Officer or General Counsel (the legal exposure is the cleanest escalation argument — it is not "I don't like this" but "this exposes the company to CMA enforcement and we have a legal obligation to review it"); document the escalation in writing
  7. Long-Term Evolution Plan — Implement the ethical alternatives and measure their retention impact over 90 days; use the data to build a business case that "honest retention design" outperforms dark patterns over a 12-month measurement window; share this data with the VP of Growth as a collaborative learning, not a "told you so"

3. The Answer

Explicit Assumptions:

  • The company: UK-registered subscription e-commerce company, subject to UK consumer protection law (CMA, Consumer Rights Act, Consumer Contracts Regulations, Consumer Protection from Unfair Trading Regulations)
  • The VP of Growth: a commercially-focused senior leader with a quarterly retention KPI; they are not malicious — they are optimising for their metric within their authority; they may not be aware of the regulatory exposure
  • Current cancellation flow: an online cancellation button in account settings, accessible in 2 clicks from the account page; removal of this button is a significant regression from the current (legally compliant) state
  • The company's NPS: currently 41; the VP of Growth's retention initiative is part of a broader growth strategy that the CEO has endorsed

The Immediate Response: Name the Pattern, State the Risk, Propose the Alternative — In That Order

Request a 30-minute conversation with the VP of Growth before the sprint planning session. The conversation structure: (1) Name the patterns without moralising: "I want to flag some concerns about the four proposed changes before they go to sprint. I want to be direct: all four of these are documented dark patterns — they are named techniques that have been studied extensively for their impact on users and businesses." Do not say "this is unethical" — say "this is a category of UI design technique with a documented risk profile." The naming (dark patterns) and the category framing (documented, studied) moves the conversation from a personal ethics disagreement to a professional design practice conversation. (2) State the legal and business risk specifically, not generically: "Two of the four changes have direct legal exposure in the UK. Removing the online cancellation button and replacing it with a phone-only option may violate the CMA's October 2022 subscription trap guidance — the CMA has already taken enforcement action against companies using exactly this pattern. The pre-ticked auto-renew checkbox at a higher rate is likely a violation of the Consumer Contracts Regulations 2013. I'd recommend we get a legal opinion before implementing either of these." The specific regulatory references (CMA 2022, Consumer Contracts Regulations 2013) signal preparation and credibility — the VP of Growth will take this seriously because it is a business risk, not a designer's preference. (3) Propose the alternative — do not end the conversation at the risk identification: "I want to help you hit the retention target. I believe we can achieve a 10–15% improvement in voluntary churn without any of these changes, and I'd like to propose 4 alternative approaches that I can have designed within the same sprint window."

The 4 Ethical Alternatives

Alternative 1 (replaces removing the cancel button): Design a structured cancellation save flow — sometimes called an "honest offboarding" flow. When a user initiates cancellation, present them with: their personalised usage data ("You've placed 18 orders this year, saving an estimated £340 vs. retail"), a targeted offer (a pause option if they're travelling, a plan downgrade if cost is the concern, a skip-next-delivery if they're overstocked), and a clear "Cancel anyway" button that remains accessible at every step. The offer must be genuine and relevant — not 5 screens of emotional manipulation; the research evidence (Baymard Institute data on SaaS cancellation flows) shows that 15–25% of users who initiate cancellation accept a genuine save offer when the offer is relevant to their stated reason for cancelling. This achieves a real retention improvement without hiding the cancellation option.

Alternative 2 (replaces the 6-step phone cancellation): Remove friction from the cancellation process entirely — and then measure the actual voluntary churn rate. The hypothesis that a frictionless cancellation process increases churn is frequently wrong; Spotify's 2019 redesign of their cancellation flow (making it significantly easier to cancel) resulted in a net reduction in churn because it also removed the negative sentiment that was causing users to cancel earlier in the subscription cycle. A frictionless cancellation is also the legally compliant state the company is currently in.

Alternative 3 (replaces the pre-ticked auto-renew at higher rate): Replace the pre-ticked higher-rate auto-renew with an explicit renewal notification design — a proactive email and in-app notification 30 days before renewal that displays: the current plan and price, the renewal date, any price change (if applicable), and a clear one-click option to either confirm renewal or change plan. Users who receive transparent renewal notifications have significantly lower post-renewal dispute rates than users who are surprised by a higher charge — and post-renewal disputes are a direct financial cost (chargebacks).

Alternative 4 (replaces the fabricated countdown timer): Remove the countdown timer (it is a misrepresentation under the Consumer Protection from Unfair Trading Regulations 2008 and cannot be implemented). Replace the account deletion confirmation page with a genuine exit survey: ask the user why they are leaving (5 options + free text), offer a relevant retention option based on their answer, and confirm the deletion clearly without countdown manipulation. The exit survey data has secondary value: it tells the product team why users are deleting accounts, which is a more valuable insight than a fabricated urgency mechanism.

If the VP of Growth Proceeds Anyway

If the VP of Growth proceeds with the implementation over the designer's documented objection: escalate in writing to the Chief Product Officer and the company's General Counsel. The escalation is not a complaint — it is a risk notification. Write: "I want to flag for your awareness that 2 of the 4 proposed growth changes scheduled for the next sprint may expose the company to regulatory action under [specific regulations]. I raised these concerns with [VP of Growth name] on [date] and proposed alternatives. I want to ensure that a legal review occurs before implementation." This escalation creates a documented record of the design function's risk identification and protects the designer professionally. It also creates the legal review pathway that the regulations require — if the company's legal team agrees with the regulatory concern, the implementation will be paused. If the legal team reviews and approves the implementation: the designer has fulfilled their professional obligation and the implementation risk is the company's informed choice.

Why Dark Patterns Fail the Business Case Over 12 Months

Present this evidence to the VP of Growth in the initial conversation: (1) Short-term retention, long-term churn acceleration: users who feel trapped do not reduce their churn intent — they delay it to the first available opportunity (contract anniversary, billing cycle change) and churn with maximum hostility. (2) Support cost: subscription companies using phone-only cancellation processes report 3–5× higher customer service contact rates from cancellation-intent users who cannot complete the online flow — each contact costs £4–£8 in call centre time. For a £180M revenue company with significant subscriber volume, this is a measurable cost impact. (3) Chargeback risk: users who cannot cancel online dispute the charge with their bank at a significantly elevated rate; chargebacks cost approximately £15–£25 each in processor fees and administrative cost, and a chargeback rate above 1% triggers merchant account review. (4) Regulatory enforcement: the CMA's current subscription trap investigation has resulted in enforcement notices, remediation requirements, and public commitments from named companies — each generating press coverage that is equivalent to a significant brand damage event.

Early Warning Metrics:

  • Complaint escalation rate after any retention design change — measure calls and emails citing difficulty cancelling; any increase above the pre-change baseline is an immediate signal to review the change
  • Chargeback rate — monthly measurement; alert threshold at 0.5% (well below the processor's 1% review trigger, providing a buffer for remediation)
  • NPS movement in the 60-day cohort post-cancellation — users who are retained through the save flow should show an NPS score improvement (they stayed because they got genuine value); users who are retained through dark patterns show NPS score deterioration (they stayed because they felt trapped)

4. Interview Score: 10 / 10

Why this demonstrates staff-level maturity: Naming the specific regulatory instruments (CMA October 2022 subscription trap guidance, Consumer Contracts Regulations 2013, Consumer Protection from Unfair Trading Regulations 2008) rather than making a generic "this is against regulations" statement demonstrates the regulatory literacy that makes a designer's ethical objection commercially credible. The four ethical alternatives — each directly replacing one of the four proposed dark patterns with a design that can achieve a real retention improvement — demonstrates that this designer is a business partner who solves the commercial problem, not a compliance officer who blocks it. The escalation strategy (written escalation to General Counsel as a risk notification, not a complaint) shows the organisational navigation sophistication of a principal-level designer who understands how to protect both the company and the design function's integrity simultaneously.

What differentiates it from mid-level thinking: A mid-level designer would either comply (rationalising that the VP of Growth must know what they are doing) or refuse with a principled but commercially ineffective objection. They would not know about the specific UK regulatory frameworks that create legal exposure, would not have 4 specific ethical alternative designs ready within the same sprint timeline, and would not know to escalate in writing to legal counsel as a risk notification rather than a personal objection.

What would make it a 10/10: This response scores a 10/10 for demonstrating the complete range of skills this question tests: regulatory knowledge, ethical clarity, commercial alternatives, stakeholder management strategy, escalation protocol, and long-term business case evidence. A theoretical improvement would be including a specific worked example of the "honest offboarding" save flow — showing the exact screens, copy, and interaction model that achieves the VP of Growth's retention target through ethical design.


Question 11: Information Architecture — Redesigning Navigation for a Scaled Product

Difficulty: Senior | Role: UX Designer | Level: Senior | Company Examples: Salesforce, HubSpot, Atlassian Jira, Zendesk, Adobe Creative Cloud


The Question

You are a Senior UX Designer at a B2B SaaS company. The product has grown from 12 features at launch to 87 features over 6 years through acquisitions and organic development. User research shows: 41% of users cannot find a feature they know exists without using search; the navigation has 3 inconsistent structures depending on which product area you are in; and new users take an average of 19 days to become proficient, vs. 7 days for a direct competitor. The VP of Product has asked you to redesign the information architecture. However, the existing 50,000 users have learned the current navigation — any restructuring will disrupt their muscle memory. Walk through your IA research process, your redesign approach, and how you manage the transition for existing users.


1. What Is This Question Testing?

  • IA research methodology — knowing which research methods answer IA-specific questions: card sorting (open card sort to discover how users mentally categorise features; closed card sort to test a proposed structure) and tree testing (testing navigation findability in a text-only prototype without visual design cues — isolating the IA from the interface); a designer who skips these methods and redesigns the IA from personal intuition will produce a structure that reflects the designer's mental model, not the users'
  • Scale complexity — redesigning the IA of a product with 87 features across multiple acquired codebases is not a single-designer task; the designer must manage the research, the synthesis, and the design validation while engaging the product managers for each feature area (who have strong views about where their feature should live in the hierarchy)
  • Transition design — changing the navigation of a 50,000-user product is a significant disruption event; the transition design (how existing users learn the new structure without losing productivity) is as important as the new IA itself; this is an area where junior designers routinely underinvest because the transition is not visible in a prototype
  • Analytical rigour — the 19-day proficiency gap vs. the competitor's 7 days is a measurable business impact; the IA redesign must be validated against this specific metric, not just against a findability improvement in research conditions
  • Stakeholder management — 87 features owned by multiple product managers means 87 opinions about where each feature should appear in the navigation; the IA designer must facilitate this negotiation using research data as the arbiter rather than allowing organisational politics to determine the navigation hierarchy
  • Systems thinking — an IA redesign affects more than navigation: it affects the URL structure (SEO implications for web), help documentation (every help article that references navigation must be updated), onboarding flows (the new user onboarding must teach the new structure), API documentation (if the product has a public API with path-based endpoints), and marketing materials (product screenshots showing the old navigation); the full scope of the redesign must be understood before it is committed to

2. Framework: Information Architecture Redesign Model (IARM)

  1. Assumption Documentation — Audit the 87 features: what is the usage frequency of each feature (monthly active users per feature, available from Amplitude/Mixpanel)? What percentage are rarely used (<5% of MAU) and could be demoted from primary navigation to a secondary location? What are the 15 features that 80% of users use weekly — these are the navigation's primary real estate tenants
  2. Constraint Analysis — 50,000 existing users with learned navigation patterns, multiple product managers with political stakes in navigation placement, 3 inconsistent navigation structures that must be unified, SEO and URL structure implications of any restructuring
  3. Tradeoff Evaluation — Big-bang migration (change all navigation simultaneously on a release date) vs. phased migration (introduce a new navigation structure alongside the old, allowing users to switch voluntarily before the old structure is retired); phased migration is lower risk but requires maintaining two navigation systems in parallel for 3–6 months — a significant engineering cost
  4. Hidden Cost Identification — Help documentation update cost: 87 features with an average of 3 help articles each = 261 help articles that reference navigation; every article that says "go to Settings > Integrations > Webhooks" must be updated to reflect the new path; this is a significant content operations cost that is almost never included in the IA redesign estimate
  5. Risk Signals / Early Warning Metrics — First-attempt navigation success rate (tree testing result for the new structure — target 75%+ before any production release), search usage rate post-redesign (if search usage increases after the IA redesign, users are still not finding things through navigation — the redesign has not solved the problem), time-to-proficiency for new users in the 90-day cohort after release (the primary success metric — target 7 days, matching the competitor benchmark)
  6. Pivot Triggers — If tree testing of the proposed new IA shows first-attempt success rates below 65% for the 5 highest-priority tasks, the IA structure is not an improvement over the current state; return to the card sort data and identify where the proposed structure diverges from the users' mental model; do not release an IA with sub-65% tree test success rates
  7. Long-Term Evolution Plan — IA redesign V1: unify the 3 navigation structures, establish a coherent primary navigation with 6–8 top-level sections; IA governance: establish a navigation committee (representatives from design, product, and engineering) that reviews all new feature placements before they are released, preventing the IA fragmentation that led to the current state

3. The Answer

Explicit Assumptions:

  • The product: a marketing automation platform; the 87 features span: campaigns (email, social, paid), contacts and segmentation, analytics and reporting, integrations, account settings, and a recently acquired CRM module that has its own navigation structure
  • Usage frequency audit: top 15 features by MAU account for 78% of all feature interactions; 31 features have <5% MAU (these are likely candidates for secondary navigation placement)
  • The 3 inconsistent navigation structures: the core product uses a left sidebar; the analytics module uses a top navigation bar; the acquired CRM uses a mega-menu
  • Research tools available: Optimal Workshop (card sorting and tree testing), Amplitude (usage data), UserZoom (moderated usability testing)

Phase 1: Feature Prioritisation and Usage Analysis

Before any user research, build the feature prioritisation matrix from Amplitude data: plot all 87 features on a 2×2 grid of usage frequency (x-axis) vs. user-reported importance (from a quarterly survey question: "how important is this feature to your daily workflow?"). This produces 4 quadrants: high frequency + high importance (core navigation — always visible), low frequency + high importance (accessible but not primary — second level of navigation), high frequency + low importance (power user shortcuts — may need persistent access for a subset), low frequency + low importance (candidates for progressive disclosure, settings burial, or removal from navigation entirely). The 31 features with <5% MAU are not automatically demoted — some are critical but infrequent (for example, "export account data" is important once a year but not monthly); the user-reported importance dimension prevents demoting features that users value but rarely need. Share this matrix with all affected product managers before the user research begins — it establishes that navigation placement will be data-driven, not decided by who argues most loudly.
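
A minimal sketch of the quadrant classification this matrix performs, assuming a hypothetical per-feature export of MAU percentage and mean survey importance score (the 5% and 3.5/5 cut-offs are illustrative, not fixed rules):

```python
# Sketch of the Phase 1 prioritisation matrix. Thresholds and the example
# feature records are illustrative assumptions, not values from a real export.
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    mau_pct: float     # % of monthly active users who touch the feature
    importance: float  # mean survey score, 1-5 ("how important is this to your daily workflow?")

def quadrant(f: Feature, freq_cut: float = 5.0, imp_cut: float = 3.5) -> str:
    high_freq = f.mau_pct >= freq_cut
    high_imp = f.importance >= imp_cut
    if high_freq and high_imp:
        return "core navigation (always visible)"
    if not high_freq and high_imp:
        return "secondary navigation (accessible, not primary)"
    if high_freq and not high_imp:
        return "power-user shortcut (persistent access for a subset)"
    return "progressive disclosure / settings / removal candidate"

features = [
    Feature("Email campaigns", mau_pct=62.0, importance=4.7),
    Feature("Export account data", mau_pct=1.2, importance=4.1),  # critical but infrequent
    Feature("Legacy RSS blocks", mau_pct=0.8, importance=1.9),
]
for f in features:
    print(f"{f.name}: {quadrant(f)}")
```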

Phase 2: Open Card Sort to Discover Mental Models

Run an open card sort with 20 participants (10 existing power users, 10 users with less than 6 months tenure — the mental model difference between these two groups is diagnostically important). Each participant receives a deck of roughly 60 cards (one per feature, with the feature name and a one-sentence description; the lowest-usage, lowest-importance features from the Phase 1 matrix are excluded to keep the sort manageable) and groups them into categories of their own choosing, naming each category. The open card sort reveals: how users mentally categorise the features (which features belong together in users' minds, regardless of the current navigation structure), the language users use to describe feature groups (their category names become the candidate navigation labels — often different from the product team's internal terminology), and the difference between power user and newer user mental models (power users may have a workflow-based mental model; newer users may have an object-based model — this tension shapes the IA strategy). Analyse with Optimal Workshop's dendrogram and similarity matrix — features that are consistently co-grouped by 70%+ of participants belong together in the navigation; features that are grouped inconsistently require further investigation.
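
A minimal sketch of the co-grouping calculation behind a similarity matrix, assuming raw open-card-sort data in the shape shown (the participant groupings are invented for illustration):

```python
# For each pair of features, what share of participants placed them in the
# same group? Pairs at 70%+ are candidates to sit together in the navigation.
from itertools import combinations
from collections import defaultdict

# Each participant's sort: {category_name: [feature, ...]}
sorts = [
    {"Campaigns": ["Email", "Social", "Automations"], "Data": ["Contacts", "Segments"]},
    {"Sending stuff": ["Email", "Social"], "People": ["Contacts", "Segments", "Automations"]},
]

pair_counts = defaultdict(int)
for sort in sorts:
    for group in sort.values():
        for a, b in combinations(sorted(group), 2):
            pair_counts[(a, b)] += 1

n = len(sorts)
for (a, b), count in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
    rate = count / n
    flag = "co-group in navigation" if rate >= 0.7 else "investigate further"
    print(f"{a} + {b}: {rate:.0%} of participants ({flag})")
```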

Phase 3: Propose and Tree Test

From the card sort synthesis, design 2 proposed IA structures (not just 1 — having a comparator enables the tree test to identify which structure is better, not just whether a single structure meets the minimum standard). Each proposed IA has 6–8 top-level sections with a maximum of 3 levels of nesting (flat hierarchies are faster to navigate — every additional level of nesting adds approximately 2 seconds to the average navigation task time in eye-tracking studies). Run a tree test with 50 participants (Optimal Workshop Treejack) for each of the 2 proposed structures. Tree testing uses a text-only representation of the navigation hierarchy — no visual design, no search, no breadcrumbs — to isolate IA findability from interface design. Test 8 tasks representing the 8 most common navigation scenarios: "find where to create a new email campaign," "find where to view your contact database," "find where to set up a Salesforce integration." For each task, measure: first-attempt success rate (did the participant select the correct location on the first try?), directness (did they navigate directly or backtrack?), and task completion rate (did they find the correct location at all?). Target for the winning structure: 75%+ first-attempt success rate across all 8 tasks. The tree test eliminates the navigation political debate — product managers whose feature tested at 55% first-attempt success rate cannot argue that the placement is correct based on personal opinion.
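
A minimal sketch of how the per-task tree-test metrics roll up against the 75% exit bar, assuming one result record per participant per task (the records below are invented placeholders):

```python
# Score a tree test: first-attempt success, directness, and completion per task.
from dataclasses import dataclass

@dataclass
class TreeTestResult:
    task: str
    success: bool        # ended on the correct node
    first_attempt: bool  # no backtracking before the final selection
    direct: bool         # followed the shortest path to the answer

def summarise(results: list[TreeTestResult], target: float = 0.75) -> None:
    for task in sorted({r.task for r in results}):
        rows = [r for r in results if r.task == task]
        n = len(rows)
        first = sum(r.success and r.first_attempt for r in rows) / n
        direct = sum(r.direct for r in rows) / n
        completion = sum(r.success for r in rows) / n
        verdict = "meets target" if first >= target else "below target: revisit the card sort data"
        print(f"{task}: first-attempt {first:.0%}, directness {direct:.0%}, "
              f"completion {completion:.0%} ({verdict})")

summarise([
    TreeTestResult("Create a new email campaign", True, True, True),
    TreeTestResult("Create a new email campaign", True, False, False),
    TreeTestResult("Set up a Salesforce integration", False, False, False),
])
```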

The Redesigned IA: From 3 Structures to 1

The winning structure (from the tree test data) consolidates the 3 inconsistent navigation systems into a single persistent left sidebar with 7 top-level sections: Campaigns (email, social, paid, automations — unified under the workflow output, not the channel), Contacts (contact database, segmentation, lists — the CRM module's contacts are now unified here, not in a separate module), Analytics (all reporting across all channels — previously split between the core analytics module and channel-specific reporting tabs), Integrations (all third-party connections — previously buried in Settings), Templates (email templates, landing pages, form templates — previously in 3 different locations), Settings (account, billing, team management, API keys), and Help (documentation, onboarding, contact support). The CRM module's top navigation and the analytics module's top navigation are retired; both modules' features are absorbed into the primary sidebar structure. This eliminates the 3-structure inconsistency and provides a single navigational model across the entire product.

Managing the Transition for 50,000 Existing Users

A navigation change for 50,000 users cannot be a silent release. The transition strategy has 4 components: (1) Announcement before release: 30 days before the new navigation goes live, publish a "What's changing" in-app notification and email to all users. Show a side-by-side map of the old and new navigation structure — "Where does X go?" for every top-level section. The 30-day advance notice allows power users to mentally prepare for the change without experiencing it as a surprise. (2) Guided first-run experience: on first login after the release, present a 3-step interactive onboarding overlay that highlights the 3 locations that have moved most significantly (based on the tree test data — the tasks with the lowest current first-attempt success rate are the ones most likely to confuse users post-migration). The overlay must be skippable and re-accessible from the Help menu. (3) Legacy redirects: for 60 days post-release, the old URLs and navigation entry points (for users who reach the product via a direct bookmark) redirect to the new location with a one-time banner: "Analytics has moved to [new location]. Update your bookmark." After 60 days, the redirect is removed. (4) In-context help tooltips: for the 5 features that moved most significantly, display a one-time contextual tooltip on first access: "You found it! This used to be in [old location]. It now lives here." This converts the muscle-memory disruption moment (the user navigated to the old location and found nothing) into a learning moment rather than a frustration moment.

Early Warning Metrics:

  • Search usage rate in the 30 days post-release — if search usage increases compared to the 30-day pre-release baseline, users are not finding features through the new navigation; investigate which search queries are most common (these are the features whose new location is not being found) and add targeted in-context help for those specific features
  • Navigation support ticket volume — monitor the support team's ticket queue for navigation-related issues ("I can't find X," "where did Y go?") in the 30 days post-release; target: less than 3× the pre-release baseline; if above this threshold, consider extending the legacy redirect period and adding additional in-context help
  • Time-to-proficiency for new users at Month 3 post-release — measure the 90-day cohort's time from sign-up to first completion of 5 core tasks; compare to the pre-redesign baseline (19 days) and the competitor benchmark (7 days); this is the primary success metric that justifies the redesign investment
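
A minimal sketch of the time-to-proficiency calculation, assuming proficiency is defined as the first day on which a user has completed each of 5 core tasks at least once (the task names and example events are illustrative assumptions, not the product's actual event schema):

```python
# Compute days from sign-up to proficiency for one user, then a cohort median.
from datetime import date
from statistics import median

CORE_TASKS = {"create_campaign", "import_contacts", "build_segment",
              "view_report", "connect_integration"}

def days_to_proficiency(signup: date, events: list[tuple[str, date]]) -> int | None:
    """events: (task_name, completion_date) pairs for one user, in any order."""
    first_done: dict[str, date] = {}
    for task, day in events:
        if task in CORE_TASKS and (task not in first_done or day < first_done[task]):
            first_done[task] = day
    if set(first_done) != CORE_TASKS:
        return None  # user never reached proficiency in the observation window
    return (max(first_done.values()) - signup).days

cohort = [
    days_to_proficiency(date(2024, 3, 1), [
        ("create_campaign", date(2024, 3, 2)), ("import_contacts", date(2024, 3, 2)),
        ("build_segment", date(2024, 3, 4)), ("view_report", date(2024, 3, 5)),
        ("connect_integration", date(2024, 3, 9)),
    ]),
]
proficient = [d for d in cohort if d is not None]
print(f"median time-to-proficiency: {median(proficient)} days" if proficient else "no proficient users yet")
```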

4. Interview Score: 9 / 10

Why this demonstrates senior-level maturity: The feature prioritisation matrix (usage frequency × user-reported importance) as the pre-research input that depoliticises navigation placement decisions is the organisational intelligence that makes the IA redesign process defensible to 87 product managers. Running 2 proposed IA structures through tree testing (not 1) produces comparative data that identifies the better structure rather than simply validating a single proposal. The transition strategy — 30-day advance notice, guided first-run overlay, 60-day legacy redirects, in-context "you found it" tooltips — covers the full user impact of the change, not just the design of the new state.

What differentiates it from mid-level thinking: A mid-level designer would redesign the navigation based on personal judgement about how features should be grouped, validate it with a quick 5-person usability test, and release it with a "what's new" email. They would not know about card sorting and tree testing as specific IA research methods, would not build the feature prioritisation matrix from analytics data, and would not think about the help documentation update cost, the URL structure implications, or the 60-day legacy redirect strategy.

What would make it a 10/10: A 10/10 response would include a specific Optimal Workshop tree test configuration (showing the 8 task prompts, the scoring methodology, and the minimum participant count for statistical confidence), a worked dendrogram analysis showing how card sort results are interpreted into navigation groupings, and a complete transition communication plan with the specific copy for the 30-day advance notice email.



Question 12: Content Design — Writing UX Copy That Drives Action

Difficulty: Senior | Role: UX Designer / Content Designer | Level: Senior | Company Examples: Monzo, Duolingo, Mailchimp, Stripe, GOV.UK


The Question

You are a Senior UX Designer responsible for the content design of a B2C personal finance app. A usability study has found 3 specific copy problems: (1) The empty state on the spending analysis screen reads "No data to display" — 67% of test participants did not understand what action to take; (2) The error message when a bank connection fails reads "Error 403: Authentication failure" — 54% of participants reported feeling anxious or confused; (3) The onboarding permission request screen for push notifications reads "Allow notifications to receive updates" — the opt-in rate is 23%, vs. an industry benchmark of 48% for personal finance apps. Rewrite the copy for all 3 scenarios, explain the content design principles behind each rewrite, and walk through your process for ensuring copy quality across a product with 200+ screens.


1. What Is This Question Testing?

  • Content design expertise — understanding that UX copy is not copywriting (marketing language designed to persuade) and not technical writing (documentation designed to inform) — it is content design (words designed to help users accomplish tasks); good UX copy is invisible when it works and immediately noticeable when it fails
  • Empathy and emotional intelligence — the "Error 403: Authentication failure" message causing anxiety is a signal that the copy has failed its most basic job: to help a user recover from a problem without making them feel confused, blamed, or anxious; error messages are the highest-stakes copy in a product because they occur at moments of user frustration
  • Conversion and behavioural design — the 23% notification opt-in rate (vs. 48% benchmark) is a content design problem with a measurable business impact; the current copy ("receive updates") does not communicate value; the rewrite must communicate specific, personal, and immediate value to the user's specific context (a personal finance app user cares about unusual spending alerts, payment reminders, and budget warnings — not "updates")
  • Content principles — knowing the content design principles that govern every word decision: plain language (write for a Grade 8 reading level), action-first (lead with the verb the user should take, not a description of the situation), human tone (write as if speaking to an intelligent adult, not as if writing a legal notice), and progressive disclosure (say the minimum necessary at the moment of the task; provide more on demand)
  • Process thinking — maintaining copy quality across 200+ screens requires a content design system: a voice and tone guide, a content pattern library (standard patterns for empty states, errors, confirmation messages, and permission requests), and a content review process embedded in the design workflow; without these, copy quality regresses to the individual designer's or developer's default language
  • Collaboration — content design touches every screen; in most product teams, copy is written by designers, developers, product managers, and marketing simultaneously without coordination; the senior content designer must establish a process that creates consistency without requiring their personal review of every string

2. Framework: Content Design Quality Model (CDQM)

  1. Assumption Documentation — Establish the app's voice and tone baseline: is the brand voice established (Monzo-style conversational vs. Barclays-style formal)? Is there an existing content style guide? What is the target reading level? What emotional register is appropriate for financial content (reassuring without being condescending, honest without being alarming)?
  2. Constraint Analysis — 200+ screens with copy written by multiple authors over the product's history; character limits on mobile (especially iOS notification copy: maximum 63 characters on lock screen before truncation); localisation requirements (if the app is available in multiple languages, the copy architecture must support translation without losing meaning)
  3. Tradeoff Evaluation — Centralised content review (one content designer reviews all copy — high quality, bottleneck risk) vs. distributed content design with shared standards (all designers write copy following shared standards, with content review at the design system component level rather than individual screen level); for a product with 200+ screens, distributed is the only scalable model
  4. Hidden Cost Identification — Localisation cost: a notification rewrite from "receive updates" to a specific, longer message increases the character count; if the app is localised into 6 languages, longer copy increases translation cost and may cause truncation in languages with longer average word length (German, Finnish); the content designer must balance copy quality against localisation implications
  5. Risk Signals / Early Warning Metrics — Notification opt-in rate post-rewrite (direct before/after measurement using A/B testing of the old vs. new permission request copy), error recovery rate (what percentage of users who see the bank connection error successfully reconnect their bank within the session — measures whether the error message copy is directing users to the correct action), empty state engagement rate (what percentage of users who see the empty state complete the action it recommends)
  6. Pivot Triggers — If the rewritten notification permission copy does not improve opt-in rate above 35% in the first 30-day cohort: the copy is not the primary driver of the low opt-in rate; the problem may be the timing of the request (asking for notifications before the user has experienced any product value) rather than the copy itself
  7. Long-Term Evolution Plan — Immediate: rewrite the 3 flagged copy instances; Month 1: content audit of the highest-traffic 30 screens; Month 2–3: voice and tone guide and content pattern library; Month 4–6: content review integration into the design workflow; Year 1: full content audit and systematic rewrite programme

3. The Answer

Explicit Assumptions:

  • App voice: warm and plain-spoken, like a knowledgeable friend rather than a bank; the app's existing marketing copy uses conversational contractions ("you've," "we'll") and direct address
  • The bank connection error: the user has connected their bank via Open Banking; the connection has expired (requires re-authentication every 90 days under PSD2); the error is recoverable by the user re-authorising the connection
  • The push notification permission request: shown after the user has set up their account and linked their bank, before they have seen any actual notifications; the request is shown once; iOS does not allow a second request if the user declines

Rewrite 1: Empty State — "No data to display"

Current copy: "No data to display" Problem: "No data" is a system-facing description of a database state, not a user-facing explanation of a situation. It does not tell the user why there is no data, whether this is expected, or what they should do. 67% of users not knowing what action to take is the expected outcome of this copy. Content principle applied: action-first + context-aware empty states. An empty state must answer 3 questions for the user: what is this screen for, why is it empty right now, and what can I do to make it useful? Rewrite — Headline: "Your spending analysis starts here" Body: "Connect a bank account to see where your money goes each month." CTA button: "Connect a bank" Why this works: the headline sets positive expectation (this will be useful, not broken). The body explains the cause of the empty state (no bank connected) and the benefit of fixing it (see where your money goes — the specific value proposition of the feature). The CTA button uses a specific verb ("connect") rather than a generic one ("get started") — the user knows exactly what tapping the button will do. The copy also removes the anxiety of "no data" (which could imply a technical error) and replaces it with an invitation to set up.

Rewrite 2: Error Message — "Error 403: Authentication failure"

Current copy: "Error 403: Authentication failure" Problem: error codes (403) are internal server status codes that are meaningless to users and create the impression that something is seriously wrong with the system. "Authentication failure" is a technical term that a large proportion of users will interpret as "I've done something wrong" or "my account has been hacked." The 54% anxiety response is the expected outcome of this copy. Content principles applied: human tone + blame-free framing + recovery path. An error message must: explain what happened in plain language, reassure the user that it is not their fault (if it isn't), and give a specific, actionable next step. Rewrite — Headline: "We need you to reconnect your bank" Body: "Your [Bank Name] connection has expired — this happens every 90 days for your security. It only takes a moment to reconnect." CTA button: "Reconnect [Bank Name]" Secondary link: "Why does this happen?" Why this works: "we need you to" is active and warm — it creates a collaborative frame rather than a failure frame. The explanation ("happens every 90 days for your security") addresses both the cause and the reason, preventing the anxiety of "has something gone wrong?" The "(Bank Name)" personalisation from the Open Banking integration makes the message specific rather than generic — a user who sees "your Barclays connection has expired" understands the exact scope of the problem. "It only takes a moment to reconnect" addresses the anticipated friction of the required action. The secondary "Why does this happen?" link provides progressive disclosure for users who want the fuller explanation without burdening the primary message with a paragraph of PSD2 explanation.

Rewrite 3: Push Notification Permission Request — "Allow notifications to receive updates"

Current copy: "Allow notifications to receive updates" Problem: "receive updates" is the least specific possible description of what push notifications will do for the user. It describes the mechanism (you will receive updates) without describing the value (updates about what? what will that help me do?). A personal finance app user thinking about enabling notifications is weighing: the value of the notifications vs. the perceived intrusion of allowing an app to send push notifications; "receive updates" does not tip this calculation toward opt-in. Content principles applied: value-first framing + specificity + trust signals. The permission request copy must answer: what specific notifications will I receive, why will those be useful to me, and is this going to be noise or signal? Rewrite — Headline: "Stay on top of your money" Body: "Turn on notifications to get: unusual spending alerts before they become a problem, payment reminders so you never miss a bill, and weekly spending summaries sent every Monday." Toggle option below: "Manage exactly which alerts you want" CTA button: "Turn on notifications" Why this works: "stay on top of your money" directly addresses the user's goal (financial control) rather than the product's mechanism. The three bullet points are specific and personal — they describe the actual notifications the user will receive, not a generic promise of "updates." Each bullet is written in terms of benefit to the user, not features of the system ("unusual spending alerts before they become a problem" — not "unusual transaction notifications"). The "manage exactly which alerts you want" link below the copy addresses the #1 objection to notification opt-in (fear of being spammed) by offering granular control before the opt-in decision — this specific addition has been shown to increase notification opt-in rates in fintech by 8–15 percentage points by reducing the perceived cost of opting in. A/B test this rewrite against the original to validate the improvement before releasing to all users.

Maintaining Copy Quality Across 200+ Screens

Three systems that scale content quality without creating a bottleneck: (1) Voice and Tone Guide: a 2-page document (not a 40-page brand bible) that defines the app's voice in 4 adjectives with 3 writing examples and 3 anti-examples for each. For this app: Warm (not patronising), Plain (not jargon-filled), Honest (not alarmist), Direct (not bureaucratic). Every example pair shows the app's actual copy situations: a correct and incorrect version of an error message, a correct and incorrect version of a success confirmation, a correct and incorrect version of a permission request. (2) Content Pattern Library in Figma: standard copy patterns for the 6 most common UI copy situations — empty states, error messages, success confirmations, permission requests, onboarding tooltips, and destructive action confirmations. Each pattern has a copy template with fill-in fields: [FEATURE NAME] is empty because [REASON]. [ACTION] to [BENEFIT]. Each template is a Figma component in the design library — when a designer adds an empty state component, the template copy guides them to write useful copy rather than defaulting to "No data to display." (3) Copy Review Checkpoint: add a "copy review" step to the design handoff process — not a separate copy review session, but a 15-minute copy audit by the content designer (or a trained peer) as part of the existing design review. Review the copy against 5 criteria: Is it written in plain language (Grade 8 or below)? Does it lead with an action verb where action is required? Does it avoid technical jargon and error codes? Does it tell the user what to do next? Is it consistent with the voice and tone guide? This is not a line-edit of every word — it is a pass/fail audit against 5 principles. Fifteen minutes of copy review at design stage prevents the "Error 403: Authentication failure" from reaching 54% of users.
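
A minimal sketch of how part of that 5-criteria audit could be run as a lint-style pre-check (the jargon list, the action-verb list, and the short-sentence proxy for reading level are illustrative assumptions; voice and tone consistency remains a human judgment):

```python
# Rough automated pass/fail check for UX copy strings against 4 of the 5 criteria.
import re

JARGON = {"authentication", "error", "invalid", "failure", "null", "exception"}
ACTION_VERBS = {"connect", "add", "choose", "turn", "reconnect", "try", "check", "update"}

def audit_copy(text: str) -> dict[str, bool]:
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]", text) if s.strip()]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    return {
        "plain language (short sentences as a proxy)": avg_sentence_len <= 18,
        "no technical jargon or error codes": not (set(words) & JARGON)
                                              and not re.search(r"\b\d{3}\b", text),
        "leads with an action verb": bool(words) and words[0] in ACTION_VERBS,
        "tells the user what to do next": bool(set(words) & ACTION_VERBS),
    }

for copy in ["Error 403: Authentication failure",
             "Reconnect your bank. It only takes a moment."]:
    results = audit_copy(copy)
    status = "PASS" if all(results.values()) else "FAIL: " + ", ".join(k for k, v in results.items() if not v)
    print(f"{copy!r} -> {status}")
```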

Early Warning Metrics:

  • Empty state engagement rate — after the rewrite, measure the percentage of users who see the new empty state and tap the CTA (connect a bank) vs. the percentage who leave the screen; target 40%+ CTA tap rate (from an estimated 15% current rate given 67% confusion)
  • Error recovery rate — percentage of users who see the bank reconnection error and successfully reconnect within the same session; target 65%+ recovery rate (current unknown but likely low given the confusing copy)
  • Notification opt-in rate in the 30-day cohort post-rewrite — direct comparison to the current 23% baseline; target 38–42% in the first cohort; A/B test should confirm within 14 days with sufficient statistical power given the app's daily new user volume
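
As a rough check on the 14-day claim, a minimal sample-size sketch for the two-proportion A/B test (95% confidence, 80% power, baseline 23% opt-in, detecting a lift to 35%; the daily new-user figure is an assumption, not a number from the scenario):

```python
# Standard normal-approximation sample size for comparing two proportions.
from math import ceil, sqrt

def sample_size_per_variant(p1: float, p2: float) -> int:
    z_alpha, z_beta = 1.96, 0.84  # two-sided 5% alpha, 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

n = sample_size_per_variant(0.23, 0.35)      # ~224 users per variant
daily_new_users = 40                         # assumption; split evenly across variants
print(f"~{n} users per variant; ~{ceil(2 * n / daily_new_users)} days at {daily_new_users} new users/day")
```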

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: The rewritten notification permission request specifically includes the "manage exactly which alerts you want" link as a mechanism for reducing the perceived cost of opt-in — not just rewriting the headline — which demonstrates knowledge of the specific behavioural barrier (fear of notification spam) that drives the low opt-in rate. The content pattern library as Figma components with copy templates (guiding designers to write useful copy from a fill-in template) is a scalable content quality system, not a process that requires the content designer's personal review of every string. The A/B test recommendation for the notification copy (with the 14-day timeline and statistical power calculation framing) shows measurement discipline.

What differentiates it from mid-level thinking: A mid-level designer would rewrite the 3 copy instances with improved language but would not identify the structural principles behind each rewrite, would not propose the "manage alerts" secondary link as an opt-in conversion mechanism, and would not design the 3-system content quality programme (voice and tone guide, content pattern library, copy review checkpoint) that prevents the same problems from recurring across the other 197 screens.

What would make it a 10/10: A 10/10 response would include the complete 2-page voice and tone guide with all 4 voice dimensions, their example pairs, and their anti-patterns; a specific A/B test plan for the notification copy rewrite showing the success metric, minimum detectable effect, required sample size, and test duration; and a worked content audit template for the 30-screen priority audit.



Question 13: Prototyping and Testing — Validating High-Risk Design Decisions Fast

Difficulty: Senior | Role: UX Designer | Level: Senior | Company Examples: Google Ventures Design Sprint, IDEO, Figma prototyping teams, McKinsey Design


The Question

You are a Senior UX Designer at a healthcare platform company. The product team is about to commit £2.4M in engineering resource to build a new patient appointment self-scheduling feature — a complete redesign of how patients book, modify, and cancel appointments with their GP. The current system requires patients to call the surgery. The proposed feature is complex: it must integrate with 12 different GP surgery appointment systems, handle 8 types of appointment (routine, urgent, telephone, video, home visit, mental health, chronic disease review, baby clinic), enforce surgery-specific booking rules (same-day urgent appointments only before 10am, routine appointments bookable 2–6 weeks in advance), and serve patients aged 18–88 with varying digital literacy. The Head of Engineering believes the design is ready to build. You believe 3 core assumptions in the design have not been validated. Walk through which assumptions you would validate, how you would prototype and test them, and what your exit criteria are for approving the engineering build.


1. What Is This Question Testing?

  • Assumption identification — the most valuable skill in pre-build validation is knowing which assumptions carry the highest risk if wrong; not all design assumptions are equally consequential; a wrong assumption about the button colour has zero engineering rework cost; a wrong assumption about how patients distinguish between 8 appointment types could require a complete redesign of the core booking flow after the £2.4M has been spent
  • Prototyping strategy — knowing which prototype fidelity is appropriate for which assumption test; a high-fidelity prototype is not necessary to test whether patients can distinguish between appointment types — a paper prototype or a low-fidelity Figma prototype tests this cognitive task at a fraction of the time cost; over-investing in fidelity before an assumption is validated is a common designer mistake that delays the learning
  • Exit criteria thinking — "the design is ready to build" is a statement, not a criterion; the correct frame is: what specific evidence would give you confidence that the design is ready to build? exit criteria are the specific pass/fail thresholds that the design must meet before engineering begins; without exit criteria, every design validation produces subjective ("it went okay") rather than actionable ("8 of 10 participants completed the appointment booking task unassisted in under 3 minutes — exit criterion met") outcomes
  • Healthcare domain knowledge — GP appointment booking has specific regulatory and clinical dimensions: same-day urgent appointments have clinical triage implications (a patient who books a same-day slot for a routine matter displaces a patient with genuine urgent need); the design must enforce the surgery's booking rules without creating workarounds; the 8 appointment types are not arbitrary — each has different clinical purposes and different patient needs
  • Risk proportionality — £2.4M in engineering commitment is a high-stakes decision; the cost of validating 3 assumptions (estimated £15K–£25K in research and prototype time) against the rework cost if assumptions are wrong (estimated 20–40% of the £2.4M = £480K–£960K) is a straightforward risk calculation that makes the validation investment self-evidently worthwhile
  • Stakeholder management with engineering — the Head of Engineering believes the design is ready; the UX designer is introducing delay and is accountable for the quality of that judgment; the designer must be specific about what they are validating, how long it will take, and what the exit criteria are — not a vague "we need more research" position that engineering will (reasonably) reject

2. Framework: Pre-Build Assumption Validation Model (PBAVM)

  1. Assumption Documentation — List every assumption in the current design that, if wrong, would require significant rework; rate each assumption by: confidence (how much evidence exists already?), rework cost if wrong (a wrong navigation structure = 40% engineering rework; a wrong button label = 0% rework), and testability (can this assumption be tested in the next 2 weeks?); prioritise the low-confidence, high-rework-cost assumptions
  2. Constraint Analysis — Engineering commitment timeline (the Head of Engineering has a sprint planning session in 2 weeks — the validation must be complete before that session or the assumption remains untested before £2.4M is committed), prototype complexity vs. test validity (the prototype must be realistic enough to test the core assumption but not so detailed that building it takes longer than the insight is worth)
  3. Tradeoff Evaluation — Test everything (comprehensive, takes 6 weeks, engineering waits) vs. test only the 3 highest-risk assumptions (focused, takes 2 weeks, engineering proceeds with validated core assumptions); the answer is always the second option — validating 3 high-risk assumptions in 2 weeks is better than validating 20 assumptions in 6 weeks; the remaining lower-risk assumptions can be validated in beta
  4. Hidden Cost Identification — The cost of a wrong assumption at build stage: if the appointment type selection design is wrong and requires a redesign, the rework affects: the booking flow UI (React components), the backend booking rules engine integration (API redesign), the GP system integrations (12 different connectors), the mobile app (native iOS and Android), and the help documentation; a single wrong assumption cascades to 5 engineering systems
  5. Risk Signals / Early Warning Metrics — Assumption validation pass rate in the first 3 usability tests (if the first 3 participants all fail the same task, do not wait for all 10 — the assumption is failing and the design must change); exit criterion miss by margin (a task completion rate of 72% vs. a 75% exit criterion is a borderline case that requires a design iteration and retest, not a build decision)
  6. Pivot Triggers — If assumption validation reveals a fundamental flaw in the core booking model (not a UI problem but an architectural problem — for example, if patients cannot reliably distinguish between urgent and routine appointment types regardless of how the choice is presented), escalate to the Head of Product immediately; a fundamental architectural flaw discovered before build costs 2 weeks of research; the same flaw discovered after build costs up to £960K in rework
  7. Long-Term Evolution Plan — Pre-build validation: 3 core assumptions; Beta validation: all remaining secondary assumptions with real users in a controlled rollout; Post-launch: continuous usability monitoring via FullStory session recordings and quarterly moderated research sessions with patients from underserved demographics (elderly, low digital literacy)

3. The Answer

Explicit Assumptions:

  • The current design: a 4-step booking flow (appointment type selection → date and time selection → confirmation details → confirmation screen)
  • Integration approach: a single booking UI layer that connects to 12 GP system APIs; the booking rules (same-day urgent before 10am, routine 2–6 weeks advance) are enforced by the API layer, not the UI
  • Patient population: the platform serves 340,000 registered patients; target demographic for the new self-scheduling feature is 18–75 (patients 75+ are excluded from the initial rollout based on the platform's accessibility assessment)

The 3 Highest-Risk Unvalidated Assumptions

Assumption 1 — Patients can reliably distinguish between the 8 appointment types and select the correct one for their need. Risk level: critical. Rework cost if wrong: high (the appointment type is the first decision in the booking flow; if patients cannot make this decision confidently and accurately, the entire subsequent booking flow breaks down; incorrect appointment type selection is also a clinical safety risk — a patient who books a routine appointment for chest pain is not getting appropriate triage). Current evidence: zero — the 8 appointment types were defined by the clinical and operational team, not validated with patients. Prototype needed: low-fidelity — a Figma prototype showing only the appointment type selection screen with the 8 options as currently designed; no further screens needed for this test. Test: 10 participants (age range 25–70, recruited to match the platform's patient demographic). Present each participant with 5 clinical scenarios ("You've been having low-level back pain for 3 weeks. Which appointment type would you choose?") and ask them to select an appointment type for each scenario. Exit criterion: 80% of selections are clinically appropriate for the scenario (as validated by a clinical advisor on the project team). If below 80%: the appointment type labels and descriptions require redesign before the build.

Assumption 2 — Patients can navigate the booking rules (same-day urgent only before 10am, routine 2–6 weeks advance) without assistance. Risk level: high. Rework cost if wrong: moderate (the booking rules enforcement is in the API layer, but the UI's error states and recovery flows for rule violations are a significant design and engineering surface). Current evidence: the booking rules error states have been designed but not tested with users in realistic time-pressure scenarios. Prototype needed: medium-fidelity — a Figma prototype that simulates the 3 most common booking rule violations: attempting to book a same-day urgent appointment after 10am, attempting to book a routine appointment for tomorrow (1 day in advance), and attempting to book a routine appointment for 8 weeks in advance (outside the 6-week window). Test: 8 participants. Task: attempt to book a same-day urgent appointment on a prototype set to 11:30am. Observe whether participants understand the error message and find an alternative action (book a telephone triage call instead). Exit criterion: 7 of 8 participants who encounter a booking rule violation successfully find an alternative booking option within 2 minutes without assistance. If below this: the error states and recovery flows require redesign.

Assumption 3 — Patients with low digital literacy (aged 60–75) can complete a booking unassisted in under 5 minutes. Risk level: high. Rework cost if wrong: high (the feature has been designed primarily by and for a digital-native user experience; if the 60–75 demographic — a significant portion of GP appointment demand — cannot use it unassisted, the feature will not reduce call volume for this segment, directly undermining the business case). Current evidence: all usability testing to date has been conducted with participants aged 25–45. Prototype needed: high-fidelity interactive prototype (for this demographic, a low-fidelity prototype introduces uncertainty about whether the interaction failures are prototype limitations or design failures; the higher fidelity reduces this confound). Test: 8 participants aged 60–75 with self-reported low-to-moderate smartphone confidence. Task: book a routine appointment with your GP for a repeat prescription review, using the prototype on an iPhone SE (small screen — the device size this demographic commonly uses). Exit criterion: 6 of 8 participants complete the booking unassisted in under 5 minutes with a task success rate of 100% (no wrong path selections that required backtracking to the start). If below this: redesign the flow for the 60–75 demographic before the engineering build; consider adding an "I need help booking" option that connects to a phone booking assistant.

Presenting the Validation Plan to the Head of Engineering

The conversation with the Head of Engineering must be specific on three dimensions: what is being validated (the 3 assumptions above), how long it will take (2 weeks — 1 week to build the 3 prototypes and recruit participants, 1 week to conduct and analyse the sessions), and what the exit criteria are (the specific pass/fail thresholds above). Frame it as a risk calculation: "These 3 assumptions, if wrong, carry an estimated rework cost of £480K–£960K based on the scope of affected engineering systems. The validation takes 2 weeks and costs approximately £12K in researcher and designer time. Even at the lower rework estimate, the validation has a 40:1 return. I'm not asking to delay the build — I'm asking for 2 weeks to derisk £2.4M of engineering investment." This is not a "we need more research" argument — it is a financial risk management argument that engineers and product leaders respond to.
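
The arithmetic behind that framing, using only the figures stated in this answer:

```python
# Worked risk calculation: rework exposure vs. validation cost.
engineering_commitment = 2_400_000             # £ committed build
rework_share_low, rework_share_high = 0.20, 0.40
validation_cost = 12_000                       # £ researcher + designer time for 2 weeks

rework_low = engineering_commitment * rework_share_low    # £480,000
rework_high = engineering_commitment * rework_share_high  # £960,000
print(f"rework exposure if assumptions are wrong: £{rework_low:,.0f}-£{rework_high:,.0f}")
print(f"return on validation at the low estimate: {rework_low / validation_cost:.0f}:1")
```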

Exit Criteria Summary

Assumption 1 (appointment type selection): 80%+ of test selections are clinically appropriate across 5 scenarios for 10 participants. Assumption 2 (booking rules navigation): 7 of 8 participants find an alternative booking option within 2 minutes of a rule violation. Assumption 3 (60–75 demographic): 6 of 8 participants complete the booking unassisted in under 5 minutes with 100% task success rate. All 3 exit criteria must be met before the engineering build is approved. If any criterion is not met: a design sprint addresses the specific failure, followed by a retest of the failed assumption only (not a full retest of all 3). The retest takes 1 week. Maximum total delay to the engineering start: 3 weeks (2-week validation + 1-week retest of any failed assumption). This is proportionate to a £2.4M engineering investment.
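
A minimal sketch of how the three exit criteria could be recorded as binary gates at the end of the validation (the observed values below are placeholders, not research results):

```python
# Exit criteria as pass/fail gates: all must pass before the build is approved.
from dataclasses import dataclass

@dataclass
class ExitCriterion:
    assumption: str
    observed: float
    threshold: float

    @property
    def passed(self) -> bool:
        return self.observed >= self.threshold

criteria = [
    ExitCriterion("1: clinically appropriate appointment type selections", observed=0.84, threshold=0.80),
    ExitCriterion("2: rule-violation recovery within 2 minutes", observed=7 / 8, threshold=7 / 8),
    ExitCriterion("3: 60-75 demographic unassisted completion under 5 minutes", observed=5 / 8, threshold=6 / 8),
]

failed = [c for c in criteria if not c.passed]
for c in criteria:
    print(f"Assumption {c.assumption}: {c.observed:.0%} vs {c.threshold:.0%} -> {'PASS' if c.passed else 'FAIL'}")
print("Approve engineering build" if not failed
      else f"Hold build: design sprint + 1-week retest of {len(failed)} failed assumption(s)")
```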

Early Warning Metrics:

  • Assumption validation pass rate in the first 3 participants — if the first 3 participants all fail the same task in Assumption 1 or 3 testing, stop the test and redesign; do not run all 10 participants on a design that is clearly failing; restart with the redesigned version
  • Prototype completion time — if building the medium- or high-fidelity prototypes for Assumptions 2 and 3 takes more than 4 days: scope down the prototype to the minimum needed to test the assumption; a prototype that takes 8 days to build pushes the whole exercise well past the 2-week validation window — adjust scope immediately

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: Framing the validation plan as a financial risk calculation (£12K cost to prevent £480K–£960K rework risk) rather than as a "we need more research" argument is the language that makes engineering and product leadership respond to UX validation requests. The 3 assumptions are specifically chosen by rework cost if wrong (not by researcher interest or comprehensiveness) — clinical appointment type confusion is a safety risk with cascading API implications; the 60–75 demographic failure invalidates the entire call-reduction business case. The exit criteria are quantitative, specific, and binary (pass/fail) — eliminating the "it went pretty well" subjective validation that engineering leaders rightly reject.

What differentiates it from mid-level thinking: A mid-level designer would say "I'd like to do more user testing before we build" without specifying which assumptions, what prototype fidelity, how many participants, or what exit criteria define success. They would not calculate the rework cost of wrong assumptions, would not differentiate prototype fidelity by assumption type (low-fidelity for Assumption 1, high-fidelity for Assumption 3), and would not have the stakeholder management strategy for the Head of Engineering conversation.

What would make it a 10/10: A 10/10 response would include a complete assumption register for the full booking feature (all assumptions rated by confidence and rework cost), a specific clinical advisor consultation protocol for validating the appointment type selection test outcomes, and a retest protocol document showing exactly how a failed assumption triggers a design sprint and what the minimum acceptable retest evidence is.



Question 14: Design Critique — Giving and Receiving Feedback That Improves Work

Difficulty: Senior | Role: UX Designer | Level: Senior / Staff | Company Examples: Airbnb design culture, Google design reviews, IDEO critique methodology, Spotify design guilds


The Question

You are a Senior UX Designer at a 60-person product company with 8 designers. You have been asked by the Head of Design to improve the team's design critique process. The current state: critiques are ad-hoc, attendance is inconsistent, feedback is often vague ("I like it" / "I don't like it") or overly personal ("this feels off"), junior designers leave critiques feeling discouraged rather than informed, senior designers rarely show work-in-progress because they feel critique sessions are not worth the preparation time, and designs are sometimes changed based on the most recent feedback from the most senior person in the room rather than the most useful feedback. The Head of Design wants a critique process that improves design quality, develops junior designers, and creates psychological safety. Design the new critique process.


1. What Is This Question Testing?

  • Design critique expertise — understanding the difference between critique (evaluating design decisions against design principles and user goals) and opinion (personal aesthetic preference); vague feedback ("this feels off") is opinion; structured feedback ("the visual weight of the primary CTA is competing with the secondary action — here is why that creates a hierarchy problem for the user's decision making") is critique; a structured critique process prevents the former while enabling the latter
  • Psychological safety in design teams — the junior designer discouragement and senior designer avoidance are both symptoms of a psychologically unsafe critique environment; psychological safety in critique means: your work can be challenged without your competence or worth being challenged; this requires both process design (structured, principle-based feedback format) and cultural norms (critique the work, not the person; no solutions before the problem is understood)
  • Facilitation design — a critique session is a facilitated process, not a free-form meeting; the facilitator's job is to ensure that every designer gets equal quality feedback, that the most senior voice does not dominate, and that the session produces actionable design improvements rather than a list of preferences
  • Organisational thinking — a critique process that senior designers avoid is a critique process that does not improve the senior designers' work; the most valuable feedback in a design team flows in both directions (junior designers often notice UI inconsistencies and interaction patterns that senior designers overlook because of proximity to the work); designing a process that creates bidirectional feedback is both a quality and a culture goal
  • Measurement orientation — a new critique process must be evaluated against the problems it was designed to solve; measuring design quality improvement is hard but proxies exist: the number of significant design changes made post-critique (a high number means critique is discovering real problems), the proportion of critiques attended by senior designers (a measure of perceived value), and a quarterly designer satisfaction survey question about critique quality
  • Scalability — a 60-person company with 8 designers is a manageable scale for a structured critique process; the process must be designed to work with as few as 2 designers (a junior designer showing work to their design lead) and as many as 8 (a full-team critique); a process that requires 8 people to generate value is fragile

2. Framework: Design Critique Process Design Model (DCPDM)

  1. Assumption Documentation — Understand the current critique failure modes in detail: are critiques failing because of format (no structure), culture (senior voice dominance), or context (designers show finished work rather than work-in-progress, making critique feel like a judgment rather than a collaboration)? Each failure mode requires a different fix
  2. Constraint Analysis — 8 designers at various seniority levels, ad-hoc scheduling, senior designer time is scarce (they will not attend long, low-value sessions), junior designers need safety to show early work
  3. Tradeoff Evaluation — Highly structured critique (maximum consistency, may feel too formal for an 8-person design team) vs. lightly facilitated critique (more conversational, higher risk of reverting to old patterns without an experienced facilitator); for a team that currently has an unsafe critique culture, structure is the correct initial investment — reduce structure after the culture improves
  4. Hidden Cost Identification — The cost of not fixing the critique process: senior designers who avoid critique lose the benefit of external perspective on their work — their designs reflect one person's judgment rather than a team's; junior designers who leave critiques feeling discouraged disengage from the design process and leave the company sooner; designs that change based on the last senior voice rather than the best argument ship worse products
  5. Risk Signals / Early Warning Metrics — Senior designer critique attendance rate (target 80%+ of structured critique sessions attended by at least one senior designer), junior designer confidence score in the post-critique survey ("I understand what to do differently as a result of today's critique" — target 4/5 or above), design change quality score (did the design improve after the critique? — a manager judgment on whether critique led to better or worse decisions)
  6. Pivot Triggers — If the structured critique process is consistently running over time (60+ minutes for a designed 45-minute session), the number of designs presented per session is too high; reduce to one design per session and use the remaining time for discussion depth; a critique that covers 5 designs in 60 minutes produces 12 minutes of feedback per design — not enough for a meaningful improvement
  7. Long-Term Evolution Plan — Month 1–2: introduce structured critique format, train facilitators; Month 3–4: calibrate against feedback from the team; Month 5–6: introduce peer critique pairing (junior designers critique each other's work with a senior designer as a non-speaking observer); Year 1: annual critique culture retrospective

3. The Answer

Explicit Assumptions:

  • Current critique frequency: ad-hoc, roughly once per week if someone schedules it; attendance ranges from 3 to 8 people
  • Current critique format: designer presents their work, attendees give feedback in no particular order, session ends when conversation runs out
  • The Head of Design's availability: 50% time in design work, 50% in leadership; they will facilitate the first 4 structured critique sessions to model the format before handing facilitation to senior designers

The Problem Diagnosis Before the Solution

Before designing the new process, identify the root cause of each failure mode: "I like it / I don't like it" feedback is caused by the absence of a shared critical language — designers without a common vocabulary for design decisions default to personal aesthetic responses. Fix: establish a shared design principles framework (3–5 principles specific to the company's product and user needs) that all critique references. "Junior designers leave feeling discouraged" is caused by critique conflating the designer with the design — "this doesn't work" feels like "you don't work." Fix: explicit culture norms and a critique format that separates observation ("what I see") from interpretation ("what this might mean for the user") from suggestion ("what I might try"). "Senior designers avoid critique" is caused by low ROI perception — 60-minute sessions that produce vague feedback are not worth a senior designer's calendar. Fix: a 45-minute structured format with a 15-minute focused feedback window per design and a guaranteed actionable output. "Design changes based on the last senior voice" is caused by no decision framework — without a shared set of criteria, seniority becomes the tiebreaker. Fix: decision-making criteria established at the start of each critique (what problem are we solving? who is the user? what does success look like?) that ground the feedback in shared context.

The New Critique Format: C.O.R.E.

The structured critique session follows the C.O.R.E. format: Context, Observe, Respond, Edit. Each session covers one design; each session is 45 minutes; attendance target is 4–6 designers including at least one senior.

Context (5 minutes — presenter only): the designer presenting their work sets the context: "This is [feature/screen name]. The user's goal is [specific task]. The design question I'm trying to answer is [specific question]. I want feedback specifically on [aspect of the design]." The "what I want feedback on" instruction is critical — it prevents the session from becoming a general design review and focuses the critique on the designer's actual uncertainty. This step also prevents senior designers from redirecting the session to their pet concerns rather than the designer's real questions.

Observe (10 minutes — audience only, no feedback yet): the audience silently reviews the design and writes down observations on individual sticky notes (one observation per note, in Figma's collaborative sticky notes or on a physical board). An observation is a factual statement about what the reviewer sees: "The primary CTA is the same size and weight as the secondary action." Not an opinion: "The CTA looks wrong." Not a solution: "The CTA should be bigger." After 10 minutes: each reviewer reads their observations aloud, placing them on the board. The presenter does not respond — they listen. The silent individual observation step prevents group-think (everyone copies the first person who speaks) and ensures that junior and senior designers contribute equally before any hierarchy-influenced filtering occurs.

Respond (20 minutes — structured discussion): the facilitator leads a discussion of the observations. For each observation cluster, the group discusses: does this observation represent a problem for the user's task completion? (references the design context set in the opening). If yes: what design principle does this violate? (references the shared design principles framework). The "design principle" question is the mechanism that converts opinion into critique — "the visual hierarchy doesn't support the user's primary task" is not an opinion, it is a principle-based diagnosis. Solutions are explicitly discouraged in this phase — the critique identifies the problem; the designer solves it.

Edit (10 minutes — presenter only): the designer synthesises the discussion into 3 actionable changes they will make before the next milestone. These are stated aloud: "Based on today's critique, I'm going to [change 1], [change 2], and explore [change 3]." The audience confirms they have understood the synthesis correctly. The session ends. The 3-change synthesis is the antidote to "changed everything after the last senior voice" — the designer owns the interpretation of the feedback and states explicitly what they are taking away.

Culture Norms: The Rules of the Room

Post the following 4 norms visibly at every critique session (on a physical poster or a Figma sticky note pinned to the file): (1) Critique the work, not the designer. "The navigation doesn't help the user" — not "you've made the navigation confusing." (2) Observations before solutions. We describe what we see before we suggest what to change. (3) All voices before seniority. Everyone's observations are shared before any discussion begins. (4) Questions before statements. "I'm curious why you chose X" before "X doesn't work." These norms are not moral statements — they are process instructions that make the critique more effective. The reason "observations before solutions" produces better critique is that a designer who hears "make the CTA bigger" has received a solution to an undiagnosed problem; a designer who hears "the CTA has the same visual weight as the secondary action" has received a diagnosis that they can solve in multiple ways (bigger button, different colour, more whitespace, removing the secondary action) — the designer's own solution is usually better than the first solution suggested by a reviewer.

Developing Junior Designers Through Critique

Three specific mechanisms for junior designer development: (1) Junior designers facilitate critiques from Month 2: facilitation is a skill that develops through practice; asking junior designers to facilitate their peers' critiques (with a senior designer as a silent supporter who only intervenes if the session goes structurally wrong) builds facilitation skills and confidence simultaneously. (2) The "what I'm uncertain about" framing: junior designers are more likely to show work-in-progress if the critique format explicitly legitimises uncertainty; the "design question I'm trying to answer" step in the Context phase signals that work-in-progress is the expected input, not finished work. (3) Post-critique 1:1: the senior designer or design lead schedules a 15-minute 1:1 with the junior designer within 48 hours of their first few critiques to debrief: "How did that feel? Was the feedback useful? What are you going to try?" This 15-minute investment produces significant psychological safety returns.

Early Warning Metrics:

  • Critique session format adherence in the first month — are sessions running over time (longer than 45 minutes)? Are observations being given as solutions? Is the presenter responding during the Observe phase? Track these process adherence indicators in the first 4 sessions; the Head of Design should attend these early sessions and redirect in the moment when the format slips
  • Post-critique confidence score (junior designers specifically) — a 2-question survey sent within 24 hours of each critique: "I understand what to change as a result of today's critique" (1–5) and "I felt safe showing my work today" (1–5); target both questions at 4+; a score of 2 or below on either question indicates a specific session failure that must be debriefed with the facilitator
  • Senior designer attendance rate — track the proportion of structured critique sessions that include at least one senior designer; target 75%+; below 50% by Month 3 indicates the format is not delivering sufficient value for senior designer time; consider reducing session length to 30 minutes or introducing a "senior designer open critique hour" format where the senior designer shows their own work (a minimal tracking sketch for these metrics follows this list)
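
The confidence survey and attendance metrics above are simple enough to compute from a lightweight session log. The sketch below is a minimal illustration, assuming survey responses are recorded per session; the data shapes and helper names are assumptions.

```typescript
// Illustrative sketch: tracking the post-critique survey and senior attendance
// against the thresholds above. Data shapes and helper names are assumptions.

interface CritiqueSession {
  hadSeniorDesigner: boolean;
  clarityScores: number[];   // "I understand what to change" (1–5, junior designers)
  safetyScores: number[];    // "I felt safe showing my work" (1–5, junior designers)
}

const mean = (xs: number[]) =>
  xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;

function reviewSession(s: CritiqueSession): string[] {
  const flags: string[] = [];
  // Target is 4+ on both questions; any individual score of 2 or below needs a debrief.
  if (mean(s.clarityScores) < 4) flags.push("Clarity below target (4+)");
  if (mean(s.safetyScores) < 4) flags.push("Safety below target (4+)");
  if ([...s.clarityScores, ...s.safetyScores].some((score) => score <= 2)) {
    flags.push("Score of 2 or below: debrief with the facilitator");
  }
  return flags;
}

function seniorAttendanceRate(sessions: CritiqueSession[]): number {
  const withSenior = sessions.filter((s) => s.hadSeniorDesigner).length;
  return withSenior / sessions.length; // target 0.75+; below 0.5 by Month 3 means rethink the format
}
```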

4. Interview Score: 9.5 / 10

Why this demonstrates senior-level maturity: The C.O.R.E. format's specific mechanism for preventing the "most recent senior voice" problem (the 3-change synthesis stated aloud by the designer) demonstrates that this designer understands the political dynamics of design critique and has designed a specific process intervention to address them. The root cause analysis before the solution design (identifying that "I like it / I don't like it" feedback is caused by the absence of a shared critical language, not by individual reviewer inadequacy) shows the diagnostic rigour of a senior designer who solves structural problems rather than individual behaviour problems. The "observations before solutions" norm with its explicit rationale (a designer who receives a diagnosis can find better solutions than a reviewer who provides a solution to an undiagnosed problem) shows communication sophistication.

What differentiates it from mid-level thinking: A mid-level designer would propose a critique template with feedback categories (visual design, interaction design, content), a dedicated calendar slot, and a "be kind" norm. These address the symptoms without addressing the root causes. They would not recognise the silent individual observation phase as the mechanism for preventing group-think, would not design the 3-change synthesis as the countermeasure to seniority-driven design changes, and would not think to measure post-critique junior designer confidence as a leading indicator of psychological safety.

What would make it a 10/10: A 10/10 response would include a specific shared design principles framework for a product company (showing 5 principles with critique-applicable definitions and example violation descriptions), a complete post-critique survey template with the 2 junior designer confidence questions and the manager design quality rating question, and a facilitator guide for the first 4 structured critique sessions showing how to redirect common failure modes (solutions offered during the Observe phase, presenter defending work during the Respond phase).



Question 15: AI and UX — Designing Interfaces for AI-Powered Features

Difficulty: Elite | Role: UX Designer | Level: Senior / Staff | Company Examples: Google DeepMind products, Notion AI, GitHub Copilot, Figma AI, Linear


The Question

You are a Senior UX Designer at a legal technology company. The product is a contract review platform used by corporate lawyers and paralegals to review, annotate, and negotiate contracts. The CTO wants to add an AI feature: an LLM-powered "contract analysis assistant" that automatically identifies high-risk clauses, suggests alternative language, and summarises key obligations. The legal team's concern is significant: lawyers are professionally liable for the advice they give their clients; if the AI misidentifies a clause or suggests incorrect alternative language, the lawyer — not the AI vendor — bears the professional responsibility. The Head of Legal Operations says their team will not use an AI feature that presents suggestions as facts. Design the UX for the AI contract analysis assistant, addressing the trust, transparency, and liability concerns while making the feature genuinely useful.


1. What Is This Question Testing?

  • AI UX design principles — understanding that AI interfaces have specific design challenges that standard interaction design does not address: confidence calibration (the interface must communicate the AI's uncertainty level, not just its output), human oversight design (the AI assists the human decision; the human remains accountable; the interface must make human review and override the natural workflow, not an optional extra), and failure mode design (the AI will make mistakes; the interface must make mistakes visible and recoverable, not hidden)
  • Professional context sensitivity — designing AI for a legally liable professional context is fundamentally different from designing AI for a consumer context; a spell-checker that makes a wrong suggestion has no professional consequences; an AI that misidentifies an indemnity clause as standard when it contains unusual liability exposure creates professional liability risk for the lawyer; every design decision must account for the consequence of an AI error in this context
  • Trust architecture — the Head of Legal Operations' objection ("will not use a feature that presents suggestions as facts") is a trust architecture problem; the solution is not to make the AI more accurate (though that helps) but to design the interface so that the AI's suggestions are always presented as suggestions with explicit uncertainty signals — never as conclusions that the lawyer would need to actively reverse to challenge
  • Information hierarchy for complex professional documents — contract review is a cognitively dense task; a contract may be 80 pages with 150 clauses, many of which require specialist interpretation; an AI feature that adds UI complexity on top of an already complex task is worse than no AI feature; the design challenge is to surface AI insights without adding cognitive load to the primary task of reading and reviewing the contract
  • Regulatory awareness — AI in legal contexts is subject to emerging AI regulation; in the EU, the AI Act classifies AI systems used in administration of justice as high-risk; in the UK, the Solicitors Regulation Authority (SRA) has published guidance on AI use in legal services requiring that lawyers maintain professional judgment and oversight; the design must support regulatory compliance, not just user convenience
  • Collaborative design between AI capability and UX — the UX designer working on an AI feature must understand enough about how LLMs work to design appropriately; specifically: LLMs produce probabilistic outputs, not deterministic answers; they can be confidently wrong; they can hallucinate; the interface must communicate these properties to users in plain language without requiring the user to understand how LLMs work

2. Framework: AI Feature UX Design Model (AFUXDM)

  1. Assumption Documentation — Understand the AI model's actual performance characteristics: what is its precision and recall on high-risk clause identification in the specific domain (commercial contracts, M&A agreements, employment contracts)? What types of errors does it make most commonly (false positives — flagging benign clauses as high-risk, or false negatives — missing genuinely risky clauses)? The UX must be calibrated to the model's actual error profile, not a theoretical ideal
  2. Constraint Analysis — Professional liability means that every AI suggestion must be reviewable and overridable; SRA guidance requires human oversight; the interface must not create automation bias (the documented psychological tendency to trust automated outputs over human judgment, particularly when the automated output is presented authoritatively)
  3. Tradeoff Evaluation — Inline AI suggestions (AI analysis presented directly on the contract document, contextually) vs. side panel AI analysis (AI analysis presented separately, requiring the lawyer to correlate the AI output with the contract text); inline is faster for the user but higher risk for creating automation bias; side panel is slower but creates clearer cognitive separation between the AI's analysis and the lawyer's judgment
  4. Hidden Cost Identification — Automation bias risk: research (Goddard et al., 2012, BMJ on clinical decision support; Lee et al., 2018, Computers in Human Behavior on automated legal analysis) consistently shows that professionals who use AI decision support tools make worse decisions on the cases where the AI is wrong than professionals who receive no AI support at all — because the AI's authoritative presentation suppresses the professional's own critical judgment; the design must actively counter this tendency
  5. Risk Signals / Early Warning Metrics — AI override rate (what percentage of AI suggestions do lawyers modify or reject? — a rate below 5% suggests automation bias; lawyers should be overriding 20–40% of suggestions based on their professional judgment); missed clause detection rate in post-launch audit (are auditors finding high-risk clauses that the AI missed and the lawyer did not catch? — a high rate of audit-discovered misses combined with a low lawyer override rate suggests lawyers are not performing independent review alongside the AI)
  6. Pivot Triggers — If user testing shows that lawyers are accepting AI suggestions at a rate above 85% without reviewing the underlying clause text (measured via eye-tracking or session recording), the interface is creating automation bias and must be redesigned to increase friction at the acceptance decision point — not to slow down the workflow, but to ensure a genuine review moment exists before acceptance
  7. Long-Term Evolution Plan — V1: AI analysis as a parallel assistant panel (not inline on the document); V2: personalised confidence calibration per lawyer (the interface learns which clause types each lawyer typically agrees or disagrees with the AI on, and adjusts its confidence display accordingly); V3: collaborative review mode (AI suggestions are shared with opposing counsel's system for counterpart negotiation suggestions)

3. The Answer

Explicit Assumptions:

  • The AI model: a fine-tuned LLM trained on commercial contract datasets; precision on high-risk clause identification: 84%; recall: 79% (meaning 21% of genuinely high-risk clauses are missed by the AI); most common false positive: flagging standard limitation of liability clauses as "unusual" when they are market-standard for the specific contract type (a worked error-profile example follows this list)
  • The lawyers' workflow: a contract review typically takes 2–8 hours; lawyers work through the document sequentially, annotating clauses with a risk rating (high/medium/low) and drafting suggested alternatives in a side panel; the current annotation system is manual
  • Regulatory context: the firm is SRA-regulated; the SRA's November 2023 guidance on AI requires that lawyers maintain professional responsibility for AI-assisted work and that AI use is disclosed to clients in engagement letters
  • The interface: a dual-panel layout (contract text on left, annotation and analysis panel on right — this is the existing layout that the AI feature will be added to)
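
As referenced in the model assumption above, the worked example below makes the 84% precision / 79% recall figures concrete for a single contract. The clause counts are illustrative assumptions, not product data.

```typescript
// Illustrative arithmetic: what 84% precision / 79% recall mean for one contract.
// The clause count below is an assumption for illustration only.

const genuinelyHighRisk = 20;   // assumed truly high-risk clauses in an 80-page contract
const recall = 0.79;            // share of genuinely high-risk clauses the AI flags
const precision = 0.84;         // share of AI flags that are genuinely high-risk

const truePositives = Math.round(genuinelyHighRisk * recall);   // ~16 high-risk clauses caught
const falseNegatives = genuinelyHighRisk - truePositives;       // ~4 high-risk clauses missed by the AI
const totalFlags = Math.round(truePositives / precision);       // ~19 flags shown to the lawyer
const falsePositives = totalFlags - truePositives;              // ~3 benign clauses flagged

console.log({ truePositives, falseNegatives, totalFlags, falsePositives });
// The ~4 missed clauses are why independent lawyer review must remain the workflow default.
```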

Design Principle 1: AI Suggestions Are Always Hypotheses, Never Conclusions

Every piece of AI-generated content in the interface must be visually and linguistically marked as a suggestion, not a finding. The visual language: AI-generated content has a distinct but non-alarming visual treatment — a subtle teal left border (a different colour from the lawyer's own annotations, which use a blue border) and a small AI icon (a geometric sparkle, not a robot or brain icon — chosen for its neutrality in a professional context). The linguistic marker: every AI suggestion is prefaced with the qualifier "AI suggests:" or "AI analysis:". These qualifiers are not boilerplate disclaimer text — they are embedded in the suggestion's visual design, positioned inline where the lawyer starts reading, so that "this is an AI suggestion" is processed together with the suggestion content rather than as separate disclaimer text before or after it.

Design Principle 2: Confidence Display as a Professional Decision Tool

The AI's confidence level for each clause analysis is displayed as a 3-level indicator (not a percentage — percentages create false precision for outputs that are inherently probabilistic):

  • High confidence (filled teal circle): "The AI has identified this pattern in many similar contracts with high consistency." Review recommended, but the AI's basis for flagging this clause is strong.
  • Moderate confidence (half-filled teal circle): "The AI has flagged this because of one or more features that can indicate risk — but context matters. Your judgment is particularly important here."
  • Low confidence (outlined teal circle): "The AI is uncertain about this clause. The language may be ambiguous or unusual. This requires your professional assessment."

The confidence display serves two purposes: it calibrates the lawyer's reliance on the AI (moderate and low confidence suggestions warrant more scrutiny than high confidence ones), and it communicates the AI's actual uncertainty profile honestly — which the Head of Legal Operations specifically required.
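
A minimal sketch of how a raw model score could map to this 3-level display follows. The numeric thresholds (0.85 and 0.6), the function name, and the icon identifiers are illustrative assumptions; real thresholds would be calibrated against the model's observed error profile.

```typescript
// Illustrative sketch: mapping a raw model confidence score to the 3-level display.
// The 0.85 / 0.6 thresholds and the helper name are assumptions for illustration.

type ConfidenceLevel = "high" | "moderate" | "low";

interface ConfidenceDisplay {
  level: ConfidenceLevel;
  icon: "filled-circle" | "half-circle" | "outlined-circle";
  copy: string;
}

function toConfidenceDisplay(modelScore: number): ConfidenceDisplay {
  if (modelScore >= 0.85) {
    return {
      level: "high",
      icon: "filled-circle",
      copy: "The AI has identified this pattern in many similar contracts with high consistency.",
    };
  }
  if (modelScore >= 0.6) {
    return {
      level: "moderate",
      icon: "half-circle",
      copy: "The AI has flagged this because of one or more features that can indicate risk — but context matters. Your judgment is particularly important here.",
    };
  }
  return {
    level: "low",
    icon: "outlined-circle",
    copy: "The AI is uncertain about this clause. The language may be ambiguous or unusual. This requires your professional assessment.",
  };
}
```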

Design Principle 3: Human Review Is the Natural Workflow, Not an Optional Override

The most important design decision in the interface is the acceptance/rejection workflow for AI suggestions. Do not use a binary "accept/reject" button — this creates a choice architecture that frames the AI's suggestion as the default and human judgment as the override. Instead, AI suggestions are "unreviewed" by default (displayed with a grey background in the annotation panel), and the lawyer's workflow is to mark each suggestion as one of:

  • "Agreed" (the AI's analysis is correct — I confirm this clause as high risk)
  • "Disagree — not high risk" (the AI has flagged a clause I consider standard market practice)
  • "Disagree — different issue" (the clause has an issue, but not the one the AI identified)
  • "Escalate for senior review" (I'm uncertain — I want a second opinion)

No suggestion is ever automatically incorporated into the contract review — every suggestion requires a positive lawyer action. The count of unreviewed suggestions is displayed as a persistent badge in the AI panel header ("14 unreviewed AI suggestions") — not as a completion progress bar (which would imply that reviewing AI suggestions is the lawyer's primary task) but as a reference count: the AI has flagged 14 things; the lawyer reviews them in the order of their own workflow, not the AI's. At the end of the review, the lawyer sees a summary: "You agreed with 8 of 14 AI suggestions, disagreed with 3, and escalated 3 for senior review. 2 clauses were flagged by you that the AI did not identify." The final line — "2 clauses flagged by you that the AI did not identify" — is critical: it makes the lawyer's own independent contribution visible, reinforcing the message that the AI is a tool for the lawyer, not a replacement for the lawyer's professional judgment.
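
A minimal sketch of the review-state model and the end-of-review summary described above follows. The type names, state identifiers, and summary function are illustrative assumptions about how a front-end team might represent this workflow.

```typescript
// Illustrative sketch: the suggestion review states and the end-of-review summary.
// Type and function names are assumptions; the states mirror the workflow above.

type ReviewState =
  | "unreviewed"                 // default state: grey background, no action taken yet
  | "agreed"                     // lawyer confirms the clause is high risk
  | "disagree-not-high-risk"     // standard market practice in the lawyer's judgment
  | "disagree-different-issue"   // the clause has an issue, but not the one the AI flagged
  | "escalate";                  // second opinion requested

interface Suggestion {
  clauseId: string;
  state: ReviewState;
}

interface LawyerFlag {
  clauseId: string;              // clause flagged by the lawyer, independently of the AI
}

function reviewSummary(suggestions: Suggestion[], lawyerFlags: LawyerFlag[]): string {
  const count = (state: ReviewState) =>
    suggestions.filter((s) => s.state === state).length;
  const aiClauseIds = new Set(suggestions.map((s) => s.clauseId));
  const independentFlags = lawyerFlags.filter((f) => !aiClauseIds.has(f.clauseId)).length;

  return (
    `You agreed with ${count("agreed")} of ${suggestions.length} AI suggestions, ` +
    `disagreed with ${count("disagree-not-high-risk") + count("disagree-different-issue")}, ` +
    `and escalated ${count("escalate")} for senior review. ` +
    `${independentFlags} clause(s) were flagged by you that the AI did not identify.`
  );
}
```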

Design Principle 4: Transparent Provenance for Every Suggestion

For each AI suggestion, the lawyer can access a "How did the AI reach this?" panel — a one-click expansion showing: the specific contract language that triggered the flag, the comparable clause patterns from the model's training data (shown as anonymised clause examples — "this language is similar to clauses that were renegotiated in 78% of comparable contracts"), and the model's uncertainty reasons for moderate and low confidence suggestions ("the AI is uncertain because the 'subject to applicable law' qualifier may change the risk profile of this clause depending on jurisdiction"). This provenance panel is not required reading for every suggestion — it is available on demand for lawyers who want to understand the AI's reasoning before making a judgment. A lawyer who can see why the AI flagged a clause is better equipped to agree or disagree with the AI than a lawyer who receives only the output.
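
The provenance panel's content can be sketched as a per-suggestion record. The field names below are assumptions about what the model service could expose, and the example values are hypothetical.

```typescript
// Illustrative sketch: the on-demand "How did the AI reach this?" provenance record.
// Field names are assumptions about what the model service could expose.

interface SuggestionProvenance {
  clauseId: string;
  triggeringText: string;          // the specific contract language that triggered the flag
  comparableExamples: {
    anonymisedClause: string;      // anonymised clause with similar language from training data
    renegotiationRate?: number;    // e.g. 0.78 — share of comparable contracts where it was renegotiated
  }[];
  uncertaintyReasons: string[];    // shown for moderate/low confidence suggestions
}

// Hypothetical example record for one clause:
const example: SuggestionProvenance = {
  clauseId: "clause-14.2",
  triggeringText: "Supplier's aggregate liability shall be unlimited in respect of ...",
  comparableExamples: [
    { anonymisedClause: "[Party A]'s aggregate liability shall not exceed ...", renegotiationRate: 0.78 },
  ],
  uncertaintyReasons: [
    "The 'subject to applicable law' qualifier may change the risk profile of this clause depending on jurisdiction",
  ],
};
```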

Addressing the Liability Concern Directly in the Interface

The SRA guidance requires that AI use is disclosed to clients. The interface supports this with a "generate disclosure statement" function that produces a draft client letter paragraph: "Our review of this contract used AI-assisted clause analysis tools. All AI suggestions were reviewed and verified by [Lawyer Name], who takes professional responsibility for the advice in this letter." This is a 2-click action (flag contract for AI disclosure → generate disclosure draft) that takes less than 30 seconds. It reduces the friction of SRA compliance to near-zero, removing the operational barrier to using the AI feature while meeting the regulatory requirement. It also signals to lawyers that the company has thought about their professional obligations, which builds trust in the product.
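
A minimal sketch of the disclosure-draft generation, assuming the reviewing lawyer's name is available from the review record. The function name is an assumption; the wording mirrors the draft paragraph above.

```typescript
// Illustrative sketch: the 2-click "generate disclosure statement" action.
// Function name is an assumption; the wording mirrors the draft paragraph above.

function generateDisclosureDraft(lawyerName: string): string {
  return (
    "Our review of this contract used AI-assisted clause analysis tools. " +
    `All AI suggestions were reviewed and verified by ${lawyerName}, ` +
    "who takes professional responsibility for the advice in this letter."
  );
}

// Example: attach the draft paragraph to the client engagement letter.
console.log(generateDisclosureDraft("A. N. Example"));
```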

Early Warning Metrics:

  • AI suggestion agreement rate by confidence level — monitor the proportion of high/moderate/low confidence suggestions that lawyers agree with; expected: high confidence ≈ 70–85% agreement, moderate ≈ 40–60% agreement, low ≈ 20–35% agreement; if the high confidence agreement rate is above 90%, investigate for automation bias; if the low confidence agreement rate is above 50%, the calibration is off — either the AI's low-confidence suggestions are more reliable than the display implies, or lawyers are agreeing without genuine scrutiny (a monitoring sketch follows this list)
  • Unreviewed suggestions at document sign-off — the platform should alert the lawyer if any AI-flagged suggestions remain unreviewed when the document is marked complete; a lawyer who signs off a review with 6 unreviewed AI suggestions has not completed the review workflow; this is both a quality signal and a liability signal
  • Independent clause flags (lawyer-found, AI-missed) per review — track the number of clauses that lawyers flag independently that the AI did not identify; a declining rate over time may indicate that lawyers are reducing their independent review (relying on the AI as their primary detection mechanism); an increasing rate indicates healthy independent judgment
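
As noted in the first metric above, agreement rates can be monitored against the expected bands automatically. The sketch below is illustrative, assuming each reviewed suggestion records its confidence level and whether the lawyer agreed; the data shape and helper names are assumptions, and the thresholds mirror the bands listed.

```typescript
// Illustrative sketch: monitoring agreement rate by confidence level against the
// expected bands above. Data shape and helper names are assumptions.

type Level = "high" | "moderate" | "low";

interface ReviewedSuggestion {
  confidence: Level;
  lawyerAgreed: boolean;
}

const EXPECTED_BANDS: Record<Level, [number, number]> = {
  high: [0.7, 0.85],
  moderate: [0.4, 0.6],
  low: [0.2, 0.35],
};

function agreementAlerts(reviews: ReviewedSuggestion[]): string[] {
  const alerts: string[] = [];
  for (const level of Object.keys(EXPECTED_BANDS) as Level[]) {
    const atLevel = reviews.filter((r) => r.confidence === level);
    if (atLevel.length === 0) continue;
    const rate = atLevel.filter((r) => r.lawyerAgreed).length / atLevel.length;
    const pct = (rate * 100).toFixed(0);

    if (level === "high" && rate > 0.9) {
      alerts.push(`High-confidence agreement at ${pct}%: investigate for automation bias`);
    } else if (level === "low" && rate > 0.5) {
      alerts.push(`Low-confidence agreement at ${pct}%: calibration is off or lawyers are not scrutinising uncertain output`);
    } else {
      const [lo, hi] = EXPECTED_BANDS[level];
      if (rate < lo || rate > hi) {
        alerts.push(`${level}-confidence agreement at ${pct}% is outside the expected ${lo * 100}-${hi * 100}% band`);
      }
    }
  }
  return alerts;
}
```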

4. Interview Score: 10 / 10

Why this demonstrates staff-level maturity: Designing the acceptance workflow to require a positive lawyer action on every suggestion (rather than a binary accept/reject that frames the AI as the default) is the specific design mechanism that addresses automation bias at an architectural level — not through a warning label, but through interaction design that makes independent review the natural workflow path. The "2 clauses flagged by you that the AI did not identify" metric in the review summary is a design detail that actively reinforces the lawyer's independent professional judgment — it is the kind of insight that only comes from thinking about the psychological dynamics of human-AI collaboration in a high-stakes professional context. The Goddard and Lee automation bias research citations (BMJ clinical decision support; Computers in Human Behavior) demonstrate the research literacy that makes the design decisions defensible to a sceptical engineering team.

What differentiates it from mid-level thinking: A mid-level designer would add an "AI suggestions" panel to the existing interface, use a standard accept/reject button pattern, add a disclaimer footer ("AI output may be inaccurate — please verify"), and consider the design complete. They would not address automation bias as a specific design challenge, would not design the confidence calibration display, would not think about the SRA disclosure obligation as an interface feature, and would not design the "unreviewed suggestions at sign-off" alert as a liability-protective mechanism.

What would make it a perfect implementation: The response scores 10/10 on the dimensions this question tests. The theoretical extension would be a complete interaction specification for the confidence display component, an A/B test design comparing the "unreviewed by default" acceptance workflow against a "pre-accepted, override to reject" workflow to measure automation bias rates empirically, and a worked example of the provenance panel content for a specific clause type (limitation of liability).