Case Study

Advanced Camera Module — AI guidance at the lens

Photo quality is won or lost at the lens. We created AI-assisted capture experiences that help clinical teams and patients take the right photos the first time—reducing retakes, delays, and support overhead.

Prepare → Capture (overlays) → Quality check → Review → Submit
The capture flow evolved from a single monolithic camera into five distinct states with real-time guidance at each step. This structure emerged after testing revealed users needed explicit progress markers, not just a viewfinder.
Overview

Design that corrects before the shutter clicks

The Advanced Camera Module branched off from the Invisalign Practice App team to go deeper into photo capture UX, then was reintegrated into both the IPA and MyInvisalign apps as a shared capability. The goal was to shift quality upstream—from reactive review to proactive guidance—so teams and patients could capture clinical-grade photos without specialized training or multiple attempts.

Instead of "open camera and hope for the best," we built a system that teaches good technique in real-time. AI overlays guide angle and distance, on-device checks flag blur or exposure issues before the shutter clicks, and best-shot compare tools give users confidence they've captured something usable—all while working offline in low-signal operatories or at home.

Problem

Retakes, slowdowns, and inconsistent sets


Even with digital cameras in hand, clinical teams and patients routinely lost minutes to reshoots and guesswork. Photos came back with inconsistent angles, missing shots, or quality issues that only surfaced during review—forcing people back to the chair or mirror to try again. The feedback loop was too late, the guidance too implicit, and the consequences too costly in a high-volume practice.

Through field observations and rejected photo analysis, we identified where quality broke down: framing varied wildly by user experience level, shot order drifted causing downstream rework, problems surfaced at review instead of at capture, and connectivity gaps in low-signal rooms meant uploads failed silently. Add in hardware complexity—different iPhone models with different focal lengths and autofocus capabilities—and you had a recipe for inconsistent outcomes that training alone couldn't solve.

  • Inconsistent framing: Distance and angle varied widely depending on who was holding the device, with no real-time feedback to correct course.
  • Order drift: Missing or swapped shots created downstream rework in case processing and treatment planning.
  • Late feedback: Quality issues discovered at review meant backtracking to the appointment and asking for retakes—disrupting schedules and eroding trust.
  • Connectivity gaps: Upload failures in low-signal rooms left teams unsure whether photos made it to the system, forcing manual checks and workarounds.
  • Hardware complexity: Different devices required different guidance strategies, but users expected a consistent experience regardless of what iPhone they happened to be using.
Before and after comparison of photo capture workflow showing the improvement from reactive retakes to proactive real-time guidance
Early journey mapping revealed that quality issues weren't discovered until 10-15 minutes after capture, when photos reached the review queue. This late feedback loop meant assistants had to pull patients back to the chair—disrupting schedules and creating friction. The "after" state shows how moving validation upstream to the capture moment eliminated this costly delay.
User Research

Observing where quality breaks down in real contexts


Before designing AI guidance, we needed to understand where and why photo quality failed in practice. That meant shadowing assistants during chair-side capture, watching patients attempt self-capture at home for virtual care, and analyzing thousands of rejected photo sets to identify patterns in blur, exposure, and framing errors.

Field observations revealed a common theme: people wanted to do it right, but lacked confidence in the moment. Assistants described the pressure of keeping appointments on schedule while ensuring every shot met clinical standards. Patients attempting self-capture at home expressed frustration with "guesswork"—they couldn't tell if the angle or distance was correct until after submitting, when it was too late.

We ran early computer vision prototypes with clinical teams to validate detection accuracy and gather feedback on how AI guidance should manifest in the UI. The recurring insight: teams wanted the module to catch problems before hitting submit, not after review. This shifted our approach from post-capture validation to real-time, in-capture correction—moving the feedback loop as close to the lens as possible.

"If the app tells me what's wrong before I click, that's huge. I can fix it right then instead of discovering it later." — Chair-side assistant during pilot testing

Research workshop with sticky notes and sketches showing user feedback from clinical staff and patients during field observations
Synthesis workshop after three weeks of field observations in five different practices. The red clusters represent pain points mentioned by 4+ participants. The pattern that emerged: confidence, not capability, was the barrier—users physically could take good photos, but didn't know if they had until it was too late.
Early Concepts & Process

Sketches, prototypes, and experiments

The final solution looks clean and obvious in hindsight, but we explored dozens of directions before landing on real-time overlays and progressive guidance. Here's what the messy middle looked like.

Hand-drawn wireframe sketches on whiteboard showing initial concepts for camera guidance UI
Week 2 whiteboard sketches: Initial brainstorm with CV scientist exploring how to surface head pose data. These early concepts were too engineering-focused—showing raw confidence scores and axes that meant nothing to users. We hadn't yet translated ML outputs into actionable guidance.
Low-fidelity wireframe showing early ghost guide concept with numerical indicators
Low-fi prototype v1: First attempt at "ghost guide" overlays with numerical distance indicators. Testing revealed users ignored the numbers entirely and just wanted color feedback (green = good, red = adjust). This prototype helped us realize guidance needed to be glanceable, not precise.
Flow diagram with handwritten annotations showing iteration on capture state logic
Capture flow iteration with PM and engineers: This annotated flow diagram shows the debate over when to trigger quality checks—after every shot (too interruptive) vs. only on submit (too late). The handwritten notes capture the tension: engineers wanted fewer checks for performance, the clinical team wanted more for safety. We compromised on instant checks only for critical issues (blur, glare), with batch validation on review.

Collaboration challenge: Translating CV confidence scores into UI

The computer vision scientist wanted to expose model confidence scores (0.0–1.0) so users could see how "sure" the AI was. I pushed back: exposing uncertainty would erode trust, not build it. We spent two weeks testing approaches in Figma and in interactive prototypes. The breakthrough came when I proposed showing guidance only when confidence crossed a threshold (>0.85), and making it binary. Green overlay = good. Red + prompt = fix this specific thing. No gray area, no probabilistic thinking required. The scientist was initially resistant, worried about false negatives, but field testing proved users preferred confident, occasionally wrong guidance over constant hedging. We implemented confidence thresholds with a feedback mechanism so the model could learn from corrections.
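
To make the threshold idea concrete, here is a minimal sketch in Swift of that binary mapping. Only the 0.85 cutoff and the green/red behavior come from the work described above; the GuidanceState and PoseEstimate names and the prompt copy are illustrative, not the production API.

```swift
/// Illustrative guidance states: the UI stays binary, with no probabilities exposed.
enum GuidanceState {
    case good                      // green overlay: positioning is acceptable
    case fix(prompt: String)       // red overlay plus one specific instruction
    case silent                    // confidence below threshold: show nothing rather than hedge
}

/// Hypothetical model output for a single frame.
struct PoseEstimate {
    let confidence: Double         // raw model confidence, 0.0–1.0
    let issue: String?             // most severe detected issue, if any
}

/// Only act on the model when it is sufficiently sure; otherwise stay quiet.
func guidance(for estimate: PoseEstimate, threshold: Double = 0.85) -> GuidanceState {
    guard estimate.confidence > threshold else { return .silent }
    if let issue = estimate.issue {
        return .fix(prompt: issue) // e.g. "Tilt the phone up slightly"
    }
    return .good
}
```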

Failures & Pivots

What didn't work and why

Not every concept survived contact with users. Here are the ideas we killed, the pivots we made, and what we learned from things that didn't work.

❌ Failed: Tutorial-first onboarding

What we tried: A 5-screen tutorial explaining how AI guidance worked before letting users access the camera.

Why it failed: Completion rate was 31% in pilot testing. Assistants were under appointment pressure and skipped tutorials entirely. Patients at home lost patience and abandoned the flow. The guidance we were explaining made no sense without actually seeing it in the camera.

What we learned: Teach through use, not upfront. We moved all guidance explanations into contextual tooltips that appeared the first time a user encountered each feature. Completion rate jumped to 89%.

❌ Failed: Auto-capture on quality threshold

What we tried: Automatic shutter trigger when all quality heuristics passed thresholds—hands-free capture.

Why it failed: Users felt a complete loss of control. The camera would fire at "wrong" moments from their perspective, even though quality scores were good. One assistant said it felt like "the app is taking photos OF me, not FOR me." Trust eroded fast.

What we learned: Users needed to feel in control of the capture moment, even if AI was doing heavy lifting behind the scenes. We kept manual shutter control and moved AI to a supportive role: guiding toward good positioning, then letting users decide when to capture. Satisfaction scores improved 28 points.

❌ Failed: Single comprehensive overlay

What we tried: One overlay showing distance, angle, symmetry, and lighting feedback simultaneously.

Why it failed: Information overload. Users didn't know which feedback to prioritize. In testing, they'd fix angle, which broke distance, which broke symmetry—chasing multiple moving targets with no clear hierarchy.

What we learned: Progressive disclosure wins. We broke guidance into layers: primary (distance + framing) activates first, then secondary (symmetry) once primary is met, then tertiary (lighting hints) only if needed. Users now fix issues in sequence, not all at once. Time-to-good-shot dropped 41%.

⚠️ Pivot: From cloud-based to on-device processing

Original plan: Send frames to cloud for ML analysis, return guidance prompts. Lower device requirements, centralized model updates.

Why we pivoted: Field reality hit hard—45% of operatories had spotty WiFi. Cloud processing meant 300-800ms latency, which felt laggy and broke the real-time illusion. Worse: complete failure in offline scenarios.

The pivot: Migrated to Apple Core ML for on-device inference. Latency dropped to <50ms, offline-first became default. Trade-off: bigger app size (+47MB) and device requirements (iPhone 8+ only), but worth it for reliability. This pivot shaped the entire architecture going forward.
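
For illustration, a minimal sketch of what a single on-device frame check can look like, using Apple's Vision face detector as a stand-in for the team's custom Core ML models. The FrameAnalyzer class, the threading choice, and the yaw-only output are assumptions, not the shipped pipeline.

```swift
import Foundation
import CoreVideo
import Vision

/// Minimal on-device frame check: detect a face in one camera frame and report
/// its yaw in degrees when the detector provides one. Illustrative only.
final class FrameAnalyzer {
    func analyze(pixelBuffer: CVPixelBuffer,
                 completion: @escaping (_ faceFound: Bool, _ yawDegrees: Double?) -> Void) {
        let request = VNDetectFaceRectanglesRequest { request, error in
            guard error == nil,
                  let face = request.results?.first as? VNFaceObservation else {
                completion(false, nil)
                return
            }
            // Vision reports yaw in radians (NSNumber); convert for readability.
            let yawDegrees = face.yaw.map { $0.doubleValue * 180 / .pi }
            completion(true, yawDegrees)
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
        // Keep inference off the main thread so the live preview stays smooth.
        DispatchQueue.global(qos: .userInteractive).async {
            do { try handler.perform([request]) }
            catch { completion(false, nil) }
        }
    }
}
```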

Design Principles & Approach

Real-time guidance without extra steps


The technical foundation relied on on-device AI for real-time guidance, offline-first architecture for low-signal environments, and adaptive heuristics that accounted for different iPhone models and focal lengths. But the design challenge was translating machine learning outputs into usable, confidence-building UI that felt helpful rather than intrusive.

We partnered closely with computer vision scientists to understand what the models could reliably detect—and what they couldn't. Early prototypes showed we could assess head pose, distance, symmetry, and basic quality issues like blur and exposure in real-time. The question became: how do we surface that feedback without overwhelming users or slowing them down?

Four design principles guided the work:

  • Correct before capture: Real-time overlays and prompts catch issues in the moment, not after the fact.
  • Teach without slowing: Guidance is embedded in the UI flow, not added as extra training steps or interruptions.
  • Make completeness visible: Progress meters and best-shot compare tools give users confidence they've captured what's needed.
  • Design for edge cases: Gracefully handle scenarios like Extra Wide Open Bite (EWOB) or hardware limitations where standard heuristics break down.
System architecture diagram showing computer vision pipeline and technical approach to AI guidance
Technical architecture evolved from cloud-dependent (v1) to fully offline-capable (v3). The on-device Core ML pipeline processes 30fps camera feed through face detection → pose analysis → quality heuristics, with results mapped to UI guidance in <50ms. This architecture emerged after the cloud-processing pivot and shaped all subsequent feature work.
Design exploration showing fidelity progression: sketches (ideation) → wireframes (structure) → mockups (polish). Intentionally mixed fidelity levels to show thinking evolution, not just final outputs.
Design & Implementation

Translating ML outputs into usable, confidence-building UI


Turning computer vision detections into actual product required close collaboration between design, engineering, and data science. Early prototypes looked more like engineering demos—raw ML confidence scores and bounding boxes—than guidance people could act on. The design work involved iterating through flows, wireframes, and prototypes until we found the right balance of information, timing, and visual weight.

The core capture flow standardized around Prepare → Capture → Quality Check → Review → Submit. Each state had clear entry criteria and visible progress, so users always knew where they were and what came next. Within the Capture state, we layered in real-time AI overlays, instant quality prompts, and a best-shot compare UI that let users course-correct without feeling locked into a rigid sequence.

Key design work included:

  • Ghost guide overlays that showed ideal angle and symmetry without blocking the viewfinder—using transparency and color to indicate correctness.
  • Quality check validation prompts for blur, glare, and exposure that appeared instantly when issues were detected, with clear instructions on how to fix ("Hold steady," "Adjust lighting").
  • Best-shot selection and side-by-side compare UI so users could swap in a better image before finalizing, reducing the anxiety of "did I get it right?"
  • Accessibility considerations: large hit targets, color-agnostic cues (shape and motion, not just color), and VoiceOver support for guided flows.
  • Systemization: shared design tokens and components across IPA and MyInvisalign so guidance felt consistent regardless of which app you were using.
  • Edge case handling: prototyped solutions for color correction needs, Extra Wide Open Bite (EWOB) scenarios where standard heuristics failed, and hardware variability across iPhone models.
Mobile screen showing real-time AI overlay guidance for photo capture with ghost guides and angle indicators
Real-time overlays in capture state: Green ghost guide indicates proper alignment. The overlay uses 40% opacity so it's visible but not obstructive. Color switches to amber when user drifts >15° off target, red when critical thresholds are breached. This progressive color system emerged from accessibility testing—we needed cues that worked without color dependency.
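
A small sketch of the progressive color logic from this caption, assuming drift is already expressed as degrees off target. The 15° amber threshold comes from the caption; the 30° red threshold and the OverlayTint names are illustrative placeholders.

```swift
/// Overlay tint driven by how far the subject has drifted off the target pose.
enum OverlayTint {
    case green   // aligned: capture away
    case amber   // drifting: nudge back toward the target
    case red     // critical: block capture until corrected
}

/// Maps angular drift (in degrees) to a tint. Thresholds are configurable;
/// only the 15° amber value is taken from the caption above.
func tint(forDriftDegrees drift: Double,
          amberAt: Double = 15, redAt: Double = 30) -> OverlayTint {
    switch abs(drift) {
    case ..<amberAt: return .green
    case ..<redAt:   return .amber
    default:         return .red
    }
}
```
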
Mobile screen showing quality validation prompt for image blur with actionable guidance
Instant quality validation with remediation: Blur detected via Laplacian variance threshold <100. Instead of generic "photo unclear," we show specific fix actions ranked by likelihood of success. "Hold steady" appears first (88% fix rate), "Move closer" second (67% fix rate). This prioritization came from analyzing 12,000+ retake scenarios.
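
Sketched from this caption, a hedged example of how the blur gate and ranked remediation could be expressed, assuming the Laplacian variance is computed elsewhere (e.g., with Accelerate in production). The FixAction type and function name are illustrative; the <100 threshold and the 88%/67% fix rates come from the caption.

```swift
/// A remediation prompt plus the historical rate at which it resolved the issue.
struct FixAction {
    let prompt: String
    let historicalFixRate: Double
}

/// Treats a Laplacian variance below 100 as blurry and returns fixes ordered
/// by historical success; returns an empty list when the frame is sharp enough.
func blurRemediation(laplacianVariance: Double, threshold: Double = 100) -> [FixAction] {
    guard laplacianVariance < threshold else { return [] }  // sharp enough, no prompt
    return [
        FixAction(prompt: "Hold steady", historicalFixRate: 0.88),
        FixAction(prompt: "Move closer", historicalFixRate: 0.67),
    ].sorted { $0.historicalFixRate > $1.historicalFixRate }
}
```
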
Mobile screen showing side-by-side best-shot comparison interface
Best-shot compare in review state: Side-by-side view with quality scores shown as visual bars, not numbers. Users swipe to compare, tap to select. This replaced a single-photo review where users had no reference point for "good enough." Comparison increased submission confidence 34%.
Mobile screen showing progress meter and completeness indicators for photo set
Completeness visualization: Progress ring shows overall set completion, with individual slots indicating required vs. optional photos. Checkmarks appear instantly after quality validation passes. This explicit "done" state reduced incomplete submissions 52%—users previously had no clear signal when the set was complete.
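
A minimal sketch of the completeness model implied by this caption, assuming a simple required/optional slot structure. The PhotoSlot and PhotoSet names and the submission rule are illustrative, not the production data model.

```swift
/// One slot in the photo set. Required slots gate submission; optional slots
/// only contribute to the progress ring.
struct PhotoSlot {
    let name: String
    let isRequired: Bool
    var passedQualityCheck: Bool = false
}

struct PhotoSet {
    var slots: [PhotoSlot]

    /// Fraction shown by the progress ring.
    var completion: Double {
        guard !slots.isEmpty else { return 0 }
        let validated = slots.filter { $0.passedQualityCheck }.count
        return Double(validated) / Double(slots.count)
    }

    /// Submission unlocks once every required slot has a validated photo.
    var isSubmittable: Bool {
        slots.filter { $0.isRequired }.allSatisfy { $0.passedQualityCheck }
    }
}
```
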
Key Design Decisions

Critical choices and tradeoffs

Every design decision involves tradeoffs. Here are three pivotal choices that shaped the Advanced Camera Module, with the constraints we navigated and alternatives we rejected.

Decision #1: Progressive guidance layers, not comprehensive overlays

The choice:

Show guidance in sequence (distance first, then angle, then symmetry) rather than all feedback simultaneously.

Why:

Testing comprehensive overlays caused paralysis—users didn't know which issue to fix first and ended up chasing multiple moving targets. Progressive disclosure gave them one clear action at a time, reducing cognitive load and time-to-good-shot by 41%.

Tradeoff:

Slower guidance for expert users who could handle multiple cues. We mitigated this by detecting repeat users and progressively showing more guidance layers simultaneously after 10+ successful captures. Power users eventually got comprehensive overlays, novices stayed in guided mode.

Alternative we rejected:

Difficulty levels (Beginner/Advanced) that users chose upfront. Testing showed users didn't know which level they needed and picked wrong 63% of the time. Adaptive behavior based on actual usage patterns worked better.
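
A sketch of the adaptive layering described in this decision, assuming guidance is selected per frame. The layer order and the 10-capture unlock come from the text above; the function shape and names are illustrative.

```swift
/// Guidance layers in the order they unlock during capture.
enum GuidanceLayer: CaseIterable {
    case distanceAndFraming   // primary: always available
    case symmetry             // secondary: once primary criteria are met
    case lightingHints        // tertiary: only if still needed
}

/// Novices see one layer at a time; users with 10+ successful captures get the
/// comprehensive overlay. The unlock logic here is an illustrative sketch.
func activeLayers(primaryMet: Bool,
                  secondaryMet: Bool,
                  successfulCaptures: Int) -> [GuidanceLayer] {
    if successfulCaptures >= 10 { return GuidanceLayer.allCases }
    if !primaryMet { return [.distanceAndFraming] }
    if !secondaryMet { return [.symmetry] }
    return [.lightingHints]
}
```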

Decision #2: Manual shutter control, not auto-capture

The choice:

Keep manual shutter button even though AI could detect optimal capture moments and fire automatically.

Why:

Auto-capture felt like loss of control—the camera firing "at" users rather than "for" them. Satisfaction scores dropped 31 points in testing. Users needed to feel they were in charge of the capture moment, even if AI was doing heavy guidance behind the scenes. Manual control with AI assistance delivered higher trust and satisfaction than full automation.

Tradeoff:

Some photos captured outside optimal quality windows because users clicked too early. We added "ready to capture" haptic feedback (subtle vibration) when all quality thresholds passed to nudge toward better timing without removing control.

Alternative we rejected:

Hybrid mode that auto-captured after a countdown if user didn't click. Felt gimmicky and pressured users into rushed captures. The countdown created anxiety, not confidence.
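
The "ready to capture" haptic nudge from this decision's tradeoff could be sketched as follows. UIImpactFeedbackGenerator is the standard UIKit API; the wrapper class and the fire-once-per-transition rule are assumptions for illustration.

```swift
import UIKit

/// Fires a light haptic the moment all quality thresholds pass, nudging the
/// user toward better timing without taking away the shutter.
final class CaptureReadinessHaptics {
    private let generator = UIImpactFeedbackGenerator(style: .light)
    private var wasReady = false

    func update(isReady: Bool) {
        defer { wasReady = isReady }
        guard isReady, !wasReady else { return }  // only on the not-ready → ready edge
        generator.prepare()
        generator.impactOccurred()
    }
}
```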

Decision #3: On-device processing, not cloud ML

The choice:

Migrate all computer vision models to run on-device using Core ML instead of cloud-based processing.

Why:

Field reality: 45% of dental operatories had unreliable WiFi. Cloud processing introduced 300-800ms latency that broke the real-time illusion and completely failed offline. On-device inference runs <50ms with no connectivity dependency—critical for reliability in actual usage environments.

Tradeoff:

Increased app size by 47MB and required iPhone 8 or newer. Older devices couldn't run the module. We accepted this—better to work perfectly for 85% of users than poorly for 100%. Also complicated model updates: cloud models could be swapped instantly, on-device required app updates. We built a monthly release cadence for model improvements.

Alternative we rejected:

Hybrid approach with degraded guidance for offline scenarios. Created inconsistent experiences—users couldn't predict what features would work where. We chose full capability offline over partial capability always.

Key Milestones

From concept to scaled clinical deployment

The Advanced Camera Module evolved through several major releases, each expanding capabilities while maintaining the core principle of real-time, confidence-building guidance.

Timeline showing major milestones from discovery through Wave 3 deployment
Project timeline with velocity spikes visible around major pivots (cloud→on-device migration in month 3, auto-capture removal in month 7). Deployment phases shown with actual practice adoption curves, not just ship dates.
Discovery & CV Prototyping

Partnered with computer vision team to develop initial real-time quality heuristics for blur, exposure, and framing. Ran pilot tests with clinical staff to validate detection accuracy and gather early UX feedback on how guidance should surface in the UI.

Core Capture Flow Standardization

Established the Prepare → Capture → Quality Check → Review → Submit flow as the standard across apps. Introduced AI overlays for angle, distance, and framing, plus on-device blur/exposure/glare checks with instant prompts. Deployed best-shot selection and side-by-side compare tools.

Advanced Cases & Continuous Photo Capture

Shipped Continuous Photo Capture SDK supporting 23 photos for advanced clinical cases, with guidance for partial photo sets and supplementary buccal shots. Migrated from Google ML Kit to Apple Core ML for improved speed and stability on-device.

Wave 3: Camera Configuration Preferences

Added user-configurable camera preferences including intraoral zoom presets (Near, Balanced, Far), illumination modes (Off, Burst Flash, Torch with adjustable intensity), and real-time guidance color schemes—giving doctors control over capture behavior while maintaining quality guardrails.
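
A hedged sketch of how the Wave 3 preferences could be modeled, based on the option list above. Only the zoom presets and illumination modes come from the release description; the Swift shapes, defaults, and color scheme names are illustrative.

```swift
/// Wave 3 preference model sketched from the option list above.
enum IntraoralZoomPreset { case near, balanced, far }

enum IlluminationMode {
    case off
    case burstFlash
    case torch(intensity: Double)   // adjustable intensity, 0.0–1.0
}

enum GuidanceColorScheme { case standard, highContrast }

struct CameraPreferences {
    var zoom: IntraoralZoomPreset = .balanced
    var illumination: IlluminationMode = .off
    var guidanceColors: GuidanceColorScheme = .standard
}
```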

Integration & Systemization

Fully integrated the module into Invisalign Practice App and MyInvisalign as a shared capability. Systemized components, tokens, and guidance patterns for reuse across clinical and patient contexts, scaling globally across 100+ markets.

Impact

Fewer retakes, faster capture, higher acceptance rates

The Advanced Camera Module significantly reduced the time and friction involved in capturing clinical-grade photos, while improving quality and first-time acceptance rates across both clinical and patient-facing contexts.

  • Photo set resubmissions: Real-time quality checks and guidance reduced the need for retakes and follow-up submissions.
  • Time to complete capture: Guided workflows and instant feedback accelerated the capture process without sacrificing quality.
  • First-time acceptance rate: More photo sets passed clinical review on the first submission, reducing back-and-forth.
  • User satisfaction (ease of use): Teams and patients reported higher confidence and satisfaction with the guided capture experience.

Notes: Metrics shown are representative of pilot and post-launch outcomes and illustrate directional impact. Actual results may vary by practice, use case, and deployment context.

Lessons Learned

What this project taught me about AI-assisted design

  • Real-time feedback beats post-capture review. Moving quality checks upstream—into the capture moment—prevented errors rather than just detecting them after the fact. The earlier you can surface an issue, the less costly it is to fix.
  • Trust comes from transparency. Showing users why a photo was flagged (and how to fix it) built confidence in the AI guidance. Generic "try again" prompts eroded trust; specific, actionable feedback ("Hold steady," "Adjust angle left") earned it.
  • Systemization scales impact. Building the module as a shared capability across apps—rather than duplicating guidance logic—multiplied its value without multiplying work. Design systems aren't just visual tokens; they're also patterns, behaviors, and ML models.
  • Edge cases reveal system limits. Scenarios like Extra Wide Open Bite (EWOB) or low-light environments exposed where standard heuristics broke down. Designing graceful degradation and fallback flows was as important as optimizing the happy path.
  • Users want guidance, not automation. The failed auto-capture experiment taught me that even when AI can do something perfectly, users still want to feel in control. The best AI features augment human decision-making rather than replacing it.
Recognition & Achievements

How the work was recognized

The Advanced Camera Module and related imaging innovations contributed to broader platform recognition and earned both external awards and internal IP protections.

Award badges and recognition logos for Stevie Awards, Dentistry Today, and Align Inventor Award
Recognition for Advanced Camera Module work contributed to broader platform awards. Gold Stevie (2023) specifically cited "innovative use of computer vision for real-time capture guidance."
Patents & Intellectual Property
  • Real-Time Head Pose Analysis for Clinical Photos
  • Multi-Device AI-Based Mobile Camera
  • Contributions to image quality assessment and adaptive guidance systems
Awards & Recognition
  • 2023 Gold Stevie Award — Integrated Mobile Experience (IPA)
  • Dentistry Today Top 50 Tech Products (2022)
  • Align Technology Inventor Award for ACM innovations
  • Internal IPM Awards: Customer Focus (2023), Best Innovation (2021)