Outside the day job
Vibe-coded Prototypes
Working AI prototypes I built end-to-end to pressure-test product ideas: an agent-evaluation engine, a duplicate-prescription detection agent, and an ops console.
Innovaccer AI Studio — Agent Evaluation Engine
Problem: Most agent-evaluation tools answer “did v1.9 beat v1.8?” — not the question a non-technical population-health manager actually has: “Can I let this agent act on my members on its own — and if not, for whom and how far?” Scores don't translate into a defensible deployment decision.
Approach: I designed evaluation as trust infrastructure, not a reporting screen. Verdicts are expressed as what the agent is allowed to do (an Observe → Assisted → Partially Autonomous → Fully Autonomous ladder), not as F1 scores. Checks are layered (deterministic first, LLM-judge second) under one mental model, and every result is sliced by cohort so sub-population bias surfaces instead of hiding behind a strong average.
Solution & trade-off: The headline trade-off was objectivity vs. nuance. I kept the rollup strictly objective — rules → cases → cohorts → capability → autonomy verdict, with no weighted averages or mid-way tags — so any verdict can be walked down to the failing case and the agent's step-by-step reasoning. V1 stayed tight (pre-built packs, deterministic comparison, simple dashboard); AI-generated evaluators, cohort value-maximization, and RBAC were pushed to V2/V3.
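A strictly objective rollup of this kind can be illustrated with a minimal Python sketch. It is not the prototype's code: the `Case` type, the `cohort_verdicts` function, and the idea of one gating capability per rung above Observe are all assumptions made for the example. A case passes only if every rule passes, a capability is cleared for a cohort only if every case passes, and the verdict carries the failing case IDs so it can be walked back down.

```python
from dataclasses import dataclass

# Autonomy ladder, lowest to highest rung.
LADDER = ["Observe", "Assisted", "Partially Autonomous", "Fully Autonomous"]


@dataclass(frozen=True)
class Case:
    """One evaluated case (hypothetical shape)."""
    case_id: str
    cohort: str
    capability: str
    rules: dict  # rule name -> bool (passed?)

    def passed(self) -> bool:
        # Strict rollup: a single failing rule fails the case.
        return all(self.rules.values())


def cohort_verdicts(cases, gates):
    """Return {cohort: (rung, blocking_case_ids)}.

    `gates` lists capability names in ladder order, one per rung above
    Observe (an assumption of this sketch, so at most three gates).
    """
    failures = {}  # (cohort, capability) -> failing case ids
    cohorts = set()
    for case in cases:
        cohorts.add(case.cohort)
        key = (case.cohort, case.capability)
        failures.setdefault(key, [])
        if not case.passed():
            failures[key].append(case.case_id)

    verdicts = {}
    for cohort in sorted(cohorts):
        rung, blocking = 0, []
        for gate in gates:
            failing = failures.get((cohort, gate))
            if failing is None:  # no evidence for this gate: do not clear it
                break
            if failing:          # any failing case blocks the rung
                blocking = failing
                break
            rung += 1
        verdicts[cohort] = (LADDER[rung], blocking)
    return verdicts
```

Because there are no weights or averages, a cohort with one failing case stops at the rung below that gate, and the returned case IDs point directly at the evidence.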
Wheel — Duplicate Prescription Detection Agent (PrescribeShield)
Problem: In virtual care, patients who seek second opinions or stockpile medication can end up with duplicate prescriptions under different drug names. Rule-based checks flag exact matches but miss dose titration, naming variation, and data held in other systems. The risk spans patient safety, payer clawbacks, regulatory exposure, and platform trust.
Approach: A six-step pipeline that separates deterministic work from AI reasoning: normalize the pending Rx, normalize medication history, filter candidates, compute overlap (all deterministic), then classify intent and surface a finding (AI reasoning). The agent surfaces evidence and a severity tier; the clinician always decides. Mindset: assist, never replace.
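The deterministic half of such a pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the prototype's implementation: the `Fill` type, the tiny `INGREDIENT_BY_NAME` table, and the function names are hypothetical, and a real system would presumably normalize names against a proper drug terminology source instead of a hard-coded dictionary.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical brand/generic lookup, for illustration only.
INGREDIENT_BY_NAME = {
    "lipitor": "atorvastatin",
    "atorvastatin": "atorvastatin",
    "tylenol": "acetaminophen",
    "acetaminophen": "acetaminophen",
}


@dataclass(frozen=True)
class Fill:
    drug_name: str
    fill_date: date
    days_supply: int

    @property
    def ingredient(self) -> str:
        # Normalize: trim, lowercase, map to ingredient if known.
        key = self.drug_name.strip().lower()
        return INGREDIENT_BY_NAME.get(key, key)

    @property
    def last_covered_day(self) -> date:
        # A 30-day supply filled on day 1 covers days 1 through 30.
        return self.fill_date + timedelta(days=self.days_supply - 1)


def overlap_days(a: Fill, b: Fill) -> int:
    """Number of calendar days covered by both fills (0 if none)."""
    start = max(a.fill_date, b.fill_date)
    end = min(a.last_covered_day, b.last_covered_day)
    return max(0, (end - start).days + 1)


def find_overlaps(pending: Fill, history: list) -> list:
    """Filter history to the same ingredient, then keep overlapping fills."""
    findings = []
    for past in history:
        if past.ingredient != pending.ingredient:
            continue
        days = overlap_days(pending, past)
        if days > 0:
            findings.append((past, days))
    return findings
```

For example, a pending 30-day "Lipitor" fill dated 2024-03-21 against a 30-day "atorvastatin" fill dated 2024-03-01 yields a 10-day overlap (21 to 30 March). Only findings like this would then be handed to the model for intent classification.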
Solution & trade-off: The core trade-off is safety vs. alert fatigue. I tuned for it with a three-tier escalation model (Low/Medium/High by confidence and potential harm) and a deliberately minimal clinician UI showing only the matched med, strength, fill date, days' supply, pharmacy, overlap, and a clear action. Key decision: don't let the LLM do math. Date and overlap logic is anchored in code, and the model is used only where genuine ambiguity exists.
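A three-tier escalation rule keyed on confidence and potential harm might look like the sketch below. The thresholds (0.5 and 0.8), the harm labels, and the `escalation_tier` name are assumptions chosen for illustration; the source does not state the actual cut-offs.

```python
from enum import Enum


class Tier(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


HARM_LEVELS = ("minor", "moderate", "severe")  # hypothetical labels


def escalation_tier(confidence: float, harm: str) -> Tier:
    """Map model confidence (0 to 1) and potential harm to a tier.

    Thresholds are illustrative assumptions, not tuned values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if harm not in HARM_LEVELS:
        raise ValueError(f"unknown harm level: {harm!r}")

    # Severe harm escalates at moderate confidence; moderate harm
    # needs high confidence to reach the top tier.
    if harm == "severe" and confidence >= 0.5:
        return Tier.HIGH
    if harm == "moderate" and confidence >= 0.8:
        return Tier.HIGH
    # Anything reasonably likely, or potentially severe, is at least Medium.
    if confidence >= 0.5 or harm == "severe":
        return Tier.MEDIUM
    return Tier.LOW
```

Keeping the rule as a small pure function makes the alert-fatigue trade-off explicit: raising a threshold visibly moves findings down a tier, and the clinician still makes the final call.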
Sprinter Console — Care Ops Surface
Problem: Operational teams in care delivery juggle scattered tools and signals to triage, route, and resolve work — slowing decisions and making it hard to see what needs attention now versus later.
Approach: I prototyped a single operational console that consolidates the queue and surfaces the next best action, applying the same principle from my agentic work: make the system explicit and measurable — clear states, traceable decisions, and a human firmly in control of consequential actions.
Solution & trade-off: The key call was breadth vs. depth. Rather than a sprawling do-everything console, I scoped a focused surface that proves value on the highest-volume operational paths first, with deterministic guardrails around any automated step and an obvious manual override, leaving richer workflows for a later iteration.
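One way to picture "clear states, traceable decisions, and a human in control" is a small state machine with an audit trail. The sketch below is hypothetical throughout: the state names, the `HUMAN_ONLY` set, and the `WorkItem` class are assumptions for the example, not the console's actual design.

```python
from dataclasses import dataclass, field

# Allowed transitions between explicit states (hypothetical workflow).
ALLOWED = {
    "new": {"triaged"},
    "triaged": {"routed"},
    "routed": {"resolved"},
    "resolved": set(),
}

# Consequential states that only a human may move an item into.
HUMAN_ONLY = {"resolved"}


@dataclass
class WorkItem:
    item_id: str
    state: str = "new"
    history: list = field(default_factory=list)  # (from, to, actor)

    def transition(self, new_state: str, actor: str, is_human: bool) -> None:
        # Deterministic guardrail 1: only known transitions are legal.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        # Deterministic guardrail 2: automation cannot take consequential steps.
        if new_state in HUMAN_ONLY and not is_human:
            raise PermissionError(f"{new_state} requires a human actor")
        # Record every decision so it can be traced later.
        self.history.append((self.state, new_state, actor))
        self.state = new_state
```

An automated router could move an item from "new" to "routed", but the attempt to resolve it would be rejected until a person takes the action, and each step is recorded with its actor.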
