Outside the day job

Vibe-coded Prototypes

Working AI prototypes I built end-to-end to pressure-test product ideas: an agent-evaluation engine, a duplicate-prescription detection agent, and an ops console.

Innovaccer AI Studio — Agent Evaluation Engine

Problem: Most agent-evaluation tools answer “did v1.9 beat v1.8?” — not the question a non-technical population-health manager actually has: “Can I let this agent act on my members on its own — and if not, for whom and how far?” Scores don't translate into a defensible deployment decision.

Approach: I designed evaluation as trust infrastructure, not a reporting screen. Verdicts are expressed in what the agent is allowed to do (an Observe → Assisted → Partially Autonomous → Fully Autonomous ladder), not F1 scores. Checks are layered (deterministic first, LLM-judge second) under one mental model, and everything is cohort-sliced so sub-population bias surfaces instead of hiding behind a strong average.
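To make the layering and cohort slicing concrete, here is a minimal sketch in Python. It is not the prototype's code: the names `Case`, `run_layered`, and `pass_rate_by_cohort` are hypothetical, the LLM judge is represented by a plain callable, and "deterministic first" is assumed to mean the judge is only consulted once every deterministic check has passed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    case_id: str
    cohort: str   # e.g. an age band or plan type (illustrative)
    output: dict  # whatever the agent produced for this case


def run_layered(case, deterministic_checks, judge: Callable):
    """Run cheap deterministic checks first; call the judge only if all pass.

    Returns (passed, reason) so a failure can be traced to its layer.
    """
    for name, check in deterministic_checks:
        if not check(case):
            return False, f"deterministic:{name}"
    if not judge(case):
        return False, "judge"
    return True, "ok"


def pass_rate_by_cohort(cases, deterministic_checks, judge: Callable):
    """Pass rate per cohort, so a weak cohort is visible next to a strong one."""
    totals = defaultdict(lambda: [0, 0])  # cohort -> [passed, total]
    for case in cases:
        passed, _ = run_layered(case, deterministic_checks, judge)
        totals[case.cohort][0] += int(passed)
        totals[case.cohort][1] += 1
    return {cohort: p / t for cohort, (p, t) in totals.items()}
```

With four passing cases in one cohort and one pass plus one failure in another, the overall pass rate is 5 of 6, while the per-cohort view shows the second cohort at 50%. That gap is what the cohort slicing is meant to expose.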

Solution & trade-off: The headline trade-off was objectivity vs. nuance. I kept the rollup strictly objective — rules → cases → cohorts → capability → autonomy verdict, with no weighted averages or mid-way tags — so any verdict can be walked down to the failing case and the agent's step-by-step reasoning. V1 stayed tight (pre-built packs, deterministic comparison, simple dashboard); AI-generated evaluators, cohort value-maximization, and RBAC were pushed to V2/V3.
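One way to read the strictly objective rollup is sketched below in Python. The all-or-nothing aggregation at every level, the `required` map from ladder rung to capabilities, and the function name `rollup` are assumptions made for illustration; the write-up does not specify the prototype's exact rules.

```python
LADDER = ["Observe", "Assisted", "Partially Autonomous", "Fully Autonomous"]


def rollup(results, required):
    """Roll rule results up to an autonomy verdict with no weighting.

    results:  capability -> cohort -> case_id -> {rule_name: passed}
    required: rung -> capabilities that must pass to reach that rung
              (hypothetical mapping, supplied by the caller)

    Assumption: a capability passes only if every rule passes on every
    case in every cohort. Failures are collected as
    (capability, cohort, case_id, rule) so a verdict can be walked down
    to the exact failing case.
    """
    failures = []
    capability_ok = {}
    for capability, cohorts in results.items():
        ok = True
        for cohort, cases in cohorts.items():
            for case_id, rules in cases.items():
                for rule, passed in rules.items():
                    if not passed:
                        ok = False
                        failures.append((capability, cohort, case_id, rule))
        capability_ok[capability] = ok

    # Climb the ladder one rung at a time; stop at the first rung
    # whose required capabilities are not all passing.
    verdict = LADDER[0]
    for rung in LADDER[1:]:
        if all(capability_ok.get(c, False) for c in required.get(rung, [])):
            verdict = rung
        else:
            break
    return verdict, failures
```

Because nothing is averaged, a single failing rule in a single cohort is enough to cap the verdict, and the returned failure tuples point straight at the case to inspect.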

How to try it (mocked data)

Launch the prototype.
In AI Studio on the main page, click the Evaluations card.
Click New evaluation, then Continue.
Review the built-in eval packs.
Add a custom evaluation, then Launch.
Open the dashboard — it shows whether the agent is ready for autonomous deployment.
Drill into failing cases and review them by cohort.
Manually override where needed.
Deploy the agent once evaluation is complete.

View prototype ↗

Wheel — Duplicate Prescription Detection Agent (PrescribeShield)

Problem: In virtual care, patients seek second opinions or stockpile — and can end up with duplicate prescriptions under different drug names. Rule-based checks flag exact matches but miss dose titration, naming variation, and cross-system data. The risk spans patient safety, payer clawbacks, regulatory exposure, and platform trust.

Approach: A six-step pipeline that separates deterministic work from AI reasoning: normalize the pending Rx, normalize medication history, filter candidates, compute overlap (all deterministic), then classify intent and surface a finding (AI reasoning). The agent surfaces evidence and a severity tier; the clinician always decides. Mindset: assist, never replace.
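The deterministic half of that pipeline (normalize, filter candidates, compute overlap) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the prototype's code: the `INGREDIENT` lookup is a tiny hand-written stand-in for a real drug vocabulary, and `Fill`, `overlap_days`, and `candidates` are hypothetical names. Coverage is assumed to start on the fill date and run for `days_supply` days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative brand/generic lookup; a real system would use a full
# drug vocabulary rather than a hard-coded table.
INGREDIENT = {
    "lipitor": "atorvastatin",
    "atorvastatin": "atorvastatin",
    "zocor": "simvastatin",
    "simvastatin": "simvastatin",
}


@dataclass(frozen=True)
class Fill:
    drug_name: str
    fill_date: date
    days_supply: int

    @property
    def ingredient(self) -> str:
        key = self.drug_name.strip().lower()
        return INGREDIENT.get(key, key)

    @property
    def end_date(self) -> date:
        # Last day covered by this fill (the fill date counts as day 1).
        return self.fill_date + timedelta(days=self.days_supply - 1)


def overlap_days(a: Fill, b: Fill) -> int:
    """Number of calendar days covered by both fills (0 if none)."""
    start = max(a.fill_date, b.fill_date)
    end = min(a.end_date, b.end_date)
    return max(0, (end - start).days + 1)


def candidates(pending: Fill, history: list) -> list:
    """History fills with the same ingredient and overlapping coverage."""
    found = []
    for fill in history:
        days = overlap_days(pending, fill)
        if fill.ingredient == pending.ingredient and days > 0:
            found.append((fill, days))
    return found
```

Only the fills returned by `candidates` would go on to the AI reasoning steps, so the model never has to do date arithmetic itself.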

Solution & trade-off: The core trade-off is safety vs. alert fatigue. I tuned for it with a three-tier escalation model (Low/Medium/High by confidence and potential harm) and a deliberately minimal clinician UI showing only the matched med, strength, fill date, days' supply, pharmacy, overlap, and a clear action. Key decision: don't let the LLM do math. Date and overlap logic is anchored in code, and the model is used only where genuine ambiguity exists.
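A three-tier mapping of this kind could look like the Python sketch below. The thresholds (0.8 and 0.5), the 1-to-3 harm scale, and the name `escalation_tier` are all assumptions chosen for illustration; the write-up does not state the actual cut-offs.

```python
from enum import Enum


class Tier(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


def escalation_tier(confidence: float, harm: int) -> Tier:
    """Map a finding to a tier from confidence and potential harm.

    confidence: 0.0 to 1.0, how sure the pipeline is that this is a duplicate
    harm:       1 (minor) to 3 (severe), hypothetical harm scale
    Thresholds are illustrative, not the prototype's values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if harm not in (1, 2, 3):
        raise ValueError("harm must be 1, 2, or 3")

    # High: confident match AND at least moderate potential harm.
    if confidence >= 0.8 and harm >= 2:
        return Tier.HIGH
    # Medium: either a plausible match or a severe drug, but not both strong.
    if confidence >= 0.5 or harm == 3:
        return Tier.MEDIUM
    return Tier.LOW
```

Requiring both signals for the High tier is one way to keep interruptive alerts rare, which is the alert-fatigue side of the trade-off; a severe drug with a weak match still surfaces, but at Medium.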

How to try it (mocked data)

In the first-name field, pick one of the 4 pre-seeded mocked options.
Hit Submit.
See the provider-facing view triggered when a duplicate prescription is detected.
Explore all 4 CTAs on that page — Review, Edit, Hold, Cancel.

View prototype ↗
System prompts ↗

Sprinter Console — Care Ops Surface

Problem: Operational teams in care delivery juggle scattered tools and signals to triage, route, and resolve work — slowing decisions and making it hard to see what needs attention now versus later.

Approach: I prototyped a single operational console that consolidates the queue and surfaces the next best action, applying the same principle from my agentic work: make the system explicit and measurable — clear states, traceable decisions, and a human firmly in control of consequential actions.

Solution & trade-off: The key call was breadth vs. depth. Rather than a sprawling do-everything console, I scoped a focused surface that proves value on the highest-volume operational paths first, with deterministic guardrails around any automated step and an obvious manual override, leaving richer workflows for a later iteration.

How to try it (mocked data)

See how information surfaces on the provider's page before an in-person visit.
Toggle the switch in the top-right corner to preview what future iterations could add.

View prototype ↗