Kris Blackmore

contact

Augintel Assist

Teaching a generative AI model when to say "I don't know."

Platform

B2B SaaS - GovTech

My Role

Lead product designer

With

PM, ML, CS, eng.

Timeline

~13 months

The opportunity

Too many notes. Too little time. Too much at stake.

Augintel is case management software for child welfare agencies. A single active case can accumulate over a hundred notes covering visits, court filings, service updates, and risk flags, often written by different workers. New workers inheriting a case face the steepest climb: hundreds of entries to read and a looming deadline.

I led design on Augintel Assist, a conversational AI feature that lets workers ask plain-language questions about a case, with answers grounded only in that case's own notes. The risk was obvious from day one: this isn't a domain where a model can guess. Getting it wrong in a court report has real consequences for a family.

Discovery

I surveyed workers and interviewed six supervisors before we built anything.

Alongside early surveying, I ran an ideation workshop with stakeholders to understand where AI could create the most value, then interviewed six supervisors at a county child welfare agency to understand what would make an AI summary trustworthy enough to use before a hearing.

Survey data

What supervisors said

"As long as it were subject to review and ultimately my final decision, I think that would be fine. There does need to be some editorial control."

Lawerence, Supervisor

What we learned

Excitement, with conditions

Every supervisor was enthusiastic, but tied their enthusiasm directly to editorial control. They wanted a draft to review, not a finished answer to submit.

Omissions are the real risk

"It can really vary. We could be putting a child back at risk." Supervisors were less worried about the AI inventing things than quietly leaving something out.

"Where it came from" matters as much as "what it says"

Several interviewees raised court admissibility unprompted. A generated summary needs to trace to a source or it's unusable.

Act 3 — The solution

AI had to show its work

The interface design choices were almost all in service of one goal: making it safe for a supervisor to use in a workflow that ends in court. Research gave us four clear principles to design against.

Key decisions

Each one tied directly to a supervisor research finding.

Every claim needs a source

Supervisors needed to verify before they'd trust. Every generated answer carries numbered citation markers linking back to source notes, with a "view all associated notes" link.

Ask, don't assume

Cases routinely involve more than one person matching the same description. When a query is ambiguous, Assist surfaces the specific people on the case and asks the worker to choose.

The worker has the final word

Supervisors wanted a draft, not a finished answer. Generated answers are labeled as drafts, with thumbs-up/down feedback controls placed directly on the output.

Name the limitation upfront

"May contain incomplete or inaccurate information" appears on every draft. This isn't legal boilerplate; it's a direct response to what supervisors said they needed before they could trust the output.

The trade-off

I chose guided, structured input over natural language

The obvious direction for a conversational AI feature is a free-text box. But early model testing revealed a consistent failure pattern: open-ended queries produced unreliable scoping. The model would answer about the wrong person, collapse timeframes, or synthesize across contexts it shouldn't combine.

The guided approach wasn't a UX preference. It was a strategy for constraining the problem space early, so the model could produce reliable answers within a bounded scope. Each structured interaction also generated clean, labeled training signal, which made guided input a path toward natural-language input rather than a detour from it.