
Epistemic Honesty in AI

Teaching a system to say "I do not know"

Cleo's Team · 3 min read

Language models have a well-documented tendency to fill gaps in their knowledge with plausible-sounding fabrication. In a creative writing context, this is a feature. In a business context where the output will be sent to real customers, it is a serious liability.

We have spent considerable effort engineering what we call epistemic honesty into Cleo's behaviour - the ability and willingness to acknowledge when it does not know something, when its confidence is low, or when the available information is insufficient for the task.

The confidence spectrum

Not all AI operations carry the same epistemic risk. Generating a social media caption based on a well-documented product is relatively safe - the information is available and the stakes of a slight inaccuracy are low. Drafting an email campaign that references specific pricing, product availability, or legal claims is high-risk - an inaccuracy could damage the business.

We design the AI's behaviour to reflect this spectrum. For high-confidence tasks with ample context, the system acts decisively. For tasks where the context is thin or the claims are specific, the system flags uncertainty and asks the user to verify before proceeding.
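
To make that concrete, here is a minimal sketch of how such a risk-tiering decision might look. Everything in it is hypothetical - the names (TaskRisk, GenerationTask, assess_risk) and the coverage threshold are ours for illustration, not Cleo's actual implementation.

```python
# A minimal sketch of the risk-tiering idea, under hypothetical names and an
# illustrative threshold; not Cleo's actual implementation.
from dataclasses import dataclass
from enum import Enum


class TaskRisk(Enum):
    LOW = "low"    # e.g. a caption for a well-documented product
    HIGH = "high"  # e.g. an email citing prices or legal claims


@dataclass
class GenerationTask:
    references_pricing: bool
    references_legal_claims: bool
    context_coverage: float  # 0.0-1.0: how much of the needed info is on hand


def assess_risk(task: GenerationTask) -> TaskRisk:
    """Specific, consequential claims or thin context push a task to HIGH."""
    if task.references_pricing or task.references_legal_claims:
        return TaskRisk.HIGH
    if task.context_coverage < 0.7:  # illustrative cut-off, not a tuned value
        return TaskRisk.HIGH
    return TaskRisk.LOW


def plan(task: GenerationTask) -> str:
    """Choose the behaviour the spectrum calls for."""
    if assess_risk(task) is TaskRisk.HIGH:
        return "flag uncertainty and ask the user to verify before proceeding"
    return "act decisively"
```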

Asking rather than guessing

The instinct in AI product design is to minimise the number of times the AI asks the user for clarification. Asking feels like a failure of intelligence. We disagree. An AI that asks a clarifying question when it genuinely needs information is more trustworthy than one that silently fills in the blanks.

The key is asking at the right level. The AI should not ask about things it can determine from context. It should not ask about things the user has already provided. But when a piece of information is genuinely missing and the output depends on it, asking is the honest thing to do.
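
A rough sketch of that rule, again with illustrative names: the system asks only when a field the output depends on is absent from both the derived context and what the user has supplied.

```python
# A sketch of "asking at the right level". The rule encoded here: never ask
# for what the context or the user already supplies; ask only when a field
# the output depends on is genuinely missing. All names are illustrative.

def missing_required_info(required: set[str],
                          from_context: dict[str, str],
                          from_user: dict[str, str]) -> set[str]:
    """Fields the output depends on that the AI cannot determine."""
    known = from_context.keys() | from_user.keys()
    return required - known


def next_action(required: set[str],
                from_context: dict[str, str],
                from_user: dict[str, str]) -> str:
    gaps = missing_required_info(required, from_context, from_user)
    if gaps:
        return "ask the user for: " + ", ".join(sorted(gaps))
    return "proceed with generation"
```

So, for example, a task requiring a launch date and a price, with only the price available from context, would produce exactly one question - about the launch date - rather than a guess.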

The verification layer

For content that will reach an external audience - emails, published posts, advertisements - the system performs a verification pass that checks claims against the user's knowledge base. If the generated content references a product feature, the system confirms that feature exists in the documented product information. If it references a price, it checks the pricing data.
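
In outline, a pass like this might look as follows - a deliberately simplified sketch that assumes the claims have already been extracted from the draft and models the knowledge base as a plain lookup keyed by claim kind. In practice the extraction step would be model-driven.

```python
# A deliberately simplified sketch of the verification pass. Names are
# illustrative; the real extraction of claims would be model-driven.
from dataclasses import dataclass


@dataclass
class Claim:
    kind: str  # e.g. "feature" or "price"
    text: str  # e.g. "CSV export" or "$29/month"


def unverifiable_claims(claims: list[Claim],
                        kb: dict[str, set[str]]) -> list[Claim]:
    """Return every claim that cannot be confirmed against the knowledge base."""
    return [c for c in claims if c.text not in kb.get(c.kind, set())]
```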

When verification fails - when the AI generated a claim that cannot be confirmed - the content is flagged for review with the specific unverifiable claims highlighted. The user can confirm the claim is accurate (perhaps it is new information the system does not have yet) or correct it.
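
The resolution step might then look something like this sketch, with names again illustrative: confirming a flagged claim writes it back into the knowledge base, so the same claim verifies automatically next time, while rejecting it swaps in the user's correction.

```python
# A sketch of the resolution step, with illustrative names: a confirmed claim
# was accurate but not yet documented, so it is added to the knowledge base;
# a rejected claim is replaced by the user's correction.

def resolve_flagged_claim(kind: str, text: str, user_confirms: bool,
                          kb: dict[str, set[str]],
                          correction: str | None = None) -> str:
    """Return the text that should appear in the final content."""
    if user_confirms:
        kb.setdefault(kind, set()).add(text)  # new information, now documented
        return text
    return correction if correction is not None else text
```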

The long-term benefit

Epistemic honesty has a compound effect on trust. Every time the AI surfaces uncertainty rather than hiding it, the user's confidence in the outputs it delivers without caveats grows. If the system is willing to say "I am not sure about this" on occasion, the user can trust that when it does not say that, the output is solid.

This is the opposite of the "fake it till you make it" approach that many AI products take. We would rather Cleo be occasionally cautious than occasionally wrong.

- Cleo's Team


Written by Cleo's Team

Building Cleo, an AI marketing operating system. These posts cover the architecture decisions, technical challenges, and lessons learned along the way.
