Fact Guard: How I Stopped My AI From Making Stuff Up
Five rounds of prompt engineering taught me that telling a fast LLM not to fabricate doesn't work. Here's the architecture that does: a deterministic check on every generation, plus a Sonnet cleanup pass whenever that check flags something.
I asked my own marketing tool to polish a blog post I'd written. It came back fluent, structured, confident — and full of things I never said. Five fabricated statistics. A fake "I noticed it first on a Tuesday" anecdote. An invented audit of fifty pages that I'd never run. The polish was excellent. The post was a lie.
That was Monday afternoon. By Tuesday night, the platform had stopped doing that. The fix isn't a better prompt. It's an architecture I'm calling Fact Guard, and it now runs on every single generation across every agent in VibeFlow.
This is the post about what broke, what I tried that didn't work, and the thing that finally did.
The problem in one sentence
A fast LLM, given a long list of instructions like "don't invent statistics," will read the rule, understand the rule, and then invent statistics anyway. It will do this confidently, with citations to studies that don't exist and percentages that aren't anywhere in the input. The fluency is a feature of the model. The lying is a side effect of that same feature.
For a marketing tool, that's a problem you can't ship around. Every generation that ships under a user's byline carries their credibility. One statistic that a reader checks and can't find kills trust for the whole piece, and for the tool that produced it.
Rounds 1 through 5: four that didn't work, one that did
I spent most of Monday trying to fix this with prompt engineering. Here's the rough sequence:
Round 1. Added an explicit "no invented specifics" rule to the Content agent's system prompt. Numbers, percentages, dollar amounts, dates. Result: the model stopped inventing percentages and started inventing experiences. "I audited my top 50 pages." "I rebuilt five of my highest-traffic pages." Same failure, different surface.
Round 2. Added a second rule: "no invented events, experiences, or case studies." Named the failure modes explicitly. Result: the model stopped saying "50 pages" and started saying "five different sources" — word-form numbers slipping past a digit-only check. It also kept emitting sentences that sounded like fabricated retrospectives. "I took three specific approaches. I embedded context into my answers." The structure of a case study, with no underlying case.
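To make the word-form gap concrete, here is a minimal sketch of a digit-only check next to one that also knows a few number words. The pattern names and the word list are assumptions for illustration, not the rules VibeFlow actually uses.

```python
import re

# Assumed patterns for illustration only.
DIGITS = re.compile(r"\d+(?:\.\d+)?%?")
NUMBER_WORDS = re.compile(
    r"\b(?:two|three|four|five|six|seven|eight|nine|ten|"
    r"twenty|thirty|forty|fifty|hundred|thousand)\b",
    re.IGNORECASE,
)


def digit_only_hits(text: str) -> list[str]:
    """Return numeric specifics written with digits."""
    return DIGITS.findall(text)


def with_word_forms(text: str) -> list[str]:
    """Return digit specifics plus spelled-out numbers."""
    return DIGITS.findall(text) + NUMBER_WORDS.findall(text)


if __name__ == "__main__":
    sentence = "I pulled from five different sources."
    print(digit_only_hits(sentence))   # the digit-only check sees nothing
    print(with_word_forms(sentence))   # the word-form check sees "five"
```

The word list deliberately leaves out "one", which shows up in ordinary prose far too often to be a useful signal.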
Round 3. Auto-cleanup pass. After the first generation, run a fabrication check on the output and feed the flagged items back to the model with "remove these." Result: the model often kept the flagged items anyway. It would acknowledge them, write a different version, and leave the fabrications in. Catching the lie didn't stop the lie.
Round 4. Expanded the detection net — word-form numbers, more verbs, vague-network claims ("the founders I've talked to"). Better at catching, still not at fixing.
Round 5. Switched the cleanup model. Same prompt, different brain.
That's the round that worked.
The actual fix
The failure across rounds 1-4 was treating this as a prompt-engineering problem. It isn't. It's a model-fit problem. The Haiku model that's perfect for "generate a blog post quickly from a brief" is the wrong model for "obey a long list of negative constraints while preserving an existing draft." Fast generation models are tuned for fluency. Constraint-following at length requires a different kind of model.
So Fact Guard has two parts.
Part one: a deterministic check. Every generation runs through a regex-based scanner that extracts concrete specifics from the output — numbers, percentages, dollar amounts, multipliers, years, days of the week, and first-person past-tense narrative phrases like "I audited" or "I rebuilt" or "the founders I've talked to." It checks each one against the user's original input. Anything in the output that isn't traceable to the input gets flagged.
The scanner doesn't try to be smart. It doesn't use an LLM. It runs in milliseconds and catches the failure modes I'd actually observed across five rounds of testing. It catches word-form numbers, verb-rich event claims, and vague network references: the patterns the model reaches for when it wants to sound credible without earning it.
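A scanner along these lines can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the production code: the pattern set, the category names, and the functions `extract_specifics` and `find_unsupported` are all hypothetical, and "traceable to the input" is reduced here to a case-insensitive whole-token match.

```python
import re

# Hypothetical pattern set; the real scanner's rules are not shown in this post.
# Order matters: specific patterns run first so "50%" is not also counted as "50".
PATTERNS = {
    "percent": re.compile(r"\b\d+(?:\.\d+)?%"),
    "money": re.compile(r"\$\d[\d,]*(?:\.\d+)?\b"),
    "multiplier": re.compile(r"\b\d+(?:\.\d+)?x\b", re.IGNORECASE),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "number": re.compile(r"\b\d[\d,]*(?:\.\d+)?\b"),
    "number_word": re.compile(
        r"\b(?:two|three|four|five|six|seven|eight|nine|ten|fifty|hundred)\b",
        re.IGNORECASE,
    ),
    "weekday": re.compile(
        r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b", re.IGNORECASE
    ),
    "narrative": re.compile(
        r"\bI (?:audited|rebuilt|tested|measured)\b|\bthe founders I've talked to\b",
        re.IGNORECASE,
    ),
}


def extract_specifics(text: str) -> list[tuple[str, str]]:
    """Collect (kind, value) pairs, skipping matches that overlap an earlier one."""
    found: list[tuple[str, str]] = []
    claimed: list[tuple[int, int]] = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            overlaps = any(
                match.start() < end and start < match.end()
                for start, end in claimed
            )
            if overlaps:
                continue
            claimed.append(match.span())
            found.append((kind, match.group(0)))
    return found


def find_unsupported(output: str, source: str) -> list[tuple[str, str]]:
    """Flag every specific in the output that does not appear in the source."""
    flagged = []
    for kind, value in extract_specifics(output):
        # Whole-token match so "50" is not satisfied by "150" in the source.
        needle = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        if not re.search(needle, source, re.IGNORECASE):
            flagged.append((kind, value))
    return flagged


if __name__ == "__main__":
    source = "Our signup rate went from 2% to 3% after the redesign."
    output = (
        "Signups rose from 2% to 3%. "
        "I audited my top 50 pages on a Tuesday and saw a 4x lift."
    )
    for kind, value in find_unsupported(output, source):
        print(kind, value)
```

The trade-off is the one described above: nothing here understands meaning, so it will miss paraphrased fabrications and flag some innocent phrases, but it is cheap, predictable, and easy to extend when a new failure pattern shows up.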
Part two: a cleanup pass on Claude Sonnet. If the scanner finds nothing, the generation ships. If it finds anything, the system pauses, sends the original draft plus the flagged items plus the user's source input to Sonnet with a focused instruction: rewrite this, removing exactly these items, preserving everything else. Sonnet follows instruction lists meaningfully better than Haiku — that's the model difference doing real work. The cleaned draft replaces the original before the user ever sees it.
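The control flow of that two-part design might look roughly like the sketch below. Everything named here is an assumption: `fact_guard`, `GuardResult`, and the injected `scan` and `rewrite` callables are hypothetical, `rewrite` stands in for the Sonnet call (no real API is shown), and the rescan after cleanup is my guess at how leftover items could be counted.

```python
import re
from dataclasses import dataclass
from typing import Callable

Scan = Callable[[str, str], list[str]]            # (draft, source) -> flagged items
Rewrite = Callable[[str, list[str], str], str]    # (draft, flagged, source) -> new draft


@dataclass
class GuardResult:
    text: str        # the draft the user will see
    cleaned: int     # flagged items no longer present after cleanup
    remaining: int   # flagged items still present after cleanup


def fact_guard(draft: str, source: str, scan: Scan, rewrite: Rewrite) -> GuardResult:
    """Run the deterministic check, and the cleanup pass only if it flags something."""
    flagged = scan(draft, source)
    if not flagged:
        return GuardResult(text=draft, cleaned=0, remaining=0)

    cleaned_draft = rewrite(draft, flagged, source)
    # Assumption: rescan the rewritten draft to see what survived.
    remaining = len(scan(cleaned_draft, source))
    cleaned = max(len(flagged) - remaining, 0)
    return GuardResult(text=cleaned_draft, cleaned=cleaned, remaining=remaining)


def demo_scan(text: str, source: str) -> list[str]:
    """Stand-in scanner: flag digit runs that never appear in the source."""
    return [n for n in re.findall(r"\d+", text) if n not in source]


def demo_rewrite(draft: str, flagged: list[str], source: str) -> str:
    """Stand-in for the model call: drop any sentence containing a flagged item."""
    sentences = re.split(r"(?<=[.!?])\s+", draft)
    kept = [s for s in sentences if not any(item in s for item in flagged)]
    return " ".join(kept)


if __name__ == "__main__":
    result = fact_guard(
        draft="Page speed matters. I audited 50 pages.",
        source="Write about why page speed matters.",
        scan=demo_scan,
        rewrite=demo_rewrite,
    )
    print(result)
```

Passing the scanner and the rewriter in as plain callables keeps the orchestration testable without a model in the loop, and makes swapping the cleanup model a one-line change.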
The user sees one of three small chips in the output header:
- Reviewing for accuracy… while the cleanup pass runs (about three to five seconds)
- ✓ Cleaned N invented specifics when the cleanup successfully removed flagged items
- N to review when the cleanup couldn't remove some of the flagged specifics, as a fallback for manual review
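The mapping from pipeline state to chip text could be as small as the function below. It is a sketch under assumptions: the `chip_label` name, the `"reviewing"` and `"done"` state strings, the choice to show the review chip ahead of the cleaned chip when both apply, and returning `None` when nothing was flagged are all mine.

```python
from typing import Optional


def chip_label(state: str, cleaned: int = 0, remaining: int = 0) -> Optional[str]:
    """Pick the header chip text for a generation (hypothetical states and precedence)."""
    if state == "reviewing":
        return "Reviewing for accuracy…"
    if remaining > 0:
        # Assumption: leftover items take priority so the user knows to look.
        return f"{remaining} to review"
    if cleaned > 0:
        return f"✓ Cleaned {cleaned} invented specifics"
    return None  # assumption: no chip text when nothing was flagged


if __name__ == "__main__":
    print(chip_label("reviewing"))
    print(chip_label("done", cleaned=5))
    print(chip_label("done", cleaned=3, remaining=2))
```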
That's the system. A regex scanner catches the lies. A more careful model rewrites without them. The user gets clean output, not a warning about dirty output.
What I'm watching
The honest version is that this has been live for a few days, not months. The chips fire on roughly every paste-feedback workflow I've tested — meaning if you give the agents a long existing draft, fabrication detection lights up before the cleanup runs. After cleanup, the chip almost always reports a successful clean.
What I'm watching across the next thirty days: whether the cleanup pass holds up at production scale, whether the regex catches enough of the failure modes real users will hit, and whether the small added cost per generation stays defensible at volume. Sonnet calls only run when fabrications get detected, which is most generations on paste-feedback and approximately zero generations on greenfield briefs.
What this means for users
You don't have to read every generation looking for invented numbers. You don't have to second-guess a polished-sounding paragraph because it might be quietly fictional. The system catches and removes the lies before you see them. If the chip reports zero cleaned specifics, the scanner found nothing in the output that it couldn't trace to your input. If it reports five cleaned specifics, the system caught five things and the output you're reading is the cleaned version, not the original.
That's the trust difference. Most AI marketing tools ship whatever the model generates. VibeFlow ships what the model generates after a separate check has verified every concrete claim against your input.
It also means I can credibly say something I couldn't say a week ago: this tool will not put fabricated facts under your name.
How this connects to everything else I shipped this week
Fact Guard isn't an isolated feature. It's the third leg of a trust-first build, alongside the work I shipped Monday for AI search readiness and the receipt that came in Wednesday morning.
- "Google Rewired Search. Here's What I Shipped in Five Days.": the five product changes for AI Overview citation: schema enrichments, /about, /stack, /learn/seo, and the new AI Search Audit mode. Without those, the entity-graph foundation Fact Guard sits on top of doesn't exist.
- "Google's AI Mode Cited VibeFlow": the receipt. Three days after shipping the schema work, Google's AI Mode started citing VibeFlow for brand-name queries. The work doesn't have to compound for six months to start producing results.
- "I Built an SEO Agent and Made It Audit My Own Site": the post about the same failure mode, on a different agent: the SEO scanner had bad eyes, so the agent confidently invented. Fact Guard is the generalized version of the lesson from that post. Better input means better output, applied to every agent simultaneously.
The thesis underneath all of it: trust is the next competitive edge for AI tools. Speed and fluency are commoditized. The tool that will not lie under your name is the tool that earns the right to ship under your name.
The line I keep coming back to
The agent didn't lie. We just stopped giving it permission to.
That's Fact Guard. It runs on every generation, in every agent, automatically, and it's already on if you're using the platform today.