Why does legal AI hallucinate case citations?

Language models generate text that is statistically plausible, and a citation in correct Bluebook form is easy to imitate even when the underlying case does not exist or does not say what is claimed. The model is not looking the case up and confirming it. It is producing something that looks like a citation. Retrieval grounding reduces this but does not eliminate it, because the model can still misread, miscite, or stretch a real source.

Can tools like Harvey or CoCounsel prevent fabricated citations?

Strong legal AI platforms reduce hallucination by retrieving from real databases, and they are far better than a raw chatbot. But the tool that drafts the brief is also the tool grading its own work, and the more autonomous the agent, the more places a bad citation can hide. Independent verification against authoritative sources remains a separate and necessary step.

What does it actually take to verify a legal citation?

At minimum, confirm the cited authority exists, that the quotation or holding is accurate, that it is cited for the proposition it is offered for, that the pincite is correct, and that it is still good law and has not been reversed, vacated, or overruled. Each of these is a specific check against an authoritative source, not a matter of opinion.

Why is verification a different job from generation?

Generation optimizes for a fluent, persuasive draft. Verification optimizes for whether each factual claim survives checking against the record. A tool tuned to produce confident prose is structurally the wrong instrument to also be the skeptic, which is why verification is better done as deterministic checking, run independently and with the evidence shown.

What is the best tool to check a legal brief for fabricated citations?

There are two kinds of checker. Platform-integrated checkers such as Clearbrief, CoCounsel, and Lexis+ with Protégé verify citations inside the tool that also drafts, grounded in Westlaw or Shepard's. Standalone checkers such as JurisCheck, CiteCheck AI, and BriefCatch run verification as a separate step against authoritative sources. The distinction that matters is independence and evidence: the strongest approach checks the draft independently of whatever produced it and shows the passage behind each result, so a lawyer can confirm it rather than trust a verdict. That independent, evidence-shown standard is what Perch is built around.

Why legal AI hallucinates citations, and what real verification requires

The generation side of legal AI raced ahead. The verification side did not. Lawyers keep getting sanctioned for citations a model invented. This is why fabricated citations happen, why the tool that wrote the brief is the wrong tool to check it, and what real verification requires.

June 27, 2026 · 10 min read

The most embarrassing failure in modern legal practice is now a recurring headline. A lawyer files a brief, opposing counsel or the judge cannot find one of the cited cases, and it turns out the case never existed. It was generated by an AI tool, in flawless citation form, and signed off on by a human who trusted it. The first widely reported instance led to sanctions. It was not the last. Courts have since issued standing orders requiring disclosure of AI use, and the sanctions list keeps growing.

This is worth understanding precisely, because the lesson most people take from it is the wrong one. The problem is not that AI is useless for legal work. The generation side has become genuinely good. The problem is that generation raced ahead of verification, and verification is a different job that almost nothing on the market actually does well. This piece is about why the failure happens, why the tool that wrote the brief is the wrong tool to check it, and what real verification requires.

Why a model invents a citation

To see why fabricated citations are so persistent, you have to see what the model is actually doing when it produces one.

A language model generates text that is statistically likely given everything before it. A legal citation has a rigid, learnable form: a case name, a reporter volume, a page, a court, a year. That form is easy to imitate. The model can produce a citation such as Henderson v. Calhoun, 412 F.3d 1099 (9th Cir. 2005), one that looks perfect in every respect except that no such case exists. The model was never looking the case up and confirming it. It was producing something shaped like a citation, because that is what the surrounding text called for.
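
To make the point about form concrete, here is a minimal sketch in Python. It assumes a deliberately simplified pattern for a few federal reporters; the pattern and the function name `looks_like_citation` are illustrative only and are nowhere near a full Bluebook grammar. What it shows is that a format check accepts the invented citation above, because format carries no information about existence.

```python
import re

# Simplified, illustrative pattern (an assumption for this sketch):
# "Name v. Name, VOLUME REPORTER PAGE (COURT YEAR)".
CITATION_PATTERN = re.compile(
    r"^[A-Z][\w.'& -]*? v\. [A-Z][\w.'& -]*?, "  # case name
    r"\d+ (?:F\.(?:2d|3d|4th)|U\.S\.) \d+ "       # volume, reporter, first page
    r"\((?:[\w. ]+ )?\d{4}\)$"                     # optional court, then year
)

def looks_like_citation(text: str) -> bool:
    """Return True if text has the shape of a case citation.

    This checks form only. It says nothing about whether the case exists.
    """
    return CITATION_PATTERN.match(text) is not None

# The invented citation from the paragraph above.
fabricated = "Henderson v. Calhoun, 412 F.3d 1099 (9th Cir. 2005)"
print(looks_like_citation(fabricated))  # prints True
```

The check passes, which is the problem in miniature: anything that validates shape alone, whether a regex or a reader skimming the page, will wave a fabricated citation through.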

This is the same root cause behind every hallucination, applied to a domain where the output happens to be checkable in principle and catastrophic when wrong. A few specific factors make legal citations especially dangerous:

The format hides the fabrication. A wrong total in a financial model at least looks like a number you might double-check. A fabricated citation looks exactly like a real one. Its correctness is invisible on the page.
The quote can be wrong even when the case is real. Often the cited case exists, but the model has paraphrased a holding it does not contain, or attached a real quotation to the wrong proposition. This is harder to catch than a fully invented case, because the citation survives a quick existence check.
Good law is a moving target. A case can be real, accurately quoted, and still worthless because it was reversed, vacated, or overruled. A model has no reliable, current sense of subsequent history.
Agents multiply the surface area. The more autonomous the tool, the more steps happen between the prompt and the final document, and the more places a bad citation can be introduced and then carried forward as if it were established.

Retrieval helps most with outright invention. If the tool searches a real legal database and pulls actual cases, it is far less likely to invent one whole. This is why serious legal AI platforms are much safer than a raw chatbot. But retrieval narrows the problem rather than closing it. The model still has to read what it retrieved, characterize it correctly, and cite it for the right point, and at each of those steps it can still be confidently wrong.

Why the generator is the wrong checker

Here is the structural issue that no amount of model quality fixes on its own. The tool that drafts the brief is also, in most products, the tool that decides the brief is fine. It is grading its own homework.

Platforms like Harvey and CoCounsel are good precisely because they are tuned to produce fluent, persuasive, well-organized legal work, fast, at the scale of a large firm. That is a real achievement and it is why most of the Am Law 100 are using something in this category. But the qualities that make a tool a great drafter (confidence, fluency, the drive to produce a complete answer) are the opposite of the qualities you want in a verifier. A verifier should be skeptical, should prefer to flag uncertainty over resolving it, and should refuse to call something checked that it did not actually check.

Asking the same system to be both the persuasive advocate and the skeptical auditor is asking it to hold two contradictory objectives at once. In practice the advocate wins, because that is what the product was built and benchmarked to be. The verification, where it exists, tends to be a softer pass by the same machine that produced the text, not an independent check against the record.

This is the same lesson we have written about in financial work, where a confident wrong total is more dangerous than an honest "I could not verify this." The argument is laid out in general terms in "why most AI agents fail in production" and applied to accounts payable in "AI for accounts payable." Legal citations are the same failure wearing a wig: an authoritative-looking output that gets trusted because checking it is tedious and the tool sounds sure.

What real verification requires

Verifying a citation is not a vibe. It is a short list of specific checks, each with a correct answer that exists independently of anyone's opinion:

Existence. Does the cited authority actually exist, in the reporter, court, and year given?
Accuracy of the quotation or holding. Does the case actually say what it is cited as saying, in the place the pincite points to?
Proposition match. Is the case cited for something it genuinely supports, rather than something adjacent that it does not?
Pincite correctness. Does the specific page cited contain the specific language relied on?
Current good law. Has the authority been reversed, vacated, overruled, or superseded, in whole or for the proposition it is being used for?
Jurisdictional relevance. Is the authority actually controlling or persuasive for the court the document is aimed at?

Each of these is a comparison against an authoritative source, and each has a clean answer. That is exactly the kind of work that should be done as a deterministic check, run independently, with the result shown as evidence rather than asserted as a conclusion. A verifier should be able to say, for every citation in a document, which checks it passed, which it failed, and what the failing evidence is, so a lawyer can look at the same record and decide. The standard is not "the AI thinks this is fine." The standard is "here is the proof, check it yourself."
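
The shape of that output can be sketched as data plus a few deterministic functions. This is a minimal sketch in Python, not Perch's implementation or any vendor's API. The `SOURCES` dictionary stands in for an authoritative database, its single entry is placeholder data invented for the illustration, and every name (`CheckResult`, `normalize`, `verify`) is hypothetical. It covers three of the checks above: existence, quotation at the pincite, and current good law.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an authoritative source, keyed by citation.
# The entry is placeholder data invented for this sketch, not a real case.
SOURCES = {
    "Placeholder v. Example, 0 Hyp. 0 (Hyp. Ct. 2000)": {
        "pages": {105: "A party must raise the objection before trial."},
        "status": "good law",
    },
}

@dataclass
class CheckResult:
    check: str     # which check ran
    passed: bool   # deterministic outcome
    evidence: str  # what the check looked at, so a reader can confirm it

def normalize(text: str) -> str:
    """Collapse whitespace and case so line breaks do not cause false mismatches."""
    return " ".join(text.split()).lower()

def verify(citation: str, pincite: int, quotation: str) -> list[CheckResult]:
    record = SOURCES.get(citation)
    if record is None:
        # Nothing else can be checked if the authority cannot be found.
        return [CheckResult("existence", False, "no matching record in source")]
    results = [CheckResult("existence", True, "record found in source")]

    # Quotation check: the quoted language must appear on the cited page.
    page_text = record["pages"].get(pincite, "")
    quote_ok = bool(page_text) and normalize(quotation) in normalize(page_text)
    results.append(CheckResult(
        "quotation at pincite", quote_ok,
        page_text or f"page {pincite} not present in record",
    ))

    # Good-law check: report the recorded status as the evidence.
    status = record["status"]
    results.append(CheckResult("current good law", status == "good law", f"status: {status}"))
    return results

if __name__ == "__main__":
    report = verify("Placeholder v. Example, 0 Hyp. 0 (Hyp. Ct. 2000)", 105,
                    "must raise the objection   before trial")
    for r in report:
        print(f"{r.check}: {'PASS' if r.passed else 'FAIL'} ({r.evidence})")
```

Every result carries the text or status it was judged against, so a failing quotation check hands the lawyer the actual page language rather than a bare verdict. Proposition match and jurisdictional relevance are left out of the sketch because they need richer inputs than a lookup table.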

The tools that check citations, compared

Verification has become its own small category, separate from the platforms that draft. It is worth knowing what is actually on the market, because "our AI checks its own work" and "an independent tool checks the AI's work" are very different promises. Broadly, the options fall into three groups: checkers built into the drafting platform, standalone validators, and independent verification with the evidence shown.

Tool	Approach	What it checks	Where it runs
Clearbrief	Cite-check report inside the drafting tool	Whether citations and assertions are supported, as a report	Microsoft Word add-in
CoCounsel (Thomson Reuters)	Citations linked back to their sources	Existence and source, grounded in Westlaw and Practical Law	Thomson Reuters platform
Lexis+ with Protégé	Shepard's Verify	Validation, status, and treatment — whether a case is still good law	LexisNexis platform
JurisCheck	Deterministic validation against public databases	Existence and Bluebook formatting against CourtListener, Justia, GovInfo	Standalone
CiteCheck AI (LawDroid)	Free extract-and-check	Existence of cited cases in an uploaded brief	Standalone, free
BriefCatch	Citation validation engine	Formatting and existence signals for cited authority	Microsoft Word add-in
Perch	Independent, claim-by-claim checking with evidence attached	Whether each claim traces to a source that actually supports it; unsupported lines flagged	Web, desktop, and CLI, over your own files

Two things stand out. First, several of the strongest options — the ones grounded in Westlaw or Shepard's — live inside the platform that also drafts, which brings us back to the grading-your-own-homework problem: better than a raw chatbot, but not an independent check. Second, the standalone validators are moving in the right direction, treating verification as a separate step against authoritative sources rather than a softer pass by the same machine that wrote the text.

Perch sits in that second camp, with two differences that matter for the checks in the list above. It runs verification independently of whatever produced the draft, and it shows the passage behind each result rather than returning a verdict, so a lawyer can look at the same record and decide. It does this across the web app, the desktop operator, and the CLI, working over the documents and sources you already have. The deeper legal-specific checks — pincite accuracy, proposition match, current good law — are the direction verification has to keep moving, and they are the standard Perch is built toward: computed and shown, not generated and trusted.

Where this leaves the legal AI buyer

None of this means the generation tools are a mistake. If your firm needs to draft, summarize, and research at scale, that category is delivering real value, and the leaders earned their position. The point is narrower and it is the same point that runs through everything we build: generation and verification are different jobs, and the second one is the one that keeps lawyers out of trouble.

The work of confirming that every cited authority exists, says what it is claimed to say, supports the proposition it is offered for, and is still good law is verification work. It is tedious, it is exactly checkable, and it is precisely the kind of task that should be computed and shown rather than generated and trusted. That is the standard Perch is built around: deterministic checks with the evidence attached. It is demonstrated today in financial forensics on the financial forensic intelligence page, and held to the model bar described in "how we evaluate models for verifiable work."

The test is the same one we apply everywhere, and it is the right test for any AI tool that touches a filing: not how good the draft sounds, but whether you can check the work. If that is the problem you are trying to solve, talk to us.
