The machines built to think are making things up, and the people who build them didn't notice.
GPTZero, an AI detection company, examined every paper accepted at last month’s NeurIPS conference in San Diego—4,841 in total. The result: 100 fabricated citations scattered across 51 papers. These references point to academic work that simply doesn’t exist.

Key Takeaways:
- GPTZero identified 100 hallucinated citations in 51 papers at NeurIPS, one of AI’s most prestigious academic conferences
- While statistically small (1.1% of papers affected), fake citations undermine the credibility currency that researchers depend on for career advancement
- The finding exposes a deeper problem: if elite AI researchers can’t catch their own tools lying, ordinary users face even greater risks
NeurIPS carries serious weight in artificial intelligence circles. Getting a paper accepted there opens doors, lands jobs, and builds reputations. The researchers publishing at this conference represent the field’s sharpest minds.
Yet those sharp minds apparently outsourced citation work to large language models—and didn’t verify the output.
The statistical picture needs context. Each paper contains dozens of references. Across tens of thousands of total citations, 100 fake ones approach zero percent in citation frequency. NeurIPS itself noted to Fortune that “Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves [is] not necessarily invalidated.”
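The back-of-envelope arithmetic behind that "approach zero percent" claim can be sketched. The per-paper reference count below is an assumption (the source says only "dozens"); the paper counts come from the report itself:

```python
papers = 4_841          # papers accepted at NeurIPS, per GPTZero's scan
refs_per_paper = 40     # assumption: "dozens" of references per paper
total_refs = papers * refs_per_paper

fake = 100              # hallucinated citations GPTZero found
fake_rate = fake / total_refs   # share of all citations that are fake
paper_rate = 51 / papers        # share of papers affected

print(f"~{total_refs:,} total citations")
print(f"fake citation rate: {fake_rate:.3%}")   # a small fraction of a percent
print(f"affected papers: {paper_rate:.1%}")     # ~1.1%, matching the report
```

Even if the true reference count per paper is half or double the assumed 40, the fake-citation rate stays well under a tenth of a percent.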
Fair point. A bogus citation doesn’t erase legitimate research findings.
But citations aren’t just footnotes. They function as academic currency. Researchers track how often their work gets cited—it measures influence, determines funding, shapes careers. When language models invent references to nonexistent papers, they devalue this entire system.
Every NeurIPS submission passes through multiple peer reviewers instructed specifically to catch hallucinations. Nobody blames them for missing a handful of fake citations buried in thousands. GPTZero acknowledges this directly. The company’s report describes a “submission tsunami” overwhelming conference review systems.
A May 2025 paper titled “The AI Conference Peer Review Crisis” documented exactly this strain at major conferences including NeurIPS.
The deeper question remains unanswered. These researchers know which papers informed their work. They wrote the studies. Why didn’t they simply check whether the LLM’s citation list matched reality?
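The check itself is mechanical. Here is a minimal sketch of one way to vet an LLM-generated reference: fuzzy-match the claimed title against a trusted list of known papers. The titles, function names, and threshold are illustrative; a real pipeline would query a bibliographic database such as Crossref or Semantic Scholar instead of a hardcoded list:

```python
from difflib import SequenceMatcher

def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return " ".join(title.lower().split())

def looks_real(claimed: str, known_titles: list[str], threshold: float = 0.9) -> bool:
    """Return True if the claimed title closely matches any known title."""
    claimed_n = normalize(claimed)
    return any(
        SequenceMatcher(None, claimed_n, normalize(t)).ratio() >= threshold
        for t in known_titles
    )

# Stand-in for a real bibliographic lookup (Crossref, Semantic Scholar, etc.)
known = [
    "Attention Is All You Need",
    "Deep Residual Learning for Image Recognition",
]

print(looks_real("Attention is all you need", known))              # True
print(looks_real("Attentional Flows for Citation Graphs", known))  # False
```

A script like this, pointed at a real citation index, would have flagged a fabricated reference in seconds.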
This gap between capability and verification carries implications far beyond academic publishing. The world’s foremost AI experts, with professional reputations riding on accuracy, failed to catch their own tools fabricating details.
For everyone else using these same tools—without PhDs, without peer review, without specialized knowledge to spot errors—the lesson lands hard.
Written by Alius Noreika