LLM Hallucinations Are Not a Bug: A Math Professor Explains Why AI Won't Stop Lying

Mathematics professor Vladimir Krylov explains why LLM hallucinations are a fundamental mathematical property rather than an engineering defect, and discusses the future of AI model competition, reasoning limitations, and the path forward.


Vladimir Krylov, professor of mathematics and scientific consultant at Artezio, gave an extensive interview about the state of the AI industry.

Universal Models vs. Agent Systems

When asked about a statement by Anthropic co-founder Jack Clark, Krylov responded that the agent model "isn't going anywhere." In his view, universal models can acquire specialized skills through context, transforming into narrow "experts."

Competition Between OpenAI and Google

Krylov explains that "OpenAI is simply falling behind" due to a lack of radically new ideas in architecture. Google, by contrast, is developing infrastructure on TPUs (Tensor Processing Units), which gives it an advantage. DeepSeek demonstrated that new training methods can compensate for hardware limitations.

Hallucinations as a Mathematical Property

Key thesis: "Hallucination in an LLM is not an engineering artifact, not an architectural defect." Krylov asserts that this is a fundamental mathematical property. He references a theorem: for any computably enumerable set, there will always be an input on which the model makes an error.
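Krylov does not name the theorem in the interview; the statement below is a hedged sketch of the kind of inevitability result he appears to reference, with all notation introduced here for illustration rather than taken from the source.

```latex
% Sketch (notation is illustrative): let f be the ground-truth function a
% model is meant to reproduce, with "correct" behavior only computably
% enumerable. Then for any computable model h, a diagonalization argument
% produces an input string s on which h must err:
\forall h \in \mathcal{H}_{\mathrm{computable}} \;\; \exists s \in \Sigma^{*} : \; h(s) \neq f(s)
```

In plain terms: no single computable model answers correctly on every input, so some rate of hallucination is unavoidable in principle, independent of architecture or training budget.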

Reasoning models show hallucination rates of 33-48%, which Krylov attributes to a reasoning paradox and to VC-dimension limits on what a model can learn during training.

The solution: augmenting models with RAG (retrieval-augmented generation) and external tools reduces hallucination rates to 0.7-1.5%. Gemini 2.0 Flash, for example, reaches 99.3% factual accuracy.
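The RAG pattern mentioned above can be sketched in a few lines: retrieve relevant passages, then instruct the model to answer only from them, so claims are grounded in retrieved text rather than parametric memory. This is a minimal illustration; the corpus, the word-overlap retriever (a stand-in for a real vector index), and the function names are assumptions for this sketch, not a real API.

```python
# Minimal RAG sketch: retrieve supporting passages, then build a prompt
# that constrains the model to the retrieved context.
# All names and the toy corpus are illustrative assumptions.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query
    (a stand-in for embedding similarity search)."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Instruct the model to answer only from the passages,
    which is what drives hallucination rates down."""
    context = "\n".join(f"- {p}" for p in passages)
    return ("Answer using ONLY the context below. "
            "If the answer is not in the context, say so.\n"
            f"Context:\n{context}\nQuestion: {query}")

corpus = [
    "Gemini 2.5 Pro supports a context window of up to one million tokens.",
    "TPUs are Google's custom accelerators for machine learning workloads.",
    "DeepSeek showed that training methods can offset hardware limitations.",
]
query = "What hardware does Google build for ML?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

The key design choice is the explicit "answer only from context" instruction: the retriever narrows the model's evidence, and the prompt tells it to refuse rather than guess.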

Context Windows and Large Codebases

Increasing the context window (up to one million tokens in Gemini 2.5 Pro) does not solve the problem of architectural understanding of large projects. Two main issues:

  • Computational: the quadratic complexity of attention makes loading an entire codebase into context infeasible
  • Architectural: the "lost in the middle" phenomenon, whereby tokens in the center of the window receive less attention
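The computational point can be made concrete with a back-of-the-envelope count: the attention score matrix has one entry per token pair, so doubling the context quadruples the cost. The function name and the 2-bytes-per-entry (fp16) estimate below are assumptions for this sketch, not figures from the interview.

```python
# Illustrative count of attention-score entries (one per token pair).
# The fp16 (2 bytes/entry) memory estimate is an assumption for this sketch.

def attention_matrix_entries(n_tokens: int) -> int:
    """Every token attends to every token: n * n score entries
    per attention head, per layer."""
    return n_tokens * n_tokens

for n_tokens in (1_000, 100_000, 1_000_000):
    entries = attention_matrix_entries(n_tokens)
    gib = entries * 2 / 2**30  # fp16 scores, per head per layer
    print(f"{n_tokens:>9} tokens -> {entries:.0e} entries (~{gib:,.1f} GiB)")
```

At a million tokens the raw score matrix alone runs to terabytes per head per layer, which is why long-context models rely on attention approximations rather than the naive quadratic computation.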

On Vibe Coding and Education

Krylov offers historical examples: "Pavarotti couldn't read sheet music at all. He learned everything by ear." In his view, a classical education is not mandatory for effective use of LLMs, though it is helpful.

Regarding the criticism that developers spend 45% of their time fixing "almost correct" generated code, he explains: the model produces high-quality local solutions but does not account for the architecture of the project as a whole.

The Future of Models

Krylov predicts that "the question of 'whose model is better' will lose its meaning" in a year or two. The difference between top models amounts to percentages or fractions of a percent. The winner will not be the model itself, but rather the "best agent ecosystem" with good tool integration.

Analogy: just as in the 1990s processor wars (Intel vs. AMD), the winner was not the manufacturer with the highest clock speed, but the one that created the best software ecosystem.

On Replacing Programmers

LLMs in their current form will not fully replace humans. Perhaps a "different class of systems" will emerge. Krylov is skeptical of attempts to define AGI through financial metrics ("when a model starts generating $100 billion").

Energy Dependency and New Architectures

Two directions of development:

  • Transferring the transformer architecture to a biological or chemical substrate
  • Creating symbolic language processors that work directly with equations rather than tokens