Towards Potential Solutions

The Limits of LLM Based Systems

December 20, 2025

Despite the exponential error accumulation problem, proponents of further LLM scaling as a path towards artificial general intelligence (AGI) argue we've barely begun exploring what's possible. They suggest that the path to AGI isn't just about a better next-token predictor, but about the systems we build on top of LLMs.

Extended reasoning chains Current agents generate hundreds of thinking tokens; future systems might generate millions, creating elaborate verification chains where each step is validated before proceeding. But simply increasing context length or adding more "thinking tokens" may not solve the fundamental error accumulation problem we explored in the previous section. One alternative is Tree of Thought patterns, which let multiple reasoning branches run in parallel and prune the weaker ones.
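The branching-and-pruning idea can be sketched as a small beam search. This is a minimal illustration, not a real system: the `expand` and `score` functions below are hypothetical stand-ins for an LLM proposing continuations and a verifier rating them.

```python
import random

def expand(thought):
    """Hypothetical expansion step: propose candidate continuations of a
    partial reasoning chain. A real system would call an LLM here."""
    return [thought + [f"step-{len(thought)}-{i}"] for i in range(3)]

def score(thought):
    """Hypothetical verifier: rate a branch. A real system might run a
    checker model or execute code to validate the latest step."""
    return random.random()

def tree_of_thought_search(root, depth=3, beam_width=2):
    """Keep only the best-scoring branches at each level, pruning weak
    branches instead of letting errors compound along a single chain."""
    frontier = [root]
    for _ in range(depth):
        candidates = [c for t in frontier for c in expand(t)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0]

best = tree_of_thought_search([])
```

The key point is that a single bad step no longer dooms the whole chain: low-scoring branches are discarded at every level rather than being built upon.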

Tool execution and Structured Outputs More importantly, however, modern LLM agents increasingly execute tools. Instead of using pure text generation (which is inherently vulnerable to drift and hallucinations) to structure their internal reasoning, they can invoke specialized tools that perform specific subtasks, such as searching files for the most relevant context or running computer code to verify outputs.

This shifts the paradigm from pure text generation to tool orchestration. Instead of performing all of its internal reasoning by generating text tokens, the agent generates instructions for which tool to call and with what parameters, receives the results, and incorporates them into subsequent LLM-based reasoning. This might suppress the exponential error accumulation of LLM-based reasoning enough to keep the model from drifting and hallucinating.
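A single orchestration step might look like the following sketch. The tool names (`search_files`, `run_code`) and the `llm_plan` stand-in are illustrative assumptions, not a real agent framework; the point is that the LLM emits a structured tool call, and a deterministic tool produces the grounded result.

```python
import json

# Hypothetical tool registry; names and signatures are illustrative.
TOOLS = {
    "search_files": lambda query: f"found 2 files matching {query!r}",
    "run_code": lambda source: str(eval(source)),  # toy sandbox; real systems isolate this
}

def llm_plan(task):
    """Stand-in for the LLM: emits a tool call as structured JSON
    instead of free-form reasoning text."""
    return json.dumps({"tool": "run_code", "args": {"source": "2 + 2"}})

def agent_step(task):
    """One orchestration step: parse the model's tool call, execute the
    tool, and return the result for the next reasoning turn."""
    call = json.loads(llm_plan(task))
    tool = TOOLS[call["tool"]]
    return tool(**call["args"])

print(agent_step("verify the arithmetic"))  # prints "4"
```

Because the arithmetic is done by executed code rather than by token prediction, this step cannot drift: the "4" is computed, not guessed.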

Structured outputs further constrain errors. They can prevent agents from calling made-up, non-existent tools during their internal reasoning, because the LLM output (which function should be called, and with what arguments) can be validated against well-defined output schemas. In practice, any task we want to run reliably seems to require such an agentic scaffold.
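Such validation can be as simple as checking a proposed tool call against a registry of known schemas before execution. This is a minimal sketch under assumed tool names; real systems typically use JSON Schema or typed function signatures for the same purpose.

```python
import json

# Illustrative schema registry: the only tools the agent may call,
# and the arguments each one requires.
TOOL_SCHEMAS = {
    "search_files": {"required": {"query"}},
    "run_code": {"required": {"source"}},
}

def validate_tool_call(raw_output):
    """Reject hallucinated tools or malformed arguments before anything runs."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return False, "output is not valid JSON"
    name = call.get("tool")
    if name not in TOOL_SCHEMAS:
        return False, f"unknown tool {name!r}"
    missing = TOOL_SCHEMAS[name]["required"] - set(call.get("args", {}))
    if missing:
        return False, f"missing arguments: {sorted(missing)}"
    return True, "ok"

# A hallucinated tool is caught instead of executed:
print(validate_tool_call('{"tool": "delete_database", "args": {}}'))
```

The failure mode shifts from silently acting on a hallucination to a detectable validation error the scaffold can handle, for example by re-prompting the model.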

Networks of Specialized Agents While tool calling has become fairly standardised as of 2024, agent-to-agent communication is still far less well defined.

We can imagine a hierarchical system of multiple agents with defined roles that delegate tasks down to increasingly specialised agentic workflows, invoked in a way similar to how MCP allows for tool calling. Networks of specialized agents working in parallel might achieve collective intelligence exceeding any individual model's capabilities. Just as no single human could build a modern corporation, but teams with clear role divisions can, perhaps LLM agents with well-defined responsibilities, formal communication protocols, and hierarchical verification could achieve AGI-like capabilities through collaboration.
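The delegation pattern can be sketched in a few lines. The roles below (`research_agent`, `coding_agent`, `review_agent`) are hypothetical; in a real system each would be its own LLM agent behind a formal protocol, but the shape of the hierarchy is the same.

```python
# Hypothetical sketch: a coordinator routes subtasks to specialised
# agents, each responsible for one narrow role.

def research_agent(subtask):
    return f"research notes on {subtask}"

def coding_agent(subtask):
    return f"patch implementing {subtask}"

def review_agent(artifact):
    """Hierarchical verification: a separate agent checks the work
    before it is accepted up the chain."""
    return "approved" if artifact else "rejected"

ROLES = {"research": research_agent, "code": coding_agent}

def coordinator(plan):
    """plan: list of (role, subtask) pairs, as a top-level agent
    might produce when decomposing a task."""
    artifacts = [ROLES[role](sub) for role, sub in plan]
    verdicts = [review_agent(a) for a in artifacts]
    return artifacts, verdicts
```

The structural point is the separation of concerns: no single agent both produces and approves its own work, mirroring the verification hierarchies of human organizations.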

As multiple startups emerge and work out agentic scaffolds for different tasks, these scaffolds will likely become tied together into larger networks of LLM-agent-based services. There will be wrappers on top of wrappers, each serving a dedicated niche in the AI-based ecosystem. Such networks could also support error-correcting coordination patterns: voting mechanisms and consensus-building across independent agents, and backtracking when a reasoning branch hits a dead end.
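Both coordination patterns have simple cores, sketched below under the assumption that each agent's answer arrives as a plain value. Majority voting exploits the fact that independent agents tend to make uncorrelated mistakes; backtracking avoids committing to the first flawed path.

```python
from collections import Counter

def majority_vote(answers):
    """Consensus across independent agents: return the most common
    answer, so uncorrelated errors tend to get outvoted."""
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

def solve_with_backtracking(candidates, is_dead_end):
    """Try candidate approaches in order, skipping past dead ends
    instead of committing to the first flawed path."""
    for c in candidates:
        if not is_dead_end(c):
            return c
    return None  # every branch was a dead end

print(majority_vote(["4", "4", "5"]))  # prints "4"
```

One agent hallucinating "5" is outvoted by two agents answering "4"; the same mechanism scales to larger agent pools, though it cannot correct errors that are shared across all agents.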


These approaches represent genuine innovations that may push LLM-based systems far beyond current capabilities. But critics argue they don't solve the fundamental problem—they just create more sophisticated ways to work within the limitations.