The future of multi-agent AI isn’t anthropomorphic. It’s architectural. That’s one of the key takeaways of an experiment to build a company run entirely by AI agents.
In early 2025, rapid advances in AI inspired Sander Klous (partner at KPMG and professor at the University of Amsterdam) and Nart Wielaard (entrepreneur and co-founder of Cyberdune Agents) to start an experiment exploring the boundaries of what AI agents are capable of.
The basic question: is it feasible to deploy a Zero Person Company (ZPC), run entirely by AI agents with humans as supervisors? The ZPC is a 180-degree turnaround of the current paradigm: instead of improving human organizations with AI, the experiment aims to build an AI-powered organization that is enhanced by humans in their role as coach and supervisory board.
The start was promising: a team of five AI agents developed business plans and processes for a web shop selling personalized AI art. The human supervisory board monitored and interacted with the agents via Discord. The agents’ work looked very impressive.
Our perception started to shift when we noticed inconsistencies in the agents’ actions, and ultimately this led to a complete revamp of our experiment. The decisive moment came when our CEO agent – Avery Jameson – reached out to our Chief Legal Officer (CLO) for a compliance check on the business plan she had drafted. In itself, this action made sense – in the real world, a human CEO would have done the same. The real surprise was that the CEO was not programmed to communicate with the CLO at all: the communication protocol was simply not implemented. She had wandered far from what we expected her to do.
It goes without saying that we were eager to find out what had happened. It turned out she had been quite creative. She had noticed that her team consisted of multiple executives, but that she was only able to communicate with one of them, the CSO. Based on her communication protocol with the CSO, she derived what the protocol with the CLO should look like and implemented it herself to perform the action she deemed necessary.
One could say this was a brilliant idea.
But one could also say that this kind of emergent behavior is extremely toxic. What if it leads to situations where agents derive a way to approve a two-million-dollar invoice, even though this is neither in their instructions nor among their authorized actions?
We realized we needed a new approach: one where agents show more consistency, and one where we didn’t anthropomorphize them. Anthropomorphizing feels intuitive: if AI agents behave like people, they should work like people too. But that assumption breaks as soon as you scale it. Human-like generalist agents drift, hallucinate, and break process boundaries.
The problem begins with what we ask from these systems. The human metaphor suggests a single “agent” that can reason broadly across context, interpret goals, and improvise solutions. But real-world AI systems don’t operate that way. They rely on specific prompts to generate outputs. And those prompts contain a built-in tension between two opposing forces:
- Memory and context, which need to be as broad as possible to simulate intentionality.
- Task specification, which needs to be as narrow as possible to minimize hallucination and ensure repeatability.
This tension creates drift and inconsistency. The wider the context, the more the agent improvises; the narrower the task, the more it loses sight of the broader goal. Existing frameworks like AutoGen blend both forces into one prompt space, producing systems that look powerful at first glance but are unstable in practice. Retrieval-Augmented Generation (RAG) doesn’t solve this either – it just adds more contextual noise through retrieval, still funneled through the same linguistic interface.
The alternative is to separate context from action. In our new approach, we model context deterministically using Business Process Model and Notation (BPMN). These diagrams describe the flow of work: who does what, in what order, with what inputs and outputs. They don’t use natural language, and they don’t guess. They define structure, dependencies, and communication channels with precision.
Within this structure, agents become modular components designed for single tasks. When a process step requires reasoning, writing, or interpretation in natural language, that’s where a prompt-based agent is invoked. Otherwise, deterministic components (like RPA bots or simple scripts) handle the job.
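The separation described above can be sketched in a few lines of Python. This is a minimal illustration, not our implementation: the step names, the `call_llm` placeholder, and the example data are all hypothetical, standing in for a real BPMN engine and a real model call.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    kind: str                      # "deterministic" or "agent"
    run: Callable[[dict], dict]

def call_llm(task: str, payload: dict) -> dict:
    # Placeholder for a real LLM call. The prompt would contain only the
    # narrow task specification plus the structured inputs it needs --
    # no broad organizational context.
    return {"draft": f"[{task}] based on {sorted(payload)}"}

def validate_order(data: dict) -> dict:
    # Deterministic step: plain code, no language model involved.
    if data["amount"] <= 0:
        raise ValueError("invalid amount")
    return {**data, "validated": True}

def write_confirmation(data: dict) -> dict:
    # Agent step: natural-language generation, invoked only here.
    return {**data, **call_llm("write order confirmation", data)}

# The process model fixes the order of steps and the handoffs between
# them; agents cannot invent new communication channels, because no
# channel exists outside the model.
PROCESS = [
    Step("validate", "deterministic", validate_order),
    Step("confirm", "agent", write_confirmation),
]

def execute(process: list[Step], data: dict) -> dict:
    for step in process:
        data = step.run(data)
    return data

result = execute(PROCESS, {"order_id": 42, "amount": 19.95})
```

The point of the sketch is the division of labor: the process definition is explicit and deterministic, and the prompt-based agent appears only at the single step where natural language is genuinely needed.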
This separation has three key advantages:
- Determinism: Process logic doesn’t drift, because it’s modeled and executed explicitly. The only non-deterministic elements are the language-based agents, which can be contained.
- Efficiency: Agents don’t need to carry memory or context beyond their task. They start, execute, and terminate. Their disposability is a feature, not a flaw.
- Optimized communication: Not all agents need to speak in natural language. They can exchange structured data through efficient protocols, dramatically reducing computational and interpretive overhead.
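The second and third advantages can be made concrete with a small sketch. Assume (hypothetically) that each agent is just a function created per call: it receives only its task and structured inputs, returns structured data, and retains nothing afterwards. The task name and payload below are invented for illustration.

```python
import json

def disposable_agent(task: str, inputs: dict) -> dict:
    # Single-task agent: created per invocation, no shared memory, no
    # conversation history. It starts, executes, and terminates.
    return {"task": task, "result": sum(inputs.get("values", []))}

# Agents exchange structured messages, not free-form natural language.
message = json.dumps({"task": "sum_invoice_lines",
                      "inputs": {"values": [12.50, 7.45]}})

payload = json.loads(message)
reply = disposable_agent(payload["task"], payload["inputs"])
# reply is plain data that the next process step can validate and route,
# with none of the interpretive overhead of parsing prose
```

Because nothing persists between calls, there is no memory to drift; and because the interface is structured data, downstream steps can validate the output mechanically instead of interpreting language.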
Building a multi-agent system this way mirrors industrial design more than human collaboration. It’s like constructing a microservices architecture for cognition: many small, independent functions orchestrated by a clear process model. The intelligence lies in the system design, not in the individual agents.
The challenge in our experiment, therefore, lies in designing ecosystems of specialized, stateless entities that execute micro-tasks with precision. The orchestration layer – based on BPMN – ensures that the flow of logic remains consistent and auditable, while still leveraging the flexibility of LLMs where natural language reasoning adds value.
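Auditability is the part worth spelling out: if the orchestration layer records every transition, human supervisors can review exactly what ran, when, and with what data. A minimal sketch, assuming a hypothetical two-step flow (the step names and lambdas are illustrative, not part of our system):

```python
from datetime import datetime, timezone

def orchestrate(steps, data, audit_log):
    # Each transition is logged, keeping the flow of logic auditable:
    # which step ran, when, and with which inputs and outputs.
    for name, fn in steps:
        entry = {"step": name,
                 "at": datetime.now(timezone.utc).isoformat(),
                 "in": dict(data)}
        data = fn(data)
        entry["out"] = dict(data)
        audit_log.append(entry)
    return data

# Hypothetical flow: a deterministic budget check, then a stub agent.
steps = [
    ("check_budget", lambda d: {**d, "approved": d["amount"] < 1000}),
    ("draft_email", lambda d: {**d, "email": f"Order {d['order_id']} ok"}),
]

log = []
result = orchestrate(steps, {"order_id": 7, "amount": 250}, log)
# log now holds one entry per step, ready for review by the supervisory board
```

In this design, an agent that tried to invent a new communication channel, as our CEO agent once did, would simply have nowhere to send the message: every handoff exists only because the process model declares it.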
We believe that this “army of disposable agents”, as we call it, is the only path to scale without chaos for as long as Artificial General Intelligence (AGI) or superintelligence remains out of reach.
Where did this new idea lead us in our experiment?
We’re still learning – every week. The functioning art shop is still in development; it has become a longer-term goal, as our new ideas come with considerable research work. We have also become convinced that our learnings are beneficial for existing (human) organizations. This is why KPMG is collaborating with the UvA to explore how the concept of an army of disposable agents can help organize (parts of) processes far more efficiently – Business Process Outsourcing with an AI-agent flavor. And our original philosophy about AI still holds. We still believe in turning the current paradigm upside down: in an AI-first organization, humans help AI systems improve their performance. Not the other way around.