The coordination problem eating your AI budget
By cauri on Jul 21, 2025

Originally published on cauri's Substack
_____________
We built a system with about fifteen agents—each with a clear role. Some pulled research from structured sources or the web, others planned workflows, compiled information, drafted documents, or searched through custom databases to surface what mattered. Each could hold its own in a narrow task. But together? They were powerful... and they kept circling back on the same jobs.
Our task manager agent would coordinate an entire research flow—planning, task distribution, multiple research agents feeding a synthesiser, final document assembly. Beautiful. Then, after the final agent wrapped up the document, the task manager would look at the completed work and decide to… do the whole bloody thing again. The entire flow. Burning through tokens like we had money to waste.
Two other agents might summarise the same paper. Another would plan a workstream that another agent had already handled. Sometimes no agent followed up on a critical step because it "assumed" another agent had it covered. We weren't dealing with prompt failure—we were dealing with awareness failure. Not a lack of storage, but a lack of coordination: what's been done, what still needs doing, and what to prioritise next.
At first, we tried basic reflection—prompting the LLM to ask itself whether it had completed a task or whether something had already been addressed. (Like when ChatGPT says it'll summarise that for you but doesn't, and we reply, "Hey, where's my summary?" It apologises and gets it done. This works the same way, only programmatically.) This helped marginally, but reflection without a source of truth can amplify confusion. So we went further: we gave each agent the ability to write to and read from a central coordination log—backed by Postgres and mediated through a Model Context Protocol (MCP)-style tool layer.
Each agent logged what it handled, what it produced, and what it thought significant. Before beginning a new task, it checked that log: what had already happened? What remained? What decisions had others made? This went beyond simple recall—it enabled coordination.
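Under the hood, the log itself doesn't need to be exotic. Here is a minimal sketch of the write and read sides, assuming a single Postgres table behind a thin tool layer; the table name, columns, and helper functions are illustrative, not our actual schema.

```python
# Minimal sketch of a Postgres-backed coordination log (illustrative schema).
# Each agent writes an entry when it finishes a task and reads the recent
# entries before starting a new one.
import psycopg2
from psycopg2.extras import Json

SCHEMA = """
CREATE TABLE IF NOT EXISTS coordination_log (
    id         SERIAL PRIMARY KEY,
    agent      TEXT NOT NULL,          -- which agent wrote this
    task       TEXT NOT NULL,          -- what it handled
    output_ref TEXT,                   -- pointer to the artefact it produced
    notes      JSONB,                  -- what it thinks matters for others
    created_at TIMESTAMPTZ DEFAULT now()
);
"""

def log_entry(conn, agent: str, task: str, output_ref: str, notes: dict) -> None:
    """Record what an agent handled, what it produced, and what it flagged."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO coordination_log (agent, task, output_ref, notes) "
            "VALUES (%s, %s, %s, %s)",
            (agent, task, output_ref, Json(notes)),
        )
    conn.commit()

def recent_entries(conn, limit: int = 20) -> list[tuple]:
    """What has already happened? Checked by agents before they start a task."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT agent, task, output_ref, notes, created_at "
            "FROM coordination_log ORDER BY created_at DESC LIMIT %s",
            (limit,),
        )
        return cur.fetchall()
```

The tool layer simply exposes these two operations to each agent, so "check the log" becomes one cheap call rather than a prompt stuffed with history.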
Suddenly, the whole thing started to gel. Not because the agents got smarter, but because they finally had the right context—available at the right time, with the right scope.
We started building memory. Not memory like RAM and hard drives, but memory needed for cognition. Memory in a multi-agentic system emerges from the art and science of context management.
Context requires more than storage—it needs timing and scope
When people talk about giving LLMs recall abilities, what they usually mean is access to stored history: previous conversations, documents, events, summaries. But stored information doesn't help unless you know what to surface—and when.
Storage doesn't create the problem. Context delivery does.
Context only works when the system receives the right piece of information, at the right time, in a form it can act on. Give too little and it forgets what it needs. Give too much and it loses the thread. You end up with agents making redundant decisions or missing key steps—not because the knowledge wasn't available, but because it wasn't surfaced properly.
That's why recall in agentic systems needs more than just a tool—it needs a strategy.
Six coordination strategies that work in practice
In our work designing multi-agent systems with real-world complexity, we rely on six core memory strategies. These represent real components we deploy and orchestrate. And yes, each one fails in predictable ways when pushed too hard.
RAG (Retrieval-Augmented Generation) Used to surface relevant reference materials from internal databases or structured knowledge bases. The vector store provides grounding context for specific decisions. But it handles ephemeral information poorly—like what just happened a minute ago. For this, RAG would cheerfully return confident nonsense.
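For illustration, here is the retrieve-then-prompt shape of RAG in miniature. A TF-IDF vectoriser stands in for a real embedding model and vector store, and the documents are invented; the point is the flow, not the retriever.

```python
# Toy RAG flow: retrieve grounding passages for a query, then inject them into
# the prompt. TF-IDF stands in for a proper embedding model here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Q3 pricing policy: enterprise contracts renew with a 3% uplift.",
    "Research workflow: planner assigns tasks, synthesiser compiles the draft.",
    "Escalation rules: contracts above $1M go to the account director.",
]

vectoriser = TfidfVectorizer().fit(documents)
doc_vectors = vectoriser.transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k most relevant reference passages for this query."""
    scores = cosine_similarity(vectoriser.transform([query]), doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

# The retrieved passages become grounding context in the agent's prompt.
context = "\n".join(retrieve("what happens when a large contract escalates?"))
```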
Redis for short-term recall We use Redis to hold fast-turnover information—like what a group of agents discussed or selected recently. This short-term store supports task-specific queries and quick lookups during a multi-agentic process, giving each agent access to recent events without bloating the prompt window. Of course, when Redis times out or gets flushed at the wrong moment, your agents suddenly develop amnesia mid-task.
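A sketch of what that short-term layer can look like with redis-py; the key naming, the 50-event cap, and the one-hour TTL are illustrative choices rather than recommendations.

```python
# Short-term recall in Redis: recent events go into a capped, expiring list,
# and agents read the tail instead of carrying the whole history in the prompt.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def remember(run_id: str, event: dict, ttl_seconds: int = 3600) -> None:
    """Push a recent event, keep only the newest 50, expire after an hour."""
    key = f"run:{run_id}:events"
    r.rpush(key, json.dumps(event))
    r.ltrim(key, -50, -1)        # cap the list so it never bloats
    r.expire(key, ttl_seconds)   # if this expires mid-task, agents get amnesia

def recent(run_id: str, n: int = 10) -> list[dict]:
    """What did the group discuss or select most recently?"""
    key = f"run:{run_id}:events"
    return [json.loads(e) for e in r.lrange(key, -n, -1)]
```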
Distillation logs Instead of storing whole chat histories, we distil chat into structured takeaways: decisions, rationale, open questions, blocking issues. These go into a shared space—Postgres or otherwise—and get reinjected as needed into prompts, sometimes in summary form into a system prompt and sometimes on-demand through a tool. The catch? Bad distillation means losing critical nuance. "Customer angry about pricing" doesn't capture "customer threatening to cancel $2M contract over 3% increase."
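The distillation step itself is a constrained summarisation pass. A sketch of the shape, where call_llm is a stand-in for whichever model client you actually use:

```python
# Distil a transcript into structured takeaways instead of storing raw chat.
import json
from dataclasses import dataclass, field

@dataclass
class Distillation:
    decisions: list[str] = field(default_factory=list)
    rationale: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)

DISTIL_PROMPT = """Summarise this transcript as JSON with keys:
decisions, rationale, open_questions, blockers.
Preserve specifics (names, amounts, deadlines): "customer angry about pricing"
is useless if the real issue is a $2M contract at risk over a 3% increase.

Transcript:
{transcript}
"""

def distil(transcript: str, call_llm) -> Distillation:
    """Turn raw chat into takeaways that can be reinjected into later prompts."""
    raw = call_llm(DISTIL_PROMPT.format(transcript=transcript))
    return Distillation(**json.loads(raw))
```

The resulting records then live in Postgres (or wherever) and come back either as a summary in the system prompt or on demand through a tool.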
Reflection tools These tools pull mainly on chat history or very recent activity. They allow an agent to pause and ask: "Have I done this before?" or "Did I complete the task I set out to do?" The tool surfaces just enough information to ground the next decision. Useful when a loop of interaction needs light-touch awareness. Less useful when the reflection itself creates loops of navel-gazing.
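A deliberately simple version of such a tool, to show how little it needs to do; here a string-similarity check over recent actions, with the threshold as an assumed tuning knob:

```python
# Light-touch reflection: before acting, ask whether something very similar
# already happened in recent history.
from difflib import SequenceMatcher

def already_done(proposed_task: str, recent_actions: list[str],
                 threshold: float = 0.8) -> bool:
    """True if a recent action looks like the task we're about to repeat."""
    return any(
        SequenceMatcher(None, proposed_task.lower(), action.lower()).ratio() >= threshold
        for action in recent_actions
    )

history = ["Summarise the Q3 pricing paper", "Draft the onboarding checklist"]
if already_done("summarise the Q3 pricing paper", history):
    print("Surface this to the agent instead of letting it redo the work.")
```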
Runtime prompt injection Instead of keeping everything in a static system prompt, we inject specific, timely context at runtime. This might include summaries of previous choices, key facts from distillation, or dynamic variables—tailored to the current interaction. Not everything can go in here, though: a single ever-growing data structure would blow the context window.
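In practice this is mostly careful assembly under a budget. A minimal sketch, assuming the context items arrive pre-ranked by relevance:

```python
# Runtime prompt injection: a static core plus whatever scoped context fits
# the budget, instead of one ever-growing system prompt.
BASE_SYSTEM_PROMPT = "You are the research synthesiser. Follow the current plan."

def build_prompt(task: str, context_items: list[str], budget_chars: int = 4000) -> str:
    """Inject the most relevant context items until the budget runs out."""
    selected, used = [], 0
    for item in context_items:            # assumed pre-ranked, most relevant first
        if used + len(item) > budget_chars:
            break
        selected.append(item)
        used += len(item)
    context_block = "\n".join(f"- {item}" for item in selected)
    return f"{BASE_SYSTEM_PROMPT}\n\nRelevant context:\n{context_block}\n\nTask: {task}"
```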
MCP-style structured coordination Here, agents write to and read from a shared database through a thin tool layer. Each entry includes what the agent handled, what it produced, and what it believes matters for others. Other agents then use that shared log to inform their own decisions. Agents can also query the shared store in natural language, pulling just the information they need. Beyond recall—this forms the foundation of coordination.
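The read side can be as plain as a tool that takes a question and returns the most relevant entries. A toy version, where keyword overlap stands in for the model-mediated matching a real system would use:

```python
# Ask the shared log a question in natural language; return only what looks relevant.
def ask_shared_log(question: str, entries: list[dict], k: int = 5) -> list[dict]:
    """Return the k log entries most relevant to a natural-language question."""
    q_words = set(question.lower().split())

    def score(entry: dict) -> int:
        text = f"{entry.get('task', '')} {entry.get('notes', '')}".lower()
        return len(q_words & set(text.split()))

    return sorted(entries, key=score, reverse=True)[:k]

entries = [
    {"agent": "planner", "task": "planned market research workstream", "notes": "3 sub-tasks"},
    {"agent": "researcher-2", "task": "summarised Q3 pricing paper", "notes": "key risk: churn"},
]
relevant = ask_shared_log("has anyone already summarised the pricing paper?", entries)
```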
Each strategy plays a distinct role, but they overlap and work together like parts of a single brain. As in human cognition, the relationship between them matters more than the pieces alone.
Why we build our own memory systems
Many tools out there promise memory for LLMs—LangChain, Mem0, MemGPT, Rewind, AutoGPT plugins, vector databases with session management. They work fine for demos, and some even hold up in light production use. But as soon as the workflows get real—spanning teams, sessions, documents, and systems—they start to crumble.
Take LlamaIndex's built-in memory and chat history system. Looked great on paper. As soon as we wanted anything more sophisticated than basic chat, it became impossible. Open source? Sure. But making changes at the level we needed wasn't practical—you'd essentially rewrite the whole thing. We spent weeks unwinding LlamaIndex from our system. Complete waste.
Most of these tools assume simple agents doing narrow tasks. They track conversations, maybe offer summarised recall, and sometimes automate RAG behind the scenes. What they don't offer is memory as infrastructure for agents to use. Systems that handle information flow between agents. Persistence that lives longer than a session and adapts over time. Architecture that distinguishes between what happened, what mattered, and what still needs attention.
That's why we build our own.
We don't mean building vector stores or rewriting Redis. We mean stitching the right behaviours together—distillation, reflection, prompt injection, structured write-read layers—and tuning them to the actual goals of the system. Often we use common tools like Postgres, Redis, DuckDB, or even plain JSONL files. The difference lies in the design: what gets remembered, when it gets surfaced, and how it stays useful.
Off-the-shelf solutions usually treat recall like a transcript or a vector soup. We treat it like a living, scoped knowledge space—curated, pruned, and shaped by the agents using it.
Build incrementally or suffer
Here's what vendors won't tell you: building these systems happens iteratively, not as one big project. You can't estimate "memory system: 6 weeks" and deliver something useful. Instead, you build coordination piece by piece as your agentic system grows.
Start with basic logging. Add reflection when loops appear. Layer in Redis when timing matters. Introduce distillation when chat history gets unwieldy. Each piece solves a specific problem you've actually hit, not one you might theoretically face.
The alternative? Spending weeks integrating a comprehensive "memory solution" that doesn't quite fit, then weeks more unwinding it when reality hits. We learned this the hard way.
Quick coordination checklist for enterprise systems
If you're evaluating or building context management for an LLM-based system, check whether it:
Handles both long-term and short-term information effectively
Captures decisions, not just transcripts
Allows multiple agents or components to read and write
Enables context filtering rather than prompt flooding
Supports scoping or tagging of memory entries
Provides traceability and human auditability
Evolves over time rather than being built all at once
If most of these remain unchecked, memory will likely become the bottleneck—and the model will take the blame for what essentially amounts to a systems design problem.
Questions to ask before your agents start forgetting
If you're building or managing an AI system that needs persistent context, these are the questions we wish we'd asked before our first system fell apart:
What information do agents actually need to track to perform well?
Which parts write to shared state—and which ones read from it?
How do we distinguish between temporary context and persistent knowledge?
What gets stored, for how long, and in what format?
Can entries be traced, audited, and updated without breaking the system?
Does the architecture help parts coordinate, or just replay chat history?
What scope does each memory solution cover?
What happens when two entries conflict?
These aren't nice-to-have considerations. They shape how the system thinks, reacts, and collaborates. Treat them like a checkbox and your system will behave like one.
Let agents manage their own recall
We stopped trying to control everything. Early on, we fell into the trap of micromanaging context—carefully orchestrating which agent saw which piece of information when. It reminded me of those horrible early AI chatbots in the 1980s with nested if-then statements, trying to handle every possible case. Exponential complexity, minimal intelligence.
Instead, we gave agents tools and let them figure out when to use them. Need recent context? Check Redis. Need decision history? Query the coordination log. Need reference material? Hit the RAG store.
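Concretely, "giving agents tools" can be as plain as a registry of named callables with descriptions the model can choose between. A sketch that reuses the helpers from the earlier snippets; the tool names and dispatch shape are illustrative, not any particular framework's API.

```python
# Expose each memory layer as a named tool and let the model decide which to call.
TOOLS = {
    "check_recent_events": {
        "description": "Recent activity for this run (short-term Redis store).",
        "fn": lambda args: recent(args["run_id"]),
    },
    "query_coordination_log": {
        "description": "What other agents have done and decided.",
        "fn": lambda args: ask_shared_log(args["question"], args["entries"]),
    },
    "search_reference_material": {
        "description": "Grounding passages from the knowledge base (RAG).",
        "fn": lambda args: retrieve(args["query"]),
    },
}

def dispatch(tool_name: str, args: dict):
    """Run whichever tool the model asked for in its tool-call response."""
    return TOOLS[tool_name]["fn"](args)
```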
Our task manager stopped repeating entire workflows once it could check what had already been done. Research agents stopped duplicating summaries once they could see what existed. The system became less like programmed automation and more like actual coordination.
That shift—from trying to control every interaction to giving agents the tools to manage their own context—made everything click. Stop thinking of it as a programming problem. Start thinking of it as giving your agents the same thing you'd want in a confusing situation: the ability to check their notes, remember what happened, and figure out what comes next.
The agents finally work together because they can finally see what everyone else has done. Not because we got smarter about programming them, but because we got out of their way and let them coordinate.
______________
Written in collaboration with Claude, with some (sometimes) constructive criticism from ChatGPT.