Memory in multi-agent systems: technical implementations

By cauri • May 20, 2025

Originally published on cauri's Medium

____________

Memory fundamentally changes how we think about multi-agent systems. At Artium, when we first started working with generative AI chatbots, we faced a simple problem: limited context windows. Our workaround was straightforward: once we reached the end of a context window, we’d run a function to compress and summarise everything that came before, then start a new context window that would eventually fill up and compress again.

This created a fascinating parallel to human memory — events further in the past became more compressed, less feature-rich, and less clear, much like how our own memories work. Things that happened long ago tend to lack detail compared to recent experiences. While intellectually interesting, this approach served as a brute-force method for managing context limitations.

Agentic memory architecture

Modern agent systems require more nuanced memory types that mirror human cognitive categories:

  1. Immediate Working Memory — Information that must remain constantly accessible, similar to how you don’t need to consciously recall how to speak or walk

  2. Searchable Episodic Memory — Information the agent must actively retrieve, comparable to how you search your mind for a specific conversation or event

  3. Procedural Memory — Skills and learned behaviours that become automatic, like your ability to type without thinking about individual keys

  4. Semantic Knowledge — Factual information and conceptual understanding that form the agent’s world model

How agentic memory mimics human memory concepts at a high level

These memory requirements vary by type, urgency, recency, and importance (a separate blog post covers adding further layers, such as temporal weighting of memories). Some information must remain in immediate context, while other details can be retrieved only when needed.
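
As a concrete illustration of these dimensions, here is a minimal Python sketch of a memory record; the class names, fields, and decay function are illustrative assumptions rather than a fixed schema:

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from enum import Enum

    class MemoryType(Enum):
        WORKING = "working"        # always kept in context
        EPISODIC = "episodic"      # retrieved on demand
        PROCEDURAL = "procedural"  # learned, automatic behaviours
        SEMANTIC = "semantic"      # factual world model

    @dataclass
    class MemoryRecord:
        content: str
        kind: MemoryType
        importance: float = 0.5  # 0..1, drives retention decisions
        created_at: datetime = field(
            default_factory=lambda: datetime.now(timezone.utc)
        )

        def recency_weight(self, now: datetime) -> float:
            # Older memories carry less weight, echoing temporal decay.
            age_hours = (now - self.created_at).total_seconds() / 3600
            return 1.0 / (1.0 + age_hours)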

Technical Implementations

To support these different memory needs, we’ve developed various technical approaches:

1. File-based memory for working context

The simplest yet often most effective approach involves storing critical information in structured files injected directly into prompts. We typically use JSON or Markdown formats since they integrate seamlessly with prompt structures.

    # User Preferences
    - prefers a down-to-earth tone
    - frequently discussed topics: "renewable energy", "neural networks", "gardening techniques"
    - sensitive topics: "politics", "religion"

The format itself matters less than consistency and LLM interpretability. Markdown often proves ideal since it matches the format of the prompt itself, creating fewer parsing demands on the model. This approach ensures that critical working memory remains instantly accessible without requiring active retrieval.
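
A minimal sketch of this injection pattern, assuming the preferences live in a Markdown file (the path and prompt framing are illustrative):

    from pathlib import Path

    def build_prompt(user_message: str,
                     memory_path: str = "memory/user_preferences.md") -> str:
        # Working memory is injected verbatim; no retrieval step is needed.
        memory = Path(memory_path).read_text(encoding="utf-8")
        return (
            "You are a helpful assistant. Respect the user profile below.\n\n"
            f"{memory}\n\n"
            f"User: {user_message}"
        )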

2. Database integration through model context protocol (MCP)

We implement Model Context Protocol (MCP) as an interface layer between agents and databases for structured, queryable memory that can be shared across multiple agents. MCP allows agents to express intent to read or write information in natural language, which then gets converted to appropriate database operations.

Agent: “I need to store that the user prefers email notifications over SMS”

MCP Layer: Translates to → INSERT INTO preferences (user_id, notification_channel, timestamp) VALUES (1234, 'email', NOW());

This approach allows multiple agents to share a persistent memory store while maintaining a natural language interface. We’re experimenting with Postgres for this implementation.
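
As a rough sketch of the database side of that translation, once the model has extracted the arguments from the agent's intent (this illustrates the write path only, not the MCP message format itself; the table matches the example above):

    import psycopg2  # assumes the Postgres store mentioned above

    def store_preference(conn, user_id: int, channel: str) -> None:
        # In practice an LLM parses these arguments out of the agent's
        # natural-language request; here they arrive pre-extracted.
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO preferences (user_id, notification_channel, timestamp) "
                "VALUES (%s, %s, NOW())",
                (user_id, channel),
            )
        conn.commit()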

3. RAG Systems for Episodic and Semantic Memory

We implement Retrieval Augmented Generation (RAG) systems to handle large volumes of information that don’t need to be constantly present in context. The agent reaches into these stores only when relevant, performing semantic searches to find information similar to what it’s looking for.

We use pgvector as it has a rich ecosystem and is a known quantity (a minimal retrieval sketch follows the list below). Our RAG implementations function like episodic and semantic memory, housing factual knowledge and experiential information. This approach works particularly well for storing:

  • Lengthy documents

  • Conversation histories

  • Domain knowledge

  • Past interactions with specific users
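
A minimal retrieval sketch, assuming a memories table with a pgvector embedding column and an already-computed query embedding (the embedding model call is omitted):

    def recall(conn, query_embedding: list[float], k: int = 5) -> list[str]:
        # conn is a psycopg2 connection; pgvector's <-> operator orders
        # rows by distance to the query vector.
        vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
        with conn.cursor() as cur:
            cur.execute(
                "SELECT content FROM memories "
                "ORDER BY embedding <-> %s::vector LIMIT %s",
                (vec, k),
            )
            return [row[0] for row in cur.fetchall()]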

Advanced Memory Techniques

Beyond these basic implementations, we’re developing several more sophisticated approaches:

Dynamic multi-shot example selection

Rather than manually curating examples for few-shot prompting, we’ve implemented an AI-driven approach to example selection:

  1. Our system tracks user interactions with an agent

  2. When the agent suggests content that receives positive user engagement (user accepts and continues), this gets flagged as effective

  3. If the same input pattern repeatedly leads to successful outputs (without necessarily using the exact same words), it’s automatically added to a library of multi-shot examples

  4. Future prompts dynamically include these proven examples based on input similarity

This creates a self-improving system that reinforces the most effective patterns through actual usage, essentially leveraging AI to curate its own examples.
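
A minimal sketch of the selection step, assuming each proven example is stored alongside an embedding of the input that produced it (embedding generation is omitted and all names are illustrative):

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def select_examples(query_vec: np.ndarray,
                        library: list[tuple[np.ndarray, str]],
                        k: int = 3) -> list[str]:
        # library holds (input_embedding, example_text) pairs flagged as
        # effective; the k nearest inputs supply the prompt examples.
        ranked = sorted(library, key=lambda item: cosine(query_vec, item[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]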

Once enough examples exist, they can be augmented with synthetic data to form a dataset for fine-tuning a model if needed.

Memory distillation process

We’ve moved beyond the crude compression approach of summarising entire chat histories. Our current approach resembles a codec (compression-decompression) algorithm designed explicitly for conversational intelligence; we call it distillation (a minimal sketch follows the figure below):

  1. An agent actively monitors the conversation in real time

  2. It identifies and extracts important information as it emerges

  3. This information gets paraphrased and distilled into compact, high-signal representations

  4. Less important conversational elements get discarded entirely (or stored as a full chat in a RAG store, as above, for occasional retrieval)

  5. The distilled information gets stored in the most appropriate memory system (file, database, or RAG)

  6. When needed, this compressed information can be “decompressed” by another LLM and given to an agent as chat history context.

This distillation process significantly improves memory efficiency by:

  • Eliminating verbose back-and-forth elements

  • Preserving core information while reducing token usage

  • Storing information in formats optimised for future LLM interpretation

  • Enabling more intelligent context management than simple truncation

Basic distillation flow for creating useful conversation history
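
A minimal sketch of the extraction step at the heart of this flow; llm_complete is a stand-in for whatever completion call you use, and the prompt wording is illustrative:

    DISTIL_PROMPT = (
        "Extract only durable, high-signal facts from this exchange as terse "
        "bullet points. Discard greetings, filler, and repetition.\n\n{chunk}"
    )

    def distil(chunk: str, llm_complete) -> str:
        # The distilled bullets are what gets routed to file, database,
        # or RAG storage; the raw chunk can be archived or dropped.
        return llm_complete(DISTIL_PROMPT.format(chunk=chunk))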

Conflict resolution mechanisms

With multiple agents potentially writing to shared memory, conflict resolution becomes critical. The least suitable approaches would be traditional locking mechanisms and simple timestamp-based resolutions, as they don’t account for the semantic understanding and reasoning capabilities that make agentic systems unique. We have begun to explore the following:

Event sourcing — This aligns naturally with agentic systems (sketched after this list) because:

  • It captures the reasoning behind each write, not just the data itself

  • It preserves the whole history of memory changes, allowing for retrospective analysis

  • It enables “replaying” memory formation with different resolution strategies as needed

  • It matches well with the memory distillation process, capturing the evolution of knowledge
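
A minimal event-sourcing sketch along these lines, using a JSONL append-only log (the file name, fields, and pluggable resolve strategy are illustrative assumptions):

    import json
    import time

    class MemoryEventLog:
        """Append-only log: each write carries data plus the agent's reasoning."""

        def __init__(self, path: str = "memory_events.jsonl"):
            self.path = path

        def append(self, agent: str, key: str, value, reasoning: str) -> None:
            event = {"ts": time.time(), "agent": agent, "key": key,
                     "value": value, "reasoning": reasoning}
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(event) + "\n")

        def replay(self, resolve):
            # Rebuild state under any resolution strategy, after the fact.
            state = {}
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    event = json.loads(line)
                    state[event["key"]] = resolve(state.get(event["key"]), event)
            return state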

Semantic conflict resolution — Rather than resolving conflicts mechanically (see the sketch after this list):

  • Specialised arbiter agents can review conflicting writes and generate coherent reconciliations

  • The resolution can consider intent and meaning, not just timestamps

  • This approach leverages the LLM’s strengths in understanding nuance
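
A sketch of the arbiter pattern; as above, llm_complete stands in for an LLM call and the prompt wording is illustrative:

    ARBITER_PROMPT = (
        "Two agents wrote conflicting memories.\n"
        "A: {a}\nB: {b}\n"
        "Reconcile them into one coherent entry that preserves the intent of "
        "both; if they genuinely contradict, keep the better-supported claim "
        "and note the disagreement."
    )

    def reconcile(a: str, b: str, llm_complete) -> str:
        # The arbiter reasons about meaning, not timestamps.
        return llm_complete(ARBITER_PROMPT.format(a=a, b=b))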

CRDTs (Conflict-free Replicated Data Types) — Highly relevant (sketched below) because:

  • They allow multiple agents to write independently without coordination

  • They guarantee eventual consistency without locking

  • Different memory types can use different CRDT implementations based on their needs

  • They’re designed for exactly this kind of distributed, loosely coordinated setting
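
The simplest CRDT illustrates the idea: a grow-only set, sketched below, lets any number of agents add memories independently and merge later, because set union is commutative, associative, and idempotent:

    class GSet:
        """Grow-only set CRDT: merges from any agent commute and converge."""

        def __init__(self):
            self.items: set = set()

        def add(self, item) -> None:
            self.items.add(item)

        def merge(self, other: "GSet") -> None:
            # Union order never matters, so no locking or coordination.
            self.items |= other.items

    a, b = GSet(), GSet()
    a.add("user prefers email")
    b.add("user dislikes SMS")
    a.merge(b)  # both facts survive, regardless of merge order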

CQRS (Command Query Responsibility Segregation) — Valuable (a minimal sketch follows the list) because:

  • It separates the write model from the read model, allowing optimisation for both

  • Agents can issue memory operations as commands with intent

  • The system can maintain multiple projections of memory optimised for different query patterns

  • It naturally integrates with event sourcing
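
A minimal CQRS sketch: commands append to an event list acting as the write model, while queries read from a projection maintained for fast lookup (all names are illustrative):

    class MemoryStore:
        """CQRS split: commands mutate events; queries read projections."""

        def __init__(self):
            self.events: list[dict] = []              # write model
            self.by_topic: dict[str, list[str]] = {}  # read projection

        def handle_command(self, topic: str, fact: str, intent: str) -> None:
            # Commands carry the agent's intent, integrating naturally
            # with the event-sourcing log above.
            self.events.append({"topic": topic, "fact": fact, "intent": intent})
            self.by_topic.setdefault(topic, []).append(fact)

        def query(self, topic: str) -> list[str]:
            # Reads never touch the write model directly.
            return self.by_topic.get(topic, [])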

Moving toward cognitive architecture

This evolution in memory management moves us closer to true cognitive architecture, shifting from simplistic systems to more competent, interconnected memory mechanisms that more closely resemble human thought processes.

We’ve moved beyond treating these systems as simple input-output machines. The technical approaches described here represent a step towards enabling an entirely new class of agent capabilities through more sophisticated information management. As these memory systems mature, I suspect we will see more emergent behaviours that unlock unforeseen new abilities in agents, which I cannot wait to explore.
