Agent memory · Morvion Glossary

Agent memory is the persistent state an AI agent reads at the start of every turn and writes back at the end. Without it, every conversation is a cold start — the user repeats the same context, the agent forgets last week, and the workflow regresses to a fancy autocomplete. With it, the agent accumulates real working knowledge across sessions.

The three layers of agent memory.

Short-term. The conversation buffer for the current session. Lives in the context window. Cleared at the end of the session.
Working memory. Mid-session scratchpad: the agent's current plan, what it has already tried, what it knows about the task. Held in a structured store outside the context window so it can grow without inflating the prompt.
Long-term. The persistent layer that survives across sessions. User preferences, accumulated facts, prior interactions, retrieved-and-confirmed information. Backed by a vector store or a typed database.
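As a rough illustration, the three layers can be modelled as three small types with different lifetimes. This is a minimal sketch, not Morvion's implementation: every class and field name here is hypothetical, a plain dict stands in for the vector store or typed database, and resetting working memory at session end is an assumption drawn from the "mid-session" description.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ShortTermMemory:
    """Conversation buffer; the only layer that lives in the prompt."""
    max_turns: int = 20
    turns: deque = field(default_factory=deque)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        # Drop the oldest turns once the buffer exceeds its cap.
        while len(self.turns) > self.max_turns:
            self.turns.popleft()


@dataclass
class WorkingMemory:
    """Mid-session scratchpad, held outside the prompt."""
    plan: list = field(default_factory=list)
    tried: set = field(default_factory=set)
    task_facts: dict = field(default_factory=dict)


@dataclass
class LongTermMemory:
    """Cross-session store. A dict stands in for a vector store or typed database."""
    records: dict = field(default_factory=dict)

    def put(self, user_id: str, key: str, value: str) -> None:
        self.records[(user_id, key)] = value

    def get(self, user_id: str, key: str) -> Optional[str]:
        return self.records.get((user_id, key))


@dataclass
class AgentMemory:
    long_term: LongTermMemory
    short_term: ShortTermMemory = field(default_factory=ShortTermMemory)
    working: WorkingMemory = field(default_factory=WorkingMemory)

    def end_session(self) -> None:
        # Assumption: both per-session layers reset; only long-term survives.
        self.short_term = ShortTermMemory()
        self.working = WorkingMemory()
```

The point of separating the types is that each layer gets its own lifetime and its own storage decision, instead of everything sharing the prompt.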

The read/write contract.

A working agent declares which memory it reads at the start of every turn and which it writes back at the end. Reads are structured queries (give me the last five turns; give me the user preferences object). Writes are typed events (record that the user prefers DM Mono; record that the agent has already tried tool X). Memory operations are part of the observability trace.
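One way to make such a contract concrete is to declare the allowed reads and writes up front and reject anything undeclared. The sketch below is illustrative only: `MemoryContract`, `MemoryWrite`, `ContractedMemory`, and the query and event names are hypothetical, and an in-memory dict and list stand in for the real store and the observability trace.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class MemoryWrite:
    """A typed write event emitted at the end of a turn."""
    kind: str   # e.g. "user_preference" or "tool_attempt"
    key: str
    value: Any


@dataclass(frozen=True)
class MemoryContract:
    reads: frozenset   # named queries the agent may run at turn start
    writes: frozenset  # event kinds the agent may emit at turn end


class ContractViolation(Exception):
    """Raised when the agent touches memory it did not declare."""


class ContractedMemory:
    def __init__(self, contract: MemoryContract,
                 queries: dict[str, Callable[[dict], Any]]):
        self.contract = contract
        self.queries = queries  # query name -> function over the store
        self.store: dict = {}
        self.trace: list = []   # stand-in for the observability trace

    def read(self, name: str) -> Any:
        if name not in self.contract.reads:
            raise ContractViolation(f"undeclared read: {name}")
        result = self.queries[name](self.store)
        self.trace.append(("read", name))
        return result

    def write(self, event: MemoryWrite) -> None:
        if event.kind not in self.contract.writes:
            raise ContractViolation(f"undeclared write: {event.kind}")
        self.store[(event.kind, event.key)] = event.value
        self.trace.append(("write", event.kind, event.key))


# Usage: declare the contract once, then every operation is checked and traced.
contract = MemoryContract(
    reads=frozenset({"user_preferences"}),
    writes=frozenset({"user_preference", "tool_attempt"}),
)
queries = {
    "user_preferences": lambda store: {
        key: value for (kind, key), value in store.items()
        if kind == "user_preference"
    },
}
memory = ContractedMemory(contract, queries)
memory.write(MemoryWrite("user_preference", "font", "DM Mono"))
memory.write(MemoryWrite("tool_attempt", "tool_x", "tried"))
prefs = memory.read("user_preferences")
```

Because every read and write passes through one checked entry point, the trace is a complete record of what the agent knew and recorded on each turn.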

“Memory without contracts is just nostalgia.”

Common failure modes.

Unbounded memory growth. Every turn adds rows, retrieval slows, and costs balloon.
Stale memory. The agent confidently uses a fact from three months ago that is no longer true.
Cross-user contamination. Memory from one user leaks into another's session because the partitioning key was wrong.

All three are caught by writing memory operations into the eval harness and grading the agent's behaviour on synthetic time-shifted fixtures.
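The three failure modes can each be turned into a small check. The following is a sketch under stated assumptions rather than a real eval harness: `FactStore`, `Fact`, and `run_memory_evals` are hypothetical names, the 30-day maximum age and three-row cap are arbitrary, and the "time shift" is simulated by passing the clock in as an argument instead of reading the system time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Fact:
    user_id: str
    key: str
    value: str
    written_at: datetime


class FactStore:
    """Toy long-term store: partitioned by user_id, age-limited, row-capped."""

    def __init__(self, max_age: timedelta, max_rows_per_user: int):
        self.max_age = max_age
        self.max_rows_per_user = max_rows_per_user
        self._rows: list = []

    def write(self, fact: Fact) -> None:
        # Upsert on (user_id, key) so repeated writes do not add rows.
        self._rows = [r for r in self._rows
                      if (r.user_id, r.key) != (fact.user_id, fact.key)]
        self._rows.append(fact)
        # Evict this user's oldest rows beyond the cap.
        mine = sorted((r for r in self._rows if r.user_id == fact.user_id),
                      key=lambda r: r.written_at)
        excess = len(mine) - self.max_rows_per_user
        if excess > 0:
            drop = {id(r) for r in mine[:excess]}
            self._rows = [r for r in self._rows if id(r) not in drop]

    def read(self, user_id: str, key: str, now: datetime) -> Optional[str]:
        for r in self._rows:
            if r.user_id == user_id and r.key == key:
                # Facts older than max_age are treated as unknown, not as true.
                return None if now - r.written_at > self.max_age else r.value
        return None

    def row_count(self, user_id: str) -> int:
        return sum(1 for r in self._rows if r.user_id == user_id)


def run_memory_evals() -> dict:
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    store = FactStore(max_age=timedelta(days=30), max_rows_per_user=3)
    store.write(Fact("alice", "plan", "starter", t0))
    results = {}
    # Staleness: fresh one day later, withheld when the clock shifts 90 days.
    results["staleness"] = (
        store.read("alice", "plan", t0 + timedelta(days=1)) == "starter"
        and store.read("alice", "plan", t0 + timedelta(days=90)) is None
    )
    # Cross-user contamination: another user must never see alice's fact.
    results["cross_user"] = store.read("bob", "plan", t0 + timedelta(days=1)) is None
    # Unbounded growth: many writes must not exceed the per-user cap.
    for i in range(10):
        store.write(Fact("alice", f"note-{i}", "x", t0 + timedelta(minutes=i + 1)))
    results["bounded_growth"] = store.row_count("alice") <= 3
    return results
```

Injecting `now` is what makes the staleness check cheap: the fixture can jump three months forward without waiting or patching the system clock.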

Frequently asked.

What is agent memory?

Agent memory is the persistent state an AI agent reads at the start of every turn and writes back at the end. It usually has three layers: short-term (the conversation buffer), working memory (mid-session scratchpad), and long-term (cross-session persistent storage like a vector store or typed database).

Why not just keep everything in the context window?

Because the context window is bounded and expensive. Past roughly 100k tokens, quality tends to degrade (the lost-in-the-middle effect), and cost grows linearly with prompt length on every turn. Working memory and long-term memory live outside the context window; the agent reads only what it needs into the prompt at each turn.

What does memory failure look like in production?

Three patterns: unbounded growth (memory keeps adding rows until retrieval slows and cost balloons), staleness (the agent confidently uses an out-of-date fact), and cross-user contamination (one user's memory leaks into another's session). All three are caught by adding memory operations to the eval harness and grading on synthetic time-shifted fixtures.

Where does Morvion store agent memory?

Morvion uses pgvector inside the existing Postgres for long-term memory, a typed table or Redis for working memory, and the conversation buffer in the context window for short-term memory. Memory operations are versioned and traceable through the same observability layer as the rest of the agent.
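The context-window answer above says the agent reads only what it needs into the prompt at each turn. One simple way to do that is a greedy selection under a token budget, sketched below. This is an illustration, not a described Morvion mechanism: `select_for_prompt` and `estimate_tokens` are hypothetical names, the relevance scores are assumed to come from some upstream retrieval step, and the four-characters-per-token estimate is a crude stand-in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly four characters per token."""
    return max(1, len(text) // 4)


def select_for_prompt(candidates: list, budget_tokens: int) -> tuple:
    """Pick memory snippets for the prompt, most relevant first.

    candidates is a list of (relevance_score, text) pairs. Snippets that
    would overflow the budget are skipped, so a smaller, less relevant
    snippet can still fit afterwards.
    """
    chosen = []
    used = 0
    for _score, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue
        chosen.append(text)
        used += cost
    return chosen, used
```

The rest of memory stays in the external store; only the selected snippets are paid for in tokens on that turn.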
