Agent memory is the persistent state an AI agent reads at the start of every turn and writes back at the end. Without it, every conversation is a cold start — the user repeats the same context, the agent forgets last week, and the workflow regresses to a fancy autocomplete. With it, the agent accumulates real working knowledge across sessions.

The three layers of agent memory.

  • Short-term. The conversation buffer for the current session. Lives in the context window. Cleared at the end of the session.
  • Working memory. Mid-session scratchpad: the agent's current plan, what it has already tried, what it knows about the task. Held in a structured store outside the context window so it can grow without inflating the prompt.
  • Long-term. The persistent layer that survives across sessions. User preferences, accumulated facts, prior interactions, retrieved-and-confirmed information. Backed by a vector store or a typed database.
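
As a rough illustration of how the three layers differ in lifetime, here is a minimal Python sketch. All class and field names are hypothetical, a plain dict stands in for the vector store or typed database, and resetting working memory at the end of a session is an assumption drawn from the "mid-session" description above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ShortTermMemory:
    """Conversation buffer; its contents are what goes into the prompt."""
    turns: list[str] = field(default_factory=list)

    def clear(self) -> None:
        self.turns.clear()


@dataclass
class WorkingMemory:
    """Mid-session scratchpad kept outside the prompt."""
    plan: list[str] = field(default_factory=list)
    attempted: set[str] = field(default_factory=set)
    task_facts: dict[str, Any] = field(default_factory=dict)


@dataclass
class LongTermMemory:
    """Cross-session store; a dict stands in for a real database here."""
    records: dict[str, Any] = field(default_factory=dict)


@dataclass
class AgentMemory:
    short_term: ShortTermMemory = field(default_factory=ShortTermMemory)
    working: WorkingMemory = field(default_factory=WorkingMemory)
    long_term: LongTermMemory = field(default_factory=LongTermMemory)

    def end_session(self) -> None:
        # Assumption: short-term and working state are per-session,
        # while long-term records survive untouched.
        self.short_term.clear()
        self.working = WorkingMemory()
```

After `end_session()` only `long_term.records` still holds data, which is the property that separates the third layer from the first two.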

The read/write contract.

A working agent declares which memory it reads at the start of every turn and which it writes back at the end. Reads are structured queries (give me the last five turns; give me the user preferences object). Writes are typed events (record that the user prefers DM Mono; record that the agent has already tried tool X). Memory operations are part of the observability trace.
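
One way such a contract could look in code is sketched below. This is an assumption-laden illustration, not a prescribed API: `MemoryContract`, `MemoryWrite`, `ContractedMemory`, and the query and event names are all hypothetical. Undeclared reads and writes are rejected, and every successful operation is appended to a trace list that stands in for the observability layer.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class MemoryWrite:
    """A typed write event, e.g. kind='user_preference'."""
    kind: str
    key: str
    value: Any


@dataclass(frozen=True)
class MemoryContract:
    """Declares which queries the agent may read and which events it may write."""
    reads: frozenset[str]
    writes: frozenset[str]


@dataclass
class ContractedMemory:
    contract: MemoryContract
    turns: list[str] = field(default_factory=list)
    preferences: dict[str, Any] = field(default_factory=dict)
    attempted_tools: set[str] = field(default_factory=set)
    trace: list[tuple[str, str]] = field(default_factory=list)  # (operation, name)

    def read(self, query: str) -> Any:
        if query not in self.contract.reads:
            raise PermissionError(f"undeclared read: {query}")
        if query == "last_five_turns":
            result: Any = self.turns[-5:]
        elif query == "user_preferences":
            result = dict(self.preferences)  # copy, so callers cannot mutate the store
        else:
            raise KeyError(query)
        self.trace.append(("read", query))
        return result

    def write(self, event: MemoryWrite) -> None:
        if event.kind not in self.contract.writes:
            raise PermissionError(f"undeclared write: {event.kind}")
        if event.kind == "user_preference":
            self.preferences[event.key] = event.value
        elif event.kind == "tool_attempted":
            self.attempted_tools.add(event.key)
        else:
            raise KeyError(event.kind)
        self.trace.append(("write", event.kind))


contract = MemoryContract(
    reads=frozenset({"last_five_turns", "user_preferences"}),
    writes=frozenset({"user_preference", "tool_attempted"}),
)
memory = ContractedMemory(contract)
memory.write(MemoryWrite("user_preference", "font", "DM Mono"))
memory.write(MemoryWrite("tool_attempted", "tool_x", True))
print(memory.read("user_preferences"))  # {'font': 'DM Mono'}
```

Declaring the contract as data means the same object can be checked at startup, logged with each trace, and compared between agent versions.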

“Memory without contracts is just nostalgia.”

Common failure modes.

  • Unbounded memory growth. Every turn adds rows, retrieval slows, and costs balloon.
  • Stale memory. The agent confidently uses a fact from three months ago that is no longer true.
  • Cross-user contamination. Memory from one user leaks into another's session because the partitioning key was wrong.

All three are caught by writing memory operations into the eval harness and grading the agent's behaviour on synthetic time-shifted fixtures, that is, test data whose timestamps are moved to simulate elapsed time.
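
The sketch below shows what time-shifted checks for these three failure modes might look like. It is a simplified stand-in: it grades a hypothetical in-memory `FactStore` rather than a full agent, and the 30-day freshness window and two-row cap are arbitrary values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Fact:
    user_id: str
    key: str
    value: str
    recorded_at: datetime


class FactStore:
    """Hypothetical in-memory store, partitioned by user_id."""

    def __init__(self, max_age: timedelta, max_rows_per_user: int) -> None:
        self.max_age = max_age
        self.max_rows_per_user = max_rows_per_user
        self._rows: dict[str, list[Fact]] = {}

    def write(self, fact: Fact) -> None:
        rows = self._rows.setdefault(fact.user_id, [])
        rows.append(fact)
        # Bound growth: evict the oldest rows once the per-user cap is exceeded.
        overflow = len(rows) - self.max_rows_per_user
        if overflow > 0:
            del rows[:overflow]

    def read(self, user_id: str, key: str, now: datetime) -> Optional[str]:
        # Only the caller's own partition is searched, newest row first.
        for fact in reversed(self._rows.get(user_id, [])):
            if fact.key == key:
                if now - fact.recorded_at > self.max_age:
                    return None  # too old to trust
                return fact.value
        return None

    def row_count(self, user_id: str) -> int:
        return len(self._rows.get(user_id, []))


def run_memory_checks() -> dict[str, bool]:
    """Grade the store against fixtures read at shifted points in time."""
    t0 = datetime(2024, 1, 1)
    store = FactStore(max_age=timedelta(days=30), max_rows_per_user=2)
    store.write(Fact("alice", "plan", "pro", recorded_at=t0))

    results = {
        "fresh_fact_is_used": store.read("alice", "plan", now=t0 + timedelta(days=1)) == "pro",
        "stale_fact_is_dropped": store.read("alice", "plan", now=t0 + timedelta(days=90)) is None,
        "no_cross_user_leak": store.read("bob", "plan", now=t0) is None,
    }

    for i in range(5):
        store.write(Fact("alice", f"note_{i}", "x", recorded_at=t0))
    results["growth_is_bounded"] = store.row_count("alice") == 2
    return results
```

Passing `now` explicitly, instead of calling the system clock inside `read`, is what makes the time shift possible: the same fixture can be read one day or ninety days after it was written.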

Frequently asked.

What is agent memory?
Agent memory is the persistent state an AI agent reads at the start of every turn and writes back at the end. It usually has three layers: short-term (the conversation buffer), working memory (mid-session scratchpad), and long-term (cross-session persistent storage like a vector store or typed database).
Why not just keep everything in the context window?
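Because the context window is bounded and expensive. Cost grows with every token in the prompt, and on very long prompts (on the order of 100k tokens) quality tends to degrade as well, with details in the middle of the context being the most likely to be missed (the lost-in-the-middle effect). Working memory and long-term memory live outside the context window; the agent reads only what it needs into the prompt at each turn.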
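
A minimal sketch of "reading only what it needs", assuming the candidate snippets arrive already ranked by relevance. The function names are hypothetical, and the whitespace word count is only a crude stand-in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: word count approximates token count well enough for a sketch.
    return len(text.split())


def select_context(candidates: list[str], budget: int) -> list[str]:
    """Keep candidates in priority order until the token budget is spent."""
    chosen: list[str] = []
    used = 0
    for text in candidates:
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # too large; a smaller, lower-ranked item may still fit
        chosen.append(text)
        used += cost
    return chosen
```

With a budget of 8 and the candidates "user prefers DM Mono", a ten-word snippet, and "tool X failed", the first and third are kept and the ten-word snippet is skipped.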
What does memory failure look like in production?
Three patterns: unbounded growth (memory keeps adding rows until retrieval slows and cost balloons), staleness (the agent confidently uses an out-of-date fact), and cross-user contamination (one user's memory leaks into another's session). All three are caught by adding memory operations to the eval harness and grading on synthetic time-shifted fixtures.
Where does Morvion store agent memory?
Long-term memory lives in pgvector inside the existing Postgres, working memory in a typed table or Redis, and short-term memory is the conversation buffer in the context window. Memory operations are versioned and traceable through the same observability layer as the rest of the agent.