Platform Platform Overview Orchestration Memory Integrations Observability Use Cases Customer Support Data Pipelines DevOps Automation Research & Synthesis More Docs Changelog Pricing
Sign in Get started free
Memory

Memory for Production Agents

Working memory. Episodic memory. Semantic retrieval. Three memory types wired together so your agents remember what matters without hallucinating what they don't.

AGENT running Working Memory session context Episodic Memory past run history Semantic Memory vector retrieval

Three memory tiers, one unified API

Diaflow manages all three memory types behind a single SDK interface. Write to working memory in one call; query semantic vector memory in the next. No vector DB client to manage separately.

  • Working memory — key/value store scoped to a single run. Shared across steps. Cleared on run end.
  • Episodic memory — append-only log of past runs. Query by time range, tags, or user ID. Used for continuity across sessions.
  • Semantic memory — vector store for knowledge retrieval. Connect Pinecone, Weaviate, Qdrant, or pgvector. Query by embedding similarity.
Agent memory architecture showing working memory, episodic memory store, and semantic vector retrieval flow

Memory you can trust in production

Context isolation

Working memory is scoped per run — no bleed between concurrent agent executions. Isolation enforced at the storage layer, not the SDK.

Hybrid search

Combine vector similarity with metadata filters (user ID, date range, document type) in a single query. Precision retrieval, not just top-K.

TTL and eviction

Set TTLs on working memory keys. Configure episodic memory retention policies. Auto-archive old runs to keep storage costs predictable.

One API for all three memory types

memory_example.py
from diaflow import Agent
from diaflow.memory import PineconeBackend

agent = Agent(
  llm="claude-3-5-sonnet",
  memory=PineconeBackend(index="my-index")
)

with agent.run(run_id="run-42") as ctx:
  ctx.working.set("user_id", "u123")       # working mem
  docs = ctx.semantic.query(                  # semantic mem
    embedding=query_vec, top_k=5
  )
  history = ctx.episodic.get_last(            # episodic mem
    user_id="u123", n=10
  )

Agents with real memory, starting free

Set up memory in 5 minutes.