The Agent That Couldn't Connect the Dots
You've built an AI agent to help your engineering team stay on top of project work. You feed it everything: Jira tickets, Confluence docs, Slack threads, sprint planning notes, architecture decisions. You wire up a vector database, embed it all, and ship it. The team starts using it.
At first, it seems great. Ask it "What's the current status of the auth service refactor?" — it nails it. Ask it "Who's been working on the payment gateway?" — it finds the right Slack messages. But then someone asks a slightly harder question:
"Which engineers on the payments team have unresolved blockers that are holding up Q2 deliverables?"
The agent returns a generic list of open Jira tickets. It doesn't connect that Sarah owns the payments gateway, that she raised a blocker in Monday's standup Slack thread, that the blocker involves a dependency owned by the infra team, and that the infra team's sprint is already over capacity per last week's planning doc.
Four facts. Four different sources. Each one retrievable in isolation. None of them connected.
This isn't a data problem. All of that information exists in the knowledge base. The agent can recall any individual fact with high confidence. What it cannot do is reason across the chain of relationships that links them together.
This is the fundamental limitation of a vector-only memory architecture — and it's the most common mistake teams make when building production agents.
Bigger Context Windows Don't Fix It
The instinct is to throw more context at the problem. Dump everything into a 128K or 1M token window and let the model figure it out. This feels like it should work. It mostly doesn't.
The model doesn't have a filing system. It has attention — and attention has limits. Stuffing more tokens into the context is like fixing a messy desk by buying a bigger desk: nothing gets organized, there's just more surface area to lose things on. Worse, long-context models recall information buried in the middle of the window noticeably less reliably than information near the start or end (the "lost in the middle" effect), so piling on context can actively degrade retrieval.
The real fix requires rethinking memory as a layered architecture, not a single store.
The Three Memory Layers
World-class agent memory systems combine three complementary layers, each solving a different part of the problem. Used together, they give agents the ability to recall facts precisely, understand semantic meaning, and reason across complex relationships.
The relational layer. A traditional relational database (Postgres, SQLite, etc.) stores structured facts with context: where did this information come from? When was it recorded? Who created it, who has access to it, and when does it expire? This layer is the source of truth for provenance. It answers the question "where did this come from, and can I trust it?" — something a vector store fundamentally cannot answer.
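As a rough sketch, a provenance-aware fact table might look something like this (the table and column names are illustrative, not a prescribed schema):

-- Hypothetical schema sketch: structured facts plus provenance metadata
CREATE TABLE memory_facts (
    id          BIGSERIAL PRIMARY KEY,
    content     TEXT NOT NULL,              -- the fact itself, as stored
    source      TEXT NOT NULL,              -- e.g. 'jira', 'slack', 'confluence'
    source_ref  TEXT,                       -- ticket key, message URL, doc ID
    created_by  TEXT,                       -- who recorded it
    recorded_at TIMESTAMPTZ DEFAULT now(),  -- when it was recorded
    expires_at  TIMESTAMPTZ,                -- NULL means it never goes stale
    access_role TEXT DEFAULT 'team'         -- coarse access control
);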
The vector layer. A vector index (pgvector, Pinecone, Weaviate, etc.) stores embedded representations of meaning. This layer answers "what is this similar to?" It enables fuzzy, intent-driven retrieval: finding docs about "payment system failures" when the user asked about "transaction errors." It's the semantic bridge between natural language and stored knowledge. Powerful for initial retrieval — but blind to relationships.
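With pgvector, that lookup is a nearest-neighbour query over an embedding column. A minimal sketch, assuming the hypothetical memory_facts table above has been given an embedding vector(1536) column and that $1 is the embedded user query:

-- Top 10 semantically closest facts to the query embedding ($1)
SELECT id, content
FROM memory_facts
ORDER BY embedding <=> $1   -- pgvector's cosine-distance operator
LIMIT 10;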
The graph layer. A property graph database (Neo4j, Apache AGE, etc.) stores explicit connections between entities. This layer answers "how do these facts connect?" It enables the kind of multi-hop reasoning the blockers question above requires: Sarah → owns → payments gateway → blocked by → infra dependency → owned by → infra team → sprint status: over capacity. Without a graph layer, that chain is invisible. With it, it's a three-hop traversal.
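In Apache AGE, that traversal is ordinary Cypher issued through the cypher() function. A sketch with made-up labels and a hypothetical project_graph, just to show the shape of a multi-hop query:

-- Find engineers whose service is blocked by a dependency owned by an over-capacity team
SELECT * FROM cypher('project_graph', $$
    MATCH (e:Engineer)-[:OWNS]->(s:Service)-[:BLOCKED_BY]->(d:Dependency)-[:OWNED_BY]->(t:Team)
    WHERE t.sprint_status = 'over_capacity'
    RETURN e.name, s.name, d.name, t.name
$$) AS (engineer agtype, service agtype, dependency agtype, team agtype);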
How the Three Layers Work Together
These layers aren't alternatives — they're complements. A query flows through all three, each contributing what it does best.
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#1a1a2e", "primaryTextColor": "#e2e8f0", "primaryBorderColor": "#00d4ff", "lineColor": "#7c3aed", "secondaryColor": "#0d0d1f", "tertiaryColor": "#0a0a1a", "edgeLabelBackground": "#0a0a1a", "clusterBkg": "#0d0d1f", "clusterBorder": "#7c3aed", "titleColor": "#00d4ff", "nodeTextColor": "#fff"}}}%%
flowchart LR
Q["User Query"]:::query --> V
subgraph MEM ["Memory Architecture"]
V["Vector Layer
semantic search
pgvector"]:::vector
R["Relational Layer
provenance + access
Postgres"]:::relational
G["Graph Layer
relationships
Neo4j / AGE"]:::graph
V -->|candidate entities| G
G -->|hop traversal| R
V -->|access filter| R
end
MEM --> CTX["Assembled Context
ranked and filtered"]:::ctx
CTX --> LLM["LLM Response
grounded and precise"]:::llm
classDef query fill:#7c3aed,stroke:#a78bfa,color:#fff
classDef vector fill:#1a1040,stroke:#7c3aed,color:#c4b5fd
classDef relational fill:#001a22,stroke:#00d4ff,color:#7dd3fc
classDef graph fill:#220011,stroke:#ec4899,color:#f9a8d4
classDef ctx fill:#0d0d1f,stroke:#00d4ff,color:#e2e8f0
classDef llm fill:#0a0a1a,stroke:#00d4ff,color:#00d4ff
Fig 1. Query flows through all three layers; each contributes what it does best.
The flow in practice (a rough SQL sketch follows the list):
- The user's query is embedded and hits the vector layer first, retrieving semantically relevant candidate documents and entities.
- Those entities are handed to the graph layer, which traverses relationships — connecting engineers to tickets, tickets to blockers, blockers to dependent teams.
- Results are filtered through the relational layer, which checks provenance: is this information current? Does this user have access? What's the source confidence?
- The assembled, relationship-aware context is passed to the LLM — which now has everything it needs to answer the question correctly.
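Here is one way steps one and three might combine in a single Postgres query, reusing the hypothetical memory_facts table from earlier; the graph hop in step two would run as a separate cypher() call over the entities these rows mention:

-- Step 1: vector layer pulls semantic candidates; step 3: relational layer filters them
WITH candidates AS (
    SELECT *
    FROM memory_facts
    ORDER BY embedding <=> $1            -- nearest semantic neighbours to the query embedding
    LIMIT 20
)
SELECT content, source, source_ref, recorded_at
FROM candidates
WHERE (expires_at IS NULL OR expires_at > now())   -- drop stale facts
  AND access_role = ANY($2);                       -- enforce the caller's access roles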
Practical Architecture Options
The good news: you don't need three separate managed services to get started. Here are the most practical paths depending on your stack:
| Option | Stack | Notes |
|---|---|---|
| pgvector + Apache AGE | Postgres extensions | All three layers in a single Postgres instance. Great for self-hosted setups. ⚠️ Apache AGE is not available on Neon's managed Postgres — requires your own Postgres host or Docker. |
| pgvector + Neo4j AuraDB | Postgres + managed graph | Best of both worlds: Neon-compatible for vectors + Neo4j AuraDB free tier for the graph layer. Production-ready with zero graph infra to manage. |
| Cognee | Open-source orchestration | Cognee automatically orchestrates all three layers. You give it raw documents; it handles embedding, relationship extraction, and graph construction. Lowest barrier to get started. |
| Docker (self-hosted) | Apache AGE in Docker | The fastest way to get Apache AGE running locally or on a VPS with full extension support. |
For the Docker option, getting Apache AGE running is a single command:
docker run -p 5432:5432 \
-e POSTGRES_PASSWORD=secret \
apache/age:latest
Then enable both extensions in your database (if the image you're running doesn't already bundle pgvector, you'll need to install it alongside AGE first):
-- In your connected DB:
CREATE EXTENSION IF NOT EXISTS age;
CREATE EXTENSION IF NOT EXISTS vector;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
From there, you can store vector embeddings in a standard vector(1536) column, and model relationships using AGE's Cypher query interface — all within the same Postgres connection.
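Purely as an illustration, with a hypothetical table, graph name, and labels, and assuming the setup block above has already run in this session:

-- Embeddings live in an ordinary table with a pgvector column...
CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    body      TEXT,
    embedding vector(1536)
);
-- ...while relationships live in an AGE property graph in the same database
SELECT create_graph('project_graph');
SELECT * FROM cypher('project_graph', $$
    CREATE (:Engineer {name: 'Sarah'})-[:OWNS]->(:Service {name: 'payments-gateway'})
$$) AS (result agtype);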
Where to Start
If you're building a new agent from scratch, start with Cognee — it abstracts away the architectural decisions and gets you a working three-layer memory in under an hour. If you're adding memory to an existing Postgres-backed system, reach for pgvector + Neo4j AuraDB. If you're self-hosted and want full control, Apache AGE + pgvector via Docker is the cleanest single-service path.
The key insight is this: vector similarity is a starting point, not a destination. The moment your agent needs to reason about how things relate — ownership, dependency, causality, chronology — you need a graph layer. And the moment your agent needs to reason about whether to trust a fact — who added it, when, from what source — you need a relational layer.
Memory in production agents is an architectural decision, not a configuration option. Build it with all three layers, and your agent won't just remember things. It'll understand them.
References & Inspirations
- 📊 Top AI Labs Share an Agent Memory Architecture — Daily Dose of DS. The original deep dive into how leading labs approach layered memory in production agents.
- 🧠 Cognee — Open-source memory for AI agents — topoteretes/cognee on GitHub. Orchestrates relational, vector, and graph layers automatically from raw input documents.
- 🕸️ Apache AGE — A graph extension for PostgreSQL. Enables graph database functionality using Cypher query language inside Postgres.
- 📄 Lost in the Middle: How Language Models Use Long Contexts — Liu et al., Stanford / Berkeley, 2023. Demonstrates the 30%+ recall degradation when relevant context is positioned mid-sequence.
- 📚 Karpathy's LLM OS / LLM Wiki — Andrej Karpathy's notes on the evolving architecture of LLMs as operating systems, including memory hierarchies.