Roadmap

Honest priorities, in three tiers. Tier 1 unblocks the next release; Tier 2 is the differentiator that justifies the price tag; Tier 3 is enterprise polish.

Tier 1: retrieval quality (next release)

These are table stakes for "memory that works." Plain vector top-k against an LLM-quality embedder gets you roughly 70% of the way; the gaps below account for most of the remaining 30%.

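1. Hybrid search (BM25 + vector + RRF fusion)

Why. Pure-vector retrieval misses exact matches such as order IDs (#4521), SKUs, and email addresses. A lexical ranker such as BM25 catches them. Reciprocal Rank Fusion (RRF) combines the two result lists by rank alone, so there are no per-retriever weights to tune.

Where. New HybridRetriever in src/memnex/memory/retrieval.py, called from MemoryManager.search.

Cost. No new infra: Postgres ships built-in full-text search (tsvector, ranked with ts_rank or ts_rank_cd). Note that this built-in ranking is not BM25; true BM25 scoring would need an extension or an application-side implementation. For rank fusion, the built-in lexical ranking is a reasonable starting point.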
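As a rough illustration of the fusion step, here is a minimal RRF sketch. The function name `rrf_fuse`, the memory IDs, and the two hit lists are hypothetical; k=60 is the commonly used default constant, not a value taken from the Memnex codebase.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence


def rrf_fuse(rankings: Iterable[Sequence[Hashable]], k: int = 60) -> list[Hashable]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties break on the ID string for determinism.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


# Hypothetical result lists, each ordered best-first by one retriever.
lexical_hits = ["m7", "m2", "m9"]  # e.g. an exact match on "#4521" ranks first
vector_hits = ["m2", "m4", "m7"]   # semantic neighbours
fused = rrf_fuse([lexical_hits, vector_hits])
```

IDs that appear in both lists (here `m2` and `m7`) rise above IDs that only one retriever found, which is the behaviour that rescues exact-match hits the embedder ranks low.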

2. Cross-encoder reranker on top-k

Why. Bi-encoder retrieval (what we use today) is fast but imprecise, because the query and each memory are embedded independently. A cross-encoder scores each query-candidate pair jointly, so reranking the top 50 candidates can lift recall@5 by a reported 20–40% on standard benchmarks. Cohere Rerank (hosted API) and bge-reranker-v2-m3 (run locally) are both viable.

Where. New Reranker protocol, called between vector search and the salience compressor.

Cost. Roughly 50 ms per query if local; roughly 100 ms via API.
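One possible shape for the Reranker protocol is sketched below. The method name `rerank`, its signature, and the `TokenOverlapReranker` class are assumptions made for illustration. The token-overlap scorer is only a stand-in so the example runs without a model; it is not a cross-encoder.

```python
from typing import Protocol, Sequence


class Reranker(Protocol):
    """Hypothetical protocol: reorder first-stage candidates for a query."""

    def rerank(self, query: str, candidates: Sequence[str], top_n: int) -> list[str]:
        ...


class TokenOverlapReranker:
    """Toy stand-in for a real model: scores by shared lowercase tokens."""

    def rerank(self, query: str, candidates: Sequence[str], top_n: int) -> list[str]:
        query_tokens = set(query.lower().split())

        def score(text: str) -> int:
            return len(query_tokens & set(text.lower().split()))

        # sorted() is stable, so tied candidates keep their first-stage order.
        return sorted(candidates, key=score, reverse=True)[:top_n]


# Hypothetical first-stage output, in vector-search order.
candidates = [
    "user likes hiking",
    "order 4521 was cancelled",
    "user asked about order status",
]
reranker: Reranker = TokenOverlapReranker()
top = reranker.rerank("cancel order 4521", candidates, top_n=2)
```

Because the protocol is structural, a Cohere-backed or local bge-backed class could be swapped in without changing the caller.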

3. Temporal decay at read time

Why. Today, salience is frozen at write time, so a 6-month-old "I love pizza" can outrank yesterday's "I'm vegan now." A simple time-decay multiplier applied at retrieval fixes this.

Where. Compressor in src/memnex/memory/compressor.py.
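A minimal sketch of such a multiplier, assuming exponential decay with a half-life. The function name `decayed_salience`, the 30-day half-life, and the salience values are illustrative assumptions, not the compressor's actual parameters.

```python
from datetime import datetime, timedelta, timezone


def decayed_salience(
    salience: float,
    written_at: datetime,
    now: datetime,
    half_life_days: float = 30.0,
) -> float:
    """Scale write-time salience by 0.5 ** (age / half_life)."""
    # Clamp negative ages (clock skew) to zero so scores are never boosted.
    age_days = max((now - written_at).total_seconds() / 86400.0, 0.0)
    return salience * 0.5 ** (age_days / half_life_days)


now = datetime(2026, 4, 25, tzinfo=timezone.utc)
# An old, high-salience fact versus a fresh, lower-salience one.
pizza = decayed_salience(0.9, now - timedelta(days=180), now)
vegan = decayed_salience(0.6, now - timedelta(days=1), now)
```

With these numbers the 180-day-old fact has passed six half-lives, so the one-day-old fact now ranks above it even though its write-time salience was lower.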

Tier 2: the memory differentiator

These move us from "RAG with a nice schema" to "actual memory product."

4. LLM-based fact merging

Why. Today, conflict detection simply supersedes the older fact by marking it is_active=False. Better: merge both into one fact that carries a history trail, for example status: cancel_requested → cancel_rescinded [2026-04-25]. This is what makes "memory" feel like memory.

Where. MemoryManager._resolve_conflict becomes pluggable; ship a default LLM merger and a deterministic fallback.
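The pluggable resolver with a deterministic fallback could look roughly like this. The `Fact` fields, the `Merger` callable type, and both function names are hypothetical and do not reflect the existing MemoryManager internals. The earlier date in the example is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Fact:
    key: str
    value: str
    observed_on: str  # ISO date string
    # Earlier (value, observed_on) pairs, oldest first.
    history: list[tuple[str, str]] = field(default_factory=list)


# A merger returns a merged Fact, or None to decline.
Merger = Callable[[Fact, Fact], Optional[Fact]]


def deterministic_merge(old: Fact, new: Fact) -> Fact:
    """Fallback: the new value wins; the old value moves into the history trail."""
    return Fact(
        key=old.key,
        value=new.value,
        observed_on=new.observed_on,
        history=[*old.history, (old.value, old.observed_on)],
    )


def resolve_conflict(old: Fact, new: Fact, merger: Optional[Merger] = None) -> Fact:
    """Try the pluggable merger first; fall back if it is absent, fails, or declines."""
    if merger is not None:
        try:
            merged = merger(old, new)
            if merged is not None:
                return merged
        except Exception:
            pass  # e.g. an LLM timeout; fall through to the fallback
    return deterministic_merge(old, new)


old = Fact("order:XYZ:status", "cancel_requested", "2026-04-20")
new = Fact("order:XYZ:status", "cancel_rescinded", "2026-04-25")
merged = resolve_conflict(old, new)
```

Keeping the fallback deterministic means a write never blocks on the LLM, and the history trail is preserved either way.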

5. Entity resolution and canonicalization

Why. "order XYZ", "#XYZ", and "the XYZ one" should collapse to the same entity. Today they are three unrelated strings, so conflict detection misfires.

Where. New EntityResolver invoked at write time; uses tenant-scoped alias maps.
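A tenant-scoped alias map can be sketched as below. The method names, the normalization rule, the tenant IDs, and the canonical ID format `order:XYZ` are all assumptions for illustration. A real resolver would likely need fuzzier matching than exact lookup on a normalized string.

```python
import re
from typing import Iterable, Optional


class EntityResolver:
    """Maps surface mentions to canonical entity IDs, per tenant."""

    def __init__(self) -> None:
        # tenant_id -> {normalized alias -> canonical entity ID}
        self._aliases: dict[str, dict[str, str]] = {}

    @staticmethod
    def _normalize(mention: str) -> str:
        # Lowercase, replace punctuation such as "#" with spaces, collapse whitespace.
        cleaned = re.sub(r"[^a-z0-9\s]", " ", mention.lower())
        return " ".join(cleaned.split())

    def register(self, tenant_id: str, canonical_id: str, aliases: Iterable[str]) -> None:
        table = self._aliases.setdefault(tenant_id, {})
        for alias in aliases:
            table[self._normalize(alias)] = canonical_id

    def resolve(self, tenant_id: str, mention: str) -> Optional[str]:
        # Lookups never cross tenants: an unknown tenant sees an empty map.
        return self._aliases.get(tenant_id, {}).get(self._normalize(mention))


resolver = EntityResolver()
resolver.register("acme", "order:XYZ", ["order XYZ", "#XYZ", "the XYZ one"])
```

All three mentions from the text now resolve to `order:XYZ` for tenant `acme`, while the same mention under another tenant resolves to nothing.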

6. Postgres-backed `TenantStore`

Why. Today's TenantStore is in-memory only. Production SaaS needs durability across restarts and shared state across replicas. This is a single-class swap behind the existing protocol.

Where. src/memnex/saas/postgres_store.py, hooked in via bootstrap_store_from_env.

7. Eval harness: LongMemEval and LoCoMo scores

Why. "Memnex is good" is a marketing claim. "Memnex scores X on LongMemEval" is a number. Enterprise buyers want numbers.

Where. Extend src/memnex/eval/suites/ with the public benchmark datasets; publish scores with each release.

Tier 3: enterprise and polish

8. Memory graph

Why. Entities as nodes, facts as edges. Enables "tell me everything about order XYZ" without relying on embedding luck. Also useful for household / org-level memory ("Vikram is Priya's son — they share an address").
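To make the "entities as nodes, facts as edges" idea concrete, here is a small in-memory sketch. The class names, the predicate strings, and the `address:1` entity ID are hypothetical; a production version would presumably live in the database rather than in a dict.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FactEdge:
    subject: str
    predicate: str
    obj: str


class MemoryGraph:
    def __init__(self) -> None:
        # entity -> every edge that touches it, in insertion order
        self._edges_by_entity: dict[str, list[FactEdge]] = defaultdict(list)

    def add_fact(self, subject: str, predicate: str, obj: str) -> FactEdge:
        edge = FactEdge(subject, predicate, obj)
        # Index the edge under both endpoints (once, if they are the same).
        for entity in {subject, obj}:
            self._edges_by_entity[entity].append(edge)
        return edge

    def about(self, entity: str) -> list[FactEdge]:
        """Everything known about an entity, with no embedding lookup involved."""
        return list(self._edges_by_entity.get(entity, []))

    def neighbours(self, entity: str) -> set[str]:
        return {e.obj if e.subject == entity else e.subject for e in self.about(entity)}


graph = MemoryGraph()
graph.add_fact("order:XYZ", "status", "cancel_rescinded")
graph.add_fact("Vikram", "son_of", "Priya")
graph.add_fact("Vikram", "lives_at", "address:1")
graph.add_fact("Priya", "lives_at", "address:1")
```

Here `graph.about("order:XYZ")` returns the order's facts directly, and `graph.neighbours("address:1")` surfaces the household link between Vikram and Priya.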

9. Per-tenant rate limits and usage quotas

Why. Required for any paid tier. Today there are no enforced caps beyond the request-level size limits.

10. Channel-aware merging

Why. Voice + explicit confirmation should outweigh ambient WhatsApp chatter. Today merging is purely temporal.

11. HTTP MCP transport tested end-to-end

Why. Today integration tests drive the tool handlers directly. The HTTP transport itself isn't covered.

12. Observability: retrieval traces, recall@k per query, salience drift

Why. Operators need to see why a retrieval missed. Today the metrics are surface-level (counts, latencies).
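For reference, per-query recall@k can be computed as in this sketch. The function name and the memory IDs are hypothetical, and returning 0.0 when a query has no labelled relevant memories is a convention chosen here, not an established Memnex behaviour.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the first k retrieved IDs."""
    if not relevant:
        return 0.0  # convention for this sketch; such queries could also be skipped
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical trace for one query: retrieved order versus labelled ground truth.
retrieved_ids = ["m2", "m7", "m4"]
relevant_ids = {"m7", "m9"}
recall_2 = recall_at_k(retrieved_ids, relevant_ids, k=2)
```

Logging this value next to the retrieved IDs in each trace would let an operator see which relevant memory was missed (here `m9`), not just that latency was fine.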

Out of scope (intentionally)

A built-in chatbot UI. Memnex is infra; bring your own agent.
A built-in CRM. Stores agent-relevant facts, not your sales pipeline.
Real-time pub-sub of memory changes. Polling is sufficient for the use cases we serve.
Multi-modal memory (image / doc references). Possible later, not soon.