Roadmap

Honest priorities, in three tiers. Tier 1 unblocks the next release; Tier 2 is the differentiator that justifies the price tag; Tier 3 is enterprise polish.

Tier 1 — retrieval quality (next release)

These are table stakes for "memory that works." Plain vector top-k against an LLM-quality embedder gets you roughly 70% of the way; the gaps below account for most of the remaining 30%.

1. Hybrid search (BM25 + vector + RRF fusion)

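Why. Pure-vector retrieval misses exact matches like order IDs (#4521), SKUs, and emails. Lexical search such as BM25 catches them. Reciprocal Rank Fusion (RRF) combines the two rankings without tuning weights. Where. New HybridRetriever in src/memnex/memory/retrieval.py, called from MemoryManager.search. Cost. Postgres ships built-in full-text search (tsvector with ts_rank). Its ranking is not true BM25, but it covers the exact-match cases above with no new infra; real BM25 scoring would need an extension.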
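The fusion step can be sketched in a few lines. This is a minimal illustration, not the planned HybridRetriever: the function name rrf_fuse, the memory IDs, and the constant k=60 (a commonly used default for RRF) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def rrf_fuse(rankings: Iterable[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of memory IDs with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only ranks are used, never raw scores,
    which is why no weight tuning is needed.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for one query.
lexical_hits = ["m7", "m2", "m9"]  # e.g. exact match on "#4521"
vector_hits = ["m2", "m4", "m7"]   # e.g. semantic neighbours
fused = rrf_fuse([lexical_hits, vector_hits])
```

Items that appear in both lists (m2, m7) rise above items found by only one retriever, which is the behaviour hybrid search is meant to provide.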

2. Cross-encoder reranker on top-k

Why. Bi-encoder retrieval (what we use today) is fast but imprecise. A cross-encoder reranker over the top 50 candidates can boost recall@5 by 20–40% on standard benchmarks. Cohere Rerank (hosted API) and bge-reranker-v2-m3 (run locally) are both viable. Where. New Reranker protocol, called between vector search and the salience compressor. Cost. ~50 ms per query if local; ~100 ms via API.
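One possible shape for the Reranker protocol is sketched below. The method signature, the Candidate type, and TokenOverlapReranker are all hypothetical; the overlap scorer is a stand-in used only to make the example runnable and is not a cross-encoder.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Candidate:
    memory_id: str
    text: str


class Reranker(Protocol):
    """Anything that can reorder first-stage candidates for a query."""

    def rerank(
        self, query: str, candidates: Sequence[Candidate], top_n: int
    ) -> list[Candidate]: ...


class TokenOverlapReranker:
    """Stand-in scorer: counts shared lowercase tokens.

    A real implementation would score each (query, text) pair with a
    cross-encoder model or a hosted rerank API instead.
    """

    def rerank(
        self, query: str, candidates: Sequence[Candidate], top_n: int
    ) -> list[Candidate]:
        query_tokens = set(query.lower().split())

        def overlap(c: Candidate) -> int:
            return len(query_tokens & set(c.text.lower().split()))

        # sorted() is stable, so ties keep their first-stage order.
        return sorted(candidates, key=overlap, reverse=True)[:top_n]


first_stage = [
    Candidate("a", "customer likes pizza"),
    Candidate("b", "order 4521 was cancelled"),
]
reranker: Reranker = TokenOverlapReranker()
top = reranker.rerank("cancel order 4521", first_stage, top_n=1)
```

Keeping the interface this narrow means a local model and an API-backed reranker can be swapped without touching the caller.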

3. Temporal decay at read time

Why. Today, salience is frozen at write time. A 6-month-old "I love pizza" can outrank yesterday's "I'm vegan now." A simple time-decay multiplier at retrieval fixes it. Where. Compressor in src/memnex/memory/compressor.py.
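A read-time decay multiplier could look like the sketch below. The exponential half-life form, the 30-day default, and the function name decayed_salience are assumptions chosen for illustration; the roadmap item does not specify the decay curve.

```python
import math
from datetime import datetime, timedelta, timezone


def decayed_salience(
    salience: float,
    written_at: datetime,
    now: datetime,
    half_life_days: float = 30.0,  # assumed default, not a Memnex setting
) -> float:
    """Scale write-time salience by 0.5 ** (age / half_life)."""
    age_days = max((now - written_at).total_seconds() / 86400.0, 0.0)
    return salience * math.pow(0.5, age_days / half_life_days)


now = datetime(2026, 4, 25, tzinfo=timezone.utc)

# "I love pizza": high write-time salience, but six months old.
old_score = decayed_salience(0.9, now - timedelta(days=180), now)

# "I'm vegan now": lower write-time salience, written yesterday.
new_score = decayed_salience(0.6, now - timedelta(days=1), now)
```

With these illustrative numbers the older fact decays by six half-lives, so the recent statement ranks first even though its stored salience is lower.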

Tier 2 — the memory differentiator

These move us from "RAG with a nice schema" to "actual memory product."

4. LLM-based fact merging

Why. Today, conflict detection supersedes the older fact (is_active=False). Better: merge into one fact with a history trail (status: cancel_requested → cancel_rescinded [2026-04-25]). This is what makes "memory" feel like memory. Where. MemoryManager._resolve_conflict becomes pluggable; ship a default LLM merger and a deterministic fallback.
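The pluggable design might be sketched as a small protocol plus the deterministic fallback. Everything here is hypothetical: the Fact fields, the ConflictResolver name, and the earlier date 2026-04-20 are invented for the example, and the LLM merger is intentionally left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Protocol


@dataclass
class Fact:
    key: str
    value: str
    observed_on: date
    # Earlier values of this fact, oldest first.
    history: list[tuple[str, date]] = field(default_factory=list)


class ConflictResolver(Protocol):
    def resolve(self, old: Fact, new: Fact) -> Fact: ...


class DeterministicMerger:
    """Fallback: the newest value wins, the losing value joins the history."""

    def resolve(self, old: Fact, new: Fact) -> Fact:
        if new.observed_on >= old.observed_on:
            newer, older = new, old
        else:
            newer, older = old, new
        trail = older.history + newer.history + [(older.value, older.observed_on)]
        trail.sort(key=lambda entry: entry[1])
        return Fact(newer.key, newer.value, newer.observed_on, trail)


old = Fact("order:XYZ:status", "cancel_requested", date(2026, 4, 20))
new = Fact("order:XYZ:status", "cancel_rescinded", date(2026, 4, 25))
merged = DeterministicMerger().resolve(old, new)
```

Instead of deactivating the older fact, the merged record keeps the current value and the trail that led to it.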

5. Entity resolution and canonicalization

Why. "order XYZ", "#XYZ", "the XYZ one" should collapse to the same entity. Today they're three separate strings, so conflict detection misfires. Where. New EntityResolver invoked at write time; uses tenant-scoped alias maps.
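A minimal sketch of tenant-scoped alias resolution follows. The method names, the normalization rules, and the filler-word list are assumptions for illustration; a production resolver would likely need fuzzier matching than this.

```python
import re
from typing import Optional

# Filler words dropped during normalization (assumed list for this example).
_FILLER = {"the", "one", "order"}


class EntityResolver:
    """Maps free-text mentions to canonical entity IDs, per tenant."""

    def __init__(self) -> None:
        # tenant_id -> normalized alias -> canonical entity ID
        self._aliases: dict[str, dict[str, str]] = {}

    @staticmethod
    def _normalize(mention: str) -> str:
        text = re.sub(r"[^a-z0-9 ]+", " ", mention.lower())
        return " ".join(t for t in text.split() if t not in _FILLER)

    def register(self, tenant_id: str, alias: str, canonical_id: str) -> None:
        self._aliases.setdefault(tenant_id, {})[self._normalize(alias)] = canonical_id

    def resolve(self, tenant_id: str, mention: str) -> Optional[str]:
        return self._aliases.get(tenant_id, {}).get(self._normalize(mention))


resolver = EntityResolver()
resolver.register("tenant-1", "XYZ", "order:XYZ")
resolved = [
    resolver.resolve("tenant-1", m) for m in ("order XYZ", "#XYZ", "the XYZ one")
]
```

All three mentions collapse to one ID for the tenant that registered the alias, while other tenants see nothing.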

6. Postgres-backed TenantStore

Why. Today's TenantStore is in-memory only. Production SaaS needs durability across restarts and replicas. The change is a single-class swap behind the existing protocol. Where. src/memnex/saas/postgres_store.py, hooked in via bootstrap_store_from_env.

7. Eval harness — LongMemEval and LOCOMO scores

Why. "Memnex is good" is a marketing claim. "Memnex scores X on LongMemEval" is a number. Enterprise buyers want numbers. Where. Extend src/memnex/eval/suites/ with the public benchmark datasets; publish per-release.

Tier 3 — enterprise and polish

8. Memory graph

Why. Entities as nodes, facts as edges. Enables "tell me everything about order XYZ" without relying on embedding luck. Also useful for household / org-level memory ("Vikram is Priya's son — they share an address").
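The nodes-and-edges idea can be illustrated with an in-memory adjacency index. This is only a sketch: MemoryGraph, Edge, the predicate names, and the address ID are hypothetical, and a real implementation would sit on persistent storage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str


class MemoryGraph:
    """Entities are nodes; each fact is an edge indexed under both endpoints."""

    def __init__(self) -> None:
        self._by_entity: dict[str, list[Edge]] = defaultdict(list)

    def add_fact(self, subject: str, predicate: str, obj: str) -> None:
        edge = Edge(subject, predicate, obj)
        self._by_entity[subject].append(edge)
        if obj != subject:
            self._by_entity[obj].append(edge)

    def about(self, entity: str) -> list[Edge]:
        """Every fact touching the entity, with no embedding lookup involved."""
        return list(self._by_entity.get(entity, []))


graph = MemoryGraph()
graph.add_fact("Vikram", "son_of", "Priya")
graph.add_fact("Vikram", "lives_at", "addr:1")
graph.add_fact("Priya", "lives_at", "addr:1")
household = {e.subject for e in graph.about("addr:1")}
```

Looking up the shared address returns both people directly, which is the household-level query described above.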

9. Per-tenant rate limits and usage quotas

Why. Required for any paid tier. Today there are no enforced caps beyond the request-level size limits.
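One common way to enforce per-tenant caps is a token bucket, sketched below. The class name, parameters, and the choice of token bucket are assumptions for the example; the roadmap item does not commit to a specific algorithm.

```python
import time
from typing import Callable


class TenantRateLimiter:
    """Token bucket per tenant: `burst` capacity, refilled at `rate_per_sec`."""

    def __init__(
        self,
        rate_per_sec: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_sec
        self._burst = float(burst)
        self._clock = clock  # injectable so tests can control time
        # tenant_id -> (tokens remaining, timestamp of last update)
        self._buckets: dict[str, tuple[float, float]] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(tenant_id, (self._burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        if tokens >= 1.0:
            self._buckets[tenant_id] = (tokens - 1.0, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False


fake_time = [0.0]
limiter = TenantRateLimiter(rate_per_sec=1.0, burst=2, clock=lambda: fake_time[0])
first_three = [limiter.allow("tenant-1") for _ in range(3)]
```

Each tenant gets an independent bucket, so one noisy tenant exhausting its quota does not affect the others.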

10. Channel-aware merging

Why. Voice + explicit confirmation should outweigh ambient WhatsApp chatter. Today merging is purely temporal.
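A sketch of how channel weight could take precedence over recency when two observations conflict. The channel names, the weights, and pick_winner are invented for illustration and are not existing Memnex configuration.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed trust weights per channel; unknown channels get DEFAULT_WEIGHT.
CHANNEL_WEIGHT = {"voice_confirmed": 1.0, "whatsapp_ambient": 0.3}
DEFAULT_WEIGHT = 0.5


@dataclass(frozen=True)
class Observation:
    value: str
    channel: str
    observed_at: datetime


def pick_winner(a: Observation, b: Observation) -> Observation:
    """Prefer the more trusted channel; fall back to recency on a tie."""

    def rank(o: Observation) -> tuple[float, datetime]:
        return (CHANNEL_WEIGHT.get(o.channel, DEFAULT_WEIGHT), o.observed_at)

    return max((a, b), key=rank)


confirmed = Observation("deliver to office", "voice_confirmed", datetime(2026, 4, 20))
chatter = Observation("maybe send it home?", "whatsapp_ambient", datetime(2026, 4, 24))
winner = pick_winner(confirmed, chatter)
```

Under purely temporal merging the later chat message would win; with channel weights the explicit confirmation is kept.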

11. HTTP MCP transport tested end-to-end

Why. Today integration tests drive the tool handlers directly. The HTTP transport itself isn't covered.

12. Observability — retrieval traces, recall@k per query, salience drift

Why. Operators need to see why a retrieval missed. Today the metrics are surface-level (counts, latencies).
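For reference, per-query recall@k is the fraction of known-relevant memories that appear in the top k results. The sketch below assumes labelled relevant IDs are available for the query; the function name and the example IDs are hypothetical.

```python
from typing import Collection, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Collection[str], k: int) -> float:
    """Fraction of relevant memory IDs found in the first k retrieved IDs."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("recall is undefined when there are no relevant items")
    found = relevant_set & set(retrieved[:k])
    return len(found) / len(relevant_set)


# Hypothetical trace for one query: m3 was retrieved, m7 was missed.
score = recall_at_k(["m1", "m9", "m3", "m4"], {"m3", "m7"}, k=3)
```

Logging this value alongside the retrieval trace shows which relevant memories were missed, not only that the query was slow or returned few results.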

Out of scope (intentionally)

  • A built-in chatbot UI. Memnex is infra; bring your own agent.
  • A built-in CRM. Stores agent-relevant facts, not your sales pipeline.
  • Real-time pub-sub of memory changes. Polling is sufficient for the use cases we serve.
  • Multi-modal memory (image / doc references). Possible later, not soon.