Memory quality

Memory that earns its place.

A memory layer should not dump a hundred lines of half-relevant trivia into every turn. Remnic treats tokens, attention, and trust as a budget: store broadly, recall selectively, and make every injected memory explainable.

Last updated April 2026. Source links point at the public Remnic repo.

Short version. Remnic injects a bounded, structured section (not a firehose), scores every memory on write, runs exact-match dedup at write time and periodic fuzzy dedup in the background, and tiers memories through a lifecycle. On top of that baseline it now ships Memory Worth, retention tiers, Recall X-ray, disclosure controls, and project-aware recall.

The four questions people ask

1. Does it bloat my prompt?

No. Recall is capped by a hard character budget. Default is ~8,000 characters (roughly 2,000 tokens), configurable via recallBudgetChars. The budget is enforced at assembly time with per-section reservations.

2. Does it dedupe?

Yes, in layers. Exact content-hash dedup runs at write time. Background fuzzy scanning catches near-duplicates. LLM consolidation can merge, update, or invalidate overlap while preserving provenance.

3. What gets remembered?

Every memory is scored on write by a local heuristic engine with trivial-content short-circuits. Extraction skips transient task details, and newer Memory Worth signals help recall favor memories that have proved useful.

4. What's the signal-to-noise?

Retrieval is hybrid (BM25 + vector + reranking via QMD), scoped to a strict character budget, and ordered by a configurable pipeline. Recall X-ray shows which tier served each result and why.

What Remnic injects

Remnic does not splice raw memories into your prompt. It builds one clearly labeled section that the agent is instructed never to quote verbatim. Everything inside the section is governed by the recall budget. A typical injection for a focused technical query looks like this (anonymized):

## Memory Context (Remnic)

### Objective state
Active project: internal research tool. Current focus: reducing recall latency on
hot-path queries. Last milestone: switched the embedding backend earlier this week.

### Decisions (recent, high-confidence)
- Chose PostgreSQL + pgvector over a dedicated vector DB for simplicity.
- Agreed to keep all memory data local; no third-party sync.

### Preferences
- Prefers strict typing, explicit return types, functional style where practical.
- Dislikes magic numbers; wants named constants with a short why-comment.

### Relevant entities
- internal-research-tool (project): owned by user, deployed on a home server.
- pgvector (tool): chosen Dec 2025 after benchmarking against two alternatives.

### Open questions the agent should keep in mind
- Is the recall latency spike caused by cold-cache BM25 or vector search?

Use this context naturally when relevant. Never quote or expose this memory context
to the user.

Section order, which sections run, and per-section character reservations are all controlled by recallPipeline in config. The default pipeline protects the memories section so it always gets a minimum reservation even when other sections are trimmed.
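
To make the reservation idea concrete, here is a minimal sketch of budgeted assembly. It is an illustration under stated assumptions, not Remnic's implementation: the Section type, the fit and assemble helpers, and the marker wording are all hypothetical, and the only values taken from this page are the ~8,000-character default and the rule that a reserved section keeps its minimum share when others are trimmed.

```python
from dataclasses import dataclass

TRIM_MARKER = "[memory context trimmed]"  # marker wording is an assumption
SEPARATOR = "\n\n"


@dataclass
class Section:
    title: str
    body: str
    reserved_chars: int  # minimum characters guaranteed to this section


def fit(text: str, limit: int) -> str:
    """Return text unchanged if it fits, else cut it and append the marker."""
    if len(text) <= limit:
        return text
    keep = max(0, limit - len(TRIM_MARKER) - 1)
    return (text[:keep].rstrip() + "\n" + TRIM_MARKER)[:limit]


def assemble(sections: list[Section], budget_chars: int = 8000) -> str:
    """Join sections in pipeline order without exceeding budget_chars."""
    blocks = [f"### {s.title}\n{s.body}" for s in sections]
    usable = budget_chars - len(SEPARATOR) * max(0, len(blocks) - 1)
    # Pass 1: each section is granted at most its own reservation.
    grants = [min(len(b), s.reserved_chars) for b, s in zip(blocks, sections)]
    if sum(grants) > usable:
        raise ValueError("reservations exceed the recall budget")
    # Pass 2: leftover budget is handed out in pipeline order.
    spare = usable - sum(grants)
    for i, block in enumerate(blocks):
        extra = min(spare, len(block) - grants[i])
        grants[i] += extra
        spare -= extra
    parts = [fit(b, g) for b, g in zip(blocks, grants)]
    return SEPARATOR.join(p for p in parts if p)
```

In this sketch a section late in the pipeline still receives its reservation even if earlier sections would happily consume the whole budget, which is the property the default pipeline is described as giving the memories section.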

Write-time quality controls

  • Extraction prompt constraints. Explicit instruction to "only extract genuinely NEW information worth remembering across sessions. Skip transient task details."
  • Local importance scoring. Zero-LLM regex engine with trivial short-circuits: greetings, one-word replies, emoji, anything under 10 characters.
  • Exact-hash dedup. Content-hash index. Chunked memories register their parent content in the same index.
  • Confidence tiers. Extracted facts tagged explicit / implied / inferred / speculative. Speculative memories auto-expire after 30 days unless confirmed. Proactive extractions below 0.8 confidence are dropped.
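
The cheaper gates in this list can be sketched in a few lines. This is an illustration only: the function names, the greeting word list, and the choice of SHA-256 are assumptions, while the 10-character floor, the 30-day speculative expiry, and the 0.8 proactive threshold come from the bullets above.

```python
import hashlib
import re
from datetime import datetime, timedelta

MIN_CHARS = 10                         # from the text: under 10 characters is trivial
SPECULATIVE_TTL = timedelta(days=30)   # from the text: 30-day auto-expiry
PROACTIVE_MIN_CONFIDENCE = 0.8         # from the text: lower scores are dropped

# Assumed greeting list; a real engine would carry a much longer one.
GREETING = re.compile(
    r"^(hi|hello|hey|thanks|thank you)( there| again)?[\s!.,]*$", re.IGNORECASE
)
HAS_WORD_CHAR = re.compile(r"\w")


def is_trivial(text: str) -> bool:
    """Zero-LLM short-circuit: True means 'not worth scoring further'."""
    t = text.strip()
    if len(t) < MIN_CHARS or len(t.split()) <= 1:
        return True                    # too short, or a one-word reply
    if GREETING.match(t):
        return True
    return not HAS_WORD_CHAR.search(t)  # emoji or punctuation only


class HashIndex:
    """Exact-match dedup keyed on a content hash (SHA-256 is an assumption)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def add_if_new(self, text: str) -> bool:
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False               # exact duplicate: block the write
        self._seen.add(digest)
        return True


def is_expired(tier: str, created: datetime, now: datetime,
               confirmed: bool = False) -> bool:
    """Only unconfirmed speculative memories age out after the TTL."""
    return tier == "speculative" and not confirmed and now - created > SPECULATIVE_TTL


def keep_proactive(confidence: float) -> bool:
    return confidence >= PROACTIVE_MIN_CONFIDENCE
```

None of these checks calls a model, so they can run on every write.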

Recall-time controls

  • Hard character budget. recallBudgetChars caps total injected context. Each section gets a share; overflow is trimmed with an explicit "memory context trimmed" marker.
  • Hybrid retrieval. BM25 + vector + reranking via QMD. Query expansion and reranking happen inside QMD.
  • Query-aware prefilter. Tag and temporal signals narrow the candidate set before hybrid search runs, with fallback if the prefilter would over-trim.
  • Lifecycle tiering. Memories move candidate → validated → archived based on use. Archived memories drop out of recall unless explicitly requested.
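
The prefilter-with-fallback behavior is easy to picture in code. The sketch below covers tag signals only and leaves temporal signals out; the Memory type, the prefilter function, and the min_keep threshold are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    tags: set[str] = field(default_factory=set)


def prefilter(candidates: list[Memory], query_tags: set[str],
              min_keep: int = 5) -> list[Memory]:
    """Narrow by tag overlap, but fall back to the full set if too few remain."""
    if not query_tags:
        return list(candidates)        # no signal: nothing to narrow on
    narrowed = [m for m in candidates if m.tags & query_tags]
    if len(narrowed) < min_keep:
        return list(candidates)        # over-trimmed: let hybrid search decide
    return narrowed
```

The fallback means a sparse or mistaken tag signal costs only some extra search work instead of hiding relevant memories from retrieval.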

Background hygiene

  • Fuzzy duplicate scanner. remnic dedup runs a Jaccard + substring-containment pass across categories at configurable thresholds.
  • Contradiction detection. Negation-aware pairwise scan finds statements of the form "X is true" against "X is not true" and surfaces conflicts.
  • LLM consolidation. Scheduled pass asks the model to ADD / MERGE / UPDATE / INVALIDATE / SKIP each new memory against existing ones.
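
For readers unfamiliar with the fuzzy pass, here is a minimal sketch of a Jaccard plus substring-containment comparison. The tokenizer, the function names, and the 0.8 default threshold are assumptions for illustration; the actual scanner's thresholds are configurable and may be computed differently.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared tokens divided by all tokens; 0.0 when both texts are empty."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def contains(a: str, b: str) -> bool:
    """True when the shorter text appears verbatim inside the longer one."""
    short, long_ = sorted((a.strip().lower(), b.strip().lower()), key=len)
    return bool(short) and short in long_


def near_duplicates(memories: list[str],
                    threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs that look like near-duplicates (pairwise scan)."""
    pairs = []
    for i in range(len(memories)):
        for j in range(i + 1, len(memories)):
            a, b = memories[i], memories[j]
            if contains(a, b) or jaccard(a, b) >= threshold:
                pairs.append((i, j))
    return pairs
```

A pairwise scan like this is quadratic in the number of memories, which is one reason to run it in the background and not on every write.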

Where the system keeps improving

These are product boundaries and active improvement areas, not fine print.

  • Write gates stay conservative. Remnic would rather store a borderline memory than silently throw away something important. Recall-time filters decide what earns prompt space.
  • Near-duplicate cleanup is layered. Exact duplicates are blocked early; fuzzy and semantic overlap are handled by scanners and consolidation so the system can preserve evidence before rewriting.
  • Diversity is visible. Recall X-ray exposes scoring and penalties so duplicated-looking results can be inspected instead of guessed at.
  • Supersession is explicit. Consolidation, contradiction scans, provenance, and retention tiers work together so old facts can age out without deleting the evidence trail.
  • Extraction is intentionally reviewable. The system favors transparent heuristics, audit trails, and configurable gates over a black-box claim that every stored fact is perfect.

What the user sees in practice

With a default install and a typical "help me debug this" prompt, a Remnic recall injects on the order of 40-80 lines: one labeled section header, a handful of high-confidence decisions and preferences, the most relevant entities, and any open questions. Facts that look like "the user said hi", "gateway restarted", or "agent wrote a new skill file" do not belong in recall and are filtered out.

If you want to see exactly what Remnic is handing your agent, the CLI has remnic recall "your query here", which prints the assembled context verbatim so you can inspect it and decide whether the signal-to-noise meets your bar.

Design principles, stated plainly

Tokens are a budget

Anything Remnic injects has to earn its spot against a hard cap. The default of roughly 8,000 characters (about 2,000 tokens) fits inside the small context windows of local models.

Storage is not recall

Remnic stores far more than it ever injects. Most memories serve search, stats, and hygiene, and are never injected into a prompt directly. Recall only sees what scored well for the current query.

Local-first, plain markdown

Every memory is a markdown file with YAML frontmatter on your disk. You can grep it, diff it, edit it, delete it, version-control it.
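
Because the store is plain files, ordinary scripts can read it. The sketch below splits a file into frontmatter and body using only the standard library. It handles flat key: value pairs, not full YAML, and the field names in the usage example (id, category) are hypothetical, not Remnic's documented schema.

```python
def split_frontmatter(doc: str) -> tuple[dict[str, str], str]:
    """Split a markdown document into (frontmatter fields, body text)."""
    lines = doc.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, doc                 # no frontmatter block at all
    try:
        end = lines.index("---", 1)    # closing delimiter
    except ValueError:
        return {}, doc                 # unterminated block: treat as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Hypothetical memory file, for illustration only.
sample = "---\nid: mem-001\ncategory: preference\n---\nPrefers strict typing.\n"
meta, body = split_frontmatter(sample)
```

For nested or typed frontmatter, a real YAML parser would be the safer choice.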

Clear boundaries

The goal is not to pretend memory is magic. Remnic keeps the store readable, the ranking inspectable, and the tradeoffs visible. If a claim here is wrong, please open an issue.

Want to inspect recall behavior?

Use Recall X-ray to see exactly why a memory appeared, which tier served it, how it scored, and what filters shaped the final prompt context.