Integration guide

Persistent memory for every Hermes turn.

The remnic-hermes package plugs Remnic directly into NousResearch Hermes Agent using the MemoryProvider protocol. Every LLM call gets relevant memories pre-fetched into the system prompt. Every response is automatically observed for future recall. Hermes also gets explicit Remnic tools for LCM search, recall debugging, memory curation, checkpoints, profiling, governance, and work continuity. The agent does not choose when to recall — it always does.

Published April 2026. Package: remnic-hermes v1.0.2 on PyPI. Source: joshuaswarren/remnic.


The problem with tool-based memory

Every agent conversation starts from zero unless something deliberately bridges the gap. The obvious fix is to give the agent a recall tool and expect it to call the tool before answering. That approach works until the moment it matters most: when the context is long, the task is complex, or the model is under token pressure. At those moments the agent skips the tool call, forgets to check, or simply does not know what to search for.

MCP-based memory integration — where Remnic registers remnic_recall as a callable tool — is genuinely useful for explicit, user-directed queries. But for ambient recall that should happen on every turn, it places a burden on the model that will not always be honored.

How Remnic + Hermes is different

Hermes Agent defines a MemoryProvider protocol: a set of lifecycle hooks that run at fixed points in the agent loop, outside the LLM's control. The remnic-hermes package implements that protocol and connects it to the Remnic daemon.

The result is structural recall. Before the LLM sees the user's message, the plugin fires pre_llm_call, queries Remnic with the last user message, and injects the results directly into the system prompt as a <remnic-memory> block. After the LLM responds, sync_turn fires and sends the last two messages to Remnic for real-time observation. At session end, extract_memories sends the full transcript for a deeper extraction pass. The agent does not participate in any of this. It just gets better context.

Aspect	MCP only	MemoryProvider
Recall	Agent must call `remnic_recall`	Automatic on every turn before the LLM call
Observe	Agent must call `remnic_store`	Automatic after every response
Latency	Tool call overhead on the hot path	Pre-fetched; Remnic query runs before LLM call
Reliability	Agent may skip under load or context pressure	Structural — the hook cannot be skipped
Tool call budget	Recall consumes one tool call per turn	No tool call consumed; memory arrives in system prompt

The two approaches are complementary. remnic-hermes also registers the full remnic_* parity surface as explicit Hermes tools: recall, store, search, LCM search, Recall X-ray, memory CRUD, continuity incidents, identity anchors, governance, work boards, shared context, compounding, summaries, briefings, checkpoints, and profiling. Structural recall handles the ambient case; explicit tools handle the intentional case.

What happens on each turn

User message arrives
        |
        v
pre_llm_call(messages)
  - Last user message extracted as recall query
  - Query skipped if the message has fewer than 3 words
  - POST /engram/v1/recall  { query, topK: 8 }
  - mode omitted so daemon defaults can include LCM sections
  - Results injected into system prompt:
        |
        v
  <remnic-memory count="N">
    ... relevant memories from Remnic ...
  </remnic-memory>
        |
        v
LLM call  (sees full context including injected memories)
        |
        v
sync_turn(transcript)
  - Last 2 messages (user + assistant) sent to Remnic
  - POST /engram/v1/observe  { sessionKey, messages }
  - Non-blocking; errors are swallowed silently
        |
        v
... more turns ...
        |
        v
extract_memories(session)  (on session end)
  - Full session transcript sent to Remnic
  - POST /engram/v1/observe  { sessionKey, messages: all }
  - Remnic runs a structured extraction pass on the full context
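
The request bodies in the flow above can be sketched in a few lines of Python. This is an illustrative sketch, not the plugin's actual source: the function names, the `{"role", "content"}` message shape, and the indentation of memories inside the block are all assumptions. Only the `query`, `topK`, `sessionKey`, and `messages` field names, the three-word threshold, and the `<remnic-memory count="N">` wrapper come from the flow itself.

```python
from typing import Optional

TOP_K = 8             # topK value shown in the flow above
MIN_QUERY_WORDS = 3   # shorter messages skip recall


def build_recall_payload(messages: list[dict]) -> Optional[dict]:
    """Return the recall request body, or None when recall is skipped."""
    # Assumption: messages are dicts with "role" and "content" keys.
    user_texts = [m["content"] for m in messages if m.get("role") == "user"]
    if not user_texts:
        return None
    query = user_texts[-1].strip()  # last user message is the query
    if len(query.split()) < MIN_QUERY_WORDS:
        return None
    # "mode" is left out on purpose so the daemon applies its defaults.
    return {"query": query, "topK": TOP_K}


def format_memory_block(memories: list[str]) -> str:
    """Wrap recalled memories in a <remnic-memory> block for the system prompt."""
    body = "\n".join(f"  {m}" for m in memories)
    return f'<remnic-memory count="{len(memories)}">\n{body}\n</remnic-memory>'


def build_turn_observe_payload(session_key: str, transcript: list[dict]) -> dict:
    """Per-turn observe body: only the last two messages (user + assistant)."""
    return {"sessionKey": session_key, "messages": transcript[-2:]}


def build_session_observe_payload(session_key: str, transcript: list[dict]) -> dict:
    """Session-end observe body: the full transcript."""
    return {"sessionKey": session_key, "messages": list(transcript)}
```

Keeping payload construction separate from the HTTP call makes the skip rule and the two observe variants easy to check without a running daemon.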

HTTP paths currently use the legacy /engram/v1 surface during the Remnic v1.x compatibility window. They will switch to /remnic/v1 in a future release. The plugin handles this transparently.

Benefits

Zero agent-side choice

Recall is structural, not tool-based. The MemoryProvider hook fires before every LLM call, regardless of what the model decides.

No tool-call latency

Memory arrives in the system prompt, not via a tool call round-trip. The agent's tool budget is preserved for actual task work.

Persists across sessions

Remnic stores memories on disk as plain markdown files. They survive Hermes restarts, machine reboots, and profile changes.

Local-first

The Remnic daemon runs on your machine. No cloud service, no telemetry, no subscription. Your memories are plain files.

Session isolation

Each provider instance generates a unique session_key. Different Hermes profiles can use different keys or share one.

Graceful degradation

If Remnic is down or unreachable, the plugin swallows errors silently and the agent keeps working without memory context.

Full tool parity

Hermes can call Remnic tools for LCM search, recall debugging, memory CRUD, continuity, identity, governance, work boards, checkpoints, and profiling.

LCM without context_engine

Lossless Context Management is delivered through the MemoryProvider recall envelope. Hermes' context_engine slot is intentionally unused.

MIT licensed

remnic-hermes and the Remnic core are both MIT. Inspect, fork, and extend freely.
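
The graceful-degradation behaviour listed above is a common pattern that can be sketched independently of the plugin. The example below is a minimal illustration under assumptions: `recall_or_empty` is a hypothetical helper name, and the choice of which exceptions to swallow is a guess at what a daemon outage would raise, not a description of the plugin's real error handling.

```python
from typing import Callable


def recall_or_empty(fetch: Callable[[], list[str]]) -> list[str]:
    """Run a recall callable; return no memories if the daemon cannot be reached."""
    try:
        return fetch()
    except (OSError, ValueError):
        # OSError covers connection refusals, timeouts, and urllib.error.URLError.
        # ValueError covers malformed JSON (json.JSONDecodeError).
        # The agent simply proceeds without memory context.
        return []
```

With this shape, the caller always receives a list, so the prompt-building code does not need a separate failure path.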

Quickstart

Install the plugin
```
pip install --upgrade remnic-hermes
```
Requires Python 3.10+. Install into the same environment Hermes uses. Hermes Agent v0.7.0+ is required for the MemoryProvider protocol.

Wire Hermes to Remnic
```
remnic connectors install hermes
```
This command generates a dedicated Hermes token, writes it to ~/.remnic/tokens.json, adds the remnic: block to config.yaml, and runs a daemon health check.

Restart Hermes

Hermes reads its plugin list at startup, so a full restart is required. A config reload is not sufficient.

Verify
```
hermes --version && pip show remnic-hermes
```
Start a session and issue a query. Check the Hermes debug log for <remnic-memory> blocks, then call remnic_lcm_search or remnic_profiling_report to confirm explicit tools are registered.
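
If scanning the debug log by eye is tedious, a small helper can pull out the injected blocks. This is a sketch under assumptions: `memory_block_counts` is a hypothetical name, the location of the Hermes debug log depends on your setup and is not specified here, and the pattern assumes the opening tag appears in the log exactly as `<remnic-memory count="N">`.

```python
import re

# Matches the opening tag of an injected memory block and captures its count.
MEMORY_BLOCK = re.compile(r'<remnic-memory count="(\d+)">')


def memory_block_counts(log_text: str) -> list[int]:
    """Return the count attribute of every <remnic-memory> block in the text."""
    return [int(n) for n in MEMORY_BLOCK.findall(log_text)]
```

Pass it the contents of the log file. An empty list means no blocks were injected during the logged turns.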

Configuration

The plugin reads from a remnic: key in your Hermes config.yaml. All fields are optional — defaults work for a standard local Remnic install.

plugins:
  - remnic_hermes

remnic:
  host: "127.0.0.1"    # default
  port: 4318           # default
  token: ""            # empty = auto-load from ~/.remnic/tokens.json
  session_key: ""      # auto-generated as hermes-<12hex>
  timeout: 30.0

REMNIC_HOST and REMNIC_PORT env vars override the config values. Legacy ENGRAM_HOST / ENGRAM_PORT are accepted during the transition. A legacy engram: config block is accepted in place of remnic: — the plugin reads remnic: first and falls back to engram:.
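
The precedence rules above can be summarised as a small resolver. This is a sketch, not the plugin's code: `resolve_settings` and `DEFAULTS` are hypothetical names, and the ordering between `REMNIC_*` and the legacy `ENGRAM_*` variables when both are set is an assumption, since the text only says both are accepted.

```python
import secrets
from typing import Mapping

# Defaults taken from the config example above.
DEFAULTS = {"host": "127.0.0.1", "port": 4318, "token": "", "timeout": 30.0}


def resolve_settings(config: Mapping, env: Mapping[str, str]) -> dict:
    """Merge defaults, the config block, and environment overrides."""
    # Read the remnic: block first; fall back to the legacy engram: block.
    block = config.get("remnic")
    if block is None:
        block = config.get("engram") or {}
    settings = {**DEFAULTS, **block}

    # Environment variables override the config file.
    # Assumption: REMNIC_* wins over ENGRAM_* when both are present.
    host = env.get("REMNIC_HOST") or env.get("ENGRAM_HOST")
    port = env.get("REMNIC_PORT") or env.get("ENGRAM_PORT")
    if host:
        settings["host"] = host
    if port:
        settings["port"] = int(port)

    # An empty session_key is auto-generated as hermes-<12 hex characters>.
    if not settings.get("session_key"):
        settings["session_key"] = f"hermes-{secrets.token_hex(6)}"
    return settings
```

In practice the call would look like `resolve_settings(parsed_yaml, os.environ)`. Passing the environment as a parameter keeps the function easy to exercise with a plain dict.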

The plugin intentionally does not register a Hermes context_engine. That Hermes slot replaces the local conversation compressor. Remnic's memory recall, LCM enrichment, reset handling, and explicit tools all run through the MemoryProvider and daemon surfaces.

Full config schema, profile isolation examples, and migration notes live in the in-depth plugin docs.

Troubleshooting

Daemon not running

remnic daemon status
remnic daemon install    # installs and starts the launchd/systemd service

Token missing — calls return 401

Verify ~/.remnic/tokens.json exists and contains a hermes connector entry. Re-running remnic connectors install hermes regenerates the token.
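
The check above can be scripted. The layout of tokens.json is not documented here, so the sketch below makes no claim about its schema: it only reports whether the file exists, parses as JSON, and mentions "hermes" anywhere in its keys or string values. The helper names are hypothetical, and the match is a heuristic rather than a validation of the token.

```python
import json
from pathlib import Path


def mentions_hermes(node) -> bool:
    """Heuristic: True if any key or string value contains 'hermes'."""
    if isinstance(node, dict):
        return any(
            "hermes" in str(key).lower() or mentions_hermes(value)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(mentions_hermes(value) for value in node)
    return isinstance(node, str) and "hermes" in node.lower()


def check_token_file(path: Path = Path.home() / ".remnic" / "tokens.json") -> str:
    """Return 'missing', 'invalid-json', 'no-hermes-entry', or 'ok'."""
    if not path.exists():
        return "missing"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return "invalid-json"
    return "ok" if mentions_hermes(data) else "no-hermes-entry"
```

The function never prints the file contents, so it is safe to run in a shared terminal. Any result other than `ok` suggests re-running the connector install.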

Memories not appearing in context

The plugin skips recall when the last user message has fewer than three words. Force an explicit recall to confirm the daemon round-trip works:

remnic daemon status
remnic recall "any query with at least three words"
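
The same round trip can be exercised from Python against the recall endpoint named earlier in this guide. Treat this as a sketch with several assumptions: the daemon is assumed to speak plain HTTP on the default host and port, the token is assumed to be sent as a bearer `Authorization` header, and the response format is not parsed because it is not described here. The function names are hypothetical.

```python
import json
import urllib.request


def build_recall_request(query: str, host: str = "127.0.0.1",
                         port: int = 4318, token: str = "") -> urllib.request.Request:
    """Build a POST to the legacy /engram/v1/recall path."""
    url = f"http://{host}:{port}/engram/v1/recall"  # assumption: plain HTTP
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the daemon expects a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query, "topK": 8}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request, timeout: float = 30.0) -> tuple[int, str]:
    """Send the request and return (status code, raw response body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `send(build_recall_request("any query with at least three words", token=...))` raises `urllib.error.HTTPError` with code 401 when the token is missing or wrong, which distinguishes an auth problem from a daemon that is not running (`urllib.error.URLError`).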

Wrong Python environment

If you see ModuleNotFoundError: No module named 'remnic_hermes', check which Python Hermes is running under and install into that one: which python && pip show remnic-hermes.
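
The same check can be run from inside the interpreter Hermes uses, which avoids guessing which `python` is first on the PATH. This sketch uses only the standard library; `describe_environment` is a hypothetical helper name, and `remnic_hermes` is the module name taken from the error message above.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "remnic_hermes") -> dict:
    """Report the running interpreter and whether the module is importable from it."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,                     # interpreter actually running
        "importable": spec is not None,               # False means a wrong environment
        "location": spec.origin if spec else None,    # where the module was found
    }
```

If `importable` is false, install the package with the interpreter shown in `python`, for example by running that interpreter with `-m pip install remnic-hermes`.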