Zero agent-side choice
Recall is structural, not tool-based. The MemoryProvider hook fires before every LLM call, regardless of what the model decides.
The remnic-hermes package plugs Remnic directly into
NousResearch Hermes Agent using the
MemoryProvider protocol. Every LLM call gets relevant memories pre-fetched
into the system prompt. Every response is automatically observed for future
recall. Hermes also gets explicit Remnic tools for LCM search, recall
debugging, memory curation, checkpoints, profiling, governance, and work
continuity. The agent does not choose when to recall — it always does.
Every agent conversation starts from zero unless something deliberately
bridges the gap. The obvious fix is to give the agent a
recall tool and expect it to call the tool before
answering. That approach works until the moment it matters most: when
the context is long, the task is complex, or the model is under token
pressure. At those moments the agent skips the tool call, forgets to
check, or simply does not know what to search for.
MCP-based memory integration — where Remnic registers
remnic_recall as a callable tool — is genuinely useful for
explicit, user-directed queries. But for ambient recall that should
happen on every turn, it places a burden on the model that will not
always be honored.
Hermes Agent defines a MemoryProvider protocol: a set of lifecycle
hooks that run at fixed points in the agent loop, outside the LLM's
control. The remnic-hermes package implements that
protocol and connects it to the Remnic daemon.
The result is structural recall. Before the LLM sees the user's
message, the plugin fires pre_llm_call, queries Remnic
with the last user message, and injects the results directly into the
system prompt as a <remnic-memory> block. After the
LLM responds, sync_turn fires and sends the last two
messages to Remnic for real-time observation. At session end,
extract_memories sends the full transcript for a deeper
extraction pass. The agent does not participate in any of this. It
just gets better context.
| Aspect | MCP only | MemoryProvider |
|---|---|---|
| Recall | Agent must call remnic_recall | Automatic on every turn before the LLM call |
| Observe | Agent must call remnic_store | Automatic after every response |
| Latency | Tool call overhead on the hot path | Pre-fetched; Remnic query runs before LLM call |
| Reliability | Agent may skip under load or context pressure | Structural — the hook cannot be skipped |
| Tool call budget | Recall consumes one tool call per turn | No tool call consumed; memory arrives in system prompt |
The two approaches are complementary. remnic-hermes also
registers the full remnic_* parity surface as explicit
Hermes tools: recall, store, search, LCM search, Recall X-ray,
memory CRUD, continuity incidents, identity anchors, governance,
work boards, shared context, compounding, summaries, briefings,
checkpoints, and profiling. Structural recall handles the ambient
case; explicit tools handle the intentional case.
User message arrives | v pre_llm_call(messages) - Last user message extracted as recall query - Query skipped if message is fewer than 3 words - POST /engram/v1/recall { query, topK: 8 } - mode omitted so daemon defaults can include LCM sections - Results injected into system prompt: | v <remnic-memory count="N"> ... relevant memories from Remnic ... </remnic-memory> | v LLM call (sees full context including injected memories) | v sync_turn(transcript) - Last 2 messages (user + assistant) sent to Remnic - POST /engram/v1/observe { sessionKey, messages } - Non-blocking; errors are swallowed silently | v ... more turns ... | v extract_memories(session) (on session end) - Full session transcript sent to Remnic - POST /engram/v1/observe { sessionKey, messages: all } - Remnic runs a structured extraction pass on the full context
Recall is structural, not tool-based. The MemoryProvider hook fires before every LLM call, regardless of what the model decides.
Memory arrives in the system prompt, not via a tool call round-trip. The agent's tool budget is preserved for actual task work.
Remnic stores memories on disk as plain markdown files. They survive Hermes restarts, machine reboots, and profile changes.
The Remnic daemon runs on your machine. No cloud service, no telemetry, no subscription. Your memories are plain files.
Each provider instance generates a unique session_key. Different Hermes profiles can use different keys or share one.
If Remnic is down or unreachable, the plugin swallows errors silently and the agent keeps working without memory context.
Hermes can call Remnic tools for LCM search, recall debug, memory CRUD, continuity, identity, governance, work boards, checkpoints, and profiling.
Lossless Context Management is delivered through the MemoryProvider recall envelope. Hermes' context_engine slot is intentionally unused.
remnic-hermes and the Remnic core are both MIT. Inspect, fork, and extend freely.
pip install --upgrade remnic-hermes Requires Python 3.10+. Install into the same environment Hermes uses. Hermes Agent v0.7.0+ is required for the MemoryProvider protocol.
remnic connectors install hermes Generates a dedicated Hermes token, writes it to ~/.remnic/tokens.json, adds the remnic: block to config.yaml, and runs a daemon health check.
Hermes reads its plugin list at startup. Full restart required. Config reload is not sufficient.
hermes --version && pip show remnic-hermes Start a session and issue a query. Check the Hermes debug log for <remnic-memory> blocks, then call remnic_lcm_search or remnic_profiling_report to confirm explicit tools are registered.
The plugin reads from a remnic: key in your Hermes
config.yaml. All fields are optional — defaults work for a
standard local Remnic install.
plugins:
- remnic_hermes
remnic:
host: "127.0.0.1" # default
port: 4318 # default
token: "" # empty = auto-load from ~/.remnic/tokens.json
session_key: "" # auto-generated as hermes-<12hex>
timeout: 30.0 REMNIC_HOST and REMNIC_PORT env vars override
the config values. Legacy ENGRAM_HOST /
ENGRAM_PORT are accepted during the transition. A legacy
engram: config block is accepted in place of
remnic: — the plugin reads remnic: first and
falls back to engram:.
The plugin intentionally does not register a Hermes context_engine.
That Hermes slot replaces the local conversation compressor. Remnic's
memory recall, LCM enrichment, reset handling, and explicit tools all
run through the MemoryProvider and daemon surfaces.
remnic daemon status
remnic daemon install # installs and starts the launchd/systemd service Verify ~/.remnic/tokens.json exists and contains a hermes connector entry. Re-running remnic connectors install hermes regenerates the token.
The plugin skips recall when the last user message is fewer than three words. Force an explicit recall to confirm the daemon round-trip works:
remnic daemon status
remnic recall "any query with at least three words" If you see ModuleNotFoundError: No module named 'remnic_hermes', check which Python Hermes is running under and install into that one: which python && pip show remnic-hermes.