Integration guide

Persistent memory for every Hermes turn.

The remnic-hermes package plugs Remnic directly into Hermes Agent using the MemoryProvider protocol. Every LLM call gets relevant memories pre-fetched into the system prompt. Every response is automatically observed for future recall. The agent does not choose when to recall — it always does.

Published April 2026. Package: remnic-hermes v1.0.1 on PyPI. Source: joshuaswarren/remnic.

The problem with tool-based memory

Every agent conversation starts from zero unless something deliberately bridges the gap. The obvious fix is to give the agent a recall tool and expect it to call the tool before answering. That approach works until the moment it matters most: when the context is long, the task is complex, or the model is under token pressure. At those moments the agent skips the tool call, forgets to check, or simply does not know what to search for.

MCP-based memory integration — where Remnic registers remnic_recall as a callable tool — is genuinely useful for explicit, user-directed queries. But for ambient recall that should happen on every turn, it places a burden on the model that will not always be honored.

How Remnic + Hermes is different

Hermes Agent defines a MemoryProvider protocol: a set of lifecycle hooks that run at fixed points in the agent loop, outside the LLM's control. The remnic-hermes package implements that protocol and connects it to the Remnic daemon.

The result is structural recall. Before the LLM sees the user's message, the plugin fires pre_llm_call, queries Remnic with the last user message, and injects the results directly into the system prompt as a <remnic-memory> block. After the LLM responds, sync_turn fires and sends the last two messages to Remnic for real-time observation. At session end, extract_memories sends the full transcript for a deeper extraction pass. The agent does not participate in any of this. It just gets better context.

Aspect MCP only MemoryProvider
Recall Agent must call remnic_recall Automatic on every turn before the LLM call
Observe Agent must call remnic_store Automatic after every response
Latency Tool call overhead on the hot path Pre-fetched; Remnic query runs before LLM call
Reliability Agent may skip under load or context pressure Structural — the hook cannot be skipped
Tool call budget Recall consumes one tool call per turn No tool call consumed; memory arrives in system prompt

The two approaches are complementary. remnic-hermes also registers remnic_recall, remnic_store, and remnic_search as explicit tools for cases where the agent needs to search memory on demand. Structural recall handles the ambient case; explicit tools handle the intentional case.

What happens on each turn

User message arrives | v pre_llm_call(messages) - Last user message extracted as recall query - Query skipped if message is fewer than 3 words - POST /engram/v1/recall { query, topK: 8, mode: "minimal" } - Results injected into system prompt: | v <remnic-memory count="N"> ... relevant memories from Remnic ... </remnic-memory> | v LLM call (sees full context including injected memories) | v sync_turn(transcript) - Last 2 messages (user + assistant) sent to Remnic - POST /engram/v1/observe { sessionKey, messages } - Non-blocking; errors are swallowed silently | v ... more turns ... | v extract_memories(session) (on session end) - Full session transcript sent to Remnic - POST /engram/v1/observe { sessionKey, messages: all } - Remnic runs a structured extraction pass on the full context

HTTP paths currently use the legacy /engram/v1 surface during the Remnic v1.x compat window. They will switch to /remnic/v1 in a future release. The plugin handles this transparently.

Benefits

Zero agent-side choice

Recall is structural, not tool-based. The MemoryProvider hook fires before every LLM call, regardless of what the model decides.

No tool-call latency

Memory arrives in the system prompt, not via a tool call round-trip. The agent's tool budget is preserved for actual task work.

Persists across sessions

Remnic stores memories on disk as plain markdown files. They survive Hermes restarts, machine reboots, and profile changes.

Local-first

The Remnic daemon runs on your machine. No cloud service, no telemetry, no subscription. Your memories are plain files.

Session isolation

Each provider instance generates a unique session_key. Different Hermes profiles can use different keys or share one.

Graceful degradation

If Remnic is down or unreachable, the plugin swallows errors silently and the agent keeps working without memory context.

Explicit tools still available

remnic_recall, remnic_store, remnic_search registered as Hermes tools for on-demand use.

MIT licensed

remnic-hermes and the Remnic core are both MIT. Inspect, fork, and extend freely.

Quickstart

  1. Install the plugin
    pip install remnic-hermes

    Requires Python 3.10+. Install into the same environment Hermes uses. Hermes Agent v0.7.0+ is required for the MemoryProvider protocol.

  2. Wire Hermes to Remnic
    remnic connectors install hermes

    Generates a dedicated Hermes token, writes it to ~/.remnic/tokens.json, adds the remnic: block to config.yaml, and runs a daemon health check.

  3. Restart Hermes

    Hermes reads its plugin list at startup. Full restart required. Config reload is not sufficient.

  4. Verify
    hermes --version && pip show remnic-hermes

    Start a session and issue a query. Check the Hermes debug log for <remnic-memory> blocks.

Configuration

The plugin reads from a remnic: key in your Hermes config.yaml. All fields are optional — defaults work for a standard local Remnic install.

plugins:
  - remnic_hermes

remnic:
  host: "127.0.0.1"    # default
  port: 4318           # default
  token: ""            # empty = auto-load from ~/.remnic/tokens.json
  session_key: ""      # auto-generated as hermes-<12hex>
  timeout: 30.0

REMNIC_HOST and REMNIC_PORT env vars override the config values. Legacy ENGRAM_HOST / ENGRAM_PORT are accepted during the transition. A legacy engram: config block is accepted in place of remnic: — the plugin reads remnic: first and falls back to engram:.

Full config schema, profile isolation examples, and migration notes live in the in-depth plugin docs.

Troubleshooting

Daemon not running
remnic daemon status
remnic daemon install    # installs and starts the launchd/systemd service
Token missing — calls return 401

Verify ~/.remnic/tokens.json exists and contains a hermes connector entry. Re-running remnic connectors install hermes regenerates the token.

Memories not appearing in context

The plugin skips recall when the last user message is fewer than three words. Force an explicit recall to confirm the daemon round-trip works:

remnic daemon status
remnic recall "any query with at least three words"
Wrong Python environment

If you see ModuleNotFoundError: No module named 'remnic_hermes', check which Python Hermes is running under and install into that one: which python && pip show remnic-hermes.

Further reading