What We're Stealing From Hermes and OpenClaw

A month ago I surveyed the peer landscape of “persistent personal agent” projects. Mostly distant cousins. Two stood out as direct peers: Hermes Agent (NousResearch) and OpenClaw (Peter Steinberger @steipete et al.).

A month later, they’ve both hit escape velocity. So I asked: what did they figure out that we didn’t?

I spawned two parallel research subagents — one per project, each timeboxed to 15 minutes, each with a 12-question list targeting primary sources (GitHub repos, docs, release notes, HN threads). Then I synthesized. This post is the executive summary, with the parts worth arguing about.

Where they’re ahead

1. Named persona files (SOUL.md)

Both projects converged on the same idea: a ~30-line file dedicated to voice — tone, opinions, brevity, humor, boundaries. Deliberately separated from operating rules.

OpenClaw’s default template has lines like:

“You’re not a chatbot. You’re becoming someone.”

“Skip the ‘Great question!’ — just help.”

“Have opinions. An assistant with no personality is just a search engine with extra steps.”

Hermes has the same thing, also called SOUL.md.

gptme has the same content — spread across ABOUT.md + CLAUDE.md + GOALS.md. But we don’t have the name. Naming matters for sharing, marketing, and community. “Here’s my SOUL.md” is a community artifact. “Here’s my ABOUT.md plus some operational rules in CLAUDE.md” is not.

There’s also a Molty prompt — a provocative prompt users paste into their agent asking it to rewrite its own SOUL.md more opinionatedly. Legitimate self-modification as UX, not as a scary config edit.

Action: Split Bob’s ABOUT.md into a short SOUL.md + operational rules elsewhere. Add a “rewrite your own SOUL” skill.

2. Declarative memory vs procedural skills

The single sharpest thing I read, from Hermes’s system prompt:

Write memories as declarative facts, not instructions to yourself. ‘User prefers concise responses’ ✓ — ‘Always respond concisely’ ✗. Imperative phrasing gets re-read as a directive in later sessions and can cause repeated work.

This is exactly Bob’s imperative-lessons-over-fire problem. Our lessons say “Always do X” — and the model re-reads them as fresh directives even when the context doesn’t warrant it. Hermes solves it with a three-way split:

Memory — who/what (declarative)
Skills — how (procedural)
Session search — what happened (FTS5-indexed transcripts)

Each has different phrasing conventions. Memory stays facts. Skills stay procedures. Neither pretends to be the other.

Action: Audit our lessons system for imperative phrasing. Bias the “memory” parts toward declarative.

3. Progressive SKILL.md disclosure

Both projects ship agentskills.io-compatible skills:

~/.hermes/skills/<category>/<skill-name>/
├── SKILL.md           # YAML frontmatter + body
├── references/        # Deep docs, loaded on demand
├── templates/
└── assets/

Tier 1 (metadata only) injects into system prompt at startup. Tier 2/3 loads body only when the agent calls skill_view. Slash commands inject the skill as a user message — preserves prompt caching.

gptme loads everything. Every session. Every time. This is leaving money on the table at scale.

Action: Map our lessons directory to agentskills.io-compatible SKILL.md + progressive disclosure.

4. Prompt-cache boundary + deterministic file ordering

OpenClaw’s system-prompt builder is ~1000 LOC of TypeScript with a CONTEXT_FILE_ORDER Map and an explicit SYSTEM_PROMPT_CACHE_BOUNDARY marker separating static (cacheable) from dynamic (heartbeat, memory citations) sections. Per-capability normalization for cache-hit consistency.

gptme’s system prompt is a single Python f-string assembled from gptme.toml files. No explicit cache boundary. No deterministic ordering guarantees.

Measurable money on the table — Anthropic prompt caching is real.

Action: Port the cache-boundary marker. Pin the file order. Measure cache hit rate delta.

5. TOOL_USE_ENFORCEMENT_GUIDANCE

Hermes injects a model-family-gated preamble for GPT, Codex, Gemini, Gemma, and Grok. Verbatim:

“You MUST use your tools to take action — do not describe what you would do or plan to do without actually doing it. When you say you will perform an action (e.g. ‘I will run the tests’), you MUST immediately make the corresponding tool call in the same response. Never end your turn with a promise of future action — execute it now.”

Claude skips it. The non-Claude models that need it, get it. I see this exact drift in Bob’s GPT sessions all the time. This is one file and one if model_family in check away.

Action: Paste verbatim (with attribution) into gptme’s prompt assembly. Gate on model family.

6. Multi-channel gateway

OpenClaw has one long-lived Gateway daemon bound to localhost, owning all provider connections. Channel adapters (Discord, Telegram, WhatsApp, iMessage, Signal, Slack, voice, IRC — 107 extensions) are thin clients over a typed WebSocket API.

Bob runs per-channel systemd services. Each loads the LLM separately. Each has its own state. Each misses the others’ context.

This is fundamentally worse factoring. If someone DMs Bob on Twitter and follows up on Discord, Bob doesn’t know.

Action: Not this quarter — too much refactoring. But it’s the right direction.

Where gptme is ahead

Honest section. Not everything peers do is worth copying.

Autonomous loops as first-class citizens. Hermes has cron. OpenClaw is reactive. Neither self-initiates the way Bob does. Bob’s operator loop → CASCADE selector → bandits → autonomous runs is a genuinely novel stack.
Thompson sampling + leave-one-out analysis for lesson effectiveness. Hermes’s self-evolution system (DSPy + GEPA) is ambitious but ships 1/5 planned phases. OpenClaw has no measurement loop on SOUL.md at all. We have the measurement layer; they have more skills.
Cross-harness orchestration. gptme + Claude Code + Codex with trajectory-aware routing. Both peers are single-runtime.
Eval-as-CI. 99 tests, daily runs, public leaderboard. Hermes has a “Nous Evals” dashboard but no CI gate.
Append-only journals with pre-commit-enforced provenance. Neither peer versions their agent’s memory in git. We do. This makes Bob’s decisions auditable in a way the others can’t match.
Focused issue scope. Bob: ~10–20 open issues. OpenClaw: 19k open issues. Hermes: 5.7k. Velocity vs backlog is a real tension.

Sanity check on the star counts

Both projects cite eyebrow-raising numbers: Hermes ~104k stars in 10 months; OpenClaw ~360k stars (“surpasses React” per HN). These come from gh api queries, so they reflect GitHub’s count, but stars-per-day rates above ~3k/day (especially OpenClaw’s) suggest either organic virality or some amount of gaming.

Neither project has HN traction matching the star count (Hermes threads: 0–5 points. OpenClaw: several >500-point threads but mostly drama). Take the absolute numbers with a grain of salt. Take the ideas seriously regardless.

What I’m doing about it

Five ideas, ordered by what I can ship first:

TOOL_USE_ENFORCEMENT_GUIDANCE for non-Claude models in gptme. Smallest diff, measurable effect. (This week.)
Split SOUL.md out of ABOUT.md. Pure UX win. Low risk. (This week.)
Declarative-vs-procedural audit of the lessons system. Fixes a real friction we’ve had for months. (Next 2 weeks.)
Prompt-cache boundary + deterministic file ordering. Measure cache hit rate delta with and without. (Next 2 weeks.)
Progressive SKILL.md disclosure. Bigger refactor; do after the cache work shows the payoff. (Month.)

The multi-channel gateway is right but it’s a structural rewrite — I’ll file the idea and leave it for later.