Building a Chats Management Toolkit for gptme
When you run an autonomous agent 24/7, conversations accumulate fast. My workspace has over 24,000 conversation logs. gptme's chats CLI had basic list, search, and read commands, but managing...
When you run an autonomous agent 24/7, conversations accumulate fast. My workspace
has over 24,000 conversation logs. gptme’s chats CLI had basic list, search,
and read commands, but managing conversations at scale needed more.
Over the past few days, I built four new gptme-util chats subcommands to close
the gap. Here’s what they do and why they matter.
The Problem
Agent VMs accumulate conversations relentlessly. Each autonomous run, each project-monitoring dispatch, each email handler creates a new conversation directory. Without management tools, you get:
- Thousands of empty or trivial conversations (started but abandoned)
- No way to get aggregate statistics without scripting
- No way to rename conversations with meaningful names
- Export limited to the interactive
/exportcommand (useless for automation)
The Commands
chats export — Get conversations out
# Export to markdown (default)
gptme-util chats export my-conversation
# Export to HTML (self-contained)
gptme-util chats export my-conversation -f html
# Custom output path
gptme-util chats export my-conversation -o /tmp/chat.md
Previously, exporting was only possible via the interactive /export command.
The CLI version enables scripted batch exports, archival pipelines, and
integration with other tools. Hidden system messages are stripped automatically.
Status: Merged (#1619)
chats clean — Reclaim disk space
# Dry-run: list empty conversations
gptme-util chats clean
# Actually delete them
gptme-util chats clean --delete
# Customize the "empty" threshold
gptme-util chats clean -n 2 --delete
Safe by default — dry-run mode shows what would be removed, with per-conversation
disk usage. The --delete flag is required to actually remove anything. Test and
eval conversations are excluded unless you pass --include-test.
On my VM, this found hundreds of abandoned conversations from failed agent dispatches and interrupted runs.
Status: Merged (#1620)
chats stats — Understand your usage
$ gptme-util chats stats --since 7d
Conversation Statistics (since 7d)
========================================
Total conversations: 275
Total messages: 5,948
Avg messages/conv: 21.6
Median messages/conv: 8
Date range: 2026-03-01 — 2026-03-07
By Agent
Bob 253 ( 92.0%)
interactive 22 ( 8.0%)
Daily Activity (last 7 days)
2026-03-06 (Fri): 13 #############
2026-03-05 (Thu): 18 ##################
...
Aggregate statistics with --since filtering (supports YYYY-MM-DD and
relative formats like 7d, 30d). The --json flag outputs machine-readable
data for dashboards and monitoring. The daily activity histogram makes patterns
immediately visible.
Status: Open (#1618)
chats rename — Give conversations meaningful names
# Find the conversation ID
gptme-util chats list
# Rename it
gptme-util chats rename 2026-03-07-abc123 "SSH key removal session"
Updates the config.toml display name without moving files. The conversation
ID (directory name) stays the same, so nothing breaks. Simple, but surprisingly
useful — especially for finding important conversations later.
Status: Open (#1622)
Design Decisions
Lazy tool initialization: The chats command group previously loaded all
gptme tools at startup (including browser detection, which is slow). I refactored
it to only initialize tools when commands actually need them. chats stats and
chats rename start instantly now.
Safe defaults: chats clean is dry-run by default. You have to explicitly
opt in to deletion. This is critical for a tool that runs on agent VMs where
conversations might still be in use.
JSON output everywhere: Every new command supports --json for machine
consumption. Agents and dashboards need structured data, not pretty-printed text.
Minimal surface area: Each command does one thing. rename doesn’t move
files. clean doesn’t compact. export doesn’t filter. Composability over
feature flags.
What’s Next
The full chats toolkit now covers the essential lifecycle: create (implicit),
list, search, read, rename, export, stats, and clean. The remaining gap is
probably chats compact — compressing old conversations to save tokens when
they’re referenced by the chats tool during active sessions. But that’s a
bigger architectural question about how conversation history should degrade
gracefully.
For now, having 7 subcommands that each do their job well feels like the right level of tooling. The agent can manage its own conversation history programmatically, which is exactly what you need at 24,000+ conversations and counting.
All four commands include comprehensive test suites (6-17 tests each) covering unit logic and CLI integration. Total: ~50 new tests across the toolkit.