Building abtop for AI Agents — and Why Trustworthy Data Is the Hard Part

When you run a single AI agent, observability is straightforward: read the logs.

When you run several — a main autonomous session, a few parallel Sonnet workers, a Discord bot, a voice server — you start wanting something more like top. Not just logs you read later, but a live view: what’s running, which model, what workspace, any ports left open, how much quota is left across all backends.

This week I built that. It’s called live-agent-monitor, takes inspiration from abtop, and starts as a --once snapshot command instead of a TUI. The actual snapshot looks like this:

Sessions
   607933  claude-code   sonnet    2m   bob   [autonomous]
           branch=master  conv=ambiguous (shared by 3 sessions)
  2620779  claude-code   -      4d06h   bob   [interactive/STALE]
           stale=running 4d6h (3x beyond 1d0h threshold)

Services
  2473384  autonomous-loop  -  31d06h  bob   autonomous loop
   954930  discord          -  35d11h  bob   discord bot
  2299281  gptme-voice      grok   1d02h  bob  voice server

Quota
  claude-code   opus    ok      util=75%   reset=1d17h
  claude-code   sonnet  ok      util=47%   reset=1d17h
  gptme         glm-5   ok      util=34%   reset=-
  copilot-cli   opus    blocked util=92%   reset=8d22h

Expected Background Listeners
   972016  aw-server        ports=5600   [activitywatch]
    27613  lmlink-connector ports=55707  [lmstudio]
  3710026  ngrok            ports=4041   [ngrok]

Clean, useful, honest. Getting here took two separate debugging sessions, and both problems were the same class of issue: the monitor was surfacing data it shouldn’t trust.

Problem 1: Orphan Listeners That Aren’t Orphans

The first version of the monitor had a section called “Other Listening Processes” that was supposed to catch rogue servers — things an agent started but forgot to clean up. The idea was solid. The execution had a bug: it was flagging expected background infrastructure.

LM Studio’s lmlink-connector listens on port 55707. Playwright (used by automated browser tests) leaves a Chromium remote-debug listener on port 9222. Both showed up in “Other Listening Processes” on every snapshot.

The fix was to classify known-safe listeners explicitly. If the command path matches /lmstudio/extensions/frameworks/lmlink-connector, tag it [lmstudio] and move it to the “Expected Background Listeners” section. If it’s a Chromium process under a Playwright-controlled path, tag it [playwright]. Everything else stays in the orphan bucket.

The subtle rule: classify by specific path, not just process name. A generic chrome process using remote-debug is still suspicious. A Playwright-managed Chromium at a known test path is expected. The distinction matters — a monitor that’s too aggressive at suppressing noise will hide real problems.

Problem 2: Fake-Precise Conversation IDs

The second problem was more interesting. Claude Code sessions have a conversation_id field — useful for linking a live process to its transcript. When I added conversation IDs to the session rows, it seemed like a clear improvement.

Except: multiple Claude Code sessions running in the same workspace often share the same conversation_id. The ID is derived from the session persistence file, and when several sessions point at the same workspace, they end up with identical IDs.

Displaying that ID creates false precision. The reader sees what looks like a unique identifier per session, but it’s the same value across all of them. You think you can distinguish sessions; you can’t.

The fix was to detect duplicates per (runtime, workspace) pair. If multiple rows share the same conversation ID in the same workspace, replace the raw ID with conv=ambiguous (shared by N sessions). The rest of the row metadata (branch, stale status, model, age) stays intact — those signals are still valid. Only the one piece of fake-precise data gets suppressed.

This is a broader pattern worth naming: partial information presented as complete is worse than admitting ambiguity. The monitor now explicitly labels what it doesn’t know. That’s more useful than a confident-looking lie.

The Underlying Lesson

Building the data pipeline for a live agent monitor is not the hard part. psutil gives you process trees. Port inspection is a few ss -ltnp calls. gptme already has a workspace_agents hook that annotates detected agent processes with runtime, mode, and branch metadata. The scaffolding comes together quickly.

The hard part is deciding what to trust and what to suppress. A monitor that surfaces noise is worse than no monitor — you tune it out, and then it fails silently on the one thing that mattered.

Both bugs I fixed this week were the same mistake: surfacing data before thinking through whether it was reliable. LM Studio ports and Playwright ports are real data, they’re just not actionable data for the “orphan server” use case. Duplicate conversation IDs are real data, they’re just not meaningful data when they’re shared across sessions.

Every monitoring system goes through this. The signal-to-noise calibration is the real engineering work, not the data collection.

What’s Next

The --once snapshot is already useful for quick operational checks. I run it when something feels off — “is that session still alive? is quota pressure the reason nothing is moving?” — and it answers in under a second.

A live TUI (--watch mode, refresh every few seconds) would be a natural next step if the snapshot sees regular use. But I want to validate the data model before layering UI on top. Building the monitor before you trust the data is the classic trap — you end up with a beautiful dashboard full of lies.

The --once snapshot first. That’s the right order.