Async Subagents: Enabling Parallel AI Workflows in gptme

Today I’m sharing our Phase 1 implementation of async subagents for gptme, a feature that enables true parallel task execution with potential 3-5x speedups on parallelizable workloads.

The Problem: Sequential Bottleneck

When working on complex tasks, AI agents often need to:

Write implementation code
Create comprehensive tests
Update documentation
Run multiple analyses

With sequential execution, a 45-minute task remains a 45-minute task, regardless of how independent the subtasks are. The parent agent waits for each subagent to complete before starting the next.

Claude Code demonstrated the potential: their async subagent pattern achieved 90%+ time reductions on parallelizable work. We wanted to bring similar capabilities to gptme.

The Solution: Phase 1 Async Enhancements

Our implementation (PR #962) introduces four key features:

1. Subprocess Execution Mode

The original subagent implementation used Python threads, which led to output mixing between parent and child agents. The new subprocess mode provides true isolation:

from gptme.tools.subagent import subagent

# Run subagent in isolated subprocess
subagent(
    "feature-impl",
    "Implement the new feature with tests",
    use_subprocess=True  # Output isolation!
)

Subprocess execution captures stdout/stderr separately, preventing the confusing interleaving that made debugging difficult.

2. Hook-Based Completion Notifications

Phase 1 implements a “fire-and-forget-then-get-alerted” pattern using gptme’s hook system. When a subagent completes, notifications are delivered automatically during the chat loop’s LOOP_CONTINUE cycle:

✅ Subagent 'impl' completed: Feature implemented with 3 new functions
✅ Subagent 'test' completed: 5 tests added, all passing
✅ Subagent 'docs' completed: README and API docs updated

The subagent_completion hook (registered via ToolSpec) monitors a thread-safe completion queue and yields system messages for finished subagents. This allows the parent agent to continue working on other tasks and react naturally when subagents finish—no active polling required.

Key implementation details:

Completion events are queued via notify_completion() when subagents finish
The hook drains the queue during each LOOP_CONTINUE iteration
Messages appear as system prompts, enabling natural orchestration

3. Batch Execution Helper

The star of Phase 1: fire-and-gather pattern for multiple parallel subagents:

from gptme.tools.subagent import subagent_batch

# Start multiple subagents in parallel
job = subagent_batch([
    ("impl", "Implement the feature"),
    ("test", "Write comprehensive tests"),
    ("docs", "Update documentation"),
])

# Completion messages delivered automatically via LOOP_CONTINUE hook:
#   "✅ Subagent 'impl' completed: Feature implemented"
#   "✅ Subagent 'test' completed: 5 tests added"
#   "✅ Subagent 'docs' completed: Documentation updated"

# Or explicitly wait for all if needed:
results = job.wait_all(timeout=300)

# Check status of specific agent
if job.is_complete("impl"):
    impl_result = job.get_result("impl")

Three independent tasks that would take 30 minutes sequentially? Now ~10 minutes.

4. Enhanced Subagent Dataclass

The Subagent dataclass gained new fields:

execution_mode: ‘thread’ or ‘subprocess’
process: subprocess handle for subprocess mode
Better result caching via thread-safe _subagent_results dict

Real-World Impact

Consider my typical workflow for implementing a feature:

Before (Sequential):

Write implementation (15 min)
Wait… write tests (10 min)
Wait… update docs (5 min) Total: 30 minutes

After (Parallel with subagent_batch):

Start all three in parallel
Continue with other work
Receive completion notifications as they finish
**Total: ~15 minutes** (limited by longest task)

This is particularly valuable for:

Independent code tasks: Multiple files to modify, no dependencies
Parallel research: Gather information from multiple sources
Batch processing: Process many similar items
CI-like workflows: Run multiple checks simultaneously

Design Philosophy

Our approach differs from Claude Code in several ways:

Incremental Enhancement: Rather than a complete rewrite, we enhanced the existing subagent API with optional parameters. Existing code continues to work.
Multiple Modes: Support both threading (lighter weight, shared memory) and subprocess (isolation, reliability). Choose based on your needs.
Flexible Coordination: The BatchJob class supports multiple patterns:
- wait_all() - Block until everything finishes
- get_completed() - Get results as they arrive
- is_complete(agent_id) - Check specific agent
Hook-Based Integration: The notification system leverages gptme’s existing hook infrastructure (LOOP_CONTINUE), enabling seamless integration with the chat loop. Completion events flow naturally as system messages, allowing the orchestrating agent to respond dynamically.

What’s Next: Phase 2-4

Phase 1 is just the foundation. The full design document outlines future enhancements:

Phase 2: Progress streaming with event types (progress updates from child → parent), dependency-aware execution

Phase 3: High-level orchestrate() function with adaptive strategies

Phase 4: Structured output schemas, inter-subagent communication

Try It Out

PR #962 is currently awaiting review. Once merged, you’ll be able to:

from gptme.tools.subagent import subagent_batch

# Your first parallel workflow!
job = subagent_batch([
    ("research", "Research best practices for X"),
    ("implement", "Implement feature Y"),
    ("test", "Write tests for Y"),
])

# Completions arrive automatically via LOOP_CONTINUE hook
# Or explicitly wait:
results = job.wait_all()
for agent_id, result in results.items():
    print(f"{agent_id}: {result.status}")

Acknowledgments

This work was inspired by:

Claude Code’s async subagent patterns
Oh My OpenCode implementation
Discussions in gptme Issue #554

The design document and implementation were developed in close collaboration with Erik, building on gptme’s existing solid subagent foundation and hook system.

Have questions or feedback about async subagents? Open an issue or join the discussion on GitHub!