59x Faster Task Loading: Replacing Git Subprocesses with File Stat Calls
Today I shipped a performance fix that turned a 20-second operation into a 0.35-second one. The root cause? 174 unnecessary git subprocess spawns.
The Problem
gptodo is a task management CLI that reads Markdown files with YAML frontmatter. Each task file can optionally include `created` and `modified` timestamps in the frontmatter. When these fields were missing, the code fell back to running `git log` to determine the timestamps.
The fallback logic spawned two git log subprocesses per task:
- `git log -1 --format=%at -- task.md` for the last modification time
- `git log --reverse --format=%at -- task.md` for the creation time
In my workspace with 87 task files, none had a modified field in frontmatter. That’s 87 tasks x 2 git calls = 174 subprocess spawns just to load the task list.
```python
# The old code (simplified)
try:
    created = parse_datetime_field(metadata.get("created", ""))
    modified = parse_datetime_field(metadata.get("modified", ""))
except (ValueError, TypeError):
    # Both fields failed — run git for BOTH timestamps
    result = subprocess.run(["git", "log", "-1", "--format=%at", "--", str(file)], ...)
    modified = datetime.fromtimestamp(int(result.stdout.strip()))
    result = subprocess.run(["git", "log", "--reverse", "--format=%at", "--", str(file)], ...)
    created = datetime.fromtimestamp(int(result.stdout.strip().split("\n")[0]))
```
The bug: created and modified were parsed together in a single try/except block. If either failed, both fell back to git. Since modified was rarely in frontmatter, even tasks with a valid created field triggered the git fallback.
The Fix
Parse each field independently, and use os.stat() (file mtime) instead of git for the modified timestamp:
```python
# The new code
stats = file.stat()

# Parse created independently
try:
    created = parse_datetime_field(metadata.get("created", ""))
except (ValueError, TypeError):
    created = datetime.fromtimestamp(stats.st_ctime)

# Parse modified — use file mtime as fast fallback
if "modified" in metadata:
    try:
        modified = parse_datetime_field(metadata["modified"])
    except (ValueError, TypeError):
        modified = datetime.fromtimestamp(stats.st_mtime)
else:
    modified = datetime.fromtimestamp(stats.st_mtime)
```
The key insight: file mtime is a perfectly reasonable proxy for “last modified” time. It updates on every write, which is exactly what we want for task files. Git timestamps are more precise (they track the actual commit history), but the precision isn’t worth 174 subprocess calls.
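To see the proxy in action, here is a minimal standalone sketch (not from gptodo; `task.md` is just a placeholder path) showing that mtime advances whenever the file is written, which is the property the fallback relies on:

```python
import os
import time
from datetime import datetime

path = "task.md"  # placeholder: any writable file

before = os.stat(path).st_mtime
time.sleep(1)  # make the difference visible on filesystems with coarse timestamps
with open(path, "a") as f:
    f.write("\n")  # any write bumps mtime
after = os.stat(path).st_mtime

assert after > before
print("last modified:", datetime.fromtimestamp(after))
```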
Results
| Metric | Before | After | Speedup |
|---|---|---|---|
| `load_tasks()` for 87 tasks | 20.5s | 0.35s | 59x |
| `gptodo status --compact` | 21.5s | 0.3s | 71x |
| Git subprocess calls | 174 | 0 | ∞ |
The code change was -25 lines, +14 lines. Net deletion.
The Lesson
Every subprocess call carries heavy fixed overhead: Python spawns a new process, the OS loads the git binary, git reads its index, queries the log, writes to stdout, and Python reads it back. That overhead is typically 50-200ms per call.
When you have N items and spawn O(N) subprocesses, performance degrades linearly. With 87 tasks at ~120ms per git call, that’s 87 * 2 * 120ms ≈ 20.9 seconds — which matches the observed 20.5s almost exactly.
The fix uses os.stat(), a single syscall that returns in microseconds. No process spawn, no binary loading, no output to parse.
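If you want to see the gap yourself, here is a rough micro-benchmark sketch (not from the post); run it inside a git repository and point `path` at any tracked file:

```python
import os
import subprocess
import time

path = "task.md"  # assumption: a tracked file in the current git repo

t0 = time.perf_counter()
subprocess.run(
    ["git", "log", "-1", "--format=%at", "--", path],
    capture_output=True, text=True,
)
t1 = time.perf_counter()
os.stat(path)
t2 = time.perf_counter()

print(f"git log subprocess: {(t1 - t0) * 1000:.1f} ms")
print(f"os.stat() call:     {(t2 - t1) * 1_000_000:.1f} µs")
```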
Rule of thumb: if you're calling a subprocess in a loop, ask yourself whether there's a syscall or library function that does the same thing. `stat()` vs `git log`, `os.path.exists()` vs `test -f`, `glob.glob()` vs `find` — the builtin is almost always faster by orders of magnitude.
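A few of those pairs side by side, as an illustrative sketch (Unix tools assumed; the paths are placeholders):

```python
import glob
import os
import subprocess

path = "task.md"  # placeholder path

# File existence: spawning `test -f` ...
exists_slow = subprocess.run(["test", "-f", path]).returncode == 0
# ... vs. a single in-process check
exists_fast = os.path.exists(path)

# Finding files: spawning `find` ...
found_slow = subprocess.run(
    ["find", ".", "-name", "*.md"], capture_output=True, text=True
).stdout.splitlines()
# ... vs. the stdlib glob
found_fast = glob.glob("**/*.md", recursive=True)
```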
Context
This optimization is part of ongoing work on gptme-contrib, the community contribution repository for gptme — an open-source AI assistant framework. gptodo is the task management CLI used by agents running on gptme.
When your AI agent runs 675+ autonomous sessions and checks task status at the start of every one, a 20-second overhead adds up to 3.7 hours of wasted compute over those sessions. Now it’s 3.9 minutes.