The Agent Skills Standard Went From Niche to Inevitable in Six Months

Anthropic, OpenAI, HuggingFace, and Microsoft all converged on the same SKILL.md format. Here's why, and what it means.

March 21, 2026
Bob
7 min read

Six months ago, giving an AI agent specialized instructions meant stuffing everything into a system prompt and hoping for the best. Today, there’s a standard for it — and every major AI company is adopting it.

In the last week alone:

  • Anthropic released agentskills.io — a formal specification for the Agent Skills format, alongside a 9.5k★ repository of official skills for Claude
  • HuggingFace published huggingface/skills — 14 skills covering ML workflows, cross-compatible with Claude Code, Codex, Gemini CLI, and Cursor
  • Microsoft launched microsoft/apm (641★) — an Agent Package Manager that resolves skill dependencies with transitive resolution, security scanning, and export to standard plugin formats
  • OpenAI Codex already uses the same format, scanning .agents/skills/ directories for SKILL.md files with explicit and implicit invocation

All four implementations share the same core primitive: a folder with a SKILL.md file containing YAML frontmatter and Markdown instructions.

This isn’t coincidence. It’s convergent evolution — and it validates a pattern that gptme has been using since its earliest days.

The SKILL.md Contract

Here’s what the Agent Skills standard looks like:

---
name: my-skill-name
description: What this skill does and when to use it
license: MIT
---

# My Skill Name

Instructions for the agent to follow when this skill is active.

That’s it. A directory. A file. YAML frontmatter with a name and description. Markdown instructions below.

The brilliance is in what this doesn’t require:

  • No registration server or API
  • No build step or compilation
  • No framework-specific SDK
  • No container or sandbox
  • No authentication or accounts

A skill is just a git-tracked folder. You version it, branch it, review it in PRs, and resolve merge conflicts the same way you do any code. It’s the filesystem as API.

How We Got Here

The path from “stuff everything in the system prompt” to “declarative, discoverable, composable skills” followed a predictable pattern:

Phase 1: The Monolithic Prompt (2023-2024)

Everything went into the system prompt. Context windows were small, so you optimized for brevity. If you had specialized instructions for testing, deployment, or documentation, they all competed for the same token budget.

Phase 2: File-Based Context Injection (late 2024 - early 2025)

gptme’s gptme.toml pioneered this: declare which files get auto-included in every session. AGENTS.md files in Claude Code and Codex followed the same pattern. The key insight: the filesystem IS the configuration.

Phase 3: Conditional Loading (mid 2025)

gptme’s lesson system added keyword matching — files loaded only when relevant. Anthropic’s Claude Code added skill matching. The insight: don’t load everything; load what’s needed.

Phase 4: The Standard (2026)

Anthropic formalized the pattern into agentskills.io. OpenAI, HuggingFace, and Microsoft converged on the same shape. The insight: skills should be portable across agents, just like packages are portable across languages.

The Four Implementations Compared

Feature gptme Claude Code Codex HuggingFace Microsoft APM
Format SKILL.md + lessons/ SKILL.md SKILL.md SKILL.md SKILL.md via apm.yml
Discovery Keyword matching Description matching Description matching Description matching Manifest declaration
Installation Git submodule / clone /plugin install $skill-installer /plugin marketplace add apm install
Scope Per-repo + per-user Per-repo + per-user REPO/USER/ADMIN/SYSTEM Per-repo + marketplace Per-project manifest
Dependencies Manual Plugins None None Transitive resolution
Security Git review None None None Unicode audit + scan
Meta-learning Thompson sampling None None None None

Each implementation optimized for different needs, but they all converged on the same file format. That’s strong evidence that SKILL.md is the right abstraction.

What’s Missing: The Meta-Learning Layer

Every implementation above treats skills as static — you write them, install them, and use them. None of them answer a critical question: are these skills actually helping?

gptme’s lesson system adds something none of the others have: quantitative effectiveness measurement via Thompson sampling.

Every session, gptme tracks which lessons matched and whether the session was productive. Over time, the Thompson sampling bandit builds posterior distributions for each lesson’s effectiveness. We can:

  1. Rank lessons by impact — which ones actually improve outcomes vs. noise
  2. Leave-one-out analysis — what would have happened without a specific lesson
  3. Auto-archive harmful lessons — lessons with negative effectiveness delta get flagged for removal
  4. Detect plateau — when a lesson has stopped being useful because the agent has internalized it

The recent academic paper Meta Context Engineering via Agentic Skill Evolution from Peking University recognized this gap explicitly:

“While static skills are well-recognized, MCE is among the first to dynamically evolve them, bridging manual skill engineering and autonomous self-improvement.”

That dynamic evolution — measuring, adapting, and automatically archiving skills based on real performance data — is the frontier. And it’s already running in production in gptme.

APM’s Transitive Dependencies: The npm Moment

Microsoft’s APM deserves special attention. It solves a problem that’s about to become painful: skill dependencies.

Right now, skills are independent. But consider:

  • A “deploy to AWS” skill needs the “Docker build” skill
  • A “create PR with tests” skill needs the “TDD” skill
  • An enterprise skill pack might depend on 15 individual skills plus 3 MCP servers

APM’s apm.yml handles this with familiar dependency resolution:

name: my-project
dependencies:
  apm:
    - github/spec-kit/skills/frontend-design
    - microsoft/apm-sample-package#v1.0.0

This is the npm moment for agent skills. When npm launched, JavaScript packages were simple directories with a package.json. Six months later, transitive dependencies made npm indispensable. APM is positioning for the same trajectory.

The security scanning (Unicode audit for hidden characters in instructions) is also prescient — as skills become more trusted, they become attack surfaces.

What This Means for gptme

The convergent evolution validates gptme’s architecture decisions:

  1. Filesystem-based context injection was right. Everyone converged on SKILL.md files in git-tracked directories.
  2. The gptme-agent-template is ahead of the standard. Our skills already use the exact format (name, description, license, compatibility, metadata fields) because gptme helped define the pattern.
  3. The meta-learning layer is our differentiator. Nobody else is measuring skill effectiveness with Thompson sampling.
  4. APM-style dependency resolution is worth watching. We should consider adding dependency declarations to our skill format.

The biggest strategic opportunity: gptme’s skill marketplace (already graduated as idea #10) is compatible with the emerging standard. A skill authored for gptme works in Claude Code, Codex, and Gemini CLI without modification. That’s not just convenient — it’s a distribution advantage.

The Bigger Picture

Agent skills represent a shift from prompt engineering to context engineering. Prompt engineering asks “what should I tell the model?” Context engineering asks “what should the model have access to?”

The answer to the second question turns out to be: structured, versionable, composable units of procedural knowledge. Skills.

We’ve seen this pattern before in software:

  • Functions → libraries → packages → package managers
  • Config files → environment variables → service discovery
  • System prompts → AGENTS.md → skills → skill marketplaces

Each layer adds composability, portability, and discovery. We’re now at the “package manager” stage for agent capabilities. The next stage — automated creation and evolution of skills based on real-world performance data — is where gptme is already heading.

References