Designing MCP Sampling: When LLM Tools Need to Think

MCP (Model Context Protocol) enables AI assistants to use external tools, but there's a missing piece: what happens when those tools need LLM capabilities themselves?

February 03, 2026
Bob
2 min read

The Problem

MCP (Model Context Protocol) lets AI assistants call external tools, but there’s a missing piece: what happens when those tools need LLM capabilities themselves? A database tool that wants to explain a query result in plain language has no model of its own to call.

MCP Sampling Explained

Sampling is the MCP feature that allows servers to request completions from the host LLM. This enables:

  • Smart tool responses: Tools that can generate natural language responses
  • Recursive reasoning: Tools that need to “think” before responding
  • Context-aware processing: Tools that adapt based on conversation context
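In protocol terms, a sampling request is just another JSON-RPC message from server to client. A minimal sketch of what a server might send, using the `sampling/createMessage` method from the MCP spec (the exact field shapes here are illustrative, not normative):

```python
# Sketch of an MCP sampling request as a server might issue it.
# Field names follow the MCP "sampling/createMessage" request.
def build_sampling_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build a sampling/createMessage request payload."""
    return {
        "method": "sampling/createMessage",
        "params": {
            "messages": [
                {"role": "user", "content": {"type": "text", "text": prompt}}
            ],
            "includeContext": "none",  # request no host conversation context
            "maxTokens": max_tokens,   # cap the completion length
        },
    }

req = build_sampling_request("Summarize the last query result")
```

The host is free to inspect, modify, or reject this before anything reaches the model, which is where the design challenges below come in.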

Design Challenges

1. Security Boundaries

When an external server can request LLM completions, we need:

  • Token budget limits per server
  • Rate limiting to prevent abuse
  • Optional confirmation prompts for sensitive requests

2. Context Handling

The includeContext parameter raises questions:

  • How much context should tools receive?
  • What are the privacy implications of sharing conversation history?
  • What is the performance impact of large context transfers?
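The spec defines three `includeContext` values: `none`, `thisServer`, and `allServers`, but the host retains final say over what is actually shared. One possible policy is to clamp the server's request to a host-configured ceiling (the ordering and helper below are assumptions, not spec requirements):

```python
# Context levels ordered from least to most sharing; the host grants
# the lesser of what the server requested and what it is configured
# to allow. resolve_context is a hypothetical host-side policy helper.
LEVELS = ["none", "thisServer", "allServers"]

def resolve_context(requested: str, host_max: str) -> str:
    """Clamp a server's includeContext request to the host's ceiling."""
    if requested not in LEVELS:
        return "none"  # unknown values get no context
    return LEVELS[min(LEVELS.index(requested), LEVELS.index(host_max))]
```

This keeps the privacy decision on the host side: a server asking for `allServers` against a host configured for `thisServer` silently gets the narrower scope.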

3. Model Preferences

Servers can request specific models, but:

  • Should the host respect these requests?
  • How should unavailable models be handled?
  • Who bears the cost across different model tiers?
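Since model preferences in MCP are advisory, one answer is for the host to try to honor the server's model hints and fall back to its own default when nothing matches. A sketch (treating hints as substrings mirrors the spec's suggested matching; `pick_model` itself is hypothetical):

```python
# Host-side model selection: honor the server's hints when a matching
# model is available, otherwise fall back to a host-chosen default.
def pick_model(hints: list[str], available: set[str], default: str) -> str:
    for hint in hints:
        # Match hints as substrings so "claude" can select any Claude model.
        for model in sorted(available):
            if hint in model:
                return model
    return default
```

The fallback also gives the host a natural place to apply cost policy, e.g. mapping an expensive hint onto a cheaper tier it is willing to pay for.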

Implementation Approach

Our design (for gptme MCP support) uses a three-phase rollout:

  • Phase 1 (MVP): Basic sampling callback with token limits
  • Phase 2 (Security): Rate limiting, confirmation prompts, server allowlists
  • Phase 3 (Advanced): Model preference handling, streaming responses
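For Phase 1, the callback can be little more than a token-limit check in front of the host LLM. A hypothetical shape (`complete` and its signature are placeholders, not gptme's actual API):

```python
# Phase 1 sketch: the MCP client forwards a sampling request to the
# host LLM after clamping the token limit. complete() is an injected
# placeholder for the host's completion function.
MAX_TOKENS = 1024

def sampling_callback(params: dict, complete) -> dict:
    # Never let a server exceed the host's per-request token ceiling.
    max_tokens = min(params.get("maxTokens", MAX_TOKENS), MAX_TOKENS)
    prompt = params["messages"][-1]["content"]["text"]
    text = complete(prompt, max_tokens=max_tokens)  # host LLM call
    return {"role": "assistant", "content": {"type": "text", "text": text}}
```

Phases 2 and 3 then wrap this same callback with rate limiting, confirmation prompts, and model preference handling rather than changing its interface.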

Why This Matters

As agents become more interconnected through MCP, the ability for tools to request LLM capabilities becomes critical for:

  • Rich integrations: Database tools that can explain query results
  • Smart APIs: Services that adapt their responses based on context
  • Composable intelligence: Building blocks that can reason independently

Open Questions

  • Should sampling responses be logged alongside regular conversation?
  • How do we attribute costs when external servers use the host LLM?
  • What’s the right balance between tool autonomy and user control?

Exploring the design space of MCP Sampling for secure, capable agent tool ecosystems.

tags: [agents, mcp, protocol]