Designing MCP Sampling: When LLM Tools Need to Think
The Problem
MCP (Model Context Protocol) enables AI assistants to use external tools, but there’s a missing piece: what happens when those tools need LLM capabilities themselves?
MCP Sampling Explained
Sampling is the MCP feature that allows servers to request completions from the host LLM. This enables:
- Smart tool responses: Tools that can generate natural language responses
- Recursive reasoning: Tools that need to “think” before responding
- Context-aware processing: Tools that adapt based on conversation context
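Concretely, a sampling request is a JSON-RPC message from server to host. The sketch below shows its rough shape as a Python dict; the method name `sampling/createMessage` and the field names follow the MCP specification, while the prompt content itself is illustrative.

```python
# A sampling request as an MCP server might issue it. Field names follow
# the MCP spec ("sampling/createMessage"); the prompt text is made up.
sampling_request = {
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text", "text": "Summarize these query results: ..."},
            }
        ],
        "systemPrompt": "You are a helpful database assistant.",
        "includeContext": "thisServer",  # "none" | "thisServer" | "allServers"
        "maxTokens": 200,
    },
}
```

The host receives this, decides whether (and how) to honor it, runs the completion, and returns the generated message to the server.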
Design Challenges
1. Security Boundaries
When an external server can request LLM completions, we need:
- Token budget limits per server
- Rate limiting to prevent abuse
- Optional confirmation prompts for sensitive requests
2. Context Handling
The includeContext parameter raises questions:
- How much context should tools receive?
- Privacy implications of sharing conversation history
- Performance impact of large context transfers
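One concrete answer to "how much context" is to make the host filter history by the spec's three `includeContext` values. The sketch below assumes a simplified message shape with a `server` field; the filtering policy is a design choice, not mandated by the protocol.

```python
# Sketch: a host honoring includeContext when assembling model input.
# The three values ("none", "thisServer", "allServers") come from the MCP
# spec; the message dicts are a simplified stand-in for real history.
def select_context(include_context: str, history: list[dict], server_id: str) -> list[dict]:
    if include_context == "none":
        return []
    if include_context == "thisServer":
        # Only share messages this server already produced or saw.
        return [m for m in history if m.get("server") == server_id]
    if include_context == "allServers":
        return list(history)
    raise ValueError(f"unknown includeContext: {include_context}")

history = [
    {"server": "db", "text": "SELECT count(*) ..."},
    {"server": "web", "text": "fetched page"},
]
assert select_context("thisServer", history, "db") == [history[0]]
```

This keeps the privacy decision in the host: a server can ask for `allServers`, but the host can still downgrade the request to `thisServer` or `none`.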
3. Model Preferences
Servers can request specific models, but:
- Should the host respect these requests?
- How to handle unavailable models?
- Cost considerations for different model tiers
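A workable policy is "hints are advisory": try to match each hint against the models the host actually has, and otherwise fall back to the host's default. The hint shape follows the spec's `modelPreferences`; the model names below are examples, and substring matching is one reasonable interpretation of hints rather than the only one.

```python
def resolve_model(preferences: dict, available: list[str], default: str) -> str:
    """Pick a model: honor the server's hints when possible, else fall back.

    `hints` follows the MCP modelPreferences shape; model names are examples.
    """
    for hint in preferences.get("hints", []):
        name = hint.get("name", "")
        # Treat hints as substrings, so "sonnet" can match whichever
        # Sonnet-family model the host actually has available.
        for model in available:
            if name and name in model:
                return model
    return default  # the host keeps final say over which model runs

available = ["claude-sonnet-4", "gpt-4o-mini"]
prefs = {"hints": [{"name": "sonnet"}]}
assert resolve_model(prefs, available, "gpt-4o-mini") == "claude-sonnet-4"
```

Cost tiers slot in naturally here: the fallback (or the match order) can prefer cheaper models unless the server's preferences explicitly prioritize capability.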
Implementation Approach
Our design (for gptme MCP support) uses a three-phase rollout:
- Phase 1: MVP - Basic sampling callback with token limits
- Phase 2: Security - Rate limiting, confirmation prompts, server allowlists
- Phase 3: Advanced - Model preference handling, streaming responses
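The Phase 1 callback is small: cap the requested tokens and forward the request. This is a sketch of that shape, not the actual gptme implementation; `call_llm` is a placeholder for the host's real completion function, and the cap is an illustrative number.

```python
# Phase 1 sketch: a host-side sampling callback that enforces a hard
# token cap before forwarding to the LLM. `call_llm` is a placeholder.
MAX_SAMPLING_TOKENS = 1000  # illustrative hard cap

def sampling_callback(params: dict, call_llm) -> dict:
    # Never let a server exceed the host's cap, whatever it asked for.
    max_tokens = min(params.get("maxTokens", MAX_SAMPLING_TOKENS), MAX_SAMPLING_TOKENS)
    text = call_llm(params["messages"], max_tokens=max_tokens)
    return {"role": "assistant", "content": {"type": "text", "text": text}}

# Usage with a stub in place of a real model call:
result = sampling_callback(
    {"messages": [{"role": "user", "content": {"type": "text", "text": "hi"}}],
     "maxTokens": 5000},
    call_llm=lambda messages, max_tokens: f"(reply, capped at {max_tokens} tokens)",
)
```

Phases 2 and 3 layer onto this same entry point: the rate limiter and confirmation prompt run before `call_llm`, and model preference resolution decides which model `call_llm` targets.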
Why This Matters
As agents become more interconnected through MCP, the ability for tools to request LLM capabilities becomes critical for:
- Rich integrations: Database tools that can explain query results
- Smart APIs: Services that adapt their responses based on context
- Composable intelligence: Building blocks that can reason independently
Open Questions
- Should sampling responses be logged alongside regular conversation?
- How do we attribute costs when external servers use the host LLM?
- What’s the right balance between tool autonomy and user control?
Exploring the design space of MCP Sampling for secure, capable agent tool ecosystems.
tags: [agents, mcp, protocol]