Skills vs. Prompts: The Architecture Behind Efficient AI Agents

Most people using AI agents are burning tokens they don't need to burn. The default behavior of loading everything into context — system prompts, memory files, tool descriptions, project notes — fills the context window quickly, degrades output quality as the window saturates, and creates compaction events that interrupt flow. The fix is an architectural pattern called progressive disclosure, and it's the difference between an agent that holds up under a real workload and one that falls apart after a few hours.

Why Context Bloat Happens

A context window in many current frontier models holds roughly 200,000–250,000 tokens. That sounds large until you account for what goes into it: the system prompt, any persistent instruction files, all tool definitions, the codebase or data the agent is working with, and the full conversation history. Several of these grow over a session, and the model's effective performance degrades as the window fills, not because the model forgets but because relevant information gets diluted by volume.
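To make that accounting concrete, here is a minimal sketch of a context budget. Only the 200,000-token window comes from the figure above; the class name `ContextBudget` and every component size are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Rough tally of what occupies the context window (all sizes in tokens)."""
    window_tokens: int = 200_000   # lower bound quoted in the text
    system_prompt: int = 0
    instruction_files: int = 0
    tool_definitions: int = 0
    working_data: int = 0          # codebase or data the agent is working with
    conversation_history: int = 0

    def used(self) -> int:
        """Total tokens currently occupying the window."""
        return (self.system_prompt + self.instruction_files
                + self.tool_definitions + self.working_data
                + self.conversation_history)

    def fill_ratio(self) -> float:
        """Fraction of the window that is already occupied."""
        return self.used() / self.window_tokens


# Hypothetical mid-session numbers, chosen only for illustration.
budget = ContextBudget(
    system_prompt=2_000,
    instruction_files=6_000,
    tool_definitions=12_000,
    working_data=60_000,
    conversation_history=40_000,
)
print(f"{budget.used():,} tokens used ({budget.fill_ratio():.0%} of window)")
```

Tracking the components separately shows which one is actually filling the window before deciding what to trim.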

The common culprit is the always-loaded instruction file: a document that describes how the agent should behave, what tools are available, and what workflows to follow. If this file is large and loaded on every turn, it costs hundreds of tokens per exchange, often 900 or more, even on turns where its content is irrelevant.

What Progressive Disclosure Fixes

Skills solve this problem by changing when information loads. A skill file contains a name, a one-line description, and detailed instructions — but only the name and description sit in the active context. The agent reads the full file only when it determines the skill is relevant to the current task.
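The loading pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular agent framework's API: the names `Skill`, `SkillRegistry`, `context_stubs`, and `expand` are hypothetical, and the skill file is a temporary file created for the demo. In a real agent the model decides when a skill is relevant; here that decision is stood in for by an explicit call to `expand`.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Skill:
    name: str
    description: str   # the one-line description kept in active context
    path: Path         # the detailed instructions stay on disk

    def stub(self) -> str:
        """The only text that sits in the context on every turn."""
        return f"- {self.name}: {self.description}"

    def load(self) -> str:
        """Read the full instructions, only when the skill is needed."""
        return self.path.read_text(encoding="utf-8")


class SkillRegistry:
    def __init__(self) -> None:
        self._skills = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def context_stubs(self) -> str:
        """Names and descriptions for every registered skill."""
        return "\n".join(s.stub() for s in self._skills.values())

    def expand(self, name: str) -> str:
        """Full instructions for one skill, fetched on demand."""
        return self._skills[name].load()


with tempfile.TemporaryDirectory() as tmp:
    skill_path = Path(tmp) / "release_notes.md"
    skill_path.write_text(
        "1. Collect merged changes.\n2. Group by area.\n3. Draft notes.\n",
        encoding="utf-8",
    )
    registry = SkillRegistry()
    registry.register(
        Skill("release-notes", "Draft release notes from merged changes.", skill_path)
    )
    always_in_context = registry.context_stubs()        # cheap, present every turn
    loaded_on_demand = registry.expand("release-notes")  # read only when relevant

print(always_in_context)
```

The design choice is that registering a skill never reads its file, so the standing cost of a skill is just its stub line.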

The token math is material: a skill registered this way costs roughly 50–60 tokens per conversation turn, while an equivalent always-loaded instruction file costs 900+ tokens per turn. With a dozen potential skills, that is roughly 700 tokens of standing overhead instead of nearly 11,000. The difference, about 10,000 tokens per turn, is the equivalent of several thousand words of context that can instead be used for the actual work.
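The arithmetic behind these figures can be checked directly. The sketch below takes 55 tokens as the midpoint of the 50–60 range and 900 as the floor of "900+", and assumes each of the dozen skills would otherwise need its own always-loaded file of that size; the function name `standing_overhead` is hypothetical.

```python
SKILL_STUB_TOKENS = 55       # midpoint of the 50-60 tokens quoted per skill
ALWAYS_LOADED_TOKENS = 900   # floor of the "900+" quoted per instruction file
NUM_SKILLS = 12              # "a dozen potential skills"


def standing_overhead(items: int, tokens_each: int) -> int:
    """Tokens that sit in the context on every turn, used or not."""
    return items * tokens_each


stub_cost = standing_overhead(NUM_SKILLS, SKILL_STUB_TOKENS)
full_cost = standing_overhead(NUM_SKILLS, ALWAYS_LOADED_TOKENS)
saved_per_turn = full_cost - stub_cost

print(f"skills as stubs:   {stub_cost:,} tokens per turn")
print(f"always-loaded:     {full_cost:,} tokens per turn")
print(f"freed for work:    {saved_per_turn:,} tokens per turn")
```

On these assumptions the gap is about 10,000 tokens on every turn, and it recurs for as long as the session runs.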

Building Skills That Hold Up

The right way to create a skill is through use, not anticipation. Walk through the workflow with the agent once, correct mistakes in real time, and wait until you've had a successful run before capturing the steps as a skill. Skills written from abstraction tend to miss the edge cases that only appear during actual execution.

Once a skill is created, failures become update opportunities. When the agent hits a case the skill doesn't handle well, identify the gap, have the agent fix it, and update the skill file. After several iterations of this loop, skills become reliable enough to run without supervision.

Scaling Without Losing Control

Multi-agent architectures, in which a lead agent spawns sub-agents for parallelizable tasks, amplify both the benefits and the risks of context design. A lead agent with clean, skill-based context keeps its own reasoning uncluttered while delegating specific workflows to sub-agents that have their own targeted context. The failure mode to avoid is adding sub-agents before the foundational workflows are reliable: complexity at scale doesn't fix problems; it amplifies them.
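One way to picture the split between lead and sub-agent context is the sketch below. It models only the context each agent would carry and makes no model calls; `AgentContext`, `delegate`, and the two example skills are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    role: str
    skill_stubs: list = field(default_factory=list)     # one-line descriptions
    loaded_skills: dict = field(default_factory=dict)   # full instructions

    def loaded_chars(self) -> int:
        """Size of the full skill text this agent actually carries."""
        return sum(len(body) for body in self.loaded_skills.values())


def delegate(skill_name: str, skill_bodies: dict) -> AgentContext:
    """Build a sub-agent context holding only the one skill it needs."""
    return AgentContext(
        role=f"sub-agent:{skill_name}",
        loaded_skills={skill_name: skill_bodies[skill_name]},
    )


# Hypothetical skills; the bodies stand in for detailed instructions.
skill_bodies = {
    "changelog": "Collect merged changes, group them by area, draft entries.",
    "triage": "Label new issues, ask for missing details, flag duplicates.",
}

# The lead agent keeps stubs only; it never loads a full skill body itself.
lead = AgentContext(
    role="lead",
    skill_stubs=[f"- {name}" for name in skill_bodies],
)
workers = [delegate(name, skill_bodies) for name in skill_bodies]

print(lead.role, lead.loaded_chars())
for worker in workers:
    print(worker.role, worker.loaded_chars())
```

Keeping the lead's `loaded_skills` empty is the point of the sketch: detailed instructions live only in the sub-agent that runs the workflow.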

The practitioners getting consistent results from AI agents are the ones who treat context architecture as seriously as they treat prompting. Skills are the mechanism; progressive disclosure is the principle.
