The anatomy of an AI agent

AI agents are starting to look like operating systems. The LLM is the CPU, the agent is the OS, and skills and MCPs are the applications — here's why that analogy holds up.

The more I work with Claude Code, the more I realize we’re not just using a chatbot. We’re using something that looks a lot like a computer. Not metaphorically — structurally. The architecture of a modern AI agent maps almost perfectly onto the anatomy of a traditional computer system.

The LLM is the CPU. It’s the raw processing power — takes in instructions, reasons about them, produces output. It doesn’t know what’s on your filesystem, doesn’t have opinions about your tech stack, and forgets everything the moment the session ends. Just like a CPU, it’s powerful but useless without an operating system to direct it.

The agent is the operating system. Claude Code, Cursor, Codex — these are the OS layer. They manage the context window (memory), schedule tool calls (process management), handle file I/O, and mediate between the user and the raw model. The agent decides when to read a file, which tool to call, and how to break a task into steps.

Skills, commands, and MCPs are the applications. They extend what the agent can do — domain-specific knowledge, user-defined workflows, external integrations. You install them, and the agent gains new capabilities. Just like installing apps on a phone.

Skills: domain expertise, loaded on demand

A skill is a folder containing a SKILL.md file and optional code or reference material. It encodes domain-specific knowledge — how to approach a particular type of task, what conventions to follow, which tools to use. Think of it as installing a specialized app on your OS.

The key advantage: skills load on demand. The agent doesn’t carry every skill in its context window at all times. When a user’s request matches a skill’s trigger criteria, the agent loads that skill’s instructions into context — just like an OS loading a program into memory when you launch it.
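
To make that concrete, here's a skeleton of a hypothetical SKILL.md for an imaginary release-notes skill. The contents are illustrative; the part that makes on-demand loading work is the frontmatter description, which the agent matches against incoming requests:

---
name: release-notes
description: Draft release notes from merged PRs. Use when the user asks to prepare, update, or review release notes.
---

# Release notes

1. Collect the PRs merged since the last release tag.
2. Group changes by area and summarize each in one line.
3. Follow the tone and format rules in reference/style.md.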

I’ve been building skills for my own workflows. In a previous post, I described a system for AI-assisted engineering that I packaged as a skill. It gives Claude Code a structured development lifecycle — /x-plan, /x-build, /x-verify, /x-docs — with context files that persist across sessions. Installing this skill turns a general-purpose coding agent into one that follows my specific engineering process.

More recently, I built a skill that equips Claude Code with image generation via the Gemini API. This skill includes a prompting guide, an API reference, and a Python script that calls Gemini’s Nano Banana Pro model. Once loaded, Claude can generate images directly from a conversation:

python3 scripts/generate_image.py "A minimalist network topology diagram" \
  -m pro -ar 16:9 -s 2K -o topology.png

The skill bundles the how (API reference, prompting techniques) with the what (the generation script) — so the agent doesn’t just have access to a tool, it knows how to use it well.
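
On disk, the whole skill is just a folder. Here's a simplified sketch of the layout (file names other than the script shown above are illustrative):

image-generation/
├── SKILL.md                   # Trigger description + core instructions
├── reference/
│   ├── prompting-guide.md     # How to write effective image prompts
│   └── api-reference.md       # Gemini API parameters and limits
└── scripts/
    └── generate_image.py      # The CLI invoked above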

Commands: shell scripts for the agent

If skills are applications, commands are shell scripts — predefined workflows you invoke by name. In Claude Code, a command is a markdown file in .claude/commands/ that gets expanded into a full prompt when you type its slash command.

The distinction matters. A skill provides knowledge the agent can draw on whenever relevant. A command is an explicit action — you trigger it, and the agent follows a specific sequence of steps. Skills are loaded automatically when matched; commands are invoked deliberately.

For this blog, I have commands like /add-blog-post and /add-microblog. When I run /add-blog-post with some raw notes, Claude Code gets a full prompt with the site’s frontmatter schema, tag taxonomy, voice guidelines, and formatting rules. It creates the .mdx file, structures the frontmatter, and runs the build to verify. The command encodes the entire procedure — I don’t re-explain it each session.
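
A trimmed-down sketch of what .claude/commands/add-blog-post.md might contain (the frontmatter field and step wording are illustrative; $ARGUMENTS is Claude Code's placeholder for whatever text follows the slash command):

---
description: Create a new blog post from raw notes
---

Create a blog post from these notes: $ARGUMENTS

1. Follow the site's frontmatter schema, tag taxonomy, and voice guidelines.
2. Write the post as an .mdx file in the content directory.
3. Run the build and fix any errors before finishing.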

This maps cleanly to how shell scripts work on a real OS. You don’t manually type out a 20-step deployment process every time — you write a script and run it. Commands do the same for agent workflows. They’re composable too: a command can reference skills, invoke MCP tools, and chain multiple steps together.

.claude/commands/
├── add-blog-post.md       # Create MDX post with correct frontmatter + build
├── add-microblog.md       # Add a short-form entry to the feed
└── migrate-post.md        # Fetch external post via Firecrawl + adapt

MCPs: remote access to external systems

Model Context Protocol (MCP) servers are the other half of the application layer. Where skills provide knowledge and workflows, MCPs provide connectivity. They give the agent access to external systems through a standardized client-server protocol.

Each MCP server can expose three types of capabilities (a minimal example follows the list):

  • Tools — executable functions (like API endpoints the agent can call)
  • Resources — data and context the agent can read
  • Prompts — reusable templates for common interactions
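
To make the three categories concrete, here's a minimal sketch of a server built with the official MCP Python SDK's FastMCP helper. The server and its capabilities are hypothetical; the shape is what matters, with one decorator per capability type:

from mcp.server.fastmcp import FastMCP

# Hypothetical server exposing one capability of each type.
mcp = FastMCP("notes")

@mcp.tool()
def search_notes(query: str) -> str:
    """Tool: an executable function the agent can call."""
    return f"Results for {query!r}..."

@mcp.resource("notes://recent")
def recent_notes() -> str:
    """Resource: data the agent can read into context."""
    return "- Draft: the OS analogy post"

@mcp.prompt()
def summarize(text: str) -> str:
    """Prompt: a reusable template for a common interaction."""
    return f"Summarize the following notes:\n\n{text}"

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport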

The architecture is straightforward. The agent (host) maintains connections to MCP servers, each exposing a set of tools. When the agent needs to interact with GitHub, it calls the GitHub MCP. When it needs documentation, it calls Context7. When it needs to scrape a URL, it calls Firecrawl.

Here’s what my current MCP setup looks like in practice:

MCP Server       What it provides
GitHub           Repository management, PR creation, issue tracking, code search
Supabase         Database queries, migrations, edge functions, project management
Context7         Up-to-date library documentation and code examples
Firecrawl        Web scraping, content extraction, site crawling
Mermaid Chart    Diagram validation and rendering

Each of these is like an app installed on the OS. The agent discovers what tools are available, decides which ones to invoke based on the task, and handles the responses. I don’t need to tell Claude Code “use the GitHub MCP to create a PR” — it figures out the right tool to call from the context, the same way an OS routes a file open to the correct application.
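
Installing one of these "apps" amounts to a config entry. In Claude Code, project-scoped servers live in a .mcp.json file at the repo root. The entry below is a sketch; the exact command, package, and environment variables depend on the server you install:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "..." }
    }
  }
}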

Why the OS analogy matters

This isn’t just a cute metaphor. It reflects a real architectural shift in how AI agents work.

Early AI coding tools were specialized — a code completion engine, a chat interface for questions, a separate tool for code review. Each was a single-purpose application. What’s happening now is that agents are becoming general-purpose platforms that load specialized capabilities on demand.

Researchers at Rutgers formalized this in their AIOS paper, treating LLMs as the kernel of an operating system with agents as applications running on top. IBM is building what they call an “Agentic Operating System” for enterprise orchestration. OpenAI turned ChatGPT into a platform at DevDay 2025, with an Apps SDK that lets third parties build applications inside conversations.

The pattern is clear: every major computing paradigm eventually needs an OS. Hardware got operating systems. Distributed services got Kubernetes. Cloud infrastructure got control planes. AI agents are following the same trajectory.

What this means in practice

If you’re using Claude Code (or any agentic tool), think about it in these terms:

  1. Invest in the OS layer. A CLAUDE.md project constitution and .ai/ context directory are your OS configuration. They tell the agent how to behave in your environment.

  2. Build skills for repeated workflows. If you find yourself explaining the same process to the agent session after session, that’s a skill waiting to be written. Package the knowledge, the references, and the scripts into a folder with a SKILL.md.

  3. Connect MCPs for the tools you actually use. Don’t install every MCP available — connect the ones that match your workflow. GitHub, your database, your documentation sources.

  4. Let the agent orchestrate. The power isn’t in any single skill or MCP. It’s in the agent’s ability to combine them — reading a GitHub issue, querying the database for context, generating code that follows your project conventions, and creating a PR. That’s the OS doing its job.

The competitive question in AI is shifting. It’s no longer just “who has the best model” — it’s who builds the best operating system around it.
