
LLM as CPU, agent as OS

AI agents are starting to look like operating systems. The LLM is the CPU, the agent is the OS, and skills and MCPs are the applications — here's why that analogy holds up.


The more I work with Claude Code, the more I realize we’re not using a chatbot. We’re using something that looks a lot like a computer. Not metaphorically — structurally. The architecture of a modern AI agent maps almost perfectly onto the anatomy of a traditional computer system.

Andrej Karpathy was one of the first to frame it this way: the LLM is the CPU of a new computational stack. The context window behaves like RAM, long-term stores resemble disk, agents act as processes, and tools are invoked as system calls. Once you see it, you can’t unsee it. Every piece of the agent stack has a direct analog in classical computing.

The stack

The LLM is the CPU. It’s the raw processing power — takes in instructions, reasons about them, produces output. It doesn’t know what’s on your filesystem, doesn’t have opinions about your tech stack, and forgets everything the moment the session ends. A CPU is capable but useless without an operating system to direct it. Same applies here.

The agent is the operating system. Claude Code, Cursor, Codex — these are the OS layer. They handle the tool call loop, manage the context window (memory), enforce permissions, and mediate between you and the raw model. The LLM decides what to do; the agent makes it happen.

Skills, commands, and MCPs are the applications. They extend what the agent can do — domain-specific knowledge, user-defined workflows, external integrations. You install them, and the agent gains new capabilities. Same as installing apps on a phone.

The Rutgers AIOS paper (published at COLM 2025) formalized this exact architecture: three layers — Application, Kernel, and Hardware — mirroring traditional OS design. Their Kernel Layer includes a Scheduler, Context Manager, Memory Manager, Storage Manager, Tool Manager, and LLM Core(s), which they describe as “akin to CPU cores.” Agent queries get decomposed into sub-execution units (AIOS syscalls) for parallelism, and the Context Manager provides snapshot and restoration for LLM context switching — directly analogous to CPU context switching.

The parallels go deeper than architecture diagrams. Consider memory. vLLM adopted virtual memory techniques from OS design, paging the attention KV cache the same way an OS pages physical memory. System prompts exist once in physical memory but are referenced by thousands of concurrent agents — mirroring Unix fork() with copy-on-write. MemGPT implements a three-tier hierarchy: core memory (pinned), recall memory (summarized), and archival memory (paged in on demand). This isn’t a metaphor anymore. It’s convergent design.
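As a toy illustration of that convergence, the MemGPT-style hierarchy can be sketched in a few lines of Python. The class and method names here are invented for illustration, not MemGPT's actual API:

```python
# Minimal sketch of a MemGPT-style three-tier memory hierarchy.
# Class and method names are illustrative, not MemGPT's real interface.

class TieredMemory:
    def __init__(self, core_limit=4):
        self.core = []      # pinned: always in the context window
        self.recall = []    # summarized: older history, compressed
        self.archive = {}   # archival: full copies, paged in on demand
        self.core_limit = core_limit

    def remember(self, key, text):
        """New facts enter core; overflow is demoted to recall + archive."""
        self.core.append((key, text))
        while len(self.core) > self.core_limit:
            old_key, old_text = self.core.pop(0)
            self.recall.append((old_key, old_text[:40]))  # crude "summary"
            self.archive[old_key] = old_text              # full copy on "disk"

    def build_context(self, query_keys=()):
        """Assemble the prompt: core + recall summaries + paged-in archives."""
        parts = [t for _, t in self.core]
        parts += [s for _, s in self.recall]
        parts += [self.archive[k] for k in query_keys if k in self.archive]
        return "\n".join(parts)
```

The point of the sketch is the shape, not the details: hot data stays resident, cold data is compressed, and the full record only re-enters the context window when a query demands it — the same tradeoff an OS makes between RAM and disk.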

Skills: domain expertise, loaded on demand

A skill is a folder containing a SKILL.md file and optional code or reference material. It encodes domain-specific knowledge — how to approach a particular type of task, what conventions to follow, which tools to use. Think of it as installing a specialized app on your OS.

The key advantage: skills load on demand. The agent doesn’t carry every skill in its context window at all times. When your request matches a skill’s trigger criteria, the agent loads that skill’s instructions into context — like an OS loading a program into memory when you launch it.

This follows what the Skills specification calls progressive disclosure, a three-level architecture:

  1. Level 1: YAML frontmatter (name + description) — loaded at startup, costs a few dozen tokens.
  2. Level 2: Full SKILL.md body — loaded when the skill is triggered.
  3. Level 3: Subdirectory resources — loaded on demand during execution.
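A loader for the three levels above might look like the following sketch. The parsing is deliberately naive and the function names are my own assumptions, not the actual Skills tooling:

```python
# Sketch of progressive disclosure: Level 1 (frontmatter) is cheap enough
# to scan at startup; Level 2 (the body) is read only when the skill fires.
# Hypothetical helpers, not the real Skills implementation.

def read_frontmatter(skill_md: str) -> dict:
    """Parse the name/description block between the leading '---' fences."""
    lines = skill_md.splitlines()
    assert lines[0] == "---", "SKILL.md must start with YAML frontmatter"
    end = lines[1:].index("---") + 1
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def skill_body(skill_md: str) -> str:
    """Everything after the closing '---' — loaded only when triggered."""
    _, _, rest = skill_md.partition("---\n")
    _, _, body = rest.partition("---\n")
    return body.strip()
```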

This matters because context is expensive. Anthropic’s engineering team demonstrated that presenting MCP servers as code APIs rather than loading all tool definitions upfront reduced token usage from 150,000 tokens to 2,000 — a 98.7% reduction. The same principle applies to skills. You don’t load everything; you load what you need, when you need it.
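The arithmetic behind that figure:

```python
# 150,000 tokens of upfront tool definitions vs. 2,000 via code APIs.
before, after = 150_000, 2_000
reduction = (before - after) / before
print(f"{reduction:.1%}")  # → 98.7%
```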

Skills modify the agent’s preparation, not its output directly. This is the key distinction between a skill and a function call. A function call executes and returns a result. A skill changes how the agent approaches the problem.

I’ve been building skills for my own workflows. In a previous post, I described a system for AI-assisted engineering that I packaged as a skill. It gives Claude Code a structured development lifecycle — /x-plan, /x-build, /x-verify, /x-docs — with context files that persist across sessions. Installing this skill turns a general-purpose coding agent into one that follows my specific engineering process.

More recently, I built a skill that equips Claude Code with image generation via the Gemini API. This skill includes a prompting guide, an API reference, and a Python script that calls Gemini’s Nano Banana Pro model. Once loaded, Claude can generate images directly from a conversation:

python3 scripts/generate_image.py "A minimalist network topology diagram" \
  -m pro -ar 16:9 -s 2K -o topology.png

The skill bundles the how (API reference, prompting techniques) with the what (the generation script) — so the agent doesn’t just have access to a tool, it knows how to use it well.

Commands: shell scripts for the agent

If skills are applications, commands are shell scripts — predefined workflows you invoke by name. In Claude Code, a command is a markdown file in .claude/commands/ that gets expanded into a full prompt when you type its slash command.

The distinction matters. A skill provides knowledge the agent can draw on whenever relevant. A command is an explicit action — you trigger it, and the agent follows a specific sequence of steps. Skills are loaded automatically when matched; commands are invoked deliberately.

For this blog, I have commands like /add-blog-post and /add-microblog. When I run /add-blog-post with some raw notes, Claude Code gets a full prompt with the site’s frontmatter schema, tag taxonomy, voice guidelines, and formatting rules. It creates the .mdx file, structures the frontmatter, and runs the build to verify. The command encodes the entire procedure — I don’t re-explain it each session.

This maps cleanly to how shell scripts work on a real OS. You don’t manually type out a 20-step deployment process every time — you write a script and run it. Commands do the same for agent workflows. They’re composable too: a command can reference skills, invoke MCP tools, and chain multiple steps together.

.claude/commands/
├── add-blog-post.md    # Create MDX post with correct frontmatter + build
└── add-microblog.md    # Add a short-form entry to the feed
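The expansion step itself is simple to sketch. This is not Claude Code's actual implementation — just a minimal illustration of a slash command expanding into its stored prompt plus arguments, the way a shell expands a script name into its contents:

```python
# Sketch: expanding a slash command into a full prompt.
# Simplified; not Claude Code's actual resolution logic.

from pathlib import Path

def expand_command(user_input: str, commands_dir: str = ".claude/commands") -> str:
    """Turn '/add-blog-post my raw notes' into the stored prompt plus args."""
    if not user_input.startswith("/"):
        return user_input                    # plain message, no expansion
    name, _, args = user_input[1:].partition(" ")
    template = (Path(commands_dir) / f"{name}.md").read_text()
    # The arguments trail the expanded prompt, like "$@" in a shell script.
    return f"{template}\n\nArguments: {args}" if args else template
```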

MCPs: remote access to external systems

Model Context Protocol (MCP) servers are the other half of the application layer. Where skills provide knowledge and workflows, MCPs provide connectivity. They give the agent access to external systems through a standardized client-server protocol using JSON-RPC 2.0.

Each MCP server exposes three types of capabilities:

  • Tools — executable functions (like API endpoints the agent can call)
  • Resources — data and context the agent can read
  • Prompts — reusable templates for common interactions
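On the wire, a tool invocation is an ordinary JSON-RPC 2.0 request using the protocol's `tools/call` method. A minimal sketch, with a made-up tool name and arguments:

```python
# Sketch of the wire format: MCP is JSON-RPC 2.0, so a tool invocation is
# a plain request object. The tool name and arguments below are invented.

import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request as an MCP client would."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# e.g. asking a hypothetical GitHub server to open a pull request:
req = make_tool_call(1, "create_pull_request", {
    "repo": "me/blog",
    "title": "Add post on agent operating systems",
})
```

The standardization is the point: any client that speaks this envelope can drive any server, which is why the ecosystem could grow across vendors so quickly.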

The growth has been fast. Anthropic launched MCP in November 2024. By April 2025, server downloads had grown from roughly 100,000 to over 8 million, with more than 5,800 servers and 300+ clients in the ecosystem. In December 2025, the protocol was donated to the Linux Foundation’s Agentic AI Foundation, and adoption spread to OpenAI, Google, Microsoft, JetBrains, Replit, Sourcegraph, Stripe, and Cloudflare.

Here’s what my current MCP setup looks like in practice:

| MCP Server | What it provides |
| --- | --- |
| GitHub | Repository management, PR creation, issue tracking, code search |
| Supabase | Database queries, migrations, edge functions, project management |
| Context7 | Up-to-date library documentation and code examples |
| Firecrawl | Web scraping, content extraction, site crawling |

Each of these is like an app installed on the OS. The agent discovers what tools are available, decides which ones to invoke based on the task, and handles the responses. I don’t need to tell Claude Code “use the GitHub MCP to create a PR” — it figures out the right tool to call from the context, the same way an OS routes a file open to the correct application.

Skills vs MCP: what goes where?

A common point of confusion. The Xu and Yan survey (Zhejiang University, February 2026) puts it clearly: skills and MCP are “not competing standards but orthogonal layers of an emerging agentic stack.” Skills define what to do — procedural knowledge. MCP defines how to connect — tool connectivity. A skill might instruct the agent to “create a PR with a conventional commit message and request review from the team lead.” The MCP server provides the GitHub API call that makes it happen. Both became open standards in December 2025.

Why the OS analogy matters

This isn’t a cute metaphor. It reflects a real architectural shift.

Early AI coding tools were specialized — a code completion engine, a chat interface for questions, a separate tool for code review. Each was a single-purpose application. What’s happening now is that agents are becoming general-purpose platforms that load specialized capabilities on demand.

The research community has converged on this framing. The AIOS paper at Rutgers treats LLMs as the kernel with agents as applications. The AgentOS paper from Fukuoka Institute of Technology and NUS (2026) goes further, proposing a Semantic Memory Management Unit, a Reasoning Interrupt Handler, and Cognitive Synchronization Pulses — all borrowed directly from classical OS abstractions. Tim O’Reilly described the same pattern in December 2025: models as processors, agent runtimes as the OS, skills as applications.

And the industry is building toward it. Microsoft Research’s UFO2 is a multiagent AgentOS for Windows desktops, with a HostAgent for task decomposition and AppAgents that use native APIs. It evolved into UFO3 with multi-device orchestration, MCP integration, and formally verified correctness. Microsoft also announced Agent 365 at Ignite 2025, calling it “the operating system for AI agents.” OpenAI turned ChatGPT into a platform at DevDay 2025, with an Apps SDK that lets third parties build applications inside conversations.

The pattern is consistent: every major computing paradigm eventually needs an OS. Hardware got operating systems. Servers got hypervisors. Containers got Kubernetes. AI agents are next.

What this means in practice

If you’re using Claude Code (or any agentic tool), think about it in these terms:

  1. Invest in the OS layer. A CLAUDE.md project constitution and .ai/ context directory are your OS configuration. They tell the agent how to behave in your environment — your conventions, your architecture, your constraints. Without them, every session starts from scratch.

  2. Build skills for repeated workflows. If you find yourself explaining the same process to the agent session after session, that’s a skill waiting to be written. Package the knowledge, the references, and the scripts into a folder with a SKILL.md. Use progressive disclosure — keep the frontmatter light, put the detail in the body, and reference resources from subdirectories.

  3. Connect MCPs for the tools you actually use. Don’t install every MCP available — connect the ones that match your workflow. GitHub, your database, your documentation sources. Treat them like device drivers: they abstract external services through a unified tool interface.

  4. Let the agent orchestrate. The power isn’t in any single skill or MCP. It’s in the agent’s ability to combine them — reading a GitHub issue, querying the database for context, generating code that follows your project conventions, and creating a PR. That’s the OS doing its job.

The competitive question in AI is shifting. It’s no longer “who has the best model” — it’s who builds the best operating system around it.
