
A network MCP

An open-source, vendor-agnostic MCP service for AI-assisted network troubleshooting — starting read-only, with a roadmap toward gated write operations.

You have a BGP session that’s been flapping for the last 20 minutes. You want to check neighbor state across four routers, correlate interface counters, pull log entries, and cross-reference against the topology in NetBox. Today that means four SSH sessions, a dozen show commands, maybe a script if you’re organized. Tomorrow, you ask an LLM agent to do it — and it calls a Network MCP server that knows how to talk to your infrastructure.

That’s the project I’m building: an open-source MCP service that gives LLM agents a structured, safe way to interact with network devices. The first release focuses entirely on AI-assisted troubleshooting — read-only operations that help you diagnose problems faster. Write operations come later, behind approval gates.

This post is the roadmap.

Why MCP for network operations

MCP (Model Context Protocol) is an open standard from Anthropic that standardizes how LLM applications access external tools and data sources. It uses JSON-RPC 2.0 messaging between hosts, clients, and servers. The latest spec adds OAuth 2.1 authorization and scope-based consent — both directly relevant when you’re building something that touches production infrastructure.

The networking industry is already paying attention. An IETF Internet-Draft published in February 2026 by engineers from Huawei, Telefonica, Deutsche Telekom, and Orange formally explores MCP for network management. Their key framing: MCP doesn’t replace YANG data models or NETCONF/RESTCONF — it integrates with them.

Deep dive

A full analysis of the IETF draft constellation — architecture, companion specs, gap analysis, and what the drafts get right and wrong.

There are already vendor-specific implementations. Cisco’s Network MCP Docker Suite bundles seven MCP servers for IOS-XE, Meraki, Catalyst Center, and others. Juniper has a proof-of-concept MCP server that demonstrates BGP state checking and log analysis via natural language. NetBox Labs shipped a read-only MCP server that’s already in production use across enterprise teams.

What’s missing is a vendor-agnostic approach — one service that works across Cisco, Juniper, Arista, and legacy platforms, with YANG models defining the tool schemas under the hood.

The architecture

A single MCP server sits between LLM agents and your network infrastructure. The LLM calls MCP tools with typed parameters and gets back structured responses — it never touches YANG directly. Behind the scenes, YANG models define those tool schemas: what parameters each tool accepts, what fields come back in the response. The server handles translation from those schemas to whatever protocol the target device speaks.

The YANG model registry resolves models in a defined hierarchy:

  • OpenConfig (primary) — vendor-neutral models for interfaces, BGP, OSPF, VLANs, ACLs
  • IETF models (secondary) — RFC-defined models where OpenConfig has gaps
  • Vendor-native (fallback) — platform-specific models for the long tail of features
  • NAPALM/Netmiko shim (legacy) — CLI translation for devices without model-driven APIs

The LLM sees the same tool interface regardless of what’s underneath. Whether the target device speaks NETCONF or only understands show commands, the MCP tool parameters and response shapes are identical — because they’re all derived from the same YANG models.
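
To make the hierarchy concrete, here is a minimal sketch of how such a registry could resolve a feature to a model. `ModelRegistry`, the source names, and the model names are illustrative, not the project's actual code:

```python
# Hypothetical sketch of the resolution hierarchy described above.
# Source names and feature keys are illustrative stand-ins.

RESOLUTION_ORDER = ["openconfig", "ietf", "vendor-native", "cli-shim"]

class ModelRegistry:
    def __init__(self):
        # feature -> {source: model name}
        self._models: dict[str, dict[str, str]] = {}

    def register(self, feature: str, source: str, model: str) -> None:
        self._models.setdefault(feature, {})[source] = model

    def resolve(self, feature: str) -> tuple[str, str]:
        """Return (source, model) for the highest-priority source covering the feature."""
        candidates = self._models.get(feature, {})
        for source in RESOLUTION_ORDER:
            if source in candidates:
                return source, candidates[source]
        raise LookupError(f"no model covers feature {feature!r}")

registry = ModelRegistry()
registry.register("bgp", "openconfig", "openconfig-bgp")
registry.register("bgp", "vendor-native", "Cisco-IOS-XE-bgp-oper")
registry.register("srv6", "vendor-native", "Cisco-IOS-XE-srv6-oper")

print(registry.resolve("bgp"))   # OpenConfig wins when it covers the feature
print(registry.resolve("srv6"))  # falls through to the vendor-native model
```

The tool schema generated for `get_bgp_neighbors` is the same either way; only the backend model changes.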

Phase 1: read-only troubleshooting

The first release is entirely read-only. No configuration changes, no approval flows needed. The MCP server exposes tools that query device state and return structured data.

Here’s what a read tool looks like with FastMCP:

from fastmcp import FastMCP

mcp = FastMCP("network-mcp")

@mcp.tool()
async def get_bgp_neighbors(device: str, vrf: str = "default") -> dict:
    """Get BGP neighbor table for a device.

    Returns neighbor address, state, prefixes received,
    uptime, and last state change for each peer.
    """
    # device_connection, BGP_NEIGHBOR_FILTER, and parse_bgp_neighbors are
    # project helpers: connection pooling, the YANG-modeled subtree filter,
    # and response normalization, respectively.
    async with device_connection(device) as conn:
        result = await conn.get(filter=BGP_NEIGHBOR_FILTER)
        return parse_bgp_neighbors(result, vrf=vrf)

The tools in Phase 1 cover the operations you’d run during a troubleshooting session:

  • Device state — get_facts, get_interfaces, get_interface_counters
  • Routing — get_bgp_neighbors, get_bgp_routes, get_routes, get_ospf_neighbors
  • L2 — get_vlans, get_mac_table, get_lldp_neighbors
  • Diagnostics — get_logs, get_environment (CPU, memory, temperature)
  • Inventory — get_config (running/startup), get_ntp, get_snmp_information

These map closely to NAPALM’s getter methods, which already provide a unified Python API across Cisco IOS, IOS-XR, NX-OS, Junos, EOS, and others. For devices that support NETCONF, the server uses ncclient directly with YANG-modeled filters. For legacy devices, NAPALM handles the CLI translation.
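
The per-device dispatch can be sketched like this. The `transports` capability field and the returned path labels are assumptions for illustration, not the project's real API:

```python
# Illustrative dispatch between the model-driven and CLI paths.
# A real implementation would return a connection object, not a label.

def dispatch_getter(device: dict, operation: str) -> str:
    """Pick the retrieval path for a device based on its capabilities."""
    if "netconf" in device.get("transports", []):
        # Model-driven path: ncclient <get> with a YANG-modeled subtree filter
        return f"netconf:{operation}"
    # Legacy path: the matching NAPALM getter over SSH/CLI
    return f"napalm:get_{operation}"

print(dispatch_getter({"transports": ["netconf", "ssh"]}, "bgp_neighbors"))
print(dispatch_getter({"transports": ["ssh"]}, "bgp_neighbors"))
```

Either way, the caller gets the same normalized response shape back.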

Phase 1 is safe by design. Read operations don’t change state. There’s no approval engine to build, no risk classification to configure. You get value immediately — an LLM that can pull correlated data from across your network in seconds.

The real value is correlation. A single LLM invocation can check BGP state on four routers, compare interface error counters, pull relevant log entries, and cross-reference device inventory — then synthesize a coherent diagnosis. That’s the workflow that takes 15 minutes manually and 15 seconds through MCP.
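
That fan-out is a natural fit for async tooling. A minimal sketch, with `get_bgp_neighbors` stubbed out in place of the real MCP tool:

```python
import asyncio

# Sketch of fan-out correlation: query several devices concurrently, then merge.
# The stub below stands in for the real tool's NETCONF/CLI round trip.

async def get_bgp_neighbors(device: str) -> dict:
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"device": device, "peers": {"10.0.0.1": "Established"}}

async def correlate(devices: list[str]) -> list[dict]:
    # One agent turn can hit all four routers at once
    return list(await asyncio.gather(*(get_bgp_neighbors(d) for d in devices)))

results = asyncio.run(correlate(["r1", "r2", "r3", "r4"]))
print(len(results))  # 4
```

The per-device latency overlaps instead of stacking, which is most of where the 15-minutes-to-15-seconds gap comes from.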

A known pitfall with LLM query scope

Juniper’s MCP demo surfaced a real issue: when asked to check BGP status for VRF sessions, the LLM returned all BGP sessions instead of filtering to the specified VRF. This is why structured tool parameters matter — get_bgp_neighbors(device, vrf="CUSTOMER-A") is unambiguous in a way that natural language isn’t. The tool schema constrains the LLM’s request, not the LLM’s interpretation.
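
One concrete guardrail, sketched here with an illustrative `KNOWN_VRFS` inventory: validate the parameter against device state before the query runs, so an out-of-scope request fails loudly instead of silently widening.

```python
# Hypothetical per-device VRF inventory; a real server would pull this
# from the device or from NetBox.
KNOWN_VRFS = {"default", "CUSTOMER-A", "CUSTOMER-B"}

def validate_vrf(vrf: str) -> str:
    """Reject VRF names the device doesn't have rather than returning everything."""
    if vrf not in KNOWN_VRFS:
        raise ValueError(f"unknown VRF {vrf!r}; choose one of {sorted(KNOWN_VRFS)}")
    return vrf

print(validate_vrf("CUSTOMER-A"))  # CUSTOMER-A
```

A validation error flows back to the LLM as a structured tool error it can correct, rather than an over-broad result it might misreport.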

Phase 2: write operations with approval gates

Once the read-only foundation is stable, Phase 2 adds configuration changes behind mandatory approval flows. The MCP server exposes two additional tool categories:

Write tools create a pending intent. The LLM submits structured parameters through the MCP tool — the server translates those into a YANG-modeled configuration delta for human review. Every write goes through an approval engine — there is no bypass.

Dry-run tools generate a diff without side effects. The LLM can ask “what would this change look like?” and get back the rendered configuration delta. No approval needed, nothing touches the device.
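
A dry-run can be as simple as a unified diff between the running and candidate configuration. A sketch using Python's standard difflib (the config snippets are made up):

```python
import difflib

# Illustrative dry-run: render the would-be change without touching the device.
running = ["interface Gi0/0/1", " description uplink", " no shutdown"]
candidate = ["interface Gi0/0/1", " description core-uplink", " no shutdown"]

diff = list(difflib.unified_diff(running, candidate,
                                 "running", "candidate", lineterm=""))
print("\n".join(diff))
```

The same rendered delta feeds both the dry-run response and the approval queue in Phase 3.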

The approval engine classifies each intent by risk:

  • Low (auto-approve) — description changes, SNMP community rotations
  • Medium (single approval) — static routes, ACL entries, enabling an interface
  • High (approval + confirmation) — BGP peer changes, OSPF area modifications
  • Critical (two-person sign-off) — core routing policy, MPLS LSP changes, multi-failure-domain impact
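
Sketched as code, a classifier could key off the YANG path of the proposed change. The path prefixes and the default tier here are illustrative assumptions:

```python
# Hypothetical path-prefix rules mirroring the tiers above.
RISK_RULES = [
    ("/interfaces/interface/config/description", "low"),
    ("/acl/", "medium"),
    ("/routing/static/", "medium"),
    ("/bgp/neighbors/", "high"),
    ("/ospf/areas/", "high"),
    ("/routing-policy/", "critical"),
    ("/mpls/lsps/", "critical"),
]

def classify(change_path: str) -> str:
    """Map a configuration path to a risk tier; first matching rule wins."""
    for prefix, risk in RISK_RULES:
        if change_path.startswith(prefix):
            return risk
    return "high"  # unknown paths default to requiring approval

print(classify("/interfaces/interface/config/description"))  # low
print(classify("/bgp/neighbors/neighbor[10.0.0.2]"))         # high
```

Failing safe matters here: an unrecognized path gets a restrictive tier, never auto-approval.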

The LLM doesn’t block while waiting for approval. It receives an intent ID and a PENDING status, then the current invocation ends. On the next call — triggered by the user, a scheduled check, or a webhook callback — the LLM queries the intent status and picks up where it left off. This follows the UniFi MCP pattern of requiring explicit confirmation for mutations, extended with structured risk classification.
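
The lifecycle above reduces to a small state machine. A minimal sketch (`IntentStore` and its states are illustrative, not the project's real schema):

```python
import uuid

class IntentStore:
    """Hypothetical intent store backing the non-blocking approval flow."""

    def __init__(self):
        self._intents: dict[str, str] = {}

    def submit(self, description: str) -> str:
        """Write tool path: assign an ID, mark the intent PENDING, return the ID."""
        intent_id = str(uuid.uuid4())
        self._intents[intent_id] = "PENDING"
        return intent_id

    def approve(self, intent_id: str) -> None:
        # Called by the human approval UI, never by the LLM
        self._intents[intent_id] = "APPROVED"

    def status(self, intent_id: str) -> str:
        """What the LLM polls on its next invocation."""
        return self._intents[intent_id]

store = IntentStore()
iid = store.submit("set description on Gi0/0/1")
print(store.status(iid))  # PENDING
store.approve(iid)
print(store.status(iid))  # APPROVED
```

The LLM's side of this is stateless between calls: everything it needs to resume is the intent ID.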

Phase 3: operator UI and integrations

The final phase adds the human control plane:

  • Approval queue — pending intents sorted by risk and age, with rendered diffs and LLM reasoning
  • Audit log — every query, every proposed change, every approval decision, every device commit
  • Active sessions — which LLM agents are connected, what they’re doing, with session revocation
  • Notifications — Slack/Teams for approvals, email for audit trail, PagerDuty for critical escalation

Tech stack

The project uses FastAPI with FastMCP for the MCP server — FastMCP supports FastMCP.from_fastapi() integration, so the MCP endpoint and the REST API live in the same process. ncclient handles NETCONF. NAPALM provides the multi-vendor abstraction for both model-driven and CLI-based devices. pyang parses and validates YANG models. PostgreSQL stores approval records and audit logs. The operator UI is a React/Vite SPA served from the same origin.

FastMCP’s built-in OAuth2 support with scope-based access control (require_scopes("read"), require_scopes("write")) maps directly onto the read/write separation.

Why open source

The vendor-specific MCP servers already exist. What doesn’t exist is a vendor-neutral layer that uses YANG models to generate consistent tool schemas across platforms, with a safety model that scales from read-only troubleshooting to gated write operations.

This needs to be open source for two reasons. First, vendor neutrality only works if the community owns the abstraction — otherwise it drifts toward whoever funds it. Second, the YANG model coverage across vendors is uneven. OpenConfig models work well for common constructs, but the edge cases need contributions from engineers who actually run those platforms.

The IETF draft recommends categorizing MCP servers by network management function rather than building monolithic servers. That’s the right direction — a composable architecture where you deploy the pieces your network needs, not a single binary that tries to cover everything.

What’s next

Phase 1 development starts now. The initial scope is read-only tools for Cisco IOS-XE and Junos, with NAPALM as the fallback for other platforms. If you’re running network infrastructure and want to help shape this — especially if you have vendor-specific edge cases or opinions about which read operations matter most for troubleshooting — the repo will be up shortly.

The goal: an LLM that pulls correlated data from across your network in seconds, so you spend your time on the diagnosis instead of the data gathering.
