Replacing Legacy ACLs with Infrastructure as Code
Moving from manual ACL management to declarative infrastructure as code using Terraform and Python, eliminating the risk of fat-finger mistakes at 2am.
The Problem with Manual ACL Management
If you have spent any time managing network access control lists by hand, you know the pain. A typical enterprise Cisco router might have hundreds of ACL entries spread across multiple interfaces. Each change requires an SSH session, careful typing, and a prayer that you did not accidentally lock yourself out.
Consider a typical day: a ticket comes in asking to allow TCP port 443 from the new developer subnet to the staging environment. You SSH into the router and type:
```text
router# configure terminal
router(config)# ip access-list extended STAGING-INBOUND
router(config-ext-nacl)# 150 permit tcp 10.50.0.0 0.0.3.255 host 10.100.5.20 eq 443
router(config-ext-nacl)# end
router# write memory
```
Simple enough. But multiply this by dozens of routers, factor in change windows, peer review requirements, and the occasional emergency rollback, and you have a process that does not scale.
Why Infrastructure as Code
Infrastructure as Code (IaC) treats network configuration the same way software engineers treat application code: version-controlled, peer-reviewed, tested, and automatically deployed.
The benefits are immediate:
Every change is tracked in Git. Every deployment is repeatable. Every rollback is a `git revert` away.
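Here is what that same ACL change looks like in Terraform using a Cisco IOS provider. The resource type and attribute names below are illustrative, so check them against the schema of the provider you actually use: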
```hcl
resource "cisco_ios_access_list_extended" "staging_inbound" {
  name = "STAGING-INBOUND"

  entry {
    sequence         = 150
    action           = "permit"
    protocol         = "tcp"
    source           = "10.50.0.0 0.0.3.255"
    destination      = "host 10.100.5.20"
    destination_port = "eq 443"
    remark           = "TICKET-4521: Allow dev subnet to staging HTTPS"
  }
}
```
Building the Pipeline
Step 1: Extract Current ACLs
First, we need to capture the current state. A Python script using Netmiko does the heavy lifting:
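```python
import json
import os
from typing import Union

from netmiko import ConnectHandler

# Netmiko has no "password_from_vault" option, so fetch the secret first.
# Here it is read from an environment variable (NETOPS_PASSWORD is a placeholder
# name) that your vault or CI secret store populates.
device = {
    "device_type": "cisco_ios",
    "host": "core-rtr-01.lab.internal",
    "username": "netops",
    "password": os.environ["NETOPS_PASSWORD"],
}


def extract_acls(device_params: dict) -> Union[list, str]:
    """Connect to a device and return its IP ACLs parsed with TextFSM.

    The result is a list of dicts, or the raw CLI text if no TextFSM
    template matched the output.
    """
    # The context manager disconnects even if the command raises.
    with ConnectHandler(**device_params) as conn:
        return conn.send_command("show ip access-lists", use_textfsm=True)


acls = extract_acls(device)
with open("baseline_acls.json", "w") as f:
    json.dump(acls, f, indent=2)
```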
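With `use_textfsm=True`, the parsed result is typically a flat list of rows, one per access-list entry. A small helper can regroup those rows by ACL name so the baseline lines up with the one-file-per-ACL layout used in the next step. This is a minimal sketch. The keys `acl_name` and `line_num` are assumptions, since the field names depend on the installed ntc-templates version, so check them against your own `baseline_acls.json` first.

```python
from collections import defaultdict


def group_by_acl(rows, name_key="acl_name", seq_key="line_num"):
    """Group flat TextFSM rows into {acl_name: [rows sorted by sequence]}.

    name_key and seq_key are assumed field names; adjust them to match the
    keys your TextFSM template actually emits.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[name_key]].append(row)
    for entries in grouped.values():
        # Sequence numbers arrive as strings; sort numerically, blanks last.
        entries.sort(key=lambda r: int(r[seq_key]) if r.get(seq_key) else float("inf"))
    return dict(grouped)


# Hand-written rows shaped like parsed output (illustrative data, not real device output).
rows = [
    {"acl_name": "STAGING-INBOUND", "line_num": "150", "action": "permit"},
    {"acl_name": "MGMT", "line_num": "10", "action": "permit"},
    {"acl_name": "STAGING-INBOUND", "line_num": "100", "action": "permit"},
]
baseline = group_by_acl(rows)
print([r["line_num"] for r in baseline["STAGING-INBOUND"]])  # ['100', '150']
```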
Step 2: Define the Desired State
We store ACL definitions as structured YAML files:
```yaml
# acls/staging-inbound.yaml
name: STAGING-INBOUND
interface: GigabitEthernet0/1
direction: in
entries:
  - seq: 100
    action: permit
    protocol: tcp
    source: 10.10.0.0/22
    destination: 10.100.5.0/24
    port: 443
    remark: "Production web access"
  - seq: 150
    action: permit
    protocol: tcp
    source: 10.50.0.0/22
    destination: host 10.100.5.20
    port: 443
    remark: "TICKET-4521: Dev subnet staging access"
  - seq: 999
    action: deny
    protocol: ip
    source: any
    destination: any
    log: true
    remark: "Implicit deny with logging"
```
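Because the definitions are plain data, they can be checked before anything touches a device. The sketch below shows the kind of checks a linter such as `scripts/validate_acls.py` might run. The rules (unique sequence numbers, known actions and protocols, parseable addresses, ports only on TCP/UDP) and the function names are illustrative assumptions, not the actual script.

```python
import ipaddress

VALID_ACTIONS = {"permit", "deny"}
VALID_PROTOCOLS = {"ip", "tcp", "udp", "icmp"}


def check_address(value):
    """Accept 'any', 'host A.B.C.D', or a CIDR prefix such as 10.50.0.0/22.

    Raises ValueError (ipaddress errors subclass it) for anything else.
    """
    value = str(value)
    if value == "any":
        return
    if value.startswith("host "):
        ipaddress.IPv4Address(value[len("host "):])
    else:
        ipaddress.IPv4Network(value)  # strict mode rejects prefixes with host bits set


def validate_acl(acl):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    seen = set()
    for entry in acl.get("entries", []):
        seq = entry.get("seq")
        if seq in seen:
            errors.append(f"duplicate seq {seq}")
        seen.add(seq)
        if entry.get("action") not in VALID_ACTIONS:
            errors.append(f"seq {seq}: bad action {entry.get('action')!r}")
        if entry.get("protocol") not in VALID_PROTOCOLS:
            errors.append(f"seq {seq}: bad protocol {entry.get('protocol')!r}")
        if "port" in entry and entry.get("protocol") not in ("tcp", "udp"):
            errors.append(f"seq {seq}: port requires tcp or udp")
        for field in ("source", "destination"):
            try:
                check_address(entry.get(field))
            except ValueError as exc:
                errors.append(f"seq {seq}: bad {field}: {exc}")
    return errors
```

In CI, a script like this would run over every file in `acls/` after `yaml.safe_load` and exit non-zero if any file reports problems, which fails the `validate` job before the lab dry run.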
Step 3: Generate and Apply Configurations
A Jinja2 template renders the YAML into Cisco IOS commands:
```python
from jinja2 import Environment, FileSystemLoader
import yaml

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("acl_extended.j2")

with open("acls/staging-inbound.yaml") as f:
    acl_data = yaml.safe_load(f)

config = template.render(acl=acl_data)
print(config)
```
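The contents of `templates/acl_extended.j2` are not shown, and the YAML stores CIDR prefixes while IOS expects wildcard masks, so that conversion has to happen somewhere. One option is a custom Jinja2 filter. The sketch below is an assumption about what the template could look like, inlined with `DictLoader` so it runs on its own. The filter name `ios_addr` is hypothetical. It relies on the fact that an IOS wildcard mask is the bitwise inverse of the netmask, which Python's `ipaddress` module exposes as `hostmask`.

```python
import ipaddress

from jinja2 import DictLoader, Environment

# Hypothetical stand-in for templates/acl_extended.j2.
ACL_TEMPLATE = """\
ip access-list extended {{ acl.name }}
{% for e in acl.entries %}
{% if e.remark %}
remark {{ e.remark }}
{% endif %}
{{ e.seq }} {{ e.action }} {{ e.protocol }} {{ e.source | ios_addr }} {{ e.destination | ios_addr }}{{ ' eq ' ~ e.port if e.port else '' }}{{ ' log' if e.log else '' }}
{% endfor %}
"""


def ios_addr(value):
    """Render an ACL address in IOS syntax.

    'any' and 'host A.B.C.D' pass through; a CIDR prefix becomes
    'network wildcard' (wildcard = inverse of the netmask), and a /32
    becomes 'host A.B.C.D'.
    """
    value = str(value)
    if value == "any" or value.startswith("host "):
        return value
    net = ipaddress.IPv4Network(value)
    if net.prefixlen == 32:
        return f"host {net.network_address}"
    return f"{net.network_address} {net.hostmask}"


env = Environment(
    loader=DictLoader({"acl_extended.j2": ACL_TEMPLATE}),
    trim_blocks=True,    # drop the newline that follows a {% ... %} tag
    lstrip_blocks=True,  # drop whitespace before a {% ... %} tag on its line
)
env.filters["ios_addr"] = ios_addr

# Same data as acls/staging-inbound.yaml, inlined for a self-contained example.
acl_data = {
    "name": "STAGING-INBOUND",
    "entries": [
        {"seq": 100, "action": "permit", "protocol": "tcp", "source": "10.10.0.0/22",
         "destination": "10.100.5.0/24", "port": 443, "remark": "Production web access"},
        {"seq": 150, "action": "permit", "protocol": "tcp", "source": "10.50.0.0/22",
         "destination": "host 10.100.5.20", "port": 443,
         "remark": "TICKET-4521: Dev subnet staging access"},
        {"seq": 999, "action": "deny", "protocol": "ip", "source": "any",
         "destination": "any", "log": True, "remark": "Implicit deny with logging"},
    ],
}
config = env.get_template("acl_extended.j2").render(acl=acl_data)
print(config)
```

To use the filter with the `FileSystemLoader` environment shown earlier, register it with `env.filters["ios_addr"] = ios_addr` before calling `get_template`. Keeping the CIDR-to-wildcard arithmetic in Python rather than in template logic means it can be unit-tested on its own.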
The rendered output:
```text
ip access-list extended STAGING-INBOUND
remark Production web access
100 permit tcp 10.10.0.0 0.0.3.255 10.100.5.0 0.0.0.255 eq 443
remark TICKET-4521: Dev subnet staging access
150 permit tcp 10.50.0.0 0.0.3.255 host 10.100.5.20 eq 443
remark Implicit deny with logging
999 deny ip any any log
```
Step 4: CI/CD Integration
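We wire this into a GitHub Actions pipeline that lints the ACL definitions, performs a dry run against the lab, waits for approval, then deploys to production. The approval gate comes from required reviewers configured on the `production` environment in the repository settings; the workflow file only references that environment: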
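```yaml
# .github/workflows/acl-deploy.yml
name: ACL Deployment

on:
  push:
    branches: [main]  # assumes main is the release branch; other branches never deploy
    paths: ['acls/**']

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install netmiko jinja2 pyyaml
      - name: Lint ACL definitions
        run: python scripts/validate_acls.py
      - name: Dry-run against lab
        run: python scripts/deploy.py --target lab --dry-run

  deploy:
    needs: validate
    # Approval gate: required reviewers on the "production" environment
    environment: production
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install netmiko jinja2 pyyaml
      - name: Apply to production
        run: python scripts/deploy.py --target production
```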
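The workflow calls `scripts/deploy.py`, which is not shown. One plausible design for its `--dry-run` mode is to render the candidate ACL, fetch the ACL's current configuration from the device, and print a unified diff for review instead of pushing anything. The sketch below covers only that diff step and uses just the standard library. `config_diff` is a hypothetical name, and the sketch assumes both inputs are in configuration syntax (for example, the relevant section of the running configuration), not `show ip access-lists` output.

```python
import difflib


def _normalize(text: str) -> list:
    # Ignore indentation and blank lines so formatting differences
    # between the device and the template do not show up as changes.
    return [line.strip() + "\n" for line in text.splitlines() if line.strip()]


def config_diff(running: str, candidate: str, name: str = "acl") -> str:
    """Return a unified diff of running vs candidate config ('' if identical)."""
    return "".join(
        difflib.unified_diff(
            _normalize(running),
            _normalize(candidate),
            fromfile=f"running/{name}",
            tofile=f"candidate/{name}",
        )
    )


# Illustrative inputs: device config before TICKET-4521, and the rendered candidate.
running = """\
ip access-list extended STAGING-INBOUND
 remark Production web access
 100 permit tcp 10.10.0.0 0.0.3.255 10.100.5.0 0.0.0.255 eq 443
 remark Implicit deny with logging
 999 deny ip any any log
"""
candidate = """\
ip access-list extended STAGING-INBOUND
remark Production web access
100 permit tcp 10.10.0.0 0.0.3.255 10.100.5.0 0.0.0.255 eq 443
remark TICKET-4521: Dev subnet staging access
150 permit tcp 10.50.0.0 0.0.3.255 host 10.100.5.20 eq 443
remark Implicit deny with logging
999 deny ip any any log
"""
print(config_diff(running, candidate, "STAGING-INBOUND"))
```

An empty result means the device already matches Git. A non-empty diff is what a reviewer would inspect in the job log before approving the `deploy` job.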
Results
After migrating to IaC, we saw measurable improvements:
- Change lead time dropped from 4 hours to 15 minutes
- Rollback time dropped from “whatever it takes” to a single `git revert`
- Configuration drift eliminated through periodic reconciliation
- Audit compliance became trivial with Git history as the source of truth
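The reconciliation job itself is not described. A minimal sketch of the comparison at its core, assuming you already have the rendered desired config and the device's `show ip access-lists` text, is to key every entry by sequence number, strip the hit counters IOS appends (such as `(12 matches)`), and report what is missing, unexpected, or changed. The regex and function names are hypothetical. A production version would need more normalization, since IOS may display some port numbers as names (for example, `www` for 80).

```python
import re

# Matches "    100 permit tcp ... (12 matches)"; the trailing hit counter is optional.
ENTRY_RE = re.compile(r"^\s*(\d+)\s+(.*?)(?:\s+\(\d+ matches?\))?\s*$")


def parse_entries(text: str) -> dict:
    """Map sequence number -> ACE text, skipping headers, remarks, and blank lines."""
    entries = {}
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m and not m.group(2).startswith("remark"):
            entries[int(m.group(1))] = m.group(2)
    return entries


def find_drift(desired: dict, actual: dict) -> dict:
    """Classify sequence numbers as missing, unexpected, or changed on the device."""
    return {
        "missing": sorted(desired.keys() - actual.keys()),
        "unexpected": sorted(actual.keys() - desired.keys()),
        "changed": sorted(s for s in desired.keys() & actual.keys() if desired[s] != actual[s]),
    }


# Desired state as rendered from Git, and illustrative device output with a manual change.
desired_text = """\
ip access-list extended STAGING-INBOUND
remark Production web access
100 permit tcp 10.10.0.0 0.0.3.255 10.100.5.0 0.0.0.255 eq 443
remark TICKET-4521: Dev subnet staging access
150 permit tcp 10.50.0.0 0.0.3.255 host 10.100.5.20 eq 443
remark Implicit deny with logging
999 deny ip any any log
"""
device_text = """\
Extended IP access list STAGING-INBOUND
    100 permit tcp 10.10.0.0 0.0.3.255 10.100.5.0 0.0.0.255 eq 443 (12 matches)
    120 permit tcp any host 10.100.5.20 eq 22
    999 deny ip any any log (3 matches)
"""
drift = find_drift(parse_entries(desired_text), parse_entries(device_text))
print(drift)
```

Here the report flags sequence 150 as missing and 120 as an out-of-band addition. A scheduled job could open a ticket or re-run the deploy pipeline whenever any list is non-empty.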
Lessons Learned
The hardest part was not the tooling. It was convincing the team that treating network configs like code was worth the initial investment. Once the first emergency rollback took 30 seconds instead of 30 minutes, the skeptics came around.
Start small. Pick one ACL on one router. Automate it end to end. Show the results. Then scale.