FinOps for SREs: Cutting Costs Without Breaking Things

Tags: aws finops sre reliability devops

Most FinOps advice starts with a cost dashboard. This series starts with a different question: how do we cut costs without violating our SLOs?

I'm an SRE at a subsidiary of one of Korea's largest tech companies, managing four AWS accounts connected via a Transit Gateway hub-spoke architecture. When I was asked to reduce cloud spend, I didn't open AWS Cost Explorer first. I opened our SigNoz dashboards and checked our error budgets.

That's the difference between FinOps and SRE-driven FinOps.

The SRE Guarantee

Before any cost optimization begins, I guarantee three things:

1. Error Budget Protection
No optimization will be executed if it risks breaching SLOs. If less than 50% of our error budget remains, all FinOps work stops. Reliability comes first.

2. Assured Minimum Downtime
Every change has a rollback plan, a maintenance window, and a blast radius assessment. Zero downtime is the target. Brief, documented downtime inside a maintenance window is the worst acceptable outcome. Unplanned downtime is unacceptable.

3. Reliability Over Savings
If forced to choose between $500/month in savings and a 0.01% availability risk, we choose availability. Always. The cost of an outage — in customer trust, in engineering hours, in incident response — exceeds any monthly savings.

This guarantee isn't just a principle. It's encoded in every check of the aws-finops-toolkit — the open-source CLI I built to automate this workflow.
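As a rough illustration of the first guarantee, the gate below blocks FinOps changes once less than half of the error budget remains. It is a minimal sketch, not the toolkit's actual implementation: the `SloWindow` type, the function names, and the request-count inputs are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one SLO window (hypothetical shape, not the toolkit's)."""
    target: float          # SLO target as a fraction, e.g. 0.999
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that violated the SLO


def error_budget_remaining(window: SloWindow) -> float:
    """Return the fraction of the error budget still unspent, clamped to [0, 1]."""
    allowed_failures = (1.0 - window.target) * window.total_requests
    if allowed_failures <= 0:
        return 0.0  # a 100% target or an empty window leaves no budget to spend
    spent = window.failed_requests / allowed_failures
    return max(0.0, 1.0 - spent)


def finops_change_allowed(window: SloWindow, min_remaining: float = 0.5) -> bool:
    """Allow cost work only while at least `min_remaining` of the budget is left."""
    return error_budget_remaining(window) >= min_remaining


# Example numbers are invented for illustration.
healthy = SloWindow(target=0.999, total_requests=1_000_000, failed_requests=400)
burning = SloWindow(target=0.999, total_requests=1_000_000, failed_requests=600)
print(finops_change_allowed(healthy))  # about 60% of the budget remains
print(finops_change_allowed(burning))  # about 40% of the budget remains
```

Keeping the threshold as a parameter makes the 50% rule explicit and easy to tighten for services with stricter SLOs.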

The Series

This series walks through the complete FinOps workflow I used to identify $48-67K/year in potential savings across four AWS accounts (the P0-P2 roadmap figure in the Combined Savings table): analysis first, then passive cleanup, then active downsizing with SRE guardrails.

Part 0: The Pre-Flight Checklist

Nine checks to run before cutting any cost: traffic analysis, SLO status, cache dependencies, incident history, RI/SP coverage, and more. This is the analysis phase. Never optimize what you don't fully understand.

→ OSS: finops preflight command (aws-finops-toolkit)
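One way to picture the pre-flight phase is as a list of checks that must all pass before a change is scheduled. The sketch below is an assumption about the shape of such a checklist, not the output format of `finops preflight`: the check names are the five mentioned above, while the `CheckResult` type, the `preflight_verdict` function, and the example failure detail are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one pre-flight check (hypothetical structure)."""
    name: str
    passed: bool
    detail: str = ""


def preflight_verdict(results: list[CheckResult]) -> tuple[bool, list[str]]:
    """Return (go, blockers). go is True only if checks ran and all passed."""
    if not results:
        # An empty checklist is treated as a no-go rather than a silent pass.
        return False, ["no checks were run"]
    blockers = [f"{r.name}: {r.detail}" for r in results if not r.passed]
    return not blockers, blockers


# Invented example: one check fails, so the change is blocked.
results = [
    CheckResult("traffic_analysis", True),
    CheckResult("slo_status", True),
    CheckResult("cache_dependencies", False, "cold cache after resize not planned"),
    CheckResult("incident_history", True),
    CheckResult("ri_sp_coverage", True),
]
go, blockers = preflight_verdict(results)
print("GO" if go else "NO-GO")
for blocker in blockers:
    print(" -", blocker)
```

Returning the blockers alongside the verdict keeps the decision explainable: a no-go always says which check stopped the change.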

Part 1: How I Found $12K/Year in AWS Waste

Passive waste — things nobody uses. Abandoned VPCs ($748/mo), orphan CloudWatch log groups ($110-165/mo), S3 lifecycle vs Intelligent-Tiering ($75-104/mo). Zero risk to production. Total: $933-1,017/month.

→ OSS: finops scan — vpc_waste, cloudwatch_waste, s3_lifecycle checks
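The Part 1 total is just the sum of the three line items quoted above. A short sketch of that arithmetic, with dictionary keys and the `sum_ranges` helper chosen for this example rather than taken from the toolkit:

```python
# Monthly savings ranges in USD (low, high), copied from the Part 1 summary.
PART1_ITEMS = {
    "abandoned_vpcs": (748, 748),
    "orphan_cloudwatch_log_groups": (110, 165),
    "s3_lifecycle_vs_intelligent_tiering": (75, 104),
}


def sum_ranges(items: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Add the low ends and the high ends of each range separately."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


low, high = sum_ranges(PART1_ITEMS)
print(f"${low:,}-{high:,}/month")  # → $933-1,017/month
```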

Part 2: Downsizing Without Downtime

Active optimization — shrinking running infrastructure with SRE guardrails. EC2/EKS right-sizing with PDBs, NAT Gateway replacement, Spot with drain handlers, RDS right-sizing with read replicas and cold cache planning, ElastiCache scheduling, and Reserved Instances (commit last, not first). Total: $787-1,087/month.

→ OSS: finops scan — ec2_rightsizing, nat_gateway, spot_candidates, rds_rightsizing, elasticache_scheduling, reserved_instances, unused_resources checks

Combined Savings

Phase	Monthly savings	Annual savings
Part 1: Passive waste cleanup	$933-1,017	$11.2-12.2K
Part 2: Active downsizing	$787-1,087	$9.4-13K
Total identified	$1,720-2,104	$20.6-25.2K
P0-P2 roadmap (pending)	$3,995-5,565	$48-67K
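The annual column is the monthly range multiplied by 12, and the "Total identified" row is Part 1 plus Part 2. The sketch below reproduces those figures before rounding; the helper names are assumptions for this example, and the table appears to round the results to the nearest hundred or thousand dollars.

```python
# Monthly savings ranges in USD (low, high), copied from the table above.
MONTHLY = {
    "part1_passive": (933, 1017),
    "part2_active": (787, 1087),
    "roadmap_p0_p2": (3995, 5565),
}


def add_ranges(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two (low, high) ranges end by end."""
    return a[0] + b[0], a[1] + b[1]


def annualize(monthly: tuple[int, int]) -> tuple[int, int]:
    """Convert a monthly (low, high) range to a yearly one."""
    return monthly[0] * 12, monthly[1] * 12


total_identified = add_ranges(MONTHLY["part1_passive"], MONTHLY["part2_active"])
print(total_identified)                        # → (1720, 2104)
print(annualize(total_identified))             # → (20640, 25248)
print(annualize(MONTHLY["roadmap_p0_p2"]))     # → (47940, 66780)
```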

Every optimization in this series passed through the SRE guarantee. Not a single SLO was breached. Not a single unplanned outage occurred.

The Toolkit

Everything in this series maps to aws-finops-toolkit — an open-source CLI that automates the discovery:

# Pre-flight analysis before any change
finops preflight --target pn-sh-rds-prod --profile dodo-dev --apm signoz

# Scan for cost waste across accounts
finops scan --profiles dev,staging,prod

# Generate report for stakeholders
finops report --format html --output finops-report.html

The tool finds the opportunities. The SRE decides which ones are safe to execute.

This is the introduction to the "FinOps for SREs" series. Start with Part 0: The Pre-Flight Checklist or jump to the part most relevant to your situation.

I'm June, an SRE with 5+ years of experience at Korea's top tech companies including Coupang (NYSE: CPNG) and NAVER Corporation. I write about real-world infrastructure problems. Find me on LinkedIn.
