FinOps for SREs: Cutting Costs Without Breaking Things
June Gu
Original Source
Dev.to
https://dev.to/june-gu/finops-for-sres-cutting-costs-without-breaking-things-2fbk
Tags: aws finops sre reliability devops
Most FinOps advice starts with a cost dashboard. This series starts with a different question: how do we cut costs without violating our SLOs?
I'm an SRE at a subsidiary of one of Korea's largest tech companies, managing four AWS accounts connected via a Transit Gateway hub-spoke architecture. When I was asked to reduce cloud spend, I didn't open AWS Cost Explorer first. I opened our SigNoz dashboards and checked our error budgets.
That's the difference between FinOps and SRE-driven FinOps.
The SRE Guarantee
Before any cost optimization begins, I guarantee three things:
1. Error Budget Protection
No optimization will be executed if it risks breaching SLOs. If our error budget is below 50%, all FinOps work stops; reliability comes first.
2. Assured Minimum Downtime
Every change has a rollback plan, a maintenance window, and a blast radius assessment. Zero downtime is the target. Brief, documented downtime during a maintenance window is the worst acceptable outcome. Unplanned downtime is unacceptable.
3. Reliability Over Savings
If forced to choose between $500/month in savings and a 0.01% availability risk, we choose availability. Always. The cost of an outage (in customer trust, engineering hours, and incident response) exceeds any monthly savings.
This guarantee isn't just a principle. It's encoded in every check of the aws-finops-toolkit, the open-source CLI I built to automate this workflow.
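To make the first guarantee concrete, here is a minimal Python sketch of an error budget gate. It is not taken from aws-finops-toolkit: the `SloWindow` shape and the function names are hypothetical, and it assumes that "below 50%" refers to the share of the error budget still unspent in the current SLO window.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (hypothetical shape, not the toolkit's)."""
    slo_target: float       # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int
    failed_requests: int


def remaining_error_budget(window: SloWindow) -> float:
    """Return the fraction of the error budget still unspent, from 0.0 to 1.0."""
    allowed_failures = (1.0 - window.slo_target) * window.total_requests
    if allowed_failures <= 0:
        # A 100% target or an empty window leaves no budget to spend.
        return 0.0
    spent = window.failed_requests / allowed_failures
    return max(0.0, 1.0 - spent)


def finops_work_allowed(window: SloWindow, min_remaining: float = 0.5) -> bool:
    """Guarantee 1 as a gate: stop FinOps work when less than half the budget remains."""
    return remaining_error_budget(window) >= min_remaining


# 99.9% SLO over 1M requests allows about 1,000 failures.
healthy = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=300)
burned = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=700)

print(finops_work_allowed(healthy))  # about 70% of the budget remains
print(finops_work_allowed(burned))   # about 30% of the budget remains
```

With 300 failures the gate stays open; with 700 it closes and cost work waits until the budget recovers. In practice the request counts would come from whatever APM backs the SLO (SigNoz in this series).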
The Series
This series walks through the complete FinOps workflow I used to identify $48-67K/year in savings across four AWS accounts: starting with analysis, moving through passive cleanup, and ending with active downsizing under SRE guardrails.
Part 0: The Pre-Flight Checklist
9 checks before cutting any cost: traffic analysis, SLO status, cache dependencies, incident history, RI/SP coverage, and more. This is the analysis phase. Never optimize what you don't fully understand.
→ OSS: finops preflight command (aws-finops-toolkit)
Part 1: How I Found $12K/Year in AWS Waste
Passive waste: things nobody uses. Abandoned VPCs ($748/mo), orphan CloudWatch log groups ($110-165/mo), S3 lifecycle vs Intelligent-Tiering ($75-104/mo). Zero risk to production. Total: $933-1,017/month.
→ OSS: finops scan: vpc_waste, cloudwatch_waste, s3_lifecycle checks
Part 2: Downsizing Without Downtime
Active optimization: shrinking running infrastructure with SRE guardrails. EC2/EKS right-sizing with PDBs, NAT Gateway replacement, Spot with drain handlers, RDS right-sizing with read replicas and cold cache planning, ElastiCache scheduling, and Reserved Instances (commit last, not first). Total: $787-1,087/month.
→ OSS: finops scan: ec2_rightsizing, nat_gateway, spot_candidates, rds_rightsizing, elasticache_scheduling, reserved_instances, unused_resources checks
Combined Savings
| Phase | Monthly | Annual |
|---|---|---|
| Part 1: Passive waste cleanup | $933-1,017 | $11.2-12.2K |
| Part 2: Active downsizing | $787-1,087 | $9.4-13K |
| Total identified | $1,720-2,104 | $20.6-25.2K |
| P0-P2 roadmap (pending) | $3,995-5,565 | $48-67K |
Every optimization in this series passed through the SRE guarantee. Not a single SLO was breached. Not a single unplanned outage occurred.
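The totals in the table can be re-derived from the per-item figures quoted in Part 1 and Part 2. The short Python sketch below does that arithmetic; the dictionary keys and helper names are only illustrative labels, and the dollar amounts are the ones stated in this post.

```python
# Monthly savings ranges (low, high) in USD, copied from the Part 1 summary.
PART_1_ITEMS = {
    "abandoned_vpcs": (748, 748),
    "orphan_cloudwatch_log_groups": (110, 165),
    "s3_lifecycle_vs_intelligent_tiering": (75, 104),
}
# Part 2 is only given as a total in this post.
PART_2_TOTAL = (787, 1087)


def sum_ranges(ranges):
    """Add (low, high) ranges component-wise."""
    lows, highs = zip(*ranges)
    return sum(lows), sum(highs)


def annualize(monthly):
    """Convert a monthly (low, high) range to a yearly one."""
    low, high = monthly
    return low * 12, high * 12


part_1_total = sum_ranges(PART_1_ITEMS.values())
combined_monthly = sum_ranges([part_1_total, PART_2_TOTAL])
combined_annual = annualize(combined_monthly)

print(part_1_total)      # (933, 1017)
print(combined_monthly)  # (1720, 2104)
print(combined_annual)   # (20640, 25248)
```

Keeping every figure as a low/high pair rather than a midpoint preserves the uncertainty all the way to the annual number, which is how the table reports it.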
The Toolkit
Everything in this series maps to aws-finops-toolkit, an open-source CLI that automates the discovery:
# Pre-flight analysis before any change
finops preflight --target pn-sh-rds-prod --profile dodo-dev --apm signoz
# Scan for cost waste across accounts
finops scan --profiles dev,staging,prod
# Generate report for stakeholders
finops report --format html --output finops-report.html
The tool finds the opportunities. The SRE decides which ones are safe to execute.
This is the introduction to the "FinOps for SREs" series. Start with Part 0: The Pre-Flight Checklist or jump to the part most relevant to your situation.
I'm June, an SRE with 5+ years of experience at Korea's top tech companies including Coupang (NYSE: CPNG) and NAVER Corporation. I write about real-world infrastructure problems. Find me on LinkedIn.