Cost optimisation (often called FinOps or Cloud Financial Management) is the practice of managing cloud costs while maintaining the performance and reliability your business needs. It’s not about spending less — it’s about spending wisely.
FinOps Principles
FinOps is a cultural practice that brings together technology, finance, and business teams to manage cloud costs collaboratively.
Core Principles
- Teams need to collaborate — Engineering, finance, and business must work together
- Everyone takes ownership — Decentralise cost decisions to the teams who make architectural choices
- A centralised team drives FinOps — A dedicated team provides best practices, tooling, and governance
- Reports should be accessible and timely — Real-time visibility enables real-time decisions
- Decisions are driven by business value — Optimise for value, not just cost
- Take advantage of the variable cost model — Cloud’s pay-as-you-go model is a feature, not a bug
The FinOps Lifecycle
Inform → Optimise → Operate → (repeat)
- Inform — Understand where money is going (visibility, allocation, benchmarking)
- Optimise — Identify and implement savings opportunities
- Operate — Continuously monitor and govern cloud spend
Cost Visibility
You can’t optimise what you can’t see. The first step is understanding your current spend.
Tagging Strategy
Tags are the foundation of cost allocation. Without consistent tagging, you cannot attribute costs to teams, products, or environments.
Essential tags:
| Tag | Purpose | Example Values |
|---|---|---|
| environment | Distinguish prod/non-prod | production, staging, development |
| team | Cost allocation to teams | platform, payments, frontend |
| product | Cost allocation to products | checkout, search, analytics |
| cost-centre | Finance allocation | CC-1234, engineering |
| owner | Contact for the resource | A team or individual email address |
| managed-by | How it’s provisioned | terraform, manual, cdk |
Tagging best practices:
- Enforce tagging via IaC and policies
- Use consistent naming conventions (kebab-case, lowercase)
- Implement tag compliance checks in CI/CD
- Create default tags at the account/project level
- Review and clean up untagged resources regularly
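A tag compliance check in CI/CD can be as simple as comparing each resource's tags against the required set. The sketch below is a minimal illustration. It assumes resources arrive as plain dictionaries with hypothetical `id` and `tags` keys (for example, parsed from a Terraform plan or an inventory export). `REQUIRED_TAGS` mirrors the essential tags table, the allowed environment values come from its example column, and `find_tag_violations` is a hypothetical name.

```python
from typing import Dict, List

# Required keys, taken from the essential tags table.
REQUIRED_TAGS = {"environment", "team", "product", "cost-centre", "owner", "managed-by"}
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}


def find_tag_violations(resources: List[dict]) -> Dict[str, List[str]]:
    """Map each non-compliant resource id to a list of tagging problems."""
    violations: Dict[str, List[str]] = {}
    for resource in resources:
        tags = resource.get("tags", {})
        problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - set(tags))]
        env = tags.get("environment")
        if env is not None and env not in ALLOWED_ENVIRONMENTS:
            problems.append(f"unexpected environment value: {env}")
        # Enforce the lowercase naming convention on tag keys.
        problems += [f"tag key not lowercase: {key}" for key in tags if key != key.lower()]
        if problems:
            violations[resource["id"]] = problems
    return violations


# Illustrative resources, not real inventory data.
resources = [
    {"id": "vm-1", "tags": {"environment": "production", "team": "payments",
                            "product": "checkout", "cost-centre": "CC-1234",
                            "owner": "payments", "managed-by": "terraform"}},
    {"id": "vm-2", "tags": {"environment": "prod", "Team": "frontend"}},
]
for resource_id, problems in find_tag_violations(resources).items():
    print(resource_id, problems)
```

In a pipeline, the job would fail (exit non-zero) whenever the result is non-empty. In the example, `vm-2`'s `Team` key is reported twice: once as not lowercase, and once because the lowercase `team` key is missing.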
Cost Allocation
- Shared costs — Distribute platform/infrastructure costs fairly (by usage, headcount, or fixed percentage)
- Showback — Report costs to teams without formal chargebacks
- Chargeback — Formally transfer costs to team budgets
Start with showback to build awareness; move to chargeback when teams have sufficient control over their costs.
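Distributing a shared cost by usage is a proportional split. The following is a minimal sketch. It assumes you already have one usage figure per team, such as CPU-hours or request counts. The team names and amounts are illustrative, not real data, and `allocate_shared_cost` is a hypothetical helper.

```python
from typing import Dict


def allocate_shared_cost(shared_cost: float, usage_by_team: Dict[str, float]) -> Dict[str, float]:
    """Split a shared cost across teams in proportion to their usage."""
    total_usage = sum(usage_by_team.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    # Round to cents; rounded shares may differ from the total by a cent or so.
    return {
        team: round(shared_cost * usage / total_usage, 2)
        for team, usage in usage_by_team.items()
    }


# Illustrative: a 10,000 platform bill split by CPU-hours.
print(allocate_shared_cost(10_000, {"payments": 500, "search": 300, "frontend": 200}))
# {'payments': 5000.0, 'search': 3000.0, 'frontend': 2000.0}
```

The same function covers headcount or fixed-percentage allocation: pass headcounts or percentages as the weights instead of usage.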
Optimisation Strategies
Right-Sizing
Match resource capacity to actual utilisation. Oversized resources are the most common source of waste.
Process:
- Collect utilisation metrics (CPU, memory, network, storage)
- Identify resources consistently below 40% utilisation
- Recommend smaller instance types
- Test changes in non-production first
- Implement and monitor
Tools: AWS Compute Optimizer, GCP Recommender, Azure Advisor
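The "identify" step can be expressed as a filter over utilisation samples. This sketch interprets "consistently below 40%" as "the 95th percentile is below 40%". That interpretation is an assumption, so pick the percentile and threshold that match your risk tolerance. Metric collection (CloudWatch, Cloud Monitoring, Azure Monitor) is out of scope here, and samples are passed in as plain lists. Both function names are hypothetical.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rightsizing_candidates(
    cpu_by_resource: Dict[str, List[float]],
    mem_by_resource: Dict[str, List[float]],
    threshold: float = 40.0,
    pct: float = 95.0,
) -> List[str]:
    """Return resources whose CPU and memory percentiles both sit below the threshold."""
    candidates = []
    for resource, cpu in cpu_by_resource.items():
        mem = mem_by_resource.get(resource)
        if not cpu or not mem:
            continue  # Without both metrics, do not recommend a change.
        if percentile(cpu, pct) < threshold and percentile(mem, pct) < threshold:
            candidates.append(resource)
    return candidates


# Illustrative utilisation samples (percent).
cpu = {"web-1": [12, 18, 25, 30, 22], "db-1": [35, 60, 72, 55, 48]}
mem = {"web-1": [30, 32, 35, 33, 31], "db-1": [50, 52, 55, 51, 53]}
print(rightsizing_candidates(cpu, mem))  # ['web-1']
```

Using a high percentile rather than the average avoids downsizing a resource that idles most of the time but genuinely needs its capacity at peak.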
Reserved Instances / Savings Plans
Commit to usage in exchange for significant discounts (typically 30-70%).
| Commitment Type | Discount | Flexibility | Best For |
|---|---|---|---|
| On-demand | 0% | Maximum | Unpredictable workloads |
| Savings Plans (AWS) | 30-50% | Good | Stable compute usage |
| Reserved Instances | 40-70% | Limited | Predictable, steady-state |
| Spot/Preemptible | 60-90% | No commitment, but can be interrupted | Fault-tolerant, batch |
Reservation strategy:
- Cover your baseline with reservations (the minimum you always use)
- Use savings plans for steady growth
- Use on-demand for variable load
- Use spot for fault-tolerant workloads
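The baseline can be estimated from hourly usage history as the level you stay at or above almost all the time. The sketch below makes three assumptions. Usage is measured in hourly units (e.g. vCPUs or instances). The 10th percentile stands in for the strict minimum, so a single dip does not shrink the commitment. The rate and discount values are placeholders, not quoted provider prices. `baseline` and `blended_cost` are hypothetical helpers.

```python
import math
from typing import List


def baseline(hourly_usage: List[float], pct: float = 10.0) -> float:
    """Usage level met or exceeded in roughly (100 - pct)% of hours."""
    ordered = sorted(hourly_usage)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def blended_cost(hourly_usage: List[float], committed: float,
                 on_demand_rate: float, discount: float) -> float:
    """Total cost when `committed` units are reserved and the remainder runs on-demand.

    The commitment is paid every hour whether or not it is used.
    """
    committed_rate = on_demand_rate * (1 - discount)
    total = 0.0
    for usage in hourly_usage:
        total += committed * committed_rate
        total += max(0.0, usage - committed) * on_demand_rate
    return total


usage = [10, 12, 15, 20, 18, 11, 10, 14]  # Illustrative hourly usage.
level = baseline(usage)
print(level, blended_cost(usage, level, 1.0, 0.4), blended_cost(usage, 0, 1.0, 0.4))
```

Evaluating `blended_cost` across several candidate commitment levels shows why covering only the baseline is the usual advice. Every unit committed above it is paid for during the hours it sits idle.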
Spot/Preemptible Instances
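Spot (AWS, Azure) and Spot or preemptible VMs (Google Cloud) use spare provider capacity at steep discounts, but the provider can reclaim them at short notice: two minutes on AWS and about 30 seconds on Google Cloud and Azure.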
Good candidates for spot:
- Batch processing and data pipelines
- CI/CD build agents
- Stateless web servers behind load balancers
- Development and test environments
- Kubernetes node pools (with proper pod disruption budgets)
Not suitable:
- Stateful workloads without replication
- Long-running jobs that can’t checkpoint
- Anything requiring guaranteed availability
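Checkpointing turns an interruption into a restart from the last saved point rather than from scratch. The following is a minimal sketch. It assumes progress can be recorded as an item index saved to a local JSON file. On real spot capacity you would write checkpoints to durable storage such as object storage, because the instance's local disk may disappear with it. `run_with_checkpoints` and the checkpoint path are hypothetical.

```python
import json
import os
from typing import Callable, List


def run_with_checkpoints(items: List[int], process_item: Callable[[int], None],
                         checkpoint_path: str, every: int = 10) -> None:
    """Process items in order, resuming from the last checkpoint if one exists."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for index in range(start, len(items)):
        process_item(items[index])
        # Save progress periodically so an interruption loses at most `every` items.
        if (index + 1) % every == 0 or index + 1 == len(items):
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"next_index": index + 1}, f)
            # Replace atomically so a crash never leaves a half-written checkpoint.
            os.replace(tmp_path, checkpoint_path)
```

Items processed between the last checkpoint and the interruption run again after a restart, so `process_item` should be idempotent.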
Storage Optimisation
Storage costs accumulate silently. Regular review is essential.
Strategies:
- Lifecycle policies — Automatically move data to cheaper tiers (e.g., S3 Standard → Glacier)
- Delete unused snapshots — Old EBS/disk snapshots add up quickly
- Compress and deduplicate — Reduce stored data volume
- Right-size volumes — Provisioned storage is often oversized
- Review backups — Do you need 90 days of daily backups?
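A lifecycle policy is declarative configuration. This sketch builds an S3 lifecycle rule as a Python dictionary in the shape the S3 `PutBucketLifecycleConfiguration` API expects. boto3 exposes that API as `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`, though no AWS call is made here. The prefix, day counts, and `lifecycle_rule` helper are illustrative assumptions. S3 requires objects to be at least 30 days old before transitioning to Standard-IA, and colder classes carry minimum storage-duration charges, so check those before choosing values.

```python
from typing import Dict, Optional


def lifecycle_rule(prefix: str, ia_after_days: int, glacier_after_days: int,
                   expire_after_days: Optional[int] = None) -> Dict:
    """Build one S3 lifecycle rule: Standard -> Standard-IA -> Glacier (-> delete)."""
    if not ia_after_days < glacier_after_days:
        raise ValueError("transitions must move to colder tiers over time")
    rule = {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Design choice of this helper: only expire after the final transition.
        if expire_after_days <= glacier_after_days:
            raise ValueError("expiration must come after the last transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


config = {"Rules": [lifecycle_rule("logs/", 30, 90, expire_after_days=365)]}
print(config)
```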
Network Cost Optimisation
Data transfer costs are often overlooked and can be substantial.
Strategies:
- Use private endpoints — Avoid NAT gateway and internet egress charges
- Keep traffic in-region — Cross-region and cross-AZ traffic costs money
- CDN for static content — Serve from edge locations, reduce origin traffic
- Compress data — Less data transferred = lower costs
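- Review NAT gateway usage — NAT gateways are billed per hour and per GB of data processed, so high-volume traffic through them gets expensive; consider alternatives (e.g. VPC gateway endpoints for S3 and DynamoDB on AWS)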
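The effect of compression on transfer volume is easy to check locally. This minimal sketch uses Python's standard `gzip` module on a synthetic, repetitive JSON payload. Real ratios depend heavily on the data, and formats that are already compressed (images, most columnar files) typically gain little.

```python
import gzip
import json

# Synthetic API response: repetitive JSON, which compresses well.
records = [{"id": i, "status": "active", "region": "eu-west-1"} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")

# Round-trip check: compression must be lossless.
assert gzip.decompress(compressed) == raw
```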
Database Optimisation
- Reserved capacity — Commit to database instances like compute
- Right-size instances — Databases are often over-provisioned
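- Storage auto-scaling — Start with a smaller allocation and let storage grow as needed instead of over-provisioning upfront (RDS storage auto-scaling increases allocated storage but never shrinks it)
- Review provisioned IOPS — Often unnecessary; general-purpose storage (e.g. EBS gp3) covers many workloads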
- Consider serverless — Aurora Serverless, DynamoDB on-demand for variable workloads
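Whether serverless or on-demand pricing beats provisioned capacity comes down to a break-even load. This sketch assumes two things. Provisioned capacity is billed at a flat hourly rate, which should be the cost of capacity sized for your peak. On-demand is billed per million requests. It ignores storage and other billing dimensions, the prices are placeholders rather than real provider rates, and both functions are hypothetical.

```python
def breakeven_requests_per_hour(provisioned_cost_per_hour: float,
                                on_demand_cost_per_million: float) -> float:
    """Average requests/hour at which on-demand and provisioned cost the same."""
    if on_demand_cost_per_million <= 0:
        raise ValueError("on-demand price must be positive")
    return provisioned_cost_per_hour / on_demand_cost_per_million * 1_000_000


def cheaper_option(avg_requests_per_hour: float, provisioned_cost_per_hour: float,
                   on_demand_cost_per_million: float) -> str:
    """Pick the cheaper pricing model for a given average load."""
    breakeven = breakeven_requests_per_hour(provisioned_cost_per_hour,
                                            on_demand_cost_per_million)
    return "on-demand" if avg_requests_per_hour < breakeven else "provisioned"


# Placeholder prices, not real provider rates.
print(f"{breakeven_requests_per_hour(0.50, 1.25):,.0f} requests/hour")
print(cheaper_option(50_000, 0.50, 1.25))
```

Because provisioned capacity must cover the peak, a spiky workload with a low average tends to land on the on-demand side of the break-even.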
Waste Elimination
Common sources of waste to audit regularly:
| Waste Type | Description | Action |
|---|---|---|
| Idle resources | Running but unused (VMs, databases, load balancers) | Terminate or schedule |
| Orphaned resources | Detached volumes, unused IPs, old snapshots | Delete |
| Over-provisioned | Resources larger than needed | Right-size |
| Non-production running 24/7 | Dev/test environments running overnight/weekends | Schedule shutdown |
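| Unused reservations | Reservations that don’t match current usage | Modify or exchange where the provider allows it, sell (e.g. AWS Standard RIs on the Reserved Instance Marketplace), or let expire |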
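Orphaned resources are usually identified by state rather than by metrics. On AWS, for example, an EBS volume that is not attached to any instance reports the state `available` in the EC2 `DescribeVolumes` response. The sketch below filters records shaped like the `Volumes` list that boto3's `describe_volumes` returns (`VolumeId`, `Size` in GiB, `State`). It only reports candidates and does not delete anything. The sample volume IDs and the `orphaned_volumes` helper are hypothetical.

```python
from typing import Dict, List


def orphaned_volumes(volumes: List[Dict]) -> List[Dict]:
    """Return unattached volumes (state 'available') with their size in GiB."""
    return [
        {"VolumeId": v["VolumeId"], "SizeGiB": v["Size"]}
        for v in volumes
        if v.get("State") == "available"
    ]


# Illustrative records in the DescribeVolumes shape.
sample = [
    {"VolumeId": "vol-aaa", "Size": 100, "State": "in-use"},
    {"VolumeId": "vol-bbb", "Size": 500, "State": "available"},
    {"VolumeId": "vol-ccc", "Size": 50, "State": "available"},
]
orphans = orphaned_volumes(sample)
print(orphans)
print("Unattached GiB:", sum(v["SizeGiB"] for v in orphans))
```

Before deleting, consider snapshotting volumes whose owners cannot be identified, since snapshots are usually cheaper to keep than provisioned volumes.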
Environment Scheduling
Non-production environments often don’t need to run continuously.
Implementation:
- Tag resources with a schedule (e.g., schedule: office-hours)
- Use Lambda/Cloud Functions to start/stop resources on schedule
- Provide self-service mechanisms for engineers to extend when needed
- Typical savings: 65-75% on non-production compute
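The savings figure follows from simple arithmetic. A resource that runs 10 hours a day on weekdays is up for 50 of the week's 168 hours, roughly 70% less runtime than running 24/7. The sketch below shows that calculation, plus the check a scheduled function might run against a `schedule` tag value. The `office-hours` window (08:00 to 18:00, Monday to Friday) and the `SCHEDULES` mapping are assumptions, not a provider feature.

```python
from datetime import datetime
from typing import Dict, Set, Tuple

# Hypothetical schedule definitions: tag value -> (weekdays, start hour, end hour).
# Weekdays follow datetime.weekday(): Monday == 0 ... Sunday == 6.
SCHEDULES: Dict[str, Tuple[Set[int], int, int]] = {
    "office-hours": ({0, 1, 2, 3, 4}, 8, 18),
}


def should_be_running(schedule: str, now: datetime) -> bool:
    """True if a resource tagged with `schedule` should be running at `now`."""
    if schedule not in SCHEDULES:
        return True  # Unknown or missing schedule: leave the resource running.
    days, start_hour, end_hour = SCHEDULES[schedule]
    return now.weekday() in days and start_hour <= now.hour < end_hour


def weekly_runtime_saving(schedule: str) -> float:
    """Fraction of a 168-hour week during which the resource is switched off."""
    days, start_hour, end_hour = SCHEDULES[schedule]
    hours_on = len(days) * (end_hour - start_hour)
    return 1 - hours_on / (7 * 24)


print(should_be_running("office-hours", datetime(2024, 1, 6, 12)))  # Saturday: False
print(f"{weekly_runtime_saving('office-hours'):.0%}")  # 70%
```

`now` should be in the team's local time zone. Stopped VMs usually still incur charges for attached storage, so the bill shrinks by less than the runtime does.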
Governance
Budgets and Alerts
- Set budgets at account, team, and project levels
- Alert at 50%, 80%, 100% of budget
- Include forecast-based alerts (predicted spend)
- Ensure alerts reach people who can act on them
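Threshold and forecast alerts both come down to comparing spend against the budget. This minimal sketch assumes a naive straight-line forecast: spend to date divided by days elapsed, multiplied by days in the month. Provider budget tools use their own forecasting models, so treat this only as an illustration of the logic. `budget_alerts` is a hypothetical helper.

```python
import calendar
from datetime import date
from typing import List

THRESHOLDS = (0.5, 0.8, 1.0)


def budget_alerts(budget: float, spend_to_date: float, today: date) -> List[str]:
    """Return alert messages for crossed actual thresholds and a forecast overrun."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    # Straight-line projection; counts today as a fully elapsed day.
    forecast = spend_to_date / today.day * days_in_month
    alerts = []
    for threshold in THRESHOLDS:
        if spend_to_date >= budget * threshold:
            alerts.append(f"actual spend reached {threshold:.0%} of budget")
    if forecast >= budget:
        alerts.append(f"forecast {forecast:,.0f} exceeds budget {budget:,.0f}")
    return alerts


print(budget_alerts(10_000, 6_000, date(2024, 4, 15)))
# ['actual spend reached 50% of budget', 'forecast 12,000 exceeds budget 10,000']
```

The forecast alert fires halfway through the month, well before actual spend reaches 80%. That early warning is the value of forecast-based alerts.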
Anomaly Detection
Cloud providers offer anomaly detection to catch unexpected spend spikes:
- AWS Cost Anomaly Detection
- Google Cloud Billing cost anomaly detection
- Azure Cost Management anomaly alerts
Configure alerts to notify relevant teams immediately.
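Provider services use their own models, but the underlying idea can be shown with a simple statistical rule: flag a day whose spend is far above the recent average. The following is a minimal sketch. It assumes daily spend totals, a trailing 7-day window, and a 3-standard-deviation threshold, all illustrative choices. A rule this simple will miss gradual drift and will trip on legitimate launches. `spend_anomalies` is a hypothetical helper.

```python
import statistics
from typing import List


def spend_anomalies(daily_spend: List[float], window: int = 7,
                    z_threshold: float = 3.0) -> List[int]:
    """Return indices of days whose spend exceeds mean + z * stdev of the prior window."""
    anomalies = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history)
        # A floor of 5% of the mean stops a perfectly flat history flagging tiny changes.
        limit = mean + z_threshold * max(stdev, 0.05 * mean)
        if daily_spend[i] > limit:
            anomalies.append(i)
    return anomalies


spend = [100, 102, 98, 101, 99, 103, 100, 101, 250, 104]  # Illustrative daily spend.
print(spend_anomalies(spend))  # [8]
```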
Cost Review Cadence
| Frequency | Activity |
|---|---|
| Daily | Check for anomalies and spikes |
| Weekly | Review top spending services and trends |
| Monthly | Team-level cost reviews, reservation coverage |
| Quarterly | Strategic planning, commitment purchases |
Unit Economics
Track cost efficiency, not just total cost.
Example unit metrics:
- Cost per transaction
- Cost per active user
- Cost per GB processed
- Cost per API request
As your business scales, total cost should increase but cost per unit should decrease (economies of scale).
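Unit cost is total cost divided by a business volume metric over the same period. The sketch below uses illustrative, made-up monthly figures to show the pattern described above: total cost rising while cost per transaction falls. `MONTHS` and `cost_per_unit` are hypothetical names.

```python
from typing import List, Tuple

# Illustrative monthly figures: (month, cloud cost, transactions). Not real data.
MONTHS: List[Tuple[str, float, int]] = [
    ("Jan", 40_000, 2_000_000),
    ("Feb", 44_000, 2_400_000),
    ("Mar", 50_000, 3_000_000),
]


def cost_per_unit(cost: float, units: int) -> float:
    """Cost per business unit (here, per transaction)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return cost / units


for month, cost, transactions in MONTHS:
    # Scaling to "per 1,000 transactions" keeps the numbers readable.
    per_thousand = 1000 * cost_per_unit(cost, transactions)
    print(f"{month}: total={cost:,.0f}, per 1k transactions={per_thousand:.2f}")
```

With these figures, total cost grows from 40,000 to 50,000 while cost per 1,000 transactions drops from 20.00 to 16.67.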
Organisational Considerations
FinOps Team
A central FinOps function provides:
- Tooling and dashboards
- Best practices and training
- Reserved instance/savings plan management
- Governance and policy enforcement
- Executive reporting
Engineering Culture
- Include cost as a non-functional requirement
- Add cost impact to architecture decision records
- Make cost data visible to engineers
- Celebrate cost savings alongside feature delivery
- Consider cost in code reviews for infrastructure changes
Cloud Provider Tools
AWS
- Cost Explorer — Visualise and analyse costs
- Cost and Usage Reports (CUR) — Detailed billing data
- Compute Optimizer — Right-sizing recommendations
- Trusted Advisor — Optimisation recommendations
- Savings Plans — Flexible commitment discounts
Google Cloud
- Billing Reports — Cost visualisation
- Recommender — Right-sizing and idle resource recommendations
- Committed Use Discounts — Discounts for committing to resource or spend levels for one or three years
- Active Assist — Optimisation recommendations
Azure
- Cost Management — Visualisation and budgets
- Advisor — Optimisation recommendations
- Reservations — Commitment discounts
- Azure Hybrid Benefit — Reuse existing on-premises licences (e.g. Windows Server, SQL Server)
Third-Party Tools
- Infracost — Cost estimates for Terraform in CI/CD
- OpenCost — Kubernetes cost monitoring (CNCF project)
- Kubecost — Kubernetes cost management
- CloudHealth — Multi-cloud cost management
- Spot.io — Automated spot instance management
- CAST AI — Kubernetes cost optimisation