Cost optimisation (often called FinOps or Cloud Financial Management) is the practice of managing cloud costs while maintaining the performance and reliability your business needs. It’s not about spending less — it’s about spending wisely.

FinOps Principles

FinOps is a cultural practice that brings together technology, finance, and business teams to manage cloud costs collaboratively.

Core Principles

  1. Teams need to collaborate — Engineering, finance, and business must work together
  2. Everyone takes ownership — Decentralise cost decisions to the teams who make architectural choices
  3. A centralised team drives FinOps — A dedicated team provides best practices, tooling, and governance
  4. Reports should be accessible and timely — Real-time visibility enables real-time decisions
  5. Decisions are driven by business value — Optimise for value, not just cost
  6. Take advantage of the variable cost model — Cloud’s pay-as-you-go model is a feature, not a bug

The FinOps Lifecycle

Inform → Optimise → Operate → (repeat)
  • Inform — Understand where money is going (visibility, allocation, benchmarking)
  • Optimise — Identify and implement savings opportunities
  • Operate — Continuously monitor and govern cloud spend

Cost Visibility

You can’t optimise what you can’t see. The first step is understanding your current spend.

Tagging Strategy

Tags are the foundation of cost allocation. Without consistent tagging, you cannot attribute costs to teams, products, or environments.

Essential tags:

Tag           Purpose                      Example values
environment   Distinguish prod/non-prod    production, staging, development
team          Cost allocation to teams     platform, payments, frontend
product       Cost allocation to products  checkout, search, analytics
cost-centre   Finance allocation           CC-1234, engineering
owner         Contact for the resource     team or individual email address
managed-by    How it’s provisioned         terraform, manual, cdk

Tagging best practices:

  • Enforce tagging via IaC and policies
  • Use consistent naming conventions (kebab-case, lowercase)
  • Implement tag compliance checks in CI/CD
  • Create default tags at the account/project level
  • Review and clean up untagged resources regularly
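A tag compliance check in CI/CD can be as simple as comparing each resource's tags against the required set. The sketch below is a minimal, hypothetical Python example. It assumes tags have already been extracted into one dict per resource (for instance from a Terraform plan or a provider API, which is not shown). `check_tags` and `ci_exit_code` are illustrative names, not part of any tool.

```python
import re

# Required keys, taken from the essential-tags table above.
REQUIRED_TAGS = {"environment", "team", "product", "cost-centre", "owner", "managed-by"}

# Lowercase kebab-case keys: groups of [a-z0-9] joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_tags(resource_id: str, tags: dict) -> list:
    """Return a list of violations for one resource; an empty list means compliant."""
    problems = []
    for key in sorted(REQUIRED_TAGS - set(tags)):
        problems.append(f"{resource_id}: missing required tag '{key}'")
    for key, value in tags.items():
        if not KEBAB_CASE.match(key):
            problems.append(f"{resource_id}: tag key '{key}' is not lowercase kebab-case")
        if not value.strip():
            problems.append(f"{resource_id}: tag '{key}' has an empty value")
    return problems


def ci_exit_code(resources: dict) -> int:
    """Print every violation and return a process exit code (1 fails the CI job)."""
    violations = [p for rid, tags in resources.items() for p in check_tags(rid, tags)]
    for line in violations:
        print(line)
    return 1 if violations else 0
```

In a pipeline step you would call `sys.exit(ci_exit_code(resources))`, so any untagged or mis-tagged resource blocks the merge. Checking key format as well as presence catches near-duplicates such as `Environment` versus `environment`, which AWS treats as distinct tag keys.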

Cost Allocation

  • Shared costs — Distribute platform/infrastructure costs fairly (by usage, headcount, or fixed percentage)
  • Showback — Report costs to teams without formal chargebacks
  • Chargeback — Formally transfer costs to team budgets

Start with showback to build awareness; move to chargeback when teams have sufficient control over their costs.
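Distributing a shared cost by usage or headcount comes down to a proportional split. As a minimal sketch, assuming the weights (usage units or headcount per team) are already known, the hypothetical `allocate_shared_cost` below splits an amount in whole cents. It hands any leftover cents to the teams with the largest rounding remainders (the largest-remainder method), so the shares always add back up to the invoice total.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def allocate_shared_cost(total: Decimal, weights: dict) -> dict:
    """Split `total` (a whole number of cents) across teams in proportion to
    `weights`, returning shares that sum exactly to `total`."""
    weight_sum = Decimal(str(sum(weights.values())))
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    # Exact proportional share for each team, then truncate to whole cents.
    exact = {team: total * Decimal(str(w)) / weight_sum for team, w in weights.items()}
    shares = {team: amount.quantize(CENT, rounding=ROUND_DOWN) for team, amount in exact.items()}
    # Give the leftover cents to the teams that lost the most to truncation.
    leftover_cents = int((total - sum(shares.values())) / CENT)
    for team in sorted(exact, key=lambda t: exact[t] - shares[t], reverse=True)[:leftover_cents]:
        shares[team] += CENT
    return shares
```

Using `Decimal` rather than floats keeps the showback figures consistent with the bill to the cent.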

Optimisation Strategies

Right-Sizing

Match resource capacity to actual utilisation. Oversized resources are one of the most common sources of waste.

Process:

  1. Collect utilisation metrics (CPU, memory, network, storage)
  2. Identify resources consistently below 40% utilisation
  3. Recommend smaller instance types
  4. Test changes in non-production first
  5. Implement and monitor

Tools: AWS Compute Optimizer, GCP Recommender, Azure Advisor
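Step 2 is easy to get wrong by looking at averages, which hide peaks. One common approach is to compare a high percentile, rather than the mean, against the threshold. The sketch below assumes CPU and memory samples (in percent) have already been exported per resource. The 95th percentile, the `rightsizing_candidates` name, and the input shape are illustrative choices, not what any particular provider tool uses.

```python
from statistics import quantiles

THRESHOLD_PERCENT = 40.0  # from step 2 above


def p95(samples: list) -> float:
    """95th percentile: n=20 gives 19 cut points and the last one is the 95th.
    Needs at least two samples; 'inclusive' keeps the result within min..max."""
    return quantiles(samples, n=20, method="inclusive")[-1]


def rightsizing_candidates(metrics: dict, threshold: float = THRESHOLD_PERCENT) -> list:
    """IDs of resources whose p95 CPU and p95 memory both sit below `threshold`."""
    return sorted(
        resource_id
        for resource_id, series in metrics.items()
        if p95(series["cpu"]) < threshold and p95(series["memory"]) < threshold
    )
```

On AWS, EC2 memory metrics require the CloudWatch agent. Without them, a CPU-only check can recommend an instance that is too small for a memory-bound workload.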

Reserved Instances / Savings Plans

Commit to usage in exchange for significant discounts (typically 30-70%).

Pricing model         Typical discount   Flexibility               Best for
On-demand             0%                 Maximum                   Unpredictable workloads
Savings Plans (AWS)   30-50%             Good                      Stable compute usage
Reserved Instances    40-70%             Limited                   Predictable, steady-state
Spot/Preemptible      60-90%             None (can be reclaimed)   Fault-tolerant, batch

Reservation strategy:

  • Cover your baseline with reservations (the minimum you always use)
  • Use savings plans for steady growth
  • Use on-demand for variable load
  • Use spot for fault-tolerant workloads
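The idea of covering the baseline can be made concrete with a simplified cost model. The sketch below assumes hourly usage measured in whole instance-equivalents, a single on-demand rate, and a flat commitment discount. It ignores term length, upfront payment options, and the instance-family flexibility of Savings Plans. The function names are hypothetical, and the rate and discount in the example are placeholders, not real prices.

```python
def cost_with_commitment(hourly_usage: list, committed: int,
                         on_demand_rate: float, discount: float) -> float:
    """Total cost when `committed` units are paid for every hour at a discount
    and any usage above that runs at the on-demand rate."""
    committed_rate = on_demand_rate * (1 - discount)
    total = 0.0
    for used in hourly_usage:
        total += committed * committed_rate                 # paid whether used or not
        total += max(0, used - committed) * on_demand_rate  # overflow billed on-demand
    return total


def best_commitment(hourly_usage: list, on_demand_rate: float, discount: float) -> int:
    """Brute-force the whole-unit commitment that minimises cost over the observed hours."""
    levels = range(0, int(max(hourly_usage)) + 1)
    return min(levels, key=lambda c: cost_with_commitment(hourly_usage, c, on_demand_rate, discount))


# Hypothetical hourly usage with a placeholder rate and discount.
usage = [4, 4, 4, 6, 8, 4, 4, 5]
print(best_commitment(usage, on_demand_rate=1.0, discount=0.4))  # 4, the minimum hourly usage
```

In this model, committing one more unit pays off only if that unit would be busy in more than (1 - discount) of hours. With a 40% discount, that means more than 60% of the time, which is why commitments are usually sized to the steady baseline rather than the peak.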

Spot/Preemptible Instances

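Spot instances (AWS, Azure) and Spot or preemptible VMs (Google Cloud) run on spare provider capacity at steep discounts, but the provider can reclaim them at short notice (on AWS, a two-minute warning).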

Good candidates for spot:

  • Batch processing and data pipelines
  • CI/CD build agents
  • Stateless web servers behind load balancers
  • Development and test environments
  • Kubernetes node pools (with proper pod disruption budgets)

Not suitable:

  • Stateful workloads without replication
  • Long-running jobs that can’t checkpoint
  • Anything requiring guaranteed availability
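Whether a job can run on spot often comes down to whether it records its progress. Below is a minimal checkpointing sketch. It assumes work arrives as an ordered list and that the per-item `handle` function is idempotent, because an item is reprocessed if the instance is reclaimed between finishing it and saving the checkpoint. The function name and JSON checkpoint format are hypothetical. A local file path keeps the example self-contained, but on a real spot instance the checkpoint must live on storage that outlives the instance, such as object storage or a persistent network volume.

```python
import json
import os


def process_items(items: list, checkpoint_path: str, handle) -> int:
    """Process `items` in order, saving progress after each one so a restarted
    run resumes where the previous one stopped. Returns items processed this run."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    processed = 0
    for i in range(start, len(items)):
        handle(items[i])
        # Write to a temporary file, then rename it over the old checkpoint, so a
        # restarted run never reads a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)
        processed += 1
    return processed
```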

Storage Optimisation

Storage costs accumulate silently. Regular review is essential.

Strategies:

  • Lifecycle policies — Automatically move data to cheaper tiers (e.g., S3 Standard → Glacier)
  • Delete unused snapshots — Old EBS/disk snapshots add up quickly
  • Compress and deduplicate — Reduce stored data volume
  • Right-size volumes — Provisioned storage is often oversized
  • Review backups — Do you need 90 days of daily backups?
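As an illustration of a lifecycle policy, the hypothetical helper below builds one rule in the dictionary shape that boto3's S3 `put_bucket_lifecycle_configuration` call accepts. The prefix and the 30/90/365-day ages are example values, not recommendations. The ordering checks are this sketch's own sanity checks. The exception is the 30-day minimum, which reflects S3's rule that objects must stay in S3 Standard for at least 30 days before moving to Standard-IA.

```python
from typing import Optional


def lifecycle_rule(prefix: str, ia_after_days: int = 30, glacier_after_days: int = 90,
                   expire_after_days: Optional[int] = 365) -> dict:
    """One S3 lifecycle rule: Standard -> Standard-IA -> Glacier, then optional expiry."""
    if ia_after_days < 30:
        raise ValueError("S3 requires at least 30 days in Standard before Standard-IA")
    if glacier_after_days <= ia_after_days:
        raise ValueError("transitions must be in increasing order of age")
    rule = {
        "ID": "tier-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        if expire_after_days <= glacier_after_days:
            raise ValueError("expiry should come after the last transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return rule
```

Applying the rule would look like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration={"Rules": [lifecycle_rule("logs/")]})`, where the bucket name is a placeholder. This call replaces the bucket's entire existing lifecycle configuration, so merge the new rule with any current rules first.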

Network Cost Optimisation

Data transfer costs are often overlooked and can be substantial.

Strategies:

  • Use private endpoints — Avoid NAT gateway and internet egress charges
  • Keep traffic in-region — Cross-region and cross-AZ traffic costs money
  • CDN for static content — Serve from edge locations, reduce origin traffic
  • Compress data — Less data transferred = lower costs
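  • Review NAT gateway usage — On AWS, NAT gateways charge per hour and per GB processed; route S3 and DynamoDB traffic through gateway VPC endpoints, which have no additional charge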
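Before enabling compression, you can measure how much it would cut transfer volume on a representative payload. The minimal sketch below uses Python's standard `gzip` module and assumes a JSON API response as the sample. The record fields and the 1,000 GB/month figure are made up for illustration.

```python
import gzip
import json


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(payload, compresslevel=level)) / len(payload)


def transfer_saved_gb(gb_per_month: float, ratio: float) -> float:
    """GB of transfer avoided per month if all traffic compressed at `ratio`."""
    return gb_per_month * (1 - ratio)


# Hypothetical, highly repetitive JSON payload.
records = [{"user_id": i, "status": "active", "region": "eu-west-1"} for i in range(1000)]
sample = json.dumps(records).encode("utf-8")
ratio = compression_ratio(sample)
print(f"ratio {ratio:.2f}, saves ~{transfer_saved_gb(1000, ratio):.0f} GB of every 1,000 GB")
```

Compression costs CPU, and already-compressed formats such as JPEG images or video gain almost nothing. It pays off mainly for text-like payloads such as JSON, HTML, and logs.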

Database Optimisation

  • Reserved capacity — Commit to database instances like compute
  • Right-size instances — Databases are often over-provisioned
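  • Storage auto-scaling — Grow storage as needed instead of over-provisioning up front (on Amazon RDS it only grows, never shrinks)
  • Review provisioned IOPS — Provisioned IOPS storage costs more than general-purpose storage and many workloads do not need it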
  • Consider serverless — Aurora Serverless, DynamoDB on-demand for variable workloads

Waste Elimination

Common sources of waste to audit regularly:

Waste type                    Description                                           Action
Idle resources                Running but unused (VMs, databases, load balancers)   Terminate or schedule
Orphaned resources            Detached volumes, unused IPs, old snapshots           Delete
Over-provisioned              Resources larger than needed                          Right-size
Non-production running 24/7   Dev/test environments running overnight/weekends      Schedule shutdown
Unused reservations           Reservations that don’t match current usage           Exchange or sell where the provider allows; otherwise let expire
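Finding orphaned resources is mostly a filter over inventory data. The sketch below assumes you have already fetched the responses of boto3's EC2 `describe_volumes` and `describe_addresses` calls (paginating where needed) and passes them in as plain dicts. An EBS volume in the `available` state is not attached to any instance, and an Elastic IP without an `AssociationId` is allocated but not in use. The function names are illustrative.

```python
def orphaned_volumes(describe_volumes_response: dict) -> list:
    """IDs of EBS volumes not attached to any instance."""
    return [
        v["VolumeId"]
        for v in describe_volumes_response.get("Volumes", [])
        if v.get("State") == "available"  # 'in-use' means attached
    ]


def unassociated_eips(describe_addresses_response: dict) -> list:
    """Public IPs of Elastic IPs that are allocated but not associated with anything."""
    return [
        a["PublicIp"]
        for a in describe_addresses_response.get("Addresses", [])
        if "AssociationId" not in a
    ]
```

Treat the output as a review list rather than a delete list. A detached volume may be kept deliberately, so check its owner tag before deleting it.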

Environment Scheduling

Non-production environments often don’t need to run continuously.

Implementation:

  • Tag resources with schedule (e.g., schedule: office-hours)
  • Use Lambda/Cloud Functions to start/stop resources on schedule
  • Provide self-service mechanisms for engineers to extend when needed
  • Typical savings: 65-75% on non-production compute
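The typical savings figure follows from simple arithmetic on running hours. The minimal sketch below assumes a hypothetical weekday-only `office-hours` schedule and ignores time zones and public holidays. `should_run` is the kind of check a scheduled function might evaluate before starting or stopping tagged resources.

```python
from datetime import datetime

HOURS_PER_WEEK = 24 * 7


def schedule_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of always-on compute cost saved by running only on this schedule."""
    return 1 - (hours_per_day * days_per_week) / HOURS_PER_WEEK


def should_run(now: datetime, start_hour: int = 8, end_hour: int = 20) -> bool:
    """Hypothetical 'office-hours' rule: Monday to Friday, start_hour <= hour < end_hour."""
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


print(f"12 h x 5 days: {schedule_savings(12, 5):.0%} saved")  # 64%
print(f"10 h x 5 days: {schedule_savings(10, 5):.0%} saved")  # 70%
```

A 12-hour weekday window saves about 64% of compute hours and a 10-hour one about 70%, which is broadly consistent with the typical range above. Storage attached to stopped instances, such as EBS volumes, is still billed.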

Governance

Budgets and Alerts

  • Set budgets at account, team, and project levels
  • Alert at 50%, 80%, 100% of budget
  • Include forecast-based alerts (predicted spend)
  • Ensure alerts reach people who can act on them
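Provider budget services evaluate thresholds like these for you (AWS Budgets, for example, supports both actual and forecasted alerts), but the underlying logic is simple. The minimal sketch below assumes a monthly budget and a naive linear run-rate forecast. Real forecasts account for trends and seasonality, and `budget_alerts` is a hypothetical name.

```python
import calendar
from datetime import date

THRESHOLDS = (0.5, 0.8, 1.0)  # 50%, 80%, 100% of budget


def budget_alerts(spend_to_date: float, budget: float, today: date) -> list:
    """Alerts for actual spend crossing each threshold, plus a forecast-based alert.

    Assumes `spend_to_date` covers day 1 through `today` of the current month.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    forecast = spend_to_date / today.day * days_in_month  # naive linear run rate
    alerts = [
        f"actual spend has reached {t:.0%} of budget"
        for t in THRESHOLDS
        if spend_to_date >= t * budget
    ]
    if forecast >= budget:
        alerts.append(f"forecast {forecast:.2f} is at or above budget {budget:.2f}")
    return alerts
```

With a budget of 1,000 and 600 spent by 15 April, only the 50% threshold has been crossed. The forecast alert, however, already warns that the month is on track for about 1,200.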

Anomaly Detection

Cloud providers offer anomaly detection to catch unexpected spend spikes:

  • AWS Cost Anomaly Detection
  • GCP Recommender anomaly alerts
  • Azure Cost Management anomaly alerts

Configure alerts to notify relevant teams immediately.
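The managed services above use their own models. As a deliberately simple stand-in that shows the idea, the sketch below flags a day whose spend is far above the trailing average. The 14-day window and 3-standard-deviation threshold are assumptions. This naive version does not model weekly seasonality, so it will be noisier than the provider tools.

```python
from statistics import mean, stdev


def spend_anomalies(daily_spend: list, window: int = 14, z_threshold: float = 3.0) -> list:
    """Indices of days whose spend is more than `z_threshold` standard deviations
    above the mean of the preceding `window` days. Only spikes are flagged."""
    if window < 2:
        raise ValueError("window must be at least 2 days")
    anomalies = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            if daily_spend[i] > mu:  # any rise above a perfectly flat history
                anomalies.append(i)
        elif (daily_spend[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies
```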

Cost Review Cadence

Frequency   Activity
Daily       Check for anomalies and spikes
Weekly      Review top spending services and trends
Monthly     Team-level cost reviews, reservation coverage
Quarterly   Strategic planning, commitment purchases

Unit Economics

Track cost efficiency, not just total cost.

Example unit metrics:

  • Cost per transaction
  • Cost per active user
  • Cost per GB processed
  • Cost per API request

As your business scales, total cost should increase but cost per unit should decrease (economies of scale).
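Tracking unit cost needs only total cost and a business volume per period. The minimal sketch below uses made-up monthly figures and assumes transactions as the unit. Total cost rises each month while cost per transaction falls, which is the pattern described above. The function names are hypothetical.

```python
def unit_costs(monthly_cost: list, monthly_units: list) -> list:
    """Cost per unit for each period."""
    if len(monthly_cost) != len(monthly_units):
        raise ValueError("cost and unit series must be the same length")
    return [cost / units for cost, units in zip(monthly_cost, monthly_units)]


def scaling_efficiently(monthly_cost: list, monthly_units: list) -> bool:
    """True if cost per unit fell in every period, whatever total cost did."""
    per_unit = unit_costs(monthly_cost, monthly_units)
    return all(later < earlier for earlier, later in zip(per_unit, per_unit[1:]))


# Hypothetical figures: total cost grows 40% while volume doubles.
costs = [10_000, 12_000, 14_000]
transactions = [1_000_000, 1_500_000, 2_000_000]
print(unit_costs(costs, transactions))           # [0.01, 0.008, 0.007]
print(scaling_efficiently(costs, transactions))  # True
```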

Organisational Considerations

FinOps Team

A central FinOps function provides:

  • Tooling and dashboards
  • Best practices and training
  • Reserved instance/savings plan management
  • Governance and policy enforcement
  • Executive reporting

Engineering Culture

  • Include cost as a non-functional requirement
  • Add cost impact to architecture decision records
  • Make cost data visible to engineers
  • Celebrate cost savings alongside feature delivery
  • Consider cost in code reviews for infrastructure changes

Cloud Provider Tools

AWS

  • Cost Explorer — Visualise and analyse costs
  • Cost and Usage Reports (CUR) — Detailed billing data
  • Compute Optimizer — Right-sizing recommendations
  • Trusted Advisor — Optimisation recommendations
  • Savings Plans — Flexible commitment discounts

Google Cloud

  • Billing Reports — Cost visualisation
  • Recommender — Right-sizing and idle resource recommendations
  • Committed Use Discounts — 1- or 3-year commitment discounts (Google Cloud’s counterpart to reservations)
  • Active Assist — Optimisation recommendations

Azure

  • Cost Management — Visualisation and budgets
  • Advisor — Optimisation recommendations
  • Reservations — Commitment discounts
  • Azure Hybrid Benefit — Reuse existing Windows Server and SQL Server licences

Third-Party Tools

  • Infracost — Cost estimates for Terraform in CI/CD
  • OpenCost — Kubernetes cost monitoring (CNCF project)
  • Kubecost — Kubernetes cost management
  • CloudHealth — Multi-cloud cost management
  • Spot.io — Automated spot instance management
  • CAST AI — Kubernetes cost optimisation