Platform — A foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination. — E. Bottcher
Reliability & Operations
- Incident Management — On-call, runbooks, blameless postmortems
- Reliability Metrics — SLIs, SLOs, SLAs, error budgets
- Observability — Logs, metrics, traces, dashboards, alerting strategies
- Disaster Recovery — RTO/RPO, DR strategies, chaos engineering
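As a rough illustration of how an SLO turns into an error budget, the sketch below assumes a request-based availability SLI. The function names and the example figures are hypothetical and are not taken from the talk.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests that may fail while still meeting the SLO."""
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% SLO leaves no budget at all: any failure overspends it.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


# Hypothetical month: 99.9% SLO, one million requests, 250 failures.
budget = error_budget(0.999, 1_000_000)
remaining = budget_remaining(0.999, 1_000_000, 250)
print(f"budget: about {budget:.0f} failed requests, {remaining:.0%} remaining")
```

A team might slow down risky releases when the remaining fraction approaches zero; where exactly to set that threshold is a policy decision rather than something the arithmetic dictates.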
Infrastructure
Security & Governance
- Infrastructure Security — Zero trust, secrets management, supply chain security
- OPA (Open Policy Agent) — Policy as code
Cost Management
- Cost Optimisation — FinOps, tagging, reservations, waste elimination
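Tagging is usually what makes cost attribution possible, so one simple check is to list resources that lack required tags. The sketch below is a minimal example under assumptions: the required tag names and the resource inventory format are hypothetical, and a real setup would pull the inventory from a cloud provider's API.

```python
# Hypothetical tagging policy; adjust to whatever the organisation mandates.
REQUIRED_TAGS = {"owner", "cost-centre", "environment"}


def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Return required tags that are absent or blank on one resource."""
    return {tag for tag in REQUIRED_TAGS if not resource_tags.get(tag, "").strip()}


def untagged_resources(resources: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each non-compliant resource ID to the tags it is missing."""
    report = {}
    for resource_id, tags in resources.items():
        missing = missing_tags(tags)
        if missing:
            report[resource_id] = missing
    return report


# Hypothetical inventory keyed by resource ID.
inventory = {
    "vm-1": {"owner": "payments", "cost-centre": "cc-42", "environment": "prod"},
    "bucket-7": {"owner": "search", "environment": ""},
}
for resource_id, missing in untagged_resources(inventory).items():
    print(resource_id, "is missing:", ", ".join(sorted(missing)))
```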
Delivery
- DORA (Four Key Metrics): deployment frequency, lead time for changes, change failure rate, time to restore service
- CI/CD
- Deployment strategies: blue-green, canary, rolling
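To make the Four Key Metrics concrete, here is a minimal sketch that derives them from a list of deployment records. The `Deployment` record and its field names are hypothetical, and time to restore is measured from the moment of deployment as a simplification; real tooling would usually take these timestamps from the CI/CD system and the incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # whether it caused a production failure
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four metrics over a reporting period of `period_days` days."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Simplification: restore time is counted from the failing deployment.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Medians are used here rather than means so that a single unusually slow change does not dominate the figure; that choice is an assumption of this sketch, not a requirement of the metrics.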
Platform Metrics
HEAT framework:
- Happiness (Satisfaction)
- Efficiency
- Adoption
- Task success
Culture
You build it, you run it:
- Organise infrastructure code with application code so it’s not a black box
- Product teams are responsible for monitoring
- Anyone can request any role (privileges)
Shortening feedback loops:
- Branch previews
- Fast CI/CD pipelines
- Rapid rollback capabilities
Blameless culture:
- Clear incident process
- Learning from failures
- Platform team enables product teams
Source: https://www.linuxrecruit.co.uk/insight/devops-exchange-november-ft-accurx-playstation-and-onto