Infrastructure security encompasses the practices, tools, and processes used to protect your systems, networks, and data from threats. In a DevSecOps model, security is integrated throughout the development and operations lifecycle rather than bolted on at the end.
See also: Incident Management, Disaster Recovery, Networking.
Zero Trust Architecture
Traditional perimeter-based security treats everything inside the network boundary as trusted. Zero trust treats nothing as trusted by default, whether a request originates inside or outside the network.
Core Principles
- Never trust, always verify — Authenticate and authorise every request
- Assume breach — Design as if attackers are already inside
- Verify explicitly — Use all available data points (identity, location, device, behaviour)
- Least privilege access — Grant minimum necessary permissions
- Micro-segmentation — Limit blast radius with fine-grained network controls
Implementing Zero Trust
- Strong identity verification for all users and services
- Device health validation before granting access
- Micro-segmentation of networks and workloads
- Encryption of all data in transit
- Continuous monitoring and validation
- Just-in-time and just-enough access
Identity and Access Management (IAM)
Principles
Least privilege: Grant only the permissions needed to perform a task, nothing more.
Separation of duties: Divide responsibilities so no single person can compromise the system (e.g., developers can’t deploy to production without review).
Defence in depth: Multiple layers of controls; if one fails, others still protect.
Authentication
Multi-factor authentication (MFA): Require at least two independent factors: something you know (password), something you have (token, phone), or something you are (biometrics).
- Enforce MFA for all human users, especially privileged access
- Use hardware tokens (YubiKey) for highest security
- Avoid SMS-based MFA where possible (vulnerable to SIM swapping)
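To make the "something you have" factor concrete, the following is a minimal sketch of how an authenticator app or token can derive a time-based one-time password (TOTP, RFC 6238) from a shared secret. The function names `hotp` and `totp` are illustrative, and a real deployment would normally rely on a maintained library rather than hand-rolled code.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant: the counter is the number of elapsed time steps."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)
```

Because the code depends only on the shared secret and the clock, it never travels over the phone network, which is why it avoids the SIM-swapping weakness of SMS codes.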
Service-to-service authentication:
- Use short-lived credentials (tokens, certificates)
- Mutual TLS (mTLS) between services
- Avoid long-lived API keys where possible
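As a rough illustration of short-lived credentials, the sketch below issues an HMAC-signed token that carries its own expiry and rejects it once that time has passed. The token layout, the function names `issue_token` and `verify_token`, and the five-minute default are assumptions made for the example, not a standard format; in practice a platform-issued identity (for example mTLS certificates or cloud workload identities) would usually fill this role.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def issue_token(key: bytes, service: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Create a signed token naming the calling service and its expiry time."""
    issued = time.time() if now is None else now
    payload = json.dumps({"sub": service, "exp": issued + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    signature = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify_token(key: bytes, token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the service name if the token is authentic and unexpired, else None."""
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return claims["sub"] if current < claims["exp"] else None
```

A stolen token of this kind is only useful until it expires, which is the main advantage over a long-lived API key.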
Authorisation
Role-Based Access Control (RBAC):
- Assign permissions to roles, assign roles to users
- Easier to manage than individual permissions
- Common in Kubernetes, cloud IAM
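The role indirection can be sketched in a few lines. The role names, permission strings, and users below are hypothetical and only illustrate the shape of the lookup.

```python
from typing import Dict, Set

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"deployments:read"},
    "deployer": {"deployments:read", "deployments:update"},
}
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"deployer"},
    "bob": {"viewer"},
}


def is_allowed(user: str, permission: str) -> bool:
    """Allow only if one of the user's roles grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Changing what deployers may do means editing one role entry rather than every user, which is where the management benefit comes from.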
Attribute-Based Access Control (ABAC):
- Permissions based on attributes (user department, resource owner, time of day)
- More flexible but more complex
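For comparison, an attribute-based decision evaluates properties of the request rather than a role membership. The policy below (same department, working hours only) and the `Request` fields are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_department: str
    resource_owner_department: str
    hour_utc: int  # hour of day, 0-23


def abac_allows(req: Request) -> bool:
    """Hypothetical policy: same department, and only between 08:00 and 17:59 UTC."""
    same_department = req.user_department == req.resource_owner_department
    within_hours = 8 <= req.hour_utc < 18
    return same_department and within_hours
```

Each new attribute adds another condition to reason about and test, which is the complexity cost noted above.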
Policy as Code:
- Define access policies in code (e.g., OPA/Rego)
- Version control and review policies like application code
- Test policies before deployment
Privileged Access Management
- Use just-in-time (JIT) access — grant elevated permissions only when needed
- Require approval workflows for sensitive operations
- Time-bound access — automatically revoke after a period
- Audit all privileged access
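The four points above can be combined in a small sketch: access is granted only with a second person's approval, expires on its own, and every grant and check is recorded. The `JitAccess` class and its in-memory storage are assumptions for illustration; a real system would sit on top of an identity provider and durable audit storage.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class JitAccess:
    grants: Dict[Tuple[str, str], float] = field(default_factory=dict)  # (user, role) -> expiry
    audit_log: List[str] = field(default_factory=list)

    def grant(self, user: str, role: str, approver: str, ttl_seconds: int,
              now: Optional[float] = None) -> None:
        """Record a time-bound grant; the approver must be a different person."""
        if approver == user:
            raise PermissionError("requests must be approved by someone else")
        start = time.time() if now is None else now
        self.grants[(user, role)] = start + ttl_seconds
        self.audit_log.append(f"grant user={user} role={role} approver={approver}")

    def has_role(self, user: str, role: str, now: Optional[float] = None) -> bool:
        """True only while the grant is unexpired; every check is audited."""
        current = time.time() if now is None else now
        expiry = self.grants.get((user, role))
        active = expiry is not None and current < expiry
        self.audit_log.append(f"check user={user} role={role} active={active}")
        return active
```

Revocation needs no separate job here: once the expiry passes, `has_role` simply stops returning true.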
Secrets Management
Secrets include API keys, database credentials, certificates, encryption keys, and tokens.
Bad Practices
- Hardcoding secrets in source code
- Storing secrets in environment variables in plain text
- Committing secrets to version control
- Sharing secrets via email or chat
- Using the same secret across environments
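Hardcoded and committed secrets can often be caught before they reach the repository. The following is a deliberately small sketch of a pattern-based scan; the three patterns are illustrative and far from complete, and dedicated secret scanners use much larger rule sets plus entropy checks.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"(?i)\bpassword\s*=\s*['\"][^'\"]+['\"]"),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for each suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits
```

A check like this could run as a pre-commit hook or CI step, failing when `find_secrets` returns anything.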
Good Practices
- Use a dedicated secrets manager
- Rotate secrets regularly (and automatically)
- Audit secret access
- Use different secrets per environment
- Encrypt secrets at rest and in transit
Secrets Management Tools
- HashiCorp Vault — Industry standard, feature-rich
- AWS Secrets Manager / AWS Systems Manager Parameter Store
- Google Secret Manager
- Azure Key Vault
- Doppler — Developer-friendly secrets management
- SOPS — Encrypted files in version control
Kubernetes Secrets
Native Kubernetes Secrets are only base64-encoded and, by default, stored unencrypted in etcd. Enhance security with:
- Sealed Secrets — Encrypt secrets for GitOps
- External Secrets Operator — Sync from external vaults
- Enable encryption at rest for etcd
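A short demonstration of why base64 offers no protection: the encoding is reversible by anyone, with no key involved. The secret value is made up for the example.

```python
import base64

# The value as it would appear under `data:` in a Secret manifest (hypothetical secret).
encoded = base64.b64encode(b"s3cr3t-password").decode()

# Anyone who can read the manifest or the etcd record can reverse it without a key.
decoded = base64.b64decode(encoded).decode()
```

This is why access to Secret objects needs to be restricted with RBAC even when an external secrets tool is in use.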
Network Security
Network Segmentation
Divide networks into zones with controlled traffic between them.
Typical zones:
- Public — Internet-facing (load balancers, CDN)
- DMZ — Semi-trusted (web servers, API gateways)
- Private — Internal services (application servers)
- Restricted — Sensitive data (databases, secrets)
Firewall and Security Groups
- Default deny — only allow explicitly permitted traffic
- Use security groups/NSGs to control traffic between resources
- Review rules regularly; remove unused rules
- Log denied traffic for security analysis
See AWS Security Groups for cloud-specific guidance.
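The default-deny model can be sketched as a rule evaluator in which traffic passes only when an explicit allow rule matches, and every denial is logged. The `AllowRule` type and the log format are assumptions for illustration and do not mirror any particular cloud provider's API.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AllowRule:
    source_cidr: str
    port: int


def is_permitted(rules: List[AllowRule], source_ip: str, port: int,
                 denied_log: List[str]) -> bool:
    """Default deny: traffic passes only if an explicit allow rule matches."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        if port == rule.port and address in ipaddress.ip_network(rule.source_cidr):
            return True
    denied_log.append(f"DENY src={source_ip} port={port}")  # keep denials for analysis
    return False
```

There is no deny rule anywhere in the sketch: anything not listed falls through to the final `return False`.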
Web Application Firewall (WAF)
Protect web applications from common attacks:
- SQL injection
- Cross-site scripting (XSS)
- OWASP Top 10 vulnerabilities
Services: AWS WAF, Cloudflare WAF, Azure WAF, Fastly
DDoS Protection
- Use CDN providers with built-in DDoS mitigation
- Enable cloud provider DDoS protection (AWS Shield, Google Cloud Armor)
- Implement rate limiting
- Design for horizontal scaling
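Rate limiting is commonly implemented with a token bucket, sketched below. This single-process version is only an illustration; the class name and parameters are assumptions, and a production limiter would usually keep its state in a shared store or be enforced at the gateway or CDN.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: int, now: Optional[float] = None):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; otherwise reject the request."""
        current = time.monotonic() if now is None else now
        elapsed = current - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Keeping one bucket per client (for example per API key or source address) limits abusive callers without throttling everyone else.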
Private Connectivity
- Use private endpoints/Private Link to access cloud services without traversing the public internet
- VPN or Direct Connect for on-premises connectivity
- VPC peering or transit gateways for cross-account/cross-region connectivity
Supply Chain Security
Software Bill of Materials (SBOM)
An SBOM lists all components (dependencies, libraries) in your software. Essential for:
- Knowing what you’re running
- Responding quickly to vulnerabilities (e.g., Log4Shell)
- Compliance requirements
Dependency Scanning
Automatically scan dependencies for known vulnerabilities:
- Scan in CI/CD pipelines
- Fail builds for critical vulnerabilities
- Monitor continuously (new CVEs affect existing code)
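A build gate of this kind can be as simple as mapping scanner findings to an exit code. The finding format below (a list of dicts with `id` and `severity`) is an assumption for the sketch; each scanner has its own report schema that would need to be parsed first.

```python
from typing import Dict, List

SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def blocking_findings(findings: List[Dict[str, str]],
                      fail_on: str = "CRITICAL") -> List[Dict[str, str]]:
    """Return the findings at or above the fail_on severity."""
    threshold = SEVERITY_RANK[fail_on]
    return [f for f in findings
            if SEVERITY_RANK.get(f["severity"].upper(), 0) >= threshold]


def exit_code(findings: List[Dict[str, str]], fail_on: str = "CRITICAL") -> int:
    """Non-zero exit code fails the pipeline step."""
    return 1 if blocking_findings(findings, fail_on) else 0
```

Starting with `fail_on="CRITICAL"` and tightening to `"HIGH"` later is one way to introduce the gate without blocking every build on day one.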
Tools:
- Dependabot (GitHub)
- Snyk
- Trivy
- Grype
Container Image Security
Build:
- Use minimal base images (distroless, Alpine) or Docker Hardened Images
- Don’t run as root
- Pin image versions (avoid the latest tag)
- Scan images for vulnerabilities
- Sign images to verify authenticity
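Whether an image reference is pinned can be checked mechanically, for example in CI. The helper below is a simplified sketch with a hypothetical name, `is_pinned`; it does not implement the full image reference grammar.

```python
def is_pinned(image: str) -> bool:
    """True if the reference has a digest, or an explicit tag other than 'latest'."""
    if "@sha256:" in image:
        return True
    last_segment = image.rsplit("/", 1)[-1]  # avoid mistaking a registry port for a tag
    if ":" not in last_segment:
        return False  # no tag means the runtime falls back to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "latest"
```

Tags can be moved to point at a different image after the fact, so a digest reference is the stronger form of pinning.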
Runtime:
- Use read-only filesystems where possible
- Drop unnecessary capabilities
- Enforce image policies (only allow signed/scanned images)
Tools:
- Trivy — Vulnerability scanning
- Cosign — Container signing
- Kyverno / OPA Gatekeeper — Policy enforcement
Code Signing
Sign commits and artefacts to ensure integrity and authenticity:
- Sign Git commits with GPG or SSH keys
- Sign container images
- Sign Helm charts and other deployment artefacts
Vulnerability Management
Scanning
What to scan:
- Application code (SAST — Static Application Security Testing)
- Dependencies (SCA — Software Composition Analysis)
- Container images
- Infrastructure as Code (misconfigurations)
- Cloud configurations
- Running systems (DAST — Dynamic Application Security Testing)
When to scan:
- In CI/CD pipelines (shift left)
- Continuously in production (runtime scanning)
- Before major releases
Triage and Prioritisation
Not all vulnerabilities are equal. Prioritise based on:
- Severity — CVSS score
- Exploitability — Is there a known exploit? Is it easy to exploit?
- Exposure — Is the vulnerable component reachable from the internet?
- Business impact — What could an attacker do if they exploited it?
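One way to turn these four factors into an ordering is a simple additive score. The weights and the 1 to 3 business impact scale below are invented for illustration and would need tuning to an organisation's own risk appetite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    identifier: str
    cvss: float            # 0.0-10.0
    known_exploit: bool
    internet_exposed: bool
    business_impact: int   # 1 (low) to 3 (high), a hypothetical internal scale


def priority(finding: Finding) -> float:
    """Hypothetical weighting: CVSS as the base, boosted by context."""
    score = finding.cvss
    if finding.known_exploit:
        score += 3
    if finding.internet_exposed:
        score += 2
    return score + finding.business_impact
```

With these weights, an exploitable, internet-facing issue with a CVSS of 7.5 outranks an unexposed 9.8, which reflects the point that severity alone is not enough.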
Patch Management
- Automate patching where possible (auto-merge Dependabot updates, auto-update base images)
- Define SLAs for patching (e.g., critical within 24 hours, high within 7 days)
- Test patches before production deployment
- Track patch compliance
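Patch SLAs are easier to track when due dates are computed rather than remembered. In the sketch below, the critical and high windows come from the example SLAs above, while the medium and low windows are hypothetical placeholders.

```python
from datetime import datetime, timedelta

# Critical and high follow the example SLAs above; medium and low are hypothetical.
PATCH_SLA = {
    "critical": timedelta(hours=24),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}


def patch_due(severity: str, discovered: datetime) -> datetime:
    """Deadline by which a finding of this severity should be patched."""
    return discovered + PATCH_SLA[severity.lower()]


def is_overdue(severity: str, discovered: datetime, now: datetime) -> bool:
    """True once the SLA window has elapsed without a patch."""
    return now > patch_due(severity, discovered)
```

Counting the findings for which `is_overdue` is true gives a simple patch compliance metric.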
Tools
- Trivy — All-in-one scanner (containers, IaC, SBOM)
- Snyk — Developer-first security platform
- Checkov — IaC security scanning
- tfsec — Terraform security scanner
- Semgrep — Static analysis
- SonarQube — Code quality and security
Compliance and Governance
Common Frameworks
- SOC 2 — Trust service criteria for service organisations
- ISO 27001 — Information security management
- PCI DSS — Payment card industry standards
- HIPAA — Healthcare data protection (US)
- GDPR — Data protection (EU)
- FedRAMP — US government cloud security
Policy as Code
Codify compliance requirements:
- Use OPA or Kyverno for policy enforcement
- Scan infrastructure with Checkov, tfsec
- Block non-compliant deployments in CI/CD
Audit Logging
- Log all access to sensitive systems and data
- Log administrative actions
- Centralise logs in immutable storage
- Retain logs per compliance requirements
- Enable cloud provider audit trails (CloudTrail, GCP Audit Logs, Azure Activity Log)
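Where truly immutable storage is not available, tampering can at least be made detectable by chaining entries together with hashes. The sketch below is one such approach, with hypothetical function names; it detects modification but does not prevent it, so it complements write-once storage rather than replacing it.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict[str, str]) -> str:
    record = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(record.encode()).hexdigest()


def append_entry(log: List[dict], event: Dict[str, str]) -> None:
    """Append an event whose hash covers both the event and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Periodically storing the latest hash somewhere separate makes it harder for an attacker to rewrite the whole chain unnoticed.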
Kubernetes Security
Cluster Hardening
- Keep Kubernetes up to date
- Enable RBAC (disable ABAC)
- Encrypt etcd data at rest
- Restrict access to the API server
- Use network policies to control pod-to-pod traffic
- Enable audit logging
Pod Security
- Don’t run containers as root
- Use read-only root filesystems
- Drop all capabilities, add only what’s needed
- Enforce Pod Security Standards (prefer the restricted profile, or baseline at minimum; the privileged profile applies no restrictions)
- Set resource limits to prevent denial of service
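Several of these settings can be checked automatically against a manifest before it is deployed. The sketch below inspects a single container spec that has already been parsed into a dict; it only looks at container-level fields, so settings inherited from the pod-level `securityContext` would be reported as missing. Admission controllers do this job more thoroughly.

```python
from typing import List


def container_violations(container: dict) -> List[str]:
    """Check one container spec (a parsed manifest dict) against the list above."""
    ctx = container.get("securityContext", {})
    problems = []
    if ctx.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot is not true")
    if ctx.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem is not true")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        problems.append("capabilities.drop does not include ALL")
    if not container.get("resources", {}).get("limits"):
        problems.append("no resource limits set")
    return problems
```

An empty result means the container passes these four checks, not that it is secure in every respect.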
Network Policies
Control traffic between pods:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress

Start with default deny, then allow specific traffic.
Runtime Security
Detect threats at runtime:
- Falco — Runtime security and threat detection
- Tetragon — eBPF-based security observability
- KubeArmor — Runtime security enforcement
Incident Response
See Incident Management for general incident response practices.
Security-specific considerations:
- Have a dedicated security incident process
- Know when to involve legal, PR, and executive leadership
- Preserve evidence for forensics
- Have containment strategies ready (isolate affected systems)
- Plan for disclosure and communication