Infrastructure Security

Infrastructure security encompasses the practices, tools, and processes used to protect your systems, networks, and data from threats. In a DevSecOps model, security is integrated throughout the development and operations lifecycle rather than bolted on at the end.

Zero Trust Architecture

Traditional security assumes everything inside the network perimeter is trusted. Zero trust assumes nothing is trusted by default.

Core Principles

Never trust, always verify — Authenticate and authorise every request
Assume breach — Design as if attackers are already inside
Verify explicitly — Use all available data points (identity, location, device, behaviour)
Least privilege access — Grant minimum necessary permissions
Micro-segmentation — Limit blast radius with fine-grained network controls

Implementing Zero Trust

Strong identity verification for all users and services
Device health validation before granting access
Micro-segmentation of networks and workloads
Encryption of all data in transit
Continuous monitoring and validation
Just-in-time and just-enough access
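To make "never trust, always verify" concrete, here is a minimal Python sketch of a per-request access decision. The `AccessRequest` fields, the signals checked, and the `ALLOWED_SCOPES` table are hypothetical; a real deployment would get these signals from an identity provider, a device-management service, and the service mesh. Network location is recorded but deliberately never grants access on its own.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical least-privilege grants: identity -> scopes it may use.
ALLOWED_SCOPES = {
    "billing-api": {"invoices:read", "invoices:write"},
    "reporting-job": {"invoices:read"},
}

@dataclass(frozen=True)
class AccessRequest:
    identity: str                 # authenticated principal (user or service)
    strongly_authenticated: bool  # MFA for people, client certificate for services
    device_healthy: bool          # device or workload passed posture checks
    encrypted: bool               # request arrived over TLS
    scope: str                    # permission being exercised
    from_internal_network: bool   # recorded for monitoring; never grants trust

def decide(req: AccessRequest) -> Tuple[bool, str]:
    """Verify every request explicitly and deny unless all checks pass."""
    if not req.strongly_authenticated:
        return False, "identity not strongly verified"
    if not req.device_healthy:
        return False, "device failed health validation"
    if not req.encrypted:
        return False, "traffic is not encrypted"
    if req.scope not in ALLOWED_SCOPES.get(req.identity, set()):
        return False, "scope exceeds least-privilege grant"
    return True, "allowed"
```

A request from inside the network with an unhealthy device is still denied, which is the practical difference from perimeter-based trust.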

Identity and Access Management (IAM)

Principles

Least privilege: Grant only the permissions needed to perform a task, nothing more.

Separation of duties: Divide responsibilities so no single person can compromise the system (e.g., developers can’t deploy to production without review).

Defense in depth: Layer multiple independent controls so that if one fails, the others still provide protection.
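Separation of duties is often enforced as a simple pipeline rule: a change cannot reach production unless someone other than its author approved it. The sketch below assumes a hypothetical `Change` record; in practice the same rule is usually configured in the code-review or CI/CD platform rather than written by hand.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class Change:
    author: str
    approvers: Set[str] = field(default_factory=set)

def can_deploy_to_production(change: Change, required_approvals: int = 1) -> bool:
    """Count only approvals from people other than the author."""
    independent = change.approvers - {change.author}
    return len(independent) >= required_approvals
```

Self-approval is simply discarded, so an author cannot satisfy the rule alone no matter how the approval list is populated.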

Authentication

Multi-factor authentication (MFA): Require at least two different types of factor: something you know (password), something you have (hardware token, phone), or something you are (biometrics).

Enforce MFA for all human users, especially privileged access
Use hardware security keys (e.g., YubiKey) for the highest assurance
Avoid SMS-based MFA where possible (vulnerable to SIM swapping)
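Authenticator apps generally implement time-based one-time passwords (TOTP, RFC 6238), which derive a short code from a shared secret and the current 30-second time step. The standard-library sketch below is for illustration only; in production, rely on your identity provider or a vetted library. The `window` parameter, which tolerates one step of clock drift either side, is an assumption about typical server behaviour.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP over the number of elapsed time steps."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)

def verify_totp(key: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes from the current step or `window` steps either side."""
    now = time.time() if at is None else at
    current = int(now // step)
    for counter in range(current - window, current + window + 1):
        if counter < 0:
            continue
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(hotp(key, counter, digits), submitted):
            return True
    return False
```

With the RFC 6238 test key `b"12345678901234567890"`, `totp(key, at=59, digits=8)` returns `"94287082"`, matching the RFC's SHA-1 test vector. TOTP codes can still be phished, which is one reason hardware keys rank higher.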

Service-to-service authentication:

Use short-lived credentials (tokens, certificates)
Mutual TLS (mTLS) between services
Avoid long-lived API keys where possible
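Short-lived credentials limit how long a stolen token stays useful. The sketch below issues and verifies an HMAC-signed token with an embedded expiry, using only the standard library. The token format, how the signing key is distributed, and the five-minute default lifetime are assumptions for illustration; real systems usually rely on a standard format such as JWT, or on certificates from a workload-identity system, rather than a hand-rolled scheme.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

def _b64decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(key: bytes, payload: str) -> str:
    return _b64encode(hmac.new(key, payload.encode(), hashlib.sha256).digest())

def issue_token(key: bytes, service: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Issue a token for `service` that expires `ttl_seconds` after `now`."""
    issued_at = time.time() if now is None else now
    claims = {"sub": service, "exp": issued_at + ttl_seconds}
    payload = _b64encode(json.dumps(claims).encode())
    return f"{payload}.{_sign(key, payload)}"

def verify_token(key: bytes, token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the service name if the token is authentic and unexpired, else None."""
    payload, _, signature = token.partition(".")
    # Check the signature before trusting anything inside the payload.
    if not hmac.compare_digest(_sign(key, payload).encode(), signature.encode()):
        return None
    claims = json.loads(_b64decode(payload))
    current = time.time() if now is None else now
    return claims["sub"] if current < claims["exp"] else None
```

Because expiry is enforced on every verification, rotating the signing key and keeping lifetimes short bound the damage from a leaked token.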

Authorisation

Role-Based Access Control (RBAC):

Assign permissions to roles, assign roles to users
Easier to manage than individual permissions
Common in Kubernetes and cloud IAM
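The indirection RBAC relies on (permissions attached to roles, roles attached to users) fits in a few lines. The role names and `resource:verb` permission strings below are hypothetical, loosely modelled on Kubernetes verbs; in Kubernetes itself this is expressed with `Role`/`ClusterRole` and `RoleBinding` objects rather than application code.

```python
from typing import Dict, Set

# Hypothetical roles: each role bundles the permissions it needs.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"pods:get", "pods:list"},
    "developer": {"pods:get", "pods:list", "deployments:update"},
    "admin": {"pods:get", "pods:list", "deployments:update", "secrets:get"},
}

# Users are given roles, never individual permissions.
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"developer"},
    "bob": {"viewer"},
}

def is_allowed(user: str, permission: str) -> bool:
    """A user may act if any of their roles includes the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Changing what developers may do means editing one role, not every developer's account, which is why RBAC is easier to manage.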

Attribute-Based Access Control (ABAC):

Permissions based on attributes (user department, resource owner, time of day)
More flexible but more complex
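ABAC decisions combine attributes of the subject, the resource, and the environment. This sketch uses the three attributes mentioned above (department, resource owner, time of day); the rule itself and the 08:00 to 18:00 window are invented for illustration.

```python
from dataclasses import dataclass
from datetime import time

@dataclass(frozen=True)
class Subject:
    user: str
    department: str

@dataclass(frozen=True)
class Resource:
    owner: str
    department: str

BUSINESS_HOURS = (time(8, 0), time(18, 0))  # assumed policy window

def can_edit(subject: Subject, resource: Resource, at: time) -> bool:
    """Owners may always edit; same-department colleagues only during business hours."""
    if subject.user == resource.owner:
        return True
    start, end = BUSINESS_HOURS
    return subject.department == resource.department and start <= at < end
```

Even this small rule is harder to audit than a role lookup: answering "who can edit this document?" now depends on the clock.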

Policy as Code:

Define access policies in code (e.g., OPA/Rego)
Version control and review policies like application code
Test policies before deployment

Privileged Access Management

Use just-in-time (JIT) access — grant elevated permissions only when needed
Require approval workflows for sensitive operations
Time-bound access — automatically revoke after a period
Audit all privileged access
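Just-in-time, time-bound access can be modelled as grants that carry their own expiry, checked on every use and logged. The `AccessBroker` class, the one-hour cap, and the rule that requesters cannot approve their own requests are assumptions for illustration; dedicated PAM tools implement the same ideas with many more safeguards.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    approved_by: str
    expires_at: datetime

@dataclass
class AccessBroker:
    """Hypothetical broker issuing time-bound elevated access with an audit trail."""
    max_duration: timedelta = timedelta(hours=1)
    grants: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def request(self, user: str, role: str, approver: str,
                duration: timedelta, now: datetime) -> Grant:
        if approver == user:
            raise PermissionError("elevated access needs a second person's approval")
        # Never grant longer than the policy maximum.
        grant = Grant(user, role, approver, now + min(duration, self.max_duration))
        self.grants.append(grant)
        self.audit_log.append((now, "granted", user, role, approver))
        return grant

    def has_access(self, user: str, role: str, now: datetime) -> bool:
        # Revoke expired grants automatically before checking.
        for g in [g for g in self.grants if g.expires_at <= now]:
            self.grants.remove(g)
            self.audit_log.append((now, "revoked", g.user, g.role, "expired"))
        allowed = any(g.user == user and g.role == role for g in self.grants)
        self.audit_log.append((now, "checked", user, role, allowed))
        return allowed
```

A four-hour request is silently capped at one hour, so a forgotten grant cannot linger.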

Secrets Management

Secrets include API keys, database credentials, certificates, encryption keys, and tokens.

Bad Practices

Hardcoding secrets in source code
Storing secrets in environment variables in plain text
Committing secrets to version control
Sharing secrets via email or chat
Using the same secret across environments
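Secret scanners catch hardcoded and committed secrets by matching known credential formats, typically as a pre-commit hook or CI step. The sketch below checks text for two illustrative patterns: AWS access key IDs (`AKIA` followed by 16 upper-case letters or digits) and PEM private-key headers. Dedicated scanners ship far more rules, plus entropy checks for unstructured secrets.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners use many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for each line that looks like it holds a secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

For example, a config line containing AWS's documented example key `AKIAIOSFODNN7EXAMPLE` would be reported as `aws_access_key_id`.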

Good Practices

Use a dedicated secrets manager
Rotate secrets regularly (and automatically)
Audit secret access
Use different secrets per environment
Encrypt secrets at rest and in transit

Secrets Management Tools

HashiCorp Vault — Industry standard, feature-rich
AWS Secrets Manager / AWS Systems Manager Parameter Store
Google Secret Manager
Azure Key Vault
Doppler — Developer-friendly secrets management
SOPS — Encrypted files in version control

Kubernetes Secrets

Native Kubernetes secrets are base64-encoded, not encrypted. Enhance security with:

Sealed Secrets — Encrypt secrets for GitOps
External Secrets Operator — Sync from external vaults
Enable encryption at rest for etcd
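Because base64 is an encoding rather than encryption, anyone who can read a Secret object can recover its values without any key. The snippet below decodes a `data` map exactly as it appears in a Secret manifest or `kubectl get secret -o yaml` output; the credentials are made up.

```python
import base64
from typing import Dict

def decode_secret_data(data: Dict[str, str]) -> Dict[str, str]:
    """Decode every value in a Secret's `data` map; no key or password is needed."""
    return {k: base64.b64decode(v).decode("utf-8") for k, v in data.items()}

# A Secret's data section (made-up credentials).
secret_data = {
    "username": base64.b64encode(b"app_user").decode(),
    "password": base64.b64encode(b"s3cr3t-db-password").decode(),
}
print(decode_secret_data(secret_data))
```

This is why read access to Secrets should be restricted with RBAC, in addition to the measures listed above.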

Network Security

Network Segmentation

Divide networks into zones with controlled traffic between them.

Typical zones:

Public — Internet-facing (load balancers, CDN)
DMZ — Semi-trusted (web servers, API gateways)
Private — Internal services (application servers)
Restricted — Sensitive data (databases, secrets)

Firewall and Security Groups

Default deny — only allow explicitly permitted traffic
Use security groups/NSGs to control traffic between resources
Review rules regularly; remove unused rules
Log denied traffic for security analysis

See AWS Security Groups for cloud-specific guidance.
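Default-deny evaluation is simple to express: traffic is allowed only if some explicit rule matches, and everything else is dropped and logged. The rule structure below (source CIDR, protocol, port range) is a simplified, hypothetical model, not any cloud provider's API; it uses Python's standard `ipaddress` module for CIDR matching.

```python
import ipaddress
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("firewall")

@dataclass(frozen=True)
class Rule:
    source_cidr: str
    protocol: str
    from_port: int
    to_port: int

# Hypothetical rules for a database tier: PostgreSQL from the app subnet,
# SSH only from a single bastion host. Nothing else is listed, so nothing else is allowed.
DB_TIER_RULES = [
    Rule("10.0.2.0/24", "tcp", 5432, 5432),
    Rule("10.0.9.10/32", "tcp", 22, 22),
]

def is_permitted(source_ip: str, protocol: str, port: int, rules=DB_TIER_RULES) -> bool:
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (addr in ipaddress.ip_network(rule.source_cidr)
                and protocol == rule.protocol
                and rule.from_port <= port <= rule.to_port):
            return True
    # Default deny: no explicit rule matched, so log it for later analysis.
    log.info("DENY src=%s proto=%s port=%d", source_ip, protocol, port)
    return False
```

Unused rules show up as entries that never match in the allow path, which is one way to find candidates for removal during reviews.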

Web Application Firewall (WAF)

Protect web applications from common attacks:

SQL injection
Cross-site scripting (XSS)
OWASP Top 10 vulnerabilities

Services: AWS WAF, Cloudflare WAF, Azure WAF, Fastly

DDoS Protection

Use CDN providers with built-in DDoS mitigation
Enable cloud provider DDoS protection (AWS Shield, Google Cloud Armor)
Implement rate limiting
Design for horizontal scaling
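Rate limiting is commonly implemented with a token bucket: each client's bucket refills at a steady rate and each request spends one token, so short bursts are tolerated while sustained floods are rejected. This single-process sketch uses arbitrary capacity and rate values; a distributed deployment would keep counters in shared storage or rely on the CDN's or WAF's built-in limits.

```python
import time
from typing import Callable, Dict

class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float]):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class RateLimiter:
    """One bucket per client key, such as a source IP or API key."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        if client not in self.buckets:
            self.buckets[client] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[client].allow()
```

With `capacity=3` and `rate=1`, a client can make three requests at once, then roughly one per second. The `buckets` dictionary grows with every new client, so a real service would also evict idle buckets.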

Private Connectivity

Use private endpoints/Private Link to reach cloud services without traversing the public internet
VPN or Direct Connect for on-premises connectivity
VPC peering or transit gateways for cross-account/cross-region connectivity

Supply Chain Security

Software Bill of Materials (SBOM)

An SBOM lists all components (dependencies, libraries) in your software. Essential for:

Knowing what you’re running
Responding quickly to vulnerabilities (e.g., Log4Shell)
Compliance requirements

Tools: Syft, Trivy, CycloneDX
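Once an SBOM exists, answering "are we affected?" becomes a query rather than an investigation. The sketch below reads a CycloneDX JSON SBOM (its `components` list with `group`, `name`, and `version` fields) and reports components matching an advisory. The advisory is a simplified, hypothetical stand-in that flags `log4j-core` from 2.0 up to, but not including, 2.15.0, the first release addressing Log4Shell (CVE-2021-44228); real advisories list exact ranges, including later follow-up fixes.

```python
import json
from typing import List, Tuple

# Simplified, hypothetical advisory for illustration only.
ADVISORY = {
    "id": "CVE-2021-44228",
    "group": "org.apache.logging.log4j",
    "name": "log4j-core",
    "introduced": (2, 0),
    "fixed": (2, 15, 0),
}

def parse_version(version: str) -> Tuple[int, ...]:
    """Numeric release part only, e.g. '2.14.1' -> (2, 14, 1); suffixes after '-' ignored."""
    return tuple(int(part) for part in version.split("-")[0].split("."))

def affected_components(sbom_json: str, advisory=ADVISORY) -> List[str]:
    sbom = json.loads(sbom_json)
    hits = []
    for comp in sbom.get("components", []):
        if comp.get("group") == advisory["group"] and comp.get("name") == advisory["name"]:
            version = parse_version(comp.get("version", "0"))
            if advisory["introduced"] <= version < advisory["fixed"]:
                hits.append(f'{comp["name"]}@{comp["version"]}')
    return hits
```

Running the same query across every service's stored SBOM gives an inventory of exposure within minutes; version strings that are not purely numeric would need a more robust parser.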

Dependency Scanning

Automatically scan dependencies for known vulnerabilities:

Scan in CI/CD pipelines
Fail builds for critical vulnerabilities
Monitor continuously (new CVEs affect existing code)
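A CI step that fails the build on critical findings can be a short script over the scanner's JSON report. This sketch assumes a report shaped like Trivy's JSON output (`Results[].Vulnerabilities[]` with `VulnerabilityID`, `PkgName`, and `Severity` fields); other scanners use different schemas, and many, Trivy included, can enforce a severity threshold through their own exit-code options.

```python
import json
from typing import Dict, List

BLOCKING_SEVERITIES = {"CRITICAL"}

def blocking_findings(report: Dict, blocking=BLOCKING_SEVERITIES) -> List[str]:
    """Collect findings whose severity should fail the build."""
    findings = []
    for result in report.get("Results", []):
        # "Vulnerabilities" can be absent or null for targets with no findings.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(f'{vuln["VulnerabilityID"]} ({vuln["PkgName"]})')
    return findings

def gate(report_path: str) -> int:
    """Return a process exit code: 1 fails the pipeline step, 0 lets it pass."""
    with open(report_path) as fh:
        findings = blocking_findings(json.load(fh))
    for finding in findings:
        print(f"BLOCKING: {finding}")
    return 1 if findings else 0
```

A pipeline step would call `sys.exit(gate("report.json"))`. Adding `"HIGH"` to `BLOCKING_SEVERITIES` tightens the gate without other changes.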

Tools:

Container Image Security

Build:

Use minimal base images (distroless, Alpine) or Docker Hardened Images
Don’t run as root
Pin image versions (avoid the latest tag; pin by digest for immutability)
Scan images for vulnerabilities
Sign images to verify authenticity
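Some build rules can be checked before an image is ever built. The naive sketch below flags two of them in a Dockerfile: `FROM` lines that use the `latest` tag or no tag at all, and a final `USER` that is missing or root (the container would then run as root unless the base image sets another user). It ignores build arguments and other subtleties; dedicated Dockerfile linters handle these properly.

```python
from typing import List

def lint_dockerfile(text: str) -> List[str]:
    """Naive checks for unpinned base images and root users (illustration only)."""
    problems = []
    stages = set()
    last_user = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        args = parts[1] if len(parts) > 1 else ""
        if instruction == "FROM":
            tokens = [t for t in args.split() if not t.startswith("--")]  # skip --platform etc.
            if not tokens:
                continue
            image = tokens[0]
            if len(tokens) >= 3 and tokens[1].upper() == "AS":
                stages.add(tokens[2])
            if image == "scratch" or image in stages or "@sha256:" in image:
                continue  # empty base, earlier build stage, or pinned by digest
            name = image.rsplit("/", 1)[-1]  # drop registry/path, keep name[:tag]
            if ":" not in name or name.endswith(":latest"):
                problems.append(f"line {lineno}: unpinned base image {image}")
        elif instruction == "USER":
            last_user = args.strip()
    # The last USER instruction decides who the container runs as.
    if last_user is None or last_user.split(":")[0] in {"root", "0"}:
        problems.append("container runs as root (no non-root USER)")
    return problems
```

A multi-stage build that compiles in a pinned builder image and ships a tagged distroless image with `USER nonroot` passes both checks.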

Runtime:

Use read-only filesystems where possible
Drop unnecessary capabilities
Enforce image policies (only allow signed/scanned images)

Tools:

Trivy — Vulnerability scanning
Cosign — Container signing
Kyverno / OPA Gatekeeper — Policy enforcement

Code Signing

Sign commits and artefacts to ensure integrity and authenticity:

Sign Git commits with GPG or SSH keys
Sign container images
Sign Helm charts and other deployment artefacts

Vulnerability Management

Scanning

What to scan:

Application code (SAST — Static Application Security Testing)
Dependencies (SCA — Software Composition Analysis)
Container images
Infrastructure as Code (misconfigurations)
Cloud configurations
Running systems (DAST — Dynamic Application Security Testing)

When to scan:

In CI/CD pipelines (shift left)
Continuously in production (runtime scanning)
Before major releases

Triage and Prioritisation

Not all vulnerabilities are equal. Prioritise based on:

Severity — CVSS score
Exploitability — Is there a known exploit? Is it easy to exploit?
Exposure — Is the vulnerable component reachable from the internet?
Business impact — What could an attacker do if they exploited it?
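These factors can be combined into a sort key so the triage queue is ordered consistently. The weights below are arbitrary assumptions for illustration, not a standard formula, and the CVE identifiers in any usage would be placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Finding:
    cve: str
    cvss: float            # base score, 0.0 to 10.0
    known_exploit: bool    # public exploit or active exploitation
    internet_exposed: bool # vulnerable component reachable from the internet
    business_impact: int   # 1 (low) to 3 (high), assessed by the owning team

def priority(f: Finding) -> float:
    """Hypothetical weighting: severity scaled up by exploitability, exposure and impact."""
    score = f.cvss
    if f.known_exploit:
        score *= 1.5
    if f.internet_exposed:
        score *= 1.5
    return score * f.business_impact

def triage(findings: Iterable[Finding]) -> List[Finding]:
    return sorted(findings, key=priority, reverse=True)
```

Under these weights an exploited, internet-facing CVSS 7.5 issue in an important system (score 33.75) outranks an unreachable CVSS 9.8 issue in a low-impact one (9.8), which is the point of looking beyond the raw score.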

Patch Management

Automate patching where possible (auto-merge Dependabot updates, auto-update base images)
Define SLAs for patching (e.g., critical within 24 hours, high within 7 days)
Test patches before production deployment
Track patch compliance
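The example SLAs above (critical within 24 hours, high within 7 days) translate directly into due dates that can be tracked. The medium and low windows and the `PatchItem` record are assumptions added for illustration.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional

SLA = {
    "CRITICAL": timedelta(hours=24),
    "HIGH": timedelta(days=7),
    "MEDIUM": timedelta(days=30),  # assumed
    "LOW": timedelta(days=90),     # assumed
}

class PatchItem(NamedTuple):
    id: str
    severity: str
    discovered: datetime
    patched: Optional[datetime] = None

def due_date(item: PatchItem) -> datetime:
    return item.discovered + SLA[item.severity]

def overdue(items: List[PatchItem], now: datetime) -> List[str]:
    """IDs of unpatched items already past their SLA."""
    return [i.id for i in items if i.patched is None and now > due_date(i)]

def within_sla_rate(items: List[PatchItem]) -> float:
    """Share of patched items fixed on or before their due date."""
    patched = [i for i in items if i.patched is not None]
    if not patched:
        return 1.0
    return sum(i.patched <= due_date(i) for i in patched) / len(patched)
```

`overdue` drives day-to-day follow-up, while `within_sla_rate` is the kind of compliance figure auditors tend to ask for.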

Tools

Trivy — All-in-one scanner (containers, IaC, SBOM)
Snyk — Developer-first security platform
Checkov — IaC security scanning
tfsec — Terraform security scanner
Semgrep — Static analysis
SonarQube — Code quality and security

Compliance and Governance

Common Frameworks

SOC 2 — Trust service criteria for service organisations
ISO 27001 — Information security management
PCI DSS — Payment card industry standards
HIPAA — Healthcare data protection (US)
GDPR — Data protection (EU)
FedRAMP — US government cloud security

Policy as Code

Codify compliance requirements:

Use OPA or Kyverno for policy enforcement
Scan infrastructure with Checkov, tfsec
Block non-compliant deployments in CI/CD

Audit Logging

Log all access to sensitive systems and data
Log administrative actions
Centralise logs in immutable storage
Retain logs per compliance requirements
Enable cloud provider audit trails (CloudTrail, GCP Audit Logs, Azure Activity Log)
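Immutable storage is usually provided by the platform (for example, write-once object storage), but the underlying goal, making tampering detectable, can be illustrated with a hash chain: each entry records the hash of the previous one, so editing or deleting an entry breaks the chain. This is a sketch of the idea, not a replacement for a managed, access-controlled log store; truncating the newest entries is only detectable if the latest hash is also kept somewhere else.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64

def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()

class AuditLog:
    def __init__(self) -> None:
        self.entries: List[dict] = []

    def append(self, event: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({"event": event, "prev": prev, "hash": _entry_hash(prev, event)})

    def verify(self) -> bool:
        """Recompute every link; any edited or removed entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True
```

Changing the actor on an old entry, for instance, makes `verify()` return `False` from that entry onwards.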

Kubernetes Security

Cluster Hardening

Keep Kubernetes up to date
Enable RBAC (disable ABAC)
Encrypt etcd data at rest
Restrict access to the API server
Use network policies to control pod-to-pod traffic
Enable audit logging

Pod Security

Don’t run containers as root
Use read-only root filesystems
Drop all capabilities, add only what’s needed
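Enforce a Pod Security Standards profile (restricted where possible, baseline otherwise; reserve privileged for trusted system workloads)
Set resource limits so a single pod cannot exhaust node resources and cause a denial of service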
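Admission controllers enforce these rules inside the cluster, but the same checks can also run in CI against rendered manifests. The sketch below inspects a pod spec, already parsed into a Python dict, for the settings listed above. The field names (`securityContext.runAsNonRoot`, `readOnlyRootFilesystem`, `capabilities.drop`, `resources.limits`) are real Kubernetes fields, but the rule set is a simplified subset, not a full implementation of the restricted profile.

```python
from typing import List

def check_pod(pod_spec: dict) -> List[str]:
    """Return policy violations for each container in a pod spec (simplified rules)."""
    violations = []
    pod_ctx = pod_spec.get("securityContext", {})
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})
        # A container-level runAsNonRoot overrides the pod-level value.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            violations.append(f"{name}: runAsNonRoot is not true")
        if not ctx.get("readOnlyRootFilesystem", False):
            violations.append(f"{name}: root filesystem is writable")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            violations.append(f"{name}: capabilities not dropped (drop: [ALL])")
        limits = c.get("resources", {}).get("limits", {})
        if not {"cpu", "memory"}.issubset(limits):
            violations.append(f"{name}: missing cpu/memory limits")
    return violations
```

Running this against manifests in CI gives developers the same feedback earlier than a rejected deployment would.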

Network Policies

Control traffic between pods:

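```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # applies to the namespace it is created in
spec:
  podSelector: {}              # empty selector matches every pod in the namespace
  policyTypes:
  - Ingress                    # Ingress listed with no ingress rules: all inbound traffic is denied
```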

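Start with a default deny policy in each namespace, then add policies that allow specific traffic. Policies take effect only if the cluster's network plugin (CNI) enforces them.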

Runtime Security

Detect threats at runtime:

Falco — Runtime security and threat detection
Tetragon — eBPF-based security observability
KubeArmor — Runtime security enforcement

Incident Response

See Incident Management for general incident response practices.

Security-specific considerations:

Have a dedicated security incident process
Know when to involve legal, PR, and executive leadership
Preserve evidence for forensics
Have containment strategies ready (isolate affected systems)
Plan for disclosure and communication
