Kubernetes in Production: Security, Monitoring, and Cost Optimization
March 20, 2025
By DevOps Team
Production-hardened Kubernetes deployment strategies covering service mesh, observability, auto-scaling, and infrastructure-as-code best practices.
Production Kubernetes: Beyond the Tutorial
Running Kubernetes in production is vastly different from local development. This guide covers the security, monitoring, cost optimization, and operational practices that separate toy clusters from enterprise-grade infrastructure.
- Security hardening (RBAC, policies, secrets)
- Monitoring and observability (metrics, logs, traces)
- Cost optimization (right-sizing, autoscaling, spot instances)
- Disaster recovery and high availability
- GitOps and deployment strategies
Security Hardening
YAML
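# Production Security Configuration

# 1. Network Policies - Zero Trust Networking
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        # The two entries below are OR-ed: traffic is allowed from any pod in a
        # namespace labeled name=production, or from app=frontend pods in this
        # namespace. The name=production label must be applied to the namespace
        # manually; it is not set by default.
        - namespaceSelector:
            matchLabels:
              name: production
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Only PostgreSQL traffic to the database pods is allowed. DNS lookups from
    # the api pods are also blocked unless another policy permits them.
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432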
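---
# Added sketch (not part of the original configuration): a zero-trust setup
# usually starts from a namespace-wide default deny, so that policies such as
# api-network-policy act as explicit exceptions. The policy names below are
# hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# With egress denied by default, pods can no longer resolve DNS names.
# This sketch re-allows DNS, assuming the cluster DNS pods run in kube-system
# with the conventional k8s-app=kube-dns label (verify this on your cluster).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # A single entry with both selectors is AND-ed: kube-dns pods
        # inside the kube-system namespace only.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53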
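---
# 2. Pod Security Standards - Restrict privileges
# Note: PodSecurityPolicy was deprecated in Kubernetes v1.21 and removed in
# v1.25. This manifest applies only to clusters older than v1.25; newer
# clusters need Pod Security Admission or a policy engine instead.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  # supplementalGroups is a required PSP field; it was missing from the original.
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true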
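---
# Added sketch (not part of the original configuration): on Kubernetes v1.25
# and later, where the PodSecurityPolicy API no longer exists, similar
# restrictions can be enforced with the built-in Pod Security Admission
# controller by labeling the namespace. This assumes the "production"
# namespace used elsewhere in this file.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Reject pods that violate the "restricted" Pod Security Standard.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also surface violations as client warnings and audit annotations.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
# The "restricted" standard does not require a read-only root filesystem;
# set readOnlyRootFilesystem: true in each container's securityContext if
# that guarantee is needed.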
---
# 3. RBAC - Least Privilege Access
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: developer-role
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "jobs"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
# Note: No delete, no secrets access
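---
# Added sketch (not part of the original configuration): a Role grants
# nothing until it is bound to a subject. The group name "developers" is a
# hypothetical placeholder; it must match a group asserted by your cluster's
# authentication provider.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-role-binding
  namespace: production
subjects:
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-role
  apiGroup: rbac.authorization.k8s.io
# One way to check the result:
#   kubectl auth can-i delete pods -n production --as=jane --as-group=developers
# should answer "no" for this read-only role.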
---
# 4. Secrets Management with External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: api-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-url
      remoteRef:
        key: prod/api/database-url
    - secretKey: api-key
      remoteRef:
        key: prod/api/key
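---
# Added sketch (not part of the original configuration): the ExternalSecret
# refers to a SecretStore named aws-secrets-manager, which must exist in the
# same namespace. A minimal version might look like this. The region and the
# service account name are hypothetical, and the service account is assumed
# to be mapped to an IAM role that can read the prod/api/* secrets.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1              # assumption: replace with your region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa   # hypothetical service account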
---
# 5. Resource Limits - Prevent noisy neighbors
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
  namespace: production
spec:
  limits:
    - max:
        cpu: "2"
        memory: 2Gi
      min:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 200m
        memory: 256Mi
      type: Container
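---
# Added sketch (not part of the original configuration): a LimitRange bounds
# each container, but it does not cap what the namespace consumes in total.
# A ResourceQuota can add that ceiling. All numbers below are hypothetical
# and should be sized from observed usage.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"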
Monitoring and Observability
The Three Pillars of Observability:
1. Metrics (Prometheus + Grafana)
2. Logs (Loki + Promtail)
3. Traces (Tempo + OpenTelemetry)
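As a starting point for the metrics pillar, the fragment below sketches a Prometheus scrape job that discovers pods through the Kubernetes API. It assumes a plain Prometheus deployment (not the Prometheus Operator) and the common, but purely conventional, `prometheus.io/scrape: "true"` pod annotation as the opt-in signal; the job name is arbitrary.

```yaml
# prometheus.yml (fragment) - a minimal sketch, not a complete configuration
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                      # discover every pod via the API server
    relabel_configs:
      # Keep only pods that opted in through the annotation.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Carry namespace and pod name over as labels for dashboards and alerts.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

Prometheus needs RBAC permission to list and watch pods for this discovery to work.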
Key metrics to monitor at each level:
- Cluster Level: Node CPU/memory, pod count, PVC usage
- Application Level: Request rate, error rate, latency (p50, p95, p99)
- Business Level: Sign-ups, transactions, revenue
- Cost: Resource utilization, waste, spot instance savings
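The application-level metrics above (request rate, error rate, latency percentiles) can be precomputed as Prometheus recording rules. This is a sketch that assumes the application exposes a counter named `http_requests_total` with a `status` label and a histogram named `http_request_duration_seconds`; those names depend entirely on how the service is instrumented, and the recorded series names are illustrative.

```yaml
# rules/api-recording.yml - hypothetical file name
groups:
  - name: api-red-metrics
    rules:
      # Request rate per job over the last 5 minutes.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Fraction of requests that returned a 5xx status.
      - record: job:http_request_errors:ratio_rate5m
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum by (job) (rate(http_requests_total[5m]))
      # 99th percentile latency, estimated from histogram buckets.
      - record: job:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(
            0.99,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )
```

Changing `0.99` to `0.5` or `0.95` gives the p50 and p95 series.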
Alerting best practices:
- Only alert on actionable issues
- Link a runbook from every alert
- Define escalation policies (PagerDuty, Opsgenie)
- Group and deduplicate alerts
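These practices map fairly directly onto Prometheus and Alertmanager configuration. The first fragment sketches an alert that fires only on a sustained, user-visible problem and carries a runbook link; the metric name, the 5% threshold, and the runbook URL are all hypothetical.

```yaml
# rules/api-alerts.yml - hypothetical file name
groups:
  - name: api-alerts
    rules:
      - alert: ApiHighErrorRate
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum by (job) (rate(http_requests_total[5m]))
          > 0.05
        for: 10m                       # ignore short spikes
        labels:
          severity: page
        annotations:
          summary: "High 5xx error rate on {{ $labels.job }}"
          runbook_url: https://runbooks.example.com/api-high-error-rate
```

The second fragment sketches grouping and escalation in Alertmanager, assuming a PagerDuty Events API v2 integration; the receiver names and the key placeholder are illustrative.

```yaml
# alertmanager.yml (fragment)
route:
  receiver: default
  group_by: [alertname, namespace]     # one notification per alert and namespace
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - 'severity="page"'
      receiver: pagerduty
receivers:
  - name: default
  - name: pagerduty
    pagerduty_configs:
      - routing_key: <pagerduty-integration-key>   # placeholder, not a real key
```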
Tags: Kubernetes, DevOps, Security, Production, Infrastructure