P0 INCIDENT: AWS S3 Runaway rm Command (Region-Wide Outage) | #ENG-WAR-014 | AWS | Airbnb
Mid | ~20 min | 15:00
API CPU Usage: 99.2% (↑ 42%)
P99 Latency: 2450 ms (↑ 400%)
5xx Error Rate: 12.4% (↑ 12%)
DB Connections: 14,492 (↑ 800%)
bastion-prod-1.internal — bash
[SYSTEM] War-Room terminal initialised. Bastion host connection established.
[SYSTEM] Active incident: AWS S3 Runaway rm Command (Region-Wide Outage)
[SYSTEM] Type "help" for a list of investigation commands.
user@bastion:~$
A junior engineer ran a maintenance script with a typo in the bucket name parameter. Instead of deleting temporary files in 'app-logs-temp', the script targeted 'app-logs', your primary application asset bucket, which holds 500 GB of user-uploaded files and static assets. The delete operation ran for 3 minutes before the engineer noticed and killed it. An unknown number of objects have been permanently deleted, and the CDN is now returning 403s for the missing assets.

What is your first action?