P0 INCIDENT: DynamoDB Hot Partition Throttling
#ENG-WAR-028 · Amazon · Lyft
Mid · ~25 min · 15:00
API CPU Usage: 99.2% (↑ 42%)
P99 Latency: 2,450 ms (↑ 400%)
5xx Error Rate: 12.4% (↑ 12%)
DB Connections: 14,492 (↑ 800%)
bastion-prod-1.internal — bash
[SYSTEM] War-Room terminal initialised. Bastion host connection established.
[SYSTEM] Active incident: DynamoDB Hot Partition Throttling
Lyft's ride request API is throwing ThrottlingException from DynamoDB during a surge event. CloudWatch shows that table-level consumed capacity looks normal, but one specific partition is consuming 80x more than the others. The partition key for ride requests is ride_date (e.g., '2024-01-15') plus hour (e.g., '14'), so every ride requested in the same hour shares one partition key and therefore lands on the same partition.
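A minimal Python sketch of why this key design concentrates surge traffic. It simulates writes spread evenly over one hour and counts how many land on each partition key. Assumptions, all hypothetical: the '#' separator between date and hour, the helper names, and the 20-shard random-suffix variant. The sharded variant illustrates write sharding, a commonly documented DynamoDB key-design pattern, and is not presented as this scenario's expected first action.

```python
import random
from collections import Counter
from datetime import datetime, timedelta
from typing import Callable


def hour_key(ts: datetime) -> str:
    """Key scheme from the scenario: ride_date + hour.

    The '#' separator is an assumption; the scenario only says date + hour.
    """
    return f"{ts:%Y-%m-%d}#{ts:%H}"


def make_sharded_key(num_shards: int, seed: int = 0) -> Callable[[datetime], str]:
    """Hypothetical write-sharded variant: append a random suffix in [0, num_shards)."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible

    def key(ts: datetime) -> str:
        return f"{hour_key(ts)}.{rng.randrange(num_shards)}"

    return key


def requests_per_key(n_requests: int, key_fn: Callable[[datetime], str]) -> Counter:
    """Spread n_requests evenly across one surge hour and count writes per partition key."""
    start = datetime(2024, 1, 15, 14, 0)
    step = timedelta(seconds=3600 / n_requests)  # every timestamp stays inside 14:00-14:59
    return Counter(key_fn(start + i * step) for i in range(n_requests))


if __name__ == "__main__":
    before = requests_per_key(10_000, hour_key)
    after = requests_per_key(10_000, make_sharded_key(num_shards=20))
    print("hour key:    ", len(before), "distinct key(s), hottest key =", max(before.values()))
    print("sharded key: ", len(after), "distinct key(s), hottest key =", max(after.values()))
```

With the date + hour key, all 10,000 simulated writes share a single key, so they all compete for one partition's throughput (DynamoDB caps each partition at roughly 3,000 RCU and 1,000 WCU). A random suffix spreads the same writes over about 20 keys. The trade-off is on reads: a query for one hour must fan out across every suffix and merge the results.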

What is your first action?