SYSTEM DESIGN: Design a Web Crawler (asked at Google and OpenAI)
CONSTRAINTS
- Pages per month: 1 billion
- Avg page size: 100 KB
- Total data: ~100 TB/month
- Crawl rate needed: ~400 pages/sec (see the arithmetic below)
- Unique URL store: ~200 GB (Bloom filter)
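These figures follow from back-of-envelope arithmetic. A minimal Python check, assuming a 30-day month and a 1% Bloom-filter false-positive rate (neither is stated above); note that a Bloom filter over 1B URLs alone is only ~1.2 GB, so the ~200 GB figure presumably also covers storing the full URL strings:

```python
import math

# Back-of-envelope check of the crawler constraints above.
PAGES_PER_MONTH = 1_000_000_000
AVG_PAGE_KB = 100
SECONDS_PER_MONTH = 30 * 24 * 3600                    # assumes a 30-day month

total_tb = PAGES_PER_MONTH * AVG_PAGE_KB / 1e9        # KB -> TB
crawl_rate = PAGES_PER_MONTH / SECONDS_PER_MONTH      # pages/sec
print(f"Total data: ~{total_tb:.0f} TB/month")        # ~100 TB/month
print(f"Crawl rate: ~{crawl_rate:.0f} pages/sec")     # ~386 pages/sec

# Bloom filter sizing at an assumed 1% false-positive rate:
# bits per item = -ln(p) / (ln 2)^2  ~= 9.6 bits at p = 0.01
p = 0.01
bits_per_item = -math.log(p) / math.log(2) ** 2
filter_gb = PAGES_PER_MONTH * bits_per_item / 8 / 1e9
print(f"Bloom filter: ~{filter_gb:.1f} GB for 1B URLs")  # ~1.2 GB
```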
COMPONENT PALETTE

Compute & Network
- Load Balancer: distributes traffic
- API Gateway: entry point / auth
- API Server: business logic
- Worker Node: async processing
- CDN Edge: global cache
- WebSocket Gateway: persistent connections

Data Stores
- PostgreSQL: relational DB
- MySQL: relational DB
- Cassandra: wide-column DB
- DynamoDB: NoSQL / managed
- S3 Bucket: object storage

Queues & Cache
- Redis Cache: in-memory store
- Kafka: event stream
- Zookeeper: coordination

Specialized
- Bloom Filter: probabilistic set (sketched after this list)
- Rate Limiter: throttling
- Geohash Service: geospatial index
- Trie Server: prefix search
- APNS / FCM: push notifications
- Aggregator: batch / roll-up
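Of these, the Bloom filter is the piece most specific to this problem. A minimal sketch, assuming double hashing derived from SHA-256 (the standard Kirsch-Mitzenmacher construction); a production dedup store would shard and persist this rather than keep it in one process:

```python
import hashlib
import math

class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, n_items: int, fp_rate: float = 0.01):
        # Optimal sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m/n) ln 2 hashes
        self.m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

# Usage: dedup URLs before re-crawling (scaled down from 1B for the demo).
bf = BloomFilter(n_items=1_000_000)
bf.add("https://example.com/")
print("https://example.com/" in bf)       # True
print("https://example.com/other" in bf)  # False (with ~1% FP chance)
```

The sizing formulas in the constructor are the same ones used in the capacity arithmetic above.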
🚨 INCIDENT

Crawl 1 billion web pages a month to train an LLM. Avoid crawling the same page twice, respect robots.txt, and avoid getting blocked by anti-DDoS systems.
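A minimal worker loop tying these requirements together, with in-memory stand-ins for the production pieces (the frontier queue would be Kafka, the seen-set a Bloom filter, the throttle state Redis); all names here are hypothetical:

```python
import time
import urllib.robotparser
from collections import deque
from urllib.parse import urlparse

# In-memory stand-ins for the real components. All names are hypothetical.
frontier = deque(["https://example.com/"])  # frontier queue (Kafka in prod)
seen = set()                                # Bloom filter in prod
last_fetch = {}                             # host -> last fetch timestamp
PER_HOST_DELAY = 1.0                        # politeness: >= 1s between hits

def allowed_by_robots(url: str) -> bool:
    """Fetch and honor the site's robots.txt (cached per host in prod)."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return False                        # unreachable: be conservative
    return rp.can_fetch("MyCrawler/1.0", url)

def crawl_one():
    if not frontier:
        return
    url = frontier.popleft()
    if url in seen:                         # dedup: never crawl twice
        return
    seen.add(url)
    if not allowed_by_robots(url):          # respect robots.txt
        return
    host = urlparse(url).netloc
    wait = PER_HOST_DELAY - (time.time() - last_fetch.get(host, 0.0))
    if wait > 0:
        time.sleep(wait)                    # per-host throttle
    last_fetch[host] = time.time()
    # fetch(url), store the body in S3, extract links, push them to frontier
```

The fixed per-host delay is the simplest politeness policy; a token-bucket rate limiter per domain is the usual refinement, and it is also what keeps the crawler under anti-DDoS thresholds.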

📥 Assigned to: You (Senior Engineer)
SCALE LEVELS
- Level 1: 100 RPS, target < 500 ms
- Level 2: 400 RPS, target < 2000 ms
- Level 3: 400 RPS, target < 100 ms