Core insight: Pub/sub is not mainly about decoupling. It is controlled amplification, and the cost appears downstream.

The Topic Looks Fine. Four Partitions Are Dying.

Show how average topic health can hide hot-key concentration, and how a few overloaded partitions can create lag across many subscriber groups even while the overall cluster looks healthy.

Placement note: Place near the hot-topic skew discussion.
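The sketch below shows how an average can hide skew. It is a minimal model under stated assumptions, not any broker's real behavior. The partitioner is a deterministic `crc32` stand-in, not a real broker's hash. The per-partition capacity of 200 msg/s is hypothetical, and so is the workload: 1,000 ordinary keys plus 4 hot keys. The names `partition_for`, `partition_loads`, `customer-*`, and `hot-merchant-*` are illustrative only.

```python
import zlib

NUM_PARTITIONS = 32
PARTITION_CAPACITY = 200.0  # hypothetical msg/s a subscriber can drain from one partition


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic stand-in for a keyed partitioner (crc32, not any broker's real hash).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_loads(rates_by_key: dict[str, float]) -> list[float]:
    """Sum per-key publish rates (msg/s) into per-partition rates."""
    loads = [0.0] * NUM_PARTITIONS
    for key, rate in rates_by_key.items():
        loads[partition_for(key)] += rate
    return loads


# Hypothetical workload: 1,000 ordinary keys at 1 msg/s, 4 hot keys at 800 msg/s.
rates = {f"customer-{i}": 1.0 for i in range(1000)}
rates.update({f"hot-merchant-{i}": 800.0 for i in range(4)})

loads = partition_loads(rates)
avg_utilization = sum(loads) / (NUM_PARTITIONS * PARTITION_CAPACITY)
overloaded = [p for p, load in enumerate(loads) if load > PARTITION_CAPACITY]

# Every subscriber group reads every partition, so each group inherits the same deficit.
lag_growth_per_group = sum(
    load - PARTITION_CAPACITY for load in loads if load > PARTITION_CAPACITY
)

print(f"average utilization: {avg_utilization:.0%}")
print(f"overloaded partitions: {overloaded}")
print(f"lag growth per subscriber group: {lag_growth_per_group:.0f} msg/s")
```

Average utilization here is 4,200 / 6,400, about 66%, which looks healthy. Yet every partition holding a hot key runs at several times its capacity. Each subscriber group falls behind on those same partitions, so the lag multiplies by the number of groups.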

The Outage Ends. The Expensive Part Starts.

Show that backlog is not just stored delay. It is future traffic that must be repaid while live traffic continues.

Placement note: Place near the backlog growth and catch-up section.
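The repayment arithmetic fits in a few lines. Assume live traffic stays constant during and after the outage, and the subscriber consumed nothing while it was down. The backlog is then `live_rate * outage_seconds`. Only the headroom above live traffic, `drain_rate - live_rate`, pays it down. The function name and the rates below are hypothetical.

```python
import math


def catch_up_seconds(live_rate: float, outage_seconds: float, drain_rate: float) -> float:
    """Seconds needed to repay a backlog while live traffic keeps arriving.

    live_rate: msg/s the producer keeps publishing (assumed constant)
    outage_seconds: how long the subscriber consumed nothing
    drain_rate: max msg/s the subscriber can process after it recovers
    """
    if drain_rate <= live_rate:
        return math.inf  # no headroom: the backlog never shrinks
    backlog = live_rate * outage_seconds  # messages queued during the outage
    # Only capacity above live traffic pays down the backlog.
    return backlog / (drain_rate - live_rate)


# Hypothetical numbers: 2,000 msg/s live traffic, 30-minute outage.
for drain_rate in (3000.0, 2200.0):
    hours = catch_up_seconds(2000.0, 30 * 60, drain_rate) / 3600
    print(f"drain at {drain_rate:.0f} msg/s -> caught up after {hours:.1f} h")
```

With 50% headroom, a 30-minute outage takes an hour to repay. With 10% headroom it takes five hours. For that whole window the subscriber also hits its own dependencies at the drain rate, above normal load.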

Why This Exists

Pub/sub is worth its price when one fact needs many independent reactions.

An order changes state. Billing cares. Fraud cares. Search cares. Notifications care. Analytics cares. Support tooling cares. You do not want all of that in the synchronous request path. You want a durable event and independently owned subscribers.
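Here is a toy, in-memory sketch of "a durable event and independently owned subscribers." It assumes an append-only log where each subscriber tracks its own offset. It is not a real broker client, and `EventLog`, `Subscriber`, and the event fields are hypothetical. Publishing stores the event once. Each subscriber reads at its own pace, so a slow one does not block the others.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log: an event is stored once, regardless of who reads it."""
    events: list[dict] = field(default_factory=list)

    def publish(self, event: dict) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the new event


@dataclass
class Subscriber:
    """One independently owned consumer with its own position in the log."""
    name: str
    log: EventLog
    offset: int = 0

    def poll(self, max_events: int) -> list[dict]:
        batch = self.log.events[self.offset : self.offset + max_events]
        self.offset += len(batch)
        return batch

    @property
    def lag(self) -> int:
        return len(self.log.events) - self.offset


log = EventLog()
subs = {n: Subscriber(n, log) for n in ["billing", "fraud", "search", "notifications"]}
for state in ["created", "paid", "shipped"]:
    log.publish({"order_id": "o-1", "state": state})

subs["billing"].poll(10)  # fast subscriber drains everything
subs["search"].poll(1)    # slow subscriber reads one event
print({n: s.lag for n, s in subs.items()})
# {'billing': 0, 'fraud': 3, 'search': 2, 'notifications': 3}
```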

That is the good version.

The expensive version starts later. More teams subscribe because the event already exists. More systems depend on the same stream. Subscriber quality diverges. Freshness expectations diverge. Recovery needs diverge. The producer stays one service. The delivery side becomes shared infrastructure.
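Here is a rough way to see the delivery side outgrow its single producer. Assume every subscriber group reads every event, so a caught-up group reads at the publish rate. A group draining a backlog reads faster. The rates, group names, and `read_amplification` helper are hypothetical.

```python
PUBLISH_RATE = 2000.0  # msg/s from the single producer (hypothetical)


def read_amplification(publish_rate: float, read_rates: dict[str, float]) -> float:
    """Delivery-side msg/s served per msg/s produced, for one topic.

    Each caught-up group adds a full copy of the publish rate;
    a group catching up after an outage adds more than that.
    """
    return sum(read_rates.values()) / publish_rate


read_rates = {
    "billing": PUBLISH_RATE,
    "fraud": PUBLISH_RATE,
    "search": PUBLISH_RATE,
    "notifications": PUBLISH_RATE,
    "analytics": PUBLISH_RATE,
    "support-tooling": 3000.0,  # hypothetical: draining a backlog after an outage
}
print(read_amplification(PUBLISH_RATE, read_rates))  # 6.5

read_rates["new-team"] = PUBLISH_RATE  # one more team subscribes
print(read_amplification(PUBLISH_RATE, read_rates))  # 7.5
```

The producer's cost does not change in either case. Each new subscriber and each recovery episode lands entirely on the shared delivery side.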
