Core insight: Event sourcing is not a better CRUD system. It is a decision to make history the system of record, which only pays when history itself is the product truth rather than secondary evidence.
Most teams ask, “Could this system benefit from event sourcing?”
That question is too weak to be useful. Almost any system can be made to “benefit” from append-only history if the comparison is generous enough. Better traceability. Cleaner writes. Future replay. Potential analytics. None of that gets to the point.
The real question is this: is history the truth, or is history just evidence?
A ledger, policy timeline, entitlement engine, trading workflow, or legally significant approval system may need the exact sequence of domain facts to be first-class truth. In those systems, “what was true at 14:03:17 under the rules that applied then?” is not an edge case. It is the product.
A back-office SaaS product with accounts, settings, users, invoices, and approvals usually does not live there. It may need to know who changed something and when. That is an audit requirement. It is not automatically a reason to rebuild the system around historical reconstruction.
This is where teams get seduced by the wrong thing. They confuse advanced architecture with better architecture. They choose event sourcing because it feels like the serious engineer’s version of CRUD.
It is not.
Event sourcing is not what you choose when you want a better CRUD system. It is what you choose when CRUD is lying about the nature of the business.
Event sourcing is expensive because it changes what data is.
In a state-first system, the current row or document is the truth and history is secondary. In an event-sourced system, history is the truth and current state is a derivative. That shift sounds clean in a design review. In production, it moves cost into the parts of the system people actually operate.
The first cost is projection dependency.
Users do not interact with event streams. They interact with balances, order states, search results, entitlement summaries, invoice views, and dashboards. Those are all projections. So the practical user experience depends less on the elegance of the append path and more on whether the read side is fresh, correct, and debuggable.
A system can append 10,000 events per second and still feel broken because one projection is 90 seconds behind. The dashboard shows green writes and durable storage. The customer sees a stale balance or a missing order. The first thing that fails is usually not persistence. It is trust.
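The freshness problem above is concrete enough to sketch. This is a minimal illustration, not a real monitoring integration: a projection tracks the business timestamp of the last event it applied, and callers compare that lag against a product-level budget before trusting the view. All names and the budget value are hypothetical.

```python
# Illustrative lag-budget check for a projection. The read side tracks the
# timestamp of the last applied event; serving code decides whether the view
# is fresh enough to trust. All names here are hypothetical.

LAG_BUDGET_SECONDS = 5.0  # a product decision, not an infrastructure default

class ProjectionHealth:
    def __init__(self):
        self.last_applied_event_ts = 0.0  # event-time of the last applied event

    def record_applied(self, event_timestamp: float) -> None:
        self.last_applied_event_ts = event_timestamp

    def lag_seconds(self, now: float) -> float:
        return max(0.0, now - self.last_applied_event_ts)

    def is_fresh(self, now: float) -> bool:
        return self.lag_seconds(now) <= LAG_BUDGET_SECONDS

health = ProjectionHealth()
health.record_applied(event_timestamp=1_000.0)
print(health.is_fresh(now=1_003.0))  # True: 3 seconds behind, within budget
print(health.is_fresh(now=1_095.0))  # False: the "90 seconds behind" case
```

The point of making the budget explicit is that it forces the product conversation: which screens can tolerate lag, and which cannot.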
Teams learn this late because the write path looks healthy for a long time. The ugly part is that customers do not care that your truth is durable if the screen in front of them is wrong.
The second cost is rebuild time.
This is the bill teams pretend is a feature.
Replay sounds empowering when you first adopt event sourcing. Later it becomes the question nobody wants to ask out loud: how long will it take to rebuild something important from history, and what else will we hurt while doing it?
An aggregate with 50 events is easy to romanticize. Read the stream, rehydrate, move on. An aggregate with 50,000 events is where the romance ends. At that point you are thinking about snapshot cadence, cold-path latency, archival boundaries, and whether the aggregate was a bad boundary in the first place. State-first systems tolerate bad aggregate boundaries as awkwardness. Event-sourced systems turn them into hot streams, replay cost, snapshot dependence, and on-call pain.
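The snapshot dependence mentioned above reduces to a simple mechanic: fold from the latest snapshot forward rather than from event zero. A toy sketch, assuming a counter-style aggregate with invented event shapes:

```python
# Minimal sketch of snapshot-assisted rehydration. Instead of folding all N
# events, load a (state, version) snapshot and fold only the events after it.
# The aggregate and event shapes are illustrative.

def apply(state: int, event: dict) -> int:
    # Toy fold: each event carries a signed delta.
    return state + event["delta"]

def rehydrate(events: list, snapshot=None) -> int:
    """Fold events into state, optionally starting from a (state, version) snapshot."""
    state, from_version = snapshot if snapshot else (0, 0)
    for event in events[from_version:]:
        state = apply(state, event)
    return state

events = [{"delta": 10}, {"delta": -3}, {"delta": 5}]
print(rehydrate(events))                   # full replay: 12
print(rehydrate(events, snapshot=(7, 2)))  # snapshot at version 2, fold 1 event: 12
```

With 50 events the snapshot is a rounding error. With 50,000, it is the difference between a fast read path and a cold-path stall, which is why snapshot cadence becomes an operational decision rather than an optimization.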
Suppose you have 1.5 billion events over three years and need to rebuild a projection after fixing a business logic bug. At an effective end-to-end replay rate of 25,000 events per second, that is about 16.7 hours in theory. In reality you throttle, validate, retry, hit hot partitions, and protect downstream systems. What looked like a maintenance task becomes a multi-day operational exercise.
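The arithmetic behind that estimate is worth keeping in front of the team, because the inputs change as the system grows:

```python
# The back-of-envelope from the text: 1.5 billion events at an effective
# end-to-end replay rate of 25,000 events per second.
total_events = 1_500_000_000
replay_rate_per_sec = 25_000

seconds = total_events / replay_rate_per_sec
hours = seconds / 3600
print(f"{hours:.1f} hours of pure replay")  # ~16.7 hours, before throttling,
# retries, hot partitions, and downstream protection stretch it into days
```

Note that the honest input is the effective end-to-end rate, including validation and downstream writes, not the raw read throughput of the event store.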
Replay cost also scales with projection count and read-model diversity, not just raw event volume. Rebuilding one denormalized table is manageable. Rebuilding a customer timeline, search index, compliance export, fraud feature store, and internal finance view from the same history is where event sourcing stops feeling elegant and starts feeling infrastructural.
The replay you brag about in design review is often the maintenance window you keep postponing in production.
The third cost is event permanence.
Bad tables can be migrated. Bad durable events stay with you.
Once domain history is the primary truth, your event contracts are no longer internal implementation details. A lazy event name, a missing field, or an overloaded semantic boundary does not remain a local mistake. It becomes part of the archaeology of the system. “OrderUpdated” becomes a junk drawer. Missing reason codes become permanent support pain. One rushed compatibility shortcut becomes years of translation logic.
This is one of the least appreciated truths about event sourcing: it does not forgive vague modeling. It preserves it.
Versioning gets nastier as the system succeeds. One producer and two consumers is annoying. Twelve consumers across billing, analytics, search, risk, support tooling, and archival jobs is governance by historical scar tissue. Schema evolution stops being a code concern and becomes a coordination concern.
You do not clean this up in a sprint. You carry it.
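What "carrying it" looks like in code is an upcaster layer that translates old durable events into the current schema at read time. A minimal sketch with hypothetical versions and fields; the telling detail is that a field the v1 event never recorded can only be marked unknown, not recovered:

```python
# Illustrative upcaster chain: old durable events are translated to the
# current schema at read time. Versions and field names are hypothetical.
# Once v1 events exist, this translation layer is effectively permanent.

def upcast_v1_to_v2(event: dict) -> dict:
    # v1 "OrderUpdated" never recorded why the order changed. The best a
    # v2 upcaster can do is mark the reason unknown: the ambiguity is
    # preserved, not fixed.
    return {**event, "version": 2, "reason_code": event.get("reason_code", "UNKNOWN")}

UPCASTERS = {1: upcast_v1_to_v2}

def upcast(event: dict) -> dict:
    while event["version"] < 2:
        event = UPCASTERS[event["version"]](event)
    return event

old_event = {"type": "OrderUpdated", "version": 1, "order_id": "o-1"}
print(upcast(old_event)["reason_code"])  # "UNKNOWN": the lazy v1 event, forever
```

Every new consumer either calls this layer or reimplements it, which is how schema evolution becomes a coordination concern.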
The fourth cost is debugging distance.
In a state-based system, wrong data is often directly inspectable. In an event-sourced system, wrong data may live in a projection, originate in an old event version, and surface only because one replay path hit one edge case that normal flow never exercised.
You are not just debugging what the system did. You are debugging what the system inferred from what it did.
At 2 a.m., that difference matters. An on-call engineer trying to explain why a customer balance is wrong by $47.12 would usually prefer a current-state table plus an audit record over a chain of events, projection offsets, snapshot state, replay behavior, and compensating writes.
A system can preserve historical truth perfectly and still be terrible at answering support questions quickly.
The fifth cost is storage growth that turns into operational policy.
Raw storage is rarely the hardest part. If you average 1 KB per event and produce 50 million events per day, that is about 50 GB per day before replication, indexes, metadata, and snapshots. Many teams can absorb that.
What becomes expensive is not the bytes. It is the promise that those bytes remain interpretable, replayable, and useful.
Fifty gigabytes per day becomes about 18 TB per year before copies and derived storage. Over four years, you are no longer discussing whether storage is cheap. You are deciding how much history stays hot, how much can be replayed inside operational windows, and which product expectations silently assume that all historical truth remains instantly available.
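The raw-storage side of that claim is simple enough to verify inline:

```python
# The storage arithmetic from the text: 1 KB per event, 50 million events/day,
# before replication, indexes, metadata, and snapshots.
event_size_kb = 1
events_per_day = 50_000_000

gb_per_day = event_size_kb * events_per_day / 1_000_000  # ~50 GB/day
tb_per_year = gb_per_day * 365 / 1_000                   # ~18.25 TB/year
print(f"{gb_per_day:.0f} GB/day, {tb_per_year:.1f} TB/year")
```

The numbers themselves are absorbable. The operational promise attached to them is the expensive part.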
History is expensive when it becomes executable.
The sixth cost is operational maturity.
Event sourcing is not just a data-model choice. It is an operating model. You need disciplined event versioning, idempotent handlers, replay isolation, projection lag monitoring, dead-letter handling, backfill tooling, retention policy, and engineers who understand the difference between durable facts and derived state.
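Of the disciplines listed above, idempotent handlers are the most frequently skipped. A minimal sketch: delivery can repeat under retries and replays, so the handler tracks which event IDs it has already applied. In production that tracking would live in the same transaction as the projection write; here it is an in-memory set for illustration, with invented names.

```python
# Minimal sketch of an idempotent projection handler. Delivery may repeat
# (retries, replays), so applied event IDs are tracked. In production this
# tracking belongs in the same transaction as the projection write.

class BalanceProjection:
    def __init__(self):
        self.balance = 0
        self.applied_event_ids = set()

    def handle(self, event: dict) -> None:
        if event["event_id"] in self.applied_event_ids:
            return  # duplicate delivery: applying again would corrupt state
        self.balance += event["delta"]
        self.applied_event_ids.add(event["event_id"])

proj = BalanceProjection()
for e in [{"event_id": "e1", "delta": 100},
          {"event_id": "e2", "delta": -40},
          {"event_id": "e2", "delta": -40}]:  # redelivered duplicate
    proj.handle(e)
print(proj.balance)  # 60, not 20
```

Without this guard, every replay and every broker retry is a potential corruption of derived state.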
That is why the pattern disappoints in ordinary products. The write volume is modest. The domain is mostly current-state-oriented. The organization never truly needed history-first design, so it never builds the muscle to operate it well. Complexity arrives before value, then stays.
My strongest view here is simple: event sourcing is usually a bad trade for low-volume CRUD systems, even when the team is fully capable of implementing it. Competence is not justification.
Ask four questions.
1. Is historical truth central to correctness?
Not “history would be nice.” Not “support might use it.” Central. If you cannot correctly explain or compute business state without the sequence of domain facts, event sourcing may be justified.
2. Is replay a business capability or an architectural fantasy?
If replay will not materially improve reporting, reconstruction, simulation, compliance, or historical decision analysis, it is probably not worth carrying forever.
3. Are multiple derived read models genuinely part of the product’s future?
If the same durable facts will feed materially different, long-lived views, event sourcing gets stronger. If the system mostly needs a few current-state queries and a human-readable audit trail, simpler designs usually win.
4. Can the team operate history-first systems soberly?
That includes versioning, projection ownership, lag budgets, replay tooling, data repair, and incident debugging. Without that maturity, event sourcing becomes performative sophistication.
The alternative is not hand-wavy simplicity. In many domains the stronger design is: current-state tables as authority, append-only audit records with actor and reason metadata, explicit business timestamps, immutable ledgers only where they are truly needed, and narrow temporal history tables for the few entities that actually require point-in-time reconstruction. That is not a lesser architecture. It is a more honest one.
A blunt rule works well: this is overkill unless the business would pay real money to preserve exact domain history as primary truth, not just proof that something changed.
Case Walkthrough 1
A larger-scale system where event sourcing earns its keep
Consider a payments ledger for a global marketplace handling wallet balances, holds, releases, settlements, reversals, and dispute adjustments. Peak traffic is about 6,000 posting-related writes per second. The system feeds current balances, settlement views, merchant history, finance workflows, and regulatory extracts.
This is a real fit.
The business truth is not “balance = X.” The truth is the ordered sequence of financial facts that made balance become X. If a regulator asks what an account looked like at a specific timestamp under the rules that applied then, a current-state table plus audit log starts to look like a reconstruction exercise. In this domain, chronology is not metadata. It is the product contract.
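The regulator's question reduces to a fold over ordered facts, cut off at the asked-for timestamp. A toy sketch with illustrative event shapes (a real ledger would also replay the rule set that applied then, which is the harder half):

```python
# Point-in-time reconstruction as a fold over ordered financial facts,
# stopping at the asked-for timestamp. Event shapes are illustrative.

def balance_at(events: list, as_of: str) -> int:
    balance = 0
    for event in events:  # ordered by business timestamp
        if event["ts"] > as_of:
            break
        balance += event["amount"]
    return balance

events = [
    {"ts": "14:01:00", "amount": 500},
    {"ts": "14:02:30", "amount": -120},
    {"ts": "14:05:00", "amount": 300},
]
print(balance_at(events, as_of="14:03:17"))  # 380: the 14:05 posting not yet applied
```

In a state-first design this query is a reconstruction exercise. Here it is the native read.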
Replay has real value here. New reporting models can be derived from durable history. Reconciliation logic can be improved and rerun. Dispute systems can reconstruct temporal context. Audit is not extra reporting. It is core behavior.
But this is also where the cost becomes honest.
Suppose a reconciliation projection undercounts a class of reversal events for 11 months. Fixing it is not a deploy. It is a replay plan. If you replay naively into live downstream systems, you can double publish, flood alerts, distort finance views, or saturate the read path that merchants are actively using.
This is one of the places where event sourcing creates very specific production pain.
Early signal: support sees a small number of merchants with mismatched settlement totals after reversal-heavy days.
What the dashboard shows first: slightly elevated projection lag and increased write pressure on the reconciliation store, while append latency stays green.
What is actually broken first: operational trust in the derived financial view. Durability is intact, but one important read model is no longer safe to use.
Immediate containment: freeze the affected projection, route finance workflows to authoritative ledger queries for impacted accounts, and replay into a shadow projection rather than the live one.
Durable fix: version projection logic explicitly, isolate replay traffic from live consumers, make replay paths side-effect free by design, and assign clear ownership for projection correctness.
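The shadow-projection step can be sketched simply: rebuild the corrected view off to the side, diff it against the live projection, and only then plan the cutover. All names here are hypothetical.

```python
# Sketch of replaying into a shadow projection instead of the live one:
# rebuild to the side, diff against the live view, then cut over deliberately.
# Names and event shapes are hypothetical.

def rebuild_shadow(events: list) -> dict:
    shadow = {}
    for e in events:
        shadow[e["account"]] = shadow.get(e["account"], 0) + e["amount"]
    return shadow

def diff(live: dict, shadow: dict) -> dict:
    """Accounts where the corrected shadow disagrees with the live projection."""
    return {acct: (live.get(acct), shadow.get(acct))
            for acct in set(live) | set(shadow)
            if live.get(acct) != shadow.get(acct)}

live = {"m-1": 900, "m-2": 400}  # live reconciliation view that missed reversals
events = [{"account": "m-1", "amount": 900},
          {"account": "m-2", "amount": 400},
          {"account": "m-2", "amount": -150}]  # the undercounted reversal
print(diff(live, rebuild_shadow(events)))  # {'m-2': (400, 250)}
```

The diff is the valuable artifact: it scopes the damage, drives the merchant communication, and tells you whether cutover is safe.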
A lot of teams discover this only once. After the first bad replay, nobody treats projection code like harmless glue again.
At this scale, techniques that sounded optional at launch become necessary. Snapshots are not polish. They are protection against aggregate rehydration cost. Projection sharding is not elegance. It is how rebuilds stop fighting live traffic. Archival tiers are not just storage optimization. They are a statement about how much history remains operationally replayable.
Another non-obvious truth shows up here: the hardest bugs are often not missing events. They are valid old events interpreted under changed rules. The history is intact. The meaning moved.
Case Walkthrough 2
A small-scale system where event sourcing is usually vanity
Now take a B2B administration product with 12 core entities, 3 to 10 writes per second, and common workflows like user creation, plan updates, discount rules, invoice state, approval actions, and admin notes.
This is the kind of system that adopts event sourcing to future-proof itself.
It is usually a mistake.
The business mostly cares about current state. Support wants to know what the subscription is now, who changed the override yesterday, and whether approval happened before or after invoicing. Those are state-plus-history questions. They are not evidence that history itself is the primary truth.
A better design is usually boring in the right way:
normalized current-state tables
append-only audit records with actor, timestamp, mutation type, and before-and-after values where they matter
explicit business timestamps for important transitions
targeted history tables for the small number of entities where temporal queries really matter
That system is easier to read, easier to debug, easier to explain, and usually more honest about the product.
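The boring design above is compact enough to sketch. This is an in-memory illustration with invented names; in SQL it would be one transaction writing a current-state row and an audit row together.

```python
# Sketch of the state-plus-audit alternative: update the authoritative row
# and append an audit record in the same logical transaction. Schema and
# names are illustrative.

current_state = {}  # entity_id -> row (the authority)
audit_log = []      # append-only records (the evidence)

def update_subscription(entity_id: str, new_status: str, actor: str, reason: str, now: str):
    before = current_state.get(entity_id, {}).get("status")
    current_state[entity_id] = {"status": new_status, "updated_at": now}
    audit_log.append({
        "entity_id": entity_id, "actor": actor, "reason": reason,
        "at": now, "before": before, "after": new_status,
    })

update_subscription("sub-1", "active", actor="admin@corp", reason="manual approval",
                    now="2024-05-01T10:00:00Z")
update_subscription("sub-1", "suspended", actor="billing-job", reason="payment failed",
                    now="2024-06-02T03:15:00Z")
print(current_state["sub-1"]["status"])  # "suspended": one readable row
print(audit_log[-1]["before"])           # "active": who, when, why preserved
```

Support reads the row. Compliance reads the log. Nobody replays anything.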
Take a believable small-scale example. Say the system averages 8 writes per second, maintains 6 projections, and produces about 700,000 events per day. After 18 months it has around 380 million events. A new billing-reporting projection is introduced and replay runs at only 4,000 events per second once validation and downstream writes are included. That is more than 26 hours to build a single new view for a modest product.
This is the trap. The business never became meaningfully temporal. The architecture simply imported replay cost.
The pain is not dramatic at first. It is worse than that.
A customer reports that a subscription shows “active” in one admin screen, “pending approval” in another, and the invoice preview agrees with neither. The write path is technically correct. One projection is behind and another is interpreting an older event version differently. Now debugging requires event streams, projection offsets, code version history, and knowledge of replay behavior instead of reading the row the business thought it owned.
Early signal: support tickets say “sometimes wrong after refresh” rather than outright broken.
What the dashboard shows first: mild consumer lag, a few dead-lettered projection messages, normal API success rates, normal database latency.
What is actually broken first: the product becomes hard to explain even while it remains durably correct underneath.
Immediate containment: narrow the affected workflows, temporarily serve those pages from a state-first fallback or targeted on-demand recomputation, and stop broad replay until the bad projection version is isolated.
Durable fix: stop event-sourcing domains that do not need temporal truth, or at least split the bounded context so CRUD-heavy admin state remains state-first.
The ugliest part is not the lag. It is the meeting where everyone realizes the system is “correct,” nobody trusts it, and support still needs an answer.
That last point matters. Event sourcing can be manageable when applied to a narrow workflow with clear fact sequencing. It becomes expensive when stretched across a broad CRUD-heavy surface where half the events are just renamed state mutations. In that kind of product, bounded-context separation is often the senior move.
A useful blunt rule: if your system mostly needs current state with traceability, event sourcing is usually architectural theater.
Case Walkthrough 3
A mixed domain where selective use is the adult answer
Consider a commerce platform with product catalog, inventory reservations, pricing decisions, promotions, checkout, fulfillment, and returns.
The mistake here is not choosing event sourcing or rejecting it. The mistake is flattening the whole domain into one answer.
Inventory reservations may justify event sourcing because the business cares about exact sequencing of holds, releases, expirations, and compensations across asynchronous workflows. Pricing decisions may justify it if disputes or contractual obligations depend on temporal truth. Product descriptions, admin settings, content metadata, and broad configuration usually do not.
The adult move is selective adoption. Event-source the narrow parts of the domain where truth is inherently sequential and time-sensitive. Keep the rest state-first. Otherwise you end up paying projection, replay, and versioning costs for catalog edits and settings screens that never needed history-first modeling in the first place.
What Changes at Scale
Scale does not automatically justify event sourcing. It just stops hiding its cost.
At higher volume, the append path often remains healthy longer than expected. The trouble usually appears in materialization, replay, version compatibility, and historical operations.
The 10x transition is where the design gets honest. At 500 events per second, with one projection fleet, shallow history, and a few snapshots, replay is merely annoying. At 5,000 events per second, with more projections and more consumers, the same design starts demanding decisions about shard ownership, snapshot cadence, retention tiers, replay isolation, and whether some read models should still be fully derived from the full stream at all.
The architecture did not become wrong overnight. It ran out of cheap assumptions about how often history would be reprocessed, how many consumers would depend on old events, and how much lag the product could absorb before users stopped trusting it.
The first bottleneck is often the projection fleet or the downstream stores feeding read models. You add a new search projection, start replaying 2.3 billion events, and discover the bottleneck is not event retrieval. It is the search cluster absorbing rebuild traffic while still serving live indexing.
There is also a dangerous middle phase where the dashboard still looks healthy. Append latency is green. Broker lag is tolerable. Error rates are low. Meanwhile a new projection replay has stretched from 90 minutes to 19 hours, compatibility code for old events is multiplying, and the team quietly avoids historical reprocessing because it is operationally painful. Live-service metrics look fine while historical debt is already expensive.
Another scaling failure shows up when business rules change. A pricing or entitlement rule is updated, and old events lack a field the new rule assumes, such as region, contract tier, or override precedence. Live traffic stays healthy. The append path stays green. But replay and new projections now have to choose between inference and inconsistency. This is where event sourcing stops being a data pattern and becomes historical semantics management.
Failure propagation also gets nastier. A projection consumer falls behind by 30 seconds. Customers retry. Retries create more writes. More writes deepen lag. Support sees inconsistent views and triggers manual corrections. Manual corrections create more events. The system starts amplifying the gap between durable truth and visible truth.
Meaningful caveat: at very large scale, event sourcing can still be exactly right if the domain genuinely needs it. But once historical depth makes full replay expensive, snapshots, bounded rebuilds, archival tiers, and context separation stop being optimization. They become survival.
The Mistakes That Compound
The first compounding mistake is publishing events that describe service behavior instead of domain fact. “OrderUpdated” is rarely a fact. It is usually evidence that the team avoided deciding what actually happened.
The second is mixing domain events with workflow or process events in the same truth model. “InvoiceApproved” and “EmailSent” do not have the same semantic weight. When they share a history stream without clear boundaries, replay becomes a semantic mess rather than a reconstruction tool.
The third is delaying event versioning because “we only have one consumer for now.” That logic ages badly. The cost is not just code branches later. It is that future consumers inherit old ambiguity as if it were business truth.
The fourth is using replay as a repair mechanism before replay is safe. Teams add a replay button before they have side-effect fences, projection isolation, or confidence about what a replay actually republishes.
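A side-effect fence is the minimum bar before a replay button is safe. A sketch of the idea, with a hypothetical replay flag and handler: state mutation is always safe to repeat from history, but the external effect must fire only on first, live processing.

```python
# Sketch of a side-effect fence: during replay, external effects (emails,
# webhooks) are suppressed while projection state is still rebuilt.
# The replay flag and handler names are hypothetical.

sent_emails = []

def handle_invoice_approved(event: dict, state: dict, *, replaying: bool) -> None:
    # The state mutation is safe to repeat from history...
    state[event["invoice_id"]] = "approved"
    # ...but the external effect must fire only on live, first-time processing.
    if not replaying:
        sent_emails.append(f"approval email for {event['invoice_id']}")

state = {}
handle_invoice_approved({"invoice_id": "inv-1"}, state, replaying=False)  # live: email sent
handle_invoice_approved({"invoice_id": "inv-1"}, state, replaying=True)   # replay: no email
print(len(sent_emails))  # 1
```

Teams that skip this fence discover it the first time a rebuild re-sends a year of approval emails.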
The fifth is pretending the event stream is the source of truth while everyone actually trusts the projection. That split brain is common. Officially the stream is authoritative. Operationally the team debugs, explains, and supports the business from projection state. Once that happens, projection correctness is no longer secondary machinery. It is part of the product.
This is how teams end up with two truths and confidence in neither.
The sixth is introducing new projections years later and discovering the old events are semantically incomplete for the new question. The historical record is intact, but it never captured the reason code, eligibility context, or precedence rule the new projection needs. Teams call this a projection problem. It is usually a truth-model problem discovered late.
The seventh is letting the event store become central infrastructure by accident. No one explicitly owns replay safety, retention policy, compatibility guarantees, or historical data repair, but the entire system quietly depends on them. That is how event stores become core operational machinery without core operational stewardship.
When the Conventional Wisdom Is Wrong
Conventional wisdom says event sourcing is more advanced, so serious systems eventually grow into it.
Wrong. Many serious systems should avoid it on purpose.
Conventional wisdom says storage is cheap, so keeping everything forever is prudent.
Misleading. Cheap bytes are not cheap historical operations. Replayable history is much more expensive than retained records.
Conventional wisdom says event sourcing gives you perfect auditability.
Only in the thinnest sense. It gives you durable history. Whether that history is human-usable, queryable, and explainable depends on design discipline. Many teams would serve compliance, support, and operations better with current state plus a strong audit model.
Conventional wisdom says replay gives flexibility.
Potential flexibility, yes. Actual flexibility depends on event quality, side-effect isolation, version discipline, and tooling. Bad history is not flexible. It is permanent.
Another bad assumption is timing. Teams see event sourcing behave well at launch and take that as proof that the choice was right. Day 1 flatters history-first systems. Historical depth is shallow, projections are few, versions are young, and the engineers who designed the model still remember what they meant. The awkwardness arrives later.
The Decision Checklist
Ask these questions before choosing event sourcing:
Is the sequence of domain facts itself part of product correctness?
Do we need temporal queries that current state plus audit tables cannot answer cleanly?
Will replay create concrete business value, not just architectural optionality?
Are we ready to version events deliberately from the start?
Can we rebuild critical projections inside acceptable operational windows?
Do we know which read models may lag, and which workflows cannot tolerate that lag?
Can we replay safely without re-triggering external side effects?
Will support and on-call engineers be able to explain customer state at 2 a.m.?
Are we choosing this because history is essential, or because the architecture feels sophisticated?
Who explicitly owns projection correctness, replay tooling, event compatibility, retention decisions, and historical repair semantics?
Would state-plus-audit solve 80 to 90 percent of the need with far less machinery?
If question 11 makes the room uncomfortable, that discomfort is usually telling you something useful.