
News Feed Architecture and the Fan-Out Decision That Defines Everything

A news feed is not mainly a timeline storage problem. It is a decision about where fan-out cost lives, when ranking work happens, and how much celebrity skew the system can survive before the design starts lying.

Layer 3 / Feed & Content Systems / Staff / Free / 28 min / Published April 16, 2026





The core trade-off is simple to state and expensive to operate:

Fan-out-on-write assembles candidate feeds when content is created, pushes entries outward, and makes reads cheaper. Fan-out-on-read assembles candidate feeds when users request them, makes writes cheap, and pays the cost live. Hybrid pushes where the economics work, pulls where they do not, and accepts that the feed is now a routing policy wearing an architecture diagram.

Choose fan-out-on-write and your hardest problems become write amplification, backlog growth, hot-partition behavior, celebrity isolation, and stale timeline repair. Choose fan-out-on-read and your hardest problems become scatter-gather latency, cache miss storms, request-time ranking budgets, and p99 tail control. Choose hybrid and your hardest problem becomes coherence. The feed is no longer one mechanism. It is multiple expensive truths that have to agree fast enough to look like one product.

My view is blunt: for any consumer product with real follower skew and meaningful ranking, pure fan-out-on-write is usually a phase, not a destination. It works until the graph stops being polite. Once enough exceptions exist, the exceptions are the architecture.

Why This System Is Deceptively Hard

The whiteboard version stays clean right up until real traffic arrives.

Users create posts. Other users follow them. Build a timeline, maybe cache it, maybe rank it. That is why feed design is so often taught as a neat component diagram. Production strips that neatness away.

Feed architecture has to satisfy four forces that get more hostile as the graph gets less uniform.

Distribution cost

One post may need to become visible to 100 readers or 100 million readers. The average account is cheap. The graph tail is not.

Latency

Read-time assembly has to fit inside a real request budget. If your home-feed SLA is 200 ms server-side and ranking wants 80 ms, you are not left with much room for candidate retrieval, feature fetches, cache misses, and hydration.

Freshness

Product teams want the feed to react to new engagement, blocks, mutes, deletes, session intent, and late-arriving features. The more freshness matters, the less a fully materialized timeline resembles the actual feed.

Skew

A design that is excellent for ordinary users can be disastrous for celebrities. One account with 30 million followers can dominate write cost more than millions of normal accounts combined.

That last point is where many engineers still under-think the problem. They talk about scale in terms of total QPS or storage. Mature feed systems are often constrained by skew economics, not averages.

A feed is where graph shape turns into infrastructure bills.

Reader behavior matters almost as much as author behavior. Cold users, inactive users, heavy refreshers, and high-following users should not all receive the same assembly treatment. Pushing eagerly to a user who will not open the app for two days is not freshness. It is wasted work with nice intentions.

The Decision That Defines Everything

Diagram placeholder

Push, Pull, and Hybrid Feed Assembly

Compare fan-out-on-write, fan-out-on-read, and hybrid assembly so the reader can see where write amplification, request latency, and coherence costs land.

Placement note: Place immediately after The Decision That Defines Everything. The visual should make the cost shift between write-time and read-time unmistakable.

The defining choice is when the expensive assembly work happens.

Fan-out-on-write

When Alice posts, the system identifies her followers and writes a lightweight feed entry into each follower’s inbox or timeline store. Bob later opens the app and reads mostly preassembled candidates.

That sounds elegant because it converts a live read problem into a controlled write problem. For ordinary users, it often is elegant.

Take a small-scale example:

1 million registered users
180,000 daily actives
30,000 authors posting on a typical day
average follower count for active authors: 150
average posts per active author per day: 1.5

That yields about 6.75 million timeline insertions per day. Even if each feed entry is 180 bytes before replication, that is still a manageable write footprint. At this scale, fan-out-on-write can look almost embarrassingly effective. Read latency is good. Cache locality is good. Product iteration is fast. The first issues are often pedestrian: bad cache keys, oversized payloads, or overfetch in the ranker.

Now take a larger-scale example:

100 million registered users
25 million daily actives
6 million authors posting on a typical day
median follower count: 35
average follower count among posting authors: 220
99.99th percentile follower count: 15 million+

On paper, the average still looks survivable. In practice, the distribution tail changes the economics completely. A single celebrity post can generate millions of candidate writes. A small number of high-reach accounts can account for a wildly disproportionate share of total fan-out work. If ten such accounts post inside the same hour, the architecture is no longer being tested on average throughput. It is being tested on burst absorption, queue isolation, and how much lag the product can hide before users notice.
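The skew claim is easy to check with back-of-envelope arithmetic. The sketch below uses only the figures from the two examples above; every number is illustrative, not a measurement:

```python
# Back-of-envelope fan-out arithmetic. All figures come from the two
# examples above; nothing here is a measurement.

def daily_fanout_writes(authors: int, posts_per_author: float,
                        avg_followers: int) -> int:
    """Candidate-feed insertions per day under pure fan-out-on-write."""
    return int(authors * posts_per_author * avg_followers)

# Small-scale example: 30k posting authors, 1.5 posts/day, 150 followers.
small_scale_daily = daily_fanout_writes(30_000, 1.5, 150)  # 6.75M/day

# Large-scale tail: one post from a single 15M-follower account.
celebrity_single_post = 15_000_000

# One celebrity post outweighs more than two full days of the entire
# small-scale system's fan-out work.
ratio = celebrity_single_post / small_scale_daily
```

The point of the exercise is that the tail, not the mean, sets the write-amplification budget.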

This is where the clean diagram starts lying. “The average follower count is only 220” is comfort food, not capacity planning.

The write path rarely fails at the database first. It fails at distribution.

Fan-out-on-read

When Bob opens his home feed, the system looks at the follow graph, fetches recent content from authors he follows, merges candidates, ranks them, and returns the page.

This avoids write amplification for high-fan-out authors. The cost is pushed onto readers, which can be a good trade when most content will never actually be consumed by most followers.

But fan-out-on-read has its own lie: writes stay cheap only because reads absorbed the bill.

If Bob follows 50 accounts, fetching recent posts is simple. If Bob follows 2,000, the feed path becomes a multi-stage candidate assembly pipeline with scatter-gather behavior, partial failure, and heavy dependence on cache locality. At that point you are running a search and ranking system on every page request, whether or not you admit it.

Read amplification gets worse at larger scale for two reasons. First, some users follow many active authors, so request-time candidate width grows faster than teams expect. Second, better ranking usually wants a much wider frontier than the user will ever see. A page with 20 visible items may require 500 or 2,000 candidate references upstream. At small scale, that feels tolerable. At large scale, it becomes a permanent tax on latency, backend IO, and ranking spend.

Hybrid

At meaningful scale, most consumer feed systems become hybrid because one global policy stops being honest.

The standard pattern is simple:

push content from ordinary authors into user-specific candidate stores
pull content from high-fan-out authors at read time
rank on read over the merged candidate set
precompute enough to keep latency sane, but not so much that freshness becomes fiction

The better systems go one step further and treat readers differently too:

heavy users and frequent refreshers get more aggressively precomputed candidate windows because the work will likely be consumed
cold or inactive users get less eager push, more pull, or coarser precomputed slices because most pushed work would expire unseen
high-following users may get author-tiered pull paths because fully pushing their graph is expensive and often low-value

At scale, the fan-out decision stops being “push or pull” and becomes “for this class of author and this class of reader, which bill is less dangerous to pay?”

Request Path Walkthrough

Diagram placeholder

Hybrid Feed Write Path and Read Assembly

Show canonical post storage, author-post index, fan-out policy classification, push candidate creation, pull retrieval, merge, filtering, ranking, and hydration.

Placement note: Place at the start of Request Path Walkthrough. This should be a flow diagram that makes hybrid coherence feel like the central challenge.

This is where feed systems stop being whiteboard-friendly.

A realistic hybrid feed path has both a write pipeline and a read assembly path, and the exact point where ranking enters the flow is one of the most consequential decisions in the system.

Write path

Suppose Alice publishes a post.

Write to canonical post storage

The post is persisted in durable storage with author ID, creation timestamp, visibility rules, media metadata, and any early moderation state.

Write to author-post index

The system appends the post to an author-centric store keyed by author ID and time. This is the source for pull-based retrieval later.

Classify fan-out mode

A policy service decides whether Alice is handled via push, pull, or mixed treatment. This is usually based on follower count, historical open rates, region distribution, reader activity, or a cost model.

A plausible policy looks like this:

below 100k followers: push
100k to 5M: partial push to high-affinity readers, pull for the rest
above 5M: pull only

That threshold is never permanent. It moves with traffic shape, product expectations, and capacity reality.
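A minimal sketch of that classification step, assuming follower count is the only input (real policies also weigh open rates, region distribution, and cost models, as noted above). The thresholds and names here are illustrative, not recommendations:

```python
from enum import Enum

class FanoutMode(Enum):
    PUSH = "push"             # write into every follower inbox
    PARTIAL_PUSH = "partial"  # push to high-affinity readers, pull for the rest
    PULL = "pull"             # resolve at read time from the author-post index

# Thresholds mirror the plausible policy above. In a real system they
# live in config and move with traffic shape and capacity reality.
PUSH_MAX = 100_000
PARTIAL_MAX = 5_000_000

def classify_fanout(follower_count: int) -> FanoutMode:
    """Pick a distribution treatment for one author."""
    if follower_count < PUSH_MAX:
        return FanoutMode.PUSH
    if follower_count <= PARTIAL_MAX:
        return FanoutMode.PARTIAL_PUSH
    return FanoutMode.PULL
```

Keeping the decision in one pure function matters more than the exact cutoffs: it makes the policy testable and makes threshold moves a config change rather than a refactor.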

A mature version of this policy also considers reader value. Pushing to two million dormant followers is not a good use of infrastructure just because the graph edge exists. Many real systems quietly move from full follower fan-out to active-set fan-out long before they admit they have become hybrid.

Enqueue fan-out tasks for pushable audiences

For ordinary authors, the system emits fan-out tasks partitioned by follower shard or user segment. Each task writes small feed entries into per-user candidate stores.

This is one of the clearest differences between a 1M-user system and a 100M-user system. At 1M users, “enqueue fan-out work” is mostly a throughput concern. At 100M users, it becomes a queue-discipline concern. You need bounded queues, isolation by author tier, replay safety, hot-shard mitigation, and often regional pacing so one burst does not poison global freshness.

The write path never looks dangerous in architecture diagrams. It looks dangerous in queue age.

Attach lightweight ranking hints

Many systems store cheap ranking hints at write time, such as age bucket, graph proximity, author prior, or coarse predicted quality. These are not final scores. They are a way to make later candidate pruning cheaper.

Asynchronous side effects

Notifications, counters, moderation updates, activity logs, and search index updates run in separate pipelines. This separation matters because feed freshness should not be hostage to every downstream concern.

The subtlety here is that a write path that looks simple at 1M users becomes a queue-management system at 100M users. It is not “write to follower inboxes.” It is “maintain bounded, isolated, replayable distribution pipelines whose lag does not become product-visible during skew bursts.”

Read path

Now Bob opens the home feed.

Fetch precomputed inbox candidates

The feed service loads Bob’s recent preassembled entries from a timeline store or cache. These are usually not final ranked items. They are candidate references, maybe the most recent 500 to 2,000.

Fetch pull-based candidates

For authors Bob follows who are classified as pull-only, the system fetches recent posts from author-post indexes. In mature systems this is not all followed authors. It is usually a narrowed frontier: recent active authors, historically high-affinity authors, or other high-yield sources.

That narrowing matters more than many designs admit. A system that naively pulls from the full follow graph spends its latency budget on candidate breadth before ranking has even started. Good feed systems do not ask the read path to rediscover the world from scratch.

Merge and deduplicate

The system merges push and pull candidates. If a partially pushed post also appears via pull, deduplication has to be deterministic. This sounds minor until a hybrid bug makes the same post show up twice in the first five slots.

This is one of those bugs that looks trivial in code review and ugly in production.
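A deterministic merge can be sketched in a few lines. The priority rule here, pull beats push on conflict, is an illustrative assumption on the theory that the author-post index is fresher than a fanned-out inbox entry; what matters is that the rule is explicit and stable across requests:

```python
def merge_candidates(push_entries, pull_entries):
    """Merge push and pull candidates, deduplicating by post ID.

    Dedup is deterministic: when the same post arrives from both paths,
    the pull copy always wins (an illustrative choice, not a universal
    one). Entries are dicts with at least post_id and created_at.
    """
    merged = {}
    for entry in push_entries:   # push first ...
        merged[entry["post_id"]] = entry
    for entry in pull_entries:   # ... pull overwrites on conflict
        merged[entry["post_id"]] = entry
    # Stable order: newest first, post ID as tiebreaker so equal
    # timestamps cannot reorder between requests.
    return sorted(merged.values(),
                  key=lambda e: (-e["created_at"], e["post_id"]))
```

The tiebreaker is the part that prevents the "same page, different order on refresh" class of bug.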

Apply policy filters

Blocks, mutes, privacy checks, soft-deletes, integrity interventions, and regional policy checks are applied here or partly earlier. This is one reason fully materialized final-order timelines age badly. Policy is not static.

Candidate pruning

This is the first ranking insertion point. Before expensive ranking, cheap heuristics narrow the set from perhaps 2,000 items to 300. Inputs may include age, author affinity, engagement prior, language match, freshness bucket, and social-context signals.

Feature enrichment

For the reduced set, the system fetches richer features such as recent engagement counts, predicted dwell value, negative feedback history, session context, and graph features.

Final ranking

A heavier ranker orders the top candidates. This is where freshness pressure collides with precomputation. The more the model depends on near-real-time features, the less faith you can place in precomputed order.

The expensive mistake is not ranking too late. It is ranking expensively against the wrong frontier.
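The two-stage shape can be sketched as follows. The hint fields and the cheap-score formula are placeholders; the point is the structural guarantee that the heavy ranker never sees the full frontier:

```python
def assemble_page(candidates, heavy_rank, page_size=20, prune_to=300):
    """Two-stage ranking: a cheap heuristic narrows the wide frontier,
    and the expensive ranker only ever scores the survivors.

    Candidates are dicts carrying cheap write-time hints (age_bucket,
    author_affinity); heavy_rank stands in for the expensive model
    call. All field names are illustrative.
    """
    # Stage 1: candidate pruning from precomputed hints only.
    def cheap_score(c):
        return c["author_affinity"] - 0.1 * c["age_bucket"]

    pruned = sorted(candidates, key=cheap_score, reverse=True)[:prune_to]

    # Stage 2: heavy ranking over at most prune_to items,
    # never over the whole frontier.
    ranked = sorted(pruned, key=heavy_rank, reverse=True)
    return ranked[:page_size]
```

If the heavy model gets smarter, this structure still holds; only the stage-1 frontier shaping needs to stay honest about what it discards.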

Hydration

Post bodies, media URLs, author labels, reply counts, and social context are fetched for the top N items.

Response and cursor emission

The service returns, for example, 20 items and a cursor that references a candidate boundary rather than a strict timestamp. That distinction matters when ranking reshuffles content.

Why ranking insertion point matters

A feed system that ranks only after all candidates are fetched behaves very differently from one that stores partially ordered candidate windows ahead of time.

If ranking happens mostly on write, reads are cheap but freshness becomes expensive. If ranking happens mostly on read, freshness improves but latency and cacheability get worse. If ranking happens in two stages, cheap candidate shaping can be precomputed while final ordering stays live.

The sharp judgment is this: if live features decide top-of-feed quality, precompute presence, not final order. Full feed precomputation starts to become theater the moment top-of-feed quality depends on signals that move faster than your materialization cycle.

The ranker gets the glamour. The candidate pipeline gets the blame. In production, the candidate pipeline is usually closer to the truth.

Where the Architecture Hides Debt

Feed systems accumulate debt in places that are easy to ignore when growth is fast and hard to ignore when incidents begin.

1. Timeline entry bloat

Teams say “we only store a pointer in the feed.” Then the pointer grows teeth: post ID, author ID, creation time, coarse rank score, feature bucket, source reason, visibility flags, experiment tags, hydration hints, dedupe keys.

The entry that started at 40 bytes becomes 200 bytes. That matters when you keep 1,000 entries per user for tens of millions of active users. A small metadata decision turns into tens of terabytes of hot state.
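That arithmetic is worth writing down. The sketch below uses the figures from the paragraph above; the replication factor of 3 and the 50M active-user count are assumptions standing in for "tens of millions of active users":

```python
# Hot-state arithmetic for timeline entries. The 40-byte and 200-byte
# entry sizes and the 1,000-entries-per-user window come from the text
# above; replication factor and user count are assumptions.

def hot_state_bytes(entry_bytes: int, entries_per_user: int,
                    active_users: int, replication: int = 3) -> int:
    """Total replicated hot state for per-user timeline entries."""
    return entry_bytes * entries_per_user * active_users * replication

ACTIVE_USERS = 50_000_000  # illustrative stand-in for "tens of millions"

lean = hot_state_bytes(40, 1_000, ACTIVE_USERS)      # the original pointer
bloated = hot_state_bytes(200, 1_000, ACTIVE_USERS)  # the grown entry

TB = 1024 ** 4
# lean is roughly 5.5 TB of replicated hot state; bloated is roughly 27 TB.
```

Five times the entry size is five times the hot state, and hot state is the expensive kind.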

  1. Rebuild debt

Precomputed timelines are fast right until they need to be rebuilt.

This is where many dashboards tell a flattering story. A system can show excellent steady-state latency and still be fragile because its refill economics are awful. Lose a cache tier or invalidate too much state, and rebuild traffic starts competing with live traffic. The system that looked efficient becomes self-hostile under repair.

  1. Freshness debt

Every time the ranking team adds a signal that changes rapidly, the value of precomputed order declines.

The pattern is familiar:

first, precompute full order
then, re-rank the top 100 on read
then, re-rank the top 300 because freshness matters more
then, widen candidate fetch because the old frontier became biased
eventually, the “precomputed feed” is mostly a convenience cache for candidate presence

Nothing broke technically. The architecture just drifted. Teams that miss that drift keep optimizing the wrong layer.

4. Policy invalidation debt

Deleting a post from canonical storage is easy. Removing it from millions of precomputed feeds is not. The same goes for privacy changes, suspensions, blocks, and legal takedowns.

The more aggressively you materialize, the more cleanup work you create later. Engineers underestimate this because their mental model is append-heavy. Mature feed systems are not append-heavy where it matters. They are mutation-heavy.

5. Reader-state debt

If the system eagerly builds rich candidate state for users who are inactive, sporadic, or unlikely to refresh soon, the feed quietly accumulates wasted work. A design that looks excellent for your heaviest 10 percent of readers can be economically foolish for the quiet majority.

One of the highest-payoff scale optimizations is often not a better queue or a faster ranker. It is simply refusing to precompute for readers who are unlikely to cash the check.

6. Graph-churn debt

Feed architecture is not only tested by new posts. It is also tested by new follows, unfollows, returning cold users, and users who suddenly become active again after days away.

That is where materialization starts invoicing you for old decisions. If you fully precompute, follow-graph churn creates backfill and catch-up work. If you fully pull, returning users turn into expensive read-time reconstruction. Systems that survive growth well usually stop pretending the whole graph deserves the same treatment. They push to likely readers, pull for the rest, and treat catch-up as a first-class workload.

7. Freshness divergence debt

Cache freshness and ranking freshness are not the same thing. A candidate cache may be warm while the ordering logic on top of it is stale, or the ranker may be healthy while the candidate inventory underneath it is missing new posts.

This is one of the nastiest production debts because top-line metrics often stay green. The feed is being served quickly, but it is no longer truthful.

Capacity and Scaling Behavior

The scaling behavior of a feed is not determined just by DAU or QPS. It is determined by the interaction of graph shape, posting rate, open rate, freshness expectations, and recovery behavior.

Small-scale example: 1M users

Suppose you have:

1M registered users
200k DAU
20k concurrent read peak
50k posters/day
average follower count: 120
median posts/day per poster: 1
modest ranking sophistication

A largely push-based design can work well here.

If the system generates 6 million timeline insertions per day, spread evenly enough, that is fine. Reads can hit per-user timeline caches. The product gets a snappy feed. You can tolerate mild staleness because the graph and audience are still forgiving.

The first bottlenecks here are usually not fan-out limits. They are far more ordinary:

poor cache keys
oversized feed payloads
follow-graph lookup hotspots
ranking latency spikes caused by overfetching

This is why smaller systems often learn the wrong lesson. The architecture did not prove itself. The graph was just too polite to expose the tail.

Large-scale example: 100M users

Now suppose:

100M registered users
25M DAU
3M to 5M concurrent feed reads at peak
8M posters/day
average follower count: 250
99th percentile follower count: 80k
99.99th percentile follower count: 15M
meaningful ranking with live features
multiple regions

This is no longer a simple timeline service.

A push-heavy design may still look fine in aggregate. Daily insertion volume might fit inside your storage and queue budget. But averages become misleading. If the top 0.01 percent of authors generate a large share of total fan-out writes, then celebrity activity, not aggregate request count, becomes the real scaling variable.

What breaks first in push-heavy designs

The first real bottleneck is usually distribution queue health for high-fan-out authors, not storage and not ranker CPU.

A single high-reach post can enqueue millions of writes. Those writes are bursty, partition-sensitive, and often region-skewed. If several such authors post inside a narrow window, backlog age climbs. Once backlog age climbs, freshness degrades. Once freshness degrades, users refresh more. Once refreshes increase, read QPS rises against stale or half-filled candidate stores.

That sequence matters because the dashboard that screams first may be feed-read latency, while the root cause is write-side lag.

A dangerous pattern is when the system looks healthy on average while already failing economically. Mean queue lag is low. Median read latency is fine. Average post fan-out looks modest. Meanwhile, celebrity partitions are backing up, freshness age for valuable readers is widening, and cache churn is climbing because users keep refreshing unchanged pages.

A warm cache can hide a dead feed.

There is a cache-economics wrinkle here that clean diagrams miss. Celebrity content is often a great fit for author-centric caches and a terrible fit for per-user page materialization. Pulling one hot author’s recent posts into an author cache can be cheap and truthful. Pushing that same author into millions of per-user page caches is how elegant feed designs become expensive lies.

What breaks first in pull-heavy designs

The first real bottleneck is often candidate retrieval width and tail latency.

If a user follows many active authors, or if the system widens pull candidate sets to preserve ranking quality, request-time fan-out becomes expensive fast. A pull-heavy feed path behaves like a distributed search query over many author shards. p50 can look fine while p99 falls apart because one slow shard delays the merge.

Read amplification is also reader-tier dependent. Heavy users follow more accounts, refresh more often, and expect fresher ranking. They are exactly the readers for whom pure pull gets most expensive. That is why many hybrid systems precompute more aggressively for the most engaged readers, not less.

Where caching helps

Caching helps when it preserves work that is both expensive and reusable.

per-user candidate caches help when users refresh frequently and ranking can still run live over a stable frontier
per-author recent-post caches help pull-based retrieval, especially for high-fan-out authors
hydration caches help because post bodies and media metadata are shared across many reads
graph and policy caches help because follow edges and block edges are read constantly

Where caching creates new cost

Caching becomes dangerous when it stores state that is expensive to invalidate or ages faster than teams admit.

final per-user page caches are cheap to serve and expensive to keep truthful
fully ordered precomputed feeds lose value as ranking freshness increases
aggressive invalidation can turn policy events into traffic events
cache tier loss can turn a read-optimized design into a refill storm

Cache hit rate is one of the least trustworthy feed metrics in isolation. A high hit rate can coexist with stale feeds, brittle invalidation, bad rebuild behavior, and poor ranking freshness.

Failure Modes and Blast Radius

Feed incidents are dangerous because the user-visible symptom is often downstream of the actual failure.

The hardest operational truth is that the dashboard that pages you is often showing the wrong layer. Feed systems fail as chains, not as isolated faults. The publish path, candidate pipeline, cache layer, and ranker can all be individually “mostly healthy” while the product is already showing stale, missing, or delayed content.

Failure mode 1: one celebrity post turns write fan-out into stale feeds

This is the classic push-heavy failure chain.

A creator with 25 million followers posts during a live event. The publish write itself is cheap. Canonical storage succeeds in milliseconds. The post exists. The incident starts after that.

Early signal

queue depth for high-fan-out partitions starts climbing
oldest-unprocessed fan-out age increases for one author tier
publish-to-feed-visible latency widens for a subset of readers
follower-shard workers go hot while most of the fleet still looks normal

This is the moment when experienced teams look at age of oldest fan-out work, not just queue length. Length can lie if producers and consumers are both busy. Age tells you freshness is already being lost.
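Measuring age rather than length is a small amount of code. A sketch, with illustrative field shapes and an assumed 30-second lag SLO:

```python
import time

def oldest_work_age_seconds(enqueue_timestamps, now=None):
    """Age of the oldest unprocessed fan-out task for one author tier.

    Queue length can look fine while producers and consumers are both
    busy; age tells you freshness is already being lost. Timestamps
    are enqueue times in epoch seconds (an illustrative shape).
    """
    if not enqueue_timestamps:
        return 0.0
    now = time.time() if now is None else now
    return now - min(enqueue_timestamps)

def freshness_alert(age_seconds, slo_seconds=30.0):
    # Page on age against a per-tier lag SLO, not on queue depth.
    return age_seconds > slo_seconds
```

In practice the timestamp would come from the task envelope and the check would run per reach tier, so a celebrity backlog cannot hide inside a fleet-wide average.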

What the dashboard shows first

home-feed refresh QPS rises
median feed latency still looks acceptable
p95 or p99 latency starts drifting
feed cache churn increases
stale-feed complaints appear before hard errors

This is why teams misdiagnose these incidents. The first graph that moves is often read-path load or ranking traffic, not the write-side queue that is actually broken first.

What is actually broken first

The first broken thing is usually not storage and not the ranker. It is the fan-out pipeline’s ability to keep up with skewed write amplification. The post has been published, but the system cannot distribute candidate entries quickly enough to preserve expected freshness.

The explicit chain looks like this:

celebrity post is accepted into canonical storage
fan-out tasks are emitted to follower partitions
a small number of partitions receive a burst measured in millions of writes
queue age grows for that author tier
user inboxes are not updated on time
users open the app and do not see the new post
users refresh repeatedly because the feed appears unchanged
read QPS and cache churn increase
ranker processes more requests against thinner or staler candidate sets
on-call sees a feed-read incident even though the root failure began in the publish-to-distribute path

The worst feed incidents often start with a post that was successfully published.

Teams usually discover this late: the painful bug is not “publish failed.” The painful bug is “publish succeeded, then freshness died quietly behind it.”

Immediate containment

stop treating the celebrity path like ordinary fan-out
shift that author tier to pull-on-read or partial pull immediately
shed low-value push to cold readers
rate-limit or batch fan-out for the hot author tier
protect ordinary-author queues from celebrity backlog contamination
relax freshness targets for low-value feed surfaces

The goal is not to finish all pending writes at any cost. The goal is to stop backlog age from spreading.

Durable fix

separate queues by author reach tier
separate worker pools
separate lag SLOs
explicit pull fallback when projected fan-out cost crosses a threshold
publish-to-visible latency tracked by author class and reader class

Celebrity handling is not an optimization bolt-on. It is part of the architecture’s truth.

Longer-term prevention

simulate high-reach publish bursts in load tests
maintain per-tier capacity models
budget write amplification by author percentile, not just posts per second
keep age-of-oldest-unserved-post by reach tier on primary dashboards
validate that one hot author cannot poison ordinary-user freshness

If a 20M-follower author can enter the same write path as a 2k-follower author without strong isolation, the architecture is already lying to you.
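The isolation above starts with a routing decision at publish time. A minimal sketch, assuming invented thresholds and tier names; a real system would derive the cutoffs from measured fan-out cost per tier:

```python
# Sketch of reach-tier routing at publish time. Thresholds and tier
# names are illustrative, not taken from any specific system.

PUSH_MAX_FOLLOWERS = 50_000       # full fan-out-on-write below this
HYBRID_MAX_FOLLOWERS = 2_000_000  # above this, readers pull at read time

def route_fanout(follower_count: int) -> str:
    """Pick a distribution strategy so one hot author cannot enter
    the same write path as ordinary authors."""
    if follower_count < PUSH_MAX_FOLLOWERS:
        return "push"         # write to every follower inbox
    if follower_count < HYBRID_MAX_FOLLOWERS:
        return "push-active"  # push to recently active followers, pull for the rest
    return "pull"             # readers merge this author's posts at read time
```

The useful property is not the exact numbers but that the decision is explicit and per-author, so the 20M-follower publish never competes for the same queues as the 2k-follower one.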

Failure mode 2: pull-based assembly survives writes and then collapses under read spikes

Pull-heavy designs fail with a different signature. The write path looks beautiful during incidents. Posts publish cleanly. There is little backlog. The system feels elegant right until readers arrive in volume.

Early signal

- fan-out writes remain low and healthy
- candidate fetch breadth per request rises
- read-time scatter-gather width increases for heavy users
- p95 upstream fetch latency creeps up before visible error rates do
- ranking queue wait time grows even when ranker CPU does not look fully saturated

Healthy publish metrics fool teams here. The write path is green, so they assume the system is healthy.

What the dashboard shows first

- feed latency p99 drifts while p50 stays fine
- timeout rates rise for only a subset of requests
- candidate count delivered to the ranker falls
- users report repetitive or thin feeds rather than outright failure

This is a nasty failure shape because the feed can remain “up” while quality collapses.

What is actually broken first

What breaks first is not necessarily raw storage IO. Often it is the economics of live assembly:

- too many authors need to be consulted
- too many candidate fetches are required to build a decent page
- the ranker still expects a wide frontier even though retrieval is slowing
- a few slow shards dominate tail latency
- cache reuse is poor because each user’s frontier is highly personalized

In other words, the architecture is paying too much live assembly cost per request.

The system can stay technically available long after it stops being cheap enough to be healthy.

Immediate containment

- cap candidate breadth per author tier
- reduce pull depth for low-affinity authors
- shrink the ranker input set
- skip expensive feature fetches
- serve from recent candidate caches even if slightly stale
- pin part of the feed to reverse chronological or coarse heuristics temporarily

The goal is to stop every read from behaving like a distributed search query under distress.

Durable fix

- more aggressive candidate precomputation for heavy readers
- author-tiered pull policies
- per-author recent-post caching
- two-stage ranking so expensive ranking sees a smaller frontier
- better locality and partitioning for author-post retrieval

If the system needs 2,000 live candidates to find 20 good items, that is not just a scaling problem. It is a candidate-generation problem.
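Two-stage ranking, one of the durable fixes above, can be sketched as follows. Both scoring functions here are placeholders for real feature-based models; the point is only that the expensive pass never sees the full frontier:

```python
# Two-stage ranking sketch: a cheap first pass trims the candidate
# frontier before the expensive model runs. Both scorers are stand-ins.

def cheap_score(post: dict) -> float:
    # e.g. recency plus coarse affinity, with no feature fetches
    return post["recency"] + post["affinity"]

def expensive_score(post: dict) -> float:
    # stand-in for the full ranking model with feature fetches
    return 2 * post["affinity"] + post["recency"] + post.get("quality", 0.0)

def rank_feed(candidates: list[dict],
              stage1_keep: int = 200,
              page_size: int = 20) -> list[dict]:
    # Stage 1: cheap trim so the expensive model never sees 2,000 items
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:stage1_keep]
    # Stage 2: expensive ranking only over the shortlist
    return sorted(shortlist, key=expensive_score, reverse=True)[:page_size]
```

The trade is explicit: `stage1_keep` is the knob that containment turns down under load, at a measurable quality cost.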

Longer-term prevention

- measure candidate width at each pipeline stage
- track publish-to-visible latency for pull-only authors
- monitor ranker input size under normal and degraded modes
- quantify quality loss when the candidate frontier is thinned
- track refresh amplification during partial slowness

Pull-heavy designs usually fail on read amplification before they fail on simple request count.

Failure mode 3: hybrid logic becomes inconsistent under partial failure

Hybrid systems reduce worst-case cost, but they buy that relief with coherence risk. Under partial failure, they can produce feeds that are technically served and semantically wrong.

A post may be pushed to some readers, pulled for others, deduped incorrectly for a third segment, and ranked against stale features everywhere. This is not a theoretical inconvenience. It is a real class of production failure.

Early signal

- duplicate rate increases for mixed-source candidates
- publish-to-visible latency diverges by reader segment
- one region or platform sees the post while another does not
- dedupe mismatches rise when push and pull overlap
- visibility metrics disagree across serving paths

These are easy to miss because none of them necessarily trips a classic availability alert.

What the dashboard shows first

- users report “my post is visible on one device but not another”
- feed opens remain normal, but engagement on fresh posts drops
- ranking latency looks acceptable
- candidate cache hit rate remains high
- support sees complaints about missing or repeated content

The system looks green enough. The product feels broken.

What is actually broken first

What breaks first is usually consistency between assembly paths:

- the push path may be lagging
- the pull path may be healthy but narrower
- dedupe may use different recency windows
- policy filters may have updated in one path but not the other
- cache freshness and ranker freshness may be based on different clocks

Hybrid systems usually fail at the seams.

The ugly practical reality is that mixed-path correctness bugs rarely look like outages. They look like users losing trust.

Immediate containment

- choose a temporary source of truth for the affected author or reader segment
- disable one assembly path for the problematic tier
- widen or relax dedupe windows temporarily
- bypass stale partial caches if they are producing contradictions
- freeze ranking experiments that depend on unstable candidate joins

The wrong move is to keep multiple unstable truths alive and hope eventual consistency will be polite.

Durable fix

- one canonical candidate identity model across push and pull
- deterministic dedupe keys shared across all sources
- aligned freshness semantics for caches and ranking features
- observability that compares path outputs, not just component health
- publish-to-visible SLOs per assembly path
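A deterministic dedupe key shared across paths might look like this sketch, assuming hypothetical field names. The key depends only on canonical identity, never on which path delivered the candidate:

```python
# Deterministic dedupe key shared by push and pull assembly. Because the
# key excludes source, delivery time, and cache generation - all of which
# legitimately differ across paths - the same post can never appear twice
# in a merged candidate set.

def dedupe_key(author_id: int, post_id: int) -> tuple:
    return (author_id, post_id)

def merge_candidates(pushed: list[dict], pulled: list[dict]) -> list[dict]:
    """Merge push and pull candidates; earlier sources win on duplicates."""
    seen = set()
    merged = []
    for post in pushed + pulled:  # push wins ties; pull fills gaps
        key = dedupe_key(post["author_id"], post["post_id"])
        if key not in seen:
            seen.add(key)
            merged.append(post)
    return merged
```

The design choice worth noting: dedupe keys that include path-specific state (cache generation, delivery timestamp) are exactly how the same post shows up twice under partial failure.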

Longer-term prevention

- rehearse partial failure with one path lagging and another healthy
- test source disagreement as a first-class failure mode
- instrument candidate provenance in production
- keep dashboards for visibility gaps across reader segments
- treat duplicate and missing-content rates as production metrics, not QA metrics

Hybrid reduces average cost. It also creates more ways for the system to be wrong without being obviously down.

Failure mode 4: cache freshness and ranking freshness diverge

This failure is quietly common in mature feed systems.

The feed service may be returning cached candidate sets quickly. The ranker may be healthy. Page latency looks good. The user experience is still stale because candidate freshness and ranking freshness have drifted apart.

Early signal

- recently hot posts arrive late into candidate pools
- engagement-reactive signals lag behind candidate delivery
- publish-to-visible latency looks fine for some tiers and poor for others
- ranking scores are computed over old candidate frontiers

What the dashboard shows first

Almost nothing dramatic:

- cache hit rate remains high
- feed latency remains low
- ranker latency stays within budget
- no storage tier is obviously red

This is the dangerous class where every subsystem looks respectable on its own.

What is actually broken first

What is broken is the contract between layers. The feed is fast, but it is no longer temporally coherent. A hot post is missing because candidate assembly lagged. An older post remains overexposed because page caches are warm. The feed feels dead before the latency dashboards admit anything is wrong.

Immediate containment

- shorten candidate cache TTLs for hot author tiers
- bypass final-page caches for rapidly changing inventory classes
- bias ranking toward fresher sources temporarily
- reduce reliance on live signals known to be lagging
- explicitly surface recent-content windows during freshness events

Durable fix

- separate metrics for candidate freshness, ranking-feature freshness, and publish-to-visible latency
- align freshness budgets across retrieval and ranking
- make ranking source-aware so stale candidate pools are treated differently
- design caches according to how quickly each inventory class ages
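One way to keep these budgets aligned is to declare them in one place and check the invariant mechanically. The classes and numbers below are illustrative assumptions, not recommendations; the property worth enforcing is that cached candidates cannot outlive the ranking features scored against them:

```python
# Sketch of explicit freshness budgets per inventory class. Candidate
# cache TTL, allowed feature staleness, and publish-to-visible budget
# are set together so retrieval and ranking cannot silently drift apart.

FRESHNESS_BUDGETS = {
    # class:           (candidate_ttl_s, feature_staleness_s, publish_to_visible_s)
    "hot_author":      (15,              30,                  60),
    "ordinary_author": (120,             300,                 300),
    "cold_archive":    (3600,            3600,                None),  # no freshness promise
}

def is_coherent(inventory_class: str) -> bool:
    """True when cached candidates cannot outlive the ranking
    features that will be scored against them."""
    candidate_ttl, feature_staleness, _ = FRESHNESS_BUDGETS[inventory_class]
    return candidate_ttl <= feature_staleness
```

A check like this belongs in config validation, so a TTL bump for cache-hit-rate reasons cannot quietly break temporal coherence.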

Longer-term prevention

- measure “how old is the newest visible post” by reader cohort
- track whether the ranker received a fresh frontier, not just whether it answered fast
- measure not just “post persisted” but “target readers saw it in time”

Fast is not the same as fresh. Feed teams learn that one painfully.

Failure mode 5: one subsystem looks healthy while the feed is already stale

This is the operational pattern that separates feed systems from simpler serving stacks.

It is completely possible to have:

- a healthy publish service
- normal storage latency
- a healthy ranker
- warm caches
- a low error rate

And still have delayed post visibility, stale middles of feed, or contradictory freshness across reader cohorts.

Why? Because the real product metric is not “did every subsystem respond.” It is “did fresh, eligible, relevant content reach the right reader within the expected time window.”

Early signal

- publish-to-first-visible latency
- age of the newest item in the top N by reader cohort
- stale-refresh rate: users refreshing without a meaningful feed change
- candidate frontier freshness by author tier and reader tier
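The “age of newest item in top N” signal can be computed directly from served pages. A minimal sketch, assuming epoch-second timestamps and invented cohort names:

```python
# Freshness metric sketch: how old is the newest item visible in the
# top N of a served page, broken down by reader cohort. A feed can be
# fast and still fail this check.

def newest_item_age(feed_page: list[dict], now: float, top_n: int = 20) -> float:
    """Seconds between 'now' and the newest publish time in the top N.
    Large values mean the feed is serving quickly but serving stale."""
    top = feed_page[:top_n]
    if not top:
        return float("inf")  # an empty page is maximally stale
    return now - max(item["published_at"] for item in top)

def freshness_by_cohort(pages: dict, now: float) -> dict:
    return {cohort: newest_item_age(page, now) for cohort, page in pages.items()}
```

Plotted as a histogram per cohort, this is the metric that goes red while every component dashboard stays green.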

What the dashboard shows first

- refresh volume rises
- engagement on new posts drops
- session depth falls
- user complaints increase while the infrastructure still looks green

What is actually broken first

Usually something hidden in assembly:

- backlog in one fan-out tier
- a stale candidate cache not invalidated for a hot author
- slow pull retrieval for one reader class
- ranking consuming candidate sets that are technically served but temporally out of date

Immediate containment

- stop optimizing for cache efficiency and start optimizing for visible freshness
- isolate the affected assembly tier
- degrade low-value traffic first
- expose a simpler but fresher path if needed

Durable fix

- publish-to-visible latency SLOs
- feed freshness histograms by cohort
- visibility-gap metrics between author tiers and reader tiers
- “no meaningful feed change after refresh” tracking
- candidate provenance and age beside latency metrics

Longer-term prevention

The long-term fix is partly cultural. Teams have to stop asking only “is the service up?” and start asking “is the feed fresh for the users who matter right now?”

Trade-offs

The cleanest way to frame the decision is not “what are the pros and cons.” It is “which lie can your system afford?”

Push is the right lie when read frequency is high, cache reuse is real, and most of the work you precompute will actually be consumed. You are paying extra write cost to avoid paying the same read cost over and over again.

Pull is the right lie when attention is unpredictable, follower skew is brutal, and distributing work eagerly would mostly be waste. You are accepting request-time cost because unused fan-out is the more expensive failure.

Hybrid is the only honest answer when one global policy stops matching the graph. If reader behavior is uneven, celebrity skew is real, and ranking freshness matters, then treating all authors and all readers the same is not simplicity. It is denial with nicer diagrams.

That is the real trade. Push is cheaper when readers are predictable. Pull is cheaper when attention is unpredictable. Hybrid exists because large social graphs are neither.

What Changes at 10x

Most feed architectures do not die because QPS went up 10x. They die because skew, recovery, freshness, and reader heterogeneity finally get a vote.

From 1M to 10M users, teams often stretch a push-heavy design with better queueing, larger caches, partial celebrity exceptions, and some read-time reranking. That phase is dangerous because it teaches the wrong lesson. It makes the architecture look more durable than it is.

From 10M to 100M users, different questions take over. Celebrity handling becomes a capacity tier, not a special case. Recovery becomes architecture, not cleanup. Reader segmentation matters because cold users and heavy users should not consume the same assembly budget. Ranking freshness starts competing directly with precomputation value. Multi-region locality starts affecting queue lag and cache warmth. Tooling has to tell you not just whether feed reads are fast, but whether candidate quality, freshness age, celebrity lag, and refill pressure are drifting apart.

This is overkill unless the product is already feeling real skew, real freshness pressure, or painful refill incidents. But once those show up, treating feed assembly as a storage problem is not disciplined engineering. It is denial.

Operational Reality

The system usually ends up with explicit operational truths:

- feed freshness has an SLO
- publish-to-visible latency has an SLO
- fan-out lag has a budget
- celebrity traffic is isolated
- expensive enrichments can be skipped
- policy invalidations are staged
- refill traffic is rate-limited
- degraded ranking modes are acceptable for short periods
- low-value push to cold readers can be deferred during stress
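These truths tend to end up encoded as an explicit degradation ladder rather than ad-hoc incident decisions. The load thresholds and shed actions below are an illustrative policy sketch, not any specific team’s runbook:

```python
# Staged degradation sketch: each load level adds shed actions on top
# of the previous ones. Thresholds and action names are invented.

DEGRADE_LADDER = [
    # (trigger_load, actions added at this level)
    (0.70, ["defer_push_to_cold_readers"]),
    (0.80, ["skip_expensive_enrichments", "rate_limit_refill"]),
    (0.90, ["shorten_candidate_windows", "coarse_ranking_mode"]),
    (0.95, ["reverse_chronological_fallback"]),
]

def active_degradations(load: float) -> list[str]:
    """Return every shed action in force at the current load level."""
    actions = []
    for trigger, shed in DEGRADE_LADDER:
        if load >= trigger:
            actions.extend(shed)
    return actions
```

Writing the ladder down in advance is the point: during an incident nobody has to negotiate which freshness promise to break first.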

That matters because the product requirement is not actually “always serve the perfect feed.” It is much closer to “always serve a believable feed, protect the stack, and recover without turning one failure into three.”

In practice, teams often accept:

- mildly stale middle-of-feed content
- lower-rank precision during incidents
- shorter candidate windows
- weaker personalization for very large authors
- delayed social counts
- reverse-chronological fallback slices

That is not lack of ambition. It is operational maturity.

The production reality most architecture writeups miss is this: during an incident, the team is usually not trying to preserve one ideal feed. It is trying to preserve three things in order: fresh-enough top-of-feed content, believable behavior on refresh, and bounded blast radius across author and reader tiers. Everything else becomes negotiable.

The late lesson is that operational complexity rarely announces itself as complexity. It shows up as one more special case, one more queue, one more cache, one more fallback you swear is temporary.

When the feed looks unchanged, users stop being consumers and start becoming load generators.

Common Mistakes Engineers Make

Mistake 1: optimizing the post store instead of the assembly boundary

Teams spend months tuning storage, then get surprised when the first real bottleneck is fan-out queue age or request-time candidate width. Posts rarely break the system first. Assembly does.

Mistake 2: using average follower count as if it were a planning metric

A feed graph with a modest average can still be dominated by a brutal tail. If your capacity model is based on average fan-out, you are modeling a product you do not actually run.
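A toy calculation makes the point. The numbers are invented, but the shape, a modest average with a brutal tail, matches real social graphs:

```python
# Toy numbers showing why average follower count misleads capacity
# planning: the average hides where and when the write amplification lands.

ordinary_authors = 1_000_000
avg_followers = 200              # the "modest average"
celebrity_followers = 20_000_000 # one tail author

# One day where every ordinary author posts once and the celebrity posts once:
ordinary_fanout = ordinary_authors * avg_followers  # 200M inbox writes, spread all day
celebrity_fanout = celebrity_followers              # 20M inbox writes, in seconds

# The single celebrity post is roughly 9% of the day's total write
# amplification, concentrated into a handful of partitions in one burst.
share = celebrity_fanout / (ordinary_fanout + celebrity_fanout)
```

An average-based capacity model sees one extra post among a million. The queues see a burst a hundred thousand times larger than a typical publish, aimed at a few partitions.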

Mistake 3: treating celebrity handling as optional

If high-reach authors can melt the write path, celebrity logic is not an optimization. It is core architecture. Designing it late is how teams end up with ordinary users paying for celebrity traffic.

Mistake 4: precomputing too much final order

Candidate presence ages better than final order. Engineers often materialize too much ranking too early, then spend the next year compensating for freshness drift with increasingly expensive read-time patches.

Mistake 5: measuring component health instead of feed truth

A healthy publish service, healthy ranker, and warm cache can coexist with a stale feed. If you do not measure publish-to-visible latency and freshness by cohort, you are operating blind.

Mistake 6: treating refill as an afterthought

A system that is fast only while caches are warm is not robust. It is lucky.

When To Use

Use a push-heavy or hybrid feed when:

- users refresh the home feed frequently
- low-latency reads matter a lot
- the system can tolerate some staleness in candidate assembly
- much of the feed value comes from recently available candidate pools rather than full live search
- the team can invest in queueing, replay, isolation, and invalidation tooling

Use a hybrid feed in particular when:

- follower skew is strong
- celebrity traffic is materially different from ordinary traffic
- ranking needs live freshness but not fully live candidate search for every item
- cache locality on reads matters, but full precomputation has started to drift from reality
- reader behavior is uneven enough that heavy users deserve more precomputation than cold users

When NOT To Use

Do not build a heavy hybrid feed stack if the product does not need it.

If you have:

- a small graph
- bounded follower counts
- low posting frequency
- simple reverse-chronological ordering
- early-stage product uncertainty

then a simpler pull model or modest push model is usually the right answer.

Many teams build for a 100M-user graph when they have a 200k-user product. That buys them years of operational burden before they earn the benefits. A feed system should be allowed to become complicated only when the graph, freshness demands, and incident history prove that simplicity has become dishonest.

How Senior Engineers Think About This

Senior engineers do not start with component diagrams. They start with failure economics.

They ask:

- Where does the assembly cost land?
- Which users actually justify eager precomputation?
- What is the first bottleneck under celebrity skew?
- What is the graceful degradation story if fan-out lags?
- What is the recovery story if timeline caches disappear?
- Which ranking decisions must stay fresh, and which can be precomputed?
- How much invalidation work are we creating by materializing more state?
- What will the incident dashboard show first, and what will actually have failed first?

They distinguish:

- candidate presence from final order
- steady-state throughput from refill throughput
- average graph behavior from cost-dominant graph behavior
- a fast feed from a fresh feed
- a ranking problem from a candidate-assembly problem

Most of all, they understand when the original architecture has started lying. The design that once looked elegant starts hiding lag, skew, and stale ranking behind caches and heuristics. Senior engineers recognize that moment earlier. They stop defending the old mental model and start redrawing the assembly boundary.

At 1M users, a feed architecture can get away with being elegant. At 100M, it has to be honest.

Summary

News feed architecture is defined by one decision: when feed assembly happens. Fan-out-on-write pushes cost into the write path and buys cheap reads until skew and invalidation make that bargain expensive. Fan-out-on-read pushes cost into the request path and buys flexibility until latency, scatter-gather behavior, and ranking cost make that bargain painful. Hybrid exists because large feed products do not have one graph shape, one freshness need, or one safe place to put the bill.

The mature lesson is simpler and harsher than most diagrams suggest: the wrong feed architecture is not the one that looks unsophisticated. It is the one that pretends one fan-out policy can stay honest across a graph that no longer behaves like one population. At 100M users, the real design is whatever still tells the truth about where the cost lands.