Limits: The Engineering Decision You Keep Postponing—Until Production Makes It For You
What running at scale taught us about the limits we didn’t set.
Prologue — Why “We’ll Add It Later” Is a Trap
Every system starts with the same quiet assumption: we’ll add limits later.
And at the start, that’s a fair call. Data is small. Users are few. Edge cases are theoretical. Adding limits up front feels like premature optimization — or worse, a bad user experience you’re imposing on people who haven’t done anything wrong yet.
So you skip them.
Then, months or years in, the system starts answering back. Not with errors. Not with crashes. With something far more dangerous: it gets slow. Inconsistent. Unpredictable. Users file vague tickets like “the app feels weird today” and your dashboards look almost fine.
That’s the moment you realize limits were never optional. They were just deferred — and deferred limits always come due with interest.
This is the story of the incidents that reshaped how we think about limits — and why stress testing is the cheapest way to find them before your users do.
🚨 Incident 01 — The Save That Stopped the World
The first incident wasn’t a crash. It was a plateau. CPU had climbed, stuck, and stayed there. Latency tails were drifting up across the platform — every user a little slower, with no single request to blame. The graphs were red, but they weren’t telling us anything specific.
After enough time in logs and traces, the pattern emerged. This wasn’t one rogue survey — it was a whole category of them. Surveys with large conjoint questions, where editing features and levels meant bulk-writing hundreds of related entities at once. Every one of those edits funneled into the same hot path.
A quick primer: a conjoint question is built from features (like price or color) and levels (specific values per feature). The entity count multiplies quickly: 10 features with 20 levels each is 200 level entities for a single question, and surveys often have several.
Every bulk edit touched all of them in one transaction. That matched the blow-up pattern exactly — and it was baked into the feature itself, not a misuse of it.
The entire hot path came down to one call:
await levelRepository.saveAll(levels);
It looks innocent. It wasn’t.
Under the hood, this goes through TypeORM’s repository.save(), which behaves like an upsert. For every entity in the array, it checks whether the row exists, compares it field-by-field against the database version, and deep-merges nested relations before persisting. On small arrays, it’s fine. On arrays of a few thousand related entities, the comparison logic degrades into O(n²), and all of it runs as synchronous JavaScript on the main thread.
While that code ran, nothing else on the Node process could make progress.
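To make the shape of that cliff concrete, here is a toy model. It is not TypeORM's code, and the names (`Row`, `matchByScan`, `matchByIndex`, `makeRows`) are made up for illustration. It only assumes that each incoming entity gets matched against stored rows by a linear scan, and it counts comparisons instead of measuring time.

```typescript
// Toy model only: this is NOT how TypeORM is implemented.
interface Row {
  id: number;
  value: string;
}

// Match every incoming entity to its stored row by scanning the stored list.
// Returns the number of comparisons performed.
function matchByScan(incoming: Row[], stored: Row[]): number {
  let comparisons = 0;
  for (const entity of incoming) {
    for (const row of stored) {
      comparisons++;
      if (row.id === entity.id) break;
    }
  }
  return comparisons;
}

// Same matching, but with an index built once: one lookup per entity.
function matchByIndex(incoming: Row[], stored: Row[]): number {
  const byId = new Map<number, Row>();
  for (const row of stored) byId.set(row.id, row);
  let lookups = 0;
  for (const entity of incoming) {
    lookups++;
    byId.get(entity.id);
  }
  return lookups;
}

// Helper to build n rows with ids 0..n-1.
function makeRows(n: number): Row[] {
  return Array.from({ length: n }, (_, i) => ({ id: i, value: `level-${i}` }));
}
```

With 10 rows the scan costs 55 comparisons against 10 lookups, which nobody would notice. With 2,000 rows it costs 2,001,000 comparisons against 2,000 lookups, and in a real server all of that would run on the one thread that serves everyone.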

FIG. 01 · Event loop blocked by synchronous CPU work on a single request.
We fixed the immediate problem quickly — an optimized bulk-write strategy that skips the per-entity comparison — and latency recovered within hours. We’re also moving CPU-heavy work to worker threads, so no future slow operation can starve the event loop. But the fix wasn’t the lesson.
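The optimized strategy itself isn't shown here, so what follows is only one plausible shape for it: split the array into fixed-size batches and hand each batch to a plain bulk insert. `BulkWriter`, `writeBatch`, and the default batch size of 500 are hypothetical placeholders for whatever the data layer actually offers.

```typescript
// Hypothetical persistence interface. It stands in for a real bulk-insert
// call that writes rows without loading and diffing each one first.
interface BulkWriter<T> {
  writeBatch(rows: T[]): Promise<void>;
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Write batches one after another and return how many were written.
// Awaiting the I/O of each batch gives other requests a chance to run
// in between, instead of one long synchronous stretch.
async function bulkWrite<T>(
  writer: BulkWriter<T>,
  rows: readonly T[],
  batchSize = 500, // assumed value, tune with stress tests
): Promise<number> {
  let written = 0;
  for (const batch of chunk(rows, batchSize)) {
    await writer.writeBatch(batch);
    written++;
  }
  return written;
}
```

The batch size is itself a limit, and like any other limit it is only trustworthy once it has been tested at the largest input the product allows.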
The lesson was everything that had to be true for this to happen.
- No structural limits. A conjoint question could grow to hundreds of features and levels, and a survey could contain several — with nothing to stop it. A cap on features and levels per question would have ended the story right there. One user’s edit could never have become everyone else’s problem.
- No stress testing. Even with limits in place, we wouldn’t have known whether they were safe — because the write path had never been tested at scale. The ORM’s O(n²) cliff was invisible at small data sizes, which is exactly the kind of thing stress testing exists to find.
Limits decide what the system allows. Stress testing proves the system can handle what it allows. With neither, production ended up being the first real test — the hard way to find out.
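A structural limit of this kind can be a few lines of validation at the point where an edit is accepted. The sketch below is a minimal version under assumptions: the cap values, the `ConjointQuestion` shape, and `validateConjoint` are all hypothetical, not the product's real schema or numbers.

```typescript
interface ConjointFeature {
  name: string;
  levels: string[];
}

interface ConjointQuestion {
  features: ConjointFeature[];
}

// Hypothetical caps, chosen only for illustration.
const MAX_FEATURES_PER_QUESTION = 10;
const MAX_LEVELS_PER_FEATURE = 20;

// Returns a list of human-readable violations; an empty list means the
// question is within limits and the write may proceed.
function validateConjoint(question: ConjointQuestion): string[] {
  const errors: string[] = [];
  if (question.features.length > MAX_FEATURES_PER_QUESTION) {
    errors.push(
      `A conjoint question can have at most ${MAX_FEATURES_PER_QUESTION} features.`,
    );
  }
  for (const feature of question.features) {
    if (feature.levels.length > MAX_LEVELS_PER_FEATURE) {
      errors.push(
        `Feature "${feature.name}" can have at most ${MAX_LEVELS_PER_FEATURE} levels.`,
      );
    }
  }
  return errors;
}
```

Once caps like these exist, the stress test has a concrete target: build a question at exactly the cap and measure the write path there.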
Without limits, one user can degrade the system for everyone. Without stress testing, you won’t know until they do.
🚨 Incident 02 — The Limit We Didn’t Set (So the Browser Did)
A customer opened a survey with 800+ questions in a single block. Many of those questions had <audio> tags embedded in their content, and when the survey loaded, the browser dutifully started creating audio players for every one of them.
It didn’t finish. Somewhere past the first few dozen, Chrome stopped cooperating. The page froze. The console filled with errors:

FIG. 02 · Chrome's WebMediaPlayer limit hit silently when rendering too many audio elements.
Chrome has a hard cap on the number of simultaneous WebMediaPlayer instances per page. It’s baked into the browser — you can’t raise it, you can’t bypass it. Exceed it, and every subsequent media element silently fails to initialize.
The fix was virtualization — render only what’s in the viewport, defer the rest. But virtualization wasn’t the lesson. It was the workaround.
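The core of virtualization is a small calculation: given the scroll position, which rows need to exist right now? The sketch below assumes fixed-height rows and a hypothetical `visibleRange` helper with a small overscan. Real virtualized lists also have to handle variable heights, which this leaves out.

```typescript
interface VisibleRange {
  start: number; // first index to render (inclusive)
  end: number;   // last index to render (exclusive)
}

// Compute which items should be mounted for the current scroll position.
// Assumes every item has the same height, which is a simplification.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  itemHeight: number,
  itemCount: number,
  overscan = 2, // extra rows above and below to soften fast scrolling
): VisibleRange {
  if (itemHeight <= 0) throw new RangeError("itemHeight must be positive");
  const first = Math.floor(scrollTop / itemHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / itemHeight);
  const start = Math.min(itemCount, Math.max(0, first - overscan));
  const end = Math.max(start, Math.min(itemCount, last + overscan));
  return { start, end };
}
```

For an 800-question block with 100px rows in a 600px viewport, this mounts about ten questions at a time, so the number of live audio players depends on the viewport rather than on the survey.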
The lesson was that Chrome already had a limit. We just hadn’t matched it. We allowed effectively unbounded questions per block, each free to embed any number of media tags. Chrome allowed a few dozen concurrent players. One of those numbers was fixed; the other wasn’t. They were always going to meet — and the browser was always going to win.
A product-level cap, set well below Chrome’s, would have kept the collision from ever happening. Users would see a clear message from us, instead of a silent failure from the browser. We didn’t have that cap. So Chrome drew the line for us.
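Such a cap could be checked when a block is saved. The sketch below is illustrative only: the cap of 20 is a made-up number, `countAudioTags` and `checkBlockMedia` are hypothetical helpers, and counting tags with a regular expression is a rough heuristic where a real implementation would more likely parse the content.

```typescript
// Hypothetical cap, deliberately far below whatever the browser enforces.
const MAX_AUDIO_PER_BLOCK = 20;

// Rough heuristic: count opening <audio> tags in a question's HTML content.
function countAudioTags(html: string): number {
  const matches = html.match(/<audio\b/gi);
  return matches ? matches.length : 0;
}

interface CapResult {
  ok: boolean;
  total: number;
  message?: string;
}

// Sum audio tags across every question in a block and compare to the cap.
function checkBlockMedia(
  questionHtml: readonly string[],
  cap = MAX_AUDIO_PER_BLOCK,
): CapResult {
  let total = 0;
  for (const html of questionHtml) total += countAudioTags(html);
  if (total > cap) {
    return {
      ok: false,
      total,
      message: `This block embeds ${total} audio players; the limit is ${cap}.`,
    };
  }
  return { ok: true, total };
}
```

The point of the `message` field is the one made above: the user hears about the limit from the product, in words, instead of from a frozen tab.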
Every layer of your stack has limits. The only question is whether you set yours, or discover someone else’s.
🚨 Incident 03 — The Limit on What’s Awake
The first two incidents failed loudly. One froze a backend thread, the other froze a browser tab. The third one didn’t fail at all. It just stopped being usable.
A customer with a 200-question survey opened the editor, expanded a few blocks to work across them, and the UI began to crawl. Clicks took a beat. Scrolling stuttered. Typing lagged behind the keystrokes.
We assumed it was the question count. It wasn’t — not on its own. A block with 200 questions rendered fine if nothing else was expanded. A block with 50 questions rendered fine if three others were also open. The slowness only showed up when enough of the UI was active at the same time.
We hadn’t put a limit on that. Any number of blocks could be expanded, each rendering all of its questions, inputs, and validations. A single user could, with a few clicks, ask the browser to keep hundreds of stateful React subtrees alive and reactive at once.

FIG. 03 · Same survey, different activation patterns. Cost scales with what's awake, not with what exists.
The fix was to bound the active state. Above a threshold, only one block stays expanded at a time; opening another collapses the previous. The UI recovered immediately — not because rendering got faster, but because we stopped asking the browser to keep the whole survey hot in memory.
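As a rough sketch, that rule fits in one pure state-transition function. Everything named here is an assumption: we treat the threshold as a total question count, pick 100 arbitrarily, and call the function `toggleBlock`. The real editor's threshold and state shape may differ.

```typescript
// Hypothetical threshold: surveys with more questions than this keep
// only one block expanded at a time.
const SINGLE_EXPAND_THRESHOLD = 100;

// Returns the next set of expanded block ids after the user clicks a block.
function toggleBlock(
  expanded: ReadonlySet<string>,
  blockId: string,
  totalQuestions: number,
  threshold = SINGLE_EXPAND_THRESHOLD,
): Set<string> {
  // Clicking an open block always just collapses it.
  if (expanded.has(blockId)) {
    const next = new Set(expanded);
    next.delete(blockId);
    return next;
  }
  // Large survey: opening a block collapses every other block.
  if (totalQuestions > threshold) {
    return new Set([blockId]);
  }
  // Small survey: blocks expand independently.
  const next = new Set(expanded);
  next.add(blockId);
  return next;
}
```

Small surveys keep the old behavior, so the limit only shows up for users who would otherwise hit the slowdown.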
Cost scales with what’s awake, not with what exists. If you don’t cap what’s active, the browser will — by slowing down until users stop.
💡 Limits You Don’t Have to Enforce
After three incidents, we were convinced: limits needed to be first-class.
The most obvious limit to add was a cap on the total number of answer entities a survey can hold. The enforcement seemed straightforward — on every update, count the answers, reject if over the cap.
Before we shipped it, we did the thing we hadn’t done in any of the previous incidents. We stress-tested it.
It caught the problem immediately.
Counting total answers meant walking the entire survey tree on every update. The walk was cheap on small surveys. On large ones, it grew expensive — and the larger the survey, the more often it ran, because larger surveys see more edits.
A naive reading of “enforce the limit” would have placed a growing traversal directly in the hot path of every write. We were about to recreate Incident 01 on purpose.

FIG. 04 · Same limit, two ways to enforce it. One costly. One free.
So we stepped back and asked a different question: do we need this check at all?
We already had limits at lower layers — a cap on answers per question, and a cap on questions per survey. Multiply those together and the total answer count is already bounded, without any runtime check. The survey-level cap we wanted to enforce was already enforced, for free, by the composition of the limits beneath it.
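The arithmetic is easy to spell out. In the sketch below the cap values and type names are hypothetical. The point is only that each check looks at a single array length, while the survey-wide bound is a constant derived from the two caps.

```typescript
// Hypothetical caps at the lower layers.
const MAX_ANSWERS_PER_QUESTION = 50;
const MAX_QUESTIONS_PER_SURVEY = 500;

// Derived bound: implied by the two caps above, never checked at runtime.
const MAX_ANSWERS_PER_SURVEY =
  MAX_ANSWERS_PER_QUESTION * MAX_QUESTIONS_PER_SURVEY;

interface Question {
  answers: string[];
}

interface Survey {
  questions: Question[];
}

// Local checks: each one reads a single length, so the cost is constant.
function canAddAnswer(question: Question): boolean {
  return question.answers.length < MAX_ANSWERS_PER_QUESTION;
}

function canAddQuestion(survey: Survey): boolean {
  return survey.questions.length < MAX_QUESTIONS_PER_SURVEY;
}

// The full traversal a survey-level check would run on every update.
// Shown for comparison only; the design above never needs to call it.
function countAnswers(survey: Survey): number {
  let total = 0;
  for (const question of survey.questions) total += question.answers.length;
  return total;
}
```

With these example numbers, a survey that passes both local checks can never hold more than 25,000 answers, and no write ever has to walk the tree to prove it.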
We removed the check from the design. The final set of limits stayed at the layers where they naturally belonged — per question, per row, per column — and the aggregate took care of itself.
This was the turning point for how we thought about limits. The previous incidents had taught us that limits needed to exist. This one taught us something subtler: where you enforce a limit matters more than the limit itself. A cap enforced at the wrong layer is just a new bottleneck. A cap guaranteed by composition is free.
The best limits are the ones the system already guarantees. If a cap can be derived from lower-level constraints, enforcing it again at runtime is overhead with no benefit.
🧠 A Better Mental Model for Limits
After three incidents and one near-miss, limits stopped being ad-hoc reactions and became a design vocabulary.
| What kind of limit | If you don’t set it… |
|---|---|
| Input — what the system accepts | One request’s CPU work starves the event loop |
| Output — what ends up on the page | The browser enforces its own limits, silently, on your users |
| Active state — how much is alive at once | The UI feels broken without technically being broken |
| Composition — aggregates derived from lower-level caps | You build a runtime check the system didn’t need |
| Enforcement — where the limit lives | The right limit becomes the new bottleneck |
Two of these matter most. Composition saves you the most work — the best limits are the ones you don’t have to enforce. Enforcement is the easiest to get wrong — the right limit in the wrong place is still a bottleneck.
And underneath all of them: stress testing is how you find out whether your limits actually hold. Without it, production does the testing for you.
✅ Limits Are Guarantees
By the time we were done, we understood limits weren’t a constraint we had to add to the product. They were a set of guarantees we already needed:
- One user’s actions can’t slow down the experience for everyone else.
- The browser doesn’t silently enforce its limits on your users.
- The interface stays responsive no matter how much data it holds.
- Your safety checks don’t become the bottleneck they were meant to prevent.
That reframing changed how we designed features. “What’s the limit?” moved from one of the last questions in a design doc to one of the first.
If you don’t define limits deliberately, your system will discover them accidentally — in production, under load, at the worst possible time.