An engineering lead walked me through the incident over coffee. A 2 AM call, half the team still muted, nobody wanting to answer the question that was already obvious. The login page had been getting hammered all night. Thousands of requests per second, all from different IP addresses (Internet Protocol, how each device identifies itself on the network), all targeting real accounts. The authentication was solid. The team had hashed passwords properly. None of it mattered. The attacker didn’t need to break the lock. They just needed unlimited tries at picking it.

What fixed it wasn’t fancy. Just a counter that said “you get ten tries, then you wait.” Twenty minutes to deploy. Stopped the bleeding immediately. Listening to him tell it, I remember thinking: this is the most boring, most important thing in the entire stack.

Think of it like the number dispenser at a deli counter. Everyone gets served, but there’s a pace. You pull a ticket, you wait your turn. The system works because it assumes a normal flow of customers. If someone tries to grab 50 tickets at once, the people behind the counter are going to notice. If 500 people walk through the door in the same second, something is clearly wrong. Rate limiting is the mechanism that decides: is this a lunch rush, or is this something else entirely?

The simplest version of a hard problem

Rate limiting answers one question: “How many times should I let this thing happen in a given window?”

The “thing” might be login attempts, password reset requests, or API (Application Programming Interface, the way two software systems talk to each other) calls. The “window” might be per second, per minute, per hour.

The concept is dead simple. Getting it right is the hard part.
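The one question fits in a few lines. Here is a minimal, illustrative counter in Python; the window length, attempt cap, and in-memory dictionary are all assumptions for the sketch, not production choices:

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60   # illustrative window
MAX_ATTEMPTS = 10     # illustrative cap

_attempts = defaultdict(list)  # key -> timestamps of recent attempts

def allow(key, now=None):
    """Answer the one question: has `key` done this too many times in the window?"""
    now = time.monotonic() if now is None else now
    # Keep only the attempts that still fall inside the window.
    recent = [t for t in _attempts[key] if now - t < WINDOW_SECONDS]
    _attempts[key] = recent
    if len(recent) >= MAX_ATTEMPTS:
        return False   # over the limit: deny
    recent.append(now)
    return True        # under the limit: allow and record
```

Ten calls for the same key succeed; the eleventh is refused until old attempts age out of the window. Everything hard about rate limiting hides inside the choice of `key`, which is where the rest of this piece goes.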

When unlimited tries break everything

Imagine your login page has no rate limit. An attacker can try a thousand passwords a second against a known email address. The hashing might be perfect, the storage airtight. None of that matters if someone can take a million swings at it.

The same logic applies to two-factor codes. A six-digit TOTP (Time-based One-Time Password, the rotating code from your authenticator app) has a million possible combinations. Without rate limiting, an attacker can try all of them in a few minutes. With it, that same attack would take years. The deli counter doesn’t make the lock harder to pick. It just makes sure nobody gets to stand there trying keys all day.
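The arithmetic behind that claim is worth seeing. Both rates below are illustrative assumptions: an attacker guessing 1,000 codes per second with no limit, versus ten attempts per hour under one:

```python
CODE_SPACE = 10 ** 6     # six-digit TOTP: 000000 through 999999

unlimited_per_second = 1_000   # assumed attacker speed with no limit
limited_per_hour = 10          # assumed cap: ten tries, then wait

minutes_unlimited = CODE_SPACE / unlimited_per_second / 60
years_limited = CODE_SPACE / limited_per_hour / (24 * 365)

print(f"no limit: ~{minutes_unlimited:.0f} minutes to cover every code")
print(f"with limit: ~{years_limited:.0f} years")
# roughly 17 minutes versus 11 years
```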

Knowing who to limit

A deli counter works because you can see each person walk up. The internet doesn’t work like that. What if someone sends ten friends to pull tickets for them? What if people start coming in through the back door? What if half the people in line are wearing the same uniform and you can’t tell them apart?

That’s the problem with rate limiting on the internet.

The obvious approach is to limit by IP address. One IP, ten requests per minute, done. This breaks almost immediately in the real world. Corporate offices route hundreds of users through a single IP address. A rate limit that blocks by IP will lock out entire companies because one person triggers the threshold. Mobile carriers share IP addresses across thousands of devices. VPN (Virtual Private Network, a tool that routes your internet traffic through a shared server) users all appear to come from the same place. It’s like cutting off the entire deli line because one customer was being rude.

So you limit by user account instead. But an attacker targeting a specific account can just spread their attempts across many IP addresses. And if the account gets locked after too many attempts, the attacker has found a way to cause a denial of service, making the system unusable for the real owner. They can lock anyone out of their own account just by failing to log in enough times.

There’s no single key that perfectly identifies “who is making this request” in all contexts. Good rate limiting usually combines multiple signals: IP address, user account, API key, device fingerprint (a rough identity pieced together from browser and device details). Each signal is imperfect. Together, they cover more ground.
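One hedged way to combine signals is to keep a separate counter per signal, each with its own threshold, and require a request to pass all of them. The signal names and thresholds below are illustrative assumptions:

```python
from collections import defaultdict

# Looser limits for broadly shared signals, tighter for specific ones.
LIMITS = {
    "ip": 100,        # generous: offices and carriers share one IP
    "account": 10,    # strict: a single account rarely fails this often
    "api_key": 1000,  # a per-client quota
}

counters = defaultdict(int)  # (signal_name, value) -> count in current window

def allow(signals):
    """signals: e.g. {"ip": "203.0.113.7", "account": "alice"}"""
    keys = [(name, value) for name, value in signals.items() if name in LIMITS]
    # Check every signal before incrementing any, so a denied request
    # doesn't burn quota on the signals that would have passed.
    if any(counters[key] >= LIMITS[key[0]] for key in keys):
        return False
    for key in keys:
        counters[key] += 1
    return True
```

An attacker spreading attempts across many IP addresses still trips the per-account counter, and a single noisy user behind a shared office IP hits their own account limit long before the office’s IP budget runs out.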

The counting problem

How you count requests matters more than you’d expect. Think of it as different ways to manage that deli counter.

The simplest approach is a fixed window counter. Reset the ticket numbers at the top of every hour. Straightforward, except for one edge case that matters: show up at 11:59, grab your limit, wait one minute, grab another batch at noon. An attacker who times it right doubles their rate at the boundary. More sophisticated approaches blend windows together or use refilling “buckets” that let regulars place bigger orders during slow periods but tighten things up when the counter gets crowded.
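A refilling bucket, usually called a token bucket, is one common shape for that idea: capacity sets how big a burst a regular can place, and the refill rate caps the sustained pace. A minimal sketch, with illustrative numbers:

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity        # max burst size, in tokens
        self.refill_rate = refill_rate  # tokens added back per second
        self.tokens = capacity          # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill for the time elapsed since the last check, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token on this request
            return True
        return False
```

Unlike the fixed hourly reset, there is no boundary to game: showing up at 11:59 and again at noon draws from the same slowly refilling bucket.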

But choosing the right algorithm only solves half the problem. The harder question is what happens when the person gaming the system looks exactly like every other customer.

When the bots start pulling tickets

Rate limiting used to be about stopping simple scripts and brute-force tools. The traffic patterns were predictable. A bot hammering your login endpoint would generate a wall of identical requests from the same IP address at inhuman speed. Easy to spot, easy to block. At the deli counter, it was the person screaming “NEXT” fifty times in a row. Obvious.

That’s not what it looks like anymore. AI-powered bots mimic human behavior. They randomize request timing and rotate user agents (the identifier your browser sends to tell a website what software you’re using). They solve CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart, those puzzles designed to verify you’re a person) and distribute attacks across residential proxy networks, services that route traffic through real people’s home internet connections. Each request looks like it’s coming from a different house. At the deli counter, these bots don’t scream. They walk in calmly, pull one ticket, wait politely, and do it again from a different door. Fifty times.

At the same time, legitimate AI tools are flooding APIs with traffic. When an AI coding assistant fires off dozens of API calls in seconds trying to debug a problem, is that a user or an attacker? The old rules of thumb break down here.

Static rate limits don’t hold up against that. The systems that survive are the ones that can read context, not just count requests. But telling the difference between a bot pretending to be human and a human who happens to behave like a bot? Nobody’s cleanly solved that one.

The number nobody picks right

Here’s where most teams get it wrong. They treat rate limiting like a security toggle: flip it on, set a number, move on. But the number you pick is a product decision disguised as an infrastructure one.

Set it too low and you’ve told your best customers to slow down. It’s like the deli putting a sign on the door that says “one order per visit.” Your regulars, the ones who keep the lights on, start going somewhere else. An API client that hits a rate limit during normal usage will generate support tickets, slow down integrations, and frustrate partners. Too high and the counter might as well not be there.

Getting the number right means understanding your threat model (a clear picture of who might attack you, how, and what they’re after) alongside your actual usage patterns. What does normal look like for your busiest legitimate customer? What does an attack look like? Your rate limit lives in the gap between those two numbers.
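In practice that means measuring before picking. A hedged sketch: take real per-client request rates during a busy period, find your heaviest legitimate customer, and set the limit above them with headroom. The sample numbers and the 3x multiplier are illustrative assumptions, not recommendations:

```python
import statistics

# Requests per minute for each legitimate client during your busiest hour
# (made-up sample data for the sketch).
requests_per_minute = [4, 7, 2, 9, 5, 31, 6, 3, 8, 5]

typical = statistics.median(requests_per_minute)
busiest_legit = max(requests_per_minute)

# Leave room so growth and bursty-but-honest clients don't trip the limit.
limit = busiest_legit * 3

print(f"typical: {typical}/min, busiest legit: {busiest_legit}/min, limit: {limit}/min")
```

If an attack runs at a thousand requests a minute, a limit like this sits comfortably in the gap. If your busiest customer and your attacker both run at the same rate, no counter will separate them, which is exactly the overlap case a simple threshold can’t solve.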

If those two numbers overlap, you’ve got a problem a simple counter can’t solve.

The boring thing that saves you

Nobody writes a blog post celebrating their rate limiter. Until the night they need it. When a credential stuffing attack (stolen username and password pairs from one breach, tried against your system) hits your login page at three in the morning, that’s when you notice. When a bot starts scraping your entire database through the API, or an attacker discovers an expensive endpoint and starts running up your cloud bill, rate limiting is the difference between “we noticed an anomaly” and “we’re down.”

Here’s what sits with me, though. Rate limiting works best when traffic is predictable. The moment it matters most is the moment traffic stops being predictable. A real surge, the kind that could bring your system down, is also the moment when it’s hardest to tell who should be in line and who shouldn’t. And as AI agents start generating more legitimate high-volume traffic, the gap between “normal user” and “attacker” isn’t getting wider. It’s getting narrower. The deli counter still works. But more and more, the people grabbing 50 tickets have a perfectly good reason to.

I’m not sure rate limiting is ready for a world where it can’t tell the difference.

Previously on Off White Paper: last week, The Bouncer That Confuses Everyone looked at which origins the browser lets through the door. Rate limiting is what happens when too many of them show up at once.