The Right Way to Do API Rate Limiting

API rate limiting is essential for every API. Here are the best practices for implementing rate limits.
aden.forshaw

What is API Rate Limiting?

Rate limiting, also called throttling, is the process of restricting the number of requests that an API can process. In other words, if an API receives 1,000 requests per second from a single user, a rate limit could restrict that to a manageable number such as 10 or 20. Rate limits can apply to individual keys or overall traffic to an API.
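As a concrete illustration, here is a minimal sketch of a per-key limiter using a fixed time window. The class name, key, and limits are illustrative, not tied to any particular framework:

```python
import time
from collections import defaultdict

# Illustrative per-key fixed-window limiter: allow at most `limit`
# requests per key within each `window`-second interval.
class FixedWindowLimiter:
    def __init__(self, limit=20, window=1.0):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)     # key -> requests in current window
        self.starts = defaultdict(float)   # key -> start of current window

    def allow(self, key):
        now = time.monotonic()
        # Start a fresh window for this key once the old one has elapsed.
        if now - self.starts[key] >= self.window:
            self.starts[key] = now
            self.counts[key] = 0
        if self.counts[key] < self.limit:
            self.counts[key] += 1
            return True
        return False

limiter = FixedWindowLimiter(limit=20, window=1.0)
results = [limiter.allow("user-123") for _ in range(25)]
print(results.count(True))   # the first 20 of the 25 burst requests pass
```

The same bookkeeping can apply to a single key or to all traffic combined, depending on whether limits are per user or API-wide.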

Just like a fire marshal restricts the number of occupants allowed in a building, a rate limiter restricts the number of requests allowed to an API. 

Sometimes, too much of a good thing can be a bad thing.

In 2020, the world quickly had to adapt to new restrictions. Seeing friends and family may be good, but too much of it could be dangerous. Going to work may be good, but being in the office could be hazardous. People had to adapt rapidly to new limitations.

Whether you’re talking about social distancing or API requests, the same principle applies: healthy performance requires healthy limitations. Too many customers and a store becomes crowded. Too many API requests and your server becomes overloaded. That is why API rate limiting is an essential practice for APIs, but like everything, there’s a right way and a wrong way to do it.

Rate limiting protects your service in several ways:

Consistent Experience. Everybody has had the experience of waiting in a long convenience-store line while the cashier handles one overly complicated transaction for what feels like a lifetime. Because one customer is asking more than usual from the service, the entire line is backed up. Similarly, one user submitting an excessive number of API requests degrades the experience of every other user of the same API. The cause could be a malicious attack, poor design, or simply the legitimate needs of heavy users, but putting a limit on the number of requests that can be made safeguards the experience of everybody using your API.

Cost Management. Resources cost money, and API requests use resources. This can directly affect the bottom line in many ways, ranging from simple memory usage to lost customers who have gotten frustrated with an inaccessible API. Excessive API requests harm the experiences of other users, including your most valued customers. Even APIs that serve static content can be affected by this issue, as unlimited requests for this static content can drastically impact your bottom line. Therefore, API rate limits are essential for every API, regardless of the type of resource being provided.

Protected Services. While rate limiting is necessary to regulate the day-to-day legitimate users of your API, it also protects against another critical category: malicious attacks. Bad actors can abuse your API by submitting unnecessary requests that clog the communication channels for your legitimate users.

The Consequences of Ignoring Rate Limits

Rate limits are essential for your API, but what are the dangers of neglecting them? Whether malicious or not, the consequences of ignoring rate limiting can be extreme.

DoS and DDoS Attacks

Denial of Service (DoS) attacks occur when a bad actor floods an API with requests, preventing legitimate users from accessing resources. Distributed Denial of Service (DDoS) attacks have the same goal of flooding an API with requests but use “distributed” users (users of separate machines) to make these requests from more than one source. That makes DDoS attacks harder to prevent and the culprits more challenging to identify. Rate limits help counter both: restricting the number of requests each user can make within a time period blunts DoS attacks, and limiting the total number of requests that can be made over a longer time period weakens DDoS attacks.

Neither of these attacks is completely avoided through rate limiting, but the harmful impacts can be mitigated by preventing the damage directly from the source.

Cascading failure

Cascading failure refers to a state of errors that propagate and multiply. This escalation of errors can be caused by an overload of API requests—either through a malicious attack or a surplus of legitimate users.

Cascading failure occurs when a portion of a system is overloaded, driving increased traffic to other areas of a system, increasing the strain and causing them, in turn, to be overloaded. The most effective way to prevent cascading failures is to prevent server overload in the first place by using rate limiting.

Resource Starvation

Resource Starvation occurs when legitimate consumers can’t get the resources they need. For example, suppose your API is embedded in another website, but your servers are overloaded with requests. The website that depends on your API won’t be able to retrieve the resources it needs, resulting in Resource Starvation.

Related to Resource Starvation is Resource Exhaustion, which describes a type of DoS attack that uses specific vulnerabilities in the design of an API to create more resource-taxing requests, as opposed to a sheer volume of requests. Resource Exhaustion Attacks highlight the importance of tiered rate limiting so that different kinds of risks can be mitigated at once.

Tiered Rate Limiting

Because of the variety of vulnerabilities rate limiting must attempt to account for, APIs should use tiered rate limits.

As the name suggests, tiered rate limiting structures API requests into time-based tiers that build on each other. For example, an API may limit the number of requests that can be made every second, every three seconds, and every ten seconds. If the limit is 6 requests per second but 10 requests every three seconds, high levels of traffic are allowed in short bursts, while sustained high usage is limited.

[Diagram: tiered API rate limiting request flow]

The tiers of a rate-limiting server will depend on the resources being used. Some APIs may allow hundreds of requests per second, while others may limit their usage to a few requests per minute or even per hour. The complexity and resources used for each API request will dictate what your tiers should be.

Since tiers can be complex and intersecting, APIs can structure their tiers in sophisticated ways, such as limiting overall activity in addition to single-user activity, changing time frames based on volume, or creating delayed requests as a middle option between allowing and rejecting a request.
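To make the tiered idea concrete, here is a small sketch using the 6-per-second and 10-per-three-seconds limits from the example above. The class and method names are hypothetical; a request is admitted only if every tier still has capacity:

```python
import time
from collections import deque

# Illustrative tiered limiter: a request must satisfy every tier.
# Tiers are (max_requests, window_seconds) pairs.
class TieredLimiter:
    def __init__(self, tiers):
        self.tiers = [(limit, window, deque()) for limit, window in tiers]

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of each tier's window.
        for limit, window, hits in self.tiers:
            while hits and now - hits[0] >= window:
                hits.popleft()
        # Reject if any tier is already at capacity.
        if any(len(hits) >= limit for _, _, hits in self.tiers):
            return False
        for _, _, hits in self.tiers:
            hits.append(now)
        return True

limiter = TieredLimiter([(6, 1.0), (10, 3.0)])
# A 6-request burst fits within the per-second tier...
burst1 = [limiter.allow(now=0.0) for _ in range(6)]
# ...but one second later only 4 more fit before the 3-second tier caps usage.
burst2 = [limiter.allow(now=1.0) for _ in range(6)]
print(burst1, burst2)
```

Explicit `now` values keep the example deterministic; in production the limiter would read the clock itself.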

Implementation

There are multiple ways to implement rate limits based on the individual needs of your API. Here are three of the most useful ways to implement rate tiers:

1. Hard Stop

In the hard stop practice, once a rate limit is reached, the API will reject all requests that exceed the limit. In 2020 and 2021, many people experienced newly restricted occupancy limits in grocery stores, which often required attendants to count customers coming in and only allow new customers to enter the building as other customers exited. This is the hard stop implementation in analog. Once the limit has been reached, the doors are closed, and only when the requests fall back under the threshold will any new requests be allowed.

The most typical indication that this limit has been reached is an HTTP 429 “Too Many Requests” error. Optionally, developers can include a Retry-After header telling the client when the request can be retried.

While this hard limit can be frustrating for users, especially when they don’t understand why they are receiving the error, it’s also the simplest to implement and regulate, making it a popular choice for developers. To prevent unnecessary frustration, make sure customers understand that they should wait for some time before retrying their request.
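A hard stop might look something like the sketch below. The handler, key, and limits are illustrative; real deployments typically enforce this in a gateway or middleware layer:

```python
import time

# Illustrative hard-stop handler: once the limit is hit, return
# HTTP 429 with a Retry-After header instead of processing the request.
WINDOW = 60          # seconds
LIMIT = 100          # requests per window per key
buckets = {}         # key -> (window_start, count)

def handle(key, now=None):
    now = time.monotonic() if now is None else now
    start, count = buckets.get(key, (now, 0))
    if now - start >= WINDOW:
        start, count = now, 0          # the old window elapsed; start fresh
    if count >= LIMIT:
        retry_after = int(WINDOW - (now - start)) + 1
        return 429, {"Retry-After": str(retry_after)}, "Too Many Requests"
    buckets[key] = (start, count + 1)
    return 200, {}, "OK"

# Exhaust the limit, then observe the hard stop.
for _ in range(LIMIT):
    handle("key-1", now=0.0)
status, headers, body = handle("key-1", now=10.0)
print(status, headers)   # 429 {'Retry-After': '51'}
```

Including Retry-After turns an opaque rejection into actionable feedback for the client.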

Users of the popular Dall-E Mini AI art generation algorithm are no strangers to rate limits, as the “too much traffic, please try again” popup became nearly as famous as the AI itself. This kind of popup is just one example of how rate limits can be communicated to the users of an API.

[Screenshot: a website showing a “too much traffic” message]

2. Throttled Stop

If a hard stop is like shutting the doors, a throttled stop is like tapping the brakes. Throttled stops delay the response to a request rather than rejecting it outright, serving as a middle ground between accepting and rejecting API requests. Throttling can be built into your rate tiers in concert with hard limits.

For example, you can set hard limits of 10 requests per second and 100 requests per minute. That is straightforward: users can make up to 10 requests per second, but they can’t make 10 requests EVERY second. After 100 requests within a minute, any additional requests will be denied until the time period has elapsed. This means a user submitting 10 requests per second would have access to the resources they need for 10 seconds, and then face a 50-second waiting period before they can access any further resources.

That kind of inconsistency can be frustrating for users, but it can be mitigated by allowing a certain number of delayed requests in addition to the hard limits. Rather than rate limits being an all-or-nothing toggle, throttling API requests can slow down additional requests by creating an artificial delay.

The tiers of a throttled rate limit can be structured the same way as rate limiters with hard stops, but with an additional color in their palette. The diagram below illustrates this in action: the user is submitting around 8 requests per second (r/s). The API has a tier that accepts 8 requests within a “burst,” as defined by its time frame, and additional requests are only processed at a rate of 5 requests per second. That means that although the user is submitting 8 requests per second, once the burst limit has been met, only 5 of those requests are processed each second, and the additional requests are delayed.


[Diagram: API rate limiting with delayed requests]
Some APIs use delay to create a middle ground between allowed and rejected requests.

This can be used in combination with hard limits or in place of them. For example, an API could use a tier that allows 10 requests with no delay per 10 seconds, followed by 10 delayed requests at 5 r/s, for a total of 20 requests allowed in 10 seconds (10 with the delay and 10 without), with a hard limit after the 20th request. It all depends on the needs of the API and the nature of the resources being accessed.

Throttled stops are more difficult to implement, but they improve the user experience by reducing frustrating hard stops.
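One common way to implement this kind of throttling is a token bucket, sketched below with the 8-request burst and 5 r/s sustained rate from the example above. The class name is hypothetical, and the explicit `start`/`now` values keep the example deterministic:

```python
import time

# Illustrative token-bucket throttle: the first `burst` requests pass
# immediately; beyond that, each request is delayed so that sustained
# throughput stays at `rate` requests per second.
class ThrottledLimiter:
    def __init__(self, burst=8, rate=5.0, start=0.0):
        self.burst = burst
        self.rate = rate
        self.tokens = float(burst)   # available request slots
        self.last = start            # time of the last refill

    def delay_for_next(self, now=None):
        """Return 0.0 if the request may proceed now, else seconds to wait."""
        now = time.monotonic() if now is None else now
        # Refill tokens at `rate` per second, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        # Not enough tokens: compute how long until one accrues, and
        # borrow against it so later requests queue up behind this one.
        wait = (1.0 - self.tokens) / self.rate
        self.tokens -= 1.0
        return wait

limiter = ThrottledLimiter(burst=8, rate=5.0, start=0.0)
delays = [limiter.delay_for_next(now=0.0) for _ in range(10)]
print(delays)   # eight immediate requests, then growing delays
```

A caller would sleep for the returned delay before responding, or reject outright once the delay exceeds some ceiling, which is where the hard limit comes back in.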

3. Billable Stop

One of the reasons rate limits are important in the first place is to reduce unnecessary costs. In service of this goal, billable stops give the users the ability to access requests over their limit—for a price.

Billable stops are a good way to recoup the cost of excess API requests while also giving users the option to have more access to your API than they would otherwise.

The obvious downside to billable stops is that many users are reluctant to spend more money on your API, but the upside is that it can be beneficial to both the API and the user.

Best Practices

The key to a successful API is to make users enjoy their experience. Whether the user is the CEO of a major business, the developer of another API, or the end-user who just found you on Google, your API should be built to meet people’s needs and—at a minimum—annoy them as little as possible along the way.

In service of this goal, here are some best practices for introducing rate limiting to your API:

1. Don’t Be Greedy.

You are in business to make a profit. There is no shame in that. But turning your rate limit into a profit stream may be killing the golden goose.

Allow your users to use your API to solve their pain point rather than using your rate limit to create a new pain point for them. If you are flexible with your rate limit and you make your users happy, they will be happy to support you in return.

Having no restrictions leaves you vulnerable to DoS and DDoS attacks, compromises your user experience, and hurts your bottom line. But overly stringent restrictions may sow bad faith and turn users away from your API altogether. It’s important to hold both of those concerns in balance as you structure your rate limits.

2. Be Transparent.

Make sure your users know what limits they have agreed to in their contract. The users of your API aren’t just the end-users of the resource; they are also the developers of other applications who are integrating your API into their products. It’s important that they fully understand how your rate limiting is structured, and why you structured it the way you did.

It’s especially important to document your entire process so users have an objective source to look to for answers.

For example, suppose your API offers weather report information to its users. If one of your users is a smart thermostat that polls your API at regular intervals to update the weather, a design error could send those calls into an infinite loop, requesting weather data several times per second. By documenting the way your tiers are structured, your user better understands what kinds of errors they could run into, what protections you have in place, and what the consequences of exceeding their limit would be.

3. Add a Counter

In addition to being transparent about the reasons for your limits, it’s also important to be transparent as your API is being used. Including counters in response headers ensures that users know where they are in relation to your limits.

This is important because it allows your users to make informed decisions. The number of requests you allow can vary greatly depending on your API, but a best practice for rate limit implementation is to make relevant information accessible to your users, both as they’re implementing your API and as they’re using it.

The throughline of these best practices is that you should equip your users to make informed decisions. Rather than exploiting their uninformed decisions for a quick payout, it’s better for you and for your users to treat people fairly and transparently.

A great way to do this is to set these headers on the response:

  • “RateLimit-Limit” – How many requests or “points” the client is allowed to make
  • “RateLimit-Remaining” – How many they have remaining until the “Reset”
  • “RateLimit-Consumed” – How many they have already consumed
  • “RateLimit-Reset” – ISO date of when the rate limit will be reset for the client
  • “Retry-After-Seconds” – How many seconds the client should wait for the rate limit to be reset
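For illustration, a small helper along these lines could attach those headers to a response. The function name is hypothetical and headers are represented as a plain dict; real frameworks have their own response objects:

```python
from datetime import datetime, timedelta, timezone

# Illustrative helper that builds the rate-limit headers described above.
def rate_limit_headers(limit, consumed, reset_at):
    remaining = max(0, limit - consumed)
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Consumed": str(consumed),
        "RateLimit-Reset": reset_at.isoformat(),
    }
    if remaining == 0:
        # Only needed once the client actually has to back off.
        seconds = max(0, int((reset_at - datetime.now(timezone.utc)).total_seconds()))
        headers["Retry-After-Seconds"] = str(seconds)
    return headers

reset = datetime.now(timezone.utc) + timedelta(seconds=30)
print(rate_limit_headers(limit=100, consumed=42, reset_at=reset))
```

Setting these on every response, not just rejections, is what lets clients pace themselves before they ever hit the limit.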

The Auth API is Here to Help

Managing an API can be a headache, but The Auth API has a team of qualified experts to take the most technical aspects of API management off your hands.

The Auth API specializes in API key management and analytics, helping you leverage your API to maximize your ROI.

Don’t reinvent the wheel—see how The Auth API can help you achieve your goals by signing up for a free trial today.


What safeguards have you implemented to protect your API from malicious actors?

Take the first step today on your journey to secure API access.