# API Rate Limiting

API rate limiting is the practice of enforcing a maximum number of API requests that a client can make within a defined time period. It is a specific application of general rate limiting, tailored to the characteristics and requirements of API-based services. When a client exceeds the limit, the API responds with an HTTP `429 Too Many Requests` status code and typically includes headers indicating when the client can retry.

## How API rate limiting works

An API rate limiter tracks request counts against defined quotas. The quota is usually expressed as a number of requests per time window (e.g., 1,000 requests per minute). The limiter identifies clients by one or more attributes:

* **API key**: Each registered API consumer has its own quota.
* **Authenticated user**: Limits are tied to the user identity extracted from a JWT or OAuth token.
* **IP address**: Used for unauthenticated endpoints or as a fallback when no API key is present.
* **Endpoint**: Different endpoints may have different limits based on their cost or sensitivity.

When a request arrives, the limiter checks the client's current count against the limit. If the count is within bounds, the request proceeds and the counter increments. If the limit is exceeded, the request is rejected.
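The check-then-increment flow described above can be sketched as a fixed-window counter. This is a minimal in-memory illustration, not how any particular gateway implements it: the `FixedWindowLimiter` class, its parameters, and the injectable `clock` are hypothetical names introduced for the example.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Counts requests per client within fixed, aligned time windows."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the sketch can be exercised deterministically
        # Maps (client_id, window_index) -> number of requests accepted so far.
        self.counts = defaultdict(int)

    def allow(self, client_id):
        """Return True and count the request, or False if the quota is used up."""
        window_index = int(self.clock() // self.window_seconds)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False  # limit reached: the caller would respond with HTTP 429
        self.counts[key] += 1
        return True


limiter = FixedWindowLimiter(limit=1000, window_seconds=60)
if not limiter.allow("api-key-123"):
    print("429 Too Many Requests")
```

The sketch never evicts counters for past windows and keeps state in a single process. A production limiter would expire old entries and usually keep counts in a shared store so that every gateway instance sees the same totals.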

## Standard response headers

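Well-designed APIs communicate rate limit status through response headers. The `X-RateLimit-*` names are a widely used convention rather than a formal standard, so exact names and value formats vary between APIs; `Retry-After` is defined by the HTTP specification: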

* `X-RateLimit-Limit`: The maximum number of requests allowed in the current window.
* `X-RateLimit-Remaining`: How many requests the client can still make in the current window.
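* `X-RateLimit-Reset`: When the current window resets, expressed either as a Unix timestamp or as the number of seconds remaining, depending on the API.
* `Retry-After`: Included with 429 responses to indicate how long the client should wait before retrying, given either as a number of seconds or as an HTTP date.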

These headers let client applications implement backoff logic and avoid hammering the API when limits are reached.
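As a rough illustration of client-side backoff, the sketch below turns these headers into a wait time. The function name `seconds_to_wait` and the `reset_is_timestamp` flag are assumptions made for this example. Because APIs differ on whether `X-RateLimit-Reset` carries a timestamp or a delta, the caller has to say which one applies. Header names are matched exactly as spelled here, whereas a real HTTP client would compare them case-insensitively.

```python
import time
from email.utils import parsedate_to_datetime


def seconds_to_wait(status, headers, now=None, reset_is_timestamp=True):
    """Return how many seconds to pause before sending the next request."""
    now = time.time() if now is None else now

    # A 429 with Retry-After is the most direct signal.
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # delay given in seconds
        # Otherwise interpret the value as an HTTP date.
        return max(0.0, parsedate_to_datetime(value).timestamp() - now)

    # No quota left in this window: wait until it resets.
    if headers.get("X-RateLimit-Remaining") == "0":
        reset = headers.get("X-RateLimit-Reset")
        if reset is not None:
            reset = float(reset)
            return max(0.0, reset - now) if reset_is_timestamp else max(0.0, reset)

    return 0.0  # quota remains, no need to wait
```

A caller would typically sleep for the returned duration, optionally adding a small random jitter so that many clients do not all retry at the same instant.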

## API rate limiting strategies

Different strategies suit different use cases:

* **Per-plan limits**: SaaS APIs commonly tie rate limits to subscription tiers. A free plan might allow 100 requests per minute, while an enterprise plan allows 10,000.
* **Per-endpoint limits**: Expensive operations (such as full-text search or report generation) get lower limits than lightweight reads.
* **Burst allowance**: Some implementations allow short bursts above the sustained rate, using a token bucket algorithm, as long as the average rate stays within bounds.
* **Graduated throttling**: Instead of rejecting requests outright, the API may add latency to responses as the client approaches its limit, so that service degrades gradually rather than failing abruptly.
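The burst allowance strategy in the list above can be illustrated with a small token bucket. This is a single-process sketch under assumed names (`TokenBucket`, `rate_per_second`, `capacity`, `cost`): tokens refill at the sustained rate, the bucket's capacity bounds the size of a burst, and each request spends tokens.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` while enforcing an average refill rate."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock  # injectable so the sketch can be exercised deterministically
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; return whether the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `cost` argument also gives one way to approximate per-endpoint limits: an expensive operation can spend several tokens per call while a lightweight read spends one.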

## API rate limiting and API gateways

API gateways are the standard enforcement point for API rate limiting. Because the gateway processes every request before it reaches backend services, it can enforce limits consistently across all endpoints without requiring each backend to implement its own logic. The gateway can also differentiate between clients based on API keys, JWT claims, or IP addresses.
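One way a gateway might turn those attributes into a single counter key is sketched below. The `rate_limit_key` function, the precedence order (API key, then the JWT `sub` claim, then IP address), and the key format are all assumptions chosen for illustration. The sketch also assumes the JWT has already been verified and decoded into a dictionary of claims.

```python
def rate_limit_key(api_key=None, jwt_claims=None, ip_address=None, endpoint=None):
    """Build the key under which a client's request count is tracked."""
    if api_key:
        identity = f"key:{api_key}"
    elif jwt_claims and jwt_claims.get("sub"):
        identity = f"user:{jwt_claims['sub']}"
    elif ip_address:
        # Fallback for unauthenticated traffic.
        identity = f"ip:{ip_address}"
    else:
        raise ValueError("no attribute available to identify the client")
    # Appending the endpoint gives each client a separate counter per endpoint.
    return f"{identity}|{endpoint}" if endpoint else identity
```

The resulting string can serve as the client identifier passed to whichever counting algorithm the gateway uses.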

Serverless API Gateway does not include built-in API rate limiting at this time. Teams requiring API-level rate limits can use Cloudflare's platform-level rate limiting rules, which are applied at the network edge before requests reach the gateway Worker. This approach provides rate limiting without adding latency to the request path. For more granular, application-aware limits, rate limiting logic can be implemented in the backend services themselves.

## Related documentation

* [Authorizer Configuration](/configuration/authorizer.md) - Identify API consumers for per-client rate limits
* [Path Routing](/configuration/paths.md) - Define per-endpoint routing where rate limits can vary
* [Configuration Overview](/configuration/overview.md) - Gateway configuration reference

