# Rate Limiting

Rate limiting is the practice of controlling the number of requests a client can make to a server or API within a specified time window. It is a fundamental mechanism for protecting services from abuse, preventing resource exhaustion, and ensuring fair usage across consumers. When a client exceeds the allowed limit, the server typically responds with an HTTP 429 (Too Many Requests) status code.

## Why rate limiting matters

Without rate limiting, a single misbehaving or malicious client can consume disproportionate resources and degrade the experience for other users. Common reasons to implement rate limiting include:

* **Abuse prevention**: Blocking brute-force login attempts, credential stuffing, scraping, and denial-of-service attacks.
* **Resource protection**: Preventing backend services from being overwhelmed by traffic spikes, whether organic or artificial.
* **Fair usage**: Ensuring that all API consumers get equitable access to the service rather than allowing one heavy user to crowd out others.
* **Cost control**: In pay-per-use cloud architectures, uncontrolled traffic can lead to unexpected bills.
* **SLA enforcement**: Tying rate limits to subscription tiers so that higher-paying customers receive higher quotas.

## Rate limiting algorithms

Several algorithms are commonly used to implement rate limiting:

* **Fixed window**: Counts requests in fixed time intervals (e.g., per minute). Simple, but bursty at window boundaries: a client can spend a full quota at the end of one window and another at the start of the next, briefly sending up to twice the limit.
* **Sliding window**: Counts requests over a rolling time period that ends at the current moment instead of resetting at fixed boundaries, which reduces the boundary-burst problem.
* **Token bucket**: A bucket holds tokens that refill at a steady rate. Each request consumes a token. When the bucket is empty, requests are rejected. This allows controlled bursts up to the bucket size.
* **Leaky bucket**: Requests enter a queue that drains at a fixed rate. When the queue is full, excess requests are dropped. Output is very smooth, but queuing adds latency.
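
To make two of these algorithms concrete, the following is a minimal single-process sketch of a fixed window counter and a token bucket. The class names, the `allow()` method, and the injectable `clock` parameter are assumptions made for illustration; a production limiter would usually keep its counters in shared storage rather than in memory.

```python
import time
from typing import Callable, Optional


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, counted in fixed intervals."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.window_id: Optional[int] = None  # index of the window being counted
        self.count = 0

    def allow(self) -> bool:
        current = int(self.clock() // self.window)
        if current != self.window_id:
            # A new window has started, so the counter resets.
            self.window_id = current
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class TokenBucket:
    """Hold up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add the tokens earned since the last call, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With a limit of 2 per 60 seconds, the fixed window limiter accepts two requests at second 59 and two more at second 60, which is the boundary burst described above. The token bucket with the same capacity and a refill rate of one token per second accepts only one further request after a one-second pause.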

## Rate limiting at the API gateway layer

API gateways are a natural enforcement point for rate limiting because they sit in front of all backend services and process every request. Applying rate limits at the gateway means:

* Backend services do not need to implement rate limiting individually.
* Limits are enforced consistently across all endpoints.
* Rejected requests never reach the backend, saving compute resources.
* Limits can be applied per API key, per IP address, per authenticated user, or per endpoint.
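
The choice of scope in the last point comes down to which key the limiter's counters are stored under. The sketch below shows one possible way to derive such a key; `RequestInfo`, its fields, and `limit_key` are hypothetical names used only for illustration and are not part of any gateway API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestInfo:
    """Hypothetical request summary; the field names are illustrative only."""
    ip: str
    path: str
    api_key: Optional[str] = None
    user_id: Optional[str] = None


def limit_key(req: RequestInfo, per_endpoint: bool = False) -> str:
    """Pick the most specific identity available and build a counter key from it."""
    if req.user_id is not None:
        identity = f"user:{req.user_id}"
    elif req.api_key is not None:
        identity = f"key:{req.api_key}"
    else:
        # Anonymous traffic falls back to the client IP address.
        identity = f"ip:{req.ip}"
    # Optionally keep a separate counter for each endpoint.
    return f"{identity}|{req.path}" if per_endpoint else identity
```

Preferring the authenticated user over the IP address is a design choice in this sketch: it avoids penalizing many users who share one address, at the cost of requiring authentication to run before the limiter.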

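Gateways typically report quota status to clients through response headers: the de facto `X-RateLimit-Limit` and `X-RateLimit-Remaining` headers for the quota and what is left of it, and the standard `Retry-After` header, sent with a 429 response, for how long to wait before retrying.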
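
As a rough illustration of how these headers fit together, the sketch below builds the status code and headers for a single request. The function name `quota_response` and its parameters are assumptions, and the 200 status simply stands in for "forward the request to the backend".

```python
import math
from typing import Dict, Tuple


def quota_response(limit: int, used: int,
                   seconds_until_reset: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) given the quota state after counting this request."""
    remaining = max(0, limit - used)
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
    }
    if used > limit:
        # Over quota: reject and tell the client how many whole seconds to wait.
        headers["Retry-After"] = str(math.ceil(seconds_until_reset))
        return 429, headers
    return 200, headers
```

`Retry-After` is rounded up to a whole number because its delay form is expressed in integer seconds.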

## Rate limiting and Serverless API Gateway

Serverless API Gateway does not currently include built-in rate limiting. Teams that need it can use Cloudflare's native rate limiting rules, which apply before traffic reaches the Worker. Alternatively, rate limiting logic can be built into the backend services or implemented as custom Cloudflare Workers logic upstream of the gateway.

## Related documentation

* [Configuration Overview](/configuration/overview.md) - Gateway configuration reference
* [Authorizer Configuration](/configuration/authorizer.md) - Authenticate requests before applying rate limits
* [Servers Configuration](/configuration/servers.md) - Configure upstream services protected by rate limits

