Rate Limits

almyty applies rate limits to protect the platform and ensure fair usage. Limits vary by endpoint type and authentication method.

Default Limits

Endpoint Category                                  Rate Limit      Window
Authentication (/auth/*)                           10 requests     per minute
API Management (/apis/*, /tools/*, /gateways/*)    120 requests    per minute
Agent Invocations (/agents/:id/invoke)             60 requests     per minute
Gateway Endpoints (/mcp/*, /a2a/*, /utcp/*)        300 requests    per minute
Analytics (/analytics/*)                           30 requests     per minute
Health Checks (/health*)                           No limit
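
For client-side budgeting it can help to look up the default limit for a path before sending a request. The following is a minimal sketch, not part of the almyty API: the names DEFAULT_LIMITS and default_limit_for are hypothetical, and the regular expressions are an assumed reading of the path patterns in the table above.

```python
import re

# Default per-minute limits copied from the table above.
# None means the category is not rate limited.
# The regexes are an assumed interpretation of the documented path patterns.
DEFAULT_LIMITS = [
    (re.compile(r"^/auth(/|$)"), 10),
    (re.compile(r"^/agents/[^/]+/invoke$"), 60),
    (re.compile(r"^/(apis|tools|gateways)(/|$)"), 120),
    (re.compile(r"^/(mcp|a2a|utcp)(/|$)"), 300),
    (re.compile(r"^/analytics(/|$)"), 30),
    (re.compile(r"^/health"), None),
]


def default_limit_for(path):
    """Return the default requests-per-minute limit for a request path.

    Returns None for unlimited categories and raises LookupError when the
    path matches no documented category.
    """
    for pattern, limit in DEFAULT_LIMITS:
        if pattern.search(path):
            return limit
    raise LookupError(f"No documented rate limit category for {path!r}")
```

For example, default_limit_for("/agents/abc123/invoke") returns 60, and default_limit_for("/health") returns None.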

Rate Limit Headers

Every response includes rate limit information in headers:

Header                   Description
X-RateLimit-Limit        Maximum requests allowed in the window
X-RateLimit-Remaining    Requests remaining in the current window
X-RateLimit-Reset        Unix timestamp when the window resets
Retry-After              Seconds until the next request is allowed (only on 429)

Example headers:

X-RateLimit-Limit: 120
X-RateLimit-Remaining: 85
X-RateLimit-Reset: 1711234620
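
A client can turn these headers into a small status object. The sketch below is illustrative only: RateLimitStatus and parse_rate_limit_headers are hypothetical names, and it assumes the header values are plain integers as in the example above.

```python
import time
from dataclasses import dataclass


@dataclass
class RateLimitStatus:
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_at: int   # X-RateLimit-Reset, a Unix timestamp in seconds

    def seconds_until_reset(self, now=None):
        """Seconds left in the current window, never negative."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers):
    """Build a RateLimitStatus from a header mapping.

    Returns None if any of the three headers is missing or not an integer.
    """
    try:
        return RateLimitStatus(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

A plain dict is case-sensitive, so the keys must match exactly. The headers mapping on a requests response is case-insensitive and can be passed in directly.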

Exceeding Limits

When you exceed the rate limit, the API returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711234620
Content-Type: application/json

{
  "success": false,
  "message": "Rate limit exceeded. Retry after 30 seconds.",
  "error": "RATE_LIMITED",
  "statusCode": 429
}

Handling Rate Limits

Exponential Backoff

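The recommended strategy is to retry with exponential backoff, using the Retry-After value as the base delay and doubling it on each attempt:

JavaScript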

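async function requestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    // Anything other than 429 is returned to the caller as-is.
    if (response.status !== 429) {
      return response;
    }

    // Don't sleep after the final attempt; fail right away.
    if (attempt === maxRetries - 1) {
      break;
    }

    // Use Retry-After (seconds) as the base delay; fall back to 1s if it is
    // missing or not a number, then double it on each attempt.
    const retryAfter = parseInt(response.headers.get("Retry-After"), 10) || 1;
    const delay = retryAfter * 1000 * Math.pow(2, attempt);
    console.log(`Rate limited. Retrying in ${delay}ms...`);
    await new Promise(resolve => setTimeout(resolve, delay));
  }

  throw new Error("Max retries exceeded");
}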

Python

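import time
import requests

def request_with_retry(url, headers, max_retries=3):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        # Anything other than 429 is returned to the caller as-is.
        if response.status_code != 429:
            return response

        # Don't sleep after the final attempt; fail right away.
        if attempt == max_retries - 1:
            break

        # Use Retry-After (seconds) as the base delay; fall back to 1s if it
        # is not an integer, then double it on each attempt.
        try:
            retry_after = int(response.headers.get("Retry-After", 1))
        except ValueError:
            retry_after = 1
        delay = retry_after * (2 ** attempt)
        print(f"Rate limited. Retrying in {delay}s...")
        time.sleep(delay)

    raise RuntimeError("Max retries exceeded")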

Per-Gateway Rate Limits

Individual gateway tools can have custom rate limits configured through Tool Scoping:

curl -X PATCH https://api.almyty.com/gateways/{gatewayId}/tools/{gatewayToolId} \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "rateLimit": 50
  }'

Gateway-level rate limits are applied per API key, allowing different keys to have different limits.

Burst Allowance

Rate limits include a small burst allowance. You can briefly exceed the per-minute rate for short bursts, as long as the sustained rate stays within limits. The burst window is typically 10 seconds.
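
This page does not say how the server implements the burst allowance. On the client side, a token bucket is one common way to allow short bursts while holding the sustained rate under a limit. The sketch below is a generic illustration: the TokenBucket class and the burst size are hypothetical and do not mirror almyty's server-side behavior.

```python
import time


class TokenBucket:
    """Client-side limiter: short bursts allowed, sustained rate capped."""

    def __init__(self, rate_per_minute, burst, clock=time.monotonic):
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)  # maximum burst size (assumed value)
        self.tokens = float(burst)    # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, up to capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; return True on success."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_available(self):
        """Seconds to wait until the next token is available."""
        self._refill()
        if self.tokens >= 1.0:
            return 0.0
        return (1.0 - self.tokens) / self.refill_per_second
```

A caller would check try_acquire() before each request and, when it returns False, sleep for seconds_until_available() before trying again.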

Best Practices

  1. Respect Retry-After — Always wait the specified duration before retrying
  2. Implement backoff — Use exponential backoff for retry logic
  3. Cache responses — Avoid unnecessary repeated requests
  4. Batch operations — Use bulk endpoints where available (e.g., /tools/bulk)
  5. Monitor headers — Track X-RateLimit-Remaining to proactively slow down
  6. Use webhooks — For event-driven workflows, use webhooks instead of polling
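
As a sketch of the "Monitor headers" practice, the function below slows a client down as the window runs out. It is an illustration only: pacing_delay and the low_water threshold are hypothetical, and the inputs are assumed to come from X-RateLimit-Remaining and X-RateLimit-Reset.

```python
def pacing_delay(remaining, reset_at, now, low_water=10):
    """Seconds to wait before sending the next request.

    Returns 0 while more than `low_water` requests remain. At or below that,
    the time left in the window is spread evenly across the remaining
    requests. With nothing remaining, waits for the window to reset.
    """
    seconds_left = max(0.0, reset_at - now)
    if remaining > low_water:
        return 0.0
    if remaining <= 0:
        return seconds_left
    return seconds_left / remaining
```

With 5 requests remaining and 20 seconds left in the window, this spaces requests 4 seconds apart instead of spending the budget immediately and hitting a 429.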