Rate Limits

nexusflow manages peak traffic through RPM (requests per minute) and TPM (tokens per minute) limits, approval flows, async tasks, and monitoring. High concurrency is not a single number: the throughput you can actually sustain depends on the combination of rate limits, queues, polling cadence, and model latency.

RPM

Caps the number of requests per minute so that sudden bursts do not overwhelm the upstream service.
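
To stay under the RPM limit on the client side, requests can be spaced with a token bucket before they are sent. The sketch below is a hypothetical helper, not part of the nexusflow API: `RequestBucket`, `try_acquire`, and `wait_time` are illustrative names, and the refill rate is simply the plan's RPM divided by 60.

```python
import time
from typing import Callable, Optional


class RequestBucket:
    """Hypothetical client-side token bucket that keeps requests under an RPM limit."""

    def __init__(self, rpm: int, burst: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rpm / 60.0                       # slots refilled per second
        self.capacity = float(burst if burst is not None else rpm)
        self.tokens = self.capacity                  # bucket starts full
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add slots for the time elapsed since the last check, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Consume one request slot if one is available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next slot frees up (0.0 if one is free now)."""
        self._refill()
        return 0.0 if self.tokens >= 1 else (1 - self.tokens) / self.rate
```

For example, `RequestBucket(rpm=20, burst=2)` would match the Free plan's RPM while allowing at most two back-to-back requests; a smaller `burst` smooths spikes at the cost of slightly more client-side queueing.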

TPM

Caps the number of tokens processed per minute so that long-context requests cannot crowd out capacity for other traffic.
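
A TPM budget can be tracked the same way, but weighted by tokens instead of requests. This sketch keeps a 60-second sliding window of token reservations; `TokenWindow` is a hypothetical helper, and it assumes the caller reserves prompt tokens plus `max_tokens` up front, since exactly how nexusflow counts tokens toward TPM is not specified here.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TokenWindow:
    """Hypothetical tracker of tokens reserved in the last 60 seconds against a TPM budget."""

    def __init__(self, tpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.tpm = tpm
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self.used = 0

    def _expire(self) -> None:
        # Drop reservations that are 60 seconds old or older.
        cutoff = self.clock() - 60.0
        while self.events and self.events[0][0] <= cutoff:
            _, n = self.events.popleft()
            self.used -= n

    def try_reserve(self, tokens: int) -> bool:
        """Record `tokens` only if they fit in the remaining per-minute budget."""
        self._expire()
        if self.used + tokens > self.tpm:
            return False
        self.events.append((self.clock(), tokens))
        self.used += tokens
        return True
```

Reserving `max_tokens` in advance is conservative: if responses are usually shorter, the window under-uses the budget, which is the safer failure mode near a hard limit.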

Concurrency

Route long tasks through an async queue rather than holding synchronous connections open.
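
Whatever queue feeds the work, the number of requests in flight at once should stay within the plan's concurrency figure. Below is a minimal sketch using the standard-library `asyncio.Semaphore`, assuming the plan's Concurrency value means simultaneous in-flight requests; `run_bounded` is a hypothetical helper, and each job is any coroutine function the caller supplies.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(jobs: Iterable[Callable[[], Awaitable[T]]], limit: int) -> List[T]:
    """Run async jobs with at most `limit` in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:              # waits here while `limit` jobs are running
            return await job()

    return await asyncio.gather(*(guarded(j) for j in jobs))
```

For a Developer-plan client, `await run_bounded(jobs, limit=5)` would keep at most five requests open at a time while the rest wait on the semaphore.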

Monitoring

Watch for degradation during peak periods via TTFT (time to first token), success rate, and per-model latency.
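
If you also log requests on the client, the same three signals can be computed locally to cross-check the monitoring page. The record schema below (`RequestRecord` and its fields) is hypothetical, and medians are used only as a simple, outlier-resistant summary.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, List, Optional


@dataclass
class RequestRecord:
    """One completed request as logged by the client (hypothetical schema)."""
    model: str
    ok: bool
    ttft_ms: Optional[float]   # time to first token; None if no token arrived
    latency_ms: float          # total request latency


def summarize(records: Iterable[RequestRecord]) -> Dict[str, Dict[str, float]]:
    """Per-model request count, success rate, median TTFT, and median latency."""
    by_model: Dict[str, List[RequestRecord]] = {}
    for r in records:
        by_model.setdefault(r.model, []).append(r)

    out: Dict[str, Dict[str, float]] = {}
    for model, rs in by_model.items():
        ttfts = [r.ttft_ms for r in rs if r.ttft_ms is not None]
        out[model] = {
            "requests": float(len(rs)),
            "success_rate": sum(r.ok for r in rs) / len(rs),
            "median_ttft_ms": median(ttfts) if ttfts else float("nan"),
            "median_latency_ms": median(r.latency_ms for r in rs),
        }
    return out
```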

Plan Rate Limits

Plan	RPM	TPM	Concurrency	Description
Free	`20`	`40K`	`2`	For personal learning and testing
Developer	`60`	`150K`	`5`	For individual developers and small projects
Team	`200`	`500K`	`20`	For team collaboration and mid-sized apps
Enterprise	`1000`	`2M`	`100`	For large-scale production environments
Custom	`Custom`	`Custom`	`Custom`	Custom limits based on your needs
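
RPM, TPM, and concurrency each cap throughput, and whichever is tightest wins. For example, on the Developer plan, requests averaging more than 150,000 / 60 = 2,500 tokens exhaust TPM before RPM. The sketch below also folds in concurrency via Little's law (throughput = concurrency / latency). It assumes "K" means 1,000 and "M" means 1,000,000, that TPM counts all tokens of a request, and that the Concurrency column means simultaneous in-flight requests; `PLANS` and `sustainable_rpm` are hypothetical names.

```python
# Plan limits from the table above (assumes K = 1,000 and M = 1,000,000 tokens).
PLANS = {
    "Free":       {"rpm": 20,   "tpm": 40_000,    "concurrency": 2},
    "Developer":  {"rpm": 60,   "tpm": 150_000,   "concurrency": 5},
    "Team":       {"rpm": 200,  "tpm": 500_000,   "concurrency": 20},
    "Enterprise": {"rpm": 1000, "tpm": 2_000_000, "concurrency": 100},
}


def sustainable_rpm(plan: str, avg_tokens: int, avg_latency_s: float) -> float:
    """Requests per minute allowed by the tightest of RPM, TPM, and concurrency."""
    limits = PLANS[plan]
    by_rpm = limits["rpm"]
    by_tpm = limits["tpm"] / avg_tokens
    # Little's law: N slots each busy for avg_latency_s finish N * 60 / latency per minute.
    by_concurrency = limits["concurrency"] * 60 / avg_latency_s
    return min(by_rpm, by_tpm, by_concurrency)
```

For example, a Free-plan client with 10-second responses is limited by concurrency to 2 × 60 / 10 = 12 requests per minute, below its 20 RPM allowance.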

Model Token Limits

Model	Context Window	Max Input	Max Output
`qwen3-max`	256K	252K	64K
`qwen3.6-max-preview`	256K	256K	64K
`qwen3.6-plus`	1M	1M	64K
`qwen3.6-flash`	1M	1M	64K
`qwen3.5-plus`	1M	1M	64K
`deepseek-v4-pro`	1M	1M	16K
`deepseek-v4-flash`	1M	1M	16K
`deepseek-r1`	64K	64K	8K
`deepseek-v3`	64K	64K	8K
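
Before sending a request, `max_tokens` can be clamped against the table so that it never exceeds the model's output cap or the space left in the context window. This is a sketch under two assumptions: input and output share the context window, and "K" / "M" mean 1,000 / 1,000,000 (if the platform actually uses 1,024, these values are slightly conservative). `MODEL_LIMITS` and `clamp_max_tokens` are hypothetical names, and counting the input tokens is left to the caller.

```python
# model: (context_window, max_input, max_output), from the table above.
MODEL_LIMITS = {
    "qwen3-max":           (256_000, 252_000, 64_000),
    "qwen3.6-max-preview": (256_000, 256_000, 64_000),
    "qwen3.6-plus":        (1_000_000, 1_000_000, 64_000),
    "qwen3.6-flash":       (1_000_000, 1_000_000, 64_000),
    "qwen3.5-plus":        (1_000_000, 1_000_000, 64_000),
    "deepseek-v4-pro":     (1_000_000, 1_000_000, 16_000),
    "deepseek-v4-flash":   (1_000_000, 1_000_000, 16_000),
    "deepseek-r1":         (64_000, 64_000, 8_000),
    "deepseek-v3":         (64_000, 64_000, 8_000),
}


def clamp_max_tokens(model: str, input_tokens: int, requested: int) -> int:
    """Largest max_tokens within the output cap and the remaining context window."""
    context, max_input, max_output = MODEL_LIMITS[model]
    if input_tokens > max_input:
        raise ValueError(f"{model}: input {input_tokens} exceeds max input {max_input}")
    allowed = min(requested, max_output, context - input_tokens)
    if allowed < 1:
        raise ValueError(f"{model}: no room left for output in the context window")
    return allowed
```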

Rate Limit Response Headers

Currently, only the quota-related headers below can be relied on. More granular headers may be exposed in future platform versions.

Response Header	Description
`X-RateLimit-Remaining`	Remaining request quota for the current request path
`Retry-After`	Recommended wait, in seconds, before retrying after a rate-limited response; clients should also apply exponential backoff to repeated retries
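
A client-side retry helper might honor `Retry-After` when it is present and otherwise fall back to exponential backoff. This sketch uses "full jitter" (a random delay up to the backoff ceiling) so that many clients do not retry in lockstep; `retry_delay` and its `base` / `cap` defaults are hypothetical, and it handles only the numeric-seconds form of `Retry-After` described above.

```python
import random
from typing import Mapping, Optional


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 1.0, cap: float = 60.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a rate limit.

    A plain dict is case-sensitive; in real code pass your HTTP client's
    case-insensitive headers object.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))   # server hint takes priority
        except ValueError:
            pass                                  # non-numeric form: use backoff
    rng = rng or random.Random()
    # Full jitter: uniform delay in [0, min(cap, base * 2^attempt)].
    return rng.uniform(0, min(cap, base * 2 ** attempt))
```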

High-Concurrency Recommendations

Separate sync and async

Send chat requests to `/v1/chat/completions` and image and video requests to `/v1/tasks`, so that long-running tasks stay off the synchronous path.
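
The split can be made explicit in client code by routing on request kind. The kind names and `route` helper below are hypothetical; only the two endpoint paths come from the text above.

```python
from typing import Tuple

# Hypothetical request kinds; endpoint paths are the ones documented above.
SYNC_KINDS = {"chat"}
ASYNC_KINDS = {"image", "video"}


def route(kind: str) -> Tuple[str, bool]:
    """Return (endpoint path, needs_polling) for a request kind."""
    if kind in SYNC_KINDS:
        return "/v1/chat/completions", False
    if kind in ASYNC_KINDS:
        return "/v1/tasks", True      # submit, then poll for the result
    raise ValueError(f"unknown request kind: {kind!r}")
```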

Back off when polling

Do not poll task status too frequently. Use a fixed interval of 3 to 5 seconds, or exponential backoff, so that polling does not amplify load during peaks.
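
A polling loop that starts at 3 seconds and grows the interval geometrically could look like this. `get_status` is a caller-supplied function that fetches the task from `/v1/tasks`; the `status` field and the terminal values in `TERMINAL` are assumptions, since the task schema is not described here. `sleep` and `clock` are injectable only to make the function testable.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal states; check the actual task schema.
TERMINAL = {"succeeded", "failed", "canceled"}


def poll_task(get_status: Callable[[], Dict[str, Any]],
              initial: float = 3.0, factor: float = 1.5, max_interval: float = 30.0,
              timeout: float = 600.0,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll until the task reaches a terminal state, backing off between polls."""
    deadline = clock() + timeout
    interval = initial
    while True:
        task = get_status()
        if task.get("status") in TERMINAL:
            return task
        if clock() + interval > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
        interval = min(max_interval, interval * factor)  # grow, but cap the wait
```

With the defaults, successive waits are 3, 4.5, 6.75, ... seconds, capped at 30, so a slow video task costs far fewer status requests than a tight fixed-interval poll.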

Watch degradation on the monitoring page

Track request volume, TTFT, success rate, and per-model latency to tell whether you are approaching the capacity ceiling.
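
To turn these signals into an early warning, observed load can be compared with the plan limits. This is a hypothetical helper: the 80% threshold is arbitrary, and the `observed` figures are assumed to come from your own metrics or from the monitoring page.

```python
from typing import Dict, List, Tuple


def capacity_report(observed: Dict[str, float], limits: Dict[str, float],
                    warn_at: float = 0.8) -> Tuple[Dict[str, float], List[str]]:
    """Utilization per limit, plus the names of limits at or above `warn_at`."""
    util = {k: observed.get(k, 0.0) / limits[k] for k in limits}
    near = sorted(k for k, u in util.items() if u >= warn_at)
    return util, near
```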

Degrade gracefully on your side

During peaks, switch to faster models first; if that is not enough, reduce `max_tokens` and long-context usage.
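
One way to encode this ordering is a load level that first swaps in a faster model and only then trims `max_tokens`. The fallback pairs below are assumptions based on model naming alone (treating each `-flash` model as the faster sibling); confirm them against per-model latency on the monitoring page before relying on them. `degrade`, `FALLBACK`, and the 256-token floor are hypothetical.

```python
from typing import Tuple

# Hypothetical fallback chain, inferred from model names only.
FALLBACK = {
    "qwen3.6-plus": "qwen3.6-flash",
    "deepseek-v4-pro": "deepseek-v4-flash",
}


def degrade(model: str, max_tokens: int, level: int) -> Tuple[str, int]:
    """Level 0: unchanged; 1: faster model; 2 or more: also halve max_tokens."""
    if level >= 1:
        model = FALLBACK.get(model, model)   # keep the model if no fallback is known
    if level >= 2:
        # Halve the budget, but not below 256 (or the original value if smaller).
        max_tokens = max(max_tokens // 2, min(max_tokens, 256))
    return model, max_tokens
```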