
Rate Limits

NexusFlow controls peak traffic through requests per minute (RPM), tokens per minute (TPM), concurrent streams, async tasks, and a monitoring system. High-concurrency capacity is not a single number; it is the combined result of rate limiting, queuing, polling cadence, and model latency.

RPM
Controls request frequency, preventing instantaneous spikes from overwhelming upstream.
TPM
Limits tokens per minute, preventing long-context traffic from squeezing resources.
Concurrency
Long tasks should use the async queue rather than holding synchronous connections for extended periods.
Monitoring
Observe peak-period changes via TTFT, success rate, and per-model latency.

Plan Rate Limits

| Plan | RPM | TPM | Concurrency | Description |
| --- | --- | --- | --- | --- |
| Free | 20 | 40K | 2 | Suitable for personal learning and testing |
| Developer | 60 | 150K | 5 | Suitable for individual developers and small projects |
| Team | 200 | 500K | 20 | Suitable for team collaboration and large-scale applications |
| Enterprise | 1000 | 2M | 100 | Suitable for large-scale production environments |
| Custom | Custom | Custom | Custom | Custom limits based on your needs |
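
A client can stay under its plan's RPM and TPM budgets by throttling itself before the platform has to. The following is a minimal sketch of a sliding-window limiter, not NexusFlow's own algorithm: the class name `SlidingWindowLimiter`, the 60-second window, and the idea of estimating tokens before sending are all assumptions made for illustration.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard that keeps traffic under an RPM and a TPM budget."""

    def __init__(self, rpm, tpm, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) of requests still in the window
        self.tokens_in_window = 0

    def _evict(self, now):
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def wait_time(self, tokens):
        """Seconds to wait before a request of `tokens` fits; 0.0 means send now."""
        if tokens > self.tpm:
            raise ValueError("request alone exceeds the per-minute token budget")
        now = self.clock()
        self._evict(now)
        requests, used, wait = len(self.events), self.tokens_in_window, 0.0
        # Walk the oldest events until both budgets have room for this request.
        for ts, tok in self.events:
            if requests < self.rpm and used + tokens <= self.tpm:
                break
            wait = ts + self.window - now
            requests -= 1
            used -= tok
        return max(wait, 0.0)

    def record(self, tokens):
        """Register a request that was just sent."""
        now = self.clock()
        self._evict(now)
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
```

For the Free plan this would be constructed as `SlidingWindowLimiter(rpm=20, tpm=40_000)`, assuming "40K" means 40,000 tokens. Before each call, sleep for `wait_time(estimated_tokens)` and then call `record(estimated_tokens)`.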

Model Token Limits

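| Model | Context window | Max input | Max output |
| --- | --- | --- | --- |
| qwen3-max | 262K | 258K | 64K |
| qwen3.6-max-preview | 262K | 262K | 64K |
| qwen3.6-plus | 1M | 1M | 64K |
| qwen3.6-flash | 1M | 1M | 64K |
| qwen3.5-plus | 1M | 1M | 64K |
| deepseek-v4-pro | 1M | 1M | 16K |
| deepseek-v4-flash | 1M | 1M | 16K |
| deepseek-r1 | 64K | 64K | 8K |
| deepseek-v3 | 64K | 64K | 8K |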
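
These limits can be checked on the client before a request is sent. The sketch below is illustrative only: the `LIMITS` dictionary and `fit_request` function are hypothetical names, "K" is assumed to mean 1,000 tokens, and the rule that input plus output must fit inside the context window is an assumption the table does not state.

```python
# (context window, max input, max output) for two models from the table above.
# Assumes "K" means 1,000 tokens; the platform may count differently.
LIMITS = {
    "qwen3-max": (262_000, 258_000, 64_000),
    "deepseek-r1": (64_000, 64_000, 8_000),
}


def fit_request(model, input_tokens, requested_output):
    """Return an output-token budget that respects the model's limits."""
    context, max_input, max_output = LIMITS[model]
    if input_tokens > max_input:
        raise ValueError(f"{model}: input of {input_tokens} tokens exceeds {max_input}")
    # Assumption: input and output share the context window.
    room = context - input_tokens
    return max(0, min(requested_output, max_output, room))
```

For example, a 250,000-token prompt to `qwen3-max` leaves room for at most 12,000 output tokens under these assumptions, even though the model's maximum output is 64K.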

Rate Limiting Response Headers

Currently, the only stable response headers you can rely on are those related to remaining quota, listed below. More granular headers may be added in later platform releases.

| Response Header | Description |
| --- | --- |
| X-RateLimit-Remaining | Remaining request quota visible on the current request chain |
| Retry-After | Suggested wait time in seconds when rate limiting is triggered; clients should implement exponential backoff |
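
One way to combine `Retry-After` with exponential backoff is sketched below. It rests on several assumptions: that a rate-limited response carries HTTP status 429 (the conventional code, not confirmed here), that `Retry-After` is a number of seconds, and that `send` is a hypothetical caller-supplied function returning `(status, headers, body)`.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Retry `send` while it reports rate limiting, honoring Retry-After."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:  # assumed rate-limit status code
            return status, body
        if attempt == max_attempts - 1:
            break
        # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
        backoff = min(max_delay, base_delay * 2 ** attempt)
        try:
            hinted = float(headers.get("Retry-After", 0))
        except (TypeError, ValueError):
            hinted = 0.0
        # Wait at least as long as the server suggests, plus up to 10% jitter
        # so that many clients do not retry at the same instant.
        delay = max(hinted, backoff)
        sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("still rate limited after retries")
```

The jitter is a design choice rather than a platform requirement: it spreads retries from clients that were throttled at the same moment.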

High-Concurrency Scenario Recommendations

Sync vs. Async Traffic Separation
Send chat requests to `/v1/chat/completions` and image/video jobs to `/v1/tasks`. Keep long-running tasks off the synchronous path.
Smart Polling
Don't poll task status at high frequency; use a fixed 3-5 second interval or exponential backoff so that polling traffic does not amplify load during peaks.
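
A polling loop with exponential backoff might look like the sketch below. The function `poll_task`, the caller-supplied `get_status` callable, and the terminal status strings `"succeeded"` and `"failed"` are hypothetical; the actual task status values are not given here.

```python
import time


def poll_task(get_status, initial=3.0, factor=1.5, max_interval=30.0,
              timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the task reaches a terminal state, widening the interval."""
    interval = initial
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("succeeded", "failed"):  # assumed terminal states
            return status
        if clock() + interval > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
        # Start at 3 seconds and grow by 1.5x per poll, capped at max_interval.
        interval = min(max_interval, interval * factor)
```

Setting `factor=1.0` gives the fixed-interval variant.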
Monitor via the Dashboard
Track request volume, TTFT, success rate, and per-model latency changes to recognize whether you're approaching capacity limits.
Business-Side Degradation
During peak periods, switch to faster models first, or degrade gracefully by lowering `max_tokens` and reducing long-context usage.
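
The degradation steps above can be sketched as a small request transformer. Everything here is an assumption for illustration: the payload shape (`model`, `messages`, `max_tokens`), the `FALLBACK` pairs (chosen on the guess that the "flash" variants respond faster), and the specific caps.

```python
# Hypothetical fallback pairs; which model is actually faster should be
# confirmed from per-model latency on the dashboard.
FALLBACK = {
    "qwen3.6-plus": "qwen3.6-flash",
    "deepseek-v4-pro": "deepseek-v4-flash",
}


def degrade(payload, peak, max_tokens_cap=1024, keep_last_messages=6):
    """Return a lighter copy of `payload` during peak periods."""
    if not peak:
        return payload
    out = dict(payload)  # shallow copy; the caller's payload is not modified
    out["model"] = FALLBACK.get(payload["model"], payload["model"])
    out["max_tokens"] = min(payload.get("max_tokens", max_tokens_cap), max_tokens_cap)
    # Reduce long-context usage: keep system messages plus the latest turns.
    messages = payload.get("messages", [])
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    recent = rest[-keep_last_messages:] if keep_last_messages > 0 else []
    out["messages"] = system + recent
    return out
```

The `peak` flag would typically come from the signals described above, such as rising TTFT or a falling success rate.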