nexusflow

Model Fallback Design

A planned request-level fallback design for switching to backup models when the preferred model is unavailable.

The models parameter lets you specify a list of backup models. When every provider serving the primary model (the model field) fails to respond, the system tries the backup models in order until one of them returns successfully.

This page is kept as a public API design document to support the future convergence of OpenAI and Anthropic request-level fallback into a unified specification. For now, the more stable fault-tolerance mechanism is provider-level failover.

How It Works

In the request body, use model to specify the primary model and the models array to list backup models in priority order. The example below shows the planned public contract for future model-level fallback behavior.

curl -X POST https://nexusflow.vip/v1/chat/completions \
  -H "Authorization: Bearer sk-air-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-max",
    "models": ["deepseek-v3.2", "glm-4.7"],
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
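For calling the API from code, here is a minimal Python sketch of the same request using only the standard library. The helper name build_fallback_request is hypothetical, and because the models field belongs to a planned contract, the current service may not honor it yet.

```python
import json
import urllib.request

# Endpoint and placeholder key copied from the curl example above.
API_URL = "https://nexusflow.vip/v1/chat/completions"
API_KEY = "sk-air-your-key"


def build_fallback_request(primary, backups, messages, api_key=API_KEY):
    """Hypothetical helper: build (but do not send) a chat completion request
    whose body carries a primary model plus an ordered list of backups."""
    body = {
        "model": primary,         # preferred model
        "models": list(backups),  # planned field: backups, highest priority first
        "messages": messages,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_fallback_request(
    "qwen3-max",
    ["deepseek-v3.2", "glm-4.7"],
    [{"role": "user", "content": "Hello!"}],
)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```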

Fallback Behavior

Scenario | Behavior
--- | ---
Primary model available | Use the primary model (model field) normally
All providers for the primary model fail | Try the backup models in the models array, in order
All models fail | Return the last error
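The rules in the table above amount to a simple ordered loop. The sketch below simulates them on the client side with a stub in place of real API calls; run_with_fallback, ModelUnavailable, and fake_call are hypothetical names, and this illustrates the documented behavior rather than the service's own implementation.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


class ModelUnavailable(Exception):
    """Hypothetical error: every provider for one model failed."""


def run_with_fallback(
    primary: str,
    backups: Sequence[str],
    call: Callable[[str], T],
) -> Tuple[str, T]:
    """Try the primary model, then each backup in order.

    Returns (model_used, result) for the first model that succeeds.
    If all models fail, re-raises the error from the last model tried.
    """
    last_error = None
    for model in [primary, *backups]:
        try:
            return model, call(model)  # first success wins
        except ModelUnavailable as err:
            last_error = err  # remember it and move on to the next model
    raise last_error  # all models failed: surface the last error


# Usage with a stub: pretend the primary model is down.
down = {"qwen3-max"}


def fake_call(model: str) -> str:
    if model in down:
        raise ModelUnavailable(f"all providers failed for {model}")
    return f"reply from {model}"


print(run_with_fallback("qwen3-max", ["deepseek-v3.2", "glm-4.7"], fake_call))
# ('deepseek-v3.2', 'reply from deepseek-v3.2')
```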

Pricing

Requests are billed for the model that actually served them. The Call Logs show the model used and the corresponding fee for each request.
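To check in code which model served a request, a minimal sketch follows. It assumes an OpenAI-style response whose model field names the model that actually generated the reply; this page only confirms that the information appears in the Call Logs. served_model is a hypothetical helper, and the sample responses are illustrative inputs, not real output.

```python
from typing import Any, Dict, Sequence, Tuple


def served_model(
    primary: str, backups: Sequence[str], response: Dict[str, Any]
) -> Tuple[str, bool]:
    """Return (model that served the request, whether a backup was used).

    Assumption: response["model"] reports the model actually used.
    Exact-name match only: a provider might report a more specific
    version string, which this check would not recognize as a backup.
    """
    actual = response.get("model", primary)
    return actual, actual in backups


backups = ["deepseek-v3.2", "glm-4.7"]
print(served_model("qwen3-max", backups, {"model": "deepseek-v3.2"}))
# ('deepseek-v3.2', True)
```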

Usage Tips

Sort by Capability
Use the most capable model as the primary, and slightly less capable but more stable models as backups.
Set a Reasonable Count
One or two backup models are usually enough. Because backups are tried one after another, each extra backup can add to the total latency when earlier models fail.
Applicable Scenarios
Model fallback is suitable for production environments with extremely high availability requirements. For development and testing, a single model is sufficient.
Combine with Provider Routing
Provider routing switches endpoints within the same model, while model fallback provides backup across models. The two mechanisms complement each other.
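The last two tips can be illustrated together. The sketch below nests provider failover (within one model) inside model fallback (across models) and counts attempts, since each failed attempt has to finish before the next one starts. ProviderError, route, fake_call, and the provider names are hypothetical; this models the layering described above, not the actual routing code.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error: a single provider endpoint failed."""


def route(
    chain: Sequence[str],
    providers: Dict[str, List[str]],
    call: Callable[[str, str], str],
) -> Tuple[str, str, str, int]:
    """Returns (model, provider, reply, attempts) for the first success."""
    attempts = 0
    last_error = None
    for model in chain:  # model fallback: across models, in priority order
        for provider in providers.get(model, []):  # provider routing: same model
            attempts += 1
            try:
                return model, provider, call(model, provider), attempts
            except ProviderError as err:
                last_error = err
    if last_error is None:
        raise ProviderError("no providers configured")
    raise last_error  # everything failed: surface the last error


# Hypothetical setup: both providers of the primary model are down.
providers = {
    "qwen3-max": ["provider-a", "provider-b"],
    "deepseek-v3.2": ["provider-c"],
}
failing = {("qwen3-max", "provider-a"), ("qwen3-max", "provider-b")}


def fake_call(model: str, provider: str) -> str:
    if (model, provider) in failing:
        raise ProviderError(f"{provider} failed for {model}")
    return "ok"


print(route(["qwen3-max", "deepseek-v3.2"], providers, fake_call))
# ('deepseek-v3.2', 'provider-c', 'ok', 3)
```

Here the reply arrives only on the third attempt, after two failures have already cost time, which is why a long backup list can noticeably increase worst-case latency.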
