Planned request-level fallback design for switching to backup models when the preferred model is unavailable
models parameter allows you to specify a list of backup models. When the primary model (model field) has all providers unable to respond, the system will try backup models in order until one returns successfully.
This is currently preserved as a public API design document, facilitating future convergence of OpenAI/Anthropic request-level fallback into a unified specification; the current more stable fault tolerance is primarily based on provider-level failover.
How It Works
In the request body, use model to specify the primary model, and via models array to list backup models by priority. The example below shows the planned public contract for defining future model-level fallback behavior.
Requests are billed based on the model actually used. You can view the actual model used and corresponding fees for each request in the Call Logs
Usage Tips
Sort by Capability
Use the most capable model as primary, and slightly less capable but more stable models as backups.
Set Reasonable Count
1-2 backup models are usually sufficient. Too many backups increase overall latency.
Applicable Scenarios
Model fallback is suitable for production environments with extremely high availability requirements. For development and testing, a single model is sufficient.
Combine with Provider Routing
Provider routing handles endpoint switching within the same model, model fallback handles cross-model backup. They complement each other.