/v1/chat/completionsChat Completions API
Create chat completion responses. Fully compatible with the OpenAI Chat Completions format. You can directly use the official OpenAI SDK (Python / Node.js) for integration by simply modifying base_url and api_key. Supports streaming output, multi-turn conversation, Function Calling, vision understanding, and other capabilities.
Request Endpoint
https://nexusflow.vip/v1/chat/completionsRequest Headers
Request Parameters
Code Examples
Response Format (Non-streaming)
Non-streaming requests return a complete JSON object. The object field value is "chat.completion".
Response Example
Response Fields
Streaming Response Format (SSE)
When stream: true is set, the response is returned step by step via Server-Sent Events (SSE). Each event starts with data: and ends with data: [DONE] as a termination marker. Each chunk's object field value is "chat.completion.chunk".
SSE Data Format
Chunk Field Descriptions
Relationship with the Bailian Official Chat API
/v1/chat/completions is fully protocol-compatible with Alibaba Cloud Bailian's OpenAI-compatible endpoint: the request body is forwarded upstream as-is, and the response is relayed unchanged. Bailian extension fields such as tools, tool_choice, response_format,enable_thinking, thinking_budget, enable_search, search_options,seed, top_k, logprobs, stream_options can all be used directly. The exact support range depends on the specific model. Official reference: Qwen API Reference.Billing Details
Tiered Billing
Bailian series models (Qwen/Tongyi, GLM, etc.) use tiered billing based on the input token count per request. The total prompt tokens of a single request determine the applicable pricing tier, with input and output billed at the corresponding tier's unit price.
Example: a request with 50K input tokens + 2K output tokens would bill input at ¥4/M and output at ¥16/M (falling into Tier 2). See the full tiered pricing on the Pricing page.
Context Caching (Prompt Caching)
Context caching is supported when calling via /v1/messages (Anthropic protocol). For repeated system prompts or long documents, DashScope automatically caches the prompt prefix, and subsequent requests hitting the cached portion enjoy a discount:
/v1/chat/completions supports explicit caching via the enable_context_caching: true parameter (Bailian series models). /v1/messages (Anthropic protocol) supports cache_control content block annotations. Both protocols also automatically benefit from implicit cache discounts.
Important Notes
- Different models have different
max_tokensupper limits. Please refer to the Model List for each model's limitations. temperatureandtop_pshould be adjusted independently; setting both simultaneously may produce unpredictable results.- In streaming output, only the final chunk has a non-null
finish_reasonvalue, indicating generation has ended. - For image understanding, it is recommended to use multimodal models such as Qwen-VL. The
contentfield must use the array format and includeimage_urltype entries. - For Function Calling, it is recommended to use model series that support tool calling, such as Qwen, DeepSeek, GLM, etc.
- Thinking mode (
enable_thinking) must be used with the appropriate model ID. See the support matrix at Parameters Matrix. - The request body is protocol-compatible with the upstream Bailian API; undocumented Bailian extension fields (e.g.
thinking_budget,enable_search,search_options) can be used directly, with specific support depending on the model. - For the complete parameter descriptions and model compatibility matrix, see Parameters Matrix.