API Reference
Parameters Matrix
This page lists parameters based on backend actual forwarding and protocol conversion logic. Text-type models support OpenAI, Anthropic, and Gemini protocols; non-text models use image, audio, embedding, or async task interfaces based on model capabilities.
OpenAI Chat Completions
Parameter
Type/Map
Status
Description
modelstring
Required
Model ID. Text, reasoning, multimodal, coding, and professional models support Chat Completions.
messagesarray
Required
Chat message array, passed in sequential order: system, user, assistant, tool.
messages[].rolestring
Required
system / user / assistant / tool. The tool role is used for passing back tool execution results.
messages[].contentstring | array
Required
Text can be passed directly as a string; multimodal input uses a content block array.
messages[].content[].typestring
Multimodal
Common values include text / image_url; video, input_audio, and other Bailian extension content blocks depend on the specific model.
messages[].content[].textstring
Multimodal
Text content when type=text.
messages[].content[].image_url.urlstring
Multimodal
Image URL or data URL; requires the model to support vision understanding.
streamboolean
Optional
Enable SSE streaming output. Recommended for long text, reasoning models, and interactive scenarios.
stream_options.include_usageboolean
Optional
Return usage in the final streaming response chunk. Recommended when billing, statistics, or smoke checks are needed.
temperaturenumber
Optional
Sampling temperature. Range typically 0 to 2; higher values produce more random output.
top_pnumber
Optional
Top-p sampling threshold. It is recommended not to adjust both temperature and top_p significantly at the same time.
max_tokensinteger
Optional
Maximum output token count; cannot exceed the model's maximum output.
stopstring | string[]
Optional
Stop sequences; the model stops output after hitting one.
presence_penaltynumber
Optional
Presence penalty, typical range -2 to 2, increases tendency for new topics.
frequency_penaltynumber
Optional
Frequency penalty, typical range -2 to 2, reduces repetition in output.
toolsarray
Optional
Function calling definition array. Only models that support tool calling will reliably return tool_calls.
tools[].typestring
Tool
Fixed as function.
tools[].function.namestring
Tool
Function name. Recommended to use letters, digits, and underscores.
tools[].function.descriptionstring
Tool
Description of the function's purpose, affects the model's accuracy in selecting tools.
tools[].function.parametersobject
Tool
JSON Schema describing the function's input parameters.
tool_choicestring | object
Optional
Supports auto / none, or specifying {type:'function', function:{name}}. For thinking mode models, forcing specific tools is not recommended.
response_formatobject
Optional
Output format control. Common values: {"type":"text"} or {"type":"json_object"}.
enable_thinkingboolean
Optional
Toggle thinking mode. Only verified hybrid thinking models can be disabled; pure thinking models will ignore false and continue returning reasoning_content.
thinking_budgetinteger
Optional
Limit thinking token upper bound, passed through based on model ID prefix (qwen3.7- / qwen3.6- / qwen3.5- / qwen3-).
preserve_thinkingboolean
Optional
Pass historical reasoning_content in messages back to the model. Supports qwen3.7-max, qwen3.6-max-preview, qwen3.6-plus, kimi-k2.6.
enable_searchboolean
Optional
Web search, supported by Qwen (Tongyi) text-type models (not VL / math series).
search_optionsobject
Optional
Web search configuration, used together with enable_search.
enable_context_cachingboolean
Optional
Enable Context Caching. Repeated prompt prefixes are automatically cached; hits are billed at 0.1x input price. Supports Qwen (Tongyi), GLM series.
seedinteger
Optional
Random seed, supported by Qwen (Tongyi) text models for pass-through.
top_kinteger
Optional
Top-K sampling, supported by Qwen (Tongyi) text models for pass-through.
logprobsboolean
Optional
Return log probabilities, supported by Qwen (Tongyi) text models for pass-through.
repetition_penaltynumber
Optional
Repetition penalty, supported by Qwen (Tongyi) text models for pass-through.
parallel_tool_callsboolean
Optional
Parallel tool calling, supported by Qwen (Tongyi), DeepSeek, GLM, Anthropic models.
Not Yet Supported Fields
The table below lists fields that are not yet reliably forwarded through the public Chat endpoint. Production code should not depend on these.
Parameter
Type/Map
Status
Description
max_completion_tokensinteger
Not yet forwarded
Please use the currently supported max_tokens instead.
Thinking Mode Support
This section lists the actual behavior of NexusFlow's online OpenAI Chat endpoint. Support may change with upstream model versions; production code should use explicit configuration based on model ID.
Parameter
Type/Map
Status
Description
qwen3.7-maxhybrid thinking
Supports true / false
Thinking enabled by default; true returns reasoning_content; false does not. Supports thinking_budget and preserve_thinking.
qwen3.5-flashhybrid thinking
Supports true / false
Verified: true returns reasoning_content; false does not.
qwen3-maxhybrid thinking
Supports true / false
Verified: true returns reasoning_content; false does not.
qwq-pluspure thinking
false cannot be disabled
Verified: true/false both return reasoning_content.
qwen-math-plusnot handled as thinking toggle
Do not pass
Verified: true/false do not yet return reasoning_content.
deepseek-r1pure thinking
false cannot be disabled
Verified: true/false both return reasoning_content.
deepseek-v3.2hybrid thinking
Supports true / false
Verified: true returns reasoning_content; false does not.
deepseek-v4-prohybrid thinking
Supports true / false
Verified: true returns reasoning_content; false does not.
glm-5.1hybrid thinking
Supports true / false
Verified: true returns reasoning_content; false does not.
Anthropic Messages Mapping
Parameter
Type/Map
Status
modelmodel
Model ID, mapped to OpenAI model.
systemmessages[0].role=system
System prompt. Supports string or text blocks.
messagesmessages
user / assistant messages are converted to OpenAI messages.
messages[].content[].textmessages[].content
Text block. Pure text blocks are merged into a string.
messages[].content[].imageimage_url
Supports url or base64 source, converted to OpenAI image_url.
messages[].content[].tool_useassistant.tool_calls
Assistant tool call result.
messages[].content[].tool_resultrole=tool
Tool execution result passed back.
max_tokensmax_tokens
Maximum output tokens.
temperaturetemperature
Sampling temperature.
top_ptop_p
Nucleus sampling.
stop_sequencesstop
Stop sequence array.
streamstream
Enable Anthropic SSE event stream.
toolstools
Anthropic tools are converted to OpenAI function tools.
tool_choicetool_choice
auto / none / any / tool is converted to OpenAI tool_choice.
Gemini GenerateContent Mapping
Parameter
Type/Map
Status
contentsmessages
Message array. String contents are also wrapped into user text messages.
contents[].rolemessages[].role
user maps to user, model maps to assistant.
contents[].parts[].textcontent text
Text content.
contents[].parts[].inlineDataimage_url data URL
Base64 image content, converted to image_url.
contents[].parts[].fileDataimage_url
File URL, converted to image_url.
contents[].parts[].functionCallassistant.tool_calls
Model function calling.
contents[].parts[].functionResponserole=tool
Tool execution result.
systemInstructionsystem message
System prompt, supports string or parts.
generationConfig.temperaturetemperature
Sampling temperature.
generationConfig.topPtop_p
Nucleus sampling.
generationConfig.maxOutputTokensmax_tokens
Maximum output tokens.
generationConfig.stopSequencesstop
Stop sequence array.
tools[].functionDeclarationstools
Function declarations, converted to OpenAI function tools.
toolConfig.functionCallingConfig.modetool_choice
AUTO / ANY / NONE map to auto / required / none respectively; some upstream models may not decline required.
streamGenerateContentstream=true
Streaming interface. Use ?alt=sse for SSE-formatted responses.
Response Fields
Parameter
Type/Map
choices[].message.contentNon-streaming text output.
choices[].message.reasoning_contentThinking content field that reasoning models may return.
choices[].message.tool_callsReturned when the model requests a tool call.
choices[].delta.contentStreaming text increment.
choices[].delta.reasoning_contentStreaming thinking increment, may be returned by reasoning models.
choices[].finish_reasonstop / length / tool_calls / content_filter.
usage.prompt_tokensInput tokens.
usage.completion_tokensOutput tokens.
usage.total_tokensTotal tokens.
usage.completion_tokens_details.reasoning_tokensReasoning tokens, returned by some models.
Production recommendation: Use
stream=true and stream_options.include_usage=true for reasoning models; for hybrid thinking models in low-cost, low-latency scenarios, explicitly pass enable_thinking=false. More examples at Chat Completions API and Gemini Protocol.