Quick Start
Get up and running with QYAI in three steps.
Get your API Key
Contact us at qiyuanai@163.com to request an API key. Keys start with sk-.
Configure your client
Point your OpenAI SDK or any compatible client to the QYAI base URL.
Send your first request
Pass any model ID or alias listed under Available Models as the model parameter.
from openai import OpenAI
# Point the OpenAI SDK at the QYAI base URL
client = OpenAI(
    api_key="sk-your-qyai-key",
    base_url="http://121.41.214.247/v1"
)
# Send a single-turn chat request
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello, QYAI!"}]
)
print(response.choices[0].message.content)
Authentication
All API requests require authentication via a Bearer token in the Authorization header.
Authorization: Bearer sk-your-qyai-key
Keep your API key secret. Never expose it in client-side code or public repositories. If your key is compromised, regenerate it immediately from the dashboard.
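One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named QYAI_API_KEY and a helper named auth_headers; both names are illustrative and not part of the QYAI API.

```python
import os


def auth_headers(env_var: str = "QYAI_API_KEY") -> dict:
    """Build request headers from a key stored in the environment.

    QYAI_API_KEY is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var, "")
    if not api_key.startswith("sk-"):
        raise RuntimeError(f"Set {env_var} to your QYAI key (it starts with sk-)")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers of any HTTP client call, so the key never appears in a committed file.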
Available Models
Official Models
| Model ID | Description | Context | Thinking |
|---|---|---|---|
| deepseek-v4-pro | Flagship reasoning model: complex reasoning, code, and agent workflows | 1M | ✓ |
| deepseek-v4-flash | High-efficiency model: daily tasks and high-throughput scenarios | 1M | ✓ |
Compatibility Aliases
| Alias | Maps To | Notes |
|---|---|---|
| deepseek-chat | deepseek-v4-flash | Legacy name, non-thinking mode |
| deepseek-reasoner | deepseek-v4-flash | Legacy name, thinking mode enabled |
| gpt-4o | deepseek-v4-flash | OpenAI-compatible alias |
| gpt-4o-mini | deepseek-v4-flash | OpenAI-compatible alias |
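The server resolves aliases itself, but a client may still want to know which model an alias lands on, for example when logging or attributing cost. A minimal sketch that mirrors the table above; ALIASES and resolve_model are hypothetical client-side names, not part of the API.

```python
# Client-side copy of the alias table; the server remains the source of truth.
ALIASES = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-flash",
    "gpt-4o": "deepseek-v4-flash",
    "gpt-4o-mini": "deepseek-v4-flash",
}


def resolve_model(name: str) -> str:
    """Return the official model ID for an alias, or the name unchanged."""
    return ALIASES.get(name, name)
```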
Model Comparison
| Model | Context | Max Output | Thinking Mode | Tool Calling | JSON Output | Best For | Input Price | Output Price |
|---|---|---|---|---|---|---|---|---|
| deepseek-v4-pro | 1M | 384K | ✓ | ✓ | ✓ | Complex reasoning, code, agents | $2.80/M | $5.60/M |
| deepseek-v4-flash | 1M | 384K | ✓ | ✓ | ✓ | Daily tasks, high throughput | $0.22/M | $0.44/M |
Chat Completions
Send a chat completion request to generate a response from a model.
POST http://121.41.214.247/v1/chat/completions
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model (required) | string | n/a | Model ID to use (e.g. deepseek-v4-flash, deepseek-v4-pro) |
| messages (required) | array | n/a | Array of message objects, each with role and content |
| stream (optional) | boolean | false | Enable Server-Sent Events streaming |
| temperature (optional) | number | 1 | Sampling temperature (0–2). Higher values give more random output |
| max_tokens (optional) | integer | 384K | Maximum number of tokens to generate in the completion |
| top_p (optional) | number | 1 | Nucleus sampling threshold (0–1) |
| frequency_penalty (optional) | number | 0 | Penalize tokens in proportion to how often they have already appeared (−2 to 2) |
| presence_penalty (optional) | number | 0 | Penalize tokens that have already appeared at least once (−2 to 2) |
| stop (optional) | string/array | null | Up to 4 stop sequences |
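The ranges in the table can be checked on the client before a request is sent. The sketch below is one way to do that; build_chat_payload is a hypothetical helper, not part of the QYAI API or the OpenAI SDK, and the server remains the authority on what it accepts.

```python
def build_chat_payload(model, messages, *, temperature=1.0, top_p=1.0,
                       frequency_penalty=0.0, presence_penalty=0.0,
                       stop=None, stream=False, max_tokens=None):
    """Assemble a chat completion body, rejecting out-of-range values early."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    for name, value in (("frequency_penalty", frequency_penalty),
                        ("presence_penalty", presence_penalty)):
        if not -2 <= value <= 2:
            raise ValueError(f"{name} must be between -2 and 2")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are allowed")

    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "stream": stream,
    }
    # Optional fields are only sent when the caller sets them.
    if stop is not None:
        payload["stop"] = stop
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload
```

The result is a plain dictionary, so it can be serialized with json.dumps and posted to the chat completions endpoint by any HTTP client.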
Request Example
{
  "model": "deepseek-v4-flash",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in one paragraph."}
  ],
  "temperature": 0.7,
  "stream": false
}
Response Example
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1746000000,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing harnesses the principles..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 87,
    "total_tokens": 105
  }
}
Thinking Mode
Both deepseek-v4-pro and deepseek-v4-flash support thinking mode, in which the model returns its chain-of-thought reasoning alongside the final answer.
How it works: When thinking mode is active, the response message object carries two fields: reasoning_content, which holds the model's internal reasoning, and content, which holds the final answer.
Response with Thinking
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me think about this step by step...\nFirst, I need to consider...\nTherefore, the answer is...",
        "content": "The answer is 42."
      }
    }
  ]
}
Important: When continuing a multi-turn conversation, do not include reasoning_content in subsequent messages. Only include role and content from assistant responses in the conversation history.
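A minimal sketch of that rule, working on the raw JSON message objects shown above. The helper names strip_reasoning and append_turn are hypothetical and chosen for this example.

```python
def strip_reasoning(assistant_message: dict) -> dict:
    """Keep only role and content so reasoning_content is not sent back."""
    return {
        "role": assistant_message["role"],
        "content": assistant_message["content"],
    }


def append_turn(history: list, assistant_message: dict, next_user_text: str) -> list:
    """Return a new history with the cleaned assistant reply and the next user turn."""
    return history + [
        strip_reasoning(assistant_message),
        {"role": "user", "content": next_user_text},
    ]
```

The list returned by append_turn can be sent as the messages array of the next request; the original history list is left unmodified.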
Using the deepseek-reasoner Alias
The legacy alias deepseek-reasoner automatically maps to deepseek-v4-flash with thinking mode enabled. This provides backward compatibility for existing integrations.
Streaming
Enable streaming by setting "stream": true in your request. The server returns Server-Sent Events (SSE): each chunk arrives as a line that begins with the prefix data: followed by a JSON object. The stream ends with the line data: [DONE].
Streaming Response Format
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1746000000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1746000000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1746000000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1746000000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Tip: The final chunk before [DONE] may include a usage field with token counts when stream_options is set.
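For clients that do not use an SDK, the stream format above can be decoded by hand. The sketch below assumes the response has already been split into decoded text lines; iter_sse_content is a hypothetical helper name.

```python
import json


def iter_sse_content(lines):
    """Yield the text fragments carried by an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        # A chunk may carry no choices (for example one that only reports usage).
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments with "".join(...) reconstructs the full completion text.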
List Models
Retrieve the list of available models.
GET http://121.41.214.247/v1/models
Response
{
  "object": "list",
  "data": [
    {
      "id": "deepseek-v4-pro",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "deepseek-v4-flash",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "deepseek-chat",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "deepseek-reasoner",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    }
  ]
}
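A short sketch of calling this endpoint with only the Python standard library and extracting the model IDs. The function names are hypothetical; the URL and response shape are taken from the example above.

```python
import json
import urllib.request

BASE_URL = "http://121.41.214.247/v1"


def parse_model_ids(models_response: dict) -> list:
    """Return the id of every entry in a /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(api_key: str, timeout: float = 10.0) -> list:
    """Call GET /v1/models and return the model IDs (performs a network request)."""
    request = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

Checking a configured model name against fetch_model_ids at startup is one way to surface a typo before the first chat request fails.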
Billing Usage
Query your API key's usage statistics.
GET http://121.41.214.247/v1/dashboard/billing/usage
Response
{
  "object": "billing.usage",
  "key_prefix": "sk-abc...",
  "tier": "pro",
  "monthly_requests": 142,
  "monthly_limit": 3000,
  "remaining": 2858,
  "usage_percent": 4.7,
  "total_prompt_tokens": 52340,
  "total_completion_tokens": 12890
}
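The fields in this response can drive a simple quota warning. The sketch below assumes, based only on the sample values above, that remaining equals monthly_limit minus monthly_requests; quota_status and the 80% threshold are hypothetical choices for this example.

```python
def quota_status(usage: dict, warn_at_percent: float = 80.0) -> dict:
    """Summarize monthly request usage from a billing usage response body."""
    used = usage["monthly_requests"]
    limit = usage["monthly_limit"]
    percent = round(100 * used / limit, 1) if limit else 0.0
    return {
        "remaining": limit - used,
        "usage_percent": percent,
        "warn": percent >= warn_at_percent,
    }
```

With the sample response, this reproduces the remaining and usage_percent values the server reports, which makes it usable as a sanity check as well as an alert.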
Pricing
Simple, transparent pricing. Pay only for what you use — no minimums, no commitments.
| Model | Input (cache miss) | Input (cache hit) | Output |
|---|---|---|---|
| deepseek-v4-flash | $0.22 / 1M tokens | $0.005 / 1M tokens | $0.44 / 1M tokens |
| deepseek-v4-pro | $2.80 / 1M tokens | $0.024 / 1M tokens | $5.60 / 1M tokens |
Cache hit pricing: When your prompt prefix matches a previously cached prefix, you benefit from drastically reduced input pricing. Cache hits are automatic — no configuration needed.
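The per-token prices above translate into a request cost as follows. This is a rough estimator under stated assumptions: the prices are copied from the table, and cached_prompt_tokens is a hypothetical input, since this page does not say how the API reports cache hits.

```python
# USD per 1M tokens, copied from the pricing table.
PRICES_PER_MILLION = {
    "deepseek-v4-flash": {"miss": 0.22, "hit": 0.005, "output": 0.44},
    "deepseek-v4-pro": {"miss": 2.80, "hit": 0.024, "output": 5.60},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int,
                  cached_prompt_tokens: int = 0) -> float:
    """Estimate the USD cost of one request from its token counts."""
    if not 0 <= cached_prompt_tokens <= prompt_tokens:
        raise ValueError("cached_prompt_tokens must be between 0 and prompt_tokens")
    price = PRICES_PER_MILLION[model]
    uncached = prompt_tokens - cached_prompt_tokens
    total = (uncached * price["miss"]
             + cached_prompt_tokens * price["hit"]
             + completion_tokens * price["output"])
    return total / 1_000_000
```

The prompt_tokens and completion_tokens arguments correspond to the usage fields returned with each chat completion.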
Price Comparison
See how QYAI compares to other major API providers (input / output per 1M tokens).
| Provider | Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| QYAI | V4-Flash | $0.22 | $0.44 | 1M |
| QYAI | V4-Pro | $2.80 | $5.60 | 1M |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
Error Codes
The API returns standard HTTP status codes. Error responses include a JSON body with details.
| Code | Type | Description |
|---|---|---|
| 400 | invalid_request_error | The request body is malformed or missing required parameters |
| 401 | authentication_error | Invalid or missing API key in the Authorization header |
| 403 | permission_error | Your API key does not have access to the requested resource |
| 404 | not_found_error | The requested endpoint or model does not exist |
| 429 | rate_limit_error | You have exceeded your rate limit or monthly quota |
| 500 | internal_error | An unexpected internal server error occurred |
| 502 | upstream_error | Failed to connect to the upstream model provider |
| 503 | service_unavailable | The service is temporarily unavailable — try again later |
| 504 | upstream_timeout | The upstream model provider timed out |
Error Response Format
{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error"
  }
}
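One way to act on these codes is to retry transient failures with exponential backoff and surface everything else immediately. The sketch below is an assumption about client policy, not documented server behavior: the choice of retryable codes, the delays, and the send callable (which returns a status code and parsed JSON body) are all hypothetical.

```python
import time

# Assumed to be transient; 400, 401, 403, and 404 will not succeed on retry.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delays(max_retries: int = 4, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(send, max_retries: int = 4, sleep=time.sleep) -> dict:
    """Call send() until it succeeds, retrying only on retryable status codes.

    send is a caller-supplied function returning (status_code, body_dict).
    """
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE_STATUS or attempt == max_retries:
            error = body.get("error", {})
            raise RuntimeError(f"{status} {error.get('type')}: {error.get('message')}")
        sleep(delays[attempt])
```

Note that a 429 caused by an exhausted monthly quota will not clear after a short wait, so callers may want to check the error message before retrying that case.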
Code Examples
Python (OpenAI SDK)
Non-streaming
from openai import OpenAI
client = OpenAI(
    api_key="sk-your-qyai-key",
    base_url="http://121.41.214.247/v1"
)
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about programming."}
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
Streaming
from openai import OpenAI
client = OpenAI(
    api_key="sk-your-qyai-key",
    base_url="http://121.41.214.247/v1"
)
# stream=True makes the SDK return an iterator of chunks
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Explain recursion in 3 sentences."}
    ],
    stream=True,
)
for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # newline
JavaScript / Node.js
Non-streaming
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "sk-your-qyai-key",
  baseURL: "http://121.41.214.247/v1",
});
const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a haiku about programming." },
  ],
  temperature: 0.7,
});
console.log(response.choices[0].message.content);
console.log(`Tokens used: ${response.usage.total_tokens}`);
Streaming
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "sk-your-qyai-key",
  baseURL: "http://121.41.214.247/v1",
});
const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "user", content: "Explain recursion in 3 sentences." },
  ],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}
console.log(); // newline
cURL
Non-streaming
curl http://121.41.214.247/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-qyai-key" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku about programming."}
    ],
    "temperature": 0.7
  }'
Streaming
curl -N http://121.41.214.247/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-qyai-key" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "Explain recursion in 3 sentences."}
    ],
    "stream": true
  }'
FAQ
Is QYAI compatible with the OpenAI SDK?
Yes. QYAI is fully OpenAI-compatible — just change the base_url and api_key in your existing OpenAI SDK client. All request and response formats are identical.
What models are available behind the gpt-4o and gpt-4o-mini aliases?
Both aliases map to deepseek-v4-flash. This allows you to switch providers without changing any model names in your code. You'll get DeepSeek V4-Flash quality at a fraction of the OpenAI price.
How does thinking mode work?
When thinking mode is active, the model returns a reasoning_content field in the response containing its chain-of-thought, followed by content with the final answer. Use deepseek-v4-pro for the best reasoning quality. Do not include reasoning_content in subsequent messages.
What is cache hit pricing?
DeepSeek's API automatically caches prompt prefixes. When a subsequent request's prefix matches a cached one, the input price drops dramatically (e.g., $0.005/M for V4-Flash vs $0.22/M for cache miss). This is automatic — no configuration needed.
What are the rate limits?
Rate limits depend on your plan tier. The basic tier allows 1,000 requests/month and 10 RPM; pro allows 3,000 requests/month and 30 RPM; enterprise allows 10,000 requests/month and 100 RPM. Contact us for custom limits.
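A client can stay under its RPM limit by pacing requests itself. The sketch below uses a trailing 60-second window; how the server actually measures a minute is not documented here, so treat the window logic as an assumption. RpmLimiter and RPM_BY_TIER are hypothetical names, with the tier values copied from the answer above.

```python
import time
from collections import deque

RPM_BY_TIER = {"basic": 10, "pro": 30, "enterprise": 100}
WINDOW_SECONDS = 60.0


class RpmLimiter:
    """Allow at most rpm calls to wait() to return within any 60-second window."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests in the current window

    def wait(self) -> None:
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.sent and now - self.sent[0] >= WINDOW_SECONDS:
                self.sent.popleft()
            if len(self.sent) < self.rpm:
                self.sent.append(now)
                return
            # Sleep until the oldest recorded request leaves the window.
            self.sleep(WINDOW_SECONDS - (now - self.sent[0]))
```

Calling limiter.wait() immediately before each API request, with limiter = RpmLimiter(RPM_BY_TIER["pro"]), keeps a single-process client within the pro tier's per-minute limit; the monthly request cap still has to be tracked separately.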
Is my data sent to China?
QYAI proxies requests to DeepSeek's API infrastructure. DeepSeek is a Chinese company and their data processing is subject to Chinese regulations. We recommend not sending personally identifiable or sensitive data through any third-party AI API.