QYAI API Documentation

Quick Start

Get up and running with QYAI in three steps.

1. Get your API Key

Contact us at qiyuanai@163.com to obtain an API key. Keys begin with sk-.

2. Configure your client

Point the OpenAI SDK, or any OpenAI-compatible client, at the QYAI base URL: http://121.41.214.247/v1

3. Send your first request

Use any model ID from the Available Models list, such as deepseek-v4-flash in the example below.

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-qyai-key",
    base_url="http://121.41.214.247/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello, QYAI!"}]
)

print(response.choices[0].message.content)

Authentication

All API requests require authentication via a Bearer token in the Authorization header.

HTTP Header

Authorization: Bearer sk-your-qyai-key

Keep your API key secret. Never expose it in client-side code or public repositories. If your key is compromised, regenerate it immediately from the dashboard.
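One way to keep the key out of source code is to read it from an environment variable when building the Authorization header. A minimal standard-library sketch; the variable name `QYAI_API_KEY` and the helper `build_request` are hypothetical choices for illustration, not something QYAI defines. The request is built but not sent.

```python
import os
import urllib.request
from typing import Optional

BASE_URL = "http://121.41.214.247/v1"


def build_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request that carries the API key as a Bearer token."""
    # QYAI_API_KEY is a hypothetical variable name chosen for this example.
    key = api_key or os.environ.get("QYAI_API_KEY")
    if not key:
        raise RuntimeError("Set QYAI_API_KEY or pass api_key explicitly")
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )


req = build_request("/models", api_key="sk-your-qyai-key")
print(req.full_url)                     # http://121.41.214.247/v1/models
print(req.get_header("Authorization"))  # Bearer sk-your-qyai-key
```

To send it, pass the request to `urllib.request.urlopen(req)` and decode the JSON body.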

Available Models

Official Models

Model ID	Description	Context	Thinking
`deepseek-v4-pro`	Flagship reasoning model — complex reasoning, code, and Agent workflows	1M	✓
`deepseek-v4-flash`	High-efficiency model — daily tasks and high-throughput scenarios	1M	✓

Compatibility Aliases

Alias	Maps To	Notes
`deepseek-chat`	`deepseek-v4-flash`	Legacy name — non-thinking mode
`deepseek-reasoner`	`deepseek-v4-flash`	Legacy name — thinking mode enabled
`gpt-4o`	`deepseek-v4-flash`	OpenAI-compatible alias
`gpt-4o-mini`	`deepseek-v4-flash`	OpenAI-compatible alias

Model Comparison

Model	Context	Max Output	Thinking Mode	Tool Calling	JSON Output	Best For	Input Price (cache miss, per 1M tokens)	Output Price (per 1M tokens)
`deepseek-v4-pro`	1M	384K	✓	✓	✓	Complex reasoning, code, agents	$2.80	$5.60
`deepseek-v4-flash`	1M	384K	✓	✓	✓	Daily tasks, high throughput	$0.22	$0.44

Chat Completions

Send a chat completion request to generate a response from a model.

HTTP

POST http://121.41.214.247/v1/chat/completions

Parameters

Parameter	Type	Default	Description
model (required)	string	none	Model ID to use (e.g. `deepseek-v4-flash`, `deepseek-v4-pro`)
messages (required)	array	none	Array of message objects, each with `role` and `content`
stream (optional)	boolean	false	Enable Server-Sent Events (SSE) streaming
temperature (optional)	number	1	Sampling temperature (0–2); higher values give more random output
max_tokens (optional)	integer	384K	Maximum number of tokens to generate in the completion
top_p (optional)	number	1	Nucleus sampling threshold (0–1)
frequency_penalty (optional)	number	0	Penalize tokens in proportion to how often they have already appeared (−2 to 2)
presence_penalty (optional)	number	0	Penalize tokens that have already appeared at least once, encouraging new topics (−2 to 2)
stop (optional)	string/array	null	Up to 4 sequences at which generation stops
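The documented ranges can be checked on the client before a request is sent. A minimal sketch; `build_chat_body` is a hypothetical helper, and this document does not say whether the server rejects out-of-range values with a 400 or clamps them.

```python
from typing import Any, Optional, Union

# (min, max) ranges taken from the Parameters table.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_chat_body(model: str, messages: list[dict[str, str]],
                    stop: Optional[Union[str, list[str]]] = None,
                    **options: Any) -> dict[str, Any]:
    """Return a /v1/chat/completions request body after checking documented limits."""
    if not model:
        raise ValueError("model is required")
    if not messages or any("role" not in m or "content" not in m for m in messages):
        raise ValueError("messages must be a non-empty list of {role, content} objects")
    for name, value in options.items():
        if name in RANGES:
            lo, hi = RANGES[name]
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")
    if "max_tokens" in options and options["max_tokens"] < 1:
        raise ValueError("max_tokens must be a positive integer")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    body: dict[str, Any] = {"model": model, "messages": messages, **options}
    if stop is not None:
        body["stop"] = stop
    return body


body = build_chat_body(
    "deepseek-v4-flash",
    [{"role": "user", "content": "Hello, QYAI!"}],
    temperature=0.7,
)
```

Optional parameters that are not range-checked here (for example `stream`) are passed through unchanged.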

Request Example

JSON

{
  "model": "deepseek-v4-flash",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in one paragraph."}
  ],
  "temperature": 0.7,
  "stream": false
}

Response Example

JSON

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1746000000,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing harnesses the principles..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 87,
    "total_tokens": 105
  }
}
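Without the OpenAI SDK, the same exchange can be made with Python's standard library. A minimal sketch: the hypothetical `post_chat` sends a request body like the one above, and the hypothetical `extract_reply` reads the answer, finish reason, and token total from a response shaped like the example. `post_chat` is defined but not called, so running the block makes no network request.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://121.41.214.247/v1"


def post_chat(api_key: str, body: dict[str, Any], timeout: float = 60.0) -> dict[str, Any]:
    """POST a chat completion request and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def extract_reply(response: dict[str, Any]) -> tuple[str, str, int]:
    """Return (content, finish_reason, total_tokens) from a non-streaming response."""
    choice = response["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        response["usage"]["total_tokens"],
    )
```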

Thinking Mode

Both deepseek-v4-pro and deepseek-v4-flash support thinking mode. When thinking mode is active, the response message object includes a reasoning_content field with the model's chain-of-thought reasoning, followed by the final answer in content.

Response with Thinking

JSON

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me think about this step by step...\nFirst, I need to consider...\nTherefore, the answer is...",
        "content": "The answer is 42."
      }
    }
  ]
}
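When handling the raw JSON, the reasoning and the final answer can be read separately. A minimal sketch; `split_thinking` is a hypothetical helper, and it treats a missing or null reasoning_content as an empty string, on the assumption that the field may be absent when thinking mode is off.

```python
from typing import Any


def split_thinking(response: dict[str, Any]) -> tuple[str, str]:
    """Return (reasoning, answer) from the first choice of a chat completion."""
    message = response["choices"][0]["message"]
    # reasoning_content may be absent or null when thinking mode is off.
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


example = {
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "Let me think about this step by step...",
            "content": "The answer is 42.",
        }
    }]
}
reasoning, answer = split_thinking(example)
print(answer)  # The answer is 42.
```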

Important: When continuing a multi-turn conversation, do not include reasoning_content in subsequent messages. Only include role and content from assistant responses in the conversation history.
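A minimal sketch of carrying a conversation forward under this rule: the hypothetical helper `append_assistant_turn` copies only role and content from the assistant message into the history and drops reasoning_content.

```python
from typing import Any


def append_assistant_turn(history: list[dict[str, str]],
                          message: dict[str, Any]) -> list[dict[str, str]]:
    """Return a new history with the assistant reply added, minus reasoning_content."""
    return history + [{"role": "assistant", "content": message.get("content") or ""}]


history = [{"role": "user", "content": "What is 6 x 7?"}]
reply = {
    "role": "assistant",
    "reasoning_content": "6 times 7 is 42.",
    "content": "The answer is 42.",
}
history = append_assistant_turn(history, reply)
history.append({"role": "user", "content": "And 6 x 8?"})
# history now holds three messages, none with a reasoning_content key.
```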

Using the `deepseek-reasoner` Alias

The legacy alias deepseek-reasoner automatically maps to deepseek-v4-flash with thinking mode enabled. This provides backward compatibility for existing integrations.

Streaming

Enable streaming by setting "stream": true in your request. The server then returns Server-Sent Events (SSE): each event is a line that starts with data: followed by a JSON chunk, and the stream ends with data: [DONE].

Streaming Response Format

SSE

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1746000000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1746000000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1746000000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1746000000,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

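Tip: When stream_options is set in the request (for example "stream_options": {"include_usage": true}, following the OpenAI API convention), the final chunk before [DONE] may include a usage field with token counts.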
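Clients without the OpenAI SDK can consume the stream by reading lines, stripping the data: prefix, and concatenating each delta.content until [DONE]. A minimal sketch; `collect_stream` is a hypothetical helper that takes already-decoded text lines and also keeps the last finish_reason and any usage object it sees.

```python
import json
from typing import Any, Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple[str, Optional[str], Optional[dict[str, Any]]]:
    """Assemble (content, finish_reason, usage) from the SSE lines of a chat stream."""
    parts: list[str] = []
    finish_reason: Optional[str] = None
    usage: Optional[dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # only present when stream_options requests it
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason, usage
```

With `urllib.request.urlopen`, the response object can be iterated line by line, so `collect_stream(b.decode("utf-8") for b in resp)` processes the stream as it arrives.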

List Models

Retrieve the list of available models.

HTTP

GET http://121.41.214.247/v1/models

Response

JSON

{
  "object": "list",
  "data": [
    {
      "id": "deepseek-v4-pro",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "deepseek-v4-flash",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "deepseek-chat",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "deepseek-reasoner",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    },
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "created": 1743465600,
      "owned_by": "qyai"
    }
  ]
}
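A minimal standard-library sketch of calling this endpoint and separating the official model IDs from the compatibility aliases. `fetch_models` and `split_models` are hypothetical helpers; the official set is hardcoded from the Official Models table because the response does not mark which entries are aliases. With the OpenAI Python SDK, `client.models.list()` calls the same endpoint.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://121.41.214.247/v1"
# Taken from the Official Models table; every other ID is treated as an alias.
OFFICIAL = {"deepseek-v4-pro", "deepseek-v4-flash"}


def fetch_models(api_key: str) -> dict[str, Any]:
    """GET /v1/models and return the decoded JSON list object."""
    req = urllib.request.Request(
        BASE_URL + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def split_models(payload: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (official_ids, alias_ids) in the order the API listed them."""
    ids = [m["id"] for m in payload.get("data", [])]
    return [i for i in ids if i in OFFICIAL], [i for i in ids if i not in OFFICIAL]
```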

Billing Usage

Query your API key's usage statistics.

HTTP

GET http://121.41.214.247/v1/dashboard/billing/usage

Response

JSON

{
  "object": "billing.usage",
  "key_prefix": "sk-abc...",
  "tier": "pro",
  "monthly_requests": 142,
  "monthly_limit": 3000,
  "remaining": 2858,
  "usage_percent": 4.7,
  "total_prompt_tokens": 52340,
  "total_completion_tokens": 12890
}
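The quota fields in this example are related: remaining equals monthly_limit minus monthly_requests (3000 - 142 = 2858), and usage_percent equals monthly_requests / monthly_limit × 100 rounded to one decimal (4.7). These relationships are inferred from the example values, not stated by the API. A minimal sketch that turns the response into a status line; `quota_status` and its 90% warning threshold are hypothetical.

```python
from typing import Any


def quota_status(usage: dict[str, Any], warn_at: float = 90.0) -> str:
    """Summarize a /v1/dashboard/billing/usage response as a one-line status."""
    limit = usage["monthly_limit"]
    used = usage["monthly_requests"]
    # Recompute the derived fields from the raw counts.
    remaining = limit - used
    percent = round(used / limit * 100, 1) if limit else 100.0
    state = "WARN" if percent >= warn_at else "OK"
    return f"{state}: {used}/{limit} requests ({percent}%), {remaining} left"


example = {
    "object": "billing.usage",
    "tier": "pro",
    "monthly_requests": 142,
    "monthly_limit": 3000,
    "remaining": 2858,
    "usage_percent": 4.7,
}
print(quota_status(example))  # OK: 142/3000 requests (4.7%), 2858 left
```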

Pricing

Simple, transparent pricing: pay only for what you use, with no minimums and no commitments.

Model	Input (cache miss)	Input (cache hit)	Output
`deepseek-v4-flash`	$0.22 / 1M tokens	$0.005 / 1M tokens	$0.44 / 1M tokens
`deepseek-v4-pro`	$2.80 / 1M tokens	$0.024 / 1M tokens	$5.60 / 1M tokens

Cache hit pricing: When your prompt prefix matches a previously cached prefix, you benefit from drastically reduced input pricing. Cache hits are automatic — no configuration needed.
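The cost of a request is each token count multiplied by its per-million rate. A minimal sketch using the rates in the table above; `estimate_cost` is a hypothetical helper, and the number of cache-hit input tokens is supplied by the caller because this document does not say whether the usage object reports that split.

```python
# USD per 1M tokens from the pricing table: (input cache miss, input cache hit, output).
PRICES = {
    "deepseek-v4-flash": (0.22, 0.005, 0.44),
    "deepseek-v4-pro": (2.80, 0.024, 5.60),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimated USD cost; cached_tokens is the part of prompt_tokens billed at the hit rate."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    miss_rate, hit_rate, out_rate = PRICES[model]
    miss_tokens = prompt_tokens - cached_tokens
    total = miss_tokens * miss_rate + cached_tokens * hit_rate + completion_tokens * out_rate
    return total / 1_000_000


# Token counts from the Response Example: 18 prompt tokens, 87 completion tokens.
print(f"${estimate_cost('deepseek-v4-flash', 18, 87):.8f}")  # $0.00004224
```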

Price Comparison

See how QYAI compares to other major API providers (input / output per 1M tokens).

QYAI V4-Flash

$0.22 / $0.44

per 1M tokens (in / out)

91% cheaper input and 96% cheaper output than GPT-4o

QYAI V4-Pro

$2.80 / $5.60

per 1M tokens (in / out)

44% cheaper output than GPT-4o (input is 12% more expensive)

OpenAI GPT-4o

$2.50 / $10.00

per 1M tokens (in / out)

Anthropic Claude 3.5 Sonnet

$3.00 / $15.00

per 1M tokens (in / out)

Provider	Model	Input / 1M	Output / 1M	Context
QYAI	V4-Flash	$0.22	$0.44	1M
QYAI	V4-Pro	$2.80	$5.60	1M
OpenAI	GPT-4o	$2.50	$10.00	128K
Anthropic	Claude 3.5 Sonnet	$3.00	$15.00	200K
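Whether a model is cheaper depends on which side of the price is compared. A quick calculation from the table above, where the saving is 1 - (QYAI price / GPT-4o price), expressed as a percentage; a negative value means QYAI is more expensive on that side. `savings` is a hypothetical helper.

```python
# (input, output) USD per 1M tokens, from the comparison table.
GPT_4O = (2.50, 10.00)
QYAI = {"V4-Flash": (0.22, 0.44), "V4-Pro": (2.80, 5.60)}


def savings(price: tuple[float, float], baseline: tuple[float, float]) -> tuple[float, ...]:
    """Percent saving on (input, output) relative to baseline; negative means pricier."""
    return tuple((1 - p / b) * 100 for p, b in zip(price, baseline))


for name, price in QYAI.items():
    inp, out = savings(price, GPT_4O)
    print(f"{name}: input {inp:+.0f}%, output {out:+.0f}% vs GPT-4o")
# V4-Flash: input +91%, output +96% vs GPT-4o
# V4-Pro: input -12%, output +44% vs GPT-4o
```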

Error Codes

The API returns standard HTTP status codes. Error responses include a JSON body with details.

Code	Type	Description
400	invalid_request_error	The request body is malformed or missing required parameters
401	authentication_error	Invalid or missing API key in the Authorization header
403	permission_error	Your API key does not have access to the requested resource
404	not_found_error	The requested endpoint or model does not exist
429	rate_limit_error	You have exceeded your rate limit or monthly quota
500	internal_error	An unexpected internal server error occurred
502	upstream_error	Failed to connect to the upstream model provider
503	service_unavailable	The service is temporarily unavailable — try again later
504	upstream_timeout	The upstream model provider timed out

Error Response Format

JSON

{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error"
  }
}
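A minimal sketch of classifying a failure from its status code and error body, with exponential backoff for codes that may succeed on retry. The retry set and delays are assumptions rather than QYAI guidance, and `classify_error` and `backoff_delays` are hypothetical helpers. A 429 caused by an exhausted monthly quota will not clear on retry; the remaining field of the Billing Usage endpoint distinguishes that case. The official OpenAI SDKs already retry some failures on their own (see the client's `max_retries` option).

```python
import json
from typing import Iterator

# Assumed retryable: rate limiting and server or upstream failures.
RETRYABLE = {429, 500, 502, 503, 504}


def classify_error(status: int, body: str) -> tuple[str, str, bool]:
    """Return (type, message, retryable) from an HTTP status and its response body."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None  # body was not JSON (e.g. a proxy error page)
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    return err.get("type", "unknown"), err.get("message", body), status in RETRYABLE


def backoff_delays(retries: int = 4, base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield exponential backoff delays in seconds: base, 2*base, 4*base, ... up to cap."""
    for attempt in range(retries):
        yield min(cap, base * 2 ** attempt)


print(classify_error(401, '{"error": {"message": "Invalid API key provided", "type": "authentication_error"}}'))
# ('authentication_error', 'Invalid API key provided', False)
print(list(backoff_delays()))  # [1.0, 2.0, 4.0, 8.0]
```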

Code Examples

Python (OpenAI SDK)

Non-streaming

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-qyai-key",
    base_url="http://121.41.214.247/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about programming."}
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Streaming

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-qyai-key",
    base_url="http://121.41.214.247/v1"
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Explain recursion in 3 sentences."}
    ],
    stream=True,
)

for chunk in stream:
    # Skip chunks with no choices (e.g. a usage-only final chunk) and empty deltas.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # newline

JavaScript / Node.js

Non-streaming

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-your-qyai-key",
  baseURL: "http://121.41.214.247/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a haiku about programming." },
  ],
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
console.log(`Tokens used: ${response.usage.total_tokens}`);

Streaming

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-your-qyai-key",
  baseURL: "http://121.41.214.247/v1",
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "user", content: "Explain recursion in 3 sentences." },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(content);
}
console.log(); // newline

cURL

Non-streaming

bash

curl http://121.41.214.247/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-qyai-key" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku about programming."}
    ],
    "temperature": 0.7
  }'

Streaming

bash

curl -N http://121.41.214.247/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-qyai-key" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "Explain recursion in 3 sentences."}
    ],
    "stream": true
  }'

FAQ

Is QYAI compatible with the OpenAI SDK?

Yes. QYAI is OpenAI-compatible: change the base_url and api_key in your existing OpenAI SDK client. Requests and responses follow the OpenAI Chat Completions format; the one addition is the reasoning_content field returned in thinking mode.

What models are available behind the `gpt-4o` and `gpt-4o-mini` aliases?

Both aliases map to deepseek-v4-flash. This allows you to switch providers without changing any model names in your code. You'll get DeepSeek V4-Flash quality at a fraction of the OpenAI price.

How does thinking mode work?

When thinking mode is active, the model returns a reasoning_content field in the response containing its chain-of-thought, followed by content with the final answer. Use deepseek-v4-pro for the best reasoning quality. Do not include reasoning_content in subsequent messages.

What is cache hit pricing?

DeepSeek's API automatically caches prompt prefixes. When a subsequent request's prefix matches a cached one, the input price drops dramatically (e.g., $0.005/M for V4-Flash vs $0.22/M for cache miss). This is automatic — no configuration needed.

What are the rate limits?

Rate limits depend on your plan tier. The basic tier allows 1,000 requests/month and 10 RPM; pro allows 3,000 requests/month and 30 RPM; enterprise allows 10,000 requests/month and 100 RPM. Contact us for custom limits.
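To stay under a per-minute limit on the client side, requests can be paced with a rolling one-minute window. A minimal sketch; `MinuteRateLimiter` is a hypothetical helper, 30 corresponds to the pro tier's RPM above, and because the server's exact window semantics are not documented, leaving some headroom below the limit may be prudent.

```python
import time
from collections import deque
from typing import Callable


class MinuteRateLimiter:
    """Allow at most `rpm` requests in any rolling 60-second window."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.clock = clock
        self.sent: deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # drop requests that have left the window
        if len(self.sent) < self.rpm:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def acquire(self) -> None:
        """Block until a request slot is free, then record the request."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.sent.append(self.clock())


limiter = MinuteRateLimiter(30)  # pro tier: 30 RPM
```

Call `limiter.acquire()` immediately before each API request. The injectable clock exists so the window logic can be tested without real waiting.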

Is my data sent to China?

QYAI proxies requests to DeepSeek's API infrastructure. DeepSeek is a Chinese company and their data processing is subject to Chinese regulations. We recommend not sending personally identifiable or sensitive data through any third-party AI API.
