GPT-5.5 API Pricing Guide 2026: Cost, Cached Input & Long-Context Tiers


GPT-5.5 API pricing on EvoLink is $4.00 per 1M input tokens, $24.00 per 1M output tokens, and $0.40 per 1M cached input tokens. For sessions above 272K input tokens, long-context pricing applies at $8.00 input and $36.00 output per 1M tokens.

This guide focuses only on GPT-5.5 pricing. If you want the full GPT family comparison, use the broader GPT-5 API pricing comparison.

Pricing note: The GPT-5.5 numbers in this article use EvoLink listed pricing as of April 26, 2026. OpenAI public pricing should be checked separately before quoting any value as an OpenAI direct rate.

GPT-5.5 API Pricing Table

Billing item | EvoLink price | Notes
Standard input | $4.00 / 1M tokens | Prompt, system instructions, conversation history, and other input text
Output | $24.00 / 1M tokens | Visible answer tokens plus reasoning tokens when applicable
Cached input | $0.40 / 1M tokens | Reused prompt/context segments billed at a lower rate
Long-context input | $8.00 / 1M tokens | Applies when input exceeds 272K tokens
Long-context output | $36.00 / 1M tokens | Applies in the same long-context session
Context window | 1M tokens | Use long-context pricing rules when large prompts cross the threshold
Max output | 128K tokens | Output budget, not a guaranteed response length


The most important pricing rule is the 272K threshold. GPT-5.5 can support a 1M-token context window, but very large prompts can move the whole session into the long-context rate.



How GPT-5.5 Billing Works


GPT-5.5 billing has three main token categories: input, output, and cached input.



Input tokens are the tokens you send to the model. They include your user prompt, system message, prior conversation, retrieved documents, code snippets, and tool instructions.

Output tokens are the tokens generated by the model. For reasoning models, output can include reasoning tokens in addition to visible answer text, depending on the API response and model configuration.

Cached input tokens are repeated input segments that can be billed at a lower rate. Caching matters most when your product sends the same system prompt, policy block, tool description, documentation pack, or conversation scaffold again and again.

Cached Input Example


Suppose your application sends a stable 50K-token instruction and documentation block.


Request type | Calculation | Cost
First uncached request | 50K x $4.00 / 1M | $0.20
Later cached request | 50K x $0.40 / 1M | $0.02


That difference is why stable prompt design matters. Keep reusable instructions identical across requests and place long, stable context where it can be reused consistently.
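The cached-input math above can be sketched as a small helper. The rates are the EvoLink GPT-5.5 prices quoted in this article; the function name is illustrative, not part of any SDK.

```python
# EvoLink GPT-5.5 rates quoted in this article, expressed per token.
INPUT_RATE = 4.00 / 1_000_000    # $ per standard input token
CACHED_RATE = 0.40 / 1_000_000   # $ per cached input token

def prompt_block_cost(tokens: int, cached: bool) -> float:
    """Cost of sending one prompt block, at the cached or standard rate."""
    rate = CACHED_RATE if cached else INPUT_RATE
    return tokens * rate

first_request = prompt_block_cost(50_000, cached=False)  # $0.20
later_request = prompt_block_cost(50_000, cached=True)   # $0.02
```

The same 50K-token block costs ten times less once it is served from cache, which is why the instructions need to stay byte-identical across requests.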



Long-Context Pricing Above 272K Tokens


GPT-5.5 has a large context window, but long-context prompts need a separate cost plan. On EvoLink, when the input exceeds 272K tokens, the long-context rate is:

GPT-5.5 tier | Input | Output
Standard pricing | $4.00 / 1M | $24.00 / 1M
Long-context pricing | $8.00 / 1M | $36.00 / 1M


The long-context rate applies to the session, not only to the tokens above 272K. If you send 300K input tokens, all 300K input tokens are priced at the long-context input rate.



Long-Context Cost Example


Here is a 300K input / 20K output request:

Line item | Calculation | Cost
Input | 300K x $8.00 / 1M | $2.40
Output | 20K x $36.00 / 1M | $0.72
Total | $2.40 + $0.72 | $3.12


If the same request were below the long-context threshold, the equivalent standard-rate cost would be $1.68. That does not mean you should always chunk aggressively; it means you should decide whether one full-context request is worth the higher price.
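The 272K rule described above can be captured in one function: once input crosses the threshold, all input and output tokens in the request are billed at the long-context rate, not just the excess. This is a sketch using the EvoLink rates from this article; the function name is illustrative.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens

def gpt55_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a GPT-5.5 request cost, applying the long-context
    rate to the whole request when input exceeds 272K tokens."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = 8.00, 36.00   # long-context, $ per 1M tokens
    else:
        in_rate, out_rate = 4.00, 24.00   # standard, $ per 1M tokens
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

gpt55_request_cost(300_000, 20_000)  # 3.12 — the worked example above
```

Running the same numbers with 270K input instead of 300K drops the whole request back to the standard rate, which is why prompts hovering near the threshold deserve scrutiny.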



Example GPT-5.5 API Costs


Use these examples as planning estimates. Your real bill depends on prompt length, output length, cache hit rate, retries, and whether reasoning tokens are generated.

Scenario | Input | Output | Rate used | Estimated cost
Customer support answer | 2K | 500 | Standard | $0.020
Code review task | 20K | 5K | Standard | $0.200
Repository analysis | 300K | 20K | Long-context | $3.120


The cost math:




  • 2K input + 500 output = (2,000 x $4 / 1M) + (500 x $24 / 1M) = $0.020

  • 20K input + 5K output = (20,000 x $4 / 1M) + (5,000 x $24 / 1M) = $0.200

  • 300K input + 20K output = (300,000 x $8 / 1M) + (20,000 x $36 / 1M) = $3.120


GPT-5.5 vs GPT-5.4 Pricing


GPT-5.5 is the premium GPT route. GPT-5.4 is the lower-cost flagship route. This section is intentionally short because a full model comparison should live in a separate GPT-5.5 vs GPT-5.4 article.

Model | Input | Output | Cached input | Context
GPT-5.5 | $4.00 / 1M | $24.00 / 1M | $0.40 / 1M | 1M
GPT-5.4 | $2.00 / 1M | $12.00 / 1M | $0.20 / 1M | 1.05M


Use GPT-5.4 when you need long context at a lower price. Test GPT-5.5 when the task is reasoning-heavy, quality-sensitive, or expensive to retry.



When Is GPT-5.5 Worth the Cost?


GPT-5.5 is not the default choice for every request. It is best used where the task value justifies premium pricing.



Good Fits



  • Complex reasoning where wrong answers are expensive

  • Full-codebase analysis, architecture review, and multi-file debugging

  • Research synthesis across many documents

  • Agent workflows where planning quality reduces retries

  • High-value outputs that need fewer manual corrections


Poor Fits



  • Simple classification

  • Bulk summarization

  • Lightweight extraction

  • Low-margin content generation

  • Prototyping where a cheaper model is good enough


The practical rule is simple: use GPT-5.5 when better reasoning can reduce failures, retries, or human review. Use cheaper GPT routes when the task is routine.



How to Reduce GPT-5.5 API Cost


1. Cache Stable Prompts


Keep reusable system prompts, policies, tool descriptions, and documentation blocks stable. Cached input is $0.40 / 1M tokens instead of $4.00 / 1M.
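Prompt caching generally keys on an identical leading segment of the request, so a common pattern is to place stable content first and per-request content last. This is a hypothetical sketch of that ordering; exact caching behavior depends on the provider, and the constant and function names here are placeholders.

```python
# Hypothetical: keep the long, stable blocks byte-identical across
# requests and put them before any per-request content, so a prefix
# cache can reuse them.
STABLE_SYSTEM_PROMPT = "You are a support assistant. Follow policy v3."
STABLE_DOCS_BLOCK = "<long documentation pack, identical on every request>"

def build_messages(user_question: str) -> list[dict]:
    return [
        {"role": "system", "content": STABLE_SYSTEM_PROMPT},  # stable, cacheable
        {"role": "system", "content": STABLE_DOCS_BLOCK},     # stable, cacheable
        {"role": "user", "content": user_question},           # varies per request
    ]
```

Even small edits to the stable blocks (a timestamp, a reordered tool list) can break the match, so keep anything dynamic in the trailing user message.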



2. Route Simple Work Elsewhere


Do not send every request to GPT-5.5. Use lower-cost GPT routes for simple tasks, and reserve GPT-5.5 for escalation or high-value reasoning.






def select_model(task_complexity: str) -> str:
    if task_complexity == "simple":
        return "gpt-5.1"
    if task_complexity == "standard":
        return "gpt-5.2"
    if task_complexity == "long_context":
        return "gpt-5.4"
    return "gpt-5.5"




3. Avoid Unnecessary Long-Context Requests


If your prompt is near 272K input tokens, check whether retrieval, summarization, or chunking can reduce the request without hurting answer quality.
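A cheap pre-flight check can flag prompts drifting toward the threshold. The sketch below assumes the rough heuristic of about 4 characters per token for English text; for real billing decisions, count tokens with an actual tokenizer, since the heuristic can be well off for code or non-English content.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens

def estimated_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token (English-text heuristic;
    assumption, not a tokenizer)."""
    return len(text) // 4

def should_trim(prompt: str, margin: float = 0.9) -> bool:
    """True when the prompt is close enough to the 272K threshold that
    retrieval, summarization, or chunking is worth considering first."""
    return estimated_tokens(prompt) >= LONG_CONTEXT_THRESHOLD * margin
```

The `margin` leaves headroom so that tokenizer error or appended conversation history does not silently push a request into long-context pricing.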



4. Track Cost Per Successful Task


Cost per token is only one metric. Track retries, validation failures, human review time, latency, and final success rate. A more expensive model can be cheaper if it avoids repeated failed attempts, but that has to be measured in your own workflow.
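The metric itself is simple division, but it is worth writing down because it is what makes a pricier model comparable to a cheaper one. The figures in the example are illustrative, reusing the request costs computed earlier in this article.

```python
def cost_per_successful_task(total_spend: float, successes: int) -> float:
    """Spend divided by successful outcomes; retries and failures raise
    this number even when the per-token rate looks cheap."""
    if successes == 0:
        return float("inf")
    return total_spend / successes

# Illustrative: a $1.68 request that needs three attempts to succeed once
# ends up costlier per success than a single $3.12 request that lands.
cheaper_route = cost_per_successful_task(1.68 * 3, successes=1)  # $5.04
premium_route = cost_per_successful_task(3.12, successes=1)      # $3.12
```

Whether the premium route actually wins depends on measured success rates in your workflow, not on the rate card.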



5. Use GPT-5.5 as an Escalation Route


One common pattern is to start with GPT-5.2 or GPT-5.4 and escalate to GPT-5.5 only when validation fails, confidence is low, or the user requests a deeper pass.
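That escalation pattern can be sketched as a small wrapper. `call_model` and `validate` are placeholders for your own client and validation logic, not real SDK functions.

```python
from typing import Callable

def answer_with_escalation(
    prompt: str,
    call_model: Callable[[str, str], str],  # (model, prompt) -> answer
    validate: Callable[[str], bool],        # answer -> passed?
) -> str:
    """Try the cheaper route first; escalate to GPT-5.5 only when the
    draft fails validation."""
    draft = call_model("gpt-5.4", prompt)
    if validate(draft):
        return draft
    return call_model("gpt-5.5", prompt)  # deeper, pricier pass
```

With this shape, GPT-5.5 tokens are spent only on the fraction of traffic the cheaper model cannot handle, which is usually where the cost-per-successful-task argument favors it.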



FAQ


How much does GPT-5.5 API cost?


GPT-5.5 costs $4.00 per 1M input tokens, $24.00 per 1M output tokens, and $0.40 per 1M cached input tokens on EvoLink. Long-context pricing above 272K input tokens is $8.00 input and $36.00 output per 1M tokens.



What is GPT-5.5 cached input pricing?


GPT-5.5 cached input pricing on EvoLink is $0.40 per 1M tokens. Cached input is useful when your application repeats stable instructions, documentation, tool definitions, or conversation scaffolds.



What happens above 272K input tokens?


When input exceeds 272K tokens, GPT-5.5 uses long-context pricing on EvoLink: $8.00 per 1M input tokens and $36.00 per 1M output tokens. The long-context rate applies to the session.



Is GPT-5.5 more expensive than GPT-5.4?


Yes. GPT-5.5 is priced higher than GPT-5.4. GPT-5.5 is $4.00 / $24.00 per 1M input/output tokens on EvoLink, while GPT-5.4 is $2.00 / $12.00.



Is GPT-5.5 worth it for coding?


GPT-5.5 is worth testing for complex coding tasks such as multi-file debugging, repository analysis, architecture review, and agentic coding workflows. For simple code completion or small edits, a lower-cost GPT route may be more efficient.



Can I use GPT-5.5 with an OpenAI-compatible API?


Yes. EvoLink provides an OpenAI-compatible integration path, so most teams can migrate by changing the base URL, API key, and model value.






from openai import OpenAI

client = OpenAI(
    api_key="your-evolink-api-key",
    base_url="https://api.evolink.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "user", "content": "Summarize the main risks in this codebase."}
    ],
)




Where can I compare GPT-5.5 with other GPT models?


Use the GPT model family page for the broader model lineup, or read the GPT-5 API pricing comparison for GPT-5.5, GPT-5.4, GPT-5.2, and GPT-5.1 pricing in one table.

Start With GPT-5.5 Pricing, Then Test on Your Own Tasks


GPT-5.5 is a premium route, so the right question is not only "How much does it cost per token?" The better question is "What does it cost per successful task?"


Start with a small test set, measure retries and review time, compare GPT-5.5 against GPT-5.4 or GPT-5.2, and reserve GPT-5.5 for the workflows where it changes the outcome.


