# How to switch from OpenAI web search to Parallel Search API


OpenAI charges **$25 per 1,000 web search calls** for non-reasoning models like GPT-4o and GPT-4.1, and $10 per 1,000 calls for reasoning models like GPT-5. Parallel's Search API costs **$5 per 1,000 requests**, with compressed excerpts included in every result. That's an 80% cost reduction on the standard tier and 50% on the reasoning tier, with equal or higher accuracy across four major benchmarks.

This guide covers why the switch makes sense, what changes in your codebase, and how to migrate in Python and TypeScript.


## What OpenAI charges for web search

OpenAI's web search pricing splits into two tiers based on the model you're calling:

| Model tier | Cost per 1,000 search calls |
| --- | --- |
| Non-reasoning (GPT-4o, GPT-4.1) | $25 |
| Reasoning (GPT-5) | $10 |

Two details make the effective cost higher than it looks. First, the model can trigger multiple search calls per API request, so a single /v1/responses call might generate two or three billable searches. Second, on the reasoning tier you also pay for the search context tokens (at the model's standard input rate) on top of the per-call fee. The bill adds up fast in production.
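To see how these two details compound, here is a back-of-the-envelope sketch. The two-searches-per-request multiplier and the 4,000 context tokens per search are illustrative assumptions, not OpenAI-published figures; the $1.25/1M input rate is GPT-5's list price:

```python
def effective_request_cost(searches_per_request, context_tokens_per_search,
                           per_call_fee, input_rate_per_million):
    """Estimate the cost of one API request that triggers web search."""
    search_fees = searches_per_request * per_call_fee
    token_fees = (searches_per_request * context_tokens_per_search
                  / 1_000_000 * input_rate_per_million)
    return search_fees + token_fees

# Reasoning tier: $10 / 1,000 calls = $0.01 per call, plus input-rate token fees.
# 2 searches x $0.01 + 8,000 tokens x $1.25/1M = $0.03 per request
cost = effective_request_cost(2, 4_000, 0.01, 1.25)
print(f"${cost:.3f} per request")
```

Three cents per request instead of one: the per-call fee is only a third of the effective cost in this scenario.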

OpenAI's marketing pricing page shows only the $25 figure. The $10 reasoning-model tier appears on the developer pricing page. If you've seen conflicting numbers, that's why.


## What Parallel charges

Parallel's Search API costs **$5 per 1,000 requests**. Each request returns up to 10 results with compressed, query-relevant excerpts included at no extra charge. Additional results beyond 10 cost $1 per 1,000.

No token fees on top. No per-model pricing tiers. One price.
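That makes per-request cost a function of result count alone. A minimal sketch, assuming the $1 per 1,000 overage is billed per additional result:

```python
def parallel_request_cost(num_results):
    """Cost of one Search API request under the published pricing."""
    base = 5 / 1_000                              # $5 per 1,000 requests
    extra = max(0, num_results - 10) * (1 / 1_000)  # $1 per 1,000 extra results
    return base + extra

print(parallel_request_cost(10))  # first 10 results included: $0.005
print(parallel_request_cost(25))  # 15 extra results: $0.005 + $0.015 = $0.02
```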

| | OpenAI (non-reasoning) | OpenAI (reasoning) | Parallel |
| --- | --- | --- | --- |
| Per 1,000 search calls | $25 | $10 | $5 |
| Excerpts/content included | Included | Token fees apply | Included |
| Rate limit | Varies | Varies | 600 req/min |

## What 10,000 searches actually cost

Here's what a realistic monthly workload costs across three configurations: an AI agent making 10,000 web search calls per month. Each query retrieves roughly 4,000 tokens of search context and generates a 500-token response. Token rates: GPT-4.1 at $2/1M input and $8/1M output; GPT-5 at $1.25/1M input and $10/1M output.

| | OpenAI web search (GPT-4.1) | OpenAI web search (GPT-5) | Parallel Search API + GPT-4.1 |
| --- | --- | --- | --- |
| Search calls | 10,000 × $0.025 = $250 | 10,000 × $0.01 = $100 | 10,000 × $0.005 = $50 |
| Search context tokens | Free (included in $25 tier) | 40M tokens × $1.25/1M = $50 | 25M tokens × $2/1M = $50 |
| Output tokens | 5M tokens × $8/1M = $40 | 5M tokens × $10/1M = $50 | 5M tokens × $8/1M = $40 |
| Monthly total | $290 | $200 | $140 |

A few things to note in the math.

OpenAI's GA web search tool costs **$25 per 1,000 calls** with GPT-4o and GPT-4.1, with the search content tokens (web results injected into the model's context) included in that fee. With GPT-5 and other reasoning models, the per-call rate drops to **$10 per 1,000**, but the search content tokens are billed at the model's input rate on top. Either way, you don't control how much content the model retrieves per search.

Parallel's Search API costs **$5 per 1,000 calls** regardless of which LLM you pair it with. Excerpts are included. The search context column is lower (25M vs. 40M tokens) because Parallel returns compressed, query-relevant excerpts rather than raw page content. You set the excerpt length with max_chars_per_result, so you control exactly how many tokens reach your LLM.

At minimum, Parallel saves you **30% over the GPT-5 configuration** and **52% over the GPT-4.1 configuration**. The savings scale linearly: at 100,000 searches per month, the gap between OpenAI + GPT-5 ($2,000) and Parallel + GPT-4.1 ($1,400) is $600, and the gap against OpenAI + GPT-4.1 ($2,900) is $1,500.
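The Parallel column is easy to sanity-check; a minimal sketch of that arithmetic:

```python
# Monthly cost of the Parallel + GPT-4.1 configuration from the table:
# 10,000 searches at $5/1K, 25M context tokens at $2/1M in,
# 5M output tokens at $8/1M out.
searches = 10_000
search_cost = searches * 5 / 1_000             # $50
context_cost = 25_000_000 / 1_000_000 * 2.0    # $50
output_cost = 5_000_000 / 1_000_000 * 8.0      # $40
total = search_cost + context_cost + output_cost
print(f"${total:.0f} per month")  # $140
```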


## Why the cost gap reflects a design difference

OpenAI treats web search as a tool bolted onto its language models. You send a prompt, the model decides whether to search, and the search results get injected into the model's context window as raw tokens you pay for. You don't control how many searches the model runs, what context it retrieves, or how many tokens it consumes.

Parallel built its Search API as standalone infrastructure. You call the endpoint, define your search objective in natural language, and get back structured JSON with ranked URLs and dense excerpts. You control what goes into your LLM's context window, how many results you retrieve, and how many characters each excerpt contains.

The practical difference: Parallel returns token-efficient excerpts compressed around your query's intent, not raw page content. Every token in the response earns its place in your context window.


## Parallel matches or beats OpenAI on every major benchmark

We tested Parallel's Search API against OpenAI GPT-5 (with web search enabled) across four public benchmarks, using 100-question samples from each. GPT-5 served as the search-calling agent, and GPT-4.1 judged the answers.

| Benchmark | Parallel | OpenAI GPT-5 | Parallel cost (per 1K) | OpenAI cost (per 1K) |
| --- | --- | --- | --- | --- |
| SimpleQA (factual Q&A) | 98% | 98% | $17 | $37 |
| FRAMES (multi-hop reasoning) | 92% | 90% | $42 | $68 |
| BrowseComp (complex web browsing) | 58% | 53% | $156 | $253 |
| HLE (expert-level questions) | 47% | 45% | $82 | $143 |

_Note: Despite our best efforts, these figures may not always be up to date. For the latest benchmarks, visit our [benchmarks hub](/benchmarks)._

Parallel matches or beats OpenAI on accuracy across all four benchmarks. The cost column includes both the search API fees and LLM inference costs. On FRAMES, Parallel leads by 2 points at 38% lower total cost. On BrowseComp, the hardest benchmark in this set, Parallel leads by 5 points at 38% less.
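The percentage gaps quoted here fall straight out of the cost columns in the table:

```python
# (Parallel total cost per 1K queries, OpenAI total cost per 1K queries),
# taken from the benchmark table above.
costs = {
    "SimpleQA": (17, 37),
    "FRAMES": (42, 68),
    "BrowseComp": (156, 253),
    "HLE": (82, 143),
}

for name, (parallel_cost, openai_cost) in costs.items():
    saving = 1 - parallel_cost / openai_cost
    print(f"{name}: {saving:.0%} lower total cost")
```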

These benchmarks are self-reported. We've published the methodology, competitor configurations, and judge model for transparency, and we use standardized public benchmark question sets.

## Migrate from OpenAI's web search to Parallel's Search API

Install the Python SDK:

```shell
pip install parallel-web
```

Get your API key at [platform.parallel.ai](https://platform.parallel.ai/). The free tier includes credits (up to 16,000 searches) with no credit card required.

A typical OpenAI web search call looks like this:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search"}],
    input="What are the latest SEC rulings on digital asset custody?",
)

print(response.output_text)
```

With this approach, you don't control how many searches the model runs, what content it retrieves, or how many tokens it consumes. The model decides all of that.

The Parallel replacement:

```python
import os

from parallel import Parallel

client = Parallel(api_key=os.environ["PARALLEL_API_KEY"])

results = client.beta.search(
    objective="Find the latest SEC rulings on digital asset custody, "
              "including rule numbers and effective dates.",
    search_queries=["SEC digital asset custody rules 2026"],
    mode="fast",
    max_results=5,
    excerpts={"max_chars_per_result": 3000},
)

for result in results.results:
    print(result.title)
    print(result.url)
    for excerpt in result.excerpts:
        print(excerpt)
```

You now control the search objective, keyword queries, result count, and excerpt length. The response comes back as structured JSON with ranked URLs, page titles, publish dates, and compressed excerpts ready to inject into any LLM's context window.
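Because you set the excerpt length, you can budget context tokens before the response ever reaches your LLM. A rough sketch using the common ~4-characters-per-token heuristic (an approximation, not an exact tokenizer):

```python
def estimated_context_tokens(max_results, max_chars_per_result,
                             chars_per_token=4):
    # Upper bound: assumes every result returns a full-length excerpt.
    return max_results * max_chars_per_result // chars_per_token

# The search above: 5 results x 3,000 chars is at most ~3,750 tokens
print(estimated_context_tokens(5, 3_000))  # 3750
```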

### Feeding Parallel results into your LLM
```python
import os

from openai import OpenAI as OpenAIClient
from parallel import Parallel

parallel = Parallel(api_key=os.environ["PARALLEL_API_KEY"])
openai = OpenAIClient()

# Step 1: Search with Parallel
search = parallel.beta.search(
    objective="Recent Federal Reserve statements on interest rate policy",
    search_queries=["Fed interest rate decision March 2026"],
    mode="one-shot",
    max_results=5,
    excerpts={"max_chars_per_result": 4000},
)

# Step 2: Build one context string from the ranked results and excerpts
context = "\n\n".join(
    f"Source: {r.title} ({r.url})\n" + "\n".join(r.excerpts)
    for r in search.results
)

# Step 3: Pass the context to your LLM for reasoning
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "Answer using only the provided sources. "
                       "Cite URLs for each claim.",
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\n"
                       "Question: What did the Fed announce about "
                       "interest rates this month?",
        },
    ],
)

print(response.choices[0].message.content)
```

Parallel's Search API is model-agnostic: you call it for search, then pass the results into whatever LLM you use for reasoning, as the code above does.

This pattern gives you the best of both: Parallel's search accuracy and excerpt quality, with your choice of LLM for reasoning. You also avoid paying OpenAI's $25 per 1,000 web search surcharge, since the LLM call contains no search tool.

## Get started

  1. Sign up at [platform.parallel.ai](https://platform.parallel.ai/) with a work email. You'll get free credits, no credit card required.
  2. Install the SDK: pip install parallel-web or npm install parallel-web.
  3. Set your API key: export PARALLEL_API_KEY="your_key".
  4. Run your first search or swap your Chat API base URL.

Full documentation is at [docs.parallel.ai](https://docs.parallel.ai/). If you have questions about migrating a production workload, reach out at support@parallel.ai.


By Parallel

March 27, 2026
