
# Introducing Parallel Search: the highest accuracy web search API engineered for AI
Web search, built from the ground up for AI

A second user has arrived on the web: AI. And it needs fundamentally different infrastructure than humans do.
The Parallel Search API, built on our proprietary web index, is now generally available. It's the only web search tool actually designed from the ground up for AI agents: engineered to deliver the most relevant, token-efficient web data at the lowest cost. The result: more accurate answers, fewer round-trips, and lower costs for any agent workflow.
## **Agents are only as good as their context**
Agents are only as effective as the context they have. When you build an agent, your job is simple: give it the tools to reach the next-best decision, complete the task, or generate the best answer. That's it.
The key factor that determines success is the **accuracy of the data** you put in the context window, and how efficiently you get it there.
For search, this means giving your agent access to relevant, fresh, and information-dense context. Without accurate data from your search tool, everything else downstream fails and issues compound.
## **Human search and AI search solve different problems**
Traditional search engines were built for humans. They rank URLs, assuming someone will click through and navigate a page with their eyes and brain. The search engine's job ends at the link. The system optimizes for keywords, click-through rates, and page layouts designed for browsing - done in milliseconds and as cheaply as possible.
The first wave of web search APIs used for AI search made this human search paradigm programmatically accessible, but failed to solve the underlying problem: how do you design search for an AI agent's needs?
AI search has to solve a different problem: **what tokens do you put in an agent's context window to help it complete the task? You're not ranking URLs for humans to click—you're ranking context and tokens for models to reason over.**
This requires a fundamentally different search architecture:
- **Semantic objectives** that capture intent beyond keyword matching, so agents can specify what they need to accomplish rather than guessing at search terms
- **Token-relevance ranking** to find webpages most directly relevant to the objective, not pages optimized for human engagement metrics
- **Information-dense excerpts** compressed and prioritized for reasoning quality, so LLMs have the highest-signal tokens in their context window
- **Single-call resolution** for complex queries that normally require multiple search hops
With this search architecture built from the ground up for AIs, agents get access to the most information-dense web tokens in their context. The result is higher accuracy, lower cost, and lower latency.
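To make the request shape concrete, here is a minimal sketch of a search call built around a semantic objective rather than a bare keyword query. The endpoint path, header, and field names below are illustrative assumptions; the Search API documentation linked at the end of this post is the authoritative reference.

```python
# Minimal sketch: a search request expressed as a semantic objective.
# Endpoint path, header, and field names are illustrative assumptions;
# consult the Search API documentation for the exact schema.
import os
import requests

response = requests.post(
    "https://api.parallel.ai/v1beta/search",  # assumed endpoint path
    headers={"x-api-key": os.environ["PARALLEL_API_KEY"]},
    json={
        # Describe what the agent needs to accomplish, not just keywords.
        "objective": "Find the GA launch date and per-query pricing of the Parallel Search API.",
        # Optional keyword-style queries can supplement the objective.
        "search_queries": ["Parallel Search API pricing"],
        "max_results": 5,              # assumed parameter name
        "max_chars_per_result": 1500,  # assumed parameter name
    },
    timeout=60,
)
response.raise_for_status()

# Results are expected to carry compressed, LLM-ready excerpts rather than
# bare links; exact response fields may differ from this sketch.
for result in response.json().get("results", []):
    print(result.get("url"))
    for excerpt in result.get("excerpts", []):
        print("   ", excerpt[:120])
```

The agent drops these excerpts directly into its context window instead of fetching and parsing full pages itself.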
## **On every benchmark that matters for real-world agent use cases, Parallel wins on accuracy**
While most existing search systems are optimized for straightforward question answering, we believe the demand for more complex, multifaceted search will only continue to grow. Users and agents alike will increasingly seek answers that require synthesizing information across multiple sources, reasoning over complex objectives, and navigating harder-to-access content on the web.
To reflect this shift, we evaluated the performance of Parallel’s Search API across a range of benchmarks, from simple single-hop queries (e.g., SimpleQA) to challenging multi-hop tasks (e.g., BrowseComp).
### On simple searches, Parallel is the lowest cost with parity in accuracy
For straightforward factual queries like those in SimpleQA, Parallel matches the best alternatives on accuracy at 98% while delivering the lowest per-query cost in the market (roughly 50-80% lower than alternatives).
### SimpleQA Search
| Provider | Cost (CPM) | Accuracy (%) |
| --- | --- | --- |
| Parallel | 17 | 98 |
| Exa | 57 | 87 |
| Tavily | 110 | 93 |
| Perplexity | 52 | 92 |
| OpenAI GPT-5 | 37 | 98 |
### About this benchmark
This [benchmark](https://openai.com/index/introducing-simpleqa/), created by OpenAI, contains 4,326 questions focused on short, fact-seeking queries across a variety of domains. Results are reported on a sample of 100 questions from this benchmark.
### Methodology
- **Evaluation**: Results are based on tests run using official Search MCP servers provided as an MCP tool to OpenAI's GPT-5 model using the Responses API. In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1).
- **Cost Calculation**: Cost reflects the average cost per query across all questions run. This cost includes both the search API call and LLM token cost.
- **Testing Dates**: Testing was conducted from November 3rd to November 5th.
### For complex searches, Parallel is the highest accuracy at the lowest cost
The performance advantage widens dramatically on challenging queries - those that span multiple topics, require deep comprehension of hard-to-crawl web content, or demand synthesis across scattered sources with multiple reasoning steps. On benchmarks specifically designed to test multi-hop reasoning (Batched SimpleQA, WebWalker, BrowseComp, FRAMES, HLE), Parallel not only achieves higher accuracy but also resolves queries through fundamentally more efficient reasoning paths.
Traditional search APIs can’t handle this complexity in a single pass. They force agents to perform a series of sequential searches - compounding latency, inflating context windows, and increasing token costs with every iteration. Parallel, by contrast, resolves most of these complex queries in a single call, reducing both end-to-end latency and total cost while maintaining high accuracy.
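As an illustration of the difference in control flow (not part of the benchmark harness), the sketch below contrasts the sequential-search loop a traditional API forces on an agent with a single objective-level call. It reuses the assumed endpoint and field names from the earlier sketch.

```python
# Illustrative contrast only, reusing the assumed endpoint and fields from the
# earlier sketch. A multi-hop question is expressed as one objective instead of
# a chain of keyword searches issued and reasoned over one at a time.
import os
import requests

objective = (
    "Identify which university the inventor of the World Wide Web attended, "
    "then find the year that university was founded."
)

# Traditional flow (pseudocode): the agent pays a round-trip per hop.
#   hop 1: search("inventor of the World Wide Web")   -> reason -> next query
#   hop 2: search("<inventor> university")            -> reason -> next query
#   hop 3: search("<university> founding year")       -> reason -> answer

# Single-call flow: one request, one compact set of excerpts to reason over.
resp = requests.post(
    "https://api.parallel.ai/v1beta/search",  # assumed endpoint path
    headers={"x-api-key": os.environ["PARALLEL_API_KEY"]},
    json={"objective": objective, "max_results": 10},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```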
### HLE Search
| Provider | Cost (CPM) | Accuracy (%) |
| --- | --- | --- |
| Parallel | 82 | 47 |
| Exa | 138 | 24 |
| Tavily | 190 | 21 |
| Perplexity | 126 | 30 |
| OpenAI GPT-5 | 143 | 45 |
### About this benchmark
This [benchmark](https://lastexam.ai/) consists of 2,500 questions developed by subject-matter experts across dozens of subjects (e.g. math, humanities, natural sciences). Each question has a known solution that is unambiguous and easily verifiable, but requires sophisticated web retrieval and reasoning. Results are reported on a sample of 100 questions from this benchmark.
### Methodology
- **Evaluation**: Results are based on tests run using official Search MCP servers provided as an MCP tool to OpenAI's GPT-5 model using the Responses API. In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1).
- **Cost Calculation**: Cost reflects the average cost per query across all questions run. This cost includes both the search API call and LLM token cost.
- **Testing Dates**: Testing was conducted from November 3rd to November 5th.
### BrowseComp Search
| Provider | Cost (CPM) | Accuracy (%) |
| --- | --- | --- |
| Parallel | 156 | 58 |
| Exa | 233 | 29 |
| Tavily | 314 | 23 |
| Perplexity | 256 | 22 |
| OpenAI GPT-5 | 253 | 53 |
### About this benchmark
This [benchmark](https://openai.com/index/browsecomp/), created by OpenAI, contains 1,266 questions requiring multi-hop reasoning, creative search formulation, and synthesis of contextual clues across time periods. Results are reported on a sample of 100 questions from this benchmark.
### Methodology
- **Evaluation**: Results are based on tests run using official Search MCP servers provided as an MCP tool to OpenAI's GPT-5 model using the Responses API. In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1).
- **Cost Calculation**: Cost reflects the average cost per query across all questions run. This cost includes both the search API call and LLM token cost.
- **Testing Dates**: Testing was conducted from November 3rd to November 5th.
### WebWalker-Search
| Provider | Cost (CPM) | Accuracy (%) |
| --- | --- | --- |
| Parallel | 42 | 81 |
| Exa | 107 | 48 |
| Tavily | 156 | 79 |
| Perplexity | 91 | 67 |
| OpenAI GPT-5 | 88 | 73 |
### About this benchmark
This [benchmark](https://arxiv.org/abs/2501.07572) is designed to assess the ability of LLMs to perform web traversal. Successfully answering its questions requires crawling and extracting content from website subpages. Results are reported on a sample of 100 questions from this benchmark.
### Methodology
- **Evaluation**: Results are based on tests run using official Search MCP servers provided as an MCP tool to OpenAI's GPT-5 model using the Responses API. In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1).
- **Cost Calculation**: Cost reflects the average cost per query across all questions run. This cost includes both the search API call and LLM token cost.
- **Testing Dates**: Testing was conducted from November 3rd to November 5th.
### FRAMES-Search
| Provider | Cost (CPM) | Accuracy (%) |
| --- | --- | --- |
| Parallel | 42 | 92 |
| Exa | 81 | 81 |
| Tavily | 122 | 87 |
| Perplexity | 95 | 83 |
| OpenAI GPT-5 | 68 | 90 |
### About this benchmark
This [benchmark](https://huggingface.co/datasets/google/frames-benchmark) contains 824 challenging multi-hop questions designed to test factuality, retrieval accuracy, and reasoning. Results are reported on a sample of 100 questions from this benchmark.
### Methodology
- **Evaluation**: Results are based on tests run using official Search MCP servers provided as an MCP tool to OpenAI's GPT-5 model using the Responses API. In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1).
- **Cost Calculation**: Cost reflects the average cost per query across all questions run. This cost includes both the search API call and LLM token cost.
- **Testing Dates**: Testing was conducted from November 3rd to November 5th.
### Batched SimpleQA - Search
| Provider | Cost (CPM) | Accuracy (%) |
| --- | --- | --- |
| Parallel | 50 | 90 |
| Exa | 119 | 71 |
| Tavily | 227 | 59 |
| Perplexity | 100 | 74 |
| OpenAI GPT-5 | 91 | 88 |
### About this benchmark
This benchmark was created by batching 3 independent questions from the original [SimpleQA dataset](https://openai.com/index/introducing-simpleqa/) to create 100 composite, more complex questions.
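For intuition, a batched variant like this can be assembled by grouping source questions into composite prompts. The sketch below is a minimal illustration under that assumption, not the exact composition script used for this benchmark.

```python
# Minimal illustration: group independent SimpleQA questions into composites of 3.
# This is an assumption about the general construction, not the exact script used.
def batch_questions(questions: list[str], group_size: int = 3) -> list[str]:
    composites = []
    for i in range(0, len(questions) - group_size + 1, group_size):
        group = questions[i:i + group_size]
        composites.append(
            "Answer each of the following independent questions:\n"
            + "\n".join(f"{n}. {q}" for n, q in enumerate(group, start=1))
        )
    return composites

# 300 source questions -> 100 composite questions of 3 each.
```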
### Methodology
- **Evaluation**: Results are based on tests run using official Search MCP servers provided as an MCP tool to OpenAI's GPT-5 model using the Responses API. In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1).
- **Cost Calculation**: Cost reflects the average cost per query across all questions run. This cost includes both the search API call and LLM token cost.
- **Testing Dates**: Testing was conducted from November 3rd to November 5th.
Across these multi-hop benchmarks, agents using Parallel achieve state-of-the-art accuracy at roughly half the cost of workflows built on traditional search APIs.
## **These results are possible because we've built a proprietary web index and a vertically-integrated search stack from the ground up, designed for AIs**
We are able to achieve state-of-the-art results because we have spent the last two years building the infrastructure to control the full search stack, enabling optimization at every layer and feedback loops that continuously improve performance.
**Index:** One of the fastest-growing, freshest, deepest, and largest web indexes, with billions of pages crawled and indexed specifically for AI consumption and hundreds of millions of new pages added daily.
**Crawl:** Infrastructure to access the most challenging corners of the web at scale: multi-modal, lengthy PDFs, JavaScript-heavy sites, and CAPTCHA-protected content.
**Ranking:** We rank and compress webpage results based on the density of tokens that are suitable for LLM reasoning, not pages optimized for human browsing. Our proprietary models and algorithms score based on token relevance, context window efficiency, and cross-source validation, rather than click-through probability or engagement.
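To make the idea concrete, here is a toy illustration of ranking excerpts by relevance per token within a fixed context budget. This is not Parallel's proprietary ranking model; it only demonstrates the shift from engagement signals to token-level efficiency.

```python
# Toy illustration only: rank candidate excerpts by estimated relevance per token,
# rather than by human engagement signals. This is NOT Parallel's proprietary model;
# it just makes the "token relevance over click-through" idea concrete.
from dataclasses import dataclass

@dataclass
class Excerpt:
    text: str
    relevance: float   # e.g., semantic similarity to the objective, in [0, 1]
    token_count: int   # tokens this excerpt would occupy in the context window

def rank_for_context(excerpts: list[Excerpt], budget_tokens: int) -> list[Excerpt]:
    """Greedily pack the highest relevance-per-token excerpts into a token budget."""
    ranked = sorted(excerpts, key=lambda e: e.relevance / e.token_count, reverse=True)
    selected, used = [], 0
    for e in ranked:
        if used + e.token_count <= budget_tokens:
            selected.append(e)
            used += e.token_count
    return selected

# Example: a short, dense excerpt wins over a long, marginally relevant page dump.
candidates = [
    Excerpt("Dense spec table with the exact figure requested.", relevance=0.9, token_count=120),
    Excerpt("Long landing page full of navigation and marketing copy.", relevance=0.6, token_count=900),
]
print(rank_for_context(candidates, budget_tokens=500))
```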
## **Leading AI teams build on our Search API - and so do we**
Today, some of the most sophisticated AI builders power their search with Parallel, including Clay, Amp by Sourcegraph, Owner, Starbridge, Actively, Genpact, and leading Fortune 100 companies. These companies have tested alternatives and understand that the decisions their agents make, whether it's Sourcegraph Amp's coding agent solving bugs, _Claygent_ powering every GTM decision, Starbridge discovering government RFPs, or a Fortune 100 insurer underwriting claims better than human underwriters, all depend on the quality of their web data.
We use our own Search API as foundational infrastructure to power our Web Agents. For example, the Parallel Task API, our higher-level research API that serves complex, multi-step enrichment and deep research queries, is built on the Search API. Every Task API query that runs in production depends on the Search API performing flawlessly underneath.
This architectural decision forces us to hold ourselves to the highest standard. Every performance improvement, latency optimization, and quality enhancement in the Search API directly impacts our own production systems serving millions of queries daily. We feel every token of inefficiency and every accuracy gap immediately in our own workflows.
The result is infrastructure that's been battle-tested and continuously refined under the demands of real-world agent workloads.
## **Give your agents access to Parallel Search**
The accuracy of the data in your agent's context window is the single most important factor in whether it can complete a task effectively. Give your agents the most accurate and compressed context from the web with the Parallel Search API.
Give your agents access to better search. Get started in our [Developer Platform](https://platform.parallel.ai/play/search) or dive into the [documentation](https://docs.parallel.ai/search/search-quickstart).
## **Notes on Methodology**
**Benchmark Details**: Various search providers were evaluated against a wide set of benchmarks, ranging from simple benchmarks (SimpleQA) to more complex benchmarks (HLE, BrowseComp, Batched SimpleQA, WebWalker, and FRAMES).
**Evaluation**: Results are based on tests run using official MCP servers provided as an MCP tool to OpenAI's GPT-5 model using the Responses API. In all cases, the MCP tools were limited to only the appropriate web search tool. Answers were evaluated using an LLM as a judge (GPT-4.1). A minimal sketch of this setup is shown below.
**Cost Calculation**: Cost reflects the average cost per query across all questions run. This cost includes both the search API call and LLM token cost.
**Testing Dates**: Testing was conducted from November 3rd to November 5th.
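The sketch below outlines that setup: GPT-5 answering through a single web-search MCP tool via the Responses API, with GPT-4.1 as the judge. The MCP server URL, tool restriction, and judge prompt are placeholders, and the benchmark harness itself is not shown.

```python
# Sketch of the evaluation setup described above. The MCP server URL, tool name,
# and judge prompt are placeholders; this is not the actual benchmark harness.
from openai import OpenAI

client = OpenAI()

def answer_with_search(question: str, mcp_server_url: str) -> str:
    """Ask GPT-5 to answer a benchmark question using only a web-search MCP tool."""
    response = client.responses.create(
        model="gpt-5",
        tools=[{
            "type": "mcp",
            "server_label": "web_search",
            "server_url": mcp_server_url,  # the provider's official Search MCP server
            "require_approval": "never",
            # "allowed_tools": ["..."],    # restrict to the web search tool; names vary by provider
        }],
        input=question,
    )
    return response.output_text

def judge(question: str, expected: str, predicted: str) -> bool:
    """LLM-as-judge with GPT-4.1: returns True if the predicted answer is graded correct."""
    verdict = client.responses.create(
        model="gpt-4.1",
        input=(
            f"Question: {question}\nExpected answer: {expected}\n"
            f"Predicted answer: {predicted}\n"
            "Reply with exactly CORRECT or INCORRECT."
        ),
    )
    return verdict.output_text.strip().upper().startswith("CORRECT")
```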
By Parallel
November 6, 2025






























