# The highest accuracy web search for your AI
## A web API purpose-built for AIs
Powering millions of daily requests
### Highest accuracy
Production-ready outputs built on cross-referenced facts, with minimal hallucination.
### Predictable costs
Flex your compute budget based on task complexity. Pay per query, not per token.
### Evidence-based outputs
Verifiability and provenance for every atomic output.
### Trusted
SOC 2 Type II certified, and trusted by leading startups and enterprises.
## Highest accuracy at every price point
State of the art across several benchmarks
### BrowseComp
| Category | Accuracy (%) |
| ---------- | ------------ |
| Ultra8x | 58 |
| Ultra | 45 |
| Pro | 34 |
| GPT-5 | 41 |
| Exa | 14 |
| Anthropic | 7 |
| Perplexity | 6 |
### About the benchmark
This [benchmark](https://openai.com/index/browsecomp/), created by OpenAI, contains 1,266 questions requiring multi-hop reasoning, creative search formulation, and synthesis of contextual clues across time periods. Results are reported on a random sample of 100 questions from this benchmark. Learn more in our [blog](https://parallel.ai/blog/introducing-parallel).
### Methodology
- Dates: All measurements were made between 08/08/2025 and 08/13/2025.
- Configurations: For all competitors, we report the highest numbers we were able to achieve across multiple configurations of their APIs. The exact configurations are below.
  - GPT-5: high reasoning, high search context, default verbosity
  - Exa: Exa Research Pro
  - Anthropic: Claude Opus 4.1
  - Perplexity: Sonar Deep Research, reasoning effort high
### DeepResearch Bench
| Category | Win Rate (%) |
| -------- | ------------ |
| Ultra8x | 82 |
| Ultra | 74 |
| GPT-5 | 66 |
### About the benchmark
This [benchmark](https://github.com/Ayanami0730/deep_research_bench) contains 100 expert-level research tasks designed by domain specialists across 22 fields, primarily Science & Technology, Business & Finance, and Software Development. It evaluates AI systems' ability to produce rigorous, long-form research reports on complex topics requiring cross-disciplinary synthesis. Results are reported from the subset of 50 English-language tasks in the benchmark. Learn more in our [blog](https://parallel.ai/blog/introducing-parallel).
### Methodology
- Dates: All measurements were made between 08/08/2025 and 08/13/2025.
- Win Rate: Calculated by comparing [RACE](https://github.com/Ayanami0730/deep_research_bench) scores in direct head-to-head evaluations.
- Configurations: For all competitors, we report the highest numbers we were able to achieve across multiple configurations of their APIs. The exact GPT-5 configuration is high reasoning, high search context, and high verbosity.
- Excluded API Results: Exa Research Pro (0% win rate), Claude Opus 4.1 (0% win rate), and Perplexity Sonar Deep Research (6% win rate).
### Search MCP Benchmark
| Series | Model | Cost (CPM) | Accuracy (%) |
| --------- | -------- | ---------- | ------------ |
| Parallel | GPT 4.1 | 21 | 74.9 |
| Parallel | o4 mini | 90 | 82.14 |
| Parallel | o3 | 192 | 80.61 |
| Parallel | sonnet 4 | 92 | 78.57 |
| Native | GPT 4.1 | 27 | 70 |
| Native | o4 mini | 190 | 77 |
| Native | o3 | 351 | 79.08 |
| Native | sonnet 4 | 122 | 68.83 |
| Exa | GPT 4.1 | 40 | 58.67 |
| Exa | o4 mini | 199 | 61.73 |
| Exa | o3 | 342 | 56.12 |
| Exa | sonnet 4 | 140 | 67.13 |
CPM: USD per 1,000 requests.
### About the benchmark
This benchmark, created by Parallel, blends WISER-Fresh and WISER-Atomic. WISER-Fresh is a set of 76 queries requiring the freshest data from the web, generated by Parallel with o3 pro. WISER-Atomic is a set of 120 hard real-world business queries, based on use cases from Parallel customers. Read our blog [here](https://parallel.ai/blog/search-api-benchmark).
### Distribution
- 40% WISER-Fresh
- 60% WISER-Atomic
### Evaluation
The Parallel Search API was evaluated by comparing three different web search solutions (Parallel MCP server, Exa MCP server/tool calling, LLM native web search) across four different LLMs (GPT 4.1, o4-mini, o3, Claude Sonnet 4).
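For illustration only, here is a minimal sketch of how such a comparison can be scored. The `run_query` and `grade_answer` helpers are hypothetical stand-ins for the actual harness (which is not published here); the backend and model identifiers mirror the ones named above, and CPM is computed exactly as defined below the table.

```python
# Hypothetical evaluation harness: helper functions and identifiers are
# illustrative assumptions, not Parallel's actual benchmark code.
from itertools import product

SEARCH_BACKENDS = ["parallel_mcp", "exa_mcp", "native_search"]
MODELS = ["gpt-4.1", "o4-mini", "o3", "claude-sonnet-4"]

def evaluate(queries, run_query, grade_answer):
    """Score every (search backend, model) pair over the query set.

    run_query(backend, model, question) -> (answer_text, cost_usd)  # assumed helper
    grade_answer(query, answer_text) -> bool                        # assumed helper
    """
    results = {}
    for backend, model in product(SEARCH_BACKENDS, MODELS):
        correct, total_cost = 0, 0.0
        for q in queries:
            answer, cost_usd = run_query(backend, model, q["question"])
            total_cost += cost_usd
            if grade_answer(q, answer):
                correct += 1
        results[(backend, model)] = {
            "accuracy_pct": 100.0 * correct / len(queries),
            "cpm_usd": 1000.0 * total_cost / len(queries),  # USD per 1,000 requests
        }
    return results
```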
### WISER-Atomic
| Series | Model | Cost (CPM) | Accuracy (%) |
| -------- | -------------- | ---------- | ------------ |
| Parallel | Core | 25 | 77 |
| Parallel | Base | 10 | 75 |
| Parallel | Lite | 5 | 64 |
| Others | o3 | 45 | 69 |
| Others | 4.1 mini low | 25 | 63 |
| Others | gemini 2.5 pro | 36 | 56 |
| Others | sonar pro high | 16 | 64 |
| Others | sonar low | 5 | 48 |
CPM: USD per 1,000 requests.
### About the benchmark
This benchmark, created by Parallel, contains 121 questions intended to reflect real-world web research queries across a variety of domains. Read our blog [here](https://parallel.ai/blog/parallel-task-api).
### Steps of reasoning
- 50% Multi-Hop questions
- 50% Single-Hop questions
### Distribution
- 40% Financial Research
- 20% Sales Research
- 20% Recruitment
- 20% Miscellaneous
### SimpleQA
| Series | Model | Cost (CPM) | Accuracy (%) |
| -------- | ---------------- | ---------- | ------------ |
| Parallel | Core | 25 | 94 |
| Parallel | Base | 10 | 94 |
| Parallel | Lite | 5 | 92 |
| Others | o3 high | 56 | 92 |
| Others | gemini 2.5 flash | 35 | 91 |
| Others | 4.1 mini high | 30 | 88 |
| Others | sonar pro | 13 | 84 |
| Others | sonar | 8 | 81 |
CPM: USD per 1,000 requests.
### About the benchmark
This benchmark, created by OpenAI, contains 4,326 questions focused on short, fact-seeking queries across a variety of domains.
### Steps of reasoning
- 100% Single-Hop questions
### Distribution
- 36% Culture
- 20% Science and Technology
- 16% Politics
- 28% Miscellaneous
## The most accurate deep and wide research
Run deeper and more accurate research at scale, for the same compute budget
## Build a dataset from the web
Define your search criteria in natural language, and get back a structured table of matches
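As a rough sketch of that flow (the endpoint, headers, and field names below are placeholders invented for illustration, not the documented API): natural-language criteria go in, and rows of structured matches come back.

```python
# Illustrative sketch only: the URL and JSON fields are assumptions, not
# Parallel's documented API. See the official docs for the real contract.
import os
import requests

criteria = "US-based Series B fintech companies founded after 2020, with CEO name and HQ city"

resp = requests.post(
    "https://api.example-parallel.invalid/v1/datasets",   # placeholder endpoint
    headers={"x-api-key": os.environ["PARALLEL_API_KEY"]},
    json={
        "query": criteria,                                 # natural-language search criteria
        "columns": ["company", "ceo", "hq_city", "founded_year"],
    },
    timeout=120,
)
resp.raise_for_status()

for row in resp.json().get("rows", []):                    # structured table of matches
    print(row["company"], "|", row["ceo"], "|", row["hq_city"])
```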
## Search, built for AIs
The most accurate search tool for bringing web context to your AI agents
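To make the agent integration concrete, here is a hedged sketch under the same caveat (placeholder endpoint and response fields, not a documented contract): the agent issues a search for its objective, then passes the returned excerpts to its LLM as grounding context.

```python
# Sketch of bringing web context to an agent. Endpoint and response fields
# are illustrative assumptions, not Parallel's documented API.
import os
import requests

def web_context(objective: str, k: int = 5) -> str:
    resp = requests.post(
        "https://api.example-parallel.invalid/v1/search",   # placeholder endpoint
        headers={"x-api-key": os.environ["PARALLEL_API_KEY"]},
        json={"objective": objective, "max_results": k},
        timeout=60,
    )
    resp.raise_for_status()
    # Concatenate URL + excerpt pairs into a single grounding block for the LLM.
    return "\n\n".join(
        f"{r['url']}\n{r['excerpt']}" for r in resp.json().get("results", [])
    )

prompt = (
    "Answer using only the sources below.\n\n"
    + web_context("latest developments in battery recycling, 2025")
    + "\n\nQuestion: Which companies announced new facilities this year?"
)
# `prompt` would then be sent to whichever LLM drives the agent.
```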
## Custom web enrichment
Bring existing data, define output columns to research, and get fresh web enrichments back
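A sketch of that enrichment flow, again with an invented payload shape for illustration: existing rows go in, the output columns to research are declared, and enriched rows come back.

```python
# Illustrative enrichment request; endpoint and field names are assumptions.
import os
import requests

existing_rows = [
    {"company": "Acme Robotics", "domain": "acmerobotics.example"},
    {"company": "Globex", "domain": "globex.example"},
]

resp = requests.post(
    "https://api.example-parallel.invalid/v1/enrich",       # placeholder endpoint
    headers={"x-api-key": os.environ["PARALLEL_API_KEY"]},
    json={
        "rows": existing_rows,                               # data you already have
        "output_columns": [                                  # columns to research on the web
            {"name": "employee_count", "description": "Current headcount estimate"},
            {"name": "latest_funding_round", "description": "Most recent round and date"},
        ],
    },
    timeout=300,
)
resp.raise_for_status()

for row in resp.json().get("rows", []):
    print(row)   # original fields plus the freshly researched columns
```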
Start building
## Towards a programmatic web for AIs
Parallel is building new interfaces, infrastructure, and business models for AIs to work with the web
Latest updates