# Introducing Basis with Calibrated Confidences

A new standard for verifiable AI web research

Tags:Product Release
Reading time: 4 min
Parallel Web Systems introduces Basis with calibrated confidences - a new verification framework for AI web research and search API outputs that sets a new industry standard for transparent and reliable deep research.

A month ago, we launched the Parallel Task API, powered by a series of processors that offer state-of-the-art accuracy on web research tasks at every price point.

We recognize, however, that production use cases don't just require Pareto-optimal performance; they also require verifiability and calibrated confidence scoring. So today, we're excited to announce Basis, an essential suite of verification tools for the Parallel Task API.

Basis is an automatically included add-on to our Core, Pro, and Ultra processors that provides additional context and evidence for how we reached each conclusion, as well as how confident we are in our findings.

Basis outputs allow you to identify instances where AI web research may yield unreliable results, enabling targeted human-in-the-loop workflows that focus human attention only where it's most needed. This approach reduces manual review hours while achieving higher accuracy in hybrid human/AI workflows than either AI-only or human-only alternatives.

![Confidence example as a competitor analysis task in Platform UI](https://cdn.sanity.io/images/5hzduz3y/production/0a5bf3b45a7c6fd1d26232847f72f6b182ae50b8-1280x852.png)

### **What is Basis?**

Basis provides a complete framework for understanding and validating Task API outputs through four core components:

- **Citations**: Web URLs linking directly to source materials.
- **Reasoning**: Detailed explanations justifying each output field.
- **Excerpts**: Relevant text snippets from citation URLs.
- **Confidences**: A calibrated measure of confidence classified into low, medium, or high categories.

These elements work together to create a robust framework for output verification that sets a new industry standard for transparency and reliability. For more information on Basis outputs in our Task API, see the [docs](https://docs.parallel.ai/task-api/guides/access-research-basis).

### Basis Output

```
{
  "field": "crm_system",
  "citations": [
    {
      "url": "https://www.linkedin.com/jobs/view/sales-representative-microsoft-dynamics-365-at-contoso-inc-3584271",
      "excerpts": ["Looking for sales professionals with experience in Microsoft Dynamics 365 CRM to join our growing team."]
    }
  ],
  "reasoning": "There is limited direct evidence about which CRM system the company uses internally. The job posting suggests they work with Microsoft Dynamics 365, but it's not explicitly stated whether this is their primary internal CRM or simply a product they sell/support. No official company documentation confirming their internal CRM system was found.",
  "confidence": "low"
}
```
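
As a minimal sketch of how a client might consume one Basis entry, the Python below parses an object shaped like the example above and prints each source next to its excerpts. The `BasisEntry` dataclass and `parse_basis_entry` helper are hypothetical names for illustration, not part of any Parallel SDK; only the JSON keys (`field`, `citations`, `url`, `excerpts`, `reasoning`, `confidence`) come from the example, and the `reasoning` string is shortened here.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


# Hypothetical container for illustration; not a Parallel SDK type.
@dataclass
class BasisEntry:
    field: str
    confidence: str
    reasoning: str
    sources: list[tuple[str, list[str]]]  # (url, excerpts) pairs


def parse_basis_entry(raw: str) -> BasisEntry:
    """Parse one Basis JSON object shaped like the example above."""
    data = json.loads(raw)
    sources = [(c["url"], c.get("excerpts", [])) for c in data.get("citations", [])]
    return BasisEntry(
        field=data["field"],
        confidence=data["confidence"],
        reasoning=data.get("reasoning", ""),
        sources=sources,
    )


example = """
{
  "field": "crm_system",
  "citations": [
    {
      "url": "https://www.linkedin.com/jobs/view/sales-representative-microsoft-dynamics-365-at-contoso-inc-3584271",
      "excerpts": ["Looking for sales professionals with experience in Microsoft Dynamics 365 CRM to join our growing team."]
    }
  ],
  "reasoning": "There is limited direct evidence about which CRM system the company uses internally.",
  "confidence": "low"
}
"""

entry = parse_basis_entry(example)
print(f"{entry.field}: confidence={entry.confidence}")
for url, excerpts in entry.sources:
    print(f"  source: {url}")
    for excerpt in excerpts:
        print(f"    excerpt: {excerpt}")
```

Keeping the URL and its excerpts paired makes it easy for a reviewer to check a claim against its source without opening the raw JSON.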

### Calibrated Confidences

Confidence ratings aren't particularly useful without calibration — you need to know that high, medium, and low labels provide useful and differentiable insight into task performance.

To calibrate confidence ratings, we've tested confidence patterns on composite datasets that reflect a wide array of real-world use cases. Each composite dataset has a different level of difficulty, which shows how confidence performance and distribution behave across web research tasks of varying difficulty.

Why is this important?

## **Confidence provides insight into the difficulty of a Task**

When Parallel returns a higher percentage of Basis outputs as "High" Confidence, you can reliably interpret this as Parallel's Task API performing well on the Task.

You can use confidences as a proxy for a full evaluation, and understand how well your Tasks perform relative to each other.

Figure: High Confidence Answers vs Overall Dataset Accuracy. X-axis: High Confidence Responses (%); y-axis: Overall Dataset Accuracy (%); one point each for the Easy, Medium, and Hard datasets.

### Description

For each dataset (Easy, Medium, Hard), the chart compares the percentage of answers rated High Confidence with the overall accuracy of the dataset.

### Results

As dataset queries get easier (i.e., overall accuracy is higher), the percentage of High Confidence responses increases.

This demonstrates confidences can reliably be used as a proxy for evaluating Parallel Processor performance on a dataset.

### Confidences % High vs Accuracy

| Dataset | High Confidence Responses (%) | Overall Dataset Accuracy (%) |
| ------- | ----------------------------- | ---------------------------- |
| Easy    | 78                            | 91                           |
| Medium  | 72                            | 80                           |
| Hard    | 53                            | 70                           |
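
One way to use this relationship in practice is to compare Tasks by their share of High Confidence outputs before running a full evaluation. The sketch below is illustrative only: `high_confidence_share` is a hypothetical helper, not a Parallel API, and the `labels` list is made-up data. The sanity check uses only the three pairs of values from the table above.

```python
from collections import Counter


def high_confidence_share(confidences: list[str]) -> float:
    """Return the percentage of outputs labeled "high" (case-insensitive)."""
    if not confidences:
        raise ValueError("need at least one confidence label")
    counts = Counter(label.lower() for label in confidences)
    return 100.0 * counts["high"] / len(confidences)


# Values from the table above: dataset -> (high confidence %, overall accuracy %).
reported = {"Easy": (78, 91), "Medium": (72, 80), "Hard": (53, 70)}

# Ranking datasets by High Confidence share gives the same order as ranking by accuracy.
by_share = sorted(reported, key=lambda name: reported[name][0], reverse=True)
by_accuracy = sorted(reported, key=lambda name: reported[name][1], reverse=True)
print(by_share == by_accuracy)  # True

# Hypothetical labels from one of your own Task runs.
labels = ["high", "high", "medium", "low", "high"]
print(f"{high_confidence_share(labels):.0f}% high confidence")  # 60% high confidence
```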

## **Low confidence ratings provide efficient identification of what to review**

By reviewing just the outputs rated "Low" Confidence, a small portion of your total dataset, you can typically achieve a ~2x reduction in error rate. This gives you significantly more leverage over human time than reviewing all outputs.

Error Rate Reduction after Reviewing Low Confidence Outputs
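
To make the arithmetic concrete, here is a rough sketch using the raw counts for one dataset from the Notes on Methodology (the rows with 9/55, 4/16, and 23/13 incorrect/correct answers). It assumes, as a simplification not stated in this post, that human review corrects every error in the outputs it covers. `error_rate_after_review` is a hypothetical helper.

```python
# (incorrect, correct) counts per confidence level, copied from the third
# group of rows in "Raw Question Count per Confidence Level".
counts = {"high": (9, 55), "medium": (4, 16), "low": (23, 13)}


def error_rate_after_review(counts, reviewed=("low",)):
    """Error rate before and after reviewing the given confidence levels.

    Assumption (ours, not the post's): review fixes every error in the
    reviewed outputs and introduces no new ones.
    """
    total = sum(bad + good for bad, good in counts.values())
    errors_before = sum(bad for bad, _ in counts.values())
    errors_after = sum(bad for level, (bad, _) in counts.items() if level not in reviewed)
    reviewed_share = sum(sum(counts[level]) for level in reviewed) / total
    return errors_before / total, errors_after / total, reviewed_share


before, after, share = error_rate_after_review(counts)
print(f"reviewed {share:.0%} of outputs")         # reviewed 30% of outputs
print(f"error rate {before:.1%} -> {after:.1%}")  # error rate 30.0% -> 10.8%
print(f"reduction: {before / after:.1f}x")        # reduction: 2.8x
```

Under that assumption, reviewing under a third of this dataset removes most of its errors, which is the leverage the paragraph above describes.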

## **High confidence ratings can be reliably skipped in manual review workflows**

Outputs marked as "High" Confidence have error rates 2-3x lower than the overall dataset.

Error Rate Reduction after Only Considering High Confidence Outputs
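
The same comparison can be recomputed from the raw counts in the Notes on Methodology. This is a small sketch, not the evaluation code behind the chart; `error_rate` is a hypothetical helper, and the three entries in `groups` are the three groups of rows in "Raw Question Count per Confidence Level", in the order they appear.

```python
# (incorrect, correct) counts per confidence level for each group of rows.
groups = [
    {"high": (6, 211), "medium": (9, 23), "low": (11, 20)},
    {"high": (3, 116), "medium": (4, 19), "low": (10, 6)},
    {"high": (9, 55), "medium": (4, 16), "low": (23, 13)},
]


def error_rate(pairs):
    """Fraction of incorrect answers across (incorrect, correct) pairs."""
    pairs = list(pairs)
    incorrect = sum(bad for bad, _ in pairs)
    total = sum(bad + good for bad, good in pairs)
    return incorrect / total


ratios = []
for index, group in enumerate(groups, start=1):
    overall = error_rate(group.values())
    high_only = error_rate([group["high"]])
    ratios.append(overall / high_only)
    print(
        f"group {index}: overall {overall:.1%}, high only {high_only:.1%}, "
        f"{overall / high_only:.1f}x lower"
    )
```

On these counts, the High Confidence subset has roughly 2x to 4x fewer errors per output than its dataset as a whole.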

### **Built for real-world applications at scale**

Basis is particularly valuable for hybrid AI-human workflows where the addition of AI significantly increases leverage, accuracy, and time efficiency. By focusing human review on outputs with low confidence, teams can dramatically reduce verification time while maintaining quality standards. This approach allows enterprises to scale their web research operations without sacrificing accuracy or transparency.
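
A hybrid workflow like this can be sketched as a small routing step. The queue names, the handling of medium-confidence outputs, and the `route_outputs` helper are all assumptions for illustration rather than Parallel recommendations; the only thing taken from Basis is the `field` and `confidence` keys.

```python
from collections import defaultdict

# Hypothetical routing policy; adjust the queues to your own review process.
ROUTES = {"low": "human_review", "medium": "spot_check", "high": "auto_accept"}


def route_outputs(basis_entries):
    """Group Basis entries into work queues keyed by confidence level."""
    queues = defaultdict(list)
    for entry in basis_entries:
        # Unknown or missing labels fall back to human review, the safest queue.
        label = str(entry.get("confidence", "")).lower()
        queues[ROUTES.get(label, "human_review")].append(entry["field"])
    return dict(queues)


entries = [
    {"field": "crm_system", "confidence": "low"},
    {"field": "employee_count", "confidence": "high"},  # hypothetical field
    {"field": "ceo_name", "confidence": "medium"},      # hypothetical field
]
print(route_outputs(entries))
# {'human_review': ['crm_system'], 'auto_accept': ['employee_count'], 'spot_check': ['ceo_name']}
```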

Today, Basis powers human-in-the-loop production workflows across numerous domains. Insurance underwriters leverage low-confidence indicators and citation trails to streamline KYB verification processes that were previously manual. AI automation platforms use Basis to validate data enrichment capabilities before pushing to production, providing traceability from enriched fields back to source materials.

The Basis framework with calibrated confidences is available today with the Parallel Task API. To start building with verifiable web research, go to the [Parallel Developer Platform](https://platform.parallel.ai/).

## **Notes on Methodology**

**Testing Dates**: Testing was conducted between May 12 and May 15, 2025.

**Benchmark Details**: The Parallel Confidence datasets cover easy, medium, and hard web research questions that reflect a wide range of representative real-world use cases. Three example questions are below.

**Example Questions**:

- **Compliance**: Return if SOC2, ISO27001, PCI DSS are compliance frameworks mentioned on imsedge.com. If yes, return only the name of the compliance framework or frameworks that are mentioned. Otherwise, return no.
- **Company research**: Return if a company is B2B or B2C, the CEO’s Linkedin URL, the CEO’s name, the CEO’s undergrad institution, and the employee count of the company as of January 2025.
- **Financial research**: Find the SEC 10-K filing of the company as of January 2025 and the list of stock exchanges the company is listed on as of January 2025.

**Additional Data**:


### Confidence Distribution per Accuracy Level

| Dataset | Confidence | Correct (%) | Incorrect (%) |
| ------- | ---------- | ----------- | ------------- |
| Easy    | High       | 97.2        | 2.8           |
| Easy    | Medium     | 71.9        | 28.1          |
| Easy    | Low        | 64.5        | 35.5          |
| Medium  | High       | 97.5        | 2.5           |
| Medium  | Medium     | 82.6        | 17.4          |
| Medium  | Low        | 37.5        | 62.5          |
| Hard    | High       | 85.9        | 14.1          |
| Hard    | Medium     | 80          | 20            |
| Hard    | Low        | 36.1        | 63.9          |

### Raw Question Count per Confidence Level

| Dataset | Confidence | Incorrect (count) | Correct (count) |
| ------- | ---------- | ----------------- | --------------- |
| Easy    | High       | 6                 | 211             |
| Easy    | Medium     | 9                 | 23              |
| Easy    | Low        | 11                | 20              |
| Medium  | High       | 3                 | 116             |
| Medium  | Medium     | 4                 | 19              |
| Medium  | Low        | 10                | 6               |
| Hard    | High       | 9                 | 55              |
| Hard    | Medium     | 4                 | 16              |
| Hard    | Low        | 23                | 13              |

By Parallel

May 16, 2025
