# Web-powered chat completions API
## The easiest way to implement fast AI answers with web citations

## Priced per request, not per tool call
Leading inference APIs bundle web search pricing into tool calls, which means costs fluctuate whenever an agent makes multiple tool calls per request. Parallel’s flat $5 per 1,000 requests is not only 50% cheaper per search, it’s also predictable.
| Provider | Cost model |
| ---------------------- | ----------------------------------------- |
| OpenAI Web Search Tool | $10 per 1,000 searches + inference tokens |
| Claude Web Search Tool | $10 per 1,000 searches + inference tokens |
| Parallel Search API | $5 per 1,000 searches + inference tokens |
| Parallel Chat API | $5 per 1,000 chat completions |
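The pricing difference compounds with agent behavior. A minimal sketch of the math, using the table's prices (the 3-searches-per-request workload is a made-up illustration, not a benchmark):

```python
# Illustrative cost comparison based on the pricing table above.
# Prices are in dollars; the agent workload is a hypothetical example.
PER_SEARCH = 10.00 / 1000   # $10 per 1,000 searches (built-in provider tools)
PER_REQUEST = 5.00 / 1000   # $5 per 1,000 Chat API requests

def tool_call_cost(requests: int, searches_per_request: int) -> float:
    """Total search cost when every tool call is billed individually."""
    return requests * searches_per_request * PER_SEARCH

def chat_api_cost(requests: int) -> float:
    """Flat per-request cost, regardless of how much searching happens."""
    return requests * PER_REQUEST

# 1,000 requests where the agent averages 3 searches each:
# tool-call billing scales to roughly $30, while the Chat API stays at $5.
```

The point of the sketch: per-search billing makes cost a function of agent behavior you don't fully control, while per-request billing stays constant.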
## Features
- OpenAI compatible: Same SDK, same format. Point your base URL at Parallel and you’re live.
- Citations by default: Every response includes verifiable sources. No hallucinations without receipts.
- Structured output ready: Request JSON schema responses for clean integration into your product.
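A minimal sketch of a structured-output request using only the standard library. The body mirrors the OpenAI chat-completions wire format; the endpoint path under the documented base URL and the processor name `"base"` are assumptions to verify against Parallel's API reference:

```python
# Sketch: requesting a JSON-schema response in the OpenAI wire format.
# The /chat/completions path and the "base" processor name are assumptions.
import json
import os
import urllib.request

def build_payload(question: str, schema: dict) -> dict:
    """Assemble an OpenAI-format chat completion body with structured output."""
    return {
        "model": "base",  # hypothetical processor name -- check Parallel's docs
        "messages": [{"role": "user", "content": question}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "answer", "schema": schema},
        },
    }

def ask(question: str, schema: dict) -> dict:
    """POST the payload to the Chat API and return the parsed JSON response."""
    req = urllib.request.Request(
        "https://api.parallel.ai/chat/completions",  # assumed path under the base URL
        data=json.dumps(build_payload(question, schema)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['PARALLEL_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    }
    print(ask("Who won the 2024 Nobel Prize in Physics?", schema))
```

Because the request body is plain OpenAI format, any client that already speaks chat completions can produce it unchanged.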
# Powered by our own proprietary web-scale index
With innovations in retrieval, crawling, indexing, and reasoning
- Billions of pages covering the full depth and breadth of the public web
- Millions of pages added daily
- Recrawled constantly to keep data fresh

## FAQ
**How is this different from the web search built into other providers?**

We own our index. That means faster responses, better citation quality, and pricing that doesn't penalize you for grounding every query. Built-in provider search is a black box bolted onto an LLM; we built search-first.
**How fresh is the data?**

The Chat API answers from Parallel's continuously updated web index. For queries requiring the absolute freshest data or deep crawling, use the Task API.
**Which model powers the Chat API?**

The Chat API uses our own optimized inference stack, tuned for speed and accuracy on web-grounded queries. You don't need to choose a model, but you can specify a processor to tune the cost, speed, and depth to your unique needs.
**Can I use my existing OpenAI SDK code?**

Yes. Change your base URL to https://api.parallel.ai and swap in your Parallel API key. Everything else (streaming, JSON schema, message format) works the same.
**What are the rate limits?**

300 requests per minute by default. Contact us for production capacity.
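Clients that may exceed the default limit typically retry HTTP 429 responses with backoff. A minimal, illustrative sketch (the retry policy is generic practice, not Parallel-specific guidance):

```python
# Sketch: exponential backoff with full jitter for handling HTTP 429
# under the default 300 requests/minute limit. Policy is illustrative.
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(do_request, max_attempts: int = 5):
    """Retry do_request() on errors exposing status_code == 429; re-raise otherwise."""
    for attempt in range(max_attempts):
        try:
            return do_request()
        except Exception as exc:
            if getattr(exc, "status_code", None) != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))
```

Full jitter spreads retries out so a burst of rate-limited clients doesn't retry in lockstep.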
**When should I use the Chat API vs. the Task API?**

Chat is fast and simple, ideal for chat assistant UX where latency matters. Task is thorough: it combines our index with real-time crawling and extra reasoning for complex research that needs maximum accuracy and freshness.