
# Building a Full-Stack Search Agent with Parallel and Cerebras
Build a web research agent that combines Parallel's Search API with streaming AI inference.

This guide walks through building that agent end to end. By the time you finish, you'll have a complete search agent with a simple frontend that shows searches, results, and AI responses as they stream in real time.
The complete app is available [here](https://oss.parallel.ai/agent/).
## The Architecture
The search agent we're building includes:
- A simple search homepage
- User-editable system prompt in a config modal
- Agent connection through Parallel Search API tool use
- Streaming searches, search results, AI reasoning, and AI responses
- Clean rendering of results as they arrive
Our technology stack:
- [Parallel TypeScript SDK](https://www.npmjs.com/package/parallel-web) for the Search API
- [Vercel AI SDK](https://ai-sdk.dev/docs/introduction) for AI orchestration
- [Cerebras](https://ai-sdk.dev/providers/ai-sdk-providers/cerebras) with GPT-OSS 120B for fast responses
- [Cloudflare Workers](https://workers.cloudflare.com/) for deployment
## Why This Architecture Works
### Search API vs Traditional Agent Search Architecture
Parallel's Search API is designed for machines from first principles. The key difference from other search APIs such as Exa or Tavily is that it provides all required context in a single API call. Other search approaches typically require two separate calls: one to fetch the search engine results page (SERP), and another to scrape the relevant pages. That traditional approach is slower and more token-heavy for the LLM.
Parallel streamlines this by extracting the most relevant context from all pages immediately, returning only the relevant content to reduce context bloat. Our Search API [benchmark](https://parallel.ai/blog/search-api-benchmark) shows that using the Parallel Search API in an agentic workflow can translate to up to 20% gains in accuracy versus other search providers.
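To make the single-call flow concrete, here is a minimal sketch using the Parallel TypeScript SDK. It assumes the Search endpoint lives under the SDK's beta namespace and that result fields follow the public docs; treat the exact names as illustrative.

```typescript
import Parallel from "parallel-web";

const client = new Parallel({ apiKey: process.env.PARALLEL_API_KEY });

// One call returns ranked URLs plus the relevant excerpts from each page,
// so no separate SERP + scrape round trip is needed before prompting the LLM.
const search = await client.beta.search({
  objective: "Summarize recent benchmarks comparing web search APIs for LLM agents",
  processor: "base",
  max_results: 5,
});

for (const result of search.results) {
  console.log(result.url, result.excerpts?.length ?? 0, "excerpts");
}
```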
Because the AI agent can call the Search API iteratively, it can explore different angles and gather comprehensive information before producing a final response. This multi-step capability is essential for true agentic behavior.
### Choosing the Vercel AI SDK
Most AI providers ship models with built-in tool calling via /chat/completions endpoints. However, doing tool calling in a streaming fashion requires working with Server-Sent Events and multiple API round trips, which is complex to implement correctly.
The Vercel AI SDK elegantly abstracts provider-specific quirks and allows calling most providers with most of their features from a unified interface. This eliminates the need to work directly with raw API specifications and handle the back-and-forth tool calling manually.
The SDK offers multiple approaches for building this agent. While we use vanilla HTML/JavaScript for simplicity, the same backend can work with React frontends using AI SDK UI components for more sophisticated interfaces. The streaming approach we demonstrate works across different frontend frameworks, giving you flexibility in your implementation choice.
## Implementation
Now that we understand the architectural advantages, let's walk through building this search agent step by step.
### Dependencies and Setup
To prevent TypeScript's "Type instantiation is excessively deep" error, zod requires a version suffix. Import the required functions:
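A minimal sketch of those imports, assuming the packages named in the stack above; the versioned "zod/v4" subpath (available since zod 3.25) is one way to satisfy the suffix requirement, and pinning an exact zod version in package.json also works.

```typescript
// Versioned subpath import avoids the "Type instantiation is excessively deep" error.
import { z } from "zod/v4";
import Parallel from "parallel-web";
import { createCerebras } from "@ai-sdk/cerebras";
import { streamText, tool, stepCountIs, convertToModelMessages } from "ai";
```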
### Defining the Search Tool
This section covers setting up the core search functionality that will power our AI agent:
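A hedged sketch of the tool definition, assuming AI SDK v5's tool() helper and that the Worker's env holds the Parallel API key; the Search API field names (objective, processor, max_results, max_chars_per_result) follow the public docs, and the limits shown are illustrative.

```typescript
const parallel = new Parallel({ apiKey: env.PARALLEL_API_KEY });

const webSearch = tool({
  description:
    "Search the web and return relevant excerpts for a research objective.",
  inputSchema: z.object({
    objective: z
      .string()
      .describe("Natural-language description of what to find out"),
  }),
  execute: async ({ objective }) => {
    const search = await parallel.beta.search({
      objective,
      processor: "base", // "base" favors speed; "pro" favors freshness and quality
      max_results: 5, // cap the number of pages returned
      max_chars_per_result: 2500, // cap excerpt length to control token usage
    });
    return search.results; // URLs, titles, and pre-extracted excerpts
  },
});
```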
### Key implementation choices:
- We choose "objective" over "search_queries" because it allows a natural-language description of research goals, making the tool more intuitive for the AI to use
- The "base" processor prioritizes speed while "pro" focuses on freshness and quality - choose based on your use case requirements
- Token limits are balanced to provide sufficient context without overwhelming the model
### Creating the Streaming Agent
Here we set up the core AI agent with multi-step reasoning capabilities:
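A sketch of the agent loop, assuming the webSearch tool above, a CEREBRAS_API_KEY secret, and UI messages arriving in the request body; the model id and default system prompt are illustrative.

```typescript
const cerebras = createCerebras({ apiKey: env.CEREBRAS_API_KEY });

// Illustrative default; the config modal lets users override it.
const systemPrompt =
  "You are a research agent. Run multiple searches from different angles, then synthesize a sourced answer.";

const result = streamText({
  model: cerebras("gpt-oss-120b"),
  system: systemPrompt,
  messages: convertToModelMessages(messages), // UI messages from the request body
  tools: { search: webSearch },
  stopWhen: stepCountIs(25), // allow up to 25 tool-call / reasoning steps
});
```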
### Important configuration details:
The stopWhen: stepCountIs(25) setting allows the agent to make multiple search calls and reasoning steps, enabling thorough research across different angles before it produces a comprehensive response.
The system prompt guides the agent to conduct multiple searches from different perspectives, which is crucial for comprehensive research.
### Streaming Response Handler
This section handles the real-time streaming of agent responses to the frontend:
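With AI SDK v5, a single helper converts the streamText result into a streaming HTTP response. A minimal sketch; sendReasoning is included here as an assumption, so that reasoning parts reach the UI.

```typescript
// Streams text deltas, tool calls, tool results, and reasoning to the
// frontend as Server-Sent Events in the UI message stream protocol.
return result.toUIMessageStreamResponse({
  sendReasoning: true,
});
```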
## Cloudflare Workers Deployment
### Configuration
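A hedged example wrangler.toml; the name and compatibility date are placeholders. The Text rule lets the Worker import index.html as a string, and secrets are set separately via the CLI rather than in this file.

```toml
name = "parallel-search-agent"
main = "worker.ts"
compatibility_date = "2025-09-05"

[[rules]]
type = "Text"
globs = ["**/*.html"]
```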
### Deployment Process
Requirements:
- Node.js
- Wrangler CLI
- Cloudflare account
Before deploying, set your API keys as Worker secrets (the names below match the sketches above):
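```sh
# Secret names are assumptions matching the sketches above.
npx wrangler secret put PARALLEL_API_KEY
npx wrangler secret put CEREBRAS_API_KEY
```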
Deploy:
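```sh
npx wrangler deploy
```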
## Frontend Implementation
The worker also serves the frontend at the root path:
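A sketch of the Worker entry point, assuming index.html is bundled as a text module (per the Text rule in the config above) and that the chat endpoint lives at /chat; handleChat is a hypothetical name for the streaming handler shown earlier.

```typescript
import indexHtml from "./index.html"; // bundled as a string via the Text rule

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { pathname } = new URL(request.url);

    if (request.method === "GET" && pathname === "/") {
      return new Response(indexHtml, {
        headers: { "content-type": "text/html;charset=utf-8" },
      });
    }
    if (request.method === "POST" && pathname === "/chat") {
      return handleChat(request, env); // hypothetical: runs the streamText agent
    }
    return new Response("Not found", { status: 404 });
  },
};
```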
### Handling the Stream
The frontend processes the streaming responses in real-time:
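A hedged sketch of the client side, reading the response body and parsing the SSE "data:" lines of the UI message stream; renderPart is a hypothetical function that switches on each part's type (text deltas, tool calls, reasoning) and updates the DOM.

```typescript
const response = await fetch("/chat", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ messages }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep any partial line for the next chunk

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") continue; // stream terminator
    renderPart(JSON.parse(payload)); // hypothetical: dispatch on part.type
  }
}
```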
## Styling and Dependencies
The frontend uses the [Tailwind CSS CDN](https://cdn.tailwindcss.com/) for styling, which keeps the design clean without additional build dependencies. The implementation uses regular HTML rather than React or another framework, making it accessible and easy to understand.
## Development Context and Resources
The complete source files provide essential context for both backend logic and frontend streaming:
Essential source files:
- worker.ts - Complete backend implementation
- index.html - Frontend with streaming UI
These files contain the complete TypeScript definitions and HTML implementation needed to understand the full integration between the Parallel Search API and the streaming frontend.
When altering the frontend implementation, having proper TypeScript context is crucial for developer experience. The AI SDK stubs file ([https://unpkg.com/ai@5.0.22/dist/index.d.ts](https://unpkg.com/ai@5.0.22/dist/index.d.ts)) was used to overcome the limited dev tooling for plain-HTML frontends. More context can be found in SPEC.md.
## Model Considerations
The guide uses GPT-OSS 120B on Cerebras, one of the fastest models available and fully open source. It has some limitations, however: the model sometimes stops searching early despite instructions, and occasionally tries to call tools that aren't available, likely due to overfitting on its training data. For production use cases, consider upgrading to a stronger tool-calling model that avoids these quirks while maintaining similar speed; both Groq and Cerebras provide such alternatives.
## Production Considerations
This demonstration omits several production requirements:
- Authentication: No user authentication is implemented
- Rate limiting: Currently limited only by API budgets
- Error handling: Basic error handling is shown but could be expanded
- Monitoring: No observability or logging beyond basic console output
Adding these features would be important next steps for enterprise deployment.
The resulting agent demonstrates real-time streaming of search operations, multi-step AI reasoning with tool use, clean separation of search logic and presentation, and serverless deployment ready for scaling. The architecture shows how modern AI SDKs can simplify complex multi-step agent workflows while maintaining performance and user experience quality.
Resources:
- [Complete source code](https://github.com/parallel-web/parallel-cookbook/tree/main/typescript-recipes/parallel-search-agent)
- [Parallel API documentation](https://docs.parallel.ai/)
- [Get Parallel API keys](https://platform.parallel.ai/)
By Parallel
September 5, 2025
