

This guide demonstrates how to build a web research agent that combines Parallel's Search API with streaming AI inference. By the end, you'll have a complete search agent with a simple frontend that shows searches, results, and AI responses as they stream in real-time.
The complete app is available [here](https://oss.parallel.ai/agent/).
The search agent we're building includes:
- A search tool backed by Parallel's Search API, returning LLM-ready excerpts in a single call
- A multi-step AI agent that can call the search tool iteratively
- A simple frontend that shows searches, results, and AI responses as they stream in real time

Our technology stack:
- Parallel Search API for web research
- Vercel AI SDK (v5) for streaming inference and tool calling
- GPT-OSS 120B served on Cerebras for fast, open-source inference
- Cloudflare Workers for serverless deployment
- Plain HTML/JavaScript with Tailwind (via CDN) for the frontend
Parallel's Search API is designed for machines from first principles. The key difference from other search APIs like Exa or Tavily is that it provides all required context in a single API call. Other search approaches typically require two separate calls - one for getting the search engine results page (SERP), another for scraping the relevant pages. This traditional approach is slower and more token-heavy for the LLM.
Parallel streamlines this by finding the most relevant context from all pages immediately, returning only the relevant content to reduce context bloat. Our Search API [benchmark](https://parallel.ai/blog/search-api-benchmark) demonstrates that using the Parallel Search API in an agentic workflow can translate to accuracy gains of up to 20% over other search providers.
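To make this concrete, here is roughly what a single call looks like. This is a minimal sketch: the endpoint, headers, and field names follow Parallel's docs at the time of writing, so verify them against the current API reference, and the `parallelSearch` helper name is illustrative (we reuse it in later snippets).

```ts
// Minimal sketch of a single Parallel Search API call.
async function parallelSearch(objective: string, apiKey: string) {
  const res = await fetch("https://api.parallel.ai/v1beta/search", {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({
      objective,                  // natural-language description of what you need
      max_results: 5,             // cap the number of returned sources
      max_chars_per_result: 2500, // keep excerpts small to limit context bloat
    }),
  });
  if (!res.ok) throw new Error(`Search failed: ${res.status}`);
  // One call returns URLs plus LLM-ready excerpts -- no separate scraping step.
  return (await res.json()) as {
    results: { url: string; title: string; excerpts: string[] }[];
  };
}
```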
Beyond single lookups, the AI agent can call the Search API iteratively, exploring different angles and gathering comprehensive information before producing a final response. This multi-step capability is essential for true agentic behavior.
Most AI providers ship models with built-in tool calling via /chat/completions endpoints. However, streaming tool calls requires working with Server-Sent Events and multiple API round trips, which is complex to implement correctly.
The Vercel AI SDK elegantly abstracts provider-specific quirks and allows calling most providers with most of their features from a unified interface. This eliminates the need to work directly with raw API specifications and handle the back-and-forth tool calling manually.
The SDK offers multiple approaches for building this agent. While we use vanilla HTML/JavaScript for simplicity, the same backend can work with React frontends using AI SDK UI components for more sophisticated interfaces. The streaming approach we demonstrate works across different frontend frameworks, giving you flexibility in your implementation choice.
Now that we understand the architectural advantages, let's walk through building this search agent step by step.
To prevent TypeScript's "Type instantiation is excessively deep" error, the zod import needs a version suffix. Import the required functions:
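A sketch of the imports, assuming AI SDK v5 with the `@ai-sdk/cerebras` provider (the exact zod suffix depends on the zod version you install):

```ts
import { streamText, tool, stepCountIs, type ModelMessage } from "ai";
import { createCerebras } from "@ai-sdk/cerebras";
// The version suffix avoids the "Type instantiation is excessively deep"
// error when zod schemas are used as tool input schemas.
import { z } from "zod/v4";
```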
This section covers setting up the core search functionality that will power our AI agent:
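A sketch of how the search tool can be wired up with AI SDK v5's `tool` helper; `makeSearchTool` is an illustrative name, and it delegates to the `parallelSearch` helper sketched earlier:

```ts
// Factory so the tool can receive the API key from Worker env bindings.
const makeSearchTool = (apiKey: string) =>
  tool({
    description:
      "Search the web and return relevant excerpts from the most relevant pages.",
    inputSchema: z.object({
      objective: z
        .string()
        .describe("Natural-language description of what to find out"),
    }),
    // One call to Parallel returns LLM-ready excerpts; no scraping step.
    execute: async ({ objective }) => parallelSearch(objective, apiKey),
  });
```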
Here we set up the core AI agent with multi-step reasoning capabilities:
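A minimal sketch of the agent, assuming Worker secret bindings named `PARALLEL_API_KEY` and `CEREBRAS_API_KEY` and a paraphrased system prompt:

```ts
interface Env {
  PARALLEL_API_KEY: string;
  CEREBRAS_API_KEY: string;
}

// Built per request so the agent can read secrets from the Worker env.
function runAgent(messages: ModelMessage[], env: Env) {
  const cerebras = createCerebras({ apiKey: env.CEREBRAS_API_KEY });
  return streamText({
    model: cerebras("gpt-oss-120b"),
    system:
      "You are a web research agent. Run multiple searches from different " +
      "angles before answering, and cite the sources you used.",
    messages,
    tools: { search: makeSearchTool(env.PARALLEL_API_KEY) },
    // Allow up to 25 steps (search calls + reasoning) before stopping.
    stopWhen: stepCountIs(25),
  });
}
```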
The `stepCountIs(25)` stop condition allows the agent to make multiple search calls and reasoning steps, enabling thorough research across different angles before providing a comprehensive response.
The system prompt guides the agent to conduct multiple searches from different perspectives, which is crucial for comprehensive research.
This section handles the real-time streaming of agent responses to the frontend:
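In a Worker, the chat endpoint can hand the stream straight back to the browser. A sketch assuming the `runAgent` helper above and an illustrative `/chat` route:

```ts
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname === "/chat" && request.method === "POST") {
      const { messages } = (await request.json()) as {
        messages: ModelMessage[];
      };
      // Streams text deltas, tool calls, and tool results to the browser
      // as Server-Sent Events.
      return runAgent(messages, env).toUIMessageStreamResponse();
    }
    return new Response("Not found", { status: 404 });
  },
};
```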
Requirements:
Before deploying, submit your secrets:
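Assuming Wrangler and the secret binding names used in the sketches above (match them to your own worker configuration):

```sh
npx wrangler secret put PARALLEL_API_KEY
npx wrangler secret put CEREBRAS_API_KEY
```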
Deploy:
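```sh
npx wrangler deploy
```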
The worker also serves the frontend at the root path:
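For instance, the HTML can be imported as a text module (a Wrangler rules feature) and returned for the root route; the `indexHtml` import is an assumption about how the project bundles its frontend:

```ts
import indexHtml from "./index.html"; // bundled as a string via a Wrangler text rule

// Inside the same fetch handler, before the /chat route check:
if (url.pathname === "/") {
  return new Response(indexHtml, {
    headers: { "Content-Type": "text/html;charset=utf-8" },
  });
}
```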
The frontend processes the streaming responses in real-time:
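A sketch of the browser side. The chunk `type` names follow the AI SDK v5 UI message stream protocol (verify against the version you deploy), and `appendText`, `showSearch`, and `showResults` are hypothetical rendering helpers:

```ts
// Hypothetical rendering helpers -- replace with real DOM updates.
declare function appendText(text: string): void;
declare function showSearch(input: unknown): void;
declare function showResults(output: unknown): void;

async function ask(question: string) {
  const res = await fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: question }] }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next read

    for (const line of lines) {
      // SSE frames look like `data: {...}`; the stream ends with `data: [DONE]`.
      if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
      const chunk = JSON.parse(line.slice("data: ".length));
      if (chunk.type === "text-delta") appendText(chunk.delta); // streamed answer text
      else if (chunk.type === "tool-input-available") showSearch(chunk.input); // a search being issued
      else if (chunk.type === "tool-output-available") showResults(chunk.output); // its results
    }
  }
}
```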
The frontend uses [Tailwind via CDN](https://cdn.tailwindcss.com/) for styling, which achieves a clean design in few lines without additional dependencies. The implementation uses plain HTML rather than React or another framework, making it accessible and easy to understand.
The complete source files provide essential context for both backend logic and frontend streaming:
Essential source files:
- The TypeScript worker, containing the search tool, agent loop, and streaming endpoint
- The plain-HTML frontend, containing the stream parsing and rendering logic
These files contain the complete TypeScript definitions and HTML implementation that are essential for understanding the full integration between the Parallel Search API and the streaming frontend.
When altering the frontend implementation, having proper TypeScript context is crucial for developer experience. The [AI SDK type stubs](https://unpkg.com/ai@5.0.22/dist/index.d.ts) (ai@5.0.22) were used to overcome the limited dev tooling for plain-HTML frontends. More context can be found in SPEC.md.
The guide uses GPT-OSS 120B on Cerebras, one of the fastest models available and fully open source. It does have some quirks: the model sometimes stops searching early despite instructions, and occasionally tries to call tools that aren't available, likely due to overfitting on its training data. For production use cases, consider upgrading to a model with stronger tool calling that avoids these quirks while maintaining similar speed; both Groq and Cerebras offer such alternatives.
This demonstration omits several production requirements:
- Authentication: no user authentication is implemented
Adding these features would be important next steps for enterprise deployment.
The resulting agent demonstrates real-time streaming of search operations, multi-step AI reasoning with tool use, clean separation of search logic and presentation, and serverless deployment ready for scaling. The architecture shows how modern AI SDKs can simplify complex multi-step agent workflows while maintaining performance and user experience quality.
Resources:
- [Complete app](https://oss.parallel.ai/agent/)
- [Search API benchmark](https://parallel.ai/blog/search-api-benchmark)
- [AI SDK type stubs (ai@5.0.22)](https://unpkg.com/ai@5.0.22/dist/index.d.ts)
By Parallel
September 5, 2025