April 24, 2026

# Building a free CLI agent with Pi, Ollama, Gemma 4, and Parallel


When building software, I’ve usually optimized for one of a few outcomes: speed, quality, ease of use. In the age of AI, cost is often overlooked. So in 2026 I wanted to find out: could I vibe-code a CLI that uses AI completely for free? No LLM subscription, no per-token API bills, no hosted inference. Just a local model, a local agent harness, and [Parallel's Free Search MCP](/blog/free-web-search-mcp).

The result: `brief`, a single-file CLI that takes a topic and prints a morning-coffee summary with sources. It was _written by_ a local agent (Pi + `gemma4:26b` on Ollama) using the Parallel Search MCP to pull docs into context, and it _runs on_ the same free building blocks: `gemma4:e4b` on local Ollama for summarization and the Parallel Search MCP for the news lookup. Full stack, on one machine, at my desk: $0 in API charges, zero API keys in my shell history. (Cost of the laptop not included.)

Here's how it went and where the rough edges were.

## The development stack

- **Agent harness:** [Pi](https://github.com/badlogic/pi-mono) (`@mariozechner/pi-coding-agent`). Billed as a _minimal terminal coding harness_: four built-in tools (`read`, `write`, `edit`, `bash`), everything else via extensions. "Adapt pi to your workflows, not the other way around." MCP support comes from a third-party extension, [`pi-mcp-adapter`](https://github.com/nicobailon/pi-mcp-adapter).
- **Model runtime:** [Ollama](https://ollama.com/)
- **Model:** `gemma4:26b`, the 26B Mixture-of-Experts variant with 4B active parameters from Google DeepMind's Gemma 4 family (Apache 2.0 license).
- **Search:** [Parallel Search MCP](https://docs.parallel.ai/integrations/mcp/search-mcp) at `https://search.parallel.ai/mcp`. Two tools: `web_search` and `web_fetch`. No auth required.

## Getting it running

Three files end up on disk, two in the project and one global:

```
pi_coder/
├── .mcp.json          # Parallel Search MCP endpoint
└── .pi/
    └── settings.json  # pointer to the Ollama provider

~/.pi/agent/
└── models.json        # defines the Ollama provider itself
```

### Adding MCP support to Pi

Pi intentionally doesn't ship with MCP support. To use MCP, install the following:

```bash
pi install npm:pi-mcp-adapter
```

### Adding the Parallel Search MCP to Pi

There's nothing to set up server-side: [Parallel](/) hosts the endpoint. Drop the URL into `.mcp.json`:

```json
{
  "mcpServers": {
    "parallel-search": {
      "url": "https://search.parallel.ai/mcp",
      "directTools": ["web_search", "web_fetch"]
    }
  }
}
```

That's the whole add. `directTools` registers `web_search` and `web_fetch` as first-class Pi tools alongside `read`/`write`/`edit`/`bash` — roughly 300–600 tokens of system-prompt overhead for the pair.

To verify it's wired up: `/mcp` inside Pi opens a panel showing every configured server, its connection status, and its tools. You should see `parallel-search` connected with `web_search` and `web_fetch` available.
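For reference, the shape of that `.mcp.json` can be written down as TypeScript types. This is only a sketch of the two fields used in this post (`url` and `directTools`); the type names and the `listDirectTools` helper are hypothetical, and `pi-mcp-adapter` may well accept more fields than shown here.

```typescript
// Hypothetical types covering only the fields used in the .mcp.json above.
interface McpServerEntry {
  url: string;
  directTools?: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// List "server/tool" pairs for every tool a config registers directly.
function listDirectTools(config: McpConfig): string[] {
  const out: string[] = [];
  for (const [server, entry] of Object.entries(config.mcpServers)) {
    for (const tool of entry.directTools ?? []) {
      out.push(`${server}/${tool}`);
    }
  }
  return out;
}

const mcpConfig: McpConfig = JSON.parse(`{
  "mcpServers": {
    "parallel-search": {
      "url": "https://search.parallel.ai/mcp",
      "directTools": ["web_search", "web_fetch"]
    }
  }
}`);

const directTools = listDirectTools(mcpConfig);
// directTools is ["parallel-search/web_search", "parallel-search/web_fetch"]
```

These are the same two tools the `/mcp` panel should list for `parallel-search`.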

### Pointing Pi at Ollama

Pi resolves providers globally, so the Ollama definition goes in `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [{ "id": "gemma4:26b" }]
    }
  }
}
```
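To make the provider definition concrete, here is a rough sketch of the kind of request an OpenAI-style client would send to that `baseUrl`. It assumes the `openai-completions` setting means the OpenAI chat-completions wire format, which Ollama serves under `/v1/chat/completions`. The `buildChatRequest` helper and its types are hypothetical and are not part of Pi.

```typescript
// Provider fields copied from the models.json above.
interface Provider {
  baseUrl: string;
  apiKey: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Plain description of an HTTP request, suitable for fetch(url, { method, headers, body }).
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(provider: Provider, model: string, messages: ChatMessage[]): HttpRequest {
  return {
    // Strip any trailing slash, then append the OpenAI-compatible chat route.
    url: `${provider.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Ollama ignores the key, but OpenAI-style clients expect one to be set.
      Authorization: `Bearer ${provider.apiKey}`,
    },
    body: JSON.stringify({ model, messages, stream: false }),
  };
}

const ollamaProvider: Provider = { baseUrl: "http://localhost:11434/v1", apiKey: "ollama" };
const chatRequest = buildChatRequest(ollamaProvider, "gemma4:26b", [
  { role: "user", content: "Say hello in one word." },
]);
// chatRequest.url is "http://localhost:11434/v1/chat/completions"
```

The placeholder `apiKey` is why the config can say `"ollama"` without any real credential being involved.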

Then the project-local `.pi/settings.json` just selects it:

```json
{
  "defaultProvider": "ollama",
  "defaultModel": "gemma4:26b"
}
```

Full install:

```bash
npm install -g @mariozechner/pi-coding-agent
pi install npm:pi-mcp-adapter
ollama pull gemma4:26b
ollama pull gemma4:e4b
ollama serve &
# write ~/.pi/agent/models.json (see above)
cd pi_coder # contains .mcp.json + .pi/settings.json
pi
```

## What I asked it to build

I gave Pi a detailed spec for a CLI called `brief`:

````markdown
# `brief` — a terminal news briefing CLI

A tiny CLI that turns a topic into a morning-coffee briefing in your terminal. You type a topic, it fetches what happened recently, and prints a short summary with sources.

## Usage

```bash
brief "ai agents"
brief "openai" --since 24h
brief "rust web frameworks" --bullets 5
```

Output (stdout, plain text with minimal ANSI):

```
🗞  ai agents — last 24h

• Anthropic shipped Opus 4.7 with 1M-token context…  [1]
• LangChain released v1.0 with a rewritten runtime…  [2]
• Researchers at CMU published a paper on…           [3]

Sources
[1] https://…
[2] https://…
[3] https://…
```

## Flags

| Flag | Default | Purpose |
| --- | --- | --- |
| `--since <duration>` | `24h` | Recency window (`6h`, `24h`, `7d`) |
| `--bullets <n>` | `5` | Number of bullets |
| `--json` | off | Emit structured JSON instead of prose |

## How it works

Three steps, one file:

1. **Search** — use `@modelcontextprotocol/sdk` with `StreamableHTTPClientTransport` to connect to `https://search.parallel.ai/mcp`, then `client.callTool({ name: "web_search", arguments: { ... } })`. Do **not** hand-roll JSON-RPC over `fetch` — the Streamable HTTP transport has an initialize handshake, session headers, and SSE framing that the SDK handles. When developing, run await client.listTools() once and log the web_search input schema before the first callTool — don't guess argument names; also log the results so you know the shape of the response. Pass `--since` in the search objective.
2. **Summarize** — hand the results to an LLM with a tight prompt: "Return N bullets. One sentence each. Cite each bullet with `[n]` matching the source index. No fluff." The LLM only produces text — it does not call tools, so any Ollama chat model works regardless of its tool-calling maturity.
3. **Print** — render bullets + a numbered source list. In `--json` mode, skip rendering and dump the structured object, e.g.:

```json
{
  "topic": "ai agents",
  "since": "24h",
  "bullets": [
    { "text": "Anthropic shipped Opus 4.7…", "source": 1 }
  ],
  "sources": [
    { "n": 1, "url": "https://…", "title": "…" }
  ]
}
```

## Stack

- **Language:** TypeScript (Node), single file, `npx`-runnable.
- **Search:** Parallel Search MCP. Docs: `https://docs.parallel.ai/integrations/mcp/search-mcp`. No API key required.
- **LLM:** Use ollama + gemma4:e4b (already installed).
- **Config:** No config required.
- **Install**: installable as a brief command on PATH (e.g. via npm link)

## Non-goals

- No caching, no database, no daemon. Run it, read it, close it.
- No scheduling or delivery (cron + `brief ... | mail` is the user's job).
- No multi-topic dashboards. One topic per invocation keeps the code tiny.

## Tips
- **Parallel Search MCP** Use Parallel Search MCP yourself to find the latest documentation for any packages. Use it to look up anything you have a question on.
- **No tool-calling from the LLM** The CLI code orchestrates: it calls the MCP, then passes results as plain text to the LLM for summarization. The LLM never calls tools, so Ollama's tool-call parsing quirks are irrelevant here.
- **Test** You have all that you need to do a full end-to-end test on your own. Do this before marking as complete. After fixing any bugs, test again until you have determined `brief` is working well.
````
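To give a feel for how small the glue around the search and LLM calls can be, here is a minimal sketch of the pure parts of the three steps in the spec. It is not the code Pi generated. Every function name is hypothetical, and `SearchHit` is an assumed shape: the real `web_search` response should be inspected as the spec advises. The network calls to the MCP server and to Ollama are deliberately left out.

```typescript
// Assumed shape of one search result; the real web_search payload may differ.
interface SearchHit { url: string; title: string; excerpt: string; }
// Shapes taken from the --json example in the spec.
interface Source { n: number; url: string; title: string; }
interface Bullet { text: string; source: number; }

// Step 1 helper: turn "6h" / "24h" / "7d" into a phrase for the search objective.
function sinceToPhrase(since: string): string {
  const m = /^(\d+)([hd])$/.exec(since);
  if (!m) throw new Error(`Unsupported --since value: ${since}`);
  const count = Number(m[1]);
  const unit = m[2] === "h" ? "hour" : "day";
  return `the last ${count} ${unit}${count === 1 ? "" : "s"}`;
}

// Number the hits from 1 so bullets can cite them as [n].
function toSources(hits: SearchHit[]): Source[] {
  return hits.map((h, i) => ({ n: i + 1, url: h.url, title: h.title }));
}

// Step 2 helper: build the summarization prompt from numbered sources.
function buildSummaryPrompt(topic: string, bullets: number, hits: SearchHit[]): string {
  const numbered = hits
    .map((h, i) => `[${i + 1}] ${h.title}\n${h.url}\n${h.excerpt}`)
    .join("\n\n");
  return [
    `Topic: ${topic}`,
    `Return ${bullets} bullets. One sentence each. Cite each bullet with [n] matching the source index. No fluff.`,
    "",
    "Sources:",
    numbered,
  ].join("\n");
}

// Parse LLM lines like "• Something happened [2]" into { text, source }.
// Lines that do not look like a cited bullet are skipped.
function parseBullets(llmText: string): Bullet[] {
  const bullets: Bullet[] = [];
  for (const line of llmText.split("\n")) {
    const m = /^\s*[•\-*]\s*(.+?)\s*\[(\d+)\]\s*$/.exec(line);
    if (m) bullets.push({ text: m[1], source: Number(m[2]) });
  }
  return bullets;
}

// Step 3: render the plain-text briefing in the layout shown in the spec.
function renderBrief(topic: string, since: string, bullets: Bullet[], sources: Source[]): string {
  const lines = [`🗞  ${topic} — last ${since}`, ""];
  for (const b of bullets) lines.push(`• ${b.text}  [${b.source}]`);
  lines.push("", "Sources");
  for (const s of sources) lines.push(`[${s.n}] ${s.url}`);
  return lines.join("\n");
}
```

Keeping these pieces pure means they can be tested without Ollama running, which matters when a local model is the one doing the debugging.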

## Where the rough edges were

I ran into a few issues with this setup.

- **Errors from Ollama’s tool-call parser.** Several times, Pi stopped making progress. The `ollama` logs showed a parsing error on a tool call that was trying to read a file. Politely asking Pi to continue resolved the issue.
- **Infinite thinking loops.** Late in a session, after a lot of back-and-forth and a few file reads, the model's thinking stream would occasionally collapse into spamming the same token. Starting a fresh session fixed it every time.
- **Some sessions weren’t productive.** The model sometimes went down the wrong path and got stuck. What worked was updating the spec to name the specific packages we had found using Parallel Search, then restarting from scratch with the updated spec.
- **Models are biased against looking things up.** I had to add extra prompts telling the model to use the search MCP to look up package information and documentation, so that the code followed best practices and used the latest packages.
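The thinking-loop failure is mechanical enough that it could in principle be detected from the token stream. The sketch below is purely illustrative: neither Pi nor Ollama is known to expose this helper, and the `isStuckRepeating` name and the default window of 50 tokens are assumptions chosen for the example.

```typescript
// Returns true when the last `window` tokens are all identical,
// which is the "spamming the same token" pattern described above.
function isStuckRepeating(tokens: string[], window: number = 50): boolean {
  if (window < 2 || tokens.length < window) return false;
  const tail = tokens.slice(-window);
  return tail.every((t) => t === tail[0]);
}

// A healthy stream followed by a degenerate run of one token.
const healthy = ["The", " file", " exports", " a", " default", " function"];
const looping = healthy.concat(Array<string>(50).fill(" wait"));
const healthyStuck = isStuckRepeating(healthy); // false: too short and varied
const loopingStuck = isStuckRepeating(looping); // true: last 50 tokens are identical
```

A harness with a check like this could abort the turn early; in practice, starting a fresh session by hand did the same job.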

## Takeaways

- **Pi is a good fit when you want a minimal harness.** Four tools, extensions for everything else, and a documented event system.
- **The Parallel Search MCP is a genuinely nice shape for this.** Free, no auth, two tools, a clean interface, and results that are already concise enough for local-model context windows. It drops into any MCP-aware client, and it's a great way to make smaller models smarter by giving them up-to-date context from the web.
- **The whole stack is $0.** Local inference on hardware you already own, a free and open-source agent harness, and a free search endpoint. For solo projects, prototyping, or any workflow where you don't want a metered API hanging over every keystroke, this is hard to beat.


By Matt Harris
