

TL;DR: We're launching LLMTEXT, an open source toolkit that helps developers create, validate, and use llms.txt files—making any website instantly accessible to AI agents through standardized markdown documentation and MCP servers.

Wikipedia recently reported[recently reported]($https://flowingdata.com/2025/10/27/wikipedia-losing-human-views-to-ai-summaries/#:~:text=Topic,and%20a%20lot%20of%20bots) an 8% decline in human visitors, and AI has overtaken humans[AI has overtaken humans]($https://cpl.thalesgroup.com/blog/access-management/ai-bots-internet-traffic-imperva-2025-report) as the primary user of the web this year. As LLMs increasingly become the primary way people interact with online information, websites face a critical challenge: how do you serve both human visitors and AI agents effectively?
Today, we’re proud to support the launch of LLMTEXT[LLMTEXT]($http://llmtext.com), an open source toolkit by Jan Wilmake[Jan Wilmake]($https://x.com/janwilmake) to help grow the llms.txt standard. With these tools, developers can more easily create llms.txt files for their websites, check their website’s existing llms.txt for validity, or turn any existing llms.txt into a dedicated MCP (Model Context Protocol) server.
Jeremy Howard[Jeremy Howard]($https://x.com/jeremyphoward) introduced the llms.txt standard[llms.txt standard]($https://llmstxt.org) to make websites friendlier to large language models (LLMs) by giving them access to Markdown files that contain the site’s most important text, explicitly excluding the distracting elements that would otherwise fill up their context windows. The spec has already been adopted by companies like Anthropic, Cloudflare, Docker, HubSpot, and many others.
At Parallel, we believe that AIs will soon be the primary users of the web, which is why we support initiatives like llms.txt that introduce new standards and frameworks to embrace that future.
The three LLMTEXT tools we're releasing today serve two purposes. First, the **llms.txt MCP** helps developers use projects without hallucination by giving them a dedicated MCP server for every library or API they use that supports llms.txt. Second, the **Check tool** and **Create tool** help websites serve their users the best possible experience. Let's dive into each tool.
The **llms.txt MCP** turns any public llms.txt into a dedicated MCP server. You can think of this like Context7, but instead of one MCP server for all docs, it’s a narrowly focused MCP server for a single website, making it easier to get the right context for products you use often. It also works fundamentally differently: where Context7 uses vector search to determine what’s relevant, the **llms.txt MCP** leverages the reasoning of the LLM, using the llms.txt overview to decide which documents to ingest into the context window.
Many developers already use llms.txt or the linked markdown files by manually copying them into their context window; the llms.txt MCP streamlines this process with one-click installation and clear instructions that tell the LLM how to ingest the right context when needed. The MCP exposes two tools.
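To make the pattern concrete, here is a minimal sketch of an MCP server wrapping a single llms.txt, assuming one tool returns the overview and another fetches a linked document. The tool names, URL, and details are illustrative, not the actual LLMTEXT implementation:

```typescript
// Minimal sketch (not the actual LLMTEXT implementation): an MCP server that
// wraps a single llms.txt. One tool returns the overview, the other fetches a
// linked page so the model can pull in only the documents it needs.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical target; any spec-compliant llms.txt would work the same way.
const LLMS_TXT_URL = "https://example.com/llms.txt";

const server = new McpServer({ name: "llms-txt", version: "0.1.0" });

// Tool 1: return the llms.txt overview (the table of contents).
server.tool("get_overview", "Return the site's llms.txt overview", async () => {
  const text = await (await fetch(LLMS_TXT_URL)).text();
  return { content: [{ type: "text" as const, text }] };
});

// Tool 2: fetch one markdown document linked from the overview, chosen by the model.
server.tool(
  "get_document",
  "Fetch a markdown page linked from the llms.txt",
  { url: z.string().url() },
  async ({ url }) => {
    const text = await (await fetch(url)).text();
    return { content: [{ type: "text" as const, text }] };
  }
);

await server.connect(new StdioServerTransport());
```

Because the overview stays small, the model can reason over it directly and pull only the documents it actually needs into context.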
The **llms.txt MCP** can be installed for any llms.txt that follows the standard[the standard]($https://llmstxt.org).
When building the llms.txt MCP and trying it out on some of the available MCPs[some of the available MCPs]($https://github.com/thedaviddias/llms-txt-hub), we found that many of them were invalid according to the llms.txt prescribed format[llms.txt prescribed format]($https://llmstxt.org/) for various reasons.
The goal of your llms.txt should be to give LLMs the best possible overview: a table of contents they can use to determine where to look for the right information. Or, as our Co-founder and Head of Product, Travers[Travers]($https://x.com/travers00/status/1975947045497344162), puts it, the goal is to retrieve the tokens that agents need to answer or to make the next best decision in a loop. This means your llms.txt should have clear, distinct titles and descriptions for each page, and the individual pages shouldn't be too long.
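For reference, a spec-compliant llms.txt is just a small markdown file: an H1 with the site name, a blockquote summary, and H2 sections listing links with short descriptions. The example below is purely illustrative (the URLs and descriptions are made up):

```markdown
# Example Project

> One-sentence summary of what the project does and who it is for.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install the SDK and make a first request
- [API reference](https://example.com/docs/api.md): Endpoints, parameters, and error codes

## Optional

- [Changelog](https://example.com/changelog.md): Release history, safe to skip for most questions
```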
To incentivize companies to fix their llms.txt and to ensure our users get only the best quality, LLMTEXT[LLMTEXT]($http://llmtext.com) only allows installing MCP servers that adhere fully to the spec. Here are the most common mistakes we found in llms.txt files, with examples from popular websites:
To get the most out of llms.txt, documents should be token-efficient. For example, https://developers.cloudflare.com/llms.txt[https://developers.cloudflare.com/llms.txt]($https://developers.cloudflare.com/llms.txt) weighs in at 36,000 tokens for just the table of contents, imposing a very large minimum token cost on every request.
Another example is https://docs.cursor.com/llms.txt[https://docs.cursor.com/llms.txt]($https://docs.cursor.com/llms.txt), which serves links to versions in several languages. This isn't succinct and creates unnecessary overhead for an LLM that already understands most languages.
To make token usage efficient when wading through context, it's best if the llms.txt itself is not bigger than the pages being linked to. If it is, it becomes a significant addition to the context window every time you want to retrieve a piece of information.
Another example is https://supabase.com/llms.txt[https://supabase.com/llms.txt]($https://supabase.com/llms.txt), where the first document linked contains approximately 800,000 tokens, which is far too large for most LLMs to process. As a rule of thumb, we recommend keeping both the llms.txt and all linked documents under 10,000 tokens.
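A rough way to check that budget is sketched below, using a crude characters-divided-by-four estimate rather than a real tokenizer; this is only an illustration, not the LLMTEXT Check tool:

```typescript
// Rough token-budget check for an llms.txt and the pages it links to.
// Uses a crude chars/4 estimate instead of a real tokenizer; the 10,000-token
// budget mirrors the rule of thumb above.
const BUDGET = 10_000;
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

async function checkTokenBudget(llmsTxtUrl: string): Promise<void> {
  const overview = await (await fetch(llmsTxtUrl)).text();
  console.log(`llms.txt: ~${estimateTokens(overview)} tokens (budget ${BUDGET})`);

  // Markdown links look like [title](https://...)
  const links = [...overview.matchAll(/\[[^\]]+\]\((https?:\/\/[^\s)]+)\)/g)].map((m) => m[1]);
  for (const url of links) {
    const doc = await (await fetch(url)).text();
    const tokens = estimateTokens(doc);
    if (tokens > BUDGET) console.warn(`over budget: ${url} (~${tokens} tokens)`);
  }
}

checkTokenBudget("https://example.com/llms.txt");
```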
The llms.txt itself, as well as the links it refers to, must lead to a text/markdown or text/plain response. This is the most common mistake in llms.txt files today.
For example, https://www.bitcoin.com/llms.txt[https://www.bitcoin.com/llms.txt]($https://www.bitcoin.com/llms.txt) and https://docs.docker.com/llms.txt[https://docs.docker.com/llms.txt]($https://docs.docker.com/llms.txt) both return HTML for every document they link to, and https://elevenlabs.io/llms.txt[https://elevenlabs.io/llms.txt]($https://elevenlabs.io/llms.txt), while listed in some registries, responds with an HTML document itself.
In many cases, the content-type is text/plain or text/markdown, yet it can't be parsed according to the spec[the spec]($https://llmstxt.org/). For example, https://cursor.com/llms.txt[https://cursor.com/llms.txt]($https://cursor.com/llms.txt) just lists raw URLs without markdown link format, https://console.groq.com/llms.txt[https://console.groq.com/llms.txt]($https://console.groq.com/llms.txt) does not present its links in an h2 markdown section (##), and https://lmstudio.ai/llms.txt[https://lmstudio.ai/llms.txt]($https://lmstudio.ai/llms.txt) returns all documents directly, concatenated.
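The formatting and content-type rules can be approximated the same way. The sketch below checks a few of them; again, it is illustrative rather than the actual Check tool:

```typescript
// Illustrative approximation of the format rules above: the llms.txt and every
// linked document should come back as text/plain or text/markdown, and the
// file itself should use H2 sections and markdown-formatted links.
async function checkFormat(llmsTxtUrl: string): Promise<void> {
  const res = await fetch(llmsTxtUrl);
  const contentType = res.headers.get("content-type") ?? "";
  if (!/text\/(markdown|plain)/.test(contentType)) {
    console.warn(`llms.txt is served as "${contentType}", expected text/markdown or text/plain`);
  }

  const body = await res.text();
  if (!/^##\s+/m.test(body)) console.warn("no H2 (##) sections found");

  const links = [...body.matchAll(/\[[^\]]+\]\((https?:\/\/[^\s)]+)\)/g)].map((m) => m[1]);
  if (links.length === 0) console.warn("no markdown-formatted links found");

  // Linked documents must not come back as HTML either.
  for (const link of links) {
    const head = await fetch(link, { method: "HEAD" }); // some servers may not support HEAD
    const ct = head.headers.get("content-type") ?? "";
    if (!/text\/(markdown|plain)/.test(ct)) console.warn(`${link} returns "${ct}"`);
  }
}

checkFormat("https://example.com/llms.txt");
```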
Many companies ended up not serving their llms.txt at the root. For example, https://www.mintlify.com/docs/llms.txt[https://www.mintlify.com/docs/llms.txt]($https://www.mintlify.com/docs/llms.txt) and https://nextjs.org/docs/llms.txt[https://nextjs.org/docs/llms.txt]($https://nextjs.org/docs/llms.txt) are not hosted at the root, making them hard to find programmatically.
Most websites aren't adapted to the AI internet yet and instead serve HTML content intended for humans. Most CMS systems don't support creating Markdown versions of pages either. There are several llms.txt generators available (hosted services as well as libraries), but many are specific to a certain framework, and many don’t actually follow the llms.txt spec.
For example, some tools just create the llms.txt file itself, but don't refer to plain text or markdown variants of the pages.
The extract-from-sitemap[extract-from-sitemap]($https://github.com/janwilmake/llmtext-mcp/tree) tool is a framework-agnostic way to generate an llms.txt from multiple sources. It scrapes all needed pages and turns them into markdown, powered by the new Parallel Extract API[Parallel Extract API]($https://docs.parallel.ai/api-reference/search-and-extract-api-beta/extract) (beta). We used this library to create our own llms.txt[our own llms.txt]($https://parallel.ai/llms.txt), which is also available through this repo[this repo]($https://github.com/parallel-web/parallel-llmtext) as a reference, and installable as MCP[installable as MCP]($https://installthismcp.com/parallel-llmtext-mcp?url=https://mcp.llmtext.com/parallel.ai/mcp) for those building with Parallel's APIs.
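For a sense of how the approach works, here is a hedged sketch of the sitemap-to-llms.txt idea. The real extract-from-sitemap tool uses the Parallel Extract API to turn each page into markdown and derive titles and descriptions; this sketch stubs those parts out:

```typescript
// Sketch of generating an llms.txt skeleton from a sitemap. The markdown
// conversion and the real titles/descriptions would come from the Parallel
// Extract API (or any HTML-to-markdown step); here they are placeholders.
async function sitemapToLlmsTxt(sitemapUrl: string, siteName: string): Promise<string> {
  const xml = await (await fetch(sitemapUrl)).text();
  const urls = [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1]);

  const entries = urls.map((url) => {
    // Placeholder title derived from the URL path; a real generator would use
    // the extracted page content instead, and each entry should point at a
    // URL that serves a markdown version of the page.
    const title = new URL(url).pathname.split("/").filter(Boolean).pop() ?? "Home";
    return `- [${title}](${url}): ${title}`;
  });

  return ["# " + siteName, "", "> Auto-generated overview of this site.", "", "## Pages", "", ...entries, ""].join("\n");
}

sitemapToLlmsTxt("https://example.com/sitemap.xml", "Example").then(console.log);
```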
This is just the beginning. We hope that the llms.txt standard thrives and evolves into an even more valuable standard with many use cases. We've already started improving the tooling and adding utilities, and hope to see the open source community contribute as well.
Jan has been an active OSS developer building dev tools in the AI context management space. His work includes uithub.com[uithub.com]($http://uithub.com), a context ingestion tool for GitHub, and openapi-mcp-server[openapi-mcp-server]($https://github.com/janwilmake/openapi-mcp-server), which lets you ingest from a full API specification just the operations you’re interested in, following a pattern very similar to how the llms.txt MCP works.
Parallel develops critical web search infrastructure for AI. Our suite of web search and agent APIs is built on a rapidly growing proprietary index of the global internet. These solutions transform human tasks that previously took weeks into agentic tasks that now take just minutes.
Parallel’s search and agent APIs are used by Fortune 100 companies in insurance, finance, and retail, as well as by AI-first businesses like Clay, Starbridge, and Sourcegraph.

By Parallel
October 30, 2025