AI answer engines (ChatGPT Search, Gemini, Perplexity) now read the open web at answer time and show sources.
That creates a new optimization surface: make your best information easy to find, parse, and cite.
The community proposal llms.txt (by Jeremy Howard) is a tiny Markdown file at /llms.txt that points LLMs to your most useful, LLM-readable pages. It’s used at inference time and complements—not replaces—robots.txt and sitemaps.
What is llms.txt (simple + technical)
Plain-English definition
llms.txt is a short Markdown file at https://yourdomain.com/llms.txt. It summarizes what your site covers and links to concise, LLM-friendly docs (often Markdown mirrors of key pages) so answer engines can quote you reliably. It’s a community proposal, not a formal web standard.
Technical definition
The proposal specifies a root-path file with a required H1, a short summary in a blockquote, optional context, and one or more H2 “file list” sections with items like [Descriptive name](URL) plus optional notes. You can also add an “Optional” section for secondary material.
How llms.txt works (format, sections, context files)
- Format & sections. The spec defines a consistent, human-readable Markdown layout that tools can parse deterministically.
- Markdown mirrors. The proposal recommends clean Markdown versions of your docs (often by serving page.html.md) to remove ad/JS noise and improve grounding and quoting.
- Context expansion (optional). Tooling can generate llms-ctx.txt and llms-ctx-full.txt from your llms.txt so agents and evaluators can load your content directly.
- Plays nicely with existing standards. llms.txt complements robots.txt (access policies) and sitemap.xml (discovery) by offering a curated, LLM-ready index for inference time.
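Because the layout is deterministic, a consuming tool can parse it with simple line rules. A minimal sketch in Python (the field names and return shape are our own, not part of the proposal):

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt file into its title, summary, and H2 link sections."""
    title = ""
    summary_lines = []
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# ") and not title:
            title = line[2:].strip()
        elif line.startswith("> "):
            summary_lines.append(line[2:].strip())
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            # List items look like "- [Name](URL) — optional note"
            m = re.match(r"-\s*\[([^\]]+)\]\(([^)]+)\)", line)
            if m:
                sections[current].append((m.group(1), m.group(2)))
    return {"title": title, "summary": " ".join(summary_lines), "sections": sections}

sample = """# Acme Docs
> Concise docs for Acme's analytics platform.

## Start here
- [Quickstart](https://example.com/docs/quickstart.html.md) — 5-minute setup
"""
parsed = parse_llms_txt(sample)
```

Note that only the first H1 is treated as the title and everything under an H2 is scanned for link items, which mirrors how the spec's reference tooling treats the file as a flat, ordered index.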
Benefits for GEO/AEO—and limits
Why GEO/AEO teams should care
- Higher inclusion and cleaner citations. ChatGPT Search shows inline citations and a Sources panel; Gemini can display Sources/related links. If your llms.txt points to quotable Markdown, you increase the odds of being cited.
- Faster, safer grounding. Markdown reduces parsing noise vs. ad-heavy HTML, so models can lift definitions, steps, and tables more accurately.
- Operator control. You spotlight the exact definitions/procedures you want models to rely on and can flag optional material for truncation when context is tight.
Important limits
- Not access control. Use robots.txt to manage crawlers. OpenAI documents separate bots, OAI-SearchBot (search) and GPTBot (training), and how to allow or deny each. llms.txt does not replace those controls.
- Not a ranking guarantee. llms.txt improves discoverability and quote quality; it does not guarantee inclusion or placement.
Risks & governance
- Staleness. Out-of-date mirrors can mislead engines. Add “last reviewed” notes in mirrors and automate checks in CI.
- Over-curation. If you hide caveats, models can over-generalize. Keep the summary conservative and link to primary evidence in the mirrors.
- False sense of control. Keep robots.txt directives for GPTBot and OAI-SearchBot in place and monitor server logs; llms.txt only helps engines use your content.
Framework: LLM File Protocol
A practical workflow to design, publish, and operate llms.txt at scale.
1) Map the answers you want cited
List the definitions, procedures, comparisons, and data tables you want answer engines to quote (aligned to your topic clusters). Prioritize these in your mirrors.
2) Produce LLM-readable mirrors
Automate Markdown outputs (docs generator or build step). Follow the spec’s .md convention so tools can fetch clean text reliably. Tip: put key definitions at the top of each mirror and cite your primary sources inline.
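Most teams get these mirrors for free from their docs generator's Markdown sources or a converter such as html2text. Purely to illustrate the build step, here is a stdlib-only sketch that reduces HTML to rough Markdown (headings, paragraphs, and list items only; a real pipeline would handle links, code, and tables too):

```python
from html.parser import HTMLParser

class MirrorBuilder(HTMLParser):
    """Very rough HTML -> Markdown reducer: keeps headings, paragraphs, list items."""
    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- "}

    def __init__(self):
        super().__init__()
        self.out = []
        self.prefix = ""
        self.capture = False
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIX or tag == "p":
            self.prefix = self.PREFIX.get(tag, "")
            self.capture = True
            self.buf = []

    def handle_endtag(self, tag):
        if self.capture and (tag in self.PREFIX or tag == "p"):
            text = "".join(self.buf).strip()
            if text:
                self.out.append(self.prefix + text)
            self.capture = False

    def handle_data(self, data):
        if self.capture:
            self.buf.append(data)

def html_to_markdown(html: str) -> str:
    builder = MirrorBuilder()
    builder.feed(html)
    return "\n\n".join(builder.out) + "\n"

md = html_to_markdown("<h1>Quickstart</h1><p>Install the SDK.</p><ul><li>Step one</li></ul>")
```

The point of the exercise: everything that is not headings, prose, or list content (scripts, nav, ads) simply never reaches the mirror, which is exactly the noise reduction the proposal is after.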
3) Author /llms.txt
- H1 = project/site name
- Blockquote = concise description and caveats
- H2 sections = labeled lists of must-use docs
- Add an Optional section for secondary material.
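The four rules above are easy to encode. A minimal generator sketch (the function name and input shape are our own invention, not part of any tooling):

```python
def build_llms_txt(name, summary, sections, optional=None):
    """Assemble an llms.txt body: H1 name, blockquote summary, H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    all_sections = dict(sections)
    if optional:
        all_sections["Optional"] = optional
    for heading, links in all_sections.items():
        lines.append(f"## {heading}")
        for text, url, *note in links:
            item = f"- [{text}]({url})"
            if note:
                item += f" — {note[0]}"
            lines.append(item)
        lines.append("")
    return "\n".join(lines)

doc = build_llms_txt(
    "Acme Docs",
    "Concise docs for Acme's analytics platform. Last reviewed: 2025-12-09.",
    {"Start here": [("Quickstart", "https://example.com/docs/quickstart.html.md", "5-minute setup")]},
    optional=[("Changelog", "https://example.com/docs/changelog.html.md")],
)
```

Generating the file from structured data, rather than editing it by hand, makes the "Optional" split and link notes consistent and lets CI regenerate it whenever the docs change.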
4) Generate context files (optional)
Create llms-ctx.txt and llms-ctx-full.txt from your llms.txt to provide pre-packed context for agents and evaluators.
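The llms.txt repo ships tooling for this; purely to illustrate the idea, here is a sketch that inlines each linked document, skipping the Optional section for the smaller variant. The `fetch` callable and the wrapper format are our own choices, not the official tool's:

```python
import re

def expand_to_ctx(llms_txt: str, fetch, include_optional: bool = False) -> str:
    """Concatenate the documents an llms.txt links to into one context blob.

    `fetch` is any callable mapping a URL to its text, so this sketch works
    offline; swap in an HTTP client in a real pipeline.
    """
    parts, in_optional = [], False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            in_optional = line[3:].strip().lower() == "optional"
        m = re.match(r"-\s*\[([^\]]+)\]\(([^)]+)\)", line)
        if m and (include_optional or not in_optional):
            name, url = m.group(1), m.group(2)
            parts.append(f'<doc title="{name}">\n{fetch(url)}\n</doc>')
    return "\n\n".join(parts)

docs = {"https://example.com/a.md": "Alpha body", "https://example.com/b.md": "Beta body"}
sample = "# Site\n## Core\n- [A](https://example.com/a.md)\n## Optional\n- [B](https://example.com/b.md)\n"
ctx = expand_to_ctx(sample, docs.get)             # llms-ctx.txt style: skips Optional
ctx_full = expand_to_ctx(sample, docs.get, True)  # llms-ctx-full.txt style: includes it
```

This is also why the Optional section matters: it is the truncation point when an agent's context budget is tight.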
5) Keep access controls separate
Set your robots.txt policy for GPTBot (training) and OAI-SearchBot (search). Update directives and watch server logs.
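For example, a site that wants answer-time search citations but not training crawls might use something like the following (OAI-SearchBot and GPTBot are OpenAI's documented user agents; adjust the policy to your own needs):

```
# Allow answer-time search, opt out of training crawls
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
```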
6) Measure impact (GEO/AEO)
Track inclusion & citations in ChatGPT Search and Gemini across a fixed prompt set; pair this with SERP metrics for a unified view. Both products visibly surface sources.
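A fixed prompt set makes this measurable with very little code. A hypothetical sketch (the engine names, prompts, and cited domains below are illustrative sample data, not pulled from any real API):

```python
from collections import defaultdict

# Hypothetical observations: for each engine and prompt, the domains cited.
runs = [
    ("chatgpt-search", "what is acme analytics", ["example.com", "wikipedia.org"]),
    ("chatgpt-search", "acme event schema", ["competitor.io"]),
    ("gemini", "what is acme analytics", ["example.com"]),
]

def citation_rate(runs, domain):
    """Share of prompts, per engine, whose cited sources include `domain`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for engine, _prompt, sources in runs:
        totals[engine] += 1
        hits[engine] += domain in sources
    return {engine: hits[engine] / totals[engine] for engine in totals}

rates = citation_rate(runs, "example.com")
```

Re-running the same prompt set before and after publishing llms.txt and the mirrors gives a rough before/after signal you can line up with your SERP metrics.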
Example implementation (with sample file)
Minimal /llms.txt (with natural, descriptive link text)
```
# Acme Docs
> Concise docs for Acme’s analytics platform. Use these pages for definitions, setup, and API usage. Last reviewed: 2025-12-09.
## Start here
- [Product overview](https://example.com/docs/overview.html.md) — what Acme does and core concepts
- [Quickstart](https://example.com/docs/quickstart.html.md) — 5-minute setup
## Reference
- [API reference](https://example.com/docs/api.html.md) — endpoints and auth
- [Event schema](https://example.com/docs/events.html.md) — tables and types
## How-to
- [Send events from JavaScript](https://example.com/docs/js-events.html.md)
- [Build conversion funnels](https://example.com/docs/funnels.html.md)
## Optional
- [Changelog](https://example.com/docs/changelog.html.md)
```
Pre-publish checklist
- Mirrors render cleanly; headings and tables parse as plain text.
- Each non-obvious claim in mirrors links to an official source.
- CI job regenerates mirrors and validates llms.txt structure.
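The structure-validation part of that CI job can be a few lines. A sketch of a checker, assuming only the rules described earlier (H1 title, blockquote summary, link items):

```python
import re

def validate_llms_txt(text: str) -> list:
    """Return a list of structural problems; an empty list means the file passes."""
    errors = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        errors.append("first non-blank line must be an H1 title")
    if not any(line.startswith("> ") for line in lines):
        errors.append("missing blockquote summary")
    links = re.findall(r"\[([^\]]*)\]\(([^)]*)\)", text)
    if not links:
        errors.append("no [name](url) link items found")
    for name, url in links:
        if not url.startswith(("https://", "http://", "/")):
            errors.append(f"suspicious URL in link '{name}': {url}")
    return errors

good = "# Acme Docs\n> Summary.\n## Start here\n- [Quickstart](https://example.com/q.md)\n"
assert validate_llms_txt(good) == []
assert validate_llms_txt("no heading here") != []
```

Failing the build on a non-empty error list keeps a half-edited llms.txt from shipping; a stricter job could also fetch each URL and fail on 404s or stale "last reviewed" dates.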
FAQ
Is llms.txt an official web standard?
No. It’s a community proposal (by Jeremy Howard) with an open repo and growing tooling.
Does llms.txt control AI crawlers?
No. Control access with robots.txt—including policies for GPTBot (training) and OAI-SearchBot (search). llms.txt guides models to the right materials at answer time.
Why Markdown?
It’s consistent for both people and programs, which makes deterministic parsing and quoting much easier than complex HTML.
How does this help with AI search?
ChatGPT Search shows inline citations and a Sources button; Gemini can show Sources/related links. Clean, quotable .md pages improve the likelihood and quality of citations.
Should we generate llms-ctx.txt?
If agents or evaluators need pre-packed context, yes—the repo includes tooling to generate llms-ctx.txt / llms-ctx-full.txt.
Does this replace sitemaps or structured data?
No. Keep your sitemap.xml and structured data for classic SEO; llms.txt is an extra layer that helps answer engines use your content at inference time.
You don’t need a platform overhaul to make your content quotable in AI search.
In most cases, you can ship the basics in an afternoon:
- Export clean Markdown mirrors for your top docs.
- Write a concise /llms.txt that points to them with descriptive link text.
- Keep robots.txt policies in place for GPTBot and OAI-SearchBot.
- Track citations in ChatGPT Search and Gemini across a small prompt set.