Original Research

OpenAI Crawls llms.txt Every 15 Minutes. ChatGPT Says It Doesn't Matter.

Every major AI platform publishes an llms.txt file on their own website. Their bots actively crawl yours. But ask them if it works, and they'll tell you it's unproven. We ran the numbers. It's not unproven.

By Adam C. Higdon · April 8, 2026 · 8 min read

Ask ChatGPT whether your business should implement an llms.txt file — a simple markdown document at your site root that helps large language models understand your content — and you'll get a polite, measured response. Something along the lines of: "There's currently no evidence that major AI platforms use llms.txt to determine search rankings or citations. It's an emerging proposal, but its practical impact remains unproven."

That answer sounds reasonable. Cautious, even responsible. There's just one problem.

It's contradicted by everything OpenAI actually does.

Exhibit A: OpenAI Publishes Their Own llms.txt

OpenAI doesn't just know about llms.txt. They maintain it across multiple properties. Not a placeholder. Not a test file. Comprehensive, structured, actively maintained documentation indexes designed to help LLMs parse their content.

Live llms.txt files on OpenAI domains
cdn.openai.com/API/docs/txt/llms.txt
platform.openai.com/docs/llms.txt
developers.openai.com/api/docs/llms-full.txt
developers.openai.com/codex/llms.txt
developers.openai.com/codex/llms-full.txt

Their main llms.txt file links to four sub-files — models and pricing data, guide documentation, API reference, and a combined full export. This isn't a token gesture toward an emerging standard. It's infrastructure. They built it, they maintain it, and they update it.

If llms.txt were meaningless, why invest engineering resources in deploying it across five separate URLs?

Exhibit B: They're Not Alone

OpenAI isn't an outlier. Every major AI platform that operates a search or citation engine publishes their own llms.txt.

llms.txt files across major AI platforms
docs.anthropic.com/llms.txt — Anthropic (Claude)
claude.com/llms.txt — Claude consumer app
docs.claude.com/llms.txt — Claude documentation
console.anthropic.com/llms-full.txt — API console
docs.perplexity.ai/llms.txt — Perplexity

Anthropic — the company behind Claude — maintains llms.txt across four separate domains. Perplexity, whose entire business model is AI-powered search, publishes one on their developer documentation. These aren't companies hedging their bets on an unproven standard. These are the companies building the standard by using it.

When every company in the room has one on their own website, the question isn't whether llms.txt works. The question is why they're telling you it doesn't.

Exhibit C: GPTBot Is Actively Crawling Your llms.txt

In July 2025, server log analysis by Archer Education revealed something striking: OpenAI's crawler was requesting llms.txt files on third-party websites at a sustained cadence — hitting the file every 15 minutes and checking for freshness. The crawling IP traced directly to OpenAI's documented infrastructure.

This wasn't a one-off. The analysis showed GPTBot requesting llms.txt across multiple sites, including smaller niche properties that wouldn't normally warrant frequent crawling. More revealing: GPTBot — the crawler specifically designated for model training, not search — was the one making these requests. OpenAI found the file valuable enough to ingest it into their training pipeline.
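You can run the same kind of check on your own server logs. A minimal sketch, assuming combined log format and the bot names the platforms publicly document:

```python
import re
from collections import Counter

# Matches the request path and user agent in a combined-format access log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

AI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

def count_llms_txt_hits(log_lines):
    """Count requests for llms.txt, broken down by AI crawler."""
    hits = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m or not m.group("path").endswith("llms.txt"):
            continue
        for bot in AI_BOTS:
            if bot in m.group("ua"):
                hits[bot] += 1
    return hits
```

Pairing the counts with request timestamps is what reveals the crawl cadence; the 15-minute interval in the Archer analysis came from exactly that kind of aggregation.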

They're not reading your llms.txt out of curiosity. They're ingesting it into the models that generate the answers your customers see.

Meanwhile, Google's Gary Illyes stated publicly that Google doesn't support llms.txt and isn't planning to. You can disagree with Google's position, but at least it's consistent. They don't publish one, and they don't crawl yours. OpenAI publishes theirs, crawls yours, and tells you through ChatGPT that there's no evidence it matters.

That's not caution. That's a conflict of interest.

The Data: 2.1× More AI Citations

Theory and server logs are compelling. But we wanted numbers. So we ran them.

In April 2026, Faneros conducted a technical AI readiness audit of 10 highly competitive personal injury law firms in the Chicago market. We tested 17 high-intent queries across 7 major AI platforms — ChatGPT, Claude, Perplexity, Gemini, Grok, Copilot, and Google AI Overview — and counted every time each firm was mentioned or recommended. Then we crawled every firm's live website, validating schema markup, robots.txt rules, llms.txt presence, and content extractability.

The results were unambiguous.

Firms WITH llms.txt: 38.75 avg. AI mentions
Firms WITHOUT llms.txt: 18.3 avg. AI mentions
Visibility multiplier: 2.1×

Firms with an llms.txt file averaged more than double the AI mentions of firms without one. This held even when controlling for authority. One mid-tier firm with fewer pages and less brand recognition than a competitor outperformed that competitor 2.5× (31 mentions vs. 12), driven almost entirely by technical optimization: a clean llms.txt, explicit AI bot permissions in robots.txt, validated schema, and named author attribution on every blog post.

On the other end of the spectrum, a firm whose lead attorney was ranked #1 in the state for over 15 consecutive years received only 13 AI mentions. Billions in recoveries, top-tier peer recognition, unmatched courtroom authority — and they tied for the bottom of the ranking. No llms.txt. No FAQ schema. Honeypot security folders confusing legitimate crawlers. World-class authority, technically invisible to AI.

Technical Factor      | Firms With | Avg. Mentions | Firms Without | Avg. Mentions | Multiplier
llms.txt present      | 4          | 38.75         | 6             | 18.3          | 2.1×
Working FAQ schema    | 2          | 56.5          | 8             | ~20           | ~2.8×
Explicit AI bot rules | 1          | 31            | 9             | ~22           | 1.4×
Zero schema errors    | 8          | ~31           | 2             | 12.5          | 2.5×
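The multiplier column is simply the ratio of the two averages, easy to sanity-check:

```python
# Ratio of average AI mentions: firms with llms.txt vs. firms without.
with_llms, without_llms = 38.75, 18.3
print(f"{with_llms / without_llms:.1f}x")  # 2.1x
```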

A note on methodology: this is a focused sample of 10 firms in a single geography and practice area. The sample size is intentionally tight to control variables — all 10 firms compete for the same queries in the same market. While broader studies will add further validation, the 2.1× signal is clear and consistent within the dataset.

The Framework: Authority Is the Ceiling. Machine Readability Is the Floor.

There's an ongoing debate in the GEO community about what drives AI visibility. One camp says it's all about E-E-A-T — experience, expertise, authoritativeness, trustworthiness. The legacy authority signals that have dominated SEO for a decade. The other camp says it's about machine readability — clean schema, extractable content, explicit crawler permissions, llms.txt.

Our data resolves this. They're not competing theories. They're two different axes.

Authority sets the ceiling. It determines your maximum potential visibility. Record verdicts, industry awards, institutional leadership, peer recognition — these signals still matter. No amount of technical optimization will make an unknown firm dominate AI recommendations without substance behind it.

Machine readability sets the floor. It determines how much of your existing authority actually reaches AI audiences. A firm with extraordinary credentials but a technically opaque website will be invisible. A firm with moderate credentials but pristine technical infrastructure will punch well above its weight.

The firms getting crushed right now aren't firms without authority. They're firms whose authority is locked behind broken schema, missing llms.txt files, and JavaScript-heavy pages that AI crawlers can't parse. The content exists. The reputation exists. The wrapper is broken.

Why the Platforms Won't Tell You This

Understanding why OpenAI, Anthropic, and Perplexity publish their own llms.txt files while their chatbots downplay the practice requires understanding incentives.

These platforms are training their models on the open web. The more freely they can crawl and ingest content, the better their models perform. An llms.txt file is essentially a curated guide saying: here's our best content, structured exactly how you need it. Of course they want that file on every website they crawl. It makes their job easier and their models smarter.

But if they officially endorsed llms.txt as a ranking factor — if they said publicly "yes, sites with llms.txt get more citations" — two things would happen. First, everyone would implement one, which reduces the competitive advantage for early adopters and potentially floods the system with low-quality implementations. Second, it would create an explicit optimization target, inviting the same kind of gaming that plagued traditional SEO for years.

So they do what any rational platform operator does: they benefit from it silently while publicly maintaining plausible deniability.

The same playbook search engines have run for two decades. "We don't comment on specific ranking factors." Meanwhile, every practitioner who studies the data knows exactly what moves the needle.

What You Should Actually Do

Stop asking AI chatbots whether their own optimization levers work. Watch what the platforms do, not what they say.

Here's what the evidence supports:

Deploy an llms.txt file. Put it at your site root. Include clear content summaries, key pages, practice area overviews, and links to your most important content in markdown format. OpenAI's own implementation is a useful reference for structure.
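Per the llms.txt proposal, the file is plain markdown: an H1 with the site name, a blockquote summary, then H2 sections of annotated links. A minimal sketch for a hypothetical firm (all names and URLs illustrative):

```markdown
# Example Injury Law Firm

> Chicago personal injury firm handling car accidents, medical
> malpractice, and workers' compensation claims since 1995.

## Practice Areas

- [Car Accidents](https://www.example.com/car-accidents): Claims process, filing deadlines, and typical recoveries
- [Medical Malpractice](https://www.example.com/medical-malpractice): How malpractice claims are evaluated and filed

## Resources

- [FAQ](https://www.example.com/faq): Answers to the questions clients ask most

## Optional

- [Attorney Bios](https://www.example.com/attorneys): Backgrounds and case results
```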

Update your robots.txt. Explicitly name and allow the major AI user agents — GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, CCBot. This is a direct signal that your site welcomes AI crawling. In our audit, the single firm that did this outperformed its authority class by a wide margin.
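A robots.txt fragment that does this might look like the following (the user-agent tokens are the ones the platforms document; the sitemap URL is illustrative):

```
# Explicitly allow the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```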

Fix your schema. Validate every JSON-LD block. Deploy page-specific FAQ schema on high-intent pages. Add Article or BlogPosting schema with named authors and dates on every content piece. Eliminate duplicate blocks and parse errors. In our data, firms with zero schema errors averaged 2.5× the mentions of firms with errors.
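An FAQ block is a small amount of JSON-LD in the page markup. A minimal sketch (question and answer text illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How long do I have to file a personal injury claim?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Filing deadlines vary by state and claim type, so speak with an attorney as soon as possible after an injury."
      }
    }
  ]
}
```

Run it through a schema validator before deploying; a parse error here is worse than no schema at all.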

Make your content extractable. Lead with concise, question-based copy. Don't bury your best content behind JavaScript fragments. When an AI crawler hits your page, the first readable paragraph should be your most valuable one.
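One quick way to see your page the way a non-rendering crawler does is to strip the raw HTML (before any JavaScript runs) and look at the first substantial text block. A rough sketch, where the length threshold is an arbitrary heuristic:

```python
import re

def first_readable_paragraph(html):
    """Return the first substantial text block visible in raw, pre-JavaScript HTML."""
    # Drop script/style bodies, then replace tags with spaces -- crude, but
    # it approximates what a crawler that does not execute JavaScript sees.
    html = re.sub(r"(?is)<(script|style).*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    for chunk in re.split(r"\s{2,}|\n", text):
        chunk = chunk.strip()
        if len(chunk) > 80:  # heuristic: long enough to be a real sentence
            return chunk
    return None
```

If this returns navigation cruft, cookie-banner text, or nothing at all, that is roughly what an AI crawler extracts from the page.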

Most of these fixes take days, not months. Some take minutes. The llms.txt file itself can be deployed in an afternoon.

The Bottom Line

We are in the early innings of a generational shift in how information is discovered, evaluated, and recommended. The companies building the AI platforms have already made their infrastructure decisions — they use llms.txt on their own properties, their bots actively crawl it on yours, and the correlation with visibility is measurable and consistent.

The only people saying it doesn't work are the chatbots those same companies built.

Believe the data. Not the chatbot.

Don't ask the machine if the machine reads your instructions. Watch the server logs. Count the citations. Follow the evidence.

How visible is your business to AI?

Faneros scans 7 major AI platforms and delivers a full technical readiness audit — llms.txt, schema, robots.txt, content extractability, and competitive positioning.

Request a Free AI Visibility Scan