TECHNICAL

The Complete Guide to robots.txt, llms.txt, and JSON-LD Schema

Three files determine whether AI recommends your business or your competitor. Here's exactly what each one does.
By Faneros AI · March 2026 · 10 min read

Three files determine whether AI recommends your business or your competitor. That sounds reductive — and it is, slightly. There are other factors. But these three files address the most common and most impactful failure points in AI visibility, and getting them right covers more ground than any other single optimization effort.

If you do nothing else after reading this guide, create or fix these three files. The ROI on time invested is extraordinary.

How AI "Reads" Your Website

AI platforms don't browse your site the way humans do. They don't scroll through your homepage, admire your design, or click through your navigation. They send crawlers — automated programs that request your pages via HTTP, parse the returned content, and store it in a database that the AI model queries when generating responses.

This process has three stages, and each of these three files maps to exactly one stage:

- robots.txt (yoursite.com/robots.txt) — controls access: can the crawler get in?
- llms.txt (yoursite.com/llms.txt) — provides context: who are you?
- JSON-LD (embedded in the HTML <head>) — provides data: machine-readable facts

Without robots.txt allowing access, AI crawlers never see your site. Without llms.txt, AI has to piece together what your business does from raw HTML — and it often gets it wrong or gives up. Without JSON-LD, AI can't extract structured facts with the confidence needed to recommend you over a competitor whose structured data is comprehensive.

File 1: robots.txt — The Gatekeeper

Your robots.txt file sits at the root of your domain (yoursite.com/robots.txt) and is the first file every crawler checks before reading anything else on your site. It's a plain text file that uses a simple syntax to tell bots what they're allowed to read and what's off-limits.

For AI visibility, you need to explicitly allow the major AI crawlers:

# Allow AI crawlers — explicit Allow directives
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Bingbot
Allow: /

# Block only genuinely sensitive areas
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /staging/

# Point crawlers to your sitemap
Sitemap: https://yoursite.com/sitemap.xml

The critical mistake most businesses make: not having explicit Allow: / directives for AI crawlers. Many default robots.txt files either block everything with a wildcard rule or say nothing about AI crawlers — which some platforms interpret conservatively as "when in doubt, don't crawl."
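Before deploying a policy, you can sanity-check it locally with Python's standard-library urllib.robotparser — the same kind of logic a crawler applies. A minimal sketch (the trimmed policy and the bot names are illustrative):

```python
from urllib.robotparser import RobotFileParser

# A trimmed-down version of the policy shown above.
policy = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(policy.splitlines())

# GPTBot matches its own group, so the wildcard Disallow does not apply to it.
print(rp.can_fetch("GPTBot", "/admin/"))        # True
print(rp.can_fetch("SomeOtherBot", "/admin/"))  # False
print(rp.can_fetch("SomeOtherBot", "/pricing")) # True
```

Note the behavior the comment highlights: a crawler uses the most specific matching group and ignores the wildcard, which is why an explicit group per AI crawler is worth writing out.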

File 2: llms.txt — Your AI Resume

The llms.txt standard was proposed in late 2024 as a way for websites to communicate directly with AI platforms in a structured format. Think of it as the README.md of your business — but written for language models instead of developers.

It sits at yoursite.com/llms.txt and provides a concise, structured summary of your business in plain text with light Markdown formatting. A well-structured llms.txt includes a one-line description of what you do, your core services, the areas you serve, your credentials, and a Q&A section covering the questions customers actually ask.

97% of websites don't have an llms.txt file. Adding one is the highest-impact, lowest-effort GEO improvement available today.

The Q&A section deserves special attention. When someone asks ChatGPT a question about your industry, AI looks for content it can cite with confidence. Pre-formatted Q&A pairs in your llms.txt give AI exactly what it needs — authoritative answers from your business, structured in a format that's designed for extraction and citation.
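For illustration, here is a sketch of what such a file might look like. The business, services, and answers are all placeholders; the shape (an H1 title, a blockquote summary, H2 sections) follows the proposed llms.txt convention:

```markdown
# Acme Plumbing Co.

> Licensed residential and commercial plumbing serving the Springfield
> metro area since 1998. 24/7 emergency service.

## Services
- Drain cleaning, water heater repair, repiping, leak detection

## Service Area
- Springfield, Shelbyville, and surrounding counties

## Q&A
Q: How fast can you respond to an emergency call?
A: We dispatch within 60 minutes anywhere in the Springfield metro area.

Q: Are your plumbers licensed and insured?
A: Yes. Every technician is state-licensed and fully insured.
```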

File 3: JSON-LD Schema — Machine-Readable Facts

JSON-LD (JavaScript Object Notation for Linked Data) is structured data embedded in your HTML's <head> section that tells AI exactly what your business is, where it's located, what services it offers, and what credentials it holds. Unlike human-readable text that AI has to interpret (sometimes incorrectly), JSON-LD provides facts in a format AI can parse with perfect accuracy.
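To see why this format is so easy for machines, here is a short sketch — standard-library Python only — that extracts JSON-LD from a page roughly the way a crawler's parser might. The page fragment and business details are invented:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects every <script type="application/ld+json"> block in a page."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buf)))
            self._in_jsonld = False

# Invented page fragment for illustration.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness",
 "name": "Acme Plumbing Co.", "areaServed": "Springfield"}
</script>
</head><body>Welcome!</body></html>"""

extractor = JSONLDExtractor()
extractor.feed(page)
print(extractor.blocks[0]["name"])        # Acme Plumbing Co.
print(extractor.blocks[0]["areaServed"])  # Springfield
```

No guessing, no interpretation: the facts come out exactly as the site stated them.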

Key schema types for AI visibility include Organization, LocalBusiness, Service, FAQPage, and Review. But the real differentiator between generic schema and GEO-optimized schema is the inclusion of fields that AI specifically weights:

areaServed, knowsAbout, speakable, audience, alternativeHeadline, and about

These fields go beyond what Google needs for rich snippets. They're the fields that give AI platforms the confidence to recommend your business — because they provide structured answers to the exact questions AI asks itself when assembling a recommendation: "Where does this business operate?" (areaServed), "What does it specialize in?" (knowsAbout), "What content should I quote?" (speakable).
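Put together, a GEO-oriented snippet might look like this. All values are placeholders, and the property names are real schema.org terms; note that a strict rich-results validator may warn when a property such as speakable appears on a type it isn't formally defined for, since the goal here is AI extraction rather than Google rich snippets:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Acme Plumbing Co.",
  "areaServed": ["Springfield", "Shelbyville"],
  "knowsAbout": ["drain cleaning", "water heater repair", "leak detection"],
  "audience": {
    "@type": "Audience",
    "audienceType": "homeowners and property managers"
  },
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".summary", ".faq-answer"]
  }
}
</script>
```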

How These Three Files Work Together

1. robots.txt opens the door

Without it explicitly allowing access, AI crawlers never see your site. The other two files are invisible. This is why crawler access is the highest-weighted factor in any AI visibility score — it's the prerequisite for everything else.

2. llms.txt introduces you

Once the crawler is in, llms.txt gives AI a clear, authoritative summary it can use immediately. Without it, AI has to parse your HTML, interpret your navigation, and guess what your business does. With it, AI knows exactly who you are, what you offer, and how to describe you.

3. JSON-LD proves your credentials

Structured data provides the machine-readable facts that AI uses to make recommendation decisions with confidence. Your location, services, credentials, reviews, and expertise — all in a format that eliminates interpretation errors and gives AI the data it needs to recommend you accurately.

Together, these three files cover the full pipeline from "can AI find me" to "does AI understand me" to "does AI trust me enough to recommend me." Faneros generates all three — customized to your specific business, audit findings, and competitive landscape — as part of its 18-deliverable output.

See Your AI Visibility Score

Faneros scans 7 AI platforms in 60 seconds. Find out if ChatGPT, Claude, and Perplexity can see your business.

Scan My Site →