llms.txt Playbook for B2B Data Brokers

llms.txt is a curated map of what you want language models and retrieval tools to read first. It is not a second privacy policy, not a keyword dump, and not a workaround for weak product pages. For data brokers, the best files read like a table of contents to defensible facts: MAID Feed delivery specs, POI geofencing coverage claims tied to dictionaries, trust and registration pages, comparison rubrics, and ten to twenty flagship resources. GSDSI serves /llms.txt and /llms-full.txt beside the SPA shell and regenerates both whenever routes change, in line with AI search readiness. Pair the playbook with robots.txt policy for AI agents so machines receive consistent instructions.

Key Takeaways

Keep the short llms.txt under ~80 lines; link out to llms-full.txt or the sitemap for long catalogs.
Lead with commercial truth pages (products, pricing posture, pilot process, trust), not blog archives alone.
Never contradict robots.txt or noindex routes; mixed signals produce hallucinated or stale citations.
Refresh on the same cadence as sitemap: quarterly minimum, immediately after slug or registration changes.
Anchor text should match visible H1s on prerendered HTML so models quote numbers legal will defend.

Recommended Sections in llms.txt

Organize the file in the order a procurement analyst thinks: who you are, what you sell, how you govern data, how you compare, and what resources prove claims. Each line should be a fully qualified HTTPS URL on your canonical host. Avoid relative paths: some fetchers resolve them incorrectly across CDNs.

Optional but useful: a Changelog section with dated bullets when registrations or panel counts change. Models rarely read it, but human analysts do. Keep changelog entries short and link to the authoritative page for detail. Do not paste entire privacy policies into llms.txt; link to the privacy policy instead.

Company: /company, /contact, /careers for entity resolution.
Products: top SKUs with one-line coverage tied to data dictionaries; include Core Email and tickerized data when finance buyers are core ICP.
Solutions: buyer-motion pages such as cross-channel measurement, risk management, and B2B prospecting.
Trust: /trust, /trust/data-broker-registrations, privacy policy, sourcing methodology.
Comparisons: /comparisons and category compare pages with rubric language.
Resources: newest procurement and compliance guides, including this playbook and GDPR Art. 14.
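The section order above can be sketched as a small generator. This is a minimal illustration, not GSDSI's actual build step: it assumes the Markdown layout described in the public llms.txt proposal (H1 title, blockquote summary, H2 sections, one link per line), and the host www.example.com, the function names, and the sample descriptions are all hypothetical. It rejects any entry that is not an absolute HTTPS URL on the canonical host, which enforces the "no relative paths" rule at build time.

```python
from urllib.parse import urlparse

CANONICAL_HOST = "www.example.com"  # hypothetical canonical host


def is_canonical_https(url: str, host: str = CANONICAL_HOST) -> bool:
    """True only for absolute HTTPS URLs on the canonical host."""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.netloc == host


def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render sections (heading -> list of (name, url, note)) in insertion order."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            # Relative paths and off-host URLs fail the build instead of shipping.
            if not is_canonical_https(url):
                raise ValueError(f"not a canonical HTTPS URL: {url}")
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_llms_txt(
        "Example Data Co",
        "Curated index of product, trust, and comparison pages.",
        {
            "Company": [("Company", "https://www.example.com/company", "entity facts")],
            "Trust": [("Trust", "https://www.example.com/trust", "registrations and sourcing")],
        },
    ))
```

Because Python dictionaries preserve insertion order, the order in which sections are declared is the order a reader sees, so the procurement-analyst sequence stays under editorial control.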

Mistakes That Break Citations

Listing URLs that 307 to new paths, pointing to noindex search shells, or embedding unverifiable superlatives causes models to quote stale or inflated claims. The procurement glossary and live glossary reduce vocabulary drift between humans and agents. Do not park confidential API specs in llms.txt: if it is not safe on a public product page, it does not belong in the machine index.

Common Syntax and Hosting Failures

Another failure mode is volume mismatch: homepage says 301M+ devices while llms.txt says 250M. Models blend figures. Centralize counts in one internal SSOT and mirror exactly in llms.txt, prerendered HTML, and Dataset JSON-LD. See quotable catalog stats for discipline patterns.
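One way to catch volume mismatches before release is to extract device-count claims from each published artifact and compare them. The sketch below is illustrative only: the regular expression assumes claims are written like "301M+ devices", and the artifact names and function names are hypothetical. A real check would read the counts from whatever SSOT you maintain.

```python
import re

# Matches claims such as "301M+ devices" or "1.2B devices" (assumed phrasing).
COUNT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([MB])\+?\s+devices", re.IGNORECASE)


def device_counts(text: str) -> set:
    """Return normalized device-count claims, e.g. {'301M'}, found in text."""
    return {f"{num}{unit.upper()}" for num, unit in COUNT_PATTERN.findall(text)}


def find_mismatches(artifacts: dict) -> dict:
    """Map each artifact name to its claims, but only when artifacts disagree."""
    claims = {name: device_counts(body) for name, body in artifacts.items()}
    distinct = set().union(*claims.values()) if claims else set()
    return claims if len(distinct) > 1 else {}


if __name__ == "__main__":
    report = find_mismatches({
        "homepage": "Coverage across 301M+ devices.",
        "llms.txt": "MAID Feed: 250M devices.",
    })
    for name, found in report.items():
        print(name, sorted(found))
```

An empty result means every artifact that states a count states the same one; any non-empty result should block the release until the figures are reconciled.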

Syntax matters: llms.txt is Markdown-oriented, so a leading # starts a heading rather than a comment. Use # section headers sparingly and keep URLs one per line. Avoid HTML entities in descriptions, because some fetchers strip them unpredictably. When you rename a product slug, add a 301 and update llms.txt in the same commit that updates the CPG Feed or specialized segments prerender bodies.

CDNs that strip unknown paths can return 404 for /llms.txt in some regions, so test from both EU and US vantage points. Content-Type should be text/plain; charset=utf-8. Compression is fine; redirects on the file itself are not.
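The hosting checks can be expressed as a small validator. This is a sketch under stated assumptions: the function names are hypothetical, the expected Content-Type is the one recommended in this playbook, and the fetch helper only observes the response from wherever it runs, so it would need to be executed from both EU and US runners to cover regional CDN behavior. http.client is used because it does not follow redirects automatically, which makes a 3xx on /llms.txt visible.

```python
import http.client

EXPECTED_TYPE = "text/plain;charset=utf-8"  # compared after normalizing case and spaces


def check_llms_response(status: int, headers: dict) -> list:
    """Return a list of human-readable problems; an empty list means the response passes."""
    problems = []
    if 300 <= status < 400:
        problems.append(f"redirect ({status}): serve /llms.txt directly")
    elif status != 200:
        problems.append(f"unexpected status {status}")
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "").lower().replace(" ", "")
    if content_type != EXPECTED_TYPE:
        problems.append(f"Content-Type is {content_type or 'missing'}")
    return problems


def fetch_status_and_headers(host: str, path: str = "/llms.txt"):
    """Fetch once over HTTPS without following redirects."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, dict(response.getheaders())
    finally:
        conn.close()


if __name__ == "__main__":
    # www.example.com is a placeholder host, not a real deployment.
    status, headers = fetch_status_and_headers("www.example.com")
    print(check_llms_response(status, headers))
```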

Alignment With robots.txt and Sitemap

Before each release, diff three artifacts: robots.txt, sitemap.xml, and llms.txt. Any URL in llms.txt should be allowed by robots (unless you intentionally document an exception) and listed in sitemap when indexable. Google's sitemap guidelines and the llms.txt proposal are the external references to cite in engineering tickets.

Staging environments should either disallow all crawlers or mirror production rules: half-open staging with real product copy is how competitors and models capture draft claims. Use noindex on staging hosts and separate llms.txt if you must expose staging to buyers under VPN.

Export sitemap URLs to CSV; flag llms.txt lines missing from sitemap.
Flag robots-disallowed paths that still appear in marketing llms.txt.
Run no-JS fetch on three linked product pages: confirm 500+ words and H1 in prerender.
Archive prior llms.txt in git; citations drift when URLs disappear without redirects.
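The first two checklist items can be automated with the Python standard library. The sketch below is one possible shape, not a prescribed tool: it assumes a standard sitemaps.org urlset (not a sitemap index), takes the three artifacts as already-downloaded strings, and uses hypothetical function names. It prints a report rather than a CSV to stay short.

```python
import re
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Stops at whitespace or closing Markdown/angle brackets around a link.
URL_PATTERN = re.compile(r"https://[^\s)>\]]+")


def llms_urls(llms_text: str) -> set:
    """Every HTTPS URL that appears in llms.txt."""
    return set(URL_PATTERN.findall(llms_text))


def sitemap_urls(sitemap_xml: str) -> set:
    """Every <loc> value in a sitemaps.org urlset document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def audit(llms_text: str, sitemap_xml: str, robots_text: str, agent: str = "*") -> dict:
    """Flag llms.txt URLs missing from the sitemap or disallowed by robots.txt."""
    robots = RobotFileParser()
    robots.parse(robots_text.splitlines())
    listed = llms_urls(llms_text)
    return {
        "missing_from_sitemap": sorted(listed - sitemap_urls(sitemap_xml)),
        "blocked_by_robots": sorted(u for u in listed if not robots.can_fetch(agent, u)),
    }


if __name__ == "__main__":
    report = audit(
        "- [Trust](https://www.example.com/trust): registrations\n"
        "- [Search](https://www.example.com/search): shell\n",
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.example.com/trust</loc></url></urlset>",
        "User-agent: *\nDisallow: /search\n",
    )
    print(report)
```

Pass the user-agent string of a specific AI crawler as agent to audit against the rules that crawler would actually see, since robots.txt groups can differ per agent.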

Operational Checklist for Marketing and Engineering

Assign ownership: marketing curates priority order, engineering validates HTTP status and prerender, legal reviews claims in one-line descriptors. On publish day, submit IndexNow or Search Console updates for new resources. Staging should serve the same llms.txt host policy as production: buyers' security tools fetch staging during pilots.

Automate validation in CI: a script should fail the build if llms.txt contains a URL that returns 404 or sits behind a redirect chain (307 or otherwise) longer than one hop. Pair this with AI agent crawling policy checks so marketing does not list URLs in llms.txt that robots.txt blocks.
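A minimal sketch of such a CI check follows. To keep it testable without network access, the HTTP call is injected as a fetch callable that returns a status code and the Location header (or None); in a real pipeline that callable would wrap an HTTP client with automatic redirects disabled. All names here are hypothetical.

```python
import sys
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def count_hops(url, fetch, max_follow=5):
    """Follow redirects manually; return (final_status, redirect_hops)."""
    hops = 0
    status, location = fetch(url)
    while status in REDIRECT_CODES and location and hops < max_follow:
        hops += 1
        url = urljoin(url, location)  # Location may be relative
        status, location = fetch(url)
    return status, hops


def failing_urls(urls, fetch, max_hops=1):
    """Return one message per URL that 404s or exceeds the allowed redirect hops."""
    failures = []
    for url in urls:
        status, hops = count_hops(url, fetch)
        if status == 404:
            failures.append(f"{url}: 404")
        elif hops > max_hops:
            failures.append(f"{url}: {hops} redirect hops")
    return failures


if __name__ == "__main__":
    # Stand-in responses; replace with a real no-redirect fetcher in CI.
    table = {
        "https://www.example.com/products": (200, None),
        "https://www.example.com/old": (307, "/older"),
        "https://www.example.com/older": (301, "/products"),
    }
    problems = failing_urls(["https://www.example.com/old"], lambda u: table[u])
    print("\n".join(problems))
    sys.exit(1 if problems else 0)  # non-zero exit fails the build
```

A single hop is tolerated here to match the threshold above, but updating llms.txt to the final URL in the same commit as the redirect remains the cleaner fix.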

Quarterly review calendar tied to product launches and registration updates.
Include AI search readiness in release checklist.
Add llms.txt URL to RFP attachments so procurement teams know where to look.
Train sales engineers to paste llms.txt only alongside human context, not as a substitute for contracts.

How Buyers Should Use a Vendor's llms.txt

Treat llms.txt as a starting index, not due diligence completion. Cross-check three product claims against contract exhibits and seed tests. If the vendor blocks AI crawlers in robots.txt but promotes llms.txt heavily, ask why. See AI agent crawling. For location programs, validate FTC sensitive location thresholds on mobility samples before trusting geofence marketing copy.

Procurement portals should store the SHA-256 hash of the llms.txt file downloaded at contract signature: if the vendor changes trust URLs mid-contract without notice, you have evidence for change-of-control clauses. Pair hash discipline with sourcing methodology version IDs in the same diligence record.
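Hash discipline needs only a few lines of code. The sketch below uses Python's hashlib; the record fields, function names, and the methodology version string are hypothetical placeholders for whatever your diligence system stores. Hash the exact bytes as downloaded, since re-encoding or normalizing line endings would change the digest.

```python
import hashlib
from datetime import datetime, timezone


def llms_fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of the file exactly as downloaded."""
    return hashlib.sha256(content).hexdigest()


def diligence_record(content: bytes, source_url: str, methodology_version: str) -> dict:
    """Bundle the hash with its source URL and the sourcing methodology version ID."""
    return {
        "source_url": source_url,
        "sha256": llms_fingerprint(content),
        "methodology_version": methodology_version,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def has_changed(recorded_sha256: str, current_content: bytes) -> bool:
    """True when the vendor's current file no longer matches the recorded hash."""
    return llms_fingerprint(current_content) != recorded_sha256


if __name__ == "__main__":
    at_signature = b"# Vendor\n\n- [Trust](https://www.example.com/trust): registrations\n"
    record = diligence_record(at_signature, "https://www.example.com/llms.txt", "v-placeholder")
    print(record["sha256"])
    print(has_changed(record["sha256"], at_signature + b"extra line\n"))
```

Storing the downloaded bytes alongside the hash is worth considering too, because a digest proves that something changed but not what changed.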

Strong vendors link llms.txt from the site footer and reference it in data broker registration packets so models and portals cite the same trust spine.

For international buyers, add EU and US compliance resources in the same Trust block (GDPR Art. 14 beside FTC sensitive location) so llms.txt acts as a compliance table of contents, not only a product menu. Sales teams can paste the raw file into diligence rooms when prospects ask how to navigate your public proof.

Measure success by citation accuracy, not file size: quarterly spot-check five buyer prompts in major AI tools and verify URLs, registration numbers, and product names match your llms.txt and prerendered HTML.

Enterprise API vendors should list OpenAPI or developer portal URLs only if those pages are public and robots-allowed; otherwise, link to developer hub pages that describe authentication without exposing keys. Never embed API keys or sample bearer tokens in llms.txt; retrieval logs leak.
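A pre-publish scan can catch obvious credential leaks. The patterns below are illustrative heuristics chosen for this sketch, not a complete secret detector: they look only for bearer-style tokens and key-like assignments, will miss other formats, and may flag harmless text. Names are hypothetical; a dedicated secret-scanning tool is a better long-term choice.

```python
import re

# Heuristic patterns (assumptions for illustration, not an exhaustive list).
SECRET_PATTERNS = {
    "bearer token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{16,}=*", re.IGNORECASE),
    "key assignment": re.compile(
        r"\b(?:api[_-]?key|secret|token)\s*[:=]\s*\S{12,}", re.IGNORECASE
    ),
}


def find_secrets(text: str) -> list:
    """Return (line_number, label) for each line that looks like it holds a credential."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits


if __name__ == "__main__":
    sample = (
        "- [Developers](https://www.example.com/developers): authentication overview\n"
        "Authorization: Bearer abcdefghijklmnop1234\n"
    )
    for number, label in find_secrets(sample):
        print(f"line {number}: possible {label}")
```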

When two business units share a domain, split llms.txt sections by brand only if the legal entities differ; otherwise keep one file to avoid forked citations. Acquisitions should merge llms.txt within 30 days of redirect completion.

Brand teams sometimes request campaign landing pages in llms.txt. Allow them only if those pages contain durable product facts, not time-boxed promotions. Ephemeral promos belong in email, not in machine indexes that models treat as canonical.

Version-control llms.txt in git and require legal review when the Trust or Products sections change, just as you would for privacy policy updates. Tag releases with site deploy IDs so support can answer which llms.txt a prospect downloaded.

If you syndicate to data marketplaces, ensure marketplace blurbs link back to the canonical product URLs listed in llms.txt, so that marketplaces do not become the SSOT models prefer.

Frequently Asked Questions

llms.txt vs llms-full.txt?

Use llms.txt for a short priority list buyers and models can scan in seconds. Use llms-full.txt (or equivalent) for a wider catalog index when you have many SKUs and resources. Keep both on the same hostname and HTTPS scheme; never fork www vs apex without redirects.

Do models guarantee they read llms.txt?

No. Adoption varies by tool and fetch policy. The file still helps human curators, sales engineers, and systems that explicitly fetch it. Your primary defense against hallucination remains consistent prerendered HTML and JSON-LD on money pages.

Should we include pricing numbers in llms.txt?

Only if they match public pricing copy and signed contracts. Otherwise link to pricing and describe bands qualitatively. Misquoted pricing in AI answers damages enterprise trust faster than missing prices.

How often should llms.txt change?

Review quarterly at minimum and immediately after slug changes, new state registrations, or panel shifts post-regulatory orders. Tie updates to the same ticket that updates sitemap and prerender routes.

Does llms.txt replace structured data?

No. Use Dataset and Organization JSON-LD on product pages in addition to llms.txt. The file is navigation for models; schema is typed metadata for search and compliance-oriented parsers.