llms.txt is a curated map of what you want language models and retrieval tools to read first. It is not a second privacy policy, not a keyword dump, and not a workaround for weak product pages. For data brokers, the best files read like a table of contents to defensible facts: MAID Feed delivery specs, POI geofencing coverage claims tied to dictionaries, trust and registration pages, comparison rubrics, and ten to twenty flagship resources. GSDSI serves /llms.txt and /llms-full.txt beside the SPA shell and regenerates both whenever routes change, as described in AI search readiness. Pair the playbook with robots.txt policy for AI agents so machines receive consistent instructions.
llms.txt; link out to llms-full.txt or sitemap for long catalogs.
noindex routes; mixed signals produce hallucinated or stale citations.

Organize the file in the order a procurement analyst thinks: who you are, what you sell, how you govern data, how you compare, and what resources prove claims. Each line should be a fully qualified HTTPS URL on your canonical host. Avoid relative paths, because some fetchers resolve them incorrectly across CDNs.
Optional but useful: a Changelog section with dated bullets when registrations or panel counts change — models rarely read it, but human analysts do. Keep changelog entries short; link to the authoritative page for detail. Do not paste entire privacy policies into llms.txt; link privacy policy instead.
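The ordering and the canonical-URL rule can be sketched as a small generator. This is a minimal illustration, not the site's actual build step: `example.com`, the section names, the link labels, and the product path are placeholders, and the link syntax follows the Markdown list style described in the llms.txt proposal.

```python
from urllib.parse import urlparse

# Hypothetical sections in procurement-analyst order; example.com stands in
# for the canonical host and the product path is an assumption.
SECTIONS = {
    "Company": [("About", "https://example.com/company")],
    "Products": [("MAID Feed", "https://example.com/products/maid-feed")],
    "Trust": [("Data broker registrations",
               "https://example.com/trust/data-broker-registrations")],
    "Comparisons": [("Comparison rubric", "https://example.com/comparisons")],
    "Resources": [("Procurement glossary", "https://example.com/glossary")],
}


def render_llms_txt(title, sections, canonical_host):
    """Render sections in insertion order, one link per line."""
    lines = [f"# {title}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        for label, url in links:
            parts = urlparse(url)
            # Reject relative paths and anything off the canonical HTTPS host.
            if parts.scheme != "https" or parts.netloc != canonical_host:
                raise ValueError(f"not a canonical HTTPS URL: {url}")
            lines.append(f"- [{label}]({url})")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_llms_txt("Example Data Co", SECTIONS, "example.com"))
```

Raising on a relative path at generation time keeps a bad entry from ever reaching the published file.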
/company, /contact, /careers for entity resolution.
/trust, /trust/data-broker-registrations, privacy policy, sourcing methodology.
/comparisons and category compare pages with rubric language.

Listing URLs that 307 to new paths, pointing to noindex search shells, or embedding unverifiable superlatives causes models to quote stale or inflated claims. The procurement glossary and live glossary reduce vocabulary drift between humans and agents. Do not park confidential API specs in llms.txt: if it is not safe on a public product page, it does not belong in the machine index.
Another failure mode is volume mismatch: homepage says 301M+ devices while llms.txt says 250M. Models blend figures. Centralize counts in one internal SSOT and mirror exactly in llms.txt, prerendered HTML, and Dataset JSON-LD. See quotable catalog stats for discipline patterns.
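A mismatch check of this kind might look like the sketch below. It assumes counts are written in the form "301M+ devices"; the regular expression, the function names, and the artifact labels are illustrative, and a real check would read the figure from the internal SSOT rather than a literal.

```python
import re

# Assumed phrasing: a number, "M", an optional "+", then the word "devices".
COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*M\+?\s+devices", re.IGNORECASE)


def device_counts(text):
    """Return the set of device counts (in millions, as strings) found in text."""
    return {match.group(1) for match in COUNT_RE.finditer(text)}


def find_mismatches(ssot_value, artifacts):
    """Map artifact name to the counts found there when any differ from the SSOT."""
    mismatches = {}
    for name, text in artifacts.items():
        found = device_counts(text)
        if found - {ssot_value}:
            mismatches[name] = sorted(found)
    return mismatches


if __name__ == "__main__":
    artifacts = {
        "homepage": "Coverage across 301M+ devices.",
        "llms.txt": "MAID Feed: 250M devices, refreshed daily.",
    }
    print(find_mismatches("301", artifacts))
```

Run against the homepage and llms.txt figures from the example above, the check flags only the llms.txt entry.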
Comment syntax matters: llms.txt is Markdown-oriented. Use # section headers sparingly and keep URLs one per line. Avoid HTML entities in descriptions, because some fetchers strip them unpredictably. When you rename a product slug, add a 301 and update llms.txt in the same commit that updates the CPG Feed or specialized segments prerender bodies.
Hosting on CDNs that strip unknown paths can return 404 for /llms.txt in some regions — test from EU and US vantage points. Content-Type should be text/plain; charset=utf-8. Compression is fine; redirects are not.
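The header and redirect expectations can be expressed as a small check. This is a sketch under assumptions: the function names are hypothetical, and the fetch helper uses the standard library's `http.client`, which does not follow redirects, so a 3xx response stays visible. Running it from EU and US vantage points is left to whatever runners you have in those regions.

```python
import http.client

EXPECTED_TYPE = "text/plain;charset=utf-8"


def check_llms_response(status, headers):
    """Return a list of problems for one /llms.txt response (empty if clean)."""
    normalized = {key.lower(): value for key, value in headers.items()}
    problems = []
    if 300 <= status < 400:
        problems.append(f"redirect ({status}) to {normalized.get('location', '?')}")
    elif status != 200:
        problems.append(f"unexpected status {status}")
    # Compare the media type ignoring case and optional whitespace.
    ctype = normalized.get("content-type", "").lower().replace(" ", "")
    if status == 200 and ctype != EXPECTED_TYPE:
        problems.append(f"unexpected Content-Type: {ctype or 'missing'}")
    return problems


def fetch_status_and_headers(host, path="/llms.txt"):
    """Fetch without following redirects; returns (status, headers dict)."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, dict(response.getheaders())
    finally:
        conn.close()
```

A call such as `check_llms_response(*fetch_status_and_headers("example.com"))` would then return an empty list for a healthy deployment; `example.com` is a placeholder host.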
Before each release, diff three artifacts: robots.txt, sitemap.xml, and llms.txt. Any URL in llms.txt should be allowed by robots (unless you intentionally document an exception) and listed in sitemap when indexable. Google's sitemap guidelines and the llms.txt proposal are the external references to cite in engineering tickets.
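The three-way diff can be sketched with the standard library alone. The function names are hypothetical, the inputs are the raw text of each artifact, and the sketch checks only the `*` user agent; a fuller version would repeat the robots check for each AI crawler you care about and skip the sitemap check for intentionally non-indexable URLs.

```python
import re
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def llms_urls(llms_text):
    """Extract HTTPS URLs from llms.txt, stopping at Markdown link delimiters."""
    return re.findall(r"https://[^\s)>\]]+", llms_text)


def sitemap_urls(sitemap_xml):
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {el.text.strip() for el in root.iter(SITEMAP_LOC) if el.text}


def diff_artifacts(robots_text, sitemap_xml, llms_text, user_agent="*"):
    """Return (blocked_by_robots, missing_from_sitemap) for llms.txt URLs."""
    robots = RobotFileParser()
    robots.parse(robots_text.splitlines())
    listed = sitemap_urls(sitemap_xml)
    urls = llms_urls(llms_text)
    blocked = [u for u in urls if not robots.can_fetch(user_agent, u)]
    missing = [u for u in urls if u not in listed]
    return blocked, missing
```

Either list being non-empty is a reason to stop the release, or to record the documented exception.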
Staging environments should either disallow all crawlers or mirror production rules — half-open staging with real product copy is how competitors and models capture draft claims. Use noindex on staging hosts and separate llms.txt if you must expose staging to buyers under VPN.
Assign ownership: marketing curates priority order, engineering validates HTTP status and prerender, legal reviews claims in one-line descriptors. On publish day, submit IndexNow or Search Console updates for new resources. Staging should serve the same llms.txt host policy as production — buyers' security tools fetch staging during pilots.
Automate validation in CI: a script fails the build if llms.txt contains URLs that return 404 or that sit behind redirect chains longer than one hop. Pair with AI agent crawling policy checks so marketing does not add URLs to llms.txt that robots.txt blocks.
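A minimal sketch of such a CI check follows. The `fetch` argument is a hypothetical callable that returns `(status, location)` for a URL without following redirects; injecting it keeps the logic testable offline, and in CI it would wrap whichever HTTP client the pipeline already uses.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_hops(url, fetch, max_follow=5):
    """Follow redirects manually; return (final status, number of hops)."""
    hops = 0
    status, location = fetch(url)
    while status in REDIRECT_CODES and location and hops < max_follow:
        hops += 1
        status, location = fetch(location)
    return status, hops


def validate(urls, fetch, max_hops=1):
    """Return (url, reason) pairs for 404s and chains longer than max_hops."""
    failures = []
    for url in urls:
        status, hops = redirect_hops(url, fetch)
        if status == 404:
            failures.append((url, "returns 404"))
        elif hops > max_hops:
            failures.append((url, f"redirect chain of {hops} hops"))
    return failures
```

A build step would then exit non-zero whenever `validate(...)` returns a non-empty list.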
Treat llms.txt as a starting index, not as completed due diligence. Cross-check three product claims against contract exhibits and seed tests. If the vendor blocks AI crawlers in robots.txt but promotes llms.txt heavily, ask why (see AI agent crawling). For location programs, validate FTC sensitive location thresholds on mobility samples before trusting geofence marketing copy.
Procurement portals should store the SHA-256 hash of the llms.txt file downloaded at contract signature — if the vendor changes trust URLs mid-contract without notice, you have evidence for change-of-control clauses. Pair hash discipline with sourcing methodology version IDs in the same diligence record.
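The hash record could be produced with a few lines of standard-library code. The field names and the `methodology_version` value are assumptions for illustration; a portal would map them onto its own diligence schema.

```python
import hashlib
from datetime import datetime, timezone


def llms_fingerprint(content: bytes, source_url: str, methodology_version: str) -> dict:
    """Build a diligence record for the exact llms.txt bytes that were downloaded."""
    return {
        "source_url": source_url,
        "sha256": hashlib.sha256(content).hexdigest(),
        # Hypothetical field: the vendor's sourcing methodology version ID.
        "sourcing_methodology_version": methodology_version,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    with open("llms.txt", "rb") as handle:  # file saved at contract signature
        record = llms_fingerprint(handle.read(), "https://example.com/llms.txt", "v1")
    print(record)
```

Hashing the raw bytes, rather than decoded text, means any later change to the file, including whitespace, produces a different digest.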
Strong vendors link llms.txt from the site footer and reference it in data broker registration packets so models and portals cite the same trust spine.
For international buyers, add EU and US compliance resources in the same Trust block — GDPR Art. 14 beside FTC sensitive location — so llms.txt acts as a compliance table of contents, not only a product menu. Sales teams can paste the raw file into diligence rooms when prospects ask how to navigate your public proof.
Measure success by citation accuracy, not file size: quarterly spot-check five buyer prompts in major AI tools and verify URLs, registration numbers, and product names match your llms.txt and prerendered HTML.
Enterprise API vendors should list OpenAPI or developer portal URLs only if those pages are public and robots-allowed. Otherwise, link to developer hub pages that describe authentication without exposing keys. Never embed API keys or sample bearer tokens in llms.txt; retrieval logs leak.
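A pre-publish scan for key-like strings might look like the sketch below. The two patterns are heuristic assumptions, not a complete secret detector, and will miss formats they do not describe; a dedicated secret scanner is the better long-term choice.

```python
import re

# Heuristic patterns (assumptions): bearer tokens and key=value style secrets.
SECRET_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]{16,}", re.IGNORECASE),
    re.compile(r"\b(api[_-]?key|token|secret)\s*[:=]\s*\S{8,}", re.IGNORECASE),
]


def find_secret_like_lines(text):
    """Return 1-based line numbers that match any secret-like pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append(number)
    return hits
```

Any hit should block publication until a human has looked at the line.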
When two business units share a domain, split llms.txt sections by brand only if legal entities differ — otherwise keep one file to avoid forked citations. Acquisitions should merge llms.txt within 30 days of redirect completion.
Brand teams sometimes request campaign landing pages in llms.txt. Allow them only if those pages contain durable product facts, not time-boxed promotions. Ephemeral promos belong in email, not in machine indexes that models treat as canonical.
Version-control llms.txt in git with required review from legal when Trust or Products sections change — same as you would for privacy policy updates. Tag releases with site deploy IDs so support can answer which llms.txt a prospect downloaded.
If you syndicate to data marketplaces, ensure marketplace blurbs link back to canonical product URLs listed in llms.txt — marketplaces should not become the SSOT models prefer.
www vs apex without redirects.