Is your robots.txt blocking AI crawlers like GPTBot and ClaudeBot?
If your robots.txt allows Googlebot but blocks AI answer-engine user-agents such as GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, and Google-Extended, you have quietly opted out of being cited in ChatGPT, Claude, Perplexity, and Gemini. Allow the ones you want visibility in.
What it is
AI answer engines use named crawlers to read the web: OpenAI's GPTBot and OAI-SearchBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot, and others. Google-Extended belongs on the same list, although it is a robots.txt control token for Gemini rather than a separate crawler; Google documents AI Overviews as a Search feature that follows Googlebot rules. A robots.txt rule that disallows these user-agents, often added by default by a host or a privacy plugin, removes your content from the systems that generate and cite answers.
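As a rough illustration of how such a rule plays out, the sketch below feeds a made-up robots.txt to Python's standard-library urllib.robotparser and asks which user-agents may fetch a page. The sample file, the is_allowed helper, and the example.com URL are assumptions for the demo, and real crawlers may resolve edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: open to everyone except two AI crawlers.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/post") -> bool:
    """Return True if this parser considers user_agent free to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


for agent in ("Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot"):
    status = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, agent) else "blocked"
    print(f"{agent}: {status}")
```

In this sample, GPTBot and ClaudeBot match their own groups and are blocked, while Googlebot and PerplexityBot fall through to the wildcard group and stay allowed.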
Why it matters
Answer engines increasingly sit between your content and your audience. If their crawlers can't fetch your pages, you can't be quoted or linked in the AI answer — even if you rank #1 in classic search. Blocking the live-fetch bots (OAI-SearchBot, PerplexityBot, ChatGPT-User) specifically removes you from current citations, not just model training.
How to fix it
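- Check your robots.txt for AI user-agent blocks. Look for User-agent groups naming GPTBot, ClaudeBot, PerplexityBot, Google-Extended, or CCBot, followed by Disallow: /. A blanket User-agent: * group with Disallow: / blocks them too, unless a crawler has its own group. Many sites have these rules without realizing it.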
- Decide training vs. live-answer access. Training crawlers (GPTBot, CCBot, Google-Extended) feed model knowledge; search crawlers (OAI-SearchBot, PerplexityBot, ChatGPT-User) fetch live to cite you. Blocking the live-fetch bots is what removes you from answers today.
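- Allow the crawlers you want visibility in. Remove the Disallow: / rule from each group you want to open up, or replace it with an explicit Allow group: a "User-agent: GPTBot" line followed by an "Allow: /" line. Repeat for OAI-SearchBot, ClaudeBot, PerplexityBot, and Google-Extended.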
- Re-scan to confirm. Run your site monitor (or fetch /robots.txt) again to verify the AI crawlers are no longer disallowed while Googlebot stays allowed.
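The steps above can be sketched as a small offline audit, again using Python's urllib.robotparser. The training/live grouping follows the split described in the steps; ClaudeBot is listed separately because the steps do not classify it. The function names, the BEFORE and AFTER samples, and the example.com URL are hypothetical, and this parser is only an approximation of how each vendor's crawler reads robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Grouping taken from the steps above; ClaudeBot is not classified there.
AGENT_GROUPS = {
    "training": ("GPTBot", "CCBot", "Google-Extended"),
    "live": ("OAI-SearchBot", "PerplexityBot", "ChatGPT-User"),
    "unclassified": ("ClaudeBot",),
}


def audit_robots(robots_txt: str,
                 url: str = "https://example.com/") -> dict[str, list[str]]:
    """Return, per group, the agents this parser considers blocked for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # in-memory parse, no network access
    return {
        group: [a for a in agents if not parser.can_fetch(a, url)]
        for group, agents in AGENT_GROUPS.items()
    }


def allow_groups(agents) -> str:
    """Build one explicit 'Allow: /' group per agent."""
    return "\n\n".join(f"User-agent: {a}\nAllow: /" for a in agents) + "\n"


# Hypothetical file that blocks one training crawler and one live crawler.
BEFORE = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
User-agent: PerplexityBot
Disallow: /
"""

# Rebuilt file: the Disallow group is gone and explicit Allow groups are added.
AFTER = "User-agent: Googlebot\nAllow: /\n\n" + allow_groups(
    ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]
)

print(audit_robots(BEFORE))
print(audit_robots(AFTER))
```

The sketch rebuilds the file instead of appending to it because this parser applies the first group that matches an agent, so an Allow group added below an existing Disallow group for the same agent may not change the result.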
Common false positives
If you blocked these crawlers on purpose, this finding is intentional and can be ignored. Keep in mind that blocking the live-fetch bots removes you from live answer citations, not just from training corpora.
Authoritative sources
- GPTBot and OpenAI crawlers — OpenAI
- Google-Extended — Google Search Central
- Google Search Central documentation — Google
- Schema.org vocabulary — schema.org
- SEO Starter Guide — Google Search Central
- MDN — HTML meta and link elements — Mozilla MDN