[ TOOL_03 / CRAWLABILITY ]
Robots.txt Tester
Fetch your live robots.txt or paste it. We flag the 9 common mistakes that quietly deindex pages — wildcard disallows, conflicting rules, missing sitemaps, ignored crawl-delays.
[ FAQ ]
Frequently asked questions
- What does robots.txt actually do?
- It tells crawlers (Googlebot, Bingbot, ChatGPT-User, etc.) which paths they may or may not request. It is a polite request, not enforcement: well-behaved bots obey it; abusive scrapers ignore it. Use noindex meta tags or HTTP auth for content that must be hidden from search engines.
- Where should robots.txt live?
- Always at the root: https://example.com/robots.txt. A robots.txt in a subdirectory (/blog/robots.txt) is ignored by Google, and a subdomain is not covered by the main domain's file. Each subdomain needs its own robots.txt.
- Will Disallow: / deindex my whole site?
- In practice, yes. Once Googlebot can no longer crawl the URLs, most of them drop from the index over weeks, although URLs that other sites link to can linger as bare listings with no description. To force a fast removal, use the Removals tool in Search Console, which hides URLs temporarily (about six months). Be especially careful with staging or pre-launch sites that get pushed to production with the staging robots.txt still in place.
- Should I list my sitemap in robots.txt?
- Yes. Add Sitemap: https://example.com/sitemap.xml on its own line. It costs nothing, helps Bing and DuckDuckGo discover the sitemap, and doubles as a self-documenting breadcrumb for the next person who edits the file.
- Does ChatGPT obey robots.txt?
- OpenAI's grounding crawler honours the standard. Block it explicitly with a User-agent: ChatGPT-User block if you want to opt out of being cited in ChatGPT answers, or with a User-agent: GPTBot block to opt out of model training. Most other AI vendors (Perplexity, Anthropic, Google with its Google-Extended token) publish their own user-agent names; block them the same way.
- Why did Google still index a page I disallowed?
- Disallow stops Google from crawling, not from indexing. If other sites link to a blocked URL, Google can list the URL with no description ("blocked by robots.txt"). To remove the page from search entirely, allow crawling so that Googlebot can see the tag, and add <meta name="robots" content="noindex"> on the page itself.
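
Several of the answers above can be checked offline with Python's standard urllib.robotparser module. The sketch below is only an illustration, not how this tester works internally: SAMPLE_ROBOTS is a made-up file, and robots_url and parse_robots are hypothetical helper names. It derives the root robots.txt location for a page, parses pasted rules without fetching anything, and asks whether a given user-agent may request a URL. Note that urllib.robotparser does plain prefix matching and returns the first matching rule, so its verdicts can differ from Googlebot's on files that use * or $ wildcards.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, pasted rather than fetched.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""


def robots_url(page_url: str) -> str:
    """Return the root robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt is scoped to scheme + host (+ port), never to a subdirectory.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = parse_robots(SAMPLE_ROBOTS)

    # Each subdomain resolves to its own robots.txt.
    print(robots_url("https://blog.example.com/posts/1?x=2"))

    # Per-crawler verdicts for a couple of URLs.
    for agent in ("Googlebot", "GPTBot"):
        for url in ("https://example.com/blog/",
                    "https://example.com/private/report"):
            verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
            print(f"{agent:10} {url:40} {verdict}")

    # Sitemap lines declared in the file (None when there are none).
    print(parser.site_maps())
```

A blocked verdict here only means a compliant crawler will not request the URL. As the last answer explains, it does not by itself remove the URL from search results.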