
Robots.txt Tester

Fetch your live robots.txt or paste it in. We flag the 9 most common mistakes, from wildcard disallows and conflicting rules that can quietly drop pages from search to missing Sitemap lines and Crawl-delay directives that Google ignores.
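As a rough illustration (not the tester's actual implementation), this Python sketch detects a few of these mistakes: a blanket Disallow: / under User-agent: *, rules that appear before any User-agent line, Crawl-delay lines (Googlebot ignores them), and a missing Sitemap line. The function name lint_robots is hypothetical, and the parsing is deliberately simplified: it does not evaluate wildcards or rule precedence.

```python
def lint_robots(text: str) -> list[str]:
    """Flag a few common robots.txt mistakes (a small illustrative subset)."""
    warnings = []
    current_agents = []   # user agents of the group currently being read
    in_rules = False      # True once the current group has seen a rule line
    saw_sitemap = False

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:                       # a User-agent after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if not current_agents:
                warnings.append(f"line {lineno}: {field} before any User-agent line")
            elif field == "disallow" and value == "/" and "*" in current_agents:
                warnings.append(
                    f"line {lineno}: 'Disallow: /' in the * group blocks the whole site "
                    "for every crawler without its own group"
                )
        elif field == "crawl-delay":
            warnings.append(f"line {lineno}: Crawl-delay is ignored by Googlebot")
        elif field == "sitemap":
            saw_sitemap = True

    if not saw_sitemap:
        warnings.append("no Sitemap line")
    return warnings
```

For example, lint_robots("User-agent: *\nDisallow: /\n") reports both the blanket disallow and the missing Sitemap line.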



Frequently asked questions

What does robots.txt actually do?
It tells crawlers (Googlebot, Bingbot, ChatGPT-User, etc.) which paths they may or may not request. It is a polite request, not enforcement: well-behaved bots obey it, abusive scrapers ignore it. To keep a page out of search results, use a noindex meta tag; to keep content private, put it behind HTTP authentication.
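Python's standard-library urllib.robotparser shows what "polite" means in practice: the client parses the file and decides for itself whether to make a request. In this sketch the /private/ path and the allowed() helper are made up for illustration. robotparser applies rules in file order with plain prefix matching, so it only approximates Google's longest-match and wildcard handling.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(user_agent: str, url: str, robots_text: str = ROBOTS_TXT) -> bool:
    """Return True if robots_text permits user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)

# A polite crawler asks before every request; nothing on the server enforces the answer.
for path in ("/", "/private/report.html"):
    print(path, allowed("Googlebot", "https://example.com" + path))
```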
Where should robots.txt live?
Always at the root of the host: https://example.com/robots.txt. Google ignores a robots.txt placed in a subdirectory (/blog/robots.txt), and the root file does not cover subdomains, so each subdomain needs its own robots.txt.
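One way to make the rule concrete: the robots.txt that governs a URL is found by keeping only its scheme and host (including any port) and replacing the path with /robots.txt. A minimal Python sketch; the name robots_url_for is hypothetical, and it assumes URLs without embedded credentials.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Only scheme + host (+ port) matter: subdirectories never get their own
    robots.txt, and a subdomain never inherits its parent domain's file.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url_for("https://example.com/blog/post-1"))     # https://example.com/robots.txt
print(robots_url_for("https://shop.example.com/cart?id=7"))  # https://shop.example.com/robots.txt
```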
Will Disallow: / deindex my whole site?
Eventually, yes. Once Googlebot can no longer crawl your URLs, they drop out of the index over the following weeks, although heavily linked URLs can linger as bare listings with no description. For faster removal, use the Removals tool in Search Console. Be especially careful with staging or pre-launch sites that reach production with the staging robots.txt still in place.
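A cheap guard against the staging-file accident is a pre-deploy check that fails when the robots.txt about to ship forbids the site root. This sketch uses Python's urllib.robotparser; the CRAWLERS tuple, the function name, and the example.com origin are placeholders, and robotparser's prefix matching only approximates Google's rules.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the crawlers you care about; "*" stands for any bot without its own group.
CRAWLERS = ("*", "Googlebot", "Bingbot")

def blocks_site_root(robots_text: str) -> bool:
    """Return True if any crawler in CRAWLERS may not fetch the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # Only the path matters to can_fetch, so a placeholder origin is fine here.
    return any(not parser.can_fetch(agent, "https://example.com/") for agent in CRAWLERS)
```

In a deploy script, a True result for the production robots.txt would be a reason to stop the release.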
Should I list my sitemap in robots.txt?
Yes. Add Sitemap: https://example.com/sitemap.xml on its own line, using the full absolute URL. It costs nothing, helps Bing and DuckDuckGo discover the sitemap, and documents where the sitemap lives for the next person who edits the file.
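To check this automatically, RobotFileParser.site_maps() (Python 3.8+) returns the URLs from any Sitemap lines, or None if there are none. The sketch below also flags relative sitemap URLs; the function name sitemap_problems is hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def sitemap_problems(robots_text: str) -> list[str]:
    """List issues with the Sitemap lines in robots_text (empty list means none found)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    sitemaps = parser.site_maps()  # None when the file has no Sitemap line
    if not sitemaps:
        return ["no Sitemap line"]
    problems = []
    for url in sitemaps:
        parts = urlsplit(url)
        if not (parts.scheme and parts.netloc):
            problems.append(f"Sitemap URL must be absolute: {url!r}")
    return problems
```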
Does ChatGPT obey robots.txt?
OpenAI's crawlers honour the standard, and each has its own user agent: block GPTBot to opt out of model training, OAI-SearchBot to opt out of appearing in ChatGPT search answers, and ChatGPT-User to refuse pages that ChatGPT fetches on a user's behalf. Other AI vendors publish their own user-agent tokens (for example PerplexityBot and ClaudeBot, plus Google-Extended for Google's AI models); block them the same way.
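Each User-agent group applies only to the agents it names, so blocking an AI crawler leaves search crawlers untouched. This Python sketch encodes an example policy (not a recommendation) and checks it with urllib.robotparser; verify the current token names in each vendor's documentation before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Example policy: opt out of GPTBot and ChatGPT-User, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
    print(agent, parser.can_fetch(agent, "https://example.com/article"))
```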
Why did Google still index a page I disallowed?
Disallow stops Google from crawling, not from indexing. If other sites link to a blocked URL, Google can still index it and list it with no description (Search Console reports these as "Indexed, though blocked by robots.txt"). To remove the page from search entirely, allow crawling and add <meta name="robots" content="noindex"> to the page itself; Google has to fetch the page to see the tag.
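The conflict to watch for is a page that carries noindex but sits under a Disallow rule: the crawler never fetches it, so the tag is never seen. A minimal Python sketch that flags this combination using urllib.robotparser and html.parser; the function and class names are hypothetical, and only the meta tag is checked.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class _RobotsMetaFinder(HTMLParser):
    """Records whether the page has <meta name="robots"> with noindex in its content."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() == "robots" and "noindex" in attrs.get("content", "").lower():
            self.noindex = True

def noindex_is_hidden(robots_text: str, url: str, html: str, agent: str = "Googlebot") -> bool:
    """True if the page asks for noindex but robots.txt stops the crawler from reading it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex and not parser.can_fetch(agent, url)
```

A True result means the Disallow rule should be lifted for that URL so the noindex can take effect.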