
Robots.txt Checker

Fetch, validate, and audit any site's robots.txt file instantly. Catch syntax errors, test URL crawlability, identify blocked bots, and fix crawl issues fast. Free, no login.

About Robots.txt Checker

Enter any domain to instantly fetch its live robots.txt file, validate syntax, identify which bots are allowed or blocked, and test whether specific URLs are crawlable. Built for SEO audits, no login, no cost, no limits.

What Is a Robots.txt Checker?

A robots.txt checker fetches a site's live robots.txt file and runs a full diagnostic, parsing every directive, validating syntax, mapping rules per user-agent, and testing individual URLs against the active ruleset.

For SEO professionals, robots.txt is a routine audit checkpoint that regularly surfaces serious issues: development-era Disallow: / rules left in production, overly broad wildcard blocks stripping crawlable URLs, CSS and JS assets blocked from rendering, and Crawl-delay values throttling crawlers on large sites. None of these issues produces a visible error in the browser; they fail silently while rankings erode. This tool surfaces them immediately.

How to Use This Tool

Step 1 - Enter the Domain: Paste any domain into the input field. The tool fetches the robots.txt file from its canonical location at yourdomain.com/robots.txt.

Step 2 - Click Check Robots.txt: The tool retrieves the live file, parses every directive across all user-agent blocks, validates syntax, and tests URL-level access rules.

Step 3 - Review the Full Diagnostic: You get a structured breakdown covering per-bot rules, blocked and allowed paths, syntax errors, sitemap declarations, crawl-delay values, and URL-level crawlability results: everything needed to complete a robots.txt audit in a single pass.

What This Tool Checks

Live robots.txt Fetch & Display: Retrieves the raw file exactly as Googlebot sees it, no formatting changes, no caching. Also reports the HTTP status code of the robots.txt URL itself, since a 404 (no restrictions applied) and a 500 (Googlebot halts crawling site-wide) have very different crawl implications.
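As a rough illustration of the fetch step, here is a minimal Python sketch (not the tool's own implementation) that retrieves a robots.txt file and reports its HTTP status; the domain example.com is a placeholder, and the status handling simply mirrors the distinction described above.

```python
import urllib.error
import urllib.request

def fetch_robots(domain: str) -> tuple[int, str]:
    """Fetch https://<domain>/robots.txt and return (status_code, body)."""
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""

status, body = fetch_robots("example.com")  # placeholder domain
if status == 200:
    print(body)                              # raw file, exactly as served
elif status == 404:
    print("No robots.txt found: crawlers assume no restrictions")
elif status >= 500:
    print("Server error: Googlebot may pause crawling the whole site")
```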

Syntax Validation: Parses the file against correct robots.txt syntax rules and flags errors, including invalid directives, malformed wildcard patterns, missing user-agent declarations before rule blocks, unsupported directives (e.g., Noindex, which Google ignores in robots.txt), and any characters that cause parsers to silently misread rules.
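A simplified linter along these lines, assuming only the core directives listed in the quick reference further down, might look like the sketch below; the directive set and messages are illustrative rather than the tool's actual rules.

```python
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def lint_robots(text: str) -> list[str]:
    """Flag common robots.txt syntax problems in a very simplified way."""
    issues = []
    seen_user_agent = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # ignore comments and blank lines
        if not line:
            continue
        if ":" not in line:
            issues.append(f"line {lineno}: missing 'directive: value' separator")
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_DIRECTIVES:
            issues.append(f"line {lineno}: unsupported directive '{field}' "
                          "(e.g. Noindex is ignored by Google)")
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow") and not seen_user_agent:
            issues.append(f"line {lineno}: rule appears before any User-agent line")
    return issues

print(lint_robots("Disallow: /tmp/\nNoindex: /old/\nUser-agent: *\nAllow: /"))
```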

Per-Bot Rule Mapping: Every user-agent block is listed separately (User-agent: * global rules, Googlebot-specific overrides, Bingbot, GPTBot, CCBot, and any other declared agents), with its full Allow/Disallow ruleset displayed clearly. Critical for audits where client sites have accumulated conflicting rules across multiple user-agents over time.
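One way to build such a per-bot map, sketched here under the assumption that consecutive User-agent lines share the rule block that follows them, is to group Allow/Disallow lines by the agents declared above them; the sample ruleset is hypothetical and this is an approximation, not the tool's parser.

```python
from collections import defaultdict

def rules_per_agent(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each declared user-agent to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    current_agents: list[str] = []
    collecting_agents = True
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not collecting_agents:        # a new block starts here
                current_agents = []
                collecting_agents = True
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            collecting_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return dict(groups)

sample = """User-agent: *
Disallow: /tmp/

User-agent: Googlebot
User-agent: Bingbot
Disallow: /staging/
Allow: /staging/public/
"""
for agent, rules in rules_per_agent(sample).items():
    print(agent, rules)
```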

URL Crawlability Test: Test any specific URL against the active robots.txt ruleset to confirm whether it is allowed or blocked for crawling. Essential for verifying that key pages, product pages, category URLs, and canonical targets are not inadvertently blocked, and for confirming that low-value URLs (filtered parameters, session IDs, internal search results) are correctly excluded.
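Python's standard-library urllib.robotparser can approximate this kind of check; note that it follows the original prefix-matching spec rather than Google's wildcard and longest-match semantics, so treat the verdicts as an approximation. The domain and URLs below are placeholders.

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")   # placeholder domain
rp.read()                                       # fetch and parse the live file

urls = (
    "https://example.com/products/blue-widget",
    "https://example.com/search?q=widgets",
)
for url in urls:
    for agent in ("Googlebot", "Bingbot", "GPTBot"):
        verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
        print(f"{agent:>9}: {verdict:>7}  {url}")
```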

Allow/Disallow Conflict Detection: When Allow and Disallow rules overlap, Google applies the most specific (longest) matching rule. Conflicting rules frequently produce unexpected behaviour that is invisible without parsing the full ruleset. The tool identifies overlapping rules and flags which directive takes precedence per crawler.
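The precedence rule itself is easy to demonstrate. The sketch below applies Google's documented longest-match logic to literal (non-wildcard) path prefixes, with Allow winning an exact tie; the rules and paths are hypothetical.

```python
def effective_directive(path: str, rules: list[tuple[str, str]]) -> str:
    """Return 'allow' or 'disallow' for a path under longest-match precedence."""
    matches = [(directive, pattern) for directive, pattern in rules
               if path.startswith(pattern)]
    if not matches:
        return "allow"                       # no matching rule: crawlable
    # Most specific (longest) pattern wins; Allow wins an exact tie.
    matches.sort(key=lambda r: (len(r[1]), r[0] == "allow"), reverse=True)
    return matches[0][0]

rules = [("disallow", "/shop/"), ("allow", "/shop/sale/")]
print(effective_directive("/shop/sale/boots", rules))   # allow (longer rule wins)
print(effective_directive("/shop/cart", rules))         # disallow
```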

Sitemap Declaration Verification: Confirms whether a sitemap URL is declared in the robots.txt file, validates the URL format, and checks that it uses the correct HTTPS version of the domain. A common issue on migrated or replatformed sites is a sitemap declaration still pointing to the old domain or HTTP version.
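In Python 3.8+, urllib.robotparser exposes declared sitemaps via site_maps(), which makes the protocol-and-host check easy to sketch; example.com stands in for the audited domain.

```python
import urllib.robotparser
from urllib.parse import urlparse

DOMAIN = "example.com"                       # placeholder for the audited site

rp = urllib.robotparser.RobotFileParser()
rp.set_url(f"https://{DOMAIN}/robots.txt")
rp.read()

sitemaps = rp.site_maps() or []              # None when no Sitemap: lines exist
if not sitemaps:
    print("No sitemap declared in robots.txt")
for sitemap in sitemaps:
    parsed = urlparse(sitemap)
    if parsed.scheme != "https":
        print(f"Sitemap not served over HTTPS: {sitemap}")
    if parsed.netloc != DOMAIN:
        print(f"Sitemap points at a different host: {sitemap}")
```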

Crawl-Delay Detection: Identifies any Crawl-delay directives and flags values likely to throttle crawling unnecessarily. Note: Google does not officially honour Crawl-delay; Googlebot's crawl rate is managed through Google Search Console. High crawl-delay values can still affect Bingbot and other crawlers that do respect the directive.
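The same standard-library parser reports Crawl-delay per agent via crawl_delay(); the inline ruleset and the 10-second threshold below are made up for illustration.

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Crawl-delay: 30
Disallow: /tmp/
""".splitlines())

for agent in ("Googlebot", "Bingbot"):
    delay = rp.crawl_delay(agent)            # None when no Crawl-delay applies
    if delay is not None and delay > 10:
        print(f"{agent}: Crawl-delay of {delay}s declared "
              "(ignored by Google, honoured by Bing and others)")
```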

Robots.txt Directives Quick Reference

Directive | Function | Notes
User-agent: * | Applies rules to all crawlers | Catch-all; specific user-agent blocks take precedence
User-agent: Googlebot | Targets a specific crawler | Overrides * rules for that bot
Disallow: /path/ | Blocks crawler access to a path | An empty value means allow all
Allow: /path/ | Overrides a broader Disallow | Longest matching rule wins in Google's parser
Sitemap: | Declares the XML sitemap location | Full absolute URL required; supports multiple declarations
Crawl-delay: | Sets the delay between requests (seconds) | Ignored by Googlebot; honoured by Bingbot and others
Noindex | Not a valid robots.txt directive | Ignored by Google; use the noindex meta tag instead

Common Robots.txt Issues Found in SEO Audits

Disallow: / Left in Production: Set during development to block all crawlers, then never removed at launch. Often the first thing to check when a site has near-zero organic visibility despite correct on-page optimisation. Googlebot will stop crawling entirely; the site may stay indexed from prior crawls for weeks before rankings collapse.

CSS and JS Blocked from Crawling: Legacy advice recommended blocking asset directories to preserve crawl budget. Google's rendering pipeline now requires access to CSS and JS to understand page content, structured data, and layout. Blocking these assets degrades rendering quality, which directly impacts how pages are evaluated and ranked. Confirm with URL Inspection in GSC that Googlebot renders the page correctly after any robots.txt change.

Overly Broad Wildcard Rules: Patterns like Disallow: /*? (which blocks every URL containing a query parameter) are commonly added to prevent crawling of filtered or paginated URLs, but they frequently block legitimate canonical URLs that happen to include a parameter. Always test specific URLs against wildcard rules before deploying.
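A quick way to sanity-check a wildcard rule before deploying it is to translate the robots.txt pattern into a regular expression and run candidate paths through it. The translation and the example URLs below are a rough sketch, not Google's exact matcher.

```python
import re

def robots_pattern(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional '$' anchor)."""
    body = pattern[:-1] if pattern.endswith("$") else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if pattern.endswith("$"):
        regex += "$"
    return re.compile(regex)

rule = robots_pattern("/*?")                 # the Disallow: /*? pattern above
for path in ("/category/shoes",
             "/category/shoes?colour=red",   # canonical URL carrying a parameter
             "/search?q=boots"):
    verdict = "blocked" if rule.match(path) else "allowed"
    print(f"{verdict:>7}  {path}")
```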

robots.txt Used to Remove Pages from the Index: Blocking a URL in robots.txt does not remove it from Google's index; it only prevents crawling. If Googlebot has already indexed a page, or if other sites link to it, it can remain indexed indefinitely even after being blocked. For reliable deindexing, use a noindex meta tag on a crawlable page or submit a removal request via GSC.

Conflicting Rules Across User-Agent Blocks: Sites that have been through multiple SEO practitioners, CMS migrations, or CDN configurations frequently accumulate overlapping user-agent blocks with contradictory rules. The combined effect can be impossible to predict without parsing the full ruleset, which is exactly what this tool does.

AI Bot Directives Missing: With GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), and others now actively crawling the web for training data, many site owners want explicit control over AI crawler access. If no directives name these agents, they fall back to the User-agent: * ruleset, which on most sites means they crawl freely.
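To confirm that a GPTBot block behaves as intended without affecting search crawlers, the ruleset can be tested locally; the snippet below runs urllib.robotparser against an inline example file (a sketch with hypothetical rules, not a recommendation for any specific site).

```python
import urllib.robotparser

example_rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(example_rules.splitlines())

for agent in ("GPTBot", "Googlebot", "ClaudeBot"):
    allowed = rp.can_fetch(agent, "https://example.com/article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```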

robots.txt vs. Noindex: A Critical Distinction for Audits

This distinction causes more client confusion than almost any other SEO concept, and getting it wrong has real consequences:

Behaviour | robots.txt Disallow | Noindex Meta Tag
Prevents crawling | Yes | No
Removes from the index | Not reliably | Yes
Page can still rank | Yes (if linked externally) | No
Google can read page content | No | Yes
Right tool for... | Saving crawl budget | Removing pages from search results

The practical implication: if you block a page in robots.txt that has external backlinks, Google will often keep it in the index as a URL-only result showing "No information is available for this page." To remove it reliably, the page must be crawlable and carry a noindex tag, or you can use GSC's URL removal tool as a temporary measure.


Frequently Asked Questions About Robots.txt Checker

Is this robots.txt checker free? 

Yes, no login, no account, no usage limits.

Can I test a specific URL against the robots.txt ruleset? 

Yes. Enter any URL to test whether it is allowed or blocked by the active ruleset for any user-agent.

Does robots.txt block pages from Google's index? 

Not reliably. Blocked URLs can still be indexed if external links point to them. Use noindex for reliable deindexing.

What happens if robots.txt returns a 500 error? 

Googlebot treats a 500 on robots.txt as a temporary error and stops crawling the entire site until the file becomes accessible again, which makes this one of the highest-impact robots.txt failure modes.

Can I block AI crawlers like GPTBot via robots.txt? 

Yes. Add a User-agent: GPTBot block with Disallow: / to block OpenAI's crawler. Similar directives apply to ClaudeBot, CCBot, and other AI agents.

How often does Google re-fetch robots.txt? 

Approximately every 24 hours, though Google caches the file and may not reflect changes instantly. Use GSC's robots.txt report to force a re-fetch after making changes.

Is Crawl-delay effective for controlling Googlebot? 

No. Google ignores Crawl-delay. Adjust Googlebot's crawl rate via the crawl rate settings in Google Search Console instead.