Robots.txt Generator
Generate robots.txt with a visual editor. Presets for standard sites, WordPress, e-commerce, and AI crawler blocking. Add sitemaps and crawl delays.
What is Robots.txt Generator?
A Robots.txt Generator helps you create the robots.txt file: a plain-text file placed at the root of your website that tells web crawlers (robots) which parts of your site they may access and which they should ignore. The format is defined by the Robots Exclusion Protocol (standardized as RFC 9309) and is honored by all major search engine bots, including Googlebot, Bingbot, and DuckDuckBot, as well as many other automated crawlers.

Robots.txt is essential for SEO control. It lets you keep duplicate-content pages from being crawled (pagination parameters, sort filters, print versions), keep staging areas and admin sections out of search results, manage crawl budget on large sites by steering Googlebot toward your most important pages, stop image or media directories from consuming crawl resources, and block AI training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot) that scrape your content for large language model training data.

The file must be served at exactly https://yourdomain.com/robots.txt, a strict path requirement that many developers get wrong. Writing robots.txt by hand requires knowing the correct syntax (the User-agent, Disallow, Allow, and Sitemap directives), which this visual generator handles automatically.
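A minimal file showing all four core directives might look like this (the paths and sitemap URL are placeholders for illustration):

```
# Rules for every crawler
User-agent: *
Disallow: /admin/
Disallow: /print/
# Carve an exception out of the broader Disallow above
Allow: /admin/login

# Sitemaps must be listed as absolute URLs
Sitemap: https://yourdomain.com/sitemap.xml
```

Blank lines separate groups, and anything after a `#` is a comment ignored by crawlers.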
How to Use Robots.txt Generator
FAQ
Where exactly should robots.txt be placed?
The robots.txt file must be accessible at the root domain: https://yourdomain.com/robots.txt. It cannot be in a subdirectory—https://yourdomain.com/blog/robots.txt will not work. If your site is on a subdomain, each subdomain needs its own robots.txt file (e.g., https://docs.yourdomain.com/robots.txt). After placing the file, verify it is accessible by visiting the URL directly in your browser, then confirm Google can fetch it in Google Search Console's robots.txt report under Settings > robots.txt.
Can I block AI crawlers from scraping my content?
Yes. Major AI companies have published their crawler user agent names: GPTBot (OpenAI/ChatGPT), ClaudeBot (Anthropic/Claude), Google-Extended (Google's AI training, separate from Googlebot), CCBot (Common Crawl—used by many LLMs), PerplexityBot (Perplexity AI), and Bytespider (ByteDance/TikTok). Add a Disallow: / rule for each of these user agents. Note that unlike reputable search engines, some AI scrapers may not honor robots.txt—though the major labs state that their published crawlers respect it.
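Because RFC 9309 allows several User-agent lines to share one rule group, all of these crawlers can be blocked in a single block while search bots remain unaffected:

```
# Block AI training crawlers; search engine bots are not listed
# here and keep their normal access
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: PerplexityBot
User-agent: Bytespider
Disallow: /
```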
Does Disallow in robots.txt prevent indexing?
Not completely. Disallow prevents crawling (Googlebot won't download the page), but if other sites link to a disallowed URL, Google can still index that URL based on the links alone—it just won't know the page content. To reliably keep a page out of search results, use the noindex meta tag in the page's HTML: <meta name="robots" content="noindex">. Even if Googlebot crawls the page, it will not add it to search results. Do not combine a robots.txt Disallow with noindex on the same page: if the URL is disallowed, Googlebot never fetches it and therefore never sees the noindex tag. For truly sensitive pages, require a login instead.
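The tag goes in the document head; a crawler-specific variant is also recognized:

```
<!-- In the <head> of any page that must stay out of search results -->
<meta name="robots" content="noindex">

<!-- Or target a single crawler by name -->
<meta name="googlebot" content="noindex">
```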
What is the difference between Disallow and Allow?
Disallow blocks access to a path, and Allow explicitly permits a path that a broader Disallow rule would otherwise block. The more specific rule takes precedence. For example: Disallow: /admin/ blocks all of /admin/, but Allow: /admin/public/ overrides the Disallow for that specific path. This pattern is common for WordPress sites that want to block /wp-admin/ but allow /wp-admin/admin-ajax.php (needed for some public AJAX functions). Allow rules only make sense in combination with a more general Disallow rule.
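You can sanity-check Disallow/Allow interactions with Python's standard urllib.robotparser module. One caveat: CPython's parser applies rules in file order (first match wins), whereas Google documents longest-match precedence, so listing the narrower Allow before the broader Disallow gives the same result under both interpretations. The paths below are illustrative:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# parse() takes the file's lines; the narrow Allow is listed before
# the broad Disallow so CPython's first-match ordering agrees with
# Google's longest-match rule.
rp.parse("""\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
""".splitlines())

print(rp.can_fetch("*", "/admin/settings"))     # False: blocked
print(rp.can_fetch("*", "/admin/public/form"))  # True: Allow carve-out
print(rp.can_fetch("*", "/blog/post"))          # True: no rule matches
```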
What is the Crawl-delay directive and should I use it?
Crawl-delay: N tells a crawler to wait N seconds between requests to your server. For example, Crawl-delay: 2 limits a bot to at most 30 requests per minute (60 s / 2 s). This is useful for protecting servers with limited resources from being overwhelmed by aggressive crawlers. However, Googlebot ignores Crawl-delay entirely: Google adjusts its crawl rate automatically, and Search Console's manual crawl-rate limiter has been retired. Crawl-delay is respected by Bingbot, DuckDuckBot, and many other crawlers.
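Because the directive is per-group, you can set different delays for different bots (the values here are examples):

```
# Bingbot may fetch at most once every 2 seconds
User-agent: Bingbot
Crawl-delay: 2

# All other crawlers that honor the directive: once every 5 seconds
User-agent: *
Crawl-delay: 5
```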