Robots.txt Generator

Generate robots.txt with a visual editor. Presets for standard sites, WordPress, e-commerce, and AI crawler blocking. Add sitemaps and crawl delays.

What is Robots.txt Generator?

A Robots.txt Generator helps you create the robots.txt file—a plain-text file placed at the root of your website that instructs web crawlers (robots) which parts of your site they may access and which they should ignore. The format is defined by the Robots Exclusion Protocol and is honored by all major search engine bots, including Googlebot, Bingbot, and DuckDuckBot, as well as numerous other automated crawlers. Robots.txt is essential for crawl control: you can keep crawlers away from duplicate-content URLs (pagination parameters, sort filters, print versions), keep them out of staging areas and admin sections, manage crawl budget on large sites by steering Googlebot toward your most important pages, stop image or media directories from consuming crawl resources, and block AI training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot) that scrape your content for large language model training data. The file must be placed at exactly https://yourdomain.com/robots.txt—a strict path requirement that many developers get wrong. Writing robots.txt manually requires knowing the correct syntax (User-agent, Disallow, Allow, Sitemap directives), which this visual generator handles automatically.
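
For example, a small robots.txt using all four directives might look like the sketch below; the /private/ paths and the sitemap URL are placeholders for illustration, not rules you should copy as-is.

    User-agent: *
    Disallow: /private/
    Allow: /private/downloads/

    Sitemap: https://yourdomain.com/sitemap.xml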

How to Use Robots.txt Generator

Start with one of the preset templates that match your site type: Standard Website (allow everything, just add sitemap), WordPress (disallow common WordPress admin paths), E-commerce (disallow cart, checkout, and account paths), or Block AI Crawlers (disallow GPTBot, ClaudeBot, Google-Extended, and other AI training bots). The preset populates the editor with appropriate rules. To customize: click 'Add Rule Group' to add a new User-agent block. Enter the user agent name (use * for all bots, or a specific bot name like Googlebot or GPTBot), then add Disallow and Allow rules for URL paths. A path of / disallows the entire site; /admin/ disallows any URL starting with /admin/. Add your sitemap URLs in the Sitemap section—they appear as Sitemap: https://... directives at the bottom. Set a Crawl-delay value (in seconds) if you want to limit the crawl rate for specific bots. The generated robots.txt appears in the preview panel. Click Copy or Download to get the file, then upload it to your web server's root directory.
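
The exact rules each preset emits depend on the generator, but an e-commerce output would typically look something like this (the paths and sitemap URL are illustrative):

    User-agent: *
    Disallow: /cart/
    Disallow: /checkout/
    Disallow: /account/

    Sitemap: https://yourdomain.com/sitemap.xml

Note that crawlers follow only the single group that best matches their user agent, so rules under User-agent: * are ignored by a bot that has its own group; repeat any shared rules inside each bot-specific group.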

FAQ

Where exactly should robots.txt be placed?

The robots.txt file must be accessible at the root domain: https://yourdomain.com/robots.txt. It cannot be in a subdirectory—https://yourdomain.com/blog/robots.txt will not work. If your site is on a subdomain, each subdomain needs its own robots.txt file (e.g., https://docs.yourdomain.com/robots.txt). After placing the file, verify it is accessible by visiting the URL directly in your browser, then confirm Google can read it using the robots.txt report in Google Search Console (under Settings).

Can I block AI crawlers from scraping my content?

Yes. Major AI companies have published their crawler user agent names: GPTBot (OpenAI/ChatGPT), ClaudeBot (Anthropic/Claude), Google-Extended (Google's AI training, separate from Googlebot), CCBot (Common Crawl—used by many LLMs), PerplexityBot (Perplexity AI), and Bytespider (ByteDance/TikTok). Add Disallow: / rules for each of these user agents, as shown below. Keep in mind that robots.txt compliance is voluntary: the major AI labs state that these crawlers respect it, but less reputable scrapers may ignore the file entirely.
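
A blocking section for the crawlers named above looks like this; add or remove user agents to match your own policy.

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

    User-agent: Bytespider
    Disallow: /

According to Google, blocking Google-Extended does not affect normal Googlebot crawling or your ranking in Search; it only opts your content out of Google's AI training.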

Does Disallow in robots.txt prevent indexing?

Not completely. Disallow prevents crawling (Googlebot won't download the page), but if other sites link to a disallowed URL, Google can still index that URL based on the links alone—it just won't know the page content. For reliable protection against indexing, use the noindex meta tag in the page's HTML: <meta name='robots' content='noindex'>. This way, even if Googlebot crawls the page, it will not add it to search results. Do not combine Disallow and noindex on the same URL: if robots.txt blocks the page, Googlebot never fetches it and therefore never sees the noindex tag. For truly sensitive pages, require a login instead.

What is the difference between Disallow and Allow?

Disallow blocks access to a path, and Allow explicitly permits a path that a broader Disallow rule would otherwise block. When rules conflict, the rule with the longest matching path (the more specific one) takes precedence. For example: Disallow: /admin/ blocks all of /admin/, but Allow: /admin/public/ overrides the Disallow for that specific path. This pattern is common for WordPress sites that want to block /wp-admin/ but allow /wp-admin/admin-ajax.php (needed for some public AJAX functions), as shown below. Allow rules only make sense in combination with a more general Disallow rule.
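
Written out, the WordPress example above becomes:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php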

What is the Crawl-delay directive and should I use it?

Crawl-delay: N tells a crawler to wait N seconds between requests to your server. For example, Crawl-delay: 2 limits the bot to at most 30 requests per minute. This is useful for protecting servers with limited resources from being overwhelmed by aggressive crawlers. However, Googlebot ignores Crawl-delay entirely; Google adjusts its crawl rate automatically based on how your server responds (the manual crawl rate limiter in Search Console has been retired). Crawl-delay is respected by Bingbot, DuckDuckBot, and many other crawlers.
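
For example, to throttle Bingbot while leaving other crawlers untouched, you could add a group like the one below; the 2-second value is only an illustration, so pick a delay your server actually needs.

    User-agent: Bingbot
    Crawl-delay: 2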