WordPress robots.txt: How to Edit It (and Control AI Crawlers)

Your WordPress site already has a robots.txt file, even if you never created one. WordPress generates a virtual version on the fly, and it sits at yoursite.com/robots.txt right now. The file tells search engines and AI crawlers which parts of your site they are welcome to crawl, and a few wrong lines in it can quietly hide your whole site from Google.

This guide covers where the file lives in WordPress, how to edit it without breaking anything, a safe template you can copy, the one mistake that leaves pages stuck in an “Indexed, though blocked by robots.txt” limbo, and how to control AI crawlers like GPTBot and ClaudeBot now that they matter.

What robots.txt does, and what it does not

robots.txt is a plain text file at the root of your domain that gives crawlers instructions before they start crawling. You use it to keep bots out of areas that have no business in search results, like admin pages or internal search results, and to point them at your sitemap.

Two limits matter. First, it controls crawling, not indexing. Blocking a URL in robots.txt stops compliant bots from reading the page, but Google can still list that URL in search results if other pages link to it. More on that trap below. Second, it is advisory. Reputable crawlers from Google, Bing, OpenAI, and Anthropic honor it. Bad-faith scrapers ignore it entirely, so robots.txt is not a security tool. Never use it to hide private data.

Where is the robots.txt file in WordPress?

It is at yoursite.com/robots.txt. Type that into a browser and you will see it.

Here is the part that confuses people: by default there is no physical robots.txt file on your server. WordPress creates a virtual one in memory every time a crawler asks for it. That is why you can see the file at the URL but cannot find it in your file manager. The moment you upload a real robots.txt file to your site root, that physical file takes over and WordPress stops generating the virtual one.

WordPress’s default robots.txt

Out of the box, WordPress serves something close to this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/wp-sitemap.xml

It tells every bot (User-agent: *) to stay out of the /wp-admin/ folder, with one exception: admin-ajax.php, which themes and plugins use to load content on the front end, so it needs to stay crawlable. Since WordPress 5.5, the virtual file also lists your core sitemap. If you run Yoast or Rank Math, your SEO plugin usually manages the sitemap line and points it at its own sitemap instead. For most sites, this default is already fine and you do not need to touch it.
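
The Allow exception works because major crawlers pick the most specific (longest) matching rule, with Allow winning a tie, which is the precedence described in Google's robots.txt documentation and RFC 9309. A minimal Python sketch of that logic follows. It assumes plain path prefixes with no * or $ wildcards, and the names is_allowed and DEFAULT_RULES are made up for this illustration. Python's built-in urllib.robotparser is deliberately not used here, since it appears to apply rules in file order rather than by longest match.

```python
# Minimal sketch of longest-match precedence for Allow/Disallow rules.
# Assumption: plain path prefixes only (no * or $ wildcards).

DEFAULT_RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`."""
    best_kind, best_len = "allow", -1  # no matching rule means allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule value matches nothing
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_goes_to_allow:
            best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


for path in ["/wp-admin/options.php", "/wp-admin/admin-ajax.php", "/blog/hello-world/"]:
    verdict = "allowed" if is_allowed(path, DEFAULT_RULES) else "blocked"
    print(path, "->", verdict)
```

Under these assumptions, the admin settings page comes out blocked while admin-ajax.php and the blog post come out allowed.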

robots.txt syntax, quickly

You only need four directives to read or write almost any robots.txt file:

User-agent: names the crawler a block of rules applies to. An asterisk means all crawlers.
Disallow: a path, matched as a prefix, that crawlers should not request. Disallow: / blocks the entire site.
Allow: a path crawlers may request, used to carve an exception out of a broader Disallow.
Sitemap: the full URL of your XML sitemap. Add this so crawlers find every page you do want indexed.
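
To make the four directives concrete, here is a rough Python sketch that splits a robots.txt file into user-agent groups and sitemap URLs. It is a simplified reader, not a full parser: the function name parse_robots and the returned structure are assumptions for this example, and unknown directives such as Crawl-delay are simply skipped.

```python
def parse_robots(text):
    """Split robots.txt text into user-agent groups and sitemap URLs."""
    groups = []    # each item: {"agents": [...], "rules": [(directive, path), ...]}
    sitemaps = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines share one group of rules.
            if current is None or current["rules"]:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value)
        elif field in ("allow", "disallow") and current is not None:
            current["rules"].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)  # Sitemap applies to the whole file
    return groups, sitemaps


SAMPLE = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/wp-sitemap.xml
"""

groups, sitemaps = parse_robots(SAMPLE)
print(groups)
print(sitemaps)
```

For the sample file this yields a single group for the * user-agent holding the two rules, plus one sitemap URL.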

You will also see Crawl-delay in older guides. Google ignores it, though Bing and Anthropic’s crawlers still honor it. If your sitemap is not set up yet, fix that first. Our guide on finding and checking your sitemap walks through it.

How to edit robots.txt in WordPress

You have two clean ways to do it. Pick based on whether you run an SEO plugin.

Option 1: Your SEO plugin (recommended)

Most SEO plugins include a robots.txt editor inside the WordPress dashboard, so you never have to open an FTP client:

Rank Math: Rank Math SEO > General Settings > Edit robots.txt.
Yoast SEO: Yoast SEO > Tools > File editor.
All in One SEO: All in One SEO > Tools > Robots.txt Editor.

Edit your rules, save, and reload yoursite.com/robots.txt to confirm the change. This is the safest route because nothing has to be uploaded over FTP. Rank Math and All in One SEO edit the virtual file. Yoast’s file editor typically saves a physical robots.txt in your site root instead, which then takes over from the virtual one as described above.

Option 2: Edit the file directly

If you do not use an SEO plugin, create a plain text file named robots.txt and upload it to your site root (the same folder as wp-config.php) using your host’s file manager or SFTP. Remember that this physical file overrides WordPress’s virtual one, so it becomes the single source of truth. Whichever method you use, confirm it worked by loading yoursite.com/robots.txt in a browser and checking the rules are what you expect.
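
If you would rather check from a script than a browser, the same confirmation step can be sketched in Python with the standard library. The helper names robots_url and fetch_robots, the robots-check/0.1 user-agent string, and the example.com address are all placeholders for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def robots_url(site_url):
    """Build the robots.txt URL for the host that serves `site_url`."""
    parts = urlsplit(site_url)
    # robots.txt always lives at the root of the host, whatever page you start from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url, timeout=10):
    """Download robots.txt and return (HTTP status, body text).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = Request(robots_url(site_url), headers={"User-Agent": "robots-check/0.1"})
    with urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


print(robots_url("https://example.com/blog/some-post/"))
```

Calling fetch_robots with your own site address and printing the returned body shows exactly what crawlers receive, whether it comes from the virtual file or a physical one.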

A safe robots.txt template for WordPress

For a standard WordPress blog or business site, this is all you need:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap_index.xml

Swap the sitemap URL for your real one (Yoast and Rank Math both use sitemap_index.xml, core WordPress uses wp-sitemap.xml). Resist the urge to add a long list of Disallow rules you copied from a forum. A few things you should almost never block:

Your CSS and JavaScript. Google renders pages like a browser, and blocking these can hurt how it sees your layout.
/wp-content/uploads/. That is where your images live, and blocking it can keep them out of image search.
The whole site with Disallow: /. It sounds obvious, but it is the single most common way people accidentally deindex themselves, usually left over from a staging site.

robots.txt is not the same as noindex

This is the mistake worth understanding before you change anything. If you want a page gone from Google, blocking it in robots.txt is the wrong tool and often backfires.

When you Disallow a URL, you stop Google from crawling it. But if another page links to that URL, Google can still index it without ever reading it, which produces the “Indexed, though blocked by robots.txt” warning in Search Console. The result is a bare URL in search results with no useful title or description, which is the opposite of what you wanted.

To actually keep a page out of the index, leave it crawlable and add a noindex meta robots tag instead, which every major SEO plugin lets you toggle per page. Google has to crawl the page to see the noindex tag, so do not block it in robots.txt at the same time. Use robots.txt to manage crawl behavior, and noindex to manage what appears in the index.
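
The interaction between the two tools can be sketched in a few lines of Python. This example only looks at the meta robots tag in the HTML (it ignores the X-Robots-Tag HTTP header), and the names RobotsMetaParser, has_noindex, and noindex_will_be_seen are invented for the illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(part.strip().lower() for part in content.split(","))


def has_noindex(html):
    """Return True if the page carries a noindex meta robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def noindex_will_be_seen(html, blocked_by_robots_txt):
    """A noindex tag only works if crawlers are allowed to fetch the page."""
    return has_noindex(html) and not blocked_by_robots_txt


PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'

print(noindex_will_be_seen(PAGE, blocked_by_robots_txt=False))
print(noindex_will_be_seen(PAGE, blocked_by_robots_txt=True))
```

The first call reports that the noindex can take effect. The second reports that it cannot, because a crawler that is blocked from the page never gets to read the tag.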

Controlling AI crawlers with robots.txt

robots.txt is now the main lever for telling AI companies whether they can use your content. The major AI crawlers publish named user-agents and say they honor robots.txt, so you can allow or block each one. Here are the ones worth knowing, taken from each company’s own documentation:

User-agent	Company	What it is for
GPTBot	OpenAI	Training data for its models
OAI-SearchBot	OpenAI	ChatGPT search results
ChatGPT-User	OpenAI	User-triggered fetches in ChatGPT
ClaudeBot	Anthropic	Training data for Claude
Claude-User	Anthropic	User-triggered fetches in Claude
Claude-SearchBot	Anthropic	Claude search indexing
Google-Extended	Google	Gemini and Vertex AI training

To opt out of AI training while staying visible everywhere else, block the training crawlers and leave the search ones alone:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Two things to keep straight. Google-Extended is separate from Googlebot: blocking it opts you out of Gemini training but has zero effect on your normal Google Search rankings or inclusion, which Google states directly in its crawler documentation. And blocking the search-oriented bots (OAI-SearchBot, Claude-SearchBot) is the move that can pull you out of AI answers, which for most sites is the wrong call. If your goal is the opposite, getting mentioned by these tools, see our guide on getting cited by LLMs and consider adding an llms.txt file alongside your robots.txt.

This is a values call as much as an SEO one. There is no single right answer, and the list of AI crawlers keeps growing. The community-maintained ai.robots.txt project tracks the current set, including others like PerplexityBot and CCBot, if you want a fuller block list.
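
To see at a glance what a given file says to the AI crawlers discussed here, a small Python audit can help. This is a simplified sketch: it only detects a full-site Disallow: / for each crawler, ignores partial blocks and wildcards, and assumes a crawler with no group of its own falls back to the * group. The names AI_AGENTS, scan_groups, and ai_access_report are made up for this example, and the agent list covers only some of the crawlers mentioned in this section.

```python
AI_AGENTS = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Google-Extended": "training",
    "OAI-SearchBot": "search",
    "Claude-SearchBot": "search",
}


def scan_groups(text):
    """Return (agents with their own group, agents whose group has `Disallow: /`)."""
    seen, blocked = set(), set()
    agents, in_rules = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line ended the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
            seen.add(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(agents)
    return seen, blocked


def ai_access_report(text):
    """Map each known AI user-agent to (purpose, fully_blocked)."""
    seen, blocked = scan_groups(text)
    report = {}
    for name, purpose in AI_AGENTS.items():
        key = name.lower()
        group = key if key in seen else "*"  # fall back to the catch-all group
        report[name] = (purpose, group in blocked)
    return report


OPT_OUT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

for name, (purpose, is_blocked) in ai_access_report(OPT_OUT).items():
    print(f"{name:18} {purpose:9} {'blocked' if is_blocked else 'allowed'}")
```

With the opt-out rules from this section, the three training crawlers are reported as blocked and the two search crawlers as allowed.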

Common WordPress robots.txt mistakes

Leaving Disallow: / live after launch. If your staging site blocked everything, double-check the production file the day you go live.
Blocking CSS and JavaScript, which stops Google from rendering your pages correctly.
Using robots.txt to hide private or sensitive URLs. Anyone can read your robots.txt, so you are publishing a map of what you want hidden.
Forgetting the Sitemap line, which is one of the easiest ways to help crawlers find everything.
Expecting robots.txt to remove a page from Google. That needs noindex, as covered above.
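
Several of these mistakes can be caught mechanically. The Python sketch below is a rough heuristic checker, not a validator: the function name lint_robots, the warning wording, and the exact patterns it looks for are assumptions made for this example.

```python
def lint_robots(text):
    """Return warnings for a few common WordPress robots.txt mistakes."""
    warnings = []
    has_sitemap = False
    agents, in_rules = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line ended the previous group
                agents, in_rules = [], False
            agents.append(value)
        elif field == "sitemap":
            has_sitemap = True
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "allow":
                continue
            path = value.lower().rstrip("$")
            if value == "/" and "*" in agents:
                warnings.append("Disallow: / blocks the whole site for every crawler")
            elif path.endswith((".css", ".js")):
                warnings.append(f"Disallow: {value} blocks CSS or JavaScript")
            elif path.startswith("/wp-content/uploads"):
                warnings.append(f"Disallow: {value} blocks your uploads folder")
    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


STAGING_LEFTOVER = "User-agent: *\nDisallow: /\n"

for warning in lint_robots(STAGING_LEFTOVER):
    print("WARNING:", warning)
```

Run against a leftover staging file like the one above, it flags both the site-wide block and the missing Sitemap line. It cannot tell you whether a page you meant to remove needs noindex instead, so that check stays manual.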

Frequently asked questions

Where is the robots.txt file in WordPress?

It is served at yoursite.com/robots.txt. By default WordPress generates it virtually, so there is no physical file in your folders unless you or a plugin created one. Edit it through your SEO plugin or by uploading a real robots.txt to your site root.

Does editing robots.txt affect my Google rankings?

Only if you block something important. Used correctly it has no negative effect and helps crawlers spend their time on the right pages. The danger is blocking pages, CSS, or your whole site by accident, which can drop you from search.

Should I block AI crawlers in robots.txt?

It depends on your goal. Block training crawlers like GPTBot and ClaudeBot if you do not want your content used to train models. Keep the search crawlers allowed if you want to appear in AI search answers. Blocking Google-Extended does not affect your normal Google rankings.

The short version

WordPress already serves a virtual robots.txt at yoursite.com/robots.txt, and the default is fine for most sites. Edit it through your SEO plugin rather than wrestling with files, always include your sitemap, and never block CSS, your uploads folder, or the whole site. Remember that robots.txt controls crawling, not indexing, so use a noindex tag when you actually want a page gone. And if AI matters to you, robots.txt is where you decide which crawlers like GPTBot, ClaudeBot, and Google-Extended get access, with the search-oriented bots being the ones you usually want to leave alone.