
How to edit and optimize robots.txt in WordPress

In this post, I’m covering robots.txt optimization with WordPress. I’ll take you through what a robots.txt file is, why it’s important, how to optimize it, and we’ll close things out by covering common questions about robots.txt. Let’s dive in!

What a robots.txt file is and why it’s important

If you’re reading this article, there’s a good chance you already know what a robots.txt file is and why it’s important. If you need a refresher, though: a robots.txt file puts power in your hands when it comes to robots like web and search engine crawlers.

A robots.txt file tells robots how to crawl your website. With it, you can define what areas of your website crawlers can access, and which areas they can’t access. You can even specify directives for specific bots within the robots.txt file.

Let’s look at a simple robots.txt file example

If you go to https://expanderdigital.com/robots.txt, you can see the robots.txt file for this website (as of the date of publishing of this article). It’s a simple file:

# Hi there!
# Check out our SEO services at https://expanderdigital.com/services/
User-agent: *
Allow: /
Sitemap: https://expanderdigital.com/sitemap_index.xml

So, what’s going on with this robots.txt file? I decided to have a little fun with mine. Most people never visit a website’s robots.txt file, but for those who do, I’d like them to check out my SEO services. That’s what the first two lines of the file are: a greeting and a pitch. The third line says the directives that follow apply to all user-agents, as indicated by the wildcard character (*). The fourth line says everything on the site, starting from the root directory, can be crawled. The last line identifies the URL where robots can find the website’s XML sitemap.
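If you'd like to sanity-check a file like this programmatically, Python's built-in urllib.robotparser module can parse it. Here's a quick sketch using my file's contents (this is just an illustration, not part of any WordPress setup):

```python
from urllib import robotparser

# The contents of the simple robots.txt example above.
robots_txt = """\
# Hi there!
# Check out our SEO services at https://expanderdigital.com/services/
User-agent: *
Allow: /
Sitemap: https://expanderdigital.com/sitemap_index.xml
"""

rp = robotparser.RobotFileParser()
rp.modified()  # mark the rules as loaded so can_fetch() will give answers
rp.parse(robots_txt.splitlines())

# With "Allow: /" for all user-agents, any URL may be crawled.
print(rp.can_fetch("*", "https://expanderdigital.com/services/"))  # True
```

The same parser powers tools that honor robots.txt, so it's a handy way to confirm your directives do what you think they do.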

Let’s look at a more complex robots.txt file example

For this example, I decided to go with a more popular, well-trafficked website. Better-known websites tend to have more robust robots.txt files because they need to limit traffic from certain robots.

The business I’m going to use as an example is T-Mobile. If you go to https://t-mobile.com/robots.txt, you can see the robots.txt file for this website. This robots.txt file is a bit more complex:

User-agent: Twitterbot
Disallow:

User-agent: Atomz/1.0
Sitemap: https://www.t-mobile.com/sitemap.xml
Sitemap: https://www.t-mobile.com/company-sitemap.xml
Sitemap: https://www.t-mobile.com/store-locator-sitemap.xml
Sitemap: https://www.t-mobile.com/filter-sitemap.xml
Sitemap: https://www.t-mobile.com/product-sitemap.xml
Sitemap: https://www.t-mobile.com/business/sitemap.xml
Disallow: /_authoring/
Disallow: /personalized-campaign.html 
Disallow: /retargeted-campaign.html 
Disallow: /anonymous-campaign.html
Disallow: /PartnerServices.aspx* 
Disallow: /shop/cart/ 
Disallow: /popup/ 
Disallow: /Templates/Popup.aspx?* 
Disallow: /shop/plans/Retail/ 
Disallow: /system/sling/cqform/ 
Disallow: /home
Disallow: /styleguide
Disallow: /offers/aN1217Lbp.html
Disallow: /offers/kN1024Cat.html
Disallow: /offer/alcatel-linkzone-modal.html
Disallow: /customer/TX-210-726-area-code-overlay-notification.html
Disallow: /business/model-repository/*
Disallow: /content/t-mobile/consumer/_authoring/modules/hp/*
Disallow: /content/t-mobile/consumer/_authoring/pages/deals/*
Disallow: /foresee/*
Disallow: /content/t-mobile
Disallow: /shop/addons/Services/
Disallow: /templates/*
Disallow: /orderstatus/*
Disallow: /shop/AddOns/Accessories/*

User-agent: Baiduspider
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: *
Sitemap: https://www.t-mobile.com/sitemap.xml
Sitemap: https://www.t-mobile.com/company-sitemap.xml
Sitemap: https://www.t-mobile.com/store-locator-sitemap.xml
Sitemap: https://www.t-mobile.com/filter-sitemap.xml
Sitemap: https://www.t-mobile.com/product-sitemap.xml
Sitemap: https://www.t-mobile.com/business/sitemap.xml
Disallow: /_authoring/
Disallow: /?cmpid=*
Disallow: /?icid=*
Disallow: /personalized-campaign.html 
Disallow: /retargeted-campaign.html 
Disallow: /anonymous-campaign.html
Disallow: /PartnerServices.aspx* 
Disallow: /shop/cart/ 
Disallow: /popup/ 
Disallow: /Templates/Popup.aspx?* 
Disallow: /shop/plans/Retail/ 
Disallow: /system/sling/cqform/ 
Disallow: /home
Disallow: /styleguide
Disallow: /offers/aN1217Lbp.html
Disallow: /offers/kN1024Cat.html
Disallow: /offer/alcatel-linkzone-modal.html
Disallow: /customer/TX-210-726-area-code-overlay-notification.html
Disallow: /business/model-repository/*
Disallow: /content/t-mobile/consumer/_authoring/modules/hp/*
Disallow: /content/t-mobile/consumer/_authoring/pages/deals/*
Disallow: /foresee/*
Disallow: /content/t-mobile
Disallow: /shop/addons/Services/
Disallow: /templates/*
Disallow: /orderstatus/*
Disallow: /shop/AddOns/Accessories/*

I’ll focus on a few highlights that I think are worth calling out with this robots.txt. If I went line by line and covered everything, we might be here all day! Let’s start with the easy stuff first: blocked web crawlers.

Because T-Mobile US does business in the United States, it makes sense that they wouldn’t want web crawlers from other countries to crawl and index the website. This is why on lines 37 and 38, the Chinese search engine Baidu is blocked from crawling the website. This is also why on lines 40 and 41, the Russian search engine Yandex is blocked:

User-agent: Baiduspider 
Disallow: /

User-agent: YandexBot
Disallow: /

Jumping back up to the top, we see the following on lines one and two:

User-agent: Twitterbot 
Disallow:

The above indicates that Twitterbot is allowed to crawl everything on the domain. It’s worth pointing out that the following directive allows everything on your website to be crawled:

Disallow:

Interestingly, this directive does the same thing:

Allow: /
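You can verify that the two forms behave identically with Python's built-in urllib.robotparser module. This is my own quick sketch, not anything from T-Mobile's setup:

```python
from urllib import robotparser

def can_fetch_any(rules):
    # Parse a list of robots.txt lines and check whether Twitterbot
    # may fetch an arbitrary URL on the site.
    rp = robotparser.RobotFileParser()
    rp.modified()  # mark the rules as loaded so can_fetch() will give answers
    rp.parse(rules)
    return rp.can_fetch("Twitterbot", "https://www.t-mobile.com/any/page.html")

# An empty Disallow and "Allow: /" both permit everything...
print(can_fetch_any(["User-agent: Twitterbot", "Disallow:"]))  # True
print(can_fetch_any(["User-agent: Twitterbot", "Allow: /"]))   # True

# ...while "Disallow: /" blocks everything.
print(can_fetch_any(["User-agent: Twitterbot", "Disallow: /"]))  # False
```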

Either directive, paired with a user-agent line, allows that user-agent to crawl the entire website.

The last section I’m going to call attention to is for Atomz:

User-agent: Atomz/1.0
Sitemap: https://www.t-mobile.com/sitemap.xml
Sitemap: https://www.t-mobile.com/company-sitemap.xml
Sitemap: https://www.t-mobile.com/store-locator-sitemap.xml
Sitemap: https://www.t-mobile.com/filter-sitemap.xml
Sitemap: https://www.t-mobile.com/product-sitemap.xml
Sitemap: https://www.t-mobile.com/business/sitemap.xml
Disallow: /_authoring/
Disallow: /personalized-campaign.html 
Disallow: /retargeted-campaign.html 
Disallow: /anonymous-campaign.html
Disallow: /PartnerServices.aspx* 
Disallow: /shop/cart/ 
Disallow: /popup/ 
Disallow: /Templates/Popup.aspx?* 
Disallow: /shop/plans/Retail/ 
Disallow: /system/sling/cqform/ 
Disallow: /home
Disallow: /styleguide
Disallow: /offers/aN1217Lbp.html
Disallow: /offers/kN1024Cat.html
Disallow: /offer/alcatel-linkzone-modal.html
Disallow: /customer/TX-210-726-area-code-overlay-notification.html
Disallow: /business/model-repository/*
Disallow: /content/t-mobile/consumer/_authoring/modules/hp/*
Disallow: /content/t-mobile/consumer/_authoring/pages/deals/*
Disallow: /foresee/*
Disallow: /content/t-mobile
Disallow: /shop/addons/Services/
Disallow: /templates/*
Disallow: /orderstatus/*
Disallow: /shop/AddOns/Accessories/*

T-Mobile has blocked the Atomz robot from crawling specific directories and has listed the URLs of several XML sitemaps. Notice how each Sitemap line points to a different sitemap, and each Disallow line names a path that shouldn’t be crawled. You may also be curious about the wildcard (*) that appears in some directives. We’re not going to dive into wildcard usage within today’s post, but Google’s Robots.txt Specification does a great job covering this topic.
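To give you a feel for it anyway, here's a rough Python sketch of how wildcard matching works under Google's rules: * matches any run of characters, a trailing $ anchors the end of the URL, and everything else (including ?) is literal. This is my own illustration of the matching rules, not code from any real crawler:

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    # Translate a robots.txt path pattern into a regular expression:
    # '*' matches any sequence of characters, a trailing '$' anchors
    # the end of the URL, and all other characters are literal.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    if anchored:
        regex += "$"
    # Patterns are prefix matches: they apply from the start of the path.
    return re.match(regex, path) is not None

print(robots_pattern_matches("/Templates/Popup.aspx?*", "/Templates/Popup.aspx?id=1"))  # True
print(robots_pattern_matches("/foresee/*", "/shop/cart/"))                              # False
```

Note that because matching is prefix-based, a pattern like /home also matches /homepage; wildcards just make the prefix more flexible.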

Going deeper

If you want to gain a larger perspective about robots.txt, you may want to read up on the robots exclusion protocol (REP) to become more familiar with governing standards for crawling the web.

How to create a robots.txt file with Yoast

By now, you should have a good idea of how robots.txt works, and you’re probably ready to roll up your sleeves and start working with your robots.txt file. We’ll cover how to do this with Yoast. First up, creating a robots.txt file.

Here we go:

  1. Log in to WordPress.
  2. From the left menu, go to SEO > Tools.
  3. From the Tools section, select File editor.
    Note: If you don’t have this option, you may not have file editing enabled. You’ll need to turn this option on to use Yoast to create and manage your robots.txt file.
  4. Click the Create robots.txt file button.
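One thing worth knowing: even before you create a physical file, WordPress serves a virtual robots.txt at yourdomain.com/robots.txt. The exact contents depend on your WordPress version and settings, but on a recent default install it looks something like this (example.com stands in for your domain):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/wp-sitemap.xml

Creating a file with Yoast replaces this virtual file with a physical one that you control.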

How to edit and optimize robots.txt with Yoast

With a robots.txt in place, you can optimize the file to your needs:

  1. Log in to WordPress.
  2. From the left menu, go to SEO > Tools.
  3. From the Tools section, select File editor.
    [Screenshot: Tools in Yoast SEO]
  4. Edit the field to optimize your robots.txt file.
    [Screenshot: Edit robots.txt in Yoast SEO]
  5. Click the Save changes to robots.txt button.
  6. Check your robots.txt with the Google Search Console robots.txt tester. Note: This step isn’t required, but it’s highly recommended. If you haven’t set up Google Search Console, check out How to add your WordPress website to Google Search Console.

Common questions about robots.txt

To round out this post, I thought it might help to cover some Frequently Asked Questions (FAQs) about robots.txt. If there’s a question you think I should cover, but haven’t included here, make sure you head over to our contact page and let me know.

Do I need a robots.txt file?

The short answer is no. However, websites without a robots.txt file run the risk of having every page on the site crawled and indexed by search engines. That isn’t ideal, because most websites have at least a few pages, and in some cases many, that shouldn’t be crawled or indexed. So, do you need a robots.txt file? No, but I think every website should have one.

How do I check my robots.txt file?

You can check your robots.txt file in a couple of ways. The easiest is to visit the file’s URL. Because this file lives in the root directory, simply append /robots.txt to the domain. For example, https://expanderdigital.com/robots.txt is how I’d check the file on this website. The other way is to use the Google Search Console robots.txt tester.
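Because the file always lives at the root of the host, you can derive the robots.txt URL from any page URL. Here's a small Python sketch (my own helper, not a standard tool):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    # robots.txt always lives at the root of the host, so keep the
    # scheme and host from any page URL and swap in the /robots.txt path.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://expanderdigital.com/blog/some-post/"))
# https://expanderdigital.com/robots.txt
```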

What happens if I delete my robots.txt file?

If you delete your robots.txt file, search engines are free to crawl and index every page on your website, just as if the file had never existed.

Can I delete the robots.txt file through WordPress?

You won’t be able to delete the robots.txt file through WordPress directly. However, you can delete the robots.txt file on a WordPress-powered website with the cPanel file manager.

Can I edit my robots.txt file outside of WordPress?

Yes, you can. For most WordPress websites, you would log in to cPanel, open the file manager, and edit the robots.txt file there. Don’t forget to save any changes you make!

Wrap-up

If you’re new to working with a robots.txt file or need some guidance, we’re here to help. Visit our contact page and reach out to us. If you found this article helpful, check our blog regularly for more tips.

 

Josh Gellock

Josh is the SEO and Content Strategist at Expander Digital, an SEO studio he founded in 2014. He's been in the SEO space for over seven years and helps businesses drive website traffic from organic search. When he’s not meeting with clients, you can find Josh spending time with his children or on a bike.
