How to edit and optimize robots.txt in WordPress
In this post, I’m covering robots.txt optimization with WordPress. I’ll take you through what a robots.txt file is, why it’s important, how to optimize it, and we’ll close things out by covering common questions about robots.txt. Let’s dive in!
What a robots.txt file is and why it’s important
If you’re reading this article, there’s a good chance you already know what a robots.txt file is and why it’s important. However, if you need a refresher: a robots.txt file puts power in your hands when it comes to robots like web and search engine crawlers.
A robots.txt file tells robots how to crawl your website. With it, you can define what areas of your website crawlers can access, and which areas they can’t access. You can even specify directives for specific bots within the robots.txt file.
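For example, a short hypothetical file like the one below lets every crawler roam freely while keeping one bot out of a single directory (the bot name and path here are made up for illustration):

User-agent: *
Disallow:

User-agent: ExampleBot
Disallow: /private/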
Let’s look at a simple robots.txt file example
If you go to https://expanderdigital.com/robots.txt, you can see the robots.txt file for this website (as of this article’s publication). It’s a simple file:
# Hi there!
# Check out our SEO services at https://expanderdigital.com/services/
User-agent: *
Allow: /
Sitemap: https://expanderdigital.com/sitemap_index.xml
So, what’s going on with this robots.txt file? I decided to have a little fun with my file. Most people don’t usually visit the robots.txt file of a website, but for those who do, I’d like them to check out my SEO services. That’s what’s going on with the first two lines of the file: a greeting and a pitch. The third line says that the directives in the robots.txt file are for all user-agents, as indicated by the wildcard character. The fourth line says everything in the root directory can be crawled and accessed. The last line identifies the URL where robots can find the website’s XML sitemap.
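If you’re curious how a well-behaved crawler would interpret that file, here’s a minimal sketch using urllib.robotparser from Python’s standard library. It downloads the file above and reports what it finds; the site_maps() helper requires Python 3.8 or newer:

from urllib.robotparser import RobotFileParser

# Download and parse the live robots.txt file.
parser = RobotFileParser()
parser.set_url("https://expanderdigital.com/robots.txt")
parser.read()

# "Allow: /" under "User-agent: *" means any crawler may fetch any path.
print(parser.can_fetch("MyCrawler", "https://expanderdigital.com/services/"))  # True

# site_maps() returns the URLs listed on Sitemap lines (Python 3.8+).
print(parser.site_maps())  # ['https://expanderdigital.com/sitemap_index.xml']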
Let’s look at a more complex robots.txt file example
For this example, I decided to go with a website that’s more popular and well-trafficked. Websites that are more well-known tend to have more robust robots.txt files due to the need to limit traffic from certain robots.
The business I’m going to use as an example is T-Mobile. If you go to https://t-mobile.com/robots.txt, you can see the robots.txt file for this website. This robots.txt file is a bit more complex:
User-agent: Twitterbot
Disallow:

User-agent: Atomz/1.0
Sitemap: https://www.t-mobile.com/sitemap.xml
Sitemap: https://www.t-mobile.com/company-sitemap.xml
Sitemap: https://www.t-mobile.com/store-locator-sitemap.xml
Sitemap: https://www.t-mobile.com/filter-sitemap.xml
Sitemap: https://www.t-mobile.com/product-sitemap.xml
Sitemap: https://www.t-mobile.com/business/sitemap.xml
Disallow: /_authoring/
Disallow: /personalized-campaign.html
Disallow: /retargeted-campaign.html
Disallow: /anonymous-campaign.html
Disallow: /PartnerServices.aspx*
Disallow: /shop/cart/
Disallow: /popup/
Disallow: /Templates/Popup.aspx?*
Disallow: /shop/plans/Retail/
Disallow: /system/sling/cqform/
Disallow: /home
Disallow: /styleguide
Disallow: /offers/aN1217Lbp.html
Disallow: /offers/kN1024Cat.html
Disallow: /offer/alcatel-linkzone-modal.html
Disallow: /customer/TX-210-726-area-code-overlay-notification.html
Disallow: /business/model-repository/*
Disallow: /content/t-mobile/consumer/_authoring/modules/hp/*
Disallow: /content/t-mobile/consumer/_authoring/pages/deals/*
Disallow: /foresee/*
Disallow: /content/t-mobile
Disallow: /shop/addons/Services/
Disallow: /templates/*
Disallow: /orderstatus/*
Disallow: /shop/AddOns/Accessories/*

User-agent: Baiduspider
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: *
Sitemap: https://www.t-mobile.com/sitemap.xml
Sitemap: https://www.t-mobile.com/company-sitemap.xml
Sitemap: https://www.t-mobile.com/store-locator-sitemap.xml
Sitemap: https://www.t-mobile.com/filter-sitemap.xml
Sitemap: https://www.t-mobile.com/product-sitemap.xml
Sitemap: https://www.t-mobile.com/business/sitemap.xml
Disallow: /_authoring/
Disallow: /?cmpid=*
Disallow: /?icid=*
Disallow: /personalized-campaign.html
Disallow: /retargeted-campaign.html
Disallow: /anonymous-campaign.html
Disallow: /PartnerServices.aspx*
Disallow: /shop/cart/
Disallow: /popup/
Disallow: /Templates/Popup.aspx?*
Disallow: /shop/plans/Retail/
Disallow: /system/sling/cqform/
Disallow: /home
Disallow: /styleguide
Disallow: /offers/aN1217Lbp.html
Disallow: /offers/kN1024Cat.html
Disallow: /offer/alcatel-linkzone-modal.html
Disallow: /customer/TX-210-726-area-code-overlay-notification.html
Disallow: /business/model-repository/*
Disallow: /content/t-mobile/consumer/_authoring/modules/hp/*
Disallow: /content/t-mobile/consumer/_authoring/pages/deals/*
Disallow: /foresee/*
Disallow: /content/t-mobile
Disallow: /shop/addons/Services/
Disallow: /templates/*
Disallow: /orderstatus/*
Disallow: /shop/AddOns/Accessories/*
I’ll focus on a few highlights that I think are worth calling out with this robots.txt. If I went line by line and covered everything, we might be here all day! Let’s start with the easy stuff first: blocked web crawlers.
Because T-Mobile US does business in the United States, it makes sense that they wouldn’t want web crawlers from other countries to crawl and index the website. This is why on lines 37 and 38, the Chinese search engine Baidu is blocked from crawling the website. This is also why on lines 40 and 41, the Russian search engine Yandex is blocked:
User-agent: Baiduspider
Disallow: /

User-agent: YandexBot
Disallow: /
Jumping back up to the top, we see the following on lines one and two:
User-agent: Twitterbot
Disallow:
The above indicates that Twitterbot is allowed to crawl everything on the domain. It’s worth pointing out that the following directive allows everything on your website to be crawled:
Disallow:
Interestingly, the following directive achieves the same result:
Allow: /
Either directive, paired with a user-agent specification, allows that user-agent to crawl the entire website.
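If you’d like to verify this yourself, here’s a small sketch using Python’s urllib.robotparser that parses both variants inline and checks the same URL against each. The URL is just a placeholder:

from urllib.robotparser import RobotFileParser

# Variant 1: an empty Disallow value blocks nothing.
empty_disallow = RobotFileParser()
empty_disallow.parse([
    "User-agent: Twitterbot",
    "Disallow:",
])

# Variant 2: "Allow: /" explicitly permits the whole site.
allow_root = RobotFileParser()
allow_root.parse([
    "User-agent: Twitterbot",
    "Allow: /",
])

url = "https://www.t-mobile.com/offers"
print(empty_disallow.can_fetch("Twitterbot", url))  # True
print(allow_root.can_fetch("Twitterbot", url))      # True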
The last section I’m going to call attention to is for Atomz:
User-agent: Atomz/1.0
Sitemap: https://www.t-mobile.com/sitemap.xml
Sitemap: https://www.t-mobile.com/company-sitemap.xml
Sitemap: https://www.t-mobile.com/store-locator-sitemap.xml
Sitemap: https://www.t-mobile.com/filter-sitemap.xml
Sitemap: https://www.t-mobile.com/product-sitemap.xml
Sitemap: https://www.t-mobile.com/business/sitemap.xml
Disallow: /_authoring/
Disallow: /personalized-campaign.html
Disallow: /retargeted-campaign.html
Disallow: /anonymous-campaign.html
Disallow: /PartnerServices.aspx*
Disallow: /shop/cart/
Disallow: /popup/
Disallow: /Templates/Popup.aspx?*
Disallow: /shop/plans/Retail/
Disallow: /system/sling/cqform/
Disallow: /home
Disallow: /styleguide
Disallow: /offers/aN1217Lbp.html
Disallow: /offers/kN1024Cat.html
Disallow: /offer/alcatel-linkzone-modal.html
Disallow: /customer/TX-210-726-area-code-overlay-notification.html
Disallow: /business/model-repository/*
Disallow: /content/t-mobile/consumer/_authoring/modules/hp/*
Disallow: /content/t-mobile/consumer/_authoring/pages/deals/*
Disallow: /foresee/*
Disallow: /content/t-mobile
Disallow: /shop/addons/Services/
Disallow: /templates/*
Disallow: /orderstatus/*
Disallow: /shop/AddOns/Accessories/*
T-Mobile has blocked the Atomz robot from crawling specific directories and has called out the URLs for several XML sitemaps. Notice how each Sitemap line points to a different sitemap, and each Disallow line specifies a path that shouldn’t be crawled. You may also be curious about the wildcard (*) that appears in some directives. We’re not going to dive into wildcard usage in today’s post, but Google’s Robots.txt Specification does a great job covering this topic.
Going deeper
If you want to gain a larger perspective about robots.txt, you may want to read up on the robots exclusion protocol (REP) to become more familiar with governing standards for crawling the web.
How to create a robots.txt file with Yoast
By now, you should have a good idea of how robots.txt works, and you’re probably ready to roll up your sleeves and start working with your robots.txt file. We’ll cover how to do this with Yoast. First up, creating a robots.txt file.
Here we go:
- Log in to WordPress.
- From the left menu, go to SEO > Tools.
- From the Tools section, select File editor.
Note: If you don’t have this option, you may not have file editing enabled. You’ll need to turn this option on to use Yoast to create and manage your robots.txt file.
- Click the Create robots.txt file button.
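If you’re starting from an empty file and aren’t sure what to put in it, a common minimal setup for a WordPress site looks something like the following. Treat it as a starting sketch rather than a definitive recommendation, and swap example.com for your own domain and sitemap location:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap_index.xml

This mirrors the virtual robots.txt that WordPress serves by default: it keeps crawlers out of the admin area while still allowing admin-ajax.php, which some themes and plugins rely on for front-end functionality.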
How to edit and optimize robots.txt with Yoast
With a robots.txt in place, you can optimize the file to your needs:
- Log in to WordPress.
- From the left menu, go to SEO > Tools.
- From the Tools section, select File editor.
- Edit the field to optimize your robots.txt file.
- Click the Save changes to robots.txt button.
- Check your robots.txt with the Google Search Console robots.txt tester.
Note: This step isn’t required, but highly recommended. If you haven’t set up Google Search Console, check out How to add your WordPress website to Google Search Console.
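Alongside Google’s tester, you can run a quick local sanity check from a script. Here’s a rough sketch using Python’s urllib.robotparser; note that this parser does simple prefix matching and doesn’t understand the * wildcards from Google’s spec, so treat its answers as approximate. The domain and paths below are placeholders:

from urllib.robotparser import RobotFileParser

# Load the live robots.txt file from your site (placeholder domain).
parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# Spot-check URLs that should be crawlable and paths that should stay blocked.
checks = [
    "https://example.com/",           # expect True
    "https://example.com/wp-admin/",  # expect False if /wp-admin/ is disallowed
]
for url in checks:
    print(url, "->", parser.can_fetch("Googlebot", url))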
Common questions about robots.txt
To round out this post, I thought it might help to cover some Frequently Asked Questions (FAQs) about robots.txt. If there’s a question you think I should cover, but haven’t included here, make sure you head over to our contact page and let me know.
Do I need a robots.txt file?
The short answer is no. However, websites without a robots.txt file run the risk of all web pages on the website getting crawled and indexed by search engines. This isn’t ideal because most websites have at least a few pages, and in some cases many pages, that shouldn’t be crawled or indexed by search engines. So, do you need a robots.txt file? No, but I think every website should have one.
How do I check my robots.txt file?
You can check the robots.txt file in a few ways. The easiest method is going to the URL of the file. Because this file lives in the root directory, simply append /robots.txt to the end of the domain. For example, https://expanderdigital.com/robots.txt is how I could check the file on this website. The other way to check the file is by using the Google Search Console robots.txt tester.
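If you prefer working from a script, fetching the file takes only a few lines with Python’s standard library:

import urllib.request

# Fetch and print a site's live robots.txt file.
with urllib.request.urlopen("https://expanderdigital.com/robots.txt") as response:
    print(response.read().decode("utf-8"))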
What happens if I delete my robots.txt file?
Websites without a robots.txt file run the risk of all pages on the website being crawled and indexed by search engines.
Can I delete the robots.txt file through WordPress?
You won’t be able to delete the robots.txt file through WordPress directly. However, you can delete the robots.txt file on a WordPress-powered website with the cPanel file manager.
Can I edit my robots.txt file outside of WordPress?
Yes, you can. For most WordPress websites, you would log in to cPanel, access the file manager, and open the robots.txt file to make your edits. Don’t forget to save any changes you make!
Wrap-up
If you’re new to working with a robots.txt file, or need some guidance, we’re here to help. Visit our contact page and reach out to us. If you found this article helpful, check our blog regularly for more tips.