This tutorial explains how to create an SEO-friendly robots.txt file for any WordPress website. We have gathered industry-standard practices and techniques for website SEO.
A robots.txt file is a powerful tool when you’re working on your website’s search engine optimization (SEO), but it should be handled with care. It lets you deny Google and other search engines access to particular files and folders on your site, but that is often not the best way to optimize your website.
Today at Howtechwork, we’ll explain how we think bloggers, webmasters and SEO specialists should use their robots.txt file, and propose a ‘best practice’ approach that we think is suitable for most sites.
In this tutorial, you’ll find a robots.txt example that works for the vast majority of WordPress and other CMS-based websites. If you want to know more about how your website’s robots.txt file works, you can read this detailed guide to robots.txt.
What we mean by best SEO robots.txt practices
There is no magic trick for getting your website to the top of Google or other search engines, because search engines are continually improving the way they crawl the web and index site content. What used to be best practice a few years ago may no longer work, or may even harm your website.
In this article, ‘best practice’ simply means relying on your robots.txt file as little as possible, rather than using it to block search engines from crawling files on your site. In fact, the only time you should think of blocking URLs in your robots.txt file is when you face complex technical challenges (for example, a large eCommerce website with faceted navigation) or when you have no other option.
In short, blocking URLs via robots.txt is rarely a good approach, and it can cause more problems for your website than it solves.
The code below is the recommended robots.txt file for most WordPress websites.
    # This space intentionally left blank
    # This is the Howtechwork recommended robots.txt file for WP websites
    User-agent: *
The above robots.txt code explained
As you can see, the file contains very little. The first two lines, which start with the # sign, are comments for humans looking at the file, so that they understand why the file is ‘empty’.
The User-agent: * line indicates that any instructions below it apply to all bots and crawlers. Because we provide no further instructions, the file simply says “all search engine crawlers can freely crawl this website without restriction”.
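To check that reading of the file yourself, here is a minimal sketch using Python’s standard-library robots.txt parser. The helper name can_crawl, the Googlebot user agent string and the example.com URL are our own illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The recommended file from above: two comments and a rule group with no rules.
ROBOTS_TXT = """\
# This space intentionally left blank
# This is the Howtechwork recommended robots.txt file for WP websites
User-agent: *
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# With no Disallow lines, every URL is allowed for every crawler.
print(can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/any/page/"))  # True
```

Since the file contains no Disallow rule, the parser has nothing to match against and falls back to allowing the request.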
Better ways to disallow crawlers from crawling URLs
If for any reason you want to keep search engines from crawling or indexing certain parts of your WordPress website, there are better approaches than robots.txt.
You can use meta robots tags or robots HTTP headers (X-Robots-Tag) instead.
This comprehensive guide to meta robots tags explains how to manage crawling and indexing the right way. The Yoast SEO plugin also provides tools that help you implement those meta tags on your site’s pages. However, if your website has crawling or indexing issues that can’t be fixed via meta robots tags or HTTP headers, or you need to prevent crawler access for any other reason, please read this ultimate guide to robots.txt.
Also note that WordPress itself and the Yoast SEO plugin already prevent indexing of some sensitive files and URLs, such as the WordPress admin area (via an X-Robots-Tag HTTP header).
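To make the two alternatives concrete, here is a small sketch of what a meta robots tag and an X-Robots-Tag header look like. On a WordPress site you would normally set these through a plugin such as Yoast SEO rather than by hand; the Python below only illustrates the output. The function names and the paths in NOINDEX_PREFIXES are hypothetical.

```python
# Hypothetical URL prefixes that should stay out of the search index.
NOINDEX_PREFIXES = ("/internal-search/", "/print/")


def robots_headers(path: str) -> list[tuple[str, str]]:
    """Return extra HTTP response headers for a request path."""
    if path.startswith(NOINDEX_PREFIXES):
        # The page may be crawled and its links followed, but not indexed.
        return [("X-Robots-Tag", "noindex, follow")]
    return []


def robots_meta_tag(noindex: bool) -> str:
    """Return the meta robots tag to place inside a page's <head>."""
    content = "noindex, follow" if noindex else "index, follow"
    return f'<meta name="robots" content="{content}">'


print(robots_headers("/internal-search/?q=shoes"))
print(robots_meta_tag(noindex=True))
```

Keep in mind that a crawler can only see a noindex tag or header on a page it is allowed to fetch, so these directives only work when the URL is not blocked in robots.txt.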
Why a simple robots.txt file is best for SEO
If you have looked at the robots.txt files of most websites, you will have noticed that they contain many complicated instructions. Each line tells search engines which parts of the website to crawl and which to skip.
But Howtechwork sticks with a simple, short and concise robots.txt format that allows search engine crawlers to reach every page. Below are our reasons why you shouldn’t complicate your site’s robots.txt file.
Misconfigured Robots.txt can block indexing
For your website to compete for visibility in search results, search engines need to discover, crawl and index your pages. When you block certain URLs via robots.txt, a single wrong rule can block search engines from crawling your whole site, which means all your content or key pages might be ignored by search engines.
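As an illustration of how small such a mistake can be, the sketch below compares two rule sets that differ by a single character, again using Python’s standard-library parser. The helper name, the user agent and the example.com URL are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# One extra "/" is the difference between blocking nothing and blocking everything.
BLOCKS_NOTHING = "User-agent: *\nDisallow:\n"
BLOCKS_EVERYTHING = "User-agent: *\nDisallow: /\n"


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/blog/my-best-post/"
print(can_crawl(BLOCKS_NOTHING, "Googlebot", url))     # True
print(can_crawl(BLOCKS_EVERYTHING, "Googlebot", url))  # False
```

An empty Disallow value matches no URL, while Disallow: / matches every path on the site.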
As most of us know, one of the basic rules of search engine optimization (SEO) is that links from other pages can influence your site’s performance. If a URL is blocked via robots.txt, search engines will not crawl it, and they also might not distribute any link value pointing to that URL, or through that URL to other pages on the website.
Search engine (Google) fully renders all site code
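Google now fetches a page’s CSS and JavaScript files and renders the page much as a browser would, so blocking those files can stop it from seeing your pages properly. The previous best practice of blocking access to your wp-includes directory and your plugins directory via robots.txt is therefore no longer valid, which is why the Yoast plugin team collaborated with WordPress to remove the default disallow rule for wp-includes in version 4.0.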
Linking to the sitemap is not necessary
The robots.txt standard supports adding a link to your website’s XML sitemap(s) in the file, which helps search engines discover the location and contents of your site. From an SEO perspective, we consider this redundant: a webmaster should already have added the sitemap to their Google Search Console and Bing Webmaster Tools accounts.
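For readers who have not seen one, the sketch below shows what a sitemap reference in robots.txt looks like and how a parser reads it. It relies on RobotFileParser.site_maps(), which is available in Python 3.8 and later; the sitemap URL and the helper name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that only adds a sitemap reference.
WITH_SITEMAP = """\
User-agent: *
Sitemap: https://example.com/sitemap_index.xml
"""


def listed_sitemaps(robots_txt: str):
    """Return the sitemap URLs named in robots_txt, or None if there are none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps()


print(listed_sitemaps(WITH_SITEMAP))       # ['https://example.com/sitemap_index.xml']
print(listed_sitemaps("User-agent: *\n"))  # None
```

The Sitemap line is independent of any User-agent group, so it adds no crawl restrictions of its own.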
Adding your site to Google Search Console or Bing Webmaster Tools also gives you access to your website’s analytics and performance data. If you’ve done that, you don’t need a sitemap reference in your robots.txt file. In other words, for most sites the robots.txt file has very little left to do.
We believe this article is enough for you to write a simple, working, SEO-friendly robots.txt file for your WordPress blog.
To summarize, the main idea of this post is that blocking URLs, files and other parts of your site using robots.txt is no longer a recommended SEO practice. Using meta robots tags and HTTP headers is preferable.
If any part of this post is unclear, kindly let us know in the comment section.