WordPress Robots.txt For SEO
A couple of weeks ago I wrote an article about changing WordPress permalinks and how best to avoid duplicate content. Since then I have received a lot of questions about the robots.txt portion of WordPress SEO. It is important to understand that the robots.txt file must be in the root directory of your domain. There is some speculation about this, and about the use of the wildcard (*) within the robots.txt file. Spiders (or bots) do not all behave the same way, but you can look at this page on Google’s Webmaster help site, which outlines their policy on robots.txt. Here is a quote directly from Google on placement of the robots.txt file:
The robots.txt file must reside in the root of the domain and must be named “robots.txt”. A robots.txt file located in a subdirectory isn’t valid, as bots only check for this file in the root of the domain. For instance, http://www.example.com/robots.txt is a valid location. But, http://www.example.com/mysite/robots.txt is not.
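To make the placement rule concrete, here is a minimal sketch in Python that works out the one robots.txt location a crawler would check for any page on a site. The function name `robots_url` and the sample post URL are my own illustrations, not anything from Google.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), force the path to
    # /robots.txt, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical WordPress install living in a subdirectory:
print(robots_url("http://www.example.com/mysite/2008/05/some-post/"))
# prints http://www.example.com/robots.txt
```

Even when WordPress is installed in /mysite/, the file still has to live at the root of the domain, so a robots.txt saved inside /mysite/ is never read.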
OK, so now what do you do?
SEO Strategy With The Robots.txt For WordPress
We all know that duplicate content is bad. WordPress, by its very nature, is a duplicate content nightmare. There are many ways that the same page can end up with different URLs, and our job is to limit what the spiders crawl without affecting the site’s usability.
Not all spiders are the same. In fact, it is unclear whether they all support the wildcard, but Google does. The question is, do you optimize primarily for Google? I do. Yahoo does not enforce its content guidelines as strictly as Google does.
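To show why wildcard support matters, here is a rough sketch of the two behaviors. It assumes Google's documented pattern syntax, where * stands for any run of characters and a trailing $ anchors the end of the URL. The helper names (`rule_to_regex`, `wildcard_matches`, `prefix_only_matches`) and the sample paths are hypothetical, and this is not how any particular crawler is actually implemented.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule using * and $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with ".*" for every *.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def wildcard_matches(rule: str, path: str) -> bool:
    """A wildcard-aware spider: the rule must match from the start of the path."""
    return rule_to_regex(rule).match(path) is not None


def prefix_only_matches(rule: str, path: str) -> bool:
    """A spider without wildcard support: * and $ are just literal characters."""
    return path.startswith(rule)


print(wildcard_matches("/*?*", "/my-post/?replytocom=12"))     # True
print(prefix_only_matches("/*?*", "/my-post/?replytocom=12"))  # False
print(wildcard_matches("/*.php$", "/index.php"))               # True
print(wildcard_matches("/*.php$", "/index.php?p=5"))           # False
```

A spider that only does prefix matching would silently ignore a rule like /*?*, which is why plain prefixes such as /wp- are the safer choice in a catch-all section.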
Below is how I set up a robots.txt on a test site that has excellent placement for specific keywords. I will keep everyone informed about how the site progresses in the SERPs. Even though Google has eliminated Supplemental listings, they still retain a basement for duplicate pages. My test site had a bunch of pages in the supplemental index, so we will see whether the WordPress SEO robots.txt file has helped.
# Default group: applies to any crawler that has no group of its own
User-agent: *
Disallow: /cgi-bin/
Disallow: /z/
Disallow: /stats/
Disallow: /dh_
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: */trackback/
Disallow: /adlogger/
Disallow: /ads/
Disallow: /mint/
Disallow: /*?*
Disallow: /20*
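# Googlebot group: Googlebot obeys only its most specific group, so it reads this block instead of the one above
User-agent: Googlebot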
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.php*
Disallow: */trackback*
Disallow: /*?*
Disallow: /z/
Disallow: /wp-*
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
Disallow: /*/feed*
Disallow: /20*
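One thing worth checking with a file like this is which section a given spider actually reads. Google documents that a crawler follows only the most specific User-agent group that matches it, so Googlebot would use its own section and skip the * section entirely. The sketch below is a simplified illustration of that behavior, not Google's parser: it uses an abridged copy of the rules above, matches user-agent names exactly, handles only Disallow lines, and every function name and sample path is hypothetical.

```python
import re

# Abridged copy of the file above, enough to show the effect.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /stats/
Disallow: /feed/
Disallow: /wp-

User-agent: Googlebot
Disallow: /*.php$
Disallow: /*?*
Disallow: /wp-*
Disallow: /*/feed*
"""


def parse_groups(text):
    """Map each lowercased user-agent name to its list of Disallow paths."""
    groups, agents, in_rules = {}, [], False
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:
                for agent in agents:
                    groups[agent].append(value)
    return groups


def rules_for(groups, crawler):
    """Use the crawler's own group if it has one, otherwise fall back to *."""
    return groups.get(crawler.lower(), groups.get("*", []))


def is_blocked(rules, path):
    """True if any Disallow rule (with * and trailing $ support) matches path."""
    for rule in rules:
        anchored = rule.endswith("$")
        core = rule[:-1] if anchored else rule
        pattern = ".*".join(re.escape(piece) for piece in core.split("*"))
        if re.match(pattern + ("$" if anchored else ""), path):
            return True
    return False


groups = parse_groups(ROBOTS_TXT)
for crawler in ("Googlebot", "Slurp"):
    rules = rules_for(groups, crawler)
    for path in ("/feed/", "/my-post/feed/", "/cgi-bin/test", "/wp-admin/"):
        print(crawler, path, "blocked" if is_blocked(rules, path) else "allowed")
```

Under these assumptions, /feed/ and /cgi-bin/test come out as allowed for Googlebot, because /*/feed* needs something between the first slash and /feed and the Googlebot section has no /cgi-bin/ rule. If you want a path blocked for Googlebot, it needs to appear in the Googlebot section as well.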
