WordPress Robots.txt For SEO

September 14, 2007 by Michael Stankard · 5 Comments
Filed under: WordPress SEO 

A couple of weeks ago I wrote an article about changing WordPress permalinks and how best to avoid duplicate content. Since then I have received a lot of questions about the robots.txt side of WordPress SEO. The first thing to understand is that the robots.txt file must be in your top-level (root) directory. There is some confusion about this, and about using the wildcard (*) inside robots.txt. Spiders and bots handle robots.txt differently, but Google’s Webmaster help site has a page that outlines its policy on robots.txt. Here is a quote directly from Google on where the file must be placed:

The robots.txt file must reside in the root of the domain and must be named “robots.txt”. A robots.txt file located in a subdirectory isn’t valid, as bots only check for this file in the root of the domain. For instance, http://www.example.com/robots.txt is a valid location. But, http://www.example.com/mysite/robots.txt is not.
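Because crawlers only ever request robots.txt from the root, its location follows mechanically from any page URL. Here is a minimal Python sketch using only the standard library’s `urllib.parse`; `robots_txt_url` is a hypothetical helper name. It keeps the scheme, host and port because, per Google’s documentation, a robots.txt file only applies to the protocol, host and port it is served from, so a subdomain such as blog.example.com needs its own file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the only robots.txt URL a crawler checks for page_url.

    The path, query string and fragment of the page are discarded;
    only the scheme and network location (host and optional port) are kept.
    """
    parts = urlsplit(page_url)
    # Replace whatever path the page had with the root-level /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for url in (
        "http://www.example.com/",
        "http://www.example.com/mysite/2007/09/some-post/",
        "http://www.example.com/mysite/robots.txt",
    ):
        print(url, "->", robots_txt_url(url))
```

All three example URLs map to http://www.example.com/robots.txt, which is why a file placed at /mysite/robots.txt is never read.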

OK, so now what do you do?

SEO Strategy With The Robots.txt For WordPress

We all know that duplicate content is bad, and by its very nature WordPress is a duplicate-content nightmare. The same post can be reached through several different URLs (its permalink, its date and category archives, its feed), and our job is to limit what the spiders crawl without affecting the site’s usability.

Not all spiders are the same; in fact, it is unclear whether they all support the wildcard, but Google does. The question is, do you optimize primarily for Google? I do. Yahoo does not enforce its content guidelines as strictly as Google does.
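To make the wildcard behavior concrete, here is a minimal Python sketch of pattern matching as Google documents it: `*` matches any run of characters, a trailing `$` anchors the pattern to the end of the URL, and every rule is matched from the start of the path. The function names are hypothetical, and other spiders may treat `*` and `$` as ordinary characters.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a Google-style robots.txt path pattern into a regular expression.

    '*' matches any run of characters (including none) and a trailing '$'
    anchors the pattern to the end of the URL. Everything else is literal,
    and matching always starts at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, pattern: str) -> bool:
    """True if a Disallow rule with this pattern covers the path (plus query)."""
    # re.match only matches at the start of the string, so rules act as prefixes.
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    checks = [
        ("/index.php", "/*.php$"),
        ("/index.php?p=12", "/*.php$"),
        ("/?s=seo", "/*?*"),
        ("/2007/09/my-post/trackback/", "*/trackback/"),
        ("/about/", "/*?*"),
    ]
    for path, pattern in checks:
        print(f"{pattern!r:16} {path!r:32} blocked={is_disallowed(path, pattern)}")
```

Note that `/*.php$` blocks /index.php but not /index.php?p=12, because `$` requires the URL to end right after ".php". Crawlers that skip wildcard support behave differently: Python’s own `urllib.robotparser`, for example, does not implement these wildcards, which is one reason to test against the specific spider you care about.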

Below is the robots.txt I set up on a test site that ranks very well for specific keywords. I will keep everyone posted on how the site progresses in the SERPs. Even though Google has done away with Supplemental listings, it still keeps a “basement” for duplicate pages. My test site had a number of pages in the supplemental index, so we will see whether this WordPress SEO robots.txt file helps.

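# Rules for every crawler that has no group of its own below
User-agent: *
Disallow: /cgi-bin/
Disallow: /z/
Disallow: /stats/
Disallow: /dh_
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: */trackback/
Disallow: /adlogger/
Disallow: /ads/
Disallow: /mint/
# Any URL containing a query string
Disallow: /*?*
# Date-based archives such as /2007/09/ (also matches any other path starting with /20)
Disallow: /20*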

# Googlebot obeys only this group and ignores the User-agent: * group above
User-agent: Googlebot
# A trailing $ anchors the pattern to the end of the URL
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.php*
Disallow: */trackback*
Disallow: /*?*
Disallow: /z/
Disallow: /wp-*
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
Disallow: /*/feed*
Disallow: /20*
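To see how the two groups above interact, here is a minimal Python sketch. It assumes Google’s documented group rule: a crawler follows only the single group that names it and falls back to `User-agent: *` only when no group does. `ROBOTS_TXT` is a shortened copy of a few rules from each group, and `parse_groups`, `rules_for`, `matches` and `is_blocked` are hypothetical helpers. User-agent matching is simplified to an exact, case-insensitive name (Google actually picks the most specific matching token), and Allow lines are ignored because this file has none.

```python
import re

# A shortened, hypothetical copy of the file above: a few rules from each group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
Disallow: /wp-admin/
Disallow: /20*

User-agent: Googlebot
Disallow: /wp-*
Disallow: /*?*
Disallow: /20*
"""


def parse_groups(text: str) -> dict:
    """Map each lowercased user-agent name to the Disallow patterns of its group."""
    groups = {}
    agents = []        # user agents named by the group currently being read
    in_rules = False   # becomes True once that group has a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow blocks nothing
                for agent in agents:
                    groups[agent].append(value)
    return groups


def rules_for(agent: str, groups: dict) -> list:
    """Return the one group a crawler obeys: its own if named, otherwise '*'."""
    name = agent.lower()
    return groups[name] if name in groups else groups.get("*", [])


def matches(path: str, pattern: str) -> bool:
    """Google-style match: '*' is any run of characters, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = re.escape(pattern[:-1] if anchored else pattern).replace(r"\*", ".*")
    return re.match(body + (r"\Z" if anchored else ""), path) is not None


def is_blocked(agent: str, path: str, groups: dict) -> bool:
    # The file only has Disallow rules, so any matching pattern blocks the URL.
    return any(matches(path, pattern) for pattern in rules_for(agent, groups))


if __name__ == "__main__":
    groups = parse_groups(ROBOTS_TXT)
    for path in ("/stats/", "/2007/09/my-post/", "/20-wordpress-tips/", "/about/"):
        print(f"{path:24} Googlebot blocked={is_blocked('Googlebot', path, groups)}"
              f"  Slurp blocked={is_blocked('Slurp', path, groups)}")
```

Under those assumptions, /stats/ comes out blocked for Slurp (Yahoo’s crawler) but not for Googlebot, because that rule appears only in the `*` group. The run also shows that `/20*` blocks /20-wordpress-tips/ as well as the date archives, since it matches any path beginning with /20.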


Comments

5 Responses to “WordPress Robots.txt For SEO”
  1. reece says:

    I was actually having a discussion with my Mrs about seo a few hours ago. This blog is very interesting, I have learnt something new. Cheers from London!

  2. Chris says:

    Can you provide an update on how this version of your robots.txt file has done so far? I just migrated a site to WP and am looking for the best option for my robots.txt file.

  3. Chris, I have had no issues with my site or Google since changing my robots.txt file. Remember to keep your file in the root directory, and pay attention to any Disallow rules that involve numerals. I had a client who was disallowing 20*, and some articles with numbers in their URLs were not spidered. Wildcards can be tricky, so I highly recommend using Google’s robots.txt tool to test your file thoroughly in their system first. You can check your URLs with their tool and make sure the pages you want spidered are being seen and the sections you don’t want spidered are not!

  4. After installing the robots-meta WP plugin, I am still unable to find the robots.txt file it creates. Does this plugin create a robots.txt file, or does it just use the robots meta tag on individual pages?

Trackbacks

Check out what others are saying about this post...
  1. [...] you are using WordPress then you should follow the rules I laid out in my post: WordPress Robots.txt for SEO. If you are using WordPress the latest version of Arne Brachold’s awesome Google Sitemap [...]




