Robots.txt File Validation More Important Than Ever
The robots.txt file is one of the simplest ways to communicate with search engines. It is also important for several reasons, including eliminating duplicate content, hiding secure content, hiding images, and specifying what should and should not be spidered. One problem is that the four major search engines don’t share a single policy for robots.txt directives. One example of this is the * wildcard: Google accepts it, but Yahoo does not. It would be nice if they all agreed on one standard, as they did with XML sitemaps.
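To see why the wildcard disagreement matters, here is a minimal Python sketch of the two ways a crawler might read the same Disallow rule. The function names, the sample rule, and the sample path are all made up for illustration; this is not any search engine's actual matching code, just the general idea of prefix matching versus wildcard matching.

```python
import re


def matches_literal(rule_path: str, url_path: str) -> bool:
    """Prefix-only matching: '*' is treated as an ordinary character."""
    return url_path.startswith(rule_path)


def matches_wildcard(rule_path: str, url_path: str) -> bool:
    """Wildcard matching: '*' stands for any run of characters."""
    # Escape each literal piece, then join the pieces with '.*'.
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    # re.match anchors at the start, so the rule still acts as a prefix.
    return re.match(pattern, url_path) is not None


# Hypothetical rule and path, chosen only to show the difference.
rule = "/*.pdf"
path = "/files/report.pdf"

print(matches_literal(rule, path))   # a prefix-only crawler would not block this
print(matches_wildcard(rule, path))  # a wildcard-aware crawler would block it
```

The same line in robots.txt ends up blocking the PDF for one crawler and not for the other, which is exactly why testing against each engine matters.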
Validating Your Robots.txt File
Several websites offer easy-to-use robots.txt validation tools, but the only real way to be sure is to use the tools that come with the search engines’ webmaster portals. Unfortunately, only Google and Live have robots.txt validation tools at this time.
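For a quick local sanity check before you go to the webmaster portals, Python's standard urllib.robotparser module can parse a robots.txt file and tell you whether a URL would be allowed. Treat this as a rough sketch: the rules, the "ExampleBot" user agent, and the URLs below are placeholders, and the standard library parser follows its own interpretation of the rules, so it is no substitute for each engine's own tool.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules for illustration; substitute your own file's contents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up crawler name that falls under the '*' rules.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://domain.com/wp-admin/options.php"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://domain.com/about/"))
```

The first check should come back blocked and the second allowed. If your own file gives a different answer than you expected, that is a good hint to look at it more closely in the engines' tools.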
Remember that the robots.txt file MUST be in the root of your site. In other words, it has to be:
domain.com/robots.txt
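If you want to work out where a crawler will look for the file from any page address, a small Python sketch like the one below does it. The helper name robots_url and the sample page URL are my own inventions for illustration; the only real point is that the path is always /robots.txt at the root of the host, never in a subfolder.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address deep inside a site.
print(robots_url("http://domain.com/blog/2008/some-post/?p=42"))
```

A robots.txt placed at domain.com/blog/robots.txt would simply never be requested.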
If you are using WordPress, then you should follow the rules I laid out in my post: WordPress Robots.txt for SEO. The latest version of Arne Brachhold’s awesome Google Sitemap Generator can now create a robots.txt file as well as a sitemap.
