Search Engine Optimization: Robots.txt

February 18, 2007 by Michael Stankard · 4 Comments
Filed under: Search Engine Optimization 

Get Your Robots.txt File in Shape

Robots.txt files are used for only one reason: to tell search engine spiders which pages not to index. A common misconception is that a robots.txt file can somehow be used to encourage search engines to crawl a site. Not true! Pages that meet the guidelines outlined above are readily spidered by search engines without any additional encouragement. As you may have noticed, an important part of search engine optimization (SEO) is identifying the elements that cause indexing difficulties for the spiders and eliminating them.

So why might you want to tell a search engine not to index some of your pages? Because search engine spiders have limited time and resources to spend on any one site, your site will be better served if they spend that time on your important customer-development, product, and sales pages.

Case in point: Why would you want a search engine to index your shopping cart? Chances are there is no benefit to you when your shopping cart checkout pages show up in the search engine results. Use a robots.txt file to make sure search engines don’t waste time indexing your shopping cart when they could be using their resources to index your more important sales or informational content pages.

Other pages you will want to keep search engine spiders away from include anything in your cgi-bin folder, as well as directories that contain images or sensitive company data. Basically, if there isn’t any benefit to having a page (or image) show up in the search results, then you should hide it from the spiders by using a robots.txt file.
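Here is a rough sketch of how you might test rules like these before putting them live. It uses Python's standard urllib.robotparser module. The folder names (/cart/ alongside /cgi-bin/), the www.example.com address, and the is_allowed helper are made up for illustration; your own paths will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: keep all spiders out of the
# cgi-bin folder and an assumed /cart/ shopping cart folder.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /cart/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/products/widget.html", "/cart/checkout", "/cgi-bin/form.pl"):
        url = "http://www.example.com" + path
        print(path, "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, url) else "blocked")
```

Running a handful of your important sales pages through a check like this is a quick way to confirm they are still open to the spiders.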

Doing so will not only increase the search engine resources spent on your important pages, but can also have the important side benefit of keeping those pages out of the search results, where hackers may otherwise find sensitive information about your company or site. Search engine spiders are pretty voracious about indexing anything they can find on the web, including things like password files, so do be careful.

There is one other issue to be aware of when it comes to robots.txt files. A surprising number of sites have accidentally set up their robots.txt files to prevent search engine spiders from crawling their site.

Did you know that adding the following two lines to your robots.txt file is enough to keep all the major search engines from ever crawling your site?
User-agent: *
Disallow: /

Many people don’t, and then they wonder why they can’t find their site listed in the search engines. Don’t let this happen to you!
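If you want to check your own file for this mistake, something like the following sketch could help. It again relies on Python's standard urllib.robotparser module; the blocks_entire_site helper is a made-up name for illustration, and the check only asks whether the site root is off limits to spiders in general.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt, user_agent="*"):
    """Return True if the rules stop user_agent from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


if __name__ == "__main__":
    # The two lines quoted above shut every spider out of the whole site.
    print(blocks_entire_site("User-agent: *\nDisallow: /"))
    # An empty Disallow line blocks nothing.
    print(blocks_entire_site("User-agent: *\nDisallow:"))
```

You would paste the contents of your own robots.txt in place of the sample strings.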


Comments

4 Responses to “Search Engine Optimization: Robots.txt”
  1. Robot.txt is pretty important but I’m surprised that most people don’t really bother about it. ;-)

  2. Egor says:

    Hello was searching Google for Engine Optimization Search Services Site Submission Web and your blog regarding Engine Optimization: Robots.txt | Get Found Now Internet Marketing looks really interesting for me. I will definitely bookmark it and come back for more cool postings to read! Cheers!



