Use mod_rewrite To Deal With Database Driven URLs

Perhaps you’re familiar with that little bit of Apache server magic known as mod_rewrite. You may already know that it can take your ugly, unwieldy, dynamic URLs that look like:

http://yoursite.com/index.php?item=muumuu&color=blue&size=large

…and turn them into static-looking URLs that are easier on the eyes:

http://yoursite.com/muumuu/blue/large

or even:

http://yoursite.com/m/b/l

Clearly, such rewritten URLs are friendlier to real humans: they’re more descriptive and easier to remember. What’s more, they tend to be about 30% to 50% shorter than the equivalent dynamic URLs with their query strings, so on a super-large site you may even save a little bandwidth.
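To make the length claim concrete, here is a quick Python check using the article’s own example URLs. It measures only these two strings. Your own savings will depend on your parameter names and values.

```python
# Compare the dynamic and rewritten example URLs from above
# and work out how much shorter the rewritten one is.
dynamic_url = "http://yoursite.com/index.php?item=muumuu&color=blue&size=large"
static_url = "http://yoursite.com/muumuu/blue/large"

saving = 1 - len(static_url) / len(dynamic_url)

print(len(dynamic_url), len(static_url))  # 63 37
print(f"{saving:.1%} shorter")            # 41.3% shorter
```

For this pair the rewritten URL comes out about 41% shorter, which is inside the 30% to 50% range quoted above.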

But the real question is:

Do rewritten URLs make a difference when it comes to getting your pages indexed and ranked higher in the search engines?

Well, the answer is, they do make a difference. In fact, rewritten URLs can make a big difference.

While it’s true that Google has come a long way in parsing dynamic pages, the fact that the Google spider no longer chokes on your dynamic URLs doesn’t mean it indexes all of them. It also doesn’t mean it has stopped giving preference to static URLs.

Google will accommodate dynamic URLs to a point, but it doesn’t like them. We’ve seen dynamic sites go from 2,000 pages indexed by Google to over 10,000 pages indexed after using mod_rewrite to switch from dynamic to static-looking URLs. In addition, these pages get indexed more quickly and more frequently than their dynamic counterparts.

How can I tell if mod_rewrite is set up on my server?
The easiest way to tell if you have mod_rewrite installed is simply to ask your system administrator. If you are the system administrator, or if you’re looking for some of the more technical details about getting mod_rewrite running, please have a look at the official mod_rewrite documentation.

In our experience, search engine spiders always prefer static URLs over dynamic URLs, and if you have a large, dynamic site that you want to get fully – as well as quickly and frequently – indexed by the search engines, then rewriting your dynamic URLs is a must.

A Simple Example

Despite what you may have heard, it’s actually fairly easy to get started with some basic URL rewriting. Here’s an example:

First, you’ll need to edit (or create if it doesn’t already exist) an .htaccess file in whichever directory you will be rewriting your URLs. An .htaccess file (note the period at the beginning of the file name) is an Apache server configuration file that lets you make changes to your server setup on a per-directory basis. For more info on creating and editing your .htaccess file, check out the official .htaccess guide, as well as this excellent tutorial.

Now, say you have a dynamic URL that looks like:

http://www.yoursite.com/index.php?item=muumuu&color=blue

…and you would like to have a more search engine (and user) friendly URL such as:

http://www.yoursite.com/muumuu/blue

To accomplish this, simply add the following two lines to your .htaccess file:

RewriteEngine On
RewriteRule muumuu/blue index.php?item=muumuu&color=blue

(Note: there is a space between the words blue and index.php in the above line)

The first line turns the mod_rewrite engine on. The second line consists of three parts:

  1. RewriteRule — This part lets the server know that you’re about to specify a rule to rewrite your URL.
  2. muumuu/blue — This is the way you want the dynamic part of your URL to appear to spiders and other visitors to your web site.
  3. index.php?item=muumuu&color=blue — This is the way the dynamic part of your URL appears currently.

And voilà! …you’ve just hidden your dynamic URL from any search engine spider.
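mod_rewrite treats the first argument of RewriteRule as a regular expression. In an .htaccess file that sits in your site’s root directory, it tests that expression against the requested path minus its leading slash, and without the query string. The Python sketch below is a simplified model of the single rule above, not how Apache is implemented. The function name `rewrite` is our own. Because the pattern has no anchors, it matches anywhere in the path, so it also catches paths you may not have intended.

```python
import re

# Simplified model of:  RewriteRule muumuu/blue index.php?item=muumuu&color=blue
# The pattern is a regular expression with no anchors, so re.search()
# (like mod_rewrite) succeeds if it matches anywhere in the path.
PATTERN = r"muumuu/blue"
SUBSTITUTION = "index.php?item=muumuu&color=blue"


def rewrite(path):
    """Return the internal URL for a path (no leading slash), or None."""
    if re.search(PATTERN, path):
        return SUBSTITUTION
    return None


print(rewrite("muumuu/blue"))          # index.php?item=muumuu&color=blue
print(rewrite("muumuu/red"))           # None
print(rewrite("sale/muumuu/blue/xl"))  # index.php?item=muumuu&color=blue (unanchored)
```

Writing the pattern as ^muumuu/blue$ would restrict it to that exact path. The regular expression symbols that make this possible are covered in the next section.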

Not so fast, there, pardner…

“Just a minute!” we hear you say.

“This is all well and good for someone who has just a few dynamic pages, but I have over 10,000 dynamically generated pages! It could take me weeks to write all those rules.”

Right you are, and this is where the voodoo magic of regular expressions comes in.

The term regular expression refers to a type of miniature programming language which is extremely effective at recognizing and matching patterns in text. You can find a nice intro here.

Unfortunately, the awesome power of regular expressions comes with a fairly steep learning curve. Listed below, however, are some pre-fabricated formulas that should apply to many sites. These should help you get started with mod_rewrite even if you’re a regular expression novice.

By the way, you can get pretty far along the right path if you just know the following details about regular expression syntax:

. Matches any one character
? Matches the preceding character zero or one times
* Matches the preceding character zero or more times
+ Matches the preceding character one or more times
^ Matches the beginning of a line
$ Matches the end of a line
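To try these symbols before touching your server, you can use Python’s re module. For these six metacharacters it uses the same notation as the regular expressions mod_rewrite understands, since Apache relies on the PCRE library. Treat the snippet as a scratch pad. The sample patterns and strings are our own.

```python
import re

# One quick test per metacharacter from the list above.
# re.search() looks for a match anywhere in the string, as mod_rewrite does.
checks = {
    "dot":      bool(re.search(r"bl.e", "blue")),         # '.' = any one character
    "question": bool(re.search(r"^colou?r$", "color")),   # 'u' zero or one times
    "star":     bool(re.search(r"^mu*$", "m")),           # 'u' zero or more times
    "plus":     bool(re.search(r"^mu+$", "m")),           # 'u' one or more times: fails
    "caret":    bool(re.search(r"^blue", "blue/large")),  # string starts with 'blue'
    "dollar":   bool(re.search(r"large$", "blue/large")), # string ends with 'large'
}

for name, matched in checks.items():
    print(f"{name:9} {matched}")
```

Only the "plus" check fails, because "m" contains no "u" at all, and + requires at least one.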

Now, let’s go back to our first example, but this time, instead of writing a specific rule for muumuu and blue, let’s make our rule flexible enough to handle any word as a value for the variables item and color in our dynamic URL. Open up that .htaccess file again and add the following:

RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{REQUEST_URI} !index.php
RewriteRule ^([^/]+)/([^/]+)/?$ /index.php?item=$1&color=$2

Looks like gibberish, doesn’t it?

Don’t worry, we can handle it. Let’s walk through it line-by-line:

  1. The first line, as we already know, simply turns on the mod_rewrite engine.
  2. The second line contains the phrase:

    Options +FollowSymlinks

    This line tells the server to follow symbolic links. It matters because mod_rewrite refuses to work in an .htaccess file unless FollowSymLinks (or SymLinksIfOwnerMatch) is enabled for that directory. On many servers it already is, but it doesn’t hurt to include this line in your .htaccess anyway, just in case it’s been turned off. (Your host must allow the Options directive in .htaccess files. If it doesn’t, this line will cause a server error.)

  3. Next we come to the third line:

    RewriteBase /

    This sets the base URL. Setting it to “/” is usually the same as setting the base to your domain root which, in our example, is http://www.yoursite.com/.

    It’s important to understand that when we rewrite URLs we are rewriting the stem, not the root. In other words, when mod_rewrite rewrites a URL such as:

    http://www.yoursite.com/index.php?item=muumuu&color=blue

    … it will modify the part of the URL that looks like

    index.php?item=muumuu&color=blue

    while ignoring the part that looks like http://www.yoursite.com/.

  4. The fourth line consists of three parts:

    1) RewriteCond – This is the rewrite condition. In other words, this line needs to be true in order for any URL rewriting to occur. If this condition is not satisfied, the immediately following rewrite rule is ignored and mod_rewrite continues processing any rules after it.

    2) %{REQUEST_URI} – This is an Apache-specific variable that holds the path of the page being requested, without the query string. In the case of the URL http://www.yoursite.com/index.php?item=muumuu&color=blue, the page being requested is /index.php.

    3) !index.php – In English, this is equivalent to saying “is not index.php”. In mod_rewrite, a “!” placed in front of a pattern negates it, so the condition is true only when the pattern does not match.

    So what this line really says is “if the page being requested is not index.php, then process the next mod_rewrite rule”. We need this check because, in an .htaccess file, mod_rewrite runs the rules again on the rewritten URL. Without it, a rule that matched index.php itself could keep replacing index.php with itself over and over again, leaving us stuck in an infinite loop.
  5. The last line also consists of three parts:

    1) RewriteRule – We already know that this part lets the server know that you’re about to specify a rule to rewrite your URL.

    2) ^([^/]+)/([^/]+)/?$ – This is the challenging part (no, it’s not a complicated smiley). It’s a regular expression that introduces something we haven’t seen before, namely the “[ ]” symbols, which define a character class. A character class matches any single character from the set listed between the brackets. The rules are slightly different within a character class: when “^” is the first character inside the brackets, it no longer means “beginning of line” but instead means “not”, similar to the “!” we saw earlier. So [^/] matches any single character except a forward slash.

    The other new concept here is the pair of parentheses. Parentheses group part of an expression, and they also create a back reference: any text that matches the expression between them is saved and can be referred to later by the symbols $1, $2, $3, and so forth. In other words, the text matched inside the first pair of parentheses is placed in the variable $1, the text matched inside the second pair is placed in $2, and so on.

    3) /index.php?item=$1&color=$2 – This is the substitution, the URL the request is rewritten to. We’ll come back to it once we’ve unpacked the regular expression.

    We can break this regular expression apart piece by piece:
    ^ Look for the beginning of a line.
    ( Opening parenthesis indicating the start of a back reference.
    [ Opening bracket indicating the beginning of a character class.
    ^ "Not" symbol (since it's inside a character class).
    / This represents an actual forward slash, such as you would find in a URL.
    ] Close the character class.
    + Match the preceding character or group of characters one or more times.
    ) Close the back reference.

    So what the expression ^([^/]+) really says is “find the beginning of a line, followed by as many characters as you can find that are NOT forward slashes, and store them in the variable $1 for later use”.

    If you have a line like:

    muumuu/blue

    …then the expression ^([^/]+) would have the effect of storing the word muumuu in variable $1.

    Back to our regular expression:

    / Look for a forward slash following the last chunk of text we matched.
    ([^/]+) We explained what this means earlier. It matches the word blue in the example above, and stores it in the variable $2.
    / Followed by another forward slash.
    ? This question mark means “match the preceding character 0 or 1 times.” Since it comes directly after the last forward slash, this means that the last forward slash is optional (i.e. it can show up once, or not at all).
    $ Finally, look for the end of a line.

    Now let’s look at the last part of the final line, which looks like:

    /index.php?item=$1&color=$2

    This is your original dynamic URL, except that the back reference variables $1 and $2 stand in for the values of your GET variables (such as muumuu and blue). When the rule runs, the text captured by each pair of parentheses in the regular expression is inserted into your dynamic URL wherever the matching $1 or $2 appears.

    When a request is made for a page like:

    http://www.yoursite.com/muumuu/blue

    …mod_rewrite will recognize that this URL is actually pointing to the address:

    http://www.yoursite.com/index.php?item=muumuu&color=blue

    …and will serve your visitor (be they human or search engine spider) the page at that address, without them ever seeing what goes on behind the scenes. This is part of the beauty of mod_rewrite. The URL is not being redirected – it is actually being rewritten on the server by mod_rewrite. As long as your own links point to the new URLs, visitors to your site will never see your dynamic URLs.
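To see the condition, the pattern, the back references and the substitution working together, here is a small Python model of the example’s RewriteCond and RewriteRule lines. It is a sketch, not Apache itself. The function name `rewrite` is ours, it assumes the .htaccess file sits in the site’s root directory, and it leaves out everything else mod_rewrite does (flags, query-string handling and so on).

```python
import re

# Pattern and substitution copied from the example's RewriteRule line.
PATTERN = re.compile(r"^([^/]+)/([^/]+)/?$")
SUBSTITUTION = "/index.php?item=$1&color=$2"


def rewrite(path):
    """Simplified model of the example's RewriteCond + RewriteRule pair.

    `path` is what the rule is tested against in a root-level .htaccess:
    the requested path without its leading slash or query string.
    Returns the rewritten URL, or None if nothing is rewritten.
    """
    # RewriteCond %{REQUEST_URI} !index.php: give up if the request
    # path (which starts with a slash) contains "index.php".
    if re.search(r"index.php", "/" + path):
        return None
    match = PATTERN.search(path)
    if match is None:
        return None
    # Swap each $N for the text captured by the Nth pair of parentheses.
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), SUBSTITUTION)


print(rewrite("muumuu/blue"))        # /index.php?item=muumuu&color=blue
print(rewrite("muumuu/blue/"))       # /index.php?item=muumuu&color=blue
print(rewrite("muumuu"))             # None (only one segment)
print(rewrite("muumuu/blue/large"))  # None (three segments)
```

Python’s own replacement strings would write back references as \1 and \2. The final re.sub call simply translates mod_rewrite’s $1-style references so the substitution can be copied unchanged.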

We Never Said It Was Easy

It’s true this is a rather complicated example, especially if you’re new to regular expressions, but our goal here is to give you practical examples that apply to real-life situations instead of the “toy” examples used in most tutorials.

Additionally, the example we’ve used can easily be modified for any number of variables. For example, if your dynamic URLs have between one and three variables, depending on the circumstance:

http://www.yoursite.com/index.php?item=muumuu

http://www.yoursite.com/index.php?item=muumuu&color=blue

http://www.yoursite.com/index.php?item=muumuu&color=blue&size=large

… you could add the following to your .htaccess:

RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{REQUEST_URI} !index.php
RewriteRule ^([^/]+)/?$ /index.php?item=$1
RewriteCond %{REQUEST_URI} !index.php
RewriteRule ^([^/]+)/([^/]+)/?$ /index.php?item=$1&color=$2
RewriteCond %{REQUEST_URI} !index.php
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/?$ /index.php?item=$1&color=$2&size=$3

… to get URLs that look, respectively, like:

http://www.yoursite.com/muumuu/

http://www.yoursite.com/muumuu/blue

http://www.yoursite.com/muumuu/blue/large
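The same kind of Python model can check all three rules at once, and it shows why each rule is guarded by the RewriteCond line. It assumes the rules are tried in order and the first match wins. For these particular rules that simplification should be harmless, because each pattern requires a different number of path segments, so at most one can match a given path. The `use_condition` switch is our own addition, there only for experimenting.

```python
import re

# The three RewriteRule lines from the example, in order. Each one is
# guarded by its own RewriteCond %{REQUEST_URI} !index.php.
RULES = [
    (r"^([^/]+)/?$", "/index.php?item=$1"),
    (r"^([^/]+)/([^/]+)/?$", "/index.php?item=$1&color=$2"),
    (r"^([^/]+)/([^/]+)/([^/]+)/?$", "/index.php?item=$1&color=$2&size=$3"),
]


def rewrite(path, use_condition=True):
    """Return the URL produced by the first matching rule, or None.

    Simplified model: `path` is the request path without its leading slash.
    """
    for pattern, substitution in RULES:
        if use_condition and re.search(r"index.php", "/" + path):
            continue  # RewriteCond failed: skip this rule
        match = re.search(pattern, path)
        if match:
            # Replace $1, $2, $3 with the captured text.
            return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), substitution)
    return None


print(rewrite("muumuu/"))            # /index.php?item=muumuu
print(rewrite("muumuu/blue"))        # /index.php?item=muumuu&color=blue
print(rewrite("muumuu/blue/large"))  # /index.php?item=muumuu&color=blue&size=large
print(rewrite("index.php"))          # None
print(rewrite("index.php", use_condition=False))  # /index.php?item=index.php
```

The last line shows the danger the condition prevents: without it, the one-segment rule happily matches index.php itself and rewrites it to another index.php URL.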

Caveat emptor (A warning):

Bear in mind that rewriting your URLs increases the amount of processing done by your server, so your pages might load slightly slower, and, if you have a really high-traffic site, the load on your server might be unacceptably high.

The danger with mod_rewrite is that the search engine spiders won’t realize that you have a dynamically generated site, so they will crawl your site just as fast as they would a static site. Your system may be overwhelmed and you could experience a server slowdown or even crash if you don’t have adequate processing power.

We highly recommend that you experiment with mod_rewrite on a test server before taking it live with your main site. We have done our best to make sure that these examples work in a variety of circumstances but, considering all the vagaries of different server configurations, you may find that rewrite rules that work on our servers might not work on yours.

Voodoo?

Whether or not you agree with the fellow who commented: “this is damn cool voodoo, but voodoo nonetheless,” we consider it a valuable tool – especially useful for large, dynamically-generated sites.

Google, for one, loves large sites, and if you can double or quadruple (or more) the number of pages that search engines index, that should be incentive enough to at least experiment with mod_rewrite. The reward waiting in the wings is a likely boost in your site’s overall search engine rankings once all of your database-driven, dynamically generated pages are indexed and working together in sync.

Get Found Now – a division of: On Top Of It, Inc

About Victoria Stankard

Victoria Stankard has been an online SEO content writer for a variety of markets across the nation since 2006. Specializing in optimized content marketing strategies and owner of a successful organic search engine optimization company, Victoria writes for real people with "The Optimized Edge" - putting you on the map and more!
