Rewriting URLs

October 1, 2007

In my previous post, I mentioned that Apache’s mod_rewrite lets you take a URL like:
http://yoursite.com/products/index.php?cat=software&item=Photoshop
and convert it to something like:
http://yoursite.com/products/software/Photoshop

There are two main benefits to formatting URLs the second way.

  • it’s easier for visitors to read.
  • it’s easier for Google to index.

You also gain a number of positive side effects, one of which is keeping your query-string variable names (cat and item here) out of the visible URL. Basically, the only reason not to rewrite URLs is the added effort required to do it. But what if the added effort really wasn’t that significant?

It’s not.

First, let’s assume my earlier mention of Apache already turned off anyone not using it (if your rewrite engine doesn’t match mine, I’m no good to you). Let’s also assume those who already know how to do this (and probably better than I) already got bored and moved on.

Okay, now if you’re still with me, all you need to do is create an .htaccess file in the directory whose URLs you want to rewrite (/products/ in my case) with the following lines:

RewriteEngine on

# Rewrite <cat>/<item> (relative to /products/) to index.php?cat=<cat>&item=<item>
RewriteRule ^([^/\.]*)/([^/\.]*)/?$ index.php?cat=$1&item=$2 [L]

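The rewrite happens inside Apache, so index.php still receives cat and item as ordinary query-string variables and your existing queries work unchanged. It’s a lot like magic.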
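To see which paths the rule above picks up, here is a minimal sketch in Python that applies the same pattern outside Apache. It assumes the per-directory behavior of .htaccess, where the pattern is tested against the path with /products/ already stripped; the rewrite() helper is made up for illustration and is not part of Apache or mod_rewrite.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule in the post. In .htaccess context the
# pattern sees the path relative to the directory (assumption stated above).
RULE = re.compile(r"^([^/\.]*)/([^/\.]*)/?$")


def rewrite(relative_path: str) -> Optional[str]:
    """Return the internal target for a path, or None if the rule does not match."""
    match = RULE.match(relative_path)
    if match is None:
        return None
    cat, item = match.groups()
    return f"index.php?cat={cat}&item={item}"


print(rewrite("software/Photoshop"))   # index.php?cat=software&item=Photoshop
print(rewrite("software/Photoshop/"))  # trailing slash is optional, same target
print(rewrite("index.php"))            # None: no slash, so the script itself is left alone
print(rewrite("css/site.css"))         # None: segments containing a dot are excluded
```

Excluding dots from each segment is what keeps requests for real files (stylesheets, images, index.php itself) from being swallowed by the rule.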

That’s it, friends. Hi5.

3 Responses to “Rewriting URLs”

  1. Nicholas Schlueter

    That is a crazy URL. I’ll freely admit that I don’t know PHP, and that I don’t know what you got knocked for. But, being a pretty fluent developer, and having cut enough corners in my day, I can see where this would come in handy. I have just never seen a question mark in the middle of a “normal” query string like that.

    But the trouble with mod_rewrite is that your code needs to know about that rewrite rule. For example, you need to construct your links so that they will match the rewrite rule when the user clicks on them. This is a little brittle: probably not a problem with smaller sites, but it could be a problem with huge sites. The only way around it that I have seen is in rails (and therefore clones of rails). It is called routes. It “routes” requests to the proper controller and action; big deal. But it also comes with link/URL helpers that construct links that match the route. That way, if you decide your link structure is no good, when you change the route, all your links change with it. It is a little magical, but once you get the hang of it, you are no longer stuck in mod_rewrite (do I include the trailing slash?) hell.

    The good thing about mod_rewrite is it isn’t language/framework dependent. It is a tool I would not remove from my tool belt any time soon.
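The routes idea described in this comment can be sketched in a few lines of Python. This is not the Rails API, only an illustration of the principle that one template drives both matching and link building; ROUTE, route_to_regex and url_for are hypothetical names invented for this sketch.

```python
import re
from urllib.parse import quote

# Hypothetical single source of truth for the URL shape.
ROUTE = "/products/:cat/:item"


def route_to_regex(route: str) -> re.Pattern:
    """Turn ':name' placeholders into named groups that each match one path segment."""
    pattern = re.sub(r":(\w+)", r"(?P<\1>[^/.]+)", route)
    return re.compile("^" + pattern + "/?$")


def url_for(route: str, **params: str) -> str:
    """Build a link from the same template the matcher uses."""
    return re.sub(r":(\w+)", lambda m: quote(params[m.group(1)], safe=""), route)


link = url_for(ROUTE, cat="software", item="Photoshop")
print(link)  # /products/software/Photoshop

match = route_to_regex(ROUTE).match(link)
print(match.groupdict())  # {'cat': 'software', 'item': 'Photoshop'}
```

Because every link is generated from ROUTE, changing the template changes the matcher and the links together, which is the property the comment is praising.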

  2. Aaron Mentele

    Hey Nicholas. You caught a screw up. I switched to a multi-variable example after the first draft – cut wrong. I’ve updated the post – thanks.

    Rewrite rules don’t cause brittle code, though. People do. (That’s a bumper sticker, right?)

    Sure, changing your link convention will make your queries go to hell. But that’s no different than changing your query string.

    The trouble comes in cases where the rules weren’t considered as part of the initial dev / planning. Backing into pretty URLs is a lot more difficult than factoring them in from the start. I have examples.

    See how I ignored the rails part?

  3. Nicholas Schlueter

    I like how you ignored rails, which is good because my point had nothing to do with rails or PHP.

    I agree that planning your URL structure up front is the best way to go. But SEO has changed a lot recently, and you can’t know the future. If it turns out that search engines like the word “in” in URLs, you would have to change it in multiple places: first in your rewrite rule, and then in all your links. What you have now effectively is:

    /products/:cat/:item

    and if you needed to add “in”, it would look like:

    /products/in/:cat/:item

    Obviously this example is contrived, and I know this isn’t specifically a technical blog, so I shouldn’t belabor this point any longer. But the brittleness I was speaking of actually has a term in Computer Science. It is called coupling [http://en.wikipedia.org/wiki/Coupling_(computer_science)], and I would classify the way you are trying to accomplish SEO as tightly coupled.
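The coupling described here can be made concrete with a small Python check. It is a sketch under assumptions: NEW_RULE is a guess at how the post's pattern might be adapted for the /products/in/:cat/:item layout, and the hard-coded links are made-up examples of what a site's templates might contain.

```python
import re

# The post's per-directory pattern (paths are relative to /products/).
OLD_RULE = re.compile(r"^([^/\.]*)/([^/\.]*)/?$")
# Hypothetical updated pattern for /products/in/:cat/:item.
NEW_RULE = re.compile(r"^in/([^/\.]*)/([^/\.]*)/?$")

# Links written by hand in templates before the change (hypothetical examples).
hardcoded_links = ["software/Photoshop", "software/Illustrator"]

# Every link that no longer matches has to be found and edited separately.
stale = [link for link in hardcoded_links if NEW_RULE.match(link) is None]
print(stale)  # ['software/Photoshop', 'software/Illustrator']

# The new-style link works with the new rule but not with the old one.
print(NEW_RULE.match("in/software/Photoshop").groups())  # ('software', 'Photoshop')
print(OLD_RULE.match("in/software/Photoshop"))           # None
```

The rule and the links each encode the URL shape on their own, so a change to one silently invalidates the other.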

    It’s all about managing the cost-to-change curve. Oh my god, I am boring myself. Have a good time in SF. Keep up the good work.