 Just callin' it as we see it
...could it have been HOW you asked me...
...or could it have been WHAT you asked me...
...maybe it's WHEN you asked me...
all i know is that YOU'RE ON FUCKIN' CRACK...

 Contributing Authors
Tineybopper
JiminySpliff
MarleyMonk
Mizzymoto
Siscaholic
Wednesday, June 23, 2004
 The problem with search and how to fix it
To answer this question I think we first need to take a step back and understand what people use search engines for. Most people will tell you something along the lines of "I use search to look for information on X". At one time this seemed pretty straightforward. You typed in a string and the search engine would look for occurrences of that string in indexed websites. Then came the web spammers, who would sneak huge blocks of unrelated keywords into their webpages so that, say, a porn site would come up when you searched for "equestrian training".
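
To make the keyword-stuffing trick concrete, here is a toy sketch in Python. It assumes the crudest possible engine, one that just counts how often each query word shows up on a page. The function name and the two ".example" pages are made up for illustration, and real engines of the era were certainly more involved than this.

```python
# Toy illustration only: score a page by counting query-word occurrences.
def naive_score(query, page_text):
    """Return how many times the query's words appear in the page text."""
    words = page_text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

# Two hypothetical pages: one honest, one stuffed with unrelated keywords.
pages = {
    "horse-school.example": "equestrian training for riders of every level",
    "spam.example": "adult content " + "equestrian training " * 20,
}

# Rank pages from highest to lowest score for the query.
ranked = sorted(
    pages,
    key=lambda url: naive_score("equestrian training", pages[url]),
    reverse=True,
)
print(ranked)
```

Because the stuffed page simply repeats the phrase more often, a pure occurrence count puts it ahead of the page that is actually about horses.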

In an ongoing tug-of-war, the search engines would implement new features to neutralize the web spammers' tactics, then the web spammers would find another way to cheat their way up the results pages. Thus was born the concept of "relevance". Google pioneered this space with their PageRank system, a series of algorithms that attempts to predict a website's relevance to a search string by how many other pages link to it, thus taking legitimacy into account as a factor of its relevance.
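
For the curious, the link-counting idea can be sketched in a few lines of Python. This is the simplified textbook version of PageRank, not Google's actual system: the damping factor of 0.85, the fixed number of iterations, and the tiny made-up link graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: everybody links to the manufacturer's site.
links = {
    "dealer-a.example": ["honda.example"],
    "dealer-b.example": ["honda.example"],
    "fan-blog.example": ["honda.example", "dealer-a.example"],
    "honda.example": [],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)
```

The page that the most other pages point to ends up with the highest score, which is exactly why stuffing keywords into your own page stops being enough.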

This is a great way to come up with relevant search results when everyone defines something the same way. For instance, if you're looking up information on Honda car dealerships in California, all you need to type into Google is "Honda California" and the first result is Honda's own website, followed by a slew of local dealerships. The problem arises if you happen to live in Honda, California and you are looking for a map of your town. Even a refined search for "Honda California Map" merely results in a listing of maps with directions to various dealerships around California.

The problem here has to do with how you define what you are looking for. With most products or concepts that have a common cultural definition, today's search engines work great. But when you start delving into searches involving something more nebulous, where terminology may vary or may be eclipsed by some greater social meaning, search engines are actually pretty bad at serving up what you are looking for.

So short of making a brainwave interface that connects to a computer using some advanced form of AI to determine what you are looking for, how will we ever have truly relevant searches? Simple... Build a personal context database for each search user that takes into account linguistic idiosyncrasies, historical search successes, and personal preferences, and use it to predict a webpage's relevance. The database could be built based on a user's past searches, what they tended to click on, and more importantly how they rated each page's relevance in relation to what they were actually looking for. Over time this database would create a user's search profile, which could then be compared to those of other users with similar search profiles.
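
Here is a minimal sketch of what such a personal context database might look like, covering only the rating part of the idea. Everything in it is hypothetical: the SearchProfile class, the 1-to-5 rating scale, the neutral default of 3.0 for pages the user has never rated, and the ".example" URLs.

```python
from collections import defaultdict

class SearchProfile:
    """Hypothetical per-user store of how relevant past results were."""

    def __init__(self):
        # query -> url -> list of ratings (1 = useless, 5 = exactly right)
        self.ratings = defaultdict(lambda: defaultdict(list))

    def rate(self, query, url, rating):
        """Record how relevant the user found `url` for `query`."""
        self.ratings[query.lower()][url].append(rating)

    def personal_score(self, query, url, default=3.0):
        """Average past rating, or a neutral default for unrated pages."""
        past = self.ratings[query.lower()].get(url)
        return sum(past) / len(past) if past else default

    def rerank(self, query, urls):
        """Reorder the engine's results using this user's past ratings."""
        # sorted() is stable, so equally scored pages keep the engine's order.
        return sorted(
            urls,
            key=lambda u: self.personal_score(query, u),
            reverse=True,
        )

# A resident of Honda, California rates two results from an earlier search.
profile = SearchProfile()
profile.rate("Honda California", "honda-dealers.example", 1)
profile.rate("Honda California", "honda-town-map.example", 5)

engine_results = [
    "honda.example",
    "honda-dealers.example",
    "honda-town-map.example",
]
personal_results = profile.rerank("Honda California", engine_results)
print(personal_results)
```

The next time this user runs the same search, the town map rises to the top and the dealership listing sinks, without changing anything for users who actually want a car.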

Applying this to our earlier example, a person who lives in Honda, California would be likely to rate the pages relating to their city as highly relevant, so the next time the search is performed the results are more targeted. The next step would be to use this information to predict future search results. At a micro level, by putting Bob in a mathematical "neighborhood" with Joe and Mary, because all three found pages about Honda, California relevant, the search engine can now reference Joe and Mary's preferences as they relate to Honda, California for Bob's searches, thus increasing the likelihood of Bob finding something he wants the next time he searches. At the macro level this would be happening across billions of searches conducted by millions of users, so it's statistically sound as well.
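
The "neighborhood" idea is close to what is usually called user-based collaborative filtering, so here is a rough sketch along those lines. The choice of cosine similarity, the rating numbers, the extra user Sue (someone who wants car dealers), and the ".example" URLs are all assumptions made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two users' rating dicts (url -> rating), 0.0 to 1.0."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict(user, url, all_ratings):
    """Similarity-weighted average of other users' ratings for `url`."""
    weighted, total = 0.0, 0.0
    for other, ratings in all_ratings.items():
        if other == user or url not in ratings:
            continue
        sim = cosine_similarity(all_ratings[user], ratings)
        weighted += sim * ratings[url]
        total += sim
    # None means nobody comparable has rated this page yet.
    return weighted / total if total else None

# Hypothetical ratings (1 = useless, 5 = exactly right).
ratings = {
    "Bob": {"honda-town-map.example": 5, "honda-dealers.example": 1},
    "Joe": {"honda-town-map.example": 5, "honda-dealers.example": 1,
            "honda-town-hall.example": 5},
    "Mary": {"honda-town-map.example": 4, "honda-dealers.example": 2,
             "honda-town-hall.example": 4},
    "Sue": {"honda-town-map.example": 1, "honda-dealers.example": 5,
            "honda-town-hall.example": 1},
}

# Bob has never seen the town hall page; guess his rating from his neighbors.
guess = predict("Bob", "honda-town-hall.example", ratings)
print(guess)
```

Joe and Mary rate things the way Bob does, so their opinions of the town hall page count for more than Sue's, and the predicted rating for Bob lands on the relevant side of the scale.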

I can almost picture the day when I'll be able to type "crack pipe" into a Google search box and have it pull up a list of plumbers instead of drug paraphernalia sites;-P

Posted by Marc @ 4:56:00 PM --

(c) 2004 - UrOnCrack Enterprises. Not a single right reserved.
If you think this page needs a copyright then you need to get off the rock.