Million Short: A Search Engine for the Very Long Tail

Imagine a search engine that threw out the web’s top one million sites and then searched what was left. Sounds insane, right? But that’s exactly what Million Short purports to do and the results are, well, interesting.

Million Short seems like a terrible idea. Why would you want to remove the top sites on the web from your search results? In most cases you wouldn’t, but what Million Short offers is a chance to discover sites that just don’t make it to the top of the results from more popular search engines like Google, Bing or even DuckDuckGo.

It could be that these missing sites are just small, or perhaps they don’t use cutthroat SEO tactics to compete for popular terms, or maybe they just cover topics so niche they’re unlikely to rise to the top of any but the most targeted of searches. It could also be that they’re content farms and other worthless pages. Whatever the case, skimming the top million sites off the web just might open your eyes to how narrow your filters (and Google’s) have made your results, and how that’s both good and bad.

As Million Short notes, popularity is not an inverse corollary to quality, but when the same popular sites show up over and over in your results you are inevitably missing out on something. And that’s what Million Short wants to show you.

It’s important to realize that Million Short removes the top websites entirely, not just the top search results for individual queries. It’s also worth noting that Million Short doesn’t disclose where its search results come from, nor how it calculates the top sites. [Update: Sanjay Arora, founder of Exponential Labs, tells Webmonkey that Million Short is using “the Bing API… augmented with some of our own data” for search results. What constitutes a “top site” in Million Short is determined by Alexa and Million Short’s own crawl data.]

Most of the time, narrowing search results down to trusted, well-known sites, as Google, Bing and other search engines do, is a good thing. To see why, just plug a few programming queries into Million Short and you’ll quickly realize just how helpful Stack Overflow (well inside the web’s top 1 million sites) has become. At the same time, you might discover some unknown blog that will never make the top results in Google but happens to have the answer to exactly your problem. Is that better than the same answer from Stack Overflow? That’s up to you.

Million Short does offer some customization options you can use to both cut out the top sites and keep the handful you don’t want to be without. Additionally, you can change the limit from the top million to the top 100,000, 10,000, 1,000 or 100 sites. If you decide you love it, there is a search engine plugin that will work in Firefox, Chrome and Internet Explorer.
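To make the idea concrete, here is a rough Python sketch of site-level filtering with an adjustable cutoff and a keep list. It is not Million Short's actual code, which isn't public; the `SITE_RANK` table, its made-up ranks, and the names `domain_of` and `filter_results` are all assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical popularity table: domain -> rank (1 = most popular).
# The ranks are invented for this example, not real ranking data.
SITE_RANK = {
    "wikipedia.org": 6,
    "stackoverflow.com": 90,
    "big-blog.example": 45_000,
}


def domain_of(url):
    """Return the host of a URL, minus a leading 'www.' (a simplification)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def filter_results(urls, cutoff=1_000_000, keep=frozenset()):
    """Drop every result whose site ranks within the top `cutoff` sites.

    Sites missing from the table are treated as unranked and kept.
    Domains listed in `keep` survive regardless of rank.
    """
    kept = []
    for url in urls:
        domain = domain_of(url)
        rank = SITE_RANK.get(domain)
        if domain in keep or rank is None or rank > cutoff:
            kept.append(url)
    return kept


results = [
    "https://stackoverflow.com/questions/1",
    "https://www.wikipedia.org/wiki/Search",
    "http://tiny-blog.example/post",
]

short_list = filter_results(results)                           # top million removed
with_keep = filter_results(results, keep={"stackoverflow.com"})  # keep one favorite
top_50_only = filter_results(results, cutoff=50)               # remove only the top 50
```

Note that the filter works on whole sites rather than on individual result pages: a popular site disappears from every query, which is the distinction drawn above.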

Perhaps the better way to think of Million Short is not so much as a search engine but as a discovery engine. Million Short’s strength is not going to be answering the specific kind of queries that Google is forever optimizing its index to handle, but discovering less well-known sites and exploring the more remote corners of the web that might be lost in other search indexes.