All posts tagged ‘search’

File Under: search

Million Short: A Search Engine for the Very Long Tail

Where is that needle? Photo: Perry McKenna/Flickr.

Imagine a search engine that threw out the web’s top one million sites and then searched what was left. Sounds insane, right? But that’s exactly what Million Short purports to do and the results are, well, interesting.

Million Short seems like a terrible idea. Why would you want to remove the top sites on the web from your search results? In most cases you wouldn’t, but what Million Short offers is a chance to discover sites that just don’t make it to the top of the results from more popular search engines like Google, Bing or even DuckDuckGo.

It could be that these missing sites are just small, or perhaps they don’t use cutthroat SEO tactics to compete for popular terms, or maybe they just cover topics so niche they’re unlikely to rise to the top of any but the most targeted of searches. It could also be that they’re content farms and other worthless pages. Whatever the case, skimming the top million sites off the web just might open your eyes to how narrow your filters (and Google’s) have made your results, and how that’s both good and bad.

As Million Short notes, popularity is not an inverse corollary to quality, but when the same popular sites show up over and over in your results you are inevitably missing out on something. And that’s what Million Short wants to show you.

It’s important to realize that Million Short is removing the top websites, not just the top search results for individual queries. It’s also worth noting that Million Short doesn’t disclose where its search results are from, nor how it calculates the top sites. [Update: Sanjay Arora, founder of Exponential Labs, tells Webmonkey that Million Short is using "the Bing API... augmented with some of our own data" for search results. What constitutes a "top site" in Million Short is determined by Alexa and Million Short's own crawl data.]

Most of the time, narrowing search results down to trusted, well-known sites, as Google, Bing and other search engines do, is a good thing. To see why, just plug a few programming queries into Million Short and you’ll quickly realize just how helpful Stack Overflow, well inside the web’s top 1 million sites, has become. At the same time you might discover some unknown blog that will never make the top results in Google and happens to have the answer to exactly your problem. Is that better than the same answer from Stack Overflow? That’s up to you.

Million Short does offer some customization options you can use to both cut out the top sites and keep the handful you don’t want to be without. Additionally you can change the limit from the top million to the top 100,000, 10,000, 1,000 or 100 sites. If you decide you love it there is a search engine plugin that will work in Firefox, Chrome and Internet Explorer.
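Million Short hasn’t published how its filtering works, so the following is only a rough Python sketch of the general idea: drop any result whose domain falls inside a popularity cutoff, unless the user has chosen to keep it. The ranking list, the `keep` set and the function names are all hypothetical.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def filter_results(result_urls, ranked_domains, cutoff=1_000_000, keep=()):
    """Drop results whose domain is among the top `cutoff` ranked domains.

    ranked_domains: domains ordered from most to least popular (made-up data here).
    keep: popular domains the user still wants to see in the results.
    """
    blocked = set(ranked_domains[:cutoff]) - set(keep)
    return [url for url in result_urls if domain_of(url) not in blocked]


# Made-up ranking and results, purely for illustration.
ranking = ["google.com", "stackoverflow.com", "wikipedia.org", "smallblog.example"]
results = [
    "https://stackoverflow.com/questions/1",
    "https://www.wikipedia.org/wiki/Search",
    "https://smallblog.example/post",
]

filtered = filter_results(results, ranking, cutoff=3, keep={"stackoverflow.com"})
print(filtered)
```

Lowering `cutoff` to 100,000 or 100 corresponds to the smaller limits mentioned above, and the `keep` set plays the role of the handful of sites you don’t want to be without.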

Perhaps the better way to think of Million Short is not so much as a search engine, but as a discovery engine. Million Short’s strength is not going to be answering the specific kind of queries that Google is forever optimizing its index to handle, but discovering less well-known sites and exploring the more remote corners of the web that might be lost in other search indexes.

Build a Custom Site Search Engine With ‘Tapir’

If you’ve switched from a dynamic publishing tool like WordPress to a simpler, static site, whether to take advantage of cheap Amazon S3 hosting or because you want to publish from flat files without a database, there are a few things you may be missing.

Some content is necessarily dynamic. If your site is just flat HTML files with no database behind them, there’s no easy way to build comments, contact forms or built-in search indexes. Luckily the web has a few solutions. For comments there are JavaScript solutions like Disqus or IntenseDebate, and contact forms can be built with Wufoo, but search is a little more difficult.

You could use Google’s Custom Search Engine tools, but then you’ll need to display things on Google’s terms (including a logo). Yahoo has a similar offering, but its results are often sub-par. The lack of search options for static sites led developer Jeff Kreeftmeijer to create Tapir, a JSON search API that indexes content from your site’s RSS feed.

Designed with static publishing systems in mind (like the popular Ruby-based tool, Jekyll), Tapir handles search through RSS and JavaScript without the overhead of a database on your own server. Tapir offers a JSON-based API and relies on Tire behind the scenes (which is powered by Elasticsearch, which in turn is powered by Lucene).

To use Tapir all you need to do is write a simple JavaScript-based search form, query the Tapir index for your site and then parse out the results to display for your visitors.
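The query-and-parse steps might look roughly like the sketch below. It’s written in Python only to show the shape of the exchange; on a real static site you’d do the same thing in JavaScript in the browser. The endpoint URL, the `token` and `query` parameter names, and the `title` and `link` result fields are all assumptions made for illustration, so check the Tapir API docs for the real ones.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the one given in the Tapir API docs.
SEARCH_ENDPOINT = "https://tapir.example/api/search.json"


def build_search_url(token, query):
    """Build the request URL for a search (parameter names are assumed)."""
    return SEARCH_ENDPOINT + "?" + urlencode({"token": token, "query": query})


def parse_results(body):
    """Turn a JSON array of hits into (title, link) pairs.

    Assumes each hit is an object with 'title' and 'link' keys.
    """
    return [(hit["title"], hit["link"]) for hit in json.loads(body)]


# A made-up response body, standing in for what the server might return.
sample_body = json.dumps([
    {"title": "Hello static sites", "link": "https://blog.example/hello"},
    {"title": "Search without a database", "link": "https://blog.example/search"},
])

url = build_search_url("YOUR_TOKEN", "static site")
hits = parse_results(sample_body)
print(url)
for title, link in hits:
    print(f"{title}: {link}")
```

The sample response is hard-coded so the sketch runs without a network connection; in practice the body would come from fetching the URL built in the first step.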

Tapir will parse and store the RSS feed you supply roughly every 15 minutes. For older posts (i.e. posts already long gone from your RSS feed) you’ll need to use the API to send over the data — something of a pain, but at least it’s a one-time pain.

If you’d like to give Tapir a try, just head over to the site, sign up for a token and read through the basic API docs for details on how to implement your search engine. The Tapir website says that sample code and better reference materials are coming soon, along with a jQuery plugin. [Update: As Tapir creator Kreeftmeijer notes in the comments below, the jQuery plugin is now available.]

Google Uses HTML5, JavaScript to Visualize Popular Searches

Google has released its annual zeitgeist report, a look at how the world searched in the last year. The zeitgeist is Google’s record of popular search terms and draws on sources like Google Insights for Search and Google Trends. It’s also a reminder that, in addition to tracking you in the usual creepy ways, Google often reveals some interesting data.

The results are predictably disappointing — despite a year’s worth of events, Chatroulette and Apple’s iPad top the list of most popular searches — but the data visualization Google has created is impressive.

The visualizations combine HTML5 with some fancy JavaScript (which appears to rely on the Dojo framework) to offer maps, bar charts and timelines. The map is particularly cool, plotting out bar graphs of searches by country with an interactive timeline slider to narrow the results by month.

Other views include bar graphs of the top search terms by category. When you click on an individual bar, the graph morphs into a timeline.

There’s also a video with some overly nostalgic music that walks you through the top terms of the year.

File Under: Browsers

Firefox 4 Adds Bing to List of Search Engines

Mozilla has announced that Microsoft’s upstart Bing search engine will soon become a default part of Firefox’s search bar. When Firefox 4 arrives it will feature some slight changes to the list of included search engines, offering, in order: Google (default), Yahoo, Bing, Amazon, eBay and Wikipedia.

Bing is a new option, though savvy users have long been able to install a Bing search plugin on their own. Now, it will be much easier to access by clicking on the drop-down list in the browser’s built-in search box.

Microsoft’s search engine continues to make inroads against Google, and while Microsoft has had a search product for years, it’s taken a long time to make its way onto Firefox’s short list. Mozilla vice president of products Jay Sullivan says Bing’s inclusion now is based on its “significant rise in popularity over the past year.”

Google’s engine will still be the default option for Firefox users. Google remains a primary source of income for Mozilla: the two companies share the revenue generated by Google searches typed from within Firefox’s search box.

The new search engine default list removes the Answers.com and the Creative Commons search engine choices. Answers.com is disappearing because, according to Mozilla, “we have heard from our users that Wikipedia is more useful as an included reference search engine.”

The Creative Commons search engine is being removed because the search tool itself has changed from something that searches just CC-licensed materials to a more general search engine that duplicates what’s found in Google, Yahoo and others. Mozilla is careful to point out that the foundation “will continue to actively support [the Creative Commons] organization and mission through grants and joint programs,” but not, apparently, its search engine.

Of course users are still free to install any of the thousands of search plugins for the sites they’d like — we’re fans of the Flickr CC search plugin and the Speckly torrent search plugin — but making the default plugins list means more traffic for those lucky sites.

In Bing’s case it also means an important new avenue to perhaps pull a few users away from Google.

File Under: APIs, JavaScript

Add a Google Search Box to Your Site

Unless you’re incredibly handy at writing complex algorithms, building a search engine for your website is a pain. And in the end, yours probably isn’t going to be that great, even after all your hard work. So why bother? Especially when there’s already a reasonably popular search engine by the name of Google (maybe you’ve heard of it?) that’s perfectly willing to handle the job for you.

The Google Search API is not only really good at searching, since it accesses the Google index, but it’s also really easy to use.

The potential for search-based mashups is nearly limitless, too. But in order to learn how it works, we’ll confine ourselves to a much more common use case — a site-specific search engine for your blog.

