Archive for the ‘search’ Category

File Under: search

Million Short: A Search Engine for the Very Long Tail

Where is that needle? Photo: Perry McKenna/Flickr.

Imagine a search engine that threw out the web’s top one million sites and then searched what was left. Sounds insane, right? But that’s exactly what Million Short purports to do and the results are, well, interesting.

Million Short seems like a terrible idea. Why would you want to remove the top sites on the web from your search results? In most cases you wouldn’t, but what Million Short offers is a chance to discover sites that just don’t make it to the top of the results from more popular search engines like Google, Bing or even DuckDuckGo.

It could be that these missing sites are just small, or perhaps they don’t use cutthroat SEO tactics to compete for popular terms, or maybe they just cover topics so niche they’re unlikely to rise to the top of any but the most targeted of searches. It could also be that they’re content farms and other worthless pages. Whatever the case, skimming the top million sites off the web just might open your eyes to how narrow your filters (and Google’s) have made your results, and how that’s both good and bad.

As Million Short notes, popularity is not inversely correlated with quality, but when the same popular sites show up over and over in your results you are inevitably missing out on something. And that's what Million Short wants to show you.

It’s important to realize that Million Short is removing the top websites, not just the top search results for individual queries. It’s also worth noting that Million Short doesn’t disclose where its search results come from, nor how it calculates the top sites. [Update: Sanjay Arora, founder of Exponential Labs, tells Webmonkey that Million Short is using "the Bing API... augmented with some of our own data" for search results. What constitutes a "top site" in Million Short is determined by Alexa and Million Short's own crawl data.]

Most of the time, narrowing search results down to trusted, well-known sites, the way Google, Bing and other search engines do, is a good thing. To see why, just plug a few programming queries into Million Short and you’ll quickly realize just how helpful Stack Overflow (well inside the web’s top 1 million sites) has become. At the same time you might discover some unknown blog that will never make the top results in Google and happens to have the answer to exactly your problem. Is that better than the same answer from Stack Overflow? That’s up to you.

Million Short does offer some customization options you can use to both cut out the top sites and keep the handful you don’t want to be without. Additionally, you can change the limit from the top million to the top 100,000, 10,000, 1,000 or 100 sites. If you decide you love it, there’s a search engine plugin that works in Firefox, Chrome and Internet Explorer.
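Million Short hasn’t published how that filtering works, but the basic idea is simple enough to sketch. The Python snippet below is only an illustration: the made-up results list and domain-rank table stand in for Million Short’s Bing-backed results and its Alexa-plus-crawl ranking data.

```python
from urllib.parse import urlparse

# Hypothetical rank table: domain -> popularity rank (Alexa-style).
TOP_DOMAINS = {"wikipedia.org": 6, "stackoverflow.com": 85, "example-blog.net": 4_200_000}

def strip_top_sites(results, top_domains=TOP_DOMAINS, cutoff=1_000_000):
    """Drop results whose domain ranks inside the top `cutoff` sites."""
    kept = []
    for result in results:
        domain = urlparse(result["url"]).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        rank = top_domains.get(domain)
        if rank is None or rank > cutoff:
            kept.append(result)
    return kept

results = [
    {"url": "https://stackoverflow.com/questions/123", "title": "How do I parse a URL?"},
    {"url": "http://example-blog.net/obscure-fix", "title": "An obscure fix"},
]

# The default cutoff mirrors Million Short's top-million filter; the UI also
# offers 100,000, 10,000, 1,000 and 100.
print(strip_top_sites(results))              # keeps only the obscure blog
print(strip_top_sites(results, cutoff=10))   # keeps both; only the top 10 are skimmed off
```

The whole trick is in which results you never see: everything else is ordinary search.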

Perhaps the better way to think of Million Short is not so much a search engine, but a discovery engine. Million Short’s strength is not going to be answering the specific kind of queries that Google is forever optimizing its index to handle, but to discover less well-known sites and explore the more remote corners of the web that might be lost in other search indexes.

File Under: search

DuckDuckGo Search Engine Crowdsources Plugins

Searching XKCD. Image: DuckDuckGo.

DuckDuckGo, the privacy-conscious search alternative to Google, has introduced a new feature dubbed DuckDuckHack — a developer platform that allows anyone to add new features to the search engine.

DuckDuckGo already offers quite a few of what the company refers to as goodies: quick answers and clever shortcuts for common search tasks. For example, type “1 + 1” in DuckDuckGo’s search field and you’ll get “1 + 1 = 2” in addition to your search results. Other goodies include time-based queries, unit conversions, facts (for example, the weight of a penny) and, of course, DuckDuckGo’s “!bang” syntax for searching specific websites. For a full list of all the built-in goodies, see DuckDuckGo.

DuckDuckHack takes the goodies concept and crowdsources it. Now anyone can write a custom plugin for DuckDuckGo and everyone can take advantage of it. There are already a number of cool plugins available, including the XKCD comic search pictured above, as well as more generally useful tools like lyrics search, a Twitter username search and an em-to-pixel converter for web developers.

Anyone can build DuckDuckHack plugins and the idea is for developers to build tools that they personally care about — scratch your own itch and pass it along so everyone benefits. For more info on how to write a plugin (DuckDuckHack plugins are written in a variety of languages depending on the type of search), check out the tutorial and guidelines for developers. DuckDuckHack takes requests as well. If you’ve got an idea for a search shortcut, you can let developers know.
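Real DuckDuckHack plugins are written against DuckDuckGo’s own platform, so the following is only an illustrative sketch of the keyword-trigger idea in Python, not the actual DuckDuckHack API: a plugin registers a trigger word and returns an instant answer when a query matches it.

```python
# Illustrative only: this is not DuckDuckHack's real plugin interface.
PLUGINS = {}

def plugin(trigger):
    """Register a handler that fires when the query starts with `trigger`."""
    def register(func):
        PLUGINS[trigger] = func
        return func
    return register

@plugin("xkcd")
def xkcd_search(terms):
    # A real plugin would query an index of comics; this just echoes the terms.
    return f"XKCD comics matching: {terms}"

@plugin("em to px")
def em_to_px(terms):
    # e.g. "em to px 1.5" -> "1.5em = 24px" (assumes the common 16px base size)
    ems = float(terms or 1)
    return f"{ems}em = {ems * 16:g}px"

def instant_answer(raw_query):
    """Return the first plugin answer whose trigger prefixes the query, if any."""
    for trigger, handler in PLUGINS.items():
        if raw_query.lower().startswith(trigger):
            return handler(raw_query[len(trigger):].strip())
    return None

print(instant_answer("em to px 1.5"))  # -> 1.5em = 24px
```

Note that in a registry like this, a second plugin claiming an already-taken trigger would silently overwrite the first, which is exactly the kind of collision a crowdsourced platform has to sort out.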

While the new DuckDuckHack plugin platform makes DuckDuckGo an even more compelling Google alternative, parts of it are still a bit rough around the edges, and it’s unclear what DuckDuckGo plans to do if two developers build plugins that conflict by responding to the same keyword trigger.

Still, DuckDuckHack is only a day old and already it’s added several useful new tools to DuckDuckGo. To learn more about the new plugin search features already a part of DuckDuckGo, visit the DuckDuckGo goodies page.

File Under: search

Google’s New Search Algorithm to Crack Down on ‘Black Hat Webspam’

By Matthew Braga, Ars Technica

Nefarious search engine optimizers be warned. Google is coming for you—again.

Following previous changes to Google’s ranking and page layout algorithms, the search giant is pushing yet another update to its algorithm this week with the hopes of curbing “black hat webspam” from creeping into search results.

The change will go live for all languages at the same time within the next few days, said engineer Matt Cutts in a blog post yesterday, and will affect roughly 3.1 percent of queries in English “to a degree that a regular user might notice.”

Cutts said the changes are targeted at sites engaged in tactics such as keyword stuffing, or “unusual linking patterns” where unrelated links are sprinkled throughout a fake or manufactured article. These sites might be harder to recognize than more blatant SEO offenses, but Google engineers believe that targeted sites “are engaging in webspam tactics to manipulate search engine rankings.”
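Google doesn’t say how it detects these tactics, and the real signals are certainly far more sophisticated, but a toy heuristic makes the keyword-stuffing idea concrete: check how disproportionately a single term dominates a page’s text. Everything below is illustrative, not Google’s method.

```python
import re
from collections import Counter

def keyword_density(text, min_words=50):
    """Return the most repeated word on a page and its share of all words.

    A toy illustration only: Google's actual webspam signals are unpublished
    and look at much more than raw repetition (link patterns, cloaking, etc.).
    """
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < min_words:
        return None, 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)

# Ordinary prose is dominated by stop words at a few percent each; a footer
# stuffed with the same pharmaceutical term pushes one content word far above
# that, e.g. ("pills", 0.3) on a hypothetical spam page.
```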

As previously reported, there have been at least nine major updates to Google’s “Panda” algorithms since they were introduced last February, with numerous other tweaks along the way. In some cases, otherwise innocent sites were harmed, though this change is promised to affect a much smaller subset of visible search results.

Google’s quality guidelines outline just some of the discouraged tactics, which include hidden text or links, pages with irrelevant keywords, cloaking, and, of course, the presence of malicious software. That’s not to say all SEO is bad, however. Cutts points out that so-called white hat techniques are still fair game, and can often improve the usability of a site, “which is good for both users and search engines.”

As for packing every known pharmaceutical synonym into your site’s footer? That’s probably not as wise.

This article originally appeared on Ars Technica, Wired’s sister site for in-depth technology news.

File Under: search, Web Services

Hack Swaps Google’s Search Plus Your World Results for the Wider Social Web

Shortly after Google launched Search plus Your World earlier this month, critics accused the company of favoring its own nascent social network over the much richer results on others, like Twitter or Facebook. As Wired’s Steven Levy quipped, “there’s too much Plus and not enough of Our World, which has oodles of content on other social networks.”

Now developers at Twitter, Facebook and MySpace have put together a demonstration of just how much relevancy Google sacrifices in order to push Google+. The demo, which uses only Google’s own results, shows, among other questionable results, how Google routinely ignores more relevant Twitter pages to show off seldom-used Google+ profiles. To see it in action, head on over to the new Focus on the User website.

If you decide you prefer the often more relevant results from the Focus on the User experiment there’s a bookmarklet available, cheekily entitled “don’t be evil.” Just drag the bookmarklet into your web browser’s bookmarks bar and then click it whenever you want to see more than just Google+ results in Google’s search results.

The developers behind Focus on the User do work for Google+ rivals, but that doesn’t change the results of the experiment, which speak for themselves. The developers also point out that their tool relies entirely on Google’s own data to rank social search results. Here’s their description of how the “don’t be evil” tool works:

the tool identifies the social profiles within the first ten pages of Google results (top 100 results). The ones Google ranks highest — whether they are from Flickr, Twitter, Facebook, LinkedIn, MySpace, Quora, Tumblr, Foursquare, Crunchbase, FriendFeed, Stack Overflow, Github or Google+ — replace the previous results that could only be from Google+.

In other words, the bookmarklet largely returns Google to its previous state, before the Search plus Your World update. If you’d like to know more about how the bookmarklet works, or see examples of situations in which the emphasis on Google+ social results actually degrades the quality of search results, be sure to check out the video below.
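The bookmarklet’s source isn’t reproduced here, but the logic the developers describe is straightforward to sketch. This Python illustration assumes you already have Google’s organic top 100 results and uses a hard-coded list of social domains; both are stand-ins, not the actual Focus on the User code.

```python
# Hypothetical stand-in for the networks the Focus on the User page lists.
SOCIAL_DOMAINS = {
    "flickr.com", "twitter.com", "facebook.com", "linkedin.com", "myspace.com",
    "quora.com", "tumblr.com", "foursquare.com", "crunchbase.com",
    "friendfeed.com", "stackoverflow.com", "github.com", "plus.google.com",
}

def is_social_profile(url):
    """Crude check: does this result live on one of the listed social sites?"""
    return any(domain in url for domain in SOCIAL_DOMAINS)

def best_social_results(organic_top_100, how_many=3):
    """Take Google's own organic top 100 (already ordered by Google's ranking)
    and return the highest-ranked social profiles, whatever network they are
    on. These are what would replace the Google+-only boxes."""
    social = [result for result in organic_top_100 if is_social_profile(result["url"])]
    return social[:how_many]
```

The point of the experiment is that no new ranking signal is introduced: Google’s own ordering decides which social profiles surface once the Google+-only restriction is removed.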

Photo: Rene Tillmann/AP

File Under: search

Google Tweaks Search Results to Punish Ad-Heavy Websites

Google has tweaked its search algorithm to punish websites with excessive advertising “above-the-fold,” that is, websites that stack the top of the page with nothing but advertisements.

According to Google, “rather than scrolling down the page past a slew of ads, users want to see content right away.” To help users get to that content, Google may drop ad-heavy websites from its search results.

Google says that the change will only affect about one in 100 searches, and emphasizes that websites using what Google’s Distinguished Engineer and SEO guru Matt Cutts calls “ads above-the-fold to a normal degree” will not be affected.

Instead the change is designed to punish sites that “go much further to load the top of the page with ads to an excessive degree or that make it hard to find the actual original content on the page.” In other words, if a site is so packed with ads that people can’t find what they’re looking for then Google isn’t going to send them to that site anymore.

While the distinction seems clear at first glance, digging deeper reveals some potential confusion for webmasters — for example, what role does screen size play? On a netbook, for instance, Google’s own search results page is almost entirely taken over by advertisements, not the actual search results (i.e., the content).

Google on a netbook screen: Ads are in red, search results in green

At small screen resolutions, Google’s own search results page is one of the worst offenders when it comes to advertising clutter obscuring content. That seeming hypocrisy may leave some webmasters wondering what constitutes “a normal degree of ads” and how screen size affects what is defined as “normal.” Sticking simply with what Google has written about the change, copying Google’s search results page is probably not a good idea in this case.

Cutts does encourage webmasters to view their websites at different screen resolutions, suggesting that screen size does play a role, but unfortunately he doesn’t offer any details about what that role is or how it affects the algorithm’s new layout ranking scheme.
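Google hasn’t explained how screen size factors into the layout signal, but the underlying question is easy to state: at a given viewport size, what fraction of the first screenful is ads? A rough sketch follows, with the measurement approach and the numbers entirely hypothetical.

```python
def ad_share_above_fold(ad_blocks, viewport_width, viewport_height):
    """Estimate the fraction of the first screenful covered by ads.

    `ad_blocks` is a list of (top, left, width, height) rectangles in CSS
    pixels, e.g. as measured with element.getBoundingClientRect() in a
    browser. Overlapping ads are not deduplicated in this rough sketch.
    """
    fold_area = viewport_width * viewport_height
    ad_area = 0
    for top, left, width, height in ad_blocks:
        visible_w = max(0, min(left + width, viewport_width) - max(left, 0))
        visible_h = max(0, min(top + height, viewport_height) - max(top, 0))
        ad_area += visible_w * visible_h
    return ad_area / fold_area

# The same page looks very different on a 1366x768 laptop and a 1024x600
# netbook: ad_share_above_fold(blocks, 1366, 768) might come out around 0.25
# while ad_share_above_fold(blocks, 1024, 600) climbs past 0.5.
```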