All posts tagged ‘search’

Build a Custom Site Search Engine With ‘Tapir’

If you’ve switched from a dynamic publishing tool like WordPress to a simpler, static site (whether to take advantage of cheap Amazon S3 hosting, or because you want to publish from flat files, without a database), there are a few things you may be missing.

Some content is necessarily dynamic. If your site is just flat HTML files with no database behind them, there’s no easy way to build comments, contact forms or built-in search indexes. Luckily the web has a few solutions. For comments there are JavaScript solutions like Disqus or IntenseDebate, and contact forms can be built with Wufoo, but search is a little more difficult.

You could use Google’s Custom Search Engine tools, but then you’ll need to display things on Google’s terms (including a logo). Yahoo has a similar offering, but its results are often sub-par. The lack of search options for static sites led developer Jeff Kreeftmeijer to create Tapir, a JSON search API that indexes content from your site’s RSS feed.

Designed with static publishing systems in mind (like the popular Ruby-based tool, Jekyll), Tapir handles search through RSS and JavaScript without the overhead of a database on your own server. Tapir offers a JSON-based API and relies on Tire behind the scenes (which is powered by Elasticsearch, which in turn is powered by Lucene).

To use Tapir all you need to do is write a simple JavaScript-based search form, query the Tapir index for your site and then parse out the results to display for your visitors.
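The query-and-parse half of that workflow can be sketched in a few lines. This is only an illustration, written in Python rather than browser JavaScript so it can be run on its own. The endpoint URL, the `token` and `query` parameter names, and the `title` and `link` fields in the response are all assumptions here, so check Tapir's API docs for the real ones.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; confirm the real URL in Tapir's API docs.
TAPIR_SEARCH_URL = "http://tapirgo.com/api/1/search.json"


def build_search_url(token, query):
    """Return the URL for one search against your site's index."""
    # 'token' and 'query' are assumed parameter names.
    return TAPIR_SEARCH_URL + "?" + urlencode({"token": token, "query": query})


def parse_results(body):
    """Turn a JSON array of hits into (title, link) pairs.

    The 'title' and 'link' field names are assumptions about the response.
    """
    hits = json.loads(body)
    return [(hit.get("title", ""), hit.get("link", "")) for hit in hits]


# A canned response stands in for the real HTTP call.
sample_body = '[{"title": "Hello, Jekyll", "link": "http://example.com/hello"}]'
search_url = build_search_url("YOUR_TOKEN", "static sites")
results = parse_results(sample_body)

for title, link in results:
    print(title, link)
```

On a live site the same two steps would run in the browser: the search form builds the URL, and a callback loops over the returned hits to write them into the page.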

Tapir will parse and store the RSS feed you supply roughly every 15 minutes. For older posts (i.e. posts already long gone from your RSS feed) you’ll need to use the API to send over the data — something of a pain, but at least it’s a one-time pain.

If you’d like to give Tapir a try, just head over to the site, sign up for a token and read through the basic API docs for details on how to implement your search engine. The Tapir website says that sample code and better reference materials are coming soon, along with a jQuery plugin. [Update: As Tapir creator Kreeftmeijer notes in the comments below, the jQuery plugin is now available.]


Google Uses HTML5, JavaScript to Visualize Popular Searches

Google has released its annual zeitgeist report, a look at how the world searched in the last year. The zeitgeist is Google’s record of popular search terms and draws on sources like Google Insights for Search and Google Trends. It’s also a reminder that, in addition to tracking you in the usual creepy ways, Google often reveals some interesting data.

The results are predictably disappointing — despite a year’s worth of events, Chatroulette and Apple’s iPad top the list of most popular searches — but the data visualization Google has created is impressive.

The visualizations combine HTML5 with some fancy JavaScript (which appears to rely on the Dojo framework) to offer maps, bar charts and timelines. The map is particularly cool, plotting out bar graphs of searches by country with an interactive timeline slider to narrow the results by month.

Other views include bar graphs of the top search terms by category. When you click on an individual bar, the graph morphs into a timeline.

There’s also a video with some overly nostalgic music that walks you through the top terms of the year.


File Under: Browsers

Firefox 4 Adds Bing to List of Search Engines

Mozilla has announced that Microsoft’s upstart Bing search engine will soon become a default part of Firefox’s search bar. When Firefox 4 arrives it will feature some slight changes to the list of included search engines, offering, in order: Google (default), Yahoo, Bing, Amazon, eBay and Wikipedia.

Bing is a new option, though savvy users have long been able to install a Bing search plugin on their own. Now, it will be much easier to access by clicking on the drop-down list in the browser’s built-in search box.

Microsoft’s search engine continues to make inroads against Google, and while Microsoft has had a search product for years, it’s taken a long time to make its way onto Firefox’s short list. Mozilla vice president of products Jay Sullivan says Bing’s inclusion now is based on its “significant rise in popularity over the past year.”

Google’s engine will still be the default option for Firefox users. Google remains a primary source of income for Mozilla: the two companies share the revenue generated by Google searches typed from within Firefox’s search box.

The new search engine default list removes the Answers.com and the Creative Commons search engine choices. Answers.com is disappearing because, according to Mozilla, “we have heard from our users that Wikipedia is more useful as an included reference search engine.”

The Creative Commons search engine is being removed because the search tool itself has changed from something that searches just CC-licensed materials to a more general search engine that duplicates what’s found in Google, Yahoo and others. Mozilla is careful to point out that the foundation “will continue to actively support [the Creative Commons] organization and mission through grants and joint programs,” but not, apparently, its search engine.

Of course users are still free to install any of the thousands of search plugins for the sites they’d like — we’re fans of the Flickr CC search plugin and the Speckly torrent search plugin — but making the default plugins list means more traffic for those lucky sites.

In Bing’s case it also means an important new avenue to perhaps pull a few users away from Google.


File Under: APIs, JavaScript

Add a Google Search Box to Your Site

Unless you’re incredibly handy at writing complex algorithms, building a search engine for your website is a pain. And in the end, yours probably isn’t going to be that great, even after all your hard work. So why bother? Especially when there’s already a reasonably popular search engine by the name of Google (maybe you’ve heard of it?) that’s perfectly willing to handle the job for you.

The Google Search API is not only really good at searching, since it accesses the Google index, but it’s also really easy to use.

The potential for search-based mashups is nearly limitless, too. But in order to learn how it works, we’ll confine ourselves to a much more common use case — a site-specific search engine for your blog.



File Under: Identity, Social

Google Crawlers Now Understand ‘Canonical’ URLs

Migrating a web site from one domain to another is never easy. You’ll probably lose whatever Google ranking your old pages had, possibly break incoming links and generally disrupt the flux capacitor of the web.

Of course, there are occasionally good reasons to move your content and now there are some new ways to let Google know what you’re up to. The Google Webmaster blog recently announced that Google will support the cross-domain rel="canonical" link element. That means you can effectively migrate your site to a new domain even if you don’t have server access to do redirects.

In most cases, Google still suggests that, if possible, you use 301 permanent redirects to point both visitors and search engine bots to your new domain. However, if that’s not possible for some reason (for example, if you’re migrating from a hosted blog service to your own domain), then you can add a rel="canonical" link element to the head of each old page, pointing at the new URL, and Google will index the new URL.

Note that in our example (moving from a hosted blogging service to a self-hosted domain) it’s OK if there are some differences between the new and old pages, but the basic content (the blog post) should be the same.

Previously, Google would look down on cases of duplicate content across domains. Given the number of content-stealing “splogs” out there, filtering duplicate content by domains is a good way for Google to stop search engine spam. The problem is there are legitimate reasons to have duplicate content, like migrating a site to a new domain, and now there’s a way to do it.

One important note: Google no longer recommends blocking access to duplicate content on your website, whether with a robots.txt file or other methods. Just use the rel="canonical" tag instead.
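For the hosted-blog migration case, the tag itself is a single line placed in the head of each old page. Here is a minimal sketch of a helper that generates it for a post's new address. The function name and the example URL are hypothetical, and how you get the tag into your old pages depends on what your hosted service lets you edit.

```python
from html import escape


def canonical_link(new_url):
    """Build the link element to place inside the old page's <head>."""
    # Escape the URL so characters like & and " are safe inside the attribute.
    return '<link rel="canonical" href="{}" />'.format(escape(new_url, quote=True))


# Hypothetical address for the post's new home.
tag = canonical_link("http://www.example.com/blog/my-post/?a=1&b=2")
print(tag)
```

Each old page should point at the matching new page, not at the new site's homepage, since the basic content of the two is expected to be the same.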
File Under: UI/UX, Web Services

Google Tests Redesigned Search Page

Google appears to be testing a possible redesign of its iconic search page. Whether or not the new prototype will ever become official remains unknown, but thanks to some clever JavaScript you can check out the new look today.

The Google watchers over at Google Blogoscoped have found a snippet of JavaScript you can paste into your browser’s URL field which will activate the new look. Because the JavaScript code sets a new cookie, you’ll most likely need to log out of your Google account before it works. Once the cookie is set, refresh the Google homepage and you’ll see the changes.

The search buttons have become blue and the overall look is a bit like that of Google Wave. More significant is the redesigned search results page, which features an always-on sidebar for narrowing search results by type, date and view.

The brighter, more Wave-like look of the prototype doesn’t bother us, but we’re not so sure about the sidebar, especially given that the same options are already available in the infinitely more compact menu that runs along the top of the page. There is one new search option in the sidebar that you won’t find on the current Google page: the ability to see results from online forum sites.

The good news, should the new look utterly disgust you, is that so far Google hasn’t even mentioned the new look (and had not responded to our inquiries when this story was published), let alone taken any steps toward making it official. Given Google’s track record of beta testing, we suspect the redesign will be thoroughly and publicly tested before it goes live, if in fact it ever does.
File Under: Identity, Web Services

Google Dashboard: One Service to Rule Them All

If you’ve ever wanted to see all the Google services you use (and how you’re using them) in one spot, then the new Google Dashboard is exactly what you’ve been looking for. Google Dashboard is a one-stop shop for browsing through almost all the Google services you’re using and, by extension, shows you everything Google knows about you.

The nice thing about the new dashboard is that it gives you a central way to manage and control that data: change privacy settings, control sharing and limit what data Google stores about you. Each service listed in your dashboard contains an overview of your usage and links to change any data-sharing settings, edit any associated profiles and control who can see what. For example, the Google Reader entry in the dashboard shows a summary of your feeds, starred items and followers, and includes handy links to control your sharing settings.

There’s nothing in Dashboard that can’t be found within the individual services themselves, but navigating through Dashboard is considerably easier than trying to do the same on a service-by-service basis.

That said, Dashboard has a few quirks. For example, my dashboard says I’m sharing a photo album on Orkut, but in fact it’s just the default album associated with my Orkut account, and it doesn’t actually have any photos in it. Ditto for my Picasa account.

Dashboard doesn’t currently offer any transparency about how your data is being used by Google for advertising or user-behavior data-collection purposes. It also offers little info about how (or how long) your data is being stored. It would also be nice if Dashboard gave you a link to export all your data for each Google service. Eventually we’re hoping Google’s Data Liberation Front will fix that oversight and integrate some exporting tools directly into Dashboard.

Dashboard doesn’t currently support every Google service, though it does cover the most popular tools. The big omissions are Maps and Groups, though Dashboard does at least offer links to the services it doesn’t track.

To access the new Dashboard features, just click the My Account link in any Google service and then look for the new Dashboard link. Alternatively, you can head directly to the new Dashboard URL: https://www.google.com/dashboard.

File Under: Social, Web Services

Google Social Search Adds Your Friends to Your Search Results

Google has added a new social-search tool to its experimental search options. Google Social Search, which went live Monday afternoon, pulls a list of your contacts from sites like Twitter, FriendFeed, Picasa, Blogger, Google Reader and other social networks, as well as your Gmail contacts, to find results for search terms from people you know. Facebook’s friend data isn’t shared publicly, so results from your Facebook friends won’t show up unless you’re also friends on other networks.

To enable the new experiment, head over to the Google Experimental Search page and add the new Social Search option. As with other experiments, you’ll need to be logged in to Google to see the social results.

Once the experiment is enabled, you’ll be able to search for something like “potato chips” with enhanced results. Along with the regular Google results showing top hits for the entire web, you’ll see a link to a write-up about potato chips from your friend’s food blog. You might also see a friend’s tweet about potato chips, or a link to a Yelp review written by somebody you know where they talked about how good the potato chips are at the Lulu Petite sandwich shop.

While Google’s intro video shows search results from the social tool inline with other results (under the heading “Results from people in your social circle…”), that didn’t happen in our testing. To see the personalized results from our social graph we had to click the “Options” button and then filter the results by “social.”

As for the results, well, Social Search leaves a little to be desired, but the results depend heavily on how large your social circle is and how closely your interests match your friends’. For example, a search for “Webmonkey” turned up a number of hits, since the past and present Webmonkey staff members are part of our social graph. However, two of us spent Tuesday morning passing around a link to a (NSFW) McSweeney’s article about decorative gourds, and a social search for “decorative gourds” returned nothing from our social graph. We seem to be alone on that one.

It’s important to note that Google Social Search is not a real-time search engine. The quality of results may suffer a little if you’re searching for things that your friends have only started posting about very recently. The quality of results will also depend on how many services you’ve added to your Google Profile: the more social sites Google knows you hang out on, the more friends it has to draw on, and thus the more results you’ll see.

The exclusion of Facebook may seem like an egregious oversight, but it comes amidst a very public battle between Google and Facebook to become your hub on the social web. The recent push behind Google Profiles was the search company’s first major attempt to create a central place for you to store information about yourself and link to your profiles on other social networks. But Facebook is still the more popular place to build a profile, and Facebook struck a deal with Microsoft last week to let the Bing search engine index user activity on the site, a deal Google was left out of.

Compared to using the search features on social sites themselves, like Twitter and FriendFeed, Google’s Social Search comes in a distant second. But it does offer the advantage of finding everything in one place. It also acts as a very welcome filter. Try searching for “Where the Wild Things Are” on Twitter, and you’ll see thousands of tweets from people commenting about the movie or the book. Run the same search in Google Social Search, and you’ll just see what your friends (and the people they chat with publicly) are saying.

All the information that appears as part of Google Social Search is already available publicly on the web; with a bit of Google hacking you could find it yourself. But what’s social about that?

To enable Social Search, make sure you’re logged in to your Google account and head over to the Experimental Search page.
File Under: Social, Web Services

Bing Is in Your Facebook, Indexing Your Status

Facebook’s Twitter envy is showing again; the site recently announced a deal with Microsoft that will see public Facebook statuses indexed by a search engine for the first time. Although users sticking with Facebook’s default privacy settings won’t be affected, the move clearly shows Facebook moving beyond its closed, walled-garden beginnings.

Twitter’s success has clearly shaped several of Facebook’s recent changes, including the move to real-time updates and the acquisition of FriendFeed, but this latest development (turning over Facebook’s walled data to a search engine) goes well beyond earlier moves.

Part of Facebook’s appeal for many is precisely its walled-garden aspect. Sharing information on Facebook is a much more private, limited experience than with public services like Twitter, where anyone, friend or otherwise, can see what you post. But Facebook’s new deal with Bing, which comes close on the heels of Bing’s similar indexing plan for Twitter, will change that.

If the idea of your status messages finding their way into search engine indexes fills you with horror, there’s no need for alarm: only Facebook profiles set to “everyone” will be indexed. Since changing your privacy settings to “everyone” requires a trip to Settings -> Privacy Settings -> Profile, presumably only those who truly want their profiles public will be affected.

Facebook’s own terms of service also prevent outside applications from caching any user data, which means Bing’s indexing will likely be very ephemeral: don’t expect deep time-based searches or cached pages.

So if most users stick with the default privacy settings and Bing can’t cache the results, who does benefit from the new deal? Earlier this year, Facebook announced “fan pages” for products and brands that wanted a presence on the site, but for whom a traditional account would not have worked. It’s precisely this segment of Facebook’s population that will likely be most excited about the new Bing search deal. Brands and celebrity users already heavily invested in a Facebook presence will see that presence now available to the world at large thanks to Bing’s indexing plan.

At the moment the Facebook integration is just an announcement, but if the end result is anything like the Twitter integration in Bing (which is already live), expect the focus to be on links and whatever the buzzwords of the moment happen to be.

How much value Facebook’s status updates will add to Bing’s search results remains to be seen, but one thing is for sure: Bing finally has some data Google doesn’t. Unlike Wednesday’s Bing/Twitter deal, which was quickly mirrored by a similar announcement from Google, thus far, Facebook and Google have shown each other no love.
File Under: Software & Tools

OCR Tech Allows Google to Index Millions of Scanned Documents

Scanned PDFs are a kind of darknet on the web: at best, search engines see an image inside a PDF, but can’t parse out the actual text. But now that’s changed, as Google recently announced that it will begin using OCR (optical character recognition) technology to index the text inside scanned PDF documents.

Although there’s no flashy new interface or anything tangibly different in Google’s search results page, the new technology means that the full text of some 300 million PDF files in Google’s index will soon be converted to searchable text.

That’s quite a boost for your search results, though whether or not the PDFs show up in your searches depends a lot on what you search for. Google’s examples would seem to indicate that many of these documents are very technical, like this guide to repairing aluminum wiring (follow the link and then click “view as HTML” to see what the results look like).

Lifehacker has a fairly novel way to put the new features to work for you — upload your scanned PDFs, tell Google about them with a link and then sit back and wait for your free OCR conversion.

Certainly there are faster ways of converting scanned documents and, given that most scanners ship with free OCR programs, we’re not sure how practical the idea is, but they get points for creativity.
