Microdata: HTML5’s Best-Kept Secret

Given the amount of industry noise about native video and scripted animations, you’d be forgiven if you had never heard of the new microdata specification included in HTML5.

Similar to outside efforts like Microformats, HTML5’s microdata offers a way to extend HTML by adding custom vocabularies to your pages.

The easiest way to understand it is to consider a common use case. Let’s say you want to list details about a business on your page: the name, address, telephone number and so on. To do that you’ll need to use some vocabulary in addition to HTML, since there is no <business> tag.

Using microdata, you can create your own custom name/value pairs to define a vocabulary that describes a business listing.

When a search engine spider comes along, it will know that not only is your data a business listing, but it can discover the address, the phone number, or even the precise geo-coordinates if you want to include them.

Given that HTML5 is still a draft at this point, why bother?

Actually, despite its lack of publicity and HTML5’s still-incomplete status, microdata is already being used by Google, which has started adding information gleaned from microdata markup to its search result snippets.

Microdata is useful today, but what about Microformats or more complex tools like RDFa? The answer is that all three will work (and Google, in most cases, understands all of them).

In the end, the differences between the three are primarily in the syntax, and each has its advantages and disadvantages. But given that the Microdata specification will very likely become an official recommended web standard as part of HTML5, it seems the most future-proof of the three options.

So how do we add Microdata to a web page? Consider the following basic HTML markup, which might be used to describe my local coffee shop:

    <h1>Hendershot's Coffee Bar</h1>
    <p>1560 Oglethorpe Ave, Athens, GA</p>

This markup gets the basic information on the page and humans can read it, but search engine spiders aren’t going to get much out of it. While it’s true that even Google says you should design for humans first and robots second, we can improve this code without making it any less human readable.


To rewrite this business listing using HTML5’s microdata syntax, we would do something like this:

    <div itemscope itemtype="http://data-vocabulary.org/Organization">
        <h1 itemprop="name">Hendershot's Coffee Bar</h1>
        <p itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
            <span itemprop="street-address">1560 Oglethorpe Ave</span>,
            <span itemprop="locality">Athens</span>,
            <span itemprop="region">GA</span>
        </p>
    </div>

The Microdata markup adds three attributes you may not have seen before: itemscope, itemtype and itemprop. The first is essentially a top-level marker; it tells the search engine spider that you’re about to define something in the nested tags that follow. The itemtype attribute tells the spider what you’re defining, in this case an organization.

The rest of the markup should look pretty familiar if you’ve used Microformats. The main change is the itemprop attribute (short for item property) to define what each element is. Because our address is all one paragraph, we’ve added some span tags to define each element of the address separately — street address, locality and so on. If we wanted, we could add other properties like a phone number (itemprop="tel"), a URL (itemprop="url") or even geodata (itemprop="geo").
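As a rough sketch of those extra properties, the listing could be extended along the following lines. The phone number, URL and coordinates are placeholders rather than the shop's real details, and the nested Geo item with latitude and longitude properties is an assumption about how data-vocabulary.org models geodata, so check the vocabulary before relying on it.

```html
<div itemscope itemtype="http://data-vocabulary.org/Organization">
    <h1 itemprop="name">Hendershot's Coffee Bar</h1>
    <p itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
        <span itemprop="street-address">1560 Oglethorpe Ave</span>,
        <span itemprop="locality">Athens</span>,
        <span itemprop="region">GA</span>
    </p>
    <!-- Placeholder phone number and URL, for illustration only -->
    <p>Phone: <span itemprop="tel">555-0100</span></p>
    <p><a itemprop="url" href="http://example.com/">Website</a></p>
    <!-- Assumed shape: geo is a nested item; the coordinates are placeholders -->
    <span itemprop="geo" itemscope itemtype="http://data-vocabulary.org/Geo">
        <meta itemprop="latitude" content="0.0">
        <meta itemprop="longitude" content="0.0">
    </span>
</div>
```

The meta elements keep the coordinates out of the visible page while still exposing them to a parser, and for the link the property value is the href rather than the link text.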

So where did we get these itemprop names from? Well, as the URL in the itemtype attribute indicates, they come from data-vocabulary.org. Of course you can make up your own vocabulary, but if you want search engine spiders to understand your microdata, you’re going to have to document what you’re doing. Since the definitions at data-vocabulary.org cover a number of common use cases (events, organizations, people, products, recipes, reviews), it makes a good starting point.
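To make the "make up your own" option concrete, here is a minimal sketch of a home-grown vocabulary. Everything in it is hypothetical: the example.com itemtype URL, the property names and the values are invented for illustration, and no search engine would understand them unless you published a definition at that URL.

```html
<!-- Hypothetical vocabulary: this itemtype URL is not a real, published definition -->
<div itemscope itemtype="http://example.com/vocab/CoffeeShop">
    <h1 itemprop="name">Example Coffee Bar</h1>
    <!-- Invented property names; document them wherever the itemtype URL points -->
    <p>House roast: <span itemprop="house-roast">medium</span></p>
    <p>Seats: <span itemprop="seating-capacity">40</span></p>
</div>
```

The syntax is identical to the data-vocabulary.org example; only the itemtype and the property names change, which is why shared vocabularies are the practical choice when you want spiders to make sense of the result.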

Microformats and RDFa

So how does Microdata fit with Microformats and RDFa? Well, the WHATWG, which helps to develop the HTML5 spec, decided the flame wars provoked by the debate over whether to use Microformats or RDFa lacked sufficient vehemence, so they added a third definition of their own.

Actually, the reasoning seems to have been something like this: Microformats are a really good idea, but essentially a hack. Because Microformats rely only on the class and rel attributes, writing parsers to read them is complicated.
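For comparison, here is a sketch of the same coffee shop listing written with the hCard microformat. Notice that all of the meaning is carried by class names, which is exactly what a parser has to untangle from classes used purely for styling.

```html
<!-- hCard: "vcard" marks the root, "fn org" names the organization -->
<div class="vcard">
    <h1 class="fn org">Hendershot's Coffee Bar</h1>
    <p class="adr">
        <span class="street-address">1560 Oglethorpe Ave</span>,
        <span class="locality">Athens</span>,
        <span class="region">GA</span>
    </p>
</div>
```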

At the same time, RDFa was designed to work with the now-defunct XHTML 2.0 spec. Although RDFa is being ported to work with HTML5, it can be overly complex for many use cases. RDFa is a bit like asking what time it is and having someone tell you how to build a watch. Yes, RDFa can do the same things HTML5 microdata and Microformats do (and more), but if the history of the web teaches us a lesson, it’s that simpler solutions almost always win.
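To give a feel for the difference, here is a rough RDFa sketch of the same listing. It assumes the RDFa flavor of the data-vocabulary.org vocabulary and its namespace URL as recalled from Google's examples of the time, so treat the exact names as assumptions and verify them before use.

```html
<!-- Assumed namespace URL and v: prefix for the data-vocabulary.org terms -->
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Organization">
    <h1 property="v:name">Hendershot's Coffee Bar</h1>
    <p rel="v:address">
        <span typeof="v:Address">
            <span property="v:street-address">1560 Oglethorpe Ave</span>,
            <span property="v:locality">Athens</span>,
            <span property="v:region">GA</span>
        </span>
    </p>
</div>
```

Even in this small case RDFa needs a namespace declaration, prefixed names and separate rel, typeof and property attributes, where microdata gets by with itemscope, itemtype and itemprop.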

Further Reading

Before you dive into microdata, be sure to check out all the options. Google has a nice overview on adding microdata to your page, and offers examples using all three markup syntaxes. Mark Pilgrim’s Dive Into HTML5 also devotes a chapter to microdata with more detail on how microdata parsers read your markup.

Also, keep in mind that it isn’t just search engines that stand to benefit from microdata on your pages. The HTML5 spec also defines a set of DOM APIs for web browsers to read and manipulate microdata on your pages. At the moment, no browser supports the API, but most probably will eventually.
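In the meantime, a script can approximate the idea with ordinary selector queries. The sketch below is not the microdata DOM API from the spec, just a stand-in built on standard DOM calls; the phone number is a placeholder, and the loop naively treats every property value as text.

```html
<div itemscope itemtype="http://data-vocabulary.org/Organization">
    <h1 itemprop="name">Hendershot's Coffee Bar</h1>
    <!-- Placeholder phone number, for illustration only -->
    <p>Phone: <span itemprop="tel">555-0100</span></p>
</div>
<script>
    // Stand-in for the microdata API: find the first item and its properties
    var item = document.querySelector('[itemscope]');
    var props = item.querySelectorAll('[itemprop]');
    for (var i = 0; i < props.length; i++) {
        // Log each property name with the text content of its element
        console.log(props[i].getAttribute('itemprop') + ': ' + props[i].textContent);
    }
</script>
```

A real parser has more to do: it must treat nested items such as the address as values in their own right and read URLs from href attributes instead of text, which is the kind of detail the spec's API is meant to handle for you.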

The more information you can give the web, the more it can do with that information. Eventually, search engines could use microdata to find your friends on the web (like XRD and WebFinger) and browsers could use it to connect you with those friends no matter what flavor-of-the-month social site they might be using.
