
HTML5 Simplequiz: The Complexities of the Cite Tag

The latest installment in HTML5Doctor’s Simplequiz — part of a series of “tests” designed to help you understand HTML5 and how to use it — delves into what might be the most controversial change in HTML5: the cite tag.

The question in the quiz seems simple: given a passage by an author, how do you mark up the author’s name? The possible answers are the <b> tag, the <i> tag, the <span> tag, the <cite> tag or nothing at all.

There’s something of a trick lurking in those answers, because if you’re familiar with HTML as it’s been applied for the last decade, you’d probably pick <cite>. In HTML 4.01, the cite tag is “intended to give information about the source from which the quotation was borrowed.” “Source” is somewhat ambiguous, but most of us would assume that a person could be the source of a quote, so wrapping their name in a cite tag makes perfect sense.

In HTML5, however, the spec is not ambiguous and clearly says: “the cite element represents the title of a work… a person’s name is not the title of a work — even if people call that person a piece of work — and the element must therefore not be used to mark up people’s names.”
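To make the distinction concrete, here is a minimal sketch of how the HTML5 definition plays out in markup. The quotation, title and author are placeholders chosen for illustration, not the passage used in the quiz.

```html
<!-- Hypothetical example: the quotation, title and author are placeholders. -->
<blockquote>
  <p>Call me Ishmael.</p>
</blockquote>

<!-- The long-standing habit: cite wraps the person being quoted.
     Under the HTML5 definition quoted above, this is no longer allowed. -->
<p><cite>Herman Melville</cite></p>

<!-- The HTML5 reading: cite wraps the title of the work,
     and the author's name is left outside the element. -->
<p>From <cite>Moby-Dick</cite> by Herman Melville</p>
```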

So, while the cite tag might have been a possible answer to the question of how you should mark up an author’s name in HTML 4.01, it clearly is not in HTML5.

I know what you’re thinking: HTML5 is supposed to be backwards-compatible with previous versions of HTML, so what’s up with redefining the cite tag? Well, there’s a good chance that the authors of HTML 4 meant what the authors of HTML5 actually wrote, but that doesn’t change the fact that there are probably millions of cases on pages around the web where <cite> will suddenly be wrong. So much for backward compatibility.

As web developer Jeremy Keith has pointed out, it’s actually much worse than it looks at first glance. Not only is cite no longer an option for people’s names, the HTML5 spec suggests that <b> might be appropriate to mark up an author’s name. As Keith says, “we are seriously being told to use semantically meaningless elements to mark up content that is semantically meaningful.”

Keith calls for users to reject HTML5’s definition of <cite>, and there’s a page on the WHATWG wiki that tracks the usage of <cite> in the wild in an attempt to show that HTML5’s change is ill-advised. Given that the default WordPress theme uses <cite> to mark up the names of blog commenters, there’s no shortage of examples.

However, given that Keith’s article dates from 2009 and there have been no major changes to the <cite> tag since, his may be a losing battle.

As for the HTML5 Simplequiz question, well, I’d use the <cite> tag. But I’d do so knowing I was trading valid code for more semantically meaningful content and, while that may not be “correct,” I can live with it. I’d just pair it with a little CSS to get rid of the browser’s default use of italics for <cite>.
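For the curious, that approach might look something like the sketch below. The class name "author" and the quoted name are hypothetical, picked only for illustration, and the result is deliberately at odds with the HTML5 definition of <cite>.

```html
<!-- Hypothetical example: the "author" class and the name are placeholders. -->
<style>
  /* Browsers typically render cite in italics by default.
     Reset that for names only, so titles of works keep their italics. */
  cite.author {
    font-style: normal;
  }
</style>

<blockquote>
  <p>Call me Ishmael.</p>
</blockquote>
<!-- cite wraps the person's name, which the HTML5 definition says it must not. -->
<p><cite class="author">Herman Melville</cite></p>
```

Scoping the rule to a class rather than to every cite element means any book or article titles marked up with <cite> elsewhere on the page still get the default italics.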
