Structured Data Is Structured For a Reason

There is an eloquent blog post at glyph.twistedmatrix.com this week, prompted by the way the Blogger platform mangles posts:

* Properly quoted “<” and “>” (i.e. “&lt;” and “&gt;”) are quoted again.

* Additional line-breaks are added.

* “&nbsp;” is converted to white space, and then

* white space is collapsed.
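To make the list concrete, here is a minimal Python sketch of a string-only "cleanup" pass that reproduces three of these four symptoms. The function name `naive_cleanup` and its steps are hypothetical; this is an illustration of the failure mode, not Blogger's actual code.

```python
import re

def naive_cleanup(post: str) -> str:
    """Hypothetical sanitizer that treats HTML as an ordinary string."""
    # 1. "&nbsp;" entities are replaced by ordinary spaces...
    text = post.replace("&nbsp;", " ")
    # 2. ...so collapsing runs of whitespace now destroys the spacing
    #    the "&nbsp;" entities were there to protect.
    text = re.sub(r"\s+", " ", text)
    # 3. Every "&" is escaped blindly, so an already-escaped "&lt;"
    #    becomes "&amp;lt;" and readers see the literal text "&lt;".
    return text.replace("&", "&amp;")

print(naive_cleanup("a&nbsp;&nbsp;b &lt;tag&gt;"))  # a b &amp;lt;tag&amp;gt;
```

Note that running the pass a second time escapes the ampersands yet again, so every edit-and-save cycle makes the post worse.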

This is a symptom, Glyph Lefkowitz argues, of treating HTML as just a string of text rather than as what it actually is: structured data that can be manipulated intelligently.
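For contrast, here is a minimal sketch of a structure-aware pass, built only on Python's standard-library `html.parser`: instead of running replacements over the raw markup, it parses the post into tags and text, then writes it back out. The class `RoundTrip` and function `roundtrip` are hypothetical names; this illustrates the general idea, not Glyph's or Blogger's code, and it assumes comments and declarations can simply be copied through.

```python
from html import escape
from html.parser import HTMLParser

class RoundTrip(HTMLParser):
    """Re-serializes HTML from parsed tokens instead of editing the raw string."""

    def __init__(self):
        # convert_charrefs=True: text arrives already decoded ("&lt;" -> "<").
        super().__init__(convert_charrefs=True)
        self.out = []

    def _tag(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            # Attribute values are decoded too, so escape them exactly once here.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, close=" /"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Escape once, on decoded text, so nothing is double-escaped;
        # non-breaking spaces go back out as entities instead of collapsing.
        self.out.append(escape(data, quote=False).replace("\xa0", "&nbsp;"))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def roundtrip(markup: str) -> str:
    parser = RoundTrip()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)

post = '<p class="x">a&nbsp;&nbsp;b &lt;tag&gt;</p>'
print(roundtrip(post))  # <p class="x">a&nbsp;&nbsp;b &lt;tag&gt;</p>
```

Because text is decoded exactly once on the way in and escaped exactly once on the way out, running the pass twice gives the same result, which is precisely the property the string-based version lacks.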

String manipulation is easier than DOM wrangling (indeed, it is one of the first topics covered in How To Program books), but that doesn’t mean it’s the best tool for every job. I daresay that treating structured data as strings is one reason good calendar applications, something computers ought to handle well, are so hard to come by.