Microformats vs. The Semantic Web (Big S, Big W)

Dan Cederholm recently linked to an article called Can Your Website be Your API? by Drew McLellan. In that article, Drew linked to his presentation of the same name, which he gave at the Web Standards Group London Meetup (podcast feed linked on that page) on October 19. Naturally, I downloaded all of the podcasts and listened to just about all of them on my way home last night and on my way in this morning.

Drew’s talk was one of three related to Microformats. The first was by Mark Norman Francis (of Yahoo) and the second was by Jeremy Keith. Drew’s came third. The theme of the three was a past, present, and future of Microformats and the applications of Microformats.

Drew talked about how simply adding Microformats (the formal name for adding semantic, standardized class names to your HTML, essentially) could open your site up to many of the benefits an API gives. Your plain HTML content then becomes machine readable and agents can use your data with very little difficulty.
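To give a feel for what "machine readable" means here, below is a minimal sketch in Python of an agent pulling contact data out of ordinary HTML. It assumes the page uses the hCard class names vcard, fn, org, and url; the sample markup, the HCardParser class, and the extract_hcards helper are all made up for illustration and are not from Drew's talk.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: ordinary, visible HTML with hCard class names added.
PAGE = """
<div class="vcard">
  <span class="fn">Jane Example</span> works at
  <span class="org">Example Widgets</span>.
  <a class="url" href="http://example.com/">Home page</a>
</div>
"""

# The handful of hCard properties this sketch looks for.
PROPERTIES = {"fn", "org", "url"}


class HCardParser(HTMLParser):
    """Collects a few hCard properties from each element marked class="vcard".

    Simplification: void elements such as <br> are not handled, so this is
    only suitable for small, tidy fragments like the one above.
    """

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard found
        self._open = []   # per open element: "vcard", a property name, or None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        role = None
        if "vcard" in classes:
            role = "vcard"
            self.cards.append({})
        elif "vcard" in self._open:
            role = next((c for c in classes if c in PROPERTIES), None)
            if role == "url":
                # For links, the value is the href rather than the link text.
                self.cards[-1]["url"] = attributes.get("href")
                role = None
        self._open.append(role)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        role = self._open[-1] if self._open else None
        if role in PROPERTIES and data.strip():
            self.cards[-1][role] = data.strip()


def extract_hcards(html):
    """Return a list of dicts, one per hCard found in the HTML string."""
    parser = HCardParser()
    parser.feed(html)
    return parser.cards


if __name__ == "__main__":
    for card in extract_hcards(PAGE):
        print(card)
```

The page still reads normally in a browser; the only addition is the class names, and that is all the script relies on.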

I have explained Microformats to colleagues as somewhat of a “band aid” between “semantic markup” and “The Semantic Web” (the latter being Tim Berners-Lee’s vision of a machine-readable and -usable web). But Microformats seems to be encroaching on some of the ground established by the Semantic Web. The first speaker, Mark Norman Francis, gave a compelling and provocative case for adopting Microformats as a replacement for much of what the Semantic Web is trying to accomplish.

First, Mark quoted the Semantic Web’s entry on Wikipedia:

For example, with HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as “this document’s title is ‘Widget Superstore’”. But there is no capability within the HTML itself to unambiguously assert that, say, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product. Rather, HTML can only say that the span of text “X586172” is something that should be positioned near “Acme Gizmo” and “€199”, etc. There is no way to say “this is a catalog” or even to establish that “Acme Gizmo” is a kind of title or that “€199” is a price. There is also no way to express that these pieces of information are bound together in describing a discrete item, distinct from other items perhaps listed on the page.

Mark immediately follows with:

Those people have never heard of the hListing microformat, which does pretty much exactly that.

This is the problem we have—visible metadata. All of the information on the web should be visible to you as a human. The moment you take this information, whether it is already on the web page or about the web page and put it somewhere else, it becomes invisible to you. You then have to go and use other software to extract it. This is bad, from my point of view anyway.
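To make Mark's claim a little more concrete, here is a rough Python sketch of the catalog example from the Wikipedia passage, marked up with listing-style class names and then read back by a script. The class names hlisting, item, fn, and price follow my understanding of the hListing draft and should be treated as assumptions; sku is a purely hypothetical class for the item number, and the second catalog entry is invented so there is something to keep distinct from the first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML catalog fragment. Everything a script needs
# is also visible to a human reader of the page.
CATALOG = """
<ul>
  <li class="hlisting">
    <span class="item"><span class="fn">Acme Gizmo</span></span>
    (item <span class="sku">X586172</span>):
    <span class="price">€199</span>
  </li>
  <li class="hlisting">
    <span class="item"><span class="fn">Acme Widget</span></span>
    (item <span class="sku">X586173</span>):
    <span class="price">€49</span>
  </li>
</ul>
"""


def has_class(element, name):
    """True if the element's class attribute contains the given class name."""
    return name in element.get("class", "").split()


def read_listings(xhtml):
    """Return one dict per element marked class="hlisting".

    Each listing's properties are looked up only among its own descendants,
    which is what keeps one item's price from being attached to another item.
    """
    root = ET.fromstring(xhtml.strip())
    listings = []
    for node in root.iter():
        if not has_class(node, "hlisting"):
            continue
        listing = {}
        for child in node.iter():
            for prop in ("fn", "sku", "price"):
                if has_class(child, prop):
                    listing[prop] = "".join(child.itertext()).strip()
        listings.append(listing)
    return listings


if __name__ == "__main__":
    for listing in read_listings(CATALOG):
        print(listing)
```

This only works on well-formed XHTML, and it says nothing about the "consumer product" assertion in the Wikipedia passage; it just shows a name, an item number, and a price being bound together as one item.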

Mark then goes on to slap around the Semantic Web some more, as well as things like XML namespaces. He certainly makes a case. I’m not going to pretend I know enough about the Semantic Web to say that all of the associated technologies can be replaced by Microformats, but it seems that some aspects of both the Semantic Web and APIs are simplified by Microformats.

I highly recommend listening to the podcasts. Mark’s is quick—just 21 minutes. Here are the links:

More Microformats resources:

2 Comments

  1. On October 25th, 2006 at 2:08 pm Bruce said:

    First quick correction for you.. the audio links for Drew and Jeremy need to be switched.

    What a great summary of microformats. This kind of clarity is exactly what the ‘Semantic Web’ folks lack in describing their tools for the semantic web. I agree that only ‘really smart people’ understand, and therefore can use RDF and other Semantic Web standards effectively. I have read, and reread pages on these, and have yet to fully grasp, how it works and how to use it. What I disagree with is that there is some kind of struggle over which to use. In general I’m certain that microformats and the Semantic Web standards can peacefully co-exist. I say in general, because there is likely to be some necessary overlap that may seem tedious, but probably easily mitigated through automated tools, since both are making information explicitly machine readable.

    The point that I think the microformats folks don’t get is that the Semantic Web is not just about web pages and metadata. It is about all forms of media, which do not always come in the form of text where you can insert microformats. It is also about providing the ability to create relationships between metadata to enable querying and machine inferencing. RSS feeds are the perfect example of the first point. While one may have microformats on your Blog page, I do not know how they could be used to publish the fact that you just made a new entry, and that the entry contained a presentation and mp3 file (yes there is hAtom, but this is a translation, see below).

    On the second point, microformats allow some relationships and inferencing at a simple level (i.e. if geo coordinates aren’t provided, I can perhaps get an approximate location via street address, city, county, etc.). Ultimately, however, to do more complex relationships, I think the simple beauty of microformats would quickly begin to look like the Semantic Web.

    But as I said, I think there is plenty of room for coexistence. One example would perhaps be hAtom. In this case they are taking only the pieces needed for syndicating a blog, leaving out the additional functionality of Atom. Granted, Atom is not RDF as RSS is, but this case does illustrate that a microformat can be translated to another format serving a higher purpose. Another example was mentioned in Drew’s presentation when he was talking about Dan Cederholm’s use of microformats on Cork’d. Brian Suda wrote to Dan about XSLT/SPARQL and how he could use them to find ‘trusted reviews.’ My guess is that Brian is taking the microformat data, encoding it in RDF, and then running a SPARQL query (SPARQL is basically a SQL-like query language for RDF). Here again, the metadata provided (these are reviews and these are friends) was used to serve a higher purpose (these are reviews I trust).

    Anyway, that was far more than I intended to write. I just wanted to say bravo to anyone trying to make the use of metadata easier and more widespread. Those that think there is a schism are thinking too much about implementation and not enough about use. No matter how it is implemented, if it is truly machine readable, there will always be a bridge. The important point is that people realize the growing utility of metadata, and find a way to provide it.

  2. On October 25th, 2006 at 3:18 pm Adam Darowski said:

    Good, I was hoping someone with more Semantic Web experience would chime in. Thanks much. Mark’s argument sounded compelling, but there *had* to be something missing.

    I agree that only ‘really smart people’ understand, and therefore can use RDF and other Semantic Web standards effectively. I have read, and reread pages on these, and have yet to fully grasp, how it works and how to use it.

    Right. What attracts people to Microformats is not just the simplicity, but the fact that it is just an extra step in what we’re already doing. But as you explain, the simplicity makes it an obvious solution in some respects, but in other areas of the Semantic Web, it sounds like Microformats simply can’t provide an alternative.

    What I liked about Drew’s talk was the fact that he gave real world examples of how Microformats COULD fill the role of an API. It’s not a challenge and it’s not poking fun… it’s actually providing a real solution.

    Thanks for the bug find!