Blog: The Wisdom of Crowds Meets the Wisdom of Authors: How XML Enables the Semantic Web

By my colleague Paul Wlodarczyk, VP of Solutions Consulting for JustSystems:

I recently attended the first-ever Linked Data Planet conference, where a number of pioneers in the field of the Semantic Web shared their perspectives on the state of the art - and business - of helping the world tag its web pages for meaning. So what is the Semantic Web, and how is it different from the web of today? On today's web, most search engines use keywords and the number of links to a page to determine the relevance of search results. This is the wisdom of crowds at work: If the keywords you are searching for occur often on a page, and the page is popular (i.e., lots of other pages link to it), then it is probably the best bet for what you are searching for. The downside of this approach is that the search engine can only infer what the page means. On the Semantic Web, the crowds get wiser thanks to the wisdom of authors, who can let the crowds know - in no uncertain terms - what their content means.

For example, when "New York" appears in an HTML document, it could mean New York City, New York State, the Yankees, the Mets, the Giants, the Jets, the song, the strip steak, the state of mind, etc. You get the idea. Words are ambiguous when taken out of context.

If I'm writing about a sporting event, the context of the article lets you know that "New York" means a specific team. The typical search engine, however, doesn't recognize context. To a search engine, "New York" is just a string that occurs in the document with some frequency.
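
To make the point concrete, here is a minimal Python sketch of keyword counting. It is not how any particular search engine works; the function name `keyword_frequency` and both sample texts are made up for illustration. It only shows that a frequency count gives a sports story and a restaurant review the same score for "New York".

```python
import re


def keyword_frequency(text: str, phrase: str) -> int:
    """Count case-insensitive occurrences of a phrase, as a naive keyword matcher might."""
    return len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))


# Two invented snippets that use "New York" in unrelated senses.
sports_article = "New York beat Oakland last night. New York now leads the series."
menu_review = "The New York strip was excellent. New York diners will love it."

sports_count = keyword_frequency(sports_article, "New York")
menu_count = keyword_frequency(menu_review, "New York")

# Both counts are identical, so frequency alone cannot separate team from steak.
print(sports_count, menu_count)
```

The string match has no way to tell the two meanings apart; any distinction has to come from somewhere other than the text itself.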

Key to the Semantic Web is semantic markup, which lets users annotate their web pages with metadata - HTML attributes that don't get displayed in the document. Semantic metadata describes what the pages are about, letting authors define things with authority and precision.
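
As a rough illustration, the sketch below parses a one-line HTML fragment with Python's standard `html.parser` module. The attribute names `typeof` and `resource` are borrowed loosely from RDFa, and the values `SportsTeam` and `#yankees` are hypothetical; a real page would follow whatever vocabulary its consumers expect. The reader sees only the words, while a program can also read the annotation.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the attributes carry meaning but are never displayed.
PAGE = '<p><span typeof="SportsTeam" resource="#yankees">New York</span> won last night.</p>'


class AnnotationExtractor(HTMLParser):
    """Collect the visible text and any (type, resource) annotations separately."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self.visible_text = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "typeof" in attr_map:
            self.annotations.append((attr_map["typeof"], attr_map.get("resource")))

    def handle_data(self, data):
        self.visible_text.append(data)


extractor = AnnotationExtractor()
extractor.feed(PAGE)
extractor.close()

visible = "".join(extractor.visible_text)
print(visible)                # what a reader sees
print(extractor.annotations)  # what a semantic consumer can also see
```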

In my "New York" document, I can state that the document is about the sports team, not the steak. I can do this by tagging the named entities in the document - the people, places, things, events, and facts - in an unambiguous way. I can also set those entities into relationships with each other. If part of my document refers to a player trade between the New York Yankees and the Oakland A's, I can tag the Yankees (entity number one), the A's (entity number two), and the player trade (an event, but also a relationship between the two named entities).
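
One common way to record entities and relationships like these is as subject-predicate-object statements, the model used by RDF. The sketch below is an assumption-laden illustration rather than a real vocabulary: the `ex:` identifiers, the `ex:party` predicate, and the helper `parties_of` are all invented for this example.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers: two entities, one event, and the links between them.
triples = [
    Triple("ex:Yankees", "rdf:type", "ex:SportsTeam"),
    Triple("ex:Athletics", "rdf:type", "ex:SportsTeam"),
    Triple("ex:Trade1", "rdf:type", "ex:PlayerTrade"),
    Triple("ex:Trade1", "ex:party", "ex:Yankees"),
    Triple("ex:Trade1", "ex:party", "ex:Athletics"),
]


def parties_of(event: str, facts: List[Triple]) -> List[str]:
    """Return the entities that a given event relates to each other."""
    return [t.obj for t in facts if t.subject == event and t.predicate == "ex:party"]


trade_parties = parties_of("ex:Trade1", triples)
print(trade_parties)
```

Because the trade is itself an identified thing, it can serve both as an event with its own type and as the link that relates the two teams.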

Overcoming the semantic hurdles. While semantic tagging gives documents unambiguous meaning, it has traditionally faced two large hurdles. First, adding semantic markup has been relatively expensive, in terms of either labor or technology. Second, the market for consuming this markup has been small. Both of those hurdles are rapidly falling away.

Let's address the second point first. Yahoo! has introduced SearchMonkey, a new technology that rates web pages. Rather than use keywords and number of links to the page (the wisdom of crowds), SearchMonkey finds web pages using the semantic markup that is embedded in the page (the wisdom of authors). This creates a substantial motive for adding semantic markup - search engine optimization. Semantic markup makes your content more likely to be found and more relevant to the searcher.

Marking up existing content. Which brings us to the first point: How do you add semantic markup? For legacy content, you need to use some combination of people and automation. Using people to tag existing content requires specialized skills that are in short supply. But some interesting technologies for auto-tagging content are emerging. Thomson Reuters' Calais is a great example. For a demo, visit http://sws.clearforest.com/SWS.htm, and try pasting in some text that describes your company. I did, and Calais accurately tagged all the named companies, legal entities, products, technologies, countries, and cities, and correctly identified a product acquisition as an event.
