Friday | 9 January, 2009
CIO
Blog: The Wisdom of Crowds Meets the Wisdom of Authors: How XML Enables the Semantic Web
Jake Sorofman 21 July, 2008 14:02:16

By my colleague Paul Wlodarczyk, VP of Solutions Consulting for JustSystems:

I recently attended the first-ever Linked Data Planet conference, where a number of Semantic Web pioneers shared their perspectives on the state of the art - and the business - of helping the world tag its web pages for meaning. So what is the Semantic Web, and how does it differ from the web of today? Most search engines today use keywords and the number of links to a page to determine the relevance of search results. This is the wisdom of crowds at work: if the keywords you are searching for occur often on a page, and the page is popular (i.e., lots of other pages link to it), then it is probably the best bet for what you are looking for. The downside of this approach is that it can only infer what a page means; it never knows for certain. On the Semantic Web, the crowds get wiser thanks to the wisdom of authors, who can tell the crowds - in no uncertain terms - what their content means.
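To make the crowd-based approach concrete, here is a minimal Python sketch of a ranker that multiplies how often a phrase appears on a page by a log-scaled count of inbound links. The `Page` type, the scoring formula, and the example URLs are hypothetical illustrations, not the algorithm of any particular search engine.

```python
# Toy "wisdom of crowds" ranker: score = phrase frequency x log(1 + inbound links).
# Pages, link counts, and the formula are hypothetical, for illustration only.
import math
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    text: str
    inbound_links: int  # how many other pages link here (popularity)


def keyword_frequency(text: str, query: str) -> int:
    """Count case-insensitive occurrences of the query phrase in the text."""
    return len(re.findall(re.escape(query), text, flags=re.IGNORECASE))


def crowd_score(page: Page, query: str) -> float:
    """Phrase frequency, boosted by log-scaled popularity."""
    return keyword_frequency(page.text, query) * math.log1p(page.inbound_links)


def rank(pages: List[Page], query: str) -> List[str]:
    """Return URLs ordered from highest to lowest crowd score."""
    ordered = sorted(pages, key=lambda p: crowd_score(p, query), reverse=True)
    return [p.url for p in ordered]


pages = [
    Page("https://example.com/yankees-trade",
         "New York traded a pitcher. New York fans cheered.", 20),
    Page("https://example.com/steak-recipe",
         "Grill the New York strip over high heat.", 900),
]
print(rank(pages, "New York"))
```

With these made-up numbers, a baseball fan searching for "New York" gets the more popular steak recipe first, because nothing in the score captures what either page is actually about.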

For example, when "New York" appears in an HTML document, it could mean New York City, New York State, the Yankees, the Mets, the Giants, the Jets, the song, the strip steak, the state of mind, etc. You get the idea. Words are ambiguous when taken out of context.

If I'm writing about a sporting event, the context of the article lets you know that "New York" means a specific team. The typical search engine, however, doesn't recognize context. To a search engine, "New York" is just a string that occurs in the document with some frequency.

Key to the Semantic Web is semantic markup, which lets authors annotate their web pages with metadata - HTML attributes that aren't displayed when the page is rendered. Semantic metadata describes what a page is about, letting authors define things with authority and precision.
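The split between what a reader sees and what the author declares can be shown with a small sketch using Python's standard `html.parser`. The markup uses RDFa-style attributes (`about`, `property`, `content`), but the `example.org` vocabulary is hypothetical and this parser is an illustration, not a conforming RDFa processor.

```python
# Separate rendered text from author-supplied metadata in an HTML fragment.
# RDFa-style attributes; the example.org URIs are hypothetical.
from html.parser import HTMLParser
from typing import Dict, List, Optional

PAGE = """
<div about="http://example.org/teams/new-york-yankees">
  <span property="http://example.org/vocab/sport" content="baseball"></span>
  <h1 property="http://example.org/vocab/name">New York</h1>
  <p>New York rallied in the ninth inning.</p>
</div>
"""


class MetadataExtractor(HTMLParser):
    """Collects declared metadata separately from the text a browser shows."""

    def __init__(self) -> None:
        super().__init__()
        self.subject: Optional[str] = None   # what the block is about
        self.metadata: Dict[str, str] = {}   # property URI -> value
        self.visible_text: List[str] = []    # what a reader would see
        self._pending: Optional[str] = None  # property whose value is the next text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("about"):
            self.subject = a["about"]
        prop = a.get("property")
        if prop and "content" in a:
            # The value lives in an attribute, so the browser never renders it.
            self.metadata[prop] = a["content"]
        elif prop:
            # No content attribute: the element's own text is the value.
            self._pending = prop

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        self.visible_text.append(text)
        if self._pending:
            self.metadata[self._pending] = text
            self._pending = None


parser = MetadataExtractor()
parser.feed(PAGE)
parser.close()
print("Reader sees:", " ".join(parser.visible_text))
print("About:", parser.subject)
print("Metadata:", parser.metadata)
```

The word "baseball" never appears in the rendered text, yet a machine reading the page learns that this "New York" is a baseball team.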

In my "New York" document, I can state that the document is about the sports team, not the steak. I can do this by tagging the named entities in the document - the people, places, things, events, and facts - in an unambiguous way. I can also set those entities into relationships with each other. If part of my document refers to a player trade between the New York Yankees and the Oakland A's, I can tag the Yankees (entity number one), the A's (entity number two), and the player trade (an event, but also a relationship between the two named entities).
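Semantic Web data models such as RDF express statements like these as subject-predicate-object triples. The sketch below uses plain Python tuples in place of a real triple store; the `example.org` URIs and predicate names are hypothetical stand-ins for an agreed vocabulary.

```python
# The trade example as subject-predicate-object triples.
# URIs and predicates are hypothetical; a real page would use a shared vocabulary.
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"
TYPE, PARTY = EX + "type", EX + "party"
YANKEES = EX + "team/new-york-yankees"       # entity number one
ATHLETICS = EX + "team/oakland-athletics"    # entity number two
TRADE = EX + "event/trade-001"               # the event relating them

graph = [
    Triple(YANKEES, TYPE, EX + "SportsTeam"),
    Triple(ATHLETICS, TYPE, EX + "SportsTeam"),
    Triple(TRADE, TYPE, EX + "PlayerTrade"),
    Triple(TRADE, PARTY, YANKEES),
    Triple(TRADE, PARTY, ATHLETICS),
]


def events_involving(g: List[Triple], entity: str) -> List[str]:
    """Every event that names the entity as a party."""
    return [t.subject for t in g if t.predicate == PARTY and t.obj == entity]


def trading_partners(g: List[Triple], team: str) -> Set[str]:
    """Follow each event to the other parties it connects the team to."""
    partners = set()
    for event in events_involving(g, team):
        for t in g:
            if t.subject == event and t.predicate == PARTY and t.obj != team:
                partners.add(t.obj)
    return partners


print(events_involving(graph, YANKEES))
print(trading_partners(graph, YANKEES))
```

Because the trade is itself a node, a consumer can answer relationship questions ("whom did the Yankees trade with?") that string matching on "New York" cannot.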

Overcoming the semantic hurdles. While semantic tagging gives documents unambiguous meaning, it has traditionally faced two large hurdles. First, adding semantic markup has been relatively expensive, in terms of either labor or technology. Second, the market for consuming this markup has been small. Both of those hurdles are rapidly falling away.

Let's address the second point first. Yahoo! has introduced SearchMonkey, a new technology that rates web pages. Rather than relying on keywords and the number of links to a page (the wisdom of crowds), SearchMonkey finds web pages using the semantic markup embedded in them (the wisdom of authors). This creates a substantial incentive for adding semantic markup: search engine optimization. Semantic markup makes your content more likely to be found, and found by the searchers to whom it is actually relevant.
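The difference between the two kinds of lookup can be sketched in a few lines. This is not SearchMonkey's implementation; it assumes, hypothetically, that each page carries an author-declared `about` entity URI, and it contrasts matching on that declaration with matching on a raw string.

```python
# Keyword lookup vs. lookup on author-declared meaning.
# Page records and the "about" field are hypothetical illustrations.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Page:
    url: str
    text: str
    about: str  # entity URI the author declared in the page's markup


PAGES = [
    Page("https://example.com/yankees-trade",
         "New York sends a pitcher to Oakland.",
         "http://example.org/team/new-york-yankees"),
    Page("https://example.com/steak-recipe",
         "Grill the New York strip over high heat.",
         "http://example.org/food/new-york-strip"),
]


def keyword_search(pages: List[Page], phrase: str) -> List[str]:
    """Matches any page containing the string, whatever it means there."""
    return [p.url for p in pages if phrase.lower() in p.text.lower()]


def semantic_search(pages: List[Page], entity_uri: str) -> List[str]:
    """Matches only pages whose authors said they are about the entity."""
    return [p.url for p in pages if p.about == entity_uri]


print(keyword_search(PAGES, "New York"))
print(semantic_search(PAGES, "http://example.org/team/new-york-yankees"))
```

The keyword search returns both pages; the metadata search returns only the baseball page, which is why authors have a reason to add the markup.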

Marking up existing content. That brings us to the first point: how do you add semantic markup? For legacy content, you need some combination of people and automation. Using people to tag existing content requires specialized skills that are in short supply, but some interesting technologies for auto-tagging content are emerging. Thomson Reuters' Calais is a great example. For a demo, visit http://sws.clearforest.com/SWS.htm and try pasting in some text that describes your company. I did, and Calais accurately tagged all the named companies, legal entities, products, technologies, countries, and cities, and correctly identified a product acquisition as an event.
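Auto-tagging takes plain text in and returns typed entity annotations. The sketch below shows only that input/output shape using a simple dictionary ("gazetteer") lookup; Calais relies on far richer language analysis, and the `GAZETTEER` entries and `auto_tag` function here are hypothetical. Unlike Calais, this toy cannot recognize events such as an acquisition.

```python
# Toy gazetteer-based auto-tagger: a sketch of the idea, not Calais's method.
# The gazetteer entries are hypothetical.
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int
    end: int


GAZETTEER = {
    "New York Yankees": "Organization",
    "New York": "City",
    "Oakland": "City",
}


def auto_tag(text: str) -> List[Entity]:
    """Return non-overlapping entity matches, preferring longer names."""
    # Longer names go first in the alternation, so "New York Yankees"
    # is matched whole instead of stopping at "New York".
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return [Entity(m.group(0), GAZETTEER[m.group(0)], m.start(), m.end())
            for m in pattern.finditer(text)]


for e in auto_tag("The New York Yankees traded a pitcher to Oakland."):
    print(e.label, "->", e.text)
```

Each match carries character offsets, which is what a tool would need to wrap the original text in semantic markup.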
