Tuesday | 14 October, 2008
CIO
Blog: The Wisdom of Crowds Meets the Wisdom of Authors: How XML Enables the Semantic Web
Jake Sorofman 21 July, 2008 14:02:16

By my colleague Paul Wlodarczyk, VP of Solutions Consulting for JustSystems:

I recently attended the first-ever Linked Data Planet conference, where a number of Semantic Web pioneers shared their perspectives on the state of the art - and the business - of helping the world tag its web pages for meaning. So what is the Semantic Web, and how is it different from the web of today? Most search engines today use keywords and the number of links to a page to determine the relevance of search results. This is the wisdom of crowds at work: if the keywords you are searching for occur often on a page, and the page is popular (i.e., lots of other pages link to it), then it is probably the best bet for what you are searching for. The downside of this approach is that the search engine can only infer the meaning of the page; nothing on the page states it. On the Semantic Web, the crowds get wiser thanks to the wisdom of authors, who can let the crowds know - in no uncertain terms - what their content means.

For example, when "New York" appears in an HTML document, it could mean New York City, New York State, the Yankees, the Mets, the Giants, the Jets, the song, the strip steak, the state of mind, etc. You get the idea. Words are ambiguous when taken out of context.

If I'm writing about a sporting event, the context of the article lets you know that "New York" means a specific team. The typical search engine, however, doesn't recognize context. To a search engine, "New York" is just a string that occurs in the document with some frequency.
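To make the point concrete, here is a minimal sketch of keyword-and-popularity scoring. The scoring formula, function name, and sample pages are all hypothetical; this is not how any particular search engine works, only an illustration of why counting strings cannot tell a ball game from a steak.

```python
import math

def keyword_relevance(text: str, query: str, inbound_links: int) -> float:
    """Toy relevance score (hypothetical formula): keyword frequency
    weighted by page popularity."""
    occurrences = text.lower().count(query.lower())
    # log1p dampens the effect of very large inbound-link counts
    return occurrences * math.log1p(inbound_links)

# Two made-up pages that mention "New York" equally often
sports_page = "New York beat Oakland 5-3. New York now leads the series."
menu_page = "Our New York strip steak is aged 28 days. Try the New York cheesecake."

sports_score = keyword_relevance(sports_page, "New York", inbound_links=10)
menu_score = keyword_relevance(menu_page, "New York", inbound_links=10)
```

Both pages receive the same score, because the scorer sees only how often the string occurs and how popular the page is, not what the string refers to.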

Key to the Semantic Web is semantic markup, which lets authors annotate their web pages with metadata - HTML attributes that don't get displayed in the document. Semantic metadata describes what the pages are about, letting authors define things with authority and precision.
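As a rough illustration, the sketch below reads annotations out of a small HTML fragment using Python's standard-library parser. The article does not name a specific markup syntax, so the RDFa-style `about` and `typeof` attributes are an assumption, and the `example.org` identifier and `ex:SportsTeam` type are made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the attributes carry meaning but are never displayed
HTML = '''
<p>
  <span about="http://example.org/team/yankees" typeof="ex:SportsTeam">New York</span>
  beat Oakland last night.
</p>
'''

class AnnotationCollector(HTMLParser):
    """Collects (subject, type, visible text) for each annotated element."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._pending = None  # annotation waiting for its visible text

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "about" in attr_map and "typeof" in attr_map:
            self._pending = (attr_map["about"], attr_map["typeof"])

    def handle_data(self, data):
        # Pair the first text after an annotated start tag with its metadata
        if self._pending is not None:
            subject, entity_type = self._pending
            self.annotations.append((subject, entity_type, data.strip()))
            self._pending = None

collector = AnnotationCollector()
collector.feed(HTML)
annotations = collector.annotations
```

A reader of the page still sees only "New York", while a program that looks at the attributes learns that the phrase names a sports team.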

In my "New York" document, I can state that the document is about the sports team, not the steak. I can do this by tagging the named entities in the document - the people, places, things, events, and facts - in an unambiguous way. I can also set those entities into relationships with each other. If part of my document refers to a player trade between the New York Yankees and the Oakland A's, I can tag the Yankees (entity number one), the A's (entity number two), and the player trade (an event, but also a relationship between the two named entities).
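One common way to picture tagged entities and their relationships is as subject-predicate-object statements. The sketch below models the trade example that way in plain Python. The namespace, identifiers, and predicate names (`ex:party`, `ex:PlayerTrade`, and so on) are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str

EX = "http://example.org/"  # hypothetical namespace

yankees = EX + "team/new-york-yankees"     # entity number one
athletics = EX + "team/oakland-athletics"  # entity number two
trade = EX + "event/player-trade-1"        # the event linking them

triples = [
    Triple(yankees, "rdf:type", "ex:SportsTeam"),
    Triple(athletics, "rdf:type", "ex:SportsTeam"),
    Triple(trade, "rdf:type", "ex:PlayerTrade"),
    # The trade is an event and also a relationship between the two teams
    Triple(trade, "ex:party", yankees),
    Triple(trade, "ex:party", athletics),
]

def parties_of(event: str) -> list[str]:
    """Return every entity recorded as a party to the given event."""
    return [t.obj for t in triples
            if t.subject == event and t.predicate == "ex:party"]

trade_parties = parties_of(trade)
```

Because each team has its own identifier, "New York" in this document can only mean the Yankees, and a query for the parties to the trade returns both teams without any guessing from context.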

Overcoming the semantic hurdles. While semantic tagging gives documents unambiguous meaning, it has traditionally faced two large hurdles. First, adding semantic markup has been relatively expensive, in terms of either labor or technology. Second, the market for consuming this markup has been small. Both of those hurdles are rapidly falling away.

Let's address the second point first. Yahoo! has introduced SearchMonkey, a new technology that rates web pages. Rather than using keywords and the number of links to a page (the wisdom of crowds), SearchMonkey finds web pages using the semantic markup that is embedded in the page (the wisdom of authors). This creates a substantial motive for adding semantic markup: search engine optimization. Semantic markup makes your content more likely to be found and more relevant to the searcher.

Marking up existing content. Which brings us to the first point: How do you add semantic markup? For legacy content, you need to use some combination of people and automation. Using people to tag existing content requires specialized skills that are in short supply. But some interesting technologies for auto-tagging content are emerging. Thomson Reuters' Calais is a great example. For a demo, visit http://sws.clearforest.com/SWS.htm, and try pasting some text that describes your company. I did, and Calais accurately tagged all named companies, legal entities, products, technologies, countries, and cities, and correctly identified a product acquisition as an event.
