CIO
Blog: The Wisdom of Crowds Meets the Wisdom of Authors: How XML Enables the Semantic Web
Jake Sorofman  21 July, 2008 14:02:16

By my colleague Paul Wlodarczyk, VP of Solutions Consulting for JustSystems:

I recently attended the first-ever Linked Data Planet conference, where a number of Semantic Web pioneers shared their perspectives on the state of the art - and the business - of helping the world tag its web pages for meaning. So what is the Semantic Web, and how is it different from the web of today? Most search engines today use keywords and the number of links to a page to determine the relevance of search results. This is the wisdom of crowds at work: if the keywords you are searching for occur often on a page, and the page is popular (i.e., lots of other pages link to it), then it is probably the best bet for what you are searching for. The downside of this approach is that the search engine can only infer what the page means; nothing on the page states it. On the Semantic Web, the crowds get wiser thanks to the wisdom of authors, who can let the crowds know - in no uncertain terms - what their content means.

For example, when "New York" appears in an HTML document, it could mean New York City, New York State, the Yankees, the Mets, the Giants, the Jets, the song, the strip steak, the state of mind, etc. You get the idea. Words are ambiguous when taken out of context.

If I'm writing about a sporting event, the context of the article lets you know that "New York" means a specific team. The typical search engine, however, doesn't recognize context. To a search engine, "New York" is just a string that occurs in the document with some frequency.
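To make the keyword-and-links approach concrete, here is a minimal sketch in Python. The scoring formula, the function name `relevance`, and the two sample pages are all assumptions made up for illustration; no real search engine is claimed to work exactly this way. It only shows that a score built from term frequency and link counts has no notion of which "New York" a page is about.

```python
import math
import re


def relevance(query, page_text, inbound_links):
    """Toy relevance score: keyword frequency weighted by page popularity."""
    words = re.findall(r"[a-z]+", page_text.lower())
    terms = set(re.findall(r"[a-z]+", query.lower()))
    if not words or not terms:
        return 0.0
    # Fraction of the page's words that match any query term.
    hits = sum(1 for word in words if word in terms)
    frequency = hits / len(words)
    # Dampen the link count so popularity does not swamp the text signal.
    popularity = math.log1p(inbound_links)
    return frequency * popularity


# Two hypothetical pages that use "New York" in unrelated senses.
steak_page = "New York strip steak recipe: sear the New York strip on high heat"
team_page = "New York beat Oakland last night as New York pitching dominated"

steak_score = relevance("New York", steak_page, inbound_links=50)
team_score = relevance("New York", team_page, inbound_links=50)
```

Both pages receive a positive score for the query "New York", and nothing in either number says whether the page concerns a steak or a team.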

Key to the Semantic Web is semantic markup, which lets users annotate their web pages with metadata - HTML attributes that don't get displayed in the document. Semantic metadata describes what the pages are about, letting authors define things with authority and precision.
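As a rough illustration of metadata carried in undisplayed HTML attributes, the sketch below embeds an annotation in a snippet and reads it back with Python's standard `html.parser` module. The attribute names `about` and `typeof` are borrowed from RDFa as an assumption (the article does not name a specific vocabulary), and the `example.org` URI and the `SportsTeam` type are hypothetical.

```python
from html.parser import HTMLParser

# A browser would display only "New York beat Oakland last night."
# The attributes stay invisible to readers but are available to software.
PAGE = """
<p>
  <span about="http://example.org/team/yankees" typeof="SportsTeam">New York</span>
  beat Oakland last night.
</p>
"""


class AnnotationCollector(HTMLParser):
    """Collects (text, subject URI, type) for elements with an 'about' attribute."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._pending = None  # attributes of the annotated element being read

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "about" in attr_map:
            self._pending = attr_map

    def handle_data(self, data):
        # Attach the first non-blank text after an annotated start tag.
        if self._pending is not None and data.strip():
            self.annotations.append(
                (data.strip(), self._pending["about"], self._pending.get("typeof"))
            )
            self._pending = None


collector = AnnotationCollector()
collector.feed(PAGE)
```

After parsing, `collector.annotations` holds the visible text "New York" paired with the URI and type the author attached to it, so a consumer no longer has to guess which "New York" is meant.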

In my "New York" document, I can state that the document is about the sports team, not the steak. I can do this by tagging the named entities in the document - the people, places, things, events, and facts - in an unambiguous way. I can also set those entities into relationships with each other. If part of my document refers to a player trade between the New York Yankees and Oakland A's, I can tag the Yankees (entity number one), the A's (entity number two), and the player trade (an event, but also a relationship between the two named entities).
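The player-trade example can be sketched as a small set of subject-predicate-object statements, the general shape used by Semantic Web data models. Everything named here is hypothetical: the `ex:` identifiers, the `ex:party` predicate, and the helper `parties` are invented for illustration and do not come from any published vocabulary.

```python
from collections import namedtuple

# One statement: a subject, a predicate, and an object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical identifiers for the two teams and the trade event.
YANKEES = "ex:NewYorkYankees"
ATHLETICS = "ex:OaklandAthletics"
TRADE = "ex:PlayerTrade1"

triples = [
    Triple(YANKEES, "rdf:type", "ex:SportsTeam"),    # entity number one
    Triple(ATHLETICS, "rdf:type", "ex:SportsTeam"),  # entity number two
    Triple(TRADE, "rdf:type", "ex:PlayerTrade"),     # the event
    Triple(TRADE, "ex:party", YANKEES),              # the event relates...
    Triple(TRADE, "ex:party", ATHLETICS),            # ...the two entities
]


def parties(event, facts):
    """Return the entities that a given event links together, sorted by name."""
    return sorted(
        t.obj for t in facts if t.subject == event and t.predicate == "ex:party"
    )


trade_parties = parties(TRADE, triples)
```

Modeling the trade as its own entity lets it act as both an event and a relationship: it has a type of its own, and it points at each team involved.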

Overcoming the semantic hurdles. While semantic tagging gives documents unambiguous meaning, it has traditionally faced two large hurdles. First, adding semantic markup has been relatively expensive, in terms of either labor or technology. Second, the market for consuming this markup has been small. Both of those hurdles are rapidly falling away.

Let's address the second point first. Yahoo! has introduced SearchMonkey, a new technology that rates web pages. Rather than relying on keywords and the number of links to a page (the wisdom of crowds), SearchMonkey finds web pages using the semantic markup that is embedded in the page (the wisdom of authors). This creates a substantial motive for adding semantic markup: search engine optimization. Semantic markup makes your content more likely to be found and more relevant to the searcher.

Marking up existing content. Which brings us to the first point: How do you add semantic markup? For legacy content, you need to use some combination of people and automation. Using people to tag existing content requires specialized skills that are in short supply. But some interesting technologies for auto-tagging content are emerging. Thomson Reuters' Calais is a great example. For a demo, visit http://sws.clearforest.com/SWS.htm, and try pasting in some text that describes your company. I did, and Calais accurately tagged all named companies, legal entities, products, technologies, countries, and cities, and correctly identified a product acquisition as an event.
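To give a feel for what automated tagging involves, here is a deliberately simple dictionary-lookup tagger. It is not how Calais works and does not use its API; the phrase list, the labels, and the function name `tag_entities` are assumptions made for this sketch.

```python
import re

# Hypothetical lookup table mapping known phrases to entity types.
GAZETTEER = {
    "Thomson Reuters": "Company",
    "JustSystems": "Company",
    "New York": "City",
    "Oakland": "City",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (start offset, phrase, label) for each known phrase found in text."""
    found = []
    for phrase, label in gazetteer.items():
        # Match whole phrases only, treating the phrase as literal text.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), phrase, label))
    return sorted(found)


tags = tag_entities("JustSystems staff flew from New York to Oakland.")
```

A lookup like this labels every "New York" as a city, whether the text means the city, the team, or the steak. Resolving that from context is the hard part that purpose-built tagging services take on.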

