Blog: The Wisdom of Crowds Meets the Wisdom of Authors: How XML Enables the Semantic Web
- 21 July, 2008 14:02
By my colleague Paul Wlodarczyk, VP of Solutions Consulting at JustSystems:
I recently attended the first-ever Linked Data Planet conference, where a number of Semantic Web pioneers shared their perspectives on the state of the art, and the business, of helping the world tag its web pages for meaning.
So what is the Semantic Web, and how is it different from the web of today? Most search engines today use keywords and the number of links to a page to determine the relevance of search results. This is the wisdom of crowds at work: if the keywords you are searching for occur often on a page, and the page is popular (i.e. lots of other pages link to it), then it is probably the best bet for what you are looking for. The downside of this approach is that it can only infer what a page means. On the Semantic Web, the crowds get wiser thanks to the wisdom of authors, who can tell the crowds, in no uncertain terms, what their content means.
For example, when "New York" appears in an HTML document, it could mean New York City, New York State, the Yankees, the Mets, the Giants, the Jets, the song, the strip steak, the state of mind, etc. You get the idea. Words are ambiguous when taken out of context.
If I'm writing about a sporting event, the context of the article lets you know that "New York" means a specific team. The typical search engine, however, doesn't recognize context. To a search engine, "New York" is just a string that occurs in the document with some frequency.
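To make the point concrete, here is a toy Python sketch of keyword-plus-links ranking. The scoring formula, the `Page` type, and the example URLs are all hypothetical; no real search engine is this simple. Someone looking for the Yankees story gets the steak recipe first, because the recipe is more popular and the engine only sees the strings "new" and "york".

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    inbound_links: int  # how many other pages link here (the "popularity" signal)


def crowd_score(page: Page, query: str) -> int:
    """Toy relevance score: keyword hits multiplied by link popularity.

    Hypothetical formula for illustration only; it treats "New York" as
    nothing more than the two strings "new" and "york".
    """
    words = re.findall(r"[a-z']+", page.text.lower())
    terms = query.lower().split()
    hits = sum(words.count(term) for term in terms)
    return hits * (1 + page.inbound_links)


def rank(pages: list, query: str) -> list:
    """Order pages from highest to lowest crowd_score."""
    return sorted(pages, key=lambda p: crowd_score(p, query), reverse=True)


# Hypothetical pages: a sports story and a more popular recipe.
sports = Page("http://example.com/yankees-trade",
              "New York traded a pitcher to Oakland. New York fans cheered.", 10)
steak = Page("http://example.com/steak",
             "How to grill a New York strip steak.", 50)

for page in rank([sports, steak], "New York"):
    print(page.url, crowd_score(page, "New York"))
# http://example.com/steak 102
# http://example.com/yankees-trade 44
```

The sports page mentions "New York" twice as often, but nothing in the score knows which New York either page means.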
Key to the Semantic Web is semantic markup, which lets authors annotate their web pages with metadata: HTML attributes that are not displayed to the reader. Semantic metadata describes what a page is about, letting authors define things with authority and precision.
In my "New York" document, I can state that the document is about the sports team, not the steak. I do this by tagging the named entities in the document (the people, places, things, events, and facts) in an unambiguous way. I can also set those entities in relationship to each other. If part of my document refers to a player trade between the New York Yankees and the Oakland A's, I can tag the Yankees (entity number one), the A's (entity number two), and the player trade (an event, but also a relationship between the two named entities).
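One common way to write down entities and their relationships is as subject-predicate-object statements, the model behind RDF. The Python sketch below records the trade example that way. Only `rdf:type` is a real RDF term; everything under the `ex:` prefix (the team and trade identifiers, `ex:party`, `ex:label`) is invented for illustration. Modeling the trade as its own node, linked to both teams, lets it be an event and a relationship at the same time.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers under an invented "ex:" prefix; a real page would
# use URIs from a shared vocabulary so other software can recognize them.
YANKEES = "ex:NewYorkYankees"
ATHLETICS = "ex:OaklandAthletics"
TRADE = "ex:PlayerTrade1"

graph = [
    Triple(YANKEES, "rdf:type", "ex:BaseballTeam"),    # entity number one
    Triple(ATHLETICS, "rdf:type", "ex:BaseballTeam"),  # entity number two
    Triple(YANKEES, "ex:label", "New York"),           # the ambiguous surface text
    Triple(TRADE, "rdf:type", "ex:PlayerTrade"),       # the event...
    Triple(TRADE, "ex:party", YANKEES),                # ...which also relates
    Triple(TRADE, "ex:party", ATHLETICS),              # the two named entities
]


def describe(graph: list, subject: str) -> list:
    """All (predicate, object) pairs stated about one subject."""
    return [(t.predicate, t.obj) for t in graph if t.subject == subject]


def events_involving(graph: list, entity: str) -> list:
    """Subjects that name `entity` as a party, i.e. events it took part in."""
    return sorted({t.subject for t in graph
                   if t.predicate == "ex:party" and t.obj == entity})


print(describe(graph, YANKEES))
print(events_involving(graph, ATHLETICS))
```

Because "New York" is attached to a specific identifier rather than floating as a string, a consumer can answer "which events involved this team?" without guessing.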
Overcoming the semantic hurdles. While semantic tagging gives documents unambiguous meaning, it has traditionally faced two large hurdles. First, adding semantic markup has been relatively expensive, in terms of either labor or technology. Second, the market for consuming this markup has been small. Both of those hurdles are rapidly falling away.
Let's address the second point first. Yahoo! has introduced SearchMonkey, a new technology that rates web pages. Rather than relying on keywords and the number of links to a page (the wisdom of crowds), SearchMonkey finds web pages using the semantic markup embedded in them (the wisdom of authors). This creates a substantial motive for adding semantic markup: search engine optimization. Semantic markup makes your content more likely to be found, and more relevant to the searcher.
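The post doesn't describe SearchMonkey's internals, so the Python sketch below is not its API. It only illustrates the general idea that software can read meaning straight from attributes the reader never sees. It uses the standard library's `html.parser` and looks for four attribute names defined in the W3C RDFa syntax (`about`, `typeof`, `property`, `resource`); the sample page and the `ex:` vocabulary are hypothetical.

```python
from html.parser import HTMLParser


class SemanticAttributeExtractor(HTMLParser):
    """Record elements that carry RDFa-style annotation attributes.

    A minimal sketch: it does not implement the RDFa processing rules
    (inheritance, prefixes, chaining); it just lists what it finds.
    """

    KEYS = ("about", "typeof", "property", "resource")

    def __init__(self) -> None:
        super().__init__()
        self.annotations = []  # list of (tag name, {attribute: value})

    def handle_starttag(self, tag, attrs):
        found = {name: value for name, value in attrs if name in self.KEYS}
        if found:
            self.annotations.append((tag, found))


def extract_annotations(html: str) -> list:
    """Feed an HTML string through the extractor and return its findings."""
    parser = SemanticAttributeExtractor()
    parser.feed(html)
    parser.close()
    return parser.annotations


# Hypothetical page: the reader sees only "New York"; the attributes say which one.
page = ('<p>Big news for <span about="http://example.org/teams/yankees" '
        'typeof="ex:BaseballTeam" property="ex:label">New York</span> fans.</p>')

for tag, attrs in extract_annotations(page):
    print(tag, attrs)
```

The visible text is unchanged for human readers; only a program that looks at the attributes learns that this "New York" is a baseball team.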
Marking up existing content. This brings us to the first point: how do you add semantic markup? For legacy content, you need some combination of people and automation. Tagging existing content by hand requires specialized skills that are in short supply, but some interesting technologies for auto-tagging content are emerging. Thomson Reuters' Calais is a great example. For a demo, visit http://sws.clearforest.com/SWS.htm and try pasting in some text that describes your company. I did, and Calais accurately tagged all the named companies, legal entities, products, technologies, countries, and cities, and correctly identified a product acquisition as an event.
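The post doesn't say how Calais works internally. For contrast, the simplest possible auto-tagger is a dictionary (gazetteer) lookup, sketched below in Python. This is a deliberately naive, swapped-in technique, not Calais's method, and the gazetteer entries and `ex:` identifiers are hypothetical. A lookup like this only finds names someone has already listed, and it maps each string to a single meaning, so it cannot tell one "New York" from another.

```python
import re

# Hypothetical hand-built gazetteer: surface string -> (entity type, identifier).
GAZETTEER = {
    "JustSystems": ("Company", "ex:JustSystems"),
    "Thomson Reuters": ("Company", "ex:ThomsonReuters"),
    "Yahoo!": ("Company", "ex:Yahoo"),
    "Australia": ("Country", "ex:Australia"),
}


def auto_tag(text: str, gazetteer: dict = GAZETTEER) -> list:
    """Return (start, end, surface, type, id) for each gazetteer name found in text.

    Longer names are tried first, and the lookarounds stop a name from
    matching inside a longer word (so "Australian" is not tagged as "Australia").
    """
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(map(re.escape, names)) + r")(?!\w)")
    tags = []
    for match in pattern.finditer(text):
        entity_type, identifier = gazetteer[match.group(0)]
        tags.append((match.start(), match.end(), match.group(0), entity_type, identifier))
    return tags


for tag in auto_tag("JustSystems and Thomson Reuters both work with Yahoo! in Australia."):
    print(tag)
```

Each hit carries a character span plus an identifier, which is the raw material for writing the attribute-style markup described earlier back into the page.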