Tuesday | 14 October, 2008
CIO
Diligent CTO demystifies data deduplication
Diligent's CTO, Neville Yates discusses deduplication technologies
Deni Connor (Computerworld) 04 May, 2007 12:14:34

Related Features
  • +

    Ticked Off at Tick the Box Mentality 04 February, 2008 13:01:15

    Does your executive search firm know the difference between an MIS manager and a CIO, and if it does, can it explain that difference to its corporate clients?
    Does your executive search firm know its MIS managers from its elbow? Does it even know the difference between an MIS manager and a CIO, and if it does, can it explain that difference to its corporate clients?
  • +

    Strategies for Dealing With IT Complexity 24 December, 2007 10:30:47

    Every innovation, every business process improvement, comes with an IT complexity tax that must be paid by CIOs in time, money and sweat. Here are strategies to mitigate the increasing complexity of IT as it enables new business.
    Every innovation, every business process improvement, comes with an IT complexity tax that must be paid by CIOs in time, money and sweat. Here are strategies to mitigate the increasing complexity of IT as it enables new business.
Related Stories
  • +

    Clean up your SOAP-based Web services 27 November, 2007 13:16:14

    The Test Center inspects five worthy tools for keeping your services squeaky clean
    SOAP is the currency of the SOA marketplace -- for now, anyway. Though SOAP's significance may diminish as Web services evolve, its importance for the time being is unquestionable. Therefore, a substantial portion of the QA work by Web service providers and consumers must entail verifying the accurate exchange of SOAP messages. Not surprisingly, several SOAP-focused Web service testing tools have appeared.
  • +

    Can Macs conquer the enterprise? 11 January, 2008 10:55:53

    The field is wide open for a Macintosh insurrection on the business desktop. It could happen, but probably won't. Here's why.
    If Apple were a football team, the New England Patriots would have had some serious competition this year.
Additional Resources
Executive Guides
Whitepapers

Newsletter Subscription

Sign up for our CIO newsletters!
Weekly coverage of the issues that impact corporate and government information
RSS Feeds

Diligent Technologies is among the pioneers of data deduplication technology, which helps enterprises reduce redundant copies of data and, in turn, shrink storage requirements and shorten backup times. Neville Yates, Diligent's CTO, talked with Network World Senior Editor Deni Connor about the varying deduplication technologies used with today's virtual tape libraries (VTL).

So what is deduplication?

Deduplication is a means by which data is examined and compared to existing data. If it is the same, it is filtered out and the existing data is referenced. Deduplication is very prominent in applications such as backup that cause a lot of duplication as a byproduct of how they work. These applications are prime targets for deduplication technology.

What forms of deduplication are there?

There are three ways deduplication can occur that are talked about today in the market. One of them is the offering from Diligent called HyperFactor, which takes a look at data in an agnostic form and searches the datastream for similarity. Once similarity is found, a computation difference is performed guaranteeing that what is to be filtered out is exactly the same as what is referenced. Only new data is stored.

Another one uses hash technology or hash algorithms whereby data is sliced into some digestible piece -- such as perhaps 8Kbytes in size -- and a hash is assigned to that data and the data is stored. If that signature or hash is recomputed on a new datastream, then that computation suggests that that data already exists and can be referenced. It doesn't need to consume more storage, thereby reducing the amount of storage consumed.

The third is one where the datastream is looked at inside for its logical content, assuming that a file of a particular name is most likely to be a good candidate when compared to the contents of a file of exactly the same name on a fully qualified basis, meaning directory, directory tree, etc., and then a computational difference is done between the two files.

So there are three fundamental approaches and many different ways of implementing those approaches.

What are the different ways deduplication has been implemented?

One of the implementation differences in those approaches is whether you receive all of the data and lay it down on disk and then sometime in the future read it back in from a deduplication perspective, or whether during the receipt of the data you process it inline and in real time to achieve the deduplication.

Those are called inline and post-processing?

That is correct.

You say that Diligent uses the HyperFactor approach. Who are some of the vendors that use hash algorithms?

Hashing or some derivative thereof is used by Quantum/ADIC, Data Domain and FalconStor. HyperFactor is our own IP. Content-aware is something that is being pursued by Sepaton.

What are the advantages and disadvantages of inline deduplication and post-processing?

Inline deduplication first of all is difficult to achieve in terms of performance. But if you do achieve it, it is advantageous because once you have finished the job, the job is done -- there is no heavy lifting and you don't have to worry about capacity planning for any background tasks and what resources might be available to support that. Contrary to post-processing, while the data is being received by the backup application, none of the heavy lifting is being done, and so end users need to concern themselves with the amount of effort needed to do the post-processing.

It is quite easy to understand when you look under the covers that the activity on the disk subsystem is greatly increased as a byproduct of post processing, simply because you have to write everything and read it back. Then there's all the database and indexing overhead that is painful and can slow the process down. It is quite reasonable to assert that if you are able to de-dupe inline at 300 to 400MB per sec you wouldn't even consider doing post processing because the situation drives toward a higher I/O profile and slows you down.

Market Place
 

Smart SOA World Tour

Discover how SOA can create smarter outcomes for your business.

Attend and learn:

  • How SOA is helping leading companies to become more agile
  • Where you should be applying SOA processes in your company
  • The top SOA implementation mistakes to avoid

Click here for more information.
  • +

    CIO Live Podcast #79: Brent D Taylor, author of The Outsider's Edge: The Making of Self-Made Billionaires Part II 05 October, 2007 06:00:00

    For his new book, The Outsider's Edge: The Making of Self-Made Billionaires, social researcher Brent D Taylor spent four years of intensive research investigating the psychological make-up and backgrounds of some of the world's richest men and women, including IT luminaries Bill Gates, Larry Ellison and Steve Jobs. Taylor discovered that, despite working in different industries and coming from different upbringings, they all have one thing in common -- they are all outsiders.
  • +

    CIO Live Podcast #78: Brent D Taylor, author of The Outsider's Edge: The Making of Self-Made Billionaires 28 September, 2007 17:34:25

    For his new book, The Outsider's Edge: The Making of Self-Made Billionaires, social researcher Brent D Taylor spent four years of intensive research investigating the psychological make-up and backgrounds of some of the world's richest men and women, including IT luminaries Bill Gates, Larry Ellison and Steve Jobs. Taylor discovered that, despite working in different industries and coming from different upbringings, they all have one thing in common -- they are all outsiders.
  • +

    CIO Live Podcast #77: Panasonic Speeds Up Trans-Pacific File Transfers, Part III 21 September, 2007 07:00:00

    Part three in our three-part special report from CIO's sister publication Network World in the US, as Paul Desmond reports from the Network World IT Roadmap Conference in Santa Clara, California. With development teams in the US and Japan, Panasonic needed a more efficient way to move very large files between the two locations. Iben Rodriguez, IT consultant for Panasonic Research and Development, explains how a storage-area network and virtual server technology helped speed up WAN performance.
  • +

    CIO Live Podcast #76: Panasonic Speeds Up Trans-Pacific File Transfers, Part II 14 September, 2007 07:00:00

    Part two in our three-part special report from CIO's sister publication Network World in the US, as Paul Desmond reports from the Network World IT Roadmap Conference in Santa Clara, California. With development teams in the US and Japan, Panasonic needed a more efficient way to move very large files between the two locations. Iben Rodriguez, IT consultant for Panasonic Research and Development, explains how a storage-area network and virtual server technology helped speed up WAN performance.
  • +

    CIO Live Podcast #75: Panasonic Speeds Up Trans-Pacific File Transfers, Part I 07 September, 2007 07:00:05

    Part one in our three-part special report from CIO's sister publication Network World in the US, as Paul Desmond reports from the Network World IT Roadmap Conference in Santa Clara, California. With development teams in the US and Japan, Panasonic needed a more efficient way to move very large files between the two locations. Iben Rodriguez, IT consultant for Panasonic Research and Development, explains how a storage-area network and virtual server technology helped speed up WAN performance.
  • +

    Cutting Through the Spin of Recent Vulnerability Disclosures 13 October, 2008 10:53:00

    The FUD surrounding the ClickJacking and TCP/IP vulnerabilities has the world seemingly frozen in fear. But once you cut through the spin, the vulnerabilities aren't all that they were made out to be.
    There are a few highly publicised vulnerabilities at the moment which haven't completely been disclosed and which, it is claimed, could threaten the whole Internet as-we-know-it. Only, when the vulnerabilities are finally disclosed, it seems that the whole incident has been somewhat Chicken Little.
  • +

    PCI app security: Who's guarding the data bank? 13 October, 2008 11:09:00

    Compliance strategies for PCI's new application security requirements
    While Willy Sutton never really said it, the truth is that people rob banks because that is where the money is. Today's criminals don't walk into banks with loaded guns and get-away drivers. Rather they connect from a remote location using a browser and are armed with hacking tools and spyware.
  • +

    Data-center security tools to not overlook 10 October, 2008 11:37:00

    With the rise of security suites, it's time to consider some emerging security tools and rethink others
    Protecting a corporate data center is like trying to keep an elephant safe from a swarm of flies. Despite your best efforts, bites happen. As the staples of security -- such as firewalls, antivirus software, spam and spyware filters -- come together in suites of products that allow for sophisticated management, there are other security tools either emerging or worth a rethink.
  • +

    IBM, Secret Service, others study identity/cybercrime issues 09 October, 2008 10:09:00

    Center for Applied Identity Management Research organization teams experts in criminal justice, financial crime, biometrics, cybercrime and cyberdefense, data protection, homeland security and national defense.
    IBM, LexisNexis and the Secret Service are among a group of corporations, government agencies and academic institutions that has formed to study and help solve identity management challenges around cybercrime, terrorism and narcotics trafficking.
  • +

    Strange account management at Amazon 09 October, 2008 09:51:00

    A careless login led to the discovery of some strange ccount management practices at one of the Internet's largest retailers.
    Via the RISKS mailing list comes an interesting tale of poor online account management at a major online retailer. According to Graham Bennett, accounts with Amazon display an odd behaviour that doesn't seem to have attracted much attention in the past.
CIO Webcast Innovation #8 - What are the biggest roadblocks to IT's involvement in innovation at your company?
Watch the latest latest edition of CIO Innovation which is now available for download.
Watch the webcast
Sign up to the CIO Innovation update email


CIO Live Podcast #79: Brent D Taylor, author of The Outsider's Edge: The Making of Self-Made Billionaires Part II
Listen to the latest edition of CIO Live which is now available for download.
Listen to the podcast
Sign up to the CIO Live email
Whitepaper

The IP Storage payoff: Turning your investment into efficient, affordable results

Recent advances in IP-based storage technologies leverage existing technology and staff to easily and cost-effectively build and maintain sophisticated storage networks. Discover the solutions to your data storage challenges with IP storage.