Open source deduplication software released for Linux
- 25 March, 2010 07:28
A new open source project, dubbed Opendedup, has appeared with the goal of creating a deduplication-based file system for Linux called SDFS.
The project’s developer Sam Silverberg says today’s deduplication solutions only solve the problem of storing deduplicated data, not reading and writing inline data.
SDFS is designed to support the needs of virtual environments including the VMware, Xen, and KVM hypervisors.
The file system can deduplicate inline (at a line speed of 150Mbps or greater) or periodically, depending on needs, and the mode can be changed on the fly. It also supports file- and folder-level snapshots.
With support for deduplication at 4K block sizes, virtual machine data can be deduplicated and stored locally, across multiple nodes or in the cloud. SDFS supports some 3TB of storage per gigabyte of memory.
A distributed architecture was a design goal, and SDFS is scalable to eight petabytes of capacity with 256 storage engines, each of which can store up to 32TB of deduplicated data. Each volume can be up to 8 exabytes, and the number of files is limited by the underlying file system.
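For readers new to the idea, fixed-block deduplication can be illustrated with a minimal Python sketch. This is not SDFS code: the function names, the in-memory dictionary used as a chunk store, and the choice of SHA-256 as the fingerprint are all assumptions made for illustration. Only the 4K block size comes from the article.

```python
import hashlib

CHUNK_SIZE = 4096  # 4K blocks, the size mentioned in the article


def dedup_store(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns the ordered list of chunk fingerprints needed to rebuild data.
    The dict-based store and SHA-256 fingerprint are illustrative choices.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store only if not seen before
        recipe.append(digest)
    return recipe


def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its list of fingerprints."""
    return b"".join(store[digest] for digest in recipe)


# Two hypothetical disk images that share most of their blocks.
store = {}
image_a = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
image_b = b"A" * CHUNK_SIZE * 2 + b"C" * CHUNK_SIZE
recipe_a = dedup_store(image_a, store)
recipe_b = dedup_store(image_b, store)

logical_chunks = len(recipe_a) + len(recipe_b)
unique_chunks = len(store)
```

In this toy case the two images reference seven chunks in total, but only three distinct chunks are actually stored, which is why virtual machine images that share an operating system tend to deduplicate well.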
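The capacity figures quoted above are consistent with each other, as a quick back-of-the-envelope check shows. The sketch below assumes binary units (1PB = 1024TB) and treats the "some 3TB per gigabyte of memory" figure as exact, which the article does not claim. The variable names are illustrative.

```python
TB_PER_PB = 1024          # assumption: binary units

storage_engines = 256     # from the article
tb_per_engine = 32        # from the article
tb_per_gb_of_memory = 3   # "some 3TB" in the article, treated as exact here

# Total cluster capacity across all storage engines.
total_tb = storage_engines * tb_per_engine
total_pb = total_tb / TB_PER_PB

# Rough memory needed to index one fully loaded storage engine.
memory_gb_per_engine = tb_per_engine / tb_per_gb_of_memory

print(f"Total capacity: {total_tb} TB = {total_pb:.0f} PB")
print(f"Memory per full engine: about {memory_gb_per_engine:.1f} GB")
```

Under these assumptions, 256 engines at 32TB each give the eight petabytes cited, and a full 32TB engine would need roughly 11GB of memory.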
The requirements for Opendedup are a 64-bit Linux distribution (it’s tested and developed on Ubuntu), Fuse 2.8 or greater, 2 GB of memory and Java 7.
Silverberg designed Opendedup to run in user space and be object-based because this would make it platform independent, give it a faster development cycle, make it easier to scale and cluster, and provide flexibility for integrating with other user space services like Amazon S3.
There is also the opportunity to leverage file system technologies like replication and snapshotting.
The latest release of SDFS, version 0.8.8, adds better I/O performance, scheduling of file system tasks, and a fix for a data corruption issue when removing unused deduplicated chunks.
The maximum file size is currently limited to 250GB with a 4K chunk size.
Opendedup’s architecture consists of an SDFS Volume (one dedup file engine and one Fuse-based file system); a dedup file engine (manages file-level activities); a Fuse-based file system; and a dedup storage engine, which is the server-side service that stores and retrieves chunks of deduplicated data.
SDFS is licensed under the GPLv2. Windows support and block-level replication are on the Opendedup roadmap.
Opendedup is online at Opendedup.org and on the Google Code portal at: http://code.google.com/p/opendedup/.
Comments
Bob
Deduplication
This is all very interesting I'm sure. But it would be nice, when writing articles or blogs like this, if the writer would define the subject of the article for "the rest of us". What the hell is "deduplication"? Makes no sense to me.
What is it like? You copy a file and then you uncopy it? Or you copy a music CD and then you uncopy the music CD????
Dreadmaul
Be wary of the website. It has a trojan in it!