Open source deduplication software released for Linux
- 25 March, 2010 07:28
A new open source project, dubbed Opendedup, has appeared with the goal of creating a deduplication-based file system for Linux called SDFS.
The project’s developer, Sam Silverberg, says today’s deduplication solutions only solve the problem of storing deduplicated data, not that of reading and writing data inline.
SDFS is designed to support the needs of virtual environments including the VMware, Xen, and KVM hypervisors.
The filesystem can deduplicate inline (at a line speed of 150Mbps or greater) or periodically, depending on needs, and the mode can be changed on the fly. It also supports file- and folder-level snapshots.
With support for deduplication at 4K block sizes, virtual machine data can be deduplicated and stored locally, across multiple nodes or in the cloud. SDFS supports some 3TB of storage per gigabyte of memory.
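To illustrate why fixed 4K blocks suit virtual machine images, here is a minimal sketch that counts how many distinct 4K blocks appear in some data. It is not taken from SDFS: the function name dedup_stats, the use of SHA-256 and the sample "images" are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # 4K blocks, the size mentioned in the article


def dedup_stats(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) for fixed-size blocks of data.

    Hypothetical helper: SHA-256 is an assumption, not necessarily
    the hash SDFS uses.
    """
    seen = set()
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        seen.add(hashlib.sha256(block).digest())
        total += 1
    return total, len(seen)


# Two made-up "virtual disk" images that share most of their content.
base = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
image_one = base + b"C" * BLOCK_SIZE
image_two = base + b"D" * BLOCK_SIZE

total, unique = dedup_stats(image_one + image_two)
print(f"{total} blocks written, {unique} unique blocks stored")
```

In this toy case only the unique blocks would need to be stored, which is the saving deduplication aims for when many virtual machines share the same operating system files.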
A distributed architecture was a design goal, and SDFS is scalable to eight petabytes of capacity with 256 storage engines, each of which can store up to 32TB of deduplicated data. Each volume can be up to 8 exabytes and the number of files is limited by the underlying file system.
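The eight-petabyte figure follows directly from the engine count and the per-engine limit. A quick check, assuming binary units (1PB = 1024TB), which the article does not state explicitly:

```python
# Figures quoted in the article.
STORAGE_ENGINES = 256
TB_PER_ENGINE = 32

# Assumption: binary prefixes, so 1 PB = 1024 TB.
TB_PER_PB = 1024

total_tb = STORAGE_ENGINES * TB_PER_ENGINE
total_pb = total_tb / TB_PER_PB

print(f"{total_tb} TB in total, or {total_pb:g} PB")
```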
The requirements for Opendedup are a 64-bit Linux distribution (it’s tested and developed on Ubuntu), Fuse 2.8 or greater, 2 GB of memory and Java 7.
Silverberg designed Opendedup to run in user space and be object-based so that it would be platform independent, have a faster development cycle, be easier to scale and cluster, and offer flexibility for integrating with other user space services like Amazon S3.
There is also the opportunity to leverage file system technologies like replication and snapshotting.
The latest release of SDFS, version 0.8.8, adds better I/O performance, scheduling of filesystem tasks, and a fix for a data corruption issue when removing unused deduplicated chunks.
The maximum file size is currently limited to 250GB with a 4K chunk size.
Opendedup’s architecture consists of an SDFS volume (one dedup file engine and one Fuse-based file system); a dedup file engine (manages file-level activities); a Fuse-based file system; and a dedup storage engine, which is the server-side service that stores and retrieves chunks of deduplicated data.
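The split between a file engine and a storage engine can be sketched roughly as follows. This is a simplified, in-memory illustration and not Opendedup's actual code or API: the class names, method names and the choice of SHA-256 are assumptions, and the Fuse layer is left out entirely.

```python
import hashlib

CHUNK_SIZE = 4096  # 4K chunks, as in the article


class DedupStorageEngine:
    """Hypothetical storage engine: stores and retrieves chunks by hash."""

    def __init__(self):
        self._chunks = {}

    def put(self, chunk: bytes) -> bytes:
        key = hashlib.sha256(chunk).digest()
        # A chunk that is already present is not stored a second time.
        self._chunks.setdefault(key, chunk)
        return key

    def get(self, key: bytes) -> bytes:
        return self._chunks[key]

    def chunk_count(self) -> int:
        return len(self._chunks)


class DedupFileEngine:
    """Hypothetical file engine: maps each file to an ordered list of chunk keys."""

    def __init__(self, storage: DedupStorageEngine):
        self._storage = storage
        self._files = {}

    def write_file(self, name: str, data: bytes) -> None:
        self._files[name] = [
            self._storage.put(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)
        ]

    def read_file(self, name: str) -> bytes:
        return b"".join(self._storage.get(k) for k in self._files[name])


storage = DedupStorageEngine()
files = DedupFileEngine(storage)
files.write_file("vm1.img", b"x" * 8192 + b"y" * 4096)
files.write_file("vm2.img", b"x" * 4096 + b"z" * 100)
print(storage.chunk_count(), "chunks stored for 2 files")
```

Keeping file metadata apart from chunk storage is what would allow the storage side to live on another node or service, which matches the distributed design described in the article.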
SDFS is licensed under the GPLv2. Windows support and block-level replication are on the Opendedup roadmap.