How big is Big Data? According to technology advisory firm IDC, the world’s data in 2007 totaled nearly exabytes. One exabyte equals 10¹⁸ bytes. To put that in perspective, Gmail allows you to send attachments totaling “only” 25 megabytes (25 million bytes). That is a generous allowance, but it is small fry by comparison, ridiculously small. This year, the world’s data is expected to reach one zettabyte, or 1,000 exabytes. Confused? Think of it this way: one zettabyte of data is equivalent to over 2,000 Libraries of Congress.
Forrester reports that 80% of the world’s data is unstructured, stored in file formats that represent non-textual content such as images, graphics, video, and audio, as well as in formats such as PDF and PowerPoint. Ninety-two percent (92%) of the world’s data is stored on magnetic media, which taxes the organizations that pay for the hardware, bandwidth, power, physical plant, and IT management needed to store it all.
Storing data at multi-petabyte scale with traditional methods has become prohibitively expensive, and it requires a completely new way of looking at data storage.
I recently had the opportunity to speak at length with Chris Gladwin, Founder and CEO of Cleversafe, which specializes in storing huge amounts of data securely. When I asked Gladwin for the key to addressing today’s Big Data storage demands, his answer was as fascinating as it was intuitive: slice it, encrypt it algorithmically, and then disperse it to different storage facilities with as little redundancy as possible.
Gladwin, a successful serial entrepreneur and MIT-trained engineer, has a background in approaching storage systems from new angles, going back to early work at Zenith Data Storage, a competitor to NetApp in the days when storage units were first connected directly to networks rather than serving merely as peripherals to hardware such as desktops. Even so, his Cleversafe team approached the challenges of Big Data anew, applying to storage their expertise from other large-scale systems such as global telephony. One key was to work around the limitations inherent in a conventional method of large-scale storage known as Redundant Array of Independent Disks (RAID), which is used to virtualize redundant copies of data. RAID simply can’t scale once one begins to speak of petabytes and exabytes. Gladwin explained:
As storage grows from the terabyte to the petabyte range, the number of copies required to keep the data protected increases. This means the storage system will get more expensive as the amount of data increases.
(Ironically, the “I” in RAID originally stood for “Inexpensive.” RAID’s own Advisory Board later changed it to “Independent,” as set forth in footnote 2.)
Moreover, because RAID relies on purely redundant data, the failure of only a limited number of storage nodes can render one’s data inaccessible. That holds whether you’re a social media sharing site or a city of three million, exactly the type of entities that traffic in petabytes.
Cleversafe knew that Big Data required a paradigm shift in the way we conceive of large-scale storage. Its approach is the exact opposite of RAID, which places as many redundant copies of one’s data as necessary in different places. Instead, Cleversafe’s Information Dispersal Algorithms slice data into a set number (N) of unrecognizable pieces, each of which is encrypted into what looks like random zeroes and ones. These slices can then be distributed anywhere in the world. Each slice contains the minimum amount of redundant data needed so that, as long as a critical number (M) of the N slices can be retrieved, the original data can be recovered. According to Gladwin, the optimal ratio M/N is approximately ⅝. The larger the numerator and the denominator (that is, the number of slices required to recover the data and the number of slices in existence), the safer the system. Unlike with RAID, making the system bigger not only makes it more efficient but also entails no marginal cost.
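To make the M-of-N idea concrete, here is a minimal Python sketch of a Rabin-style information dispersal scheme. It is an illustration under stated assumptions, not Cleversafe’s proprietary algorithm. It works over the prime field GF(257) for readability, whereas production erasure codes typically use GF(2⁸). It also omits the encryption step described above, so these slices are not confidential on their own. The function names `disperse` and `recover` are hypothetical. Each group of M bytes is treated as the coefficients of a polynomial, which is evaluated at N distinct points, one per slice. Any M slices then determine the polynomial, and therefore the original bytes.

```python
# Hypothetical sketch of M-of-N information dispersal (Rabin-style IDA).
# Assumptions: GF(257) arithmetic, no encryption; NOT Cleversafe's algorithm.
P = 257  # prime modulus; every byte value 0..255 is a valid field element


def _solve_mod_p(rows, rhs):
    """Solve the square system rows * x = rhs over GF(P) by Gauss-Jordan elimination."""
    n = len(rows)
    a = [list(r) + [b] for r, b in zip(rows, rhs)]
    for col in range(n):
        # A Vandermonde matrix with distinct points is invertible, so a pivot exists.
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)  # modular inverse via Fermat's little theorem
        a[col] = [v * inv % P for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(vr - f * vc) % P for vr, vc in zip(a[r], a[col])]
    return [a[r][n] for r in range(n)]


def disperse(data: bytes, m: int, n: int) -> dict[int, list[int]]:
    """Split data into n slices (keyed 1..n), any m of which suffice to rebuild it."""
    if not 0 < m <= n < P:
        raise ValueError("need 0 < m <= n < 257")
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    slices = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), m):
        chunk = padded[start:start + m]
        for x in slices:
            # Evaluate the polynomial whose coefficients are this chunk's bytes at x.
            # Values lie in 0..256, so a slice symbol needs slightly more than 8 bits.
            slices[x].append(sum(b * pow(x, j, P) for j, b in enumerate(chunk)) % P)
    return slices


def recover(slices: dict[int, list[int]], m: int, length: int) -> bytes:
    """Rebuild the original bytes from any m surviving slices."""
    if len(slices) < m:
        raise ValueError(f"need at least {m} slices, got {len(slices)}")
    xs = sorted(slices)[:m]
    rows = [[pow(x, j, P) for j in range(m)] for x in xs]  # Vandermonde rows
    out = []
    for k in range(len(slices[xs[0]])):
        out.extend(_solve_mod_p(rows, [slices[x][k] for x in xs]))
    return bytes(out[:length])  # drop the zero padding


if __name__ == "__main__":
    data = b"Big Data, sliced and dispersed"
    slices = disperse(data, m=5, n=8)
    for lost in (2, 5, 7):  # lose N - M = 3 slices
        del slices[lost]
    print(recover(slices, m=5, length=len(data)) == data)
```

With M = 5 and N = 8 (Gladwin’s ⅝ ratio), the slices together occupy 8/5 = 1.6 times the size of the original data, and any three of them can be lost. By comparison, keeping three full copies costs 3 times the original size and tolerates the loss of only two copies.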
Scientists are accustomed to working in peer-reviewed environments, so Gladwin sought the opinion of the world’s finest: the U.S. Intelligence Community. It gave its approval in the form of a strategic investment and development agreement through In-Q-Tel (IQT), which supports the Intelligence Community’s missions. For the intelligence community, the defense community, the federal government, and even state and local governments (especially with video surveillance), living with the challenges of massive data storage was nothing new. Finding scalable, affordable solutions in the age of petabytes and exabytes was another matter. Cleversafe regularly teams with governments at all levels and recently announced an important partnership with Shutterfly, a team that knows plenty about large-scale systems from its members’ days at eBay.