[Image above] Even in this digital age, physical copies can still be the best way to go. Credit: Quartz, YouTube
Like many young adults raised in the 1980s and 90s, I grew up on VHS tapes. Even now I miss the days when a scratch meant just a few seconds of static rather than a death knell that consigns a DVD to the trash.
Unfortunately, watching my VHS collection today is increasingly difficult—replacement VCRs are scarce as companies have shuttered their VCR production.
Big data is here—where does it go?
Currently, we generate an estimated 2.5 quintillion bytes of data every day. To put that number in perspective, imagine that number in years. Our world, which has been around an estimated 4.5 billion years, would need to exist over half a billion times its current age to reach 2.5 quintillion!
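The comparison above is a single division, and it can be checked in a few lines. This is only a sanity check using the two estimates quoted in the paragraph; the variable names are illustrative.

```python
# Estimates quoted in the text above.
DAILY_BYTES = 2_500_000_000_000_000_000  # 2.5 quintillion bytes per day
EARTH_AGE_YEARS = 4_500_000_000          # roughly 4.5 billion years

# How many times over would Earth's current age have to repeat
# to add up to 2.5 quintillion years?
earth_ages_needed = DAILY_BYTES / EARTH_AGE_YEARS

print(f"{earth_ages_needed:,.0f} times Earth's current age")
```

The result is a little over 555 million, which is where "over half a billion times" comes from.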
Even as we generate new data, we need a way to store our old data. Especially in the sciences, if we compare data collected from old observations to data collected from new observations, we can draw inferences about how things change over time (such as climate) and pinpoint rare occurrences that only show up when analyzing large datasets (such as obscure particles).
While digital systems have the advantage over tapes when it comes to quickly processing, recording, and accessing huge amounts of data, it is unwise to archive data long-term in only a digital format. The Gmail outage of 2011 illustrates why.
One seemingly innocuous Sunday in February 2011, a small group of Gmail users logged into their accounts to discover their inbox, sent box, folders—everything—gone. As Google support forums began filling with desperate pleas from these users, Google first turned to its digital data redundancy files—and found them useless.
“If you make five copies of data on disk mirrors, you’ve got five bad copies… [The 2011 Gmail outage] showed that redundancy is not a recovery strategy,” Google’s staff site reliability engineer Raymond Blum explains during a talk at the 2014 Fujifilm Global IT Executive Summit, as summarized in a TechTarget article.
Magnetic tape for archive storage
As mentioned, tapes cannot record or access data at the speeds achieved by hard disks or semiconductor memories. However, for long-term archive storage, tapes offer multiple advantages over digital systems.
For one, tape storage is more energy efficient. Once all the data is recorded, a tape cartridge waits quietly on a shelf and does not consume any power. The offline nature of tapes also provides another layer of protection against hackers, cyberattacks, and buggy software. Additionally, tape has lower bit error rates (BER) than hard drives. For example, one uncorrected bit read error occurs in a consumer SATA disk every 10^14 bits (2018 specifications) compared to every 10^19 bits in a new linear tape-open (LTO-8) tape.
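To make those error rates more tangible, the short sketch below converts "bits read per uncorrected error" into terabytes. It assumes decimal terabytes (10^12 bytes) and treats the quoted specifications as average intervals between errors; the function and variable names are illustrative, not taken from any vendor documentation.

```python
BITS_PER_BYTE = 8
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def terabytes_per_error(bits_per_error: int) -> float:
    """Average amount of data read, in TB, per uncorrected bit error."""
    return bits_per_error / BITS_PER_BYTE / BYTES_PER_TB


# Figures quoted in the text above.
disk_tb = terabytes_per_error(10**14)  # consumer SATA disk
tape_tb = terabytes_per_error(10**19)  # LTO-8 tape

print(f"Disk: about {disk_tb:,.1f} TB read per uncorrected error")
print(f"Tape: about {tape_tb:,.0f} TB read per uncorrected error")
print(f"Tape reads about {tape_tb / disk_tb:,.0f} times more data per error")
```

Under these assumptions, the disk figure works out to roughly 12.5 TB read per uncorrected error and the tape figure to roughly 1.25 million TB, a difference of five orders of magnitude.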
There is another advantage to tapes that makes them desirable compared to disks (besides being cheaper). In a YouTube video posted by Quartz, IBM manager for advanced tape technologies Mark Lantz explains that while we are running into a limit of how much data we can cram onto a single digital disk, magnetic tapes are far from reaching their limit.
“Today, we have 20 terabytes in a cartridge like this,” he explains while holding a tape cartridge. “But our demonstration has shown that we can, with the technologies we have here today, achieve capacities of 330 terabytes in this same form factor.”
IBM researchers achieved this feat by increasing the tape’s areal density.
Areal density is a measure of the amount of data that can be stored on a given unit of physical space on storage media. In the case of magnetic tapes, data is stored by magnetizing magnetic particles (typically iron oxide) that coat the plastic film to point either left or right (encode bits 1 or 0).
The IBM researchers increased areal density by shrinking the magnetic grains that coat the plastic.
Currently, the areal density of tape is still lower than for disk drives, but the greater surface area available on a tape provides opportunities to significantly increase tape areal density.
Google is not alone in having a tape library. Many companies, like Microsoft, use IBM System Storage Tape Libraries to archive their data, and CERN, one of the world’s largest international research collaborations, also archives their particle physics data on tapes. In today’s video, Quartz takes a close look at how magnetic tapes are used to archive data at CERN.
Credit: Quartz, YouTube
Tapes are popular—but a shortage makes procurement hard
Tapes provide valuable archival storage for companies and organizations. However, an ongoing patent infringement battle is making obtaining tapes difficult.
Only two manufacturers still produce LTO tape: Fujifilm and Sony. A lawsuit brought by Fujifilm against Sony in 2016, followed by Sony counter-suing Fujifilm, has led to a shortage of LTO-8 tapes (the tapes with a BER of 1 in 10^19). By March 2019, the United States banned import of LTO products from both manufacturers due to the ongoing patent infringement battle.