Now truly hitting its stride, the field of Big Data is starting to focus on some of the hardest problems that people ever face. Until recently, the science of genomics has produced relatively modest amounts of data, with the landmark 2003 overview of the human genome occupying a scant few gigabytes.
As it becomes faster and easier to sequence the genomes of particular people, though, data stores are swelling quickly. The New York Genome Center alone generates 20 to 30 terabytes of fresh data daily, shuffling the old information off site to cheaper, slower storage in order to make room.
Even given such already-impressive output, the challenges are only becoming greater. A recent paper in the open-access journal PLOS Biology sees genomics overtaking today’s top data generators, like YouTube’s users and the science of astrophysics, within the decade. Because the worldwide store of genomics data today amounts to only a fraction of what YouTube acquires in a year, that forecast implies a phenomenal rate of growth.
The allure of genomics is such that the challenge of mastering its future data output must be accepted. The science promises to deliver genetically targeted treatments for diseases of all kinds, offering hope for everyone from cancer victims to those with degenerative ailments like Alzheimer’s. While early Big Data success stories were often of a less-momentous sort, it is now becoming clear that the field will have a broad, positive impact on human life in the most basic and important ways possible.
Fresh Partnerships Bring Big Data Power to Genomics Researchers
Recognizing the role they have been called upon to play, the biggest names in technology are stepping up to the plate. Already in possession of many of the world’s secrets, Google late last year launched a brand-new Genomics addition to its established cloud-computing platform.
Joining forces with the Broad Institute, itself a partnership between MIT and Harvard that boasts the world’s biggest store of disease-related genetic information, Google aims to help researchers around the world add to a centralized repository of genetic data. Toward that end, the company offers its not-inconsiderable supplies of cloud storage and on-demand CPU time, thereby hoping to remove at a stroke the technological barriers that genomics researchers would otherwise face.
“We saw biologists moving from studying one genome at a time to studying millions,” observed Google Genomics head David Glazer, noting that the company’s previous experience with large-scale data translates precisely into the realm of genomics. For $25 per year, Google promises to keep any individual genome securely stored and accessible to the same software tools that support the inquiries of the Broad Institute’s researchers. As with many of Google’s undertakings, that arrangement will raise plenty of questions regarding privacy, but the attractiveness of the basic package is plain.
Not to be outdone, longtime technology stalwart IBM is forming partnerships of its own. Best known for its performance on the game show Jeopardy!, the company’s Watson supercomputer is now being tasked with crunching genomic data. Focusing specifically on discovering genetically targeted answers for those not helped by traditional cancer treatments, the new initiative sees IBM’s Watson Health division working with over a dozen top hospitals and research centers right from the start.
Two Important Fields Evolving Quickly and With Great Timing
With the rate of global genomics data production now doubling every seven months, this kind of heavy technological lifting is greatly appreciated. Mapping individual human genomes to cure diseases, in fact, is only part of the potential of genomics. For example, aiming to improve yields and nutrition and so put an end to hunger for good, the Beijing Genomics Institute has already sequenced the genomes of more than 3,000 varieties of rice. All told, the group hopes to come to a deep genetic understanding of more than a million different crop varieties and breeds of livestock animals, a pursuit that will engender challenges just as great as those regarding the human genome.
Whether it turns out to be the most demanding application of all, as recently predicted, or never quite overtakes the basic human appetite for YouTube videos, it is already clear that genomics will benefit from the technologies and strategies of Big Data. Without that support, in fact, relatively little of the tremendous potential of the field could ever be realized. Although only the earliest of results have rolled in so far, it is undeniably fortunate that the two fields are developing so quickly and at the same time.