DNA sequencing is producing a wave of data which will change the way that drugs are developed, patients diagnosed, and our understanding of human biology. To fulfill this promise, however, the tools for interpretation and analysis must scale to match the quantity and diversity of "big data genomics."
ADAM is an open-source genomics processing engine, built using Spark, Apache Avro, and Parquet. This talk will discuss some of the advantages that the Spark platform brings to genomics, the benefits of using technologies like Parquet in conjunction with Spark, and the challenges of adapting new technologies for existing tools in bioinformatics.
These are slides for a talk given at the Apache Spark Meetup in Boston on October 20, 2014.