Presentation from Gigaom's Structure 2014 conference, June 21-22 in San Francisco
Genomics - where science fiction meets reality
Francis deSouza, President, Illumina
#gigaomlive
More at http://events.gigaom.com/structure-2014/
5. Genetic Code:
‣ Defines your traits & uniqueness
‣ Carries the blueprints that run the body and its functions
‣ Can change or be modified by environment
7. Reproductive Health
‣ Family Planning: Carrier screening, IVF/pre-implantation screening
‣ Pregnancy
‣ Newborn Screening
‣ Inherited Disease
8. Cancer
‣ Theragnostics: Targeted treatment mix
‣ Predisposition: Risk
‣ Early Detection: Before the tumor
‣ Monitoring: Treatment effectiveness & recurrence
16. Milestones, each on a time frame of 0-10 years:
1. Sequencing begins to directly save lives
2. Tumor samples routinely sequenced / standard of care
3. Nations begin sequencing their populations
4. Sequences accessible in electronic medical records
5. Infants routinely sequenced at birth
6. Cancer managed as a chronic disease
[Chart: each milestone plotted on a 0-10 year time frame]
17. Genomics: Where Science Fiction Meets Reality
Francis deSouza
President, Illumina
@fdesouza
Editor's notes
Oncology start-to-finish transformation (would like to create something similar to the reproductive health paradigm):
- Relegating cancer to a chronic disease
In addition to direct impacts to human health and health care, there are many other markets affected by genomic science that will impact our quality of life.
Like public safety … Forensics
Forensics Story – what will be possible?
Full profiling (eye color, ethnicity, hair, etc.) from genetic material
In genome sequencing, data is generated, analyzed computationally and then those findings are “interpreted”, typically by experts.
When you think about populations being sequenced and scaling this process, it begs for automation.
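To make "automation" concrete, here is a minimal sketch of the generate → analyze → interpret chain as code; the three functions are placeholder stubs (hypothetical, not Illumina's pipeline) standing in for a sequencer run, a variant-calling pipeline, and a curated knowledge-base lookup.

```python
# Minimal sketch of automating the generate -> analyze -> interpret flow.
# All three functions are placeholder stubs for illustration only.
def generate(sample_id: str) -> str:
    return "ACGTACGT..."                      # raw reads from the sequencer (placeholder)

def analyze(reads: str) -> list:
    return ["BRCA1:c.68_69delAG"]             # computationally called variants (placeholder)

def interpret(variants: list) -> dict:
    knowledge_base = {"BRCA1:c.68_69delAG": "pathogenic"}   # toy curated lookup
    return {v: knowledge_base.get(v, "uncertain") for v in variants}

# Automation means running the whole chain per sample with no manual hand-offs.
for sample in ["S001", "S002"]:
    print(sample, interpret(analyze(generate(sample))))
```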
700 TB = the Netflix repository = enough to hold about 210,000 whole human genomes
World population = 7.2B in 2013, expected to reach 8.2B over the next 12 years
Sequencing 1 billion people would mean roughly 4,761 times more data than the Netflix repository
1 TB disk drive = about 300 whole human genomes (roughly 3.3 GB per genome)
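As a back-of-the-envelope check, here is a minimal sketch that reproduces the arithmetic behind these figures; the extrapolation to the full 2013 world population is an illustrative assumption, not a number from the talk.

```python
# Back-of-the-envelope storage math from the figures above.
gb_per_genome = 700_000 / 210_000          # 700 TB Netflix repo / 210k genomes ≈ 3.3 GB
genomes_per_tb = 1_000 / gb_per_genome     # ≈ 300, matching the 1 TB disk-drive figure

# Sequencing 1 billion people vs. the Netflix repository (the ~4,761x figure):
print(f"1B genomes ≈ {1e9 / 210_000:,.0f}x the Netflix repository")

# Illustrative assumption: one genome for everyone alive in 2013 (7.2B people).
total_exabytes = 7.2e9 * gb_per_genome / 1e9
print(f"7.2B genomes ≈ {total_exabytes:,.0f} exabytes of raw genome storage")
```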
Public genome repositories, such as the one maintained by the National Center for Biotechnology Information, or NCBI, already house petabytes — millions of gigabytes — of data, and biologists around the world are churning out 15 petabases (a base is a letter of DNA) of sequence per year. If these were stored on regular DVDs, the resulting stack would be 2.2 miles tall.
http://www.wired.com/2013/10/big-data-biology/
The microbes found in the human body account for roughly 100 billion bases and millions of genes.
Life scientists are embarking on countless other big data projects, including efforts to analyze the genomes of many cancers, to map the human brain, and to develop better biofuels and other crops. (The wheat genome is more than five times larger than the human genome, and it has six copies of every chromosome to our two.)
But unlike the computing space, where Moore's law is our friend, in the genomics space we are outstripping the pace of Moore's law, which is both amazing and creates a challenge.
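To put the Moore's law point in rough numbers, here is a sketch using widely cited cost-per-genome figures (about $100M in 2001 and the ~$1,000 genome announced in 2014) as assumptions, with Moore's law modeled as cost halving every two years:

```python
# Sequencing cost decline vs. a Moore's-law trajectory.
# Assumed figures: ~$100M per genome in 2001, ~$1,000 in 2014;
# Moore's law modeled as cost halving every 2 years.
years = 2014 - 2001
moores_law_cost = 100e6 * 0.5 ** (years / 2)   # ≈ $1.1M if cost had only tracked Moore's law
actual_cost = 1_000

print(f"Moore's-law projection for 2014: ${moores_law_cost:,.0f}")
print(f"Actual 2014 cost per genome:     ${actual_cost:,}")
print(f"Sequencing outpaced Moore's law by ~{moores_law_cost / actual_cost:,.0f}x")
```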
This is where we need to go … but there are challenges:
The major hurdle is aggregating and analyzing the voluminous data output and interpreting it for clinical decisions.
Need streamlined, scalable processes
Data is unstructured and some of it is not easily analyzed as a result
Once aggregated, we need standards and structure to make it usable for analytics.
Data is in silos today
Need discovery environments that support all scientific/medical experts
Need comprehensive longitudinal review of a patient record incorporating a variety of data across systems
Integrate public data with private data
Analysis is too complex
Curation is manual today, using experts who provide a qualitative “filter” on what’s relevant, what’s not, and what should be dug into further
Need simplified interfaces that allow clinicians and patients to query the data and receive simplified answer sets
And it all needs to be secure, compliant and be able to be anonymized
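On the "secure, compliant, anonymized" point, here is a minimal sketch of one common building block, keyed-hash pseudonymization of patient identifiers; the field names and key handling are illustrative assumptions, not a description of any Illumina system.

```python
import hashlib
import hmac

# Keyed-hash pseudonymization: records stay linkable across datasets without
# exposing the underlying identifier. In practice the secret key would live
# in a key-management system, not in source code (illustrative assumption).
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(patient_id: str) -> str:
    """Derive a stable, non-reversible pseudonym for a patient identifier."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()

record = {"patient_id": "MRN-00042", "genotype": "BRCA1 c.68_69delAG", "age": 54}
safe_record = {**record, "patient_id": pseudonymize(record["patient_id"])}
print(safe_record)
```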
Clinical data is high-dimensional, temporal, and disconnected from molecular data. Patient data spread across multiple transactional systems is difficult to aggregate (an average hospital EMR integration can have over 100 integration points)
Genomic information is inherently complex, with most relationships not well understood:
DNA/RNA, RNA/methylation, temporal effects, risk predisposition, etc.
Genomic information is poorly available in a medical context
Like many data types, it’s inherently temporal, especially in progressive diseases like cancer
To make things worse, scientists don’t have a good understanding of how many of these different variables interact.
Discovery of novel relationships can guide deeper dives by specialists
Interactive analytics is a proven tool to enable subject matter experts
Big data analytics will expand impact by allowing better access to aggregated information
Normalization across data sets unlocks the value of data aggregation
Ontologies are a required first step!
There is enormous leverage in combining curated private data with normalized public information
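As a toy illustration of why ontologies and normalization unlock aggregation, the sketch below maps two sites' local diagnosis terms onto a shared ontology code so their records become comparable; the mapping table and codes are illustrative assumptions.

```python
# Two data sources describe the same condition with different local terms.
# Mapping both onto a shared ontology code lets the records be pooled and
# queried together. The mapping table and codes are toy assumptions.
TERM_TO_ONTOLOGY = {
    "breast ca":                  "SNOMED:254837009",
    "breast cancer":              "SNOMED:254837009",
    "invasive ductal carcinoma":  "SNOMED:408643008",
}

def normalize(diagnosis: str) -> str:
    return TERM_TO_ONTOLOGY.get(diagnosis.strip().lower(), "UNMAPPED")

hospital_a = [{"dx": "Breast CA"}, {"dx": "Invasive ductal carcinoma"}]
hospital_b = [{"dx": "breast cancer"}]

pooled = [normalize(r["dx"]) for r in hospital_a + hospital_b]
print(pooled)   # the two sites' records now share comparable codes
```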
In an effort to deal with some of these challenges, in 2012 the National Institutes of Health launched the Big Data to Knowledge Initiative (BD2K), which aims, in part, to create data sharing standards and develop data analysis tools that can be easily distributed.
Researchers studying social media networks, by contrast, know exactly what the data they are collecting means; each node in the network represents a Facebook account, for example, with links delineating friends. A gene regulatory network, which attempts to map how different genes control the expression of other genes, is smaller than a social network, with thousands rather than millions of nodes. But the data is harder to define. “The data from which we construct networks is noisy and imprecise,” said Zola. “When we look at biological data, we don’t know exactly what we are looking at yet.”
- From Wired article on big data and genomics
LOTS OF DATA AND COMPLEX DATA
The data picture grows as populations of patients, even whole countries, are sequenced and clinical data and phenotypic information are collected.
The data sources are diverse and always changing/evolving: new research, patients self-reporting, lots of open-text information on phenotype, hundreds of data integration points in an EMR (x-rays, blood tests, etc.)
This data needs to be analyzed across massive databases for patterning. For example:
How many patients with this genotype, have this type of breast cancer and have done well on this drug?
What drug was used the most successfully for my patient who has diabetes, heart disease and this genotype?
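A sketch of what such cohort queries might look like over an aggregated genotype/phenotype table, using pandas; the columns, drugs, and values are hypothetical stand-ins for a real clinical-genomic warehouse.

```python
import pandas as pd

# Toy aggregated patient table (hypothetical columns and values).
patients = pd.DataFrame([
    {"genotype": "BRCA1+", "cancer": "breast", "drug": "tamoxifen", "responded": True},
    {"genotype": "BRCA1+", "cancer": "breast", "drug": "tamoxifen", "responded": False},
    {"genotype": "BRCA1+", "cancer": "breast", "drug": "olaparib",  "responded": True},
    {"genotype": "BRCA1-", "cancer": "breast", "drug": "tamoxifen", "responded": True},
])

# "How many patients with this genotype have this type of breast cancer
#  and have done well on this drug?"
cohort = patients[(patients.genotype == "BRCA1+") &
                  (patients.cancer == "breast") &
                  (patients.drug == "tamoxifen")]
print(len(cohort), "matching patients;", cohort.responded.mean(), "response rate")

# "Which drug worked best for patients like mine?" -- response rate per drug.
print(patients[patients.genotype == "BRCA1+"].groupby("drug").responded.mean())
```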
Artificial intelligence is not “intelligent” enough to handle it
Pattern recognition and differential analysis are used in the computing world and can be applied to the world of genomic data
It is only through rapid evolution of analysis tools that we will be able to deliver on the promise of individualized, personalized health care, treating each patient’s specific genetic and resulting physical attributes holistically.
Consent
Privacy
Security
Anonymity
Personal Identifiers
HIPAA
The genomic revolution is here, and we are in the fast lane on the genetic superhighway. It’s a brave new world, and at Illumina we believe it will take us far and transform our lives.