2. DeepDream (Wikipedia)
DeepDream is a computer vision program created by Google which uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia[1], thus creating a dream-like hallucinogenic appearance in the deliberately over-processed images.
[Image: A late-stage DeepDream processed photograph of three men in a pool.]
[1] Pareidolia is a psychological phenomenon in which the mind responds to a stimulus (an image or a sound) by perceiving a familiar pattern where none exists.
13. Deep Neural Networks: Algorithms that Learn
● Modernization of artificial neural networks
● Made of simple mathematical units,
organized in layers, that together can
compute some (arbitrary) function
● More layers = deeper = more general
● Learn from raw, heterogeneous data
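The "simple units organized in layers" idea can be sketched in a few lines of plain Python. This is only an illustration: the ReLU activation, the function names, and the hand-picked weights in `ABS_NET` are assumptions chosen for the example, not something taken from the slides, and a real network would learn its weights from data.

```python
def unit(inputs, weights, bias):
    """One simple unit: a weighted sum of its inputs passed through a ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)


def layer(inputs, weight_rows, biases):
    """A layer is a list of units that all read the same inputs."""
    return [unit(inputs, w, b) for w, b in zip(weight_rows, biases)]


def network(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hand-picked (not learned) weights that compute |x|:
# hidden layer = [relu(x), relu(-x)], output unit = their sum.
ABS_NET = [
    ([[1.0], [-1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]

print(network([-3.0], ABS_NET))
print(network([2.0], ABS_NET))
```

Even this two-layer toy computes a function (absolute value) that a single unit of this kind cannot, which is the intuition behind "more layers = more general".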
14. Image understanding is (getting) better than human level
ImageNet Challenge: Given an image, predict one of 1000+ classes
[Chart: % errors]
* Human performance based on analysis done by Andrej Karpathy.
15. ImageNet Challenge
“Given an image, predict one of 1000+ classes”
Image credit: 360phot0.blogspot.com
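A "% errors" figure for a task like "predict one of N classes" could be computed roughly as follows. This is a minimal sketch of a top-1 error rate; the scores and labels are made-up toy data, and the function names are hypothetical.

```python
def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])


def percent_error(score_rows, true_labels):
    """Percentage of examples whose predicted class differs from the label."""
    wrong = sum(1 for s, y in zip(score_rows, true_labels) if predict(s) != y)
    return 100.0 * wrong / len(true_labels)


# Toy data: 4 images, 3 classes (a real benchmark has 1000+ classes).
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 1, 1]

print(percent_error(scores, labels))
```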
16. Transfer Learning
Quickly able to Learn New Concepts
“t-rex” “quidditch”
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images 2015
17. Style Transfer
Learn features from one dataset, apply them to another
Can be done within domain:
Image Labels => New Image Classes
And between domains:
Image Features => Image Filters
Image Labels + Language Model => Image Captions
Show and Tell: A Neural Image Caption Generator 2015
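"Learn features from one dataset, apply them to another" can be illustrated with a frozen feature extractor plus a small classifier fitted on a handful of new examples. The sketch below is a toy under stated assumptions: `pretrained_features` is a hypothetical stand-in for a pretrained network, the nearest-centroid classifier is one simple choice among many, and the 4-pixel "images" are invented.

```python
def pretrained_features(image):
    """Stand-in for a frozen, pretrained network (hypothetical).

    Maps a flat list of pixel intensities to two features:
    mean brightness and left-half minus right-half brightness.
    """
    half = len(image) // 2
    mean = sum(image) / len(image)
    contrast = sum(image[:half]) / half - sum(image[half:]) / (len(image) - half)
    return [mean, contrast]


def fit_centroids(examples):
    """Learn one feature centroid per new class from a few labeled examples."""
    sums, counts = {}, {}
    for image, label in examples:
        feats = pretrained_features(image)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(image, centroids):
    """Assign the class whose centroid is closest in feature space."""
    feats = pretrained_features(image)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Only two examples per new class; the feature extractor is never retrained.
few_shot = [
    ([0.9, 0.9, 0.1, 0.1], "t-rex"),
    ([0.8, 1.0, 0.2, 0.0], "t-rex"),
    ([0.1, 0.1, 0.9, 0.9], "quidditch"),
    ([0.0, 0.2, 1.0, 0.8], "quidditch"),
]
centroids = fit_centroids(few_shot)
print(classify([0.7, 0.8, 0.3, 0.2], centroids))
print(classify([0.2, 0.1, 0.6, 0.9], centroids))
```

The design point is that only the small classifier on top is fitted to the new classes, which is why a few examples can be enough.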
25. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes
[Cycle diagram: Grow -> Sample tissue -> Extract DNA -> Generate Marker Fingerprint -> Model Data & Identify desirable carriers -> Select & Recombine]
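The "identify desirable carriers, then select" step can be sketched with a toy population. Everything here is illustrative: the plant names, the marker name `M1`, and the 0/1/2 allele-count coding are assumptions, and the sketch only measures allele frequency among the selected parents rather than simulating the next generation.

```python
def allele_frequency(population, marker):
    """Frequency of the favorable allele at a marker.

    Each diploid plant stores 0, 1 or 2 copies of the favorable allele.
    """
    copies = sum(plant[marker] for plant in population.values())
    return copies / (2 * len(population))


def select_carriers(population, marker, min_copies=1):
    """Keep plants whose marker fingerprint shows at least min_copies favorable alleles."""
    return {
        name: plant
        for name, plant in population.items()
        if plant[marker] >= min_copies
    }


# Hypothetical marker fingerprints for four plants.
plants = {
    "P1": {"M1": 0},
    "P2": {"M1": 1},
    "P3": {"M1": 2},
    "P4": {"M1": 0},
}

parents = select_carriers(plants, "M1")
print(allele_frequency(plants, "M1"))
print(allele_frequency(parents, "M1"))
```

In this toy case, selecting carriers as parents doubles the favorable allele's frequency in the breeding pool in a single cycle.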
26. Genomics & Genetics Problems:
How to Start Applying DNNs?
Must-haves for deep learning:
● Lots of data: >50k examples, >1M examples ideal
● High-quality input and labels for training
● Label ~ F(data): F is unknown, but such a function certainly exists
● High-quality previous efforts, so we know that DNNs are key
○ i.e. hard to solve with classical statistical approaches
SNP and indel calling from NGS data
28. Verily | Confidential & Proprietary
... but lots of places in the genome are difficult
29. Creating a universal SNP and small indel
variant caller with deep neural networks
Ryan Poplin, Cory McLean, Dan Newburger, Jojo Dijamco, Nam Nguyen, Dion Loy,
Sam Gross, Madeleine Cule, Peyton Greenside, Justin Zook, Marc Salit, Mark
DePristo, Verily Life Sciences, October 2016
30. DNN (Inception V3) Predicts True Genotype from Pileup Images
Input: Millions of labeled pileup images from gold standard samples (raw pixels)
Output: Probability of diploid genotype states { HOM_REF, HET, HOM_VAR }
{ 0.001, 0.994, 0.005 }
{ 0.001, 0.990, 0.009 }
{ 0.000, 0.001, 0.999 }
{ 0.600, 0.399, 0.001 }
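One way to turn the network's three output probabilities into a genotype call is to take the most probable state and report a confidence alongside it. The sketch below uses the four probability triples shown on the slide; the Phred-scaled quality (Q = -10 * log10 of the error probability) is an added assumption for illustration, not something the slides say the caller does.

```python
import math

GENOTYPES = ("HOM_REF", "HET", "HOM_VAR")


def call_genotype(probs):
    """Pick the most probable genotype and a Phred-scaled confidence."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    p_error = 1.0 - probs[best]
    # Assumed convention: Q = -10 * log10(P(error)); floor avoids log10(0).
    quality = -10.0 * math.log10(max(p_error, 1e-10))
    return GENOTYPES[best], round(quality, 1)


# The four example outputs from the slide.
slide_outputs = [
    (0.001, 0.994, 0.005),
    (0.001, 0.990, 0.009),
    (0.000, 0.001, 0.999),
    (0.600, 0.399, 0.001),
]

calls = [call_genotype(p) for p in slide_outputs]
for probs, (genotype, quality) in zip(slide_outputs, calls):
    print(probs, genotype, quality)
```

The last example is the interesting one: it is called HOM_REF, but with much lower confidence than the others, which is the kind of site a downstream filter might flag.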
31. Using deep learning for ultra-accurate mutation detection
Input: Millions of labeled pileup image stacks from gold standard sample (raw pixels)
Output: Probability distribution over the three diploid genotype states { HOM_REF, HET, HOM_VAR }
{ 0.001, 0.994, 0.005 }
{ 0.001, 0.990, 0.009 }
{ 0.000, 0.001, 0.999 }
{ 0.600, 0.399, 0.001 }
32. Example DNA read pileup “images”
[Image panels: true SNPs | true indels | false variants]
red = {A,C,G,T}. green = {quality score}. blue = {read strand}. alpha = {matches ref genome}.
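The channel scheme above (red = base, green = quality score, blue = read strand, alpha = matches reference) could be sketched as follows. The slide gives only the channel meanings, so the specific intensity values, the quality cap of 40, and the function names are all hypothetical choices made for this example.

```python
# Hypothetical intensities: the slide says what each channel means, not its values.
BASE_TO_RED = {"A": 250, "C": 180, "G": 100, "T": 30}


def encode_read(bases, quals, is_reverse, reference):
    """Turn one aligned read into a row of (R, G, B, A) pixels."""
    row = []
    for base, qual, ref_base in zip(bases, quals, reference):
        red = BASE_TO_RED.get(base, 0)            # which base was read
        green = min(qual, 40) * 255 // 40         # quality score, capped at 40
        blue = 70 if is_reverse else 240          # read strand
        alpha = 255 if base == ref_base else 50   # does it match the reference?
        row.append((red, green, blue, alpha))
    return row


def encode_pileup(reads, reference):
    """Stack the encoded reads into an image: one row per read."""
    return [encode_read(b, q, rev, reference) for b, q, rev in reads]


reference = "ACGT"
reads = [
    ("ACGT", [40, 40, 20, 10], False),  # forward read, matches the reference
    ("ATGT", [30, 30, 30, 30], True),   # reverse read, mismatch at position 2
]

image = encode_pileup(reads, reference)
for row in image:
    print(row)
```

A mismatching column such as position 2 of the second read stands out in the alpha channel, which is the kind of visual signal an image classifier can pick up across many stacked reads.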
33. PrecisionFDA: unique opportunity with blinded truth sample NA12878
35. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes
DNA sequencing is no longer the bottleneck...
36. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes
DNA sequencing is no longer the bottleneck...
Leading to increased investment in machine learning
37. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes
Increased investment in machine learning...
...requires more data and other data types
42. Bootstrapping a Virtuous Cycle
● Increased profit (from risk modeling) leads to increased investment
and risk reduction in the form of:
● More accurate forecasting / engineering of climate
○ Collect & model more meteorological data
● Development of crop varieties to complement future terrestrial /
climate conditions
● High-precision placement and monitoring of individual plants
○ Autonomous planting
○ Remote sensing
47. Mapping the Diversity of Maize Races in Mexico
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0114657
52. Why Cannabis?
● Intellectual Property - No patented genes or strains… yet
○ Update Mar 18, 2017: USPTO issues trademark for Gorilla Glue #4
● Production - Breeding is highly fragmented… for now
○ However, unclear that breeding will centralize due to cheap DNA sequencing and digital phenotyping
● Distribution (Growing) - Most likely to centralize due to economies of scale (e.g. multi-tenant greenhouses), and already crowded, wtf?
● Market Access - Unclear that this is a viable segment of supply chain (see GG#4 above). Also self-replication property of plants...
53. Why Cannabis?
● Threat: does Cannabis become like Yogurt starter kits?
54. Cannabis Genomics @ Google Cloud
https://cloud.google.com/bigquery/public-data/1000-cannabis
56. Build What’s Next
Thank You!
Allen Day, PhD // Science Advocate // @allenday // #genomics #ml #datascience