The Genetic Signature of Behavior
Vanessa Sochat
Research in Progress
April 1, 2014
Autism Spectrum Disorders
(ASD)
• $126 billion annually
• ~1% prevalence
Social deficits
Communication deficits
Repetitive behaviors
ASD
anxiety
PTSD
depression
autism
ADHD
bipolar
What causes Autism Spectrum Disorders?
Neuroimaging
Environment
Behavior
Genetics
37% heritable
MZ twins: 66% concordance; fraternal (DZ) twins: 30%
No single SNP genome-wide significance
CNVs: less than 1% of cases
De novo mutations: 10-20% of cases
valproic acid, rubella, infections during pregnancy,
alcohol, thalidomide, parental age, antidepressants,
something else?
aberrant functional connectivity and structure
not reproducible
biased and unreliable
“gold standard”
Research in Progress
1. Brain structure
2. Behavioral Phenotype
3. Genetic Signature of Behavior
1. Meta-analysis of Brain Function
2. Gene Expression
3. Evaluation
Why is this work meaningful?
A new model of neuropsychiatric disorder based on
patterns of local brain structure
neuropsychiatric profile
brain
phenotype
cognitive
phenotype
1. Brain Structure to Predict ASD
• N=400 samples
• M=276 features
– Area
– Volume
– Curvature
– Thickness
brain
phenotype
cognitive
phenotype
2. Behavioral Phenotype
“Eye gaze score”
What is the developmental trajectory of eye gaze?
0: normal 1: aberrant
• National Database of Autism Research (NDAR)
• ~150-200 behavioral metrics
• “eye”,“gaze”: 678 questions for 22,823 subjects
cognitive
phenotype
2. Behavioral Phenotype
ASD vs. Healthy Control Eye Gaze Scores
Two Sample T-Test
t = 46.315, p-value < 2.2e-16
[Figure: histogram of eye gaze scores (x-axis: score, y-axis: frequency), autism vs. control, N=22,823]
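A minimal sketch of the statistic behind the comparison above, assuming the classic pooled-variance two-sample t-test. The scores below are synthetic and only illustrate the calculation; they are not the NDAR data, and `pooled_t_statistic` is a hypothetical helper name.

```python
import math
import random


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    # Unbiased sample variances of each group.
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    # Pooled variance weights each group by its degrees of freedom.
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Synthetic eye gaze scores between 0 (normal) and 1 (aberrant); made-up values.
rng = random.Random(0)
asd_scores = [rng.gauss(0.6, 0.2) for _ in range(500)]
control_scores = [rng.gauss(0.3, 0.2) for _ in range(500)]

t, df = pooled_t_statistic(asd_scores, control_scores)
print(f"t = {t:.2f}, df = {df}")
```

A large positive t here means the first group scores higher on average; the p-value would come from the t distribution with `df` degrees of freedom.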
2. Behavioral Phenotype
Eye Gaze Scores by Age
[Figure: eye gaze score (y-axis) plotted against age (x-axis)]
2. Behavioral Phenotype
cognitive
phenotype
3. Genetic Signature of Behavior
Social deficits
Communication deficits
Repetitive behaviors
ASD
Brain Map
Meta-analysis of Brain Function
“anxiety” 525 Terms
http://vbmis.com/bmi/project/neuromap/
Gene
Expression
3. Genetic Signature of Behavior
Gene Expression
Social deficits
Communication deficits
Repetitive behaviors
ASD
Brain Map
“anxiety”
Why is this work meaningful?
Gene
Expression
Social deficits
Communication deficits
Repetitive behaviors
Brain Map
Behavior
• Clinical solutions:
– Autism has no drugs
– Identify genetic markers that can be detected in blood
• Genetic signature of a behavior
– Leads us closer to drug solution
– Signature indicates likelihood of drug working for
specific kind of ASD
Mapping behavior to genes
Gene
Expression
Social deficits
Communication deficits
Repetitive behaviors
Brain Map
Behavior
“anxiety”
Neurosynth Allen Overlap
3. Genetic Signature of Behavior
Match points in “anxiety” map to Allen Brain Atlas
Neurosynth Allen
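One way the matching step could be sketched, assuming both the map and the tissue samples are expressed as coordinates in the same space (in mm). The function name, the 3 mm radius, and all coordinates are illustrative assumptions, not values from the slides.

```python
import math


def label_samples(sample_coords, active_voxels, radius_mm=3.0):
    """Return 1 for each sample lying within radius_mm of any active map voxel, else 0."""
    labels = []
    for sample in sample_coords:
        # Distance from this sample to the closest active voxel in the map.
        nearest = min(math.dist(sample, voxel) for voxel in active_voxels)
        labels.append(1 if nearest <= radius_mm else 0)
    return labels


# Hypothetical active voxels of a term map and hypothetical sample locations.
active = [(-22, -4, -18), (24, -2, -20)]
samples = [(-23, -5, -17), (40, 10, 30), (24, -2, -20)]

print(label_samples(samples, active))  # → [1, 0, 1]
```

The result is one binary label per sample, which is the shape of input the later regression slides work with.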
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
Sample 1
Sample 2
...
Sample N
“anxiety”
0 0 0 0 0 0 1 0
0 1 0 0 0 0 0 1
1 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1
1 0 0 0 1 0 1 0
0 0 0 0 0 1 0 1
0 0 1 0 1 0 0 1
genes
0.25 .012 1.20
1.50 0.80 3.40
0.80 0.90 1.00
0.40 .075 0.20
1.40 0.32 4.50
0.89 0.21 2.40
0.70 0.10 1.20
genes
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
“anxiety”
0 0 0 0 0 0 1 0
0 1 0 0 0 0 0 1
1 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1
1 0 0 0 1 0 1 0
0 0 0 0 0 1 0 1
0 0 1 0 1 0 0 1
Samples
Gene Probes (~60K)
2 1 2 0 2 1 2 4
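The row of counts under the matrix appears to be its column sums: for each gene probe, the number of samples flagged 1. A short sketch reproducing it from the seven example rows on the slide (the variable names are mine):

```python
# Rows are samples, columns are gene probes, copied from the slide's example matrix.
matrix = [
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 1],
]

# zip(*matrix) walks the matrix column by column; sum counts the 1s per probe.
counts_per_probe = [sum(column) for column in zip(*matrix)]
print(counts_per_probe)  # → [2, 1, 2, 0, 2, 1, 2, 4]
```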
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
“anxiety”
• Assess the “relative importance” of each gene probe
to define a term
• If predictors in regression are uncorrelated,
assessing relative importance means:
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
Shapley Value Regression
Bigger change = more “important”
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
Shapley Value Regression
• Assess the “relative importance” of each gene probe
to define a term
• If predictors in regression are uncorrelated,
assessing relative importance means:
R2
% variance accounted for by model
quality of model predictors
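The slide's own formula did not survive extraction. As a sketch, the standard result for this case is that when the n predictors are mutually uncorrelated, the model R2 splits cleanly into one term per predictor, so each term can be read as that predictor's importance. Here r is the Pearson correlation between the outcome y and predictor x_j (my notation, not the slide's):

```latex
R^2 = \sum_{j=1}^{n} r_{y x_j}^{2}
```

With correlated predictors this additive split no longer holds, which is the motivation for the Shapley approach.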
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
Shapley Value Regression
• creates a score for each player in a game that
represents that player’s contribution to the total
value of the game
Attributes (genes): players
Total Value: quality of model (R2)
Shapley value of gene j
= sum over models of [weight based on n total predictors, k in model] × [R2 with attribute j - R2 without attribute j]
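A minimal sketch of the Shapley weighting, assuming the standard weight k!(n-k-1)!/n! for a model that already holds k of the n predictors. The R2 values for each predictor subset are hypothetical numbers supplied by hand; in practice each would come from fitting a regression on that subset. The gene names and `shapley_values` are illustrative.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Shapley value of each player, given a value (e.g. R2) for every subset of players."""
    n = len(players)
    result = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for k in range(n):  # k = number of predictors already in the model
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain in R2 from adding attribute j to this model.
                total += weight * (value[s | {j}] - value[s])
        result[j] = total
    return result


# Hypothetical R2 for every subset of two gene probes (made-up numbers).
r2 = {
    frozenset(): 0.0,
    frozenset({"geneA"}): 0.30,
    frozenset({"geneB"}): 0.20,
    frozenset({"geneA", "geneB"}): 0.60,
}

sv = shapley_values(["geneA", "geneB"], r2)
print(sv)  # geneA is about 0.35, geneB about 0.25
```

The values sum to the R2 of the full model (0.60 here), which is what makes them usable as shares of the total. The number of subsets grows as 2^n, so an exact computation like this is only practical for small sets of probes.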
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
Shapley Value Regression
• creates a score for each player in a game that
represents that player’s contribution to the total
value of the game
Attributes (genes): players
Total Value: quality of model (R2)
marginal contribution to the R2 from adding the
attribute to the model last
0 0 0
0 1 0
1 0 1
0 0 0
1 0 0
0 0 0
0 0 1
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
Shapley Value Regression
• Assess the “relative importance” of each gene to define a term
• Define an expression property: consistent pattern of regulation
0.25 0.12 1.20
1.50 0.80 3.40
0.80 0.90 1.00
0.40 0.75 0.20
1.40 0.32 4.50
0.89 0.21 2.40
0.70 0.10 1.20
Probes
Samples
1 0 1
0 0 0
0 0 0
0 1 0
0 0 0
1 0 1
0 0 0
Microarray Expression Condition 1 (B1) Condition 2 (B2)
3. Genetic Signature of Behavior
How do I evaluate my gene subsets?
• Gene Set Enrichment Analysis
– determines whether an a priori defined set of genes
shows statistically significant, concordant differences
between two phenotypes.
Nextbio gene expression data for ASD vs. HC
Broad Institute Drug Gene Expression Database
3. Genetic Signature of Behavior
How do I evaluate my subsets?
Gene Set Enrichment Analysis
1. Enrichment Score: the degree to which a set S is
overrepresented at the extremes of my list
2. Estimate the significance level of the scores
3. Multiple hypothesis testing
Subramanian et al., Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.
PNAS 2005; 102(43): 15545-15550. Published ahead of print September 30, 2005. doi:10.1073/pnas.0506580102
3. Genetic Signature of Behavior
How do I evaluate my gene subsets?
• Nextbio gene expression data for ASD vs. HC
Is actual gene expression data in ASD vs HC:
1. overexpressed for any of my behavioral term sets?
2. overexpressed for gene sets found aberrant in ASD?
3. overexpressed for any functional pathways (C2)?
Analysis in Progress!
3. Genetic Signature of Behavior
How do I evaluate my gene subsets?
– Broad Institute Drug Gene Expression Database
– DailyMed
(disorders with anxiety): Adjustment Disorders; Affective Disorders, Psychotic; Neurocirculatory Asthenia; Obsessive-Compulsive Disorder; Premenstrual Syndrome; Seasonal Affective Disorder; Panic Disorder
(drugs): Meprobamate, Fluvoxamine, Clorazepate Dipotassium, Alprazolam, Chlormezanone, Trazodone, Lorazepam, Temazepam, Amobarbital, Pentobarbital, Oxazepam, Secobarbital, Diazepam, Hydroxyzine, Ritanserin, Oxprenolol, Medazepam
3. Genetic Signature of Behavior
How do I evaluate my gene subsets?
– Broad Institute Connectivity map .CEL Files
• Extract Log2 transformed normalized data
• 17 cell lines, 22K probes, 5 anxiety medications
Is gene expression data for cells exposed to drugs:
1. overexpressed for any of my behavioral term sets?
2. overexpressed for gene sets found aberrant in ASD?
3. overexpressed for any functional pathways (C2)?
How to define phenotypes?
Acknowledgements
Advisors
Dennis Wall
Russ Altman
Daniel Rubin
Colleagues
Ruth O’Hara
Joachim Hallmayer
Antonio Hardan
Admin Support
Susan Aptekar
John DiMario
Mary Jeanne & Nancy
Steven Bagley
Funding
Microsoft Research
SGF and NSF
Wall Lab
Maude David
Leticia Diaz Beltran
Jena Daniels
Marlena Duda
Alex Lancaster
Jack Kosmicki
Jae Yoon-Jung
Nikhila Albert
Byron Hinebaugh
Rubin Lab
Francisco Gimenez
Rebecca Sawyer
Tiffany Ting Lu
BMI Family
Diego
Boots
Peyton
Linda
Katie
Natalie
Beth
Winn
Sarah
Emily
Jonathan
Erika and Brian & co
Luke
Sam
Thank you!
3. Genetic Signature of Behavior
How to find interesting genes for a behavioral map?
PACall.csv
Contains a present/absent flag which indicates whether the probe's
expression is well above background. It is set to 1 when both of the
following conditions are met.
1) The 2-sided t-test p-value is lower than 0.01, (indicating the mean
signal of the probe's expression is significantly different from the
corresponding background).
2) The difference between the background subtracted signal and the
background is significant (> 2.6 * background standard deviation).
• Microarray expression
• PA Call
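The two conditions above can be sketched as a single function. This is a literal reading of the description, not the code that produced PACall.csv, and the argument names are mine.

```python
def pa_call(p_value, signal_minus_background, background, background_sd):
    """Return 1 if a probe counts as present under both conditions, else 0."""
    # Condition 1: the 2-sided t-test p-value is below 0.01.
    significant_t_test = p_value < 0.01
    # Condition 2: background-subtracted signal exceeds the background
    # by more than 2.6 background standard deviations.
    above_background = (signal_minus_background - background) > 2.6 * background_sd
    return 1 if significant_t_test and above_background else 0


# Hypothetical probe measurements (made-up numbers).
print(pa_call(0.001, 120.0, 40.0, 10.0))  # → 1
print(pa_call(0.050, 120.0, 40.0, 10.0))  # → 0 (p-value too high)
print(pa_call(0.001, 60.0, 40.0, 10.0))   # → 0 (difference of 20 is below 26)
```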
3. Genetic Signature of Behavior
How unique are spatial maps?
1. Brain Structure to Predict ASD
• N=400
• M=276
– Area
– Volume
– Curvature
– Thickness
Correctly Classified Instances 316 79.4 %
Incorrectly Classified Instances 82 20.6 %
rh_rostralmiddlefrontal_area
rh_lateraloccipital_area
rh_lateraloccipital_thickness
rh_lingual_thickness
lh_lingual_thickness
lh_inferiortemporal_meancurv
lh_frontalpole_meancurv
Vineland_TOTAL
ADI_TOTAL_BV
ADOS_TOTAL_A
ADOS_TOTAL_B
3. Genetic Signature of Behavior
Gene Set Enrichment Analysis
1. Calculate an enrichment score (ES) that reflects the
degree to which a set S is overrepresented at the
extremes of the entire ranked list L.
2. Estimate the significance level of the ES by permuting
the phenotype labels and recomputing the ES for
permuted data → null distribution → calculate P value
3. Multiple hypothesis testing
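Steps 1 and 2 can be sketched as below, with simplifications that I want to flag: the running sum is unweighted (every hit counts equally), genes are ranked by a plain difference in class means, and the p-value compares absolute ES values rather than only the same-signed part of the null distribution. The expression values, gene names, and function names are made up for illustration; this is not the GSEA software.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum ES: the largest deviation from zero along the ranked list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up at genes in the set, step down at genes outside it.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def rank_genes(expr, labels):
    """Rank genes by mean expression in class 1 minus mean expression in class 0."""
    def mean(values):
        return sum(values) / len(values)

    scores = {}
    for gene, values in expr.items():
        class1 = [v for v, lab in zip(values, labels) if lab == 1]
        class0 = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = mean(class1) - mean(class0)
    return sorted(scores, key=scores.get, reverse=True)


def permutation_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Observed ES and a p-value from permuting the phenotype labels."""
    rng = random.Random(seed)
    observed = enrichment_score(rank_genes(expr, labels), gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        es = enrichment_score(rank_genes(expr, shuffled), gene_set)
        if abs(es) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Made-up expression values: 6 genes across 3 class-1 and 3 class-0 samples.
expr = {
    "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    "g2": [4.0, 4.1, 3.9, 1.0, 1.2, 1.1],
    "g3": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "g4": [3.0, 2.9, 3.1, 3.0, 3.1, 2.9],
    "g5": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],
    "g6": [0.5, 0.6, 0.4, 4.0, 4.1, 3.9],
}
labels = [1, 1, 1, 0, 0, 0]

es, p = permutation_p_value(expr, labels, {"g1", "g2"})
print(es, p)
```

Step 3 (multiple hypothesis testing) would then adjust these p-values across all gene sets tested and is not covered by this sketch.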
3. Genetic Signature of Behavior
Gene Set Enrichment Analysis
Brain Structure
(Age Specific) Brain Structure to Predict ASD
Age group     Correctly Classified   Incorrectly Classified
9-18 years    58 (100%)              0
18+ years     69 (100%)              0
3. Genetic Signature of Behavior
Terms with >75% overlap
childhood : children
japanese : chinese
default : chinese
taskrelated : chinese
frequency : card
tracking : words
family : videos
default : japanese
taskrelated : japanese
taskrelated : default
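The slides do not say how overlap between two term maps was measured. One plausible definition, offered only as an assumption, is the number of shared active voxels divided by the size of the smaller map. The voxel indices below are made up.

```python
def overlap_fraction(map_a, map_b):
    """Shared active voxels as a fraction of the smaller map (an assumed definition)."""
    shared = len(map_a & map_b)
    return shared / min(len(map_a), len(map_b))


# Hypothetical sets of active voxel indices for two term maps.
map_children = {1, 2, 3, 4, 5}
map_childhood = {1, 2, 3, 4, 6}

fraction = overlap_fraction(map_children, map_childhood)
print(fraction)        # → 0.8
print(fraction > 0.75)  # → True
```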
2. Behavioral Phenotype
Eye Gaze Scores, Colored by Severity

 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annova
 
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
How we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptxHow we decide powerpoint presentation.pptx
How we decide powerpoint presentation.pptx
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 

Research in Progress April 2014

  • 1. The Genetic Signature of Behavior Vanessa Sochat Research in Progress April 1, 2014
  • 2. Autism Spectrum Disorders (ASD) • $126 billion annually • ~1% prevalence Social deficits Communication deficits Repetitive behaviors ASD anxiety PTSD depression autism ADHD bipolar
  • 3. What causes Autism Spectrum Disorders? Genetics: 37% heritable; MZ twins: 66% concordance, fraternal: 30%; no single SNP with genome-wide significance; CNVs: less than 1% of cases; de novo mutations: 10-20% of cases. Environment: valproic acid, rubella, infections during pregnancy, alcohol, thalidomide, parental age, antidepressants, something else? Neuroimaging: aberrant functional connectivity and structure; not reproducible. Behavior: biased and unreliable; “gold standard”.
  • 4. Research in Progress 1. Brain structure 2. Behavioral Phenotype 3. Genetic Signature of Behavior 1. Meta analysis of Brain Function 2. Gene Expression 3. Evaluation
  • 5. Why is this work meaningful? A new model of neuropsychiatric disorder based on patterns of local brain structure neuropsychiatric profile brain phenotype cognitive phenotype
  • 6. 1. Brain Structure to Predict ASD • N=400 samples • M=276 features – Area – Volume – Curvature – Thickness brain phenotype cognitive phenotype
  • 7. 2. Behavioral Phenotype “Eye gaze score” What is the developmental trajectory of eye gaze? 0: normal 1: aberrant • National Database of Autism Research (NDAR) • ~150-200 behavioral metrics • “eye”,“gaze”: 678 questions for 22,823 subjects cognitive phenotype
  • 8. 2. Behavioral Phenotype ASD vs. Healthy Control Eye Gaze Scores Two Sample T-Test t = 46.315, p-value < 2.2e-16 score Frequency N=22,823 autism control
  • 9. 2. Behavioral Phenotype Eye Gaze Scores by Age age score
  • 11. 3. Genetic Signature of Behavior Social deficits Communication deficits Repetitive behaviors ASD Brain Map Meta Analysis of Brain Function “anxiety” 525 Terms http://vbmis.com/bmi/project/neuromap/
  • 12. Gene Expression 3. Genetic Signature of Behavior Gene Expression Social deficits Communication deficits Repetitive behaviors ASD Brain Map “anxiety”
  • 13. Why is this work meaningful? Gene Expression Social deficits Communication deficits Repetitive behaviors Brain Map Behavior • Clinical solutions: – Autism has no drugs – Identify genetic markers that can be detected in blood • Genetic signature of a behavior – Leads us closer to drug solution – Signature indicates likelihood of drug working for specific kind of ASD
  • 14. Mapping behavior to genes Gene Expression Social deficits Communication deficits Repetitive behaviors Brain Map Behavior “anxiety” Neurosynth Allen Overlap
  • 15. 3. Genetic Signature of Behavior Match points in “anxiety” map to Allen Brain Atlas Neurosynth Allen
  • 16. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? [Figure: toy matrices for the “anxiety” map, samples (Sample 1 … Sample N) by genes: one binary matrix and one matrix of expression values]
  • 17. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? [Figure: toy binary matrix for the “anxiety” map, samples by gene probes (~60K), summed over samples into one count per probe]
  • 18. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? “anxiety”
  • 19. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? Shapley Value Regression • Assess the “relative importance” of each gene probe to define a term • If predictors in regression are uncorrelated, assessing relative importance means: bigger change = more “important”
  • 20. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? Shapley Value Regression • Assess the “relative importance” of each gene probe to define a term • If predictors in regression are uncorrelated, assessing relative importance means: R2 = % variance accounted for by model = quality of model predictors
  • 21. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? Shapley Value Regression • creates a score for each player in a game that represents that player’s contribution to the total value of the game Attributes (genes): players Total Value: quality of model (R2) [Equation labels: Shapley value of gene j; weight based on n total predictors, k in model; R2 with attribute j; R2 without attribute j]
  • 22. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? Shapley Value Regression • creates a score for each player in a game that represents that player’s contribution to the total value of the game Attributes (genes): players Total Value: quality of model (R2) marginal contribution to the R2 from adding the attribute to the model last
  • 23. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? Shapley Value Regression • Assess the “relative importance” of each gene to define a term • Define an expression property: consistent pattern of regulation [Figure: microarray expression matrix (probes by samples) thresholded into two binary matrices, Condition 1 (B1) and Condition 2 (B2)]
  • 24. 3. Genetic Signature of Behavior How do I evaluate my gene subsets? • Gene Set Enrichment Analysis – determines whether an a priori defined set of genes shows statistically significant, concordant differences between two phenotypes. Nextbio gene expression data for ASD vs. HC Broad Institute Drug Gene Expression Database
  • 25. 3. Genetic Signature of Behavior How do I evaluate my subsets? Gene Set Enrichment Analysis 1. Enrichment Score: the degree to which a set S is overrepresented at the extremes of my list 2. Estimate the significance level of the scores 3. Multiple hypothesis testing. Subramanian et al., Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 2005 102 (43) 15545-15550; published ahead of print September 30, 2005, doi:10.1073/pnas.0506580102
  • 26. 3. Genetic Signature of Behavior How do I evaluate my gene subsets? • Nextbio gene expression data for ASD vs. HC Is actual gene expression data in ASD vs HC: 1. overexpressed for any of my behavioral term sets? 2. overexpressed for gene sets found aberrant in ASD? 3. overexpressed for any functional pathways (C2) Analysis in Progress!
  • 27. 3. Genetic Signature of Behavior How do I evaluate my gene subsets? – Broad Institute Drug Gene Expression Database – Daily Med (disorders with anxiety): Adjustment Disorders; Affective Disorders, Psychotic; Neurocirculatory Asthenia; Obsessive-Compulsive Disorder; Premenstrual Syndrome; Seasonal Affective Disorder; Panic Disorder (drugs): Meprobamate, Fluvoxamine, Clorazepate Dipotassium, Alprazolam, Chlormezanone, Trazodone, Lorazepam, Temazepam, Amobarbital, Pentobarbital, Oxazepam, Secobarbital, Diazepam, Hydroxyzine, Ritanserin, Oxprenolol, Medazepam
  • 28. 3. Genetic Signature of Behavior How do I evaluate my gene subsets? – Broad Institute Connectivity Map .CEL Files • Extract Log2 transformed normalized data • 17 cell lines, 22K probes, 5 anxiety medications Is gene expression data for cells exposed to drugs: 1. overexpressed for any of my behavioral term sets? 2. overexpressed for gene sets found aberrant in ASD? 3. overexpressed for any functional pathways (C2)? How to define phenotypes?
  • 29. Acknowledgements Advisors Dennis Wall Russ Altman Daniel Rubin Colleagues Ruth O’Hara Joachim Hallmayer Antonio Hardan Admin Support Susan Aptekar John DiMario Mary Jeanne & Nancy Steven Bagley Funding Microsoft Research SGF and NSF Wall Lab Maude David Leticia Diaz Beltran Jena Daniels Marlena Duda Alex Lancaster Jack Kosmicki Jae Yoon-Jung Nikhila Albert Byron Hinebaugh Rubin Lab Francisco Gimenez Rebecca Sawyer Tiffany Ting Lu BMI Family Diego Boots Peyton Linda Katie Natalie Beth Winn Sarah Emily Jonathan Erika and Brian & co Luke Sam
  • 31. 3. Genetic Signature of Behavior How to find interesting genes for a behavioral map? PACall.csv Contains a present/absent flag which indicates whether the probe's expression is well above background. It is set to 1 when both of the following conditions are met. 1) The 2-sided t-test p-value is lower than 0.01, (indicating the mean signal of the probe's expression is significantly different from the corresponding background). 2) The difference between the background subtracted signal and the background is significant (> 2.6 * background standard deviation). • Microarray expression • PA Call
  • 32. 3. Genetic Signature of Behavior How unique are spatial maps?
  • 33. 1. Brain Structure to Predict ASD • N=400 • M=276 – Area – Volume – Curvature – Thickness Correctly Classified Instances 316 79.4 % Incorrectly Classified Instances 82 20.6 % rh_rostralmiddlefrontal_area rh_lateraloccipital_area rh_lateraloccipital_thickness rh_lingual_thickness lh_lingual_thickness lh_inferiortemporal_meancurv lh_frontalpole_meancurv Vineland_TOTAL ADI_TOTAL_BV ADOS_TOTAL_A ADOS_TOTAL_B
  • 34. 3. Genetic Signature of Behavior Gene Set Enrichment Analysis 1. Calculate an enrichment score (ES) that reflects the degree to which a set S is overrepresented at the extremes of the entire ranked list L. 2. Estimate the significance level of the ES by permuting the phenotype labels and recomputing the ES for permuted data → null distribution → calculate P value 3. Multiple hypothesis testing
  • 35. 3. Genetic Signature of Behavior Gene Set Enrichment Analysis
  • 37. (Age Specific) Brain Structure to Predict ASD age 9-18 years 18+ years Correctly Classified 58 100% Incorrectly Classified 0 0 Correctly Classified 69 100% Incorrectly Classified 0 0
  • 38. 3. Genetic Signature of Behavior Terms with >75% overlap childhood : children japanese : chinese default : chinese taskrelated : chinese frequency : card tracking : words family : videos default : japanese taskrelated : japanese taskrelated : default
  • 39. 2. Behavioral Phenotype Eye Gaze Scores, Colored by Severity

Editor's notes

  1. Can we find the genetic signature of a behavior
  2. So if you remember my quals talk, you know that my biological problem pertains to a data-driven approach to discover subtypes of autism spectrum disorder. I’ll briefly motivate this again. Cost, prevalence, emerges between 2-3 years of age. Prevalence increases 10-17%/year and it’s not due to changes in definition or diagnostic rates. We use the DSM-5 to diagnose; it’s based on behavior and clinical observations, and the problem with this approach is that it’s subject to bias and differences between clinicians, and autism is so highly heterogeneous. Not only is it comorbid with all of these disorders, but what is clear is that it extends far beyond being a “brain” disorder. Individuals with ASD span the gamut in terms of intellectual disability, motor coordination, attention, sleep, and gastrointestinal disturbance, and then you have a small cohort that excels in visual skills, music, math and art. So a list of behavioral terms is not sufficient for early diagnosis, which is essential for treatment and better long-term outcome. And you know, a big issue with autism and many neuropsychiatric disorders is that we don’t understand the underlying etiology of the disorder enough to know where to look in biology. So, of course this is a ripe area for research.
  3. Sorry internet, the answer is not vaccines. Here is a summary of what we do know. We know that the disorder influences the brain, but we can’t find a reliable biomarker. We do know there is aberrant functional connectivity (prefrontal cortex and temporal cortex), increased white and gray matter, and this all starts around 2 years of age. There are proven environmental factors that increase ASD risk, and there is Russ and Steven Bagley’s recent study, which found a strong correlation between intellectual disability and environmental location. Behavior on its own we know tends to be biased and unreliable, but we use it a lot because it’s the gold standard. Here’s the thing about genetics. We know that ASD has a genetic signature, we just haven't found it yet. Current estimates: 37% heritable. MZ twins: 66% concordance; fraternal: 30%.
  4. So now I want to talk about my research for the past 6 months, so you have context of my current “research in progress” I started by looking at pure brain structure, then developed methods to extract a behavioral phenotype, conduct meta analysis of brain function, and now my current work to identify the genetic signature of a behavior. We will talk about all of these.
  5. If we understand the genetic signature of behavior, we can look at genes and predict someone’s behavior. If we find that any of the drug data is overexpressed for a set of genes, what we’ve essentially found is a set of druggable genes. Autism has no drugs. We need a strategy that leads us closer to hope for clinical/drug solutions: understanding the relationship between brain and behavior in a way that leads to clinical solutions, e.g. identification of genetic markers that can be detected in the blood (that predict early the likelihood of a behavior developing later), or a signature that indicates the likelihood of a drug working for a kid who will develop a specific kind of autism.
  6. I started very simply. I extracted 276 structural metrics describing thickness, curvature, volume, and area across 400 individuals, and there was beautiful clustering! The problem, of course, is the same story: there is definitely aberrant structure, but because it’s so heterogeneous, we don’t have any labels to validate this clustering. I was able to use these features with some behavioral traits to predict ASD with an accuracy of about 80%, but it dropped…
  7. So of course I looked for validation in behavioral data! As we talked about, we have a lot of metrics that “get at” issues with social, communication, and repetitive behaviors, but research suggests that sensory aspects are much more important, such as sensitivity to sound and touch, and something like eye gaze. So since we know that eye contact is aberrant in ASD, I decided that I would try to use the NDAR database to extract “eye gaze scores.” I spent about 2 months developing infrastructure and methods to query the entire National Database of Autism Research (NDAR) to develop these scores. And it’s not complicated: I used regular expressions to find any kind of word related to eye / gaze / eye contact, then manually curated my set of questions and manually normalized them to all be between 0 and 1, with 0 indicating normal eye gaze and 1 indicating aberrant. Then I would want to know if I can distinguish ASD vs HC with my scores.
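The two mechanical steps in this note (keyword matching and rescaling) can be sketched as follows. This is a minimal illustration, not the code used for the talk: the pattern, the function names, and the `higher_is_aberrant` flag are all assumptions, and the manual curation step in between is not automated here.

```python
import re

# Hypothetical pattern: the notes only say that words related to
# eye / gaze / eye contact were matched with regular expressions.
EYE_GAZE = re.compile(r"\b(eyes?|gaze)\b", re.IGNORECASE)

def find_candidate_questions(questions):
    """Return question texts mentioning eye or gaze terms (candidates for manual curation)."""
    return [q for q in questions if EYE_GAZE.search(q)]

def normalize_score(value, lo, hi, higher_is_aberrant=True):
    """Rescale a raw answer to [0, 1], where 0 = normal and 1 = aberrant eye gaze."""
    if hi == lo:
        raise ValueError("scale needs two distinct endpoints")
    scaled = (value - lo) / (hi - lo)
    # Flip scales on which a high raw value means typical behavior.
    return scaled if higher_is_aberrant else 1.0 - scaled
```

Each instrument would need its own `lo`, `hi`, and direction, which is the part the note describes as manual.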
  8. What about differences in age?
  9. I could also break apart my data to look at differences in age groups, and we see a clear difference between all ages of HC and different ages of autism, with the worst eye contact being in infants, and then having it slowly improve over time. So this was great, and I now wanted to go back to my behavioral data to see if this could explain my clustering. It couldn’t, at all. So then I looked to do validation on another data source, ABIDE, but there just isn’t the overlap of behavioral metrics to make it possible. I would have to do a completely new analysis, finding possibly different questions, and you can’t compare apples and oranges. And in the late fall a paper came out of Harvard that manually curated EVERY single question in this database for these kinds of behavioral terms, so I considered myself scooped and totally dropped this research. My manual curation method would be totally infeasible for any large number of metrics, and so at this point I have future plans to use their ontology.
  11. TODO: Better look up method. At this point I decided that getting scooped was terrible, and if I’m in an awesome new lab with Dennis Wall, I should try to expand my skillset beyond imaging data. I wanted to incorporate genetics somewhere in here, because heritability plays a big role. I would want to create hypotheses about the genetic signature of behavior. If we can start with behaviors that are aberrant in a disorder, find brain areas involved in the manifestation of that behavior, and then look at gene expression, we can create a hypothesized subset of genes that are implicated for the behavior, and test with actual data. Then we can predict someone’s behavior from genetics! So let’s start with our behavioral data: step 1 is to figure out what spatial areas in the brain are likely to be involved with that behavior. I used the Neurosynth API, which takes as input a behavioral term and a significance threshold, performs meta-analysis to produce a set of spatial maps, and extracts nonzero voxels from the FDR-corrected (absolute value) image, giving MNI coordinates for significant spatial locations associated with the term based on the literature. Here is a visualization of the map for the term "anxiety", which is my first query/test term. As we would expect, we see activation in bilateral amygdala, OFC, and insula.
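The last step described here, going from nonzero voxels of a thresholded map to MNI coordinates, might look like the sketch below. It assumes the map is already loaded as a 3-D array with its voxel-to-world affine; no Neurosynth calls are shown, and `significant_mni_coords` is a hypothetical helper name.

```python
import numpy as np

def significant_mni_coords(stat_map, affine):
    """Return world (e.g. MNI) coordinates of every nonzero voxel.

    stat_map: 3-D array, already thresholded so non-significant voxels are 0.
    affine:   4x4 voxel-to-world matrix from the image header.
    """
    stat_map = np.asarray(stat_map)
    ijk = np.argwhere(stat_map != 0)                    # (m, 3) voxel indices
    homog = np.column_stack([ijk, np.ones(len(ijk))])   # homogeneous coordinates
    return (homog @ np.asarray(affine, dtype=float).T)[:, :3]
```

Testing `!= 0` keeps both positive and negative voxels, which matches the note's mention of the absolute-value image.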
  12. Now we are interested in gene expression of these areas. And now we go to the Allen Brain Atlas, which has gene expression for 3,702 spatial locations in the brain for 60K gene probes.
  15. My metric was simple – find the closest sample point for each point in my behavioral map, and only keep those that are 3mm or closer, because that’s the typical resolution of a voxel in neuroimaging. And you can see we have pretty good overlap.
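A minimal sketch of this matching rule, assuming both the map points and the Allen sample locations are given as coordinates in the same space and in millimeters. The function name and the brute-force distance computation are illustrative choices, not the talk's implementation.

```python
import numpy as np

def match_to_samples(map_coords, sample_coords, max_dist_mm=3.0):
    """For each map point, find its nearest sample; keep samples within max_dist_mm.

    Returns the sorted, unique indices of the retained samples.
    """
    map_coords = np.asarray(map_coords, dtype=float)
    sample_coords = np.asarray(sample_coords, dtype=float)
    # Pairwise Euclidean distances, shape (n_map_points, n_samples).
    diffs = map_coords[:, None, :] - sample_coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = dists.argmin(axis=1)
    nearest_dist = dists[np.arange(len(map_coords)), nearest]
    return np.unique(nearest[nearest_dist <= max_dist_mm])
```

The full distance matrix is fine for a few thousand sample locations; for much larger inputs a spatial index would be the usual replacement.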
  16. OK, so at this point we have for each term a set of sample points in the Allen Brain Atlas, and now I needed to figure out my interesting subset of genes. I started by taking the entirety of the Allen Brain Atlas and putting it into BigQuery. So let’s talk about the data that I have.
  17. So since this PA call matrix has a 1 to indicate expression above background across the entire brain, if I could just find the values of 1 for regions in my behavioral maps, those would be interesting. So here we have a toy example of this PA call matrix for a single behavioral term. Rows are samples, and columns are gene probes. So my first strategy was to sum over the samples, the idea being that a gene would be more relevant to a term if it’s expressed above background in more areas. So this becomes a vector of features to describe my behavioral term, and I could normalize these values to get the genes that are expressed across most of the map. However, this is misleading, because for any term, there is no one probe that is expressed significantly above background for greater than 2% of sample locations. And even if this was meaningful, I found that an arbitrary threshold at .9 still gave me 15-20K genes. That’s not a small enough subset!
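The sum-over-samples strategy can be sketched in a few lines. The notes do not say how the counts were normalized, so dividing by the largest count is an assumption made here, as are the function names.

```python
import numpy as np

def probe_scores(pa_calls):
    """Sum a binary PA-call matrix (samples x probes) over samples, scaled to [0, 1].

    Assumption: counts are divided by the largest count, so the most widely
    expressed probe scores 1. The talk does not state its normalization.
    """
    counts = np.asarray(pa_calls).sum(axis=0).astype(float)
    top = counts.max()
    return counts / top if top > 0 else counts

def probes_above(pa_calls, threshold=0.9):
    """Indices of probes whose normalized score reaches the threshold."""
    return np.flatnonzero(probe_scores(pa_calls) >= threshold)
```

As the note points out, a fixed cutoff on these scores is arbitrary and still leaves a very large probe set.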
  19. Outline: a change of 1 standard unit in a predictor == predicted change of βA units of the criterion variable; bigger β == bigger changes == more “important”; take absolute value or square coefficients to deal with negatives; sum is the R2 value == quality of model predictors, % variance accounted for by model. (Show regression coefficients, animate larger.) So let’s look at attempt number 2. We are going to use Shapley value regression, which is used to assess the “relative importance” of a gene probe to define a term. If all of the predictor variables in a regression model are uncorrelated with each other, then assessing the relative importance of the various predictors is fairly straightforward. If we consider the standardized regression coefficients (often called Beta coefficients), their interpretation is clear. A change of 1 standard unit in the variable A will result in a predicted change of βA standard units of our criterion variable. Bigger values of β mean bigger changes in our criterion. Therefore, β can be thought of as a measure of importance. We take the absolute value or square to get rid of negative signs, and that’s why R2 gets at relative importance. R2 is interpreted as the percent of variance in the criterion variable that is accounted for by the model. But let’s go back to this “uncorrelated” term. Yeah right! When we have correlated variables, the idea of holding all constant and changing one to assess relative importance breaks down.
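For reference, the uncorrelated case described in this note can be written out. This is the standard identity for standardized variables, stated here as background rather than as a formula taken from the slides; the symbols $x_j$, $y$, and $r_{y x_j}$ are notation introduced for this illustration.

$$
y = \sum_{j=1}^{n} \beta_j x_j + \varepsilon, \qquad \operatorname{corr}(x_i, x_j) = 0 \ \ (i \neq j) \quad\Longrightarrow\quad \beta_j = r_{y x_j}, \qquad R^2 = \sum_{j=1}^{n} \beta_j^2
$$

Each squared coefficient $\beta_j^2$ is then the share of explained variance that belongs to predictor $j$. Once predictors are correlated, the squared coefficients no longer add up to $R^2$, which is the breakdown the note ends on.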
  21. So Shapley value regression is creating a score for each player in a game that represents the player’s contribution to the total value of the game. We treat the attributes (genes) as the players and the total value of the game as the quality of the regression model, the R2.
  22. So this entire expression, when M is the full model, is the marginal contribution to the R squared from adding the attribute to the model last. So with these Shapley values we have assessed the “relative importance” of each gene. Now how does this get applied to our data?
  23. We start with microarray expression data, and we need to find genes that are associated with some “expression property”, which in this case is being generally upregulated or downregulated in this set. A group of genes S⊆N which realizes the association between the expression property and the condition on a single array is called a winning coalition for that array. So I took the mean +/- 1 SD to define two new matrices, B1 and B2, with B1 representing the condition UP and B2 DOWN. We plug these two matrices into the Shapley value formula, and in order to get rid of high Shapley values that could be attributed to chance we did 1000 bootstrap samples and for each calculated the unadjusted p value. Then we column bind these two matrices, and use the R package multtest to do the bootstrap procedure and the result is WHAT. And this was very helpful because sets of 20K genes went down to a couple of hundred.
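The slide's equation survives only as labels, so the following is the textbook Shapley value with $R^2$ as the value function, written in notation chosen here ($N$ for the set of $n$ predictors, $S$ for a subset of size $k$ that excludes gene $j$). It is offered as a reference form, not as a transcription of the slide.

$$
SV_j \;=\; \sum_{S \subseteq N \setminus \{j\}} \frac{k!\,(n-k-1)!}{n!}\,\Bigl[\,R^2\bigl(S \cup \{j\}\bigr) - R^2(S)\,\Bigr], \qquad k = |S|
$$

The bracket is "R2 with attribute j" minus "R2 without attribute j", and the fraction is the weight based on $n$ total predictors with $k$ already in the model. The values add up to the full-model fit: $\sum_j SV_j = R^2(N)$.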
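To make the idea concrete, here is a small exact computation of Shapley values over R2 by enumerating every subset of predictors. It is a sketch under stated assumptions: ordinary least squares with an intercept as the model, and hypothetical helper names `r_squared` and `shapley_r2`. Enumeration is exponential in the number of predictors, so it only works for a handful of them and is not how ~60K probes could be handled.

```python
from itertools import combinations
from math import factorial

import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the chosen columns of X (intercept included)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if not cols:
        return 0.0  # intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

def shapley_r2(X, y):
    """Exact Shapley value of each predictor, using R^2 as the value of a coalition."""
    n = np.asarray(X).shape[1]
    values = np.zeros(n)
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for k in range(n):  # k = size of the coalition before j joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                gain = r_squared(X, y, S + (j,)) - r_squared(X, y, S)
                values[j] += weight * gain
    return values
```

The returned values sum to the R2 of the full model, which is a quick sanity check on any implementation.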
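The thresholding step that produces B1 and B2 could look like the sketch below. The notes do not say which mean and SD are used, so this assumes each value is compared with the mean and SD of its own probe across samples (rows are samples, columns are probes); the function name is hypothetical.

```python
import numpy as np

def up_down_matrices(expr, n_sd=1.0):
    """Binarize expression into UP (B1) and DOWN (B2) indicator matrices.

    Assumption: thresholds are per probe, mean +/- n_sd * SD across samples.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0, keepdims=True)
    sd = expr.std(axis=0, ddof=1, keepdims=True)
    b1 = (expr > mean + n_sd * sd).astype(int)  # 1 where a probe is unusually high
    b2 = (expr < mean - n_sd * sd).astype(int)  # 1 where a probe is unusually low
    return b1, b2
```

Values within one SD of the mean are 0 in both matrices, so B1 and B2 never overlap.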
  24. Step 1: Calculation of an Enrichment Score. We calculate an enrichment score (ES) that reflects the degree to which a set S is overrepresented at the extremes (top or bottom) of the entire ranked list L. The score is calculated by walking down the list L, increasing a running-sum statistic when we encounter a gene in S and decreasing it when we encounter genes not in S. The magnitude of the increment depends on the correlation of the gene with the phenotype. The enrichment score is the maximum deviation from zero encountered in the random walk; it corresponds to a weighted Kolmogorov–Smirnov-like statistic (ref. 7 and Fig. 1B). Step 2: Estimation of Significance Level of ES. We estimate the statistical significance (nominal P value) of the ES by using an empirical phenotype-based permutation test procedure that preserves the complex correlation structure of the gene expression data. Specifically, we permute the phenotype labels and recompute the ES of the gene set for the permuted data, which generates a null distribution for the ES. The empirical, nominal P value of the observed ES is then calculated relative to this null distribution. Importantly, the permutation of class labels preserves gene-gene correlations and, thus, provides a more biologically reasonable assessment of significance than would be obtained by permuting genes.
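Steps 1 and 2 can be sketched compactly. The running-sum statistic follows the description above; the ranking metric (difference of class means) and the two-sided p-value are simplifications chosen for this illustration, not the defaults of the GSEA software, and the function names are hypothetical.

```python
import numpy as np

def enrichment_score(scores, in_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running sum over a ranked gene list."""
    scores = np.asarray(scores, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    order = np.argsort(-scores)                 # rank genes from high to low score
    weights = np.abs(scores[order]) ** p
    hits = in_set[order]
    n_miss = int((~hits).sum())
    hit_total = weights[hits].sum()
    if n_miss == 0 or hit_total == 0:
        raise ValueError("need genes outside the set and nonzero scores inside it")
    # Step up (weighted) at genes in S, step down (uniform) at genes not in S.
    steps = np.where(hits, weights / hit_total, -1.0 / n_miss)
    running = np.cumsum(steps)
    return float(running[np.abs(running).argmax()])  # maximum deviation from zero

def permutation_p_value(expr, labels, in_set, n_perm=1000, seed=0):
    """Permute phenotype labels, recompute the ES, and compare with the observed ES."""
    expr = np.asarray(expr, dtype=float)        # (n_samples, n_genes)
    labels = np.asarray(labels, dtype=bool)     # True marks one phenotype
    rng = np.random.default_rng(seed)

    def es_for(lab):
        # Assumption: genes are ranked by the difference in class means.
        diff = expr[lab].mean(axis=0) - expr[~lab].mean(axis=0)
        return enrichment_score(diff, in_set)

    observed = es_for(labels)
    null = np.array([es_for(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p_value
```

Because whole label vectors are shuffled and the expression columns are left intact, gene-gene correlations are preserved in every permutation, which is the point made at the end of the note.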
  25. I have THIS MANY datasets from nextbio to use for this analysis. I am interested if gene expression in ASD vs HC is overexpressed for any of my behavioral term sets, gene sets found to be aberrant in ASD, and for any functional pathways. WRITE ABOUT RESULTS? I can also take this data, filter it to only include terms in each of my subsets, and then do GSEA with the autism datasets and functional pathways database.
  26. The Broad Institute has a database of gene expression for THIS MANY cell cultures exposed to different medications. The stupid web interface requires an “up” and a “down” list of genes, and it’s a black box, so I decided to download their instances and do the analysis on my own. I saw that I would need to look up the instances by medication name, so the question becomes, for example, “which medications are relevant for anxiety?” I wrote scripts that use DailyMed to find all medications relevant to anxiety. I could then search my CMap instances for these drugs, and since a bunch of them are kind of old, I found 20 instances for 5 drugs. I used regular expressions to find them, and I’m going to go back and make sure that I haven’t missed any.
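A rough sketch of the regular-expression lookup, assuming the instances are available as a plain list of perturbagen name strings. The function name is hypothetical, and any drug names passed in are up to the caller; the notes do not list the five drugs that were found.

```python
import re


def find_drug_instances(drug_names, instance_names):
    """Map each drug to the instance names that mention it.

    Matching is case-insensitive and anchored on word boundaries, so a
    drug name does not match inside a longer, different drug name.
    """
    matches = {}
    for drug in drug_names:
        pattern = re.compile(r"\b" + re.escape(drug) + r"\b", re.IGNORECASE)
        hits = [name for name in instance_names if pattern.search(name)]
        if hits:
            matches[drug] = hits
    return matches
```

Whole-word matching like this is conservative: it will not catch brand names, alternate spellings, or abbreviations, which is one reason to go back and check for missed instances.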
  27. I now want to look at the Connectivity Map data, which are Affymetrix files. I extracted the log2-transformed, normalized data and decided to keep the drugs separate: the fact that they all treat anxiety doesn’t mean we can assume they affect gene expression equivalently. In total I have 17 cell lines across about 22K probes.
  28. OK, so at this point we have, for each term, a set of sample points in the Allen Brain Atlas. Now I needed to figure out my interesting subset of genes. I started by taking the entirety of the Allen Brain Atlas and putting it into BigQuery. So let’s talk about the data that I have.
  29. Of course, with similar terms, I was worried that there would be too much overlap in my sample spatial maps. So first I looked at Tanimoto scores, or the Jaccard index, which is the intersection divided by the union: a score of 1 means two maps are identical, and 0 means they share nothing. So here are the pairwise scores, and we have the terms matched to themselves over here. I also looked at this plot of each behavioral term against all others, because you get the sense there are some squished, similar maps here, and I visually inspected all maps with scores greater than 0.75. There was some overlap, but I didn’t see a reason at this point to artificially remove terms from the analysis.
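The overlap check can be sketched as follows, assuming each term's spatial map is represented as a set of sample identifiers. That representation and the function names are assumptions made for illustration; only the 0.75 cutoff comes from the notes.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard (Tanimoto) index: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty maps count as identical
    return len(a & b) / len(union)


def pairwise_overlap(maps, threshold=0.75):
    """Return (term1, term2, score) for every pair scoring above threshold.

    maps: dict mapping term -> iterable of sample ids in that term's map.
    """
    flagged = []
    for t1, t2 in combinations(sorted(maps), 2):
        score = jaccard(maps[t1], maps[t2])
        if score > threshold:
            flagged.append((t1, t2, score))
    return flagged
```

The flagged pairs are the ones worth opening side by side for a visual check before deciding whether any term should be dropped.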
  30. That just shows the data, but in terms of a classifier, my best performance was using ADTrees. I was able to correctly predict almost 80% of cases. And the features are what we would expect. And I sort of glossed over this, because in my mind this is not good enough. This classifier uses behavioral data, and I’m not convinced by that. Other people might have been, because there was a paper published with a 60% accuracy classifier. But honestly, why bother? When I removed the behavioral data, the accuracy dropped to about 70%. But you know, there is interesting clustering here. Can I come up with some behavioral metric to explain this?
  32. I also have 1300…
  33. Still not great. But one thing this doesn’t account for is the huge variation that we see between different age groups. So I tried that. We had some major overfitting going on here, but when I separated into groups, I could get perfect performance. This is ADTree with 10-fold cross-validation. However, my sample sizes were also way too small. And it’s not so helpful to make a diagnosis for these two age groups; we need intervention in the first few years of life.
  34. I found a subset of terms with greater than 75% overlap, and manually checked them – and we can see that the spatial maps are different – but then the question is – do I really want to artificially remove terms just because they have similar spatial maps? I didn’t see any reason to.