Big Data, Bioscience
and the Cloud
Dan Sullivan
June 25, 2015
BioCatalyst: Cloud Computing in Bioscience
Oregon Bioscience Association
Overview
• Background
• Varieties of Big Data in Bioscience
• Continuous learning about Big Data & Cloud
• Making Connections
My Background
• Data Architect / Engineer
• NoSQL and relational data modeler
• Big data
• Analytics, machine learning and text mining
• Cloud computing
• Computational Biologist
• Author
• NoSQL for Mere Mortals
• Contributor to TechTarget
Overview
• Background
• Varieties of Big Data in Bioscience
• Continuous learning about Big Data & Cloud
• Making Connections
Big Data Challenges in Bioscience
Volume
Velocity
Variety
Integration
Varieties of Big Data in
Bioscience
Subcellular – Genetics and
Proteomics
Cellular – Metabolic and
Signaling Pathways
Organism – Disease, Medicine,
Insurance
Populations – Epidemiology,
Social Networks
Genetics and Proteomics
• Genetic Sequencing
• Order of nucleotides in DNA
• Most DNA is common across species
• Many genes code proteins
• Some variants associated with disease
• Which ones?
• Proteomics
• Structure and function of proteins
• Variation in protein sequence and
structure associated with disease
• Which ones? In what context?
Images: http://www.masimo.it/hemoglobin/anemia.htm, https://en.wikipedia.org/wiki/DNA
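As a minimal sketch of how a single nucleotide change can alter a protein, the snippet below translates DNA into amino acids with the standard genetic code and reports where two translations differ. The function names and the nine-base sequences are hypothetical, chosen only for illustration; they are not taken from the talk or from a real gene record.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acids ('*' = stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def protein_differences(reference_dna, variant_dna):
    """Return (1-based position, reference residue, variant residue) for each change."""
    ref, var = translate(reference_dna), translate(variant_dna)
    return [(pos + 1, r, v) for pos, (r, v) in enumerate(zip(ref, var)) if r != v]


# Hypothetical toy sequences: one base differs (GAG -> GTG).
reference = "ATGGAGTAA"
variant = "ATGGTGTAA"
print(translate(reference), translate(variant))
print(protein_differences(reference, variant))
```

Finding the changed residue is the easy part. The "which ones?" question on the slide is about linking such changes to disease, which requires far more data than a single comparison.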
Pathways
• Metabolic Pathways
• Series of chemical reactions
• Coordinated so each step produces the
reactants for the next
• Choreography of molecules
• Signaling Pathways
• Molecules on cell surface detect
changes in environment
• Cascade of reactions to change
state of cell
• Choreography of molecules
• How do they interact?
Disease - Atherosclerosis
• Early 1950s: Korean War autopsies
• 1985-1998: Pathology studies - Pathobiological Determinants of
Atherosclerosis in Youth (PDAY) study
• 2012-2016: Genomic and proteomic studies
Healthcare
• Genetics and Disease
• Post-Approval Drug Efficacy
• Discovering and Retrieving Medical
Information
• Comparative Quality
Populations
• Infectious Disease Spread
• How fast will disease spread?
• What countermeasures are
effective?
• What is the morbidity and
mortality?
• Simulation
– Synthetic population
– Model interactions
– Probabilistic
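The simulation bullets above can be sketched as a toy agent-based model: a synthetic population, random daily contacts, and probabilistic transmission. Every parameter value and name here is an assumption made for illustration; real epidemiological models use calibrated contact networks and disease parameters.

```python
import random


def simulate_outbreak(population_size=200, contacts_per_day=4,
                      p_transmit=0.05, days_infectious=5,
                      n_days=60, seed=0):
    """Toy susceptible/infectious/recovered simulation (illustrative values only)."""
    rng = random.Random(seed)
    # Synthetic population: each person is 'S', 'I', or 'R'.
    state = ["S"] * population_size
    days_left = [0] * population_size
    state[0] = "I"  # a single initial case
    days_left[0] = days_infectious
    history = []
    for _ in range(n_days):
        newly_infected = set()
        for person in range(population_size):
            if state[person] != "I":
                continue
            # Model interactions: random contacts each day.
            for _ in range(contacts_per_day):
                other = rng.randrange(population_size)
                # Probabilistic transmission to susceptible contacts.
                if state[other] == "S" and rng.random() < p_transmit:
                    newly_infected.add(other)
        # Current cases move one day closer to recovery.
        for person in range(population_size):
            if state[person] == "I":
                days_left[person] -= 1
                if days_left[person] == 0:
                    state[person] = "R"
        # People infected today become infectious from tomorrow.
        for person in newly_infected:
            state[person] = "I"
            days_left[person] = days_infectious
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


history = simulate_outbreak()
print("final (S, I, R):", history[-1])
```

Running the function with many seeds and with different values for `p_transmit` or `contacts_per_day` is one simple way to explore how fast an outbreak grows and how much a countermeasure might help.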
Why Cloud for Big Data in
Bioscience?
• Scalability
• Access to compute- and memory-optimized
virtual machines
• Virtually unlimited storage
• Speed
• Many bioscience computations highly
parallel
• Minimize time to analyze, lower IT
overhead
• Cost
• AWS Spot Instances
• Google Preemptible VMs
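To illustrate what "highly parallel" means in practice, here is a small sketch in which each DNA sequence is scored independently, so the work can be handed to any pool of workers. GC content is used only as a stand-in computation, and the function names are hypothetical.

```python
from concurrent.futures import Executor
from typing import List, Optional, Sequence


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in one DNA sequence."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_content_all(sequences: Sequence[str],
                   executor: Optional[Executor] = None) -> List[float]:
    """Score every sequence; no result depends on any other sequence."""
    if executor is None:
        # Serial fallback: same results, one sequence at a time.
        return [gc_content(s) for s in sequences]
    # The executor decides where each call runs; map() keeps input order.
    return list(executor.map(gc_content, sequences))


reads = ["ATGC", "GGCC", "ATAT"]
print(gc_content_all(reads))
```

Passing a `concurrent.futures.ProcessPoolExecutor` as `executor` spreads the calls across local cores. The same independence between tasks is what lets a job scale out across many cloud virtual machines, and what makes interruptible low-cost instances workable, since an interrupted task can simply be rerun.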
Overview
• Background
• Varieties of Big Data in Bioscience
• Continuous learning about Big Data & Cloud
• Making Connections
Continuous Learning
• Coursera
• Cloud Computing Concepts
• Bioinformatics: Life Sciences on Your
Computer
• edX
• Introduction to Statistics
• Introduction to Biology
• Principles of Biochemistry
• Rackspace CloudU
• YouTube
• Big Data Vendors
• MapR
• Cloudera
• Hortonworks
• DataStax
• Databricks
• Trade Publications
– TechTarget
• SearchAWS
• SearchCloudComputing
• SearchCloudSecurity
– Health Data Management
– Harvard Business Review
Overview
• Background
• Varieties of Big Data in Bioscience
• Continuous learning about Big Data & Cloud
• Making Connections
LinkedIn Groups
Final Thoughts
• Great time to get into Biosciences and
Big Data
• Don’t be intimidated if it’s been a
while since you’ve studied biology –
we are all constantly learning in this
field
• Network online and in person
• Take advantage of free resources
• Courses
• Cloud
• AWS Free Tier
• MapR Hadoop On-Demand Training
• Connect with me on LinkedIn
• https://www.linkedin.com/in/dansullivanpdx
• Join me at a Meetup
• Dan.sullivan@cambiahealth.com

Big data, bioscience and the cloud biocatalyst june 2015 sullivan

  • 1. Big Data, Bioscience and the Cloud Dan Sullivan June 25, 2015 BioCatalyst: Cloud Computing in Bioscience Oregon Bioscience Association
  • 2. Overview • Background • Varieties of Big Data in Bioscience • Continuous learning about Big Data & Cloud • Making Connections
  • 3. My Background • Data Architect / Engineer • NoSQL and relational data modeler • Big data • Analytics, machine learning and text mining • Cloud computing • Computational Biologist • Author • NoSQL for Mere Mortals • Contributor to TechTarget
  • 4. Overview • Background • Varieties of Big Data in Bioscience • Continuous learning about Big Data & Cloud • Making Connections
  • 5. Big Data Challenges in Bioscience Volume Velocity Variety Integration
  • 6. Varieties of Big Data in Bioscience Subcellular – Genetics and Proteomics Cellular – Metabolic and Signaling Pathways Organism – Disease, Medicine, Insurance Populations – Epidemiology, Social Networks
  • 7. Genetics and Proteomics • Genetic Sequencing • Order of nucleotides in DNA • Most DNA is common across species • Many genes code proteins • Some variants associated with disease • Which ones? • Proteomics • Structure and function of proteins • Variation in protein sequence and structure associated with disease • Which ones? In what context? Images: http://www.masimo.it/hemoglobin/anemia.htm, https://en.wikipedia.org/wiki/DNA
  • 8. Pathways • Metabolic Pathways • Series of chemical reactions • Coordinated to produce reactants • Choreography of molecules • Signaling Pathways • Molecules on cell surface detect changes in environment • Cascade of reactions to change state of cell • Choreography of molecules • How do they interact?
  • 9. Disease - Atherosclerosis • Early 1950s: Korean War autopsies • 1985-1998: Pathology studies - Pathobiological Determinants of Atherosclerosis in Youth (PDAY) study • 2012-2016: Genomic and proteomic studies
  • 10. Healthcare • Genetics and Disease • Post-Approval Drug Efficacy • Discovering and Retrieving Medical Information • Comparative Quality
  • 11. Populations • Infectious Disease Spread • How fast will disease spread? • What countermeasures are effective? • What is the morbidity and mortality? • Simulation – Synthetic population – Model interactions – Probabilistic
  • 12. Why Cloud for Big Data in Bioscience? • Scalability • Access to compute- and memory-optimized virtual machines • Virtually unlimited storage • Speed • Many bioscience computations are highly parallel • Minimize time to analyze, lower IT overhead • Cost • AWS Spot Instances • Google Preemptible VMs
  • 13. Overview • Background • Varieties of Big Data in Bioscience • Continuous learning about Big Data & Cloud • Making Connections
  • 14. Continuous Learning • Coursera • Cloud Computing Concepts • Bioinformatics: Life Sciences on Your Computer • edX • Introduction to Statistics • Introduction to Biology • Principles of Biochemistry • Rackspace CloudU • YouTube • Big Data Vendors • MapR • Cloudera • Hortonworks • DataStax • Databricks • Trade Publications – TechTarget • SearchAWS • SearchCloudComputing • SearchCloudSecurity – Health Data Management – Harvard Business Review
  • 15. Overview • Background • Varieties of Big Data in Bioscience • Continuous learning about Big Data & Cloud • Making Connections
  • 17.
  • 18. Final Thoughts • Great time to get into Biosciences and Big Data • Don’t be intimidated if it’s been a while since you’ve studied biology – we are all constantly learning in this field • Network online and in person • Take advantage of free resources • Courses • Cloud • AWS Free Tier • MapR Hadoop On Demand Training • Connect with me on LinkedIn • https://www.linkedin.com/in/dansullivanpdx • Join me at a Meetup • Dan.sullivan@cambiahealth.com
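
Slide 11 describes simulating infectious disease spread with a synthetic population, modeled interactions and probabilistic outcomes. The sketch below is one minimal way that idea could look in Python: a stochastic susceptible/infected/recovered model with random mixing. It is not the method used in the talk. The function name and every parameter value (population size, contacts per day, transmission and recovery probabilities) are illustrative assumptions.

```python
import random


def simulate_outbreak(population=1000, initial_infected=5, contacts_per_day=8,
                      transmission_prob=0.05, recovery_prob=0.1, days=60, seed=42):
    """Stochastic S/I/R simulation over a synthetic, randomly mixing population.

    All defaults are hypothetical values chosen for illustration only.
    Returns a list of (susceptible, infected, recovered) counts, one per day.
    """
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    # One entry per synthetic person: "S" (susceptible), "I" (infected), "R" (recovered)
    state = ["I"] * initial_infected + ["S"] * (population - initial_infected)
    history = []
    for _ in range(days):
        infected = [i for i, s in enumerate(state) if s == "I"]
        newly_infected = set()
        newly_recovered = []
        for person in infected:
            # Model interactions: each infected person meets random others
            for _ in range(contacts_per_day):
                other = rng.randrange(population)
                if state[other] == "S" and rng.random() < transmission_prob:
                    newly_infected.add(other)
            # Probabilistic recovery at the end of the day
            if rng.random() < recovery_prob:
                newly_recovered.append(person)
        # Apply the day's changes only after all contacts are evaluated
        for p in newly_infected:
            state[p] = "I"
        for p in newly_recovered:
            state[p] = "R"
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


if __name__ == "__main__":
    for day, (s, i, r) in enumerate(simulate_outbreak(), start=1):
        if day % 10 == 0:
            print(f"day {day}: susceptible={s} infected={i} recovered={r}")
```

Because each run with a different seed is independent, many runs can be executed in parallel on separate cloud virtual machines and then averaged, which is the kind of highly parallel workload slide 12 refers to. Questions such as "what countermeasures are effective?" could be explored by lowering `contacts_per_day` or `transmission_prob` and comparing the resulting curves.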

Editor's Notes

  1. Projects that face any two of these challenges (volume, velocity, variety) can probably be handled well by a relational database (RDBMS). When all three occur in one project, NoSQL databases can often provide better performance, with differing trade-offs among consistency, availability and partition tolerance (the CAP theorem).
  2. Autopsies performed during the Korean War found evidence of early-onset atherosclerosis. There had not been enough time for lifestyle factors, such as a high-fat diet, smoking and inactivity, to be the sole cause of plaque. Hypothesis: a genetic factor influences atherosclerosis. PDAY confirmed and expanded on the earlier findings. A large collaboration of pathologists collected samples from young people who died of non-cardiovascular causes: 3,000 autopsies of 15- to 34-year-olds. Aorta and LAD (left anterior descending coronary artery) samples were preserved in formalin-fixed, paraffin-embedded (FFPE) blocks. Liver samples were also collected. GPAA uses the liver samples to sequence genomes. Proteomics collaborators have developed techniques for extracting proteins from old FFPE blocks. This makes genomic and proteomic analysis possible today.