#BeyondTheBench
#BECareer2013
#CurrentExchange

ORGANIZERS:

SPONSORS:
Establishing your online presence
Robert Aboukhalil
Why?
You’re being Googled
#1: LinkedIn
Why LinkedIn?
•  Online CV + networking
•  Recruiters use LinkedIn
•  Find jobs posted on LinkedIn
•  Apply to jobs
www.linkedin.com/pub/robert-aboukhalil/84/a648df/
#2: Facebook
#3: Twitter
#4: Your website
Step 1: Wordpress.com
Step 2: themeforest.net
Step 3: Have an awesome portfolio
Now what?
A language all scientists should know
How R helped me look at billions of genotypes and how it can
help you too
Mitchell Bekritsky
WSBS Graduate Student
What is R?
•  Language for statistical
analysis, data manipulation
and graphics
•  Open source
•  Flexible language
•  Powerful built-in functions
•  Strong user community
•  Publication quality graphs
•  Free!

Graphic from http://blenditbayes.blogspot.com/2013/06/visualising-crime-hotspots-in-england_25.html
Who uses R?

Source: http://www.revolutionanalytics.com/what-is-open-source-r/companies-using-r.php
What is R used for?
•  Movie recommendations
•  Clinical drug development
•  Credit risk analysis
•  News graphics
•  Tailoring online advertising
•  Modeling oil spills
•  Predicting economic activity
•  Predicting election outcomes
Graphic from http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html
But I’m a biologist…
How R helped me see my data
•  First time looking at microsatellite genotypes
•  How many microsatellites differ from reference genome?
•  By how much?
Problems:
–  Lots of data (4.7 million genotypes)
–  Complex information
–  Too big for Excel
–  No good graphics in Excel either
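The two questions on this slide (how many microsatellites differ from the reference, and by how much) come down to tabulating length differences. The talk did this in R; what follows is only a minimal Python sketch of the same tabulation. The input format (pairs of reference and observed lengths), the function name `summarize_differences`, and the toy data are all assumptions made for illustration, not the speaker's code or data.

```python
from collections import Counter


def summarize_differences(genotypes):
    """Tabulate how far each observed allele length is from the reference.

    genotypes: iterable of (reference_length_bp, observed_length_bp) pairs
    (a hypothetical input format chosen for this sketch).
    Returns (fraction_differing, Counter mapping difference in bp -> count).
    """
    diffs = Counter(observed - reference for reference, observed in genotypes)
    total = sum(diffs.values())
    differing = total - diffs[0]  # diffs[0] counts genotypes matching the reference
    fraction = differing / total if total else 0.0
    return fraction, diffs


# Toy data, invented for illustration only.
toy = [(8, 8), (8, 10), (12, 12), (20, 18), (15, 15), (8, 8), (30, 10), (9, 9)]
fraction, diffs = summarize_differences(toy)
print(f"{fraction:.1%} of genotypes differ from the reference")
for diff in sorted(diffs):
    # A crude text histogram: one '#' per genotype with this difference.
    print(f"{diff:+4d} bp  {'#' * diffs[diff]}")
```

Counting differences first and plotting second keeps the summary small even when the input runs to millions of genotypes, which is the situation the slide describes.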
One of my first graphs in R
Lessons learned about my data
•  Lots of microsatellites differ
from reference by a little bit
•  Thousands differ by ± 20 bp
•  8.27% of all microsatellites
differ from reference (~400k)
Lessons learned about my graph
•  This is a terrible graph
A bad R graph is better than no R graph
Bad graphs helped me
•  Understand my data better
•  Improve my analyses
•  Improve how I communicate
my data
•  R has incredible flexibility for
graphing—if you can dream it,
you can probably build it
My best R graphs make one point clearly without clutter
For example…
How R saved my thesis
•  Processing lots of sequencing
data in hundreds of people
•  Too many people and
processes to monitor all steps
of pipeline by eye while data
was being processed
Sanity check
•  After data processing did data
look bi-allelic?
No!!	
  
Troubleshooting using R
•  People don’t actually have massive deletions and amplifications
•  My pipeline was deleting files because of a bug, which would
remove large chunks of chromosomes
•  Thanks to R, I found people where this had happened, tracked
down the bug, and didn’t report massive CNVs in autistic children
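The slides do not show how the affected people were found. One plausible check, sketched here in Python under assumptions, is to compare each person's number of genotyped loci per chromosome against the cohort median and flag large shortfalls. The data layout, the function name `flag_low_coverage`, and the 50% threshold are hypothetical choices for this sketch.

```python
from statistics import median


def flag_low_coverage(counts, threshold=0.5):
    """Flag (person, chromosome) pairs with far fewer loci than is typical.

    counts: dict mapping person -> dict mapping chromosome -> number of
    genotyped loci (a hypothetical layout chosen for this sketch).
    A pair is flagged when its count is below threshold * cohort median
    for that chromosome; a missing chromosome counts as zero loci.
    """
    chromosomes = {c for per_person in counts.values() for c in per_person}
    flagged = []
    for chrom in sorted(chromosomes):
        typical = median(per_person.get(chrom, 0) for per_person in counts.values())
        for person in sorted(counts):
            if counts[person].get(chrom, 0) < threshold * typical:
                flagged.append((person, chrom))
    return flagged


# Toy cohort, invented for illustration only.
cohort = {
    "person_1": {"chr1": 1000, "chr2": 800},
    "person_2": {"chr1": 980, "chr2": 120},  # most of chr2 is missing
    "person_3": {"chr1": 1010, "chr2": 790},
}
print(flag_low_coverage(cohort))
```

A check like this points at missing data rather than at biology, which matches the slide's conclusion that the apparent deletions came from the pipeline and not from the people sequenced.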
Side note
•  If it looks too good to be true, it probably is
R helped me build a better genotyper
•  Some non-reference alleles
aren’t covered well
•  Leads to incorrect genotype
calls
Problem
•  How do I develop a smarter
genotyper and know that it
works?
Figure: chr19:54772760, A repeat, reference length 8. Axes: 8 bp allele coverage (0 to 100) and 10 bp allele coverage (0 to 100). Legend (genotypes): 10|-1, 10|10, 8|-1, 8|10, 8|8.
Modeling genotypes in R
•  Built a model for biased
genotypes in R
•  Model helped me build a more
accurate genotyper
•  When applied to real data,
clear improvements
R finds de novo mutations for me
•  >300 million genotypes
•  How do I find de novo mutations in all that data?

R to the rescue!
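The slides do not show how the search was done. As a rough illustration of the idea, a de novo candidate is a locus where a child carries an allele that neither parent has. The Python sketch below assumes genotypes are stored as pairs of allele lengths per locus; the function name `find_de_novo`, the locus names, and the toy trio are hypothetical, and a real analysis would also have to account for coverage and genotyping error.

```python
def find_de_novo(child, mother, father):
    """Return loci where the child carries an allele absent from both parents.

    Each argument maps a locus name to a tuple of two allele lengths in bp
    (a hypothetical representation chosen for this sketch).
    Loci missing from either parent are skipped rather than guessed at.
    """
    candidates = {}
    for locus, child_alleles in child.items():
        if locus not in mother or locus not in father:
            continue
        parental = set(mother[locus]) | set(father[locus])
        novel = sorted(set(child_alleles) - parental)
        if novel:
            candidates[locus] = novel
    return candidates


# Toy trio, invented for illustration only.
child = {"locus_a": (8, 10), "locus_b": (12, 12), "locus_c": (15, 17)}
mother = {"locus_a": (8, 8), "locus_b": (12, 14), "locus_c": (15, 15)}
father = {"locus_a": (8, 8), "locus_b": (12, 12), "locus_c": (15, 17)}
print(find_de_novo(child, mother, father))
```

Here only `locus_a` is reported, because the child's 10 bp allele appears in neither parent.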
What R has done for me
Data mining
•  Finding de novo mutations
•  Quality control for my data
Data manipulation
•  Converting raw read counts to genotypes
Data simulation and modeling
•  Finding ways to improve my genotyper
Data visualization
R has extensive support for biologists
Bioconductor is an incredible resource for biological analyses in R
•  Microarrays
•  Differential expression (DESeq, edgeR, cummeRbund)
•  Gene models
•  Flow cytometry (flowCore, flowStats, flowViz)
•  Interacting with Ensembl, Cosmic, Gramene, etc. (biomaRt)
Installing R
•  R can be downloaded from r-project.org
•  R runs on PCs, Macs and
Linux computers
•  The R project website has an
R manual to get you started
Working in R
Native R interface can be hard to
work with
•  Lots of windows
•  Difficult to keep things
organized
RStudio interface
•  All your variables, help pages,
script windows and consoles
in one place
•  Highlights R code for easier
programming
•  Tabbed windows for multiple
scripts
•  History saves all previous
commands, plot history saves
all previous plots
•  Find it at rstudio.com
Learning R
Many online tutorials
•  R has its own introduction
•  Statistics Using R with Biological Examples
Take interesting data, use it to explore R
•  Plot, graph, use statistical tests
Ask someone who knows R
•  Getting started is pretty easy
•  Learn what you need when you need it
Thanks!!
The Bioscience Enterprise Club is dedicated to helping CSHL’s science research
professionals and alumni cultivate and leverage their cross-disciplinary skill sets and
expertise to transition into diverse careers.
Current Exchange is CSHL’s very own student-run magazine. We feature articles about
science aimed at a general audience. Check out our inaugural issue at
issuu.com/currentexchange
Send your articles to raboukha@cshl.edu by November 5, 2013

Beyond The Bench Workshops
