ISMU pipeline for NGS data analysis and facilitating molecular breeding
http://hpc.icrisat.cgiar.org/NGS/
Challenges
• Short read lengths
• Large number of available tools
• Tools are platform-dependent and command-line driven
• No direct way to predict SNPs between genotypes
• Quality score formats vary with sequencing technology and version
ISMU version 1
• SNP discovery from NGS data
– Pipeline for mapping / assembling
– Calling SNPs between genotypes
– Visualisation
ISMU version 2
• Application of identified SNPs to breeding
Objectives of NGS Pipeline
• Benchmark available open-source short-read assembly and downstream analysis software.
• Assembly and polymorphism detection between genotypes, with visualization.
• Assay design (Illumina GoldenGate Assay), genotype calling, and visualization and analysis of SNP genotyping and haplotype data.
• Identify parental lines for use in marker-assisted backcrossing (MABC) or marker-assisted recurrent selection (MARS).
• Discover SNP markers for foreground and background selection in MABC or MARS.
• Document the pipeline and the integrated software.
Control Flowchart
(Flowchart steps, in the order shown on the slide)
• Input data & validation (decision: ICRISAT crops? Yes / No)
• Upload reference & data
• Mapping (Maq, Novo) → mapped reads
• Assembly visualization
• Consensus calling
• Report SNPs
  – Extract sequences with SNPs
  – Design primers
  – In silico validation by SNP2CAPS
• Database
• ADT score
• GoldenGate assay
• BeadStudio
• Flapjack
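Standard Methodology
Genotype 1 and Genotype 2 are each mapped to the reference (Maq, Novo) and assembled, and the programme calls SNPs against the reference. Example of predicted SNPs against the reference:
Chrom1  Pos  RefAllele  Gtyp1  Gtyp2
5       303  A          G      ?
The "?" marks a position where the second genotype's base is not reported, which is why base calling in the 2nd genotype is needed.
SNPs between genotypes (in-house steps):
• Remove duplicates
• Check the inverse combination
• Compare alleles between genotypes
• Base calling in 2nd genotype
• ADT scoring
• Reporting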
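The "?" in the example row arises when two SNP-versus-reference lists are merged. The sketch below is a hypothetical illustration of that merge, not ISMU's in-house script: the function name `merge_snp_calls` and the `(chrom, pos) -> allele` layout are assumptions, and reading "remove duplicates" as taking the union of positions is a guess.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (chrom, pos) -> allele. Each genotype's dict holds only
# the positions where that genotype was called as differing from the reference.
Position = Tuple[str, int]
Row = Tuple[str, int, str, str, str]


def merge_snp_calls(
    reference: Dict[Position, str],
    gtyp1: Dict[Position, str],
    gtyp2: Dict[Position, str],
) -> List[Row]:
    """Combine two SNP-vs-reference lists into (chrom, pos, ref, gtyp1, gtyp2) rows.

    A position reported for only one genotype gets '?' for the other: the
    caller did not say whether that genotype matches the reference or simply
    lacked coverage there, so its base still has to be called.
    """
    rows: List[Row] = []
    # Taking the union of positions lists each position once, even when both
    # genotypes reported it.
    for key in sorted(set(gtyp1) | set(gtyp2)):
        allele1 = gtyp1.get(key, "?")
        allele2 = gtyp2.get(key, "?")
        if allele1 == allele2:
            continue  # same non-reference allele in both: no SNP between them
        chrom, pos = key
        rows.append((chrom, pos, reference[key], allele1, allele2))
    return rows


if __name__ == "__main__":
    ref = {("chr1", 303): "A", ("chr1", 410): "C"}
    g1 = {("chr1", 303): "G", ("chr1", 410): "T"}
    g2 = {("chr1", 410): "T"}
    print(merge_snp_calls(ref, g1, g2))  # [('chr1', 303, 'A', 'G', '?')]
```

The "?" rows are exactly the positions that the customized methodology addresses by calling a consensus base in both genotypes.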
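Customized Methodology (Consensus Base Calling, cc)
Genotype 1 and Genotype 2 are each mapped (ccMaq, ccNovo); an in-house script performs consensus base calling, followed by SNP calling between the genotypes and ADT scoring.
Example major base frequencies (fmaj):
• Genotype 1: fmaj = 38/40 = 0.95
• Genotype 2: fmaj = 21/28 = 0.75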
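The fmaj figures above can be reproduced with a short sketch. It reads fmaj as the count of the most frequent base divided by the read depth at the position, which matches 38/40 and 21/28. It also assumes that a SNP between genotypes is reported when both consensus bases pass the 0.75 default threshold and differ. The function names and the specific bases in the pileups are hypothetical; the slide gives only the fractions.

```python
from collections import Counter
from typing import Optional, Tuple


def major_base(bases: str) -> Tuple[str, float]:
    """Most frequent base at a position and its frequency fmaj = count / depth."""
    if not bases:
        raise ValueError("no reads cover this position")
    base, n_major = Counter(bases).most_common(1)[0]
    return base, n_major / len(bases)


def snp_between_genotypes(
    bases_g1: str, bases_g2: str, min_fmaj: float = 0.75
) -> Optional[Tuple[str, str]]:
    """Return (base_g1, base_g2) if both consensus bases are confident and differ."""
    b1, f1 = major_base(bases_g1)
    b2, f2 = major_base(bases_g2)
    if f1 < min_fmaj or f2 < min_fmaj:
        return None  # at least one genotype lacks a confident consensus base
    return (b1, b2) if b1 != b2 else None


if __name__ == "__main__":
    g1 = "A" * 38 + "G" * 2   # 38 of 40 reads show A (hypothetical bases)
    g2 = "G" * 21 + "A" * 7   # 21 of 28 reads show G (hypothetical bases)
    print(major_base(g1), major_base(g2))  # ('A', 0.95) ('G', 0.75)
    print(snp_between_genotypes(g1, g2))   # ('A', 'G')
```

Note that Genotype 2 sits exactly on the 0.75 cut-off, so whether the threshold is inclusive decides whether this position is called.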
Consensus Base Calling Parameters (Default)
• Max number of mismatches <= 7
• Sum of mismatch scores <= 60
• Min mapping quality >= 0
• Read depth threshold >= 5
• Major base frequency threshold >= 0.75
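The defaults above act as a read-level filter followed by position-level checks. The sketch below applies them in that order under stated assumptions. The `AlignedBase` record and its field names are hypothetical. Treating "mismatches" and "mismatch score" as per-read totals is an assumption; the mismatch score may, for example, be a sum of base qualities at mismatching positions. The ordering (filter reads first, then check depth and fmaj) is also assumed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignedBase:
    """One read's base at a reference position (hypothetical record)."""
    base: str
    mismatches: int      # assumed: mismatches in the read's whole alignment
    mismatch_score: int  # assumed: summed mismatch score for the read
    mapq: int            # mapping quality of the read


def consensus_base(
    pileup: List[AlignedBase],
    max_mismatches: int = 7,
    max_mismatch_score: int = 60,
    min_mapq: int = 0,
    min_depth: int = 5,
    min_fmaj: float = 0.75,
) -> Optional[str]:
    """Return the consensus base, or None if the position fails a threshold."""
    kept = [
        r.base
        for r in pileup
        if r.mismatches <= max_mismatches
        and r.mismatch_score <= max_mismatch_score
        and r.mapq >= min_mapq
    ]
    if len(kept) < min_depth:
        return None  # read depth below threshold after filtering
    base, n_major = Counter(kept).most_common(1)[0]
    if n_major / len(kept) < min_fmaj:
        return None  # no base is frequent enough
    return base


if __name__ == "__main__":
    good_t = AlignedBase("T", 1, 20, 30)
    good_c = AlignedBase("C", 1, 20, 30)
    noisy_c = AlignedBase("C", 9, 80, 30)  # fails the mismatch filters
    print(consensus_base([good_t] * 6 + [good_c] * 2))  # T  (fmaj = 6/8 = 0.75)
    print(consensus_base([good_t] * 4 + [noisy_c]))     # None (depth 4 < 5)
```

With a minimum mapping quality of 0, that filter keeps every read, so in practice only the mismatch limits, depth and fmaj thresholds remove data at the defaults.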
What if more than 2 genotypes?
Genotype1, Genotype2, Genotype3, Genotype4
Pairwise comparison matrix (1 = pair compared):
      G1  G2  G3
G1    0   1   1
G2    0   0   1
G3    0   0   0
Combination of genotypes = (n² - n)/2 = n(n - 1)/2
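The 1s above the diagonal are exactly the unordered pairs of genotypes, so the count (n² - n)/2 can be checked directly. This is a minimal sketch using Python's standard `itertools.combinations`; the function names are illustrative only.

```python
from itertools import combinations
from typing import List, Tuple


def genotype_pairs(genotypes: List[str]) -> List[Tuple[str, str]]:
    """Unordered genotype pairs: the cells above the diagonal of the matrix."""
    return list(combinations(genotypes, 2))


def n_pairs(n: int) -> int:
    """Number of pairwise comparisons, (n^2 - n) / 2."""
    return (n * n - n) // 2


if __name__ == "__main__":
    print(genotype_pairs(["G1", "G2", "G3"]))  # [('G1', 'G2'), ('G1', 'G3'), ('G2', 'G3')]
    print(n_pairs(4))  # 6
```

For the four genotypes listed, the pipeline would therefore need six pairwise SNP comparisons.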
NGS pipeline input data
• Read formats
  – .fna and .qual
  – (Standard/Sanger) FASTQ
  – SCARF format
  – Solexa FASTQ, Solexa export
  – AB SOLiD read format
  – FASTA
• Reference sequences
  – Chickpea transcript assembly
  – Pearl millet transcript assembly
  – Pigeonpea transcript assembly
  – Medicago genome
  – Sorghum genome
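Accepting both Sanger and Solexa FASTQ is one reason quality scores must be normalized before filtering. Sanger FASTQ stores Phred scores with an ASCII offset of 33. Older Solexa FASTQ stores Solexa-scaled scores, -10·log10(p/(1-p)), with an offset of 64. A minimal sketch of decoding both onto the Phred scale follows. It illustrates the standard conversion and is not taken from the ISMU code.

```python
import math
from typing import List


def phred_from_sanger(qual: str) -> List[int]:
    """Sanger FASTQ: Phred score = ASCII code - 33."""
    return [ord(c) - 33 for c in qual]


def phred_from_solexa(qual: str) -> List[float]:
    """Solexa FASTQ: Solexa score = ASCII code - 64, then map to the Phred scale.

    Solexa scores are -10*log10(p / (1 - p)) rather than -10*log10(p), so
    Q_phred = 10*log10(10**(Q_solexa / 10) + 1).
    """
    phred = []
    for c in qual:
        q_solexa = ord(c) - 64
        phred.append(10 * math.log10(10 ** (q_solexa / 10) + 1))
    return phred


if __name__ == "__main__":
    print(phred_from_sanger("!I"))                # [0, 40]
    print(round(phred_from_solexa("@")[0], 2))    # 3.01  (Solexa 0 -> Phred ~3)
    print(round(phred_from_solexa("h")[0], 2))    # 40.0  (the scales converge at high quality)
```

Illumina 1.3 to 1.7 FASTQ files store Phred scores with offset 64, so they need only the offset change, not the logarithmic conversion.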
NGS pipeline (Input 1)
http://hpc.icrisat.cgiar.org/NGS/
NGS pipeline (Input 2)
NGS pipeline (Help page)
NGS pipeline (Results)
NGS pipeline (Visualisation)
Pipeline Editions
Available in 2 editions:
1. Server Edition
2. Desktop Edition
Server Edition
• User-friendly web interface
  – Installs on the following Linux platforms:
    • Fedora 13
    • CentOS 5
  – Clients can use any OS with a web browser
• Communication resources
  – SMTP (email)
• Session-specific job processing
  – Avoids files being overwritten
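The slide does not say how the Server Edition isolates sessions. One common approach, shown here as a hypothetical sketch, gives every job its own uniquely named working directory so that concurrent users writing files with the same name cannot overwrite each other. The function name `create_session_dir` and the UUID-based naming are assumptions.

```python
import tempfile
import uuid
from pathlib import Path


def create_session_dir(base_dir: str) -> Path:
    """Create a fresh working directory for one job/session (hypothetical helper)."""
    session_dir = Path(base_dir) / uuid.uuid4().hex
    # exist_ok=False turns an (extremely unlikely) name collision into an error
    # instead of silently sharing a directory with another session.
    session_dir.mkdir(parents=True, exist_ok=False)
    return session_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        job_a = create_session_dir(root)
        job_b = create_session_dir(root)
        # Both jobs can write "snps.txt" without clobbering each other.
        (job_a / "snps.txt").write_text("job A results\n")
        (job_b / "snps.txt").write_text("job B results\n")
        print((job_a / "snps.txt").read_text().strip())  # job A results
```

A session identifier like this could also be included in the notification email, so each user receives the location of their own results.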
Desktop Edition
• All functionality of the Server Edition on a desktop
• Supported OS
  – Fedora 13
  – RHEL 5
• Single-command installation
• Available on an installable CD
Future plans
• Consider new tools to integrate or update, e.g. BWA, Bowtie
• Implement extensions to the pipeline
• Evaluate cloud computing and high-performance computing cluster options
• Initiatives such as iPlant (Discovery Environment, genotype to phenotype)
Future Plans: ISMU v2
• Identification of appropriate modules for MARS, GWS and GBS
• Integration of MARS and GWS modules
• Linking of the ISMU pipeline with the DMS of IBP
• Documentation & training for the ISMU pipeline
NGS Data Analysis pipeline at ICRISAT: Architecture
(Components shown in the architecture diagram)
• Internet
• Apache server hosting web pages (CGI)
• Perl programs
• Velvet, Maq, Novo
• Reference sequences
• Input data validation
• Assembly visualization
• SNP database (dynamic querying)
• File downloading
• SMTP server
Contributors
• Rajeev K. Varshney
• Abhishek Rathore
• Jayashree B
• Vivek Thakur
• R. Pradeep
• A. Bhanu Prakash
• Sarwar Azam
• G. Meenakshi
• David Marshall
• Iain Milne
• Jonathan Jones
• David Studholme
• Greg May
• Andrew Farmer
• Jimmy Woodward
• Dave Edwards

GRM 2011: ISMU pipeline for NGS data analysis and facilitating molecular breeding
