Milko Krachunov (2), Ivan Popov (1), Valeria Simeonova (2), Irena Avdjieva (1), Paweł Szczęsny (3), Urszula Zelenkiewicz (3), Piotr Zelenkiewicz (3), Dimitar Vassilev (1)
(1) Bioinformatics group, AgroBioInstitute, Bulgaria
(2) Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, Bulgaria
(3) Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
Detection and correction of errors in metagenomic 16S RNA parallel sequencing
NGS errors – common problems
Errors are introduced into the assembled reads by imperfections of both biological and mathematical origin;
In metagenomic studies the same sample cannot be re-sequenced;
The error rate tends to increase at every step of the process;
There is no easy way to distinguish a “sequencing error” from a “rare variant”;
Many existing methods and algorithms address different aspects of the problem, but no unified solution is available;
Large amounts of data are difficult to process with common software.
Significance of 16S RNA sequencing
Highly conserved between different species of bacteria and archaea;
Sequence analysis is done with universal PCR primers;
Contains hypervariable regions that can provide species-specific signature sequences;
Suitable for phylogenetic studies;
Suitable for metagenomic studies.
General approach in metagenomic biodiversity studies
454 Sequencing
Filtering / Denoising
Multiple alignment
Distance matrix
OTU clusters with abundance count
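The last two steps of this general pipeline can be illustrated with a minimal sketch. It assumes the reads are already aligned to equal length with `-` as the gap symbol, uses a plain p-distance (fraction of mismatching non-gap positions), and groups reads greedily around the first read of each cluster. The function names, the distance measure and the greedy clustering are illustrative assumptions, not the tools or settings used in the project.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two aligned reads, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore positions where either read has a gap
        compared += 1
        if x != y:
            mismatches += 1
    # Reads with no comparable positions are treated as maximally distant.
    return mismatches / compared if compared else 1.0


def distance_matrix(reads):
    """Symmetric matrix of pairwise p-distances."""
    n = len(reads)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = p_distance(reads[i], reads[j])
    return dist


def greedy_otus(dist, cutoff):
    """Put each read into the first OTU whose seed read is within `cutoff`,
    otherwise start a new OTU with this read as the seed."""
    otus = []  # each OTU is a list of read indices; element 0 is the seed
    for i in range(len(dist)):
        for otu in otus:
            if dist[i][otu[0]] <= cutoff:
                otu.append(i)
                break
        else:
            otus.append([i])
    return otus


reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTACGGGG"]
otus = greedy_otus(distance_matrix(reads), cutoff=0.15)
abundance = [len(otu) for otu in otus]  # abundance count per OTU
```

The result of a greedy scheme like this depends on the order of the reads; it is shown only to make the "distance matrix, then OTU clusters with abundance count" step concrete.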
Our approach:
A. Raw data characteristics and processing
Two separate runs of metagenomic 16S RNA fragments, sequenced on the 454 platform and converted to FASTA format:
run 02 – 46429 short reads
run 04 – 41386 short reads
Our task: extract, denoise and correct only the quality reads.
[Figure: raw data length histograms for run 02 and run 04]
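A read-length histogram of this kind can be computed directly from the FASTA files. The following is a minimal sketch using only the Python standard library; the function names and the bin width are assumptions chosen for illustration, and the example records are made up rather than taken from either run.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(records, bin_width=50):
    """Count reads per length bin; keys are the lower bound of each bin."""
    counts = Counter()
    for _, seq in records:
        counts[(len(seq) // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Made-up example records; a real run would be read with open(path).
example = [">r1", "ACGT", "ACGT", ">r2", "AC", ">r3", "ACGTACGTACGT"]
histogram = length_histogram(read_fasta(example), bin_width=5)
```

The same histogram can then be used to choose length thresholds for keeping only the quality reads; any such thresholds would have to be set empirically.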
B. Correction with SHREC
C. Correction with our method:
Classification and performance evaluation
ClaMS parameters:
Distance cut-off: 0.05
Signature type: DBC
k-mer length: 3
Existing taxonomy: 4th Level
Aim of the method – idea outline
To deal with the heterogeneous nature of the data, similar or related sequences are given more importance in the error evaluation.
The naïve approach: if a base occurs at a position less often than the sequencer error rate, assume it is likely an error and replace it with the most common base.
Our modification: calculate the occurrence of the base among reads that are similar in the given region, either assigning them larger weights or using them exclusively.
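The naïve approach can be written down in a few lines. This is a minimal sketch that assumes the reads are already aligned to equal length and that a gap symbol, if present, is counted like any other character; the function name and the example error rate are illustrative assumptions, not the project's implementation.

```python
from collections import Counter


def naive_correct(reads, error_rate):
    """Replace any base whose column frequency is below `error_rate`
    with the most common base in that column."""
    if not reads:
        return []
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    corrected = [list(r) for r in reads]
    total = len(reads)
    for col in range(length):
        counts = Counter(r[col] for r in reads)  # counts from the original reads
        consensus = counts.most_common(1)[0][0]
        for row in corrected:
            if counts[row[col]] / total < error_rate:
                row[col] = consensus
    return ["".join(r) for r in corrected]


reads = ["ACGT"] * 9 + ["ACTT"]
fixed = naive_correct(reads, error_rate=0.2)
```

Because the frequencies are taken over all reads, a genuine rare variant from a low-abundance organism is treated exactly like a sequencing error, which is the weakness the modification is meant to address.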
Progress so far
Calculate the occurrence rates of every base among reads that are identical to the evaluated read within a window with a radius of n bases.
Preliminary results: the first basic implementation leads to an increase in the number of OTUs found with ClaMS.
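One possible reading of the window-based variant is sketched below. It assumes aligned reads of equal length, and it interprets "identical in a window" as matching the evaluated read at every position within `radius` bases of the evaluated position, excluding that position itself; this interpretation, the function names and the parameter values are assumptions, and the actual implementation may differ.

```python
from collections import Counter


def window_matches(a, b, pos, radius):
    """True if reads a and b agree on every position within `radius` of `pos`,
    not counting `pos` itself."""
    lo, hi = max(0, pos - radius), min(len(a), pos + radius + 1)
    return all(a[k] == b[k] for k in range(lo, hi) if k != pos)


def windowed_correct(reads, error_rate, radius):
    """For each base, compute its frequency only among reads that match the
    evaluated read in the surrounding window, and replace it with the most
    common base of that group if the frequency is below `error_rate`."""
    corrected = []
    for read in reads:
        out = list(read)
        for pos in range(len(read)):
            counts = Counter(
                other[pos]
                for other in reads
                if window_matches(read, other, pos, radius)
            )
            total = sum(counts.values())  # at least 1: a read matches itself
            if counts[read[pos]] / total < error_rate:
                out[pos] = counts.most_common(1)[0][0]
        corrected.append("".join(out))
    return corrected


# Two made-up populations that differ everywhere, plus one read from the
# first population carrying a G where its relatives have a C.
reads = ["AAAACAAAA"] * 10 + ["AAAAGAAAA"] + ["TTTTGTTTT"] * 10
result = windowed_correct(reads, error_rate=0.2, radius=2)
```

In this made-up example G is the most common base in the middle column overall, so a global frequency check would leave the odd read alone, while the windowed check compares it only with reads sharing its flanking bases and changes the G to C. The sketch compares every read with every other read at every position, so it is far too slow for tens of thousands of reads without indexing or a compiled implementation.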
Under development
Choosing a good approach (or approaches) for aligning the reads
Empirical evaluation of the parameters
Comparative evaluation of the variants of the approach
Software used in this project:
Python: http://www.python.org/
Cython: http://cython.org/
MEGA (Molecular Evolutionary Genetics Analysis): http://www.megasoftware.net/
Muscle: http://www.drive5.com/muscle/
SHREC (SHort Read Error Correction method): http://ww2.cs.mu.oz.au/~schroder/shrec_www/
ClaMS (Classifier for Metagenomic Sequences): http://clams.jgi-psf.org/
NINJA (modified): http://nimbletwist.com/software/ninja/index.html
R-package: http://www.r-project.org/
milko@3mhz.net
Thank you

Milko stat seq_toulouse


Editor's notes

  1. Last two change places?
  2. Anything additional?
  3. Def. title!
  4. One more additional slide?