Concurrent Bioinformatics Software for Discovering Genome-Wide Patterns and Word-Based Genomic Signatures
Jens Lichtenberg, Kyle Kurz, Xiaoyu Liang, Rami Al-Ouran, Lev Neiman, Lee Nau, Joshua Welch, Edwin Jacox, Thomas Bitterman, Klaus Ecker, Laura Elnitski, Frank Drews, Stephen Lee, Lonnie Welch
The WordSeeker Tool
  • Enumeration: Suffix Tree and Suffix Array, Radix Tree
  • Scoring
  • Clustering: Sequence Clustering, Word Clustering
  • Conservation Analysis: PhastCons Score Extraction
  • Location Distributions
  • Sequence Coverage: minimum set of words necessary to cover all sequences
  • Module Discovery: Enumerative, Ranger
  • Markup: Basic Functional Elements
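The Sequence Coverage item asks for a minimum set of words that covers all sequences. The slides do not say how WordSeeker computes it. The sketch below is an assumption: it uses the standard greedy set-cover heuristic, which gives a small cover but not necessarily the minimum one. The function name and the sample data are hypothetical.

```python
def greedy_word_cover(sequences, words):
    """Greedily pick words until every coverable sequence contains a chosen word.

    Assumption: this is the textbook greedy set-cover heuristic, not
    necessarily the method used by WordSeeker.
    """
    # Map each word to the set of sequence indices that contain it.
    hits = {w: {i for i, s in enumerate(sequences) if w in s} for w in words}
    # Only sequences that contain at least one candidate word can be covered.
    uncovered = set().union(*hits.values())
    chosen = []
    while uncovered:
        # Take the word that covers the most still-uncovered sequences;
        # ties go to the alphabetically first word so the result is deterministic.
        best = min(hits, key=lambda w: (-len(hits[w] & uncovered), w))
        chosen.append(best)
        uncovered -= hits[best]
    return chosen


if __name__ == "__main__":
    # Hypothetical toy input, for illustration only.
    print(greedy_word_cover(["ACGT", "TTAC", "GGCG", "CGTT"], ["AC", "CG", "TT"]))
```

With this toy input, "CG" covers three of the four sequences, so it is picked first, and one more word is needed for the remaining sequence.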
Software Properties
  • Google Code repository: http://code.google.com/p/word-seeker/
  • GNU General Public License v3
  • Doxygen code generator (internal documentation)
  • SVN for command-line access: http://word-seeker.googlecode.com/svn/trunk
  • Requirements
      • G++ compiler version 4.1* or higher
      • OpenMP headers
      • MPI environment (distributed version)
  • For visualizations and other post-processing steps
      • Perl 5.8.8 with TFBS (http://tfbs.genereg.net/), Set::Scalar, LWP::Simple, Parallel::ForkManager, GD::Graph::bars, Algorithm::Cluster, and Bio::SeqIO (all available through CPAN)
      • Gnuplot version 4.2 or higher
Need for a Scalable Approach
  • Word Enumeration Module
      • Represents a set of biological input sequences based on some data structure
      • Keeps track of words, word counts, sequence counts, and word locations
      • Need to keep the data persistent in memory
  • Word Scoring Module
      • Determines statistical scores for each word
      • Frequent lookups for words and substrings of words
      • Example: a Markov model of order m requires lookups for all substrings of up to length m for all words
      • [...] lookups low
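To show why scoring needs so many substring lookups, here is a minimal sketch of an expected word count under a Markov model of order m. It is an assumption, not WordSeeker's scoring code: it uses the common estimator that multiplies the counts of all (m+1)-letter substrings of the word and divides by the counts of its interior m-letter substrings. The function names are hypothetical, and a plain Counter stands in for the suffix tree or radix trie.

```python
from collections import Counter


def count_words(sequences, length):
    """Count every overlapping word of the given length across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - length + 1):
            counts[seq[i:i + length]] += 1
    return counts


def markov_expected_count(word, order, sequences):
    """Expected count of `word` under an order-`order` Markov model.

    Assumed estimator: product of counts of the (order+1)-letter substrings
    divided by the product of counts of the interior order-letter substrings.
    """
    k = len(word)
    if not 1 <= order <= k - 2:
        raise ValueError("order must satisfy 1 <= order <= len(word) - 2")
    long_counts = count_words(sequences, order + 1)
    short_counts = count_words(sequences, order)
    numerator = 1.0
    for i in range(k - order):          # every (order+1)-letter substring
        numerator *= long_counts[word[i:i + order + 1]]
    denominator = 1.0
    for i in range(1, k - order):       # interior order-letter substrings
        denominator *= short_counts[word[i:i + order]]
    return numerator / denominator if denominator else 0.0
```

Each scored word triggers one lookup per substring, which is why the enumeration structure has to stay in memory and answer lookups cheaply.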
Enumeration Approaches
  • Total number of nucleotides in the input sequences: n
  • Word length: m
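As a point of reference for n and m, the sketch below enumerates all words of length m with a plain dictionary and records the word count, sequence count, and locations that the enumeration module is said to track. It is a simple stand-in under stated assumptions, not the suffix tree, suffix array, or radix trie implementation from the slides; the function name and record layout are hypothetical.

```python
from collections import defaultdict


def enumerate_words(sequences, m):
    """Index every overlapping word of length m.

    Returns a dict mapping each word to its total count, the number of
    sequences it occurs in, and its (sequence index, position) locations.
    A sequence of length L contributes L - m + 1 windows.
    """
    locations = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - m + 1):
            locations[seq[pos:pos + m]].append((seq_id, pos))
    return {
        word: {
            "count": len(locs),
            "sequence_count": len({seq_id for seq_id, _ in locs}),
            "locations": locs,
        }
        for word, locs in locations.items()
    }


if __name__ == "__main__":
    # Hypothetical toy input, for illustration only.
    for word, info in sorted(enumerate_words(["ACGAC", "GAC"], 2).items()):
        print(word, info["count"], info["sequence_count"], info["locations"])
```

Storing every location makes memory grow with n, which is one reason the choice of data structure matters for full genomes.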
Solution Parallelization
  • Distributed Solution: tasks executed on different nodes (distributed memory)
  • Multi-core Solution: tasks executed on different cores (shared memory)
Parallel Software Properties
  • Shared Memory: OpenMP parallelization
      • Simple, portable directives that compile even on unsupported architectures
      • Simple loops are run in parallel on multiple processors
  • Distributed Memory: MPI parallelization
      • Hardware optimizations and support for Fortran, C/C++, Perl
      • Each node is provided a subset of the data to process
      • "Smart" division of tasks is key
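The slides do not spell out how the data is divided among nodes, although Future Work mentions prefixes. As a hedged illustration only, the sketch below splits words by their leading prefix and assigns the heaviest prefixes first to the least-loaded node (a longest-processing-time heuristic). The heuristic, the function name, and the sample counts are all assumptions.

```python
from collections import Counter


def partition_by_prefix(word_counts, nodes, prefix_len=1):
    """Assign word prefixes to nodes so that the per-node load is roughly even.

    Load of a prefix = total occurrences of the words that start with it.
    Assumption: greedy heaviest-first assignment, not WordSeeker's scheme.
    """
    load = Counter()
    for word, count in word_counts.items():
        load[word[:prefix_len]] += count
    assignment = [[] for _ in range(nodes)]
    totals = [0] * nodes
    # Heaviest prefixes first; ties broken alphabetically for determinism.
    for prefix, weight in sorted(load.items(), key=lambda kv: (-kv[1], kv[0])):
        target = totals.index(min(totals))  # least-loaded node so far
        assignment[target].append(prefix)
        totals[target] += weight
    return assignment, totals


if __name__ == "__main__":
    # Hypothetical word counts, for illustration only.
    counts = {"AC": 5, "AG": 1, "CG": 4, "GA": 3, "TT": 2}
    print(partition_by_prefix(counts, nodes=2))
```

Longer prefixes give more, smaller tasks, which makes an even split easier when some prefixes are much more frequent than others.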
Results
  • Analyzed the Arabidopsis thaliana genome
      • All segments and the full genome
      • Multiple word lengths (1-20)
      • Searched top words against AGRIS (repository of known elements in A. thaliana)
  • Characterized the framework
      • Speedup and runtime analysis
      • Radix Trie and Suffix Tree
Memory Requirements for Arabidopsis thaliana Conducted at the Ohio Supercomputer Center
Execution Times for Arabidopsis thaliana
Analyzing the Parallel System
  • Speedup, efficiency, and timing using A. thaliana core promoter sequences
Shared and Distributed Memory Speedup (panels: Radix Trie, Suffix Tree)
Shared and Distributed Memory Efficiency (panels: Radix Trie, Suffix Tree)
Shared and Distributed Memory Performance (panels: Radix Trie, Suffix Tree)
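For readers without the plots, speedup and efficiency are assumed here to follow the usual definitions: speedup is the serial runtime divided by the parallel runtime, and efficiency is speedup divided by the number of workers. The slides do not state the formulas. The function name is hypothetical, and the timings in the usage line are made-up numbers, not results from the talk.

```python
def speedup_and_efficiency(serial_time, parallel_time, workers):
    """Return (speedup, efficiency) for one parallel run.

    speedup    = serial_time / parallel_time
    efficiency = speedup / workers   (1.0 would be ideal linear scaling)
    """
    if serial_time <= 0 or parallel_time <= 0 or workers < 1:
        raise ValueError("times must be positive and workers at least 1")
    speedup = serial_time / parallel_time
    return speedup, speedup / workers


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    print(speedup_and_efficiency(serial_time=120.0, parallel_time=40.0, workers=4))
```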
Scoring Speedup Contribution Runtime Scoring
Results: Pushing the limits
Summary
  • Parallel
      • Shared memory on single nodes
      • Distributed memory on 5 nodes
  • High-throughput
      • Full genomes analyzed in under 5 hours
  • Long word lengths
      • Genomes: approaching 20
      • Smaller files: often 100 or greater
  • Powerful analysis
      • Detailed statistics
      • Degeneracy via clustering
      • Additional post-processing (scatter plots, logos, etc.)
Future Work
  • Post-processing
      • Word distributions
      • Sequence clustering
      • GBrowse visualization
  • Further parallelization
      • Within a node
      • Greater distributed abstraction (more prefixes)
Questions?

Lichtenberg bosc2010 wordseeker

Editor's notes

  1. MPI: widely supported by network interface designers