Azhar Ali Shah @ Interdisciplinary Optimization and Decision Making Journal Club (IODMJC), March 20, 2009
Overview
Azhar A Shah, Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space
Introduction: authors
Introduction: Hierarchical Clustering
Introduction: Hierarchical Clustering
Introduction: about the topic
There is no guideline for selecting the best linkage method. In practice, people almost always use average linkage, i.e. UPGMA (Unweighted Pair Group Method using arithmetic Averages). Single linkage, by contrast, scales to large datasets because it requires only O(1) edges in memory at a time, BUT it is highly susceptible to outliers!
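The "O(1) edges in memory" property holds for single linkage. Single linkage can be computed Kruskal-style from a stream of edges that is already sorted by distance. Below is a minimal sketch. It assumes integer vertex IDs and edges given as (distance, u, v) tuples in ascending order. The function name is illustrative and does not come from the paper.

```python
from typing import Dict, Iterable, Iterator, Tuple

Edge = Tuple[float, int, int]  # (distance, u, v)


def single_linkage(sorted_edges: Iterable[Edge]) -> Iterator[Edge]:
    """Yield the edges that cause merges, reading edges in ascending order.

    Memory: one union-find entry per vertex plus the current edge; the edge
    stream itself (e.g. a pre-sorted file) is never held in memory.
    """
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for dist, u, v in sorted_edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge links two different clusters: merge them
            parent[rv] = ru
            yield dist, u, v


edges = [(0.3, 1, 3), (0.1, 1, 2), (5.0, 3, 4), (0.2, 2, 3)]
merges = list(single_linkage(sorted(edges)))  # in practice: read a pre-sorted file
print(merges)
```

The union-find table grows with the number of vertices. Beyond that partition, only the edge being examined is held in memory. Average linkage cannot be streamed this way, because every merge changes the averages of all pairs that involve the new cluster.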
Introduction: UPGMA
Introduction: UPGMA - Sparse input
N = 11 input singletons (vertices), {1, 2, 3, 4, 11, 12, 13, 14, 21, 22, 23}, and 14 edges in the sparse input. The input is considered sparse because not all pairs are given; e.g. there is no edge between 1 and 22. Vertices 1, 2, 3, 4 form clique A. Vertices 11, 12, 13, 14 lack only edge <11,14> to form clique B. Vertices 21, 22, 23 are loosely connected to each other and, through edge <1,23>, to clique A. In total the input graph has two connected components: {1, 2, 3, 4, 21, 22, 23}, producing 6 merges for 7 vertices, and {11, 12, 13, 14}, producing 3 merges for 4 vertices. The result is therefore a forest of two disjoint trees (9 merges) rather than a full tree of N-1 = 10 merges.
UPGMA input (distance, vertex, vertex):
90       23  1
70       23  22
50       22  21
30       14  13
20       14  12
12       13  12
11       13  11
1e+01    12  11
4e-10    4   3
1e-50    4   2
1e-80    3   2
2e-40    4   1
1e-40    3   1
1e-100   2   1
UPGMA tree (new cluster, merge height, child, child):
32  99.167    31  26
31  85        29  23
30  50        28  14
29  50        22  21
28  11.5      27  13
27  10        12  11
26  1.33e-10  25  4
25  5e-41     24  3
24  1e-100    2   1
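The sparse example can be reproduced with a naive average-linkage loop under two assumptions that the slide does not state:

- every missing pair counts as a fixed distance ψ = 100;
- merging stops once the best average reaches ψ.

Ties are broken in favour of the pair whose larger ID is smaller. Under these assumptions the sketch yields the nine merges of the UPGMA tree listing, including the final height (90 + 11·100)/12 ≈ 99.167. For clarity it rescans all stored pairs on every step; a practical implementation would keep candidates in a priority queue. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[float, int, int]        # (distance, u, v)
Merge = Tuple[int, float, int, int]  # (new cluster id, height, child a, child b)


def sparse_upgma(edges: List[Edge], psi: float) -> List[Merge]:
    """Average linkage on a sparse graph; missing pairs count as distance psi.

    Only pairs with at least one known edge are stored. Merging stops when the
    best average reaches psi, so disconnected components yield a forest.
    """
    size: Dict[int, int] = {}
    # links[a][b] = [sum of known distances, number of known distances]
    links: Dict[int, Dict[int, list]] = defaultdict(dict)
    for d, u, v in edges:
        size[u] = size[v] = 1
        links[u][v] = [d, 1]
        links[v][u] = [d, 1]

    def average(a: int, b: int) -> float:
        total, known = links[a][b]
        pairs = size[a] * size[b]
        return (total + (pairs - known) * psi) / pairs

    merges: List[Merge] = []
    next_id = max(size) + 1
    while True:
        best = None
        for a in links:
            for b in links[a]:
                if a > b:  # each unordered pair once, larger id first
                    key = (average(a, b), a, b)
                    if best is None or key < best:
                        best = key
        if best is None or best[0] >= psi:
            break
        height, a, b = best
        new = next_id
        next_id += 1
        size[new] = size[a] + size[b]
        # Sums and counts towards any third cluster x simply add up.
        for old in (a, b):
            for x, (total, known) in links.pop(old).items():
                if x in (a, b):
                    continue
                del links[x][old]
                rec = links[new].setdefault(x, [0.0, 0])
                rec[0] += total
                rec[1] += known
                links[x][new] = rec
        merges.append((new, height, a, b))
    return merges


edges = [(90, 23, 1), (70, 23, 22), (50, 22, 21), (30, 14, 13), (20, 14, 12),
         (12, 13, 12), (11, 13, 11), (1e+01, 12, 11), (4e-10, 4, 3),
         (1e-50, 4, 2), (1e-80, 3, 2), (2e-40, 4, 1), (1e-40, 3, 1),
         (1e-100, 2, 1)]
tree = sparse_upgma(edges, psi=100.0)
for new_id, height, a, b in tree:
    print(new_id, f"{height:.5g}", a, b)
```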
Research Problem: UPGMA
This data renders UPGMA impractical.
Methodology: 1) Sparse-UPGMA
Sparse-UPGMA still cannot cope with huge datasets, where even an O(E) memory requirement is intolerable (e.g. Table 1).
UPGMA (mean):
New eq:
Time and memory improvement:
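The two equations on this slide did not survive extraction. Below is a hedged reconstruction, not the paper's own notation; the symbols S_AB, n_AB and ψ are introduced here.

- The first line is the standard UPGMA mean.
- The second line is a sparse form. It assumes each of the |A||B| - n_AB missing pairs contributes a fixed value ψ. With ψ = 100 this matches the heights in the sparse example tree; for the last merge, (90 + 11·100)/12 ≈ 99.167.
- The third line is the update rule when A and B merge.

```latex
\begin{align*}
d(A,B) &= \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a,b) \\
d_{\psi}(A,B) &= \frac{S_{AB} + \bigl(|A|\,|B| - n_{AB}\bigr)\,\psi}{|A|\,|B|},
  \qquad S_{AB} = \sum_{\substack{(a,b) \in E\\ a \in A,\ b \in B}} d(a,b),
  \qquad n_{AB} = \bigl|\{(a,b) \in E : a \in A,\ b \in B\}\bigr| \\
S_{A \cup B,\,X} &= S_{AX} + S_{BX}, \qquad n_{A \cup B,\,X} = n_{AX} + n_{BX}
\end{align*}
```

The sums and counts of a merged cluster are obtained by addition. Therefore only pairs with n_AB > 0 need to be stored, which keeps memory at O(E). A pair with no edges has d_ψ = ψ and is never merged below ψ.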
Methodology: 2) Multi-Round MC-UPGMA
Illustration of non-metric constraints imposed by BLAST sequence similarities (edges): false transitivity is possible due to CSKP_HUMAN.
Methodology: 2) Multi-Round MC-UPGMA
Methodology: 2) Multi-Round MC-UPGMA
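This sketch shows why loading only part of the edges can still certify a merge. It assumes that every edge with distance below a threshold θ is in memory and that every unseen pair lies between θ and ψ. Under that assumption, the true average of any cluster pair is bracketed by the two bounds computed below. A candidate merge is then safe if its upper bound does not exceed every other pair's lower bound.

This is a simplified illustration of the bounding idea only. It does not reproduce the paper's multi-round procedure, including how θ is chosen per round. `PairStats`, `average_bounds` and `safe_to_merge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PairStats:
    """Loaded edges between two clusters (hypothetical helper, not from the paper)."""
    size_a: int
    size_b: int
    loaded_sum: float   # sum of the loaded distances, all below theta
    loaded_count: int   # number of loaded edges between the two clusters


def average_bounds(p: PairStats, theta: float, psi: float) -> Tuple[float, float]:
    """Bracket the true average linkage when unseen pairs lie in [theta, psi]."""
    pairs = p.size_a * p.size_b
    unseen = pairs - p.loaded_count
    lower = (p.loaded_sum + unseen * theta) / pairs
    upper = (p.loaded_sum + unseen * psi) / pairs
    return lower, upper


def safe_to_merge(candidate: PairStats, others: List[PairStats],
                  theta: float, psi: float) -> bool:
    """True if no other pair can have a smaller true average than the candidate.

    Pairs with no loaded edge at all average at least theta, hence the extra
    theta term in the minimum.
    """
    _, cand_upper = average_bounds(candidate, theta, psi)
    floor = min([theta] + [average_bounds(o, theta, psi)[0] for o in others])
    return cand_upper <= floor


# All four distances between two 2-member clusters are loaded, so their
# average is exact; the rival pair still has one unseen distance.
exact = PairStats(2, 2, 0.5, 4)
rival = PairStats(2, 1, 0.5, 1)
print(average_bounds(rival, theta=1.0, psi=100.0))
print(safe_to_merge(exact, [rival], theta=1.0, psi=100.0))
```

Only pairs whose edges are all loaded have tight bounds. The unseen terms weighted by ψ make the upper bound of any partially loaded pair too loose to certify.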
Methodology: 2) Single-Round MC-UPGMA
Requires O(N) memory to hold the tree as it forms!
Methodology: 2)  Single-Round MC-UPGMA
Methods
Methods: Jaccard Score
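The slide names the Jaccard score, but its definition was lost in extraction. The sketch below assumes the standard set form J(A, B) = |A ∩ B| / |A ∪ B|. It uses this score to compare a computed cluster with a reference protein family, and scores each family by its best-matching cluster. This evaluation setup is an assumption. The function names and protein IDs are hypothetical.

```python
from typing import Hashable, Iterable, List, Set


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Size of the intersection over size of the union; 0.0 for two empty inputs."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0  # convention chosen here for the degenerate case
    return len(sa & sb) / len(union)


def best_match(family: Set[Hashable], clusters: List[Set[Hashable]]) -> float:
    """Highest Jaccard score of a reference family against any cluster."""
    return max((jaccard(family, c) for c in clusters), default=0.0)


family = {"P1", "P2", "P3"}                    # hypothetical protein IDs
clusters = [{"P1", "P2"}, {"P3", "P4", "P5"}]
print(jaccard({"P1", "P2", "P3"}, {"P2", "P3", "P4"}))  # 2 shared out of 4 in total
print(best_match(family, clusters))
```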
Results
Results (figure labels: Smith–Waterman, BLAST, Sparse UPGMA with reduced dataset, 220K, 1.80M)
Results
200 clustering rounds on a single workstation with 4 GB of memory and 4 CPUs took about 1-2 days.
Results
Observations
Cluster Card Page
View Proteins of Cluster
Keywords Appearances
Cluster Similarity Distribution
Similarity matrix for the proteins in this cluster
Presentation 2009 Journal Club Azhar Ali Shah

  • 1. Azhar Ali Shah @ Interdisciplinary Optimization and Decision Making Journal Club (IODMJC), March 20, 2009
  • 3. Introduction: authors Azhar A Shah Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space /31
  • 4. Introduction: Hierarchical Clustering
  • 6. Introduction: about the topic. There is no guideline for selecting the best linkage method; in practice, people almost always use average linkage: UPGMA (Unweighted Pair Group Method using arithmetic Averages). Scalable to large datasets as it requires only O(1) edges in memory. BUT highly susceptible to outliers!
  • 8. Introduction: UPGMA - sparse input. N = 11 input singletons (vertices): {1,2,3,4,11,12,13,14,21,22,23} and 14 edges in the sparse input. The input is considered sparse because not all pairs are given; for example, there is no edge between 1 and 22. Vertices 1,2,3,4 form clique A. Vertices 11,12,13,14 lack only edge <11,14> to form clique B. Vertices 21,22,23 are loosely connected to each other and to clique A. In total the input graph has two connected components: {1,2,3,4,21,22,23} (producing 6 merges for 7 vertices) and {11,12,13,14} (producing 3 merges for 4 vertices). The output is therefore a forest of two disjoint trees (9 merges) rather than the full tree of N-1 = 10 merges.
    UPGMA-input (distance, vertex, vertex): 90 23 1; 70 23 22; 50 22 21; 30 14 13; 20 14 12; 12 13 12; 11 13 11; 1e+01 12 11; 4e-10 4 3; 1e-50 4 2; 1e-80 3 2; 2e-40 4 1; 1e-40 3 1; 1e-100 2 1
    UPGMA-tree (new cluster, merge distance, child, child): 32 99.167 31 26; 31 85 29 23; 30 50 28 14; 29 50 22 21; 28 11.5 27 13; 27 10 12 11; 26 1.33e-10 25 4; 25 5e-41 24 3; 24 1e-100 2 1
  • 10. Methodology: 1) Sparse-UPGMA. Cannot cope with huge datasets, where an O(E) memory requirement is intolerable (e.g. Table 1). UPGMA (mean): New eq: Time and memory improvement:
  • 14. Methodology: 2) Single-Round MC-UPGMA. Requires O(n) memory for holding the forming tree!
  • 15. Methodology: 2) Single-Round MC-UPGMA
  • 19. Results: Smith–Waterman; BLAST; Sparse UPGMA with reduced dataset; 220K; 1.80M
  • 20. Results: 200 clustering rounds on a single 4-CPU workstation with 4 GB of memory took about 1-2 days.
  • 25. View Proteins of Cluster
  • 28. Similarity matrix for the proteins in this cluster
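Slide 6 contrasts linkage methods. For reference, the three most common criteria differ only in how they summarize the pairwise distances between two clusters. This minimal sketch uses hypothetical toy data (two clusters of 1-D points) and a hypothetical `linkage` helper. It is not code from the paper.

```python
from statistics import mean


def linkage(a, b, dist, method="average"):
    """Distance between clusters a and b under the given linkage criterion."""
    pairs = [dist(x, y) for x in a for y in b]  # all |a|*|b| cross-cluster distances
    if method == "single":    # nearest pair
        return min(pairs)
    if method == "complete":  # farthest pair
        return max(pairs)
    if method == "average":   # UPGMA: arithmetic mean over all pairs
        return mean(pairs)
    raise ValueError(f"unknown linkage method: {method!r}")


def dist(x, y):
    """Toy distance on the real line (hypothetical)."""
    return abs(x - y)


A = [0.0, 1.0]   # hypothetical toy clusters
B = [3.0, 10.0]

if __name__ == "__main__":
    for m in ("single", "complete", "average"):
        print(m, linkage(A, B, dist, m))
```

Switching `method` is the only change between criteria. The surrounding agglomerative loop (repeatedly merging the closest pair of clusters) stays the same.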
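The sparse example on slide 8 can be reproduced with a deliberately naive sketch. It rests on two assumptions that the slide does not state. First, every vertex pair with no input edge is assigned a fixed distance ψ = 100. That value reproduces every merge height in the UPGMA-tree list: for example, 85 = (70 + 100)/2 for {21,22} vs {23}. Second, two clusters may merge only if at least one input edge joins them, which is what leaves a two-tree forest. `PSI`, `average_linkage`, and `sparse_upgma` are hypothetical names. The code rescans every cluster pair each round, so it illustrates the output only and is not the authors' Sparse-UPGMA.

```python
from itertools import combinations

# Assumption: every vertex pair absent from the input gets this distance (psi).
# psi = 100 reproduces the merge heights in the UPGMA-tree list on slide 8.
PSI = 100.0

# (distance, vertex, vertex) triples from the UPGMA-input list on slide 8.
EDGES = [
    (90, 23, 1), (70, 23, 22), (50, 22, 21), (30, 14, 13), (20, 14, 12),
    (12, 13, 12), (11, 13, 11), (1e+01, 12, 11), (4e-10, 4, 3), (1e-50, 4, 2),
    (1e-80, 3, 2), (2e-40, 4, 1), (1e-40, 3, 1), (1e-100, 2, 1),
]


def average_linkage(a, b, dist, psi):
    """Mean distance over all |a|*|b| member pairs, missing pairs counted as psi.

    Returns None if no input edge joins a and b: such clusters are never merged.
    """
    total, known = 0.0, 0
    for x in a:
        for y in b:
            d = dist.get(frozenset((x, y)))
            if d is None:
                total += psi
            else:
                total += d
                known += 1
    return total / (len(a) * len(b)) if known else None


def sparse_upgma(edges, psi=PSI):
    """Naive UPGMA over a sparse edge list; rescans every cluster pair each round."""
    dist = {frozenset((i, j)): float(d) for d, i, j in edges}
    vertices = sorted({v for _, i, j in edges for v in (i, j)})
    clusters = {v: [v] for v in vertices}  # cluster id -> member vertices
    next_id = max(vertices) + 1            # new ids continue after the vertex ids
    merges = []                            # (new id, merge distance, child, child)
    while True:
        best = None
        for ca, cb in combinations(list(clusters), 2):
            d = average_linkage(clusters[ca], clusters[cb], dist, psi)
            if d is not None and (best is None or d < best[0]):
                best = (d, ca, cb)
        if best is None:                   # no connected pair left: a forest remains
            return merges, clusters
        d, ca, cb = best
        clusters[next_id] = clusters.pop(ca) + clusters.pop(cb)
        merges.append((next_id, d, ca, cb))
        next_id += 1


if __name__ == "__main__":
    merges, roots = sparse_upgma(EDGES)
    for new_id, height, a, b in reversed(merges):
        print(new_id, f"{height:.4g}", a, b)
    print("roots:", {r: sorted(m) for r, m in roots.items()})
```

Running it yields nine merges with the same heights as the slide's UPGMA-tree list, and two root clusters: {1,2,3,4,21,22,23} and {11,12,13,14}. Ties are broken by scan order, so the two merges at distance 50 receive ids 29 and 30 in the opposite order from the slide.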
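The formulas on slide 10 were images and cannot be recovered from the transcript. For reference, the standard UPGMA (mean) linkage and its size-weighted update are given below. They are followed by an assumed sparse form in which every missing pair counts as a constant ψ. This form is not necessarily the slide's "New eq", but with ψ = 100 it reproduces the merge heights listed on slide 8. It also needs only a running sum S and an edge count e per connected cluster pair, so memory stays proportional to the number of input edges E.

```latex
% Standard UPGMA (average linkage) between clusters A and B
d(A,B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a,b)

% Size-weighted update after merging A and B, for any other cluster C
d(A \cup B, C) = \frac{|A|\,d(A,C) + |B|\,d(B,C)}{|A| + |B|}

% Assumed sparse form: S(A,B) is the sum of the e(A,B) known edge distances
% between A and B; every missing pair contributes the constant \psi
d_\psi(A,B) = \frac{S(A,B) + \bigl(|A|\,|B| - e(A,B)\bigr)\,\psi}{|A|\,|B|},
\qquad S(A \cup B, C) = S(A,C) + S(B,C), \quad e(A \cup B, C) = e(A,C) + e(B,C)

% Check against slide 8 with \psi = 100: merging {21,22,23} with {1,2,3,4}
d_\psi = \frac{90 + (12 - 1)\cdot 100}{12} = \frac{1190}{12} \approx 99.167
```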
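Slide 14 notes that single-round MC-UPGMA needs O(n) memory just to hold the forming tree. This follows from the shape of a merge tree: n leaves plus at most n - 1 internal nodes. The minimal sketch below assumes a plain parent-pointer representation; the class `MergeTree` is hypothetical and not taken from the paper. It stores the tree in two arrays of 2n - 1 slots, so its footprint does not grow with the number of edges streamed through it.

```python
class MergeTree:
    """Parent-pointer store for a forming merge tree over n leaves (hypothetical helper).

    Leaves are 0..n-1; internal nodes get ids n, n+1, ... in merge order.
    A binary tree over n leaves has at most n - 1 internal nodes, so two
    arrays of 2n - 1 slots suffice no matter how many edges are processed.
    """

    def __init__(self, n):
        self.parent = list(range(2 * n - 1))  # every node starts as its own root
        self.height = [0.0] * (2 * n - 1)     # merge distance of each internal node
        self.next_id = n

    def root(self, x):
        # No path compression, so `parent` remains the exact dendrogram.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def merge(self, a, b, h):
        """Join the trees containing a and b under a new node at height h."""
        ra, rb = self.root(a), self.root(b)
        if ra == rb:
            raise ValueError("a and b are already in the same tree")
        new = self.next_id
        self.parent[ra] = self.parent[rb] = new
        self.height[new] = h
        self.next_id += 1
        return new


if __name__ == "__main__":
    t = MergeTree(4)
    t.merge(0, 1, 0.5)   # node 4
    t.merge(2, 3, 1.0)   # node 5
    t.merge(0, 2, 2.0)   # node 6 joins both subtrees
    print(t.root(1), t.height[6], len(t.parent))  # 6 2.0 7
```

Skipping path compression keeps `parent` equal to the dendrogram itself. The cost is that `root` walks the full depth of the tree, which is acceptable for a sketch.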