Data Marts Integrate the Proteome
Jay Vyas
The Information Content of the Proteome
Knowledge, Information, Data
1) cdc2+, cyclinB+, Mitosis
2) cdc2-, Arrest
3) cdc2 Binds Importin alpha/beta. …
Evolution of a Relational Proteome
Timeline axis: 1965, 1975, 1985, 1995, 2005
Timeline labels: NCBI, PDB, SCOP, PDGF-VSIS, …, HGP, Insulin, Atlas, Smith Waterman; NEWAT, Needleman Wunsch, REFSEQ, SWISSPROT, Protein Domains
Data vs. Knowledge
Data > Information
Sequences, Structures/Functions
Sources:
http://bytesizebio.net/
http://www.dna.affrc.go.jp/growth/images/P-grwth-entrs.gif
PLoS Comput Biol. 2006 Aug 25;2(8):e114. Epub 2006 Jul 14.
Genome Res. 2008 March; 18(3): 449–461. doi: 10.1101/gr.6943508.
http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=fold-scop
An Integrated Framework for building Molecular Biological Data Marts Putting the model to use …
Data Marts : Targeted Integration
Flat Data Repositories: function, structure, sequence, taxonomy
A Family of Data Driven Molecular Biology Tools
Integrated structure calculation via NMR (hybrid methods, iterative processing, reproducibility): spectra, sequence, chemical shifts -> structure
Automated detection of signaling/binding motifs in a candidate protein: protein sequence -> biological activity
Filtration of “passenger” residues from specificity/functional residues on surfaces of protein structures: sequence + structure -> function
“Multidimensional” sequence comparison: sequence + taxonomy -> evolution
Sequence + Spectrum -> Structure
CONNJUR WB integrates format conversion, data inspection, and integrative processing.
Connjur-WB, RNMRTK, NMRPIPE
J. Biomol. NMR, 2011
Detection of functional subunits in proteins
SwissProt vs Uniprot vs TREMBL
Machine Learning
Spearmint (+)
(Nuclear) Bacterial Proteins
Xanthippe (-)
Snake proteins (can’t bind ATP)
Domain databases?
Bioinformatics. 2004 Aug 4;20 Suppl 1:i342-7.
http://pir.georgetown.edu/pirwww/about/doc/tutorials/uniprot_struc.gif
Bioinformatics (2001) 17 (10): 920-926
MinimotifMiner: a tool for predicting protein function via Short Sequence Peptide Motifs
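To make the idea of short-motif scanning concrete, here is a minimal sketch in Python. It is not MinimotifMiner's implementation; it only assumes that a motif can be written as a regular expression, and it uses the two patterns this deck quotes as SH3-interacting (P..P and [KR]..[KR]). The function name `scan_motifs` and the `MOTIFS` table are hypothetical names chosen for illustration.

```python
import re
from typing import List, Tuple

# Patterns quoted in this deck as SH3-interacting; "." matches any residue.
# The dictionary itself is an illustrative assumption, not a real motif database.
MOTIFS = {"P..P": "P..P", "[KR]..[KR]": "[KR]..[KR]"}


def scan_motifs(sequence: str) -> List[Tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched text) for every occurrence."""
    hits = []
    seq = sequence.upper()
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start(), match.group(1)))
    return hits


# Peptides taken from the SH3 slides of this deck.
for peptide in ("PRPLPVAP", "APPLPR", "SRSTK"):
    print(peptide, scan_motifs(peptide))
```

Because such short patterns match very often by chance, hits like these are only candidates; the deck's later slides on GO, structure, and taxonomy integration address that degeneracy.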
Relational Model of Functional Data - A Precise Model of Protein Functional Semantics. BMC Genomics, 2009
NCBI_FEDERATED + Mimosa RMSD = .9 BMC Genomics , 2009
A Peptide Annotation Pipeline BMC Bioinformatics 2010, 11:328
Further (GO) integration controls for the degenerate nature of motif searches ~400 ~400 ~900 PLOS One, 2010
Short Sequences are degenerate… Can they be merged with structural and evolutionary information? Chemistry & Biology, January 2000; BMC Genomics, 2009
Venn : An Integrated Application For Database Driven Homology Threading of Protein Structures …. Nucleic Acids Research, 2009; Trends in Plant Science, 2010
 VENN : "Twilight Zone"  Sequence Homology Threading NAR, 2009
VENN-InterfaceMiner : How do different SH3 binding peptides functionally relate to one another?
Left to right …
1AZG (Human FYN) PRPLPVAP LYYGDWIPSNY
1AVZ (Human FYN) TPQVPL YD … GDWPSNY
1PRL (Chicken FYN) APPLPR YD ... WPNY (not shown)
1H3H (Mouse GRB2) SRSTK ENPSWWTLPANY
Standard BLAST Searches
SSPEs reside in the “Twilight Zone”  J. Bacteriology 2011
What happens when a sequence is inherently noisy?
max 100-250, eval 10E-3 ..., word size 3-5, score matrix 80,62,30, gap? 0,4, Q/N?
manskysktdvqqvkrqnqqsasgqgqygtef
gsetdaqqvrkqnqsaeqnkqqns
Sequence mining in 2D
Use a hypersensitive sequence search (+), and expand results into a 2nd dimension (-). Combined with taxonomical information, this pinpoints a first estimate of the gene’s appearance. J. Bacteriology 2011
R3 : A prototypical method for improved structure calculation.
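One possible reading of this two-step idea can be sketched in Python. Everything here is an assumption made for illustration: `search` stands in for whatever sensitive sequence search is used (the slide does not say how it is invoked), "expanding into a 2nd dimension" is interpreted as re-searching with each hit as a new query, and the "first estimate of the gene's appearance" is interpreted as the deepest taxon shared by all hit lineages. The function names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Set


def expand_hits(seed: str,
                search: Callable[[str], Iterable[str]],
                rounds: int = 1) -> Set[str]:
    """Search with the seed, then re-search with each new hit (assumed reading)."""
    found: Set[str] = set(search(seed))
    frontier: Set[str] = set(found)
    for _ in range(rounds):
        new_hits: Set[str] = set()
        for query in frontier:
            new_hits |= set(search(query))
        # Keep only identifiers not seen before; these seed the next round.
        frontier = new_hits - found - {seed}
        found |= frontier
    found.discard(seed)
    return found


def deepest_shared_taxon(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Walk root-to-leaf lineages in parallel and return the last rank all share."""
    shared = None
    for ranks in zip(*lineages):
        if len(set(ranks)) == 1:
            shared = ranks[0]
        else:
            break
    return shared


# Toy data only: a hand-made hit graph and two lineages, not real search results.
toy_graph = {"seed": ["a"], "a": ["b", "seed"], "b": []}
hits = expand_hits("seed", lambda q: toy_graph.get(q, []))
toy_lineages = [["Bacteria", "Firmicutes", "Bacilli"],
                ["Bacteria", "Firmicutes", "Clostridia"]]
print(sorted(hits), deepest_shared_taxon(toy_lineages))
```

Passing the search as a callable keeps the sketch independent of any particular tool; a real run would need thresholds on the second-round hits, since a hypersensitive search also widens the pool of false positives.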
R3: Convergence is generally improved by reseeding
Availability
Sequence, Structure: www.connjur.org
Sequence, Function: mnm.engr.uconn.edu
Structure, Sequence, Taxonomy, Function, Specificity: venn.vcell.uchc.edu
Sequence, Taxonomy, Evolution: www.bio-toolkit.com
NCBI_FEDERATED + EXPERT SYSTEM RMSD = .9 BMC Genomics , 2009
VENN : Fine grained analysis. Nuc. Acids Research, 2009
NCBI_FEDERATED : Taxonomy, Domain, Homologene & Refseq. Residue enrichment profiles.
VENN : Fine grained analysis of SH3 bound peptides reveals a similar interface for divergent sequences. Are the peptides similar to ?
Left to right …
1AZG (Human FYN) PRPLPVAP LYYGDWIPSNY
1AVZ (Human FYN) TPQVPL YD … GDWPSNY
1PRL (Chicken FYN) APPLPR YD ... WPNY
1H3H (Mouse GRB2) SRSTK ENPSWWTLPANY
Solution : Use a hypersensitive sequence search, and expand results into a 2nd dimension. Combined with taxonomical information, this pinpoints a first estimate of the gene’s appearance.
Gene Duplication, Domain Reuse, Functional Motifs, and Variance of Structural Specificity
    - "Twilight Zone" homologies
    - Structural Interfaces
    - Binding Specificity
    - Short Functional Motifs
Vertebrates appear to have arranged pre-existing components into a richer collection of domain architectures. Nature 2001
Doolittle
* Functional Protein Bioinformatics
    - CDD, MnM, Modular evolution of Proteins
* Database Normalization
    - "Archival" -> low S/N ; unrepresentative
* Protein-centric sequence searching
    - Rous Sarcoma Discovery (DNA, lost in translation)
***** All done before modern computing/database theory.
The Modern Age
GenBank - archival
NCBI / EBI - sequence data curation
PDB/BMRB - structural data curation, deposition
GO - functional annotations
What is data modelling ?
- Ambiguity vs. Vagueness
- "Text" vs "Syntax"
- Biological Data : No clear "reference object".
Solution : CONTEXT
Integration Strategies
Database Federation Architectures
Data Warehousing
Data Marts

Thesis def

  • 1. Data Marts Integrate the ProteomeJay Vyas
  • 2. The Information Content of the Proteome Knowledge Information Data 1) cdc2+, cyclinB+, Mitosis,  2) cdc2-, Arrest 3) cdc2 Binds Importin alpha/beta. …
  • 3. Evolution of a Relational Proteome NCBI PDB SCOP PDGF-VSIS … 1965 1975 1985 1995 2005 HGP Insulin Atlas Smith Waterman; NEWAT Needleman Wunsch REFSEQ SWISSPROT Protein Domains
  • 4. Data vs. Knowledge Data > Information Sequences Structures/Functions http://bytesizebio.net/http://www.dna.affrc.go.jp/growth/images/P-grwth-entrs.gifPLoS Comput Biol. 2006 Aug 25;2(8):e114. Epub 2006 Jul 14.Genome Res. 2008 March; 18(3): 449–461. doi: 10.1101/gr.6943508.http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=fold-scop
  • 5. An Integrated Framework for building Molecular Biological Data Marts Putting the model to use …
  • 6. Data Marts : Targeted Integration FlatData Repositories function structure sequence taxonomy
  • 7. A Family of Data Driven Molecular Biology Tools Integrated of structure calculation via NMR. -hybrid methods, iterative processing, reproducibility spectra,sequence,chemical shifts -> structure Automated detection of signaling/binding motifs in a candidate protein. protein sequence -> biological activity Filtration of “passenger” residues from specificity/functional residues on surfaces of protein structures . sequence + structure - > function “Multidimensional” Sequence Comparison sequence + taxonomy -> evolution
  • 8. Sequence + Spectrum -> Structure
  • 9. CONNJUR WB integrates format conversion, data inspection, and integrative processing . . . . Connjur-WB RNMRTK NMRPIPE CONNJURWB J Bio. NMR, 2011
  • 10.
  • 17. Domain databases? Bioinformatics. 2004 Aug 4;20 Suppl 1:i342-7. http://pir.georgetown.edu/pirwww/about/doc/tutorials/uniprot_struc.gif Bioinformatics (2001) 17 (10): 920-926
  • 18.
  • 19. Relational Model of Functional Data - A Precise Model of Protein Functional Semantics. BMC Genomics, 2009
  • 20. NCBI_FEDERATED + Mimosa RMSD = .9 BMC Genomics , 2009
  • 21. A Peptide Annotation Pipeline BMC Bioinformatics 2010, 11:328
  • 22. Further (GO) integration controls for the degenerate nature of motif searches ~400 ~400 ~900 PLOS One, 2010
  • 23. Short Sequences are degenerate…Can they be merged withstructural and evolutionaryinformation ? Chemistry & Biology, January 2000 BMC Genomics, 2009
  • 24. Venn : An Integrated ApplicationFor Database Driven HomologyThreading of Protein Structures …. Nucleic Acids Research, 2009 Trends in Plant sciences, 2010
  • 25.  VENN : "Twilight Zone"  Sequence Homology Threading NAR, 2009
  • 26. VENN-InterfaceMiner : How do different SH3 binding peptides  functionally relate to one another ? Left to right … 1AZG (Human FYN) PRPLPVAP LYYGDWIPSNY 1AVZ (Human FYN) TPQVPL YD … GDWPSNY 1PRL (Chicken FYN) APPLPR YD ... WPNY (not shown) 1H3H (Mouse GRB2) SRSTK ENPSWWTLPANY
  • 28. SSPEs reside in the “Twilight Zone” J. Bacteriology 2011
  • 29. What happens when a sequence is inherently noisy ? max 100-250  eval 10E-3 ...  word size3-5  score matrix 80,62,30  gap?0,4    Q/N?     manskysktdvqqvkrqnqqsasgqgqygtef gsetdaqqvrkqnqsaeqnkqqns
  • 31. Use a hypersensitive sequence search(+), and expand results into a 2nd dimension (-). Combined with taxonomical information To pinpoint a first estimate of the gene’s appearance. J. Bacteriology 2011
  • 32. R3 : A prototypical methodfor improved structure calculation.
  • 33. R3: Convergence is generally improved by reseeding
  • 34. Availability Sequence , Structure Sequence , Function Structure Sequence Taxonomy Function , Specificity Sequence Taxonomy , Evolution www.connjur.org mnm.engr.uconn.edu venn.vcell.uchc.edu www.bio-toolkit.com
  • 35. NCBI_FEDERATED + EXPERT SYSTEM RMSD = .9 BMC Genomics , 2009
  • 36. VENN : Fine grained analysis. Nuc. Acids Research, 2009
  • 37. NCBI_FEDERATED : Taxonomy, Domain, Homologene & Refseq. Residue enrichment profiles.
  • 38. VENN : Fine grained analysis of SH3 bound peptides--- reveals a similar interface for divergent sequences. Are the peptides similar to ? Left to right … 1AZG (Human FYN) PRPLPVAP LYYGDWIPSNY 1AVZ (Human FYN) TPQVPL YD … GDWPSNY 1PRL (Chicken FYN) APPLPR YD ... WPNY 1H3H (Mouse GRB2) SRSTK ENPSWWTLPANY
  • 39. Solution : Use an hypersensitive sequence search, and expand results into a 2nd dimension. Combined with taxonomical information pinpoints a first estimate of the gene’s appearance.
  • 40. Gene Duplication, Domain Reuse, Functional Motifs, and Varaince of Structural Specificity     - "Twilight Zone" homologies    - Structural Interfaces - Binding Specificity - Short Functional Motifs               Vertebrates appear to have arranged pre-existing components into a richer collection of domain architectures.                               Nature 2001
  • 41. Doolittle * Functional Protein Bioinformatics - CDD, MnM, Modular evolution of Proteins * Database Normalization - "Archival" -> low S/N ; unrepresentative * Protein-centric sequence searching - Rous Sarcoma Discovery (DNA, lost in translation) ***** All done before modern computing/database theory.
  • 42. The Modern Age GenBank - archival NCBI / EBI - sequence data curation PDB/BMRB - structural data curation, deposition GO - functional annotations ...
  • 43. What is data modelling ? - Ambiguity vs. Vagueness - "Text" vs "Syntax" - Biological Data : No clear "reference object". Solution : CONTEXT
  • 44. Integration Strategies Database Federation Architectures Data Warehousing       Data Marts
  • 45. When To Federate ? * New Genomes... Draft sequences. * Reproducibility is less important than insight.  
  • 46. Stark et al. Control of the G2/M Transition 2006
  • 47. Problem: Hundreds of native peptides contain subsequences that are predicted to have SH3 binding properties. For example, [KR]..[KR] and P..P are known to interact with SH3 domains. But there is no method for comparing the structural binding mechanisms behind these variant peptides. Such a method is necessary because there are hundreds of SH3 domains in the human genome, with several diverse structures in the Protein Data Bank, and these cannot be collectively analyzed by eye. Solution: Use the VENN program for homology titration to extract molecular interfaces from SH3 bound peptides. 1) For each atom “a1” in each peptide chain of a structure, and for each atom “a2” in a DIFFERENT chain of the same structure: is “a1” close to “a2” ? If yes, store a1,a2. If no, keep going. 2) Now create a “synthetic structure” that extracts only the residues associated with atoms stored in step (1) and ignores covalent peptide bonds entirely. This structure represents a molecular interface, where all non-interacting residues are considered “extraneous noise”. 3) To test the biological relevance of the molecular interface, apply it to varying species : is the same signature generated from different structures ? Conclusion: Although the W/P/N/Y residues in SH3 domains are far apart and variably spaced in sequence distance, they may have evolved to possess a common feature : conformance to a highly specific molecular interface. Mouse GRB2 and Human FYN are completely different domains, in different species, which bind different peptides … yet, surprisingly, their binding sites conform to the same interface. Venn is available at http://sbtools.uchc.edu/venn. Results: Left to right … 1AZG (Human FYN) PRPLPVAP binds LYYGDWIPSNY 1AVZ (Human FYN) TPQVPL binds YD … GDWPSNY 1PRL (Chicken FYN) APPLPR binds YD ... WPNY 1H3H (Mouse GRB2) SRSTK binds ENPSWWTLPANY
  • 48. Orthologous Homology Threading : Coarse Grained Function . . .
  • 49.
  • 57. Non SH3 bound, non PXXP
  • 58.
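
The interface extraction described in steps (1) and (2) of slide 47 can be sketched roughly as follows. This is a minimal illustration and not the VENN implementation: the `Atom` record, the `extract_interface` function, and the 4.0 Å `cutoff` are all assumptions (the slide only says "close" and gives no threshold), and the coordinates are toy values rather than data from any PDB entry.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record; not a VENN or PDB-library type."""
    chain: str      # chain identifier, e.g. "P" for the peptide
    res_name: str   # three-letter residue name
    res_num: int    # residue number within the chain
    xyz: tuple      # (x, y, z) coordinates in angstroms


def extract_interface(atoms, peptide_chain, cutoff=4.0):
    """Return the residues that take part in a peptide/partner contact.

    Step 1: for each atom a1 in the peptide chain, and each atom a2 in a
    DIFFERENT chain, store the pair if they lie within `cutoff` angstroms
    (the cutoff value is an assumption).
    Step 2: build the "synthetic structure": the set of residues that own
    a stored atom, ignoring covalent connectivity entirely.
    """
    pairs = []
    for a1 in atoms:
        if a1.chain != peptide_chain:
            continue
        for a2 in atoms:
            if a2.chain == a1.chain:
                continue
            if dist(a1.xyz, a2.xyz) <= cutoff:
                pairs.append((a1, a2))

    residues = {(a.chain, a.res_num, a.res_name) for pair in pairs for a in pair}
    return sorted(residues)


# Toy structure: peptide chain "P" and a partner chain "A".
toy_atoms = [
    Atom("P", "PRO", 1, (0.0, 0.0, 0.0)),
    Atom("P", "LEU", 2, (10.0, 0.0, 0.0)),
    Atom("A", "TRP", 50, (3.0, 0.0, 0.0)),
    Atom("A", "TYR", 60, (30.0, 0.0, 0.0)),
]
interface = extract_interface(toy_atoms, peptide_chain="P")
```

The nested loop is quadratic in the number of atoms, which mirrors the slide's wording; for real structures a spatial index would be the usual substitute. Step (3), comparing the resulting signatures across species, is not covered by this sketch.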

Editor's notes

  1. Data is the evidence of our measurements: essentially useless except for bookkeeping. Information is data that is meaningful; it has relationships and context. Knowledge is readily useful factual models and descriptions, like the model of CDC2’s role in the G2/M transition. The reason we do computational biology is that the vast number of proteins and networks in the cell cannot possibly be “held” in the mind of a human being: 20,000 proteins, easily 10,000 in a liver cell, with concentrations of up to 1 million per cell. Promiscuity and regulation, which determine cell fate and physiology, cannot be readily analyzed by any one person. Thus we have the “proteome”: the collection of relationships, sequences, and structures that allows us to classify and draw general conclusions about the specific networks of protein-driven processes in the cell…
  2. 1700s, Linnaeus: classifying life forms is more important than just tallying them up. 1980s, Doolittle: archiving proteins is of no value unless we classify them in a non-redundant manner that is consistent with how proteins evolved, via duplication. Doolittle, who was interested in gene duplication and was not a computer person, built NEWAT as a new version of Margaret Dayhoff’s Atlas and encouraged people to use it for protein-centric sequence searching. He was able to find that different proteins in different organisms shared common features on a grand scale.
  3. Where are we now? We now know that Doolittle was right: the human genome is highly modular, with one of the highest enrichments of multidomain proteins of any organism. Maybe by integrating information we can transfer information between proteins more efficiently and effectively, thus decreasing the gap between sequence data and sequence knowledge…