Interview with
Carol Scott
PhD, Bioengineering
Bioinformatics Scientist and Curator
Conserved Domain Database
A project of the U.S. National Library of Medicine at the
National Institutes of Health,
National Center for Biotechnology Information
Katie Rapp
LBSC 690
March 1, 2011
Protein is Everything!
• Every living thing is made up of unique, identifiable proteins
• Examples: human hemoglobin, insulin, proteins in fungi, bacteria, plants
• Proteins are made of different combinations of amino acids
• 20 naturally occurring amino acids; they are like beads in a necklace, and their order determines the type of protein
• Proteins do the work inside cells
• Examples: hemoglobin carries oxygen in the blood, insulin regulates glucose metabolism
Problems with Proteins
• Proteins do the work inside cells, so when there are problems, such as diseases, they are often caused by a defective protein
• Example: sickle cell anemia (one change in one amino acid in hemoglobin and you go from healthy to ill)
• Medical researchers study proteins at the molecular level in order to find cures for diseases
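The sickle cell example can be made concrete with a small sketch. It compares the first eight residues of the mature human beta-globin chain with the sickle variant, in which glutamic acid (E) at position 6 is replaced by valine (V). The function name `point_differences` is a hypothetical helper introduced only for illustration; it is not part of any database tool.

```python
# First eight residues of the mature human beta-globin chain (one-letter codes).
NORMAL = "VHLTPEEK"
# Sickle variant: glutamic acid (E) at position 6 is replaced by valine (V).
SICKLE = "VHLTPVEK"


def point_differences(a, b):
    """Return (1-based position, residue in a, residue in b) for each mismatch.

    Hypothetical helper for illustration; assumes equal-length sequences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(point_differences(NORMAL, SICKLE))  # [(6, 'E', 'V')]
```

A single mismatch in the chain is the only difference between the two fragments, which is the "one change in one amino acid" described above.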
Conserved Domains – Motivation behind the database
• The amino acid chains that make up proteins are coiled and folded. Blocks of coiled and folded amino acids that recur across different proteins are referred to as “conserved domains.”
• Conserved domains have specific functions and 3-dimensional shapes
• It is useful for researchers to be able to compare related conserved domains in different proteins, but in the past there was no practical way to do this
Conserved Domain Database - Development
• This database was developed to meet the needs of researchers
• Project began in 2001; Carol Scott has worked on it since 2002
• Worked with software developers to produce a highly interactive database
Conserved Domain Database - Curators
• Carol Scott and other curators create the data in the database from lists of amino acid sequences found in other databases
• They take amino acid sequences from millions of proteins and link them based on structural and functional similarities
• They work with programmers to create the interface and visual output of the database
• Curators also find and provide links to information about each protein, including journal articles, other resources, and related proteins
Conserved Domain Database - Challenges
• Not all amino acid sequence information is reliable – curators must pick and choose where they get the basic data to put into their database
• The process of creating the comparisons in the CDD is very complex and time-consuming
• Software exists to help find these comparisons, but much work must be done manually based on knowledge of the chemical attributes of the amino acids
• The project is currently facing budgetary cutbacks that affect staffing and perhaps the future of the database
Conserved Domain Database - Results
• Enables scientists to search on specific amino acid chains of interest to them
• Supports genetic studies, mutation studies, and studies of the size, shape and function of proteins
• They can find and compare similar chemical alignments in different proteins
• These alignments can provide insight into the functions of different parts of protein molecules
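To give a rough sense of what comparing alignments means, the sketch below takes a few already-aligned sequences and reports the columns where every sequence has the same amino acid. The sequences in `TOY_ALIGNMENT` are invented for illustration, and `conserved_columns` is a hypothetical helper; the actual database relies on far more sophisticated scoring and on manual curation, so this is only a toy model of the idea.

```python
def conserved_columns(aligned):
    """Return 1-based column numbers where all aligned sequences agree.

    Hypothetical helper: assumes the sequences are already aligned to the
    same length and that '-' marks a gap (gap columns are never reported).
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")
    return [
        i + 1
        for i, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Invented toy sequences, not real protein data.
TOY_ALIGNMENT = ["GKSTLA", "GKTTLV", "GKSSLA"]

print(conserved_columns(TOY_ALIGNMENT))  # [1, 2, 5]
```

Positions that stay identical across related proteins are candidates for functionally important residues, which is the kind of insight the slide describes.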
Conserved Domain Database - Output: 3-Dimensional Structures
Conserved Domain Database - Output: Superfamilies
Conserved Domain Database - Users: Who Are They?
• The database is freely accessible to anyone over the internet
• It is used frequently by researchers around the world
• Users include anyone studying proteins – everyone from high school and college students up to senior researchers at NIH, pharmaceutical companies, genetic researchers, bioengineering firms, etc.
• It can be used to spur further research into areas where defects in proteins could be repaired using genetic engineering
Conserved Domain Database
• Questions?
