NCBI Boot Camp
NCBI “...advances science and health by providing access to biomedical and genomic information”
NCBI Sequences Expression Genome maps Structures Protein Domains Homology (gene, protein, structure) Pathways Genetic Variation
NCBI tools databases
databases* * a brief survey of selected dbs
1 literature
PubMed Bookshelf OMIM
PubMed 20,672,941 citations 2,157,529 PubMed Central 5,519 indexed journals
Bookshelf 767
Dr. McKusick OMIM
Lesch-Nyhan If you query for Lesch-Nyhan, you get a very long OMIM record  OMIM
Clinical Features Biochemical Features Inheritance Pathogenesis Diagnosis History Description Cloning Gene Structure Mapping Molecular Genetics Pathogenesis Evolution Animal Model Allelic Variants See Also References Contributors Creation Date Edit History OMIM Note: there are separate entries for Lesch-Nyhan syndrome and the protein that causes the defect
OMIM Every OMIM Record has an extensive list of internal and external links
2 sequences
Nucleotide GenBank RefSeq
DNA RNA Protein EST: expressed sequence tag SNP: single nucleotide polymorphism WGS: whole genome sequencing CDS: coding sequence STS: sequence tagged site
NCBI SNP Primary  Databases GEO GenBank Protein
GenBank Format GenBank
    LOCUS Locus name, size, type, division, modification date Search tips: Locus names can change! Division names are historical, not taxonomical!
    DEFINITION As the author sees fit… Search tip: No Controlled Vocabulary in Definitions!
    ACCESSION/Version Accession numbers do not change, even if information in the record is changed at the author's request. Version and GI numbers change
    Keywords, Source, Organism Organism: Tied into Taxonomy Browser Search tip: Keywords are often blank When performing a “keyword” style search, use [all] , [word] or [title]
    Selected References Newest First Last “reference” covers submission information
    Features I Source, gene, misc features
    Features II CDS: links, translation
    Sequence
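The record sections above correspond to keyword lines at the top of a GenBank flat file. Here is a minimal Python sketch of pulling a few of them out. The toy record is invented for illustration (it is not a real GenBank entry), and `parse_header` and `split_version` are hypothetical helpers, not NCBI tools.

```python
# Toy flat-file header; every value here is invented for illustration.
TOY_RECORD = """\
LOCUS       TOY000001                120 bp    mRNA    linear   PRI 01-JAN-2011
DEFINITION  Toy example record, partial cds.
ACCESSION   TOY000001
VERSION     TOY000001.2
KEYWORDS    .
"""


def parse_header(text):
    """Return a dict of keyword -> value for the header lines of a record."""
    fields = {}
    for line in text.splitlines():
        # Header keywords start in column 1; continuation lines are indented.
        if line and not line[0].isspace():
            keyword, _, value = line.partition(" ")
            fields[keyword] = value.strip()
    return fields


def split_version(version):
    """Split 'ACCESSION.N' into the stable accession and its version number."""
    accession, _, number = version.partition(".")
    return accession, int(number)


header = parse_header(TOY_RECORD)
accession, version = split_version(header["VERSION"])
print(header["LOCUS"].split()[0], accession, version)
print("keywords blank:", header["KEYWORDS"] == ".")
```

Splitting the version off the accession mirrors the search tip: the accession part stays fixed while the version number moves when the sequence is updated.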
GenBank Format GenBank (also for protein)
132,015,054 Sequences in GenBank 3/20/11 + HARD WORK - redundancy RefSeq
RefSeq provides a single record for each natural biological molecule (DNA, RNA, protein) for major organisms ranging from viruses to bacteria to eukaryotes. “HELLO my name is XX_123456”: each record carries an accession of this form.
bio molecules Genomic DNA (NC) Incomplete (NG) mRNA (NM) Model mRNA (XM) Curated Protein (NP) Model protein  (XP) RefSeq
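The two-letter prefix tells you what kind of molecule a RefSeq record describes. Here is a minimal Python sketch of that lookup, using only the six prefixes and labels listed on the slide. `classify_accession` is a hypothetical helper name, and RefSeq has prefixes beyond these six.

```python
# Prefix -> label, copied from the slide above.
REFSEQ_PREFIXES = {
    "NC": "Genomic DNA",
    "NG": "Incomplete",
    "NM": "mRNA",
    "XM": "Model mRNA",
    "NP": "Curated Protein",
    "XP": "Model protein",
}


def classify_accession(accession):
    """Return the RefSeq molecule label, or None if the prefix is not listed."""
    prefix, sep, _ = accession.partition("_")
    if not sep:
        return None  # no underscore: not in the RefSeq 'XX_123456' style
    return REFSEQ_PREFIXES.get(prefix)


for acc in ["NG_012250.1", "NM_000690.2", "NP_000681.2", "AY621070.1"]:
    print(acc, "->", classify_accession(acc) or "not a listed RefSeq prefix")
```

The underscore check is what separates RefSeq-style accessions from ordinary GenBank ones such as AY621070.1 in a mixed list of search hits.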
NG_012250.1 NM_000690.2 AY621070.1 EU414258.1 EU414257.1 EU414256.1 EU414255.1 EU414254.1 EU414253.1 EU414252.1 EU414251.1 EU414250.1 EU414249.1 AF164120.1 EU373813.1 EU373812.1 EU373811.1 EU373810.1 EU373809.1 EU373808.1 EU373807.1 EU373806.1 EU373805.1 EU373804.1 AH002599.1 M20456.1 M20455.1 M20454.1 M20453.1 M20452.1 M20451.1 M20450.1 M20449.1 M20448.1 M20447.1 M20446.1 M20445.1 M20444.1 CR456991.1 AB385105.1 CU678321.1 CU678320.1 AF073514.1 AF073513.1 AF073512.1 AF073511.1  NG_012250.1  NM_000690.2  RefSeq
Note: the NP sequence would not normally be found using a nucleotide search – I have included it only to show the complete suite of RefSeq for ALDH2 NG_012250.1  NM_000690.2  NP_000681.2 RefSeq
3 genes/genome
Genome Gene HomoloGene
Genome 1090 eukaryota 1483 prokaryota 2507 viruses
Note: genome records are either mitochondrial or chromosome Note: no common names are listed as genome query results
The genome record shows a variety of stats for different databases, as well as a map of the genome that is scrollable
Searching in BioProject yields common names
BioProject results contain background information
Instead of searching Genome, you can also browse via the Genome Resource Guide
Genome Resources G Genome BLAST B Map Viewer M Genome Project (BioProject) P
G Genes and Human Health Epigenomics The Genomic Sequence Maps and Markers Transcribed Sequences Cytogenetics Comparative Genomics A standard record in Genome Resources contains many links out along with brief database summaries
M Map Viewer starts by letting you select a chromosome (or section of a circular genome)
M
To the left of each gene, there are a variety of links out.  Note: these change based on the level of information known about a given gene. M HUGO Gene Nomenclature Sequence Viewer Protein Download Evidence Viewer Molecular Model STS, OMIM, CCDS, SNP
Regulatory Gene		 Intron Exon Intron
Each gene record provides extensive details.  We will go through an example Gene record in the following slides.
Sequence Viewer and MapViewer Genomic Info
Bibliography PubMed NOTE: Gene Reference into Function is an excellent resource for literature related to function.  These articles have been submitted for inclusion into GeneRIF and are not the product of an automated text search. GeneRIF
There’s Even More! Interactions Gene Ontology Genotypes Homologues Protein Information Interactions will list all known interacting molecules, providing links to
RefSeq These reference sequences are stable and are independent of genome builds
These reference sequences refer to specific builds: the NCBI assembly (~100 individuals), the Celera assembly (~5 individuals) and HuRef (just Craig Venter)
LINKS LINKS: internal, external and commercial
HomoloGene
Homologs: paralogs and orthologs. Early gene of interest, then GENE DUPLICATION into an α-chain gene (frog α, chick α, mouse α) and a β-chain gene (mouse β, chick β, frog β). α vs. β: paralogs. Same chain across species: orthologs.
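The globin tree can be read as a simple rule: genes on opposite sides of the duplication (α vs. β) are paralogs, and copies of the same chain in different species are orthologs. Here is a minimal Python sketch of that rule, assuming each gene is labeled by (species, chain) as on the slide. `relationship` is a hypothetical helper, and real homology calls need an actual phylogeny, not labels.

```python
def relationship(gene_a, gene_b):
    """Classify two globin-style genes given as (species, chain) tuples."""
    species_a, chain_a = gene_a
    species_b, chain_b = gene_b
    if chain_a != chain_b:
        # The lineages split at the gene duplication, whatever the species.
        return "paralogs"
    if species_a != species_b:
        # Same chain, lineages split by speciation.
        return "orthologs"
    return "same gene"


print(relationship(("mouse", "alpha"), ("mouse", "beta")))
print(relationship(("frog", "alpha"), ("chick", "alpha")))
print(relationship(("frog", "alpha"), ("mouse", "beta")))
```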
P3H1
Protein of Interest (P3H1) Cross-species identity is automatically calculated Automatic sequence alignments are easily accessible
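To make "cross-species identity" concrete, here is a minimal Python sketch of percent identity over a pairwise alignment. This is one common definition (matches divided by aligned, non-gap columns) and is not necessarily the exact calculation HomoloGene uses. The two fragments are invented toy sequences, and `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap columns where the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += res_a == res_b
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments, invented for illustration.
print(round(percent_identity("MKT-LLVA", "MKTALLIA"), 1))
```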
Protein of Interest (P3H1) Note: UniGene  may come up with different results, since it is based on EST clusters and not protein sequence
4 expression & structure
UniGene: EST, GEO. Structures: CDD, MMDB, PubChem…
UniGene …an organized view of the transcriptome
SELECTED PROTEIN SIMILARITIES GENE EXPRESSION EST GEO MAPPING POSITION SEQUENCES mRNA EST
GENE EXPRESSION EST This is a “virtual northern” where ESTs are counted to get a rough sense of overall expression levels
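The counting behind a virtual northern can be sketched in a few lines of Python: scale the number of ESTs for the gene to a pool of one million ESTs so that tissues with different library sizes can be compared. The counts below are hypothetical, and `transcripts_per_million` is an illustrative helper, not UniGene code.

```python
def transcripts_per_million(gene_est_count, total_est_count):
    """Scale a gene's EST count to a pool of one million ESTs."""
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return gene_est_count * 1_000_000 / total_est_count


# Hypothetical counts: (ESTs for the gene, all ESTs from that tissue).
pools = {"liver": (150, 200_000), "brain": (40, 1_000_000)}
for tissue, (gene_count, total) in pools.items():
    print(tissue, round(transcripts_per_million(gene_count, total)))
```

Because small libraries give noisy counts, numbers like these are only a rough guide, which is the point of the slide.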
GENE EXPRESSION GEO Note: the GEO results contain all arrays that assay for this gene; most of these results are for specific disease or altered states and do not necessarily reflect wild type, normal levels of expression
Structures: CDD, MMDB, PubChem…
Cn3D colored by secondary structure Note: Cn3D has aligned the individual chains for you
Cn3D colored by chain (there are 7)
“structure function”: the hemolysin protein bores a hole into red blood cells and sucks their insides out.  The structure kind of looks like a hollow tack.
Note: the structure listing shows each individual chain (along with 3D domains and superfamilies) AND the chemical that was found in the crystal structure (see arrow)
Another example… this time a single chain with distinct domains
Now we are coloring by domain.  Also note the funky space-filling model.  It makes proteins look fat.
Note that Super Families are defined: clicking on them will take you to the conserved domain database
The Conserved Domains Database provides alignments across species of conserved domains, along with a general description of the domain
3D domains are color coded.  Note: 3D domains do not always correlate to Super Families! Clicking on the 3D domain will take you to related structures
You can select structures and then view the 3D alignment in Cn3D
Voilà!  Structural alignment.  Note: the sequences are aligned in the Sequence View box.
PubChem has three primary areas: BioAssay – registry of assays that can be searched by small molecule Substance – a redundant registry of compounds Compound – a non-redundant, curated chemical database
You can search PubChem by chemical name, CAS number, or even by similar structures. Records contain lots of additional information.  Highlights: synonyms (which can be quite extensive in chemical nomenclature).  Of particular note: if the compound shows up in Structure, you can link to a view in Cn3D that shows it complexed with protein/DNA/RNA!
BioSystems will display a short verbal description, a schematic of the system in question and a link to all of the genes, proteins and small molecules found in the system, along with links to related systems.
NCBI discovery initiative
NCBI high quality DB discovery tools
high quality DB discovery tools RefSeq GenBank Database Ads check out these resources! Sensors are you looking for… Analysis tools pre-computed & on the fly
where do I start?
anywhere* *but gene acts as a good hub
Apolipoprotein E APOE Cys130Arg
We Can Do It! Gene and RefSeq, Genome Maps, Allelic Var/Disease, Expression, Homologous G/P, Structure
Search for APOE in Entrez: Note that there are many different records in several different databases that have hits for APOE.   Select PubMed.
Select APOE in Homo sapiens. We have used PubMed for its gene sensor, which is fantastically useful.  However, you can also search directly in the Gene database.
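Searching Gene directly can also be done from a script through NCBI's E-utilities. Here is a minimal Python sketch that only builds the ESearch URL and does not send it. The `[sym]` and `[orgn]` field tags are assumed from Gene's search-field help, so check them before relying on this. `gene_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol, organism):
    """Build (but do not send) an ESearch URL against the Gene database."""
    # Field tags restrict each term to one indexed field (assumed tags).
    term = f"{symbol}[sym] AND {organism}[orgn]"
    return ESEARCH + "?" + urlencode({"db": "gene", "term": term})


print(gene_search_url("APOE", "Homo sapiens"))
```

Pasting the printed URL into a browser should return an XML list of matching Gene IDs, assuming the field tags are accepted as written.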
LOTS of information in this report, including links IN the report, links to other NCBI databases and links to outside resources.
Let’s check out the reference sequences….
Note the genomic, mRNA and protein RefSeq that are independently maintained.
Separate records for ref sequences associated with specific genomic builds…
(many databases here) Let’s check out the SNP:Variation Viewer
Note, the Cys130Arg variant has been frequently observed and well documented
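Variant names like Cys130Arg pack three facts together: the reference residue, its position in the protein, and the replacement. Here is a minimal Python sketch that unpacks the three-letter form into the one-letter shorthand. `parse_variant` is a hypothetical helper that only handles simple substitutions.

```python
import re

# Three-letter -> one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

VARIANT = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_variant(text):
    """Split e.g. 'Cys130Arg' into (reference, position, replacement)."""
    match = VARIANT.match(text)
    if match is None:
        raise ValueError(f"not a three-letter substitution: {text!r}")
    ref, pos, alt = match.groups()
    return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]


ref, pos, alt = parse_variant("Cys130Arg")
print(f"{ref}{pos}{alt}")
```

The one-letter form is handy when reading alignments, where a single C or R at position 130 is all you see.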
Let’s observe the Sequence Viewer and MapViewer for this gene.
Note, you can change which sequence you want to observe (stable reference, reference, Celera and HuRef)
The full view shows genes in the area, along with info on SNPs and other variation classifications.
Full screen of MapViewer
ab initio modeling, Ensembl Genes, UniGene, RefSeq: these are the default maps
You can change the maps you wish to view, both in terms of how you are annotating the genome but also in which organisms.
Here we are looking at Chimp, Mouse and Human Gene maps. You can zoom out to get a larger picture of the area.
APOE is part of the APO  gene cluster.  Note: lines between maps are mapped homologues.
Each gene has a series of links following its name.  We’ll jump to the APOE OMIM record.
Another extensive record!  Let’s jump to allelic variants.
Cys130Arg is .0016. Extensive documentation… Let’s go to the SNP record.
OMIM Prot 3D SeqView GeneView MapView VarView PubMed
NOTE: the default setting doesn’t show much, because it doesn’t include clinically associated variants – click this box and refresh.
This is the one we want!  Let’s jump to this reference SNP
Hummmm… the two reference assemblies have a wild type allele, whereas Celera and HuRef carry the mutant allele. Let’s check out this area in HuRef using the sequence viewer – click on the chromosome position link.
Clicking on sequence will bring up the sequence and the CDS.  You will note that HuRef (which means Craig Venter) carries the mutant allele.
(many databases here) Let’s check out expression in UniGene
Click on EST profile to go to the virtual northern.
Disease State BODY SITES Development
Click on GEO Profiles to see actual gene expression array data.
Note, there are thousands of hits, meaning many gene arrays have assayed for this gene.  However, most of these are in reference to a disease or altered state.
Use GDS596 – it is the results for  “normal” gene expression. Click on a chart to see detailed results.
Highest expression in the liver, with lower levels throughout the brain.
(many databases here) Let’s check out homologs
You can show a pairwise alignment using BLAST…
E value Note the very low E value 1e-158
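The E value is the number of hits this good you would expect by chance, so smaller is better. A common textbook relation is E = m × n × 2^(−S′), with bit score S′, query length m and database length n. Here is a minimal Python sketch under that assumption. The lengths are hypothetical, real BLAST uses effective (edge-corrected) lengths, and `expect_value` is an illustrative helper.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least `bit_score` bits."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: 300-residue query against 10 million residues.
for bits in (30, 60, 120):
    print(bits, f"{expect_value(bits, 300, 10_000_000):.1e}")
```

Every extra bit of score halves E, which is why a strong full-length match can reach values as tiny as the one on the slide.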
The alignment shows that the Chimp genome carries an R at the allele in question!
You can also check out homologs found in UniGene: a different way to search.
Bunnies show up using the UniGene homolog search, but not the HomoloGene search.
Let’s go check out the protein record…
Click here to link to the RefSeq protein record.
Let’s run a BLAST to see if we can identify the giant panda homolog.
I’ve changed the search to focus on the RefSeq protein database and limit it to the giant panda.
Note: BLAST automatically detects domains The highest hit is a hypothetical protein.  Let’s take a look at the alignment.
Note, the panda has the mutant arginine... … does this mean pandas and chimps both have early onset Alzheimer's disease?  Nobody knows!
Let’s check out some related structures.
This is the default setting.  Change to all similar MMDB.
Click here to go to an alignment between your query and the structure’s sequence.
Click here to view in Cn3D Note: the structure sequence contains the mutant arginine
Showing side chains, colored by hydrophobicity.  The arginine is shown in yellow. Click here to go to the structure summary for 1B68
Click here to find similar 3D domains
Select another structure and then view 3D alignment.
Overall alignment, showing side chains colored by hydrophobicity.  Note, the Cys vs. Arg doesn’t make a huge change structurally.
science can be complex...
…we can help you with that.
thank  you
Jackie Wirz, PhD wirzj@ohsu.edu

NCBI Boot Camp for Beginners Slides

  • 2. NCBI “ ” ...advances science and health by providing access to biomedical and genomic information
  • 3. NCBI Sequences Expression Genome maps Structures Protein Domains Homology (gene, protein, structure) Pathways Genetic Variation
  • 5. databases* * a brief survey of selected dbs
  • 8. PubMed 20,672,941 citations 2,157,529 PubMed Central 5,519 indexed journals
  • 11. Lesch-Nyhan If you query for Lesch-Nyhan, you get a very long OMIM record  OMIM
  • 12. Clinical Features Biochemical Features Inheritance Pathogenesis Diagnosis History Description Cloning Gene Structure Mapping Molecular Genetics Pathogenesis Evolution Animal Model Allelic Variants See Also References Contributors Creation Date Edit History OMIM Note: there are separate entries for Lesch-Nyhan syndrome and the protein that causes the defect
  • 13. OMIM Every OMIM Record has an extensive list of internal and external links
  • 16. DNA RNA Protein EST: expressed sequence tag SNP: single nucleotide polymorphism WGS: whole genome sequencing CDS: coding sequence STS: sequence tagged site
  • 17. NCBI SNP Primary Databases GEO GenBank Protein
  • 19. LOCUS Locus name, size, type, division, modification date Search tips: Locus names can change! Division names are historical, not taxonomical!
  • 20. DEFINITION As the author sees fit… Search tip: No Controlled Vocabulary in Definitions!
  • 21. ACCESSION/Version Accession numbers do not change, even if information in the record is changed at the author's request. Version and GI numbers change
  • 22. Keywords, Source, Organism Organism: Tied into Taxonomy Browser Search tip: Keywords are often blank When performing a “keyword” style search, use [all] , [word] or [title]
  • 23. Selected References Newest First Last “reference” covers submission information
  • 24. Features I Source, gene, misc features
  • 25. Features II CDS: links, translation
  • 26. Sequence
  • 27. GenBank Format GenBank (also for protein)
  • 28. 132,015,054 Sequences in GenBank 3/20/11 + HARD WORK - redundancy RefSeq
  • 29. RefSeqs provides a single record for each natural biological molecule for major organisms ranging from viruses to bacteria to eukaryotes RefSeq
  • 30. bio mol DNA RNA Protein RefSeqs provides a single record for each natural biological molecule for major organisms ranging from viruses to bacteria to eukaryotes RefSeq
  • 31. bio mol DNA RNA Protein RefSeqs HELLO my name is provides a single record for each natural biological molecule for major organisms ranging from viruses to bacteria to eukaryotes XX_123456 RefSeq
  • 32. bio molecules Genomic DNA (NC) Incomplete (NG) mRNA (NM) Model mRNA (XM) Curated Protein (NP) Model protein (XP) RefSeq
  • 33. NG_012250.1 NM_000690.2 AY621070.1 EU414258.1 EU414257.1 EU414256.1 EU414255.1 EU414254.1 EU414253.1 EU414252.1 EU414251.1 EU414250.1 EU414249.1 AF164120.1 EU373813.1 EU373812.1 EU373811.1 EU373810.1 EU373809.1 EU373808.1 EU373807.1 EU373806.1 EU373805.1 EU373804.1 AH002599.1 M20456.1 M20455.1 M20454.1 M20453.1 M20452.1 M20451.1 M20450.1 M20449.1 M20448.1 M20447.1 M20446.1 M20445.1 M20444.1 CR456991.1 AB385105.1 CU678321.1 CU678320.1 AF073514.1 AF073513.1 AF073512.1 AF073511.1 NG_012250.1 NM_000690.2 RefSeq
  • 34. Note: the NP sequence would not normally be found using a nucleotide search – I have included it only to show the complete suite of RefSeq for ALDH2 NG_012250.1 NM_000690.2 NP_000681.2 RefSeq
  • 37. Genome 1090 eukaryota 1483 prokaryota 2507 viruses
  • 38. Note: genome records are either mitochondrial or chromosome Note: no common names are listed as genome query results
  • 39. The genome record shows a variety of stats for different databases, as well as a map of the genome that is scrollable
  • 40. Searching in BioProject yields common names
  • 41. BioProject results contain background information
  • 42. Instead of searching Genome, you can also browse via the Genome Resource Guide
  • 43. Genome Resources G Genome BLAST B Map Viewer M Genome Project (BioProject) P
  • 44. G Genes and Human Health Epigenomics The Genomic Sequence Maps and Markers Transcribed Sequences Cytogenetics Comparative Genomics A standard record in Genome Resources contains many links out along with brief database summaries
  • 45. M Map Viewer starts by letting you select a chromosome (or section of a circular genome)
  • 46. M
  • 47. To the left of each gene, there are a variety of links out. Note: these change based on the level of information known about a given gene. M HUGO Gene Nomenclature Sequence Viewer Protein Download Evidence Viewer Molecular Model STS, OMIM, CCDS, SNP
  • 48. Regulatory Gene Intron Exon Intron
  • 49.
  • 50. Each gene record provides extensive details. We will go through an example Gene record in the following slides.
  • 51. Sequence Viewer and MapViewer Genomic Info
  • 52. Bibliography PubMed NOTE: Gene Reference into Function is an excellent resource for literature related to function. These articles have been submitted for inclusion into GeneRIF and are not the product of an automated text search. GeneRIF
  • 53. There’s Even More! Interactions Gene Ontology Genotypes Homologues Protein Information Interactions will list all known interacting molecules, providing links to
  • 54. RefSeq These reference sequences are stable and are independent of genome builds
  • 55. The NCBI assembly: ~100 individuals. The Celera assembly: ~5 individuals. These reference sequences refer to specific builds. HuRef: just Craig Venter
  • 56. LINKS LINKS: internal, external and commercial
  • 58. Homologs paralogs orthologs orthologs frog α chick α mouseα mouseβ chick β frogβ α-chain gene β-chain gene GENE DUPLICATION Early Gene of Interest
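The globin tree on this slide can be restated as a rule: genes separated by a speciation event are orthologs, genes separated by the duplication event are paralogs. The sketch below applies that rule to the six genes on the slide. It only holds for this toy tree with a single early duplication, and the tuple layout and function name are assumptions made for illustration.

```python
from itertools import combinations

# (species, chain) pairs taken from the slide's tree.
genes = [
    ("frog", "alpha"), ("chick", "alpha"), ("mouse", "alpha"),
    ("mouse", "beta"), ("chick", "beta"), ("frog", "beta"),
]


def relationship(a, b):
    """Classify two distinct genes from the single-duplication tree."""
    # Same chain: the lineages split at a speciation event -> orthologs.
    # Different chain: the lineages split at the duplication -> paralogs,
    # whether or not the two genes sit in the same species.
    return "orthologs" if a[1] == b[1] else "paralogs"


for a, b in combinations(genes, 2):
    print(f"{a[0]} {a[1]} vs {b[0]} {b[1]}: {relationship(a, b)}")
```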
  • 59. P3H1
  • 60. Protein of Interest (P3H1) Cross-species identity is automatically calculated Automatic sequence alignments are easily accessible
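As a rough idea of what a cross-species identity figure means, percent identity can be computed from a pairwise alignment by counting matching columns. The slides do not say how the database calculates its numbers, so this is one common definition rather than NCBI's method, and the two aligned strings are made-up toy sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of non-gap alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical sequences, not real P3H1 data).
print(percent_identity("MKV-LA", "MKILLA"))
```

Other definitions divide by the full alignment length or by the shorter sequence, which gives different numbers for the same alignment.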
  • 61. Protein of Interest (P3H1) Note: UniGene may come up with different results, since it is based on EST clusters and not protein sequence
  • 62. 4 expression & structure
  • 63. UniGene EST, GEO Structures CDD, MMDB, PubChem…
  • 64. UniGene …an organized view of the transcriptome
  • 65. SELECTED PROTEIN SIMILARITIES GENE EXPRESSION EST GEO MAPPING POSITION SEQUENCES mRNA EST
  • 66. GENE EXPRESSION EST This is a “virtual northern” where ESTs are counted to get a rough sense of overall expression levels
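The counting idea behind a virtual northern can be sketched as follows. Raw EST counts are not comparable between tissue pools of different sizes, so the sketch scales them to transcripts per million. That normalization is an assumption here, not something stated on the slide, and all counts are invented for illustration.

```python
def transcripts_per_million(gene_ests: int, total_ests: int) -> float:
    """Scale a gene's EST count to a pool of one million ESTs."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return gene_ests * 1_000_000 / total_ests


# Hypothetical pools: (ESTs matching the gene, all ESTs in the tissue pool).
pools = {"liver": (150, 200_000), "brain": (90, 1_000_000)}

for tissue, (gene_count, pool_size) in pools.items():
    tpm = transcripts_per_million(gene_count, pool_size)
    print(f"{tissue}: {tpm:.0f} TPM")
```

With these invented numbers the liver pool has fewer ESTs overall but a higher normalized level, which is why the scaling step matters.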
  • 67. GENE EXPRESSION GEO Note: the GEO results contain all arrays that assay for this gene; most of these results are for specific disease or altered states and do not necessarily reflect wild type, normal levels of expression
  • 68. Structures CDD, MMDB, PubChem…
  • 69.
  • 70.
  • 71. Cn3D colored by secondary structure Note: Cn3D has aligned the individual chains for you
  • 72. Cn3D colored by chain (there are 7)
  • 73. “structure function”: the hemolysin protein bores a hole into red blood cells and sucks their insides out. The structure kind of looks like a hollow tack.
  • 74. Note: the structure listing shows each individual chain (along with 3D domains and superfamilies) AND the chemical that was found in the crystal structure (see arrow)
  • 75. Another example… this time a single chain with distinct domains
  • 76. Now we are coloring by domain. Also note the funky space-filling model. It makes proteins look fat.
  • 77. Note that Super Families are defined: clicking on them will take you to the conserved domain database
  • 78. The Conserved Domains Database provides alignments across species of conserved domains, along with a general description of the domain
  • 79. 3D domains are color coded. Note: 3D domains do not always correlate to Super Families! Clicking on the 3D domain will take you to related structures
  • 80. You can select structures and then view the 3D alignment in Cn3D
  • 81. Voilà! Structural alignment. Note: the sequences are aligned in the Sequence View box.
  • 82. PubChem has three primary areas: BioAssay – registry of assays that can be searched by small molecule Substance – a redundant registry of compounds Compound – a non-redundant, curated chemical database
  • 83. You can search PubChem by chemical name, CAS number, or even by similar structures. Records contain lots of additional information. Highlights: synonyms (which can be quite extensive in chemical nomenclature). Of particular note: if the compound shows up in Structure, you can link to a view in Cn3D that shows it complexed with protein/DNA/RNA!
  • 84.
  • 85. BioSystems will display a short verbal description, a schematic of the system in question and a link to all of the genes, proteins and small molecules found in the system, along with links to related systems.
  • 87. NCBI high quality DB discovery tools
  • 88. high quality DB discovery tools RefSeq GenBank Database Ads check out these resources! Sensors are you looking for… Analysis tools pre-computed & on the fly
  • 89. where do I start?
  • 90. anywhere* *but Gene acts as a good hub
  • 92. We Can Do It! Gene and RefSeq Genome Maps Allelic Var/Disease Expression Homologous G/P Structure
  • 93. Search for APOE in Entrez: Note that there are many different records in several different databases that have hits for APOE. Select PubMed.
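The same search can also be scripted. The slides only use the web interface, so the following is an aside: a minimal sketch that builds a query URL for NCBI's E-utilities `esearch` endpoint without sending it. The helper name and the choice of field tags are illustrative assumptions.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an E-utilities esearch request URL."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Restrict the search to the Gene database and to human records.
url = esearch_url("gene", "APOE[Gene Name] AND Homo sapiens[Organism]")
print(url)
```

Changing `db` to another Entrez database name (for example `pubmed`) mirrors picking a different database from the Entrez results page.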
  • 94. Select APOE in Homo sapiens. We have used PubMed for its gene sensor, which is fantastically useful. However, you can also search directly in the Gene database.
  • 95. LOTS of information in this report, including links IN the report, links to other NCBI databases and links to outside resources.
  • 96. Let’s check out the reference sequences….
  • 97. Note the genomic, mRNA and protein RefSeq that are independently maintained.
  • 98. Separate records for ref sequences associated with specific genomic builds…
  • 99. (many databases here) Let’s check out the SNP:Variation Viewer
  • 100. Note, the Cys130Arg variant has been frequently observed and well documented
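Variant names such as Cys130Arg pack three facts into one token: the reference residue, the position, and the replacement residue. The sketch below unpacks that notation and converts it to one-letter codes, which is how the residue appears in sequence alignments. The function name is hypothetical and only simple substitutions are handled.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_protein_change(change: str):
    """Parse e.g. 'Cys130Arg' into (reference, position, replacement)."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", change)
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    ref, pos, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {change!r}")
    return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]


ref, pos, alt = parse_protein_change("Cys130Arg")
print(f"{ref}{pos}{alt}")
```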
  • 101. Let’s observe the Sequence Viewer and MapViewer for this gene.
  • 102. Note, you can change which sequence you want to observe (Stable reference, reference, celera and HuRef)
  • 103. The full view shows genes in the area, along with info on SNPs and other variation classifications.
  • 104. Full screen of MapViewer
  • 105. ab initio modeling Ensembl Genes UniGene RefSeq These are default maps
  • 106. You can change the maps you wish to view, both in terms of how you are annotating the genome but also in which organisms.
  • 107. Here we are looking at Chimp, Mouse and Human Gene maps. You can zoom out to get a larger picture of the area.
  • 108. APOE is part of the APO gene cluster. Note: lines between maps are mapped homologues.
  • 109. Each gene has a series of links following its name. We’ll jump to the APOE OMIM record.
  • 110. Another extensive record! Let’s jump to allelic variants.
  • 111. Let’s go to the SNP record. Cys130Arg is allelic variant .0016. Extensive documentation…
  • 112. OMIM Prot 3D SeqView GeneView MapView VarView PubMed
  • 113. NOTE: the default setting doesn’t show much, because it doesn’t include clinically associated variants – click this box and refresh.
  • 114. This is the one we want! Let’s jump to this reference SNP
  • 115. Hummmm… the two reference assemblies have a wild type allele, whereas Celera and HuRef carry the mutant allele. Let’s check out this area in HuRef using the sequence viewer – click on the chromosome position link.
  • 116. Clicking on sequence will bring up the sequence and the CDS. You will note that HuRef (which means Craig Venter) carries the mutant allele.
  • 117. (many databases here) Let’s check out expression in UniGene
  • 118. Click on EST profile to go to the virtual northern.
  • 119. Disease State BODY SITES Development
  • 120. Click on GEO Profiles to see actual gene expression array data.
  • 121. Note, there are thousands of hits, meaning many gene arrays have assayed for this gene. However, most of these are in reference to a disease or altered state.
  • 122. Use GDS596 – it is the results for “normal” gene expression. Click on a chart to see detailed results.
  • 123. Highest expression in the liver, with lower level throughout the brain. liver brain
  • 124. (many databases here) Let’s check out homologs
  • 125. You can show a pairwise alignment using BLAST…
  • 126. E value Note the very low E value 1e-158
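To give a feel for how small 1e-158 is, the E value can be related to the alignment's bit score with the standard Karlin-Altschul relationship E = m × n × 2^(−S′), where m and n are the query and search-space lengths. The lengths used below are placeholders chosen for illustration, and real BLAST applies effective-length corrections, so reported values will differ somewhat.

```python
import math


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2^(-S'), using raw (uncorrected) lengths."""
    return query_len * db_len * 2.0 ** (-bit_score)


def bit_score_for(e_value: float, query_len: int, db_len: int) -> float:
    """Invert the formula: the bit score that yields a given E value."""
    return math.log2(query_len * db_len / e_value)


# Placeholder lengths for a pairwise comparison (assumed, not from the slide).
m, n = 317, 317
score = bit_score_for(1e-158, m, n)
print(f"bit score needed: {score:.1f}")
print(f"E value at that score: {expect_value(score, m, n):.1e}")
```

Each extra bit of score halves the E value, so an E value this low means a chance match of equal quality is essentially impossible in a search space of this size.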
  • 127. The alignment shows that the Chimp genome carries an R at the allele in question!
  • 128. You can also check out homologs found in UniGene – a different way to search.
  • 129. Bunnies show up using the UniGene homolog search, but not the HomoloGene search.
  • 130. Let’s go check out the protein record…
  • 131. Click here to link to the RefSeq protein record.
  • 132. Let’s run a BLAST to see if we can identify the giant panda homolog.
  • 133. I’ve changed the search to focus on the RefSeq protein database and limit it to the giant panda.
  • 134. Note: BLAST automatically detects domains The highest hit is a hypothetical protein. Let’s take a look at the alignment.
  • 135. Note, the panda has the mutant arginine... … does this mean pandas and chimps both have early onset Alzheimer's disease? Nobody knows!
  • 136. Let’s check out some related structures.
  • 137. This is the default setting. Change to all similar MMDB.
  • 138. Click here to go to an alignment between your query and the structure’s sequence.
  • 139. Click here to view in Cn3D Note: the structure sequence contains the mutant arginine
  • 140. Showing side chains, colored by hydrophobicity. The arginine is shown in yellow. Click here to go to the structure summary for 1B68
  • 141. Click here to find similar 3D domains
  • 142. Select another structure and then view 3D alignment.
  • 143. Overall alignment, showing side chains colored by hydrophobicity. Note, the Cys vs. Arg doesn’t make a huge change structurally.
  • 144. science can be complex...
  • 145. …we can help you with that.
  • 147. Jackie Wirz, PhD wirzj@ohsu.edu

Editor's notes

  1. DNA fingerprint of M. tuberculosis
  2. Nathan Sawaya, LEGO Artist
  3. About 4000 major organisms vs. the 250,000 that are present in all of GenBank
  4. Figure ©1979 by T. C. Hsu; all text material ©2007 by Steven M. Carr
  5. http://survivingtheworkday.com
  6. http://survivingtheworkday.com
  7. http://survivingtheworkday.com
  8. http://survivingtheworkday.com
  9. http://survivingtheworkday.com
  10. www.biojobblog.com
  11. www.biojobblog.com
  12. www.biojobblog.com
  13. Crystal structure of putative aminotransferase (YP_614685.1) from SILICIBACTER SP. TM1040 at 1.80 A resolution. To be published
  14. cba-ramblings.blogspot.com
  15. cba-ramblings.blogspot.com
  16. http://www.alz.org/alzheimers_disease_4719.asp
  17. Reference sequence; how people access; expression; genomic assemblies; maps: region in Map Viewer, look at gene cluster on chr19, compare across two other genomes; polymorphisms; genotypes: reference, HuRef; homologs; BLAST – panda; Genome Reference Consortium human
  18. OMIM: OMIM link; HGNC: HGNC listing; sv: Sequence Viewer; pr: proteins; dl: download sequence region (corresponding contig region); ev: Evidence Viewer; mm: Model Maker; hm: HomoloGene; STS: UniSTS; SNP: SNPs linked to gene
  19. Virtual northern blot
  20. Human on top: APOE3, C. Chimp has the “risk” allele, an R… interesting
  21. Panda has an R. Restricted to completely sequenced eukaryotic genomes. Translating BLAST sequences against expressed sequences?
  22. http://www.petwebsite.com/rabbits/rabbit_care.htm
  23. Changes how the protein is processed, not so much the structure. Color by hydrophobicity! When interacting with lipid, the interior partially unfolds to interact with the lipid.