The NCBI Boot Camp for Beginners was designed to offer an overview of the NCBI suite of resources. In the first half of the presentation, highlighted databases were covered in four main categories: literature, sequences, genes & genomes, and expression & structure. The second half of the class used apolipoprotein E (APOE) as a query that was explored through many of the NCBI databases, from identifying the reference sequences to a structural analysis of the Cys130Arg variant.
11. Lesch-Nyhan: If you query OMIM for Lesch-Nyhan, you get a very long record.
12. OMIM record sections: Clinical Features, Biochemical Features, Inheritance, Pathogenesis, Diagnosis, History; Description, Cloning, Gene Structure, Mapping, Molecular Genetics, Pathogenesis, Evolution, Animal Model, Allelic Variants, See Also, References, Contributors, Creation Date, Edit History. Note: there are separate entries for Lesch-Nyhan syndrome and for the protein whose defect causes it.
13. OMIM: Every OMIM record has an extensive list of internal and external links.
16. DNA, RNA, Protein. EST: expressed sequence tag; SNP: single nucleotide polymorphism; WGS: whole genome shotgun; CDS: coding sequence; STS: sequence tagged site.
19. LOCUS Locus name, size, type, division, modification date Search tips: Locus names can change! Division names are historical, not taxonomical!
20. DEFINITION As the author sees fit… Search tip: No Controlled Vocabulary in Definitions!
21. ACCESSION/Version: Accession numbers do not change, even if information in the record is changed at the author's request. Version and GI numbers change when the sequence itself is updated.
22. Keywords, Source, Organism Organism: Tied into Taxonomy Browser Search tip: Keywords are often blank When performing a “keyword” style search, use [all] , [word] or [title]
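The field-tag search tip can be sketched in Python as below. This is a minimal illustration, not a definitive recipe: it only builds an E-utilities ESearch URL and sends nothing. The helper names `tagged_term` and `esearch_url` are hypothetical, and the choice of the `nuccore` database, the `[organism]` tag and the ALDH2 query are assumptions made for the example.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged_term(text, field):
    """Restrict a search term to one Entrez field, e.g. ALDH2[title]."""
    return f"{text}[{field}]"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch request URL for the given database."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term, 'retmax': retmax})}"


# Hypothetical query: a title-restricted word combined with an organism filter.
term = tagged_term("ALDH2", "title") + " AND " + tagged_term('"Homo sapiens"', "organism")
url = esearch_url("nuccore", term)
print(term)
print(url)
```

Restricting the word to `[title]` rather than relying on the KEYWORDS line reflects the tip above: keywords are often blank, so a keyword-only search would miss most records.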
23. Selected References: newest first. The last “reference” covers submission information.
29. RefSeq provides a single record for each natural biological molecule for major organisms, ranging from viruses to bacteria to eukaryotes.
30. RefSeq: the biological molecules covered are DNA, RNA and protein.
31. RefSeq: each record carries a “HELLO my name is” tag, an accession of the form XX_123456.
32. RefSeq accession prefixes by biological molecule: genomic DNA (NC), incomplete (NG), mRNA (NM), model mRNA (XM), curated protein (NP), model protein (XP).
34. RefSeq. Note: the NP sequence would not normally be found using a nucleotide search; I have included it only to show the complete suite of RefSeq records for ALDH2: NG_012250.1, NM_000690.2, NP_000681.2.
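The prefix scheme from slide 32 and the ALDH2 accessions from slide 34 can be sketched as a small lookup. This is an illustrative sketch only: the function name `describe_refseq` and the returned dictionary layout are hypothetical, and the table covers just the six prefixes named on the slide, not every RefSeq prefix.

```python
# Prefix meanings as listed on the slide; real RefSeq has more prefixes.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA",
    "NG": "incomplete genomic region",
    "NM": "mRNA",
    "XM": "model mRNA",
    "NP": "curated protein",
    "XP": "model protein",
}


def describe_refseq(accession):
    """Split an accession such as NM_000690.2 into prefix, number and version."""
    prefix, sep, rest = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognised RefSeq accession: {accession!r}")
    number, _, version = rest.partition(".")
    return {
        "prefix": prefix,
        "molecule": REFSEQ_PREFIXES[prefix],
        "number": number,
        # The version is optional; it is absent when no ".N" suffix is given.
        "version": int(version) if version else None,
    }


for acc in ("NG_012250.1", "NM_000690.2", "NP_000681.2"):
    info = describe_refseq(acc)
    print(acc, "->", info["molecule"], "version", info["version"])
```

Keeping the version separate from the accession number mirrors the earlier point that accessions are stable while versions change.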
43. Genome Resources (G), Genome BLAST (B), Map Viewer (M), Genome Project (BioProject) (P)
44. G: A standard record in Genome Resources contains many links out along with brief database summaries. Sections: Genes and Human Health, Epigenomics, The Genomic Sequence, Maps and Markers, Transcribed Sequences, Cytogenetics, Comparative Genomics.
45. M: Map Viewer starts by letting you select a chromosome (or section of a circular genome).
47. M: To the left of each gene, there are a variety of links out. Note: these change based on the level of information known about a given gene. Links: HUGO Gene Nomenclature, Sequence Viewer, Protein, Download, Evidence Viewer, Model Maker, STS, OMIM, CCDS, SNP.
52. Bibliography: PubMed and GeneRIF. NOTE: Gene Reference into Function (GeneRIF) is an excellent resource for literature related to function. These articles have been submitted for inclusion into GeneRIF and are not the product of an automated text search.
53. There’s Even More! Interactions, Gene Ontology, Genotypes, Homologues, Protein Information. Interactions will list all known interacting molecules, providing links to
66. GENE EXPRESSION: EST. This is a “virtual northern” where ESTs are counted to get a rough sense of overall expression levels.
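The counting idea behind a “virtual northern” can be sketched as follows. This is a rough illustration under stated assumptions: the EST counts and library sizes are invented for the example, and the normalization to transcripts per million is one common way to compare libraries of different sizes, not necessarily the exact calculation NCBI applies.

```python
def transcripts_per_million(gene_ests, total_ests):
    """Scale a gene's EST count by library size so tissues can be compared."""
    if total_ests <= 0:
        raise ValueError("library must contain at least one EST")
    return gene_ests * 1_000_000 / total_ests


# Hypothetical data: (ESTs matching the gene, total ESTs in the tissue library).
libraries = {
    "liver": (120, 200_000),
    "brain": (45, 900_000),
}

profile = {
    tissue: round(transcripts_per_million(gene, total))
    for tissue, (gene, total) in libraries.items()
}

# Print tissues from highest to lowest normalized count.
for tissue, tpm in sorted(profile.items(), key=lambda item: item[1], reverse=True):
    print(f"{tissue}: {tpm} transcripts per million")
```

Without the division by library size, a deeply sequenced tissue would appear to express every gene more strongly, which is why raw counts give only a rough sense of expression.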
67. GENE EXPRESSION: GEO. Note: the GEO results contain all arrays that assay for this gene; most of these results are for specific disease or altered states and do not necessarily reflect wild-type, normal levels of expression.
73. “structure function”: the hemolysin protein bores a hole into red blood cells and sucks their insides out. The structure kind of looks like a hollow tack.
74. Note: the structure listing shows each individual chain (along with 3D domains and superfamilies) AND the chemical that was found in the crystal structure (see arrow)
76. Now we are coloring by domain. Also note the funky space-filling model. It makes proteins look fat.
77. Note that Super Families are defined: clicking on them will take you to the Conserved Domain Database
78. The Conserved Domain Database provides alignments across species of conserved domains, along with a general description of the domain
79. 3D domains are color coded. Note: 3D domains do not always correlate to Super Families! Clicking on the 3D domain will take you to related structures
80. You can select structures and then view the 3D alignment in Cn3D
81. Voilà! Structural alignment. Note: the sequences are aligned in the Sequence View box.
82. PubChem has three primary areas. BioAssay: a registry of assays that can be searched by small molecule. Substance: a redundant registry of compounds. Compound: a non-redundant, curated chemical database.
83. You can search PubChem by chemical name, CAS number, or even by similar structures. Records contain lots of additional information. Highlights: synonyms (which can be quite extensive in chemical nomenclature). Of particular note: if the compound shows up in Structure, you can link to a view in Cn3D that shows it complexed with protein/DNA/RNA!
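A name search like the one described above can also be expressed as a PubChem PUG REST request. The sketch below only builds the URL and sends nothing; the helper name `cids_by_name_url` and the example compound are hypothetical choices for illustration. Whether a given CAS number resolves through the same name route depends on it being registered as a synonym, so treat that as an assumption to verify.

```python
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_by_name_url(name):
    """Build (but do not send) a request for the compound IDs matching a name."""
    # Percent-encode everything, including spaces and slashes in chemical names.
    return f"{PUG}/compound/name/{quote(name, safe='')}/cids/JSON"


print(cids_by_name_url("aspirin"))
print(cids_by_name_url("acetylsalicylic acid"))
```

Because chemical synonyms can be extensive, different names passed to this helper may resolve to the same Compound record.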
85. BioSystems will display a short verbal description, a schematic of the system in question and a link to all of the genes, proteins and small molecules found in the system, along with links to related systems.
88. Discovery tools. High-quality DBs: RefSeq, GenBank. Database Ads: check out these resources! Sensors: are you looking for… Analysis tools: pre-computed & on the fly.
92. We Can Do It! Gene and RefSeq; Genome Maps; Allelic Var/Disease; Expression; Homologous G/P; Structure
93. Search for APOE in Entrez: Note that there are many different records in several different databases that have hits for APOE. Select PubMed.
94. Select APOE in Homo sapiens. We have used PubMed for its gene sensor, which is fantastically useful. However, you can also search directly in the Gene database.
95. LOTS of information in this report, including links IN the report, links to other NCBI databases and links to outside resources.
111. Let’s go to the SNP record. Cys130Arg is .0016. Extensive documentation…
112. OMIM, Prot, 3D, SeqView, GeneView, MapView, VarView, PubMed
113. NOTE: the default setting doesn’t show much, because it doesn’t include clinically associated variants – click this box and refresh.
114. This is the one we want! Let’s jump to this reference SNP
115. Hummmm… the two reference assemblies have a wild type allele, whereas Celera and HuRef carry the mutant allele. Let’s check out this area in HuRef using the sequence viewer – click on the chromosome position link.
116. Clicking on sequence will bring up the sequence and the CDS. You will note that HuRef (which means Craig Venter) carries the mutant allele.
120. Click on GEO Profiles to see actual gene expression array data.
121. Note, there are thousands of hits, meaning many gene arrays have assayed for this gene. However, most of these are in reference to a disease or altered state.
122. Use GDS596; it holds the results for “normal” gene expression. Click on a chart to see detailed results.
123. Highest expression in the liver, with lower levels throughout the brain.
Crystal structure of putative aminotransferase (YP_614685.1) from SILICIBACTER SP. TM1040 at 1.80 A resolution. To be published
cba-ramblings.blogspot.com
http://www.alz.org/alzheimers_disease_4719.asp
Reference Sequence. How people access. Expression. Genomic assemblies: maps region in Map Viewer, look at gene cluster on ch19, compare across two other genomes. Polymorphisms. Genotypes: reference, HuRef. Homologous. BLAST: panda. Genome Reference Consortium human.
OMIM: OMIM link; HGNC: HGNC listing; sv: Sequence View; pr: Proteins; dl: Download sequence region (corresponding contig region); ev: Evidence Viewer; mm: Model Maker; hm: HomoloGene; STS: UniSTS; SNP: SNPs linked to gene
Virtual northern blot
Human on top: APOE3, C. Chimp has “risk” allele, has R… interesting
Panda has an R. Restricted to completely sequenced eukaryotic genomes. Translating BLAST sequences against expressed sequences?
http://www.petwebsite.com/rabbits/rabbit_care.htm
Changes how the protein is processed, not so much its structure. Color by hydrophobicity!! When it interacts with lipid, the interior partially unfolds to interact with the lipid.