The document discusses the National Center for Biotechnology Information (NCBI), which maintains biological databases and provides bioinformatics tools. NCBI houses both primary databases directly submitted by researchers and secondary databases compiled from primary sources. Major databases include GenBank (nucleotide sequences), PubMed Central (biomedical literature), and reference sequence databases. Tools like BLAST, Entrez, and ORFfinder allow users to search and analyze sequence data. NCBI aims to make biomedical research data freely accessible worldwide.
2. NCBI
• What is NCBI?
• National Center for Biotechnology Information
• Established in 1988
• Part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH)
• Major aim: public databases
• Development of software tools for sequence analysis and dissemination of biomedical information
3. Roles of NCBI
1) Maintenance of biological databases, both primary and secondary, including GenBank
2) Provision of data retrieval systems such as Entrez
3) Provision of computational resources for the analysis of GenBank data and other biological data
4. Kinds of databases
Primary databases
• Original submissions by the experimentalists who generated the data
• Content is controlled by the submitters
• Examples include GenBank, dbSNP and GEO
Secondary databases
• Built from primary data retrieved from the primary databases
• Content is controlled by a third party (NCBI)
• Examples include RefSeq, RefSNP, NCBI Structure, Protein, etc.
6. NCBI
TOOLS
• BLAST: Standard BLAST, MegaBLAST, PSI-BLAST, PHI-BLAST, RPS-BLAST, BLAST 2 Sequences
• Database retrieval tool
• Specialized tools: ORF Finder, e-PCR, Spidey
• Sequence submission tool: BankIt
DATABASES
• Nucleotide database
• Literature database
• Protein database
• Expression database
• Structure database
7. Retrieval tool: Entrez
• Integrated database search and retrieval system
• Provides extensive links between and within database records
• Cross-references different databases
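Entrez can also be queried programmatically through NCBI's E-utilities web service. The sketch below only builds request URLs and does not contact the network; it assumes the public E-utilities base URL and the ESearch and ELink endpoints. The helper names, the search term and the record ID 12345 are made up for illustration.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service (assumed unchanged).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=5):
    """Build an ESearch URL that searches one Entrez database for a term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def elink_url(dbfrom, db, uid):
    """Build an ELink URL that follows cross-references between two databases."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


# Hypothetical query: nucleotide records for a human gene.
search = esearch_url("nucleotide", "BRCA1[Gene Name] AND Homo sapiens[Organism]")
# Hypothetical record ID, used only to show the shape of a cross-database link.
link = elink_url("nucleotide", "protein", "12345")
print(search)
print(link)
```

The first URL corresponds to the "search and retrieval" role of Entrez, the second to its links between database records.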
8. Sequence submission to NCBI
• Databases are constantly updated with new sequence submissions via sequence submission tools such as:
• BankIt
• Sequin
9. BankIt
• Web-based sequence submission tool
• Connect to the NCBI home page
• Follow the GenBank link in the left sidebar
• Tool of choice for simple submissions
• Can also be used for updating previously submitted information
10. Sequin
• Stand-alone sequence submission and updating tool
• Handles multiple sequence submissions
• Provides increased capacity for long sequence submissions
• Multiple annotations
• Phylogenetic and population studies
11. BLAST
• Basic Local Alignment Search Tool program
• Sequence similarity searches against a variety of different sequence databases
• UniGene, Gene, MMDB, GEO
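To make "local alignment" concrete, here is a minimal sketch of the classic Smith-Waterman algorithm. This is not how BLAST works internally: BLAST uses a faster heuristic, whereas Smith-Waterman scores every pair of positions exhaustively. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero, which makes it local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


result = smith_waterman("TTTACGTAAA", "GGACGTCC")
print(result)
```

In this toy input the two sequences share only the stretch ACGT, so the best local alignment covers that stretch and ignores the unrelated flanks.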
13. SPECIALIZED TOOLS
• NCBI offers many sequence analysis tools; three are described in the following slides:
1) ORF Finder
2) e-PCR
3) Spidey
14. ORF FINDER
• Open Reading Frame Finder
• Graphical analysis tool
• Finds all open reading frames in the user's sequence or in a sequence already in the databases
• Uses standard and alternative genetic codes for the analysis of reading frames
• Packaged with Sequin
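The idea of scanning all reading frames can be sketched in a few lines. This is a simplified illustration, not the ORF Finder implementation: it assumes only ATG as a start codon and only the three stop codons of the standard genetic code, and the function names and test sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=1):
    """List ORFs (ATG ... stop) in all six reading frames.

    Each hit is (strand, frame, start, end, orf_sequence); start and end
    are 0-based offsets into the strand that was scanned, and min_codons
    counts the codons before the stop codon.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        hits.append((strand, frame + 1, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return hits


orfs = find_orfs("CCATGAAATGACC")
print(orfs)
```

Supporting alternative genetic codes, as the slide mentions, would mean swapping in a different set of start and stop codons.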
15. e-PCR
• Electronic polymerase chain reaction
• Searches for sequence-tagged sites (STSs)
• The whole template DNA is searched for STSs
• New database searches a query sequence against a sequence database
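An STS is recognised when both PCR primers are found on the template in the right orientation and at roughly the expected distance. The sketch below illustrates that idea with exact string matching only; the real tool is more elaborate, and the function name, primers, template and tolerance are all made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sts(template, fwd_primer, rev_primer, expected_size, tolerance=50):
    """Return (start, end, size) for each primer pair hit on the template.

    The forward primer must match the template directly, the reverse
    primer must match as its reverse complement further downstream, and
    the product size must be within `tolerance` of `expected_size`.
    """
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_site = reverse_complement(rev_primer.upper())
    hits = []
    start = template.find(fwd)
    while start != -1:
        pos = template.find(rev_site, start + len(fwd))
        while pos != -1:
            end = pos + len(rev_site)
            size = end - start
            if abs(size - expected_size) <= tolerance:
                hits.append((start, end, size))
            pos = template.find(rev_site, pos + 1)
        start = template.find(fwd, start + 1)
    return hits


# Toy template: forward primer site, a 10-base spacer, reverse primer site.
template = "GGG" + "ACGTAC" + "T" * 10 + "CCTTGA" + "AAA"
sts_hits = find_sts(template, "ACGTAC", "TCAAGG", expected_size=22, tolerance=2)
print(sts_hits)
```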
16. Spidey
• An mRNA-to-genome alignment tool
• Searches databases via BLAST
• Takes as input a single genomic sequence and a set of mRNA sequences in FASTA format
• Pseudogenes and paralogs are eliminated in this search and the true gene is selected
18. Nucleotide database: GenBank
• NCBI's primary sequence database
• Comprehensive public database of nucleotide sequences
• Bibliographic support
• Built from authors' direct submissions to GenBank, including ESTs
• GenBank, EMBL and DDBJ together form the International Nucleotide Sequence Database Collaboration (INSDC)
• Collaborative approach: data are shared daily
19. HomoloGene
• Automated detection of homologs
• Covers genes of completely sequenced eukaryotic genomes
• Analyzes the proteins of the input organisms
• Uses blastp
• Taxonomic trees are constructed
• Each match is analyzed statistically, and orthologs and paralogs are identified
20. dbSNP
• Database of single nucleotide polymorphisms
• Also includes short deletion and insertion polymorphisms
• SNPs can be mapped to 3D structures via Cn3D and MMDB
• Functional variants can be matched with OMIM
21. Literature database: PMC
• PubMed Central
• Digital archive of peer-reviewed life sciences journals
• Holds a very large number of full-text journal articles
• Access to full text is immediate or within 12 months of publication
22. Protein database
• Entrez Protein: NCBI's protein sequence database
• Databases are cross-searched
• PDB, Swiss-Prot
• Taxonomic relations
• CDD: Conserved Domain Database
23. Gene expression database
• Distribution and regulation of transcriptional products
• Normal and abnormal cell types
• Many techniques have been developed for surveying genome-wide transcript expression
24. SAGEmap
• Serial Analysis of Gene Expression map
• Gene expression data analysis
• Tag-to-gene function map
• Maps SAGE tags to gene clusters or to a single gene
• A reciprocal gene-to-tag map is also available
• Updated weekly
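The tag-to-gene map and its reciprocal gene-to-tag map are two views of the same assignments. The sketch below shows how one can be derived from the other; the tag sequences and gene names are invented placeholders, not real SAGEmap data.

```python
from collections import defaultdict

# Hypothetical tag-to-gene assignments; a tag may map to several genes.
tag_to_genes = {
    "AAACCCGGGT": ["geneA"],
    "TTTGGGCCCA": ["geneA", "geneB"],
    "CCCAAATTTG": ["geneC"],
}


def invert_map(tag_map):
    """Build the reciprocal gene-to-tag map from a tag-to-gene map."""
    gene_map = defaultdict(list)
    for tag, genes in tag_map.items():
        for gene in genes:
            gene_map[gene].append(tag)
    return dict(gene_map)


gene_to_tags = invert_map(tag_to_genes)
print(gene_to_tags)
```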
25. Structural database: MMDB
• Molecular Modeling Database (MMDB)
• 3D macromolecular structures
• Structures are determined experimentally by X-ray diffraction (XRD) and NMR
• Evolutionary history of function
• Relationships between macromolecules
30. Chemical database: PubChem
• Database of chemical molecules
• Freely accessible through a web user interface
• Chemical structures
• Diagnostic and therapeutic agents
• Molecular mass below 2000 u
• Bridge between macromolecular genomics and the small organic molecules of cellular metabolism