This document discusses databases in bioinformatics. It begins by noting the rapid increase in biological data from sources like gene sequences, protein sequences, structural data, and gene expression data. It then defines biological databases as structured, searchable collections of data that are periodically updated and cross-referenced. The major purposes of databases are to make biological data available, systematize the data, and allow analysis of computed biological data. The document provides a brief history of biological databases and sequencing efforts. It also classifies biological databases based on data type, maintenance status, data access, data sources, database design, and organism. Specific databases discussed include DDBJ, EMBL, GenBank, Swiss-Prot, and NCB
2. Introduction
Fast increase in biological information
Biological science has now turned into a data-rich science
Gene sequences
Amino acid sequences in proteins
Motifs and domains in proteins
Structural data from X-ray diffraction (XRD) and NMR
Metabolic pathways
Protein-protein interactions
Gene expression data from DNA microarrays
3. Biological databases
A biological database is a collection of data that is structured, searchable, periodically updated and cross-referenced.
Some databases are multifunctional
The major purposes of databases are as follows:
Availability of biological data
Systematization of data
Analysis of computed biological data
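The four properties in the definition above (structured, searchable, updated, cross-referenced) can be sketched as a small in-memory data structure. This is only an illustration: the `Record` fields, the `ToyDatabase` class and all identifiers are hypothetical and do not correspond to the schema of any real database.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One entry in a toy biological database (hypothetical schema)."""
    accession: str        # unique identifier of the entry
    description: str      # free-text description used for searching
    sequence: str         # the stored biological data
    version: int = 1      # incremented every time the entry is updated
    cross_refs: dict = field(default_factory=dict)  # other database name -> identifier there


class ToyDatabase:
    """In-memory collection that is structured, searchable and updatable."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.accession] = record

    def search(self, keyword):
        """Return sorted accessions whose description contains the keyword."""
        keyword = keyword.lower()
        return sorted(acc for acc, rec in self._records.items()
                      if keyword in rec.description.lower())

    def update_sequence(self, accession, new_sequence):
        """Replace the stored sequence and return the new version number."""
        rec = self._records[accession]
        rec.sequence = new_sequence
        rec.version += 1
        return rec.version

    def cross_reference(self, accession, database):
        """Return this entry's identifier in another database, or None."""
        return self._records[accession].cross_refs.get(database)
```

For example, after `db.add(Record("X0001", "Example insulin precursor", "MALW", cross_refs={"OtherDB": "P0001"}))`, a call to `db.search("insulin")` finds the entry and `db.cross_reference("X0001", "OtherDB")` follows the link to the other (made-up) database.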
4. History
1956: the first protein sequence, that of insulin, was reported
51 amino acids
The Atlas of Protein Sequence and Structure, published in 1965 by Margaret Dayhoff et al., was a printed book.
It became the basis for the Protein Information Resource (PIR)
First nucleotide sequence: yeast tRNA
77 bases
During this time the 3D structure of proteins was being studied and the renowned Protein Data Bank (PDB) was established.
5. …
The first published genome of a free-living organism was that of the bacterium Haemophilus influenzae, in 1995
Genome?
All genes? Or all DNA?
Why are complete genomes interesting?
6. Aspects of genome analysis
Ab initio gene prediction
Locus
Gene identification by EST (expressed sequence tags)
Gene prediction via EST
Gene prediction via comparison, coding and regulatory regions
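Ab initio gene prediction works from the sequence alone, without EST or comparative evidence. Its simplest form can be sketched as a scan for open reading frames (ORFs): a start codon followed in the same frame by a stop codon. The function name `find_orfs` and the `min_codons` parameter are assumptions made for this illustration. Real gene predictors rely on statistical models and also handle the reverse strand and introns, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    start is the index of the A in ATG, end is the index just past the
    stop codon, and min_codons counts codons from ATG up to (not
    including) the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                    # close it and keep scanning
    return orfs
```

For the made-up sequence `"CCATGAAATTTGGGTAACC"`, `find_orfs` reports a single ORF, `(2, 17, 2)`, which spans `ATGAAATTTGGGTAA`.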
7. Features of biological databases
1) Data heterogeneity
2) High volume data
3) Uncertainty
4) Data Curation
5) Large scale data integration
6) Data sharing
7) Dynamic and subject to change
11. Based on data access
1) Publicly available
2) Available with copyright
3) Browsing only: accessible but not downloadable
4) Academic but not freely available
5) Proprietary commercial
6) Restricted
13. Primary databases
Contain original data submitted by researchers
Mostly public or open access
GenBank (NCBI)
EMBL
SWISS-PROT
NDB
14. Secondary databases
Contain results derived from the entries of primary databases
Manually created or automatically generated
Swiss-Prot, which is manually curated, is an example of a secondary database
17. DDBJ
DNA Data Bank of Japan
Nucleotide sequence database
Established in 1986
Works in collaboration with EMBL and NCBI
After 20 years, another collaborative project named INSDC (International Nucleotide Sequence Database Collaboration) was formed
EMBL, GenBank and DDBJ
21. SWISS-PROT
Protein sequence database
Maintained by the Swiss Institute of Bioinformatics (SIB) in Switzerland and the European Bioinformatics Institute (EBI)
The output format is the Swiss-Prot file format, which is explained under molecular file formats
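As a rough illustration of the Swiss-Prot file format: each line starts with a two-letter line code (for example ID, AC, DE, SQ), the sequence follows the SQ line, and `//` ends the entry. The minimal sketch below reads only those four line codes. The `parse_entry` function and the sample entry are hypothetical: the identifier, accession, description, sequence and the zero values on the SQ line are invented placeholders, not a real record.

```python
SAMPLE = """\
ID   TEST_EXAMPLE            Reviewed;          20 AA.
AC   X00001;
DE   RecName: Full=Hypothetical example protein;
SQ   SEQUENCE   20 AA;  0 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QISFVKSHFS
//
"""


def parse_entry(text):
    """Parse one flat-file entry into a dict (only ID, AC, DE and SQ)."""
    entry = {"id": None, "accessions": [], "description": [], "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):               # end-of-entry marker
            break
        if in_sequence:                         # sequence lines: strip spacing
            entry["sequence"] += line.replace(" ", "")
            continue
        code, content = line[:2], line[5:].strip()
        if code == "ID":
            entry["id"] = content.split()[0]    # entry name is the first token
        elif code == "AC":
            entry["accessions"] += [a.strip() for a in content.split(";") if a.strip()]
        elif code == "DE":
            entry["description"].append(content)
        elif code == "SQ":
            in_sequence = True                  # following lines hold residues
    entry["description"] = " ".join(entry["description"])
    return entry
```

Calling `parse_entry(SAMPLE)` returns a dictionary with the entry name, the list of accession numbers, the joined description and the sequence as one string without spaces. A production parser would need to cover the many other line codes and multi-entry files.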