SlideShare une entreprise Scribd logo
1  sur  12
Web based servers and softwares for genome analysis
Dr. Naveen Gaurav
Associate Professor and Head
Department of Biotechnology
Shri Guru Ram Rai University
Dehradun
Characteristics of Biological Data (Genome Data Management)
There are many characteristics of biological data. All these characteristics make the
management of biological information a particularly challenging problem. Here mainly we
will focus on characteristics of biological information and multidisciplinary field called
bioinformatics. Bioinformatics, now a days has emerged with graduate degree programs in
several universities.
Characteristics of Biological Information:
1. There is a high amount and range of variability in data.
There should be a flexibility in biological systems so that it can handle data types and values.
Placing constraints on data types must be limited with such a wide range of possible data
values. There can be a loss of information when there is exclusion of such values.
2. There will be a difference in representation of the same data by different biologists.
This can be done even using the same system. There is a multiple ways to model any given
entity with the results often reflecting the particular focus of the scientist.
There should be a linking of data elements in a network of schemas.
3. Defining the complex queries and also important to the biologists.
Complex queries must be supported by biological systems. Knowledge of the data structure
is needed for the average users because with the help of this knowledge average user can
construct a complex query across data sets on their own. For this systems must provide some
tools for building these queries.
4. When compared with most other domains or applications, biological data becomes
highly complex.
Such data must ensure that no information is lost during biological data modelling and such
data must be able to represent a complex substructure of data as well as relationships. An
additional context is provided by the structure of the biological data for interpretation of
the information.
5. There is a rapid change in schemas of biological databases.
There should be a support of schema evolution and data object migration so that there can
be an improved information flow between generations or releases of databases.
The relational database systems support the ability to extend the schema and a frequent
occurrence in the biological setting.
6. Most biologists are not likely to have knowledge of internal structure of the database or
about schema design.
Users need an information which can be displayed in a manner such that it can be
applicable to the problem which they are trying to address. Also the data structure should
be reflected in an easy and understandable manner. An information regarding the meaning
of the schema is not provided to the user because of the failure by the relational schemas.
A present search interfaces is provided by the web interfaces, which may limit access into
the database.
7. There is no need of the write access to the database by the users of biological data,
instead they only require read access.
There is limitation of write access to the privileged users called curators. There are only
small numbers of users which require write access but a wide variety of read access
patterns are generated by the users into the databases.
8. Access to “old” values of the data are required by the users of biological data most often
while verifying the previously reported results.
Hence system of archives must support the changes to the values of the data in the
database. Access to both the most recent version of data value and its previous version are
important in the biological domain.
9. Added meaning is given by the context of data for its use in biological applications.
Whenever appropriate, context must be maintained and conveyed to the user. For the
maximization of the interpretation of a biological data value, it should be possible to
integrate as many contexts as possible.
Web based servers and softwares for genome analysis
Genome browser provides a graphical interface for users to browse, search, retrieve and
analyze genomic sequence and annotation data. Web-based genome browsers can be
classified into general genome browsers with multiple species and species-specific genome
browsers. Currently, there are two types of web-based genome browsers. The first type is the
multiple-species genome browsers implemented in, among others, the UCSC genome
database, the ENSEMBL project, the NCBI Map viewer website, the Phytozome and Gramene
platforms. These genome browsers integrate sequence and annotations for dozens of
organisms and further promote cross-species comparative analysis. Most of them contain
abundant annotations, covering gene model, transcript evidence, expression profiles,
regulatory data, genomic conversation, etc.
1. ENSEMBL (https://asia.ensembl.org/index.html): ENSEMBL is a genome browser for vertebrate
genomes that supports research in comparative genomics, evolution, sequence variation and
transcriptional regulation. ENSEMBL annotate genes, computes multiple alignments, predicts
regulatory function and collects disease data. ENSEMBL tools include BLAST, BLAT, BioMart and
the Variant Effect Predictor (VEP) for all supported species. The Ensembl Rapid Release website
provides annotation for recently produced, publicly available vertebrate and non-vertebrate
genomes from biodiversity initiatives such as Darwin Tree of Life, the Vertebrate Genomes
Project and the Earth BioGenome Project.
Method: a. Open website by the link https://grch37.ensembl.org/index.html
b. Enter the name of living organism (microrganisms, fungi, algae, plants and animals)
c. Enter the obtain sequence (isolated genome from test organism) and compare with data feed in
ensembl tool/website.
2. VISTA: a. Windows Vista, the line of Microsoft Windows client operating systems released
in 2006 and 2007
b. VistA, (Veterans Health Information Systems and Technology Architecture) a medical
records system of the United States Department of Veterans Affairs and others worldwide
c. VISTA (comparative genomics), software tools for genome analysis and genomic sequence
comparisons
d. VistaPro, and Vista, 3D landscape generation software for the Amiga and PC
e. VIsualizing STructures And Sequences, bioinformatics software
Organizations and institutions for VISTA:
1. Vista Entertainment Solutions, a New Zealand software company specializing in solutions for
the cinema industry
2. Americorps VISTA, a national service program to fight poverty through local government
agencies and non-profit organizations
3. Ventura Intercity Service Transit Authorit, a public transportation agency in Ventura County,
California, US
4. Vista Community College, now Berkeley City College, a community college in Berkeley,
California, US
5. Vista Federal Credit Union, now merged with Partners Federal Credit Union, a credit union
that serves employees of The Walt Disney Company
6. Vista University, a now-closed South African university
7. Volunteers in Service to America
8. Vista Equity Partners
9. Vista Oil & Gas, an oil and gas company founded by Miguel Galuccio
10. Vista Outdoor Inc., a U.S.-based publicly traded outdoor and shooting sports compan.
3. UCSC Genome Browser (https://genome.ucsc.edu/): On June 22, 2000, UCSC and the
other members of the International Human Genome Project consortium completed the first working
draft of the human genome assembly, forever ensuring free public access to the genome and the
information it contains. A few weeks later, on July 7, 2000, the newly assembled genome was
released on the web at http://genome.ucsc.edu, along with the initial prototype of a graphical
viewing tool, the UCSC Genome Browser. In the ensuing years, the website has grown to include a
broad collection of vertebrate and model organism assemblies and annotations, along with a large
suite of tools for viewing, analyzing and downloading data. In the years since its inception, the UCSC
Browser has expanded to accommodate genome sequences of all vertebrate species and selected
invertebrates for which high-coverage genomic sequences is available,now including 46 species. High
coverage is necessary to allow overlap to guide the construction of larger contiguous regions.
Genomic sequences with less coverage are included in multiple-alignment tracks on some browsers,
but the fragmented nature of these assemblies does not make them suitable for building full featured
browsers. (more below on multiple-alignment tracks). The species hosted with full-featured genome
browsers are shown in the table.
Browser functionality: The large amount of data about biological systems that is accumulating in the
literature makes it necessary to collect and digest information using the tools of bioinformatics. The
UCSC Genome Browser presents a diverse collection of annotation datasets (known as "tracks" and
presented graphically), including mRNA alignments, mappings of DNA repeat elements, gene
predictions, gene-expression data, disease-association data (representing the relationships of genes
to diseases), and mappings of commercially available gene chips (e.g., Illumina and Agilent). The basic
paradigm of display is to show the genome sequence in the horizontal dimension, and show graphical
representations of the locations of the mRNAs, gene predictions, etc. Blocks of color along the
coordinate axis show the locations of the alignments of the various data types. The ability to show
this large variety of data types on a single coordinate axis makes the browser a handy tool for the
vertical integration of the data.
To find a specific gene or genomic region, the user may type in the gene name, a DNA sequence, an
accession number for an RNA, the name of a genomic cytological band (e.g., 20p13 for band 13 on
the short arm of chr20) or a chromosomal position (chr17:38,450,000-38,531,000 for the region
around the gene BRCA1). Presenting the data in the graphical format allows the browser to present
link access to detailed information about any of the annotations. The gene details page of the UCSC
Genes track provides a large number of links to more specific information about the gene at many
other data resources, such as Online Mendelian Inheritance in Man (OMIM) and SwissProt.
Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for
speed. By pre-aligning the 55 million RNAs of GenBank to each of the 81 genome assemblies (many of
the 46 species have more than one assembly), the browser allows instant access to the alignments of
any RNA to any of the hosted species. The juxtaposition of the many types of data allow researchers
to display exactly the combination of data that will answer specific questions. A pdf/postscript output
functionality allows export of a camera-ready image for publication in academic journals.
One unique and useful feature that distinguishes the UCSC Browser from other genome browsers is
the continuously variable nature of the display. Sequence of any size can be displayed, from a single
DNA base up to the entire chromosome (human chr1 = 245 million bases, Mb) with full annotation
tracks. Researchers can display a single gene, a single exon, or an entire chromosome band, showing
dozens or hundreds of genes and any combination of the many annotations. A convenient drag-and-
zoom feature allows the user to choose any region in the genome image and expand it to occupy the
full screen. Researchers may also use the browser to display their own data via the Custom Tracks
tool. This feature allows users to upload a file of their own data and view the data in the context of
the reference genome assembly. Users may also use the data hosted by UCSC, creating subsets of the
data of their choosing with the Table Browser tool (such as only the SNPs that change the amino acid
sequence of a protein) and display this specific subset of the data in the browser as a Custom Track.
Any browser view created by a user, including those containing Custom Tracks, may be shared with
other users via the Saved Sessions tool.
Analysis tools
The UCSC site hosts a set of genome analysis tools, including a full-featured GUI interface for
mining the information in the browser database, a FAST sequence alignment tool BLAT that is
also useful for simply finding sequences in the massive sequence (human genome = 3.23
billion bases [Gb]) of any of the featured genomes.
A liftOver tool uses whole-genome alignments to allow conversion of sequences from one
assembly to another or between species. The Genome Graphs tool allows users to view all
chromosomes at once and display the results of genome-wide association studies (GWAS).
The Gene Sorter displays genes grouped by parameters not linked to genome location, such
as expression pattern in tissues.
4. NCBI genome (https://www.ncbi.nlm.nih.gov/genome/): The National Center for
Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM),
a branch of the National Institutes of Health (NIH). It is approved and funded by the government of
the United States. The NCBI is located in Bethesda, Maryland and was founded in 1988 through
legislation sponsored by Senator Claude Pepper.The NCBI houses a series of databases relevant
to biotechnology and biomedicine and is an important resource for bioinformatics tools and
services. Major databases include GenBank for DNA sequences and PubMed, a bibliographic
database for biomedical literature. Other databases include the NCBI Epigenomics database. All
these databases are available online through the Entrez search engine. NCBI was directed by David
Lipman, one of the original authors of the BLAST sequence alignment program and a widely
respected figure in bioinformatics. He also led an intramural research program, including groups
led by Stephen Altschul (another BLAST co-author), David Landsman, Eugene Koonin, John Wilbur,
Teresa Przytycka, and Zhiyong Lu. David Lipman stood down from his post in May 2017.
GenBank: NCBI had responsibility for making available the GenBank DNA sequence
database since 1992. GenBank coordinates with individual laboratories and other sequence
databases such as those of the European Molecular Biology Laboratory (EMBL) and
the DNA Data Bank of Japan (DDBJ).
Since 1992, NCBI has grown to provide other databases in addition to GenBank. NCBI
provides Gene, Online Mendelian Inheritance in Man, the Molecular Modeling Database
(3D protein structures), dbSNP (a database of single-nucleotide polymorphisms), the
Reference Sequence Collection, a map of the human genome, and a taxonomy browser,
and coordinates with the National Cancer Institute to provide the Cancer Genome Anatomy
Project. The NCBI assigns a unique identifier (taxonomy ID number) to each species of
organism. The NCBI has software tools that are available through internet browsers or by
FTP. For example, BLAST is a sequence similarity searching program. BLAST can do
sequence comparisons against the GenBank DNA database in less than 15 seconds.
NCBI Bookshelf: The NCBI Bookshelf is a collection of freely accessible, downloadable, on-
line versions of selected biomedical books. The Bookshelf covers a wide range of topics
including molecular biology, biochemistry, cell biology, genetics, microbiology, disease
states from a molecular and cellular point of view, research methods, and virology. Some of
the books are online versions of previously published books, while others, such as Coffee
Break, are written and edited by NCBI staff. The Bookshelf is a complement to the Entrez
PubMed repository of peer-reviewed publication abstracts in that Bookshelf contents
provide established perspectives on evolving areas of study and a context in which many
disparate individual pieces of reported research can be organized.
Basic Local Alignment Search Tool (BLAST): BLAST is an algorithm used for calculating sequence
similarity between biological sequences such as nucleotide sequences of DNA and amino acid
sequences of proteins. BLAST is a powerful tool for finding sequences similar to the query sequence
within the same organism or in different organisms. It searches the query sequence on NCBI databases
and servers and posts the results back to the person's browser in the chosen format. Input sequences to
the BLAST are mostly in FASTA or Genbank format while output could be delivered in a variety of formats
such as HTML, XML formatting, and plain text. HTML is the default output format for NCBI's web-page.
Results for NCBI-BLAST are presented in graphical format with all the hits found, a table with sequence
identifiers for the hits having scoring related data, along with the alignments for the sequence of interest
and the hits received with analogous BLAST scores for these.
Entrez: The Entrez Global Query Cross-Database Search System is used at NCBI for all the major
databases such as Nucleotide and Protein Sequences, Protein Structures, PubMed, Taxonomy, Complete
Genomes, OMIM, and several others.[10] Entrez is both an indexing and retrieval system having data from
various sources for biomedical research. NCBI distributed the first version of Entrez in 1991, composed of
nucleotide sequences from PDB and GenBank, protein sequences from SWISS-PROT, translated GenBank,
PIR, PRF, PDB, and associated abstracts and citations from PubMed. Entrez is specially designed to
integrate the data from several different sources, databases, and formats into a uniform information
model and retrieval system which can efficiently retrieve that relevant references, sequences and
structures.[11]
Gene: Gene has been implemented at NCBI to characterize and organize the information about genes. It
serves as a major node in the nexus of the genomic map, expression, sequence, protein function,
structure, and homology data. A unique GeneID is assigned to each gene record that can be followed
through revision cycles. Gene records for known or predicted genes are established here and are
demarcated by map positions or nucleotide sequences. Gene has several advantages over its
predecessor, LocusLink, including, better integration with other databases in NCBI, broader taxonomic
scope, and enhanced options for query and retrieval provided by the Entrez system.[12]
Protein database: Protein database maintains the text record for individual protein
sequences, derived from many different resources such as NCBI Reference Sequence
(RefSeq) project, GenBank, PDB, and UniProtKB/SWISS-Prot. Protein records are present in
different formats including FASTA and XML and are linked to other NCBI resources. Protein
provides the relevant data to the users such as genes, DNA/RNA sequences, biological
pathways, expression and variation data, and literature. It also provides the pre-determined
sets of similar and identical proteins for each sequence as computed by the BLAST. The
Structure database of NCBI contains 3D coordinate sets for experimentally-determined
structures in PDB that are imported by NCBI. The Conserved Domain database (CDD of
protein contains sequence profiles that characterize highly conserved domains within
protein sequences. It also has records from external resources like SMART and Pfam. There
is another database in a protein known as Protein Clusters database which contains sets of
proteins sequences that are clustered according to the maximum alignments between the
individual sequences as calculated by BLAST.
Pubchem database:
PubChem database of NCBI is a public resource for molecules and their activities against
biological assays. PubChem is searchable and accessible by Entrez information retrieval
system.
Thank you
References: Online notes, notes from research papers and Books by google search Engine

Contenu connexe

Tendances (20)

Genome annotation
Genome annotationGenome annotation
Genome annotation
 
Protein database
Protein databaseProtein database
Protein database
 
Sts
StsSts
Sts
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
PIR- Protein Information Resource
PIR- Protein Information ResourcePIR- Protein Information Resource
PIR- Protein Information Resource
 
Express sequence tags
Express sequence tagsExpress sequence tags
Express sequence tags
 
Cath
CathCath
Cath
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Phylogenetic analysis
Phylogenetic analysis Phylogenetic analysis
Phylogenetic analysis
 
Est database
Est databaseEst database
Est database
 
2d Page
2d Page2d Page
2d Page
 
Entrez databases
Entrez databasesEntrez databases
Entrez databases
 
Site directed mutagenesis by pcr
Site directed mutagenesis by pcrSite directed mutagenesis by pcr
Site directed mutagenesis by pcr
 
Genome annotation 2013
Genome annotation 2013Genome annotation 2013
Genome annotation 2013
 
The ensembl database
The ensembl databaseThe ensembl database
The ensembl database
 
Ddbj
DdbjDdbj
Ddbj
 
Library screening
Library screeningLibrary screening
Library screening
 
Scoring matrices
Scoring matricesScoring matrices
Scoring matrices
 
Functional proteomics, methods and tools
Functional proteomics, methods and toolsFunctional proteomics, methods and tools
Functional proteomics, methods and tools
 
NCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology InformationNCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology Information
 

Similaire à Web based servers and softwares for genome analysis

PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...ijseajournal
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...ijseajournal
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...CSCJournals
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBenjamin Good
 
GASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and PrimersGASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and Primersijdmtaiir
 
V1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.docV1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.docpraveena06
 
Introduction to bioinformatics
Introduction to bioinformaticsIntroduction to bioinformatics
Introduction to bioinformaticsmaulikchaudhary8
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data miningSangeeta Das
 
Overall Vision for NRNB: 2015-2020
Overall Vision for NRNB: 2015-2020Overall Vision for NRNB: 2015-2020
Overall Vision for NRNB: 2015-2020Alexander Pico
 
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGijbbjournal
 
Genome voyager-beta-brochure
Genome voyager-beta-brochureGenome voyager-beta-brochure
Genome voyager-beta-brochureXing Xu
 

Similaire à Web based servers and softwares for genome analysis (20)

Protocols for genomics and proteomics
Protocols for genomics and proteomics Protocols for genomics and proteomics
Protocols for genomics and proteomics
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...
 
Bioinformatics principles and applications
Bioinformatics principles and applicationsBioinformatics principles and applications
Bioinformatics principles and applications
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiers
 
GASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and PrimersGASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and Primers
 
V1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.docV1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.doc
 
Introduction to bioinformatics
Introduction to bioinformaticsIntroduction to bioinformatics
Introduction to bioinformatics
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
 
97320630009588
9732063000958897320630009588
97320630009588
 
WWW in biotechnology
WWW in biotechnology WWW in biotechnology
WWW in biotechnology
 
Web Apollo Workshop UIUC
Web Apollo Workshop UIUCWeb Apollo Workshop UIUC
Web Apollo Workshop UIUC
 
Overall Vision for NRNB: 2015-2020
Overall Vision for NRNB: 2015-2020Overall Vision for NRNB: 2015-2020
Overall Vision for NRNB: 2015-2020
 
D1803012022
D1803012022D1803012022
D1803012022
 
B.3.5
B.3.5B.3.5
B.3.5
 
iBioSearch: The Integrated Biological Database Search
iBioSearch: The Integrated Biological Database SearchiBioSearch: The Integrated Biological Database Search
iBioSearch: The Integrated Biological Database Search
 
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
 
Genome voyager-beta-brochure
Genome voyager-beta-brochureGenome voyager-beta-brochure
Genome voyager-beta-brochure
 
Bioinformatics on internet
Bioinformatics on internetBioinformatics on internet
Bioinformatics on internet
 

Plus de Dr. Naveen Gaurav srivastava

Polymerase chain reaction andRestriction fragment length polymorphism (RFLP):...
Polymerase chain reaction andRestriction fragment length polymorphism (RFLP):...Polymerase chain reaction andRestriction fragment length polymorphism (RFLP):...
Polymerase chain reaction andRestriction fragment length polymorphism (RFLP):...Dr. Naveen Gaurav srivastava
 
Intracellular Compartments / Intracellular fluid
Intracellular Compartments / Intracellular fluidIntracellular Compartments / Intracellular fluid
Intracellular Compartments / Intracellular fluidDr. Naveen Gaurav srivastava
 
Biotechnology for the livestock improvements and phb degradation
Biotechnology for the livestock improvements and phb degradationBiotechnology for the livestock improvements and phb degradation
Biotechnology for the livestock improvements and phb degradationDr. Naveen Gaurav srivastava
 
Environmental biotech and plant tissue culture protocols
Environmental biotech and plant tissue culture protocolsEnvironmental biotech and plant tissue culture protocols
Environmental biotech and plant tissue culture protocolsDr. Naveen Gaurav srivastava
 
Biotechnology for the livestock improvements and phb degradation
Biotechnology for the livestock improvements and phb degradationBiotechnology for the livestock improvements and phb degradation
Biotechnology for the livestock improvements and phb degradationDr. Naveen Gaurav srivastava
 
Treatment of municipal waste and industrial effluents
Treatment of municipal waste and industrial effluentsTreatment of municipal waste and industrial effluents
Treatment of municipal waste and industrial effluentsDr. Naveen Gaurav srivastava
 

Plus de Dr. Naveen Gaurav srivastava (20)

Global environmental problems
Global environmental problemsGlobal environmental problems
Global environmental problems
 
Polymerase chain reaction andRestriction fragment length polymorphism (RFLP):...
Polymerase chain reaction andRestriction fragment length polymorphism (RFLP):...Polymerase chain reaction andRestriction fragment length polymorphism (RFLP):...
Polymerase chain reaction andRestriction fragment length polymorphism (RFLP):...
 
Intracellular Compartments / Intracellular fluid
Intracellular Compartments / Intracellular fluidIntracellular Compartments / Intracellular fluid
Intracellular Compartments / Intracellular fluid
 
Types of receptors
Types of receptors Types of receptors
Types of receptors
 
Hydrogen production by microbes
Hydrogen production by microbesHydrogen production by microbes
Hydrogen production by microbes
 
Permanent Tissues of Plants
Permanent Tissues of PlantsPermanent Tissues of Plants
Permanent Tissues of Plants
 
Biotechnology for the livestock improvements and phb degradation
Biotechnology for the livestock improvements and phb degradationBiotechnology for the livestock improvements and phb degradation
Biotechnology for the livestock improvements and phb degradation
 
Environmental biotech and plant tissue culture protocols
Environmental biotech and plant tissue culture protocolsEnvironmental biotech and plant tissue culture protocols
Environmental biotech and plant tissue culture protocols
 
Monoclonal and polyclonal in diagnostics
Monoclonal and polyclonal in diagnosticsMonoclonal and polyclonal in diagnostics
Monoclonal and polyclonal in diagnostics
 
Enzyme immuno assay and radioimmunoassay
Enzyme immuno assay and radioimmunoassayEnzyme immuno assay and radioimmunoassay
Enzyme immuno assay and radioimmunoassay
 
Biotechnology for the livestock improvements and phb degradation
Biotechnology for the livestock improvements and phb degradationBiotechnology for the livestock improvements and phb degradation
Biotechnology for the livestock improvements and phb degradation
 
Shotgun and clone contig method
Shotgun and clone contig methodShotgun and clone contig method
Shotgun and clone contig method
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
Vaccine production in plants
Vaccine production in plantsVaccine production in plants
Vaccine production in plants
 
Treatment of municipal waste and industrial effluents
Treatment of municipal waste and industrial effluentsTreatment of municipal waste and industrial effluents
Treatment of municipal waste and industrial effluents
 
Somaclonal variations
Somaclonal variationsSomaclonal variations
Somaclonal variations
 
Solid waste management
Solid waste managementSolid waste management
Solid waste management
 
Secondary metabolites
Secondary metabolitesSecondary metabolites
Secondary metabolites
 
Resistance to biotic stresses
Resistance to biotic stressesResistance to biotic stresses
Resistance to biotic stresses
 
Pyrosequencing
PyrosequencingPyrosequencing
Pyrosequencing
 

Dernier

TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxruthvilladarez
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxElton John Embodo
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxRosabel UA
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEaurabinda banchhor
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 

Dernier (20)

TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSE
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 

Web based servers and softwares for genome analysis

  • 1. Web based servers and softwares for genome analysis Dr. Naveen Gaurav Associate Professor and Head Department of Biotechnology Shri Guru Ram Rai University Dehradun
  • 2. Characteristics of Biological Data (Genome Data Management) There are many characteristics of biological data. All these characteristics make the management of biological information a particularly challenging problem. Here mainly we will focus on characteristics of biological information and multidisciplinary field called bioinformatics. Bioinformatics, now a days has emerged with graduate degree programs in several universities. Characteristics of Biological Information: 1. There is a high amount and range of variability in data. There should be a flexibility in biological systems so that it can handle data types and values. Placing constraints on data types must be limited with such a wide range of possible data values. There can be a loss of information when there is exclusion of such values. 2. There will be a difference in representation of the same data by different biologists. This can be done even using the same system. There is a multiple ways to model any given entity with the results often reflecting the particular focus of the scientist. There should be a linking of data elements in a network of schemas. 3. Defining the complex queries and also important to the biologists. Complex queries must be supported by biological systems. Knowledge of the data structure is needed for the average users because with the help of this knowledge average user can construct a complex query across data sets on their own. For this systems must provide some tools for building these queries.
  • 3. 4. When compared with most other domains or applications, biological data becomes highly complex. Such data must ensure that no information is lost during biological data modelling and such data must be able to represent a complex substructure of data as well as relationships. An additional context is provided by the structure of the biological data for interpretation of the information. 5. There is a rapid change in schemas of biological databases. There should be a support of schema evolution and data object migration so that there can be an improved information flow between generations or releases of databases. The relational database systems support the ability to extend the schema and a frequent occurrence in the biological setting. 6. Most biologists are not likely to have knowledge of internal structure of the database or about schema design. Users need an information which can be displayed in a manner such that it can be applicable to the problem which they are trying to address. Also the data structure should be reflected in an easy and understandable manner. An information regarding the meaning of the schema is not provided to the user because of the failure by the relational schemas. A present search interfaces is provided by the web interfaces, which may limit access into the database.
  • 4. 7. There is no need of the write access to the database by the users of biological data, instead they only require read access. There is limitation of write access to the privileged users called curators. There are only small numbers of users which require write access but a wide variety of read access patterns are generated by the users into the databases. 8. Access to “old” values of the data are required by the users of biological data most often while verifying the previously reported results. Hence system of archives must support the changes to the values of the data in the database. Access to both the most recent version of data value and its previous version are important in the biological domain. 9. Added meaning is given by the context of data for its use in biological applications. Whenever appropriate, context must be maintained and conveyed to the user. For the maximization of the interpretation of a biological data value, it should be possible to integrate as many contexts as possible.
  • 5. Web based servers and softwares for genome analysis Genome browser provides a graphical interface for users to browse, search, retrieve and analyze genomic sequence and annotation data. Web-based genome browsers can be classified into general genome browsers with multiple species and species-specific genome browsers. Currently, there are two types of web-based genome browsers. The first type is the multiple-species genome browsers implemented in, among others, the UCSC genome database, the ENSEMBL project, the NCBI Map viewer website, the Phytozome and Gramene platforms. These genome browsers integrate sequence and annotations for dozens of organisms and further promote cross-species comparative analysis. Most of them contain abundant annotations, covering gene model, transcript evidence, expression profiles, regulatory data, genomic conversation, etc. 1. ENSEMBL (https://asia.ensembl.org/index.html): ENSEMBL is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. ENSEMBL annotate genes, computes multiple alignments, predicts regulatory function and collects disease data. ENSEMBL tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species. The Ensembl Rapid Release website provides annotation for recently produced, publicly available vertebrate and non-vertebrate genomes from biodiversity initiatives such as Darwin Tree of Life, the Vertebrate Genomes Project and the Earth BioGenome Project. Method: a. Open website by the link https://grch37.ensembl.org/index.html b. Enter the name of living organism (microrganisms, fungi, algae, plants and animals) c. Enter the obtain sequence (isolated genome from test organism) and compare with data feed in ensembl tool/website.
  • 6. 2. VISTA: a. Windows Vista, the line of Microsoft Windows client operating systems released in 2006 and 2007 b. VistA, (Veterans Health Information Systems and Technology Architecture) a medical records system of the United States Department of Veterans Affairs and others worldwide c. VISTA (comparative genomics), software tools for genome analysis and genomic sequence comparisons d. VistaPro, and Vista, 3D landscape generation software for the Amiga and PC e. VIsualizing STructures And Sequences, bioinformatics software Organizations and institutions for VISTA: 1. Vista Entertainment Solutions, a New Zealand software company specializing in solutions for the cinema industry 2. Americorps VISTA, a national service program to fight poverty through local government agencies and non-profit organizations 3. Ventura Intercity Service Transit Authorit, a public transportation agency in Ventura County, California, US 4. Vista Community College, now Berkeley City College, a community college in Berkeley, California, US 5. Vista Federal Credit Union, now merged with Partners Federal Credit Union, a credit union that serves employees of The Walt Disney Company 6. Vista University, a now-closed South African university 7. Volunteers in Service to America 8. Vista Equity Partners 9. Vista Oil & Gas, an oil and gas company founded by Miguel Galuccio 10. Vista Outdoor Inc., a U.S.-based publicly traded outdoor and shooting sports compan.
  • 7. 3. UCSC Genome Browser (https://genome.ucsc.edu/): On June 22, 2000, UCSC and the other members of the International Human Genome Project consortium completed the first working draft of the human genome assembly, forever ensuring free public access to the genome and the information it contains. A few weeks later, on July 7, 2000, the newly assembled genome was released on the web at http://genome.ucsc.edu, along with the initial prototype of a graphical viewing tool, the UCSC Genome Browser. In the ensuing years, the website has grown to include a broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing and downloading data. In the years since its inception, the UCSC Browser has expanded to accommodate genome sequences of all vertebrate species and selected invertebrates for which high-coverage genomic sequences is available,now including 46 species. High coverage is necessary to allow overlap to guide the construction of larger contiguous regions. Genomic sequences with less coverage are included in multiple-alignment tracks on some browsers, but the fragmented nature of these assemblies does not make them suitable for building full featured browsers. (more below on multiple-alignment tracks). The species hosted with full-featured genome browsers are shown in the table. Browser functionality: The large amount of data about biological systems that is accumulating in the literature makes it necessary to collect and digest information using the tools of bioinformatics. The UCSC Genome Browser presents a diverse collection of annotation datasets (known as "tracks" and presented graphically), including mRNA alignments, mappings of DNA repeat elements, gene predictions, gene-expression data, disease-association data (representing the relationships of genes to diseases), and mappings of commercially available gene chips (e.g., Illumina and Agilent). The basic paradigm of display is to show the genome sequence in the horizontal dimension, and show graphical representations of the locations of the mRNAs, gene predictions, etc. Blocks of color along the coordinate axis show the locations of the alignments of the various data types. The ability to show this large variety of data types on a single coordinate axis makes the browser a handy tool for the vertical integration of the data.
  • 8. To find a specific gene or genomic region, the user may type in the gene name, a DNA sequence, an accession number for an RNA, the name of a genomic cytological band (e.g., 20p13 for band 13 on the short arm of chr20) or a chromosomal position (chr17:38,450,000-38,531,000 for the region around the gene BRCA1). Presenting the data in the graphical format allows the browser to present link access to detailed information about any of the annotations. The gene details page of the UCSC Genes track provides a large number of links to more specific information about the gene at many other data resources, such as Online Mendelian Inheritance in Man (OMIM) and SwissProt. Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for speed. By pre-aligning the 55 million RNAs of GenBank to each of the 81 genome assemblies (many of the 46 species have more than one assembly), the browser allows instant access to the alignments of any RNA to any of the hosted species. The juxtaposition of the many types of data allow researchers to display exactly the combination of data that will answer specific questions. A pdf/postscript output functionality allows export of a camera-ready image for publication in academic journals. One unique and useful feature that distinguishes the UCSC Browser from other genome browsers is the continuously variable nature of the display. Sequence of any size can be displayed, from a single DNA base up to the entire chromosome (human chr1 = 245 million bases, Mb) with full annotation tracks. Researchers can display a single gene, a single exon, or an entire chromosome band, showing dozens or hundreds of genes and any combination of the many annotations. A convenient drag-and- zoom feature allows the user to choose any region in the genome image and expand it to occupy the full screen. Researchers may also use the browser to display their own data via the Custom Tracks tool. This feature allows users to upload a file of their own data and view the data in the context of the reference genome assembly. Users may also use the data hosted by UCSC, creating subsets of the data of their choosing with the Table Browser tool (such as only the SNPs that change the amino acid sequence of a protein) and display this specific subset of the data in the browser as a Custom Track. Any browser view created by a user, including those containing Custom Tracks, may be shared with other users via the Saved Sessions tool.
  • 9. Analysis tools The UCSC site hosts a set of genome analysis tools, including a full-featured GUI interface for mining the information in the browser database, a FAST sequence alignment tool BLAT that is also useful for simply finding sequences in the massive sequence (human genome = 3.23 billion bases [Gb]) of any of the featured genomes. A liftOver tool uses whole-genome alignments to allow conversion of sequences from one assembly to another or between species. The Genome Graphs tool allows users to view all chromosomes at once and display the results of genome-wide association studies (GWAS). The Gene Sorter displays genes grouped by parameters not linked to genome location, such as expression pattern in tissues. 4. NCBI genome (https://www.ncbi.nlm.nih.gov/genome/): The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation sponsored by Senator Claude Pepper.The NCBI houses a series of databases relevant to biotechnology and biomedicine and is an important resource for bioinformatics tools and services. Major databases include GenBank for DNA sequences and PubMed, a bibliographic database for biomedical literature. Other databases include the NCBI Epigenomics database. All these databases are available online through the Entrez search engine. NCBI was directed by David Lipman, one of the original authors of the BLAST sequence alignment program and a widely respected figure in bioinformatics. He also led an intramural research program, including groups led by Stephen Altschul (another BLAST co-author), David Landsman, Eugene Koonin, John Wilbur, Teresa Przytycka, and Zhiyong Lu. David Lipman stood down from his post in May 2017.
  • 10. GenBank: NCBI had responsibility for making available the GenBank DNA sequence database since 1992. GenBank coordinates with individual laboratories and other sequence databases such as those of the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ). Since 1992, NCBI has grown to provide other databases in addition to GenBank. NCBI provides Gene, Online Mendelian Inheritance in Man, the Molecular Modeling Database (3D protein structures), dbSNP (a database of single-nucleotide polymorphisms), the Reference Sequence Collection, a map of the human genome, and a taxonomy browser, and coordinates with the National Cancer Institute to provide the Cancer Genome Anatomy Project. The NCBI assigns a unique identifier (taxonomy ID number) to each species of organism. The NCBI has software tools that are available through internet browsers or by FTP. For example, BLAST is a sequence similarity searching program. BLAST can do sequence comparisons against the GenBank DNA database in less than 15 seconds. NCBI Bookshelf: The NCBI Bookshelf is a collection of freely accessible, downloadable, on- line versions of selected biomedical books. The Bookshelf covers a wide range of topics including molecular biology, biochemistry, cell biology, genetics, microbiology, disease states from a molecular and cellular point of view, research methods, and virology. Some of the books are online versions of previously published books, while others, such as Coffee Break, are written and edited by NCBI staff. The Bookshelf is a complement to the Entrez PubMed repository of peer-reviewed publication abstracts in that Bookshelf contents provide established perspectives on evolving areas of study and a context in which many disparate individual pieces of reported research can be organized.
  • 11. Basic Local Alignment Search Tool (BLAST): BLAST is an algorithm used for calculating sequence similarity between biological sequences such as nucleotide sequences of DNA and amino acid sequences of proteins. BLAST is a powerful tool for finding sequences similar to the query sequence within the same organism or in different organisms. It searches the query sequence on NCBI databases and servers and posts the results back to the person's browser in the chosen format. Input sequences to the BLAST are mostly in FASTA or Genbank format while output could be delivered in a variety of formats such as HTML, XML formatting, and plain text. HTML is the default output format for NCBI's web-page. Results for NCBI-BLAST are presented in graphical format with all the hits found, a table with sequence identifiers for the hits having scoring related data, along with the alignments for the sequence of interest and the hits received with analogous BLAST scores for these. Entrez: The Entrez Global Query Cross-Database Search System is used at NCBI for all the major databases such as Nucleotide and Protein Sequences, Protein Structures, PubMed, Taxonomy, Complete Genomes, OMIM, and several others.[10] Entrez is both an indexing and retrieval system having data from various sources for biomedical research. NCBI distributed the first version of Entrez in 1991, composed of nucleotide sequences from PDB and GenBank, protein sequences from SWISS-PROT, translated GenBank, PIR, PRF, PDB, and associated abstracts and citations from PubMed. Entrez is specially designed to integrate the data from several different sources, databases, and formats into a uniform information model and retrieval system which can efficiently retrieve that relevant references, sequences and structures.[11] Gene: Gene has been implemented at NCBI to characterize and organize the information about genes. It serves as a major node in the nexus of the genomic map, expression, sequence, protein function, structure, and homology data. A unique GeneID is assigned to each gene record that can be followed through revision cycles. Gene records for known or predicted genes are established here and are demarcated by map positions or nucleotide sequences. Gene has several advantages over its predecessor, LocusLink, including, better integration with other databases in NCBI, broader taxonomic scope, and enhanced options for query and retrieval provided by the Entrez system.[12]
  • 12. Protein database: Protein database maintains the text record for individual protein sequences, derived from many different resources such as NCBI Reference Sequence (RefSeq) project, GenBank, PDB, and UniProtKB/SWISS-Prot. Protein records are present in different formats including FASTA and XML and are linked to other NCBI resources. Protein provides the relevant data to the users such as genes, DNA/RNA sequences, biological pathways, expression and variation data, and literature. It also provides the pre-determined sets of similar and identical proteins for each sequence as computed by the BLAST. The Structure database of NCBI contains 3D coordinate sets for experimentally-determined structures in PDB that are imported by NCBI. The Conserved Domain database (CDD of protein contains sequence profiles that characterize highly conserved domains within protein sequences. It also has records from external resources like SMART and Pfam. There is another database in a protein known as Protein Clusters database which contains sets of proteins sequences that are clustered according to the maximum alignments between the individual sequences as calculated by BLAST. Pubchem database: PubChem database of NCBI is a public resource for molecules and their activities against biological assays. PubChem is searchable and accessible by Entrez information retrieval system. Thank you References: Online notes, notes from research papers and Books by google search Engine