SlideShare une entreprise Scribd logo
1  sur  19
Télécharger pour lire hors ligne
RETRIEVING BIOLOGICAL DATA
Nucleotide sequence databases
Database search
Sequence alignment and comparison
dsdht.wikispaces.com
Biological sequence databases
Originally – just a storage place for sequences.
Currently – the databases are bioinformatics work bench which provide
many tools for retrieving, comparing and analyzing sequences.
Different types of databases:
• GenBank of NCBI (National Center for Biotechnology Information)
• The European Molecular Biology Laboratory (EMBL) database
• The DNA Data Bank of Japan (DDBJ)
Genome-centered databases
• NCBI genomes
• Ensembl Genome Browser
• UCSC Genome Bioinformatics Site
Protein Databases
• UNIPROT
• PDB
• EBI
Bioinformatics Resources

Bioinformatics resources at EBI
(http://www.ebi.ac.uk/services)
Bioinformatics Resources
Entrez (http://www.ncbi.nlm.nih.gov/sites/gquery)

A service of the US National Center for Biotechnology Information
(NCBI) 40 core search pages interconnecting different types of
information.
More resources
GenBank - is a genetic sequence database that provides an annotated
collection of all genomic and cDNA sequences that are in the public
Domain.
PubMed - PubMed is a literature search engine providing access to all
published medically related articles, abstracts and journals.
Subscribe to youtube videos to learn about Genbank tools and services
http://www.youtube.com/user/NCBINLM/videos
Facebook page for notifications and updates (https:/www .facebook
.com/ncbi.nlm)
Ensemble
Ensembl is a joined project between European Bioinformatics Institute (EMBO) and the Sanger
Institute in Cambridge. Its major focus is the genomes of animals and other eukaryotes. The
annotation of most of the genomes in Ensembl database have been processed automatically
though annotation pipelines. Genomes of five species (human, mouse, zebrafish, pig and dog)
have been carefully annotated by manual curation.

Check out the video for Ensemble videos
https://www.youtube.com/watch?v=bGryvTCOMGA
https://www.youtube.com/watch?v=ZpnQOOxXufM
Ensemble
A nice feature is cutomizable home page that keeps track of your searches even if
done from different computers. Ensemble has many tools for extracting and
downloading user specific aspects of the data in various formats for high end
bioinformatics.

http://www.ensembl
.org/
UCSC Genome Bioinformatics Site
A highly sophisticated genome browser allowing for visualization of many genome
features: mapping and sequencing, phenotype and disease association, genes and
gene predictions, transcript evidence, gene expression data, comparative genomics,
sequence variations. (http://genome.ucsc.edu/)

Link to the tutorial for UCSC Genome Browser
(http://www.sciencedirect.com/science/article/pii/S0888754308000451)
Facts while working with sequence
databases
• NCBI (The National Center for Biotechnology Information) accepts all submitted
sequences and does not check whether the sequence is accurate or not.
• Annotations of most genes and genomes are done by researchers and except
for specific ‘Reference Sequences’ (RefSeqs) are not curated by Genbank
experts.

So it means, the structure of the same gene
reported by different labs can be different!
• All genomes are full of SNPs and other polymorphisms.
• Gene finding algorithms are not perfect, especially when it comes
to predicting splice sites .
• Alternative splicing can lead to different transcripts (different
versions) of the same gene.

There is no single correct sequence for a given gene
Extracting sequence of a gene from GenBank
GenBank
Go to : http://www.ncbi.nlm.nih.gov
Aim: To find sequence of E.coli trpA gene
Search
Name

Select
database
Nucleotide
GenBank
In the results page, find the link to the entry
of interest. The list may have many pages.
The link you are looking for, might not be the
first entry.
a standard Genbank entry

the genomic database

Use the „find‟ function of
your browser to find “trpA” gene.
GenBank
Scroll to the bottom of the file to see the nucleotide and
amino acid sequence of the gene

Note, that this gene is encoded in a complementary
strand relative to the one shown in the database.
Therefore, the sequence shows the template strand.

amino acid sequence of the encoded protein

nucleotide sequence of the gene.
GenBank
Choose the FASTA format from the entry page to get a „pure‟
sequence that you can cut and paste into another application .
Homologous sequences
If sequences of two proteins are similar, they most likely are derived from the same ancestor and therefore, have similar
structures and similar biological functions.

Science by analogy:
If something is true for one such sequence it is probably true also for the other.
Studying a protein (a gene) in the lab takes years, searching a database takes seconds.
Proteins (genes) that have the same ancestor are called homologous.
Homologous proteins (genes) usually have similar sequences, structures and functions.
A general rule: two proteins are likely homologous if their sequences are at least 25%
identical. DNA sequences are likely homologous if they are at least 70% identical.
Do not confuse homology and similarity. Homology is a binary parameter (homologus vs.
non-homologous) and reflects evolutionary relationship between the sequences.
Similarity is a quantitative measure – two sequences can be more similar or less similar.
Similarity does not necessarily reflect evolutionary relationship between sequences.
Sequence similarity
•

A pair-wise comparison of sequences is a very common task in bioinformatics
and in genomics. By aligning two sequences we are testing the hypothesis that
the sequences are homologous, meaning that each pair of aligned residues are
descendants of a common ancestral residue.

•

If two sequences have some similarity simply by chance – this does not have any
biological significance. But if two sequences are homologous, by comparing them
we can gain information about their structure, function and evolution.

•

Computational alignment of two sequences is based on assigning a score to each
possible alignment. The score is composed of bonuses (for a match) and
penalties for a mismatch or indel. Let us assign bonus of +3 points for a match, a
penalty of -1 for a mismatch and a penalty of -5 for an indel(insertions or
deletions).
Example align two sequences
Align two sequences: ACGCTTGA and ACCCGTTA

The choice of parameters depends on the biological
model.
The choice of parameters defines the score of the
alignment.
BLAST-Basic Local Alignment and Search
Tool
Once you have determined the sequence of your protein (gene), the next
most important thing to do is to find in the database sequences similar to
the one you are studying and to retrieve as much information about those
sequences as possible. Then you decide which of the known information
applies to your protein (gene).
BLAST comes in many different flavors. The choice of the BLAST protocol
depends on your sequence and the question you ask.
Check the youtube video for BLAST and FASTA programs
https://www.youtube.com/watch?v=LlnMtI2Sg4g
Tutorial for Using BLAST
https://www.youtube.com/watch?v=HXEpBnUbAMo
Tutorial for using PSI BLAST
https://www.youtube.com/watch?v=T3kHEieyylk
Blast Outputs
• The bit score („goodness‟ of the match). Ignore
anything below 50.
• The E-value (the expectation value). Statistical
significance of the match to the hit: how often a
match like this would be found if there is no real
match in the database. Matches with the E-value
below 0.001 (e-3) is worth exploring.

Contenu connexe

Tendances (20)

blast bioinformatics
blast bioinformaticsblast bioinformatics
blast bioinformatics
 
FASTA
FASTAFASTA
FASTA
 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
 
Secondary protein structure prediction
Secondary protein structure predictionSecondary protein structure prediction
Secondary protein structure prediction
 
Entrez databases
Entrez databasesEntrez databases
Entrez databases
 
Whole genome shotgun sequencing
Whole genome shotgun sequencingWhole genome shotgun sequencing
Whole genome shotgun sequencing
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
BLAST
BLASTBLAST
BLAST
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
 
Introduction to ncbi, embl, ddbj
Introduction to ncbi, embl, ddbjIntroduction to ncbi, embl, ddbj
Introduction to ncbi, embl, ddbj
 
biological detabase
biological detabasebiological detabase
biological detabase
 
Blast and fasta
Blast and fastaBlast and fasta
Blast and fasta
 
Genome annotation
Genome annotationGenome annotation
Genome annotation
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
 
Kegg databse
Kegg databseKegg databse
Kegg databse
 
Express sequence tags
Express sequence tagsExpress sequence tags
Express sequence tags
 
Dot matrix
Dot matrixDot matrix
Dot matrix
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
 
Est database
Est databaseEst database
Est database
 

Similaire à Sequencedatabases

Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS
 
database retrival.pdf
database retrival.pdfdatabase retrival.pdf
database retrival.pdfSrimathideviJ
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxRanjan Jyoti Sarma
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuKAUSHAL SAHU
 
Hands on training_biological_databases.ppt
Hands on training_biological_databases.pptHands on training_biological_databases.ppt
Hands on training_biological_databases.pptSoumen Barman
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRONPrabin Shakya
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsAyeshaYousaf20
 
Introduction to Bioinformatics: Part 3
Introduction to Bioinformatics: Part 3Introduction to Bioinformatics: Part 3
Introduction to Bioinformatics: Part 3AhmedAbdElMoniem35
 
NCBI Boot Camp for Beginners Slides
NCBI Boot Camp for Beginners SlidesNCBI Boot Camp for Beginners Slides
NCBI Boot Camp for Beginners SlidesJackie Wirz, PhD
 
Lecture 5.pptx
Lecture 5.pptxLecture 5.pptx
Lecture 5.pptxericndunek
 

Similaire à Sequencedatabases (20)

Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Ncbi
NcbiNcbi
Ncbi
 
Databases_L2.pptx
Databases_L2.pptxDatabases_L2.pptx
Databases_L2.pptx
 
Article
ArticleArticle
Article
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequences
 
database retrival.pdf
database retrival.pdfdatabase retrival.pdf
database retrival.pdf
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptx
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
Hands on training_biological_databases.ppt
Hands on training_biological_databases.pptHands on training_biological_databases.ppt
Hands on training_biological_databases.ppt
 
Proteins databases
Proteins databasesProteins databases
Proteins databases
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRON
 
Understanding Genome
Understanding Genome Understanding Genome
Understanding Genome
 
Harvester I
Harvester IHarvester I
Harvester I
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomics
 
Introduction to Bioinformatics: Part 3
Introduction to Bioinformatics: Part 3Introduction to Bioinformatics: Part 3
Introduction to Bioinformatics: Part 3
 
BLAST
BLASTBLAST
BLAST
 
NCBI Boot Camp for Beginners Slides
NCBI Boot Camp for Beginners SlidesNCBI Boot Camp for Beginners Slides
NCBI Boot Camp for Beginners Slides
 
Lecture 5.pptx
Lecture 5.pptxLecture 5.pptx
Lecture 5.pptx
 
Biological databases
Biological databasesBiological databases
Biological databases
 

Plus de Abhik Seal

Clinicaldataanalysis in r
Clinicaldataanalysis in rClinicaldataanalysis in r
Clinicaldataanalysis in rAbhik Seal
 
Virtual Screening in Drug Discovery
Virtual Screening in Drug DiscoveryVirtual Screening in Drug Discovery
Virtual Screening in Drug DiscoveryAbhik Seal
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on rAbhik Seal
 
Data handling in r
Data handling in rData handling in r
Data handling in rAbhik Seal
 
Modeling Chemical Datasets
Modeling Chemical DatasetsModeling Chemical Datasets
Modeling Chemical DatasetsAbhik Seal
 
Introduction to Adverse Drug Reactions
Introduction to Adverse Drug ReactionsIntroduction to Adverse Drug Reactions
Introduction to Adverse Drug ReactionsAbhik Seal
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to functionAbhik Seal
 
Chemical File Formats for storing chemical data
Chemical File Formats for storing chemical dataChemical File Formats for storing chemical data
Chemical File Formats for storing chemical dataAbhik Seal
 
Understanding Smiles
Understanding Smiles Understanding Smiles
Understanding Smiles Abhik Seal
 
Learning chemistry with google
Learning chemistry with googleLearning chemistry with google
Learning chemistry with googleAbhik Seal
 
3 d virtual screening of pknb inhibitors using data
3 d virtual screening of pknb inhibitors using data3 d virtual screening of pknb inhibitors using data
3 d virtual screening of pknb inhibitors using dataAbhik Seal
 
R scatter plots
R scatter plotsR scatter plots
R scatter plotsAbhik Seal
 
Q plot tutorial
Q plot tutorialQ plot tutorial
Q plot tutorialAbhik Seal
 
Pharmacohoreppt
PharmacohorepptPharmacohoreppt
PharmacohorepptAbhik Seal
 

Plus de Abhik Seal (20)

Chemical data
Chemical dataChemical data
Chemical data
 
Clinicaldataanalysis in r
Clinicaldataanalysis in rClinicaldataanalysis in r
Clinicaldataanalysis in r
 
Virtual Screening in Drug Discovery
Virtual Screening in Drug DiscoveryVirtual Screening in Drug Discovery
Virtual Screening in Drug Discovery
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
 
Data handling in r
Data handling in rData handling in r
Data handling in r
 
Networks
NetworksNetworks
Networks
 
Modeling Chemical Datasets
Modeling Chemical DatasetsModeling Chemical Datasets
Modeling Chemical Datasets
 
Introduction to Adverse Drug Reactions
Introduction to Adverse Drug ReactionsIntroduction to Adverse Drug Reactions
Introduction to Adverse Drug Reactions
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to function
 
Chemical File Formats for storing chemical data
Chemical File Formats for storing chemical dataChemical File Formats for storing chemical data
Chemical File Formats for storing chemical data
 
Understanding Smiles
Understanding Smiles Understanding Smiles
Understanding Smiles
 
Learning chemistry with google
Learning chemistry with googleLearning chemistry with google
Learning chemistry with google
 
3 d virtual screening of pknb inhibitors using data
3 d virtual screening of pknb inhibitors using data3 d virtual screening of pknb inhibitors using data
3 d virtual screening of pknb inhibitors using data
 
Poster
PosterPoster
Poster
 
R scatter plots
R scatter plotsR scatter plots
R scatter plots
 
Indo us 2012
Indo us 2012Indo us 2012
Indo us 2012
 
Q plot tutorial
Q plot tutorialQ plot tutorial
Q plot tutorial
 
Weka guide
Weka guideWeka guide
Weka guide
 
Pharmacohoreppt
PharmacohorepptPharmacohoreppt
Pharmacohoreppt
 
Document1
Document1Document1
Document1
 

Dernier

6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroomSamsung Business USA
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Osopher
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
An Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPAn Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPCeline George
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...Nguyen Thanh Tu Collection
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxryandux83rd
 
How to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineHow to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineCeline George
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFEPART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFEMISSRITIMABIOLOGYEXP
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptxmary850239
 

Dernier (20)

Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
An Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPAn Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERP
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptx
 
Plagiarism,forms,understand about plagiarism,avoid plagiarism,key significanc...
Plagiarism,forms,understand about plagiarism,avoid plagiarism,key significanc...Plagiarism,forms,understand about plagiarism,avoid plagiarism,key significanc...
Plagiarism,forms,understand about plagiarism,avoid plagiarism,key significanc...
 
How to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineHow to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command Line
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFEPART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
PART 1 - CHAPTER 1 - CELL THE FUNDAMENTAL UNIT OF LIFE
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx
 

Sequencedatabases

  • 1. RETRIEVING BIOLOGICAL DATA Nucleotide sequence databases Database search Sequence alignment and comparison dsdht.wikispaces.com
  • 2. Biological sequence databases Originally – just a storage place for sequences. Currently – the databases are bioinformatics work bench which provide many tools for retrieving, comparing and analyzing sequences. Different types of databases: • GenBank of NCBI (National Center for Biotechnology Information) • The European Molecular Biology Laboratory (EMBL) database • The DNA Data Bank of Japan (DDBJ) Genome-centered databases • NCBI genomes • Ensembl Genome Browser • UCSC Genome Bioinformatics Site Protein Databases • UNIPROT • PDB • EBI
  • 3. Bioinformatics Resources Bioinformatics resources at EBI (http://www.ebi.ac.uk/services)
  • 4. Bioinformatics Resources Entrez (http://www.ncbi.nlm.nih.gov/sites/gquery) A service of the US National Center for Biotechnology Information (NCBI) 40 core search pages interconnecting different types of information.
  • 5. More resources GenBank - is a genetic sequence database that provides an annotated collection of all genomic and cDNA sequences that are in the public Domain. PubMed - PubMed is a literature search engine providing access to all published medically related articles, abstracts and journals. Subscribe to youtube videos to learn about Genbank tools and services http://www.youtube.com/user/NCBINLM/videos Facebook page for notifications and updates (https:/www .facebook .com/ncbi.nlm)
  • 6. Ensemble Ensembl is a joined project between European Bioinformatics Institute (EMBO) and the Sanger Institute in Cambridge. Its major focus is the genomes of animals and other eukaryotes. The annotation of most of the genomes in Ensembl database have been processed automatically though annotation pipelines. Genomes of five species (human, mouse, zebrafish, pig and dog) have been carefully annotated by manual curation. Check out the video for Ensemble videos https://www.youtube.com/watch?v=bGryvTCOMGA https://www.youtube.com/watch?v=ZpnQOOxXufM
  • 7. Ensemble A nice feature is cutomizable home page that keeps track of your searches even if done from different computers. Ensemble has many tools for extracting and downloading user specific aspects of the data in various formats for high end bioinformatics. http://www.ensembl .org/
  • 8. UCSC Genome Bioinformatics Site A highly sophisticated genome browser allowing for visualization of many genome features: mapping and sequencing, phenotype and disease association, genes and gene predictions, transcript evidence, gene expression data, comparative genomics, sequence variations. (http://genome.ucsc.edu/) Link to the tutorial for UCSC Genome Browser (http://www.sciencedirect.com/science/article/pii/S0888754308000451)
  • 9. Facts while working with sequence databases • NCBI (The National Center for Biotechnology Information) accepts all submitted sequences and does not check whether the sequence is accurate or not. • Annotations of most genes and genomes are done by researchers and except for specific ‘Reference Sequences’ (RefSeqs) are not curated by Genbank experts. So it means, the structure of the same gene reported by different labs can be different! • All genomes are full of SNPs and other polymorphisms. • Gene finding algorithms are not perfect, especially when it comes to predicting splice sites . • Alternative splicing can lead to different transcripts (different versions) of the same gene. There is no single correct sequence for a given gene
  • 10. Extracting sequence of a gene from GenBank
  • 11. GenBank Go to : http://www.ncbi.nlm.nih.gov Aim: To find sequence of E.coli trpA gene Search Name Select database Nucleotide
  • 12. GenBank In the results page, find the link to the entry of interest. The list may have many pages. The link you are looking for, might not be the first entry. a standard Genbank entry the genomic database Use the „find‟ function of your browser to find “trpA” gene.
  • 13. GenBank Scroll to the bottom of the file to see the nucleotide and amino acid sequence of the gene Note, that this gene is encoded in a complementary strand relative to the one shown in the database. Therefore, the sequence shows the template strand. amino acid sequence of the encoded protein nucleotide sequence of the gene.
  • 14. GenBank Choose the FASTA format from the entry page to get a „pure‟ sequence that you can cut and paste into another application .
  • 15. Homologous sequences If sequences of two proteins are similar, they most likely are derived from the same ancestor and therefore, have similar structures and similar biological functions. Science by analogy: If something is true for one such sequence it is probably true also for the other. Studying a protein (a gene) in the lab takes years, searching a database takes seconds. Proteins (genes) that have the same ancestor are called homologous. Homologous proteins (genes) usually have similar sequences, structures and functions. A general rule: two proteins are likely homologous if their sequences are at least 25% identical. DNA sequences are likely homologous if they are at least 70% identical. Do not confuse homology and similarity. Homology is a binary parameter (homologus vs. non-homologous) and reflects evolutionary relationship between the sequences. Similarity is a quantitative measure – two sequences can be more similar or less similar. Similarity does not necessarily reflect evolutionary relationship between sequences.
  • 16. Sequence similarity • A pair-wise comparison of sequences is a very common task in bioinformatics and in genomics. By aligning two sequences we are testing the hypothesis that the sequences are homologous, meaning that each pair of aligned residues are descendants of a common ancestral residue. • If two sequences have some similarity simply by chance – this does not have any biological significance. But if two sequences are homologous, by comparing them we can gain information about their structure, function and evolution. • Computational alignment of two sequences is based on assigning a score to each possible alignment. The score is composed of bonuses (for a match) and penalties for a mismatch or indel. Let us assign bonus of +3 points for a match, a penalty of -1 for a mismatch and a penalty of -5 for an indel(insertions or deletions).
  • 17. Example align two sequences Align two sequences: ACGCTTGA and ACCCGTTA The choice of parameters depends on the biological model. The choice of parameters defines the score of the alignment.
  • 18. BLAST-Basic Local Alignment and Search Tool Once you have determined the sequence of your protein (gene), the next most important thing to do is to find in the database sequences similar to the one you are studying and to retrieve as much information about those sequences as possible. Then you decide which of the known information applies to your protein (gene). BLAST comes in many different flavors. The choice of the BLAST protocol depends on your sequence and the question you ask. Check the youtube video for BLAST and FASTA programs https://www.youtube.com/watch?v=LlnMtI2Sg4g Tutorial for Using BLAST https://www.youtube.com/watch?v=HXEpBnUbAMo Tutorial for using PSI BLAST https://www.youtube.com/watch?v=T3kHEieyylk
  • 19. Blast Outputs • The bit score („goodness‟ of the match). Ignore anything below 50. • The E-value (the expectation value). Statistical significance of the match to the hit: how often a match like this would be found if there is no real match in the database. Matches with the E-value below 0.001 (e-3) is worth exploring.