A Guided SQL Tour of
Bioinformatics Databases
Yannick Pouliot, PhD
Bioresearch Informationist
lanebioresearch@stanford.edu
Lane Medical Library & Knowledge Management Center
2/28/2007

Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
Content
- Very abbreviated review of the relational principle
- Some of the technology required to connect to a remote database
- Walk-through of the database schema for Ensembl
  - Hands-on querying
- Walk-through of the database schema for BioWarehouse
  - Hands-on querying
- Resources
  - Details on connecting to a remote database

So Why Are We Here?

Bioinformatics Databases: Who
Supports Direct Querying?

Relational Database Terms
- Database: Collection of tables and relationships between tables
- Table: Collection of records that share a common fundamental characteristic
  - E.g., patients and locations can each be stored in their own table
- Record: Basic unit of information in a relational database
  - E.g., 1 record per person
  - A record is composed of columns (“fields”)
- Query: Set of instructions to a database “engine” to retrieve, sort and format the returned data
  - E.g., “find me all patients in my database”

Main Relational Database “Engines”
- FileMaker
- MS Access
- MS SQL Server
- MySQL
- Oracle
- PostgreSQL
- Sybase

Structure of Relational DB Tables

Data values live in rows

Understanding the Relational Principle: A Simple Database
- Every patient gets ONE record in the Patients table
- Every visit gets ONE record in the Visits table
- Rows in different tables can be related to one another using a shared key (identifier): a “join”
- There can be multiple visit records for a given patient
- There can be multiple tissue records for a given patient

The Relational Principle at Work
- Related records can be found using a shared key
  - Example: Patients.ID = Visits.PatientID
  - In Patients.ID, "Patients" is the table name and "ID" is the primary key

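The shared-key idea can be tried out locally. The sketch below is a minimal illustration in Python using the standard-library sqlite3 module, not the MySQL or Oracle engines discussed in this class. The Patients and Visits tables and the Patients.ID = Visits.PatientID key follow the slide; the other columns and all data values are invented for illustration.

```python
import sqlite3

# In-memory toy database; nothing here comes from a real patient system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
    CREATE TABLE Visits (VisitID INTEGER PRIMARY KEY, PatientID INTEGER, VisitDate TEXT);
    INSERT INTO Patients VALUES (1, 'Doe', 'Jane'), (2, 'Roe', 'Richard');
    INSERT INTO Visits VALUES (10, 1, '2007-01-15'), (11, 1, '2007-02-20'),
                              (12, 2, '2007-02-21');
""")

# Relate each visit to its patient through the shared key.
join_rows = conn.execute("""
    SELECT Patients.LastName, Visits.VisitDate
    FROM Patients, Visits
    WHERE Patients.ID = Visits.PatientID
    ORDER BY Visits.VisitDate
""").fetchall()

for last_name, visit_date in join_rows:
    print(last_name, visit_date)
```

Patient 1 has two visit records, so that patient's name appears twice in the result: one patient record, many visit records.
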
SQL Querying…With What?
- Query browsers used here:
  - MySQL Query Browser
  - WinSQL
- Other query browsers exist but are more sophisticated
  - Often more expensive or more complex
  - Example: PL/SQL Developer, from Allround Automations

Example: Network Querying of Ensembl Database Using MySQL Query Browser
- What happens when you query a remote database?
  - DEMO
- Of note:
  - May take some time
    - Big database, lots of data to return from far away…
  - Easy to write queries with voluminous output
    - May have to kill the query…
- Setting up ODBC: not discussed here, but cheat-sheet instructions are in the handout. Location will also be mailed.

The Database Schema: Your Roadmap For Querying
- The schema describes all tables and all fields
  - Used to determine how to inter-relate tables to retrieve the desired data
- Very important:
  - Must understand schema for accurate querying
  - Wrong understanding = wrong results

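One way to get a feel for what a schema contains is to ask the database engine itself for its tables and fields. The sketch below uses Python's standard-library sqlite3 module with an in-memory database as a stand-in for a remote server; the two tables are hypothetical. The sqlite_master table and PRAGMA table_info used here are specific to SQLite; on a MySQL server the comparable statements would be SHOW TABLES and DESCRIBE table_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-table schema, for illustration only.
conn.executescript("""
    CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
    CREATE TABLE Visits (VisitID INTEGER PRIMARY KEY, PatientID INTEGER, VisitDate TEXT);
""")

# List every table in the database (SQLite-specific catalog table).
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# List the fields of one table; column 1 of each row is the field name.
visit_fields = [row[1] for row in conn.execute("PRAGMA table_info(Visits)")]

print(table_names)
print(visit_fields)
```
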
Introducing The SQL Select Statement


- Good news: This is the only SQL statement you need to understand for querying

SELECT LastName, FirstName
FROM Patients

Basic Syntax of Select Statement
SELECT field_name
FROM table
[WHERE condition]

[ ] = optional

Example:

SELECT LastName, FirstName
FROM Patients
WHERE Alive = 'Y';

- Note: SQL keywords are not case sensitive; whether table names, field names, and string comparisons are case sensitive depends on the database engine and its settings
- Query statements are written into a tool such as MS Query or MySQL Query Browser

Handout: p2

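The effect of the optional WHERE clause is easiest to see by running the same SELECT with and without it. This is a minimal sketch using Python's standard-library sqlite3 module as a local stand-in for the engines used in class; the Patients table mirrors the slide's example, and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy Patients table; the rows are made up for illustration.
conn.executescript("""
    CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT,
                           FirstName TEXT, Alive TEXT);
    INSERT INTO Patients VALUES (1, 'Curie', 'Marie', 'N'),
                                (2, 'Doe', 'Jane', 'Y'),
                                (3, 'Roe', 'Richard', 'Y');
""")

# Without WHERE: every record in the table is returned.
all_names = conn.execute(
    "SELECT LastName, FirstName FROM Patients").fetchall()

# With WHERE: only records that satisfy the condition are returned.
alive_names = conn.execute(
    "SELECT LastName, FirstName FROM Patients WHERE Alive = 'Y'").fetchall()

print(len(all_names), "patients in total")
print(len(alive_names), "patients with Alive = 'Y'")
```
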
SELECT – (Some) Details

Moving On: Real Biodatabase Schemas

Schemas We’ll Look At…
- Remember: Schemas…
  - describe all tables and all fields
  - are used to determine how to inter-relate tables to retrieve the desired data
- Our schemas today:
  - Ensembl
  - BioWarehouse

Ensembl
- Produced by the Sanger Institute and EMBL-EBI
- Collection of genome databases for many different organisms
- Free, open source
- Web querying: http://www.ensembl.org/
- FAQ: What is Ensembl?
- All PubMed references pertaining to Ensembl and written by the Ensembl group

Exploring the Ensembl Schema
- Ensembl CORE schema documentation
  - First place to go to answer: “what does this table store?”
  - Problem: no graphical representation of overall schema
    - Relationships harder to appreciate
    - Use Catalog function and go from there…

“Fundamental” Tables
Fundamental tables
assembly
assembly_exception
attrib_type
coord_system
dna
dnac
exon
exon_stable_id
exon_transcript
gene
gene_stable_id
karyotype
meta
meta_coord
prediction_exon
prediction_transcript
seq_region
seq_region_attrib
supporting_feature
transcript
transcript_attrib
transcript_stable_id
translation
translation_attrib
translation_stable_id


Features and analyses
alt_allele
analysis
analysis_description
density_feature
density_type
dna_align_feature
map
marker
marker_feature
marker_map_location
marker_synonym
misc_attrib
misc_feature
misc_feature_misc_set
misc_set
prediction_transcript
protein_align_feature
protein_feature
qtl
qtl_feature
qtl_synonym
regulatory_factor
regulatory_factor_coding
regulatory_feature
regulatory_feature_object
regulatory_search_region
repeat_consensus
repeat_feature
simple_feature

ID Mapping (Map identifiers between releases)
gene_archive
mapping_session
peptide_archive
stable_id_event

External references (IDs to objects in other dbs)
external_db
external_synonym
go_xref
identity_xref
object_xref
xref

Miscellaneous
interpro

Understanding The Ensembl Schema Using The Catalog

Querying Ensembl
- Ensembl runs on the MySQL database engine
- We’ll use WinSQL
  - MySQL Query Browser can also be used, as well as lots of other querying tools

Before Proceeding: A Word of Caution
- Easy to write queries that…
  - Retrieve nonsense
  - Never complete
    - Scotty to Captain Kirk: “We’re going in circles, and at warp 6 we’re going mighty fast…”
- Understanding the schema is the only way to prevent this
- Tips:
  - Use “count” to determine the number of rows in a table BEFORE returning large datasets
  - Remember: the more tables are joined, the slower the query

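The "count first" tip can be sketched as follows. This is an illustration in Python with the standard-library sqlite3 module and an in-memory database; the feature table, its contents, and the MAX_ROWS threshold are all hypothetical. LIMIT is used to cap the output; it is available in both SQLite and MySQL, but not in every engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with 1000 rows standing in for a large remote table.
conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO feature (label) VALUES (?)",
                 [(f"feature_{i}",) for i in range(1000)])

MAX_ROWS = 100  # arbitrary threshold chosen for this sketch

# Step 1: ask how many rows there are BEFORE retrieving any of them.
(row_count,) = conn.execute("SELECT COUNT(*) FROM feature").fetchone()

# Step 2: retrieve everything only if the table is small; otherwise cap the output.
if row_count > MAX_ROWS:
    preview = conn.execute(
        "SELECT feature_id, label FROM feature ORDER BY feature_id LIMIT ?",
        (MAX_ROWS,)).fetchall()
else:
    preview = conn.execute(
        "SELECT feature_id, label FROM feature ORDER BY feature_id").fetchall()

print(row_count, "rows in table;", len(preview), "rows retrieved")
```
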
Demo Queries… To Get You Started
- Query 1: return number of genes stored in Ensembl Human
- Query 2: return number of transcripts produced by genes stored in Ensembl Human
  - Demonstrates JOINing of tables

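The shape of these two demo queries can be rehearsed on a toy database before connecting to Ensembl. The sketch below uses Python's standard-library sqlite3 module. The table names gene and transcript appear in the Ensembl table list, but the columns used here (gene_id, transcript_id, description) and all rows are assumptions for illustration and should be checked against the CORE schema documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-ins for the Ensembl gene and transcript tables (assumed columns).
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
    INSERT INTO gene VALUES (1, 'toy gene A'), (2, 'toy gene B'), (3, 'toy gene C');
    INSERT INTO transcript VALUES (10, 1), (11, 1), (12, 2),
                                  (13, 3), (14, 3), (15, 3);
""")

# Query 1: number of genes.
(gene_count,) = conn.execute("SELECT COUNT(*) FROM gene").fetchone()

# Query 2: number of transcripts produced by genes, found by JOINing
# the two tables on their shared key.
(transcript_count,) = conn.execute("""
    SELECT COUNT(*)
    FROM gene, transcript
    WHERE gene.gene_id = transcript.gene_id
""").fetchone()

print(gene_count, "genes;", transcript_count, "transcripts")
```
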
Exercises
Together (10 min):
1. The number of genes stored in Ensembl Human
2. The number of transcripts produced by genes stored in Ensembl Human
On your own (20 min):
3. The types of analyses that Ensembl provides
4. The number of types of markers
5. The number of markers per chromosome for all chromosomes
6. Extra points: the minimum and maximum marker distances for markers on chromosome 19

SELECT Statement: A Refresher
SELECT [DISTINCT] select_list
FROM table_list
[WHERE conditions]
[START WITH] [CONNECT BY]
[GROUP BY group_by_list]
[HAVING search_conditions]
[ORDER BY order_list [ASC | DESC] ]

“Modifiers” of select list:
- DISTINCT
- COUNT
- SUM
- MIN
- MAX
Also:
- ORDER BY
- LIKE (used in WHERE clause)

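Several of these clauses and modifiers can be seen working together on a small example. The sketch below uses Python's standard-library sqlite3 module; the marker_location table, its columns, and its rows are hypothetical and are not the Ensembl marker tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table of markers; names and positions are invented.
conn.executescript("""
    CREATE TABLE marker_location (marker_name TEXT, chromosome TEXT, position INTEGER);
    INSERT INTO marker_location VALUES
        ('D1S100', '1', 1000), ('D1S200', '1', 5000), ('D1S300', '1', 9000),
        ('D19S10', '19', 2000), ('D19S20', '19', 7500), ('DXS1', 'X', 300);
""")

# DISTINCT: each chromosome listed once.
chromosomes = [row[0] for row in conn.execute(
    "SELECT DISTINCT chromosome FROM marker_location ORDER BY chromosome")]

# COUNT with GROUP BY, HAVING and ORDER BY: markers per chromosome,
# keeping only chromosomes with at least two markers.
markers_per_chromosome = conn.execute("""
    SELECT chromosome, COUNT(*) AS n
    FROM marker_location
    GROUP BY chromosome
    HAVING COUNT(*) >= 2
    ORDER BY n DESC
""").fetchall()

# MIN and MAX restricted by a WHERE condition.
min_max_chr19 = conn.execute("""
    SELECT MIN(position), MAX(position)
    FROM marker_location
    WHERE chromosome = '19'
""").fetchone()

# LIKE: pattern matching inside the WHERE clause.
d19_markers = [row[0] for row in conn.execute("""
    SELECT marker_name FROM marker_location
    WHERE marker_name LIKE 'D19%'
    ORDER BY marker_name
""")]

print(chromosomes, markers_per_chromosome, min_max_chr19, d19_markers)
```
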
Example Of A Biologically-Useful
Query: All Markers on Chromosome 1

Now We’re Talking: Returning Results into Your Favorite Tool
- SQL query results returned to…
  - MS Excel
    - … using Data/Import External Data/New Database Query
    - Details: Excel Advanced Report Development, Zapawa 2005 (in Lane catalog)
  - Spotfire

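Excel's Data/Import External Data route relies on an ODBC data source. As a swapped-in, simpler alternative for illustration only, the sketch below writes query results as CSV text, which Excel and most analysis tools can open. It uses Python's standard-library sqlite3 and csv modules; the function name write_query_as_csv and the toy Patients table are hypothetical.

```python
import csv
import io
import sqlite3


def write_query_as_csv(conn, sql, out_file):
    """Run a query and write a header row plus all result rows as CSV."""
    cursor = conn.execute(sql)
    writer = csv.writer(out_file)
    # cursor.description holds one entry per returned column; item 0 is its name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


conn = sqlite3.connect(":memory:")
# Toy data, invented for illustration.
conn.executescript("""
    CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
    INSERT INTO Patients VALUES (1, 'Roe', 'Richard'), (2, 'Doe', 'Jane');
""")

buffer = io.StringIO()  # in-memory file, so the sketch leaves nothing on disk
write_query_as_csv(
    conn, "SELECT LastName, FirstName FROM Patients ORDER BY LastName", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a file instead of an in-memory string, pass a file opened with `open("results.csv", "w", newline="")` as `out_file`.
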
Next: BioWarehouse
- Produced by SRI International
- Integration of genome, biochemical reaction, pathway, etc. databases from many different organisms
- Free, open source
- Accessing PublicHouse
- FAQ
- Schema
- All PubMed references pertaining to BioWarehouse and written by the BioWarehouse group

Conceptual Views of the
BioWarehouse Database

Querying BioWarehouse
- We’ll query using MySQL Query Browser
- Caveats:
  - Lots of datasets supported by BioWarehouse…
    - … but some critical ones are missing from PublicHouse due to licensing requirements, e.g.,
      - MetaCyc
      - UniProt
  - Also: Need to request account to query
    - Anonymous user not supported
- Resource: MySQL v5 Reference Manual

BioWarehouse Demo Queries …to get you started
- Query 1: What are the datasets available in PublicHouse?
- Query 2: How many pathways are there for the EcoCyc dataset?

Example Biologically Meaningful Query Of BioWarehouse: For a Given Pathway, Return the Proteins Involved in the Pathway and Their Molecular Weight

SELECT D.Name AS PathwayName, J.WID AS ProteinWID, J.Name AS ProteinName,
       J.MolecularWeightCalc AS MolecularWeightCalc
FROM Pathway D, PathwayReaction F, Reaction G,
     EnzymaticReaction H, Protein J
WHERE D.WID = F.PathwayWID
  AND F.ReactionWID = G.WID
  AND G.WID = H.ReactionWID
  AND H.ProteinWID = J.WID
  AND D.DataSetWID = 19
  AND D.Name LIKE "%lipopolysaccharide%"
ORDER BY ProteinName

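The chain of joins in this query (pathway to reactions to enzymatic reactions to proteins) can be followed step by step on a toy database. The sketch below uses Python's standard-library sqlite3 module. Table and column names are taken from the query on the slide; every row, including the dataset WID of 19, is invented, and the tables contain only the columns the query touches. The joins are written with explicit JOIN ... ON, which should be equivalent to the comma-and-WHERE form used on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-ins for five BioWarehouse tables; all rows are made up.
conn.executescript("""
    CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, DataSetWID INTEGER);
    CREATE TABLE PathwayReaction (PathwayWID INTEGER, ReactionWID INTEGER);
    CREATE TABLE Reaction (WID INTEGER PRIMARY KEY);
    CREATE TABLE EnzymaticReaction (ReactionWID INTEGER, ProteinWID INTEGER);
    CREATE TABLE Protein (WID INTEGER PRIMARY KEY, Name TEXT, MolecularWeightCalc REAL);

    INSERT INTO Pathway VALUES (1, 'toy lipopolysaccharide biosynthesis', 19),
                               (2, 'toy glycolysis', 19);
    INSERT INTO Reaction VALUES (100), (101), (200);
    INSERT INTO PathwayReaction VALUES (1, 100), (1, 101), (2, 200);
    INSERT INTO Protein VALUES (1000, 'toy enzyme B', 28.0),
                               (1001, 'toy enzyme A', 34.5),
                               (1002, 'toy enzyme C', 61.5);
    INSERT INTO EnzymaticReaction VALUES (100, 1000), (101, 1001), (200, 1002);
""")

pathway_proteins = conn.execute("""
    SELECT D.Name AS PathwayName, J.WID AS ProteinWID, J.Name AS ProteinName,
           J.MolecularWeightCalc AS MolecularWeightCalc
    FROM Pathway D
    JOIN PathwayReaction F ON D.WID = F.PathwayWID
    JOIN Reaction G ON F.ReactionWID = G.WID
    JOIN EnzymaticReaction H ON G.WID = H.ReactionWID
    JOIN Protein J ON H.ProteinWID = J.WID
    WHERE D.DataSetWID = ?
      AND D.Name LIKE ?
    ORDER BY ProteinName
""", (19, "%lipopolysaccharide%")).fetchall()

for row in pathway_proteins:
    print(row)
```
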
Exercises
Together (10 min):
1. How many datasets are there in PublicHouse?
2. What is the number of genes in S. aureus (SAUR158878Cyc)?
On your own (20 min):
3. List the coding region starts and ends for all genes that code for proteins in the SAUR158878Cyc dataset
4. How many biochemical reactions are there in each pathway (of any type) in the EcoCyc (= E. coli) dataset?

In Summary…
- Knowing the db schema is essential
- The SELECT statement is all you need to know
- Remote databases are good for exploring a schema at low cost
  - No installation…
- But:
  - Performance can be poor
  - Restrictions on data set
  - Better to install locally if “real work” is to be performed
- Remember: SQL gives you the power to return results directly into your favorite tool!

Don’t Forget The Class Evaluation

Resources

Setting-Up for Internet SQL Querying

Setting Up Data Source Names
Steps
1. Make sure you have the requisite driver (next slide)
2. Create a Data Source Name (Windows only)
3. Write your query
4. Get the results back into Excel!
See the Lane videorecorded class "Managing Experiment Data Using Excel and Friends: Digging Out from Under the Avalanche" for lots more details.

Step 1: Getting Drivers
Essential for SQL Querying
- A driver is a piece of software that lets your operating system talk to a database
  - “data connectivity” tool
  - Installed drivers visible in ODBC manager
- Each database engine (Oracle, MySQL, etc.) requires its own driver
  - Generally must be installed by user
- Drivers are needed by Data Source Name tool and querying programs
  - Require (simple) installation

MySQL Driver: Needed to Query MySQL Databases
- Windows: Download MySQL Connector/ODBC 3.51
- Must be installed for direct querying using e.g. Excel
  - Not necessary if you are using the MySQL Query Browser

Oracle Driver: Needed to Query Oracle Databases
- Installing “client” software will also install driver
  - Windows: Download 10g Client
  - Mac: Download 10g Client
  - Free Oracle user account required to download
- Must be installed if you are querying using MS Query or any other query browser involving Oracle

Step 2: Creating a Data Source Name
- A Data Source Name (DSN) tells programs on your PC where and how to query a database
- Populating the fields:
  - Data Source Name: Unique name of your choice
  - Description: anything
  - Server: exactly as given by the database provider
  - Port number: as specified by database provider
    - Defaults: MySQL: 3306; Oracle: 1521; MS Access: N/A

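The same fields that go into a DSN can also be written out as a single ODBC connection string, which some programs accept in place of a named DSN. The sketch below is a hedged illustration in plain Python with no database access. The function name build_connection_string is hypothetical; the keywords (DRIVER, SERVER, PORT, DATABASE, UID, PWD) and the driver name "MySQL ODBC 3.51 Driver" are assumptions to check against the documentation of the installed MySQL driver; and the server, database, user, and password are placeholders. Only the default port numbers come from the slide.

```python
# Default ports as listed on the slide.
DEFAULT_PORTS = {"MySQL": 3306, "Oracle": 1521}


def build_connection_string(driver, server, database, user, password,
                            engine="MySQL", port=None):
    """Assemble a MySQL-style ODBC connection string (keywords are assumptions)."""
    if port is None:
        # Fall back to the engine's default port when the provider gives none.
        port = DEFAULT_PORTS[engine]
    fields = [
        ("DRIVER", "{" + driver + "}"),  # driver name as shown in the ODBC manager
        ("SERVER", server),              # exactly as given by the database provider
        ("PORT", str(port)),
        ("DATABASE", database),
        ("UID", user),
        ("PWD", password),
    ]
    return ";".join(f"{key}={value}" for key, value in fields) + ";"


# Placeholder values only; substitute what your database provider gives you.
example = build_connection_string(
    driver="MySQL ODBC 3.51 Driver",
    server="db.example.org",
    database="example_db",
    user="example_user",
    password="example_password",
)
print(example)
```
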
Resources – SQL



eBook: Beginning SQL
eBook: Learning SQL

Lots More Resources From Lane

How To Get Accounts for Direct SQL Querying
Direct Querying of Selected Bioinformatics Databases

Database | How? | DB Engine
BioWarehouse | http://biowarehouse.ai.sri.com/ : get account for access to PublicHouse (publicly accessible installation of BioWarehouse; see http://biowarehouse.ai.sri.com/PublicHouseOverview.html) | MySQL
Ensembl | http://www.ensembl.org/info/data/download.html | MySQL
Mouse Genome Database | Mail mgi-help@informatics.jax.org to ask for an account | Sybase

Example Querying with MySQL Query Browser
- Free
- MySQL only
- Facilitates writing of a SQL query
- Get it at http://www.mysql.com/products/tools/querybrowser/
Screenshot callouts: Execute statement; graphical; Query statement; Table descriptions

💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...Sheetaleventcompany
 
❤️ Chandigarh Call Girls☎️98151-579OO☎️ Call Girl service in Chandigarh ☎️ Ch...
❤️ Chandigarh Call Girls☎️98151-579OO☎️ Call Girl service in Chandigarh ☎️ Ch...❤️ Chandigarh Call Girls☎️98151-579OO☎️ Call Girl service in Chandigarh ☎️ Ch...
❤️ Chandigarh Call Girls☎️98151-579OO☎️ Call Girl service in Chandigarh ☎️ Ch...Rashmi Entertainment
 
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana GuptaLifecare Centre
 
Lucknow Call Girls Just Call 👉👉8630512678 Top Class Call Girl Service Available
Lucknow Call Girls Just Call 👉👉8630512678 Top Class Call Girl Service AvailableLucknow Call Girls Just Call 👉👉8630512678 Top Class Call Girl Service Available
Lucknow Call Girls Just Call 👉👉8630512678 Top Class Call Girl Service Availablesoniyagrag336
 
Kolkata Call Girls Naktala 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
Kolkata Call Girls Naktala  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl Se...Kolkata Call Girls Naktala  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl Se...
Kolkata Call Girls Naktala 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...Namrata Singh
 
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...Sheetaleventcompany
 
Call Girl In Chandigarh 📞9809698092📞 Just📲 Call Inaaya Chandigarh Call Girls ...
Call Girl In Chandigarh 📞9809698092📞 Just📲 Call Inaaya Chandigarh Call Girls ...Call Girl In Chandigarh 📞9809698092📞 Just📲 Call Inaaya Chandigarh Call Girls ...
Call Girl In Chandigarh 📞9809698092📞 Just📲 Call Inaaya Chandigarh Call Girls ...Sheetaleventcompany
 

Dernier (20)

Circulatory Shock, types and stages, compensatory mechanisms
Circulatory Shock, types and stages, compensatory mechanismsCirculatory Shock, types and stages, compensatory mechanisms
Circulatory Shock, types and stages, compensatory mechanisms
 
💞 Safe And Secure Call Girls Coimbatore🧿 6378878445 🧿 High Class Coimbatore C...
💞 Safe And Secure Call Girls Coimbatore🧿 6378878445 🧿 High Class Coimbatore C...💞 Safe And Secure Call Girls Coimbatore🧿 6378878445 🧿 High Class Coimbatore C...
💞 Safe And Secure Call Girls Coimbatore🧿 6378878445 🧿 High Class Coimbatore C...
 
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
 
Call Girls Pune Just Call 9142599079 Top Class Call Girl Service Available
Call Girls Pune Just Call 9142599079 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9142599079 Top Class Call Girl Service Available
Call Girls Pune Just Call 9142599079 Top Class Call Girl Service Available
 
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service AvailableCall Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
 
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
💚Call Girls In Amritsar 💯Anvi 📲🔝8725944379🔝Amritsar Call Girl No💰Advance Cash...
 
Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
 
Call Girl in Chennai | Whatsapp No 📞 7427069034 📞 VIP Escorts Service Availab...
Call Girl in Chennai | Whatsapp No 📞 7427069034 📞 VIP Escorts Service Availab...Call Girl in Chennai | Whatsapp No 📞 7427069034 📞 VIP Escorts Service Availab...
Call Girl in Chennai | Whatsapp No 📞 7427069034 📞 VIP Escorts Service Availab...
 
Call Girls in Lucknow Just Call 👉👉8630512678 Top Class Call Girl Service Avai...
Call Girls in Lucknow Just Call 👉👉8630512678 Top Class Call Girl Service Avai...Call Girls in Lucknow Just Call 👉👉8630512678 Top Class Call Girl Service Avai...
Call Girls in Lucknow Just Call 👉👉8630512678 Top Class Call Girl Service Avai...
 
Call girls Service Phullen / 9332606886 Genuine Call girls with real Photos a...
Call girls Service Phullen / 9332606886 Genuine Call girls with real Photos a...Call girls Service Phullen / 9332606886 Genuine Call girls with real Photos a...
Call girls Service Phullen / 9332606886 Genuine Call girls with real Photos a...
 
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptxANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
 
💰Call Girl In Bangalore☎️63788-78445💰 Call Girl service in Bangalore☎️Bangalo...
💰Call Girl In Bangalore☎️63788-78445💰 Call Girl service in Bangalore☎️Bangalo...💰Call Girl In Bangalore☎️63788-78445💰 Call Girl service in Bangalore☎️Bangalo...
💰Call Girl In Bangalore☎️63788-78445💰 Call Girl service in Bangalore☎️Bangalo...
 
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
 
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
 
❤️ Chandigarh Call Girls☎️98151-579OO☎️ Call Girl service in Chandigarh ☎️ Ch...
❤️ Chandigarh Call Girls☎️98151-579OO☎️ Call Girl service in Chandigarh ☎️ Ch...❤️ Chandigarh Call Girls☎️98151-579OO☎️ Call Girl service in Chandigarh ☎️ Ch...
❤️ Chandigarh Call Girls☎️98151-579OO☎️ Call Girl service in Chandigarh ☎️ Ch...
 
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
7 steps How to prevent Thalassemia : Dr Sharda Jain & Vandana Gupta
 
Lucknow Call Girls Just Call 👉👉8630512678 Top Class Call Girl Service Available
Lucknow Call Girls Just Call 👉👉8630512678 Top Class Call Girl Service AvailableLucknow Call Girls Just Call 👉👉8630512678 Top Class Call Girl Service Available
Lucknow Call Girls Just Call 👉👉8630512678 Top Class Call Girl Service Available
 
Kolkata Call Girls Naktala 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
Kolkata Call Girls Naktala  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl Se...Kolkata Call Girls Naktala  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl Se...
Kolkata Call Girls Naktala 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
 
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
❤️Amritsar Escorts Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amri...
 
Call Girl In Chandigarh 📞9809698092📞 Just📲 Call Inaaya Chandigarh Call Girls ...
Call Girl In Chandigarh 📞9809698092📞 Just📲 Call Inaaya Chandigarh Call Girls ...Call Girl In Chandigarh 📞9809698092📞 Just📲 Call Inaaya Chandigarh Call Girls ...
Call Girl In Chandigarh 📞9809698092📞 Just📲 Call Inaaya Chandigarh Call Girls ...
 

A guided SQL tour of bioinformatics databases

  • 1. A Guided SQL Tour of Bioinformatics Databases. Yannick Pouliot, PhD, Bioresearch Informationist (lanebioresearch@stanford.edu). Lane Medical Library & Knowledge Management Center, http://lane.stanford.edu. 2/28/2007
  • 2. Content: (1) Very abbreviated review of the relational principle. (2) Some of the technology required to connect to a remote database. (3) Walk-through of the database schema for Ensembl, with hands-on querying. (4) Walk-through of the database schema for BioWarehouse, with hands-on querying. (5) Resources: details on connecting to a remote database.
  • 3. So Why Are We Here?
  • 4. Bioinformatics Databases: Who Supports Direct Querying?
  • 5. Relational Database Terms. Database: collection of tables and the relationships between tables. Table: collection of records that share a common fundamental characteristic (e.g., patients and locations can each be stored in their own table). Record: basic unit of information in a relational database (e.g., 1 record per person); a record is composed of columns (“fields”). Query: set of instructions to a database “engine” to retrieve, sort and format the returned data (e.g., “find me all patients in my database”).
  • 6. Main Relational Database “Engines”: FileMaker, MS Access, MS SQL Server, MySQL, Oracle, PostgreSQL, Sybase.
  • 7. Structure of Relational DB Tables. Data values live in rows.
  • 8. Understanding the Relational Principle: A Simple Database. Every patient gets ONE record in the Patients table. Every visit gets ONE record in the Visits table. Rows in different tables can be related to one another using a shared key (identifier); this is a “join”. There can be multiple visit records for a given patient. There can be multiple tissue records for a given patient.
  • 9. The Relational Principle at Work. Related records can be found using a shared key. Example: Patients.ID = Visits.PatientID. (Figure labels: table name, primary key.)
  • 10. SQL Querying… With What? Query browsers used here: MySQL Query Browser and WinSQL. Other query browsers exist but are more sophisticated, and often more expensive or more complex. Example: PL/SQL Developer, from Allround Automations.
  • 11. Example: Network Querying of Ensembl Database Using MySQL Query Browser. What happens when you query a remote database? DEMO. Of note: it may take some time (big database, lots of data to return from far away…); it is easy to write queries with voluminous output; you may have to kill the query… Setting up ODBC: not discussed here, but cheat sheet instructions are in the handout. Location will also be mailed.
  • 12. The Database Schema: Your Roadmap For Querying. The schema describes all tables and all fields. It is used to determine how to inter-relate tables to retrieve the desired data. Very important: you must understand the schema for accurate querying. Wrong understanding = wrong results.
  • 13. Introducing The SQL Select Statement. Good news: this is the only SQL statement you need to understand for querying. SELECT LastName, FirstName FROM Patients
  • 14. Basic Syntax of Select Statement: SELECT field_name FROM table [WHERE condition], where [ ] = optional. Example: Select LastName,FirstName From Patients Where Alive = 'Y'; Note: case sensitive for all but Oracle. Query statements are written into a tool such as MS Query or MySQL Query Browser. (Handout: p2)
  • 15. SELECT – (Some) Details
  • 16. Moving On: Real Biodatabase Schemas
  • 17. Schemas We’ll Look At… Remember: schemas describe all tables and all fields, and are used to determine how to inter-relate tables to retrieve the desired data. Our schemas today: Ensembl and BioWarehouse.
  • 18. Ensembl. Produced by Sanger Institute. Collection of genome databases for many different organisms. Free, open source. Web querying: http://www.ensembl.org/. FAQ: What is Ensembl? All PubMed references pertaining to Ensembl and written by the Ensembl group.
  • 19. Exploring the Ensembl Schema. Ensembl CORE schema documentation: first place to go to answer “what does this table store?” Problem: there is no graphical representation of the overall schema, so relationships are harder to appreciate. Use the Catalog function and go from there…
  • 20. “Fundamental” Tables. Fundamental tables: assembly, assembly_exception, attrib_type, coord_system, dna, dnac, exon, exon_stable_id, exon_transcript, gene, gene_stable_id, karyotype, meta, meta_coord, prediction_exon, prediction_transcript, seq_region, seq_region_attrib, supporting_feature, transcript, transcript_attrib, transcript_stable_id, translation, translation_attrib, translation_stable_id. Features and analyses: alt_allele, analysis, analysis_description, density_feature, density_type, dna_align_feature, map, marker, marker_feature, marker_map_location, marker_synonym, misc_attrib, misc_feature, misc_feature_misc_set, misc_set, prediction_transcript, protein_align_feature, protein_feature, qtl, qtl_feature, qtl_synonym, regulatory_factor, regulatory_factor_coding, regulatory_feature, regulatory_feature_object, regulatory_search_region, repeat_consensus, repeat_feature, simple_feature. ID mapping (map identifiers between releases): gene_archive, mapping_session, peptide_archive, stable_id_event. External references (IDs to objects in other dbs): external_db, external_synonym, go_xref, identity_xref, object_xref, xref. Miscellaneous: interpro.
  • 21. Understanding The Ensembl Schema Using The Catalog
  • 22. Querying Ensembl. Ensembl runs on the MySQL database engine. We’ll use WinSQL. MySQL Query Browser can also be used, as well as lots of other querying tools.
  • 23. Before Proceeding: A Word of Caution. It is easy to write queries that retrieve nonsense or never complete. Scotty to Captain Kirk: “We’re going in circles, and at warp 6 we’re going mighty fast…” Understanding the schema is the only way to prevent this. Tips: use “count” to determine the number of rows in a table BEFORE returning large datasets. Remember: the more tables are joined, the slower the query.
  • 24. Demo Queries… To Get You Started. Query 1: return the number of genes stored in Ensembl Human. Query 2: return the number of transcripts produced by genes stored in Ensembl Human (demonstrates JOINing of tables).
  • 25. Exercises. Together (10 min): 1. the number of genes stored in Ensembl Human; 2. the number of transcripts produced by genes stored in Ensembl Human. On your own (20 min): 3. the types of analyses that Ensembl provides; 4. the number of types of markers; 5. the number of markers per chromosome for all chromosomes; 6. extra points: the minimum and maximum marker distances for markers on chromosome 19.
  • 26. SELECT Statement: A Refresher. Syntax: SELECT [DISTINCT] select_list FROM table_list [WHERE conditions] [START WITH] [CONNECT BY] [GROUP BY group_by_list] [HAVING search_conditions] [ORDER BY order_list [ASC | DESC]]. “Modifiers” of the select list: DISTINCT, COUNT, SUM, MIN, MAX. Also: ORDER BY, LIKE (used in the WHERE clause).
  • 27. Example Of A Biologically-Useful Query: All Markers on Chromosome 1
  • 28. Now We’re Talking: Returning Results into Your Favorite Tool. SQL query results can be returned to MS Excel (using Data/Import External Data/New Database Query; details: Excel Advanced Report Development, Zapawa 2005, in Lane catalog) or to Spotfire.
  • 29. Next: BioWarehouse. Produced by SRI International. Integration of genome, biochemical reaction, pathway, etc. databases from many different organisms. Free, open source. Links: Accessing PublicHouse; FAQ; Schema; all PubMed references pertaining to BioWarehouse and written by the BioWarehouse group.
  • 30. Conceptual Views of the BioWarehouse Database
  • 32. Querying BioWarehouse. We’ll query using MySQL Query Browser. Caveats: lots of datasets are supported by BioWarehouse, but some critical ones are missing from PublicHouse due to licensing requirements, e.g., MetaCyc and UniProt. Also: you need to request an account to query; anonymous users are not supported. Resource: MySQL v5 Reference Manual.
  • 33. BioWarehouse Demo Queries… to get you started. Query 1: What are the datasets available in PublicHouse? Query 2: How many pathways are there for the EcoCyc dataset?
  • 34. Example Biologically Meaningful Query Of BioWarehouse: For a Given Pathway, Return the Proteins Involved in the Pathway and Their Molecular Weights. SELECT D.Name AS PathwayName, J.WID AS ProteinWID, J.Name AS ProteinName, J.MolecularWeightCalc AS MolecularWeightCalc FROM Pathway D, PathwayReaction F, Reaction G, EnzymaticReaction H, Protein J WHERE D.WID = F.PathwayWID AND F.ReactionWID = G.WID AND G.WID = H.ReactionWID AND H.ProteinWID = J.WID AND D.DataSetWID=19 AND D.Name LIKE "%lipopolysaccharide%" ORDER BY ProteinName
  • 35. Exercises. Together (10 min): 1. How many datasets are there in PublicHouse? 2. What is the number of genes in S. aureus (SAUR158878Cyc)? On your own (20 min): 3. List the coding region starts and ends for all genes that code for proteins in the SAUR158878Cyc dataset. 4. How many biochemical reactions are there in each pathway (of any type) in the EcoCyc (= E. coli) dataset?
  • 36. In Summary… Knowing the db schema is essential. The SELECT statement is all you need to know. Remote databases are good for exploring a schema at low cost: no installation… But: performance can be poor; there are restrictions on data sets; it is better to install locally if “real work” is to be performed. Remember: SQL gives you the power to return results directly into your favorite tool!
  • 37. Don’t Forget The Class Evaluation
  • 38. Resources
  • 39. Setting-Up for Internet SQL Querying
  • 40. Setting Up Data Source Names. Steps: 1. Make sure you have the requisite driver (next slide). 2. Create a Data Source Name (Windows only). 3. Write your query. 4. Get the results back into Excel! See the Lane videorecorded class “Managing Experiment Data Using Excel and Friends: Digging Out from Under the Avalanche” for lots more details.
  • 41. Step 1: Getting Drivers Essential for SQL Querying. A driver is a piece of software that lets your operating system talk to a database (a “data connectivity” tool). Installed drivers are visible in the ODBC manager. Each database engine (Oracle, MySQL, etc.) requires its own driver, which generally must be installed by the user. Drivers are needed by the Data Source Name tool and by querying programs. They require (simple) installation.
  • 42. MySQL Driver: Needed to Query MySQL Databases. Windows: download MySQL Connector/ODBC 3.51. Must be installed for direct querying using e.g. Excel. Not necessary if you are using the MySQL Query Browser.
  • 43. Oracle Driver: Needed to Query Oracle Databases. Installing the “client” software will also install the driver. Windows: download the 10g Client. Mac: download the 10g Client. A free Oracle user account is required to download. Must be installed if you are querying using MS Query or any other query browser involving Oracle.
  • 44. Step 2: Creating a Data Source Name. A Data Source Name (DSN) tells programs on your PC where and how to query a database. Populating the fields: Data Source Name: unique name of your choice. Description: anything. Server: exactly as given by the database provider. Port number: as specified by the database provider (defaults: MySQL: 3306; Oracle: 1521; MS Access: N/A).
  • 45. Resources – SQL. eBook: Beginning SQL. eBook: Learning SQL.
  • 46. Lots More Resources From Lane
  • 48. How To Get Accounts for Direct SQL Querying. Direct querying of selected bioinformatics databases (database; DB engine; how to get access): BioWarehouse; MySQL; http://biowarehouse.ai.sri.com/, get an account for access to publichouse (the publicly-accessible installation of BioWarehouse; see http://biowarehouse.ai.sri.com/PublicHouseOverview.html). Ensembl; MySQL; http://www.ensembl.org/info/data/download.html. Mouse Genome Database; Sybase; mail mgi-help@informatics.jax.org to ask for an account.
  • 49. Example Querying with MySQL Query Browser. Free. MySQL only. Facilitates writing of a SQL query. Get it at http://www.mysql.com/products/tools/querybrowser/ (Screenshot callouts: execute statement, graphical query statement, table descriptions.)
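
The relational principle from slides 8 and 9 and the “count first” tip from slide 23 can be tried locally without a remote account. What follows is a minimal sketch, not part of the original class: it assumes Python's built-in sqlite3 module instead of MySQL, and a toy Patients/Visits database. Patients.ID, Visits.PatientID, LastName, FirstName and Alive come from the slides; the VisitDate column and all rows are invented for illustration.

```python
import sqlite3

# Toy in-memory database. The VisitDate column and every row are
# made up for this sketch; only the key columns follow the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT,
                           FirstName TEXT, Alive TEXT);
    CREATE TABLE Visits (ID INTEGER PRIMARY KEY, PatientID INTEGER,
                         VisitDate TEXT);
    INSERT INTO Patients VALUES (1, 'Doe', 'Jane', 'Y'), (2, 'Roe', 'Rick', 'N');
    INSERT INTO Visits VALUES (10, 1, '2007-01-15'), (11, 1, '2007-02-20'),
                              (12, 2, '2006-11-03');
""")

# Count first: how many rows would the join return?
join_count = conn.execute(
    "SELECT COUNT(*) FROM Patients, Visits "
    "WHERE Patients.ID = Visits.PatientID"
).fetchone()[0]

# Then retrieve: visits of living patients, related through the shared key.
living_visits = conn.execute(
    "SELECT LastName, FirstName, VisitDate FROM Patients, Visits "
    "WHERE Patients.ID = Visits.PatientID AND Alive = 'Y' "
    "ORDER BY VisitDate"
).fetchall()

print(join_count)
print(living_visits)
```

Running the COUNT(*) query before the full SELECT is cheap on a toy table, but against a remote genome database it is the step that tells you whether the real query is worth sending.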

Editor's notes

  1. SELECT marker.marker_id, marker_map_location.chromosome_name, marker_map_location.position, map.map_name FROM marker INNER JOIN marker_map_location ON marker.marker_id = marker_map_location.marker_id INNER JOIN map ON marker_map_location.map_id = map.map_id WHERE marker_map_location.chromosome_name = '19'
  2. SELECT D.Name AS PathwayName, J.WID AS ProteinWID, J.Name AS ProteinName, J.MolecularWeightCalc AS MolecularWeightCalc FROM Pathway D, PathwayReaction F, Reaction G, EnzymaticReaction H, Protein J WHERE D.WID = F.PathwayWID AND F.ReactionWID = G.WID AND G.WID = H.ReactionWID AND H.ProteinWID = J.WID AND D.DataSetWID = 19 AND D.Name LIKE '%lipopolysaccharide%' ORDER BY ProteinName
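
Note 1 filters on a single chromosome; the same join extends to the “markers per chromosome” exercise by adding GROUP BY. The following is a rough sketch under stated assumptions: it uses Python's sqlite3 module rather than a live Ensembl MySQL server, the table and column names are taken only from the query in note 1 (the real Ensembl tables have more columns), and all rows are invented.

```python
import sqlite3

# Toy stand-ins for two Ensembl tables. Only the columns that appear in
# note 1 are modelled, and the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marker (marker_id INTEGER PRIMARY KEY);
    CREATE TABLE marker_map_location (
        marker_id INTEGER, map_id INTEGER,
        chromosome_name TEXT, position TEXT);
    INSERT INTO marker VALUES (1), (2), (3);
    INSERT INTO marker_map_location VALUES
        (1, 1, '19', '10.5'), (2, 1, '19', '42.0'), (3, 1, 'X', '7.2');
""")

# Join on the shared key, then count distinct markers per chromosome.
per_chromosome = conn.execute("""
    SELECT mml.chromosome_name, COUNT(DISTINCT m.marker_id) AS n_markers
    FROM marker m
    INNER JOIN marker_map_location mml ON m.marker_id = mml.marker_id
    GROUP BY mml.chromosome_name
    ORDER BY mml.chromosome_name
""").fetchall()

print(per_chromosome)
```

COUNT(DISTINCT ...) is used here because a marker may have locations on several maps; counting rows alone would count such a marker more than once.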