Fourth lecture in our lecture series. Introduction to different biological databases. How to use a MySQL or NoSQL database in a research setting. What data repositories are available? How to use PRIDE, peptide pilot and co. How to formulate queries for your custom databases. Dr. Marius Codrea, Dr. Sven Nahnsen
Data Management for Quantitative Biology - Database systems, May 7, 2015, Dr. Marius Codrea
1. Dr. Sven Nahnsen/Dr. Marius Codrea,
Quantitative Biology Center (QBiC)
Data Management for Quantitative Biology
Lecture 4: Database systems
2. Database systems in modern data-driven
biomedical research
I. Typical research scenario
From samples to data
From data to databases and back
From data/databases to information
II. How to?
12. Example Resources
● ENSEMBL + BioMart
● http://www.ensembl.org
● National Center for Biotechnology Information (NCBI) + BLAST
● http://www.ncbi.nlm.nih.gov/
● The European Bioinformatics Institute (EMBL-EBI) + Clustal
● http://www.ebi.ac.uk/services
● UniProt
● http://www.uniprot.org/
● University of California, Santa Cruz (UCSC)
● https://genome.ucsc.edu/
27. Database systems in modern data-driven
biomedical research
I. Typical research scenarios
1.From samples to data (numbers in a file) = digitization
2.From data to databases and back
= use resources (e.g., human genome) and contribute to
repositories (data + scientific observations + metadata)
3.From data/databases to information
= data mining
II. How to?
28. Many database designs & concepts
http://dataconomy.com/wp-content/uploads/2014/07/fig2large.jpg
29. Databases
DB = "A database is an organized collection of data" http://en.wikipedia.org/wiki/Database
DB = DB + data model for the application at hand (business logic) + implementation
DB = DB + database management system (DBMS). Software that enables:
CRUD
• Create entries
• Read (retrieve)
• Update / edit
• Delete
DB = DB + Administration (user privileges, monitoring)
31. Specific characteristics MongoDB vs MySQL
More details here: http://db-engines.com/en/system/MongoDB%3BMySQL
System property      | MongoDB            | MySQL
Initial release      | 2009               | 1995
Current release      | 3.0.2, April 2015  | 5.6.24, April 2015
Triggers             | No                 | Yes
MapReduce            | Yes                | No
Foreign keys         | No                 | Yes
Transaction concepts | No                 | ACID*
*A database transaction must be Atomic, Consistent, Isolated and Durable.
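The "Atomic" part of ACID can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server (an assumption made only so the example runs without a server), and the one-table schema is hypothetical. Either all statements of a transaction take effect, or none do.

```python
import sqlite3

# In-memory SQLite database: a stand-in for a MySQL/InnoDB server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, timepoint TEXT NOT NULL)"
)
conn.commit()

try:
    # 'with conn' commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO samples (timepoint) VALUES ('Before')")
        # This statement violates NOT NULL, so the whole transaction is undone.
        conn.execute("INSERT INTO samples (timepoint) VALUES (NULL)")
except sqlite3.IntegrityError:
    pass

# The first, valid insert was rolled back together with the failing one.
count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(count)
```

Only atomicity is shown here; consistency, isolation and durability need more than one statement, connection or restart to demonstrate.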
32. Relational databases
● A plausible experiment where samples are collected from different
mice before and after some treatment
● High redundancy
● Cumbersome to maintain/update
33. Relational databases (Normalization)
● Split the data into RELATED tables: a Mice table and a Samples table
● Low redundancy
● Easier to maintain/update (e.g., add some genotype information to mice)
● 1:M one-to-many relationship between Mice and Samples
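The split into related tables can be sketched in plain Python. The records below are hypothetical example values, not data from the lecture; the point is only that mouse attributes end up stored once per mouse instead of once per sample.

```python
# Flat, redundant layout: mouse attributes are repeated for every sample.
# All values are hypothetical.
flat = [
    {"mouse": 1, "gender": "Male", "treatment": "Vitamin A", "timepoint": "Before"},
    {"mouse": 1, "gender": "Male", "treatment": "Vitamin A", "timepoint": "After"},
    {"mouse": 2, "gender": "Female", "treatment": "Vitamin B", "timepoint": "Before"},
    {"mouse": 2, "gender": "Female", "treatment": "Vitamin B", "timepoint": "After"},
]

# Mice table: one record per mouse, keyed by the mouse number.
mice = {}
for row in flat:
    mice[row["mouse"]] = {"gender": row["gender"], "treatment": row["treatment"]}

# Samples table: one record per sample, pointing back to its mouse (1:M).
samples = [
    {"sample_id": i, "mouse": row["mouse"], "timepoint": row["timepoint"]}
    for i, row in enumerate(flat, start=1)
]

# Adding genotype information now touches one record per mouse,
# not one record per sample.
mice[1]["genotype"] = "wild type"

print(len(mice), len(samples))
```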
34. Terminology
[Figure: Mice table and Samples table, annotated with fields, records (Record 1, Record 6), the primary key of each table, and the foreign key in Samples referencing Mice.Mouse_number]
● Table rows are called "records"
● Table columns are called "fields"
● The values of the primary key uniquely identify the rows of the table
● The foreign key links each row of the host table to exactly 1 record in the referenced table
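What the two kinds of keys enforce can be sketched as follows. The example assumes Python's built-in sqlite3 module instead of MySQL, with a reduced, hypothetical version of the two tables; `try_insert` is a helper introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this is switched on (SQLite-specific).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE mice (mouse_number INTEGER PRIMARY KEY, treatment TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE samples (
           sample_id    INTEGER PRIMARY KEY,
           mouse_number INTEGER NOT NULL REFERENCES mice(mouse_number),
           timepoint    TEXT NOT NULL
       )"""
)
conn.execute("INSERT INTO mice (mouse_number, treatment) VALUES (1, 'Vitamin A')")


def try_insert(sql):
    """Hypothetical helper: report whether the DBMS accepts the statement."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    # Valid: mouse 1 exists.
    try_insert("INSERT INTO samples (mouse_number, timepoint) VALUES (1, 'Before')"),
    # Foreign key violation: there is no mouse 99 to reference.
    try_insert("INSERT INTO samples (mouse_number, timepoint) VALUES (99, 'Before')"),
    # Primary key violation: mouse number 1 is already taken.
    try_insert("INSERT INTO mice (mouse_number, treatment) VALUES (1, 'Vitamin B')"),
]
print(results)
```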
35. Databases
DB = "A database is an organized collection of data" http://en.wikipedia.org/wiki/Database
DB = DB + data model for the application at hand (business logic) + implementation
DB = DB + database management system (DBMS). Software that enables:
CRUD
• Create entries
• Read (retrieve or search)
• Update / edit
• Delete
DB = DB + Administration (user privileges, monitoring)
36. Structured Query Language (SQL)
● SQL is a standard language for
creating, accessing and modifying relational databases
● MySQL implements SQL database management
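The four CRUD operations map onto the SQL statements INSERT, SELECT, UPDATE and DELETE. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a MySQL server and a hypothetical one-table schema:

```python
import sqlite3

# In-memory SQLite database: a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mice (mouse_number INTEGER PRIMARY KEY, treatment TEXT NOT NULL)"
)

# Create: add an entry.
conn.execute("INSERT INTO mice (treatment) VALUES (?)", ("Vitamin A",))

# Read: retrieve the entry.
row = conn.execute("SELECT mouse_number, treatment FROM mice").fetchone()

# Update: edit the entry in place.
conn.execute(
    "UPDATE mice SET treatment = ? WHERE mouse_number = ?", ("Vitamin B", row[0])
)
updated = conn.execute(
    "SELECT treatment FROM mice WHERE mouse_number = ?", (row[0],)
).fetchone()[0]

# Delete: remove the entry.
conn.execute("DELETE FROM mice WHERE mouse_number = ?", (row[0],))
remaining = conn.execute("SELECT COUNT(*) FROM mice").fetchone()[0]

print(row, updated, remaining)
```

The `?` placeholders let the driver pass values separately from the SQL text, which avoids quoting mistakes when values come from user input.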
37. Connect to the server and create the database
mysql -u username -p -h localhost
CREATE DATABASE mouse_experiment;
USE mouse_experiment;
38. Create the tables and insert values
CREATE TABLE mice (
    Mouse_number SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    Gender ENUM('Male','Female','NA') DEFAULT 'Female',
    Age DECIMAL(4,2) DEFAULT NULL,
    Treatment VARCHAR(50) NOT NULL,
    PRIMARY KEY (Mouse_number)
);
INSERT INTO mice (Gender, Age, Treatment)
VALUES ('Male', 3, 'Vitamin A'),
       ('Male', 2, 'Vitamin B'),
       ('Female', 2.5, 'Vitamin A'),
       ('Female', 3, 'Vitamin B'),
       ('Male', 4, 'Vitamin A'),
       ('Female', 2, 'Vitamin B');
No Mouse_number is given in the INSERT! It is the task of the DBMS to generate unique IDs (AUTO_INCREMENT).
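The ID generation can be tried out without a MySQL server. The sketch below assumes Python's built-in sqlite3 module, so the column types are SQLite approximations of the MySQL definition: AUTOINCREMENT instead of AUTO_INCREMENT, and a CHECK constraint instead of ENUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite approximation of the MySQL mice table (types differ, see lead-in).
conn.execute(
    """CREATE TABLE mice (
           mouse_number INTEGER PRIMARY KEY AUTOINCREMENT,
           gender       TEXT DEFAULT 'Female'
                        CHECK (gender IN ('Male', 'Female', 'NA')),
           age          REAL DEFAULT NULL,
           treatment    TEXT NOT NULL
       )"""
)

# No mouse_number is supplied: the DBMS assigns it.
conn.executemany(
    "INSERT INTO mice (gender, age, treatment) VALUES (?, ?, ?)",
    [("Male", 3, "Vitamin A"), ("Male", 2, "Vitamin B"), ("Female", 2.5, "Vitamin A")],
)
# Omitted columns fall back to their declared defaults.
conn.execute("INSERT INTO mice (treatment) VALUES ('Vitamin B')")

ids = [r[0] for r in conn.execute("SELECT mouse_number FROM mice ORDER BY mouse_number")]
last = conn.execute(
    "SELECT gender, age FROM mice WHERE mouse_number = ?", (ids[-1],)
).fetchone()
print(ids, last)
```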
39. Create the tables and insert values
CREATE TABLE samples (
    Sample_ID SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    Mouse_number SMALLINT UNSIGNED NOT NULL,
    Timepoint VARCHAR(15) NOT NULL,
    PRIMARY KEY (Sample_ID),
    FOREIGN KEY (Mouse_number)
        REFERENCES mice(Mouse_number)
        ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
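With both tables in place, a join query combines them through the shared Mouse_number. The following is a sketch only: it assumes Python's built-in sqlite3 module rather than MySQL, a reduced schema, and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced SQLite version of the two tables; the inserted rows are hypothetical.
conn.executescript(
    """
    CREATE TABLE mice (
        mouse_number INTEGER PRIMARY KEY,
        gender       TEXT,
        treatment    TEXT NOT NULL
    );
    CREATE TABLE samples (
        sample_id    INTEGER PRIMARY KEY,
        mouse_number INTEGER NOT NULL REFERENCES mice(mouse_number),
        timepoint    TEXT NOT NULL
    );
    INSERT INTO mice (gender, treatment)
    VALUES ('Male', 'Vitamin A'), ('Male', 'Vitamin B');
    INSERT INTO samples (mouse_number, timepoint)
    VALUES (1, 'Before'), (1, 'After'), (2, 'Before');
    """
)

# Join: attach each sample to the treatment of the mouse it was taken from.
rows = conn.execute(
    """
    SELECT s.sample_id, m.treatment, s.timepoint
    FROM samples AS s
    JOIN mice AS m ON m.mouse_number = s.mouse_number
    WHERE m.treatment = 'Vitamin A'
    ORDER BY s.sample_id
    """
).fetchall()
print(rows)
```

The treatment is stored only in the mice table, yet the join makes it available for every sample.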
44. Delete queries
“Mouse number 3 went wrong. Let's just delete it.”
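SELECT * FROM samples;
DELETE FROM mice WHERE Mouse_number = 3;
SELECT * FROM samples;
Where have these two samples gone? They were deleted together with mouse 3, because the foreign key in the samples table is declared with ON DELETE CASCADE.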
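The cascading delete can be reproduced in a small sketch. It assumes Python's built-in sqlite3 module in place of MySQL/InnoDB, a reduced schema, and hypothetical rows in which mouse 3 has two samples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys (including cascades) only when enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE mice (
        mouse_number INTEGER PRIMARY KEY,
        treatment    TEXT NOT NULL
    );
    CREATE TABLE samples (
        sample_id    INTEGER PRIMARY KEY,
        mouse_number INTEGER NOT NULL
                     REFERENCES mice(mouse_number) ON DELETE CASCADE,
        timepoint    TEXT NOT NULL
    );
    INSERT INTO mice (treatment)
    VALUES ('Vitamin A'), ('Vitamin B'), ('Vitamin A');
    INSERT INTO samples (mouse_number, timepoint)
    VALUES (1, 'Before'), (1, 'After'), (3, 'Before'), (3, 'After');
    """
)

before = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]

# Only the mice table is named in the DELETE statement.
conn.execute("DELETE FROM mice WHERE mouse_number = 3")

after = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
owners = [r[0] for r in conn.execute("SELECT mouse_number FROM samples ORDER BY sample_id")]
print(before, after, owners)
```

Without ON DELETE CASCADE the DBMS would instead refuse to delete a mouse that still has samples, which is often the safer default for experimental data.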
46. Extending the database
Mice table
Samples table
1:M one-to-many relationship
Protein table
M:M many-to-many relationship
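A many-to-many relationship is commonly implemented with a third, linking table. The sketch below assumes Python's built-in sqlite3 module instead of MySQL; the table name `sample_proteins`, the column names and the protein names are hypothetical and not taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE samples (
        sample_id INTEGER PRIMARY KEY,
        timepoint TEXT NOT NULL
    );
    CREATE TABLE proteins (
        protein_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    -- Linking table: one row per (sample, protein) pair.
    CREATE TABLE sample_proteins (
        sample_id  INTEGER NOT NULL REFERENCES samples(sample_id),
        protein_id INTEGER NOT NULL REFERENCES proteins(protein_id),
        PRIMARY KEY (sample_id, protein_id)
    );
    INSERT INTO samples (timepoint) VALUES ('Before'), ('After');
    INSERT INTO proteins (name) VALUES ('Protein X'), ('Protein Y');
    INSERT INTO sample_proteins VALUES (1, 1), (1, 2), (2, 1);
    """
)

# One sample can contain many proteins ...
proteins_in_sample_1 = [
    r[0]
    for r in conn.execute(
        """SELECT p.name FROM proteins AS p
           JOIN sample_proteins AS sp ON sp.protein_id = p.protein_id
           WHERE sp.sample_id = 1 ORDER BY p.name"""
    )
]
# ... and one protein can occur in many samples.
samples_with_x = [
    r[0]
    for r in conn.execute(
        """SELECT sp.sample_id FROM sample_proteins AS sp
           JOIN proteins AS p ON p.protein_id = sp.protein_id
           WHERE p.name = 'Protein X' ORDER BY sp.sample_id"""
    )
]
print(proteins_in_sample_1, samples_with_x)
```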
47. Indexing
● Ultimately, the data is stored in files on disks
● With large amounts of data (tens of millions of records),
sequential searching becomes infeasible
● MySQL uses B-trees
● “In computer science, a B-tree is a tree data structure that keeps
data sorted and allows searches, sequential access, insertions, and
deletions in logarithmic time.” http://en.wikipedia.org/wiki/B-tree
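The effect of an index on how a query is executed can be inspected directly. This sketch assumes Python's built-in sqlite3 module, not MySQL; SQLite also stores its indexes as B-trees, but its EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN. The table contents and the index name `idx_samples_mouse` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE samples (
           sample_id    INTEGER PRIMARY KEY,
           mouse_number INTEGER NOT NULL,
           timepoint    TEXT NOT NULL
       )"""
)
# 10,000 hypothetical samples spread over 1,000 mice.
conn.executemany(
    "INSERT INTO samples (mouse_number, timepoint) VALUES (?, ?)",
    [(i % 1000, "Before") for i in range(10000)],
)

query = "SELECT sample_id FROM samples WHERE mouse_number = 42"


def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    # The last column of each EXPLAIN QUERY PLAN row is a readable summary.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # without an index: a scan over the whole table
conn.execute("CREATE INDEX idx_samples_mouse ON samples (mouse_number)")
plan_after = plan(query)   # with the index: a search using idx_samples_mouse

matches = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
print(len(matches))
```

The query result is the same in both cases; only the access path changes, from reading every record to a tree lookup.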
49. Summary
● Database design requires domain knowledge, including
example use cases
● Normalization
● Primary and Foreign Keys
● Implement a MySQL database
● Queries & Join Queries
● Indices