2. Course Expectations
Objectives
Demonstrate…
… good practices
… useful features
… the value of querying via Excel
Windows vs. Mac
Structure
Examples, use cases
Exercises
Resources
Class evaluation questionnaire:
http://www.surveymk.com/s.asp?u=915602161402
Lane Medical Library &
Knowledge Management Center
http://lane.stanford.edu
3. Contents
Topics, from most to least complex:
Querying Web sites & databases using Excel
Excel handy functions
Excel good practices
4. So Why Are We Here?
Lots of data
Need for better management of these data
Need exceeds what Excel can do
Excel was never really meant for data management anyway
Applying common tools to ameliorate the problem
“In IT, there’s no problem that enough money can’t solve”: not the philosophy here…
Instead: invest yourself and you’ll get a handsome return
5. Essential Tip
Clippy: not as dorky as you might think
6. How To Help Clippy Give You Better Answers
Read a (good) Excel manual cover to cover
Don’t try to understand everything
Just flip pages and let it imprint on your brain
Not fun, but it will give you the requisite vocabulary
Increases your odds of getting the right answer
Gives you an idea of what Excel can do
7. Part I: Essential Excel Functions
8. Essential Excel Functions
1. Conditional Formatting
2. Named ranges & Input validation
3. Custom Toolbar
4. PivotTable
5. Web Querying
6. MS Query
9. Excel Functions 1: Conditional Formatting
Definition: Formatting (e.g., cell shading or font color) applied automatically by Excel to cells if a specified condition is true.
Example: applying green cell color to a cell if a test result exceeds a threshold value
In: Format/Conditional Formatting
See Spreadsheet1.xls/ConditionalExample1 - try
Reference
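The rule behind the green-cell example can be sketched outside Excel as a test applied to each cell in turn. This is only an illustration in Python: the threshold, the sample values, and the function name `cell_fill` are hypothetical and are not taken from Spreadsheet1.xls.

```python
# Hypothetical threshold, chosen only for illustration.
THRESHOLD = 100.0


def cell_fill(value, threshold=THRESHOLD):
    """Return the fill color a conditional format would apply to one cell."""
    if isinstance(value, (int, float)) and value > threshold:
        return "green"  # condition is true: apply the format
    return None  # condition is false: leave the cell as it is


# Hypothetical test results, one per cell.
test_results = [87.5, 120.0, 100.0, 143.2]
fills = [cell_fill(v) for v in test_results]

for value, fill in zip(test_results, fills):
    print(value, fill or "no fill")
```

Note that a value exactly equal to the threshold is not shaded, because the condition is "exceeds", not "equals or exceeds".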
10. Excel Functions 2: Named Ranges and Validation
Named ranges are ranges of cells that are…named!
Named ranges can be used for validating input data
Important for ensuring data consistency
Essential for queryability
Also useful to avoid repetitive typing by using a drop-down menu
See: Spreadsheet1.xls/InputValidation - try
How to: here
Other references
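Why validation matters for queryability can be shown with a small sketch. It assumes a hypothetical named range called SPECIES holding the allowed entries for one column; the names and values are illustrative and are not taken from Spreadsheet1.xls.

```python
# Hypothetical named range: the allowed entries for a "Species" column.
SPECIES = ["Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"]


def validate(entry, allowed=SPECIES):
    """Accept an entry only if it exactly matches a value in the named range."""
    return entry in allowed


# Entries as a user might type them without a drop-down menu.
entries = ["Mus musculus", "mouse", "Mus  musculus", "Homo sapiens"]
rejected = [e for e in entries if not validate(e)]

print(rejected)
```

A later query for "Mus musculus" would miss the rows typed as "mouse" or with a double space, which is why restricting input to the list keeps the data consistent.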
11. Excel Functions 3: Custom Toolbar
Why? Bring often-used functions together for faster access
DEMO
How to? 50 min online tutorial
Section on custom toolbars here
12. Excel Functions 4: PivotTables
Automatic summarization of data
See: Spreadsheet3.xls/Summary1 - try
Converting same category data into summarized values
Tall/skinny → wide/fat
Underlying data can always be accessed by clicking on a summary cell
Online demo (5 min)
How to? 30 min online tutorial
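What a PivotTable does to tall/skinny data can be sketched in a few lines of Python. The rows below are hypothetical (they are not the contents of Spreadsheet3.xls), and the mean is just one of the summary functions a PivotTable offers.

```python
from collections import defaultdict

# Hypothetical tall/skinny data: one row per (sample, gene, value).
rows = [
    ("sample1", "BRCA1", 2.0),
    ("sample1", "TP53", 4.0),
    ("sample2", "BRCA1", 3.0),
    ("sample2", "TP53", 5.0),
    ("sample1", "BRCA1", 4.0),
]


def pivot_mean(rows):
    """Summarize tall rows into {row_label: {column_label: mean value}}."""
    groups = defaultdict(list)
    for sample, gene, value in rows:
        groups[(sample, gene)].append(value)
    table = defaultdict(dict)
    for (sample, gene), values in groups.items():
        table[sample][gene] = sum(values) / len(values)
    return {sample: dict(cols) for sample, cols in table.items()}


def drill_down(rows, sample, gene):
    """Return the underlying rows behind one summary cell."""
    return [r for r in rows if r[0] == sample and r[1] == gene]


summary = pivot_mean(rows)
print(summary)
print(drill_down(rows, "sample1", "BRCA1"))
```

The `drill_down` helper plays the role of clicking on a summary cell: the summary never replaces the underlying rows.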
13. Excel Functions 5: Web Querying
Why Query the Web Using Excel?
Data in a Web page = first step
Need data stored in the tool used for daily work: Excel
E.g., with a list I can:
Sort
Annotate
Edit
14. Excel Functions 5: Web Querying
Options
1. Copy/paste Web page into Excel - try
2. Run Web query from within Excel (more control) - try
Going one step further: creating a refreshable Web query
Excel Web querying is not perfect…
Still limited by how data are formatted on the Web page; requires editing
Some Web pages don’t work
No arbitrary querying capability (limited by Web interface)
The answer: direct querying using e.g. SQL
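The limitation above is easier to see with a sketch of what any Web query has to do: pull rows out of whatever table markup the page happens to use. The example below uses Python's standard `html.parser` on a hypothetical page held in a string; it is not how Excel implements Web queries, only an illustration of the idea.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical Web page content, for illustration only.
PAGE = """
<html><body><h1>Results</h1>
<table>
<tr><th>Gene</th><th>Chromosome</th></tr>
<tr><td>BRCA1</td><td>17</td></tr>
<tr><td>TP53</td><td>17</td></tr>
</table></body></html>
"""

parser = TableParser()
parser.feed(PAGE)
table = parser.rows
print(table)
```

Only what the page chooses to display ends up in `table`; anything the site does not render cannot be retrieved this way, which is the case for direct querying.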
16. Part II: Querying Databases Using Excel
17. Putting MSQuery to Work
MSQuery, an unknown hero
Free
Facilitates writing a SQL query graphically
What is SQL?
First, need to find it!
Search for “MSQRY32.EXE” using “Search for Files or Folders”
Search hidden files and folders
On my disk, it is located in C:\Program Files\Microsoft Office\OFFICE11
Once you find it, create a shortcut to it and rename it, e.g., MSQuery
Move the shortcut to a desired location
18. Example: Network Querying of Ensembl Database Using MS Query
Big database, lots of data to return from far away…
What happens when you use MS Query
DEMO
May take some time
[Diagram: a query travels to the remote DB and the results travel back]
19. FYI - Bioinformatics Databases: Who Supports Direct Querying?
Queryability of Selected Bioinformatics Databases
Database | Direct Internet SQL querying? | How? | Modality | DB Engine
ArrayExpress | Eventually | | SOAP-based |
Ensembl | Yes | http://www.ensembl.org/info/data/download.html | SQL | MySQL
Mouse Genome Database | Yes | ask for account | SQL | Sybase
NCBI Entrez | Yes | http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html | SOAP-based | SQL Server
PharmGKB | Yes | http://www.pharmgkb.org/home/projects/webservices/ | SOAP-based | Oracle
Saccharomyces Genome Database | Eventually/Maybe | | | Oracle
Stanford Microarray Database | No | | | Oracle
20. How to Query Using MSQuery
Steps
1. Make sure you have the requisite driver
2. Create a Data Source Name
3. Write your SQL query
4. Get the results back into Excel!
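The four steps can be sketched end to end in Python. This is a stand-in, not MS Query itself: it uses the `sqlite3` module that ships with Python, so steps 1 and 2 (driver and Data Source Name) collapse into a single `connect` call, and the table and its rows are hypothetical. The result is written as CSV, a format Excel can open.

```python
import csv
import io
import sqlite3

# Steps 1-2 (stand-in): sqlite3 needs no separate driver or DSN.
# A real remote database would need both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (symbol TEXT, chromosome TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("BRCA1", "17"), ("TP53", "17"), ("CFTR", "7")],  # hypothetical rows
)

# Step 3: write the SQL query.
cursor = conn.execute(
    "SELECT symbol, chromosome FROM gene WHERE chromosome = ? ORDER BY symbol",
    ("17",),
)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Step 4: get the results back in a form Excel can open (CSV text here).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
conn.close()

print(csv_text)
```

With MS Query the last step is handled for you: the result set lands directly in a worksheet instead of a CSV file.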
21. Step 1: Getting Drivers Essential for Querying
A driver is a piece of software that lets your operating system talk to a database
Each database engine (Oracle, MySQL, etc.) requires its own driver
Generally must be installed by user
Drivers are needed by the Data Source Name tool and querying programs
Require (simple) installation
22. MySQL Driver: Needed to Query MySQL Databases
Windows: Download MySQL Connector/ODBC 3.51 here
Must be installed for direct querying using e.g. Excel
Not necessary if you are using the MySQL Query Browser
23. Oracle Driver: Needed to Query Oracle Databases
Installing “client” software will install the driver
Windows: Download 10g Client here
Mac: Download 10g Client here
Must be installed if you are querying using e.g. Excel
24. Step 2: Creating a Data Source Name
A Data Source Name (DSN) tells programs on your PC where and how to query a database
Populating the fields:
Data Source Name: Unique name of your choice
Description: anything
Server: exactly as given by the database provider
Port number: as specified by database provider
Defaults: MySQL: 3306; Oracle: 1521; MS Access: N/A
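The fields above map onto the kind of text connection string that ODBC-aware programs accept. The sketch below only assembles such a string; it does not connect to anything. The server name `db.example.org` and the database name `genes` are hypothetical, the driver name is assumed to be the one registered by Connector/ODBC 3.51, and the keyword names (SERVER, PORT, DATABASE) follow MySQL's driver and may differ for other engines.

```python
# Default ports as listed on the slide.
DEFAULT_PORTS = {"mysql": 3306, "oracle": 1521}


def connection_string(driver, server, engine, port=None, database=None):
    """Assemble an ODBC-style connection string from DSN-like fields."""
    parts = [f"DRIVER={{{driver}}}", f"SERVER={server}"]
    if port is None:
        port = DEFAULT_PORTS.get(engine)  # fall back to the engine default
    if port is not None:
        parts.append(f"PORT={port}")
    if database:
        parts.append(f"DATABASE={database}")
    return ";".join(parts) + ";"


# Hypothetical server and database names, for illustration only.
mysql_dsn = connection_string(
    "MySQL ODBC 3.51 Driver", "db.example.org", "mysql", database="genes"
)
custom_port = connection_string(
    "MySQL ODBC 3.51 Driver", "db.example.org", "mysql", port=3307
)

print(mysql_dsn)
print(custom_port)
```

Use the port given by the database provider whenever it differs from the default, as in the second call.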
25. Step 3: Building a Query
DEMO
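For readers without the demo, here is a rough sketch of what a graphical query builder does: you pick a table, columns, a filter, and a sort order, and it assembles the SQL text for you. The helper `build_select`, the table, and the values are all hypothetical, and `sqlite3` stands in for a real remote database. Assembling SQL from strings like this is for illustration only; the table and column names are trusted here, and the filter value is still passed as a parameter.

```python
import sqlite3


def build_select(table, columns, where=None, order_by=None):
    """Assemble a SELECT statement from the choices a query builder offers."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if order_by:
        sql += f" ORDER BY {order_by}"
    return sql


# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (symbol TEXT, length_bp INTEGER)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("GENE_A", 1200), ("GENE_B", 8100), ("GENE_C", 5600)],
)

sql = build_select(
    "gene",
    ["symbol", "length_bp"],
    where="length_bp > ?",
    order_by="length_bp DESC",
)
result = conn.execute(sql, (5000,)).fetchall()
conn.close()

print(sql)
print(result)
```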
26. Resources – Excel
Summarizing Numerical Data
Data summarization (text):
http://office.microsoft.com/en-us/assistance/HA011864391033.aspx
27. Resources – MS Access
Free Online Training Resources
Using an Access database to store and information (2 min)
http://office.microsoft.com/en-us/assistance/HA011709681033.aspx
Creating a database from Excel (5 min):
http://office.microsoft.com/en-us/assistance/HA012013211033.aspx
Creating tables in Access (50 min):
http://office.microsoft.com/training/training.aspx?AssetID=RC061183261033
Writing queries (50 min):
http://office.microsoft.com/training/training.aspx?AssetID=RC010776611033
28. Resources - Excel
[Book covers: one title accessible from Lane Library; two titles available via Safari]
29. Resources - Excel
[Book cover: title available from Lane Library]
30. MS Query Resources
Excellent tutorial:
http://office.microsoft.com/training/Training.aspx?AssetID=RP011856321033&CTT=6&Origin=RC011856161033
31. Resources – SQL
SQL=Structured Query Language
The Language to Query Relational Databases
Beginning SQL, Wilton P & Colby JW: E
http://jenson.stanford.edu/uhtbin/cgisirsi/5AGuKeptoD/GREEN/59960102/9#holdings
Oracle SQL*Plus, Gennick, J.
Beginning MySQL: E
http://site.ebrary.com/lib/stanford/Doc?id=10114227
32. Resources – MS Access
Accessible from Lane Library
Not in SU catalog; on order by Lane
1st edition available from SU; 2nd edition available via Safari