Los Angeles R users group - Dec 14 2010 - Part 1
A SQL primer for R users
with examples from Pokemon
Neal Fultz
UCLA Statistics
Goal of talk
Make SQL look easy
And present R equivalents
Not another 'customer db'
Paradigms
R: fundamental unit is the vector
RDBMS: fundamental unit is the table
Pokemon
Best selling video game of the 90s
sold in multiple versions
(and major fad)
Turn based JRPG
Featuring hundreds(!) of characters to collect
Gotta catch em all!
In R it's natural to represent this as a matrix.
In SQL, it's natural to pivot it to tuples.
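A minimal sketch of that pivot in base R, assuming "this" means which Pokemon appear in which game version. The matrix `avail` and its three rows are illustrative, not the talk's actual data.

```r
# Hypothetical availability matrix: rows are Pokemon, columns are game versions.
avail <- matrix(
  c(TRUE, TRUE,    # Bulbasaur: Red and Blue
    TRUE, FALSE,   # Ekans: Red only
    TRUE, TRUE),   # Psyduck: Red and Blue
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Bulbasaur", "Ekans", "Psyduck"), c("Red", "Blue"))
)

# Pivot to tuples: one (name, version) row per TRUE cell, as a table would store it.
idx <- which(avail, arr.ind = TRUE)
tuples <- data.frame(
  name    = rownames(avail)[idx[, "row"]],
  version = colnames(avail)[idx[, "col"]],
  stringsAsFactors = FALSE
)
print(tuples)
```

The matrix has one cell per (Pokemon, version) pair; the tuple form keeps only the pairs that exist.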
More concretely
id   Name        Type 1  Type 2  In Red  In Blue
001  Bulbasaur   Grass   Poison  T       T
002  Ivysaur     Grass   Poison  T       T
003  Venusaur    Grass   Poison  T       T
004  Charmander  Fire            T       T
005  Charmeleon  Fire            T       T
006  Charizard   Fire    Flying  T       T
What's in Red only?
select id, name
from pokemon
where red and not blue;
What's in Red only? (2)
23;"Ekans"
24;"Arbok"
43;"Oddish"
44;"Gloom"
45;"Vileplume"
56;"Mankey"
57;"Primeape"
58;"Growlithe"
59;"Arcanine"
123;"Scyther"
125;"Electabuzz"
What's in Red only? (R)
pokemon[red & !blue, c("id", "name")];
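For readers who want to run the R version, here is a small sketch. The data frame `pokemon` and its four rows are an illustrative stand-in for the real table, with column names assumed to match the SQL.

```r
# Hypothetical four-row stand-in for the pokemon table.
pokemon <- data.frame(
  id    = c(1, 4, 23, 54),
  name  = c("Bulbasaur", "Charmander", "Ekans", "Psyduck"),
  type1 = c("Grass", "Fire", "Poison", "Water"),
  type2 = c("Poison", NA, NA, NA),
  red   = c(TRUE, TRUE, TRUE, TRUE),
  blue  = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# SQL: select id, name from pokemon where red and not blue;
# The logical vector picks rows; the character vector picks columns.
red_only <- pokemon[pokemon$red & !pokemon$blue, c("id", "name")]
print(red_only)

# subset() evaluates the condition inside the data frame, closer to a WHERE clause.
red_only2 <- subset(pokemon, red & !blue, select = c(id, name))
```

The comma inside the brackets matters: without it, a data frame is indexed by column rather than by row.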
Consider Psyduck
select * from pokemon where name like 'Psyduck';
image from http://strategywiki.org/wiki/Pok%C3%A9mon_Gold_and_Silver/Ilex_Forest
Consider Psyduck (2)
54;"Psyduck";"Water";"";t;t
Consider Psyduck (R)
pokemon[grep('Psyduck', names), ];
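One caveat worth sketching: LIKE without a wildcard is an exact match, while grep() matches anywhere in the string. The vector `name` below is a made-up example.

```r
# Tiny illustrative vector of names.
name <- c("Psyduck", "Golduck", "Farfetch'd")

# LIKE 'Psyduck' has no wildcard, so the closest R equivalent is ==.
exact <- which(name == "Psyduck")

# LIKE '%duck' is a suffix match; an anchored regular expression is comparable.
suffix <- name[grepl("duck$", name)]

# grep("duck", name) is unanchored, comparable to LIKE '%duck%'.
anywhere <- grep("duck", name)

print(exact)
print(suffix)
```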
What types are least common?
Select type1, Count(type1) as c
from pokemon
group by type1
order by c;
What types are least common? (2)
"Ice";2
"Ghost";3
"Dragon";3
...
What types are least common? (R)
sort(table(type1));
Second Types
select type1, type2, count(type2) as c
from pokemon
where type2 is not null
group by type1, type2 order by type2
Second Types (2)
"Water";"Fighting";1
"Normal";"Flying";8
"Fire";"Flying";1
"Water";"Flying";1
"Rock";"Flying";1
Second Types (R)
table(type1, type2, useNA = "no");  # rows with a missing (NA) type2 are dropped
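A sketch of how the cross-tabulation can be turned back into SQL-style rows, assuming missing second types are stored as NA. The two vectors are toy data, not the real counts.

```r
# Toy data: five Pokemon, one with no second type.
type1 <- c("Water", "Normal", "Normal", "Fire", "Fire")
type2 <- c("Flying", "Flying", "Flying", "Flying", NA)

# table() drops NA by default, which plays the role of "where type2 is not null".
counts <- as.data.frame(table(type1, type2), stringsAsFactors = FALSE)

# Keep only combinations that occur, as GROUP BY would.
counts <- counts[counts$Freq > 0, ]
print(counts)
```

as.data.frame() on a table gives one row per (type1, type2) pair with a Freq column, which is the same shape as the query result.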
Vs Gyarados?
Select attackType, multiplier
from pokemon, pokemonType
where name like 'Gyarados'
and defendType in (type1, type2)
Vs Gyarados (2)
"Fighting";0.5
"Ground";0
"Rock";2
"Bug";0.5
"Fire";0.5
"Water";0.5
"Grass";0.5
"Grass";2
"Electric";2
"Electric";2
"Ice";2
"Ice";0.5
Vs Gyarados (R)
i <- grep("Gyarados", names);
multipliers <- types[, c(type1[i], type2[i])];
multipliers[which(multipliers != 1)];
Vs Gyarados Cont
Select attackType,
round(exp(sum(ln(multiplier+.00000000000001))),3)
from pokemon, pokemonType
where name like 'Gyarados'
and defendType in (type1, type2) group by AttackType
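The exp(sum(ln(...))) expression is a way to multiply within a group when only SUM is available as an aggregate: the sum of logs is the log of the product. A short sketch in R, where `eps` mirrors the small offset in the query and the input vectors are illustrative:

```r
eps <- 1e-14   # same offset as in the query; keeps log() finite when a multiplier is 0

# Electric against Water and against Flying: 2 * 2.
electric <- c(2, 2)
via_logs <- round(exp(sum(log(electric + eps))), 3)
direct   <- prod(electric)

# A zero multiplier: log(0) would be -Inf, the offset makes the result round to 0.
with_zero <- round(exp(sum(log(c(1, 0) + eps))), 3)

print(c(via_logs = via_logs, direct = direct, with_zero = with_zero))
```

In R the detour is unnecessary because prod() exists; the sketch only shows that both routes agree after rounding.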
Vs Gyarados Cont (2)
"Ground";0.000
"Bug";0.500
"Grass";1.000
"Water";0.500
"Ice";1.000
"Rock";2.000
"Fighting";0.500
"Fire";0.500
"Electric";4.000
Vs Gyarados Cont (R)
i <- grep("Gyarados", names);
multipliers <- types[, c(type1[i], type2[i])];
apply(multipliers, 1, prod);  # MARGIN = 1: one product per attack type (row)
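A runnable sketch of this step, assuming `types` is a matrix with attack types as rows and defending types as columns. Only a few rows are filled in here, using the multipliers from the query output; pairs not listed there are assumed to be 1.

```r
# Partial, illustrative type chart: rows attack, columns defend.
types <- rbind(
  Electric = c(Water = 2,   Flying = 2),
  Grass    = c(Water = 2,   Flying = 0.5),
  Ice      = c(Water = 0.5, Flying = 2),
  Ground   = c(Water = 1,   Flying = 0),
  Rock     = c(Water = 1,   Flying = 2)
)

# Gyarados is Water/Flying, so keep those two columns.
multipliers <- types[, c("Water", "Flying")]

# Multiply across each row to get one combined multiplier per attack type.
totals <- apply(multipliers, 1, prod)
print(totals)
```

With this layout the row margin corresponds to the SQL "group by AttackType".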
Vs Gyarados Final
Select o.name,
round(exp(sum(ln(multiplier+.00000000000001))),3) as m
from pokemon p, pokemonType t, pokemon o
where p.name like 'Gyarados'
and defendType in (p.type1, p.type2)
and attackType in (o.type1, o.type2)
group by o.name
order by m desc;
Vs Gyarados Final (2)
"Raichu";4.000
"Electabuzz";4.000
"Jolteon";4.000
"Electrode";4.000
"Zapados";4.000
"Magneton";4.000
"Pikachu";4.000
"Magnemite";4.000
"Voltorb";4.000
"Aerodactyl";2.000
"Bellsprout";1.000
"Bulbasaur";1.000
...
Vs Gyarados Final (R)
i <- grep("Gyarados", names);
multipliers <- types[, c(type1[i], type2[i])];
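totals <- apply(multipliers, 1, prod);  # one product per attack type
m <- totals[type1] * ifelse(is.na(type2), 1, totals[type2]);  # assumes character type vectors
data.frame(names, m)[order(-m), ];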
Conclusions
See the pattern?
SQL:
SELECT (cols) FROM (tables) WHERE (row condition)
R:
Subsetting (Logical, index, multiple index)
grep()
table()
apply()
merge()
See also: sqldf library
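Of the functions listed, merge() is the one that corresponds to a SQL join. A sketch under stated assumptions: both data frames are made up for illustration, with the combined multipliers against Gyarados copied from the earlier output, and attack types missing from `totals` are assumed to be neutral (1).

```r
# Hypothetical attackers in long form: one row per (name, attackType).
attackers <- data.frame(
  name       = c("Pikachu", "Aerodactyl", "Aerodactyl", "Bellsprout", "Bellsprout"),
  attackType = c("Electric", "Rock", "Flying", "Grass", "Poison"),
  stringsAsFactors = FALSE
)

# Combined multipliers against Gyarados for a few attack types.
totals <- data.frame(
  attackType = c("Electric", "Rock", "Grass"),
  multiplier = c(4, 2, 1),
  stringsAsFactors = FALSE
)

# all.x = TRUE makes this a left join, so unlisted attack types survive as NA.
joined <- merge(attackers, totals, by = "attackType", all.x = TRUE)
joined$multiplier[is.na(joined$multiplier)] <- 1   # assumption: unlisted means neutral

# GROUP BY name with a product, then ORDER BY m DESC.
result <- aggregate(multiplier ~ name, data = joined, FUN = prod)
result <- result[order(-result$multiplier), ]
print(result)
```

merge() handles the FROM/WHERE join, aggregate() the GROUP BY, and order() the ORDER BY.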
Questions/Comments
Resources
PostgreSQL: an open source RDBMS
W3Schools SQL tutorial
Wikipedia comparison page
Bulbapedia: everything about Pokemon
Pokemon for Dummies
Log Parser: a Windows utility for running SQL directly against files