5. Why R
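Data science is hot!
The most widely used statistical language
The language most people use to analyze data
# Matrix operations
# Statistical analysis
# Easy to interface with C++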
6. Software used in data analysis competitions in 2011.
Sources: http://r4stats.com/articles/popularity/
http://blog.revolutionanalytics.com/2012/08/r-language-popularity-for-data-mining.html
17. Variables
# R code
# vector
a = c(1, 2, 3, 4)                  # numeric vector
b = c("1", "2", "3", "4")          # string (character) vector
flag = c(TRUE, FALSE, TRUE, TRUE)  # boolean (logical) vector; named flag so it does not mask c()
# matrix
d = matrix(a, nrow = 2, ncol = 2)
dim(a) = c(2, 2)                   # setting dim() turns a itself into the same 2 x 2 matrix
# data.frame
e = data.frame(string = b, boolean = flag)  # columns can store different data types
[Figure: the numeric vector (1, 2, 3, 4); the 2 x 2 numeric matrix with rows (1, 3) and (2, 4); and the data.frame pairing the strings "1" to "4" with the boolean values.]
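
To check what each kind of object actually is, base R offers class(), dim() and sapply(). The following is a minimal sketch; the names num_vec, chr_vec, lgl_vec, num_mat and df are made up for this illustration and do not come from the slides.

```r
# Illustrative names (not from the slides): num_vec, chr_vec, lgl_vec, num_mat, df
num_vec = c(1, 2, 3, 4)
chr_vec = c("1", "2", "3", "4")
lgl_vec = c(TRUE, FALSE, TRUE, TRUE)

num_mat = matrix(num_vec, nrow = 2, ncol = 2)  # filled column by column
df = data.frame(string = chr_vec, boolean = lgl_vec)

class(num_vec)      # type of the vector
dim(num_mat)        # number of rows and columns
num_mat[1, 2]       # element in row 1, column 2
sapply(df, class)   # type of each data.frame column
```

Because a matrix is filled column by column, the element in row 1, column 2 is the third value of the original vector.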
18. Function
A function is like a collection of computations.
In other words, it wraps a series of operations together.
[Diagram: input [1,2,3,4] -> function length (do some computation) -> return 4]
# R code
a = c(1,2,3,4)
result = length(a)
result
19. Function
[Diagram: input [1,2,3,4] -> function mean (do some computation) -> return (output) 2.5]
A function is either built-in or self-defined (package).
# Built-in
data = 1:4
output= mean(data)
data
# Self-defined
MyMean = function(data){
  total = sum(data)       # add up all the values
  len = length(data)      # count the values
  result = total / len
  return(result)
}
data = 1:4
output = MyMean(data)
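
A self-defined function can also take extra arguments with default values. The sketch below is one possible extension of MyMean; the name MyMeanSafe and its na.rm argument are hypothetical (the argument is modelled on the one the built-in mean() accepts).

```r
# MyMeanSafe is a hypothetical extension of MyMean, not part of the slides
MyMeanSafe = function(data, na.rm = FALSE){
  if (na.rm){
    data = data[!is.na(data)]   # drop missing values first
  }
  total = sum(data)
  len = length(data)
  return(total / len)
}

data = c(1, 2, NA, 4)
MyMeanSafe(data)                 # NA, because one value is missing
MyMeanSafe(data, na.rm = TRUE)   # same result as mean(data, na.rm = TRUE)
```

Giving na.rm a default keeps the simple call MyMeanSafe(data) working while letting the caller opt in to the extra behaviour.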
20. Module
A module is like a collection of functions.
[example] data_preprocess.R
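
In R, a file of functions like this is usually loaded with source(). The slides do not show what data_preprocess.R contains, so the sketch below writes a tiny stand-in file to a temporary directory; the function drop_missing is purely hypothetical.

```r
# Write a tiny stand-in for data_preprocess.R to a temporary file.
# The function name drop_missing is hypothetical; the slides do not show the file's contents.
module_path = file.path(tempdir(), "data_preprocess.R")
writeLines(c(
  "drop_missing = function(x){",
  "  x[!is.na(x)]",
  "}"
), module_path)

source(module_path)              # run the file, which defines drop_missing
cleaned = drop_missing(c(1, NA, 3))
cleaned
```

After source() has run, every function defined in the file can be called as if it had been typed into the current script.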
21. Package
You can extend R's built-in functions by installing packages.
A package is like a collection of modules.
22. Package
How do you use packages?
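# R code
x = 1:10        # set the x-axis values
y = sin(3*x)    # set the y-axis values
plot(x,y)       # R's default plotting function
# To draw a nicer-looking plot...
install.packages("ggplot2") # download the ggplot2 package from the official repository (CRAN) to the local machine
# the quotation marks are required
library(ggplot2) # load the package from the local machine into this script
qplot(x,y)      # now qplot, a function already written in ggplot2, can be used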
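
A script that depends on a package may be run on a machine where the package is missing. One way to handle that, sketched here under the assumption that falling back to base graphics is acceptable, is to test for the package with requireNamespace() and call its functions through the pkg::name form. The fallback logic is an illustration, not part of the slides.

```r
x = 1:10
y = sin(3 * x)

# requireNamespace() returns TRUE or FALSE without stopping the script
has_ggplot2 = requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2){
  # pkg::name calls a function without attaching the whole package
  p = ggplot2::ggplot(data.frame(x = x, y = y), ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_point()
  print(p)
}else{
  plot(x, y)   # fall back to base graphics
}
```

Unlike library(), which stops with an error when the package is not installed, this check lets the script decide what to do next.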
24. Flow Control
#1 if
if (expression){
statement
}
# R code
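data = rnorm(100) # draw 100 samples from the standard normal distribution
mu = mean(data)
mu > 0
if ( mu > 0 ){
  print("mean is greater than 0")
}else{
  print("mean is not greater than 0")
}
If the expression evaluates to TRUE, the statement inside the braces is executed; otherwise it is skipped.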
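
When there are more than two cases, if can be chained with else if. A minimal sketch follows; the helper name describe_sign is hypothetical. Note that the expression inside if ( ) should be a single TRUE or FALSE value, not a whole vector.

```r
# describe_sign is a hypothetical helper, not from the slides
describe_sign = function(value){
  if (value > 0){
    result = "positive"
  }else if (value < 0){
    result = "negative"
  }else{
    result = "zero"
  }
  return(result)
}

describe_sign(2.5)
describe_sign(-1)
describe_sign(0)
```

The branches are tested from top to bottom, and only the first one whose expression is TRUE runs.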
25. Flow Control
#2 while
while (expression){
statement
}
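# R code
data = rnorm(100) # draw 100 samples from the standard normal distribution
mu = mean(data)
mu > 0
while (mu > 0){
  print("mean is greater than 0")
  # mu = 0
}
# If mu > 0 this is an infinite loop; try uncommenting the line inside the while
As long as the expression evaluates to TRUE, the statement inside the braces is executed again; otherwise the loop stops.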
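
A while loop only ends if something inside the braces can change the expression being tested. The sketch below shows one way to guarantee that: it redraws the sample on every pass and also caps the number of passes. The seed and the max_tries cap are arbitrary choices made for this illustration.

```r
set.seed(1)          # arbitrary seed so the run is repeatable
max_tries = 20       # illustrative safety cap, not from the slides
tries = 1
mu = mean(rnorm(100))

while (mu > 0 && tries < max_tries){
  mu = mean(rnorm(100))   # draw a new sample, so the condition can change
  tries = tries + 1
}

tries
mu
```

The loop stops as soon as a sample mean is not greater than 0, or after max_tries samples, whichever comes first.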
26. Flow Control
#3 for
for (i in 1:3){
  statement (i)
}
# R code
for (i in 1:3){
  data = rnorm(i)
  print(data)
}
When i = 1, the statement runs once
When i = 2, it runs once more
When i = 3, it runs once more
Then the loop ends
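
A for loop is often used to collect one result per pass rather than print it. The following is a minimal sketch, assuming the goal is to keep the mean of each sample; the names n and sample_means are made up for this illustration.

```r
n = 3
sample_means = numeric(n)        # preallocate one slot per iteration

for (i in seq_len(n)){
  data = rnorm(i)                # i samples on the i-th pass
  sample_means[i] = mean(data)   # store the result instead of printing it
}

sample_means
length(sample_means)
```

Creating the result vector before the loop avoids growing it one element at a time, and seq_len(n) behaves sensibly even when n is 0.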