SHARETHIS
DATA ANALYSIS with R
Hassan Namarvar
2
WHAT IS R?
• R is a free programming language and software environment
for statistical computing and graphics.
• It is similar to the S language, developed at AT&T Bell Labs by
Rick Becker, John Chambers, and Allan Wilks.
• R was initially developed by Ross Ihaka and Robert Gentleman
(1996) at the University of Auckland, New Zealand.
• R source code is written in C, Fortran, and R.
3
R PARADIGMS
Multi-paradigm:
– Array
– Object-oriented
– Imperative
– Functional
– Procedural
– Reflective
4
STATISTICAL FEATURES
• Graphical Techniques
• Linear and nonlinear modeling
• Classical statistical tests
• Time-series analysis
• Classification
• Clustering
• Machine learning
5
PROGRAMMING FEATURES
• R is an interpreted language
• Access R through a command-line interpreter
• Like MATLAB, R supports matrix arithmetic
• Data structures:
– Vectors
– Matrices
– Arrays
– Data Frames
– Lists
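The five data structures listed above can be sketched in a few lines. This is only an illustration: the variable names and values are made up for the example and do not come from the slides.

```r
# Vector: an ordered collection of values of a single type
v <- c(4, 7, 23.5, 76.2, 80)

# Matrix: a two-dimensional vector (filled column by column by default)
m <- matrix(1:6, nrow = 2, ncol = 3)

# Array: the generalization of a matrix to any number of dimensions
a <- array(1:24, dim = c(2, 3, 4))

# Data frame: a table whose columns may have different types
df <- data.frame(id = 1:3, label = c("aa", "bb", "cc"))

# List: an ordered collection of arbitrary objects
l <- list(numbers = v, table = df, flag = TRUE)

# Matrix arithmetic: %*% is the matrix product, * is element-wise
p <- m %*% t(m)   # 2 x 2 result

str(l)            # compact overview of the list's structure
```

The matrix product at the end is the kind of MATLAB-style arithmetic the slide refers to.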
6
ADVANTAGES OF R
• One of the most comprehensive statistical analysis packages
available.
• Outstanding graphical capabilities
• Open source software – reviewed by experts
• R is free and licensed under the GNU General Public License (GPL).
• R had 5,578 packages available as of May 31, 2014.
• R is cross-platform. GNU/Linux, Mac, Windows.
• R plays well with CSV, SAS, SPSS, Excel, Access, Oracle, MySQL,
and SQLite.
7
HOW TO INSTALL R?
• Download and install the latest version from:
– http://cran.r-project.org
• Install packages from R Console:
– > install.packages('package_name')
• R has its own LaTeX-like documentation:
– > help()
8
STARTING WITH R
• In R console:
– > x <- 2
– > x
– > y <- x^2
– > y
– > ls()
– > rm(y)
• Vectors:
– > v <- c(4, 7, 23.5, 76.2, 80)
– > summary(v)
9
STARTING WITH R
• Histogram:
– > r <- rnorm(100)
– > summary(r)
– > plot(r)
– > hist(r)
• QQ-Plot (Quantile):
– > qqplot(r, rnorm(1000))
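As a variation on the quantile plot above, a sample can be compared against the theoretical normal quantiles instead of a second random sample. This is a minimal sketch using base R's qqnorm() and qqline(); the seed and the variable name qq are arbitrary choices for the example.

```r
set.seed(1)                 # arbitrary seed so the sketch is reproducible
r <- rnorm(100)

# qqnorm() plots sample quantiles against theoretical normal quantiles;
# qqline() adds a reference line through the first and third quartiles.
qq <- qqnorm(r, main = "Normal Q-Q plot of r")
qqline(r, col = "red")

# The returned list holds the plotted coordinates:
# qq$x = theoretical quantiles, qq$y = the sample values.
head(cbind(theoretical = qq$x, sample = qq$y))
```

Points lying close to the reference line suggest the sample is consistent with a normal distribution.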
10
STARTING WITH R
• Factors:
– > g <- c('f', 'm', 'm', 'm', 'f', 'm', 'f', 'm')
– > h <- factor(g)
– > table(g)
• Matrices:
– > r <- rnorm(100)
– > dim(r) <- c(50,2)
– > r
– > summary(r)
– > M <- matrix(c(45, 23, 66, 77, 33, 44), 2, 3, byrow=TRUE)
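The factor and matrix commands above can be explored a little further. The sketch below reuses the slide's values for g and M; the names counts and r and the small reshaping example are added here for illustration.

```r
g <- c("f", "m", "m", "m", "f", "m", "f", "m")
h <- factor(g)

levels(h)            # distinct categories, sorted alphabetically
counts <- table(h)   # frequency of each level

# matrix(): byrow = TRUE fills the values row by row
M <- matrix(c(45, 23, 66, 77, 33, 44), nrow = 2, ncol = 3, byrow = TRUE)
M[2, 1]              # element in row 2, column 1
t(M)                 # transpose: 3 x 2

# Assigning dim() reshapes a vector in place, filling column by column
r <- 1:6
dim(r) <- c(3, 2)
r
```

Note the difference in fill order: matrix() with byrow = TRUE fills rows first, while assigning dim() always fills columns first.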
11
STARTING WITH R
• Data Frames:
– > n = c(2, 3, 5)
– > s = c("aa", "bb", "cc")
– > b = c(TRUE, FALSE, TRUE)
– > df = data.frame(n, s, b)
• Built-in Data Set:
– > state.x77
– > st = as.data.frame(state.x77)
– > st$Density = st$Population * 1000 / st$Area
– > summary(st)
– > cor(st)
– > pairs(st)
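Once the built-in data set is a data frame, rows and columns can be filtered and sorted directly. The following sketch assumes only the state.x77 data shipped with R; the threshold of 200 and the names dense and r_im are arbitrary choices for the example.

```r
st <- as.data.frame(state.x77)

# Population is recorded in thousands, hence the factor of 1000
st$Density <- st$Population * 1000 / st$Area

# Rows are selected with a logical condition, columns by name
dense <- st[st$Density > 200, c("Population", "Area", "Density")]

# Sort by density, highest first
dense[order(-dense$Density), ]

# Correlation between two individual columns
r_im <- cor(st$Illiteracy, st$Murder)
r_im
```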
12
STARTING WITH R
[Figure: scatterplot matrix produced by pairs(st), with panels for Population, Income, Illiteracy, Life Exp, Murder, HS Grad, Frost, Area, and Density]
13
LINEAR REGRESSION MODEL IN R
• Linear Regression Model:
– > x <- 1:100
– > y <- x^3
– Model: y = a + b·x
– > lm(y ~ x)
– > model <- lm(y ~ x)
– > summary(model)
– > par(mfrow=c(2,2))
– > plot(model)
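Because y is cubic in x by construction, a straight line can only approximate it. The sketch below fits the same linear model and, as an added comparison that is not part of the slides, a degree-3 polynomial via poly(); the names linear and cubic are chosen for the example.

```r
x <- 1:100
y <- x^3

linear <- lm(y ~ x)
coef(linear)                 # intercept a and slope b
summary(linear)$r.squared    # share of variance explained by the line

# The data are cubic by construction, so a straight line is misspecified.
# Adding polynomial terms up to degree 3 removes the systematic residuals.
cubic <- lm(y ~ poly(x, 3))
max(abs(resid(cubic)))       # numerically close to zero

# Overlay both fits on the data
plot(x, y, main = "y = x^3")
abline(linear, col = "red")
lines(x, fitted(cubic), col = "blue", lty = 2)
```

The linear fit still reports a fairly high R-squared, which is a reminder that R-squared alone does not show whether the model form is appropriate.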
14
LM MODEL
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-129827 -103680  -29649   85058  292030

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -207070.2    23299.3  -8.887 3.14e-14 ***
x              9150.4      400.6  22.844  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 115600 on 98 degrees of freedom
Multiple R-squared:  0.8419, Adjusted R-squared:  0.8403
F-statistic: 521.9 on 1 and 98 DF,  p-value: < 2.2e-16
15
LM MODEL
[Figure: plot of y = x^3 against x, with x from 0 to 100 and y from 0 to 1e+06]
16
DIAGNOSIS PLOT
[Figure: the four diagnostic panels from plot(model): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Observations 98, 99, and 100 are labeled in each panel.]
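The quantities behind the four diagnostic panels can also be extracted numerically. This is a sketch using base R accessor functions on the same y = x^3 model; the variable names are chosen for the example.

```r
x <- 1:100
y <- x^3
model <- lm(y ~ x)

res  <- resid(model)           # raw residuals
rstd <- rstandard(model)       # standardized residuals (Q-Q, Scale-Location)
lev  <- hatvalues(model)       # leverage of each observation
cook <- cooks.distance(model)  # influence measure (Residuals vs Leverage)

# The three observations with the largest absolute residuals are the
# ones labeled in the Residuals vs Fitted panel
worst <- order(abs(res), decreasing = TRUE)[1:3]
worst

# A curved pattern in residuals against fitted values signals a missing
# nonlinear term: here the residuals are positive at both ends of the
# range and negative in the middle
sign(res[c(1, 50, 100)])
```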
17
LINEAR REGRESSION MODEL IN R
• Model Built-in Data:
– > colnames(st)[4] = "Life.Exp"
– > colnames(st)[6] = "HS.Grad"
– > model1 = lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area + Density, data=st)
– > summary(model1)
– > model2 <- step(model1)
– > model3 = update(model2, .~.-Population)
– > summary(model3)
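The model-selection steps above can be run end to end as follows. This is a self-contained sketch: it rebuilds st from state.x77, renames columns by name instead of by position, and uses the names full, reduced, and final in place of model1 to model3. Which terms step() keeps depends on the data, so the selected formula is printed and not assumed.

```r
st <- as.data.frame(state.x77)
st$Density <- st$Population * 1000 / st$Area
names(st)[names(st) == "Life Exp"] <- "Life.Exp"   # syntactic names for formulas
names(st)[names(st) == "HS Grad"]  <- "HS.Grad"

full <- lm(Life.Exp ~ ., data = st)     # "." means all other columns

# step() repeatedly drops (or re-adds) terms to lower the AIC;
# trace = 0 suppresses the per-step log
reduced <- step(full, trace = 0)
formula(reduced)

# update() refits with a modified formula: ". ~ ." keeps both sides as
# they are, and "- Population" removes that term if it is present
final <- update(reduced, . ~ . - Population)
AIC(full, reduced, final)
```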
18
LINEAR REGRESSION MODEL IN R
• Confidence limits on Estimated Coefficients:
– > confint(model3)
– > predict(model3, list(Murder=10.5, HS.Grad=48, Frost=100))
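Both confint() and predict() accept options that make the uncertainty explicit. The sketch below assumes a stand-in for model3 that uses exactly the three predictors named in the predict() call; the actual model3 depends on what step() selected, so the name fit and its formula are assumptions of this example.

```r
st <- as.data.frame(state.x77)
names(st)[names(st) == "Life Exp"] <- "Life.Exp"
names(st)[names(st) == "HS Grad"]  <- "HS.Grad"

# Assumed stand-in for model3: the three predictors used in predict()
fit <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = st)

# 95% confidence intervals for each coefficient (level is adjustable)
ci <- confint(fit, level = 0.95)
ci

# New observations go in a data frame whose column names match the predictors
new_state <- data.frame(Murder = 10.5, HS.Grad = 48, Frost = 100)

# interval = "confidence" bounds the mean response;
# interval = "prediction" bounds a single new observation and is wider
conf <- predict(fit, new_state, interval = "confidence")
pred <- predict(fit, new_state, interval = "prediction")
rbind(conf, pred)
```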
19
OUTLIERS
• Boxplot:
– > v <- rnorm(100)
– > v = c(v,10)
– > boxplot(v)
– > rug(jitter(v), side=2)
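The points a boxplot draws as outliers can also be listed without plotting. This is a minimal sketch using base R's boxplot.stats(); the seed and the name stats are arbitrary choices for the example.

```r
set.seed(42)               # arbitrary seed for reproducibility
v <- c(rnorm(100), 10)     # 100 standard-normal draws plus one planted outlier

# boxplot.stats() applies the same rule boxplot() draws: points more than
# 1.5 times the interquartile range beyond the hinges are reported in $out
stats <- boxplot.stats(v)
stats$out

# Positions of the flagged values in the original vector
which(v %in% stats$out)
```

The planted value 10 sits at position 101 and is always flagged; a few ordinary draws from the normal tails may be flagged as well.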
20
PROBABILITY DENSITY FUNCTION
• PDF:
– > r <- rnorm(1000)
– > hist(r, prob=TRUE)
– > lines(density(r), col="red")
[Figure: "Histogram of r", showing density on the vertical axis with the estimated density curve overlaid in red]
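The object returned by density() can be inspected directly. The sketch below checks that the estimated curve integrates to roughly 1 and overlays the true standard-normal density for comparison; the seed, the name area, and the dnorm() overlay are additions for this example.

```r
set.seed(7)                     # arbitrary seed for reproducibility
r <- rnorm(1000)

d <- density(r)                 # Gaussian kernel estimate on a grid (d$x, d$y)

# A density integrates to 1; check with the trapezoidal rule on the grid
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area

# Compare the estimate with the true standard-normal density
hist(r, freq = FALSE)
lines(d, col = "red")
curve(dnorm(x), add = TRUE, col = "blue", lty = 2)
```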
21
CASE STUDY: SHARETHIS EXAMPLE
• Relationship of clicks to winning price and impressions on
ADX:
• Data
– Analyzed ADX Hourly Impression Logs
• Method
– Detected outliers
– Predicted clicks using a regression tree model
22
CASE STUDY: SHARETHIS EXAMPLE
• Outlier Detection:
[Figure: outlier-detection plots for Clicks and Impressions]
23
CASE STUDY: SHARETHIS EXAMPLE
• Regression Tree
– One of the most powerful classification/regression methods
– > library(rpart)
– > fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data=x)
– > plot(fit)
– > text(fit)
– > plot(predict(fit), log(x$CLK))
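The hourly logs used in the case study are not available here, so the sketch below runs the same rpart() call on synthetic data. The column names follow the slide, but every value and the relationship between clicks and impressions are invented for illustration.

```r
library(rpart)   # ships with standard R distributions as a recommended package

# Synthetic stand-in for the hourly logs; the column names follow the slide,
# the values and their relationships are invented for illustration
set.seed(123)
n <- 500
x <- data.frame(
  IMP       = round(exp(runif(n, 7, 13))),   # impressions per hour
  AVG_PRICE = runif(n, 0.5, 3),              # mean winning price
  SD_PRICE  = runif(n, 0, 1)                 # spread of winning price
)
# Clicks grow with impressions; adding 1 keeps log(CLK) finite
x$CLK <- rpois(n, lambda = x$IMP * 0.0005) + 1

fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data = x)

printcp(fit)                 # complexity table: error after each split
pred <- predict(fit)         # fitted log(CLK), one value per row

# Root-mean-square error on the log scale
rmse <- sqrt(mean((pred - log(x$CLK))^2))
rmse
```

Because the error is measured on the training rows, it is optimistic; a held-out set or the cross-validated error in the printcp() table gives a fairer picture.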
24
CASE STUDY: SHARETHIS EXAMPLE
• Regression Tree
[Figure: fitted regression tree. The root splits on log(IMP) < 9.33; further splits use log(IMP), SD_PRICE, and AVG_PRICE; the leaf predictions of log(CLK) range from 0.751 to 4.753.]
25
CASE STUDY: SHARETHIS EXAMPLE
• Predict Log of Clicks
[Figure: scatter plot of predict(fit) against log(x$CLK)]
26
CASE STUDY: COLOR DETECTION
• Detect color from product image:
[Figure: three panels, each with both axes running from -1.0 to 1.0]
27
RESOURCES
• Books:
– An Introduction to Statistical Learning: with Applications in R, G. James, D. Witten, T. Hastie, R. Tibshirani, 2013
– The Art of R Programming: A Tour of Statistical Software Design, N. Matloff, 2011
– R Cookbook (O'Reilly Cookbooks), P. Teetor, 2011
• R Blog:
– http://www.r-bloggers.com

In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 

Data analysis with R

  • 1. SHARETHIS DATA ANALYSIS with R Hassan Namarvar
  • 2. WHAT IS R? • R is a free programming language and software environment for statistical computing and graphics. • It is similar to the S language, developed at AT&T Bell Labs by Rick Becker, John Chambers and Allan Wilks. • R was initially developed by Ross Ihaka and Robert Gentleman (1996) at the University of Auckland, New Zealand. • The R source code is written in C, Fortran, and R.
  • 3. R PARADIGMS • Multi-paradigm: – Array – Object-oriented – Imperative – Functional – Procedural – Reflective
  • 4. STATISTICAL FEATURES • Graphical techniques • Linear and nonlinear modeling • Classical statistical tests • Time-series analysis • Classification • Clustering • Machine learning
  • 5. PROGRAMMING FEATURES • R is an interpreted language • Access R through a command-line interpreter • Like MATLAB, R supports matrix arithmetic • Data structures: – Vectors – Matrices – Arrays – Data frames – Lists
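The matrix arithmetic and data structures listed on slide 5 can be sketched in a few lines. This is only an illustration; the object names (`A`, `B`, `arr`, `lst`) are made up for the example and do not come from the slides.

```r
# Matrix arithmetic: `*` is element-wise, `%*%` is the matrix product
A <- matrix(1:4, nrow = 2)        # filled column-wise: rows are (1, 3) and (2, 4)
B <- diag(2)                      # 2 x 2 identity matrix
elementwise <- A * A              # squares each entry
product     <- A %*% B            # multiplying by the identity returns A
roundtrip   <- solve(A) %*% A     # inverse times A: identity up to rounding

# Other core data structures
arr <- array(1:24, dim = c(2, 3, 4))              # 3-dimensional array
lst <- list(name = "a", values = c(1, 2, 3))      # list with mixed element types

print(product)
print(dim(arr))
print(lst$values)
```

Lists may hold elements of different types, whereas vectors, matrices and arrays hold a single type.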
  • 6. ADVANTAGES OF R • One of the most comprehensive statistical analysis packages available. • Outstanding graphical capabilities. • Open-source software, reviewed by experts. • R is free and licensed under the GNU General Public License (GPL). • R had 5,578 packages available as of May 31, 2014. • R is cross-platform: GNU/Linux, Mac, Windows. • R works well with CSV, SAS, SPSS, Excel, Access, Oracle, MySQL, and SQLite.
  • 7. HOW TO INSTALL R? • Download and install the latest version from: – http://cran.r-project.org • Install packages from the R console: – > install.packages('package_name') • R has its own LaTeX-like documentation format (Rd); open the help system with: – > help()
  • 8. STARTING WITH R • In the R console: – > x <- 2 – > x – > y <- x^2 – > y – > ls() – > rm(y) • Vectors: – > v <- c(4, 7, 23.5, 76.2, 80) – > summary(v)
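The console session on slide 8 can be run as a script roughly as follows. R is case-sensitive, so the function is `summary()`, not `Summary()`. The variable `s` is introduced here only to hold the result.

```r
# Scalars: assign, inspect, list and remove
x <- 2
y <- x^2           # y is 4
print(y)
print(ls())        # names of the objects currently defined
rm(y)              # y no longer exists after this call

# Vectors: c() combines values; summary() gives a six-number summary
v <- c(4, 7, 23.5, 76.2, 80)
s <- summary(v)    # Min, 1st Qu., Median, Mean, 3rd Qu., Max
print(s)
```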
  • 9. STARTING WITH R • Histogram: – > r <- rnorm(100) – > summary(r) – > plot(r) – > hist(r) • Q-Q plot (quantile-quantile): – > qqplot(r, rnorm(1000))
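A reproducible variant of the slide 9 commands might look like the sketch below. The fixed seed and the use of `qqnorm()`/`qqline()` are additions for illustration: they compare the sample against theoretical normal quantiles, whereas the slide's `qqplot(r, rnorm(1000))` compares it against a second random sample.

```r
set.seed(42)              # assumption: fixed seed so the run is repeatable
r <- rnorm(100)           # 100 draws from a standard normal distribution
print(summary(r))

h <- hist(r)              # draws the histogram and returns the bin counts
print(sum(h$counts))      # every observation falls into exactly one bin

qqplot(r, rnorm(1000))    # sample-against-sample Q-Q plot, as on the slide
qqnorm(r)                 # sample against theoretical normal quantiles
qqline(r)                 # reference line through the first and third quartiles
```

Points that stay close to the reference line suggest the sample is approximately normal.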
  • 10. STARTING WITH R • Factors: – > g <- c('f', 'm', 'm', 'm', 'f', 'm', 'f', 'm') – > h <- factor(g) – > table(h) • Matrices: – > r <- rnorm(100) – > dim(r) <- c(50, 2) – > r – > summary(r) – > M <- matrix(c(45, 23, 66, 77, 33, 44), 2, 3, byrow=TRUE)
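The factor and matrix commands on slide 10 can be checked with a short script such as this one. The variable `counts` is a name introduced for the example.

```r
# Factors: categorical data with a fixed set of levels
g <- c("f", "m", "m", "m", "f", "m", "f", "m")
h <- factor(g)
print(levels(h))       # the distinct categories, sorted
counts <- table(h)     # frequency of each level: f appears 3 times, m 5 times
print(counts)

# Matrices: byrow = TRUE fills row by row instead of column by column
M <- matrix(c(45, 23, 66, 77, 33, 44), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
print(dim(M))          # 2 rows, 3 columns
print(M[2, 1])         # first element of the second row: 77
print(t(M))            # transpose: 3 rows, 2 columns
```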
  • 11. STARTING WITH R • Data frames: – > n = c(2, 3, 5) – > s = c("aa", "bb", "cc") – > b = c(TRUE, FALSE, TRUE) – > df = data.frame(n, s, b) • Built-in data set: – > state.x77 – > st = as.data.frame(state.x77) – > st$Density = st$Population * 1000 / st$Area – > summary(st) – > cor(st) – > pairs(st)
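The data-frame steps on slide 11 can be sketched as below. In `state.x77`, `Population` is recorded in thousands and `Area` in square miles, so the derived `Density` column is people per square mile. The helper name `densest` is introduced only for this example.

```r
# Build a small data frame from three equal-length vectors
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)
str(df)                              # one column per vector, one row per element

# Built-in data set: convert the matrix to a data frame and add a column
st <- as.data.frame(state.x77)
st$Density <- st$Population * 1000 / st$Area

# Three most densely populated states
densest <- head(st[order(-st$Density), c("Population", "Area", "Density")], 3)
print(densest)

# Pairwise correlations between all numeric columns
print(round(cor(st), 2))
```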
  • 12. STARTING WITH R [Figure: scatterplot matrix produced by pairs(st), with panels for Population, Income, Illiteracy, Life Exp, Murder, HS Grad, Frost, Area and Density]
  • 13. LINEAR REGRESSION MODEL IN R • Linear regression model: – > x <- 1:100 – > y <- x^3 – Model: y = a + b·x – > lm(y ~ x) – > model <- lm(y ~ x) – > summary(model) – > par(mfrow=c(2,2)) – > plot(model)
  • 14. LM MODEL
      Call:
      lm(formula = y ~ x)

      Residuals:
          Min      1Q  Median      3Q     Max
      -129827 -103680  -29649   85058  292030

      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) -207070.2    23299.3  -8.887 3.14e-14 ***
      x              9150.4      400.6  22.844  < 2e-16 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Residual standard error: 115600 on 98 degrees of freedom
      Multiple R-squared:  0.8419,  Adjusted R-squared:  0.8403
      F-statistic: 521.9 on 1 and 98 DF,  p-value: < 2.2e-16
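The fit from slides 13 and 14 can be reproduced with the sketch below, which pulls individual numbers out of the model object instead of reading them off the printed summary. The names `slope`, `intercept` and `r2` are introduced for the example.

```r
x <- 1:100
y <- x^3                           # deliberately nonlinear response
model <- lm(y ~ x)                 # straight-line fit: y = a + b * x

intercept <- coef(model)[["(Intercept)"]]   # about -207070.2, as on slide 14
slope     <- coef(model)[["x"]]             # about 9150.4
r2        <- summary(model)$r.squared       # about 0.8419
print(c(intercept = intercept, slope = slope, r.squared = r2))

# The residuals are large and systematic because the true relation is cubic
print(range(residuals(model)))

# Four standard diagnostic plots on one page
par(mfrow = c(2, 2))
plot(model)
```

A high R-squared does not by itself show that a straight line is the right model here; the curved pattern in the residuals is the clearer signal.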
  • 15. LM MODEL [Figure: plot of y = x^3 against x, for x from 0 to 100]
  • 16. DIAGNOSIS PLOT [Figure: the four panels from plot(model): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance); observations 98, 99 and 100 are labeled in each panel]
  • 17. LINEAR REGRESSION MODEL IN R • Model built-in data: – > colnames(st)[4] = "Life.Exp" – > colnames(st)[6] = "HS.Grad" – > model1 = lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area + Density, data=st) – > summary(model1) – > model2 <- step(model1) – > model3 = update(model2, .~.-Population) – > summary(model3)
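The model-selection steps on slide 17 can be run end to end as sketched here. The script rebuilds `st` from `state.x77` so that it stands alone; `trace = 0` is added only to silence the step-by-step log that `step()` prints by default.

```r
# Rebuild the data frame used on the earlier slides
st <- as.data.frame(state.x77)
st$Density <- st$Population * 1000 / st$Area
colnames(st)[4] <- "Life.Exp"      # was "Life Exp"; a syntactic name is easier in formulas
colnames(st)[6] <- "HS.Grad"       # was "HS Grad"

# Full model with every predictor
model1 <- lm(Life.Exp ~ Population + Income + Illiteracy + Murder +
               HS.Grad + Frost + Area + Density, data = st)

# Stepwise selection by AIC
model2 <- step(model1, trace = 0)
print(formula(model2))

# Drop Population from whatever step() kept
model3 <- update(model2, . ~ . - Population)
print(summary(model3))
```

`step()` searches by AIC, so the terms it keeps are not guaranteed to all be individually significant; that is why the slide removes `Population` by hand afterwards.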
  • 18. LINEAR REGRESSION MODEL IN R • Confidence limits on estimated coefficients: – > confint(model3) • Prediction for new data: – > predict(model3, list(Murder=10.5, HS.Grad=48, Frost=100))
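A standalone sketch of the confidence-interval and prediction calls on slide 18 follows. It assumes that `model3` contains the three predictors named in the slide's `predict()` call (Murder, HS.Grad, Frost) and fits that model directly; `new_state`, `ci` and `pred` are names introduced for the example.

```r
st <- as.data.frame(state.x77)
colnames(st)[4] <- "Life.Exp"
colnames(st)[6] <- "HS.Grad"

# Assumption: model3 has exactly these three predictors
model3 <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = st)

# 95% confidence limits for each coefficient (lower and upper bound per row)
ci <- confint(model3, level = 0.95)
print(ci)

# Predicted life expectancy for a hypothetical state, with a confidence interval
new_state <- data.frame(Murder = 10.5, HS.Grad = 48, Frost = 100)
pred <- predict(model3, newdata = new_state, interval = "confidence")
print(pred)                        # columns: fit, lwr, upr
```

Passing `interval = "prediction"` instead would give the wider interval for a single new observation.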
  • 19. OUTLIERS • Boxplot: – > v <- rnorm(100) – > v = c(v, 10) – > boxplot(v) – > rug(jitter(v), side=2) [Figure: boxplot of v with the outlier at 10]
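The boxplot on slide 19 flags points beyond 1.5 times the interquartile range from the box. The sketch below draws the same plot and also extracts the flagged values numerically with `boxplot.stats()`; the seed and the name `out` are additions for the example.

```r
set.seed(1)                   # assumption: fixed seed for a repeatable sample
v <- c(rnorm(100), 10)        # 100 standard normal draws plus one planted outlier

boxplot(v)                    # outliers are drawn as individual points
rug(jitter(v), side = 2)      # tick marks along the left axis show every observation

# Same rule, numerically: values beyond the whiskers
out <- boxplot.stats(v)$out
print(out)
```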
  • 20. PROBABILITY DENSITY FUNCTION • PDF: – > r <- rnorm(1000) – > hist(r, prob=TRUE) – > lines(density(r), col="red") [Figure: histogram of r on a density scale with the kernel density estimate overlaid in red]
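The density overlay on slide 20 can be extended slightly to compare the kernel estimate with the true normal curve. The dashed reference curve and the area check are additions for illustration, not part of the slide.

```r
set.seed(1)                           # assumption: fixed seed for repeatability
r <- rnorm(1000)

hist(r, freq = FALSE)                 # density scale, so the bar areas sum to 1
d <- density(r)                       # Gaussian kernel density estimate
lines(d, col = "red")
curve(dnorm(x), add = TRUE, lty = 2)  # true standard normal density, dashed

# Sanity check: a density should integrate to roughly 1
area <- sum(d$y) * diff(d$x[1:2])     # Riemann sum over the evaluation grid
print(area)
```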
  • 21. CASE STUDY: SHARETHIS EXAMPLE • Relationship of clicks to winning price and impressions on ADX: • Data – Analyzed ADX hourly impression logs • Method – Detected outliers – Predicted clicks using a regression tree model
  • 22. CASE STUDY: SHARETHIS EXAMPLE • Outlier detection: [Figure: plot of Clicks and Impressions]
  • 23. CASE STUDY: SHARETHIS EXAMPLE • Regression tree – One of the most powerful classification/regression methods – > library(rpart) – > fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data=x) – > plot(fit) – > text(fit) – > plot(predict(fit), log(x$CLK))
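The ShareThis impression logs behind slide 23 are not included in the deck, so the sketch below runs the same `rpart()` call on synthetic data. Everything about the data is an assumption made for illustration: the column names follow the slide, but the value ranges and the relationship between clicks, impressions and price are invented.

```r
library(rpart)

# Synthetic stand-in for the hourly log data (all values are invented)
set.seed(7)
n <- 500
IMP       <- round(exp(runif(n, 7, 13)))   # impressions per hour
AVG_PRICE <- runif(n, 0.5, 2.5)            # average winning price
SD_PRICE  <- runif(n, 0, 0.5)              # price standard deviation
# Hypothetical click process; the +1 keeps log(CLK) finite
CLK <- rpois(n, lambda = IMP * 0.0005 / AVG_PRICE) + 1
x <- data.frame(CLK, IMP, AVG_PRICE, SD_PRICE)

# Regression tree with the same formula as on the slide
fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data = x)
printcp(fit)                       # complexity table: error at each tree size

plot(fit)                          # draw the tree skeleton
text(fit)                          # add split rules and leaf means

pred <- predict(fit)               # fitted log(clicks) for the training rows
plot(log(x$CLK), pred)             # predictions form horizontal bands, one per leaf
print(cor(pred, log(x$CLK)))
```

Because a tree predicts a single value per leaf, the prediction plot shows horizontal bands, one for each leaf.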
  • 24. CASE STUDY: SHARETHIS EXAMPLE • Regression tree [Figure: fitted regression tree with root split log(IMP) < 9.33 and further splits on log(IMP), SD_PRICE and AVG_PRICE; the eleven leaf values range from 0.751 to 4.753]
  • 25. CASE STUDY: SHARETHIS EXAMPLE • Predicted log of clicks [Figure: scatter plot of predict(fit) against log(x$CLK)]
  • 26. CASE STUDY: COLOR DETECTION • Detect color from product image: [Figure: three panels, each with both axes running from -1.0 to 1.0]
  • 27. RESOURCES • Books: – An Introduction to Statistical Learning: with Applications in R, G. James, D. Witten, T. Hastie, R. Tibshirani, 2013 – The Art of R Programming: A Tour of Statistical Software Design, N. Matloff, 2011 – R Cookbook (O'Reilly Cookbooks), P. Teetor, 2011 • R blog: – http://www.r-bloggers.com

Editor's notes

  1. Client Interview Position the upcoming as introductory and a launching pad for further exploration To get started, want to share a brief video that’s been helpful for our partners …