Pre-Placement Workshop 
in R and Analytics 
Delhi School of Economics 2014 
Ajay Ohri
Hi, I am Ajay Ohri
Agenda 
• Try and learn R in 12 hours 
• Get an introduction to Analytics 
• Be better skilled for Analytics as a career (?)
Training Plan 
• DAY 1 
– Session 1 -2.5 hours 
– Session 2 -3.5 hours 
• DAY 2 
– Session 1-2.5 hours 
– Session 2 -3.5 hours
Instructor 
• Author of R for Business Analytics 
• Author of R for Cloud Computing (An Approach for Data Scientists) 
• 10+ years in Analytics and 6+ years in R 
• Founder, Decisionstats.com
The Audience 
Breakup – Demographics and Background
Expectations from each other 
• From Instructor 
– Your turn to speak
Expectations from each other 
• From Instructor 
• From Audience 
– Kindly switch off mobile phones 
• Yes, this includes WhatsApp 
– Ask Questions at end of session 
– Take Notes
Day 1 Session 1 
– Introductions 
• Introduction to Analytics 
• Introduction to R 
• Interfaces in R 
– Demos in R (Maths, Objects, etc.) 
• Break 1- 
– Installation, Trouble Shooting, Questions
Day 1 Session 2 
– Recap 
• Input of Data 
• Inspecting Data Quality 
• Investigating Data Issues 
– Demos in R 
• Data Input, 
• Data Quality, 
• Data Exploration 
• Break 2- 
– Questions
Day 2 Session 1 
– Revision 
• Exploring Data 
• Manipulating Data 
• Visualization of Data 
• Demos in R 
• Data Exploration, 
• Data Manipulation, 
• Data Visualizations 
• Break 1 
– Questions
Day 2 Session 2 
– Recap 
• Data Mining 
• Regression Models 
• Advanced Topics 
• Demos in R 
• Data Mining, 
• Model Building, 
• Advanced Topics 
• Summary and Conclusion 
• Break 2 
– Questions
Analytics 
• What is analytics? 
• Where is it used? 
• How is it used? 
• What are some good practices?
Analytics 
• What is analytics? – Study of data for helping 
with decision making using software 
• Where is it used? 
• How is it used? 
• What are some good practices?
Analytics 
• What is analytics? 
• Where is it used? – Industries (like Pharma, 
BFSI, Telecom, Retail) 
• How is it used? –Use statistics and software 
• What are some good practices?
Analytics 
• What is analytics? 
• Where is it used? 
• How is it used? 
• What are some good practices? – 
– Learn one new thing extra from your 
competition every day. This is a fast moving field. 
– Etc.
What is Data Science
Other Analytics Software 
• SAS (Base) et al 
• JMP 
• SPSS 
• Python 
• Octave 
• Clojure 
• Julia(?)
What is R? 
http://www.r-project.org/ 
• Language 
– Object oriented 
– Open Source 
– Free 
– Widely used 
Object-oriented programming is built on the concept of "objects" that have data fields (attributes that describe the object) 
and associated procedures known as methods. Objects, which are 
usually instances of classes, interact with one another to design 
applications and computer programs.
Pre Requisites 
• Installation of R 
http://cran.rstudio.com/bin/windows/base/ 
• R Studio 
• R Packages
Pre Requisites 
• Installation of R 
– Rtools 
– http://cran.rstudio.com/bin/windows/Rtools/ 
• R Studio 
• R Packages
Pre Requisites 
• Installation of R 
– RTools 
• R Studio 
http://www.rstudio.com/products/rstudio/download/ 
• R Packages
Pre Requisites 
• Installation of R 
– RTools 
• R Studio 
http://www.rstudio.com/products/rstudio/download/ 
• R Packages 
About eight packages are supplied with the R distribution, and many more are available through the CRAN family of Internet 
sites, covering a very wide range of modern statistics.
Pre Requisites 
• Installation of R 
– RTools 
• R Studio 
http://www.rstudio.com/products/rstudio/download/ 
• R Packages 
install.packages(), 
update.packages(), 
library() 
Packages are installed once, updated periodically, but loaded every time
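The install-once, load-every-time pattern can be sketched as below. This is a minimal illustration, assuming the MASS package (listed later in the prerequisites, and normally shipped with R); the install step needs an Internet connection and only runs if the package is missing.

```r
# Install only if the package is not already available (done once)
pkg <- "MASS"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)  # needs an Internet connection
}

# Load the package (done in every new R session)
library(MASS)

# update.packages() refreshes installed packages; run it periodically:
# update.packages(ask = FALSE)

# Confirm the package is now attached to the search path
"package:MASS" %in% search()
```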
Pre Requisites 
• R 
• R Studio 
• R Tools (for Windows) 
• JAVA (JRE) 
– R Packages (need Internet connection) 
– Rcmdr 
• All packages asked at startup 
• Epack plugin 
• KMggplot2plugin 
– rattle 
• A few packages that are asked when using rattle 
• GTK+ (needs internet) 
– Deducer 
– ggmap 
– Hmisc 
– arules 
– MASS
Interfaces to R 
• Console 
Default 
Customization 
• IDE 
• GUI
Demo- 
Basic Math on R Console 
• + 
• - 
• * 
• / 
• () 
• log 
• exp 
• mean 
• sum 
• sd 
• median
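A short console session using these operators and functions might look like the sketch below. The vector x is a made-up example, not data from the workshop.

```r
# Arithmetic with brackets controlling the order of operations
(2 + 3) * 4 / 2      # 10
10 - 4               # 6

# log() is the natural logarithm; exp() is its inverse
log(exp(1))          # 1

# Summary functions on a small illustrative vector
x <- c(2, 4, 6, 8)
mean(x)              # 5
sum(x)               # 20
median(x)            # 5
sd(x)                # sample standard deviation, sqrt(20/3)
```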
Demo- 
Basic Math on R Console 
• + 
• - 
• log 
• exp 
• * 
• / 
• () 
Hint- Ctrl +L clears screen
Demo- 
Basic Objects on R Console 
• + 
• - 
• log 
• exp 
• * 
• / 
• () 
Functions-ls() 
– what objects are here 
rm("foo") removes object named foo 
Assignment 
Using = or <- assigns a value to an object name (value -> name also works) 
Hint- Up arrow gives you last 
typed command
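The assignment forms and the ls() and rm() functions can be tried together, as in this small sketch. The object names x, y and z are arbitrary examples.

```r
# Three ways to assign a value to a name
x <- 5        # leftward arrow, the most common style
y = 10        # equals sign
15 -> z       # rightward arrow

# List the objects currently in the workspace
ls()

# Remove one object, then check that it is gone
rm("x")
exists("x")   # FALSE once x has been removed
```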
Functions and Loops 
• Loops 
for (number in 1:5){ print (number) }
Functions and Loops 
• Function 
functionajay=function(a)(a^2+2*a+1) 
Hint: Always match brackets 
Each ( deserves a ) 
Each { deserves a } 
Each [ deserves a ]
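As a sketch of how the loop and the function from these slides fit together, the snippet below calls functionajay inside a for loop. The squares vector is an illustrative addition.

```r
# The loop from the slide: print the numbers 1 to 5
for (number in 1:5) {
  print(number)
}

# The function from the slide, written with curly braces around the body
functionajay <- function(a) {
  a^2 + 2 * a + 1
}
functionajay(3)   # 16

# Use the function inside a loop and collect the results
squares <- numeric(0)
for (i in 1:3) {
  squares <- c(squares, functionajay(i))
}
squares           # 4 9 16
```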
Demo- 
Basic Objects on R Console 
• + 
• - 
• log 
• exp 
• * 
This is made more clear in 
next slide 
Functions-class() 
gives class 
dim() gives dimensions 
nrow() gives rows 
ncol() gives columns 
length() gives length 
str() gives structure 
Hint- Up arrow gives you last 
typed command
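These inspection functions can be tried on the built-in mtcars dataset, which is used elsewhere in this workshop. A minimal sketch:

```r
data(mtcars)

class(mtcars)        # "data.frame"
dim(mtcars)          # 32 11 (rows, columns)
nrow(mtcars)         # 32
ncol(mtcars)         # 11
length(mtcars)       # 11, because the length of a data frame is its column count
length(mtcars$mpg)   # 32, the length of a single column
str(mtcars)          # compact overview of every column
```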
Demo- 
Datasets on R Console 
• 
Hint- use data() to list the datasets available in 
loaded packages 
library(FOO) loads package "FOO"
R- Basic Functions 
– ls() 
– rm() 
– str() 
– summary() 
– getwd() 
– setwd() 
– dir() 
– read.csv()
Day 1 Session 2 
– Recap 
• Input of Data 
• Inspecting Data Quality 
• Investigating Data Issues 
– Demos in R 
• Data Input, 
• Data Quality, 
• Data Exploration 
• Break 2- 
– Questions
read.table()
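To try read.csv() and read.table() without needing a data file, one option is to write a few rows of mtcars to a temporary CSV first. This is only a sketch; the temporary file stands in for whatever file you actually want to read.

```r
# Create a small CSV file in a temporary location (stand-in for a real file)
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 5), tmp, row.names = FALSE)

# read.csv() assumes a header row and comma separators
cars_csv <- read.csv(tmp)

# read.table() is the general function; header and separator are stated explicitly
cars_tab <- read.table(tmp, header = TRUE, sep = ",")

str(cars_csv)
dim(cars_tab)   # 5 11
```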
Statistical formats 
• read.spss from foreign package 
• read.sas7bdat from sas7bdat package
From Databases 
The RODBC package provides access to databases through 
an ODBC interface. 
The primary functions are 
• odbcConnect(dsn, uid="", pwd="") Open a connection 
to an ODBC database 
• sqlFetch(channel, sqltable) Read a table from an ODBC 
database into a data frame 
Hint- a good site to learn R 
http://www.statmethods.net
A Detour to SQL
From Web (aka Web Scraping) 
• readLines() 
Hint : R is case sensitive 
readlines is not the same as readLines 
Hint : Use head() and tail() to inspect objects 
Other packages are XML and RCurl 
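A minimal sketch of the readLines() workflow follows. To keep it runnable offline it reads a small made-up HTML file written to a temporary location; a URL string could be passed to readLines() in place of the file path. The page content and the team scores are invented for illustration.

```r
# Write a tiny stand-in web page to a temporary file (made-up content)
page_file <- tempfile(fileext = ".html")
writeLines(c("<html>", "<body>", "<h1>Scores</h1>",
             "<p>Team A: 250</p>", "<p>Team B: 245</p>",
             "</body>", "</html>"), page_file)

# readLines() returns one character string per line
page <- readLines(page_file)
head(page, 3)
tail(page, 2)

# Keep the lines holding scores and strip everything except digits
score_lines <- page[grepl("<p>", page, fixed = TRUE)]
scores <- as.numeric(gsub("[^0-9]", "", score_lines))
scores   # 250 245
```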
Case Study- http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
Inspecting Data Quality 
• head() 
• tail() 
• names() 
• str() 
• objectname[l,m] 
• objectname$variable 
Hint- Try this code please 
data(mtcars) 
head(mtcars,10) 
tail(mtcars,5) 
names(mtcars) 
str(mtcars) 
mtcars[1,] 
mtcars[,2] 
mtcars[2,3] 
mtcars$cyl
Inspecting Data Quality: Demo 
•
Data Selection 
• object[l,m] gives the value in row l and 
column m 
• object[l,] gives all the values in row l 
• object$varname gives all values of varname 
• subset helps in selection
Data Selection: Demo 
Questions- How do I use multiple conditions (AND, OR)? 
Can I do away with the subset function? 
How do I select a random sample? 
Useful Link- http://decisionstats.com/2013/11/24/50-functions-to-clear-a-basic-interview-for-business-analytics-rstats/
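One possible way to approach the three questions, sketched on mtcars. The particular conditions (cyl, mpg, gear) and the seed are arbitrary choices for illustration.

```r
# Multiple conditions: & means AND, | means OR
and_rows <- subset(mtcars, cyl == 4 & mpg > 30)
or_rows  <- subset(mtcars, cyl == 4 | gear == 5)

# The same AND selection without subset(), using a logical index on rows
and_rows2 <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, ]

# A random sample of 5 rows; set.seed() makes the draw repeatable
set.seed(42)
sample_rows <- mtcars[sample(nrow(mtcars), 5), ]

nrow(and_rows)
nrow(sample_rows)   # 5
```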
Day 2 Session 1 
– Revision 
• Exploring Data 
• Manipulating Data 
• Visualization of Data 
• Demos in R 
• Data Exploration, 
• Data Manipulation, 
• Data Visualizations 
• Break 1 
– Questions
Good coding practices 
• Use # for comment 
• Use git for version control 
• Use Rstudio for multiple lines of code
Functions in R 
• custom functions 
• source code for a function 
• Understanding help ? , ??
Packages in R 
• CRAN 
• CRAN Views 
• R Documentation
Documentation in R 
• Help ? And ?? 
• CRAN Views 
• Package Help 
• Tips for Googling 
– Stack Overflow 
– Email Lists 
– Twitter 
– R Bloggers
Interfaces to R 
• Console 
• IDE 
R Studio 
• GUI 
Graphical User 
Interface
Graphical Interfaces to R 
• R Commander 
• Rattle 
• Deducer
Installation of R Commander
Overview of R Commander
Demo 
R Commander – 3D Graphs
Installation of Rattle
Installation of Rattle 
• GTK+ Installation Necessary 
• Install other packages when prompted
Overview of Rattle
Demo Rattle
Installation Deducer (with JGR)
Overview of Deducer (with JGR)
Demo Deducer 
• data() 
• data(mtcars)
Data Exploration 
• summary() 
• table() 
• describe() (Hmisc) 
• summarize()(Hmisc) 
Hint- Try this code please 
summary(mtcars) 
table(mtcars$cyl) 
library(Hmisc) 
describe(mtcars) 
summarize(mtcars$mpg,mtcars$cyl,mean) 
CLASS WORK- 
•Use table command for two variables 
•Summarize mtcars$mpg for two variables (cyl , gear) 
•Try and find min and max for the same
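One way to approach the class work using only base R is sketched below. It uses aggregate() rather than the Hmisc summarize() function, so it runs without extra packages; this is a substitute technique, not the approach shown on the slide.

```r
# Two-way frequency table of cylinders by gears
two_way <- table(mtcars$cyl, mtcars$gear)
two_way

# Mean, minimum and maximum mpg for each cyl and gear combination
mean_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
min_mpg  <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = min)
max_mpg  <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = max)

mean_mpg
```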
Data Exploration 
• missing values are represented by NA in R 
• Demo 
– is.na 
– na.omit 
– na.rm
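The three tools for missing values can be seen side by side in this small sketch. The vector x and the data frame df are made-up examples.

```r
x <- c(1, NA, 3)

# is.na() flags which values are missing
is.na(x)                 # FALSE TRUE FALSE

# Most summary functions return NA unless told to drop missing values
mean(x)                  # NA
mean(x, na.rm = TRUE)    # 2

# na.omit() drops every row that contains at least one NA
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
clean <- na.omit(df)
nrow(clean)              # 1
```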
Data Visualization 
Notes- 
Explaining Basic Types of Graphs 
Customizing Graphs 
Graph Output 
Advanced Graphs 
Facets, 
Grammar of Graphics 
Data Visualization Rules
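A minimal sketch of three basic graph types, some simple customization, and graph output to a file, all with base R graphics on mtcars. The file name and the choice of a PDF in a temporary folder are assumptions made for the example.

```r
# Send graphs to a PDF file instead of the screen (graph output)
out <- file.path(tempdir(), "mtcars_plots.pdf")
pdf(out)

# Histogram: distribution of one numeric variable
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Scatter plot with customized labels, colour and point symbol
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Weight vs mileage", col = "blue", pch = 19)

# Box plot: one numeric variable split by a grouping variable
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Close the device so the file is written
dev.off()
file.exists(out)
```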
Data Manipulation Demo 
Notes- 
1. gsub 
2. gsub with 
escape 
3. as operator 
4. is operator
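The four notes can be combined into one cleaning task, sketched here: turning price strings into numbers. The prices vector is invented for illustration.

```r
prices <- c("$1,200", "$350")

# 1. gsub replaces every match of a pattern
no_comma <- gsub(",", "", prices)         # "$1200" "$350"

# 2. gsub with escape: $ is a special character in patterns, so escape it
no_dollar <- gsub("\\$", "", no_comma)    # "1200" "350"

# 4. is.* functions test the type of an object
is.character(no_dollar)                   # TRUE

# 3. as.* functions convert between types
amount <- as.numeric(no_dollar)
is.numeric(amount)                        # TRUE
sum(amount)                               # 1550
```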
Text Manipulation 
Functions-nchar 
substr 
paste
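A quick sketch of the three text functions, using an example string chosen for illustration:

```r
s <- "Delhi School of Economics"

# nchar counts characters, including spaces
nchar(s)                                  # 25

# substr extracts characters from a start position to a stop position
substr(s, 1, 5)                           # "Delhi"

# paste joins strings, with sep controlling what goes between them
paste("R", "Analytics", sep = " and ")    # "R and Analytics"
```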
Date Manipulation 
Use ? help generously 
Hit escape to escape the + signs 
+ signs occur due to unclosed quotes or brackets 
Class Work 
What is your age in days as of today? 
What is your age in weeks as of today? 
Hint- 
> age2=difftime(Sys.Date(),dob2,units='weeks') 
> age2 
Time difference of 1959.286 weeks
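The hint above assumes an object dob2 that already holds a date of birth. A fuller sketch, using a made-up date of birth, could look like this:

```r
# Hypothetical date of birth; replace with your own
dob2 <- as.Date("1990-01-15")

# Age in days and in weeks as of today
age_days  <- difftime(Sys.Date(), dob2, units = "days")
age_weeks <- difftime(Sys.Date(), dob2, units = "weeks")

age_days
age_weeks

# difftime objects can be converted to plain numbers for further arithmetic
as.numeric(age_days) / 7
```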
Data Output 
• Graphical Output 
• Numerical Output (aggregation)
Data Output 
• Use objects to summarize 
• Use write.csv 
• Use setwd() to set location of output
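A sketch of saving a summary object as numerical output. The output folder here is a temporary directory and the file name is made up; the example builds a full path with file.path() instead of calling setwd(), so the working directory is left unchanged.

```r
# Summarize into an object: average mpg for each cylinder count
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Choose an output location (a temporary folder in this sketch)
out_dir  <- tempdir()
out_file <- file.path(out_dir, "avg_mpg_by_cyl.csv")

# Write the summary to a CSV file without row names
write.csv(avg_mpg, out_file, row.names = FALSE)

# Check the result by reading it back
file.exists(out_file)
read.csv(out_file)
```

Calling setwd(out_dir) before write.csv("avg_mpg_by_cyl.csv") would have the same effect on where the file lands.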
Econometrics 
Coming up 
Regression
Correlation
Regression 
Notes- 
Correlation is not causation 
How do we determine which is dependent 
and which are independent variables
Regression
Regression using R Commander
Lies, True Lies and Statistics 
• Anscombe -case study
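The Anscombe case study can be reproduced with the anscombe dataset that ships with R. The sketch below fits the same straight-line model to each of the four x/y pairs; the point of the case study is that the summaries match closely while the scatter plots look very different.

```r
data(anscombe)

# Means of the four x and four y variables are almost identical
sapply(anscombe[, c("x1", "x2", "x3", "x4")], mean)
sapply(anscombe[, c("y1", "y2", "y3", "y4")], mean)

# Fit y ~ x for each pair and collect intercept and slope
coefs <- sapply(1:4, function(i) {
  y <- anscombe[[paste0("y", i)]]
  x <- anscombe[[paste0("x", i)]]
  coef(lm(y ~ x))
})
round(coefs, 2)   # intercepts near 3, slopes near 0.5 in all four cases

# Plot each pair to see how different the data really are:
# plot(anscombe$x1, anscombe$y1)
```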
Regression Recap 
• cor 
• lm 
• anova 
• summary and plot of lm object 
• residuals 
• p value 
– vif 
– heteroskedasticity 
– outliers
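The recap functions can be run in sequence on mtcars, as in this sketch. The choice of mpg as the dependent variable and wt and hp as predictors is an assumption made for illustration.

```r
# Correlation between mileage and weight (negative: heavier cars do fewer mpg)
cor(mtcars$mpg, mtcars$wt)

# Linear model: mpg explained by weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficients, p values and R-squared
summary(fit)

# Analysis of variance table for the fitted model
anova(fit)

# Residuals: observed minus fitted values, one per car
head(residuals(fit))

# Diagnostic plots help spot heteroskedasticity and outliers:
# plot(fit)
# Variance inflation factors need the car package:
# car::vif(fit)
```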
Propensity Modeling in Industry 
• Response Rates 
• Lift 
• Test and Control groups
Day 2 Session 2 
– Recap 
• Data Mining 
• Regression Models 
• Advanced Topics 
• Demos in R 
• Data Mining, 
• Model Building, 
• Advanced Topics 
• Summary and Conclusion 
• Break 2 
– Questions
Data Mining 
• Rattle 
– association analysis 
– cluster analysis 
– modeling
Rattle 
• Analyze wine
Rattle 
• Cluster Analysis
Data Mining 
• Brief Introduction 
– Affinity analysis is a data analysis and data mining technique that 
discovers co-occurrence relationships among activities performed by (or 
recorded about) specific individuals or groups. In general, this can be 
applied to any process where agents can be uniquely identified and 
information about their activities can be recorded. In retail, affinity 
analysis is used to perform market basket analysis, in which retailers seek 
to understand the purchase behavior of customers. This information can 
then be used for purposes of cross-selling and up-selling.
Rattle 
• Brief Introduction 
– market basket analysis 
– Market basket analysis might tell a retailer that customers often 
purchase shampoo and conditioner together, so putting both items on 
promotion at the same time would not create a significant increase in 
revenue, while a promotion involving just one of the items would likely 
drive sales of the other.
Rattle 
• Brief Introduction 
– association rules 
– if butter and bread are bought, customers also buy milk 
Example database with 4 items and 5 transactions 
transaction ID   milk   bread   butter   beer 
1                1      1       0        0 
2                0      0       1        0 
3                0      0       0        1 
4                1      1       1        0 
5                0      1       0        0
Rattle 
• Brief Introduction 
– association rules 
– the rule (milk,bread->butter) has a support of 20%, since milk, bread and butter occur together in 20% of all 
transactions (1 out of 5 transactions). 
– the rule (milk,bread->butter) has a confidence of 50%, since butter is bought in 50% of the 
transactions that contain milk and bread (1 out of 2 transactions). 
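The support and confidence figures can be checked directly from the example database. This is a base R sketch of the arithmetic only; it is not how Rattle or the arules package computes rules internally.

```r
# The example database: 5 transactions, 4 items (1 = bought, 0 = not bought)
trans <- data.frame(
  milk   = c(1, 0, 0, 1, 0),
  bread  = c(1, 0, 0, 1, 1),
  butter = c(0, 1, 0, 1, 0),
  beer   = c(0, 0, 1, 0, 0)
)

# Transactions containing the left-hand side: milk and bread
lhs <- trans$milk == 1 & trans$bread == 1

# Transactions containing milk, bread and butter together
all_three <- lhs & trans$butter == 1

# Support: share of all transactions containing all three items
support <- sum(all_three) / nrow(trans)      # 0.2

# Confidence: share of milk-and-bread transactions that also contain butter
confidence <- sum(all_three) / sum(lhs)      # 0.5
```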
Regression Models 
• lm function 
• Understanding output 
• Diagnostics 
– homoskedasticity 
– Multicollinearity 
– p value 
– Residuals
Advanced Topics :Demos 
• Time Series Analysis (use epack plugin) 
http://decisionstats.com/2010/10/22/doing-time-series-using-a-r-gui/
Advanced Topics :Demos 
• Advanced Data Visualization ( kmggplot2 
plugin) 
http://decisionstats.com/2012/05/21/new-rcommander-with-ggplot-rstats/
Advanced Topics :Demos 
Social Network Analysis (sna) 
Facebook 
http://decisionstats.com/2014/05/10/analyzing-facebook-networks-using-rstats/ 
Twitter 
http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais
Advanced Topics :Demos 
• Spatial Analysis 
• ggmap demo 
• http://decisionstats.com/2013/08/19/the-wonderful-ggmap-package-for-spatial-analysis-in-r-rstats/ 
• rmaps 
• http://rcharts.io/viewer/?9223554#.Uw4hOPmSySp
Thank You 
• http://linkedin.com/in/ajayohri 
• ohri2007@gmail.com

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 

R Workshop Delhi School Economics Learn Analytics

  • 1. Pre- Placement Workshop in R and Analytics Delhi School of Economics 2014 Ajay Ohri
  • 2. Hi , I am Ajay Ohri
  • 3. Agenda • Try and learn R in 12 hours
  • 4. Agenda • Try and learn R in 12 hours • Get an introduction to Analytics
  • 5. Agenda • Try and learn R in 12 hours • Get an introduction to Analytics • Be better skilled for Analytics as a career
  • 6. Agenda • Try and learn R in 12 hours • Get an introduction to Analytics • Be better skilled for Analytics as a career (?)
  • 7. Training Plan • DAY 1 – Session 1 -2.5 hours – Session 2 -3.5 hours • DAY 2 – Session 1-2.5 hours – Session 2 -3.5 hours
  • 8. Instructor • Author of R for Business Analytics • Author of R for Cloud Computing ( An approach for Data Scientists) • 10+ yrs in Analytics and 6+ years in R • Founder, Decisionstats.com
  • 9. The Audience Breakup – Demographics and Background
  • 10. Expectations from each other • From Instructor – Your turn to speak
  • 11. Expectations from each other • From Instructor • From Audience – mobile phones should be kindly switched off • Yes, this includes Whatsapp – Ask Questions at end of session – Take Notes
  • 12. Day 1 Session 1 – Introductions • Introduction to Analytics • Introduction to R • Interfaces in R – Demos in R (Maths, Objects,etc) • Break 1- – Installation, Trouble Shooting, Questions
  • 13. Day 1 Session 2 – Recap • Input of Data • Inspecting Data Quality • Investigating Data Issues – Demos in R (Data Input, Data Quality, Data Exploration) • Break 2- – Questions
  • 14. Day 2 Session 1 – Revision • Exploring Data • Manipulating Data • Visualization of Data • Demos in R • Data Exploration, • Data Manipulation, • Data Visualizations • Break 1 – Questions
  • 15. Day 2 Session 2 – Recap • Data Mining • Regression Models • Advanced Topics • Demos in R • Data Mining, • Model Building, • Advanced Topics • Summary and Conclusion • Break 2 – Questions
  • 16. Analytics • What is analytics? • Where is it used? • How is it used? • What are some good practices?
  • 17. Analytics • What is analytics? – Study of data for helping with decision making using software • Where is it used? • How is it used? • What are some good practices?
  • 18. Analytics • What is analytics? • Where is it used? – Industries (like Pharma, BFSI, Telecom, Retail) • How is it used? –Use statistics and software • What are some good practices?
  • 19. Analytics • What is analytics? • Where is it used? • How is it used? • What are some good practices? – Learn one extra new thing from your competition every day. This is a fast-moving field. – Etc.
  • 20. What is Data Science
  • 21. Other Analytics Software • SAS (Base) et al • JMP • SPSS • Python • Octave • Clojure • Julia(?)
  • 23. What is R? http://www.r-project.org/ • Language – Object oriented – Open Source – Free – Widely used • Object-oriented programming uses the concept of "objects" that have data fields (attributes that describe the object) and associated procedures known as methods. Objects, which are usually instances of classes, are used to interact with one another to design applications and computer programs.
  • 24. Prerequisites • Installation of R http://cran.rstudio.com/bin/windows/base/ • RStudio • R Packages
  • 25. Prerequisites • Installation of R – Rtools – http://cran.rstudio.com/bin/windows/Rtools/ • RStudio • R Packages
  • 26. Prerequisites • Installation of R – Rtools • RStudio http://www.rstudio.com/products/rstudio/download/ • R Packages
  • 27. Prerequisites • Installation of R – Rtools • RStudio http://www.rstudio.com/products/rstudio/download/ • R Packages – About eight packages are supplied with the R distribution, and many more are available through the CRAN family of Internet sites, covering a very wide range of modern statistics.
  • 28. Prerequisites • Installation of R – Rtools • RStudio http://www.rstudio.com/products/rstudio/download/ • R Packages – install.packages(), update.packages(), library() – Packages are installed once and updated periodically, but loaded every time you start R.
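The install / update / load cycle described on this slide might look like the sketch below. It assumes the MASS package (listed among the workshop packages) and an Internet connection for the install step; the variable names are illustrative only.

```r
# Install once: only runs if the package is not already present (needs Internet).
pkg <- "MASS"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Update periodically (commented out here because it can take a while):
# update.packages(ask = FALSE)

# Load every time you start a new R session and want to use the package.
library(MASS)

# Check that the package is now attached to the search path.
loaded <- "package:MASS" %in% search()
print(loaded)
```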
  • 29. Prerequisites • R • RStudio • Rtools (for Windows) • JAVA (JRE) – R Packages (need Internet connection) – Rcmdr • All packages asked at startup • Epack plugin • KMggplot2plugin – rattle • A few packages that are asked when using rattle • GTK+ (needs internet) – Deducer – ggmap – Hmisc – arules – MASS
  • 30. Interfaces to R • Console Default Customization • IDE • GUI
  • 31. Demo- Basic Math on R Console • + • - • * • / • () • log • exp • mean • sum • sd • median
  • 32. Demo- Basic Math on R Console • + • - • log • exp • * • / • () Hint- Ctrl+L clears the screen
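A few lines one might type at the console to try the operators and functions listed above. The vector x is a made-up example, not data from the workshop.

```r
# Arithmetic with brackets controlling the order of evaluation
result <- (3 + 5) * 2 / 4    # 4

# log() and exp() are inverses of each other (natural log by default)
one <- log(exp(1))           # 1

# Summary functions on a small illustrative vector
x <- c(2, 4, 6, 8)
total  <- sum(x)             # 20
avg    <- mean(x)            # 5
middle <- median(x)          # 5
spread <- sd(x)              # sample standard deviation, about 2.58

print(c(result, one, total, avg, middle, spread))
```

Note that function names are lower case: R is case sensitive, so Log(10) fails while log(10) works.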
  • 33. Demo- Basic Objects on R Console • + • - • log • exp • * • / • () Functions- ls() shows what objects are here; rm("foo") removes the object named foo. Assignment- Using = or <- assigns object names to values. Hint- Up arrow gives you the last typed command
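A minimal sketch of creating, listing and removing objects. The names foo and bar are placeholders.

```r
# Two ways to assign a value to a name
foo <- 10
bar = "hello"

# List the objects in the current workspace
objects_before <- ls()
print(objects_before)

# Remove the object named foo, then confirm it is gone
rm("foo")
foo_still_there <- exists("foo", inherits = FALSE)
print(foo_still_there)   # FALSE
```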
  • 34. Functions and Loops • Loops for (number in 1:5){ print (number) }
  • 35. Functions and Loops • Function functionajay=function(a)(a^2+2*a+1) Hint: Always match brackets. Each ( deserves a ), each { deserves a }, each [ deserves a ].
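The loop and the function from these two slides can be combined, as in the sketch below. The vector name squares is an assumption added for illustration.

```r
# The function from the slide: a^2 + 2*a + 1, which equals (a + 1)^2
functionajay <- function(a) (a^2 + 2 * a + 1)

# Apply it inside a for loop and store each result
squares <- numeric(5)
for (number in 1:5) {
  squares[number] <- functionajay(number)
}
print(squares)   # 4 9 16 25 36
```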
  • 36. Demo- Basic Objects on R Console • + • - • log • exp • * This is made clearer in the next slide. Functions- class() gives class, dim() gives dimensions, nrow() gives rows, ncol() gives columns, length() gives length, str() gives structure. Hint- Up arrow gives you the last typed command
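These inspection functions can be tried on the built-in mtcars data frame, which the workshop uses in later demos.

```r
data(mtcars)

cls   <- class(mtcars)       # "data.frame"
dims  <- dim(mtcars)         # 32 rows, 11 columns
rows  <- nrow(mtcars)        # 32
cols  <- ncol(mtcars)        # 11
len   <- length(mtcars)      # 11: for a data frame, length() counts columns
len_v <- length(mtcars$mpg)  # 32: for a vector, length() counts elements

# str() prints a compact description of every column
str(mtcars)
```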
  • 37. Demo- Datasets on R Console • Hint- use data() to list the datasets available in the loaded packages
  • 38. Demo- Datasets on R Console • Hint- use data() to list the datasets available in the loaded packages; library(FOO) loads package "FOO"
  • 39. R- Basic Functions – ls() – rm() – str() – summary() – getwd() – setwd() – dir() – read.csv()
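A short sketch of the file-related functions in this list. It writes a tiny CSV into R's temporary folder so that it runs anywhere; the file name scores.csv and its contents are made up for illustration.

```r
# Where am I working right now?
print(getwd())

# Create a small CSV file in the temporary directory (illustrative data)
csv_path <- file.path(tempdir(), "scores.csv")
writeLines(c("name,score", "a,10", "b,20"), csv_path)

# dir() lists files; here we look for the one just written
print(dir(tempdir(), pattern = "scores"))

# Read the file into a data frame and inspect it
scores <- read.csv(csv_path)
str(scores)
summary(scores$score)
```

To read a file from your own project folder instead, call setwd() with that folder first, or pass the full path to read.csv().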
  • 40. Day 1 Session 2 – Recap • Input of Data • Inspecting Data Quality • Investigating Data Issues – Demos in R (Data Input, Data Quality, Data Exploration) • Break 2- – Questions
  • 42. Statistical formats • read.spss from foreign package • read.sas7bdat from sas7bdat package
  • 43. From Databases The RODBC package provides access to databases through an ODBC interface. The primary functions are • odbcConnect(dsn, uid="", pwd="") Open a connection to an ODBC database • sqlFetch(channel, sqltable) Read a table from an ODBC database into a data frame Hint- a good site to learn R http://www.statmethods.net
  • 44. A Detour to SQL
  • 45. From Web (aka Web Scraping) • readLines Hint: R is case sensitive; readlines is not the same as readLines. Hint: Use head() and tail() to inspect objects. Other packages are XML and RCurl. Case Study- http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
  • 46. Inspecting Data Quality • head() • tail() • names() • str() • objectname[l,m] • objectname$variable Hint- Try this code please data(mtcars) head(mtcars,10) tail(mtcars,5) names(mtcars) str(mtcars) mtcars[1,] mtcars[,2] mtcars[2,3] mtcars$cyl
  • 49. Data Selection • object[l,m] gives the value in row l and column m • object[l,] gives all the values in row l • object[,m] gives all the values in column m • object$varname gives all values of varname • subset() helps in selection
  • 50. Data Selection: Demo Questions- How do I use multiple conditions (AND, OR)? Can I do away with the subset function? How do I select a random sample? Useful Link- http://decisionstats.com/2013/11/24/50-functions-to-clear-a-basic-interview-for-business-analytics-rstats/
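One possible set of answers to the three demo questions, using mtcars. The thresholds and the seed are arbitrary choices made for this sketch.

```r
data(mtcars)

# Multiple conditions: & means AND, | means OR
and_rows <- subset(mtcars, cyl == 4 & mpg > 30)
or_rows  <- subset(mtcars, cyl == 4 | gear == 5)

# The same AND selection without subset(), using logical indexing
and_rows2 <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, ]

# A random sample of 10 rows; set.seed() makes the draw repeatable
set.seed(42)
random_rows <- mtcars[sample(nrow(mtcars), 10), ]

print(rownames(and_rows))
print(nrow(random_rows))
```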
  • 51. Day 2 Session 1 – Revision • Exploring Data • Manipulating Data • Visualization of Data • Demos in R • Data Exploration, • Data Manipulation, • Data Visualizations • Break 1 – Questions
  • 52. Good coding practices • Use # for comments • Use git for version control • Use RStudio for multiple lines of code
  • 53. Functions in R • custom functions • source code for a function • Understanding help ? , ??
  • 54. Packages in R • CRAN • CRAN Views • R Documentation
  • 55. Documentation in R • Help ? and ?? • CRAN Views • Package Help • Tips for Googling – Stack Overflow – Email Lists – Twitter – R Bloggers
  • 56. Interfaces to R • Console • IDE (RStudio) • GUI (Graphical User Interface)
  • 57. Graphical Interfaces to R • R Commander • Rattle • Deducer
  • 58. Installation of R Commander
  • 59. Overview of R Commander
  • 60. Demo R Commander – 3D Graphs
  • 65. Installation of Rattle • GTK+ Installation Necessary • Install other packages when prompted
  • 76. Overview of Deducer (with JGR)
  • 77. Demo Deducer • data() • data(mtcars)
  • 78. Data Exploration • summary() • table() • describe() (Hmisc) • summarize()(Hmisc) Hint- Try this code please summary(mtcars) table(mtcars$cyl) library(Hmisc) describe(mtcars) summarize(mtcars$mpg,mtcars$cyl,mean) CLASS WORK- •Use table command for two variables •Summarize mtcars$mpg for two variables (cyl , gear) •Try and find min and max for the same
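One way to approach the class work is sketched below. It uses base R's aggregate() rather than Hmisc's summarize(), so that it runs without extra packages; that substitution is a choice made for this sketch, not the workshop's own solution.

```r
data(mtcars)

# table() with two variables gives a cross-tabulation of counts
two_way <- table(mtcars$cyl, mtcars$gear)
print(two_way)

# Mean mpg for every combination of cyl and gear
mean_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)

# Minimum and maximum mpg for the same groups
min_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = min)
max_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = max)

print(mean_mpg)
```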
  • 79. Data Exploration • missing values are represented by NA in R • Demo – is.na – na.omit – na.rm
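A small sketch of the three tools named on this slide, using a made-up vector with one missing value.

```r
x <- c(10, NA, 30)

# is.na() flags which values are missing
missing_flags <- is.na(x)        # FALSE TRUE FALSE

# Most summary functions return NA if any value is missing...
plain_mean <- mean(x)            # NA

# ...unless na.rm = TRUE tells them to drop missing values first
clean_mean <- mean(x, na.rm = TRUE)   # 20

# na.omit() removes the missing entries from the object itself
x_complete <- na.omit(x)

print(missing_flags)
print(clean_mean)
print(length(x_complete))        # 2
```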
  • 80. Data Visualization Notes- Explaining Basic Types of Graphs Customizing Graphs Graph Output Advanced Graphs Facets, Grammar of Graphics Data Visualization Rules
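As a starting point for the basic graph types and graph output, here is a base-graphics sketch. It saves three plots of mtcars to a PDF in the temporary folder; the file name and the titles are assumptions made for this example.

```r
data(mtcars)

# Send graph output to a PDF file instead of the screen
plot_file <- file.path(tempdir(), "mtcars_plots.pdf")
pdf(plot_file)

# Histogram: distribution of one numeric variable
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Scatter plot: relationship between two numeric variables
plot(mtcars$wt, mtcars$mpg, main = "mpg vs weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Box plot: a numeric variable split by a grouping variable
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Close the device so the file is written to disk
dev.off()
```

Arguments such as main, xlab and ylab are the simplest form of graph customization.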
  • 81. Data Manipulation Demo Notes- 1. gsub 2. gsub with escape 3. as operator 4. is operator
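The four demo items might be illustrated as follows, cleaning a made-up vector of price strings.

```r
prices <- c("$1,200", "$350", "$4,075")

# 1. gsub replaces every match of a pattern: remove the commas
no_commas <- gsub(",", "", prices)

# 2. gsub with escape: $ is special in regular expressions, so escape it
no_dollar <- gsub("\\$", "", no_commas)

# 3. is.* functions test the type of an object
was_text <- is.character(no_dollar)   # TRUE

# 4. as.* functions convert from one type to another
amounts <- as.numeric(no_dollar)
is_number <- is.numeric(amounts)      # TRUE

print(amounts)
print(sum(amounts))   # 5625
```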
  • 84. Date Manipulation Use ? help generously. Hit Escape to get rid of the + signs; + signs occur due to unclosed quotes or brackets. Class Work- What is your age in days as of today? What is your age in weeks as of today? Hint (dob2 is your date of birth stored as a Date)- > age2=difftime(Sys.Date(),dob2,units='weeks') > age2 Time difference of 1959.286 weeks
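A sketch of the class work with a hypothetical date of birth. Replace the date with your own; the output depends on the day you run it.

```r
# Hypothetical date of birth, stored as a Date object
dob <- as.Date("1990-01-15")

# Age as of today, in days and in weeks
age_days  <- difftime(Sys.Date(), dob, units = "days")
age_weeks <- difftime(Sys.Date(), dob, units = "weeks")

print(age_days)
print(age_weeks)
```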
  • 85. Data Output • Graphical Output • Numerical Output (aggregation)
  • 87. Data Output • Graphical Output
  • 88. Data Output • Use objects to summarize • Use write.csv • Use setwd() to set location of output
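A sketch of numerical output: summarize into an object, then write it out. Instead of calling setwd(), this example builds a full path inside the temporary folder so it does not change your working directory. The file name is an assumption.

```r
data(mtcars)

# Use an object to hold the summary: mean mpg for each cylinder count
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Choose the output location and write the summary as a CSV file
out_file <- file.path(tempdir(), "mpg_by_cyl.csv")
write.csv(mpg_by_cyl, out_file, row.names = FALSE)

# Read it back to confirm what was written
check <- read.csv(out_file)
print(check)
```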
  • 91. Regression Notes- Correlation is not causation. How do we determine which is the dependent variable and which are the independent variables?
  • 93. Regression using R Commander
  • 94. Lies True Lies and Statistics • Anscombe -case study
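The Anscombe case study can be reproduced with the anscombe dataset that ships with R. The sketch below shows that the four x-y pairs have nearly identical summary statistics, which is why plotting them matters.

```r
data(anscombe)

# Correlation of each x-y pair (x1 with y1, x2 with y2, ...)
cors <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})

# Mean of each y variable
means_y <- sapply(1:4, function(i) mean(anscombe[[paste0("y", i)]]))

print(round(cors, 3))
print(round(means_y, 2))

# The numbers agree, but the scatter plots look completely different:
# plot(anscombe$x1, anscombe$y1); plot(anscombe$x2, anscombe$y2)
```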
  • 95. Regression Recap • cor • lm • anova • summary and plot of lm object • residuals • p value – vif – heteroskedasticity – outliers
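The recap items could be walked through on mtcars as below. The choice of mpg as the dependent variable and wt and hp as predictors is an assumption for illustration. The variance inflation factor is computed by hand here, because a vif() function lives in add-on packages that may not be installed.

```r
data(mtcars)

# Correlation between the response and one predictor
r_mpg_wt <- cor(mtcars$mpg, mtcars$wt)

# Fit a linear model and look at the output
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(fit))    # coefficients, p values, R-squared
print(anova(fit))      # analysis of variance table

# Residuals: observed minus fitted values
res <- residuals(fit)

# Manual VIF for wt: 1 / (1 - R^2) from regressing wt on the other predictor
r2_wt  <- summary(lm(wt ~ hp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
print(vif_wt)

# plot(fit) draws diagnostic plots that help spot heteroskedasticity and outliers
```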
  • 96. Propensity Modeling in Industry • Response Rates • Lift • Test and Control groups
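A worked arithmetic sketch of response rate and lift. All counts are hypothetical, and lift is taken here as the ratio of the test group's response rate to the control group's; other definitions are also used in industry.

```r
# Hypothetical campaign counts (not real data)
test_contacted    <- 10000
test_responded    <- 300
control_contacted <- 10000
control_responded <- 150

# Response rate = responders / people contacted
test_rate    <- test_responded / test_contacted          # 0.03
control_rate <- control_responded / control_contacted    # 0.015

# Lift (assumed definition): how many times better the test group responded
lift <- test_rate / control_rate                         # 2

print(c(test_rate = test_rate, control_rate = control_rate, lift = lift))
```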
  • 97. Day 2 Session 2 – Recap • Data Mining • Regression Models • Advanced Topics • Demos in R • Data Mining, • Model Building, • Advanced Topics • Summary and Conclusion • Break 2 – Questions
  • 98. Data Mining • Rattle – association analysis – cluster analysis – modeling
  • 102. Rattle • Cluster Analysis
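Outside the Rattle GUI, a comparable cluster analysis can be sketched at the console with base R's kmeans(). The three variables, the choice of three clusters and the seed are assumptions made for this example.

```r
data(mtcars)

# Scale the variables so that no single one dominates the distance
scaled <- scale(mtcars[, c("mpg", "hp", "wt")])

# k-means with 3 clusters; nstart tries several random starts
set.seed(123)
km <- kmeans(scaled, centers = 3, nstart = 10)

# How many cars fall in each cluster?
print(table(km$cluster))

# Average mpg, hp and wt per cluster, in the original units
print(aggregate(mtcars[, c("mpg", "hp", "wt")],
                by = list(cluster = km$cluster), FUN = mean))
```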
  • 103. Data Mining • Brief Introduction – Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, this can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of cross-selling and up-selling.
  • 104. Rattle • Brief Introduction – market basket analysis – Market basket analysis might tell a retailer that customers often purchase shampoo and conditioner together, so putting both items on promotion at the same time would not create a significant increase in revenue, while a promotion involving just one of the items would likely drive sales of the other.
  • 105. Rattle • Brief Introduction – association rules – if butter and bread are bought, customers also buy milk. Example database with 4 items and 5 transactions:
        transaction ID   milk   bread   butter   beer
        1                1      1       0        0
        2                0      0       1        0
        3                0      0       0        1
        4                1      1       1        0
        5                0      1       0        0
  • 106. Rattle • Brief Introduction – association rules – the rule (milk,bread->butter) has a support of 20% since milk, bread and butter occur together in 20% of all transactions (1 out of 5 transactions). – the rule (milk,bread->butter) has a confidence of 50% since butter occurs in 50% of the transactions that contain milk and bread (1 out of 2 transactions).
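The support and confidence figures can be checked by hand in base R. The sketch below re-enters the five-transaction example database and computes both measures for the rule milk, bread -> butter; the object names are illustrative.

```r
# The example database: 1 = item bought in that transaction
tx <- data.frame(
  milk   = c(1, 0, 0, 1, 0),
  bread  = c(1, 0, 0, 1, 1),
  butter = c(0, 1, 0, 1, 0),
  beer   = c(0, 0, 1, 0, 0)
)

# Transactions containing the left-hand side (milk and bread)
has_lhs <- tx$milk == 1 & tx$bread == 1

# Transactions containing the whole rule (milk, bread and butter)
has_rule <- has_lhs & tx$butter == 1

# Support: share of all transactions that contain every item in the rule
support <- sum(has_rule) / nrow(tx)            # 1 / 5 = 0.2

# Confidence: share of left-hand-side transactions that also contain butter
confidence <- sum(has_rule) / sum(has_lhs)     # 1 / 2 = 0.5

print(c(support = support, confidence = confidence))
```

For real datasets, the arules package listed in the prerequisites provides this kind of analysis at scale.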
  • 107. Rattle • Brief Introduction – association rules
  • 108. Regression Models • lm function • Understanding output • Diagnostics – homoskedasticity – Multicollinearity – p value – Residuals
  • 109. Advanced Topics :Demos • Time Series Analysis (use epack plugin) http://decisionstats.com/2010/10/22/doing-time-series-using-a-r-gui/
  • 110. Advanced Topics :Demos • Advanced Data Visualization ( kmggplot2 plugin) http://decisionstats.com/2012/05/21/new-rcommander-with-ggplot-rstats/
  • 111. Advanced Topics :Demos Social Network Analysis (sna) Facebook http://decisionstats.com/2014/05/10/analyzing-facebook-networks-using-rstats/ Twitter http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais
  • 112. Advanced Topics :Demos • Spatial Analysis • ggmap demo • http://decisionstats.com/2013/08/19/the-wonderful-ggmap-package-for-spatial-analysis-in-r-rstats/ • rmaps • http://rcharts.io/viewer/?9223554#.Uw4hOPmSySp
  • 113. Thank You • http://linkedin.com/in/ajayohri • ohri2007@gmail.com