This document outlines the agenda for a two-day workshop on learning R and analytics. Day 1 will introduce R and cover data input, quality, and exploration. Day 2 will focus on data manipulation, visualization, regression models, and advanced topics. Sessions include lectures and demos in R. The goal is to help attendees learn R in 12 hours and gain an introduction to analytics skills for career opportunities.
6. Agenda
• Try and learn R in 12 hours
• Get an introduction to Analytics
• Be better skilled for Analytics as a career (?)
7. Training Plan
• DAY 1
– Session 1: 2.5 hours
– Session 2: 3.5 hours
• DAY 2
– Session 1: 2.5 hours
– Session 2: 3.5 hours
8. Instructor
• Author of R for Business Analytics
• Author of R for Cloud Computing (An Approach for Data Scientists)
• 10+ years in Analytics and 6+ years in R
• Founder, Decisionstats.com
11. Expectations from each other
• From Instructor
• From Audience
– Kindly switch off mobile phones
• Yes, this includes WhatsApp
– Ask Questions at end of session
– Take Notes
12. Day 1 Session 1
– Introductions
• Introduction to Analytics
• Introduction to R
• Interfaces in R
– Demos in R (Maths, Objects, etc.)
• Break 1
– Installation, Troubleshooting, Questions
13. Day 1 Session 2
– Recap
• Input of Data
• Inspecting Data Quality
• Investigating Data Issues
– Demos in R
• Data Input
• Data Quality
• Data Exploration
• Break 2
– Questions
14. Day 2 Session 1
– Revision
• Exploring Data
• Manipulating Data
• Visualization of Data
• Demos in R
• Data Exploration,
• Data Manipulation,
• Data Visualizations
• Break 1
– Questions
15. Day 2 Session 2
– Recap
• Data Mining
• Regression Models
• Advanced Topics
• Demos in R
• Data Mining,
• Model Building,
• Advanced Topics
• Summary and Conclusion
• Break 2
– Questions
17. Analytics
• What is analytics? – The study of data, using software, to help with decision making
• Where is it used?
• How is it used?
• What are some good practices?
18. Analytics
• What is analytics?
• Where is it used? – Industries (like Pharma, BFSI, Telecom, Retail)
• How is it used? – Using statistics and software
• What are some good practices?
19. Analytics
• What is analytics?
• Where is it used?
• How is it used?
• What are some good practices?
– Learn one more new thing than your competition every day. This is a fast-moving field.
– Etc.
23. What is R?
http://www.r-project.org/
• Language
– Object oriented
– Open Source
– Free
– Widely used
Object-oriented programming is built on the concept of "objects" that have data fields
(attributes that describe the object) and associated procedures known as methods.
Objects, which are usually instances of classes, interact with one another to design
applications and computer programs.
24. Pre Requisites
• Installation of R
http://cran.rstudio.com/bin/windows/base/
• R Studio
• R Packages
25. Pre Requisites
• Installation of R
– Rtools
– http://cran.rstudio.com/bin/windows/Rtools/
• R Studio
• R Packages
27. Pre Requisites
• Installation of R
– RTools
• R Studio
http://www.rstudio.com/products/rstudio/download/
• R Packages
A number of packages are supplied with the R distribution, and many more are available through the CRAN family of Internet
sites, covering a very wide range of modern statistics.
28. Pre Requisites
• Installation of R
– RTools
• R Studio
http://www.rstudio.com/products/rstudio/download/
• R Packages
install.packages(), update.packages(), library()
Packages are installed once, updated periodically, but loaded in every new R session.
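The install-once, load-every-session pattern can be sketched as below. MASS is only an illustrative choice (it normally ships with R); the install step is guarded so it runs only when the package is missing, in which case it would need an Internet connection.

```r
# Install a package only if it is not already present (needs Internet).
# MASS is an illustrative choice; it normally ships with R.
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Load the package: this has to be repeated in every new R session.
library(MASS)

# update.packages() refreshes installed packages; run it periodically.
# It is commented out here because it downloads from CRAN.
# update.packages(ask = FALSE)

# Confirm the package is attached to the search path.
mass_loaded <- "package:MASS" %in% search()
print(mass_loaded)
```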
29. Pre Requisites
• R
• R Studio
• R Tools (for Windows)
• JAVA (JRE)
– R Packages (need Internet connection)
– Rcmdr
• All packages asked at startup
• Epack plugin (RcmdrPlugin.epack)
• KMggplot2 plugin (RcmdrPlugin.KMggplot2)
– rattle
• A few packages that are asked when using rattle
• GTK+ (needs internet)
– Deducer
– ggmap
– Hmisc
– arules
– MASS
31. Demo-
Basic Math on R Console
• +
• -
• *
• /
• ()
• log
• exp
• mean
• sum
• sd
• median
Hint- Ctrl + L clears the console screen
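A short console session covering the operators and functions listed above might look like this. The vector x holds made-up values purely for illustration.

```r
# Arithmetic: brackets control the order of evaluation
2 + 3 * 4      # multiplication happens first
(2 + 3) * 4    # brackets force the addition first
10 / 4

# exp() and log() are inverses; log() is the natural logarithm
exp(1)
log(exp(1))

# Summary functions work on a vector of numbers (hypothetical values)
x <- c(2, 4, 6, 8, 10)
total  <- sum(x)
avg    <- mean(x)
middle <- median(x)
spread <- sd(x)    # sample standard deviation

print(c(total, avg, middle, spread))
```

Note that function names are lowercase: typing Log(10) or Exp(1) gives an error because R is case sensitive.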
33. Demo-
Basic Objects on R Console
Functions- ls() lists the objects in the workspace
rm("foo") removes the object named foo
Assignment
Using = or <- assigns values to object names
Hint- Up arrow gives you the last typed command
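A minimal sketch of creating, listing and removing objects. The object names a, b and d are arbitrary.

```r
# Assignment: <- and = both assign a value to a name
a <- 5
b = 10

# -> also exists, but it assigns rightwards (value first, name second)
15 -> d

# ls() lists the objects currently in the workspace
print(ls())

# rm() removes an object; rm(b) and rm("b") are equivalent
rm(b)

# After removal, b is gone from the current environment
b_still_here <- exists("b", inherits = FALSE)
print(b_still_here)
```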
35. Functions and Loops
• Function
functionajay = function(a) (a^2 + 2*a + 1)
Hint: Always match brackets
Each ( deserves a )
Each { deserves a }
Each [ deserves a ]
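The slide title also mentions loops, so here is a small sketch that calls the function above inside a for loop. The results vector and the sapply() alternative are additions for illustration, not part of the original demo.

```r
# The function from the slide: a^2 + 2a + 1, which equals (a + 1)^2
functionajay <- function(a) {
  a^2 + 2 * a + 1
}

# A for loop that applies the function to 1, 2, 3
results <- numeric(3)
for (i in 1:3) {
  results[i] <- functionajay(i)
}
print(results)

# The same thing without an explicit loop
results_apply <- sapply(1:3, functionajay)
print(results_apply)
```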
36. Demo-
Basic Objects on R Console
This is made clearer in the next slide
Functions-
class() gives the class
dim() gives the dimensions
nrow() gives the number of rows
ncol() gives the number of columns
length() gives the length
str() gives the structure
Hint- Up arrow gives you the last typed command
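These inspection functions can be tried on the built-in mtcars data frame, used here only as a convenient example object.

```r
# Inspect a built-in data frame
obj_class <- class(mtcars)    # what kind of object is it?
obj_dim   <- dim(mtcars)      # rows, then columns
n_rows    <- nrow(mtcars)
n_cols    <- ncol(mtcars)

# For a data frame, length() counts columns; for a vector, elements
len_df  <- length(mtcars)
len_vec <- length(mtcars$mpg)

print(obj_class)
print(obj_dim)

# str() prints a compact summary of every column
str(mtcars)
```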
38. Demo-
Datasets on R Console
Hint- use data() to list the datasets available in loaded packages
library(FOO) loads the package "FOO"
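A small sketch of finding and loading a dataset. It uses the datasets package, which ships with R, in place of the placeholder FOO.

```r
# Load a package (datasets is attached by default, shown for illustration)
library(datasets)

# data() with no arguments lists datasets in all loaded packages.
# Restricting it to one package returns the names programmatically.
available <- data(package = "datasets")$results[, "Item"]
print(head(available))

# Load one dataset by name and look at the first rows
data(mtcars)
print(head(mtcars, 3))
```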
40. Day 1 Session 2
– Recap
• Input of Data
• Inspecting Data Quality
• Investigating Data Issues
– Demos in R
• Data Input
• Data Quality
• Data Exploration
• Break 2
– Questions
42. Statistical formats
• read.spss from foreign package
• read.sas7bdat from sas7bdat package
43. From Databases
The RODBC package provides access to databases through
an ODBC interface.
The primary functions are
• odbcConnect(dsn, uid="", pwd="") Open a connection
to an ODBC database
• sqlFetch(channel, sqltable) Read a table from an ODBC
database into a data frame
Hint- a good site to learn R
http://www.statmethods.net
45. From Web (aka Web Scraping)
• readLines()
Hint: R is case sensitive; readlines is not the same as readLines
Hint: Use head() and tail() to inspect objects
Other packages are XML and RCurl
Case Study- http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
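A minimal sketch of readLines() with head() and tail(). To keep it runnable offline, it reads a small temporary file with made-up HTML; passing a URL string to readLines() instead reads a web page line by line in the same way.

```r
# Write a tiny, made-up HTML page to a temporary file
page_file <- tempfile(fileext = ".html")
writeLines(c("<html>", "<body>", "<p>Hello R</p>", "</body>", "</html>"),
           page_file)

# readLines() returns one character string per line.
# With a web page, pass the URL as a string instead of a file path.
page <- readLines(page_file)

# Inspect the start and the end of the result
print(head(page, 2))
print(tail(page, 2))

# Find the lines that contain a paragraph tag
para_lines <- page[grepl("<p>", page, fixed = TRUE)]
print(para_lines)
```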
49. Data Selection
• object[l,m] gives the value in row l and column m
• object[l,] gives all the values in row l
• object$varname gives all values of varname
• subset() helps in selection
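The four selection styles above, tried on the built-in mtcars data frame (chosen only as an example object):

```r
# One cell: row 1, column 2
one_value <- mtcars[1, 2]
print(one_value)

# One whole row: leave the column index empty
first_row <- mtcars[1, ]
print(first_row)

# One whole column by name
mpg_values <- mtcars$mpg
print(head(mpg_values))

# subset() keeps the rows that meet a condition
four_cyl <- subset(mtcars, cyl == 4)
print(nrow(four_cyl))
```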
50. Data Selection: Demo
Questions-
How do I use multiple conditions (AND, OR)?
Can I do away with the subset function?
How do I select a random sample?
Useful Link- http://decisionstats.com/2013/11/24/50-functions-to-clear-a-basic-interview-for-business-analytics-rstats/
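One possible way to approach the three questions, again using mtcars as a stand-in dataset. The seed value is arbitrary and only makes the random sample repeatable.

```r
# Multiple conditions: & means AND, | means OR
both   <- subset(mtcars, cyl == 4 & gear == 4)
either <- subset(mtcars, cyl == 4 | gear == 5)

# Without subset(): put the logical condition in the row index
no_subset <- mtcars[mtcars$cyl == 4 & mtcars$gear == 4, ]

# Random sample: pick 5 row numbers at random, then select those rows
set.seed(42)   # arbitrary seed so the sample is repeatable
picked <- sample(nrow(mtcars), 5)
random_rows <- mtcars[picked, ]

print(nrow(both))
print(nrow(either))
print(random_rows)
```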
51. Day 2 Session 1
– Revision
• Exploring Data
• Manipulating Data
• Visualization of Data
• Demos in R
• Data Exploration,
• Data Manipulation,
• Data Visualizations
• Break 1
– Questions
52. Good coding practices
• Use # for comment
• Use git for version control
• Use RStudio for multiple lines of code
53. Functions in R
• custom functions
• source code for a function
• Understanding help ? , ??
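A brief sketch of the three points above. The function cube is a made-up example; the help calls are commented out because they open the help viewer in an interactive session.

```r
# A custom function (hypothetical example)
cube <- function(x) {
  x^3
}
cube_of_3 <- cube(3)
print(cube_of_3)

# Typing a function's name without brackets prints its source code
print(cube)
print(sd)

# args() shows just the arguments a function accepts
print(args(sd))

# ?name opens the help page for a known function;
# ??word searches all help pages for a word
# ?mean
# ??regression
```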
78. Data Exploration
• summary()
• table()
• describe() (Hmisc)
• summarize() (Hmisc)
Hint- Try this code please
summary(mtcars)
table(mtcars$cyl)
library(Hmisc)
describe(mtcars)
summarize(mtcars$mpg,mtcars$cyl,mean)
CLASS WORK-
• Use the table command for two variables
• Summarize mtcars$mpg by two variables (cyl, gear)
• Try to find the min and max for the same
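One possible approach to the class work, sketched with base R only. Here aggregate() is swapped in for the Hmisc summarize() function so that the example runs without extra packages.

```r
# Two-way table: counts of cars for each cyl and gear combination
two_way <- table(mtcars$cyl, mtcars$gear)
print(two_way)

# Mean mpg for each cyl and gear combination (base R alternative)
mean_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
print(mean_mpg)

# Min and max mpg for the same groups
min_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = min)
max_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = max)
print(min_mpg)
print(max_mpg)
```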
79. Data Exploration
• missing values are represented by NA in R
• Demo
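– is.na() tests each value for being missing
– na.omit() drops the missing values
– na.rm = TRUE is an argument that tells functions such as mean() to skip missing values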
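A short demo of the three tools on a made-up vector with one missing value:

```r
# A vector with one missing value (hypothetical data)
x <- c(10, NA, 30, 40)

# is.na() flags each element; sum() counts the flags
print(is.na(x))
n_missing <- sum(is.na(x))

# Most summary functions return NA if any value is missing
plain_mean <- mean(x)

# na.rm = TRUE skips the missing values
clean_mean <- mean(x, na.rm = TRUE)

# na.omit() returns the data with missing values dropped
x_complete <- na.omit(x)

print(plain_mean)
print(clean_mean)
print(length(x_complete))
```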
80. Data Visualization
Notes-
Explaining Basic Types of Graphs
Customizing Graphs
Graph Output
Advanced Graphs
Facets,
Grammar of Graphics
Data Visualization Rules
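A minimal sketch of basic graph types, customization and graph output using base R graphics only. Facets and the grammar of graphics are usually demonstrated with the ggplot2 package, which is not used here. The output file is a temporary PDF chosen for illustration.

```r
# Send graphs to a PDF file instead of the screen (graph output)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Histogram of one numeric variable, with a custom title, label and colour
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg", col = "grey")

# Scatter plot of two numeric variables
plot(mtcars$wt, mtcars$mpg, main = "Weight vs mpg",
     xlab = "Weight (1000 lbs)", ylab = "mpg", pch = 19)

# Box plot of a numeric variable split by a grouping variable
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinders",
        xlab = "Cylinders", ylab = "mpg")

# Close the device so the file is written
dev.off()

file_written <- file.exists(out_file)
print(file_written)
```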
84. Date Manipulation
Use ? help generously
Hit escape to escape the + signs
+ signs occur due to unclosed quotes or brackets
Class Work
What is your age in days as of today?
What is your age in weeks as of today?
Hint (dob2 holds your date of birth as a Date)-
> age2=difftime(Sys.Date(),dob2,units='weeks')
> age2
Time difference of 1959.286 weeks
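A self-contained sketch of the class work. The date of birth is a made-up value, and a fixed reference date is used so the result is repeatable; replace it with Sys.Date() to get your age as of today.

```r
# Hypothetical date of birth; as.Date() converts text to a Date
dob <- as.Date("1980-01-01")

# Fixed reference date for a repeatable result; use Sys.Date() for today
ref_date <- as.Date("2020-01-01")

# difftime() returns the gap in the units requested
age_days  <- difftime(ref_date, dob, units = "days")
age_weeks <- difftime(ref_date, dob, units = "weeks")

print(age_days)
print(age_weeks)

# as.numeric() strips the units when a plain number is needed
days_number <- as.numeric(age_days)
```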
85. Data Output
• Graphical Output
• Numerical Output (aggregation)
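A small sketch of numerical output: aggregate the data, then write the result to a file. The temporary CSV file and the choice of mtcars are for illustration only.

```r
# Aggregation: mean mpg for each number of cylinders
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Write the aggregated table to a CSV file
csv_file <- tempfile(fileext = ".csv")
write.csv(mpg_by_cyl, csv_file, row.names = FALSE)

# Read it back to confirm what was written
check <- read.csv(csv_file)
print(check)
```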
103. Data Mining
• Brief Introduction
– Affinity analysis is a data analysis and data mining technique that
discovers co-occurrence relationships among activities performed by (or
recorded about) specific individuals or groups. In general, this can be
applied to any process where agents can be uniquely identified and
information about their activities can be recorded. In retail, affinity
analysis is used to perform market basket analysis, in which retailers seek
to understand the purchase behavior of customers. This information can
then be used for purposes of cross-selling and up-selling.
104. Rattle
• Brief Introduction
– market basket analysis
– Market basket analysis might tell a retailer that customers often
purchase shampoo and conditioner together, so putting both items on
promotion at the same time would not create a significant increase in
revenue, while a promotion involving just one of the items would likely
drive sales of the other.
105. Rattle
• Brief Introduction
– association rules
– if butter and bread are bought, customers also buy milk
Example database with 4 items and 5 transactions
transaction ID   milk   bread   butter   beer
1                1      1       0        0
2                0      0       1        0
3                0      0       0        1
4                1      1       1        0
5                0      1       0        0
106. Rattle
• Brief Introduction
– association rules
– the rule {milk, bread} => {butter} has a support of 20% since the itemset {milk, bread, butter} occurs in 20% of all
transactions (1 out of 5 transactions).
– the rule {milk, bread} => {butter} has a confidence of 50% since butter is bought in 50% of the
transactions that contain milk and bread (1 out of 2 transactions).
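The support and confidence figures can be checked by hand in base R. This is only a sketch of the arithmetic on the 5-transaction example; packages such as arules automate rule mining on real data.

```r
# The example database: one row per transaction, 1 = item bought
transactions <- matrix(
  c(1, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
    1, 1, 1, 0,
    0, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5), c("milk", "bread", "butter", "beer"))
)

# Transactions containing the left-hand side {milk, bread}
has_lhs <- transactions[, "milk"] == 1 & transactions[, "bread"] == 1

# Transactions containing the whole rule {milk, bread, butter}
has_rule <- has_lhs & transactions[, "butter"] == 1

# Support: share of all transactions that contain the whole itemset
support <- sum(has_rule) / nrow(transactions)

# Confidence: share of {milk, bread} transactions that also have butter
confidence <- sum(has_rule) / sum(has_lhs)

print(support)
print(confidence)
```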