Data Analytics with R and SQL Server
Stéphane Fréchette
Thursday March 19, 2015
Who am I?
My name is Stéphane Fréchette
SQL Server MVP | Consultant | Speaker | Data & BI Architect | Big Data
| NoSQL | Data Science. Drums, good food and fine wine.
I have a passion for architecting, designing and building solutions that
matter.
Twitter: @sfrechette
Blog: stephanefrechette.com
Email: stephanefrechette@ukubu.com
Topics
• What is R?
• Should I use R?
• Data Structures
• Graphics
• Data Manipulation in R
• Connecting to SQL Server
• Demos
• Resources
• Q&A
DISCLAIMER
This is neither a course nor a tutorial, but
an introduction, a walkthrough to
inspire you to further explore and
learn more about R and statistical computing
“Analysis of data is a process of inspecting, cleaning,
transforming, and modeling data with the goal of
discovering useful information, suggesting conclusions,
and supporting decision-making. Data analysis has
multiple facets and approaches, encompassing diverse
techniques under a variety of names, in different business,
science, and social science domains.”
- Wikipedia
What is R?
• A programming language and environment for statistical computing and graphics
• R has its origins in the S programming language, created in the 1970s
• Best used to manipulate moderately sized datasets, do statistical analysis and
produce data-centric documents and presentations
• Additional tools are distributed as packages, which any user can download to
customize the R environment
• Cross-platform: runs on Mac, Windows and Unix-based systems
Should I use R?
[Flowchart: "Are you doing statistics?" with No / Yes branches]
Where “statistics” can mean machine learning, predictive analytics, data
science, anything that falls under a rather broad umbrella…
But if you have some data that makes sense to represent in a tabular
structure, and you want to do some analytical or statistical work with it, R is
definitely a good choice…
Downloading and Installing R
http://www.r-project.org/ http://www.rstudio.com/
The IDE (RStudio)
The RStudio window has four panes:
1. View Files and Data
2. See Workspace and History
3. See Files, Plots, Packages and Help
4. Console
Installing Packages
• To use a package in R, one must first install it using the install.packages
function
• This downloads the package from CRAN and installs it so that it is ready to be used
Loading Packages
• To use a particular package in your current R session, one must load it into the
R environment using the library or require function
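As a rough sketch of the install and load steps described above: the install call is commented out so the snippet runs offline, `dplyr` is only an example package name, and `notInstalledPkg` is a made-up name used to show what require does when a package is missing.

```r
# One-time installation from CRAN (needs network access), for example:
# install.packages("dplyr")

# library() attaches a package and stops with an error if it is missing.
library(stats)   # 'stats' ships with R, so this call always succeeds

# require() returns TRUE or FALSE instead of raising an error,
# which is handy when a package is optional.
has_pkg <- suppressWarnings(
  require("notInstalledPkg", character.only = TRUE, quietly = TRUE)
)
print(has_pkg)   # FALSE unless a package with that made-up name exists
```

In scripts that cannot continue without a package, library is usually the safer choice because it fails early and loudly.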
Common Data Structures in R
To make the best of the R language, one needs a strong understanding of the
basic data types and data structures and how to operate and use them.
R has a wide variety of data types including scalars, vectors (numerical,
character, logical), matrices, data frames, and lists…
To understand computations in R, two slogans are helpful:
• Everything that exists is an object
• Everything that happens is a function call
- John Chambers, creator of the S programming language and core member of the R programming language project
Data Structures - Vectors
The simplest structure is the numeric vector, which is a single entity consisting of an ordered
collection of numbers.
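A minimal sketch of working with a numeric vector; the variable names and values are invented for illustration.

```r
# c() combines values into a vector
heights <- c(1.62, 1.75, 1.80, 1.68)

length(heights)      # number of elements
heights[2]           # indexing starts at 1
heights * 100        # arithmetic is applied element by element
mean(heights)        # most statistical functions take a vector directly

# A logical condition can be used as an index to keep matching elements
tall <- heights[heights > 1.70]
print(tall)
```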
Data Structures - Matrices
Matrices are nothing more than 2-dimensional vectors. To define a matrix, use the function
matrix.
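For instance, a small matrix could be built and inspected like this (the values are arbitrary):

```r
# matrix() fills column by column unless byrow = TRUE is given
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

dim(m)        # number of rows and columns
m[2, 3]       # element in row 2, column 3
m[, 1]        # the whole first column
t(m)          # transpose (3 rows, 2 columns)
m %*% t(m)    # matrix multiplication, giving a 2 x 2 result
```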
Data Structures - Data frames
Tabular data such as time series is often stored in data frames. A data frame looks like a
matrix with named columns, but each column can hold a different type (numbers, text, dates).
This is nice, because you can refer to a column by name without knowing its position.
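A short sketch of a data frame with columns of different types; the `sales` data is invented for illustration.

```r
sales <- data.frame(
  month   = c("Jan", "Feb", "Mar"),
  revenue = c(1200, 1500, 900),
  stringsAsFactors = FALSE   # keep text columns as character
)

sales$revenue                   # access a column by name
sales[["month"]]                # same idea, using [[ ]]
sales[sales$revenue > 1000, ]   # keep only the rows matching a condition
str(sales)                      # compact overview of columns and types
```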
Data Structures - Lists
An R list is an object consisting of an ordered collection of objects known as its components.
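Unlike a vector, the components of a list can have different types and lengths. A small illustrative example (names and values are made up):

```r
person <- list(
  name   = "Ada",
  scores = c(90, 85),
  active = TRUE
)

person$name          # access a component by name
person[["scores"]]   # same, using [[ ]]
person$scores[2]     # index into a component
length(person)       # number of components
names(person)
```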
Data Structures - Date and Time
Sys.time() # returns the current system date and time
Data Structures - Date and Time
Two main (internal) formats for date-time are: POSIXct and POSIXlt
• POSIXct: A compact format that stores a date-time as the number of seconds since 1970-01-01 UTC; typically used to store date-time columns in a data frame
• POSIXlt: A list-based format from which the sub-units of time (seconds, minutes, hours, day, month, year) can be extracted
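The difference between the two formats can be sketched as follows; the sample timestamps are arbitrary, and the time zone is fixed to UTC so the results do not depend on the machine.

```r
now_ct <- Sys.time()
class(now_ct)                 # Sys.time() returns a POSIXct value

# POSIXct is a number of seconds since 1970-01-01 00:00:00 UTC
one_day <- as.numeric(as.POSIXct("1970-01-02", tz = "UTC"))
print(one_day)

# POSIXlt exposes the individual components of a date-time
lt <- as.POSIXlt("2015-03-19 14:30:00", tz = "UTC")
lt$hour
lt$min
lt$year + 1900                # years are stored as an offset from 1900
lt$mon + 1                    # months are stored as 0 to 11
```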
Data Structures - Others
Other useful and important data types
• NULL: Typically used for initializing variables. (x = NULL) creates a variable x of length zero.
The function is.null() returns TRUE or FALSE and tells whether a variable is NULL or not.
• NA: Used for denoting missing values. (x = NA) creates a variable x holding a missing value.
The function is.na() returns TRUE or FALSE for each element and tells whether it is NA or not.
• NaN: NaN stands for “Not a Number” (for example, the result of 0/0). Some operations that produce it, such as sqrt(-1), print a warning message in the console. The function
is.nan() lets you check whether a value is NaN or not.
• Inf: Inf stands for “Infinity”. (x = 10/0 ; y = -3/0) sets the value of x to Inf and y to -Inf. The
function is.infinite() lets you check whether a value is infinite; is.finite() checks the opposite and is also FALSE for NA and NaN.
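A quick sketch of how these special values behave; the variable names are arbitrary.

```r
x <- NULL
is.null(x)                 # TRUE
length(x)                  # 0

y <- c(1, NA, 3)
is.na(y)                   # FALSE TRUE FALSE
mean(y)                    # NA, because one value is missing
mean(y, na.rm = TRUE)      # 2, missing values are dropped first

z <- 0 / 0                 # NaN, produced without a warning
is.nan(z)                  # TRUE

w <- c(10 / 0, -3 / 0, 5)  # Inf, -Inf, 5
is.infinite(w)             # TRUE TRUE FALSE
is.finite(w)               # FALSE FALSE TRUE
```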
Graphics
One of the main reasons data analysts and data
scientists turn to R is its strong graphics
capabilities.
Basic Graphs:
• These include density plots (histograms and kernel
density plots), dot plots, bar charts (simple,
stacked, grouped), line charts, pie charts (simple,
annotated, 3D), boxplots (simple, notched, violin
plots, bagplots) and scatter plots (simple, with fit
lines, scatterplot matrices, high density plots, and
3D plots).
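A few of these basic graphs can be sketched with base R alone. The built-in `mtcars` dataset is used here purely as sample data; it is an assumption of this example, not something the slides refer to.

```r
# Histogram; hist() also returns the bin counts it computed
h <- hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Kernel density plot
plot(density(mtcars$mpg), main = "Kernel density of mpg")

# Simple bar chart of the number of cars per cylinder count
barplot(table(mtcars$cyl), xlab = "Cylinders", ylab = "Number of cars")

# Boxplot of mpg grouped by cylinder count
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "mpg")

# Simple scatter plot
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")
```

When run non-interactively (for example with Rscript), R writes these plots to a file instead of opening a window.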
Graphics
Advanced Graphs:
• Graphical parameters let you change a
graph's symbols, fonts, colors, and lines. Axes and
text can be customized by adjusting a graph's axes and adding
reference lines, text annotations and a legend.
Multiple plots can be combined
into a single graph.
• The lattice package provides a comprehensive
system for visualizing multivariate data, including
the ability to create plots conditioned on one or
more variables. The ggplot2 package offers an
elegant system for generating univariate and
multivariate graphs based on a grammar of
graphics.
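The first bullet can be illustrated with base graphics only; lattice and ggplot2 are not shown here. This is a sketch that again assumes the built-in `mtcars` dataset as sample data.

```r
# Combine two plots side by side; par() returns the previous settings
old_par <- par(mfrow = c(1, 2))

# Graphical parameters: plotting symbol (pch) and color (col)
plot(mtcars$wt, mtcars$mpg, pch = 19, col = "steelblue",
     xlab = "Weight (1000 lbs)", ylab = "mpg", main = "mpg vs weight")

# Reference line from a linear fit, plus a legend
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, lty = 2, col = "red")
legend("topright", legend = "Linear fit", lty = 2, col = "red")

# Second panel
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "mpg", main = "mpg by cylinders")

# Restore the previous graphical settings
par(old_par)
```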
Data Manipulation in R
dplyr is an R package for fast and easy data manipulation.
Data manipulation often involves common tasks, such as selecting certain variables, filtering
on certain conditions, deriving new variables from existing variables, and so forth. If we
think of these tasks as “verbs”, we can define a grammar of sorts for data manipulation.
In dplyr the main verbs (or functions) are:
• filter: select a subset of the rows of a data frame
• arrange: reorder the rows of a data frame (like filter it works on rows, but it
sorts them instead of selecting them)
• select: select columns of a data frame
• mutate: add new columns to a data frame that are functions of existing columns
• summarize: summarize values
• group_by: describe how to break a data frame into groups of rows
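The verbs above can be chained into a single pipeline. The sketch below assumes dplyr is installed (it is skipped otherwise) and uses the built-in `mtcars` dataset as sample data; the `kpl` column and the conversion factor are illustrative choices, not part of the original demo.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  by_cyl <- mtcars %>%
    filter(hp > 100) %>%                # keep rows matching a condition
    select(mpg, cyl, wt) %>%            # keep only some columns
    mutate(kpl = mpg * 0.425) %>%       # derive a column (approx. km per litre)
    group_by(cyl) %>%                   # split the rows into groups
    summarize(mean_kpl = mean(kpl),     # one summary row per group
              n = n()) %>%
    arrange(desc(mean_kpl))             # sort the result

  print(by_cyl)
}
```

Each verb takes a data frame and returns a data frame, which is what makes this kind of chaining possible.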
Demo
[dplyr – manipulating data]
Connecting R and SQL Server
The RODBC package provides access to databases (including Microsoft Access
and Microsoft SQL Server) through an ODBC interface
Function                                                     Description
odbcConnect(dsn, uid = "", pwd = "")                         Open a connection to an ODBC database
sqlFetch(channel, sqtable)                                   Read a table from an ODBC database into a data frame
sqlQuery(channel, query)                                     Submit a query to an ODBC database and return the results
sqlSave(channel, mydf, tablename = sqtable, append = FALSE)  Write or update (append = TRUE) a data frame to a table in the ODBC database
sqlDrop(channel, sqtable)                                    Remove a table from the ODBC database
close(channel)                                               Close the connection
RODBC Example
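The original example on this slide is not available in the text, so the following is only a sketch of typical RODBC usage. It assumes the RODBC package is installed and that an ODBC data source pointing at a SQL Server database exists; the helper name `run_query`, the DSN `MySqlServerDsn` and the table `dbo.Orders` are hypothetical.

```r
# Hypothetical helper: open a connection, run one query, always close again.
run_query <- function(dsn, query, uid = "", pwd = "") {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  if (!inherits(channel, "RODBC")) {
    stop("Could not connect to ODBC data source: ", dsn)
  }
  # Close the connection when the function exits, even after an error
  on.exit(RODBC::odbcClose(channel), add = TRUE)

  # Returns the result set as a data frame
  RODBC::sqlQuery(channel, query, stringsAsFactors = FALSE)
}

# Usage (needs a real DSN, so it is left commented out):
# orders <- run_query("MySqlServerDsn", "SELECT TOP 10 * FROM dbo.Orders")
# head(orders)
```

Wrapping the calls in a function keeps the connection handling in one place, so the connection is not left open if the query fails.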
Other interface
The RJDBC package provides access to databases through a JDBC interface.
(requires JDBC driver from Microsoft)
Demo
[Let’s analyze - R and SQL Server]
Resources
• The R Project for Statistical Computing http://www.r-project.org/
• RStudio http://www.rstudio.com/
• Revolution Analytics http://www.revolutionanalytics.com/
• Shiny http://shiny.rstudio.com/
• {swirl} Learn R, in R http://swirlstats.com/
• R-bloggers http://www.r-bloggers.com/
• Online R resources for Beginners http://bit.ly/1x2q6Gl
• 60+ R resources to improve your data skills http://bit.ly/1BzW4ox
• Stack Overflow - R http://stackoverflow.com/tags/r
• Cerebral Mastication - R Resources http://bit.ly/17YhZj4
• Microsoft JDBC Drivers 4.1 and 4.0 for SQL Server http://bit.ly/1kEgJ7O
What Questions Do You Have?
Thank You
For attending this session
