R Intro, Week 1
Scott Chamberlain [modified from Haldre Rogers]
September 9, 2011
Don't just listen to me! Other intros to R:
  • http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
  • http://www.cyclismo.org/tutorial/R/
  • http://www.r-tutor.com/r-introduction
  • Quick-R: http://www.statmethods.net/
  • http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf
R user frameworks
  • R from the command line (OS X and PC): just type "R" into the command line, and have fun!
  • R itself: http://www.r-project.org/
  • RStudio, a good choice: http://www.rstudio.org/
  • RevolutionR [free academic version], sort of the SAS-ised version of R: http://www.revolutionanalytics.com/downloads/free-academic.php
      • Uses a proprietary .xdf file format that speeds up computation times
  • Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
  • If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red R
      • With these interfaces you can learn which code runs when you press each button
R user frameworks, cont.
  • R from Python: RPy, http://rpy.sourceforge.net/
  • C++ from R: Rcpp package
      • http://cran.r-project.org/web/packages/Rcpp/index.html
      • http://dirk.eddelbuettel.com/code/rcpp.html
      • Can hugely speed up computation times by writing functions in C++; the R function then calls the compiled code instead of running in R
      • E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html & http://dirk.eddelbuettel.com/code/rcpp.examples.html
  • Excel from R: XLConnect package, http://cran.r-project.org/web/packages/XLConnect/index.html
  • And more... see for yourself
R Tips
  • R can crash, so do not use R's built-in text editor or write code solely in the R console. Instead, use any text editor that integrates with R. See here for links: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
  • When asking for help on mailing lists or help websites, use BRIEF and REPRODUCIBLE examples
      • Not doing this makes people not want to help you!
  • R overwrites an existing file with the same file name without asking!!!! Make sure you want to overwrite a file before doing so
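One way to guard against the silent-overwrite problem is to check for the file before writing. This is only a sketch: `safe_write_csv` is a made-up helper name, not a base R function, and the demo writes to a temporary directory so nothing real is touched.

```r
# Hypothetical helper: write a CSV only if the target file does not exist yet.
safe_write_csv <- function(df, path) {
  if (file.exists(path)) {
    stop("Refusing to overwrite existing file: ", path)
  }
  write.csv(df, path, row.names = FALSE)
  invisible(path)
}

out <- file.path(tempdir(), "safe_write_demo.csv")
unlink(out)                       # start clean so the demo can be re-run
safe_write_csv(head(iris), out)   # first write succeeds

# The second write hits the guard; try() captures the error instead of halting.
second <- try(safe_write_csv(head(iris), out), silent = TRUE)
was_refused <- inherits(second, "try-error")
was_refused
```

Plain `write.csv(df, path)` would have replaced the file on the second call with no message at all.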
Style
Not this kind of style…
This kind of style!!!
Style
  • Style is important so YOU and OTHERS can read your code and actually use it
  • Google style guide: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
  • Henrik Bengtsson style guide: http://www1.maths.lth.se/help/R/RCC/
  • Hadley Wickham's style guide: https://github.com/hadley/devtools/wiki/Style
Preparing your data for R
What makes clean data?
  • Correct spelling
  • Identical capitalization (e.g., Premna vs. premna)
      • If myvector <- c(3, 4, 5), calling Myvector does not work!
  • No spaces between words (spaces in column names are turned into ".")
      • Generally try to avoid spaces; use underscores instead
  • NA or blank (if using csv) for missing values
  • Find and replace to get rid of spaces after words
  • I generally keep both an .xls and a .csv file, so I can always recreate work in R from the .csv file and still modify the .xls file
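A small sketch of what these cleanliness rules look like in practice. The CSV text, the column names, and the object name `plants` are invented for illustration; `strip.white = TRUE` is one option for trailing spaces, not the only one.

```r
# R is case sensitive: myvector and Myvector are different names.
myvector <- c(3, 4, 5)
has_lower <- exists("myvector")
has_upper <- exists("Myvector")
c(has_lower, has_upper)

# A tiny made-up CSV with a space in a header, a trailing space after a word,
# inconsistent capitalization, and NA / blank cells for missing counts.
csv_text <- "site name,species,count\nA,Premna ,3\nB,premna,NA\nC,Premna,\n"
plants <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)

names(plants)            # the space in "site name" is turned into "."
plants$count             # NA and blank both come in as missing values
unique(plants$species)   # "Premna" and "premna" count as two different values

# One possible fix: standardize capitalization before analysis.
plants$species <- tolower(plants$species)
n_species <- length(unique(plants$species))
n_species
```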
Bringing data into R
  • Create a csv file
      • One worksheet only
      • No special formatting, filters, comments, etc.
      • Copy only the columns and rows with your data to the CSV, as R will sometimes read in columns without data
  • Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
  • In R, set the working directory: setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
      • What is the working directory? getwd()
      • What is in the working directory? dir()
  • Read in data
      • CSV files: iris.df <- read.csv("iris_df.csv", header=TRUE)
      • Clipboard: read.csv("clipboard") reads in data as if cutting and pasting it (this form works on Windows; on OS X use read.csv(pipe("pbpaste")))
      • From web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
      • From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet="Sheet1")
  • Write data
      • write.csv(dataframe, "dataframename.csv"), OR
      • save(iris, file="iris.RData") [and load("iris.RData") to open in R]
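To try the read and write steps without depending on the paths on the slide, here is a round trip through a temporary directory using R's built-in iris data. The file names and the `iris_copy` object are placeholders chosen for this sketch.

```r
demo_dir   <- tempdir()
csv_path   <- file.path(demo_dir, "iris_df.csv")
rdata_path <- file.path(demo_dir, "iris.RData")

# Write the built-in iris data to CSV, then read it back in.
write.csv(iris, csv_path, row.names = FALSE)
iris.df <- read.csv(csv_path, header = TRUE)
dim(iris.df)

# save() needs the file= argument named; load() restores the object by name.
iris_copy <- iris
save(iris_copy, file = rdata_path)
rm(iris_copy)
load(rdata_path)
restored <- exists("iris_copy")
restored
```

Passing `row.names = FALSE` keeps write.csv from adding an extra column of row numbers, which would otherwise come back as a column named `X`.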
R data structures
  • Scalar: object with a single value, either numeric or character (in R, simply a vector of length one)
  • Vector: sequence of values of a single type (e.g., numeric or character), which can include NA
  • List: arbitrary collection of objects, a very useful R object
  • Character: text, e.g., "this is some text"
  • Factor: like character vectors, but only with values in predefined "levels"
  • Matrix: all values must be of the same type (e.g., all numeric)
  • Dataframe: each column can be of a different class
  • Immutable dataframe: special dataframe used in the plyr package for faster dataframe manipulation; it references the original dataframe for faster calculations
  • Function
  • Environment
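A quick sketch that builds one of each of the common structures, so the differences can be checked at the console. All object names here are made up for the example.

```r
x_scalar <- 3.14                          # a "scalar" is a vector of length one
x_vector <- c(1.5, NA, 3)                 # numeric vector with a missing value
x_mixed  <- c(1, "a")                     # mixing types coerces everything to character
x_list   <- list(n = 1:3, label = "abc")  # a list can hold objects of different kinds
x_factor <- factor(c("low", "high", "low"), levels = c("low", "high"))
x_matrix <- matrix(1:6, nrow = 2)         # every cell has the same type
x_df     <- data.frame(id = 1:3, grp = x_factor)  # columns may differ in class

length(x_scalar)
class(x_mixed)
levels(x_factor)
dim(x_matrix)
col_classes <- sapply(x_df, class)
col_classes
```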
Exploring dataframes
  • str(dataframe) gives column formats and dimensions
  • head(dataframe) and tail(dataframe) give the first and last 6 rows
  • names(dataframe) gives column names
  • row.names(dataframe) gives row names
  • attributes(dataframe) gives column and row names and object class
  • summary(dataframe) gives a lot of good information
  • Make sure variables are in the appropriate form
      • Character/string, numeric, factor, integer, logical
  • Make sure mins, maxes, means, etc. seem right
  • Make sure you don't have typing errors that turn Premna and premna into two separate factor levels
      • Use unique(iris$species) to see all unique values of a column
      • Or use levels(spider$species) to see the different levels
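These checks can be tried on the iris data that ships with R. Note that in the built-in copy the column is `Species` with a capital S, which may differ from the lowercase name in the course's own CSV.

```r
str(iris)                 # column types and dimensions
n_head <- nrow(head(iris))  # head() returns 6 rows by default
tail(iris, 3)             # last rows; the second argument changes how many
names(iris)               # column names
summary(iris)             # min, max, mean, quartiles, and factor counts

sapply(iris, class)       # confirm each column has the class you expect

sp_values <- unique(iris$Species)   # every distinct value in the column
sp_levels <- levels(iris$Species)   # the factor's predefined levels
sp_levels
```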
To attach or not to attach... that is the question
  • Some like to use 'attach' to make dataframe variables accessible by name within the R session
  • Generally, 'attach' is frowned upon by R junkies
      • Use dataframe$y, or data=dataframe, or dataframe[, "y"], or dataframe[, 2]
  • To detach the object, use: detach()
  • I recommend: do not use attach, but do what you want
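The alternatives to attach listed above, side by side on the built-in iris data. The last line uses `with()`, which is not on the slide but is another base R way to avoid attach.

```r
m1 <- mean(iris$Sepal.Length)        # dataframe$y
m2 <- mean(iris[, "Sepal.Length"])   # dataframe[, "y"]
m3 <- mean(iris[, 1])                # dataframe[, column number]

# data=dataframe: the model looks up both variables inside iris.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit)

# with() evaluates an expression inside the dataframe for that one call only.
m4 <- with(iris, mean(Sepal.Length))
```

All four means are the same number; only the way of reaching the column differs.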
R Packages
  • 3,262 packages!!!!
  • Packages are extensions written by anyone for any purpose, usually installed with install.packages("packagename"), then loaded with require(packagename) or library(packagename)
  • Use ?functionname for help on any function in base R or in R packages
  • In RStudio, just press tab when in parentheses after the function name to see function options!!!
  • Explore packages at the CRAN site: http://cran.r-project.org/web/packages/
  • Inside-R package reference: http://www.inside-r.org/packages
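A minimal sketch of the install-then-load pattern that checks whether a package is already present before trying to load it. The choice of plyr is just an example; the sketch deliberately does not call install.packages() itself, since that needs a network connection.

```r
pkg <- "plyr"   # example package name; swap in whichever package you need

if (!requireNamespace(pkg, quietly = TRUE)) {
  # Not installed yet: this is where you would run install.packages(pkg).
  message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\")")
} else {
  library(pkg, character.only = TRUE)   # character.only lets library() take a string
}

# Packages that ship with R, such as stats, are always available.
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats
```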
Data manipulation
  • Packages: plyr, data.table, doBy, sqldf, reshape2, and more
  • Comparison of packages
      • Modified from code from the Recipes, scripts and genomics blog: https://gist.github.com/878919
      • data.table is by far the fastest!!! BUT, plyr may be ahead on ease of use and flexibility? See for yourself...
  • Also, see the examples in the tutorial code for the reshape2 package for neat data manipulation tricks
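For a feel of the kind of task these packages are compared on, here is a grouped summary done in base R only. This is a stand-in, not the benchmark code from the gist, and it assumes none of the packages above are installed.

```r
# Mean sepal length per species, returned as a dataframe.
by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
by_species

# The same summary with tapply(), returned as a named vector.
by_species_vec <- tapply(iris$Sepal.Length, iris$Species, mean)
by_species_vec
```

plyr, data.table, and the others express this same split, summarize, and recombine step with their own syntax, which is where the speed and ease-of-use trade-offs come in.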
Visualizations
  • A few different approaches:
      • Base graphics
      • Lattice graphics
      • Grid graphics
      • ggplot2 graphics
  • Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
  • An example: [plot not preserved]
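The plot from the original slide is not available, so here is a small base graphics sketch instead, using the built-in iris data. It is not the author's example. It draws to a PDF in a temporary directory; the file name is arbitrary.

```r
plot_path <- file.path(tempdir(), "iris_scatter.pdf")

pdf(plot_path)   # open a PDF graphics device
plot(iris$Petal.Length, iris$Petal.Width,
     col  = as.integer(iris$Species),   # one color per species
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)",
     main = "Iris petals by species")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 1)
dev.off()        # close the device so the file is written
```

In an interactive session, dropping the pdf() and dev.off() lines sends the plot to the screen instead.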
More on ggplot2 graphics
  • There are classes taught by Hadley Wickham here at Rice if you want to learn more!
      • Data visualization (Stat645): http://had.co.nz/stat645/
      • Statistical computing (Stat405): http://had.co.nz/stat405/
  • Hadley's website is really helpful: http://had.co.nz/ggplot2/
  • The ggplot2 Google Groups site: https://groups.google.com/forum/#!forum/ggplot2
QUICK RSTUDIO RUN THROUGH
  • Keyboard shortcuts!! http://www.rstudio.org/docs/using/keyboard_shortcuts
USE CASE HERE [see intro_usecase.R file]

R Introduction

  • 1. R IntroWeek 1 Scott Chamberlain [modified from Haldre Rogers] September 9, 2011
  • 2. Don’t just listen to me! Other Intros to R: http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf http://www.cyclismo.org/tutorial/R/ http://www.r-tutor.com/r-introduction Quick R: http://www.statmethods.net/ http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf
  • 3. R user frameworks R from command line: OSX and PC Just type “R” into the command line – and have fun! R itself http://www.r-project.org/ RStudio – good choice http://www.rstudio.org/ RevolutionR [free academic version] – this is sort of the SAS-ised version of R http://www.revolutionanalytics.com/downloads/free-academic.php Uses proprietary .xdf file format that speeds up computation times Many other ways to use R, including GUIs, other IDEs, and huge variety of text editors https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources If you are afraid of the code interface, use Rattle, or R Commander, or Deducer, or Red R You can learn using these interfaces what code does what after pressing buttons
  • 4. R user frameworks, cont.
      • R from Python: RPy, http://rpy.sourceforge.net/
      • C++ from R: Rcpp package, http://cran.r-project.org/web/packages/Rcpp/index.html and http://dirk.eddelbuettel.com/code/rcpp.html
          • Can hugely speed up computation times by writing functions in C++; the R function then calls the compiled code instead of running in R
          • E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html and http://dirk.eddelbuettel.com/code/rcpp.examples.html
      • Excel from R: XLConnect package, http://cran.r-project.org/web/packages/XLConnect/index.html
      • And more… see for yourself
  • 5. R Tips
      • R can crash. Do not use R’s built-in text editor or write code solely in the R console. Instead, use any text editor that integrates with R. See here for links: https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
      • When asking for help on listservs/help websites, use BRIEF and REPRODUCIBLE examples. Not doing this makes people not want to help you!
      • R automatically overwrites files with the same file name!!!! Make sure you want to overwrite a file before doing so.
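One way to guard against the silent overwriting mentioned in the tips is to check for the file before writing. This is only a sketch: the file name `results.csv` and the use of `tempdir()` and the built-in `iris` data are assumptions made for illustration, not part of the original slides.

```r
# Hypothetical output file in a temporary directory (assumption for this sketch)
out_path <- file.path(tempdir(), "results.csv")

# First write: creates the file with the first six rows of the built-in iris data
write.csv(head(iris), out_path, row.names = FALSE)

# Second write: only proceed if the file does not exist yet
if (file.exists(out_path)) {
  message("Not overwriting existing file: ", out_path)
} else {
  write.csv(iris, out_path, row.names = FALSE)
}
```

Without the `file.exists()` check, the second `write.csv()` call would replace the first file without any warning.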
  • 7. Not this kind of style…
  • 8. This kind of style!!!
  • 9. Style
      • Style is important so YOU and OTHERS can read your code and actually use it
      • Google style guide: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
      • Henrik Bengtsson style guide: http://www1.maths.lth.se/help/R/RCC/
      • Hadley Wickham's style guide: https://github.com/hadley/devtools/wiki/Style
  • 10. Preparing your data for R
      • What makes clean data?
          • Correct spelling
          • Identical capitalization (e.g., Premna vs. premna): if myvector <- c(3, 4, 5), calling Myvector does not work!
          • No spaces between words in column names (R turns spaces into “.”); generally avoid spaces and use underscores instead
          • NA or blank (if using csv) for missing values
          • Use find and replace to get rid of spaces after words
      • I generally keep both an .xls and a .csv file, so you can always recreate work in R with the .csv file and still modify the .xls file
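The points about capitalization and spaces can be checked directly in R. The sketch below uses a tiny made-up CSV (the column names and values are hypothetical) read from a string instead of a file.

```r
# Object names are case-sensitive
myvector <- c(3, 4, 5)
exists("myvector")   # the name as defined
exists("Myvector")   # a different capitalization is a different name

# Made-up CSV text with spaces in the header and inconsistent capitalization
csv_text <- "plant species,leaf length\nPremna,4.2\npremna,3.9\n"
plants <- read.csv(text = csv_text, header = TRUE)

# Spaces in the column names have been replaced by "."
names(plants)

# "Premna" and "premna" are treated as two different values
unique(plants$plant.species)
```

Here `names(plants)` shows the header after R has converted it, and `unique()` shows how a capitalization slip splits one species into two.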
  • 11. Bringing data into R
      • Create a csv file
          • One worksheet only
          • No special formatting, filters, comments, etc.
          • Copy only the columns and rows with your data to the CSV, as R will sometimes read in columns without data
      • Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
      • In R, set the working directory: setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
          • What is the working directory? getwd()
          • What is in the working directory? dir()
      • Read in data
          • CSV files: iris.df <- read.csv("iris_df.csv", header=T)
          • Clipboard: read.csv("clipboard") reads in the clipboard contents, like cutting and pasting a file (this works on Windows; on OS X use read.csv(pipe("pbpaste")))
          • From web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
          • From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet="Sheet1")
      • Write data: write.csv(dataframe, "dataframename.csv"), OR save(iris, file="iris.RData") [and load("iris.RData") to open in R]
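The read and write steps can be tried as a round trip. This sketch makes two assumptions: the built-in `iris` dataset stands in for the course file `iris_df.csv`, and files go to `tempdir()` so the working directory is left untouched.

```r
# Assumption: the built-in iris data stands in for the course's iris_df.csv
csv_path <- file.path(tempdir(), "iris_df.csv")

# Write the dataframe to CSV (row.names = FALSE avoids an extra index column)
write.csv(iris, csv_path, row.names = FALSE)

# Read it back; header = TRUE means the first row holds the column names
iris.df <- read.csv(csv_path, header = TRUE)
dim(iris.df)

# Save the object in R's own format, remove it, then load it back by name
rdata_path <- file.path(tempdir(), "iris.RData")
save(iris.df, file = rdata_path)
rm(iris.df)
load(rdata_path)
exists("iris.df")
```

Note that `save()` needs the `file =` argument name; an unnamed second argument is treated as another object to save.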
  • 12. R data structures
      • Scalar: Object with a single value, either numeric or character (in R, a vector of length one)
      • Vector: Sequence of values of a single type (e.g., numeric or character), possibly including NA
      • List: Arbitrary collections of variables; a very useful R object
      • Character: Text, e.g., “this is some text”
      • Factor: Like character vectors, but only with values in predefined “levels”
      • Matrix: Two-dimensional; all values must be of the same type (e.g., all numeric)
      • Dataframe: Each column can be of a different class
      • Immutable dataframe: Special dataframe used in the plyr package for faster dataframe manipulation; it references the original dataframe for faster calculations
      • Function
      • Environment
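A minimal sketch of how most of these structures are created in base R; all object names and values here are made up for illustration.

```r
x  <- 5                       # "scalar": really a numeric vector of length 1
v  <- c(1.5, NA, 3)           # numeric vector with a missing value
ch <- "this is some text"     # character
f  <- factor(c("low", "high", "low"), levels = c("low", "high"))  # factor
m  <- matrix(1:6, nrow = 2)   # matrix: every entry has the same type
l  <- list(number = x, text = ch, vec = v)   # list: mixed collection
df <- data.frame(id = 1:3, size = f)         # dataframe: columns may differ in class

length(x)          # length of the "scalar"
levels(f)          # the predefined levels of the factor
dim(m)             # rows and columns of the matrix
sapply(df, class)  # class of each dataframe column
```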
  • 13. Exploring dataframes
      • str(dataframe) gives column formats and dimensions
      • head(dataframe) and tail(dataframe) give the first and last 6 rows
      • names(dataframe) gives column names
      • row.names(dataframe) gives row names
      • attributes(dataframe) gives column and row names and object class
      • summary(dataframe) gives a lot of good information
          • Make sure variables are in the appropriate form: character/string, numeric, factor, integer, logical
          • Make sure mins, maxs, means, etc. seem right
      • Make sure you don’t have typing errors that turn Premna and premna into two separate factor levels
          • Use unique(iris$species) to see all unique values of a column
          • Or use levels(spider$species) to see the different levels
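These functions can be tried on the `iris` dataset that ships with R. One assumption to note: in the built-in data the column is named `Species` (capital S), which may differ from the course's own `iris_df.csv`.

```r
# Assumption: built-in iris data, whose species column is called "Species"
str(iris)              # column formats and dimensions
head(iris)             # first 6 rows
tail(iris, 3)          # last 3 rows (the default is 6)
names(iris)            # column names
summary(iris)          # mins, maxs, means, and counts per factor level

unique(iris$Species)   # all unique values in a column
levels(iris$Species)   # the levels of a factor column
```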
  • 14. To attach or not to attach… that is the question
      • Some like to use ‘attach’ to make dataframe variables accessible by name within the R session
      • Generally, ‘attach’ is frowned upon by R junkies. Use dataframe$y, or data=dataframe, or dataframe[, "y"], or dataframe[, 2]
      • To detach the object, use: detach()
      • I recommend: do not use attach, but do what you want
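The alternatives to attach can be compared side by side. This sketch assumes the built-in `iris` data; the variable names on the left are made up.

```r
# Three equivalent ways to reach one column without attach()
by_dollar <- mean(iris$Sepal.Length)        # dataframe$y
by_name   <- mean(iris[, "Sepal.Length"])   # dataframe[, "y"]
by_index  <- mean(iris[, 1])                # dataframe[, 1]: Sepal.Length is column 1

# Many modelling functions take the dataframe through a data argument instead
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

c(by_dollar, by_name, by_index)
```

All three means are computed from the same column, so they agree, and nothing is added to R's search path.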
  • 15. R Packages
      • 3,262 packages!!!!
      • Packages are extensions written by anyone for any purpose, usually installed with install.packages("packagename"), then loaded with library(packagename) or require(packagename)
      • Use ?functionname for help on any function in base R or in R packages
      • In RStudio, just press tab when in parentheses after the function name to see function options!!!
      • Explore packages at the CRAN site: http://cran.r-project.org/web/packages/
      • Inside-R package reference: http://www.inside-r.org/packages
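A common pattern is to install a package only when it is missing and then load it. In this sketch the `stats` package, which ships with R, is a stand-in so the code runs without a network connection; swap in any CRAN package name.

```r
# Assumption: "stats" stands in for whatever package you actually need
pkg <- "stats"

# Install only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load it; character.only = TRUE lets library() take the name from a variable
library(pkg, character.only = TRUE)

# Typing ?median at the console then opens the help page for that function
```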
  • 16. Data manipulation
      • Packages: plyr, data.table, doBy, sqldf, reshape2, and more
      • Comparison of packages, modified from code on the Recipes, scripts and Genomics blog: https://gist.github.com/878919
          • data.table is by far the fastest!!!
          • BUT plyr may win on ease of use and flexibility. See for yourself…
      • Also, see the examples in the tutorial code for the reshape2 package for neat data manipulation tricks
  • 17. Visualizations
      • A few different approaches: base graphics, lattice graphics, grid graphics, ggplot2 graphics
      • Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
      • An example:
  • 18. More on ggplot2 graphics
      • There are classes taught by Hadley Wickham here at Rice if you want to learn more!
          • Data visualization (Stat645): http://had.co.nz/stat645/
          • Statistical computing (Stat405): http://had.co.nz/stat405/
      • Hadley’s website is really helpful: http://had.co.nz/ggplot2/
      • The ggplot2 Google Groups site: https://groups.google.com/forum/#!forum/ggplot2
  • 19. QUICK RSTUDIO RUN THROUGH Keyboard shortcuts!! http://www.rstudio.org/docs/using/keyboard_shortcuts
  • 20. USE CASE HERE [see intro_usecase.R file]

Editor's notes

  1. header=T (short for header=TRUE) means the first row contains variable names
  2. Some numbers are actually factors: think of 0/1 for dead/alive, or zipcodes (what would an average zipcode mean?)
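The second note can be illustrated with a small made-up dataframe; the status codes and zipcodes below are hypothetical values, not course data.

```r
# Hypothetical data: 0/1 status codes and zipcodes stored as numbers
survival <- data.frame(
  status  = c(0, 1, 1, 0),
  zipcode = c(77005, 77030, 77005, 77098)
)

# R will happily average the zipcodes, but the result is meaningless
mean(survival$zipcode)

# Convert both columns to factors so R treats them as categories
survival$status  <- factor(survival$status, levels = c(0, 1),
                           labels = c("dead", "alive"))
survival$zipcode <- factor(survival$zipcode)

table(survival$status)   # counts per category instead of a mean
summary(survival)
```

After the conversion, `summary()` reports counts per level for both columns instead of quartiles and means.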