data.table talk
January 21, 2015
The data.table package
author: Pete Dodd date: 4 November, 2014
dataframes in R
What is a dataframe?
the default R object for holding tabular data
can mix numeric and text data
ordered/unordered factors
many statistical functions require dataframe inputs
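Here is a small illustration of the points above. The data.frame holds a numeric column, a text column and an ordered factor side by side. The name `mixed` and all values are made up for the example.

```r
# A data.frame can hold columns of different types side by side (made-up values)
mixed <- data.frame(
  number = c(1.5, 2.0, 3.25),                       # numeric
  letter = c("a", "b", "c"),                        # text (character in R >= 4.0, factor by default before)
  size   = factor(c("low", "high", "mid"),
                  levels = c("low", "mid", "high"),
                  ordered = TRUE)                   # ordered factor
)
str(mixed)

# Ordered factors compare according to their level order
mixed$size > "low"
```

The last line gives `FALSE TRUE TRUE`, because only "mid" and "high" come after "low" in the declared level order.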
dataframes in R
Problems:
printing (a large dataframe floods the console)
slow searching
verbose syntax
no concise built-in syntax for grouped aggregation
Which is most annoying depends on who you are...
Constructing data.tables
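library(data.table)
myDT <- data.table(
  number = 1:3,
  letter = c('a', 'b', 'c')
) # like the data.frame constructor
myDF <- data.frame(number = 1:3, letter = c('a', 'b', 'c'))
myDT2 <- as.data.table(myDF) # convert an existing data.frame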
The data.table class inherits from data.frame, so data.tables can (mostly) be
used exactly like data.frames and should not break existing code.
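You can check the inheritance claim directly. This sketch reuses the toy table from the constructor example. It shows that a data.table carries both class labels and passes `is.data.frame()`, so code written for data.frames accepts it.

```r
library(data.table)

# Toy table, as in the constructor example
myDT <- data.table(number = 1:3, letter = c('a', 'b', 'c'))

class(myDT)          # "data.table" "data.frame"
is.data.frame(myDT)  # TRUE: functions expecting a data.frame still accept it
nrow(myDT)           # 3: base data.frame helpers work unchanged

# Convert back to a plain data.frame if some other code insists on one
myDF <- as.data.frame(myDT)
class(myDF)          # "data.frame"
```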
Examples
WHO TB data:
D <- read.csv('TB_burden_countries_2014-09-10.csv')
names(D)[1:10]
## [1] "country" "iso2" "iso3"
## [5] "g_whoregion" "year" "e_pop_num"
## [9] "e_prev_100k_lo" "e_prev_100k_hi"
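The slides read the file with `read.csv()` and convert it afterwards. data.table also ships its own reader, `fread()`, which returns a data.table directly and is usually much faster on large files. The sketch below does not assume the WHO file is available. Instead, it writes a small made-up CSV to a temporary file.

```r
library(data.table)

# Write a small hypothetical CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("country,year,e_prev_100k",
             "Afghanistan,1990,327",
             "Afghanistan,1991,359",
             "Albania,1990,30"), path)

D <- read.csv(path)   # base R: a data.frame
E <- fread(path)      # data.table's reader: a data.table, no conversion step

class(D)  # "data.frame"
class(E)  # "data.table" "data.frame"
```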
Examples
WHO TB data:
head(D[,c(1,6,8)])
## country year e_prev_100k
## 1 Afghanistan 1990 327
## 2 Afghanistan 1991 359
## 3 Afghanistan 1992 387
## 4 Afghanistan 1993 412
## 5 Afghanistan 1994 431
## 6 Afghanistan 1995 447
Examples
Mean TB in Afghanistan
mean(D[D$country=='Afghanistan','e_prev_100k'])
## [1] 397.6087
As data.table:
library(data.table)
E <- as.data.table(D) #convert
E[country=='Afghanistan',mean(e_prev_100k)]
## [1] 397.6087
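The two calls above can be compared side by side on a miniature table. The two Afghanistan rows are taken from the `head()` output shown earlier, and the Albania row is invented. In the data.table form, `i` filters rows and `j` is evaluated with the columns visible as plain variables, so neither the table name nor quoted column names need repeating.

```r
library(data.table)

# Hypothetical miniature WHO table: two rows from the head() output plus one invented row
D <- data.frame(country = c("Afghanistan", "Afghanistan", "Albania"),
                year = c(1990L, 1991L, 1990L),
                e_prev_100k = c(327, 359, 30))
E <- as.data.table(D)

# Base R: repeat the data frame name in the row index and quote the column
base_mean <- mean(D[D$country == "Afghanistan", "e_prev_100k"])

# data.table: filter in i, compute in j
dt_mean <- E[country == "Afghanistan", mean(e_prev_100k)]

base_mean  # 343
dt_mean    # 343
```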
Examples
dataframe multi-column access:
D[D$country=='Afghanistan',
c('e_prev_100k','e_prev_100k_lo',
'e_prev_100k_hi')]
data.table multi-column means, renamed:
E[country=='Afghanistan',
list(mid=mean(e_prev_100k),
lo=mean(e_prev_100k_lo),
hi=mean(e_prev_100k_hi))]
## mid lo hi
## 1: 397.6087 187.913 684.7391
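Writing one `mean()` per column gets tedious as the number of columns grows. This sketch shows two idioms on a made-up table with the slide's column names. The first uses `.()`, data.table's shorthand for `list()`. The second applies a single function over `.SD` (the subset of data for the selected rows), limited to the columns named in `.SDcols`. The second form keeps the original column names instead of renaming them.

```r
library(data.table)

# Hypothetical toy table with the same column names as the WHO data
E <- data.table(country = c("Afghanistan", "Afghanistan", "Albania"),
                e_prev_100k    = c(327, 359, 30),
                e_prev_100k_lo = c(150, 170, 10),
                e_prev_100k_hi = c(600, 650, 60))

# .() is shorthand for list(), so this is the slide's query written more tersely
E[country == "Afghanistan",
  .(mid = mean(e_prev_100k),
    lo  = mean(e_prev_100k_lo),
    hi  = mean(e_prev_100k_hi))]

# One function applied to every column listed in .SDcols
cols <- c("e_prev_100k", "e_prev_100k_lo", "e_prev_100k_hi")
res <- E[country == "Afghanistan", lapply(.SD, mean), .SDcols = cols]
res
```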
Examples
Means for each country? data.table solution:
E[,list(mid=mean(e_prev_100k)),by=country]
## country mid
## 1: Afghanistan 397.60870
## 2: Albania 29.52174
## 3: Algeria 133.95652
## 4: American Samoa 15.09130
## 5: Andorra 30.71304
## ---
## 215: Wallis and Futuna Islands 117.86957
## 216: West Bank and Gaza Strip 11.14783
## 217: Yemen 180.30435
## 218: Zambia 501.39130
## 219: Zimbabwe 386.30435
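Two related tools for grouped summaries are `.N` and `keyby`. `.N` is the number of rows in the current group. `by` returns groups in order of first appearance, while `keyby` sorts the result by the grouping columns. The table below is made up for illustration.

```r
library(data.table)

# Hypothetical toy table
E <- data.table(country = c("Zambia", "Albania", "Zambia", "Albania", "Albania"),
                e_prev_100k = c(500, 30, 480, 28, 32))

# by: groups appear in the order they are first met in the data
by_first  <- E[, .(mid = mean(e_prev_100k), n = .N), by = country]

# keyby: same summary, sorted (and keyed) by the grouping column
by_sorted <- E[, .(mid = mean(e_prev_100k), n = .N), keyby = country]

by_first$country   # "Zambia" "Albania"
by_sorted$country  # "Albania" "Zambia"
by_sorted$n        # 3 2
```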
Examples
A more complicated example:
E[,
list(lo=mean(e_prev_100k_lo),
hi=mean(e_prev_100k_hi)),
by=list(country,
century=factor(year<2000) # TRUE for years before 2000
)]
Examples
Output:
## country century lo hi
## 1: Afghanistan TRUE 189.20000 749.80000
## 2: Afghanistan FALSE 186.92308 634.69231
## 3: Albania TRUE 13.20000 65.40000
## 4: Albania FALSE 10.59231 47.53846
## 5: Algeria TRUE 49.40000 212.80000
## ---
## 427: Yemen FALSE 62.69231 218.38462
## 428: Zambia TRUE 291.60000 1024.90000
## 429: Zambia FALSE 197.00000 733.76923
## 430: Zimbabwe TRUE 14.81000 1074.60000
## 431: Zimbabwe FALSE 56.07692 1219.61538
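As the previous query shows, `by` accepts computed expressions as well as column names, and each expression can be given a name. This sketch uses a made-up table. It names the grouping `before2000` and drops the `factor()` wrapper, which is optional: without it the grouping column is simply logical.

```r
library(data.table)

# Hypothetical toy table: two countries, four years each
E <- data.table(country = rep(c("Albania", "Zambia"), each = 4),
                year = rep(c(1998L, 1999L, 2000L, 2001L), times = 2),
                e_prev_100k_lo = c(14, 12, 11, 10, 300, 290, 200, 190))

# Group by country and by a named logical expression
res <- E[, .(lo = mean(e_prev_100k_lo)),
         by = .(country, before2000 = year < 2000)]
res
```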
Examples
eo <- E[,plot(sort(e_prev_100k))]
[Figure: sort(e_prev_100k) plotted against Index]
(1-line combination with aggregations)
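The plot call works because `j` can be any R expression, not just a column summary. A braced block is also allowed. Combined with `by`, the block runs once per group, and its final `list()` becomes that group's row. The table below is made up for illustration.

```r
library(data.table)

# Hypothetical toy table
E <- data.table(country = c("Albania", "Albania", "Zambia", "Zambia"),
                e_prev_100k = c(28, 32, 480, 500))

# A multi-statement j, evaluated separately for each country
res <- E[, {
  m <- mean(e_prev_100k)                       # intermediate value, local to the group
  list(mid = m,
       spread = max(e_prev_100k) - min(e_prev_100k))
}, by = country]
res
```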
Fast insertion
A new column can be added in place (by reference) with :=
E[,country_t := paste0(country,year)]
head(E[,country_t])
## [1] "Afghanistan1990" "Afghanistan1991" "Afghanistan1992" "Afghanistan1993"
## [5] "Afghanistan1994" "Afghanistan1995"
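`:=` can also add several columns at once and remove columns, all without copying the table. This sketch uses a made-up table, and the columns `decade` and `after1990` are invented for the example. It shows the functional form `` `:=`(...) `` and deletion by assigning `NULL`.

```r
library(data.table)

# Hypothetical toy table
E <- data.table(country = c("Afghanistan", "Afghanistan"),
                year = c(1990L, 1991L))

# Single new column, as on the slide
E[, country_t := paste0(country, year)]

# Several columns at once with the functional form of :=
E[, `:=`(decade = year %/% 10 * 10, after1990 = year > 1990)]

# Assigning NULL deletes a column, also by reference
E[, after1990 := NULL]

names(E)  # "country" "year" "country_t" "decade"
```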
Keys: fast row retrieval
Requires setting a key first (the setkey line):
setkey(E,country) # sorts E by country in place and marks it as the key
E['Afghanistan',e_inc_100k]
## country e_inc_100k
## 1: Afghanistan 189
## 2: Afghanistan 189
## 3: Afghanistan 189
## 4: Afghanistan 189
## 5: Afghanistan 189
## 6: Afghanistan 189
## 7: Afghanistan 189
## 8: Afghanistan 189
## 9: Afghanistan 189
## 10: Afghanistan 189
## 11: Afghanistan 189
## 12: Afghanistan 189
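Here is a sketch of keyed lookup on a made-up table. In current data.table releases, `E["Afghanistan", e_inc_100k]` returns just the `j` result as a plain vector. The two-column output above reflects an older default, and `by = .EACHI` is the explicit way to ask for one result per lookup value. Newer releases also offer the `on=` argument, which matches on a column without setting a key first.

```r
library(data.table)

# Hypothetical toy table
E <- data.table(country = c("Zambia", "Afghanistan", "Afghanistan"),
                year = c(1990L, 1990L, 1991L),
                e_inc_100k = c(300, 189, 189))

setkey(E, country)   # reorders rows by country, in place, and records the key
key(E)               # "country"

# Keyed lookup: the character value in i is matched against the key column
afg <- E["Afghanistan", e_inc_100k]

# Ad hoc lookup on an unkeyed table via on=
E2 <- data.table(country = c("Zambia", "Afghanistan"), e_inc_100k = c(300, 189))
zmb <- E2["Zambia", on = "country"]
```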
Gotchas: column access
E[,1]
## [1] 1
E[,1,with=FALSE]
## country
## 1: Afghanistan
## 2: Afghanistan
## 3: Afghanistan
## 4: Afghanistan
## 5: Afghanistan
## ---
## 4899: Zimbabwe
## 4900: Zimbabwe
## 4901: Zimbabwe
## 4902: Zimbabwe
## 4903: Zimbabwe
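The `E[,1]` result above depends on the data.table version. In current releases, a number or a character string in `j` selects columns as it would for a data.frame, so `with=FALSE` is no longer needed for this case. This sketch shows the common ways to pull out a column, using a made-up two-row table. The variable `cn` is introduced only for illustration.

```r
library(data.table)

# Hypothetical toy table
E <- data.table(country = c("Afghanistan", "Zimbabwe"), year = c(1990L, 2013L))

a  <- E[, 1]            # one-column data.table (current versions)
b  <- E[, "country"]    # same, selecting by name
cn <- "country"
c1 <- E[, ..cn]         # the .. prefix uses a variable holding column names
d  <- E[, .(country)]   # a list in j also gives a one-column data.table
v  <- E[["country"]]    # [[ ]] extracts the column as a plain vector
```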
Gotchas: copying
E2 <- E
E[,foo:='bar']
head(E2[,foo])
## [1] "bar" "bar" "bar" "bar" "bar" "bar"
Gotchas: copying
This is because E2 <- E does not copy the data: E and E2 refer to the same object, and := modifies that object in place (by reference).
Use:
E2 <- copy(E)
instead.
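The difference can be shown side by side on a made-up table. `E2` is just a second name for `E`, while `E3` is an independent copy made with `copy()`.

```r
library(data.table)

E <- data.table(country = c("Afghanistan", "Zimbabwe"))  # hypothetical toy table

E2 <- E          # a second name for the same object, not a copy
E3 <- copy(E)    # a genuine, independent copy

E[, foo := "bar"]      # modifies E in place

"foo" %in% names(E2)   # TRUE:  E2 sees the change
"foo" %in% names(E3)   # FALSE: the copy is unaffected
```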
Summary
more compact syntax
faster (sometimes much faster)
uses less memory
great for aggregation and exploratory data crunching
But: a few traps for the unwary
Good package vignettes and FAQ
Related
aggregate in base R
plyr: use of ddply
sqldf: good if you know SQL
RSQLite: ditto
other:
RODBC etc.: talk to databases
dplyr: nascent, by Hadley Wickham; works with both in-memory (internal) and database (external) data
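To compare with the first entry in the list above, here is the same grouped mean in base R's `aggregate()` and in data.table. The four-row table is made up. Both versions return the groups in sorted order, since `keyby` is used on the data.table side.

```r
library(data.table)

# Hypothetical toy table
D <- data.frame(country = c("Zambia", "Albania", "Zambia", "Albania"),
                e_prev_100k = c(500, 30, 480, 28))

# Base R: aggregate() with a formula interface
base_res <- aggregate(e_prev_100k ~ country, data = D, FUN = mean)

# data.table: the same summary in one bracket call
dt_res <- as.data.table(D)[, .(e_prev_100k = mean(e_prev_100k)), keyby = country]
```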

More Related Content

What's hot

Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandasPiyush rai
 
Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Alexander Hendorf
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeJeffrey Breen
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply FunctionSakthi Dasans
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factorskrishna singh
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In RRsquared Academy
 
Data preparation, depth function
Data preparation, depth functionData preparation, depth function
Data preparation, depth functionFAO
 
Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
10. Getting Spatial
10. Getting Spatial10. Getting Spatial
10. Getting SpatialFAO
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In RRsquared Academy
 
R Programming: Learn To Manipulate Strings In R
R Programming: Learn To Manipulate Strings In RR Programming: Learn To Manipulate Strings In R
R Programming: Learn To Manipulate Strings In RRsquared Academy
 
3 R Tutorial Data Structure
3 R Tutorial Data Structure3 R Tutorial Data Structure
3 R Tutorial Data StructureSakthi Dasans
 
Pandas Cheat Sheet
Pandas Cheat SheetPandas Cheat Sheet
Pandas Cheat SheetACASH1011
 

What's hot (20)

Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandas
 
Rsplit apply combine
Rsplit apply combineRsplit apply combine
Rsplit apply combine
 
Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R code
 
R factors
R   factorsR   factors
R factors
 
R seminar dplyr package
R seminar dplyr packageR seminar dplyr package
R seminar dplyr package
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function
 
R code for data manipulation
R code for data manipulationR code for data manipulation
R code for data manipulation
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
 
Pandas
PandasPandas
Pandas
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
 
Data preparation, depth function
Data preparation, depth functionData preparation, depth function
Data preparation, depth function
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Data Analysis in Python
Data Analysis in PythonData Analysis in Python
Data Analysis in Python
 
10. Getting Spatial
10. Getting Spatial10. Getting Spatial
10. Getting Spatial
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
 
R Programming: Learn To Manipulate Strings In R
R Programming: Learn To Manipulate Strings In RR Programming: Learn To Manipulate Strings In R
R Programming: Learn To Manipulate Strings In R
 
3 R Tutorial Data Structure
3 R Tutorial Data Structure3 R Tutorial Data Structure
3 R Tutorial Data Structure
 
Pandas Cheat Sheet
Pandas Cheat SheetPandas Cheat Sheet
Pandas Cheat Sheet
 

Viewers also liked

How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)Paul Richards
 
Sheffield R Jan 2015 - Using R to investigate parasite infections in Asian el...
Sheffield R Jan 2015 - Using R to investigate parasite infections in Asian el...Sheffield R Jan 2015 - Using R to investigate parasite infections in Asian el...
Sheffield R Jan 2015 - Using R to investigate parasite infections in Asian el...Paul Richards
 
Sheffield_R_ July meeting - Interacting with R - IDEs, Git and workflow
Sheffield_R_ July meeting - Interacting with R - IDEs, Git and workflowSheffield_R_ July meeting - Interacting with R - IDEs, Git and workflow
Sheffield_R_ July meeting - Interacting with R - IDEs, Git and workflowPaul Richards
 
Introduction to knitr - May Sheffield R Users group
Introduction to knitr - May Sheffield R Users groupIntroduction to knitr - May Sheffield R Users group
Introduction to knitr - May Sheffield R Users groupPaul Richards
 
constants, variables and datatypes in C
constants, variables and datatypes in Cconstants, variables and datatypes in C
constants, variables and datatypes in CSahithi Naraparaju
 
Data and its types by adeel
Data and its types by adeelData and its types by adeel
Data and its types by adeelAyaan Adeel
 
Concept Of C++ Data Types
Concept Of C++ Data TypesConcept Of C++ Data Types
Concept Of C++ Data Typesk v
 
How to Present Data in PowerPoint
How to Present Data in PowerPointHow to Present Data in PowerPoint
How to Present Data in PowerPointMatt Hunter
 

Viewers also liked (11)

How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
 
Sheffield R Jan 2015 - Using R to investigate parasite infections in Asian el...
Sheffield R Jan 2015 - Using R to investigate parasite infections in Asian el...Sheffield R Jan 2015 - Using R to investigate parasite infections in Asian el...
Sheffield R Jan 2015 - Using R to investigate parasite infections in Asian el...
 
Sheffield_R_ July meeting - Interacting with R - IDEs, Git and workflow
Sheffield_R_ July meeting - Interacting with R - IDEs, Git and workflowSheffield_R_ July meeting - Interacting with R - IDEs, Git and workflow
Sheffield_R_ July meeting - Interacting with R - IDEs, Git and workflow
 
Introduction to knitr - May Sheffield R Users group
Introduction to knitr - May Sheffield R Users groupIntroduction to knitr - May Sheffield R Users group
Introduction to knitr - May Sheffield R Users group
 
constants, variables and datatypes in C
constants, variables and datatypes in Cconstants, variables and datatypes in C
constants, variables and datatypes in C
 
Data and its types by adeel
Data and its types by adeelData and its types by adeel
Data and its types by adeel
 
Data types
Data typesData types
Data types
 
Data presentation 2
Data presentation 2Data presentation 2
Data presentation 2
 
Presentation of data
Presentation of dataPresentation of data
Presentation of data
 
Concept Of C++ Data Types
Concept Of C++ Data TypesConcept Of C++ Data Types
Concept Of C++ Data Types
 
How to Present Data in PowerPoint
How to Present Data in PowerPointHow to Present Data in PowerPoint
How to Present Data in PowerPoint
 

Similar to Introduction to data.table in R

Webinar: The Whys and Hows of Predictive Modelling
Webinar: The Whys and Hows of Predictive Modelling Webinar: The Whys and Hows of Predictive Modelling
Webinar: The Whys and Hows of Predictive Modelling Edureka!
 
Data structure manual
Data structure manualData structure manual
Data structure manualsameer farooq
 
"Optimization of a .NET application- is it simple ! / ?", Yevhen Tatarynov
"Optimization of a .NET application- is it simple ! / ?",  Yevhen Tatarynov"Optimization of a .NET application- is it simple ! / ?",  Yevhen Tatarynov
"Optimization of a .NET application- is it simple ! / ?", Yevhen TatarynovFwdays
 
Data manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsyData manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsySmartHinJ
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data framekrishna singh
 
Getting started with Pandas Cheatsheet.pdf
Getting started with Pandas Cheatsheet.pdfGetting started with Pandas Cheatsheet.pdf
Getting started with Pandas Cheatsheet.pdfSudhakarVenkey
 
R Programming.pptx
R Programming.pptxR Programming.pptx
R Programming.pptxkalai75
 
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxfINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxdataKarthik
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RStacy Irwin
 
Writing Readable Code with Pipes
Writing Readable Code with PipesWriting Readable Code with Pipes
Writing Readable Code with PipesRsquared Academy
 
CUDA First Programs: Computer Architecture CSE448 : UAA Alaska : Notes
CUDA First Programs: Computer Architecture CSE448 : UAA Alaska : NotesCUDA First Programs: Computer Architecture CSE448 : UAA Alaska : Notes
CUDA First Programs: Computer Architecture CSE448 : UAA Alaska : NotesSubhajit Sahu
 
Python Programming.pptx
Python Programming.pptxPython Programming.pptx
Python Programming.pptxSudhakarVenkey
 

Similar to Introduction to data.table in R (20)

R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 
Doc 20180130-wa0005
Doc 20180130-wa0005Doc 20180130-wa0005
Doc 20180130-wa0005
 
Doc 20180130-wa0004-1
Doc 20180130-wa0004-1Doc 20180130-wa0004-1
Doc 20180130-wa0004-1
 
Doc 20180130-wa0004
Doc 20180130-wa0004Doc 20180130-wa0004
Doc 20180130-wa0004
 
Introduction to tibbles
Introduction to tibblesIntroduction to tibbles
Introduction to tibbles
 
RBootcam Day 2
RBootcam Day 2RBootcam Day 2
RBootcam Day 2
 
Webinar: The Whys and Hows of Predictive Modelling
Webinar: The Whys and Hows of Predictive Modelling Webinar: The Whys and Hows of Predictive Modelling
Webinar: The Whys and Hows of Predictive Modelling
 
Data structure manual
Data structure manualData structure manual
Data structure manual
 
"Optimization of a .NET application- is it simple ! / ?", Yevhen Tatarynov
"Optimization of a .NET application- is it simple ! / ?",  Yevhen Tatarynov"Optimization of a .NET application- is it simple ! / ?",  Yevhen Tatarynov
"Optimization of a .NET application- is it simple ! / ?", Yevhen Tatarynov
 
Data manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsyData manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsy
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
Getting started with Pandas Cheatsheet.pdf
Getting started with Pandas Cheatsheet.pdfGetting started with Pandas Cheatsheet.pdf
Getting started with Pandas Cheatsheet.pdf
 
R Programming.pptx
R Programming.pptxR Programming.pptx
R Programming.pptx
 
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxfINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Computer Science Assignment Help
Computer Science Assignment Help Computer Science Assignment Help
Computer Science Assignment Help
 
Writing Readable Code with Pipes
Writing Readable Code with PipesWriting Readable Code with Pipes
Writing Readable Code with Pipes
 
Big Data Analytics Lab File
Big Data Analytics Lab FileBig Data Analytics Lab File
Big Data Analytics Lab File
 
CUDA First Programs: Computer Architecture CSE448 : UAA Alaska : Notes
CUDA First Programs: Computer Architecture CSE448 : UAA Alaska : NotesCUDA First Programs: Computer Architecture CSE448 : UAA Alaska : Notes
CUDA First Programs: Computer Architecture CSE448 : UAA Alaska : Notes
 
Python Programming.pptx
Python Programming.pptxPython Programming.pptx
Python Programming.pptx
 

More from Paul Richards

SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) p...
SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) p...SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) p...
SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) p...Paul Richards
 
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...Paul Richards
 
Querying open data with R - Talk at April SheffieldR Users Gp
Querying open data with R - Talk at April SheffieldR Users GpQuerying open data with R - Talk at April SheffieldR Users Gp
Querying open data with R - Talk at April SheffieldR Users GpPaul Richards
 
OrienteeRing - using R to optimise mini mountain marathon routes - Pete Dodd ...
OrienteeRing - using R to optimise mini mountain marathon routes - Pete Dodd ...OrienteeRing - using R to optimise mini mountain marathon routes - Pete Dodd ...
OrienteeRing - using R to optimise mini mountain marathon routes - Pete Dodd ...Paul Richards
 
Phylogeny in R - Bianca Santini Sheffield R Users March 2015
Phylogeny in R - Bianca Santini Sheffield R Users March 2015Phylogeny in R - Bianca Santini Sheffield R Users March 2015
Phylogeny in R - Bianca Santini Sheffield R Users March 2015Paul Richards
 
Intro to ggplot2 - Sheffield R Users Group, Feb 2015
Intro to ggplot2 - Sheffield R Users Group, Feb 2015Intro to ggplot2 - Sheffield R Users Group, Feb 2015
Intro to ggplot2 - Sheffield R Users Group, Feb 2015Paul Richards
 
Introduction to Shiny for building web apps in R
Introduction to Shiny for building web apps in RIntroduction to Shiny for building web apps in R
Introduction to Shiny for building web apps in RPaul Richards
 

More from Paul Richards (7)

SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) p...
SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) p...SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) p...
SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) p...
 
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
Preparing and submitting a package to CRAN - June Sanderson, Sheffield R User...
 
Querying open data with R - Talk at April SheffieldR Users Gp
Querying open data with R - Talk at April SheffieldR Users GpQuerying open data with R - Talk at April SheffieldR Users Gp
Querying open data with R - Talk at April SheffieldR Users Gp
 
OrienteeRing - using R to optimise mini mountain marathon routes - Pete Dodd ...
OrienteeRing - using R to optimise mini mountain marathon routes - Pete Dodd ...OrienteeRing - using R to optimise mini mountain marathon routes - Pete Dodd ...
OrienteeRing - using R to optimise mini mountain marathon routes - Pete Dodd ...
 
Phylogeny in R - Bianca Santini Sheffield R Users March 2015
Phylogeny in R - Bianca Santini Sheffield R Users March 2015Phylogeny in R - Bianca Santini Sheffield R Users March 2015
Phylogeny in R - Bianca Santini Sheffield R Users March 2015
 
Intro to ggplot2 - Sheffield R Users Group, Feb 2015
Intro to ggplot2 - Sheffield R Users Group, Feb 2015Intro to ggplot2 - Sheffield R Users Group, Feb 2015
Intro to ggplot2 - Sheffield R Users Group, Feb 2015
 
Introduction to Shiny for building web apps in R
Introduction to Shiny for building web apps in RIntroduction to Shiny for building web apps in R
Introduction to Shiny for building web apps in R
 

Recently uploaded

Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyAnusha Are
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 

Recently uploaded (20)

Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodology
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 

Introduction to data.table in R

  • 2. The data.table package author: Pete Dodd date: 4 November, 2014
  • 3. dataframes in R. What is a dataframe? The default R object for holding data. It can mix numeric and text columns and supports ordered/unordered factors, and many statistical functions require dataframe inputs.
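A minimal base-R sketch of the points above, using a made-up three-row table (the column names and values are invented for illustration, not taken from the WHO file): one data.frame holds text, numeric and ordered-factor columns side by side.

```r
# Toy values for illustration only (not WHO data)
df <- data.frame(
  country = c("A", "B", "C"),                  # text column
  prev    = c(327, 29.5, 134),                 # numeric column
  burden  = factor(c("high", "low", "medium"),
                   levels  = c("low", "medium", "high"),
                   ordered = TRUE),            # ordered factor
  stringsAsFactors = FALSE                     # keep text as character
)

# Each column keeps its own type
col_types <- vapply(df, function(col) class(col)[1], character(1))
print(col_types)

# Ordered factors support comparisons against one of their levels
above_low <- df$country[df$burden > "low"]
print(above_low)
```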
  • 4. dataframes in R. Problems: print! (printing a large dataframe dumps every row to the console); slow searching; verbose syntax; no built-in syntax for grouped aggregation. Which is most annoying depends on who you are...
  • 5. Constructing data.tables
    library(data.table)
    myDT <- data.table(
      number = 1:3,
      letter = c('a', 'b', 'c')
    ) # like the data.frame constructor
    myDF <- data.frame(number = 1:3, letter = c('a', 'b', 'c'))
    myDT2 <- as.data.table(myDF) # conversion from an existing data.frame
    The data.table class inherits from data.frame, so data.tables can (mostly) be used exactly like dataframes and should not break existing code.
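A slightly fuller version of the constructor and conversion calls, as a sketch; `myDF` and `back` are toy objects invented here. The checks at the end illustrate the inheritance claim: a data.table's class vector also contains `data.frame`, and base functions such as `nrow()` keep working.

```r
library(data.table)

# data.table() mirrors the data.frame() constructor
myDT <- data.table(number = 1:3, letter = c("a", "b", "c"))

# Toy data.frame to convert (stringsAsFactors set explicitly for older R)
myDF <- data.frame(number = 1:3, letter = c("a", "b", "c"),
                   stringsAsFactors = FALSE)
myDT2 <- as.data.table(myDF)   # data.frame -> data.table
back  <- as.data.frame(myDT2)  # data.table -> data.frame

print(class(myDT2))            # both "data.table" and "data.frame"
print(nrow(myDT2))             # base functions still apply
```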
  • 6. Examples. WHO TB data:
    D <- read.csv('TB_burden_countries_2014-09-10.csv')
    names(D)[1:10]
    ## [1] "country" "iso2" "iso3"
    ## [5] "g_whoregion" "year" "e_pop_num"
    ## [9] "e_prev_100k_lo" "e_prev_100k_hi"
  • 7. Examples. WHO TB data:
    head(D[, c(1, 6, 8)])
    ##       country year e_prev_100k
    ## 1 Afghanistan 1990         327
    ## 2 Afghanistan 1991         359
    ## 3 Afghanistan 1992         387
    ## 4 Afghanistan 1993         412
    ## 5 Afghanistan 1994         431
    ## 6 Afghanistan 1995         447
  • 8. Examples. Mean TB prevalence in Afghanistan:
    mean(D[D$country == 'Afghanistan', 'e_prev_100k'])
    ## [1] 397.6087
    As data.table:
    library(data.table)
    E <- as.data.table(D) # convert
    E[country == 'Afghanistan', mean(e_prev_100k)]
    ## [1] 397.6087
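The same comparison can be run without the WHO file. This sketch assumes a tiny invented stand-in for `D` with the same column names (the numbers are made up), and checks that the data.frame and data.table forms agree.

```r
library(data.table)

# Toy stand-in for the WHO file (invented values, not WHO estimates)
D <- data.frame(
  country     = rep(c("Afghanistan", "Albania"), each = 3),
  year        = rep(1990:1992, times = 2),
  e_prev_100k = c(327, 359, 387, 30, 29, 28),
  stringsAsFactors = FALSE
)
E <- as.data.table(D)

# data.frame: the filter needs the D$ prefix, the column is a string
m_df <- mean(D[D$country == "Afghanistan", "e_prev_100k"])

# data.table: columns are visible as variables inside [ ]
m_dt <- E[country == "Afghanistan", mean(e_prev_100k)]

print(c(m_df, m_dt))   # both equal 1073 / 3
```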
  • 9. Examples. dataframe multi-column access:
    D[D$country == 'Afghanistan',
      c('e_prev_100k', 'e_prev_100k_lo', 'e_prev_100k_hi')]
    data.table multi-column means, renamed:
    E[country == 'Afghanistan',
      list(mid = mean(e_prev_100k),
           lo  = mean(e_prev_100k_lo),
           hi  = mean(e_prev_100k_hi))]
    ##         mid      lo       hi
    ## 1: 397.6087 187.913 684.7391
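A self-contained sketch of the renamed multi-column summary, on invented toy values. `.()` is data.table's shorthand for `list()` inside `[ ]`; the second call shows the `lapply(.SD, mean)` idiom with `.SDcols`, which keeps the original column names instead of renaming them.

```r
library(data.table)

# Toy values (invented)
E <- data.table(
  country        = rep(c("Afghanistan", "Albania"), each = 2),
  e_prev_100k    = c(300, 400, 30, 20),
  e_prev_100k_lo = c(150, 250, 10, 10),
  e_prev_100k_hi = c(600, 700, 60, 40)
)

# Named list in j: names become the result's column names
res <- E[country == "Afghanistan",
         .(mid = mean(e_prev_100k),
           lo  = mean(e_prev_100k_lo),
           hi  = mean(e_prev_100k_hi))]
print(res)

# Same means, original names kept, columns chosen by .SDcols
res2 <- E[country == "Afghanistan", lapply(.SD, mean),
          .SDcols = c("e_prev_100k", "e_prev_100k_lo", "e_prev_100k_hi")]
print(res2)
```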
  • 10. Examples. Means for each country? data.table solution:
    E[, list(mid = mean(e_prev_100k)), by = country]
    ##                        country       mid
    ##   1:               Afghanistan 397.60870
    ##   2:                   Albania  29.52174
    ##   3:                   Algeria 133.95652
    ##   4:            American Samoa  15.09130
    ##   5:                   Andorra  30.71304
    ##  ---
    ## 215: Wallis and Futuna Islands 117.86957
    ## 216:  West Bank and Gaza Strip  11.14783
    ## 217:                     Yemen 180.30435
    ## 218:                    Zambia 501.39130
    ## 219:                  Zimbabwe 386.30435
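A toy sketch of grouping (values invented). `by` returns groups in order of first appearance, so the alphabetical slide output reflects a file already sorted by country; `keyby` sorts the result explicitly, and `.N` gives the number of rows in each group.

```r
library(data.table)

# Toy values (invented), deliberately not sorted by country
E <- data.table(
  country     = c("Albania", "Afghanistan", "Albania", "Afghanistan"),
  e_prev_100k = c(30, 327, 28, 359)
)

# Groups appear in first-seen order
by_first  <- E[, .(mid = mean(e_prev_100k), n = .N), by = country]

# keyby also sorts the result by the grouping column
by_sorted <- E[, .(mid = mean(e_prev_100k), n = .N), keyby = country]

print(by_first)
print(by_sorted)
```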
  • 11. Examples. A more complicated example:
    E[, list(lo = mean(e_prev_100k_lo),
             hi = mean(e_prev_100k_hi)),
      by = list(country,
                century = factor(year < 2000))] # century is TRUE for years before 2000
  • 12. Examples. Output:
    ##          country century        lo         hi
    ##   1: Afghanistan    TRUE 189.20000  749.80000
    ##   2: Afghanistan   FALSE 186.92308  634.69231
    ##   3:     Albania    TRUE  13.20000   65.40000
    ##   4:     Albania   FALSE  10.59231   47.53846
    ##   5:     Algeria    TRUE  49.40000  212.80000
    ##  ---
    ## 427:       Yemen   FALSE  62.69231  218.38462
    ## 428:      Zambia    TRUE 291.60000 1024.90000
    ## 429:      Zambia   FALSE 197.00000  733.76923
    ## 430:    Zimbabwe    TRUE  14.81000 1074.60000
    ## 431:    Zimbabwe   FALSE  56.07692 1219.61538
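A minimal sketch of grouping by a computed expression, on invented data for one country. Here the group column is named `pre2000`, a hypothetical name chosen for clarity; a plain logical works as a grouping variable, so the `factor()` wrapper in the slide is optional.

```r
library(data.table)

# Toy values (invented)
E <- data.table(
  country        = rep("Zambia", 4),
  year           = c(1998, 1999, 2000, 2001),
  e_prev_100k_lo = c(300, 280, 200, 190)
)

# Group by an existing column plus a derived logical column
res <- E[, .(lo = mean(e_prev_100k_lo)),
         by = .(country, pre2000 = year < 2000)]
print(res)
```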
  • 13. Examples
    eo <- E[, plot(sort(e_prev_100k))]
    [Figure: e_prev_100k values sorted in increasing order, plotted against Index]
    j can be any expression, so a plot can be combined with subsetting and aggregation in one line.
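Because `j` can be any R expression, it can also be a `{ }` block with intermediate steps, evaluated once per group. A sketch on toy data (invented values; `n_above_mean` is a hypothetical column name):

```r
library(data.table)

# Toy values (invented)
E <- data.table(
  country     = rep(c("A", "B"), each = 3),
  e_prev_100k = c(10, 20, 60, 5, 30, 40)
)

res <- E[, {
  m <- mean(e_prev_100k)                      # group mean
  list(mean = m,
       n_above_mean = sum(e_prev_100k > m))   # rows above that mean
}, by = country]
print(res)
```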
  • 14. Fast insertion
    A new column can be inserted by reference (without copying the table) using :=
    E[, country_t := paste0(country, year)]
    head(E[, country_t])
    ## [1] "Afghanistan1990" "Afghanistan1991" "Afghanistan1992" "Afghanistan1993"
    ## [5] "Afghanistan1994" "Afghanistan1995"
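A small sketch of `:=` on a two-row toy table (the `decade` and `flag` columns are hypothetical). It adds one column, adds several at once with the functional `` `:=`() `` form, and deletes a column by assigning `NULL`, all in place.

```r
library(data.table)

# Toy table (invented rows)
E <- data.table(country = c("Afghanistan", "Afghanistan"),
                year    = c(1990, 1991))

# Add one column in place
E[, country_t := paste0(country, year)]

# Add several columns at once (functional form)
E[, `:=`(decade = year %/% 10 * 10,
         flag   = year > 1990)]

# Delete a column, also in place
E[, flag := NULL]
print(E)
```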
  • 15. Keys: fast row retrieval
    The key must be set once beforehand (the setkey line):
    setkey(E, country) # sorts E by country in place and marks it as the key
    E['Afghanistan', e_inc_100k]
    ##         country e_inc_100k
    ##  1: Afghanistan        189
    ##  2: Afghanistan        189
    ##  3: Afghanistan        189
    ##  4: Afghanistan        189
    ##  5: Afghanistan        189
    ##  6: Afghanistan        189
    ##  7: Afghanistan        189
    ##  8: Afghanistan        189
    ##  9: Afghanistan        189
    ## 10: Afghanistan        189
    ## 11: Afghanistan        189
    ## 12: Afghanistan        189
    (This output, with the key column included, is from an older data.table; from version 1.9.4 onward the same call returns a plain vector.)
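A toy sketch of keyed lookup (invented rows and values). It assumes a current data.table version, where `E["Afghanistan", e_inc_100k]` returns a plain vector; the vector-scan form gives the same answer without a key. With a two-column key, the lookup values are passed as `.()`.

```r
library(data.table)

# Toy table (invented rows and values)
E <- data.table(
  country    = c("Zimbabwe", "Afghanistan", "Albania", "Afghanistan"),
  year       = c(1990, 1991, 1990, 1990),
  e_inc_100k = c(300, 189, 25, 189)
)

setkey(E, country)        # sorts E by country in place
k1 <- key(E)              # "country"

# Binary-search lookup on the key
afg <- E["Afghanistan", e_inc_100k]
# Vector scan: same rows, no key required
afg_scan <- E[country == "Afghanistan", e_inc_100k]

# Two-column key: supply one value per key column
setkey(E, country, year)
afg_1991 <- E[.("Afghanistan", 1991), e_inc_100k]
```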
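  • 16. Gotchas: column access
    E[, 1]
    ## [1] 1
    E[, 1, with = FALSE]
    ##           country
    ##    1: Afghanistan
    ##    2: Afghanistan
    ##    3: Afghanistan
    ##    4: Afghanistan
    ##    5: Afghanistan
    ##   ---
    ## 4899:    Zimbabwe
    ## 4900:    Zimbabwe
    ## 4901:    Zimbabwe
    ## 4902:    Zimbabwe
    ## 4903:    Zimbabwe
    (In the version used here, j is evaluated as an expression, so E[, 1] just returns 1 and with = FALSE is needed to select the first column; since data.table 1.9.8, E[, 1] selects the column too.)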
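A sketch of the ways to select columns, on a toy table (invented rows). The `..cols` prefix assumes a recent data.table (it was added around version 1.10.2); `with = FALSE` works in old and new versions. `[[` and a bare column name in `j` return a vector rather than a data.table.

```r
library(data.table)

# Toy table (invented rows)
E <- data.table(country = c("Afghanistan", "Zimbabwe"),
                year    = c(1990, 2012))

# Selections that return a data.table
first_col <- E[, 1, with = FALSE]     # by position
cols <- "country"
sel  <- E[, ..cols]                   # '..' looks up cols outside the table
sel2 <- E[, cols, with = FALSE]       # same selection, older style

# Extractions that return a plain vector
v1 <- E[[1]]
v2 <- E[["country"]]
v3 <- E[, country]
```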
  • 17. Gotchas: copying
    E2 <- E
    E[, foo := 'bar']
    head(E2[, foo])
    ## [1] "bar" "bar" "bar" "bar" "bar" "bar"
  • 18. Gotchas: copying. This happens because E2 <- E does not copy the data: E2 and E refer to the same object, and := modifies that object by reference. Use E2 <- copy(E) instead.
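A minimal sketch of the aliasing trap and its fix, on a toy one-column table: plain assignment shares the object, while `copy()` makes an independent one.

```r
library(data.table)

E  <- data.table(x = 1:3)   # toy table
E2 <- E                     # no copy: same object under two names
E3 <- copy(E)               # deep copy: independent object

E[, foo := "bar"]           # modifies E in place

print("foo" %in% names(E2)) # the alias sees the new column
print("foo" %in% names(E3)) # the copy does not
```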
  • 19. Summary: more compact; faster (sometimes much faster); uses less memory; great for aggregation and exploratory data crunching. But: a few traps for the unwary. Good package vignettes and FAQ.
  • 20. Related: aggregate in base R; plyr (use of ddply); sqldf (good if you know SQL); RSQLite (ditto); others: RODBC etc. to talk to databases; dplyr (nascent, by Hadley Wickham; works on both in-memory and external database data).
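For comparison with the first item in this list, a sketch of the same grouped mean in base R `aggregate()` and in data.table, on invented toy values; both return the groups sorted alphabetically.

```r
library(data.table)

# Toy values (invented)
D <- data.frame(
  country     = c("Albania", "Afghanistan", "Albania", "Afghanistan"),
  e_prev_100k = c(30, 327, 28, 359),
  stringsAsFactors = FALSE
)

# Base R: formula interface to aggregate()
base_res <- aggregate(e_prev_100k ~ country, data = D, FUN = mean)

# data.table equivalent, sorted by the grouping column
dt_res <- as.data.table(D)[, .(e_prev_100k = mean(e_prev_100k)),
                           keyby = country]
print(base_res)
print(dt_res)
```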