R meets Hadoop
Hidekazu Tanaka
9.
Figure: MapReduce data flow
• Blocks (Input Data)
• Parallel Partition (MAP): Node 1, Node 2, Node 3
• Network Transfer
• Parallel Recombine (REDUCE): Node 1, Node 2, Node 3
• Output Data
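The flow in the figure can be imitated in a single R process, which may help when reading the RHIPE code on the later slides. This is only a sketch with no Hadoop involved; `blocks`, `map_fn`, and `reduce_fn` are illustrative names and the data is made up.

```r
# Local, single-process sketch of MAP -> network transfer -> REDUCE.
# Nothing here runs on Hadoop; all names and data are illustrative.
blocks <- list(
  node1 = c("a", "b", "a"),
  node2 = c("b", "c"),
  node3 = c("a", "c", "c")
)

# MAP: each block independently emits (key, value) pairs.
map_fn <- function(block) {
  lapply(block, function(k) list(key = k, value = 1))
}
mapped <- unlist(lapply(blocks, map_fn), recursive = FALSE)

# Network transfer (shuffle): bring together all values that share a key.
keys <- vapply(mapped, function(p) p$key, character(1))
vals <- vapply(mapped, function(p) p$value, numeric(1))
shuffled <- split(vals, keys)

# REDUCE: combine the values of each key into one output record.
reduce_fn <- function(values) sum(values)
output <- vapply(shuffled, reduce_fn, numeric(1))
print(output)
```

On a real cluster the map calls run in parallel on different nodes and the grouping step moves data across the network; here both are ordinary function calls.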
15.
• R CMD INSTALL Rhipe_version.tar.gz
16.
map <- expression({
  # map step goes here
})
reduce <- expression(
  pre = {},
  reduce = {},
  post = {}
)
z <- rhmr(map=map, reduce=reduce,
          inout=c("text","sequence"),
          ifolder="/tmp/input", ofolder="/tmp/output")
rhex(z)
results <- rhread("/tmp/output")
17.
map <- expression({
  library(openNLP)
  # count the tokens in this batch of input lines
  f <- table(tokenize(unlist(map.values), language = "en"))
  n <- names(f)
  p <- as.numeric(f)
  # emit one (word, count) pair per distinct token
  sapply(seq_along(n), function(r) rhcollect(n[r], p[r]))
})
reduce <- expression(
  pre = { total <- 0 },
  reduce = { total <- total + sum(unlist(reduce.values)) },
  post = { rhcollect(reduce.key, total) }
)
z <- rhmr(map=map, reduce=reduce,
          inout=c("text","sequence"),
          ifolder="/tmp/input", ofolder="/tmp/output")
rhex(z)
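The same map and reduce logic can be tried without a cluster. The sketch below is a local stand-in, not RHIPE: it assumes whitespace splitting with `strsplit()` in place of openNLP's `tokenize()`, replaces `rhcollect()` with ordinary data frames, and uses made-up input lines.

```r
# Local stand-in for the word count: plain R, no Hadoop, no openNLP.
# The input lines are invented for illustration.
lines <- c("the cat and the hat", "the end")

# Map step: per-line word counts, kept as (word, count) rows.
map_out <- lapply(lines, function(line) {
  f <- table(strsplit(line, "[[:space:]]+")[[1]])
  data.frame(word = names(f), count = as.numeric(f),
             stringsAsFactors = FALSE)
})

# Reduce step: sum the counts that share the same word.
pairs <- do.call("rbind", map_out)
totals <- tapply(pairs$count, pairs$word, sum)

results <- data.frame(word = names(totals), count = as.numeric(totals),
                      stringsAsFactors = FALSE)
results <- results[order(results$count, decreasing = TRUE), ]
print(results)
```

Because words are compared as exact strings, differently cased spellings of a word are counted separately, as they are in the RHIPE version.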
18.
> results <- rhread("/tmp/output")
> results <- data.frame(word=unlist(lapply(results,"[[",1))
+                       ,count=unlist(lapply(results,"[[",2)))
> results <- (results[order(results$count, decreasing=TRUE), ])
> head(results)
    word count
13     .  2080
439  the  1101
11     ,   760
32     a   701
153   to   658
28     I   651
> results[results["word"] == "FACEBOOK", ]
         word count
3221 FACEBOOK     6
> results[results["word"] == "Facebook", ]
         word count
3223 Facebook    39
> results[results["word"] == "facebook", ]
         word count
3389 facebook     6
20.
map <- expression({
  # download one file with wget and stop if wget reports a failure
  msys <- function(on){
    system(sprintf("wget %s --directory-prefix ./tmp 2> ./errors",on))
    if(length(grep("(failed)|(unable)",readLines("./errors")))>0){
      stop(paste(readLines("./errors"),collapse="\n"))
    }
  }
  lapply(map.values,function(x){
    x=1986+x   # map.values are 1..N, so x becomes a year from 1987 to 2008
    on <- sprintf("http://stat-computing.org/dataexpo/2009/%s.csv.bz2",x)
    fn <- sprintf("./tmp/%s.csv.bz2",x)
    rhstatus(sprintf("Downloading %s", on))
    msys(on)
    rhstatus(sprintf("Downloaded %s", on))
    system(sprintf('bunzip2 %s',fn))
    rhstatus(sprintf("Unzipped %s", on))
    rhcounter("FILES",x,1)
    rhcounter("FILES","_ALL_",1)
  })
})
z <- rhmr(map=map,ofolder="/airline/data",inout=c("lapply"),
          N=length(1987:2008),
          mapred=list(mapred.reduce.tasks=0,mapred.task.timeout=0),
          copyFiles=TRUE)
j <- rhex(z,async=TRUE)
21.
setup <- expression({
  # split "HHMM" or "HMM" strings into hour and minute; anything else becomes 0:00
  convertHHMM <- function(s){
    t(sapply(s,function(r){
      l=nchar(r)
      if(l==4) c(substr(r,1,2),substr(r,3,4))
      else if(l==3) c(substr(r,1,1),substr(r,2,3))
      else c('0','0')
    }))
  }
})
map <- expression({
  # parse the CSV lines, skipping the header row
  y <- do.call("rbind",lapply(map.values,function(r){
    if(substr(r,1,4)!='Year') strsplit(r,",")[[1]]
  }))
  mu <- rep(1,nrow(y));yr <- y[,1]; mn=y[,2];dy=y[,3]
  hr <- convertHHMM(y[,5])
  depart <- ISOdatetime(year=yr,month=mn,day=dy,hour=hr[,1],min=hr[,2],sec=mu)
  hr <- convertHHMM(y[,6])
  sdepart <- ISOdatetime(year=yr,month=mn,day=dy,hour=hr[,1],min=hr[,2],sec=mu)
  hr <- convertHHMM(y[,7])
  arrive <- ISOdatetime(year=yr,month=mn,day=dy,hour=hr[,1],min=hr[,2],sec=mu)
  hr <- convertHHMM(y[,8])
  sarrive <- ISOdatetime(year=yr,month=mn,day=dy,hour=hr[,1],min=hr[,2],sec=mu)
  d <- data.frame(depart=depart,sdepart=sdepart,arrive=arrive,sarrive=sarrive
                  ,carrier=y[,9],origin=y[,17],dest=y[,18],dist=y[,19]
                  ,year=yr,month=mn,day=dy,cancelled=y[,22]
                  ,stringsAsFactors=FALSE)
  d <- d[order(d$sdepart),]
  # key: first and last scheduled departure in the block; value: the block itself
  rhcollect(d[c(1,nrow(d)),"sdepart"],d)
})
reduce <- expression(
  reduce = {
    lapply(reduce.values, function(i) rhcollect(reduce.key,i))
  }
)
# mapred: list of Hadoop options, not defined on this slide
z <- rhmr(map=map,reduce=reduce,setup=setup,inout=c("text","sequence")
          ,ifolder="/airline/data/",ofolder="/airline/blocks"
          ,mapred=mapred,orderby="numeric")
rhex(z)
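To see what the time helper does before it runs inside a job, it can be called locally. This is a small sketch: the function body follows the slide, while the input strings, the date, and the explicit `tz = "UTC"` are assumptions made for the example.

```r
# convertHHMM as on the slide, exercised on a few sample strings.
convertHHMM <- function(s) {
  t(sapply(s, function(r) {
    l <- nchar(r)
    if (l == 4) c(substr(r, 1, 2), substr(r, 3, 4))
    else if (l == 3) c(substr(r, 1, 1), substr(r, 2, 3))
    else c("0", "0")
  }))
}

# "1530" -> 15:30, "915" -> 9:15, the string "NA" -> 0:00 (fallback branch)
hm <- convertHHMM(c("1530", "915", "NA"))
print(hm)

# The hour and minute columns feed ISOdatetime(), as in the map expression.
# The date and time zone here are arbitrary choices for the example.
when <- ISOdatetime(year = 2008, month = 1, day = 3,
                    hour = hm[, 1], min = hm[, 2], sec = 1, tz = "UTC")
print(when)
```

Missing times therefore do not raise an error; they are silently mapped to midnight, which is worth keeping in mind when interpreting the resulting timestamps.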
22.
map <- expression({
  a <- do.call("rbind",map.values)
  # counts per airport code: by origin column, by dest column, and combined
  inbound <- table(a[,'origin'])
  outbound <- table(a[,'dest'])
  total <- table(unlist(c(a[,'origin'],a[,'dest'])))
  for (n in names(total)) {
    inb <- if(is.na(inbound[n])) 0 else inbound[n]
    ob <- if(is.na(outbound[n])) 0 else outbound[n]
    rhcollect(n, c(inb,ob, total[n]))
  }
})
reduce <- expression(
  pre = { sums <- c(0,0,0) },
  reduce = { sums <- sums+apply(do.call("rbind",reduce.values),2,sum) },
  post = { rhcollect(reduce.key, sums) }
)
z <- rhmr(map=map,reduce=reduce,combiner=TRUE,inout=c("sequence","sequence")
          ,ifolder="/airline/blocks/",ofolder="/airline/volume")
rhex(z,async=TRUE)
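The per-airport counting can be checked on a toy data set without Hadoop. This is a minimal sketch under stated assumptions: the four flights are invented, `map_fn` and the column names `by_origin`, `by_dest`, and `all` are illustrative, and plain lists stand in for `rhcollect()` and the shuffle.

```r
# Local sketch of the airport volume job on invented data.
flights <- data.frame(
  origin = c("ORD", "ATL", "ORD", "DFW"),
  dest   = c("ATL", "ORD", "DFW", "ORD"),
  stringsAsFactors = FALSE
)
# Two "blocks", as if the data had been split across two map tasks.
blocks <- list(flights[1:2, ], flights[3:4, ])

# Map: for each airport code in a block, emit (code, c(origin, dest, total)).
map_fn <- function(a) {
  codes <- sort(unique(c(a$origin, a$dest)))
  lapply(codes, function(n) {
    o <- sum(a$origin == n)
    d <- sum(a$dest == n)
    list(key = n, value = c(o, d, o + d))
  })
}
mapped <- unlist(lapply(blocks, map_fn), recursive = FALSE)

# Shuffle: group the emitted vectors by airport code.
keys <- vapply(mapped, function(p) p$key, character(1))
values <- lapply(mapped, function(p) p$value)
grouped <- split(values, keys)

# Reduce: column-wise sum of the vectors for each code.
volume <- t(sapply(grouped, function(v) colSums(do.call("rbind", v))))
colnames(volume) <- c("by_origin", "by_dest", "all")
print(volume)
```

Since the reduce step is a plain sum, it can also be applied to partial results, which is the property that `combiner=TRUE` relies on.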
23.
> counts <- rhread("/airline/volume")
> aircode <- unlist(lapply(counts, "[[",1))
> count <- do.call("rbind",lapply(counts,"[[",2))
> results <- data.frame(aircode=aircode,
+                       inb=count[,1],oub=count[,2],all=count[,3]
+                       ,stringsAsFactors=FALSE)
> results <- results[order(results$all,decreasing=TRUE),]
> ap <- read.table("~/tmp/airports.csv",sep=",",header=TRUE,
+                  stringsAsFactors=FALSE,na.strings="XYZB")
> results$airport <- sapply(results$aircode,function(r){
+   nam <- ap[ap$iata==r,'airport']
+   if(length(nam)==0) r else nam
+ })
> results[1:10,]
    aircode     inb     oub      all                           airport
243     ORD 6597442 6638035 13235477      Chicago O'Hare International
21      ATL 6100953 6094186 12195139 William B Hartsfield-Atlanta Intl
91      DFW 5710980 5745593 11456573   Dallas-Fort Worth International
182     LAX 4089012 4086930  8175942         Los Angeles International
254     PHX 3491077 3497764  6988841  Phoenix Sky Harbor International
89      DEN 3319905 3335222  6655127                       Denver Intl
97      DTW 2979158 2997138  5976296 Detroit Metropolitan-Wayne County
156     IAH 2884518 2889971  5774489      George Bush Intercontinental
230     MSP 2754997 2765191  5520188          Minneapolis-St Paul Intl
300     SFO 2733910 2725676  5459586       San Francisco International
25.
map <- expression({
  a <- do.call("rbind",map.values)
  # build a direction-independent key for each flight, e.g. "ATL,ORD"
  y <- table(apply(a[,c("origin","dest")],1,function(r){
    paste(sort(r),collapse=",")
  }))
  for(i in 1:length(y)){
    p <- strsplit(names(y)[[i]],",")[[1]]
    rhcollect(p,y[[i]])   # count for pair i (the slide had y[[1]])
  }
})
reduce <- expression(
  pre = { sums <- 0 },
  reduce = { sums <- sums+sum(unlist(reduce.values)) },
  post = { rhcollect(reduce.key, sums) }
)
z <- rhmr(map=map,reduce=reduce,combiner=TRUE,inout=c("sequence","sequence")
          ,ifolder="/airline/blocks/",ofolder="/airline/ijjoin")
z=rhex(z)
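The key construction in this map step treats a flight from A to B and a flight from B to A as the same route, because the two codes are sorted before being pasted together. A small local sketch on invented flights shows the effect; it emits `y[[i]]`, the count belonging to pair `i`, and prints instead of calling `rhcollect()`.

```r
# Local sketch of the route-pair key on invented data (no Hadoop).
flights <- data.frame(
  origin = c("ORD", "ATL", "DFW", "ORD"),
  dest   = c("ATL", "ORD", "ORD", "ATL"),
  stringsAsFactors = FALSE
)

# Sorting the two codes makes ORD->ATL and ATL->ORD share one key.
pair_key <- apply(flights[, c("origin", "dest")], 1, function(r) {
  paste(sort(r), collapse = ",")
})
y <- table(pair_key)

# One (pair, count) record per distinct route.
for (i in seq_along(y)) {
  p <- strsplit(names(y)[[i]], ",")[[1]]
  cat(p[1], p[2], y[[i]], "\n")
}
```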
26.
> b=rhread("/airline/ijjoin")
> y <- do.call("rbind",lapply(b,"[[",1))
> results <- data.frame(a=y[,1],b=y[,2],count=
+     do.call("rbind",lapply(b,"[[",2)),stringsAsFactors=FALSE)
> results <- results[order(results$count,decreasing=TRUE),]
> results$cumprop <- cumsum(results$count)/sum(results$count)
> a.lat <- t(sapply(results$a,function(r){
+   ap[ap$iata==r,c('lat','long')]
+ }))
> results$a.lat <- unlist(a.lat[,'lat'])
> results$a.long <- unlist(a.lat[,'long'])
> b.lat <- t(sapply(results$b,function(r){
+   ap[ap$iata==r,c('lat','long')]
+ }))
> b.lat["CBM",] <- c(0,0)
> results$b.lat <- unlist(b.lat[,'lat'])
> results$b.long <- unlist(b.lat[,'long'])
> head(results)
       a   b  count     cumprop    a.lat     a.long    b.lat     b.long
418  ATL ORD 141465 0.001546158 33.64044  -84.42694 41.97960  -87.90446
2079 DEN DFW 138892 0.003064195 39.85841 -104.66700 32.89595  -97.03720
331  ATL DFW 135357 0.004543595 33.64044  -84.42694 32.89595  -97.03720
2221 DFW IAH 134508 0.006013716 32.89595  -97.03720 29.98047  -95.33972
3568 LAS LAX 132333 0.007460065 36.08036 -115.15233 33.94254 -118.40807
2409 DTW ORD 130065 0.008881626 42.21206  -83.34884 41.97960  -87.90446