Process and Visualize Your Data with Revolution R, Hadoop and GoogleVis

In this session, attendees will learn how to use R in Hadoop's distributed environment with the RHadoop rmr package (rmr2 in the demo). The session also uses the R package googleVis to show how application development teams can quickly bring the power of R and Google Chart Tools into their applications. The result is a rich, custom data visualization built with far less code than it would otherwise take. The session begins with R basics and then moves to concrete examples of statistical analysis on data sets. These are accompanied by an application development example that visualizes the analysis with googleVis: a browser-based app that both kicks off the analysis of the data set in R and visualizes the result. The visualization examples use both googleVis and Google Chart Tools directly. Attendees will leave with a concrete example of how to incorporate R into their existing application development practices and how to use Hadoop and its ecosystem to build custom visualizations.

  1. Quick Housekeeping Rules
     • A Q&A panel is available if you have any questions during the webinar
     • There will be time for Q&A at the end
     • We will record the webinar for future viewing
     • All attendees will receive a copy of the slides and the recording
     © Hortonworks Inc. 2013
  2. Hadoop, R, and Google Chart Tools: Data Visualization for Application Developers
     Jeff Markham, Solution Engineer
     jmarkham@hortonworks.com
  3. Agenda
     • Introductions
     • Use Case Description
     • Preparation
     • Demo
     • Review
     • Q&A
  4. Use Case Description
     • Visualizing data
     • Tools vs. application development
     • Choosing the technology
     • Hortonworks Data Platform
     • RHadoop
     • Google Charts
  5. Preparation: Install HDP
     Hortonworks Data Platform (HDP): Enterprise Hadoop
     • Operational services (manage & operate at scale): Ambari, Oozie
     • Data services (store, process and access data): Flume, Sqoop, Hive, Pig, HBase, HCatalog
     • Hadoop core (distributed storage & processing): WebHDFS, MapReduce, HDFS, YARN (in 2.0)
     • Platform services (enterprise readiness): HA, DR, snapshots, security, …
     • Runs on: OS, cloud, VM, appliance
     • The ONLY 100% open source and complete distribution
     • Enterprise grade, proven and tested at scale
     • Ecosystem endorsed to ensure interoperability
  6. Preparation: Install R
     • Install the R language
     • Install the appropriate packages
       – rhdfs
       – rmr2
       – googleVis
       – shiny
       – Dependencies for all of the above
  7. Preparation
     • rmr2: functions for running MapReduce jobs from R apps
     • rhdfs: functions for accessing HDFS from R apps
     • googleVis: use of Google Chart Tools in R apps
     • shiny: interactive web apps for R developers
  8. Demo Walkthrough
     Using Hadoop, R, and Google Chart Tools
  9. Visualization Use Case
     • Data from the CDC
       – Vital statistics, publicly available data
       – 2010 US birth data file
     • Sample record: one fixed-width text line per birth
     Source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
  10. Visualization Use Case
      • Put data into HDFS
        – Create the input directory
        – Put the data into the input directory
      Create HDFS dir:
        > hadoop fs -mkdir /user/jeff/natality
      Put data into HDFS:
        > hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
  11. Visualization Use Case
      • Write R script
        – Specify use of the RHadoop packages
        – Initialize HDFS
        – Specify the data input and output locations
        #!/usr/bin/env Rscript
        # rhdfs and rmr2 find Hadoop through the HADOOP_CMD environment
        # variable (rmr2 also needs HADOOP_STREAMING); set both beforehand
        library(rmr2)
        library(rhdfs)
        hdfs.init()

        # Relative HDFS paths resolve against the user's home, /user/<name>
        hdfs.data.root = 'natality'
        hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
        hdfs.out.root = hdfs.data.root
        hdfs.out = file.path(hdfs.out.root, 'out')
        ...
  12. Visualization Use Case
      • Write R script
        – Write the mapper function
        – Write the reducer function
        ...
        # Map: each input line is one fixed-width record; emit
        # (characters 89-90 as an integer, 1) for every line in the chunk
        mapper = function(k, fields) {
          keyval(as.integer(substr(fields, 89, 90)), 1)
        }

        # Reduce: count the values emitted for each key
        reducer = function(key, vv) {
          keyval(key, sum(as.numeric(vv), na.rm = TRUE))
        }
        ...
  13. Visualization Use Case
      • Write R script
        – Write the job function
        ...
        # Run the job on plain-text input. combine = TRUE reuses the reducer
        # as a combiner, which is safe because summing is associative.
        job = function(input, output) {
          mapreduce(input = input, output = output,
                    input.format = "text",
                    map = mapper, reduce = reducer,
                    combine = TRUE)
        }
        ...
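  14. Visualization Use Case
      • Write R script
        – Write the result to the HDFS output directory
        ...
        # mapreduce() writes its results to hdfs.out (Hadoop will not overwrite
        # an existing output directory) and returns that path; from.dfs()
        # reads the key/value pairs back into the R session
        out = from.dfs(job(hdfs.data, hdfs.out))
        results.df = as.data.frame(out, stringsAsFactors = FALSE)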
  15. Visualization Use Case
      • Create Shiny application
        – Create directory
        – Create ui.R
        – Create server.R
      Shiny app dir:
        > mkdir ~/my-shiny-app
  16. Visualization Use Case
      • Create Shiny application
        – Create ui.R
        library(shiny)

        shinyUI(pageWithSidebar(
          # Application title
          headerPanel("2010 US Births"),
          sidebarPanel(. . .),  # sidebar help text omitted on the slide
          # One tab per chart; htmlOutput() displays the googleVis HTML
          mainPanel(
            tabsetPanel(
              tabPanel("Line Chart", htmlOutput("lineChart")),
              tabPanel("Column Chart", htmlOutput("columnChart"))
            )
          )
        ))
  17. Visualization Use Case
      • Create Shiny application
        – Create server.R
        library(googleVis)
        library(shiny)
        library(rmr2)
        library(rhdfs)

        hdfs.init()
        # Read the job output written by the R script (natality/out)
        hdfs.data.root = 'natality'
        hdfs.data = file.path(hdfs.data.root, 'out')
        df = as.data.frame(from.dfs(hdfs.data))
        # Job output is not guaranteed to be in numeric key order; sort by
        # key so the line chart is drawn left to right
        df = df[order(df$key), ]
        ...
  18. Visualization Use Case
      • Create Shiny application
        – Create server.R
        ...
        shinyServer(function(input, output) {
          # renderGvis() fills the htmlOutput("lineChart") slot in ui.R
          output$lineChart <- renderGvis({
            gvisLineChart(df, options = list(
              vAxis = "{title:'Number of Births'}",
              hAxis = "{title:'Age of Mother'}",
              legend = "none"
            ))
          })
          ...
  19. Visualization Use Case
      • Run Shiny application
        > shiny::runApp('~/my-shiny-app')
        Loading required package: shiny
        Welcome to googleVis version 0.4.0
        ...
        HADOOP_CMD=/usr/bin/hadoop
        Be sure to run hdfs.init()
        Listening on port 8100
  20. Visualization Use Case
      • View Shiny application
  21. Demo Live
      Using Hadoop, R, and Google Chart Tools
  22. Visualization Use Case
      • Architecture recap
        – Analyze data sets with R on Hadoop
        – Choose RHadoop packages
        – Visualize data with Google Chart Tools via the googleVis package
        – Render googleVis output in Shiny applications
      • Architecture next steps
        – Integrate the Shiny application into existing web apps
        – Create further data models with R
  23. HDP: Enterprise Hadoop Distribution
      • The ONLY 100% open source and complete distribution
      • Enterprise grade, proven and tested at scale
      • Ecosystem endorsed to ensure interoperability
      • Components: operational services (Ambari, Oozie); data services (Flume, Sqoop, Hive, Pig, HBase, HCatalog); Hadoop core (WebHDFS, MapReduce, HDFS, YARN in 2.0); platform services (HA, DR, snapshots, security, …)
      • Runs on: OS, cloud, VM, appliance
  24. HDP Sandbox
  25. Thank You!
      Jeff Markham, Solution Engineer
      jmarkham@hortonworks.com
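Because each line of the 2010 natality file is one fixed-width record (slide 9), the slide 12 mapper reads a field purely by character position: `substr(fields, 89, 90)` takes characters 89 through 90, the field that the slide 18 chart labels as the mother's age. The minimal base-R sketch below shows that extraction on made-up lines. The helper `make_record()` and its filler characters are hypothetical illustrations, not the CDC record layout.

```r
# Hypothetical helper: builds a 100-character line with a two-digit field
# at positions 89-90; the filler "x" characters stand in for other fields
make_record <- function(age) {
  paste0(strrep("x", 88), formatC(age, width = 2, flag = "0"), strrep("x", 10))
}

records <- c(make_record(30L), make_record(24L), make_record(30L))

# substr() uses 1-based, inclusive positions and is vectorized over lines,
# so one call extracts the field from a whole chunk of records
age_field <- substr(records, 89, 90)
ages <- as.integer(age_field)
print(ages)
# [1] 30 24 30
```

In a fixed-width layout every field occupies the same span on every line, which is why the helper zero-pads single-digit values to two characters.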
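The slide 11 script names its input with the relative path `natality`, while slide 10 created the absolute directory `/user/jeff/natality`. The two match because HDFS resolves a path without a leading `/` against the user's home directory, `/user/<username>`. A small sketch of that rule, using a hypothetical helper `resolve_hdfs_path()` that only manipulates strings and never contacts HDFS:

```r
# Hypothetical helper: mimics how HDFS treats a path without a leading "/"
# as relative to the user's home directory, /user/<username>
resolve_hdfs_path <- function(path, user) {
  if (startsWith(path, "/")) path else file.path("/user", user, path)
}

# Same path construction as the slide 11 script
hdfs.data.root <- "natality"
hdfs.data <- file.path(hdfs.data.root, "VS2010NATL.DETAILUS.DAT")
hdfs.out <- file.path(hdfs.data.root, "out")

print(resolve_hdfs_path(hdfs.data, "jeff"))
# [1] "/user/jeff/natality/VS2010NATL.DETAILUS.DAT"
print(resolve_hdfs_path(hdfs.out, "jeff"))
# [1] "/user/jeff/natality/out"
```

If the script runs as a different user, the same relative paths point into that user's home directory instead, so the data must be uploaded there.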
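To see what the slide 12 mapper and reducer compute without a cluster, the three MapReduce phases can be imitated in base R. This sketches only the data flow, not how rmr2 actually executes a job. The input lines are hypothetical, and `split()` stands in for Hadoop's shuffle, which groups every emitted value by key before the reduce phase.

```r
# Hypothetical input: 88 blank characters, then a two-digit field at
# positions 89-90 (the only field this job reads)
lines <- paste0(strrep(" ", 88), c("30", "24", "30", "41", "24", "30"))

# Map phase: as in the slide's mapper, emit (field as integer, 1) per line
map_keys <- as.integer(substr(lines, 89, 90))
map_vals <- rep(1, length(map_keys))

# Shuffle phase: group emitted values by key (Hadoop does this between
# map and reduce; split() stands in for it here)
grouped <- split(map_vals, map_keys)

# Reduce phase: as in the slide's reducer, sum the values for each key
reduced <- vapply(grouped, function(vv) sum(as.numeric(vv), na.rm = TRUE),
                  numeric(1))

counts <- data.frame(key = as.integer(names(reduced)), val = unname(reduced))
print(counts)
```

For a closer check with the real `mapper` and `reducer` functions, rmr2 can also run jobs in-process with `rmr.options(backend = "local")`, which does not require a Hadoop cluster.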
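On slide 13, `combine = T` tells rmr2 to reuse the reducer as a combiner, which pre-aggregates each map task's output before the shuffle. This is only safe because summing is associative and commutative, so summing per-task partial sums gives the same totals as summing everything at once. The base-R check below demonstrates that property. The keys and the split into two map tasks are hypothetical, and the helper `reduce_by_key()` is illustrative, not part of rmr2.

```r
# The slide's counting reducer, as a plain function of one key's values
reduce_sum <- function(vv) sum(as.numeric(vv), na.rm = TRUE)

# Hypothetical helper: apply the reducer to every key group;
# returns a numeric vector named by key
reduce_by_key <- function(keys, vals) {
  vapply(split(vals, keys), reduce_sum, numeric(1))
}

# Hypothetical mapper output: each record emitted (age, 1)
keys <- c(30L, 24L, 30L, 41L, 24L, 30L)
vals <- rep(1, length(keys))

# Without a combiner: a single reduce over all values for each key
direct <- reduce_by_key(keys, vals)

# With a combiner: each of two hypothetical map tasks pre-sums its own output
task <- c(1, 1, 1, 2, 2, 2)
partials <- lapply(split(seq_along(keys), task), function(idx) {
  reduce_by_key(keys[idx], vals[idx])
})
partial_keys <- as.integer(unlist(lapply(partials, names), use.names = FALSE))
partial_vals <- unlist(partials, use.names = FALSE)

# ...and the reducer then sums the partial counts for each key
combined <- reduce_by_key(partial_keys, partial_vals)
print(identical(direct, combined))
```

The combiner shrinks the data shuffled across the network (here, five partial counts instead of six raw pairs). A reducer that computed a mean, for example, could not be reused this way.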
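Slide 14 converts the job output into a data frame, and slide 17 passes the equivalent data frame directly to googleVis. Checking its shape and totals first can catch problems, such as an empty or mis-parsed field, before they reach a chart. The following sketch assumes `from.dfs()` returns rmr2's key/value form, a list with elements named `key` and `val`. The numbers are made up.

```r
# Hypothetical stand-in for from.dfs() output, assuming rmr2's key/value
# form: a list with a `key` element and a `val` element
out <- list(key = c(30L, 24L, 41L), val = c(3, 2, 1))

results.df <- as.data.frame(out, stringsAsFactors = FALSE)
print(names(results.df))
# [1] "key" "val"

# Quick sanity checks before charting: total births counted and the
# key (age) with the largest count
total <- sum(results.df$val)
modal_age <- results.df$key[which.max(results.df$val)]
cat("births counted:", total, "| most common age:", modal_age, "\n")
```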

Speaker Notes

  • Hi, I’m Jeff Markham, and today I want to talk about data visualization for application developers using Hadoop, R, and Google Chart Tools
  • Agenda points
  • Describe the use case and how to choose the tech
  • Start by installing HDP
  • Install R and dependencies
  • Go into more detail on the R packages
  • Walk through the demo before actually doing the demo
  • Describe the data set
  • Start at the very beginning: getting the downloaded data into Hadoop
  • Start explaining the R script. Kick it off with an explanation of the RHadoop packages and what they’re doing
  • Explain the mapper and reducer functions
  • Explain the job function
  • Wrap up with showing where the data lands
  • Show how to create the Shiny app. Start with creating the directory.
  • This is the entirety of the Shiny UI. Help text in the sidebar is omitted to save space.
  • Explain the server.R code. Note the imports of the relevant R packages.
  • Move to one of the functions, showing how Shiny wraps googleVis, which in turn wraps Google Chart Tools
  • Show how to kick off the Shiny app and note the listening port
  • Go to the browser and view the Shiny app
  • Cut to the live demo.
  • Recap what we just saw and suggest possible future steps to further develop the app
  • Hammer home HDP as the bedrock for the app
  • Suggest getting started with the Sandbox
  • Wrap up with Q&A
