SlideShare a Scribd company logo
1 of 20
Download to read offline
Big Data Step-by-Step
                              Boston Predictive Analytics
                                 Big Data Workshop
                                Microsoft New England Research &
                               Development Center, Cambridge, MA
                                    Saturday, March 10, 2012



                                                            by Jeffrey Breen

                                                        President and Co-Founder
         http://atms.gr/bigdata0310                   Atmosphere Research Group
                                                      email: jeffrey@atmosgrp.com
                                                             Twitter: @JeffreyBreen

Saturday, March 10, 2012
n ee d a
             just         AM
                  mo re R
           little
                   Big Data Infrastructure
                           Part 2: Running R + RStudio on Amazon EC2




    Code & more on github:
    https://github.com/jeffreybreen/tutorial-201203-big-data
Saturday, March 10, 2012
Overview

                    • Sometimes you just need a little more
                           RAM, CPU, or disk space than you have
                    • Let’s try launching an instance on Amazon
                           EC2 and configuring it to do some work
                    • We’ll install R and RStudio and call it a day


Saturday, March 10, 2012
Some details we’ll skip
                    • Signing up (it’s not that hard)
                           http://aws.amazon.com/ec2/

                    • Pricing (it keeps dropping)
                           http://aws.amazon.com/ec2/pricing/

                    • The alphabet soup of services (we care
                           about EC2 computing and S3 storage)



Saturday, March 10, 2012
Just look for biggest button on the page...




Saturday, March 10, 2012
Select an Amazon Machine Image
                ami-7385461a is a good, recent 64-bit CentOS
                image published by RightScale




Saturday, March 10, 2012
Only use EBS images
                • Instance-storage machines lose their data
                      upon shutdown (termination)
                • EBS instances can be stopped and restarted,
                      or terminated when you’re done forever




Saturday, March 10, 2012
Pick a size
                See http://aws.amazon.com/ec2/instance-types/




                                          Already out of date! Amazon introduced
                                          new “m1.medium” instance type this week.




Saturday, March 10, 2012
Avoid Premature Termination
                Set Termination Protection + Shutdown Behavior




Saturday, March 10, 2012
Name your instance




Saturday, March 10, 2012
Create a key pair
                Don’t forget to download it (and keep it safe!)




Saturday, March 10, 2012
Create a Security Group
                All TCP, UDP and ICMP from your IP address




Saturday, March 10, 2012
Don’t know your IP address?
                Don’t ask me. Ask Google!




                (simply append “/32” when entering into firewall rules)



Saturday, March 10, 2012
3... 2... 1...




Saturday, March 10, 2012
State = running
                Up and running at specified domain name




Saturday, March 10, 2012
Time to get all command line
                    • You’ll need an ssh client and the key pair we
                           generated in order to connect with your
                           instance
                    • We’ll use the Cloudera VM to control versions,
                           options, etc.
                    • ssh won’t use your key pair if its file permissions
                           are too lax
                           $ chmod og-rwx rstudio-ec2.pem


                    • Log in as root to your domain name
                           $ ssh -i rstudio-ec2.pem root@YOURDOMAINHERE.amazonaws.com
                                                        (from previous slide)



Saturday, March 10, 2012
Install R and RStudio
                • Create a user login for yourself (RStudio needs this)
                      # useradd jbreen
                      # passwd jbreen


                • EPEL is already installed, so R is easy
                      # yum -y install R


                • Follow RStudio’s download instructions
                      http://www.rstudio.org/download/server
                      # wget http://download2.rstudio.org/rstudio-server-0.95.262-x86_64.rpm

                      # rpm -Uvh rstudio-server-0.95.262-x86_64.rpm


                • Browse to port 8787 and use the login and password
                      e.g., http://ec2-107-22-109-130.compute-1.amazonaws.com:8787/



Saturday, March 10, 2012
Success!




Saturday, March 10, 2012
The meter’s running
                    • Amazon charges by the hour (or fraction
                           thereof). So when you’re done, you should
                           probably shutdown
                    • via command line
                           $ sudo shutdown -h now


                    • or with the “Stop” Instance Action in the
                           AWS Management Console
                    • (use “Terminate” if you never want to use it
                           again)


Saturday, March 10, 2012
Next up:
                           How to launch Hadoop
                            clusters in the cloud
                            without really trying



Saturday, March 10, 2012

More Related Content

What's hot

12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocratJonathan Linowes
 
How to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureHow to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureTiago Simões
 
Automating CloudStack with Puppet - David Nalley
Automating CloudStack with Puppet - David NalleyAutomating CloudStack with Puppet - David Nalley
Automating CloudStack with Puppet - David NalleyPuppet
 
Globus toolkit4installationguide
Globus toolkit4installationguideGlobus toolkit4installationguide
Globus toolkit4installationguideAdarsh Patil
 
Getting Started with Ansible
Getting Started with AnsibleGetting Started with Ansible
Getting Started with Ansibleahamilton55
 
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platform
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platformDrupal camp South Florida 2011 - Introduction to the Aegir hosting platform
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platformHector Iribarne
 
Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Naseem Khoodoruth
 
Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Maarten Balliauw
 
Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with RJeffrey Breen
 
[Open infra] how to calculate the cloud system operating rate
[Open infra] how to calculate the cloud system operating rate[Open infra] how to calculate the cloud system operating rate
[Open infra] how to calculate the cloud system operating rateNalee Jang
 
[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike Back[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike BackAlex Liu
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackke4qqq
 
Using OpenStack With Fog
Using OpenStack With FogUsing OpenStack With Fog
Using OpenStack With FogMike Hagedorn
 
Architecting cloud
Architecting cloudArchitecting cloud
Architecting cloudTahsin Hasan
 
An introduction to cgroups and cgroupspy
An introduction to cgroups and cgroupspyAn introduction to cgroups and cgroupspy
An introduction to cgroups and cgroupspyvpetersson
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.Graham Dumpleton
 
Cloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
Cloudera User Group Chicago - Cloudera Manager: APIs & ExtensibilityCloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
Cloudera User Group Chicago - Cloudera Manager: APIs & ExtensibilityClouderaUserGroups
 
Running your Java EE 6 applications in the cloud
Running your Java EE 6 applications in the cloudRunning your Java EE 6 applications in the cloud
Running your Java EE 6 applications in the cloudArun Gupta
 

What's hot (20)

12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat12 core technologies you should learn, love, and hate to be a 'real' technocrat
12 core technologies you should learn, love, and hate to be a 'real' technocrat
 
How to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architectureHow to implement a gdpr solution in a cloudera architecture
How to implement a gdpr solution in a cloudera architecture
 
Automating CloudStack with Puppet - David Nalley
Automating CloudStack with Puppet - David NalleyAutomating CloudStack with Puppet - David Nalley
Automating CloudStack with Puppet - David Nalley
 
Globus toolkit4installationguide
Globus toolkit4installationguideGlobus toolkit4installationguide
Globus toolkit4installationguide
 
Getting Started with Ansible
Getting Started with AnsibleGetting Started with Ansible
Getting Started with Ansible
 
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platform
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platformDrupal camp South Florida 2011 - Introduction to the Aegir hosting platform
Drupal camp South Florida 2011 - Introduction to the Aegir hosting platform
 
Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)
 
Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...Sherlock Homepage - A detective story about running large web services - WebN...
Sherlock Homepage - A detective story about running large web services - WebN...
 
Tapping the Data Deluge with R
Tapping the Data Deluge with RTapping the Data Deluge with R
Tapping the Data Deluge with R
 
[Open infra] how to calculate the cloud system operating rate
[Open infra] how to calculate the cloud system operating rate[Open infra] how to calculate the cloud system operating rate
[Open infra] how to calculate the cloud system operating rate
 
[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike Back[JSDC 2016] Codex: Conditional Modules Strike Back
[JSDC 2016] Codex: Conditional Modules Strike Back
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStack
 
Using OpenStack With Fog
Using OpenStack With FogUsing OpenStack With Fog
Using OpenStack With Fog
 
Architecting cloud
Architecting cloudArchitecting cloud
Architecting cloud
 
An introduction to cgroups and cgroupspy
An introduction to cgroups and cgroupspyAn introduction to cgroups and cgroupspy
An introduction to cgroups and cgroupspy
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Apache Cassandra and Go
Apache Cassandra and GoApache Cassandra and Go
Apache Cassandra and Go
 
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
 
Cloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
Cloudera User Group Chicago - Cloudera Manager: APIs & ExtensibilityCloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
Cloudera User Group Chicago - Cloudera Manager: APIs & Extensibility
 
Running your Java EE 6 applications in the cloud
Running your Java EE 6 applications in the cloudRunning your Java EE 6 applications in the cloud
Running your Java EE 6 applications in the cloud
 

Similar to Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2

Hosting Drupal on Amazon EC2
Hosting Drupal on Amazon EC2Hosting Drupal on Amazon EC2
Hosting Drupal on Amazon EC2Kornel Lugosi
 
Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2cmcavoy
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud
 
Oracle on AWS partner webinar series
Oracle on AWS partner webinar series Oracle on AWS partner webinar series
Oracle on AWS partner webinar series Tom Laszewski
 
Austin Web Architecture
Austin Web ArchitectureAustin Web Architecture
Austin Web Architecturejoaquincasares
 
The Future is Now: Leveraging the Cloud with Ruby
The Future is Now: Leveraging the Cloud with RubyThe Future is Now: Leveraging the Cloud with Ruby
The Future is Now: Leveraging the Cloud with RubyRobert Dempsey
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesVladimir Simek
 
Sqlsat154 maintain your dbs with help from ola hallengren
Sqlsat154 maintain your dbs with help from ola hallengrenSqlsat154 maintain your dbs with help from ola hallengren
Sqlsat154 maintain your dbs with help from ola hallengrenAndy Galbraith
 
3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMRFaizan Javed
 
Data in the Azure Cloud, by Julie Lerman
Data in the Azure Cloud, by Julie LermanData in the Azure Cloud, by Julie Lerman
Data in the Azure Cloud, by Julie LermanJulie Lerman
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Sujee Maniyam
 
MongoDB - Who, What & Where!
MongoDB - Who, What & Where!MongoDB - Who, What & Where!
MongoDB - Who, What & Where!Mark Hillick
 
Cloud Computing Bootcamp On The Google App Engine [v1.1]
Cloud Computing Bootcamp On The Google App Engine [v1.1]Cloud Computing Bootcamp On The Google App Engine [v1.1]
Cloud Computing Bootcamp On The Google App Engine [v1.1]Matthew McCullough
 
Creating Your Own Static Website Generator
Creating Your Own Static Website GeneratorCreating Your Own Static Website Generator
Creating Your Own Static Website GeneratorSean O'Mahoney
 
Getting started with AWS
Getting started with AWSGetting started with AWS
Getting started with AWSJungwon Seo
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for DummiesRodney Joyce
 
Continuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventContinuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventJohn Schneider
 
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Amazon Web Services
 

Similar to Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2 (20)

Debian on EC2
Debian on EC2Debian on EC2
Debian on EC2
 
Hosting Drupal on Amazon EC2
Hosting Drupal on Amazon EC2Hosting Drupal on Amazon EC2
Hosting Drupal on Amazon EC2
 
Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2Day of Cloud: Amazon EC2
Day of Cloud: Amazon EC2
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
 
Oracle on AWS partner webinar series
Oracle on AWS partner webinar series Oracle on AWS partner webinar series
Oracle on AWS partner webinar series
 
Austin Web Architecture
Austin Web ArchitectureAustin Web Architecture
Austin Web Architecture
 
The Future is Now: Leveraging the Cloud with Ruby
The Future is Now: Leveraging the Cloud with RubyThe Future is Now: Leveraging the Cloud with Ruby
The Future is Now: Leveraging the Cloud with Ruby
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutes
 
Sqlsat154 maintain your dbs with help from ola hallengren
Sqlsat154 maintain your dbs with help from ola hallengrenSqlsat154 maintain your dbs with help from ola hallengren
Sqlsat154 maintain your dbs with help from ola hallengren
 
3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR3rd meetup - Intro to Amazon EMR
3rd meetup - Intro to Amazon EMR
 
Data in the Azure Cloud, by Julie Lerman
Data in the Azure Cloud, by Julie LermanData in the Azure Cloud, by Julie Lerman
Data in the Azure Cloud, by Julie Lerman
 
Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2Cost effective BigData Processing on Amazon EC2
Cost effective BigData Processing on Amazon EC2
 
MongoDB - Who, What & Where!
MongoDB - Who, What & Where!MongoDB - Who, What & Where!
MongoDB - Who, What & Where!
 
Cloud Computing Bootcamp On The Google App Engine [v1.1]
Cloud Computing Bootcamp On The Google App Engine [v1.1]Cloud Computing Bootcamp On The Google App Engine [v1.1]
Cloud Computing Bootcamp On The Google App Engine [v1.1]
 
Creating Your Own Static Website Generator
Creating Your Own Static Website GeneratorCreating Your Own Static Website Generator
Creating Your Own Static Website Generator
 
Getting started with AWS
Getting started with AWSGetting started with AWS
Getting started with AWS
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
Continuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventContinuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:Invent
 
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
 
Harnessing The Cloud
Harnessing The CloudHarnessing The Cloud
Harnessing The Cloud
 

More from Jeffrey Breen

Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & HadoopJeffrey Breen
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeJeffrey Breen
 
R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesJeffrey Breen
 
Accessing Databases from R
Accessing Databases from RAccessing Databases from R
Accessing Databases from RJeffrey Breen
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in RJeffrey Breen
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterJeffrey Breen
 
FAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewFAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewJeffrey Breen
 

More from Jeffrey Breen (8)

Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & Hadoop
 
Move your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R codeMove your data (Hans Rosling style) with googleVis + 1 line of R code
Move your data (Hans Rosling style) with googleVis + 1 line of R code
 
R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlines
 
Accessing Databases from R
Accessing Databases from RAccessing Databases from R
Accessing Databases from R
 
Reshaping Data in R
Reshaping Data in RReshaping Data in R
Reshaping Data in R
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in R
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
 
FAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overviewFAA Aviation Forecasts 2011-2031 overview
FAA Aviation Forecasts 2011-2031 overview
 

Recently uploaded

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Recently uploaded (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2

  • 1. Big Data Step-by-Step Boston Predictive Analytics Big Data Workshop Microsoft New England Research & Development Center, Cambridge, MA Saturday, March 10, 2012 by Jeffrey Breen President and Co-Founder http://atms.gr/bigdata0310 Atmosphere Research Group email: jeffrey@atmosgrp.com Twitter: @JeffreyBreen Saturday, March 10, 2012
  • 2. n ee d a just AM mo re R little Big Data Infrastructure Part 2: Running R + RStudio on Amazon EC2 Code & more on github: https://github.com/jeffreybreen/tutorial-201203-big-data Saturday, March 10, 2012
  • 3. Overview • Sometimes you just need a little more RAM, CPU, or disk space than you have • Let’s try launching an instance on Amazon EC2 and configuring it to do some work • We’ll install R and RStudio and call it a day Saturday, March 10, 2012
  • 4. Some details we’ll skip • Signing up (it’s not that hard) http://aws.amazon.com/ec2/ • Pricing (it keeps dropping) http://aws.amazon.com/ec2/pricing/ • The alphabet soup of services (we care about EC2 computing and S3 storage) Saturday, March 10, 2012
  • 5. Just look for biggest button on the page... Saturday, March 10, 2012
  • 6. Select an Amazon Machine Image ami-7385461a is a good, recent 64-bit CentOS image published by RightScale Saturday, March 10, 2012
  • 7. Only use EBS images • Instance-storage machines lose their data upon shutdown (termination) • EBS instances can be stopped and restarted, or terminated when you’re done forever Saturday, March 10, 2012
  • 8. Pick a size See http://aws.amazon.com/ec2/instance-types/ Already out of date! Amazon introduced new “m1.medium” instance type this week. Saturday, March 10, 2012
  • 9. Avoid Premature Termination Set Termination Protection + Shutdown Behavior Saturday, March 10, 2012
  • 11. Create a key pair Don’t forget to download it (and keep it safe!) Saturday, March 10, 2012
  • 12. Create a Security Group All TCP, UDP and ICMP from your IP address Saturday, March 10, 2012
  • 13. Don’t know your IP address? Don’t ask me. Ask Google! (simply append “/32” when entering into firewall rules) Saturday, March 10, 2012
  • 14. 3... 2... 1... Saturday, March 10, 2012
  • 15. State = running Up and running at specified domain name Saturday, March 10, 2012
  • 16. Time to get all command line • You’ll need an ssh client and the key pair we generated in order to connect with your instance • We’ll use the Cloudera VM to control versions, options, etc. • ssh won’t use your key pair if its file permissions are too lax $ chmod og-rwx rstudio-ec2.pem • Log in as root to your domain name $ ssh -i rstudio-ec2.pem root@YOURDOMAINHERE.amazonaws.com (from previous slide) Saturday, March 10, 2012
  • 17. Install R and RStudio • Create a user login for yourself (RStudio needs this) # useradd jbreen # passwd jbreen • EPEL is already installed, so R is easy # yum -y install R • Follow RStudio’s download instructions http://www.rstudio.org/download/server # wget http://download2.rstudio.org/rstudio-server-0.95.262-x86_64.rpm # rpm -Uvh rstudio-server-0.95.262-x86_64.rpm • Browse to port 8787 and use the login and password e.g., http://ec2-107-22-109-130.compute-1.amazonaws.com:8787/ Saturday, March 10, 2012
  • 19. The meter’s running • Amazon charges by the hour (or fraction thereof). So when you’re done, you should probably shutdown • via command line $ sudo shutdown -h now • or with the “Stop” Instance Action in the AWS Management Console • (use “Terminate” if you never want to use it again) Saturday, March 10, 2012
  • 20. Next up: How to launch Hadoop clusters in the cloud without really trying Saturday, March 10, 2012