SlideShare une entreprise Scribd logo
1  sur  9
Télécharger pour lire hors ligne
Data Computing Division
Hadoop Hands On
Session
Milind Bhandarkar
Greenplum,A Division of EMC
Monday, February 18, 13
Data Computing Division
Prerequisites
•Make sure you haveVMWare player installed
•VMWare Fusion for Mac OS X
•Copy the GPHD (Greenplum Distribution of
Hadoop v 1.0) virtual machine to your
laptop
•Also copy exercise.zip file to your laptop,
and decompress
Monday, February 18, 13
Data Computing Division
Setting Up
•Start GPHDVirtual Machine
•Make sure you can login to it
•Copy exercise.zip from your laptop to the
VM, and unzip in ~/exercise
Monday, February 18, 13
Data Computing Division
Preparation
•Make sure HDFS is running
•Make sure MapReduce is running
•Check configuration files *-site.xml
Monday, February 18, 13
Data Computing Division
Hands-On
•Objective: Implement Linear Regression using
MapReduce, and use it to train a model
•Data Set: from Marine Resources Division,
Department of Primary Industries and
Fisheries,Tasmania
•4177 samples from observations
Monday, February 18, 13
Data Computing Division
Data
•Attributes about a type of fish
•M/F, Length, Diameter, Height,Weight,
Rings on shell
•Problem:To predict number of rings as a
function of other attributes
Monday, February 18, 13
Data Computing Division
Step 1
•Copy the small sample data set to HDFS
•See: Scripts/cp_to_grid.sh
Monday, February 18, 13
Data Computing Division
Step 2
•Blow up the dataset 1000 times by adding
gaussian noise to most fields
•Output: 4M sample observations
•Using Hadoop Streaming
•See: Scripts/stream_replicate.sh
•Monitor this job in JobTracker UI
Monday, February 18, 13
Data Computing Division
Step 3
•Train model based on Linear Regression
•See: Scripts/stream_train_linreg.sh
•Monitor the Job
•Copy the model to a local directory
•Check it
Monday, February 18, 13

Contenu connexe

En vedette

Ինչպիսին պետք է լինի
Ինչպիսին պետք է լինիԻնչպիսին պետք է լինի
Ինչպիսին պետք է լինիtatevabrahamyan
 
Insaat kursu-bakirkoy
Insaat kursu-bakirkoyInsaat kursu-bakirkoy
Insaat kursu-bakirkoysersld54
 
Changing the Security Monitoring Status Quo
Changing the Security Monitoring Status QuoChanging the Security Monitoring Status Quo
Changing the Security Monitoring Status QuoEMC
 
Protectora d'animals_Xènia, Malina i Gemma
Protectora d'animals_Xènia, Malina i GemmaProtectora d'animals_Xènia, Malina i Gemma
Protectora d'animals_Xènia, Malina i Gemmamgonellgomez
 
Ablation material book
Ablation material   bookAblation material   book
Ablation material bookRahman Hakim
 
Hvad koster stress?
Hvad koster stress?Hvad koster stress?
Hvad koster stress?roddik
 
Manage vm’s and services across private clouds and windows azure with system ...
Manage vm’s and services across private clouds and windows azure with system ...Manage vm’s and services across private clouds and windows azure with system ...
Manage vm’s and services across private clouds and windows azure with system ...Microsoft TechNet - Belgium and Luxembourg
 
Advance DNA sequencing
Advance DNA sequencing Advance DNA sequencing
Advance DNA sequencing Asheesh Pandey
 

En vedette (12)

Ինչպիսին պետք է լինի
Ինչպիսին պետք է լինիԻնչպիսին պետք է լինի
Ինչպիսին պետք է լինի
 
Insaat kursu-bakirkoy
Insaat kursu-bakirkoyInsaat kursu-bakirkoy
Insaat kursu-bakirkoy
 
Changing the Security Monitoring Status Quo
Changing the Security Monitoring Status QuoChanging the Security Monitoring Status Quo
Changing the Security Monitoring Status Quo
 
Protectora d'animals_Xènia, Malina i Gemma
Protectora d'animals_Xènia, Malina i GemmaProtectora d'animals_Xènia, Malina i Gemma
Protectora d'animals_Xènia, Malina i Gemma
 
Yourprezi
YourpreziYourprezi
Yourprezi
 
Forex graphs
Forex graphsForex graphs
Forex graphs
 
Topic 9 final accounts
Topic 9 final accountsTopic 9 final accounts
Topic 9 final accounts
 
Ablation material book
Ablation material   bookAblation material   book
Ablation material book
 
Hvad koster stress?
Hvad koster stress?Hvad koster stress?
Hvad koster stress?
 
Manage vm’s and services across private clouds and windows azure with system ...
Manage vm’s and services across private clouds and windows azure with system ...Manage vm’s and services across private clouds and windows azure with system ...
Manage vm’s and services across private clouds and windows azure with system ...
 
3349
33493349
3349
 
Advance DNA sequencing
Advance DNA sequencing Advance DNA sequencing
Advance DNA sequencing
 

Similaire à Hadoop Hands-On by @techmilind

Managing forestry operations
Managing forestry operationsManaging forestry operations
Managing forestry operationsSimon Mercier
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop InstallMike Frampton
 
Back to FME School - Day 3: Expanding Frontiers
Back to FME School - Day 3: Expanding FrontiersBack to FME School - Day 3: Expanding Frontiers
Back to FME School - Day 3: Expanding FrontiersSafe Software
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaTim Ellison
 
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...Athens Big Data
 
Unlocking the Full Power of Your Backup Data with Veritas NetBackup Data Virt...
Unlocking the Full Power of Your Backup Data with Veritas NetBackup Data Virt...Unlocking the Full Power of Your Backup Data with Veritas NetBackup Data Virt...
Unlocking the Full Power of Your Backup Data with Veritas NetBackup Data Virt...Veritas Technologies LLC
 
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production ScaleGPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scalesparktc
 
Help your Enterprise Implement Big Data with Control-M for Hadoop
 Help your Enterprise Implement Big Data with Control-M for Hadoop Help your Enterprise Implement Big Data with Control-M for Hadoop
Help your Enterprise Implement Big Data with Control-M for HadoopBMC Software
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Use case of Disaster Management System by using Geopaparazzi and MapGuide Ope...
Use case of Disaster Management System by using Geopaparazzi and MapGuide Ope...Use case of Disaster Management System by using Geopaparazzi and MapGuide Ope...
Use case of Disaster Management System by using Geopaparazzi and MapGuide Ope...Hirofumi Hayashi
 
Ict 9 module 3, lesson 1.5 materials, tools, equipment and testing devices
Ict 9 module 3, lesson 1.5 materials, tools, equipment and testing devicesIct 9 module 3, lesson 1.5 materials, tools, equipment and testing devices
Ict 9 module 3, lesson 1.5 materials, tools, equipment and testing devicesYonel Cadapan
 
Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster
Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster
Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster IJECEIAES
 
TechEvent Operating MapR Hadoop Cluster for a year
TechEvent Operating MapR Hadoop Cluster for a yearTechEvent Operating MapR Hadoop Cluster for a year
TechEvent Operating MapR Hadoop Cluster for a yearTrivadis
 
Infrastructure Management in GCP
Infrastructure Management in GCPInfrastructure Management in GCP
Infrastructure Management in GCPDana Hoffman
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
Backup and Disaster Recovery Product
Backup and Disaster Recovery ProductBackup and Disaster Recovery Product
Backup and Disaster Recovery ProductPrabhas Gupte
 
Deploying Foreman in Enterprise Environments
Deploying Foreman in Enterprise EnvironmentsDeploying Foreman in Enterprise Environments
Deploying Foreman in Enterprise Environmentsinovex GmbH
 
Best Practices: Migrating a Postgres Production Database to the Cloud
Best Practices: Migrating a Postgres Production Database to the CloudBest Practices: Migrating a Postgres Production Database to the Cloud
Best Practices: Migrating a Postgres Production Database to the CloudEDB
 

Similaire à Hadoop Hands-On by @techmilind (20)

Managing forestry operations
Managing forestry operationsManaging forestry operations
Managing forestry operations
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop Install
 
Back to FME School - Day 3: Expanding Frontiers
Back to FME School - Day 3: Expanding FrontiersBack to FME School - Day 3: Expanding Frontiers
Back to FME School - Day 3: Expanding Frontiers
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with Java
 
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
 
Unlocking the Full Power of Your Backup Data with Veritas NetBackup Data Virt...
Unlocking the Full Power of Your Backup Data with Veritas NetBackup Data Virt...Unlocking the Full Power of Your Backup Data with Veritas NetBackup Data Virt...
Unlocking the Full Power of Your Backup Data with Veritas NetBackup Data Virt...
 
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production ScaleGPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
 
Instalação geo ip
Instalação geo ipInstalação geo ip
Instalação geo ip
 
Help your Enterprise Implement Big Data with Control-M for Hadoop
 Help your Enterprise Implement Big Data with Control-M for Hadoop Help your Enterprise Implement Big Data with Control-M for Hadoop
Help your Enterprise Implement Big Data with Control-M for Hadoop
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Use case of Disaster Management System by using Geopaparazzi and MapGuide Ope...
Use case of Disaster Management System by using Geopaparazzi and MapGuide Ope...Use case of Disaster Management System by using Geopaparazzi and MapGuide Ope...
Use case of Disaster Management System by using Geopaparazzi and MapGuide Ope...
 
Ict 9 module 3, lesson 1.5 materials, tools, equipment and testing devices
Ict 9 module 3, lesson 1.5 materials, tools, equipment and testing devicesIct 9 module 3, lesson 1.5 materials, tools, equipment and testing devices
Ict 9 module 3, lesson 1.5 materials, tools, equipment and testing devices
 
Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster
Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster
Post Event Investigation of Multi-stream Video Data Utilizing Hadoop Cluster
 
TechEvent Operating MapR Hadoop Cluster for a year
TechEvent Operating MapR Hadoop Cluster for a yearTechEvent Operating MapR Hadoop Cluster for a year
TechEvent Operating MapR Hadoop Cluster for a year
 
Infrastructure Management in GCP
Infrastructure Management in GCPInfrastructure Management in GCP
Infrastructure Management in GCP
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Backup and Disaster Recovery Product
Backup and Disaster Recovery ProductBackup and Disaster Recovery Product
Backup and Disaster Recovery Product
 
Deploying Foreman in Enterprise Environments
Deploying Foreman in Enterprise EnvironmentsDeploying Foreman in Enterprise Environments
Deploying Foreman in Enterprise Environments
 
Best Practices: Migrating a Postgres Production Database to the Cloud
Best Practices: Migrating a Postgres Production Database to the CloudBest Practices: Migrating a Postgres Production Database to the Cloud
Best Practices: Migrating a Postgres Production Database to the Cloud
 

Plus de EMC

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDEMC
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote EMC
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOEMC
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremioEMC
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereEMC
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History EMC
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewEMC
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeEMC
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic EMC
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityEMC
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeEMC
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015EMC
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesEMC
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsEMC
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookEMC
 

Plus de EMC (20)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBook
 

Dernier

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 

Dernier (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Hadoop Hands-On by @techmilind

  • 1. Data Computing Division Hadoop Hands On Session Milind Bhandarkar Greenplum,A Division of EMC Monday, February 18, 13
  • 2. Data Computing Division Prerequisites •Make sure you haveVMWare player installed •VMWare Fusion for Mac OS X •Copy the GPHD (Greenplum Distribution of Hadoop v 1.0) virtual machine to your laptop •Also copy exercise.zip file to your laptop, and decompress Monday, February 18, 13
  • 3. Data Computing Division Setting Up •Start GPHDVirtual Machine •Make sure you can login to it •Copy exercise.zip from your laptop to the VM, and unzip in ~/exercise Monday, February 18, 13
  • 4. Data Computing Division Preparation •Make sure HDFS is running •Make sure MapReduce is running •Check configuration files *-site.xml Monday, February 18, 13
  • 5. Data Computing Division Hands-On •Objective: Implement Linear Regression using MapReduce, and use it to train a model •Data Set: from Marine Resources Division, Department of Primary Industries and Fisheries,Tasmania •4177 samples from observations Monday, February 18, 13
  • 6. Data Computing Division Data •Attributes about a type of fish •M/F, Length, Diameter, Height,Weight, Rings on shell •Problem:To predict number of rings as a function of other attributes Monday, February 18, 13
  • 7. Data Computing Division Step 1 •Copy the small sample data set to HDFS •See: Scripts/cp_to_grid.sh Monday, February 18, 13
  • 8. Data Computing Division Step 2 •Blow up the dataset 1000 times by adding gaussian noise to most fields •Output: 4M sample observations •Using Hadoop Streaming •See: Scripts/stream_replicate.sh •Monitor this job in JobTracker UI Monday, February 18, 13
  • 9. Data Computing Division Step 3 •Train model based on Linear Regression •See: Scripts/stream_train_linreg.sh •Monitor the Job •Copy the model to a local directory •Check it Monday, February 18, 13