SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
© inovex Academy
Hadoop &
map-reduce
1
© inovex Academy
Speakers
1
Dr. Kathrin Spreyer
Big Data Engineer
Patrick Thoma
Head of Solution Development
© inovex Academy
Inevitable hadoop
2004: Google MapReduce paper
2006: Hadoop team around Doug Cutting at Yahoo!
2010/11: IBM’s Watson
2011/12: Hadoop connectors for Oracle products
Oct 2012: Microsoft (connectors f. Azure, HDInsights)
Oct 2012: SAP (cooperation w/ support companies)
3
© inovex Academy
Motivation
1. sample use case: logfile analytics @ 1&1
2. 80 TB/month to be processed
3. too slow on existing hardware
4. further scaling not possible -- or extremely expensive
4
© inovex Academy
Amazing performance improvement
4
© inovex Academy
Overview
1. Map-Reduce
2. HDFS
3. APIs
4. Cluster sizing
6
© inovex Academy
What?
1. framework for distributed data
processing
2. highly scalable: TBs and PBs
3. originated at Google
4. open-source implementation:
Apache Hadoop
7
© inovex Academy
The big picture
8
input
© inovex Academy
The big picture
8
© inovex Academy
Why?
1. too much data for one machine
2. processing speed
3. scaling out vs. scaling up
9
Photo by Flo P.
© inovex Academy 14
HDFS
(hadoop distributed file system)
1. Map-Reduce
2. HDFS
3. APIs
4. Cluster sizing
© inovex Academy
Apis
20
1. Map-Reduce
2. HDFS
3. APIs
4. Cluster sizing
© inovex Academy
Basic map-reduce Apis
1. Java
2. C++ (Pipes)
3. Python (Dumbo)
4. streaming (any language)
21
© inovex Academy
Higher-level Apis
1. Apache Pig (data flow language)
2. Apache Hive (SQL dialect)
22
alternative:
graphical ETL tools,
e.g., Pentaho Data Integration
© inovex Academy
Cluster sizing
23
1. Map-Reduce
2. HDFS
3. APIs
4. Cluster sizing
© inovex Academy
Network topology
1. single data center
2. rack topology
3. bandwidth
25
© inovex Academy
Questions?
26
© inovex Academy
Contact:
bigdata@inovex.de
27

Contenu connexe

En vedette

So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)EmPower Research, a Genpact company
 
Good Design is Good Business: Business Design with RSA and SA
Good Design is Good Business: Business Design with RSA and SAGood Design is Good Business: Business Design with RSA and SA
Good Design is Good Business: Business Design with RSA and SARoger Snook
 
2013 Good Design is Good Business mobile and RSA
2013 Good Design is Good Business mobile and RSA2013 Good Design is Good Business mobile and RSA
2013 Good Design is Good Business mobile and RSARoger Snook
 
University of Miami Briefing: DevOps Steer – an agile response to customer fe...
University of Miami Briefing: DevOps Steer – an agile response to customer fe...University of Miami Briefing: DevOps Steer – an agile response to customer fe...
University of Miami Briefing: DevOps Steer – an agile response to customer fe...Roger Snook
 
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain Knowledge
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain KnowledgeFinding the “Sweet Spot”: Big Data, Smart Technology, and Domain Knowledge
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain KnowledgeEmPower Research, a Genpact company
 
Teen Age Relationships With Parents
Teen Age Relationships With ParentsTeen Age Relationships With Parents
Teen Age Relationships With ParentsRaj Tilak
 

En vedette (8)

So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
 
Good Design is Good Business: Business Design with RSA and SA
Good Design is Good Business: Business Design with RSA and SAGood Design is Good Business: Business Design with RSA and SA
Good Design is Good Business: Business Design with RSA and SA
 
2013 Good Design is Good Business mobile and RSA
2013 Good Design is Good Business mobile and RSA2013 Good Design is Good Business mobile and RSA
2013 Good Design is Good Business mobile and RSA
 
Social Media In Financial Services London Conference
Social Media In Financial Services London ConferenceSocial Media In Financial Services London Conference
Social Media In Financial Services London Conference
 
University of Miami Briefing: DevOps Steer – an agile response to customer fe...
University of Miami Briefing: DevOps Steer – an agile response to customer fe...University of Miami Briefing: DevOps Steer – an agile response to customer fe...
University of Miami Briefing: DevOps Steer – an agile response to customer fe...
 
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain Knowledge
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain KnowledgeFinding the “Sweet Spot”: Big Data, Smart Technology, and Domain Knowledge
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain Knowledge
 
Teen Age Relationships With Parents
Teen Age Relationships With ParentsTeen Age Relationships With Parents
Teen Age Relationships With Parents
 
Service Oriented Computing - Session1 : Intro
Service Oriented Computing - Session1 : IntroService Oriented Computing - Session1 : Intro
Service Oriented Computing - Session1 : Intro
 

Similaire à Webinar - Big Data: Einführung in Hadoop und MapReduce

Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Arohi Khandelwal
 
Hadoop's Problem and How to Fix it
Hadoop's Problem and How to Fix itHadoop's Problem and How to Fix it
Hadoop's Problem and How to Fix itKognitio
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsightNaoki (Neo) SATO
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersSam Dias
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
INTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPINTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPKrishna Sujeer
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Project
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Pactera_US
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksHortonworks
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Data Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In ActionData Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In ActionFrank Y
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 

Similaire à Webinar - Big Data: Einführung in Hadoop und MapReduce (20)

big data
big databig data
big data
 
Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Introduction of Big data and Hadoop
Introduction of Big data and Hadoop
 
Hadoop's Problem and How to Fix it
Hadoop's Problem and How to Fix itHadoop's Problem and How to Fix it
Hadoop's Problem and How to Fix it
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Hadoop
HadoopHadoop
Hadoop
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute Beginners
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
INTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPINTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOP
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with Hadoop
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and Hortonworks
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Data Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In ActionData Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In Action
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 

Plus de inovex GmbH

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegeninovex GmbH
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIinovex GmbH
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolutioninovex GmbH
 
Network Policies
Network PoliciesNetwork Policies
Network Policiesinovex GmbH
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learninginovex GmbH
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungeninovex GmbH
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeteninovex GmbH
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetesinovex GmbH
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systemsinovex GmbH
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreiheninovex GmbH
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenteninovex GmbH
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?inovex GmbH
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Projectinovex GmbH
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretabilityinovex GmbH
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use caseinovex GmbH
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessinovex GmbH
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumiinovex GmbH
 

Plus de inovex GmbH (20)

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegen
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AI
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolution
 
WWDC 2019 Recap
WWDC 2019 RecapWWDC 2019 Recap
WWDC 2019 Recap
 
Network Policies
Network PoliciesNetwork Policies
Network Policies
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungen
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeten
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetes
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Azure IoT Edge
Azure IoT EdgeAzure IoT Edge
Azure IoT Edge
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreihen
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenten
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?
 
Dev + Ops = Go
Dev + Ops = GoDev + Ops = Go
Dev + Ops = Go
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Project
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madness
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
 

Webinar - Big Data: Einführung in Hadoop und MapReduce