SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
© inovex Academy
Hadoop &
map-reduce
1
© inovex Academy
Speakers
1
Dr. Kathrin Spreyer
Big Data Engineer
Patrick Thoma
Head of Solution Development
© inovex Academy
Inevitable hadoop
2004: Google MapReduce paper
2006: Hadoop team around Doug Cutting at Yahoo!
2010/11: IBM’s Watson
2011/12: Hadoop connectors for Oracle products
Oct 2012: Microsoft (connectors f. Azure, HDInsights)
Oct 2012: SAP (cooperation w/ support companies)
3
© inovex Academy
Motivation
1. sample use case: logfile analytics @ 1&1
2. 80 TB/month to be processed
3. too slow on existing hardware
4. further scaling not possible -- or extremely expensive
4
© inovex Academy
Amazing performance improvement
4
© inovex Academy
Overview
1. Map-Reduce
2. HDFS
3. APIs
4. Cluster sizing
6
© inovex Academy
What?
1. framework for distributed data
processing
2. highly scalable: TBs and PBs
3. originated at Google
4. open-source implementation:
Apache Hadoop
7
© inovex Academy
The big picture
8
input
© inovex Academy
The big picture
8
© inovex Academy
Why?
1. too much data for one machine
2. processing speed
3. scaling out vs. scaling up
9
Photo by Flo P.
© inovex Academy 14
HDFS
(hadoop distributed file system)
1. Map-Reduce
2. HDFS
3. APIs
4. Cluster sizing
© inovex Academy
Apis
20
1. Map-Reduce
2. HDFS
3. APIs
4. Cluster sizing
© inovex Academy
Basic map-reduce Apis
1. Java
2. C++ (Pipes)
3. Python (Dumbo)
4. streaming (any language)
21
© inovex Academy
Higher-level Apis
1. Apache Pig (data flow language)
2. Apache Hive (SQL dialect)
22
alternative:
graphical ETL tools,
e.g., Pentaho Data Integration
© inovex Academy
Cluster sizing
23
1. Map-Reduce
2. HDFS
3. APIs
4. Cluster sizing
© inovex Academy
Network topology
1. single data center
2. rack topology
3. bandwidth
25
© inovex Academy
Questions?
26
© inovex Academy
Contact:
bigdata@inovex.de
27

Contenu connexe

En vedette

So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
EmPower Research, a Genpact company
 
Teen Age Relationships With Parents
Teen Age Relationships With ParentsTeen Age Relationships With Parents
Teen Age Relationships With Parents
Raj Tilak
 

En vedette (8)

So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
So what did moms intend to buy? Mom Index for Purchase Intention (Jan-Jun2011)
 
Good Design is Good Business: Business Design with RSA and SA
Good Design is Good Business: Business Design with RSA and SAGood Design is Good Business: Business Design with RSA and SA
Good Design is Good Business: Business Design with RSA and SA
 
2013 Good Design is Good Business mobile and RSA
2013 Good Design is Good Business mobile and RSA2013 Good Design is Good Business mobile and RSA
2013 Good Design is Good Business mobile and RSA
 
Social Media In Financial Services London Conference
Social Media In Financial Services London ConferenceSocial Media In Financial Services London Conference
Social Media In Financial Services London Conference
 
University of Miami Briefing: DevOps Steer – an agile response to customer fe...
University of Miami Briefing: DevOps Steer – an agile response to customer fe...University of Miami Briefing: DevOps Steer – an agile response to customer fe...
University of Miami Briefing: DevOps Steer – an agile response to customer fe...
 
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain Knowledge
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain KnowledgeFinding the “Sweet Spot”: Big Data, Smart Technology, and Domain Knowledge
Finding the “Sweet Spot”: Big Data, Smart Technology, and Domain Knowledge
 
Teen Age Relationships With Parents
Teen Age Relationships With ParentsTeen Age Relationships With Parents
Teen Age Relationships With Parents
 
Service Oriented Computing - Session1 : Intro
Service Oriented Computing - Session1 : IntroService Oriented Computing - Session1 : Intro
Service Oriented Computing - Session1 : Intro
 

Similaire à Webinar - Big Data: Einführung in Hadoop und MapReduce

INTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPINTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOP
Krishna Sujeer
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and Hortonworks
Hortonworks
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
yhadoop
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 

Similaire à Webinar - Big Data: Einführung in Hadoop und MapReduce (20)

Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Introduction of Big data and Hadoop
Introduction of Big data and Hadoop
 
big data
big databig data
big data
 
Hadoop's Problem and How to Fix it
Hadoop's Problem and How to Fix itHadoop's Problem and How to Fix it
Hadoop's Problem and How to Fix it
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
[Azureビッグデータ関連サービスとHortonworks勉強会] Azure HDInsight
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Hadoop
HadoopHadoop
Hadoop
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Hadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute BeginnersHadoop and Big Data for Absolute Beginners
Hadoop and Big Data for Absolute Beginners
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
INTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPINTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOP
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with Hadoop
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and Hortonworks
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Data Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In ActionData Warehouse on Hadoop Based System In Action
Data Warehouse on Hadoop Based System In Action
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 

Plus de inovex GmbH

Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
inovex GmbH
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
inovex GmbH
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreihen
inovex GmbH
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
inovex GmbH
 

Plus de inovex GmbH (20)

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegen
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AI
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolution
 
WWDC 2019 Recap
WWDC 2019 RecapWWDC 2019 Recap
WWDC 2019 Recap
 
Network Policies
Network PoliciesNetwork Policies
Network Policies
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungen
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeten
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetes
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Azure IoT Edge
Azure IoT EdgeAzure IoT Edge
Azure IoT Edge
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreihen
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenten
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?
 
Dev + Ops = Go
Dev + Ops = GoDev + Ops = Go
Dev + Ops = Go
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Project
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madness
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
 

Webinar - Big Data: Einführung in Hadoop und MapReduce