Webinar - Big Data: Introduction to Hadoop and MapReduce

This webinar gives an overview of Hadoop and MapReduce. It traces Hadoop's evolution from Google's MapReduce paper in 2004 to vendor adoption by IBM, Oracle, Microsoft and SAP by 2012. It presents a sample use case, logfile analytics at 1&1, with 80 TB of data to be processed per month. The topics covered are MapReduce, HDFS, APIs and cluster sizing.

© inovex Academy
Speakers
Dr. Kathrin Spreyer
Big Data Engineer
Patrick Thoma
Head of Solution Development

Inevitable Hadoop
2004: Google MapReduce paper
2006: Hadoop team around Doug Cutting at Yahoo!
2010/11: IBM’s Watson
2011/12: Hadoop connectors for Oracle products
Oct 2012: Microsoft (connectors for Azure, HDInsight)
Oct 2012: SAP (cooperation with support companies)

Motivation
1. sample use case: logfile analytics @ 1&1
2. 80 TB/month to be processed
3. too slow on existing hardware
4. further scaling not possible -- or extremely expensive

Amazing performance improvement

Overview
1. MapReduce
2. HDFS
3. APIs
4. Cluster sizing

What?
1. framework for distributed data processing
2. highly scalable: TBs and PBs
3. originated at Google
4. open-source implementation: Apache Hadoop
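
The programming model behind these points can be sketched in a few lines. The following is a single-process illustration of the map, shuffle and reduce phases, not Hadoop's implementation; the function names (`map_words`, `reduce_counts`, `run_mapreduce`) and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values that share one key."""
    return key, sum(values)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> List[Tuple[str, int]]:
    """Run map, then group values by key (the shuffle), then reduce.

    A real framework distributes these steps across many machines;
    here everything happens in one process for illustration only.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    lines = ["big data big cluster", "data node"]
    print(run_mapreduce(lines, map_words, reduce_counts))
    # [('big', 2), ('cluster', 1), ('data', 2), ('node', 1)]
```

Because each mapper call sees only one record and each reducer call sees only one key, both phases can in principle run in parallel on different machines, which is what makes the model scale out.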

The big picture
[Diagram not preserved; surviving label: input]

Why?
1. too much data for one machine
2. processing speed
3. scaling out vs. scaling up
Photo by Flo P.

HDFS
(Hadoop Distributed File System)

APIs

Basic MapReduce APIs
1. Java
2. C++ (Pipes)
3. Python (Dumbo)
4. streaming (any language)
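
As a rough illustration of the streaming option: Hadoop Streaming runs any executable as mapper or reducer, feeding records on stdin and reading tab-separated key/value lines from stdout, with the reducer input sorted by key. The sketch below counts log lines per client host. The log format (first whitespace-separated field is the host) and the single-file layout with a `map`/`reduce` argument are assumptions made for this example, not something taken from the slides.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'host<TAB>1' per log line (assumed format: host is the first field)."""
    for line in lines:
        fields = line.split()
        if fields:  # skip empty lines
            yield f"{fields[0]}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per key; relies on the input being sorted by key."""
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()
    )
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Runs only when a role ("map" or "reduce") is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Saved under a hypothetical name such as `hostcount.py`, the job can be simulated locally with `cat access.log | python hostcount.py map | sort | python hostcount.py reduce`, where `sort` stands in for the shuffle phase. On a cluster, the two commands would instead be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.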

Higher-level APIs
1. Apache Pig (data flow language)
2. Apache Hive (SQL dialect)
Alternative: graphical ETL tools, e.g., Pentaho Data Integration

Cluster sizing

Network topology
1. single data center
2. rack topology
3. bandwidth
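
Rack topology is typically made known to Hadoop through a user-supplied script, configured via the `net.topology.script.file.name` property. Hadoop calls the script with one or more host names or IP addresses as arguments and expects one rack path per argument on stdout. The sketch below shows one possible shape for such a script; the subnet-to-rack table and the rack names are purely hypothetical.

```python
import sys
from typing import Dict, List

# Hypothetical mapping from IP prefix to rack path; adjust to the real network.
RACKS: Dict[str, str] = {
    "10.0.1.": "/dc1/rack1",
    "10.0.2.": "/dc1/rack2",
}

# Rack path returned for hosts that match no known prefix.
DEFAULT_RACK = "/default-rack"


def resolve(hosts: List[str]) -> List[str]:
    """Return one rack path per host, in the same order as the input."""
    racks = []
    for host in hosts:
        rack = next(
            (path for prefix, path in RACKS.items() if host.startswith(prefix)),
            DEFAULT_RACK,
        )
        racks.append(rack)
    return racks


if __name__ == "__main__":
    # Arguments are the hosts to resolve; output is whitespace-separated.
    print(" ".join(resolve(sys.argv[1:])))
```

With this information, HDFS can place block replicas on different racks and the scheduler can prefer nodes close to the data, which is where rack layout and bandwidth enter cluster sizing.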

Contact:
bigdata@inovex.de