Hadoop Fundamentals I
Romeo Kienzler
IBM Innovation Center DACH/Zurich, Romeo Kienzler
Hadoop Fundamentals I
1.
© 2013 IBM Corporation
AVNET – Hadoop Fundamentals I
Romeo Kienzler
IBM Innovation Center Zurich
2.
Agenda
1) Welcome
2) What is big data?
3) Introduction to Hadoop
4) BigInsights
5) Hadoop architecture
6) Lab 1 – Core Hadoop
7) MapReduce
8) Lab 2 – MapReduce
9) Pig, Jaql, Hive, BigSQL, SystemT/AQL
10) Lab 3 – Pig, Hive, and Jaql
11) Certification on BigDataUniversity
3.
What is BIG data?
4.
Traditional Business Intelligence / Data Warehousing
...60 percent, were unsatisfied with their data warehousing system.¹
¹http://www.information-management.com/issues/20010601/3494-1.html
5.
What is BIG data?
6.
What is BIG data?
7.
What is BIG data?
Big Data
Hadoop
8.
What is BIG data?
Business Intelligence
Data Warehouse
9.
Map-Reduce → Hadoop → BigInsights
10.
Why is Big Data important?
Data AVAILABLE to an organization vs. data an organization can PROCESS: the gap is missed opportunity.
Enterprises are “more blind” to new opportunities.
Organizations are able to process less and less of the available data.
100 million tweets are posted every day, 35 hours of video are being uploaded every minute, 6.1 x 10^12 text messages were sent in 2011, and 247 x 10^9 emails passed through the net, 80% of them spam and viruses.
=> Prefiltering is more and more important.
11.
Why is Big Data important?
12.
Why is Big Data important?
13.
Why is Big Data important?
14.
What is BIG data?
Volume
– Terabytes, petabytes, even exabytes
Variety
– All kinds of data
– All kinds of analytics
Velocity / Agility
– Analyze data in hours instead of days, days instead of weeks
– Dynamically responsive
– Rapid data exploration
Traditional / non-traditional data sources: Store, Analyze, Explore
Volume * Variety * Velocity = Value
15.
BigData Analytics
16.
BigData Analytics – Predictive Analytics
17.
BigData Analytics – Predictive Analytics
18.
BigData Analytics – Correlation / Text / NLP
19.
BigData Analytics – Feature Extraction
Feature extraction involves simplifying the amount of resources required to describe a large set of data accurately¹
¹: Wikipedia
20.
BigData Analytics – Predictive Analytics
Storage / Data
CPUs / Algorithm
Business Value / Insight
21.
BigData Analytics – Predictive Analytics
"sometimes it's not who has the best algorithm that wins; it's who has the most data."
(C) Google Inc., The Unreasonable Effectiveness of Data¹
No Sampling => Work with full dataset => Long Tail Distributions
¹http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/reading/TheUnreasonable%20EffectivenessofData_IEEE_IS2009.pdf
22.
Realtime / In-Memory Computing: InfoSphere Streams / Watson
26.
The Paris Hilton Problem
Watson Workshop: What is Watson?
27.
Introduction to Hadoop
29.
BigInsights
31.
BigInsights Demonstration
32.
Hadoop Architecture
35.
HDFS – Hadoop Distributed File System
54.
Lab 1 – Hadoop Architecture
1) Start from chapter 1.2
2) Replace /home/biadmin with /home/biadminX where X is your user ID
3) In chapter 1.3 skip task 1.3.1._1 and go to http://10.199.20.51:8080 instead
4) Skip 1.3.5
5) In chapter 1.3.6._30 use any file you like on your desktop computer
55.
Map-Reduce
97.
Data Parallelism
98.
Aggregated Bandwidth between CPU, Main Memory and Hard Drive
Time to process 1 TB (at 10 GByte/s per node):
– 1 Node: 100 sec
– 10 Nodes: 10 sec
– 100 Nodes: 1 sec
– 1000 Nodes: 100 msec
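The figures on this slide can be reproduced with a short sketch. It assumes, as the slide does, that each node contributes 10 GByte/s and that the work splits perfectly across nodes with no coordination overhead; the name `scan_seconds` is illustrative only.

```python
def scan_seconds(data_bytes: float, nodes: int,
                 per_node_bytes_per_sec: float = 10e9) -> float:
    """Idealized time to scan data_bytes spread evenly over `nodes` nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # Aggregate bandwidth grows linearly with the node count (assumption).
    return data_bytes / (nodes * per_node_bytes_per_sec)


ONE_TB = 1e12  # 1 TB in bytes, decimal units
for nodes in (1, 10, 100, 1000):
    print(f"{nodes:>5} nodes: {scan_seconds(ONE_TB, nodes):g} s")
```

Real clusters fall short of this linear scaling because of network transfer, stragglers, and scheduling, so the numbers are best read as an upper bound on the speed-up.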
103.
Lab 2 – MapReduce
1) Skip task 1.1._1, use PuTTY to connect to biadmin@10.199.20.51 instead
2) Replace /home/biadmin with /home/biadminX where X is your user ID
3) In 1.1._4 - 1.1._6 replace output with /home/biadminX/output where X is your user ID
4) Skip chapter 1.2
5) Chapter 1.3 is optional (using your local virtual machine), maybe during lunch break :)
104.
Pig, Jaql, Hive, BigSQL, SystemT/AQL
133.
SQL for BigInsights
Data warehouse augmentation is a very common use case for Hadoop
While highly scalable, MapReduce is notoriously difficult to use
– Java API is tedious and requires programming expertise
– Unfamiliar languages (e.g. Pig) also require expertise
– Many different file formats, storage mechanisms, configuration options, etc.
– Joins, grouping, sorting are tedious to orchestrate
SQL support opens the data to a much wider audience
– Familiar, widely known syntax
– Common catalog for identifying data and structure
– Clear separation of defining the what (you want) vs. the how (to get it)
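To make the "what vs. the how" point concrete, here is a minimal single-process sketch in plain Python (not the Hadoop Java API) of the map, shuffle, and reduce steps that have to be spelled out by hand for a simple grouped count. The sample records, the table name `posts`, and all function names are made up for illustration; the SQL in the comment states the same result declaratively.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Declarative equivalent (the "what"):
#   SELECT grp, COUNT(*) FROM posts GROUP BY grp;


def map_phase(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) for every record; the key is the group name."""
    for group, _text in records:
        yield group, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all values sharing a key, as a framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


posts = [("comp.sys", "a"), ("rec.autos", "b"), ("comp.sys", "c")]
counts = reduce_phase(shuffle_phase(map_phase(posts)))
print(counts)
```

Even in this toy form, three functions are needed for what the SQL expresses in one line, which is the usability gap the slide describes.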
134.
Query Processing
Big SQL consists of two query processing engines
– The SQL optimization engine
– Jaql as the query execution engine
(Diagram labels: Client, SQL Engine, Jaql, SQL Optimizer, Runtime)
135.
Big SQL vs. Alternatives
There are a number of SQL solutions. Where does Big SQL fit in?
Hive
– Open source
• Established Hadoop component
• Active development community
– Restrictive SQL syntax
• No subqueries (Hive 0.11 adds non-correlated subquery support)
• No windowed aggregates (Hive 0.11 adds windowed aggregate support)
• ANSI join syntax only
– Limited type support
• No varchar(n), decimal(p,s), etc.
– Poor client support
• Limited JDBC and ODBC drivers
– Poor low-latency query support (via local mapreduce)
136.
Big SQL vs. Alternatives (cont.)
Impala
– Recently open sourced
– Achieves low latency by bypassing MapReduce infrastructure
• Installs a completely separate execution infrastructure
• Can lead to resource scheduling conflicts
– Execution engine is C++
• Great for performance, makes extending difficult (e.g. UDFs and UDAs)
• Support for limited set of file formats
– Currently limited to broadcast joins
• All tables must fit in memory (aggregate cluster memory)
• Scalability limitation for larger clusters
– Uses Hive 0.9 query syntax (more limitations than the current Hive)
– Uses Hive 0.9 type system (more limitations than the current Hive)
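The broadcast-join limitation can be illustrated with a small sketch in plain Python. This is not Impala code; it only mimics the general idea that one table is copied in full to every node and joined there against the node's local partition of the other table, which is why the broadcast table has to fit in memory. All table contents and names below are hypothetical.

```python
from typing import Dict, List, Tuple


def broadcast_join(
    big_partitions: List[List[Tuple[int, str]]],
    small_table: List[Tuple[int, str]],
) -> List[Tuple[int, str, str]]:
    """Inner-join each partition of the big table with a full copy of the small table."""
    result: List[Tuple[int, str, str]] = []
    for partition in big_partitions:  # one iteration stands in for one node
        # Every node builds an in-memory hash table of the whole small table.
        lookup: Dict[int, str] = dict(small_table)
        for key, big_value in partition:
            if key in lookup:  # keep only rows with a matching key
                result.append((key, big_value, lookup[key]))
    return result


orders = [[(1, "order-a"), (2, "order-b")], [(1, "order-c"), (3, "order-d")]]
customers = [(1, "alice"), (2, "bob")]
joined = broadcast_join(orders, customers)
print(joined)
```

Because each node holds its own complete copy of `small_table`, memory use grows with the number of nodes times the table size, which hints at why the slide lists this as a scalability limit.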
141.
Lab 3 – Querying Data with Pig, Hive, Jaql
1) Use PuTTY to connect to biadmin@10.199.20.51
2) Skip task 1.1._2, start the Jaql shell using the command /opt/ibm/biginsights/jaql/bin/jaqlshell
3) In 1.1._5 replace biadmin with biadminX where X is your user ID
4) Skip chapter 1.2 (optional using virtual machine)
5) In 1.3._2 replace biadmin with biadminX where X is your user ID
6) Instead of task 1.3._2 type /opt/ibm/biginsights/pig/bin/pig
7) In 1.3._4 replace sampleData/NewsGroups.csv with /user/biadminX/sampleData/NewsGroups.csv
8) Skip chapter 1.4 (optional using virtual machine)
9) Skip 1.5._12 and _13 and type /opt/ibm/biginsights/hive/bin/hive instead
10) Type "use biadminX" where X is your user ID
11) Continue with task _14
142.
NoSQL Databases
Column Store
– Hadoop / HBase
– Cassandra
– Amazon SimpleDB
JSON / Document Store
– MongoDB
– CouchDB
Key / Value Store
– Amazon DynamoDB
– Voldemort
Graph DBs
– DB2 SPARQL Extension
– Neo4j
MP RDBMS
– DB2 DPF, DB2 pureScale, PureData for Operational Analytics
– Oracle RAC
– Greenplum
More than 150 listed at http://nosql-database.org/
143.
CAP Theorem / Brewer's Theorem¹
It is impossible for a distributed computer system to simultaneously guarantee all 3 properties:
– Consistency (all nodes see the same data at the same time)
– Availability (a guarantee that every request receives a response about whether it succeeded or failed)
– Partition tolerance (the system continues to operate despite failure of part of the system)
What about ACID?
– Atomicity
– Consistency
– Isolation
– Durability
BASE, the new ACID
– Basically Available
– Soft state
– Eventual consistency
• Monotonic Read Consistency
• Monotonic Write Consistency
• Read Your Own Writes
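As a rough illustration of eventual consistency and the "Read Your Own Writes" guarantee, the toy model below keeps a primary and a lagging replica inside one Python process. The classes `Store` and `Session` are invented for this sketch and do not correspond to any particular database; the session simply remembers the version of its last write per key and avoids a replica that has not caught up yet.

```python
from typing import Dict, Optional, Tuple


class Store:
    """A primary with one asynchronously updated replica (toy model)."""

    def __init__(self) -> None:
        self.primary: Dict[str, Tuple[int, str]] = {}  # key -> (version, value)
        self.replica: Dict[str, Tuple[int, str]] = {}
        self.version = 0

    def write(self, key: str, value: str) -> int:
        """Apply a write on the primary only and return its version number."""
        self.version += 1
        self.primary[key] = (self.version, value)
        return self.version

    def replicate(self) -> None:
        """Propagate pending writes; until this runs the replica is stale."""
        self.replica = dict(self.primary)


class Session:
    """Client session that enforces read-your-own-writes."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.last_written: Dict[str, int] = {}

    def write(self, key: str, value: str) -> None:
        self.last_written[key] = self.store.write(key, value)

    def read(self, key: str) -> Optional[str]:
        needed = self.last_written.get(key, 0)
        entry = self.store.replica.get(key)
        if entry is not None and entry[0] >= needed:
            return entry[1]  # replica is fresh enough for this session
        if needed == 0:
            return None  # never written by this session, not yet replicated
        return self.store.primary[key][1]  # fall back to the primary


store = Store()
session = Session(store)
session.write("x", "1")
stale = store.replica.get("x")  # replica has not caught up yet
own = session.read("x")         # served from the primary instead
store.replicate()
after = store.replica["x"][1]   # replica has converged
print(stale, own, after)
```

Other clients reading the replica before `replicate()` runs would still see the old state; that window is the "eventual" part of eventual consistency.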
144.
Certification
– Go to www.bigdatauniversity.com
– Search for “hadoop fundamentals”
– Choose “Hadoop Fundamentals I – Version 2”
– Sign up
– Login with existing account or one of the following:
– Take the test:
145.
Questions?