SlideShare une entreprise Scribd logo
1  sur  56
Vademecum Big Data
Adam Kawa, Spotify, Compendium CE
About Me
Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
And The 20-Minute Story About ...




Image source:http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg
A Really Data-Driven Company …




Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg
And Some Inevitable Problems ...




Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg
And Some Inevitable Problems ...




Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg
And Some Inevitable Problems ...




Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg
Start!
The First Approach Works Fine ...
Until Data Gets Bigger ...
And More Diverse ...
The Data Monster Becomes A Problem




Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg
Apache Hadoop Becomes A Solution




Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg
Orchestra Of Nodes




Image source: http://www.dsn.jhu.edu/images/orchestra.gif
Fault-Tolerant Orchestra Of Nodes
Untypical Orchestra Of Typical* Nodes
* however having very cheap nodes is false economy
Highly Scalable Orchestra Of Nodes
Hadoop Distributed File System (HDFS)




Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg
HDFS Blocks And Replication
HDFS Self-Healing Features




Image source: http://www.mwctoys.com/images/review_hydra_3.jpg
HDFS Scales And Shines With MapReduce




Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg
MapReduce Is A Change


                                            DATA
                                             Map And Reduce


Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png
Map And Reduce Functions
MapReduce Paradigm
Artist Count Example
Sending Computation To Data


                                                                                                     Data
                                                                                                     Is
                                                                                                     Here!


Computation


Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg
MapReduce Implementation




Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony
First Success: 5-Node Hadoop Cluster




Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg
Apache Whirr And The Cloud
===== hadoop.properties =============
whirr.cluster-name=production_cluster
whirr.instance-templates=
1 hadoop-jobtracker+hadoop-namenode,
4 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2 # or Rackspace cloudservers-us
...
=====================================

$ whirr launch-cluster --config hadoop.properties
$ whirr destroy-cluster --config hadoop.properties
First Sad (Non-Java Speaking) Developers




Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg
Hadoop Streaming For Scripting Languages




Image source:http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg
Apache Hive Makes You Feel Younger




Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg
Speak ~SQL, But Run As MapReduce
HUE - Browser-Based Environment




Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png
Hive Is Based On & Limited By Hadoop
Apache Pig Makes Them Happier!


                        




Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg
Pig Accelerates Development


        
Need To Add More Relational Data To HDFS




Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
SQL To Hadoop = Sqoop




Image source:http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg
Sqoop Import/Export Data Using MR




Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/
Apache Oozie For Defining Workflows




Image source: Apache Oozie website
Apache Oozie For Scheduling




Image source:http://risingtechies.files.wordpress.com/2012/05/schedule.jpg
Need To Add Even More Logs To HDFS




Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
Apache Flume For Data Collection
                                     e.g. JDBC, Memory, File




Image source: Apache Flume website
How To Manager A Larger Cluster
Apache Avro + Snappy/Deflate_6




Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg
When Latency Is To High




Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg
Cloudera Impala – Real-Time ~SQL Queries




Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg
Apache HBase - Random, Real-Time
Access To Big Data




Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg
YARN – Hadoop Cluster More Robust




Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
Hadoop Is Successfully Deployed




Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg
Learn More About Apache Hadoop?
Use Hadoop To Solve Real-World Problems?
Oozie And YARN At WHUG, Today @18:00
Thank You! Any Questions About Them?




Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg
Apache Hadoop Ecosystem (based on an exemplary data-driven…

Contenu connexe

Tendances

Python in big data world
Python in big data worldPython in big data world
Python in big data worldRohit
 
Introduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsIntroduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsSkillspeed
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinPietro Michiardi
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in SearchAmund Tveit
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaEdureka!
 
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축Kwang Woo NAM
 
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Holden Karau
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latinknowbigdata
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceMahantesh Angadi
 
Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and PipesHanborq Inc.
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceobdit
 
Extending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitExtending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitMilind Bhandarkar
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Sumeet Singh
 

Tendances (19)

Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Python in big data world
Python in big data worldPython in big data world
Python in big data world
 
Introduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsIntroduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig Fundamentals
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
 
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
 
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
Streaming ML on Spark: Deprecated, experimental and internal ap is galore!
 
Scaling hadoopapplications
Scaling hadoopapplicationsScaling hadoopapplications
Scaling hadoopapplications
 
Apache pig
Apache pigApache pig
Apache pig
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latin
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and Pipes
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data Laboratory
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduce
 
Extending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitExtending Hadoop for Fun & Profit
Extending Hadoop for Fun & Profit
 
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations Hadoop Summit San Jose 2014: Costing Your Big Data Operations
Hadoop Summit San Jose 2014: Costing Your Big Data Operations
 

En vedette

Data model for analysis of scholarly documents in the MapReduce paradigm
Data model for analysis of scholarly documents in the MapReduce paradigm Data model for analysis of scholarly documents in the MapReduce paradigm
Data model for analysis of scholarly documents in the MapReduce paradigm Adam Kawa
 
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Adam Kawa
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeAdam Kawa
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARNAdam Kawa
 
Systemy rekomendacji
Systemy rekomendacjiSystemy rekomendacji
Systemy rekomendacjiAdam Kawa
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationAdam Kawa
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At SpotifyAdam Kawa
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java APIAdam Kawa
 

En vedette (13)

Data model for analysis of scholarly documents in the MapReduce paradigm
Data model for analysis of scholarly documents in the MapReduce paradigm Data model for analysis of scholarly documents in the MapReduce paradigm
Data model for analysis of scholarly documents in the MapReduce paradigm
 
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
Hadoop Playlist (Ignite talks at Strata + Hadoop World 2013)
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Apache Hadoop YARN
Apache Hadoop YARNApache Hadoop YARN
Apache Hadoop YARN
 
Systemy rekomendacji
Systemy rekomendacjiSystemy rekomendacji
Systemy rekomendacji
 
Toward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFSToward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFS
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At Spotify
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
 

Similaire à Apache Hadoop Ecosystem (based on an exemplary data-driven…

Back to the [Completable] Future
Back to the [Completable] FutureBack to the [Completable] Future
Back to the [Completable] FutureSofiia Khomyn
 
Empowering DevOps with Cloud Foundry
Empowering DevOps with Cloud FoundryEmpowering DevOps with Cloud Foundry
Empowering DevOps with Cloud FoundryVMware Tanzu
 
Testing Like a Pro - Chef Infrastructure Testing
Testing Like a Pro - Chef Infrastructure TestingTesting Like a Pro - Chef Infrastructure Testing
Testing Like a Pro - Chef Infrastructure TestingTim Smith
 
Fitting the pieces together - at Drupal Summit Europe - 2011
Fitting the pieces together - at Drupal Summit Europe - 2011Fitting the pieces together - at Drupal Summit Europe - 2011
Fitting the pieces together - at Drupal Summit Europe - 2011Katrien De Graeve
 
HTML5 after the hype - JFokus2015
HTML5 after the hype - JFokus2015HTML5 after the hype - JFokus2015
HTML5 after the hype - JFokus2015Christian Heilmann
 
Design+Performance Velocity 2015
Design+Performance Velocity 2015Design+Performance Velocity 2015
Design+Performance Velocity 2015Steve Souders
 
Logan composition (2)
Logan composition (2)Logan composition (2)
Logan composition (2)loganm
 
Hacking Web Performance @ ForwardJS 2017
Hacking Web Performance @ ForwardJS 2017Hacking Web Performance @ ForwardJS 2017
Hacking Web Performance @ ForwardJS 2017Maximiliano Firtman
 
10 Laravel packages everyone should know
10 Laravel packages everyone should know10 Laravel packages everyone should know
10 Laravel packages everyone should knowPovilas Korop
 
[psuweb] Adaptive Images in Responsive Web Design
[psuweb] Adaptive Images in Responsive Web Design[psuweb] Adaptive Images in Responsive Web Design
[psuweb] Adaptive Images in Responsive Web DesignChristopher Schmitt
 
Prediction io 架構與整合 -DataCon.TW-2017
Prediction io 架構與整合 -DataCon.TW-2017Prediction io 架構與整合 -DataCon.TW-2017
Prediction io 架構與整合 -DataCon.TW-2017William Lee
 
Tactics to Kickstart Your Journey Toward Continuous Delivery
Tactics to Kickstart Your Journey Toward Continuous DeliveryTactics to Kickstart Your Journey Toward Continuous Delivery
Tactics to Kickstart Your Journey Toward Continuous DeliveryJeff Gallimore
 
Tactics to Kickstart Your Journey Toward Continuous Delivery
Tactics to Kickstart Your Journey Toward Continuous DeliveryTactics to Kickstart Your Journey Toward Continuous Delivery
Tactics to Kickstart Your Journey Toward Continuous DeliveryExcella
 
Vpn presentation richard kong
Vpn presentation   richard kongVpn presentation   richard kong
Vpn presentation richard kongRichardKong18
 
High Performance HTML5 (SF HTML5 UG)
High Performance HTML5 (SF HTML5 UG)High Performance HTML5 (SF HTML5 UG)
High Performance HTML5 (SF HTML5 UG)Steve Souders
 
Tactics to Kickstart Your Journey Toward DevOps
Tactics to Kickstart Your Journey Toward DevOpsTactics to Kickstart Your Journey Toward DevOps
Tactics to Kickstart Your Journey Toward DevOpsJeff Gallimore
 
Tactics to Kickstart Your Journey Toward DevOps
Tactics to Kickstart Your Journey Toward DevOpsTactics to Kickstart Your Journey Toward DevOps
Tactics to Kickstart Your Journey Toward DevOpsExcella
 

Similaire à Apache Hadoop Ecosystem (based on an exemplary data-driven… (20)

Back to the [Completable] Future
Back to the [Completable] FutureBack to the [Completable] Future
Back to the [Completable] Future
 
Empowering DevOps with Cloud Foundry
Empowering DevOps with Cloud FoundryEmpowering DevOps with Cloud Foundry
Empowering DevOps with Cloud Foundry
 
Testing Like a Pro - Chef Infrastructure Testing
Testing Like a Pro - Chef Infrastructure TestingTesting Like a Pro - Chef Infrastructure Testing
Testing Like a Pro - Chef Infrastructure Testing
 
The Last Mile
The Last MileThe Last Mile
The Last Mile
 
Fitting the pieces together - at Drupal Summit Europe - 2011
Fitting the pieces together - at Drupal Summit Europe - 2011Fitting the pieces together - at Drupal Summit Europe - 2011
Fitting the pieces together - at Drupal Summit Europe - 2011
 
HTML5 after the hype - JFokus2015
HTML5 after the hype - JFokus2015HTML5 after the hype - JFokus2015
HTML5 after the hype - JFokus2015
 
Design+Performance Velocity 2015
Design+Performance Velocity 2015Design+Performance Velocity 2015
Design+Performance Velocity 2015
 
Logan composition (2)
Logan composition (2)Logan composition (2)
Logan composition (2)
 
Hacking Web Performance @ ForwardJS 2017
Hacking Web Performance @ ForwardJS 2017Hacking Web Performance @ ForwardJS 2017
Hacking Web Performance @ ForwardJS 2017
 
10 Laravel packages everyone should know
10 Laravel packages everyone should know10 Laravel packages everyone should know
10 Laravel packages everyone should know
 
Velocity Report 2009
Velocity Report 2009Velocity Report 2009
Velocity Report 2009
 
[psuweb] Adaptive Images in Responsive Web Design
[psuweb] Adaptive Images in Responsive Web Design[psuweb] Adaptive Images in Responsive Web Design
[psuweb] Adaptive Images in Responsive Web Design
 
Prediction io 架構與整合 -DataCon.TW-2017
Prediction io 架構與整合 -DataCon.TW-2017Prediction io 架構與整合 -DataCon.TW-2017
Prediction io 架構與整合 -DataCon.TW-2017
 
Tactics to Kickstart Your Journey Toward Continuous Delivery
Tactics to Kickstart Your Journey Toward Continuous DeliveryTactics to Kickstart Your Journey Toward Continuous Delivery
Tactics to Kickstart Your Journey Toward Continuous Delivery
 
Tactics to Kickstart Your Journey Toward Continuous Delivery
Tactics to Kickstart Your Journey Toward Continuous DeliveryTactics to Kickstart Your Journey Toward Continuous Delivery
Tactics to Kickstart Your Journey Toward Continuous Delivery
 
Vpn presentation richard kong
Vpn presentation   richard kongVpn presentation   richard kong
Vpn presentation richard kong
 
GDG Varna - Hadoop
GDG Varna - HadoopGDG Varna - Hadoop
GDG Varna - Hadoop
 
High Performance HTML5 (SF HTML5 UG)
High Performance HTML5 (SF HTML5 UG)High Performance HTML5 (SF HTML5 UG)
High Performance HTML5 (SF HTML5 UG)
 
Tactics to Kickstart Your Journey Toward DevOps
Tactics to Kickstart Your Journey Toward DevOpsTactics to Kickstart Your Journey Toward DevOps
Tactics to Kickstart Your Journey Toward DevOps
 
Tactics to Kickstart Your Journey Toward DevOps
Tactics to Kickstart Your Journey Toward DevOpsTactics to Kickstart Your Journey Toward DevOps
Tactics to Kickstart Your Journey Toward DevOps
 

Dernier

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Dernier (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Apache Hadoop Ecosystem (based on an exemplary data-driven…

  • 1. Vademecum Big Data Adam Kawa, Spotify, Compendium CE
  • 2. About Me Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
  • 3. And The 20-Minute Story About ... Image source:http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg
  • 4. A Really Data-Driven Company … Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg
  • 5. And Some Inevitable Problems ... Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg
  • 6. And Some Inevitable Problems ... Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg
  • 7. And Some Inevitable Problems ... Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg
  • 9. The First Approach Works Fine ...
  • 10. Until Data Gets Bigger ...
  • 12. The Data Monster Becomes A Problem Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg
  • 13. Apache Hadoop Becomes A Solution Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg
  • 14. Orchestra Of Nodes Image source: http://www.dsn.jhu.edu/images/orchestra.gif
  • 16. Untypical Orchestra Of Typical* Nodes * however having very cheap nodes is false economy
  • 18. Hadoop Distributed File System (HDFS) Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg
  • 19. HDFS Blocks And Replication
  • 20. HDFS Self-Healing Features Image source: http://www.mwctoys.com/images/review_hydra_3.jpg
  • 21. HDFS Scales And Shines With MapReduce Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg
  • 22. MapReduce Is A Change DATA Map And Reduce Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png
  • 23. Map And Reduce Functions
  • 26. Sending Computation To Data Data Is Here! Computation Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg
  • 27. MapReduce Implementation Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony
  • 28. First Success: 5-Node Hadoop Cluster Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg
  • 29. Apache Whirr And The Cloud ===== hadoop.properties ============= whirr.cluster-name=production_cluster whirr.instance-templates= 1 hadoop-jobtracker+hadoop-namenode, 4 hadoop-datanode+hadoop-tasktracker whirr.provider=aws-ec2 # or Rackspace cloudservers-us ... ===================================== $ whirr launch-cluster --config hadoop.properties $ whirr destroy-cluster --config hadoop.properties
  • 30. First Sad (Non-Java Speaking) Developers Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg
  • 31. Hadoop Streaming For Scripting Languages Image source:http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg
  • 32. Apache Hive Makes You Feel Younger Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg
  • 33. Speak ~SQL, But Run As MapReduce
  • 34. HUE - Browser-Based Environment Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png
  • 35. Hive Is Based On & Limited By Hadoop
  • 36. Apache Pig Makes Them Happier!   Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg
  • 38. Need To Add More Relational Data To HDFS Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
  • 39. SQL To Hadoop = Sqoop Image source:http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg
  • 40. Sqoop Import/Export Data Using MR Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/
  • 41. Apache Oozie For Defining Workflows Image source: Apache Oozie website
  • 42. Apache Oozie For Scheduling Image source:http://risingtechies.files.wordpress.com/2012/05/schedule.jpg
  • 43. Need To Add Even More Logs To HDFS Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
  • 44. Apache Flume For Data Collection e.g. JDBC, Memory, File Image source: Apache Flume website
  • 45. How To Manager A Larger Cluster
  • 46. Apache Avro + Snappy/Deflate_6 Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg
  • 47. When Latency Is To High Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg
  • 48. Cloudera Impala – Real-Time ~SQL Queries Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg
  • 49. Apache HBase - Random, Real-Time Access To Big Data Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg
  • 50. YARN – Hadoop Cluster More Robust Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
  • 51. Hadoop Is Successfully Deployed Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg
  • 52. Learn More About Apache Hadoop?
  • 53. Use Hadoop To Solve Real-World Problems?
  • 54. Oozie And YARN At WHUG, Today @18:00
  • 55. Thank You! Any Questions About Them? Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg