SlideShare une entreprise Scribd logo
1  sur  12
Apache Hadoop-based Services für Windows Azure




Sascha Dittmann
Software Developer / Solution Architect
Twitter: @SaschaDittmann
Blog:    http://www.sascha-dittmann.de
Apache Hadoop & Co

             Zookeeper




    Pig
Hadoop Distributed File System

           Cluster Startvorgang
Hadoop Distributed File System
           Ausfall des Namenodes (Failover)
Hadoop Distributed File System
       Benuteranfrage


                        ①

           ②     ②          ②
Hadoop Distributed File System
 Portable Operating System Interface (POSIX)
 Replikation auf mehrere Datenknoten
js> #ls input/ncdc
Found 9 items
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:01 /user/Sascha/input/ncdc/_distcp_logs_g0dedn
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 12:04 /user/Sascha/input/ncdc/_distcp_logs_ofj0u6
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:09 /user/Sascha/input/ncdc/all
drwxr-xr-x - Sascha   supergroup   0 2012-04-24 13:01 /user/Sascha/input/ncdc/all2
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/metadata
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro
drwxr-xr-x - Sascha   supergroup   0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
-rw-r--r-- 3 Sascha   supergroup   529 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
-rw-r--r-- 3 Sascha   supergroup   168 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
Map/Reduce
 DataNode   DataNode   DataNode   0067011990999991950051507004+68750
                                  0043011990999991950051512004+68750
                                  0043011990999991950051518004+68750
                                  0043012650999991949032412004+62300
                                  0043012650999991949032418004+62300




                                  1949,0
                                                         1952,-11
                                  1950,22
   Map        Map        Map      1950,55
                                                         1950,33




   Sort       Sort       Sort     1949,0
                                  1950,[22,33,55]
  Shuffle    Shuffle    Shuffle   1952,-11




             Reduce
                                  1949,0
                                  1950,55
                                  1952,-11
Map/Reduce
 DataNode   DataNode   DataNode   0067011990999991950051507004+68750
                                  0043011990999991950051512004+68750
                                  0043011990999991950051518004+68750
                                  0043012650999991949032412004+62300
                                  0043012650999991949032418004+62300




                                  1949,0
                                                         1952,-11
                                  1950,22
   Map        Map        Map      1950,55
                                                         1950,33




                                  1949,0                 1952,-11
 Combine    Combine    Combine    1950,55                1950,33




   Sort       Sort       Sort     1949,0
                                  1950,[33,55]
  Shuffle    Shuffle    Shuffle   1952,-11




             Reduce
                                  1949,0
                                  1950,55
                                  1952,-11
RDBMS vs. Map/Reduce
                          RDBMS                  Map/Reduce
Datenmenge                Gigabytes              Petabytes
Zugriff                   Interaktiv und Batch   Batch
Lese- / Schreibzugriffe   Viele Lese- und        Einmaliges Schreiben
                          Schreibzugriffe        Viele Lesezugriffe
Datenstruktur             Statisches Schema      Dynamisches Schema
Datenintegrität           Hoch                   Niedrig
Skalierverhalten          Nicht-Linear           Linear
Apache Hadoop & Co

             Zookeeper




    Pig
Demos
 Hadoop Dashboard
 Interactive Console
 Remote Desktop
 Nutzung des WA Storage
 Map/Reduce via JavaScript
 C# Streaming
 Power Pivot
Cloud Bloggers


Die Blogs der deutschen Cloud Computing-Community

Link: http://cloudbloggers.de

Contenu connexe

Plus de Sascha Dittmann

dotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Windows Azure Mobile Servicesdotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Windows Azure Mobile Services
Sascha Dittmann
 
CloudOps Summit 2012 - 3 Wege in die Cloud
CloudOps Summit 2012 - 3 Wege in die CloudCloudOps Summit 2012 - 3 Wege in die Cloud
CloudOps Summit 2012 - 3 Wege in die Cloud
Sascha Dittmann
 
NoSQL mit RavenDB und Azure
NoSQL mit RavenDB und AzureNoSQL mit RavenDB und Azure
NoSQL mit RavenDB und Azure
Sascha Dittmann
 
Windows Azure für Entwickler V1
Windows Azure für Entwickler V1Windows Azure für Entwickler V1
Windows Azure für Entwickler V1
Sascha Dittmann
 

Plus de Sascha Dittmann (15)

C# + SQL = Big Data
C# + SQL = Big DataC# + SQL = Big Data
C# + SQL = Big Data
 
Hochskalierbare, relationale Datenbanken in Microsoft Azure
Hochskalierbare, relationale Datenbanken in Microsoft AzureHochskalierbare, relationale Datenbanken in Microsoft Azure
Hochskalierbare, relationale Datenbanken in Microsoft Azure
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at Scale
 
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSONSQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
 
dotnet Cologne 2015 - Azure Service Fabric
dotnet Cologne 2015 - Azure Service Fabric dotnet Cologne 2015 - Azure Service Fabric
dotnet Cologne 2015 - Azure Service Fabric
 
SQL Saturday #313 Rheinland - MapReduce in der Praxis
SQL Saturday #313 Rheinland - MapReduce in der PraxisSQL Saturday #313 Rheinland - MapReduce in der Praxis
SQL Saturday #313 Rheinland - MapReduce in der Praxis
 
Microsoft HDInsight Podcast #001 - Was ist HDInsight
Microsoft HDInsight Podcast #001 - Was ist HDInsightMicrosoft HDInsight Podcast #001 - Was ist HDInsight
Microsoft HDInsight Podcast #001 - Was ist HDInsight
 
dotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Windows Azure Mobile Servicesdotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Windows Azure Mobile Services
 
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwicklerdotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
 
Developer Open Space 2012 - Cloud Computing Workshop
Developer Open Space 2012 - Cloud Computing WorkshopDeveloper Open Space 2012 - Cloud Computing Workshop
Developer Open Space 2012 - Cloud Computing Workshop
 
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
 
CloudOps Summit 2012 - 3 Wege in die Cloud
CloudOps Summit 2012 - 3 Wege in die CloudCloudOps Summit 2012 - 3 Wege in die Cloud
CloudOps Summit 2012 - 3 Wege in die Cloud
 
Big Data & NoSQL
Big Data & NoSQLBig Data & NoSQL
Big Data & NoSQL
 
NoSQL mit RavenDB und Azure
NoSQL mit RavenDB und AzureNoSQL mit RavenDB und Azure
NoSQL mit RavenDB und Azure
 
Windows Azure für Entwickler V1
Windows Azure für Entwickler V1Windows Azure für Entwickler V1
Windows Azure für Entwickler V1
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

.NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Services für Windows Azure

  • 1. Apache Hadoop-based Services für Windows Azure Sascha Dittmann Software Developer / Solution Architect Twitter: @SaschaDittmann Blog: http://www.sascha-dittmann.de
  • 2. Apache Hadoop & Co Zookeeper Pig
  • 3. Hadoop Distributed File System Cluster Startvorgang
  • 4. Hadoop Distributed File System Ausfall des Namenodes (Failover)
  • 5. Hadoop Distributed File System Benuteranfrage ① ② ② ②
  • 6. Hadoop Distributed File System  Portable Operating System Interface (POSIX)  Replikation auf mehrere Datenknoten js> #ls input/ncdc Found 9 items drwxr-xr-x - Sascha supergroup 0 2012-04-24 13:01 /user/Sascha/input/ncdc/_distcp_logs_g0dedn drwxr-xr-x - Sascha supergroup 0 2012-04-24 12:04 /user/Sascha/input/ncdc/_distcp_logs_ofj0u6 drwxr-xr-x - Sascha supergroup 0 2012-04-24 13:09 /user/Sascha/input/ncdc/all drwxr-xr-x - Sascha supergroup 0 2012-04-24 13:01 /user/Sascha/input/ncdc/all2 drwxr-xr-x - Sascha supergroup 0 2012-04-23 13:06 /user/Sascha/input/ncdc/metadata drwxr-xr-x - Sascha supergroup 0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro drwxr-xr-x - Sascha supergroup 0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro-tab -rw-r--r-- 3 Sascha supergroup 529 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt -rw-r--r-- 3 Sascha supergroup 168 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
  • 7. Map/Reduce DataNode DataNode DataNode 0067011990999991950051507004+68750 0043011990999991950051512004+68750 0043011990999991950051518004+68750 0043012650999991949032412004+62300 0043012650999991949032418004+62300 1949,0 1952,-11 1950,22 Map Map Map 1950,55 1950,33 Sort Sort Sort 1949,0 1950,[22,33,55] Shuffle Shuffle Shuffle 1952,-11 Reduce 1949,0 1950,55 1952,-11
  • 8. Map/Reduce DataNode DataNode DataNode 0067011990999991950051507004+68750 0043011990999991950051512004+68750 0043011990999991950051518004+68750 0043012650999991949032412004+62300 0043012650999991949032418004+62300 1949,0 1952,-11 1950,22 Map Map Map 1950,55 1950,33 1949,0 1952,-11 Combine Combine Combine 1950,55 1950,33 Sort Sort Sort 1949,0 1950,[33,55] Shuffle Shuffle Shuffle 1952,-11 Reduce 1949,0 1950,55 1952,-11
  • 9. RDBMS vs. Map/Reduce RDBMS Map/Reduce Datenmenge Gigabytes Petabytes Zugriff Interaktiv und Batch Batch Lese- / Schreibzugriffe Viele Lese- und Einmaliges Schreiben Schreibzugriffe Viele Lesezugriffe Datenstruktur Statisches Schema Dynamisches Schema Datenintegrität Hoch Niedrig Skalierverhalten Nicht-Linear Linear
  • 10. Apache Hadoop & Co Zookeeper Pig
  • 11. Demos  Hadoop Dashboard  Interactive Console  Remote Desktop  Nutzung des WA Storage  Map/Reduce via JavaScript  C# Streaming  Power Pivot
  • 12. Cloud Bloggers Die Blogs der deutschen Cloud Computing-Community Link: http://cloudbloggers.de