Sascha Dittmann
Blog: http://www.sascha-dittmann.de
Twitter: @SaschaDittmann
Microsoft HDInsight for .NET Developers
Big Data Analysis with JavaScript and C#
Large Hadron Collider (CERN, Switzerland)
http://public.web.cern.ch/public/en/lhc/Computing-en.html
The LHC particle accelerator produces 15 PB of measurement data per year*
Where Does Big Data Come From?
70% of U.S. smartphone owners regularly shop online via their devices.
44% of users (350M people) access Facebook via mobile devices.
50% of millennials use mobile devices to research products.
60% of U.S. mobile data will be audio and video streaming by 2014.
2/3 of the world's mobile data traffic will be video by 2016.
33% of BI will be consumed via handheld devices by 2013.
Gaming consoles are now used an average of 1.5 hrs/wk to connect to the Internet.
80% growth of unstructured data is predicted over the next five years.
1.8 zettabytes of digital data were in use worldwide in 2011, up 30% from 2010.
1 in 4 Facebook users add their location to posts (2B/month).
500M Tweets are hosted on Twitter each day.
38% of people recommend a brand they “like” or follow on a social network.
Brands get 100M Facebook “likes” per day.
Big Data, Social, Mobility, Cloud
Big Data Scenarios
Web app optimization
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Healthcare outcomes
Weather forecasting
Natural resource exploration
Social network analysis
Churn analysis
Traffic flow optimization
IT infrastructure optimization
Legal discovery
Big Data is sexy
http://hbr.org/
Apache Hadoop Ecosystem
MapReduce (Job Scheduling/Execution System)
HDFS (Hadoop Distributed File System)
HBase (Column DB)
Pig (Data Flow)
Hive (Warehouse and Data Access)
Oozie (Workflow)
Sqoop
Traditional BI Tools
HBase / Cassandra (Columnar NoSQL Databases)
Avro (Serialization)
ZooKeeper (Coordination)
Apache Mahout
Cascading (programming model)
Hadoop = MapReduce + HDFS
Flume
Microsoft HDInsight
MapReduce (Job Scheduling/Execution System)
HDFS (Hadoop Distributed File System)
HBase (Column DB)
Pig (Data Flow)
Hive (Warehouse and Data Access)
Oozie (Workflow)
Sqoop
Traditional BI Tools
HBase / Cassandra (Columnar NoSQL Databases)
Avro (Serialization)
ZooKeeper (Coordination)
Apache Mahout
Cascading (programming model)
Hadoop = MapReduce + HDFS
Flume
Windows
System Center
Active Directory
Visual Studio
Hadoop Distributed File System (HDFS)
Boot process
Fault tolerance
User request
Hadoop Distributed File System (HDFS)
 Portable Operating System Interface (POSIX)
 Replication across multiple data nodes
js> #ls /user/Sascha/input/ncdc
Found 9 items
drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
-rw-r--r-- 3 Sascha supergroup 529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
-rw-r--r-- 3 Sascha supergroup 168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
HDInsight Dashboard Demo
Map/Reduce Using Measurement Data as an Example
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
Highlighted fields: year, air temperature, measurement quality
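A minimal sketch of how the three highlighted fields could be cut out of such a record in JavaScript. The character offsets (15-19 for the year, 87-92 for the signed air temperature, 92 for the quality code), the `9999` missing-value marker, and the accepted quality codes are assumptions taken from the usual reading of the fixed-width NCDC format; the slides do not state them. The function names `parseNcdcRecord` and `mapRecord` are hypothetical.

```javascript
// Assumption: fixed-width NCDC layout, 0-based offsets, end-exclusive.
function parseNcdcRecord(line) {
  return {
    year: line.substring(15, 19),
    // Signed value in tenths of a degree Celsius, e.g. "+0022" or "-0011".
    airTemperature: parseInt(line.substring(87, 92), 10),
    quality: line.substring(92, 93),
  };
}

// Map step: emit (year, temperature) only for usable readings.
// Assumption: 9999 marks a missing value; codes 0, 1, 4, 5, 9 count as valid.
function mapRecord(line, emit) {
  const { year, airTemperature, quality } = parseNcdcRecord(line);
  if (airTemperature !== 9999 && /[01459]/.test(quality)) {
    emit(year, airTemperature);
  }
}

// The five sample records from the slide.
const sampleRecords = [
  "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
  "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
  "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
  "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
  "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999",
];

const emitted = [];
sampleRecords.forEach((line) => mapRecord(line, (year, temp) => emitted.push([year, temp])));
console.log(emitted);
// [["1950",0],["1950",22],["1950",-11],["1949",111],["1949",78]]
```

Keeping the parsing separate from the map step makes it easy to test the offsets against a handful of known records before running a job on the full data set.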
Map/Reduce
Diagram: each of the three DataNodes runs Map, Sort, and Shuffle; a single Reduce step follows.
Input records (truncated):
0067011990999991950051507004+68750
0043011990999991950051512004+68750
0043011990999991950051518004+68750
0043012650999991949032412004+62300
0043012650999991949032418004+62300
Map output: 1949,0 | 1950,22 | 1950,55 | 1952,-11 | 1950,33
After sort and shuffle: 1949,0 | 1950,[22,33,55] | 1952,-11
Reduce output: 1949,0 | 1950,55 | 1952,-11
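The flow on this slide can be traced in a few lines of plain JavaScript. This is only a single-process sketch of the idea, not how Hadoop executes it; the helper names `shuffle` and `reduceMax` are hypothetical, and the input pairs are the intermediate values shown on the slide.

```javascript
// Intermediate (year, temperature) pairs as they leave the map phase.
const mapOutput = [
  [1949, 0], [1950, 22], [1950, 55], [1952, -11], [1950, 33],
];

// Shuffle: collect all values that share a key and sort each group.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  for (const values of groups.values()) values.sort((a, b) => a - b);
  return groups;
}

// Reduce: keep the maximum temperature per year.
function reduceMax(groups) {
  const result = new Map();
  for (const [key, values] of groups) result.set(key, Math.max(...values));
  return result;
}

const grouped = shuffle(mapOutput);
const maxPerYear = reduceMax(grouped);
console.log([...grouped]);    // [[1949,[0]],[1950,[22,33,55]],[1952,[-11]]]
console.log([...maxPerYear]); // [[1949,0],[1950,55],[1952,-11]]
```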
Map/Reduce with a Combine Step
Diagram: each of the three DataNodes runs Map, Combine, Sort, and Shuffle; a single Reduce step follows.
Input records (truncated):
0067011990999991950051507004+68750
0043011990999991950051512004+68750
0043011990999991950051518004+68750
0043012650999991949032412004+62300
0043012650999991949032418004+62300
Map output: 1949,0 | 1950,22 | 1950,55 | 1952,-11 | 1950,33
After combine: 1949,0 | 1950,55 | 1952,-11 | 1950,33
After sort and shuffle: 1949,0 | 1950,[33,55] | 1952,-11
Reduce output: 1949,0 | 1950,55 | 1952,-11
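A combiner is a local reduce that runs on each node before data crosses the network. The sketch below reproduces the numbers on this slide; how the five pairs are split across the three DataNodes is an assumption (the slide only implies that 1950,22 and 1950,55 share a node), and `combineMax` is a hypothetical helper name.

```javascript
// Assumed distribution of the map output across three DataNodes.
const nodeOutputs = [
  [[1949, 0], [1950, 22], [1950, 55]],
  [[1952, -11]],
  [[1950, 33]],
];

// Combine: keep only the local maximum per year on each node.
function combineMax(pairs) {
  const local = new Map();
  for (const [year, temp] of pairs) {
    local.set(year, local.has(year) ? Math.max(local.get(year), temp) : temp);
  }
  return [...local];
}

// Four pairs leave the nodes instead of five.
const combined = nodeOutputs.flatMap(combineMax);

// Shuffle and reduce as before, now on less data.
const byYear = new Map();
for (const [year, temp] of combined) {
  byYear.set(year, [...(byYear.get(year) || []), temp]);
}
const finalMax = [...byYear].map(([year, temps]) => [year, Math.max(...temps)]);

console.log(combined); // [[1949,0],[1950,55],[1952,-11],[1950,33]]
console.log(finalMax); // [[1949,0],[1950,55],[1952,-11]]
```

This works for a maximum because the operation is associative and commutative: the maximum of local maxima equals the global maximum. An average, for example, could not be combined this way without also carrying the counts along.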
Map/Reduce Using Measurement Data as an Example
Counting Words with JavaScript (Map)
Counting Words with JavaScript (Reduce)
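The code from these two slides is not part of the transcript. The following is a sketch of what a word-count map and reduce function might look like, assuming the `map(key, value, context)` / `reduce(key, values, context)` shape with `context.write`, `values.hasNext()` and `values.next()` used by the HDInsight JavaScript samples. The `runLocally` helper is purely hypothetical and only stands in for the cluster so the functions can be tried in Node.js.

```javascript
// Map: split a line into words and emit (word, 1) for each one.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

// Reduce: sum all counts that were emitted for one word.
var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local stand-in for the cluster: group map output, then reduce.
function runLocally(lines) {
  var grouped = Object.create(null);
  lines.forEach(function (line, i) {
    map(i, line, {
      write: function (k, v) { (grouped[k] = grouped[k] || []).push(v); },
    });
  });
  var counts = Object.create(null);
  Object.keys(grouped).forEach(function (word) {
    var idx = 0;
    var vals = grouped[word];
    var iterator = {
      hasNext: function () { return idx < vals.length; },
      next: function () { return vals[idx++]; },
    };
    reduce(word, iterator, { write: function (k, v) { counts[k] = v; } });
  });
  return counts;
}

var wordCounts = runLocally(["Big Data is big", "data about data"]);
console.log(wordCounts); // big: 2, data: 3, is: 1, about: 1
```

Note that the `[^a-zA-Z]` split treats umlauts as separators, so German input would need a wider character class.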
Map/Reduce with JavaScript
Refining with Pig Latin
pig
    .from("/user/Sascha/input/texte")
    .mapReduce("/user/…/WordCount.js", "Woerter, Anzahl:long")
    .orderBy("Anzahl DESC")
    .take(15)
    .to("/user/Sascha/output/Top15Woerter")
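To make the last steps of this pipeline concrete, here is a plain JavaScript sketch of what ordering by `Anzahl DESC` and taking the first rows amounts to. The sample rows and the helpers `orderByDesc` and `take` are hypothetical; they only mirror the chain above and are not the Pig API.

```javascript
// Hypothetical rows standing in for the output of WordCount.js,
// using the column names declared in the mapReduce step.
const rows = [
  { Woerter: "und", Anzahl: 42 },
  { Woerter: "daten", Anzahl: 17 },
  { Woerter: "die", Anzahl: 58 },
  { Woerter: "hadoop", Anzahl: 9 },
];

// orderBy("Anzahl DESC"): sort a copy, highest count first.
function orderByDesc(input, column) {
  return [...input].sort((a, b) => b[column] - a[column]);
}

// take(n): keep only the first n rows.
function take(input, n) {
  return input.slice(0, n);
}

const top2 = take(orderByDesc(rows, "Anzahl"), 2);
console.log(top2.map((r) => r.Woerter)); // ["die", "und"]
```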
Pig Latin
Counting Words with C# (Map - Classic)
Counting Words with C# (Reduce - Classic)
Map/Reduce with C#
.NET Job Submission Framework (Map)
.NET Job Submission Framework (Reduce)
Creating an External Hive Table
CREATE EXTERNAL TABLE twitter_raw
(
    tweet_json STRING
)
COMMENT 'Twitter Sample Data'
ROW FORMAT DELIMITED LINES TERMINATED BY '10'
STORED AS TEXTFILE
LOCATION '/example/twitterdata';
Twitter JSON
{
"possibly_sensitive_editable":true,
"place":null,
"text":"Pre - #ConvCloud chat insights. \" #Cloud Security, are we missing the point?\" from @christianve http://t.co/Smo0CPvb #HP #cloudsource",
"id_str":"223418953114984448",
"favorited":false,
"possibly_sensitive":false,
"created_at":"Thu Jul 12 14:10:04 +0000 2012",
"retweeted":false,
"retweet_count":0,
"user":{
"is_translator":false,
"profile_use_background_image":true,
"profile_image_url_https":"https://si0.twimg.com/profile_images/640456324/Paul_Calento_normal.jpg",
"id_str":"103006513",
"profile_text_color":"333333",
"statuses_count":5984,
"following":null,
"followers_count":744,
"default_profile_image":false,
"profile_link_color":"FF3300",
}, …..
}
Interpreting JSON in Hive
FROM twitter_raw
INSERT OVERWRITE TABLE twitter_temp
SELECT get_json_object(tweet_json, '$.created_at'),
substr(get_json_object(tweet_json, '$.created_at'),9,2),
substr(get_json_object(tweet_json, '$.created_at'),12,8),
get_json_object(tweet_json, '$.in_reply_to_user_id_str'),
get_json_object(tweet_json, '$.text'),
get_json_object(tweet_json, '$.contributors'),
get_json_object(tweet_json, '$.retweeted'),
get_json_object(tweet_json, '$.truncated'),
get_json_object(tweet_json, '$.favorited'),
cast(get_json_object(tweet_json, '$.retweet_count') as int),
/* … */
get_json_object(tweet_json, '$.user.profile_image_url_https'),
cast(get_json_object(tweet_json, '$.user.followers_count') as int),
get_json_object(tweet_json, '$.user.location'),
get_json_object(tweet_json, '$.user.time_zone'),
get_json_object(tweet_json, '$.user.created_at');
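What the query extracts can be checked on a single tweet in JavaScript. This is a rough sketch: `getJsonObject` and `substr` are hypothetical stand-ins that mimic only the simple `$.a.b` paths and the 1-based indexing used above, and unlike Hive's `get_json_object` the stand-in returns native JSON values rather than strings. The field values come from the sample tweet.

```javascript
// Shortened tweet containing only the fields used here.
const tweetJson = JSON.stringify({
  created_at: "Thu Jul 12 14:10:04 +0000 2012",
  retweet_count: 0,
  user: { followers_count: 744 },
});

// Stand-in for get_json_object, limited to simple '$.a.b' paths.
function getJsonObject(json, path) {
  let node = JSON.parse(json);
  for (const key of path.replace(/^\$\./, "").split(".")) {
    if (node === null || typeof node !== "object" || !(key in node)) return null;
    node = node[key];
  }
  return node;
}

// Hive's substr is 1-based; JavaScript's substring is 0-based, end-exclusive.
function substr(text, start, length) {
  return text.substring(start - 1, start - 1 + length);
}

const createdAt = getJsonObject(tweetJson, "$.created_at");
const row = {
  day: substr(createdAt, 9, 2),
  time: substr(createdAt, 12, 8),
  followers: getJsonObject(tweetJson, "$.user.followers_count"),
  location: getJsonObject(tweetJson, "$.user.location"),
};
console.log(row);
// { day: "12", time: "14:10:04", followers: 744, location: null }
```

A missing path yields `null` here, which matches how the query leaves columns empty when a tweet lacks a field.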
Hive
RDBMS vs. Hadoop
                  RDBMS                    Hadoop
Volume            Gigabytes                Petabytes
Processing        Ad hoc and batch         Batch
Updates           Many reads and writes    Write once, read many times
Schema            Static schema            Dynamic schema
Data integrity    High                     Low
Scaling           Non-linear               Linear
Polybase / SQL Server PDW
Questions?

dotnet Cologne 2013 - Microsoft HDInsight for .NET Developers

  • 1. Sascha Dittmann. Blog: http://www.sascha-dittmann.de Twitter: @SaschaDittmann. Microsoft HDInsight for .NET Developers: Big Data Analysis with JavaScript and C#
  • 2. Large Hadron Collider (CERN, Switzerland) http://public.web.cern.ch/public/en/lhc/Computing-en.html The LHC particle accelerator produces 15 PB of measurement data per year*
  • 3. Where does Big Data come from? 70% of U.S. smartphone owners regularly shop online via their devices. 44% of users (350M people) access Facebook via mobile devices. 50% of millennials use mobile devices to research products. 60% of U.S. mobile data will be audio and video streaming by 2014. 2/3 of the world's mobile data traffic will be video by 2016. 33% of BI will be consumed via handheld devices by 2013. Gaming consoles are now used an average of 1.5 hrs/wk to connect to the Internet. 80% growth of unstructured data is predicted over the next five years. 1.8 zettabytes of digital data were in use worldwide in 2011, up 30% from 2010. 1 in 4 Facebook users add their location to posts (2B/month). 500M Tweets are hosted on Twitter each day. 38% of people recommend a brand they "like" or follow on a social network. Brands get 100M Facebook "likes" per day. Slide labels: Big Data, Social, Mobility, Cloud.
  • 4. Big Data Scenarios: Web app optimization, Smart meter monitoring, Equipment monitoring, Advertising analysis, Life sciences research, Fraud detection, Healthcare outcomes, Weather forecasting, Natural resource exploration, Social network analysis, Churn analysis, Traffic flow optimization, IT infrastructure optimization, Legal discovery
  • 5. Big Data is sexy http://hbr.org/
  • 6. Apache Hadoop Ecosystem: MapReduce (Job Scheduling/Execution System), HDFS (Hadoop Distributed File System), HBase (Column DB), Pig (Data Flow), Hive (Warehouse and Data Access), Oozie (Workflow), Sqoop, Traditional BI Tools, HBase / Cassandra (Columnar NoSQL Databases), Avro (Serialization), Zookeeper (Coordination), Apache Mahout, Cascading (programming model), Flume. Hadoop = MapReduce + HDFS
  • 7. Microsoft HDInsight: MapReduce (Job Scheduling/Execution System), HDFS (Hadoop Distributed File System), HBase (Column DB), Pig (Data Flow), Hive (Warehouse and Data Access), Oozie (Workflow), Sqoop, Traditional BI Tools, HBase / Cassandra (Columnar NoSQL Databases), Avro (Serialization), Zookeeper (Coordination), Apache Mahout, Cascading (programming model), Flume. Hadoop = MapReduce + HDFS. Added on the Microsoft side: Windows, System Center, Active Directory, Visual Studio
  • 8. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  • 9. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  • 11. Hadoop Distributed File System (HDFS): Portable Operating System Interface (POSIX); replication across multiple data nodes
      js> #ls /user/Sascha/input/ncdc
      Found 9 items
      drwxr-xr-x   - Sascha supergroup    0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
      drwxr-xr-x   - Sascha supergroup    0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
      drwxr-xr-x   - Sascha supergroup    0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
      drwxr-xr-x   - Sascha supergroup    0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
      drwxr-xr-x   - Sascha supergroup    0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
      -rw-r--r--   3 Sascha supergroup  529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
      -rw-r--r--   3 Sascha supergroup  168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
  • 13. Map/Reduce using measurement data as an example
      0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
      0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
      0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
      0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
      0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
      Highlighted fields: year, air temperature
  • 14. Map/Reduce using measurement data as an example (same five records as slide 13). Highlighted field: measurement quality
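
The field extraction highlighted on slides 13 and 14 can be sketched in JavaScript as follows. This is only an illustration: the column offsets, the 9999 "missing" marker, the accepted quality codes and the choice to reduce to the maximum per year are assumptions taken from the commonly used NCDC max-temperature example, and the function names are hypothetical.

```javascript
// Sketch of a map step for the fixed-width weather records on slide 13.
// Offsets and the quality filter are assumptions, not taken from the slides.
const MISSING = 9999;

function mapRecord(line) {
  const year = line.substring(15, 19);
  // parseInt accepts the leading '+' or '-'; the value is in tenths of a degree Celsius.
  const airTemperature = parseInt(line.substring(87, 92), 10);
  const quality = line.substring(92, 93);
  if (airTemperature === MISSING || !/[01459]/.test(quality)) {
    return null; // skip missing or suspect readings
  }
  return [year, airTemperature];
}

// Local stand-in for the reduce step: keep the highest reading per year.
function maxPerYear(lines) {
  const result = {};
  for (const line of lines) {
    const pair = mapRecord(line);
    if (pair === null) continue;
    const [year, temperature] = pair;
    if (!(year in result) || temperature > result[year]) {
      result[year] = temperature;
    }
  }
  return result;
}

const records = [
  "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
  "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
  "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
  "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
  "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999",
];

const maxima = maxPerYear(records);
console.log(maxima); // { '1949': 111, '1950': 22 }
```

For the five sample records this yields 11.1 °C for 1949 and 2.2 °C for 1950, since the readings are stored in tenths of a degree.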
  • 16. Map/Reduce with a Combine method. Diagram: three DataNodes each run Map, Combine, Sort and Shuffle; a single Reduce follows.
      Input records (truncated on the slide):
      0067011990999991950051507004+68750
      0043011990999991950051512004+68750
      0043011990999991950051518004+68750
      0043012650999991949032412004+62300
      0043012650999991949032418004+62300
      Map output: 1949,0 1950,22 1950,55 1952,-11 1950,33
      After Combine: 1949,0 1950,55 1952,-11 1950,33
      After Sort/Shuffle: 1949,0 1950,[33,55] 1952,-11
      Reduce output: 1949,0 1950,55 1952,-11
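
The effect of the Combine step on slide 16 can be reproduced with a small local sketch. It assumes the job computes a maximum per year (which the slide values suggest); how the five map outputs are split across the three DataNodes is a guess, and `maxByKey` is a hypothetical helper name.

```javascript
// Sketch: a combiner pre-aggregates map output on each node before the shuffle.
// Taking a maximum is associative and commutative, so the same function can act
// as combiner and as reducer. The per-node split below is an assumption.
function maxByKey(pairs) {
  const best = new Map();
  for (const [key, value] of pairs) {
    if (!best.has(key) || value > best.get(key)) {
      best.set(key, value);
    }
  }
  return [...best.entries()];
}

// Map output per DataNode (values from slide 16, grouping assumed).
const nodeOutputs = [
  [["1949", 0], ["1950", 22], ["1950", 55]],
  [["1952", -11]],
  [["1950", 33]],
];

// Combine locally on each node, then sort by key and reduce once.
const combined = nodeOutputs.map(maxByKey);
const shuffled = combined.flat().sort((a, b) => a[0].localeCompare(b[0]));
const reduced = maxByKey(shuffled);

console.log(combined.flat().length); // 4 pairs cross the network instead of 5
console.log(reduced); // [ [ '1949', 0 ], [ '1950', 55 ], [ '1952', -11 ] ]
```

An average would not work this way, because an average of averages is generally not the overall average; a combiner is only safe when the operation can be applied in stages.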
  • 17. Map/Reduce using measurement data as an example
  • 18. Counting words with JavaScript (Map)
  • 19. Counting words with JavaScript (Reduce)
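
The code on slides 18 and 19 did not survive in the transcript. The following is a minimal sketch of what a word-count map and reduce pair in JavaScript might look like. The `map(key, value, context)` and `reduce(key, values, context)` shapes, with `context.write`, `values.hasNext()` and `values.next()`, are assumptions about the interface the HDInsight JavaScript layer expected; `runLocally` is a hypothetical harness that imitates the framework so the sketch runs on its own.

```javascript
// Map: emit (word, 1) for every word in a line of text.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

// Reduce: sum all counts that were emitted for one word.
var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local harness standing in for the MapReduce framework.
function runLocally(lines) {
  const groups = new Map();
  lines.forEach((line, offset) => {
    map(offset, line, {
      write: (word, count) => {
        if (!groups.has(word)) groups.set(word, []);
        groups.get(word).push(count);
      },
    });
  });
  const counts = {};
  for (const [word, emitted] of groups) {
    let i = 0;
    const iterator = {
      hasNext: () => i < emitted.length,
      next: () => emitted[i++],
    };
    reduce(word, iterator, { write: (k, v) => { counts[k] = v; } });
  }
  return counts;
}

const wordCounts = runLocally(["Big Data is big", "data about data"]);
console.log(wordCounts); // { big: 2, data: 3, is: 1, about: 1 }
```

The ASCII-only split pattern would cut German words at umlauts, so real input in German would need a wider character class.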
  • 21. Refining with Pig Latin
      pig.from("/user/Sascha/input/texte")
         .mapReduce("/user/…/WordCount.js", "Woerter, Anzahl:long")
         .orderBy("Anzahl DESC")
         .take(15)
         .to("/user/Sascha/output/Top15Woerter")
  • 23. Counting words with C# (Map - Classic)
  • 24. Counting words with C# (Reduce - Classic)
  • 26. .NET Job Submission Framework (Map)
  • 27. .NET Job Submission Framework (Reduce)
  • 28. Creating an external Hive table
      CREATE EXTERNAL TABLE twitter_raw (
        tweet_json STRING
      )
      COMMENT 'Twitter Sample Data'
      ROW FORMAT DELIMITED
      LINES TERMINATED BY '10'
      STORED AS TEXTFILE
      LOCATION '/example/twitterdata';
  • 29. Twitter JSON
      {
        "possibly_sensitive_editable":true,
        "place":null,
        "text":"Pre - #ConvCloud chat insights. \" #Cloud Security, are we missing the point?\" from @christianve http://t.co/Smo0CPvb #HP #cloudsource",
        "id_str":"223418953114984448",
        "favorited":false,
        "possibly_sensitive":false,
        "created_at":"Thu Jul 12 14:10:04 +0000 2012",
        "retweeted":false,
        "retweet_count":0,
        "user":{
          "is_translator":false,
          "profile_use_background_image":true,
          "profile_image_url_https":"https://si0.twimg.com/profile_images/640456324/Paul_Calento_normal.jpg",
          "id_str":"103006513",
          "profile_text_color":"333333",
          "statuses_count":5984,
          "following":null,
          "followers_count":744,
          "default_profile_image":false,
          "profile_link_color":"FF3300",
        },
        …..
      }
  • 30. Parsing JSON in Hive
      FROM twitter_raw
      INSERT OVERWRITE TABLE twitter_temp
      SELECT
        get_json_object(tweet_json, '$.created_at'),
        substr(get_json_object(tweet_json, '$.created_at'),9,2),
        substr(get_json_object(tweet_json, '$.created_at'),12,8),
        get_json_object(tweet_json, '$.in_reply_to_user_id_str'),
        get_json_object(tweet_json, '$.text'),
        get_json_object(tweet_json, '$.contributors'),
        get_json_object(tweet_json, '$.retweeted'),
        get_json_object(tweet_json, '$.truncated'),
        get_json_object(tweet_json, '$.favorited'),
        cast(get_json_object(tweet_json, '$.retweet_count') as int),
        /* … */
        get_json_object(tweet_json, '$.user.profile_image_url_https'),
        cast(get_json_object(tweet_json, '$.user.followers_count') as int),
        get_json_object(tweet_json, '$.user.location'),
        get_json_object(tweet_json, '$.user.time_zone'),
        get_json_object(tweet_json, '$.user.created_at');
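
To see what the `get_json_object` and `substr` calls on slide 30 pull out of a tweet, here is a small JavaScript imitation. It is a sketch only: `getJsonObject` and `hiveSubstr` are hypothetical stand-ins that handle just the simple `$.a.b` paths and 1-based positions used on the slide, and the sample tweet is cut down to three fields from slide 29.

```javascript
// Minimal stand-in for Hive's get_json_object; supports only '$.a.b' style paths.
function getJsonObject(json, path) {
  let node = JSON.parse(json);
  for (const key of path.replace(/^\$\./, "").split(".")) {
    if (node === null || typeof node !== "object" || !(key in node)) {
      return null; // a missing path yields NULL in Hive
    }
    node = node[key];
  }
  return node;
}

// Hive's substr(str, pos, len) counts positions from 1, unlike JavaScript's slice.
function hiveSubstr(str, pos, len) {
  return str.slice(pos - 1, pos - 1 + len);
}

// Three fields taken from the sample tweet on slide 29.
const tweetJson = JSON.stringify({
  created_at: "Thu Jul 12 14:10:04 +0000 2012",
  retweet_count: 0,
  user: { followers_count: 744 },
});

const createdAt = getJsonObject(tweetJson, "$.created_at");
const day = hiveSubstr(createdAt, 9, 2);    // day of month
const time = hiveSubstr(createdAt, 12, 8);  // time of day
const followers = getJsonObject(tweetJson, "$.user.followers_count");

console.log(day, time, followers); // 12 14:10:04 744
```

Hive's function returns strings, which is why the slide wraps the numeric fields in `cast(... as int)`; the stand-in above returns the parsed JSON value directly.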
  • 31. Hive
  • 32. RDBMS vs. Hadoop
      Volume: RDBMS gigabytes; Hadoop petabytes
      Processing: RDBMS ad hoc and batch; Hadoop batch
      Updates: RDBMS many reads and writes; Hadoop write once, read many times
      Schema: RDBMS static schema; Hadoop dynamic schema
      Data integrity: RDBMS high; Hadoop low
      Scaling behavior: RDBMS non-linear; Hadoop linear
  • 33. Polybase / SQL Server PDW