SQLSaturday #230 Rheinland - Introduction to Microsoft Big Data (Part 1)
Sascha Dittmann
Software Architect & Developer – Ernst & Young GmbH
www.sascha-dittmann.de
Georg Urban
Snr. Technology Solution Professional | Data Platform
georg.urban@microsoft.com
13.07.2013
THE HADOOP ECOSYSTEM
Big Data Characteristics: the "3 Vs"
How to deal with the "3 Vs"?
A brief history of Hadoop
2002: The Apache Nutch open source search engine is started by Doug Cutting
2003: Google publishes a paper on GFS (Google File System)
2004: Nutch Distributed File System (NDFS) is developed
2004: Google publishes a paper on MapReduce
2005: MapReduce is implemented on NDFS
2006: Doug Cutting joins Yahoo! & starts the Apache Hadoop subproject
2008: Hadoop is made an Apache top-level project.
…Yahoo's search index runs on a 10,000-node cluster
…Hadoop breaks the record for a 1 TB sort: 209 s on 910 nodes
…New York Times converts 4 TB of archives to PDFs in 24 h on 100 nodes
http://labs.google.com/papers/mapreduce.htm
Today: Hadoop has become a synonym for Big Data processing
Hadoop: The popular Face of Big Data
RDBMS & Hadoop Comparison
                Traditional RDBMS          MapReduce
Data Volume     Terabytes                  Petabytes / Exabytes
Access          Interactive & Batch        Batch
Updates         Read / Write many times    Write once, Read many times
Structure       Static Schema              Dynamic Schema
Integrity       High (ACID)                Low (BASE*)
Scaling         Nonlinear                  Linear
DBA Ratio       1:40                       1:3000
Source: Tom White's Hadoop: The Definitive Guide
*Basically Available, Soft state, Eventual consistency
MapReduce is simple… (well: basically)
The Hadoop Ecosystem (simplified)
Source: Tom White's Hadoop: The Definitive Guide
The Hadoop Ecosystem (parts of it…)
• HBase (Column DB)
• Hive
• Mahout
• Oozie
• Sqoop
• HBase/Cassandra/Couch/MongoDB
• Avro
• Zookeeper
• Pig
• Karmasphere
• Flume
• Cascading
• R
• Ambari
• HCatalog
• Datameer
• Hortonworks
• Cloudera
• Splunk
• HStreaming
• MapR
• Hadapt
Hadoop = MapReduce + HDFS
There's even more: Mahout for machine learning
• Scalable machine learning library that leverages the Hadoop infrastructure
• Key use cases:
  • Recommendation mining
  • Clustering
  • Classification
• Algorithms: K-means Clustering, Naïve Bayes, Decision Tree, Neural Network, Hierarchical Clustering, Positive Matrix Factorization and more…
R for statistical computing
• An open and extensible statistical computing environment
  • Based on the S language
• Used by data scientists to explore data and generate graphical output
• A well-developed programming language
• Many "packages" available to extend R
…but: That's not enterprise-ready… Really not…
HDINSIGHT
SERVER & SERVICE
Big Data in the Enterprise should…
fit into the existing IT infrastructure
be easy to manage
rely on existing skill sets
be cost optimized
Why Apache Hadoop on Windows?
• According to IDC, Windows Server held a 73% market share in 2012
• Hadoop was traditionally built for Linux servers, so there is a large number of underserved organizations
• According to a 2012 Barclays CIO study, big data outranks virtualization as the #1 trend driving spending initiatives
• Unstructured data growth exceeds 80% year over year in most enterprises
• Apache Hadoop is the de facto big data platform for processing massive amounts of unstructured data
• Complementary to existing Microsoft technologies
• There is a huge untapped community of Windows developers and ecosystem partners
• A strong Microsoft-Hortonworks partnership and 18 months of development make this a natural next step
OS Cloud VM Appliance
Enterprise Hadoop Distribution: Hortonworks Data Platform (HDP)
• Hadoop designed for enterprises
• The "really complete" open source distribution
• Ecosystem designed for interoperability
[Diagram: Hortonworks Data Platform (HDP) with the layers Hadoop Core, Platform Services, Data Services, and Operational Services; captions: "Distributed Data Storage & Processing", "Enterprise Availability", "Store, Process & Connect", "Management of Hadoop Environment"]
Leadership that Starts at the Core
• Driving next-generation Hadoop
  • YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
  • More than twice the nearest contributor
• Deeply integrating with the ecosystem
  • Enabling new deployment platforms (e.g. Windows & Azure, Linux & VMware HA)
  • Creating deeply engineered solutions (e.g. Teradata big data appliance)
• All Apache, NO holdbacks
  • 100% of code contributed to Apache
HDInsight: Windows-optimized Hadoop
Big Data @Microsoft
Microsoft HDInsight Server on Windows Server
Windows Azure HDInsight Service (Cloud)
Enterprise Ready Hadoop
Simplicity & Manageability of Windows
AD Integration
Monitoring (System Center)
Integrated in Microsoft Business Intelligence
JavaScript, HiveODBC, .NET
…
Up and running in minutes with HDInsight Service
Microsoft Big Data Solution (two months ago…)
BIG DATA IN THE CLOUD
Windows Azure: Elastic Big Data
Windows Azure HDInsight Service
Hadoop on Azure
[Diagram: Hadoop cluster with one Name Node and four Data Nodes on HDFS, connected to Azure Blob Storage]
On-Premises Enterprise Content
• Transactional DBs
• On-prem logs
• Internal sensors
Cloud Enterprise Content
• Generated in Azure
3rd Party Content
• Azure DataMarket
• Generated/stored elsewhere
• Public content
• Delivered online
Azure Blob Storage
SQL Azure
Application end point
Using Blob Storage From HDInsight
• An HDInsight cluster is bound to one "default" blob storage account & container at cluster creation time
• Using the "default" container requires no special addressing ("/" == root folder, etc.)
• Access additional blob storage accounts or containers:
  asv[s]://<container>@<account>.blob.core.windows.net/<path>
• Storage accounts need to be registered in site-config.xml:
<property>
<name>fs.azure.account.key.accountname</name>
<value>enterthekeyvaluehere</value>
</property>
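The addressing pattern and the registration entry above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `asv_uri` and `account_key_property` are hypothetical, the `secure` flag assumes that the optional `s` in `asv[s]` selects a secured variant of the scheme, and `accountname` in the property name is treated as a placeholder for the storage account name.

```python
import xml.etree.ElementTree as ET


def asv_uri(container: str, account: str, path: str = "", secure: bool = False) -> str:
    """Format a blob address following the asv[s]://<container>@<account>... pattern."""
    scheme = "asvs" if secure else "asv"  # assumption: "s" marks the secured variant
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_property(account: str, key: str) -> str:
    """Build the <property> element that registers a storage account key."""
    prop = ET.Element("property")
    # assumption: "accountname" on the slide stands for the real account name
    ET.SubElement(prop, "name").text = f"fs.azure.account.key.{account}"
    ET.SubElement(prop, "value").text = key
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(asv_uri("mycontainer", "myaccount", "input/twitter"))
    print(account_key_property("myaccount", "enterthekeyvaluehere"))
```

Keeping the two helpers separate mirrors the slide: the URI is what a job uses to address data, while the property is a one-time registration of the account key.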
Transporting Data with AzCopy
• Utility for moving data to/from Azure Blob Storage (like robocopy)
• 50 MB/s transfer rate within the data center
Container     Blob Name
mycontainer   a.txt
mycontainer   b.txt
mycontainer   dir1\c.txt
mycontainer   dir1\dir2\d.txt
Intro to HDInsight
Map/Reduce
[Diagram: three DataNodes each run Map, Sort, and Shuffle; their output feeds a single Reduce]
Input records:
0067011990999991950051507004+68750
0043011990999991950051512004+68750
0043011990999991950051518004+68750
0043012650999991949032412004+62300
0043012650999991949032418004+62300
Map output:
1949,0
1950,22
1950,55
1952,-11
1950,33
After sort and shuffle:
1949,0
1950,[22,33,55]
1952,-11
Reduce output:
1949,0
1950,55
1952,-11
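The year/temperature pairs on this slide can be traced through the shuffle and reduce stages with a small Python sketch. It starts from the map output, because the temperature field is not visible in the input records as shown. The function names `shuffle` and `reduce_max` are illustrative and not part of any Hadoop API; the sketch runs in one process and only mimics what a cluster does across nodes.

```python
from collections import defaultdict

# (year, temperature) pairs as they appear in the map output on the slide
MAP_OUTPUT = [(1949, 0), (1950, 22), (1950, 55), (1952, -11), (1950, 33)]


def shuffle(pairs):
    """Group values by key and sort them, mimicking the sort/shuffle phase."""
    groups = defaultdict(list)
    for year, temp in pairs:
        groups[year].append(temp)
    return {year: sorted(temps) for year, temps in sorted(groups.items())}


def reduce_max(groups):
    """Reduce each group to its maximum temperature."""
    return {year: max(temps) for year, temps in groups.items()}


if __name__ == "__main__":
    grouped = shuffle(MAP_OUTPUT)
    print(grouped)              # {1949: [0], 1950: [22, 33, 55], 1952: [-11]}
    print(reduce_max(grouped))  # {1949: 0, 1950: 55, 1952: -11}
```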
Map/Reduce with Combine
[Diagram: three DataNodes each run Map, Combine, Sort, and Shuffle; their output feeds a single Reduce]
Input records:
0067011990999991950051507004+68750
0043011990999991950051512004+68750
0043011990999991950051518004+68750
0043012650999991949032412004+62300
0043012650999991949032418004+62300
Map output:
1949,0
1950,22
1950,55
1952,-11
1950,33
After combine:
1949,0
1950,55
1952,-11
1950,33
After sort and shuffle:
1949,0
1950,[33,55]
1952,-11
Reduce output:
1949,0
1950,55
1952,-11
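The effect of the combine step can be sketched in Python as follows. A combiner pre-aggregates on each node, so less data has to cross the network during the shuffle. How the map output is split across nodes is an assumption here (two hypothetical nodes, chosen so that the intermediate values match the slide), and the function names are illustrative only.

```python
from collections import defaultdict

# Map output split across two hypothetical nodes (the split is an assumption)
NODE_OUTPUTS = [
    [(1949, 0), (1950, 22), (1950, 55)],
    [(1952, -11), (1950, 33)],
]


def combine(pairs):
    """Keep only the local maximum per year on a single node."""
    local = {}
    for year, temp in pairs:
        local[year] = max(temp, local.get(year, temp))
    return list(local.items())


def shuffle(pairs):
    """Group values by key and sort them, mimicking the sort/shuffle phase."""
    groups = defaultdict(list)
    for year, temp in pairs:
        groups[year].append(temp)
    return {year: sorted(temps) for year, temps in sorted(groups.items())}


def reduce_max(groups):
    """Reduce each group to its maximum temperature."""
    return {year: max(temps) for year, temps in groups.items()}


if __name__ == "__main__":
    combined = [pair for node in NODE_OUTPUTS for pair in combine(node)]
    print(combined)           # [(1949, 0), (1950, 55), (1952, -11), (1950, 33)]
    grouped = shuffle(combined)
    print(grouped)            # {1949: [0], 1950: [33, 55], 1952: [-11]}
    print(reduce_max(grouped))  # {1949: 0, 1950: 55, 1952: -11}
```

Using the same function for combine and reduce works here because taking a maximum is associative and commutative; an operation such as a mean would need a different combiner.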
Map/Reduce (JavaScript)
Refining with Pig Latin
pig
    .from("/user/Sascha/input/twitter")
    .mapReduce("/user/…/FollowersCount.js",
               "User, Followers:long")
    .orderBy("Followers DESC")
    .take(10)
    .to("/user/Sascha/output/Top10Followers")
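The last steps of the chain above (order by followers, descending, then keep the first ten rows) can be mimicked in plain Python. This is a sketch of the logic only, not of the HDInsight API: the sample rows are invented for illustration and stand in for whatever FollowersCount.js produces, and `top_followers` is a hypothetical helper name.

```python
# Hypothetical (User, Followers) rows standing in for the map/reduce output
ROWS = [("alice", 120), ("bob", 4500), ("carol", 87), ("dave", 4500), ("erin", 15)]


def top_followers(rows, n=10):
    """Order rows by follower count, descending, and keep the first n."""
    # sorted() is stable, so rows with equal counts keep their input order
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


if __name__ == "__main__":
    for user, followers in top_followers(ROWS, n=3):
        print(user, followers)
```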
Pig Latin
Map in C# (Classic)
Reduce in C# (Classic)
Map/Reduce with C#
.NET Job Submission Framework (Map)
.NET Job Submission Framework (Reduce)
Many thanks to the volunteers!
Big raffle!
• At the end of the event (around 6:00 p.m.)
• Win lots of prizes!
• So: visit our sponsors!
Our "You Rock!" sponsors
Many thanks to all our sponsors!
Gold
Silver
Bronze
Media sponsors:
Hands-on event: PASS Camp 2013!

Contenu connexe

Tendances

Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseGrant Fritchey
 
Tarun poladi resume
Tarun poladi resumeTarun poladi resume
Tarun poladi resumeTarun P
 
Building Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesBuilding Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesDatabricks
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageKai Sasaki
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
Gpu computing workshop
Gpu computing workshopGpu computing workshop
Gpu computing workshopdatastack
 
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAACuneyt Goksu
 
Tools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudTools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudDataWorks Summit
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...Insight Technology, Inc.
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
How Klout is changing the landscape of social media with Hadoop and BI
How Klout is changing the landscape of social media with Hadoop and BIHow Klout is changing the landscape of social media with Hadoop and BI
How Klout is changing the landscape of social media with Hadoop and BIDenny Lee
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryBizTalk360
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedAnant Kumar
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013IntelAPAC
 

Tendances (20)

Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Tarun poladi resume
Tarun poladi resumeTarun poladi resume
Tarun poladi resume
 
Building Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesBuilding Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta Lakes
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Hadoop
HadoopHadoop
Hadoop
 
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Gpu computing workshop
Gpu computing workshopGpu computing workshop
Gpu computing workshop
 
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
 
Tools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudTools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloud
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
How Klout is changing the landscape of social media with Hadoop and BI
How Klout is changing the landscape of social media with Hadoop and BIHow Klout is changing the landscape of social media with Hadoop and BI
How Klout is changing the landscape of social media with Hadoop and BI
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme based
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013
 

En vedette

Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoophadooparchbook
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure FunctionsJim O'Neil
 
Azure api app métricas com application insights
Azure api app métricas com application insightsAzure api app métricas com application insights
Azure api app métricas com application insightsNicolas Takashi
 
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Mike Martin
 
Microsoft NYC 14
Microsoft NYC 14Microsoft NYC 14
Microsoft NYC 14SwitchPitch
 
Big data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureBig data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureWillem Meints
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightPaco Nathan
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in MotionRuhani Arora
 
2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with AzureSteve Lee
 
Going serverless
Going serverlessGoing serverless
Going serverlessTechExeter
 
Open up to a better learning ecosystem
Open up to a better learning ecosystemOpen up to a better learning ecosystem
Open up to a better learning ecosystemKatie Bradford
 
Spark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleSpark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleJudy Nash
 
Azure functions
Azure functionsAzure functions
Azure functionsvivek p s
 
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloudAzure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloudToradex
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azureEyal Ben Ivri
 
Microsoft Azure For Solutions Architects
Microsoft Azure For Solutions ArchitectsMicrosoft Azure For Solutions Architects
Microsoft Azure For Solutions ArchitectsRoy Kim
 

En vedette (20)

Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure Functions
 
Azure api app métricas com application insights
Azure api app métricas com application insightsAzure api app métricas com application insights
Azure api app métricas com application insights
 
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
Belgian Windows Server 2012 Launch windows azure insights for the enterprise ...
 
Microsoft NYC 14
Microsoft NYC 14Microsoft NYC 14
Microsoft NYC 14
 
Big data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on AzureBig data streaming with Apache Spark on Azure
Big data streaming with Apache Spark on Azure
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
 
Azure IOT
Azure IOTAzure IOT
Azure IOT
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 
Software scope
Software scopeSoftware scope
Software scope
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure2016-08-25 TechExeter - going serverless with Azure
2016-08-25 TechExeter - going serverless with Azure
 
Going serverless
Going serverlessGoing serverless
Going serverless
 
Open up to a better learning ecosystem
Open up to a better learning ecosystemOpen up to a better learning ecosystem
Open up to a better learning ecosystem
 
Spark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleSpark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattle
 
Azure functions
Azure functionsAzure functions
Azure functions
 
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloudAzure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
Azure IoT Hub on a Toradex Colibri VF61 – Part 1 - Sending data to the cloud
 
Going serverless
Going serverlessGoing serverless
Going serverless
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azure
 
Microsoft Azure For Solutions Architects
Microsoft Azure For Solutions ArchitectsMicrosoft Azure For Solutions Architects
Microsoft Azure For Solutions Architects
 

Similaire à SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)

Big Data on Public Cloud
Big Data on Public CloudBig Data on Public Cloud
Big Data on Public CloudIMC Institute
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Imam Raza
 
Big Data/Hadoop Option Analysis
Big Data/Hadoop Option AnalysisBig Data/Hadoop Option Analysis
Big Data/Hadoop Option Analysiszafarali1981
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detikk4ndar
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoopGanesh Sanap
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersMrigendra Sharma
 
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)GeeksLab Odessa
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Anna Shymchenko
 
DWH & big data architecture approaches
DWH & big data architecture approachesDWH & big data architecture approaches
DWH & big data architecture approachesLuxoft
 

Similaire à SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1) (20)

Big Data on Public Cloud
Big Data on Public CloudBig Data on Public Cloud
Big Data on Public Cloud
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
 
Big Data/Hadoop Option Analysis
Big Data/Hadoop Option AnalysisBig Data/Hadoop Option Analysis
Big Data/Hadoop Option Analysis
 
Big Data
Big DataBig Data
Big Data
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
 
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
Hadoop Overview
Hadoop OverviewHadoop Overview
Hadoop Overview
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Hadoop
HadoopHadoop
Hadoop
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»
 
DWH & big data architecture approaches
DWH & big data architecture approachesDWH & big data architecture approaches
DWH & big data architecture approaches
 

Plus de Sascha Dittmann

Hochskalierbare, relationale Datenbanken in Microsoft Azure
Hochskalierbare, relationale Datenbanken in Microsoft AzureHochskalierbare, relationale Datenbanken in Microsoft Azure
Hochskalierbare, relationale Datenbanken in Microsoft AzureSascha Dittmann
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at ScaleSascha Dittmann
 
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSONSQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSONSascha Dittmann
 
dotnet Cologne 2015 - Azure Service Fabric
dotnet Cologne 2015 - Azure Service Fabric dotnet Cologne 2015 - Azure Service Fabric
dotnet Cologne 2015 - Azure Service Fabric Sascha Dittmann
 
SQL Saturday #313 Rheinland - MapReduce in der Praxis
SQL Saturday #313 Rheinland - MapReduce in der PraxisSQL Saturday #313 Rheinland - MapReduce in der Praxis
SQL Saturday #313 Rheinland - MapReduce in der PraxisSascha Dittmann
 
Hadoop 2.0 - The Next Level
Hadoop 2.0 - The Next LevelHadoop 2.0 - The Next Level
Hadoop 2.0 - The Next LevelSascha Dittmann
 
Microsoft HDInsight Podcast #001 - Was ist HDInsight
Microsoft HDInsight Podcast #001 - Was ist HDInsightMicrosoft HDInsight Podcast #001 - Was ist HDInsight
Microsoft HDInsight Podcast #001 - Was ist HDInsightSascha Dittmann
 
dotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Windows Azure Mobile Servicesdotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Windows Azure Mobile ServicesSascha Dittmann
 
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwicklerdotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
dotnet Cologne 2013 - Microsoft HD Insight für .NET EntwicklerSascha Dittmann
 
Developer Open Space 2012 - Cloud Computing Workshop
Developer Open Space 2012 - Cloud Computing WorkshopDeveloper Open Space 2012 - Cloud Computing Workshop
Developer Open Space 2012 - Cloud Computing WorkshopSascha Dittmann
 
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)Sascha Dittmann
 
CloudOps Summit 2012 - 3 Wege in die Cloud
CloudOps Summit 2012 - 3 Wege in die CloudCloudOps Summit 2012 - 3 Wege in die Cloud
CloudOps Summit 2012 - 3 Wege in die CloudSascha Dittmann
 
.NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv...
.NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv....NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv...
.NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv...Sascha Dittmann
 
NoSQL mit RavenDB und Azure
NoSQL mit RavenDB und AzureNoSQL mit RavenDB und Azure
NoSQL mit RavenDB und AzureSascha Dittmann
 
Windows Azure für Entwickler V1
Windows Azure für Entwickler V1Windows Azure für Entwickler V1
Windows Azure für Entwickler V1Sascha Dittmann
 

Plus de Sascha Dittmann (17)

C# + SQL = Big Data
C# + SQL = Big DataC# + SQL = Big Data
C# + SQL = Big Data
 
Hochskalierbare, relationale Datenbanken in Microsoft Azure
Hochskalierbare, relationale Datenbanken in Microsoft AzureHochskalierbare, relationale Datenbanken in Microsoft Azure
Hochskalierbare, relationale Datenbanken in Microsoft Azure
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at Scale
 
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSONSQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
SQL Server vs. Azure DocumentDB – Ein Battle zwischen XML und JSON
 
dotnet Cologne 2015 - Azure Service Fabric
dotnet Cologne 2015 - Azure Service Fabric
SQL Saturday #313 Rheinland - MapReduce in der Praxis
Hadoop 2.0 - The Next Level
Microsoft HDInsight Podcast #001 - Was ist HDInsight
dotnet Cologne 2013 - Windows Azure Mobile Services
dotnet Cologne 2013 - Microsoft HD Insight für .NET Entwickler
Developer Open Space 2012 - Cloud Computing Workshop
PASS Camp 2012 - Big Data mit Microsoft (Teil 1)
CloudOps Summit 2012 - 3 Wege in die Cloud
.NET Usergroup Rhein-Neckar: Big Data in der Cloud - Apache Hadoop-based Serv...
Big Data & NoSQL
NoSQL mit RavenDB und Azure
Windows Azure für Entwickler V1

SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)

Editor's notes

  1. In that capacity, Arun allows Hortonworks to be instrumental in working with the community to drive the roadmap for Core Hadoop, where the focus today is on things like YARN, MapReduce2, HDFS2 and more. For Core Hadoop, in absolute terms, Hortonworkers have contributed more than twice as many lines of code as the next closest contributor, and even more if you include Yahoo, our development partner. Taking such a prominent role also enables us to ensure that our distribution integrates deeply with the ecosystem, both in the choice of deployment platforms, such as Windows, Azure and more, and in creating deeply engineered solutions with key partners such as Teradata. And consistent with our approach, all of this is done in 100% open source.