SlideShare une entreprise Scribd logo
1  sur  28
Global Sponsor:
Big Data on the Microsoft Platform
Andrew J. Brust, CEO, Blue Badge Insights
With Hadoop, MS BI and the SQL Server stack
CEO and Founder, Blue Badge Insights
Big Data blogger for ZDNet
Microsoft Regional Director, MVP
Co-chair VSLive! and 17 years as a speaker
Founder, Microsoft BI User Group of NYC
 http://www.msbigdatanyc.com
Co-moderator, NYC .NET Developers Group
 http://www.nycdotnetdev.com
“Redmond Review” columnist for
Visual Studio Magazine
brustblog.com, Twitter: @andrewbrust
Meet Andrew
Read all about it!
My New Blog (bit.ly/bigondata)
Agenda
Big Data, Hadoop and HDInsight
MapReduce
Hive ODBC, BI Stack
Hekaton, NoSQL
SQL Server Parallel Data Warehouse, MPP, PolyBase
What is Big Data?
100s of TB into PB and higher
Involving data from: financial data, sensors, web logs,
social media, etc.
Parallel processing often involved
 Hadoop is emblematic, but other technologies are Big Data too
Processing of data sets too large for transactional
databases
 Analyzing interactions, rather than transactions
 The three V’s: Volume, Velocity, Variety
•Big Data tech sometimes imposed on small data problems
What is Hadoop?
Open source implementation of Google’s MapReduce and
GFS (Google File System)
Allows for scale-out processing of petabyte scale data
 1 PB = 1,024 TB
Also distributed storage
Commodity hardware
Can work against flat files, or certain database formats
Native processing involves imperative Java code
Other languages supported through “Streaming”
7
What is HDInsight?
Microsoft’s Hadoop distribution, on Windows
 Most other distros on Linux
Based on Hortonworks Data Platform (HDP)
Runs on Azure, eventually on Windows Server, and as
sandbox on dev PC
For .NET devs: .NET SDK for Hadoop, LINQ provider
8
Global Sponsor:
Demo
HDInsight
The Hadoop Stack
MapReduce, HDFS
Database
RDBMS Import/Export
Query: HiveQL and Pig Latin
Machine Learning/Data Mining
Log file integration
MapReduce, in a Diagram
mapper
mapper
mapper
mapper
mapper
mapper
Input
reducer
reducer
reducer
Input
Input
Input
Input
Input
Input
Output
Output
Output
Output
Output
Output
Output
Input
Input
Input
K1
K2
K3
Output
Output
Output
A MapReduce Example
• Count by suite, on each floor
• Send per-suite, per platform totals to lobby
• Sort totals by platform
• Send two platform packets to 10th, 20th, 30th floor
• Tally up each platform
• Merge tallies into one spreadsheet
• Collect the tallies
MapReduce Options
Pig, Hive, Sqoop, Mahout also generate MapReduce code
13
Java
JavaScript (“Rhino”)
Other languages, especially Python, via Streaming
C# via Streaming
C# via .NET SDK
Hortonworks Data
Platform for
Windows
MRLib (NuGet
Package)
LINQ to Hive
OdbcClient +
Hive ODBC
Driver
Deployment
WebHDFS
client
MR code in C#,
HadoopJob,
MapperBase,
ReducerBase
Amenities for
Visual Studio/.NET
Global Sponsor:
Demo
MapReduce
Hive
Began as Hadoop sub-project
 Now top-level Apache project
Provides a SQL-like (“HiveQL”) abstraction over
MapReduce
Has its own HDFS table file format (and it’s fully schema-
bound)
Can also work over HBase
Acts as a bridge to many BI products which expect tabular
data
Hive ODBC Consumers
17
Excel 2010 or 2013 (including via add-in)
PowerPivot
SQL Server Analysis Services, Tabular Mode
SQL Server Reporting Services
ADO.NET OdbcClient provider
LINQ provider
xVelocity Technologies
Formerly known as VertiPaq
PowerPivot, SSAS Tabular, SQL Server columnar indexes
Implements BI Semantic Model (BISM)
Uses column store technology
 Compression
 In-memory
 Speed
Not a Big Data technology per se, but very useful for
analysis of job output
18
Power View
Reports on BISM models (PowerPivot, SSAS Tabular)
Hosted in SharePoint 2010, 2013 Enterprise
Also Excel 2013 (but not on ARM/Windows RT)
Interactive data exploration
19
Global Sponsor:
Demo
Hive ODBC + BI Stack
Project “Hekaton”
In-memory engine for SQL Server transactional workloads
Tables must be declared as in-memory explicitly
In-memory and standard tables can coexist in same db
Stored procs on in-mem tables are compiled to native code
Hekaton and xVelocity are separate
Hekaton ≠ PowerPivot/SSAS Tabular
Hekaton ≠ Columnstore indexes
Compare to SAP HANA
 In-memory, transactional, analytical, column store
21
NoSQL
NoSQL databases are non-relational and non- or loosely-
schematized
HBase is a NoSQL database, of the wide column variety
 Hive implements a SQL layer over it
 HBase not yet in HDInsight
HBase table = HDFS file
Three other NoSQL categories
 Key-value store, document store, graph database
 Azure Table Storage is a key-value store NoSQL database
Some of them aren’t really Big Data tools, but market
themselves that way anyway
22
SQL Parallel Data Warehouse (PDW)
SQL PDW is a Massively Parallel Processing (MPP)
database
Teradata, IBM Netezza, HP Vertica also in this category
It’s an array/cluster of SQL servers made to look like one
SQL Server
Available as appliance only
 Purchase from HP, Dell
 Server, storage and network all pre-built and configured
Many other MPP products based on PostgreSQL
PDW loosely based on acquired DATAllegro product
 Implemented MPP with Ingres, written in Java, running on Linux
23
MapReduce versus MPP
24
MapReduce MPP
 Splits preprocessing amongst mapper
nodes and aggregation amongst reducers
 Scales infinitely on commodity hardware
 Uses locally attached commodity disks on
nodes
 Uses imperative code
 Processes flat files, wide column tables
(HBase), relational tables (Hive)
 Divide-and-conquer approach, parallel,
distributed
 Splits query amongst nodes then unifies
result sets
 Scales to high-end assets in the
appliance cabinet
 Uses shared storage (can be more
network traffic)
 Uses SQL
 Works with relational tables only
 Divide-and-conquer approach, parallel,
distributed
PolyBase
To be included in next version of PDW
Mashup of SQL Server and Hadoop
Enables PDW to address Hadoop data nodes (HDFS)
directly
Parallelism managed by PDW
Tables are imported into SQL Server db
 They are EXTERNAL tables
 They can participate in joins with standard tables
25
Resources
MS Big Data/HDInsight
 http://www.microsoft.com/bigdata
Apache Hadoop
 http://hadoop.apache.org/
Apache HBase
 http://hbase.apache.org/
SQL PDW
 http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspx
PolyBase
 http://gsl.azurewebsites.net/Projects/Polybase.aspx
xVelocity
 http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/in-
memory.aspx
Column store
 http://en.wikipedia.org/wiki/Column-oriented_DBMS
Power View
 http://office.microsoft.com/en-us/excel-help/power-view-explore-visualize-and-present-your-
data-HA102835634.aspx
Hekaton
 http://blogs.technet.com/b/dataplatforminsider/archive/2012/11/08/breakthrough-performance-
with-in-memory-technologies.aspx
26
Global Sponsor:
Questions?
Global Sponsor:
Thank You for Attending

Contenu connexe

Tendances

Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational DatabasesUdi Bauman
 
A Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooA Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooAndrew Brust
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 
SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms Andrew Brust
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sqlRam kumar
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Hadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionHadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionAndrew Brust
 
Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012Andrew Brust
 
Non Relational Databases
Non Relational DatabasesNon Relational Databases
Non Relational DatabasesChris Baglieri
 

Tendances (20)

Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Relational vs. Non-Relational
Relational vs. Non-RelationalRelational vs. Non-Relational
Relational vs. Non-Relational
 
A Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooA Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data Hullabaloo
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
Selecting best NoSQL
Selecting best NoSQL Selecting best NoSQL
Selecting best NoSQL
 
RDBMS vs NoSQL
RDBMS vs NoSQLRDBMS vs NoSQL
RDBMS vs NoSQL
 
NoSQL Seminer
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Hadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionHadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in Action
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Rdbms vs. no sql
Rdbms vs. no sqlRdbms vs. no sql
Rdbms vs. no sql
 
Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012
 
Non Relational Databases
Non Relational DatabasesNon Relational Databases
Non Relational Databases
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 

En vedette

Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureEduardo Castro
 
Carnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraCarnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraMeera Raghu
 
Guide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsGuide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsMeera Raghu
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQLkristinferrier
 
An introduction to the Recorder
An introduction to the Recorder An introduction to the Recorder
An introduction to the Recorder Sandra Morgan
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureLynn Langit
 
Aws vs. Azure: 5 Things You Need To Know
Aws vs. Azure: 5 Things You Need To KnowAws vs. Azure: 5 Things You Need To Know
Aws vs. Azure: 5 Things You Need To KnowScalr
 
AWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAniket Kanitkar
 
Azure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowAzure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowRightScale
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingIlyas F ☁☁☁
 
Compare Clouds: Aws vs Azure vs Google vs SoftLayer
Compare Clouds: Aws vs Azure vs Google vs SoftLayerCompare Clouds: Aws vs Azure vs Google vs SoftLayer
Compare Clouds: Aws vs Azure vs Google vs SoftLayerRightScale
 

En vedette (16)

Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage Azure
 
Jathiswara
JathiswaraJathiswara
Jathiswara
 
Carnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraCarnatic Music Notations: Alankara
Carnatic Music Notations: Alankara
 
Guide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsGuide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music Notations
 
Recorder lesson
Recorder lessonRecorder lesson
Recorder lesson
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
 
An introduction to the Recorder
An introduction to the Recorder An introduction to the Recorder
An introduction to the Recorder
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows Azure
 
Aws vs. Azure: 5 Things You Need To Know
Aws vs. Azure: 5 Things You Need To KnowAws vs. Azure: 5 Things You Need To Know
Aws vs. Azure: 5 Things You Need To Know
 
AWS vs. Azure
AWS vs. AzureAWS vs. Azure
AWS vs. Azure
 
AWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services Comparison
 
Azure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowAzure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to Know
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
 
Compare Clouds: Aws vs Azure vs Google vs SoftLayer
Compare Clouds: Aws vs Azure vs Google vs SoftLayerCompare Clouds: Aws vs Azure vs Google vs SoftLayer
Compare Clouds: Aws vs Azure vs Google vs SoftLayer
 

Similaire à Big Data on the Microsoft Platform

Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010BOSC 2010
 
Srikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydSrikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydsrikanth K
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014Stratebi
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real WorldMark Kromer
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerMark Kromer
 

Similaire à Big Data on the Microsoft Platform (20)

Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
מיכאל
מיכאלמיכאל
מיכאל
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 
Srikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydSrikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hyd
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Hadoop intro
Hadoop introHadoop intro
Hadoop intro
 
Big data and tools
Big data and tools Big data and tools
Big data and tools
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
 

Plus de Andrew Brust

Azure ml screen grabs
Azure ml screen grabsAzure ml screen grabs
Azure ml screen grabsAndrew Brust
 
Hitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIHitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIAndrew Brust
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackAndrew Brust
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystemAndrew Brust
 
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012Andrew Brust
 
Power View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataPower View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataAndrew Brust
 
Grasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmGrasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmAndrew Brust
 
Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Andrew Brust
 

Plus de Andrew Brust (8)

Azure ml screen grabs
Azure ml screen grabsAzure ml screen grabs
Azure ml screen grabs
 
Hitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIHitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BI
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
 
Power View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataPower View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s Data
 
Grasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmGrasping The LightSwitch Paradigm
Grasping The LightSwitch Paradigm
 
Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis
 

Big Data on the Microsoft Platform

  • 1. Global Sponsor: Big Data on the Microsoft Platform Andrew J. Brust, CEO, Blue Badge Insights With Hadoop, MS BI and the SQL Server stack
  • 2. CEO and Founder, Blue Badge Insights Big Data blogger for ZDNet Microsoft Regional Director, MVP Co-chair VSLive! and 17 years as a speaker Founder, Microsoft BI User Group of NYC  http://www.msbigdatanyc.com Co-moderator, NYC .NET Developers Group  http://www.nycdotnetdev.com “Redmond Review” columnist for Visual Studio Magazine brustblog.com, Twitter: @andrewbrust Meet Andrew
  • 4. My New Blog (bit.ly/bigondata)
  • 5. Agenda Big Data, Hadoop and HDInsight MapReduce Hive ODBC, BI Stack Hekaton, NoSQL SQL Server Parallel Data Warehouse, MPP, PolyBase
  • 6. What is Big Data? 100s of TB into PB and higher Involving data from: financial data, sensors, web logs, social media, etc. Parallel processing often involved  Hadoop is emblematic, but other technologies are Big Data too Processing of data sets too large for transactional databases  Analyzing interactions, rather than transactions  The three V’s: Volume, Velocity, Variety •Big Data tech sometimes imposed on small data problems
  • 7. What is Hadoop? Open source implementation of Google’s MapReduce and GFS (Google File System) Allows for scale-out processing of petabyte scale data  1 PB = 1,024 TB Also distributed storage Commodity hardware Can work against flat files, or certain database formats Native processing involves imperative Java code Other languages supported through “Streaming” 7
  • 8. What is HDInsight? Microsoft’s Hadoop distribution, on Windows  Most other distros on Linux Based on Hortonworks Data Platform (HDP) Runs on Azure, eventually on Windows Server, and as sandbox on dev PC For .NET devs: .NET SDK for Hadoop, LINQ provider 8
  • 10. The Hadoop Stack MapReduce, HDFS Database RDBMS Import/Export Query: HiveQL and Pig Latin Machine Learning/Data Mining Log file integration
  • 11. MapReduce, in a Diagram mapper mapper mapper mapper mapper mapper Input reducer reducer reducer Input Input Input Input Input Input Output Output Output Output Output Output Output Input Input Input K1 K2 K3 Output Output Output
  • 12. A MapReduce Example • Count by suite, on each floor • Send per-suite, per platform totals to lobby • Sort totals by platform • Send two platform packets to 10th, 20th, 30th floor • Tally up each platform • Merge tallies into one spreadsheet • Collect the tallies
  • 13. MapReduce Options Pig, Hive, Sqoop, Mahout also generate MapReduce code 13 Java JavaScript (“Rhino”) Other languages, especially Python, via Streaming C# via Streaming C# via .NET SDK
  • 14. Hortonworks Data Platform for Windows MRLib (NuGet Package) LINQ to Hive OdbcClient + Hive ODBC Driver Deployment WebHDFS client MR code in C#, HadoopJob, MapperBase, ReducerBase Amenities for Visual Studio/.NET
  • 16. Hive Began as Hadoop sub-project  Now top-level Apache project Provides a SQL-like (“HiveQL”) abstraction over MapReduce Has its own HDFS table file format (and it’s fully schema- bound) Can also work over HBase Acts as a bridge to many BI products which expect tabular data
  • 17. Hive ODBC Consumers 17 Excel 2010 or 2013 (including via add-in) PowerPivot SQL Server Analysis Services, Tabular Mode SQL Server Reporting Services ADO.NET OdbcClient provider LINQ provider
  • 18. xVelocity Technologies Formerly known as VertiPaq PowerPivot, SSAS Tabular, SQL Server columnar indexes Implements BI Semantic Model (BISM) Uses column store technology  Compression  In-memory  Speed Not a Big Data technology per se, but very useful for analysis of job output 18
  • 19. Power View Reports on BISM models (PowerPivot, SSAS Tabular) Hosted in SharePoint 2010, 2013 Enterprise Also Excel 2013 (but not on ARM/Windows RT) Interactive data exploration 19
  • 21. Project “Hekaton” In-memory engine for SQL Server transactional workloads Tables must be declared as in-memory explicitly In-memory and standard tables can coexist in same db Stored procs on in-mem tables are compiled to native code Hekaton and xVelocity are separate Hekaton ≠ PowerPivot/SSAS Tabular Hekaton ≠ Columnstore indexes Compare to SAP HANA  In-memory, transactional, analytical, column store 21
  • 22. NoSQL NoSQL databases are non-relational and non- or loosely- schematized HBase is a NoSQL database, of the wide column variety  Hive implements a SQL layer over it  HBase not yet in HDInsight HBase table = HDFS file Three other NoSQL categories  Key-value store, document store, graph database  Azure Table Storage is a key-value store NoSQL database Some of them aren’t really Big Data tools, but market themselves that way anyway 22
  • 23. SQL Parallel Data Warehouse (PDW) SQL PDW is a Massively Parallel Processing (MPP) database Teradata, IBM Netezza, HP Vertica also in this category It’s an array/cluster of SQL servers made to look like one SQL Server Available as appliance only  Purchase from HP, Dell  Server, storage and network all pre-built and configured Many other MPP products based on PostgreSQL PDW loosely based on acquired DATAllegro product  Implemented MPP with Ingres, written in Java, running on Linux 23
  • 24. MapReduce versus MPP 24 MapReduce MPP  Splits preprocessing amongst mapper nodes and aggregation amongst reducers  Scales infinitely on commodity hardware  Uses locally attached commodity disks on nodes  Uses imperative code  Processes flat files, wide column tables (HBase), relational tables (Hive)  Divide-and-conquer approach, parallel, distributed  Splits query amongst nodes then unifies result sets  Scales to high-end assets in the appliance cabinet  Uses shared storage (can be more network traffic)  Uses SQL  Works with relational tables only  Divide-and-conquer approach, parallel, distributed
  • 25. PolyBase To be included in next version of PDW Mashup of SQL Server and Hadoop Enables PDW to address Hadoop data nodes (HDFS) directly Parallelism managed by PDW Tables are imported into SQL Server db  They are EXTERNAL tables  They can participate in joins with standard tables 25
  • 26. Resources MS Big Data/HDInsight  http://www.microsoft.com/bigdata Apache Hadoop  http://hadoop.apache.org/ Apache HBase  http://hbase.apache.org/ SQL PDW  http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspx PolyBase  http://gsl.azurewebsites.net/Projects/Polybase.aspx xVelocity  http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/in- memory.aspx Column store  http://en.wikipedia.org/wiki/Column-oriented_DBMS Power View  http://office.microsoft.com/en-us/excel-help/power-view-explore-visualize-and-present-your- data-HA102835634.aspx Hekaton  http://blogs.technet.com/b/dataplatforminsider/archive/2012/11/08/breakthrough-performance- with-in-memory-technologies.aspx 26
  • 28. Global Sponsor: Thank You for Attending

Notes de l'éditeur

  1. SQTH8 - Microsoft's Big Play for Big Data - Andrew Brust © 2012 SQL Server Live! All rights reserved.