SlideShare une entreprise Scribd logo
1  sur  34
Microsoft’s BigData Story
            @LynnLangit




                          April 2013 – Big Data Tech Con
Data Expertise / Lynn Langit
• Industry awards
   – Microsoft – MVP for SQL Server
   – Google – GDE for Cloud Platform
   – 10Gen – Master for MongoDB
• Practicing Architect
• Technical author / trainer
   – Pluralsight – Google Cloud Series
   – DevelopMentor – SQL Server Series
   – 2 books on SQL Server BI
• Former MSFT FTE
   – 4 years
In a Relationship?




  BigData            NoSQL
BigData, NoSQL… => No Microsoft?

  Big Data => keeping / getting more data

  • Cheap Storage
  • Cloud Storage
  • Open Source data projects (Hadoop)

  NoSQL => schema-lite, scalable storage

  • NoSQL data projects
  • Mostly open source
  • Sharded replicas
In a (Open Source) Relationship?

 NoSQL
  Hadoop
             Cloud
 MongoDB

  Neo4j

   Riak       AWS    Heroku   RackSpace   OpenStack

 Cassandra
Data Services

DEMO
HDINSIGHT (HADOOP)
The Reality

              BigData

                Small
                BigData
BigData Lifecycle Management
      Locate
                Quantify
      Qualify
                    Replicate
          Process

                    Present
Locating the data

                                      • you buy it
                            Private
                            source


                   Public
                   source
       • you find it
                                Your source
                        • in SQL Server
                        • on desktops
Finding Data in Data Markets

•   Windows Azure Data Market
•   DataMarket.com
•   Factual.com
•   InfoChimps
Data Services

DEMO
AZURE DATAMARKET
Database Lifecycle Management
• Evaluating current processes
• Improving processes
• Adding new tools
  – SSDT
• Data synchronization processes
Storing the data
  Relational

  • SQL Server – can use partitioning for scalability

  Beyond relational via relational

  • Specialized data types
  • XML, Hierarchy, Filestream/Filetable, Geospatial
  • Columnstore index

  Multi-dimensional / in-memory

  • OLAP cubes / Mining Models
  • Tabular models
Big Data in SQL Server 2012 – Relational Enhancements

DEMO
COLUMNSTORE, XML, FILETABLE
Data Processing


Raw data
           Pre-processed data
                       Detail data
                                     Aggregate data
                                                 Views
Valuing the data
•   De-duplicating
•   Validating
•   Correcting errors
•   Aggregating
•   Ranking / rating
    – Social rating ,i.e. Yelp-like
    – Social scoring, i.e. Freebase-like
Data Services

DEMO
DATA QUALITY SERVICES
Types of Data Quality Projects
 T-SQL scripts (boolean        • Exact matches WHERE = , WHERE WHERE
                                                                   <>,      IN



        match)                 • LIKE string matching
                                     % --




    Full-text matching
 (semantic word match)         • CONTAINS

    Semantic Search
                               • SEMANTICSIMIALARITIESTABLE
(semantic phrase match)

SSIS tasks - (transactional,
 multi-valued matching)        • List below

                               • KnowledgeBase rules/matches
                                                 -

   DQS (KB matching)
                               • DataQualityproject clean correctdata
                                                     -    /




MDS (One view of truth)        • Versioned Entities, Attributes and Rules
Data Presentation
•   View-only client
•   View & manipulate (hide-only) client
•   View & query (aggregate) client
•   View & query (drill through) client
•   View & mash-up (add new data) client
•   View & update client
•   Timeliness of data (latency)
•   Beauty of data
But, does it work in Excel?

          Mash-up
                       Clean up      Extract-   Authorize
          data with
                       data with   Transform-      with     3rd party –
Import   PowerPivot
                         Data       Load with    Master     Mine with
 Data    – including
                        Quality        Data        Data     Predixion
         Hadoop via
                       Services      Explorer    Services
            ODBC
From Pivot tables to Visualized Data Mash-ups with Mining

DEMO
THE POWER OF EXCEL
What about the UDM?

• UDM / Data Mining is fully supported in SSAS
• Must be installed in this mode
  – Mutually exclusive to Tabular mode
• But, should you use it anymore?
Big Data in SQL Server 2012
– Non-Relational Features

DEMO
TABULAR MODELS
DATA MINING
Data Consumability
 (Accurate)   Valid
                      (Meaningful)




                                     Recognizable
                                                    (Useful)




                                                               Appropriate
                                                                             (Appealing)




                                                                                           Beautiful
                                                                                                       (Satisfying)




                                                                                                                      Enjoyable
PowerView for
Tabular Models

DEMO
POWERVIEW
Data Fluency and Job Roles


Consumer       Analyzer       Cleaner        Artist
• View and     • View,        • Validate     • Visualize
  understand     manipulate     and update     and present
                 and decide
BigData in SQL Server 2012
                                   • Scaling via
                                     • Partitioning for Tables, indexes
                                   • PDW
               Relational          • Columnstore indexes
                engine             • Special Data Types
                                     • XML, Hierarchy, Filetable



                           • OLAP Cubes
          Analysis         • Tabular Models
       service engines     • Data Mining Models




                                   • Data Quality Services
                 Other             • Master Data Services
                services           • StreamInsight
Other Data Services from Microsoft

           Windows
            Azure       SQL Azure
          Marketplace




              Data
                        Power Pivot
            Explorer
NoSQL – New Products / Betas


                            SSRS on
           Semantic         Azure
           Search


                                    HDInsight
        PowerView                   (Hadoop on
                                    Azure)


                    Cloud-based
                    Data Explorer
Announced Futures
The Changing Data Landscape

                               Other
                              Services
RDBMS
         NoSQL
• recipes)




    www.TeachingKidsProgramming.org
      •   Free Courseware
      •   Do a Recipe  Teach a Kid (Ages 10 ++)
      •   Java or Microsoft SmallBasic
      •   C# on Pluralsight
Toward Data Craftsmanship…

                 Follow me
                 • @LynnLangit
                 • www.LynnLangit.com
                 • YouTube - SoCalDevGal




            Hire me
            • To help build your BI/Big Data solution
            • To teach your team next gen BI
            • To learn more about using NoSQL solutions

Contenu connexe

Tendances

Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics SuiteJames Serra
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorialrustd
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on AzureTrivadis
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeRick van den Bosch
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleVasu S
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloudDmitry Tolpeko
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS CloudIdan Tohami
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoopRommel Garcia
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure Antonios Chatzipavlis
 

Tendances (20)

Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics Suite
 
Interactive query using hadoop
Interactive query using hadoopInteractive query using hadoop
Interactive query using hadoop
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Azure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data LakeAzure Lowlands: An intro to Azure Data Lake
Azure Lowlands: An intro to Azure Data Lake
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS Cloud
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoop
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 

Similaire à The Microsoft BigData Story

Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012Andrew Brust
 
SSAS Design &amp; Incremental Processing - PASSMN May 2010
SSAS Design &amp; Incremental Processing - PASSMN May 2010SSAS Design &amp; Incremental Processing - PASSMN May 2010
SSAS Design &amp; Incremental Processing - PASSMN May 2010Dan English
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLTugdual Grall
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Datacwensel
 
MS Sql Server: Datamining Introduction
MS Sql Server: Datamining IntroductionMS Sql Server: Datamining Introduction
MS Sql Server: Datamining Introductionsqlserver content
 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud
 
Software architecture & design patterns for MS CRM Developers
Software architecture & design patterns for MS CRM  Developers Software architecture & design patterns for MS CRM  Developers
Software architecture & design patterns for MS CRM Developers sebedatalabs
 
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...Perficient, Inc.
 
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...Building your first Analysis Services Tabular BI Semantic model with SQL Serv...
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...Microsoft TechNet - Belgium and Luxembourg
 
2009/12 - Database Architechs - Presentation
2009/12 - Database Architechs - Presentation2009/12 - Database Architechs - Presentation
2009/12 - Database Architechs - PresentationDatabase Architechs
 
2009/12 Database Architechs Presentation
2009/12   Database Architechs Presentation2009/12   Database Architechs Presentation
2009/12 Database Architechs Presentationguest248edc
 
Oracle Advanced Analytics
Oracle Advanced AnalyticsOracle Advanced Analytics
Oracle Advanced Analyticsaghosh_us
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Web Services
 
Evolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraEvolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraVishal Puri
 
Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Martin Bém
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 

Similaire à The Microsoft BigData Story (20)

Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012
 
SSAS Design &amp; Incremental Processing - PASSMN May 2010
SSAS Design &amp; Incremental Processing - PASSMN May 2010SSAS Design &amp; Incremental Processing - PASSMN May 2010
SSAS Design &amp; Incremental Processing - PASSMN May 2010
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
MS Sql Server: Datamining Introduction
MS Sql Server: Datamining IntroductionMS Sql Server: Datamining Introduction
MS Sql Server: Datamining Introduction
 
SQL Server: Data Mining
SQL Server: Data MiningSQL Server: Data Mining
SQL Server: Data Mining
 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
 
Software architecture & design patterns for MS CRM Developers
Software architecture & design patterns for MS CRM  Developers Software architecture & design patterns for MS CRM  Developers
Software architecture & design patterns for MS CRM Developers
 
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
 
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...Building your first Analysis Services Tabular BI Semantic model with SQL Serv...
Building your first Analysis Services Tabular BI Semantic model with SQL Serv...
 
2009/12 - Database Architechs - Presentation
2009/12 - Database Architechs - Presentation2009/12 - Database Architechs - Presentation
2009/12 - Database Architechs - Presentation
 
2009/12 Database Architechs Presentation
2009/12   Database Architechs Presentation2009/12   Database Architechs Presentation
2009/12 Database Architechs Presentation
 
Oracle Advanced Analytics
Oracle Advanced AnalyticsOracle Advanced Analytics
Oracle Advanced Analytics
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
 
Evolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital eraEvolution of Distributed Database Technologies in the Digital era
Evolution of Distributed Database Technologies in the Digital era
 
Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 

Plus de Lynn Langit

VariantSpark on AWS
VariantSpark on AWSVariantSpark on AWS
VariantSpark on AWSLynn Langit
 
Serverless Architectures
Serverless ArchitecturesServerless Architectures
Serverless ArchitecturesLynn Langit
 
10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids ProgrammingLynn Langit
 
Blastn plus jupyter on Docker
Blastn plus jupyter on DockerBlastn plus jupyter on Docker
Blastn plus jupyter on DockerLynn Langit
 
Testing in Ballerina Language
Testing in Ballerina LanguageTesting in Ballerina Language
Testing in Ballerina LanguageLynn Langit
 
Teaching Kids to create Alexa Skills
Teaching Kids to create Alexa SkillsTeaching Kids to create Alexa Skills
Teaching Kids to create Alexa SkillsLynn Langit
 
Understanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examplesUnderstanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examplesLynn Langit
 
Genome-scale Big Data Pipelines
Genome-scale Big Data PipelinesGenome-scale Big Data Pipelines
Genome-scale Big Data PipelinesLynn Langit
 
Teaching Kids Programming
Teaching Kids ProgrammingTeaching Kids Programming
Teaching Kids ProgrammingLynn Langit
 
Serverless Reality
Serverless RealityServerless Reality
Serverless RealityLynn Langit
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesLynn Langit
 
VariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomicsVariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomicsLynn Langit
 
Bioinformatics Data Pipelines built by CSIRO on AWS
Bioinformatics Data Pipelines built by CSIRO on AWSBioinformatics Data Pipelines built by CSIRO on AWS
Bioinformatics Data Pipelines built by CSIRO on AWSLynn Langit
 
Serverless Reality
Serverless RealityServerless Reality
Serverless RealityLynn Langit
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond RelationalLynn Langit
 
New AWS Services for Bioinformatics
New AWS Services for BioinformaticsNew AWS Services for Bioinformatics
New AWS Services for BioinformaticsLynn Langit
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsLynn Langit
 
Scaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformScaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformLynn Langit
 

Plus de Lynn Langit (20)

VariantSpark on AWS
VariantSpark on AWSVariantSpark on AWS
VariantSpark on AWS
 
Serverless Architectures
Serverless ArchitecturesServerless Architectures
Serverless Architectures
 
10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming
 
Blastn plus jupyter on Docker
Blastn plus jupyter on DockerBlastn plus jupyter on Docker
Blastn plus jupyter on Docker
 
Testing in Ballerina Language
Testing in Ballerina LanguageTesting in Ballerina Language
Testing in Ballerina Language
 
Teaching Kids to create Alexa Skills
Teaching Kids to create Alexa SkillsTeaching Kids to create Alexa Skills
Teaching Kids to create Alexa Skills
 
Practical cloud
Practical cloudPractical cloud
Practical cloud
 
Understanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examplesUnderstanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examples
 
Genome-scale Big Data Pipelines
Genome-scale Big Data PipelinesGenome-scale Big Data Pipelines
Genome-scale Big Data Pipelines
 
Teaching Kids Programming
Teaching Kids ProgrammingTeaching Kids Programming
Teaching Kids Programming
 
Practical Cloud
Practical CloudPractical Cloud
Practical Cloud
 
Serverless Reality
Serverless RealityServerless Reality
Serverless Reality
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data Pipelines
 
VariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomicsVariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomics
 
Bioinformatics Data Pipelines built by CSIRO on AWS
Bioinformatics Data Pipelines built by CSIRO on AWSBioinformatics Data Pipelines built by CSIRO on AWS
Bioinformatics Data Pipelines built by CSIRO on AWS
 
Serverless Reality
Serverless RealityServerless Reality
Serverless Reality
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
 
New AWS Services for Bioinformatics
New AWS Services for BioinformaticsNew AWS Services for Bioinformatics
New AWS Services for Bioinformatics
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline Patterns
 
Scaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformScaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud Platform
 

Dernier

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Dernier (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

The Microsoft BigData Story

  • 1. Microsoft’s BigData Story @LynnLangit April 2013 – Big Data Tech Con
  • 2. Data Expertise / Lynn Langit • Industry awards – Microsoft – MVP for SQL Server – Google – GDE for Cloud Platform – 10Gen – Master for MongoDB • Practicing Architect • Technical author / trainer – Pluralsight – Google Cloud Series – DevelopMentor – SQL Server Series – 2 books on SQL Server BI • Former MSFT FTE – 4 years
  • 3. In a Relationship? BigData NoSQL
  • 4. BigData, NoSQL… => No Microsoft? Big Data => keeping / getting more data • Cheap Storage • Cloud Storage • Open Source data projects (Hadoop) NoSQL => schema-lite, scalable storage • NoSQL data projects • Mostly open source • Sharded replicas
  • 5. In a (Open Source) Relationship? NoSQL Hadoop Cloud MongoDB Neo4j Riak AWS Heroku RackSpace OpenStack Cassandra
  • 7. The Reality BigData Small BigData
  • 8. BigData Lifecycle Management Locate Quantify Qualify Replicate Process Present
  • 9. Locating the data • you buy it Private source Public source • you find it Your source • in SQL Server • on desktops
  • 10. Finding Data in Data Markets • Windows Azure Data Market • DataMarket.com • Factual.com • InfoChimps
  • 12. Database Lifecycle Management • Evaluating current processes • Improving processes • Adding new tools – SSDT • Data synchronization processes
  • 13. Storing the data Relational • SQL Server – can use partitioning for scalability Beyond relational via relational • Specialized data types • XML, Hierarchy, Filestream/Filetable, Geospatial • Columnstore index Multi-dimensional / in-memory • OLAP cubes / Mining Models • Tabular models
  • 14. Big Data in SQL Server 2012 – Relational Enhancements DEMO COLUMNSTORE, XML, FILETABLE
  • 15. Data Processing Raw data Pre-processed data Detail data Aggregate data Views
  • 16. Valuing the data • De-duplicating • Validating • Correcting errors • Aggregating • Ranking / rating – Social rating ,i.e. Yelp-like – Social scoring, i.e. Freebase-like
  • 18. Types of Data Quality Projects T-SQL scripts (boolean • Exact matches WHERE = , WHERE WHERE <>, IN match) • LIKE string matching % -- Full-text matching (semantic word match) • CONTAINS Semantic Search • SEMANTICSIMIALARITIESTABLE (semantic phrase match) SSIS tasks - (transactional, multi-valued matching) • List below • KnowledgeBase rules/matches - DQS (KB matching) • DataQualityproject clean correctdata - / MDS (One view of truth) • Versioned Entities, Attributes and Rules
  • 19. Data Presentation • View-only client • View & manipulate (hide-only) client • View & query (aggregate) client • View & query (drill through) client • View & mash-up (add new data) client • View & update client • Timeliness of data (latency) • Beauty of data
  • 20. But, does it work in Excel? Mash-up Clean up Extract- Authorize data with data with Transform- with 3rd party – Import PowerPivot Data Load with Master Mine with Data – including Quality Data Data Predixion Hadoop via Services Explorer Services ODBC
  • 21. From Pivot tables to Visualized Data Mash-ups with Mining DEMO THE POWER OF EXCEL
  • 22. What about the UDM? • UDM / Data Mining is fully supported in SSAS • Must be installed in this mode – Mutually exclusive to Tabular mode • But, should you use it anymore?
  • 23. Big Data in SQL Server 2012 – Non-Relational Features DEMO TABULAR MODELS DATA MINING
  • 24.
  • 25. Data Consumability (Accurate) Valid (Meaningful) Recognizable (Useful) Appropriate (Appealing) Beautiful (Satisfying) Enjoyable
  • 27. Data Fluency and Job Roles Consumer Analyzer Cleaner Artist • View and • View, • Validate • Visualize understand manipulate and update and present and decide
  • 28. BigData in SQL Server 2012 • Scaling via • Partitioning for Tables, indexes • PDW Relational • Columnstore indexes engine • Special Data Types • XML, Hierarchy, Filetable • OLAP Cubes Analysis • Tabular Models service engines • Data Mining Models • Data Quality Services Other • Master Data Services services • StreamInsight
  • 29. Other Data Services from Microsoft Windows Azure SQL Azure Marketplace Data Power Pivot Explorer
  • 30. NoSQL – New Products / Betas SSRS on Semantic Azure Search HDInsight PowerView (Hadoop on Azure) Cloud-based Data Explorer
  • 32. The Changing Data Landscape Other Services RDBMS NoSQL
  • 33. • recipes) www.TeachingKidsProgramming.org • Free Courseware • Do a Recipe  Teach a Kid (Ages 10 ++) • Java or Microsoft SmallBasic • C# on Pluralsight
  • 34. Toward Data Craftsmanship… Follow me • @LynnLangit • www.LynnLangit.com • YouTube - SoCalDevGal Hire me • To help build your BI/Big Data solution • To teach your team next gen BI • To learn more about using NoSQL solutions

Notes de l'éditeur

  1. SSIS Tasks - Lookup transformation - (this for that, substitutions)Cache transformation - (multiple lookups)Fuzzy Lookup - (lookup based on threshold matching)Fuzzy Grouping - (grouping based on thresholds)Data Mining Query - (based on mining model algorithms)DQS Cleansing - (uses a KB)
  2. Comparison of features from MSDN -- http://msdn.microsoft.com/en-us/library/hh212940(v=sql.110).aspx
  3. Lynn