SlideShare une entreprise Scribd logo
1  sur  54
Télécharger pour lire hors ligne
Secrets of Enterprise Data Mining 
Mark Tabladillo, Ph.D. (MVP, SAS Expert) 
Consultant, SolidQ 
SQL Saturday Oregon 
November 1, 2014
Networking 
Say hello
Mark Tab 
SQL Server MVP; SAS Expert 
Consulting 
Training 
Teaching 
Presenting 
Linked In 
@MarkTabNet
Interactive 
Name (up to) three things you want from enterprise data mining
Definitions 
What is data mining?
Definition 
Data mining is the automated or semi-automated process of discovering patterns in data 
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
Purposes 
Phrase 
Goal 
“Data Mining” 
Inform actionabledecisions 
“Machine Learning” 
Determine best performingalgorithm
How could data mining apply? 
Let’s look at three companies
Telecommunications
Oil and Gas
Volkswagen Group
What 
Why 
How 
Relational Data Warehouse 
Familiarway to store, fast retrieval, consistency, scalable 
Database, relational constructs,indexes 
Hadoop & HDInsight 
Large amounts, divideand conquer, analyzing unstructured data, flexible schema 
Distributed computing 
Tabular 
Fast calculations 
In-memory, columns over rows 
MultidimensionalOLAP 
Sliceand dice, ad hoc querying 
Expandsstar schema into cube, preaggregatedcalculations 
Data Mining & Machine Learning 
Patterns, predictions, high volume 
Algorithms,estimations
Secret: Excel data mining 
Excel add-in for SQL Server data mining
Data mining add-in for business analysts 
•Ease of use 
•Rich data mining 
•Scalable
Split Personality of SSAS 
SS 
SQL 
AS 
NoSQL
Excel Data Mining Add-In 
For Office 2007: The 32-bit data mining add-in works with SQL Server 2008 or 2008 R2: 
http://www.microsoft.com/en-us/download/details.aspx?id=7294 
For Office 2010: The 32-or 64-bit data mining add-in works with SQL Server 2012 or earlier: 
http://www.microsoft.com/en-us/download/details.aspx?id=35578 
For Office 2013: The 32-or 64-bit data mining add-in works with SQL Server 2012 or earlier: 
http://www.microsoft.com/en-us/download/details.aspx?id=35578
Secret: Data Science provides an Epistemology 
Data mining is part of a complete data science cycle
MarkTab Decision Cycle 
Analysis 
(science) 
Synthesis 
(art) 
GO 
Science needs science fiction --MarkTab
MarkTab Decision Cycle 
Analysis 
(science) 
Synthesis 
(art) 
GO
Currency of Science 
Notes
Secret: Microsoft is an analytics competitor
Gartner 2013 
Magic Quadrant for Business Intelligence and Analytics Platforms 
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb–February 5, 2013
Gartner 2013 
Magic Quadrant for Data Warehouse Database Management Systems 
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb–January 31, 2013
KDNuggets2014What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project? 
http://www.kdnuggets.com/2014/06/analytics-data- mining-data-science-software-poll-analyzed.html
KDNuggets2014What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project? 
http://www.kdnuggets.com/2014/06/analytics-data- mining-data-science-software-poll-analyzed.html
KDNuggets2014 
http://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html
KDNuggets2014 
http://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html
SQL Server 2014 
Business Intelligence and Business Analytics
Secret: Many already have Microsoft analytics 
Business Intelligence and Business Analytics are included with most SQL Server licenses
Self-service BI 
Corporate BI 
Evolution of BI
Evolution of BI 
Niche Startups 
Self-service BI 
Corporate BI
Data platform: SQL Server 2014 
Database Services 
SQL Server* SQL Azure* 
ReplicationSQL Azure Data Sync* 
Full Text & Semantic Search* 
Data Integration Services 
Integration Services* 
Master Data Services* 
Data Quality Services* 
StreamInsight* Project “Austin”* 
Analytical Services 
Analysis Services* 
Data Mining 
PowerPivot* 
Reporting Services 
Reporting Services* SQL Azure Reporting* 
Report Builder 
Power View*
Secret: Microsoft offers two choices 
SQL Server Analysis Services = SQL Server Data Mining 
Microsoft Azure Machine Learning
Advanced analytic tools for data scientists 
•Advanced descriptive analytics (e.g. clustering algorithm in SQL Server Analysis Services) 
•Predictive analytics (Neural Nets, Regression, Decision Tree, Time Series, Naïve Bayes algorithms in SQL Server Analysis Services) 
•Further advanced analytics (Semantic Search and Geospatial Data and functions in SQL Server 2012) 
•Big Data analytics(Hadoop integration)
What Enterprise Tools support SSAS? 
Data Mining 
SSMS 
SSIS 
PowerShell
SSAS Data Mining Capacities 
SQL Server 2014Analysis Services Object 
Maximum sizes/numbers 
Maximum data mining models per structure 
2^31-1 = 2,147,483,647 
Maximum data mining structures per solution 
2^31-1 = 2,147,483,647 
Maximum data mining structures per Analysis Services database 
2^31-1 = 2,147,483,647 
Maximum data mining attributes (variables) per structure 
64K 
Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
Microsoft Azure Machine Learning 
Bringsengineeringbestpracticestodatascience… 
Archiveforpredictivemodels,ensuringmodels 
arenotlost,deleted,orcorrupted. 
Search,discoveryandreuseexistingmodelsto 
buildontheworkofothers; 
Deploypredictivemodelintooperation,from 
DataLabtominimizetimetoinsight; 
Frequentlyupdatethepredictivemodel,to adapttochangingbusinessconditions. 
Everynewalgorithmaddedasamodule,everynewpredictivemodeldeployedwillflow 
tobuilduptheknowledgebaseandmakethe software morevaluable.
Semantic Search 
Text Mining
Future: Most data is Text 
•Quantitative research = data mining 
•Qualitative research = text mining 
Two Research Types 
The future is combining both
(iFilterRequired) 
Documents 
Full-Text Keyword Index 
“FTI” 
iFilters 
Semantic Document Similarity Index “DSI” 
Semantic Database 
Semantic Key Phrase Index – 
Tag Index “TI”
Languages Currently Supported 
Traditional Chinese 
German 
English 
French 
Italian 
Brazilian 
Russian 
Swedish 
Simplified Chinese 
British English 
Portuguese 
Chinese (Hong Kong SAR, PRC) 
Spanish 
Chinese (Singapore) 
Chinese (Macau SAR)
Secret: Semantic Search scales linearly 
Performance
Integrated Full Text Search (iFTS) 
Improved Performance and Scale: 
Scale-up to 350M documents for storage and search 
iFTSquery performance 7-10 times faster than in SQL Server 2008 
Worst-case iFTSquery response times less than 3 sec for corpus 
Similar or better than main database search competitors 
(2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI 
First known linearly scaling end-to-end Search and Semantic product in the industry 
Time in Seconds vs. Number of Documents 
(2011 –K. Mukerjee, T. Porter, S. Gherman–Microsoft)
Text Mining References 
Video 
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic- Search 
http://www.microsoftpdc.com/2009/SVR32 
Semantic Search (Books Online) –explains the demo 
http://msdn.microsoft.com/en-us/library/gg492075.aspx 
Paper 
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Microsoft Resources 
Links
Major Websites 
SQL Server Data Mining 
http://technet.microsoft.com/en-us/sqlserver/cc510301.aspx 
http://www.sqlserverdatamining.com/ 
Microsoft Azure Machine Learning (currently in preview) http://azure.microsoft.com/en-us/services/machine-learning/
Software 
Dreamspark(students); BizSpark(businesses) 
SQL Server 2014 Enterprise (includes database engine, Analysis Services, SSMS and SSDT) 
http://www.microsoft.com/en-us/server-cloud/products/sql-server/default.aspx 
Microsoft Office 
http://office.microsoft.com/en-us/ 
Primer on Power BI --MarkTab 
http://blogs.msdn.com/b/mvpawardprogram/archive/2014/08/04/primer-on-power-bi-business- intelligence.aspx
Organizations 
Professional Association for SQL Server http://www.sqlpass.org 
PASS Business Analytics Conference http://www.passbaconference.com
Interactive 
Takeaways
Conclusion 
Excel data mining 
Data Science provides an epistemology 
Microsoft is an analytics competitor 
Many already have Microsoft analytics 
Microsoft offers two enterprise solutions 
Semantic search scales linearly
Abstract 
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.

Contenu connexe

Tendances

Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data PlatformPredictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data PlatformSavita Yadav
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best PracticesNeo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best PracticesNeo4j
 
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Denodo
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationMongoDB
 
ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__Ha...
ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__Ha...ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__Ha...
ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__Ha...Yahoo Developer Network
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoMark Kromer
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at TwitchImply
 
LendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateLendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateRajit Saha
 
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML
Predictive Analysis of Financial Fraud Detection using Azure and Spark MLPredictive Analysis of Financial Fraud Detection using Azure and Spark ML
Predictive Analysis of Financial Fraud Detection using Azure and Spark MLJongwook Woo
 
Data Discovery & Trust through Metadata
Data Discovery & Trust through MetadataData Discovery & Trust through Metadata
Data Discovery & Trust through Metadatamarkgrover
 
Scalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowScalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowCambridge Semantics
 
Neo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesGive sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesDataStax
 
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant AccessEntity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant AccessDataWorks Summit
 
Data Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingData Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingAll Things Open
 
Visualize the Knowledge Graph and Unleash Your Data
Visualize the Knowledge Graph and Unleash Your DataVisualize the Knowledge Graph and Unleash Your Data
Visualize the Knowledge Graph and Unleash Your DataLinkurious
 

Tendances (20)

Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data PlatformPredictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best PracticesNeo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
 
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__Ha...
ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__Ha...ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__Ha...
ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds__Ha...
 
Big data summary_v2.1
Big data summary_v2.1Big data summary_v2.1
Big data summary_v2.1
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at Twitch
 
LendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateLendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGate
 
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML
Predictive Analysis of Financial Fraud Detection using Azure and Spark MLPredictive Analysis of Financial Fraud Detection using Azure and Spark ML
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML
 
Data Discovery & Trust through Metadata
Data Discovery & Trust through MetadataData Discovery & Trust through Metadata
Data Discovery & Trust through Metadata
 
Scalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowScalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and How
 
Neo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time Analytics
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesGive sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
 
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant AccessEntity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access
 
Data Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingData Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data Warehousing
 
Visualize the Knowledge Graph and Unleash Your Data
Visualize the Knowledge Graph and Unleash Your DataVisualize the Knowledge Graph and Unleash Your Data
Visualize the Knowledge Graph and Unleash Your Data
 

En vedette

SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2
SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2
SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2Mark Tabladillo
 
Data Mining Innovation 201306
Data Mining Innovation 201306Data Mining Innovation 201306
Data Mining Innovation 201306Mark Tabladillo
 
Data science guide for PASS Summit 2014
Data science guide for PASS Summit 2014Data science guide for PASS Summit 2014
Data science guide for PASS Summit 2014Mark Tabladillo
 
Applied enterprise semantic mining
Applied enterprise semantic miningApplied enterprise semantic mining
Applied enterprise semantic miningMark Tabladillo
 
SQL Saturday 108 -- Enterprise Data Mining with SQL Server
SQL Saturday 108 -- Enterprise Data Mining with SQL ServerSQL Saturday 108 -- Enterprise Data Mining with SQL Server
SQL Saturday 108 -- Enterprise Data Mining with SQL ServerMark Tabladillo
 
SQL Server 2008 Data Mining with PowerPivot and Excel 2010
SQL Server 2008 Data Mining with PowerPivot and Excel 2010SQL Server 2008 Data Mining with PowerPivot and Excel 2010
SQL Server 2008 Data Mining with PowerPivot and Excel 2010Mark Tabladillo
 

En vedette (7)

SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2
SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2
SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2
 
Data Mining Innovation 201306
Data Mining Innovation 201306Data Mining Innovation 201306
Data Mining Innovation 201306
 
Social Marketing 201306
Social Marketing 201306Social Marketing 201306
Social Marketing 201306
 
Data science guide for PASS Summit 2014
Data science guide for PASS Summit 2014Data science guide for PASS Summit 2014
Data science guide for PASS Summit 2014
 
Applied enterprise semantic mining
Applied enterprise semantic miningApplied enterprise semantic mining
Applied enterprise semantic mining
 
SQL Saturday 108 -- Enterprise Data Mining with SQL Server
SQL Saturday 108 -- Enterprise Data Mining with SQL ServerSQL Saturday 108 -- Enterprise Data Mining with SQL Server
SQL Saturday 108 -- Enterprise Data Mining with SQL Server
 
SQL Server 2008 Data Mining with PowerPivot and Excel 2010
SQL Server 2008 Data Mining with PowerPivot and Excel 2010SQL Server 2008 Data Mining with PowerPivot and Excel 2010
SQL Server 2008 Data Mining with PowerPivot and Excel 2010
 

Similaire à Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411

Secrets of Enterprise Data Mining 201310
Secrets of Enterprise Data Mining 201310Secrets of Enterprise Data Mining 201310
Secrets of Enterprise Data Mining 201310Mark Tabladillo
 
Secrets of Enterprise Data Mining 201305
Secrets of Enterprise Data Mining 201305Secrets of Enterprise Data Mining 201305
Secrets of Enterprise Data Mining 201305Mark Tabladillo
 
Applied Enterprise Semantic Mining -- Charlotte 201410
Applied Enterprise Semantic Mining -- Charlotte 201410Applied Enterprise Semantic Mining -- Charlotte 201410
Applied Enterprise Semantic Mining -- Charlotte 201410Mark Tabladillo
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionAmazon Web Services
 
Applied Enterprise Semantic Search 201305
Applied Enterprise Semantic Search 201305Applied Enterprise Semantic Search 201305
Applied Enterprise Semantic Search 201305Mark Tabladillo
 
Applied Semantic Search 201306
Applied Semantic Search 201306Applied Semantic Search 201306
Applied Semantic Search 201306Mark Tabladillo
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...Denodo
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationMatthew W. Bowers
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsClusterpoint
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerMark Tabladillo
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Mark Tabladillo
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo
 
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
2013  International Conference on Knowledge, Innovation and Enterprise Presen...2013  International Conference on Knowledge, Innovation and Enterprise Presen...
2013 International Conference on Knowledge, Innovation and Enterprise Presen...oj08
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesDenodo
 

Similaire à Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411 (20)

Secrets of Enterprise Data Mining 201310
Secrets of Enterprise Data Mining 201310Secrets of Enterprise Data Mining 201310
Secrets of Enterprise Data Mining 201310
 
Secrets of Enterprise Data Mining 201305
Secrets of Enterprise Data Mining 201305Secrets of Enterprise Data Mining 201305
Secrets of Enterprise Data Mining 201305
 
Applied Enterprise Semantic Mining -- Charlotte 201410
Applied Enterprise Semantic Mining -- Charlotte 201410Applied Enterprise Semantic Mining -- Charlotte 201410
Applied Enterprise Semantic Mining -- Charlotte 201410
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in Action
 
Applied Enterprise Semantic Search 201305
Applied Enterprise Semantic Search 201305Applied Enterprise Semantic Search 201305
Applied Enterprise Semantic Search 201305
 
Applied Semantic Search 201306
Applied Semantic Search 201306Applied Semantic Search 201306
Applied Semantic Search 201306
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL Server
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
 
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
2013  International Conference on Knowledge, Innovation and Enterprise Presen...2013  International Conference on Knowledge, Innovation and Enterprise Presen...
2013 International Conference on Knowledge, Innovation and Enterprise Presen...
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & Bénéfices
 

Plus de Mark Tabladillo

How to find low-cost or free data science resources 202006
How to find low-cost or free data science resources 202006How to find low-cost or free data science resources 202006
How to find low-cost or free data science resources 202006Mark Tabladillo
 
Microsoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMicrosoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMark Tabladillo
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for DevelopersMark Tabladillo
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated MLMark Tabladillo
 
201906 01 Introduction to ML.NET 1.0
201906 01 Introduction to ML.NET 1.0201906 01 Introduction to ML.NET 1.0
201906 01 Introduction to ML.NET 1.0Mark Tabladillo
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019Mark Tabladillo
 
201906 03 Introduction to NimbusML
201906 03 Introduction to NimbusML201906 03 Introduction to NimbusML
201906 03 Introduction to NimbusMLMark Tabladillo
 
201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0Mark Tabladillo
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine LearningMark Tabladillo
 
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...Mark Tabladillo
 
Big Data Advanced Analytics on Microsoft Azure 201904
Big Data Advanced Analytics on Microsoft Azure 201904Big Data Advanced Analytics on Microsoft Azure 201904
Big Data Advanced Analytics on Microsoft Azure 201904Mark Tabladillo
 
Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904Mark Tabladillo
 
Training of Python scikit-learn models on Azure
Training of Python scikit-learn models on AzureTraining of Python scikit-learn models on Azure
Training of Python scikit-learn models on AzureMark Tabladillo
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Advanced Analytics with Power BI 201808
Advanced Analytics with Power BI 201808Advanced Analytics with Power BI 201808
Advanced Analytics with Power BI 201808Mark Tabladillo
 
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)Mark Tabladillo
 
Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Mark Tabladillo
 
How Big Companies plan to use Our Big Data 201610
How Big Companies plan to use Our Big Data 201610How Big Companies plan to use Our Big Data 201610
How Big Companies plan to use Our Big Data 201610Mark Tabladillo
 
Georgia Tech Data Science Hackathon September 2016
Georgia Tech Data Science Hackathon September 2016Georgia Tech Data Science Hackathon September 2016
Georgia Tech Data Science Hackathon September 2016Mark Tabladillo
 
Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608Mark Tabladillo
 

Plus de Mark Tabladillo (20)

How to find low-cost or free data science resources 202006
How to find low-cost or free data science resources 202006How to find low-cost or free data science resources 202006
How to find low-cost or free data science resources 202006
 
Microsoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMicrosoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science Recap
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for Developers
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
 
201906 01 Introduction to ML.NET 1.0
201906 01 Introduction to ML.NET 1.0201906 01 Introduction to ML.NET 1.0
201906 01 Introduction to ML.NET 1.0
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019
 
201906 03 Introduction to NimbusML
201906 03 Introduction to NimbusML201906 03 Introduction to NimbusML
201906 03 Introduction to NimbusML
 
201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
 
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
 
Big Data Advanced Analytics on Microsoft Azure 201904
Big Data Advanced Analytics on Microsoft Azure 201904Big Data Advanced Analytics on Microsoft Azure 201904
Big Data Advanced Analytics on Microsoft Azure 201904
 
Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904
 
Training of Python scikit-learn models on Azure
Training of Python scikit-learn models on AzureTraining of Python scikit-learn models on Azure
Training of Python scikit-learn models on Azure
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Advanced Analytics with Power BI 201808
Advanced Analytics with Power BI 201808Advanced Analytics with Power BI 201808
Advanced Analytics with Power BI 201808
 
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)
 
Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017
 
How Big Companies plan to use Our Big Data 201610
How Big Companies plan to use Our Big Data 201610How Big Companies plan to use Our Big Data 201610
How Big Companies plan to use Our Big Data 201610
 
Georgia Tech Data Science Hackathon September 2016
Georgia Tech Data Science Hackathon September 2016Georgia Tech Data Science Hackathon September 2016
Georgia Tech Data Science Hackathon September 2016
 
Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608Microsoft Data Science Technologies 201608
Microsoft Data Science Technologies 201608
 

Dernier

Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 

Dernier (20)

Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 

Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411

  • 1. Secrets of Enterprise Data Mining Mark Tabladillo, Ph.D. (MVP, SAS Expert) Consultant, SolidQ SQL Saturday Oregon November 1, 2014
  • 3. Mark Tab SQL Server MVP; SAS Expert Consulting Training Teaching Presenting Linked In @MarkTabNet
  • 4. Interactive Name (up to) three things you want from enterprise data mining
  • 5. Definitions What is data mining?
  • 6. Definition Data mining is the automated or semi-automated process of discovering patterns in data Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
  • 7. Purposes Phrase Goal “Data Mining” Inform actionabledecisions “Machine Learning” Determine best performingalgorithm
  • 8. How could data mining apply? Let’s look at three companies
  • 12.
  • 13. What Why How Relational Data Warehouse Familiarway to store, fast retrieval, consistency, scalable Database, relational constructs,indexes Hadoop & HDInsight Large amounts, divideand conquer, analyzing unstructured data, flexible schema Distributed computing Tabular Fast calculations In-memory, columns over rows MultidimensionalOLAP Sliceand dice, ad hoc querying Expandsstar schema into cube, preaggregatedcalculations Data Mining & Machine Learning Patterns, predictions, high volume Algorithms,estimations
  • 14.
  • 15. Secret: Excel data mining Excel add-in for SQL Server data mining
  • 16. Data mining add-in for business analysts •Ease of use •Rich data mining •Scalable
  • 17. Split Personality of SSAS SS SQL AS NoSQL
  • 18. Excel Data Mining Add-In For Office 2007: The 32-bit data mining add-in works with SQL Server 2008 or 2008 R2: http://www.microsoft.com/en-us/download/details.aspx?id=7294 For Office 2010: The 32-or 64-bit data mining add-in works with SQL Server 2012 or earlier: http://www.microsoft.com/en-us/download/details.aspx?id=35578 For Office 2013: The 32-or 64-bit data mining add-in works with SQL Server 2012 or earlier: http://www.microsoft.com/en-us/download/details.aspx?id=35578
  • 19. Secret: Data Science provides an Epistemology Data mining is part of a complete data science cycle
  • 20. MarkTab Decision Cycle Analysis (science) Synthesis (art) GO Science needs science fiction --MarkTab
  • 21. MarkTab Decision Cycle Analysis (science) Synthesis (art) GO
  • 23. Secret: Microsoft is an analytics competitor
  • 24. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb–February 5, 2013
  • 25. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb–January 31, 2013
  • 26. KDNuggets2014What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project? http://www.kdnuggets.com/2014/06/analytics-data- mining-data-science-software-poll-analyzed.html
  • 27. KDNuggets2014What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project? http://www.kdnuggets.com/2014/06/analytics-data- mining-data-science-software-poll-analyzed.html
  • 30. SQL Server 2014 Business Intelligence and Business Analytics
  • 31. Secret: Many already have Microsoft analytics Business Intelligence and Business Analytics are included with most SQL Server licenses
  • 32. Self-service BI Corporate BI Evolution of BI
  • 33. Evolution of BI Niche Startups Self-service BI Corporate BI
  • 34. Data platform: SQL Server 2014 Database Services SQL Server* SQL Azure* ReplicationSQL Azure Data Sync* Full Text & Semantic Search* Data Integration Services Integration Services* Master Data Services* Data Quality Services* StreamInsight* Project “Austin”* Analytical Services Analysis Services* Data Mining PowerPivot* Reporting Services Reporting Services* SQL Azure Reporting* Report Builder Power View*
  • 35. Secret: Microsoft offers two choices SQL Server Analysis Services = SQL Server Data Mining Microsoft Azure Machine Learning
  • 36. Advanced analytic tools for data scientists •Advanced descriptive analytics (e.g. clustering algorithm in SQL Server Analysis Services) •Predictive analytics (Neural Nets, Regression, Decision Tree, Time Series, Naïve Bayes algorithms in SQL Server Analysis Services) •Further advanced analytics (Semantic Search and Geospatial Data and functions in SQL Server 2012) •Big Data analytics(Hadoop integration)
  • 37. What Enterprise Tools support SSAS? Data Mining SSMS SSIS PowerShell
  • 38. SSAS Data Mining Capacities SQL Server 2014Analysis Services Object Maximum sizes/numbers Maximum data mining models per structure 2^31-1 = 2,147,483,647 Maximum data mining structures per solution 2^31-1 = 2,147,483,647 Maximum data mining structures per Analysis Services database 2^31-1 = 2,147,483,647 Maximum data mining attributes (variables) per structure 64K Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
  • 39. Microsoft Azure Machine Learning Bringsengineeringbestpracticestodatascience… Archiveforpredictivemodels,ensuringmodels arenotlost,deleted,orcorrupted. Search,discoveryandreuseexistingmodelsto buildontheworkofothers; Deploypredictivemodelintooperation,from DataLabtominimizetimetoinsight; Frequentlyupdatethepredictivemodel,to adapttochangingbusinessconditions. Everynewalgorithmaddedasamodule,everynewpredictivemodeldeployedwillflow tobuilduptheknowledgebaseandmakethe software morevaluable.
  • 41. Future: Most data is Text •Quantitative research = data mining •Qualitative research = text mining Two Research Types The future is combining both
  • 42. (iFilterRequired) Documents Full-Text Keyword Index “FTI” iFilters Semantic Document Similarity Index “DSI” Semantic Database Semantic Key Phrase Index – Tag Index “TI”
  • 43. Languages Currently Supported Traditional Chinese German English French Italian Brazilian Russian Swedish Simplified Chinese British English Portuguese Chinese (Hong Kong SAR, PRC) Spanish Chinese (Singapore) Chinese (Macau SAR)
  • 44. Secret: Semantic Search scales linearly Performance
  • 45. Integrated Full Text Search (iFTS) Improved Performance and Scale: Scale-up to 350M documents for storage and search iFTSquery performance 7-10 times faster than in SQL Server 2008 Worst-case iFTSquery response times less than 3 sec for corpus Similar or better than main database search competitors (2012, Michael Rys, Microsoft)
  • 46. Linear Scale of FTI/TI/DSI First known linearly scaling end-to-end Search and Semantic product in the industry Time in Seconds vs. Number of Documents (2011 –K. Mukerjee, T. Porter, S. Gherman–Microsoft)
  • 47. Text Mining References Video http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic- Search http://www.microsoftpdc.com/2009/SVR32 Semantic Search (Books Online) –explains the demo http://msdn.microsoft.com/en-us/library/gg492075.aspx Paper http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  • 49. Major Websites SQL Server Data Mining http://technet.microsoft.com/en-us/sqlserver/cc510301.aspx http://www.sqlserverdatamining.com/ Microsoft Azure Machine Learning (currently in preview) http://azure.microsoft.com/en-us/services/machine-learning/
  • 50. Software Dreamspark(students); BizSpark(businesses) SQL Server 2014 Enterprise (includes database engine, Analysis Services, SSMS and SSDT) http://www.microsoft.com/en-us/server-cloud/products/sql-server/default.aspx Microsoft Office http://office.microsoft.com/en-us/ Primer on Power BI --MarkTab http://blogs.msdn.com/b/mvpawardprogram/archive/2014/08/04/primer-on-power-bi-business- intelligence.aspx
  • 51. Organizations Professional Association for SQL Server http://www.sqlpass.org PASS Business Analytics Conference http://www.passbaconference.com
  • 53. Conclusion Excel data mining Data Science provides an epistemology Microsoft is an analytics competitor Many already have Microsoft analytics Microsoft offers two enterprise solutions Semantic search scales linearly
  • 54. Abstract If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.