Presented at SQL Saturday 220, Atlanta, GA, May 2013. If you have a SQL Server license (Standard or higher), then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT.
6. Secret: More than just SQL Server
Microsoft continues to add machine learning technology
7. Microsoft Offers
Bing
Maps
Xbox Kinect
Hacker Magnet
SQL Server 2012
Analysis Services (Multidimensional and Data Mining)
Integration Services
Semantic Search
Hadoop Partnership
Excel Projects from Microsoft Research
Microsoft Data Lab: http://passfiles.sqlpass.org/vc/ba/PASSBAVC042513/PASSBAVC042513.pdf
9. Definition
Data mining is the automated or semi-automated process of discovering patterns in data.
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
18. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
19. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
22. New Platform options: managed services
[Diagram: four deployment models compared over the same stack]
Stack layers (identical in each model): Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking
Models: Platform (Self Managed); Infrastructure (as a Service); Platform (as a Service); Software (as a Service)
Diagram label: Managed Services (appears three times)
23. SQL Release timelines
SQL Server releases (timeline axis 1990 to 2012):
1989: SQL Server 1.0 (OS/2)
1991: SQL Server 1.1 (OS/2)
1993: SQL Server 4.21 (NT)
1995: SQL Server 6.0
1996: SQL Server 6.5
1998: SQL Server 7.0 (Dynamic Locking, Auto-Tuning, Full-text search, Replication, Analysis Services)
2000: SQL Server 2000 (Reporting Services)
2005: SQL Server 2005 (Unicode Support, Native XML, SQLCLR, Service Broker, Integration Services)
2008: SQL Server 2008 (Sparse Columns, Spatial Types, FILESTREAM)
2010: SQL Server 2008 R2 (Data-tier Apps, StreamInsight, PowerPivot, Master Data Services)
2012: SQL Server 2012 (AlwaysOn, Columnstore, FileTable, Semantic Search, Power View)
SQL Azure releases (timeline axis Apr 10 to Oct 11):
Feb 10: SQL Azure RTW
Feb 10: SQL Azure SU1 RTW (Alter Edition)
Apr 10: SQL Azure SU2 RTW (MARS)
Jun 10: SQL Azure SU3 RTW (50 GB Db, Spatial Type, HierarchyId Type)
Jul 10: DataSync CTP1
Aug 10: SQL Azure SU4 RTW (Database Copy, Web Admin)
Nov 10: DataMarket RTW, SQL Azure Reporting CTP1
Dec 10: SQL Azure SU6 RTW (DataSync CTP2)
Feb 11: SQL Azure Reporting CTP2, DataSync CTP2 Update
Apr 11: SQL Azure SU V.Next (Multiple Servers, Server Mgmt API, JDBC, DAC Upgrade)
Aug 11: New Portal Experience, Sparse Columns, SQL Azure Reporting CTP3, SQL Azure DataSync CTP3, DAC Import/Export Service, Denali TSQL
24. Secret: Many already have Microsoft analytics
Business Intelligence and Business Analytics are included with most SQL Server licenses
25. Data platform: SQL Server 2012
Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*
Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project “Austin”*
Analytical Services: Analysis Services*, Data Mining, PowerPivot*
Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
* New / improved in SQL Server 2012
26. SQL Server 2012 Editions
Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
33. Data Mining Capacities
SQL Server 2008 R2 Analysis Services object maximum sizes/numbers
Maximum data mining models per structure: 2^31-1 = 2,147,483,647
Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
Maximum data mining attributes (variables) per structure: 2^31-1 = 2,147,483,647
Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
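The value 2^31-1 quoted for each capacity equals the largest value of a signed 32-bit integer. A minimal Python sketch of that arithmetic follows; the names `MAX_INT32` and `format_limit` are illustrative and do not come from the slides or from any SQL Server API.

```python
# Largest value that fits in a signed 32-bit integer: 2^31 - 1.
MAX_INT32 = 2**31 - 1


def format_limit(value):
    """Format an integer with thousands separators, as written on the slide."""
    return f"{value:,}"


if __name__ == "__main__":
    print(format_limit(MAX_INT32))
```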
35. Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both
36. Full-Text Search Enhancements
Property search: search on tagged properties (such as author or title)
Customizable NEAR: find words or phrases close to one another
New Word Breakers and Stemmers (for many languages)
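The idea behind a proximity (NEAR) search can be illustrated with a short Python sketch. This is not SQL Server's implementation or syntax; the function names, the simple word tokenizer, and the `max_gap` parameter (the maximum number of words allowed between the two terms) are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple word breaker)."""
    return re.findall(r"\w+", text.lower())


def near(text, first, second, max_gap):
    """Return True if `first` and `second` occur with at most `max_gap`
    other words between them, in either order."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
        if i != j
    )


if __name__ == "__main__":
    sentence = "Data mining with SQL Server scales to production"
    print(near(sentence, "mining", "server", 2))
    print(near(sentence, "data", "production", 3))
```

Making the allowed gap a parameter is what "customizable" means in this sketch: the caller decides how close two terms must be to count as a match.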
38. Languages Currently Supported
Traditional Chinese
German
English
French
Italian
Brazilian Portuguese
Russian
Swedish
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
39. Phases of Semantic Indexing
Full Text Keyword Index “FTI”
Semantic Key Phrase Index – Tag Index “TI”
Semantic Document Similarity Index “DSI”
http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
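To give a feel for what a document similarity score measures, here is a generic Python sketch that compares two documents by the cosine similarity of their word counts. It is an illustration only and makes no claim about how SQL Server builds or scores its Document Similarity Index; the function names and the bag-of-words approach are assumptions.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word tokens in a document."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two documents' word-count vectors (0.0 to 1.0)."""
    counts_a, counts_b = term_counts(doc_a), term_counts(doc_b)
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("data mining", "data search"))
    print(cosine_similarity("data mining", "text search"))
```

Documents that share no words score 0.0, and documents with identical word counts score approximately 1.0.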
41. Integrated Full Text Search (iFTS)
Improved Performance and Scale:
Scale-up to 350M documents for storage and search
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times less than 3 sec for corpus
Similar or better than main database search competitors
(2012, Michael Rys, Microsoft)
42. Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
Time in Seconds vs. Number of Documents
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
45. Software
SQL Server 2012 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Microsoft Office Professional 2013
http://office.microsoft.com/en-us/try
46. Organizations
Professional Association for SQL Server http://www.sqlpass.org
Atlanta MDF http://www.atlantamdf.com/
Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
PASS Business Analytics Conference http://www.passbaconference.com
Microsoft TechEd North America http://northamerica.msteched.com/
48. Conclusion: Seven Secrets
Excel data mining
More than just SQL Server
Success involves everyone
Microsoft is an analytics competitor
Many already have Microsoft analytics
Microsoft offers three enterprise tools
Semantic search scales linearly
49. Connect
Data Mining Resources and blog http://marktab.net
Data Mining Training and Consulting (especially Microsoft and SAS) http://marktab.com
50. Abstract
If you have a SQL Server license (Standard or higher) then you already have the ability
to start data mining. In this new presentation, you will see how to scale up data
mining from the free Excel 2013 add-in to production use. Aimed at beginning to
intermediate data miners, this presentation will show how mining models move from
development to production. We will use SQL Server 2012 tools including SSMS, SSIS,
and SSDT.