In-Memory Columnstore Indexes: Make Your Data Warehouse Fly

Presentation on the SQL Server 2012 and 2014 columnstore indexing feature, presented to the Philadelphia SQL BI User Group on November 19, 2013.

In-Memory Columnstore Indexes: Make Your Data Warehouse Fly
Joe D’Antoni
Philadelphia SQL Server Business Intelligence Group
19 November 2013

About Me
Solution Architect, Anexinet
@jdanton – Twitter
jdanton1@yahoo.com
Joedantoni.wordpress.com – Blog, Slides

Agenda
Indexes: a basic overview
Columnstore: an introduction
Report Performance: Demo
2012 and 2014: What’s Changing?
2014: Demo
Questions

Indexes
• A data structure that speeds data retrieval by maintaining an extra copy of data
• Can be filtered
• Can be function-based, or ordered
• The penalty is that writes become more expensive
• More storage required

Indexes in SQL Server
• Clustered vs. nonclustered
• Nonclustered index: “just an index”

Clustered Index
• Data is ordered by the index key as it is inserted into pages
• Data in a clustered index is stored on disk only once (it is the table’s data)
• A table without a clustered index is called a heap: no order at all
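To make the clustered-index-versus-heap distinction concrete, here is a minimal Python sketch. It is a toy model, not how SQL Server lays out pages: `ClusteredTable` keeps rows ordered by a hypothetical `order_id` key, so a range query is two binary searches, while `Heap` keeps rows in arrival order and must examine every one.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    order_id: int  # hypothetical clustering key
    amount: float


class ClusteredTable:
    """Toy clustered table: rows are kept physically ordered by the key."""

    def __init__(self) -> None:
        self._keys: list[int] = []
        self._rows: list[Row] = []

    def insert(self, row: Row) -> None:
        # Place the row in key order, as a clustered index does on insert.
        pos = bisect.bisect_right(self._keys, row.order_id)
        self._keys.insert(pos, row.order_id)
        self._rows.insert(pos, row)

    def range_scan(self, lo: int, hi: int) -> list[Row]:
        # Ordered storage: two binary searches bound the matching slice.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return self._rows[start:end]


class Heap:
    """Toy heap: no clustered index, rows stay in arrival order."""

    def __init__(self) -> None:
        self._rows: list[Row] = []

    def insert(self, row: Row) -> None:
        self._rows.append(row)

    def range_scan(self, lo: int, hi: int) -> list[Row]:
        # No order to exploit, so every row has to be examined.
        return [r for r in self._rows if lo <= r.order_id <= hi]


clustered, heap = ClusteredTable(), Heap()
for oid in (42, 7, 19, 3, 88):
    clustered.insert(Row(oid, oid * 1.5))
    heap.insert(Row(oid, oid * 1.5))

print([r.order_id for r in clustered.range_scan(5, 50)])  # [7, 19, 42]
print([r.order_id for r in heap.range_scan(5, 50)])       # [42, 7, 19]
```

Both return the same rows; only the clustered version can find them without a full scan, and it returns them in key order.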

Non-Clustered Index
• A duplicate copy of data from the table
• Provides a pointer from the index to the table data
• Does not determine the order in which the table’s data is stored
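A nonclustered index can be sketched the same way. In this hypothetical Python model the index is a separate list of `(key, row_locator)` pairs kept in key order (SQL Server nonclustered indexes are B-trees sorted on their key) that points back into an unordered base table; list positions stand in for row locators. It shows both sides of the trade-off from the Indexes slide: seeks become binary searches, but the entries are an extra copy of the data and every insert must maintain them.

```python
import bisect

# Base table stored as a heap; list positions act as hypothetical row locators.
table = [
    {"customer": "Lopez", "city": "Philadelphia"},
    {"customer": "Chen", "city": "Pittsburgh"},
    {"customer": "Abbott", "city": "Philadelphia"},
]


class NonClusteredIndex:
    """A separate, sorted copy of one column plus a pointer back to each base row."""

    def __init__(self, rows: list[dict], column: str) -> None:
        self.column = column
        # Extra storage: (key, row_locator) pairs duplicated from the table.
        self.entries = sorted((row[column], rid) for rid, row in enumerate(rows))

    def add(self, key: str, rid: int) -> None:
        # Write penalty: every insert into the table must also maintain the index.
        bisect.insort(self.entries, (key, rid))

    def seek(self, key: str) -> list[int]:
        # Binary-search to the first entry for `key`, then collect its locators.
        start = bisect.bisect_left(self.entries, (key,))
        rids = []
        for k, rid in self.entries[start:]:
            if k != key:
                break
            rids.append(rid)
        return rids


idx = NonClusteredIndex(table, "city")
print([table[rid]["customer"] for rid in idx.seek("Philadelphia")])  # ['Lopez', 'Abbott']

table.append({"customer": "Diaz", "city": "Philadelphia"})
idx.add("Philadelphia", len(table) - 1)
print(idx.seek("Philadelphia"))  # [0, 2, 3]
```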

So Why All This Talk About Indexes?

Data Warehouse Queries
• Data warehouses have a lot of data
• Querying lots of data can take a really long time
• Processing data row by row may not be the most efficient way to perform aggregations

Traditional Approaches to Improving Performance
• Partitioned Tables
• Indexed Views
• Data Compression
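Of the traditional approaches, indexed views are the easiest to picture in code. The Python below is a hypothetical model, not SQL Server’s mechanism: a running `SUM` per product is stored next to the base rows, so reads skip the scan while every insert pays to keep the stored aggregate current.

```python
class SalesWithIndexedView:
    """Toy model of an indexed view: SUM(amount) per product is stored, not recomputed."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, int]] = []  # base table
        self.totals: dict[str, int] = {}       # materialized aggregate

    def insert(self, product: str, amount: int) -> None:
        self.rows.append((product, amount))
        # Write penalty: each insert also maintains the stored aggregate.
        self.totals[product] = self.totals.get(product, 0) + amount

    def total_for(self, product: str) -> int:
        # Read benefit: answered from the stored aggregate, no scan of self.rows.
        return self.totals.get(product, 0)


sales = SalesWithIndexedView()
for product, amount in [("Bikes", 500), ("Helmets", 40), ("Bikes", 250)]:
    sales.insert(product, amount)
print(sales.total_for("Bikes"))  # 750
```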

Introducing Columnstore Indexes (SQL 2012)
• Data is stored in columns, as opposed to rows
• This allows a much higher rate of compression
• Columns not used in a query are simply not scanned, nor returned
• Recommended practice is to add most columns in a table to the index
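The difference between row and column storage can be sketched in Python. This is a simplified model that counts values touched as a rough stand-in for I/O, and the column names are hypothetical. Summing one column in the row layout touches every field of every row, while the columnar layout reads only the one list it needs, which is why unused columns are simply not scanned.

```python
# One small table in two hypothetical physical layouts.
rows = [
    {"date": "2013-11-01", "store": "PHL", "qty": 3, "amount": 29.97},
    {"date": "2013-11-01", "store": "NYC", "qty": 1, "amount": 9.99},
    {"date": "2013-11-02", "store": "PHL", "qty": 2, "amount": 19.98},
]

# Column store: one contiguous list per column instead of one record per row.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def row_store_sum(rows: list[dict], col: str) -> tuple[int, int]:
    # Simplification: a row store reads whole rows, so count every field touched.
    touched, total = 0, 0
    for r in rows:
        touched += len(r)
        total += r[col]
    return total, touched


def column_store_sum(columns: dict[str, list], col: str) -> tuple[int, int]:
    # Only the referenced column is read; the other columns are never touched.
    data = columns[col]
    return sum(data), len(data)


print(row_store_sum(rows, "qty"))        # (6, 12)
print(column_store_sum(columns, "qty"))  # (6, 3)
```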

Columnar Data Storage

Columnstore 2012 Demo

So How Is It So Much Faster?
• Very good compression ratio for column-oriented data
• Better use of memory
• Segment elimination skips large chunks of data
• Batch mode
  • Processes data in “batches” of roughly 1,000 rows rather than row by row
  • 7-40x CPU savings with batch mode
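Segment elimination relies on per-segment metadata. The Python sketch below assumes each segment records the minimum and maximum of its values, and skips any segment whose range cannot overlap a range predicate. The segment size of 4 is purely illustrative; real segments hold far more rows.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    values: list[int]
    min_value: int  # metadata recorded when the segment is built
    max_value: int


def build_segments(column: list[int], segment_size: int) -> list[Segment]:
    segments = []
    for i in range(0, len(column), segment_size):
        chunk = column[i:i + segment_size]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def filtered_sum(segments: list[Segment], lo: int, hi: int) -> tuple[int, int]:
    """Sum values in [lo, hi]; also report how many segments were actually read."""
    total, scanned = 0, 0
    for seg in segments:
        # Eliminate the whole segment if its [min, max] cannot overlap [lo, hi].
        if seg.max_value < lo or seg.min_value > hi:
            continue
        scanned += 1
        total += sum(v for v in seg.values if lo <= v <= hi)
    return total, scanned


# Keys loaded in ascending order (e.g. by date), so each segment covers a narrow range.
keys = list(range(1, 13))
segments = build_segments(keys, segment_size=4)
print(filtered_sum(segments, 5, 6))  # (11, 1): only 1 of 3 segments is read
```

Elimination works best when data arrives roughly sorted on the filtered column; randomly ordered keys would give every segment a wide [min, max] range and nothing could be skipped.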

“The key to getting the best performance is to make sure your queries process the large majority of data in batch mode.”
Columnstore All The Things?
• Awesome performance, so what’s the negative?
• Can’t update or insert in 2012 (the table is read-only while the index exists)
• Can only be a nonclustered index, so we are storing more data on disk
• Data types are somewhat limited
• Only one columnstore index per table
• Can’t be a sorted index

So Where to Use Columnstore Indexes?
• Only on large tables: fact tables and dimension tables with more than 3 million rows
• Include every column
• Structure queries as star joins with grouping and aggregation

More details here
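A star join with grouping and aggregation, sketched in Python with hypothetical `fact_sales`, `dim_product`, and `dim_store` tables. The small dimensions act as hash-join lookup tables and the single large fact table is scanned once and grouped; rows with no matching dimension key are dropped, as an inner join would. The comment shows the T-SQL query shape the slide recommends, under the same assumed names.

```python
from collections import defaultdict

# Roughly equivalent to (hypothetical schema):
#   SELECT p.category, s.city, SUM(f.sales_amount)
#   FROM fact_sales AS f
#   JOIN dim_product AS p ON p.product_key = f.product_key
#   JOIN dim_store   AS s ON s.store_key   = f.store_key
#   GROUP BY p.category, s.city;

dim_product = {1: "Bikes", 2: "Helmets"}            # product_key -> category
dim_store = {10: "Philadelphia", 20: "Pittsburgh"}  # store_key -> city

# (product_key, store_key, sales_amount); product_key 3 has no dimension row.
fact_sales = [
    (1, 10, 500.0),
    (2, 10, 40.0),
    (1, 20, 750.0),
    (2, 20, 60.0),
    (1, 10, 250.0),
    (3, 10, 99.0),
]


def star_join_sum(fact, products, stores):
    totals = defaultdict(float)
    for product_key, store_key, amount in fact:
        # Inner-join semantics: skip fact rows without a matching dimension row.
        if product_key not in products or store_key not in stores:
            continue
        totals[(products[product_key], stores[store_key])] += amount
    return dict(totals)


totals = star_join_sum(fact_sales, dim_product, dim_store)
print(totals[("Bikes", "Philadelphia")])  # 750.0
```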

Columnstore 2014

Columnstore in 2014
• Fewer data type limitations
• Updateable
• Can be a clustered index
• New archival compression mode
• Batch mode improvements

Columnstore Updates (2014)

• Updates to the index are collected in a delta store until the rowgroup is full (1,048,576 rows)
• The tuple mover then moves them into the compressed columnstore index
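The flow on this slide (rows collect in a delta store, then the tuple mover moves them into the columnstore) can be modeled as below. This is a hypothetical Python sketch: the tiny `rowgroup_size` of 3 is only for illustration, since real rowgroups are far larger (SQL Server allows up to 1,048,576 rows per rowgroup), and “compression” is reduced to transposing rows into per-column lists.

```python
class ColumnstoreTable:
    """Toy model of trickle inserts: an open delta store plus compressed rowgroups."""

    def __init__(self, rowgroup_size: int) -> None:
        self.rowgroup_size = rowgroup_size
        self.delta_store: list[tuple] = []          # open, row-oriented rowgroup
        self.closed_deltas: list[list[tuple]] = []  # full rowgroups awaiting the tuple mover
        self.compressed: list[list[list]] = []      # column-oriented rowgroups

    def insert(self, row: tuple) -> None:
        self.delta_store.append(row)
        if len(self.delta_store) == self.rowgroup_size:
            # A full delta rowgroup is closed and a new one is opened.
            self.closed_deltas.append(self.delta_store)
            self.delta_store = []

    def tuple_mover(self) -> None:
        # Background step: turn each closed rowgroup into one list per column.
        while self.closed_deltas:
            rowgroup = self.closed_deltas.pop(0)
            self.compressed.append([list(col) for col in zip(*rowgroup)])

    def column_values(self, index: int):
        # Queries must read compressed rowgroups and rows still in delta storage.
        for rowgroup in self.compressed:
            yield from rowgroup[index]
        for rowgroup in self.closed_deltas + [self.delta_store]:
            for row in rowgroup:
                yield row[index]


table = ColumnstoreTable(rowgroup_size=3)
for i in range(7):
    table.insert((i, i * 10))
table.tuple_mover()
print(len(table.compressed), len(table.delta_store))  # 2 1
print(sum(table.column_values(1)))                     # 210
```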

Columnstore Updates (2014)
• Bulk inserts go through a special API
• Updates are processed as a delete plus an insert, so they are expensive operations
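Why updates are expensive can be shown with a small hypothetical model: compressed rows are treated as immutable, so an update marks the old row in a delete bitmap (modeled here as a Python `set`) and inserts the new version into the delta store. Every read then has to filter out deleted rows and union in the delta store.

```python
class CompressedRowgroup:
    """Toy compressed rowgroup: rows are immutable, deletes live in a separate bitmap."""

    def __init__(self, rows: list[tuple]) -> None:
        self.rows = tuple(rows)          # stand-in for compressed, read-only storage
        self.deleted: set[int] = set()   # delete bitmap: positions logically removed


def update(rowgroup: CompressedRowgroup, delta_store: list[tuple],
           position: int, new_row: tuple) -> None:
    # No in-place change is possible, so an update is two operations:
    rowgroup.deleted.add(position)   # 1) mark the old version as deleted
    delta_store.append(new_row)      # 2) insert the new version into the delta store


def visible_rows(rowgroup: CompressedRowgroup, delta_store: list[tuple]) -> list[tuple]:
    # Reads skip deleted positions and add whatever is in the delta store.
    live = [r for i, r in enumerate(rowgroup.rows) if i not in rowgroup.deleted]
    return live + list(delta_store)


rg = CompressedRowgroup([("A", 1), ("B", 2), ("C", 3)])
delta: list[tuple] = []
update(rg, delta, 1, ("B", 20))
print(visible_rows(rg, delta))  # [('A', 1), ('C', 3), ('B', 20)]
```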

Columnstore 2014 Demo

What Do We Do Differently in 2014?
• Best practices are mostly the same
• Batch mode gets enhanced and supports more query types
• No need to worry about dropping and rebuilding indexes: just append data
• Still focus on large tables where data is not frequently updated
• Archival compression is good for old, rarely used data

Questions

Contact
jdanton1@yahoo.com
Joedantoni.wordpress.com
@jdanton

Speaker Notes

Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order. The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
Generally, nonclustered indexes are created to improve the performance of frequently used queries not covered by the clustered index, or to locate rows in a table without a clustered index (called a heap). You can create multiple nonclustered indexes on a table or indexed view.
The columnstore index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.
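“Moved in and out of memory on demand” can be pictured as a bounded segment cache. In the Python sketch below, least-recently-used eviction is an assumption made for illustration (the note does not say which policy SQL Server uses), and `load_segment` is a hypothetical callback standing in for a disk read.

```python
from collections import OrderedDict
from typing import Callable


class SegmentCache:
    """Holds at most `capacity` column segments in memory, loading others on demand.

    Least-recently-used eviction is an assumption of this sketch; the note only
    says portions of columns move in and out of memory as needed.
    """

    def __init__(self, capacity: int, load_segment: Callable[[str], list]) -> None:
        self.capacity = capacity
        self.load_segment = load_segment  # hypothetical "read segment from disk" callback
        self.cache: "OrderedDict[str, list]" = OrderedDict()
        self.disk_reads = 0

    def get(self, segment_id: str) -> list:
        if segment_id in self.cache:
            self.cache.move_to_end(segment_id)  # mark as most recently used
            return self.cache[segment_id]
        self.disk_reads += 1
        data = self.load_segment(segment_id)
        self.cache[segment_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used segment
        return data


cache = SegmentCache(capacity=2, load_segment=lambda sid: [sid] * 3)
for sid in ["qty:0", "qty:1", "qty:0", "amount:0", "qty:1"]:
    cache.get(sid)
print(cache.disk_reads)  # 4
```

The index as a whole never has to fit in memory; only the segments a query is currently touching do.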
What data types cannot be used in a columnstore index? The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18, datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sql_variant, xml. The SQL Server 2012 implementation did not support a number of data types such as numeric beyond precision 18, datetimeoffset beyond precision 2, GUID, and binary columns. The upcoming version adds support for all the above data types. It also introduces support for storing short strings by value instead of converting all strings to a 32-bit id within a dictionary. This removes the extra overhead associated with the dictionary and helps improve columnstore compression even further.
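To illustrate the dictionary approach the note describes, here is a minimal Python sketch: each distinct string in a column gets a small integer id and the column stores only ids. Python integers are not fixed at 32 bits, so this shows the mapping rather than the storage width. Storing short strings by value, as the note says the newer version allows, would simply skip this step and its dictionary overhead.

```python
def dictionary_encode(values: list[str]) -> tuple[dict[str, int], list[int]]:
    """Map each distinct string to a small integer id; the column keeps only ids."""
    dictionary: dict[str, int] = {}
    ids: list[int] = []
    for v in values:
        # setdefault assigns the next free id the first time a value appears.
        ids.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, ids


def dictionary_decode(dictionary: dict[str, int], ids: list[int]) -> list[str]:
    by_id = {i: v for v, i in dictionary.items()}
    return [by_id[i] for i in ids]


cities = ["Philadelphia", "Pittsburgh", "Philadelphia", "Philadelphia", "Erie"]
dictionary, ids = dictionary_encode(cities)
print(dictionary)  # {'Philadelphia': 0, 'Pittsburgh': 1, 'Erie': 2}
print(ids)         # [0, 1, 0, 0, 2]
```

Low-cardinality columns like this one compress well because the repeated ids are small and uniform; a column of mostly unique short strings gains little, since the dictionary ends up as large as the data.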
Include every column of the table in the columnstore index. If you don’t, then a query that references a column not included in the index will not benefit from the columnstore index much or at all.

Structure your queries as star joins with grouping and aggregation as much as possible. Avoid joining pairs of large tables. Join a single large fact table to one or more smaller dimensions using standard inner joins. Use a dimensional modeling approach for your data as much as possible to allow you to structure your queries this way.

Use best practices for statistics management and query design. This is independent of columnstore technology. Use good statistics and avoid query design pitfalls to get the best performance. See the white paper on SQL Server statistics for guidance. In particular, see the section “Best Practices for Managing Statistics.”
