4. Indexes
• Data structure that speeds up
data retrieval by maintaining
an extra copy of data
• Can be filtered
• Can be function based, or
ordered
• Penalty is that writes become
more expensive
• More storage required
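The trade-off on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table, the column names, and the helper functions (scan, build_index, insert, lookup) are hypothetical and are not how SQL Server implements indexes.

```python
from collections import defaultdict

# Hypothetical table: a list of rows. All names here are illustrative only.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10},
    {"id": 2, "city": "Lima", "amount": 25},
    {"id": 3, "city": "Oslo", "amount": 40},
]

def scan(table, city):
    """Lookup without an index: every row has to be examined."""
    return [r for r in table if r["city"] == city]

def build_index(table, column):
    """Index: an extra structure mapping each key value to row positions."""
    index = defaultdict(list)
    for pos, row in enumerate(table):
        index[row[column]].append(pos)
    return index

def insert(table, index, column, row):
    """A write now has to maintain both the table and the index."""
    table.append(row)
    index[row[column]].append(len(table) - 1)

def lookup(table, index, key):
    """Lookup with the index: jump straight to the matching positions."""
    return [table[pos] for pos in index.get(key, [])]

city_index = build_index(rows, "city")
insert(rows, city_index, "city", {"id": 4, "city": "Lima", "amount": 5})
```

Here lookup(rows, city_index, "Lima") returns the same rows as scan(rows, "Lima") without touching the other rows, while insert does extra work and city_index takes extra space, which are the two penalties listed above.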
5. Indexes in SQL Server
• Clustered vs Nonclustered
• Non-clustered index is “just an index”
6. Clustered Index
• Data is kept in key order as it is
inserted into pages
• Data in clustered index is only
stored on disk once (it’s the data
from the tables)
• Table without a clustered index is
called a heap—no order at all
7. Non-Clustered Index
• Duplicate copy of the indexed
columns of the table
• Provides a pointer from the index
to the table data
• Index entries are ordered by the
index key; table order is unaffected
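A minimal Python sketch of the difference between the two index types, assuming a toy table of (id, name) rows. The names clustered, by_name, clustered_insert and seek_by_name are hypothetical, and sorted lists with bisect stand in for the B-tree structures a real engine would use.

```python
import bisect

# Clustered: the rows themselves are kept in key order (here, by id).
clustered = []  # list of (id, name) tuples, always sorted by id

def clustered_insert(row):
    """Insert a row at the position its key dictates."""
    bisect.insort(clustered, row)

for row in [(30, "carol"), (10, "alice"), (20, "bob")]:
    clustered_insert(row)

# Nonclustered: a separate, smaller structure of (name, id) entries,
# sorted by name. Each entry points back to the full row via its id.
by_name = sorted((name, id_) for id_, name in clustered)

def seek_by_name(name):
    """Find a row by name: seek in the index, then follow the pointer."""
    i = bisect.bisect_left(by_name, (name,))
    if i < len(by_name) and by_name[i][0] == name:
        key = by_name[i][1]
        j = bisect.bisect_left(clustered, (key,))
        return clustered[j]
    return None
```

The clustered list holds the data exactly once, in id order. The by_name list is the duplicate copy: it repeats the name and id values and needs a second step to reach the full row.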
9. Data Warehouse Queries
• Data Warehouses have a lot of
data
• Querying lots of data can
take a really long time
• Processing data row by row
may not be the most efficient
way to perform aggregations
10. Traditional Approaches To Improving
Performance
• Partitioned Tables
• Indexed Views
• Data Compression
11. Introducing Columnstore Indexes (SQL
2012)
• Data is stored in columns, as
opposed to rows
• This allows a much higher rate of
compression
• Columns not used in a query are
simply not scanned or returned
• Recommended practice is to add
most columns in a table to the index
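To see why column-oriented storage compresses well and lets a query skip unused columns, here is a small Python sketch. The sample rows and the helpers to_columns and run_length_encode are hypothetical, and run-length encoding is used only as a simple stand-in; it is not presented as the compression scheme SQL Server actually applies.

```python
from itertools import groupby

# Hypothetical fact rows: (date, region, amount)
rows = [
    ("2012-01-01", "EU", 10),
    ("2012-01-01", "EU", 20),
    ("2012-01-01", "US", 30),
    ("2012-01-02", "US", 40),
]

def to_columns(rows, names):
    """Pivot row-oriented data into one list per column."""
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}

def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

columns = to_columns(rows, ["date", "region", "amount"])

# Values within one column tend to repeat, so they encode compactly.
encoded_region = run_length_encode(columns["region"])

# An aggregate over amount reads only that column; date and region
# are never touched.
total = sum(columns["amount"])
```

In the row layout each value sits next to unrelated values from other columns, so runs like the ones in encoded_region do not appear and every query has to read whole rows.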
14. So How Is It So Much Faster?
• Very good compression ratio for Column oriented
data
• Better use of Memory
• Segment Elimination Skips Large Chunks of Data
• Batch Mode
• Processes data in “batches” of roughly
1,000 rows rather than row by row
• 7-40x CPU savings with batch mode
“The key to getting the best
performance is to make sure
your queries process the
large majority of data in
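Segment elimination and chunk-at-a-time processing can be illustrated with a short Python sketch. It assumes each segment keeps the minimum and maximum of its values as metadata; the function names make_segments and sum_in_range, the segment size, and the data are all hypothetical, and the real batch-mode operators are far more involved than a Python loop.

```python
def make_segments(values, segment_size):
    """Split a column into segments and record min/max for each one."""
    segments = []
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        segments.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return segments

def sum_in_range(segments, low, high):
    """Sum values in [low, high], skipping segments that cannot match."""
    total, scanned = 0, 0
    for seg in segments:
        if seg["max"] < low or seg["min"] > high:
            continue  # eliminated using metadata only, values never read
        scanned += 1
        # The surviving segment is processed as one chunk of values.
        total += sum(v for v in seg["values"] if low <= v <= high)
    return total, scanned

# Arbitrary illustrative data: 10,000 ascending values in 10 segments.
segments = make_segments(list(range(10_000)), segment_size=1_000)
total, scanned = sum_in_range(segments, 2_500, 3_499)
```

With this data only 2 of the 10 segments overlap the requested range, so the other 8 are skipped without reading their values. Elimination works best when the data is loaded in an order that keeps each segment's min/max range narrow.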
15. Columnstore All The Things?
• Awesome performance—so what’s
the negative?
• Can’t update/insert in 2012
• Can only be nonclustered index—
so we are storing more data on
disk
• Data types are somewhat limited
• One index per table
• Can’t be a sorted index
16. So Where To Use Columnstore
Indexes?
• Only on Large Tables—Fact
tables and Dimension Tables >
3 Million Rows
• Include Every Column
• Structure Queries as star joins
with grouping and aggregation
22. What Do We Do Differently in 2014
• Best Practices are mostly the
same
• Batch mode gets enhanced and
gains more query types
• No need to worry about dropping
and rebuilding indexes—just
append data
• Still focus on large tables where
data is not frequently updated
• Archival compression is good for
old, unused data
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order. The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
Generally, nonclustered indexes are created to improve the performance of frequently used queries not covered by the clustered index or to locate rows in a table without a clustered index (called a heap). You can create multiple nonclustered indexes on a table or indexed view.
The columnstore index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.
What data types cannot be used in a columnstore index?
The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18, datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sql_variant, xml.
The SQL Server 2012 implementation did not support a number of data types such as numeric beyond precision 18, datetimeoffset beyond precision 2, GUID and binary columns. The upcoming version adds support for all the above data types. It also introduces support for storing short strings by value instead of converting all strings to a 32 bit id within a dictionary. This removes the extra overhead associated with the dictionary and helps improve the column store compression even further.
Include every column of the table in the columnstore index. If you don't, then a query that references a column not included in the index will not benefit from the columnstore index much or at all.
Structure your queries as star joins with grouping and aggregation as much as possible. Avoid joining pairs of large tables. Join a single large fact table to one or more smaller dimensions using standard inner joins. Use a dimensional modeling approach for your data as much as possible to allow you to structure your queries this way.
Use best practices for statistics management and query design. This is independent of columnstore technology. Use good statistics and avoid query design pitfalls to get the best performance. See the white paper on SQL Server statistics for guidance. In particular, see the section "Best Practices for Managing Statistics."