4. Microstrategy - BI
• DWH
• BI
• SQL Server - Sybase
• DB P T
• DB OEM
• Web, Silverlight, C
• .NET
6. White Papers
• Analysis Services 2008 R2 Performance Guide
• Analysis Services 2008 Operation Guide
• Performance Improvements for MDX in SQL Server 2008 Analysis Services
• OLAP Design Best Practices
7. 1 or 2 dimensions
a) One Dimension
[Diagram: Fact Table -> Dim Account, which also holds the customer attributes]
• Simplicity, 1 dim
• Hierarchy from customer attribute & account attribute
• Use when we don't have fact tables requiring customer grain.
b) Two Dimensions
[Diagram: Fact Table -> Dim Account and Fact Table -> Dim Customer]
• We can get the customer attributes without knowing the account key
• Disadvantage: can't go from account to customer without going through the fact table (performance)
8. 1 or 2 dimensions
c) Snowflake
[Diagram: Fact Table -> Dim Account -> Dim Customer]
• Dim customer is needed by another fact table
• Modular: 2 separate dim tables, but we can combine them easily to create a bigger dimension
• Getting the breakdown of a measure by a customer attribute is a bit more complicated than in a):
select c.attribute, sum(f.measure1) from fact1 f
inner join dim_account a on f.account_key = a.account_key
inner join dim_customer c on a.customer_key = c.customer_key
group by c.attribute
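The snowflake query above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, and the sample rows and attribute values ('Retail', 'Corporate') are made up for illustration; only the table and column names come from the slide.

```python
import sqlite3

# Hypothetical in-memory schema mirroring the slide: fact1 -> dim_account -> dim_customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
CREATE TABLE dim_account  (account_key  INTEGER PRIMARY KEY, customer_key INTEGER);
CREATE TABLE fact1        (account_key  INTEGER, measure1 REAL);
INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
INSERT INTO dim_account  VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO fact1        VALUES (10, 100.0), (11, 50.0), (12, 70.0), (12, 30.0);
""")

# The fact table only knows the account key, so the customer attribute
# is reached through dim_account (two joins instead of one).
snowflake_rows = conn.execute("""
SELECT c.attribute, SUM(f.measure1)
FROM fact1 f
INNER JOIN dim_account  a ON f.account_key  = a.account_key
INNER JOIN dim_customer c ON a.customer_key = c.customer_key
GROUP BY c.attribute
ORDER BY c.attribute
""").fetchall()

print(snowflake_rows)
```

With option a) the same breakdown would need only the join to the account dimension, which is the trade-off the slide points at.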
9. When to Snowflake
1. When the sub dim is used by several dims
City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured.
They are replaced by Location/GeoKey pointing to DimLocation / DimGeography.
Advantage: consistent hierarchy, i.e. the relationship between City, Country & Region.
Weakness: we lose flexibility. The City-to-Country mapping is more or less fixed, but the grouping of countries might differ between dimensions.
10. When to Snowflake
2. When the sub dim is used by both the main dim and
the fact table(s)
• DimCustomer is used in DimAccount,
and is also used in the fact table.
• DimManufacturer is used in DimProduct,
and is also used in the fact table.
• DimProductGroup is used in DimProduct,
and is also used in some fact table.
The alternative is maintaining two full dimensions (classic star).
11. When to Snowflake
3. To make “base dim” and “detail dim”
Insurance classes, account types
(banking), product lines, diagnosis,
treatment (health care)
Policies for marine, aviation & property classes have different
attributes.
Pull common attributes into 1 dim: DimBasePolicy
Put class-specific attributes into DimMarine, DimProperty, DimAviation
Ref: Kimball DW Toolkit 2nd edition page 213
12. A dimension with only 1 attribute
Should we put the attribute in the fact table?
(like DD = Degenerate Dim)
Probably, if the grain = fact table, and it's short or it's a number.
Reasons for putting single attribute in its own dim:
– Keep fact table slim (4 bytes int not 100 bytes varchar)
– When the value changes, we don't have to update the BIG fact table (ETL performance)
– Grain is much lower than fact table: small dim
– Yes, it's only 1 attribute today, but in the future there could be another attribute.
13. Fact Table Primary Key
Should we have a PK? Some experts totally disagree
Yes, if we need to be able to identify each fact row
1. Need to refer to a fact row from another fact row e.g. chain of events
2. Many identical fact rows and we need to update/delete only one
3. To link the fact table to another fact table
[Diagrams: Related Trans (PK, plus an FK to the previous/next transaction, not enforced); Header - Detail (PK, FK with no RI); Uniqueness (PK)]
14. Fact Table Primary Key
Single or Multi Column?
Single Column: Generated Identity
Multi Column: Dimension Keys
A single-column PK is better than a multi-column PK because:
1) A multi-column PK may not be unique. A single-column PK is guaranteed to be unique, because it is an identity column.
2) A single-column PK is slimmer than a multi-column PK, which gives better query performance. To do a self join in the fact table (e.g. to link the current fact row to the previous fact row), we join on a single integer column.
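A minimal sketch of the self join mentioned in point 2, using Python's sqlite3 module. The table fact_event and its columns (fact_id, prev_fact_id, amount) are hypothetical names chosen for illustration; AUTOINCREMENT stands in for a SQL Server identity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- fact_id plays the role of the generated identity PK;
-- prev_fact_id points at the previous event in the chain (not enforced as an FK).
CREATE TABLE fact_event (
    fact_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    prev_fact_id INTEGER,
    account_key  INTEGER,
    amount       REAL
);
INSERT INTO fact_event (prev_fact_id, account_key, amount) VALUES (NULL, 10, 100.0);
INSERT INTO fact_event (prev_fact_id, account_key, amount) VALUES (1,    10, 120.0);
INSERT INTO fact_event (prev_fact_id, account_key, amount) VALUES (2,    10,  90.0);
""")

# Link each fact row to its previous row by joining on one integer column.
chain_rows = conn.execute("""
SELECT cur.fact_id, prev.fact_id, cur.amount - prev.amount
FROM fact_event cur
INNER JOIN fact_event prev ON cur.prev_fact_id = prev.fact_id
ORDER BY cur.fact_id
""").fetchall()

print(chain_rows)
```

With a multi-column PK, the same link would need one join predicate (and one stored column) per dimension key.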
15. Fact Table Primary Key
• Advantage: Prevent duplicate rows, query performance
• Disadvantage: loading performance
• Indexing the PK: cluster or not?
– Cluster the PK if: the PK is an identity column
– Don't cluster the PK if: the PK is a composite, or when you need the clustered index for query performance (with partitioning)
Example of not having a PK
If duplicate fact rows are allowed,
e.g. retail DW: Store Key, Date Key, Product Key, Customer Key.
The same customer buys the same milk in the same shop on the same day, twice.
16. Aggregate Fact Tables
What are they?
• High-level aggregations of base fact tables
• A “select group by” query on a 2-billion-row fact table can take 30 mins if it joins with two big fact tables, even with indexes in place
• So we run this query in advance as part of the DW load and store the result as an Aggregate Fact Table
• The report then takes only 1 second to run.
[Diagram: Base Fact Tables -> (30 mins) -> Aggregate Fact Table -> (1 sec) -> Report]
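The idea of building the aggregate during the DW load can be sketched as follows, again with sqlite3. The table names fact_sales and agg_sales_by_store_day and the sample rows are assumptions for illustration, not names from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base fact table at store / day / product grain.
CREATE TABLE fact_sales (date_key INTEGER, store_key INTEGER, product_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES
    (20110301, 1, 100, 5.0),
    (20110301, 1, 101, 7.0),
    (20110301, 2, 100, 3.0),
    (20110302, 1, 100, 4.0);

-- Run once as part of the DW load: the expensive GROUP BY is paid here.
CREATE TABLE agg_sales_by_store_day AS
SELECT date_key, store_key, SUM(amount) AS amount, COUNT(*) AS row_count
FROM fact_sales
GROUP BY date_key, store_key;
""")

# The report reads the small pre-aggregated table instead of the base fact table.
report_rows = conn.execute("""
SELECT date_key, store_key, amount
FROM agg_sales_by_store_day
ORDER BY date_key, store_key
""").fetchall()

print(report_rows)
```

The aggregate has to be rebuilt or incrementally maintained on every load, which is the cost paid for the fast report.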
17. Rapidly Changing Dimension
• Why is it a problem?
– Large SCD2 dim: attributes change every day
– Slow queries when joined with large fact tables
• What to do
– Put the rapidly changing attributes into a separate dim, linked directly to the fact table.
– Just store the latest values as type 1 attributes (or dual)
– Store in the fact table (for a small attribute, e.g. an indicator)
18. Very Large Dimension
Why is it a problem?
– SSAS: 4 GB string store limit for a dimension
– SSAS: dim processing is a “select distinct” on each attribute
– Long processing time
– Difficult to browse a high-cardinality attribute
– Join with fact tables: performance
19. Very Large Dimension
What to do
– Split into 2 dims, same grain. Always cut vertically.
– Remove SCD2, or at least only certain columns.
– Most common: separate the attributes with high cardinality/change
frequency
20. Real Time Fact Table
• Reporting the transaction system in real time
• View to union with the normal fact table, or use partitions
• Freezing the dims for key lookup, -3 unknown key
• Key corrections next day
[Diagram: Main partition (up to last night) + Real time partition (intraday today); key lookup uses the frozen dims (dims as of yesterday)]
Unknown keys:
-1 null in source
-2 not in dim table
-3 not in dim table as dim was frozen, to be resolved next batch
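The unknown-key convention above can be sketched as a small lookup function. This is only an illustration: the function name lookup_key, the dictionary-based dimension and the business IDs are hypothetical, while the -1 / -2 / -3 meanings come from the slide.

```python
# Unknown-key codes as listed on the slide.
UNKNOWN_NULL_IN_SOURCE = -1   # business key is null in the source
UNKNOWN_NOT_IN_DIM = -2       # business key not found in the dim table
UNKNOWN_DIM_FROZEN = -3       # not found because the dim was frozen; resolve next batch


def lookup_key(business_id, dim_lookup, dim_is_frozen):
    """Return the surrogate key for business_id, or an unknown-key code."""
    if business_id is None:
        return UNKNOWN_NULL_IN_SOURCE
    if business_id in dim_lookup:
        return dim_lookup[business_id]
    return UNKNOWN_DIM_FROZEN if dim_is_frozen else UNKNOWN_NOT_IN_DIM


# Hypothetical customer dim as of last night (frozen during the day).
frozen_customer_dim = {"C001": 1, "C002": 2}
intraday_business_ids = ["C001", None, "C999"]

# Real time load: the new customer C999 is not in the frozen dim yet.
realtime_keys = [lookup_key(b, frozen_customer_dim, dim_is_frozen=True)
                 for b in intraday_business_ids]

# Next day's batch: the dim has been refreshed, so the -3 key is corrected.
refreshed_customer_dim = {**frozen_customer_dim, "C999": 3}
corrected_keys = [lookup_key(b, refreshed_customer_dim, dim_is_frozen=False)
                  for b in intraday_business_ids]

print(realtime_keys, corrected_keys)
```

Keeping -3 separate from -2 lets the next batch find exactly the rows whose keys still need correcting.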
21. Dealing with Currency Rates
What for/background/requirements
– Report in 3 reporting currencies, using today's rates or past rates
– Analyse over time without the impact of currency rates (using fixed currency rates, e.g. 2010 EOY rates)
– What the figures would be had the transactions happened today
– Historical analysis of currency rates
[Diagram: Transaction Currency -> (Transaction Rates, many transaction dates) -> DW Currency -> (Reporting Rates, 1 reporting date) -> Reporting Currency]
Transaction Currency: 100 countries, 40 currencies
DW Currency: 1 currency, e.g. GBP
Reporting Currency: 3-4 currencies: GBP, USD, EUR, Original
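The two-step conversion (transaction currency to the single DW currency, then DW currency to each reporting currency) can be sketched like this. All rates, dates and function names below are invented for illustration; the sketch assumes transaction rates are quoted as GBP per unit of transaction currency and reporting rates as reporting-currency units per 1 GBP.

```python
from decimal import Decimal

# Hypothetical transaction rates: GBP per 1 unit of transaction currency,
# one rate per (currency, transaction date).
transaction_rates = {
    ("USD", "2011-03-01"): Decimal("0.60"),
    ("EUR", "2011-03-01"): Decimal("0.85"),
    ("GBP", "2011-03-01"): Decimal("1"),
}

# Hypothetical reporting rates: reporting-currency units per 1 GBP,
# all taken at a single reporting date (today, or a fixed date such as EOY).
reporting_rates = {"GBP": Decimal("1"), "USD": Decimal("1.60"), "EUR": Decimal("1.20")}


def to_dw_currency(amount, currency, txn_date):
    """Step 1: convert the original amount to the DW currency (GBP here)."""
    return amount * transaction_rates[(currency, txn_date)]


def to_reporting_currencies(dw_amount, original_amount):
    """Step 2: convert the DW amount to every reporting currency."""
    result = {code: dw_amount * rate for code, rate in reporting_rates.items()}
    result["Original"] = original_amount  # keep the untouched transaction amount
    return result


original = Decimal("200")                                   # 200 USD
dw_amount = to_dw_currency(original, "USD", "2011-03-01")   # stored in the DW in GBP
report = to_reporting_currencies(dw_amount, original)
print(dw_amount, report)
```

Swapping reporting_rates for a fixed set (e.g. 2010 EOY rates) gives the "without the impact of currency rates" view without touching the stored DW amounts.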
23. Dealing with Status
What/background
– Workflow (policies, contracts, documents)
– Bottleneck analysis (number of days between stages)
– How many on each stage
[Diagram: Status 1 (date1) -> Status 2 (date2) -> Status 4 (date3) -> Status 6 (date4), with Status 3 and Status 5 shown below the main flow]
24. Dealing with Status
Approaches
– Accumulating Snapshot Fact, 1 row per application
– SCD2 on DimApp
– App Status fact table

SCD2 on DimApp:
AppKey  AppID  StsKey  StsDate  Current
1       1      1       1/3/11   N
2       1      2       3/3/11   N
3       1      3       7/3/11   Y
4       2      1       6/3/11   N
5       2      2       7/3/11   Y

App Status fact table:
AppKey  StsKey  StsDateKey
1       1       61
1       2       63
1       3       67
2       1       66
2       2       67

Accumulating Snapshot Fact:
AppKey  Sts1Date  Sts1Ind  Sts2Date  Sts2Ind  Sts3Date  Sts3Ind
1       1/3/11    1        3/3/11    1        7/3/11    1
2       6/3/11    1        7/3/11    1                  0
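As a rough sketch, the one-row-per-application snapshot can be derived from the status rows, and the bottleneck analysis (days between stages) then becomes a subtraction within one row. The function names are hypothetical, and the dates are read as day/month/year (1/3/11 = 1 March 2011), which is an assumption about the slide's format.

```python
from datetime import date

# (AppID, StsKey, StsDate) rows, following the example data on the slide.
status_rows = [
    (1, 1, date(2011, 3, 1)),
    (1, 2, date(2011, 3, 3)),
    (1, 3, date(2011, 3, 7)),
    (2, 1, date(2011, 3, 6)),
    (2, 2, date(2011, 3, 7)),
]


def build_snapshot(rows, statuses=(1, 2, 3)):
    """Pivot status rows into one row per application: {app_id: {status: date or None}}."""
    snapshot = {}
    for app_id, sts_key, sts_date in rows:
        row = snapshot.setdefault(app_id, {s: None for s in statuses})
        row[sts_key] = sts_date
    return snapshot


def days_between(snapshot_row, from_status, to_status):
    """Days spent between two stages, or None if either stage has not been reached."""
    start, end = snapshot_row[from_status], snapshot_row[to_status]
    if start is None or end is None:
        return None
    return (end - start).days


snapshot = build_snapshot(status_rows)
print(days_between(snapshot[1], 1, 2), days_between(snapshot[1], 2, 3))
print(days_between(snapshot[2], 2, 3))  # application 2 has not reached status 3
```

In the snapshot table the `None` entries correspond to the blank date with indicator 0.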
25. Referenced Dimensions
• Enables using one “master” member
• Not a Snowflake dimension
– For example:
• Dim Customers: UK, London, Roman Avramovich.
• Dim Stores: UK, London, Friendly Bikes Store
– What is the total revenue from Internet customers and stores in London?
26. MDX optimization Methodology
• Re-write the MDX code
• Add Aggregations
• Add pre-calculated Measure Groups (ETL)
• Solve the problem using Relational Engine
• Use .NET Stored Procedures.
– The problem can rarely be solved by using better hardware.
• Column-based databases
27. Optimizing MDX
– Baselining Query Speeds
• Clearing the Analysis Services Caches
• Clearing the Operating System Caches using fsutil.exe or SSAS Stored Proc (codeplex)
• Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
• Configuring the Analysis Services Query Log
28. Cell-by-Cell Mode vs. Subspace Mode
Performance obtained by using subspace (or block computation) mode is almost always superior to that obtained by using cell-by-cell (or naïve) mode.
32. Granularity
• Single grain
– List of GROUP BY attributes in SQL SELECT
• Mixed grain
– Both Attribute.[All] and Attribute.MEMBERS
33. Granularity
[Diagram: granularity grid over Country (All Countries / Countries), City (All City / Cities) and Products (All Products / Products)]
34. Slice
• Single member
– SQL: Where City = 'Redmond'
– MDX: [City].[Redmond]
• Multiple members
– SQL: Where City IN ('Redmond', 'Seattle')
– MDX: { [City].[Redmond], [City].[Seattle] }
35. Slice at granularity
SQL
SELECT Sum(Sales), City FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City
MDX
SELECT Measures.Sales ON 0
, NON EMPTY {Redmond, Seattle} ON 1
FROM Sales_Cube
36. Slice below granularity
SQL
SELECT Sum(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
MDX
SELECT Measures.Sales ON 0
FROM Sales_Cube
WHERE {Redmond, Seattle}
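The difference between the two SQL forms above (slice at granularity vs. slice below granularity) can be checked with a small sqlite3 sketch. The sample rows are invented; the table and column names follow the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Table (City TEXT, Sales REAL);
INSERT INTO Sales_Table VALUES
    ('Redmond', 10.0), ('Redmond', 5.0), ('Seattle', 20.0), ('London', 40.0);
""")

# Slice at granularity: City is both filtered and grouped, so one row per city.
at_granularity = conn.execute("""
SELECT City, SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
GROUP BY City
ORDER BY City
""").fetchall()

# Slice below granularity: City is only filtered, so the result is a single
# total that aggregates over the sliced members.
below_granularity = conn.execute("""
SELECT SUM(Sales) FROM Sales_Table
WHERE City IN ('Redmond', 'Seattle')
""").fetchone()[0]

print(at_granularity, below_granularity)
```

In MDX terms, the first result corresponds to putting the set on an axis and the second to putting it in the WHERE clause.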
37. Examples
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
New York
London
London
38. Examples
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
New York
London
London
(Seattle, Year.Year.MEMBERS)
39. Examples
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
New York
London
London
(Seattle, Year.MEMBERS)
40. Examples
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
New York
London
London
({Redmond, Seattle, London}, Year.MEMBERS)
41. Examples
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
New York
London
London
({Redmond, Seattle}, {2005, 2006, 2007})
43. Arbitrary shaped subcubes
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
New York
London
Union((Redmond, Year.Year.MEMBERS), (City.City.MEMBERS, 2005))
44. Arbitrary shaped subcubes
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
SF
Denver
CrossJoin(City.City.MEMBERS, Year.Year.MEMBERS) - (Seattle, 2007)
45. Arbitrary shaped subcubes
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
New York
London
{(Redmond, 2005), (Seattle, 2006), (New York, 2007), (London, 2008)}
46. Arbitrary shaped subcubes
All Years 2005 2006 2007 2008
All Cities
Redmond
Seattle
New York
London
Union(([All Cities], Year.MEMBERS), (City.MEMBERS, [All Years]))
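One way to see what makes a subcube "arbitrary shaped" is to model it as a set of (city, year) cells and test whether it equals the cross product of its own city and year members. The sketch below is a plain-Python illustration, not how Analysis Services represents subcubes; it looks only at leaf-level cells and ignores the All members, and the helper names are hypothetical.

```python
from itertools import product

cities = ["Redmond", "Seattle", "New York", "London"]
years = [2005, 2006, 2007, 2008]


def crossjoin(a, b):
    """All (city, year) cells formed from two member lists."""
    return set(product(a, b))


def is_regular(subcube):
    """True when the cells are exactly the cross product of their own projections."""
    cells = set(subcube)
    city_side = {c for c, _ in cells}
    year_side = {y for _, y in cells}
    return cells == crossjoin(city_side, year_side)


# ({Redmond, Seattle}, {2005, 2006, 2007}): a regular block.
regular = crossjoin(["Redmond", "Seattle"], [2005, 2006, 2007])

# Union of one row and one column: an L-like shape.
union_shape = crossjoin(["Redmond"], years) | crossjoin(cities, [2005])

# Full cross join minus one cell: a block with a hole.
hole_shape = crossjoin(cities, years) - {("Seattle", 2007)}

# One cell per city on the diagonal.
diagonal = {("Redmond", 2005), ("Seattle", 2006), ("New York", 2007), ("London", 2008)}

for name, shape in [("regular", regular), ("union", union_shape),
                    ("hole", hole_shape), ("diagonal", diagonal)]:
    print(name, len(shape), is_regular(shape))
```

Only the first shape can be described by one list of members per attribute; the other three cannot.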
52. Leaves vs. Non Leaves
[Diagram: granularity grid over Country (All Countries / Countries), City (All City / Cities) and Products (All Products / Products), with the leaf granularity marked “Leaves”]
53. Problems with arbitrary shapes
• Caching
• Partition slices
• Indexes
• SCOPEs
• Matching calculations
• Many more
(for every topic we discuss – just ask “What will happen with arbitrary shapes”, and I am in trouble)
59. SSAS Denali
• Coming in the first half of 2012
• SSAS Tabular Mode
– Cheaper
– Not best of breed
– Uses DAX or MDX
• Have you started working with it?
60. Mobile BI
• Smart Phone
• Mobile BI
• Gartner
61. Social BI
• Discover New Insights - Analyze the demographic and psychographic profiles of your Facebook application users.
• Analyze Facebook Data - Analyze the full spectrum of Facebook data: profiles, interests, check-ins, and more.
• Instantly Available via Cloud