Table of Contents
SSIS Partitioning and Best Practices
Sliding window
Parallel Execution Using partition logic
SSIS Best Practices
Benefits of using SSIS Partitioning
Appendix

SSIS Partitioning and Best Practices

Date: 27/1/2014
Owner: Vinod Kumar Kodatham

OBJECT OVERVIEW
Technical Name: SSIS Partitioning and Best Practices
Description: Partitioning divides a large table and its indexes into smaller parts (partitions), so that maintenance operations can be applied on a partition-by-partition basis rather than on the entire table.

SSIS Partitioning and Best Practices
This document describes partitioning and the best practices to follow while developing SSIS ETL packages to improve their performance.
Types of Partitions

• Vertical partitioning - some columns are kept in one table and the remaining columns in another table.

• Horizontal partitioning - the table is split into ranges of rows.

Requirements for Table Partition

• Partition Function - logical - defines the boundary points on the range (RANGE RIGHT or RANGE LEFT)

Syntax: CREATE PARTITION FUNCTION [partfunc_TinyInt_MOD10](tinyint) AS
RANGE RIGHT FOR VALUES (0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09)
GO

Example: creating a RANGE LEFT partition function on an int column
CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (1, 100, 1000);
Creating a RANGE RIGHT partition function on an int column
CREATE PARTITION FUNCTION myRangePF2 (int) AS RANGE RIGHT FOR VALUES (1, 100, 1000);

• Partition Scheme - maps the partitions of a partition function to filegroups

Syntax: CREATE PARTITION SCHEME [partscheme_DATA_TinyInt_MOD10]
AS PARTITION [partfunc_TinyInt_MOD10] TO ([DATA], [DATA], [DATA],
[DATA], [DATA], [DATA], [DATA], [DATA], [DATA], [DATA])
GO
• Partitioned Key

A single column, or a computed column that is marked PERSISTED.
All data types that are valid for use as index columns can be used, except timestamp. LOB data types and CLR user-defined types cannot be used.
On a clustered table, the partitioning key must be part of either the primary key or the clustered index.
Ideally, queries should use the partitioning key as a filter.

Partitioning Usage in Table
Create the table with the PARTITION SCHEME:
CREATE TABLE [tmp].[Table_1](
.
.
) ON [partscheme_DATA_TinyInt_MOD10]([MOD10])
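To make the pattern concrete, here is a hedged sketch of a complete table created on the scheme above; the column names, and the MOD10 computed column in particular, are hypothetical:

```sql
-- Hypothetical sketch: a table partitioned on a persisted computed column.
CREATE TABLE [tmp].[Table_1](
    [Id]      bigint       NOT NULL,
    [Payload] varchar(100) NULL,
    -- Persisted computed column used as the partitioning key
    [MOD10]   AS CAST([Id] % 10 AS tinyint) PERSISTED NOT NULL,
    -- The partitioning key must be part of the clustered primary key
    CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED ([Id], [MOD10])
) ON [partscheme_DATA_TinyInt_MOD10]([MOD10]);
GO
```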

Sliding window
1. Create a non-partitioned archive table with the same structure, and a matching clustered index (if required). Place it on the same filegroup as the oldest partition.
2. Use SWITCH to move the oldest partition from the partitioned table to the archive table.
3. Remove the boundary value of the oldest partition using MERGE: get the smallest range value from sys.partition_range_values and MERGE it.
   Syntax: ALTER PARTITION FUNCTION pf_k_rows()
   MERGE RANGE (@merge_range)
4. Designate the NEXT USED filegroup.
5. Create a new boundary value for the new partition using SPLIT (the best practice is to split an empty partition at the leading end of the table into two empty partitions, to minimize data movement): get the largest range value from sys.partition_range_values and SPLIT the last range with a new value.
   Syntax: SELECT @split_range = @split_range + 1000
   ALTER PARTITION FUNCTION pf_k_rows()
   SPLIT RANGE (@split_range)
6. Create a staging table that has the same structure as the partitioned table on the target filegroup.
7. Populate the staging table.
8. Add indexes.
9. Add a check constraint that matches the constraint of the new partition.
10. Ensure that all indexes are aligned.
11. Switch the newest data into the partitioned table (the staging table is now empty).
12. Update statistics on the partitioned table.
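The steps above can be sketched in T-SQL as follows. This is a minimal sketch under assumed names: pf_k_rows is the partition function from the syntax above, while dbo.Facts, dbo.FactsArchive, dbo.FactsStaging, the ps_k_rows scheme and the [DATA] filegroup are hypothetical:

```sql
-- Minimal sliding-window sketch; table, scheme, and filegroup names are hypothetical.
DECLARE @merge_range int, @split_range int;

-- Step 2: switch the oldest partition (partition 1) out to the archive table.
ALTER TABLE dbo.Facts SWITCH PARTITION 1 TO dbo.FactsArchive;

-- Step 3: remove the oldest boundary value.
SELECT @merge_range = MIN(CAST(prv.value AS int))
FROM sys.partition_range_values prv
JOIN sys.partition_functions pf ON pf.function_id = prv.function_id
WHERE pf.name = 'pf_k_rows';
ALTER PARTITION FUNCTION pf_k_rows() MERGE RANGE (@merge_range);

-- Step 4: designate the filegroup that will hold the next partition.
ALTER PARTITION SCHEME ps_k_rows NEXT USED [DATA];

-- Step 5: add a new boundary at the leading (empty) end of the table.
SELECT @split_range = MAX(CAST(prv.value AS int)) + 1000
FROM sys.partition_range_values prv
JOIN sys.partition_functions pf ON pf.function_id = prv.function_id
WHERE pf.name = 'pf_k_rows';
ALTER PARTITION FUNCTION pf_k_rows() SPLIT RANGE (@split_range);

-- Steps 6-11: populate and index dbo.FactsStaging, add the matching check
-- constraint, then switch it into the new (last) partition, e.g.:
-- ALTER TABLE dbo.FactsStaging SWITCH TO dbo.Facts PARTITION $PARTITION.pf_k_rows(@split_range);

-- Step 12: refresh statistics after the switch.
UPDATE STATISTICS dbo.Facts;
```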

Parallel Execution Using partition logic
Table data refresh time can be improved using partitioned parallel execution.
1. Create the PARTITION FUNCTION.
2. Create the PARTITION SCHEME.
3. Create the load-tracking table [dbo].[syslargevolumelog].
4. Check [dbo].[syslargevolumelog]: if loading is not yet complete, continue with step 5; otherwise go to step 8.
5. Create the target table with the PARTITION SCHEME.
6. Load the target table from the source table one partition at a time (for example, the rows where idcolumn/10 = 1, and so on), running the partition loads in parallel.
7. Update [syslargevolumelog] to record that the data for this partition has been loaded.
8. Create a temporary table with the same structure as the original table.
9. Switch all partitions to the temporary table.
10. Create unique clustered indexes.
11. Rename the temporary table as the original table.
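Steps 9-11 above might be sketched in T-SQL as follows (the names dbo.TargetTable and dbo.TargetTable_tmp, the index name, and the index columns are all hypothetical):

```sql
-- Hypothetical sketch of steps 9-11 after all parallel partition loads finish.
-- Step 9: switch every partition of the loaded table into the temporary table
-- (both tables must share the same structure and partition scheme).
ALTER TABLE dbo.TargetTable SWITCH PARTITION 1 TO dbo.TargetTable_tmp PARTITION 1;
ALTER TABLE dbo.TargetTable SWITCH PARTITION 2 TO dbo.TargetTable_tmp PARTITION 2;
-- ...repeat for each remaining partition...

-- Step 10: build the unique clustered index on the temporary table.
CREATE UNIQUE CLUSTERED INDEX [CIX_TargetTable]
    ON dbo.TargetTable_tmp ([idcolumn], [MOD10]);

-- Step 11: swap names so the temporary table becomes the live table.
EXEC sp_rename 'dbo.TargetTable', 'TargetTable_old';
EXEC sp_rename 'dbo.TargetTable_tmp', 'TargetTable';
```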

SSIS Best Practices

Avoid SELECT *
Select only the columns you need; removing unused output columns can increase Data Flow task performance.

Steps to consider while loading the data:
If any non-clustered index(es) exist, DROP all non-clustered index(es).
If a clustered index exists, DROP the clustered index.

Steps to consider while selecting the data:
If the clustered index does not exist, CREATE the clustered index.
If the non-clustered index(es) do not exist, CREATE the non-clustered index(es).
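A minimal sketch of this drop-before-load / recreate-after-load pattern, assuming a hypothetical dbo.Facts table and hypothetical index and column names:

```sql
-- Before the bulk load: drop the indexes so rows are inserted into a heap.
DROP INDEX [IX_Facts_LoadDate] ON dbo.Facts;  -- non-clustered
DROP INDEX [CIX_Facts] ON dbo.Facts;          -- clustered

-- ...run the SSIS data flow that loads dbo.Facts here...

-- After the load: recreate the clustered index first, then the non-clustered.
CREATE CLUSTERED INDEX [CIX_Facts] ON dbo.Facts ([FactId]);
CREATE NONCLUSTERED INDEX [IX_Facts_LoadDate] ON dbo.Facts ([LoadDate]);
```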
Effect of OLEDB Destination Settings

Keep Identity – By default this setting is unchecked. If you check it, the dataflow engine will ensure that the source identity values are preserved and the same values are inserted into the destination table.
Keep Nulls – By default this setting is unchecked. If you check it, any default constraint on the destination table's columns will be ignored and the NULL values of the source columns will be preserved and inserted into the destination.
Table Lock – By default this setting is checked, and the recommendation is to leave it checked unless the same table is being used by some other process at the same time.
Check Constraints – Again, by default this setting is checked, and the recommendation is to uncheck it if you are sure that the incoming data is not going to violate the constraints of the destination table. Unchecking this option will improve the performance of the data load.
Better performance with parallel execution
MaxConcurrentExecutables – the default value is -1, which means the total number of available processors + 2; if you have hyper-threading enabled, it is the total number of logical processors + 2.
Avoid asynchronous transformations (such as the Sort transformation) wherever possible.
Ex: - Aggregate
- Fuzzy Grouping
- Merge
- Merge Join
- Sort
- Union All
How the DelayValidation property can help you
In general, the package is validated at design time itself. However, we can control this behavior by using the "DelayValidation" property.
The default value of this property is false. By setting DelayValidation to true, we can delay validation of the package until run time.

When to use event logging and when to avoid it
The recommendation here is to enable logging only if required; you can dynamically set the value of the LoggingMode property (of a package and its executables) to enable or disable logging without modifying the package. Also, you should choose to log only for those executables which you suspect to have problems, and further you should only log those events which are absolutely required for troubleshooting.
Effect of Rows Per Batch and Maximum Insert Commit Size Settings
Rows per batch – The default value for this setting is -1, which specifies that all incoming rows be treated as a single batch. You can change this default behavior and break the incoming rows into multiple batches; the only other allowed value is a positive integer, which specifies the maximum number of rows in a batch.
Maximum insert commit size (OLEDB Destination) – The default value for this setting is 2147483647 (the largest value for a 4-byte integer type), which specifies that all incoming rows be committed once on successful completion. You can specify a positive value for this setting to indicate that a commit will be issued after that number of records.
Changing the default value for this setting puts overhead on the dataflow engine to commit several times, but at the same time it relieves the pressure on the transaction log and tempdb, which can otherwise grow tremendously during high-volume data transfers.
DefaultBufferMaxSize and DefaultBufferMaxRows
The number of buffers created depends on how many rows fit into a buffer, and how many rows fit into a buffer depends on a few other factors:
1. The estimated row size.
2. The DefaultBufferMaxSize property of the data flow task. Its default value is 10 MB, and its upper and lower boundaries are MaxBufferSize (100 MB) and MinBufferSize (64 KB).
3. DefaultBufferMaxRows, again a property of the data flow task, which specifies the default maximum number of rows in a buffer. Its default value is 10000. For example, with an estimated row size of 4 KB, a 10 MB buffer holds roughly 2,500 rows, so the row size rather than the 10000-row cap is the limiting factor.
Lookup transformation considerations
Choose the caching mode wisely after analyzing your environment.
If you are using Partial Caching or No Caching mode, ensure you have an index on the reference table for better performance.
Instead of directly specifying a reference table in the lookup configuration, you should use a SELECT statement with only the required columns.
You should use a WHERE clause to filter out all the rows which are not required for the lookup.
Set the data type of each column appropriately, especially if your source is a flat file. This will enable you to accommodate as many rows as possible in the buffer.
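For example, a trimmed reference query for the lookup might look like this (the table and column names are hypothetical):

```sql
-- Fetch only the join key and the columns the lookup actually returns,
-- and filter out rows the lookup can never match.
SELECT CustomerKey, CustomerName
FROM dbo.DimCustomer
WHERE IsActive = 1;
```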

Avoid many small buffers. Tweak the values for DefaultBufferMaxRows and DefaultBufferMaxSize to get as many records into a buffer as possible, especially when dealing with large data volumes.

Full Load vs Delta Load
Design the package in such a way that it does a full pull of data only in the beginning or on demand; from then on it should do an incremental pull. This greatly reduces the volume of the data load operations, especially when volumes are likely to increase over the lifecycle of an application. For this purpose, use the Change Data Capture (CDC) feature of SQL Server 2008 when it is enabled upstream; for previous versions of SQL Server, implement your own incremental-pull logic.
Use merge instead of SCD
The big advantage of the MERGE statement is being able to handle multiple actions in a single pass of the data sets, rather than requiring multiple passes with separate inserts and updates. A well-tuned optimizer can handle this extremely efficiently.
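As a hedged illustration (the dimension and staging table names are hypothetical), a single-pass MERGE replacing a Type 1 SCD update could look like:

```sql
-- Upsert from a staging table into a dimension in one pass.
MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
    UPDATE SET tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, CustomerName)
    VALUES (src.CustomerKey, src.CustomerName);
```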
Other quick tips:
- Set the network packet size in the connection to 32767.
- Keep data types as narrow as possible for less memory usage.
- Do not perform excessive casting.
- Use GROUP BY in the source query instead of the Aggregate transformation.
- Avoid unnecessary delta detection when a full reload is cheaper.
- A Maximum insert commit size of 0 is the fastest.

Benefits of using SSIS Partitioning
Following are some of the benefits of following SSIS partitioning and best practices:
It facilitates the management of large fact tables in data warehouses.
It provides performance and parallelism benefits.
Dividing the table across filegroups benefits I/O operations, fetching the latest data, re-indexing, and backup and restore.
It supports range-based inserts and range-based deletes.
It enables the sliding window scenario.
In SQL Server 2008 SP2 and SQL Server 2008 R2 SP1, you can choose to enable support for 15,000 partitions.

Appendix
Reference used for Best Practices:
http://msdn.microsoft.com/en-us/library/ms190787.aspx

http://www.mssqltips.com/sql_server_business_intelligence_tips.asp
