3. Contents
• Understanding Data Integration
• Understanding SQL Server 2008 R2 Integration Services
• Understanding SSIS Packages
• Understanding the SSIS Control Flow
• Understanding the SSIS Data Flow
http://techmaster.vn
4. SQL Server 2008 R2 BI Structure
• Presentation Layer: reporting and visualization tools (dashboards, KPIs, scorecards, …)
• Analytical Layer: turns data into information (analysis); multidimensional OLAP database
• Data Storage and Retrieval Layer: data warehouse in an RDBMS
• Data Transformation Layer:
– Extract the data from multiple sources
– Modify the data to make it consistent
– Load the data into the data storage system
• Data Source Layer: text, MS Excel, MS Access, MS SQL, Oracle, … and external sources
5. Microsoft Business Intelligence Platform
• Analytic Applications: scorecards, analytics, planning (PerformancePoint Services)
• Portal (SharePoint)
• Data Delivery: SSRS, Report Builder, end-user analysis (Excel)
• Integrate (SQL Integration Services), Analyze (SQL Analysis Services), Report (SQL Reporting Services)
• Infrastructure Platform: data warehouse, data marts, operational data (SQL Server 2008 R2)
Office SQL
6. Data Integration in the Real World
• Extract data from sources
• Transform the data
• Load data into data stores
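The three steps above can be sketched outside SSIS in a few lines of Python. This is only an illustration of the extract, transform, load pattern, not how SSIS implements it; the sample CSV text, the `customers` table, and all function names are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be a file or a database.
SOURCE_CSV = "id,name,country\n1, alice ,vn\n2,BOB,us\n"


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: make the data consistent (trim, fix casing, convert types)."""
    return [
        (int(r["id"]), r["name"].strip().title(), r["country"].strip().upper())
        for r in rows
    ]


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return what ended up in the table."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute(
        "SELECT id, name, country FROM customers ORDER BY id"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` runs the whole chain against an in-memory database, so the sketch can be tried without any external data store.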
7. Data Integration Challenges
• Multiple sources with different formats.
• Structured, semi-structured, and unstructured
data.
• Huge data volumes.
Enterprises spend 60%–80% of their resources developing
and testing their ETL processes
9. Introducing Integration Services 2008 R2
• Primarily designed to implement ETL
processes
• Provides a robust, flexible, fast, scalable and
extensible architecture
• Challenges traditional ETL design approaches
10. Introducing Integration Services 2008 R2
• Its capabilities are useful in many other
scenarios:
– Assessing data quality
– Cleansing and standardizing data
– Merging data from heterogeneous data stores
– Implementing ad hoc data transfers
– Automating administrative tasks
11. SSIS Architecture
• SQL Server Integration
Services (SSIS) service
• SSIS object model
• Two distinct runtime
engines:
– Control flow
– Data flow
12. SSIS Architecture
• SSIS Designer
– Graphical tool to create and maintain
Integration Services packages.
• Integration Services Runtime
– Saves the layout of packages, runs
packages, and provides support for
logging, breakpoints, configuration,
connections, and transactions.
• Tasks and other executables:
– The Integration Services run-time
executables are the package,
containers, tasks, and event handlers
14. SSIS Architecture
• Object Model
– Allows you to create custom
components for use in packages
• Integration Services Service
– Lets you monitor running
Integration Services packages and
manage the storage of
packages.
16. What Is an Integration Services Package?
• A package is the object that implements Integration
Services functionality to extract, transform, and load
data.
• Creation tools:
– SSIS Designer in BI Development Studio.
– SQL Server Import and Export Wizard
– Integration Services Connections Project Wizard
• Saved in XML format to the file system or SQL Server
17. Package Elements
• Connection managers
• Control flow components
• Data flow components
• Variables
• Event handlers
• Configurations
18. Connection Managers
• Logical representation of
a connection
• Stored in the package
and cannot be shared
between packages
• Used by package
elements
• Do not need to connect
to SQL Server
20. Control Flow
• Control flow is the process-oriented
workflow engine
• A package consists of a single
control flow
• Control flow elements:
– Containers
– Tasks
– Precedence constraints
– Variables
– Event handlers
21. Containers
• Provide structure and services for
– Grouping tasks
– Implementing repeating flows
• Execute in sequence defined by precedence constraints in the control flow
• Manage variable and transactional boundaries
22. Tasks
• Perform discrete operations at runtime
• Execute in sequence defined by precedence
constraints in the control flow
• Use properties configured at design time or
assigned dynamically at runtime by using
expressions
23. Task Categories
Task categories and their descriptions:
• Data Flow: The Data Flow task defines and runs data flows that extract data, apply transformations, and load data.
• Data Preparation: Data preparation tasks copy files and directories, download files and data, save data returned by Web methods, or work with XML documents.
• Workflow: Workflow tasks communicate with other processes to run packages or programs, send and receive messages between packages, send e-mail messages, read Windows Management Instrumentation (WMI) data, or watch for WMI events.
• SQL Server: SQL Server tasks access, copy, insert, delete, or modify SQL Server objects and data.
• Analysis Services: Analysis Services tasks create, modify, delete, or process Analysis Services objects.
• Scripting: Scripting tasks extend package functionality through custom scripts.
• Maintenance: Maintenance tasks perform administrative functions, such as backing up and shrinking SQL Server databases, rebuilding and reorganizing indexes, and running SQL Server Agent jobs.
24. Precedence Constraints
• Precedence constraints link executables, containers,
and tasks in packages into a control flow, and specify
conditions that determine whether executables run
• Configure conditions that determine whether the
executable runs:
– Success, Failure, or Completion constraints
– Expressions
– Logical AND/OR for
multiple constraints
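How these conditions combine can be sketched in Python. This is a simplified model rather than the SSIS runtime: it assumes every constraint requires both the outcome and the expression to hold, and the names `Outcome`, `constraint_met`, and `should_run` are invented for the illustration.

```python
from enum import Enum


class Outcome(Enum):
    """Result of the upstream executable."""
    SUCCESS = "success"
    FAILURE = "failure"


def constraint_met(required, outcome, expression=True):
    """One constraint: the upstream outcome matches and the expression holds.

    required is "success", "failure", or "completion" (either outcome).
    """
    outcome_ok = required == "completion" or required == outcome.value
    return outcome_ok and bool(expression)


def should_run(constraints, logical_and=True):
    """Combine several incoming constraints with logical AND or logical OR."""
    results = [constraint_met(*c) for c in constraints]
    return all(results) if logical_and else any(results)
```

Each constraint is passed as a tuple such as `("success", Outcome.SUCCESS)` or `("completion", Outcome.FAILURE, True)`. With logical AND the downstream executable runs only if every incoming constraint is met; with logical OR a single met constraint is enough.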
25. Variables
• Variables customize package
behavior by changing expression
values or object properties
• System variables store values
collected during package
execution
• All variables use case-sensitive
names
• Variables can be scoped at
package, container, or task level
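Scoping and case sensitivity can be pictured with a small Python model. It assumes that a variable is visible to the element that defines it and to everything nested inside that element, and that the nearest definition wins; the `Scope` class and the variable names are hypothetical and are not part of the SSIS object model.

```python
class Scope:
    """A package, container, or task that can hold variables."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.variables = {}

    def set(self, var_name, value):
        self.variables[var_name] = value

    def get(self, var_name):
        """Look in this scope first, then walk up toward the package."""
        scope = self
        while scope is not None:
            if var_name in scope.variables:  # dict keys are case-sensitive
                return scope.variables[var_name]
            scope = scope.parent
        raise KeyError(var_name)


package = Scope("Package")
loop = Scope("ForEachFile", parent=package)
task = Scope("LoadFile", parent=loop)

package.set("BatchSize", 1000)
loop.set("FileName", "sales.csv")
task.set("BatchSize", 50)  # hides the package-level value inside this task
```

In this model `task.get("BatchSize")` finds the task-level value, `loop.get("BatchSize")` falls back to the package, and `task.get("batchsize")` raises `KeyError` because the name differs in case.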
26. Event Handlers
• At run time executables raise events
• Event handlers can be defined to respond to
these events
• Creating an event handler is similar to
building a package; an event handler has
tasks and containers, which are sequenced
into the control flow
27. Event Handlers
• Common events used to trigger event handlers:
– OnPreExecute
– OnPostExecute
– OnError
• Examples:
– Retrieve system information to assess resource
availability before the package runs
– Send an e-mail message when an error occurs
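The event pattern can be imitated in plain Python to show when each handler fires. Only the three event names come from the slide; the `Executable` class, the handler signature, and the log messages are assumptions for illustration, and no e-mail is actually sent.

```python
class Executable:
    """A task-like object that raises events around its work."""

    def __init__(self, name, work):
        self.name = name
        self.work = work
        self.handlers = {}  # event name -> list of callables

    def on(self, event, handler):
        """Register a handler for an event name."""
        self.handlers.setdefault(event, []).append(handler)

    def _raise(self, event, detail=None):
        for handler in self.handlers.get(event, []):
            handler(self.name, detail)

    def execute(self):
        self._raise("OnPreExecute")
        try:
            self.work()
        except Exception as exc:
            self._raise("OnError", str(exc))
            return False
        finally:
            # Runs on both success and failure, before the method returns.
            self._raise("OnPostExecute")
        return True


log = []


def failing_work():
    raise RuntimeError("source file missing")


task = Executable("LoadSales", failing_work)
task.on("OnPreExecute", lambda name, _: log.append(f"{name}: checking resources"))
task.on("OnError", lambda name, err: log.append(f"{name}: would e-mail '{err}'"))
task.on("OnPostExecute", lambda name, _: log.append(f"{name}: finished"))
succeeded = task.execute()
```

After the run, `log` holds one entry per raised event in the order pre-execute, error, post-execute, and `succeeded` is `False`.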
29. Data Flow
• The data flow is an optional package element that can:
– Extract data
– Modify data
– Load data into data stores
• The main data flow elements are
– Sources
– Transformations
– Destinations.
30. Data Flow Sources
• Sources extract data from:
– Relational tables and views
– Files
– Analysis Services databases
31. Data Flow Transformations
• Aggregate, merge, distribute, or modify data
• Include error outputs in some cases
• Transformation categories
– Row
– Rowset
– Split and Join
– Script
– Other
32. Row Transformations
Transformations and their descriptions:
• Character Map: applies string functions to character data.
• Copy Column: adds copies of input columns to the transformation output.
• Data Conversion: converts the data type of a column to a different data type.
• Derived Column: populates columns with the results of expressions.
• Export Column: inserts data from a data flow into a file.
• Import Column: reads data from a file and adds it to a data flow.
• Script Component: uses script to extract, transform, or load data.
• OLE DB Command: runs SQL commands for each row in a data flow.
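To make the row-by-row idea concrete, here is a rough Python imitation of four of the transformations listed above. It only mirrors their described behavior on dictionaries; the function names, the column names, and the sample row are invented for the example and have nothing to do with the SSIS API.

```python
def character_map(row, column):
    """Character Map: apply a string function (here, uppercase) to a column."""
    return {**row, column: row[column].upper()}


def copy_column(row, column, new_name):
    """Copy Column: add a copy of an input column to the output."""
    return {**row, new_name: row[column]}


def data_conversion(row, column, to_type):
    """Data Conversion: convert a column to a different data type."""
    return {**row, column: to_type(row[column])}


def derived_column(row, new_name, expression):
    """Derived Column: populate a column with the result of an expression."""
    return {**row, new_name: expression(row)}


def transform_row(row):
    """Chain the transformations; each one sees a single row at a time."""
    row = copy_column(row, "code", "code_original")
    row = character_map(row, "code")
    row = data_conversion(row, "qty", int)
    row = data_conversion(row, "price", float)
    return derived_column(row, "total", lambda r: r["qty"] * r["price"])


rows = [{"code": "ab-1", "qty": "3", "price": "2.50"}]
result = [transform_row(r) for r in rows]
```

Each function returns a new row and never needs to see any other row, which is what separates row transformations from the rowset transformations on the next slide.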
33. Rowset Transformations
Transformations and their descriptions:
• Aggregate: performs aggregations such as AVERAGE, SUM, and COUNT.
• Sort: sorts data.
• Percentage Sampling: creates a sample data set using a percentage to specify the sample size.
• Row Sampling: creates a sample data set by specifying the number of rows in the sample.
• Pivot: creates a less normalized version of a normalized table.
• Unpivot: creates a more normalized version of a nonnormalized table.
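Aggregate, Pivot, and Unpivot are easier to follow on a tiny data set. The Python sketch below imitates their described effect on a list of dictionaries; the `sales` rows and the helper names are hypothetical, and the real components offer many more options.

```python
from collections import defaultdict

# Hypothetical normalized input: one row per region and quarter.
sales = [
    {"region": "North", "quarter": "Q1", "amount": 100},
    {"region": "North", "quarter": "Q2", "amount": 150},
    {"region": "South", "quarter": "Q1", "amount": 80},
]


def aggregate_sum(rows, group_by, value):
    """Aggregate: SUM of `value` per `group_by` key, in sorted key order."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_by]] += r[value]
    return [{group_by: k, value: totals[k]} for k in sorted(totals)]


def pivot(rows, key, column, value):
    """Pivot: one output row per key, one column per distinct `column` value."""
    out = {}
    for r in rows:
        out.setdefault(r[key], {key: r[key]})[r[column]] = r[value]
    return list(out.values())


def unpivot(rows, key, column, value):
    """Unpivot: turn each non-key column back into its own row."""
    return [
        {key: r[key], column: name, value: v}
        for r in rows
        for name, v in r.items()
        if name != key
    ]


totals = aggregate_sum(sales, "region", "amount")
wide = pivot(sales, "region", "quarter", "amount")
narrow = unpivot(wide, "region", "quarter", "amount")
```

Here `wide` has one row per region with a column per quarter (the less normalized form), and unpivoting it gives back the original normalized rows.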
34. Split and Join Transformations
Transformations and their descriptions:
• Conditional Split: routes data rows to different outputs.
• Multicast: distributes data sets to multiple outputs.
• Union All: merges multiple data sets.
• Merge: merges two sorted data sets.
• Merge Join: joins two data sets using a FULL, LEFT, or INNER join.
• Lookup: looks up values in a reference table using an exact match.
• Cache: writes data from a connected data source in the data flow to a Cache connection manager that saves the data to a cache file. The Lookup transformation performs lookups on the data in the cache file.
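A short Python sketch can show how rows move through a split, a union, and a lookup. It is a loose imitation of the behavior described above, under the assumption that rows are dictionaries and the reference table is an in-memory dictionary; the function names and the `orders` data are made up.

```python
def conditional_split(rows, conditions):
    """Conditional Split: send each row to the first output whose test passes."""
    outputs = {name: [] for name in conditions}
    outputs["default"] = []
    for row in rows:
        for name, test in conditions.items():
            if test(row):
                outputs[name].append(row)
                break
        else:
            # No condition matched, so the row goes to the default output.
            outputs["default"].append(row)
    return outputs


def union_all(*inputs):
    """Union All: combine several inputs into one output."""
    return [row for rows in inputs for row in rows]


def lookup(rows, reference, key, new_column):
    """Lookup: exact-match lookup; unmatched rows go to a separate output."""
    matched, unmatched = [], []
    for row in rows:
        if row[key] in reference:
            matched.append({**row, new_column: reference[row[key]]})
        else:
            unmatched.append(row)
    return matched, unmatched


orders = [
    {"id": 1, "country": "VN", "amount": 500},
    {"id": 2, "country": "US", "amount": 20},
    {"id": 3, "country": "XX", "amount": 75},
]
split = conditional_split(orders, {"large": lambda r: r["amount"] >= 100})
country_names = {"VN": "Vietnam", "US": "United States"}  # reference table
matched, unmatched = lookup(
    union_all(split["large"], split["default"]),
    country_names, "country", "country_name",
)
```

The order with the unknown country code ends up in `unmatched`, which is one way to picture rows that find no exact match in the reference table.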
35. The Script Transformation
• Extends the capabilities of the data flow
• As with the Script Task, you write VB.NET or C# scripts to
introduce custom logic into the data flow
• Can be configured for these roles:
– Source
– Destination
– Transformation
• Delivers optimized performance because it is precompiled
36. Other Transformations
• Add audit information
• Populate lookup caches
• Export and import data
• Count rows
• Manage slowly changing dimensions
Key Points:
• Integration Services (SSIS) provides a scalable enterprise data integration platform with exceptional Extract, Transform, Load (ETL) and integration capabilities, enabling organizations to more easily manage data from a wide array of data sources.
• Master Data Services (MDS) enables organizations to start with simple solutions for analytic or operational requirements, and then adapt the solutions to additional requirements incrementally.
• The latest version of SQL Server from Microsoft, SQL Server 2008, offers hundreds of new DBMS features that boost the productivity of database administrators and developers, improve support for larger databases, and enhance security.
• Reporting Services (SSRS) provides a full range of ready-to-use tools and services to help you create, deploy, and manage reports for your organization, as well as programming features that enable you to extend and customize your reporting functionality.
• Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications.
Conclusion: With SQL Server 2008 R2 customers get all the technologies needed to build a reliable and secure BI platform. SQL Server 2008 R2 has the strongest combination of price/performance, manageability, security, and DBA productivity.
Row transformations: update column values or create new columns. They transform each row in the pipeline input.
Rowset transformations: create new rowsets that can include aggregate and sorted values, sample rowsets, or pivoted and unpivoted rowsets.
Split and Join transformations: distribute rows to different outputs, create copies of the transformation inputs, join multiple inputs into one output, and perform lookup operations.