What's new in SQL Server Integration Services 2012?
1. WHAT’S NEW IN SQL SERVER INTEGRATION SERVICES 2012?
Nico Jacobs
Nico@U2U.be
@sqlwaldorf
2. WHAT’S SSIS?
• Extract from source systems
• SQL Server, Oracle, DB2, flat file, XML, Excel, …
• Transform data
• Look up surrogate keys, clean data, reformat, …
• Load it into a destination database
• Transactions, checkpoints, scalability, …
3. WHAT’S SSIS?
• Data flow reads data from source(s)
• Data is pushed in a row-based pipeline
• It optionally passes through one or more preprogrammed or ad-hoc transformations
• Streaming transformations improve scalability
• Destination(s) write data to disk, db, …
• Control flow dictates the order in which tasks execute; the data flow is one of these tasks
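The row-based pipeline described on this slide can be sketched with Python generators. This is only an analogy for how streaming transformations pass rows along without holding the whole data set in memory; it is not the SSIS API, and the function names and sample rows are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def source(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical source: yields rows one at a time."""
    for row in rows:
        yield row


def clean_name(rows: Iterable[Row]) -> Iterator[Row]:
    """Streaming transformation: each row is handled independently."""
    for row in rows:
        yield {**row, "name": str(row["name"]).strip().title()}


def lookup_key(rows: Iterable[Row], keys: Dict[str, int]) -> Iterator[Row]:
    """Streaming surrogate-key lookup; -1 marks an unknown member (assumption)."""
    for row in rows:
        yield {**row, "country_key": keys.get(str(row["country"]), -1)}


def destination(rows: Iterable[Row]) -> List[Row]:
    """Hypothetical destination: collects the rows it receives."""
    return list(rows)


raw = [
    {"name": "  alice ", "country": "BE"},
    {"name": "BOB", "country": "XX"},
]
# Rows flow source -> clean_name -> lookup_key -> destination, one at a time.
loaded = destination(lookup_key(clean_name(source(raw)), {"BE": 1}))
```

Because no stage waits for the full input, memory use stays flat as the row count grows, which is the scalability benefit the slide attributes to streaming transformations.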
4. WHAT’S NEW IN 2012?
• A lot!
• New stuff for package developers
• New stuff for package administration
• New stuff for package usage
• Let’s get started!
5. 1: GUI IMPROVEMENTS
• Getting started window
• Package visualization
• Zoom
• Undo
• SSIS toolbox
• Data flow source/destination wizard
• Sort packages by name
• Grouping in data flow
6. CHANGE DATA CAPTURE
• An incremental load loads only the rows that have changed since the last load
• How do we know what has changed?
• Compare every source row with every destination row
• Last modified date and a trigger to maintain this
• Change tracking
• Change data capture!
7. CHANGE DATA CAPTURE
• SQL Server Enterprise edition, 2008 or higher
• Asynchronous process
• Captures all changes
• Maintains time window
• CDC data access via table valued functions
Books online, change data capture
8. 2: CDC TASK AND COMPONENTS
• CDC needs to keep track of which changes have already been processed
• CDC task does this by storing LSNs in a tracking table
• CDC Source component reads from the CDC table function, based on the LSN it got from the CDC task
• CDC transformation splits records into new rows, updated rows and deleted rows
• No documentation yet in RC0, check Matt Masson’s blog
• Based on Attunity CDC components
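The interplay between LSN tracking and the splitter can be sketched as follows. This is a minimal in-memory model, not the SSIS components themselves: the `Change` class, the `tracking` dictionary and the integer LSNs are hypothetical simplifications. The operation codes match those SQL Server CDC stores in the `__$operation` column of a change table.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# __$operation codes used by SQL Server CDC change tables.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


@dataclass(frozen=True)
class Change:
    lsn: int        # real LSNs are binary(10); an int keeps the sketch short
    operation: int
    key: int


def read_changes(log: List[Change], last_lsn: int) -> Tuple[List[Change], int]:
    """Return the changes newer than last_lsn and the new high-water mark."""
    new = [c for c in log if c.lsn > last_lsn]
    high = max((c.lsn for c in new), default=last_lsn)
    return new, high


def split_changes(changes: List[Change]) -> Dict[str, List[Change]]:
    """Route changes to insert, update and delete outputs."""
    out: Dict[str, List[Change]] = {"insert": [], "update": [], "delete": []}
    for c in changes:
        if c.operation == INSERT:
            out["insert"].append(c)
        elif c.operation == UPDATE_AFTER:
            out["update"].append(c)
        elif c.operation == DELETE:
            out["delete"].append(c)
        # UPDATE_BEFORE images are ignored in this sketch.
    return out


log = [
    Change(10, INSERT, 1),
    Change(11, UPDATE_BEFORE, 1),
    Change(11, UPDATE_AFTER, 1),
    Change(12, DELETE, 2),
]
tracking = {"last_lsn": 10}  # stands in for the tracking table
changes, tracking["last_lsn"] = read_changes(log, tracking["last_lsn"])
result = split_changes(changes)
```

Storing the high-water mark only after the changes have been read is what lets the next run pick up exactly where this one stopped.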
9. 3: MAPPING DATA FLOW COLUMNS
• When modifying a data flow, column remapping is sometimes needed
• SSIS 2012 maps columns by name instead of by ID
• It also has an improved remapping dialog
10. 4: ODBC SOURCE AND DESTINATION
• ODBC was not natively supported in 2008
• SSIS 2012 has ODBC Source & Destination
• Handy for connecting to SQL Azure
• Essential if SQL Server stops supporting OleDb
• SSIS 2008 could access ODBC via ADO.Net:
• Has create table option, which ODBC lacks
• No control over batch inserts
• Low performance

nr of rows   ODBC     ADO.Net   % Diff
1000         0.42     2.12      405%
10000        4.91     7.84      60%
100000       49.2     78.36     59%
1000000      481.65   781.28    62%
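The % Diff column appears to be the relative extra time ADO.Net needs compared with native ODBC. The formula below is inferred from the numbers, not stated on the slide; a small check reproduces the published percentages.

```python
def pct_diff(baseline: float, other: float) -> int:
    """Percentage by which `other` exceeds `baseline`, rounded to a whole number."""
    return round((other - baseline) / baseline * 100)


# (rows, ODBC, ADO.Net) as listed in the table above
timings = [
    (1000, 0.42, 2.12),
    (10000, 4.91, 7.84),
    (100000, 49.2, 78.36),
    (1000000, 481.65, 781.28),
]
diffs = [pct_diff(odbc, ado) for _, odbc, ado in timings]
```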
11. REPLACE OLEDB WITH ODBC?
• After comparing ODBC with ODBC via ADO.Net, let’s test ODBC versus OleDb
• On bulk insert

nr of rows   OleDb    OleDb Fast   ODBC      % Diff
1000         0.15     0.07         0.865     477%
10000        0.32     0.16         4.8       1400%
100000       1.66     0.565        48.13     2799%
1000000      12.485   9.12         483.085   3769%

• On row by row

nr of rows   OleDb    ODBC     % Diff
1000         0.62     0.76     -18%
10000        9.15     6.28     46%
100000       71.21    67.37    6%
1000000      730.16   684.28   7%

Your mileage may vary…
12. 5: SCRIPTING
• Script task and script component now support .Net 4.0
• Breakpoints are supported in script component
• When developing custom components, there is better backpressure support:
• SupportsBackPressure property, IsInputReady and GetDependentInputs methods
13. 6: EXPRESSION TASK
• The script task can be used to modify variable values… but it’s overkill
• Expression task provides a simple task to change variable values
14. DATA QUALITY SERVICES (DQS)
• DQS is a new service to clean domain data
• Domain knowledge base needs to be built
• Based on rules, positive and negative examples
• Potentially using external data from Azure Marketplace or other providers
15. 7: DQS CLEANSING TASK
• Cleaning and standardizing data before it is loaded in the data warehouse is essential
• DQS Cleansing task labels data in 4 categories:
• Correct: a value accepted by the knowledge base
• Corrected: a value that DQS is confident it can correct to a valid domain value
• Suggested: a value for which DQS is less confident, but can still suggest a domain value
• New: DQS has no suggestions for this
• See Koen Verbeeck’s session on DQS for more info!
16. 8: PACKAGE CATALOG
• SSIS 2012 can work in the new project mode (default) or in old package mode (backwards compatibility)
• In project mode, many things change:
• Project becomes the level of deployment
• Deployment to SQL Server becomes obligatory
• Packages not stored in msdb, but in dedicated user database:
o The package catalog, named SSISDB
• Logging happens automatically and is done in the package catalog
o Custom logging still supported
• Projects can be converted from one deployment type to another
17. PACKAGE CATALOG
• Manage via SSMS: Relational engine
• Fixed database name: SSISDB
• Stores projects, versions, logs, 5 reports, 25 views, 42 stored procedures, …
• This makes it possible to run, monitor and manage SSIS projects and packages via T-SQL!
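As an illustration of running a package through T-SQL, the sketch below uses Python only to assemble the batch text. `catalog.create_execution` and `catalog.start_execution` are stored procedures in SSISDB; the folder, project and package names are hypothetical, and parameter details may differ between RC0 and later builds.

```python
def tsql_literal(value: str) -> str:
    """Quote a Python string as a T-SQL Unicode literal."""
    return "N'" + value.replace("'", "''") + "'"


def build_run_package_sql(folder: str, project: str, package: str) -> str:
    """Build a T-SQL batch that creates and then starts a catalog execution."""
    return "\n".join([
        "DECLARE @execution_id BIGINT;",
        "EXEC SSISDB.catalog.create_execution",
        f"    @folder_name = {tsql_literal(folder)},",
        f"    @project_name = {tsql_literal(project)},",
        f"    @package_name = {tsql_literal(package)},",
        "    @reference_id = NULL,",      # no environment reference in this sketch
        "    @use32bitruntime = 0,",
        "    @execution_id = @execution_id OUTPUT;",
        "EXEC SSISDB.catalog.start_execution @execution_id;",
    ])


# Hypothetical names, for illustration only.
script = build_run_package_sql("Demo", "LoadDW", "LoadCustomers.dtsx")
print(script)
```

The printed batch can be pasted into an SSMS query window connected to the server that hosts SSISDB.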
18. 9: PARAMETERS
• Just two scopes:
• Package
• Project!
• Read-only
• Value is set when scope starts and cannot be changed
• Can be set from SQL Server Data Tools configurations
• Often used together with environments
• Does not replace variables
• It is more a package configuration replacement
• Using the Visual Studio (SSDT) configurations, we can configure default values for testing
19. 10: SHARED CONNECTION MANAGERS
• A shared connection manager is defined at project level and is automatically available in every package
• Not copied as in SSIS 2008
• Shared connection managers can be parameterized as well
• When converting shared connection managers back to regular (package) connection managers, they disappear in all other packages
• Shared cache connection managers are supported as well
• This makes it possible to cache data in memory in one package and reuse it in multiple other packages
20. 11: ENVIRONMENTS
• Environments replace package configurations
• They can control parameter values and connection strings
• Environments are created in the package catalog
• They are not deployed to the server, but created on the server
• Don’t forget to reference the environment at the project level
• Script them while creating, this eases creating multiple environments
• A server might have multiple environments
• When we execute a package, we can choose which environment we’ll use
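How an environment supplies values for read-only parameters can be modeled roughly as below. This is a simplified sketch: in the catalog, parameters reference environment variables explicitly, whereas here they are matched by name, and all parameter names and values are hypothetical.

```python
from types import MappingProxyType
from typing import Dict, Mapping


def resolve_parameters(
    defaults: Mapping[str, object],
    environment: Mapping[str, object],
) -> Mapping[str, object]:
    """Environment values override design-time defaults; the result is read-only."""
    resolved = {name: environment.get(name, default) for name, default in defaults.items()}
    return MappingProxyType(resolved)


# Hypothetical project parameters with their design-time defaults.
project_defaults = {"ServerName": "localhost", "BatchSize": 1000}

# One server can hold several environments.
environments: Dict[str, Dict[str, object]] = {
    "Test": {"ServerName": "test-sql"},
    "Production": {"ServerName": "prod-sql", "BatchSize": 50000},
}

# The environment is chosen at execution time.
params = resolve_parameters(project_defaults, environments["Test"])
```

Wrapping the result in `MappingProxyType` mirrors the rule that a parameter value is fixed once execution starts: any attempt to assign to `params` raises a `TypeError`.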
21. 12: DATA TAPS
• Imagine a data viewer
• Which can be added on the runtime server
• Without modifying the package, but using T-SQL
• Which writes the data to disk instead of visualizing it…
• Voila, you are now thinking about the data tap
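A data tap is attached with `catalog.add_data_tap` after the execution has been created and before it is started. The sketch below uses Python only to assemble that T-SQL batch. The folder, project, package, task path, data flow path and file name are all hypothetical, and parameter details may differ between RC0 and later builds.

```python
def tsql_literal(value: str) -> str:
    """Quote a Python string as a T-SQL Unicode literal."""
    return "N'" + value.replace("'", "''") + "'"


def build_data_tap_sql(
    folder: str,
    project: str,
    package: str,
    task_path: str,
    dataflow_path: str,
    filename: str,
) -> str:
    """Build a T-SQL batch that creates an execution, taps one path, and starts it."""
    return "\n".join([
        "DECLARE @execution_id BIGINT;",
        "EXEC SSISDB.catalog.create_execution",
        f"    @folder_name = {tsql_literal(folder)},",
        f"    @project_name = {tsql_literal(project)},",
        f"    @package_name = {tsql_literal(package)},",
        "    @execution_id = @execution_id OUTPUT;",
        # The tap must be added between create_execution and start_execution.
        "EXEC SSISDB.catalog.add_data_tap",
        "    @execution_id = @execution_id,",
        f"    @task_package_path = {tsql_literal(task_path)},",
        f"    @dataflow_path_id_string = {tsql_literal(dataflow_path)},",
        f"    @data_filename = {tsql_literal(filename)};",
        "EXEC SSISDB.catalog.start_execution @execution_id;",
    ])


# Hypothetical names, for illustration only.
script = build_data_tap_sql(
    "Demo",
    "LoadDW",
    "LoadCustomers.dtsx",
    r"\Package\Data Flow Task",
    "Paths[OLE DB Source.OLE DB Source Output]",
    "customers_tap.csv",
)
print(script)
```

The tap applies only to that one execution, so the deployed package itself stays unchanged.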
22. 13: AND A LOT MORE…
• .Net API and PowerShell
• Pivot and row count transformations get a user interface
• Flat file support for:
• Embedded qualifiers
• Variable number of columns (but still fixed meta-data)
• Raw file improvements
• Generate empty raw file
• Stores sort info
• DTSX files are becoming more readable and ‘mergeable’
• Sorted, filtered and pretty-printed
• Merge and merge join improve backpressure handling
23. AND A LOT MORE…
• 4000-character expression length limit lifted
• New expression language keywords
• LEFT as syntactic sugar for SUBSTRING(,1,)
• TOKEN and TOKENCOUNT for shredding strings
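The behavior of the new string functions can be approximated in Python as below. This is an emulation, not the SSIS expression engine. It assumes that every character in the delimiter string acts as a separator, that empty tokens are dropped, and that an out-of-range occurrence yields `None`; those edge cases are assumptions of this sketch.

```python
from typing import List, Optional


def _tokens(text: str, delimiters: str) -> List[str]:
    """Split on any single delimiter character, dropping empty tokens (assumption)."""
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in delimiters:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def token(text: str, delimiters: str, occurrence: int) -> Optional[str]:
    """Emulates TOKEN: return the token at the 1-based position `occurrence`."""
    parts = _tokens(text, delimiters)
    return parts[occurrence - 1] if 1 <= occurrence <= len(parts) else None


def token_count(text: str, delimiters: str) -> int:
    """Emulates TOKENCOUNT: number of tokens in the string."""
    return len(_tokens(text, delimiters))


def substring(text: str, start: int, length: int) -> str:
    """Emulates SUBSTRING with a 1-based start position."""
    return text[start - 1:start - 1 + length]


def left(text: str, length: int) -> str:
    """Emulates LEFT: the same as SUBSTRING(text, 1, length)."""
    return substring(text, 1, length)


second = token("a little white dog", " ", 2)
count = token_count("a little white dog", " ")
prefix = left("abcdef", 3)
```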
24. SUMMARY
• Improved GUI
• Change data capture support
• Easy column remapping
• ODBC connections
• .Net 4.0 support & script component debugging
• Expression Task
• Data Quality Cleansing
• Package catalog
• Parameters
• Shared Connection Managers
• Environments
• Data Taps
• And a lot more…