The document describes the software architecture of the Informatica PowerCenter ETL product. It consists of three main components: 1) client tools that enable development and monitoring; 2) a centralized repository that stores all metadata; 3) the server that executes mappings and loads data into targets. The architecture diagram shows the data flow from sources to targets via the server.
Informatica Software Architecture Illustrated
Informatica's ETL product, known as Informatica PowerCenter, consists of three main components.
1. Informatica PowerCenter Client Tools:
These are the development tools installed on the developer's machine. These tools enable a developer to:
Define the transformation process, known as a mapping (Designer)
Define run-time properties for a mapping, known as sessions (Workflow Manager)
Monitor the execution of sessions (Workflow Monitor)
Manage the repository, useful for administrators (Repository Manager)
Report on metadata (Metadata Reporter)
2. Informatica PowerCenter Repository:
The repository is the heart of the Informatica tools. It is a kind of data inventory where all the
data related to mappings, sources, targets, etc. is kept. This is the place where all the metadata for
your application is stored. All the client tools and the Informatica Server fetch data from the repository.
The Informatica client and server without a repository are like a PC without memory or a hard disk:
able to process data, but with no data to process. The repository can be treated as the
back end of Informatica.
3. Informatica PowerCenter Server:
The server is where all execution takes place. It makes physical connections to sources and targets, fetches data, applies the transformations defined in the mapping, and loads the data into the target system.
This architecture is visually explained in the diagram below:
[Architecture diagram: data flows from remote sources, through the PowerCenter Server, to remote targets. The supported source and target categories are the same:
Standard: RDBMS, Flat Files, XML, ODBC
Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2
EAI: MQ Series, Tibco, JMS, Web Services
Legacy: Mainframes (DB2, VSAM, IMS, IDMS, Adabas), AS400 (DB2, Flat File)]
This is sufficient knowledge to get started with Informatica, so let's go straight to development in
Informatica.
Informatica Product Line
Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of
enterprise data integration and ETL software.
The important products from Informatica Corporation are listed below:
Power Center
Power Mart
Power Exchange
Power Center Connect
Power Channel
Metadata Exchange
Power Analyzer
Super Glue
Power Center & Power Mart: Power Mart is a departmental version of Informatica for
building, deploying, and managing data warehouses and data marts. Power Center is used for
a corporate enterprise data warehouse, while Power Mart is used for departmental data warehouses
such as data marts. Power Center supports global and networked repositories and can
be connected to several sources. Power Mart supports a single repository and can be connected
to fewer sources than Power Center. A Power Mart implementation can grow into an
enterprise implementation, and its codeless environment makes for easy developer productivity.
Power Exchange: Informatica Power Exchange, as a standalone service or along with Power
Center, helps organizations leverage data by avoiding manual coding of data extraction
programs. Power Exchange supports batch, real-time, and changed data capture options for
mainframe (DB2, VSAM, IMS, etc.), midrange (AS400 DB2, etc.), and relational databases
(Oracle, SQL Server, DB2, etc.), as well as flat files on Unix, Linux, and Windows systems.
Power Center Connect: This is an add-on to Informatica Power Center. It helps to extract data and
metadata from ERP systems like PeopleSoft, SAP, and Siebel, messaging systems like IBM's
MQSeries, and other third-party applications.
Power Channel: This helps to transfer large amounts of encrypted and compressed data over a
LAN or WAN, through firewalls, transfer files over FTP, etc.
Metadata Exchange: Metadata Exchange enables organizations to take advantage of the time
and effort already invested in defining data structures within their IT environment when used
with Power Center. For example, an organization may be using data modeling tools, such as
Erwin, Embarcadero, Oracle Designer, or Sybase PowerDesigner, for developing data models.
The functional and technical teams will have spent much time and effort creating the data
model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using
Metadata Exchange, these data structures can be imported into Power Center to identify source
and target mappings, which leverages that time and effort. There is no need for an Informatica developer
to create these data structures once again.
Power Analyzer: PowerAnalyzer provides organizations with reporting facilities.
PowerAnalyzer makes accessing, analyzing, and sharing enterprise data simple and easily
available to decision makers. PowerAnalyzer enables organizations to gain insight into business processes and
develop business intelligence.
With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate
information from data stored in a data warehouse, data mart, operational data store, or other data
storage models. PowerAnalyzer works best with a dimensional data warehouse in a relational
database. It can also run reports on data in any table in a relational database that does not conform
to the dimensional model.
Super Glue: SuperGlue is used for loading metadata into a centralized place from several sources.
Reports can be run against SuperGlue to analyze that metadata.
Note: This is not a complete tutorial on Informatica. We will add more tips and guidelines on
Informatica in the near future, so please check back soon. To know more about Informatica,
visit its official website, www.informatica.com
Informatica Transformations
A transformation is a repository object that generates, modifies, or passes data. The Designer
provides a set of transformations that perform specific functions. For example, an Aggregator
transformation performs calculations on groups of data.
Transformations can be of two types:
Active Transformation
An active transformation can change the number of rows that pass through the transformation, change the transaction boundary, or change the row type. For example, Filter, Transaction Control, and Update Strategy are active transformations.
The key point to note is that the Designer does not allow you to connect multiple active
transformations, or an active and a passive transformation, to the same downstream transformation
or transformation input group, because the Integration Service may not be able to concatenate the
rows passed by active transformations. However, the Sequence Generator transformation (SGT) is an
exception to this rule. An SGT does not receive data; it generates unique numeric values. As a
result, the Integration Service does not encounter problems concatenating rows passed by an SGT
and an active transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through it, maintains the
transaction boundary, and maintains the row type.
The key point to note is that the Designer allows you to connect multiple transformations to the same
downstream transformation or transformation input group only if all transformations in the
upstream branches are passive. The transformation that originates the branch can be active or
passive.
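The active/passive distinction above can be sketched in plain Python. This is an illustrative model only, not Informatica code; the row dictionaries and function names are hypothetical.

```python
# Hypothetical sketch (not Informatica code): modeling transformations as
# functions over a stream of rows. An active transformation may change the
# number of rows; a passive one emits exactly one output row per input row.

def filter_rows(rows, condition):
    """Active: rows failing the condition are dropped, so row count can change."""
    return [row for row in rows if condition(row)]

def uppercase_name(rows):
    """Passive: each row is transformed in place; row count is preserved."""
    return [{**row, "name": row["name"].upper()} for row in rows]

rows = [{"name": "ann", "dept": "HR"}, {"name": "bob", "dept": "IT"}]

active_out = filter_rows(rows, lambda r: r["dept"] == "IT")
passive_out = uppercase_name(rows)

assert len(active_out) == 1           # row count changed -> active behavior
assert len(passive_out) == len(rows)  # row count preserved -> passive behavior
```

This is why the Designer's concatenation rule exists: two active branches may emit different numbers of rows, so their outputs cannot be matched up row for row downstream.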
Transformations can be Connected or UnConnected to the data flow.
Connected Transformation
A connected transformation is connected to other transformations or directly to a target table in the mapping.
UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is
called within another transformation, and returns a value to that transformation.
Informatica Transformations
Following is the list of transformations available in Informatica:
Aggregator Transformation
Application Source Qualifier Transformation
Custom Transformation
Data Masking Transformation
Expression Transformation
External Procedure Transformation
Filter Transformation
HTTP Transformation
Input Transformation
Java Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Output Transformation
Rank Transformation
Reusable Transformation
Router Transformation
Sequence Generator Transformation
Sorter Transformation
Source Qualifier Transformation
SQL Transformation
Stored Procedure Transformation
Transaction Control Transformation
Union Transformation
Unstructured Data Transformation
Update Strategy Transformation
XML Generator Transformation
XML Parser Transformation
XML Source Qualifier Transformation
Advanced External Procedure Transformation
External Transformation
In the following pages, we will explain all the above Informatica Transformations and their
significances in the ETL process in detail.
Aggregator Transformation
The Aggregator transformation performs aggregate functions like average, sum, count, etc. on multiple
rows or groups. The Integration Service performs these calculations as it reads, and stores group
and row data in an aggregate cache. It is an Active & Connected transformation.
Difference between the Aggregator and Expression transformations? The Expression transformation permits
you to perform calculations on a row-by-row basis only. In the Aggregator you can perform calculations
on groups.
An example Aggregator transformation might have the ports State, State_Count, Previous_State, and
State_Counter.
Components: Aggregate Cache, Aggregate Expression, Group by port, Sorted input.
Aggregate Expressions: These are allowed only in Aggregator transformations. They can include conditional
clauses and non-aggregate functions, and can also include one aggregate function nested inside another
aggregate function.
Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE,
STDDEV, SUM, VARIANCE
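The group-by behavior described above can be sketched as follows. This is a minimal illustrative model in plain Python, not Informatica code; the field names and the `aggregate` helper are hypothetical, and the dictionary stands in for the aggregate cache.

```python
# Hypothetical sketch (not Informatica code): the group-by semantics of an
# Aggregator transformation. Rows are collected per group-by key (like the
# aggregate cache), then one output row per group is emitted.
from collections import defaultdict

def aggregate(rows, group_by, agg_field):
    """Group rows on `group_by` and compute SUM and COUNT of `agg_field`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[agg_field])
    # One output row per group -- this is why the Aggregator is active:
    # the output row count differs from the input row count.
    return {key: {"SUM": sum(vals), "COUNT": len(vals)}
            for key, vals in groups.items()}

rows = [
    {"state": "NY", "sales": 100},
    {"state": "NY", "sales": 50},
    {"state": "CA", "sales": 200},
]
result = aggregate(rows, "state", "sales")
assert result["NY"] == {"SUM": 150, "COUNT": 2}
assert result["CA"] == {"SUM": 200, "COUNT": 1}
```

Sorted input speeds this up in practice: if rows arrive ordered by the group-by key, each group can be emitted as soon as the key changes, instead of caching every group until the end.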
Application Source Qualifier Transformation
Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session. It is an Active & Connected transformation.
Custom Transformation
It works with procedures you create outside the Designer interface to extend PowerCenter
functionality, calling a procedure from a shared library or DLL. It is an active/passive & connected
type.
You can use a Custom transformation to create transformations that require multiple input groups and multiple output groups.
The Custom transformation allows you to develop the transformation logic in a procedure. Some of
the PowerCenter transformations are built using the Custom transformation. Rules that apply to
Custom transformations, such as blocking rules, also apply to transformations built using Custom
transformations. PowerCenter provides two sets of functions called generated and API functions.
The Integration Service uses generated functions to interface with the procedure. When you
create a Custom transformation and generate the source code files, the Designer includes the
generated functions in the files. Use the API functions in the procedure code to develop the
transformation logic.
Difference between the Custom and External Procedure transformations? In a Custom transformation, input and
output functions occur separately. The Integration Service passes the input data to the procedure
using an input function. The output function is a separate function that you must enter in the
procedure code to pass output data to the Integration Service. In contrast, in the External
Procedure transformation, an external procedure function does both input and output, and its
parameters consist of all the ports of the transformation.
Data Masking Transformation
Passive & Connected. It is used to change sensitive production data to realistic test data for non-production environments. It creates masked data for development, testing, training, and data mining. Data relationships and referential integrity are maintained in the masked data.
For example, it returns a masked value that has a realistic format for an SSN, credit card number,
birthdate, phone number, etc., but is not a valid value. Masking types: Key Masking, Random
Masking, Expression Masking, Special Mask Format. The default is no masking.
Expression Transformation
Passive & Connected. It is used to perform non-aggregate functions, i.e., to calculate values in a
single row. Examples: calculating the discount for each product, concatenating first and last names,
or converting a date to a string field.
You can create an Expression transformation in the Transformation Developer or the Mapping
Designer. Components: Transformation, Ports, Properties, Metadata Extensions.
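The row-by-row nature of the Expression transformation can be sketched as a function of a single row. This is an illustrative Python model, not Informatica code; the port names are hypothetical.

```python
# Hypothetical sketch (not Informatica code): an Expression transformation
# computes derived output ports from input ports on the SAME row only --
# it never looks across rows, which is why it is passive.

def full_name_expr(row):
    """Derive FULL_NAME from the FIRST and LAST ports of one row."""
    return {**row, "FULL_NAME": row["FIRST"] + " " + row["LAST"]}

out = full_name_expr({"FIRST": "Jane", "LAST": "Doe"})
assert out["FULL_NAME"] == "Jane Doe"
```

Contrast this with the Aggregator, whose expressions operate over whole groups of rows rather than a single row.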
External Procedure
Passive & Connected or Unconnected. It works with procedures you create outside of the
Designer interface to extend PowerCenter functionality. You can create complex functions
within a DLL or in the COM layer of Windows and bind them to an External Procedure transformation.
To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation
interface built into PowerCenter. You must be an experienced programmer to use TX, and you must use
multi-threaded code in external procedures.
Filter Transformation
Active & Connected. It allows rows that meet the specified filter condition to pass through and removes the rows
that do not meet the condition. For example, use it to find all the employees who are working in
New York, or to find all the faculty members teaching Chemistry in a state. The input ports for
the filter must come from a single transformation. You cannot concatenate ports from more than
one transformation into the Filter transformation. Components: Transformation, Ports,
Properties, Metadata Extensions.
HTTP Transformation
Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve or post data, passing the result to the target or a downstream transformation in the mapping.
Authentication types: Basic, Digest, and NTLM. Request types: GET, POST, and SIMPLE POST.
Java Transformation
Active or Passive & Connected. It provides a simple native programming interface to define
transformation functionality with the Java programming language. You can use the Java
transformation to quickly define simple or moderately complex transformation functionality
without advanced knowledge of the Java programming language or an external Java
development environment.
Joiner Transformation
Active & Connected. It is used to join data from two related heterogeneous sources residing in
different locations, or to join data from the same source. To join two sources, there must
be at least one pair of matching columns between the sources, and you must specify one
source as the master and the other as the detail. For example: joining a flat file and a relational source,
joining two flat files, or joining a relational source and an XML source.
The Joiner transformation supports the following types of joins:
Normal
Normal join discards all the rows of data from the master and detail source that do not
match, based on the condition.
Master Outer
Master outer join discards all the unmatched rows from the master source and keeps all
the rows from the detail source and the matching rows from the master source.
Detail Outer
Detail outer join keeps all rows of data from the master source and the matching rows
from the detail source. It discards the unmatched rows from the detail source.
Full Outer
Full outer join keeps all rows of data from both the master and detail sources.
Limitations on the pipelines you connect to the Joiner transformation:
*You cannot use a Joiner transformation when either input pipeline contains an Update Strategy
transformation.
*You cannot use a Joiner transformation if you connect a Sequence Generator transformation
directly before the Joiner transformation.
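The four join types can be sketched as one parameterized function. This is an illustrative Python model, not Informatica code; caching the master side and streaming the detail side roughly mirrors how the Integration Service processes a Joiner, but the `join` helper and its row layout are hypothetical.

```python
# Hypothetical sketch (not Informatica code): Joiner semantics on a single
# equality key. The master rows are cached; the detail rows are streamed.

def join(master, detail, key, join_type="normal"):
    cache = {}
    for m in master:                      # build the master cache
        cache.setdefault(m[key], []).append(m)
    out, matched_keys = [], set()
    for d in detail:                      # stream the detail rows
        matches = cache.get(d[key], [])
        if matches:
            matched_keys.add(d[key])
            out.extend({**m, **d} for m in matches)
        elif join_type in ("master_outer", "full_outer"):
            out.append(d)                 # keep unmatched DETAIL rows
    if join_type in ("detail_outer", "full_outer"):
        for k, rows in cache.items():
            if k not in matched_keys:
                out.extend(rows)          # keep unmatched MASTER rows
    return out

master = [{"id": 1, "m": "a"}, {"id": 2, "m": "b"}]
detail = [{"id": 1, "d": "x"}, {"id": 3, "d": "y"}]

assert len(join(master, detail, "id", "normal")) == 1        # matches only
assert len(join(master, detail, "id", "master_outer")) == 2  # + unmatched detail
assert len(join(master, detail, "id", "detail_outer")) == 2  # + unmatched master
assert len(join(master, detail, "id", "full_outer")) == 3    # everything
```

Note the naming: a "master outer" join keeps the unmatched rows of the detail source (discarding unmatched master rows), and vice versa, which is easy to get backwards.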
Lookup Transformation
Passive & Connected or UnConnected. It is used to look up data in a flat file, relational table,
view, or synonym. It compares Lookup transformation ports (input ports) to the lookup source column
values based on the lookup condition. The returned values can then be passed to other
transformations. You can create a lookup definition from a source qualifier, and you can use
multiple Lookup transformations in a mapping.
You can perform the following tasks with a Lookup transformation:
*Get a related value. Retrieve a value from the lookup table based on a value in the source. For
example, the source has an employee ID. Retrieve the employee name from the lookup table.
*Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For
example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
*Update slowly changing dimension tables. Determine whether rows exist in a target.
Lookup Components: Lookup source, Ports, Properties, Condition.
Types of Lookup:
1) Relational or flat file lookup.
2) Pipeline lookup.
3) Cached or uncached lookup.
4) Connected or unconnected lookup.
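The "get a related value" task above can be sketched with a cached lookup. This is an illustrative Python model, not Informatica code; the port names (`EMP_ID`, `EMP_NAME`) and helper functions are hypothetical.

```python
# Hypothetical sketch (not Informatica code): a cached lookup. The lookup
# source is read once into a dictionary (the lookup cache); each source row
# is then enriched with the related value via the lookup condition (equality
# on the key port here).

def build_lookup_cache(lookup_rows, key, value):
    """Read the lookup source once into a key -> value cache."""
    return {r[key]: r[value] for r in lookup_rows}

def lookup(row, cache, key, out_port, default=None):
    """Return the source row extended with the looked-up value."""
    return {**row, out_port: cache.get(row[key], default)}

employees = [{"EMP_ID": 1, "EMP_NAME": "Ann"}, {"EMP_ID": 2, "EMP_NAME": "Bob"}]
cache = build_lookup_cache(employees, "EMP_ID", "EMP_NAME")

enriched = lookup({"EMP_ID": 2, "SALES": 10}, cache, "EMP_ID", "EMP_NAME")
assert enriched["EMP_NAME"] == "Bob"
```

In the connected style, the enriched row flows on through the mapping; in the unconnected style, another transformation calls the lookup as needed and receives the single return value.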