2. TYPICAL USAGE - CUSTOMER DATA
OPERATIONS
• Data De-Duping
• Data Standardization
• Data Analysis and Data Profiling
• Data Consolidation from various sources
• Comparing multiple data sets as per predefined parameters
• Inserting data into target databases
• Match-at-a-glance reports for new acquisitions
3. DF NODE - DATA INPUTS
DF can read from various input sources, such as relational
databases (using queries), Excel files, Access files, and text files.
These sources are connected via ODBC.
Example: a query is entered in the SQL Query node after selecting a
database or Access file in the node properties.
For Excel, the area to read must be defined as a named range using
Name Manager under the Formulas tab in the Excel sheet. Excel sheets
are read with the Data Source Input node.
4. DF NODE - DATA OUTPUTS
DF can write a job or result output to Excel, Access, text files, or a
relational database such as Oracle or SQL Server.
DF uses Insert/Update/Target/Output utilities for the data output
stage.
Examples: the output result can be inserted directly into a
database table by using the Data Target (Insert) node.
Output can also be written to a text file via the Text File Output node.
5. DF NODE – QUALITY
• Standardization
dfPower Architect's Standardization node is used to make similar items the
same.
The available standardization definitions include Name, Address,
Organization, Zip, Phone, Email Address, Country, State, Non-Alphanumeric
Remover, Numeric Remover, Alphanumeric Remover, Space Remover,
Quotation Remover, etc.
Various schemas can also be selected; these are defined in the DataFlux
Quality Knowledge Base (QKB).
For example, using full company names instead of initials ("International
Business Machines" vs. "IBM").
6. DF NODE – QUALITY
• Standardization (more examples): addresses
"1 Comcast Center" to "1 Comcast Ctr"
"10 Glenlake Pkwy north east" to "10 Glenlake Pkwy NE"
"North Dakota" vs. "ND"
"United States" vs. "USA"
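The address examples above can be sketched as a simple lookup-and-replace. This is only an illustration in Python, not how the Standardization node is implemented: the ADDRESS_SCHEME table and the standardize function are hypothetical names, and a real scheme would come from the QKB.

```python
import re

# Hypothetical abbreviation table (assumption); real schemes live in the QKB.
ADDRESS_SCHEME = {
    "center": "Ctr",
    "parkway": "Pkwy",
    "north east": "NE",
    "northeast": "NE",
}


def standardize(value: str, scheme: dict[str, str]) -> str:
    """Replace every scheme key found as a whole word or phrase, ignoring case."""
    result = value
    # Apply longer phrases first so multi-word keys are not split by shorter ones.
    for phrase in sorted(scheme, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        result = re.sub(pattern, scheme[phrase], result, flags=re.IGNORECASE)
    return result


print(standardize("1 Comcast Center", ADDRESS_SCHEME))
print(standardize("10 Glenlake Pkwy north east", ADDRESS_SCHEME))
```

Values with no matching phrase pass through unchanged, so the same call can be applied to a whole column.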
7. DF NODE – QUALITY
• Parsing
dfPower Architect's Parsing node is a simple but intelligent tool for
separating multi-part field values into multiple, single-part fields.
For example, if you have a Name field that includes the value "Mr.
Igor Bela Bonski III, Esq.," you can use parsing to create six separate
fields:
Name Prefix: "Mr."
Given Name: "Igor"
Middle Name: "Bela"
Family Name: "Bonski"
Name Suffix: "III"
Name Appendage: "Esq."
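A minimal Python sketch of the same idea follows. It assumes small, hypothetical word lists for prefixes, suffixes, and appendages; the actual Parsing node relies on QKB definitions and handles far more cases than this.

```python
# Hypothetical vocabularies (assumption); not taken from the QKB.
PREFIXES = {"mr.", "mrs.", "ms.", "dr."}
SUFFIXES = {"jr.", "sr.", "ii", "iii", "iv"}
APPENDAGES = {"esq.", "phd", "md"}


def parse_name(full_name: str) -> dict[str, str]:
    """Split a full name into six single-part fields."""
    tokens = [token.strip(",") for token in full_name.split()]
    fields = {"prefix": "", "given": "", "middle": "",
              "family": "", "suffix": "", "appendage": ""}
    if tokens and tokens[0].lower() in PREFIXES:
        fields["prefix"] = tokens.pop(0)
    if tokens and tokens[-1].lower() in APPENDAGES:
        fields["appendage"] = tokens.pop()
    if tokens and tokens[-1].lower() in SUFFIXES:
        fields["suffix"] = tokens.pop()
    if tokens:
        fields["given"] = tokens.pop(0)
    if tokens:
        fields["family"] = tokens.pop()
    # Whatever remains between given and family name is treated as middle name.
    fields["middle"] = " ".join(tokens)
    return fields


print(parse_name("Mr. Igor Bela Bonski III, Esq."))
```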
8. DF NODE – INTEGRATION
• Match Codes
dfPower Architect's Match Codes node is used to identify duplicate records
in your data. This step creates match codes, which are used to evaluate
the duplicate entries in your data and eliminate the extras.
Match sensitivity can be set from 50% (lowest) to 100% (exact), and
various schemas can be selected.
FieldName      Definition           Sensitivity
AccountName    Business Title       85%
Address_Line1  Address/AddressLong  85%
City           City                 Exact-All, Exact-10 characters
Country        Country              Exact-All, Exact-10 characters
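To give a feel for what a sensitivity setting does, the sketch below uses a stand-in technique: a string-similarity ratio from Python's difflib compared against a threshold. DataFlux match codes are generated differently, so the similarity and is_match functions here are purely illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(a: str, b: str, sensitivity: float = 0.85) -> bool:
    """Treat two values as a match when their similarity reaches the threshold."""
    return similarity(a, b) >= sensitivity


# A small typo still matches at 85%; unrelated names do not.
print(is_match("International Business Machines", "Internationl Business Machines"))
print(is_match("Eaton Corporation", "Metlife Incorporated"))
```

Raising the threshold toward 1.0 accepts only near-identical values, which mirrors the move from 85% toward an exact match in the table above.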
9. DF NODE – INTEGRATION
• Clustering
dfPower Architect's Data Clustering node uses the clustering
functionality to group matching duplicates, or sets of
unique records, according to the conditions defined. See the cluster
numbers in the example below.
Cluster AccountName AccountAddress1 MatchCriteria
7231 NewJerseyManufacturersInsuranceCompany 301SullivanWay ExactCompanyName+Address-1
7231 NewJerseyManufacturersInsuranceCompany 301SullivanWay ExactCompanyName+Address-1
7231 NewJerseyManufacturersInsuranceCompany 301SullivanWay ExactCompanyName+Address-1
7663 Metlife,Incorporated 27-01QueensPlzN ExactCompanyName+Address-1
7663 Metlife,Incorporated 27-01QueensPlzN ExactCompanyName+Address-1
7791 EatonCorporation 34899CurtisBlvd ExactCompanyName+Address-1
7791 EatonCorporation 34899CurtisBlvd ExactCompanyName+Address-1
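The grouping shown in the table can be sketched in Python as follows. The sketch assumes an "exact company name + address line 1" condition and uses a simplified key (lowercase letters and digits only) in place of real match codes; normalize and assign_clusters are hypothetical names, and the cluster numbers are simply assigned in order of first appearance.

```python
import re


def normalize(value: str) -> str:
    """Keep only lowercase letters and digits so spacing and punctuation are ignored."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def assign_clusters(records: list[tuple[str, str]]) -> list[int]:
    """Return one cluster number per (name, address) record."""
    cluster_ids: dict[tuple[str, str], int] = {}
    result = []
    for name, address in records:
        key = (normalize(name), normalize(address))
        # The first record with a new key opens a new cluster.
        if key not in cluster_ids:
            cluster_ids[key] = len(cluster_ids) + 1
        result.append(cluster_ids[key])
    return result


records = [
    ("Metlife, Incorporated", "27-01 Queens Plz N"),
    ("Eaton Corporation", "34899 Curtis Blvd"),
    ("Metlife,Incorporated", "27-01QueensPlzN"),
]
print(assign_clusters(records))
```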
10. DF NODE – INTEGRATION
• Surviving Record Identification
DFPower Architect's Surviving Record Identification (SRI) node
examines clustered data and determines a surviving record for each
cluster. This process lets you eliminate duplicate information in a
data source. The surviving record is identified using one or more
user-configurable record rules. The user may also enter field rules to
perform automated field-level edits of the surviving record's data
during SRI processing. The SRI step can be configured to keep all
existing data, marking the surviving records with a flag or primary
key value, or it can remove all data except for that associated with
the surviving records.
Example: suppose the system holds a set of duplicate accounts and addresses,
and you need to keep one distinct record from those
duplicates, but that record should have a proper phone number.
You can use the SRI node and define the selection rule
in the properties of the SRI node. Please see the example given on the next
slide.
11. DF NODE – INTEGRATION
• Surviving Record Identification
Example (continued): see the Cluster column and the
SurvivingRecord column below. Each cluster has only
one surviving record.
Cluster AccountName AccountAddress1 Phone SurvivingRecord
7231 NewJerseyManufacturersInsuranceCompany 301SullivanWay (609) 883-1300 TRUE
7231 NewJerseyManufacturersInsuranceCompany 301SullivanWay Null FALSE
7231 NewJerseyManufacturersInsuranceCompany 301SullivanWay 987 FALSE
7663 Metlife,Incorporated 27-01QueensPlzN 1-800-638-5000 TRUE
7663 Metlife,Incorporated 27-01QueensPlzN Null FALSE
7791 EatonCorporation 34899CurtisBlvd 1-900-735-5674 TRUE
7791 EatonCorporation 34899CurtisBlvd Null FALSE
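A record rule like the one in this example can be sketched in Python. The rule used here, "the record whose phone has the most digits survives, first record wins ties", is an assumption chosen to reproduce the table; the SRI node's own rule options are configured in its properties and are not shown in this deck.

```python
import re


def digit_count(phone) -> int:
    """Number of digits in a phone value; missing values count as zero."""
    return len(re.sub(r"\D", "", phone or ""))


def mark_survivors(records: list[dict]) -> list[dict]:
    """Return copies of the records with a boolean 'survivor' flag per cluster."""
    best: dict[int, tuple[int, int]] = {}  # cluster -> (score, record index)
    for index, record in enumerate(records):
        score = digit_count(record["phone"])
        cluster = record["cluster"]
        # Strictly greater, so the earliest record wins a tie.
        if cluster not in best or score > best[cluster][0]:
            best[cluster] = (score, index)
    winners = {index for _, index in best.values()}
    return [dict(record, survivor=(index in winners))
            for index, record in enumerate(records)]


rows = [
    {"cluster": 7231, "phone": "(609) 883-1300"},
    {"cluster": 7231, "phone": None},
    {"cluster": 7231, "phone": "987"},
    {"cluster": 7663, "phone": "1-800-638-5000"},
    {"cluster": 7663, "phone": None},
]
print([row["survivor"] for row in mark_survivors(rows)])
```

This keeps all existing data and only flags the survivor, which corresponds to the "keep all data, mark surviving records" configuration described earlier.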
12. DF MATCH EXAMPLES
• Standardization and match codes combined in a job flow give
remarkable results, as shown below.
Exact or 100% Match results
Input-COMPANYNAME Matched/OutputCompanyName ADDRESS1(Input) ADDR(Matched)
NetscapeCommunicationsCorporation NetscapeCommunicationsCorporation 501EMiddlefieldRd 501EMiddlefieldRd
Alston&BirdLLP Alston&BirdLLP 1201WPeachtreeSt 1201WPeachtreeSt
GeorgiaPerimeterCollege GeorgiaPerimeterCollege 3251PanthersvilleRd 3251PanthersvilleRd
CountyofOneida CountyofOneida 800ParkAve 800ParkAve
EliLillyandCompany EliLillyandCompany POBox6034 POBox6034
ActuateCorporation ActuateCorporation 2207BridgepointePkwy.Ste.500 2207BridgepointePkwySte500
ShrinersHospitalsForChildren ShrinersHospitalsForChildren 3551NBroadSt 3551NBroadSt
CatholicHealthInitiatives CatholicHealthInitiatives 440CreameryWay 440CreameryWay
ElPasoElectricCompany ElPasoElectricCompany 123WMillsAve 123WMillsAve
15. DF NODE – UTILITIES
• Data Joining Node
This node is used to join data from various sources, such as
two different databases, Excel files, or Access files.
dfPower Architect's Data Joining job flow step is based on the
SQL concept of JOIN. You can use Data Joining to combine two
data sets in an intelligent way so that the records of one, the
other, or both data sets are used as the basis for the resulting
data set.
16. DF NODE – UTILITIES
• SQL Lookup
SQL Lookup lets the user find rows in a database table that
have one or more fields matching those in the job flow. It
offers a clear performance advantage, especially
with large databases, since the large database is not copied
locally to the hard drive in order to perform the operation (as
is the case with joins).
17. DF NODE – UTILITIES
• SQL Execute
This is a stand-alone node (no parents or children) that lets you
construct and execute any valid SQL statement (or series of
statements). It performs database-specific tasks either
before, after, or in between Architect job flows.
Examples: SQL statements such as UPDATE, DELETE, or COMMIT for a
particular table can be used in this node.
18. DF NODE – UTILITIES
• Data Union
DFPower Architect's Data Union node is based on the SQL concept of
UNION. As with Data Joining, use the Data Union node to combine data
from two data sets. Unlike Data Joining, however, Data Union does not
perform an intelligent combination. Rather, Data Union simply adds the
two data sets together; the resulting data set contains one record for
each record in each of the original data sets.
Example: data from two or more sheets, databases, or DF job flows needs
to be combined into one data set. This node performs that task.
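The difference between Data Joining and Data Union can be sketched in Python on small lists of records. The join and union_all functions are hypothetical helpers written for this illustration, not DataFlux APIs; the join supports only the "inner" and "left" variants.

```python
def join(left: list[dict], right: list[dict], key: str, how: str = "inner") -> list[dict]:
    """Combine records that share the same key value (SQL JOIN concept)."""
    index: dict = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    result = []
    for left_row in left:
        matches = index.get(left_row[key], [])
        if matches:
            result.extend({**left_row, **right_row} for right_row in matches)
        elif how == "left":
            # Left join keeps unmatched left records as they are.
            result.append(dict(left_row))
    return result


def union_all(first: list[dict], second: list[dict]) -> list[dict]:
    """Append one data set to the other; every input record appears once."""
    return list(first) + list(second)


accounts = [{"id": 1, "name": "Eaton Corporation"}, {"id": 2, "name": "Metlife, Incorporated"}]
phones = [{"id": 1, "phone": "1-900-735-5674"}]

print(join(accounts, phones, "id"))
print(len(union_all(accounts, phones)))
```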
19. DF NODE – UTILITIES
• Branch
This step lets multiple children (up to 32) simultaneously
access data from a single source. Depending on the step's
configuration and the children's access patterns, data can be passed
from the parent directly to each of the children, or it may be
temporarily stored in memory and/or disk caches before being
passed to the children.
In other words, it takes one input and feeds multiple outputs (maximum 32).
20. DF NODE – UTILITIES
• Concatenate
dfPower Architect's Concatenate node performs the opposite
function of the Parse node. Rather than separating a single field
into multiple fields, Concatenate combines one or more fields
into a single field.
Example
Prefix: Mr   First Name: Rahul   Last Name: Jain
Concatenate output: Mr Rahul Jain
21. DF NODE – UTILITIES
• Expression
Use dfPower Architect's Expression node to run a Visual Basic-like
language to process your data sets in ways that are not built into
dfPower Studio. The Expression language provides many statements,
functions, and variables for manipulating data.
Example: creating a column named Match_Criteria in the middle of a job
flow. The syntax would be:
Expression: Match_Criteria = " "
Pre-Processing Expression: string Match_Criteria
22. DF NODE – UTILITIES
• Data Sorting
Use dfPower Architect's Data Sorting node to re-order
your data set (ascending or descending) at any
point in a job flow.
23. DF NODE – PROFILING
• Basic Statistics
dfPower Architect's Basic Statistics node is used to calculate
statistics about your data, such as value ranges, counts, or
sums for any given field.
The Basic Statistics node is typically used on numerical rather
than text fields. However, statistics such as Count, Missing,
MAX, and MIN can be useful on any field type.
It can also be used in the middle of a job for fault
finding, by checking the record counts at each step.
Example: basic statistics of a Siebel table
            Row_Id  Created  Created_By  Account Name  Partner Flag  Email Addr  Phone   CSN
Records     267413  267413   267413      267413        267413        267413      267413  267413
Count       267413  267413   267413      267413        267413        5           72552   181643
Null Count  0       0        0           0             0             267408      194861  85770
Distinct    yes     yes      yes         yes           no            yes         yes     yes
Min 1 0-5200 1/1/1980 0:00 0-1 N dllee@pentasoft.co.kr ###iswrong 1
Max 1 O-2 9/9/2010 21:55 1-XVOET ültje GmbH Y tloughran@infopath.net xxxxxxxxx
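The per-field figures in the table can be sketched in Python. This is only an approximation of what the node reports: it assumes that None and empty strings count as missing, and it returns the number of unique values rather than the yes/no Distinct flag shown above. In the first three rows of the table, Count plus Null Count equals Records, and the sketch follows the same rule.

```python
def basic_stats(values: list) -> dict:
    """Compute simple profile statistics for one field."""
    present = [v for v in values if v is not None and v != ""]
    return {
        "records": len(values),
        "count": len(present),
        "null_count": len(values) - len(present),
        "unique_values": len(set(present)),
        # min/max work on any comparable type: numbers, text, dates.
        "min": min(present, default=None),
        "max": max(present, default=None),
    }


partner_flag = ["Y", None, "N", "Y", ""]
print(basic_stats(partner_flag))
```

Running the same function before and after a step in a job makes it easy to spot where records were lost.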
24. DF NODE – PROFILING
• Pattern Analysis
DFPower Architect's Pattern Analysis node is used to generate a
new field containing alphanumeric patterns that represent each
value in a selected field. You can specify whether these patterns
represent each character or each word (as separated by spaces)
in a field.
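Both pattern styles can be sketched in Python. The symbols used here are assumptions made for the illustration ("A" for an uppercase letter, "a" for a lowercase letter, "9" for a digit, "*" for a mixed word); the node's actual pattern alphabet may differ.

```python
def char_pattern(value: str) -> str:
    """One pattern symbol per character; punctuation and spaces are kept as-is."""
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            symbols.append("A" if ch.isupper() else "a")
        else:
            symbols.append(ch)
    return "".join(symbols)


def word_pattern(value: str) -> str:
    """One pattern symbol per space-separated word."""
    symbols = []
    for word in value.split():
        if word.isdigit():
            symbols.append("9")
        elif word.isalpha():
            symbols.append("A")
        else:
            symbols.append("*")  # mixed letters, digits, or punctuation
    return " ".join(symbols)


print(char_pattern("(609) 883-1300"))
print(word_pattern("301 Sullivan Way"))
```

Grouping a column by its pattern quickly exposes outliers, such as a phone field holding text.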
25. DF NODE – PROFILING
• Frequency Distribution
DFPower Architect's Frequency Distribution node is used to
calculate the number of occurrences of each unique value in a
field.
For example, Frequency Distribution can determine how many
customers in your customer database are in each of the 50 US
states, the District of Columbia, and the 13 Canadian
provinces and territories.
State  Count of Customers  %Total
CA 19593 12
CO 4041 2
CT 2807 1
DC 2555 1
DE 746 0
FL 7105 4
GA 5198 3
GE 1 0
GEO GEO_count GEO %
Americas 187235 57
AsiaPacific 30642 9
EMEA 107412 33
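A frequency distribution of this kind can be sketched in Python with collections.Counter. The function names are hypothetical. Percentages are truncated to whole numbers, which is an assumption: it reproduces the 57 / 9 / 33 figures in the GEO table if those three rows make up the whole total.

```python
from collections import Counter


def with_percent(counts: dict) -> list[tuple]:
    """Return (value, count, whole-number percent) rows, largest count first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))
    return [(value, n, 100 * n // total) for value, n in ordered]


def frequency_distribution(values: list) -> list[tuple]:
    """Count the occurrences of each unique value in a field."""
    return with_percent(Counter(values))


geo_counts = {"Americas": 187235, "AsiaPacific": 30642, "EMEA": 107412}
for row in with_percent(geo_counts):
    print(row)

print(frequency_distribution(["CA", "CA", "CO", "GE"]))
```

A value with a very low count, such as the single "GE" row in the state table, is often a data entry error worth checking.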
26. DF NODE – PROFILING
• Data validation
DFPower Architect's Data Validation node is used to analyze
the content of data by setting validation conditions. These
conditions create validation expressions that you can use to
filter data for a more accurate view of that data.
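Validation conditions can be thought of as one check per field, with rows split by whether they pass every check. The Python sketch below assumes two hypothetical rules (a loosely shaped email address and a phone number with at least 10 digits); real conditions are built in the node's properties.

```python
import re

# Hypothetical validation rules (assumption): field name -> check function.
RULES = {
    "email": lambda v: bool(v) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "phone": lambda v: bool(v) and len(re.sub(r"\D", "", v)) >= 10,
}


def validate(rows: list[dict], rules: dict) -> tuple[list[dict], list[dict]]:
    """Split rows into (valid, invalid) according to the per-field rules."""
    valid, invalid = [], []
    for row in rows:
        passed = all(check(row.get(field)) for field, check in rules.items())
        (valid if passed else invalid).append(row)
    return valid, invalid


rows = [
    {"email": "dllee@pentasoft.co.kr", "phone": "(609) 883-1300"},
    {"email": "tloughran@infopath.net", "phone": "###iswrong"},
    {"email": None, "phone": "1-800-638-5000"},
]
good, bad = validate(rows, RULES)
print(len(good), len(bad))
```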
27. DF NODE – ENRICHMENT
• Address Verification
Use the dfPower Architect Address Verification (US/Canada/World) node to
verify, correct, and enhance any addresses in your existing data (QKB).
Address Verification (US/Canada/World) uses geographic information
from various reference databases to match and standardize addresses.
You can also use Address Verification (US/Canada) for proper casing and
CASS/SERP compliance. Each address is classified by the result codes
listed on the following slides, which indicate how
valid the address is.
28. DF NODE – ENRICHMENT
• For US Addresses
Text Result Code  Numeric Result Code  Description
OK                0                    Address was verified successfully.
PARSE             11                   Error parsing address. Components of the address may be missing.
CITY              12                   Could not locate city/state or zip in the USPS database. At least (city and state) or ZIP must be present in the input.
MULTI             13                   Ambiguous address. There were two or more possible matches for this address with differing data.
NOMATCH           14                   No matching address found in the USPS data.
OVER              15                   One or more input strings is too long (maximum 100 characters).
29. • For Canada Addresses
Result Code Description
0 No error occurred
1 Internal error
2 Cannot load database
3 Invalid - unspecified reason
4 Invalid civic number
5 Invalid street
6 Invalid unit
7 Invalid delivery mode
8 Invalid delivery installation
9 Invalid city
10 Invalid province
11 Invalid postal code
12 Address is not Canadian
30. • Rest of World (excluding US and Canada)
Result Code Description
0 Address correct as entered.
1 Address corrected automatically.
2 Address needs to be corrected, but could not
3 Address needs to be corrected, but could not be determined automatically. There is a fair
4 Address needs to be corrected, but could not be determined automatically. There is a small
31. DF NODE – MONITORING
• Data Monitoring
The Data Monitoring node enables you to analyze data according to
business rules you create using the Business Rule Manager. The
business rules you create in Rule Manager can analyze the structure of
the data and trigger an event, such as logging a message or sending an
email alert, when a condition is detected. By using the Data Monitoring
node, you can insert these business rules in your job flow to analyze
data at various points in the flow.