Oracle Data Integrator in Swedbank EDW - Rein Adamson and Mart Tudre
- 1. Oracle Data Integrator
ETL software in Swedbank EDW
2007 – 2011
Mart Tudre – Swedbank Baltic DW architect
Rein Adamson – Project Manager
© Swedbank
- 2. Agenda
• EDW - Enterprise Data Warehouse
– EDW, BI definitions
– Swedbank Baltic DW - general facts
• ETL software evaluation 2007
– ETL Software evaluation and Proof of Concept 2007
– ODI Implementation project
– User roles today
• ODI implementation in Swedbank Baltic DW
– ODI defining features
– Usage specifics and custom components
- 3. Data Warehouse – a definition
• A data warehouse is a repository of an organization's electronically stored data,
designed to facilitate reporting and analysis.
• An expanded definition for data warehousing includes tools for
– business intelligence
– extracting, transforming and loading data into the repository
– managing and retrieving metadata.
Business intelligence - computer-based techniques used in spotting, digging-out, and
analyzing business data
Source: wikipedia.org
ETL – Extract, Transform, Load
EDW – Enterprise Data Warehouse (also IT org.unit in Swedbank)
- 4. Business Intelligence functions
• predictive analytics (statistics, data mining)
• online analytical processing (OLAP)
• business performance management
• benchmarking
• text mining
• reporting
- 5. Data Warehouse architecture
Figure: architecture layers, top to bottom: Analytical Users; Replication; Enterprise Data Warehouse, Integrated Data Marts; Data Transformation; Operational Data Source; Business Users.
- 6. Data flows
Figure: data flow layers, top to bottom:
• Analytical services: FM, RM, CM, CB
• Data delivery
• Central data store, organized by subject area:
– PARTY: An individual, business or group of individuals of interest to the financial institution.
– ASSET: Things parties have an interest in that have value.
– AGREEMENT: A contract or any type of agreement of interest between Parties.
– FINANCE: The internal accounting of the business.
– EVENT: Something of interest that happened that may or may not involve contact with the customer.
– PRODUCT: Any marketable product or service, including terms, conditions and features.
– INTERNAL ORGANIZATION: A Party that is a unit of business.
– CHANNEL: The vehicle by which a party may interact with the financial institution.
– LOCATION: A physical address, electronic address or geographical area.
– CAMPAIGN: A communication plan to deliver a message.
• Data acquisition
• Source systems: LOAN, DEPOSIT, CARDS, LEASING, GL, ...
- 7. Swedbank Baltic DW
Swedbank Baltic Data Warehouse (EDW) is a subject oriented,
integrated, time-variant, non-volatile collection of enterprise data.
– Subject Oriented: Information is organized by subject area instead of following the data structures of business-line-specific source systems. Subject areas are Party, Product, Agreement, Channel, Organization, Event etc.
– Integrated: Data gathered into the data warehouse from a variety of sources is merged into a coherent whole under unified governance, using agreed dimensions such as Party, Product, Agreement, Channel, Organisation etc.
– Time-variant: All data in the data warehouse is identified with a particular time period. The DW stores history.
– Non-volatile: Data in the data warehouse is usually not overwritten or deleted. Once committed, the data is read-only and retained for future reporting and analysis.
– Detailed: The granularity is the individual business event.
– Based on a reference industry model: the Teradata financial services logical data model.
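The time-variant and non-volatile properties are commonly implemented by versioning rows with validity intervals instead of overwriting them (often called SCD type 2). The sketch below illustrates that general pattern in plain Python under stated assumptions: the `AgreementVersion` record, its columns, the half-open `[valid_from, valid_to)` interval and the 9999-12-31 high date are all hypothetical, not the Swedbank or Teradata model design.

```python
from dataclasses import dataclass
from datetime import date

# Open-ended end date marking the current version (an assumed convention)
HIGH_DATE = date(9999, 12, 31)


@dataclass
class AgreementVersion:
    agreement_id: int
    status: str
    valid_from: date
    valid_to: date = HIGH_DATE  # exclusive end of validity


def historise(history: list, agreement_id: int, status: str, load_date: date) -> None:
    """Apply one source snapshot value to a type-2 style history.

    Only the end date of the current version is ever updated and nothing
    is deleted, so every earlier state stays queryable.
    """
    current = next(
        (v for v in history if v.agreement_id == agreement_id and v.valid_to == HIGH_DATE),
        None,
    )
    if current is not None and current.status == status:
        return  # no change: keep the open version as it is
    if current is not None:
        current.valid_to = load_date  # close the previous version
    history.append(AgreementVersion(agreement_id, status, load_date))


def as_of(history: list, agreement_id: int, day: date):
    """Return the version valid on `day`, or None if none existed yet."""
    for v in history:
        if v.agreement_id == agreement_id and v.valid_from <= day < v.valid_to:
            return v
    return None
```

Loading the same status twice creates no new version; a status change closes the open row and appends a new one, so `as_of` can answer "what did we know for day X" for any past day.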
- 8. Multiple uses of the data warehouse
• Different business services have different requirements for
– data availability frequency and timing
(e.g. daily at 6 am, daily at 6 pm, monthly on day 1 at 8 am)
– data quality
(some services have near-zero tolerance for errors)
– performance and workload
- 9. Enterprise model (High level)
• PARTY: An individual or group of individuals.
• ASSET: Items that belong to parties and which have value.
• FINANCE: The internal accounting of the business.
• LOCATION: A geographic or spatial area, physical address or electronic address.
• CAMPAIGN: A communication plan directed at parties or a market for a purpose.
• AGREEMENT: A contract or deal between parties that is of interest.
• EVENT: A financial or non-financial event which may involve contact with the customer.
• CHANNEL: The vehicle by which a customer interacts with the financial institution/insurance company.
• PRODUCT: Any marketable or tradable product or service, including terms and conditions.
• INTERNAL ORGANIZATION: A unit of business within the financial institution or insurance company. Is a type of Party.
Not all relationships are shown.
- 10. Swedbank Baltic DW Statistics
External
• 30 source systems (containing 1000 source objects)
• 50 business services
• 75 employees in Baltic DW
Internal
• 20 TB of storage (50 TB planned for 2012)
• 650 objects in main data store
• 500 ETL processes
• 4000 database objects
• 40 database schemas in DW
• 245 direct database users, 500 reporting users
- 11. How to manage
• everyday operations
• development
• testing
• releasing
• migration (both technical and business)
• ETL workflow optimisation
Answer: an enterprise metadata system is needed
- 12. ETL is part of METADATA
Figure: Enterprise Metadata
- 13. ETL Software evaluation and POC 2007
Rein Adamson – project manager
• Request for Proposal to 4 Vendors
• 2 Vendors selected for Proof of Concept (POC)
– Oracle “ODI”
– Informatica “PowerCenter” (ETL market leader)
• POC budget 20 kEUR
• Evaluation process duration 5-8 months:
– 2 months: RFP and selection of 2 vendors for the POC
– 4 months: POC preparation
– 1 month: POC execution and presentation of results for the management decision
– 1 month: license and implementation contract with the winner
- 14. POC- Proof Of Concept 2007
• POC budget of 10 kEUR per vendor included:
– 1 day of system installation on the bank's IT infrastructure
– 2 days of preparation before arrival (5 tasks sent)
– 5 days of an onsite consultant
• POC scope in 5 days with the consultant:
– Day 1: training for the POC team (5 persons)
– Days 2-4: guidance for the team in developing 5 ETL tasks
– Last day: 2-hour demo to IT managers
- 15. POC loading task scenarios
• 3 days to complete 5 ETL tasks
• 1 task for each POC team member. Experienced DWH specialists: developer, analyst, DBA, admin, 2 architects
• The consultant acted as a trainer supporting our specialists
TASKS CONTENT:
• Task 1 – Agreement loading (incl. historisation)
• Task 2 – History table filled by trigger (incl. country context)
• Task 3 – Rows to columns and vice versa
• Task 4 – Aggregation within Teradata
• Task 5 – Bank transactions (events) loading
– from 3 sources into 1 target; capacity and performance test with 7 million rows
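Task 3 is a classic pivot/unpivot. In the POC it was built as an ETL task; purely to illustrate the transformation itself, a minimal Python version over hypothetical (key, attribute, value) triples could look like this. The account keys and attribute names in the usage note are made up.

```python
from collections import defaultdict


def rows_to_columns(rows):
    """Pivot (key, attribute, value) triples into one dict of columns per key."""
    wide = defaultdict(dict)
    for key, attribute, value in rows:
        wide[key][attribute] = value
    return dict(wide)


def columns_to_rows(wide):
    """Unpivot back into (key, attribute, value) triples, sorted for stable output."""
    return sorted(
        (key, attribute, value)
        for key, columns in wide.items()
        for attribute, value in columns.items()
    )
```

For example, `[(101, "BALANCE", 250.0), (101, "LIMIT", 1000.0)]` becomes `{101: {"BALANCE": 250.0, "LIMIT": 1000.0}}`, and `columns_to_rows` restores the original triples.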
- 16. KSF - Key Success Factors evaluated
• Reusability and standardization of loadings (high priority)
• Impact analysis on attribute level
• Resources for EDW services performance
• Release deployment and configuration
• Functionality of metadata repository (medium priority)
• Improve EDW development process
• EDW loading and calculation workflow management
• Faster analysis stage of development task
• Faster process and error maintenance (low priority)
- 17. Reusability and standardization of loading patterns
Flexibility of loading templates: customizable, but robust. The target is to shorten development time by reusing existing patterns.
• ODI
– All objects in ODI are reusable because the substitution method is used.
– The ELT architecture supports today's skill sets.
– Business and technical information is separated from data load logic.
– 2.8 points out of 3
• INFORMATICA
– Templates are fixed source/target templates. Technical options are integrated with business logic.
– It is possible to create reusable components, but while doing the tasks it became clear that at some point it was easier to start from a blank page.
– 1.8 points out of 3
- 18. Release deployment and configuration
Time needed for, and understanding of, maintenance and deployment of new loading procedures. Easier and faster release management.
• ODI
– Topology is transparent and easily understandable.
– Monitoring is at the necessary level of detail, together with debugging.
– No additional environments needed; information moves between repositories only.
– Versioning with install/rollback functionality is available.
– ...
– 2.6 points out of 3
• IFA
– Topology is not clear and transparent.
– Release complexity can, by estimate, grow to a level comparable to today's situation.
– Monitoring and debugging are available only at a high level until steps have been completed; no intermediate access.
– A country-based approach is not supported in the central repository.
– 1.8 points out of 3
- 19. POC results summary comment
– ODI utilizes the existing infrastructure. There is no (new) proprietary transformation server/database. The tool uses the source and target database engines and their tools to unload, load and transform data. It is transparent. No significant new skills or additional specialists are needed.
– Informatica brings in entirely new technology: additional specialists would be needed, and more training and consultancy would have to be bought.
- 20. KSF evaluation points (max 3)
Figure: bar chart of KSF evaluation points (scale 1.0-3.0) for ODI and IFA per criterion:
1. Reusability and standardisation of loadings
2. Impact analysis on attribute level
3. Resources for EDW services performance
4. Release deployment and configuration
5. Functionality of metadata repository
6. Improve EDW development process
7. EDW loading and calculation workflow management
8. Faster analysis stage of development task
9. Faster process and data error maintenance
- 21. ODI implementation Sept 2007 – Sept 2008
• Oracle ODI partner consultancy used
– 1 standard 4-day training, 10 persons in class
– 1 two-day onsite visit (consultant from Italy)
– 5 days of off-site consultancy over 3 months (Poland)
– 5 Oracle support cases
• Customer resource
– 1 experienced ETL developer assigned 100% for 1 year
• Custom solutions design and implementation:
– ETL Process Registry design and development (2 months)
– Common Wrapper development (3 months)
– Process Registry and Common Wrapper testing and debugging (2 months)
– ODI release process procedures implementation (2 months)
- 22. 83 active ODI users today
• 59 users in EDW (71%), 22 users in the CRM area (27%)
• 35 analyst-developers; 16 SQAs. Dev+SQA = 61%
Figure: users by role (Developer, SQA, Service Manager, Implementer, other manager, App.admin, Sys.admin-DBA), split by area (EDW, CRM, LOANS).
- 24. Oracle Data Integrator
• Oracle Data Integrator is a comprehensive data integration platform that covers all data integration requirements, from high-volume, high-performance batch loads, to event-driven, trickle-feed integration processes, to SOA-enabled data services.
ODI is Oracle’s Strategic Product for Data Integration
• Heterogeneous E-LT Architecture
• Optimized Connectivity Architecture
• Modular Implementation Architecture
• SOA-Native Architecture
- 26. Repository Set-Up Pattern
Figure: Development – Test – Production cycle
• Master Repository: Security, Topology, Versioning
– Create and archive versions of models, projects and scenarios
• Work Repository (Development): Models, Projects, Execution
• Work Repository (Test & QA): Models, Projects, Execution
– Import released versions of models, projects and scenarios for testing
• Execution Repository (Production): Execution
– Import released and tested versions of scenarios for production
- 28.
Figure: data model comparison. The slide shows a simple conceptual model (entities ORDER, ORDER ITEM and CUSTOMER, with attributes such as ORDER NUMBER, ORDER DATE, QUANTITY, CUSTOMER NAME and CUSTOMER ADDR) alongside extracts of the DW physical tables CL_PARTY, CL_BANK_ACCOUNT and CL_CONTRACT, each with dozens of columns (e.g. Party_Id, Account_Nbr, Account_Open_Date, Account_Currency_Code, Deposit_Interest_Rate, Party_Change_Dtime).
- 29. ODI Topology usage example
• A logical schema is mapped through a context to a physical server and physical schema
Figure (production example):
• Logical schemas: CORE_CARD, DW_MAIN; contexts: PROD_EE, PROD_LV, PROD_GR
• Physical servers:
– ODI server PROD_CORE_EE: server TALLINN (LDAP), schema CARD
– ODI server PROD_CORE_LV: server RIGA (LDAP), schema CARD
– ODI server PROD_DW_GR: server EDW.DOMAIN.EE (IP), schema MAIN
• CORE_CARD resolves to PROD_CORE_EE in context PROD_EE and to PROD_CORE_LV in context PROD_LV; DW_MAIN resolves to PROD_DW_GR.
- 30. Features of ODI topology
• A physical server has a fixed user name and password
• In a given context, a logical schema maps to exactly one physical schema
To use multiple database users on the same database, define more contexts or duplicate the data model
• A logical schema cannot change technology
Conclusion: a database schema has to be defined once for every database user that accesses it.
A single shared database connection is preferred to maximize ELT, at the cost of resource management by user name on the database side.
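The topology rules above amount to a lookup keyed by (logical schema, context). A minimal Python sketch, assuming the server, host and schema names read from the slide 29 example; the `user` values and the `PROD_EE_REPORT` context are hypothetical, added only to show why a second database user forces another context.

```python
from typing import NamedTuple


class PhysicalSchema(NamedTuple):
    odi_server: str  # ODI physical server definition
    host: str        # database server it points to
    schema: str      # database schema
    user: str        # the single fixed login stored on the physical server


# (logical schema, context) -> physical schema. Server, host and schema names
# follow the slide 29 example; user names and PROD_EE_REPORT are hypothetical.
TOPOLOGY = {
    ("CORE_CARD", "PROD_EE"): PhysicalSchema("PROD_CORE_EE", "TALLINN", "CARD", "etl_user"),
    ("CORE_CARD", "PROD_LV"): PhysicalSchema("PROD_CORE_LV", "RIGA", "CARD", "etl_user"),
    ("DW_MAIN", "PROD_EE"): PhysicalSchema("PROD_DW_GR", "EDW.DOMAIN.EE", "MAIN", "etl_user"),
    # A second login on the same schema needs its own physical server entry and,
    # because each key holds only one mapping, its own context as well.
    ("DW_MAIN", "PROD_EE_REPORT"): PhysicalSchema(
        "PROD_DW_GR_REPORT", "EDW.DOMAIN.EE", "MAIN", "report_user"
    ),
}


def resolve(logical_schema: str, context: str) -> PhysicalSchema:
    """Resolve a logical schema to its physical schema in the given context."""
    try:
        return TOPOLOGY[(logical_schema, context)]
    except KeyError:
        raise LookupError(f"{logical_schema} is not mapped in context {context}") from None
```

Because the dictionary key is the (logical schema, context) pair, the "exactly one physical schema per context" rule is enforced by construction, and the same generated code can run against Tallinn or Riga simply by switching context.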
- 31. ODI developer basic steps
1. Reverse-engineer data models from source and target
2. Define column-level data mappings; specify join and filter conditions.
Each data mapping (ODI interface) has exactly one target and may have multiple sources
3. Select a knowledge module (code generator)
4. Generate code (ODI scenario) and execute the scenario
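The idea of a knowledge module as a code generator can be illustrated with a toy template engine: the module knows nothing about a specific mapping until interface metadata is substituted into it. This sketch uses Python's `string.Template`; ODI's own substitution mechanism is different, and the `Interface` class, the `INSERT_APPEND_KM` template and the table and column names are hypothetical.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Interface:
    """Toy stand-in for an ODI interface: exactly one target, one or more sources."""
    target: str
    sources: list   # source tables with aliases, e.g. "STG.ACCOUNT a"
    mappings: dict  # target column -> source expression
    join: str = "1=1"
    filter: str = "1=1"


# A toy "knowledge module": a reusable code template with no mapping-specific logic.
INSERT_APPEND_KM = Template(
    "INSERT INTO $target ($columns)\n"
    "SELECT $expressions\n"
    "FROM $sources\n"
    "WHERE ($join) AND ($filter)"
)


def generate_code(km: Template, itf: Interface) -> str:
    """Generate the executable code (the 'scenario' body) for one interface."""
    columns = list(itf.mappings)
    return km.substitute(
        target=itf.target,
        columns=", ".join(columns),
        expressions=", ".join(itf.mappings[c] for c in columns),
        sources=", ".join(itf.sources),
        join=itf.join,
        filter=itf.filter,
    )
```

Calling `generate_code(INSERT_APPEND_KM, Interface("DW.BANK_ACCOUNT", ["STG.ACCOUNT a"], {"Account_Nbr": "a.acc_no"}))` yields an `INSERT ... SELECT` statement. Since the generated text depends on both the template and the metadata, changing either one means the generated code has to be rebuilt, which matches the rebuild rules on the next slide.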
- 32. ODI scenario generation and execution
Figure: in ODI Designer, data objects, interfaces, packages and knowledge modules go through code generation to produce a scenario. The ODI Agent executes the scenario using runtime variables and a context (topology), connecting to DB 1 and DB 2 and executing commands there.
• When a knowledge module changes – rebuild and deploy all related scenarios
• When database objects change – refresh the data structure definitions from the source database, then rebuild and deploy all related scenarios
- 33. Custom components
to manage 500 ETL processes
• Process registry
– all processes and their dependencies
• Common wrapper
– a special scenario that wraps all others
• ODI monitor
– web access to the process registry
• Release builder
– used for deploying from development to test
- 34. Process registry
• List of all ETL processes regardless of technology
- Create, change and retire processes
- All information necessary for maintaining the list
• Process scheduling information
• Dependencies between processes
– Process-to-process dependencies
– Dependencies through a “Dependency Group”
– Based on process bookmarks
- 35. Common Wrapper
• A special single-instance ODI scenario through which all other scenarios are executed (pre- and post-steps)
• Implements common functionality needed by all processes
- Checks whether the process's prerequisites have been fulfilled
- Checks whether the process is allowed to run at the moment
- Assigns common process control variables and passes their values to the executed scenario
- Logs execution bookmarks, ODI session IDs and the run result
- Alerts monitoring in case of failure
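The Common Wrapper's responsibilities map naturally onto a wrapper function with pre-steps, a guarded call and post-steps. In this minimal sketch the scenario and the registry checks are injected as callables; every name, the `(process, bookmark, session_id, result)` log tuple and the result codes are hypothetical, and the real component is an ODI scenario rather than Python.

```python
from typing import Callable


def common_wrapper(
    process_name: str,
    run_scenario: Callable[[dict], str],
    prerequisites_met: Callable[[str], bool],
    run_allowed: Callable[[str], bool],
    alert: Callable[[str, str], None],
    log: list,
    bookmark: str,
) -> str:
    """Run one scenario with common pre- and post-steps; returns the run result."""
    # Pre-steps: gate the run on prerequisites and on the run-permission flag
    if not prerequisites_met(process_name):
        log.append((process_name, bookmark, None, "WAITING"))
        return "WAITING"
    if not run_allowed(process_name):
        log.append((process_name, bookmark, None, "NOT_ALLOWED"))
        return "NOT_ALLOWED"

    # Common control variables passed into the wrapped scenario
    control_vars = {"PROCESS_NAME": process_name, "BOOKMARK": bookmark}
    try:
        session_id = run_scenario(control_vars)
    except Exception as exc:
        # Post-step on failure: log the result and alert monitoring
        log.append((process_name, bookmark, None, "FAILED"))
        alert(process_name, str(exc))
        return "FAILED"

    # Post-step on success: log bookmark, session id and result
    log.append((process_name, bookmark, session_id, "OK"))
    return "OK"
```

Keeping these checks in one place means individual scenarios contain only load logic, and logging and alerting behave the same way for every process.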
- 37. Dependency group
• A dependency group is defined by the data content a process delivers. It corresponds to a business concept or subject area plus its data availability.
• Processes are either:
– Suppliers of a dependency group
– Consumers of a dependency group
• Dependency groups are also used to show data availability bookmarks to users in the ad-hoc reporting environment
- 38. Dependencies between processes
Figure: three layers linked by dependency groups, top to bottom:
• Consuming processes: Value added calculation 1, Value added calculation 2 – each “is consumer of” one or more dependency groups
• Dependency groups: Fin Agrmt Dly, Fin Agrmt Bal Dly, Credit Agrmt Dly
• Supplying processes: Bank Account loading, Loan Agrmt loading, Leasing Agrmt Loading, Factoring Agrmt Loading – each “is supplier for” one or more dependency groups
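Dependency groups decouple consumers from individual suppliers: a consumer only asks whether each group it reads is available for the business date. A minimal sketch, assuming that a group's availability bookmark is the oldest bookmark among its suppliers; the process and group names come from the slide, but which process feeds which group, and the bookmark dates, are hypothetical.

```python
from datetime import date

# Names from the slide; the supplier/consumer edges themselves are assumed.
SUPPLIERS = {
    "Fin Agrmt Dly": {"Bank Account loading", "Loan Agrmt loading"},
    "Fin Agrmt Bal Dly": {"Bank Account loading"},
    "Credit Agrmt Dly": {
        "Loan Agrmt loading", "Leasing Agrmt Loading", "Factoring Agrmt Loading",
    },
}
CONSUMES = {
    "Value added calculation 1": {"Fin Agrmt Dly", "Fin Agrmt Bal Dly"},
    "Value added calculation 2": {"Credit Agrmt Dly"},
}


def group_available_until(group: str, bookmarks: dict) -> date:
    """A dependency group is only as current as its least current supplier."""
    return min(bookmarks.get(p, date.min) for p in SUPPLIERS[group])


def can_run(consumer: str, business_date: date, bookmarks: dict) -> bool:
    """A consumer may run once every group it consumes covers business_date."""
    return all(
        group_available_until(g, bookmarks) >= business_date for g in CONSUMES[consumer]
    )
```

The same `group_available_until` value is what an ad-hoc reporting user would see as the group's data availability bookmark, and adding a new supplier to a group requires no change to any consumer.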
- 39. Enterprise metadata context
Figure: the Enterprise Metadata Repository at the centre, with a metadata user interface and metadata reports.
• Manual metadata (services, business requirements etc.)
• Technical and operational metadata:
– Transformation metadata (ETL tools)
– RDBMS metadata
– CASE tool metadata (logical data models)
– Presentation metadata (reporting tools)
- 40. Metadata model – ETL related
Figure: ETL-related metadata model (entity-relationship diagram).
• Entities: SERVICE, DEPENDENCY_GROUP, IMPACT_LAYER, PROCESS, PROCESS_DEPENDENCY, PROCESS_EXECUTABLE, PACKAGE, PROCESS_SCHEDULE_TIME, PROCESS_PARAM, PROCESS_RUN, PACKAGE_SOURCE_LAYER, PACKAGE_TARGET_LAYER, PACKAGE_SOURCE_OBJECT, PACKAGE_TARGET_OBJECT, DB_OBJECT
• PROCESS_DEPENDENCY links processes (Process_Name) to dependency groups (Dependency_Group_Name)
• PROCESS_SCHEDULE_TIME holds the schedule (Frequency_Type, Frequency_Value); PROCESS_PARAM holds Param_Name/Param_Value pairs; PROCESS_RUN records Process_Execution_Dtime and the process bookmark values
• PACKAGE_SOURCE_OBJECT and PACKAGE_TARGET_OBJECT link packages (Package_Name) to database objects (DB_Object_Name) per impact layer
• Sources of metadata: process registry, transformation (ETL), RDBMS, manual configuration
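With package source and target objects recorded, impact analysis becomes a graph traversal: follow every package that reads a changed object to the objects it writes, and repeat. A minimal breadth-first sketch, assuming hypothetical package and object names standing in for rows of PACKAGE_SOURCE_OBJECT and PACKAGE_TARGET_OBJECT. This works at object level; attribute-level impact analysis, as asked for in the KSF list, would additionally need column mappings.

```python
from collections import deque

# Hypothetical rows standing in for PACKAGE_SOURCE_OBJECT and PACKAGE_TARGET_OBJECT
PACKAGE_SOURCES = {
    "LOAD_BANK_ACCOUNT": {"STG.ACCOUNT"},
    "LOAD_FIN_AGRMT_BAL": {"DW.BANK_ACCOUNT", "DW.AGREEMENT"},
}
PACKAGE_TARGETS = {
    "LOAD_BANK_ACCOUNT": {"DW.BANK_ACCOUNT"},
    "LOAD_FIN_AGRMT_BAL": {"DM.FIN_AGRMT_BAL_DLY"},
}


def downstream_impact(changed_object: str):
    """Return (packages, objects) transitively affected by a change to changed_object."""
    affected_packages, affected_objects = set(), set()
    queue = deque([changed_object])
    while queue:
        obj = queue.popleft()
        for pkg, sources in PACKAGE_SOURCES.items():
            if obj in sources and pkg not in affected_packages:
                affected_packages.add(pkg)
                for tgt in PACKAGE_TARGETS.get(pkg, ()):
                    if tgt not in affected_objects:  # visited check also stops cycles
                        affected_objects.add(tgt)
                        queue.append(tgt)
    return affected_packages, affected_objects
```

Here a change to `STG.ACCOUNT` reaches both packages and both downstream tables, which is the list of scenarios to rebuild and retest.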
- 42. Process execution Daemon
• Planned component for automatic ETL workflow management; it starts a process when:
– it is time to process new data
– prerequisites are ready
– the process run is allowed
• Replaces the enterprise job scheduler
• Uses the Process Registry and Common Wrapper framework
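Since the daemon is only planned, the following is just one possible shape for its core: a polling pass that starts each registered process once per day when it is due, its prerequisites are ready and it is allowed to run. The `RegisteredProcess` fields, the once-per-day schedule and the injected callables are assumptions, not the planned design.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Callable, Optional


@dataclass
class RegisteredProcess:
    name: str
    earliest_start: time                  # assumed daily schedule from the registry
    last_run_date: Optional[date] = None  # day of the last started run


def poll_once(
    processes: list,
    now: datetime,
    prerequisites_ready: Callable[[str], bool],
    run_allowed: Callable[[str], bool],
    start: Callable[[str], None],
) -> list:
    """One polling pass: start each process that is due, ready and allowed."""
    started = []
    for p in processes:
        due = now.time() >= p.earliest_start and p.last_run_date != now.date()
        if due and prerequisites_ready(p.name) and run_allowed(p.name):
            start(p.name)  # e.g. hand over to the Common Wrapper
            p.last_run_date = now.date()
            started.append(p.name)
    return started
```

Calling `poll_once` repeatedly, for example every minute, lets a consumer start as soon as its suppliers finish, instead of at a fixed clock time chosen with a safety margin as a job scheduler would require.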
- 43. Our experience with ODI (10g)
• Performance concerns
– Educate developers to use existing patterns
– Optimize knowledge modules while keeping them as generic as possible
– Built a lightweight web application for quick access to execution logs
• Functionality
– Modified almost every KM now in use
– Created new KMs for common needs (new history integration, SAX XML parsing for loads, streamed XML output etc.)
– Built workarounds for missing features: OLAP function support, subqueries
– Used the ODI code substitution framework to the maximum
– Built a command-line utility to start an ODI session on a remote agent
– Use the DTS agent for scheduling – a single high-level workflow management system
- 44. Our experience with ODI (10g) , continued
• Deployment
– We use a single ODI project per area, with shared sets of KMs and variables
– Release to test: changed data models, knowledge modules and ODI folders are installed separately (a common releasable unit, based on a custom export script)
– Importing a huge ODI project required a custom solution for incremental restore of the whole project
• ETL administrator concerns
– No way to change code directly in production (in case of urgent issues)
- 45. Migration of ETL processes from MS-DTS to ODI, started in 2008. Current status:
Figure: number of ETL processes (bar chart): ODI 261, DTS 257. Number of tasks in ETL processes (chart, DTS and ODI): 3224 and 3819.