1. Network of Business Intelligence Professionals
in French-speaking Switzerland (Suisse Romande)
4th Round Table
Lausanne, 1 June 2012
Dario Mangano
Head of Knowledge Management
Nestlé Nespresso S.A. HQ
2. AGENDA
14:00 Welcome
14:30 The LinkedIn group
14:45 Data-loading metadata
15:30 Coffee break
15:45 Loading by time zone
16:30 Future round tables
16:45 Coffee
17:30 End
17. Data Integration: Best-Practices Oriented
Data integration is a family of techniques. The most common is ETL (extract, transform, and load), but many related techniques are unavoidable when dealing with data integration: metadata, change data capture, file loading, publication, data quality, … Moreover, it always involves several technologies: DB servers, DB scripting, shell scripting, …
All these techniques and technologies require developing and supporting a wide range of interfaces, using solutions that are hand-coded, built on a vendor's tool, or a mix of both.
With such complexity in data integration systems, developing and supporting these solutions is becoming very challenging.
Best practices and standards ensure that all systems are developed in a way that makes them much easier to support, safer, and scalable enough to meet future needs and data volumes.
The Data Integration Framework is a metadata-driven development environment that provides turnkey solutions for all these data integration tasks:
- Metadata
- Change Data Capture
- File loading
- Data quality
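To make "metadata driven" concrete for the file-loading task, here is a minimal Python sketch, assuming that each interface (delimiter, expected columns, target table) is described as data rather than code. The `INTERFACES` dictionary, the `CUSTOMERS_FR` interface and `load_file` are hypothetical illustrations, not the DIF's actual API, and SQLite stands in for the Oracle staging database.

```python
import csv
import io
import sqlite3

# Hypothetical interface metadata: each interface is described by data,
# so adding a new feed means adding an entry here, not writing a new loader.
INTERFACES = {
    "CUSTOMERS_FR": {
        "delimiter": ";",
        "columns": ["customer_id", "name", "country"],
        "target_table": "stg_customers",
    },
}


def load_file(conn: sqlite3.Connection, interface: str, stream) -> int:
    """Load a delimited file into its staging table, driven only by metadata."""
    meta = INTERFACES[interface]
    cols = meta["columns"]
    table = meta["target_table"]
    # Identifiers are interpolated because they come from trusted metadata.
    column_defs = ", ".join(f"{c} TEXT" for c in cols)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})")

    reader = csv.reader(stream, delimiter=meta["delimiter"])
    header = next(reader)
    if header != cols:  # reject files whose layout does not match the metadata
        raise ValueError(f"{interface}: unexpected header {header}")
    rows = list(reader)
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    data = io.StringIO("customer_id;name;country\n1;Anne;FR\n2;Marc;CH\n")
    print(load_file(conn, "CUSTOMERS_FR", data))  # 2
```

The design choice is that the loader itself never changes; only the metadata entry does, which is what keeps such a framework configurable rather than hand-coded per interface.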
18. Metadata-Management Oriented
Metadata is a key feature of data integration and data warehousing.
It is the only way to answer the following questions:
- Which column did this data come from?
- When was this data populated in the system?
- How is this result on my report calculated?
- Is my report up to date?
- Is my system scalable?
Having these answers increases trust in the data, enables proactive monitoring of data integration processes, ensures that data is loaded efficiently, and ultimately prevents the system from losing value over time by reducing and absorbing the costs of understanding, maintenance, and repair.
The Data Integration Framework provides a metadata management solution that requires no development effort from the project team:
- Collecting operational metadata in real time
- Capturing business and technical metadata related to data integration processes
- Integrating all this metadata in a metadata repository
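One way to collect operational metadata without effort from the project team is to wrap each step so that its start time, duration, row count and status are logged automatically. The sketch below assumes a hypothetical `run_log` table in SQLite standing in for the metadata repository; `init_repository` and `track_step` are illustrative names, not part of the framework described here.

```python
import sqlite3
import time
from contextlib import contextmanager
from datetime import datetime, timezone


def init_repository(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) operational metadata table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS run_log (
               interface TEXT, step TEXT, started_utc TEXT,
               duration_s REAL, rows_processed INTEGER, status TEXT)"""
    )


@contextmanager
def track_step(conn: sqlite3.Connection, interface: str, step: str):
    """Log start time, duration, row count and final status of one step."""
    started = datetime.now(timezone.utc).isoformat()
    t0 = time.perf_counter()
    stats = {"rows": 0}   # the step fills in its row count here
    status = "FAILED"     # overwritten only if the step completes
    try:
        yield stats
        status = "OK"
    finally:
        # Runs on success and on failure; an exception still propagates.
        conn.execute(
            "INSERT INTO run_log VALUES (?, ?, ?, ?, ?, ?)",
            (interface, step, started, time.perf_counter() - t0,
             stats["rows"], status),
        )
        conn.commit()
```

A step only has to set `stats["rows"]`, for example `with track_step(conn, "CUSTOMERS_FR", "load") as s: s["rows"] = 42`; a failing step is recorded as `FAILED` and its exception still reaches the caller.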
19. Data Integration Framework (DIF): Data Management Framework
[Figure: DIF components (Metadata, Reporting services, Monitoring, Data publication, Change Data Capture, File loading, Archiving, Business Rules, Notification, Graphical User Interface) sit between the source systems and the downstream systems. Support Teams monitor through the framework; Development Teams use it and develop the business rules.]
(see slide notes for comments)
20. Designs, methods and tools to perform data integration services
Reference Architecture | Methods | DIF Components
Event-triggered ETL; batch ETL | ETL development methodology | Wrapper; File Loader
Parsing, matching & merging, consolidation, … | Standard integration methods by subject area | Change Data Capture; Reject Management
Quality auditing | Data quality control methods | DQ module
Pub/Sub event, bulk | Pub/Sub pattern | Publisher module
Metadata management (operational & technical) | Op. & technical metadata collection standards | Metadata data model; metadata collection daemon
21. DIF: Back-end modular architecture
[Figure: layered back-end architecture]
- Scheduler, Wrapper module and Metadata Repository
- DIF modules: FileLoader, Publisher, Archiver, Purge, Data Quality, Notification
- Project-specific code applying business rules and requirements (PowerCenter, shell scripts, SQL scripts, stored procedures, …)
- Re-usable includes (logging routines, mail-sending routines, …)
- Metrics collection services: HP OV (OpenView) and PowerCenter
Legend: DIF minimal installation; DIF available modules/services/re-usable components
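The wrapper module sits between the scheduler and the project-specific code. A minimal sketch, assuming each step is an external command (a shell script, an SQL script, or a PowerCenter workflow started from the command line) and that the scheduler only looks at the exit code. `run_wrapped` and the `notify` hook are hypothetical; `notify` stands in for the notification module.

```python
import subprocess
import sys
from typing import Callable, List


def run_wrapped(step_name: str, command: List[str],
                notify: Callable[[str], None]) -> int:
    """Run one project-specific step and return its exit code to the scheduler.

    `notify` is a hypothetical hook standing in for the notification module
    (e.g. an e-mail sender); it is called only when the step fails.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Keep the message short: the full stderr belongs in the log file.
        notify(f"{step_name} failed (rc={result.returncode}): "
               f"{result.stderr.strip()[:200]}")
    return result.returncode


if __name__ == "__main__":
    rc = run_wrapped("demo_step", [sys.executable, "-c", "print('ok')"], print)
    print("exit code returned to the scheduler:", rc)
```

Propagating the exit code unchanged keeps the scheduler (Autosys, for instance) in charge of dependencies, while the wrapper adds the cross-cutting services (notification here; logging to the metadata repository would go in the same place).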
22. Potential for Global Monitoring
[Figure: application environments (APP1, APP2, APP3) use the reusable components (Autosys wrapper, file loader, CDC, rejects recycling, archiver, publisher, …) and share a single metadata repository. A data access layer and a web server expose reports, publication and extracts to the project, support and middleware teams and to external systems. Key metadata is also retrieved from infrastructure and middleware components: PowerCenter, the scheduler, Oracle DBs, Unix servers, HP OpenView and application processes.]
23. Reporting services (Cognos/BO reports) - Ex1
Through the reporting layer, the integrated metadata repository is available for all kinds of reports and ad hoc queries:
- monitoring reports
- capacity planning
- impact analysis
Example of a monitoring report with embedded navigation capabilities:
[Screenshot callouts: drill-down button; open the log file for more details]
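A monitoring report is ultimately a query against the repository. As a hedged sketch, assuming a hypothetical `run_log` table with `interface`, `finished_utc` (ISO-8601 text, all in UTC and in the same format) and `status` columns, the last run of each interface can be flagged as failed or late; `MONITORING_SQL` and `monitoring_report` are illustrative names.

```python
import sqlite3

# Last run of each interface (hypothetical run_log layout).
MONITORING_SQL = """
SELECT r.interface, r.finished_utc, r.status
FROM run_log AS r
JOIN (SELECT interface, MAX(finished_utc) AS last_finish
      FROM run_log GROUP BY interface) AS last
  ON r.interface = last.interface AND r.finished_utc = last.last_finish
ORDER BY r.interface
"""


def monitoring_report(conn: sqlite3.Connection, cutoff_utc: str):
    """Return (interface, last finish, status, flag) for each interface.

    flag is 'FAILED' if the last run failed, 'LATE' if it finished before
    cutoff_utc (same-format ISO-8601 strings compare chronologically),
    otherwise 'OK'.
    """
    report = []
    for interface, finished, status in conn.execute(MONITORING_SQL):
        if status != "OK":
            flag = "FAILED"
        elif finished < cutoff_utc:
            flag = "LATE"
        else:
            flag = "OK"
        report.append((interface, finished, status, flag))
    return report
```

In a Cognos or BO report the same query would sit behind the report definition; the drill-down to the log file would then use the interface and run identifiers returned here.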
24. Reporting services (Cognos/BO reports) - Ex2
The added value of integrated metadata is that a single report can show correlated metadata in the same view.
For example, this Gantt-view execution report shows whether there is a correlation between the execution of a given interface and the server workload. This is very useful for understanding interface step issues, and also for capacity planning.
[Screenshot callouts: drill through to this interface's performance details; drill-down report; Gantt view for this interface]
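The correlation shown in the Gantt view can be approximated by averaging server CPU samples over each step's execution window. This sketch assumes that steps come from the operational metadata as (name, start, end) and that CPU samples come from a monitoring tool such as HP OpenView as (timestamp, percent); that data layout and `workload_during_steps` are assumptions.

```python
from datetime import datetime
from statistics import mean


def workload_during_steps(steps, samples):
    """Average CPU % sampled on the server while each step was running.

    steps:   list of (step_name, start, end) datetimes, as in a Gantt view
    samples: list of (timestamp, cpu_percent) from a monitoring tool
    Windows are half-open [start, end), so a boundary sample counts once.
    Steps with no sample inside their window get None.
    """
    result = {}
    for name, start, end in steps:
        window = [cpu for ts, cpu in samples if start <= ts < end]
        result[name] = mean(window) if window else None
    return result


if __name__ == "__main__":
    def at(hour, minute):
        return datetime(2012, 6, 1, hour, minute)

    steps = [("load_orders", at(2, 0), at(2, 30)),
             ("build_cube", at(2, 30), at(3, 0))]
    samples = [(at(2, 0), 40.0), (at(2, 15), 60.0),
               (at(2, 30), 95.0), (at(2, 45), 85.0)]
    print(workload_during_steps(steps, samples))
    # {'load_orders': 50.0, 'build_cube': 90.0}
```

A step that is consistently slow while the server average is high points to contention (a capacity-planning question), whereas a slow step on an idle server points to the step itself.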
25. Reporting services (Cognos/BO reports) - Ex3
Another example of the details available from the reports: when the publication module is used, the metadata shows which XML files were produced, how many rows were extracted from the database, …, and to which downstream applications the package was pushed:
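A sketch of how a publication step could produce both the XML package and the metadata such a report displays, assuming the rows have already been extracted as dictionaries. `publish`, the file name and the subscriber names are hypothetical; the real publisher module's format is not described in these slides.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def publish(rows, file_name, subscribers):
    """Serialise rows to XML and return (xml_bytes, publication_metadata)."""
    root = ET.Element("package", name=file_name)
    for row in rows:
        # One <row> element per extracted record; values stored as attributes.
        ET.SubElement(root, "row", {k: str(v) for k, v in row.items()})
    xml_bytes = ET.tostring(root, encoding="utf-8")

    # What the metadata repository would keep about this publication.
    metadata = {
        "file_name": file_name,
        "rows_extracted": len(rows),
        "pushed_to": list(subscribers),
        "published_utc": datetime.now(timezone.utc).isoformat(),
    }
    return xml_bytes, metadata
```

For example, `publish([{"id": 1, "qty": 3}], "orders_20120601.xml", ["CRM", "ERP"])` returns the XML bytes plus a record stating one row was extracted and pushed to the two (hypothetical) downstream applications.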
27. Data Integration Application Architecture
[Figure: the reporting services (Cognos) read the Operational Metadata Repository (configuration metadata, metrics metadata, exception logs, business objects) for Level 2 support, the DEV team (L3 support) and business users. The Data Integration Framework services (modules, monitoring services) cover data movement, change data capture, data quality, notification, archiving, auditing, rejection recycling, exception management, workflow and publication. Underneath, each interface moves data from a source system (flat files) through a staging layer (Oracle DB tables) and an integration layer (DWH, Oracle) to a data publication layer, using FileLoader, ETL, script, stored procedure, SQL script and Publisher tasks organised in steps run by the scheduler.]
28. Monitoring Implementation
[Figure: Service-Level View of the Data Integration Application. Monitored components: DB servers, PowerCenter, web servers, scheduler, DIF modules and services, reports & dashboards, metadata repository.]
- 1st level support: an application/service-level view enables the Service Desk to rapidly intercept fatal alerts and communicate service outages to affected users.
- 2nd level support: identify the root cause of issues and take effective action. The data is available for analysis to anticipate issues and bottlenecks.
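A hedged sketch of the two support levels, assuming that alerts carry a severity: fatal alerts go to the service desk (1st level), while every alert remains available to 2nd-level support for root-cause analysis. The `Alert` fields, the severity names and the `route` function are assumptions, not part of the implementation described on the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alert:
    component: str   # e.g. "Scheduler", "PowerCenter", "DIF FileLoader"
    severity: str    # assumed levels: "FATAL", "ERROR", "WARNING"
    message: str


def route(alerts: List[Alert]) -> Tuple[List[Alert], List[Alert]]:
    """Split alerts into (service desk queue, 2nd-level analysis queue).

    Only fatal alerts reach the service desk, which communicates outages;
    2nd-level support sees every alert to find root causes and trends.
    """
    service_desk = [a for a in alerts if a.severity == "FATAL"]
    level2 = list(alerts)
    return service_desk, level2
```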
29. AGENDA
Loading by time zone
Cedric Zbinden, BI Architect, Nestlé Nespresso S.A.
Dario Mangano, Head of Knowledge Management, Nestlé Nespresso S.A.
30. Loading by time zone
Question: How can the consistency and integrity of DWH data be managed when the data is loaded by geographical zone and by time zone?
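One way to frame the question: at any UTC instant, each market has closed a different business day, and the latest date that is complete everywhere is the minimum over all markets. The sketch assumes that a market's day is complete once local midnight has passed, and it uses fixed UTC offsets for three hypothetical markets, ignoring daylight saving time to stay deterministic (a real implementation would use `zoneinfo`).

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical markets with fixed UTC offsets (DST deliberately ignored).
MARKETS = {
    "JP": timezone(timedelta(hours=9)),
    "CH": timezone(timedelta(hours=2)),
    "US_WEST": timezone(timedelta(hours=-7)),
}


def last_closed_day(now_utc: datetime, tz: timezone) -> date:
    """Latest local business day that has fully ended in this time zone."""
    return now_utc.astimezone(tz).date() - timedelta(days=1)


def consistent_date(now_utc: datetime) -> date:
    """Latest date closed in every market: the safe date for an HQ view."""
    return min(last_closed_day(now_utc, tz) for tz in MARKETS.values())
```

At 06:00 UTC on 1 June 2012, Japan and Switzerland have closed 31 May, but the US West Coast (23:00 on 31 May) has only closed 30 May, so a globally consistent view is two days behind the UTC date while each market can already see its own previous day.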
41. Loading by time zone
Summary of the discussions:
- HQ and the markets do not have the same data-refresh needs; could HQ make do with D-2 data?
- Load into different schemas so that the schema currently being loaded is never queried, then drop the partition at the end?
- Use Cognos master cubes
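The "load into a different schema" idea can be sketched with two copies of a table and a view that reporting queries always go through: data is loaded into the copy nobody is reading, then the view is repointed in the same transaction. This is a simplified SQLite stand-in; the table names `sales_a`/`sales_b`, the view `sales` and `reload_with_switch` are hypothetical, and an Oracle implementation would more likely rely on separate schemas, synonyms or partition exchange.

```python
import sqlite3


def reload_with_switch(conn: sqlite3.Connection, rows, active: str) -> str:
    """Load rows into the copy users are NOT querying, then switch the view.

    sales_a / sales_b stand in for the two schemas; reports always read the
    view `sales`. Returns the name of the newly active copy.
    """
    target = "sales_b" if active == "sales_a" else "sales_a"
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(f"DELETE FROM {target}")  # stand-in for dropping a partition
        conn.executemany(f"INSERT INTO {target} VALUES (?, ?)", rows)
        conn.execute("DROP VIEW IF EXISTS sales")
        conn.execute(f"CREATE VIEW sales AS SELECT * FROM {target}")
    return target
```

While a market's load runs, reports keep reading the previous complete copy; only the final switch is visible to users.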
44. Future Round Tables
Proposals:
- Big Data
- Demonstration of a QlikView POC (Groupe Mutuel)
- Example of governance that better frames business demands (business case, demand management committee, etc.)
- In-memory appliances
- Mobile BI
- Column-based DBs / NoSQL
- Data virtualization
- BI SaaS
Data Integration Framework services = Data Integration toolset and services. They are designed to help data integration project development teams and support teams:
- Provide common, standard, re-usable components and services to perform data integration tasks (file loader, change data capture, rejects recycling, publication/subscription, notifications, ...).
- Provide a metadata-driven, highly configurable development environment.
- Collect and store operational metadata for all components and processes involved in data integration projects.
- Provide a single web entry point (reporting tool) to monitor end-to-end project activity (daily monitoring, performance analysis, capacity planning, impact analysis, ...).
The development team can then focus on business rules development (the project core).
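Rejects recycling, listed among the framework's services, can be sketched as follows: rows failing a validation rule are kept in a reject store and re-validated on the next run, so they are loaded once the missing reference data arrives instead of being lost. The product-code rule and `load_with_recycling` are hypothetical examples, not the framework's actual rules.

```python
from typing import Dict, List, Set, Tuple


def load_with_recycling(new_rows: List[Dict], rejects: List[Dict],
                        known_products: Set[str]) -> Tuple[List[Dict], List[Dict]]:
    """Validate previously rejected rows plus new rows.

    Rows whose product code is unknown are (re)rejected and kept for the
    next run; rows that now pass are returned for loading.
    """
    loaded, still_rejected = [], []
    for row in rejects + new_rows:  # recycle old rejects first
        if row["product"] in known_products:
            loaded.append(row)
        else:
            still_rejected.append(row)
    return loaded, still_rejected
```

On a first run with reference data `{"A"}`, a row for product `X` is rejected; on the next run, once `X` exists in the reference data, the same row is loaded automatically without manual re-processing.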