2. AGENDA
• Introduction to DWH
– Are DWH project complex, challenges, requirements
• Information Architecture (EA) for DWH
– Models, Workflows, Artifacts
• Mapping over Sybase PowerDesigner
– Mapping to IA models and artifacts
– Features of interest for DWH
• Demonstration
– One example of IA architecture on the project
• DWH Referent IA
– Layers and recommendations
2 – Company Confidential – June 18, 2012
3. COMPLEXITY OF DWH PROJECTS
ARE DWH PROJECTS COMPLEX?
3 – Company Confidential – June 18, 2012
5. BI/DWH COMPLEXITY
Causes, sources
– Data Sources
Different sources, technologies, business functions, legacy,
overlapping, concepts, elements
– Scope and Performances,
• Never enough, never on time, content variations(!),
– Participants
• Different backgrounds, knowledge, skills, motivation, visions,
– Requirements
• Continuous changes and extensions,
– Growth and Development
• Volume of data, people, reports and analysis,
– Quality
• Clean, right, correct on time, cleansing (division of resp.)
5 – Company Confidential – June 18, 2012
6. BI/DWH COMPLEXITY
Change, Heterogeneity
– Never enough
Users/BI Analyst can not give definite, detailed, complete and
precise specification of all reports/views in advance (!),
Changes are coming on the end, they are inevitable and continual,
– Never on time
Every change should be implemented and used in usable time
frame, before user forgets about it,
– Never one and exactly one data source,
Different sources results in:
– Different DMS technology, different refresh rate, different volume,
different performances (management).....
– Data overlapping – consolidation,
6 – Company Confidential – June 18, 2012
7. BI/DWH COMPLEXITY
Volume and growth, data quality
– Large Volume and intensive Growth is inevitable,
Operational Data Sources,
– Keep only data set needed for operational work (year?),
– Keep it in the shape suitable for operational work (Relational),
– Analytical Extension,
Keep data needed for sound analysis (many years?),
Keep it in the shape suitable for analysis (MDM),
– Growth is inevitable per time and volume,
– Compromises:
Time: keep last (x) months or representative sample,
Nobody is actually happy, neither IT or Business
– Data Quality
Clean, consolidated data source does not exists,
Every data source needs “housekeeping” constantly, Data Entry/ETL
7 – Company Confidential – June 18, 2012
8. BI/DWH COMPLEXITY
Performance
– Never fast enough
Use of technology not designed/suited for analytics (RDBMS),
Intensive use of „ad hoc“ request – indexing problem (RDBMS),
– Free exploration over arbitrary data set is heavily limited,
Very intensive and heavy administration – never ending story,
– One and only one complete “Version of the truth”
Similar or overlapping analysis are presenting different data!!
– Which one is correct and right? What about the rest of it?
You are publisher – hold the responsibilities,
– Hold the reader's trust,
– Publish on regular basis,
– Use variety of sources and edit them with quality and consistency,
Data consistency must be established and protected,
8 – Company Confidential – June 18, 2012
11. INFORMATION ARCHITECTURE FOR DWH
Position of IA
Motivation, goals, business principles
organizational structure, Business
Functions, Services and Processes
IT support for Business Architecture
System Services, Applications,
Databases, Components, Forms,
Reports, Data Flows....
Technology for IS Architecture,
network, servers, installed instances,
Access points,
From current to planned
11 – Company Confidential – June 18, 2012
12. INFORMATION ARCHITECTURE FOR DWH
Method (ADM)
– To Define Architecture Development Method
Define at any point of the project Who is doing What, How and When
Define Phases, Workflows, Artifacts, Models, and Deliverable
– Essential for DWH/BI with Backward Requirement process
– Presented ADM and IA is:
Agile – simplification of RUP, TOGAF
Iterative and Incremental – cyclic repetition of workflows,
Data Driven – based on Data Assets
Comprehensive – includes all activities including maintenance and RFC
Model Driven – all artifacts are represented with modeling artifacts
Requirement Driven – placed in the center of methodology
Sustainable – at any point knowledge is collected, formally specified
and properly presented
12 – Company Confidential – June 18, 2012
13. INFORMATION ARCHITECTURE FOR DWH
Main ADM Cycle
Requirements&Constraints in
the center
Not all are mandatory
Presented main cycle, others
possible
Many cycles are expected
All that is needed to obtain
sustainable system
• Development
• Deployment
• Maintenance
Active, in IME
13 – Company Confidential – June 18, 2012
14. INFORMATION ARCHITECTURE FOR DWH
B.Data Analysis
– Objective:
Discover, identify, collect, elaborate, specify, define and present Data Assets
Different abstraction levels: from conceptual to implementation
– Viewpoints:
Architectural viewpoint, Data Providers and Consumers, data flow process,
engaged systems, applications, components, usage, access rights
Structural viewpoint, structure, attributes, relationships, dependencies,
rules applied on conceptual and physical level
– Inputs:
IA (others not in the scope),
– Outputs
DA, Sources Conceptual and Physical Data
14 – Company Confidential – June 18, 2012
15. INFORMATION ARCHITECTURE FOR DWH
C.DWH Design and Implementation
– Objective:
D&I of integrated and unified data collection, organized in dimension of
time, which is subject oriented used for analysis, planning and evaluation of
business performances
Establish common view (unified/integrated/complete) over the enterprise
data, stable source of historical information, accommodate data growth
– Activities:
Full and detailed schema specification for DWH, Staging and ODS
– Inputs:
IA & DA, Sources Conceptual/Physical Data,
– Outputs
IA & DA, Conceptual and Physical DWH
15 – Company Confidential – June 18, 2012
16. INFORMATION ARCHITECTURE FOR DWH
D.ETL Design and Implementation
– Objective:
To analyze, elaborate, define, specify, present and implement full and
incremental ETL flows between source2staging, staging2DWH, DWH2MDM
To discover, identify, collect, elaborate, specify, define and present all
characteristics of ETL flows,
– Activities:
Extraction Method, Schema, Condition and Frequency for increments,
– Source 2 Target Mapping,
Transformation processes on the appropriate level of details,
Data Flow Architecture, Trash management
– Inputs:
IA&DA, Sources Conceptual/Physical Data and DWH,
– Outputs
DA, Data Flow, Sources Conceptual/Physical Data,
16 – Company Confidential – June 18, 2012
17. INFORMATION ARCHITECTURE FOR DWH
E.BI Design and Implementation
– Objective:
Establish Multidimensional space (Business Universe) with Facts, Measures,
Dimensions and Hierarchies,
Build visualization including Reports, Dashboards, OLAP views,
Check if requested KPI set is supported and presented,
– Activities:
MDM Space (above)
Detailed specification of requested KPI with mapping
Detailed specification of Reports, Dashboards and OLAP Views
Access rights and delivery mechanisms
– Inputs:
DWH, IA and DA
– Outputs
IA, DA, DWH (MDM) models
17 – Company Confidential – June 18, 2012
18. INFORMATION ARCHITECTURE FOR DWH
Paths – Simplified Analysis, Design and Implementation
Generally many paths are
possible
Gap Analysis may discover
missing info
Analysis, Design and
Implementation of DWH, ETL
and BI are tightly
interconnected and
dependent on each other
18 – Company Confidential – June 18, 2012
19. INFORMATION ARCHITECTURE FOR DWH
Models – Business&Information Architecture
– Specifies
Application systems and applications,
Data Assets, Databases, Data Source/Destination, Data
Providers/Consumers,
Usage of Data Assets and Applications, cooperation and
collaboration of Applications and/or services,
ownership over the Data Assets, Applications and Services,
elements of SLA
ETL procedures on high abstraction level.
– Viewpoints
Architecture of the system
– Represents a “hat” for the rest of the system
19 – Company Confidential – June 18, 2012
20. INFORMATION ARCHITECTURE FOR DWH
Models - Source Model
– Specifies
details to understand structural relationships and meaning (conceptual)
internal structure of data source with implementation details (physical)
– tables, columns, views, keys, procedures, indexes, rights, constraints, triggers
consolidated by bidirectional synchronization and associated
transformation
Specifies extraction scheme for every data source,
Source for Source2Target mapping
– Viewpoint
One or more diagrams per source to represent subject area
– Used to synchronize changes from source into DWH IA
Starting point for change management
20 – Company Confidential – June 18, 2012
21. INFORMATION ARCHITECTURE FOR DWH
Models – DWH model
– Specifies
details to understand structural relationships and meaning (conceptual)
internal structure of DWH with implementation details (physical),
Internal MDM structure of Data Marts (physical),
For Staging, Trash, ODS, DWH and MDM
Target for Source2Target mapping
Relational2MDM mapping,
– Viewpoints
One or more diagrams to represent subject area
One or more MDM diagrams to represent Data Marts
– Used to synchronize changes from DWH IA to actual RDBMS
Ending point for change management
21 – Company Confidential – June 18, 2012
22. INFORMATION ARCHITECTURE FOR DWH
Models – Data Flow model
– Specifies
Connects all important data sources to destination
Data flows from source to destination with all attributes and constraints
– characteristics of flow processes, source and destination tables, kind of the
flow (ETL, replication or federation), integration preconditions, used
integration service, possible outcomes etc.
Mapping source2target within every step of the flow
– Viewpoints
One or more diagrams to represent actual flow task
– Aggregation, sort, filter, projection, split, join, merge, lookup
One or more diagrams to represent transformation control flow
– Used to present integration, consolidation and migration
22 – Company Confidential – June 18, 2012
23. MAPPING ON SYBASE POWERDESIGNER
DWH RELATED MODELS AND FEATURES
23 – Company Confidential – June 18, 2012
24. MAPPING ON SYBASE POWERDESIGNER
Data Models
24 – Company Confidential – June 18, 2012
25. MAPPING ON SYBASE POWERDESIGNER
Architecture and Requirements
25 – Company Confidential – June 18, 2012
26. MAPPING ON SYBASE POWERDESIGNER
DHW related features – Dependency Matrix
– What
Two dimensional hierarchical matrix
Present, review and create/delete links of particular kind between
two artifacts
– Any model, any diagram, any two artifacts
– Indirect (two or more links) dependency, drilling
Hierarchy of objects on row/column, Copy to CVS
– Reasoning
Full, rich, useful dependency analysis (network)
EAM to understand and present dependency between Data Assets
and Data Providers and Consumers,
PDM to create mapping overview
DMM to present actual source to target dependencies
26 – Company Confidential – June 18, 2012
27. MAPPING ON SYBASE POWERDESIGNER
DHW related features – Mappings
– What
Modeling Connection between objects
– Mapping with transformation (O/R, R/R, O/O)
– Generation (Generate Mappings)
Wizard to convert mappings into Transformation Task (ILM)
Mapping Editor
– Reasoning
Data Flow specification
Relational to Multidimensional,
– DWH2MDM
Relational to Relational,
– Source2Target
Any descriptive dependency
– Federation concept (not Replication or ETL)
27 – Company Confidential – June 18, 2012
28. MAPPING ON SYBASE POWERDESIGNER
DHW related features – Impact/Lineage Analysis
– What
Impact Analysis – consequences of the change
Lineage Analysis – objects forming the basis for object
Temporary View for Review
IAM for permanent view, snapshot (Drilling, Exploring)
Analysis Rules changeable (Impact/Lineage),
– Reasoning
Change Management evaluation, estimation and planning,
– To asses change impact before it happens (costs, time, resources)
Snapshots
– Development points, different version of system,
Meta Data BI
– To explore meta-data set, discover implicit dependencies,
28 – Company Confidential – June 18, 2012
29. DEMONSTRATION
EXAMPLE
29 – Company Confidential – June 18, 2012
32. REFERENT INFORMATION ARCHITECTURE
Data Access Layer - recommendations
– Set of processes, tools, activities and models:
Data extraction (E) from operational system to DWH,
Transformation to suitable shape (T),
– Data cleansing, consistency check, integrity,
– Translation from operational to enterprise format,
– Enterprise DWH data structure is inevitable different then operational,
Data loading into DWH (L),
– Recommendations :
Understand OS structure, rules and dynamics,
Dynamics of data refresh rate should be realistic,
Changed Data Capture Algorithm: Intrusive/Non Intrusive,
Apply effective ETL/ELT, use staging area,
Document everything - very intensive and complex changes,
– Mappings between Data Sources and DWH Destination,
32 – Company Confidential – June 18, 2012
33. REFERENT INFORMATION ARCHITECTURE
DWH Layer and Recommendations
– DWH, Staging, Trash and ODS
Common view on enterprise data, regardless of how/who will use it,
Unification offers flexibility in how the data is later interpreted
A stable source of historical information,
Efficient accommodation of a data explosion (growth),
Supply data for Analytical layer on required granularity,
Trash and Alerts&Matching to jump over initial cleansing (blocker),
– Recommendations:
Use pre-packaged solution and existing experience,
Use relational and multidimensional modeling (document all),
Relational to address performance issues, follow OS paradigm,
Multidimensional to present later Business Universe for Analytics,
Use views to transform and map Relational to Multidimensional and
back,
33 – Company Confidential – June 18, 2012
34. REFERENT INFORMATION ARCHITECTURE
Delivery Layer and Recommendations
– Set of processes, tools, activities and models:
Selection of data subset to be delivered,
Reorganization (format) of the data to be delivered,
– Aggregation, Summing, Counting, additional classification,
– Data transformation (Date, Time), slowly changing dimensions
Transform DWH structures to “Business Universe”,
– Facts, Measurements, Dimensions, Hierarchies, Business lang. abstraction
Granularity accordingly to the End User needs,
– Recommendations:
Use multidimensional modeling,
Model transformation/mappings between DWH and Data Mart(s),
Extend multidimensional model with required BI meta-data,
Define refresh rate on the basis on user needs (constraints),
Aggregations are difficult for incremental update
34 – Company Confidential – June 18, 2012
35. REFERENT INFORMATION ARCHITECTURE
MDM Layer and Recommendations
– Data marts
DWH derivatives, provide the business community answers to asked
questions and strategic analysis,
Tailored for a particular capability or function of enterprise,
Vertically organized and bounded to one business function,
Organized in multidimensional structure,
OLAP – On line Analytical Processing – ROLAP/MOLAP
– Recommendations:
Choose proper storage technology (ROLAP/MOLAP),
– Special storage may not be standard and may narrow your choices,
– Common storage may not be performative enough,
Choose Virtual Marts as basement of Analytics,
Use separate V. Mart for separate business concerns,
Adjust to BI meta-data requirements, use automated access,
35 – Company Confidential – June 18, 2012
36. INFORMATION ARCHITECTURE
FOR DWH PROJECTS
QUESTIONS?
RUAIRI PRENDIVILLE
SENIOR CONSULTANT , SYBASE (UK)
JUNE I4TH, 2012, ISTANBUL