3. Prague Data Management Meetup
Data Management
Data acquisition
Data storage
Data processing
Data interpretation
Data use
• Open professional interest group
• Everyone is welcome (in a passive or an active role)
• There are never enough topics
• We aim to meet regularly every month
• Running since September 2015
4. Historie
Date          Topic
10 Sep 2015   Data Management
14 Oct 2015   Data Lake
23 Nov 2015   Dark Data (without Dark Energy and Dark Force)
12 Jan 2016   Data Lake (again)
7 Mar 2016    Sad Stories About DW Modeling (sad stories only)
23 Mar 2016   Self-service BI Street Battle
27 Apr 2016   Let's explore the new Microsoft PowerBI!
22 Sep 2016   Data Management for Beginners
17 Oct 2016   Small Big Data
22 Nov 2016   DW Modeling Basics
23 Jan 2017   Data Warehouse Components
28 Feb 2017   Operational Data Store
8. Operational Database vs. Data Warehouse
Characteristic        Operational Database   Data Warehouse
Time focus            Current                Historical
Detail level          Individual             Individual and summary
Orientation           Process                Subject
Records per request   Few                    Thousands
Normalization level   Mostly normalized      Normalization relaxed
Update level          Highly volatile        Mostly refreshed (non-volatile)
Data model            Relational (3NF)       Relational (star schemas, hybrid, 3NF) and multidimensional (data cubes)
Source: Coursera
Operational Data Store
9. Inmon, Imhoff & Battas ODS Definition
• Features:
• Subject-oriented (like a data warehouse)
• Made up of integrated data (standard, consistent data formats)
• Volatile (changes as often as the source system)
• Current (low-latency data capture; no historical detail)
• Defined in the mid-1990s
• Later adopted by Gartner, Inc.
• When limited in scope to customer or product data, the canonical
ODS is similar to master data management (MDM).
10. Adastra Business Intelligence Reference Architecture
[Diagram: reference architecture, top to bottom.
Business Intelligence / Data Delivery: reports, dashboards and visualizations, real-time dashboards, self-service BI, mobile BI.
Analytical components: semantic models (OLAP, in-memory technologies), advanced analytics (machine learning, in-database data mining, R), perceptual / cognitive intelligence (recognition of human interaction and intent).
Data stores: ODS (operational reporting), Enterprise DWH (SMP and MPP, in-memory columnar), Big Data Platform / Data Lake (Hadoop, NoSQL), Event Processing (event ingestion, complex event processing, notifications, BI / application integration).
Information Management: data workflow orchestration, data transformation / processing, data management.
Sources: relational / structured data, unstructured data, streaming (IoT network, field gateway), big data.]
11. Architecture Reasons for ODS
• Copy vs. reference: why copy data into the ODS?
• Performance issues
  • Faster local data access
  • Load distribution (operational and reporting)
• Time issues
  • Coarser granularity in the secondary system
  • History
• Availability issues
  • e.g. primary 10x5, secondary 24x7
• Consolidation issues
  • e.g. consolidated client, product
• Security issues
12. ODS Possible Roles in Architecture
• ODS as data store for operational processes (PDI/CDI)
• ODS as DWH stage
• ODS as operational reporting data source
• ODS as data exchange component
• ODS as data cache for other systems
• ODS as MDM solution
• ODS as a replacement for a legacy system
• ODS as DWH data load type (near-real time DWH)
13. Truth in data
[Diagram: primary data, primary data (another system), secondary data, consolidated data, …, noise generator; all related to "Truth"]
• Independent truth in data does not exist
• Truth depends on how the business and the data architect define it
14. Inmon ODS Classes
• Class I. (Real-Time ODS)
• Transactions were moved to the ODS in an immediate manner from applications in a range
of one to two seconds from the moment the transaction was executed in the operational
environment until the transaction arrived at the ODS. In this case, the end user could hardly
tell the difference between an activity that had occurred in the operational environment and
the same activity as it was transmitted in the ODS environment.
• Class II. (Near Real-Time ODS)
• Activities that occurred in the operational environment were stored and forwarded to the
ODS every four hours or so. In this case, there was a noticeable lag between the original
execution of the transaction and the reflection of that transaction in the ODS environment.
However, this class of ODS was much easier to build and to operate than a Class I ODS.
• Class III. (Daily ODS)
• The time lag between execution in the operational environment and reflection in the ODS is
overnight. In a Class III ODS there is a noticeable time lag between the execution of the
transaction in the operational environment and the reflection of the transaction in the ODS
environment. This type of ODS is relatively easy to build.
• Class IV. (Data Warehouse ODS)
• A Class IV ODS is one that is fed from the data warehouse from analysis created by the DSS
analyst in the data warehouse environment and condensed down to a point where the
results of the analytical processing fit comfortably in the ODS. The input to the ODS can be
either regular or irregular. This class of ODS is very easy to build as long as the data
warehouse has already been constructed.
• (Class V.)
• Highly integrated and aggregated data source for reporting
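The class boundaries described above can be sketched as a small lookup. This is only an illustration: the function name `inmon_ods_class` and the exact cut-offs (2 seconds, 4 hours, 24 hours) are assumptions read off the wording "one to two seconds", "every four hours or so" and "overnight", not definitions taken from Inmon. Class V is left out because the slide gives no latency for it.

```python
from datetime import timedelta
from typing import Optional

# Illustrative cut-offs (assumptions), derived from the slide wording.
CLASS_I_MAX = timedelta(seconds=2)    # "one to two seconds"
CLASS_II_MAX = timedelta(hours=4)     # "every four hours or so"
CLASS_III_MAX = timedelta(hours=24)   # "overnight"


def inmon_ods_class(latency: Optional[timedelta], fed_from_dwh: bool = False) -> str:
    """Return "I", "II", "III" or "IV" for a feed (hypothetical helper)."""
    if fed_from_dwh:
        # Class IV is defined by its source (the data warehouse), not by latency.
        return "IV"
    if latency is None:
        raise ValueError("latency is required unless the feed comes from the DWH")
    if latency <= CLASS_I_MAX:
        return "I"
    if latency <= CLASS_II_MAX:
        return "II"
    if latency <= CLASS_III_MAX:
        return "III"
    raise ValueError("latency above one day does not match Classes I to III")
```

For example, `inmon_ods_class(timedelta(minutes=30))` falls into Class II under these assumed thresholds, while any feed flagged `fed_from_dwh=True` is Class IV regardless of latency.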
15. Alternative ODS Typology (Execution MiH)
• TYPE I (Data Cache)
• Online data store, used for transaction execution and system interface purposes
• These data stores hold source system data replicated in a central data store. The source systems exchange data with other systems through this data store instead of exchanging point-to-point interface files.
• Another application of this kind of data store architecture is to provide a common database for source systems to refer to directly. For example, you can have the source systems updating and referring to the sanitized master tables existing in the ODS (we will refer to this in our Master Data Management Section, which is still under authoring). There are situations where the source system directly refers to or updates a table in an ODS.
• TYPE II (CDI/PDI)
• Online data stores, used for servicing and relationships
• This is a similar application to the one mentioned above; however, the focus is limited to getting a single view of customer, process and master data for the sake of stakeholder servicing (such as customer, employee and vendor servicing). Examples are a customer relationship single view or a customer touch point single view. You can retrieve this single view during your inbound or outbound interactions with customers. This online operational access gives you the benefits of risk management, cross-sell, up-sell, etc.
• TYPE III (Operational Reporting)
• For reporting
• Technically it is not an ODS, but people use the term for this application as well. You can have a reporting data store to churn out your operational reports. It holds a replica of selected data from the source systems. It generally has low-intervention transformation.
16. Microsoft: DWH vs. ODS
• The purpose of the Data Warehouse (DWH) in the overall Business Intelligence architecture is to integrate corporate data from different heterogeneous data sources in order to facilitate historical and trend analysis reporting. It acts as a central repository and contains the "single version of truth" for the organization, carefully constructed from data stored in disparate internal and external operational databases and systems.
• The purpose of the Operational Data Store (ODS) is to integrate corporate data from different heterogeneous data sources in order to facilitate real-time or near real-time operational reporting. Often data in the ODS will be structured similarly to the source systems, although integration can involve data cleansing, de-duplication and business rules that ensure data integrity. An ODS is mainly intended to integrate data quite frequently at the lowest granular level for operational reporting in a close to real-time data integration scenario. Normally, an ODS will not be optimized for historical and trend analysis on huge sets of data.
• Let's summarize the differences between an ODS and a DW:
• An ODS is meant for operational reporting and supports current or near real-time reporting requirements, whereas a DW is meant for historical and trend analysis reporting on a large volume of data
• An ODS is targeted at low-level, granular queries, whereas a DW is used for complex queries against summary-level or aggregated data
• An ODS provides information for operational, tactical decisions based on current or near real-time data, whereas a DW delivers feedback for strategic decisions leading to overall system improvements
• In an ODS the frequency of data loads could be hourly or daily, whereas in a DW the frequency of data loads could be daily, weekly, monthly or quarterly
18. Adastra ODS Principles
• Integrated and consolidated data
• Subject-oriented data
• Master data focus (business entities)
• Changing data (current data)
• Limited history data (transactions)
• Low-level data granularity (no aggregations)
• Mix between OLTP and DWH
• "The best of both worlds"
19. ODS Features
• One version of truth (presented differently to different processes)
• Single customer view across all systems / businesses
• Customer Data Integration
• Product Data Integration
• Data cleansing and consolidation (MDM platform)
• Integrated data for other systems or applications (data cache)
• Online access (read and write)
• Quick access to current data (operational reporting)
• One of the components of an SOA architecture (though not limited to SOA)
• Efficient common information exchange among businesses or systems
• One platform for all businesses and IT systems (online and offline processes)
• Data sets from many sources
• Support or replacement for legacy systems
20. ODS Benefits
Business Benefits
• Real-time consolidated and integrated data for any purpose
• More reliable mission critical processes
• Reduced cost of IT solutions
• Single customer view
• Integrated product data
• Enabling multichannel and efficient campaign management
• Data for credit risk management
• Integrated communication across all channels
• Economical network analysis
• Faster collection processes
• Online fraud detection
• Near-real time operational reporting
• Data monetization
Technical Benefits
• One version of truth (presented differently to different processes)
• Single customer view across all systems / businesses
• Customer Data Integration (CDI)
• Product Data Integration (PDI)
• Data cleansing and consolidation (MDM platform)
• Integrated data for other systems or applications (data cache)
• Online access (read and write)
• Quick access to current data (operational reporting)
• One of the central components of an SOA architecture
• Efficient common information exchange among businesses or systems
• One platform for all businesses and IT systems (online and offline processes)
• Data sets from many sources
• Support or replacement for legacy systems
21. ODS vs. ADS (DWH or EDW)
ONLINE WORLD (ODS)
1. Focus on operational processes
2. Online read and write 24/7
3. For other IT systems / processes
4. Limited data set
5. Very limited history
6. Focus on current data
7. Low-level data granularity
8. Integration with ADS
OFFLINE WORLD (ADS)
1. Focus on analytic tasks
2. Offline batch processing
3. For end users
4. Large data set
5. Long history
6. Focus on all data
7. Many levels of data granularity
8. Data marts and data aggregates
22. ODS Data Refresh Time Period
• Real-time
• Near-real time
• Many times per day
• Daily
• Monthly
• Ad-hoc
• Hybrid
23. ODS Data Transformations
• Batch Processing
• ETLs
• Extract, Transform, Load
• Transform data from source table / tables to one target table
• Transformation ETLs, Synchronization ETLs
• Advanced data processing
• Batch data cleansing and unification
• Advanced calculations
• Online Processing
• APIs
• Read APIs
• Write APIs
• Change Data Capture (CDC)
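To make the Change Data Capture item above more concrete, here is a minimal sketch of applying captured changes to an ODS table held in memory. Everything in it is an assumption for illustration: the `ChangeEvent` shape, the `apply_changes` helper and the idea of a per-event sequence number are hypothetical and do not describe any particular CDC product.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One captured change from a source system (hypothetical shape)."""
    op: str                                # "insert", "update" or "delete"
    key: int                               # primary key of the source row
    row: Optional[Dict[str, Any]] = None   # changed column values; None for delete
    seq: int = 0                           # position in the source change log


def apply_changes(table: Dict[int, Dict[str, Any]], events: List[ChangeEvent]) -> int:
    """Apply events in log order; return the last sequence number applied (0 if none)."""
    last_seq = 0
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.op in ("insert", "update"):
            # Upsert: merge the changed columns into the current row.
            # Only the current state is kept, no history.
            table[ev.key] = {**table.get(ev.key, {}), **(ev.row or {})}
        elif ev.op == "delete":
            table.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
        last_seq = ev.seq
    return last_seq


ods_customer: Dict[int, Dict[str, Any]] = {}
events = [
    ChangeEvent("insert", 1, {"name": "Alice", "city": "Prague"}, seq=1),
    ChangeEvent("update", 1, {"city": "Brno"}, seq=2),
    ChangeEvent("insert", 2, {"name": "Bob", "city": "Ostrava"}, seq=3),
    ChangeEvent("delete", 2, seq=4),
]
last_applied = apply_changes(ods_customer, events)
```

Keeping only the latest state per key matches the "volatile, current" character of an ODS described earlier; a batch ETL would instead rebuild or synchronize the target table on a schedule.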
24. System independency – Reason for API
[Diagram: a database accessed by an external data consumer, shown without and with an interface layer. Labels: database provider's competency; consumer's competency; interface layer; concentrated transformation logic; enterprise-level impact analysis required; external workload consumers.]
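The idea behind an interface layer can be sketched with a small read API in front of a database. This is a hedged illustration using Python's built-in `sqlite3`; the table `party`, its columns `full_nm` and `eml`, and the function `get_customer` are hypothetical names, not part of any system described in the slides.

```python
import sqlite3
from typing import Dict, Optional


def create_ods(conn: sqlite3.Connection) -> None:
    # Internal schema: owned by the database provider and free to change.
    conn.execute(
        "CREATE TABLE party (party_id INTEGER PRIMARY KEY, full_nm TEXT, eml TEXT)"
    )
    conn.execute("INSERT INTO party VALUES (1, 'Jana Novak', 'jana@example.com')")


def get_customer(conn: sqlite3.Connection, customer_id: int) -> Optional[Dict[str, object]]:
    """Read API: the only contract that external consumers depend on."""
    row = conn.execute(
        "SELECT party_id, full_nm, eml FROM party WHERE party_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    # The mapping from internal columns to the public shape lives in one place.
    return {"customer_id": row[0], "name": row[1], "email": row[2]}


conn = sqlite3.connect(":memory:")
create_ods(conn)
customer = get_customer(conn, 1)
```

If the provider later renames `full_nm`, only `get_customer` has to change. Consumers that query the table directly would each need their own fix, which is where an enterprise-level impact analysis comes in.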
25. Service level agreement (SLA)
• A definition of services
• Availability (99.99%)
• Open hours (24x7, 10x5)
• Performance
• Problem management
• Security
• Disaster recovery
• Termination of agreement
Availability % Downtime per year
98% 7.30 days
99% 3.65 days
99.5% 1.83 days
99.9% 8.76 hours
99.99% 52.6 min
99.999% 5.26 min
99.9999% 31.5 s
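The downtime figures in the table follow from a simple formula: downtime per year = (100 − availability %) / 100 × length of a year. The sketch below reproduces them, assuming a 365-day year (which is what the table values imply); the function name `downtime_seconds_per_year` is just an illustrative choice.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 365-day year, as the table values imply


def downtime_seconds_per_year(availability_pct: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (100 - availability_pct) / 100 * SECONDS_PER_YEAR


if __name__ == "__main__":
    for pct in (98, 99, 99.5, 99.9, 99.99, 99.999, 99.9999):
        seconds = downtime_seconds_per_year(pct)
        print(f"{pct}%: {seconds / 86400:.2f} days = {seconds / 3600:.2f} hours "
              f"= {seconds / 60:.1f} min = {seconds:.1f} s")
```

For a service with limited open hours (for example 10x5), the same formula applies, with the year length replaced by the agreed service hours per year.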
30. Data domains
• Third-party products
• Debt relief
• ETM
• Offers
• Applications
• Consents
• Classification
• Economic groups
• Campaigns
• Products
• Segmentation
• Behavioral data
• External data
• Client identification
• Signing authority
• Contact details
• Unification
• Other
33. ODS Core Tables (ABDM)
[Diagram entities: Instance Party, Unified Party, Located Address, Instance Address, Instance Phone, Unified Phone, Instance Email, Instance ID Card, Account, Account Role, Account Balance Fact, Product Instance, Product Instance Party Role, Product Instance Relationship, Loan Instance, Facility Instance, Card Instance, ... Instance, Business Product Type, Application, Application Detail]
34. Benefits
Business Benefits
• Real-time consolidated and integrated data for any purpose
• More reliable mission critical processes
• Reduced cost of IT solutions
• Single customer view
• Integrated product data
• Enabling multichannel and efficient campaign management
• Data for credit risk management
• Integrated communication across all channels
• Economical network analysis
• Faster collection processes
• Online fraud detection
• Near-real time operational reporting
• Data monetization
Technical Benefits
• One BI version of truth (presented differently to different processes)
• Single customer view across all systems / businesses
• Customer Data Integration (CDI)
• Product Data Integration (PDI)
• Data cleansing and consolidation (MDM platform)
• Integrated data for other systems or applications (data cache)
• Online access (read and write)
• Quick access to current data (operational reporting)
• One of the central components of an SOA architecture
• Efficient common information exchange among businesses or systems
• One platform for all businesses and IT systems (online and offline processes)
• Data sets from many sources
• Support or replacement for legacy systems
38. Data domains
• Current accounts / deposits
• Loans
• Cards
• Insurance
• Services
• Third-party products (energy, telco, ..)
• Transactions
• Reservations / holds
• Clients
• Product applications
• Process handling requests
• Contacts
• Collateral
• Events
39. Benefits
• Consolidation of data from many BE (back-end) systems
• Reduced load on the middleware
• Faster response times for front-end applications
• High availability of services
• Online interface for the DWH
• Event detection
• Data signpost into the BE systems
• Less time and effort to deliver requirements
• No complex process integration
• Data propagation is really fast outside accounting closing periods
41. [Diagram: ODS deployment on MS SQL Server 2012.
Source systems: CRM, eShop, Navision (connected over OLE DB).
ODS: layer L0, layer L1, metadata, directory.
Interface for downstream systems: web services.
Navision data access: disk volume for NAV on a disk array; disk array agent; volume snapshot; SQL Server 2012 instances for Navision and the ODS.
ETL flow labels (SQL Server Agent): start ETL, volume snapshot, attach volume to server, end ETL, detach volume.]
42. Benefits
• Reduced load on the primary system
• Integration of e-shops
• Support for the loyalty program
• Easier integration of new systems
• Clearer data flows
• One version of truth for downstream systems and for customers on the web
• Direct access to data through database snapshots
• Web services
  • methods with online access
  • methods for data synchronization