Building the Enterprise Data Architecture
The Components, Processes and Tools of a Successful Enterprise Data Architecture
Enterprise Data Serves Two Purposes
 Running the Business
◦ Tracking transactions such as sales, invoices,
payments, deliveries, payroll, benefits, etc.
 Managing the Business
◦ Accounting, Budgeting and Planning
◦ Marketing
◦ Asset Management
◦ Process Optimization
◦ Strategy
 Definition
 Monitoring
 Investment and resource allocation
Why Is Data Architecture So Hard?
 Complex
◦ Data IS complex
◦ The world changes rapidly, hard to keep up
 Shared Data and Competing Stakeholder Interests
◦ Stakeholders have different definitions and uses for shared data
◦ Conflicts are often resolved with incompatible models and duplication
 Data Proliferation
◦ Useful data is replicated or transferred to other systems (e.g. accounting)
◦ How and where data is used is often unknown
 Data can be inaccurate
◦ Data entry is error prone and data duplication is common
◦ Calculation errors are surprisingly common
 Resource Intensive
◦ Databases can be very large
◦ Managing data can consume a lot of staff resources
◦ Tools and licenses can be pricey
A Robust Data Architecture Should Be
 Trusted
 Accessible
 Timely
 Integrated
 Consistent
 Comprehensive (Complete)
 Verified
 Documented
 Discoverable
 Flexible
 Scalable
 Cost Effective
 Transparent and Visible
 Safe
 Governed
 Auditable (Lineage and eDiscovery)
 Secure
The Data Lifecycle
 Data goes through a lifecycle with phases such as:
◦ Creation/capture
◦ Modification
◦ Processing
 Aggregation, joining, filtering, transformation
◦ Consumption:
 Product/service delivery/fulfillment
 Reporting
 Statistical analysis
 Exploration (Data discovery, search)
◦ Archiving/destruction
 Data lifecycle can be complex and difficult to manage
A Successful Data Architecture Must Deal With
1. Data Repositories
2. Data Capture and Ingestion
3. Data Definition and Design
4. Data Integration
5. Data Access and Distribution
6. Data Analysis
And More….
Data Repositories
 Objective:
◦ Repositories must be extremely solid and reliable.
 Issues
◦ Complexity of tools
◦ Licensing, resource and skill costs can be high
◦ Tolerance for failure is extremely low
 Types of Data Repositories
◦ Application
◦ Reference/Master
◦ Meta Data
◦ Data Warehouse
◦ Discovery
◦ Archive
◦ Offsite/DR Repositories
 Technology
◦ Traditional RDBMS: Oracle, DB2, SQL Server, Informix
◦ Data Warehouse: Netezza, Greenplum, Exadata, etc.
◦ Big Data: Hadoop, CouchDB, Redis, HBase
Data Capture and Ingestion
 Objective:
◦ Capture all relevant business data
◦ Ensure data is both accurate and complete at point of capture
 Issues
◦ Great diversity of sources (online entry, FTP, APIs, real-time streams,
unstructured data, etc.)
◦ Incoming data is often unreliable, incomplete
◦ Big data explosion
 Types of Data Capture
◦ Online
◦ File transfers
◦ Social media (real time)
◦ Logs (web, security, o/s)
◦ Data subscriptions (address databases, etc.)
 Technology
◦ Online Applications (JEE, .NET, COBOL, etc.)
◦ File transfer: FTP
◦ APIs: Web services, REST, SOAP
Data Definition and Design
 Objective:
◦ Organize the data to suit business requirements
◦ Define data components, structures and processes to enable shared use and collaboration
◦ Define rules for format, domain, structure and relationships
 Issues
◦ Data modeling is complex – often requires unsatisfactory trade-offs
◦ As data is distributed or shared, quality, structure and definition tend to worsen
◦ Providing sufficient flexibility for graceful evolution is difficult
 Types of Data Definition Artefacts
◦ Data Glossaries
◦ Meta Data
◦ Data Domains
◦ Data Structures
◦ Data Relationships
◦ Validation Rules
◦ Data Transformations
 Technology
◦ Data modeling tools (Erwin, PowerDesigner, XMLSpy, etc.)
◦ Metadata workbenches
◦ Business glossaries
Data Integration
 Objectives
◦ Data must usually be shared (copied or transferred) to realize its full value
◦ Shared data requires transformation, filtering, verification, security controls, calculations,
aggregations etc.
 Issues
◦ In most cases, data integration is done in an inconsistent manner
◦ Lack of enterprise standards and controls. Department silos
◦ Data can become less reliable each time it is handled
◦ Rules for data integration are poorly understood and documented
 Types of Data Integration
◦ EAI, SOA
◦ ETL
◦ Federation
 Technology
◦ ETL Tools: DataStage, Informatica, SSIS, Oracle ETL
◦ ESB: Web services, REST, WebSphere, WebLogic
◦ Messaging servers: MQSeries, WebSphere, ActiveMQ, MessageBroker, BizTalk
◦ Federated databases
Data Access and Distribution
 Objectives:
◦ Data is of no use unless it can be efficiently and reliably consumed by client applications
◦ Data must be presented in a consistent, trusted and well understood way
 Issues
◦ Access patterns are very diverse
◦ Client requirements can cause conflicts that are difficult to resolve
 Types of Access
◦ SQL
◦ File APIs
◦ Dashboards
◦ Query engines
◦ BI tools
◦ Canned Reports
◦ Discovery tools
◦ APIs
◦ Published views and queries
◦ Publication and replication
 Technology
◦ Reporting tools: Cognos, BO, Tableau
◦ Data feeds: ETL
◦ Query languages: SQL, SAS, R, etc.
Data Analysis
 Objectives:
◦ Provide insight into business operations
◦ Assist in planning; provide predictions
 Issues:
◦ Complex and difficult to do right
◦ Skills shortages
 Types of Analysis
◦ Predictive analytics
◦ Correlation
◦ Pricing and underwriting
 Technology
◦ SAS, SPSS, R
◦ Machine Learning algorithms
More Details
 There are many aspects to consider in
good architecture, including:
◦ Business Glossary
◦ Metadata
◦ Data Discovery and Profiling
◦ Data Lineage
◦ Data Reconciliation
◦ Data Modeling
◦ Master Data
◦ Data Wrangling
◦ Data Pipelines
Business Glossary
 The Business Glossary is the place where all
stakeholders can agree on what the data
actually means and what data rules
apply
◦ In the real world, lack of precision and conflict
about terminology and meaning is common
 This information needs to be managed,
coordinated and subject to quality controls
 An essential first step to a robust enterprise
architecture
 Business stakeholders own the data
definitions
Meta Data
 For every data field, record and
process, it's useful to provide detailed
definitions, descriptions and rules that
govern the production, use and storage of
the data item
 This information is managed,
coordinated and subject to quality
controls
 Targeted to technical stakeholders
Data Discovery and Profiling
 Data Discovery is about knowing
where the data arrives from, where it
goes to and where it is used
 Data profiling analyzes the content,
structure, and relationships within data
to uncover patterns and rules,
inconsistencies, anomalies, and
redundancies.
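As a rough illustration of profiling, the sketch below summarizes a single column: how many values are missing, how many distinct values exist, and which value is most common. The `profile_column` function and the sample policy records are hypothetical, chosen only to show how anomalies (inconsistent casing, blanks) and redundancies (repeated IDs) surface in such a summary.

```python
from collections import Counter

def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, top value."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }

# Hypothetical sample records (not from the original deck)
policies = [
    {"policy_id": "P1", "state": "NY"},
    {"policy_id": "P2", "state": "ny"},   # inconsistent casing
    {"policy_id": "P3", "state": ""},     # missing value
    {"policy_id": "P3", "state": "NY"},   # repeated policy_id
]

state_profile = profile_column(policies, "state")
id_profile = profile_column(policies, "policy_id")
print(state_profile)
print(id_profile)
```

Here the `state` profile reports one missing value and two distinct spellings of the same state, while the `policy_id` profile shows fewer distinct IDs than rows, which hints at duplication.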
Data Lineage
 In many cases it's important to know how
data has reached a particular point.
 Example: Why do reports A and B
disagree about last month’s revenues?
 Answer: Trace each report back through
its lineage to check that both reports
draw on the same data sources.
 NOTE: Sometimes data can move a long
way and through many steps before it
ends up in a report
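The report A versus report B question can be sketched as a walk over a lineage graph. Everything in this example is an assumption made for illustration: the `LINEAGE` dictionary, the dataset names and the `upstream_sources` helper are hypothetical, and the graph is assumed to contain no cycles.

```python
# Hypothetical lineage graph: each dataset maps to the datasets it is built from.
LINEAGE = {
    "report_a": ["sales_mart"],
    "report_b": ["finance_extract"],
    "sales_mart": ["warehouse_sales"],
    "finance_extract": ["ledger_db"],
    "warehouse_sales": ["orders_db"],
}

def upstream_sources(dataset, lineage):
    """Return the root sources (datasets with no recorded inputs) feeding `dataset`.

    Assumes the lineage graph is acyclic.
    """
    parents = lineage.get(dataset, [])
    if not parents:
        return {dataset}
    roots = set()
    for parent in parents:
        roots |= upstream_sources(parent, lineage)
    return roots

sources_a = upstream_sources("report_a", LINEAGE)
sources_b = upstream_sources("report_b", LINEAGE)
same_sources = sources_a == sources_b
print(sources_a, sources_b, same_sources)
```

In this made-up graph the two reports resolve to different root systems, which would be one plausible explanation for disagreeing revenue figures.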
Data Synchronization and Reconciliation
 When data exists in multiple locations,
there is always a danger that errors will
creep in
 Therefore, as part of a robust control
process, it's useful to compare the data in
source/target systems to find errors or
omissions
 This can however become very complex
as it may involve reconstructing the data
lineage of every item
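A minimal sketch of a source/target comparison, assuming both systems can be read as lists of records that share a key field. The `reconcile` function, the `id` key and the sample rows are hypothetical; real reconciliation usually also has to account for transformations applied between the two systems.

```python
def reconcile(source, target, key):
    """Compare two record sets by key and report omissions and mismatches."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

# Hypothetical sample data
source_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
target_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 205},   # value differs from the source
    {"id": 4, "amount": 10},    # not present in the source
]

result = reconcile(source_rows, target_rows, "id")
print(result)
```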
Data Modeling
 Data modeling with 3NF and
dimensional modeling is relatively
mature, but the real world is complex,
which can make some models hard to
design, build and use
 NoSQL takes a different approach
◦ Data is stored as documents
◦ Data is flexible and dynamic
Master Data
 When data is generated and consumed
in multiple locations and systems, it's very
difficult to manage the data
 A Master Data Management (MDM)
repository can:
◦ Act as the corporate system of record
◦ Define the shared canonical data model
◦ Enforce data rules consistently across all
systems
◦ Resolve conflicts
◦ Detect and remove duplicates
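Duplicate detection, the last item above, can be sketched with a simple match key. This is only an illustration under stated assumptions: the `match_key` rule (normalized name plus postcode), the record layout and the system prefixes in the IDs are all hypothetical, and real MDM tools typically use far more sophisticated matching.

```python
from collections import defaultdict

def match_key(record):
    """Build a crude match key: lower-cased, whitespace-normalized name plus postcode."""
    name = " ".join(record["name"].lower().split())
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, postcode)

def find_duplicates(records):
    """Group record IDs that share a match key; return groups with more than one ID."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical customer records arriving from two systems
customers = [
    {"id": "CRM-1", "name": "Jane  Smith", "postcode": "ab1 2cd"},
    {"id": "BILL-7", "name": "jane smith", "postcode": "AB12CD"},
    {"id": "CRM-2", "name": "John Doe", "postcode": "ZZ9 9ZZ"},
]

duplicates = find_duplicates(customers)
print(duplicates)
```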
Data Wrangling
 Data is often less than perfect
◦ Missing data
◦ Incompatible fields
◦ Semantic incompatibilities
 Data wrangling is the art of
merging diverse data sources into a
consistent data store with consistent
semantics, formats and values
Data Pipelines
 A data pipeline is the process which handles the transfer of data
from one or more sources and delivers it to one or more targets
 A data pipeline may have the following stages:
1. Extraction
2. Cleansing
3. Transformation
4. Load
 Types of Transfer:
◦ Batch (ETL)
◦ Continuous/Real Time (SOA, Federation)
 Push
 Pull
 Issues:
◦ Completeness
◦ Accuracy
◦ Latency
◦ Access method
◦ Technology and tools
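The four stages listed above can be sketched as four small functions chained together. This is a toy batch pipeline under stated assumptions: the function names, the `region`/`amount` fields and the in-memory "warehouse" dictionary are hypothetical stand-ins for real source and target systems.

```python
def extract(source_rows):
    """Extraction: copy the raw rows unchanged, as they would land in a staging area."""
    return [dict(row) for row in source_rows]

def cleanse(rows):
    """Cleansing: keep rows whose amount parses as a number.

    Rows that fail are dropped here for brevity; a fuller version would
    keep them for a rejection report.
    """
    clean = []
    for row in rows:
        try:
            clean.append({**row, "amount": float(row["amount"])})
        except (TypeError, ValueError):
            continue
    return clean

def transform(rows):
    """Transformation: aggregate amounts per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

def load(totals, target):
    """Load: write the aggregated values into the target store."""
    target.update(totals)
    return target

# Hypothetical source data
source = [
    {"region": "east", "amount": "10.5"},
    {"region": "east", "amount": "4.5"},
    {"region": "west", "amount": "n/a"},   # fails cleansing
    {"region": "west", "amount": "20"},
]

warehouse = load(transform(cleanse(extract(source))), {})
print(warehouse)
```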
Data Pipelines – Data Extraction
 Data is extracted from a source
system and landed in a staging area
 Data extracts can be:
◦ Complete dumps of all data
◦ Changes only
 Principles:
◦ Capture Everything
◦ Land everything in a staging area
◦ Read once, write many
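A "changes only" extract is often driven by a high-water mark. The sketch below assumes, hypothetically, that each source row carries an `updated_at` value in ISO date format (so plain string comparison orders it correctly); the `extract_changes` function and the sample rows are illustrative only.

```python
def extract_changes(rows, last_extracted_at):
    """Return rows modified after the previous extract, plus the new high-water mark."""
    changed = [row for row in rows if row["updated_at"] > last_extracted_at]
    new_mark = max((row["updated_at"] for row in changed), default=last_extracted_at)
    return changed, new_mark

# Hypothetical source table; ISO dates compare correctly as strings
source_rows = [
    {"id": 1, "updated_at": "2013-01-05"},
    {"id": 2, "updated_at": "2013-02-10"},
    {"id": 3, "updated_at": "2013-03-01"},
]

changed, mark = extract_changes(source_rows, "2013-02-01")
changed_again, mark_again = extract_changes(source_rows, mark)
print([row["id"] for row in changed], mark)
print(changed_again, mark_again)
```

Running the extract a second time with the returned mark yields no rows, which is the behaviour that distinguishes a delta extract from a complete dump.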
Data Pipelines - Data Cleansing
 Cleansing processes will fix or flag:
◦ Invalid data
◦ Missing data
◦ Inaccurate data
 Principles:
◦ Partition data into clean and rejected data.
◦ Generate rejection reports
◦ Land data into a zone for ‘clean’ data
◦ Errors should be corrected in the source
systems when detected
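The partitioning principle above might look like the following sketch. The validation rules, field names (`policy_id`, `premium`) and report format are hypothetical; the point is only that every row ends up in exactly one of two zones and that each rejection carries its reasons.

```python
def validate(row):
    """Return a list of reasons the row is rejected (an empty list means clean)."""
    reasons = []
    if not row.get("policy_id"):
        reasons.append("missing policy_id")
    premium = row.get("premium")
    if not isinstance(premium, (int, float)) or premium < 0:
        reasons.append("invalid premium")
    return reasons

def partition(rows):
    """Split rows into a clean zone and a rejected zone with reasons attached."""
    clean, rejected = [], []
    for row in rows:
        reasons = validate(row)
        if reasons:
            rejected.append({"row": row, "reasons": reasons})
        else:
            clean.append(row)
    return clean, rejected

# Hypothetical staged rows
staged = [
    {"policy_id": "P1", "premium": 120.0},
    {"policy_id": "", "premium": 80.0},
    {"policy_id": "P3", "premium": -5},
]

clean, rejected = partition(staged)
rejection_report = [
    f"{item['row'].get('policy_id') or '?'}: {', '.join(item['reasons'])}"
    for item in rejected
]
print(clean)
print(rejection_report)
```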
Data Pipelines - Data Transformation
 Transformation is a data integration function
that modifies existing data or creates new data
through functions such as calculations and
aggregations.
 Common transformations include:
◦ Calculations
◦ Splits
◦ Joins
◦ Lookups
◦ Aggregations
◦ Filtering
 Principles
◦ Stage transformed files in publish-ready landing
zones
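Several of the common transformations listed above (filtering, lookups, calculations and aggregations) can be combined in a few lines. The lookup table, field names and sample sales rows below are hypothetical and exist only to make the sketch runnable.

```python
# Hypothetical lookup table mapping state codes to sales regions
REGION_BY_STATE = {"NY": "east", "NJ": "east", "CA": "west"}

def transform(sales):
    """Filter, look up, calculate and aggregate revenue per region."""
    totals = {}
    for sale in sales:
        if sale["quantity"] <= 0:                                # filtering
            continue
        region = REGION_BY_STATE.get(sale["state"], "unknown")   # lookup
        revenue = sale["quantity"] * sale["unit_price"]          # calculation
        totals[region] = totals.get(region, 0) + revenue         # aggregation
    return totals

# Hypothetical cleansed input
sales = [
    {"state": "NY", "quantity": 2, "unit_price": 10},
    {"state": "NJ", "quantity": 1, "unit_price": 5},
    {"state": "CA", "quantity": 3, "unit_price": 4},
    {"state": "TX", "quantity": 0, "unit_price": 7},   # filtered out
]

revenue_by_region = transform(sales)
print(revenue_by_region)
```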
Data Pipelines - Data Loading
 Data can be loaded:
◦ As files
◦ Via RDBMS utilities
◦ SQL
◦ Messages
 Principles
◦ Group loading files by target repository
Data Pipelines – Design Practices
 For each stage of the data pipeline, it's
advisable to create:
◦ Conceptual Models – what will happen
◦ Logical Models – how will it happen (no
technical details)
◦ Physical Models – Technical details to
realize the design
Case Study – Extraction Model
 Conceptual Model
◦ Extract 2013 policy records from system A and system B
 Logical Model
◦ Extract 2013 policy records from daily batch files generated
by system A
◦ Issue SQL queries to tables S, T in system B and store in
file X
 Physical Model
◦ From system A read file CS403.txt from directory x/y/z and
store extract in ….
◦ Eliminate all records in A that…
◦ From system B run SQL query Select * from Policies,
Clients … and store extract in file … with format….
◦ Etc
The Future
 Hortonworks claims HALF the world's
data will soon be in Hadoop clusters
 For the first time in a generation,
relational databases are seriously
threatened by NoSQL databases:
Redis, HBase, MongoDB, CouchDB, etc.
 Data exploration tools such as Tableau and
QlikView are disrupting the traditional BI
market
 In memory databases make it feasible to
eliminate the data mart altogether
Summary
 A Data Architecture is much more than
a collection of Erwin data models –
that’s just the starting point
 Coordination, synchronization and
collaboration are the hallmarks of a
successful data architecture

Contenu connexe

Tendances

Create a 'Customer 360' with Master Data Management for Financial Services
Create a 'Customer 360' with Master Data Management for Financial ServicesCreate a 'Customer 360' with Master Data Management for Financial Services
Create a 'Customer 360' with Master Data Management for Financial ServicesPerficient, Inc.
 
The Importance of MDM - Eternal Management of the Data Mind
The Importance of MDM - Eternal Management of the Data MindThe Importance of MDM - Eternal Management of the Data Mind
The Importance of MDM - Eternal Management of the Data MindDATAVERSITY
 
10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Managementibi
 
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...Christopher Bradley
 
Whitepaper on Master Data Management
Whitepaper on Master Data Management Whitepaper on Master Data Management
Whitepaper on Master Data Management Jagruti Dwibedi ITIL
 
Data-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMData-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMDATAVERSITY
 
Sustaining Data Governance and Adding Value for the Long Term
Sustaining Data Governance and Adding Value for the Long TermSustaining Data Governance and Adding Value for the Long Term
Sustaining Data Governance and Adding Value for the Long TermFirst San Francisco Partners
 
A Step-by-Step Guide to Metadata Management
A Step-by-Step Guide to Metadata ManagementA Step-by-Step Guide to Metadata Management
A Step-by-Step Guide to Metadata ManagementSaachiShankar
 
The what, why, and how of master data management
The what, why, and how of master data managementThe what, why, and how of master data management
The what, why, and how of master data managementMohammad Yousri
 
Building a strong Data Management capability with TOGAF and ArchiMate
Building a strong Data Management capability with TOGAF and ArchiMateBuilding a strong Data Management capability with TOGAF and ArchiMate
Building a strong Data Management capability with TOGAF and ArchiMateBas van Gils
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeDATAVERSITY
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsDATAVERSITY
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...DATAVERSITY
 
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...DATAVERSITY
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesDATAVERSITY
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesLars E Martinsson
 
Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterDATAVERSITY
 

Tendances (20)

Create a 'Customer 360' with Master Data Management for Financial Services
Create a 'Customer 360' with Master Data Management for Financial ServicesCreate a 'Customer 360' with Master Data Management for Financial Services
Create a 'Customer 360' with Master Data Management for Financial Services
 
The Importance of MDM - Eternal Management of the Data Mind
The Importance of MDM - Eternal Management of the Data MindThe Importance of MDM - Eternal Management of the Data Mind
The Importance of MDM - Eternal Management of the Data Mind
 
10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management
 
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
 
Whitepaper on Master Data Management
Whitepaper on Master Data Management Whitepaper on Master Data Management
Whitepaper on Master Data Management
 
Ebook - The Guide to Master Data Management
Ebook - The Guide to Master Data Management Ebook - The Guide to Master Data Management
Ebook - The Guide to Master Data Management
 
Big Data Boom
Big Data BoomBig Data Boom
Big Data Boom
 
Data-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMData-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDM
 
Sustaining Data Governance and Adding Value for the Long Term
Sustaining Data Governance and Adding Value for the Long TermSustaining Data Governance and Adding Value for the Long Term
Sustaining Data Governance and Adding Value for the Long Term
 
A Step-by-Step Guide to Metadata Management
A Step-by-Step Guide to Metadata ManagementA Step-by-Step Guide to Metadata Management
A Step-by-Step Guide to Metadata Management
 
The what, why, and how of master data management
The what, why, and how of master data managementThe what, why, and how of master data management
The what, why, and how of master data management
 
Building a strong Data Management capability with TOGAF and ArchiMate
Building a strong Data Management capability with TOGAF and ArchiMateBuilding a strong Data Management capability with TOGAF and ArchiMate
Building a strong Data Management capability with TOGAF and ArchiMate
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
 
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
Subscribing to Your Critical Data Supply Chain - Getting Value from True Data...
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
 
Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words Matter
 

En vedette

Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data GovernanceDATAVERSITY
 
Enterprise Master Data Architecture
Enterprise Master Data ArchitectureEnterprise Master Data Architecture
Enterprise Master Data ArchitectureBoris Otto
 
Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data GovernanceChristopher Bradley
 
Enterprise data architecture of complex distributed applications & services
Enterprise data architecture of complex distributed applications & servicesEnterprise data architecture of complex distributed applications & services
Enterprise data architecture of complex distributed applications & servicesDavinder Kohli
 
White Paper - Overview Architecture For Enterprise Data Warehouses
White Paper -  Overview Architecture For Enterprise Data WarehousesWhite Paper -  Overview Architecture For Enterprise Data Warehouses
White Paper - Overview Architecture For Enterprise Data WarehousesDavid Walker
 
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...Craig Milroy
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model DATUM LLC
 
Enterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsEnterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsBoris Otto
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesBoris Otto
 
Digital Transformation, Enterprise Architecture, Big Data by Danairat
Digital Transformation, Enterprise Architecture, Big Data by DanairatDigital Transformation, Enterprise Architecture, Big Data by Danairat
Digital Transformation, Enterprise Architecture, Big Data by DanairatDanairat Thanabodithammachari
 
Learn Togaf 9.1 in 100 slides!
Learn Togaf 9.1 in 100 slides!Learn Togaf 9.1 in 100 slides!
Learn Togaf 9.1 in 100 slides!Sam Mandebvu
 
Enterprise Architecture for Dummies - TOGAF 9 enterprise architecture overview
Enterprise Architecture for Dummies - TOGAF 9 enterprise architecture overviewEnterprise Architecture for Dummies - TOGAF 9 enterprise architecture overview
Enterprise Architecture for Dummies - TOGAF 9 enterprise architecture overviewWinton Winton
 
From Strategy to Implementation the Right Steps to Creating a BI Platform
From Strategy to Implementation the Right Steps to Creating a BI Platform From Strategy to Implementation the Right Steps to Creating a BI Platform
From Strategy to Implementation the Right Steps to Creating a BI Platform Edgewater
 
I'm not Crazy, It's the Situation - Coping with a Missing Loved One July 30 2016
I'm not Crazy, It's the Situation - Coping with a Missing Loved One July 30 2016I'm not Crazy, It's the Situation - Coping with a Missing Loved One July 30 2016
I'm not Crazy, It's the Situation - Coping with a Missing Loved One July 30 2016Maureen Trask
 
Delivering Data - Social Networking Personal
Delivering Data - Social Networking PersonalDelivering Data - Social Networking Personal
Delivering Data - Social Networking Personaliasaireland
 
Iasa, Iasa Ireland, ICS Jan 2011
Iasa, Iasa Ireland, ICS Jan 2011Iasa, Iasa Ireland, ICS Jan 2011
Iasa, Iasa Ireland, ICS Jan 2011iasaireland
 
Business Case and Architecture
Business Case and ArchitectureBusiness Case and Architecture
Business Case and Architectureiasaireland
 

En vedette (20)

Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data Governance
 
Enterprise Master Data Architecture
Enterprise Master Data ArchitectureEnterprise Master Data Architecture
Enterprise Master Data Architecture
 
Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data Governance
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
Enterprise data architecture of complex distributed applications & services
Enterprise data architecture of complex distributed applications & servicesEnterprise data architecture of complex distributed applications & services
Enterprise data architecture of complex distributed applications & services
 
White Paper - Overview Architecture For Enterprise Data Warehouses
White Paper -  Overview Architecture For Enterprise Data WarehousesWhite Paper -  Overview Architecture For Enterprise Data Warehouses
White Paper - Overview Architecture For Enterprise Data Warehouses
 
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...
Chief Data Architect or Chief Data Officer: Connecting the Enterprise Data Ec...
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model How to Build & Sustain a Data Governance Operating Model
How to Build & Sustain a Data Governance Operating Model
 
Enterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsEnterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and Options
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Digital Transformation, Enterprise Architecture, Big Data by Danairat
Digital Transformation, Enterprise Architecture, Big Data by DanairatDigital Transformation, Enterprise Architecture, Big Data by Danairat
Digital Transformation, Enterprise Architecture, Big Data by Danairat
 
Learn Togaf 9.1 in 100 slides!
Learn Togaf 9.1 in 100 slides!Learn Togaf 9.1 in 100 slides!
Learn Togaf 9.1 in 100 slides!
 
Enterprise Architecture for Dummies - TOGAF 9 enterprise architecture overview
Enterprise Architecture for Dummies - TOGAF 9 enterprise architecture overviewEnterprise Architecture for Dummies - TOGAF 9 enterprise architecture overview
Enterprise Architecture for Dummies - TOGAF 9 enterprise architecture overview
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
From Strategy to Implementation the Right Steps to Creating a BI Platform
From Strategy to Implementation the Right Steps to Creating a BI Platform From Strategy to Implementation the Right Steps to Creating a BI Platform
From Strategy to Implementation the Right Steps to Creating a BI Platform
 
I'm not Crazy, It's the Situation - Coping with a Missing Loved One July 30 2016
I'm not Crazy, It's the Situation - Coping with a Missing Loved One July 30 2016I'm not Crazy, It's the Situation - Coping with a Missing Loved One July 30 2016
I'm not Crazy, It's the Situation - Coping with a Missing Loved One July 30 2016
 
Delivering Data - Social Networking Personal
Delivering Data - Social Networking PersonalDelivering Data - Social Networking Personal
Delivering Data - Social Networking Personal
 
Iasa, Iasa Ireland, ICS Jan 2011
Iasa, Iasa Ireland, ICS Jan 2011Iasa, Iasa Ireland, ICS Jan 2011
Iasa, Iasa Ireland, ICS Jan 2011
 
Business Case and Architecture
Business Case and ArchitectureBusiness Case and Architecture
Business Case and Architecture
 

Similaire à Building the enterprise data architecture

Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...
Why a Data Services Marketplace is Critical for a Successful Data-Driven Ente...Denodo
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Got data?… now what? An introduction to modern data platforms
Got data?… now what?  An introduction to modern data platformsGot data?… now what?  An introduction to modern data platforms
Got data?… now what? An introduction to modern data platformsJamesAnderson599331
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatiaSatish Bhatia
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data miningPradnya Saval
 
Rev_3 Components of a Data Warehouse
Building the enterprise data architecture

  • 1. Building The Enterprise Data Architecture The Components, Processes and Tools of a Successful Enterprise Data Architecture
  • 2. Enterprise Data Serves Two Purposes  Running the Business ◦ Tracking transactions such as sales, invoices, payments, deliveries, payroll, benefits, etc.  Managing the Business ◦ Accounting, Budgeting and Planning ◦ Marketing ◦ Asset Management ◦ Process Optimization ◦ Strategy  Definition  Monitoring  Investment and resource allocation
  • 3. Why is Data Architecture So Hard?  Complex ◦ Data IS complex ◦ The world changes rapidly, hard to keep up  Shared Data and Competing Stakeholder Interests ◦ Stakeholders have different definitions and uses for shared data ◦ Conflicts are often resolved with incompatible models and duplication  Data Proliferation ◦ Useful data is replicated or transferred to other systems (e.g. accounting) ◦ How and where data is used is often unknown  Data can be inaccurate ◦ Data entry is error prone and data duplication is common ◦ Calculation errors are surprisingly common  Resource Intensive ◦ Databases can be very large ◦ Managing data can consume a lot of staff resources ◦ Tools and licenses can be pricey
  • 4. A Robust Data Architecture Should Be  Trusted  Accessible  Timely  Integrated  Consistent  Comprehensive (Complete)  Verified  Documented  Discoverable  Flexible  Scalable  Cost Effective  Transparent and Visible  Safe  Governed  Auditable (Lineage and eDiscovery)  Secure
  • 5. The Data Lifecycle  Data goes through a lifecycle with phases such as: ◦ Creation/capture ◦ Modification ◦ Processing  Aggregation, joining, filtering, transformation ◦ Consumption:  Product/service delivery/fulfillment  Reporting  Statistical analysis  Exploration (Data discovery, search) ◦ Archiving/destruction  Data lifecycle can be complex and difficult to manage
  • 6. A Successful Data Architecture Must Deal With 1. Data Repositories 2. Data Capture and Ingestion 3. Data Definition and Design 4. Data Integration 5. Data Access and Distribution 6. Data Analysis And More….
  • 7. Data Repositories  Objective: ◦ Repositories must be extremely solid and reliable.  Issues ◦ Complexity of tools ◦ Licensing, resource and skill costs can be high ◦ Tolerance for failure is extremely low  Types of Data Repositories ◦ Application ◦ Reference/Master ◦ Meta Data ◦ Data Warehouse ◦ Discovery ◦ Archive ◦ Offsite/DR Repositories  Technology ◦ Traditional RDBMS: Oracle, DB2, SQL Server, Informix ◦ Data Warehouse: Netezza, Greenplum, Exadata, etc. ◦ Big Data: Hadoop, CouchDB, Redis, HBase
  • 8. Data Capture and Ingestion  Objective: ◦ Capture all relevant business data ◦ Ensure data is both accurate and complete at point of capture  Issues ◦ Great diversity (online entry, FTP, APIs, real-time streams, unstructured data, etc.) ◦ Incoming data is often unreliable or incomplete ◦ Big data explosion  Types of Data Capture ◦ Online ◦ File transfers ◦ Social media (real time) ◦ Logs (web, security, o/s) ◦ Data subscriptions (address databases, etc.)  Technology ◦ Online Applications (JEE, .NET, COBOL, etc.) ◦ File transfer: FTP ◦ APIs: Web services, REST, SOAP
  • 9. Data Definition and Design  Objective: ◦ Organize the data to suit business requirements ◦ Define data components, structures and processes to enable shared use and collaboration ◦ Define rules for format, domain, structure and relationships  Issues ◦ Data modeling is complex – often requires unsatisfactory trade-offs ◦ As data is distributed or shared, quality, structure and definition tend to worsen ◦ Provide sufficient flexibility for graceful evolution  Types of Data Definition Artefacts ◦ Data Glossaries ◦ Meta Data ◦ Data Domains ◦ Data Structures ◦ Data Relationships ◦ Validation Rules ◦ Data Transformations  Technology ◦ Data modeling tools (Erwin, Power Designer, XMLSpy, etc.) ◦ Metadata workbenches ◦ Business glossaries
  • 10. Data Integration  Objectives ◦ Data must usually be shared (copied or transferred) to realize its full value ◦ Shared data requires transformation, filtering, verification, security controls, calculations, aggregations etc.  Issues ◦ In most cases, data integration is done in an inconsistent manner ◦ Lack of enterprise standards and controls. Department silos ◦ Data can become less reliable each time it is handled ◦ Rules for data integration are poorly understood and documented  Types of Data Integration ◦ EAI, SOA ◦ ETL ◦ Federation  Technology ◦ ETL Tools: DataStage, Informatica, SSIS, Oracle ETL ◦ ESB: Web services, REST, Websphere, WebLogic ◦ Messaging servers: MQSeries, WebSphere, ActiveMQ, MessageBroker, Biztalk ◦ Federated databases
  • 11. Data Access and Distribution  Objectives: ◦ Data is of no use unless it can be efficiently and reliably consumed by client applications ◦ Data must be presented in a consistent, trusted and well understood way  Issues ◦ Access patterns are very diverse ◦ Client requirements can cause conflicts that are difficult to resolve  Types of Access ◦ SQL ◦ File APIs ◦ Dashboards ◦ Query engines ◦ BI tools ◦ Canned Reports ◦ Discovery tools ◦ APIs ◦ Published views and queries ◦ Publication and replication  Technology ◦ Reporting tools: Cognos, BO, Tableau ◦ Data feeds: ETL ◦ Query languages: SQL, SAS, R, etc.
  • 12. Data Analysis  Objectives: ◦ Provide insight into business operations ◦ Assist in planning; provide predictions  Issues: ◦ Complex and difficult to do right ◦ Skills shortages  Types of Analysis ◦ Predictive analytics ◦ Correlation ◦ Pricing and underwriting  Technology ◦ SAS, SPSS, R ◦ Machine Learning algorithms
  • 13. More Details  There are many aspects to consider in good architecture, including: ◦ Business Glossary ◦ Metadata ◦ Data Discovery and Profiling ◦ Data Lineage ◦ Data Reconciliation ◦ Data Modeling ◦ Master Data ◦ Data Wrangling ◦ Data Pipelines
  • 14. Business Glossary  The Business Glossary is the place where all stakeholders can agree on what the data actually means and what data rules apply ◦ In the real world, lack of precision and conflict about terminology and meaning are common  This information needs to be managed, coordinated and subject to quality controls  An essential first step to a robust enterprise architecture  Business stakeholders own the data definitions
  • 15. Meta Data  For every data field, record and process, it's useful to provide detailed definitions, descriptions and rules that govern production, use and storage of any data item  This information is managed, coordinated and subject to quality controls  Targeted to technical stakeholders
  • 16. Data Discovery and Profiling  Data Discovery is about knowing where the data arrives from, where it goes to and where it is used  Data profiling analyzes the content, structure, and relationships within data to uncover patterns and rules, inconsistencies, anomalies, and redundancies.
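As a rough illustration of profiling, the sketch below summarises a single column: how many values are missing, how many distinct values exist, and which values repeat. The function name `profile_column` and the sample `customers` rows are hypothetical and not taken from the slides; real profiling tools report far more than this.

```python
from collections import Counter

def profile_column(rows, column):
    """Summarise one column: row count, missing values, distinct values, repeated values."""
    values = [row.get(column) for row in rows]
    # Treat None and the empty string as "missing" (an assumption for this sketch).
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
    }

# Hypothetical sample data.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "a@example.com"},
]
email_profile = profile_column(customers, "email")
```

Running the same summary over every column of an incoming file is a cheap way to spot anomalies and redundancies before the data is loaded anywhere.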
  • 17. Data Lineage  In many cases it's important to know how data has reached a particular point.  Example: Why do reports A and B disagree about last month’s revenues?  Answer: Trace each report back through its lineage to check that both are using the same data sources.  NOTE: Sometimes data can move a long way and through many steps before it ends up in a report
  • 18. Data Synchronization and Reconciliation  When data exists in multiple locations, there is always a danger that errors will creep in  Therefore, as part of a robust control process, it's useful to compare the data in source/target systems to find errors or omissions  This can however become very complex as it may involve reconstructing the data lineage of every item
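The "reports A and B disagree" question can be sketched as a walk over a lineage graph. The example below assumes lineage is recorded as a simple mapping from each dataset to the datasets it is built from, and that the graph has no cycles; the dataset names and the `upstream_sources` helper are hypothetical.

```python
def upstream_sources(lineage, node):
    """Return every root source that feeds `node`, following edges transitively.

    Assumes the lineage graph is acyclic.
    """
    parents = lineage.get(node, [])
    if not parents:
        return {node}  # nothing upstream: this is an original source
    roots = set()
    for parent in parents:
        roots |= upstream_sources(lineage, parent)
    return roots

# Hypothetical lineage: each key is built from the datasets in its list.
lineage = {
    "report_a": ["sales_mart"],
    "report_b": ["finance_extract"],
    "sales_mart": ["orders_db"],
    "finance_extract": ["orders_db", "ledger_db"],
}
sources_a = upstream_sources(lineage, "report_a")
sources_b = upstream_sources(lineage, "report_b")
only_in_b = sources_b - sources_a  # sources that could explain the disagreement
```

Here report B draws on a source that report A never sees, which is the first place to look when the two revenue figures differ.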
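A minimal reconciliation sketch, assuming both systems can be exported as rows that share a key: it reports rows missing from the target, rows that appear only in the target, and rows whose compared fields differ. The `reconcile` function and the sample rows are illustrative assumptions, not a prescribed method.

```python
def reconcile(source, target, key, fields):
    """Compare source and target rows by `key` and report the differences."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(
            k for k in src.keys() & tgt.keys()
            if any(src[k][f] != tgt[k][f] for f in fields)
        ),
    }

# Hypothetical extracts from a source system and its downstream copy.
source_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
target_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 205},
    {"id": 4, "amount": 10},
]
report = reconcile(source_rows, target_rows, key="id", fields=["amount"])
```

This only works directly when the target holds the same grain as the source; once data has been aggregated or transformed along the way, the comparison has to follow the lineage of each item, which is where the complexity noted above comes from.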
  • 19. Data Modeling  Data modeling with 3NF and dimensional modeling is relatively mature, but the real world is complex, which can make some models hard to design, build and use  NoSQL takes a different approach ◦ Data is stored as documents ◦ Data is flexible and dynamic
  • 20. Master Data  When data is generated and consumed in multiple locations and systems, it's very difficult to manage  A Master Data Management (MDM) repository can: ◦ Act as the corporate system of record ◦ Define the shared canonical data model ◦ Enforce data rules consistently across all systems ◦ Resolve conflicts ◦ Detect and remove duplicates
  • 21. Data Wrangling  Data is often less than perfect ◦ Missing data ◦ Incompatible fields ◦ Semantic incompatibilities  Data wrangling is the art of merging diverse data sources into a consistent data store with consistent semantics, formats and values
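Duplicate detection, one of the MDM functions listed above, can be sketched as matching on a normalised key. The matching rule here (case- and whitespace-insensitive name plus email) and the "first record wins" survivorship rule are assumptions chosen for brevity; real MDM products typically use fuzzier matching and configurable survivorship.

```python
def normalise(record):
    """Build a match key from a case- and whitespace-insensitive name and email."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)

def merge_duplicates(records):
    """Keep the first record seen for each match key; list the ids merged into it."""
    golden = {}
    for record in records:
        key = normalise(record)
        if key in golden:
            golden[key]["merged_ids"].append(record["id"])
        else:
            golden[key] = {**record, "merged_ids": []}
    return list(golden.values())

# Hypothetical customer records arriving from two systems.
records = [
    {"id": "crm-1", "name": "Ann  Lee", "email": "ANN@example.com"},
    {"id": "erp-7", "name": "ann lee", "email": "ann@example.com "},
    {"id": "crm-2", "name": "Bob Ray", "email": "bob@example.com"},
]
masters = merge_duplicates(records)
```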
  • 22. Data Pipelines  A data pipeline is the process which handles the transfer of data from one or more sources and delivers it to one or more targets  A data pipeline may have the following stages: 1. Extraction 2. Cleansing 3. Transformation 4. Load  Types of Transfer: ◦ Batch (ETL) ◦ Continuous/Real Time (SOA, Federation)  Push  Pull  Issues: ◦ Completeness ◦ Accuracy ◦ Latency ◦ Access method ◦ Technology and tools
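The four stages can be sketched as four small functions chained together. This is a toy batch example under stated assumptions: the source and target are in-memory lists, and the field names (`amount`, `amount_cents`) are hypothetical.

```python
def extract(source):
    """Extraction: copy raw rows out of the source untouched."""
    return list(source)

def cleanse(rows):
    """Cleansing: drop rows with no amount (a real pipeline would also report them)."""
    return [r for r in rows if r.get("amount") is not None]

def transform(rows):
    """Transformation: add a derived field, here the amount in cents."""
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in rows]

def load(rows, target):
    """Load: append the finished rows to the target and return how many were written."""
    target.extend(rows)
    return len(rows)

def run_pipeline(source, target):
    """Run the stages in order: extract, cleanse, transform, load."""
    return load(transform(cleanse(extract(source))), target)

# Hypothetical source rows; the second one is incomplete.
source = [
    {"id": 1, "amount": 12.5},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 3.0},
]
warehouse = []
loaded = run_pipeline(source, warehouse)
```

Keeping each stage as a separate step makes it easier to measure completeness and accuracy per stage, which are two of the issues listed above.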
  • 23. Data Pipelines – Data Extraction  Data is extracted from a source system and landed in a staging area  Data extracts can be: ◦ Complete dumps of all data ◦ Changes only  Principles: ◦ Capture Everything ◦ Land everything in a staging area ◦ Read once, write many
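The two extract styles can be contrasted in a few lines. The "changes only" variant below uses a last-modified watermark, which is one common approach and an assumption here, since the slides do not say how changes are detected; it also assumes each row carries an `updated_at` value in ISO date format, so plain string comparison orders the dates correctly. Note that a watermark alone does not detect deleted rows.

```python
def extract_full(rows):
    """Complete dump: land every row in staging."""
    return list(rows)

def extract_changes(rows, last_watermark):
    """Changes only: rows modified after the previous run, plus the new watermark."""
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=last_watermark)
    return changed, new_watermark

# Hypothetical source rows with ISO-formatted modification dates.
rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-05"},
    {"id": 3, "updated_at": "2024-01-09"},
]
staging, watermark = extract_changes(rows, "2024-01-03")
```

The returned watermark would be stored and passed to the next run so that each change is landed exactly once.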
  • 24. Data Pipelines - Data Cleansing  Cleansing processes will fix or flag: ◦ Invalid data ◦ Missing data ◦ Inaccurate data  Principles: ◦ Partition data into clean and rejected data. ◦ Generate rejection reports ◦ Land data into a zone for ‘clean’ data ◦ Errors should be corrected in the source systems when detected
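The "partition into clean and rejected" principle can be sketched as a validator that returns the reasons a row fails, with the rejected rows doubling as the rejection report. The validation rules and field names (`policy_id`, `premium`) are hypothetical examples.

```python
def validate(row):
    """Return the reasons a row is unacceptable (an empty list means clean)."""
    reasons = []
    if not row.get("policy_id"):
        reasons.append("missing policy_id")
    premium = row.get("premium")
    if not isinstance(premium, (int, float)) or premium < 0:
        reasons.append("invalid premium")
    return reasons

def partition(rows):
    """Split rows into a clean list and a rejection report with reasons."""
    clean, rejected = [], []
    for row in rows:
        reasons = validate(row)
        if reasons:
            rejected.append({"row": row, "reasons": reasons})
        else:
            clean.append(row)
    return clean, rejected

# Hypothetical staged rows.
staged = [
    {"policy_id": "P1", "premium": 120.0},
    {"policy_id": "", "premium": 80.0},
    {"policy_id": "P3", "premium": -5},
]
clean_rows, rejection_report = partition(staged)
```

The rejection report is what goes back to the owners of the source system, in line with the principle that errors should be corrected at the source.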
  • 25. Data Pipelines - Data Transformation  Transformation is a data integration function that modifies existing data or creates new data through functions such as calculations and aggregations.  Common transformations include: ◦ Calculations ◦ Splits ◦ Joins ◦ Lookups ◦ Aggregations ◦ Filtering  Principles ◦ Stage transformed files to publish ready landing zones
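Several of the listed transformations can appear in a single pass. The sketch below combines a lookup, a filter, a calculation and an aggregation; the `regions` lookup table and the order fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup table mapping customers to regions.
regions = {"C1": "East", "C2": "West"}

# Hypothetical cleansed order rows.
orders = [
    {"customer": "C1", "qty": 2, "unit_price": 10.0},
    {"customer": "C2", "qty": 1, "unit_price": 40.0},
    {"customer": "C1", "qty": 3, "unit_price": 5.0},
    {"customer": "C9", "qty": 1, "unit_price": 99.0},  # no matching region
]

def revenue_per_region(orders, regions):
    """Look up each order's region, skip unmatched orders, and total the revenue."""
    totals = defaultdict(float)
    for order in orders:
        region = regions.get(order["customer"])  # lookup
        if region is None:                       # filtering
            continue
        # calculation (qty * price) feeding an aggregation (sum per region)
        totals[region] += order["qty"] * order["unit_price"]
    return dict(totals)

revenue_by_region = revenue_per_region(orders, regions)
```

Silently skipping the unmatched order is a simplification; in practice such rows would normally have been caught and reported during cleansing.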
  • 26. Data Pipelines - Data Loading  Data can be loaded: ◦ As files ◦ Via RDBMS utilities ◦ SQL ◦ Messages  Principles ◦ Group loading files by target repository
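A small sketch of loading via SQL, with rows grouped by target before any writes happen. It uses Python's built-in `sqlite3` module and an in-memory database purely for illustration; the table names and columns are hypothetical, and the table name is interpolated into the statement only because it comes from a fixed, trusted list in this example.

```python
import sqlite3
from collections import defaultdict

def group_by_target(pending):
    """Group (target_table, row) pairs so each target is loaded in one batch."""
    grouped = defaultdict(list)
    for table, row in pending:
        grouped[table].append(row)
    return grouped

def load(connection, grouped):
    """Insert each group with a parameterised statement, then commit once."""
    for table, rows in grouped.items():
        # Table names here are trusted constants; values are bound as parameters.
        connection.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)
    connection.commit()

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT)")
connection.execute("CREATE TABLE policies (id INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical publish-ready rows, interleaved across two targets.
pending = [
    ("clients", (1, "Ann")),
    ("policies", (10, "Home")),
    ("clients", (2, "Bob")),
]
load(connection, group_by_target(pending))
client_count = connection.execute("SELECT COUNT(*) FROM clients").fetchone()[0]
policy_count = connection.execute("SELECT COUNT(*) FROM policies").fetchone()[0]
```

For large volumes, the RDBMS bulk-load utilities mentioned above are usually much faster than row-by-row SQL inserts.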
  • 27. Data Pipelines – Design Practices  For each stage of the data pipeline, it's advisable to create: ◦ Conceptual Models – what will happen ◦ Logical Models – how it will happen (no technical details) ◦ Physical Models – technical details to realize the design
  • 28. Case Study – Extraction Model  Conceptual Model ◦ Extract 2013 policy records from system A and system B  Logical Model ◦ Extract 2013 policy records from daily batch files generated by system A ◦ Issue SQL queries to tables S, T in system B and store in file X  Physical Model ◦ From system A read file CS403.txt from directory x/y/z and store extract in …. ◦ Eliminate all records in A that… ◦ From system B run SQL query Select * from Policies, Clients … and store extract in file … with format…. ◦ Etc
  • 29. The Future  Hortonworks claims HALF the world's data will soon be in Hadoop clusters  For the first time in a generation, relational databases are seriously threatened by NoSQL databases – Redis, HBase, MongoDB, CouchDB, etc.  Data exploration tools such as Tableau and QlikView are disrupting the traditional BI market  In-memory databases make it feasible to eliminate the data mart altogether
  • 30. Summary  A Data Architecture is much more than a collection of Erwin data models – that’s just the starting point  Coordination, synchronization and collaboration are the hallmarks of a successful data architecture