DATA VAULT 2.0:
Big Data Meets Data Warehousing
DEAN HALLMAN
WIRESOFT, LLC
DATA WAREHOUSING VS BIG DATA
• Does Big Data replace Data Warehousing? Or do I need both?
• What’s the difference:
• Between the data flowing into a data warehouse vs big data tools?
• Between the ingestion processes and infrastructure?
• Data Lakes arrived with Big Data, so are they useful in Data Warehousing?
• How should I model my data in EDW?
• 3NF, Star Schema, same as my operational data stores?
• DataVault 2.0
• Graph Databases
• What architecture allows both to coexist effectively?
[Figure: reference architecture diagram. Labels recovered from the slide; grouping is approximate:
• Data sources: Internet, Impressions (Big Data), clickstream, Core Business Services with Operational Data Stores, External Data Sources
• Big Data Toolchain (Schema-on-Read): Streaming (Kafka), Streaming Analytics, Batch (SerDe), Batch Analytics (Hadoop)
• Data Lake (landing)
• Enterprise Data Warehouse (Schema-on-Write): Staging Vault, Raw Vault, Business Vault, Information Mart
• Ingestion paths: CDC, snapshot, ETL, ELT, SerDe
• Clients: BI Tools, Monitoring, Discovery, Audit]
THE DATA MODEL
DATA VAULT 2.0
COMMON FOUNDATIONAL WAREHOUSE ARCHITECTURE
• “The Data Vault Model is a detail oriented, historical tracking and uniquely linked set
of normalized tables that support one or more functional areas of business. It is a
hybrid approach encompassing the best of breed between 3rd normal form (3NF)
and star schema. The design is flexible, scalable, consistent and adaptable to the
needs of the enterprise” -- Dan Linstedt, Creator of Data Vault
• Data loaded as-is from sources, no edits or cleanup
• Append-only to afford highest performance
• Agile & agnostic to changes in the operational store’s data model
• Essentially, a prescription for Layered Graph to Relational Mapping
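The "as-is, append-only" bullets can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and hypothetical table names (`hub_aircraft`, `sat_aircraft`); the tail number and model are borrowed from the Aircraft example later in the deck, and the second load is an invented change used only to show that history is kept. For brevity the business key is used directly as the join key, whereas Data Vault 2.0 typically joins on a hash of the business key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_aircraft (
    tail_no       TEXT PRIMARY KEY,   -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_aircraft (
    tail_no       TEXT NOT NULL REFERENCES hub_aircraft(tail_no),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    model         TEXT,               -- descriptive attribute, stored as received
    PRIMARY KEY (tail_no, load_date)
);
""")


def load_aircraft(tail_no, model, load_date, record_source):
    """Append-only load: the hub row is inserted once, every load adds a satellite row."""
    conn.execute(
        "INSERT OR IGNORE INTO hub_aircraft VALUES (?, ?, ?)",
        (tail_no, load_date, record_source),
    )
    # No UPDATE or DELETE: a changed value becomes a new row with a later load date.
    conn.execute(
        "INSERT INTO sat_aircraft VALUES (?, ?, ?, ?)",
        (tail_no, load_date, record_source, model),
    )


load_aircraft("1477", "767", "2017-02-11", "United")
load_aircraft("1477", "767-300", "2018-01-17", "United")  # hypothetical later change

history = conn.execute(
    "SELECT load_date, model FROM sat_aircraft WHERE tail_no = ? ORDER BY load_date",
    ("1477",),
).fetchall()
```

Here `history` holds both versions of the aircraft description, while the hub still contains a single row for the business key.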
DATA WAREHOUSING & DATA VAULT 2.0
• 60’s, 70’s, 80’s:
• E.F. Codd => 3NF
• Bill Inmon invents Data Warehousing concept
• Dr. Ralph Kimball popularizes Star Schema design
• 90’s, 00’s:
• Dan Linstedt creates Data Vault Model @ DOD
• 2014:
• Dan introduces Data Vault 2.0
Source: “What are Graph Databases and Why should I care?”, by Dave Bechberger of Expero
SOLVE BY STAR SCHEMA ?
RELATIONAL VS GRAPH DATABASES
• Enterprise Grade
• Well-worn path
• SQL has been relatively stagnant vs programming languages
GRAPH DATA MODEL
Source: https://neo4j.com/developer/graph-database/
GRAPH DATABASE VS DATA VAULT
[Figure: graph elements shown with Data Vault tables. Legend: Hubs, Links, Satellite tabs]

Flight (tabs: Base, Dest, Forecast)
Record Source | LoadDate   | Depart | Gate
LGA           | 2018-10-11 | 1:25PM | B27
CAE           | 2018-10-24 | 3:30PM | A14
SFO           | 2018-09-06 | 8:55PM | G19
RDU           | 2018-08-12 | 4:45PM | C22

SERVICED_BY
Record Source: Airport CAE
Load Date: 2018-11-17
Source Id: 20181117-32-983

Aircraft (tabs: Base, Service, FAA, NTSB)
Record Source | LoadDate   | Model | Tailno
United        | 2017-02-11 | 767   | 1477
Delta         | 2015-11-04 | A6    | 2381
Alaska        | 2013-08-28 | 747   | 8312
Frontier      | 2016-07-19 | 182   | 1438
Record Source: United Airlines
Load Date: 2018-01-17
Source Id: 2412c

SERVICED_BY (tabs: Base, Dest, Manifest)
Record Source | LoadDate   | Begin      | End
United        | 2017-02-11 | 2017-04-23 | 2017-09-23
Delta         | 2015-11-04 | 2015-12-01 | 2017-04-22
Alaska        | 2013-08-28 | 2013-09-14 | 2016-05-04
Frontier      | 2016-07-19 | 2016-08-02 | 2018-04-11
Record Source: United Airlines
Load Date: 2018-09-17
• Organizations which design systems ...
are constrained to produce designs
which are copies of the communication
structures of these organizations
- Mel Conway
[Figure: the same example extended to four entities. Legend: Hubs, Links, Satellite tabs]

FLIGHT (tabs: Base, Dest, Forecast)
Record Source | LoadDate   | Depart | Gate
LGA           | 2018-10-11 | 1:25PM | B27
CAE           | 2018-10-24 | 3:30PM | A14

FLIGHT
Record Source: Airport CAE
Load Date: 2018-11-17
Source Id: 20181117-32-983

Aircraft (tabs: Base, Service, FAA, NTSB)
Record Source | LoadDate   | Model | Tailno
United        | 2017-02-11 | 767   | 1477
Delta         | 2015-11-04 | A6    | 2381
Alaska        | 2013-08-28 | 747   | 8312
Frontier      | 2016-07-19 | 182   | 1438
Record Source: United Airlines
Load Date: 2018-01-17
Source Id: 2412c

Airport (tabs: Base, Dest, Manifest)
Record Source | LoadDate   | Begin      | End
United        | 2017-02-11 | 2017-04-23 | 2017-09-23
Delta         | 2015-11-04 | 2015-12-01 | 2017-04-22
Alaska        | 2013-08-28 | 2013-09-14 | 2016-05-04
Frontier      | 2016-07-19 | 2016-08-02 | 2018-04-11
Record Source: United Airlines
Load Date: 2018-09-17

Airline (tabs: Base, Service, FAA, NTSB)
Record Source | LoadDate   | Model | Tailno
United        | 2017-02-11 | 767   | 1477
Delta         | 2015-11-04 | A6    | 2381
Record Source: United Airlines
Load Date: 2018-01-17
Source Id: 2412c
Source: https://www.wherescape.com/solutions/project-types/data-vault-automation/
• Modeled after self-organizing networks
• A Business Key identifies a key concept in the business
• Business keys have a business meaning
• They are unique and have a very low propensity to change
• Business keys change only when the business changes
• Enables (forces) cross-source modeling
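The cross-source point above can be sketched in a few lines. The example assumes the common Data Vault 2.0 practice of deriving a hub key by hashing the normalized business key (MD5 is used here as one frequently chosen hash). The function names, the normalization rule (trim and upper-case), and the source system names are hypothetical.

```python
import hashlib


def hub_hash_key(*business_key_parts: str) -> str:
    """Derive a deterministic hub key from the (normalized) business key."""
    normalized = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_hub(hub: dict, business_key: str, record_source: str, load_date: str) -> str:
    """Insert the business key once; later sightings from other sources are ignored."""
    key = hub_hash_key(business_key)
    hub.setdefault(key, {
        "business_key": business_key.strip().upper(),
        "record_source": record_source,  # first source that delivered the key
        "load_date": load_date,
    })
    return key


hub_airport = {}
k1 = load_hub(hub_airport, "CAE", "flight_ops", "2018-10-24")    # hypothetical source name
k2 = load_hub(hub_airport, " cae ", "gate_system", "2018-11-17") # hypothetical source name
```

Because both systems deliver the same business key, `k1` and `k2` are identical and the hub holds one row, which is what lets satellites from different sources attach to the same concept.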
Source: http://www.di.univr.it/documenti/OccorrenzaIns/matdid/matdid232240.pdf
DATA VAULT 2.0 MODELING:
HUBS, LINKS & SATELLITES
@wiresoft/Pathfinder
THE DATA
Impressions vs Business Data
ENTERPRISE DATA SILOS
Small Data | Large Data | Big Data
Describes the user base
Describes the Enterprise
Describes the Product
DATA GRANULARITY FUNNEL
[Figure: funnel diagram. Labels recovered from the slide: Instance Grain, Transaction Grain, Audit Grain, Impression Grain; Big Data, Enterprise Data Warehouse, Operational Data Stores; Impression Analytics, Business Analytics; External Data Sources]
DATA INGESTION
ETL vs ELT vs SerDe
• Beware the Turing tar-pit, in which
everything is possible, but nothing of
interest is easy
- Alan Perlis
DATA CLASSIFICATION MATRIX:
DECLARATIVE VS INTERPRETIVE
[Figure: matrix with axes Declarative / Interpretive and Hadoop / RDBMS; entries shown: Web Events, Media Player]
DATA WAREHOUSING
• Deep Topic
• 60’s, 70’s, 80’s
• E.F. Codd => 3NF
• Bill Inmon invents Data Warehousing
concept
• Dr. Ralph Kimball popularizes Star Schema
design
• 90’s, 00’s:
• Dan Linstedt creates Data Vault Model @
DOD
• 2014:
• Dan Introduces Data Vault 2.0
• Data Warehouse vs Operational Data Stores
• Data Warehouse as Version Control System
BIG DATA
• MapReduce, 2004, Google, by Jeffrey Dean and Sanjay Ghemawat: “MapReduce: Simplified Data Processing on Large Clusters”; GFS
• Nutch 2005, Hadoop 2006, 2007 - Doug Cutting
• What exactly is “Big Data”?
UNSTRUCTURED USER EXPERIENCE
[Figure: labels Client, User, Interpreter, Analysis; symbols L, L n, L i; annotation: lossy]
STRUCTURED USER EXPERIENCE
[Figure: labels Client, User, Time Series Event Record, Analysis; symbols L p, L e; annotation: lossless]
ETL OR SERDE ?
[Figure: labels Client, User, Serializer, Internet, Kafka/Kinesis, S3, Hadoop, Deserializer, Time Series Event Record, Analysis; Eventlog.e and Eventlog.d from a Single Source (Version Locked); symbols L p, L e, L d, L m]
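One way to read the "Single Source (Version Locked)" label is that the serializer on the client and the deserializer on the analysis side come from a single event definition. The sketch below is an illustration under that assumption only: the `PlayerEvent` record, its fields, the JSON encoding, and the `SCHEMA_VERSION` constant are all hypothetical and are not taken from the deck.

```python
import json
from dataclasses import dataclass, asdict

SCHEMA_VERSION = 1  # hypothetical version lock shared by both sides


@dataclass(frozen=True)
class PlayerEvent:
    """Single definition of a time series event record (illustrative fields)."""
    user_id: str
    event_type: str
    timestamp_ms: int


def serialize(event: PlayerEvent) -> str:
    """Client side: encode the typed event together with its schema version."""
    return json.dumps({"v": SCHEMA_VERSION, **asdict(event)}, sort_keys=True)


def deserialize(line: str) -> PlayerEvent:
    """Analysis side: decode with the same definition; reject other versions."""
    payload = json.loads(line)
    version = payload.pop("v")
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version: {version}")
    return PlayerEvent(**payload)


original = PlayerEvent("u-42", "play", 1539264300000)
round_tripped = deserialize(serialize(original))
```

Since both functions share one record definition, the round trip returns an event equal to the original, which is the sense in which a SerDe path can be lossless.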
ETL vs ELT (SerDe)
Source: https://www.ironsidegroup.com/2015/03/01/etl-vs-elt-whats-the-big-difference/
Schema-on-Write
Schema-on-Read
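The difference between the two approaches can be sketched as follows. This is a minimal illustration, not a description of any particular tool: the field list, function names, and sample records are assumptions. With schema-on-write the record is validated before it is stored; with schema-on-read the raw line is landed untouched and the schema is applied only when the data is queried.

```python
import json

REQUIRED_FIELDS = {"tail_no": str, "model": str}  # illustrative schema


def write_with_schema(table: list, record: dict) -> None:
    """Schema-on-write: reject records that do not fit before they are stored."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not {expected_type.__name__}")
    table.append({field: record[field] for field in REQUIRED_FIELDS})


def land_raw(lake: list, raw_line: str) -> None:
    """Schema-on-read: store the raw line untouched, with no validation at load time."""
    lake.append(raw_line)


def read_with_schema(lake: list) -> list:
    """Apply the schema only when reading; skip lines that do not parse or fit."""
    rows = []
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and all(
            isinstance(record.get(f), t) for f, t in REQUIRED_FIELDS.items()
        ):
            rows.append({f: record[f] for f in REQUIRED_FIELDS})
    return rows


warehouse, lake = [], []
write_with_schema(warehouse, {"tail_no": "1477", "model": "767"})
land_raw(lake, '{"tail_no": "2381", "model": "A6", "extra": 1}')
land_raw(lake, "not json at all")
parsed = read_with_schema(lake)
```

The lake keeps both raw lines, including the one that cannot be parsed, while the reader returns only the row that fits the schema it was given.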
OTHER CHALLENGES
• Satellites must be loaded chronologically
• Time-based scheduling vs data-availability scheduling
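The chronological-loading constraint can be sketched with a small example. It assumes a satellite that stores a new row only when the descriptive attributes differ from the most recent row for the same hub key, detected with a hash of the attributes (often called a hash diff). The function names, the key format, and the sample rows are hypothetical.

```python
import hashlib


def hash_diff(attributes: dict) -> str:
    """Fingerprint of a row's descriptive attributes, used to detect change."""
    payload = "||".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, staged_rows: list) -> None:
    """Append staged rows in load-date order, skipping rows that repeat the latest state.

    Assumes the existing satellite rows are already in chronological order.
    """
    latest = {row["hub_key"]: row["hash_diff"] for row in satellite}
    # ISO dates sort correctly as strings; sorting is the step that must not be skipped.
    for row in sorted(staged_rows, key=lambda r: r["load_date"]):
        diff = hash_diff(row["attributes"])
        if latest.get(row["hub_key"]) == diff:
            continue  # nothing changed since the previous load
        satellite.append({
            "hub_key": row["hub_key"],
            "load_date": row["load_date"],
            "hash_diff": diff,
            "attributes": row["attributes"],
        })
        latest[row["hub_key"]] = diff


# Staged rows arrive out of order (hypothetical data).
staged = [
    {"hub_key": "flight-LGA", "load_date": "2018-10-13", "attributes": {"gate": "C22"}},
    {"hub_key": "flight-LGA", "load_date": "2018-10-11", "attributes": {"gate": "B27"}},
    {"hub_key": "flight-LGA", "load_date": "2018-10-12", "attributes": {"gate": "B27"}},
]
sat_flight = []
load_satellite(sat_flight, staged)
gates = [row["attributes"]["gate"] for row in sat_flight]
```

If the rows were applied in arrival order instead, the change comparison would run against the wrong predecessor and the satellite would end with B27 as the latest gate, which is why a load that depends on change detection has to respect chronology.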
QUESTIONS?
• Contact:
• Dean Hallman
• rdhallman@gmail.com
• LinkedIn: https://www.linkedin.com/in/dean-hallman/

Anomaly detection and data imputation within time series
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

Big Data or Data Warehousing? How to Leverage Both in the Enterprise

  • 1. DATA VAULT 2.0: Big Data Meets Data Warehousing. DEAN HALLMAN, WIRESOFT, LLC
  • 2. DATA WAREHOUSING VS BIG DATA
      • Does Big Data replace Data Warehousing? Or do I need both?
      • What’s the difference:
          • Between the data flowing into a data warehouse vs big data tools?
          • Between the ingestion processes and infrastructure?
      • Data Lakes arrived with Big Data, so are they useful in Data Warehousing?
      • How should I model my data in the EDW?
          • 3NF, Star Schema, same as my operational data stores?
          • Data Vault 2.0
          • Graph Databases
      • What is an architecture that allows both to coexist effectively?
  • 3. [Architecture diagram. Labels: Internet, Clients, Data Source, Landing, Impressions (Big Data), clickstream (SerDe), Core Business Services, Operational Data Stores, External Data Sources, CDC/snapshot, Data Lake, Big Data Toolchain, Batch (SerDe), Streaming (Kafka), Streaming Analytics, Batch Analytics (Hadoop), Schema-on-Read, Schema-on-Write, ETL, ELT, Enterprise Data Warehouse, Staging Vault, Raw Vault, Business Vault, Information Mart, BI Tools, Monitoring, Discovery, Audit.]
  • 4. THE DATA MODEL [Section divider; repeats the architecture diagram from slide 3.]
  • 5. DATA VAULT 2.0 COMMON FOUNDATIONAL WAREHOUSE ARCHITECTURE
      • “The Data Vault Model is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise” -- Dan Linstedt, Creator of Data Vault
      • Data loaded as-is from sources, no edits or cleanup
      • Append-only to afford highest performance
      • Agile & agnostic to changes in the operational store’s data model
      • Essentially, a prescription for Layered Graph to Relational Mapping
  • 6. DATA WAREHOUSING & DATA VAULT 2.0
      • 60’s, 70’s, 80’s:
          • E.F. Codd => 3NF
          • Bill Inmon invents the Data Warehousing concept
          • Dr. Ralph Kimball popularizes Star Schema design
      • 90’s, 00’s:
          • Dan Linstedt creates the Data Vault Model @ DOD
      • 2014:
          • Dan introduces Data Vault 2.0
  • 7.
  • 8. Source: “What are Graph Databases and Why should I care?”, by Dave Bechberger of Expero
  • 9. SOLVE BY STAR SCHEMA ?
  • 10. RELATIONAL VS GRAPH DATABASES
      • Enterprise Grade
      • Well-worn path
      • SQL has been relatively stagnant vs programming languages
  • 11. GRAPH DATA MODEL Source: https://neo4j.com/developer/graph-database/
  • 12. GRAPH DATABASE VS DATA VAULT
  • 13. GRAPH DATABASE VS DATA VAULT
  • 14. [Diagram: worked example. Legend: Hubs, Links, Satellites, Tab.]
      Flight (tabs: Base, Dest, Forecast)
        Record Source | LoadDate   | Depart | Gate
        LGA           | 2018-10-11 | 1:25PM | B27
        CAE           | 2018-10-24 | 3:30PM | A14
        SFO           | 2018-09-06 | 8:55PM | G19
        RDU           | 2018-08-12 | 4:45PM | C22
      SERVICED_BY: Record Source: Airport CAE; Load Date: 2018-11-17; Source Id: 20181117-32-983
      Aircraft (tabs: Base, Service, FAA, NTSB)
        Record Source | LoadDate   | Model | Tailno
        United        | 2017-02-11 | 767   | 1477
        Delta         | 2015-11-04 | A6    | 2381
        Alaska        | 2013-08-28 | 747   | 8312
        Frontier      | 2016-07-19 | 182   | 1438
      Record Source: United Airlines; Load Date: 2018-01-17; Source Id: 2412c
      SERVICED_BY (tabs: Base, Dest, Manifest)
        Record Source | LoadDate   | Begin      | End
        United        | 2017-02-11 | 2017-04-23 | 2017-09-23
        Delta         | 2015-11-04 | 2015-12-01 | 2017-04-22
        Alaska        | 2013-08-28 | 2013-09-14 | 2016-05-04
        Frontier      | 2016-07-19 | 2016-08-02 | 2018-04-11
      Record Source: United Airlines; Load Date: 2018-09-17
  • 15. “Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations” - Mel Conway
  • 16. [Diagram: worked example, extended. Legend: Hubs, Links, Satellites, Tab.]
      FLIGHT (tabs: Base, Dest, Forecast)
        Record Source | LoadDate   | Depart | Gate
        LGA           | 2018-10-11 | 1:25PM | B27
        CAE           | 2018-10-24 | 3:30PM | A14
      FLIGHT: Record Source: Airport CAE; Load Date: 2018-11-17; Source Id: 20181117-32-983
      Aircraft (tabs: Base, Service, FAA, NTSB)
        Record Source | LoadDate   | Model | Tailno
        United        | 2017-02-11 | 767   | 1477
        Delta         | 2015-11-04 | A6    | 2381
        Alaska        | 2013-08-28 | 747   | 8312
        Frontier      | 2016-07-19 | 182   | 1438
      Record Source: United Airlines; Load Date: 2018-01-17; Source Id: 2412c
      Airport (tabs: Base, Dest, Manifest)
        Record Source | LoadDate   | Begin      | End
        United        | 2017-02-11 | 2017-04-23 | 2017-09-23
        Delta         | 2015-11-04 | 2015-12-01 | 2017-04-22
        Alaska        | 2013-08-28 | 2013-09-14 | 2016-05-04
        Frontier      | 2016-07-19 | 2016-08-02 | 2018-04-11
      Record Source: United Airlines; Load Date: 2018-09-17
      Airline (tabs: Base, Service, FAA, NTSB)
        Record Source | LoadDate   | Model | Tailno
        United        | 2017-02-11 | 767   | 1477
        Delta         | 2015-11-04 | A6    | 2381
      Record Source: United Airlines; Load Date: 2018-01-17; Source Id: 2412c
  • 18.
      • Modeled after self-organizing networks
      • A Business Key identifies a key concept in the business.
          • Business keys have a business meaning
          • They are unique and have a very low propensity to change
          • They change only when the business changes
      • Enables (forces) cross-source modeling
      Source: http://www.di.univr.it/documenti/OccorrenzaIns/matdid/matdid232240.pdf
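One way to picture the cross-source role of a business key is to derive a deterministic hub key from it, so that the same key arriving from two systems lands on the same hub row. The sketch below is illustrative only: the slides do not specify a hashing scheme, and the function name `hub_hash_key`, the normalization rules (trim and upper-case), the `||` delimiter, the choice of MD5, and the sample tail numbers are all assumptions.

```python
import hashlib


def hub_hash_key(*business_key_parts: str) -> str:
    """Derive a deterministic hub key from one or more business key parts.

    Assumption: keys are normalized by trimming whitespace and upper-casing,
    then joined with '||' before hashing. None of this comes from the slides.
    """
    normalized = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# The same (hypothetical) tail number as delivered by two different sources.
from_maintenance = hub_hash_key(" n1477 ")
from_scheduling = hub_hash_key("N1477")

# A different aircraft yields a different hub key.
other_aircraft = hub_hash_key("N2381")
```

Because the key depends only on the normalized business key, each source can compute it independently, which is what lets separately loaded sources converge on one hub.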
  • 19.
  • 22. DATA VAULT 2.0 MODELING: HUBS, LINKS & SATELLITES
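As a rough illustration of the three table types named on this slide: hubs hold business keys, links relate hubs, and satellites hold descriptive attributes as of a load date. The class and field names below are hypothetical, and the sample rows only borrow values from the flight example on slide 14; how those values pair up here is an assumption, not a reading of the original diagram.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hub:
    """One row per business key."""
    business_key: str
    load_date: date
    record_source: str


@dataclass(frozen=True)
class Link:
    """A relationship between two or more hubs, identified by their keys."""
    name: str
    hub_keys: tuple[str, ...]
    load_date: date
    record_source: str


@dataclass(frozen=True)
class Satellite:
    """Descriptive attributes of a hub or link as of a load date."""
    parent_key: str
    load_date: date
    record_source: str
    attributes: tuple[tuple[str, str], ...]


# Illustrative rows; the pairing of values is assumed.
aircraft = Hub("1477", date(2017, 2, 11), "United")
airport = Hub("CAE", date(2018, 11, 17), "Airport CAE")
serviced_by = Link(
    "SERVICED_BY",
    (aircraft.business_key, airport.business_key),
    date(2018, 1, 17),
    "United Airlines",
)
aircraft_model = Satellite(
    aircraft.business_key,
    date(2017, 2, 11),
    "United",
    (("Model", "767"), ("Tailno", "1477")),
)
```

The classes are frozen to mirror the append-only rule from slide 5: a change in the source produces a new satellite row with a later load date rather than an update to an existing one.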
  • 24. THE DATA: Impressions vs Business Data [Section divider; repeats the architecture diagram from slide 3.]
  • 25. ENTERPRISE DATA SILOS [Diagram. Labels: Small Data, Large Data, Big Data; Describes the user base, Describes the Enterprise, Describes the Product.]
  • 26. DATA GRANULARITY FUNNEL [Funnel diagram. Labels: Instance Grain, Transaction Grain, Audit Grain, Impression Grain, Big Data, Enterprise Data Warehouse, Operational Data Stores, External Data Sources, Impression Analytics, Business Analytics.]
  • 27. DATA INGESTION: ETL vs ELT vs SerDe [Section divider; repeats the architecture diagram from slide 3.]
  • 28. ETL VS ELT VS SerDe • Beware the Turing tar-pit, in which everything is possible, but nothing of interest is easy - Alan Perlis
  • 29. DATA CLASSIFICATION MATRIX: DECLARATIVE VS INTERPRETIVE [Matrix diagram. Labels: Declarative, Interpretive, Hadoop, RDBMS, Web Events, Media Player.]
  • 30. DATA WAREHOUSING
      • Deep Topic
      • 60’s, 70’s, 80’s:
          • E.F. Codd => 3NF
          • Bill Inmon invents the Data Warehousing concept
          • Dr. Ralph Kimball popularizes Star Schema design
      • 90’s, 00’s:
          • Dan Linstedt creates the Data Vault Model @ DOD
      • 2014:
          • Dan introduces Data Vault 2.0
      • Data Warehouse vs Operational Data Stores
      • Data Warehouse as Version Control System
    BIG DATA
      • MapReduce, 2004, Google: Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”; GFS
      • Nutch 2005, Hadoop 2006, 2007 - Doug Cutting
      • What exactly is “Big Data”?
  • 33. ETL OR SERDE? [Diagram. Labels: Internet, Client, User, Serializer, Eventlog.e, Eventlog.d, Single Source (Version Locked), Kafka/Kinesis, Deserializer, S3, Hadoop, Time Series Event Record Analysis; edge labels: L e, L d, L m, L p.]
  • 35. OTHER CHALLENGES • Satellites must be loaded chronologically • Time-based scheduling vs data-availability scheduling
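To see why load order matters for satellites, consider a minimal sketch of an append-only load that stores a new row only when the attributes differ from the latest row already held for that key. The function name `load_satellite`, the tuple layout, and the flight key `FL-100` are hypothetical; the gate values are borrowed from the earlier flight example purely for illustration.

```python
from datetime import date


def load_satellite(history, batches):
    """Append rows to `history` in load-date order, skipping unchanged rows.

    Each row is a (load_date, key, attributes) tuple. Sorting by load date
    first is what keeps the change detection meaningful.
    """
    for load_date, key, attrs in sorted(batches, key=lambda batch: batch[0]):
        # Most recent row already stored for this key, if any.
        current = next((row for row in reversed(history) if row[1] == key), None)
        if current is None or current[2] != attrs:
            history.append((load_date, key, attrs))
    return history


# Batches arrive out of order, as they might under time-based scheduling.
batches = [
    (date(2018, 10, 24), "FL-100", {"Gate": "A14"}),
    (date(2018, 10, 11), "FL-100", {"Gate": "B27"}),
    (date(2018, 10, 25), "FL-100", {"Gate": "A14"}),
]
history = load_satellite([], batches)
```

With the sort in place, the history records the gate change from B27 to A14 once and ignores the unchanged 25 October batch. Processing the same batches in arrival order would instead record A14, then B27, then A14 again, which misstates what happened. This sketch only orders the batches it is given; a batch that arrives after later data has already been loaded would still need separate handling.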
  • 36. QUESTIONS?
      • Contact: Dean Hallman
      • rdhallman@gmail.com
      • LinkedIn: https://www.linkedin.com/in/dean-hallman/