Data Flow-I
Extraction
By,
Dr. Dipti Patil
•Extract
–Reading the data from the source (based on the data formats)
–Connecting to the source and accessing its data
–Scheduling the source system to get the data; notifications for the same
–Capturing the changed data
–Dumping the extracted data to disk for availability
•Clean
–Ensure column properties like data types
–Enforce structure of data, dependencies
–Enforce data rules - direct ones as well as business
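
The Clean step above can be illustrated with a minimal sketch. The column names, the cast functions in `COLUMN_TYPES`, and the "no negative amounts" rule are all assumptions made for illustration; they do not come from the slides. Rows that fail a type or structure check, or the data rule, are set aside with a reason rather than silently dropped.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CleanResult:
    rows: list      # rows that passed every check, with values cast to target types
    rejects: list   # (raw_row, reason) pairs for rows that failed a check


# Hypothetical column spec: column name -> function that casts a raw string.
COLUMN_TYPES = {
    "order_id": int,
    "amount": float,
    "order_date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}


def clean(raw_rows):
    """Cast each column to its declared type, then apply one simple data rule."""
    result = CleanResult(rows=[], rejects=[])
    for raw in raw_rows:
        try:
            # A missing column raises KeyError; a bad value raises ValueError.
            row = {col: cast(raw[col]) for col, cast in COLUMN_TYPES.items()}
        except (KeyError, ValueError, TypeError) as exc:
            result.rejects.append((raw, f"type/structure error: {exc!r}"))
            continue
        # Illustrative rule (an assumption): amounts must not be negative.
        if row["amount"] < 0:
            result.rejects.append((raw, "rule violation: negative amount"))
            continue
        result.rows.append(row)
    return result
```

Keeping the rejected rows together with a reason is one possible design; it lets the later quality checks report what was filtered out and why.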
Data Flow II
•Conform
–Loading dimensions, sub-dimensions, facts
–Conforming the dimensions and facts
–Handling delayed data coming to dimensions and
facts
–Loading and updating aggregations
–Dump the delivered data to disk
•Deliver/operations
–Scheduling
–Job execution
–Recovery, failure handling and restart
–Quality checks
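
As a rough illustration of "recovery, failure handling and restart", the sketch below runs named steps in order and records each finished step in a checkpoint file, so a rerun after a failure skips what already succeeded. The function name `run_job`, the JSON checkpoint format, and the step names are hypothetical; real schedulers and ETL tools provide their own mechanisms.

```python
import json
from pathlib import Path


def run_job(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping steps finished by an earlier run."""
    path = Path(checkpoint_path)
    # Load the names of steps completed by a previous run, if any.
    done = set(json.loads(path.read_text())) if path.exists() else set()
    for name, step in steps:
        if name in done:
            continue
        step()  # an exception here stops the job; earlier progress stays checkpointed
        done.add(name)
        path.write_text(json.dumps(sorted(done)))
    return done
```

This only works if each step is safe to run again from the start after a partial failure, which is an assumption the sketch does not enforce.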
Extraction
•Crucial to know how to extract the data from the source system
•Each source has distinctive characteristics and needs to be managed accordingly
•Integration with different systems is required, such as
–Database management systems
–Operating systems
–Hardware
–Communication protocols
•It is important to logically map the data from
Physical to logical data mapping I
•Crucial to have clean and cohesive data within the data warehouse.
Steps for mapping before the physical ETL process:
1. Plan the ETL process, including a defined logical data map (also called the data lineage report). The logical data map is the foundation of the metadata.
2. Identify the data sources to be used for
Physical to logical data mapping II
4. Go over the data lineage and business rules for extracting,
transforming and loading. This step must include the data warehouse
architects, business users, developers and QA personnel.
5. Design of complete ETL system with details of all fact and
dimension tables as a whole
6. Ensure the correctness of the computations and formulations
against the business requirements. This must involve all the members
of the data warehouse building team, architects as well as members
from the business teams.
Components of logical data map
1. Table name
2. Column name
3. Table type (fact, dimension, sub-dimension, supporting, etc.)
4. Slowly changing dimension (SCD) type (applicable to dimension tables)
5. Source database to get this information from
6. Source table
7. Source column name
8. Transformation required (if any)
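
One way to picture the eight components above is as one record per target column. The sketch below is an assumption about how such a map could be held in code: the class name `LogicalMapEntry`, its field names, and the two checks in `validate` are illustrative and are not prescribed by the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogicalMapEntry:
    # Target side
    table_name: str
    column_name: str
    table_type: str              # "fact", "dimension", "sub-dimension", "supporting", ...
    scd_type: Optional[int]      # only meaningful for dimension tables
    # Source side
    source_database: str
    source_table: str
    source_column: str
    transformation: Optional[str] = None   # free-text rule, if any


def validate(entry):
    """Return a list of problems found in one map entry (empty if none)."""
    problems = []
    if entry.table_type != "dimension" and entry.scd_type is not None:
        problems.append("SCD type is only applicable to dimension tables")
    if entry.table_type == "dimension" and entry.scd_type is None:
        problems.append("dimension column has no SCD type")
    return problems


# Hypothetical entries; the table, column, and database names are made up.
city = LogicalMapEntry("customer_dim", "city", "dimension", 2,
                       "crm", "customers", "city_name", "trim whitespace")
amount = LogicalMapEntry("sales_fact", "amount", "fact", 1,
                         "erp", "orders", "amt")
```

Here `validate(city)` returns an empty list, while `validate(amount)` flags the SCD type set on a fact table.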
About SCDs
•An important factor to consider when loading the dimension tables
•The structure of a dimension table alone does not reveal which SCD strategy is in use
•Columns have historic relevance, and the strategy required for capturing this history should be known in advance.
•Changing the SCD type after the design should be managed through a change management process
SCD management
1. Type 0: passive. Values remain the same forever.
2. Type 1: allows new data to overwrite old data, so history is not tracked.
3. Type 2: tracks historical data by creating multiple records for a given natural key in the dimension tables, with separate surrogate keys and/or different version numbers.
4. Type 3: tracks changes using separate columns and preserves limited history.
5. Type 4: maintains older data in separate
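
The difference between Type 1 and Type 2 can be sketched on an in-memory dimension table (a list of dicts). The column names `surrogate_key`, `natural_key`, `version`, `valid_from`, `valid_to`, and `is_current`, and the sample customer row, are assumptions chosen for illustration; a real warehouse would do this with SQL or an ETL tool.

```python
from datetime import date


def apply_type1(dim_rows, natural_key, changes):
    """Type 1: overwrite attributes in place; no history is kept."""
    for row in dim_rows:
        if row["natural_key"] == natural_key:
            row.update(changes)


def apply_type2(dim_rows, natural_key, changes, effective):
    """Type 2: close the current row and append a new version with a new surrogate key."""
    # Raises StopIteration if the natural key has no current row.
    current = next(
        r for r in dim_rows
        if r["natural_key"] == natural_key and r["is_current"]
    )
    current["is_current"] = False
    current["valid_to"] = effective
    new_row = {**current, **changes}
    new_row.update(
        surrogate_key=max(r["surrogate_key"] for r in dim_rows) + 1,
        version=current["version"] + 1,
        valid_from=effective,
        valid_to=None,
        is_current=True,
    )
    dim_rows.append(new_row)
    return new_row


# Hypothetical customer dimension with one current row.
dim = [{
    "surrogate_key": 1, "natural_key": "C1", "city": "Pune", "version": 1,
    "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True,
}]
apply_type2(dim, "C1", {"city": "Mumbai"}, date(2024, 1, 1))
```

After the call, `dim` holds two rows for natural key `C1`: the closed "Pune" row and a current "Mumbai" row with surrogate key 2 and version 2. `apply_type1` on the same key would simply overwrite the attribute, leaving no trace of the old value.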

Data flow in Extraction of ETL data warehousing
