Data Warehouse
• Definition
• Importance of Data Warehouse
• Its Components
• Two Data Warehousing Strategies
• ETL Processes
• For a Successful Warehouse
• Data Warehouse Pitfalls
Data Warehouse
• A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions (Bill Inmon)
  - Subject-oriented: data are organized around subjects such as sales and products
  - Integrated: data from different sources are combined to provide a comprehensive view
  - Time-variant: historical data are maintained
  - Non-volatile: data are not updated by users
Limitations of Traditional Databases
• Lack of online historical data
• Data reside in different operational systems
• Extremely poor query performance
• Operational database designs are not suited for decision support
The Importance of Data Warehousing
• More cost-effective decision making
• Higher quality and flexibility of enterprise analysis, because the data warehouse contains accurate and reliable data
• Ability to maintain better customer relationships
• Unlimited analyses of enterprise information
Components of a Data Warehouse
• Summarized data
  - Of two types: 1) lightly summarized (departmental information) and 2) highly summarized (enterprise-wide decisions)
• Current detail
  - Comes directly from the operational systems
  - Stored by subject area and represents the entire organization, not a single department
• System of record
  - Maintains the source of record
• Integration and transformation programs
  - Programs that convert application-specific data to enterprise data
Components of a Data Warehouse (cont.)
  - Perform many functions, such as:
    - Reformatting and recalculating
    - Adding a time element
    - Identifying default values
    - Summarizing and merging the data
    - Filling in blank fields
• Archives
  - Contain old data that still hold some significance to the organization
  - Used for trend analysis
• Metadata
  - Controls access to and analysis of the data warehouse contents
  - Used to manage and control data warehouse creation and maintenance
Two Data Warehousing Strategies
• Enterprise-wide warehouse: top down, the Inmon methodology
• Data mart: bottom up, the Kimball methodology
• When properly executed, both result in an enterprise-wide data warehouse
The Data Mart Strategy
• The most common approach
• Begins with a single mart; further marts are added over time for more subject areas
• Relatively inexpensive and easy to implement
• Can be used as a proof of concept for data warehousing
• Requires an overall integration plan
The Enterprise-wide Strategy
• A comprehensive warehouse is built initially
• An initial dependent data mart is built using a subset of the data in the warehouse
• Additional data marts are built using subsets of the data in the warehouse
• Like all complex projects, it is expensive, time consuming, and prone to failure
• When successful, it results in an integrated, scalable warehouse
ETL Processes
• Extraction, transformation, and loading (ETL)
• The “plumbing” work of data warehousing
• Data are moved from source to target databases
• A very costly, time-consuming part of data warehousing
Sample ETL Tools
• Teradata Warehouse Builder from Teradata
• DataStage from Ascential Software
• SAS System from SAS Institute
• PowerMart/PowerCenter from Informatica
• Sagent Solution from Sagent Software
Reasons for “Dirty” Data
• Dummy Values
• Absence of Data
• Multipurpose Fields
• Inappropriate Use of Address Lines
• Violation of Business Rules
• Non-Unique Identifiers
• Data Integration Problems
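
Some of these problems can be detected mechanically before cleansing begins. The following is a minimal sketch, not a complete profiler: it counts dummy values, absent values, and non-unique identifiers in a list of records. The function name `profile_records`, the `DUMMY_VALUES` set, and the sample records are all assumptions made for illustration; real dummy values depend on the source system.

```python
from collections import Counter

# Hypothetical placeholder values; real dummy values depend on the source system.
DUMMY_VALUES = {"", "N/A", "UNKNOWN", "999-99-9999", "0000000000"}


def profile_records(records, key_field):
    """Count three of the problems listed above in a list of dict records."""
    issues = Counter()
    key_counts = Counter(r.get(key_field) for r in records)
    for record in records:
        for value in record.values():
            if value is None:
                issues["absent"] += 1            # absence of data
            elif str(value).strip().upper() in DUMMY_VALUES:
                issues["dummy"] += 1             # dummy value standing in for real data
        if key_counts[record.get(key_field)] > 1:
            issues["non_unique_id"] += 1         # identifier shared by several records
    return dict(issues)


# Invented sample data for illustration only.
customers = [
    {"id": 1, "name": "Ann Lee", "phone": "0000000000"},
    {"id": 2, "name": None, "phone": "555-0100"},
    {"id": 2, "name": "Bob Ray", "phone": "N/A"},
]
report = profile_records(customers, "id")
```

Multipurpose fields and business-rule violations usually need rules specific to each source system, so they are left out of this sketch.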
I. Data Cleansing and Extracting
• Source systems contain “dirty data” that must be cleansed
• ETL software contains rudimentary data cleansing capabilities
• Specialized data cleansing software is often used; it is important for name and address correction and for householding (grouping records that belong to the same household)
• Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
Steps in Data Cleansing
• Parsing
• Correcting
• Standardizing
• Matching
• Consolidating
Parsing
• Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.
• Examples include parsing the first, middle, and last name; street number and street name; and city and state.
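
The parsing examples above can be sketched in a few lines. This is an illustrative sketch only: it assumes names are separated by spaces and addresses follow a US-style "number street, city, ST" layout. The functions `parse_name` and `parse_address` and the pattern `ADDRESS_RE` are hypothetical names, and commercial cleansing tools handle far more variation.

```python
import re


def parse_name(full_name):
    """Split a name into first, middle, and last parts (middle may be empty)."""
    parts = full_name.split()
    if len(parts) < 2:
        return {"first": full_name.strip(), "middle": "", "last": ""}
    return {"first": parts[0], "middle": " ".join(parts[1:-1]), "last": parts[-1]}


# Assumed layout: "<street number> <street name>, <city>, <two-letter state>"
ADDRESS_RE = re.compile(
    r"^\s*(?P<street_number>\d+)\s+(?P<street_name>[^,]+?)\s*,"
    r"\s*(?P<city>[^,]+?)\s*,\s*(?P<state>[A-Za-z]{2})\s*$"
)


def parse_address(line):
    """Return the isolated address elements, or None if the layout is not recognized."""
    match = ADDRESS_RE.match(line)
    return match.groupdict() if match else None


name = parse_name("Mary Ann Smith")
address = parse_address("123 Main Street, Springfield, IL")
```

Returning `None` for an unrecognized layout lets a later step route the record to manual review rather than guess.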
Correcting
• Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.
• Examples include replacing a vanity address and adding a zip code.
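
A rough sketch of correction against a secondary data source follows. The two lookup tables stand in for a real postal directory, and every value in them (the city names and the placeholder ZIP code `00000`) is invented for illustration. `correct_address` is a hypothetical helper, not part of any cleansing product.

```python
# Hypothetical reference data standing in for a secondary source such as a postal directory.
# All values are invented placeholders.
OFFICIAL_CITY = {("LAKEVIEW ESTATES", "NY"): "RIVERTON"}   # vanity name -> postal city
ZIP_BY_CITY_STATE = {("RIVERTON", "NY"): "00000"}


def correct_address(record):
    """Return a corrected copy of a parsed address record."""
    fixed = dict(record)
    key = (fixed["city"].strip().upper(), fixed["state"].strip().upper())
    # Replace a vanity locality name with the postal city, when one is known.
    if key in OFFICIAL_CITY:
        postal_city = OFFICIAL_CITY[key]
        fixed["city"] = postal_city.title()
        key = (postal_city, key[1])
    # Add a missing ZIP code from the reference table, when one is known.
    if not fixed.get("zip"):
        fixed["zip"] = ZIP_BY_CITY_STATE.get(key, "")
    return fixed


corrected = correct_address({"city": "Lakeview Estates", "state": "NY", "zip": ""})
```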
Standardizing
• Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
• Examples include adding a name prefix (such as Mr. or Ms.), replacing a nickname, and using a preferred street name.
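
As a small illustration of conversion routines, the sketch below replaces nicknames and expands a trailing street abbreviation. The tables `NICKNAMES` and `STREET_WORDS` are tiny hypothetical rule sets; in practice these rules would come from the organization's own standards.

```python
# Hypothetical business rules; a real rule set would be much larger.
NICKNAMES = {"bob": "Robert", "bill": "William", "liz": "Elizabeth"}
STREET_WORDS = {
    "st": "Street", "st.": "Street",
    "ave": "Avenue", "ave.": "Avenue",
    "rd": "Road", "rd.": "Road",
}


def standardize_first_name(first):
    """Replace a known nickname with the formal name; otherwise just fix the casing."""
    cleaned = first.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.title())


def standardize_street(street):
    """Normalize casing and expand an abbreviation in the last word of a street name."""
    words = [w.capitalize() for w in street.split()]
    if words:
        words[-1] = STREET_WORDS.get(words[-1].lower(), words[-1])
    return " ".join(words)


first_name = standardize_first_name(" BILL ")
street = standardize_street("main st.")
```

Only the last word is expanded so that a name such as "St. James Road" is not rewritten at the front.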
Matching
• Searches for and matches records within and across the parsed, corrected, and standardized data, based on predefined business rules, to eliminate duplicates.
• Examples include identifying similar names and addresses.
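
One simple way to identify similar names and addresses is string similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library and compares every pair of records. The 0.85 threshold and the field names `name` and `address` are assumptions, and the pairwise comparison is only practical for small data sets; it is not how any particular vendor's matching engine works.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(records, threshold=0.85):
    """Return index pairs whose name and address are both similar enough."""
    pairs = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        if (similarity(left["name"], right["name"]) >= threshold
                and similarity(left["address"], right["address"]) >= threshold):
            pairs.append((i, j))
    return pairs


# Invented sample data for illustration only.
people = [
    {"name": "Robert Smith", "address": "12 Main Street"},
    {"name": "Robert Smyth", "address": "12 Main Street"},
    {"name": "Alice Jones", "address": "9 Oak Avenue"},
]
matches = find_matches(people)
```

The threshold is the business rule here: lowering it finds more candidate duplicates at the cost of more false matches.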
Consolidating
• Analyzing and identifying relationships between
matched records and consolidating/merging
them into ONE representation.
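
A minimal sketch of consolidation follows, assuming one simple rule: for each field, keep the first non-empty value found among the matched records. The function `consolidate` and the sample records are hypothetical; real consolidation rules may instead prefer the most recent or most trusted source.

```python
def consolidate(records):
    """Merge matched records into one, keeping the first non-empty value per field."""
    merged = {}
    for record in records:
        for field, value in record.items():
            if value not in (None, "") and merged.get(field) in (None, ""):
                merged[field] = value
    return merged


# Invented sample data: two records already identified as the same person.
golden = consolidate([
    {"name": "Robert Smith", "phone": "", "email": "rsmith@example.com"},
    {"name": "Robert Smyth", "phone": "555-0100", "email": ""},
])
```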
II. Data Transformation
• Transforms the data in accordance with the business rules and standards that have been established
• Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
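
Several of these transformations can be shown together. The sketch below splits a field, replaces a code, derives a value, and then computes an aggregate. The column names, the `GENDER_CODES` mapping, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

GENDER_CODES = {"M": "Male", "F": "Female"}  # hypothetical source-system codes


def transform(row):
    """Apply row-level rules: split a field, replace a code, derive a value."""
    first, _, last = row["customer"].partition(" ")          # splitting up a field
    return {
        "first_name": first,
        "last_name": last,
        "gender": GENDER_CODES.get(row["gender_code"], "Unknown"),  # replacement of codes
        "region": row["region"],
        "revenue": row["quantity"] * row["unit_price"],      # derived value
    }


def revenue_by_region(rows):
    """Aggregate the derived revenue per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["revenue"]
    return dict(totals)


# Invented sample data for illustration only.
source_rows = [
    {"customer": "Ann Lee", "gender_code": "F", "region": "East", "quantity": 2, "unit_price": 10.0},
    {"customer": "Bob Ray", "gender_code": "M", "region": "East", "quantity": 1, "unit_price": 5.5},
    {"customer": "Cy Wu", "gender_code": "X", "region": "West", "quantity": 3, "unit_price": 4.0},
]
transformed = [transform(r) for r in source_rows]
totals = revenue_by_region(transformed)
```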
III. Data Loading
• Data are physically moved to the data warehouse
• The loading takes place within a “load window”
• The trend is toward near-real-time updates of the data warehouse, as the warehouse is increasingly used for operational applications
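
A load step can be sketched with Python's built-in `sqlite3` module standing in for the warehouse database. The table `sales_fact` and its columns are hypothetical. The point of the sketch is that a batch is loaded inside one transaction, so a failure partway through does not leave a partially loaded batch behind.

```python
import sqlite3


def load_batch(conn, rows):
    """Load one batch atomically: either every row is committed or none is."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO sales_fact (sale_date, region, revenue) VALUES (?, ?, ?)",
            rows,
        )


# An in-memory database stands in for the warehouse; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact ("
    "sale_date TEXT NOT NULL, region TEXT NOT NULL, revenue REAL NOT NULL)"
)
load_batch(conn, [("2024-01-01", "East", 25.5), ("2024-01-01", "West", 12.0)])
loaded = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

Production warehouses typically use bulk-load utilities rather than row inserts, but the all-or-nothing batch idea carries over.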
For a Successful Warehouse
• From day one, establish that warehousing is a joint user/builder project
• Establish that maintaining data quality will be an ONGOING joint user/builder responsibility
• Train the users one step at a time
• Consider doing a high-level corporate data model in no more than three weeks
• Look closely at the data extracting, cleaning, and loading tools
For a Successful Warehouse (cont.)
• Determine a plan to test the integrity of the data in the warehouse
• From the start, get warehouse users in the habit of 'testing' complex queries
• Coordinate system roll-out with network administration personnel
• Implement a user-accessible automated directory to information stored in the warehouse
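
One possible ingredient of a data integrity test plan is a reconciliation check that compares row counts and column totals between a source table and its warehouse copy. The sketch below uses `sqlite3` from the standard library. The function `reconcile` and the table names `orders` and `sales_fact` are hypothetical, and table and column names are assumed to come from trusted configuration because they are formatted directly into the SQL text.

```python
import sqlite3


def reconcile(source_conn, target_conn, source_table, target_table, amount_column):
    """Compare row count and column total between a source table and its warehouse copy."""
    # Identifiers are interpolated into the SQL, so they must not come from user input.
    query = "SELECT COUNT(*), COALESCE(SUM({col}), 0) FROM {table}"
    src = source_conn.execute(query.format(col=amount_column, table=source_table)).fetchone()
    tgt = target_conn.execute(query.format(col=amount_column, table=target_table)).fetchone()
    return {
        "row_count_ok": src[0] == tgt[0],
        "total_ok": abs(src[1] - tgt[1]) < 1e-9,
    }


# Invented sample data; one row is deliberately missing from the warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.execute("CREATE TABLE sales_fact (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (5.5,)])
conn.executemany("INSERT INTO sales_fact VALUES (?)", [(10.0,)])
result = reconcile(conn, conn, "orders", "sales_fact", "amount")
```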
Data Warehouse Pitfalls
• Many warehouse end users will be trained and never or seldom apply their training
• Large-scale data warehousing can become an exercise in data homogenizing
• Loading information only because it is available
• Providing no maintenance to the data warehouse
Contact Us
Business Name: Skyline Business School
Address: Hauz Khas Enclave, New Delhi - 110 016, India.
Phone: 91-11-26864848, 91-11-26866968
E-mail: info@skylinecollege.com
Resource: www.skylinecollege.com/our-programmes/pgp-data-warehousing

Data Warehouse Basic Guide

  • 1. Data Warehouse: Definition; Importance of Data Warehouse; Its Components; Two Data Warehousing Strategies; ETL Processes; For a Successful Warehouse; Data Warehouse Pitfalls
  • 2. Data Warehouse: A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions (Bill Inmon). Subject-oriented: data are organized around subjects such as sales and products. Integrated: data are integrated to provide a comprehensive view. Time-variant: historical data are maintained. Non-volatile: data are not updated by users.
  • 3. Limitations of Traditional Databases: lack of online historical data; data residing in different operational systems; extremely poor query performance; operational database designs not suited for decision support.
  • 4. The Importance of Data Warehousing: more cost-effective decision making; increased quality and flexibility of enterprise analysis, because the data warehouse contains accurate and reliable data; ability to maintain better customer relationships; unlimited analyses of enterprise information.
  • 5. Components of a Data Warehouse: Summarized data, of two types: (1) lightly summarized (departmental information) and (2) highly summarized (enterprise-wide decisions). Current detail, which comes directly from the operational systems but is stored by subject area and represents the entire organization, not a single department. System of record, which maintains the source of record. Integration and transformation programs, which convert application-specific data to enterprise data.
  • 6. Cont.: Integration and transformation programs perform many functions, such as reformatting and recalculating, adding a time element, identifying default values, summarizing and merging the data, and filling in blank fields. Archives contain old data that still holds some significance to the organization and are used for trend analysis. Metadata controls access to and analysis of the data warehouse contents, and is used to manage and control data warehouse creation and maintenance.
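Two of the functions listed above, adding a time element and filling in blank fields with default values, can be sketched in a few lines of Python. This is a minimal illustration only: the field names, the `DEFAULTS` table, and the `integrate` helper are hypothetical and not taken from the slides.

```python
from datetime import date

# Hypothetical default values for fields that may arrive blank.
DEFAULTS = {"region": "UNKNOWN", "discount": 0.0}

def integrate(row, snapshot_date):
    """Return a copy of an operational row prepared for the warehouse."""
    out = dict(row)
    # Fill blank fields with their agreed default values.
    for field, default in DEFAULTS.items():
        if out.get(field) in (None, ""):
            out[field] = default
    # Add the time element so history can be kept per snapshot.
    out["snapshot_date"] = snapshot_date.isoformat()
    return out

print(integrate({"sku": "A1", "region": "", "discount": None}, date(2024, 1, 31)))
```

Stamping every row with a snapshot date is what makes the stored data time-variant rather than a single overwritten current state.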
  • 7. Two Data Warehousing Strategies: the enterprise-wide warehouse (top down, the Inmon methodology); the data mart (bottom up, the Kimball methodology). When properly executed, both result in an enterprise-wide data warehouse.
  • 8. The Data Mart Strategy: the most common approach; begins with a single mart, and further marts are added over time for more subject areas; relatively inexpensive and easy to implement; can be used as a proof of concept for data warehousing; requires an overall integration plan.
  • 9. The Enterprise-wide Strategy: a comprehensive warehouse is built initially; an initial dependent data mart is built using a subset of the data in the warehouse; additional data marts are built using subsets of the data in the warehouse; like all complex projects, it is expensive, time-consuming, and prone to failure; when successful, it results in an integrated, scalable warehouse.
  • 10. ETL Processes: the Extraction, Transformation, and Loading process; the “plumbing” work of data warehousing; data are moved from source to target databases; a very costly, time-consuming part of data warehousing.
  • 11. Sample ETL Tools: Teradata Warehouse Builder from Teradata; DataStage from Ascential Software; SAS System from SAS Institute; PowerMart/PowerCenter from Informatica; Sagent Solution from Sagent Software.
  • 12. Reasons for “Dirty” Data: dummy values; absence of data; multipurpose fields; inappropriate use of address lines; violation of business rules; non-unique identifiers; data integration problems.
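Three of these causes (dummy values, absence of data, and non-unique identifiers) can be detected with a simple profiling pass. The sketch below is illustrative only; the `DUMMY_VALUES` set, the field names, and the `profile` function are assumptions made for the example.

```python
from collections import Counter

# Hypothetical placeholder values that source systems use instead of real data.
DUMMY_VALUES = {"999-99-9999", "N/A", "UNKNOWN", "XXXXX"}

def profile(records, id_field="customer_id"):
    """Return (row_index, field, problem) tuples for a list of dict records."""
    issues = []
    id_counts = Counter(r.get(id_field) for r in records)
    for i, record in enumerate(records):
        for field, value in record.items():
            text = "" if value is None else str(value).strip()
            if text == "":
                issues.append((i, field, "missing"))
            elif text.upper() in DUMMY_VALUES:
                issues.append((i, field, "dummy value"))
        # The same identifier appearing on several rows is flagged on each row.
        if id_counts[record.get(id_field)] > 1:
            issues.append((i, id_field, "non-unique identifier"))
    return issues

rows = [
    {"customer_id": 1, "ssn": "999-99-9999", "city": "Austin"},
    {"customer_id": 1, "ssn": "123-45-6789", "city": ""},
]
for issue in profile(rows):
    print(issue)
```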
  • 13. I. Data Cleansing and Extracting: Source systems contain “dirty data” that must be cleansed. ETL software contains rudimentary data cleansing capabilities. Specialized data cleansing software is often used; it is important for performing name and address correction and householding functions. Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric).
  • 14. Steps in Data Cleansing: parsing; correcting; standardizing; matching; consolidating.
  • 15. Parsing: Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files. Examples include parsing the first, middle, and last name; street number and street name; and city and state.
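The name and street examples can be sketched as follows. This is a deliberately naive illustration that assumes names arrive as "First [Middle] Last" and streets as "number name"; the function names and output keys are hypothetical, and real cleansing tools handle far more variation.

```python
import re

def parse_name(full_name):
    """Split a full name into first, middle, and last components."""
    parts = full_name.split()
    if not parts:
        return {"first": "", "middle": "", "last": ""}
    if len(parts) == 1:
        return {"first": parts[0], "middle": "", "last": ""}
    return {"first": parts[0], "middle": " ".join(parts[1:-1]), "last": parts[-1]}

# Leading digits are treated as the street number, the rest as the street name.
STREET_RE = re.compile(r"^\s*(?P<number>\d+)\s+(?P<name>.+?)\s*$")

def parse_street(line):
    """Isolate the street number and street name from one address line."""
    match = STREET_RE.match(line)
    if match is None:
        return {"street_number": "", "street_name": line.strip()}
    return {"street_number": match.group("number"), "street_name": match.group("name")}

print(parse_name("John Q Public"))
print(parse_street("123 Main Street"))
```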
  • 16. Correcting: Corrects parsed individual data components using sophisticated data algorithms and secondary data sources. Examples include replacing a vanity address and adding a zip code.
  • 17. Standardizing: Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules. Examples include adding a pre-name, replacing a nickname, and using a preferred street name.
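A conversion routine of this kind is often just a lookup against a rule table. The sketch below replaces nicknames and expands street-type abbreviations; both tables and both function names are hypothetical stand-ins for an organization's own business rules.

```python
# Hypothetical rule tables; a real deployment would maintain much larger ones.
NICKNAMES = {"bob": "Robert", "bill": "William", "liz": "Elizabeth"}
STREET_SUFFIXES = {
    "st": "Street", "st.": "Street",
    "ave": "Avenue", "ave.": "Avenue",
    "rd": "Road", "rd.": "Road",
}

def standardize_first_name(name):
    """Replace a known nickname with its formal form, else title-case the name."""
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.title())

def standardize_street(street):
    """Expand an abbreviated street type at the end of the street name."""
    words = street.split()
    if words:
        words[-1] = STREET_SUFFIXES.get(words[-1].lower(), words[-1])
    return " ".join(words)

print(standardize_first_name("Bob"))   # Robert
print(standardize_street("Main St."))  # Main Street
```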
  • 18. Matching: Searching and matching records within and across the parsed, corrected, and standardized data, based on predefined business rules, to eliminate duplications. Examples include identifying similar names and addresses.
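Identifying similar names and addresses can be approximated with a string-similarity score. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in for a vendor matching engine; the 0.85 threshold, the record layout, and the `find_matches` function are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_matches(records, threshold=0.85):
    """Return index pairs of records whose name and address look alike."""
    pairs = []
    for (i, first), (j, second) in combinations(enumerate(records), 2):
        key_a = f"{first['name']} {first['address']}"
        key_b = f"{second['name']} {second['address']}"
        if similarity(key_a, key_b) >= threshold:
            pairs.append((i, j))
    return pairs

customers = [
    {"name": "Jon Smith", "address": "12 Main Street"},
    {"name": "John Smith", "address": "12 Main Street"},
    {"name": "Mary Jones", "address": "9 Oak Avenue"},
]
print(find_matches(customers))  # [(0, 1)]
```

Comparing every pair is quadratic in the number of records, so this only suits small batches; larger data sets usually restrict comparisons to records that share a blocking key such as postal code.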
  • 19. Consolidating: Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
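One simple merge rule is "the first non-empty value wins" for each field. The sketch below applies that rule to a group of already-matched records; the rule itself and the `consolidate` function are illustrative assumptions, since the slides do not prescribe a survivorship policy.

```python
def consolidate(records):
    """Merge matched records into one, keeping the first non-empty value per field."""
    merged = {}
    for record in records:
        for field, value in record.items():
            # Make sure every field appears in the result, even if always blank.
            merged.setdefault(field, value)
            if merged[field] in (None, "") and value not in (None, ""):
                merged[field] = value
    return merged

matched = [
    {"name": "John Smith", "phone": "", "zip": "10001"},
    {"name": "Jon Smith", "phone": "555-0100", "zip": ""},
]
print(consolidate(matched))
```

Because earlier records take precedence, the caller controls which source is trusted most by ordering the list.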
  • 20. II. Data Transformation: Transforms the data in accordance with the business rules and standards that have been established. Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates.
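Several of these transformation types fit in one small row-level function. The sketch below shows a field split, a code replacement, a date format change, a derived value, and an aggregate; the input layout, the `GENDER_CODES` table, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

GENDER_CODES = {"M": "Male", "F": "Female"}  # hypothetical code table

def transform(row):
    """Apply row-level transformations to one source record."""
    first, _, last = row["customer"].partition(" ")  # splitting up a field
    return {
        "first_name": first,
        "last_name": last,
        "gender": GENDER_CODES.get(row["gender_code"], "Unknown"),  # code replacement
        # Format change: MM/DD/YYYY in the source becomes ISO 8601.
        "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat(),
        "amount": round(row["quantity"] * row["unit_price"], 2),  # derived value
    }

def total_by_date(rows):
    """Aggregate transformed rows into a total amount per order date."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["order_date"]] += row["amount"]
    return dict(totals)

source = [
    {"customer": "Ann Lee", "gender_code": "F", "order_date": "03/15/2024",
     "quantity": 3, "unit_price": 2.5},
    {"customer": "Raj Patel", "gender_code": "M", "order_date": "03/15/2024",
     "quantity": 1, "unit_price": 2.5},
]
clean = [transform(r) for r in source]
print(total_by_date(clean))  # {'2024-03-15': 10.0}
```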
  • 21. III. Data Loading: Data are physically moved to the data warehouse. The loading takes place within a “load window”. The trend is toward near-real-time updates of the data warehouse, as the warehouse is increasingly used for operational applications.
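A batch load can be sketched with Python's built-in `sqlite3` module standing in for the warehouse database. The `sales_fact` table, its columns, and the `load_sales` function are assumptions made for the example; a production warehouse would normally use its vendor's bulk loader.

```python
import sqlite3

def load_sales(conn, rows):
    """Insert transformed rows in one transaction and return the table row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_date TEXT, product TEXT, amount REAL)"
    )
    # Using the connection as a context manager commits on success and
    # rolls back if any insert fails, so a batch is loaded fully or not at all.
    with conn:
        conn.executemany(
            "INSERT INTO sales_fact (order_date, product, amount) VALUES (?, ?, ?)",
            [(r["order_date"], r["product"], r["amount"]) for r in rows],
        )
    return conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]

connection = sqlite3.connect(":memory:")
batch = [
    {"order_date": "2024-03-15", "product": "widget", "amount": 7.5},
    {"order_date": "2024-03-15", "product": "gadget", "amount": 2.5},
]
print(load_sales(connection, batch))  # 2
```

Loading each batch as a single transaction keeps a failed run inside the load window from leaving a half-loaded table behind.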
  • 22. For a Successful Warehouse: From day one, establish that warehousing is a joint user/builder project. Establish that maintaining data quality will be an ONGOING joint user/builder responsibility. Train the users one step at a time. Consider doing a high-level corporate data model in no more than three weeks. Look closely at the data extracting, cleaning, and loading tools.
  • 23. Cont.: Determine a plan to test the integrity of the data in the warehouse. From the start, get warehouse users in the habit of 'testing' complex queries. Coordinate system roll-out with network administration personnel. Implement a user-accessible automated directory to information stored in the warehouse.
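One common building block of an integrity test plan is a reconciliation check that compares row counts and control totals between the source extract and the warehouse after each load. The sketch below is one possible form of such a check; the `reconcile` function, the `amount` field, and the tolerance are hypothetical.

```python
def reconcile(source_rows, warehouse_rows, amount_field="amount", tolerance=0.005):
    """Return a list of discrepancies between source and warehouse; empty means OK."""
    problems = []
    if len(source_rows) != len(warehouse_rows):
        problems.append(
            f"row count mismatch: source={len(source_rows)} "
            f"warehouse={len(warehouse_rows)}"
        )
    source_total = sum(r[amount_field] for r in source_rows)
    warehouse_total = sum(r[amount_field] for r in warehouse_rows)
    # A small tolerance absorbs floating-point rounding differences.
    if abs(source_total - warehouse_total) > tolerance:
        problems.append(
            f"{amount_field} total mismatch: source={source_total} "
            f"warehouse={warehouse_total}"
        )
    return problems

extract = [{"amount": 10.0}, {"amount": 5.5}]
print(reconcile(extract, extract))      # []
print(reconcile(extract, extract[:1]))  # two discrepancies reported
```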
  • 24. Data Warehouse Pitfalls: Many warehouse end users will be trained and then never or seldom apply their training. Large-scale data warehousing can become an exercise in data homogenizing. Information is loaded only because it is available. No maintenance is provided for the data warehouse.