Data Warehouse Basic Guide


Tips on the importance of data warehousing, data cleansing, data extraction, and related topics. For more details visit: http://www.skylinecollege.com/business-analytics-course


  1. Data Warehouse
     • Definition
     • Importance of Data Warehouse
     • Its Components
     • Two Data Warehousing Strategies
     • ETL Processes
     • For a Successful Warehouse
     • Data Warehouse Pitfalls
  2. Data Warehouse
     • A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions (Bill Inmon)
     • Subject-oriented: data are organized around subjects such as sales and products
     • Integrated: data are integrated to provide a comprehensive view
     • Time-variant: historical data are maintained
     • Non-volatile: data are not updated by users
  3. Limitations of Traditional Databases
     • Lack of online historical data
     • Data reside in different operational systems
     • Extremely poor query performance
     • Operational database designs are not suited for decision support
  4. The Importance of Data Warehousing
     • More cost-effective decision making
     • Increased quality and flexibility of enterprise analysis, since a data warehouse contains accurate and reliable data
     • Ability to maintain better customer relationships
     • Unlimited analyses of enterprise information
  5. Components of Data Warehouse
     • Summarized data
        • Basically of two types: 1) lightly summarized (departmental information); 2) highly summarized (enterprise-wide decisions)
     • Current detail
        • Comes directly from the operational systems
        • Stored by subject area and represents the entire organization, not a single department
     • System of record
        • Maintains the source of record
     • Integration and transformation programs
        • Programs that convert application-specific data to enterprise data
  6. Components of Data Warehouse (cont.)
     • Integration and transformation programs perform many functions, such as:
        • Reformatting and recalculating
        • Adding a time element
        • Identifying default values
        • Summarizing and merging the data
        • Filling in blank fields
     • Archives
        • Contain old data that still hold some significance for the organization
        • Used for trend analysis
     • Metadata
        • Controls access to and analysis of the data warehouse contents
        • Used to manage and control data warehouse creation and maintenance
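The functions listed for integration and transformation programs can be sketched in a few lines of Python. This is only an illustration: the field names, the `DEFAULTS` table, and the `to_enterprise_record` helper are assumptions made for the example, not part of the slides.

```python
from datetime import date

# Hypothetical defaults for fields the source application may leave blank.
DEFAULTS = {"country": "UNKNOWN", "segment": "UNASSIGNED"}


def to_enterprise_record(app_record, load_date):
    """Convert one application-specific record into an enterprise-style record."""
    record = {}
    for key, value in app_record.items():
        # Reformatting: normalize field names and trim stray whitespace.
        clean = value.strip() if isinstance(value, str) else value
        record[key.strip().lower()] = clean
    # Filling in blank fields with the agreed default values.
    for field, default in DEFAULTS.items():
        if record.get(field) in (None, ""):
            record[field] = default
    # Recalculating: store the amount as whole cents instead of a decimal string.
    record["amount_cents"] = round(float(record.pop("amount")) * 100)
    # Adding the time element so that history can be kept per load date.
    record["load_date"] = load_date.isoformat()
    return record


example = to_enterprise_record(
    {" Customer ": " Acme Ltd ", "Country": "", "Amount": "19.99"},
    date(2024, 1, 31),
)
```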
  7. Two Data Warehousing Strategies
     • Enterprise-wide warehouse: top-down, the Inmon methodology
     • Data mart: bottom-up, the Kimball methodology
     • When properly executed, both result in an enterprise-wide data warehouse
  8. The Data Mart Strategy
     • The most common approach
     • Begins with a single mart; further marts are added over time to cover more subject areas
     • Relatively inexpensive and easy to implement
     • Can be used as a proof of concept for data warehousing
     • Requires an overall integration plan
  9. The Enterprise-wide Strategy
     • A comprehensive warehouse is built initially
     • An initial dependent data mart is built using a subset of the data in the warehouse
     • Additional data marts are built using subsets of the data in the warehouse
     • Like all complex projects, it is expensive, time-consuming, and prone to failure
     • When successful, it results in an integrated, scalable warehouse
  10. ETL Processes
     • Extraction, Transformation, and Loading process
     • The "plumbing" work of data warehousing
     • Data are moved from source to target databases
     • A very costly, time-consuming part of data warehousing
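A minimal sketch of the extract, transform, and load steps, using only the Python standard library. The CSV text stands in for a source system export, and the `fact_orders` table and its columns are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Stand-in for an export from a source system (hypothetical data).
SOURCE_CSV = "order_id,amount\n1,10.50\n2,4.25\n"


def extract(text):
    """Extraction: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: cast types and convert amounts to whole cents."""
    return [(int(r["order_id"]), round(float(r["amount"]) * 100)) for r in rows]


def load(conn, rows):
    """Loading: move the transformed rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
total_cents = conn.execute("SELECT SUM(amount_cents) FROM fact_orders").fetchone()[0]
```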
  11. Sample ETL Tools
     • Teradata Warehouse Builder from Teradata
     • DataStage from Ascential Software
     • SAS System from SAS Institute
     • PowerMart/PowerCenter from Informatica
     • Sagent Solution from Sagent Software
  12. Reasons for "Dirty" Data
     • Dummy values
     • Absence of data
     • Multipurpose fields
     • Inappropriate use of address lines
     • Violation of business rules
     • Non-unique identifiers
     • Data integration problems
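Three of these causes (dummy values, absence of data, and non-unique identifiers) are easy to detect mechanically. The sketch below is a rough profiling pass; the `DUMMY_VALUES` set and the sample rows are invented for illustration.

```python
from collections import Counter

# Hypothetical placeholder values that sometimes stand in for real data.
DUMMY_VALUES = {"999-99-9999", "N/A", "XXXXX", "TEST"}


def profile(rows, id_field):
    """Return (row index, field, problem) tuples for simple dirty-data checks."""
    issues = []
    id_counts = Counter(row.get(id_field) for row in rows)
    for index, row in enumerate(rows):
        for field, value in row.items():
            if value is None or str(value).strip() == "":
                issues.append((index, field, "absent"))
            elif str(value).strip().upper() in DUMMY_VALUES:
                issues.append((index, field, "dummy"))
        # An identifier that occurs more than once is not unique.
        if id_counts[row.get(id_field)] > 1:
            issues.append((index, id_field, "non-unique id"))
    return issues


rows = [
    {"id": "C1", "ssn": "999-99-9999", "city": "Delhi"},
    {"id": "C2", "ssn": "123-45-6789", "city": ""},
    {"id": "C1", "ssn": "987-65-4321", "city": "Pune"},
]
issues = profile(rows, "id")
```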
  13. I. Data Cleansing and Extracting
     • Source systems contain "dirty data" that must be cleansed
     • ETL software contains rudimentary data cleansing capabilities
     • Specialized data cleansing software is often used; it is important for name and address correction and for householding functions
     • Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
  14. Steps in Data Cleansing
     • Parsing
     • Correcting
     • Standardizing
     • Matching
     • Consolidating
  15. Parsing
     • Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.
     • Examples include parsing the first, middle, and last name; street number and street name; and city and state.
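The three parsing examples can be sketched as follows. Real name and address parsing is far messier; this version assumes a non-empty, space-separated name, a street line that starts with a purely numeric house number, and a "City, State" string.

```python
import re

# Street line assumed to look like "<number> <street name>".
STREET = re.compile(r"^\s*(?P<number>\d+)\s+(?P<street>.+?)\s*$")


def parse_name(full_name):
    """Split a name into first, middle, and last parts."""
    parts = full_name.split()
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:-1]),
        "last": parts[-1] if len(parts) > 1 else "",
    }


def parse_street(line):
    """Isolate the street number from the street name."""
    match = STREET.match(line)
    if match:
        return match.groupdict()
    # No leading number found: keep the whole line as the street name.
    return {"number": "", "street": line.strip()}


def parse_city_state(text):
    """Split 'City, State' at the first comma."""
    city, _, state = text.partition(",")
    return {"city": city.strip(), "state": state.strip()}
```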
  16. Correcting
     • Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.
     • Examples include replacing a vanity address and adding a zip code.
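As a rough illustration of correcting with a secondary data source, the sketch below replaces a vanity address and fills in a missing zip code from lookup tables. Both tables and their contents are made up for the example; a real system would query a postal reference database.

```python
# Hypothetical secondary data sources (invented values, not real postal data).
VANITY_ADDRESSES = {"ONE CORPORATE PLAZA": "100 MAIN ST"}
ZIP_LOOKUP = {("100 MAIN ST", "SPRINGFIELD"): "00001"}


def correct(record):
    """Return a corrected copy of a parsed address record."""
    fixed = dict(record)
    street = fixed["street"].strip().upper()
    # Replace a vanity address with the deliverable street address.
    fixed["street"] = VANITY_ADDRESSES.get(street, street)
    # Add the zip code when it is missing and the lookup knows the address.
    if not fixed.get("zip"):
        key = (fixed["street"], fixed["city"].strip().upper())
        fixed["zip"] = ZIP_LOOKUP.get(key, "")
    return fixed


corrected = correct({"street": "One Corporate Plaza", "city": "Springfield", "zip": ""})
```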
  17. Standardizing
     • Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
     • Examples include adding a name prefix (pre-name), replacing a nickname, and using a preferred street name.
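Standardizing can be sketched as a set of lookup-driven conversion routines. The `NICKNAMES` and `STREET_WORDS` tables below are small illustrative samples, not a complete rule set.

```python
# Illustrative conversion tables (assumed for this sketch).
NICKNAMES = {"bob": "Robert", "bill": "William", "liz": "Elizabeth"}
STREET_WORDS = {
    "st": "Street", "st.": "Street",
    "ave": "Avenue", "ave.": "Avenue",
    "rd": "Road", "rd.": "Road",
}


def standardize_first_name(name):
    """Replace a known nickname; otherwise just normalize capitalization."""
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.title())


def standardize_street(street):
    """Expand street abbreviations to the preferred full word."""
    return " ".join(STREET_WORDS.get(word.lower(), word) for word in street.split())
```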
  18. Matching
     • Searching and matching records within and across the parsed, corrected, and standardized data based on predefined business rules to eliminate duplications.
     • Examples include identifying similar names and addresses.
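One simple way to identify similar names and addresses is a string-similarity score with a threshold. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the 0.85 threshold and the choice to compare name and address as one string are assumptions, and commercial matching tools use far more elaborate rules.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(records, threshold=0.85):
    """Return index pairs of records whose name and address look alike."""
    pairs = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        key_left = f"{left['name']} {left['address']}"
        key_right = f"{right['name']} {right['address']}"
        if similarity(key_left, key_right) >= threshold:
            pairs.append((i, j))
    return pairs


records = [
    {"name": "Jon Smith", "address": "12 Elm Street"},
    {"name": "John Smith", "address": "12 Elm Street"},
    {"name": "Mary Jones", "address": "9 Oak Avenue"},
]
matches = find_matches(records)
```

Comparing every pair is quadratic in the number of records, so this only suits small batches.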
  19. Consolidating
     • Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
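A minimal sketch of consolidation, assuming the records passed in have already been matched as the same entity. The survivorship rule used here (keep the longest non-empty value per field) is just one possible choice, picked for brevity.

```python
def consolidate(records):
    """Merge matched records into one representation."""
    merged = {}
    for record in records:
        for field, value in record.items():
            # Keep the longest non-empty value seen so far for each field.
            if value and len(str(value)) > len(str(merged.get(field, ""))):
                merged[field] = value
    return merged


golden = consolidate([
    {"name": "J. Smith", "phone": "", "email": "js@example.com"},
    {"name": "John Smith", "phone": "555-0100", "email": ""},
])
```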
  20. II. Data Transformation
     • Transforms the data in accordance with the business rules and standards that have been established
     • Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
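The transformation examples can be shown together in one small sketch. The column names, the `STATUS_CODES` table, and the sample orders are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

STATUS_CODES = {"S": "Shipped", "P": "Pending"}  # hypothetical code table


def transform(rows):
    """Apply the example transformations to raw order rows."""
    seen, out = set(), []
    for row in rows:
        if row["order_id"] in seen:  # deduplication
            continue
        seen.add(row["order_id"])
        # Splitting up a field: "Last, First" becomes two columns.
        last, first = [part.strip() for part in row["customer"].split(",", 1)]
        out.append({
            "order_id": row["order_id"],
            # Format change: DD/MM/YYYY to ISO 8601.
            "order_date": datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat(),
            "first_name": first,
            "last_name": last,
            # Replacement of codes with readable values.
            "status": STATUS_CODES.get(row["status"], "Unknown"),
            # Derived value: line total in cents.
            "total_cents": row["qty"] * row["unit_price_cents"],
        })
    return out


def total_by_status(rows):
    """Aggregate: sum of totals per status."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["status"]] += row["total_cents"]
    return dict(totals)


raw = [
    {"order_id": 1, "order_date": "31/01/2024", "customer": "Smith, John",
     "status": "S", "qty": 2, "unit_price_cents": 500},
    {"order_id": 1, "order_date": "31/01/2024", "customer": "Smith, John",
     "status": "S", "qty": 2, "unit_price_cents": 500},
    {"order_id": 2, "order_date": "01/02/2024", "customer": "Jones, Mary",
     "status": "P", "qty": 1, "unit_price_cents": 250},
]
clean = transform(raw)
totals = total_by_status(clean)
```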
  21. III. Data Loading
     • Data are physically moved to the data warehouse
     • The loading takes place within a "load window"
     • The trend is toward near-real-time updates of the data warehouse, as the warehouse is increasingly used for operational applications
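The idea of a load window can be sketched as a guard around a batch insert. The 01:00 to 05:00 window, the `sales` table, and the helper names are assumptions for the example; the insert runs inside a single SQLite transaction so that a failed batch is rolled back.

```python
import sqlite3
from datetime import time

# Hypothetical nightly load window: 01:00 (inclusive) to 05:00 (exclusive).
LOAD_WINDOW = (time(1, 0), time(5, 0))


def in_load_window(now, window=LOAD_WINDOW):
    """True when the given time of day falls inside the load window."""
    start, end = window
    return start <= now < end


def load_batch(conn, rows, now):
    """Load rows only inside the window; return the number of rows loaded."""
    if not in_load_window(now):
        return 0
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO sales (sale_id, amount_cents) VALUES (?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount_cents INTEGER)")
loaded_at_night = load_batch(conn, [(1, 100), (2, 200)], time(2, 30))
loaded_at_noon = load_batch(conn, [(3, 300)], time(12, 0))
```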
  22. For a Successful Warehouse
     • From day one, establish that warehousing is a joint user/builder project
     • Establish that maintaining data quality will be an ONGOING joint user/builder responsibility
     • Train the users one step at a time
     • Consider doing a high-level corporate data model in no more than three weeks
     • Look closely at the data extracting, cleaning, and loading tools
  23. For a Successful Warehouse (cont.)
     • Determine a plan to test the integrity of the data in the warehouse
     • From the start, get warehouse users in the habit of "testing" complex queries
     • Coordinate system roll-out with network administration personnel
     • Implement a user-accessible automated directory to information stored in the warehouse
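One possible element of a data integrity test plan is a reconciliation between the source rows and what ended up in the warehouse. The sketch below compares keys and a control total; the `reconcile` helper and the sample rows are illustrative assumptions.

```python
def reconcile(source_rows, warehouse_rows, key, amount):
    """Compare keys and a control total between source and warehouse."""
    source_keys = {row[key] for row in source_rows}
    warehouse_keys = {row[key] for row in warehouse_rows}
    return {
        "missing_in_warehouse": sorted(source_keys - warehouse_keys),
        "unexpected_in_warehouse": sorted(warehouse_keys - source_keys),
        # Warehouse total minus source total; zero when the totals agree.
        "amount_difference": sum(row[amount] for row in warehouse_rows)
        - sum(row[amount] for row in source_rows),
    }


source = [{"id": 1, "cents": 100}, {"id": 2, "cents": 200}, {"id": 3, "cents": 50}]
warehouse = [{"id": 1, "cents": 100}, {"id": 2, "cents": 200}]
report = reconcile(source, warehouse, "id", "cents")
```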
  24. Data Warehouse Pitfalls
     • Many warehouse end users will be trained but will never, or only seldom, apply their training
     • Large-scale data warehousing can become an exercise in data homogenizing
     • Loading information only because it is available
     • Providing no maintenance to the data warehouse
  25. Contact Us
     • Business Name: Skyline Business School
     • Address: Hauz Khas Enclave, New Delhi - 110 016, India
     • Phone: 91-11-26864848, 91-11-26866968
     • E-mail: info@skylinecollege.com
     • Resource: www.skylinecollege.com/our-programmes/pgp-data-warehousing
