Dwdm 2(data warehouse)


  2. Shanu Sharma, CSE-ASET: TOPICS COVERED
     - Definition of data warehouse
     - Characteristics of data warehouse
     - Data mart
     - Components of data warehouse
     - Metadata
     - Applications of data warehouse
     - OLTP vs. data warehouse
  3. CONCEPT OF DATA WAREHOUSE
     Take all the data you already have in the organization, clean and transform it, and then provide useful strategic information.
  4. DEFINITION OF DATA WAREHOUSE
     - Bill Inmon, considered the father of data warehousing, stated (1996): "A DW is a subject-oriented, integrated, non-volatile, time-variant collection of data in favor of decision-making."
     - Sean Kelly said that data in the data warehouse is "separate, available, integrated, time-stamped, subject-oriented, non-volatile, accessible."
  5. CHARACTERISTICS OF DATA WAREHOUSE
     - Subject oriented
     - Integrated
     - Time variant
     - Non-volatile
  6. 1. SUBJECT ORIENTED DATA
     - In operational systems, data is stored by individual application or business process, for example data about an individual order or customer.
     - In the banking industry, for example, the data sets for savings or checking accounts contain data about that particular application only.
     - In a DW, data is stored by real-world business subjects or events, not by application.
  7. In a DW, the subject is the organizing method.
     Subjects vary with the enterprise.
  8. 2. INTEGRATED DATA
     - Data in a DW comes from several operational systems.
     - Different data sets have different file formats.
     - Example: data for the subject Account comes from 3 different data sources, so there can be variations such as:
       - Naming conventions may differ.
       - Attributes of data items may differ. For example, a savings account number could be 8 bytes long but a checking account number only 6 bytes.
  9. - Before moving the data into the data warehouse, you have to go through a process of transformation, consolidation, and integration of the source data.
     - Some of the items that need standardization:
       - Naming conventions
       - Codes
       - Data attributes
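
The standardization items above can be sketched in a few lines of Python. This is only an illustration: the field names, the sample records, and the 8-character warehouse width are assumptions made for the example, not something the slides specify.

```python
# Hypothetical records from two operational systems with different
# naming conventions and account-number lengths (illustrative only).
savings_rows = [{"acct_no": "00012345", "cust_nm": "A. Rao"}]
checking_rows = [{"AccountNumber": "678901", "CustomerName": "B. Singh"}]

# Map each source's field names onto one warehouse naming convention.
FIELD_MAP = {
    "savings": {"acct_no": "account_id", "cust_nm": "customer_name"},
    "checking": {"AccountNumber": "account_id", "CustomerName": "customer_name"},
}

ACCOUNT_ID_WIDTH = 8  # assumed common width for the warehouse


def standardize(row, source):
    """Rename fields and left-pad the account number to a common width."""
    out = {FIELD_MAP[source][key]: value for key, value in row.items()}
    out["account_id"] = out["account_id"].zfill(ACCOUNT_ID_WIDTH)
    out["account_type"] = source  # one standardized code per source system
    return out


integrated = [standardize(r, "savings") for r in savings_rows]
integrated += [standardize(r, "checking") for r in checking_rows]
```

After this step both sources share the same field names and an 8-character account identifier, so they can be loaded under the single subject Account.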
  11. TIME VARIANT DATA
     - In operational systems, the stored data contains current values. In a savings account system, for example, the balance is the customer's current balance.
     - Data in the DW, by contrast, is meant for analysis and decision making.
     - Comparative analysis is one of the best techniques for evaluating business performance.
     - Time is a critical factor in comparative analysis.
     - Every data structure in the DW contains a time element.
  12. - So the DW has to contain historical data as well as current values.
     - Data is stored as snapshots over past and current periods.
     - The time-variant nature of the data in a data warehouse:
       - Allows for analysis of the past
       - Relates information to the present
       - Enables forecasts for the future
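
A minimal sketch of snapshot storage and comparative analysis, assuming hypothetical month-end balance snapshots for a single account (the record layout and helper names are invented for illustration):

```python
from datetime import date

# Hypothetical month-end balance snapshots for one account.
snapshots = [
    {"account_id": "00012345", "as_of": date(2023, 1, 31), "balance": 1000.0},
    {"account_id": "00012345", "as_of": date(2023, 2, 28), "balance": 1250.0},
    {"account_id": "00012345", "as_of": date(2023, 3, 31), "balance": 1100.0},
]


def balance_as_of(rows, account_id, when):
    """Return the balance from the latest snapshot on or before `when`."""
    candidates = [r for r in rows
                  if r["account_id"] == account_id and r["as_of"] <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["as_of"])["balance"]


def period_changes(rows):
    """Compare each snapshot with the previous one (comparative analysis)."""
    ordered = sorted(rows, key=lambda r: r["as_of"])
    return [(cur["as_of"], cur["balance"] - prev["balance"])
            for prev, cur in zip(ordered, ordered[1:])]
```

An operational system would keep only the latest balance; because every snapshot carries a time element, the warehouse can still answer questions about any past period.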
  13. NON VOLATILE DATA
     - Data from operational systems is moved into the DW at specific intervals.
     - Not every business transaction updates the DW.
     - Data is not deleted from the DW.
     - Data is not changed by individual transactions either.
  14. - Subject oriented: organized along the lines of the subjects of the corporation. Typical subjects are customer, product, vendor and transaction.
     - Time-variant: every record in the data warehouse has some form of time variancy attached to it.
     - Non-volatile: refers to the inability of data to be updated. Every record in the data warehouse is time-stamped in one form or another.
  15. DATA GRANULARITY
     Data granularity refers to the level of detail of the data in the data warehouse. The more detailed the stored data (that is, the lower the level of summarization), the finer the granularity.
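
To make the idea concrete, here is a small sketch that rolls the same facts up to two coarser grains. The transaction rows and the `roll_up` helper are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical sales facts at the finest grain: one row per transaction.
transactions = [
    {"day": "2023-01-05", "product": "P1", "amount": 20},
    {"day": "2023-01-05", "product": "P1", "amount": 30},
    {"day": "2023-01-20", "product": "P1", "amount": 50},
    {"day": "2023-02-02", "product": "P1", "amount": 10},
]


def roll_up(rows, key_func):
    """Sum amounts per key; a coarser key gives a coarser granularity."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_func(row)] += row["amount"]
    return dict(totals)


daily = roll_up(transactions, lambda r: r["day"])        # finer grain
monthly = roll_up(transactions, lambda r: r["day"][:7])  # coarser grain
```

The monthly totals can always be derived from the transaction rows, but not the other way round, which is why the grain chosen for storage limits the questions the warehouse can answer.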
  16. DATA WAREHOUSES AND DATA MARTS
     - In 1998 Bill Inmon stated, "The single most important issue facing the IT manager this year is whether to build the data warehouse first or the data mart first."
     - How are they different?
  18. An organization can take essentially two approaches to managing data for analysis.
     1. Top-Down Approach
     The centralized data warehouse feeds dependent data marts, which may be designed on a dimensional data model. In this approach, data in the data warehouse is stored at the lowest level of granularity, based on a normalized data model.
  19. Advantages:
     - An enterprise view of data
     - Not a union of disparate data marts
     - Centralized rules and control
     Disadvantages:
     - Slow approach
     - High exposure to risk of failure
  20. 2. Bottom-Up Approach
     In this approach, data marts are created first to provide analytical capability for specific business subjects, based on a dimensional data model. These data marts are then joined or unioned by conforming their dimensions to create a DW.
     Advantages:
     - Faster and easier implementation
     - Less risk of failure
     - Allows the project team to learn and grow
     Disadvantages:
     - Redundant data in every data mart
     - Inconsistent data
  22. 1. SOURCE DATA COMPONENT
     - Production data: comes from the various operational systems of the enterprise.
     - Internal data: private documents, customer profiles, departmental databases, etc.
     - External data: statistics produced by external agencies, used for comparing performance against other organizations.
     - Archived data: in every operational system, old data is periodically stored in archive files or on disk storage. This data is also required, because the data warehouse keeps historical snapshots of data.
  23. 2. DATA STAGING COMPONENT
     - After data is extracted from the sources, it has to be changed, converted and made ready in a suitable format.
     - Three major functions make the data ready:
       - Extract
       - Transform
       - Load
     - The staging area provides a place and a set of functions to:
       - Clean
       - Change
       - Combine
       - Convert
  24. - Different techniques are used for extracting data from different data sources.
     - Data transformation includes:
       - Data cleaning: correcting misspellings, resolving conflicts, providing default values for missing data elements, removing duplicates, etc.
       - Standardization of data: standardizing data types and field lengths.
       - Semantic standardization, such as resolving synonyms and homonyms.
       - Sorting, merging, etc.
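
The transformation steps listed above might look roughly like this in a staging script. The record layout, the synonym table, and the default value are assumptions chosen for the sketch.

```python
# Assumed synonym table: several spellings resolve to one standard code.
GENDER_SYNONYMS = {"m": "M", "male": "M", "f": "F", "female": "F"}


def clean(records, default_city="UNKNOWN"):
    """Standardize names, resolve synonyms, fill defaults, drop duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        gender_raw = (rec.get("gender") or "").strip().lower()
        row = {
            "customer_id": rec["customer_id"],
            # Collapse repeated whitespace and standardize capitalization.
            "name": " ".join(rec["name"].split()).title(),
            # Semantic standardization: "U" marks an unknown value.
            "gender": GENDER_SYNONYMS.get(gender_raw, "U"),
            # Default value for a missing data element.
            "city": rec.get("city") or default_city,
        }
        key = tuple(sorted(row.items()))
        if key not in seen:  # remove duplicates after standardization
            seen.add(key)
            cleaned.append(row)
    # Sorting is the last step before loading.
    return sorted(cleaned, key=lambda r: r["customer_id"])


raw = [
    {"customer_id": 2, "name": "ravi kumar", "gender": "Male", "city": None},
    {"customer_id": 1, "name": "  asha   rao ", "gender": "female", "city": "Delhi"},
    {"customer_id": 1, "name": "Asha Rao", "gender": "F", "city": "Delhi"},
]
staged = clean(raw)
```

Duplicates are detected only after standardization, because the two records for customer 1 differ in their raw form and become identical once cleaned.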
  25. Data loading: movement of data into the data warehouse
  26. 3. DATA STORAGE COMPONENT
     - Separate repository
     - Data structured for efficient processing
     - Updated at specific intervals
     - Read-only
  27. 4. INFORMATION DELIVERY COMPONENT
     - Includes various methods of delivering information, depending on the type of user. For example:
       - Ad hoc or predefined reports for novice and casual users
       - Statistical analysis for business analysts
     - Also provides information to data mining applications.
  29. METADATA COMPONENT
     - The metadata component is the data about the data in the data warehouse.
     - Metadata in a data warehouse contains the answers to questions about the data in the data warehouse.
     - It serves as a directory of the contents of the data warehouse.
  30. TYPES OF METADATA
     - Operational metadata: information about the operational data sources, such as field lengths and data types.
     - Extraction and transformation metadata: extraction frequencies, extraction methods, etc.
     - End-user metadata
  32. APPLICATION AREAS
     Industry                  Application
     Finance                   Credit card analysis
     Insurance                 Claims, fraud analysis
     Telecommunication         Call record analysis
     Transport                 Logistics management
     Consumer goods            Promotion analysis
     Data service providers    Value-added data
     Utilities                 Power usage analysis
  33. TYPICAL APPLICATIONS
     The impact on an organization's core business is to streamline operations and maximize profitability.
     - Fraud detection
     - Profitability analysis
     - Direct mail/database marketing
     - Credit risk prediction
     - Yield management
     - Inventory management
  34. TYPICAL APPLICATIONS: Fraud detection
     - By observing data usage patterns.
     - People have typical purchase patterns.
     - Deviation from patterns.
     - Certain cities notorious for fraud.
     - Certain items bought by stolen cards.
     - Similar behavior for stolen phone cards.
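
One simple way to express "deviation from patterns" is a z-score check against a customer's purchase history. The slides do not name a technique, so the z-score rule, the threshold of 3, and the sample amounts below are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def is_unusual(history, new_amount, threshold=3.0):
    """Flag a purchase that lies far from the customer's typical amounts.

    Uses a plain z-score; this is a stand-in technique, not one prescribed
    by the slides.
    """
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold


# Hypothetical past purchase amounts for one card holder.
past_purchases = [20, 25, 22, 30, 28]
```

With this history, `is_unusual(past_purchases, 500)` flags the purchase while `is_unusual(past_purchases, 27)` does not. A real system would combine many more signals, such as the location and item patterns mentioned on the slide.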
  35. TYPICAL APPLICATIONS: Profitability analysis
     - Banks know whether they are profitable or not.
     - They don't know which customers are profitable.
     - Typically more than 50% are NOT profitable.
     - They don't know which ones.
     - Balance is not enough; transactional behavior is the key.
     - Restructure products and pricing strategies.
     - Life-time profitability models (next 3-5 years).
  36. TYPICAL APPLICATIONS: Direct mail marketing
     - Targeted marketing.
     - Offering a high-bandwidth package NOT to all users.
     - Know from call detail records of web surfing.
     - Saves marketing expense, saving pennies.
     - Knowing your customers better.
  37. TYPICAL APPLICATIONS: Credit risk prediction
     - Who should get a loan?
     - Quantitative decision making, NOT subjective.
     - Different interest rates for different customers.
     - Do not subsidize bad customers at the expense of good ones.
  38. TYPICAL APPLICATIONS: Yield management
     - Works for fixed-inventory businesses.
     - Item prices vary for varying customers.
     - Examples: airlines, hotels, etc.
     - The price of (say) an air ticket depends on:
       - How far in advance was the ticket bought?
       - How many seats were vacant?
       - How profitable is the customer?
       - Is the ticket one-way or return?
  39. RECENT APPLICATION: Agriculture systems
     - Agricultural and related data collected for decades.
     - Decision making based on expert judgment.
     - Lack of integration results in underutilization.
     - What is required, in what amount, and when?
  40. DATA WAREHOUSE VS. OLTP
     OLTP (On-Line Transaction Processing):
         SELECT tx_date, balance
         FROM tx_table
         WHERE account_ID = 23876;
  41. DATA WAREHOUSE VS. OLTP
     DWH:
         SELECT balance, age, sal, gender
         FROM customer_table, tx_table
         WHERE age BETWEEN 30 AND 40
           AND Education = 'graduate'
           AND customer_table.CustID = tx_table.Customer_ID;
  42. DATA WAREHOUSE VS. OLTP
     OLTP                                  DWH
     Primary key used                      Primary key NOT used
     No concept of primary index           Primary index used
     Few rows returned                     Many rows returned
     May use a single table                Uses multiple tables
     High selectivity of query             Low selectivity of query
     Indexing on primary key (unique)      Indexing on primary index (non-unique)
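
The two query styles can be tried side by side with Python's built-in `sqlite3` module. The table and column names follow the slides; the column types, the extra `tx_id` column, the index, and the sample rows are assumptions added so the sketch runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_table (
        CustID INTEGER PRIMARY KEY, age INTEGER, sal REAL,
        gender TEXT, Education TEXT);
    CREATE TABLE tx_table (
        tx_id INTEGER PRIMARY KEY, account_ID INTEGER,
        Customer_ID INTEGER, tx_date TEXT, balance REAL);
    CREATE INDEX idx_tx_account ON tx_table (account_ID);

    INSERT INTO customer_table VALUES
        (1, 35, 50000, 'F', 'graduate'),
        (2, 45, 70000, 'M', 'graduate'),
        (3, 32, 40000, 'M', 'undergraduate');
    INSERT INTO tx_table VALUES
        (1, 23876, 1, '2023-01-05', 1200.0),
        (2, 23876, 1, '2023-02-05', 900.0),
        (3, 11111, 2, '2023-01-10', 300.0),
        (4, 22222, 3, '2023-01-11', 150.0);
""")

# OLTP style: highly selective lookup of one account in a single table.
oltp_rows = conn.execute(
    "SELECT tx_date, balance FROM tx_table "
    "WHERE account_ID = ? ORDER BY tx_date",
    (23876,),
).fetchall()

# DWH style: low-selectivity filter on attributes, joining two tables.
dwh_rows = conn.execute(
    "SELECT balance, age, sal, gender "
    "FROM customer_table, tx_table "
    "WHERE age BETWEEN 30 AND 40 "
    "  AND Education = 'graduate' "
    "  AND customer_table.CustID = tx_table.Customer_ID "
    "ORDER BY tx_table.tx_id"
).fetchall()
```

On a realistic data set the first query touches a handful of rows through the index, while the second scans and joins a large share of both tables, which is the contrast the comparison table summarizes.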
  43. COMPARISON OF RESPONSE TIMES
     - On-line analytical processing (OLAP) queries must be executed in a small number of seconds.
       - This often requires denormalization and/or sampling.
     - Complex query scripts and large list selections can generally be executed in a small number of minutes.
     - Sophisticated clustering algorithms (e.g., data mining) can generally be executed in a small number of hours (even for hundreds of thousands of customers).
  44. DATA WAREHOUSE FOR DECISION SUPPORT & OLAP
     - Putting information technology to work to help the knowledge worker make faster and better decisions:
       - Which of my customers are most likely to go to the competition?
       - Which product promotions have the biggest impact on revenue?
       - How did the share price of software companies correlate with profits over the last 10 years?