IT Ready - DW: 1st Day

  1. Data Warehousing (DAY 1) Siwawong W. Project Manager 2010.05.24
  2. Agenda
     09:00 – 09:15 Registration
     09:15 – 09:30 Self-Introduction
     09:30 – 10:30 Data Warehouse: Introduction
     10:30 – 10:45 Break & Morning Refreshment
     10:45 – 12:00 Data Warehouse: Introduction (cont'd)
     12:00 – 13:00 Lunch Break
     13:00 – 15:00 Review of RDBMS & SQL Commands
     15:00 – 15:15 Break
     15:15 – 16:00 Case Study and Q&A
  3. SELF-INTRODUCTION
  4. My Company: Blue Ball
     Blue Ball Group is an offshoring company that focuses entirely on customer satisfaction. It combines Western management with Asian human resources to provide high-quality services.
     Thailand (Head Office)
     Mexico (Special Developments)
     Vietnam (Offshoring Center)
  5. Services from My Company
     Offshoring Programmers & Testers: Blue Ball will get you ready to offshore successfully. We will not rush you into offshoring before you feel confident about how to send, organize, receive, test, and accept work.
     System Development & Embedded Solutions: Solutions that combine technological expertise and deep business understanding. We only start coding once every single detail, such as milestones, scheduling, contact point, communication, issue management, and critical protocols, is in place.
     Web Design and E-commerce: Premium web design, CMS, e-commerce solutions, and SEO services. Website maintenance and copy content creation to develop marketing campaigns that SELL, for discerning companies that want to increase the quality and reach of their marketing campaigns.
  6. My Clients
  7. Data Warehouse: Introduction
  8. Data Warehouse: What & Why? Problem Statements
  9. A producer wants to know...
     Which are our lowest/highest margin customers?
     Who are my customers and what products are they buying?
     Which customers are most likely to go to the competition?
     What impact will new products/services have on revenue and margins?
     What product promotions have the biggest impact on revenue?
     What is the most effective distribution channel?
  10. What is Data Warehousing?
      "A process of transforming data into information and making it available to users in a timely enough manner to make a difference" [Forrester Research, April 1996]
      Data → Information
  11. Very Large Data Bases
      Terabytes  -- 10^12 bytes
      Petabytes  -- 10^15 bytes
      Exabytes   -- 10^18 bytes
      Zettabytes -- 10^21 bytes
      Yottabytes -- 10^24 bytes
      Examples listed on the slide: Walmart (24 terabytes), intelligence agency videos, geographic information systems, national medical records, weather images.
  12. Data Warehouse: Subject-Oriented
      The warehouse (WH) is organized around the major subjects of the enterprise rather than the major application areas. This is reflected in the need to store decision-support data rather than application-oriented data.
      Diagram: the data warehouse is subject-oriented (Sales); the operational database is application-oriented (Order Processing).
  13. Data Warehouse: Integrated
      The warehouse is integrated because its source data comes together from different enterprise-wide application systems. The source data is often inconsistent, so the integrated data must be made consistent to present a unified view of the data to the users.
  14. Data Warehouse: Time-Varying
      The data in the warehouse (WH) is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
      Historical data is recorded.
  15. Data Warehouse: Non-Volatile
      Data is NOT updated in real time but is refreshed from the operational systems (OS) on a regular basis. New data is always added as a supplement to the database rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.
      Anyone who is using the database has confidence that a query will always produce the same result no matter how often it is run.
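The non-volatile property can be sketched as an append-only load. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for a warehouse engine; the table `stock_snapshot`, its columns, and the sample figures are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: every periodic refresh appends a dated snapshot.
conn.execute("CREATE TABLE stock_snapshot (load_date TEXT, product TEXT, qty INTEGER)")


def refresh(load_date, rows):
    """Append one batch from the operational system; existing rows are never updated."""
    conn.executemany(
        "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
        [(load_date, product, qty) for product, qty in rows],
    )


def stock_as_of(load_date):
    """Read one snapshot; later loads cannot change what it returns."""
    cur = conn.execute(
        "SELECT product, qty FROM stock_snapshot WHERE load_date = ? ORDER BY product",
        (load_date,),
    )
    return cur.fetchall()


refresh("2010-05-01", [("widget", 40), ("gadget", 15)])
before = stock_as_of("2010-05-01")

refresh("2010-06-01", [("widget", 25), ("gadget", 30)])  # supplements, never replaces
after = stock_as_of("2010-05-01")

print(before == after)  # the May query gives the same answer after the June load
```

Because each refresh only inserts rows tagged with its own load date, a query against an earlier date keeps returning the same rows however many refreshes follow.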
  16. Explorers, Farmers and Tourists
      Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data.
      Farmers: Harvest information from known access paths.
      Tourists: Browse information harvested by farmers.
  17. Data Warehouse Architecture
      Diagram components:
      Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems
      Loading: Extraction, Cleansing, Optimized Loader
      Core: Data Warehouse Engine, Metadata Repository
      Access: Analyze, Query
  18. OLAP & Data Mining
  19. Data Mining Works with Warehouse Data
      Data Warehousing provides the enterprise with a memory.
      Data Mining provides the enterprise with intelligence.
  20. Application Areas
      Industry                 Application
      Finance                  Credit card analysis
      Insurance                Claims, fraud analysis
      Telecommunication        Call record analysis
      Transport                Logistics management
      Consumer goods           Promotion analysis
      Data service providers   Value-added data
      Utilities                Power usage analysis
  21. Examples of Operational Data
      Data               | Industry  | Usage                        | Technology                                              | Volumes
      Customer File      | All       | Track customer detail        | Legacy applications, flat files, mainframes             | Small-medium
      Account Balance    | Finance   | Control account activities   | Legacy applications, hierarchical databases, mainframe  | Large
      Point-of-Sale data | Retail    | Generate bills, manage stock | ERP, client/server, relational databases                | Very large
      Call Record        | Telecomm. | Billing                      | Legacy application, hierarchical database, mainframe    | Very large
      Production Record  | Mfg.      | Control production           | ERP, RDBMS, AS/400                                      | Medium
  22. Related to OLTP
  23. Application-Orientation vs. Subject-Orientation
      Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings
      Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity
  24. To summarize ...
      OLTP systems are used to “run” a business.
      The data warehouse helps to “optimize” the business.
  25. Review: RDBMS & SQL Statements
  26. Referential Integrity
      Referential integrity constraints define the rules for associating rows with each other, i.e. for columns that reference columns in other tables.
      Every non-null value in a foreign key must have a corresponding value in the primary key that it references.
      A row can be inserted, or a column updated, in the dependent table only if (1) there is a corresponding primary key value in the parent table, or (2) the foreign key value is set to null.
      Diagram: Department (parent table) with Dept-No values D1, D3, D2, D7; Employee (dependent table) with columns Emp-No and Dept-No, and Dept-No values D7, ?, D1, D3, ?, D7. Remaining diagram labels: D2, INSERT ROW, UPDATE COLUMN.
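The two rules on the referential-integrity slide can be tried out with a small sketch. It assumes Python's built-in sqlite3 module, which enforces foreign keys only after `PRAGMA foreign_keys = ON`; the column names `DeptNo` and `EmpNo`, the helper `try_sql`, and the non-existent department `D9` are illustrative choices, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Department (DeptNo TEXT PRIMARY KEY)")
conn.execute(
    """CREATE TABLE Employee (
           EmpNo  INTEGER PRIMARY KEY,
           DeptNo TEXT REFERENCES Department(DeptNo)
       )"""
)
conn.executemany("INSERT INTO Department VALUES (?)", [("D1",), ("D2",), ("D3",), ("D7",)])


def try_sql(sql, params=()):
    """Run one statement; return False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# INSERT ROW: allowed when the parent key exists or the foreign key is null.
insert_known = try_sql("INSERT INTO Employee VALUES (?, ?)", (1, "D7"))
insert_null = try_sql("INSERT INTO Employee VALUES (?, ?)", (2, None))
insert_unknown = try_sql("INSERT INTO Employee VALUES (?, ?)", (3, "D9"))

# UPDATE COLUMN: the same rule applies to the new foreign key value.
update_known = try_sql("UPDATE Employee SET DeptNo = ? WHERE EmpNo = ?", ("D2", 1))
update_unknown = try_sql("UPDATE Employee SET DeptNo = ? WHERE EmpNo = ?", ("D9", 1))

print(insert_known, insert_null, insert_unknown, update_known, update_unknown)
```

Only the statements that point at the missing department `D9` are rejected; the null foreign key is accepted, matching case (2) on the slide.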
  27. Project Selected Columns
      The "Persons" table:
        P_Id  LastName   FirstName  Address       City
        1     Hansen     Ola        Timoteivn 10  Sandnes
        2     Svendson   Tove       Borgvn 23     Sandnes
        3     Pettersen  Kari       Storgt 20     Stavanger
      SELECT LastName, FirstName FROM Persons
        LastName   FirstName
        Hansen     Ola
        Svendson   Tove
        Pettersen  Kari
      SELECT P_Id, LastName, FirstName FROM Persons ORDER BY LastName
        P_Id  LastName   FirstName
        1     Hansen     Ola
        4     Nilsen     Tom
        3     Pettersen  Kari
        2     Svendson   Tove
      (The second result includes a row, P_Id 4, that is not in the table shown above.)
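A runnable sketch of the projection slide, assuming Python's built-in sqlite3 module and an in-memory database loaded with the three Persons rows shown on the slide. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons ("
    "P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        (2, "Svendson", "Tove", "Borgvn 23", "Sandnes"),
        (3, "Pettersen", "Kari", "Storgt 20", "Stavanger"),
    ],
)

# Projection: only the named columns come back, one tuple per row.
names = conn.execute("SELECT LastName, FirstName FROM Persons").fetchall()

# Projection plus ORDER BY: the same rows, sorted by LastName.
sorted_names = conn.execute(
    "SELECT P_Id, LastName, FirstName FROM Persons ORDER BY LastName"
).fetchall()

print(names)
print(sorted_names)
```

Without ORDER BY, SQL does not guarantee a row order, so only the second query's ordering should be relied on.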
  28. Restrict Rows
      The "Persons" table:
        P_Id  LastName   FirstName  Address       City
        1     Hansen     Ola        Timoteivn 10  Sandnes
        2     Svendson   Tove       Borgvn 23     Sandnes
        3     Pettersen  Kari       Storgt 20     Stavanger
      SELECT * FROM Persons WHERE City='Sandnes'
        P_Id  LastName   FirstName  Address       City
        1     Hansen     Ola        Timoteivn 10  Sandnes
        2     Svendson   Tove       Borgvn 23     Sandnes
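The row restriction can be sketched with the city passed as a bound parameter instead of a literal. This assumes Python's built-in sqlite3 module; the helper `persons_in`, the added ORDER BY, and the city "Oslo" are illustrative and not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons ("
    "P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Hansen", "Ola", "Timoteivn 10", "Sandnes"),
        (2, "Svendson", "Tove", "Borgvn 23", "Sandnes"),
        (3, "Pettersen", "Kari", "Storgt 20", "Stavanger"),
    ],
)


def persons_in(city):
    """Return every column of the rows whose City matches; other rows are filtered out."""
    cur = conn.execute("SELECT * FROM Persons WHERE City = ? ORDER BY P_Id", (city,))
    return cur.fetchall()


sandnes = persons_in("Sandnes")
nobody = persons_in("Oslo")  # no matching rows gives an empty result, not an error

print(sandnes)
print(nobody)
```

Binding the value with `?` keeps the query text fixed while the condition varies, which is the usual way to run the same restriction for different inputs.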
  29. Equi-Join
      The "Persons" table:
        P_Id  LastName   FirstName  Address       City
        1     Hansen     Ola        Timoteivn 10  Sandnes
        2     Svendson   Tove       Borgvn 23     Sandnes
        3     Pettersen  Kari       Storgt 20     Stavanger
      The "Orders" table:
        O_Id  OrderNo  P_Id
        1     77895    3
        2     44678    3
        3     22456    1
        4     24562    1
        5     34764    15
      SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons, Orders WHERE Persons.P_Id = Orders.P_Id ORDER BY Persons.LastName
        LastName   FirstName  OrderNo
        Hansen     Ola        22456
        Hansen     Ola        24562
        Pettersen  Kari       77895
        Pettersen  Kari       44678
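The join on the slide above uses the comma form with the match condition in WHERE. The sketch below, assuming Python's built-in sqlite3 module and the slide's sample rows, runs that form next to the explicit INNER JOIN ... ON form to show that they return the same rows. OrderNo is added to ORDER BY only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE Orders (O_Id INTEGER PRIMARY KEY, OrderNo INTEGER, P_Id INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Ola"), (2, "Svendson", "Tove"), (3, "Pettersen", "Kari")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15)],
)

# Comma join with the equality condition in WHERE, as written on the slide.
comma_join = conn.execute(
    """SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
       FROM Persons, Orders
       WHERE Persons.P_Id = Orders.P_Id
       ORDER BY Persons.LastName, Orders.OrderNo"""
).fetchall()

# The same equi-join written with explicit INNER JOIN ... ON.
inner_join = conn.execute(
    """SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
       FROM Persons
       INNER JOIN Orders ON Persons.P_Id = Orders.P_Id
       ORDER BY Persons.LastName, Orders.OrderNo"""
).fetchall()

print(comma_join == inner_join)
for row in inner_join:
    print(row)
```

Order 34764 (P_Id 15) and person Svendson (P_Id 2) do not appear in the result, because an equi-join keeps only the rows that have a match on both sides.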
  30. Summarising Data
      The "Orders" table:
        O_Id  OrderDate   OrderPrice  Customer
        1     2008/11/12  1000        Hansen
        2     2008/10/23  1600        Nilsen
        3     2008/09/02  700         Hansen
        4     2008/09/03  300         Hansen
        5     2008/08/30  2000        Jensen
        6     2008/10/04  100         Nilsen
      SELECT COUNT(Customer) AS CustomerNilsen FROM Orders WHERE Customer='Nilsen'
        CustomerNilsen
        2
      SELECT AVG(OrderPrice) AS OrderAverage FROM Orders
        OrderAverage
        950
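The two aggregate queries can be checked against the slide's figures with a short sketch, assuming Python's built-in sqlite3 module and the six Orders rows from the slide. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "O_Id INTEGER PRIMARY KEY, OrderDate TEXT, OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, "2008/11/12", 1000, "Hansen"),
        (2, "2008/10/23", 1600, "Nilsen"),
        (3, "2008/09/02", 700, "Hansen"),
        (4, "2008/09/03", 300, "Hansen"),
        (5, "2008/08/30", 2000, "Jensen"),
        (6, "2008/10/04", 100, "Nilsen"),
    ],
)

# COUNT collapses the matching rows into a single number.
nilsen_orders = conn.execute(
    "SELECT COUNT(Customer) AS CustomerNilsen FROM Orders WHERE Customer = 'Nilsen'"
).fetchone()[0]

# AVG is the sum of the column divided by the number of rows.
order_average = conn.execute("SELECT AVG(OrderPrice) AS OrderAverage FROM Orders").fetchone()[0]
total, count = conn.execute("SELECT SUM(OrderPrice), COUNT(*) FROM Orders").fetchone()

print(nilsen_orders)               # 2
print(order_average)               # 950.0
print(total / count == order_average)
```

An aggregate without GROUP BY always returns exactly one row, which is why `fetchone()` is enough here.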
  31. GROUP BY
      The GROUP BY clause groups the rows produced by the preceding clauses, so that an aggregate is computed once per group.
      The "Orders" table:
        O_Id  OrderDate   OrderPrice  Customer
        1     2008/11/12  1000        Hansen
        2     2008/10/23  1600        Nilsen
        3     2008/09/02  700         Hansen
        4     2008/09/03  300         Hansen
        5     2008/08/30  2000        Jensen
        6     2008/10/04  100         Nilsen
      SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer
        Customer  SUM(OrderPrice)
        Hansen    2000
        Nilsen    1700
        Jensen    2000
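What GROUP BY does can be made concrete by computing the same totals by hand. The sketch below assumes Python's built-in sqlite3 module and the slide's Orders rows; the dictionary `manual` is an illustrative stand-in for the grouping step, not a description of how a database engine implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "O_Id INTEGER PRIMARY KEY, OrderDate TEXT, OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, "2008/11/12", 1000, "Hansen"),
        (2, "2008/10/23", 1600, "Nilsen"),
        (3, "2008/09/02", 700, "Hansen"),
        (4, "2008/09/03", 300, "Hansen"),
        (5, "2008/08/30", 2000, "Jensen"),
        (6, "2008/10/04", 100, "Nilsen"),
    ],
)

# One output row per distinct Customer, with SUM evaluated inside each group.
grouped = dict(
    conn.execute("SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer").fetchall()
)

# The same computation by hand: bucket rows by customer, then add up each bucket.
manual = {}
for customer, price in conn.execute("SELECT Customer, OrderPrice FROM Orders"):
    manual[customer] = manual.get(customer, 0) + price

print(grouped)
print(grouped == manual)
```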
  32. HAVING
      Selects the groups that meet a specified condition. Typically used together with the GROUP BY clause.
      The "Orders" table:
        O_Id  OrderDate   OrderPrice  Customer
        1     2008/11/12  1000        Hansen
        2     2008/10/23  1600        Nilsen
        3     2008/09/02  700         Hansen
        4     2008/09/03  300         Hansen
        5     2008/08/30  2000        Jensen
        6     2008/10/04  100         Nilsen
      SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer HAVING SUM(OrderPrice) < 2000
        Customer  SUM(OrderPrice)
        Nilsen    1700
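HAVING is easy to confuse with WHERE, so the sketch below runs both on the slide's Orders rows: HAVING filters groups after the totals are computed, while WHERE filters individual rows before grouping. It assumes Python's built-in sqlite3 module; the WHERE variant is an added contrast and does not appear on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "O_Id INTEGER PRIMARY KEY, OrderDate TEXT, OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, "2008/11/12", 1000, "Hansen"),
        (2, "2008/10/23", 1600, "Nilsen"),
        (3, "2008/09/02", 700, "Hansen"),
        (4, "2008/09/03", 300, "Hansen"),
        (5, "2008/08/30", 2000, "Jensen"),
        (6, "2008/10/04", 100, "Nilsen"),
    ],
)

# HAVING: compute each customer's total first, then keep totals below 2000.
having_rows = conn.execute(
    """SELECT Customer, SUM(OrderPrice) FROM Orders
       GROUP BY Customer
       HAVING SUM(OrderPrice) < 2000
       ORDER BY Customer"""
).fetchall()

# WHERE: drop single orders of 2000 or more first, then total what is left.
where_rows = conn.execute(
    """SELECT Customer, SUM(OrderPrice) FROM Orders
       WHERE OrderPrice < 2000
       GROUP BY Customer
       ORDER BY Customer"""
).fetchall()

print(having_rows)  # [('Nilsen', 1700)]
print(where_rows)   # [('Hansen', 2000), ('Nilsen', 1700)]
```

Hansen's total of 2000 fails the HAVING test but survives the WHERE variant, because none of Hansen's individual orders reaches 2000.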
  33. Nested Queries
      A subquery is a SELECT statement nested inside the WHERE clause of another SELECT statement. Its result is needed to evaluate the main query.
      The "Orders" table:
        O_Id  OrderDate   OrderPrice  Customer
        1     2008/11/12  1000        Hansen
        2     2008/10/23  1600        Nilsen
        3     2008/09/02  700         Hansen
        4     2008/09/03  300         Hansen
        5     2008/08/30  2000        Jensen
        6     2008/10/04  100         Nilsen
      SELECT Customer FROM Orders WHERE OrderPrice > (SELECT AVG(OrderPrice) FROM Orders)
        Customer
        Hansen
        Nilsen
        Jensen
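One way to read the nested query is as two steps: evaluate the inner SELECT, then feed its value to the outer one. The sketch below, assuming Python's built-in sqlite3 module and the slide's Orders rows, runs the nested form and the two-step form side by side. ORDER BY O_Id is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "O_Id INTEGER PRIMARY KEY, OrderDate TEXT, OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, "2008/11/12", 1000, "Hansen"),
        (2, "2008/10/23", 1600, "Nilsen"),
        (3, "2008/09/02", 700, "Hansen"),
        (4, "2008/09/03", 300, "Hansen"),
        (5, "2008/08/30", 2000, "Jensen"),
        (6, "2008/10/04", 100, "Nilsen"),
    ],
)

# Nested form: the inner SELECT supplies the threshold for the outer WHERE.
nested = [
    row[0]
    for row in conn.execute(
        """SELECT Customer FROM Orders
           WHERE OrderPrice > (SELECT AVG(OrderPrice) FROM Orders)
           ORDER BY O_Id"""
    )
]

# Two-step form: run the inner query on its own, then pass its value as a parameter.
average = conn.execute("SELECT AVG(OrderPrice) FROM Orders").fetchone()[0]
two_step = [
    row[0]
    for row in conn.execute(
        "SELECT Customer FROM Orders WHERE OrderPrice > ? ORDER BY O_Id", (average,)
    )
]

print(average)   # 950.0
print(nested)    # ['Hansen', 'Nilsen', 'Jensen']
print(nested == two_step)
```

The nested form does the same work in a single statement, so the threshold cannot go stale between the two steps.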
  34. Case Study
  35. References/External Links
      (1) Data Warehousing & Data Mining, S. Sudarshan and Krithi Ramamritham, IIT Bombay
      (2) Data Warehousing, Hu Yan, e-mail: [email_address]
      (3) What is a Data Warehouse? http://blog.maia-intelligence.com/2008/04/29/what-is-a-data-warehouse/
      (4) Database Management Systems (DBMS) http://www.bit.lk/teachingmaterial/IT2302/index.htm
      (5) SQL Tutorial http://www.w3schools.com/sql/default.asp
  36. Thank you for your attention! [email_address] www.blueballgroup.com

Editor's notes

  1. A producer wants to know many kinds of data from various sections of the organization.
  2. We have a lot of data, but we cannot use it properly.
  3. Dr. Barry Devlin: IBM consultant who has worked on data warehousing (DBWH) at IBM since 1985.
  4. Data = raw; cannot be used for decisions. Information = summarized/analytical data.
  5. Intelligence Agency e.g. CIA, FBI, NSA
  6. Bill Inmon: father of DBWH.
     Subject-oriented: organized based on use.
     Integrated: inconsistencies are removed.
     Time-varying: data are normally time series.
     Non-volatile: stored in read-only format.
  7. The data is organized around subjects (such as Sales) rather than operational applications (e.g. order processing). Operational databases are organized around business applications; they are application-oriented. Recall the five queries that the directors have identified as examples of the types of questions they would like to ask of their data. We concluded that they are concerned with sales of products over time. The subject area in our case study is clearly “sales.”