My Company: Blue Ball
Blue Ball Group is an offshoring company focused entirely on customer satisfaction. It combines Western management with Asian human resources to provide high-quality services.
Offices: Thailand (Head Office), Mexico (Special Developments), Vietnam (Offshoring Center)
Services from My Company
Offshoring Programmers & Testers: Blue Ball will get you ready to offshore successfully. We do not rush you into offshoring before you feel confident about how to send, organize, receive, test and accept jobs.
System Development & Embedded Solutions: Solutions that combine technological expertise and deep business understanding. We only start coding once every single detail, such as milestones, scheduling, contact point, communication, issue management and critical protocols, is in place.
Web Design and E-commerce: Premium web design, CMS, e-commerce solutions and SEO services. Website maintenance and copy content creation to develop marketing campaigns that SELL, for discerning companies that want to increase the quality and reach of their marketing campaigns.
A producer wants to know...
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?
What product promotions have the biggest impact on revenue?
What is the most effective distribution channel?
What is Data Warehousing?
"A process of transforming data into information and making it available to users in a timely enough manner to make a difference" [Forrester Research, April 1996]
Data -> Information
Very Large Data Bases
Terabytes -- 10^12 bytes
Petabytes -- 10^15 bytes
Exabytes -- 10^18 bytes
Zettabytes -- 10^21 bytes
Yottabytes -- 10^24 bytes
Examples: Walmart (24 Terabytes), Intelligence Agency Videos, Geographic Information Systems, National Medical Records, Weather images
Data Warehouse: Subject-Oriented
The warehouse (WH) is organized around the major subjects of the enterprise rather than the major application areas. This reflects the need to store decision-support data rather than application-oriented data.
Operational DB: application-oriented (e.g. Order Processing)
DBWH: subject-oriented (e.g. Sales)
Data Warehouse: Integrated
The source data come together from different enterprise-wide application systems. The source data are often inconsistent, so the integrated data source must be made consistent to present a unified view of the data to the users.
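As a rough illustration of "making the data consistent", the sketch below maps different encodings of one attribute onto a single convention before loading. The source systems, customer IDs and gender codes are all hypothetical; they are not taken from the slides.

```python
# Hypothetical encodings of the same attribute in different source applications.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}

def unify_gender(raw):
    """Map a source-specific code to one warehouse convention ('M'/'F').

    Returns None for codes the mapping does not know, so they can be
    reviewed instead of being loaded silently.
    """
    return GENDER_CODES.get(str(raw).strip().lower())

# Rows as they might arrive from two hypothetical source systems.
order_system = [("C1", "male"), ("C2", "F")]
billing_system = [("C3", 1), ("C4", "unknown")]

# One unified view: every row now uses the same encoding.
integrated = [(cust, unify_gender(code))
              for cust, code in order_system + billing_system]
```

Returning None for unknown codes is one possible design choice; a real cleansing step might instead reject the row or route it to a review queue.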
Data Warehouse: Time-Varying
The data in the WH is accurate and valid only at some point in time or over some time interval. The time variance of the data warehouse also shows in the extended time for which the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
Historical data is recorded.
Data Warehouse: Non-Volatile
Data is NOT updated in real time but is refreshed from the operational systems (OS) on a regular basis. New data is always added as a supplement to the DB rather than as a replacement. The DB continually absorbs this new data, incrementally integrating it with the previous data.
Anyone using the database can be confident that a query will always produce the same result no matter how often it is run.
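The append-only refresh described here can be sketched with Python's built-in sqlite3 module. The BalanceSnapshot table, its columns and the sample values are assumptions made for illustration only; the point is that a refresh inserts a new dated snapshot and never updates old rows, so a query against an earlier snapshot keeps returning the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: every row carries the date of its snapshot.
conn.execute(
    "CREATE TABLE BalanceSnapshot ("
    " SnapshotDate TEXT, Account TEXT, Balance INTEGER)"
)

def refresh(snapshot_date, balances):
    """Append one periodic extract from the operational system (no UPDATE)."""
    conn.executemany(
        "INSERT INTO BalanceSnapshot VALUES (?, ?, ?)",
        [(snapshot_date, acct, bal) for acct, bal in balances.items()],
    )

def balance_as_of(snapshot_date, account):
    """Return the balance recorded in a given snapshot, or None if absent."""
    row = conn.execute(
        "SELECT Balance FROM BalanceSnapshot"
        " WHERE SnapshotDate = ? AND Account = ?",
        (snapshot_date, account),
    ).fetchone()
    return row[0] if row else None

refresh("2008-09-30", {"A-1": 500})
before = balance_as_of("2008-09-30", "A-1")   # 500
refresh("2008-10-31", {"A-1": 350})           # new snapshot is added, not merged
after = balance_as_of("2008-09-30", "A-1")    # still 500
```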
Explorers, Farmers and Tourists
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
Farmers: Harvest information from known access paths
Tourists: Browse information harvested by farmers
Data Warehouse Architecture
Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems
Extraction and Cleansing -> Optimized Loader -> Data Warehouse Engine (with Metadata Repository) -> Analyze, Query
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
Application Areas
Industry | Application
Finance | Credit card analysis
Insurance | Claims, fraud analysis
Telecommunication | Call record analysis
Transport | Logistics management
Consumer goods | Promotion analysis
Data service providers | Value-added data
Utilities | Power usage analysis
Examples of Operational Data
Data | Industry | Usage | Technology | Volumes
Customer File | All | Track customer detail | Legacy application, flat files, mainframes | Small-medium
Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
Point-of-Sale data | Retail | Generate bills, manage stock | ERP, client/server, relational databases | Very large
Call Record | Telecomm. | Billing | Legacy application, hierarchical database, mainframe | Very large
Production Record | Mfg. | Control production | ERP, RDBMS, AS/400 | Medium
Referential Integrity
Referential integrity constraints define the rules for associating rows with each other, i.e. columns that reference columns in other tables:
Every non-null value in a foreign key must have a corresponding value in the primary key it references.
A row can be inserted, or a column updated, in the dependent table only if (1) there is a corresponding primary key value in the parent table, or (2) the foreign key value is set to null.
[Figure: Department (parent table) with Dept-No values D1, D3, D2, D7; Employee (dependent table) with columns Emp-No and Dept-No, illustrating the INSERT ROW and UPDATE COLUMN checks against the parent keys]
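A minimal sketch of these two rules, using Python's built-in sqlite3 module. The column names DeptNo and EmpNo stand in for the slide's Dept-No and Emp-No (hyphens are awkward in SQL identifiers), and the employee numbers E1 to E3 and department D9 are made up for the example. Note that SQLite enforces foreign keys only after the pragma shown below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Department (DeptNo TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Employee ("
    " EmpNo TEXT PRIMARY KEY,"
    " DeptNo TEXT REFERENCES Department(DeptNo))"  # nullable foreign key
)
conn.executemany("INSERT INTO Department VALUES (?)",
                 [("D1",), ("D2",), ("D3",), ("D7",)])

def try_insert(emp_no, dept_no):
    """Return True if the row is accepted, False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?)", (emp_no, dept_no))
        return True
    except sqlite3.IntegrityError:
        return False

accepted_match = try_insert("E1", "D7")   # rule (1): parent key D7 exists
accepted_null = try_insert("E2", None)    # rule (2): null foreign key is allowed
accepted_orphan = try_insert("E3", "D9")  # rejected: no department D9
```

An UPDATE that sets Employee.DeptNo to a value missing from Department would be rejected in the same way.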
Project Selected Columns
The "Persons" table:
P_Id | LastName | FirstName | Address | City
1 | Hansen | Ola | Timoteivn 10 | Sandnes
2 | Svendson | Tove | Borgvn 23 | Sandnes
3 | Pettersen | Kari | Storgt 20 | Stavanger

SELECT LastName, FirstName FROM Persons
LastName | FirstName
Hansen | Ola
Svendson | Tove
Pettersen | Kari

SELECT P_Id, LastName, FirstName FROM Persons ORDER BY LastName
P_Id | LastName | FirstName
1 | Hansen | Ola
4 | Nilsen | Tom
3 | Pettersen | Kari
2 | Svendson | Tove
(The second result includes a fourth person, Tom Nilsen with P_Id 4, who is not listed in the table above.)
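Both projection queries can be tried out with Python's built-in sqlite3 module. This is only a sketch: it loads the three Persons rows listed on the slide into an in-memory database, and the variable names are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT,"
    " FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?, ?)", [
    (1, "Hansen", "Ola", "Timoteivn 10", "Sandnes"),
    (2, "Svendson", "Tove", "Borgvn 23", "Sandnes"),
    (3, "Pettersen", "Kari", "Storgt 20", "Stavanger"),
])

# Projection: keep only two of the five columns (row order is unspecified).
names = conn.execute("SELECT LastName, FirstName FROM Persons").fetchall()

# Projection plus ORDER BY: rows come back sorted by last name.
sorted_rows = conn.execute(
    "SELECT P_Id, LastName, FirstName FROM Persons ORDER BY LastName"
).fetchall()
# sorted_rows -> [(1, 'Hansen', 'Ola'), (3, 'Pettersen', 'Kari'), (2, 'Svendson', 'Tove')]
```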
Restrict Rows
The "Persons" table:
P_Id | LastName | FirstName | Address | City
1 | Hansen | Ola | Timoteivn 10 | Sandnes
2 | Svendson | Tove | Borgvn 23 | Sandnes
3 | Pettersen | Kari | Storgt 20 | Stavanger

SELECT * FROM Persons WHERE City='Sandnes'
P_Id | LastName | FirstName | Address | City
1 | Hansen | Ola | Timoteivn 10 | Sandnes
2 | Svendson | Tove | Borgvn 23 | Sandnes
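The row restriction can be reproduced in the same way. In this sketch the city is passed as a bound parameter (the "?" placeholder of Python's sqlite3 module) rather than written into the SQL string; that is a choice made for the example, not something the slide prescribes. The helper name persons_in is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT,"
    " FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?, ?)", [
    (1, "Hansen", "Ola", "Timoteivn 10", "Sandnes"),
    (2, "Svendson", "Tove", "Borgvn 23", "Sandnes"),
    (3, "Pettersen", "Kari", "Storgt 20", "Stavanger"),
])

def persons_in(city):
    """Return all columns of the rows whose City matches, ordered by P_Id."""
    return conn.execute(
        "SELECT * FROM Persons WHERE City = ? ORDER BY P_Id", (city,)
    ).fetchall()

sandnes = persons_in("Sandnes")   # the two Sandnes rows; Kari Pettersen is filtered out
```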
Equal Join
The "Persons" table:
P_Id | LastName | FirstName | Address | City
1 | Hansen | Ola | Timoteivn 10 | Sandnes
2 | Svendson | Tove | Borgvn 23 | Sandnes
3 | Pettersen | Kari | Storgt 20 | Stavanger

The "Orders" table:
O_Id | OrderNo | P_Id
1 | 77895 | 3
2 | 44678 | 3
3 | 22456 | 1
4 | 24562 | 1
5 | 34764 | 15

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons, Orders WHERE Persons.P_Id = Orders.P_Id ORDER BY Persons.LastName
LastName | FirstName | OrderNo
Hansen | Ola | 22456
Hansen | Ola | 24562
Pettersen | Kari | 77895
Pettersen | Kari | 44678
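The sketch below runs the slide's join with Python's sqlite3 module and compares it with the explicit INNER JOIN ... ON spelling, which should return the same rows. A secondary sort on OrderNo is added here only to make the output order deterministic; it is not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE Orders (O_Id INTEGER, OrderNo INTEGER, P_Id INTEGER)")
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?)", [
    (1, "Hansen", "Ola"), (2, "Svendson", "Tove"), (3, "Pettersen", "Kari"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15),
])

# Join written as on the slide: join condition in the WHERE clause.
implicit = conn.execute(
    "SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo"
    " FROM Persons, Orders"
    " WHERE Persons.P_Id = Orders.P_Id"
    " ORDER BY Persons.LastName, Orders.OrderNo"
).fetchall()

# Equivalent explicit form: join condition in the ON clause.
explicit = conn.execute(
    "SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo"
    " FROM Persons INNER JOIN Orders ON Persons.P_Id = Orders.P_Id"
    " ORDER BY Persons.LastName, Orders.OrderNo"
).fetchall()
```

Order 34764 (P_Id 15) and Tove Svendson (no orders) have no matching partner, so neither appears in the result.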
Summarising Data
The "Orders" table:
O_Id | OrderDate | OrderPrice | Customer
1 | 2008/11/12 | 1000 | Hansen
2 | 2008/10/23 | 1600 | Nilsen
3 | 2008/09/02 | 700 | Hansen
4 | 2008/09/03 | 300 | Hansen
5 | 2008/08/30 | 2000 | Jensen
6 | 2008/10/04 | 100 | Nilsen

SELECT COUNT(Customer) AS CustomerNilsen FROM Orders WHERE Customer='Nilsen'
CustomerNilsen
2

SELECT AVG(OrderPrice) AS OrderAverage FROM Orders
OrderAverage
950
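The two aggregates can be checked against the table with a short sqlite3 sketch. The average follows from (1000 + 1600 + 700 + 300 + 2000 + 100) / 6 = 5700 / 6 = 950.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (O_Id INTEGER, OrderDate TEXT,"
    " OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (1, "2008/11/12", 1000, "Hansen"),
    (2, "2008/10/23", 1600, "Nilsen"),
    (3, "2008/09/02", 700, "Hansen"),
    (4, "2008/09/03", 300, "Hansen"),
    (5, "2008/08/30", 2000, "Jensen"),
    (6, "2008/10/04", 100, "Nilsen"),
])

# COUNT: number of orders placed by Nilsen.
customer_nilsen = conn.execute(
    "SELECT COUNT(Customer) AS CustomerNilsen FROM Orders WHERE Customer = 'Nilsen'"
).fetchone()[0]   # 2

# AVG: mean order price over all six orders (SQLite returns a float).
order_average = conn.execute(
    "SELECT AVG(OrderPrice) AS OrderAverage FROM Orders"
).fetchone()[0]   # 950.0
```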
GROUP BY
The GROUP BY clause groups the rows produced by the preceding clauses, so that an aggregate is computed once per group.
The "Orders" table:
O_Id | OrderDate | OrderPrice | Customer
1 | 2008/11/12 | 1000 | Hansen
2 | 2008/10/23 | 1600 | Nilsen
3 | 2008/09/02 | 700 | Hansen
4 | 2008/09/03 | 300 | Hansen
5 | 2008/08/30 | 2000 | Jensen
6 | 2008/10/04 | 100 | Nilsen

SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer
Customer | SUM(OrderPrice)
Hansen | 2000
Nilsen | 1700
Jensen | 2000
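A small sqlite3 sketch of the grouped sum. Collecting the result into a dictionary is just a convenience for this example, since the order of groups is not guaranteed without an ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (O_Id INTEGER, OrderDate TEXT,"
    " OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (1, "2008/11/12", 1000, "Hansen"),
    (2, "2008/10/23", 1600, "Nilsen"),
    (3, "2008/09/02", 700, "Hansen"),
    (4, "2008/09/03", 300, "Hansen"),
    (5, "2008/08/30", 2000, "Jensen"),
    (6, "2008/10/04", 100, "Nilsen"),
])

# One output row per customer, each with the sum of that customer's orders.
totals = dict(conn.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer"
).fetchall())
# totals -> {'Hansen': 2000, 'Nilsen': 1700, 'Jensen': 2000} (key order may vary)
```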
HAVING
Used to select the groups that meet specified conditions. Used together with the GROUP BY clause.
The "Orders" table:
O_Id | OrderDate | OrderPrice | Customer
1 | 2008/11/12 | 1000 | Hansen
2 | 2008/10/23 | 1600 | Nilsen
3 | 2008/09/02 | 700 | Hansen
4 | 2008/09/03 | 300 | Hansen
5 | 2008/08/30 | 2000 | Jensen
6 | 2008/10/04 | 100 | Nilsen

SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer HAVING SUM(OrderPrice) < 2000
Customer | SUM(OrderPrice)
Nilsen | 1700
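To make the role of HAVING concrete, this sqlite3 sketch runs the slide's query and, for contrast, a variant that puts a similar-looking condition in WHERE. The second query is not from the slides; it is included only to show that WHERE filters individual rows before grouping, while HAVING filters whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (O_Id INTEGER, OrderDate TEXT,"
    " OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (1, "2008/11/12", 1000, "Hansen"),
    (2, "2008/10/23", 1600, "Nilsen"),
    (3, "2008/09/02", 700, "Hansen"),
    (4, "2008/09/03", 300, "Hansen"),
    (5, "2008/08/30", 2000, "Jensen"),
    (6, "2008/10/04", 100, "Nilsen"),
])

# HAVING: keep only groups whose total is below 2000.
having_rows = conn.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders"
    " GROUP BY Customer HAVING SUM(OrderPrice) < 2000"
).fetchall()   # [('Nilsen', 1700)]

# WHERE: drop single orders of 2000 or more first, then group what is left.
where_rows = dict(conn.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders"
    " WHERE OrderPrice < 2000 GROUP BY Customer"
).fetchall())  # Hansen's total of 2000 survives; Jensen's only order is removed
```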
Nested Queries
A subquery is a SELECT statement nested inside the WHERE clause of another SELECT statement. Its result is needed to solve the main query.
The "Orders" table:
O_Id | OrderDate | OrderPrice | Customer
1 | 2008/11/12 | 1000 | Hansen
2 | 2008/10/23 | 1600 | Nilsen
3 | 2008/09/02 | 700 | Hansen
4 | 2008/09/03 | 300 | Hansen
5 | 2008/08/30 | 2000 | Jensen
6 | 2008/10/04 | 100 | Nilsen

SELECT Customer FROM Orders WHERE OrderPrice > (SELECT AVG(OrderPrice) FROM Orders)
Customer
Hansen
Nilsen
Jensen
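The nested query can be traced in two steps with sqlite3: the inner SELECT yields the average price (950), and the outer SELECT keeps the orders priced above it. The ORDER BY O_Id is added here only to fix the output order for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (O_Id INTEGER, OrderDate TEXT,"
    " OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (1, "2008/11/12", 1000, "Hansen"),
    (2, "2008/10/23", 1600, "Nilsen"),
    (3, "2008/09/02", 700, "Hansen"),
    (4, "2008/09/03", 300, "Hansen"),
    (5, "2008/08/30", 2000, "Jensen"),
    (6, "2008/10/04", 100, "Nilsen"),
])

# Inner query on its own: the threshold the outer query compares against.
average = conn.execute("SELECT AVG(OrderPrice) FROM Orders").fetchone()[0]

# Full nested query: customers with an order priced above the average.
above_average = [row[0] for row in conn.execute(
    "SELECT Customer FROM Orders"
    " WHERE OrderPrice > (SELECT AVG(OrderPrice) FROM Orders)"
    " ORDER BY O_Id"
)]
# above_average -> ['Hansen', 'Nilsen', 'Jensen']
```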
References/External Links
(1) Data Warehousing & Data Mining, S. Sudarshan and Krithi Ramamritham, IIT Bombay
(2) Data Warehousing, Hu Yan, e-mail: [email_address]
(3) What is a Data Warehouse? http://blog.maia-intelligence.com/2008/04/29/what-is-a-data-warehouse/
(4) Database Management Systems (DBMS) http://www.bit.lk/teachingmaterial/IT2302/index.htm
(5) SQL Tutorial http://www.w3schools.com/sql/default.asp
Thank you for your attention! [email_address] www.blueballgroup.com
Editor's notes
A producer wants to know many things from various sections of the organization.
We have a lot of data, but we cannot use it properly.
Dr. Barry Devlin: IBM Consultant, working on DBWH since 1985 with IBM
Data = raw; cannot be used directly for decisions. Information = summarized/analyzed data.
Intelligence Agency e.g. CIA, FBI, NSA
Bill Inmon: father of the DBWH
Subject-oriented: organized based on use
Integrated: inconsistencies removed
Time-varying: data are normally time series
Non-volatile: stored in read-only format
The data is organized around subjects (such as Sales) rather than operational applications (e.g. order processing). Operational databases are organized around business applications; they are application-oriented. Recall the five queries that the directors have identified as examples of the types of questions they would like to ask of their data. We concluded that they are concerned with sales of products over time. The subject area in our case study is clearly "sales."