
Warehouse Planning and Implementation

Data Warehouse Process and Technology: Warehousing Strategy, Warehouse management and Support Processes.

Warehouse Planning and Implementation.

H/w and O.S. for Data Warehousing, C/Server Computing Model & Data Warehousing, Parallel Processors & Cluster Systems, Distributed DBMS implementations.

Warehousing Software, Warehouse Schema Design.

Data Extraction, Cleanup & Transformation Tools, Warehouse Metadata.

  1. Shikha Gautam, Asst. Professor
  2.  Data Warehouse Process and Technology: Warehousing Strategy, Warehouse Management and Support Processes.
      Warehouse Planning and Implementation.
      H/w and O.S. for Data Warehousing, C/Server Computing Model & Data Warehousing, Parallel Processors & Cluster Systems, Distributed DBMS implementations.
      Warehousing Software, Warehouse Schema Design.
      Data Extraction, Cleanup & Transformation Tools, Warehouse Metadata.
  3. “Storage or warehousing provides the place utility as part of logistics for any business and, along with transportation, is a critical component of customer service standards.”
  4.  To support the company’s customer policy.
      To maintain a source of supply without interruptions.
      To support changing market conditions and sudden changes in demand.
      To provide customers with the right mix of products at all times and all locations.
      To ensure the least logistics cost for a desired level of customer service.
  5.  More cost-effective decision making.
      Better enterprise intelligence: increasing the quality and flexibility of enterprise analysis.
      Enhanced customer service.
      Business re-engineering: knowing what information is important provides direction and priority for re-engineering efforts.
      Information system re-engineering.
  6.  Private warehouses: A storage facility mostly owned by big companies or single manufacturing units. Also known as proprietary warehousing.
      Public warehouses: A facility that stores inventory for many different businesses, as opposed to a private warehouse.
      Contract warehouses: A contract warehouse handles the shipping, receiving and storage of goods on a contract basis. This type of warehouse usually requires a client to commit to services for a particular period of time.
  7.  An integrated warehouse strategy focuses on two questions:
     1. How many warehouses should be employed?
     2. Which warehouse types should be used to meet market requirements?
      Many firms utilize a combination of private, public, and contract facilities.
  8.  It involves the following activities:
     1. Establish sponsorship.
     2. Identify enterprise needs.
     3. Determine the measurement cycle.
     4. Validate measures.
     5. Design the data warehouse architecture.
     6. Apply appropriate technologies.
     7. Implement the data warehouse.
  9. 1. Establish sponsorship: Establishing the right sponsorship chain will ensure successful development and implementation. The sponsorship chain should include a data warehousing manager and two key individuals.
     2. Identify enterprise needs: Interviews with key enterprise managers and analysis of other pertinent documentation are techniques used to determine enterprise needs.
  10. 3. Determine the measurement cycle: Describes the cycles or time periods used for the measures. Are quarters, months or hours appropriate to capture useful measurement data? Is historical data needed?
      4. Validate measures: After determining and identifying enterprise needs, it is necessary to “reality-check” the measures. The feedback will be used to refine them.
  11. 5. Design the data warehouse architecture: This activity involves active user participation in facilitated design sessions.
      6. Apply appropriate technologies: The enterprise selects technology and addresses key technology issues, security policies, etc.
      7. Implement the data warehouse: Loading preliminary data, designing the user interface, developing standard queries and reports, etc.
  12. There are four major processes that build a data warehouse:
      1. Extract and load data: Data extraction takes data from the source systems; data load takes the extracted data and loads it into the data warehouse. It involves:
      Controlling the process: Determining when to start data extraction. It ensures that the tools, the logic modules, and the programs are executed in the correct sequence and at the correct time.
  13.  When to initiate the extract: The data warehouse should represent a single, consistent version of the information to the user, so the data needs to be in a consistent state before extraction.
      Loading the data: Data is loaded into a temporary data store, where it is cleaned up and made consistent.
      2. Cleaning and transforming the data: Clean and transform the loaded data into a structure, partition the data, and perform aggregation.
  14. 3. Backup and archive the data: In order to recover the data in the event of data loss, software failure, or hardware failure, it is necessary to keep regular backups.
      4. Managing queries and directing them to the appropriate data sources: This process manages queries, helps speed up query execution time, and directs queries to their most effective data sources. It ensures that all system sources are used in the most effective way and monitors actual query profiles.
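The query-directing process above can be sketched as a simple router. The following is a minimal, hypothetical example (the table names and grain hierarchy are invented for illustration): coarse-grained queries are sent to a pre-aggregated summary table, while detail queries go to the base fact table.

```python
# Hypothetical query router: pick the cheapest source that can still
# answer a query at the requested time grain.
SOURCES = {
    "monthly_sales_agg": "month",   # pre-aggregated summary table
    "sales_fact": "day",            # detailed base fact table
}

GRAIN_ORDER = {"day": 0, "month": 1, "quarter": 2, "year": 3}

def route_query(requested_grain):
    """Return the most aggregated source usable at this grain."""
    usable = [s for s, g in SOURCES.items()
              if GRAIN_ORDER[g] <= GRAIN_ORDER[requested_grain]]
    # Prefer the most aggregated usable source: fewer rows to scan.
    return max(usable, key=lambda s: GRAIN_ORDER[SOURCES[s]])

print(route_query("month"))  # monthly_sales_agg
print(route_query("day"))    # sales_fact
```

A real warehouse query manager would also consult query profiles and source load, but the principle of matching query grain to the cheapest sufficient source is the same.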
  15.  A warehouse management system (WMS) is a software application designed to support and optimize warehouse or distribution center management.
      A WMS facilitates management in the daily planning, organizing, staffing, directing, and controlling of available resources to move and store materials into, within, and out of a warehouse, while supporting staff in the performance of material movement and storage in and around the warehouse.
  16. 1. Load management: Relates to the collection of information from internal or external sources. The loading process includes summarizing, manipulating and changing the data structures into a format that lends itself to analytical processing.
      2. Warehouse management: The management tasks include ensuring the warehouse’s availability, the effective backup of its contents, and its security.
  17. 3. Query management: Relates to the provision of access to the contents of the warehouse, and may include the partitioning of information into different areas with different privileges for different users. Access may be provided through custom-built applications or ad hoc query tools.
  18.  Includes loading preliminary data, implementing transformation programs, designing the user interface, developing standard queries and reports, and training warehouse users.
  19. (Flow: ETL → design user interface → develop standard queries → train users)
  20. The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for:
      Extraction: retrieving all the required data from the source system with as few resources as possible,
      Transformation, and
      Loading.
  21.  Ways to perform the extract:
      Update notification – If the source system can provide a notification that a record has been changed, this is the easiest way to get the data.
      Incremental extract – The source system is able to identify which records have been modified and provide an extract of just those records. With a daily incremental extract, deleted records may not be handled.
      Full extract – The full extract requires keeping a copy of the last extract in the same format in order to identify changes. It handles deletions as well.
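The full-extract strategy above can be sketched by diffing the current snapshot against the saved copy of the last extract. A minimal illustration, with in-memory dictionaries standing in for the extract files (all names and rows are hypothetical):

```python
# Hypothetical full-extract diff: compare two snapshots keyed by
# record id to find inserts, updates, and deletions (the deletions
# are what a daily incremental extract would miss).
def diff_extract(previous, current):
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted  = {k: v for k, v in previous.items() if k not in current}
    updated  = {k: v for k, v in current.items()
                if k in previous and previous[k] != v}
    return inserted, updated, deleted

prev = {1: "widget", 2: "gadget", 3: "gizmo"}       # last extract
curr = {1: "widget", 2: "gadget v2", 4: "doohickey"}  # current extract
ins, upd, dele = diff_extract(prev, curr)
print(ins)   # {4: 'doohickey'}
print(upd)   # {2: 'gadget v2'}
print(dele)  # {3: 'gizmo'}
```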
  22. 2. Clean: Ensures the quality of the data in the data warehouse.
      3. Transform: Applies a set of rules to transform the data from the source to the target, converting any measured data to the same dimensions and units so records can later be joined. It may also require joining data from several sources, generating aggregates, generating surrogate keys, sorting, and deriving new calculated values.
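Two of the transform steps named above, converting measures to common units and generating surrogate keys, can be sketched as follows (a minimal illustration; the units, SKUs and helper names are invented):

```python
# Hypothetical transform helpers: surrogate-key generation and unit
# normalization, applied row by row to extracted data.
from itertools import count

_surrogate = count(1)   # monotonically increasing warehouse keys
_key_cache = {}         # natural (source) key -> surrogate key

def surrogate_key(natural_key):
    """Assign each natural key a stable surrogate warehouse key."""
    if natural_key not in _key_cache:
        _key_cache[natural_key] = next(_surrogate)
    return _key_cache[natural_key]

def to_kg(value, unit):
    """Normalize weights from several source units to kilograms."""
    factors = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}
    return value * factors[unit]

rows = [("SKU-9", 500, "g"), ("SKU-7", 2, "kg"), ("SKU-9", 1, "lb")]
transformed = [(surrogate_key(sku), to_kg(v, u)) for sku, v, u in rows]
# The two SKU-9 rows share surrogate key 1, and all weights are in kg.
```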
  23. 4. Load: Ensures that the load is performed correctly and with as few resources as possible. The target of the load process is often a database; referential integrity needs to be maintained by the ETL tool to ensure consistency.
      5. Managing the ETL process: There is a possibility that the ETL process fails. This can be caused by missing values in one of the reference tables, or simply by a connection or power outage. It is necessary to design the ETL process with fail-recovery in mind.
  24. 6. Staging: A staging area or landing zone is an intermediate storage area used for data processing during the ETL process. The primary motivations for its use are to increase the efficiency of ETL processes, ensure data integrity and support data quality operations.
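Staging and fail-recovery fit together: if cleaned rows accumulate in a staging area and are committed to the warehouse only when the whole batch succeeds, a mid-batch failure leaves the warehouse untouched and the run can simply be restarted. A minimal, purely illustrative sketch with lists standing in for tables:

```python
# Hypothetical staged load with fail-recovery: the batch is cleaned in
# a staging list first and committed to the warehouse only as a whole.
warehouse = []

def load_batch(source_rows):
    staging = []                       # intermediate landing zone
    for row in source_rows:
        if row.get("amount") is None:  # data-quality check in staging
            raise ValueError(f"bad row: {row}")
        staging.append({**row, "amount": float(row["amount"])})
    warehouse.extend(staging)          # commit the whole batch at once

try:
    load_batch([{"amount": "10"}, {"amount": None}])  # fails mid-batch
except ValueError:
    pass                     # warehouse still empty, rerun is safe
load_batch([{"amount": "10"}, {"amount": "2.5"}])    # clean rerun
```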
  25.  Commercial tools: Ab Initio, IBM InfoSphere DataStage, Informatica, Oracle Data Integrator and SAP Data Integrator.
      Open source ETL tools: CloverETL, Apatar, Pentaho and Talend.
  26.  Data warehousing comes in all shapes and sizes, which has a direct relationship to the cost and time involved.
      The steps listed below summarize some of the points to consider:
      Get professional advice
      Plan the data
      Who will use the data warehouse
      Integration with external applications
  27. The key steps in developing a data warehouse can be summarized as follows:
      Project initiation
      Requirements analysis
      Design (architecture, databases and applications)
      Construction (selecting and installing tools, developing data feeds and building reports)
      Deployment (release & training)
      Maintenance
  28.  It applies to a software architecture that describes processing distributed between applications and supporting services.
      It represents distributed cooperative processing; the relationship between client and server is a relationship between hardware and software components.
      It covers a wide range of functions, services and other aspects of a distributed environment.
  29.  Host-based application processing is performed on one computer system with attached unintelligent, “dumb” terminals.
      A single stand-alone PC, or an IBM mainframe with attached character-based display terminals, are examples of a host-based processing environment.
      Host-based processing is totally non-distributed.
  30.  Slave computers are attached to a master computer and perform application-processing-related functions only as directed by their master.
      Distribution of processing tends to be unidirectional – from master to slaves.
      Slaves are capable of some limited local application processing.
      E.g. a mainframe (host) computer, such as the IBM 3090, used with cluster controllers and intelligent terminals.
  31.  This generation is used to model:
     1. Shared-device LAN processing environment: PCs are attached to a system device that allows them to share a common resource – a file server (hard disk) or a print server. E.g. Microsoft’s LAN Manager, which allows a LAN to have a system dedicated to file and print services.
  32. 2. Client/server LAN processing environment: An extension of shared-device processing. E.g. SYBASE SQL Server: an application running on a PC sends a read request to its database server; the server processes it locally and sends only the requested records back to the PC application.
  33.  Two-tiered architectures evolved into multi-tiered architectures.
      The computing model deals with servers dedicated to application, data, transaction management and system management.
      Supported data structures evolved from relational to multidimensional to multimedia.
  34.  A distributed database system consists of loosely coupled sites that share no physical component.
      Database systems that run on each site are independent of each other.
      Transactions may access data at one or more sites.
  35.  In a homogeneous distributed database:
      All sites have identical software.
      Sites are aware of each other and agree to cooperate in processing user requests.
      Each site surrenders part of its autonomy in terms of the right to change schemas or software.
      The system appears to the user as a single system.
       In a heterogeneous distributed database:
      Different sites may use different schemas and software.
     ▪ Difference in schema is a major problem for query processing.
     ▪ Difference in software is a major problem for transaction processing.
      Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.
  36. DDBMS architectures are generally developed depending on three parameters:
      Distribution – the physical distribution of data across the different sites.
      Autonomy – the distribution of control of the database system and the degree to which each constituent DBMS can operate independently.
      Heterogeneity – the uniformity or dissimilarity of the data models, system components and databases.
  37.  Data replication
      Fragmentation
     The three dimensions of distribution transparency are:
      Location transparency
      Fragmentation transparency
      Replication transparency
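Horizontal fragmentation and location transparency can be illustrated together: rows of one table are split across sites by region, and the lookup routine hides which site holds which fragment. A hypothetical sketch (the site names and rows are invented):

```python
# Hypothetical horizontal fragmentation: one customer table split by
# region across two sites. The caller names the data, not the site.
fragments = {                      # site -> its horizontal fragment
    "site_eu": [{"id": 1, "region": "EU"}, {"id": 3, "region": "EU"}],
    "site_us": [{"id": 2, "region": "US"}],
}

def find_customer(cid):
    """Location-transparent lookup: search every site's fragment."""
    for site, rows in fragments.items():
        for row in rows:
            if row["id"] == cid:
                return site, row
    return None, None

site, row = find_customer(2)
print(site)  # site_us
```

A real DDBMS would use the fragmentation predicate (here, region) to prune sites rather than scanning all of them, but the transparency property is the same: the query never mentions a location.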
  38. (Diagram: Sites 1–4 connected by a communication network)
  39.  The data warehouse operations mainly consist of huge data loads and index builds, generation of materialized views, and queries over large volumes of data. The underlying I/O system of a data warehouse should be built to meet these heavy requirements.
      Architecture options:
     1. Symmetric multiprocessing (SMP): two or more identical processors are connected to a single, shared main memory.
     2. Massively parallel processing (MPP): a large number of processors perform a set of coordinated computations in parallel.
      Sizing factors: number of CPUs, memory of the data warehouse, number of disks.
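The parallel-processing idea behind MPP can be sketched in miniature as scatter/gather: a large scan is partitioned into chunks, each worker aggregates its own chunk, and the partial results are combined. A toy, thread-based illustration (not a real parallel DBMS):

```python
# Toy scatter/gather aggregation: partition the "fact table", let
# each worker compute a partial sum, then combine the partials.
from concurrent.futures import ThreadPoolExecutor

data = list(range(1_000))   # stand-in for a large fact-table column

def partial_sum(chunk):
    return sum(chunk)

chunks = [data[i::4] for i in range(4)]   # 4-way partition
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
print(total)  # 499500
```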
  40.  The server OS determines:
      how quickly the server can fulfill client requests,
      how many clients it can support concurrently and reliably, and
      how efficiently system resources such as memory, disk I/O and communication components are utilized.
  41.  Multiuser support
      Preemptive multitasking
      Multithreaded design
      Memory protection: concurrent tasks should not violate each other’s memory.
      Scalability
      Security
      Reliability
      Availability
  42.  Relatively small and highly secure.
      Simplified architecture, extensibility, portability, real-time support, robust system security and multiprocessor support.
      This architecture results in a highly modular OS that can support multiple OS “personalities” by configuring outside services as needed.
      E.g. the Mach 3.0 microkernel used by IBM to allow DOS, OS/2 and AIX to coexist on a single machine.
  43.  Distributed Memory Architecture:
      Shared-Nothing Architecture
      Shared-Disk Architecture
  44. (Diagram: shared-nothing architecture – each processor unit (PU) has its own local memory; PUs communicate over an interconnection network)
  45. (Diagram: shared-disk architecture – each PU has local memory and is connected through an interconnection network to a global shared disk subsystem)
  46.  A cluster is a set of loosely coupled SMP machines connected by a high-speed interconnection network.
      A cluster behaves just like a single large machine.
