Oracle: Data Warehouse Design
Characteristics of a Data Warehouse

• A data warehouse is a database designed for querying, reporting, and analysis.
• A data warehouse contains historical data derived from transaction data.
• Data warehouses separate analysis workload from transaction workload.
• A data warehouse is primarily an analytical tool.
Comparing OLTP and Data Warehouses
Feature                        OLTP                  Data Warehouse
Joins                          Many                  Some
Data accessed by queries       Comparatively low     Large amount
Duplicated data                Normalized DBMS       Denormalized DBMS
Derived data and aggregates    Rare                  Common
Data Warehouse Architectures

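[Figure: Data warehouse architecture with a staging area. Operational systems and flat files feed a staging area; the warehouse holds metadata, raw data, and materialized views; sales, purchasing, and inventory data marts serve users for analysis, reporting, and data mining.]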
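The flow in this architecture (operational sources, then a staging area where data is cleansed, then the warehouse) can be sketched in a few lines. This is a minimal illustration, assuming a CSV flat file as the only source and Python's built-in sqlite3 module standing in for the Oracle warehouse; the table names `stg_sales` and `sales` and the cleansing rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract from an operational system.
FLAT_FILE = """order_id,region,amount
1,east, 100.50
2,WEST,200
3,east,
"""

def run_etl(conn: sqlite3.Connection, source: str) -> None:
    cur = conn.cursor()
    # Extract: copy the raw rows, unchanged, into a staging table.
    cur.execute("CREATE TABLE stg_sales (order_id TEXT, region TEXT, amount TEXT)")
    rows = list(csv.DictReader(io.StringIO(source)))
    cur.executemany("INSERT INTO stg_sales VALUES (:order_id, :region, :amount)", rows)
    # Transform and load: cleanse in the staging area, then move the rows into
    # the warehouse table. Rows without an amount are rejected.
    cur.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    cur.execute(
        """INSERT INTO sales
           SELECT CAST(order_id AS INTEGER),
                  UPPER(TRIM(region)),
                  CAST(TRIM(amount) AS REAL)
           FROM stg_sales
           WHERE TRIM(amount) <> ''"""
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
run_etl(conn, FLAT_FILE)
print(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region").fetchall())
```

Keeping the raw rows in staging means a failed or disputed transformation can be rerun without going back to the operational system.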
Data Warehouse Design
• Key data warehouse design considerations:
  – Identify the specific data content.
  – Recognize the critical relationships within and
    between groups of data.
  – Define the system environment
    supporting your data warehouse.
  – Identify the required data
    transformations.
  – Determine how frequently the data must be
    refreshed.
Logical Design
– A logical design is conceptual and
  abstract.
– Entity-relationship (ER) modeling
  is useful in identifying logical
  information requirements.
   • An entity represents a chunk of data.
   • The properties of entities are known as attributes.
   • The links between entities are known as
     relationships.
– Dimensional modeling is a specialized
  type of ER modeling useful in data warehouse
  design.
Oracle Warehouse Builder
– Oracle Database provides tools to implement
  the extraction, transformation, and loading (ETL) process.
   • Oracle Warehouse Builder is one tool that helps with this
     process.
– Oracle Warehouse Builder generates the
  following types of code:
   •   SQL data definition language (DDL) scripts
   •   PL/SQL programs
   •   SQL*Loader control files
   •   XML Process Definition Language (XPDL)
   •   ABAP code (used to extract data from SAP systems)
Data Warehousing Schemas
– Objects can be arranged in data warehousing
  schema models in a variety of ways:
   •   Star schema
   •   Snowflake schema
   •   Third normal form (3NF) schema
   •   Hybrid schemas
– The source data model and user
  requirements should steer the data
  warehouse schema.
– Implementing the logical model may require
  changes to adapt it to your physical system.
Schema Characteristics
– Star schema
   • Characterized by one or more large fact tables and a
     number of much smaller dimension tables
   • Each dimension table joined to the fact table using a
     primary key to foreign key join
– Snowflake schema
   • Dimension data grouped into multiple tables instead
     of one large table
   • Increased number of dimension tables, requiring
     more foreign key joins
– Third normal form (3NF) schema
   • A classical relational-database model that minimizes
     data redundancy through normalization
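The difference between the star and snowflake layouts shows up in the joins a query needs. A minimal sketch, using Python's sqlite3 module as a stand-in for Oracle; the tables and rows are hypothetical, loosely modeled on the SALES, PRODUCTS, and CUSTOMERS tables used later in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: each dimension is a single (denormalized) table.
CREATE TABLE products  (prod_id INTEGER PRIMARY KEY, prod_name TEXT, prod_category TEXT);
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_city TEXT, cust_state_province TEXT);
-- Fact table: foreign keys to the dimensions plus a measure.
CREATE TABLE sales (
    prod_id     INTEGER REFERENCES products(prod_id),
    cust_id     INTEGER REFERENCES customers(cust_id),
    amount_sold REAL
);
-- Snowflake variant of the product dimension: category normalized into its own table.
CREATE TABLE categories  (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products_sf (prod_id INTEGER PRIMARY KEY, prod_name TEXT,
                          category_id INTEGER REFERENCES categories(category_id));

INSERT INTO products    VALUES (1, 'Tent', 'Camping'), (2, 'Stove', 'Camping'), (3, 'Kayak', 'Water');
INSERT INTO categories  VALUES (1, 'Camping'), (2, 'Water');
INSERT INTO products_sf VALUES (1, 'Tent', 1), (2, 'Stove', 1), (3, 'Kayak', 2);
INSERT INTO customers   VALUES (10, 'Austin', 'TX'), (11, 'Dallas', 'TX'), (12, 'Boston', 'MA');
INSERT INTO sales       VALUES (1, 10, 300), (2, 11, 80), (3, 12, 900), (1, 12, 300);
""")

# Star query: one primary key to foreign key join per dimension.
STAR_QUERY = """
SELECT p.prod_category, c.cust_state_province, SUM(s.amount_sold)
FROM sales s
JOIN products  p ON s.prod_id = p.prod_id
JOIN customers c ON s.cust_id = c.cust_id
GROUP BY p.prod_category, c.cust_state_province
ORDER BY p.prod_category, c.cust_state_province
"""

# Snowflake query: same question over the same fact rows, one extra join to reach the category name.
SNOWFLAKE_QUERY = """
SELECT cat.category_name, c.cust_state_province, SUM(s.amount_sold)
FROM sales s
JOIN products_sf p   ON s.prod_id = p.prod_id
JOIN categories  cat ON p.category_id = cat.category_id
JOIN customers   c   ON s.cust_id = c.cust_id
GROUP BY cat.category_name, c.cust_state_province
ORDER BY cat.category_name, c.cust_state_province
"""

print(conn.execute(STAR_QUERY).fetchall())
print(conn.execute(SNOWFLAKE_QUERY).fetchall())
```

Both queries return the same totals; the snowflake version trades one more join for less duplicated dimension data.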
Data Warehousing Objects
– Fact tables
   • Fact tables are the large tables that store business
     measurements.
– Dimension tables
   • A dimension is a structure composed of one or more
     hierarchies that categorizes data.
   • A unique identifier identifies exactly one record in
     a dimension table.
– Relationships
   • Relationships guarantee the
     integrity of business
     information.
Fact Tables
– A fact table must be defined for each star schema.
– Fact tables are the large tables that store business
  measurements.
– A fact table contains either detail-level or
  aggregated facts.
– A fact table usually contains facts with the same
  level of aggregation.
– The primary key of the fact table is
  usually a composite key made up
  of all its foreign keys.
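A composite primary key built from the foreign keys lets the database reject both duplicate facts and facts that point at missing dimension rows. A minimal sketch in Python's sqlite3 module as a stand-in for Oracle; the TIMES, PRODUCTS, and SALES layouts here are hypothetical and much smaller than a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE times    (time_id TEXT PRIMARY KEY);
CREATE TABLE products (prod_id INTEGER PRIMARY KEY);
CREATE TABLE sales (
    time_id     TEXT    NOT NULL REFERENCES times(time_id),
    prod_id     INTEGER NOT NULL REFERENCES products(prod_id),
    quantity    INTEGER,               -- measure
    amount_sold REAL,                  -- measure
    PRIMARY KEY (time_id, prod_id)     -- composite key built from the foreign keys
);
INSERT INTO times    VALUES ('2024-01-01'), ('2024-01-02');
INSERT INTO products VALUES (1), (2);
INSERT INTO sales    VALUES ('2024-01-01', 1, 3, 30.0);
""")

def try_insert(row: tuple) -> str:
    """Insert one fact row and report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

new_pair = try_insert(("2024-01-02", 1, 1, 10.0))   # new (time, product) combination
duplicate = try_insert(("2024-01-01", 1, 5, 50.0))  # same key as an existing row
orphan = try_insert(("2024-01-03", 2, 1, 10.0))     # time_id missing from TIMES
print(new_pair, duplicate, orphan, sep="\n")
```

Because every row has the same key columns, every row sits at the same level of aggregation (one row per day and product here).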
Dimensions and Hierarchies
– A dimension is a structure composed of one or more
  hierarchies that categorizes data.
– Dimensional attributes help to describe the dimensional value.
– Dimension data is collected at the lowest level of detail and
  aggregated into higher-level totals.
– Hierarchies are structures that use ordered levels to organize data.
– In a hierarchy, each level is connected to the levels above and
  below it.
[Figure: CUSTOMERS dimension hierarchy, by level from highest to lowest: REGION, SUBREGION, COUNTRY, STATE, CITY, CUSTOMER]
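Rolling facts up a hierarchy means grouping the lowest-level rows by an ancestor level. A minimal sketch of the CUSTOMERS hierarchy on this slide, assuming each customer row carries all of its ancestors; the customers, places, and sales figures are hypothetical.

```python
from collections import defaultdict

# Hierarchy levels from lowest to highest, as in the CUSTOMERS dimension.
LEVELS = ["customer", "city", "state", "country", "subregion", "region"]

# Hypothetical dimension rows, one per customer (the lowest level).
CUSTOMERS = [
    {"customer": "C1", "city": "Austin", "state": "TX", "country": "USA",
     "subregion": "Northern America", "region": "Americas"},
    {"customer": "C2", "city": "Dallas", "state": "TX", "country": "USA",
     "subregion": "Northern America", "region": "Americas"},
    {"customer": "C3", "city": "Toronto", "state": "ON", "country": "Canada",
     "subregion": "Northern America", "region": "Americas"},
    {"customer": "C4", "city": "Paris", "state": "IDF", "country": "France",
     "subregion": "Western Europe", "region": "Europe"},
]
SALES = {"C1": 100.0, "C2": 50.0, "C3": 70.0, "C4": 30.0}  # facts at customer level

def roll_up(level: str) -> dict[str, float]:
    """Aggregate customer-level facts to any level of the hierarchy."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    totals: dict[str, float] = defaultdict(float)
    for row in CUSTOMERS:
        totals[row[level]] += SALES.get(row["customer"], 0.0)
    return dict(totals)

print(roll_up("state"))
print(roll_up("region"))
```

Because each level is connected to the one above it, totals at every level add up to the same grand total (250.0 here).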
Dimensions and Hierarchies

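[Figure: Star schema in which the SALES fact table (foreign keys cust_id and prod_id) is joined to the PRODUCTS (#prod_id), CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province), TIMES, CHANNELS, and PROMOTIONS dimension tables. Callouts mark a unique identifier (#), a relationship, the fact table, the dimension tables, and a hierarchy.]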
Physical Design
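Logical design → physical design (stored in tablespaces):
– Entities map to tables.
– Attributes map to columns.
– Relationships and unique identifiers map to integrity constraints
  (primary key, foreign key, not null).
– Indexes, materialized views, and dimensions are additional
  physical structures.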
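The logical-to-physical mapping can be made concrete by turning an entity description into DDL. A minimal sketch, assuming a hypothetical `Entity`/`Attribute` model and `to_ddl` helper invented for this illustration; it is not how Oracle Warehouse Builder generates its scripts, and the Oracle type names (`NUMBER`, `VARCHAR2`) are just sample values.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    sql_type: str
    required: bool = False  # mapped to a NOT NULL constraint

@dataclass
class Entity:
    name: str
    attributes: list[Attribute]
    unique_identifier: list[str]  # mapped to the primary key
    relationships: dict[str, str] = field(default_factory=dict)  # column -> referenced table

def to_ddl(entity: Entity) -> str:
    """Translate one logical entity into a CREATE TABLE statement."""
    lines = []
    for attr in entity.attributes:  # attributes become columns
        not_null = " NOT NULL" if attr.required else ""
        lines.append(f"  {attr.name} {attr.sql_type}{not_null}")
    lines.append(f"  PRIMARY KEY ({', '.join(entity.unique_identifier)})")
    for column, table in entity.relationships.items():  # relationships become foreign keys
        lines.append(f"  FOREIGN KEY ({column}) REFERENCES {table}")
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(lines) + "\n);"

customers = Entity(
    "customers",
    [Attribute("cust_id", "NUMBER", required=True),
     Attribute("cust_last_name", "VARCHAR2(40)"),
     Attribute("cust_city", "VARCHAR2(30)")],
    unique_identifier=["cust_id"],
)
sales = Entity(
    "sales",
    [Attribute("cust_id", "NUMBER", required=True),
     Attribute("prod_id", "NUMBER", required=True),
     Attribute("amount_sold", "NUMBER(10,2)")],
    unique_identifier=["cust_id", "prod_id"],
    relationships={"cust_id": "customers", "prod_id": "products"},
)
print(to_ddl(customers))
print(to_ddl(sales))
```

Physical-only structures such as indexes and materialized views have no logical counterpart, so they would be added separately, driven by query performance rather than by the model.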
Data Warehouse Physical Structures

• Tables and partitioned tables
  – Partitioned tables enable you to split
    large data volumes into smaller,
    more manageable pieces.
  – Expect performance benefits from:
     • Partition pruning
     • Intelligent parallel processing
  – Compressed tables offer scaleup opportunities
    for read-only operations.
  – Table compression saves disk space.
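Partition pruning works because the partition bounds tell the optimizer which pieces can contain matching rows. A minimal sketch of range pruning, assuming a hypothetical SALES table partitioned by quarter; the partition names and dates are illustrative only.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical SALES table range-partitioned by sale date, one partition per quarter.
# Each bound is exclusive, like VALUES LESS THAN in Oracle range partitioning;
# the first partition also holds any earlier dates.
BOUNDS = [date(2024, 4, 1), date(2024, 7, 1), date(2024, 10, 1), date(2025, 1, 1)]
NAMES = ["sales_q1", "sales_q2", "sales_q3", "sales_q4"]

def partition_for(d: date) -> int:
    """Index of the partition whose range contains d."""
    i = bisect_right(BOUNDS, d)  # number of bounds <= d
    if i == len(BOUNDS):
        raise ValueError(f"{d} is not below the highest partition bound")
    return i

def prune(lo: date, hi: date) -> list[str]:
    """Partitions that a predicate 'sale_date BETWEEN lo AND hi' must scan.

    Every other partition is skipped without being read."""
    return NAMES[partition_for(lo) : partition_for(hi) + 1]

print(prune(date(2024, 5, 1), date(2024, 8, 15)))
print(prune(date(2024, 2, 1), date(2024, 2, 28)))
```

The second query touches one partition out of four; on a large table that is roughly three quarters of the I/O avoided.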
Data Warehouse Physical Structures

  – Views:
     • Are tailored presentations of data contained in one
       or more tables or views
     • Do not require any space in the database
  – Materialized views:
     • Are query results that have been stored in advance
     • (Like indexes) are used transparently and improve
       performance
  – Integrity constraints:
     • Are used in data warehouses for query rewrite
  – Dimensions:
     • Are containers of logical relationships and do not
       require any space in the database
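The idea behind a materialized view and query rewrite is that an expensive aggregate is stored once and later queries are answered from the stored result. A minimal sketch using Python's sqlite3 module; SQLite has no materialized views or query rewrite, so a table created from a query plays the materialized view and the "rewritten" query is written by hand. Table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_month TEXT, prod_id INTEGER, amount_sold REAL);
INSERT INTO sales VALUES
  ('2024-01', 1, 10), ('2024-01', 1, 15), ('2024-01', 2, 5),
  ('2024-02', 1, 20), ('2024-02', 2, 30);

-- Stand-in for a materialized view: the aggregate is computed once and stored.
CREATE TABLE mv_sales_by_month AS
  SELECT sale_month, SUM(amount_sold) AS total
  FROM sales
  GROUP BY sale_month;
""")

# What the user asks: an aggregate over the detail rows.
DETAIL_SQL = """SELECT sale_month, SUM(amount_sold) FROM sales
                GROUP BY sale_month ORDER BY sale_month"""
# What query rewrite would run instead: a read of the stored result.
REWRITTEN_SQL = "SELECT sale_month, total FROM mv_sales_by_month ORDER BY sale_month"

detail = conn.execute(DETAIL_SQL).fetchall()
rewritten = conn.execute(REWRITTEN_SQL).fetchall()
print(detail == rewritten, rewritten)
```

In Oracle the substitution happens inside the optimizer when query rewrite is enabled, so the user's query text does not change; in either case the stored result must be refreshed after the detail table changes, or the two answers drift apart.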
Managing Large Volumes of Data
• Work smarter in your data warehouse:
  –   Partitioning
  –   Bitmap indexes/Star transformation
  –   Data compression
  –   Query rewrite
• Work harder in your data warehouse:
  – Parallelism for all operations
       • DBA tasks, such as loading, index creation, table
         creation, data modification, backup and recovery
       • End-user operations, such as queries
       • Unbounded scalability: Real Application Clusters
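Bitmap indexes suit the low-cardinality columns typical of fact and dimension tables: each distinct value gets a bitmap with one bit per row, and predicates combine with cheap bitwise operations. Star transformation relies on combining bitmap indexes on the fact table's foreign-key columns in this way. A minimal sketch using Python integers as bitsets; it shows the idea, not how Oracle stores bitmaps on disk, and the rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical low-cardinality columns of a fact table, one entry per row.
ROWS = [
    {"channel": "web",   "region": "east"},
    {"channel": "store", "region": "west"},
    {"channel": "web",   "region": "west"},
    {"channel": "phone", "region": "east"},
    {"channel": "web",   "region": "east"},
]

def build_bitmap_index(rows, column):
    """One bitmap per distinct value; bit i is set when row i holds that value."""
    bitmaps = defaultdict(int)
    for i, row in enumerate(rows):
        bitmaps[row[column]] |= 1 << i
    return dict(bitmaps)

def row_ids(bitmap):
    """Decode a bitmap into the row positions whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

channel_idx = build_bitmap_index(ROWS, "channel")
region_idx = build_bitmap_index(ROWS, "region")

# WHERE channel = 'web' AND region = 'east'   ->  bitwise AND of two bitmaps
matches = row_ids(channel_idx["web"] & region_idx["east"])
# WHERE channel IN ('store', 'phone')         ->  bitwise OR of two bitmaps
either = row_ids(channel_idx["store"] | channel_idx["phone"])
print(matches, either)
```

Only the rows whose bits survive the AND/OR need to be fetched from the table.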
I/O Performance in Data Warehouses

  – I/O is typically the primary determinant of data
    warehouse performance.
  – Data warehouse storage configurations should be
    chosen by I/O bandwidth, not storage capacity.
  – Every component of the I/O
    subsystem should provide
    enough bandwidth:
     • Disks
     • I/O channels
     • I/O adapters
  – In data warehouses, maximizing
    sequential I/O throughput is critical.
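Sizing by bandwidth rather than capacity is simple arithmetic: the scan volume and the target response time set the required throughput, and the slowest component in the I/O path caps what is delivered. A minimal sketch; the scan size, target time, and per-component throughputs are hypothetical figures, not measurements.

```python
import math

def required_bandwidth_mb_s(scan_gb: float, target_seconds: float) -> float:
    """Sustained sequential throughput (MB/s) needed to scan scan_gb within target_seconds."""
    return scan_gb * 1024 / target_seconds

def disks_needed(required_mb_s: float, per_disk_mb_s: float) -> int:
    """Minimum number of disks whose combined throughput meets the requirement."""
    return math.ceil(required_mb_s / per_disk_mb_s)

def effective_bandwidth(components: dict[str, float]) -> tuple[str, float]:
    """The I/O path delivers no more than its slowest component (name, MB/s)."""
    name = min(components, key=components.get)
    return name, components[name]

# Hypothetical target: scan 600 GB in 5 minutes with disks sustaining 100 MB/s each.
need = required_bandwidth_mb_s(scan_gb=600, target_seconds=300)
disks = disks_needed(need, per_disk_mb_s=100)
# Hypothetical aggregate throughput of each layer in the I/O path (MB/s).
PATH = {"disks": disks * 100, "channels": 2 * 800, "adapters": 4 * 1000}
print(need, disks, effective_bandwidth(PATH))
```

Here 21 disks meet the 2048 MB/s target even though their combined capacity far exceeds 600 GB, yet the channels still cap throughput at 1600 MB/s. That is why every component of the path has to be sized, not just the disks.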
I/O Scalability
Parallel execution:
     – Reduces response time for data-intensive operations on large
       databases
     – Benefits systems with the following characteristics:
          • Multiprocessors, clusters, or massively parallel systems
          • Sufficient I/O bandwidth
          • Sufficient memory to support memory-intensive processes such
            as sorts, hashing, and I/O buffers

[Figure: Parallel execution. Scanner query servers read the data on disk and pass rows to sorter (aggregator) query servers Q1 to Q4; a coordinator dispatches the work.]
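The division of labor in parallel execution (a coordinator splits the work, query servers scan and aggregate their share, partial results are merged) can be sketched with a thread pool. This is a simplified illustration, not Oracle's producer/consumer implementation: each worker aggregates its own granule and the coordinator merges the partial sums. The rows and the degree of parallelism are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact rows (product, amount).
ROWS = [("tent", 30), ("stove", 10), ("tent", 20), ("kayak", 90),
        ("stove", 15), ("kayak", 10), ("tent", 5), ("stove", 5)]
DEGREE_OF_PARALLELISM = 4

def scan_and_aggregate(granule):
    """Worker step: scan one granule and return partial SUMs per product."""
    partial = Counter()
    for product, amount in granule:
        partial[product] += amount
    return partial

def parallel_sum(rows, dop):
    # The coordinator divides the rows into dop granules...
    granules = [rows[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = pool.map(scan_and_aggregate, granules)
        # ...and merges the partial results into the final answer.
        total = Counter()
        for partial in partials:
            total.update(partial)
    return dict(total)

print(parallel_sum(ROWS, DEGREE_OF_PARALLELISM))
```

The answer is the same for any degree of parallelism, which is what makes SUM easy to parallelize. Python threads share one interpreter lock, so a `ProcessPoolExecutor` would be needed for real CPU parallelism; the structure of the computation is unchanged.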
I/O Scalability

• Automatic Storage Management (ASM)
  – Configuring storage for a database depends on many
    variables:
     •   Which data to put on which disk
     •   Logical unit number (LUN) configurations
     •   Database types and workloads: data warehouse, OLTP, DSS
     •   Trade-offs between available options
  – ASM provides solutions to storage issues
    encountered in data warehouses.
I/O Scalability

• Automatic Storage Management: Overview
  – Portable and high-performance cluster file system
  – Manages Oracle database files
  – Data spread across disks to balance load
  – Integrated mirroring across disks
  – Solves many storage management challenges
  [Figure: Storage stack of Application, Database, File system and Volume manager (the layers ASM takes over), and Operating system]
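The two ideas on this slide, spreading data across disks to balance load and mirroring it across disks, can be illustrated with a toy placement function. This is a deliberately simplified model, not ASM's actual allocation algorithm: it assumes two hypothetical failure groups of two disks each, stripes allocation units round-robin, and always puts the mirror copy in the other failure group.

```python
from collections import Counter

def stripe_and_mirror(num_units: int, group_a: list[str], group_b: list[str]) -> list[tuple[str, str]]:
    """Return (primary_disk, mirror_disk) for each allocation unit of a file."""
    placement = []
    for i in range(num_units):
        slot = i // 2
        # Alternate which failure group holds the primary copy so reads spread over all disks.
        if i % 2 == 0:
            primary, mirror = group_a[slot % len(group_a)], group_b[slot % len(group_b)]
        else:
            primary, mirror = group_b[slot % len(group_b)], group_a[slot % len(group_a)]
        placement.append((primary, mirror))
    return placement

def readable_after_failure(placement: list[tuple[str, str]], failed: set[str]) -> bool:
    """True if every unit still has at least one copy on a surviving disk."""
    return all(p not in failed or m not in failed for p, m in placement)

placement = stripe_and_mirror(8, ["d1", "d2"], ["d3", "d4"])
print(placement)
print(Counter(p for p, _ in placement))                     # primaries per disk
print(readable_after_failure(placement, {"d1", "d2"}))      # a whole failure group lost
print(readable_after_failure(placement, {"d1", "d3"}))      # one disk from each group lost
```

Losing an entire failure group leaves every unit readable, while losing one disk from each group can destroy both copies of a unit. This is why mirror copies are kept in separate failure groups.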
Visit more self help tutorials

• Pick a tutorial of your choice and browse
  through it at your own pace.
• The tutorials section is free, self-guiding and
  will not involve any additional support.
• Visit us at www.dataminingtools.net

Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 

Dernier (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Oracle: DW Design

  • 2. Characteristics of a Data Warehouse
    – A data warehouse is a database designed for querying, reporting, and analysis.
    – A data warehouse contains historical data derived from transaction data.
    – Data warehouses separate analysis workload from transaction workload.
    – A data warehouse is primarily an analytical tool.
  • 3. Comparing OLTP and Data Warehouses
      Characteristic                 OLTP                  Data Warehouse
      Joins                          Many                  Some
      Data accessed by queries       Comparatively lower   Large amount
      Duplicated data                Normalized DBMS       Denormalized DBMS
      Derived data and aggregates    Rare                  Common
  • 4. Data Warehouse Architectures
    [Diagram: operational systems and flat files feed a staging area; the warehouse holds metadata, raw data, and materialized views, and serves Sales, Purchasing, and Inventory data for analysis, reporting, and data mining.]
  • 5. Data Warehouse Design
    • Key data warehouse design considerations:
      – Identify the specific data content.
      – Recognize the critical relationships within and between groups of data.
      – Define the system environment supporting your data warehouse.
      – Identify the required data transformations.
      – Calculate the frequency at which the data must be refreshed.
  • 6. Logical Design
    – A logical design is conceptual and abstract.
    – Entity-relationship (ER) modeling is useful in identifying logical information requirements.
      • An entity represents a chunk of information.
      • The properties of entities are known as attributes.
      • The links between entities are known as relationships.
    – Dimensional modeling is a specialized type of ER modeling useful in data warehouse design.
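To make the three ER terms concrete, here is a minimal sketch of a logical model held as plain data. The Entity and Relationship classes and the CUSTOMER/PRODUCT/SALE entities are hypothetical illustrations, not part of any Oracle tool. A relationship is checked by confirming that the linking attribute identifies the parent and appears in the child.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """An entity: a named chunk of information described by attributes (hypothetical model)."""
    name: str
    attributes: tuple[str, ...]
    unique_identifier: tuple[str, ...]  # attributes that identify one instance

@dataclass(frozen=True)
class Relationship:
    """A link between two entities: one parent instance relates to many child instances."""
    parent: Entity
    child: Entity
    via: str  # attribute the child uses to refer to the parent

    def is_consistent(self) -> bool:
        # The linking attribute must identify the parent and be present in the child.
        return self.parent.unique_identifier == (self.via,) and self.via in self.child.attributes

customer = Entity("CUSTOMER", ("cust_id", "cust_last_name", "cust_city"), ("cust_id",))
product = Entity("PRODUCT", ("prod_id", "prod_name"), ("prod_id",))
sale = Entity("SALE", ("cust_id", "prod_id", "time_id", "amount_sold"),
              ("cust_id", "prod_id", "time_id"))

relationships = [
    Relationship(customer, sale, "cust_id"),
    Relationship(product, sale, "prod_id"),
]

for r in relationships:
    print(f"{r.parent.name} 1..* {r.child.name} via {r.via}: consistent={r.is_consistent()}")
```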
  • 7. Oracle Warehouse Builder
    – Oracle Database provides tools to implement the extraction, transformation, and loading (ETL) process.
      • Oracle Warehouse Builder is a tool to help in this process.
    – Oracle Warehouse Builder generates the following types of code:
      • SQL data definition language (DDL) scripts
      • PL/SQL programs
      • SQL*Loader control files
      • XML Process Definition Language (XPDL)
      • ABAP code (used to extract data from SAP systems)
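Oracle Warehouse Builder generates ETL code for you. The sketch below is not OWB output. It is a minimal hand-written illustration of the three ETL stages. Two stand-ins are assumptions: Python's standard csv module plays the flat-file source, and an in-memory sqlite3 database plays the warehouse. The sales extract and table layout are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract from an operational system (one row has no amount).
RAW = """cust_id,sale_date,amount
101,2024-01-15, 250.00
102,2024-01-16,99.5
101,2024-02-03,
103,2024-02-10,40
"""

def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse the flat file into rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float]]:
    """Transform: reject incomplete rows, convert types, derive a month key."""
    out = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record: skip it rather than load a NULL measure
        out.append((int(row["cust_id"]), row["sale_date"][:7], float(amount)))
    return out

def load(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> None:
    """Load: bulk-insert the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (cust_id INTEGER, month TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
print(conn.execute("SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month").fetchall())
```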
  • 8. Data Warehousing Schemas
    – Objects can be arranged in data warehousing schema models in a variety of ways:
      • Star schema
      • Snowflake schema
      • Third normal form (3NF) schema
      • Hybrid schemas
    – The source data model and user requirements should steer the data warehouse schema.
    – Implementing the logical model may require changes to adapt it to your physical system.
  • 9. Schema Characteristics
    – Star schema
      • Characterized by one or more large fact tables and a number of much smaller dimension tables
      • Each dimension table is joined to the fact table by a primary-key-to-foreign-key join
    – Snowflake schema
      • Dimension data is grouped into multiple tables instead of one large table
      • The larger number of dimension tables requires more foreign key joins
    – Third normal form (3NF) schema
      • A classical relational-database model that minimizes data redundancy through normalization
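The difference between star and snowflake joins can be seen in a small sketch. It uses sqlite3 in place of Oracle. All table names, columns, and rows are hypothetical, and both variants reuse the same fact table. Both queries return the same totals. The snowflake version needs one extra join because the category level has been normalized out of the product dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: one denormalized table per dimension.
CREATE TABLE products  (prod_id INTEGER PRIMARY KEY, prod_name TEXT, prod_category TEXT);
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_city TEXT, cust_country TEXT);
CREATE TABLE sales (
    prod_id INTEGER REFERENCES products(prod_id),
    cust_id INTEGER REFERENCES customers(cust_id),
    amount_sold REAL
);
-- Snowflake variant: the category level is split out of the product dimension.
CREATE TABLE categories  (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
CREATE TABLE products_sf (prod_id INTEGER PRIMARY KEY, prod_name TEXT,
                          cat_id INTEGER REFERENCES categories(cat_id));

INSERT INTO products VALUES (1, 'Tent', 'Camping'), (2, 'Stove', 'Camping'), (3, 'Bike', 'Cycling');
INSERT INTO customers VALUES (10, 'Lyon', 'France'), (11, 'Porto', 'Portugal');
INSERT INTO sales VALUES (1, 10, 200), (2, 10, 50), (3, 11, 500), (1, 11, 180);
INSERT INTO categories VALUES (100, 'Camping'), (200, 'Cycling');
INSERT INTO products_sf VALUES (1, 'Tent', 100), (2, 'Stove', 100), (3, 'Bike', 200);
""")

# Star join: every dimension is one primary-key-to-foreign-key join from the fact table.
STAR = """
SELECT p.prod_category, c.cust_country, SUM(s.amount_sold)
FROM sales s
JOIN products p  ON s.prod_id = p.prod_id
JOIN customers c ON s.cust_id = c.cust_id
GROUP BY p.prod_category, c.cust_country
ORDER BY 1, 2
"""

# Snowflake join: reaching the category takes an extra join through products_sf.
SNOWFLAKE = """
SELECT cat.cat_name, c.cust_country, SUM(s.amount_sold)
FROM sales s
JOIN products_sf p  ON s.prod_id = p.prod_id
JOIN categories cat ON p.cat_id = cat.cat_id
JOIN customers c    ON s.cust_id = c.cust_id
GROUP BY cat.cat_name, c.cust_country
ORDER BY 1, 2
"""

star = conn.execute(STAR).fetchall()
snow = conn.execute(SNOWFLAKE).fetchall()
print(star)
```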
  • 10. Data Warehousing Objects
    – Fact tables
      • Fact tables are the large tables that store business measurements.
    – Dimension tables
      • A dimension is a structure composed of one or more hierarchies that categorizes data.
      • A unique identifier identifies exactly one distinct record in a dimension table.
    – Relationships
      • Relationships guarantee the integrity of business information.
  • 11. Fact Tables
    – A fact table must be defined for each star schema.
    – Fact tables are the large tables that store business measurements.
    – A fact table contains either detail-level or aggregated facts.
    – A fact table usually contains facts with the same level of aggregation.
    – The primary key of the fact table is usually a composite key made up of all its foreign keys.
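The last point can be illustrated with a minimal sketch. It uses sqlite3 in place of Oracle, and the sales_fact columns are hypothetical. The fact table is declared at a single grain, with one row per time, product, and customer, and its primary key combines those three foreign keys. The database then rejects a second row at the same grain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (time_id, prod_id, cust_id): the composite of the dimension foreign keys.
conn.execute("""
CREATE TABLE sales_fact (
    time_id       TEXT    NOT NULL,
    prod_id       INTEGER NOT NULL,
    cust_id       INTEGER NOT NULL,
    quantity_sold INTEGER,
    amount_sold   REAL,
    PRIMARY KEY (time_id, prod_id, cust_id)
)""")

conn.execute("INSERT INTO sales_fact VALUES ('2024-01-15', 1, 10, 2, 400.0)")

# A second row with the same key would break the grain, so the database rejects it.
try:
    conn.execute("INSERT INTO sales_fact VALUES ('2024-01-15', 1, 10, 1, 200.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```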
  • 12. Dimensions and Hierarchies
    – A dimension is a structure composed of one or more hierarchies that categorizes data.
    – Dimensional attributes help to describe the dimensional value.
    – Dimension data is collected at the lowest level of detail and aggregated into higher-level totals.
    – Hierarchies are structures that use ordered levels to organize data.
    – In a hierarchy, each level is connected to the levels above and below it.
    [Diagram: CUSTOMERS dimension hierarchy (by level): REGION → SUBREGION → COUNTRY → STATE → CITY → CUSTOMER]
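Rolling facts up a hierarchy can be sketched in a few lines. This sketch assumes a hypothetical CUSTOMERS dimension and omits the REGION and SUBREGION levels for brevity. Facts are recorded at the customer level and aggregated into CITY, STATE, and COUNTRY totals. A helper checks that each member rolls up to exactly one parent, which is what connecting each level to the one above requires.

```python
from collections import defaultdict

# Hypothetical CUSTOMERS dimension: each customer mapped to its ancestors at higher levels.
CUSTOMERS = {
    "c1": {"CITY": "Austin",  "STATE": "TX", "COUNTRY": "USA"},
    "c2": {"CITY": "Dallas",  "STATE": "TX", "COUNTRY": "USA"},
    "c3": {"CITY": "Denver",  "STATE": "CO", "COUNTRY": "USA"},
    "c4": {"CITY": "Toronto", "STATE": "ON", "COUNTRY": "Canada"},  # STATE holds a province here
}

# Facts collected at the lowest level of detail (the customer).
SALES = [("c1", 100.0), ("c2", 50.0), ("c1", 25.0), ("c3", 70.0), ("c4", 30.0)]

def roll_up(level: str) -> dict[str, float]:
    """Aggregate customer-level facts into totals for a higher hierarchy level."""
    totals: dict[str, float] = defaultdict(float)
    for cust_id, amount in SALES:
        totals[CUSTOMERS[cust_id][level]] += amount
    return dict(totals)

def check_hierarchy(child: str, parent: str) -> bool:
    """Each member of the child level must roll up to exactly one member of the parent level."""
    parents: dict[str, set[str]] = defaultdict(set)
    for row in CUSTOMERS.values():
        parents[row[child]].add(row[parent])
    return all(len(p) == 1 for p in parents.values())

for level in ("CITY", "STATE", "COUNTRY"):
    print(level, roll_up(level))
```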
  • 13. Dimensions and Hierarchies
    [Diagram: the SALES fact table (cust_id, prod_id) is related to the dimension tables PRODUCTS (#prod_id), CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province), TIMES, CHANNELS, and PROMOTIONS. Callouts mark the unique identifier (#), the fact table, a relationship, a hierarchy, and the dimension tables.]
  • 14. Physical Design
    [Diagram: the logical design is mapped onto the physical design.
      Logical: entities, relationships, attributes, unique identifiers
      Physical (tablespaces): tables, indexes, materialized views, integrity constraints (primary key, foreign key, not null), dimensions, columns]
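The logical-to-physical mapping can be sketched as a small DDL generator. It makes these mappings: entities to tables, attributes to columns, unique identifiers to PRIMARY KEY constraints, and relationships to FOREIGN KEY plus NOT NULL constraints. Treating every relationship as mandatory is an assumption of this sketch. LogicalEntity and to_ddl are hypothetical names, and sqlite3 stands in for Oracle. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class LogicalEntity:
    """Hypothetical logical entity: attributes, a unique identifier, and relationships."""
    name: str
    attributes: dict[str, str]                                 # attribute -> SQL type
    unique_identifier: str
    references: dict[str, str] = field(default_factory=dict)  # attribute -> parent entity

def to_ddl(e: LogicalEntity, entities: dict[str, LogicalEntity]) -> str:
    """Entity -> table, attribute -> column, identifier -> PRIMARY KEY, relationship -> FOREIGN KEY."""
    cols = []
    for attr, sql_type in e.attributes.items():
        # Assumption: identifiers and (mandatory) relationship attributes are NOT NULL.
        not_null = " NOT NULL" if attr == e.unique_identifier or attr in e.references else ""
        cols.append(f"{attr} {sql_type}{not_null}")
    cols.append(f"PRIMARY KEY ({e.unique_identifier})")
    for attr, parent in e.references.items():
        parent_key = entities[parent].unique_identifier
        cols.append(f"FOREIGN KEY ({attr}) REFERENCES {parent} ({parent_key})")
    return f"CREATE TABLE {e.name} (\n  " + ",\n  ".join(cols) + "\n)"

countries = LogicalEntity("countries", {"country_id": "INTEGER", "country_name": "TEXT"}, "country_id")
customers = LogicalEntity("customers",
                          {"cust_id": "INTEGER", "cust_last_name": "TEXT", "country_id": "INTEGER"},
                          "cust_id", {"country_id": "countries"})
entities = {e.name: e for e in (countries, customers)}

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
for e in (countries, customers):          # parents first, so references resolve
    conn.execute(to_ddl(e, entities))

conn.execute("INSERT INTO countries VALUES (1, 'France')")
conn.execute("INSERT INTO customers VALUES (10, 'Martin', 1)")
try:
    conn.execute("INSERT INTO customers VALUES (11, 'Silva', 99)")  # no country 99
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(to_ddl(customers, entities))
```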
  • 15. Data Warehouse Physical Structures
    • Tables and partitioned tables
      – Partitioned tables enable you to split large data volumes into smaller, more manageable pieces.
      – Expect performance benefits from:
        • Partition pruning
        • Intelligent parallel processing
      – Compressed tables offer scale-up opportunities for read-only operations.
      – Table compression saves disk space.
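Partition pruning is performed by the Oracle optimizer. The Python sketch below only illustrates the idea. It assumes a hypothetical sales table range-partitioned by quarter, where each partition is defined by an exclusive upper bound in the spirit of Oracle's VALUES LESS THAN clause. A date-range query scans only the partitions whose ranges can overlap the predicate.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical quarterly partitions, each defined by an exclusive upper bound.
BOUNDS = [date(2024, 4, 1), date(2024, 7, 1), date(2024, 10, 1), date(2025, 1, 1)]
partitions: list[list[tuple[date, float]]] = [[] for _ in BOUNDS]

def partition_for(d: date) -> int:
    """Route a row to the first partition whose upper bound is greater than its key."""
    idx = bisect_right(BOUNDS, d)
    if idx == len(BOUNDS):
        raise ValueError(f"{d} is beyond the highest partition bound")
    return idx

def insert(d: date, amount: float) -> None:
    partitions[partition_for(d)].append((d, amount))

def total_between(start: date, end: date) -> tuple[float, list[int]]:
    """Sum amounts for start <= d < end, scanning only partitions that can hold such rows."""
    first = bisect_right(BOUNDS, start)                    # first partition ending after start
    last = min(bisect_left(BOUNDS, end), len(BOUNDS) - 1)  # last partition starting before end
    scanned = list(range(first, last + 1))
    total = sum(a for i in scanned for d, a in partitions[i] if start <= d < end)
    return total, scanned

for d, amount in [(date(2024, 1, 10), 100.0), (date(2024, 2, 20), 50.0),
                  (date(2024, 5, 5), 75.0), (date(2024, 8, 1), 20.0),
                  (date(2024, 11, 30), 60.0)]:
    insert(d, amount)

print(total_between(date(2024, 4, 1), date(2024, 7, 1)))  # only the second quarter is scanned
```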
  • 16. Data Warehouse Physical Structures
    – Views:
      • Are tailored presentations of the data contained in one or more tables or views
      • Do not store data themselves, so they require no space in the database beyond their definition
    – Materialized views:
      • Are query results that have been stored in advance
      • Are used transparently, like indexes, and improve performance
    – Integrity constraints:
      • Are used in data warehouses for query rewrite
    – Dimensions:
      • Are containers of logical relationships and do not require any space in the database
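SQLite has no materialized views, so this sketch substitutes a summary table built with CREATE TABLE ... AS SELECT. A use_summary flag stands in for query rewrite, which Oracle performs transparently. All names and data are hypothetical. The sketch also shows why refresh matters: until the summary is rebuilt, new base-table rows are not reflected in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (time_id TEXT, prod_id INTEGER, amount_sold REAL);
INSERT INTO sales VALUES
  ('2024-01-05', 1, 10.0), ('2024-01-20', 2, 15.0),
  ('2024-02-02', 1, 30.0), ('2024-02-14', 1, 5.0);
-- Stand-in for a materialized view: the query result is computed and stored in advance.
CREATE TABLE sales_by_month_mv AS
  SELECT substr(time_id, 1, 7) AS month, SUM(amount_sold) AS total
  FROM sales GROUP BY substr(time_id, 1, 7);
""")

def monthly_total(month: str, use_summary: bool = True) -> float:
    """Answer a monthly aggregate, 'rewriting' it to read the summary when allowed."""
    if use_summary:
        sql = "SELECT total FROM sales_by_month_mv WHERE month = ?"
    else:
        sql = "SELECT SUM(amount_sold) FROM sales WHERE substr(time_id, 1, 7) = ?"
    row = conn.execute(sql, (month,)).fetchone()
    return row[0] if row and row[0] is not None else 0.0

def refresh() -> None:
    """Complete refresh: rebuild the summary from the current base-table contents."""
    conn.executescript("""
        DELETE FROM sales_by_month_mv;
        INSERT INTO sales_by_month_mv
          SELECT substr(time_id, 1, 7), SUM(amount_sold)
          FROM sales GROUP BY substr(time_id, 1, 7);
    """)

print(monthly_total("2024-01"), monthly_total("2024-01", use_summary=False))
```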
  • 17. Managing Large Volumes of Data
    • Work smarter in your data warehouse:
      – Partitioning
      – Bitmap indexes/star transformation
      – Data compression
      – Query rewrite
    • Work harder in your data warehouse:
      – Parallelism for all operations
        • DBA tasks, such as loading, index creation, table creation, data modification, backup, and recovery
        • End-user operations, such as queries
    • Scale out across clustered servers: Real Application Clusters (RAC)
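Star transformation depends on combining bitmap indexes. The sketch below covers only that combining step, not Oracle's implementation. It assumes a few hypothetical fact rows with low-cardinality channel and region columns. Each distinct value gets one bitmap, stored as a Python int, and row i's bit is set when that row has the value. A multi-predicate filter then becomes a single bitwise AND, and only the matching rows are fetched.

```python
from collections import defaultdict

# Hypothetical fact rows: (channel, region, amount).
ROWS = [
    ("web", "EMEA", 120.0),
    ("store", "EMEA", 80.0),
    ("web", "APAC", 60.0),
    ("web", "EMEA", 40.0),
    ("store", "APAC", 90.0),
]

def build_bitmap_index(column: int) -> dict[str, int]:
    """One bitmap per distinct value: bit i is set when row i holds that value."""
    index: dict[str, int] = defaultdict(int)
    for row_id, row in enumerate(ROWS):
        index[row[column]] |= 1 << row_id
    return dict(index)

def matching_rows(bitmap: int) -> list[int]:
    """Translate set bits back into row positions to fetch from the table."""
    return [i for i in range(len(ROWS)) if (bitmap >> i) & 1]

channel_idx = build_bitmap_index(0)
region_idx = build_bitmap_index(1)

# channel = 'web' AND region = 'EMEA' becomes one bitwise AND of two bitmaps.
hits = matching_rows(channel_idx["web"] & region_idx["EMEA"])
print(hits, sum(ROWS[i][2] for i in hits))
```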
  • 18. I/O Performance in Data Warehouses
    – I/O is typically the primary determinant of data warehouse performance.
    – Data warehouse storage configurations should be sized by I/O bandwidth, not by storage capacity.
    – Every component of the I/O subsystem should provide enough bandwidth:
      • Disks
      • I/O channels
      • I/O adapters
    – In data warehouses, maximizing sequential I/O throughput is critical.
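The sizing argument comes down to simple arithmetic. The sketch uses hypothetical figures for illustration only: a 1,000 GB warehouse, 300 GB disks each sustaining 60 MB/s of sequential reads, and a 2,000 MB/s scan target. Capacity alone would call for 4 disks, but bandwidth calls for 34. The end-to-end rate is also capped by the slowest component type in the I/O path.

```python
import math

def disks_for_capacity(data_gb: float, disk_gb: float) -> int:
    """Disks needed just to hold the data."""
    return math.ceil(data_gb / disk_gb)

def disks_needed(target_mb_s: float, per_disk_mb_s: float) -> int:
    """Minimum number of disks whose combined sequential throughput meets the target."""
    return math.ceil(target_mb_s / per_disk_mb_s)

def effective_bandwidth(components: dict[str, tuple[int, float]]) -> tuple[str, float]:
    """Aggregate bandwidth per component type (count * MB/s each); the slowest one bounds the path."""
    totals = {name: count * mb_s for name, (count, mb_s) in components.items()}
    bottleneck = min(totals, key=totals.get)
    return bottleneck, totals[bottleneck]

# Hypothetical I/O path: 40 disks, 4 channels, 2 adapters.
path = {
    "disks": (40, 60.0),       # 2400 MB/s
    "channels": (4, 400.0),    # 1600 MB/s
    "adapters": (2, 1000.0),   # 2000 MB/s
}

print(disks_for_capacity(1000, 300), disks_needed(2000, 60), effective_bandwidth(path))
```

In this example the channels, not the disks, would need upgrading to reach the 2,000 MB/s target.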
  • 19. I/O Scalability
    Parallel execution:
    – Reduces response time for data-intensive operations on large databases
    – Benefits systems with the following characteristics:
      • Multiprocessors, clusters, or massively parallel systems
      • Sufficient I/O bandwidth
      • Sufficient memory to support memory-intensive processes such as sorts, hashing, and I/O buffers
    [Diagram: a coordinator dispatches work to query servers Q1–Q4, organized as scanners and sorters (aggregators), which scan the data on disk and sort it.]
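The coordinator, scanner, and sorter roles in the diagram can be mimicked with a thread pool. The degree of parallelism, the table, and the filter are hypothetical. CPython threads show the structure of the dataflow here, not a real CPU speedup. Scanners filter granules of the table. Their rows are then redistributed by key range to sorters, so concatenating the sorted ranges in order gives a fully sorted result.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

Row = tuple[int, float]  # (cust_id, amount)

KEY_RANGE = 8  # hypothetical: cust_id values fall in [0, KEY_RANGE)
TABLE: list[Row] = [(i % KEY_RANGE, float(i)) for i in range(1, 41)]
DOP = 4        # assumed degree of parallelism: four scanners and four sorters

def scan(granule: list[Row]) -> list[Row]:
    """Scanner (producer): read one granule of the table and apply the filter."""
    return [row for row in granule if row[1] > 10.0]

def sort_rows(rows: list[Row]) -> list[Row]:
    """Sorter (consumer): sort the rows for the key range it owns."""
    return sorted(rows)

def coordinator(table: list[Row], dop: int) -> list[Row]:
    """Split the table into granules, run scanners then sorters, and gather the result."""
    granules = [table[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        scanned = list(pool.map(scan, granules))
        # Redistribute by key range: sorter k owns a contiguous slice of cust_id values.
        ranges: list[list[Row]] = [[] for _ in range(dop)]
        for row in chain.from_iterable(scanned):
            ranges[row[0] * dop // KEY_RANGE].append(row)
        sorted_ranges = list(pool.map(sort_rows, ranges))
    return list(chain.from_iterable(sorted_ranges))

result = coordinator(TABLE, DOP)
print(result[:5])
```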
  • 20. I/O Scalability
    • Automatic Storage Management (ASM)
      – Configuring storage for a database depends on many variables:
        • Which data to put on which disk
        • Logical unit number (LUN) configurations
        • Database types and workloads: data warehouse, OLTP, DSS
        • Trade-offs between available options
      – ASM provides solutions to storage issues encountered in data warehouses.
  • 21. I/O Scalability
    • Automatic Storage Management: Overview
      – Portable and high-performance cluster file system
      – Manages Oracle database files
      – Data spread across disks to balance load
      – Integrated mirroring across disks
      – Solves many storage management challenges
    [Diagram: storage stack of application, database, file system, volume manager, and operating system, with ASM shown alongside the file system and volume manager layers.]
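Spreading data and mirroring can be illustrated with a deliberately simplified sketch, which is not ASM's actual allocation algorithm. It assumes four hypothetical disks in two failure groups. A file's extents are striped across the disks, and each extent's mirror copy goes to the other failure group. Losing one whole group therefore loses no data, and every disk carries the same share of the load.

```python
# Hypothetical layout: disk -> failure group (for example, one group per controller).
DISKS = {"d1": "fg_a", "d2": "fg_a", "d3": "fg_b", "d4": "fg_b"}

def place_extents(n_extents: int) -> list[tuple[str, str]]:
    """Return (primary_disk, mirror_disk) for each extent of a file."""
    group_a = [d for d, g in DISKS.items() if g == "fg_a"]
    group_b = [d for d, g in DISKS.items() if g == "fg_b"]
    placement = []
    for i in range(n_extents):
        # Alternate which failure group holds the primary copy, and stripe round-robin
        # within each group, so reads and writes spread evenly over all disks.
        primaries, mirrors = (group_a, group_b) if i % 2 == 0 else (group_b, group_a)
        primary = primaries[(i // 2) % len(primaries)]
        mirror = mirrors[(i // 2) % len(mirrors)]
        placement.append((primary, mirror))
    return placement

for extent, (primary, mirror) in enumerate(place_extents(8)):
    print(f"extent {extent}: primary={primary} mirror={mirror}")
```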
  • 22. Visit more self-help tutorials
    – Pick a tutorial of your choice and browse through it at your own pace.
    – The tutorials section is free and self-guided, and it does not include any additional support.
    – Visit us at www.dataminingtools.net