Mohammed AlSolh
Supervised By: Dr.Ghassan Qadah

Survey On Temporal Data And
Change Management in Data
Warehouses
Outline
• Terminologies
• Explore Methods
• Conclusion
Terminologies
Data Warehouse & Data Marts
• A data warehouse contains data from several databases maintained by different business units, together with historical and summary information
• It is a database used for reporting and data analysis, where the data is arranged into hierarchical groups, often called dimensions, and represented as facts and aggregate facts
• Data warehouses can be subdivided into data marts, which store subsets of the data from a warehouse
Temporal database

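A temporal database is a database with built-in support for handling data involving time. Temporal data is data that keeps track of changes over time.

It distinguishes the following time aspects
• Valid time: the period during which a fact is true in the real world
• Transaction time: the period during which a fact is stored as current in the database
• Bitemporal data combines both valid time and transaction time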
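The distinction between valid time and transaction time can be sketched with a small bitemporal table. This is a minimal illustration, not taken from the slides: the row layout, the `FOREVER` marker, and the `as_of` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # assumed marker for an open-ended interval


@dataclass(frozen=True)
class BitemporalRow:
    key: str
    value: str
    valid_from: date  # valid time: when the fact holds in the real world
    valid_to: date
    tx_from: date     # transaction time: when the row was current in the database
    tx_to: date


def as_of(rows, key, valid_on, known_on):
    """Return the value valid on `valid_on`, as the database knew it on `known_on`."""
    for r in rows:
        if (r.key == key
                and r.valid_from <= valid_on < r.valid_to
                and r.tx_from <= known_on < r.tx_to):
            return r.value
    return None


rows = [
    # Recorded on 2020-01-10: the customer lives in Paris from 2020-01-01 on.
    BitemporalRow("c1", "Paris", date(2020, 1, 1), FOREVER,
                  date(2020, 1, 10), date(2020, 3, 5)),
    # Recorded on 2020-03-05: the customer actually moved to Lyon on 2020-02-01.
    BitemporalRow("c1", "Paris", date(2020, 1, 1), date(2020, 2, 1),
                  date(2020, 3, 5), FOREVER),
    BitemporalRow("c1", "Lyon", date(2020, 2, 1), FOREVER,
                  date(2020, 3, 5), FOREVER),
]
```

Asking for the city valid on 2020-02-15 gives a different answer depending on whether the database is consulted before or after the correction of 2020-03-05, which is exactly what the two time dimensions make possible.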
Multidimensional data model

The multidimensional data model is designed to answer complex queries in real time. It produces a cube, which is like a 3D spreadsheet. It represents its information as dimensions and facts, which are usually maintained in a star schema
Multidimensional Model Terms
• Fact
– A business performance measurement, typically numeric and additive

• Dimension
– An object whose attributes allow the user to explore the measures from different perspectives of analysis

• Measures
– The numeric values stored in the fact table

• Hierarchy
– A collection of dimension levels that form a hierarchy, such as country/state/city

• Property
– An additional descriptive attribute of a dimension
Multidimensional Model Operations
• Drill-Down/Roll-Up (Drill-Up): moving from a summary category to individual categories and vice versa
• Pivot (rotate): cities were laid out vertically and products horizontally; after the pivot the two dimensions are swapped
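The two operations can be sketched on a tiny in-memory fact table. The rows, the level names, and the helper functions below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, product, sales)
facts = [
    ("UAE", "Dubai", "Laptop", 10),
    ("UAE", "Dubai", "Phone", 5),
    ("UAE", "Sharjah", "Laptop", 3),
    ("Lebanon", "Beirut", "Phone", 7),
]


def roll_up(rows, level):
    """Aggregate sales per product at the 'city' or 'country' level."""
    idx = {"country": 0, "city": 1}[level]
    totals = defaultdict(int)
    for row in rows:
        totals[(row[idx], row[2])] += row[3]
    return dict(totals)


def pivot(table):
    """Swap the two dimensions of a {(row_dim, col_dim): value} table."""
    return {(col, row): value for (row, col), value in table.items()}
```

Calling `roll_up(facts, "city")` gives the detailed view, `roll_up(facts, "country")` the summarized one, and `pivot` only changes which dimension is read as rows and which as columns.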
Schema & Data Changes
• Database schema changes
– With loss of data, by simply changing the schema
– Without loss of data: the data is evolved to the new schema while the attributes of the old schema are kept (evolution)
– Without loss of data: schemas change while the old schema versions are kept (versioning)

• Data changes in warehouses
– Transient data: deletions and updates are applied without maintaining the old data
– Periodic data: deletions and updates are handled by adding new records
– Semi-periodic data: same as periodic, but only a recent collection of changes is kept
– Snapshots of the complete data, which are popular in data marts
Materialized Views
• View maintenance
– The process of updating a materialized view in response to changes to the underlying data

• View adaptation
– Aims to reuse the previously materialized view to generate the new view after the view definition changes, since rebuilding the materialized view from scratch may be expensive

• Materialized view
– A stored view query result, which acts like a cache
– Used by the query optimizer to speed up querying
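View maintenance can be illustrated with a materialized SUM ... GROUP BY view that is updated incrementally instead of being recomputed. This is a generic sketch under assumed names (`SalesByCityView`, `on_insert`, `on_delete`), not the method of any surveyed paper.

```python
class SalesByCityView:
    """Materialized view: SUM(amount) GROUP BY city over a base table."""

    def __init__(self, base_rows):
        self._groups = {}  # city -> [sum, row count]
        for city, amount in base_rows:  # initial full computation
            self.on_insert(city, amount)

    def on_insert(self, city, amount):
        group = self._groups.setdefault(city, [0, 0])
        group[0] += amount
        group[1] += 1

    def on_delete(self, city, amount):
        group = self._groups[city]
        group[0] -= amount
        group[1] -= 1
        if group[1] == 0:  # the last row of the group is gone
            del self._groups[city]

    def result(self):
        return {city: total for city, (total, _count) in self._groups.items()}


view = SalesByCityView([("Dubai", 10), ("Dubai", 5), ("Beirut", 7)])
view.on_insert("Beirut", 3)
view.on_delete("Dubai", 5)
```

The row count is kept next to the sum so that a group can be dropped when its last base row is deleted; each change costs constant time regardless of the size of the base table.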
Explore Methods
A SchemaGuide for Accelerating the View Adaptation
Process

• An efficient process for view adaptation in XML databases, built on a fragment-based view representation: materialized data is segmented into fragments, and algorithms update only those materialized fragments that have been affected by the view definition changes

– Their adaptation process
• Call an optimized containment check to find the most suitable fragment that contains the requested fragment
• Adapt the XFM structure to the fragments found
• Find a materialized fragment that is affected by the change
• Search for existing materialized fragments that can be reused and mapped to the affected materialized fragment

– It has shown significant
improvement by reducing up to
2.6% of recomposing the
materialized view
Multi-version Data Warehouse

1. Automatic detection of structural and content changes in the data sources, and their reflection in the data warehouse by keeping a sequence of persistent versions

• Their solution supports
– monitoring external data sources with respect to content and structural changes
– automatic generation of the processes that monitor the external data sources
– applying discovered external data source changes to a selected DW version
– describing the structure of every DW version
– querying multiple DW versions at the same time and presenting the results coming from multiple versions
– visualizing the schema
MaSM (Materialized Sort Merge)

2. Efficient Online Updates in Data Warehouses
• An approach for supporting online updates by using SSDs to cache incoming updates
• Models query processing with differential updates as a type of outer join between the data residing on disks and the updates residing on SSDs
• Presents algorithms for performing such joins and for periodic migrations; for example, the updates are migrated to disks only when the system load is low or when the updates reach a certain threshold (e.g., 90%) of the SSD size
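The outer-join view of query processing can be sketched as a merge of two key-sorted streams. This is a simplified illustration, not the MaSM algorithms themselves: the update format, the one-update-per-key restriction, and the function names are assumptions.

```python
def merge_with_updates(base, updates):
    """Merge key-sorted base rows with key-sorted cached updates.

    base:    list of (key, value), sorted by key (data on disk)
    updates: list of (key, op, value), sorted by key, at most one per key
             (cached updates); op is "insert", "update" or "delete"
    """
    result, i, j = [], 0, 0
    while i < len(base) or j < len(updates):
        if j == len(updates) or (i < len(base) and base[i][0] < updates[j][0]):
            result.append(base[i])  # no pending update for this row
            i += 1
        else:
            key, op, value = updates[j]
            if i < len(base) and base[i][0] == key:
                i += 1  # the base row is superseded by the cached update
            if op != "delete":
                result.append((key, value))
            j += 1
    return result


def should_migrate(cached_bytes, ssd_bytes, load_is_low, threshold=0.9):
    """Migration policy from the slide: low load, or cache above the threshold."""
    return load_is_low or cached_bytes >= threshold * ssd_bytes


base = [(1, "a"), (3, "c"), (5, "e")]
updates = [(2, "insert", "b"), (3, "update", "C"), (5, "delete", None)]
merged = merge_with_updates(base, updates)
```

Because both inputs are sorted, a query sees fresh data in a single sequential pass while the base data on disk stays untouched until the next migration.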
Data changes in the data mart

• Changes can be applied first to the warehouse and then to the data marts under it

• Changes in a data mart are divided into
– dimensional data changes
– factual data changes
– schema changes
Dimensional Data Changes

• Changes in a hierarchy, which can affect a dimension, a level, or a property

• Kimball proposes three solutions to changes in ROLAP multidimensional models
ROLAP multidimensional models
• Type I solution: simply overwrite old tuples in dimension tables with new data. This keeps the data mart up to date, but changes cannot be tracked
• Type II solution: each change produces a new record in the dimension table. Surrogate keys must be used; changes are kept and can be tracked along with the new data
• Type III solution: augment the schema of the dimension table so that it represents both the current and the previous value of each level or attribute subject to change
• Type VI (I+II+III) keeps the complete history; the more data is kept in the hierarchy, the more expensive it becomes, and additional timestamps are needed
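The three basic solutions can be sketched on a dimension table held as a list of dictionaries. The column names (`sk`, `customer_id`, `city`, `valid_from`, `valid_to`, `previous_city`) are hypothetical and chosen only for this illustration.

```python
from datetime import date


def scd_type1(dim, natural_key, new_city):
    """Type I: overwrite in place; the old value is lost."""
    for row in dim:
        if row["customer_id"] == natural_key:
            row["city"] = new_city


def scd_type2(dim, natural_key, new_city, change_date):
    """Type II: close the current row and append a new row with a new surrogate key."""
    for row in dim:
        if row["customer_id"] == natural_key and row["valid_to"] is None:
            row["valid_to"] = change_date
    next_sk = max(row["sk"] for row in dim) + 1
    dim.append({"sk": next_sk, "customer_id": natural_key, "city": new_city,
                "valid_from": change_date, "valid_to": None})


def scd_type3(dim, natural_key, new_city):
    """Type III: keep the current and the previous value in two columns."""
    for row in dim:
        if row["customer_id"] == natural_key:
            row["previous_city"] = row["city"]
            row["city"] = new_city


def new_dim():
    return [{"sk": 1, "customer_id": "c1", "city": "Dubai",
             "valid_from": date(2020, 1, 1), "valid_to": None}]
```

Type II is the only one of the three that grows the table, which is why it needs a surrogate key (`sk`) distinct from the natural key and, as in the Type VI combination, timestamps to tell the versions apart.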
Changes in Factual data

• Changes happen, for example, when measurements contain errors, such as sea levels that were captured incorrectly and fixed later
• Facts are classified by their conceptual role into flow facts and stock facts
Managing Late Measurements In Data Warehouses

• A proposal to couple valid time and transaction time, distinguishing two solutions for managing late measurements
• Flow model (delta solution): each new measurement for an event is represented as a delta (current registration – previous registration) with respect to the previous measurement. Transaction time is modeled by adding to the schema a new temporal dimension that represents when each registration was made in the data mart. Up-to-date queries are answered by summing all registrations for each event; historical queries are answered by selectively summing the registrations of an event made up to the time queried
• Stock model (consolidated solution): late measurements are represented by recording the consolidated value for the event. Transaction time is modeled by two temporal dimensions (two timestamps) that delimit the time interval during which each registration is current (its currency interval)
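The delta solution can be sketched as follows. The tuple layout and the function names are assumptions made for illustration; only the summing logic follows the description above.

```python
from datetime import date

# Each registration: (event_id, valid_date, registration_date, delta)
registrations = [
    ("e1", date(2024, 1, 5), date(2024, 1, 6), 100),  # first measurement: 100
    ("e1", date(2024, 1, 5), date(2024, 2, 1), -20),  # late correction: 80 - 100
    ("e2", date(2024, 1, 7), date(2024, 1, 8), 50),
]


def up_to_date(regs, event_id):
    """Current value of an event: sum of all its registrations."""
    return sum(delta for e, _valid, _reg, delta in regs if e == event_id)


def historical(regs, event_id, as_of):
    """Value of an event as the data mart knew it on `as_of`."""
    return sum(delta for e, _valid, reg, delta in regs
               if e == event_id and reg <= as_of)
```

In the consolidated solution the second row would instead store the value 80 together with the two timestamps delimiting its currency interval, and the first row's interval would be closed on 2024-02-01.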
Schema Changes

• Two approaches are followed for handling schema changes:
– schema evolution: old information is maintained without data loss, but the old schema is lost
– schema versioning: separate versions are stored and the user can access different schema versions
Schema Evolution

• Propagating the Evolution of Data Warehouse on
Data Marts
• operators to support changing the data mart schema
– evolution operators for the data warehouse
• basic operations and composite operations

– evolution operators for the data mart
• mapping function

– a set of rules for the evolution
Propagating the Evolution of Data Warehouse on
Data Marts

• The mapping functions are embedded into the Extract-Transform-Load (ETL) process from the data warehouse to the data mart, for example:
• Fact(Table): returns a set of facts from the data warehouse tables
• Dim(Table): returns a superset of dimensions from the data warehouse table; each superset contains all dimensions of the data mart cube
Propagating the Evolution of Data Warehouse on
Data Marts

Propagation operations
• Add_Dim(Dname, Fi, T): adds a new dimension named “Dname” to the data mart fact “Fi”; it takes the primary key of table T from the data warehouse and a subset of the textual attributes contained in T
• Add_Fact(Fname, T, set(D)): adds a new fact “Fname” with dimensions set(D); the fact measures are the numeric attributes of T
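A rough sketch of the two operators on dictionary-based metadata is given below. The data structures are assumptions for illustration (the slides do not describe the paper's implementation), and `add_dim` simply takes all textual attributes rather than a chosen subset.

```python
def add_dim(mart, dname, fact_name, table):
    """Add_Dim(Dname, Fi, T): new dimension from T's key and textual attributes."""
    attributes = [col for col, kind in table["columns"].items() if kind == "text"]
    mart["dimensions"][dname] = {"key": table["primary_key"],
                                 "attributes": attributes}
    mart["facts"][fact_name]["dimensions"].append(dname)


def add_fact(mart, fname, table, dims):
    """Add_Fact(Fname, T, set(D)): new fact whose measures are T's numeric attributes."""
    measures = [col for col, kind in table["columns"].items() if kind == "numeric"]
    mart["facts"][fname] = {"measures": measures, "dimensions": list(dims)}


# Hypothetical data mart metadata and warehouse tables (non-key columns only).
mart = {"dimensions": {},
        "facts": {"Sales": {"measures": ["amount"], "dimensions": ["Product"]}}}
store = {"primary_key": "store_id",
         "columns": {"store_name": "text", "region": "text", "area_m2": "numeric"}}
returns = {"primary_key": "return_id",
           "columns": {"quantity": "numeric", "refund": "numeric", "reason": "text"}}

add_dim(mart, "Store", "Sales", store)
add_fact(mart, "Returns", returns, ["Product", "Store"])
```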
Propagating the Evolution of Data Warehouse on
Data Marts

A set of rules applies to the data warehouse to data mart mapping process, such as:
• If a table T to be added to the data warehouse has foreign keys in another existing data warehouse table that concerns a fact, T adds a new dimension to that fact, and the attributes of T become the attributes of the dimension
• If T has no foreign keys in another table in the data warehouse, has foreign keys pointing to other tables that load dimensions in the data mart, and has numeric attributes, then T will probably create a new fact
Note: Commercially, SQL Compare and Oracle Change Management Pack support evolving schemas: they compare schemas and generate scripts
Schema Versioning
• Decision makers may have based their decisions on an old schema, and changes, possibly including measure changes, may have appeared after they executed their queries. To run the same query again and obtain the same result, non-volatility is required. Changes at the schema level therefore call for a versioning approach
Schema Versioning
• A comprehensive approach to versioning is presented in the multiversion data warehouse [31]. The authors propose two metamodels: one for managing a multi-version data mart and one for detecting changes in the operational sources. Besides “real” versions, which are the versions used in the application domain, they introduce “alternative” versions, which are used for simulating and managing hypothetical business scenarios within what-if analysis settings
• Commercially, several database management systems
(DBMSs) offer support for valid and transaction time: Oracle
11g, IBM DB2 10 for z/OS, and Teradata Database 14. Part 2
(SQL Foundation) of SQL:2011 was just released
Querying temporal data
• Cross-version querying
– The multiversion data warehouse allows users to specify either a time interval for the data warehouse or the versions to query

• Temporal querying
– Temporal queries on Teradata
• Native temporal implementation
• Rewriting approach
Querying temporal data

Disadvantages of the native approach
– Since temporal data is stored in a new data type, SQL execution code needs to be modified for joins and aggregation on temporal data
– Query optimization needs to be adapted to the new temporal tables
– Some duplication might occur in the code of DBMS functions to support temporal data
Rewrite approach
– There is no impact on execution code
– There is a small impact on the optimizer
– No duplication, as it adds a step before the query optimizer
– But it adds complexity to the query structure
Rewrite approach
– Rewrites modify projection, selection and join
• SELECT *, for example, excludes the time dimension if the qualifier is CURRENT
• For the CURRENT and SEQUENCED qualifiers, the corresponding time predicate is added
• For joins, temporal qualifiers are applied before the join. This works directly for inner joins; for outer joins the qualifiers must be applied to each table separately, and the join is then performed on the derived tables

– Their study showed that rewrites added only 5% to the execution time of the query in comparison to the native implementation
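The rewriting idea for the CURRENT qualifier can be sketched as plain string generation. This is not Teradata's rewriter or syntax: the period column names `valid_start` and `valid_end`, the half-open interval convention, and the function names are assumptions for illustration.

```python
def current_predicate(prefix=""):
    """Time predicate selecting the rows that are valid today (assumed columns)."""
    return (f"{prefix}valid_start <= CURRENT_DATE "
            f"AND {prefix}valid_end > CURRENT_DATE")


def rewrite_current_select(table, columns):
    """CURRENT query: drop the time columns from the projection, add the predicate."""
    visible = [c for c in columns if c not in ("valid_start", "valid_end")]
    return f"SELECT {', '.join(visible)} FROM {table} WHERE {current_predicate()}"


def rewrite_current_left_join(left, right, key):
    """Outer join: qualify each table in its own derived table, then join."""
    derived_left = f"(SELECT * FROM {left} WHERE {current_predicate()}) AS l"
    derived_right = f"(SELECT * FROM {right} WHERE {current_predicate()}) AS r"
    return (f"SELECT * FROM {derived_left} "
            f"LEFT JOIN {derived_right} ON l.{key} = r.{key}")


select_sql = rewrite_current_select(
    "policy", ["id", "premium", "valid_start", "valid_end"])
join_sql = rewrite_current_left_join("policy", "claim", "policy_id")
```

Filtering inside the derived tables matters for the outer join: putting the right-hand table's predicate in the outer WHERE clause would discard the unmatched left rows that the outer join is meant to keep.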
Fusion Cubes

• A framework to support self-service business intelligence with multidimensional cubes that can be dynamically extended both in their schema and in their instances

• It can include both stationary and situational data
Situational Query
• A user poses an OLAP-like situational query, one that
cannot be answered on stationary data only
• The system discovers potentially relevant data sources
• The system fetches relevant situational data from selected
sources
• The system integrates situational data with the user’s data,
if any
• The system visualizes the results and the user employs
them for making her decision
• The user stores and shares the results
Situational Query
[Figure: fusion cube architecture]
Situational Query

• External data sources to be integrated can be
– In RDF format, to be integrated into the data warehouse on the fly
– Social networks; there is an implementation called MicroStrategy that analyses social networks
– Blogs, where “opinion mining” can be done, although integration is challenging because the data is unstructured
Drill Beyond

• The drill-beyond operator can go beyond:
– The schema: a user can click on a dimension or a fact that is not available
– The instances: a user can request a new instance for an attribute, such as a new country, so that its values are retrieved

• Query formulation can involve different technologies for situational data:
– SPARQL for querying RDF data
– Web APIs that provide data in XML or JSON format
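Drilling beyond both the schema and the instances can be sketched with a JSON payload standing in for a Web API response. The cube layout, the `risk_score` attribute, and all values below are made up for illustration.

```python
import json

# Stationary data: sales per country (hypothetical values).
stationary = {"UAE": {"sales": 120}, "Lebanon": {"sales": 45}}

# Stand-in for the body of a Web API response carrying situational data.
payload = ('[{"country": "UAE", "risk_score": 2}, '
           '{"country": "Jordan", "risk_score": 4}]')


def drill_beyond(cube, json_payload, attribute):
    """Return a copy of the cube extended with one situational attribute."""
    fused = {member: dict(cells) for member, cells in cube.items()}
    for record in json.loads(json_payload):
        cells = fused.setdefault(record["country"], {})  # beyond the instances
        cells[attribute] = record[attribute]             # beyond the schema
    return fused


fused = drill_beyond(stationary, payload, "risk_score")
```

The stationary cube is copied rather than modified, reflecting that situational data is fetched for one analysis and does not have to become part of the warehouse.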
Integration
• Once the data is available, it has to be integrated with the stationary data to form the fusion cubes
– Extract the structure of the different situational data
• Google Fusion Tables (Gonzalez et al., 2010) offers cloud-based storage of basic relational tables that can be shared with others, annotated, and fused with other data; this helps in extracting relations from unstructured data

– Map the schemas of the data sources
• XFM for example

– Reconcile with the stationary data to form the fusion cube
• Google Refine (code.google.com/p/google-refine/), which provides functionality for working with messy data: cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases
Support On Commercial Systems

• There are many commercial systems available to support
the fusion cubes:
• illo system (illo.com) allows non-business users to store
small cubes in a cloud-based environment, analyze the data
using powerful mechanisms, and provide visual analysis
results that can be shared with others and annotated
• The Stratosphere project (www.stratosphere.eu), which
explores the power of massively parallel computing for Big
Data analytics
Conclusion

• We have explored different methodologies for handling temporal data and changes of schema, factual data, and dimensional data in data warehouses. Researchers are coming up with bright ideas that address different scenarios, facilitate dynamic features in a data warehouse, and speed up its query processing. We encourage commercial systems to look into these new methodologies and make them available for practical use by the public
Conclusion
[Comparison table, flattened in extraction; the YES marks cannot be assigned to individual cells.]
Methods compared (columns):
• Propagating the Evolution of Data Warehouse on Data Marts
• A SchemaGuide for Accelerating the View Adaptation Process
• ROLAP multidimensional models
• Managing Late Measurements In Data Warehouses
• MaSM (Materialized Sort Merge)
• Multi-version Data Warehouse
• Temporal Queries On Teradata
• Fusion Cubes
Criteria (rows):
• View adaptation
• DW in sync with the sources
• Changes in the data mart: dimensional data changes, factual data changes
• Schema changes: schema evolution, schema versioning
• Software prototype
Issues & Future Work

• There is a need to formulate queries that can span different database schemas and produce results at once. We suggest using an extra attribute in the data warehouse to store heterogeneous and non-normalized data in an XML format, which is highly extensible.
• The data warehouse design needs to support dynamic schema updates without physically modifying or replicating the current store. As above, an XML extension can help in storing attributes that change frequently. Although this might be a drawback for performance, fixed dimensions and properties can be identified, structured and queried appropriately; a second query-processing phase can then explore the XML data and embed it in the multidimensional form
• Queries that span different versions in a version-based schema can be time consuming. Versioning could be restructured to store unchanged values in a different version than the changed values, which can significantly reduce the space and time requirements and make querying more feasible
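The suggested XML extension attribute could look roughly like this. It is a sketch of the idea only: the `extra` column, the `<attrs>` element, and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# Dimension rows with fixed, structured columns plus one extensible XML column.
rows = [
    {"product_id": 1, "name": "Laptop",
     "extra": "<attrs><warranty>24</warranty></attrs>"},
    {"product_id": 2, "name": "Phone",
     "extra": "<attrs><color>black</color><warranty>12</warranty></attrs>"},
    {"product_id": 3, "name": "Desk", "extra": "<attrs/>"},
]


def extra_attribute(row, name):
    """Second query phase: look up a frequently changing attribute in the XML column."""
    node = ET.fromstring(row["extra"]).find(name)
    return None if node is None else node.text


def query(rows, name):
    """Combine a fixed column (first phase) with an XML attribute (second phase)."""
    return {row["name"]: extra_attribute(row, name) for row in rows}
```

New attributes such as `color` appear without any change to the table schema, at the cost of parsing XML at query time, which is the performance drawback noted above.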
Main References
• [22] Fusion Cubes (Felix Naumann & Stefano Rizzi, 2013)
• [30] MaSM: Efficient Online Updates in Data Warehouses, http://www.cs.cmu.edu/~chensm/papers/MaSM-sigmod11.pdf
• [37] Managing Late Measurements In Data Warehouses (Matteo Golfarelli & Stefano Rizzi, 2007)
• [38] A SchemaGuide for Accelerating the View Adaptation Process (Jun Liu, Mark Roantree, and Zohra Bellahsene, 2010)
• [35] Temporal Query Processing in Teradata (Mohammed Al-Kateb, Ahmad Ghazal, Alain Crolotte, 2013)
• [33] Toward Propagating the Evolution of Data Warehouse on Data Marts (Saïd Taktak and Jamel Feki, 2012)
• [31] Wrembel and Bebel (2007)
Supporting References

[1] Ramakrishnan (DBMS 3rd ed) Chapter 25
[2] http://en.wikipedia.org/wiki/Data_warehouse
[3] Introduction to Information Systems (Marakas & O'Brien 2009)
[4] Kimball, The Data Warehouse Toolkit 2nd Ed (2002) Chapter 1
[5] http://en.wikipedia.org/wiki/Temporal_database
[6] http://www.olapcouncil.org/research/glossaryly.htm
[7] http://en.wikipedia.org/wiki/OLAP_cube
[8] http://docs.oracle.com/cd/B12037_01/olap.101/b10333/multimodel.htm
[9] Multidimensional Database Technology: Bach Pedersen, Torben; S. Jensen, Christian (December 2001).
[10] TSQL2 Language Specification https://cs.arizona.edu/~rts/initiatives/tsql2/finalspec.pdf
[11] Sybase Infocenter
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00269.1571/doc/html/bde1279401694270.html
[12] (Roddick, 1995)
[13] (Grandi, 2002)
[15] (Devlin, 1997)
[16] "Information technology -- Database languages -- SQL -- Part 2: Foundation (SQL/Foundation)," International Standards Organization,
December 2011
[18] Gupta Maintenance of Materialized Views: Problems, Techniques, and Applications
[20] De Amo & Halfeld Ferrari Alves (2000)
[21] Avoiding re-computation: View adaptation in data warehouses (1997) M Mohania
[23] http://en.wikipedia.org/wiki/Resource_Description_Framework
Questions?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Survey On Temporal Data And Change Management in Data Warehouses

  • 1. Mohammed AlSolh Supervised By: Dr.Ghassan Qadah Survey On Temporal Data And Change Management in Data Warehouses
  • 4. Data Warehouse & Data Marts • A data warehouse contains data from several databases maintained by different business units, together with historical and summary information • It is a database used for reporting and data analysis, where the data is arranged into hierarchical groups, often called dimensions, and represented as facts and aggregate facts • Data warehouses can be subdivided into data marts, which store subsets of the data in a warehouse
  • 5. Temporal database • A temporal database is a database with built-in support for handling data involving time. Temporal data is data that keeps track of changes over time • It contains the following time attributes – Valid time – Transaction time – Bitemporal data, which combines both valid and transaction time
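To make the difference between valid time and transaction time concrete, here is a minimal Python sketch of a bitemporal lookup. The `BitemporalRow` layout, the `as_of` helper, and the sample data are assumptions made for illustration; they are not taken from the slides or from any particular DBMS.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # stands for "until further notice"


@dataclass(frozen=True)
class BitemporalRow:
    key: str
    value: str
    valid_from: date  # valid time: when the fact holds in the real world
    valid_to: date
    tx_from: date     # transaction time: when the database held this row
    tx_to: date


def as_of(rows, key, valid_on, known_on):
    """Return the value valid on `valid_on` as the database knew it on `known_on`."""
    for r in rows:
        if (r.key == key
                and r.valid_from <= valid_on < r.valid_to
                and r.tx_from <= known_on < r.tx_to):
            return r.value
    return None


# Hypothetical history: on 2020-03-05 we learn that Alice moved on 2020-02-01.
rows = [
    BitemporalRow("alice", "Dubai", date(2020, 1, 1), FOREVER,
                  date(2020, 1, 10), date(2020, 3, 5)),
    BitemporalRow("alice", "Dubai", date(2020, 1, 1), date(2020, 2, 1),
                  date(2020, 3, 5), FOREVER),
    BitemporalRow("alice", "Sharjah", date(2020, 2, 1), FOREVER,
                  date(2020, 3, 5), FOREVER),
]

believed_in_february = as_of(rows, "alice", date(2020, 2, 15), date(2020, 2, 20))
believed_in_april = as_of(rows, "alice", date(2020, 2, 15), date(2020, 4, 1))
```

Both lookups ask about the same valid-time instant (15 February) but at different transaction times, so the first returns the value the database held before the correction and the second returns the corrected one.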
  • 6. Multidimensional data model • The multidimensional data model is designed to answer complex queries in real time. It produces a cube, which is like a 3D spreadsheet. It represents its information as dimensions and facts, which are usually maintained in a star schema
  • 7. Multidimensional Model Terms • Fact – A business performance measurement, typically numeric and additive • Dimension – An object whose attributes allow the user to explore the measures from different perspectives of analysis • Measures – The numeric values stored in the fact table • Hierarchy – A collection of dimension levels that form a hierarchy, such as country/state/city • Property – An additional descriptive attribute of a dimension
  • 8. Multidimensional Model Operations • Drill-up/Drill-down: moving from individual categories to a summary category (drill-up, also called roll-up) and vice versa (drill-down) • Pivot: in the example, cities were laid out vertically and products horizontally; after the pivot the two dimensions are swapped
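Moving between a summary level and a detailed level can be sketched as re-aggregating the same facts at a different level of the hierarchy. The fact rows, the `LEVELS` mapping, and the `aggregate` helper below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, product, quantity)
FACTS = [
    ("UAE", "Dubai", "laptop", 5),
    ("UAE", "Dubai", "phone", 3),
    ("UAE", "Sharjah", "laptop", 2),
    ("Jordan", "Amman", "phone", 4),
]

# Position of each dimension level inside a fact row.
LEVELS = {"country": 0, "city": 1, "product": 2}


def aggregate(facts, group_by):
    """Sum the quantity measure for every combination of the given levels."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[LEVELS[name]] for name in group_by)
        totals[key] += row[3]
    return dict(totals)


by_city = aggregate(FACTS, ["country", "city"])  # detailed (drilled-down) view
by_country = aggregate(FACTS, ["country"])       # summary (drilled-up) view
```

Because the measure is additive, the country totals are exactly the sums of the city totals beneath them.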
  • 9. Schema & Data Changes • Database schema changes – With loss of data, by simply changing the schema – Without loss of data, where the data is evolved while keeping the attributes of the old schema (evolution) – Without loss of data, by changing the schema while keeping the old schema versions (versioning) • Data changes in warehouses – Transient data: deletions and updates are applied without maintaining the old data – Periodic data: deletions and updates are handled by adding new records – Semi-periodic data: the same as periodic, but only a recent collection of changes is kept – Snapshots of the complete data, which is popular in data marts
  • 10. Materialized Views • View maintenance – The process of updating a materialized view in response to changes to the underlying data • View adaptation – Aims to leverage the previously materialized view to generate the new view, since rebuilding the materialized view from scratch may be expensive • Materialized view – A stored view query result, which acts like a cache – Used by the query optimizer to speed up querying
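View maintenance can be illustrated with a small incremental sketch: instead of recomputing a grouped sum from the base table, each insert or delete is applied as a delta to the stored result. The `SumView` class is a hypothetical illustration, not an API of any database system.

```python
from collections import defaultdict


class SumView:
    """Stored result of: SELECT product, SUM(qty) FROM sales GROUP BY product."""

    def __init__(self, base_rows):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)  # rows per group, so empty groups can be dropped
        for product, qty in base_rows:
            self.apply_insert(product, qty)

    def apply_insert(self, product, qty):
        """Maintain the view for one row inserted into the base table."""
        self.totals[product] += qty
        self.counts[product] += 1

    def apply_delete(self, product, qty):
        """Maintain the view for one row deleted from the base table."""
        self.totals[product] -= qty
        self.counts[product] -= 1
        if self.counts[product] == 0:
            del self.totals[product]
            del self.counts[product]

    def result(self):
        return dict(self.totals)


view = SumView([("laptop", 2), ("phone", 3)])
view.apply_insert("laptop", 5)
view.apply_delete("phone", 3)
maintained = view.result()
```

Keeping a row count per group is a design choice of this sketch: it lets a group disappear only when its last base row is deleted, even if the remaining quantities happen to sum to zero.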
  • 13. A SchemaGuide for Accelerating the View Adaptation Process • An efficient process for view adaptation in XML databases, built on a fragment-based view representation: materialized data is segmented into fragments, and algorithms update only those materialized fragments that have been affected by the view definition changes – Their adaptation process • Call an optimized containment check to find the most suitable fragment that contains the requested fragment • Adapt the XFM structure to the fragments found • Find a materialized fragment that is affected by the change • Search for existing materialized fragments that can be reused and mapped to the affected materialized fragment – It has shown significant improvement by reducing up to 2.6% of recomposing the materialized view
  • 15. Multi-version Data Warehouse 1. Automatic detection of structural and content changes in the data sources, reflected in the data warehouse by keeping a sequence of persistent versions • Their solution supports – monitoring external data sources with respect to content and structural changes – automatic generation of the processes that monitor external data sources – applying discovered external data source changes to a selected DW version – describing the structure of every DW version – querying multiple DW versions at the same time and presenting the results coming from multiple versions – visualizing the schema
  • 16. MaSM (Materialized Sort Merge) 2. Efficient Online Updates in Data Warehouses • An approach for supporting online updates by using SSDs to cache incoming updates • Models query processing with differential updates as a type of outer join between the data residing on disks and the updates residing on SSDs • Presents algorithms for performing such joins and for periodic migrations; for example, the updates are migrated to disks only when the system load is low or when the updates reach a certain threshold (e.g., 90%) of the SSD size
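The outer-join view of query processing with cached updates can be sketched as a merge of two key-sorted streams. This is a simplified illustration and not the MaSM algorithm itself: the tuple layouts, the three operation names, and the assumption of at most one cached update per key are all choices made for this sketch.

```python
def merge_scan(main_rows, cached_updates):
    """Merge disk rows with cached updates while scanning.

    main_rows:      iterable of (key, value), sorted by key (the data on disk).
    cached_updates: iterable of (key, op, value), sorted by key, with op in
                    {"insert", "modify", "delete"} (the updates cached on SSD).
                    Assumed: at most one cached update per key.
    """
    updates = iter(cached_updates)
    pending = next(updates, None)
    for key, value in main_rows:
        # Emit cached inserts whose keys sort before the current disk row.
        while pending is not None and pending[0] < key:
            if pending[1] == "insert":
                yield pending[0], pending[2]
            pending = next(updates, None)
        if pending is not None and pending[0] == key:
            op, new_value = pending[1], pending[2]
            pending = next(updates, None)
            if op == "delete":
                continue  # the disk row is hidden from the query
            if op == "modify":
                value = new_value  # the query sees the fresh value
        yield key, value
    # Cached inserts beyond the last disk key.
    while pending is not None:
        if pending[1] == "insert":
            yield pending[0], pending[2]
        pending = next(updates, None)


disk = [(1, "a"), (3, "c"), (5, "e")]
cache = [(2, "insert", "b"), (3, "modify", "C"), (5, "delete", None), (7, "insert", "g")]
fresh = list(merge_scan(disk, cache))
```

The query sees up-to-date rows without the disk data being rewritten; a migration step would later apply the same merge and write the result back.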
  • 18. Data changes in the data mart • Changes can be applied first in the warehouse and then in the data marts under it • Changes in a data mart are classified into – dimensional data changes – factual data changes – schema changes
  • 19. Dimensional Data Changes • These are changes in a hierarchy; they can affect a dimension, a level, or a property • Kimball proposes three solutions to changes in ROLAP multidimensional models
  • 20. ROLAP multidimensional models • Type I solution: simply overwrite old tuples in dimension tables with new data. This keeps the data mart up to date, but changes cannot be tracked • Type II solution: each change produces a new record in the dimension table. Surrogate keys must be used; changes can be kept and tracked along with the new data • Type III solution: augment the schema of the dimension table to represent both the current and the previous value for each level or attribute subject to change • Type VI (I+II+III) keeps the complete history: the more data you keep in the hierarchy, the more expensive it becomes, and additional timestamps must be kept
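A Type II change can be sketched in a few lines of Python. The `CustomerRow` layout, the `valid_from`/`valid_to` columns, and the `apply_type2_change` helper are assumptions for illustration; the timestamps in particular are a common addition rather than something the Type II definition above requires.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int      # warehouse-generated key, one per version
    customer_id: str        # business key from the source system
    city: str               # attribute subject to change
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_type2_change(dim: List[CustomerRow], customer_id: str,
                       new_city: str, change_date: date) -> CustomerRow:
    """Close the current version (if any) and append a new one."""
    current = next((r for r in dim
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None:
        if current.city == new_city:
            return current  # nothing changed, keep the current version
        current.valid_to = change_date
    new_row = CustomerRow(
        surrogate_key=max((r.surrogate_key for r in dim), default=0) + 1,
        customer_id=customer_id,
        city=new_city,
        valid_from=change_date,
    )
    dim.append(new_row)
    return new_row


dim: List[CustomerRow] = []
apply_type2_change(dim, "c1", "Dubai", date(2020, 1, 1))
apply_type2_change(dim, "c1", "Sharjah", date(2021, 6, 1))
```

Fact rows recorded before the change keep pointing at surrogate key 1, so old facts are still reported under the old city, while new facts use surrogate key 2.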
  • 21. Changes in Factual data • Such changes happen, for example, when measurements are wrong, such as sea levels that were captured incorrectly and fixed later • Facts are classified by their conceptual role into flow facts and stock facts
  • 22. Managing Late Measurements In Data Warehouses • A proposal to couple valid time and transaction time, distinguishing two solutions for managing late measurements • Flow model - delta solution: each new measurement for an event is represented as a delta (current registration – previous registration) with respect to the previous measurement, and transaction time is modeled by adding to the schema a new temporal dimension that represents when each registration was made in the data mart. Queries are answered by summing all registrations for each event; historical queries are answered by selectively summing the registrations for an event up to the time queried • Stock model - consolidated solution: late measurements are represented by recording the consolidated value for the event, and transaction time is modeled by two temporal dimensions (two timestamps) that delimit the time interval during which each registration is current, i.e. its currency interval
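The delta solution can be sketched as follows: every registration stores only the difference from what was previously known, together with the date on which it was registered. The tuple layout, the sample events, and the `value_as_known_on` helper are hypothetical names for this sketch.

```python
from datetime import date

# Hypothetical registrations: (event_id, registered_on, delta)
registrations = [
    ("order-1", date(2024, 1, 5), 100),   # first measurement
    ("order-1", date(2024, 1, 20), -10),  # late correction: the value becomes 90
    ("order-2", date(2024, 1, 7), 40),
]


def value_as_known_on(regs, event_id, known_on=None):
    """Sum the deltas of an event, optionally only those registered by `known_on`."""
    return sum(delta for event, registered_on, delta in regs
               if event == event_id
               and (known_on is None or registered_on <= known_on))


current_value = value_as_known_on(registrations, "order-1")
value_on_jan_10 = value_as_known_on(registrations, "order-1", date(2024, 1, 10))
```

An up-to-date query sums all deltas, while a historical query cuts the sum off at the queried transaction time and so reproduces what a report run on that day would have shown.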
  • 24. Schema Changes • Two approaches are followed for handling schema changes: – schema evolution: old information is maintained without data loss, but the old schema is lost – schema versioning: separate versions are stored and the user can access different schema versions
  • 25. Schema Evolution • Propagating the Evolution of Data Warehouse on Data Marts • operators to support changing the data mart schema – evolution operators for the data warehouse • basic operations and composite operations – evolution operators for the data mart • mapping function – a set of rules for the evolution
  • 26. Propagating the Evolution of Data Warehouse on Data Marts • The mapping functions are embedded into the Extract-Transform-Load (ETL) process from the data warehouse to the data mart. For example: • Fact(Table): returns a set of facts from the data warehouse tables • Dim(Table): returns a superset of dimensions from the data warehouse table; each superset contains all dimensions of the data mart cube
  • 27. Propagating the Evolution of Data Warehouse on Data Marts • Propagation operations • Add_Dim(Dname, Fi, T): adds a new dimension named “Dname” to the data mart fact “Fi”; it takes the primary key of table T from the data warehouse and a subset of the textual attributes contained in T • Add_Fact(Fname, T, set(D)): adds a new fact “Fname” with dimensions set(D); the fact measures are the numeric attributes of T
  • 28. Propagating the Evolution of Data Warehouse on Data Marts • A set of rules applies to the data warehouse to data mart mapping process, such as: • If a table T to be added to the data warehouse has foreign keys in another existing data warehouse table that concerns a fact, T adds a new dimension to that fact, and the attributes of T become the attributes of the dimension • If T does not have foreign keys in another table in the data warehouse, has foreign keys pointing to other tables that load dimensions in the data mart, and has numeric attributes, then T will probably create a new fact • Note: commercially, SQL Compare and Oracle Change Management Pack support evolving schemas: they compare schemas and generate scripts
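The two rules and the Add_Dim / Add_Fact operators can be sketched in Python as below. This is only a reading of the slides under simplifying assumptions: the `Table` and `DataMart` structures, the `referenced_by` argument, and the use of the table name as the new dimension or fact name are all hypothetical, and the whole set of textual attributes is taken rather than a subset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    name: str
    primary_key: str
    text_attrs: List[str] = field(default_factory=list)
    numeric_attrs: List[str] = field(default_factory=list)
    foreign_keys: List[str] = field(default_factory=list)  # tables this one points to


@dataclass
class DataMart:
    facts: Dict[str, dict] = field(default_factory=dict)        # fact -> measures, dims
    dims: Dict[str, List[str]] = field(default_factory=dict)    # dimension -> attributes
    fact_sources: Dict[str, str] = field(default_factory=dict)  # DW table -> fact it loads
    dim_sources: Dict[str, str] = field(default_factory=dict)   # DW table -> dimension it loads


def add_dim(mart, dname, fact_name, table):
    """Add_Dim: new dimension built from T's primary key and textual attributes."""
    mart.dims[dname] = [table.primary_key] + list(table.text_attrs)
    mart.facts[fact_name]["dims"].add(dname)


def add_fact(mart, fname, table, dims):
    """Add_Fact: new fact whose measures are T's numeric attributes."""
    mart.facts[fname] = {"measures": list(table.numeric_attrs), "dims": set(dims)}


def propagate_new_table(mart, table, referenced_by):
    """Apply the two rules; `referenced_by` lists DW tables holding a foreign key to T."""
    facts_hit = [mart.fact_sources[t] for t in referenced_by if t in mart.fact_sources]
    if facts_hit:  # rule 1: T is referenced by a table that loads a fact
        for fact_name in facts_hit:
            add_dim(mart, table.name, fact_name, table)
        return "dimension"
    dims_hit = [mart.dim_sources[t] for t in table.foreign_keys if t in mart.dim_sources]
    if not referenced_by and dims_hit and table.numeric_attrs:  # rule 2
        add_fact(mart, table.name, table, dims_hit)
        return "fact"
    return "none"


mart = DataMart(
    facts={"Sales": {"measures": ["amount"], "dims": {"Product"}}},
    dims={"Product": ["product_id", "name"]},
    fact_sources={"sale_line": "Sales"},
    dim_sources={"product": "Product"},
)
promotion = Table("promotion", "promo_id", text_attrs=["label"])
returns = Table("returns", "return_id", numeric_attrs=["qty"], foreign_keys=["product"])
kind_promotion = propagate_new_table(mart, promotion, referenced_by=["sale_line"])
kind_returns = propagate_new_table(mart, returns, referenced_by=[])
```

In this hypothetical run the `promotion` table becomes a new dimension of the `Sales` fact, and the `returns` table becomes a new fact that reuses the existing `Product` dimension.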
  • 29. Schema Versioning • Decision makers may have built their decisions on an old schema, and changes, possibly including measure changes, may have appeared after they executed their queries. To run the same query again and produce the same result, non-volatility is required. With changes at the schema level, some versioning approach is needed
  • 30. Schema Versioning • A comprehensive approach to versioning is presented in the multiversion data warehouse [31]. It proposes two metamodels: one for managing a multi-version data mart and one for detecting changes in the operational sources. Along with “real” versions, which are the versions used in the application domain, it introduces “alternative” versions, which are used for simulating and managing hypothetical business scenarios within what-if analysis settings • Commercially, several database management systems (DBMSs) offer support for valid and transaction time: Oracle 11g, IBM DB2 10 for z/OS, and Teradata Database 14. Part 2 (SQL Foundation) of SQL:2011 was just released
  • 31. Querying temporal data • Cross-version querying – Multiversion data warehouse: allows users to specify either a time interval for a data warehouse or the versions to query • Temporal querying – Temporal queries on Teradata • Native temporal implementation • Rewriting approach
  • 32. Querying temporal data • Disadvantages of the native approach – Since temporal data is stored in a new data type, SQL execution code needs to be modified for joins and aggregation on temporal data – Query optimization needs to be adapted to the new temporal tables – Some duplication might occur in the code of DBMS functions to support temporal data
  • 33. Rewrite approach – There is no impact on execution code – There is a small impact on the optimizer – No duplication as it will add a step before the query optimizer – But it will add complexity to the query structure
  • 34. Rewrite approach – Rewrites modify projection, selection and join • SELECT *, for example, excludes the time dimension if the qualifier is CURRENT • For the CURRENT and SEQUENCED qualifiers, the corresponding time predicate is added • For joins, temporal qualifiers are applied before the join. This works for inner joins, but for outer joins the qualifiers must be applied to each table separately, and the join is then performed on the derived tables – Their study showed that rewrites added only 5% to the execution time of the query in comparison to the native implementation
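The projection and selection rewrites for a CURRENT query can be sketched as a function that turns a temporal request into plain SQL before it reaches the optimizer. This is not Teradata's rewriter or syntax: the `valid_from`/`valid_to` column names, the `rewrite_current` helper, and the use of SQLite to run the result are assumptions of this sketch.

```python
import sqlite3

TIME_COLUMNS = ("valid_from", "valid_to")  # hypothetical valid-time columns


def rewrite_current(table, all_columns, where=None):
    """Rewrite a CURRENT 'SELECT *' into plain SQL.

    Projection: the time columns are dropped from the select list.
    Selection:  a predicate keeps only rows valid today.
    """
    visible = [c for c in all_columns if c not in TIME_COLUMNS]
    predicates = ["valid_from <= CURRENT_DATE", "CURRENT_DATE < valid_to"]
    if where:
        predicates.append(f"({where})")
    return (f"SELECT {', '.join(visible)} FROM {table} "
            f"WHERE {' AND '.join(predicates)}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy (id INTEGER, premium INTEGER, valid_from TEXT, valid_to TEXT)"
)
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2000-01-01", "2010-01-01"),  # historical version
        (1, 120, "2010-01-01", "9999-12-31"),  # current version
    ],
)
sql = rewrite_current("policy", ["id", "premium", "valid_from", "valid_to"])
current_rows = conn.execute(sql).fetchall()
```

Since the rewritten statement is ordinary SQL, the execution engine needs no temporal awareness, which is the main argument for the rewrite approach; the cost is a more complex query text.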
  • 36. Fusion Cubes • A framework to support self-service business intelligence in multidimensional cubes that can be dynamically extended both in their schema and their instances • it can include both stationary and situational data
  • 37. Situational Query • A user poses an OLAP-like situational query, one that cannot be answered on stationary data only • The system discovers potentially relevant data sources • The system fetches relevant situational data from selected sources • The system integrates situational data with the user’s data, if any • The system visualizes the results and the user employs them for making her decision • The user stores and shares the results
  • 40. Situational Query • External data sources to integrate can be – in RDF format, to be integrated into the data warehouse on the fly – social networks; there is an implementation called MicroStrategy that analyses social networks – blogs, where we can do “opinion mining”, although integration is challenging because the data is unstructured
  • 41. Drill Beyond • The Drill-Beyond operator can go beyond: – The schema: a user can click on a dimension or a fact that is not available – The instances: a user can request a new instance for an attribute, such as a new country, so that its values are retrieved • Query formulation can involve different technologies for situational data: – SPARQL for querying RDF data – Web APIs that provide data in XML or JSON format
  • 42. Integration • Once the data is available, it has to be integrated with the stationary data to form the fusion cubes – Extract the structure of the different situational data • Google Fusion Tables (Gonzalez et al., 2010) offers cloud-based storage of basic relational tables that can be shared with others, annotated, and fused with other data; this helps in extracting relations from unstructured data – Map the schema of the data sources • XFM, for example – Reconcile with the stationary data to form the fusion cube • Google Refine (code.google.com/p/google-refine/) provides functionalities for working with messy data: cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases
  • 43. Support On Commercial Systems • There are many commercial systems available to support fusion cubes: • The illo system (illo.com) allows non-business users to store small cubes in a cloud-based environment, analyze the data using powerful mechanisms, and provide visual analysis results that can be shared with others and annotated • The Stratosphere project (www.stratosphere.eu) explores the power of massively parallel computing for Big Data analytics
  • 46. Conclusion • We have explored different methodologies for handling temporal data and changes of schema, factual data, and dimensional data in data warehouses. Researchers are coming up with bright ideas that help in different scenarios to provide dynamic features in a data warehouse and speed up its query processing. We encourage commercial systems to look into these new methodologies and make them available to the public for practical use
  • 47. Conclusion • Comparison table of the surveyed methods • Columns (methods): A SchemaGuide for Accelerating the View Adaptation Process; Multi-version Data Warehouse; MaSM (Materialized Sort Merge); ROLAP multidimensional models; Managing Late Measurements In Data Warehouses; Propagating the Evolution of Data Warehouse on Data Marts; Temporal Queries On Teradata; Fusion Cubes • Rows (criteria): view adaptation; DW in sync with the sources; changes in the data mart (dimensional data changes, factual data changes, schema changes by schema evolution or schema versioning); software prototype
  • 48. Issues & Future Work • There is a need to formulate queries that can span different database schemas and produce results at once. We suggest using an extra attribute in the data warehouse to store heterogeneous and non-normalized data in an XML format, which is highly extensible • The data warehouse design needs dynamic schema updates without physically modifying or replicating the current store. As above, an XML extension can help in storing attributes that change frequently. Although it might be a drawback for performance, fixed dimensions and properties can be identified, structured, and queried appropriately; a second phase of query processing can then explore the XML data and embed it in the multidimensional form • Queries that span different versions in a version-based schema can be time consuming. Versioning could be restructured to store unchanged values in a different version than the changed values, which can significantly reduce the space and time requirements and make querying more feasible
  • 49. Main References • [22] Felix Naumann & Stefano Rizzi 2013 – Fusion Cubes • [30] MaSM: Efficient Online Updates in Data Warehouses http://www.cs.cmu.edu/~chensm/papers/MaSM-sigmod11.pdf • [37] Managing Late Measurements In Data Warehouses (Matteo Golfarelli & Stefano Rizzi, 2007) • [38] A SchemaGuide for Accelerating the View Adaptation Process (Jun Liu, Mark Roantree, and Zohra Bellahsene, 2010) • [35] Temporal Query Processing in Teradata (Mohammed Al-Kateb, Ahmad Ghazal, Alain Crolotte, 2013) • [33] Toward Propagating the Evolution of Data Warehouse on Data Marts (Saïd Taktak and Jamel Feki, 2012) • [31] Wrembel and Bebel (2007)
  • 50. Supporting References • [1] Ramakrishnan (DBMS 3rd ed) Chapter 25 • [2] http://en.wikipedia.org/wiki/Data_warehouse • [3] Introduction to Information Systems (Marakas & O'Brien 2009) • [4] Kimball, The Data Warehouse Toolkit 2nd Ed (2002) Chapter 1 • [5] http://en.wikipedia.org/wiki/Temporal_database • [6] http://www.olapcouncil.org/research/glossaryly.htm • [7] http://en.wikipedia.org/wiki/OLAP_cube • [8] http://docs.oracle.com/cd/B12037_01/olap.101/b10333/multimodel.htm • [9] Multidimensional Database Technology: Bach Pedersen, Torben; S. Jensen, Christian (December 2001). • [10] TSQL2 Language Specification https://cs.arizona.edu/~rts/initiatives/tsql2/finalspec.pdf • [11] Sybase Infocenter http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00269.1571/doc/html/bde1279401694270.html • [12] (Roddick, 1995) • [13] (Grandi, 2002) • [15] (Devlin, 1997) • [16] "Information technology -- Database languages -- SQL -- Part 2: Foundation (SQL/Foundation)," International Standards Organization, December 2011 • [18] Gupta Maintenance of Materialized Views: Problems, Techniques, and Applications • [20] De Amo & Halfeld Ferrari Alves (2000) • [21] Avoiding re-computation: View adaptation in data warehouses (1997) M Mohania • [23] http://en.wikipedia.org/wiki/Resource_Description_Framework

Editor's notes

  2. A dimension performs slicing (filtering data) and dicing (grouping data) of the additive measures located in the fact table of the dimensional model; its attributes are called “dimension levels”. Examples of measures: quantity, cost, number of customers. The hierarchy of dimensions can offer both a summarized and a detailed view of an analysis
  4. The main problem in the multidimensional model is that it relies on static dimensions, which is not realistic because, for example, products are added to and removed from a product catalogue. Another common problem is that there are no modifications, only additions, which is again not realistic: after the Extract-Transform-Load process is run there might be mistakes in the data that need to be corrected
  5. Facts are classified by their conceptual role [37]: flow facts group the transactions happening in a time interval into a single transaction, like purchases of items and enrollments; stock facts monitor items by their state over time, like the price of a share or the level of a river
  7. There are different factors behind schema changes in data marts, for example: changes in the user requirements, triggered for instance by the need to produce more sophisticated reports or by new categories of users that subscribe to the data mart; changes in the application domain, arising from modifications in the business world, such as a change in the way business is done or a change in the organizational structure of the company; new versions of software components being installed; and system tuning activities.
  14. Keeping temporal data warehouses is useless unless companies use queries that support temporal data warehouses. With standard SQL it is possible but impractical
  17. For query rewrite optimizations, suppose the last scenario occurs: if there is an additional predicate on the original query, it should be applied to the appropriate table before the join. The Teradata optimizer solves this by folding the derived tables into their parent queries. View folding is an internal feature of the Teradata optimizer that converts queries on derived tables, which require a temporary table, into a parent query that does not require a temporary table
  18. The above scenario should be controlled by the user; the user can also run many iterations to improve the data
  20. A user interface is available to enable users to submit situational queries in an OLAP-like fashion. Queries are then handed on to a query processor, which translates the user queries into executable query processing code. The submitted query can refer to a stationary cube or to a fusion cube already defined by another user, or the user may need to create a new fusion cube, in which case new situational data must be found. A data finder then uses external registries as well as external ontologies, or it just accesses the metadata already in the catalog and in the internal ontology. Registries can be complex services or just a simple search engine
  22. To reduce processing time, MapReduce approaches can shorten query times by splitting the work of a query across multiple nodes.
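
The idea of splitting a query across nodes can be sketched as a map step over data partitions followed by a reduce step that merges the partial results. This is a minimal single-machine illustration, not a real MapReduce framework: the partitions, the rows, and the function names are hypothetical, and threads stand in for nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact rows (region, sales amount), already split into
# partitions as they might be stored on different nodes.
partitions = [
    [("north", 10), ("south", 4)],
    [("north", 6), ("east", 3)],
    [("south", 5), ("east", 2)],
]

def map_partition(rows):
    """Map step: compute partial sums per region inside one partition."""
    partial = defaultdict(int)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)

def reduce_partials(partials):
    """Reduce step: merge the partial sums from all partitions."""
    total = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            total[region] += amount
    return dict(total)

def total_sales_by_region(parts):
    # Threads stand in for nodes here; each one handles one partition.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, parts))
    return reduce_partials(partials)

totals = total_sales_by_region(partitions)
print(totals)
```

Because each partition is aggregated independently, adding nodes shortens the map step; only the small partial results have to be brought together in the reduce step.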
  23. The drill-beyond feature involves an operation called cube discovery, which derives dimensions and hierarchies from situational data. This step is semi-automated: the user types a keyword and the system proposes data; if the proposal is not satisfactory, the user has to iterate again.
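
The semi-automated loop described for cube discovery can be sketched roughly as follows. Everything here is an assumption made for illustration: the `CATALOG` contents, the substring matching in `propose`, and the `accept` callback that stands in for the user's judgement are hypothetical and are not taken from any actual fusion cube system.

```python
# Hypothetical catalog of situational data sources, keyed by name, with
# the candidate dimension levels each one could contribute.
CATALOG = {
    "weather_by_city": ["city", "country"],
    "fish_market_prices": ["species", "market"],
    "city_population": ["city", "region", "country"],
}

def propose(keyword):
    """Return the catalog entries whose name contains the keyword."""
    return {name: levels for name, levels in CATALOG.items() if keyword in name}

def discover(keywords, accept):
    """Try keywords in order until `accept` approves a non-empty proposal.

    `accept` stands in for the user's judgement on each proposal.
    """
    for keyword in keywords:
        proposal = propose(keyword)
        if proposal and accept(proposal):
            return keyword, proposal
    return None, {}

# The user first tries "sales" (nothing is proposed), then "city".
keyword, proposal = discover(["sales", "city"], accept=lambda p: len(p) >= 2)
print(keyword, sorted(proposal))
```

Each pass through the loop corresponds to one iteration by the user: a new keyword is tried only when the previous proposal was empty or rejected.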