International Journal of Software Engineering.
Volume 2, Number 1 (2011), pp. 21-30
© International Research Publication House
http://www.irphouse.com
Database Aggregation using Metadata
Sandeep Kumar1, Sanjay Jain2 and Rajani Kumari3
1 Arya College of Engg. & IT, Jaipur, Rajasthan, India
E-mail: sandpoonia@gmail.com
2 Department of Computer Science and Engineering,
Arya College of Engg. & IT, Jaipur, Rajasthan, India
E-mail: jainsanjay17@yahoo.com
3 Bahror Mahavidhylaya, Bahror, Rajasthan, India
E-mail: sweetugdd@gmail.com
Abstract
‘Database Aggregation using Metadata’ addresses the problem of
hardcoded end-user applications by sitting between the end-user application
and the DBMS and intercepting the end user's SQL. With a Simulator for
Database Aggregation using metadata, the end-user application speaks
"base-level" SQL and never attempts to call for an aggregate directly. Using
metadata describing the data warehouse's portfolio of aggregates, the
aggregate navigator transforms the base-level SQL into "simulator-aware"
SQL. The end user and the application designer can then build and use
applications without knowing which aggregates are available.
The goal of an aggregate program in a large data warehouse must be more
than just improving performance. This simulator provides dramatic
performance gains for as many categories of user queries as possible.
‘Database Aggregation using Metadata’ is a general-purpose simulator. It
can create a new database or modify an existing one. The user enters a
base-level SQL query, and the simulator transforms it into a
simulator-aware SQL (SA-SQL) query. The simulator can solve queries
that relate to the database created by the user.
Keywords: Metadata, ETL, OLAP, OLTP, Data warehouse, Data mart.
Manuscript, November 2010
Introduction
A data warehouse is a relational database that is designed for query and analysis
rather than for transaction processing. It usually contains historical data derived from
transaction data, but can include data from other sources. In addition to a relational
database, a data warehouse environment includes an extraction, transformation and
loading (ETL) solution. It also includes online analytical processing (OLAP), data
mining capabilities, client analysis tools, and other applications that manage the
process of gathering data and delivering it to business users. Data warehouses are
designed to analyze data. The data warehousing system includes back-end tools for
extracting, cleaning and loading data from Online Transaction Processing (OLTP)
databases and historical repositories of data. DBMSs are typically used for On-Line
Transaction Processing (OLTP), whereas data warehouses are designed for
On-Line Analytical Processing (OLAP) and decision making [1][2][3].
Multidimensional structure is defined as “a variation of the relational model that
uses multidimensional structures to organize data and express the relationships
between data” [4]. The structure is broken into cubes and the cubes are able to store
and access data within the confines of each cube. “Each cell within a
multidimensional structure contains aggregated data related to elements along each of
its dimensions”. Multidimensional structure is quite popular for analytical databases
that use online analytical processing (OLAP) applications [5].
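As a minimal sketch of this idea, the cells of a small two-dimensional cube can be modeled as a mapping from dimension coordinates to an aggregated value. The dimension names (region, month), the measure and the sample rows are illustrative assumptions, not data from this paper.

```python
from collections import defaultdict

# Hypothetical detail records: (region, month, degrees completed).
records = [
    ("North", "January", 10),
    ("North", "January", 5),
    ("West", "January", 7),
    ("North", "February", 4),
]

# Each cell is addressed by one element from every dimension and holds
# the aggregate (here a sum) of the detail records that fall into it.
cube = defaultdict(int)
for region, month, degrees in records:
    cube[(region, month)] += degrees


def roll_up_to_region(cells):
    """Aggregate away the month dimension, leaving one cell per region."""
    totals = defaultdict(int)
    for (region, _month), value in cells.items():
        totals[region] += value
    return dict(totals)


print(cube[("North", "January")])
print(roll_up_to_region(cube))
```

Rolling up along a dimension only combines existing cells, which is why pre-aggregated structures can answer coarser questions without touching the detail records.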
Related Work
The earlier work on data warehousing involving WHIPS does not elaborate on the
technique used for modeling the data warehouse itself. A data warehouse is used to
support dimensional queries and, hence, requires dimensional modeling. The dimensional
data model is commonly used in data warehousing systems. Dimensional modeling
is the name of a logical design technique often used for data warehouses [1, 4, 8]. It is
considered to be different from entity-relationship (ER) modeling. The same modeling
approach, at the logical level, can be used for any physical form, such as a
multidimensional database or even flat files. Dimensional modeling is used for
databases intended to support end-user queries in a data warehouse, and many
data warehouse designers use it to build their data warehouses.
Dimensional modeling always uses the concepts of facts and dimensions. In this
design model all data is stored in fact tables and dimension tables. Facts are
typically (but not always) numeric values that can be aggregated, and dimensions are
groups of hierarchies and descriptors that define the facts. A dimensional model
includes fact tables and lookup tables. Fact tables connect to one or more lookup
tables, but fact tables do not have direct relationships to one another [9]. Dimensions
and hierarchies are represented by lookup tables. Attributes are the non-key columns
in the lookup tables. Every dimensional model is composed of one table with a
multipart key, called the fact table, and a set of smaller tables called dimension
tables.
A Fact Table is a table that contains the measures of interest. In data
warehousing, a fact table consists of the measurements, metrics or facts of a business
process. It is often located at the centre of a star schema, surrounded by dimension
tables. Fact tables provide the (usually) additive values that act as independent
variables by which dimensional attributes are analyzed. Fact tables are often defined
by their grain. The grain of a fact table represents the most atomic level by which the
facts may be defined. Each data warehouse includes one or more fact tables.
A Dimensional Table is a collection of hierarchies and categories along which
the user can drill down and drill up. It contains only the textual attributes. Dimension
tables contain attributes that describe fact records in the fact table. Some of these
attributes provide descriptive information; others are used to specify how fact table
data should be summarized to provide useful information to the analyst. Dimension
tables contain hierarchies of attributes that aid in summarization. Dimensional
modeling produces dimension tables in which each table contains fact attributes that
are independent of those in other dimensions [2, 4, 10].
Figure 1: Star Schema by using Fact and Dimension Table.
In a Snowflake Schema, each dimension has a primary dimension table, to which one
or more additional dimensions can join. The primary dimension table is the only table
that can join to the fact table. In a Snowflake schema, dimensions may be interlinked or
may have one-to-many relationships with other tables.
Star Schema is one of the ways of supporting dimensional modeling. A star
schema organizes the tables so that results can be retrieved from the
database easily and quickly in the warehouse environment. Usually a star schema consists
of one or more dimension tables arranged around a fact table, so that the layout looks
like a star. Figure 1 shows a simple star schema.
Performance is an important consideration for any schema, particularly in a
decision-support system in which one routinely queries large amounts of data. In the
star schema, any table that references or is referenced by another table must have a
primary key, which is a column or group of columns whose contents uniquely identify
each row. In a simple star schema, the primary key for the fact table consists of one or
more foreign keys. A foreign key is a column or group of columns in one table whose
values are defined by the primary key in another table [2, 10, 11].
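The key relationships described above can be sketched with Python's built-in sqlite3 module. The table and column names loosely follow the queries used later in this paper (academic_fact, school, time1); the column layout and every row are invented for illustration and are not the authors' schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE school (
    school_key  INTEGER PRIMARY KEY,
    school_city TEXT,
    region_name TEXT
);
CREATE TABLE time1 (
    time_key      INTEGER PRIMARY KEY,
    month1        TEXT,
    fiscal_period TEXT
);
-- The fact table's primary key is composed of foreign keys to the dimensions.
CREATE TABLE academic_fact (
    school_key       INTEGER REFERENCES school(school_key),
    time_key         INTEGER REFERENCES time1(time_key),
    degree_completed INTEGER,
    PRIMARY KEY (school_key, time_key)
);
""")
conn.executemany("INSERT INTO school VALUES (?, ?, ?)",
                 [(1, "jaipur", "North"), (2, "kota", "North"), (3, "pune", "West")])
conn.executemany("INSERT INTO time1 VALUES (?, ?, ?)",
                 [(1, "January", "1999"), (2, "February", "1999")])
conn.executemany("INSERT INTO academic_fact VALUES (?, ?, ?)",
                 [(1, 1, 10), (2, 1, 5), (3, 1, 7), (1, 2, 4)])


def degrees_by_region(connection, month, fiscal_period):
    """Join the fact table to its dimensions and sum one measure per region."""
    return connection.execute("""
        SELECT s.region_name, SUM(f.degree_completed)
        FROM academic_fact f
        JOIN school s ON f.school_key = s.school_key
        JOIN time1 t  ON f.time_key = t.time_key
        WHERE t.month1 = ? AND t.fiscal_period = ?
        GROUP BY s.region_name
        ORDER BY s.region_name
    """, (month, fiscal_period)).fetchall()


january = degrees_by_region(conn, "January", "1999")
print(january)
```

Every join in the query follows a foreign key from the fact table to a dimension's primary key, which is the access pattern a star schema is laid out for.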
Methodology
The work done in the area of aggregate navigators has been unsatisfactory, and the
area remains rather unexplored. One of the documented algorithms was presented by
Kimball in [12]. This algorithm is based on the star schema introduced previously.
The base schema, or detailed schema, is shown in Figure 1. The algorithm is based on
design requirements that are essential for designing any "family of aggregate
tables".
Simulator for Database Aggregation
The Aggregate Navigator Algorithm discussed in the previous section was designed
without using metadata. The Simulator for Database Aggregation, in contrast, shows
the importance of customized metadata. Developing the metadata is therefore the
most crucial step. SQL statements are solved by using the metadata. The algorithm is
implemented in Java; that is, the metadata is accessed through a Java program.
Proposed Algorithm for Simulator
The algorithm uses the same design requirements as the previous algorithm,
and the new algorithm can be given as follows:
1. The first step is the same as in the previous algorithm. For any given SQL
statement presented to the DBMS, the smallest fact table that has not yet
been examined in the family of schemas referenced by the query is examined.
The information about the fact tables is maintained in the metadata.
2. The second step differs from the previous algorithm. Metadata is
maintained for the family of schemas, and the dimension attributes change their
names as the dimension tables are shrunk or aggregated, as shown
earlier. For any schema, the mapping is used to compare the table
fields in the SQL statement with the table fields in the particular fact and dimension
tables. The mapping should be correct. If any field in the SQL statement cannot
be found in the current fact and dimension tables, go back to step 1 and
find the next larger fact table.
3. If the query is not satisfied by the base schema either, solve the SQL
statement by using the family of schema tables. The query contacts the central
data mart, which is the Main site in this example.
4. Now run the altered SQL. It is guaranteed to return the correct answer because
all of the fields in the SQL statement are present in the chosen schema.
Figure 2: Architecture of Simulator for Database Aggregation.
In the previous algorithm, metadata is not maintained for the family of schemas. That is
why the previous algorithm cannot solve every SQL statement that is related
to this database. In the new approach every SQL statement can be solved, as long as the
SQL statement is related to one or more schema tables in this database.
In the previous algorithm, solving an SQL query requires a lookup of each field name. Each
such lookup would be a SQL call to the DBMS's system table. A call to the system table
may take several seconds; if more than six or seven layers of aggregate tables exist, the
aggregate navigator may take as much as 20 seconds to determine the correct choice,
and this is not acceptable.
The time for the aggregate navigator depends upon the number of calls to the system
table, and an approximation is that one call to the system table takes one second. So the
time (in seconds) taken by the aggregate navigator is almost equal to the product of the
number of calls made to the system table and the time taken per call.
Experimental Results
For the first test, different queries needing metadata at different levels of aggregation
are presented to the simulator for testing purposes.
Test 1: The following query asks for the region name in which the time for planned degree
completed was less than the time for actual degree completed for January, 1999.
select s.region_name
from academic_fact f, time1 t, school s
where f.planed_degree_completed < f.degree_completed
and f.school_key = s.school_key
and f.time_key = t.time_key
and t.fiscal_period = '1999'
and t.month1 = 'January';
The benefit of using aggregate tables is that additional information can be stored
in fact tables for higher levels of aggregation. The aggregate fact table shows the
project aggregated to category, school rolled up to region, student rolled up to state,
household rolled up to location, and time rolled up to month. The query is presented as
follows:
select s.region_name
from academic_fact_aggregated_by_all f, time1 t, region s
where f.planed_degree_completed < f.degree_completed
and f.region_key = s.region_key
and f.time_key = t.time_key
and t.fiscal_period = '1999'
and t.month1 = 'January';
This query is optimized for the lowest-level (smallest) aggregate table, named
academic_fact_aggregated_by_all, which the Aggregate Navigator examines first.
Because the query is resolved at this lowest level of
aggregation, the number of calls to the system table is equal to the number of
fields in the query.
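The roll-up behind such an aggregate table can be sketched with Python's sqlite3 module. The sketch shows only the school-to-region roll-up; the key layout and the rows are hypothetical, and a single measure stands in for the full fact table. It checks that a regional total read from the aggregate table matches the total computed from the base fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE school (school_key INTEGER PRIMARY KEY, region_key INTEGER);
CREATE TABLE academic_fact (
    school_key INTEGER, time_key INTEGER, degree_completed INTEGER
);
INSERT INTO region VALUES (10, 'North'), (20, 'West');
INSERT INTO school VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO academic_fact VALUES (1, 1, 10), (2, 1, 5), (3, 1, 7), (1, 2, 4);

-- Roll school up to region: one row per (region, time) instead of per school.
CREATE TABLE academic_fact_aggregated_by_all AS
SELECT s.region_key AS region_key,
       f.time_key   AS time_key,
       SUM(f.degree_completed) AS degree_completed
FROM academic_fact f
JOIN school s ON f.school_key = s.school_key
GROUP BY s.region_key, f.time_key;
""")

# Base-level query: joins through school to reach the region.
base_rows = conn.execute("""
    SELECT r.region_name, SUM(f.degree_completed)
    FROM academic_fact f
    JOIN school s ON f.school_key = s.school_key
    JOIN region r ON s.region_key = r.region_key
    WHERE f.time_key = 1
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()

# Aggregate-aware query: reads the pre-summed rows directly.
agg_rows = conn.execute("""
    SELECT r.region_name, a.degree_completed
    FROM academic_fact_aggregated_by_all a
    JOIN region r ON a.region_key = r.region_key
    WHERE a.time_key = 1
    ORDER BY r.region_name
""").fetchall()

base_count = conn.execute("SELECT COUNT(*) FROM academic_fact").fetchone()[0]
agg_count = conn.execute(
    "SELECT COUNT(*) FROM academic_fact_aggregated_by_all").fetchone()[0]
print(base_rows, agg_rows, base_count, agg_count)
```

The aggregate table holds fewer rows and needs one join fewer, which is where the saving for region-level queries comes from.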
Test 2: This query is the same as the one used with the previous algorithm. The
query asks for the degree completed and end-of-year status for every Monday in
Jaipur.
select p.project_category, f.degree_completed, f.end_of_year_status
from academic_fact f, project p, school s, student u, household h, time1 t
where f.project_key = p.project_key
and f.school_key = s.school_key
and f.student_key = u.student_key
and f.time_key = t.time_key
and f.household_key = h.household_key
and t.day1 = 'Monday'
and s.school_city = 'jaipur'
group by p.project_category;
The previous algorithm does not accomplish aggregate-aware optimization
completely, whereas the same query presented to the Simulator for Database
Aggregation gives the following complete query:
select p.project_category, f.degree_completed, f.end_of_year_status
from academic_fact_aggregated_by_category f, category p, school s, student u,
household h, time1 t
where f.category_key = p.category_key
and f.school_key = s.school_key
and f.student_key = u.student_key
and f.time_key = t.time_key
and f.household_key = h.household_key
and t.day1 = 'Monday'
and s.school_city = 'jaipur'
group by p.project_category;
The Aggregate Navigator will first examine academic_fact_aggregated_by_all; it will
fail after making one call for the academic_fact_aggregated_by_all table. It will then
make calls for time_key, category_key, student_key and project_category, i.e., four
calls to the system table. If the Aggregate Navigator is optimized properly, it is possible
to make calls only for the subsequent matches. In this example, these will be school_key,
household_key, day1 and school_city. These fields can be satisfied at a higher level of
aggregation, i.e. academic_fact_aggregated_by_category.
Test 3: This query presented to the Simulator uses the base schema. The query asks for the
project type, status description and degree completed for which the new student flag
was True.
select p.project_type, s.status_description, f.degree_completed
from academic_fact f, project p, status s
where f.project_key = p.project_key
and f.status_key = s.status_key
and s.new_student_flag = 'True';
The query makes reference to the status table, which is present only in the base
schema. The query remains unchanged.
select p.project_type, s.status_description, f.degree_completed
from academic_fact f, project p, status s
where f.project_key = p.project_key
and f.status_key = s.status_key
and s.new_student_flag = 'True';
For this query, the Aggregate Navigator will make one call for the
academic_fact_aggregated_by_all table, and it will fail. It will then make another call
for the academic_fact_aggregated_by_category table, and it will also fail.
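The fall-through behavior seen in Tests 1 to 3 can be sketched as a walk over metadata ordered from the smallest fact table to the base fact table. This is an illustrative sketch, not the authors' Java implementation: the FAMILY structure, the attribute lists and the function name are assumptions, and join keys are left out on the assumption that the metadata mapping renames them between schemas.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical metadata: one entry per schema in the family, ordered from the
# smallest (most aggregated) fact table to the base fact table. Each entry
# lists the non-key attributes that the schema can answer.
FAMILY: List[Tuple[str, FrozenSet[str]]] = [
    ("academic_fact_aggregated_by_all", frozenset({
        "region_name", "month1", "fiscal_period",
        "planed_degree_completed", "degree_completed",
    })),
    ("academic_fact_aggregated_by_category", frozenset({
        "project_category", "school_city", "day1", "month1", "fiscal_period",
        "planed_degree_completed", "degree_completed", "end_of_year_status",
    })),
    ("academic_fact", frozenset({
        "project_category", "project_type", "status_description",
        "new_student_flag", "region_name", "school_city", "day1", "month1",
        "fiscal_period", "planed_degree_completed", "degree_completed",
        "end_of_year_status",
    })),
]


def choose_fact_table(query_fields: Iterable[str]) -> Tuple[Optional[str], int]:
    """Return the first fact table whose schema covers every field,
    together with the number of schemas examined on the way."""
    wanted = set(query_fields)
    for examined, (fact_table, available) in enumerate(FAMILY, start=1):
        if wanted <= available:
            return fact_table, examined
    return None, len(FAMILY)


test1 = choose_fact_table(["region_name", "planed_degree_completed",
                           "degree_completed", "fiscal_period", "month1"])
test2 = choose_fact_table(["project_category", "degree_completed",
                           "end_of_year_status", "day1", "school_city"])
test3 = choose_fact_table(["project_type", "status_description",
                           "degree_completed", "new_student_flag"])
print(test1, test2, test3)
```

Because the lookup runs against an in-memory structure, a failed match on a small table costs a set comparison rather than a call to the DBMS system table.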
Time Testing
The time taken by the Aggregate Navigator depends on the number of calls made
to the aggregate tables and fields. For the Aggregate Navigator, the time (in seconds) taken
is equal to the number of calls to the system table. For the above queries, the number of
iterations and the time taken by Case1 to resolve each query are noted. Similarly, the time
taken by Case2 is noted. Time is tested by running different queries on the
Simulator and comparing the times of the Aggregate Navigator, Case1 and Case2.
The time noted for each query indicates its performance. JDBC makes a
persistent connection to the database. This means that the connection time is
associated only with the first query to the system table for the first field being
examined. The time for the very first query is much higher than for other queries. All
tests of the Simulator are performed on a Celeron processor (1.50 GHz) with 512 MB
RAM. Java 1.6 is used as the front end and MS Access is used as the back end.
Whenever a query is executed for testing through Case1, the query takes a
few milliseconds. In Test 1, the query is answered from the
academic_fact_aggregated_by_all table, which is the smallest table, so a call is made to
only this one table. Accessing this table takes 16 milliseconds.
In Test 2, the query is answered from the academic_fact_aggregated_by_category table,
which is the second smallest table. This test first examines the smallest table,
academic_fact_aggregated_by_all; the query cannot be solved from this table, so the
search moves on to the second smallest table,
academic_fact_aggregated_by_category. For this query, the
academic_fact_aggregated_by_all table takes 16 milliseconds and the
academic_fact_aggregated_by_category table also takes 16 milliseconds. The remaining
tests proceed in the same way as Test 1 and Test 2.
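The two timing rules stated in this section amount to simple products, sketched below. The 16 ms per table and the one second per system-table call are the figures given in the text; the function names are hypothetical, and treating the per-table cost as constant is a simplifying assumption.

```python
# Per-table access time reported for Case1 in the text above.
CASE1_MS_PER_TABLE = 16


def case1_estimate_ms(tables_examined, ms_per_table=CASE1_MS_PER_TABLE):
    """Estimated Case1 time: every examined fact table costs one access."""
    return tables_examined * ms_per_table


def navigator_estimate_s(system_table_calls, seconds_per_call=1.0):
    """Estimated Aggregate Navigator time: calls times time per call."""
    return system_table_calls * seconds_per_call


print(case1_estimate_ms(1))  # Test 1: only the smallest table is examined
print(case1_estimate_ms(2))  # Test 2: the smallest table fails, the next one matches
```

For Test 1 and Test 2 these estimates equal the 16 ms and 32 ms reported for Case1; later tests can deviate slightly because real access times are not exactly constant.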
Table I: Times taken by Aggregate Navigator, Case1 and Case2.
Query    Time taken by Aggregate Navigator    Time taken by Case1    Time taken by Case2
Test 1   112 ms                               16 ms                  32 ms
Test 2   144 ms                               32 ms                  46 ms
Test 3   91.98 ms                             47 ms                  78 ms
Test 4   173.25 ms                            63 ms                  78 ms
Test 5   110.25 ms                            63 ms                  93 ms
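As a worked reading of Table I, the ratio of the Aggregate Navigator time to each Simulator case can be computed per test. The sketch assumes that all three columns are in milliseconds, as the cell values state; the dictionary and function names are hypothetical.

```python
# Values copied from Table I: (Aggregate Navigator, Case1, Case2), all taken as ms.
TABLE_I = {
    "Test 1": (112.0, 16.0, 32.0),
    "Test 2": (144.0, 32.0, 46.0),
    "Test 3": (91.98, 47.0, 78.0),
    "Test 4": (173.25, 63.0, 78.0),
    "Test 5": (110.25, 63.0, 93.0),
}


def speedups(row):
    """Return how many times faster Case1 and Case2 are than the navigator."""
    navigator, case1, case2 = row
    return navigator / case1, navigator / case2


ratios = {test: speedups(row) for test, row in TABLE_I.items()}
for test, (vs_case1, vs_case2) in ratios.items():
    print(f"{test}: Case1 x{vs_case1:.2f}, Case2 x{vs_case2:.2f}")
```

Under that unit assumption, both cases are faster than the Aggregate Navigator in every test, with the largest ratio in Test 1, where only the smallest table is examined.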
The comparison of the time taken by the Aggregate Navigator and the Simulator is
shown in Figure 3.
Figure 3: Comparison of time taken by Aggregate Navigator and Simulator.
[Bar chart of the times for Test 1 to Test 5; vertical axis from 0 to 200.]
Conclusion
The work done for this thesis resulted in a Simulator for Database
Aggregation that is fast and performs query optimization efficiently. The thesis suggests
a new approach for maintaining metadata to be used with the Simulator to make the
optimization efficient. Metadata is examined only for the properties of a table, so
query response becomes faster and more efficient through the metadata. The Simulator is
general purpose and can be configured for any SQL-compatible database, so queries
are optimized efficiently.
The Simulator itself is written in Java, which makes it suitable for use with Web-
related front-end tools like Java applets. JDBC makes a persistent connection to the
database. This means that the connection time is associated only with the first query
to the system table for the first field being examined. The tests also show that it can be
used efficiently in the emerging distributed data warehousing approach. This would
make the distributed nature of the data warehouse and presence of aggregated tables
transparent to the end-users. As pointed out earlier, the performance of Simulator can
be improved drastically in case of a distributed approach with the use of cache.
References
[1] Surajit Chaudhuri et al., Database technology for decision support systems,
IEEE Computer, 48-55, 2001.
[2] J. Hammer, H. Garcia-Molina, W. Labio, J. Widom, and Y. Zhuge, The
Stanford Data Warehousing Project, IEEE Data Engineering Bulletin, 18:2,
41-48, 1995.
[3] Daniel Barbará and Xintao Wu, The Role of Approximations in Maintaining
and Using Aggregate Views, IEEE Data Engineering Bulletin, 22:4, 15-21,
1999.
[4] S. Sarawagi, Indexing OLAP Data, IEEE Data Engineering Bulletin, 20:1,
36-43, 1997.
[5] V. Harinarayan, Issues in Interactive Aggregation, IEEE Data Engineering
Bulletin, 20:1, 12-18, 1997.
[6] W. H. Inmon, "Building the Data Warehouse", ISBN-13: 978-0-7645-9944-6,
John Wiley, 1992.
[7] Ramon C. Barquin, "A data warehousing manifesto", Planning and Designing
the Data Warehouse, Prentice Hall, 1997.
[8] K. Sahin, "Multidimensional database technology and data warehousing",
Database Journal, December 1995. Online: http://www.kenan.com/acumate/.
[9] Jane Zhao, "Designing Distributed Data Warehouses and OLAP Systems",
Massey University, Information Science Research Centre.
[10] J. L. Wiener, H. Gupta, W. J. Labio, Y. Zhuge, H. Garcia-Molina, and J.
Widom, "A System Prototype for Warehouse View Maintenance", in
Proceedings of the ACM Workshop on Materialized Views: Techniques and
Applications, pages 26-33, Montreal, Canada, June 7, 1996.
[11] Rakesh Agrawal, Ashish Gupta, Sunita Sarawagi, "Modeling Multidimensional
Databases", IBM Almaden Research Center.
[12] Narasimhaiah Gorla, "Features to Consider in a Data Warehousing System",
Communications of the ACM, November 2003, Vol. 46, No. 11, pp. 111-115.
Authors Biography
Sandeep Kumar graduated from University Engineering College, Kota, in 2005 and
received his M.Tech. from RTU in 2010.

Data Structure and its Fundamentals
 
Data Mining And Data Warehousing Laboratory File Manual
Data Mining And Data Warehousing Laboratory File ManualData Mining And Data Warehousing Laboratory File Manual
Data Mining And Data Warehousing Laboratory File Manual
 
02 Related Concepts
02 Related Concepts02 Related Concepts
02 Related Concepts
 
UNIT - 1 Part 2: Data Warehousing and Data Mining
UNIT - 1 Part 2: Data Warehousing and Data MiningUNIT - 1 Part 2: Data Warehousing and Data Mining
UNIT - 1 Part 2: Data Warehousing and Data Mining
 
A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...A Review of Data Access Optimization Techniques in a Distributed Database Man...
A Review of Data Access Optimization Techniques in a Distributed Database Man...
 

More from Dr Sandeep Kumar Poonia

An improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithmAn improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithmDr Sandeep Kumar Poonia
 
Modified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithmModified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithmDr Sandeep Kumar Poonia
 
Enhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithmEnhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithmDr Sandeep Kumar Poonia
 
Memetic search in differential evolution algorithm
Memetic search in differential evolution algorithmMemetic search in differential evolution algorithm
Memetic search in differential evolution algorithmDr Sandeep Kumar Poonia
 
Improved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithmImproved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithmDr Sandeep Kumar Poonia
 
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithmComparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithmDr Sandeep Kumar Poonia
 
A novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithmA novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithmDr Sandeep Kumar Poonia
 
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked listsMultiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked listsDr Sandeep Kumar Poonia
 
Sunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmSunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmDr Sandeep Kumar Poonia
 
New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm Dr Sandeep Kumar Poonia
 
Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Dr Sandeep Kumar Poonia
 
Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...Dr Sandeep Kumar Poonia
 

More from Dr Sandeep Kumar Poonia (20)

Soft computing
Soft computingSoft computing
Soft computing
 
An improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithmAn improved memetic search in artificial bee colony algorithm
An improved memetic search in artificial bee colony algorithm
 
Modified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithmModified position update in spider monkey optimization algorithm
Modified position update in spider monkey optimization algorithm
 
Enhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithmEnhanced local search in artificial bee colony algorithm
Enhanced local search in artificial bee colony algorithm
 
RMABC
RMABCRMABC
RMABC
 
Memetic search in differential evolution algorithm
Memetic search in differential evolution algorithmMemetic search in differential evolution algorithm
Memetic search in differential evolution algorithm
 
Improved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithmImproved onlooker bee phase in artificial bee colony algorithm
Improved onlooker bee phase in artificial bee colony algorithm
 
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithmComparative study of_hybrids_of_artificial_bee_colony_algorithm
Comparative study of_hybrids_of_artificial_bee_colony_algorithm
 
A novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithmA novel hybrid crossover based abc algorithm
A novel hybrid crossover based abc algorithm
 
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked listsMultiplication of two 3 d sparse matrices using 1d arrays and linked lists
Multiplication of two 3 d sparse matrices using 1d arrays and linked lists
 
Sunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmSunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithm
 
New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm New Local Search Strategy in Artificial Bee Colony Algorithm
New Local Search Strategy in Artificial Bee Colony Algorithm
 
A new approach of program slicing
A new approach of program slicingA new approach of program slicing
A new approach of program slicing
 
Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...Performance evaluation of different routing protocols in wsn using different ...
Performance evaluation of different routing protocols in wsn using different ...
 
Enhanced abc algo for tsp
Enhanced abc algo for tspEnhanced abc algo for tsp
Enhanced abc algo for tsp
 
Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...Performance evaluation of diff routing protocols in wsn using difft network p...
Performance evaluation of diff routing protocols in wsn using difft network p...
 
Lecture28 tsp
Lecture28 tspLecture28 tsp
Lecture28 tsp
 
Lecture27 linear programming
Lecture27 linear programmingLecture27 linear programming
Lecture27 linear programming
 
Lecture26
Lecture26Lecture26
Lecture26
 
Lecture25
Lecture25Lecture25
Lecture25
 

Recently uploaded

Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 

Recently uploaded (20)

Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 

Database aggregation using metadata

Introduction
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. In addition to a relational database, a data warehouse environment includes an extraction, transformation, and loading (ETL) solution, online analytical processing (OLAP) and data mining capabilities, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

Data warehouses are designed to analyze data. A data warehousing system includes back-end tools for extracting, cleaning, and loading data from Online Transaction Processing (OLTP) databases and historical repositories of data. A DBMS is typically used for OLTP, whereas data warehouses are designed for OLAP and decision making [1][2][3].

A multidimensional structure is defined as “a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data” [4]. The structure is broken into cubes, and the cubes are able to store and access data within the confines of each cube. “Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions”. Multidimensional structures are quite popular for analytical databases that use online analytical processing (OLAP) applications [5].

Related Work
The earlier work on data warehousing involving WHIPS does not elaborate on the technique used for modeling the data warehouse itself. A data warehouse is used to support dimensional queries and hence requires dimensional modeling. The dimensional data model is commonly used in data warehousing systems. Dimensional modeling is the name of a logical design technique often used for data warehouses [1, 4, 8]. It is considered to be different from entity-relationship (ER) modeling.
The same modeling approach, at the logical level, can be used for any physical form, such as a multidimensional database or even flat files. Dimensional modeling is used for databases intended to support end-user queries in a data warehouse, and many data warehouse designers use it to build their data warehouses.

Dimensional modeling always uses the concepts of facts and dimensions. In this design model all data is stored in fact tables and dimension tables. Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another [9]. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables. Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.

A fact table is a table that contains the measures of interest. In data warehousing, a fact table consists of the measurements, metrics, or facts of a business process. It is often located at the centre of a star schema, surrounded by dimension tables. Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. Each data warehouse includes one or more fact tables.

A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes. Dimension tables contain attributes that describe fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst. Dimension tables contain hierarchies of attributes that aid in summarization. Dimensional modeling produces dimension tables in which each table contains fact attributes that are independent of those in other dimensions [2, 4, 10].

Figure 1: Star Schema by using Fact and Dimension Table.

In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimensions can join. The primary dimension table is the only table that can join to the fact table. In a snowflake schema, dimensions may be interlinked or may have one-to-many relationships with other tables.

The star schema is one of the ways of supporting dimensional modeling. It organizes the tables so that results can be retrieved from the database easily and quickly in the warehouse environment. Usually a star schema consists of one or more dimension tables around a fact table, which makes it look like a star. Figure 1 shows a simple star schema.
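The fact and dimension tables described above can be pictured as a small data structure. The following Java sketch is illustrative only: the class names (StarSchemaSketch, DimensionTable, FactTable) are assumptions made for this example, and the table and column names are borrowed from the test queries that appear later in the paper. It also assumes that each foreign key in the fact table carries the same name as the primary key of the dimension table it points to, as those queries suggest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: class names and layout are assumptions, not the authors' code.
final class StarSchemaSketch {

    /** A dimension (lookup) table: one primary key plus descriptive attributes. */
    static final class DimensionTable {
        final String name;
        final String primaryKey;
        final List<String> attributes;

        DimensionTable(String name, String primaryKey, String... attributes) {
            this.name = name;
            this.primaryKey = primaryKey;
            this.attributes = Arrays.asList(attributes);
        }
    }

    /** A fact table: measures plus one foreign key per dimension table it joins to. */
    static final class FactTable {
        final String name;
        final List<String> measures;
        final List<DimensionTable> dimensions;

        FactTable(String name, List<String> measures, List<DimensionTable> dimensions) {
            this.name = name;
            this.measures = measures;
            this.dimensions = dimensions;
        }

        /** The multipart key: one foreign key for each dimension's primary key. */
        List<String> foreignKeys() {
            List<String> keys = new ArrayList<>();
            for (DimensionTable d : dimensions) {
                keys.add(d.primaryKey);
            }
            return keys;
        }
    }

    /** A three-dimension excerpt of the base schema, using names from the test queries. */
    static FactTable academicFact() {
        return new FactTable(
            "academic_fact",
            Arrays.asList("planed_degree_completed", "degree_completed"),
            Arrays.asList(
                new DimensionTable("project", "project_key", "project_category", "project_type"),
                new DimensionTable("school", "school_key", "school_city", "region_name"),
                new DimensionTable("time1", "time_key", "day1", "month1", "fiscal_period")));
    }

    public static void main(String[] args) {
        FactTable fact = academicFact();
        System.out.println(fact.name + " multipart key: " + fact.foreignKeys());
    }
}
```

Keeping the dimensions as a list on the fact table mirrors the star shape: the fact table sits in the centre and reaches every dimension directly, while dimensions hold no references to one another.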
Performance is an important consideration for any schema, particularly in a decision-support system in which one routinely queries large amounts of data. In the star schema, any table that references or is referenced by another table must have a primary key, which is a column or group of columns whose contents uniquely identify each row. In a simple star schema, the primary key for the fact table consists of one or more foreign keys. A foreign key is a column or group of columns in one table whose values are defined by the primary key in another table [2, 10, 11].

Methodology
Work on aggregate navigators has so far been unsatisfactory, and the area remains largely unexplored. One of the documented algorithms was presented by Kimball in [12]. This algorithm is based on the star schema introduced previously. The base schema, or detailed schema, is shown in Figure 1. The algorithm is based on the following design requirements, which are essential for designing any "family of aggregate tables".

Simulator for Database Aggregation
The Aggregate Navigator algorithm was designed without using metadata, as discussed in the previous chapter. The Simulator for Database Aggregation, in contrast, shows the importance of having customized metadata. Developing the metadata is therefore the most crucial step. SQL statements are solved by using the metadata. The algorithm is implemented in Java; that is, the metadata is used through a Java program.

Proposed Algorithm for Simulator
The algorithm uses the same design requirements as the previous algorithm. The new algorithm is as follows:
1. The first step is the same as in the previous algorithm. For any given SQL statement presented to the DBMS, the smallest fact table that has not yet been examined in the family of schemas referenced by the query is examined. The information about the fact tables is maintained in the metadata.
2. The second step differs from the previous algorithm. Metadata is maintained for the family of schemas, and the dimension attributes change their names according to the shrunken or aggregated dimension table, as shown earlier. For any schema, the mapping is used to compare the table fields in the SQL statement with the table fields in the particular fact and dimension tables. The mapping should be correct. If any field in the SQL statement cannot be found in the current fact and dimension tables, go back to step 1 and find the next larger fact table.
3. If the query is not satisfied by the base schema either, solve the SQL statement by using the family of schema tables. The query then contacts the central data mart, which is the Main site in this example.
4. Now run the altered SQL. It is guaranteed to return the correct answer because all of the fields in the SQL statement are present in the chosen schema.

Figure 2: Architecture of Simulator for Database Aggregation.

In the previous algorithm, metadata is not maintained for the family of schemas. That is why the previous algorithm cannot solve every SQL statement related to this database. In the new approach, every SQL statement can be solved, as long as the statement is related to one or more schema tables in this database.

In the previous algorithm, solving an SQL query requires looking up each field name. Each such lookup is an SQL call to the DBMS's system table. A call to the system table may take several seconds, and if more than six or seven layers of aggregate tables exist, the aggregate navigator may take as much as 20 seconds to determine the correct choice, which is not acceptable.

The time taken by the aggregate navigator depends on the number of calls to the system table. As an approximation, one call to the system table takes one second, so the time (in seconds) taken by the aggregate navigator is almost equal to the product of the number of calls made to the system table and the time taken per call.

Experimental Results
For the first test, different queries needing metadata at different levels of aggregation are presented to the simulator for testing purposes.

Test 1: The following query asks for the region name in which the time for planned degree completed was less than the time for actual degree completed for January 1999.

select s.region_name
from academic_fact f, time1 t, school s
where f.planed_degree_completed < f.degree_completed
and f.school_key = s.school_key
and f.time_key = t.time_key
and t.fiscal_period = '1999'
and t.month1 = 'January';

The benefit of using aggregate tables is that additional information can be stored in fact tables for higher levels of aggregation. The aggregate fact table shows project aggregated to category, school rolled up to region, student rolled up to state, household rolled up to location, and time rolled up to month. The rewritten query is as follows:

select s.region_name
from academic_fact_aggregated_by_all f, time1 t, region s
where f.planed_degree_completed < f.degree_completed
and f.region_key = s.region_key
and f.time_key = t.time_key
and t.fiscal_period = '1999'
and t.month1 = 'January';

This query is optimized for the lowest aggregated table, named academic_fact_aggregated_by_all. Aggregate Navigator will first examine academic_fact_aggregated_by_all. Because this query is optimized at the lowest level of aggregation, the number of calls to the system table equals the number of fields in the query.

Test 2: This query is the same as the one used with the previous algorithm. The query asks for the degree completed and end-of-year status for every Monday in Jaipur.
select p.project_category, f.degree_completed, f.end_of_year_status
from academic_fact f, project p, school s, student u, household h, time1 t
where f.project_key = p.project_key
and f.school_key = s.school_key
and f.student_key = u.student_key
and f.time_key = t.time_key
and f.household_key = h.household_key
and t.day1 = "Monday"
and s.school_city = "jaipur"
group by p.project_category;

The previous algorithm does not even accomplish aggregate-aware optimization completely, whereas the same query presented to the Simulator for Database Aggregation gives the following complete query:

select p.project_category, f.degree_completed, f.end_of_year_status
from academic_fact_agg_by_category f, category p, school s, student u, household h, time1 t
where f.category_key = p.category_key
and f.school_key = s.school_key
and f.student_key = u.student_key
and f.time_key = t.time_key
and f.household_key = h.household_key
and t.day1 = "Monday"
and s.school_city = "jaipur"
group by p.project_category;

Aggregate Navigator will first examine academic_fact_aggregated_by_all and will fail after making one call for that table. It will then make calls for time_key, category_key, student_key, and project_category, that is, four calls to the system table. If Aggregate Navigator is optimized properly, it is possible to make calls only for the subsequent match. In this example, those are school_key, household_key, day1, and school_city. These fields can be satisfied at a higher level of aggregation, namely academic_fact_aggregated_by_category.

Test 3: This query, presented to the Simulator, uses the base schema. The query asks for the project type, status description, and degree completed for which the new student flag was True.

select p.project_type, s.status_description, f.degree_completed
from academic_fact f, project p, status s
where f.project_key = p.project_key
and f.status_key = s.status_key
and s.new_student_flag = "True";

The query makes reference to the status table, which is present only in the base schema. The query remains unchanged:

select p.project_type, s.status_description, f.degree_completed
from academic_fact f, project p, status s
where f.project_key = p.project_key
and f.status_key = s.status_key
and s.new_student_flag = 'True';

For this query, Aggregate Navigator will make one call for the academic_fact_aggregated_by_all table, and it will fail. It will then make another call for the academic_fact_aggregated_by_category table, and that will also fail.

Time Testing
The time taken by Aggregate Navigator depends on the number of calls made to the aggregate tables and fields. For Aggregate Navigator, the time (in seconds) taken is equal to the number of calls to the system table.
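The table-selection loop that Tests 1 to 3 exercise, together with the renaming step of the proposed algorithm, can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class NavigatorSketch, its Schema and Choice types, and the layout of the metadata are hypothetical, and the field sets are read off the three test queries rather than taken from the real schema. The sketch stops with an exception where the paper's step 3 would fall back to the central data mart, and it uses the spelling academic_fact_aggregated_by_category for the second aggregate table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: names, metadata layout, and field sets are assumptions.
final class NavigatorSketch {

    /** One metadata entry: a schema in the family of aggregate schemas. */
    static final class Schema {
        final String factTable;
        final Map<String, String> renames; // base-level name -> name used in this schema
        final Set<String> fields;          // base-level attributes this schema can answer

        Schema(String factTable, Map<String, String> renames, String... fields) {
            this.factTable = factTable;
            this.renames = renames;
            this.fields = new HashSet<>(Arrays.asList(fields));
        }
    }

    /** Result of navigation: the chosen schema and how many schemas were examined. */
    static final class Choice {
        final Schema schema;
        final int examined;

        Choice(Schema schema, int examined) {
            this.schema = schema;
            this.examined = examined;
        }
    }

    /** Steps 1 and 2: try schemas from smallest to largest; keep the first that covers every field. */
    static Choice choose(List<Schema> smallestFirst, List<String> queryFields) {
        int examined = 0;
        for (Schema s : smallestFirst) {
            examined++;
            if (s.fields.containsAll(queryFields)) {
                return new Choice(s, examined);
            }
        }
        // The paper's step 3 (central data mart) is outside the scope of this sketch.
        throw new IllegalArgumentException("no schema covers " + queryFields);
    }

    /** Rewrites base-level SQL by whole-word renaming of tables and key columns. */
    static String rewrite(String sql, Schema target) {
        String out = sql;
        for (Map.Entry<String, String> e : target.renames.entrySet()) {
            out = out.replaceAll("\\b" + Pattern.quote(e.getKey()) + "\\b",
                                 Matcher.quoteReplacement(e.getValue()));
        }
        return out;
    }

    /** Hypothetical metadata for the three schemas that the tests touch. */
    static List<Schema> sampleFamily() {
        Map<String, String> byAll = new LinkedHashMap<>();
        byAll.put("academic_fact", "academic_fact_aggregated_by_all");
        byAll.put("school_key", "region_key");
        byAll.put("school", "region");

        Map<String, String> byCategory = new LinkedHashMap<>();
        byCategory.put("academic_fact", "academic_fact_aggregated_by_category");
        byCategory.put("project_key", "category_key");
        byCategory.put("project", "category");

        return Arrays.asList(
            new Schema("academic_fact_aggregated_by_all", byAll,
                "region_name", "fiscal_period", "month1", "project_category",
                "planed_degree_completed", "degree_completed", "end_of_year_status"),
            new Schema("academic_fact_aggregated_by_category", byCategory,
                "region_name", "fiscal_period", "month1", "project_category",
                "planed_degree_completed", "degree_completed", "end_of_year_status",
                "school_city", "day1"),
            new Schema("academic_fact", new LinkedHashMap<String, String>(),
                "region_name", "fiscal_period", "month1", "project_category",
                "planed_degree_completed", "degree_completed", "end_of_year_status",
                "school_city", "day1", "project_type", "status_description",
                "new_student_flag"));
    }

    public static void main(String[] args) {
        Choice c = choose(sampleFamily(), Arrays.asList("region_name", "month1"));
        System.out.println(rewrite("select s.region_name from academic_fact f, school s", c.schema));
    }
}
```

With this sample metadata, the fields of Test 1 are covered by the first schema examined, those of Test 2 by the second, and those of Test 3 only by the base schema, which matches the order of failures described for the three tests. The whole-word matching in rewrite is what keeps project_category intact while project and project_key are renamed.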
For the above queries, the number of iterations and the time taken by Case1 to resolve each query are noted; the time taken by Case2 is noted in the same way. Timing is tested by running different queries on the Simulator and comparing the times for the Aggregate Navigator, Case1 and Case2. The time recorded for each query indicates its performance. JDBC makes a persistent connection to the database. This means that the connection time is associated only with the first query to the system table for the first field being examined. The time for the very first query is therefore much higher than for the other queries.

All tests on the Simulator are performed on a Celeron processor (1.50 GHz) with 512 MB RAM. Java 1.6 is used as the front end and MS Access as the back end. A query executed through Case1 takes a few milliseconds. In Test 1, the query is resolved from the academic_aggregated_by_all table, which is the smallest table, so only this one table is called; accessing it takes 16 milliseconds. In Test 2, the query is resolved from the academic_aggregated_by_category table, which is the second smallest table. The smallest table, academic_aggregated_by_all, is examined first; the query cannot be resolved from it, so the search moves on to the second smallest table, academic_aggregated_by_category. For this query, the academic_aggregated_by_all table takes 16 milliseconds and the academic_aggregated_by_category table also takes 16 milliseconds. The remaining tests proceed in the same way as Test 1 and Test 2.

Table I: Time taken by Aggregate Navigator, Case1 and Case2.

Query     Aggregate Navigator    Case1     Case2
Test 1    112 ms                 16 ms     32 ms
Test 2    144 ms                 32 ms     46 ms
Test 3    91.98 ms               47 ms     78 ms
Test 4    173.25 ms              63 ms     78 ms
Test 5    110.25 ms              63 ms     93 ms

A comparison of the time taken by the Aggregate Navigator and the Simulator is shown in the graph below.

Figure 3: Comparison of time taken by Aggregate Navigator and Simulator (Test 1 to Test 5; axis range 0 to 200).
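To make the comparison in Table I easier to read, the ratio of the Aggregate Navigator time to the Case1 and Case2 times can be computed for each test. The short sketch below does only this arithmetic. It assumes that all three columns are in milliseconds, as the cell values indicate; the names TIMES_MS and speedup are introduced here for illustration and do not come from the paper.

```python
from typing import Dict, Tuple

# Timings transcribed from Table I: (Aggregate Navigator, Case1, Case2).
# Assumption: every value is in milliseconds.
TIMES_MS: Dict[str, Tuple[float, float, float]] = {
    "Test 1": (112.0, 16.0, 32.0),
    "Test 2": (144.0, 32.0, 46.0),
    "Test 3": (91.98, 47.0, 78.0),
    "Test 4": (173.25, 63.0, 78.0),
    "Test 5": (110.25, 63.0, 93.0),
}


def speedup(test: str) -> Tuple[float, float]:
    """Return (navigator / Case1, navigator / Case2) for one test."""
    navigator, case1, case2 = TIMES_MS[test]
    return navigator / case1, navigator / case2


for name in TIMES_MS:
    over_case1, over_case2 = speedup(name)
    print(f"{name}: {over_case1:.2f}x vs Case1, {over_case2:.2f}x vs Case2")
```

Under this assumption the ratio is largest for Test 1, where the query is resolved from the smallest aggregate table with a single lookup.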
Conclusion
The work done for this thesis resulted in a Simulator for Database Aggregation that is fast and performs query optimization efficiently. The thesis suggests a new approach to maintaining metadata for use with the Simulator so that optimization is efficient. Metadata is examined only for the properties of a table, so query responses become faster. The Simulator is general purpose and can be configured for any SQL-compatible database, so queries are optimized efficiently. The Simulator itself is written in Java, which makes it suitable for use with Web-related front-end tools such as Java applets. JDBC makes a persistent connection to the database. This means that the connection time is associated only with the first query to the system table for the first field being examined.
The tests also show that the Simulator can be used efficiently in the emerging distributed data warehousing approach. This would make the distributed nature of the data warehouse and the presence of aggregated tables transparent to end users. As pointed out earlier, the performance of the Simulator can be improved drastically in a distributed approach with the use of a cache.

References
[1] Surajit Chaudhuri et al., "Database technology for decision support systems", IEEE Computer, pp. 48-55, 2001.
[2] J. Hammer, H. Garcia-Molina, W. Labio, J. Widom, and Y. Zhuge, "The Stanford Data Warehousing Project", IEEE Data Engineering Bulletin, 18:2, pp. 41-48, 1995.
[3] Daniel Barbará and Xintao Wu, "The Role of Approximations in Maintaining and Using Aggregate Views", IEEE Data Engineering Bulletin, 22:4, pp. 15-21, 1999.
[4] S. Sarawagi, "Indexing OLAP Data", IEEE Data Engineering Bulletin, 20:1, pp. 36-43, 1997.
[5] V. Harinarayan, "Issues in Interactive Aggregation", IEEE Data Engineering Bulletin, 20:1, pp. 12-18, 1997.
[6] W. H. Inmon, "Building the Data Warehouse", ISBN-13: 978-0-7645-9944-6, John Wiley, 1992.
[7] Ramon C. Barquin, "A data warehousing manifesto", Planning and Designing the Data Warehouse, Prentice Hall, 1997.
[8] K. Sahin, "Multidimensional database technology and data warehousing", Database Journal, December 1995. Online: http://www.kenan.com/acumate/.
[9] Jane Zhao, "Designing Distributed Data Warehouses and OLAP Systems", Massey University, Information Science Research Centre.
[10] J. L. Wiener, H. Gupta, W. J. Labio, Y. Zhuge, H. Garcia-Molina, and J. Widom, "A System Prototype for Warehouse View Maintenance", in Proceedings of the ACM Workshop on Materialized Views: Techniques and Applications, pp. 26-33, Montreal, Canada, June 7, 1996.
[11] Rakesh Agrawal, Ashish Gupta, Sunita Sarawagi, "Modeling Multidimensional Databases", IBM Almaden Research Center.
[12] Narasimhaiah Gorla, "Features to Consider in a Data Warehousing System", Communications of the ACM, Vol. 46, No. 11, pp. 111-115, November 2003.

Authors Biography
Sandeep Kumar graduated from University Engineering College Kota, Kota, in 2005 and received his M.Tech. from RTU in 2010.