Advanced Database Systems
Question 1. List and explain the various Normal Forms. How does BCNF differ from the
Third and Fourth Normal Forms?
Normal Forms
Relations are classified based upon the types of anomalies to which they're vulnerable. A database
that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the
domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature.
That is, the lowest level is the first normal form, and the database cannot meet the requirements for
higher level normal forms without first having met all the requirements of the lesser normal forms.
First Normal Form
Any table that qualifies as a relation is said to be in the first normal form. To be
considered relational, the cells of the table must contain only single values; repeating groups or
arrays are not allowed as values. All attributes (the entries in a column) must be of the same kind, and
each column must have a unique name. Each row in the table must be unique. Databases in the first
normal form are the weakest and suffer from all modification anomalies.
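The single-value rule can be illustrated with a minimal sketch. The table, the column names, and the helper to_first_normal_form are hypothetical and are not taken from the assignment; the sketch only shows one way a repeating group might be split into atomic rows.

```python
def to_first_normal_form(rows, multi_valued):
    """Return rows in which the multi-valued column holds one value per row."""
    flat = []
    for row in rows:
        # Note: a row whose list is empty produces no output row in this sketch.
        for value in row[multi_valued]:
            atomic = dict(row)            # copy the single-valued attributes
            atomic[multi_valued] = value  # keep exactly one value in the cell
            flat.append(atomic)
    return flat


# Hypothetical data: "phones" is a repeating group, so this is not in 1NF.
students = [
    {"student_id": 1, "name": "Ann", "phones": ["555-0100", "555-0101"]},
    {"student_id": 2, "name": "Bob", "phones": ["555-0199"]},
]

flat_students = to_first_normal_form(students, "phones")
for row in flat_students:
    print(row)
```

After flattening, each row is unique only on the combination of student_id and phones, which is why the key of the flattened table becomes composite.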
Second Normal Form
If every non-key attribute of a relation depends on the whole of the key, rather than on only part of it, then the relation is
considered to meet the criteria for being in the second normal form. This normal form solves the
problem of partial dependencies, so it only matters for relations with composite keys.
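A partial dependency can be detected mechanically with the standard attribute-closure computation. The following is a sketch under stated assumptions: the order-line relation, its functional dependencies, and the helper names closure and partial_dependencies are invented for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """List (key_subset, attribute) pairs where a proper subset of the
    composite key already determines a non-key attribute."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found


# Hypothetical relation OrderLine(order_id, product_id, quantity, product_name)
fds = [
    (frozenset({"order_id", "product_id"}), frozenset({"quantity"})),
    (frozenset({"product_id"}), frozenset({"product_name"})),
]
key = {"order_id", "product_id"}
non_key = {"quantity", "product_name"}

violations = partial_dependencies(key, non_key, fds)
print(violations)
```

Here product_name depends on product_id alone, so the relation would need to be split (for example into an order-line table and a product table) to reach the second normal form.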
Third Normal Form
A database is in the third normal form if it meets the criteria for a second normal form and has no
transitive dependencies.
Boyce-Codd Normal Form
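A database that meets third normal form criteria and in which every determinant is a candidate
key is said to be in the Boyce-Codd Normal Form. This normal form removes the anomalies that the
third normal form still permits when a determinant is not a candidate key, which can happen when a
relation has several overlapping candidate keys.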
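The difference between the third normal form and BCNF can be made concrete with the well-known address example, in which street and city determine the zip code and the zip code determines the city. The sketch below is illustrative only: the helper names are assumptions, and it checks just the listed dependencies (which is sufficient for testing the relation itself, not its decompositions).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo_set = set(combo)
            if any(k <= combo_set for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo_set, fds) == attributes:
                keys.append(frozenset(combo_set))
    return keys


def violations(attributes, fds, form):
    """Dependencies that break the given normal form ("3NF" or "BCNF")."""
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency
        if closure(lhs, fds) == attributes:
            continue  # determinant is a superkey: allowed in both forms
        if form == "3NF" and (rhs - lhs) <= prime:
            continue  # 3NF tolerates this when the dependents are prime
        bad.append((lhs, rhs))
    return bad


attributes = {"street", "city", "zip"}
fds = [
    (frozenset({"street", "city"}), frozenset({"zip"})),
    (frozenset({"zip"}), frozenset({"city"})),
]

print(violations(attributes, fds, "3NF"))
print(violations(attributes, fds, "BCNF"))
```

The relation passes the 3NF test because city is part of a candidate key, but it fails the BCNF test because zip is a determinant without being a candidate key.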
Fourth Normal Form
Fourth Normal Form (4NF) is an extension of BCNF for functional and multi-valued dependencies. A
schema is in 4NF if the left hand side of every non-trivial functional or multi-valued dependency is a
super-key.
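A multi-valued dependency can be sketched on a small relation instance. The course, teacher, and book data below are hypothetical. The check relies on the standard result that, in a three-column relation, course ->> teacher holds exactly when the relation equals the join of its two projections on course.

```python
def projections(rows):
    """Split (course, teacher, book) rows into the two 4NF candidate tables."""
    course_teacher = {(c, t) for (c, t, b) in rows}
    course_book = {(c, b) for (c, t, b) in rows}
    return course_teacher, course_book


def join_on_course(course_teacher, course_book):
    """Natural join of the two projections on the shared course column."""
    return {
        (c1, t, b)
        for (c1, t) in course_teacher
        for (c2, b) in course_book
        if c1 == c2
    }


def mvd_decomposition_is_lossless(rows):
    """True when course ->> teacher (and course ->> book) holds in rows."""
    return join_on_course(*projections(rows)) == rows


# Teachers and books vary independently per course, so the only key is the
# whole row and course is not a super-key: the relation is not in 4NF.
ctb = {
    ("DB", "Smith", "Date"),
    ("DB", "Smith", "Ullman"),
    ("DB", "Jones", "Date"),
    ("DB", "Jones", "Ullman"),
    ("OS", "Lee", "Tanenbaum"),
}

print(mvd_decomposition_is_lossless(ctb))
print(projections(ctb))
```

Storing the two projections instead of the original relation removes the redundancy (each teacher and each book is recorded once per course) without losing information.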
Domain/Key Normal Form
The domain/key normal form is the Holy Grail of relational database design, achieved when every
constraint on the relation is a logical consequence of the definition of keys and domains, and
enforcing key and domain constraints causes all constraints to be met. Thus, it avoids all
non-temporal anomalies. It's much easier to build a database in domain/key normal form than it is to
convert lesser databases which may contain numerous anomalies. However, successfully building a
domain/key normal form database remains a difficult task, even for experienced database
programmers. Thus, while the domain/key normal form eliminates the problems found in most
databases, it tends to be the most costly normal form to achieve. However, failing to achieve the
domain/key normal form may carry long-term, hidden costs due to anomalies which appear in
databases adhering only to lower normal forms over time.
Question 2. Describe the concepts of Structural Semantic Data Model (SSM).
A data model in software engineering is an abstract model that describes how data are represented and
accessed. Data models formally define data elements and relationships among data elements for a
domain of interest. According to Hoberman (2009), "A data model is a wayfinding tool for both
business and IT professionals, which uses a set of symbols and text to precisely explain a subset of
real information to improve communication within the organization and thereby lead to a more
flexible and stable application environment." A data model explicitly determines the structure of data
or structured data. Typical applications of data models include database models, design of information
systems, and enabling exchange of data. Usually data models are specified in a data modeling
language. Communication and precision are the two key benefits that make a data model important to
applications that use and exchange data. A data model is the medium through which project team members
from different backgrounds and with different levels of experience can communicate with one
another. Precision means that the terms and rules on a data model can be interpreted only one way and
are not ambiguous. A data model can be sometimes referred to as a data structure, especially in the
context of programming languages. Data models are often complemented by function models,
especially in the context of enterprise models.
A semantic data model in software engineering is a technique to define the meaning of data within the
context of its interrelationships with other data. A semantic data model is an abstraction which defines
how the stored symbols relate to the real world. A semantic data model is sometimes called a
conceptual data model. The logical data structure of a database management system (DBMS), whether
hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition
of data because it is limited in scope and biased toward the implementation strategy employed by the
DBMS. Therefore, the need to define data from a conceptual view has led to the development of
semantic data modeling techniques. The real world, in terms of resources, ideas, events,
etc., is symbolically defined within physical data stores. Because a semantic data model
defines how these stored symbols relate to the real world, the model must be a true
representation of the real world.
Data modeling in software engineering is the process of creating a data model by applying formal data
model descriptions using data modeling techniques. Data modeling is a technique for defining
business requirements for a database. It is sometimes called database modeling because a data model
is eventually implemented in a database.
The figure illustrates the way data models are developed and used today. A conceptual data model is
developed based on the data requirements for the application that is being developed, perhaps in the
context of an activity model. The data model will normally consist of entity types, attributes,
relationships, integrity rules, and the definitions of those objects. This is then used as the start point
for interface or database design.
Data architecture is the design of data for use in defining the target state and the subsequent planning
needed to hit the target state. It is usually one of several architecture domains that form the pillars of
an enterprise architecture or solution architecture.
Question 3. Describe the following with respect to Object Oriented Databases:
a. Query Processing in Object-Oriented Database Systems
One of the criticisms of first-generation
object-oriented database management systems (OODBMSs) was their lack of declarative query
capabilities. This led some researchers to brand first generation (network and hierarchical) DBMSs as
object-oriented [Ullman 1988]. It was commonly believed that the application domains that
OODBMS technology targets do not need querying capabilities. This belief no longer holds, and
declarative query capability is accepted as one of the fundamental features of OODBMSs [Atkinson et
al. 1989; Stonebraker et al. 1990]. Indeed, most of the current prototype systems experiment with
powerful query languages and investigate their optimization. Commercial products have started to
include such languages as well (e.g., O2 [Deux et al. 1991], ObjectStore [Lamb et al. 1991]). In this
chapter we discuss the issues related to the optimization and execution of OODBMS query languages
(which we collectively call query processing). Query optimization techniques are dependent upon the
query model and language. For example, a functional query language lends itself to functional
optimization which is quite different from the algebraic, cost-based optimization techniques employed
in relational as well as a number of object-oriented systems. The query model, in turn, is based on the
data (or object) model since the latter defines the access primitives which are used by the query
model. These primitives, at least partially, determine the power of the query model. Despite this close
relationship, in this chapter we do not consider issues related to the design of object models, query
models, or query languages in any detail. Language design issues are discussed elsewhere in this
book. The interrelationship between object and query models is discussed in [Blakeley 1991; Ozsu
and Straube 1991; Ozsu et al. 1993; Yu and Osborn 1991].
Almost all object query processors proposed to date use optimization techniques developed for
relational systems. However, there are a number of issues that make query processing more difficult
in OODBMSs. The following are some of the more important issues:
1. Type system. Relational query languages operate on a simple type system consisting of a single
aggregate type: relation. The closure property of relational languages implies that each relational
operator takes one or more relations as operands and produces a relation as a result. In contrast, object
systems have richer type systems. The results of object algebra operators are usually sets of objects
(or collections) whose members may be of different types. If the object languages are closed under the
algebra operators, these heterogeneous sets of objects can be operands to other operators. This
requires the development of elaborate type inferencing schemes to determine which methods can be
applied to all the objects in such a set. Furthermore, object algebras often operate on semantically
different collection types (e.g., set, bag, list) which imposes additional requirements on the type
inferencing schemes to determine the type of the results of operations on collections of different
types.
2. Encapsulation. Relational query optimization depends on knowledge of the physical storage of data
(access paths) which is readily available to the query optimizer. The encapsulation of methods with
the data that they operate on in OODBMSs raises (at least) two issues. First, estimating the cost of
executing methods is considerably more difficult than estimating the cost of accessing an attribute
according to an access path. In fact, optimizers have to worry about optimizing method execution,
which is not an easy problem because methods may be written using a general-purpose programming
language. Second, encapsulation raises issues related to the accessibility of storage information by the
query optimizer. Some systems overcome this difficulty by treating the query optimizer as a special
application that can break encapsulation and access information directly [Cluet and Delobel 1992].
Others propose a mechanism whereby objects “reveal” their costs as part of their interface [Graefe
and Maier 1988].
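The idea of objects revealing their costs through their interface can be sketched as follows. This is not the mechanism of the cited proposal; the CostRevealing protocol, the Image class, its cost figures, and order_predicates are hypothetical names chosen only to show how an optimizer might use such information without inspecting method bodies.

```python
from typing import Protocol


class CostRevealing(Protocol):
    """Hypothetical interface: an object reports an estimated cost per method."""

    def estimated_cost(self, method_name: str) -> float: ...


class Image:
    # Assumed, made-up cost units: reading a stored attribute is cheap,
    # analysing pixel content is expensive.
    _COSTS = {"width": 1.0, "dominant_color": 500.0}

    def estimated_cost(self, method_name: str) -> float:
        return self._COSTS[method_name]


def order_predicates(obj: CostRevealing, method_names):
    """Evaluate cheap methods first so that expensive ones run on fewer objects."""
    return sorted(method_names, key=obj.estimated_cost)


plan = order_predicates(Image(), ["dominant_color", "width"])
print(plan)
```

The optimizer here never looks inside the methods, so encapsulation is preserved; the trade-off is that it must trust the estimates the objects report.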
b. Query Processing Architecture
In this section we focus on two architectural issues: the query processing methodology and the query
optimizer architecture.
1 Query Processing Methodology
A query processing methodology similar to relational DBMSs, but modified to deal with the
difficulties discussed in the previous section, can be followed in OODBMSs. The figure depicts such a
methodology proposed in [Straube and Ozsu 1990a]. The steps of the methodology are as follows.
Queries are expressed in a declarative language which requires no user knowledge of object
implementations, access paths or processing strategies. The calculus expression is first normalized
by calculus optimization and then passes through the remaining stages shown in the figure:
1. Calculus optimization: declarative query -> normalized calculus expression
2. Calculus-algebra transformation: -> object algebra expression
3. Type check: -> type consistent expression
4. Algebra optimization: -> optimized algebra expression
5. Execution plan generation: -> execution plan
2 Optimizer Architecture: Query optimization can be modeled as an optimization problem whose
solution is the choice of the “optimum” state in a state space (also called search space). In query
optimization, each state corresponds to an algebraic query indicating an execution schedule and
represented as a processing tree. The state space is a family of equivalent (in the sense of generating
the same result) algebraic queries. Query optimizers generate and search a state space using a search
strategy applying a cost function to each state and finding one with minimal cost. (In this chapter we are mostly
concerned with cost-based optimization, which is arguably the more interesting case.) Thus, to
characterize a query optimizer three things need to be specified:
1. The search space and the transformation rules that generate the alternative query expressions
which constitute the search space;
2. A search algorithm that allows one to move from one state to another in the search space; and
3. The cost function that is applied to each state.
Many existing OODBMS optimizers are either
implemented as part of the object manager on top of a storage system, or they are implemented as
client modules in client-server architecture. In most cases, the above mentioned three aspects are
“hardwired” into the query optimizer. Given that extensibility is a major goal of OODBMSs, one
would hope to develop an extensible optimizer that accommodates different search strategies,
different algebra specifications with their different transformation rules, and different cost functions.
Rule-based query optimizers provide a limited amount of extensibility by allowing the definition of
new transformation rules. However, they do not allow extensibility in other dimensions. In this
section we discuss some new promising proposals for extensibility in OODBMSs. The Open OODB
project [Wells et al. 1992] at Texas Instruments
concentrates on the definition of an open architectural framework for OODBMSs and on the
description of the design space for these systems. Query processing in Open OODB is described in [Blakeley et al.
1993]. The query module is an example of intra-module extensibility in Open OODB. The query
optimizer, built using the Volcano optimizer generator, is extensible with respect to algebraic
operators, logical transformation rules, execution algorithms, implementation rules (i.e., logical
operator to execution algorithm mappings), cost estimation functions, and physical property
enforcement functions (e.g., presence of objects in memory). The clean separation between the user
query language parsing structures and the operator graph on which the optimizer operates allows the
replacement of the user language or optimizer. The separation between algebraic operators and
execution algorithms allows exploration with alternative methods for implementing algebraic
operators. Code generation is also a well defined subcomponent of the query module which facilitates
porting the query module to work on top of other OODBMSs. The Open OODB query processor
includes a query execution engine containing efficient implementations of scan, indexed scan, hybrid-
hash join [Shapiro 1986], and complex object assembly [Keller et al. 1991]. The EPOQ project
[Mitchell et al. 1993] is another approach to query optimization extensibility, where the search space
is divided into regions. Each region corresponds to an equivalent family of query expressions that are
reachable from each other. The regions are not necessarily mutually exclusive and differ in the queries
that they manipulate, control (search) strategy that they use, query transformation rules that they
incorporate, and optimization objectives they achieve. For example, one region may cover
transformation rules that deal with simple select queries, while another region may deal with
transformations for nested queries. Similarly, one region may have the objective of minimizing a cost
function, while another region may attempt to transform queries in some desirable form. Each region
may be nested to a number of levels, allowing hierarchical search within a region. Since the regions
do not represent equivalence classes, there is a need for a global control strategy to determine how the
query optimizer moves from one region to another. The feasibility and effectiveness of this approach
remain to be verified. The TIGUKAT project [Peters et al. 1992] uses an object-oriented approach to
query processing extensibility.
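The three ingredients of an optimizer named in this section (a search space generated by transformation rules, a search algorithm, and a cost function) can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the relation names, the cardinalities, the uniform join selectivity of 1/100, the single "swap adjacent relations" rule, and the exhaustive breadth-first search. None of it describes Open OODB, EPOQ, or TIGUKAT.

```python
from collections import deque

# Assumed statistics for three hypothetical relations.
CARDINALITY = {"Employee": 10_000, "Department": 50, "Project": 500}
SELECTIVITY_DIVISOR = 100  # assumed: every join keeps 1/100 of the cross product


def neighbours(state):
    """Transformation rule: swap two adjacent relations in a left-deep join order."""
    for i in range(len(state) - 1):
        swapped = list(state)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        yield tuple(swapped)


def cost(state):
    """Cost function: total size of all intermediate join results."""
    size = CARDINALITY[state[0]]
    total = 0
    for name in state[1:]:
        size = size * CARDINALITY[name] // SELECTIVITY_DIVISOR
        total += size
    return total


def optimize(start):
    """Search algorithm: exhaustive breadth-first walk of the state space."""
    seen = {start}
    queue = deque([start])
    best = start
    while queue:
        state = queue.popleft()
        if cost(state) < cost(best):
            best = state
        for nxt in neighbours(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best, len(seen)


best_order, states_explored = optimize(("Employee", "Department", "Project"))
print(best_order, cost(best_order), states_explored)
```

Because the three parts are separate functions, any one of them could be replaced (a different rule set, a heuristic search, another cost model) without touching the others, which is the kind of extensibility the section argues for.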
Question 4. Describe the Differences between Distributed & Centralized
Databases.
A distributed database is a database that is under the control of a central database management system
(DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple
computers located in the same physical location, or may be dispersed over a network of
interconnected computers. Collections of data (e.g. in a database) can be distributed across multiple
physical locations. A distributed database can reside on network servers on the Internet, on corporate
intranets or extranets, or on other company networks. The replication and distribution of databases
improves database performance at end-user worksites. To ensure that the distributed databases are up
to date and current, there are two processes: replication and duplication. Replication involves using
specialized software that looks for changes in the distributed database. Once the changes have been
identified, the replication process makes all the databases look the same. The replication process can
be very complex and time consuming depending on the size and number of the distributed databases,
and it can require a lot of computer resources. Duplication on the other hand is
not as complicated. It basically identifies one database as a master and then duplicates that database.
The duplication process is normally done at a set time after hours. This is to ensure that each
distributed location has the same data. In the duplication process, changes are allowed to the master
database only. This is to ensure that local data will not be overwritten. Both of the processes can keep
the data current in all distributed locations. Besides distributed database replication and
fragmentation, there are many other distributed database design technologies, for example local
autonomy and synchronous and asynchronous distributed database technologies. How these technologies
are implemented depends on the needs of the business and the sensitivity/confidentiality of
the data to be stored in the database, and hence the price the business is willing to pay for
data security, consistency and integrity.
Basic architecture
A database user accesses the distributed database through:
Local applications: Applications which do not require data from other sites.
Global applications: Applications which do require data from other sites.
A distributed database does not share main memory or disks.
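The duplication process described earlier in this answer (one master database, changes accepted at the master only, periodic copying to every other location) can be sketched as follows. The Site and MasterCopyCluster classes and the branch names are hypothetical; a real system would copy over a network and on a schedule rather than in memory.

```python
import copy


class Site:
    """One database location, reduced to a name and a key/value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterCopyCluster:
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def write(self, site, key, value):
        # Only the master accepts changes, so local data is never overwritten
        # by conflicting updates made at other sites.
        if site is not self.master:
            raise PermissionError("only the master database accepts changes")
        site.data[key] = value

    def duplicate(self):
        # Run at a set time (for example after hours): copy the master wholesale.
        for replica in self.replicas:
            replica.data = copy.deepcopy(self.master.data)


head_office = Site("head_office")
branch_a, branch_b = Site("branch_a"), Site("branch_b")
cluster = MasterCopyCluster(head_office, [branch_a, branch_b])

cluster.write(head_office, "account:42", 100)
cluster.duplicate()
print(branch_a.data, branch_b.data)
```

Replication differs in that changes may originate at any site and must be detected and reconciled, which is why the text describes it as the more complex and resource-intensive of the two processes.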
A centralized database has all its data in one place, whereas a distributed database
has data in different places. Because all the data of a centralized database resides in one place,
bottlenecks can occur, and data availability is lower than in a distributed database. The
advantages of a distributed database listed below clarify the difference between centralized
and distributed databases.
Advantages of Data Distribution
The primary advantage of distributed database systems is the ability to share and access data in a
reliable and efficient manner.
Data sharing and Distributed Control:
If a number of different sites are connected to each other, then a user at one site may be able to access
data that is available at another site. For example, in the distributed banking system, it is possible for a
user in one branch to access data in another branch. Without this capability, a user wishing to transfer
funds from one branch to another would have to resort to some external mechanism for such a
transfer. This external mechanism would, in effect, be a single centralized database.
The primary advantage to accomplishing data sharing by means of data distribution is that each site is
able to retain a degree of control over data stored locally. In a centralized system, the database
administrator of the central site controls the database. In a distributed system, there is a global
database administrator responsible for the entire system. A part of these responsibilities is delegated to
the local database administrator for each site. Depending upon the design of the distributed database
system, each local administrator may have a different degree of autonomy which is often a major
advantage of distributed databases.
Question 5. Explain the following:
a. Query Optimization
Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to
the database server and parsed by the parser, they are passed to the query optimizer where
optimization occurs. However, some database engines allow guiding the query optimizer with hints.
A query is a request for information from a database. It can be as simple as "finding the address of a
person with SS# 123-45-6789," or more complex like "finding the average salary of all the employed
married men in California between the ages of 30 and 39 who earn less than their wives." Query results
are generated by accessing relevant database data and manipulating it in a way that yields the
requested information. Since database structures are complex, in most cases, and especially for not-
very-simple queries, the needed data for a query can be collected from a database by accessing it in
different ways, through different data-structures, and in different orders. Each different way typically
requires different processing time. Processing times of the same query may have large variance, from a
fraction of a second to hours, depending on the way selected. The purpose of query optimization,
which is an automated process, is to find the way to process a given query in minimum time. The
large possible variance in time justifies performing query optimization, though finding the exact
optimal way to execute a query, among all possibilities, is typically very complex, time consuming by
itself, may be too costly, and often practically impossible. Thus query optimization typically tries to
approximate the optimum by comparing several common-sense alternatives to provide in a reasonable
time a "good enough" plan which typically does not deviate much from the best possible result.
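The effect of choosing between different access paths can be observed with SQLite, which ships with Python and exposes its chosen plan through EXPLAIN QUERY PLAN. This is a sketch under stated assumptions: the person table, its columns, and the index name are invented, and the exact wording of the plan text varies between SQLite versions, so only keywords are inspected.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the optimizer's plan for sql as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, ssn TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO person (ssn, address) VALUES (?, ?)",
    [(f"{i:09d}", f"{i} Main St") for i in range(1000)],
)

query = "SELECT address FROM person WHERE ssn = ?"

# Without an index the only available access path is a full table scan.
before = plan(conn, query, ("000000042",))

# Adding an index gives the optimizer a cheaper alternative for the same query.
conn.execute("CREATE INDEX idx_person_ssn ON person (ssn)")
after = plan(conn, query, ("000000042",))

print(before)
print(after)
```

The query text is identical in both runs; only the set of alternatives available to the optimizer changed, which is what makes the plan choice an automated decision rather than something the user writes.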
b. Text Retrieval Using SQL3/Text Retrieval
SQL3 supports storage of multimedia data, such as text documents, in an O-R database using the
blob/clob data types. However, the standard SQL3 specification does not include support for
processing the media content, such as indexing or querying. Thus it is not possible to use standard
SQL3 to locate documents based on an analysis of their content. Therefore, most of the larger
OR-DBMS vendors (IBM, Oracle, Ingres, Postgres ...) have used the SQL3 UDT/UDF functionality to
extend their OR-DBMS with management systems for media data. The approach used has been to add
their own or purchased specialized media management systems to the basic OR-DBMS.
Basically, the functionality that is new to SQL3 includes:
Indexing routines for the various types of media data, as discussed in CH.6, for example using:
o Content terms for text data and
o Color, shape, and texture features for image data.
Selection operators for the SQL3 WHERE clause for specification of selection criteria for
media retrieval.
Text processing sub-systems for similarity evaluation and result ranking.
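The role of a user-defined function as a selection operator in the WHERE clause, combined with simple result ranking, can be sketched with SQLite's UDF support in Python. This is not SQL3/TextRetrieval syntax and not any vendor's text extension; the report table, the term_score function, and its crude term-count scoring are assumptions made for illustration.

```python
import sqlite3


def term_score(document, query):
    """Count how often the query terms occur in the document (case-insensitive)."""
    doc_terms = document.lower().split()
    return sum(doc_terms.count(term) for term in query.lower().split())


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it by name.
conn.create_function("term_score", 2, term_score)

conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO report (body) VALUES (?)",
    [
        ("the election results favour the incumbent candidate",),
        ("weather report for the coast",),
        ("election day election night election coverage",),
    ],
)

search = "election candidate"
rows = conn.execute(
    "SELECT id, term_score(body, ?) AS score FROM report "
    "WHERE term_score(body, ?) > 0 "
    "ORDER BY score DESC, id",
    (search, search),
).fetchall()
print(rows)
```

Because the function scans every document at query time, it shows the selection and ranking parts only; the indexing routines mentioned in the list exist precisely to avoid that full scan.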
Unfortunately, the result of this 'independent' activity is non-standard OR-DBMS/MM (multimedia)
systems that differ in the functionality included and limit data retrieval across multiple OR-DBMS
types. For example, unified access to data stored in Oracle and DB2 systems is difficult, both in query
formulation and result presentation. Since actual SQL3/TextRetrieval syntax varies between
OR-DBMS/MM implementations, the examples used in the following are given in generic
SQL3/TextRetrieval statements.
8.1 Text Document Retrieval
Multimedia documents can be complex, but are basically unstructured. They can consist of the raw
text only, or have a few fixed attributes with one or more semi- or unstructured components. For
example, a news report for an election could include the following components:
1. Identifier, date, and author(s) of the report,
2. n* text blocks - (titles, abstract, content text),
3. m* images - example: image_of_candidate
4. k* charts, and
5. x* maps.
where n, m, k, and x are the number of occurrences of each component type.
Note that the document elements listed in item 1 above function as context metadata for the report, while
the text itself can function as semantic metadata for the image materials (Rønnevik, 2005). The figure illustrates
elements of a semi-structured document. The original Grieg site also contains a list of references/links
which gives access to other multimedia documents about the composer, including some of his music.
Since an OR-DB can contain text documents such as web pages, SQL3 should be extended with
processing operators that support access to each of the element types listed above.
Question 6. Describe the following:
a. Data Mining Functions: Data mining functions can be divided into two categories: supervised
(directed) and unsupervised (undirected).
Supervised functions are used to predict a value; they require the specification of a target (known
outcome). Targets are either binary attributes indicating yes/no decisions (buy/don't buy, churn or
don't churn, etc.) or multi-class targets indicating a preferred alternative (color of sweater, likely
salary range, etc.). Naive Bayes for classification is a supervised mining algorithm.
Unsupervised functions are used to find the intrinsic structure, relations, or affinities in data.
Unsupervised mining does not use a target. Clustering algorithms can be used to find naturally
occurring groups in data.
Data mining can also be classified as predictive or descriptive. Predictive data mining constructs one
or more models; these models are used to predict outcomes for new data sets. Predictive data mining
functions are classification and regression. Naive Bayes is one algorithm used for predictive data
mining. Descriptive data mining describes a data set in a concise way and presents interesting
characteristics of the data. Descriptive data mining functions are clustering, association models, and
feature extraction. k-Means clustering is an algorithm used for descriptive data mining.
Different algorithms serve different purposes; each algorithm has advantages and disadvantages. A
given algorithm can be used to solve different kinds of problems. For example, k-Means clustering is
unsupervised data mining; however, if you use k-Means clustering to assign new records to a cluster,
it performs predictive data mining. Similarly, decision tree classification is supervised data mining;
however, the decision tree rules can be used for descriptive purposes.
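The k-Means remark above (an unsupervised model that becomes predictive once it is used to assign new records) can be sketched in plain Python. The purchase amounts, the choice of two clusters, and the starting centroids are assumptions for illustration; this is a one-dimensional toy version, not the Oracle Data Mining implementation.

```python
def assign(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def k_means_1d(values, centroids, iterations=10):
    """Plain k-Means on one numeric attribute with fixed starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            groups[assign(v, centroids)].append(v)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centroids)
        ]
    return centroids


# Hypothetical purchase amounts with no target column: unsupervised input.
purchases = [12, 15, 14, 90, 95, 100]
centroids = k_means_1d(purchases, centroids=[min(purchases), max(purchases)])

# Descriptive use: the centroids summarise two natural groups of customers.
print(centroids)

# Predictive use: a new record is assigned to the nearest existing cluster.
print(assign(20, centroids), assign(80, centroids))
```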
Oracle Data Mining supports the following data mining functions:
Supervised data mining:
o Classification: Grouping items into discrete classes and predicting which class an
item belongs to
o Regression: Approximating and forecasting continuous values
o Attribute Importance: Identifying the attributes that are most important in predicting
results
o Anomaly Detection: Identifying items that do not satisfy the characteristics of
"normal" data (outliers)
Unsupervised data mining:
o Clustering: Finding natural groupings in the data
o Association models: Analyzing "market baskets"
o Feature extraction: Creating new attributes (features) as a combination of the original
attributes
Oracle Data Mining permits mining of one or more columns of text data.
Oracle Data Mining also supports specialized sequence search and alignment algorithms (BLAST)
used to detect similarities between nucleotide and amino acid sequences.
b. Data Mining Techniques: Several core techniques that are used in data mining describe the
type of mining and data recovery operation. Unfortunately, the different companies and solutions do
not always share terms, which can add to the confusion and apparent complexity.
Let's look at some key techniques and examples of how to use different tools to build the data mining.
Association
Association (or relation) is probably the best known, most familiar, and most straightforward data
mining technique. Here, you make a simple correlation between two or more items, often of the same
type to identify patterns. For example, when tracking people's buying habits, you might identify that a
customer always buys cream when they buy strawberries, and therefore suggest that the next time that
they buy strawberries they might also want to buy cream.
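The strawberries-and-cream pattern can be quantified with the two usual association measures, support and confidence. The baskets below are made-up data and the function names are assumptions; the sketch only shows how the measures are computed.

```python
def count(baskets, items):
    """Number of baskets that contain every item in items."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, items):
    """Fraction of all baskets that contain the items together."""
    return count(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets with the antecedent, the fraction that also has the consequent."""
    both = set(antecedent) | set(consequent)
    return count(baskets, both) / count(baskets, antecedent)


baskets = [
    {"strawberries", "cream"},
    {"strawberries", "cream", "bread"},
    {"strawberries", "cream", "milk"},
    {"bread", "milk"},
    {"cream", "coffee"},
]

print(support(baskets, {"strawberries", "cream"}))
print(confidence(baskets, {"strawberries"}, {"cream"}))
print(confidence(baskets, {"cream"}, {"strawberries"}))
```

In this data every strawberry buyer also buys cream, so suggesting cream to strawberry buyers is well supported, while the reverse rule is weaker because cream is also bought on its own.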
Building association or relation-based data mining tools can be achieved simply with different tools.
For example, within InfoSphere Warehouse a wizard provides configurations of an information flow
that is used in association by examining your database input source, decision basis, and output.
Classification
You can use classification to build up an idea of the type of customer, item, or object by describing
multiple attributes to identify a particular class. For example, you can easily classify cars into
different types (sedan, 4x4, convertible) by identifying different attributes (number of seats, car shape,
driven wheels). Given a new car, you might assign it to a particular class by comparing its attributes
with your known definition. You can apply the same principles to customers, for example by
classifying them by age and social group.
Additionally, you can use classification as a feeder to, or the result of, other techniques. For example,
you can use decision trees to determine a classification. Clustering allows you to use common
attributes in different classifications to identify clusters.
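The car example can be sketched as a comparison of a new item's attributes against known class definitions. The attribute values in CLASS_DEFINITIONS are invented for illustration, and counting matching attributes is a deliberately simple stand-in for a real classifier such as a decision tree.

```python
# Hypothetical class definitions built from the attributes named in the text.
CLASS_DEFINITIONS = {
    "sedan": {"seats": 5, "shape": "three-box", "driven_wheels": 2},
    "4x4": {"seats": 5, "shape": "two-box", "driven_wheels": 4},
    "convertible": {"seats": 2, "shape": "open-top", "driven_wheels": 2},
}


def classify(car):
    """Return the class whose definition shares the most attribute values with car.

    Ties go to the class listed first in CLASS_DEFINITIONS.
    """

    def matches(definition):
        return sum(1 for key, value in definition.items() if car.get(key) == value)

    return max(CLASS_DEFINITIONS, key=lambda name: matches(CLASS_DEFINITIONS[name]))


new_car = {"seats": 4, "shape": "open-top", "driven_wheels": 2}
print(classify(new_car))
```

The same function shape applies to customers: replace the car attributes with age band and social group, and the class names with customer segments.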
Clustering
By examining one or more attributes or classes, you can group individual pieces of data together to
form a structured opinion. At a simple level, clustering is using one or more attributes as your basis for
identifying a cluster of correlating results. Clustering is useful to identify different information
because it correlates with other examples so you can see where the similarities and ranges agree.
Clustering can work both ways. You can assume that there is a cluster at a certain point and then use
your identification criteria to see if you are correct. In this example, a sample of sales data compares the age of
the customer to the size of the sale. It is not unreasonable to expect that people in their twenties
(before marriage and kids), fifties, and sixties (when the children have left home), have more
disposable income.

Mc0077 – advanced database systems

  • 1. Advanced Database Systems Question 1. List and explain various Normal Forms. How BCNF differs from the Third Normal Form and 4th Normal forms? Normal Forms Relations are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal forms. First Normal Form Any table having any relation is said to be in the first normal form. The criteria that must be met to be considered relational is that the cells of the table must contain only single values, and repeat groups or arrays are not allowed as values. All attributes (the entries in a column) must be of the same kind, and each column must have a unique name. Each row in the table must be unique. Databases in the first normal form are the weakest and suffer from all modification anomalies. Second Normal Form If all a relational database's non-key attributes are dependent on all of the key, then the database is considered to meet the criteria for being in the second normal form. This normal form solves the problem of partial dependencies, but this normal form only pertains to relations with composite keys. Third Normal Form A database is in the third normal form if it meets the criteria for a second normal form and has no transitive dependencies. Boyce-Codd Normal Form A database that meets third normal form criteria and every determinant in the database is a candidate key, it's said to be in the Boyce-Codd Normal Form. This normal form solves the issue of functional dependencies. Fourth Normal Form Fourth Normal Form (4NF) is an extension of BCNF for functional and multi-valued dependencies. 
A schema is in 4NF if the left hand side of every non-trivial functional or multi-valued dependency is a super-key. Domain/Key Normal Form The domain/key normal form is the Holy Grail of relational database design, achieved when every constraint on the relation is a logical consequence of the definition of keys and domains, and enforcing key and domain restraints and conditions causes all constraints to be met. Thus, it avoids all non-temporal anomalies. It's much easier to build a database in domain/key normal form than it is to convert lesser databases which may contain numerous anomalies. However, successfully building a domain/key normal form database remains a difficult task, even for experienced database programmers. Thus, while the domain/key normal form eliminates the problems found in most databases, it tends to be the most costly normal form to achieve. However, failing to achieve the domain/key normal form may carry long-term, hidden costs due to anomalies which appear in databases adhering only to lower normal forms over time. Question 2. Describe the concepts of Structural Semantic Data Model (SSM). A data model in software engineering is an abstract model that describes how data are represented and accessed. Data models formally define data elements and relationships among data elements for a domain of interest. According to Hoberman (2009), "A data model is a way finding tool for both
  • 2. business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment." A data model explicitly determines the structure of data or structured data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually data models are specified in a data modeling language. Communication and precision are the two key benefits that make a data model important to applications that use and exchange data. A data model is the medium which project team members from different backgrounds and with different levels of experience can communicate with one another. Precision means that the terms and rules on a data model can be interpreted only one way and are not ambiguous. A data model can be sometimes referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models. A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. A semantic data model is sometimes called a conceptual data model. The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques. That is, techniques to define the meaning of data within the context of its interrelationships with other data. 
The real worlds, in terms of resources, ideas, events, etc., are symbolically defined within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world Data modeling in software engineering is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling is a technique for defining business requirements for a database. It is sometimes called database modeling because a data model is eventually implemented in a database. The illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the start point for interface or database design Data architecture is the design of data for use in defining the target state and the subsequent planning needed to hit the target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture. Question 3. Describe the following with respect to Object Oriented Databases: a. Query Processing in Object-Oriented Database Systems Query Processing in Object-Oriented Database Systems One of the criticisms of first-generation object-oriented database management systems (OODBMSs) was their lack of declarative query capabilities. This led some researchers to brand first generation (network and hierarchical) DBMSs as object-oriented [Ullman 1988]. It was commonly believed that the application domains that OODBMS technology targets do not need querying capabilities. 
This belief no longer holds, and declarative query capability is accepted as one of the fundamental features of OODBMSs [Atkinson et al. 1989; Stonebraker et al. 1990]. Indeed, most of the current prototype systems experiment with powerful query languages and investigate their optimization. Commercial products have started to include such languages as well (e.g., O2 [Deux et al. 1991], Object Store [Lamb et al. 1991]).In this chapter we discuss the issues related to the optimization and execution of OODBMS query languages (which we collectively call query processing). Query optimization techniques are dependent upon the query model and language. For example, a functional query language lends itself to functional optimization which is quite different from the algebraic, cost-based optimization techniques employed
  • 3. in relational as well as a number of object-oriented systems. The query model, in turn, is based on the data (or object) model since the latter defines the access primitives which are used by the query model. These primitives, at least partially, determine the power of the query model. Despite this close relationship, in this chapter we do not consider issues related to the design of object models query models, or query languages in any detail. Language design issues are discussed elsewhere in this book. The interrelationship between object and query models is discussed in [Blakeley 1991; Ozsu and Straube 1991; Ozsu et al.1993; Yu and Osborn 1991]. Almost all object query processors proposed to date use optimization techniques developed for relational systems. However, there are a number of issues that make query processing more difficult in OODBMSs. The following are some of the more important issues: 1.Type system. Relational query languages operate on a simple type system consisting of a single aggregate type: relation The closure property of relational languages implies that each relational operator takes one or more relations as operands and produces a relation as a result. In contrast, object systems have richer type systems. The results of object algebra operators are usually sets of objects (or collections) whose members may be of different types. If the object languages are closed under the algebra operators, these heterogeneous sets of objects can be operands to other operators. This requires the development of elaborate type inferencing schemes to determine which methods can be applied to all the objects in such a set. Furthermore, object algebras often operate on semantically different collection types (e.g., set, bag, list) which imposes additional requirements on the type inferencing schemes to determine the type of the results of operations on collections of different types. 2. 
Encapsulation.Relational query optimization depends on knowledge of the physical storage of data (access paths) which is readily available to the query optimizer. The encapsulation of methods with the data that they operate on in OODBMSs raises (at least) two issues. First, estimating the cost of executing methods is considerably more difficult than estimating the cost of accessing an attribute according to an access path. In fact, optimizers have to worry about optimizing method execution, which is not an easy problem because methods may be written using a general-purpose programming language. Second, encapsulation raises issues related to the accessibility of storage information by the query optimizer. Some systems overcome this difficulty by treating the query optimizer as a special application that can break encapsulation and access information directly [Cluet and Delobel 1992]. Others propose a mechanism whereby objects “reveal” their costs as part of their interface [Graefe and Maier 1988]. b. Query Processing Architecture In this section we focus on two architectural issues: the query processing methodology and the query optimizer architecture. 1 Query Processing Methodology A query processing methodology similar to relational DBMSs, but modified to deal with the difficulties discussed in the previous section, can be followed in OODBMSs. depicts such a methodology proposed in [Straube and Ozsu 1990a]. The steps of the methodology are as follows. Queries are expressed in a declarative language which requires no user knowledge of object implementations, access paths or processing strategies. 
The calculus expression is first put through calculus optimization, which yields a normalized calculus expression.
[Figure: object query processing methodology. declarative query -> calculus optimization -> normalized calculus expression -> calculus-algebra transformation -> object algebra expression -> type check -> type consistent expression -> algebra optimization -> optimized algebra expression -> execution plan generation -> execution plan]
2 Optimizer Architecture
Query optimization can be modeled as an optimization problem whose solution is the choice of the “optimum” state in a state space (also called search space). In query optimization, each state corresponds to an algebraic query indicating an execution schedule and
  • 4. represented as a processing tree. The state space is a family of equivalent (in the sense of generating the same result) algebraic queries. Query optimizers generate and search a state space using a search strategy, applying a cost function to each state and finding one with minimal cost. Thus, to characterize a query optimizer, three things need to be specified (in this chapter we are mostly concerned with cost-based optimization, which is arguably the more interesting case):
1. The search space and the transformation rules that generate the alternative query expressions which constitute the search space;
2. A search algorithm that allows one to move from one state to another in the search space; and
3. The cost function that is applied to each state.
Many existing OODBMS optimizers are either implemented as part of the object manager on top of a storage system, or they are implemented as client modules in a client-server architecture. In most cases, the above-mentioned aspects are “hardwired” into the query optimizer. Given that extensibility is a major goal of OODBMSs, one would hope to develop an extensible optimizer that accommodates different search strategies, different algebra specifications with their different transformation rules, and different cost functions. Rule-based query optimizers provide a limited amount of extensibility by allowing the definition of new transformation rules. However, they do not allow extensibility in other dimensions. In this section we discuss some new promising proposals for extensibility in OODBMSs.
The Open OODB project [Wells et al. 1992] at Texas Instruments concentrates on the definition of an open architectural framework for OODBMSs and on the description of the design space for these systems. Query processing in Open OODB is described in [Blakeley et al. 1993]. The query module is an example of intra-module extensibility in Open OODB.
The query optimizer, built using the Volcano optimizer generator, is extensible with respect to algebraic operators, logical transformation rules, execution algorithms, implementation rules (i.e., logical operator to execution algorithm mappings), cost estimation functions, and physical property enforcement functions (e.g., presence of objects in memory). The clean separation between the user query language parsing structures and the operator graph on which the optimizer operates allows the replacement of the user language or optimizer. The separation between algebraic operators and execution algorithms allows exploration of alternative methods for implementing algebraic operators. Code generation is also a well-defined subcomponent of the query module, which facilitates porting the query module to work on top of other OODBMSs. The Open OODB query processor includes a query execution engine containing efficient implementations of scan, indexed scan, hybrid-hash join [Shapiro 1986], and complex object assembly [Keller et al. 1991].
The EPOQ project [Mitchell et al. 1993] is another approach to query optimization extensibility, where the search space is divided into regions. Each region corresponds to an equivalent family of query expressions that are reachable from each other. The regions are not necessarily mutually exclusive and differ in the queries that they manipulate, the control (search) strategy that they use, the query transformation rules that they incorporate, and the optimization objectives they achieve. For example, one region may cover transformation rules that deal with simple select queries, while another region may deal with transformations for nested queries. Similarly, one region may have the objective of minimizing a cost function, while another region may attempt to transform queries into some desirable form. Each region may be nested to a number of levels, allowing hierarchical search within a region.
Since the regions do not represent equivalence classes, there is a need for a global control strategy to determine how the query optimizer moves from one region to another. The feasibility and effectiveness of this approach remain to be verified.
The TIGUKAT project [Peters et al. 1992] uses an object-oriented approach to query processing extensibility.
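The three things that characterize an optimizer (search space, search algorithm, cost function) can be illustrated with a small Python sketch. It is not taken from Open OODB, EPOQ, or TIGUKAT. The collection names, cardinalities, selectivities, and the cost model (sum of intermediate result sizes) are all hypothetical, and the search is a plain exhaustive one. Each ingredient is passed in as a function, which is one simple way to picture an optimizer that is not "hardwired".

```python
from itertools import permutations

# Hypothetical collection sizes (number of objects); illustrative values only.
CARDINALITY = {"Employee": 10_000, "Department": 100, "Project": 1_000}

# Hypothetical selectivity of the join predicate between each pair of collections.
SELECTIVITY = {
    frozenset({"Employee", "Department"}): 0.01,
    frozenset({"Employee", "Project"}): 0.001,
    frozenset({"Department", "Project"}): 0.01,
}


def search_space(collections):
    """Search space: every left-deep join order (all produce the same result)."""
    return permutations(collections)


def cost(order):
    """Toy cost function: total size of all intermediate results of a join order."""
    joined = [order[0]]
    size = CARDINALITY[order[0]]
    total = 0.0
    for name in order[1:]:
        selectivity = 1.0
        for previous in joined:
            selectivity *= SELECTIVITY[frozenset({previous, name})]
        size = size * CARDINALITY[name] * selectivity
        total += size
        joined.append(name)
    return total


def exhaustive_search(states, cost_fn):
    """Search algorithm: visit every state and keep one with minimal cost."""
    return min(states, key=cost_fn)


def optimize(collections, space_fn=search_space,
             search_fn=exhaustive_search, cost_fn=cost):
    """Each of the three ingredients can be swapped without touching the others."""
    return search_fn(space_fn(collections), cost_fn)


best = optimize(list(CARDINALITY))
print(best, cost(best))
```

With these made-up numbers the cheapest orders join the two small collections first and bring in Employee last. A different cost function or a heuristic search could be passed to `optimize` without changing the rest.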
  • 5. Question 4. Describe the Differences between Distributed & Centralized Databases.
A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers. Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. The replication and distribution of databases improves database performance at end-user worksites.
To ensure that the distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, the replication process can be very complex and can require a lot of time and computer resources. Duplication, on the other hand, is not as complicated. It basically identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours. This is to ensure that each distributed location has the same data. In the duplication process, only changes to the master database are allowed. This is to ensure that local data will not be overwritten. Both of the processes can keep the data current in all distributed locations.
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies.
The implementation of these technologies depends on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.
Basic architecture
A database user accesses the distributed database through:
Local applications: applications which do not require data from other sites.
Global applications: applications which do require data from other sites.
A distributed database does not share main memory or disks.
A centralized database has all its data in one place. In this it differs completely from a distributed database, which has its data in different places. Because all the data of a centralized database resides in one place, bottlenecks can occur, and data availability is not as good as in a distributed database. The advantages of distributed databases, described next, make the difference between centralized and distributed databases clear.
Advantages of Data Distribution
The primary advantage of distributed database systems is the ability to share and access data in a reliable and efficient manner.
Data sharing and Distributed Control: If a number of different sites are connected to each other, then a user at one site may be able to access data that is available at another site. For example, in the distributed banking system, it is possible for a user in one branch to access data in another branch. Without this capability, a user wishing to transfer funds from one branch to another would have to resort to some external mechanism for such a transfer. This external mechanism would, in effect, be a single centralized database. The primary advantage of accomplishing data sharing by means of data distribution is that each site is able to retain a degree of control over data stored locally. In a centralized system, the database
  • 6. administrator of the central site controls the database. In a distributed system, there is a global database administrator responsible for the entire system. A part of these responsibilities is delegated to the local database administrator for each site. Depending upon the design of the distributed database system, each local administrator may have a different degree of autonomy, which is often a major advantage of distributed databases.
Question 5. Explain the following:
a. Query Optimization
Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database server and parsed by the parser, they are passed to the query optimizer, where optimization occurs. However, some database engines allow guiding the query optimizer with hints.
A query is a request for information from a database. It can be as simple as "finding the address of a person with SS# 123-45-6789," or more complex like "finding the average salary of all the employed married men in California between the ages 30 to 39, that earn less than their wives." Query results are generated by accessing relevant database data and manipulating it in a way that yields the requested information. Since database structures are complex, in most cases, and especially for not-very-simple queries, the needed data for a query can be collected from a database by accessing it in different ways, through different data structures, and in different orders. Each different way typically requires different processing time. Processing times of the same query may have large variance, from a fraction of a second to hours, depending on the way selected. The purpose of query optimization, which is an automated process, is to find the way to process a given query in minimum time.
The large possible variance in time justifies performing query optimization, though finding the exact optimal way to execute a query, among all possibilities, is typically very complex, time consuming by itself, may be too costly, and is often practically impossible. Thus query optimization typically tries to approximate the optimum by comparing several common-sense alternatives to provide in a reasonable time a "good enough" plan which typically does not deviate much from the best possible result.
b. Text Retrieval Using SQL3/Text Retrieval
SQL3 supports storage of multimedia data, such as text documents, in an O-R database using the blob/clob data types. However, the standard SQL3 specification does not include support for processing the media content, such as indexing or querying. Thus it is not possible to use standard SQL3 to locate documents based on an analysis of their content. Therefore, most of the larger or-dbms vendors (IBM, Oracle, Ingres, Postgres ...) have used the SQL3 UDT/UDF functionality to extend their or-dbms with management systems for media data. The approach used has been to add their own or purchased specialized media management systems on to the basic or-dbms. Basically, the functionality that is new to SQL3 includes:
Indexing routines for the various types of media data, as discussed in CH.6, for example using:
o Content terms for text data and
o Color, shape, and texture features for image data.
Selection operators for the SQL3 WHERE clause for specification of selection criteria for media retrieval.
Text processing sub-systems for similarity evaluation and result ranking.
Unfortunately, the result of this 'independent' activity is non-standard or-dbms/mm (multimedia) systems that differ in the functionality included and limit data retrieval from multiple or-dbms system types. For example, unified access to data stored in Oracle and DB2 systems is difficult, both in query formulation and result presentation.
Since actual SQL3/TextRetrieval syntax varies between or-dbms/mm implementations, the examples used in the following are given in generic SQL3/TextRetrieval statements.
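Independently of any vendor syntax, the content-term indexing and result ranking mentioned above can be pictured with a short Python sketch. This is not SQL3/TextRetrieval and not any or-dbms API. The documents, the function names, and the scoring rule (count of matching query terms) are assumptions made only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase content terms (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Inverted index: map each term to {document id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return ids of matching documents, best match first.

    Assumed ranking: total occurrences of the query terms in the document,
    with ties broken by document id.
    """
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical documents standing in for clob values stored in a table.
DOCS = {
    1: "Election report: the candidate won the election.",
    2: "Weather report for the weekend.",
    3: "The composer wrote music for the piano.",
}
INDEX = build_index(DOCS)
print(search(INDEX, "election report"))
```

A vendor extension would expose the same two pieces differently: the index is built by an indexing routine, and the ranked lookup is reached through a selection operator in the WHERE clause.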
  • 7. 8.1 Text Document Retrieval
Multimedia documents can be complex, but are basically unstructured. They can consist of the raw text only, or have a few fixed attributes with one or more semi-structured or unstructured components. For example, a news report for an election could include the following components, where n, m, k, and x are the number of occurrences of each component type:
1. Identifier, date, and author(s) of the report,
2. n* text blocks - (titles, abstract, content text),
3. m* images - example: image_of_candidate
4. k* charts, and
5. x* maps.
Note that the document elements listed in pt. 1 above function as context metadata for the report, while the text itself can function as semantic metadata for the image materials (Rønnevik, 2005). The figure illustrates elements of a semi-structured document. The original Grieg site also contains a list of references/links which gives access to other multimedia documents about the composer, including some of his music. Since an OR-DB can contain text documents such as web pages, SQL3 should be extended with processing operators that support access to each of the element types listed above.
Question 6. Describe the following:
a. Data Mining Functions:
Data mining functions can be divided into two categories: supervised (directed) and unsupervised (undirected). Supervised functions are used to predict a value; they require the specification of a target (known outcome). Targets are either binary attributes indicating yes/no decisions (buy/don't buy, churn or don't churn, etc.) or multi-class targets indicating a preferred alternative (color of sweater, likely salary range, etc.). Naive Bayes for classification is a supervised mining algorithm. Unsupervised functions are used to find the intrinsic structure, relations, or affinities in data. Unsupervised mining does not use a target. Clustering algorithms can be used to find naturally occurring groups in data. Data mining can also be classified as predictive or descriptive.
Predictive data mining constructs one or more models; these models are used to predict outcomes for new data sets. Predictive data mining functions are classification and regression. Naive Bayes is one algorithm used for predictive data mining. Descriptive data mining describes a data set in a concise way and presents interesting characteristics of the data. Descriptive data mining functions are clustering, association models, and feature extraction. k-Means clustering is an algorithm used for descriptive data mining.
Different algorithms serve different purposes; each algorithm has advantages and disadvantages. A given algorithm can be used to solve different kinds of problems. For example, k-Means clustering is unsupervised data mining; however, if you use k-Means clustering to assign new records to a cluster, it performs predictive data mining. Similarly, decision tree classification is supervised data mining; however, the decision tree rules can be used for descriptive purposes.
Oracle Data Mining supports the following data mining functions:
Supervised data mining:
o Classification: Grouping items into discrete classes and predicting which class an item belongs to
o Regression: Approximating and forecasting continuous values
o Attribute Importance: Identifying the attributes that are most important in predicting results
o Anomaly Detection: Identifying items that do not satisfy the characteristics of "normal" data (outliers)
Unsupervised data mining:
o Clustering: Finding natural groupings in the data
  • 8. o Association models: Analyzing "market baskets"
o Feature extraction: Creating new attributes (features) as a combination of the original attributes
Oracle Data Mining permits mining of one or more columns of text data. Oracle Data Mining also supports specialized sequence search and alignment algorithms (BLAST) used to detect similarities between nucleotide and amino acid sequences.
b. Data Mining Techniques:
Several core techniques that are used in data mining describe the type of mining and data recovery operation. Unfortunately, the different companies and solutions do not always share terms, which can add to the confusion and apparent complexity. Let's look at some key techniques and examples of how to use different tools to build the data mining.
Association
Association (or relation) is probably the best-known, most familiar, and most straightforward data mining technique. Here, you make a simple correlation between two or more items, often of the same type, to identify patterns. For example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy strawberries, and therefore suggest that the next time they buy strawberries they might also want to buy cream. Building association or relation-based data mining tools can be achieved simply with different tools. For example, within InfoSphere Warehouse a wizard provides configurations of an information flow that is used in association by examining your database input source, decision basis, and output.
Classification
You can use classification to build up an idea of the type of customer, item, or object by describing multiple attributes to identify a particular class. For example, you can easily classify cars into different types (sedan, 4x4, convertible) by identifying different attributes (number of seats, car shape, driven wheels). Given a new car, you might assign it to a particular class by comparing its attributes with your known definition.
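The car example can be sketched in a few lines of Python. The class definitions and attribute values below are hypothetical, and the matching rule (pick the class whose definition agrees with the most attributes) is only one simple way to compare a new item against known definitions.

```python
# Hypothetical class definitions: each class is described by attribute values.
CLASS_DEFINITIONS = {
    "sedan": {"seats": 5, "shape": "three-box", "driven_wheels": 2},
    "4x4": {"seats": 5, "shape": "two-box", "driven_wheels": 4},
    "convertible": {"seats": 2, "shape": "open-top", "driven_wheels": 2},
}


def classify(car):
    """Assign the class whose definition matches the most attributes of the car."""
    def matches(definition):
        return sum(car.get(attr) == value for attr, value in definition.items())

    return max(CLASS_DEFINITIONS, key=lambda name: matches(CLASS_DEFINITIONS[name]))


new_car = {"seats": 2, "shape": "open-top", "driven_wheels": 2}
print(classify(new_car))
```

In practice the definitions would be learned from labeled data (for example with a decision tree) instead of being written by hand.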
You can apply the same principles to customers, for example by classifying them by age and social group. Additionally, you can use classification as a feeder to, or the result of, other techniques. For example, you can use decision trees to determine a classification. Clustering allows you to use common attributes in different classifications to identify clusters.
Clustering
By examining one or more attributes or classes, you can group individual pieces of data together to form a structured opinion. At a simple level, clustering is using one or more attributes as your basis for identifying a cluster of correlating results. Clustering is useful to identify different information because it correlates with other examples, so you can see where the similarities and ranges agree. Clustering can work both ways. You can assume that there is a cluster at a certain point and then use your identification criteria to see if you are correct. In this example, a sample of sales data compares the age of the customer to the size of the sale. It is not unreasonable to expect that people in their twenties (before marriage and kids), fifties, and sixties (when the children have left home), have more disposable income.
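The age-versus-sale example can be tried out with a minimal k-means sketch in plain Python. The sales figures are invented for illustration, and k-means is used here as one possible clustering algorithm; the text does not say which algorithm produced its sample. The starting centroids play the role of the assumed cluster positions that are then checked against the data.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical (age, sale amount) pairs; not real sales data.
SALES = [
    (22, 480), (25, 520), (28, 500),   # customers in their twenties
    (35, 150), (40, 120), (45, 130),   # customers in their thirties and forties
    (55, 450), (60, 500), (65, 470),   # customers in their fifties and sixties
]

# Assumed starting points: one guess at a high-spending and one at a low-spending group.
centroids, clusters = kmeans(SALES, centroids=[(22, 480), (35, 150)])
for centroid, members in zip(centroids, clusters):
    print(centroid, sorted(age for age, _ in members))
```

With these made-up numbers the customers in their twenties, fifties, and sixties end up in the high-spending cluster, which matches the expectation stated above. Because sale amounts are much larger than ages, the distance is dominated by the sale size; scaling the attributes first would change that.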