IT6701
INFORMATION MANAGEMENT
BY
K.T.Mikel Raj
DATA MODELING
Introduction
• Process of creating a data model for an information system by
applying formal data modeling techniques.
• Process used to define and analyze data requirements needed
to support the business processes.
• Therefore, the process of data modeling involves professional
data modelers working closely with business stakeholders, as
well as potential users of the information system.
What is a Data Model?
• A data model is a collection of conceptual tools for describing data,
data relationships, data semantics and consistency constraints.
• A data model is a conceptual representation of the data structures
required for a database and is very powerful in expressing and
communicating the business requirements.
• A data model visually represents the nature of data, business rules
governing the data, and how it will be organized in the database.
• A data model provides a way to describe the design of a
database at the physical, logical and view levels.
• There are three different types of data models produced while
progressing from requirements to the actual database to be
used for the information system
Different Data Models
• Conceptual: describes WHAT the system contains.
• Logical: describes HOW the system will be implemented,
regardless of the DBMS.
• Physical: describes HOW the system will be implemented
using a specific DBMS.
A data model consists of entities related to each other on a diagram:
Data Model Element: Definition
• Entity: A real-world thing or an interaction between two or more real-world things.
• Attribute: The atomic pieces of information that we need to know about entities.
• Relationship: How entities depend on each other, in terms of why they depend on each other (the
relationship) and how many instances are involved (the cardinality of the relationship).
Example:
Given that …
• “Customer” is an entity.
• “Product” is an entity.
• For a “Customer” we need to know their “customer number”
attribute and “name” attribute.
• For a “Product” we need to know the “product name” attribute and
“price” attribute.
• “Sale” is an entity that is used to record the interaction of
“Customer” and “Product”.
Here is the diagram that encapsulates these rules:
Notes
• By convention, entities are named in the singular.
• The attributes of “Customer” are “Customer No” (which is the unique
identifier or primary key of the “Customer” entity and is shown by the #
symbol) and “Customer Name”.
• “Sale” has a composite primary key made up of the primary key of
“Customer”, the primary key of “Product” and the date of the sale.
• Think of entities as tables, think of attributes as columns on the table and
think of instances as rows on that table:
• If we want to know the price of a Sale, we can find it by taking the “Product
Code” on the instance of “Sale” we are interested in and looking up the
corresponding “Price” on the “Product” entity with the matching “Product
Code”.
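The price lookup described above can be sketched in Java. This is only an illustration: the class names, field names, and sample values are assumptions made for the sketch, and "Product Code" is treated as the primary key of Product.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical classes mirroring the Customer / Product / Sale example.
final class Product {
    final String productCode; // unique identifier (primary key)
    final String productName;
    final double price;

    Product(String productCode, String productName, double price) {
        this.productCode = productCode;
        this.productName = productName;
        this.price = price;
    }
}

final class Sale {
    // Composite primary key: customer number + product code + date of sale.
    final int customerNo;
    final String productCode;
    final LocalDate saleDate;

    Sale(int customerNo, String productCode, LocalDate saleDate) {
        this.customerNo = customerNo;
        this.productCode = productCode;
        this.saleDate = saleDate;
    }
}

final class SalesModel {
    // Plays the role of the "Product" table, keyed by its primary key.
    private final Map<String, Product> products = new HashMap<>();

    void addProduct(Product product) {
        products.put(product.productCode, product);
    }

    // Follow the Product Code stored on the Sale to the matching Product row
    // and read the Price from there.
    double priceOf(Sale sale) {
        Product product = products.get(sale.productCode);
        if (product == null) {
            throw new IllegalArgumentException("Unknown product code: " + sale.productCode);
        }
        return product.price;
    }

    public static void main(String[] args) {
        SalesModel model = new SalesModel();
        model.addProduct(new Product("P-100", "Notebook", 3.5)); // sample data, made up
        Sale sale = new Sale(42, "P-100", LocalDate.of(2016, 1, 15));
        System.out.println(model.priceOf(sale)); // prints 3.5
    }
}
```

The price is stored once, on Product, and each Sale only carries the key needed to find it.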
Types of Data Models
• Entity-Relationship (E-R) Models
• UML (Unified Modeling Language)
Entity-Relationship Model
• Entity Relationship Diagrams (ERDs) are the most widely used notation
• ERDs have an advantage in that they are capable of being normalized
• Represent entities as rectangles
• List attributes within the rectangle
Example entity: UniversityStudent
– StudentID (PK, primary key)
– StudentName
– StudentDOB
– StudentAge
Why and When
• The purpose of a data model is to describe the concepts relevant
to a domain, the relationships between those concepts, and
information associated with them.
• Used to model data in a standard, consistent, predictable
manner in order to manage it as a resource.
• To have a clear picture of the base data that your business
needs.
• To identify missing and redundant base data.
• To establish a baseline for communication across functional
boundaries within your organization.
• Provides a basis for defining business rules.
• Makes it cheaper, easier, and faster to upgrade your IT solutions.
Entity Relationship Diagram (ERD)
Objectives
• Define terms related to entity relationship modeling, including
entity, entity instance, attribute, relationship and cardinality, and
primary key.
• Describe the entity modeling process.
• Discuss how to draw an entity relationship diagram.
• Describe how to recognize entities, attributes, relationships,
and cardinalities.
Database Model
A database can be modeled as:
– a collection of entities,
– relationships among entities.
Database systems are often modeled using an Entity Relationship
(ER) diagram as the "blueprint" from which the actual database is
built; the diagram is the output of the design phase.
Entity Relationship Diagram (ERD)
• ER model allows us to sketch database designs
• ERD is a graphical tool for modeling data.
• ERD is widely used in database design
• ERD is a graphical representation of the logical structure of a database
• ERD is a model that identifies the concepts or entities that exist in a
system and the relationships between those entities
Purposes of ERD
An ERD serves several purposes:
• The database analyst/designer gains a better understanding of
the information to be contained in the database through the
process of constructing the ERD.
• The ERD serves as a documentation tool.
• Finally, the ERD is used to communicate the logical structure of
the database to users.
Components of an ERD
An ERD typically consists of four different graphical components:
1. Entity
2. Relationship
3. Cardinality
4. Attribute
Classification of Relationships
• Optional Relationship
– An Employee may or may not be assigned to a Department
– A Patient may or may not be assigned to a Bed
• Mandatory Relationship
– Every Course must be taught by at least one Teacher
– Every Mother must have at least one Child
Cardinality Constraints
• Express the number of entities to which another entity can be
associated via a relationship set.
• Cardinality constraints: the number of instances of one entity that can
or must be associated with each instance of another entity.
• Minimum Cardinality
– If zero, then optional
– If one or more, then mandatory
• Maximum Cardinality
– The maximum number of associated instances
Cardinality Constraints (Contd.)
• For a binary relationship set the mapping cardinality must be one of
the following types:
– One to one
• A Manager heads one Department and vice versa
– One to many (or many to one)
• An Employee works in one Department, and one Department has many
Employees
– Many to many
• A Teacher teaches many Students and a Student is taught by many
Teachers
General Steps to create an ERD
• Identify the entities
• Identify each entity's attributes
• Identify the primary keys
• Identify the relationships between entities
• Identify the cardinality constraints
• Draw the ERD
• Check the ERD
Steps in building an ERD
Developing an ERD
The process has ten steps:
1. Identify Entities
2. Find Relationships
3. Draw Rough ERD
4. Fill in Cardinality
5. Define Primary Keys
6. Draw Key-Based ERD
7. Identify Attributes
8. Map Attributes
9. Draw fully attributed ERD
10. Check Results
A Simple Example
A company has several departments. Each department has a
supervisor and at least one employee. Employees must be assigned
to at least one, but possibly more departments. At least one
employee is assigned to a project, but an employee may be on
vacation and not assigned to any projects. The important data fields
are the names of the departments, projects, supervisors and
employees, as well as the supervisor and employee number and a
unique project number.
Identify entities
• One approach to this is to work through the information and highlight those
words which you think correspond to entities.
• A company has several departments. Each department has a supervisor and at
least one employee. Employees must be assigned to at least one, but possibly
more departments. At least one employee is assigned to a project, but an
employee may be on vacation and not assigned to any projects. The important
data fields are the names of the departments, projects, supervisors and
employees, as well as the supervisor and employee number and a unique
project number.
• A true entity should have more than one instance.
Find Relationships
• The aim is to identify the associations, that is, the connections between pairs of
entities.
• A simple approach is to use a relationship matrix (table)
that has rows and columns for each of the identified entities.
Find Relationships (Contd.)
• Go through each cell and decide whether or not there is an association.
For example, the first cell on the second row is used to indicate if there is
a relationship between the entity "Employee" and the entity
"Department".
Identified Relationships
Names placed in the cells are meant to capture/describe the
relationships, so you can read them like this:
• A department is assigned an employee
• A department is run by a supervisor
• An employee belongs to a department
• An employee works on a project
• A supervisor runs a department
• A project uses an employee
Draw Rough ERD
Draw a diagram and:
• Place all the entities in rectangles
• Use diamonds and lines to represent the relationships between
entities.
• General Examples
Drawing Rough ERD (Contd.)
[Figures: rough ERD sketches]
Fill in Cardinality
• Supervisor
– Each department has one supervisor.
• Department
– Each supervisor has one department.
– Each employee can belong to one or more departments
• Employee
– Each department must have one or more employees
– Each project must have one or more employees
• Project
– Each employee can have 0 or more projects.
Fill in Cardinality (Contd.)
The cardinality of a relationship can only have one of the following
values:
–One and only one
–One or more
–Zero or more
–Zero or one
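The four cardinality values can be written down as minimum/maximum pairs. The sketch below is one possible encoding. The enum and method names are assumptions made for illustration, and "many" is approximated by Integer.MAX_VALUE.

```java
// Hypothetical encoding of the four cardinality values as (min, max) pairs.
enum Cardinality {
    ONE_AND_ONLY_ONE(1, 1),
    ONE_OR_MORE(1, Integer.MAX_VALUE),
    ZERO_OR_MORE(0, Integer.MAX_VALUE),
    ZERO_OR_ONE(0, 1);

    final int min;
    final int max;

    Cardinality(int min, int max) {
        this.min = min;
        this.max = max;
    }

    // A minimum of zero means the relationship is optional.
    boolean isOptional() {
        return min == 0;
    }

    // Checks whether a given number of related instances satisfies the constraint.
    boolean allows(int relatedCount) {
        return relatedCount >= min && relatedCount <= max;
    }
}

final class CardinalityDemo {
    public static void main(String[] args) {
        // "Each department must have one or more employees."
        System.out.println(Cardinality.ONE_OR_MORE.allows(0));  // false
        // "Each employee can have 0 or more projects."
        System.out.println(Cardinality.ZERO_OR_MORE.allows(0)); // true
        // "Each department has one supervisor."
        System.out.println(Cardinality.ONE_AND_ONLY_ONE.allows(2)); // false
    }
}
```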
Cardinality Notation
Cardinality Examples
• Each instance of A is related to a minimum of zero and a maximum of one instance of B
• Each instance of B is related to a minimum of one and a maximum of one instance of A
• Each instance of A is related to a minimum of one and a maximum of many instances of B
• Each instance of B is related to a minimum of zero and a maximum of many instances of A
ERD with cardinality
Examples
ERD for Course Enrollment
ERD for Course Registration
Rough ERD Plus Primary Keys
Identify Attributes
• In this step we try to identify and name all the attributes essential to the system we are
studying without trying to match them to particular entities.
• The best way to do this is to study the forms, files and reports currently kept by the users
of the system and circle each data item on the paper copy.
• Cross out those which will not be transferred to the new system, extraneous items such as
signatures, and constant information which is the same for all instances of the form (e.g.
your company name and address). The remaining circled items should represent the
attributes you need. You should always verify these with your system users. (Sometimes
forms or reports are out of date.)
• The only attributes indicated are the names of the departments, projects, supervisors and
employees, as well as the supervisor and employee NUMBER and a unique project number.
Map Attributes
• Each attribute must be matched with exactly one entity. Often it
seems like an attribute should go with more than one entity (e.g. Name). In
this case you need to add a modifier to the attribute name to make it unique
(e.g. Customer Name, Employee Name, etc.) or determine which entity an
attribute "best" describes.
• If you have attributes left over without corresponding entities, you may have
missed an entity and its corresponding relationships. Identify these missed
entities and add them to the relationship matrix now.
Map Attributes (Contd.)
Draw Fully Attributed ERD
Check ERD Results
• Look at your diagram from the point of view of a system owner or
user. Is everything clear?
• Check through the Cardinality pairs.
• Also, look over the list of attributes associated with each entity to
see if anything has been omitted.
Java Database Connectivity (JDBC)
Introduction
• Database
– Collection of data
• DBMS
– Database management system
– Storing and organizing data
• SQL
– Relational database
– Structured Query Language
• JDBC
– Java Database Connectivity
– JDBC driver
JDBC
• Programs developed with Java/JDBC are platform and vendor
independent.
• “Write once, compile once, run anywhere”
• Write apps in Java to access any DB, using standard SQL
statements, while still following Java conventions.
• The JDBC driver manager and JDBC drivers provide the bridge
between the database and Java worlds.
[Diagram: JDBC architecture]
Java application
– JDBC API
JDBC Driver Manager
– JDBC Driver API
JDBC/ODBC Bridge → ODBC driver → Database
Vendor-supplied JDBC driver → Database
ODBC
• JDBC was heavily influenced by ODBC.
• ODBC provides a C interface for database access in the Windows
environment.
• ODBC has a few commands with lots of complex options. Java
prefers simple methods, but lots of them.
• Type 1: Uses a bridging technology to access a database. The JDBC-ODBC bridge is an example. It provides a
gateway to ODBC.
• Type 2: Native API drivers. The driver contains Java code that calls native C/C++ methods provided by the
database vendors.
• Type 3: Generic network API that is then translated into database-specific access at the server level.
The JDBC driver on the client uses sockets to call a middleware application on the server that translates
the client requests into an API specific to the desired driver. Extremely flexible.
• Type 4: Uses network protocols built into the database engine to talk directly to the database over Java
sockets. Almost always supplied only by database vendors.
[Diagram: Type 1 to Type 4 drivers reaching the database through a 3rd party API, a native C/C++ API, a local API, or a network API]
JDBC Driver Types
JDBC driver implementations vary because of the wide variety of
operating systems and hardware platforms on which Java operates.
Sun has divided the implementation types into four categories,
Types 1, 2, 3, and 4.
JDBC Drivers
Common JDBC Components
The JDBC API provides the following interfaces and classes:
DriverManager:
• This class manages a list of database drivers.
• Matches connection requests from the Java application with the proper database
driver using a communication subprotocol.
• The first driver that recognizes a certain subprotocol under JDBC will be used to
establish a database Connection.
Driver:
• This interface handles the communications with the database
server.
• You will interact directly with Driver objects very rarely. Instead,
you use the DriverManager class, which manages objects of this type.
It also abstracts the details associated with working with Driver
objects.
Connection:
• This interface provides all methods for contacting a database.
• The connection object represents the communication context, i.e., all
communication with the database goes through the connection object.
Statement:
• You use objects created from this interface to submit SQL statements to
the database.
ResultSet:
• These objects hold data retrieved from a database after you execute an SQL
query using Statement objects.
• It acts as an iterator to allow you to move through its data.
SQLException:
• This class handles any errors that occur in a database application.
Type 1: JDBC-ODBC Bridge Driver
• In a Type 1 driver, a JDBC bridge is used to access ODBC drivers installed
on each client machine.
• Using ODBC requires configuring a Data Source Name (DSN) on your system
that represents the target database.
• The JDBC-ODBC Bridge that comes with JDK 1.2 is a good example of this
kind of driver.
Type 1 driver
Type 2: JDBC-Native API
In a Type 2 driver, JDBC API calls are converted into native C/C++ API calls,
which are unique to the database.
These drivers are typically provided by the database vendors and used in
the same manner as the JDBC-ODBC Bridge.
The vendor-specific driver must be installed on each client machine.
The Oracle Call Interface (OCI) driver is an example of a Type 2 driver.
Type 2 Driver
Type 3: JDBC-Net pure Java
In a Type 3 driver, a three-tier approach is used to access databases.
The JDBC clients use standard network sockets to communicate with a
middleware application server.
The socket information is then translated by the middleware application
server into the call format required by the DBMS, and forwarded to the
database server.
Type 3 Driver
Type 4: 100% Pure Java
In a Type 4 driver, a pure Java-based driver communicates directly with the
vendor's database through socket connection.
This is the highest performance driver available for the database and is
usually provided by the vendor itself.
This kind of driver is extremely flexible: you don't need to install special
software on the client or server. Further, these drivers can be downloaded
dynamically.
Type 4 Driver
The following steps are required to create a new database using a JDBC application:
Import the packages:
• Include the packages containing the JDBC classes needed for
database programming.
• Most often, using import java.sql.* will suffice.
Register the JDBC driver:
• Initialize a driver so you can open a communications channel with
the database.
Open a connection:
• Use the DriverManager.getConnection() method to create a
Connection object, which represents a physical connection with the
database server.
• To create a new database, you need not give any database name
while preparing the database URL.
Execute a query:
• Use an object of type Statement to build and submit an SQL
statement to the database.
Clean up the environment:
• Explicitly close all database resources rather than relying on the JVM's
garbage collection.
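The steps above might look roughly like the following in code. Treat this as a sketch under assumptions: the MySQL-style URL, the user name, the password, and the database name STUDENTS are placeholders that are not part of the slides, and CREATE DATABASE syntax varies between DBMS products. With JDBC 4.0 and later, a driver found on the classpath is registered automatically, so no explicit registration call is shown.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

final class CreateDatabaseSketch {
    // Assumed placeholder: a server URL WITHOUT a database name, as the step describes.
    static final String SERVER_URL = "jdbc:mysql://localhost/";

    // Builds the DDL text. Identifiers cannot be bound as statement parameters,
    // so the name is restricted to letters, digits and underscores.
    static String createDatabaseSql(String databaseName) {
        if (!databaseName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Invalid database name: " + databaseName);
        }
        return "CREATE DATABASE " + databaseName;
    }

    static void createDatabase(String url, String user, String password, String databaseName)
            throws SQLException {
        // Open a connection and create a Statement; try-with-resources closes
        // both explicitly instead of relying on garbage collection.
        try (Connection connection = DriverManager.getConnection(url, user, password);
             Statement statement = connection.createStatement()) {
            // Execute the statement.
            statement.executeUpdate(createDatabaseSql(databaseName));
        }
    }

    public static void main(String[] args) {
        try {
            // "user" and "password" are placeholders for real credentials.
            createDatabase(SERVER_URL, "user", "password", "STUDENTS");
            System.out.println("Database created");
        } catch (SQLException e) {
            // Reached when no driver or server is available, among other errors.
            System.err.println("Could not create database: " + e.getMessage());
        }
    }
}
```

Running this requires a JDBC driver for the target DBMS on the classpath and a reachable server. Without them, getConnection throws an SQLException, which main reports.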
STORED PROCEDURES
Stored Procedure Language
Stored Procedure Overview
• A stored procedure is a function in a shared library accessible to the
database server.
• Stored procedures can also be written in languages such as C or Java.
• Advantage of stored procedures: reduced network traffic.
• The more SQL statements that are grouped together for execution, the
larger the savings in network traffic.
Normal Database
Applications using stored procedures
Writing Stored Procedures
• Tasks performed by the client application
• Tasks performed by the stored procedure, when invoked
• The CALL statement
• Explicit parameters to be defined:
– IN: Passes a value to the stored procedure from the client application.
– OUT: Stores a value that is passed to the client application when the stored procedure
terminates.
– INOUT: Passes a value to the stored procedure from the client application, and returns a
value to the client application when the stored procedure terminates.
Some Valid SQL Procedure Body Statements
• CASE statement
• FOR statement
• GOTO statement
• IF statement
• ITERATE statement
• RETURN statement
• WHILE statement
Invoking Procedures
• A stored procedure stored at the location of the database can be invoked by using the SQL CALL
statement.
Nested SQL Procedures
• To call a target SQL procedure from within a caller SQL procedure, simply include a CALL
statement with the appropriate number and types of parameters in your caller.
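From a Java client, a CALL with IN and OUT parameters is usually issued through a JDBC CallableStatement. The sketch below assumes a hypothetical procedure GET_EMPLOYEE_NAME with one IN parameter (employee number) and one OUT parameter (employee name). The procedure and the helper names are invented for illustration.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

final class CallProcedureSketch {
    // Builds the JDBC escape syntax for a procedure call, e.g. "{call NAME(?, ?)}".
    static String callSyntax(String procedureName, int parameterCount) {
        StringBuilder sql = new StringBuilder("{call ").append(procedureName).append('(');
        for (int i = 0; i < parameterCount; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append('?');
        }
        return sql.append(")}").toString();
    }

    // GET_EMPLOYEE_NAME is a hypothetical stored procedure:
    // parameter 1 is IN (employee number), parameter 2 is OUT (employee name).
    static String employeeName(Connection connection, int employeeNo) throws SQLException {
        try (CallableStatement call = connection.prepareCall(callSyntax("GET_EMPLOYEE_NAME", 2))) {
            call.setInt(1, employeeNo);                  // IN: value passed to the procedure
            call.registerOutParameter(2, Types.VARCHAR); // OUT: value returned to the client
            call.execute();
            return call.getString(2);
        }
    }

    public static void main(String[] args) {
        System.out.println(callSyntax("GET_EMPLOYEE_NAME", 2)); // prints {call GET_EMPLOYEE_NAME(?, ?)}
    }
}
```

An INOUT parameter would combine both steps on the same index: set a value before execute and register it as an OUT parameter as well.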
CONDITIONAL STATEMENTS:
IF <condition> THEN
    <statement(s)>
ELSE
    <statement(s)>
END IF;
Loops
LOOP
    ……
    EXIT WHEN <condition>;
    ……
END LOOP;
Big Data
What is Big Data?
• Big data is a massive volume of both structured and unstructured data that
is so large it is difficult to process using traditional database and software
techniques.
• In most enterprise scenarios the volume of data is too big or it moves too
fast or it exceeds current processing capacity.
• Despite these problems, big data has the potential to help companies
improve operations and make faster, more intelligent decisions.
Why Big Data
Key enablers of the appearance and growth of Big Data are:
• Increase of storage capacities
• Increase of processing power
• Availability of data
• Every day we create 2.5 quintillion bytes of data; 90% of the data in the world
today has been created in the last two years alone
Big Data Everywhere!
• Lots of data is being collected
and warehoused
– Web data, e-commerce
– purchases at department/
grocery stores
– Bank/Credit Card
transactions
– Social Network
How much data?
• Google processes 20 PB a day (2008)
• Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
Units
• Bit
• Nibble
• Byte/octet (B)
• Kilobyte (KB)
• Megabyte (MB)
• Gigabyte (GB)
• Terabyte (TB)
• Petabyte (PB)
• Exabyte (EB)
• Zettabyte (ZB)
• Yottabyte (YB)
Three V's of Big Data
• Volume: data quantity
• Velocity: data speed
• Variety: data types
Types of Data
• Three concepts come with big data:
– Structured data
– Semi-structured data
– Unstructured data
Structured Data
• It covers all data that can be stored in an SQL database, in tables with
rows and columns.
• Such data has relational keys and can be easily mapped into pre-designed fields.
• Today, this is the data most often processed in development and the simplest
way to manage information.
• But structured data represents only 5 to 10% of all data.
Semi-structured data
• Semi-structured data is information that doesn’t reside in a relational
database but that does have some organizational properties that make it
easier to analyze.
• Examples of semi-structured data: XML and JSON (JavaScript Object Notation) documents.
• Like structured data, semi-structured data represents only a small share of all data (5
to 10%).
Unstructured data
• Unstructured data represents around 80% of data.
• It often includes text and multimedia content.
Examples include e-mail messages, word processing documents, videos, photos, audio files,
presentations, web pages and many other kinds of business documents.
• Note that while these sorts of files may have an internal structure, they are
still considered "unstructured" because the data they contain doesn’t fit
neatly in a database.
• Unstructured data is everywhere. In fact, most individuals and organizations
conduct their lives around unstructured data.
• Here are some examples of machine-generated unstructured
data:
– Satellite images
– Scientific data
– Photographs and video
– Social media data
– Mobile data
– Website content
What to do with these data?
• Aggregation and Statistics
– Data warehouse and OLAP
• Indexing, Searching, and Querying
– Keyword based search
– Pattern matching (XML/RDF)
• Knowledge discovery
– Data Mining
– Statistical Modeling
Examples of Big Data
IT log analytics
• IT solutions and IT departments generate an enormous quantity of logs and trace data.
• In the absence of a Big Data solution, much of this data must go unexamined:
organizations simply don't have the manpower or resources to churn through all that
information by hand, let alone in real time.
• With a Big Data solution in place, however, those logs and trace data can be put to good
use.
• Within this list of Big Data application examples, IT log analytics is the most broadly
applicable.
Applications for Big Data Analytics
• Homeland Security
• Finance
• Smarter Healthcare
• Multi-channel sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Fraud and Risk
• Log Analysis
• Search Quality
• Retail: Churn, NBO
NoSQL
NoSQL?
NoSQL does not mean "Not SQL".
It means "Not Only SQL" or "Not Relational Database".
Why NoSQL
• Large Volume of Data
• Dynamic Schemas
• Auto-sharding
• Replication
• Horizontally Scalable
* Some of these operations can be achieved by enterprise-class RDBMS software, but at very high cost
Define NoSQL
• NoSQL refers to non-relational database management systems, which differ from
traditional relational database management systems in some significant ways.
• A NoSQL database provides a mechanism for storage and retrieval of data that is
modeled by means other than the tabular relations used in relational databases
(RDBMS).
• It is designed for distributed data stores with very large-scale data storage
needs (for example Google or Facebook, which collect terabytes of data every
day for their users).
Types of NoSQL Databases
• Document Stores
• Graph Databases
• Key-Value Stores
• Columnar Databases
Document Oriented Databases
• Document oriented databases treat a document as a whole and avoid
splitting a document into its constituent name/value pairs.
• At a collection level, this allows for putting together a diverse set of
documents into a single collection.
• Document databases allow indexing of documents on the basis of not
only their primary identifier but also their properties.
Cont…
• Different open-source document databases are available today, but the
most prominent among the available options are MongoDB and
CouchDB.
• In fact, MongoDB has become one of the most popular NoSQL
databases.
Graph Based Databases
A graph database uses graph structures with nodes, edges, and
properties to represent and store data.
By definition, a graph database is any storage system that
provides index-free adjacency. This means that every element
contains a direct pointer to its adjacent element and no index
lookups are necessary.
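Index-free adjacency can be illustrated with nodes that hold direct references to their neighbours, so a traversal follows pointers instead of consulting an index. The class below is a minimal in-memory sketch with invented names, not how any particular graph database is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical node type: properties plus direct references to adjacent nodes.
final class GraphNode {
    final String name;
    final Map<String, String> properties = new HashMap<>();
    final List<GraphNode> neighbours = new ArrayList<>();

    GraphNode(String name) {
        this.name = name;
    }

    // Adds an undirected edge by storing a pointer on both endpoints.
    void connect(GraphNode other) {
        neighbours.add(other);
        other.neighbours.add(this);
    }

    // Breadth-first traversal that only follows the stored pointers.
    boolean reaches(GraphNode target) {
        Set<GraphNode> seen = new HashSet<>();
        Deque<GraphNode> queue = new ArrayDeque<>();
        seen.add(this);
        queue.add(this);
        while (!queue.isEmpty()) {
            GraphNode current = queue.poll();
            if (current == target) {
                return true;
            }
            for (GraphNode next : current.neighbours) {
                if (seen.add(next)) { // add() returns false for already visited nodes
                    queue.add(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        GraphNode alice = new GraphNode("Alice");
        GraphNode bob = new GraphNode("Bob");
        GraphNode carol = new GraphNode("Carol");
        alice.connect(bob);
        bob.connect(carol);
        System.out.println(alice.reaches(carol)); // true
    }
}
```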
Cont…
General graph databases that can store any graph are distinct
from specialized graph databases such as triple-stores and
network databases. Indexes are used for traversing the graph.
Graph Database
Column Based Databases
The column-oriented storage allows data to be stored effectively.
It avoids consuming space when storing nulls by simply not
storing a column when a value doesn’t exist for that column.
Cont…
Each unit of data can be thought of as a set of key/value pairs,
where the unit itself is identified with the help of a primary
identifier, often referred to as the primary key.
Key Value Databases
The key of a key/value pair is a unique value in the set and can be
easily looked up to access the data.
• Key/value stores are of varied types: some keep the data in
memory and some provide the capability to persist the data to
disk.
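The in-memory variety can be sketched in a few lines. The class and method names below are assumptions for illustration, and persistence to disk is deliberately left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory key/value store: every value is reached through its unique key.
final class InMemoryKeyValueStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    // Stores the value under the key, replacing any previous value.
    void put(String key, String value) {
        data.put(key, value);
    }

    // Looks the key up directly; empty if the key is not present.
    Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    // Removes the pair and reports whether the key existed.
    boolean delete(String key) {
        return data.remove(key) != null;
    }

    public static void main(String[] args) {
        InMemoryKeyValueStore store = new InMemoryKeyValueStore();
        store.put("user:42", "Bob");
        System.out.println(store.get("user:42").orElse("(missing)")); // Bob
        System.out.println(store.get("user:43").orElse("(missing)")); // (missing)
    }
}
```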
Benefits of NoSQL over RDBMS
Schema-less
• NoSQL databases, being schema-less, do not define any strict data
structure.
Dynamic and Agile
• NoSQL databases tend to adapt well to
changing requirements. They can handle structured, semi-structured
and unstructured data.
Benefits (cont…)
Scales Horizontally:
• NoSQL scales horizontally by adding more servers and using the
concepts of sharding and replication.
• This behavior of NoSQL fits with cloud computing
services such as Amazon Web Services (AWS), which allow you
to run virtual servers that can be expanded horizontally
on demand.
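Sharding can be pictured as a rule that maps each key to one of several servers. The sketch below uses simple hash-modulo routing as an assumed strategy. Real systems often use other schemes, such as consistent hashing, because with plain modulo most keys move when the number of servers changes.

```java
// Hypothetical shard router: decides which of N servers holds a given key.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be at least 1");
        }
        this.shardCount = shardCount;
    }

    // floorMod keeps the result in [0, shardCount) even for negative hash codes.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        for (String key : new String[] {"user:1", "user:2", "user:3"}) {
            System.out.println(key + " -> shard " + router.shardFor(key));
        }
    }
}
```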
Benefits (cont…)
Better Performance:
• NoSQL databases generally claim to deliver better and faster performance
than traditional RDBMS implementations.
CAP Theorem
CAP
It is impossible for a web service to provide the following three guarantees at the
same time:
• Consistency
• Availability
• Partition tolerance
A distributed system can satisfy any two of these guarantees at the same
time, but not all three.
CAP Theorem
Consistency
All the servers in the system will have the same data so anyone using the
system will get the same copy regardless of which server answers their
request.
Availability
The system will always respond to a request (even if it's not the latest data
or consistent across the system or just a message saying the system isn't
working)
Partition Tolerance
The system continues to operate as a whole even if individual servers fail
or can't be reached.
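The trade-off can be made concrete with a toy simulation of two replicas of a hotel booking table. Everything here (class name, the CP and AP modes, the result values, the room and guest names) is an assumption made for illustration. It is a simplified model, not a real replication protocol.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: two replicas of a room -> guest table, with an optional network partition.
final class BookingCluster {
    enum Mode { CP, AP }
    enum Result { BOOKED, ALREADY_TAKEN, UNAVAILABLE }

    private final Map<String, String> replicaA = new HashMap<>();
    private final Map<String, String> replicaB = new HashMap<>();
    private final Mode mode;
    private boolean partitioned;

    BookingCluster(Mode mode) {
        this.mode = mode;
    }

    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
    }

    Result book(boolean viaReplicaA, String room, String guest) {
        Map<String, String> local = viaReplicaA ? replicaA : replicaB;
        Map<String, String> remote = viaReplicaA ? replicaB : replicaA;
        if (partitioned) {
            if (mode == Mode.CP) {
                // Consistency preferred: refuse to answer rather than risk a double booking.
                return Result.UNAVAILABLE;
            }
            // Availability preferred: answer from local data only.
            if (local.containsKey(room)) {
                return Result.ALREADY_TAKEN;
            }
            local.put(room, guest);
            return Result.BOOKED;
        }
        // No partition: both replicas are checked and updated together.
        if (local.containsKey(room) || remote.containsKey(room)) {
            return Result.ALREADY_TAKEN;
        }
        local.put(room, guest);
        remote.put(room, guest);
        return Result.BOOKED;
    }

    boolean replicasAgree() {
        return replicaA.equals(replicaB);
    }

    public static void main(String[] args) {
        BookingCluster ap = new BookingCluster(Mode.AP);
        ap.setPartitioned(true);
        System.out.println(ap.book(true, "Room 101", "Bob"));   // BOOKED
        System.out.println(ap.book(false, "Room 101", "Dong")); // BOOKED (double booking)
        System.out.println(ap.replicasAgree());                 // false

        BookingCluster cp = new BookingCluster(Mode.CP);
        cp.setPartitioned(true);
        System.out.println(cp.book(true, "Room 101", "Bob"));   // UNAVAILABLE
        System.out.println(cp.replicasAgree());                 // true
    }
}
```

In AP mode both guests get an answer but the replicas diverge. In CP mode the data stays consistent but requests are refused while the partition lasts.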
CAP Theorem
• A simple example:
Hotel Booking: are we double-booking the same room?
Bob Dong
Credit: http://architects.dzone.com/articles/better-explaining-cap-theorem
Choosing AP
Credit: https://foundationdb.com/key-value-store/white-papers/the-cap-theorem
Choosing CP
Credit: https://foundationdb.com/key-value-store/white-papers/the-cap-theorem
Replication makes it possible to add availability.
Hadoop
Introduction
• An open source software framework
• Supports data-intensive distributed applications.
• Derived from Google’s MapReduce and Google File System papers.
• Written in the Java programming language.
Hadoop (Why)
• Need to process huge datasets on a large number of computers.
• It is expensive to build reliability into each application.
• Nodes fail every day
– Failure is expected, rather than exceptional.
• Need common infrastructure
– Efficient, reliable, easy to use.
– Open source, Apache License
What is Hadoop Used for ?
Searching (Yahoo)
Log Processing
Recommendation Systems (Facebook, LinkedIn, eBay, Amazon)
Analytics (Facebook, LinkedIn)
Video and Image Analysis (NASA)
Data Retention
Hadoop High Level Architecture
Goals of HDFS
1. Very Large Distributed File System
- 10K nodes, 100 million files, 10 PB
2. Assumes Commodity Hardware
- Files are replicated to handle hardware failure
- Detects failures and recovers from them
3. Optimized for Batch Processing
- Data locations exposed so that computation can move to where data resides.
What is Hive
• Hive is a data warehouse infrastructure tool to process
structured data in Hadoop.
• Hive was initially developed by Facebook; later the Apache
Software Foundation took it up and developed it further as
open source under the name Apache Hive.
Hive is not
A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates
Features of Hive
• It stores the schema in a database and processed data in HDFS.
• It is designed for OLAP.
• It provides an SQL-like query language called HiveQL or HQL.
• It is familiar, fast, scalable, and extensible.
Architecture of Hive
• The following component diagram depicts the architecture of Hive:
[Figure: component diagram of the Hive architecture]
Architecture of Hive
Units and their operations
User Interface
• Hive is data warehouse infrastructure software that mediates the interaction between the user and HDFS.
• The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).
Metastore
• Hive uses a database server of choice to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.
HiveQL Process Engine
• HiveQL is similar to SQL and queries against the schema information in the Metastore.
• It is an alternative to the traditional approach of hand-writing a MapReduce program.
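To make the relationship between a HiveQL query and MapReduce more concrete, here is a hedged Python sketch of how a simple aggregation could be expressed as a map step and a reduce step. The query in the comment, the `employee` table, and its columns are hypothetical, and the code only imitates the idea; it is not what Hive actually generates.

```python
from collections import defaultdict

# Hypothetical HiveQL query being imitated:
#   SELECT dept, SUM(salary) FROM employee GROUP BY dept;
rows = [
    {"name": "a", "dept": "sales", "salary": 100},
    {"name": "b", "dept": "ops", "salary": 80},
    {"name": "c", "dept": "sales", "salary": 50},
]

def map_row(row):
    # Map: emit the GROUP BY column as the key and the aggregated column as the value.
    return (row["dept"], row["salary"])

def reduce_group(key, values):
    # Reduce: apply the aggregate (here SUM) to all values that share a key.
    return (key, sum(values))

# Group the mapped pairs by key before reducing.
groups = defaultdict(list)
for key, value in map(map_row, rows):
    groups[key].append(value)

result = dict(reduce_group(k, v) for k, v in groups.items())
print(result)  # {'sales': 150, 'ops': 80}
```

The one-line query replaces the two hand-written functions and the grouping loop, which is the convenience the slide refers to.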
Execution Engine
• The Hive Execution Engine is the link between the HiveQL Process Engine and MapReduce.
• The execution engine processes the query and generates the same results as an equivalent MapReduce job.
HDFS or HBase
• The Hadoop Distributed File System (HDFS) or HBase is the storage layer used to store the data.
MAP REDUCE
What is MapReduce?
• MapReduce is a processing technique and a programming model for distributed computing, based on Java.
• The MapReduce algorithm contains two important tasks, namely Map and Reduce.
• Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
What is MapReduce? Cont…
• The Reduce task takes the output of a map as its input and combines those data tuples into a smaller set of tuples.
• MapReduce is a programming model that Google has used successfully in processing its "big data" sets (~20,000 petabytes per day).
What is MapReduce? Cont…
• Users specify the computation in terms of a map and a reduce function.
• The underlying runtime system automatically parallelizes the computation across large-scale clusters of machines.
• The underlying system also handles machine failures, efficient communications, and performance issues.
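The division of labour described above can be sketched in Python: the user writes only `map_fn` and `reduce_fn`, and a runtime decides how to run them. The `run_mapreduce` function below is a hypothetical toy runtime in which a thread pool stands in for a cluster; it does no failure handling and is not Hadoop's API. The temperature-style records are made-up sample data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_mapreduce(records, map_fn, reduce_fn, workers=4):
    """Toy runtime: run map_fn in parallel, group by key, then reduce each group."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each map_fn call returns a list of (key, value) pairs.
        mapped = list(pool.map(map_fn, records))
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# User-supplied functions: find the maximum reading per year.
def map_fn(record):
    year, reading = record.split(",")
    return [(year, int(reading))]

def reduce_fn(key, values):
    return max(values)

readings = ["1950,22", "1950,31", "1951,17", "1951,12"]
print(run_mapreduce(readings, map_fn, reduce_fn))  # {'1950': 31, '1951': 17}
```

Changing the number of workers, or replacing the thread pool with real machines, would not require any change to `map_fn` or `reduce_fn`, which is the point of the model.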
Map stage
• The map or mapper's job is to process the input data.
• Generally the input data is a file or directory stored in the Hadoop Distributed File System (HDFS).
• The input file is passed to the mapper function line by line.
• The mapper processes the data and creates several small chunks of data.
Reduce stage
• This stage is the combination of the Shuffle stage and the Reduce stage.
• The Reducer's job is to process the data that comes from the mapper.
• After processing, it produces a new set of output, which will be stored in HDFS.
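The map, shuffle, and reduce stages can be walked through with a word count, sketched here in plain Python. This is a single-process illustration under simplifying assumptions: the input lines are made-up sample data held in a list instead of an HDFS file, and the function names `mapper`, `shuffle`, and `reducer` are chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map stage: one input line in, zero or more (word, 1) pairs out.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle stage: sort by key so equal keys are adjacent, then group them.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]

def reducer(key, values):
    # Reduce stage: collapse each key's values into one output record.
    return (key, sum(values))

lines = ["the quick fox", "the lazy dog", "The end"]
mapped = [pair for line in lines for pair in mapper(line)]  # fed line by line
output = [reducer(key, values) for key, values in shuffle(mapped)]
print(output)
# [('dog', 1), ('end', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 3)]
```

In a real job the final records would be written back to HDFS; here they are only printed.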
MapReduce
Thank ‘U’
IT6701 Information Management Unit-I

  • 3. Introduction • Process of creating a data model for an information system by applying formal data modeling techniques. • Process used to define and analyze data requirements needed to support the business processes. • Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of the information system. 3K.T.Mikel Raj
  • 4. What is Data Model? • Data Model is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraint. • A data model is a conceptual representation of data structures required for data base and is very powerful in expressing and communicating the business requirements. • A data model visually represents the nature of data, business rules governing the data, and how it will be organized in the database. 4K.T.Mikel Raj
  • 5. • A data model provides a way to describe the design of a database at the physical, logical and view levels. • There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system 5K.T.Mikel Raj
  • 6. • Conceptual: describes WHAT the system contains. • Logical: describes HOW the system will be implemented, regardless of the DBMS. • Physical: describes HOW the system will be implemented using a specific DBMS. Different Data Models 6K.T.Mikel Raj
  • 7. A data model consists of entities related to each other on a diagram: Data Model Element Definition Entity A real world thing or an interaction between 2 or more real world things. Attribute The atomic pieces of information that we need to know about entities. Relationship How entities depend on each other in terms of why the entities depend on each other (the relationship) and what that relationship is (the cardinality of the relationship). 7K.T.Mikel Raj
  • 8. Example: Given that … • “Customer” is an entity. • “Product” is an entity. • For a “Customer” we need to know their “customer number” attribute and “name” attribute. • For a “Product” we need to know the “product name” attribute and “price” attribute. • “Sale” is an entity that is used to record the interaction of “Customer” and “Product”. 8K.T.Mikel Raj
  • 9. Here is the diagram that encapsulates these rules: 9K.T.Mikel Raj
  • 10. Notes • By convention, entities are named in the singular. • The attributes of “Customer” are “Customer No” (which is the unique identifier or primary key of the “Customer” entity and is shown by the # symbol) and “Customer Name”. • “Sale” has a composite primary key made up of the primary key of “Customer”, the primary key of “Product” and the date of the sale. • Think of entities as tables, think of attributes as columns on the table and think of instances as rows on that table: 10K.T.Mikel Raj
  • 11. • If we want to know the price of a Sale, we can ‘find’ it by using the “Product Code” on the instance of “Sale” we are interested in and look up the corresponding “Price” on the “Product” entity with the matching “Product Code”. 11K.T.Mikel Raj
  • 12. Types of Data Models • Entity-Relationship (E-R) Models • UML (Unified Modeling Language)
  • 13. Entity-Relationship Model • Entity Relationship Diagrams (ERDs) are used, as this is the most widely used model. • ERDs have the advantage that they can be normalized. • Entities are represented as rectangles. • Attributes are listed within the rectangle. Example (from the figure): entity UniversityStudent with primary key (PK) StudentID and attributes StudentName, StudentDOB, StudentAge.
  • 14. Why and When • The purpose of a data model is to describe the concepts relevant to a domain, the relationships between those concepts, and information associated with them.
  • 15. • Used to model data in a standard, consistent, predictable manner in order to manage it as a resource. • To have a clear picture of the base data that your business needs. • To identify missing and redundant base data.
  • 16. • To establish a baseline for communication across functional boundaries within your organization. • Provides a basis for defining business rules. • Makes it cheaper, easier, and faster to upgrade your IT solutions.
  • 17. Entity Relationship Diagram (ERD)
  • 18. Objectives • Define terms related to entity relationship modeling, including entity, entity instance, attribute, relationship and cardinality, and primary key. • Describe the entity modeling process.
  • 19. • Discuss how to draw an entity relationship diagram. • Describe how to recognize entities, attributes, relationships, and cardinalities.
  • 20. Database Model A database can be modeled as: – a collection of entities, – relationships among entities. Database systems are often modeled using an Entity Relationship (ER) diagram as the "blueprint" according to which the actual data is stored. The ER diagram is the output of the design phase.
  • 21. Entity Relationship Diagram (ERD) • The ER model allows us to sketch database designs. • An ERD is a graphical tool for modeling data. • ERDs are widely used in database design. • An ERD is a graphical representation of the logical structure of a database. • An ERD is a model that identifies the concepts or entities that exist in a system and the relationships between those entities.
  • 22. Purposes of ERD An ERD serves several purposes: • The database analyst/designer gains a better understanding of the information to be contained in the database through the process of constructing the ERD. • The ERD serves as a documentation tool.
  • 23. • Finally, the ERD is used to communicate the logical structure of the database to users.
  • 24. Components of an ERD An ERD typically consists of four different graphical components: 1. Entity 2. Relationship 3. Cardinality 4. Attribute
  • 25. Classification of Relationships • Optional Relationship – An Employee may or may not be assigned to a Department – A Patient may or may not be assigned to a Bed • Mandatory Relationship – Every Course must be taught by at least one Teacher – Every Mother must have at least one Child
  • 26. Cardinality Constraints • Express the number of entities to which another entity can be associated via a relationship set. • Cardinality constraints give the number of instances of one entity that can or must be associated with each instance of another entity. • Minimum Cardinality – If zero, then optional – If one or more, then mandatory • Maximum Cardinality – The maximum number of associated instances
  • 27. Cardinality Constraints (Contd.) • For a binary relationship set the mapping cardinality must be one of the following types: – One to one • A Manager heads one Department and vice versa – One to many (or many to one) • An Employee works in one Department, or one Department has many Employees – Many to many • A Teacher teaches many Students and a Student is taught by many Teachers
  • 28. General Steps to Create an ERD • Identify the entities • Identify each entity's attributes • Identify the primary keys • Identify the relationships between entities • Identify the cardinality constraints • Draw the ERD • Check the ERD
  • 29. Steps in building an ERD
  • 30. Developing an ERD The process has ten steps: 1. Identify Entities 2. Find Relationships 3. Draw Rough ERD 4. Fill in Cardinality 5. Define Primary Keys 6. Draw Key-Based ERD 7. Identify Attributes 8. Map Attributes 9. Draw Fully Attributed ERD 10. Check Results
  • 31. A Simple Example A company has several departments. Each department has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number.
  • 32. Identify entities • One approach to this is to work through the information and highlight those words which you think correspond to entities. • A company has several departments. Each department has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number. • A true entity should have more than one instance.
  • 33. Find Relationships • The aim is to identify the associations, that is, the connections between pairs of entities. • A simple approach is to use a relationship matrix (table) that has a row and a column for each of the identified entities.
  • 34. Find Relationships (Contd.) • Go through each cell and decide whether or not there is an association. For example, the first cell on the second row is used to indicate if there is a relationship between the entity "Employee" and the entity "Department".
  • 35. Identified Relationships Names placed in the cells are meant to capture/describe the relationships. They can be read like this: • A department is assigned an employee • A department is run by a supervisor • An employee belongs to a department • An employee works on a project • A supervisor runs a department • A project uses an employee
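The relationship matrix can be sketched as a nested dictionary, with one row and one column per entity and a verb phrase in each filled cell. This is a minimal illustration rather than part of the original slides; the names `ENTITIES`, `MATRIX` and `relationships` are assumptions, and the cell contents are taken from the six relationships listed above.

```python
# Relationship matrix: rows and columns are entities; a filled cell
# names the relationship, read as "<row> <verb> <column>".
ENTITIES = ["Department", "Employee", "Supervisor", "Project"]

MATRIX = {
    "Department": {"Employee": "is assigned", "Supervisor": "is run by"},
    "Employee": {"Department": "belongs to", "Project": "works on"},
    "Supervisor": {"Department": "runs"},
    "Project": {"Employee": "uses"},
}


def relationships(matrix, entities):
    """Walk every cell and return one sentence per identified relationship."""
    sentences = []
    for row in entities:
        for col in entities:
            verb = matrix.get(row, {}).get(col)
            if verb:  # empty cells mean "no association"
                sentences.append(f"{row} {verb} {col}")
    return sentences


for sentence in relationships(MATRIX, ENTITIES):
    print(sentence)
```

Walking every cell, including the empty ones, mirrors the advice to go through each cell and decide whether an association exists.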
  • 36. Draw Rough ERD Draw a diagram and: • Place all the entities in rectangles • Use diamonds and lines to represent the relationships between entities. • General Examples
  • 37. Drawing Rough ERD (Contd.)
  • 40. Fill in Cardinality • Supervisor – Each department has one supervisor. • Department – Each supervisor has one department. – Each employee can belong to one or more departments. • Employee – Each department must have one or more employees. – Each project must have one or more employees. • Project – Each employee can have 0 or more projects.
  • 41. Fill in Cardinality (Contd.) The cardinality of a relationship can only have the following values: – One and only one – One or more – Zero or more – Zero or one
  • 43. Cardinality Examples (for pairs of entities A and B) • Each instance of A is related to a minimum of zero and a maximum of one instance of B. • Each instance of B is related to a minimum of one and a maximum of one instance of A. • Each instance of A is related to a minimum of one and a maximum of many instances of B. • Each instance of B is related to a minimum of zero and a maximum of many instances of A.
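The four allowed cardinality values follow from a minimum (zero or one) and a maximum (one or many). As a rough sketch, the function below derives the label from sample data by counting how many B instances each A instance is linked to. It describes the cardinality observed in the sample, not a constraint declared by the designer; the function name and the sample identifiers are assumptions, and it expects at least one A instance.

```python
def cardinality(a_instances, pairs):
    """Label how instances of A relate to B, given (a, b) link pairs."""
    counts = {a: 0 for a in a_instances}
    for a, _b in pairs:
        counts[a] += 1
    minimum = min(counts.values())  # zero means the relationship is optional
    maximum = max(counts.values())  # more than one means "many"
    low = "zero" if minimum == 0 else "one"
    high = "one" if maximum <= 1 else "more"
    if low == "one" and high == "one":
        return "one and only one"
    return f"{low} or {high}"


# Employees and projects: one employee is on vacation with no project.
employees = ["e1", "e2", "e3"]
works_on = [("e1", "p1"), ("e1", "p2"), ("e2", "p1")]
print(cardinality(employees, works_on))

# Departments and supervisors: every department has exactly one supervisor.
departments = ["d1", "d2"]
run_by = [("d1", "s1"), ("d2", "s2")]
print(cardinality(departments, run_by))
```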
  • 46. ERD for Course Enrollment
  • 47. ERD for Course Registration
  • 48. Rough ERD Plus Primary Keys
  • 49. Identify Attributes • In this step we try to identify and name all the attributes essential to the system we are studying without trying to match them to particular entities. • The best way to do this is to study the forms, files and reports currently kept by the users of the system and circle each data item on the paper copy. • Cross out those which will not be transferred to the new system, extraneous items such as signatures, and constant information which is the same for all instances of the form (e.g. your company name and address). The remaining circled items should represent the attributes you need. You should always verify these with your system users. (Sometimes forms or reports are out of date.) • In the example, the only attributes indicated are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number.
  • 50. Map Attributes • Each attribute needs to be matched with exactly one entity. Often it seems like an attribute should go with more than one entity (e.g. Name). In this case you need to add a modifier to the attribute name to make it unique (e.g. Customer Name, Employee Name, etc.) or determine which entity the attribute "best" describes. • If you have attributes left over without corresponding entities, you may have missed an entity and its corresponding relationships. Identify these missed entities and add them to the relationship matrix now.
  • 52. Draw Fully Attributed ERD
  • 53. Check ERD Results • Look at your diagram from the point of view of a system owner or user. Is everything clear? • Check through the cardinality pairs. • Also, look over the list of attributes associated with each entity to see if anything has been omitted.
  • 54. Java Database Connectivity (JDBC)
  • 55. Introduction • Database – Collection of data • DBMS – Database management system – Storing and organizing data • SQL – Relational database – Structured Query Language • JDBC – Java Database Connectivity – JDBC driver
  • 56. JDBC • Programs developed with Java/JDBC are platform and vendor independent. • “Write once, compile once, run anywhere.” • Write apps in Java to access any DB, using standard SQL statements, while still following Java conventions. • The JDBC driver manager and JDBC drivers provide the bridge between the database and Java worlds.
  • 57. JDBC architecture (figure labels): Java application; JDBC API; JDBC Driver Manager; JDBC Driver API; JDBC/ODBC Bridge with ODBC driver; vendor-supplied JDBC driver; Database.
  • 58. ODBC • JDBC was heavily influenced by ODBC. • ODBC provides a C interface for database access in the Windows environment. • ODBC has a few commands with lots of complex options. Java prefers simple methods but lots of them.
  • 59. • Type 1: Uses a bridging technology to access a database. The JDBC-ODBC bridge is an example. It provides a gateway to ODBC. • Type 2: Native API drivers. The driver contains Java code that calls native C/C++ methods provided by the database vendors. • Type 3: Generic network API that is then translated into database-specific access at the server level. The JDBC driver on the client uses sockets to call a middleware application on the server that translates the client requests into an API specific to the desired driver. Extremely flexible. • Type 4: Uses network protocols built into the database engine to talk directly to the database over Java sockets. Almost always comes only from database vendors. (Figure labels: Type 1, Type 2, Type 3, Type 4; 3rd Party API; Native C/C++ API; Local API; Network API; Database.)
  • 60. JDBC Driver Types JDBC driver implementations vary because of the wide variety of operating systems and hardware platforms in which Java operates. Sun has divided the implementation types into four categories: Types 1, 2, 3, and 4.
  • 62. Common JDBC Components The JDBC API provides the following interfaces and classes: DriverManager: This class manages a list of database drivers. • It matches connection requests from the Java application with the proper database driver using the communication subprotocol. • The first driver that recognizes a certain subprotocol under JDBC will be used to establish a database connection.
  • 63. Driver: This interface handles the communications with the database server. You will interact directly with Driver objects very rarely. Instead, you use the DriverManager class, which manages objects of this type. It also abstracts the details associated with working with Driver objects.
  • 64. Connection: • This interface provides all the methods for contacting a database. • The Connection object represents the communication context, i.e., all communication with the database goes through the Connection object only. Statement: You use objects created from this interface to submit SQL statements to the database.
  • 65. ResultSet: These objects hold data retrieved from a database after you execute an SQL query using Statement objects. • A ResultSet acts as an iterator to allow you to move through its data. SQLException: • This class handles any errors that occur in a database application.
  • 66. Type 1: JDBC-ODBC Bridge Driver In a Type 1 driver, a JDBC bridge is used to access ODBC drivers installed on each client machine. • Using ODBC requires configuring on your system a Data Source Name (DSN) that represents the target database. The JDBC-ODBC Bridge that comes with JDK 1.2 is a good example of this kind of driver.
  • 68. Type 2: JDBC-Native API In a Type 2 driver, JDBC API calls are converted into native C/C++ API calls, which are unique to the database. These drivers are typically provided by the database vendors and used in the same manner as the JDBC-ODBC Bridge. The vendor-specific driver must be installed on each client machine. The Oracle Call Interface (OCI) driver is an example of a Type 2 driver.
  • 70. Type 3: JDBC-Net pure Java In a Type 3 driver, a three-tier approach is used to access databases. The JDBC clients use standard network sockets to communicate with a middleware application server. • The socket information is then translated by the middleware application server into the call format required by the DBMS, and forwarded to the database server.
  • 72. Type 4: 100% Pure Java In a Type 4 driver, a pure Java-based driver communicates directly with the vendor's database through a socket connection. This is the highest performance driver available for the database and is usually provided by the vendor itself. This kind of driver is extremely flexible: you don't need to install special software on the client or server. Further, these drivers can be downloaded dynamically.
  • 74. The following steps are required to create a new database using a JDBC application: Import the packages: • Include the packages containing the JDBC classes needed for database programming. Most often, using import java.sql.* will suffice. Register the JDBC driver: • Initialize a driver so you can open a communications channel with the database.
  • 75. Open a connection: • Use the DriverManager.getConnection() method to create a Connection object, which represents a physical connection with the database server. To create a new database, you need not give any database name when preparing the database URL.
  • 76. Execute a query: • Use an object of type Statement for building and submitting an SQL statement to the database. Clean up the environment: • Explicitly close all database resources rather than relying on the JVM's garbage collection.
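The same sequence of steps (import, open a connection, execute a statement, clean up) appears in most database APIs. JDBC itself is a Java API, so the sketch below is not JDBC code: it is an analogous illustration using Python's built-in `sqlite3` module, chosen only because it runs without a database server. SQLite needs no driver registration step, and the table name, column names and sample row are assumptions.

```python
import sqlite3  # Step 1: import the database package


def run_demo():
    # Steps 2 and 3: sqlite3 has no separate driver registration;
    # connect() opens the connection (an in-memory database here).
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()  # plays a role similar to a JDBC Statement
        # Step 4: build and submit SQL statements.
        cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
        cur.execute("INSERT INTO student (name) VALUES (?)", ("Ravi",))
        cur.execute("SELECT id, name FROM student")
        return cur.fetchall()  # the rows play the role of a ResultSet
    finally:
        # Step 5: release the resource explicitly instead of waiting
        # for garbage collection.
        conn.close()


print(run_demo())
```

The `try` / `finally` block is the design point worth carrying over to any API: the connection is closed even if a statement fails.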
  • 78. Stored Procedure Language Stored Procedure Overview • A stored procedure is a function in a shared library accessible to the database server. • Stored procedures can also be written in languages such as C or Java. Advantages of stored procedures: • Reduced network traffic: the more SQL statements that are grouped together for execution, the larger the savings in network traffic.
  • 81. Writing Stored Procedures • Tasks performed by the client application • Tasks performed by the stored procedure, when invoked • The CALL statement • Explicit parameters to be defined: – IN: Passes a value to the stored procedure from the client application. – OUT: Stores a value that is passed to the client application when the stored procedure terminates. – INOUT: Passes a value to the stored procedure from the client application, and returns a value to the client application when the stored procedure terminates.
  • 82. Some Valid SQL Procedure Body Statements • CASE statement • FOR statement • GOTO statement • IF statement • ITERATE statement • RETURN statement • WHILE statement
  • 83. Invoking Procedures • A stored procedure stored at the location of the database can be invoked by using the SQL CALL statement. • Nested SQL Procedures: To call a target SQL procedure from within a caller SQL procedure, simply include a CALL statement with the appropriate number and types of parameters in your caller.
  • 84. Conditional statements:
        IF <condition> THEN
            <statement(s)>
        ELSE
            <statement(s)>
        END IF;
      Loops:
        LOOP
            ……
            EXIT WHEN <condition>
            ……
        END LOOP;
  • 86. What is Big Data? • Big data is a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. • In most enterprise scenarios the volume of data is too big or it moves too fast or it exceeds current processing capacity. • Despite these problems, big data has the potential to help companies improve operations and make faster, more intelligent decisions.
  • 87. Why Big Data Key enablers of the appearance and growth of Big Data are: • Increase of storage capacities • Increase of processing power • Availability of data • Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last two years alone.
  • 88. Big Data Everywhere! • Lots of data is being collected and warehoused – Web data, e-commerce – Purchases at department/grocery stores – Bank/credit card transactions – Social networks
  • 89. How much data? • Google processes 20 PB a day (2008) • Facebook has 2.5 PB of user data + 15 TB/day (4/2009) • eBay has 6.5 PB of user data + 50 TB/day (5/2009)
  • 90. Units • Bit • Nibble • Byte/octet (B) • Kilobyte (KB) • Megabyte (MB) • Gigabyte (GB) • Terabyte (TB) • Petabyte (PB) • Exabyte (EB) • Zettabyte (ZB) • Yottabyte (YB)
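To relate the figures quoted earlier to this list of units, a byte count can be converted to the largest unit that keeps the value at or above 1. The sketch below assumes decimal multiples (1 KB = 1000 B), which the slides do not specify; passing `base=1024` gives the binary interpretation instead. The function name is an assumption.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes, base=1000):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits this unit, or when no larger unit exists.
        if value < base or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= base


# 2.5 quintillion bytes per day, as quoted on the "Why Big Data" slide.
print(human_readable(2.5e18))
# 20 PB per day, expressed in bytes.
print(human_readable(20 * 10**15))
```

Under the decimal assumption, 2.5 quintillion bytes is 2.5 exabytes.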
  • 92. Three V's of Big Data • Volume: data quantity • Velocity: data speed • Variety: data types
  • 93. Types of Data • Three concepts come with big data: structured data, semi-structured data and unstructured data.
  • 94. Structured Data • It covers all data that can be stored in an SQL database, in tables with rows and columns. • Such data has relational keys and can be easily mapped into pre-designed fields. • Today, structured data is the most processed in development and the simplest way to manage information. • But structured data represents only 5 to 10% of all data.
  • 95. Semi-structured data • Semi-structured data is information that doesn’t reside in a relational database but that does have some organizational properties that make it easier to analyze. • Examples of semi-structured data: XML and JSON (JavaScript Object Notation) documents. • Like structured data, semi-structured data represents only a small share of all data (5 to 10%).
  • 96. Unstructured data • Unstructured data represents around 80% of data. • It often includes text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, web pages and many other kinds of business documents. • Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn’t fit neatly in a database. • Unstructured data is everywhere. In fact, most individuals and organizations conduct their lives around unstructured data.
  • 97. • Here are some examples of machine-generated unstructured data: – Satellite images – Scientific data – Photographs and video – Social media data – Mobile data and website content
  • 98. What to do with these data? • Aggregation and Statistics – Data warehouse and OLAP • Indexing, Searching, and Querying – Keyword based search – Pattern matching (XML/RDF) • Knowledge discovery – Data Mining – Statistical Modeling
  • 99. Examples of Big Data: IT log analytics • IT solutions and IT departments generate an enormous quantity of logs and trace data. • In the absence of a Big Data solution, much of this data must go unexamined: organizations simply don't have the manpower or resources to churn through all that information by hand, let alone in real time. • With a Big Data solution in place, however, those logs and trace data can be put to good use. • Within this list of Big Data application examples, IT log analytics is the most broadly applicable.
  • 100. Applications for Big Data Analytics • Homeland Security • Finance • Smarter Healthcare • Multi-channel sales • Telecom • Manufacturing • Traffic Control • Trading Analytics • Fraud and Risk • Log Analysis • Search Quality • Retail: Churn, NBO
  • 102. NoSQL? NoSQL does not mean "Not SQL".
  • 103. NoSQL? It means "Not Only SQL" or "Not Relational Database".
  • 104. Why NoSQL • Large Volume of Data • Dynamic Schemas • Auto-sharding • Replication • Horizontally Scalable * Some of these operations can be achieved with enterprise-class RDBMS software, but at very high cost.
  • 105. Define NoSQL • NoSQL is a non-relational database management system, different from traditional relational database management systems in some significant ways. • A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases (RDBMS). • It is designed for distributed data stores with very large-scale data storage needs (for example Google or Facebook, which collect terabytes of data every day for their users).
  • 106. Types of NoSQL Databases • Document Stores • Graph Databases • Key-Value Stores • Columnar Databases
  • 107. Document Oriented Databases • Document oriented databases treat a document as a whole and avoid splitting a document into its constituent name/value pairs. • At a collection level, this allows for putting together a diverse set of documents into a single collection. • Document databases allow indexing of documents on the basis of not only their primary identifier but also their properties.
  • 108. Cont… • Different open-source document databases are available today, but the most prominent among the available options are MongoDB and CouchDB. • In fact, MongoDB has become one of the most popular NoSQL databases.
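The three properties described for document stores (documents kept whole, mixed shapes in one collection, indexing on properties as well as the primary identifier) can be illustrated with a toy in-memory collection. This is a teaching sketch only: the class `Collection` and its methods are hypothetical and are not the API of MongoDB, CouchDB or any other product. It does not handle re-inserting a document under an existing identifier, and indexed values must be hashable.

```python
from collections import defaultdict


class Collection:
    """Toy document collection (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = {}     # primary identifier -> whole document
        self.indexes = {}  # field name -> {value -> set of identifiers}

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc  # stored as a whole, whatever its shape
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        if field in self.indexes:  # property index lookup
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # No index on this field: scan every document.
        return [d for d in self.docs.values() if d.get(field) == value]


people = Collection()
people.create_index("city")
people.insert(1, {"name": "Asha", "city": "Chennai"})
people.insert(2, {"title": "Invoice", "total": 120})  # a different shape
people.insert(3, {"name": "Ravi", "city": "Chennai", "tags": ["vip"]})
print(people.find("city", "Chennai"))
```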
  • 110. Graph Based Databases A graph database uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary.
  • 111. Cont… General graph databases that can store any graph are distinct from specialized graph databases such as triple-stores and network databases. Indexes may still be used to locate the nodes where a traversal starts.
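Index-free adjacency can be illustrated with nodes that hold direct references to their neighbours, so that a traversal follows references rather than looking anything up. The sketch below is a simplified in-memory illustration; the class `Node`, the helper functions and the sample names are assumptions, not the interface of any graph database.

```python
class Node:
    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.edges = []  # direct references: (edge label, neighbouring Node)

    def connect(self, label, other):
        self.edges.append((label, other))


def neighbours(node, label):
    """Names of nodes one hop away along edges with the given label."""
    return [other.name for edge_label, other in node.edges if edge_label == label]


def reachable(start):
    """Names of all nodes reachable from start by following edges."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.append(node.name)
        # Adjacent nodes are reached through the stored references,
        # with no lookup in a separate index.
        stack.extend(other for _label, other in node.edges)
    return seen


alice = Node("Alice", role="teacher")
bob = Node("Bob", role="student")
carol = Node("Carol", role="student")
alice.connect("teaches", bob)
bob.connect("knows", carol)
print(neighbours(alice, "teaches"))
print(reachable(alice))
```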
  • 113. Column Based Databases Column-oriented storage allows data to be stored effectively. It avoids consuming space when storing nulls by simply not storing a column when a value doesn’t exist for that column.
  • 114. Cont… Each unit of data can be thought of as a set of key/value pairs, where the unit itself is identified with the help of a primary identifier, often referred to as the primary key.
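The idea that a missing value costs no space can be sketched with a row stored as a set of column/value pairs under a primary key. The class below is a hypothetical in-memory illustration, not the storage format of any particular column-oriented database; the names and sample data are assumptions.

```python
class SparseTable:
    """Rows are sets of column/value pairs identified by a primary key."""

    def __init__(self):
        self.rows = {}  # row key -> {column name -> value}

    def put(self, row_key, **columns):
        row = self.rows.setdefault(row_key, {})
        for name, value in columns.items():
            if value is not None:  # a missing value is simply not stored
                row[name] = value

    def get(self, row_key, column):
        # An absent column reads back as None without ever being stored.
        return self.rows.get(row_key, {}).get(column)

    def stored_cells(self):
        return sum(len(row) for row in self.rows.values())


table = SparseTable()
table.put("emp1", name="Asha", phone="555-0100", fax=None)
table.put("emp2", name="Ravi")
print(table.get("emp2", "phone"))
print(table.stored_cells())
```

A fixed three-column table would hold six cells for these two rows; here only the three values that exist are stored.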
  • 116. Key Value Databases The key of a key/value pair is a unique value in the set and can be easily looked up to access the data. • Key/value stores are of varied types: some keep the data in memory and some provide the capability to persist the data to disk.
  • 119. Benefits of NoSQL over RDBMS • Schema-less: NoSQL databases, being schema-less, do not define any strict data structure. • Dynamic and agile: NoSQL databases can grow dynamically with changing requirements. They can handle structured, semi-structured and unstructured data.
  • 120. Benefits (cont…) • Scales horizontally: NoSQL scales horizontally by adding more servers and using the concepts of sharding and replication. • This behavior of NoSQL fits well with cloud computing services such as Amazon Web Services (AWS), which let you run virtual servers that can be expanded horizontally on demand.
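Sharding in a key-value store can be sketched by hashing each key to pick one of several servers. In the sketch below each "server" is just a dictionary, and the names `shard_for` and `ShardedStore` are assumptions. It uses simple modulo hashing, under which changing the number of shards relocates most keys; real systems often use consistent hashing or range-based sharding to limit that movement.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard deterministically from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Key-value store spread over several shards (dicts stand in for servers)."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=3)
for user in ["asha", "ravi", "meena", "john"]:
    store.put(user, {"name": user})
print(store.get("ravi"))
print([len(shard) for shard in store.shards])
```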
  • 121. Benefits (cont…) • Better Performance: All the NoSQL databases claim to deliver better and faster performance as compared to traditional RDBMS implementations.
  • 123. CAP It is impossible for a web service to provide the following three guarantees at the same time: • Consistency • Availability • Partition tolerance. A distributed system can satisfy any two of these guarantees at the same time, but not all three.
  • 124. CAP Theorem • Consistency: All the servers in the system will have the same data, so anyone using the system will get the same copy regardless of which server answers their request. • Availability: The system will always respond to a request (even if the response is not the latest data, is not consistent across the system, or is just a message saying the system isn't working). • Partition Tolerance: The system continues to operate as a whole even if individual servers fail or can't be reached.
  • 126. CAP Theorem • A simple example: Hotel Booking: are we double-booking the same room? (Figure labels: Bob, Dong.)
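The hotel booking question can be played out with two booking servers that normally replicate to each other. The sketch below is a simplified simulation, not a real replication protocol: the class `Replica`, the function `book`, its `prefer` parameter and the room number are assumptions, and treating Bob and Dong as the two guests is an interpretation of the figure labels.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.bookings = {}  # room -> guest


def book(replica, peer, room, guest, partitioned, prefer):
    """Try to book a room through one replica.

    prefer="availability": always answer, even if the peer is unreachable.
    prefer="consistency":  refuse to answer while the peer is unreachable.
    """
    if partitioned:
        if prefer == "consistency":
            return "unavailable"  # consistent, but not available
        if room in replica.bookings:
            return "rejected"
        replica.bookings[room] = guest  # local write only, peer not updated
        return "booked"
    if room in replica.bookings or room in peer.bookings:
        return "rejected"
    replica.bookings[room] = guest
    peer.bookings[room] = guest  # replicate to the peer
    return "booked"


# During a network partition, favouring availability double-books the room.
a, b = Replica("A"), Replica("B")
print(book(a, b, "101", "Bob", partitioned=True, prefer="availability"))
print(book(b, a, "101", "Dong", partitioned=True, prefer="availability"))
print(a.bookings, b.bookings)
```

With `prefer="consistency"` the same two calls both return "unavailable": no double booking occurs, but neither guest gets an answer while the partition lasts.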
  • 133. Introduction • Hadoop is an open source software framework. • Supports data-intensive distributed applications. • Derived from Google’s MapReduce and Google File System papers. • Written in the Java programming language.
  • 134. Hadoop (Why) • Need to process huge datasets on a large number of computers. • It is expensive to build reliability into each application. • Nodes fail every day. • Failure is expected, rather than exceptional. • Need a common infrastructure: efficient, reliable, easy to use. • Open source, Apache License.
  • 135. What is Hadoop Used for? • Searching (Yahoo) • Log Processing • Recommendation Systems (Facebook, LinkedIn, eBay, Amazon) • Analytics (Facebook, LinkedIn) • Video and Image Analysis (NASA) • Data Retention
  • 136. Hadoop High Level Architecture
  • 137. Goals of HDFS 1. Very Large Distributed File System – 10K nodes, 100 million files, 10 PB 2. Assumes Commodity Hardware – Files are replicated to handle hardware failure – Detects failures and recovers from them 3. Optimized for Batch Processing – Data locations are exposed so that computation can move to where the data resides.
  • 138. What is Hive • Hive is a data warehouse infrastructure tool to process structured data in Hadoop. Initially Hive was developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive.
  • 139. Hive is not • A relational database • A design for OnLine Transaction Processing (OLTP) • A language for real-time queries and row-level updates
  • 140. Features of Hive • It stores the schema in a database and the processed data in HDFS. • It is designed for OLAP. • It provides an SQL-like query language called HiveQL or HQL. • It is familiar, fast, scalable, and extensible.
  • 141. Architecture of Hive • The following component diagram depicts the architecture of Hive: (diagram not reproduced in this transcript)
  • 142. Architecture of Hive: units and their operations. User Interface • Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. • The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HDInsight (on Windows Server).
  • 143. Meta Store • Hive chooses respective database servers to store the schema or metadata of tables, databases, columns in a table, their data types, and the HDFS mapping.
  • 144. HiveQL Process Engine • HiveQL is similar to SQL and is used to query the schema information in the Metastore. • It is one of the replacements for the traditional approach of writing a MapReduce program.
  • 145. Execution Engine The Hive Execution Engine connects the HiveQL Process Engine and MapReduce. The execution engine processes the query and generates the same results as MapReduce would.
  • 146. HDFS or HBase The Hadoop Distributed File System (HDFS) or HBase is the data storage technique used to store the data in the file system.
  • 148. What is MapReduce? • MapReduce is a processing technique and a programming model for distributed computing based on Java. • The MapReduce algorithm contains two important tasks, namely Map and Reduce. • Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples.
  • 149. Cont… Second, the Reduce task takes the output from a map as its input and combines those data tuples into a smaller set of tuples. MapReduce is a programming model that Google has used successfully in processing its “big data” sets (~20 petabytes per day).
  • 150. What is MapReduce? Cont… • Users specify the computation in terms of a map and a reduce function. • The underlying runtime system automatically parallelizes the computation across large-scale clusters of machines. • The underlying system also handles machine failures, efficient communications, and performance issues.
  • 151. Map stage • The map or mapper’s job is to process the input data. • Generally the input data is in the form of a file or directory and is stored in the Hadoop file system (HDFS). • The input file is passed to the mapper function line by line. • The mapper processes the data and creates several small chunks of data.
  • 152. Reduce stage • This stage is the combination of the Shuffle stage and the Reduce stage. • The Reducer’s job is to process the data that comes from the mapper. • After processing, it produces a new set of output, which will be stored in HDFS.
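The map, shuffle and reduce stages can be illustrated with a word count, the customary introductory example. The sketch below runs in a single Python process and does not use the Hadoop API; the function names are assumptions. In a real cluster the framework distributes these stages across machines and reads from and writes to HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break one input line into (key, value) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a smaller result."""
    return key, sum(values)


def word_count(lines):
    # The input is passed to the mapper line by line.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big Hadoop data"]))
```

Because each line is mapped independently and each key is reduced independently, both stages can be spread over many machines, which is what the runtime system automates.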