Big Data Overview

Copyright © 2019. Infor. All Rights Reserved.
Big Data Solution Overview
FEBRUARY 2020

(2)
Sample agenda
• Big Data
• Traditional Solutions
• NoSQL Databases
• NoSQL Data Modeling
• Elasticsearch Demo

(4)
Big Data
Definition of big data…
Big data refers to data sets
that have become so large that they are
hard to capture, store and
process using conventional
methods
They are hard to handle because the data is:
• Large in amount
• Coming at high speed
• In different formats

(5)
Digital Data Years
The rapid growth of the internet over 35 years
• Every day we create as much information as we did from the beginning of time until 2003
• The total amount of data being captured and stored by industry doubles every 1.2 years
• Make sure you know your zettabytes (10^21) from your yottabytes (10^24)
• Over 95% of all the data in the world was created in the past 2 years
Timeline: 0000, 2003, 2019, 2020
Every minute in 2014
• we sent 204 million emails
• generated 1.8 million
Facebook likes
• sent 278 thousand Tweets
• Uploaded 200,000 photos
to Facebook
• 3.5 billion searches in a
single day
Internet
• More than 3.7 billion
humans use the internet
• We conduct more than half
of our web searches from a
mobile phone now
• On average, Google now
processes more than
40,000 searches EVERY
second (3.5 billion searches
per day)!
• Only accelerating with the
growth of the Internet of
Things (IoT)
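The two Google figures quoted above agree with each other, which a quick calculation can confirm. The sketch below only multiplies the quoted per-second rate by the number of seconds in a day; the variable names are illustrative.

```python
# Quick consistency check of the quoted search figures.
SEARCHES_PER_SECOND = 40_000      # per-second figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

searches_per_day = SEARCHES_PER_SECOND * SECONDS_PER_DAY
print(f"{searches_per_day:,} searches per day")  # 3,456,000,000, roughly 3.5 billion
```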
A Day of Data
• 500 million tweets are sent
• 294 billion emails are sent
• 4 petabytes of data are
created on Facebook
• 4 terabytes of data are
created from each
connected car
• 65 billion messages are
sent on WhatsApp
• 5 billion searches are made
Exponential Growth
• By 2025, it’s estimated
that 463 exabytes of
data will be created
each day globally –
that’s the equivalent of
212,765,957 DVDs per
day
Timeline: 2014 to 2025

(7)
Changing Landscape of Big Data
Big Data Challenges…
Volume: more data coming in huge quantities
• Your own data (archives, junk, logs), free public data and premium
data all add to the volume
Velocity: the speed of the incoming data
• Data arrives at high speed, mainly due to the increasing
number of users and interactions
Variety: different types of data
• Data can be unstructured (files), semi-structured (JSON, graphs)
or structured (relational)
Veracity: the quality or truth of the data
• It is challenging to spot misinformation and invalid data within
such a large volume

(8)
Traditional Solutions
Of Data Management
SECTION 02

(9)
What we used to do...
• Data is stored in the form of tables.
• Multiple users are supported.
• Relationships among the tables are maintained.
• Hardware and software requirements are higher.
• The RDBMS enforces integrity constraints
at the schema level.
• Data can be easily accessed using SQL
queries.
• Examples: MySQL, Oracle, SQL Server
Diagram: Traditional Approach (Centralized System) with User, Server and RDBMS

(10)
What we do now...
Diagram: Distributed Approach (Distributed Network) with User and Server in front of several database nodes (labeled DB1/Server1 and DB2/Server2)

ACID Principles
IN RELATIONAL DATABASES
ATOMICITY
CONSISTENCY
ISOLATION
DURABILITY
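As an illustration of atomicity and consistency, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the amounts are hypothetical; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

# In-memory database with a hypothetical two-row accounts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so the CHECK fails
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit to bob is applied first and then undone by the rollback, so both balances end up unchanged.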

(12)
RDBMS Challenges
Modern applications present…
When we implement modern
applications, we face new challenges
with a traditional solution…
Expensive to scale up
Expensive to scale down
Hard to process high volumes near real-time
Requires DBAs to manage and tune
Designed for relational data

CAP
THEOREM
Brewer's Theorem
CONSISTENCY
Every read receives the most
recent write or an error
PARTITION TOLERANCE
The system continues to operate even
if communication between some of its nodes fails
AVAILABILITY
Every request receives a (non-error) response

(14)
NoSQL Databases
SECTION 03

(15)
NoSQL Database
What is a NoSQL Database…
A NoSQL database provides
storage and retrieval of data
that is modeled by means other
than the tabular relations used
in relational databases
Introduced by Google and AWS
A set of characteristics, not a defined thing
Non-relational, highly scalable

(16)
Why do we need NoSQL?
The Why…
• Large amounts of data are being generated
• Connections between data are growing
• Adaptable to the changing structure of data
• Uses advanced server architecture
• Designed for non-relational data
• Suited to cases where we need high availability

(17)
Main Use Cases
When do we use NoSQL databases?
• Large Data Volumes
Massively distributed architecture required to store data (Google,
Amazon, Facebook)
• Extreme Query Workload
Impossible to efficiently do joins at that scale with an RDBMS
• Schema Evolution
Schema flexibility is trivial to the solution

(18)
PROS AND CONS
NoSQL
PROS
Massive Scalability
High Availability
Economical
Schema Flexibility
Sparse and semi-structured data
CONS
Limited query capabilities
Not standardized
Still developing
Less support
Weaker fit for business-related analytics

BASE
DESIGN
For NoSQL databases
BASICALLY AVAILABLE
EVENTUAL CONSISTENCY
SOFT STATE
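The BASE properties can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not how any particular NoSQL product replicates data: the class names, the single write replica and the explicit sync() step are all hypothetical.

```python
class Replica:
    """A single copy of the data; it may lag behind the latest writes."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy store: writes hit one replica at once and reach the others later."""
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # updates not yet copied to the other replicas

    def write(self, key, value):
        # Acknowledge immediately after updating the first replica only.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Always answers (basically available), possibly with stale data.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Background replication: apply queued updates in order everywhere.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read("cart:42", replica_index=2)  # replica 2 has not seen the write yet
store.sync()
fresh = store.read("cart:42", replica_index=2)  # replicas have converged
print(stale, fresh)
```

Between write() and sync() the answer depends on which replica is asked (soft state); once replication catches up, every replica returns the same value (eventual consistency).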

Four Emerging Trends
IN NOSQL DATABASES
BIGTABLE
KEY VALUE
GRAPH DB
DOCUMENT

(21)
Big Table
NoSQL Database Structures
• Behaves like a standard
relational database
• Designed to work with a lot of
data... a REALLY BIG LOT of data
• Created by Google, now used by
many others
• It is a sparse, distributed,
persistent sorted map
• Indexed by row key, column key and timestamp
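A minimal sketch of the idea of a sparse map whose cells carry timestamps, assuming cells are addressed by row key, column name and timestamp as in Google's published Bigtable description. The SparseTable class and the sample keys are hypothetical, and sorting and distribution across servers are left out.

```python
from collections import defaultdict

class SparseTable:
    """Toy wide-column table: (row key, column) -> {timestamp: value}."""
    def __init__(self):
        # Only cells that were written exist, so empty columns cost nothing.
        self.cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self.cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self.cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]  # newest version
        older = [t for t in versions if t <= timestamp]
        return versions[max(older)] if older else None

table = SparseTable()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 5, "<html>v2</html>")
table.put("com.example/index", "anchor:home", 3, "Example")

latest = table.get("com.example/index", "contents:html")
as_of_2 = table.get("com.example/index", "contents:html", timestamp=2)
missing = table.get("com.example/about", "contents:html")
print(latest, as_of_2, missing)
```

Reading with a timestamp returns the newest version written at or before that time, which is how multiple versions of one cell can coexist.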

(22)
Key Value
• Each bit of data is stored in a
single collection
• Each collection can have different
types of data
• Values are opaque to the database
• To find out what the value is we
need to look it up using the key
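The key-value model can be sketched in a few lines. The KeyValueStore class and the sample keys below are hypothetical; the point is that the store only maps keys to bytes and never looks inside them.

```python
import json

class KeyValueStore:
    """Toy key-value store: values are opaque bytes the store never inspects."""
    def __init__(self):
        self._items = {}

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value

    def get(self, key: str):
        return self._items.get(key)

store = KeyValueStore()
store.put("user:1", json.dumps({"name": "Ada", "city": "London"}).encode("utf-8"))
store.put("logo:1", b"\x89PNG")  # a different type of data under another key

raw = store.get("user:1")  # lookup is by key only
user = json.loads(raw)     # the application, not the store, decodes the value
print(user["city"])
```

There is no way to ask this store for "all users in London" without fetching and decoding every value in application code.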

(23)
Document Store
• Very similar to a key value
database
• Each collection can have different
types of data
• The difference is that the database can see
inside the values, so they can be queried
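A minimal sketch of what visible values mean in practice: documents are stored as structured data, so they can be found by field rather than only by key. The DocumentStore class and its find() method are hypothetical and do not mirror any specific product's API.

```python
class DocumentStore:
    """Toy document store: documents are dicts whose fields can be queried."""
    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def find(self, **criteria):
        # Return the ids of documents whose fields match every criterion.
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        )

docs = DocumentStore()
docs.insert("u1", {"name": "Ada", "city": "London"})
docs.insert("u2", {"name": "Linus", "city": "Helsinki", "languages": ["C"]})
docs.insert("u3", {"name": "Alan", "city": "London"})

londoners = docs.find(city="London")
print(londoners)
```

Documents in the same collection do not need the same fields, as the extra languages field on one document shows.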

(24)
Graph Database
• Focus is modelling the structure
of data
• Inspired by graph theory
• Scales well to the structure of data
• The use cases are mainly related
to the structure of the database
• Machine learning, Mapping, Supply
Chain Transparency
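To make the graph idea concrete, here is a small sketch that stores relationships as an adjacency list and follows them with a breadth-first search. The supply-chain nodes are invented for illustration, and a real graph database would provide its own query language for this kind of traversal.

```python
from collections import deque

# Hypothetical supply chain: each key supplies the nodes listed after it.
supplies = {
    "mine": ["smelter"],
    "smelter": ["factory"],
    "factory": ["warehouse"],
    "warehouse": ["store_a", "store_b"],
    "store_a": [],
    "store_b": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search returning the shortest chain of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = shortest_path(supplies, "mine", "store_b")
print(path)
```

The query follows relationships directly instead of joining tables, which is the kind of workload the graph model is designed around.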

(25)
NoSQL Data Modeling
SECTION 04

(26)
What are the database
technologies we use in our system?
Infor Nexus
https://wiki.gtnexus.info/display/dev/Core+Data+Systems

(27)
Data Modeling
Why do we need NoSQL data modeling?
• Understand the data
• Plan the database structure
• Understand application specific
queries
• Document and communicate
design and content

(28)
Data Modeling Differences
Relational vs NoSQL data modeling
Relational
Fixed set of columns
Atomic fields
Highly normalized
Slow to change
Avoid duplication of data
NoSQL
Unstructured/Semi-Structured data
Aggregations of data
Highly denormalized
Rapidly changing
Duplication of data is supported

(29)
Denormalization
Replication of data…
• Copying of the same data into multiple
documents or tables
• Simplify the query
• Optimize query processing
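A minimal sketch of denormalization using plain Python dictionaries in place of a real database. The customers and orders data is hypothetical; it shows the customer name being copied into each order document so that a read needs only one lookup.

```python
# Normalized: the customer name lives in one place and orders reference it by id.
customers = {"c1": {"name": "Acme Corp"}}
orders_normalized = [
    {"order_id": "o1", "customer_id": "c1", "total": 120},
    {"order_id": "o2", "customer_id": "c1", "total": 80},
]

def denormalize(orders, customers):
    """Copy the customer name into every order document."""
    return [
        {**order, "customer_name": customers[order["customer_id"]]["name"]}
        for order in orders
    ]

orders_denormalized = denormalize(orders_normalized, customers)

# The query "show order o2 with its customer name" now needs a single lookup.
order = next(o for o in orders_denormalized if o["order_id"] == "o2")
print(order["customer_name"], order["total"])
```

The trade-off is on the write side: if the customer is renamed, every copy of the name has to be updated.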

(30)
Application side JOINs
Joins are not encouraged…
• Joins are rarely supported in NoSQL
solutions.
• Many-to-many relationships are often
modeled as links and joined in the application
• We can use aggregations where
possible
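An application-side join can be sketched as follows, assuming two collections that are fetched separately from a store without join support. The students and courses data and the helper names are hypothetical.

```python
# Two hypothetical collections fetched separately from a store without joins.
students = {
    "s1": {"name": "Ada", "course_ids": ["c1", "c2"]},
    "s2": {"name": "Alan", "course_ids": ["c2"]},
}
courses = {
    "c1": {"title": "Databases"},
    "c2": {"title": "Algorithms"},
}

def courses_for(student_id):
    """Join in application code: one lookup for the student, one per course id."""
    student = students[student_id]
    return [courses[cid]["title"] for cid in student["course_ids"]]

def students_in(course_id):
    """The reverse direction of the many-to-many link, also resolved client side."""
    return sorted(
        s["name"] for s in students.values() if course_id in s["course_ids"]
    )

print(courses_for("s1"))
print(students_in("c2"))
```

Each direction of the many-to-many relationship costs extra lookups in the application, which is why aggregating related data into one document is preferred where possible.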

(31)
Elasticsearch Demonstration
SECTION 05

(32)
Elasticsearch
What is Elasticsearch?
Elasticsearch is a search
engine based on the Lucene
library
• It was developed in Java
• Multitenant-capable
• Full-text search engine
• Works over an HTTP interface
• Works with JSON documents
• Official clients are available
in Java, .NET (C#), PHP, Python,
Apache Groovy and Ruby
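To show what the HTTP and JSON interface looks like, here is a minimal sketch that builds a full-text search request without sending it. The `_search` endpoint and the `match` query are standard Elasticsearch features; the host, index name, field name and helper function are assumptions chosen only for illustration.

```python
import json
from urllib.parse import quote

def build_search_request(base_url, index, field, text):
    """Return the URL and JSON body for a full-text match query (nothing is sent)."""
    url = f"{base_url.rstrip('/')}/{quote(index)}/_search"
    body = {"query": {"match": {field: text}}}
    return url, json.dumps(body)

# Hypothetical local node, index name and field.
url, body = build_search_request("http://localhost:9200", "articles", "title", "big data")
print("POST", url)
print(body)
```

Sending this body to the URL with any HTTP client, or through one of the official clients, would run the search against a live cluster.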

Thank you
