1. Big Data: Its Characteristics And
Architecture Capabilities
By
Ashraf Uddin
South Asian University
(http://ashrafsau.blogspot.in/)
2. What is Big Data?
Big data refers to large datasets that are challenging to store, search, share, visualize, and analyze.
“Big Data” is data whose scale, diversity,
and complexity require new architecture,
techniques, algorithms, and analytics to
manage it and extract value and hidden
knowledge from it…
3. The Model of Generating/Consuming
Data has Changed
Old Model: a few companies generate data; everyone else consumes it.
New Model: all of us generate data, and all of us consume it.
4. Do we really need Big Data?
For consumers:
Better understanding of their own behavior
Integration of activities
Influence: involvement and recognition
For companies:
Real behavior: what do people do, and what do they value?
Faster interaction
Better-targeted offers
Customer understanding
7. Velocity
• Data is generated fast and needs to be processed fast
• Online data analytics
• A late decision means a missed opportunity
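Processing data while it is still arriving, rather than after a full batch lands, can be sketched with a rolling window (a minimal Python sketch; the window size and readings are illustrative):

```python
from collections import deque

def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each one arrives,
    so a decision can be made while data is still flowing in."""
    buf = deque(maxlen=window)
    for x in stream:
        buf.append(x)
        yield sum(buf) / len(buf)

readings = [10, 20, 30, 40]          # hypothetical sensor stream
means = list(rolling_mean(readings, window=2))
```

Each value of `means` is available as soon as its reading arrives, which is the point of online analytics: no waiting for the batch to finish.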
8. Variety
• Various formats, types, and
structures
• Text, numerical, images,
audio, video, sequences, time
series, social media data,
multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many
types of data
• To extract knowledge, all these types of data need to be linked together
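Linking heterogeneous data types usually comes down to joining them on a shared key. A toy Python sketch (the records and field names are invented for illustration):

```python
# Two sources of different types, keyed by the same user id (hypothetical data).
text_posts = {"u1": "loving the new phone", "u2": "battery dies fast"}
sensor_hours = {"u1": 3.5, "u2": 7.2}      # numeric sensor readings

# Link the two sources on the shared key to form one combined view.
combined = {
    uid: {"post": text_posts[uid], "screen_hours": sensor_hours[uid]}
    for uid in text_posts.keys() & sensor_hours.keys()
}
```

Real systems do the same join at scale (and across far messier formats), but the principle is identical: without a common key, the data types cannot be linked.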
9. Generation of Big Data
Scientific instruments
(collecting all sorts of data)
Social media and networks
(all of us are generating data)
Sensor technology and
networks
(measuring all kinds of data)
10. Why is Big Data Different?
For example, an airline jet collects 10 terabytes of
sensor data for every 30 minutes of flying time.
Compare that with conventional high-performance computing: the New York Stock Exchange collects 1 terabyte of structured trading data per day.
Conventional corporate structured data is sized in terabytes and petabytes.
Big Data is sized in peta-, exa-, and soon perhaps zettabytes!
11. Why is Big Data Different?
A unique characteristic of Big Data is the manner in which value is discovered.
In conventional BI, simply summing a known value reveals a result.
In Big Data, value is discovered through a refining modeling process:
make a hypothesis
create statistical, visual, or semantic models
validate, then make a new hypothesis.
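The hypothesize–model–validate loop can be sketched in a few lines of Python (a toy sketch: each candidate threshold plays the role of a hypothesis, accuracy is the validation step, and the data and thresholds are invented for illustration):

```python
def refine(data, labels, thresholds):
    """Try a series of hypotheses (threshold values), validate each
    against the data, and keep the best-performing one."""
    best = None
    for t in thresholds:                      # each threshold is a hypothesis
        preds = [x > t for x in data]         # a trivial "model"
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if best is None or acc > best[1]:     # validate, keep if better
            best = (t, acc)
    return best

data = [1, 2, 3, 10, 12, 15]
labels = [False, False, False, True, True, True]
t, acc = refine(data, labels, thresholds=[0, 5, 20])
```

The loop structure, not the trivial model, is the point: Big Data value emerges from repeated rounds of hypothesis, modeling, and validation rather than from a single known aggregation.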
13. A Big Data Use Case:
Personalized Insurance Premium
An insurance company wants to offer better rates to those who are unlikely to make a claim, thereby optimizing its profits.
One way to approach this problem is to collect more detailed data about an individual's driving habits and then assess their risk.
The company can collect data on driving habits by using sensors in its customers' cars to capture driving data, such as routes driven, miles driven, time of day, and braking abruptness.
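A feature like braking abruptness can be derived from the raw sensor stream. A minimal sketch (the speed series and the drop threshold are illustrative assumptions, not the insurer's actual method):

```python
def abrupt_brakes(speeds, drop=15):
    """Count samples where speed fell by more than `drop` units between
    consecutive readings -- a crude abrupt-braking feature.
    The threshold of 15 is an illustrative assumption."""
    return sum(
        1 for prev, cur in zip(speeds, speeds[1:]) if prev - cur > drop
    )

trip = [60, 58, 40, 42, 20, 18]   # hypothetical speeds, one sample per second
score = abrupt_brakes(trip)
```

Features like this one, computed across millions of trips, are what feed the risk assessment described on the next slide.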
14. A Big Data Use Case:
Personalized Insurance Premium
This data is used to assess driver risk: individual driving patterns are compared with other statistical information, such as the average miles driven in the same state and the peak hours of drivers on the road.
Driver risk plus actuarial information is then correlated with policy and profile information to offer a rate that is competitive yet more profitable for the company.
The result
A personalized insurance plan.
These unique capabilities, delivered by big data analytics, are revolutionizing the insurance industry.
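The correlation of driver risk with actuarial information can be caricatured as scaling a base rate by relative risk (the formula, names, and numbers here are purely illustrative, not the company's actual pricing model):

```python
def premium(base_rate, risk_score, avg_risk):
    """Scale a base actuarial rate by how the driver's risk compares
    with the population average. Illustrative formula only."""
    return round(base_rate * (risk_score / avg_risk), 2)

# A driver assessed at 80% of average risk pays below the base rate.
rate = premium(base_rate=1000, risk_score=0.8, avg_risk=1.0)
```

The personalized plan is exactly this idea carried out with far richer inputs: sensor-derived risk on one side, actuarial and policy data on the other.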
15. A Big Data Use Case:
Personalized Insurance Premium
To accomplish this task:
A great amount of continuous data must be collected, stored, and correlated.
Hadoop is an excellent choice for the acquisition and reduction of the automobile sensor data.
Master data and certain reference data, including customer profile information, are likely to be stored in existing DBMS systems.
A NoSQL database can be used to capture and store reference data that are more dynamic, diverse in format, and change frequently.
17. Big Data Architecture Capabilities
Storage and Management Capability
Database Capability
Processing Capability
Data Integration Capability
Statistical Analysis Capability
18. Storage and Management Capability
Hadoop Distributed File System (HDFS)
Highly scalable storage and automatic data replication across three nodes for fault tolerance
Cloudera Manager
Gives a cluster-wide, real-time view of nodes and services running; provides a single, central place to enact configuration changes across the cluster
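The triple replication that HDFS performs can be caricatured as placing each block on several nodes (a toy round-robin sketch; real HDFS placement is rack-aware and far more sophisticated):

```python
import itertools

def place_blocks(blocks, nodes, replicas=3):
    """Assign each block to `replicas` nodes, round-robin over the node
    list (placements are distinct as long as there are at least
    `replicas` nodes). Loosely mimics HDFS triple replication."""
    ring = itertools.cycle(range(len(nodes)))
    placement = {}
    for b in blocks:
        placement[b] = [nodes[next(ring)] for _ in range(replicas)]
    return placement

p = place_blocks(["blk1", "blk2"], ["n1", "n2", "n3", "n4"])
```

Because every block lives on three nodes, the loss of any single node leaves two copies available, which is the fault-tolerance property the slide refers to.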
20. Database Capability
Oracle NoSQL
Dynamic and flexible schema design
High-performance key-value pair database
Apache HBase
Strictly consistent reads and writes
Allows random, real time read/write access
Apache Cassandra
Fault tolerance capability is designed for every node
Data model offers column indexes with the
performance of log-structured updates, materialized
views, and built-in caching
Apache Hive
Tools to enable easy data extract/transform/load (ETL)
Query execution via MapReduce
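The "dynamic and flexible schema" that key-value stores offer means each record can carry different fields. A minimal in-memory sketch (a stand-in for a real store such as Oracle NoSQL or HBase, not their actual APIs):

```python
class KVStore:
    """Minimal in-memory key-value store sketch: values need no fixed
    schema, so records under different keys can have different shapes."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KVStore()
store.put("cust:1", {"name": "Ada", "segment": "premium"})
store.put("cust:2", {"name": "Bob"})      # a different shape is fine
```

Contrast this with a relational table, where adding the `segment` field would require a schema change before any row could carry it.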
22. Processing Capability
MapReduce
Breaks a problem up into smaller sub-problems
Able to distribute data workloads across
thousands of nodes
Apache Hadoop
Leading MapReduce implementation
Highly scalable parallel batch processing
Writes multiple copies across cluster for
fault tolerance
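The MapReduce pattern itself fits in a few lines of Python: mappers turn each sub-problem into key-value pairs, and the reducer aggregates them by key (the classic word-count example; this sketch runs in one process, whereas Hadoop distributes the same two phases across thousands of nodes):

```python
from collections import defaultdict

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in one document.
    return [(w, 1) for w in doc.split()]

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data is big", "data moves fast"]
pairs = [p for d in docs for p in map_phase(d)]   # mappers run independently
totals = reduce_phase(pairs)
```

Because each `map_phase` call touches only its own document, the map work parallelizes trivially; only the reduce step needs to see pairs grouped by key.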
24. Data Integration Capability
Exports MapReduce results to RDBMS, Hadoop, and other targets
Connects Hadoop to relational databases for SQL processing
Optimized processing with parallel data import/export
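Exporting MapReduce results into an RDBMS for SQL processing can be sketched with the standard-library `sqlite3` module (the word-count results are hypothetical, and real integration tools move the data in parallel rather than in one process):

```python
import sqlite3

# Results as a MapReduce job might emit them (hypothetical data).
results = [("big", 2), ("data", 2), ("fast", 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_counts (word TEXT PRIMARY KEY, n INTEGER)")
conn.executemany("INSERT INTO word_counts VALUES (?, ?)", results)

# Once the results land in the RDBMS, ordinary SQL applies.
top = conn.execute(
    "SELECT word FROM word_counts ORDER BY n DESC, word LIMIT 1"
).fetchone()[0]
```

The value of the integration layer is exactly this hand-off: batch-computed results become queryable with the SQL tooling the business already has.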
28. Conclusion
Today’s economic environment demands that business be driven by useful, accurate, and timely information.
The world of Big Data offers a solution to this problem.
There are always business and IT trade-offs in getting to data and information in the most cost-effective way.