What is Analytics?
Simply put:
The scientific process of transforming data into insight for making better decisions.
~INFORMS
Analytics leverage data in a particular functional process (or application) to enable context-specific insight that is actionable.
~Gartner
BIG DATA
“Big data is the term increasingly used to describe the process of
applying serious computing power—the latest in machine
learning and artificial intelligence—to seriously massive and often
highly complex sets of information.”
Big data opportunities emerge in organizations generating a
median of 300 terabytes of data a week. The most common forms
of data analysed in this way are business transactions stored in
relational databases, followed by documents, e-mail, sensor data,
blogs, and social media.
Every day, we create 2.5 quintillion bytes of data — so much
that 90% of the data in the world today has been created in the
last two years alone. This data comes from everywhere: sensors
used to gather climate information, posts to social media sites,
digital pictures and videos, purchase transaction records, and
cell phone GPS signals, to name a few.
This data is “big data.”
Types of Data
Structured: Fits easily into the rows and columns of a database.
Unstructured: Cannot be easily compiled into older database formats.
Semi-structured: Uses tags to capture elements of data.
Internal data: From a company’s sales, employee records, etc.
External data: From third-party providers, social media, etc.
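To make the distinction between structured, semi-structured, and unstructured data concrete, here is a minimal Python sketch. The sample values (order numbers, user names, the review text) are made up for illustration and do not come from any real data set.

```python
import csv
import io
import json

# Structured: fits rows and columns, so a CSV reader can parse it directly.
structured = "order_id,amount\n1001,25.50\n1002,40.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tags (here JSON keys) label each element, but records
# do not have to share the same fields.
semi_structured = '[{"user": "ann", "tags": ["sale"]}, {"user": "bob"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no predefined fields.
unstructured = "Loved the product, but delivery took two weeks."

total = sum(float(row["amount"]) for row in rows)   # column access by name
users = [record["user"] for record in records]      # tag access by key
word_count = len(unstructured.split())              # only crude measures apply
```

The structured and semi-structured samples can be queried by field name, while the unstructured text needs further processing before it yields fields at all.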
Traditional Data                    Big Data
Gigabytes to terabytes              Petabytes to exabytes
Centralised                         Distributed
Structured                          Semi-structured and unstructured
Stable data model                   Flat schemas
Known complex interrelationships    Few complex interrelationships
1,000,000,000,000 Gigabytes
= 1,000,000,000 Terabytes
= 1,000,000 Petabytes
= 1,000 Exabytes
= 1 Zettabyte
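The equivalences above follow from each decimal unit being 1,000 times the previous one. A small sketch of that arithmetic follows. The `convert` helper and the `UNITS` list are illustrative names, not part of any library.

```python
# Decimal storage units, each 1,000 times the previous one.
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte", "zettabyte"]


def convert(value, from_unit, to_unit):
    """Convert between decimal storage units using powers of 1,000."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


zettabyte_in_gigabytes = convert(1, "zettabyte", "gigabyte")
zettabyte_in_exabytes = convert(1, "zettabyte", "exabyte")
```

Converting towards a larger unit gives a negative exponent, so the result is a fraction rather than a whole number.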
Big Data Market Forecast (in $US billions)
Year      2012  2013  2014  2015  2016  2017
Forecast  5.1   10.2  16.8  32.1  48    53.4
Sources of Big Data
Source: Wikibon 2011
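One way to summarise the market forecast (5.1 to 53.4 billion US dollars from 2012 to 2017) is as a compound annual growth rate. The sketch below assumes the six chart values map to the six years in order. The growth-rate calculation is an added illustration, not a figure from the source.

```python
# Forecast values in $US billions, assumed to map to 2012..2017 in order.
forecast = {2012: 5.1, 2013: 10.2, 2014: 16.8, 2015: 32.1, 2016: 48.0, 2017: 53.4}


def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


first_year, last_year = min(forecast), max(forecast)
cagr = compound_annual_growth_rate(
    forecast[first_year], forecast[last_year], last_year - first_year
)
print(f"Implied growth rate {first_year}-{last_year}: {cagr:.1%} per year")
```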
Big data refers to enormity in five dimensions:
VOLUME
VARIETY
VELOCITY
VARIABILITY
COMPLEXITY
Analysis Type                  Description
Basic Analytics for Insight    Slicing and dicing of data, reporting, simple visualisations, basic monitoring
Advanced Analytics for Insight More complex data analysis such as predictive modelling and other pattern-matching techniques
Operationalised Analytics      Analytics becomes part of the business process
Monetised Analytics            Analytics used directly to drive revenue
BIG DATA ANALYTICS
Source: The Economist
Basic Analytics
Basic analytics can be used to explore your data if you’re not sure what you have but think something in it is of value.
Slicing and dicing: Breaking down data into smaller sets that are easier to explore.
Basic monitoring: Monitoring large volumes of data in real time.
Anomaly identification: Identifying events where the actual observation differs from what is expected.
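As a rough illustration of slicing and dicing and of anomaly identification, here is a minimal sketch using only the Python standard library. The sales records, the sensor readings, and the two-standard-deviation threshold are all assumptions chosen for the example. Real deployments would pick a rule suited to their data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical sales records: (region, product, amount).
sales = [
    ("north", "widgets", 120), ("north", "gadgets", 80),
    ("south", "widgets", 100), ("south", "gadgets", 95),
]


def slice_by(records, key_index):
    """Total the amounts for each value of one dimension (slicing and dicing)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {key: sum(values) for key, values in groups.items()}


def find_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]


by_region = slice_by(sales, 0)
by_product = slice_by(sales, 1)

# Hypothetical sensor readings with one observation far from the expected level.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = find_anomalies(readings)
```

Here `by_region` and `by_product` are two different slices of the same records, and `anomalies` holds the observations that differ sharply from what the rest of the readings suggest.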
Advanced Analytics
Advanced analytics can be deployed to find patterns in data and for prediction, forecasting, and complex event processing.
Advanced analytics provides algorithms for complex analysis of either structured or unstructured data.
It includes sophisticated statistical models, machine learning, neural networks, text analytics, and other advanced data-mining techniques.
Predictive analytics: Techniques that can be used on both structured and unstructured data (together or individually) to determine future outcomes.
Text analytics: The process of analyzing unstructured text, extracting relevant information, and transforming it into structured information.
Other methods: Advanced forecasting, optimization, cluster analysis for segmentation or even microsegmentation, and affinity analysis.
Data mining: Exploring and analysing large amounts of data to find patterns in that data.
Big Data Approaches
Hadoop: Hadoop is an open source framework for processing, storing, and analyzing massive amounts of distributed, unstructured data. Rather than banging away at one huge block of data with a single machine, Hadoop breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time.
NoSQL: NoSQL databases are aimed, for the most part (though there are some important exceptions), at serving up discrete data stored among large volumes of multi-structured data to end-user and automated Big Data applications.
Massively Parallel Analytic Databases: Unlike traditional data warehouses, massively parallel analytic databases are capable of quickly ingesting large amounts of mainly structured data with minimal data modeling required and can scale out to accommodate multiple terabytes and sometimes petabytes of data.
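The idea of breaking data into parts and processing each part at the same time can be illustrated with a word count in plain Python. This is only a sketch of the split, process, and combine pattern. It does not use Hadoop or its API, the function names are made up, and threads on one machine stand in for the separate nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(lines, parts):
    """Break the data set into `parts` roughly equal chunks."""
    return [lines[i::parts] for i in range(parts)]


def count_chunk(chunk):
    """Count words in one chunk, independently of the other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def word_count(lines, parts=3):
    """Process the chunks concurrently, then combine the partial counts."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(count_chunk, split(lines, parts)))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = ["big data is big", "data is distributed", "hadoop splits data"]
counts = word_count(lines)
```

Because each chunk is counted without reference to the others, the same approach scales by adding more workers, which is the property the Hadoop description relies on.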