BIG DATA: A 360° Overview 
Juvénal CHOKOGOUE M 
Consultant Business Analytics – Big Data 
BD-DE-0005 
11/23/2014
Module Overview 
• The Business Challenge
• What This Module Stands For
• Who Is This Module For?
• Before the Battle Begins
• Anyway! What Is Big Data?
• Big Data and Analytics: How Do These Two Fit Together?
• Analytical Techniques for Mining Big Data
• The New Infrastructure for Data Management: Hadoop
• Big Data Adoption: Now or Later?
• The Next Steps
• What Should I Remember?
• Some Big Data Providers
• Bibliography & Resources
• About Me
The Business Challenge
• Scaling operations up and down as conditions change, and decreasing the "time to market" for decision-making, have become critical competitive differentiators in today's economy.
• Companies are gathering more and more data to stay competitive.
• To decrease their "time to market", they must make sense of the intersection of all the different kinds of data they have gathered.
• Technically, when you are dealing with so much data in so many different forms, it is impossible to think about data management in traditional ways.
• The challenges and opportunities associated with this new kind of data management problem are known today as "Big Data".
What This Module Stands For
As with any other emerging technological concept, software companies are always fighting over definitions in order to sell their products, leaving businesses confused about the concept and about where it fits among the issues they face. Big Data, like Cloud Computing, Virtualization, Data Mining and so on, is just one more such concept.
When writing this paper, my main objective was to provide a true 360° overview of Big Data, that is, a clear understanding of where the term "Big Data" comes from, why it is so popular now, what it really means and what its implications for businesses can be. Because Analytics is another term associated with Big Data, I have described some widely recognized and widely used analytical techniques to help you figure out how analytics, used in conjunction with Big Data, can boost business performance.
So, please don't put words in my mouth: this paper is not intended as a "how-to" for big data project management, for big data application development, or for statistical model building. Those will be the subject of other papers. Rather, I expect that by the end of this paper:
• you will smile the next time you read or hear the terms big data, Hadoop, or analytics :)
• you will understand what is going on behind the scenes when someone talks about "Big Data"
• you will know how one can "make sense" of Big Data using Analytics
• you will have a basic idea of the data mining techniques used in business and in Big Data
• you will be able to follow news and updates about Big Data
So, keep reading…
Before the battle begins 
The information provided here is for informational purposes only and represents my current point of view as of the date of this presentation. Due to changing market conditions, this information may be modified or become obsolete; it should not be interpreted as a commitment, and I cannot guarantee its accuracy after the date of this presentation.
The contents of the websites cited here may change, or the websites themselves may become unavailable, after the publication of this presentation. Therefore, I cannot make any warranties, express, implied or statutory, as to the information in this presentation.
In this presentation, I have chosen to call the person responsible for data management, analytics, and programming jobs the "Analyst". This is just a simplification I adopted so that you are not distracted by the new jobs and terms created by Big Data and can focus on the content of the paper.
Microsoft, SQL Server, Teradata, Oracle, Google, Hadoop, Cloudera, HortonWorks, SAS, EMC and other
names and products cited here are or may be registered Trademarks in the U.S. and/or in other countries.
Feel free to share this module with anyone you know, from your colleagues to your friends, but in that case, don't forget to mention the name of the author.
You may use and change the content of this module on your own, but in that case I will not be responsible for its content.
This module is not for sale. If you intend to use it for your own purposes, please don't commercialize it!
Anyway! What Is Big Data?
• According to Gartner: "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (http://www.gartner.com/it-glossary/big-data/)
Of all the definitions proposed for Big Data, Gartner's is the most widely adopted. From that definition, one thing is clear: the term Big Data designates data that is large in volume, has high velocity and comes in a wide variety of forms. This is often referred to as the "3 Vs", or the three dimensions, of Big Data.
Big Data and Analytics:
How Do These Two Fit Together?
Taken alone, Big Data is technology-driven. If businesses want to capitalize on the Big Data paradigm, they have to find a way to combine it with the traditional business analysis techniques they have used in the past to query and dig through their data.
But an extremely wide variety of data brings new challenges. Most traditional business analysis techniques are not suited to the new kinds of data sources we have today, and that is where Analytics comes into play!
Analytics designates the means by which businesses gain insight from data, whatever its source, its size and even its format.
All this said, you can now understand that Big Data Analytics designates the new means by which we extract insights from data that is extremely large, extremely varied and extremely fast-moving.
• However, be aware that the effectiveness of Analytics depends fundamentally on the question you want to answer and on the quality of the data. Data quality issues must be addressed before any analytics work. As the saying goes in the field: "Garbage in, garbage out".
• Analytics techniques must be handled with caution and require formal training in the field. You may want to consider investing in an analytics professional.
• Third, analytics is not a "silver bullet" that will always give you insights.
• Fourth, just because you have insights does not guarantee that you have the power to act on them. Analytics can provide insights, but turning insights from numbers into competitive advantage may require changes that your business can't afford, or simply doesn't want to make. The Harvard Business Review explores a case study in which a retailer learned through big data "that he could increase profits substantially by extending the time that items were on the floor before and after discounting. Implementing that change, however, would have required a complete redesign of the supply chain, which the retailer was reluctant to undertake." (source: https://hbr.org/2013/12/you-may-not-need-big-data-after-all/ar/1)
Analytics does not replace your business intuition. It just makes you feel more confident about your choice. In the end, you may still rely on your experience and your intuition as a manager to make the decision.
Analytical Techniques for Mining Big Data
In this part, I am going to talk only about some techniques in which I am certified. These techniques are used in most business scenarios and proved their worth long ago.
These techniques are: Regression (Linear and Logistic), Decision Trees, K-Means, Time Series, Neural Networks, Association Rules, Naive Bayes and Survival Analysis. In addition, I am going to present Text Analytics fundamentals since, in the Big Data age, we are generating more and more text data (tweets, Facebook comments, etc.).
- Regression
Regression focuses on the relationship between an outcome and its input variables. Here, we are predicting how changes in individual drivers affect the outcome. The outcome can be continuous or discrete. When it is discrete, we are predicting the probability that the outcome will occur (logistic regression). When it is continuous, we are predicting the value of the dependent variable given the independent variables (linear regression).
(Figure: a survey from TDWI)
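To make the two cases concrete, here is a minimal sketch in plain Python: ordinary least squares for a single continuous input, plus the logistic function that turns a linear score into a probability for a discrete outcome. The function names are hypothetical, and the logistic part only shows the link; fitting its coefficients (normally by maximum likelihood) is left out.

```python
import math
from typing import List, Tuple


def fit_simple_linear_regression(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares with one input; returns (intercept, slope)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, x_new: float) -> float:
    """Continuous outcome: predicted value of the dependent variable."""
    return intercept + slope * x_new


def logistic_probability(b0: float, b1: float, x: float) -> float:
    """Discrete outcome: maps the linear score b0 + b1*x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1, predict(b0, b1, 5))
```

With several drivers the same idea generalizes to multiple regression, where the slope becomes a vector of coefficients, one per input.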
- Decision Trees
Decision trees are a flexible method commonly deployed for classification and regression problems. They partition large amounts of data into smaller segments by applying a series of rules of the form "IF condition THEN expression" (e.g. IF age is less than 30 AND revenue is greater than 36000 THEN class = 'Rich'). Decision trees are visually represented as upside-down trees, with the root at the top and branches emanating from it. There are two types of trees: classification trees and regression trees.
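Each "IF condition" rule comes from choosing a split. As a minimal sketch, assuming a CART-style classification tree that uses Gini impurity (one common criterion among several), the function below finds the best threshold for one numeric input; a full tree would apply this recursively to each segment. The names and data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values: List[float], labels: List[str]) -> Tuple[Optional[float], float]:
    """Return the rule 'value <= t' minimising the weighted Gini impurity of both children."""
    best_t, best_score = None, gini(labels)  # no split at all is the baseline
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2  # midpoint between consecutive distinct values
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


ages = [22, 25, 28, 35, 40, 50]
bought = ["yes", "yes", "yes", "no", "no", "no"]
print(best_threshold(ages, bought))
```

Here the rule "IF age <= 31.5" separates the two classes perfectly, so the weighted impurity of the children drops to zero.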
- K-Means
K-means is a clustering method. It belongs to the category of exploratory data analysis methods called "unsupervised classification". The goal is to group data based on similarities in the input variables, with no target or specific outcome. It is a method of choice for segmentation and profiling.
(Figure: a survey from TDWI)
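A minimal sketch of k-means (Lloyd's algorithm) on two-dimensional points, assuming Euclidean distance and initial centroids drawn at random from the data. Production implementations add smarter initialization and multiple restarts; the function name and sample points are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Alternate an assignment step and an update step until nothing changes."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k input points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (no target variable used).
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_assignment == assignment and new_centroids == centroids:
            break  # converged
        assignment, centroids = new_assignment, new_centroids
    return centroids, assignment


customers = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(customers, 2))
```

Because the result depends on the starting centroids, it is common practice to run the algorithm several times with different seeds and keep the tightest clustering.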
- Time Series
Time series analysis provides a scientific methodology for forecasting. It is the analysis of a phenomenon that evolves over time. The main objectives of time series analysis are:
• to understand the underlying structure of the time series by breaking it into trend, seasonality, and noise;
• to fit a mathematical model that forecasts the future.
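Both objectives can be sketched together, assuming a simple additive model (series = trend + seasonality + noise), an odd season length, and a linear extrapolation of the trend. This is one classical approach among many (exponential smoothing and ARIMA are common alternatives); the function name and data are hypothetical.

```python
from typing import List, Optional


def decompose_and_forecast(series: List[float], period: int, horizon: int):
    """Additive decomposition series = trend + seasonal + noise, plus a simple forecast.

    Sketch assumptions: `period` is odd (3, 5, 7, ...) and the series holds at
    least 2 * period - 1 observations, so every season position gets a value.
    """
    if period < 3 or period % 2 == 0 or len(series) < 2 * period - 1:
        raise ValueError("expects an odd period >= 3 and len(series) >= 2*period - 1")
    n, half = len(series), period // 2

    # 1. Trend: centred moving average over exactly one season.
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period

    # 2. Seasonality: mean detrended value at each position of the cycle, centred on zero.
    buckets: List[List[float]] = [[] for _ in range(period)]
    for t, tr in enumerate(trend):
        if tr is not None:
            buckets[t % period].append(series[t] - tr)
    raw = [sum(b) / len(b) for b in buckets]
    seasonal = [s - sum(raw) / period for s in raw]

    # 3. Noise: what trend and seasonality leave unexplained (undefined at the edges).
    noise = [series[t] - tr - seasonal[t % period] if tr is not None else None
             for t, tr in enumerate(trend)]

    # 4. Forecast: extend a least-squares line through the trend, then add seasonality back.
    pts = [(t, tr) for t, tr in enumerate(trend) if tr is not None]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))
    intercept = mean_v - slope * mean_t
    forecast = [intercept + slope * t + seasonal[t % period] for t in range(n, n + horizon)]
    return trend, seasonal, noise, forecast


sales = [10 + 0.5 * t + [1.0, -1.0, 0.0][t % 3] for t in range(12)]
print(decompose_and_forecast(sales, period=3, horizon=3)[3])
```

On this synthetic series the decomposition recovers the seasonal pattern exactly; on real data, the size of the noise component is a first indication of how far to trust the forecast.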
- Neural Networks
Artificial neural networks are a class of flexible non-linear models used for prediction problems. Their power comes from the fact that they can approximate virtually any continuous relationship between the inputs and the target, whatever the nature of that relationship. There are many kinds of neural networks, but the most widely used is the Multi-Layer Perceptron (MLP).
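To show why a hidden layer gives an MLP its non-linear power, here is a minimal forward pass for a 2-2-1 perceptron with sigmoid activations that reproduces XOR, a relationship no single linear model can capture. The weights are hand-picked for illustration, not learned; in practice they would be estimated by training (typically backpropagation).

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer: sigmoid(w . x + b) for each neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical hand-picked weights: hidden neuron 1 acts like OR, hidden neuron 2 like NAND,
# and the output neuron like AND of the two, which together yield XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def mlp_forward(x1: float, x2: float) -> float:
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(mlp_forward(a, b), 3))
```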
- Association Rules
Also known as association rule discovery, Market Basket Analysis or affinity analysis, association rule mining is a popular data mining method for exploring associations between items. It is an unsupervised method, typically applied to transaction data stored in databases.
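The three usual measures of a rule A -> B are support (how often A and B appear together), confidence (how often B appears when A does) and lift (confidence relative to how common B is anyway). A minimal brute-force sketch, assuming rules between single items only; algorithms such as Apriori prune the search to scale to larger itemsets. Names and baskets are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float, float]]:
    """Rules lhs -> rhs between single items as (lhs, rhs, support, confidence, lift)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset({a, b}), transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset({lhs}), transactions)
            lift = confidence / support(frozenset({rhs}), transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence, lift))
    return rules


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.8))
```

A lift above 1 means the two items appear together more often than they would if purchases were independent.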
- Naive Bayes
Naive Bayes is a classifier, that is, it is used to classify objects or assign labels to them, by applying Bayes' theorem with strong (naive) independence assumptions between the inputs. Naive Bayes is particularly well suited to problems with categorical inputs that have many levels.
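A minimal sketch of Naive Bayes for categorical inputs, assuming Laplace (add-one) smoothing so that unseen levels do not zero out a class. The score of each class is log P(class) plus the sum of log P(input | class), where the sum is exactly the "naive" independence assumption. The class name and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes for categorical inputs with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, rows: List[Sequence[str]], labels: List[str]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        self.n_rows = len(labels)
        n_features = len(rows[0])
        # counts[j][label][value] = number of training rows of `label` with feature j == value
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        self.levels = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[j][label][value] += 1
                self.levels[j].add(value)
        return self

    def predict(self, row: Sequence[str]) -> str:
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_rows)  # log prior
            for j, value in enumerate(row):
                # Independence assumption: add one smoothed log-likelihood per input.
                num = self.counts[j][label][value] + self.alpha
                den = n_c + self.alpha * len(self.levels[j])
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("sunny", "hot")), model.predict(("rainy", "cool")))
```

Working with log-probabilities avoids numerical underflow when there are many inputs, which matters for the many-level categorical problems mentioned above.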
- Survival Analysis
Survival analysis is a class of statistical methods for studying the occurrence and timing of events. It is suitable for problems where you want to know WHEN a specific event will happen. The most common approaches to building a survival model are: life tables, Kaplan-Meier estimators, exponential regression, proportional hazards regression, competing risk models and discrete-time methods.
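Of the approaches listed, the Kaplan-Meier estimator is the simplest to sketch. At each time t where an event is observed, the survival probability is multiplied by (1 - events at t / subjects still at risk at t); censored subjects (for example, customers who have not churned yet when the data is extracted) count as at risk until they leave the observation. The function name and data below are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, S(time)) at each distinct time where at least one event was observed.

    observed[i] is True if subject i experienced the event, False if it was censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({d for d, e in zip(durations, observed) if e}):
        at_risk = sum(1 for d in durations if d >= t)  # still under observation just before t
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk  # product-limit step
        curve.append((t, survival))
    return curve


months_as_customer = [1, 2, 2, 3, 4]
churned = [True, True, False, True, False]
print(kaplan_meier(months_as_customer, churned))
```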
- Text Analytics Fundamentals
Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information that can then be leveraged in various ways. The analysis and extraction processes take advantage of techniques that originated in computational linguistics (natural language processing), statistics, and other computer science disciplines.
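The most basic step from unstructured to structured text is the "bag of words": tokenize each document, drop very common words, and count the remaining terms. A minimal sketch, assuming a simple regular-expression tokenizer and a tiny hypothetical stop-word list; real projects add stemming or lemmatization, richer tokenizers and curated stop-word lists.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of", "i", "it", "my", "was"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text, keep runs of letters, digits and apostrophes, drop stop words."""
    return [tok for tok in re.findall(r"[a-z0-9']+", text.lower()) if tok not in STOP_WORDS]


def term_frequencies(documents: List[str]) -> List[Dict[str, int]]:
    """Turn each unstructured document into a structured term -> count record."""
    return [dict(Counter(tokenize(doc))) for doc in documents]


comments = ["The delivery was late and the box was damaged", "Great service, great price!"]
print(term_frequencies(comments))
```

The resulting term-count records can then feed the techniques above, for example Naive Bayes for sentiment classification or K-means for grouping similar comments.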
The New Infrastructure for Data Management: Hadoop
6.1 The New Data Management Strategy
• Centralized data processing is no longer efficient today!
• To deal with Big Data, the idea is to distribute the storage of the data and parallelize its processing across a cluster of computers: the cluster computing infrastructure.
• In cluster computing:
- data files are stored redundantly;
- computations are divided into tasks that run in parallel.
• The redundancy of the data across multiple hard disks is supported by a new kind of file system called a "Distributed File System" (DFS), and the parallelism of the processing is achieved through a new programming model called "MapReduce".
• The most popular (and most mature) implementation of MapReduce is called "Hadoop". Hadoop comes with HDFS (the Hadoop Distributed File System).
• Yes, you got it! You can use an implementation of MapReduce to manage many large-scale data computations in a way that is tolerant of hardware faults.
(Figure: A cluster computing environment)
(Figure: MapReduce job description)
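The three phases of a MapReduce job can be illustrated with the classic word count, simulated here in a single Python process. This is only a sketch of the programming model, not Hadoop's API: on a real cluster, many map tasks run in parallel on HDFS blocks, the framework performs the shuffle across the network, and failed tasks are re-executed on other nodes. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word of one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every intermediate value by key (the framework's job in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all the values that share one key."""
    return key, sum(values)


def run_word_count(lines: List[str]) -> Dict[str, int]:
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(run_word_count(["big data is big", "data is data"]))
```

Because each mapper only sees its own records and each reducer only sees one key's values, both phases can be spread across machines without the programmer writing any coordination code.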
• Hadoop is a platform that implements MapReduce and provides a redundant, reliable, distributed file system optimized for large files.
• At its core, Hadoop is a set of Java classes for HDFS types and MapReduce job management (MapReduce jobs can also be written in other programming languages, such as Python, C# or C++, for example through Hadoop Streaming).
• These classes allow the analyst to write functions that extract insight from data without having to worry about how the code is distributed and parallelized across the cluster.
• To get the most out of a Hadoop cluster, a set of technologies and tools has been developed. Together, these tools form what is commonly called the Hadoop Ecosystem.
• The most foundational tools of the Hadoop Ecosystem are: Pig, Hive, HBase, Sqoop, ZooKeeper and Mahout.
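Hadoop Streaming lets the map and reduce steps be any program that reads lines on standard input and writes tab-separated key/value lines on standard output; the framework sorts the mapper output by key before it reaches the reducer. As a sketch, assuming the tab-delimited "year, temperature, quality" layout of the max-temperature example used for Pig and Hive in this module (with 9999 assumed to mark a missing reading), a Python mapper and reducer might look like this. The script name and command-line arguments are hypothetical, and the job submission command is not shown.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: read 'year<TAB>temperature<TAB>quality' lines, emit 'year<TAB>temperature'."""
    for line in lines:
        year, temperature, quality = line.rstrip("\n").split("\t")
        # Keep valid readings only: 9999 assumed to mean "missing"; quality codes 0 or 4 accepted.
        if int(temperature) != 9999 and int(quality) in (0, 4):
            yield f"{year}\t{temperature}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: input is sorted by key, so all lines for one year are consecutive."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(temp) for _, temp in group)}"


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # Hypothetical invocation: `python max_temp.py map` or `python max_temp.py reduce`.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, the whole job can be checked with a sort between the two steps, which is what the framework does at scale.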
6.2 The Hadoop Ecosystem
- Pig
Pig is an interactive, script-based data flow language and execution environment for Hadoop. Its data flow language, called Pig Latin, lets you express a series of operations to apply to input data in order to produce output.
- Hive
Hive provides an SQL-based language for interactive and batch queries, which are compiled into MapReduce jobs. It gives users who know SQL a simple SQL-like implementation called HiveQL.
- HBase
HBase is a distributed, column-oriented database that uses HDFS as its persistence store and supports both MapReduce and point queries. Because it is layered on Hadoop clusters of commodity hardware, it can host very large tables (billions of rows by millions of columns).
Example of a Pig script: finding the maximum temperature by year
-- PigStorage (the default loader) splits each line on tabs
records = LOAD 'data/sample.txt' AS (year:chararray, temperature:int, quality:int);
-- Keep only valid readings
filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 4);
-- One group per year, then the maximum temperature of each group
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
DUMP max_temp;
The same example written in HiveQL
-- Tab-delimited table matching the layout of the input file
CREATE TABLE records (year STRING, temperature INT, quality INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA LOCAL INPATH 'data/sample.txt' OVERWRITE INTO TABLE records;
-- Same filter as the Pig script, then the maximum temperature per year
SELECT year, MAX(temperature) FROM records
WHERE temperature != 9999 AND (quality = 0 OR quality = 4)
GROUP BY year;
- Sqoop
Sqoop (SQL-to-Hadoop) efficiently transfers data from Hadoop HDFS to structured relational databases and vice versa. Think of Sqoop as the ETL (Extract, Transform, Load) tool of a Hadoop environment.
- ZooKeeper
ZooKeeper provides a distributed configuration service, a synchronization service and a naming registry for distributed applications. ZooKeeper is Hadoop's way of coordinating all the elements of these distributed applications.
- Mahout
Mahout is a scalable machine learning and data mining library for Hadoop. Think of Mahout as the analytics software of a Hadoop environment. Mahout provides data mining and machine learning algorithms, packaged as Java libraries, to perform four types of analysis in a Hadoop environment: recommendation mining, classification, clustering and association rules.
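To give an idea of what "recommendation mining" computes, here is a minimal item-based collaborative filtering sketch in plain Python, assuming cosine similarity between item rating vectors and a similarity-weighted average of the user's own ratings. It is not Mahout's API and runs on a single machine; Mahout's value is running this kind of computation at Hadoop scale. All names and ratings are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Pivot user -> item ratings into item -> {user: rating} vectors."""
    items: Dict[str, Dict[str, float]] = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (missing users count as 0)."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> List[Tuple[str, float]]:
    """Score each item the user has not rated by similarity-weighted ratings of rated items."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        sims = [(cosine(vectors[candidate], vectors[i]), r) for i, r in seen.items()]
        total = sum(abs(s) for s, _ in sims)
        if total > 0:
            scores[candidate] = sum(s * r for s, r in sims) / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


ratings = {"alice": {"A": 5, "B": 4},
           "bob": {"A": 4, "B": 5, "C": 4},
           "carol": {"A": 1, "C": 2, "D": 5}}
print(recommend(ratings, "alice"))
```

With so little data, an item that shares only one rater with the user's items can still receive a high score; real recommenders usually require a minimum overlap or shrink low-support similarities.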
BIG DATA ADOPTION:
NOW OR LATER?
The answer to this question lies in integrating and operationalizing analytics as an integral part of the organization's business processes. This presupposes that the organization is data-driven. The Big Data approach is best suited to business problems that meet one or more of the following criteria:
1. Data throttling
2. Computation-restricted throttling
3. Large data volumes
4. Significant data variety
5. Benefits from data parallelization
What Should I Remember?
• Even though we have always had a lot of data, the difference today is that significantly more of it exists, and it varies in type and timeliness. To cope with this, you have to think about managing data differently. That is where "Big Data" comes in.
• Big Data is the name given to the data management challenges and opportunities that emerge when dealing with data that is extremely large in volume, has extremely high velocity and is extremely wide in variety.
• Big Data without Analytics is just data.
• Just because you have insights doesn't guarantee that you have the power to act on them.
• Not every problem is suitable for Big Data.
• MapReduce is a programming model that makes it possible to manage large-scale data computations in a way that is tolerant of hardware faults.
• Hadoop is a platform that implements MapReduce and provides a redundant, reliable and distributed file system optimized for large files.
Some Big Data Providers
Here are some Big Data providers I know personally; there are others.
- Cloudera, with the first commercial distribution of Hadoop
- Hortonworks, with its commercial distribution of Hadoop
- SAS Institute, with its SAS on Hadoop platform, SAS High Performance Suite, SAS Grid Computing and SAS Visual Analytics
- HP, with its HP Vertica platform
- EMC, with its Pivotal Greenplum platform
Bibliography & Resources
- http://www.cisjournal.org/archive/vol2no4/vol2no4_1.pdf
- Hybrid Recommender System Using Naive Bayes Classifier and Collaborative Filtering: http://eprints.ecs.soton.ac.uk/18483/
- Online applications: http://www.convo.co.uk/x02/
- Apache Mahout: http://mahout.apache.org/
- EMC Data Science & Big Data Analytics Training Module: https://education.emc.com/guest/campaign/data_science.aspx
- SAS Official Predictive Modeling Training Course: https://support.sas.com/edu/schedules.html?id=1366&ctry=us and https://support.sas.com/edu/schedules.html?id=1220&ctry=US
- Judith Hurwitz, Alan Nugent, Dr. Fern Halper, Marcia Kaufman, Big Data for Dummies, Wiley (www.wiley.com), ISBN 978-1-118-50422-2
- Gartner: http://www.gartner.com/it-glossary/big-data/
- The Harvard Business Review: https://hbr.org/2013/12/you-may-not-need-big-data-after-all/ar/1
- MapReduce: Simplified Data Processing on Large Clusters (Google): http://static.googleusercontent.com/media/research.google.com/fr//archive/mapreduce-osdi04.pdf
- Apache Hadoop (Apache Software Foundation): http://hadoop.apache.org/
- TDWI: http://tdwi.org/
About Me
• I am a freelance consultant who helps organisations leverage their data to improve their performance through the right tools, the right methodology and the right technology. I have over 3 years of experience and 5 certifications. I am a highly certified SAS professional and also a certified EMC² Data Scientist.
Contact 
Mail : jvc35@yahoo.fr 
Twitter : @Juvenal_JVC 
Linkedin : http://fr.linkedin.com/pub/juv%C3%A9nal-chokogoue/52/965/a8 
(Figure: Data → Information → Knowledge → Actionable plans → Performance)
Thank you for attending. I sincerely hope this module will be helpful to you!
The full version will be available soon!

How to build an effective omni-channel CRM & Marketing Strategy & 360 custome...
 
Smart Analytics For The Utility Sector
Smart Analytics For The Utility SectorSmart Analytics For The Utility Sector
Smart Analytics For The Utility Sector
 
Big Data Analytics for Banking, a Point of View
Big Data Analytics for Banking, a Point of ViewBig Data Analytics for Banking, a Point of View
Big Data Analytics for Banking, a Point of View
 

Similar to Big Data 360 Overview: Analytics Techniques

From Big Data to Business Value
From Big Data to Business ValueFrom Big Data to Business Value
From Big Data to Business ValueGib Bassett
 
Big Data - Everything you need to know
Big Data - Everything you need to knowBig Data - Everything you need to know
Big Data - Everything you need to knowV2Soft
 
Creating Big Data Success with the Collaboration of Business and IT
Creating Big Data Success with the Collaboration of Business and ITCreating Big Data Success with the Collaboration of Business and IT
Creating Big Data Success with the Collaboration of Business and ITEdward Chenard
 
Barry Ooi; Big Data lookb4YouLeap
Barry Ooi; Big Data lookb4YouLeapBarry Ooi; Big Data lookb4YouLeap
Barry Ooi; Big Data lookb4YouLeapBarry Ooi
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data TipsQubole
 
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...Tommy Toy
 
BIG DATA & BUSINESS ANALYTICS
BIG DATA & BUSINESS ANALYTICSBIG DATA & BUSINESS ANALYTICS
BIG DATA & BUSINESS ANALYTICSVikram Joshi
 
Applied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_YhatApplied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_YhatCharlie Hecht
 
Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Kavika Roy
 
Snowball Group Whitepaper - Spotlight on Big Data
Snowball Group Whitepaper - Spotlight on Big DataSnowball Group Whitepaper - Spotlight on Big Data
Snowball Group Whitepaper - Spotlight on Big DataSnowball Group
 
Disruptive Data Science Series: Transforming Your Company into a Data Science...
Disruptive Data Science Series: Transforming Your Company into a Data Science...Disruptive Data Science Series: Transforming Your Company into a Data Science...
Disruptive Data Science Series: Transforming Your Company into a Data Science...EMC
 
Is Your Company Braced Up for handling Big Data
Is Your Company Braced Up for handling Big DataIs Your Company Braced Up for handling Big Data
Is Your Company Braced Up for handling Big Datahimanshu13jun
 

Similar to Big Data 360 Overview: Analytics Techniques (20)

From Big Data to Business Value
From Big Data to Business ValueFrom Big Data to Business Value
From Big Data to Business Value
 
Mighty Guides Data Disruption
Mighty Guides Data DisruptionMighty Guides Data Disruption
Mighty Guides Data Disruption
 
What is business analytics
What is business analyticsWhat is business analytics
What is business analytics
 
Mighty Guides- Data Disruption
Mighty Guides- Data DisruptionMighty Guides- Data Disruption
Mighty Guides- Data Disruption
 
Big Data - Everything you need to know
Big Data - Everything you need to knowBig Data - Everything you need to know
Big Data - Everything you need to know
 
Creating Big Data Success with the Collaboration of Business and IT
Creating Big Data Success with the Collaboration of Business and ITCreating Big Data Success with the Collaboration of Business and IT
Creating Big Data Success with the Collaboration of Business and IT
 
Barry Ooi; Big Data lookb4YouLeap
Barry Ooi; Big Data lookb4YouLeapBarry Ooi; Big Data lookb4YouLeap
Barry Ooi; Big Data lookb4YouLeap
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data Tips
 
Rapid-fire BI
Rapid-fire BIRapid-fire BI
Rapid-fire BI
 
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
 
Unlocking big data
Unlocking big dataUnlocking big data
Unlocking big data
 
Big Data analytics best practices
Big Data analytics best practicesBig Data analytics best practices
Big Data analytics best practices
 
Difference b/w DataScience, Data Analyst
Difference b/w DataScience, Data AnalystDifference b/w DataScience, Data Analyst
Difference b/w DataScience, Data Analyst
 
BIG DATA & BUSINESS ANALYTICS
BIG DATA & BUSINESS ANALYTICSBIG DATA & BUSINESS ANALYTICS
BIG DATA & BUSINESS ANALYTICS
 
Applied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_YhatApplied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_Yhat
 
Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!
 
Snowball Group Whitepaper - Spotlight on Big Data
Snowball Group Whitepaper - Spotlight on Big DataSnowball Group Whitepaper - Spotlight on Big Data
Snowball Group Whitepaper - Spotlight on Big Data
 
Disruptive Data Science Series: Transforming Your Company into a Data Science...
Disruptive Data Science Series: Transforming Your Company into a Data Science...Disruptive Data Science Series: Transforming Your Company into a Data Science...
Disruptive Data Science Series: Transforming Your Company into a Data Science...
 
Is Your Company Braced Up for handling Big Data
Is Your Company Braced Up for handling Big DataIs Your Company Braced Up for handling Big Data
Is Your Company Braced Up for handling Big Data
 
365 Data Science
365 Data Science365 Data Science
365 Data Science
 

Recently uploaded

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 

Recently uploaded (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 

Big Data 360 Overview: Analytics Techniques

  • 1. BIG DATA: A 360° Overview
    Juvénal CHOKOGOUE M, Consultant Business Analytics – Big Data
    BD-DE-0005, 11/23/2014
  • 2. Module Overview
    • The Business Challenge
    • What Does This Module Stand For?
    • Who Is This Module For?
    • Before the Battle Begins
    • Anyway! What Is Big Data?
    • Big Data and Analytics: How Did These Two Come Together?
    • Analytical Techniques for Mining Big Data
    • The New Infrastructure for Data Management: Hadoop
    • Big Data Adoption: Now or Later?
    • The Next Steps
    • What Should I Remember?
    • Some Big Data Providers
    • Bibliography & Resources
    • About Me
  • 3. The Business Challenge
    • The ability to scale operations up and down as conditions change, and to decrease "time to market" for decision-making, has become a critical competitive differentiator in today's economy.
    • Companies are gathering more and more data to stay competitive.
    • If they want to decrease their "time to market", they must make sense of the intersection of all the different kinds of data they have gathered.
    • Technically, when you are dealing with so much data in so many different forms, you cannot think about data management in traditional ways.
    • The challenges and opportunities associated with this new kind of data management problem are what is known today as "Big Data".
  • 4. What Does This Module Stand For?
    As with any new technological concept, software companies compete over definitions in order to sell their products, leaving businesses with a confused idea of the concept and of where it fits among the issues they face. Big Data, like Cloud Computing, Virtualization, Data Mining and so on, is just one of these concepts.
    I expect that by the end of this paper:
    • you will smile the next time you read or hear the terms big data, Hadoop, or analytics :)
    • you will understand what is going on behind the scenes when people talk about "Big Data"
    • you will know how one can "make sense" of Big Data using Analytics
    • you will have a basic idea of the data mining techniques used in business and in Big Data
    • you will be able to follow news about Big Data
    So, keep listening…
  • 5. What Does This Module Stand For?
    As with any new technological concept, software companies compete over definitions in order to sell their products, leaving businesses with a confused idea of the concept and of where it fits among the issues they face. Big Data, like Cloud Computing, Virtualization, Data Mining and so on, is just one of these concepts.
    When writing this paper, my main objective was to provide a true 360° overview of Big Data, that is, a clear understanding of where the term "Big Data" comes from, why it is so popular now, what it really means and what its implications for businesses can be. Because Analytics is another term associated with Big Data, I describe some widely recognized and widely used analytical techniques to help you see how analytics, used in conjunction with Big Data, can boost business performance.
    So, please don't put words in my mouth: this paper is not intended as a "how-to" guide for big data project management, for big data application development, or for statistical model building. Those will be the subject of other papers. Rather, I expect that by the end of this paper:
    • you will smile the next time you read or hear the terms big data, Hadoop, or analytics :)
    • you will understand what is going on behind the scenes when people talk about "Big Data"
    • you will know how one can "make sense" of Big Data using Analytics
    • you will have a basic idea of the data mining techniques used in business and in Big Data
    • you will be able to follow updates about Big Data
    So, keep reading…
  • 6. Before the Battle Begins
    The information provided here is for informational purposes only and represents my current point of view as of the date of this presentation. Because market conditions change, this information may be modified or become obsolete; it should not be interpreted as a commitment, and I cannot guarantee its accuracy after the date of this presentation. The contents of the websites cited here may change, or the websites themselves may become unavailable after publication. I therefore make no warranties, express, implied or statutory, as to the information in this presentation.
    In this presentation, I call the person responsible for data management, analytics, and programming the "Analyst". This is a simplification I adopted so that you need not worry about the new jobs and terms created by Big Data and can focus on the content of the paper.
    Microsoft, SQL Server, Teradata, Oracle, Google, Hadoop, Cloudera, Hortonworks, SAS, EMC and other names and products cited here are or may be registered trademarks in the U.S. and/or in other countries.
    Feel free to share this module with anyone you know, from your colleagues to your friends, but please credit the author. You may use and modify the content of this module at your own risk, but in that case I will not be responsible for its content. This module is not for sale; if you use it yourself, please don't commercialize it!
  • 7. Anyway! What Is Big Data?
  • 8. According to Gartner: "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (http://www.gartner.com/it-glossary/big-data/)
    Of all the definitions proposed for Big Data, Gartner's is the most widely adopted. And from that definition, one thing is clear: the term Big Data designates data that is large in volume, arrives at high velocity, and comes in a wide variety of forms. This is often referred to as the "3 Vs" or the three dimensions of Big Data.
  • 9. Big Data and Analytics: How Did These Two Come Together?
  • 10. Taken alone, Big Data is technology-driven. If businesses want to capitalize on the Big Data paradigm, they have to find a way to combine it with the traditional business analysis techniques they have used in the past to query and dig through their data. But an extremely wide variety of data brings new challenges: most traditional business analysis techniques are not suited to the new kinds of data sources we have today, and that is where Analytics comes into play! Analytics designates the means by which businesses gain insight from data, whatever its source, its size or even its format.
  • 11. All this said, you can now see that Big Data Analytics designates the new means by which we extract insights from data that is extremely large, extremely varied and extremely fast-moving.
    • However, be aware that the effectiveness of Analytics depends fundamentally on the question you want to answer and on the quality of the data. Data quality issues must be addressed before analytics concerns. As the saying goes in the field: "Garbage in, garbage out".
    • Analytics techniques must be handled with caution and require formal training in the field; you may want to consider investing in an analytics professional.
  • 12. Third, analytics is not a "silver bullet" that will always give you insights.
    Fourth, just because you have insights does not guarantee you have the power to act on them. Analytics can provide insights, but turning those numbers into competitive advantage may require changes that your business can't afford, or simply doesn't want to make. The Harvard Business Review explores a case study in which big data showed a retailer "that he could increase profits substantially by extending the time that items were on the floor before and after discounting. Implementing that change, however, would have required a complete redesign of the supply chain, which the retailer was reluctant to undertake." (source: https://hbr.org/2013/12/you-may-not-need-big-data-after-all/ar/1)
    Analytics does not replace your business intuition; it simply makes you more confident in your choice. In the end, you should still draw on your experience and intuition as a manager to make the decision.
  • 13. Analytical Techniques for Mining Big Data
  • 14. In this part, I am going to talk only about some techniques I am certified in. These techniques are used in most business scenarios and have long since proven their worth. They are: Regression (Linear and Logistic), Decision Trees, K-Means, Time Series, Neural Networks, Association Rules, Naive Bayes and Survival Analysis. In addition, I present the fundamentals of Text Analytics, since in the Big Data age we are generating more and more text data (tweets, Facebook comments, ...).
    - Regression
    Regression focuses on the relationship between an outcome and its input variables: we predict how changes in individual drivers affect the outcome. The outcome can be continuous or discrete. When it is discrete (logistic regression), we predict the probability that the outcome will occur. When it is continuous (linear regression), we predict the value of the dependent variable given the independent variables.
    [Figure: a survey from TDWI]
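To make the two regression cases concrete, here is a minimal sketch in Python (standard library only), assuming a single input variable. The continuous case is fitted by ordinary least squares in closed form; the discrete (binary) case is fitted by plain batch gradient ascent on the logistic log-likelihood. The function names, learning rate and epoch count are illustrative assumptions, not part of the original module; a real project would rely on a statistics library.

```python
import math

def fit_linear_regression(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one input variable)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the input variable has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Batch gradient ascent on the log-likelihood of P(y=1 | x) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)  # y must be 0 or 1
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1
```

For example, `fit_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])` returns `(1.0, 2.0)`, and for the logistic fit, `sigmoid(b0 + b1 * x)` gives the predicted probability that the outcome occurs for a new `x`.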
  • 15. - Decision Trees
    Decision trees are a flexible method, very commonly used in classification and regression problems. They partition large amounts of data into smaller segments by applying a series of rules of the form "IF condition THEN expression" (e.g. if age is less than 30 and revenue is greater than 36000 then class = 'Rich'). Decision trees are visually represented as upside-down trees, with the root at the top and branches emanating from it. There are two types of trees: Classification Trees and Regression Trees.
    - K-Means
    K-means is a clustering method; it belongs to the category of exploratory data analysis methods called "Unsupervised Classification". The goal is to group data based on similarities in the input variables, with no target or specific outcome. It is a method of choice for segmentation and profiling.
    [Figure: a survey from TDWI]
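The assignment/update loop behind K-means (Lloyd's algorithm) fits in a few lines. The sketch below is a minimal standard-library version that assumes numeric points given as tuples and a number of clusters `k` chosen in advance; initialising the centroids with `k` randomly chosen input points is one simple option among several (k-means++ is common in practice), and the function names are illustrative.

```python
import math
import random

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [nearest_centroid(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments
```

Note that there is no target variable anywhere in the code: the groups emerge only from distances between input variables, which is what makes the method unsupervised.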
  • 16. - Time Series
    Time Series Analysis provides a scientific methodology for forecasting: it is the analysis of a phenomenon that evolves over time. Its main objectives are:
    • to understand the underlying structure of the time series by breaking it down into trend, seasonality, and noise;
    • to fit a mathematical model that forecasts the future.
    - Neural Network
    Artificial Neural Networks are a class of flexible non-linear models used for prediction problems. Their power comes from the fact that they can approximate virtually any continuous relationship between the inputs and the target, whatever the form of that relationship. There are many kinds of neural networks, but the most widely used is the Multi-Layer Perceptron (MLP).
    - Association Rules
    Also known as association rule discovery, Market Basket Analysis or affinity analysis, association rule mining is a popular data mining method for exploring associations between items. It is an unsupervised method, typically applied in-database to transaction data.
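Two measures are commonly used to judge an association rule: support (the share of transactions that contain all of the rule's items) and confidence (the share of transactions containing the left-hand side that also contain the right-hand side). The sketch below computes them by brute force for single-item rules `A -> B` only; it is not the Apriori algorithm that real tools use to prune the search over larger itemsets, and the two thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for every qualifying rule lhs -> rhs."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

With the five baskets `{bread, milk}`, `{bread, diapers, beer}`, `{milk, diapers, beer}`, `{bread, milk, diapers, beer}` and `{bread, milk, diapers}`, the rule beer -> diapers has support 0.6 and confidence 1.0, while diapers -> beer has the same support but confidence 0.75: confidence is not symmetric.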
  • 17. - Naive Bayes
    Naive Bayes is a classifier: it assigns labels to objects by applying Bayes' theorem with strong (naive) independence assumptions between the inputs. Naive Bayes is particularly suited to problems with categorical inputs that have many levels.
    - Survival Analysis
    Survival analysis is a class of statistical methods for studying the occurrence and timing of events. It is suitable for problems where you want to know WHEN a specific event will happen. The most common approaches to building a survival model are: life tables, Kaplan-Meier estimators, exponential regression, proportional hazards regression, competing risk models and discrete-time methods.
    - Text Analytics Fundamentals
    Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information that can then be leveraged in various ways. The analysis and extraction processes take advantage of techniques that originated in computational linguistics (natural language processing), statistics, and other computer science disciplines.
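For categorical inputs, a Naive Bayes classifier only needs counts: how often each class occurs and, within each class, how often each input level occurs. The sketch below assumes inputs given as tuples of category labels and scores each class by log P(class) + Σ log P(level | class). The Laplace smoothing parameter `alpha` is an assumption of this sketch; it keeps a level never seen with a class from zeroing out that class, which matters precisely when inputs have many levels.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, the frequency of each input level."""
    class_counts = Counter(labels)
    level_counts = defaultdict(Counter)  # (input index, class) -> Counter of levels
    levels_seen = defaultdict(set)       # input index -> set of observed levels
    for row, label in zip(rows, labels):
        for i, level in enumerate(row):
            level_counts[(i, label)][level] += 1
            levels_seen[i].add(level)
    return class_counts, level_counts, levels_seen

def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class with the highest smoothed log-posterior score for `row`."""
    class_counts, level_counts, levels_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, level in enumerate(row):
            n_levels = len(levels_seen[i])
            # Laplace-smoothed estimate of P(level | class)
            likelihood = (level_counts[(i, label)][level] + alpha) / (count + alpha * n_levels)
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when many inputs are multiplied together; the "naive" independence assumption is what allows the per-input terms to be simply added.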
  • 18. The New Infrastructure for Data Management : Hadoop
  • 19. 6.1 The New Data Management Strategy
    • The centralized approach to data processing is no longer efficient at Big Data scale!
    • To deal with Big Data, the idea is to distribute the storage of the data and parallelize its processing across a cluster of computers: the cluster computing infrastructure.
    • In cluster computing:
      - data files are stored redundantly;
      - computations are divided into tasks and parallelized.
    • Redundant storage of the data on multiple hard disks is supported by a new kind of file system called the "Distributed File System" (DFS), and parallel processing is performed via a new programming model called "MapReduce".
    • The most popular (and most mature) implementation of MapReduce is Hadoop, which comes with HDFS (Hadoop Distributed File System).
    • Yes, you got it! You can use an implementation of MapReduce to manage many large-scale data computations in a way that tolerates hardware faults.
    [Figures: A cluster computing environment; MapReduce job description]
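The MapReduce programming model can be illustrated on a single machine. The sketch below is a toy simulation of the classic word-count job, not Hadoop code: `map_phase` emits `(word, 1)` pairs, `shuffle_phase` groups the values by key (the step the framework performs between map and reduce), and `reduce_phase` combines each group. All names are illustrative; on a real cluster, many map and reduce tasks would run in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word of one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map -> shuffle -> reduce sequentially on one machine."""
    mapped = (pair for document in documents for pair in map_phase(document))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))
```

Because each map call sees only its own split and each reduce call sees only one key, neither needs to know about the others, which is what lets the framework distribute them and rerun a failed task on another node.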
  • 20. 6.2 The Hadoop Ecosystem
    • Hadoop is a platform that implements MapReduce and provides a redundant, reliable, distributed file system optimized for large files.
    • At its core, Hadoop is a set of Java classes for HDFS types and MapReduce job management (MapReduce jobs themselves can also be written in other languages such as Python, C# or C++, for example through Hadoop Streaming).
    • These classes let the analyst write functions that extract insight from data without having to worry about how the code is distributed and parallelized across the cluster.
    • To get the most out of a Hadoop cluster, a set of technologies and tools has been developed. Together they form what is now called the Hadoop Ecosystem.
    • The foundational tools of the Hadoop Ecosystem are: Pig, Hive, HBase, Sqoop, ZooKeeper and Mahout.
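To make "writing functions without worrying about distribution" concrete, here is a rough sketch of the "maximum temperature by year" job used in the Pig and Hive examples, written as a Hadoop Streaming-style mapper and reducer in Python. Assumptions: input records are tab-separated `year`, `temperature`, `quality` lines; the function names are hypothetical; and `run_locally` only simulates on one machine the shuffle-and-sort step that Hadoop performs between map and reduce.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit 'year<TAB>temperature' for every valid record.
    Assumes tab-separated input lines of the form: year, temperature, quality."""
    for line in lines:
        year, temperature, quality = line.rstrip("\n").split("\t")
        # same filter as the Pig script: drop missing readings (9999), keep quality 0 or 4
        if temperature != "9999" and quality in ("0", "4"):
            yield f"{year}\t{temperature}"


def reducer(sorted_lines):
    """Reduce step: Hadoop delivers mapper output sorted by key,
    so all values for one year arrive next to each other."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(temp) for _, temp in group)}"


def run_locally(lines):
    """Single-machine stand-in for map -> shuffle/sort -> reduce (testing only)."""
    return list(reducer(sorted(mapper(lines))))
```

Under Hadoop Streaming, `mapper` and `reducer` would each be wrapped in a small script that reads `sys.stdin` and prints every yielded line; the scripts are passed to the streaming jar with its `-mapper` and `-reducer` options, and Hadoop takes care of splitting the input, sorting by key and rerunning failed tasks.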
  • 21. - Pig
    Pig is a data flow language and execution environment for Hadoop that can be used interactively or through scripts. Its data flow language, Pig Latin, lets you express a series of operations to apply to input data in order to produce output.
    - Hive
    Hive provides interactive and batch querying based on SQL: users who know SQL write queries in a SQL-like language called HiveQL, which Hive turns into MapReduce jobs.
    - HBase
    HBase is a distributed, column-oriented database that uses HDFS as its persistence store and supports both MapReduce computations and point queries. Because it is layered on Hadoop clusters of commodity hardware, it can host very large tables (billions of rows by millions of columns).
    Example Pig script: finding the maximum temperature by year
        -- load tab-delimited records (PigStorage's default delimiter)
        records = LOAD 'data/sample.txt' AS (year:chararray, temperature:int, quality:int);
        -- drop missing readings (9999) and keep only quality codes 0 and 4
        filtered_records = FILTER records BY temperature != 9999 AND (quality == 0 OR quality == 4);
        grouped_records = GROUP filtered_records BY year;
        max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
        DUMP max_temp;
    The same example written in HiveQL:
        CREATE TABLE records (year STRING, temperature INT, quality INT)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
        -- LOCAL reads the file from the local file system rather than from HDFS
        LOAD DATA LOCAL INPATH 'data/sample.txt' OVERWRITE INTO TABLE records;
        SELECT year, MAX(temperature) FROM records
        WHERE temperature != 9999 AND (quality = 0 OR quality = 4)
        GROUP BY year;
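HBase's data model is easier to picture with a small model. The toy class below is a hypothetical illustration, not the HBase client API: it stores cells as row key → `family:qualifier` column → timestamp → value, keeps older versions of a cell, and supports the two access patterns mentioned above, point lookups by row key and scans over a sorted range of row keys. Real HBase persists this map in HDFS and splits tables into row-key ranges (regions) served by different machines, which the sketch ignores.

```python
import time
from collections import defaultdict


class ToyColumnStore:
    """In-memory imitation of HBase's data model: a sparse map of
    row key -> 'family:qualifier' column -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time_ns() if ts is None else ts
        self.rows[row][column][ts] = value  # a new version is added, old ones are kept

    def get(self, row, column):
        """Point query: newest version of one cell, or None if the cell is empty."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Range scan in row-key order over [start_row, stop_row), newest values only."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, {col: v[max(v)] for col, v in self.rows[row].items()}
```

Because rows are sparse, a row only stores the columns it actually has, which is what makes tables with millions of possible columns practical.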
  • 22. - Sqoop
    Sqoop (SQL-to-Hadoop) efficiently transfers bulk data between HDFS and structured relational databases, in both directions. Think of Sqoop as the ETL (Extract, Transform, Load) tool of a Hadoop environment.
    - ZooKeeper
    ZooKeeper provides a distributed configuration service, a synchronization service and a naming registry for distributed applications. It is Hadoop's way of coordinating all the elements of these distributed applications.
    - Mahout
    Mahout is a scalable machine learning and data mining library for Hadoop. Think of Mahout as the analytics software of a Hadoop environment: it provides data mining and machine learning algorithms, packaged as Java libraries, for four types of analysis: recommendation mining, classification, clustering and association rules.
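Sqoop's parallel import follows the same divide-and-parallelize idea as MapReduce: it picks a split column (by default the table's primary key), queries its minimum and maximum values, and gives each map task one slice of that range (four map tasks by default). A rough sketch of that idea, assuming an integer split column, is below. Python's built-in `sqlite3` stands in for the relational database, `split_ranges` and `import_table` are hypothetical names, and this is neither Sqoop's code nor its command-line interface.

```python
import sqlite3


def split_ranges(lo, hi, num_splits):
    """Cut the integer interval lo..hi into up to num_splits half-open ranges [start, end)."""
    step, extra = divmod(hi - lo + 1, num_splits)
    ranges, start = [], lo
    for i in range(num_splits):
        end = start + step + (1 if i < extra else 0)
        if end > start:  # skip empty ranges when there are more splits than keys
            ranges.append((start, end))
        start = end
    return ranges


def import_table(conn, table, split_col, num_mappers=4):
    """Each range is what one map task would copy, rendered as tab-delimited
    text lines (the files that would land in HDFS). Ranges run one after another
    here; Sqoop runs them as parallel map tasks. Table and column names are
    interpolated directly into SQL, so they must come from a trusted source."""
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    if lo is None:  # empty table: nothing to import
        return []
    outputs = []
    for start, end in split_ranges(lo, hi, num_mappers):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} >= ? AND {split_col} < ? "
            f"ORDER BY {split_col}",
            (start, end),
        ).fetchall()
        outputs.append(["\t".join(str(value) for value in row) for row in rows])
    return outputs
```

Splitting on value ranges assumes the key is roughly evenly distributed; with a skewed key, some map tasks get far more rows than others.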
  • 23. BIG DATA ADOPTION : NOW OR LATER ?
  • 24. The answer to this question lies in integrating and operationalizing analytics as an integral part of the organization's business processes, which presupposes that the organization is data-driven. The big data approach is best suited to business problems that meet one or more of the following criteria:
    1. Data throttling
    2. Computation-restricted throttling
    3. Large data volumes
    4. Significant data variety
    5. Benefits from data parallelization
  • 25. What Should I Remember?
    • We have always had a lot of data; the difference today is that significantly more of it exists, and it varies in type and timeliness. To cope with this, you have to think about managing data differently. That is where "Big Data" comes in.
    • Big Data is the name given to the data management challenges and opportunities that emerge when dealing with data that is extremely large in volume, arrives at extremely high velocity, and is extremely wide in variety.
    • Big Data without analytics is just data.
    • Just because you have insights does not mean you have the power to act on them.
    • Not every problem is suited to Big Data.
    • MapReduce is a programming model for managing large-scale data computations in a way that is tolerant of hardware faults.
    • Hadoop is a platform that implements MapReduce and provides a redundant, reliable, distributed file system optimized for large files.
  • 26. Some Big Data Providers
    Here are some Big Data providers I know personally; there are others.
    - Cloudera, which released the first commercial distribution of Hadoop
    - Hortonworks, with its commercial distribution of Hadoop
    - SAS Institute, with SAS on Hadoop, SAS High Performance Suite, SAS Grid Computing and SAS Visual Analytics
    - HP, with its HP Vertica platform
    - EMC, with its Pivotal Greenplum platform
  • 27. Bibliography & Resources
    - http://www.cisjournal.org/archive/vol2no4/vol2no4_1.pdf
    - Hybrid Recommender System Using Naive Bayes Classifier and Collaborative Filtering: http://eprints.ecs.soton.ac.uk/18483/
    - Online applications: http://www.convo.co.uk/x02/ and http://mahout.apache.org/
    - EMC Data Science & Big Data Analytics Training Module: https://education.emc.com/guest/campaign/data_science.aspx
    - SAS Official Predictive Modeling Training Course: https://support.sas.com/edu/schedules.html?id=1366&ctry=us and https://support.sas.com/edu/schedules.html?id=1220&ctry=US
    - Big Data for Dummies, by Judith Hurwitz, Alan Nugent, Dr. Fern Halper and Marcia Kaufman (Wiley, www.wiley.com), ISBN 978-1-118-50422-2
    - Gartner: http://www.gartner.com/it-glossary/big-data/
    - The Harvard Business Review: https://hbr.org/2013/12/you-may-not-need-big-data-after-all/ar/1
    - MapReduce: Simplified Data Processing on Large Clusters (Google): http://static.googleusercontent.com/media/research.google.com/fr//archive/mapreduce-osdi04.pdf
    - Apache Hadoop (Apache Software Foundation): http://hadoop.apache.org/
    - TDWI: http://tdwi.org/
  • 28. About Me
    • I am a freelance consultant who helps organisations leverage their data to improve their performance through the right tools, the right methodology and the right technology. I have over 3 years of experience and 5 certifications: I am a highly certified SAS professional and also a certified EMC² Data Scientist.
    Contact
    Mail: jvc35@yahoo.fr
    Twitter: @Juvenal_JVC
    LinkedIn: http://fr.linkedin.com/pub/juv%C3%A9nal-chokogoue/52/965/a8
    [Diagram: Data → Information → Knowledge → Actionable plans → Performance]
  • 29. Thank you for attending! I sincerely hope this module will be helpful to you. The full version will be available soon!