Data Analytics
Data Science
Data Classification
Components
Data Analytics – Need
Data Analytics – Classification
Data Science – Roles
Data Analytics – Use Cases
Data Analytics – Success Stories
Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Computer Applications, Bharathiar University
Data Science
The term "Data Science" was used by statisticians and economists in the early 1970s and was defined by Peter Naur in 1974.
"Data Science" has gained popularity in the last couple of years because of the massive volumes of data now being accumulated.
The use of Big Data technology to explore data in large corporations, government and industry made the term "data science" catchy.
Data Science as Discipline
Data Science has emerged as a new discipline that provides deep insight into large volumes of data.
Data Science is a fusion of major disciplines such as Computational Algorithms, Statistics and Visualization.
By a widely quoted estimate, 90% of the world's data has been created in the last two years; this includes 10% structured data and 80% unstructured data.
The digital universe is in a data deluge and is estimated to be larger than the physical universe; the unit of data measurement is predicted to reach Geopbytes.
Data Classification
◦ Open Data
◦ Closed Data
◦ Hot Data
◦ Warm Data
◦ Cold Data
◦ Thin Data
◦ Thick Data
Data Science Components
◦ Pre-Processing – ETL
◦ Dashboards
◦ Charts – Pie, Bar, Histogram
◦ Data Models – Linear Regression, Decision Tree, Dimensionality Reduction, Clustering, Outlier Analysis, Association Analysis
Data Analytics – Need for Today
Data is considered a digital asset, similar to other property.
Organizations believe the data they generate will provide deep insights into their business processes and help them arrive at strategic decisions.
The earlier limitations of storage and processing capacity have been overcome by cloud computing and big data technologies.
Data Science Vs Data Analytics
Data Science is a discipline that groups techniques and methods from various domains to study data; Data Analytics is one component of Data Science.
Data Analytics is the process of analyzing a dataset to find deep insights in the data using computational algorithms and statistical methods.
There exists no common procedure to
Data Analytics Vs Big Data Analytics
Data Analytics is used to explore and analyze datasets using statistical methods and models.
Big Data Analytics is used to analyze data characterized by Volume, Velocity and Variety by integrating statistics, mathematics and computational algorithms on a Big Data platform.
UNDERSTANDING DATA ANALYTICS – A SCENARIO
Data Analytics - Classification
Data Analytics - Methods
Data Science - Landscape
Statistics in Data Analytics
Basics – Exploratory Data Analysis
Descriptive Data Analysis – Central Tendency, Normal Distributions
Inferential Data Analysis – Sampling, Population – ANOVA, Paired t-test
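As a rough illustration of the descriptive and inferential steps listed above, the following sketch computes measures of central tendency and a paired t statistic using only Python's standard library. The before/after values are made-up illustration data, and `paired_t_statistic` is a hypothetical helper rather than a library function; it returns only the t statistic and degrees of freedom, so a p-value would still have to come from a t-distribution table or a statistics package.

```python
import math
import statistics

# Hypothetical before/after measurements for the same five subjects
# (illustrative values only, not taken from the slides).
before = [72, 65, 80, 70, 68]
after = [75, 70, 82, 74, 69]

# Descriptive analysis: central tendency and spread.
mean_before = statistics.mean(before)
median_before = statistics.median(before)
stdev_before = statistics.stdev(before)  # sample standard deviation


def paired_t_statistic(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("paired samples must have equal length of at least 2")
    diffs = [b - a for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Inferential analysis: is the mean of the paired differences far from zero?
t_stat, dof = paired_t_statistic(before, after)
print(f"mean={mean_before}, median={median_before}, stdev={stdev_before:.2f}")
print(f"paired t={t_stat:.3f} with {dof} degrees of freedom")
```

A large absolute t value relative to the critical value for the given degrees of freedom suggests that the before/after difference is unlikely to be due to chance alone.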
Predictive Analytics - Tasks
Data Science – Emerging Roles
Data Scientist: responsible for scrubbing data to bring out deep insights from it.
Skills: Expert in CS, Mathematics, Statistics
Works on open-ended research problems
Data Engineer: responsible for managing and administering the infrastructure and storage of data.
Skills: Strong skills in Programming and Software Engineering
Deep knowledge of Data Warehousing
Expertise in Hadoop, NoSQL and SQL technologies
Data Analyst: views the data from one source and derives deep insight from it based on the organization's guidance.
Skills: Competency in understanding Statistics
Data Science Applications
Data Personalization - Logs, Tweets, Likes
Smart Pricing – Air Transportation
Financial Services – Fraud Detection
Insurance
Smart Grids – Energy Management
Air Fare Management – Use Case 1
Objective: Hike airfare based on High Value Customers - CRM.
Strategic decisions require an understanding of data insights:
How are customers divided?
Which customers are high-value customers?
Who is a frequent flyer?
How to retain customers?
Data sources:
Conventional enterprise information
Data from weblogs, social media, competitors' pricing
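The questions in this use case about high-value customers and frequent flyers could be approached, in the simplest case, with rule-based segmentation. The sketch below is only an illustration under stated assumptions: the `Customer` fields, the two threshold constants and the segment names are all hypothetical, and a real airline would derive its cut-offs from its own CRM data rather than fixing them by hand.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    flights_per_year: int
    annual_spend: float


# Hypothetical cut-offs chosen purely for illustration.
FREQUENT_FLYER_MIN_FLIGHTS = 12
HIGH_VALUE_MIN_SPEND = 5000.0


def segment(customer: Customer) -> str:
    """Assign a customer to one coarse segment using the two thresholds above."""
    frequent = customer.flights_per_year >= FREQUENT_FLYER_MIN_FLIGHTS
    high_value = customer.annual_spend >= HIGH_VALUE_MIN_SPEND
    if frequent and high_value:
        return "high-value frequent flyer"
    if high_value:
        return "high-value occasional flyer"
    if frequent:
        return "frequent flyer"
    return "standard"


# Made-up customers covering each of the four segments.
customers = [
    Customer("A", flights_per_year=20, annual_spend=9000.0),
    Customer("B", flights_per_year=3, annual_spend=6500.0),
    Customer("C", flights_per_year=15, annual_spend=2000.0),
    Customer("D", flights_per_year=1, annual_spend=300.0),
]
segments = {c.name: segment(c) for c in customers}
print(segments)
```

Fixed rules like these are easy to explain to business users; clustering or a decision tree trained on historical data would be the natural next step once enough data is available.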
Data Engineering
Airfare Classification (Economy, Business, First)
Analyse factors (enterprise data sources) – Data Exploration techniques
Passenger booking information
Forecasted data - Statistics
Inventory
Customer behavioral data - Predictive Analytics – Statistical models – Decision tree, classification
Information has to be gathered from websites that provide route information, dining and preferred locations
Holistic Analytics
Analyzing customer data from social profiles, sales, CRM etc.
Complexities and Challenges
Data volumes exceed terabytes
Data integration
Variety of data formats
Solution
Big data Accelerators
Hadoop ecosystem
Analytic components
Integrated data warehouses
Source: Big data spectrum Infosys
Insurance Fraud Detection – Use Case Scenario
Data Engineering
Verifying customer data
Customer Profile analysis
Verification of claims raised
Fraud detection from disparate systems
Exact claim reimbursement
Data Sources
Data about customers and products sold from ERP, CRM
Credit history from other sources
Data from social networking – Customer profiles, product rating, credit rating from 3rd parties
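One simple way to begin verifying the claims raised is outlier analysis on claim amounts. The sketch below uses the modified z-score based on the median absolute deviation, a common robust outlier test; it is a swapped-in illustration, not a method the slides prescribe. The function name `flag_outlier_claims`, the threshold of 3.5 and the claim amounts are assumptions made for the example.

```python
import statistics


def flag_outlier_claims(amounts, threshold=3.5):
    """Return indices of claims whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        return []  # at least half the claims equal the median; score undefined
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - median) / mad > threshold
    ]


# Made-up claim amounts; the sixth one is far larger than the rest.
claims = [1200, 950, 1100, 1300, 1050, 9800, 1150]
suspicious = flag_outlier_claims(claims)
print(suspicious)
```

A flagged claim is only a candidate for review; in practice it would be cross-checked against the customer profile and credit history drawn from the data sources listed above.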
Health Epidemics
Data Engineering
Kind of epidemics and target users
Causes and effects with respect to locations
Environmental and other related issues of
epidemics
Data on Awareness
Data Sources
EHR records, Medical Insurance claims, Social media – awareness, ERP Systems
Data Analytics
Descriptive Analytics
Predictive Analytics (model-based analysis)
Big Data Challenges
Privacy Protection
Applies to all Big Data stages: collect, store, process, knowledge
Integration with enterprise landscape
All systems store data in RDBMS, DW
These do not support bulk loading to a Big Data store
Limited number of analytics available from Mahout
Big Data technologies lack visualization support and deliverable methods
Leveraging cloud computing for Big Data applications
Addressing real-time needs with varied format and volume
ORION - Franz Edelman Award
The Franz Edelman Award recognizes excellence in Operations Research and analytics.
ORION, an acronym that stands for On-Road Integrated Optimization and Navigation, is perhaps the largest commercial analytics project ever undertaken.
It required well over a decade to build and roll out, and more than $250 million of investment by UPS.
At its peak, over 700 UPSers were working on change management and rollout of the system. So the company clearly went all in on this project.
The company is receiving something in return for its investment:
Savings (in driver productivity and fuel economy) of between $300 and $400 million a year
100 million fewer miles driven and a resulting cut in carbon emissions of 100,000 metric tons a year
Benefits on this scale do not come from an analytics project very often, and these have been confirmed through intensive measurement and reported to Wall Street analysts.
Source: http://data-informed.com/prescriptive-analytics-project-delivering-big-dividends-at-ups/
Predictive analytics - Netflix
Netflix has raised the TV show batting average considerably.
The company uses predictive analytics to improve its customer recommendation algorithms for movies.
The company has also used analytics to predict whether TV shows will be home runs, solid base hits, or strikeouts.
Predictive Analytics – 2022
Source - Dataquest
Antitheft. As you enter your car, a predictive model establishes your identity
based on several biometric readings, rendering it virtually impossible for an
imposter to start the engine.
Entertainment. Spotify plays new music it predicts you will like.
Traffic. Your navigator pipes up and suggests alternative routing due to
predicted traffic delays. Because the new route has hills and your car’s
battery – its only energy source – is low, your maximum acceleration is
decreased.
Breakfast. An en route drive-through restaurant is suggested by a
recommendation system that knows its daily food preference predictions
must be accurate or you will disable it.
Social. Your Social Techretary offers to read you select Facebook feeds and
Match.com responses it predicts will be of greatest interest. Inappropriate
comments are filtered out. CareerBuilder offers to read postings of jobs for
which you’re predicted to apply. When playing your voice mail, solicitations
such as robocall messages are screened by a predictive model just like e-
mail spam.
Deals. You accept your smartphone’s offer to read to you a text message
from your wireless carrier. Apparently, they’ve predicted you’re going to
switch to a competitor, because they are offering a huge discount on the
IoT – Data Analytics – Manufacturing
According to Accenture, the Industrial
Internet of Things has the potential to
add more than $14 trillion to the global
economy by 2030.
Small sensors placed on complex
machinery emit performance data that
can be used to adjust scheduled
maintenance.
With this functionality, industries such as
energy and oil extraction are now able to
predict and mitigate equipment failures,
significantly reducing downtime,
increasing site safety, and cutting costs.
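A minimal sketch of how sensor performance data might trigger a maintenance adjustment: it raises an alert whenever the rolling mean of recent readings exceeds a limit. The function name `maintenance_alerts`, the window size, the limit and the temperature-like readings are all hypothetical; real predictive-maintenance systems typically rely on trained failure models rather than a fixed threshold.

```python
from collections import deque


def maintenance_alerts(readings, window=3, limit=80.0):
    """Yield each index where the mean of the last `window` readings exceeds `limit`."""
    recent = deque(maxlen=window)  # automatically discards the oldest reading
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > limit:
            yield i


# Made-up sensor readings that drift upward over time.
readings = [70, 72, 75, 79, 84, 90, 88]
alerts = list(maintenance_alerts(readings))
print(alerts)
```

Averaging over a window rather than reacting to single readings reduces false alarms from momentary spikes, at the cost of a short delay before the first alert.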
IoT – Big Data Analytics
Experts are predicting fully automated farms in the next five years,
but already monster machines, such as the New Holland T8.435
tractor, are becoming commonplace not only on very big farms, but
also on mid-sized ones.
The tractor’s steering is assisted by satellite. It downloads crop and
soil data straight to agronomists and farm managers, works 24/7,
can link with ground sensors and drones using infrared thermal
cameras to tell, within a square meter, the size of a field and where
the most fertile or waterlogged places are. Big data, machinery,
climatology, and agronomy are all combining to increase productivity
and reduce labor costs.
Livestock farming has not gone unnoticed by big data and IoT
developers, either. Wearable technology is no longer just for
humans. Any animal, from elephants to cows to cats and dogs, can
wear or be injected with devices that capture health and behavioral
data.
iNOVOTEC Animal Care, for example, has created wearable and
ingestible devices that provide information about an animal’s
condition that is not easily observable. This enables farmers to catch
illnesses much earlier, leading to healthier stock and cost savings.