Story of Building a Telecom Data Solution
Sawinder Pal Kaur, PhD
Data Scientist, SAP Labs
Outline
1. Defining business objectives and translating the business problem into a data science problem
2. Introduction to telecom data: data scale, volume, continuous and categorical variables, static and dynamic data
3. Architecture and data processing pipeline: big data handling and data science methods for categorical feature selection
4. Solution engineering: how can project managers be enabled to perform feature selection and identify opportunities to optimize existing plans and services?
Business Objective
• Personalized recommendations
• Higher customer satisfaction
• Improved customer retention
• Increased selling frequency
• Better product mix
• Increased customer loyalty
• Better decisions on coupons and discounts
• Effective strategies for new product launches
• Better offers for specific customer profiles
• Better product design and pricing
• Improved quality of service for the highest-margin customers
• Investment where the highest-margin customers are using network resources
Recommend plans and services
Grouping/clustering
Identify profit-maximization opportunities
Telecom Data & Data Processing Pipeline
Data
• How much data is available?
• Data infrastructure
• Data dashboards
• Data preparation for machine learning
• Data protection and privacy
Partitioning the data into similar groups
• Multi-dimensional clustering
• Grouping customers: one-dimensional binning/clustering
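A minimal sketch of the two grouping approaches named above, assuming customer metrics are available as plain Python numbers. Equal-frequency (quantile) binning stands in for one-dimensional binning, and Lloyd's k-means stands in for multi-dimensional clustering. The slides do not say which algorithms were actually used, and the names `quantile_bins` and `kmeans` are hypothetical.

```python
import math
import statistics
from bisect import bisect_right


def quantile_bins(values, n_bins=3):
    """One-dimensional binning: equal-frequency bins, 0 = lowest bin."""
    # statistics.quantiles returns n_bins - 1 cut points
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    return [bisect_right(cuts, v) for v in values]


def kmeans(points, k, n_iter=50):
    """Multi-dimensional clustering with plain Lloyd's k-means.

    Centroids start at the first k points (a simplification for the sketch;
    k-means++ initialisation is the usual choice in practice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([statistics.fmean(dim) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids


print(quantile_bins([10, 12, 11, 50, 55, 52, 100, 120, 110]))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
demo_labels, _ = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(demo_labels)  # [0, 0, 1, 1]
```

With real telecom metrics, each feature would normally be standardized before k-means so that a large-scale metric such as revenue does not dominate the Euclidean distance.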
High-, low-, and normal-profit customers
• One-dimensional outlier detection
• Multi-dimensional outlier detection
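One way to separate high-, low-, and normal-profit customers is sketched below. The one-dimensional case uses Tukey's IQR fences (the 1.5 multiplier is the conventional default, not a value from the slides). The multi-dimensional case scores each customer by the distance to its k-th nearest neighbour, a simple distance-based detector chosen here for illustration. The function names are hypothetical.

```python
import math
import statistics


def iqr_profit_segments(profits, k=1.5):
    """One-dimensional: label each value 'low', 'normal' or 'high' via Tukey's IQR fences."""
    q1, _, q3 = statistics.quantiles(profits, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return ["low" if p < lower else "high" if p > upper else "normal" for p in profits]


def knn_outlier_scores(points, k=2):
    """Multi-dimensional: distance from each point to its k-th nearest neighbour.

    Larger scores mean the point lies far from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


print(iqr_profit_segments([10, 12, 11, 13, 12, 11, 200, -150]))
# ['normal', 'normal', 'normal', 'normal', 'normal', 'normal', 'high', 'low']
```

The kNN score has no built-in cut-off; a threshold (or a top-N list) would still have to be chosen for the business use case.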
• Dealing with missing values:
  • Delete rows with missing values
  • Replace missing values using:
    • Mean/median
    • Another constant value
    • Conditional mean
    • A model such as k-nearest neighbors
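The imputation options listed above can be sketched in plain Python, with `None` marking a missing value. The helper names, the group key used for the conditional mean (for example a tariff plan), and the assumption that the k-NN distance uses only fully observed columns are illustrative choices, not details from the slides.

```python
import math
import statistics
from collections import defaultdict


def impute_mean_or_median(values, how="mean"):
    """Replace None with the mean (or median) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed) if how == "mean" else statistics.median(observed)
    return [fill if v is None else v for v in values]


def impute_constant(values, fill=0.0):
    """Replace None with a fixed number chosen by the analyst."""
    return [fill if v is None else v for v in values]


def impute_conditional_mean(values, groups):
    """Replace None with the mean of observed values in the same group
    (hypothetically, customers on the same plan); fall back to the overall
    mean when a group has no observed values."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        if v is not None:
            by_group[g].append(v)
    overall = statistics.fmean(v for v in values if v is not None)
    means = {g: statistics.fmean(vs) for g, vs in by_group.items()}
    return [means.get(g, overall) if v is None else v for v, g in zip(values, groups)]


def impute_knn(rows, target_col, k=2):
    """Fill missing entries of one column with that column's mean over the k
    nearest rows where it is observed. Assumes all other columns are complete."""
    complete = [r for r in rows if r[target_col] is not None]
    others = [c for c in range(len(rows[0])) if c != target_col]
    filled_rows = []
    for r in rows:
        new_row = list(r)
        if r[target_col] is None:
            nearest = sorted(
                complete,
                key=lambda q: math.dist([r[c] for c in others], [q[c] for c in others]),
            )[:k]
            new_row[target_col] = statistics.fmean(q[target_col] for q in nearest)
        filled_rows.append(new_row)
    return filled_rows


print(impute_conditional_mean([10, None, 30, None, 50], ["a", "a", "b", "b", "c"]))
# [10, 10.0, 30, 30.0, 50]
```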
• Filter methods: model-independent feature selection, e.g., Pearson correlation, mutual information, mRMR (minimum redundancy, maximum relevance)
• Dimensionality reduction: PCA, variational autoencoder
• Feature engineering
  • Creating new variables: polynomials, interaction variables, ratios
• Wrapper and embedded methods: used within the model-building process
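Because the outline emphasizes categorical features, here is a hedged sketch of a filter method for them: mutual information computed from co-occurrence counts, plus a greedy mRMR selector in its difference form (relevance minus mean redundancy). The feature names and toy data are made up for illustration, and this is not the author's implementation.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information I(X; Y), in nats, between two categorical sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), n_ab in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written in terms of raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_x[a] * count_y[b]))
    return mi


def mrmr(features, target, n_select):
    """Greedy mRMR, difference form: at each step pick the feature with the
    largest relevance I(f; target) minus its mean redundancy I(f; s) over
    the features s already selected."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    candidates = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while candidates and len(selected) < n_select:
        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: 'plan_copy' duplicates 'plan'; 'roaming' adds new information.
churn = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "plan": ["basic"] * 4 + ["premium"] * 4,
    "plan_copy": ["lo"] * 4 + ["hi"] * 4,
    "roaming": ["no", "no", "yes", "yes", "no", "no", "yes", "yes"],
}
print(mrmr(features, churn, n_select=2))  # ['plan', 'roaming']
```

Ranking by relevance alone would pick both `plan` and its relabelled copy; mRMR penalizes the copy because it carries no information beyond `plan`.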
Wrapper loop: Base set → Feature selection → Learning model → Performance
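The wrapper idea pairs a base feature set with a learning model and judges each candidate subset by the model's measured performance. A minimal sketch follows, assuming a 1-nearest-neighbour classifier scored by leave-one-out accuracy as the learning model (a stand-in chosen for brevity, not the model used in the project). Greedy forward selection adds one feature at a time and stops when performance no longer improves. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def loo_knn_accuracy(rows, labels, feature_idx, k=1):
    """Leave-one-out accuracy of a k-NN classifier that only sees feature_idx."""
    correct = 0
    for i, row in enumerate(rows):
        dists = sorted(
            (math.dist([row[f] for f in feature_idx], [other[f] for f in feature_idx]), labels[j])
            for j, other in enumerate(rows)
            if j != i
        )
        vote = Counter(lab for _, lab in dists[:k]).most_common(1)[0][0]
        correct += vote == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, k=1):
    """Wrapper method: grow the selected set while the model's score improves."""
    selected, best_score = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        # Evaluate the model once per candidate feature added to the current set
        scored = [(loo_knn_accuracy(rows, labels, selected + [f], k), f) for f in remaining]
        score, f = max(scored)
        if score <= best_score:
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 2.0), (1.1, 8.0), (1.2, 4.0)]
labels = ["low", "low", "low", "high", "high", "high"]
print(forward_selection(rows, labels))  # ([0], 1.0)
```

Wrapper methods retrain the model for every candidate subset, so they cost far more than filter methods on wide telecom datasets; a common compromise is to pre-filter features first and run the wrapper on the shortlist.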
Business Insights
Cluster   Size   Revenue    Profit   Usage   Discount     Cost
1         1283      0.05     -0.24    0.90       0.23     0.46
2          582     -0.13     -0.05   -0.15      -1.87    -0.10
3           71     -0.28     -0.55    0.05      -8.07     0.46
4         5309     -0.17     -0.01   -0.37       0.25    -0.25
5            9     19.37     16.26    1.12      -0.06     3.03
6          222      0.10     -1.19    3.66       0.13     2.06
7          270      2.75      2.35    0.11       0.08     0.36
8            8      0.64    -12.55    6.61       0.25    20.97
Revenue, profit, and cost are very high
Profit is very low
Profit, cost, and volume are very high
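The cluster table reports each cluster's size and one score per metric, and the annotations flag extreme clusters. The slides do not say how the scores were computed. Assuming they are cluster means of standardized (z-scored) metrics, the sketch below shows how such a profile and the short annotations could be produced. The 2.0 threshold for "very high"/"very low" is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import statistics
from collections import defaultdict


def cluster_profile(records, labels, metrics):
    """Size and mean z-score of each metric per cluster (assumed table semantics)."""
    # Standardise each metric over all customers: z = (x - mean) / stdev
    stats = {
        m: (statistics.fmean(r[m] for r in records), statistics.pstdev([r[m] for r in records]))
        for m in metrics
    }
    groups = defaultdict(list)
    for r, lab in zip(records, labels):
        groups[lab].append(r)
    profile = {}
    for lab, members in sorted(groups.items()):
        row = {"size": len(members)}
        for m in metrics:
            mu, sd = stats[m]
            row[m] = statistics.fmean((r[m] - mu) / sd for r in members) if sd else 0.0
        profile[lab] = row
    return profile


def describe(profile_row, metrics, threshold=2.0):
    """Turn a profile row into short notes such as 'Profit is very low'."""
    notes = []
    for m in metrics:
        if profile_row[m] >= threshold:
            notes.append(f"{m} is very high")
        elif profile_row[m] <= -threshold:
            notes.append(f"{m} is very low")
    return notes


# Cluster 8 from the table above
cluster_8 = {"Revenue": 0.64, "Profit": -12.55, "Usage": 6.61, "Discount": 0.25, "Cost": 20.97}
print(describe(cluster_8, ["Revenue", "Profit", "Usage", "Discount", "Cost"]))
# ['Profit is very low', 'Usage is very high', 'Cost is very high']
```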
Telecom Data Analytics
