Customer Feedback Analytics
Nishant M Gandhi
Prof: Kam Heydari
CSYE 7250 Big Data Arch and Governance
Spring 2018
“Your brand is what people say about you when you’re not in the room.” (Jeff Bezos)
Project Goal and Objectives
Better understand customer sentiment towards the Starbucks brand and
services by applying AI technologies to social media and customer
service email data.
Objectives:
• Improve products and service delivery based on customer feedback
• Assess the effectiveness of marketing campaigns
Value Proposition
• Cost of NOT implementing:
  • Forgoes an approximate revenue increase of $52.5 million per quarter (1% of
  quarterly revenue)
• Strategic Initiative:
  • Align with the company's core value: Best Customer Experience
  • Help the company adopt the AI revolution
  • Improve the effectiveness of marketing campaigns with Feedback Analytics
What are the obstacles to getting there?
• The company does not own the technology and resources required to
capture social media data and process it for business value.
• The company does not have the right talent with the technical
expertise required for such a project.
Who are the key stakeholders and what are
their roles?
• Enterprise Data Governance Office (Data Owner)
  • Data Governance Council (Policy Design and Decision Body)
  • Chief Data Officer (Execution & Management Role, Meets Data Compliance Requirements)
  • Data Steward
  • Project Manager (Responsible for timely delivery)
• HR Office
  • Talent Management (Allocate Resources and Hire Talent)
• Sales Office
  • Provides access to customer complaint email data
• Marketing Office
  • Project Service Consumer
• CEO Office
  • Project Service Consumer
Starbucks Social Media Presence
• Twitter (11.9M followers)
• Facebook (37.19M likes)
• Instagram (16.2M followers)
• Google+ (4.8M followers)
• YouTube (158.226K subscribers)
Starbucks Email Addresses for Customer Requests/Complaints
• Online Email Forms
  • Company Information
  • Starbucks in the Grocery Aisle
  • Nutritional Information
  • Starbucks.com Web Site
  • Mobile Applications
  • In Our Stores
  • Starbucks Rewards
  • Starbucks Cards
• Security Video Request
  • security@starbucks.com
Data Volume
Monthly Data Volume:
Data Source   Data Velocity (per month)   Average Size   Total Size
Twitter       40,000,000 tweets           80 bytes       3.2 GB
Facebook      48,000,000 engagements      200 bytes      9.6 GB
Instagram     40,000,000 engagements      150 bytes      6 GB
Google+       1,000,000 engagements       90 bytes       0.09 GB
YouTube       158,000 engagements         130 bytes      0.02 GB
Emails        100,000 engagements         2,000 bytes    0.2 GB
Total: 19.11 GB/month
Assumption: based on the Pareto 80-20 rule
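The monthly total can be reproduced from the record counts and average sizes in the table above. The sketch below is only an arithmetic check; it assumes decimal gigabytes (10^9 bytes), which is what the table's figures imply, and the names SOURCES and total_monthly_gb are illustrative.

```python
# Monthly record counts and average record sizes (bytes), copied from the table above.
SOURCES = {
    "Twitter": (40_000_000, 80),
    "Facebook": (48_000_000, 200),
    "Instagram": (40_000_000, 150),
    "Google+": (1_000_000, 90),
    "YouTube": (158_000, 130),
    "Emails": (100_000, 2_000),
}

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def monthly_volume_gb(sources):
    """Return the per-source monthly volume in GB."""
    return {name: count * size / BYTES_PER_GB for name, (count, size) in sources.items()}


def total_monthly_gb(sources):
    """Return the total monthly volume in GB across all sources."""
    total_bytes = sum(count * size for count, size in sources.values())
    return total_bytes / BYTES_PER_GB


if __name__ == "__main__":
    for name, gb in monthly_volume_gb(SOURCES).items():
        print(f"{name:10s} {gb:.2f} GB")
    print(f"Total      {total_monthly_gb(SOURCES):.2f} GB/month")
```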
Vision Diagram
[Diagram. Labeled elements: External Sources, Internal Sources, Database, Natural Language Processing App, Data Storage File System, CEO Office, Marketing Office]
Sequence Diagram
Data Flow Diagram
Solution Architecture
[Diagram. Layers: Data Sources, Data Integration, Data Management Platform, Data Access Layer. Components: Customer Service Email Database, Social Media Accounts, Stream Data Collector, Batch Data Collector, Queue, Big Data Lake (Data Landing, Data Archival), Database Engine, NLP App, Data Sanitization App, JDBC, REST API, BI Tool. Cross-cutting concerns: Security, Monitoring, Availability, Scalability]
Vendor Selection Strategy
• Reduce the cost of ownership (Available as cloud services)
• Reliable and industry proven
• Enterprise support availability
• Easy to scale
Big Data Technology Stack
[Diagram. Stack categories: Security, Management, Framework, Database, Data Access, Visualization]
Data Ingestion Queue: Kafka
• Kafka is a distributed streaming platform that is scalable and
fault-tolerant.
• It is one of the most widely used open-source queue platforms for
delivering streaming data with a pub-sub model.
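To make the pub-sub model concrete, here is a minimal in-memory sketch. It is not the Kafka client API: ToyBroker, its methods, and the topic and group names are all hypothetical, and the sketch only illustrates the idea that producers append to a topic log while each consumer group tracks its own read position.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub-sub log (illustration only, not Kafka)."""

    def __init__(self):
        self._log = defaultdict(list)     # topic -> append-only list of messages
        self._offsets = defaultdict(int)  # (topic, group) -> next index to read

    def publish(self, topic, message):
        """Append a message to the end of the topic's log."""
        self._log[topic].append(message)

    def poll(self, topic, group, max_messages=10):
        """Return the next unread messages for this consumer group."""
        start = self._offsets[(topic, group)]
        batch = self._log[topic][start:start + max_messages]
        self._offsets[(topic, group)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = ToyBroker()
    for text in ["tweet 1", "tweet 2", "tweet 3"]:
        broker.publish("feedback", text)
    print(broker.poll("feedback", group="cleaner", max_messages=2))
    print(broker.poll("feedback", group="cleaner"))
    print(broker.poll("feedback", group="archiver"))
```

Because each group keeps its own offset, a second consumer such as an archiver can read the same messages independently of the cleaning app.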
Data Cleaning and NLP Processing: Spark
• Apache Spark is a fast distributed data processing platform.
• It provides built-in support and libraries for data processing and
machine learning applications.
• It is a de facto industry tool for big data processing.
Data Landing and Storage: Hadoop
• Hadoop is a distributed storage and computation framework. It has
grown from a minimalistic platform into a large ecosystem of tools
built on top of it.
• It is well suited to big data storage thanks to its distributed file
system, HDFS.
Data Archiving: Amazon S3 Storage
• Amazon Web Services provides a storage service called “Simple
Storage Service”, or S3.
• S3 supports storing very large datasets and is a de facto industry
choice for archiving historical data.
• S3 provides a management interface and data access security controls.
Database Engine: Cassandra
• Apache Cassandra is a distributed NoSQL database optimized for
extremely fast reads and writes on queries that are planned in advance.
• DataStax provides an enterprise management platform for
Apache Cassandra.
• Apache Cassandra does not support ad hoc queries or joins, but it is
extremely fast for simple key-based queries and very easy to
scale.
Visualization: QlikView
• Qlik is one of the market-leading visualization tools for analytics
and real-time reporting.
Monitoring: ELK Stack
• The ELK stack is a de facto industry standard for log monitoring.
• It uses Elasticsearch, a document database built on the Lucene
search engine. Document search makes keyword search over logs
easy.
• Kibana is a visualization and dashboard tool for real-time visual
monitoring of application and infrastructure logs.
Management: Oozie & YARN
• Hadoop YARN helps launch Spark jobs on a multi-node cluster. It
is key to managing distributed data processing tasks on the Spark cluster.
• Oozie is a workflow management tool for scheduling and chaining the
different applications into a workflow.
Security: Apache Ranger
• Apache Ranger provides comprehensive security across the
Apache Hadoop ecosystem.
• It supports different authorization methods on Hadoop, such as
role-based access control and attribute-based access control.
• It also provides centralized auditing of user access and
security-related administrative actions across all Hadoop components.
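The combination of role-based authorization and centralized auditing described above can be illustrated with a small sketch. This is not the Apache Ranger API; the roles, resources, and the is_allowed helper are hypothetical and exist only to show the idea of checking a role's permissions and recording every decision.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping: each permission is (resource, action).
ROLE_PERMISSIONS = {
    "marketing_analyst": {("sentiment_scores", "read")},
    "data_steward": {
        ("sentiment_scores", "read"),
        ("raw_feedback", "read"),
        ("raw_feedback", "delete"),
    },
}

audit_log = []  # every access decision is recorded here, allowed or not


def is_allowed(user, role, resource, action):
    """Check a role-based permission and record the decision for auditing."""
    allowed = (resource, action) in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "resource": resource,
        "action": action,
        "allowed": allowed,
    })
    return allowed


if __name__ == "__main__":
    print(is_allowed("alice", "marketing_analyst", "sentiment_scores", "read"))
    print(is_allowed("alice", "marketing_analyst", "raw_feedback", "delete"))
    print(len(audit_log), "audit entries")
```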
Solution Architecture
[Diagram. Layers: Data Sources, Data Integration, Data Management Platform, Data Access Layer. Components: Customer Service Email, Java Spring Streaming, Java Spring Batch-Import, Big Data Platform, Database Engine, NLP App, Data Sanitization App. Cross-cutting concerns: Security, Monitoring, Availability, Scalability]
What the System Does
• Ingestion:
  • The Java streaming app will use the streaming APIs of Facebook, Twitter, Google+,
  Instagram and YouTube to consume data and put it on a Kafka queue.
  • The Java batch app will import a batch of emails every night and put them in
  Kafka.
• Data Cleaning:
  • The Spark Streaming app will consume data from the Kafka queue, validate it
  and normalize its format. The cleaned data will be written to the
  Hadoop HDFS cluster.
• NLP Processing:
  • The Spark ML app will read the cleaned data from Hadoop HDFS, append a
  sentiment score and a topic category to each record, and save the result to
  Cassandra.
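The cleaning and scoring steps above can be sketched in plain Python. This is a toy illustration, not the Spark implementation: the record schema (source, text), the tiny word lists, and the function names are all assumptions, and a real system would use a trained model rather than a word count. Topic categorization is left out for brevity.

```python
import re

REQUIRED_FIELDS = ("source", "text")   # hypothetical record schema
POSITIVE = {"love", "great", "good"}   # toy lexicon, for illustration only
NEGATIVE = {"cold", "slow", "bad"}


def clean(record):
    """Return a normalized copy of the record, or None if validation fails."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return None
    text = re.sub(r"\s+", " ", record["text"]).strip()  # collapse whitespace
    return {"source": record["source"].strip().lower(), "text": text}


def sentiment_score(text):
    """Count positive words minus negative words (a deliberately naive score)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def process(records):
    """Clean each record, drop invalid ones, and append a sentiment score."""
    results = []
    for record in records:
        cleaned = clean(record)
        if cleaned is None:
            continue
        cleaned["sentiment"] = sentiment_score(cleaned["text"])
        results.append(cleaned)
    return results


if __name__ == "__main__":
    sample = [
        {"source": " Twitter ", "text": "Love  the new latte, great service"},
        {"source": "email", "text": "   "},
        {"source": "Email", "text": "Coffee was cold and the line was slow"},
    ]
    for row in process(sample):
        print(row)
```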
What the System Does (continued)
• Data Access
  • Sentiment scores and data category analytics stored in Cassandra can be
  queried through JDBC and a REST API.
  • Existing visualization products can access the data through either
  JDBC or the REST API for custom business questions and analytics.
• Data Backup and Archiving
  • Amazon S3 will be used for periodic backup and archiving of data.
• Scalability, Monitoring, Availability and Security
  • Apache Ranger provides role-based and group-based data access control for
  authorization.
  • The ELK stack is used to monitor logs.
  • All platform vendors were selected for their scalability and
  availability.
Capacity Planning for 6 months
Storage:
• Data size: 120 GB
• After replication (replication factor 3): 3 x 120 GB = 360 GB
• Amazon S3 Standard Storage (under the 50 TB cap)
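The storage figures can be checked against the monthly volume estimate of 19.11 GB. The slides do not say how 120 GB was derived; the sketch below assumes it is the six-month total (about 114.66 GB) rounded up to the next 10 GB, and that rounding step is an assumption of this example.

```python
import math

MONTHLY_GB = 19.11        # from the monthly data volume estimate
MONTHS = 6
REPLICATION_FACTOR = 3


def planned_storage_gb(monthly_gb, months, replication_factor, round_to=10):
    """Return (raw total, rounded total, replicated total) in GB.

    round_to is an assumed planning granularity; it is not stated in the slides.
    """
    raw = monthly_gb * months
    rounded = math.ceil(raw / round_to) * round_to
    return raw, rounded, rounded * replication_factor


if __name__ == "__main__":
    raw, rounded, replicated = planned_storage_gb(MONTHLY_GB, MONTHS, REPLICATION_FACTOR)
    print(f"Raw six-month volume: {raw:.2f} GB")
    print(f"Planned data size:    {rounded} GB")
    print(f"After replication:    {replicated} GB")
```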
Capacity Planning for 6 months
Processing:
AWS Service               Capacity     # of Instances   Specs
Hadoop EMR                c5.2xlarge   5                16 GB RAM / 8 Core
EC2 for Cassandra         ds2.xlarge   3                32 GB RAM / 4 Core
EC2 for Kafka             m5.xlarge    3                16 GB RAM / 4 Core
EC2 for Java Stream App   m5.xlarge    1                16 GB RAM / 4 Core
EC2 for Java Batch App    m5.xlarge    1                16 GB RAM / 4 Core
EC2 for ELK Stack         ds2.xlarge   1                32 GB RAM / 4 Core
Data Backup and Data Archival Strategy
• Data Archival
  • Dataset selection for archival (not everything will be archived)
  • Data Archiving Stage: 2 years
  • Email Record Retention: 7 years
  • Social Media Data Retention: 5 years
• Data Backup
  • Data Backup Interval: 24 hours
  • Total Data Backup History Retention: 5 (keep the last 5 backups)
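The retention rules above translate into two simple checks: which backups fall outside the last five, and whether a record has passed its retention period. The sketch below is one possible reading; the function names, the record type keys, and the choice to measure retention from a record's creation date are assumptions.

```python
from datetime import date

BACKUPS_TO_KEEP = 5
RETENTION_YEARS = {"email": 7, "social": 5}  # from the retention periods above


def backups_to_delete(backup_dates, keep=BACKUPS_TO_KEEP):
    """Return the backup dates that fall outside the most recent `keep` backups."""
    newest_first = sorted(backup_dates, reverse=True)
    return newest_first[keep:]


def is_past_retention(record_type, created, today):
    """Return True once a record has reached the end of its retention period."""
    years = RETENTION_YEARS[record_type]
    try:
        cutoff = created.replace(year=created.year + years)
    except ValueError:
        # created is Feb 29 and the target year is not a leap year
        cutoff = created.replace(year=created.year + years, day=28)
    return today >= cutoff


if __name__ == "__main__":
    nightly = [date(2018, 4, day) for day in range(1, 8)]  # seven daily backups
    print(backups_to_delete(nightly))
    print(is_past_retention("email", date(2011, 4, 1), date(2018, 4, 1)))
    print(is_past_retention("social", date(2015, 1, 1), date(2018, 4, 1)))
```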
GDPR Compliance Plan
• Data Control
• Data Security
• Right to Erasure
• Risk Mitigation and Due Diligence
• Breach Notification
The Chief Data Officer is responsible for meeting GDPR compliance requirements.
Thank You

Customer Feedback Analytics for Starbucks

  • 1. Customer Feedback Analytics Nishant M Gandhi Prof: Kam Heydari CSYE 7250 Big Data Arch and Governance Spring 2018
  • 2. “Your brand is what people say about you when you’re not in the room.”  : Jeff Bezos
  • 3. Project Goal and Objectives Better understand customer sentiment towards Starbucks brand and services by leveraging social media and customer service email data and using AI technologies. Objectives: • Improve upon products and service delivery based on customer feedback • Reflect on effectiveness of marketing campaign
  • 4. Value Proposition • Cost of NOT implementing: • Approximate increase in revenue by $52.5 million per Quarter (1% of Quarter Revenue) • Strategic Initiative: • Align with the core value of company: Best Customer Experience • Help company to adopt AI revolution • Improve the effectiveness of marketing campaign with Feedback Analytics
  • 5. What are the obstacles to getting there? • The company does not have/own required technology and resources for capturing social media data and process them for business value. • The company does not have right talent with required technical expertise for such project.
  • 6. Who are the key stakeholders and what are their roles? • Enterprise Data Governance Office (Data Owner) • Data Governance Council (Policy Design and Decision Body) • Chief Data Officer (Execution & Management Role, Meet Data Compliances) • Data Steward • Project Manager (Responsible for timely delivery) • HR Office • Talent Management (Allocate Resource and Hire Talent) • Sales Office • Provide customer complain email data access • Marketing Office • Project Service Consumer • CEO Office • Project Service Consumer
  • 7. Starbucks Social Media Presence • Twitter (11.9M followers) • Facebook (37.19M likes) • Instagram (16.2M followers) • Google+ (4.8M followers) • YouTube (158.226K subscribers)
  • 8. Starbucks Email Address for Customer Request/Complain • Online Email Forms • Company Information • Starbucks in the Grocery Aisle • Nutritional Information • Starbucks.com Web Site • Mobile Applications • In Our Stores • Starbucks Rewards • Starbucks Cards • Security Video Request • security@starbucks.com
  • 9. Data Volume Data Sources Data Velocity Average Size Total Size Twitter 40,000,000 tweets 80 byte 3.2 GB Facebook 48,000,000 engagement 200 byte 9.6 GB Instagram 40,000,000 engagement 150 byte 6 GB Google+ 1,000,000 engagement 90 byte 0.09 GB Youtube 158,000 engagement 130 byte 0.02 GB Emails 100,000 engagement 2000 byte 0.2 GB Total: 19.11 GB /month Monthly Data Volume: Pareto 80-20 rule based assumption
  • 10. Vision Diagram External Sources Internal Sources Database Natural Language Processing App Data Storage File System CEO Office Marketing Office
  • 13. Solution Architecture Data Sources Customer Service Email Database Social Media Accounts Data Integration Data Management Platform Data Access Layer Stream Data Collector Batch Data Collector Big Data Lake (Data Landing, Data Archival) Database Engine NLP App JDBC Rest API Data Sanitization App Security Monitoring Availability Scalability Queue BI Tool
  • 14. Vendor Selection Strategy • Reduce the cost of ownership (Available as cloud services) • Reliable and industry proven • Enterprise support availability • Easy to scale
  • 15. Big Data Technology Stack Security Management Framework Database Data Access Visualization
  • 16. Data Ingestion Queue: Kafka
      • Kafka is a scalable, fault-tolerant distributed streaming platform.
      • It is one of the most widely used open source platforms for delivering streaming data with a publish-subscribe (pub-sub) model.
  • 17. Data Cleaning and NLP Processing: Spark
      • Apache Spark is a fast distributed data processing platform.
      • It provides built-in libraries for data processing and machine learning applications.
      • It is a de facto industry standard for big data processing.
  • 18. Data Landing and Storage: Hadoop
      • Hadoop is a distributed storage and computation framework. It has grown from a minimalistic platform into a large ecosystem of tools built on top of it.
      • Its distributed file system, HDFS, makes it well suited to big data storage.
  • 19. Data Archiving: Amazon S3 Storage Block
      • Amazon Web Services provides a storage service called "Simple Storage Service", or S3.
      • S3 supports storing very large volumes of data and is a de facto industry standard for archiving historical data.
      • S3 provides a good management interface and data access security.
  • 20. Database Engine: Cassandra
      • Apache Cassandra is a distributed NoSQL database that is optimized for extremely fast analytics queries.
      • DataStax provides an enterprise management platform for Apache Cassandra.
      • Apache Cassandra does not support ad hoc queries or joins, but it is extremely fast at simple queries and very easy to scale.
  • 21. Visualization: QlikView
      • QlikView is one of the market-leading visualization tools for analytics and real-time reporting.
  • 22. Monitoring: ELK Stack
      • The ELK stack is a de facto industry standard for log monitoring.
      • It uses Elasticsearch, a document database built on the Lucene search engine. Document search makes keyword searches over logs easy.
      • Kibana is a visualization and dashboard tool for monitoring application and infrastructure logs in real time.
  • 23. Management: Oozie & YARN
      • Hadoop YARN helps launch Spark jobs on a multi-node cluster. It is key to managing distributed data processing tasks on the Spark cluster.
      • Oozie is a workflow management tool that chains the different applications into a single workflow.
  • 24. Security: Apache Ranger
      • Apache Ranger provides comprehensive security across the Apache Hadoop ecosystem.
      • It supports different authorization methods on Hadoop, such as role-based access control and attribute-based access control.
      • It also provides centralized auditing of user access and security-related administrative actions across all Hadoop components.
  • 25. Solution Architecture
      [Diagram]
      • Layers: Data Sources, Data Integration, Data Management Platform, Data Access Layer
      • Components: Customer Service Email, Java Spring Streaming, Java Spring Batch-Import, Big Data Platform, Database Engine, NLP App, Data Sanitization App
      • Platform concerns: Security, Monitoring, Availability, Scalability
  • 26. What the System Does
      • Ingestion:
          • The Java streaming app will use the stream APIs of Facebook, Twitter, Google+, Instagram and YouTube to consume data and put it on the Kafka queue.
          • The Java batch app will import a batch of emails every night and put them on Kafka.
      • Data Cleaning:
          • The Spark streaming app will consume data from the Kafka queue, validate it and normalize its format. The cleaned data will be written to the Hadoop HDFS cluster.
      • NLP Processing:
          • The Spark-ML app will read the cleaned data from Hadoop HDFS, append a sentiment score and a topic category to each record, and save the result in Cassandra.
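The cleaning and scoring steps above can be sketched in plain Python. This is only an illustration of the record flow, not the Spark or Spark-ML implementation the slides describe: the two-field schema, the function names and the tiny word lists are hypothetical, and the word-count score is a stand-in for a trained sentiment model.

```python
import re
from typing import Optional

REQUIRED_FIELDS = ("source", "text")  # hypothetical minimal schema

# Hypothetical toy lexicon; a real system would use a trained model instead.
POSITIVE = {"love", "great", "good", "delicious", "friendly"}
NEGATIVE = {"bad", "cold", "rude", "slow", "terrible"}


def clean_record(raw: dict) -> Optional[dict]:
    """Validate a raw message and normalize it; return None if it is invalid."""
    for field in REQUIRED_FIELDS:
        value = raw.get(field)
        if not isinstance(value, str) or not value.strip():
            return None
    text = re.sub(r"\s+", " ", raw["text"]).strip()  # collapse whitespace
    return {"source": raw["source"].strip().lower(), "text": text}


def sentiment_score(text: str) -> float:
    """Toy score in [-1, 1]: (positive - negative) / matched words, 0.0 if none."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


def process(raw: dict) -> Optional[dict]:
    """Clean one record and append its sentiment score; None if rejected."""
    cleaned = clean_record(raw)
    if cleaned is None:
        return None
    return {**cleaned, "sentiment": sentiment_score(cleaned["text"])}


if __name__ == "__main__":
    print(process({"source": "Twitter", "text": "  Love the   latte, great staff "}))
    print(process({"source": "Email", "text": ""}))
```

Separating validation from scoring mirrors the split between the data sanitization app and the NLP app, so rejected records never reach the scoring step.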
  • 27. What the System Does
      • Data Access
          • Cassandra data is exposed through JDBC and a REST API for querying sentiment scores and data category analytics.
          • Existing visualization products can access the data through either JDBC or the REST API for custom business questions and analytics.
      • Data Backup and Archiving
          • Amazon S3 will be used for periodic backup and archiving of data.
      • Scalability, Monitoring, Availability and Security
          • Apache Ranger provides role-based and group-based authorization for data access.
          • The ELK stack is used to monitor logs.
          • All platform vendors were selected for their scalability and availability.
  • 28. Capacity Planning for 6 Months: Storage
      • Data size: 120 GB
      • After replication (factor 3): 3 x 120 GB = 360 GB
      • Amazon S3 Standard Storage (under the 50 TB cap)
  • 29. Capacity Planning for 6 Months: Processing
      Each row: AWS service, instance type, number of instances, specs per instance.
      • Hadoop EMR: c5.2xlarge, 5 instances, 16 GB RAM / 8 cores
      • EC2 for Cassandra: ds2.xlarge, 3 instances, 32 GB RAM / 4 cores
      • EC2 for Kafka: m5.xlarge, 3 instances, 16 GB RAM / 4 cores
      • EC2 for Java Stream App: m5.xlarge, 1 instance, 16 GB RAM / 4 cores
      • EC2 for Java Batch App: m5.xlarge, 1 instance, 16 GB RAM / 4 cores
      • EC2 for ELK Stack: ds2.xlarge, 1 instance, 32 GB RAM / 4 cores
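The storage figures can be reproduced with a short calculation. The sketch below assumes the 120 GB figure comes from six months at the 19.11 GB/month estimate from the Data Volume slide, rounded up to the next 10 GB; the slides do not state that rounding rule, so treat it as a guess. The variable names are illustrative.

```python
import math

MONTHLY_GB = 19.11       # monthly estimate from the Data Volume slide
MONTHS = 6               # planning horizon from this slide
REPLICATION_FACTOR = 3   # replication factor from this slide

raw_gb = MONTHLY_GB * MONTHS  # about 114.66 GB before rounding

# Assumption: round up to the next 10 GB to leave headroom, which gives 120 GB.
planned_gb = math.ceil(raw_gb / 10) * 10

replicated_gb = planned_gb * REPLICATION_FACTOR  # storage needed after replication

if __name__ == "__main__":
    print(f"Raw 6-month volume: {raw_gb:.2f} GB")
    print(f"Planned data size:  {planned_gb} GB")
    print(f"After replication:  {replicated_gb} GB")
```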
  • 30. Data Backup and Data Archival Strategy
      • Data Archival
          • Dataset selection for archival (we are not going to archive everything)
          • Data archiving stage: 2 years
          • Email record retention: 7 years
          • Social media data retention: 5 years
      • Data Backup
          • Data backup interval: 24 hours
          • Total data backup history retention: 5 (keep the last 5 backups)
  • 31. GDPR Compliance Plan
      • Data Control
      • Data Security
      • Right to Erasure
      • Risk Mitigation and Due Diligence
      • Breach Notification
      The Chief Data Officer is responsible for meeting GDPR compliance requirements.
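The backup rules above (one backup every 24 hours, keep the last 5) can be expressed as a small retention check. This is a sketch with hypothetical function names; it works on timestamps only and does not call any S3 or backup tool API.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

BACKUP_INTERVAL = timedelta(hours=24)  # backup interval from the slide
KEEP_LAST = 5                          # backup history retention from the slide


def backup_due(last_backup: datetime, now: datetime,
               interval: timedelta = BACKUP_INTERVAL) -> bool:
    """Return True when at least one full interval has passed since the last backup."""
    return now - last_backup >= interval


def backups_to_delete(timestamps: Iterable[datetime],
                      keep_last: int = KEEP_LAST) -> List[datetime]:
    """Return the backups that fall outside the newest `keep_last`, newest first."""
    ordered = sorted(timestamps, reverse=True)
    return ordered[keep_last:]


if __name__ == "__main__":
    start = datetime(2018, 4, 1)
    history = [start + timedelta(days=d) for d in range(7)]  # seven daily backups
    print("Delete:", backups_to_delete(history))
    print("Due:", backup_due(history[-1], history[-1] + timedelta(hours=25)))
```

Returning the list of expired backups instead of deleting them directly keeps the retention rule testable without touching storage.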