Northeastern University, Class 7250: Big Data Architecture and Governance (assignment).
A big data project proposal based on a Starbucks case study.
2. “Your brand is what people say about you when you’re not in the room.”
(Jeff Bezos)
3. Project Goal and Objectives
Goal: Better understand customer sentiment toward the Starbucks brand and
services by applying AI technologies to social media and customer-service
email data.
Objectives:
• Improve products and service delivery based on customer feedback
• Measure the effectiveness of marketing campaigns
4. Value Proposition
• Cost of NOT implementing:
• Roughly $52.5 million per quarter in forgone revenue (1% of quarterly revenue)
• Strategic Initiative:
• Aligns with a core company value: best customer experience
• Helps the company adopt the AI revolution
• Improves marketing-campaign effectiveness through feedback analytics
5. What are the obstacles to getting there?
• The company does not own the technology and resources required to
capture social media data and process it for business value.
• The company does not have talent with the technical expertise required
for such a project.
6. Who are the key stakeholders and what are
their roles?
• Enterprise Data Governance Office (Data Owner)
• Data Governance Council (Policy Design and Decision Body)
• Chief Data Officer (execution and management; ensures data compliance)
• Data Steward
• Project Manager (Responsible for timely delivery)
• HR Office
• Talent Management (allocates resources and hires talent)
• Sales Office
• Provides access to customer complaint email data
• Marketing Office
• Project Service Consumer
• CEO Office
• Project Service Consumer
8. Starbucks Email Addresses for Customer Requests/Complaints
• Online Email Forms
• Company Information
• Starbucks in the Grocery Aisle
• Nutritional Information
• Starbucks.com Web Site
• Mobile Applications
• In Our Stores
• Starbucks Rewards
• Starbucks Cards
• Security Video Request
• security@starbucks.com
9. Data Volume
Data Source   Velocity (per month)      Avg. Record Size   Total Size
Twitter       40,000,000 tweets         80 bytes           3.2 GB
Facebook      48,000,000 engagements    200 bytes          9.6 GB
Instagram     40,000,000 engagements    150 bytes          6 GB
Google+       1,000,000 engagements     90 bytes           0.09 GB
YouTube       158,000 engagements       130 bytes          0.02 GB
Emails        100,000 emails            2,000 bytes        0.2 GB
Total monthly data volume: 19.11 GB / month
(Velocity estimates are based on a Pareto 80-20 rule assumption.)
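Each per-source total above is simply record count times average record size; a quick sketch verifying the monthly total from the table's own figures (using 1 GB = 10^9 bytes):

```python
# Monthly volume per source = record count x average record size (bytes).
# Counts and sizes are taken directly from the table above.
sources = {
    "Twitter":   (40_000_000, 80),
    "Facebook":  (48_000_000, 200),
    "Instagram": (40_000_000, 150),
    "Google+":   (1_000_000, 90),
    "YouTube":   (158_000, 130),
    "Emails":    (100_000, 2000),
}

def monthly_gb(count, avg_bytes):
    """Monthly volume for one source, in gigabytes (1 GB = 10^9 bytes)."""
    return count * avg_bytes / 1e9

total_gb = sum(monthly_gb(c, b) for c, b in sources.values())
print(round(total_gb, 2))  # 19.11
```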
14. Vendor Selection Strategy
• Reduce the cost of ownership (Available as cloud services)
• Reliable and industry proven
• Enterprise support availability
• Easy to scale
15. Big Data Technology Stack
Security
Management
Framework
Database
Data Access
Visualization
16. Data Ingestion Queue: Kafka
• Kafka is a distributed streaming platform that is scalable and fault-tolerant.
• It is one of the most widely used open-source queuing platforms for
delivering streaming data via a publish-subscribe model.
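The publish side can be sketched as follows. This is a minimal sketch, assuming the `kafka-python` client; the broker address, topic name (`social-raw`), and record fields are illustrative, not part of the proposal:

```python
import json
from datetime import datetime, timezone

def to_kafka_payload(source, post_id, text):
    """Serialize one social-media record to the JSON bytes we would publish."""
    record = {
        "source": source,            # e.g. "Twitter"
        "id": post_id,
        "text": text,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record).encode("utf-8")

def publish(payloads, broker="localhost:9092", topic="social-raw"):
    """Publish payloads to Kafka. Requires a running broker; not run here."""
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(bootstrap_servers=broker)
    for payload in payloads:
        producer.send(topic, payload)
    producer.flush()
```

The pub-sub model lets the cleaning job (a separate consumer group) read the same topic independently of any other downstream consumer.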
17. Data Cleaning and NLP Processing: Spark
• Apache Spark is a fast, distributed data-processing platform.
• It provides built-in support and libraries for data processing and
machine-learning applications.
• It is the industry's de facto tool for big data processing.
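The per-record validation and normalization that the Spark streaming job would apply (for example, inside a map over the Kafka stream) can be sketched in plain Python; the field names here are illustrative assumptions, not a fixed schema:

```python
import json

# Fields a record must carry to be considered valid (an assumption).
REQUIRED = ("source", "id", "text")

def clean_record(raw_bytes):
    """Validate and normalize one raw record; return a dict, or None to drop it."""
    try:
        record = json.loads(raw_bytes)
    except (ValueError, UnicodeDecodeError):
        return None  # drop malformed JSON
    if not all(record.get(key) for key in REQUIRED):
        return None  # drop incomplete records
    record["text"] = " ".join(record["text"].split())  # collapse whitespace
    record["source"] = record["source"].lower()        # normalize source name
    return record
```

Records that survive this step would then be written to HDFS for the NLP stage.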
18. Data Landing and Storage: Hadoop
• Hadoop is a distributed storage and computation framework that has
grown from a minimal platform into a large ecosystem of tools.
• It is ideal for big data storage thanks to its distributed file system,
HDFS.
19. Data Archiving: Amazon S3
• Amazon Web Services provides an object storage service called "Simple
Storage Service" (S3).
• S3 supports storage of very large datasets and is a de facto industry
standard for archiving historical data.
• S3 provides a strong management interface and data-access security.
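A date-partitioned key scheme keeps archived objects easy to list and to expire by retention period. A minimal sketch; the bucket prefix and file naming are assumptions, and the actual upload would use an S3 client such as boto3 (not shown):

```python
from datetime import date

def archive_key(source, day, prefix="archive"):
    """Build a date-partitioned S3 object key for one day's data from a source.

    Example shape: archive/twitter/2018/01/31/part-0000.json.gz
    """
    return f"{prefix}/{source}/{day:%Y/%m/%d}/part-0000.json.gz"

print(archive_key("twitter", date(2018, 1, 31)))
# archive/twitter/2018/01/31/part-0000.json.gz
```

Partitioning by source and day means the retention policies in the archival plan (different periods for emails and social data) can be applied by deleting whole prefixes.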
20. Database Engine: Cassandra
• Apache Cassandra is a distributed NoSQL database optimized for very
fast analytical queries.
• DataStax provides an enterprise management platform for Apache
Cassandra.
• Cassandra does not support ad-hoc queries or joins, but it is extremely
fast at simple query processing and very easy to scale.
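Because Cassandra has no joins and queries must follow the partition key, the table would be designed around the read path (sentiment by source and day). A sketch of such a schema; the keyspace, table, and column names are illustrative assumptions:

```sql
-- Illustrative CQL schema; names and replication settings are assumptions.
CREATE KEYSPACE IF NOT EXISTS feedback
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS feedback.sentiment_by_source (
  source    text,     -- twitter, facebook, email, ...
  day       date,
  post_id   text,
  sentiment double,   -- score appended by the Spark-ML app
  topic     text,     -- topic categorization
  PRIMARY KEY ((source, day), post_id)
);
```

With `(source, day)` as the partition key, a dashboard query for one source on one day hits a single partition, which is the access pattern Cassandra is fastest at.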
21. Visualization: QlikView
• Qlik is one of the market-leading visualization tools for analytics
and real-time reporting.
22. Monitoring: ELK Stack
• The ELK stack is a de facto industry standard for log monitoring.
• Elasticsearch, a document database built on the Lucene search engine,
provides easy keyword search over logs.
• Kibana is a visualization and dashboard tool for monitoring application
and infrastructure logs visually in real time.
23. Management: Oozie & Yarn
• Hadoop YARN launches Spark jobs across a multi-node cluster and is key
to managing distributed data-processing tasks on the Spark cluster.
• Oozie is a workflow management tool for orchestrating the different
applications into a single workflow.
24. Security: Apache Ranger
• Apache Ranger provides comprehensive security across the Apache
Hadoop ecosystem.
• It supports multiple authorization models on Hadoop, including
role-based and attribute-based access control.
• It also centralizes auditing of user access and security-related
administrative actions across all Hadoop components.
26. What the System Does
• Ingestion:
• A Java streaming app will consume the streaming APIs of Facebook, Twitter,
Google+, Instagram, and YouTube and publish the data to a Kafka queue.
• A Java batch app will import each day's emails every night and publish
them to Kafka.
• Data Cleaning:
• A Spark streaming app will consume data from the Kafka queue, validate
records, and normalize their format. The cleaned data will be written to a
Hadoop HDFS cluster.
• NLP Processing:
• A Spark-ML app will read the cleaned data from HDFS, append a sentiment
score and topic categorization, and save the results to Cassandra.
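The sentiment step above would use a trained Spark-ML model; as a stand-in to illustrate the shape of the output (a score appended to each record), here is a minimal lexicon-based sketch, with word lists that are purely illustrative:

```python
# Illustrative stand-in for the Spark-ML sentiment model, NOT the real
# pipeline: counts positive vs. negative words from tiny example lexicons.
POSITIVE = {"love", "great", "good", "awesome", "delicious"}
NEGATIVE = {"hate", "bad", "awful", "slow", "rude"}

def sentiment_score(text):
    """Score in [-1, 1]: (positive - negative) / total sentiment-bearing words."""
    words = text.lower().split()
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits

print(sentiment_score("love the great coffee but rude staff"))  # 1/3
```

In the real pipeline, this function's role is played by a model scoring each cleaned record before the row (with its sentiment and topic) is written to Cassandra.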
27. What the System Does
• Data Access
• Cassandra provides JDBC and REST APIs for querying sentiment scores and
category analytics.
• Existing visualization products can access the data through either the
JDBC or REST API for custom business questions and analytics.
• Data Backup and Archiving
• Amazon S3 will be used for periodic backup and archiving of data.
• Scalability, Monitoring, Availability and Security
• Apache Ranger provides role-based and group-based authorization for
data access.
• The ELK stack is used to monitor logs.
• All platform vendors were selected for their scalability and
availability.
28. Capacity Planning for 6 Months
Storage:
• Data size: 120 GB
• With a replication factor of 3: 3 × 120 GB = 360 GB
• Amazon S3 Standard storage (under the 50 TB cap)
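The 120 GB figure follows from the earlier volume estimate; a quick check, assuming the ~19.11 GB/month figure is rounded up to 20 GB/month for planning headroom:

```python
import math

# Six-month storage estimate from the ~19.11 GB/month figure, rounded up
# to a whole number of GB per month; replication factor 3 as in the plan.
monthly_gb = 19.11
months = 6
replication = 3

raw_gb = math.ceil(monthly_gb) * months   # 20 * 6 = 120 GB
replicated_gb = replication * raw_gb      # 360 GB
print(raw_gb, replicated_gb)  # 120 360
```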
30. Data Backup and Data Archival Strategy
• Data Archival
• Dataset Selection for Archival (we are not going to archive everything)
• Data Archiving Stage: 2 years (data moves to the archive after 2 years)
• Email Record Retention: 7 years
• Social Media Data Retention: 5 years
• Data Backup
• Data Backup Interval: 24 hr
• Total Data Backup History Retention: 5 (keep last 5 backups)
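The keep-last-5 rule can be sketched as a simple pruning function over dated backup names; the naming convention (`backup-YYYY-MM-DD`) is an illustrative assumption under which lexical order equals chronological order:

```python
def backups_to_delete(backup_names, keep=5):
    """Return the oldest backups beyond the newest `keep`, for deletion."""
    ordered = sorted(backup_names)  # oldest first (lexical == chronological)
    return ordered[:-keep] if len(ordered) > keep else []

# Seven daily backups -> the two oldest are pruned.
names = [f"backup-2018-04-{d:02d}" for d in range(1, 8)]
print(backups_to_delete(names))  # ['backup-2018-04-01', 'backup-2018-04-02']
```

Run once per 24-hour backup interval, this keeps the retention history bounded at 5 copies.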
31. GDPR Compliance Plan
• Data Control
• Data Security
• Right to Erasure
• Risk Mitigation and Due Diligence
• Breach Notification
Responsibility for meeting GDPR compliance rests with the Chief Data Officer.