Article

Enabling Big Data Analytics with Modeling Workbench
Authors: Ravishankar Rajagopalan
and Dhanesh Padmanabhan
Data Science Infrastructure (DSI) Team
Data Sciences Group
[24]7 Innovation Labs
Bangalore, India

[24]7 Inc accumulates several gigabytes of data from web, mobile, chat and IVR
channels every day. Innovation Labs (iLabs), the technology division of [24]7,
provides predictive analytics solutions to improve customer experience. The Data
Sciences Group (DSG) of iLabs is primarily responsible for developing
statistical and machine learning models that predict customer intent. These
models are used to offer contextual chat or self-serve applications on the web
channel, or a contextual IVR menu on the IVR channel. This reduces the time
a customer needs to locate the information they are seeking, thereby
improving the overall experience.
The data scientists at DSG are required to analyze enormous amounts of data
to develop new insights and models that can accurately predict customer intent.
There is also a constant need to improve the models as customer behavior
evolves and the business landscape of our customers changes, which requires
continual monitoring and updating of models. The Data Science
Infrastructure (DSI) team is primarily responsible for building scalable analytics
products that equip the data scientists with tools to quickly analyze data, develop
models and monitor model performance. Modeling Workbench is one such
tool developed by DSI.

What is the Modeling Workbench?
Modeling Workbench is one of the products that DSI conceptualized and developed
in collaboration with the Platform Engineering (PE) team of iLabs. It is currently
being piloted for the web channel. Workbench is a web-based tool that lets data
scientists analyze millions of online customer journeys, develop quick
insights and build models at scale for improved online predictive targeting.
Workbench is expected to support Exploratory Data Analysis (EDA), model
building and validation, and simulation. Model deployment and model monitoring
are supported by other internal tools developed at iLabs. Feedback from the
production systems drives model improvements.

[Figure: Modeling Life Cycle. Labels: Development, Production, Big Data, Exploratory Data Analysis, Model Building, Model Validation, Model Simulation, Model Deployment, Model Monitoring.]

Follow [24]7 India
www.247-inc.com
EDA is the process of using standardized statistical procedures such as
univariate and bivariate analysis to extract variables (features) of interest for the
problem at hand (for example, predicting an online user’s purchase intent). These
variables are then used for model building. Model building and validation involve
implementing several advanced statistical and machine learning algorithms and
picking the best-performing model. Simulation is used for understanding the
dynamics of the model in real time. These phases are iterative, and a data
scientist typically goes through several iterations to identify the most effective
model.
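
To make the EDA step concrete, here is a minimal sketch of a univariate and a bivariate analysis over a handful of customer journeys. The record layout and the field names (`pages_viewed`, `visited_cart`, `purchased`) are assumptions made for illustration; the article does not describe the workbench's actual schema or procedures.

```python
from statistics import mean, median

# Hypothetical journey records; the field names are illustrative only.
journeys = [
    {"pages_viewed": 3, "visited_cart": False, "purchased": False},
    {"pages_viewed": 12, "visited_cart": True, "purchased": True},
    {"pages_viewed": 7, "visited_cart": True, "purchased": False},
    {"pages_viewed": 1, "visited_cart": False, "purchased": False},
    {"pages_viewed": 9, "visited_cart": True, "purchased": True},
    {"pages_viewed": 4, "visited_cart": False, "purchased": False},
]


def univariate_summary(rows, field):
    """Univariate view: describe a single numeric variable on its own."""
    values = [row[field] for row in rows]
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


def purchase_rate_by(rows, field):
    """Bivariate view: purchase rate for each level of a candidate feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row["purchased"])
    return {level: sum(flags) / len(flags) for level, flags in groups.items()}


summary = univariate_summary(journeys, "pages_viewed")
rates = purchase_rate_by(journeys, "visited_cart")
print(summary)
print(rates)
```

A feature whose levels show clearly different purchase rates, as `visited_cart` does in this toy data, would be a candidate to carry forward into model building.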

The workbench provides customized data analytics functionality at the click of
a button and is expected to save the data scientists considerable time and
effort. Being highly scalable, the workbench can be used to analyze 100+
million customer journeys in a few minutes. In addition, the workbench
incorporates best practices to be adopted during the different phases of modeling
and facilitates standardization of analyses across DSG.

Benefits of Modeling Workbench

Productivity: Reduce time to analyze data and build models by 50-75%
Scalability: Provide ability to build and simulate models with millions of customer journeys in a few minutes
Standardization: Standardize model building and analysis

What is the Technology behind the Workbench?
Data scientists at [24]7 have traditionally used relational databases
in conjunction with statistical modeling and data mining software such as R and
Python for analyzing data. The process involved writing custom SQL
scripts on relational databases to prepare the datasets and moving these
prepared datasets to other computing infrastructure, where R and Python scripts
were used for analysis and model building. This traditional approach severely
limits the size of the data one can analyze, since most statistical modeling
software requires the dataset to fit in memory.

[Figure: The Modeling Workbench Architecture. Components: Weblogs, Big Data Stack, Columnar DB, Workbench Backend, Java Front End, Data Scientists.]

The workbench solves these issues by connecting users through a central
web-based application to an analytical database, which is based on a
distributed columnar database technology. The workbench exposes a standard
set of analyses that execute as server-side SQL or R scripts running directly on
the columnar database. The tight integration of R and columnar database
technology allows for scalable data analytics without the need for data
movement.
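
The idea of running an analysis where the data lives can be sketched as follows. This is only an illustration: `sqlite3` stands in for the distributed columnar database, and the `journeys` table and its columns are hypothetical. The point is that the aggregation runs inside the database, so only one summary row per group is returned to the caller instead of every journey.

```python
import sqlite3

# sqlite3 is a local stand-in for the analytical database; the table
# and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journeys ("
    "journey_id INTEGER, device TEXT, pages_viewed INTEGER, purchased INTEGER)"
)
conn.executemany(
    "INSERT INTO journeys VALUES (?, ?, ?, ?)",
    [
        (1, "desktop", 5, 1),
        (2, "desktop", 2, 0),
        (3, "mobile", 8, 1),
        (4, "mobile", 3, 0),
        (5, "mobile", 1, 0),
        (6, "desktop", 9, 1),
    ],
)


def purchase_rate_by_device(connection):
    """Aggregate inside the database so only one row per group leaves it."""
    query = """
        SELECT device, COUNT(*) AS journeys, AVG(purchased) AS purchase_rate
        FROM journeys
        GROUP BY device
        ORDER BY device
    """
    return connection.execute(query).fetchall()


result = purchase_rate_by_device(conn)
for device, journey_count, purchase_rate in result:
    print(device, journey_count, round(purchase_rate, 3))
```

With the traditional approach, all six rows (or, at scale, many millions) would first be copied into the memory of the analysis process before the same rates could be computed.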
The distributed columnar database obtains its data from Hive tables, where
weblogs are transformed on a daily basis using Python MapReduce
scripts within Hive. The workbench itself is a Java-based web application that
accesses the data in the distributed columnar database remotely. The
analyses performed by data scientists are cached in an application database
powered by MongoDB, which ensures quick retrieval of results from
previously saved analyses. The saved analyses are shareable across the team
for effective collaboration.
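
The caching of saved analyses can be pictured with the sketch below. The article does not say how the workbench keys or stores its cached results, so everything here is an assumption: a plain dictionary stands in for the MongoDB-backed application database, and the `AnalysisCache` class, the `slow_analysis` function and the parameter names are hypothetical.

```python
import hashlib
import json


class AnalysisCache:
    """In-memory stand-in for the application database that stores results."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    @staticmethod
    def key_for(analysis_name, params):
        # Sort keys so that equivalent requests map to the same cache key.
        payload = json.dumps(
            {"analysis": analysis_name, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, analysis_name, params, compute):
        """Return a saved result if one exists; otherwise compute and save it."""
        key = self.key_for(analysis_name, params)
        if key not in self._store:
            self.misses += 1
            self._store[key] = compute(params)
        return self._store[key]


def slow_analysis(params):
    # Placeholder for a query against the analytical database.
    return {"rows_scanned": params["days"] * 1000}


cache = AnalysisCache()
first = cache.get_or_compute(
    "univariate", {"days": 7, "field": "pages_viewed"}, slow_analysis
)
second = cache.get_or_compute(
    "univariate", {"field": "pages_viewed", "days": 7}, slow_analysis
)
print(first, second, cache.misses)
```

Because the key depends only on the analysis name and its parameters, a second data scientist who requests the same analysis would get the stored result without touching the analytical database again.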

The Modeling Workbench provides a scalable analytics platform for quickly
crunching data, generating useful insights, and building advanced statistical &
machine learning models. The current version supports the analysis of web
channel data. Future versions are expected to include natural language
processing, text and speech analytics for data obtained from [24]7’s chat and
IVR platforms.

About the Authors
Dhanesh Padmanabhan leads the Data Science Infrastructure team within the
[24]7 Data Sciences Group (DSG). He is responsible for developing
the analytics infrastructure and the prediction platform for DSG. He has 10
years of experience in marketing analytics at R&D, KPO and consulting
companies including General Motors R&D, HP Analytics and Marketics
Technologies (now WNS). He holds a Ph.D. in Mechanical Engineering from
the University of Notre Dame.
Ravishankar Rajagopalan is a Principal Analytics Consultant in the [24]7 Data
Science Infrastructure (DSI) team. He is the DSG (Data Sciences Group) lead
for the Modeling Workbench project. Prior to [24]7, he worked with GE
Power and Water as part of their Advanced Analytics team, and with Mu Sigma. He
holds a Ph.D. in Applied Statistics from The Ohio State University.
