Enabling Big Data Analytics with Modeling Workbench
Authors: Ravishankar Rajagopalan
and Dhanesh Padmanabhan
Data Science Infrastructure (DSI) Team
Data Sciences Group
[24]7 Innovation Labs
Bangalore, India
[24]7 Inc accumulates several gigabytes of data from web, mobile, chat and IVR
channels every day. Innovation Labs (iLabs), the technology division of [24]7,
provides predictive analytics solutions to improve customer experience. The Data
Sciences Group (DSG) of iLabs is primarily responsible for developing
statistical and machine learning models that predict customer intent. These
models are used to offer contextual chat or a self-serve application on the web
channel, or a contextual menu on the IVR channel. This reduces the time a
customer needs to locate the information they are seeking, thereby improving
the overall experience.
The data scientists at DSG are required to analyze enormous amounts of data
to develop new insights and models that can accurately predict customer intent.
The models also need constant improvement because customer behavior evolves and
the business landscape of our customers changes, which requires continual
monitoring and updating of models. The Data Science Infrastructure (DSI) team is
primarily responsible for building scalable analytics products that equip the
data scientists with tools to quickly analyze data, develop models and monitor
model performance. Modeling Workbench is one such tool developed by DSI.
What is the Modeling Workbench?
Modeling Workbench is one of the products that DSI conceptualized and developed
in collaboration with the Platform Engineering (PE) team of iLabs. It is
currently being piloted for the web channel. Workbench is a web-based tool that
lets data scientists analyze millions of online customer journeys, develop quick
insights and build models at scale for improved online predictive targeting.
Workbench is expected to support Exploratory Data Analysis (EDA), model
building and validation, and simulation. Model deployment and model monitoring
are supported by other internal tools developed at iLabs. The feedback from the
production systems drives the model improvements.
[Figure: Modeling Life Cycle. Labels: Big Data; Development; Production;
Exploratory Data Analysis; Model Building; Model Validation; Model Simulation;
Model Deployment; Model Monitoring.]
Follow [24]7 India
www.247-inc.com
EDA is the process of using standardized statistical procedures such as
univariate and bivariate analysis to extract variables (features) of interest for
the problem at hand (for example, predicting an online user's purchase intent).
These features are then used for model building. Model building and validation
involves implementing several advanced statistical and machine learning
algorithms and picking the best-performing model. Simulation is used for
understanding the dynamics of the model in real time. These phases are
iterative, and a data scientist typically goes through several iterations to
identify the most effective model.
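
The univariate and bivariate analyses described above can be illustrated with a minimal sketch. The journey records, the feature names (pages_viewed, referrer) and both helper functions are hypothetical and chosen only for illustration; they are not the Workbench's actual data model or API.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical journey records: (pages_viewed, referrer, purchased).
# These values are invented for illustration only.
journeys = [
    (3, "search", 0),
    (12, "email", 1),
    (7, "search", 1),
    (2, "direct", 0),
    (9, "email", 1),
    (4, "direct", 0),
]


def univariate_summary(values):
    """Univariate analysis: basic statistics for one numeric feature."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def purchase_rate_by(records, key_index, target_index):
    """Bivariate analysis: mean of a binary target per category value."""
    totals = defaultdict(lambda: [0, 0])  # value -> [purchases, journeys]
    for record in records:
        bucket = totals[record[key_index]]
        bucket[0] += record[target_index]
        bucket[1] += 1
    return {value: hits / count for value, (hits, count) in totals.items()}


pages_summary = univariate_summary([j[0] for j in journeys])
rate_by_referrer = purchase_rate_by(journeys, key_index=1, target_index=2)
```

A feature whose categories show clearly different purchase rates, as referrer does in this toy data, would be a candidate for the model building step.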
The workbench provides customized data analytics functionality at the click of
a button and is expected to save considerable time and effort for the data
scientists. Being highly scalable, the workbench could be used to analyze 100+
million customer journeys in a few minutes. In addition, the workbench
incorporates best practices to be adopted during the different phases of
modeling and facilitates standardization of analyses across DSG.
Benefits of Modeling Workbench
Productivity: Reduce time to analyze data and build models by 50-75%
Scalability: Provide ability to build and simulate models with millions of
customer journeys in a few minutes
Standardization: Standardize model building and analysis
What is the Technology behind the Workbench?
Data scientists at [24]7 have traditionally used relational databases in
conjunction with statistical modeling and data mining software such as R and
Python for analyzing data. The process involved writing custom SQL scripts on
relational databases to prepare the datasets and moving these prepared datasets
to other computing infrastructure, where R and Python scripts were used for
analysis and model building. This traditional approach severely limits the size
of data one can analyze, since most statistical modeling software is bound by
available memory.
[Figure: The Modeling Workbench Architecture. Labels: Weblogs; Big Data Stack;
Columnar DB; Workbench Backend; Java Front End; Data Scientists.]
The workbench solves these issues by connecting users through a central
web-based application to an analytical database, which is based on a
distributed columnar database technology. The workbench exposes a standard
set of analyses that execute as server-side SQL or R scripts running directly on
the columnar database. The tight integration of R and columnar database
technology allows for scalable data analytics without the need for data
movement.
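
The idea of running an analysis inside the database, so that only a small summary leaves it, can be sketched as follows. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the distributed columnar database, and the journeys table, its columns and the function name are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the distributed columnar database.
# The table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journeys (visitor_id INTEGER, referrer TEXT, purchased INTEGER)"
)
conn.executemany(
    "INSERT INTO journeys VALUES (?, ?, ?)",
    [
        (1, "search", 0),
        (2, "email", 1),
        (3, "search", 1),
        (4, "direct", 0),
        (5, "email", 1),
    ],
)


def purchase_rate_by_referrer(connection):
    """Run the aggregation server-side and fetch only the summary rows."""
    query = """
        SELECT referrer, COUNT(*) AS journeys, AVG(purchased) AS purchase_rate
        FROM journeys
        GROUP BY referrer
        ORDER BY referrer
    """
    return connection.execute(query).fetchall()


summary = purchase_rate_by_referrer(conn)
```

The client receives one row per referrer regardless of how many journeys the table holds, which is what keeps memory use on the analysis side small.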
The distributed columnar database obtains its data from Hive tables, where
weblogs are transformed on a daily basis using Python MapReduce scripts within
Hive. The workbench itself is a Java-based web application that accesses the
data in the distributed columnar database remotely. The analyses performed by
data scientists are cached in an application database powered by MongoDB, which
ensures quick retrieval of results from previously saved analyses. The saved
analyses are shareable across the team for effective collaboration.
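
One way to picture the caching of analyses is a store keyed by a hash of the analysis specification. The sketch below is an assumption about how such a cache could work, not a description of the Workbench's implementation: a Python dict stands in for the MongoDB collection, and the AnalysisCache class, run_analysis function and specification fields are hypothetical.

```python
import hashlib
import json


class AnalysisCache:
    """Hypothetical result cache; a dict stands in for a MongoDB collection."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key_for(spec):
        # Serialize with sorted keys so equal specs always hash identically.
        canonical = json.dumps(spec, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, spec, compute):
        # Run the analysis only when no saved result exists for this spec.
        key = self.key_for(spec)
        if key not in self._store:
            self._store[key] = compute(spec)
        return self._store[key]


calls = []


def run_analysis(spec):
    """Placeholder for an expensive server-side analysis."""
    calls.append(spec)
    return {"analysis": spec["analysis"], "feature": spec["feature"]}


cache = AnalysisCache()
spec_a = {"analysis": "univariate", "feature": "pages_viewed"}
spec_b = {"feature": "pages_viewed", "analysis": "univariate"}  # same spec, reordered

first = cache.get_or_compute(spec_a, run_analysis)
second = cache.get_or_compute(spec_b, run_analysis)
```

Because the key depends only on the content of the specification, a colleague who requests the same analysis would get the saved result without rerunning it.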
The Modeling Workbench provides a scalable analytics platform for quickly
crunching data, generating useful insights, and building advanced statistical and
machine learning models. The current version supports the analysis of web
channel data. Future versions are expected to include natural language
processing, text and speech analytics for data obtained from [24]7’s chat and
IVR platforms.
About the Authors
Dhanesh Padmanabhan leads the Data Science Infrastructure team with the
[24]7 Data Sciences Group (DSG). He holds the responsibilities of developing
the analytics infrastructure and the prediction platform for DSG. He has 10
years of experience in marketing analytics in R&D, KPO and Consulting
companies including General Motors R&D, HP Analytics and Marketics
Technologies (now WNS). He holds a Ph.D. in Mechanical Engineering from
the University of Notre Dame.
Ravishankar Rajagopalan is a Principal Analytics Consultant in the [24]7 Data
Science Infrastructure (DSI) team. He is the DSG (Data Sciences Group) lead
for the modeling workbench project. Prior to [24]7, he worked with GE
Power and Water, as part of their Advanced Analytics team, and with Mu Sigma.
He holds a Ph.D. in Applied Statistics from The Ohio State University.