Trends in Data Analytics
Ramakrishnan Venkataramanan
sv.ramakrishnan@gmail.com
Introduction
• Big Data may well be the Next Big Thing in the IT world.
• Big data burst upon the scene in the first decade of the 21st century.
• The first organizations to embrace it were online and startup firms.
Firms like Google, eBay, LinkedIn, and Facebook were built around big
data from the beginning.
• Like many new information technologies, big data can bring about
dramatic cost reductions, substantial improvements in the time
required to perform a computing task, or new product and service
offerings.
What is BIG DATA?
• ‘Big Data’ is similar to ‘small data’, but bigger in size.
• Because the data is bigger, it requires different approaches:
• Techniques, tools and architecture
• The aim is to solve new problems, or old problems in a better way.
• Big Data generates value from the storage and processing of very
large quantities of digital information that cannot be analyzed with
traditional computing techniques.
DATA Analytics
• SAS/BI tools, only in pockets
• Structured data
• Private data center and traditional DWH/HPCE
How is this different?
• Very high on cost, and takes years to set up and stabilize
• Only top vendors lead the innovation
BIG DATA Analytics
• R/Python, across the enterprise
• IoT
• Cloud + Hadoop
How is this different?
• Low on cost and quick to set up
• Open source leads the innovation, with many libraries for users
How Is Big Data Different?
1) Automatically generated by a machine
(e.g. Sensor embedded in an engine)
2) Typically an entirely new source of data
(e.g. Use of the internet)
3) Not designed to be friendly
(e.g. Text streams)
4) May not have much value
• Need to focus on the important part
Types of tools used in Big-Data
• Where is processing hosted?
• Distributed Servers / Cloud (e.g. Amazon EC2)
• Where is data stored?
• Distributed Storage (e.g. Amazon S3)
• What is the programming model?
• Distributed Processing (e.g. MapReduce)
• How is data stored & indexed?
• High-performance schema-free databases (e.g. MongoDB)
• What operations are performed on data?
• Analytic / Semantic Processing
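The MapReduce programming model named in this list can be illustrated with a minimal single-machine sketch. It is not Hadoop's API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are assumptions made for illustration. In a real cluster, each phase would run in parallel across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical input; each string stands in for one distributed file block.
    docs = ["big data is big", "data is stored in distributed storage"]
    print(word_count(docs))
```

The map and reduce steps only ever see one document or one key at a time, which is what lets a framework spread the work across distributed servers.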
Data Analytics in Match Making
• Fraud analytics
• Payment analytics
• Campaign analytics
• ML match making
• Compliant analytics
• Customer analytics
• Chat bots
Data Analytics in Auto Industry
• Big Data is applied in three areas of operations
• Design
• Manufacturing
• After-sales Support
• Simulation of each engine generates tens of terabytes of data
• Based on the historical performance of engines, each simulation helps identify whether the
particular simulated engine would be a successful one
• For after-sales support, engines and propulsion systems transmit gigabytes
of data in real time to support engineers, who decide the best course of
action
• This also helps support personnel identify the conditions for
maintenance in advance, based on the factors and environments under
which the engines have been functioning
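One simple way to spot maintenance conditions in advance is to flag sensor readings that drift far from their recent baseline. The sketch below is an illustration only, not the method used by any manufacturer: the function name flag_anomalies, the window size, the threshold and the sample temperatures are all assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the indexes of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip perfectly flat histories, where the deviation is undefined.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical exhaust temperatures streamed from one engine.
    temperatures = [600, 602, 599, 601, 600, 601, 650, 600, 601]
    print(flag_anomalies(temperatures))
```

A production system would work on many sensors at once and use the engine's operating history, but the idea of comparing each reading with its own recent behavior is the same.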
Data Analytics in Retail
• A widely used application of analytics is predicting store-wide sales for each
product on a weekly basis
• Clustering and segmentation of products help in understanding the joint behavior of
products and the way customers purchase the clusters
• This helps in placing replenishment orders well in advance and in the most
economical and intelligent way
• Application of In-Database Analytics
• Deploying analytics where the data is stored rather than moving data for external analytics
• Moving towards a Hadoop-based data lake model in a cloud-based
repository
• Data scientists and business heads of the organization all around the world can have access
to data
• A recent addition is the use of sensor data from electronic goods to predict when
replacement parts or servicing will be needed
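The in-database analytics idea above can be sketched with Python's built-in sqlite3 module: the aggregation runs inside the database and only the small summary leaves it. The table name sales, its columns and the sample rows are assumptions made for illustration. A retailer would run the same kind of query on its own warehouse or data lake engine.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store TEXT, product TEXT, week INTEGER, units INTEGER)"
    )
    rows = [
        ("S1", "soap", 1, 10),
        ("S2", "soap", 1, 7),
        ("S1", "soap", 2, 12),
        ("S1", "rice", 1, 30),
        ("S2", "rice", 2, 25),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn


def weekly_sales_by_product(conn):
    """Aggregate inside the database and return only the summary rows."""
    query = """
        SELECT product, week, SUM(units) AS total_units
        FROM sales
        GROUP BY product, week
        ORDER BY product, week
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for product, week, total_units in weekly_sales_by_product(conn):
        print(product, week, total_units)
    conn.close()
```

The alternative, fetching every row and summing in application code, gives the same totals but moves all of the raw data out of the store first.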
Data Analytics in Healthcare
• Creates clinical knowledge from digitized medical records to improve
healthcare decision making
• Healthcare data, which is predominantly unstructured, requires advanced
analytical algorithms to generate insights
• Product can read each patient’s health chart, which would otherwise
require hours from a well-trained expert
• Develops insights about the chances of the patient contracting a specific disease
• Creates a medical history of the patient for the insurance providers to know more
about the insured
• With the EMR (electronic medical record) data of each patient, the
medication history is also taken into account for any future prescriptions
Data Analytics in Insurance
• When automotive insurance products are priced manually, more than 30 variables are taken into
account
• Driver’s age, miles driven, gender, driving record
• Since this process wasn’t giving an accurate picture of the potential claims, credit
score, reputational data etc. were also included, which led to upwards of 1,000
variables
• Application of parallel computing and statistical learning techniques helped understand the
impact of each variable better and price products appropriately
• Marketing mix algorithms that learn continuously are applied to determine the
right marketing channel
• Propensity uplift models are applied to help increase profitable acquisitions
• Risk-adjustment-based predictive models are used to underwrite and to predict fraud and lapse
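The idea behind an uplift model can be shown with its simplest form: compare the conversion rate of customers who received a campaign with that of a control group, segment by segment. This is a minimal sketch and not a full propensity uplift model. The function name uplift_by_segment, the record layout and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment.

    records: iterable of (segment, treated, converted) tuples, where
    treated and converted are booleans.
    Returns {segment: treated conversion rate - control conversion rate}
    for every segment that has both treated and control customers.
    """
    # For each segment keep [conversions, total] for treated and control.
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(bool(converted))
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        t_conv, t_total = groups[True]
        c_conv, c_total = groups[False]
        if t_total and c_total:
            uplift[segment] = t_conv / t_total - c_conv / c_total
    return uplift


if __name__ == "__main__":
    # Hypothetical campaign results: (segment, treated, converted).
    data = [
        ("young", True, True), ("young", True, True),
        ("young", True, False), ("young", True, False),
        ("young", False, True), ("young", False, False),
        ("young", False, False), ("young", False, False),
        ("senior", True, True), ("senior", True, False),
        ("senior", False, True), ("senior", False, False),
    ]
    print(uplift_by_segment(data))
```

Segments with a clearly positive value are the ones where the campaign appears to change behavior, so acquisition spend is better directed there. A real model would also control for customer attributes and sample size.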
Risks of Big Data
• Organizations can be overwhelmed by the volume of data
• Need the right people solving the right problems
• Costs can escalate too fast
• It isn’t necessary to capture 100% of the data
• Privacy is a concern with many sources of big data
• Self-regulation
• Legal regulation
Acknowledgement
• Abhimanyu Verma, Head of Real World Evidence and Data to Insights, Novartis Pharma AG
• Kiran Vijay Kumar, Head of Information Security and Enterprise Architecture, Matrimony.com
• Arun Chakravarthy, Senior Data Scientist, Matrimony.com