BIG DATA ANALYTICS FOR
REAL TIME SYSTEMS
Kamalika Dutta Manasi Jayapal
Overview
• Introduction
• Big Data Analytics
• Real Time Systems
• Challenges of Real Time Analytics
• Technologies
• Tools
• Use Cases
• Future Work and Conclusion
Where does Big Data come from?
Courtesy: http://goo.gl/JWswfj
What makes it Big Data?
Courtesy: Oracle
VARIABILITY
Evolution of Big Data
[Timeline figure: 1960s; 1967, Automatic Data Compression; 1997, Information Explosion]
Our Literature Survey!
Big Data Analytics
“Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.”
• Predictive Analysis
• Text Analysis
• Data Mining
• Statistical Analysis
Courtesy: smartdatacollective.com
Sample Systems
Analytics & 3 V's
Courtesy: watalon.com
Real Time Systems
“A real-time system is one that processes information and produces a response within a specified time, else risk severe consequences, sometimes including failure.”
• Telecommunication Systems
• Anti-Lock Brakes in a Car
• Air Traffic Control System
• Weather Forecasting System
Courtesy: yourdon.com
Real-Time Analytics of Big Data
[Figure: axes for data rate and volume (kilobytes/sec, megabytes/sec, gigabytes, terabytes, petabytes, exabytes) and for response time (milliseconds, seconds, minutes, hours); regions labeled Big Data and Real Time; caption "What is Happening?"]
Courtesy: infochimps.com
Challenges of Real Time Analytics
Expensive
Complex architecture, batch processing
Semi-structured and unstructured data: new sources are unpredictable, and relational databases cannot handle them, leaving us hamstrung.
Market too dynamic to predict: subscribers' preferences change, and competition accelerates that change.
Scalability: requires sub-second response times under a load greater than a single server can handle.
Thinking Beyond Hadoop!
Manage & store huge volume of any data: Hadoop File System, MapReduce
Manage streaming data: Stream Computing
Analyze unstructured data: Text Analytics Engine
Structure and control data: Data Warehousing
Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
Understand and navigate federated big data sources: Federated Discovery and Navigation
Courtesy: IBM
Our Solution
• Do the impossible: Incorporate any kind of data
• Scale Big: Scale without any complexity
• Not Time Consuming: Seconds to Minutes
• Real Time: Try to analyze data without expensive data warehouse loads
Powerful Analytics, In Place, In Real Time.
Courtesy: slideshare.com
In-Memory Computing
In-memory computing primarily relies on keeping data in a server's RAM as a
means of processing at faster speeds. It uses a type of middleware software that
allows one to store data in RAM, across a cluster of computers, and process it in
parallel.
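The idea above can be sketched in a few lines. This is only an illustration, assuming a toy "cluster" made of in-process partitions; the names `partition` and `parallel_sum` are hypothetical and do not come from any in-memory computing product. The data stays in RAM, is split across partitions, and each partition is reduced in parallel before the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch only: the "nodes" are plain Python lists held in RAM,
# standing in for the memory of separate machines in a cluster.


def partition(records, n_nodes):
    """Spread records round-robin across n_nodes in-memory partitions."""
    parts = [[] for _ in range(n_nodes)]
    for i, record in enumerate(records):
        parts[i % n_nodes].append(record)
    return parts


def parallel_sum(parts):
    """Reduce every partition concurrently, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial_sums = list(pool.map(sum, parts))
    return sum(partial_sums)


readings = list(range(1, 101))          # data held entirely in memory
parts = partition(readings, n_nodes=4)  # hypothetical 4-node cluster
total = parallel_sum(parts)
```

A real in-memory data grid would add replication and network transport; the sketch shows only the partition-then-process-in-parallel pattern.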
Courtesy: Stratecast
Stream Processing
Courtesy: EMC
• Stream-processing systems operate on continuous data streams, e.g., click streams on web pages, user request/query streams, monitoring events, and notifications.
• Stream processing delivers real-time analytic processing on constantly changing data in motion.
• Analyse first, store later!
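A minimal sketch of "analyse first, store later", assuming a click stream of `(timestamp, page)` pairs arriving in time order. The function name `sliding_counts` and the sample data are hypothetical. Each event updates a per-page count over a sliding time window as it arrives, and nothing is written to storage.

```python
from collections import Counter, deque

# Illustrative sketch only: event shape and window size are assumptions.


def sliding_counts(events, window_seconds):
    """Yield (timestamp, per-page counts) over the last window_seconds.

    events is any iterable of (timestamp, page) pairs in time order; it is
    consumed one event at a time and never materialized as a whole.
    """
    window = deque()
    counts = Counter()
    for ts, page in events:
        window.append((ts, page))
        counts[page] += 1
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_page = window.popleft()
            counts[old_page] -= 1
            if counts[old_page] == 0:
                del counts[old_page]
        yield ts, dict(counts)


clicks = [(0, "/home"), (1, "/cart"), (2, "/home"), (12, "/cart")]
results = list(sliding_counts(iter(clicks), window_seconds=10))
```

Because the function is a generator, a downstream consumer sees an updated answer after every event instead of waiting for a batch job.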
Complex Event Processing
Complex Event Processing (CEP) processes multiple event streams generated
within the enterprise to construct data abstraction and identify meaningful
patterns among those streams.
• Analytics across both real-time and historical data.
• Real-time event capture, filtering, pattern detection, matching, and aggregation.
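As a hedged illustration of pattern detection across multiple event streams, the sketch below merges two time-ordered streams and raises an alert when a payment follows several failed logins by the same user within a time window. The event fields (`ts`, `user`, `type`), the rule itself, and the name `detect_suspicious` are assumptions made for this example, not part of any CEP engine.

```python
import heapq
from collections import defaultdict, deque

# Illustrative sketch only: the event schema and the rule are hypothetical.


def detect_suspicious(auth_events, payment_events, window=60, min_failures=3):
    """Return (user, timestamp) alerts for a payment that follows at least
    min_failures failed logins by the same user within `window` seconds."""
    # Merge the two time-ordered streams into one time-ordered stream.
    merged = heapq.merge(auth_events, payment_events, key=lambda e: e["ts"])
    failures = defaultdict(deque)  # user -> timestamps of recent failed logins
    alerts = []
    for event in merged:
        user, ts = event["user"], event["ts"]
        recent = failures[user]
        # Filtering: forget failures that are older than the window.
        while recent and recent[0] < ts - window:
            recent.popleft()
        if event["type"] == "login_failed":
            recent.append(ts)
        elif event["type"] == "payment" and len(recent) >= min_failures:
            alerts.append((user, ts))
            recent.clear()
    return alerts


auth = [{"ts": t, "user": "u1", "type": "login_failed"} for t in (1, 5, 9)]
auth.append({"ts": 20, "user": "u2", "type": "login_failed"})
payments = [
    {"ts": 30, "user": "u1", "type": "payment"},
    {"ts": 31, "user": "u2", "type": "payment"},
    {"ts": 200, "user": "u1", "type": "payment"},
]
alerts = detect_suspicious(auth, payments)
```

Only the first payment by `u1` matches the pattern: `u2` has a single failure, and by the last payment the earlier failures have aged out of the window.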
Tools for Real Time Analytics
Big Data is NOT new, the Tools ARE!
IBM InfoSphere Streams
Kafka
• A high-performance distributed publish-subscribe messaging system.
• Designed for processing real-time activity stream data.
• Initially developed at LinkedIn, now an Apache project.
• Works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data.
• Fast
• Scalable
• Durable
• Fault-tolerant
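The publish-subscribe model can be illustrated with a toy in-memory broker. This is not the Kafka client API: `MiniBroker`, `publish`, and `poll` are hypothetical names for this sketch. It shows an append-only log per topic, with each consumer group reading at its own offset so that independent subscribers all see every message.

```python
from collections import defaultdict

# Illustrative sketch only: a single-process stand-in, not Kafka itself.


class MiniBroker:
    """Toy publish-subscribe broker: an append-only log per topic, with each
    consumer group tracking its own read offset."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> ordered list of messages
        self._offsets = defaultdict(int)  # (topic, group) -> next offset

    def publish(self, topic, message):
        """Append a message to the topic log and return its offset."""
        self._logs[topic].append(message)
        return len(self._logs[topic]) - 1

    def poll(self, topic, group, max_messages=10):
        """Return unread messages for this group and advance its offset."""
        start = self._offsets[(topic, group)]
        batch = self._logs[topic][start:start + max_messages]
        self._offsets[(topic, group)] = start + len(batch)
        return batch


broker = MiniBroker()
for page in ("/home", "/search", "/cart"):
    broker.publish("page-views", page)

analytics_first = broker.poll("page-views", group="analytics", max_messages=2)
analytics_rest = broker.poll("page-views", group="analytics")
archiver_all = broker.poll("page-views", group="archiver")
```

Keeping the log separate from the read positions is what lets a slow consumer catch up later without blocking a fast one.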
Storm
• A distributed real-time computation system.
• Created at BackType, which Twitter acquired; Twitter then open-sourced Storm.
• Twitter claims, “Over a million tuples processed per second per node.”
• Fast, scalable, reliable and fault-tolerant.
• Stream: unbounded sequence of tuples
• Primitives
  • Spouts: pull messages
  • Bolts: perform core functions of stream computing
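The spout and bolt primitives can be sketched with plain Python generators. This is an analogy, not Storm's API: the function names are hypothetical, and a real topology runs spouts and bolts as distributed tasks. The shape is the same, though: a spout emits tuples into a stream, and bolts consume tuples and emit new ones.

```python
from collections import Counter

# Illustrative sketch only: generators stand in for distributed tasks.


def sentence_spout(sentences):
    """Spout: emits one tuple per sentence into the stream."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keeps a running count and emits (word, count) after each tuple."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# Wire the topology: spout -> split bolt -> count bolt.
topology = count_bolt(split_bolt(sentence_spout(["big data", "Big tools"])))
emitted = list(topology)
```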
Spark Streaming
• Developed in the AMPLab at UC Berkeley.
• In-memory computing capabilities deliver speed.
• Low latency
• High throughput
• Fault tolerant
• New programming model:
  • Discretized streams (DStreams)
  • Resilient Distributed Datasets (RDDs)
Spark Streaming uses micro-batching to support continuous stream processing. It is
an extension of Spark which is a batch-processing system.
Courtesy: Apache Spark
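Micro-batching can be illustrated without Spark itself. The sketch below, with hypothetical names `micro_batches` and `process_stream`, chops a time-ordered stream into fixed-interval batches and runs an ordinary batch function on each one, which is the general idea behind treating a stream as a sequence of small batches. It does not use or imitate the Spark API.

```python
from itertools import groupby

# Illustrative sketch only: plain Python, no Spark involved.


def micro_batches(events, batch_interval):
    """Group time-ordered (timestamp, value) events into consecutive batches,
    one per batch_interval seconds. Intervals with no events are skipped."""
    def batch_of(event):
        return int(event[0] // batch_interval)

    for batch_id, group in groupby(events, key=batch_of):
        yield batch_id, [value for _, value in group]


def process_stream(events, batch_interval, batch_fn):
    """Run an ordinary batch function on every micro-batch of the stream."""
    return [(batch_id, batch_fn(values))
            for batch_id, values in micro_batches(events, batch_interval)]


events = [(0.2, 3), (0.9, 4), (1.1, 10), (3.5, 1), (3.7, 2)]
totals = process_stream(events, batch_interval=1, batch_fn=sum)
```

The batch interval sets the trade-off: a shorter interval lowers latency, while a longer one amortizes per-batch overhead over more events.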
Spring XD (XD = eXtreme Data)
• Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export.
• The Spring XD framework supports streams for the ingestion of event-driven data from a source to a sink, passing through any number of processors.
Courtesy: Infoq
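The source, processor, and sink structure described above can be sketched as a small Python pipeline. This is not Spring XD code: `run_stream` and the helper functions are hypothetical, and the convention that a processor returns `None` to drop an item is an assumption made for this example.

```python
# Illustrative sketch only: a single-process analogy of source -> processors
# -> sink, not the Spring XD runtime or its stream definition language.


def run_stream(source, processors, sink):
    """Pass every item from source through each processor in order, then sink.
    A processor returns the transformed item, or None to drop it."""
    for item in source:
        for processor in processors:
            item = processor(item)
            if item is None:
                break
        else:
            # No processor dropped the item, so deliver it to the sink.
            sink(item)


def parse_reading(line):
    """Processor: turn 'sensor,value' text into a dictionary."""
    sensor, value = line.split(",")
    return {"sensor": sensor, "value": float(value)}


def drop_low(reading):
    """Processor: filter out readings below 50.0."""
    return reading if reading["value"] >= 50.0 else None


collected = []
run_stream(
    source=["a,72.5", "b,12.0", "c,50.0"],
    processors=[parse_reading, drop_low],
    sink=collected.append,
)
```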
Comparison of Tools (1)
Definition
• Spark Streaming: A fast and general-purpose cluster computing system.
• Apache Storm: A distributed real-time computation system.
• Spring XD: A unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export.
Implemented in
• Spark Streaming: Scala
• Apache Storm: Clojure
• Spring XD: Java
Programming API
• Spark Streaming: Scala, Java, Python
• Apache Storm: Java API, and usable with any programming language.
• Spring XD: Java
Development
• Spark Streaming: A full top-level Apache project.
• Apache Storm: An Apache project (it entered through the Apache Incubator).
• Spring XD: Spring project by Pivotal.
Processing Model
• Spark Streaming: Batch processing framework that also does micro-batching.
• Apache Storm: Stream processing framework that processes and dispatches messages as soon as they arrive.
• Spring XD: Unified platform for stream processing.
Fault Tolerance
• Spark Streaming: Recovery of lost work and restart of workers via the resource manager.
• Apache Storm: Workers and supervisors are restarted as if nothing had happened.
• Spring XD: Work is reassigned to a working container.
Comparison of Tools (2)
Data processing
• Spark Streaming: Messages are not lost and are delivered once (small-scale batching).
• Apache Storm: Keeps track of each and every record.
• Spring XD: Unacknowledged messages are retried until the container comes back.
Use Cases
• Spark Streaming:
  • Combines batch and stream processing (Lambda Architecture).
  • Machine learning: improve performance of iterative algorithms.
  • Power real-time dashboards.
• Apache Storm: Prevention of:
  • securities fraud
  • compliance violations
  • security breaches
  • network outage
• Spring XD:
  • Stream tweets to Hadoop for sentiment analysis.
  • High-throughput distributed data ingestion into HDFS from a variety of input sources.
  • Real-time analytics at ingestion time, e.g. gathering metrics and counting values.
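The comparison mentions retrying unacknowledged messages until the consumer comes back. A minimal sketch of that at-least-once pattern follows, assuming a handler that returns `True` to acknowledge a message. The names `deliver_at_least_once` and `FlakyContainer` are hypothetical and are not taken from any of the three tools.

```python
# Illustrative sketch only: acknowledgement is modeled as a boolean return.


def deliver_at_least_once(messages, handler, max_attempts=5):
    """Retry each message until handler acknowledges it (returns True).
    Returns the number of attempts used per message; raises if a message
    is still unacknowledged after max_attempts."""
    attempts_used = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            if handler(message):
                attempts_used.append(attempt)
                break
        else:
            raise RuntimeError(f"gave up on {message!r}")
    return attempts_used


class FlakyContainer:
    """Hypothetical consumer that is down for its first two calls."""

    def __init__(self):
        self.calls = 0
        self.processed = []

    def handle(self, message):
        self.calls += 1
        if self.calls <= 2:
            return False  # container unavailable: no acknowledgement
        self.processed.append(message)
        return True


container = FlakyContainer()
attempts = deliver_at_least_once(["m1", "m2"], container.handle)
```

Retrying guarantees that no message is lost, but a handler that does its work and then fails before acknowledging would see the message again, so handlers in such a scheme are usually written to tolerate duplicates.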
Which tools are right for you?
Lambda Architecture
• In 2013, Nathan Marz and James Warren proposed the Lambda Architecture, a methodology for building a Big Data system.
• Such a system balances latency, throughput, and fault tolerance by using batch processing to provide comprehensive and accurate pre-computed views, while simultaneously using real-time stream processing to provide dynamic views.
Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. Manning Publications, 2015.
Courtesy: Trivadis
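The split between batch views and real-time views can be sketched as follows. This is a simplified illustration with hypothetical function names and page-view data. A real speed layer updates its view incrementally as events arrive, whereas this sketch simply recounts the recent events.

```python
from collections import Counter

# Illustrative sketch only: three tiny functions stand in for three layers.


def batch_view(master_dataset):
    """Batch layer: recompute page-view counts from the full master dataset."""
    return Counter(event["page"] for event in master_dataset)


def realtime_view(recent_events):
    """Speed layer: count only events not yet absorbed by a batch run."""
    return Counter(event["page"] for event in recent_events)


def query(page, batch, realtime):
    """Serving layer: answer by merging the batch and real-time views."""
    return batch[page] + realtime[page]


master = [{"page": "/home"}, {"page": "/home"}, {"page": "/cart"}]
recent = [{"page": "/home"}, {"page": "/checkout"}]

batch = batch_view(master)
realtime = realtime_view(recent)
home_views = query("/home", batch, realtime)
checkout_views = query("/checkout", batch, realtime)
```

When the next batch run absorbs the recent events into the master dataset, the real-time view for that period can be discarded, which keeps the speed layer small.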
Lambda Architecture Example
Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. Manning Publications, 2015.
Courtesy: Trivadis
Use Cases
• Healthcare
  • Capture and analyze real-time data from medical monitors, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues.
  • Analyze privacy-protected streams of medical device data to detect early signs of disease and identify correlations among multiple patients.
• Finance
  • Analyze ticks, tweets, satellite imagery, weather trends, and any other type of data to inform trading algorithms in real time.
  • Apply fraud insights to take action in real time. Use analytics on streaming data to confidently distinguish legitimate actions, prevent or interrupt suspicious ones, and respond immediately to criminal patterns and activities.
• Government
  • Identify social program fraud within seconds based on program history, citizen profile, and geospatial data.
  • Identify items or patterns for deeper investigation in cybersecurity.
• Transport
  • Traffic managers can now respond quickly and accurately to relevant insights from real-time analytics drawn from data feeds and reports.
  • Telematics can provide data in motion such as vehicle speed, data relating to the transmission control system, braking, air bags, tire pressure and wiper speed, as well as geospatial and current environmental conditions data. Hence, automotive companies can strengthen customer relationships.
• Telecommunication
  • Improve customer profitability analysis, end-to-end visibility for new product rollouts, and real-time analysis to better serve network customers.
  • Perform capacity planning for mobile networks as new high-bandwidth services are introduced. Improve customer experience.
• Retail
  • See a product recurring in abandoned shopping carts. Run a promotion to close more sales of that product.
  • Evaluate sales performance in real time. Take measures now to achieve sales quotas.
  • An electronic coupon delivery service sends e-mails to customers with recommendations matched to their interests, derived from their location information, membership information, and information on nearby stores.
Courtesy: SAP
Future Work
• Increased Level of Merging
• Application of Social and Digital Media
• New Technologies
• Further Development of Telemetric Data
• Self Learning Systems
• Complex Statistical Methods
Conclusion
[Figure: Resources, Privacy, Security, Time, Cost]
“Consumer Data will be the biggest differentiator in the next two to three years.
Whoever unlocks the reams of data and uses it strategically, will win”
-Angela Ahrendts, CEO, Burberry
?
41 Big Data Analytics for Real Time Systems

Contenu connexe

Tendances (20)

Web content mining
Web content miningWeb content mining
Web content mining
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
 
3 tier data warehouse
3 tier data warehouse3 tier data warehouse
3 tier data warehouse
 
Text MIning
Text MIningText MIning
Text MIning
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
Data ingestion
Data ingestionData ingestion
Data ingestion
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Web mining (structure mining)
Web mining (structure mining)Web mining (structure mining)
Web mining (structure mining)
 
Data analytics
Data analyticsData analytics
Data analytics
 
Predictive Analytics - An Introduction
Predictive Analytics - An IntroductionPredictive Analytics - An Introduction
Predictive Analytics - An Introduction
 
Text Analytics
Text Analytics Text Analytics
Text Analytics
 

En vedette

2016 IBM Interconnect - medical devices transformation
2016 IBM Interconnect  - medical devices transformation2016 IBM Interconnect  - medical devices transformation
2016 IBM Interconnect - medical devices transformationElizabeth Koumpan
 
Augmented Reality for E-Learning
Augmented Reality for E-LearningAugmented Reality for E-Learning
Augmented Reality for E-LearningKamalika Dutta
 
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...Kamalika Dutta
 
Augmented Reality in Education
Augmented Reality in Education Augmented Reality in Education
Augmented Reality in Education K3 Hamilton
 
Liberating medical device data for clinical research
Liberating medical device data for clinical researchLiberating medical device data for clinical research
Liberating medical device data for clinical researchJohn Zaleski
 
Web, gaming, mobile : quel développeur serez-vous demain ?
Web, gaming, mobile : quel développeur serez-vous demain ?Web, gaming, mobile : quel développeur serez-vous demain ?
Web, gaming, mobile : quel développeur serez-vous demain ?Microsoft
 
Rd big data & analytics v1.0
Rd big data & analytics v1.0Rd big data & analytics v1.0
Rd big data & analytics v1.0Yadu Balehosur
 
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analyticsAst 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analyticsAccenture
 
Big Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveBig Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveSumit Kalra
 
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...Enrico Palumbo
 
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...Use of Chemical Characterization to Assess the Equivalency of Medical Devices...
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...NAMSA
 
Meta Analysis of Medical Device Data Applications for Designing Studies and R...
Meta Analysis of Medical Device Data Applications for Designing Studies and R...Meta Analysis of Medical Device Data Applications for Designing Studies and R...
Meta Analysis of Medical Device Data Applications for Designing Studies and R...NAMSA
 
The concept of Datalake with Hadoop
The concept of Datalake with HadoopThe concept of Datalake with Hadoop
The concept of Datalake with HadoopAvkash Chauhan
 
A big-data architecture for real-time analytics
A big-data architecture for real-time analyticsA big-data architecture for real-time analytics
A big-data architecture for real-time analyticsramikaurraminder
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Studies in application of Augmented Reality in E Learning - Design Project 3
Studies in application of Augmented Reality in E Learning - Design Project 3Studies in application of Augmented Reality in E Learning - Design Project 3
Studies in application of Augmented Reality in E Learning - Design Project 3Mannu Amrit
 
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service Stefan Schwarz
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 

En vedette (20)

2016 IBM Interconnect - medical devices transformation
2016 IBM Interconnect  - medical devices transformation2016 IBM Interconnect  - medical devices transformation
2016 IBM Interconnect - medical devices transformation
 
Augmented Reality for E-Learning
Augmented Reality for E-LearningAugmented Reality for E-Learning
Augmented Reality for E-Learning
 
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...
 
Augmented Reality in Education
Augmented Reality in Education Augmented Reality in Education
Augmented Reality in Education
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Curved Arrows Flat
Curved Arrows FlatCurved Arrows Flat
Curved Arrows Flat
 
Liberating medical device data for clinical research
Liberating medical device data for clinical researchLiberating medical device data for clinical research
Liberating medical device data for clinical research
 
Web, gaming, mobile : quel développeur serez-vous demain ?
Web, gaming, mobile : quel développeur serez-vous demain ?Web, gaming, mobile : quel développeur serez-vous demain ?
Web, gaming, mobile : quel développeur serez-vous demain ?
 
Rd big data & analytics v1.0
Rd big data & analytics v1.0Rd big data & analytics v1.0
Rd big data & analytics v1.0
 
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analyticsAst 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
 
Big Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveBig Data Analytics: Architectural Perspective
Big Data Analytics: Architectural Perspective
 
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
 
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...Use of Chemical Characterization to Assess the Equivalency of Medical Devices...
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...
 
Meta Analysis of Medical Device Data Applications for Designing Studies and R...
Meta Analysis of Medical Device Data Applications for Designing Studies and R...Meta Analysis of Medical Device Data Applications for Designing Studies and R...
Meta Analysis of Medical Device Data Applications for Designing Studies and R...
 
The concept of Datalake with Hadoop
The concept of Datalake with HadoopThe concept of Datalake with Hadoop
The concept of Datalake with Hadoop
 
A big-data architecture for real-time analytics
A big-data architecture for real-time analyticsA big-data architecture for real-time analytics
A big-data architecture for real-time analytics
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Studies in application of Augmented Reality in E Learning - Design Project 3
Studies in application of Augmented Reality in E Learning - Design Project 3Studies in application of Augmented Reality in E Learning - Design Project 3
Studies in application of Augmented Reality in E Learning - Design Project 3
 
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 

Similaire à Big Data Analytics for Real Time Systems

Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Stavros Kontopoulos
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analyticsAmazon Web Services
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and OpportunitiesKenny Huang Ph.D.
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and AnalyticsVMware Tanzu
 
Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Stavros Kontopoulos
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLSingleStore
 
Stream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White PaperStream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White PaperImpetus Technologies
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Amazon Web Services
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionAmazon Web Services
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightEduardo Castro
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantageAmazon Web Services
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreHPCC Systems
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...Grid Dynamics
 

Similaire à Big Data Analytics for Real Time Systems (20)

Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analytics
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and Opportunities
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and Analytics
 
Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Stream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White PaperStream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White Paper
 
Streaming analytics
Streaming analyticsStreaming analytics
Streaming analytics
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in Action
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsight
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
 



Big Data Analytics for Real Time Systems

  • 1. BIG DATA ANALYTICS FOR REAL TIME SYSTEMS. Kamalika Dutta, Manasi Jayapal
  • 2. Overview: Introduction; Big Data Analytics; Real Time Systems; Challenges of Real Time Analytics; Technologies; Tools; Use Cases; Future Work and Conclusion
  • 4. Where does Big Data come from? Courtesy: http://goo.gl/JWswfj
  • 5. What makes it Big Data? (Figure; legible label: VARIABILITY.) Courtesy: Oracle
  • 6. Evolution of Big Data (timeline figure; labels: 1960s, 1967, Automatic Data Compression, 1997, Information Explosion, Our Literature Survey!)
  • 8. Big Data Analytics: “Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.” Predictive Analysis; Text Analysis; Data Mining; Statistical Analysis. Courtesy: smartdatacollective.com
  • 9. Sample Systems
  • 10. Analytics & 3 V's. Courtesy: watalon.com
  • 12. Real Time Systems: “A real-time system is one that processes information and produces a response within a specified time, else risk severe consequences, sometimes including failure.” Examples: Telecommunication Systems; Anti-Lock Brakes in a Car; Air Traffic Control System; Weather Forecasting System. Courtesy: yourdon.com
  • 13. Real-Time Analytics of Big Data (figure: “What is Happening?”; data-rate and volume labels: Kilobytes/Sec, Megabytes/Sec, Gigabytes to Terabytes, Petabytes to Exabytes; latency labels: Seconds, Milliseconds, Minutes, Minutes to Hours; regions: Big Data, Real Time). Courtesy: infochimps.com
  • 15. Challenges of Real Time Analytics
      ◦ Expensive: complex architecture, batch processing.
      ◦ Semi-structured and unstructured data: new sources are unpredictable, and relational databases cannot handle them, leaving us hamstrung.
      ◦ Market too dynamic to predict: subscribers' preferences change, and competition accelerates the change.
      ◦ Scalability: requires sub-second response times at a load greater than a single server can handle.
  • 16. Thinking Beyond Hadoop! Courtesy: IBM
      ◦ Manage and store huge volumes of any data: Hadoop File System, MapReduce
      ◦ Manage streaming data: Stream Computing
      ◦ Analyze unstructured data: Text Analytics Engine
      ◦ Structure and control data: Data Warehousing
      ◦ Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
      ◦ Understand and navigate federated big data sources: Federated Discovery and Navigation
  • 17. Our Solution: Powerful Analytics, In Place, In Real Time. Courtesy: slideshare.com
      ◦ Do the impossible: incorporate any kind of data.
      ◦ Scale big: scale without any complexity.
      ◦ Not time consuming: seconds to minutes.
      ◦ Real time: try to analyze data without expensive data warehouse loads.
  • 19. In-Memory Computing: In-memory computing keeps data in a server's RAM so that it can be processed faster. It uses middleware that stores data in RAM across a cluster of computers and processes it in parallel. Courtesy: Stratecast
  • 20. Stream Processing. Courtesy: EMC
      ◦ Stream-processing systems operate on continuous data streams, e.g., click streams on web pages, user request/query streams, monitoring events, notifications, etc.
      ◦ Stream processing delivers real-time analytic processing on constantly changing data in motion.
      ◦ Analyse first, store later!
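The "analyse first, store later" idea on slide 20 can be illustrated with a rolling count over a click stream: each event updates the result as it arrives, and nothing needs to be written to a database first. This is only a minimal sketch in plain Python; the function name `rolling_count`, the `(timestamp, page)` event shape, and the 10-second window are assumptions made for the example and do not come from the slides.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_count(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Iterator[Tuple[float, int]]:
    """For each event, yield (timestamp, number of events in the last window)."""
    window = deque()  # timestamps of events still inside the window
    for ts, _payload in events:
        window.append(ts)
        # Evict timestamps that are window_seconds old or older.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        yield ts, len(window)


# Hypothetical click stream: (timestamp in seconds, page visited).
clicks = [(0.0, "home"), (1.0, "cart"), (2.5, "pay"), (11.0, "home")]
counts = list(rolling_count(clicks, window_seconds=10.0))
print(counts)
```

Only the timestamps inside the current window are held in memory, so the working set stays small however long the stream runs.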
  • 21. Complex Event Processing: Complex Event Processing (CEP) processes multiple event streams generated within the enterprise to construct data abstractions and identify meaningful patterns among those streams.
      ◦ Analytics across both real-time and historical data.
      ◦ Real-time event capture, filtering, pattern detection, matching, and aggregation.
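To make the CEP description on slide 21 more concrete, the sketch below merges two time-ordered event streams and looks for one pattern across them: a payment that follows several failed logins by the same user within a short interval. Everything specific here is an assumption for illustration (the event kinds, the `detect_suspicious_logins` name, the 60-second gap, the threshold of three failures); a real CEP engine would express such rules in its own query language.

```python
import heapq
from collections import defaultdict, deque


def detect_suspicious_logins(login_stream, payment_stream,
                             max_gap=60.0, failures_needed=3):
    """Return (user, timestamp) alerts for payments preceded by repeated failed logins."""
    alerts = []
    recent_failures = defaultdict(deque)  # user -> timestamps of recent failures
    # Merge the two time-ordered streams into one time-ordered sequence.
    merged = heapq.merge(login_stream, payment_stream, key=lambda e: e[0])
    for ts, kind, user in merged:
        failures = recent_failures[user]
        # Filtering: forget failures older than max_gap seconds.
        while failures and failures[0] < ts - max_gap:
            failures.popleft()
        if kind == "login_failed":
            failures.append(ts)
        elif kind == "payment" and len(failures) >= failures_needed:
            # Pattern matched: enough recent failures, then a payment.
            alerts.append((user, ts))
            failures.clear()
    return alerts


# Hypothetical events: (timestamp, kind, user).
logins = [(1.0, "login_failed", "ann"), (2.0, "login_failed", "ann"),
          (3.0, "login_failed", "ann"), (4.0, "login_failed", "bob")]
payments = [(10.0, "payment", "ann"), (12.0, "payment", "bob")]
alerts = detect_suspicious_logins(logins, payments)
print(alerts)
```

The example covers the capture, filtering, and pattern-matching steps from the slide; aggregation over historical data is left out to keep it short.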
  • 23. Tools for Real Time Analytics: Big Data is NOT new, the Tools ARE! (Tool logos; legible label: IBM InfoSphere Streams.)
  • 24. Kafka: Fast; Scalable; Durable; Fault-tolerant
      ◦ A high-performance distributed publish-subscribe messaging system.
      ◦ Designed for processing real-time activity stream data.
      ◦ Initially developed at LinkedIn, now an Apache project.
      ◦ Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data.
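The publish-subscribe model on slide 24 can be sketched with a toy in-memory log. This is not Kafka's API: the `TinyLog` class, its `publish` and `poll` methods, and the topic and group names are all invented for this example. It only illustrates the general idea that messages are appended to a per-topic log and that each consumer group keeps its own read position, so independent consumers can read the same stream at their own pace.

```python
from collections import defaultdict


class TinyLog:
    """In-memory stand-in for a publish-subscribe log (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> append-only list of messages
        self._offsets = defaultdict(int)  # (group, topic) -> next index to read

    def publish(self, topic, message):
        """Append a message and return its offset within the topic."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def poll(self, group, topic, max_messages=10):
        """Return the next unread messages for this group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


log = TinyLog()
for page in ["home", "search", "cart"]:
    log.publish("page_views", page)

# Two consumer groups read the same topic independently.
dashboard_first = log.poll("dashboard", "page_views", max_messages=2)
dashboard_second = log.poll("dashboard", "page_views", max_messages=2)
archiver_all = log.poll("archiver", "page_views")
print(dashboard_first, dashboard_second, archiver_all)
```

Durability, partitioning, and replication, which give the real system the properties listed on the slide, are deliberately left out.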
  • 25. Storm
      ◦ A highly distributed real-time computation system.
      ◦ Came to Twitter through its acquisition of BackType, where Storm was first developed.
      ◦ Twitter claims, “Over a million tuples processed per second per node.”
      ◦ Fast, Scalable, Reliable and Fault-tolerant.
      ◦ Stream: unbounded sequence of tuples.
      ◦ Primitives: Spouts pull messages; Bolts perform the core functions of stream computing.
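The spout and bolt primitives on slide 25 can be mimicked with Python generators: a spout emits tuples into a stream, and each bolt consumes one stream and emits another. The sketch below is a single-process analogy under that assumption and does not use Storm's API; the names `sentence_spout`, `split_bolt`, and `count_bolt` are invented for the example.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of the stream, emitting one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: split each sentence tuple into one tuple per lowercase word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keep a running count and emit (word, count so far) for each word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# Wire the spout and bolts into a small linear "topology".
topology = count_bolt(split_bolt(sentence_spout(
    ["Storm processes streams", "streams of tuples"])))
emitted = list(topology)
print(emitted)
```

Because generators are lazy, each tuple flows through the whole chain as soon as it is emitted, which loosely mirrors the record-at-a-time processing model the slides attribute to Storm.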
  • 26. Spark Streaming: Spark Streaming uses micro-batching to support continuous stream processing. It is an extension of Spark, which is a batch-processing system. Courtesy: Apache Spark
      ◦ Was developed in the AMPLab at UC Berkeley.
      ◦ In-memory computing capabilities deliver speed.
      ◦ Low latency
      ◦ High throughput
      ◦ Fault tolerant
      ◦ New programming model: Discretized Streams (DStreams); Resilient Distributed Datasets
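Micro-batching, as described on slide 26, means cutting a continuous stream into small fixed-length time slices and running an ordinary batch computation on each slice. The sketch below shows that idea in plain Python rather than with the Spark API; the `micro_batches` helper, the sample readings, and the one-second batch interval are assumptions made for illustration.

```python
from itertools import groupby


def micro_batches(events, batch_interval):
    """Group time-ordered (timestamp, value) events into fixed-length batches.

    Yields (batch_index, values); intervals with no events yield nothing.
    """
    def batch_of(event):
        return int(event[0] // batch_interval)

    for batch_index, group in groupby(events, key=batch_of):
        yield batch_index, [value for _ts, value in group]


# Hypothetical sensor readings: (timestamp in seconds, value).
readings = [(0.2, 5), (0.9, 7), (1.1, 1), (2.4, 4), (2.6, 6)]

# Run a small batch job (a sum) on every micro-batch.
batch_sums = [(index, sum(values))
              for index, values in micro_batches(readings, batch_interval=1.0)]
print(batch_sums)
```

The batch interval sets a floor on latency: a result for a slice can only be produced once that slice has closed, which is the trade-off against record-at-a-time systems.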
  • 27. Spring XD (XD = eXtreme Data). Courtesy: InfoQ
      ◦ Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export.
      ◦ The Spring XD framework supports streams for the ingestion of event-driven data from a source to a sink, passing through any number of processors.
  • 28. Comparison of Tools (1)
      ◦ Definition. Spark Streaming: a fast and general-purpose cluster computing system. Apache Storm: a distributed real-time computation system. Spring XD: a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export.
      ◦ Implemented in. Spark Streaming: Scala. Apache Storm: Clojure. Spring XD: Java.
      ◦ Programming API. Spark Streaming: Scala, Java, Python. Apache Storm: Java API, and usable with any programming language. Spring XD: Java.
      ◦ Development. Spark Streaming: a full top-level Apache project. Apache Storm: an incubating Apache project. Spring XD: a Spring project by Pivotal.
      ◦ Processing Model. Spark Streaming: batch-processing framework that also does micro-batching. Apache Storm: stream-processing framework that processes and dispatches messages as soon as they arrive. Spring XD: unified platform for stream processing.
      ◦ Fault Tolerance. Spark Streaming: recovery of lost work and restart of workers via the resource manager. Apache Storm: restart of workers and supervisors as if nothing had happened. Spring XD: reassignment of work to a working container.
  • 29. Comparison of Tools (2)
      ◦ Data processing. Spark Streaming: messages are not lost and are delivered once (small-scale batching). Apache Storm: keeps track of each and every record. Spring XD: unacknowledged messages are retried until the container comes back.
      ◦ Use cases. Spark Streaming: combines batch and stream processing (Lambda Architecture); machine learning, improving the performance of iterative algorithms; powering real-time dashboards. Apache Storm: prevention of securities fraud, compliance violations, security breaches, and network outages. Spring XD: streaming tweets to Hadoop for sentiment analysis; high-throughput distributed data ingestion into HDFS from a variety of input sources; real-time analytics at ingestion time, e.g. gathering metrics and counting values.
  • 30. Which tools are right for you?
  • 31. Lambda Architecture. Courtesy: Trivadis
      ◦ In 2013, Nathan Marz and James Warren proposed the Lambda Architecture, which attempts to provide a methodology for building a Big Data system.
      ◦ Such a system would balance latency, throughput, and fault tolerance by using batch processing to provide comprehensive and accurate pre-computed views, while simultaneously using real-time stream processing to provide dynamic views.
      ◦ Reference: Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. Manning Publications, 2015.
  • 32. Lambda Architecture Example. Reference: Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. Manning Publications, 2015. Courtesy: Trivadis
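The Lambda Architecture slides describe two kinds of views: pre-computed batch views over the full dataset and dynamic real-time views over recent data. A query combines the two. The sketch below shows that combination for a simple page-view count. It is an illustrative assumption, not the authors' or the book's implementation: the function names `batch_view`, `speed_view`, and `query`, and the `(page, timestamp)` event shape, are invented for the example.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute page-view counts from the full master dataset."""
    return Counter(page for page, _ts in master_dataset)


def speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(page for page, _ts in recent_events)


def query(page, batch, speed):
    """Serving step: merge the comprehensive batch view with the dynamic view."""
    return batch[page] + speed[page]


# Hypothetical events: (page, timestamp).
master = [("home", 1), ("home", 2), ("cart", 3)]  # already covered by the batch run
recent = [("home", 4), ("pay", 5)]                # not yet in any batch view

batch = batch_view(master)
speed = speed_view(recent)
home_views = query("home", batch, speed)
pay_views = query("pay", batch, speed)
print(home_views, pay_views)
```

Once the next batch run has absorbed the recent events, the corresponding entries in the real-time view can be discarded, so the stream-processing side only ever has to cover the gap since the last batch.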
  • 34. Use Cases
      ◦ Healthcare: Capture and analyze real-time data from medical monitors, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues. Analyze privacy-protected streams of medical device data to detect early signs of disease and identify correlations among multiple patients.
      ◦ Finance: Analyze ticks, tweets, satellite imagery, weather trends, and any other type of data to inform trading algorithms in real time. Apply fraud insights to take action in real time: use analytics on streaming data to confidently identify legitimate actions, prevent or interrupt suspicious ones, and respond immediately to criminal patterns and activities.
  • 35. Use Cases
      ◦ Government: Identify social program fraud within seconds based on program history, citizen profile, and geospatial data. Identify items or patterns for deeper investigation in cybersecurity.
      ◦ Transport: Traffic managers can now respond quickly and accurately to relevant insights from real-time analytics drawn from data feeds and reports. Telematics can provide data in motion such as vehicle speed and data relating to the transmission control system, braking, air bags, tire pressure and wiper speed, as well as geospatial and current environmental conditions data. With this data, automotive companies can strengthen customer relationships.
  • 36. Use Cases
      ◦ Telecommunication: Improve customer profitability analysis, end-to-end visibility for new product rollouts, and real-time analysis to better serve network customers. Perform capacity planning for mobile networks as new high-bandwidth services are introduced. Improve customer experience.
      ◦ Retail: See a product recurring in abandoned shopping carts, and run a promotion to close more sales of that product. Evaluate sales performance in real time, and take measures now to achieve sales quotas. An electronic coupon delivery service sends e-mails to customers with recommendations matched to their interests, derived from their location information, membership information, and information on nearby stores.
  • 37. (Figure.) Courtesy: SAP
  • 39. Future Work: Increased Level of Merging; Application of Social and Digital Media; New Technologies; Further Development of Telemetric Data; Self-Learning Systems; Complex Statistical Methods
  • 40. Conclusion (figure labels: Resources, Privacy, Security, Time, Cost). “Consumer Data will be the biggest differentiator in the next two to three years. Whoever unlocks the reams of data and uses it strategically, will win.” (Angela Ahrendts, CEO, Burberry)