SlideShare a Scribd company logo
1 of 33
Download to read offline
Implementing Operational Intelligence
Using
In-Memory Computing
William L. Bain (wbain@scaleoutsoftware.com)
June 29, 2015
Agenda
• What is Operational Intelligence?
• Example: Tracking Set-Top Boxes
• Using an In-Memory Data Grid (IMDG) for Operational Intelligence
• Tracking and analyzing live data
• Comparison to Spark
• Implementing OI Using Data-Parallel Computing in an IMDG
• A Detailed OI Example in Financial Services
• Code Samples in Java
• Implementing MapReduce on an IMDG
• Optimizing MapReduce for OI
• Integrating Operational and Business Intelligence
© ScaleOut Software, Inc. 2
• Develops and markets In-Memory Data Grids, software middleware for:
• Scaling application performance and
• Providing operational intelligence using
• In-memory data storage and computing
• Dr. William Bain, Founder & CEO
• Career focused on parallel computing – Bell Labs, Intel, Microsoft
• 3 prior start-ups, last acquired by Microsoft and product now ships as Network Load
Balancing in Windows Server
• Ten years in the market; 400+ customers, 10,000+ servers
• Sample customers:
About ScaleOut Software
ScaleOut Software’s Product Portfolio
• ScaleOut StateServer® (SOSS)
• In-Memory Data Grid for Windows and Linux
• Scales application performance
• Industry-leading performance and ease of use
• ScaleOut ComputeServer™ adds
• Operational intelligence for “live” data
• Comprehensive management tools
• ScaleOut hServer®
• Full Hadoop Map/Reduce engine (>40X faster*)
• Hadoop Map/Reduce on live, in-memory data
• ScaleOut GeoServer®
• WAN based data replication for DR
• Global data access and synchronization
© ScaleOut Software, Inc. 4
ScaleOut StateServer In-Memory Data Grid
Grid
Service
Grid
Service
Grid
Service
Grid
Service
*in benchmark testing
In-Memory Computing Is Not New
• 1980’s: SIMD Systems, Caltech Cosmic Cube
Thinking Machines
Connection Machine 5
• 1990’s: Commercial Parallel Supercomputers
Intel
IPSC-2
IBM
SP1
What’s New: IMC on Commodity Hardware
• 1990’s – early 2000’s: HPC on Clusters
• Since ~2005: Public Clouds
HP
Blade
Servers
Amazon EC2,
Windows Azure
Introductory Video:
What is Operational Intelligence
https://www.youtube.com/watch?v=H6OFzdIEy-g&feature=youtu.be
Online Systems Need Operational Intelligence
Goal: Provide immediate (sub-second) feedback to a system handling live data.
© ScaleOut Software, Inc. 8
A few example use cases requiring immediate feedback
within a live system:
• Ecommerce: personalized, real-time recommendations
• Healthcare: patient monitoring, predictive treatment
• Equity trading: minimize risk during a trading day
• Reservations systems: identify issues, reroute, etc.
• Credit cards & wire transfers: detect fraud in real time
• IoT, Smart grids: optimize power distribution & detect
issues
Operational vs Business Intelligence
Batch
Static data sets
Petabytes
Disk storage
Minutes to hours
Best uses:
• Analyzing warehoused data
• Mining for long-term trends
Real-time
Live data sets
Gigabytes to terabytes
In-memory storage
Sub-second to seconds
Best uses:
• Tracking live data
• Immediately identifying trends
and capturing opportunities
• Providing immediate feedback
Operational Intelligence Business Intelligence
Big Data Analytics
IMDGs
CEP
Storm
Hadoop
Spark
Hana
OI BI
© ScaleOut Software, Inc. 9
Example: Enhancing Cable TV Experience
• Goals:
• Make real-time, personalized upsell offers
• Immediately respond to service issues
• Detect and manage network hot spots
• Track aggregate behavior to identify patterns, e.g.:
• Total instantaneous incoming event rate
• Most popular programs and # viewers by zip code
• Requirements:
• Track events from 10M set-top boxes with 25K events/sec (2.2B/day)
• Correlate, cleanse, and enrich events per rules (e.g. ignore fast channel switches, match
channels to programs)
• Be able to feed enriched events to recommendation engine within 5 seconds
• Immediately examine any set-top box (e.g., box status) & track aggregate statistics
© ScaleOut Software, Inc. 10
©2011 Tammy Bruce presents LiveWire
Based on a simulated workload for San Diego
metropolitan area:
• Continuously correlates and cleanses telemetry
from 10M simulated set-top boxes (from
synthetic load generator)
• Processes more than 30K events/second
• Enriches events with program information every
second
• Tracks aggregate statistics (e.g., top 10 programs
by zip code) every 10 seconds
The Result: An OI Platform
Real-Time Dashboard
© ScaleOut Software, Inc. 11
Using an IMDG to Implement OI
• IMDG models and tracks the state of a “live” system.
• IMDG analyzes the system’s state in parallel and provides real-time feedback.
© ScaleOut Software, Inc. 12
IMDG analyzes in-memory
data with integrated
compute engine.
IMDG tracks live system’s
state with an in-memory,
object-oriented model.
IMDG enriches in-memory
model from disk-based,
historical data.
• Each set-top box is represented as an object in the IMDG
• Object holds raw & enriched event streams, viewer parameters, and statistics
Example: Tracking Set-TopBoxes
© ScaleOut Software, Inc. 13
• IMDG captures incoming events by
updating objects
• IMDG uses data-parallel computation
to:
• immediately enrich box objects to
generate alerts to recommendation
engine, and
• continuously collect and report
global statistics
The Foundation: In-Memory Data Grids
• In-memory data grid (IMDG) provides scalable, hi av storage for live data:
• Designed to manage business logic state:
• Object-oriented collections by type
• Create/read/update/delete APIs for Java/C#/C++
• Parallel query by object properties
• Data shared by multiple clients
• Designed for transparent scalability and high availability:
• Automatic load-balancing across commodity servers
• Automatic data replication, failure detection, and
recovery
• IMDGs provide ideal platform for operational
intelligence:
• Easy to track live systems with large workloads
• Appropriate availability model for production deployments
© ScaleOut Software, Inc. 14
Comparing IMDGs to Spark
• On the surface, both are surprisingly similar:
• Both designed as scalable, in-memory computing platforms
• Both implement data-parallel operators
• Both can handle streaming data
• But there are key differences that
impact use for operational intelligence:
© ScaleOut Software, Inc. 15
IMDGs Spark
Best use Live, operational data Static data or batched streams
In-memory model Object-oriented collections Resilient distributed datasets
Focus of APIs CRUD, eventing, data-parallel
computing
Data-parallel operators for
analytics
High availability tradeoffs Data replication for fast
recovery
Lineage for max performance
Data-Parallel Computing on an IMDG
• IMDGs provide powerful, cost-effective platform for data-parallel computing:
• Enable integrated computing with data storage:
• Take advantage of cluster’s commodity servers and cores.
• Avoid delays due to data motion (both to/from disk and across network).
• Leverage object-oriented model to minimize development effort:
• Easily define data-parallel tasks as class methods.
• Easily specify domain as object collection.
• Example: “Parallel Method Invocation” (PMI):
• Object-oriented version of standard HPC model
• Runs class methods in parallel across cluster.
• Selects objects using parallel query of obj. collection.
• Serves as a platform for implementing MapReduce
and other data-parallel operators
© ScaleOut Software, Inc. 16
Analyze Data
(Eval)
Combine Results
(Merge)
PMI Example: OI in Financial Services
• Goal: track market price fluctuations for a hedge fund and keep portfolios in balance.
• How:
• Keep portfolios of stocks (long and short positions)
in object collection within IMDG.
• Collect market price changes in
one-second snapshots.
• Define a method which applies a
snapshot to a portfolio and optionally
generates an alert to rebalance.
• Perform repeated parallel method invocations
on a selected (i.e., queried) set of portfolios.
• Combine alerts in parallel using a second user-defined
method.
• Report alerts to UI every second for fund manager.
© ScaleOut Software, Inc. 17
Defining the Dataset
• Simplified example of a portfolio class (Java):
• Note: some properties are made query-able.
• Note: the evalPositions method analyzes the portfolio for a market snapshot.
© ScaleOut Software, Inc. 18
public class Portfolio {
private long id;
private Set<Stock> longPositions;
private Set<Stock> shortPositions;
private double totalValue;
private Region region;
private boolean alerted; // alert for trading
@SossIndexAttribute // query-able property
public double getTotalValue() {…}
@SossIndexAttribute // query-able property
public Region getRegion() {…}
public Set<Long> evalPositions(MarketSnapshot ms) {…};
}
Defining the Parallel Methods
• Implement PMI interface to define methods for analyzing each object and for merging
the results:
© ScaleOut Software, Inc. 19
public class PortfolioAnalysis implements
Invokable<Portfolio, MarketSnapshot, Set<Long>>
{
public Set<Long> eval(Portfolio p, MarketSnapshot ms)
throws InvokeException {
// update portfolio and return id if alerted:
return p.evalPositions(ms);
}
public Set<Long> merge(Set<Long> set1, Set<Long> set2)
throws InvokeException {
set1.addAll(set2);
return set1; // merged set of alerted portfolio ids
}}
Running the Analysis
• PMI can be run from a remote workstation.
• IMDG ships code and libraries
to cluster of servers:
• Execution environment can be
pre-staged for fast startup.
• In-line execution minimizes
scheduling time.
• Avoids batch scheduling delays.
• PMI automatically runs in parallel
across all grid servers:
• Uses software multicast to accelerate startup.
• Passes market snapshot parameter to all servers.
• Uses all servers and cores to maximize throughput.
© ScaleOut Software, Inc. 20
Spawning the Compute Engine
• First obtain a reference to the IMDG’s object collection of portfolios:
• Create an “invocation grid,” a re-usable compute engine for the application:
• Spawns a JVM on all grid servers and connects them to the in-memory data grid.
• Stages the application code on all JVMs.
• Associates the invocation grid with an object collection.
© ScaleOut Software, Inc. 21
InvocationGrid grid = new InvocationGridBuilder("grid")
.addClass(DependencyClass.class)
.addJar("/path/to/dependency.jar")
.setJVMParameters("-Xmx2m")
.load();
pset.setInvocationGrid(grid);
NamedCache pset = CacheFactory.getCache(“portfolios");
Invoking the PMI
• Run the PMI on a queried set of objects within the collection:
• Multicasts the invocation and parameters to all JVMs.
• Runs the data-parallel computation.
• Merges the results and returns a final result to the point of call.
© ScaleOut Software, Inc. 22
InvokeResult alertedPortolios = pset.invoke(
PortfolioAnalysis.class,
Portfolio.class,
and(greaterThan(“totalValue”, 1000000), // query spec
equals(“region”, Region.US)),
marketSnapshot, // parameters
...
);
System.out.println("The alerted portfolios are" +
alertedPortfolios.getResult());
Execution Steps
• Eval phase: each server queries local
objects and runs eval and merge methods:
• Note: Accessing local data avoids
networking overhead.
• Completes with one result object per
server.
© ScaleOut Software, Inc. 23
• Merge phase: all servers perform
distributed merge to create final result:
• Merge runs in parallel to minimize
completion time.
• Returns final result object to client.
Importance of Avoiding Data Motion
• Local data access
enables linear
throughput scaling.
• Network access creates
a bottleneck that limits
throughput.
© ScaleOut Software, Inc. 24
Outputting Continuous Alerts to the UI
• PMI runs every second; it completes in 350 msec. and immediately refreshes UI.
• UI alerts trader to portfolios
that need rebalancing.
• UI allows trader to examine
portfolio details and determine
specific positions that are out
of balance.
• Result: in-memory computing
delivers operational
intelligence.
© ScaleOut Software, Inc. 25
Demonstration Video:
Comparison of PMI to Apache Hadoop
https://www.youtube.com/watch?v=8JTsqp_-Gnw
PMI Scales for Large In-Memory Datasets
• Measured a similar financial services application (back testing stock trading strategies on
stock histories)
• Hosted IMDG in Amazon EC2 using 75 servers holding 1 TB of stock history data in
memory
• IMDG handled a continuous
stream of updates (1.1 GB/s)
• Results: analyzed 1 TB in
4.1 seconds (250 GB/s).
• Observed linear scaling as
dataset and update rate grew.
© ScaleOut Software, Inc. 27
Using PMI to Implement MapReduce for OI
• PMI serves as foundational platform for MapReduce and other parallel operators.
• Implement MapReduce with two PMI phases:
• Runs standard Hadoop MapReduce applications.
• Data can be input from either the IMDG or an
external data source.
• Works with any input/output format.
• IMDG uses PMI phases to invoke the mappers
and reducers.
• Eliminates batch scheduling overhead.
• Intermediate results are stored within the IMDG.
• Minimizes data motion in shuffle phase.
• Allows optional sorting.
• Note: output of a single reducer/combiner
optionally can be globally merged.
© ScaleOut Software, Inc. 28
MapReduce for OI Requires New Data Model
• IMDGs historically implement a feature-rich data model:
• Efficiently manages large objects (KBs-MBs).
• Supports object timeouts, locking, query by
properties, dependency relationships, etc.
• MapReduce typically targets very large collections
of small key/value pairs:
• Does not require rich object semantics.
• Does require efficient storage (minimum metadata)
and highly pipelined access.
• Solution: a new IMDG data model for MapReduce:
• Uses standard Java named map APIs for access.
• MapReduce uses standard input/output formats.
• Stores data in chunks and pipelines to/from engine.
• Automatically defines splits for mappers and holds shuffled data for reducers.
© ScaleOut Software, Inc. 29
Optimizing MapReduce for OI: simpleMR
• Integrate in-memory named map with MapReduce to minimize execution time.
• Use new API (simpleMR in Java, C#) to simplify apps and remove Hadoop dependencies.
© ScaleOut Software, Inc. 30
public class Mapper : IMapper<int, string, string, int>
{
void IMapper<int, string, string, int>.Map(int key,
string value, IContext<string, int> context)
{
...
context.Emit(Encoding.ASCII.GetString(...), 1);
}}
inputMap = new NamedMap<int, string>("Input_Map");
outputMap = new NamedMap<string, int>("Output_Map");
inputMap.RunMapReduce<string, int, string, int>(outputMap,
new Mapper(), new Combiner(), new Reducer(), ...);
Integrating OI and BI in the Data Warehouse
• In-memory data grids can add
value to a BI platform, e.g.:
• Transform live data and store in
HDFS for analysis.
• Provide immediate feedback to
live system pending deep analysis.
• Using YARN, an IMDG can be directly
integrated into a BI cluster:
• The IMDG holds fast-changing data.
• YARN directs MapReduce jobs to
the IMDG.
• The IMDG can output results to
HDFS.
© ScaleOut Software, Inc. 31
ETL Example
Recap: In-Memory Computing for OI
• Online systems need operational intelligence on
“live” data for immediate feedback.
• Creates important new business opportunities.
• Operational intelligence can be implemented
using standard data-parallel computing
techniques.
• In-memory data grids provide an excellent
platform for operational intelligence:
• Model and track the state of a “live” system.
• Implement high availability.
• Offer fast, data-parallel computation for
immediate feedback.
• Provide a straightforward, object-oriented
development model.
© ScaleOut Software, Inc. 32
www.scaleoutsoftware.com

More Related Content

What's hot

Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusDatabricks
 
Extending WSO2 Analytics Platform
Extending WSO2 Analytics PlatformExtending WSO2 Analytics Platform
Extending WSO2 Analytics PlatformWSO2
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentRakuten Group, Inc.
 
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentWebinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentKinetica
 
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined StorageZero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined StorageDataCore Software
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Real-time Microservices and In-Memory Data Grids
Real-time Microservices and In-Memory Data GridsReal-time Microservices and In-Memory Data Grids
Real-time Microservices and In-Memory Data GridsAli Hodroj
 
Hybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database HypeHybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database HypeAli Hodroj
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
 
#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More CapacityGera Shegalov
 
Sybase Complex Event Processing
Sybase Complex Event ProcessingSybase Complex Event Processing
Sybase Complex Event ProcessingSybase Türkiye
 
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019VMware Tanzu
 
IBM DS8880 and IBM Z - Integrated by Design
IBM DS8880 and IBM Z - Integrated by DesignIBM DS8880 and IBM Z - Integrated by Design
IBM DS8880 and IBM Z - Integrated by DesignStefan Lein
 
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsFrom Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsSingleStore
 
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data GridsSpark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data GridsAli Hodroj
 
Power Your Delta Lake with Streaming Transactional Changes
 Power Your Delta Lake with Streaming Transactional Changes Power Your Delta Lake with Streaming Transactional Changes
Power Your Delta Lake with Streaming Transactional ChangesDatabricks
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Precisely
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Big Data Spain
 
Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Zhenxiao Luo
 

What's hot (20)

Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for Fluvius
 
Extending WSO2 Analytics Platform
Extending WSO2 Analytics PlatformExtending WSO2 Analytics Platform
Extending WSO2 Analytics Platform
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
 
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentWebinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
 
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined StorageZero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
Zero Downtime, Zero Touch Stretch Clusters from Software-Defined Storage
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Real-time Microservices and In-Memory Data Grids
Real-time Microservices and In-Memory Data GridsReal-time Microservices and In-Memory Data Grids
Real-time Microservices and In-Memory Data Grids
 
Hybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database HypeHybrid Transactional/Analytics Processing: Beyond the Big Database Hype
Hybrid Transactional/Analytics Processing: Beyond the Big Database Hype
 
QNAP NAS for IoT
QNAP NAS for IoTQNAP NAS for IoT
QNAP NAS for IoT
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity
 
Sybase Complex Event Processing
Sybase Complex Event ProcessingSybase Complex Event Processing
Sybase Complex Event Processing
 
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
 
IBM DS8880 and IBM Z - Integrated by Design
IBM DS8880 and IBM Z - Integrated by DesignIBM DS8880 and IBM Z - Integrated by Design
IBM DS8880 and IBM Z - Integrated by Design
 
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsFrom Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
 
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data GridsSpark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
 
Power Your Delta Lake with Streaming Transactional Changes
 Power Your Delta Lake with Streaming Transactional Changes Power Your Delta Lake with Streaming Transactional Changes
Power Your Delta Lake with Streaming Transactional Changes
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
 
Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019
 

Similar to IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligence Using In-Memory, Data-Parallel Computing

New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...Big Data Spain
 
Operational Intelligence Using Hadoop
Operational Intelligence Using HadoopOperational Intelligence Using Hadoop
Operational Intelligence Using HadoopDataWorks Summit
 
SplunkLive! Milano 2016 - customer presentation - Unicredit
SplunkLive! Milano 2016 -  customer presentation - UnicreditSplunkLive! Milano 2016 -  customer presentation - Unicredit
SplunkLive! Milano 2016 - customer presentation - UnicreditSplunk
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceMongoDB
 
Fast Data at ING – the why, what and how of the streaming analytics platform ...
Fast Data at ING – the why, what and how of the streaming analytics platform ...Fast Data at ING – the why, what and how of the streaming analytics platform ...
Fast Data at ING – the why, what and how of the streaming analytics platform ...Bas Geerdink
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Mainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataMainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataDevOps for Enterprise Systems
 
IBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management SolutionsIBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management SolutionsIBM Danmark
 
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...In-Memory Computing Summit
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Piyush Kumar
 
Actionable Insights - Thompson
Actionable Insights - ThompsonActionable Insights - Thompson
Actionable Insights - ThompsonProlifics
 
Sybase BAM Overview
Sybase BAM OverviewSybase BAM Overview
Sybase BAM OverviewXu Jiang
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Living objects network performance_management_v2
Living objects network performance_management_v2Living objects network performance_management_v2
Living objects network performance_management_v2Yoan SMADJA
 
WSO2 Data Analytics Server - Product Overview
WSO2 Data Analytics Server - Product OverviewWSO2 Data Analytics Server - Product Overview
WSO2 Data Analytics Server - Product OverviewWSO2
 

Similar to IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligence Using In-Memory, Data-Parallel Computing (20)

New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S...
 
Operational Intelligence Using Hadoop
Operational Intelligence Using HadoopOperational Intelligence Using Hadoop
Operational Intelligence Using Hadoop
 
SplunkLive! Milano 2016 - customer presentation - Unicredit
SplunkLive! Milano 2016 -  customer presentation - UnicreditSplunkLive! Milano 2016 -  customer presentation - Unicredit
SplunkLive! Milano 2016 - customer presentation - Unicredit
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
 
Fast Data at ING – the why, what and how of the streaming analytics platform ...
Fast Data at ING – the why, what and how of the streaming analytics platform ...Fast Data at ING – the why, what and how of the streaming analytics platform ...
Fast Data at ING – the why, what and how of the streaming analytics platform ...
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Mainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataMainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live Data
 
IBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management SolutionsIBM Monitoring and Event Management Solutions
IBM Monitoring and Event Management Solutions
 
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
IMCSummit 2015 - Day 1 Developer Track - In-memory Computing for Iterative CP...
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 
Actionable Insights - Thompson
Actionable Insights - ThompsonActionable Insights - Thompson
Actionable Insights - Thompson
 
Sybase BAM Overview
Sybase BAM OverviewSybase BAM Overview
Sybase BAM Overview
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Living objects network performance_management_v2
Living objects network performance_management_v2Living objects network performance_management_v2
Living objects network performance_management_v2
 
Machine Learning with Apache Spark
Machine Learning with Apache SparkMachine Learning with Apache Spark
Machine Learning with Apache Spark
 
WSO2 Data Analytics Server - Product Overview
WSO2 Data Analytics Server - Product OverviewWSO2 Data Analytics Server - Product Overview
WSO2 Data Analytics Server - Product Overview
 

More from In-Memory Computing Summit

IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...In-Memory Computing Summit
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIn-Memory Computing Summit
 
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...In-Memory Computing Summit
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...In-Memory Computing Summit
 

More from In-Memory Computing Summit (20)

IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
IMC Summit 2016 Breakout - Per Minoborg - Work with Multiple Hot Terabytes in...
 
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free I...
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
 
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
IMC Summit 2016 Breakout - Nikita Shamgunov - Propelling IoT Innovation with ...
 
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
 
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming-SQL Dat...
 
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
IMC Summit 2016 Innovation - Dennis Duckworth - Lambda-B-Gone: The In-memory ...
 
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
IMC Summit 2016 Innovation - Steve Wilkes - Tap Into Your Enterprise – Why Da...
 
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X PlatformIMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
IMC Summit 2016 Innovation - Girish Mutreja - Unveiling the X Platform
 
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage TierIMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
IMC Summit 2016 Breakout - Ken Gibson - The In-Place Working Storage Tier
 
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and O...
 
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
IMC Summit 2016 Breakout - Yanping Wang - Non-volatile Generic Object Program...
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
 
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent MemoryIMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
IMC Summit 2016 Breakout - Gordon Patrick - Developments in Persistent Memory
 
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
IMC Summit 2016 Breakout - Girish Kathalagiri - Decision Making with MLLIB, S...
 
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise GradeIMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
IMC Summit 2016 Breakout - Steve Wikes - Making IMC Enterprise Grade
 
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
IMC Summit 2016 Breakout - Noah Arliss - The Truth: How to Test Your Distribu...
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
 
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
IMC Summit 2016 Breakout - Girish Mutreja - Extreme Transaction Processing in...
 
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
IMC Summit 2016 Breakout - Greg Luck - How to Speed Up Your Application Using...
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

IMCSummit 2015 - Day 1 Developer Track - Implementing Operational Intelligence Using In-Memory, Data-Parallel Computing

  • 1. Implementing Operational Intelligence Using In-Memory Computing William L. Bain (wbain@scaleoutsoftware.com) June 29, 2015
  • 2. Agenda • What is Operational Intelligence? • Example: Tracking Set-Top Boxes • Using an In-Memory Data Grid (IMDG) for Operational Intelligence • Tracking and analyzing live data • Comparison to Spark • Implementing OI Using Data-Parallel Computing in an IMDG • A Detailed OI Example in Financial Services • Code Samples in Java • Implementing MapReduce on an IMDG • Optimizing MapReduce for OI • Integrating Operational and Business Intelligence © ScaleOut Software, Inc. 2
  • 3. • Develops and markets In-Memory Data Grids, software middleware for: • Scaling application performance and • Providing operational intelligence using • In-memory data storage and computing • Dr. William Bain, Founder & CEO • Career focused on parallel computing – Bell Labs, Intel, Microsoft • 3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server • Ten years in the market; 400+ customers, 10,000+ servers • Sample customers: About ScaleOut Software
  • 4. ScaleOut Software’s Product Portfolio • ScaleOut StateServer® (SOSS) • In-Memory Data Grid for Windows and Linux • Scales application performance • Industry-leading performance and ease of use • ScaleOut ComputeServer™ adds • Operational intelligence for “live” data • Comprehensive management tools • ScaleOut hServer® • Full Hadoop Map/Reduce engine (>40X faster*) • Hadoop Map/Reduce on live, in-memory data • ScaleOut GeoServer® • WAN based data replication for DR • Global data access and synchronization © ScaleOut Software, Inc. 4 ScaleOut StateServer In-Memory Data Grid Grid Service Grid Service Grid Service Grid Service *in benchmark testing
  • 5. In-Memory Computing Is Not New • 1980’s: SIMD Systems, Caltech Cosmic Cube Thinking Machines Connection Machine 5 • 1990’s: Commercial Parallel Supercomputers Intel IPSC-2 IBM SP1
  • 6. What’s New: IMC on Commodity Hardware • 1990’s – early 2000’s: HPC on Clusters • Since ~2005: Public Clouds HP Blade Servers Amazon EC2, Windows Azure
  • 7. Introductory Video: What is Operational Intelligence https://www.youtube.com/watch?v=H6OFzdIEy-g&feature=youtu.be
  • 8. Online Systems Need Operational Intelligence Goal: Provide immediate (sub-second) feedback to a system handling live data. © ScaleOut Software, Inc. 8 A few example use cases requiring immediate feedback within a live system: • Ecommerce: personalized, real-time recommendations • Healthcare: patient monitoring, predictive treatment • Equity trading: minimize risk during a trading day • Reservations systems: identify issues, reroute, etc. • Credit cards & wire transfers: detect fraud in real time • IoT, Smart grids: optimize power distribution & detect issues
  • 9. Operational vs Business Intelligence Batch Static data sets Petabytes Disk storage Minutes to hours Best uses: • Analyzing warehoused data • Mining for long-term trends Real-time Live data sets Gigabytes to terabytes In-memory storage Sub-second to seconds Best uses: • Tracking live data • Immediately identifying trends and capturing opportunities • Providing immediate feedback Operational Intelligence Business Intelligence Big Data Analytics IMDGs CEP Storm Hadoop Spark Hana OI BI © ScaleOut Software, Inc. 9
  • 10. Example: Enhancing Cable TV Experience • Goals: • Make real-time, personalized upsell offers • Immediately respond to service issues • Detect and manage network hot spots • Track aggregate behavior to identify patterns, e.g.: • Total instantaneous incoming event rate • Most popular programs and # viewers by zip code • Requirements: • Track events from 10M set-top boxes with 25K events/sec (2.2B/day) • Correlate, cleanse, and enrich events per rules (e.g. ignore fast channel switches, match channels to programs) • Be able to feed enriched events to recommendation engine within 5 seconds • Immediately examine any set-top box (e.g., box status) & track aggregate statistics © ScaleOut Software, Inc. 10 ©2011 Tammy Bruce presents LiveWire
  • 11. Based on a simulated workload for San Diego metropolitan area: • Continuously correlates and cleanses telemetry from 10M simulated set-top boxes (from synthetic load generator) • Processes more than 30K events/second • Enriches events with program information every second • Tracks aggregate statistics (e.g., top 10 programs by zip code) every 10 seconds The Result: An OI Platform Real-Time Dashboard © ScaleOut Software, Inc. 11
  • 12. Using an IMDG to Implement OI • IMDG models and tracks the state of a “live” system. • IMDG analyzes the system’s state in parallel and provides real-time feedback. © ScaleOut Software, Inc. 12 IMDG analyzes in-memory data with integrated compute engine. IMDG tracks live system’s state with an in-memory, object-oriented model. IMDG enriches in-memory model from disk-based, historical data.
  • 13. • Each set-top box is represented as an object in the IMDG • Object holds raw & enriched event streams, viewer parameters, and statistics Example: Tracking Set-TopBoxes © ScaleOut Software, Inc. 13 • IMDG captures incoming events by updating objects • IMDG uses data-parallel computation to: • immediately enrich box objects to generate alerts to recommendation engine, and • continuously collect and report global statistics
  • 14. The Foundation: In-Memory Data Grids • In-memory data grid (IMDG) provides scalable, hi av storage for live data: • Designed to manage business logic state: • Object-oriented collections by type • Create/read/update/delete APIs for Java/C#/C++ • Parallel query by object properties • Data shared by multiple clients • Designed for transparent scalability and high availability: • Automatic load-balancing across commodity servers • Automatic data replication, failure detection, and recovery • IMDGs provide ideal platform for operational intelligence: • Easy to track live systems with large workloads • Appropriate availability model for production deployments © ScaleOut Software, Inc. 14
  • 15. Comparing IMDGs to Spark • On the surface, both are surprisingly similar: • Both designed as scalable, in-memory computing platforms • Both implement data-parallel operators • Both can handle streaming data • But there are key differences that impact use for operational intelligence: © ScaleOut Software, Inc. 15 IMDGs Spark Best use Live, operational data Static data or batched streams In-memory model Object-oriented collections Resilient distributed datasets Focus of APIs CRUD, eventing, data-parallel computing Data-parallel operators for analytics High availability tradeoffs Data replication for fast recovery Lineage for max performance
  • 16. Data-Parallel Computing on an IMDG • IMDGs provide powerful, cost-effective platform for data-parallel computing: • Enable integrated computing with data storage: • Take advantage of cluster’s commodity servers and cores. • Avoid delays due to data motion (both to/from disk and across network). • Leverage object-oriented model to minimize development effort: • Easily define data-parallel tasks as class methods. • Easily specify domain as object collection. • Example: “Parallel Method Invocation” (PMI): • Object-oriented version of standard HPC model • Runs class methods in parallel across cluster. • Selects objects using parallel query of obj. collection. • Serves as a platform for implementing MapReduce and other data-parallel operators © ScaleOut Software, Inc. 16 Analyze Data (Eval) Combine Results (Merge)
  • 17. PMI Example: OI in Financial Services • Goal: track market price fluctuations for a hedge fund and keep portfolios in balance. • How: • Keep portfolios of stocks (long and short positions) in object collection within IMDG. • Collect market price changes in one-second snapshots. • Define a method which applies a snapshot to a portfolio and optionally generates an alert to rebalance. • Perform repeated parallel method invocations on a selected (i.e., queried) set of portfolios. • Combine alerts in parallel using a second user-defined method. • Report alerts to UI every second for fund manager. © ScaleOut Software, Inc. 17
  • 18. Defining the Dataset • Simplified example of a portfolio class (Java): • Note: some properties are made query-able. • Note: the evalPositions method analyzes the portfolio for a market snapshot. © ScaleOut Software, Inc. 18 public class Portfolio { private long id; private Set<Stock> longPositions; private Set<Stock> shortPositions; private double totalValue; private Region region; private boolean alerted; // alert for trading @SossIndexAttribute // query-able property public double getTotalValue() {…} @SossIndexAttribute // query-able property public Region getRegion() {…} public Set<Long> evalPositions(MarketSnapshot ms) {…}; }
  • 19. Defining the Parallel Methods • Implement PMI interface to define methods for analyzing each object and for merging the results: © ScaleOut Software, Inc. 19 public class PortfolioAnalysis implements Invokable<Portfolio, MarketSnapshot, Set<Long>> { public Set<Long> eval(Portfolio p, MarketSnapshot ms) throws InvokeException { // update portfolio and return id if alerted: return p.evalPositions(ms); } public Set<Long> merge(Set<Long> set1, Set<Long> set2) throws InvokeException { set1.addAll(set2); return set1; // merged set of alerted portfolio ids }}
  • 20. Running the Analysis • PMI can be run from a remote workstation. • IMDG ships code and libraries to cluster of servers: • Execution environment can be pre-staged for fast startup. • In-line execution minimizes scheduling time. • Avoids batch scheduling delays. • PMI automatically runs in parallel across all grid servers: • Uses software multicast to accelerate startup. • Passes market snapshot parameter to all servers. • Uses all servers and cores to maximize throughput. © ScaleOut Software, Inc. 20
  • 21. Spawning the Compute Engine • First obtain a reference to the IMDG’s object collection of portfolios: • Create an “invocation grid,” a re-usable compute engine for the application: • Spawns a JVM on all grid servers and connects them to the in-memory data grid. • Stages the application code on all JVMs. • Associates the invocation grid with an object collection. © ScaleOut Software, Inc. 21 InvocationGrid grid = new InvocationGridBuilder("grid") .addClass(DependencyClass.class) .addJar("/path/to/dependency.jar") .setJVMParameters("-Xmx2m") .load(); pset.setInvocationGrid(grid); NamedCache pset = CacheFactory.getCache(“portfolios");
  • 22. Invoking the PMI • Run the PMI on a queried set of objects within the collection: • Multicasts the invocation and parameters to all JVMs. • Runs the data-parallel computation. • Merges the results and returns a final result to the point of call. © ScaleOut Software, Inc. 22 InvokeResult alertedPortolios = pset.invoke( PortfolioAnalysis.class, Portfolio.class, and(greaterThan(“totalValue”, 1000000), // query spec equals(“region”, Region.US)), marketSnapshot, // parameters ... ); System.out.println("The alerted portfolios are" + alertedPortfolios.getResult());
  • 23. Execution Steps • Eval phase: each server queries local objects and runs eval and merge methods: • Note: Accessing local data avoids networking overhead. • Completes with one result object per server. © ScaleOut Software, Inc. 23 • Merge phase: all servers perform distributed merge to create final result: • Merge runs in parallel to minimize completion time. • Returns final result object to client.
  • 24. Importance of Avoiding Data Motion • Local data access enables linear throughput scaling. • Network access creates a bottleneck that limits throughput. © ScaleOut Software, Inc. 24
  • 25. Outputting Continuous Alerts to the UI • PMI runs every second; it completes in 350 msec. and immediately refreshes UI. • UI alerts trader to portfolios that need rebalancing. • UI allows trader to examine portfolio details and determine specific positions that are out of balance. • Result: in-memory computing delivers operational intelligence. © ScaleOut Software, Inc. 25
  • 26. Demonstration Video: Comparison of PMI to Apache Hadoop https://www.youtube.com/watch?v=8JTsqp_-Gnw
  • 27. PMI Scales for Large In-Memory Datasets • Measured a similar financial services application (back testing stock trading strategies on stock histories) • Hosted IMDG in Amazon EC2 using 75 servers holding 1 TB of stock history data in memory • IMDG handled a continuous stream of updates (1.1 GB/s) • Results: analyzed 1 TB in 4.1 seconds (250 GB/s). • Observed linear scaling as dataset and update rate grew. © ScaleOut Software, Inc. 27
  • 28. Using PMI to Implement MapReduce for OI • PMI serves as foundational platform for MapReduce and other parallel operators. • Implement MapReduce with two PMI phases: • Runs standard Hadoop MapReduce applications. • Data can be input from either the IMDG or an external data source. • Works with any input/output format. • IMDG uses PMI phases to invoke the mappers and reducers. • Eliminates batch scheduling overhead. • Intermediate results are stored within the IMDG. • Minimizes data motion in shuffle phase. • Allows optional sorting. • Note: output of a single reducer/combiner optionally can be globally merged. © ScaleOut Software, Inc. 28
  • 29. MapReduce for OI Requires New Data Model • IMDGs historically implement a feature-rich data model: • Efficiently manages large objects (KBs-MBs). • Supports object timeouts, locking, query by properties, dependency relationships, etc. • MapReduce typically targets very large collections of small key/value pairs: • Does not require rich object semantics. • Does require efficient storage (minimum metadata) and highly pipelined access. • Solution: a new IMDG data model for MapReduce: • Uses standard Java named map APIs for access. • MapReduce uses standard input/output formats. • Stores data in chunks and pipelines to/from engine. • Automatically defines splits for mappers and holds shuffled data for reducers. © ScaleOut Software, Inc. 29
  • 30. Optimizing MapReduce for OI: simpleMR • Integrate in-memory named map with MapReduce to minimize execution time. • Use new API (simpleMR in Java, C#) to simplify apps and remove Hadoop dependencies. © ScaleOut Software, Inc. 30 public class Mapper : IMapper<int, string, string, int> { void IMapper<int, string, string, int>.Map(int key, string value, IContext<string, int> context) { ... context.Emit(Encoding.ASCII.GetString(...), 1); }} inputMap = new NamedMap<int, string>("Input_Map"); outputMap = new NamedMap<string, int>("Output_Map"); inputMap.RunMapReduce<string, int, string, int>(outputMap, new Mapper(), new Combiner(), new Reducer(), ...);
  • 31. Integrating OI and BI in the Data Warehouse • In-memory data grids can add value to a BI platform, e.g.: • Transform live data and store in HDFS for analysis. • Provide immediate feedback to live system pending deep analysis. • Using YARN, an IMDG can be directly integrated into a BI cluster: • The IMDG holds fast-changing data. • YARN directs MapReduce jobs to the IMDG. • The IMDG can output results to HDFS. © ScaleOut Software, Inc. 31 ETL Example
  • 32. Recap: In-Memory Computing for OI • Online systems need operational intelligence on “live” data for immediate feedback. • Creates important new business opportunities. • Operational intelligence can be implemented using standard data-parallel computing techniques. • In-memory data grids provide an excellent platform for operational intelligence: • Model and track the state of a “live” system. • Implement high availability. • Offer fast, data-parallel computation for immediate feedback. • Provide a straightforward, object-oriented development model. © ScaleOut Software, Inc. 32