The document discusses how in-memory platforms are about more than just speed: they are designed to exploit RAM efficiently and are optimized for analytics workloads that involve complex "crunching" of data. It explains that analytics workloads are CPU-intensive and benefit from techniques such as parallelization across CPU cores. It also notes that declining RAM prices and growing interest in advanced analytics are driving adoption of in-memory platforms for both large and small data use cases.
3. Mission
! Reveal the essential characteristics of enterprise software,
good and bad
! Provide a forum for detailed analysis of today's innovative
technologies
! Give vendors a chance to explain their product to savvy
analysts
! Allow audience members to pose serious questions... and get
answers!
Twitter Tag: #briefr
The Briefing Room
4. Topics
This Month: DATA PROCESSING
November: DATA DISCOVERY & VISUALIZATION
December: INNOVATORS
5. Data Processing
"Efficiency is doing things right; effectiveness is doing the right things."
~Peter Drucker
6. Analyst: Robin Bloor
Robin Bloor is
Chief Analyst at
The Bloor Group
robin.bloor@bloorgroup.com
7. Kognitio
! Founded in 1989, Kognitio is both an in-memory database
and an analytical engine
! The Kognitio Analytical Platform can be deployed as
software, as an appliance, or in the cloud
! The platform enables flexible, ad hoc queries on complex
data sets, including data from Hadoop, and it offers scale-up and scale-out capabilities
8. Guest: Roger Gaskell
Roger Gaskell is the Chief Technology Officer and one of the founding members
of the Kognitio Development Team. He has overall responsibility for all product
development, strategic direction and the roadmap of new innovation for the
Kognitio Analytical Platform. Roger has been instrumental in every generation of
the product to date. Over this time it has evolved from the appliance-based
system of the original 1989 beta offering, to hardware-independent software
for x86 processors, and then to a cloud-based Platform-as-a-Service offering
in the mid-1990s. Prior to Kognitio, Roger was test and development manager
at AB Electronics, where his primary responsibilities included the famous BBC
Micro computer and the development and testing of IBM's first mass-produced
personal computers.
10. What is an "in-memory" analytical platform?
A database where queries are run from data held in
computer memory (RAM) rather than mechanical disk
Memory = Fast / Disk = Slow
So analytics run much faster – simple?
Unfortunately, it's not as simple as that…
11. Why in-memory: RAM is faster than disk (really!)
Actually, this is only part of the story:
• Analytics completely change the workload characteristics on the database
• Simple reporting & transactional processing is all about "filtering" the data of interest
• Analytics is all about complex "crunching" of the data once it is filtered
• Crunching needs processing power & consumes CPU cycles
• Storing data on physical disks severely limits the rate at which data can be provided to the CPUs
• Accessing data directly from RAM allows much more CPU power to be deployed
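The filtering/crunching split above can be sketched in a few lines of Python (a toy illustration, not Kognitio code): filtering touches each row once with a cheap comparison, while crunching (here a group-by aggregation) spends CPU cycles on arithmetic for every row already held in RAM.

```python
import random
import time

# Hypothetical toy dataset held in RAM: 100,000 (customer_id, amount) rows.
rows = [(random.randrange(1000), random.random()) for _ in range(100_000)]

# "Filtering": select the rows of interest -- one cheap test per row.
t0 = time.perf_counter()
selected = [amount for cust, amount in rows if cust < 100]
filter_secs = time.perf_counter() - t0

# "Crunching": a group-by aggregation -- arithmetic on every row, so the
# cost is dominated by CPU cycles rather than by data access.
t0 = time.perf_counter()
totals = {}
for cust, amount in rows:
    totals[cust] = totals.get(cust, 0.0) + amount
crunch_secs = time.perf_counter() - t0

print(f"filter: {filter_secs:.4f}s  crunch: {crunch_secs:.4f}s")
```

With the data already in RAM, both passes are limited only by the CPU; against spinning disk, the same passes would stall waiting for I/O.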
12. Analytics is about crunching through data
"CRUNCHING" – analytical functions, joins, aggregations, sorts, grouping – is CPU cycle-intensive and CPU-bound.
• Crunching is how you understand what is happening in the data
• The more complex the analytics, the more pronounced this becomes
• In-memory analytical platforms are therefore CPU-bound
– They assume disk I/O speeds are not a bottleneck
– In-memory removes the disk I/O bottleneck
13. For analytics, the CPU is king
Being CPU-bound fundamentally changes a system's design philosophy:
Disk I/O bound:
• CPUs wait for data from disk
• No need for efficient coding
• Parallelization ineffective
CPU bound:
• Every CPU cycle is precious – efficient coding
• Parallelization = scalable performance
• Advanced techniques minimize CPU cycles
Interactive / ad hoc analytics: THINK data-to-core ratios ≈ <10GB of data per CPU core
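The data-to-core guideline above reduces to simple arithmetic. A minimal sizing sketch in Python, assuming the slide's <10GB-per-core rule of thumb (the function name is illustrative, not from any product):

```python
import math

GB_PER_CORE = 10  # guideline from the slide: <10GB of data per CPU core

def cores_needed(data_gb: float) -> int:
    """Minimum CPU cores to keep interactive/ad hoc analytics CPU-bound."""
    return math.ceil(data_gb / GB_PER_CORE)

# e.g. a 2TB in-memory data set:
print(cores_needed(2048))  # -> 205
```

The point of the ratio is capacity planning: the in-memory data set is sized against available cores, not against disk capacity.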
15. Mature BI being overtaken
Numbers, tables, charts, indicators
Historical information, latency
…accessed with ease and simplicity
Decision Support
But BI and BI tools have plateaued!
Progression into advanced analytics & data science
It’s now all about doing more math
…a lot more math
17. How to efficiently exploit RAM
• A large cache is not in-memory
– In-memory platforms hold data in structures that take advantage of the
properties of RAM
– Caches are copies of frequently used disk blocks
• Platforms are designed specifically to exploit the random-access
nature of memory
– Different algorithms
– CPU cycles are precious – code efficiency paramount
– Advanced techniques used to reduce code path length
• Dynamic Machine Code Generation
• Extended CPU instruction sets
• Parallelize everything
– Scale-out and Scale-up
– Fully and efficiently use every CPU
core, in every CPU, in every server
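The "parallelize everything" principle – scale-up within one server – can be sketched with Python's standard multiprocessing module (an illustration of the idea only, not how Kognitio is implemented): a single aggregation is split into chunks, one crunched per worker process, so every local core does useful work.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker process crunches its own slice of the in-memory data.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Scale-up: split the work evenly across local CPU cores.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        # Combine the per-core partial results into the final answer.
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

Scale-out applies the same decomposition across servers instead of cores; either way, the aggregate answer is the combination of independent partial results.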
18. Analytical Platform Reference Architecture
• Application & Client Layer: all BI tools, all OLAP clients, Excel
• Analytical Platform Layer: Kognitio, with near-line storage (optional) and reporting
• Persistence Layer: Kognitio Storage, Hadoop clusters, cloud storage, enterprise data warehouses, legacy systems
21. Big Data, Maybe — Big Parallelism, Yes
Many latency-reducing changes are afoot:
• Hadoop is a data lake – It's about latency
• CPU and memory rule – The old database is dying
• Grids, not clusters – A server is now a cluster
• Scaling up AND scaling out – "Only scaling out" is last year's story
• SSD will replace spinning disk – But it will never compete with RAM
22. Why the Excitement?
What are the “new” applications?
BIG DATA capture and staging
BIG DATA ANALYTICS
LITTLE DATA ANALYTICS
OPERATIONAL INTELLIGENCE
24. Where the Rubber Meets the Road
It isn't really about application latency any more, it's about business
process latency (business time!). This can have many aspects:
• The collapse of data flows – take the processing to the data
• Data warehouse offload
• Full process automation
• Lower latency = NEW BUSINESS PROCESSES
25. The Question
The question for most organizations is: Exactly how do we take advantage of these changes?
This is a BUSINESS question AND a TECHNICAL question.
26.
• Low latency is exciting, but where do you see the clear business opportunities?
• There seems to be a conundrum about where to store "slow" data:
– Hadoop?
– Traditional data warehouse?
– New data warehouse?
• Is the split between the application and the data real any more?
27.
• In your opinion, does the Enterprise need a new architecture?
• How is it possible to define and monitor service levels with in-memory applications?
• Whither data governance?