The document discusses how in-memory platforms are about more than just speed: they are designed to exploit RAM efficiently and are optimized for analytics workloads that involve complex "crunching" of data. It explains that analytics workloads are CPU-intensive and benefit from techniques such as parallelization across CPU cores. It also notes that declining RAM prices and growing interest in advanced analytics are driving adoption of in-memory platforms for both large and small data use cases.
3. Mission
! Reveal the essential characteristics of enterprise software,
good and bad
! Provide a forum for detailed analysis of today's innovative
technologies
! Give vendors a chance to explain their product to savvy
analysts
! Allow audience members to pose serious questions... and get
answers!
Twitter Tag: #briefr
The Briefing Room
4. Topics
This Month: DATA PROCESSING
November: DATA DISCOVERY & VISUALIZATION
December: INNOVATORS
5. Data Processing
"Efficiency is doing things right; effectiveness is doing the right things."
~Peter Drucker
6. Analyst: Robin Bloor
Robin Bloor is
Chief Analyst at
The Bloor Group
robin.bloor@bloorgroup.com
7. Kognitio
! Founded in 1989, Kognitio is both an in-memory database
and an analytical engine
! The Kognitio Analytical Platform can be deployed as
software, as an appliance, or in the cloud
! The platform enables flexible, ad hoc queries on complex
data sets, including data from Hadoop, and it offers scale-up and scale-out capabilities
8. Guest: Roger Gaskell
Roger Gaskell is the Chief Technology Officer and one of the founding members
of the Kognitio Development Team. He has overall responsibility for all product
development, strategic direction and the roadmap of new innovation for the
Kognitio Analytical Platform. Roger has been instrumental in every generation of
the product to date. Over this time it has evolved from the appliance-based
system of the original 1989 beta offering, to hardware-independent software
for x86 processors, and then to a cloud-based Platform-as-a-Service offering
in the mid-1990s. Prior to Kognitio, Roger was test and development manager
at AB Electronics, where his primary responsibilities included the famous BBC
Micro computer and the development and testing of IBM's first mass-produced
personal computers.
10. What is an "in-memory" analytical platform?
A database where queries are run from data held in
computer memory (RAM) rather than mechanical disk
Memory = Fast / Disk = Slow
So analytics run much faster – simple?
Unfortunately, it's not as simple as that…
11. Why in-memory: RAM is faster than disk (really!)
Actually, this is only part of the story:
• Analytics completely change the workload characteristics on the database
• Simple reporting & transactional processing is all about "filtering" the data of interest
• Analytics is all about complex "crunching" of the data once it is filtered
• Crunching needs processing power & consumes CPU cycles
• Storing data on physical disks severely limits the rate at which data can be provided to the CPUs
• Accessing data directly from RAM allows much more CPU power to be deployed
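The filtering/crunching split above can be sketched in a few lines of Python (a toy illustration, not Kognitio code): filtering touches each row once with a cheap comparison, while crunching (here a group-by aggregation) spends CPU cycles on arithmetic for every row already held in RAM.

```python
import random
import time

# Hypothetical toy dataset held in RAM: 100,000 (customer_id, amount) rows.
rows = [(random.randrange(1000), random.random()) for _ in range(100_000)]

# "Filtering": select the rows of interest -- one cheap test per row.
t0 = time.perf_counter()
selected = [amount for cust, amount in rows if cust < 100]
filter_secs = time.perf_counter() - t0

# "Crunching": a group-by aggregation -- arithmetic on every row, so the
# cost is dominated by CPU cycles rather than by data access.
t0 = time.perf_counter()
totals = {}
for cust, amount in rows:
    totals[cust] = totals.get(cust, 0.0) + amount
crunch_secs = time.perf_counter() - t0

print(f"filter: {filter_secs:.4f}s  crunch: {crunch_secs:.4f}s")
```

With the data already in RAM, both passes are limited only by the CPU; against spinning disk, the same passes would stall waiting for I/O.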
12. Analytics is about crunching through data
"CRUNCHING" – analytical functions, joins, aggregations, sorts, grouping – is CPU cycle-intensive and CPU-bound.
• Crunching is how you understand what is happening in the data
• The more complex the analytics, the more pronounced this becomes
• In-memory analytical platforms are therefore CPU-bound
– They assume disk I/O speeds are not a bottleneck
– In-memory removes the disk I/O bottleneck
13. For analytics, the CPU is king
Being CPU-bound fundamentally changes a system's design philosophy:
Disk I/O bound:
• CPUs wait for data from disk
• No need for efficient coding
• Parallelization ineffective
CPU bound:
• Every CPU cycle is precious – efficient coding
• Parallelization = scalable performance
• Advanced techniques minimize CPU cycles
Interactive / ad hoc analytics: THINK data-to-core ratios ≈ <10GB of data per CPU core
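The data-to-core guideline above reduces to simple arithmetic. A minimal sizing sketch in Python, assuming the slide's <10GB-per-core rule of thumb (the function name is illustrative, not from any product):

```python
import math

GB_PER_CORE = 10  # guideline from the slide: <10GB of data per CPU core

def cores_needed(data_gb: float) -> int:
    """Minimum CPU cores to keep interactive/ad hoc analytics CPU-bound."""
    return math.ceil(data_gb / GB_PER_CORE)

# e.g. a 2TB in-memory data set:
print(cores_needed(2048))  # -> 205
```

The point of the ratio is capacity planning: the in-memory data set is sized against available cores, not against disk capacity.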
15. Mature BI being overtaken
Numbers, tables, charts, indicators
Historical information, latency
…accessed with ease and simplicity
Decision Support
But BI and BI tools have plateaued!
Progression into advanced analytics & data science
It’s now all about doing more math
…a lot more math
17. How to efficiently exploit RAM
• A large cache is not in-memory
– In-memory platforms hold data in structures that take advantage of the
properties of RAM
– Caches are copies of frequently used disk blocks
• Platforms are designed specifically to exploit the random-access
nature of memory
– Different algorithms
– CPU cycles are precious – code efficiency paramount
– Advanced techniques used to reduce code path length
• Dynamic Machine Code Generation
• Extended CPU instruction sets
• Parallelize everything
– Scale-out and Scale-up
– Fully and efficiently use every CPU
core, in every CPU, in every server
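The "parallelize everything" principle – scale-up within one server – can be sketched with Python's standard multiprocessing module (an illustration of the idea only, not how Kognitio is implemented): a single aggregation is split into chunks, one crunched per worker process, so every local core does useful work.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker process crunches its own slice of the in-memory data.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Scale-up: split the work evenly across local CPU cores.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        # Combine the per-core partial results into the final answer.
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

Scale-out applies the same decomposition across servers instead of cores; either way, the aggregate answer is the combination of independent partial results.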
18. Analytical Platform Reference Architecture
• Application & Client Layer: all BI tools, all OLAP clients, Excel
• Analytical Platform Layer: Kognitio, with near-line storage (optional) and reporting
• Persistence Layer: Kognitio Storage, Hadoop clusters, cloud storage, enterprise data warehouses, legacy systems
21. Big Data, Maybe — Big Parallelism, Yes
Many latency-reducing changes are afoot:
• Hadoop is a data lake – It's about latency
• CPU and memory rule – The old database is dying
• Grids, not clusters – A server is now a cluster
• Scaling up AND scaling out – "Only scaling out" is last year's story
• SSD will replace spinning disk – But it will never compete with RAM
22. Why the Excitement?
What are the “new” applications?
BIG DATA capture and staging
BIG DATA ANALYTICS
LITTLE DATA ANALYTICS
OPERATIONAL INTELLIGENCE
24. Where the Rubber Meets the Road
It isn't really about application latency any more, it's about business
process latency (business time!). This can have many aspects:
• The collapse of data flows – take the processing to the data
• Data warehouse offload
• Full process automation
• Lower latency = NEW BUSINESS PROCESSES
25. The Question
The question for most organizations is: Exactly how do we take advantage of these changes?
This is a BUSINESS question AND a TECHNICAL question.
26.
• Low latency is exciting, but where do you see the clear business opportunities?
• There seems to be a conundrum about where to store "slow" data:
– Hadoop?
– Traditional data warehouse?
– New data warehouse?
• Is the split between the application and the data real any more?
27.
• In your opinion, does the Enterprise need a new architecture?
• How is it possible to define and monitor service levels with in-memory applications?
• Whither data governance?