Calpont InfiniDB® - Scalable and Fast Analytics for Your NoSQL Big Data
2. Key Takeaways
• Calpont and InfiniDB
• Architecture – Columnar Storage
• Architecture – Map Reduction Distribution of Work
• Performance Characteristics
• Ease of Use and Flexibility
• Extensibility
InfiniDB® Scalable. Fast. Simple. 2 Copyright © 2011 Calpont. All Rights Reserved.
3. Calpont Corporation
• Company
o Privately held and backed
o Headquartered in Frisco, TX
• Products
o InfiniDB Enterprise
Launched February 2010
o InfiniDB Community
Launched October 2009
• Our Mission
To provide a scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
4. InfiniDB Release Highlights
• Version 1.0 ‐ Oct. 2009/Feb. 2010
o Columnar storage.
o Map‐reduction distribution of work.
o High speed data load.
• Version 1.5 – June 2010
o Sub‐query added to map‐reduction framework.
Select, From, Where clause support.
Correlated, Non‐Correlated sub‐query.
5. InfiniDB Release Highlights
• Version 2.0 – November 2010
o Compression with real‐time decompression.
o User‐defined functions, fully parallel and distributed.
Latitude/longitude distance calculation.
Geo‐fencing: is a location within a polygon?
o Enhanced partition elimination.
o Enhanced parallelization of reduction operations.
• Version 2.1 – March 2011
o Statistical aggregate functions.
o View support.
o Auto‐increment.
o Insert‐select.
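The geo UDFs listed for version 2.0 (latitude/longitude distance and point-in-polygon geo-fencing) are per-row functions. The Python sketch below shows the kind of logic such functions might compute. It is not InfiniDB's UDF interface, and the haversine formula, the 6,371 km mean Earth radius and the ray-casting polygon test are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from the slides

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_polygon(lat, lon, polygon):
    """Ray-casting geo-fence test: is (lat, lon) inside polygon?

    polygon is a list of (lat, lon) vertices. Coordinates are treated as
    planar, which is a reasonable approximation only for small fences.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (0, 10), (10, 10), (10, 0)]
print(in_polygon(5, 5, square), in_polygon(15, 5, square))  # True False
print(round(distance_km(0, 0, 0, 1), 1))  # 111.2
```

According to the slides, InfiniDB runs its UDFs fully parallel and distributed; here they are plain functions applied to single values.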
6. InfiniDB Release Highlights
• Version 2.2 – June 2011
o Group_concat and bit aggregate functions.
o Additional scalar functions made parallel and distributed.
o Improved performance and memory for large strings.
• Version 3.0 – Q4/Q1
o Cloud shared nothing.
o Distributed/parallel load.
7. Technology Trends
(Chart: "Moore's Law and Beyond". Percent increase (y-axis, 0 to 300) against years 5 to 10 (x-axis) for: Data Warehouse Growth 75%, Memory Capacity 60%, Disk Capacity 50%, Moore's Law (CPU) 45%, Disk Bandwidth 40%, Memory Bandwidth 20%, Disk Latency 10%, Memory Latency 10%.)
8. Trends Drive Demand for Alternate Solutions
(Same "Moore's Law and Beyond" chart as slide 7.)
9. Traditional Row/Index Based DBMS for Analytics
(Same "Moore's Law and Beyond" chart as slide 7, annotated "Index Operations".)
10. InfiniDB Technology Foundations
(Same "Moore's Law and Beyond" chart as slide 7, annotated with:)
• Scalable Disk
• Scalable Cache
• Real‐time Decompression
• Efficient I/O from cache
• Efficient I/O from disk
• No Random I/O Operations
12. InfiniDB Architecture – Columnar Storage
What is Columnar Storage?
(Diagram: Column 1 in File 1, Column 2 in File 2, Column 3 in File 3.)
• Stores each column of a table in a separate file/block on disk.
o Column 1 values stored in file 1.
o Column 2 values stored in file 2.
o Column 3 values stored in file 3.
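To make the layout concrete, here is a minimal Python sketch that writes each column of a small table to its own file. The file names (`col1.col` and so on), the temporary directory and the fixed-width 8-byte integer encoding are illustrative assumptions, not InfiniDB's on-disk format.

```python
import os
import struct
import tempfile

def write_columns(table_dir, rows, column_names):
    """Write each column of `rows` to its own file: one file per column."""
    for col_idx, name in enumerate(column_names):
        path = os.path.join(table_dir, f"{name}.col")
        with open(path, "wb") as f:
            for row in rows:
                # Fixed-width 8-byte signed integers keep every value at a predictable offset.
                f.write(struct.pack("<q", row[col_idx]))

rows = [(1, 10, 100), (2, 20, 200), (3, 30, 300)]
table_dir = tempfile.mkdtemp()
write_columns(table_dir, rows, ["col1", "col2", "col3"])
print(sorted(os.listdir(table_dir)))  # ['col1.col', 'col2.col', 'col3.col']
```

A query that touches only `col2` now reads 24 bytes instead of the whole table.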
13. InfiniDB Architecture – Columnar Storage
• Rows are identified by offset. Row 101 can be found at:
o Column 1 value is at offset 101 in file 1.
o Column 2 value is at offset 101 in file 2.
o Column 3 value is at offset 101 in file 3.
(Diagram: Column 1 in File 1, Column 2 in File 2, Column 3 in File 3; the values at offset 101 are 1234, 2012‐01‐01 and Smith.)
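Because every column file stores its values in row order, the row at a given offset can be rebuilt by reading that offset from each file. The sketch below assumes fixed-width 8-byte integers, so that offset times value width gives the byte position; the slide's example row mixes an integer, a date and a string, which a real engine would encode differently. File names and helper functions are hypothetical.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<q")  # assumed fixed width: one 8-byte integer per value

def read_value(path, offset):
    """Return the value stored at row `offset` in one column file."""
    with open(path, "rb") as f:
        f.seek(offset * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]

def read_row(column_paths, offset):
    """Rebuild a row by reading the same offset from every column file."""
    return tuple(read_value(p, offset) for p in column_paths)

# Build three small column files; row i holds (i, i * 10, i * 100).
table_dir = tempfile.mkdtemp()
paths = []
for name, scale in [("col1", 1), ("col2", 10), ("col3", 100)]:
    path = os.path.join(table_dir, name + ".col")
    with open(path, "wb") as f:
        for i in range(200):
            f.write(RECORD.pack(i * scale))
    paths.append(path)

print(read_row(paths, 101))  # (101, 1010, 10100)
```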
14. InfiniDB Architecture – Column Restriction
(Diagram: Col 1 through Col 90 stored in File 1 through File 90.)
Restriction: find rows based on filters.
• Column Filter (filter 1, filter 2, filter 3)
• Table Expression/Functions (exp 1, exp 2)
• Join Filter (join 1, join 2, join 3)
…
Just‐in‐time column access defers I/O until it is needed.
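Restriction with just-in-time column access can be sketched as evaluating one column filter at a time and consulting the next column only at offsets that survived the earlier filters. This Python sketch works on individual values in in-memory lists, a simplification made for brevity (the architecture described here works on blocks on disk); the column names and data are hypothetical, and the `touched` counter shows that the unfiltered `amount` column is never read.

```python
def restrict(columns, filters):
    """Return (offsets of rows passing every filter, number of values read).

    filters is a list of (column_name, predicate) pairs applied in order;
    each later column is only consulted at offsets that survived so far.
    """
    first_col = next(iter(columns.values()))
    survivors = range(len(first_col))
    touched = 0  # how many individual column values were read
    for name, predicate in filters:
        col = columns[name]
        kept = []
        for off in survivors:
            touched += 1
            if predicate(col[off]):
                kept.append(off)
        survivors = kept
    return list(survivors), touched

columns = {
    "region": ["EU", "US", "EU", "APAC", "EU"],
    "year":   [2010, 2011, 2011, 2010, 2011],
    "amount": [5, 7, 9, 3, 4],
}
rows, touched = restrict(columns, [("region", lambda v: v == "EU"),
                                   ("year", lambda v: v == 2011)])
print(rows, touched)  # [2, 4] 8
```

Ordering the most selective filter first shrinks the survivor set early, which is one reason filter order optimization matters in a column store.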
15. InfiniDB Architecture – Column Projection
(Diagram: Col 1 through Col 90 stored in File 1 through File 90.)
Projection: display the selected columns.
• Select Column Filter (filter 1, filter 2, filter 3, etc.)
…
Just do I/O for:
• Columns selected
• Rows that pass the filters
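Projection then materializes only the selected columns, and only at the surviving row offsets. A minimal Python sketch over in-memory column lists; the column names and the offsets `[2, 4]` are hypothetical inputs standing in for the output of a restriction step.

```python
def project(columns, selected, offsets):
    """Materialize only the selected columns, only at the given row offsets."""
    return [tuple(columns[name][off] for name in selected) for off in offsets]

columns = {
    "region": ["EU", "US", "EU", "APAC", "EU"],
    "year":   [2010, 2011, 2011, 2010, 2011],
    "amount": [5, 7, 9, 3, 4],
}
# Offsets [2, 4] are assumed to come from an earlier restriction step.
print(project(columns, ["region", "amount"], [2, 4]))  # [('EU', 9), ('EU', 4)]
```

The `year` column is never read, and rows 0, 1 and 3 are never assembled into tuples.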
16. Column Restriction and Projection
(Diagram: Filter 1, Filter 2 and Filter 3 plus two Projection steps applied across Column # Four, Column # Six and Column # Seventeen, with Extent # 5 and Extent # 27 labeled.)
• Automatic Vertical Partitioning and Horizontal Partitioning
• Just‐In‐Time Materialization
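Horizontal partitioning pays off when whole slices of a column can be skipped without reading them. The sketch below assumes each extent records the minimum and maximum of its values (a common way to implement partition elimination; the slides list the Extent Map as enabling it but do not describe its contents) and skips every extent whose range cannot satisfy a BETWEEN-style filter. Extent size and data are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    """One horizontal slice of a column plus the min/max recorded for it."""
    values: list
    lo: int
    hi: int

def make_extents(column, extent_rows):
    """Split a column into fixed-size extents and record each extent's min/max."""
    extents = []
    for start in range(0, len(column), extent_rows):
        chunk = column[start:start + extent_rows]
        extents.append(Extent(chunk, min(chunk), max(chunk)))
    return extents

def scan_between(extents, low, high):
    """Return matching values, skipping extents whose [lo, hi] misses [low, high]."""
    hits, scanned = [], 0
    for ext in extents:
        if ext.hi < low or ext.lo > high:
            continue  # eliminated from metadata alone: no I/O on this extent
        scanned += 1
        hits.extend(v for v in ext.values if low <= v <= high)
    return hits, scanned

# An increasing column (e.g. load dates stored as integers) gives tight min/max ranges.
column = list(range(100))
extents = make_extents(column, extent_rows=10)
hits, scanned = scan_between(extents, 42, 47)
print(hits, scanned, len(extents))  # [42, 43, 44, 45, 46, 47] 1 10
```

Elimination works best when values are clustered by load order; randomly ordered data gives every extent a wide min/max range and little can be skipped.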
17. InfiniDB Architecture – Columnar Storage
InfiniDB Eliminates:
• Full Table Scan
• Random I/O
• Index Load Overhead
• Conditional Performance
InfiniDB Adds:
• Efficient I/O
• Real‐time Compression
• Fast, predictable Load
• Predictable Performance
19. InfiniDB – Two Tier Architecture
Purpose built for big data analytics.
• User Module (UM)
Understands SQL.
• Performance Module (PM)
Operates on data blocks.
(Diagram: the UM and PM tiers shown across servers, or on a Single Server.)
20. Tiered MPP Building Blocks
• MySQL
o Functionality: Hosts MySQL; connection management; SQL parsing & optimization.
o Value: Familiar DBMS interface; leverages existing partner integrations; delivers full SQL syntax support.
• Extent Map
o Functionality: Abstracts physical and logical storage; metadata store.
o Value: Enables shared nothing and shared everything storage; enables partition elimination; built‐in failover.
• ExeMgr
o Functionality: Work distribution; final results management and aggregation.
o Value: Independent scalability and tunable concurrency; multi‐threaded to take advantage of multi‐core HW platforms.
21. Tiered MPP Building Blocks
• PrimProc
o Functionality: Scale‐out cache management; distributed scan, filter, join and aggregation operations; resource management.
o Value: Independent scalability and tunable performance; multi‐threaded to take advantage of multi‐core HW platforms.
• Data
o Functionality: High speed bulk load; transactional DML and DDL; online schema extensions.
o Value: Enables concurrent reads and writes, with non‐blocking reads; multi‐threaded to take advantage of multi‐core HW platforms.
22. Tiered MPP Building Blocks
What is the basic unit of work within the Performance Module?
• One thread working on a range of rows. Typically 1/2 million rows,
stored in a few hundred blocks of data.
• Execute all column operations required (restriction and projection).
• Execute any group by/aggregation against local data.
• Return results to User Module.
• Primitives are run in parallel and fully distributed (MPP).
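The basic unit of work described above resembles a map step (one thread filters and aggregates its own range of rows) followed by a reduce step on the User Module. This Python sketch models that split with a thread pool and `collections.Counter`. The range size, the filter and the GROUP BY/SUM query are hypothetical, and Python threads illustrate the structure only, not true multi-core or multi-server (MPP) parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def pm_primitive(keys, amounts, start, stop):
    """'Map' step: one thread filters and aggregates a single range of rows."""
    partial = Counter()
    for i in range(start, stop):
        if amounts[i] > 0:                  # restriction
            partial[keys[i]] += amounts[i]  # local GROUP BY key, SUM(amount)
    return partial

def um_merge(partials):
    """'Reduce' step: the User Module combines the partial aggregates."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)

def group_sum(keys, amounts, range_rows, workers=4):
    """Split rows into ranges, aggregate each range in a thread, then merge."""
    ranges = [(s, min(s + range_rows, len(keys))) for s in range(0, len(keys), range_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: pm_primitive(keys, amounts, *r), ranges)
        return um_merge(partials)

keys = ["a", "b", "a", "c", "b", "a"] * 1000
amounts = [1, 2, 3, 0, 5, -1] * 1000
print(group_sum(keys, amounts, range_rows=500))  # {'a': 4000, 'b': 7000}
```

Only small partial aggregates travel back to the merge step, which is why pushing group by/aggregation down to the Performance Module reduces traffic to the User Module.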
24. InfiniDB Load Performance
• Load rate capable of 1 million rows/second, depending on disk and data model.
• Consistent load rate over time.
(Chart: load rate over time.)
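The quoted rate of up to 1 million rows/second converts to load times by simple arithmetic. The helper below is a back-of-envelope sketch that assumes the rate holds constant (the slide notes it depends on disk and data model); it estimates a single load stream and is not a measured InfiniDB result.

```python
def load_minutes(rows, rows_per_second=1_000_000):
    """Minutes needed to load `rows` at a constant rows_per_second."""
    return rows / rows_per_second / 60

# A consistent rate means load time grows linearly with row count.
print(round(load_minutes(1_000_000_000), 1))   # 16.7
print(round(load_minutes(10_000_000_000), 1))  # 166.7
```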
25. InfiniDB Load Performance
• Through 60 billion rows.
• Through 225 billion rows.
• Through 1.031 trillion rows.
27. Performance Benchmark – Percona SSB
Percona External Test vs. Internal Tests vs. 16 PMs @ AWS (cached queries, scale factor 1000)
(Chart: query time in seconds for SSB queries Q1.1 through Q4.3. Series: InfiniDB internal tests with 1 PM, 2 PMs and 4 PMs; InfiniDB with 16 PMs (AWS); InfoBright, Lucid and InfiniDB as tested by Percona. Labeled values: 9,694.53 and 6,867.74.)
28. SSB Queries on Amazon Web Services (AWS)
InfiniDB Internal vs. InfiniDB @ AWS (cached queries, scale factor 1000)
(Chart: query time in seconds for SSB queries Q1.1 through Q4.3, for 1 PM, 2 PMs, 4 PMs and 16 PMs (AWS). Labeled value: 7.83.)
29. Asia Region Distributor Benchmark
(Chart: series InfiniDB (1 PM), InfiniDB (2 PMs), Legacy Columnar, and DBMS-X Row-Based.)
33. InfiniDB Ease of Use – Automatic Everything
• Column storage happens automatically.
• Compression happens automatically.
• The choice of compression type happens automatically.
• No index build or maintenance.
• Extent map partition behavior happens automatically.
• Distribution of data across server/disk resources happens automatically.
• Distribution of work happens automatically.
• Ad‐hoc performance happens automatically.
34. Full Featured SQL to Map‐Reduction Mapping
Robust Column‐Aware Optimizer Handles:
o Filter order optimization.
o Join order optimization.
Powerful Join Optimizations Handle:
o Inner join, outer join, semi‐join (sub‐query).
o N‐table single‐step hash join (up to 60 tables).
Queue‐Based Scheduling of Performance Module Handles:
o Automatically parallelizes queries.
o Allows small queries to get in, and return, while a larger query is running.
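One reading of "N‐table single‐step hash join" is a star join in which a hash table is built for every dimension table and a single pass over the fact table probes all of them. The Python sketch below follows that interpretation, which is an assumption rather than a description of InfiniDB's internals; table and column names are hypothetical, dimension keys are assumed unique, and the semantics are inner join.

```python
def star_hash_join(fact_rows, dims):
    """Inner-join fact rows to several dimension tables in one pass over the facts.

    dims is a list of (fact_key, dim_rows, dim_key) tuples; every dimension
    is assumed small enough to hold in memory as a hash table.
    """
    # Build phase: one hash table per dimension, keyed on its join column.
    tables = [(fact_key, {d[dim_key]: d for d in dim_rows})
              for fact_key, dim_rows, dim_key in dims]
    out = []
    # Probe phase: a single scan of the fact table probes every hash table.
    for fact in fact_rows:
        joined = dict(fact)
        for fact_key, table in tables:
            match = table.get(fact[fact_key])
            if match is None:
                break  # no match in this dimension: inner join drops the row
            joined.update(match)
        else:
            out.append(joined)
    return out

facts = [{"cust_id": 1, "date_id": 10, "qty": 5},
         {"cust_id": 2, "date_id": 11, "qty": 3},
         {"cust_id": 9, "date_id": 10, "qty": 1}]  # cust_id 9 has no customer row
customers = [{"cust_id": 1, "name": "Smith"}, {"cust_id": 2, "name": "Jones"}]
dates = [{"date_id": 10, "year": 2010}, {"date_id": 11, "year": 2011}]

result = star_hash_join(facts, [("cust_id", customers, "cust_id"),
                                ("date_id", dates, "date_id")])
print([(r["name"], r["year"], r["qty"]) for r in result])
# [('Smith', 2010, 5), ('Jones', 2011, 3)]
```

Joining all dimensions in one scan avoids materializing an intermediate result after each two-table join.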
35. Full Featured Mapping from SQL to Map‐Reduce
Robust Tools to Maximize Physical I/O:
o Reading only the columns selected to avoid I/O.
o Just‐in‐time materialization to avoid I/O.
o Automatic partition elimination to avoid I/O.
o Scalable data buffer cache to avoid I/O from disk.
o Compression to minimize the bytes read from disk.
Extensible User Defined Function (UDF):
o UDFs run as full‐featured functions within InfiniDB.
o Gain full benefits of Optimizer, Join, Scheduler, and Physical I/O
features.
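Compression trades a little CPU for fewer bytes read from disk. The slides do not name InfiniDB's codec, so this sketch uses Python's standard `zlib` as a stand-in to compress a block of integer column values and decompress it on read. The block size and the run-heavy sample data are assumptions.

```python
import struct
import zlib

def compress_block(values):
    """Pack a block of integers and compress it before it is written to disk."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return zlib.compress(raw), len(raw)

def decompress_block(blob, count):
    """Decompress on read and unpack back into integers."""
    return list(struct.unpack(f"<{count}q", zlib.decompress(blob)))

# A single column often holds long runs of similar values, which compress well.
status = [1] * 4000 + [2] * 4000 + [3] * 192
blob, raw_size = compress_block(status)
assert decompress_block(blob, len(status)) == status
print(raw_size, len(blob))  # uncompressed bytes vs. bytes that would be read from disk
```

Storing one column per file is what makes such runs common: a block never mixes values from unrelated columns.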
36. InfiniDB Ease of Use – Avoiding Trade‐Offs
Traditional (and some current) DBMS technologies often involve
significant trade‐offs that just don’t exist within InfiniDB.
• Load Rate vs. More Indexes
• More Attributes vs. Better Performance
• Summary Tables vs. Real‐time Access to Data
• Save Space vs. Query Performance
38. Extensibility for Big Data
Big Data and Extensibility
• Data size continues to escalate.
• New uses of data to drive business actions.
• New attributes and dimensions are continually being included.
39. InfiniDB Extensibility – Scale Efficiently
Handling Data Scale
• InfiniDB scales with your data.
• Scalability combined with very efficient I/O.
o Columnar storage.
o Just‐in‐time materialization.
o Partition elimination.
o Scalable cache.
o Columnar compression.
40. InfiniDB Extensibility – Online Schema Changes
Schema Changes
• The InfiniDB columnar architecture eliminates table rebuilds.
• New column files are added without change to existing columns.
• InfiniDB also handles these column additions as on‐line operations.
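With one file per column, adding a column can mean writing one new file and leaving every existing file untouched. The Python sketch below assumes fixed-width 8-byte integers and fills existing rows with a default value; both are assumptions, as the slides do not say how InfiniDB populates a new column, and the helper name is hypothetical.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<q")  # assumed fixed-width 8-byte integer values

def add_column(table_dir, name, row_count, default=0):
    """Add a column by writing one new file; existing column files are not rewritten."""
    path = os.path.join(table_dir, name + ".col")
    with open(path, "xb") as f:  # mode 'x' fails if the column file already exists
        f.write(RECORD.pack(default) * row_count)
    return path

table_dir = tempfile.mkdtemp()
existing = os.path.join(table_dir, "col1.col")
with open(existing, "wb") as f:
    f.write(b"".join(RECORD.pack(i) for i in range(1000)))
before = os.stat(existing).st_mtime_ns

add_column(table_dir, "col2", row_count=1000, default=-1)
print(os.stat(existing).st_mtime_ns == before)  # True: col1.col was not rewritten
print(os.path.getsize(os.path.join(table_dir, "col2.col")))  # 8000
```

A row store would instead have to rewrite every row to make room for the new field, which is the table rebuild the slide refers to.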
41. InfiniDB Extensibility – Business Logic
The Data Driven Business
• Extend your analytics capability with InfiniDB’s User Defined (parallel and distributed) Functions.
• Reactive and predictive analysis of your data:
o Quickly
o Predictably
• Remove Barriers
o No waiting for new aggregates to be built.
o No waiting for new code to be written.
44. InfiniDB Customer Experience
Customer case studies are available at www.calpont.com for further detail. The key differentiating features that lead customers to choose InfiniDB include:
• Performance at scale.
• Large number of dimensions.
• Ad‐hoc query performance.
• Unique record analysis.
• Near real‐time load capability.
• Faster time to market.
• Predictable query performance.
45. Key Takeaways
The InfiniDB Performance Architecture
• Architecture – Columnar Storage
• Architecture – Map Reduction Distribution of Work
The InfiniDB Deployment Experience
• Performance Characteristics
• Ease of Use and Flexibility
• Extensibility