NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World

NoSQL Analytics: JSON Data
Analysis and Acceleration in
MongoDB World
Ajaykumar Gupte
IBM
1

2
Agenda
Basic overview of JSON data management.
Overview of IBM in-memory accelerator
Performance using in-memory accelerator with JSON data

3
Explosion of mobile
devices – gaming
and social apps
Advertising:
serving ads and
real-time
bidding
Social networking,
online
communities
E-commerce, social
commerce
Machine data and
real-time
operational
decisions
Smart
Devices
Internet of Data
Internet of
Things
SQL
SQL, {JSON}, Spatial
{JSON},
TimeSeries
SQL, {JSON}
Simple,
{JSON},
Timeseries
SQL, {JSON}

4

Schema-less app development lowers costs
– Simplicity and agility to develop apps quickly
– Inter operate with modern applications, especially mobile
– Applications address schema migrations
– Sometimes referred to a “flexible schema” development

While fixed schema is required for traditional
relational
– Transaction volumes growing
– Operational data, analytics, static records
– Creates greater value through “360 degree” view of the
business
Demand is growing in both areas
These approaches are
complementary
NoSQL Compared to “Traditional” DBMS

5
Real Time Analytics
•
Customer Issues
– Several different models of data (SQL,
NoSQL, TimeSeries/Sensor)
– NoSQL is not strong building relations
between collections
– Most valuable analytics combine the results of
all data models
– Most prominent analytic system written using
standard SQL

6
NoSQL SQL
Relational
Fixed schema
Joins, DML Optimizer
Transactions(multi-stmt)
Analytics
Replication
Triggers, Stored Procedures
Document store
Schema less
Indexing Aggregation
Sharding
Replication
Map/reduce
Scalability
Emerging Trends

7
IBM Informix database 12.1
Relational, Embeddable
Real-time Analytics
Fast, Always-on Transactions
NoSQL capability
Multi-Tenancy
Sensor data management
High availability
Easy to Use

8
Tier 2
Smart Gateway / Aggregator
Tier 3
Sensor Operational
Analytics Zone
Tier 1
Devices/Sensors
AnalyticsTier 4
Deep Analytics Zone
Simple
Customer
Portal
REST (https)
Time Series
REST (https)
MAC Addr. MAC Addr. MAC Addr. MAC Addr. MAC Addr. MAC Addr.
Informix TS IWA
Streams
Logic
dist.
engine
Cloudant
MessageSight
Lightweight
Analytics
Context
The IoT Architecture
IBM has opportunity in multiple Tiers of IoT

9
•
Relational and non-relational data in one system
•
NoSQL/MongoDB Apps can access Informix Relational Tables
•
Distributed Queries
•
Multi-statement Transactions
•
Enterprise Proven Reliability
•
Enterprise Scalability
•
Enterprise Level Availability
Informix provides the capability to leverage
the abilities of both relational DBMS and document store
systems.
The Hybrid Solution
Informix has the Best of Both Worlds

10
Informix 12.1 & MongoDB Clients
•
New Wire Protocol Listener supports
existing MongoDB drivers
•
Connect to MongoDB or Informix with
same application!
MongoDB
native Client
MongoDB
web browser
Mobile
Applications
MongoDB
Wire
Protocol
Informix
12.1
MongoDB
driver

11
Informix JSON Store Benefits
•
Row locking on the individual JSON document
•
Large documents, up to 2GB maximum size
•
Ability to compress documents
•
Ability to intelligently cache commonly used
documents
•
Use existing storage options and management tools.

12
Two New Data Types JSON and BSON
•
Native JSON and BSON data types
•
Index support for NoSQL data types
•
Native operators and comparator
functions allow for direct
manipulation of the BSON data type
•
Database Server seamlessly
converts to and from
• JSON BSON
• Character data JSON

13
Informix: All Together Now!
SQL Tables
JSON Collections
TimeSeries
MQ Series
SQL APIs
JDBC, ODBC
Informix
IWA – BLU ACCELERATION
GENBSON: SQL to {BSON}
MongoDB
Drivers
TEXT SEARCH
SPATIAL
TIME SERIES {BSON}
REST API

14
RelationalTable JSON Timeseries SpatialText
SQL API
StandardODBC, JDBC,.NET, OData,etc.Language SQL.
Direct SQLAccess.Dynamic ViewsRow types
StandardSQL/extJDBC/ODBCJSON Support
Standard SQLJDBC/ODBCJSON Support
Mongo API(NoSQL)
Mongo APIs forJava,
Javascript, C++, C#,...
Mongo APIs forJava,Javascript, C++, C#,...
Virtual TableJSON support JSON Support
Hybrid Access:SQL, JSON, Timeseries &
Spatial

15
Benefits of Hybrid Power
Access consistent data from its source
Avoid ETL, continuous data sync and conflicts.
Exploit the power of SQL, MongoAPI seamlessly
Exploit the power of RDBMS technologies in MongoAPI:
– Informix Warehouse accelerator,
– Cost based Optimizer & power of SQL
– R-tree indices for spatial, Lucene text indexes, and more.
Access all your data thru any interface: MongoAPI & SQL
Store data in one place and efficiently transform and use them
on demand.
Existing SQL based tools and APIs can access new data in
JSON

16
How to Convert Relational Data as JSON Documents
•
Relational data can be treated as structured JSON documents; column
name-value becomes key-value pair.
•
select partner, pnum, country from partners;
partner pnum Country
Acme 1748 Australia
Vernco 1746 USA
Baker 1472 Spain
Contrex 1742 France
{parnter: “Acme”, pnum:1748, Country: “Australia”}
{parnter: “Vernco”, pnum:1746, Country: “USA”}
{parnter: “Backer”, pnum:1472, Country: “Spain”}
{parnter: “Contrex”, pnum:1742, Country: “France”}
•
GENBSON function
Method for transforming existing SQL data into a JSON or BSON
document store format
•
select GENBSON( partners )::JSON from partners;
16

17
•
Supports B-Tree indexes on any key-value pairs.
•
Typed indices could be on simple basic type (int, decimal,)
•
Type-less indices could be created on BSON and use BSON type comparison
•
Informix translates ensureIndex() to CREATE INDEX
•
Informix translates dropIndex() to DROP INDEX
Indexing
Mongo Operation SQL Operation
db.customers.ensureIndex(
{orderDate:1, zip:-1})
CREATE INDEX IF NOT EXISTS v_customer_2 ONcustomer (bson_get(data,‘orderDate') ASC,bson_get(data,‘zip') DESC) USING BSON
CREATE UNIQUE INDEX IF NOT EXISTS
v_customer_3 ON customer
(bson_get(data,'orderDate')ASC) USING BSON

Flexible Grid + Sharding
Informix
Shard 1
Informix/1
Secondary
Disk or Diskless
Informix/1
Secondary
Disk or Diskless
Informix
Secondary 1
Disk or Diskless
Informix
Shard 2
Informix/1
Secondary
Disk or Diskless
Informix/1
Secondary
Disk or Diskless
Informix
Secondary 2
Disk or Diskless
Informix
Shard 3
Informix/1
Secondary
Disk or Diskless
Informix/1
Secondary
Disk or Diskless
Informix
Secondary 3
Disk or Diskless
Informix
Shard 4
Informix/1
Secondary
Disk or Diskless
Informix/1
Secondary
Disk or Diskless
Informix
Secondary 4
Disk or Diskless
Informix NoSQL Cluster Architecture Overview

Scaling in both directions

Shard Disk Secondary

Secondary server(s) provide HA and scaling

Allow write on Secondary

19
Mongo Application
IBM Wire Listener
IDXs
Logs
Enterprise replication + Flexible Grid + Sharding
Distributed
Queries
Informix Dynamic Server
Tables
Tables
IDXs
Relational
Tables
JSON
Collections
SELECT bson_new(bson, ‘{}’) FROM customer WHERE
data.state::varchar(128) =“MO”
db.customer.find({state:”MO”}) db.partners.find({state:”CA”})
SELECT * FROM partners WHERE state=“CA”
Customer
partners
JSON JSON
Access RelationalAccess JSON
MongoAPI Accessing Both NoSQL and Relational Tables

20
MongoAPI Accessing Both NoSQL and Relational Tables
•
Typically NoSQL does not involve transactions
– In many cases, a document update is atomic, but not the application
statement (Example :7 targeted for deletion, but only 4 are removed )
•
Informix-NoSQL provides transactions on all application statements
– Each server operation INSERT, UPDATE, DELETE, SELECT will
automatically be committed after each operation.
•
Default isolation level is DIRTY READ
•
All standard isolation level support
•
$sql operator – execute SQL commands within Informix database
db.getCollection("system.sql").find({ "$sql":
"select c.customer_num, p.customer_num as p_cust from customer c left
outer join partners p on c.customer_num = p.customer_num order by 1" })

21
IBM Informix Warehouse
Accelerator

22
IBM Informix Warehouse Accelerator (IWA)
Results
Analytic
query
Linux on
Intel / AMD 64-bit
TCP/IP
Query
Optimizer
In-Memory
Compressed
Columnar
Database Partition
Bulk Loader
Query
Processor
Yes
Analytic
query
Results
Accelerate
Query?
Most Unix/Linux
64-bit platforms
In-Disk
[Compressed]
Relational / Row-based
Database
Informix database server
Informix Warehouse
Accelerator
No
POWERFUL HYBRID DATABASE PLATFORMPOWERFUL HYBRID DATABASE PLATFORM
Extreme Performance Transactions Extreme Performance Analytics

23
You can use IWA's In-Memory Analytics to Speed Up queries on…
Local or remote views
HA Clusters

IWA Overview and Seamless Integration with
Informix/IDS
• Before IWA…

Informix

Receives analytic query from client

Spends some time doing intensive I/O

Returns results back to the client
Informix 12.1
Results
SQL

Informix/IWA Setup and Workflow
• Using IWA: Process is transparent to Informix client
Results
SQL

Informix

Receives analytic query from client

If query uses data matching an IWA datamart
and can be accelerated, route/offload it to IWA

Returns results back to the client

If query is not based on an IWA datamart or
cannot be accelerated, Informix will resolve it
Informix 12.1

The Accelerator

Processes the routed SQL query extremely
fast and returns answer back to Informix
Linux on
Intel/AMD 64-bit
Bulk Loader Compressed
Database
Partition
TCP/IP
Compression
In-Memory Columnar Storage
Frequency Partitioning
Parallelism
Predicate evaluation on compressed data
Multi-core and Vector optimized algorithms
SIMD
Query Router
Query Processor

26
Informix Dynamic Server
Tables
Tables
Relational
Tables and
views
JSON
Collections {Customer}
partners
SQL & BI Applications
{Orders}
Inventory
Tables
Timeseries
Tables
{Orders}
Text index (BTS)
spatial indices
Informix Warehouse Accelerator (In-Memory Query Engine)
ODBC, JDBC connections
SQL Apps/Tools
MongoDB Drivers
NoSQL Apps/Tools

27
Informix Database Server
Informix warehouse Accelerator
BI Applications
Step 1. Install, configure,
start Informix
Step 2. Install, configure,
start Accelerator
Step 3. Connect Studio to
Informix & add accelerator
Step 4. Design, validate,
Deploy Data mart
Step 5. Load data to
accelerator
Ready for Queries
IBM Smart
Analytics Studio
Step 1
Step 2
Step 3
Step 4
Step 5
Ready
IWA – BLU ACCELERATION

28

All NOSQL Marts (all views based on JSON collections )

Hybrid Marts (subset of views based on JSON collections)

TPCDS 10GB workload

web_returns fact table along with 13 dimension tables

Total Memory on the machine: 250GB

Total nodes: 5

Coordinator nodes: 1

Coordinator node Memory: 20000 MB

Worker nodes : 4

Worker nodes memory : 100000 MB

IWA DRDA Interface: eth1 (IWA running on a different machine
than IDS server)
JSON Data Acceleration

29

Using genbson to create web_returns, we can literally accomplish this
with just two statements and with good performance
create table json_web_returns_coll(c1 serial, c2 bson);
insert into json_web_returns_coll select 0, genbson( web_returns_ext ) from
web_returns_ext;
719964 row(s) inserted.
create table json_customer_address_coll(c1 serial, c2 bson);
insert into json_customer_address_coll select 0, genbson( customer_address_ext )
from customer_address_ext;
create table json_date_dim_coll(c1 serial, c2 bson);
insert into json_date_dim_coll select 0, genbson( date_dim_ext ) from date_dim_ext;
create table json_time_dim_coll(c1 serial, c2 bson);
insert into json_time_dim_coll select 0, genbson( time_dim_ext ) from time_dim_ext;

30
Run time comparision of queries on JSON Collection with IDS and IWA
Queries against
TPCDS
(web_returns) SQL
tables
JSON Collections
(IDS)
JSON
Collections
(IWA)
Query_1.sql 5m 8.09s 0m 1.28s
Query_10.sql 18m 1.37s 0m 6.77s
Query_11.sql 15m 8.29s 0m 3.97s
Query_12.sql 0m 21.51s 0m 0.25s

31
Sqexplain - Query_1
Select {+FACT(web_returns)} first 100 wr_returning_customer_sk as ctr_customer_sk
,ca_state as ctr_state,sum(wr_return_amt) as ctr_total_return from web_returns
,date_dim ,customer_address where wr_returned_date_sk = d_date_sk and d_year
=1999 and wr_returning_addr_sk = ca_address_sk group by
wr_returning_customer_sk ,ca_state order by 1,2,3
…...
DYNAMIC HASH JOIN
Dynamic Hash Filters: BSON_VALUE_INT (informix.json_web_returns_coll.data ,
'wr_returned_date_sk' ) = BSON_VALUE_INT (informix.json_date_dim_coll.data ,
'd_date_sk' )
3) informix.json_customer_address_coll: SEQUENTIAL SCAN
DYNAMIC HASH JOIN
Dynamic Hash Filters: BSON_VALUE_INT (informix.json_web_returns_coll.data ,
'wr_returning_addr_sk' ) = BSON_VALUE_INT
(informix.json_customer_address_coll.data , 'ca_address_sk' )

32
Create demo database from the mongo shell
$ mongo demo_database
MongoDB shell version: 2.4.9
connecting to: demo_database
mongos>
Create / Load the "comments" fact table
mongos> db.comments.insert( [
{ uid:12345, pid:444, comment:"first" },
{ uid:12345, pid:888, comment:"second" },
{ uid:99999, pid:444, comment:"third" }
] )
Create /Load the "users" dimension table
mongos> db.users.insert( [
{ uid:12345, name:"john" },
{ uid:99999, name:"mia" }
] )

33
demo_database –
create view vcomments(uid,pid,comment) AS SELECT
data.uid::INT,
data.pid::INT,
data.comment::VARCHAR(128)
from comments;
create view vusers(uid,name) AS SELECT
data.uid::int,
data.name::varchar(128)
from users;
Create the SQL Views & analyze workload

34
set environment use_dwa 'probe cleanup';
set environment use_dwa 'probe start';
select {+ avoid_execute} * from vcomments c,vusers u
where c.uid=u.uid;
set environment use_dwa 'probe stop';
execute procedure
ifx_probe2mart('demo_database','noSQL_mart');
execute function ifx_createmart('demo_dwa','noSQL_mart');
execute function ifx_loadmart('demo_dwa','noSQL_mart','NONE');
Create the SQL Views & analyze workload

35
set environment use_dwa 'accelerate on';
select c.uid,name,comment from vcomments c,vusers u
where c.uid=u.uid and pid=444;
uid 12345
name john
comment first
uid 99999
name mia
comment third
Deploy NoSQL data mart & Issue queries

NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

En vedette

En vedette (11)

Similaire à NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World

Similaire à NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World (20)

Plus de Ajay Gupte

Plus de Ajay Gupte (7)

Dernier

Dernier (20)

NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World

Notes de l'éditeur