Dr. Seah Boon Keong
MIMOS BHD
seahbk2006@yahoo.com
Using High Performance Parallel Data Warehouse (HPDW) Big Data Analytical Platform for Big Data Analysis
Content
1. Challenges on use of Big Data
2. HPDW Overview and Features
3. Benchmark
4. Demo
Harness Big Data to improve decision making
Before Big Data: decisions based upon transactional data
After Big Data: decisions based upon all data, including
• Social data
• Information in video and images
• Machine-generated data (sensors, etc.)
Challenges/Problems for Data Scientists and Analysts
Setup tasks that must be completed before analytics can start:
• Hardware setup and configuration
• Big Data setup
• Streaming setup
• Integration work and testing
• Selecting and testing the multiple tools required
• Analytics setup
• Visualization setup
Only then can analytics be performed (estimated effort: 1000 man-hours for tasks 1-8)
Challenges/Problems of RDBMS for processing big data
• Bringing a combination of Big Data sources into a data warehouse is a challenge
• Existing RDBMS technology is not built for handling large data sets
• It also lacks the ability to perform join queries between historical and streaming data
How can HPDW address the pains of data scientists and data analysts?
HPDW Appliance
• Integrated Big Data platform for batch and stream
• Hides the complexity of developing and integrating the various components from scratch
• Enables data scientists and data analysts to focus on analysing data and not on Big Data setup
• Provided with integrated R tools for data analysis with HPDW data access
• Provided with a data visualization tool
• Additional service for Data Warehouse migration to Big Data
• Enables various kinds of stream analysis, such as IoT devices, through a RESTful service in JSON
HPDW allows analysts to focus on analyzing data, not on managing infrastructure
HPDW Big Data Analytics Architecture
[Architecture diagram]
Inputs: business data, data streams, social, log, enterprise DB
Sections: Data Streaming, Data Platform, Data Exploration, Analytics
Outputs: reports, sentiments, IoT trends, charts, dashboard, drill-down reports
HPDW Big Data Analytics Platform components: API (REST+JSON), JDBC, ODBC, Data Migration Plugin, InMemory Fast Data, Join SQL (Batch and Stream, Data Lakes), R, Spark, Python, Tableau and other BI tools, multi data source exploration, charts, drill down, Hadoop
HPDW Appliance (Screenshot)
Data Platform
HPDW Appliance
• Fast SQL query
• Join query for historical data and data streams
• RDBMS data migration plugin
• JDBC support
• ODBC Big Data support for BI integration such as Tableau
HPDW Appliance
• Fast SQL query
• Unify query for historical data and data streams
• Analytics of multiple data sources for immediate data exploration
• RDBMS data migration connector
• Supports data mining tools (R package, etc.)
• Additional service for Data Warehouse migration to Big Data
• Integrates with 3rd-party BI tools (Tableau, etc.)
HPDW Sample Query and Unify Query
SELECT d.monthname_part || '-' || CAST(d.yearpart AS VARCHAR) AS monthyear,
       r.referencesourcedesc, ag.agegroupdesc, g.gendermalaydesc,
       SUM(f.encounter_cnt) AS encounter_cnt
FROM fact_patientencounter_100000000 f
JOIN dim_lk_reference r ON r.sk_dim_reference = f.sk_dim_reference
JOIN dim_lk_agegroup ag ON ag.sk_dim_agegroup = f.sk_dim_agegroup
JOIN dim_lk_gender g ON g.sk_dim_gender = f.sk_dim_gender
JOIN dim_date d ON d.sk_dim_date = f.sk_dim_date
WHERE d.yearpart = 2013
GROUP BY d.monthname_part || '-' || CAST(d.yearpart AS VARCHAR),
         r.referencesourcedesc, ag.agegroupdesc, g.gendermalaydesc
SELECT * FROM hpdw.stream.tweets WHERE text LIKE '%malaysia%'
Sample of join query output
SELECT dim_lk_gender.*, hpdw.stream.tweets.*
FROM dim_lk_gender, hpdw.stream.tweets
WHERE text LIKE '%malaysia%'
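The join query above lists both tables in FROM without a join predicate, so every matching tweet is paired with every row of dim_lk_gender. A minimal sketch of that behaviour follows, using Python's built-in sqlite3 module as a stand-in for HPDW. The table stream_tweets, its columns, and the sample rows are hypothetical, and HPDW's own stream semantics are not modelled.

```python
import sqlite3

def cross_join_demo():
    """Run the slide's join pattern on an in-memory SQLite stand-in."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical stand-ins for dim_lk_gender and hpdw.stream.tweets.
    cur.execute("CREATE TABLE dim_lk_gender (sk_dim_gender INTEGER, gendermalaydesc TEXT)")
    cur.execute("CREATE TABLE stream_tweets (id INTEGER, text TEXT)")
    cur.executemany("INSERT INTO dim_lk_gender VALUES (?, ?)",
                    [(1, "Lelaki"), (2, "Perempuan")])
    cur.executemany("INSERT INTO stream_tweets VALUES (?, ?)",
                    [(10, "hello malaysia"), (11, "good morning"), (12, "visit malaysia")])
    # Same shape as the slide's query: two tables, a filter, no join condition.
    rows = cur.execute(
        "SELECT dim_lk_gender.*, stream_tweets.* "
        "FROM dim_lk_gender, stream_tweets "
        "WHERE text LIKE '%malaysia%'"
    ).fetchall()
    conn.close()
    return rows

rows = cross_join_demo()
print(len(rows))  # 2 gender rows x 2 matching tweets
```

In this stand-in the result size is the number of dimension rows multiplied by the number of matching stream rows, which is worth keeping in mind before running the same pattern against a large stream.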
Viewing Data in HPDW
DB Viewer (Aqua Studio) -> JDBC Connector -> HPDW Appliance
Use of HPDW data in Tableau
Tableau -> ODBC Connector -> HPDW Appliance
Note: Compared to the Hortonworks ODBC (Hive) benchmark, HPDW ODBC is much faster for data access:
• <1 sec for HPDW ODBC direct
• 30-40 sec for Hortonworks Hive ODBC
HPDW Parallel ETL Component (Parallel Data Migration -> RDBMS to HPDW)
RDBMS -> ETL Plugin for Data Migration -> HPDW Appliance
Data Streaming
RESTful service on the HPDW Appliance:
http://10.1.4.136:9000/message?sourcetype=realtime&message=<JSON>
• Provided with RESTful+JSON for any stream input: IoT, social data
• Stores historical stream data for SQL query
• Queries can be performed between stream and batch data
Twitter JSON to HPDW Stream JSON
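The endpoint above takes the record as a JSON string in the message query parameter. A minimal sketch of how a client might build such a URL with the Python standard library is shown here. The function names and the sample record are hypothetical, the slides do not state which HTTP method the service expects, and nothing is sent over the network.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

def build_stream_url(base_url, record, sourcetype="realtime"):
    """Return a stream-ingest URL carrying `record` as URL-encoded JSON."""
    message = json.dumps(record, separators=(",", ":"), sort_keys=True)
    return base_url + "?" + urlencode({"sourcetype": sourcetype, "message": message})

def parse_stream_url(url):
    """Inverse of build_stream_url, useful for checking the encoding."""
    query = parse_qs(urlsplit(url).query)
    return query["sourcetype"][0], json.loads(query["message"][0])

# Hypothetical tweet-like record; the address is the one shown on the slide.
url = build_stream_url("http://10.1.4.136:9000/message",
                       {"id": 1, "text": "hello malaysia"})
print(url)
```

Encoding the JSON with urlencode keeps braces, quotes, and spaces from breaking the query string, which matters for free-text fields such as tweets.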
HPDW Appliance
Data Analytics
• Supports Python, R, Spark, etc.
• Data analysis on HPDW data sources and others
• Transform and aggregate data for further data understanding
Data Exploration
• Supports Python, R, Spark, etc.
HPDW Benchmark
Nodes: 4 x physical nodes
CPU: Intel Xeon ten-core E5-2660v3 2.60 GHz processors, 20 cores
RAM: 128 GB
Storage: HDD 4 TB (RAID 10)
OS: Ubuntu (64-bit)
Query 1 (Total number of patients)
SELECT count(*) FROM fact_patientencounter_100000000
Query 2 (Total Encounters by month & year, servicetype, nationality, agegroup, gender)
SELECT d.monthname_part || '-' || CAST(d.yearpart AS VARCHAR) AS monthyear,
       st.servicetypedesc, n.nationalitydesc, ag.agegroupdesc, g.gendermalaydesc,
       SUM(f.encounter_cnt) AS encounter_cnt
FROM fact_patientencounter_100000000 f
JOIN dim_lk_servicetype st ON st.sk_dim_servicetype = f.sk_dim_servicetype
JOIN dim_lk_agegroup ag ON ag.sk_dim_agegroup = f.sk_dim_agegroup
JOIN dim_lk_gender g ON g.sk_dim_gender = f.sk_dim_gender
JOIN dim_lk_nationality n ON n.sk_dim_nationality = f.sk_dim_nationality
JOIN dim_date d ON d.sk_dim_date = f.sk_dim_date
WHERE d.yearpart = 2013
GROUP BY d.monthname_part || '-' || CAST(d.yearpart AS VARCHAR),
         st.servicetypedesc, n.nationalitydesc, ag.agegroupdesc, g.gendermalaydesc
Query 3 (Total Encounters by month & year, reference hospital, agegroup, gender)
SELECT d.monthname_part || '-' || CAST(d.yearpart AS VARCHAR) AS monthyear,
       r.referencesourcedesc, ag.agegroupdesc, g.gendermalaydesc,
       SUM(f.encounter_cnt) AS encounter_cnt
FROM fact_patientencounter_100000000 f
JOIN dim_lk_reference r ON r.sk_dim_reference = f.sk_dim_reference
JOIN dim_lk_agegroup ag ON ag.sk_dim_agegroup = f.sk_dim_agegroup
JOIN dim_lk_gender g ON g.sk_dim_gender = f.sk_dim_gender
JOIN dim_date d ON d.sk_dim_date = f.sk_dim_date
WHERE d.yearpart = 2013
GROUP BY d.monthname_part || '-' || CAST(d.yearpart AS VARCHAR),
         r.referencesourcedesc, ag.agegroupdesc, g.gendermalaydesc
Evaluation
1. Migrated the MOH Data Warehouse from PostgreSQL to HPDW
2. Ran 3 different queries against 100M, 200M and 300M rows
3. Compared HPDW against a well-known relational database (PostgreSQL Enterprise 9.4)
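The result tables in this deck report up to five runs per query together with their average. A minimal sketch of such a timing loop is given here, with Python's built-in sqlite3 standing in for the database under test. The helper time_query and the table fact_demo are hypothetical, and this is not the harness the author used.

```python
import sqlite3
import statistics
import time

def time_query(conn, sql, runs=5):
    """Execute `sql` `runs` times; return (per-run seconds, mean seconds)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch everything so the query fully runs
        timings.append(time.perf_counter() - start)
    return timings, statistics.mean(timings)

# Tiny in-memory stand-in table; the real benchmark used 100M to 300M rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_demo (encounter_cnt INTEGER)")
conn.executemany("INSERT INTO fact_demo VALUES (?)", [(1,)] * 1000)
timings, average = time_query(conn, "SELECT count(*) FROM fact_demo")
print(len(timings), average)
```

Keeping the individual runs alongside the mean makes it possible to see when the first run is much slower than the later ones, as happens for PostgreSQL in the tables.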
CPU: Intel Xeon ten-core E5-2660v3 2.60 GHz processors, 20 cores
RAM: 96 GB
Storage: HDD 4 TB (RAID 5)
OS: Ubuntu (64-bit)
HPDW
PostgreSQL
Test Case 1: Total number of patients
Execution time (in s)
Records      HPDW                              PostgreSQL
(millions)   1    2    3    4    5    Average  1      2      3      4      5      Average
100          3    1    1    1    1    1.4      101.6  11.1   10.7   10.7   10.7   29
200          3    1    2    1    1    1.6      208.2  130.4  28.3   28.4   28.4   84.7
300          4    3    4    3    2    3.2      432.6  423.6  345.6  315.1  313.7  366.1
[Bar chart: TC 1: Total number of patients. Execution time (in s) by rows (100 M, 200 M, 300 M), PostgreSQL vs HPDW. Plotted values: HPDW 1.4, 1.6, 3.2; PostgreSQL 10.7, 28.4, 313.7. Annotation: 100x]
In Test Case 1, PostgreSQL takes about 313.7 seconds (its fifth and fastest run) to query 300 M rows.
HPDW takes 3.2 seconds on average, which is about 100 times faster (313.7 / 3.2 ≈ 98).
Test Case 2: Total Encounters by month & year, service type, nationality, age group, gender
Execution time (in s)
Records      HPDW                              PostgreSQL*
(millions)   1    2    3    4    5    Average  1      Average
100          26   25   25   26   25   25.4     5757   5757
200          47   46   51   49   47   48       12682  12682
300          92   103  89   89   89   92.4     21057  21057
Note: * The PostgreSQL test was carried out only once for each row due to time constraints.
[Bar chart: execution time (in s) by rows (100 M, 200 M, 300 M), PostgreSQL vs HPDW. Plotted values: HPDW 25.4, 48, 92.4; PostgreSQL 5757, 12682, 21057. Annotation: 228x]
In Test Case 2, PostgreSQL takes about 21057 seconds to query 300 M rows.
HPDW takes 92.4 seconds on average, which is about 228 times faster.
Test Case 3: Total Encounters by month & year, reference hospital, age group, gender
Execution time (in s)
Records      HPDW                              PostgreSQL
(millions)   1    2    3    4    5    Average  1      2      3      4      5      Average
100          17   17   17   18   20   17.8     125.1  87.7   87.8   87.9   87.9   95.28
200          35   34   32   30   25   31.2     277    285    279    277.7  276.6  279.06
300          49   44   46   47   48   46.8     509.6  507.8  507.9  508.8  508.6  508.5
[Bar chart: execution time (in s) by rows (100 M, 200 M, 300 M), PostgreSQL vs HPDW. Plotted values: HPDW 17.8, 31.2, 46.8; PostgreSQL 87.9, 276, 508.6. Annotation: 11x]
In Test Case 3, PostgreSQL takes about 508.6 seconds (its fifth run) to query 300 M rows.
HPDW takes 46.8 seconds on average, which is about 11 times faster.
Overview of Benchmark Results of HPDW vs PostgreSQL
• Performance improvement of 11x to over 200x (up to 228x in Test Case 2)
• Data size
  - 100M rows = 8 GB
  - 200M rows = 16 GB
  - 300M rows = 24 GB
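The speedup figures quoted for the three test cases can be rechecked with a few lines of arithmetic. A minimal sketch follows, using the 300 M-row times stated in the slides (HPDW averages and the PostgreSQL times quoted in the text). The dictionary layout and the function name are illustrative only.

```python
# Times in seconds at 300 M rows, as quoted in the test-case text.
results_300m = {
    "Test Case 1": {"hpdw": 3.2, "postgresql": 313.7},
    "Test Case 2": {"hpdw": 92.4, "postgresql": 21057},
    "Test Case 3": {"hpdw": 46.8, "postgresql": 508.6},
}

def speedup(hpdw_seconds, postgresql_seconds):
    """How many times faster HPDW is than PostgreSQL for one query."""
    return postgresql_seconds / hpdw_seconds

speedups = {
    name: round(speedup(t["hpdw"], t["postgresql"]), 1)
    for name, t in results_300m.items()
}
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

The computed ratios are close to the rounded 100x, 228x, and 11x annotations shown on the charts.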
High Performance with Fewer Cores and Nodes
HPDW Appliance vs PostgreSQL: over 11-200x faster (40 sec vs 3 hours)
Conclusion and Future Work
Summary
• Successfully developed the HPDW Big Data Analytical Platform
• Consists of 4 major sections: Data Streaming, Data Platform, Data Exploration and Analytics
• Provides an end-to-end solution for both storing and analyzing historical and streaming data (Unify query)
• HPDW uses in-memory data processing and InfiniBand/10 Gbps high-speed networking to interconnect all the data nodes
• Incorporates RESTful JSON for easy stream data insertion
• Provides JDBC and ODBC connections for further 3rd-party tool integration
Future Work
• Support more SQL commands, including the UPDATE statement
• In the HPDW data analytics section, add real-time streaming data visualisation and support more data sources such as OData, Excel, etc.
Thank you
• Questions and Comments?
Seah Boon Keong (Ph.D)
seahbk2006@yahoo.com

Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 

Dernier (20)

Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

Hpdw 2015-v10-paper

  • 1. Using High Performance Parallel Data Warehouse (HPDW) Big Data Analytical Platform for Big Data Analysis. Dr. Seah Boon Keong, MIMOS BHD, seahbk2006@yahoo.com
  • 2. Content 1. Challenges on use of Big Data 2. HPDW Overview and Features 3. Benchmark 4. Demo
  • 3. Harness Big Data to improve decision making. Before Big Data: decisions based upon transactional data. After Big Data: decisions based upon all data, including • Social data • Information on video and images • Machine-generated data (sensors, etc.)
  • 4. Challenges/Problems for Data Scientists or Analysts. Tasks to complete before any analysis can start: hardware setup and configuration; Big Data setup; streaming setup; integration work and testing; selecting and testing the multiple tools required; analytics setup; visualization setup. Only then can analytics be performed (estimated effort: 1000 man-hours for tasks 1-8).
  • 5. Challenges/Problems of RDBMS for processing big data • Bringing a combination of Big Data sources into a data warehouse is a challenge • Existing RDBMS technology is not built for handling large data sets • A further challenge is performing join queries between historical and streaming data
  • 6. How can HPDW address the pains of data scientists and data analysts? HPDW Appliance: • Integrated Big Data platform for batch and stream • Hides the complexity of developing and integrating the various components from scratch • Enables data scientists and data analysts to focus on analysing data and not on Big Data setup • Provided with integrated R tools for data analysis with HPDW data access • Provided with a data visualization tool • Additional service for data warehouse migration to Big Data • Enables various kinds of stream analysis, such as IoT devices, through a RESTful service in JSON
  • 7. How can HPDW address the pains of data scientists and data analysts? HPDW allows analysts to focus on analyzing data, not on managing infrastructure.
  • 9. HPDW Big Data Analytics Architecture (diagram). Inputs: business data, data streams, social, log, enterprise DB. Platform sections: Data Streaming, Data Platform, Data Exploration, Analytics. Interfaces and components: API (REST+JSON), JDBC, ODBC, data migration plugin, in-memory fast data join, SQL (batch and stream, data lakes), Hadoop, R, Spark, Python, other BI tools (Tableau), multi-data-source exploration, charts, drill down. Output: reports, sentiments, IoT, trends, charts, dashboard, drill-down reports.
  • 10. HPDW Big Data Analytics Architecture
  • 12. Data Platform: HPDW Appliance • Fast SQL query • Join query for historical data and data streams • RDBMS data migration plugin • JDBC support • ODBC Big Data support for BI integration such as Tableau
  • 13. HPDW Appliance • Fast SQL query • Unify query for historical data and data streams • Analytics of multiple data sources for immediate data exploration • RDBMS data migration connector • Supports data mining tools (R package, etc.) • Additional service for data warehouse migration to Big Data • Integrates with third-party BI tools (Tableau, etc.)
    HPDW Sample Query and Unify Query
    Sample query (historical data):
      SELECT d.monthname_part||'-'||CAST(d.yearpart AS VARCHAR) AS monthyear,
             r.referencesourcedesc, ag.agegroupdesc, g.gendermalaydesc,
             SUM(f.encounter_cnt) AS encounter_cnt
      FROM fact_patientencounter_100000000 f
      JOIN dim_lk_reference r ON r.sk_dim_reference=f.sk_dim_reference
      JOIN dim_lk_agegroup ag ON ag.sk_dim_agegroup=f.sk_dim_agegroup
      JOIN dim_lk_gender g ON g.sk_dim_gender=f.sk_dim_gender
      JOIN dim_date d ON d.sk_dim_date=f.sk_dim_date
      WHERE d.yearpart=2013
      GROUP BY d.monthname_part||'-'||CAST(d.yearpart AS VARCHAR),
               r.referencesourcedesc, ag.agegroupdesc, g.gendermalaydesc
    Stream query:
      SELECT * FROM hpdw.stream.tweets WHERE text LIKE '%malaysia%'
    Join query between historical and stream data (the slide shows a sample of its output):
      SELECT dim_lk_gender.*, hpdw.stream.tweets.*
      FROM dim_lk_gender, hpdw.stream.tweets
      WHERE text LIKE '%malaysia%'
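The join between a historical table and a stream table can be tried out locally. The sketch below is only an illustration of the query shape: it uses SQLite as a stand-in for HPDW, a hypothetical `tweets` table in place of `hpdw.stream.tweets`, and made-up sample rows.

```python
import sqlite3

# SQLite is a stand-in for HPDW; `tweets` is a hypothetical local table
# playing the role of the stream table hpdw.stream.tweets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_lk_gender (sk_dim_gender INTEGER, gendermalaydesc TEXT);
    CREATE TABLE tweets (id INTEGER, text TEXT);
    -- Made-up sample rows, for illustration only.
    INSERT INTO dim_lk_gender VALUES (1, 'Lelaki'), (2, 'Perempuan');
    INSERT INTO tweets VALUES (10, 'hello malaysia'), (11, 'unrelated');
""")

# Same shape as the slide's join query: every dimension row is paired
# with every stream row whose text matches the filter.
rows = conn.execute("""
    SELECT g.*, t.*
    FROM dim_lk_gender AS g, tweets AS t
    WHERE t.text LIKE '%malaysia%'
""").fetchall()

for row in rows:
    print(row)
```

Because the query has no join condition between the two tables, the result is a cross product: two gender rows times one matching tweet.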
  • 14. Viewing data in HPDW: a DB viewer (Aqua Studio) connects to the HPDW Appliance through the JDBC connector.
  • 15. Use of HPDW data in Tableau: Tableau connects to the HPDW Appliance through the ODBC connector. Note: compared with the Hortonworks ODBC (Hive) benchmark, HPDW ODBC is much faster for data access: • <1 sec for HPDW ODBC direct • 30-40 sec for Hortonworks Hive ODBC
  • 16. ETL Plugin for Data Migration: HPDW Parallel ETL Component (parallel data migration from RDBMS to the HPDW Appliance)
  • 18. Data Streaming: RESTful service on the HPDW Appliance. Endpoint: http://10.1.4.136:9000/message?sourcetype=realtime&message=<JSON> • RESTful+JSON interface provided for any stream input (IoT, social data) • Historical stream data is stored for SQL query • Queries can be performed across stream and batch data. Example on slide: Twitter JSON to HPDW stream JSON.
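As a rough sketch of how a client might prepare a message for this endpoint, the snippet below URL-encodes a JSON payload into the `message` parameter. The host, port, path and the `sourcetype` and `message` parameters come from the slide; the payload fields, the helper name `build_stream_url` and the choice to only build the URL (rather than send it, since the slide does not state the HTTP method) are assumptions.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

def build_stream_url(payload, base="http://10.1.4.136:9000/message",
                     sourcetype="realtime"):
    """Return the endpoint URL with the payload encoded as compact JSON.

    Hypothetical helper: it only builds the URL and sends nothing.
    """
    message = json.dumps(payload, separators=(",", ":"))
    return f"{base}?{urlencode({'sourcetype': sourcetype, 'message': message})}"

# Made-up payload standing in for a tweet or an IoT reading.
url = build_stream_url({"text": "hello malaysia", "user": "demo"})

# Decode the query string again to confirm the JSON survives encoding.
query = parse_qs(urlsplit(url).query)
print(url)
```

Encoding the JSON with `urlencode` matters because raw braces, quotes and spaces are not safe inside a query string.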
  • 20. Data Exploration: supports Python, R, Spark, etc. • Data analysis on HPDW data sources and others • Transform and aggregate data for further data understanding
  • 24. HPDW Benchmark
    Hardware:
      HPDW: 4 x physical nodes; CPU Intel Xeon Ten-Core E5-2660v3 2.60 GHz processors, 20 cores; RAM 128 GB; storage HDD 4 TB (RAID 10); OS Ubuntu (64-bit)
      PostgreSQL: CPU Intel Xeon Ten-Core E5-2660v3 2.60 GHz processors, 20 cores; RAM 96 GB; storage HDD 4 TB (RAID 5); OS Ubuntu (64-bit)
    Query 1 (Total number of patients):
      SELECT count(*) FROM fact_patientencounter_100000000
    Query 2 (Total encounters by month & year, service type, nationality, age group, gender):
      SELECT d.monthname_part||'-'||CAST(d.yearpart AS VARCHAR) AS monthyear,
             st.servicetypedesc, n.nationalitydesc, ag.agegroupdesc, g.gendermalaydesc,
             SUM(f.encounter_cnt) AS encounter_cnt
      FROM fact_patientencounter_100000000 f
      JOIN dim_lk_servicetype st ON st.sk_dim_servicetype=f.sk_dim_servicetype
      JOIN dim_lk_agegroup ag ON ag.sk_dim_agegroup=f.sk_dim_agegroup
      JOIN dim_lk_gender g ON g.sk_dim_gender=f.sk_dim_gender
      JOIN dim_lk_nationality n ON n.sk_dim_nationality=f.sk_dim_nationality
      JOIN dim_date d ON d.sk_dim_date=f.sk_dim_date
      WHERE d.yearpart=2013
      GROUP BY d.monthname_part||'-'||CAST(d.yearpart AS VARCHAR),
               st.servicetypedesc, n.nationalitydesc, ag.agegroupdesc, g.gendermalaydesc
    Query 3 (Total encounters by month & year, reference hospital, age group, gender):
      SELECT d.monthname_part||'-'||CAST(d.yearpart AS VARCHAR) AS monthyear,
             r.referencesourcedesc, ag.agegroupdesc, g.gendermalaydesc,
             SUM(f.encounter_cnt) AS encounter_cnt
      FROM fact_patientencounter_100000000 f
      JOIN dim_lk_reference r ON r.sk_dim_reference=f.sk_dim_reference
      JOIN dim_lk_agegroup ag ON ag.sk_dim_agegroup=f.sk_dim_agegroup
      JOIN dim_lk_gender g ON g.sk_dim_gender=f.sk_dim_gender
      JOIN dim_date d ON d.sk_dim_date=f.sk_dim_date
      WHERE d.yearpart=2013
      GROUP BY d.monthname_part||'-'||CAST(d.yearpart AS VARCHAR),
               r.referencesourcedesc, ag.agegroupdesc, g.gendermalaydesc
    Evaluation:
      1. Migrated the MOH data warehouse from PostgreSQL to HPDW
      2. Ran 3 different queries on 100M, 200M and 300M rows
      3. Compared HPDW against a well-known relational database (PostgreSQL Enterprise 9.4)
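The result tables for this benchmark report up to five runs per query together with their average. A minimal sketch of such a measurement loop is shown below; the helper name `time_runs` and the dummy workload are hypothetical, and the deck does not say what tooling was actually used to take the timings.

```python
import time
from statistics import mean

def time_runs(run_query, repeats=5):
    """Call run_query `repeats` times; return per-run seconds and their mean."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations, mean(durations)

# Dummy workload standing in for a real database query.
durations, average = time_runs(lambda: sum(range(100_000)))
print([f"{d:.4f}" for d in durations], f"average={average:.4f}")
```

Keeping the individual runs alongside the average makes cache warm-up visible, which matters when the first run is much slower than the rest.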
  • 25. Test Case 1: Total number of patients. Execution time in seconds:
      Rows (millions) | HPDW runs 1-5 | HPDW average | PostgreSQL runs 1-5                 | PostgreSQL average
      100             | 3, 1, 1, 1, 1 | 1.4          | 101.6, 11.1, 10.7, 10.7, 10.7       | 29
      200             | 3, 1, 2, 1, 1 | 1.6          | 208.2, 130.4, 28.3, 28.4, 28.4      | 84.7
      300             | 4, 3, 4, 3, 2 | 3.2          | 432.6, 423.6, 345.6, 315.1, 313.7   | 366.1
    Chart (TC 1: execution time in s against rows): HPDW 1.4, 1.6, 3.2; PostgreSQL 10.7, 28.4, 313.7 for 100M, 200M, 300M rows (the PostgreSQL values plotted are the fifth-run times, not the averages). Label: 100x.
    In Test Case 1, PostgreSQL takes about 313.7 seconds to run the query over 300M rows. HPDW takes just 3.2 seconds, about 100 times faster than PostgreSQL.
  • 26. Test Case 2: Total encounters by month & year, service type, nationality, age group, gender. Execution time in seconds:
      Rows (millions) | HPDW runs 1-5       | HPDW average | PostgreSQL* (single run)
      100             | 26, 25, 25, 26, 25  | 25.4         | 5757
      200             | 47, 46, 51, 49, 47  | 48           | 12682
      300             | 92, 103, 89, 89, 89 | 92.4         | 21057
    Note: *The test was carried out only once for each row count on PostgreSQL due to time constraints.
    Chart (execution time in s against rows): HPDW 25.4, 48, 92.4; PostgreSQL 5757, 12682, 21057 for 100M, 200M, 300M rows. Label: 228x.
    In Test Case 2, PostgreSQL takes about 21057 seconds to run the query over 300M rows. HPDW takes just 92.4 seconds, 228 times faster than PostgreSQL.
  • 27. Test Case 3: Total encounters by month & year, reference hospital, age group, gender. Execution time in seconds:
      Rows (millions) | HPDW runs 1-5      | HPDW average | PostgreSQL runs 1-5                 | PostgreSQL average
      100             | 17, 17, 17, 18, 20 | 17.8         | 125.1, 87.7, 87.8, 87.9, 87.9       | 95.28
      200             | 35, 34, 32, 30, 25 | 31.2         | 277, 285, 279, 277.7, 276.6         | 279.06
      300             | 49, 44, 46, 47, 48 | 46.8         | 509.6, 507.8, 507.9, 508.8, 508.6   | 508.5
    Chart (execution time in s against rows): HPDW 17.8, 31.2, 46.8; PostgreSQL 87.9, 276, 508.6 for 100M, 200M, 300M rows. Label: 11x.
    In Test Case 3, PostgreSQL takes about 508.6 seconds to run the query over 300M rows. HPDW takes just 46.8 seconds, about 11 times faster than PostgreSQL.
  • 28. Overview of Benchmark Results of HPDW vs PostgreSQL • Performance improvement of 11x – 200x (chart label: 228x) • Data size: 100M rows = 8 GB; 200M rows = 16 GB; 300M rows = 24 GB
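The headline ratios can be checked from the 300M-row timings quoted in the three test-case slides. The short calculation below uses only those figures; the dictionary layout and variable names are illustrative.

```python
# 300M-row execution times in seconds, as quoted in Test Cases 1-3.
results = {
    "TC1": {"hpdw": 3.2, "postgresql": 313.7},
    "TC2": {"hpdw": 92.4, "postgresql": 21057},
    "TC3": {"hpdw": 46.8, "postgresql": 508.6},
}

# Speedup = PostgreSQL time divided by HPDW time.
speedups = {name: t["postgresql"] / t["hpdw"] for name, t in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
# TC1: 98.0x
# TC2: 227.9x
# TC3: 10.9x
```

These round to the 100x, 228x and 11x labels shown on the charts.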
  • 29. High Performance with Fewer Cores and Nodes. HPDW Appliance: 40 sec. PostgreSQL: 3 hours. Over 11-200x faster.
  • 31. Conclusion and Future Work. Summary • Successfully developed the HPDW Big Data Analytical Platform • It consists of 4 major sections: Data Streaming, Data Platform, Data Exploration and Analytics • Provides an end-to-end solution for both storing and analyzing historical and streaming data (unify query) • HPDW processes data in memory and uses InfiniBand/10 Gbps as the high-speed network interconnecting all the data nodes • Incorporates RESTful JSON for easy stream data insertion • Provides JDBC and ODBC connections for further third-party tool integration. Future Work • Support more SQL query commands, including the UPDATE statement • Extend the HPDW data analytics section with real-time streaming data visualisation and support for more data sources such as OData, Excel, etc.
  • 32. Thank you • Questions and Comments? Seah Boon Keong (Ph.D), seahbk2006@yahoo.com