60. SQL used in the evaluation
Join with 4 master tables
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."supplier" "supplier" ON ("lineorder"."lo_suppkey" = "supplier"."s_suppkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
INNER JOIN "public"."dwdate" "dwdate" ON ("lineorder"."lo_orderdate" = "dwdate"."d_datekey")
WHERE (("customer"."c_nation" IN ('ALGERIA', 'ARGENTINA', 'BRAZIL')) AND ("part"."p_color" IN ('black', 'blue', 'brown')))
GROUP BY 1, 2;
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."supplier" "supplier" ON ("lineorder"."lo_suppkey" = "supplier"."s_suppkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
INNER JOIN "public"."dwdate" "dwdate" ON ("lineorder"."lo_orderdate" = "dwdate"."d_datekey")
GROUP BY 1, 2;
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."supplier" "supplier" ON ("lineorder"."lo_suppkey" = "supplier"."s_suppkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
INNER JOIN "public"."dwdate" "dwdate" ON ("lineorder"."lo_orderdate" = "dwdate"."d_datekey")
WHERE ("part"."p_color" IN ('black', 'blue', 'brown'))
GROUP BY 1, 2;
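Hit count: many (5.1 billion rows), the query with no WHERE clause
Hit count: medium (210 million rows), the query filtering on p_color only
Hit count: few (25 million rows), the query filtering on both c_nation and p_color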
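The three queries above differ only in their WHERE clause, which is what moves the hit count between "many", "medium", and "few". The following is a minimal sketch of that idea on an in-memory SQLite database. The toy rows, the unquoted identifiers, and the extra `hits` column are assumptions added for illustration; they are not part of the evaluation.

```python
import sqlite3

# Toy star schema with the same table and column names as the slide.
# The rows are made up for illustration; they are not the evaluation data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (c_custkey INTEGER PRIMARY KEY, c_nation TEXT);
CREATE TABLE supplier  (s_suppkey INTEGER PRIMARY KEY);
CREATE TABLE part      (p_partkey INTEGER PRIMARY KEY, p_color TEXT);
CREATE TABLE dwdate    (d_datekey INTEGER PRIMARY KEY);
CREATE TABLE lineorder (lo_custkey INTEGER, lo_suppkey INTEGER,
                        lo_partkey INTEGER, lo_orderdate INTEGER,
                        lo_revenue INTEGER);
INSERT INTO customer VALUES (1, 'BRAZIL'), (2, 'JAPAN');
INSERT INTO supplier VALUES (1);
INSERT INTO part     VALUES (1, 'blue'), (2, 'red');
INSERT INTO dwdate   VALUES (19940101);
INSERT INTO lineorder VALUES
  (1, 1, 1, 19940101, 100),
  (1, 1, 2, 19940101, 200),
  (2, 1, 1, 19940101, 400),
  (2, 1, 2, 19940101, 800);
""")

# Same shape as the slide's query; COUNT(*) is added to expose the hit count.
QUERY = """
SELECT customer.c_nation AS c_nation,
       part.p_color AS p_color,
       SUM(lineorder.lo_revenue) AS sum_lo_revenue_ok,
       COUNT(*) AS hits
FROM lineorder
INNER JOIN customer ON lineorder.lo_custkey = customer.c_custkey
INNER JOIN supplier ON lineorder.lo_suppkey = supplier.s_suppkey
INNER JOIN part ON lineorder.lo_partkey = part.p_partkey
INNER JOIN dwdate ON lineorder.lo_orderdate = dwdate.d_datekey
{where}
GROUP BY 1, 2
"""

WHERE = {
    "many": "",
    "medium": "WHERE part.p_color IN ('black', 'blue', 'brown')",
    "few": "WHERE customer.c_nation IN ('ALGERIA', 'ARGENTINA', 'BRAZIL') "
           "AND part.p_color IN ('black', 'blue', 'brown')",
}

hits = {}
for name, where in WHERE.items():
    rows = conn.execute(QUERY.format(where=where)).fetchall()
    hits[name] = sum(row[3] for row in rows)  # joined fact rows that survive the filter
    print(name, hits[name], rows)
```

Each added predicate narrows the set of joined fact rows that reach the aggregation, which mirrors the ordering of the hit counts reported on the slide.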
61. SQL used in the evaluation
Join with 2 master tables
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
WHERE (("customer"."c_nation" IN ('ALGERIA', 'ARGENTINA', 'BRAZIL')) AND ("part"."p_color" IN ('black', 'blue', 'brown')))
GROUP BY 1, 2;
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
GROUP BY 1, 2;
SELECT "customer"."c_nation" AS "c_nation",
"part"."p_color" AS "p_color",
SUM("lineorder"."lo_revenue") AS "sum_lo_revenue_ok"
FROM "public"."lineorder" "lineorder"
INNER JOIN "public"."customer" "customer" ON ("lineorder"."lo_custkey" = "customer"."c_custkey")
INNER JOIN "public"."part" "part" ON ("lineorder"."lo_partkey" = "part"."p_partkey")
WHERE ("part"."p_color" IN ('black', 'blue', 'brown'))
GROUP BY 1, 2;
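Hit count: many (5.1 billion rows), the query with no WHERE clause
Hit count: medium (210 million rows), the query filtering on p_color only
Hit count: few (25 million rows), the query filtering on both c_nation and p_color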
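The 2-master queries report the same hit counts as the 4-master queries on the previous slide. One plausible reading, stated here as an assumption rather than something the slides confirm, is that every lineorder row has a matching supplier and dwdate row, so the two extra inner joins add work without changing the result. The sketch below checks that behavior on toy data in SQLite; the rows and the orphan key 99 are hypothetical.

```python
import sqlite3

# Toy data (hypothetical): every fact row initially has matching dimension rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (c_custkey INTEGER PRIMARY KEY, c_nation TEXT);
CREATE TABLE supplier  (s_suppkey INTEGER PRIMARY KEY);
CREATE TABLE part      (p_partkey INTEGER PRIMARY KEY, p_color TEXT);
CREATE TABLE dwdate    (d_datekey INTEGER PRIMARY KEY);
CREATE TABLE lineorder (lo_custkey INTEGER, lo_suppkey INTEGER,
                        lo_partkey INTEGER, lo_orderdate INTEGER,
                        lo_revenue INTEGER);
INSERT INTO customer VALUES (1, 'BRAZIL');
INSERT INTO supplier VALUES (1);
INSERT INTO part     VALUES (1, 'blue');
INSERT INTO dwdate   VALUES (19940101);
INSERT INTO lineorder VALUES (1, 1, 1, 19940101, 100),
                             (1, 1, 1, 19940101, 200);
""")

SELECT = """
SELECT customer.c_nation, part.p_color, SUM(lineorder.lo_revenue)
FROM lineorder
INNER JOIN customer ON lineorder.lo_custkey = customer.c_custkey
INNER JOIN part ON lineorder.lo_partkey = part.p_partkey
{extra_joins}
GROUP BY 1, 2
"""

# The two joins that the 4-master variant adds on top of the 2-master variant.
EXTRA_JOINS = """
INNER JOIN supplier ON lineorder.lo_suppkey = supplier.s_suppkey
INNER JOIN dwdate ON lineorder.lo_orderdate = dwdate.d_datekey
"""

def result(extra_joins):
    """Run the query with or without the supplier and dwdate joins."""
    return conn.execute(SELECT.format(extra_joins=extra_joins)).fetchall()

# With matching keys, the extra joins do not change the aggregate.
same_when_keys_match = result("") == result(EXTRA_JOINS)

# A fact row with an unknown supplier key is dropped only by the 4-master query.
conn.execute("INSERT INTO lineorder VALUES (1, 99, 1, 19940101, 50)")
same_with_orphan = result("") == result(EXTRA_JOINS)

print(same_when_keys_match, same_with_orphan)
```

Under this reading, the two slides isolate the cost of the additional joins: the results stay the same and only the join work differs.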
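62. Overview of Data Platform for Hadoop
A pre-validated "common big data analytics platform" that supports batch and real-time processing and a wide range of data analysis, enabling rapid deployment.
① Supports batch and real-time processing
Apache™ Hadoop®, which is suited to distributed processing of large-scale data, is combined with Apache™ Spark®, which processes data efficiently in memory and makes real-time processing possible. Together they cover everything from batch processing of large-scale data to real-time processing.
② Well suited to both structured and unstructured data
Large-scale data is stored in its original structure. When the data is used, it can be read while specifying a data structure that fits the application's purpose, which increases flexibility in how the data is used.
③ Rapid deployment and stable operation through pre-validation
Provided as an integrated system that combines the necessary hardware and software and is designed (including sizing and tuning), validated, and built in advance. This both shortens the deployment period and stabilizes platform quality, helping to reduce total cost.
Release information (Japan): shipments began in February 2016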
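Point ② describes storing data in its original structure and choosing a data structure only at read time. A minimal sketch of that idea in plain Python follows; the record fields, the two schemas, and the `read_with_schema` helper are hypothetical and stand in for whatever the platform's own tools provide.

```python
import json

# Raw records kept exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"ts": "2016-02-01T10:00:00", "user": "u1", "amount": "120", "note": "first order"}',
    '{"ts": "2016-02-01T10:05:00", "user": "u2", "amount": "80"}',
]

def read_with_schema(lines, schema):
    """Parse raw lines and project each record onto `schema`.

    `schema` maps a field name to a converter; fields missing from a
    record come back as None instead of failing the read.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }

# Two applications read the same stored data with different structures.
billing_schema = {"user": str, "amount": int}
audit_schema = {"ts": str, "note": str}

billing = list(read_with_schema(RAW_LINES, billing_schema))
audit = list(read_with_schema(RAW_LINES, audit_schema))

print(billing)
print(audit)
```

Because the structure is applied when reading, a new application can define its own schema over the same stored data without any reload or conversion step.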
64. Features of the DX 1000
Low power; high parallelism and high density
Figure (comparison with traditional servers):
700 servers per rack
75% less space
75% less energy
5600 processor cores per rack
22TB of memory per rack
90TB of SSD storage per rack
Networking*2: 3.5Tbps per rack
• By adopting the Intel® Atom™ processor C2000 series for servers and fitting SSDs as standard, the DX 1000 achieves best-in-class*1 density and power efficiency.
• It was early to adopt 2.5Gbit Ethernet, which is balanced against the CPU performance. It delivers the high-speed network environment that a scale-out big data analytics platform requires while using less power than 10GbE.
*1: Among models with Atom C2000 and 2.5GbE networking
*2: Network bandwidth per server × number of servers (the total bandwidth of all downlinks in the rack)
*3: RMS: Rack Management System
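The per-rack figures can be divided by the server count to get a rough per-server view. This is only arithmetic on the numbers shown on the slide; it assumes decimal units (1TB = 1000GB) and that resources are spread evenly across servers, neither of which the slide states.

```python
# Per-rack figures as shown on the slide.
SERVERS_PER_RACK = 700
CORES_PER_RACK = 5600
MEMORY_TB_PER_RACK = 22
SSD_TB_PER_RACK = 90
NETWORK_TBPS_PER_RACK = 3.5

# Assumption: decimal units and an even split across all servers.
cores_per_server = CORES_PER_RACK / SERVERS_PER_RACK
memory_gb_per_server = MEMORY_TB_PER_RACK * 1000 / SERVERS_PER_RACK
ssd_gb_per_server = SSD_TB_PER_RACK * 1000 / SERVERS_PER_RACK
network_gbps_per_server = NETWORK_TBPS_PER_RACK * 1000 / SERVERS_PER_RACK

print(f"cores per server:   {cores_per_server:.1f}")
print(f"memory per server:  {memory_gb_per_server:.1f} GB")
print(f"SSD per server:     {ssd_gb_per_server:.1f} GB")
print(f"network per server: {network_gbps_per_server:.1f} Gbps")
```

The network figure works out to twice 2.5Gbps per server, which would be consistent with two 2.5GbE links per server, although the slide does not state the port count.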
Our queries were already highly optimized, so we focused on other parts. A query execution is essentially made up of:
– Client execution [0s if done correctly]
– Optimization [HiveServer2] [~0.1s]
– HCatalog lookups [HCatalog, Metastore] [very fast in Hive 0.14]
– Application Master creation [4-5s]
– Container allocation [3-5s]
– Query execution
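Adding up the stage timings listed above shows how much fixed overhead a query pays before any execution work starts. The sketch below uses only the numbers from the list; HCatalog lookups are left out because the slide gives no figure for them, and the `overhead_share` helper is a hypothetical name introduced for illustration.

```python
# (stage, low seconds, high seconds) taken from the list above.
# HCatalog lookups are omitted: the slide says "very fast" but gives no number.
STAGES = [
    ("Client execution", 0.0, 0.0),
    ("Optimization (HiveServer2)", 0.1, 0.1),
    ("Application Master creation", 4.0, 5.0),
    ("Container allocation", 3.0, 5.0),
]

fixed_low = sum(low for _, low, _ in STAGES)
fixed_high = sum(high for _, _, high in STAGES)

def overhead_share(fixed_seconds, exec_seconds):
    """Fraction of total latency spent on fixed overhead (hypothetical helper)."""
    return fixed_seconds / (fixed_seconds + exec_seconds)

print(f"fixed overhead: {fixed_low:.1f}s to {fixed_high:.1f}s")
for exec_seconds in (1.0, 10.0, 60.0):
    share = overhead_share(fixed_low, exec_seconds)
    print(f"execution {exec_seconds:>4.0f}s -> overhead share {share:.0%} (low estimate)")
```

For short queries almost all of the latency comes from Application Master creation and container allocation, which is why those stages are worth attention once the queries themselves are already optimized.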