2. SQL and in-memory on Hadoop with Pivotal and HAWQ
Alexandre Vasseur
Jérôme Campo
Field Engineering, Pivotal
© Copyright 2013 Pivotal. All rights reserved.
3. Pivotal
Spin-off of EMC and VMware
Software vendor
More than 1,250 employees
Data Science Team
Pivotal HD
4. A 1,000-node Hadoop cluster for the community
• 1,000 nodes, 24,000 cores
• 48 TB RAM
• 24 PB (12,000 disks)
• Improve Hadoop
• Validate the Hadoop ecosystem at scale
http://www.analyticsworkbench.com
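As a quick sanity check on the cluster figures above, the per-node averages follow from simple division. This is only a sketch: it assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) and identical nodes, neither of which the slide states.

```python
# Cluster totals as quoted on the slide.
NODES = 1000
CORES = 24_000
RAM_TB = 48
STORAGE_PB = 24
DISKS = 12_000

# Assumption: decimal units and a homogeneous cluster (not stated in the source).
cores_per_node = CORES // NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
disks_per_node = DISKS // NODES
tb_per_disk = STORAGE_PB * 1000 / DISKS

print(f"{cores_per_node} cores, {ram_gb_per_node:g} GB RAM, "
      f"{disks_per_node} disks of {tb_per_disk:g} TB per node")
```

Under those assumptions the averages come out as whole numbers, which suggests the totals on the slide are rounded multiples of a single node specification.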
5. Pivotal Hadoop
[Diagram: the Pivotal HD stack. Legend: Apache / Pivotal HD Added Value]
• HAWQ – Advanced Database Services (ANSI SQL + Analytics): Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining
• Pivotal HD Enterprise: HDFS, HBase, Pig, Hive, Mahout, Map Reduce, Sqoop, Flume, Yarn, Zookeeper, Resource Management & Workflow, Hadoop Virtualization (HVE), Data Loader, Command Center (Configure, Deploy, Monitor, Manage)
6. 10 years of R&D on the massively parallel database
• High-performance SQL engine
– Multi-petabyte
– Full ANSI SQL
– Standard drivers and ecosystem
• Direct access to Hadoop formats
– Text, Avro, Hive, HBase, other formats via API
• Massively parallel database on Hadoop
– Columnar, compressed, partitioned, polymorphic storage
– Priority and access management
• In-Database Analytics (MADlib)
– Parallelized statistics and machine learning libraries
– Accessible via R or SQL
7. How HAWQ works
Clients (JDBC/ODBC, SQL Console) send a query to the HAWQ Master Host:
    SELECT beer, price
    FROM Bars b, Sells s
    WHERE b.name = s.bar
    AND b.city = 'San Francisco'
[Diagram: the HAWQ Master Host (Query Parser, Query Optimizer) is shown with the HDFS Namenode; three HAWQ Segment Hosts are drawn, each with a Query Executor and an HDFS Datanode, followed by "..."]
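To make the semantics of the slide's query concrete, here is a minimal sketch that runs it unchanged on a tiny data set. It uses SQLite from the Python standard library, not HAWQ, so it says nothing about distribution or performance. The table schemas (Bars(name, city), Sells(bar, beer, price)) and all rows are assumptions made for illustration; the slide shows only the query.

```python
import sqlite3

# Hypothetical schemas and rows: the slide shows only the query itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bars  (name TEXT, city TEXT);
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Bars  VALUES ('Bar A', 'San Francisco'), ('Bar B', 'Paris');
    INSERT INTO Sells VALUES ('Bar A', 'Lager', 5.0),
                             ('Bar A', 'Stout', 6.5),
                             ('Bar B', 'Pils',  4.0);
""")

# The query from the slide, with straight quotes around the string literal.
QUERY = """
    SELECT beer, price
    FROM Bars b, Sells s
    WHERE b.name = s.bar
    AND b.city = 'San Francisco'
"""

# SQL gives no ordering guarantee without ORDER BY, so sort on the client side.
rows = sorted(conn.execute(QUERY).fetchall())
print(rows)
conn.close()
```

Only the beers sold by bars located in San Francisco are returned; the join condition b.name = s.bar is what the later execution-plan slides implement as a hash join.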
8. How HAWQ works
Execution Plan
Motion Gather
  Project s.beer, s.price
    HashJoin b.name = s.bar
      Scan Sells s
      Motion Redist(b.name)
        Filter b.city = 'San Francisco'
          Scan Bars b
[Diagram: clients (JDBC/ODBC, SQL Console), the HAWQ Master Host (Query Parser, Query Optimizer) with the HDFS Namenode, and three HAWQ Segment Hosts, each with a Query Executor and an HDFS Datanode, followed by "..."]
9. How HAWQ works
[Diagram: clients (JDBC/ODBC, SQL Console) connect to the HAWQ Master Host (Query Parser, Query Optimizer), shown with the HDFS Namenode. Each of the three HAWQ Segment Hosts drawn has a Query Executor and an HDFS Datanode and shows the same plan operators: Motion Gather, Project s.beer, s.price, HashJoin b.name = s.bar, Motion Redist(b.name), Filter b.city = 'San Francisco', Scan Bars b, Scan Sells s. The row of segment hosts ends with "..."]
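The operators named on these slides (Scan, Filter, Motion Redist, HashJoin, Project, Motion Gather) can be imitated in a few lines of single-process Python. This is a toy simulation under stated assumptions, not HAWQ code: the data, the `segment_for` hash function, and the initial data placement (Sells distributed by bar, Bars spread round-robin) are all hypothetical.

```python
from collections import defaultdict

NUM_SEGMENTS = 3  # the slides draw three segment hosts


def segment_for(key: str) -> int:
    """Deterministic toy hash; a real engine uses its own hash function."""
    return sum(key.encode()) % NUM_SEGMENTS


# Hypothetical rows: Bars(name, city) and Sells(bar, beer, price).
bars = [("Bar A", "San Francisco"), ("Bar B", "Paris"), ("Bar C", "San Francisco")]
sells = [("Bar A", "Lager", 5.0), ("Bar A", "Stout", 6.5),
         ("Bar B", "Pils", 4.0), ("Bar C", "Porter", 7.0)]

# Assumed initial placement: Bars round-robin, Sells hashed on bar.
bars_on = defaultdict(list)
for i, row in enumerate(bars):
    bars_on[i % NUM_SEGMENTS].append(row)
sells_on = defaultdict(list)
for row in sells:
    sells_on[segment_for(row[0])].append(row)

# Scan Bars + Filter b.city, then Motion Redist(b.name): every segment sends
# its qualifying rows to the segment that owns the hash of b.name.
redistributed = defaultdict(list)
for seg in range(NUM_SEGMENTS):
    for name, city in bars_on[seg]:
        if city == "San Francisco":
            redistributed[segment_for(name)].append((name, city))


def hash_join_and_project(seg: int) -> list:
    """HashJoin b.name = s.bar, then Project s.beer, s.price, on one segment."""
    build_side = {name for name, _ in redistributed[seg]}
    return [(beer, price) for bar, beer, price in sells_on[seg] if bar in build_side]


# Motion Gather: the master concatenates the partial results of all segments.
result = sorted(row for seg in range(NUM_SEGMENTS) for row in hash_join_and_project(seg))
print(result)
```

Because both sides of the join end up placed by the same hash of the join key, each segment can join its own rows without seeing the others, which is the point of the redistribute step.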
10. 10 years of R&D on in-memory data grids
[Diagram labels: NoSQL/NewSQL; Sensor Data / Feeds; Map-Reduce; Analytic Apps; Online Apps; Model Refresh; Re-evaluate Model; I/P & O/P Formatter; HAWQ; GPXF; DW; Native Persistence; External Tables; Shared Data - HFiles; HDFS; ICM]
11. In-memory No/NewSQL on Hadoop
• Benefits of an in-memory data grid
– Data in memory when needed
– Very high availability, massive concurrency, in-memory response times
• Native Hadoop integration
– Eviction to / storage on native HDFS
– Access to in-memory or global data via SQL/NoSQL and HAWQ
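The idea of keeping hot data in memory and evicting the rest to HDFS can be sketched as a small two-tier key-value store. This is an illustration only, not the product's API: the `TieredStore` class is hypothetical, eviction is assumed to be least-recently-used, and a plain dictionary stands in for HDFS.

```python
from collections import OrderedDict


class TieredStore:
    """Toy store: hot entries in memory, evicted entries in a backing store."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.memory = OrderedDict()  # least recently used entry comes first
        self.backing = {}            # stand-in for files on HDFS (assumption)

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)   # mark as most recently used
        self.backing.pop(key, None)    # drop any stale evicted copy
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.backing[old_key] = old_value  # evict to the backing store

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        # Not in memory: serve the evicted copy (KeyError if it never existed).
        return self.backing[key]


store = TieredStore(capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(k, v)

print(list(store.memory), list(store.backing), store.get("a"))
```

Reads succeed whether the entry is still in memory or has been evicted, which mirrors the slide's "in-memory or global" access; a real grid would add replication, concurrency control, and a SQL layer on top.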
12. Try Pivotal HD
Pivotal HD Single Node VM
• Hadoop Stack Components – Pig, Hive, HBase, HDFS, Mahout, YARN, MRv2
• HAWQ / PXF
• Command Center
• DataLoader
• Eclipse, Maven, Ant
• Retail Data Set
http://gopivotal.com/pivotal-products/data/pivotal-hd#4
Pivotal HD with Vagrant
• Multi-VM installation with VirtualBox or VMware Workstation/Fusion
http://blog.gopivotal.com/products/in-45-min-set-up-hadoop-pivotal-hd-on-a-multi-vm-cluster-run-test-data
13. Big/Fast Demo – Big Data Workflow
[Diagram labels: HTTP Pipe; Filter; Transform; Tap (×2); JSON Field Extract (×2); Analytic Counter (×2); Logistic Regression; MADlib; HDFS Sink]