SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
The Glass Half Full
Using Programmable Hardware Accelerators in
Analytical Databases
Zsolt István
IMDEA Software Institute 1
IMDEA Software
Institute
• 16 Faculty in the areas of:
• Program Analysis and Verification
• Languages and Compilers
• Security and Privacy
• Theoretical Computer Science
• Distributed Systems and Databases
• ~10 Post-docs, ~25 PhD Students,
~10 Interns
• Located in UPM Montegancedo Campus,
Madrid
• We are hiring! https://software.imdea.org/
▪ OLAP – Online Analytical Processing
▪ Large datasets – up to TBs
▪ Ad-hoc querying to extract insight, recurring
reporting – Possibly complex operations
▪ Read-mostly workloads, updates in batches
▪ OLTP – Online Transaction Processing
▪ Smaller datasets
▪ Queries known, relate to business actions
▪ Makes heavy use of indexes
▪ Reads and updates intermixed
3
Context: Analytical Databases
4
Databases were a 25 Billion $ market in 2018…
Could we specialize machines to them?
https://www.statista.com/statistics/810188/worldwide-commercial-database-market-size/
▪ Fully custom machine for databases
▪ Processors – special ISA microprocessors
▪ Memory – magnetic bubbles and CCDs
▪ Semiconductor technology and
general purpose CPUs took over
5
Database Computer – ’70s
“The first goal is to design it with the
capability of handling a very large on-line
database of 10^10 bytes or beyond since
special-purpose machines are not likely to
be cost-effective for small databases.”
Jayanta Banerjee, David K. Hsiao, Krishnamurthi Kannan: DBC - A Database Computer for Very Large Databases.
IEEE Trans. Computers 28(6): 414-429 (1979)
▪ Based on VAX multi-
processor system
▪ By the time the software
and hardware were
developed, CPUs have
become much faster
▪ Couldn’t keep up with
Moore’s law
6
Gamma Machine – ’80s
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High
Performance Dataflow Database Machine. VLDB 1986: 228-237
7
Data/Compute Gap
CPU Scaling Commodity in Cloud
Specialized
Hardware
Revival
8
Renewed interest in Specialized Hardware
ASICsFPGAsCPUs
Field Programmable Gate Array (FPGA)
▪ Free choice of architecture
▪ Fine-grained pipelining,
communication, distributed memory
▪ Tradeoff: all “code” occupies chip
space
▪ Evolving platform: larger chips, more
heterogeneity
9
Re-programmable Specialized Hardware
Op 1
Op 2
Op 3
10
Integration Options
Accel.
1) On the side 2) In data-path 3) Co-processor
Data
Data Data
Accel.
Accel.
▪ Accelerator
▪ Amazon F1
▪ In data path
▪ Microsoft Catapult
▪ Co-processor
▪ Intel Xeon+FPGA
11
In the Cloud Today
Socket1 Socket2
CPU FPGA
Socket1
CPU FPGA
CPU
FPGA
Intel Xeon+FPGA Gen.1 Intel Xeon+FPGA Gen.2
12
The Glass Half Empty…
▪ 1) On the side acceleration introduces overhead
▪ Many related work offers no real speedup if we factor in data
movement, transformation, software overhead… 13
The Glass Half Empty…
0
20
40
60
80
100
120
Software With Acceleration
Query execution time
Compute Data Movement
2xAccel.
Data
▪ 2) “All or nothing” behavior makes query planning difficult
▪ Example: fixed capacity hash table on FPGA
▪ Constant time access for reads and writes
▪ What happens if data doesn’t fit?
▪ Can’t always know the number of keys aprioi
14
The Glass Half Empty…
#
▪ 3) Analytical databases becoming more optimized / not much
compute in core SQL
▪ X100 [CIDR05] showed that <10% of compute time spent on SQL
operators +,-,*,SUM,AVG in analytical queries
▪ Columnar stores often memory bound (10s of GB/s)
15
The Glass Half Empty…
▪ On the side acceleration introduces overhead
▪ “All or nothing” behavior makes query planning difficult
▪ Analytical databases becoming more optimized / not much
compute in core SQL
16
The Glass Half Empty…
▪ On the side acceleration introduces overhead
✓ Reduce data movement bottlenecks
17
The Glass Half Full…
▪ IBEX: Database storage engine with processing offload
▪ Filter and pre-aggregate for analytic workloads
18
Processing in data path: Smart Flash
Database Server
IBEX
SSD
IBEX – An Intelligent Storage Engine with Support for Advanced SQL Off-loading. L. Woods, Z. Istvan
and G. Alonso, VLDB’14
→ Larger bandwidth, more IOPS
(Samsung YourSQL, MIT BlueDBM)
▪ Opportunity to extend SSDs/Flash
with complex offload
Samsung “smart” SSD
19
Processing in data path: Distributed Processing
Workers(Compute)
Storage
+ Provisioning
+ Scalability
Caribou: Distributed
storage with processing
• Specialized HW nodes
• 10Gbps access
• 25W power cons.
Zsolt István, David Sidler, Gustavo Alonso: Caribou: Intelligent Distributed Storage. PVLDB 10(11), 2017.
20
Smart Storage in Databases: Filter push-down
Intel Hyperscan library (Xeon E5-2680 v2)
2.8x
SELECT … FROM customer
WHERE age<35 AND purchases>2
AND address LIKE “%PO. Box 123%”
▪ Challenge: guarantee that filtering never slows down retrieval
▪ Algorithms can be re-imagined to become bandwidth-bound
instead of compute-bound
▪ Extend the state of the art: parameterization without re-programming [FCCM16]
▪ Many options: Regular expressions, comparisons, decompression, …
[FCCM16] Runtime Parameterizable Regular Expression Operators for Databases. Zs. Istvan, D. Sidler, G. Alonso. FCCM’16
✓ Reduce data movement bottlenecks
▪ “All or nothing” behavior makes query planning difficult
✓ Hybrid processing
21
The Glass Half Full…
▪ Group-by: Compute aggregate function over categories
▪ select avg(salary) from employees group by department
22
IBEX’s Hybrid Group-by
CPUIbex with SW-only Group-By
Projection Selection Group-by
Final
Group
s
Input table
Filtered
data
▪ Group-by: Compute aggregate function over categories
▪ select avg(salary) from employees group by department
23
IBEX’s Hybrid Group-by
CPUIbex with HW-only Group-By
Projection Selection Group-by
Final
Group
s
Input table
Filtered
data
CPUIbex with HW-only Group-By
Projection Selection Group-by
Final
Group
s
Input table
Filtered
data
▪ Group-by: Compute aggregate function over categories
▪ select avg(salary) from employees group by department
▪ If number of groups does not fit on FPGA?
▪ Send partial aggregates – finalize in SW
▪ Worst case: same as no acceleration
▪ Best- case: All in HW!
24
IBEX’s Hybrid Group-by
CPUIbex with Hybrid Group-by
Input table Projection Selection Group-by Group-by
Final
Group
s
Filtered
data
Partial
Group
s
Challenge: How to split across accelerator and software?
✓ Reduce data movement bottlenecks
✓ Hybrid Processing
▪ Analytical databases becoming more optimized / not much
compute in core SQL
✓ Emerging compute-intensive workloads
25
The Glass Half Full
▪ Databases adopting new ways of analyzing the data
▪ SAP Hana, Oracle, SQL Server, etc.
▪ Specialized hardware can help both with model building [Kara18],
inference [Owaida18]
▪ Benefits for “classical” algorithms as well
[Kara18] Kara et al: ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4): 348-361 (2018)
[Owaida18] Owaida et al: Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles. FPL 2018: 295-300
26
The Rise of Machine Learning
CPUFPGA Co-processor
27
doppioDB: a hybrid database engine
Database
Engine
(MonetDB)
Hardware
Operator
Software
operator
Software
operator
Hardware
Operator
Hardware
Operator
▪ Goal: extend the capabilities of analytical databases
▪ FPGA works on the same data as software (cache-coherent access)
▪ Can combine SW and HW operators inside the same query
▪ Challenge: ensure high utilization of FPGA, use in many queries
DRAM (DB Tables)
No data copy,
transformation,
partitioning, etc.
Hardware
Operator
K-means – Algorithm
◼ Goal: partition unlabeled data into several
clusters, where the number of clusters is
the “k” in the k-means.
◼ Two steps in each iteration:
◼ Assignment: assign data points to
closet centroid according to distance
metric
◼ Centroid update: the centroids are re-
calculated by averaging all the data
points within each cluster
◼ Long process if the data set and number of
iterations are large
28
DRAM
(DB Tables)
Design – Execution Walk-Through
Receives K-Means parameters1
Fetch the initial centroids and
the data
2
3 Calculates the distance between
a data point and all the centroids
and assign it to closest centroid
4 Accumulates data points per cluster and
counts how many data points are assigned to
each cluster
Collect partial results from each pipeline5
Division for updating new centroid6
Writes back the final results7
1
2
3 4
56
7
Zhenhao He, David Sidler, Zsolt István, Gustavo Alonso: A Flexible K-Means Operator for Hybrid Databases. FPL 2018
29
30
Uses of Parallelism
K is known /
Centroids
known
Need to determine K
(Elbow method)
▪ K-Means algorithm
▪ FPGA outperforms several cores of the CPU
▪ Can use parallelism in two ways – cover more queries
▪ Text: Regular expression matching, Edit distance, …
▪ Database ops.: Skyline queries, Group-by aggregations, …
▪ Statistics: Histograms, Count-min sketch, Bloom filters, …
▪ Machine learning: Clustering (K-means), Stochastic Gradient
Descent, Decision Trees, …
▪ Data management: Hash tables, hash functions, …
▪ [Your algorithm here]
31
Wide range of algorithms can benefit from hardware
✓ Reduce data movement bottlenecks
✓ Hybrid Processing
✓ Emerging compute-intensive workloads
32
The Glass Half Full…
Future Challenges…
▪ Managing Programmable Hardware accelerators
▪ Is this the job of the OS or does the DB has to take control?
▪ How to share programmable hardware across tenants
▪ Compilation/synthesis of hardware accelerators
▪ Can we derive accelerators from user queries?
▪ Intermediary DSL or building blocks we could use?
For more details, see: The Glass Half Full: Using Programmable Hardware Accelerators in Analytics. Z. István. IEEE Data Engineering
Bulletin, March 2019.

Contenu connexe

Tendances

Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...
Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...
Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...Cloudera, Inc.
 
سکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرسکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرdatastack
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsDavid Portnoy
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production SuccessAllen Day, PhD
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabaseKinetica
 
BDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using ImpalaBDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using ImpalaDavid Lauzon
 
Organising for Data Success
Organising for Data SuccessOrganising for Data Success
Organising for Data SuccessLars Albertsson
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseDavid Lauzon
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013IntelAPAC
 
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchMultidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchRim Moussa
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
Gpu computing workshop
Gpu computing workshopGpu computing workshop
Gpu computing workshopdatastack
 
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
 White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian... White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...EMC
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irdatastack
 
Case Study Real Time Olap Cubes
Case Study Real Time Olap CubesCase Study Real Time Olap Cubes
Case Study Real Time Olap Cubesmister_zed
 
Hotel inspection data set analysis copy
Hotel inspection data set analysis   copyHotel inspection data set analysis   copy
Hotel inspection data set analysis copySharon Moses
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 

Tendances (20)

Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...
Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...
Hadoop World 2011: Hadoop and Netezza Deployment Models and Case Study - Kris...
 
سکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرسکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابر
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
 
BDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using ImpalaBDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using Impala
 
Organising for Data Success
Organising for Data SuccessOrganising for Data Success
Organising for Data Success
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013
 
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchMultidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Asd 2015
Asd 2015Asd 2015
Asd 2015
 
Gpu computing workshop
Gpu computing workshopGpu computing workshop
Gpu computing workshop
 
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
 White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian... White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Case Study Real Time Olap Cubes
Case Study Real Time Olap CubesCase Study Real Time Olap Cubes
Case Study Real Time Olap Cubes
 
Hotel inspection data set analysis copy
Hotel inspection data set analysis   copyHotel inspection data set analysis   copy
Hotel inspection data set analysis copy
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 

Similaire à A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases

Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeongYousun Jeong
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataHitoshi Sato
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERinside-BigData.com
 
Strata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesStrata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesJun Liu
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemDan Eaton
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics PlatformSantanu Dey
 
Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Boni Bruno
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Ahsan Javed Awan
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Intel_Swarm64 Solution Brief
Intel_Swarm64 Solution BriefIntel_Swarm64 Solution Brief
Intel_Swarm64 Solution BriefPaul McCullugh
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareData Con LA
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!TigerGraph
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
A5 oracle exadata-the game changer for online transaction processing data w...
A5   oracle exadata-the game changer for online transaction processing data w...A5   oracle exadata-the game changer for online transaction processing data w...
A5 oracle exadata-the game changer for online transaction processing data w...Dr. Wilfred Lin (Ph.D.)
 

Similaire à A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases (20)

Stsg17 speaker yousunjeong
Stsg17 speaker yousunjeongStsg17 speaker yousunjeong
Stsg17 speaker yousunjeong
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Strata + Hadoop 2015 Slides
Strata + Hadoop 2015 SlidesStrata + Hadoop 2015 Slides
Strata + Hadoop 2015 Slides
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 
Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Intel_Swarm64 Solution Brief
Intel_Swarm64 Solution BriefIntel_Swarm64 Solution Brief
Intel_Swarm64 Solution Brief
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
A5 oracle exadata-the game changer for online transaction processing data w...
A5   oracle exadata-the game changer for online transaction processing data w...A5   oracle exadata-the game changer for online transaction processing data w...
A5 oracle exadata-the game changer for online transaction processing data w...
 

Plus de Facultad de Informática UCM

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?Facultad de Informática UCM
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...Facultad de Informática UCM
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersFacultad de Informática UCM
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmFacultad de Informática UCM
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingFacultad de Informática UCM
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroFacultad de Informática UCM
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...Facultad de Informática UCM
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Facultad de Informática UCM
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFacultad de Informática UCM
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoFacultad de Informática UCM
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCFacultad de Informática UCM
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Facultad de Informática UCM
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Facultad de Informática UCM
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Facultad de Informática UCM
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windFacultad de Informática UCM
 

Plus de Facultad de Informática UCM (20)

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation Computers
 
uElectronics ongoing activities at ESA
uElectronics ongoing activities at ESAuElectronics ongoing activities at ESA
uElectronics ongoing activities at ESA
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura Arm
 
Formalizing Mathematics in Lean
Formalizing Mathematics in LeanFormalizing Mathematics in Lean
Formalizing Mathematics in Lean
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented Computing
 
Computer Design Concepts for Machine Learning
Computer Design Concepts for Machine LearningComputer Design Concepts for Machine Learning
Computer Design Concepts for Machine Learning
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuro
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error Correction
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intento
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
Type and proof structures for concurrency
Type and proof structures for concurrencyType and proof structures for concurrency
Type and proof structures for concurrency
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
 
Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore wind
 

Dernier

UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectssuserb6619e
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
Crystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxCrystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxachiever3003
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 

Dernier (20)

UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
Crystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptxCrystal Structure analysis and detailed information pptx
Crystal Structure analysis and detailed information pptx
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 

A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases

  • 1. The Glass Half Full Using Programmable Hardware Accelerators in Analytical Databases Zsolt István IMDEA Software Institute 1
  • 2. IMDEA Software Institute • 16 Faculty in the areas of: • Program Analysis and Verification • Languages and Compilers • Security and Privacy • Theoretical Computer Science • Distributed Systems and Databases • ~10 Post-docs, ~25 PhD Students, ~10 Interns • Located in UPM Montegancedo Campus, Madrid • We are hiring! https://software.imdea.org/
  • 3. ▪ OLAP – Online Analytical Processing ▪ Large datasets – up to TBs ▪ Ad-hoc querying to extract insight, recurring reporting – Possibly complex operations ▪ Read-mostly workloads, updates in batches ▪ OLTP – Online Transaction Processing ▪ Smaller datasets ▪ Queries known, relate to business actions ▪ Makes heavy use of indexes ▪ Reads and updates intermixed 3 Context: Analytical Databases
  • 4. 4 Databases were a 25 Billion $ market in 2018… Could we specialize machines to them? https://www.statista.com/statistics/810188/worldwide-commercial-database-market-size/
  • 5. ▪ Fully custom machine for databases ▪ Processors – special ISA microprocessors ▪ Memory – magnetic bubbles and CCDs ▪ Semiconductor technology and general purpose CPUs took over 5 Database Computer – ’70s “The first goal is to design it with the capability of handling a very large on-line database of 10^10 bytes or beyond since special-purpose machines are not likely to be cost-effective for small databases.” Jayanta Banerjee, David K. Hsiao, Krishnamurthi Kannan: DBC - A Database Computer for Very Large Databases. IEEE Trans. Computers 28(6): 414-429 (1979)
  • 6. ▪ Based on VAX multi- processor system ▪ By the time the software and hardware were developed, CPUs have become much faster ▪ Couldn’t keep up with Moore’s law 6 Gamma Machine – ’80s David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237
  • 7. 7 Data/Compute Gap CPU Scaling Commodity in Cloud Specialized Hardware Revival
  • 8. 8 Renewed interest in Specialized Hardware ASICsFPGAsCPUs
  • 9. Field Programmable Gate Array (FPGA) ▪ Free choice of architecture ▪ Fine-grained pipelining, communication, distributed memory ▪ Tradeoff: all “code” occupies chip space ▪ Evolving platform: larger chips, more heterogeneity 9 Re-programmable Specialized Hardware Op 1 Op 2 Op 3
  • 10. 10 Integration Options Accel. 1) On the side 2) In data-path 3) Co-processor Data Data Data Accel. Accel.
  • 11. ▪ Accelerator ▪ Amazon F1 ▪ In data path ▪ Microsoft Catapult ▪ Co-processor ▪ Intel Xeon+FPGA 11 In the Cloud Today Socket1 Socket2 CPU FPGA Socket1 CPU FPGA CPU FPGA Intel Xeon+FPGA Gen.1 Intel Xeon+FPGA Gen.2
  • 12. 12 The Glass Half Empty…
  • 13. ▪ 1) On the side acceleration introduces overhead ▪ Many related work offers no real speedup if we factor in data movement, transformation, software overhead… 13 The Glass Half Empty… 0 20 40 60 80 100 120 Software With Acceleration Query execution time Compute Data Movement 2xAccel. Data
  • 14. ▪ 2) “All or nothing” behavior makes query planning difficult ▪ Example: fixed capacity hash table on FPGA ▪ Constant time access for reads and writes ▪ What happens if data doesn’t fit? ▪ Can’t always know the number of keys aprioi 14 The Glass Half Empty… #
  • 15. ▪ 3) Analytical databases becoming more optimized / not much compute in core SQL ▪ X100 [CIDR05] showed that <10% of compute time spent on SQL operators +,-,*,SUM,AVG in analytical queries ▪ Columnar stores often memory bound (10s of GB/s) 15 The Glass Half Empty…
  • 16. ▪ On the side acceleration introduces overhead ▪ “All or nothing” behavior makes query planning difficult ▪ Analytical databases becoming more optimized / not much compute in core SQL 16 The Glass Half Empty…
  • 17. ▪ On the side acceleration introduces overhead ✓ Reduce data movement bottlenecks 17 The Glass Half Full…
  • 18. ▪ IBEX: Database storage engine with processing offload ▪ Filter and pre-aggregate for analytic workloads 18 Processing in data path: Smart Flash Database Server IBEX SSD IBEX – An Intelligent Storage Engine with Support for Advanced SQL Off-loading. L. Woods, Z. Istvan and G. Alonso, VLDB’14 → Larger bandwidth, more IOPS (Samsung YourSQL, MIT BlueDBM) ▪ Opportunity to extend SSDs/Flash with complex offload Samsung “smart” SSD
  • 19. 19 Processing in data path: Distributed Processing Workers(Compute) Storage + Provisioning + Scalability Caribou: Distributed storage with processing • Specialized HW nodes • 10Gbps access • 25W power cons. Zsolt István, David Sidler, Gustavo Alonso: Caribou: Intelligent Distributed Storage. PVLDB 10(11), 2017.
  • 20. 20 Smart Storage in Databases: Filter push-down Intel Hyperscan library (Xeon E5-2680 v2) 2.8x SELECT … FROM customer WHERE age<35 AND purchases>2 AND address LIKE “%PO. Box 123%” ▪ Challenge: guarantee that filtering never slows down retrieval ▪ Algorithms can be re-imagined to become bandwidth-bound instead of compute-bound ▪ Extend the state of the art: parameterization without re-programming [FCCM16] ▪ Many options: Regular expressions, comparisons, decompression, … [FCCM16] Runtime Parameterizable Regular Expression Operators for Databases. Zs. Istvan, D. Sidler, G. Alonso. FCCM’16
  • 21. ✓ Reduce data movement bottlenecks ▪ “All or nothing” behavior makes query planning difficult ✓ Hybrid processing 21 The Glass Half Full…
  • 22. ▪ Group-by: Compute aggregate function over categories ▪ select avg(salary) from employees group by department 22 IBEX’s Hybrid Group-by CPUIbex with SW-only Group-By Projection Selection Group-by Final Group s Input table Filtered data
  • 23. ▪ Group-by: Compute aggregate function over categories ▪ select avg(salary) from employees group by department 23 IBEX’s Hybrid Group-by CPUIbex with HW-only Group-By Projection Selection Group-by Final Group s Input table Filtered data
  • 24. CPUIbex with HW-only Group-By Projection Selection Group-by Final Group s Input table Filtered data ▪ Group-by: Compute aggregate function over categories ▪ select avg(salary) from employees group by department ▪ If number of groups does not fit on FPGA? ▪ Send partial aggregates – finalize in SW ▪ Worst case: same as no acceleration ▪ Best- case: All in HW! 24 IBEX’s Hybrid Group-by CPUIbex with Hybrid Group-by Input table Projection Selection Group-by Group-by Final Group s Filtered data Partial Group s Challenge: How to split across accelerator and software?
  • 25. ✓ Reduce data movement bottlenecks ✓ Hybrid Processing ▪ Analytical databases becoming more optimized / not much compute in core SQL ✓ Emerging compute-intensive workloads 25 The Glass Half Full
  • 26. ▪ Databases adopting new ways of analyzing the data ▪ SAP Hana, Oracle, SQL Server, etc. ▪ Specialized hardware can help both with model building [Kara18], inference [Owaida18] ▪ Benefits for “classical” algorithms as well [Kara18] Kara et al: ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4): 348-361 (2018) [Owaida18] Owaida et al: Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles. FPL 2018: 295-300 26 The Rise of Machine Learning
  • 27. CPUFPGA Co-processor 27 doppioDB: a hybrid database engine Database Engine (MonetDB) Hardware Operator Software operator Software operator Hardware Operator Hardware Operator ▪ Goal: extend the capabilities of analytical databases ▪ FPGA works on the same data as software (cache-coherent access) ▪ Can combine SW and HW operators inside the same query ▪ Challenge: ensure high utilization of FPGA, use in many queries DRAM (DB Tables) No data copy, transformation, partitioning, etc. Hardware Operator
  • 28. K-means – Algorithm ◼ Goal: partition unlabeled data into several clusters, where the number of clusters is the “k” in the k-means. ◼ Two steps in each iteration: ◼ Assignment: assign data points to closet centroid according to distance metric ◼ Centroid update: the centroids are re- calculated by averaging all the data points within each cluster ◼ Long process if the data set and number of iterations are large 28
  • 29. DRAM (DB Tables) Design – Execution Walk-Through Receives K-Means parameters1 Fetch the initial centroids and the data 2 3 Calculates the distance between a data point and all the centroids and assign it to closest centroid 4 Accumulates data points per cluster and counts how many data points are assigned to each cluster Collect partial results from each pipeline5 Division for updating new centroid6 Writes back the final results7 1 2 3 4 56 7 Zhenhao He, David Sidler, Zsolt István, Gustavo Alonso: A Flexible K-Means Operator for Hybrid Databases. FPL 2018 29
  • 30. 30 Uses of Parallelism K is known / Centroids known Need to determine K (Elbow method) ▪ K-Means algorithm ▪ FPGA outperforms several cores of the CPU ▪ Can use parallelism in two ways – cover more queries
  • 31. ▪ Text: Regular expression matching, Edit distance, … ▪ Database ops.: Skyline queries, Group-by aggregations, … ▪ Statistics: Histograms, Count-min sketch, Bloom filters, … ▪ Machine learning: Clustering (K-means), Stochastic Gradient Descent, Decision Trees, … ▪ Data management: Hash tables, hash functions, … ▪ [Your algorithm here] 31 Wide range of algorithms can benefit from hardware
  • 32. ✓ Reduce data movement bottlenecks ✓ Hybrid Processing ✓ Emerging compute-intensive workloads 32 The Glass Half Full… Future Challenges… ▪ Managing Programmable Hardware accelerators ▪ Is this the job of the OS or does the DB has to take control? ▪ How to share programmable hardware across tenants ▪ Compilation/synthesis of hardware accelerators ▪ Can we derive accelerators from user queries? ▪ Intermediary DSL or building blocks we could use? For more details, see: The Glass Half Full: Using Programmable Hardware Accelerators in Analytics. Z. István. IEEE Data Engineering Bulletin, March 2019.