Adastra Group
Our Solution Portfolio
Webinar: Fast data in times of crisis with the help of GPU
One Focus: Data & Digitalization
Advanced Analytics
(Big) Data Engineering
Data Governance
Cloud Services
Machine Learning & AI
Digital Transformation
Adastra introduction
International consulting company that creates functional solutions in various sectors, facilitating the transition to the digital era.
Cutting-edge software for data quality management, Master Data Management, and data governance.
Solutions to complex business problems in risk management, sales, and process optimization.
Specialist in mobile app development.
Full-service creative agency based on a strong technological background.
Recruitment for banks, financial institutions, telecoms and insurance companies, and many others, including Adastra.
Artificial intelligence, machine learning and optimization services.
Big data monetization solutions.
Technical & other details
The panel
Matej Misik: QikkDB & TellStory product owner
Tomas Synek: Moderator
Martin Zahumensky: TellStory power user
Ask questions & answer polls
Get beta access to the tools we show
Leave us your feedback
Agenda for today
Databases & GPU intro with QikkDB [45 mins]
• Intro into the deep-tech DB space
• What GPUs are and how they accelerate HPC
Data storytelling with TellStory [45 mins]
• Traditional BI vs. data storytelling
• Explaining Covid-19 by creating a data story
Let’s GO
General intro into the problem and to DBs
Some of our challenges
Real-time visitor reporting over a stream of data
30k per second ~ 2.6 billion per day
e.g. monitoring crowds during an event, targeted marketing
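The daily volume follows directly from the per-second rate. A quick check, assuming the 30k rate is sustained for a full 24-hour day (the slide does not say whether it is a peak or an average, and the name "records" is only illustrative):

```python
# Sanity check of the stream volume quoted above.
# Assumption: 30,000 records per second, sustained for 24 hours.
RECORDS_PER_SECOND = 30_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

records_per_day = RECORDS_PER_SECOND * SECONDS_PER_DAY
print(f"{records_per_day:,} records per day")  # 2,592,000,000, i.e. ~2.6 billion
```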
Some of our challenges
Data science on large datasets
Testing hypotheses and ad-hoc querying when indexing is not predictable
Profiling new datasets
Large flows of commuters above 500 SIM-cards
We were looking for solutions
Tested different technologies (Elastic, ClickHouse, ...); none of them worked very well for us, for various reasons
Came across GPU-accelerated computing
so?
Why not?
Elastic – slow on one node, slow data ingest
Actian Vector – faster, but still not performing well on one node
ClickHouse – much faster, but no geo-spatial capabilities, Linux only
MS-SQL – not fast enough even when tuned
MapD (OmniSci) – considered, but far too expensive
Types of databases
By type of use:
• Transactional
• Batch
• Real-time
• Analytical
• Streaming
By using resources:
• In-memory
• Disk databases
• Hardware accelerated (FPGA, GPU,
Quantum)
By stored data:
• Relational
• Columnar
• Time-series
• Graph
• Document
• Key-value
• ...
The technological edge – Why GPU?
GPUs for HPC (high performance computing)
~10x higher performance in a single hardware unit
Great effectiveness (cheaper computations)
Performance growing exponentially vs. linearly for CPUs
Image processing
Tsunami simulation
DNA analyses
Generic commodity HW
Available in the cloud (AWS, Azure)
Lots of processors for parallel computing
Intel® Xeon® Platinum 8253 has 16 cores
NVIDIA Tesla V100 has 5120 cores and is data center focused
Rediscovery of Columnar Data Storage
Utilizing the computational power of GPUs requires a different approach to storing data. The database architecture best suited to parallel processing is columnar storage. In contrast to conventional relational databases, which store data in a row-based format, columnar databases store each column separately. In the context of parallel processing, GPUs love long vectors of the same data type.
Figure 1 (CPU vs. GPU): GPUs have thousands of arithmetic logic units (ALUs) in one piece of hardware.
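To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. The sample rows, the column names A, B, C and the helper `to_columns` are illustrative assumptions, not qikkDB's storage format.

```python
# Row-based layout: each record keeps its mixed-type fields together.
rows = [
    (1, 5, "Apple"),
    (2, 4, "Grapes"),
    (3, 3, "Orange"),
]


def to_columns(rows, names):
    """Transpose row tuples into one list per column (hypothetical helper)."""
    return {name: list(values) for name, values in zip(names, zip(*rows))}


# Columnar layout: one long vector per column, each with a single data type.
columns = to_columns(rows, ("A", "B", "C"))

# A filter on A only needs to read the vector columns["A"];
# the B and C vectors are never touched.
small_a = [a for a in columns["A"] if a < 3]
print(columns["A"], small_a)
```

The point of the columnar form is that a query touching one column scans one homogeneous vector, which is the shape of work that spreads well over many parallel cores.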
GPUs help to accelerate
compute-intensive use-cases
“1 GPU node replaces up to 54 CPU nodes” (NVIDIA)
New cards to be announced 2020 with approx. 8000 cores & 40% faster
Inserting a GPU into the
machine is not enough
Need to parallelize programs = hard
CUDA programming model by NVIDIA, available since 2007
Algorithms must be embarrassingly parallel
Multi-GPU
How the computation is spread onto cores
SELECT C FROM FRUIT_TABLE WHERE A >= B AND A < 5
Transfer data: CPU RAM to GPU
GPU memory – no transfers GPU to CPU RAM
Parallel execution on cores 1, 2, ..., n
Logical conditions: A >= B, A < 5, final AND mask

GPU CUDA core | A | B | C      | A >= B | A < 5 | Final AND mask | Records meeting the condition | Result after reconstruction
1st           | 1 | 5 | Apple  | 0      | 1     | 0              | -                             | Orange
              | 2 | 4 | Grapes | 0      | 1     | 0              | -                             | Lemon
              | 3 | 3 | Orange | 1      | 1     | 1              | Orange                        | -
2nd           | 4 | 2 | Lemon  | 1      | 1     | 1              | Lemon                         | -
              | 5 | 1 | Banana | 1      | 0     | 0              | -                             | -
nth           | ...
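The mask-based evaluation of this query can be sketched on the CPU in plain Python. This is only an illustration of the idea, using the five fruit rows from the slide; on a GPU each element would be handled by a separate thread, and nothing here reflects qikkDB's actual CUDA kernels.

```python
# Columns of FRUIT_TABLE as shown on the slide.
A = [1, 2, 3, 4, 5]
B = [5, 4, 3, 2, 1]
C = ["Apple", "Grapes", "Orange", "Lemon", "Banana"]

# Each condition is evaluated independently per row, producing a 0/1 mask.
mask_a_ge_b = [int(a >= b) for a, b in zip(A, B)]
mask_a_lt_5 = [int(a < 5) for a in A]

# The WHERE clause is the element-wise AND of the two masks.
final_mask = [x & y for x, y in zip(mask_a_ge_b, mask_a_lt_5)]

# "Reconstruction": compact the selected values of C into a dense result.
result = [c for c, keep in zip(C, final_mask) if keep]
print(final_mask, result)
```

Because every row is evaluated without looking at any other row, the work splits cleanly across cores; only the final compaction step needs coordination between them.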
Where is Spatio-temporal different?
Polygon Operations
Crucial requirements for the
database system
Fast insert
Fast processing
Scalability & high availability
Limit pre-aggregations
Standardized access and common syntax
Deep-tech based on real science
Data processing on the GPU is written in CUDA 10 (direct commands to the HW at the single-core level)
The database core is written in the low-level language C++17 (memory management, control of instructions…)
Libraries for specific modules (networking, building, parsing…)
Google Protocol Buffers
Created in cooperation with top talents from the Slovak Technical University
What is qikkDB for?
Filtering and aggregations over a single huge flat table
Spatio-temporal data processing
Complex polygon operations (contains, intersect, union)
Numeric and datetime data
Incremental data that grows over time
Network utilization & analysis, Risk scoring, Dynamic pricing, Real-time Analytics, Hypothesis verification, Profiling of big data, Machine learning, etc.
Logs
Polygons
IoT
GPS
Network
Events
Automotive
Maps
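As an illustration of the polygon "contains" operation listed above, here is a minimal ray-casting sketch in Python. The function name `polygon_contains` and the sample square are hypothetical; the slides do not say which algorithm qikkDB uses, and points exactly on the boundary are not handled specially here.

```python
def polygon_contains(polygon, point):
    """Ray-casting test: is `point` inside `polygon` (a list of (x, y) vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point; odd means inside.
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(polygon_contains(square, (2, 2)), polygon_contains(square, (5, 2)))
```

Each point is tested independently of all others, so checking millions of GPS points against one polygon is the same kind of per-row work as a simple filter.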
So how fast is it?
1.2B data rows in 7 columns
Average execution time was obtained from 200 query runs
Biggest dataset tested was 400 GB, limited by memory; bigger datasets can be cached from disk (benchmarks to come soon)
Execution Times Results
Compared to Other DBs (results in ms)
1. QikkDB
2. GiraffeDB: leading GPU database
3. CatDB: leading columnar database
4. RacoonDB: tuned leading relational database
GPU machine (p3.8xlarge): 4x Tesla V100
CPU machine (c5d.9xlarge): 36 CPU cores
We use codenames for well-known databases because, for legal reasons, we can’t tell you who these slow guys are.

Query | qikkDB @ p3.8xl | qikkDB @ g4dn.12xl | GiraffeDB @ p3.8xl | CatDB @ c5d.9xl | RacoonDB @ c5d.9xl | Elastic (tuned) | Spark 21x m3xl | Spark i3.8xl
#1    | 22              | 37                 | 25                 | 435             | 22                 | 810             | 2362           | 22000
#2    | 37              | 82                 | 235                | 1061            | 964                | 1818            | 3559           | 25000
#3    | 228             | 925                | 231                | 1630            | 3491               | n/a             | 4019           | 27000
#4    | 283             | 1105               | 417                | 2174            | 3996               | n/a             | 20412          | 65000
Avg   | 143             | 537                | 227                | 1325            | 2118               | n/a             | 7588           | 34750

10 to 100x quicker
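The speed-up factors can be derived from the "Avg" row of the table. The sketch below only divides the published averages by the qikkDB @ p3.8xl average; it compares systems that ran on different hardware, so the ratios are indicative rather than like-for-like. Elastic is left out because its average is n/a.

```python
# Average execution times in ms, copied from the "Avg" row of the table.
avg_ms = {
    "qikkDB @ p3.8xl": 143,
    "qikkDB @ g4dn.12xl": 537,
    "GiraffeDB @ p3.8xl": 227,
    "CatDB @ c5d.9xl": 1325,
    "RacoonDB @ c5d.9xl": 2118,
    "Spark 21x m3xl": 7588,
    "Spark i3.8xl": 34750,
}

BASELINE = "qikkDB @ p3.8xl"

# How many times slower each system is than the baseline.
speedup = {
    name: ms / avg_ms[BASELINE]
    for name, ms in avg_ms.items()
    if name != BASELINE
}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```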
The blazing speed
Same HW, 1.2bn data points, 2 databases
www.tellstory.cloud
Both running on AWS g4dn.12xlarge (48 vCPU, 192 GB RAM, 4x Tesla T4 GPU)
Deployed beta platform with data exploration front-end
QikkDB demo on
smart meter data
What's going on in the background?
Data storage & flow
Persisted data on disk (compressed)
Pre-loaded data in RAM
Relevant columns go to the GPU (PCI-E)
Data in GPU RAM (decompressed)
Filters & aggregations (CUDA kernels)
Result set
When new data is inserted, a column is created automatically ~ “schema-less”, good for IoT and similar
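The "schema-less" behaviour described above, where a column is created automatically on insert, can be sketched as a toy append-only column store. The class `ColumnTable` and the smart-meter field names are hypothetical and only illustrate the idea; they are not qikkDB's API.

```python
class ColumnTable:
    """Toy append-only column store that creates columns on first sight."""

    def __init__(self):
        self.columns = {}   # column name -> list of values
        self.row_count = 0

    def insert(self, record):
        # Create any column not seen before, back-filled with None
        # for the rows that were inserted earlier.
        for name in record:
            if name not in self.columns:
                self.columns[name] = [None] * self.row_count
        # Append one value per column; fields missing from the record become None.
        for name, values in self.columns.items():
            values.append(record.get(name))
        self.row_count += 1


table = ColumnTable()
table.insert({"meter_id": 1, "kwh": 0.42})
# A new field appears mid-stream; no schema change is needed beforehand.
table.insert({"meter_id": 2, "kwh": 0.40, "voltage": 229.8})
print(table.columns)
```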
How can it scale?
Multi-GPU (vertical) scalability on a single node (up to 8 GPUs)
• Accelerating computations
• Enabling multiple sessions
Multi-level caching
• GPU RAM cache
• CPU RAM cache
On roadmap
• Multi-node (horizontal) scalability
• High availability
• Lazy data loading
Not limited by data size ~ best performance when the data fits in GPU memory, but data can be loaded from disk on demand
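The multi-level caching idea (GPU RAM first, then CPU RAM, then disk) can be sketched as a two-tier cache in front of a slow store. Everything here is an assumption made for illustration: the class names, the least-recently-used eviction policy and the dictionary standing in for the disk are not taken from qikkDB.

```python
from collections import OrderedDict


class LruCache:
    """Small least-recently-used cache (assumed eviction policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used


class TieredColumnStore:
    """Looks a column up in 'GPU RAM', then 'CPU RAM', then on 'disk'."""

    def __init__(self, disk, gpu_slots, ram_slots):
        self.disk = disk
        self.gpu = LruCache(gpu_slots)
        self.ram = LruCache(ram_slots)
        self.loads = []  # trace of which tier served each read

    def read(self, column):
        data = self.gpu.get(column)
        if data is not None:
            self.loads.append("gpu")
            return data
        data = self.ram.get(column)
        if data is not None:
            self.loads.append("ram")
        else:
            data = self.disk[column]
            self.loads.append("disk")
            self.ram.put(column, data)
        self.gpu.put(column, data)  # promote to the fastest tier
        return data


store = TieredColumnStore({"a": [1, 2], "b": [3, 4]}, gpu_slots=1, ram_slots=2)
for name in ("a", "a", "b", "a"):
    store.read(name)
print(store.loads)
```

With room for only one column in the simulated GPU tier, reading "b" evicts "a" from it, so the final read of "a" is served from the CPU RAM tier rather than from disk.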
Why not just index?
Traditional databases use indexing for faster processing, which results in slow inserts
qikkDB does not need indexes (but they are available anyway)
Data are just appended
GPU takes care of fast processing
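The trade-off between an indexed table and an append-only table can be sketched as follows. Both classes are hypothetical: the indexed variant maintains a sorted index on every insert, while the append-only variant just appends and answers queries with a full scan, which stands in here for the parallel GPU scan.

```python
import bisect


class IndexedTable:
    """Every insert also updates a sorted index of (value, row_id) pairs."""

    def __init__(self):
        self.rows = []
        self.index = []

    def insert(self, value):
        self.rows.append(value)
        # Keeping the index sorted costs extra work on each insert.
        bisect.insort(self.index, (value, len(self.rows) - 1))

    def find_less_than(self, limit):
        # Binary search for the first entry whose value is >= limit.
        end = bisect.bisect_left(self.index, (limit,))
        return sorted(row_id for _, row_id in self.index[:end])


class AppendOnlyTable:
    """Inserts only append; queries scan every row."""

    def __init__(self):
        self.rows = []

    def insert(self, value):
        self.rows.append(value)

    def find_less_than(self, limit):
        # Full scan; each row is checked independently of the others.
        return [i for i, v in enumerate(self.rows) if v < limit]


indexed, appended = IndexedTable(), AppendOnlyTable()
for value in (7, 2, 9, 4):
    indexed.insert(value)
    appended.insert(value)
print(indexed.find_less_than(5), appended.find_less_than(5))
```

Both tables return the same row ids; the difference lies in where the cost is paid, at insert time for the index or at query time for the scan.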
Integration with your environment on standards you know
Kafka connector
ODBC/JDBC
Adapters
C#, Java, Python
Streaming data
Visualization tools (PowerBI…)
Custom applications, data analysis…
Speed up your BI tools, applications or use TellStory for fast analysis
TellStory
Exploration & analysis FE
Data storytelling with real-time data exploration
Expensive hardware?
GPU on AWS: 12 USD/hour
GPU HW: ~50k USD
QikkDB can handle queries in a fraction of the time traditional databases need, so you can do more with your hardware allocation in the same time.
It also means that the same amount of work needs a lot less hardware, which saves costs.
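A rough feel for the cost per query can be worked out from figures already in the deck. This is a back-of-the-envelope sketch under loud assumptions: that the 12 USD/hour price applies to the instance used for the 143 ms qikkDB average, that queries run strictly one after another, and that only busy time is billed. None of these assumptions is stated in the slides.

```python
PRICE_USD_PER_HOUR = 12.0  # "GPU AWS 12USD/hour" from the slide
AVG_QUERY_MS = 143         # qikkDB @ p3.8xl average from the benchmark table
QUERIES = 10_000

# Assumption: sequential execution, billing only for time spent on queries.
busy_seconds = QUERIES * AVG_QUERY_MS / 1000
busy_hours = busy_seconds / 3600
cost_usd = busy_hours * PRICE_USD_PER_HOUR

print(f"{busy_seconds:.0f} s busy, about {cost_usd:.2f} USD per {QUERIES:,} queries")
```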
“1 GPU node replaces up to 54 CPU nodes” (NVIDIA)
In short: Interactive analytics on massive data sets
GPU acceleration
• Billions of data points in milliseconds
Great for spatio-temporal data
• Finding & understanding links between data points in space & time
Standard SQL syntax
• Easy to start using & integrate into the data science environment
Efficiency & speed
• GPUs are becoming commodity HW and, thanks to their efficiency, cost per 10k queries is on par with CPU approaches
GPU columnar DB
Real-time queries in milliseconds
API, ODBC, JDBC, connects to everything
SQL standard
Spatio-temporal data processing
Cloud or on-prem
Databases & GPU intro with QikkDB
Intro into the deep-tech DB space
What GPUs are and how they accelerate HPC
Data storytelling with TellStory
Traditional BI vs. data storytelling
Explaining Covid-19 by creating a data story
Live stories and fast data
TellStory Roadmap
Q&A
Part 2!
Let’s
GO
Martin is the ex-CEO of Instarea, now working at Ataccama as Head of Product Strategy
Martin created the https://qikk.ly/covid19 story and will walk you through how he did it
Story telling
A story is about being visual
Interpreted data, easy to understand, with new facts brought to the reader
And once they have the story, they can start to sell it to other parties
Cool visualization
Plugins & animations
Newspaper-like reading & interactive
Interesting facts
Creating the Covid-19 story Live
When you want the story to be live, the data must be live too, and when you work with data sets of billions of records you need a fast database
LIVE story
TellStory Roadmap
Beta release JUNE
Find interesting facts
Minute-by-minute updates (be notified when something interesting happens)
Animated visualizations (timeline charts, maps)
Share as video (Instagram upload, YouTube livestream)
Google Sheets integration
Auto-update data (scheduled refresh)
Embed sections (embedding only parts of a story will be possible)
More features to come in Phase 2; "Let AI create your Story" is in progress
Value proposition for Adastra services with these tools
Quick pilots for hands-on experience
• GPU data acceleration: 2-month pilot to deliver real-time processing of vast streaming data (e.g. 5G, smart meters, transactions)
• Data storytelling: 1-month pilot to provide customers with live & interactive intelligence and insights
• Data storytelling: 1-month pilot to give management the minute-by-minute data they need
Q&A
Check out www.qikk.ly and www.tellstory.ai
Useful links
More info
• https://qikk.ly – product website with basic information
• https://qikk.ly/downloads/qikkDB_white_paper.pdf – White paper
• https://docs.qikk.ly/ – Documentation & installation instructions
• https://support.qikk.ly/ – Issue & feature reporting portal
• https://tellstory.cloud – Front-end for data visualization, SQL console on AWS
• https://tellstory.ai – Find out more about TellStory

Fast data in times of crisis with GPU accelerated database QikkDB | Business Breakfast | 23.4.2020

  • 1.
  • 2. Adastra Group Our Solution Portfolio Webinar: Fast data in times of crisis with the help of GPU2 One Focus: Data & Digitalization Advanced Analytics (Big) Data Engineering Data Governance Cloud Services Machine Learning & AI Digital Transformation
  • 3. ADASTRA Group Adastra introduction 3 Adastra Group International consulting company that creates functional solutions in various sectors, facilitating the transition to the digital era. Cutting-edge software for data quality management, Master Data Management, and data governance. Solutions to complex business problems in risk management, sales, and process optimization. Specialist in mobile app development. Full-service creative agency based on a strong technological background. Recruitment for banks, financial institutions, telecoms and insurance companies, and many others, including Adastra. Artificial intelligence, machine learning and optimization services. Big data monetization solutions. Webinar: Fast data in times of crisis with the help of GPU
  • 4. Adastra Group Technical & other details Webinar: Fast data in times of crisis with the help of GPU4 The panel Matej Misik QikkDB & TellStory product owner Ask questions & answer polls Get beta access to the tools we show Leave us with feedback Tomas Synek Moderator Martin Zahumensky TellStory power user
  • 5. Data bases & GPU intro with QikkDB [45mins] Intro into the deep-tech DB space What are GPUs and how they accelerate HPC Data story telling with TellStory [45mins] Traditional BI vs. data story telling Explaining Covid19 by creating a data story Agenda for today Let’s GO
  • 6. General intro into the problem and to DBs
  • 7. Some of our challenges Real-time visitors reporting over stream of data 30k per second ~ 2.6 billion per day e.g. monitoring crowd during an event, targeted marketing
  • 8. Some of our challenges Data science on large datasets Testing hypotheses and ad-hoc querying when indexing is not predictable Profiling new datasets Large flows of commuters above 500 SIM-cards
  • 9. We were looking for solutions Tested different technologies Elastic, ClickHouse... not working for us very well for various reasons Came across GPU accelerated computing so? Why not? Elastic – slow on one node, slow data ingest Actian Vector – faster, but still not performing well on one node Clickhouse – much faster, no geo-spatial capabilities, only for linux MS-SQL – even when tuned not fast enough MapD (Omnisci) – considered but far too expensive
  • 10. Types of databases By type of use: • Transactional • Batch • Real-time • Analytical • Streaming By using resources: • In-memory • Disk databases • Hardware accelerated (FPGA, GPU, Quantum) Relational Columnar Time-seriesGraph DocumentKey-value By stored data: ...
  • 11. The technological edge – Why GPU? GPUs for HPC (high performance computing) ~10x higher performance in single hardware unit Great effectiveness (cheaper computations) Power growing exponentially vs linear CPU Image processing Tsunami simulation DNA analyses Generic commodi ty HW Available in Cloud AWS, Azure
  • 12. Lot of processors for parallel computing Intel® Xeon® Platinum 8253 has 16 cores NVDIA Tesla V100 has 5120 cores and is data center focused Rediscovery of Columnar Data Storage Utilizing GPUs computation power requires different approach to storing data. The most suitable database architecture that works well with parallel processing is columnar storage. In contrast to conventional relational databases which store data in row-based format, columnar databases store data in separate columns. In context of parallel processing, GPUs love long vectors of the same data type FIgure 1: GPUs have thousands of arithmetic logic units (ALUs) in one piece of hardware. CPU GPU GPUs help to accelerate compute-intensive use-cases “1 GPU node replaces up to 54 CPU nodes” (NVIDIA) New cards to be announced 2020 with approx. 8000 cores & 40% faster
  • 13. Inserting a GPU into the machine is not enough Need to parallelize programs = hard CUDA programming model since 2007 by Nvidia Algorithms must be Embarrassingly parallel
  • 14. Multi-GPU How the computation is spread onto cores GPU CUDA core A B C Logical conditions Records meeting the condition Result after reconstruction A>= B A < 5 Final AND mask 1st 1 5 Apple 0 1 0 - Orange 2 4 Grapes 0 1 0 - Lemon 3 3 Orange 1 1 1 Orange - 2nd 4 2 Lemon 1 1 1 Lemon - 5 1 Banana 1 0 0 - - nth ... Transfer data CPU RAM to GPU GPU memory – no transfers GPU to CPU RAM SELECT C FROM FRUIT_TABLE WHERE A >= B AND A < 5 Parallel execution 1 2 n 1 n
  • 15. Where is Spatio-temporal different? Polygon Operations
  • 16. Crucial requirements for the database system: fast insert; fast processing; scalability & high availability; limited pre-aggregations; standardized access and common syntax.
  • 17. Deep tech based on real science: Google Protocol Buffers; data processing on the GPU is written in CUDA 10 (direct commands to the hardware at single-core level); the database core is written in the low-level language C++17 (memory management, control of instructions…); libraries for specific modules (networking, building, parsing…); created in cooperation with top talents from the Slovak Technical University.
  • 18. What is qikkDB for? Filtering and aggregations over a single huge flat table; spatio-temporal data processing; complex polygon operations (contains, intersect, union); numeric and datetime data; incremental data that grow over time. Network utilization & analysis, risk scoring, dynamic pricing, real-time analytics, hypothesis verification, profiling of big data, machine learning, etc. Data: Logs, Polygons, IoT, GPS, Network Events, Automotive, Maps.
  • 19. So how fast is it? Tested on 1.2B data rows in 7 columns. The average execution time is based on 200 query runs. The biggest datasets tested were 400 GB, limited by memory; bigger datasets can be cached from disk. Benchmarks to come soon.
  • 20. Execution times: results. Databases compared: 1. qikkDB; 2. GiraffeDB (leading GPU database); 3. CatDB (leading columnar database); 4. RacoonDB (tuned leading relational database). We use codenames for well-known databases because, for legal reasons, we can’t tell you who these slow guys are. GPU machine: p3.8xlarge, 4x Tesla V100. CPU machine: c5d.9xlarge, 36 CPU cores. Compared to other DBs (results in ms):
    Query | qikkDB @ p3.8xl | qikkDB @ g4dn.12xl | GiraffeDB @ p3.8xl | CatDB @ c5d.9xl | RacoonDB @ c5d.9xl | Elastic (tuned) | Spark 21x m3xl | Spark i3.8xl
    #1 | 22 | 37 | 25 | 435 | 22 | 810 | 2362 | 22000
    #2 | 37 | 82 | 235 | 1061 | 964 | 1818 | 3559 | 25000
    #3 | 228 | 925 | 231 | 1630 | 3491 | n/a | 4019 | 27000
    #4 | 283 | 1105 | 417 | 2174 | 3996 | n/a | 20412 | 65000
    Avg | 143 | 537 | 227 | 1325 | 2118 | n/a | 7588 | 34750
    10 to 100x quicker.
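As a rough check of the arithmetic behind this slide, the sketch below recomputes the averages and the ratio of each system's average to qikkDB @ p3.8xl. The per-query numbers are copied from the slide; the names `times_ms`, `averages` and `speedup` are illustrative, and Elastic is left out because two of its queries are n/a. qikkDB @ g4dn.12xl is omitted only for brevity.

```python
# Per-query execution times in ms (queries #1 to #4), copied from the slide.
times_ms = {
    "qikkDB @ p3.8xl": [22, 37, 228, 283],
    "GiraffeDB @ p3.8xl": [25, 235, 231, 417],
    "CatDB @ c5d.9xl": [435, 1061, 1630, 2174],
    "RacoonDB @ c5d.9xl": [22, 964, 3491, 3996],
    "Spark 21x m3xl": [2362, 3559, 4019, 20412],
    "Spark i3.8xl": [22000, 25000, 27000, 65000],
}

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

averages = {name: mean(v) for name, v in times_ms.items()}

# Ratio of each system's average time to the qikkDB average on p3.8xlarge.
baseline = averages["qikkDB @ p3.8xl"]
speedup = {name: avg / baseline for name, avg in averages.items()}
```

The recomputed averages agree with the slide's Avg row to within rounding. The ratio depends strongly on the query: RacoonDB, for example, matches qikkDB on query #1 but is more than ten times slower on queries #2 to #4.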
  • 21. The blazing speed: same HW, 1.2bn data points, 2 databases. Both running on AWS g4dn.12xlarge (48 vCPU, 192 GB RAM, 4x Tesla T4 GPU). Deployed beta platform with data exploration front-end: www.tellstory.cloud
  • 22. QikkDB demo on smart meter data
  • 23. What’s going on in the background? Data storage & flow: persisted data on disk (compressed) → pre-loaded data in RAM → relevant columns go to the GPU over PCI-E → data in GPU RAM (decompressed) → filters & aggregations in CUDA kernels → result set. When new data are inserted, a column is automatically created (~ “schema-less”), which is good for IoT and similar use cases.
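A toy sketch of the “schema-less” behaviour mentioned on this slide, assuming it means that an insert containing an unknown field creates a new column on the fly. The class `ColumnStore`, its methods and the smart-meter field names are hypothetical and do not reflect qikkDB's actual interface.

```python
class ColumnStore:
    """Toy append-only column store; NOT qikkDB's actual implementation."""

    def __init__(self):
        self.columns = {}   # column name -> list of values
        self.row_count = 0

    def insert(self, record):
        # Create any column seen for the first time and back-fill it with
        # None so that all columns keep the same length.
        for name in record:
            if name not in self.columns:
                self.columns[name] = [None] * self.row_count
        # Append one value per known column; missing fields become None.
        for name, values in self.columns.items():
            values.append(record.get(name))
        self.row_count += 1

    def select(self, names):
        """Return only the requested columns, as a GPU transfer would need."""
        return {name: self.columns[name] for name in names}

store = ColumnStore()
store.insert({"meter_id": 1, "kwh": 0.42})
store.insert({"meter_id": 2, "kwh": 0.37, "voltage": 229.8})  # new column
relevant = store.select(["kwh"])
```

The `select` call mirrors the “relevant columns go to the GPU” step: only the columns a query touches would need to cross the PCI-E bus.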
  • 24. How can it scale? Multi-GPU (vertical) scalability on a single node (up to 8 GPUs): accelerating computations; enabling multiple sessions. Multi-level caching: GPU RAM cache; CPU RAM cache. On the roadmap: multi-node (horizontal) scalability; high availability; lazy data loading. Not limited by data size: performance is best when the data fit in GPU memory, but data can be loaded from disk on demand.
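One way to picture the multi-level caching on this slide is a two-tier least-recently-used cache. The sketch below is an assumption about the general idea only: the class `TwoLevelCache`, the slot counts and the eviction policy are illustrative and say nothing about how qikkDB manages its GPU and CPU caches.

```python
from collections import OrderedDict

class TwoLevelCache:
    """Toy LRU cache with a small fast tier and a larger slow tier."""

    def __init__(self, gpu_slots, cpu_slots, load_from_disk):
        self.gpu = OrderedDict()   # stands in for the GPU RAM cache
        self.cpu = OrderedDict()   # stands in for the CPU RAM cache
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.load_from_disk = load_from_disk
        self.disk_reads = 0

    def get(self, column):
        if column in self.gpu:                 # fastest path
            self.gpu.move_to_end(column)
            return self.gpu[column]
        if column in self.cpu:                 # promote from the CPU tier
            data = self.cpu.pop(column)
        else:                                  # load from disk on demand
            data = self.load_from_disk(column)
            self.disk_reads += 1
        self._put_gpu(column, data)
        return data

    def _put_gpu(self, column, data):
        self.gpu[column] = data
        if len(self.gpu) > self.gpu_slots:
            # Demote the least recently used column to the CPU tier.
            old_name, old_data = self.gpu.popitem(last=False)
            self.cpu[old_name] = old_data
            if len(self.cpu) > self.cpu_slots:
                self.cpu.popitem(last=False)   # dropped; still on disk

cache = TwoLevelCache(gpu_slots=1, cpu_slots=1,
                      load_from_disk=lambda name: [name] * 3)
cache.get("a")   # disk read; "a" now sits in the GPU tier
cache.get("b")   # disk read; "a" is demoted to the CPU tier
cache.get("a")   # served from the CPU tier; "b" is demoted
```

In this run the third lookup avoids a disk read because the column was still held in the slower tier.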
  • 25. Why not just index? Traditional databases use indexing for faster processing, which results in slow inserts. qikkDB does not need indexes (but they are available anyway): data are just appended, and the GPU takes care of fast processing.
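The trade-off on this slide can be illustrated with a small Python sketch: an indexed ingest keeps a sorted structure up to date on every insert, while an append-only ingest does no extra work and relies on a full scan at query time. The sorted list standing in for an index and the function names are illustrative assumptions, not a model of any particular database.

```python
import bisect

values = [42, 7, 19, 73, 7]

# Indexed ingest: every insert must find its place in the sorted index
# and shift later entries to make room.
index = []
for row_id, v in enumerate(values):
    bisect.insort(index, (v, row_id))

# Append-only ingest: each insert is a plain append at the end.
column = []
for v in values:
    column.append(v)

def lookup_indexed(idx, key):
    """Binary search in the sorted index; returns matching row ids."""
    lo = bisect.bisect_left(idx, (key, -1))
    hi = bisect.bisect_right(idx, (key, float("inf")))
    return [row_id for _, row_id in idx[lo:hi]]

def lookup_scan(col, key):
    """Full scan; every comparison is independent, hence parallelizable."""
    return [row_id for row_id, v in enumerate(col) if v == key]
```

Both lookups return the same rows. The scan does more comparisons, but each one is independent of the others, which is the kind of work the slide hands to the GPU.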
  • 26. Integration with your environment on standards you know: Kafka connector (streaming data); ODBC/JDBC (visualization tools such as PowerBI…); adapters for C#, Java, Python (custom applications, data analysis…); TellStory exploration & analysis front-end (data storytelling with real-time data exploration). Speed up your BI tools and applications, or use TellStory for fast analysis.
  • 27. Expensive hardware? GPU on AWS: 12 USD/hour; GPU HW: ~50k USD. QikkDB can handle queries in a fraction of the time of traditional databases, so you can do more with your hardware allocation in the same time. It also means that the same amount of work needs a lot less hardware, which saves costs. “1 GPU node replaces up to 54 CPU nodes” (NVIDIA)
  • 28. In short: interactive analytics on massive data sets. GPU acceleration: billions of data points in milliseconds. Great for spatio-temporal data: finding & understanding links between data points in space & time. Standard SQL syntax: easy to start using & integrate into the data science environment. Efficiency & speed: GPUs are becoming commodity HW and, thanks to their efficiency, the cost per 10k queries is on par with CPU approaches. GPU columnar DB; real-time queries in millisecs; API, ODBC, JDBC, connects to everything; SQL standard; spatio-temporal data processing; cloud or on-prem.
  • 29. Part 2! Databases & GPU intro with QikkDB: intro into the deep-tech DB space; what GPUs are and how they accelerate HPC. Data storytelling with TellStory: traditional BI vs. data storytelling; explaining Covid19 by creating a data story. Live stories and fast data. TellStory roadmap. Q&A. Let’s GO
  • 30. Martin is the ex-CEO of Instarea, now working at Ataccama as Head of Product Strategy. Martin created the https://qikk.ly/covid19 story and will lead you through how he did it.
  • 31. Storytelling: interpreted data, easy to understand, with new facts brought to the reader. Once readers have the story, they can start to sell it to other parties. (Animated video playing.)
  • 32. A story is about being visual: cool visualizations; plugins & animations; newspaper-like, interactive reading; interesting facts.
  • 34. LIVE story: when you want the story to be live, you must have live data, and when you work with big data sets of billions of records, you need a fast database. (Animated video playing.)
  • 35. TellStory Roadmap. Beta release: JUNE. Find interesting facts; minute-by-minute updates (be notified when something interesting happens); animated visualizations (timeline charts, maps); share as video (Instagram upload, YouTube livestream); Google Sheets integration; auto-update data (scheduled refresh); embed sections (embedding only parts of a story will be possible). More features to come in Phase 2; “Let AI create your Story” is in progress.
  • 36. Value proposition for Adastra services with these tools: quick pilots for hands-on experience. GPU data acceleration: 2-month pilot to deliver real-time processing of vast streaming data (e.g. 5G, smart meters, transactions). Data storytelling: 1-month pilot to provide customers with live & interactive intelligence and insights. Data storytelling: 1-month pilot to give management the minute-by-minute data they need.
  • 38. Useful links. More info: https://qikk.ly – product web with basic information; https://qikk.ly/downloads/qikkDB_white_paper.pdf – white paper; https://docs.qikk.ly/ – documentation & installation instructions; https://support.qikk.ly/ – issues & features reporting portal; https://tellstory.cloud – front-end for data visualization, SQL console on AWS; https://tellstory.ai – find out more about TellStory.