High Performance Content-Based Matching Using GPUs. Alessandro Margara and Gianpaolo Cugola (margara@elet.polimi.it, cugola@elet.polimi.it), Dip. Elettronica e Informazione (DEI), Politecnico di Milano.
The Problem: Content-Based Matching. Publishers publish events to a content-based matching engine, which forwards them to the interested subscribers. Each subscriber registers a predicate, i.e. a set of filters such as (Smoke=true and Room="Kitchen") or (Light>30 and Room="Bedroom"); each elementary condition (e.g. Light>30) is a constraint. An event is a set of attributes, e.g. Light=50, Room="Bedroom", Sender="Sensor1".
Programming GPUs: CUDA. Introduced by Nvidia in 2006, CUDA is a general-purpose parallel computing architecture with a new instruction set and a new programming model, programmable using high-level languages such as CUDA C (a C dialect).
Programming Model: Basics. The device (GPU) acts as a coprocessor for the host (CPU) and has its own separate memory space. Input data must be copied from main memory to GPU memory before starting a computation, and results must be copied back to main memory when the computation finishes. These copies are often the most expensive operations: they move data across the PCI-Express bus, paying for bandwidth but also latency, and they require data structures to be serialized, which means the data structures must be kept simple.
Typical Workflow: allocate memory on the device; serialize and copy data to the device; execute one or more kernels on the device; wait for the device to finish processing; copy results back (a minimal example follows).
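To make the workflow concrete, here is a minimal host-side sketch using the CUDA runtime API. The kernel name, data, and sizes are purely illustrative (they are not part of the matching engine); the point is the allocate / copy / launch / synchronize / copy-back skeleton.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical kernel, used only to illustrate the workflow.
__global__ void scale(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * in[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);
    int *h_in = (int *)malloc(bytes), *h_out = (int *)malloc(bytes);
    for (int i = 0; i < n; i++) h_in[i] = i;                  // host-side ("serialized") data

    int *d_in, *d_out;
    cudaMalloc(&d_in, bytes);                                 // 1. allocate memory on the device
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);    // 2. copy data to the device

    scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n);          // 3. execute the kernel
    cudaDeviceSynchronize();                                  // 4. wait for the device to finish

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);  // 5. copy results back
    printf("out[7] = %d\n", h_out[7]);

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
    return 0;
}
```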
Programming Model: Fundamentals. CUDA follows a Single Program Multiple Threads strategy: a single kernel (function) is executed by many threads in parallel. Threads are organized in blocks; threads within different blocks operate independently, while threads within the same block cooperate to solve a single sub-problem. The runtime provides the blockIdx and threadIdx variables to uniquely identify each running thread; reading these variables is the only way to differentiate the work done by different threads.
Programming Model: Memory Management. Memory is organized hierarchically. All threads have access to the same global memory: large (512MB-6GB) but slow (DRAM), it stores the information received from the host and persists across kernel calls. Threads within a block coordinate through shared memory: implemented on-chip, it is fast but limited (16-48KB). Each thread also has its own local memory. Shared memory is the only "cache" available, and since there is no hardware or system support for it, it must be explicitly controlled by the application code (see the sketch below).
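A minimal sketch of the three memory spaces inside a single kernel: global memory holds the data received from the host, shared memory lets the threads of one block cooperate, and plain automatic variables live in per-thread registers/local memory. The per-block partial sum is an illustrative example, not part of the matching algorithm.

```cuda
#define BLOCK_SIZE 256

__global__ void blockSum(const float *in,          // global memory, filled by the host
                         float *partial, int n) {  // one partial result per block
    __shared__ float buf[BLOCK_SIZE];              // on-chip shared memory, one copy per block
    int tid = threadIdx.x;                         // automatic variable: register/local memory
    int i   = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? in[i] : 0.0f;             // explicit "caching" of global data
    __syncthreads();                               // block-wide barrier before reusing buf

    // Tree reduction performed cooperatively by the threads of the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = buf[0];
}
```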
More on Memory Management. Without hardware-managed caches, accesses to global memory can easily become a bottleneck. Two issues to consider when designing algorithms and data structures: maximize the use of shared (block-local) memory without exceeding its size, and make threads with contiguous ids access contiguous global memory regions, so that the hardware can combine them into a few wide memory transactions (see the sketch below).
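A small sketch contrasting a coalesced access pattern with a strided one; both kernels are illustrative and unrelated to the matching engine, they only show why contiguous thread ids should touch contiguous addresses.

```cuda
// Coalesced: thread i reads element i, so each warp touches one contiguous
// region that the hardware can serve with a few wide memory transactions.
__global__ void copyCoalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads read addresses `stride` elements apart, so a
// warp spreads over many regions and wastes most of the memory bandwidth.
__global__ void copyStrided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(i * (size_t)stride) % n];
}
```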
Hardware Implementation. The GPU is an array of Streaming Multiprocessors (SMs), each containing many (extremely simple) processing cores. An SM executes threads in groups of 32 called warps, and scheduling is performed in hardware with zero overhead. The architecture is optimized for data-parallel problems: maximum efficiency is reached only if all threads in a warp agree on the execution path (see the sketch below).
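A minimal sketch of warp divergence, assuming the warp size of 32 mentioned above: when threads of the same warp take different branches, the two paths are serialized; when the branch is uniform within each warp, no serialization occurs. Both kernels are illustrative only.

```cuda
// Divergent: within every warp, half the threads take each branch, so the
// warp executes both paths one after the other.
__global__ void divergent(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0) data[i] *= 2.0f;
    else                      data[i] += 1.0f;
}

// Uniform per warp: all 32 threads of a warp agree on the branch, so each
// warp executes a single path at full efficiency.
__global__ void uniformPerWarp(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((threadIdx.x / 32) % 2 == 0) data[i] *= 2.0f;
    else                             data[i] += 1.0f;
}
```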
Some Numbers. NVIDIA GTX 460: 1GB of RAM (global memory), 7 Streaming Multiprocessors, each with 48 cores and managing up to 48 warps (32 threads each). That is up to 10752 threads managed concurrently and up to 336 threads running concurrently, in today's cheap GPU: less than $160.
Existing Algorithms. Two main approaches exist: counting algorithms and tree-based algorithms. Both rely on complex data structures designed to optimize sequential execution (trees, maps, ... lots of pointers!), which hardly fit the data-parallel programming model.
Algorithm Description. Counting example: filters F1 (A>10 and B=20) and F2 (B>15 and C<30) belong to subscriber S1, filter F3 (D=20) to S2. For the event A=12, B=20, the algorithm counts the satisfied constraints of each filter: 2 for F1, 1 for F2, 0 for F3. Since F1's count equals its number of constraints, F1 matches and the event is forwarded to S1.
Algorithm Description. Constraints with the same attribute name are stored in a single array on the GPU, i.e. in contiguous memory regions. When processing an event E, the CPU selects the relevant constraint arrays based on the names of the attributes in E (a possible layout is sketched below).
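A possible flat, pointer-free layout matching the description above: all constraints on the same attribute name live in one contiguous array that can be shipped to the device with a single copy. The field names and the operator encoding are assumptions made for illustration, not the exact structures used in the paper.

```cuda
#include <stdint.h>

// Constraint operators encoded as plain integers, so the structures stay
// simple (no pointers) and can be serialized with a single cudaMemcpy.
enum Op { OP_EQ = 0, OP_LT = 1, OP_GT = 2 };

// One constraint: operator, reference value, and the filter it belongs to.
struct Constraint {
    uint8_t  op;
    int32_t  value;
    uint32_t filterId;   // index into the per-filter counter array
};

// All constraints referring to the same attribute name, stored contiguously
// in GPU global memory. The host keeps a (name -> ConstraintArray) map and,
// for each event, selects only the arrays whose name appears in the event.
struct ConstraintArray {
    Constraint *d_constraints;   // device pointer
    int         count;
};
```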
Algorithm Description. Threads are organized bi-dimensionally, with one thread for each attribute/constraint pair of the event (e.g. attributes B=32, C=21, A=7). Threads in the same block evaluate the same attribute, so its value can be copied into shared memory; threads with contiguous ids access contiguous constraints, so their accesses are combined into a few memory-wide operations. Filter counts are updated with an atomic operation (see the kernel sketch below).
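A hedged sketch of the evaluation step described above, written against the illustrative layout of the previous sketch: the y dimension of the grid selects an event attribute, the x dimension indexes its constraints, the attribute value is staged in shared memory, and filter counters are updated with atomicAdd. This is a reconstruction under those assumptions, not the authors' exact kernel.

```cuda
struct EventAttr { int value; };   // value of one event attribute (names are resolved on the host)

// d_arrays[a] points to the constraint array selected for attribute a of the
// current event, d_sizes[a] is its length, d_counts[f] counts the satisfied
// constraints of filter f. Launched with a 2D grid: gridDim.y = #attributes.
__global__ void evalConstraints(const EventAttr *d_attrs,
                                Constraint * const *d_arrays,
                                const int *d_sizes,
                                unsigned int *d_counts) {
    __shared__ int attrValue;                       // attribute evaluated by this block
    int a = blockIdx.y;                             // one row of blocks per event attribute
    if (threadIdx.x == 0) attrValue = d_attrs[a].value;
    __syncthreads();

    int c = blockIdx.x * blockDim.x + threadIdx.x;  // contiguous constraints -> coalesced reads
    if (c >= d_sizes[a]) return;

    Constraint k = d_arrays[a][c];
    bool sat = (k.op == OP_EQ && attrValue == k.value) ||
               (k.op == OP_LT && attrValue <  k.value) ||
               (k.op == OP_GT && attrValue >  k.value);
    if (sat) atomicAdd(&d_counts[k.filterId], 1u);  // one counter per filter
}
```

After the kernel completes, a filter whose counter equals its number of constraints is satisfied and the corresponding interface is selected, as in the counting example above.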
Improvement. Problem: before processing each event, the filter counts and the interface selection vector must be reset. The naïve version uses a memset, but this communication with the GPU introduces additional delay. Solution: keep two copies of the filter counts and of the interface vector; while processing an event, one copy is used and the other is reset for the next event, inside the same kernel, with no communication overhead (sketched below).
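A hedged sketch of the double-buffering trick, built on the illustrative kernel above: while the current copy of the counters is being filled, the first numFilters threads zero the spare copy, so no separate memset or host round-trip is needed; after each event the host simply swaps the two copies. Only the filter counters are shown (the interface vector would be handled the same way), and the buffer handling and names are assumptions.

```cuda
__global__ void evalAndReset(const EventAttr *d_attrs,
                             Constraint * const *d_arrays,
                             const int *d_sizes,
                             unsigned int *countsCur,    // counters used for this event
                             unsigned int *countsNext,   // spare copy, cleared for the next event
                             int numFilters) {
    // Global index of this thread across the whole 2D grid; assumes the grid
    // launches at least numFilters threads (constraints greatly outnumber filters).
    int g = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;
    if (g < numFilters) countsNext[g] = 0;          // reset inside the same kernel: no extra call

    __shared__ int attrValue;
    int a = blockIdx.y;
    if (threadIdx.x == 0) attrValue = d_attrs[a].value;
    __syncthreads();

    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= d_sizes[a]) return;

    Constraint k = d_arrays[a][c];
    bool sat = (k.op == OP_EQ && attrValue == k.value) ||
               (k.op == OP_LT && attrValue <  k.value) ||
               (k.op == OP_GT && attrValue >  k.value);
    if (sat) atomicAdd(&countsCur[k.filterId], 1u);
}
// Host side, after each event, the two buffers swap roles:
//   unsigned int *tmp = d_countsCur; d_countsCur = d_countsNext; d_countsNext = tmp;
```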
Results: Default Scenario. We compare against a state-of-the-art sequential implementation, SFF (Siena) 1.9.4, running on an AMD CPU @ 2.8GHz. The default scenario is relatively "simple": 10 interfaces, 25k filters, 1M constraints; the analysis then changes various parameters. We measure latency, i.e. the processing time for a single event. In the default scenario the measured speedup is about 7x.
Results: Number of Constraints. Measured speedup: about 10x.
Results: Number of Filters. Measured speedup: about 13x.
Results. What is the time needed to install subscriptions? Data structures must be serialized and copied from CPU memory to GPU memory, but they are simple. Memory requirements: 35MB in the default scenario and up to 200MB across all our tests; not a problem for a modern GPU.
Results. We measured the latency when processing a single event: 0.14ms of processing time, which would suggest about 7000 events/s. What about the maximum throughput? We measured 9400 events/s.
Conclusions. The GPU brings benefits in a wide range of scenarios, in particular under the most challenging workloads. An additional advantage is that it leaves the CPU free to perform other tasks, e.g. communication-related ones. The implementation is available for download and includes a translator from Siena subscriptions / messages; more info at http://home.dei.polimi.it/margara
Future Work. We are currently porting the algorithm to multi-core CPUs using OpenMP, and we are testing it within a real system, on both GPUs and multi-core CPUs, to take communication overhead into account and to measure latency and throughput. We also plan to explore the advantages of GPUs for probabilistic (as opposed to exact) matching, using encoded filters (Bloom filters) to balance performance against the percentage of false positives.
Questions?
