Evaluating FPGA-acceleration for Real-time
           Unstructured Search
           Sai Rahul Chalamalasetti†, Martin Margala†, Wim Vanderbauwhede*,
                       Mitch Wright‡, Parthasarathy Ranganathan‡‡
                     †University of Massachusetts Lowell, Lowell, MA
                           *University of Glasgow, Scotland, UK
                              ‡Hewlett Packard, Houston, TX
                          ‡‡Hewlett Packard Labs, Palo Alto, CA




4/9/2012                                                                      1
Outline
  Motivation
  Workload and Algorithm Description
  Hardware Systems
  Synthetic Datasets
  Performance Results
  Alternatives and Future Work
  Conclusion



Motivation
 The era of “big data”
        Explosion in data – particularly unstructured data
            Information doubling every 18 months or faster
            Enterprise server systems processed, delivered over 9 zettabytes in
             2008 (UCSD report)
            Walmart: 1M transactions/hour; LHC: 1 PB/second; YouTube: 48
             hours of video uploaded/minute; Facebook: 100 TB logs/day
        Explosion in data-centric workloads
            Collect, store, access, share, visualize, analyze, interpret, …
            Consumer, Enterprise, Scientific, …
            New applications emerging recently: search, live business
             analytics, social correlation, collaborative filtering, …
            Need better performance for deeper analytics across diverse data




Motivation
  The era of “green computing”
        Power and cooling important constraint for
        servers/datacenters
            Only 4 countries consume more electricity than worldwide
            datacenters; millions of dollars for cloud datacenters
            Thermal density and costs of power delivery and cooling
            infrastructure
        Sustainability a growing concern
            Lifecycle minimization of environmental effects and carbon
            emissions
            Corporate initiatives from HP, Cisco, Dell, Google, IBM, Intel;
            Government initiatives from EPA, DOE, etc.




This Work
 High-performance energy-efficient data-centric architectures
      FPGAs/accelerators a good way to improve energy efficiency
      Accelerated Unstructured Search, mainly data analysis (document filtering &
       profile match)
      GiDEL ProcStar IV board (Four Altera Stratix IV 530 FPGAs)
 Recent developments offer promise
    Better toolkits, and IPs for Host Computer Interfaces to FPGAs, e.g. GiDEL
    Future platforms, e.g. ARM+FPGA on a single die by Altera and Xilinx
    Recent commercial successes, e.g. Fusion-io, Netezza
 What we achieved
      Performance speed up of 23X to 38X
      Energy efficiency improvements of 31X to 40X
      Performance-per-cost improvement of 10X




Choice of Workload
 Wide variety of emerging data-centric workloads
     Operations: collect and store, maintain & manage; retrieve, interpret &
      analyze
 Focus on important emerging class: real-time unstructured
  search
       Searching patent repositories for related work comparison
       Searching emails and share points for enterprise information management
       Detecting spam in incoming emails
       Monitoring communications, e.g. for terrorist activity
       News story topic detection and tracking
       Searching through books, images, and videos for matching profiles




Algorithm Description
 Document model
     Each document modeled as bag of words “D” of pairs (t,f)
         t is a term; f is number of occurrences of t in document D
     Profile “M” is a set of pairs p = (t,w)
         t is a term; w is its weight
         Bayesian algorithm used offline to precompute profile based on user requirements
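
The slide does not show how a document is scored against the profile. A minimal sketch, assuming the score is the sum of f · w over the terms that occur in both the document D and the profile M; the function names and the example weights are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Model a document D as a bag of words: term t -> occurrence count f."""
    return Counter(text.lower().split())


def score(document, profile):
    """Assumed scoring rule: sum f * w over terms present in both D and M."""
    return sum(f * profile[t] for t, f in document.items() if t in profile)


# Hypothetical profile M: term t -> weight w (values are made up).
profile = {"fpga": 2.0, "bloom": 1.5, "filter": 0.5}
doc = bag_of_words("FPGA Bloom filter lookup on an FPGA board")
print(score(doc, profile))
```

A document would then be kept or discarded by comparing this score with a threshold, which the slides do not specify.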




Hardware Platform

 FPGA Board
     GiDEL PROCStar-IV development board
     Internal FPGA Memory of 20Mb
     External Memory for single FPGA
             Bank A: 512 MB (profile and scores)
             Bank B/C: 2 GB each (document stream)
 Application Implementation
     GiDEL External Memory IPs
     Algorithm in VHDL in Altera Quartus
     GiDEL ProcWizard to integrate Algorithm with its IPs




Baseline Systems
  An optimized multi-threaded reference implementation
        Written in C++, compiled with g++ at optimization level -O3
  Different platforms
       System 1 – Intel Core 2 Duo Mobile E8435, 3.06 GHz and 8 GB RAM
       System 2 – Intel Core i7-2600 (4 cores, 8 hardware threads), 3.4 GHz and 16 GB RAM
  The baselines need large memory so that the whole document
   collection fits in RAM
        Reading the data from disk would dominate the run time
        The collection is therefore preloaded in memory




Hardware Algorithm Description
             Profile Storage      Latency/Term
             Ext. Mem (Bank A)    20 clock cycles
             Bloom Filter         1 clock cycle

 The probability that an input term is a
  profile hit is extremely small.
 A Bloom filter is used to discard
  misses.
 How to extract parallelism from the FPGA?
     Parallel term lookup in the Bloom filter
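
The pre-filtering idea above can be sketched in software. This is an illustration only: the hash function (SHA-256 here), the filter size, and the number of hashes are assumptions, not the hardware design, and `lookup_weight` is a hypothetical helper standing in for the slow external-memory lookup.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for p in self._positions(term):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, term):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(term))


def lookup_weight(term, bloom, profile):
    """Consult the slow profile store only when the filter reports a possible hit."""
    if not bloom.might_contain(term):
        return 0.0                    # definite miss: skip the slow lookup
    return profile.get(term, 0.0)     # real hit, or a false positive resolved here


# Hypothetical profile: term -> weight.
profile = {"fpga": 2.0, "bloom": 1.5, "filter": 0.5}
bloom = BloomFilter()
for term in profile:
    bloom.add(term)
```

Because profile hits are rare, almost every term takes the fast path, and only the occasional hit or false positive pays the full lookup latency.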




Hardware Algorithm Description
 Multi-bank Bloom filter: reduces
  congestion for parallel lookups
      Look up eight terms in parallel in the Bloom
       filter
 Individual banks are implemented in
  Altera M9K blocks (hard memory blocks) on
  the FPGA
 The current implementation uses only
  half of the 1280 M9K blocks to map 4 Mb of
  profile
 To decrease/eliminate false positives,
  future Bloom filter designs include
      8 Mb on all the M9K blocks (130 MHz)
      16 Mb profile size on M9K and M144K blocks
       together (100 MHz)
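
To see why more banks reduce congestion, here is a rough estimate under simplifying assumptions that are not taken from the slides: each lookup lands in a uniformly random bank, lookups are independent, and a bank serves one lookup per cycle. The bank counts in the loop are hypothetical.

```python
def no_conflict_probability(num_banks, lookups=8):
    """Probability that `lookups` parallel lookups all fall in distinct banks."""
    p = 1.0
    for i in range(lookups):
        # The i-th lookup must avoid the i banks already taken.
        p *= max(num_banks - i, 0) / num_banks
    return p


for banks in (8, 16, 64, 640):
    print(banks, round(no_conflict_probability(banks), 3))
```

Under this model the chance of a conflict-free cycle rises steadily with the number of banks.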




Implemented Algorithm

                           Utilization   Logic Elements   Memory (MRAMs)
                           Available     424 KLEs         20 Mbits
                           Total         17,562           4
                           Algorithm     4,561 (1%)       4 (22%)




Synthetic Data Sets
  Creating Synthetic Data Sets
        Real-world data is hard to access, e.g. patent collections are governed by
         licenses that restrict their use.
        Synthetic document collections statistically match real-world collections.
  Real-World Document Collections
       Newspaper collection (TREC Aquaint)
       Patent collections from the US Patent Office (USPTO) and the European Patent
        Office (EPO)
       The Lemur [2] Information Retrieval toolkit is used to determine the rank
        frequency of all the terms in the collection

                               Collection   # Docs      Avg. Doc. Len.   Avg. Uniq. Terms
                               Aquaint      1,033,461   437              169
                               USPTO        1,406,200   1,718            353
                               EPO          989,507     3,863            705




[2] www.lemurproject.org
Synthetic Data Sets
 Modeling distributions of terms
        Most natural-language documents follow a Zipfian rank-frequency distribution

        We use Montemurro’s extension to Zipf's law
 Modeling document lengths
        Sampled from a truncated Gaussian
        Verified using a χ² test at 95% confidence
 Synthetic documents of varying lengths
        Each document's terms follow the fitted rank-frequency distribution
        Documents are converted into the standard bag-of-words representation
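
A minimal sketch of such a generator. It uses plain Zipf weights (1/rank^s) as a stand-in for Montemurro's extension, and every parameter except the mean length of 437 (the Aquaint average reported in the collection table) is a made-up placeholder.

```python
import random
from collections import Counter


def zipf_weights(vocab_size, exponent=1.0):
    """Unnormalized Zipf weights: the term of rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]


def truncated_gauss(rng, mean, stddev, low, high):
    """Draw from a Gaussian, rejecting samples outside [low, high]."""
    while True:
        x = rng.gauss(mean, stddev)
        if low <= x <= high:
            return int(round(x))


def synthetic_document(rng, weights, mean_len, stddev_len, min_len, max_len):
    """Return a bag of words (term id -> count) for one synthetic document."""
    length = truncated_gauss(rng, mean_len, stddev_len, min_len, max_len)
    term_ids = rng.choices(range(len(weights)), weights=weights, k=length)
    return Counter(term_ids)


rng = random.Random(0)  # fixed seed so runs are repeatable
weights = zipf_weights(vocab_size=10_000)
doc = synthetic_document(rng, weights, mean_len=437, stddev_len=100,
                         min_len=50, max_len=1000)
print(sum(doc.values()), "terms,", len(doc), "unique")
```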




Experimental Parameters
  The performance of the algorithm on a system depends on
        the size of the collection
           256K documents of 4096 terms (patent collection)
           1M documents of 1024 terms (Aquaint collection)

        the size of the profile
            4K, 16K and 64K terms, similar to those of TREC Aquaint and EPO

  Profile Types
        “Random”: selecting random documents from the
         collection until the desired profile size is reached; hit probability 10^-5
        “Selected”: selecting terms that occur in very few documents (most
         representative of real-world usage); hit probability 5·10^-4




Performance Results

 Throughput (M terms/sec)
                     256K documents of 4096 terms          1M documents of 1024 terms
 Profile             System1    System2    FPGA board      System1    System2    FPGA board
 Random, 4K          269        416        3090            292        1118       3090
 Random, 16K         245        324        3090            288        1014       3090
 Random, 64K         223        379        3090            253        945        3090
 Selected, 4K        118        232        3088            120        309        3088
 Selected, 16K       107        164        3088            94         350        3088
 Selected, 64K       82         136        3088            72         183        3088
 Empty, 4K           710        1564       3090            911        2005       3090
 Empty, 16K          711        1664       3090            844        1976       3090
 Empty, 64K          710        1338       3090            877        1952       3090
 Full, 4K            8          11         36              7          10         36
 Full, 16K           8          12         36              8          12         36
 Full, 64K           9          10         36              8          11         36

 FPGA / System *     System1    System2
 Speedup             38X        23X
 Perf. / Watt        31X        40X

 #Threads            System1    System2    FPGA System
 0 (Idle)            40         67         35
 1                   67         93         61.5
 2                   67         107        68
 4                   67         135        74.5
 8                   67*        141        81
 Power consumption of document filtering application (W)
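
The slides do not say which rows produce the reported speedup and performance-per-watt figures. One reading that reproduces them, offered as an assumption, takes the "Selected, 64K" throughputs for the 256K-document collection and the 8-thread power readings.

```python
# Throughput in M terms/sec: "Selected, 64K" profile, 256K documents of 4096 terms.
fpga_throughput = 3088.0
cpu_throughput = {"System1": 82.0, "System2": 136.0}

# Power in watts under load, 8-thread row of the power table.
fpga_power = 81.0
cpu_power = {"System1": 67.0, "System2": 141.0}


def speedup(system):
    """FPGA throughput divided by CPU throughput."""
    return fpga_throughput / cpu_throughput[system]


def perf_per_watt_gain(system):
    """Ratio of (throughput / power) between the FPGA system and a CPU system."""
    fpga_ppw = fpga_throughput / fpga_power
    cpu_ppw = cpu_throughput[system] / cpu_power[system]
    return fpga_ppw / cpu_ppw


for name in ("System1", "System2"):
    print(name, round(speedup(name)), round(perf_per_watt_gain(name)))
```

With these inputs the rounded ratios match the 38X/23X speedup and 31X/40X performance-per-watt entries in the comparison table.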




Performance versus Cost
  We used the cost model from Shah and Patel’s work

                     Cost Breakdown       CPU            CPU+FPGA
                     Space                21M$/y         21M$/y
                     Power & Cooling      52M$/y         29M$/y
                     IT Infrastructure    59M$/y         248M$/y
                     Total                132M$/y        299M$/y
                     Performance          136Mops/s      3090Mops/s
                     (single system)
                     Performance/Cost     32Mops/$       330Mops/$

 Considering device demand economics, Performance/Cost is calculated for
  various FPGA costs, such as $2000, $4000, and $8000
 Effect of various speedup factors on the gain factor

 [Figure: Performance/Cost versus FPGA system cost and performance gains]



Alternatives and Future Work
  ASIC Bloom filter
        Frequency of operation and its effect on power consumption
  GPGPU
        Frequency of operation, size of internal memory, and I/O bottleneck
  Decrease the congestion probability for multi-term access to the
   Bloom filter by increasing the number of banks
  In-depth characterization of other diverse workloads
  Explore low-power host systems, such as ARM or Atom
  Implement the hardware algorithm using high-level languages and tools
   such as Impulse-C, Catapult-C, the MORA framework and OpenCL



Conclusion
  Growing demand for data-center computing and “green
   computing” motivates high-performance systems
   with improved energy efficiency
  A new FPGA-accelerated system design for information retrieval
   or unstructured search
  The algorithm is implemented on a GiDEL ProcStar IV (Altera Stratix IV
   530 FPGA), achieving 800 M terms/sec of throughput with a power
   consumption of 6 W
  Comparison of the FPGA system with the baseline systems
        Speedup of 23X to 38X
        Energy-efficiency improvement of 31X to 40X
        Performance-per-cost improvement of 10X




Thank you




           Questions?





Presentation Ispass 2012 Session6 Presentation1

  • 1. Evaluating FPGA-acceleration for Real-time Unstructured Search Sai Rahul Chalamalasetti†, Martin Margala†, Wim Vanderbauwhede*, Mitch Wright‡, Parthasarathy Ranganathan‡‡ †University of Massachusetts Lowell, Lowell, MA *University of Glasgow, Scotland, UK ‡Hewlett Packard, Houston, TX ‡‡Hewlett Packard Labs, Palo Alto, CA 4/9/2012 1
  • 2. Outline  Motivation  Workload and Algorithm Description  Hardware Systems  Synthetic Datasets  Performance Results  Alternatives and Future Work  Conclusion 4/9/2012 2
  • 3. Motivation: The era of “big data”
      - Explosion in data – particularly unstructured data
          - Information doubling every 18 months or faster
          - Enterprise server systems processed, delivered over 9 zettabytes in 2008 (UCSD report)
          - Walmart: 1M transactions/hour; LHC: 1PB/second; YouTube: 48 hours/minute; Facebook: 100TB logs/day
      - Explosion in data-centric workloads
          - Collect, store, access, share, visualize, analyze, interpret, …
          - Consumer, Enterprise, Scientific, …
          - New applications emerging recently: search, live business analytics, social correlation, collaborative filtering, …
          - Need better performance for deeper analytics across diverse data
  • 4. Motivation: The era of “green computing”
      - Power and cooling important constraint for servers/datacenters
          - Only 4 countries consume more electricity than worldwide datacenters; millions of dollars for cloud datacenters
          - Thermal density and costs of power delivery and cooling infrastructure
      - Sustainability a growing concern
          - Lifecycle minimization of environmental effects and carbon emissions
          - Corporate initiatives from HP, Cisco, Dell, Google, IBM, Intel; Government initiatives from EPA, DOE, etc.
  • 5. This Work
      - High-performance energy-efficient data-centric architectures
          - FPGAs/accelerators a good way to improve energy efficiency
          - Accelerated unstructured search, mainly data analysis (document filtering & profile match)
          - GiDEL ProcStar IV board (four Altera Stratix IV 530 FPGAs)
      - Recent developments offer promise
          - Better toolkits and IPs for host computer interfaces to FPGAs, e.g. GiDEL
          - Future platforms, e.g. ARM+FPGA on a single die by Altera and Xilinx
          - Recent commercial successes, e.g. Fusion-io, Netezza, etc.
      - What we achieved
          - Performance speedup of 23X to 38X
          - Energy efficiency improvements of 31X to 40X
          - Performance-per-cost improvement of 10X
  • 6. Choice of Workload
      - Wide variety of emerging data-centric workloads
          - Operations: collect and store, maintain & manage; retrieve, interpret & analyze
      - Focus on important emerging class: real-time unstructured search
          - Searching patent repositories for related work comparison
          - Searching emails and share points for enterprise information management
          - Detecting spam in incoming emails
          - Monitoring communications for e.g. terrorist activity
          - News story topic detection and tracking
          - Searching through books, images, and videos for matching profiles
  • 7. Algorithm Description
      - Document model
          - Each document modeled as a bag of words “D” of pairs (t,f)
              - t is a term; f is the number of occurrences of t in document D
          - Profile “M” is a set of pairs p = (t,w)
              - t is a term; w is a weight function
          - Bayesian algorithm used offline to precompute the profile based on user requirements
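The document model on slide 7 can be sketched in a few lines of Python. The slide does not state how a document is scored against the profile; the sketch below assumes the score is the sum of f·w over the terms that appear in both D and M. The function names and the sample profile are illustrative only.

```python
from collections import Counter

def bag_of_words(tokens):
    """Model a document D as a bag of words: term t -> occurrence count f."""
    return Counter(tokens)

def score_document(doc, profile):
    """Assumed scoring rule: sum of f * w over terms present in both D and M."""
    return sum(f * profile[t] for t, f in doc.items() if t in profile)

# Hypothetical profile M: term -> weight w (precomputed offline in the original work).
profile = {"fpga": 2.0, "bloom": 1.5, "filter": 0.5}

doc = bag_of_words("fpga bloom filter speeds up fpga search".split())
score = score_document(doc, profile)
print(score)  # 2*2.0 + 1*1.5 + 1*0.5 = 6.0
```

Because most document terms are not in the profile, almost every term contributes nothing to the sum, which is why discarding misses early pays off.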
  • 8. Hardware Platform
      - FPGA Board
          - GiDEL PROCStar-IV development board
          - Internal FPGA memory of 20 Mb
          - External memory for a single FPGA
              - Bank A: 512 MB (profile and scores)
              - Bank B/C: 2 GB each (document stream)
      - Application Implementation
          - GiDEL External Memory IPs
          - Algorithm in VHDL in Altera Quartus
          - GiDEL ProcWizard to integrate the algorithm with its IPs
  • 9. Baseline Systems
      - An optimized multi-threaded reference implementation
          - Written in C++, compiled with g++ with optimization -O3
      - Different platforms
          - System 1: Intel Core 2 Duo Mobile E8435, 3.06 GHz and 8GB RAM
          - System 2: 8-core Intel Core i7-2600, 3.4 GHz and 16GB RAM
      - The high-memory baselines are required to provide sufficient memory for the data collection
          - Reading the data from disk would dominate the performance
          - Collection is preloaded in memory
  • 10. Hardware Algorithm Description
      Profile Storage   Ext. Mem (Bank A)   Bloom Filter
      Latency/Term      20 cc               1 cc
      - Probability of an input term being a profile hit is extremely small.
      - Bloom Filter is used to discard misses.
      - Extract parallelism out of the FPGA?
          - Parallel term lookup of Bloom Filter
  • 11. Hardware Algorithm Description
      - Multi-Bank Bloom Filter: to decrease congestion for multiple lookups
          - Look up eight terms in parallel in the Bloom Filter
          - Individual banks are implemented on Altera M9K (hard memory blocks) on the FPGA
          - The current implementation uses only half of the 1280 M9K blocks to map 4 Mb of profile
      - To decrease/eliminate false positives, future Bloom Filter designs include
          - 8 Mb on all the M9K blocks (130 MHz)
          - 16 Mb profile size on both M9K and M144K together (100 MHz)
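A software model can make the multi-bank idea on slides 10 and 11 concrete. The sketch below is not the VHDL design: the hash function, the use of a single hash per term, integer term ids, and the bank sizes are all assumptions chosen for illustration. It only shows how a term maps to one bank and one bit, and how a batch of eight lookups can collide on a bank.

```python
class BankedBloomFilter:
    """Software model of a multi-bank Bloom filter (layout and hash are assumptions)."""

    def __init__(self, num_banks=8, bits_per_bank=1 << 16):
        self.num_banks = num_banks
        self.bits_per_bank = bits_per_bank
        # One independent bit array per bank, standing in for on-chip memory blocks.
        self.banks = [bytearray(bits_per_bank // 8) for _ in range(num_banks)]

    def _locate(self, term_id):
        # Hypothetical single multiplicative hash, split into (bank, bit index).
        h = (term_id * 2654435761) & 0xFFFFFFFF
        return h % self.num_banks, (h // self.num_banks) % self.bits_per_bank

    def add(self, term_id):
        bank, bit = self._locate(term_id)
        self.banks[bank][bit >> 3] |= 1 << (bit & 7)

    def might_contain(self, term_id):
        # False means a definite miss; True may be a false positive.
        bank, bit = self._locate(term_id)
        return bool(self.banks[bank][bit >> 3] & (1 << (bit & 7)))

    def lookup_batch(self, term_ids):
        """Look up a batch of terms and count how many share a bank with another."""
        hits = [self.might_contain(t) for t in term_ids]
        banks_used = [self._locate(t)[0] for t in term_ids]
        conflicts = len(banks_used) - len(set(banks_used))
        return hits, conflicts

bf = BankedBloomFilter()
profile_terms = [3, 17, 4242, 99991]      # hypothetical profile term ids
for t in profile_terms:
    bf.add(t)
hits, conflicts = bf.lookup_batch(list(range(100, 108)))  # eight terms per batch
```

In this model a conflict is only counted; in hardware, two terms that map to the same bank in the same cycle would have to be serialized, which is the congestion that adding banks is meant to reduce.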
  • 12. Implemented Algorithm Utilization
                   Logic Elements (424 KLEs)   Memory MRAMs (20 Mbits)
      Total        17,562                      4
      Algorithm    4,561 (1%)                  4 (22%)
  • 13. Synthetic Data Sets
      - Creating Synthetic Data Sets
          - Real-world data is hard to access, e.g. patent collections are governed by licenses that restrict their use.
          - Synthetic document collections statistically match real-world collections.
      - Real-World Document Collections
          - Newspaper collection (TREC Aquaint)
          - Patent collections from the US Patent Office (USPTO) and the European Patent Office (EPO)
          - The Lemur [2] Information Retrieval toolkit is used to determine the rank frequency for all the terms in the collection
      Collection   # Docs      Average Doc. Len.   Average Uniq. Terms
      Aquaint      1,033,461   437                 169
      USPTO        1,406,200   1718                353
      EPO          989,507     3863                705
      [2] www.lemurproject.org
  • 14. Synthetic Data Sets
      - Modeling distributions of terms
          - Most natural language documents follow a Zipfian rank-frequency distribution
          - We use Montemurro’s extension to Zipf's law
      - Modeling document lengths
          - Sampled from a truncated Gaussian
          - Verified using a χ² test with 95% confidence
      - Synthetic documents of varying lengths
          - Each document's terms follow the fitted rank-frequency distribution
          - Convert documents into the standard bag-of-words representation
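The generation steps on slide 14 can be sketched as follows. This is a simplified stand-in, not the authors' generator: it uses a plain Zipf law (weight 1/rank^s) rather than Montemurro's extension, and the vocabulary size, standard deviation, and maximum length are assumed values. Only the mean length of 437 is taken from the Aquaint row on slide 13.

```python
import random
from collections import Counter

def zipf_weights(vocab_size, exponent=1.0):
    """Plain Zipf rank-frequency weights: the term of rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]

def truncated_gauss(rng, mean, stddev, low, high):
    """Sample a Gaussian by rejection until the value falls inside [low, high]."""
    while True:
        x = rng.gauss(mean, stddev)
        if low <= x <= high:
            return round(x)

def synthetic_document(rng, weights, mean_len, stddev_len, max_len):
    """Draw a length, draw that many term ids by rank-frequency, return a bag of words."""
    length = truncated_gauss(rng, mean_len, stddev_len, 1, max_len)
    term_ids = rng.choices(range(len(weights)), weights=weights, k=length)
    return Counter(term_ids)

rng = random.Random(42)                      # fixed seed for repeatable collections
weights = zipf_weights(vocab_size=10_000)    # assumed vocabulary size
doc = synthetic_document(rng, weights, mean_len=437, stddev_len=100, max_len=1024)
print(sum(doc.values()), "terms,", len(doc), "unique")
```

The returned `Counter` is already the bag-of-words form (term id mapped to count) that the filtering algorithm consumes.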
  • 15. Experimental Parameters
      - The performance of the algorithm on the system depends on
          - the size of the collection
              - 256K documents of 4096 terms (Patent collection)
              - 1M documents of 1024 terms (Aquaint collection)
          - the size of the profile
              - 4K, 16K and 64K terms, which are similar to that of TREC Aquaint and EPO
      - Profile Types
          - “Random”: selecting a number of random documents from the collection until the desired profile size is reached, hit probability 10^-5
          - “Selected”: selecting terms that occur in very few documents (most representative of real-world usage), hit probability 5·10^-4
  • 16. Performance Results
      Throughput (M Terms/Sec)
                       256K documents of 4096 terms      1M documents of 1024 terms
      Profile          System1  System2  FPGA board      System1  System2  FPGA board
      Random, 4K           269      416        3090          292     1118        3090
      Random, 16K          245      324        3090          288     1014        3090
      Random, 64K          223      379        3090          253      945        3090
      Selected, 4K         118      232        3088          120      309        3088
      Selected, 16K        107      164        3088           94      350        3088
      Selected, 64K         82      136        3088           72      183        3088
      Empty, 4K            710     1564        3090          911     2005        3090
      Empty, 16K           711     1664        3090          844     1976        3090
      Empty, 64K           710     1338        3090          877     1952        3090
      Full, 4K               8       11          36            7       10          36
      Full, 16K              8       12          36            8       12          36
      Full, 64K              9       10          36            8       11          36

      Power consumption of document filtering application (W)
      #Threads    System1  System2  FPGA System
      0 (Idle)         40       67           35
      1                67       93         61.5
      2                67      107           68
      4                67      135         74.5
      8               67*      141           81

      FPGA / System *   System1  System2
      Speed up              38X      23X
      Perf. / Watt          31X      40X
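The headline ratios on slide 16 can be checked with simple arithmetic. The slide does not say which rows the 38X/23X and 31X/40X figures were derived from; the sketch below assumes the “Selected, 64K” row of the 256K-document collection and the 8-thread power row, a combination that reproduces the reported numbers after rounding.

```python
# Throughput in M terms/s and power in W, read from the tables on slide 16.
# Assumption: "Selected, 64K" profile, 256K documents of 4096 terms, 8-thread power row.
fpga_throughput, fpga_power = 3088, 81
baselines = {
    "System1": (82, 67),    # (throughput, power)
    "System2": (136, 141),
}

results = {}
for name, (throughput, power) in baselines.items():
    speedup = fpga_throughput / throughput
    # Perf/Watt gain = (FPGA terms per joule) / (baseline terms per joule).
    perf_per_watt_gain = (fpga_throughput / fpga_power) / (throughput / power)
    results[name] = (speedup, perf_per_watt_gain)
    print(f"{name}: speedup {speedup:.0f}X, perf/watt {perf_per_watt_gain:.0f}X")
```

Under these assumptions System1 comes out at roughly 38X and 31X, and System2 at roughly 23X and 40X. System2 shows the smaller speedup but the larger efficiency gain because it draws about twice the power of System1.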
  • 17. Performance versus Cost
      - We used the cost model from Shah and Patel’s work
      Cost Breakdown (CPU / CPU+FPGA)
          - Space: 21M$/y
          - Power & Cooling: 52M$/y / 29M$/y
          - IT Infrastructure: 59M$/y / 248M$/y
          - Total: 132M$/y / 299M$/y
          - Performance (single system): 136Mops/s / 3090Mops/s
          - Performance/Cost: 32Mops/$ / 330Mops/$
      - Considering the device demand economics, Performance/Cost is calculated for various FPGA costs, such as $2000, $4000, and $8000
      - Effect of various speedup factors on performance gains
      [Figures: Performance/Cost versus FPGA system cost; Gain Factor]
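The Performance/Cost row on slide 17 does not state how Mops/s and M$/y combine into Mops/$. One reading that lands close to the slide's figures, offered here as an assumption, is to multiply throughput by the number of seconds in a year and divide by the yearly cost.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s

def mops_per_dollar(perf_mops_per_s, cost_musd_per_year):
    """Operations delivered per dollar, in Mops/$ (assumed conversion, see lead-in)."""
    ops_per_year = perf_mops_per_s * 1e6 * SECONDS_PER_YEAR
    dollars_per_year = cost_musd_per_year * 1e6
    return ops_per_year / dollars_per_year / 1e6

cpu = mops_per_dollar(136, 132)      # CPU: 136 Mops/s at 132 M$/y
fpga = mops_per_dollar(3090, 299)    # CPU+FPGA: 3090 Mops/s at 299 M$/y
gain = fpga / cpu

print(f"CPU {cpu:.1f} Mops/$, CPU+FPGA {fpga:.1f} Mops/$, gain {gain:.1f}X")
```

With these inputs the CPU value is about 32 Mops/$ and the CPU+FPGA value about 326 Mops/$, which the slide reports as 330; the ratio is about 10X, matching the stated performance-per-cost improvement.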
  • 18. Alternatives and Future Work
      - ASIC Bloom Filter
          - Frequency of operation, and its effect on power consumption
      - GPGPU
          - Frequency of operation, size of internal memory, and I/O bottleneck
      - Decrease congestion probability for multi-term access of the Bloom Filter by increasing the number of banks
      - In-depth characterization of other diverse workloads
      - Explore low-power host systems, such as ARM, Atom, etc.
      - Implement the hardware algorithm using high-level languages such as Impulse-C, Catapult-C, MORA framework and OpenCL
  • 19. Conclusion
      - Growing demand for data-center computing and “green computing” motivates designers to build high-performance systems with improved energy efficiency
      - A new FPGA-accelerated system design for information retrieval or unstructured search
      - Algorithm is implemented on GiDEL ProcStar IV (Altera Stratix IV 530 FPGA), achieving 800 Mterms/sec of throughput with power consumption of 6W
      - Comparisons of FPGA system with baseline systems
          - Speedup of 23X to 38X
          - Energy efficiency of 31X to 40X
          - Performance-per-cost improvement of 10X
  • 20. Thank you
      Questions?