Stream processing is a computer
programming paradigm, related to
SIMD
It allows some applications to more
easily exploit a limited form of
parallel processing
A stream is simply a set of records
that require similar computation.
Streams provide data parallelism
Kernels are the functions that are applied to each element in
the stream
For each element we can only read from the
input, perform operations on it, and write to the
output
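
The kernel model described above can be sketched in a few lines of Python. This is only an illustration, not code from the slides: `apply_kernel` and `scale_and_offset` are hypothetical names, and a real stream processor would run the kernel on many records at once rather than in a loop.

```python
from typing import Callable, Iterable, Iterator


def apply_kernel(kernel: Callable[[float], float],
                 stream: Iterable[float]) -> Iterator[float]:
    """Apply one kernel to every record of the input stream.

    Each output record depends only on its own input record: the kernel
    reads from the input, computes, and writes to the output.
    """
    for record in stream:
        yield kernel(record)


def scale_and_offset(x: float) -> float:
    # Hypothetical kernel: the same computation for every record.
    return 2.0 * x + 1.0


input_stream = [0.0, 1.0, 2.0, 3.0]
output_stream = list(apply_kernel(scale_and_offset, input_stream))
print(output_stream)  # [1.0, 3.0, 5.0, 7.0]
```

Because no record needs the result of any other record, the loop iterations could run in any order or all at once, which is where the data parallelism comes from.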
Stream processing is especially suitable for
applications that exhibit three characteristics ---
Flynn’s Taxonomy: SISD




Single Instruction: Only one instruction stream is being acted on
by the CPU during any one clock cycle

Single Data: Only one data stream is being used as input during
any one clock cycle
Flynn’s Taxonomy: SIMD




Single Instruction: All processing units execute the same
instruction at any given clock cycle

Multiple Data: Each processing unit can operate on a different
data element
Flynn’s Taxonomy: MISD




Multiple Instruction: Each processing unit operates on the data
independently via separate instruction streams.

Single Data: A single data stream is fed into multiple processing
units.
Flynn’s Taxonomy: MIMD




Multiple Instruction: Every processor may be executing a different
instruction stream

Multiple Data: Every processor may be working with a different data
stream
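
The four classes differ only in whether there is one or more than one instruction stream and data stream. A minimal sketch of that rule follows; the function name `flynn_class` is an assumption made for illustration.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Name the Flynn class for a given number of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("need at least one instruction stream and one data stream")
    instr = "S" if instruction_streams == 1 else "M"  # Single or Multiple Instruction
    data = "S" if data_streams == 1 else "M"          # Single or Multiple Data
    return f"{instr}I{data}D"


print(flynn_class(1, 1))  # SISD
print(flynn_class(1, 8))  # SIMD
print(flynn_class(4, 1))  # MISD
print(flynn_class(4, 4))  # MIMD
```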
Stream Processors

Stream processing makes use of locality of reference by explicitly
grouping related code and data together for easy fetching into the
cache
StreamIt: a stream processing language for programs based
on streams of data

e.g. audio, video, DSP, networking,
and cryptographic processing kernels

HDTV editing, radar tracking, microphone arrays,
cellphone base stations, graphics

[Thies 2002]
A high-level, architecture-independent language
for streaming applications

1. Improves programmer productivity (vs. Java, C)

2. Offers scalable performance on multicores

[Thies 2002]
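
To give a feel for what a program based on streams of data looks like, here is a sketch in Python. It is not StreamIt syntax; the filters `scale` and `moving_average` and the helper `pipeline` are hypothetical names chosen for illustration. Each filter consumes records from its input stream and produces records on its output stream, and filters are chained into a pipeline.

```python
from typing import Iterable, Iterator


def scale(stream: Iterable[float], gain: float) -> Iterator[float]:
    """Filter: multiply every sample by a constant gain."""
    for sample in stream:
        yield sample * gain


def moving_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Filter: emit the mean of each run of `window` consecutive samples."""
    buffer: list[float] = []
    for sample in stream:
        buffer.append(sample)
        if len(buffer) == window:
            yield sum(buffer) / window
            buffer.pop(0)  # slide the window forward by one sample


def pipeline(source: Iterable[float]) -> Iterator[float]:
    """Connect the two filters: source -> scale -> moving_average."""
    return moving_average(scale(source, gain=2.0), window=2)


result = list(pipeline([1.0, 2.0, 3.0, 4.0]))
print(result)  # [3.0, 5.0, 7.0]
```

The program describes how data flows between stages rather than how loops and buffers are laid out, which is the kind of structure a stream compiler can map onto multiple cores.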
GPU
A GPU is a single-chip processor
that creates lighting effects and
transforms objects every time a 3D
scene is redrawn

Used primarily for 3D
applications.

A GPU can be present on a video card, on the motherboard,
or, in certain CPUs, on the CPU die
World’s First GPU
Nvidia in 1999 marketed the GeForce 256 as "the world's
first 'GPU', a single-chip processor that is capable of
processing a minimum of 10 million polygons per second".


Rival ATI Technologies coined the term visual processing
unit, or VPU, with the release of the Radeon 9700 in 2002.
GPUs have a very high compute capacity
To the hardware, the accelerator
looks like another IO unit; it
communicates with the CPU using IO
commands and DMA memory transfers

To the software, the accelerator
is another computer to which your
program sends data and routines
to execute
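
The software view can be sketched as a toy model in Python. Everything here is hypothetical (the `Accelerator` class and its `upload`, `load_routine`, `launch`, and `download` methods are not a real driver API); the point is only the shape of the interaction: the host program sends data and a routine, asks the accelerator to run it, and copies the result back.

```python
from typing import Callable, Dict, List


class Accelerator:
    """Toy model of an accelerator with its own memory and routine store."""

    def __init__(self) -> None:
        self._memory: Dict[str, List[float]] = {}
        self._routines: Dict[str, Callable[[float], float]] = {}

    def upload(self, name: str, data: List[float]) -> None:
        # The copy stands in for a DMA transfer from host to device memory.
        self._memory[name] = list(data)

    def load_routine(self, name: str, routine: Callable[[float], float]) -> None:
        self._routines[name] = routine

    def launch(self, routine: str, src: str, dst: str) -> None:
        # Run the routine on every element held in device memory.
        fn = self._routines[routine]
        self._memory[dst] = [fn(x) for x in self._memory[src]]

    def download(self, name: str) -> List[float]:
        # The copy stands in for a DMA transfer back to the host.
        return list(self._memory[name])


device = Accelerator()
device.upload("input", [1.0, 2.0, 3.0])
device.load_routine("square", lambda x: x * x)
device.launch("square", src="input", dst="output")
squares = device.download("output")
print(squares)  # [1.0, 4.0, 9.0]
```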
GPGPU
This concept turns the massive floating-point computational
power of a modern graphics accelerator into general-purpose
computing power
GPUs are stream processors: processors that can operate
in parallel by running a single kernel on many records in
a stream at once
Ideal GPGPU applications have large data sets, high parallelism,
and minimal dependency between data elements
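
The importance of minimal dependency between data elements can be illustrated with a small Python sketch. The names `square` and `running_sum` are hypothetical, and a thread pool is used here only as a stand-in for parallel hardware. An element-wise kernel can be handed to several workers at once, while a running sum cannot be split as-is because each output needs the previous one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def square(x: int) -> int:
    # Independent kernel: the result for x needs no other element.
    return x * x


def running_sum(values: List[int]) -> List[int]:
    # Dependent computation: each output needs the previous total.
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


data = list(range(8))

# Independent elements can be processed by several workers at once;
# pool.map still returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, data))

sequential = [square(x) for x in data]
print(parallel == sequential)  # True
print(running_sum(data))       # [0, 1, 3, 6, 10, 15, 21, 28]
```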
In certain circumstances the GPU calculates forty
times faster than conventional CPUs

Processor             Type   Transistors (millions)
AMD Athlon 64 X2      CPU    154
Intel Core 2 Quad     CPU    582
ATI X1950 XTX         GPU    384
NVIDIA G8800 GTX      GPU    680
“The processing power of just 5,000 ATI processors is
also enough to rival that of the existing 200,000
computers currently involved in the Folding@home project”




                                                      [Ref 1]
“...it is estimated that if a mere 10,000 computers were to
each use an ATI processor to conduct folding research, that
the Folding@home program would effectively perform faster
than the fastest supercomputer in existence
today, surpassing the 1 petaFLOP level” (2007)

November 10, 2011: Folding@home at 6.0 petaFLOPS, whereas
the K computer is at 8.162 petaFLOPS

[Ref 1]
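
The figures quoted above can be checked with simple arithmetic. The variable names are chosen for illustration, and the per-processor ratio is only what the quote implies, not a measured benchmark.

```python
# Figures taken from the quotes above.
ati_processors = 5_000
existing_computers = 200_000

# 5,000 ATI processors rivalling 200,000 computers implies that one
# ATI processor stands in for this many of the existing computers.
computers_per_gpu = existing_computers / ati_processors

# November 10, 2011 figures, in petaFLOPS.
folding_at_home_pflops = 6.0
k_computer_pflops = 8.162
fraction_of_k = folding_at_home_pflops / k_computer_pflops

print(computers_per_gpu)        # 40.0
print(round(fraction_of_k, 3))  # 0.735
```

The factor of 40 matches the "forty times faster" figure given earlier in the deck.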
Comparing GPUs to CPUs isn't an apples-to-apples comparison:

1. The clock rates are lower
2. The architectures are radically different
3. The problems they're trying to solve are almost
   completely unrelated
Application Processor:

Executes application code like
MPEG decoding

Sequences the instructions and
issues them to stream clients,
e.g. the KEU and the DRAM interface

[Kapasi 2003]
Two Stream Clients:

KEU:
Programmable Kernel Execution Unit

DRAM interface:
Provides access to global data storage

[Kapasi 2003]
KEU:

It has two stream-level
instructions:

1. load_kernel – loads a
   compiled kernel function into
   the local instruction
   storage inside the KEU

2. run_kernel – executes the
   kernel

[Kapasi 2003]
DRAM interface:

Two stream-level instructions
as well:

1. load_stream – loads an
   entire stream from DRAM into
   the stream register file (SRF)

2. store_stream – stores a
   stream from the SRF into DRAM

[Kapasi 2003]
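
The four stream-level instructions can be put together in a toy Python model. This is a sketch under stated assumptions, not the Imagine instruction set: the class `StreamProcessorSketch` and its method signatures are hypothetical, and it assumes load_stream moves a stream from DRAM into the SRF while store_stream moves one from the SRF back to DRAM.

```python
from typing import Callable, Dict, List


class StreamProcessorSketch:
    """Toy model of the application processor driving the two stream clients."""

    def __init__(self, dram: Dict[str, List[int]]) -> None:
        self.dram = dict(dram)  # global data storage behind the DRAM interface
        self.srf: Dict[str, List[int]] = {}  # stream register file
        self.kernel_store: Dict[str, Callable[[int], int]] = {}  # KEU instruction storage

    # DRAM interface instructions (direction is an assumption, see lead-in).
    def load_stream(self, name: str) -> None:
        self.srf[name] = list(self.dram[name])

    def store_stream(self, name: str) -> None:
        self.dram[name] = list(self.srf[name])

    # KEU instructions.
    def load_kernel(self, name: str, kernel: Callable[[int], int]) -> None:
        self.kernel_store[name] = kernel

    def run_kernel(self, name: str, src: str, dst: str) -> None:
        kernel = self.kernel_store[name]
        self.srf[dst] = [kernel(record) for record in self.srf[src]]


proc = StreamProcessorSketch(dram={"pixels": [1, 2, 3]})
proc.load_stream("pixels")                        # DRAM -> SRF
proc.load_kernel("double", lambda p: 2 * p)       # kernel into KEU storage
proc.run_kernel("double", src="pixels", dst="out")
proc.store_stream("out")                          # SRF -> DRAM
print(proc.dram["out"])  # [2, 4, 6]
```

The application processor's role in this picture is the sequence of four calls at the bottom: it issues stream-level instructions and leaves the per-record work to the KEU.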
Local register files (LRFs)

1. Used for operands for
   arithmetic operations
   (similar to caches on CPUs)

2. Exploit fine-grain locality

[Kapasi 2003]
Stream register files (SRFs)

1. Capture coarse-grain
   locality

2. Efficiently transfer data
   to and from the LRFs

[Kapasi 2003]
[Kapasi 2003]
Topics learnt today:

1. Stream Processing
2. StreamIT language from MIT
3. How modern GPUs use stream processing
4. Imagine Stream Processor from Stanford
Stream Processing
