Design Issues of the IBM Cell Architecture
Vitthal Gutthe MEIT 1326
Pravin kumar Yadav MEIT 1338
Vyanktesh Dorlikar MEIT 1324
Contents
 General introduction
 History of development
 Technical overview of architecture
 Detailed technical discussion of components
 Design choices
 Cell programming issues
History of Development
 Sony PlayStation 2
• Released March 2000 in Japan
• 128-bit “Emotion Engine”
• MIPS-based CPU clocked at 294 MHz
• Capable of 6.2 GFLOPS (giga floating-point operations per second)
History (cont.)
 Partnership between Sony, Toshiba, and IBM formed in summer 2000
 Initial goal: 1000× the performance of the PS2 in a single machine
 March 2001: Sony-IBM-Toshiba design center opened with an investment of $400 million
Overall Goals for Cell
 High performance in multimedia applications
 Real-time performance
 Minimal power consumption
 Cost as low as possible
 Available by 2005
 Avoid memory latency issues associated with control structures
The Cell Itself
 PowerPC-based main core (PPE)
 Multiple Synergistic Processing Elements (SPEs)
 On-die memory controller
 Inter-core transport bus
 High-speed I/O
Cell Die Layout
[Figure: Cell die layout]
Cell Implementation
 Cell is an architecture
 Preliminary implementation
• 1 PPE
• 8 SPEs (the PlayStation 3 disables 1 to increase yield, leaving 7)
• 221 mm² die on a 90 nm process
• Clocked at 3-4 GHz
• 256 GFLOPS single precision at 4 GHz
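
The 256 GFLOPS figure can be checked with simple arithmetic. The sketch below assumes a chip with 8 active SPEs, each completing 8 single-precision operations per cycle (the per-SPE rate quoted on the SPE Execution slide); the function name is illustrative only.

```python
def peak_gflops(num_spes: int, ops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak throughput in GFLOPS, ignoring the PPE and all stalls."""
    return num_spes * ops_per_cycle * clock_ghz


# Assumed configuration: 8 SPEs x 8 single-precision ops/cycle at 4 GHz.
full_chip = peak_gflops(8, 8, 4.0)

# Same arithmetic with 7 usable SPEs, as on the PS3 implementation.
seven_spes = peak_gflops(7, 8, 4.0)

print(full_chip, seven_spes)
```

This is a peak number only: it assumes every SPE issues a full vector operation on every cycle, which real workloads rarely sustain.
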
Why a Cell Architecture
 Follows a trend in computing architecture
 Natural extension of dual- and multi-core designs
 Extremely low hardware overhead
 Software controllable
 Specialized hardware is more useful for multimedia
Possible Uses
 PlayStation 3 (obviously)
 Blade servers (IBM)
• Amazing single-precision FP performance
• Scientific applications
 Toshiba HDTV products
Power Processing Element (PPE)
 PowerPC instruction set with AltiVec
 Used for general-purpose computing and for controlling the SPEs
 Simultaneous multithreading
 Separate 32 KB L1 instruction and data caches and a unified 512 KB L2 cache
PPE (cont.)
 Slow but power-efficient implementation of the PowerPC instruction set
 Two-issue, in-order instruction fetch
 Conspicuous lack of an instruction window
 Compare to conventional PowerPC implementations (G5)
 Overall performance depends on SPE utilization
Synergistic Processing Element (SPE)
 Specialized hardware
 Meant to be used in parallel
• (7 on the PS3 implementation)
 On-chip memory (256 KB)
 No branch prediction
 In-order execution
 Dual issue
SPE Architecture
 0.99 µm² on a 90 nm process
 128 registers (128 bits wide)
• Instructions assume 4 × 32-bit operands
 Variant of the VMX instruction set
• Modified for 128 registers
 On-chip memory is NOT a cache
SPE Execution
 Dual issue, in-order
 Seven execution units
 Vector logic
 8 single-precision operations per cycle
 Significant performance hit for double precision
SPE Execution Diagram
[Figure: SPE execution diagram]
SPE Local Storage Area
 NOT a cache
 256 KB: 4 × 64 KB ECC single-port SRAM
 Completely private to each SPE
 Directly addressable by software
 Can be used as a cache, but only under software control
 No tag bits or any other extra hardware
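
To illustrate what "used as a cache, but only under software control" can mean, here is a minimal sketch of a direct-mapped cache whose tags are kept by the program rather than by hardware. It is a plain Python simulation, not SPE code: the class name, the 128-byte line size, and the slice copy standing in for a DMA transfer are all assumptions made for the example.

```python
class SoftwareCache:
    """Direct-mapped cache emulated in software over a private buffer."""

    def __init__(self, main_memory: bytes, line_size: int = 128, num_lines: int = 64):
        self.main_memory = main_memory
        self.line_size = line_size
        self.num_lines = num_lines
        # With no hardware tag bits, the program tracks the tags itself.
        self.tags = [None] * num_lines
        self.lines = [b""] * num_lines
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        block = addr // self.line_size
        index = block % self.num_lines
        if self.tags[index] != block:
            # Miss: fetch the whole line (a slice copy stands in for DMA).
            start = block * self.line_size
            self.lines[index] = self.main_memory[start:start + self.line_size]
            self.tags[index] = block
            self.misses += 1
        else:
            self.hits += 1
        return self.lines[index][addr % self.line_size]


memory = bytes(i % 256 for i in range(16384))
cache = SoftwareCache(memory, line_size=128, num_lines=4)
first = cache.read(5)    # miss, line 0 is loaded
second = cache.read(6)   # hit on the same line
```

Every lookup costs extra instructions for the tag check, which is the price of leaving the tag hardware out.
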
SPE LS Scheduling
 Software-controlled DMA
 DMA to and from main memory
 Scheduling is a HUGE problem
• Done primarily in software
• IBM predicts 80-90% utilization in the ideal case
 Request queue handles 16 simultaneous requests
• Up to 16 KB per transfer
• Priority: DMA, load/store, fetch
 Fetch/execute parallelism
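
The queue limits above shape how software has to split large transfers. The following sketch breaks a transfer into requests of at most 16 KB and groups them into batches of at most 16 outstanding requests. The function and constant names are hypothetical, and the sketch only plans sizes; it does not model timing or the real DMA interface.

```python
from typing import List

MAX_TRANSFER_BYTES = 16 * 1024  # per-request limit quoted on the slide
QUEUE_DEPTH = 16                # simultaneous requests quoted on the slide


def plan_dma(total_bytes: int) -> List[List[int]]:
    """Split a transfer into request sizes, grouped into queue-sized batches."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    sizes = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, MAX_TRANSFER_BYTES)
        sizes.append(chunk)
        remaining -= chunk
    # Each inner list is one batch that fits in the request queue at once.
    return [sizes[i:i + QUEUE_DEPTH] for i in range(0, len(sizes), QUEUE_DEPTH)]


batches = plan_dma(256 * 1024)  # a transfer the size of one local store
print(len(batches), len(batches[0]))
```

A 256 KB transfer fills the queue exactly once under these limits; anything larger needs a second batch, which is where software scheduling comes in.
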
SPE Control Logic
 Very little in comparison with conventional cores
 Represents a shift in focus
 Complete lack of hardware branch prediction
• Software branch prediction
• Loop unrolling
• 18-cycle penalty for a mispredicted branch
 Software-controlled DMA
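
A rough cost model shows why loop unrolling matters when there is no hardware branch prediction. The sketch assumes the worst case, in which every loop back-edge pays the full 18-cycle penalty and no software branch hint is used; the function name is illustrative.

```python
BRANCH_PENALTY_CYCLES = 18  # penalty quoted on the slide


def branch_penalty_cycles(iterations: int, unroll: int = 1) -> int:
    """Cycles lost to loop branches, assuming each back-edge pays the full penalty."""
    if iterations < 0 or unroll < 1:
        raise ValueError("iterations must be >= 0 and unroll must be >= 1")
    # Unrolling by `unroll` leaves one back-edge per group of iterations.
    back_edges = (iterations + unroll - 1) // unroll
    return back_edges * BRANCH_PENALTY_CYCLES


rolled = branch_penalty_cycles(1000)              # one branch per iteration
unrolled = branch_penalty_cycles(1000, unroll=4)  # one branch per four iterations
print(rolled, unrolled)
```

Under this assumption, unrolling by four cuts the branch overhead to a quarter, at the cost of larger code in the local store.
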
SPE Pipeline
 Little ILP, and thus little control logic
 Dual issue
 Simple commit unit (no reorder buffer or other complexities)
 Same execution unit for FP/int
SPE Summary
 Essentially a small vector computer
 Based on the AltiVec/VMX ISA
• Extensions for DMA and LS management
• Extended for a 128 × 128-bit register file
 Uniquely suited for real-time applications
 Extremely fast for certain FP operations
 Offloads a large amount of work onto the compiler/software
Element Interconnect Bus (EIB)
 4 concentric rings connecting all Cell elements
 128-bit-wide interconnects
EIB (cont.)
 Designed to minimize coupling noise
 Rings carry data in alternating directions
 Buffers and repeaters at each SPE boundary
 Architecture can be scaled up at the cost of increased bus latency
EIB (cont.)
 Total bandwidth of ~200 GB/s
 EIB controller is physically located in the center of the chip, between the SPEs
 Controller reserves channels for each individual data transfer request
 Implementation allows additional SPEs to be added horizontally
Memory Interface
 Rambus XDR memory to keep Cell at full utilization
 3.2 Gbps data bandwidth per pin on each device connected to the XDR interface
 Cell uses dual-channel XDR with four devices and 16-bit-wide buses to achieve 25.6 GB/s total memory bandwidth
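
The total memory bandwidth follows from the per-pin rate. The sketch below assumes the 3.2 Gbps figure applies to each data pin, with four devices that each have a 16-bit data bus; multiplying these out gives 25.6 GB/s. The function name is illustrative.

```python
def xdr_bandwidth_gb_per_s(gbit_per_pin: float, bus_width_bits: int, devices: int) -> float:
    """Aggregate bandwidth in GB/s: per-pin rate x pins per device x devices / 8 bits."""
    return gbit_per_pin * bus_width_bits * devices / 8


# Assumed: 3.2 Gbit/s per pin, 16 data pins per device, 4 devices in total.
total = xdr_bandwidth_gb_per_s(3.2, 16, 4)
print(round(total, 1))
```
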
Input/Output Bus
 Rambus FlexIO bus
 I/O interface consists of 12 unidirectional byte lanes
 Each lane supports 6.4 GB/s of bandwidth
 7 outbound lanes and 5 inbound lanes
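
Taking the per-lane figure from the slide at face value, the aggregate I/O bandwidth in each direction is a simple product. The constant and variable names below are illustrative.

```python
LANE_GB_PER_S = 6.4   # per-lane bandwidth quoted on the slide
OUTBOUND_LANES = 7
INBOUND_LANES = 5

outbound = OUTBOUND_LANES * LANE_GB_PER_S
inbound = INBOUND_LANES * LANE_GB_PER_S
total_io = outbound + inbound

# Rounded to one decimal place to hide floating-point noise.
print(round(outbound, 1), round(inbound, 1), round(total_io, 1))
```

The split is asymmetric by design: more lanes carry data off the chip than onto it.
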
Design Choices
 In-order execution
• Abandoning ILP
• ILP yields only a 10-20% increase per generation
• Reducing control logic
• Real-time responsiveness
 Cache design
• Software-managed local store on the SPEs
• Standard L2 cache on the PPE
Cell Programming Issues
 No Cell compiler exists to manage utilization of the SPEs at compile time
 SPEs do not natively support context switching; it must be managed by the OS
 SPEs are vector processors and are not efficient for general-purpose computation
 The PPE and the SPEs use different instruction sets
Cell Programming (cont.)
 Functional offload model
 Simplest model for Cell programming
 Optimize existing libraries for SPE computation
 Requires no rebuild of the main application logic, which runs on the PPE
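
The functional offload model can be pictured with an ordinary thread pool standing in for the SPEs. This is only an analogy in plain Python, not Cell code: `dot`, `_partial_dot`, and `NUM_WORKERS` are hypothetical names. The point is that the library routine keeps its signature while its internals farm the work out, so the calling application logic does not change.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence, Tuple

NUM_WORKERS = 7  # stand-in for the SPEs available on the PS3 implementation


def _partial_dot(pair: Tuple[Sequence[float], Sequence[float]]) -> float:
    """Work unit that would run on one worker (an SPE in the real model)."""
    a, b = pair
    return float(sum(x * y for x, y in zip(a, b)))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Library routine: same interface as a serial version, offloads internally."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        return 0.0
    chunk = -(-len(a) // NUM_WORKERS)  # ceiling division
    pieces = [(a[i:i + chunk], b[i:i + chunk]) for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return sum(pool.map(_partial_dot, pieces))


def application_logic() -> float:
    """Main program (the PPE side): calls the library and is unaware of the offload."""
    return dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])


result = application_logic()
print(result)
```

Only the library is rewritten in this model, which is why it is the easiest path for existing applications.
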
References
• "Synergistic Processing in Cell's Multicore Architecture" (PDF). IEEE. Retrieved 2007-03-22.
• "Cell Designer talks about PS3 and IBM Cell Processors". Retrieved 2007-03-22.
• "Cell Broadband Engine Interconnect and Memory Interface" (PDF). IBM. Retrieved 2007-03-22.
• http://en.wikipedia.org/wiki/Cell_(microprocessor)

IBM Cell

  • 1. Design Issues of IBM Cell Architecture. Vitthal Gutthe (MEIT 1326), Pravin kumar Yadav (MEIT 1338), Vyanktesh Dorlikar (MEIT 1324)
  • 2. Contents: General introduction; History of development; Technical overview of architecture; Detailed technical discussion of components; Design choices; Cell programming issues
  • 3. History of Development: Sony PlayStation 2: released March 2000 in Japan, 128-bit "Emotion Engine", MIPS CPU clocked at 294 MHz, capable of 6.2 GFLOPS (giga floating-point operations per second)
  • 4. History Continued: Partnership between Sony, Toshiba, and IBM formed in summer 2000; Initial goal of 1000 × PS2 power in a single machine; March 2001: Sony-IBM-Toshiba design center opened with a $400 million investment
  • 5. Overall Goals for Cell: High performance in multimedia applications; Real-time performance; Minimal power consumption; Cost as low as possible; Available by 2005; Avoid memory latency issues associated with control structures
  • 6. The Cell Itself: PowerPC-based main core (PPE); Multiple Synergistic Processing Elements (SPEs); On-die memory controller; Inter-core transport bus; High-speed I/O
  • 7. Cell Die Layout
  • 8. Cell Implementation: Cell is an architecture; Preliminary implementation: 1 PPE, 8 SPEs (1 disabled to improve yield, leaving 7 usable in the PS3), 221 mm² die on a 90 nm process, clocked at 3-4 GHz, 256 GFLOPS single precision at 4 GHz
  • 9. Why a Cell Architecture: Follows a trend in computer architecture; Natural extension of dual- and multi-core designs; Extremely low hardware overhead; Software controllable; Specialized hardware is more useful for multimedia
  • 10. Possible Uses: PlayStation 3 (obviously); Blade servers (IBM): amazing single-precision FP performance, scientific applications; Toshiba HDTV products
  • 11. Power Processing Element (PPE): PowerPC instruction set with AltiVec; Used for general-purpose computing and for controlling the SPEs; Simultaneous multithreading; Separate 32 KB L1 caches and a unified 512 KB L2 cache
  • 12. PPE (cont.): Slow but power-efficient implementation of the PowerPC instruction set; Two-issue, in-order instruction fetch; Conspicuous lack of an instruction window; Compare with conventional PowerPC implementations (G5); Performance depends on SPE utilization
  • 13. Synergistic Processing Element (SPE): Specialized hardware; Meant to be used in parallel (7 in the PS3 implementation); On-chip memory (256 KB); No branch prediction; In-order execution; Dual issue
  • 14. SPE Architecture: 0.99 µm² on 90 nm process; 128 registers, each 128 bits wide (instructions assumed to be 4 × 32-bit); Variant of the VMX instruction set (modified for 128 registers); On-chip memory is NOT a cache
  • 15. SPE Execution: Dual issue, in-order; Seven execution units; Vector logic; 8 single-precision operations per cycle; Significant performance hit for double precision
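The per-cycle figure on this slide can be tied back to the 256 GFLOPS peak quoted earlier in the deck. The sketch below is only that arithmetic; it assumes the 8 operations per cycle come from a 4-wide single-precision multiply-add, and the function and variable names are illustrative, not taken from any Cell toolkit.

```python
def peak_gflops(num_spes: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak single-precision throughput in GFLOPS."""
    return num_spes * flops_per_cycle * clock_ghz


# Assumption: 4 lanes x (multiply + add) = 8 operations per cycle per SPE.
FLOPS_PER_CYCLE = 8

full_chip = peak_gflops(8, FLOPS_PER_CYCLE, 4.0)  # all eight SPEs enabled
ps3_style = peak_gflops(7, FLOPS_PER_CYCLE, 4.0)  # one SPE disabled

print(full_chip, ps3_style)
```

Under these assumptions the 256 GFLOPS figure corresponds to eight active SPEs at 4 GHz; with seven SPEs the same arithmetic gives 224 GFLOPS.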
  • 16. SPE Execution Diagram
  • 17. SPE Local Storage Area: NOT a cache; 256 KB, organized as 4 × 64 KB ECC single-port SRAM; Completely private to each SPE; Directly addressable by software; Can be used as a cache, but only under software control; No tag bits or any other extra hardware
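To illustrate what "used as a cache, but only under software control" can mean, here is a minimal sketch of a direct-mapped cache whose tags live in ordinary program data rather than in hardware. The line size, line count, and class name are hypothetical choices for illustration, and the slice copy stands in for a DMA transfer from main memory.

```python
LINE_SIZE = 128  # bytes per line (hypothetical choice)
NUM_LINES = 64   # 64 x 128 B = 8 KB carved out of the 256 KB local store


class SoftwareCache:
    """Direct-mapped cache managed entirely by software."""

    def __init__(self, main_memory: bytes):
        self.main_memory = main_memory
        self.tags = [None] * NUM_LINES   # tags are plain data, not hardware bits
        self.lines = [b""] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def read_byte(self, address: int) -> int:
        line_addr = address // LINE_SIZE
        index = line_addr % NUM_LINES
        if self.tags[index] != line_addr:
            # Miss: software fetches the line itself (stand-in for a DMA get).
            start = line_addr * LINE_SIZE
            self.lines[index] = self.main_memory[start:start + LINE_SIZE]
            self.tags[index] = line_addr
            self.misses += 1
        else:
            self.hits += 1
        return self.lines[index][address % LINE_SIZE]


memory = bytes(range(256)) * 64  # 16 KB of sample "main memory"
cache = SoftwareCache(memory)
values = [cache.read_byte(a) for a in (0, 1, 127, 128, 0)]
print(values, cache.hits, cache.misses)
```

Every lookup pays for the tag comparison in instructions, which is the cost of having no tag hardware.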
  • 18. SPE LS Scheduling: Software-controlled DMA; DMA to and from main memory; Scheduling is a HUGE problem (done primarily in software, IBM predicts 80-90% usage ideally); Request queue handles 16 simultaneous requests (up to 16 KB per transfer, priority: DMA, L/S, fetch); Fetch/execute parallelism
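One common way to get fetch/execute parallelism out of software-controlled DMA is double buffering: request the next chunk before working on the current one. The sketch below shows only the scheduling pattern. `dma_get` is a hypothetical stand-in that copies synchronously, whereas a real transfer would run in the background, and the checksum is a placeholder for real work.

```python
MAX_DMA_BYTES = 16 * 1024  # per-transfer limit quoted on the slide


def dma_get(main_memory: bytes, offset: int, size: int) -> bytes:
    """Hypothetical stand-in for an asynchronous DMA read into local store."""
    return main_memory[offset:offset + size]


def checksum_stream(main_memory: bytes, chunk_size: int = MAX_DMA_BYTES) -> int:
    """Sum all bytes, alternating between two local buffers."""
    assert 0 < chunk_size <= MAX_DMA_BYTES
    offsets = list(range(0, len(main_memory), chunk_size))
    if not offsets:
        return 0
    buffers = [b"", b""]
    buffers[0] = dma_get(main_memory, offsets[0], chunk_size)  # prime buffer 0
    total = 0
    for i in range(len(offsets)):
        current = i % 2
        if i + 1 < len(offsets):
            # Request the next chunk into the other buffer before computing.
            buffers[1 - current] = dma_get(main_memory, offsets[i + 1], chunk_size)
        total += sum(buffers[current])  # "execute" on the current chunk
    return total


data = bytes(range(256)) * 200  # 51,200 bytes, i.e. four transfers
print(checksum_stream(data) == sum(data))
```

With truly asynchronous transfers, the compute step on one buffer would overlap the fetch into the other, at the cost of reserving two buffers in the local store.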
  • 19. SPE Control Logic: Very little in comparison; Represents a shift in focus; Complete lack of branch prediction (software branch prediction, loop unrolling, 18-cycle penalty); Software-controlled DMA
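A rough cost model shows why loop unrolling matters when every missed branch costs 18 cycles. This is a deliberately simplified sketch: it assumes each loop-closing branch pays the full penalty and ignores software branch hints, and the body cost of 4 cycles is an arbitrary illustrative number.

```python
BRANCH_MISS_PENALTY = 18  # cycles, from the slide


def loop_cycles(iterations: int, body_cycles: int, unroll: int = 1) -> int:
    """Toy estimate: useful work plus one full penalty per loop-closing branch."""
    branches = -(-iterations // unroll)  # ceiling division
    return iterations * body_cycles + branches * BRANCH_MISS_PENALTY


rolled = loop_cycles(1024, 4)              # one branch per iteration
unrolled = loop_cycles(1024, 4, unroll=8)  # one branch per 8 iterations
print(rolled, unrolled)
```

In this toy model, unrolling by 8 cuts the total from 22528 to 6400 cycles, because the branch overhead shrinks while the useful work stays the same.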
  • 20. SPE Pipeline: Little ILP, and thus little control logic; Dual issue; Simple commit unit (no reorder buffer or other complexities); Same execution unit for FP and integer
  • 21. SPE Summary: Essentially a small vector computer; Based on the AltiVec/VMX ISA (extensions for DMA and LS management, extended for a 128 × 128-bit register file); Uniquely suited for real-time applications; Extremely fast for certain FP operations; Offloads a large amount of work onto the compiler and software
  • 22. Element Interconnect Bus (EIB): 4 concentric rings connecting all Cell elements; 128-bit-wide interconnects
  • 23. EIB (cont.): Designed to minimize coupling noise; Rings carry data in alternating directions; Buffers and repeaters at each SPE boundary; Architecture can be scaled up at the cost of increased bus latency
  • 24. EIB (cont.): Total bandwidth of ~200 GB/s; EIB controller physically located in the center of the chip, between the SPEs; Controller reserves channels for each individual data transfer request; Implementation allows the SPEs to be extended horizontally
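Because the rings carry data in alternating directions, a controller can pick whichever direction gives the shorter path for a given transfer. The sketch below illustrates that choice only. The count of 12 ring stops and the stop numbering are assumptions for illustration, and the real arbitration logic is not described in the slides.

```python
def ring_route(src: int, dst: int, num_stops: int = 12) -> tuple[str, int]:
    """Pick the ring direction with the fewest hops between two stops."""
    clockwise = (dst - src) % num_stops
    counterclockwise = (src - dst) % num_stops
    if clockwise <= counterclockwise:
        return ("clockwise", clockwise)
    return ("counterclockwise", counterclockwise)


print(ring_route(0, 3))   # short hop in one direction
print(ring_route(0, 10))  # shorter the other way round
```

With both directions available, no transfer in this model needs more than half the ring, which is one reason adding stops raises latency only gradually.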
  • 25. Memory Interface: Rambus XDR memory to keep Cell at full utilization; 3.2 Gbps data rate per pin on each device connected to the XDR interface; Cell uses dual-channel XDR with four devices and 16-bit-wide buses to achieve 25.6 GB/s total memory bandwidth
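The total memory bandwidth follows from the other figures on the slide. The sketch below multiplies them out, treating 3.2 Gbps as a per-pin data rate (an assumption about how the figure is meant); the function name is illustrative.

```python
def xdr_bandwidth_gbytes(gbps_per_pin: float, bus_width_bits: int, num_devices: int) -> float:
    """Aggregate bandwidth in GB/s: per-pin rate x pins per device x devices / 8."""
    return gbps_per_pin * bus_width_bits * num_devices / 8


total = xdr_bandwidth_gbytes(3.2, 16, 4)
print(total)
```

Four 16-bit devices at 3.2 Gbps per pin come to 25.6 GB/s, the peak figure usually quoted for Cell's XDR interface.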
  • 26. Input/Output Bus: Rambus FlexIO bus; I/O interface consists of 12 unidirectional byte lanes; Each lane supports 6.4 GB/s bandwidth; 7 outbound lanes and 5 inbound lanes
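Taking the per-lane figure on this slide at face value, the aggregate I/O bandwidth in each direction is a simple product. This sketch does nothing more than that arithmetic, and the constant names are illustrative.

```python
LANE_GBYTES_PER_S = 6.4  # per-lane bandwidth from the slide
OUTBOUND_LANES = 7
INBOUND_LANES = 5

outbound = OUTBOUND_LANES * LANE_GBYTES_PER_S
inbound = INBOUND_LANES * LANE_GBYTES_PER_S
total_io = outbound + inbound

print(round(outbound, 1), round(inbound, 1), round(total_io, 1))
```

The asymmetric split gives more bandwidth leaving the chip than entering it.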
  • 27. Design Choices: In-order execution (abandoning ILP; ILP: 10-20% increase per generation; reducing control logic; real-time responsiveness); Cache design (software configuration on the SPEs; standard L2 cache on the PPE)
  • 28. Cell Programming Issues: No Cell compiler exists to manage SPE utilization at compile time; SPEs do not natively support context switching, which must be managed by the OS; SPEs are vector processors and are not efficient for general-purpose computation; The PPE and SPEs use different instruction sets
  • 29. Cell Programming (cont.): Functional offload model; Simplest model for Cell programming; Optimize existing libraries for SPE computation; Requires no rebuild of the main application logic, which runs on the PPE
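The functional offload model can be pictured as a library routine that keeps its signature while farming the work out to accelerator cores internally. The sketch below imitates that shape with a thread pool standing in for the SPEs. All names are hypothetical, and Python threads only model the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 7  # stand-in for the SPEs available on the PS3


def _scale_chunk(chunk: list, factor: float) -> list:
    """Work that would run on one SPE."""
    return [x * factor for x in chunk]


def scale(values: list, factor: float) -> list:
    """Library routine: same signature as a serial version, offloaded inside."""
    if not values:
        return []
    chunk_len = -(-len(values) // NUM_WORKERS)  # ceiling division
    chunks = [values[i:i + chunk_len] for i in range(0, len(values), chunk_len)]
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        parts = list(pool.map(_scale_chunk, chunks, [factor] * len(chunks)))
    return [x for part in parts for x in part]


def main_application() -> list:
    """'PPE-side' logic: calls the library and never sees the offload."""
    return scale(list(range(10)), 3)


print(main_application())
```

Only the library body changes, which is why this model needs no rebuild of the main application logic.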
  • 30. References: "Synergistic Processing in Cell's Multicore Architecture" (PDF), IEEE, retrieved 2007-03-22; "Cell Designer talks about PS3 and IBM Cell Processors", retrieved 2007-03-22; "Cell Broadband Engine Interconnect and Memory Interface" (PDF), IBM, retrieved 2007-03-22; http://en.wikipedia.org/wiki/Cell_(microprocessor)