SlideShare une entreprise Scribd logo
1  sur  86
Télécharger pour lire hors ligne
Kish Hirani
Sebastien Schertenleib
Sony Computer Entertainment Europe
Research & Development Division
Slide 2
• PlayStation® Development Resources
• Architecture of the PlayStation®3 (PS3™)
• Case studies
– Cover techniques used by 1st party titles for both the PSP®
(PlayStation®Portable) and PS3™
• PlayStation® Advanced Programming
– Graphics performance
– PS3™ SPE utilisation
What We Will Be Covering
Kish Hirani
Head of Developer Services
Sony Computer Entertainment Europe
(SCEE)
Developer Services
SCEE - Research & Development Division
Slide 4
Developer Services
• Great Marlborough Street in
Central London
• Cover ‘PAL’ Region
Developers
Slide 5
• Support – SCE DevNet
• Field Engineering
• Technical and Strategic Consultancy
• Technical Training
• TRC Consultancy
• Projects
What We Cover
Slide 6
PlayStation® DevNet
Slide 7
PS3 DevKit
Latest additions to Dev Tools
PS3 Test Kit
Slide 8
Existing PSP DevKit
Latest additions to Dev Tools
PSPgo
Commander Arm
Slide 9
• PS3 DevKit: €1,700
• PS3 TestKit: €900
• PSP DevKit: €1,200
• PSP Commander: €300 euros
Dev Tools Prices
Slide 10
• Option for Digital Distribution via
PlayStation® Store as well as disc
• Same with full PSP titles
• And now minis
Digital Distribution Options
Slide 11
• PSN available in 58 countries
– 12 languages, 22 currencies
– Including Russia! 90% of all titles release are now localised.
• 700+ games and demos available
• 600+ million items downloaded worldwide
• More than 130K registered in Russia
• Cumulative Russia PSN Sales €650K
– as of FY2009 Q3
PlayStation®Network Vital Statistics
Slide 12
• 52.9 million globally (as of end June 2009)
– 17 million in SCEE Region
– More than a 850K in Russia
• Russia Tie Ratio: 2:1
PSP Vital Statistics
Slide 13
• 23.7 million PS3s (as of end of June 2009)
– One million of the new model sold in three weeks in
September
• More than 190K in the Russia so far
– PlayStation and PS2 both sold more than 2 million in
Russia during their lifespan (PS2 still selling well)
• 68% of PS3 users in Russia go online
• Russia Tie Ratio 5:1
PS3 Vital Statistics
Slide 14
https://www.tpr.scee.net/
Where to start if you would like to become a registered PlayStation Developer
Slide 15
http://www.worldwidestudios.net/xdev
For registered developers who wish to be considered as a SCEE 2nd Partner Developer
Dr Sebastien Schertenleib
Developer Services - SCEE Research & Development
PS3™ Hardware Overview
Slide 17
• Console vs PC Development
• Main Memory
• RSX™
– Bandwidth, Pipeline and Data Placement
• Cell Broadband Engine™
– Element Interconnect Bus, PPE and SPE
• Benefits of SPU Programming
• Data Storage
• Peripherals
PS3™ Hardware Overview
Slide 18
PC vs Console Game Development
PC
• Open Platform
• Limitless Resources
• Generic API
• IHV Drivers
Console
• Closed Platform
• Fixed Hardware
• Tiny HW Abstraction
• Direct access
Slide 19
• High performance on a budget
• Fixed hardware target with a long life cycles
– Games have to get better every year
– Waiting for hardware to catch up with software is not an option
• Have to understand the strengths and weaknesses of the
hardware
Why Consoles Are Different
Slide 20
• Performance benefits from understanding of how the
hardware works
– Does not mean coding in assembly
– Need to have an awareness of how software design may affect
performance
– Generally code optimized for consoles run also faster on PC
Why Consoles Are Different
Slide 21
• Memory is a very finite resource
• Memory access is slow compare to processor speed
– This gulf is getting larger on all platforms 
– Cache is trying to insulate the application from memory latency
– But cache on consoles are generally smaller than on a PC
Why Consoles Are Different
Slide 22
System Architecture
Slide 23
Main Memory
Slide 24
• Traditionally, heap function is called (malloc/new)
– Has to be contiguous
– Fragmentation possible
• Recommend acquiring memory through mmapper
– Build heap on top of this
– Non-contiguous pages mapped to contiguous address space
– Not subject to fragmentation
– More control: page sizes, access rights
Memory Management
Slide 25
• Games will typically have their own memory manager
– Small amount of memory compared to PC
– Conservative with data structures and memory allocations
• Different strategies for different uses
– Memory pools, custom heaps, memory tracking
• Understand the fragmentation, performance and space implications
for each tool and apply them appropriately
Memory Management
Slide 26
• Fetching data from memory into a processor can easily dominate
performance
– L1 cache miss 10~30 cycles
– L2 cache miss ~300 cycles
• Most applications exhibit locality of code or data
– A loop executes the same small set of instructions on data located
contiguously in memory
• Caches aim to increase overall system performance by taking advantage
of this
– Fast on die memory is very expensive so they are limited in size
Caches
Slide 27
Vertex
Shader
Setup/
Rasteriser
Fragment
Shader
Raster
Operations
GDDR
GDDR
RSX™
RSX™
Slide 28
• RSX™ has access to two memories
• Local Memory
–249MB usable (256MB – 7MB reserved by OS)
• System Memory
–213MB usable (256MB – 43MB reserved by OS)
–RSX™ usable through memory mapping by application
in 1MB blocks
Memory Setup
Slide 29
Memory Architecture
Main Memory
Cell RSX™
Local Memory
Slide 30
System Bandwidth
25.6GB/s 20.8GB/s
Audio Out
Video Out
15GB/s
20GB/s
Audio
I/F
Video
I/F
Cell
Cell
I/F
GPU
Main Memory Local Memory
Slide 31
Vertex
Shader
Triangle
Setup
Rasteriser Fragment
Shader
Raster
Operations
Vertex/Index DataVertex/Index Data
Local MemoryMain Memory
RSX™ Pipeline
Slide 32
Vertex
Shader
Rasteriser Fragment
Shader
Main Texture
Cache
Local Texture
Cache
Textures Textures
Triangle
Setup
Raster
Operations
Local MemoryMain Memory
RSX™ Pipeline
Slide 33
Z-Cull
Feedback
Fragment
Shader
Texture/
Framebuffer
Texture/
Framebuffer
Raster
Operations
Vertex
Shader
Triangle
Setup
Rasteriser
Local MemoryMain Memory
RSX™ Pipeline
Slide 34
• Able to utilise graphics data in either memory
– Vertex attributes, Textures
• Frame Buffer stores colour & depth surfaces
– Can also be in either memory
– Local memory supports colour & Z compression for MSAA modes
• Saves bandwidth rather than space
• Generally local memory accesses are faster
• If local memory bandwidth is bottleneck → consider moving
assets to system memory
Data Placement
Slide 35
How triangles end up on screen
RSX™
Local Memory
Cell
Main Memory
Verts
Textures
Shaders
Verts
Textures
ShadersCmd Buffer
Slide 36
Cell Broadband Engine™
Slide 37
Cell Broadband Engine™
• Processor core of PS3™
– Clocked at 3.2GHz
• Contains 7 processing cores.
– PowerPC Processor Element (PPE)
• 512k cache memory
– 6 Synergistic Processing Elements (SPE)
• 256k local memory store
Slide 38
Inputs and Outputs
Slide 39
• Memory Interface Controller (MIC)
– Controls the 2 Rambus XDR channels.
– Theoretical 25.6GB/s transfer
• IOIF 0 - Connection to RSX™
– 20 GB/s to RSX™
– 15 GB/s from RSX™
• IOIF1 – Connection to South Bridge
– 2.5GB/s to/from devices
Inputs and Outputs
Slide 40
Element Interconnect Bus
Slide 41
• This connects all units in the CBE
• Consist of 4 128bit buses
• All data transfers are 128 byte packets
– Smaller transfers sent as partial packed
Element Interconnect Bus
Slide 42
Element Interconnect Bus
Slide 43
Element Interconnect Bus
Slide 44
PowerPC Processing Element
Slide 45
• 64 bit CPU core
• 25.6 GFLOPs FPU
• In order execution
• Execution units include:
– IEEE double precision FPU
– Integer
– Float vector unit (VMX)
PowerPC Processing Element
Slide 46
PPU
E
I
B
PowerPC Processing Element
Slide 47
• Intrinsics are supported, preferable to inline assembly
–Scheduling and optimisation
• Two hardware threads
–2 or more active software threads act to mitigate stalls
• Good use of caches can make big difference in
performance
PPE Performance
Slide 48
• Power PC issue that can severely affect performance
• PPU has a store queue, that queues up stores waiting to
get sent to the cache
• If a load occurs that conflicts with data in store queue, the
load will stall until the data is stored
– Pipeline stall, around 40 cycles
• Find using Tuner
– LHS performance counter
Load Hit Store
Slide 49
• Type conversion
– Integer, floating point or vector registers
– Moves have to go through memory as there are no asm
instructions to move data between register sets
• Stack access - when calling a function that has:
– Large parameters - will be passed on the stack
– Many parameters - some may end up on the stack
– Few instructions - with stack based parameters
LHS Cases
Slide 50
Synergistic Processor Elements
Slide 51
SPE Architecture
• 3.2GHz processor
• 128 Vector registers
• 128 bit
• No scalar registers
• 256kB Local Store
• Very fast, like L1 cache
• Runs Asynchronously from rest of system
• Independent Memory Flow Controller (MFC)
• SIMD (single instruction multiple data)
• No Operating System, Very few system calls
• Performance is high and deterministic
SPE
EIB
MFC
SPU
Local Store
Slide 52
• SPE means Synergistic Processing Element
– Mutual dependence between PPE and SPEs
• PPE runs operating system and performs top level control
– General Purpose Hardware (PowerPC)
• SPEs provide bulk of application performance
– Simple hardware
– Simple pipeline
– Simple memory architecture
– Massive computational horsepower
SPEs
Slide 53
• Simple, predictable instruction set
– Has full SIMD support
– Intrinsics are supported for all instructions
• Optimised compiler
– Handles scalar conversion
– Often as good as PPE for scalar code, can be quicker
• Easy to program; can be programmed in C/C++
• Core power of system, key to performance
• Frees up RSX™, frees up PPE
Synergistic Processor Elements
Slide 54
• Ported PPU code usually faster for optimised & non-optimised code
– Port your code
• Easy to program; with some limitations can be programmed in C/C++
• Scalar code can be run
– Full SIMD instruction set
• Multithreaded engine analogy - can move threads to SPUs
SPU Development
Slide 55
• Raw
– You own the hardware
– Complete control, direct hardware access
– Once loaded – stays resident
• SPU Threads
– The OS owns the hardware
– Must belong to a thread group
– Scheduled by Cell OS Lv-2 kernel – based on group priority
• SPURS
– Higher level abstraction
– Plays well with middleware
Ways of Using SPUs
Slide 56
• Runs on-top of SPU Threads
– Small kernel on SPU
• Thread switching more efficient
– Requires no PPU resources
– PPU SPURS Handler: only to restore thread group when necessary
• Easier to synchronise threads
• Easier to balance load on multiple SPUs
– SPUs grab work when available
SPURS (SPU Run Time System)
Slide 57
SPURS (SPU Run Time System)
Policy Module
SPURS Kernel
Work
Load
Policy Module
SPURS Kernel
Work
Load
Policy Module
SPURS Kernel
SPURS
Handler
PPU SPU0 SPU1 SPU2
Thread Group for SPURS
• Implemented as an
SPU Thread Group
• Kernel, Policy
Module, Workload
• Minimal PPU
Overhead
Work
Load
Slide 58
• Jobs
– A chain of execution units
– Suited to short execution times
– Automatic DMA transfer of input & output data
– Executed with associated data
– No context switching – run complete
• Tasks
– Different data model, larger programs
– Can yield - synchronisation mechanisms
– Optimised context switch
– Similar to threads
SPURS Workload Types
Slide 59
• SPURS JobChain uses manual control of the execution sequence
• SPURS JobQueue is designed for dynamic job submission in parallel
SPURS Job Queue
SPU
SPU
SPURS Job Queue
PPU Fiber
Port
PPU Thread
SPU
Submitted jobs
SPURS Task
submitting in parallel executing in parallel
Slide 60
Example Application 1
Slide 61
• Problem: Update an array of objects and submit the visible
ones for rendering
• Object.update() method was bottleneck on PPE
– Determined using SN Tuner
– This function generates the world space matrices for each
object
– Embarrassingly parallel
• No data dependencies between objects
• If we had 1 processor for each object we could do that 
Function off load example using a SPURS Task
Slide 62
SPURS Tasks
//---Conditionally calling SPU Code
if (usingSPU==false) //---------------------------generic version
{
for(int i=0; i<OBJECT_COUNT; i++)
{
testObject[i].update();
}
}
else //-------------------------------------------SPU version
{
int test=spurs.ThrowTaskRunComplete(taskUpdateObject_elf_start,
(int)testObject,
OBJECT_COUNT,0,0);
//could do something useful here…
int result = spurs.CatchTaskRunComplete(test);
}
Slide 63
• Getting data in and out of SPU is first problem
• Get this working before worrying about actual processing
– Brute force often works just fine
• DMA entire object array into SPE memory
• Run update method
• DMA entire object array back to main memory
• List DMA allows you to get fancy
– Gather and Scatter operation
• If the data set is really big or if you need every last drop of performance
– Streaming model
• Overlap DMA transfers with processing
• Double or quad buffering
SPURS Task
Slide 64
SPURS Task: Getting Data in and out
int cellSpuMain(.....)
{
int sourceAddr = ....; //source address
int count = ....; //number of data elements
int dataSize = count * sizeof(gfxObject); //amount of memory to get
gfxObject *buf = (gfxObject*)memalign(128,dataSize); //alloc mem
DmaGet((void*)buf,sourceAddr,dataSize);
DmaWait(....);
//---data is loaded at this point so do something interesting
DmaPut((void*)buf,sourceAddr,dataSize);
DmaWait(....);
return(0);
}
Slide 65
• Keep code the same as PPE Version
– Conditional compilation based on processor type
– Might have to split code into a separate file
SPURS Task: Executing the Code
//--- SPU code to call the update method
//--- data is loaded at this point so do something interesting
//--- step through array of objects and update
gfxObject *tp;
for(int i=0; i<count; i++)
{
tp = &buf[i];
tp->update(); //same method as on PPE compiled for SPU
}
Slide 66
• 5x Speed up in this instance
– Brute force solution
– Could add more SPUS
– Problem is embarrassingly parallel
• Note that the same source code for the method is being
used on PPU and SPU
– Code stays in sync
– No fancy SPU specific optimisations
SPURS Task: Results
Slide 67
• PPU Stalled while waiting for SPU to Process
• SPU Data get and put were synchronous
– Not using hardware to its fullest extent
– Easy to get more if we need it
SPURS Task: Time Waster
PPU PPU
GET PUTExec
TIME
SPU
Slide 68
Example Application 2
Slide 69
Physics Solver
Parallel SPU x5PPU
512
22ms
3200
23ms
Slide 70
PPU Solver
• Collision detection and response is a big, involved process
0
50
100
150
200
250
0 100 200 300 400 500 600
Time(ms)
Step
Integrate
Collision
Response
3200 boxes
Slide 71
Development History
0
50
100
150
200
250
PPU x1 SPU x1 SPU x5 SPU x5 (Optimised)
Time(ms)
Total
Time(ms)
Slide 72
SPU solver
0
5
10
15
20
25
0 100 200 300 400 500 600
Time(ms)
Step
Integrate
Collision
Response
Time(ms)
Slide 73
• Simple pipeline
– No stalls like LHS
• No Operating System, very few system calls
– Performance is high and deterministic
• Memory is very fast like L1 cache
– Memory transfers a asynchronous so can be hidden
• Even if performance is comparable, frees up the PPE to do other stuff
SPEs are faster than the PPE
Slide 74
• Double buffer
• Align data and transfers optimally
• Vectorise data: SOA instead of AOS
• Use SIMD
– 4 operations at once = 4x speed up
• Use intrinsics
• Program directives - branch hinting
Further SPU Optimisations
Slide 75
Double-Buffering
Buffer0 Buffer0 Buffer0 Buffer0
Buffer1 Buffer1 Buffer1 Buffer1
DMA Computation
• Allows overlapping of DMA and computation
• Transfer to/from one buffer while using data from another
• Use different tag group for each buffer
• Depends on space available in LS
TIME
Slide 76
• Graphics API
• Multistream – Audio
• Bullet – Physics
• PlayStation®Edge – Geometry/Compression/Animation
• FIOS – Optimised disc access
• Various Codecs
Our Libraries that run on SPUs
Slide 77
• Game engine including
– Modular run time
– Samples, docs and whitepapers
– Full source and art work
• Optimised for multi-core platforms especially PS3™
• Extractable and reusable components
PhyreEngine™
Slide 78
• A lot of 3rd party middleware partners use the SPUs
– Physics
– Audio
– AI
– Game Engines
– Many more…
• Examples by guest speakers at the end of the talk
– NVIDIA® PhysX ®, Epic Games Unreal ® Engine3
and Crytek CryENGINE ® 3
In Addition
Slide 79
• Animation
• AI Threat prediction
• AI Line of fire
• AI Obstacle avoidance
• Collision detection
• Physics
• Particle simulation
• Particle rendering
• Scene graph
Killzone®2 SPU Usage
• Display list building
• IBL Light probes
• Graphics post-processing
• Dynamic music system
• Skinning
• MP3 Decompression
• Edge Zlib
• etc.
44 Job types
Slide 80
• Effect done on SPU
– Motion Blur, Bloom, Depth of Field
– Quarter-res image on XDR
• SPUs assist RSX™
1. RSX™ prepares low-res image buffers in XDR
2. RSX™ triggers interrupt to start SPUs
3. SPUs perform image operations
4. RSX™ already starts on next frame
5. Result of SPUs processed by RSX™ early in next frame
Post Processing
Slide 81
Resolve to quarter resolution
Image generated on the SPUs
(bloom, DoF, motion blur)
Final image by the RSX™
(Blending the out-of-focus and blurry areas in)
XDR DDR
Slide 82
South Bridge
Slide 83
• Faster than Blu-ray
– Over 20MB/s vs. 9MB/s of Blu-ray
– Cache data to HDD for faster subsequent load times
– Install titles
– Stream data
• Virtual memory
• Decrease usage of system memory
Storage – Hard drive
Slide 84
• High capacity
– 25GB single layer
– 50GB dual layer
• Constant linear velocity
– Constant read speed over disc - 9 MB/s
– Optimise layout, duplicate and pack data
• Use compression
– Lowers bandwidth cost and lowers read time
– Use SPUs to decompress – Edge Zlib
• Can emulate on Devkit
Blu-ray
Slide 85
• Can access both Blu-ray drive (~9 MB/s) and
HDD (~20MB/s) simultaneously
– On HDD, data come from the system cache and
game data partitions
• SPU Zlib decompression
– Increase the throughput
• libFIOS
– Optimize I/O transfers
– Ease background HDD caching implementation
PS3TM I/O Performance
System
Cache
(2GB)
25 ms apart
Game Data
(up to 5GB
per title)
Slide 86
Any questions so far?

Contenu connexe

Tendances

Introduction into Procedural Content Generation by Yogie Aditya
Introduction into Procedural Content Generation by Yogie AdityaIntroduction into Procedural Content Generation by Yogie Aditya
Introduction into Procedural Content Generation by Yogie AdityagamelanYK
 
Qualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing Cameras
Qualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing CamerasQualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing Cameras
Qualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing CamerasQualcomm Developer Network
 
Game engines and Their Influence in Game Design
Game engines and Their Influence in Game DesignGame engines and Their Influence in Game Design
Game engines and Their Influence in Game DesignPrashant Warrier
 
Technologies Used In Graphics Rendering
Technologies Used In Graphics RenderingTechnologies Used In Graphics Rendering
Technologies Used In Graphics RenderingBhupinder Singh
 
98 374 Lesson 02-slides
98 374 Lesson 02-slides98 374 Lesson 02-slides
98 374 Lesson 02-slidesTracie King
 
XNA for Windows Phone
XNA for Windows PhoneXNA for Windows Phone
XNA for Windows PhoneEd Donahue
 
Output devices
Output devicesOutput devices
Output devicesyogiraj001
 
98 374 Lesson 04-slides
98 374 Lesson 04-slides98 374 Lesson 04-slides
98 374 Lesson 04-slidesTracie King
 
Computer Output Devices
Computer   Output DevicesComputer   Output Devices
Computer Output DevicesFaraz Ahmed
 
Console design
Console designConsole design
Console designcrimzon36
 
98 374 Lesson 05-slides
98 374 Lesson 05-slides98 374 Lesson 05-slides
98 374 Lesson 05-slidesTracie King
 
Output device presentation
Output device presentationOutput device presentation
Output device presentationSuraj Neupane
 
Motion graphics and_compositing_video_analysis_worksheet_3
Motion graphics and_compositing_video_analysis_worksheet_3Motion graphics and_compositing_video_analysis_worksheet_3
Motion graphics and_compositing_video_analysis_worksheet_3megrobbo95
 
Fundamentals of computer output devices
Fundamentals of computer output devicesFundamentals of computer output devices
Fundamentals of computer output devicesJesus Obenita Jr.
 
The Ultimate Gaming
The Ultimate GamingThe Ultimate Gaming
The Ultimate Gamingkoolshreeram
 

Tendances (20)

Introduction into Procedural Content Generation by Yogie Aditya
Introduction into Procedural Content Generation by Yogie AdityaIntroduction into Procedural Content Generation by Yogie Aditya
Introduction into Procedural Content Generation by Yogie Aditya
 
Qualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing Cameras
Qualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing CamerasQualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing Cameras
Qualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing Cameras
 
Game engines and Their Influence in Game Design
Game engines and Their Influence in Game DesignGame engines and Their Influence in Game Design
Game engines and Their Influence in Game Design
 
Technologies Used In Graphics Rendering
Technologies Used In Graphics RenderingTechnologies Used In Graphics Rendering
Technologies Used In Graphics Rendering
 
98 374 Lesson 02-slides
98 374 Lesson 02-slides98 374 Lesson 02-slides
98 374 Lesson 02-slides
 
XNA for Windows Phone
XNA for Windows PhoneXNA for Windows Phone
XNA for Windows Phone
 
Output devices
Output devicesOutput devices
Output devices
 
98 374 Lesson 04-slides
98 374 Lesson 04-slides98 374 Lesson 04-slides
98 374 Lesson 04-slides
 
Output Device
Output DeviceOutput Device
Output Device
 
Hp envy
Hp  envyHp  envy
Hp envy
 
Computer Output Devices
Computer   Output DevicesComputer   Output Devices
Computer Output Devices
 
Control panel
Control panelControl panel
Control panel
 
Output devices
Output devicesOutput devices
Output devices
 
Console design
Console designConsole design
Console design
 
98 374 Lesson 05-slides
98 374 Lesson 05-slides98 374 Lesson 05-slides
98 374 Lesson 05-slides
 
Output device presentation
Output device presentationOutput device presentation
Output device presentation
 
Motion graphics and_compositing_video_analysis_worksheet_3
Motion graphics and_compositing_video_analysis_worksheet_3Motion graphics and_compositing_video_analysis_worksheet_3
Motion graphics and_compositing_video_analysis_worksheet_3
 
1. case study
1. case study1. case study
1. case study
 
Fundamentals of computer output devices
Fundamentals of computer output devicesFundamentals of computer output devices
Fundamentals of computer output devices
 
The Ultimate Gaming
The Ultimate GamingThe Ultimate Gaming
The Ultimate Gaming
 

Similaire à Sony Computer Entertainment Europe Research & Development Division

CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)byteLAKE
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Community
 
Ca lecture 03
Ca lecture 03Ca lecture 03
Ca lecture 03Haris456
 
The Next Generation of PhyreEngine
The Next Generation of PhyreEngineThe Next Generation of PhyreEngine
The Next Generation of PhyreEngineSlide_N
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterPatrick Quairoli
 
Designing for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleDesigning for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleJames Saint-Rossy
 
Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)AllineaSoftware
 
IS740 Chapter 03
IS740 Chapter 03IS740 Chapter 03
IS740 Chapter 03iDocs
 
Basic Computer Architecture
Basic Computer ArchitectureBasic Computer Architecture
Basic Computer ArchitectureYong Heui Cho
 
The Central Processing Unit(CPU) for Chapter 4
The Central Processing Unit(CPU) for Chapter 4The Central Processing Unit(CPU) for Chapter 4
The Central Processing Unit(CPU) for Chapter 4MKKhaing
 

Similaire à Sony Computer Entertainment Europe Research & Development Division (20)

CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
 
Ca lecture 03
Ca lecture 03Ca lecture 03
Ca lecture 03
 
Architecture of high end processors
Architecture of high end processorsArchitecture of high end processors
Architecture of high end processors
 
Chapter Three
Chapter ThreeChapter Three
Chapter Three
 
The Next Generation of PhyreEngine
The Next Generation of PhyreEngineThe Next Generation of PhyreEngine
The Next Generation of PhyreEngine
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage Cluster
 
Designing for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleDesigning for High Performance Ceph at Scale
Designing for High Performance Ceph at Scale
 
Coa presentation3
Coa presentation3Coa presentation3
Coa presentation3
 
Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)
 
IS_Ch03.ppt
IS_Ch03.pptIS_Ch03.ppt
IS_Ch03.ppt
 
Is ch03
Is ch03Is ch03
Is ch03
 
IS740 Chapter 03
IS740 Chapter 03IS740 Chapter 03
IS740 Chapter 03
 
Cpu
CpuCpu
Cpu
 
Computer !
Computer !Computer !
Computer !
 
Basic Computer Architecture
Basic Computer ArchitectureBasic Computer Architecture
Basic Computer Architecture
 
Chap4.ppt
Chap4.pptChap4.ppt
Chap4.ppt
 
Chap4.ppt
Chap4.pptChap4.ppt
Chap4.ppt
 
Chap4.ppt
Chap4.pptChap4.ppt
Chap4.ppt
 
The Central Processing Unit(CPU) for Chapter 4
The Central Processing Unit(CPU) for Chapter 4The Central Processing Unit(CPU) for Chapter 4
The Central Processing Unit(CPU) for Chapter 4
 

Plus de Slide_N

New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiSlide_N
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSlide_N
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60 Slide_N
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomSlide_N
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationSlide_N
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignSlide_N
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotSlide_N
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: TheorySlide_N
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMSlide_N
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Slide_N
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionSlide_N
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerSlide_N
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesSlide_N
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 
MLAA on PS3
MLAA on PS3MLAA on PS3
MLAA on PS3Slide_N
 
SPU gameplay
SPU gameplaySPU gameplay
SPU gameplaySlide_N
 
Insomniac Physics
Insomniac PhysicsInsomniac Physics
Insomniac PhysicsSlide_N
 
SPU Shaders
SPU ShadersSPU Shaders
SPU ShadersSlide_N
 
SPU Physics
SPU PhysicsSPU Physics
SPU PhysicsSlide_N
 
Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Slide_N
 

Plus de Slide_N (20)

New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTM
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of Destruction
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM Power
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution Continues
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
MLAA on PS3
MLAA on PS3MLAA on PS3
MLAA on PS3
 
SPU gameplay
SPU gameplaySPU gameplay
SPU gameplay
 
Insomniac Physics
Insomniac PhysicsInsomniac Physics
Insomniac Physics
 
SPU Shaders
SPU ShadersSPU Shaders
SPU Shaders
 
SPU Physics
SPU PhysicsSPU Physics
SPU Physics
 
Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2
 

Dernier

Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 

Dernier (20)

Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 

Sony Computer Entertainment Europe Research & Development Division

  • 1. Kish Hirani Sebastien Schertenleib Sony Computer Entertainment Europe Research & Development Division
  • 2. Slide 2 • PlayStation® Development Resources • Architecture of the PlayStation®3 (PS3™) • Case studies – Cover techniques used by 1st party titles for both the PSP® (PlayStation®Portable) and PS3™ • PlayStation® Advanced Programming – Graphics performance – PS3™ SPE utilisation What We Will Be Covering
  • 3. Kish Hirani Head of Developer Services Sony Computer Entertainment Europe (SCEE) Developer Services SCEE - Research & Development Division
  • 4. Slide 4 Developer Services • Great Marlborough Street in Central London • Cover ‘PAL’ Region Developers
  • 5. Slide 5 • Support – SCE DevNet • Field Engineering • Technical and Strategic Consultancy • Technical Training • TRC Consultancy • Projects What We Cover
  • 7. Slide 7 PS3 DevKit Latest additions to Dev Tools PS3 Test Kit
  • 8. Slide 8 Existing PSP DevKit Latest additions to Dev Tools PSPgo Commander Arm
  • 9. Slide 9 • PS3 DevKit: €1,700 • PS3 TestKit: €900 • PSP DevKit: €1,200 • PSP Commander: €300 euros Dev Tools Prices
  • 10. Slide 10 • Option for Digital Distribution via PlayStation® Store as well as disc • Same with full PSP titles • And now minis Digital Distribution Options
  • 11. Slide 11 • PSN available in 58 countries – 12 languages, 22 currencies – Including Russia! 90% of all titles release are now localised. • 700+ games and demos available • 600+ million items downloaded worldwide • More than 130K registered in Russia • Cumulative Russia PSN Sales €650K – as of FY2009 Q3 PlayStation®Network Vital Statistics
  • 12. Slide 12 • 52.9 million globally (as of end June 2009) – 17 million in SCEE Region – More than a 850K in Russia • Russia Tie Ratio: 2:1 PSP Vital Statistics
  • 13. Slide 13 • 23.7 million PS3s (as of end of June 2009) – One million of the new model sold in three weeks in September • More than 190K in the Russia so far – PlayStation and PS2 both sold more than 2 million in Russia during their lifespan (PS2 still selling well) • 68% of PS3 users in Russia go online • Russia Tie Ratio 5:1 PS3 Vital Statistics
  • 14. Slide 14 https://www.tpr.scee.net/ Where to start if you would like to become a registered PlayStation Developer
  • 15. Slide 15 http://www.worldwidestudios.net/xdev For registered developers who wish to be considered as a SCEE 2nd Partner Developer
  • 16. Dr Sebastien Schertenleib Developer Services - SCEE Research & Development PS3™ Hardware Overview
  • 17. Slide 17 • Console vs PC Development • Main Memory • RSX™ – Bandwidth, Pipeline and Data Placement • Cell Broadband Engine™ – Element Interconnect Bus, PPE and SPE • Benefits of SPU Programming • Data Storage • Peripherals PS3™ Hardware Overview
  • 18. Slide 18 PC vs Console Game Development PC • Open Platform • Limitless Resources • Generic API • IHV Drivers Console • Closed Platform • Fixed Hardware • Tiny HW Abstraction • Direct access
  • 19. Slide 19 • High performance on a budget • Fixed hardware target with a long life cycles – Games have to get better every year – Waiting for hardware to catch up with software is not an option • Have to understand the strengths and weaknesses of the hardware Why Consoles Are Different
  • 20. Slide 20 • Performance benefits from understanding of how the hardware works – Does not mean coding in assembly – Need to have an awareness of how software design may affect performance – Generally code optimized for consoles run also faster on PC Why Consoles Are Different
  • 21. Slide 21 • Memory is a very finite resource • Memory access is slow compare to processor speed – This gulf is getting larger on all platforms  – Cache is trying to insulate the application from memory latency – But cache on consoles are generally smaller than on a PC Why Consoles Are Different
  • 24. Slide 24 • Traditionally, heap function is called (malloc/new) – Has to be contiguous – Fragmentation possible • Recommend acquiring memory through mmapper – Build heap on top of this – Non-contiguous pages mapped to contiguous address space – Not subject to fragmentation – More control: page sizes, access rights Memory Management
  • 25. Slide 25 • Games will typically have their own memory manager – Small amount of memory compared to PC – Conservative with data structures and memory allocations • Different strategies for different uses – Memory pools, custom heaps, memory tracking • Understand the fragmentation, performance and space implications for each tool and apply them appropriately Memory Management
  • 26. Slide 26 • Fetching data from memory into a processor can easily dominate performance – L1 cache miss 10~30 cycles – L2 cache miss ~300 cycles • Most applications exhibit locality of code or data – A loop executes the same small set of instructions on data located contiguously in memory • Caches aim to increase overall system performance by taking advantage of this – Fast on die memory is very expensive so they are limited in size Caches
  • 28. Slide 28 • RSX™ has access to two memories • Local Memory –249MB usable (256MB – 7MB reserved by OS) • System Memory –213MB usable (256MB – 43MB reserved by OS) –RSX™ usable through memory mapping by application in 1MB blocks Memory Setup
  • 29. Slide 29 Memory Architecture Main Memory Cell RSX™ Local Memory
  • 30. Slide 30 System Bandwidth 25.6GB/s 20.8GB/s Audio Out Video Out 15GB/s 20GB/s Audio I/F Video I/F Cell Cell I/F GPU Main Memory Local Memory
  • 31. Slide 31 Vertex Shader Triangle Setup Rasteriser Fragment Shader Raster Operations Vertex/Index DataVertex/Index Data Local MemoryMain Memory RSX™ Pipeline
  • 32. Slide 32 Vertex Shader Rasteriser Fragment Shader Main Texture Cache Local Texture Cache Textures Textures Triangle Setup Raster Operations Local MemoryMain Memory RSX™ Pipeline
  • 34. Slide 34 • Able to utilise graphics data in either memory – Vertex attributes, Textures • Frame Buffer stores colour & depth surfaces – Can also be in either memory – Local memory supports colour & Z compression for MSAA modes • Saves bandwidth rather than space • Generally local memory accesses are faster • If local memory bandwidth is bottleneck → consider moving assets to system memory Data Placement
  • 35. Slide 35 How triangles end up on screen RSX™ Local Memory Cell Main Memory Verts Textures Shaders Verts Textures ShadersCmd Buffer
  • 37. Slide 37 Cell Broadband Engine™ • Processor core of PS3™ – Clocked at 3.2GHz • Contains 7 processing cores. – PowerPC Processor Element (PPE) • 512k cache memory – 6 Synergistic Processing Elements (SPE) • 256k local memory store
  • 39. Slide 39 • Memory Interface Controller (MIC) – Controls the 2 Rambus XDR channels. – Theoretical 25.6GB/s transfer • IOIF 0 - Connection to RSX™ – 20 GB/s to RSX™ – 15 GB/s from RSX™ • IOIF1 – Connection to South Bridge – 2.5GB/s to/from devices Inputs and Outputs
  • 41. Slide 41 • This connects all units in the CBE • Consist of 4 128bit buses • All data transfers are 128 byte packets – Smaller transfers sent as partial packed Element Interconnect Bus
  • 45. Slide 45 • 64 bit CPU core • 25.6 GFLOPs FPU • In order execution • Execution units include: – IEEE double precision FPU – Integer – Float vector unit (VMX) PowerPC Processing Element
  • 47. Slide 47 • Intrinsics are supported, preferable to inline assembly –Scheduling and optimisation • Two hardware threads –2 or more active software threads act to mitigate stalls • Good use of caches can make big difference in performance PPE Performance
  • 48. Slide 48 • Power PC issue that can severely affect performance • PPU has a store queue, that queues up stores waiting to get sent to the cache • If a load occurs that conflicts with data in store queue, the load will stall until the data is stored – Pipeline stall, around 40 cycles • Find using Tuner – LHS performance counter Load Hit Store
  • 49. Slide 49 • Type conversion – Integer, floating point or vector registers – Moves have to go through memory as there are no asm instructions to move data between register sets • Stack access - when calling a function that has: – Large parameters - will be passed on the stack – Many parameters - some may end up on the stack – Few instructions - with stack based parameters LHS Cases
  • 51. Slide 51 SPE Architecture • 3.2GHz processor • 128 Vector registers • 128 bit • No scalar registers • 256kB Local Store • Very fast, like L1 cache • Runs Asynchronously from rest of system • Independent Memory Flow Controller (MFC) • SIMD (single instruction multiple data) • No Operating System, Very few system calls • Performance is high and deterministic SPE EIB MFC SPU Local Store
  • 52. Slide 52 • SPE means Synergistic Processing Element – Mutual dependence between PPE and SPEs • PPE runs operating system and performs top level control – General Purpose Hardware (PowerPC) • SPEs provide bulk of application performance – Simple hardware – Simple pipeline – Simple memory architecture – Massive computational horsepower SPEs
  • 53. Slide 53 • Simple, predictable instruction set – Has full SIMD support – Intrinsics are supported for all instructions • Optimised compiler – Handles scalar conversion – Often as good as PPE for scalar code, can be quicker • Easy to program; can be programmed in C/C++ • Core power of system, key to performance • Frees up RSX™, frees up PPE Synergistic Processor Elements
  • 54. Slide 54 • Ported PPU code usually faster for optimised & non-optimised code – Port your code • Easy to program; with some limitations can be programmed in C/C++ • Scalar code can be run – Full SIMD instruction set • Multithreaded engine analogy - can move threads to SPUs SPU Development
  • 55. Slide 55 • Raw – You own the hardware – Complete control, direct hardware access – Once loaded – stays resident • SPU Threads – The OS owns the hardware – Must belong to a thread group – Scheduled by Cell OS Lv-2 kernel – based on group priority • SPURS – Higher level abstraction – Plays well with middleware Ways of Using SPUs
  • 56. Slide 56 • Runs on-top of SPU Threads – Small kernel on SPU • Thread switching more efficient – Requires no PPU resources – PPU SPURS Handler: only to restore thread group when necessary • Easier to synchronise threads • Easier to balance load on multiple SPUs – SPUs grab work when available SPURS (SPU Run Time System)
  • 57. Slide 57 SPURS (SPU Run Time System) Policy Module SPURS Kernel Work Load Policy Module SPURS Kernel Work Load Policy Module SPURS Kernel SPURS Handler PPU SPU0 SPU1 SPU2 Thread Group for SPURS • Implemented as an SPU Thread Group • Kernel, Policy Module, Workload • Minimal PPU Overhead Work Load
  • 58. Slide 58 • Jobs – A chain of execution units – Suited to short execution times – Automatic DMA transfer of input & output data – Executed with associated data – No context switching – run complete • Tasks – Different data model, larger programs – Can yield - synchronisation mechanisms – Optimised context switch – Similar to threads SPURS Workload Types
  • 59. Slide 59 • SPURS JobChain uses manual control of the execution sequence • SPURS JobQueue is designed for dynamic job submission in parallel SPURS Job Queue SPU SPU SPURS Job Queue PPU Fiber Port PPU Thread SPU Submitted jobs SPURS Task submitting in parallel executing in parallel
  • 61. Slide 61 • Problem: Update an array of objects and submit the visible ones for rendering • Object.update() method was bottleneck on PPE – Determined using SN Tuner – This function generates the world space matrices for each object – Embarrassingly parallel • No data dependencies between objects • If we had 1 processor for each object we could do that  Function off load example using a SPURS Task
  • 62. Slide 62 SPURS Tasks //---Conditionally calling SPU Code if (usingSPU==false) //---------------------------generic version { for(int i=0; i<OBJECT_COUNT; i++) { testObject[i].update(); } } else //-------------------------------------------SPU version { int test=spurs.ThrowTaskRunComplete(taskUpdateObject_elf_start, (int)testObject, OBJECT_COUNT,0,0); //could do something useful here… int result = spurs.CatchTaskRunComplete(test); }
  • 63. Slide 63 • Getting data in and out of SPU is first problem • Get this working before worrying about actual processing – Brute force often works just fine • DMA entire object array into SPE memory • Run update method • DMA entire object array back to main memory • List DMA allows you to get fancy – Gather and Scatter operation • If the data set is really big or if you need every last drop of performance – Streaming model • Overlap DMA transfers with processing • Double or quad buffering SPURS Task
  • 64. Slide 64 SPURS Task: Getting Data in and out int cellSpuMain(.....) { int sourceAddr = ....; //source address int count = ....; //number of data elements int dataSize = count * sizeof(gfxObject); //amount of memory to get gfxObject *buf = (gfxObject*)memalign(128,dataSize); //alloc mem DmaGet((void*)buf,sourceAddr,dataSize); DmaWait(....); //---data is loaded at this point so do something interesting DmaPut((void*)buf,sourceAddr,dataSize); DmaWait(....); return(0); }
  • 65. Slide 65 • Keep code the same as PPE Version – Conditional compilation based on processor type – Might have to split code into a separate file SPURS Task: Executing the Code //--- SPU code to call the update method //--- data is loaded at this point so do something interesting //--- step through array of objects and update gfxObject *tp; for(int i=0; i<count; i++) { tp = &buf[i]; tp->update(); //same method as on PPE compiled for SPU }
  • 66. Slide 66 • 5x Speed up in this instance – Brute force solution – Could add more SPUS – Problem is embarrassingly parallel • Note that the same source code for the method is being used on PPU and SPU – Code stays in sync – No fancy SPU specific optimisations SPURS Task: Results
  • 67. Slide 67 • PPU Stalled while waiting for SPU to Process • SPU Data get and put were synchronous – Not using hardware to its fullest extent – Easy to get more if we need it SPURS Task: Time Waster PPU PPU GET PUTExec TIME SPU
  • 69. Slide 69 Physics Solver Parallel SPU x5PPU 512 22ms 3200 23ms
  • 70. Slide 70 PPU Solver • Collision detection and response is a big, involved process 0 50 100 150 200 250 0 100 200 300 400 500 600 Time(ms) Step Integrate Collision Response 3200 boxes
  • 71. Slide 71 Development History 0 50 100 150 200 250 PPU x1 SPU x1 SPU x5 SPU x5 (Optimised) Time(ms) Total Time(ms)
  • 72. Slide 72 SPU solver 0 5 10 15 20 25 0 100 200 300 400 500 600 Time(ms) Step Integrate Collision Response Time(ms)
  • 73. Slide 73 • Simple pipeline – No stalls like LHS • No Operating System, very few system calls – Performance is high and deterministic • Memory is very fast like L1 cache – Memory transfers a asynchronous so can be hidden • Even if performance is comparable, frees up the PPE to do other stuff SPEs are faster than the PPE
  • 74. Slide 74 • Double buffer • Align data and transfers optimally • Vectorise data: SOA instead of AOS • Use SIMD – 4 operations at once = 4x speed up • Use intrinsics • Program directives - branch hinting Further SPU Optimisations
  • 75. Slide 75 Double-Buffering Buffer0 Buffer0 Buffer0 Buffer0 Buffer1 Buffer1 Buffer1 Buffer1 DMA Computation • Allows overlapping of DMA and computation • Transfer to/from one buffer while using data from another • Use different tag group for each buffer • Depends on space available in LS TIME
  • 76. Slide 76 • Graphics API • Multistream – Audio • Bullet – Physics • PlayStation®Edge – Geometry/Compression/Animation • FIOS – Optimised disc access • Various Codecs Our Libraries that run on SPUs
  • 77. Slide 77 • Game engine including – Modular run time – Samples, docs and whitepapers – Full source and art work • Optimised for multi-core platforms especially PS3™ • Extractable and reusable components PhyreEngine™
  • 78. Slide 78 • A lot of 3rd party middleware partners use the SPUs – Physics – Audio – AI – Game Engines – Many more… • Examples by guest speakers at the end of the talk – NVIDIA® PhysX ®, Epic Games Unreal ® Engine3 and Crytek CryENGINE ® 3 In Addition
  • 79. Slide 79 • Animation • AI Threat prediction • AI Line of fire • AI Obstacle avoidance • Collision detection • Physics • Particle simulation • Particle rendering • Scene graph Killzone®2 SPU Usage • Display list building • IBL Light probes • Graphics post-processing • Dynamic music system • Skinning • MP3 Decompression • Edge Zlib • etc. 44 Job types
  • 80. Slide 80 • Effect done on SPU – Motion Blur, Bloom, Depth of Field – Quarter-res image on XDR • SPUs assist RSX™ 1. RSX™ prepares low-res image buffers in XDR 2. RSX™ triggers interrupt to start SPUs 3. SPUs perform image operations 4. RSX™ already starts on next frame 5. Result of SPUs processed by RSX™ early in next frame Post Processing
  • 81. Slide 81 Resolve to quarter resolution Image generated on the SPUs (bloom, DoF, motion blur) Final image by the RSX™ (Blending the out-of-focus and blurry areas in) XDR DDR
  • 83. Slide 83 • Faster than Blu-ray – Over 20MB/s vs. 9MB/s of Blu-ray – Cache data to HDD for faster subsequent load times – Install titles – Stream data • Virtual memory • Decrease usage of system memory Storage – Hard drive
  • 84. Slide 84 • High capacity – 25GB single layer – 50GB dual layer • Constant linear velocity – Constant read speed over disc - 9 MB/s – Optimise layout, duplicate and pack data • Use compression – Lowers bandwidth cost and lowers read time – Use SPUs to decompress – Edge Zlib • Can emulate on Devkit Blu-ray
  • 85. Slide 85 • Can access both Blu-ray drive (~9 MB/s) and HDD (~20MB/s) simultaneously – On HDD, data come from the system cache and game data partitions • SPU Zlib decompression – Increase the throughput • libFIOS – Optimize I/O transfers – Ease background HDD caching implementation PS3TM I/O Performance System Cache (2GB) 25 ms apart Game Data (up to 5GB per title)