Programming with Linux on the 
         Playstation3
                              FOSDEM 2008
                         olivier.grisel@ensta.org


               
                   Architecture overview:  
                   introducing the Cell BE 
               
                   Installing Linux
               
                   SIMD programming in C/C++
               
                   Asynchronous data transfer with 
                   the DMA




           
Who am I

    Java / Python developer at Nuxeo (FOSS document 
    management server)

    Interested in Artificial Intelligence (and need fast 
    Support Vector Machines)

    Slides to be published at:
    http://oliviergrisel.name




                         
PS3 architecture overview

    CPU: IBM Cell/BE @ 3.2GHz 
    
        218 GFLOPS
    
        Main RAM: 256MB XDR (64b@3.2GHz)

    GPU: Nvidia RSX
    
         1.8 TFLOPS (SP) / 356 GFLOPS programmable 
    
        VRAM: 256MB GDDR3 (2x128b@700MHz)

    System Bus: 2.5 GB/s


                         
The Cell Broadband Engine
             
                 1 PPE core @ 3.2GHz
                 
                     64bit hyperthreaded 
                     PowerPC
                 
                     512KB L2 cache
             
                 8 SPE cores @ 3.2GHz (6 usable from Linux on the PS3)
                 
                     128bit SIMD optimized
                 
                     256KB local store (SRAM) per SPE



         
PS3 Clusters
          
              Cheap cluster for 
              academic researchers
          
              North Carolina State U. and 
              U. Massachusetts Dartmouth
          
              8+1 cluster with ssh and 
              MPI




       
PS3 GRID Computing

    PS3GRID project
    
        based on BOINC
    
        30,000 atoms simulation

    Folding@Home
    
        1 PFLOPS with 800 
        TFLOPS from PS3s
    
        BlueGene == 280 
        TFLOPS

                            
Linux on the PS3

    Lv1 Hypervisor shipped with the default firmware

    Partition utility in the Sony Game OS menu

    Choose your favorite distro: 




    Install a -powerpc64-smp or -ps3 kernel

    Install gcc-spu + libspe2


                        
Programming the Cell/BE in C

    Use the PPE as an orchestra conductor that dispatches the 
    numerical code to the SPEs

    Use POSIX threads to start SPE subroutines in 
    parallel

    Use SPE intrinsics to perform vector instructions

    Eliminate branches as much as possible in SPE code

    Align your data to 16 bytes
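
    A minimal sketch of how the PPE side might carve an array into per-SPE slices that all start on a 16-byte boundary. The names `spe_slice` and `slice_for_spe` are hypothetical, not part of libspe2, and the sketch assumes the array itself is 16-byte aligned. On real hardware each slice would then be handed to one POSIX thread driving one SPE context.

```c
#include <assert.h>
#include <stddef.h>

#define FLOATS_PER_VECTOR 4   /* 16 bytes / sizeof(float) */

/* Hypothetical descriptor for the slice of work handed to one SPE. */
typedef struct {
    size_t start;  /* index of the first float, always a multiple of 4 */
    size_t count;  /* number of floats in the slice (0 if the SPE is idle) */
} spe_slice;

/* Split n floats across num_spes workers in units of whole 4-float vectors,
 * so every slice starts on a 16-byte boundary. Only the last non-empty
 * slice may hold a number of floats that is not a multiple of 4. */
spe_slice slice_for_spe(size_t n, unsigned num_spes, unsigned spe_id) {
    assert(num_spes > 0 && spe_id < num_spes);
    size_t vectors = (n + FLOATS_PER_VECTOR - 1) / FLOATS_PER_VECTOR; /* round up */
    size_t per_spe = (vectors + num_spes - 1) / num_spes;             /* round up */
    size_t start = (size_t)spe_id * per_spe * FLOATS_PER_VECTOR;
    spe_slice s = { 0, 0 };
    if (start >= n)
        return s;                      /* nothing left for this SPE */
    size_t end = start + per_spe * FLOATS_PER_VECTOR;
    if (end > n)
        end = n;
    s.start = start;
    s.count = end - start;
    return s;
}
```

    With 100 floats and 6 SPEs, the first five SPEs each get 20 floats and the sixth stays idle.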


                        
Introduction to SIMD programming

    128-bit registers (SSE2, Altivec, SPE)
     
         2 x double
     
         4 x float
     
         4 x int

    introduce new vector types

    1 vector float operation == 4 float operations

    logical (and, or, cmp, ...), arithmetic (+, *, abs, ...), 
    shuffling
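
    To make "1 vector float operation == 4 float operations" concrete, here is a small sketch that runs on any GCC or Clang host. The type `v4f` is an assumption: it uses the compilers' generic vector extension as a stand-in for the SPE's own 128-bit float vector type, so it is not SPE code.

```c
#include <stddef.h>

/* 128-bit vector of 4 floats (GCC/Clang vector extension, used here as a
 * portable stand-in for a native SIMD type). */
typedef float v4f __attribute__((vector_size(16)));

/* Scalar version: one float addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Vector version: each '+' adds 4 floats at once, so the loop runs
 * n_vec times to process 4 * n_vec floats. */
void add_vector(const v4f *a, const v4f *b, v4f *out, size_t n_vec) {
    for (size_t i = 0; i < n_vec; i++)
        out[i] = a[i] + b[i];
}
```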
                          
SIMD programming – the big picture 




              
Not always SIMD-izable




            
SIMD programming with libspe2 and 
                                gcc-spu

    #include <spu_intrinsics.h>

    avoid scalar types; use:
    
        vec_float4
    
        vec_double2
    
        vec_char16 ...

    d = spu_and(a, b); e = spu_madd(a, b, c);

    spu-gcc  pure_spe_prog.c -o pure_spe_prog.elf

                             
Branch elimination

    avoid branching (if / else)
    
        c = spu_sel(a, b, spu_cmpgt(a, d));
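
    What the compare-then-select pair computes can be sketched lane by lane in portable C. The helper names `cmpgt_mask`, `sel_bits` and `select_gt` are hypothetical. They only mimic, for 32-bit integer lanes, the behaviour described above: the comparison yields an all-ones or all-zeros mask, and the select picks bits from `b` where the mask is set and from `a` elsewhere.

```c
#include <stdint.h>

/* All-ones mask if a > d, all-zeros otherwise (one 32-bit lane of a
 * vector greater-than comparison). */
uint32_t cmpgt_mask(int32_t a, int32_t d) {
    return (uint32_t)0 - (uint32_t)(a > d);
}

/* Bitwise select: bits come from b where mask is 1, from a where it is 0. */
uint32_t sel_bits(uint32_t a, uint32_t b, uint32_t mask) {
    return (a & ~mask) | (b & mask);
}

/* Same result as c[i] = (a[i] > d[i]) ? b[i] : a[i], but without an
 * if / else on the data in the loop body. */
void select_gt(const int32_t a[4], const int32_t b[4],
               const int32_t d[4], int32_t c[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t m = cmpgt_mask(a[i], d[i]);
        c[i] = (int32_t)sel_bits((uint32_t)a[i], (uint32_t)b[i], m);
    }
}
```

    On the SPE the four lanes are handled by a single instruction each for the comparison and the select, so the fixed-count loop above disappears as well.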




                            
A sample SPE program
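#include <spu_intrinsics.h>

/* Accumulator viewed either as one vector or as its 4 float lanes. */
volatile union {
       vec_float4 vec;
       float part[4];
} sum;

/* xp and yp must be 16-byte aligned; size should be a multiple of 4
   (any trailing size % 4 elements are ignored). */
float dot_product(const float* xp, const float* yp, const int size) {
       sum.vec = (vec_float4) {0, 0, 0, 0};
       const vec_float4* xvp = (const vec_float4*) xp;
       const vec_float4* yvp = (const vec_float4*) yp;
       const vec_float4* xvp_end = xvp + size / 4;
       /* hint: the loop condition is true most of the time */
       while(__builtin_expect(xvp < xvp_end, 1)) {
            sum.vec = spu_madd(*xvp, *yvp, sum.vec);  /* sum += x * y on 4 lanes */
            xvp++;
            yvp++;
       }
       /* horizontal add of the 4 lanes */
       return sum.part[0] + sum.part[1] + sum.part[2] + sum.part[3];
}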
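
A host-side reference can help check the SPE routine. The sketch below is an assumption rather than part of the original program: `dot_product_ref` is a hypothetical name, it keeps four partial sums to mirror the four vector lanes, and it adds a scalar tail loop for sizes that are not a multiple of 4. Results may differ from the SPE version in the last bits because of fused multiply-add rounding.

```c
#include <stddef.h>

/* Portable reference: lane[k] plays the role of lane k of the vector
 * accumulator; a scalar loop handles the size % 4 leftover elements. */
float dot_product_ref(const float *x, const float *y, size_t size) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t vec_end = size - size % 4;   /* elements covered by full vectors */
    for (size_t i = 0; i < vec_end; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] += x[i + k] * y[i + k];
    float total = lane[0] + lane[1] + lane[2] + lane[3];
    for (size_t i = vec_end; i < size; i++)   /* scalar tail */
        total += x[i] * y[i];
    return total;
}
```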

                                       
DMA with the SPUs' Memory Flow 
                   Controllers

    #include <spu_mfcio.h>

    mfc_get(&local_data, main_mem_data_ea, 
    sizeof(local_data), DMA_TAG, 0, 0);

    mfc_put(&local_data, main_mem_data_ea, 
    sizeof(local_data), DMA_TAG, 0, 0);

    mfc_getb(&local_data, main_mem_data_ea, 
    sizeof(local_data), DMA_TAG, 0, 0);

    /* select the tag group to wait on, then block until it completes */
    mfc_write_tag_mask(1 << DMA_TAG);
    spu_mfcstat(MFC_TAG_UPDATE_ALL);
                      
Double-buffering – the problem




            
Double-buffering – the big picture




             
Double-buffering with MFC

    1. SPU queues MFC GET to fill buffer #1

    2. SPU queues MFC GET to fill buffer #2

    3. SPU waits for buffer #1 to finish filling

    4. SPU processes buffer #1

    5. SPU queues MFC PUT back content of buffer #1

    6. SPU queues MFC GETB to refill buffer #1

    7. SPU waits for buffer #2 to finish filling

    8. SPU processes buffer #2 (...)
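
    The steps above can be sketched as a host-runnable emulation. Everything here is an assumption for illustration: `dma_get`, `dma_put` and `dma_wait` are hypothetical stand-ins built on `memcpy`, which completes immediately, whereas real MFC transfers are queued and run in the background while the SPU computes. `CHUNK` is kept tiny for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* floats per buffer; real buffers would be far larger */

/* Hypothetical stand-ins for the MFC GET / PUT / wait-on-tag operations. */
static void dma_get(float *local, const float *main_mem, size_t n) {
    memcpy(local, main_mem, n * sizeof(float));
}
static void dma_put(const float *local, float *main_mem, size_t n) {
    memcpy(main_mem, local, n * sizeof(float));
}
static void dma_wait(int tag) {
    (void)tag;  /* real code would block until transfers with this tag finish */
}

/* Doubles every element of data[0..n) in place, chunk by chunk,
 * alternating between two local buffers. n must be a multiple of CHUNK. */
void scale_double_buffered(float *data, size_t n) {
    assert(n % CHUNK == 0);
    float buf[2][CHUNK];
    size_t chunks = n / CHUNK;
    if (chunks == 0)
        return;
    dma_get(buf[0], data, CHUNK);                 /* fill buffer #1 */
    for (size_t c = 0; c < chunks; c++) {
        int cur = (int)(c & 1), next = cur ^ 1;
        if (c + 1 < chunks)                       /* queue the next fill early;  */
            dma_get(buf[next],                    /* on real hardware this is    */
                    data + (c + 1) * CHUNK,       /* the GETB that must follow   */
                    CHUNK);                       /* the PUT of the same buffer  */
        dma_wait(cur);                            /* wait for current buffer */
        for (size_t i = 0; i < CHUNK; i++)        /* process current buffer */
            buf[cur][i] *= 2.0f;
        dma_put(buf[cur], data + c * CHUNK, CHUNK);  /* write results back */
    }
    dma_wait(0);                                  /* drain outstanding PUTs */
    dma_wait(1);
}
```

    The point of the pattern is that the fill of one buffer overlaps with the computation on the other; the emulation only shows the ordering of the calls, not the overlap itself.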

                        
Some resources

    Cell BE Programming Tutorial (ibm.com 190 pages)

    IBM developerWorks short programming tutorials
    
         Search for articles by Jonathan Bartlett

    Barcelona Supercomputing Center (software)
    
        http://www.bsc.es/projects/deepcomputing/linuxoncell/

    PS3 programming workshops (videos)
    
        http://www.cc.gatech.edu/~bader/CellProgramming.html

    #ps3dev on freenode
                            
Thanks, credits, licensing

    Most diagrams come from the excellent GFDL'd tutorial by 
    Geoff Levand (Sony Corp)
    
        http://www.kernel.org/pub/linux/kernel/people/geoff/cell

    Pictures and trademarks belong to their respective 
    owners (Sony, IBM, Universities, Folding@Home, 
    PS3GRID, ...)

    All remaining work is GFDL


                           
7 differences




       
