2. What’s Multicore? Multiple cores in a single chip Improving performance by adding cores Became mainstream in recent years - Examples - Core 2 Duo, Core 2 Quad, Core i3/i5/i7 (Intel) - Athlon II X2, Phenom II X4, Opteron (AMD) - Cell Broadband Engine (IBM)
3. Why Multicore? The difficulties of single-core processor development - Overheating - Energy consumption - Leakage current - Example - Intel abandoned its 4GHz processor project in fall 2004 Multicore processors resolve these problems and have better performance
4. Research Introduction Purpose: - To see the performance difference between single-core and multicore processors How: - Use the PS3 as the host machine - Use the CPU of the PS3 to execute a series of matrix multiplications - Execute with a single core - Execute with multiple cores - Programming tools are needed to manage the cores - Record the time and analyze the performance
5. PlayStation 3 Physical Components CPU: Cell Broadband Engine Memory: 256MB Storage: 80GB Software Yellow Dog Linux Cell SDK
6. Cell Broadband Engine Processor Developed jointly by Sony, Toshiba, and IBM. Multicore structure - Power Processing Element x 1 (PPE) - Like a traditional processor - It has its own L1 and L2 caches - Synergistic Processing Element x 8 (SPE) - Can be used synchronously - Each has 256KB of local storage
7. Matrix Multiplication Simple but time consuming Some assumptions are made for research purposes - Dimension is set to N² (N × N square matrices) - Data type is set to double - Only even values of N are used
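To make the workload concrete, here is a minimal C sketch of the single-core baseline under the slide's assumptions (square N × N matrices of double). The function names `matmul_naive` and `time_matmul` and the flat row-major layout are illustrative assumptions, not the code used in the research.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* C = A * B for square n x n matrices stored row-major in flat arrays.
 * The three nested loops do n*n*n multiply-adds, which is why the
 * operation is simple to write but time consuming for large n. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a && b && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Returns the CPU time, in seconds, spent on one multiplication.
 * clock() measures processor time; a wall-clock timer would be an
 * equally reasonable choice (hypothetical helper for illustration). */
double time_matmul(size_t n, const double *a, const double *b, double *c)
{
    clock_t start = clock();
    matmul_naive(n, a, b, c);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_matmul` for a range of even N and recording the returned seconds would give the kind of single-core timing series the research compares against.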
10. Basic idea - Consists of three functions - task<inner>: distribute - task<leaf>: compute - task<ext>: connect
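The split between a distributing task and a computing task can be illustrated in plain C. This is not Sequoia syntax: `mm_inner`, `mm_leaf`, and `LEAF_DIM` are hypothetical names, and the connecting role of task<ext> is not modeled. The sketch only shows the idea of an inner step that divides the matrices into quadrants and a leaf step that does the arithmetic on a small block.

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_DIM 2 /* assumed block size at which splitting stops */

/* "Leaf" role: compute C += A * B on one dim x dim block.
 * ld is the row length of the full matrices the blocks live in. */
static void mm_leaf(size_t dim, size_t ld,
                    const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < dim; i++)
        for (size_t j = 0; j < dim; j++)
            for (size_t k = 0; k < dim; k++)
                c[i * ld + j] += a[i * ld + k] * b[k * ld + j];
}

/* "Inner" role: distribute the work by splitting each matrix into
 * four quadrants and recursing on the eight block products.
 * The caller must zero C first, because blocks are accumulated. */
void mm_inner(size_t dim, size_t ld,
              const double *a, const double *b, double *c)
{
    assert(dim > 0 && ld >= dim);
    if (dim <= LEAF_DIM || dim % 2 != 0) {
        mm_leaf(dim, ld, a, b, c);
        return;
    }
    size_t h = dim / 2;
    for (size_t i = 0; i < 2; i++)
        for (size_t j = 0; j < 2; j++)
            for (size_t k = 0; k < 2; k++)
                mm_inner(h, ld,
                         a + (i * h) * ld + k * h,
                         b + (k * h) * ld + j * h,
                         c + (i * h) * ld + j * h);
}
```

In a real Sequoia program the mapping file, not a compile-time constant, would decide where the recursion stops and which memory level each block lives in; the constant here only keeps the sketch short.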
11. Programming in Sequoia To program in Sequoia, four files are required to run the matrix multiplication. - “Makefile” for compiling - “matrixmult.sq” Sequoia program - “mapping_ps3.xml” for mapping - “main.cc” for starting During the process - Good documentation - Good adaptability for different purposes - Details need to be handled by programmers
12. Cellgen An implicit multicore programming model C/C++ based programming tool OpenMP-like style - OpenMP API Basic idea - Starts after “#pragma cell” - Parameters - public: shared by SPEs - private: each SPE has a copy Author: Scott Schneider, Ph.D. candidate, Virginia Tech
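Since Cellgen is described as OpenMP-like, the shared/private distinction can be illustrated with standard OpenMP in C. This is an analogy only: it uses `#pragma omp`, not Cellgen's “#pragma cell” syntax, and `matmul_omp` is a hypothetical name. The `shared` clause plays the role the slide gives to public parameters, and `private` gives each thread its own copy.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for n x n row-major matrices, with the outer loop marked
 * for parallel execution. Compile with an OpenMP flag such as
 * -fopenmp to run in parallel; without it the pragma is ignored and
 * the loop runs sequentially with the same result. */
void matmul_omp(int n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a && b && c);
    int i, j, k;
    double sum;

    /* a, b, c, n: one copy seen by all threads (shared).
     * j, k, sum: one copy per thread (private).
     * The loop index i is made private automatically. */
    #pragma omp parallel for shared(a, b, c, n) private(j, k, sum)
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

Each iteration of the outer loop writes a different row of C, so the threads never write to the same element and no locking is needed.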
13. Programming in Cellgen Four files are needed to run the matrix multiplication - Two “Makefile” for compiling - One “matrixmult.cellgen” Cellgen code - One “double16b_t.h” for padding column data - suggested by the author to improve performance During the process - Understandable - C/C++ based; easy to pick up. - Lack of documentation - Only a “Readme” file is available.
14.
15. Result in Graph (1) The following line chart is generated from the data in the table. [Line chart. Series: PPE Only, Cellgen, Sequoia. Annotation: Memory size limit]
17. Result Analysis Performance of Cellgen - Unexpected overhead or runtime errors may occur and set the performance back. Performance of Sequoia - According to the stable measurements, it is about 8 times faster than the PPE alone. - Although the memory size is 256MB, performance starts dropping after 2048². - The performance becomes the same as the PPE after reaching 4096². - Most of the data is probably being swapped to disk, which is beyond Sequoia’s control.
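The memory argument can be checked with a little arithmetic. The sketch below assumes that three N × N matrices of double (A, B, and C) must be resident at once and that 256MB means 256 × 1024 × 1024 bytes; it ignores memory used by the operating system and the runtime. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to hold A, B, and C as n x n matrices of double. */
uint64_t matmul_footprint_bytes(uint64_t n)
{
    assert(n > 0);
    return 3u * n * n * (uint64_t)sizeof(double);
}

/* Nonzero if the three matrices fit in mem_bytes of main memory. */
int fits_in_memory(uint64_t n, uint64_t mem_bytes)
{
    return matmul_footprint_bytes(n) <= mem_bytes;
}
```

Under these assumptions, N = 2048 needs 3 × 2048² × 8 = 100,663,296 bytes (96 MiB) and N = 4096 needs 402,653,184 bytes (384 MiB). The first fits in 256MB on paper but leaves limited room once the operating system takes its share, and the second cannot fit at all, which is consistent with the swapping explanation above.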
18. Conclusion A multicore processor performs better than a single-core processor: about 8 times faster when memory space is sufficient. Multicore may also incur unexpected overhead or errors, which can set performance back, as I saw with Cellgen. Multicore processing is art. - In the paper “Programming Multiprocessors With Explicitly Managed Memory Hierarchies,” Cellgen performs better than Sequoia. However, Cellgen does not do as well as Sequoia in this research.
19. Reference http://elhabib.at/files/2008/07/yellowdog-vorlage_p1.jpg http://scawley.files.wordpress.com/2008/03/sony_playstation_3_60gb_game_console__brand_new.jpg http://www.5ilight.com/dianzi/upimg/20070222/11H154H0L05A08.jpg http://moss.csc.ncsu.edu/~mueller/cluster/ps3/cell.jpg http://upload.wikimedia.org/wikipedia/en/thumb/e/eb/Matrix_multiplication_diagram_2.svg/313px-Matrix_multiplication_diagram_2.svg.png http://www.stanford.edu/group/sequoia/cgi-bin/node/182 http://www.stanford.edu/group/sequoia/cgi-bin/ http://openmp.org/wp/about-openmp/ http://people.cs.vt.edu/~scschnei/pictures/scott.jpg http://openmp.org/wp/openmp_336x120.gif http://www.ibm.com/developerworks/power/library/pa-cellperf/ Ramanathan, R. M. “Intel® Multi-Core Processors: Making the Move to Quad-Core and Beyond.” Intel Multi-Core Processors. p. 3, 15 November 2008 <http://www.intel.com/technology/architecture/downloads/quad-core-06.pdf>. Sutter, Herb. “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software.” Dr. Dobb’s Journal, 30(3), March 2005 <http://lyle.smu.edu/~coyle/cse8313/handouts/Free.lunch.over.pdf>. http://github.com/scotts/cellgen/ http://en.wikipedia.org/wiki/Cell_(microprocessor) Martin Linklater. “Optimizing Cell Code.” Game Developer Magazine, April 2007: pp. 15–18. “To increase fabrication yields, Sony ships PlayStation 3 Cell processors with only seven working SPEs. And from those seven, one SPE will be used by the operating system for various tasks. This leaves six SPEs for game programmer to use.” Scott Schneider, Jae-Seung Yeom and Dimitrios S. Nikolopoulos. “Programming Multiprocessors With Explicitly Managed Memory Hierarchies.” IEEE Computer, December 2009.
Main introduction about what multicore is. Take the old English paper as reference!
The reasons why CPU manufacturers changed from single core to multicore.
Research intro. Talking about what my purpose is, how I test and justify my results, and what application, programming models, and host machine I will use. (PS. Application -> large dimension matrix; Programming models -> Sequoia, Cellgen; Host machine -> PS3)
Introduce the PS3 with its memory size, CPU, and which OS we use. (PS. Main memory -> 256MB; CPU -> Cell Broadband Engine; OS -> Yellow Dog Linux) Don’t forget to mention that the Cell SDK is necessary for developing for the Cell CPU! Picture for YDL: http://elhabib.at/files/2008/07/yellowdog-vorlage_p1.jpg Picture: http://scawley.files.wordpress.com/2008/03/sony_playstation_3_60gb_game_console__brand_new.jpg
Application is a series of matrix multiplication. Also, put the reason why I choose matrix multiplication to be my application. Picture: http://upload.wikimedia.org/wikipedia/en/thumb/e/eb/Matrix_multiplication_diagram_2.svg/313px-Matrix_multiplication_diagram_2.svg.png
Brief introduction about Sequoia. Major points -> explicit local storage management, mapping a tree structure as a memory hierarchy, and major programming points (inner, leaf, and ext tasks). Picture tree structure: http://www.stanford.edu/group/sequoia/cgi-bin/node/182 Sequoia logo: http://www.stanford.edu/group/sequoia/cgi-bin/
Connect to the matrixmult.sq, matrixmult_ps3_mapping.xml, and main.cc files here and explain briefly. Then, talk about how I felt during the process. (Basic idea -> good documentation and adaptability for different purposes, but the programmer has to handle much more detail!) Also remind that five files are required to use Sequoia: two Makefile files (for compiling), xxx.sq code (main program), xxx.xml (for mapping), main.cc (for execution)
Introduction about Cellgen. Major points: C/C++ based software tool, implicit local storage management, OpenMP-like support. (PS. OpenMP needs to be explained -> brief oral explanation; use http://openmp.org/wp/about-openmp/ as reference!!) Author photo: http://people.cs.vt.edu/~scschnei/pictures/scott.jpg OpenMP logo: http://openmp.org/wp/openmp_336x120.gif
Put the matrixmult.cellgen and double16b_t.h files here and explain briefly. Also mention the lack of documentation and the problem the author described.
Put the overall result table here and “explain”. Do not say too much here, analysis will be left on later slides!
Put the result graph here and “explain”. Do not say too much here.
The line chart about
Just leave the “important” partial data here and explain my analysis. Major points: how fast Sequoia can get, the limit that the 256MB of physical main memory places on Sequoia and Cellgen, and the unexpectedly poor performance of Cellgen (maybe some potential overhead sets back the overall performance).
Multicore processing is art!
Reference list: from the old EN paper, from the OpenMP website, from the Cellgen author’s websites, from the Sequoia websites, and the IEEE magazine.
Be prepared for questions! Picture: http://www.cmoe.com/blog/wp-content/images/question-mark.jpg