The document discusses Intel's HPC portfolio and roadmap update. It provides an overview of the new Intel Xeon E5-2600 v2 processor family, highlighting its efficiency, performance, and security features. The Xeon E5-2600 v2 is expected to deliver up to 30% more performance using the same or less power compared to the previous generation. It offers up to 12 cores, 30MB of cache, and support for the latest I/O and memory technologies to provide powerful and efficient processing for modern data centers.
3. This slide MUST be used with any slides removed from this presentation
Legal Disclaimers Continued
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different
processor families. Go to: http://www.intel.com/products/processor_number
Intel® HT Technology available on select Intel® processors. Requires an Intel® HT Technology-enabled system. Consult your system manufacturer.
Performance will vary depending on the specific hardware and software used. For more information including details on which processors support HT
Technology, visit http://www.intel.com/info/hyperthreading.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM). Functionality,
performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all
operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system
with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible
measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.2. For more information, visit
http://www.intel.com/technology/security
Requires a system with Intel® Turbo Boost Technology. Intel Turbo Boost Technology and Intel Turbo Boost Technology 2.0 are only available on select
Intel® processors. Consult your PC manufacturer. Performance varies depending on hardware, software, and system configuration. For more
information, visit http://www.intel.com/go/turbo
Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct
sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For more information,
see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/
Intel, Intel Xeon, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States
and other countries. Other names and brands may be claimed as the property of others
INTEL CONFIDENTIAL
4. This slide MUST be used with any slides with performance data removed from this presentation
Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as
measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other
sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on
the performance of Intel products, go to http://www.intel.com/performance/resources/benchmark_limitations.htm.
Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform
into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance
improvements reported.
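The relative-performance calculation described above can be sketched as follows. This is a minimal illustration only: the platform names and raw scores are hypothetical placeholders, not measured results, and the function name `relative_performance` is introduced here for the example.

```python
def relative_performance(scores, baseline):
    """Normalize raw benchmark scores so the baseline platform equals 1.0.

    `scores` maps a platform name to its raw benchmark result
    (higher is better); `baseline` names the reference platform.
    """
    base = scores[baseline]
    if base <= 0:
        raise ValueError("baseline score must be positive")
    # Divide every result by the baseline result.
    return {platform: score / base for platform, score in scores.items()}


# Hypothetical raw scores, for illustration only (not measured data).
example_scores = {"platform_a": 200.0, "platform_b": 260.0, "platform_c": 150.0}
relative = relative_performance(example_scores, baseline="platform_a")

for platform, value in relative.items():
    print(f"{platform}: {value:.2f}")
```

With these placeholder numbers, the baseline platform reports 1.00 and the others report their ratio to it.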
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard
Performance Evaluation Corporation. See http://www.spec.org for more information.
SAP and SAP NetWeaver are the registered trademarks of SAP AG in Germany and in several other countries. See http://www.sap.com/benchmark for more information.
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or
configuration may affect actual performance.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and
MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the
results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the
performance of that product when combined with other products.
6. This slide MUST be used with any slides with performance data removed from this presentation
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations
that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction
sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer
to the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804
8. Intel® Xeon® Processor E5-2600 v2 Product Family
Tick-Tock Development Model:
Sustained Microprocessor Leadership
Tick-tock cadence by microarchitecture (TOCK = new microarchitecture, TICK = new process technology):
Intel® Core™ Microarchitecture: Xeon® 5300 (65nm, TOCK), Xeon® 5400 (45nm, TICK)
Intel® Microarchitecture Codename Nehalem: Xeon® 5500 (45nm, TOCK), Xeon® 5600 (32nm, TICK)
Intel® Microarchitecture Codename Sandy Bridge: Xeon® E5-2600 (32nm, TOCK), Xeon® E5-2600 v2 (22nm, TICK)
Intel® Microarchitecture Codename Haswell: Haswell (22nm, TOCK), Future (14nm, TICK)
Latest Micro-architecture on Leading Process Technology
9. Intel® Xeon® Processor E5-2600 v2 Product Family
At the Heart of a Modern Data Center
Intel® Xeon® E5-2600 v2 product family
Efficient: Leading 22nm manufacturing process reduces power usage. Supports Intel® Node Manager and Intel® Data Center Manager Software.
Powerful: Up to 12 cores and 30MB cache, expected to deliver up to 30% [1] more performance in the same power envelope vs. the previous generation.
Secure: Improved security with Intel® Secure Key and Intel® OS Guard for additional hardware-embedded security, plus enhanced AES-NI.
1. Source: Intel internal measurements. SPECint*_rate_base2006, 28 March 2013, E5-26xx v2 (12C, 2.5GHz) vs. E5-2600 (8C, 2.9GHz). Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator or model.
Any difference in system hardware or software design or configuration may affect actual performance. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
For more information go to http://www.intel.com/performance
*Other names and brands may be claimed as the property of others.
10. Unrelenting Focus on Power Efficiency
Xeon E5-2600 v2
Active Power: Delivering up to 45% [1] power efficiency improvements through enhanced fine-grain power controls and the 22nm tri-gate process
Dynamic Power: Efficient Turbo that intelligently adapts to peak workload conditions and disengages when memory and I/O are the bottlenecks
Idle Power: Low-leakage process technology and power gating technology contribute to idle power up to 23% [1] lower than the previous generation
1. Source: Intel internal measurements: [Baseline configuration and score on SPECPower_ssj2013* benchmark. Idle power based on Intel® Xeon® processor E5-26xx v2
(12C, 2.5GHz, 95W), 28 March 2013]. Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an
architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance. Intel product plans in this presentation do
not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps. For more information go
to http://www.intel.com/performance
*Other names and brands may be claimed as the property of others.
11. Intel® Xeon® Processor E5-2600 v2 Product Family
REAL Performance Where it Counts
Xeon E5-2600 v2
50% MORE cores / threads
IMPROVED: faster memory
IMPROVED: integrated IO (PCIe 3.0)
NEW virtualization feature
50% MORE last-level cache
~30% [1] less idle power
NEW security features
1. Source: Intel internal measurements: [idle power, Intel® Xeon® processor E5-26xx v2 (12C, 2.5GHz, 95W), 28 March 2013]. Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture
simulator or model. Any difference in system hardware or software design or configuration may affect actual performance. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s
current plan of record product roadmaps. For more information go to http://www.intel.com/performance
* Other names and brands may be claimed as the property of others
12. Intel® Xeon® Processor E5-2600 v2 Product Family
Reduce Bottlenecks
with Intel® Integrated I/O
Better Together
Unleash the full I/O capabilities of Xeon® E5
with Intel® Ethernet X540 Server 10GbE Adapter
or Intel® True Scale 7300 series HCAs
Increase I/O Performance
NETWORKING
APPLIANCES
Reduce I/O Latency
HPC
TRADING
STORAGE
LARGE SCALE
ANALYTICS
1. Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family
(230 ns) vs. Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details.
2. Source: 8 GT/s and 128b/130b encoding in PCIe* 3.0 specification enables double the interconnect bandwidth over the PCIe* 2.0 specification
(www.pcisig.com/news_room/November_18_2010_Press_Release/).
* Other names and brands may be claimed as the property of others
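The "double the interconnect bandwidth" statement in note 2 can be checked with a short calculation. This is a sketch under stated assumptions: the PCIe 3.0 figures (8 GT/s, 128b/130b) come from the note above, while the PCIe 2.0 figures (5 GT/s, 8b/10b encoding) are taken from the PCIe 2.0 specification rather than from this slide. The helper name `lane_bandwidth_gbytes` is introduced here for illustration.

```python
def lane_bandwidth_gbytes(transfer_rate_gt, payload_bits, encoded_bits):
    """Usable bandwidth of one lane, one direction, in GB/s.

    transfer_rate_gt: raw signalling rate in gigatransfers per second.
    payload_bits / encoded_bits: line-code efficiency (e.g. 128/130).
    """
    usable_gbits = transfer_rate_gt * payload_bits / encoded_bits
    return usable_gbits / 8.0  # bits to bytes


# Assumption: PCIe 2.0 runs at 5 GT/s with 8b/10b encoding.
pcie2_lane = lane_bandwidth_gbytes(5.0, 8, 10)
# From the note above: PCIe 3.0 runs at 8 GT/s with 128b/130b encoding.
pcie3_lane = lane_bandwidth_gbytes(8.0, 128, 130)
ratio = pcie3_lane / pcie2_lane

print(f"PCIe 2.0 per lane: {pcie2_lane:.3f} GB/s")
print(f"PCIe 3.0 per lane: {pcie3_lane:.3f} GB/s")
print(f"Ratio: {ratio:.2f}x")
```

The raw transfer rate rises by only 1.6x; most of the remaining gain comes from the lower overhead of 128b/130b encoding, which brings the usable ratio close to 2x.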
13. Intel® Xeon® processor E5-2600 v2 Product Family
Intel® Xeon® Processor E5-2600 v2
Socket-compatible replacement for Intel® Xeon® processor E5-2600 product family
Up to 12 cores and 30MB cache, expected to deliver up to 50% [1] more performance in the same power envelope
Up to 30MB shared cache
4 channels of up to DDR3 1866 MHz memory
Integrated PCI Express* 3.0: up to 40 lanes per socket
Improved security with Intel® Secure Key and Intel® OS Guard for additional hardware-embedded security
* Other names and brands may be claimed as the property of others
1. Baseline configuration and score on SPECVirt_sc2013* benchmark. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
For more information go to http://www.intel.com/performance
14. Intel® Xeon® Processor E5-2600 v2 Product Family
World Record Performance
• E5-2600 v2 featured in the #1 supercomputer “Milky Way-2” on the
Top500 list
• With 12 cores running up to 2.7 GHz, E5-2600 v2 delivers 259 GFlops
per socket, a 56% increase over the previous generation
• E5-2600 v2 also in 2 other supercomputers on the Top500 list - #54
and #329
Source: http://newsroom.intel.com/community/intel_newsroom/blog/2013/06/17/intel-powers-the-worlds-fastest-supercomputer-reveals-new-and-future-high-performance-computing-technologies
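The 259 GFlops per-socket figure is consistent with a simple theoretical-peak estimate of cores × clock × floating-point operations per cycle. The sketch below assumes 8 double-precision operations per core per cycle (one 4-wide AVX add plus one 4-wide AVX multiply); that per-cycle figure is an assumption added here, not a number stated on the slide, and the previous-generation value is only back-calculated from the quoted 56% increase.

```python
def peak_dp_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak double-precision GFlops for one socket."""
    return cores * clock_ghz * flops_per_cycle


# Assumption: 8 DP flops per core per cycle with AVX
# (one 4-wide add and one 4-wide multiply issued each cycle).
FLOPS_PER_CYCLE = 8

# From the slide: 12 cores running at up to 2.7 GHz.
e5_v2_peak = peak_dp_gflops(12, 2.7, FLOPS_PER_CYCLE)

# Previous-generation peak implied by the quoted 56% increase.
implied_previous_peak = e5_v2_peak / 1.56

print(f"E5-2600 v2 peak: {e5_v2_peak:.1f} GFlops per socket")
print(f"Implied previous generation: {implied_previous_peak:.1f} GFlops per socket")
```

This is a peak rate, so sustained application performance would be expected to be lower.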
19. Intel® Xeon® Processor
E5-2600 v2 Product Family
Neusoft CT
Intel® Advanced Vector Extensions – Float 16
www.neusoft.com
“The Intel Xeon Processor E5-2600 V2 generates complex CT images about 45 percent faster than the previous Intel Xeon Processor E5-2600. A study that took 20 minutes can now be completed in just 11. Multiply that by the hundreds of patients in a busy clinic and the time savings stretch to hours per day. We also see performance gains of up to 1.54X using Float16 Instructions for select workloads. This is absolutely a preferred hardware platform for Neusoft CT.”
Shuangxue Li, Vice President of Neusoft Medical Systems Co., Ltd., General Manager of Diagnostic Imaging Systems Division
Neusoft CT scanners depend, in part, on the efficiency of the underlying image-generation software and its ability to deliver high quality images quickly.
Benefits of the Intel® Xeon® processor E5 v2 family: Fast image generation reduces wait times for patients and medical teams in busy clinical settings.
Intel® AVX reduces the diagnosis latency and helps doctors make the right decision in the shortest time.
[Chart: Intel AVX, relative performance, the higher the better (axis 0.00 to 1.80). Intel® Xeon® Processor E5-2600 V2 series without AVX Float 16 optimization: baseline; with AVX Float 16 optimization: 1.54X]
INTEL CONFIDENTIAL - USE UNDER NDA ONLY UNLESS TAGGED “PUBLIC AT LAUNCH”.
*Other names and brands may be claimed as the property of others
20. Intel® Xeon® Processor
E5-2600 v2 Product Family
SunGard ALM Benchmark 5.8.6
Risk Analytics (Finance)
www.sungard.com
“Having successfully migrated to Intel® Xeon® E5-2600 v2, we have seen significant increases in processing power. We have run the large scenario simulation engine that is able to take advantage of the increased number of cores in the new Intel platform; the new platform increased our performance by more than 38%. These improved results come at a time when our customers are demanding faster results with even greater granularity.”
Joe Sass, Director of Product Strategy for SunGard’s Ambit ALM business
SunGard’s Asset & Liability Risk Management solution provides complete multidimensional analysis of the balance sheet, incorporating interest rate risk, income simulation and market valuation using deterministic and stochastic modeling.
Better Information and Analysis Means Better Decisions. SunGard ALM Risk Management Solutions
[Chart: elapsed time in seconds (axis 0 to 80), Intel® Xeon® processor E5-2600 vs. Intel® Xeon® processor E5-2600 V2; the E5-2600 V2 is 1.38x faster]
*Other names and brands may be claimed as the property of others
21. Intel® Xeon® Processor
E5-2600 v2 Product Family
Paradigm GeoDepth* v2011.3
Seismic Imaging (Energy)
www.paradigm.com
“Our continued investment in the optimization of the GeoDepth software, leveraging the Intel compilers and the Intel MKL library, enables our customers to take immediate advantage of the 50% increase in compute cores in this latest generation of Intel Xeon processors.”
Duane Dopkin, Executive Vice President, Technology (also given on the slide as Senior Vice President, Technology)
Customer Benefits: Geophysicists can choose to apply the scalable performance improvements to produce higher resolution images of the subsurface, or to improve throughput of their existing workload.
Key Intel® Xeon® processor E5 v2 advantage: Increased memory speed reduces communications overhead; 24 cores per two-socket server provides application scalability; Intel® Hyper-Threading Technology and Intel® Turbo Boost Technology provide much higher price performance.
GeoDepth* is the leading system for 3D and 2D velocity model building and seismic imaging in time and depth. Through the integration of interpretation, velocity analysis, model building, model updating, model validation, depth imaging and time-to-depth conversion, GeoDepth provides the continuity needed to produce high-quality, interpretable images consistent with other available data.
[Chart: Paradigm 2011.3 benchmarks (CSFWMIG Benchmark and CRAM Benchmark), relative performance, higher is better. 8 core Intel® Xeon® processor E5-2670: 1 on both benchmarks; 12 core Intel® Xeon® processor E5-2697 V2: 1.45 and 1.59]
Scalable performance for high resolution seismic imaging
*Other names and brands may be claimed as the property of others
22. Intel® Xeon® Processor
E5-2600 v2 Product Family
Star-CCM+*
Engineering Analysis (Multi-Disciplinary)
“We redesigned the front end of our chassis almost exclusively using
simulations with CD-adapco software, the redesign added about 50 lbs.
of down force which is enough extra grip to give us about a tenth of a
second a lap.”
www.cd-adapco.com
HPC
Andy Hogg, Aerodynamics manager, Michael Waltrip Racing
CD-adapco STAR-CCM+* provides comprehensive support for solving
complex engineering problems involving flow (of fluids or solids), heat
transfer and stress. It helps engineers automate workflows to perform
iterative design studies with minimal user interaction.
Faster simulation runtimes reduce simulation/prototyping timelines, to
improve design quality, and speed time to market.
Increased memory bandwidth of the E5-2600 v2 series allows better
utilization of its computational resources, significantly improving
run times.
Faster performance; High quality digital cinema;
Faster time to Market
[Chart: STAR-CCM+ 8.04.007, Lemans 17M. Iteration time in seconds, lower is better (axis 0 to 20). Sandy Bridge 8-core @ 2.7 GHz: 17.7; Ivy Bridge 10-core @ 2.8 GHz: 13.0; Ivy Bridge 12-core @ 2.7 GHz: 11.5 (1.53x faster)]
*Other names and brands may be claimed as the property of others
24. PARALLELISM IS THE PATH FORWARD
Intel is Your Roadmap
Most Commonly
Used Parallel Processor
• Performance and energy efficiency for
most workloads
• Parallel, Serial, Multicore + Vector
• Robust security and reliability
• Flexible foundation for growth
and innovation
Optimized for
Highly Parallel Application
• More cores and more threads per core
• Wider Vector instructions
• Higher memory bandwidth
• Common languages, directives,
libraries & tools
• Complements Intel® Xeon® processors
25. PRODUCT LINEUP
Intel® Xeon Phi™ Coprocessor
3xxx Family (3120P, 3120A): Outstanding Parallel Computing Solution. Performance/$ leadership. 6GB GDDR5, 240GB/s, >1TF DP
5xxx Family (5110P, 5120D): Optimized for High Density Environments. Performance/watt leadership. 8GB GDDR5, >300GB/s, >1TF DP, 225-245W
7xxx Family (7120P, 7120X): Highest Performance, Most Memory. Performance leadership. 16GB GDDR5, 352GB/s, >1.2TF DP
26. BIG GAINS FOR PARALLEL APPLICATIONS
Efficient vectorization, threading, and parallel execution drives higher performance for many applications
[Chart: performance (axis 1.00 to 7.00) as a function of fraction parallel (0.00 to 1.00) and % vector (0% to 100%)]
* Theoretical acceleration using a highly-parallel Intel® Xeon Phi™ coprocessor versus a standard multi-core Intel® Xeon® processor
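The slide does not state the model behind its theoretical acceleration curve. As a rough illustration of why the parallel fraction matters so much, the sketch below uses Amdahl's law, which is an assumption substituted here and not the source's method. The worker count of 60 is a hypothetical placeholder.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the runtime that can run in parallel (0..1).
    workers: number of parallel execution units applied to that share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical worker count, chosen only for illustration.
WORKERS = 60
table = {p: amdahl_speedup(p, WORKERS) for p in (0.0, 0.25, 0.5, 0.75, 1.0)}

for fraction, speedup in table.items():
    print(f"parallel fraction {fraction:.2f}: {speedup:.2f}x")
```

Under this model the speedup stays modest until the parallel fraction approaches 1.0, which matches the slide's message that gains depend on efficient threading and vectorization.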
27. PARALLELIZING FOR HIGH PERFORMANCE
A Two Step Process
STARTING POINT: Typical serial code running on multi-core Intel® Xeon® processors. Current performance: 67.097 seconds
STEP 1. OPTIMIZE CODE: Parallelize and vectorize code and continue to run on multi-core Intel Xeon processors. 0.46 seconds, 145X faster
STEP 2. USE COPROCESSORS: Run all or part of the optimized code on Intel® Xeon Phi™ coprocessors. 0.197 seconds, 2.3X faster than Step 1 and 340X faster than the starting point
Source: Intel Measured results as of October 26, 2012 Configuration Details: Please reference slide speaker notes.
For more information go to http://www.intel.com/performance
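The speedup factors quoted for the two-step process follow directly from the three timings on the slide. The short check below uses only those timings; the variable and function names are introduced here for illustration.

```python
def speedup(before_seconds, after_seconds):
    """How many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


# Timings quoted on the slide, in seconds.
serial_s = 67.097       # starting point: typical serial code on Xeon
optimized_s = 0.46      # step 1: parallelized and vectorized on Xeon
coprocessor_s = 0.197   # step 2: optimized code on Xeon Phi coprocessors

step1 = speedup(serial_s, optimized_s)
step2 = speedup(optimized_s, coprocessor_s)
overall = speedup(serial_s, coprocessor_s)

print(f"Step 1: {step1:.1f}x, Step 2: {step2:.2f}x, Overall: {overall:.1f}x")
```

The overall factor is the product of the two steps, and most of it comes from optimizing the code in Step 1 rather than from the coprocessor.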
28. Highly-parallel Processing for Unparalleled Discovery
Seamlessly solve your most important problems of any scale
Intel® Xeon Phi™ product family
Based on Intel® Many Integrated Core (Intel® MIC) architecture
Leading performance for highly parallel workloads
Common Intel® Xeon® programming model seamlessly increases developer productivity
Launching on 22nm with >50 cores
Single Source Compilers and Runtimes
Intel® Xeon® processor
Ground-breaking real-world application performance
Industry-leading energy efficiency
Meet HPC challenges and scale for growth
29. A GROWING ECOSYSTEM
Developing today on Intel® Xeon Phi ™ Co-processors
Shown at SC’12, November 2012
Other brands and names are the property of their respective owners.
30. PERFORMANCE PROOF POINT:
ENERGY INDUSTRY
Sinopec iCluster PSDM
“This will provide an amazing boost for the performance of the Sinopec iCluster seismic imaging system.”
Zhao Gaishan, VP of Sinopec Geophysical Research Institute, November, 2012
• Application: Sinopec iCluster PSDM is a key module in the Sinopec iCluster* seismic imaging system. The split-step Fourier prestack depth migration (SSF PSDM) algorithm is ideal for mild lateral velocity variations. It provides one-way approximation with wave propagation performed in the frequency domain.
• Status: In-house code
• Usage Model: Offload
• Demonstrated Results: Dramatic scaling (5.3x) over baseline using two server nodes, each with two Intel® Xeon® processors and two Intel® Xeon Phi™ coprocessors
[Chart: speedup, higher is better (axis 0 to 6). 2S Intel® Xeon® processor E5-2680: 1; Intel® Xeon Phi™ Coprocessor (pre-production HW/SW): 1.06; 2S Intel® Xeon® processor E5-2680 + 2 Intel® Xeon Phi™ Coprocessors, two node (pre-production HW/SW): 5]
Code Optimization Strategies: software.intel.com/en-us/articles/optimize-seismic-imaging-processing-on-intel-xeon-phi
SOURCE: INTEL RESULTS AS OF JULY, 2013
31. PERFORMANCE PROOF POINT:
FINANCIAL SERVICES
Black-Scholes formula valuation
• Application: Black-Scholes financial modeling requires raw computational power plus high bandwidth between execution cores and memory
• Status: Case Study available
• Highlights: Dramatic scaling for both single- and double-precision computations
• Demonstrated Results:
• Intel® Xeon Phi™ coprocessor streaming store provides optimized cache and bandwidth usage
• Intel Xeon Phi coprocessor fast transcendental functions exp2(), log2() increase performance and accuracy on SP
• Intel® Xeon® processors also benefit from using exp2()/log2()
• Compiler-based code generation enables plain C++ code, which delivers higher performance than vector intrinsics
[Chart: speedup, higher is better (axis 0 to 7). 2S Intel® Xeon® processor E5-2670: 1 (single precision), 1 (double precision); 2S Intel Xeon processor E5-2670 + Intel® Xeon Phi™ Coprocessor (pre-production HW/SW): 5.81 (single precision), 2.85 (double precision)]
Read the Case Study: software.intel.com/en-us/articles/case-study-achieving-superior-performance-on-black-scholes-valuation-computing-using
SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2013
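For readers unfamiliar with the workload, the sketch below prices a European call with the standard Black-Scholes formula, then repeats the calculation with the natural exponential and logarithm rewritten in base 2, which is the kind of substitution the exp2()/log2() bullets refer to. It is an illustrative Python sketch, not the case study's code; the function names and the sample inputs are assumptions made here.

```python
import math

LN2 = math.log(2.0)
LOG2_E = 1.0 / LN2


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price(spot, strike, rate, vol, years):
    """European call price using natural exp() and log()."""
    sigma_t = vol * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / sigma_t
    d2 = d1 - sigma_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def call_price_base2(spot, strike, rate, vol, years):
    """Same formula with ln(x) = log2(x) * ln(2) and exp(x) = 2 ** (x * log2(e))."""
    sigma_t = vol * math.sqrt(years)
    log_moneyness = math.log2(spot / strike) * LN2
    d1 = (log_moneyness + (rate + 0.5 * vol * vol) * years) / sigma_t
    d2 = d1 - sigma_t
    discount = 2.0 ** (-rate * years * LOG2_E)
    return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)


# Sample inputs chosen for illustration: at-the-money, 5% rate, 20% volatility, 1 year.
price = call_price(100.0, 100.0, 0.05, 0.2, 1.0)
price_base2 = call_price_base2(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"call price: {price:.4f}, base-2 variant: {price_base2:.4f}")
```

Both variants agree to floating-point precision; any performance difference would come from how the hardware implements the base-2 functions, which this sketch does not measure.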
32. PERFORMANCE PROOF POINT:
GOVERNMENT AND ACADEMIC RESEARCH
LRZ/TUM SG++
• Application: Extends MATLAB with a toolbox for employing spatially adaptive sparse grids in a flexible, modular way
• Status: not released yet
• Workload Characteristics:
• Uses MPI, intrinsics, offload pragmas, and OpenMP
• Floating-point intensive and has a complex, non-linear kernel
• Innermost loop contains an if statement for efficient handling of high-dimensional grid boundaries (reduces computational complexity)
• Demonstrated Results:
• Highlight: Performance scales well to four Intel® Xeon Phi™ coprocessors per node
• Supports symmetric configurations with Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors
[Chart: speedup, higher is better (axis 0 to 4.5). 2S Intel® Xeon® processor E5-2670: 1; 2S Intel Xeon processor E5-2670 + 2 Intel® Xeon Phi™ Coprocessors (pre-production HW/SW): 3.92]
SOURCE: THIRD PARTY MEASURED RESULTS AS OF NOVEMBER, 2012
33. PERFORMANCE PROOF-POINT:
WEATHER AND CLIMATE RESEARCH
WRF v3.5
• Application: Weather Research and Forecasting (WRF)
• Status: WRF V3.5 was released 4/18/13
• Code Optimization:
• Approximately two dozen files with less than 2,000 lines of code were modified (out of approximately 700,000 lines of code in about 800 files, all Fortran standard compliant)
• Most modifications improved performance for both the host and the co-processors
• Performance Measurements: Pre-release of WRF 3.5 (V3.5Pre) and NCAR supported CONUS2.5KM benchmark (a high resolution weather forecast)
• Acknowledgments: There were many contributors to these results, including the National Renewable Energy Laboratory and The Weather Channel Companies
[Chart: speedup, higher is better (axis 0 to 1.6). 2S Intel® Xeon® processor E5-2670 with eight-node cluster configuration: 1; 2S Intel® Xeon® processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW) with eight-node cluster configuration: 1.4]
SOURCE: INTEL MEASURED RESULTS AS OF JULY, 2013
34. Next Generation
Intel® Xeon Phi™ Product Family
(Codenamed Knights Landing)
Available in Intel cutting-edge 14
nanometer process
Stand alone CPU or PCIe coprocessor
– not bound by ‘offloading’
bottlenecks
Integrated Memory - balances
compute with bandwidth
Parallel is the path forward - Intel is your roadmap
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Note that code name above is not the product name
36. Next Front of System Innovation: Fabrics
HPC Expertise
Fabric Management & Software
Highest Performance, Scalable IB
Products
HPC Expertise
Intellectual Property
World-class Interconnects
Low-latency Ethernet Switching
Data Center Ethernet Expertise
High Radix & Low Radix Switch
Products
Intel’s
Comprehensive
Connectivity and
Fabric
Portfolio
Market Leading Compute & Ethernet
Products
Platform Expertise
Unprecedented Rate of Innovation in HPC Fabric
Other brands and names are the property of their respective owners.
37. Intel® True Scale HPC Fabric
Key Differentiators
Connectionless
- Minimal on-adapter state
- No chance of cache misses as the cluster/fabric scales
- Maintains low end-to-end latency, even at scale
PSM Layer
- Performance Scaled Messaging (PSM): a lightweight interface between MPI (Message Passing Interface) and the InfiniBand device driver
- High MPI message rate performance
- Excellent short message efficiency
- Collective performance at scale without requiring special hardware acceleration
38. The Advantages of Fabrics Integration
Today: Intel® Processor connected through a System IO Interface (PCIe), 32 GB/sec, to a separate Fabric Controller with a 10-20 GB/sec Fabric Interface
Problem:
• Power – System IO Interface Adds “10s Of Watts” Incremental Power
• Cost & Density – More Components On A Server Node
• Scalability – Processor Capacity & Memory Bandwidth Scaling Faster Than System IO Bandwidth
Tomorrow: Intel® Processor with integrated Fabric Controller and a 100+ GB/sec Fabric Interface
Solution:
• Removing The System IO Interface From The Fabrics Solution Reduces Power
• An Integrated Fabric Results In Fewer Components On The Server Node
• An Integrated Fabric Balances Fabric and Compute, Scaling Application Performance & Efficiency
Fabrics Integration Required to Scale Performance & Power