Hardware and Software Co-optimization to Make Sure Oracle Fusion Middleware Rocks!
1. 1
Khun Ban (Intel),
Kingsum Chow (Intel)
September 22-26, 2013
3. 3
This is Why We Care About Java
http://en.wikipedia.org/wiki/Programming_languages_used_in_most_popular_websites
4. 4
Improvement Examples of JIT Compiler and Java Libraries
AES-NI in SunJCE intrinsics
• Used in Enterprise/Healthcare applications
PCLMULQDQ for CRC32
• Used in HBASE/Hadoop workloads
RDRAND for SecureRandom
• Intel Secure Key
AVX superword vectorization
• Used in HPC/Enterprise applications
AVX in string and array intrinsics
• Widely used in all applications
Java applications benefit in a transparent manner
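To make "transparent" concrete, here is a minimal sketch that uses only standard JDK APIs (javax.crypto, java.util.zip.CRC32, java.security.SecureRandom). The class and method names are hypothetical. Nothing in the code selects AES-NI, PCLMULQDQ, or RDRAND; whether a given JVM maps these calls onto those instructions depends on the JVM version, its flags, and the CPU, so treat any speedup as an assumption to measure rather than a guarantee.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical demo class: plain JDK calls, no CPU-specific code.
class TransparentAccelerationDemo {

    // Encrypts and decrypts with AES/CBC through the default JCE provider
    // and returns the decrypted bytes.
    static byte[] aesRoundTrip(byte[] plaintext) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey key = keyGen.generateKey();

            // Random IV from SecureRandom (the JVM decides how it is seeded).
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            IvParameterSpec ivSpec = new IvParameterSpec(iv);

            Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
            enc.init(Cipher.ENCRYPT_MODE, key, ivSpec);
            byte[] ciphertext = enc.doFinal(plaintext);

            Cipher dec = Cipher.getInstance("AES/CBC/PKCS5Padding");
            dec.init(Cipher.DECRYPT_MODE, key, ivSpec);
            return dec.doFinal(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Computes a CRC32 checksum with the standard library class.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] msg = "Fusion Middleware".getBytes(StandardCharsets.UTF_8);
        System.out.println("AES round trip ok: " + Arrays.equals(msg, aesRoundTrip(msg)));
        System.out.println("CRC32: " + Long.toHexString(crc32(msg)));
    }
}
```

The application code stays the same on every CPU generation; any acceleration happens below the API.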
5. Intel® Xeon® Processor E5-2600 v2 Product Family
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
1. 30% savings: Oracle: http://www.intel.com/content/www/us/en/data-center-efficiency/data-center-efficiency-xeon-oracle-changing-the-game-study.html
2. Baidu: http://www.intel.com/content/www/us/en/data-center-efficiency/data-center-efficiency-xeon-baidu-case-study.html. China Telecom: http://www.intel.com/content/www/us/en/enterprise-security/enterprise-security-xeon-5600-china-telecom-business-advantage-study.html?wapkw=china telecom.
3. Over previous generation Intel® processors. Intel internal estimate. For more legal information on performance forecasts go to http://www.intel.com/performance
*Other names and brands may be claimed as the property of others.
Decrease Data Center Power Costs …without Compromise
Operational Costs of a Typical Large Cloud Service Provider
• Servers: 50%
• Power: 23%
• Labor: 13%
• Networking: 6%
• Facilities: 5%
• Other IT: 3%
Power Management at the Server, Rack & Data Center Level
Intel® Node Manager
• Up to 30% power reduction at similar performance [1]
• Up to 40% more servers and performance per rack [2]
Intel® Xeon®
• Greater Workload Consolidation
• Up to 66% TCO reduction [3]
Intel® Data Center Manager
• Up to 30% power reduction at similar performance [1]
• Up to 40% more servers and performance per rack [2]
6. 6
Tick-Tock Development Model:
Sustained Microprocessor Leadership
Intel® Core™ Microarchitecture
• Xeon® 5300, 65nm: TOCK (new microarchitecture)
• Xeon® 5400, 45nm: TICK (new process technology)
Intel® Microarchitecture Codename Nehalem
• Xeon® 5500, 45nm: TOCK (new microarchitecture)
• Xeon® 5600, 32nm: TICK (new process technology)
Intel® Microarchitecture Codename Sandy Bridge
• Xeon® E5-2600, 32nm: TOCK (new microarchitecture)
• Xeon® E5-2600 v2, 22nm: TICK (new process technology)
Intel® Microarchitecture Codename Haswell
• Haswell, 22nm: TOCK (new microarchitecture)
• Future, 14nm: TICK (new process technology)
Latest Micro-architecture on Leading Process Technology
7. 7
Real Enhancements Where it Counts
Intel® Xeon® Processor E5-2600 v2 Product Family
50% MORE
• Last-level cache
• Cores & Threads
IMPROVED
• Integrated IO (PCIe 3.0)
• Faster Memory
23% LESS
• Idle power
NEW
• Virtualization features
• Security features
1. Source: Intel internal measurements: [idle power, Intel® Xeon® processor E5-26xx v2 (12C, 2.5GHz, 95W), 28 March 2013]. Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps. For more information go to http://www.intel.com/performance
* Other names and brands may be claimed as the property of others
8. 8
More Data Protection Capabilities
Intel® Xeon® Processor E5-2600 v2 Product Family
Data Protection: Good Cryptography Requires
• Strong Algorithms: Intel® AES-NI (Faster, Stronger Encryption)
• Strong Keys: Intel® Secure Key (True Randomness)
Intel® OS Guard: Reduces Malware Exploiting System Vulnerabilities
• User Mode: “Can perform limited tasks on system”
• Supervisor Mode (Kernel Mode / Ring-0), Operating System: “Can perform any task on system”
9. 9
Interrupt/APIC Virtualization: Overview
Intel® Xeon® Processor E5-2600 v2 Product Family
• Performance penalties of virtualization often due to VM exits
• Many VM exits are for interrupt controller processing
• Eliminate up to 50%* of VM exits
Without APICv:
• VMM must fetch/decode instruction
• ~2,000-7,000 cycles per exit* (varies by VMM)
With APICv:
• Instruction executes directly
• H/W and microcode emulate APIC
• No VM exit
[Diagram: Without APICv, the VMM models the APIC in software and the guest OS in the VM incurs VM exits. With APICv, the VMM configures APICv in CPU hardware/microcode and the guest OS incurs no VM exits.]
APICv: Advanced Programmable Interrupt Controller Virtualization
*Intel internal estimation of improvement vs E5-2690. Source: Intel internal projection (SPP JET Q2'12 approved results as of 19 July 2012). Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
*Other names and brands may be claimed as the property of others.
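As a rough worked example of what the slide's figures could mean, the sketch below multiplies an exit rate by the quoted ~2,000-7,000 cycles per exit and the "up to 50%" elimination. The exit rate of 100,000 per second is a made-up input, not a measurement, and the class and method names are hypothetical; the 2.7GHz clock matches the E5-2600 v2 SKU assumption in the speaker notes.

```java
// Hypothetical helper for back-of-the-envelope arithmetic only.
class VmExitCostSketch {

    // Cycles saved per second when a fraction of VM exits no longer happens.
    static long cyclesSavedPerSecond(long exitsPerSecond, long cyclesPerExit,
                                     double fractionEliminated) {
        return Math.round(exitsPerSecond * cyclesPerExit * fractionEliminated);
    }

    // The saving expressed as a percentage of one core's cycles per second.
    static double percentOfCore(long cyclesSaved, double coreHz) {
        return 100.0 * cyclesSaved / coreHz;
    }

    public static void main(String[] args) {
        long exitsPerSecond = 100_000; // assumption, not from the slides
        double coreHz = 2.7e9;         // 2.7GHz core clock
        for (long cyclesPerExit : new long[] {2_000, 7_000}) {
            long saved = cyclesSavedPerSecond(exitsPerSecond, cyclesPerExit, 0.50);
            System.out.printf("%d cycles/exit: %d cycles/s saved (%.1f%% of one core)%n",
                    cyclesPerExit, saved, percentOfCore(saved, coreHz));
        }
    }
}
```

The result scales linearly with the exit rate, so the real benefit depends entirely on how interrupt-heavy the guest workload is.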
10. 10
Intel® SSDs: Data Center Product Comparison
Intel® SSD | 910 Series | S3700 Series | S3500 Series | 710 Series | 320 Series
Interface | PCIe* gen2 x8 | SATA 6Gb/s ATA8 | SATA 6Gb/s ATA8 | SATA 3Gb/s ATA8 | SATA 3Gb/s ATA8
Form factor | ½ height, ½ length | 1.8, 2.5 inch | 1.8, 2.5 inch | 2.5 inch | 1.8, 2.5 inch
Capacities (GB) | 800, 400 | (1.8) 400, 200; (2.5) 800, 400, 200, 100 | (1.8) 800, 400, 240, 80; (2.5) 800, 600, 480, 300, 240, 160, 120, 80 | 300, 200, 100 | (1.8) 300, 160, 80; (2.5) 600, 300, 160, 120, 80, 40
Random performance (IOPS, 4KB files) | R: up to 180K; W: up to 75K | R: up to 75K; W: up to 36K | R: up to 75K [2]; W: up to 11.5K [2] | R: up to 38.5K; W: up to 2.7K | R: up to 39.5K [2]; W: up to 600 [2]
Sustained sequential performance | R: up to 2.0GB/s; W: up to 1.0GB/s | R: up to 500MB/s; W: up to 460MB/s | R: up to 500MB/s [3]; W: up to 450MB/s [3] | R: up to 270MB/s; W: up to 210MB/s | R: up to 270MB/s [3]; W: up to 200MB/s [3]
Power | 25W max, 8W stdby | 6W typ, 650mW [4] idle | 5W typ, 650mW [4] idle | 3.7W typ, 700mW idle | 4W active, 95mW idle
Latency | R, W: <65µs | R: 50µs, W: 65µs | R: 50µs, W: 66µs | R: 75µs, W: 85µs | R: 75µs, W: 95µs
Endurance [5] | Up to 14PB | 10 DWPD over 5 years | Up to 450TB | Up to 1.8PB | Up to 60TB
Rows whose cells span several products (values in slide order):
Reliability: 1 sector/10^16 bits read, max (UBER [5]); 2.0M Hrs MTBF; 2.0M Hrs MTBF; 2.0M Hrs MTBF; 1.2M Hrs MTBF
Security: No; AES-256 bit encryption; AES-128 bit encryption
Data path protection: Yes; LBA tag checking
Power-safe write cache: Yes
Temperature sensor: Yes; No
Current in-rush limiter: Yes
2. IOPS for 160GB; other capacities may vary.
3. Transfer rate for 300GB; other capacities may vary. Measurements are performed on 100-percent span of the drive (enterprise workload).
4. Without DIPM.
5. Endurance claims are “up to” using 4KB transfer sizes.
6. Uncorrectable Bit Error Rate
7. DWPD – Drive Writes Per Day
11. TRANSFORMING COMMUNICATIONS
Intel QuickAssist Technology
Intel® QuickAssist Technology providing hardware acceleration for encryption and compression in Chipset
• Up to 50Gbps Crypto Acceleration
• Up to 24Gbps Compression Acceleration
• Up to 40K RSA 2k-bit ops/sec Acceleration
Target Market Segments:
• Networking Infrastructure
• WAN Optimization
• Cloud Computing
• Integrated VPN/Firewall appliances
• UTM Gateway, Routers
• 3G & 4G/LTE infrastructure equipment
[Block diagram: two E5-2600 v2 processors linked by QPI, each with DDR3 memory; PCIe and DMI links to an Intel QuickAssist Adapter and an Intel Ethernet Controller (up to 2x10GbE); I/O: 2 SATA 2.0, 6 USB 2.0, 50 GPIO.]
12. 12
Intel Xeon Delivered Outstanding Oracle WebLogic Server Performance
• Intel and Oracle’s collaboration achieved 2X* performance in two years
– Software optimizations
– Hardware advances
• WebLogic Server is Oracle’s flagship Application Server
• Performance shown is for the most recent three generations of Xeon E5 processors
• http://www.spec.org/jEnterprise2010/results/res2011q3/jEnterprise2010-20110727-00023.html
• http://www.spec.org/jEnterprise2010/results/res2012q1/jEnterprise2010-20120208-00028.html
• http://www.spec.org/jEnterprise2010/results/res2013q3/jEnterprise2010-20130904-00046.html
Configuration
info
* Based on SPECjEnterprise2010 published result.
“The SPECjEnterprise2010 benchmark is a full system benchmark which allows performance
measurement and characterization of Java EE 5.0 servers and supporting infrastructure
such as JVM, Database, CPU, disk and servers.” -- http://www.spec.org/jEnterprise2010/
SPECjEnterprise2010 Performance (EjOPS)
• Intel Xeon X5690 (Aug 11), Oracle WLS 10.3.5: 5,427
• Intel Xeon E5-2690 (Mar 12), Oracle WLS 10.3.6: 8,310
• Intel Xeon E5-2697 v2 (Sept 13), Oracle WLS 12.1.2: 11,260
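The "2X" claim can be checked directly from the three published EjOPS figures on this slide. A minimal sketch (the class and method names are hypothetical):

```java
// Hypothetical helper: ratios between the published EjOPS results.
class SpecSpeedup {

    // Speedup of a result relative to a baseline result.
    static double speedup(double baseline, double result) {
        return result / baseline;
    }

    public static void main(String[] args) {
        double x5690 = 5_427;     // Aug 2011
        double e5_2690 = 8_310;   // Mar 2012
        double e5_2697v2 = 11_260; // Sept 2013
        System.out.printf("X5690 -> E5-2690:    %.2fx%n", speedup(x5690, e5_2690));
        System.out.printf("E5-2690 -> E5-2697v2: %.2fx%n", speedup(e5_2690, e5_2697v2));
        System.out.printf("X5690 -> E5-2697v2:  %.2fx%n", speedup(x5690, e5_2697v2));
    }
}
```

The end-to-end ratio comes out slightly above 2, which covers both the hardware change and the WebLogic Server version change between submissions.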
14. 14
Hard to Write Fast and Correct Multi-Threaded Code
Difficulty of Software Development
• Identify concurrency (algorithmic, manual…)
• Manage concurrency (locks, …)
Correctness Performance
15. 15
Need for Synchronization
Bob and Alice both saw A == $100. Locks prevent such data races.
Without a lock (A starts at $100):
• Alice wants $50 from A: A == $100, A set to $50
• Bob wants $60 from A: A == $100, A set to $40
• A should be -$10
With a lock on the table (A starts at $100):
• Alice wants $50 from A: Alice locks table; A == $100, A set to $50
• Bob wants $60 from A: Bob waits till lock release; A == $50, insufficient funds
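The locked scenario can be sketched in Java as follows. This is an illustration under assumptions, not code from the deck: the class name, the use of ReentrantLock, and the integer dollar amounts are all hypothetical choices. The balance check and the update happen under one lock, so Alice and Bob can never both act on the same stale $100.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account guarded by a single lock.
class LockedAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private int balance;

    LockedAccount(int initialBalance) {
        this.balance = initialBalance;
    }

    // Check-then-withdraw as one critical section.
    boolean withdraw(int amount) {
        lock.lock();
        try {
            if (balance < amount) {
                return false; // insufficient funds
            }
            balance -= amount;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // Alice asks for $50 and Bob for $60 at the same time;
    // returns how many of the two requests were approved.
    static int runAliceAndBob(LockedAccount account) {
        AtomicInteger approved = new AtomicInteger();
        Thread alice = new Thread(() -> { if (account.withdraw(50)) approved.incrementAndGet(); });
        Thread bob = new Thread(() -> { if (account.withdraw(60)) approved.incrementAndGet(); });
        alice.start();
        bob.start();
        try {
            alice.join();
            bob.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return approved.get();
    }

    public static void main(String[] args) {
        LockedAccount a = new LockedAccount(100);
        System.out.println("approved: " + runAliceAndBob(a) + ", balance: " + a.balance());
    }
}
```

Whichever thread gets the lock first wins; the other sees the reduced balance and is refused, so the balance never goes negative.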
16. 16
Lock Granularity Optimization
Such Tuning is Time Consuming and Error Prone
Table: A $100, B $200
Coarse grain locking (lock per table)
• Alice withdraws $20 from A: Alice locks table
• Bob wants $30 from B: waits for Alice to free table
Fine grain locking (lock per element)
• Alice withdraws $20 from A: Alice locks A
• Bob wants $30 from B: Bob locks B
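The two granularities can be put side by side in a short sketch. All names here are hypothetical, and the fine-grained version assumes the set of accounts is fixed at construction time so the map itself never changes while threads use it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock-per-table versus lock-per-element.
class LockGranularity {

    // Coarse grain: one lock guards every account in the table.
    static final class CoarseTable {
        private final ReentrantLock tableLock = new ReentrantLock();
        private final Map<String, Integer> balances;

        CoarseTable(Map<String, Integer> initial) {
            this.balances = new HashMap<>(initial);
        }

        boolean withdraw(String account, int amount) {
            tableLock.lock(); // Bob waits here even if he wants a different account
            try {
                Integer balance = balances.get(account);
                if (balance == null || balance < amount) return false;
                balances.put(account, balance - amount);
                return true;
            } finally {
                tableLock.unlock();
            }
        }

        int balance(String account) {
            tableLock.lock();
            try {
                return balances.get(account); // account must exist
            } finally {
                tableLock.unlock();
            }
        }
    }

    // Fine grain: each account carries its own lock.
    static final class FineTable {
        private static final class Row {
            final ReentrantLock lock = new ReentrantLock();
            int balance;
            Row(int balance) { this.balance = balance; }
        }

        // Populated once in the constructor and never modified afterwards.
        private final Map<String, Row> rows = new HashMap<>();

        FineTable(Map<String, Integer> initial) {
            for (Map.Entry<String, Integer> e : initial.entrySet()) {
                rows.put(e.getKey(), new Row(e.getValue()));
            }
        }

        boolean withdraw(String account, int amount) {
            Row row = rows.get(account);
            if (row == null) return false;
            row.lock.lock(); // only contends with threads touching the same account
            try {
                if (row.balance < amount) return false;
                row.balance -= amount;
                return true;
            } finally {
                row.lock.unlock();
            }
        }

        int balance(String account) {
            Row row = rows.get(account); // account must exist
            row.lock.lock();
            try {
                return row.balance;
            } finally {
                row.lock.unlock();
            }
        }
    }
}
```

Both classes give the same answers; the difference is only in who has to wait, which is why the choice is a tuning decision rather than a correctness one.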
17. 17
Complexity of Fine Grain Locking
Expensive and Difficult to Debug Millions of Lines of Code
Table: A $100, B $200
Alice transfers $20 from A to B
• Alice locks A and locks B
• Performs transfer
• Alice unlocks A and unlocks B
When two transfers overlap, neither can proceed (deadlock):
• Alice transfers $20 from A to B: locks A, cannot lock B
• Bob transfers $50 from B to A: locks B, cannot lock A
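One common way developers avoid this deadlock by hand is to always acquire the two locks in a fixed global order. The slides do not prescribe this technique; the sketch below is an illustration of the extra care fine grain locking demands, with hypothetical names and the assumption that every account has a distinct integer id.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical transfer routine using lock ordering by account id.
class OrderedTransfer {

    static final class Account {
        final int id; // assumed unique per account
        final ReentrantLock lock = new ReentrantLock();
        int balance;

        Account(int id, int balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Locks the lower-id account first, regardless of transfer direction,
    // so two opposite transfers can never each hold one lock and wait for the other.
    static boolean transfer(Account from, Account to, int amount) {
        if (from.id == to.id) {
            throw new IllegalArgumentException("accounts must have distinct ids");
        }
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                if (from.balance < amount) return false;
                from.balance -= amount;
                to.balance += amount;
                return true;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Alice moves $20 from a to b while Bob moves $50 from b to a, repeatedly.
    // Returns the combined balance once both threads have finished.
    static int runAliceAndBob(Account a, Account b, int rounds) {
        Thread alice = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 20); });
        Thread bob = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 50); });
        alice.start();
        bob.start();
        try {
            alice.join();
            bob.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance + b.balance;
    }

    public static void main(String[] args) {
        Account a = new Account(1, 100);
        Account b = new Account(2, 200);
        System.out.println("total after transfers: " + runAliceAndBob(a, b, 1_000));
    }
}
```

Every code path that takes two locks has to follow the same ordering rule, which is exactly the kind of discipline that is hard to enforce across millions of lines of code.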
18. 18
What We Really Want…
Lock Elision: Fine Grain Behavior at Coarse Grain Effort
• Developer uses coarse grain lock
• Hardware elides the lock to expose concurrency in program
  • Alice and Bob don’t wait
  • Hardware automatically detects real data conflicts
[Diagram: table with A $100, B $200, C $200. Developer effort: coarse grain locking. Program behavior, as executed by hardware: fine grain locking.]
19. 19
Let The CPU Handle the Locks
Hardware does the work of figuring out concurrency
• Fine grain performance at coarse grain effort
Intel® TSX ‡ : Instruction set extensions for IA
• Transactionally execute lock-protected critical sections
• Execute without acquiring the lock, exposing hidden concurrency
• Hardware manages transactional updates – All or None
• Specification at http://software.intel.com/file/41604
Intel® is making Parallel Programming Easier and Faster
‡ Intel® Transactional Synchronization Extensions (Intel® TSX), available on next generation Intel®
microarchitecture (Haswell)
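Intel TSX works in hardware and needs no change to the locking code, so it cannot be shown directly in portable Java. As a software analogy only, the sketch below uses the optimistic-read mode of java.util.concurrent.locks.StampedLock: a reader proceeds without acquiring the lock, validates afterwards, and falls back to a real lock only if a writer interfered. This is not TSX and not how a JVM implements lock elision; the class name and fields are hypothetical. It also illustrates that two concurrent readers never conflict with each other.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical two-account table protected by one coarse StampedLock.
class OptimisticBalance {
    private final StampedLock lock = new StampedLock();
    private int a = 100;
    private int b = 200;

    // Writers take the coarse lock exclusively: updates are all or none.
    void transfer(int amount) {
        long stamp = lock.writeLock();
        try {
            a -= amount;
            b += amount;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Readers first try without locking and retry under a read lock on conflict.
    int total() {
        long stamp = lock.tryOptimisticRead();
        int seenA = a;
        int seenB = b;
        if (!lock.validate(stamp)) {
            // A write overlapped the optimistic read: discard and re-read safely.
            stamp = lock.readLock();
            try {
                seenA = a;
                seenB = b;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return seenA + seenB;
    }

    int balanceA() {
        long stamp = lock.readLock();
        try {
            return a;
        } finally {
            lock.unlockRead(stamp);
        }
    }

    public static void main(String[] args) {
        OptimisticBalance table = new OptimisticBalance();
        table.transfer(20);
        System.out.println("A: " + table.balanceA() + ", total: " + table.total());
    }
}
```

The difference from TSX is scope: here the programmer writes the speculate-validate-fallback pattern explicitly and only for reads, whereas the slide describes hardware doing it for whole lock-protected critical sections.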
21. 21
Summary
• Intel has been working closely with our partners to make sure our
customers will get the best out-of-the-box Fusion Middleware
experience:
• Highest performance to improve user experience
• Lowest cost of ownership by reducing power
• Safest communication with better security
26. 26
Legal Disclaimer
Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the
instructions in the correct sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller
or system manufacturer. For more information, see Intel® Advanced Encryption Standard Instructions (AES-NI).
Intel® Hyper-Threading Technology (Intel® HT Technology) is available on select Intel® Core™ processors. Requires an
Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific
hardware and software used. For more information including details on which processors support Intel HT Technology, visit
http://www.intel.com/info/hyperthreading.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the results to
vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products. For more information go to
http://www.intel.com/performance.
Estimated Results Benchmark Disclaimer: Results have been estimated based on internal Intel analysis and are
provided for informational purposes only. Any difference in system hardware or software design or configuration
may affect actual performance.
Intel® Secure Key Technology: No system can provide absolute security. Requires an Intel® Secure Key-enabled
platform, available on select Intel® processors, and software optimized to support Intel Secure Key. Consult
your system manufacturer for more information.
27. 27
Compiler Notice
Intel's compilers may or may not optimize to the same degree for non-Intel
microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other
optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with
Intel microprocessors. Certain optimizations not specific to Intel
microarchitecture are reserved for Intel microprocessors. Please refer to the
applicable product User and Reference Guides for more information regarding
the specific instruction sets covered by this notice.
Notice revision #20110804
28. 28
Risk Factors
The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are
forward-looking statements that involve a number of risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,”
“believes,” “seeks,” “estimates,” “may,” “will,” “should” and their variations identify forward-looking statements. Statements that refer to or
are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel’s actual
results, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those
expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could cause actual
results to differ materially from the company’s expectations. Demand could be different from Intel's expectations due to factors including
changes in business and economic conditions; customer acceptance of Intel’s and competitors’ products; supply constraints and other
disruptions affecting customers; changes in customer order patterns including order cancellations; and changes in the level of inventory at
customers. Uncertainty in global economic and financial conditions poses a risk that consumers and businesses may defer purchases in
response to negative financial events, which could negatively affect product demand and other related matters. Intel operates in intensely
competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product
demand that is highly variable and difficult to forecast. Revenue and the gross margin percentage are affected by the timing of Intel product
introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings
and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to
technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from
expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products
for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; start-
up costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; product
manufacturing quality/yields; and impairments of long-lived assets, including manufacturing, assembly/test and intangible assets. Intel's
results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers
or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and
fluctuations in currency exchange rates. Expenses, particularly certain marketing and compensation expenses, as well as restructuring and
asset impairment charges, vary depending on the level of demand for Intel's products and the level of revenue and profits. Intel’s results
could be affected by the timing of closing of acquisitions and divestitures. Intel's results could be affected by adverse effects associated with
product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual
property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and regulatory matters described in Intel's SEC
reports. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more
products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies such as
compulsory licensing of intellectual property. A detailed discussion of these and other factors that could affect Intel’s results is included in
Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release.
Rev. 7/17/13
I heard that Twitter very recently replaced their Ruby and Scala code with Java because it performs much better.
Key Message: But we don’t stop our focus on energy savings at the CPU; we also proactively address opportunities for cost savings at the rack and data center level. Let us show you how to:
- Minimize your company’s impact on the environment
- Reduce DC power consumption
- Increase density at the system, rack and data center level
- All without compromising the performance of business solutions and applications
Power and Cooling costs represent as much as 23% of total OPEX for a typical large Cloud Service Provider. Intel can help you better manage your power bill in 3 ways:
- Latest generation of Xeon-based servers allows for greater workload consolidation & virtualization, and can bring TCO reduction of 66%. They are also built to withstand higher temperature levels common in modernized data centers.
- Intel Node Manager enables control of power consumption at the server level, and can deliver up to 30% power savings without compromising performance.
- Intel Data Center Manager manages at the rack, row & Data Center level, and provides a console to manage and optimize workload placement based on dynamic power usage insights.
That’s why the Intel Software and Services group, and Intel’s software subsidiaries, play such a huge role in fulfilling Intel’s vision to create and extend computing technology to connect and enrich the lives of people. We work with a global ecosystem of developers to help them access and exploit the tremendous capabilities of Intel processors and technologies so that they can deliver amazing experiences to end-users. We have ongoing focused efforts to help ensure those experiences are secure and optimized. Plus we are continuing to work on making anytime/anywhere computing a reality. We’ll talk more about how we are doing this later in the presentation. But first let me tell you a little more about this Intel Software and Services team that is helping to transform the world of computing.
Khun - added
Intel® Xeon® Processor E5-2600 v2 product family (formerly code-named Ivy Bridge-EP), the new chips:
- feature up to 12 cores, a 50% increase from the previous generation;
- have double the system memory capacity, from 750GB to 1.5TB of DRAM memory;
- are up to 45% more energy efficient;
- feature advanced security features, stronger encryption and a new feature called Intel OS Guard, which fortifies the operating system against malware attacks.
Key points:
- Significant performance gains with up to 50% more cores and cache at the same or lower power levels
- Continued advancements in security and virtualization performance
Story: To meet the growing demands of IT such as readiness for cloud computing, the growth in users and the ability to tackle the most complex technical problems, Intel has focused on improving the processor that lies at the heart of a next generation data center. The Intel Xeon processor E5-2600 v2 product family supports up to 12 cores and 30MB of cache, which is 50% greater than the Intel Xeon processor E5-2600 product family, while operating at the same or lower power levels. Initial performance estimates are that this increase in processing capabilities will offer up to 40% more performance on a range of applications. Whether you are looking to use analytics to turn data into insights, using simulation tools to bring better products to market faster than ever or using social media applications to deepen relationships with customers, the latest Xeon processors will offer more responsiveness and flexibility to drive your data center and business to the next level.
Key points: Continuing to reduce VMM overhead to get virtual performance as close as possible to a native OS environment
Story: In a virtualized environment the virtual machine manager (VMM) must emulate nearly all guest OS accesses to the advanced programmable interrupt controller (APIC) registers, which requires “VM exits”: time-consuming transitions to the VMM for emulation, and back. These exits are a major source of overhead in a virtual environment. Intel’s Advanced Programmable Interrupt Controller virtualization (APICv) reduces the number of exits by redirecting most guest OS APIC reads/writes to a virtual-APIC page to allow most reads to occur without VM exits. While final performance tests are not complete, Intel forecasts that this will eliminate up to 35% of VM exits related to the APIC and allow up to another 30% of the VM exits to occur faster.
Description: 2-socket servers based on Intel Xeon processor E5 product family through 2013. Baseline 1.0 is Xeon E5-2690 (8C Sandy Bridge EP).
SKU Assumptions:
- E5-2690: 8C, 135W, 2.9GHz
- E5-2600 v2: 12C, 130W, 2.7GHz
Additional Details:
VMM must virtualize guest’s interrupts and interrupt controller (APIC)
- Models APIC control state on a “virtual-APIC page” in DRAM
VMM must emulate nearly all guest accesses to APIC control registers
- Requires “VM exits”: time-consuming transitions into VMM for emulation, and back
- VMM must decode and emulate guest instructions that access APIC
- Except for Intel® VT FlexPriority, which virtualizes access to one APIC control register (task priority, TPR); no VM exits required in this case
VMM must virtualize all interrupts coming to guest
- Must determine when guest is ready to receive interrupts and deliver as needed
Virtualization of interrupts and APIC is a major source of overhead (illustration on next slide)
APIC-register virtualization:
- Redirects most guest APIC reads/writes to virtual-APIC page
- Most reads will be allowed without VM exits, such as interrupt command register ICR_Low
- VM exits occur after writes (no need for decode), such as ICR_low, timer’s initial-count register
Virtual-interrupt delivery:
- Extend TPR virtualization to other APIC registers
- No need for VM exits for most frequent accesses (e.g., EOI, required for every interrupt)
- CPU delivers virtual interrupts to guest (including virtual IPIs)
- VMM needn’t track guest readiness or deliver manually
- Eliminates old “pending interrupt” VM exits
Net result*: (Intel Netperf estimation)
- Eliminate up to 50% of VM exits (most of those related to virtualization of interrupts/APIC)
- Optimize up to 10% of VM exits (emulation made easier for some APIC writes)
Make sure the audience understands what we mean by data conflict. And say “read-read” is NOT a conflict.