4. Data Growth Phenomenon
• 750M: photos uploaded to Facebook over the 2011 New Year’s weekend
• 966PB: data stored in manufacturing as of 2009
• 200M: tweets sent every day in August 2011
• 6.7PB: video generated every day in a Smart City project in China
• $60B: potential annual value from Big Data to US health care
• $20B+: spent on acquisition of data storage, management, and analysis companies in the last 12 months
• $100B+: value for service providers from global personal location data
• 50%: decrease in product development and assembly costs for manufacturing
Source: McKinsey Global Institute Analysis
5. What is Big Data?
Attribute | Traditional Data | Big Data
Volume | Gigabytes to terabytes | Petabytes and beyond
Velocity | Occasional batch – complex event processing | Real-time data analytics
Variety | Centralized, structured (i.e., database) | Distributed, unstructured, multi-format
6. Why is Big Data Important?
• Smart City Project: Improve Public Safety, Boost Economic Growth
• Up to 50% Decrease in Product Development and Assembly Costs¹
• Generate Revenue from Data Analytics of B2B Sales
• Online Retailer Generated 30% of Sales Due to Analytics-Driven Recommendations¹
Data is the Raw Material of the Information Age
¹ McKinsey Global Institute Analysis
*Other brands and names are the property of their respective owners.
8. Changing Economics for Big Data Challenges
[Chart: Annual Server Unit Shipments, 1990 to 2010]
Supercomputing, 1997 vs. 2010:
• Performance: ~1 TFLOP (1997), >500 TFLOPS (2010)
• Cost: ~$55K/GFLOP (1997), <$100/GFLOP (2010)
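The supercomputing figures on this slide can be turned into improvement factors. The sketch below is only illustrative: it reads the chart labels as ~1 TFLOP at ~$55K/GFLOP in 1997 and >500 TFLOPS at <$100/GFLOP in 2010 (an assumed reading of the chart), and it treats those approximate bounds as exact values.

```python
# Figures read from the slide; approximate bounds treated as exact
# values for illustration only (an assumption, not Intel's calculation).
PERF_1997_TFLOPS = 1.0          # "~1 TFLOP" in 1997
PERF_2010_TFLOPS = 500.0        # ">500 TFLOPS" in 2010 (lower bound)
COST_1997_PER_GFLOP = 55_000.0  # "~$55K/GFLOP" in 1997
COST_2010_PER_GFLOP = 100.0     # "<$100/GFLOP" in 2010 (upper bound)


def ratio(larger: float, smaller: float) -> float:
    """Return how many times `larger` exceeds `smaller`."""
    return larger / smaller


perf_gain = ratio(PERF_2010_TFLOPS, PERF_1997_TFLOPS)
cost_drop = ratio(COST_1997_PER_GFLOP, COST_2010_PER_GFLOP)

print(f"Performance gain, 1997 to 2010: at least {perf_gain:.0f}x")
print(f"Cost per GFLOP reduction:       at least {cost_drop:.0f}x")
```

Because the 2010 figures are bounds, both factors are minimums under this reading.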
9. The Heart of a Next Generation Cloud
Intel® Xeon® E5: The Cloud’s primary building block
• Up to 80% performance boost vs. prior gen¹ at consistent power level
• Dramatically reduce compute time with Intel® Advanced Vector Extensions
• Performance when you need it with Intel® Turbo Boost Tech 2.0
• Up to 66% reduction in total cost of ownership¹
Delivers 100X Performance Boost since 2000
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific
computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you
in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
1 Over previous generation Intel® processors. Intel internal estimate. For more legal information on performance forecasts go to http://www.intel.com/performance
2 Intel measurements of average time for an I/O device read to local system memory under idle conditions. Improvement compares Xeon E5-2600 family vs. Xeon 5600 series
3 Source: Intel internal analysis (backup); 2008 of 3 yr TCO. 4X power efficiency of 4 year old servers. See www.intelsalestraining.com/xeonestimator
4 Intel. As reported at Q1’12 Intel earnings call.
11. Intel® Xeon® Processors:
Solve the Most Important Problems of Any Scale
373 of Top500* supercomputers are powered by Intel® Architecture
12. Enabling a Vibrant Ecosystem
Intel Software and Services
• Engaging 7,000 ISVs via Software Partner Program
• Intel Software Network: Providing resources to > 8.3M developers
• Provisioning >2800 academic institutions with curricula, tools, training and research
• Support Open Source
Enabling AWS to run your choice of OS, Applications & Programming Languages
13. Amazon Web Services powered by
Intel® Xeon® processor E5-2670
Access to Supercomputing On-demand
Latest Intel® Xeon® performance enhancements without disruption
Business Agility to Efficiently Perform Data Intensive Tasks in Less Time
14. Intel® Powered Supercomputer at AWS
AWS built the 42nd fastest supercomputer in the world
1,064 Amazon EC2 CC2 instances with 17,024 cores
240 teraflops cluster (240 trillion calculations per second)
Less than $1,000 per hour
Based on Intel® Xeon® processor E5-2670
Supercomputers by the Hour … for Everyone
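The cluster figures above can be broken down per instance and per core. This is a rough sketch that uses only the numbers on this slide and treats "less than $1,000 per hour" as an upper bound; the variable names are illustrative.

```python
# Numbers taken from the slide; the hourly price is an upper bound.
INSTANCES = 1_064
CORES = 17_024
CLUSTER_TFLOPS = 240.0
MAX_COST_PER_HOUR_USD = 1_000.0

# Cores divide evenly across instances.
cores_per_instance = CORES // INSTANCES

# Average throughput per instance and per core.
tflops_per_instance = CLUSTER_TFLOPS / INSTANCES
gflops_per_core = CLUSTER_TFLOPS * 1_000 / CORES

# Upper bound on the hourly price of one teraflop of capacity.
max_cost_per_tflop_hour = MAX_COST_PER_HOUR_USD / CLUSTER_TFLOPS

print(f"Cores per instance:      {cores_per_instance}")
print(f"TFLOPS per instance:     {tflops_per_instance:.3f}")
print(f"GFLOPS per core:         {gflops_per_core:.1f}")
print(f"Max $ per TFLOP-hour:    {max_cost_per_tflop_hour:.2f}")
```

Under these figures each instance contributes 16 cores, and a teraflop of capacity costs at most about $4.17 per hour.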
15. Intel® & AWS deliver scale that lowers your cost
AWS Scale & Innovation Drives Customers’ Costs Down
• Attract More Customers
• Invest in Capital
• Invest in Technology
• Improve Efficiency
• Reduce Prices
19 AWS Price Reductions
Fueling Innovation in the Cloud
16. Business Agility
Experiment Often & Fail Quickly with AWS on Intel
Cost of failure falls dramatically
People are free to try out new ideas
More risk taking, more innovation
19. With Nimbus Discovery, looking at a cancer drug target:
• Completed 12.55 Compute Years of Work
• Analyzed ~21 Million Ligands
• In only 3 hours, at a cost of $4828.85 / hour
• Instead of $20+ Million in infrastructure
Intel & AWS make impossible Big Science possible
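The figures above imply a total run cost and a degree of parallelism. The sketch below is a back-of-the-envelope calculation that assumes a "compute year" means one processing unit running for an average calendar year of 365.25 days; the slide does not define the term, so the parallelism figure is only indicative.

```python
# Figures from the slide.
COMPUTE_YEARS = 12.55
WALL_CLOCK_HOURS = 3.0
COST_PER_HOUR_USD = 4828.85

# Assumption: one compute year = one unit busy for 365.25 days.
HOURS_PER_YEAR = 365.25 * 24

# Total spend for the three-hour run.
total_cost_usd = WALL_CLOCK_HOURS * COST_PER_HOUR_USD

# Work expressed in unit-hours, and the number of units that must
# run concurrently to finish it within the wall-clock time.
compute_hours = COMPUTE_YEARS * HOURS_PER_YEAR
implied_parallel_units = compute_hours / WALL_CLOCK_HOURS

# Effective price of one unit-hour of work.
cost_per_compute_hour = total_cost_usd / compute_hours

print(f"Total cost:             ${total_cost_usd:,.2f}")
print(f"Compute hours:          {compute_hours:,.1f}")
print(f"Implied parallel units: {implied_parallel_units:,.0f}")
print(f"Cost per compute hour:  ${cost_per_compute_hour:.3f}")
```

Under that assumption the whole run costs roughly $14,487, compared with the $20+ million infrastructure figure on the slide.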
20. Weblog Analysis Suggests What You Are Searching For
Better consumer experience through Big Data analysis
21. Power and Simplicity of AWS on Intel® Xeon® Processors:
Speeds your Time to Market
24. Tick-Tock Development Model
Sustained Xeon® Microprocessor Leadership
Tick-tock cadence across process nodes: 65nm, 45nm, 32nm, 22nm
Intel® Core™ Microarchitecture:
• First high-volume server Quad-Core CPUs
• Dedicated high-speed bus per CPU
• HW-assisted virtualization (VT-x)
Nehalem Microarchitecture:
• Up to 6 cores and 12MB Cache
• Integrated memory controller with DDR3 support
• Turbo Boost, Intel HT, AES-NI¹
• End-to-end HW-assisted virtualization (VT-x, -d, -c)
Sandy Bridge Microarchitecture:
• Up to 8 cores and 20MB Cache
• Integrated PCI Express
• Turbo Boost 2.0
• Intel Advanced Vector Extensions (AVX)
25. APPROVED FOR PUBLIC USE
Intel® Xeon® Processor E5-2600 Product Family
Historical 2S Integer Throughput Performance
[Chart: integer throughput performance relative to a baseline score (higher is better) for successive two-socket Xeon platforms from 2000 onward, grouped as Single Core, Dual Core, Quad Core, Six Core, and Eight Core, with a 100X marker]
Intel® Xeon® Delivers 100X Boost in 2S Integer Throughput Performance since 2000
Source: Intel Internal Assessment and Estimates. For more information go to http://www.intel.com/performance
Editor's notes
What is Big Data: Big Data represents the information that is generated and requested as a result of connecting over 2.5B people with billions of devices, supported by billions more sensors and intelligent connected systems, to the Internet. This data will continue to grow at a rapid pace as more and more devices and users join online services. Tools to enable near-real-time processing of this data are maturing with updated infrastructure like the cloud, or embedded IT infrastructures and storage. The “Vs” represent the elements used to classify the different actions and behavior on this information. Ultimately the goal is to come up with the 4th V, Value: the quickest and most confident time-to-value response for a given set of requests and conditions on the digital environment you are connected to.
If harnessed and managed correctly, the impact of Big Data insights is monumental. Imagine a world where person-to-person and machine-to-machine analysis and understanding is made available at your fingertips for business, social and ecological purposes. The gains in efficiency for the cities we live in and love to visit would be tremendous.
Smart City Project: The Smart Cities project is creating an innovation network between cities and academics to develop and deliver better services to citizens and businesses. Example: “In Rio de Janeiro, IBM has developed one of the most ambitious urban management command centers that have ever arisen. Rio’s operation center is a large situation room that offers real time data of the different systems that govern the city. This integrates, from 400 cameras on the street and a wide network of sensors, 32 municipal agencies to exchange information in order to solve urgent crises such as a power outage, an endless rain or a traffic jam, taking decisions instantly.” (source: http://www.paseoproject.eu/en/imaging-smart-city-minds/)
Local example as of 7/11: Taipei has become the first “smart city with free Internet access” among the world's international cities. All people in Taiwan, whether they are residents or visitors from foreign countries or mainland China, can take advantage of the system, Hau said. (source: www.chinapost.com.tw/taiwan/local/taipei/2011/07/02/308329/Taipei-flips.htm)
Taobao.com (Taobao: Cutting Costs and Saving Power)
• Established in 2003, Taobao.com was financed by Alibaba Group.
• Taobao.com is the largest online shopping platform, with a market segment share of over 80 percent of the e-business in China.
• Taobao.com has more than 800 million pieces of product information and over 370 million registered users. It is one of the top 20 websites by number of page views worldwide, with over 60 million visits every day.
Manufacturing and Retail, per the McKinsey report “Big data: The next frontier for innovation, competition, and productivity” (source: https://www.box.com/s/b8ffbe1253be764ec33b)
Retail info: In the coming years, the continued adoption and development of big data levers have the potential to further increase sector-wide productivity by at least 0.5 percent a year through 2020. Among individual firms, these levers could increase operating margins by more than 60 percent for those pioneers that maximize their use of big data… US online and Web-influenced retail sales are forecast to become more than half of all sales by 2013… The volume of data is growing inexorably as retailers not only record every customer transaction and operation but also keep track of emerging data sources such as radio-frequency identification (RFID) chips that track products, and online customer behavior and sentiment.
Specific example: Amazon’s results from their “you might also want” prompts (note: I dropped the Amazon-specific attribution since we did not get it directly from them).
Manufacturing info: Manufacturing stores more data than any other sector, with close to 2 exabytes of new data stored in 2010. This sector generates data from a multitude of sources, from instrumented production machinery (process control), to supply chain management systems, to systems that monitor the performance of products that have already been sold (e.g., during a single cross-country flight, a Boeing 737 generates 240 terabytes of data).
Example: Product lifecycle management (PLM) becomes a platform for “co-creation”: OEMs and part suppliers can collaborate on design online. Toyota, Fiat, and Nissan have all cut new-model development time by 30 to 50 percent; Toyota claims to have eliminated 80 percent of defects prior to building the first physical prototype.
Additional context: Big data are driving additional efficiency in the production process with the application of simulation techniques to the already large volume of data that production generates. The increasing deployment of the “Internet of Things” is also allowing manufacturers to use real-time data from sensors to track parts, monitor machinery, and guide actual operations.
When the Pentium Pro processor was introduced back in 1995, we shipped fewer than a million servers based on Intel processors, and less than 10% of the revenue spent on all server hardware was based on Intel architecture; 90% was based on other proprietary architectures. What you’ve seen since is dramatic growth in the total volume of servers, with a significant portion of that driven by Intel-based processors. Today, per IDC, the industry ships a little over 8 million servers a year, with 8 out of 10 of those servers based on Intel. Little did we know back in 1995 that we were one of the key ingredients coming together to enable the transformation of the internet and the growth of the World Wide Web: a standard high-volume server that let internet users scale in a cost-effective manner, combined with the standards of the time (HTML, HTTP) and software such as Apache web servers and Netscape browsers. All of these factors converged to create the internet phenomenon and drive that growth. Of course we didn’t forecast that with the Pentium Pro processor, but we’re very proud to have been a part of it.
Intel Xeon-based servers have long been adopted by IT shops as the leading compute building blocks, and the most popular new Xeon-based servers today are built on the Xeon E5-2600 processor series, which offers advanced capabilities that simplify and save.
To meet the growing demands of IT, such as readiness for cloud computing, the growth in users and the ability to tackle the most complex technical problems, Intel has focused on increasing the capabilities of the processor that lies at the heart of a next generation data center. The Intel Xeon processor E5-2600 product family is the next generation Xeon processor that replaces platforms based on the Intel Xeon processor 5600 and 5500 series. These processors offer better-than-ever performance no matter what your constraint is (floor space, power or budget) and on workloads that range from the most complicated scientific exploration to simple, yet crucial, web serving and infrastructure applications. Building on the success of its Xeon 5600 predecessor, the E5-2600 product family has increased processor core count and cache size, in addition to supporting more efficient instructions with Intel® Advanced Vector Extensions, to deliver up to an average of 80% more performance across a range of workloads. In addition to the raw performance gains, we’ve invested in improved I/O with Intel Integrated I/O, which reduces latency by ~30% while adding more lanes and higher bandwidth with support for PCIe 3.0. This helps reduce network and storage bottlenecks to unleash the performance capabilities of the latest Xeon processor. Deploying the E5-2600 can reduce your total cost of ownership by up to 66% via savings in utilities, software support and maintenance. Check out the online TCO tool to estimate savings in your specific situation. Now let’s turn to storage…
I want to highlight four of our key enabling programs.
With 7,000 member companies, the Intel® Software Partner Program helps support optimization efforts by concentrating resources, references, and tools for software optimization into key Technology Focus Areas.
Through the Intel® Software Network, developers can connect with a plethora of communities, tools, training, events and more to do their jobs better and to deliver more efficient, higher-performing software to the market more quickly. These communities include mobility, open source, virtualization, visual computing, multi-threading, manageability, Intel® Atom™ processors and more.
The Intel Academic Community provides online training and supplies higher education institutions with technical curricula and other resources.
The Intel AppUp℠ developer program, mentioned previously, provides the tools, resources, and support developers need to easily create, port, package, and sell apps for multiple device platforms worldwide through the Intel AppUp center and 20+ affiliate stores.
Other enabling efforts include a customer response team and high-touch enabling, which might involve on-site engineers, to speed development.
The more misspelled words you collect, the better your spellcheck application becomes.
Individual engineers are empowered to find answers in the vast data logs, driving innovation and value.
Key points:
• The Intel® Xeon® processor E5-2600 product family offers ~80% higher performance on key industry benchmarks.
• On some synthetic benchmarks focused on technical computing we are seeing even higher results (over 2X), but the typical user experience should see improvements closer to 80%.
• Comparisons are top-bin 5600 to top-bin 2600 in 2S configuration.
Story: Compared to the best Intel Xeon processor 5600 series part, the Intel Xeon processor E5-2600 product family offers significant performance improvements across a range of workloads. What you’ll see on this slide is a cross section of enterprise and technical computing workloads that gives a flavor of the scale of benefits that a typical user would see. For example, integer throughput (aka SPECint_rate), which is a good proxy for a typical enterprise server, shows ~70% improvement, while technical computing workloads (think everything from supercomputers crunching advanced physics problems to workstations rendering media content) are seeing even stronger results. You may notice that I’ve said that we’re seeing up to 80% performance improvement but have actually measured results over 2X. Those results come from specific, synthetic benchmarks that focus on testing particular elements of the processor, such as STREAM, which measures memory bandwidth, and LINPACK, which is used to rank supercomputers’ computational power. We’re seeing fantastic results based on the latest microarchitecture, but when I talk to you about performance I want to make sure that I set the right expectation about what you should expect to see when you run your applications, which utilize every part of the server and not just narrow elements. So what enables us to deliver this kind of performance improvement vs. the prior generation even though these parts are on the same manufacturing technology?