The document announces Intel's Open Platform for Next-Gen Analytics, including the Intel Distribution for Apache Hadoop software. The software delivers hardware-enhanced performance and security for Apache Hadoop and enables partners to innovate analytics solutions. Intel aims to democratize data analysis from edge to cloud with open platforms and software value.
Intel And Big Data: An Open Platform for Next-Gen Analytics
1. Open Platform for Next-Gen Analytics
Boyd Davis
VP Intel Architecture Group
GM Datacenter Software Division
@IntelITS
2. Legal Information
Today’s presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of
risks and uncertainties, and actual results may differ materially. Please refer to our most recent Earnings Release and our most recent
Form 10-Q or 10-K filing for more information on the risk factors that could cause actual results to differ.
If we use any non-GAAP financial measures during the presentations, you will find on our website, intc.com, the required reconciliation
to the most directly comparable GAAP financial measure.
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS
IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS
INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark
and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the
results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance
of that product when combined with other products.
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of
record product roadmaps.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not
specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference
Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
3. Making sense of one petabyte
50x: To read in Library of Congress
13y: To view as HD Video
11s: To generate in 2012
http://blogs.loc.gov/digitalpreservation/2011/07/transferring-libraries-of-congress-of-data/
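As a rough sanity check on the figures above, the sketch below converts two of them into more familiar rates. It assumes a decimal petabyte (10^15 bytes) and a 365.25-day year; neither assumption is stated on the slide, and the function names are illustrative only.

```python
PETABYTE_BYTES = 10**15                 # assumption: decimal petabyte
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: average year length


def implied_video_bitrate_mbps(years_to_view: float) -> float:
    """Bitrate (Mbit/s) at which one petabyte of video lasts `years_to_view`."""
    seconds = years_to_view * SECONDS_PER_YEAR
    return PETABYTE_BYTES * 8 / seconds / 1e6


def implied_yearly_data_zettabytes(seconds_per_petabyte: float) -> float:
    """Data generated per year (ZB) if one petabyte appears every N seconds."""
    petabytes_per_year = SECONDS_PER_YEAR / seconds_per_petabyte
    return petabytes_per_year * PETABYTE_BYTES / 1e21


if __name__ == "__main__":
    print(f"{implied_video_bitrate_mbps(13):.1f} Mbit/s")
    print(f"{implied_yearly_data_zettabytes(11):.2f} ZB per year")
```

Under these assumptions, 13 years of viewing implies a stream of a little under 20 Mbit/s, and one petabyte every 11 seconds implies a little under 3 zettabytes per year.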
4. Analysis of data can transform society
Enhance scientific understanding, drive
innovation, and accelerate medical cures
Create new business models and
improve organizational processes
Increase public safety and improve
energy efficiency with smart grids
5. Virtuous cycle of data-driven user experience
Richer user experiences
Richer data to analyze
Richer data from devices
CLIENTS, CLOUD, INTELLIGENT SYSTEMS
6. Intel at the intersection of forces behind big data
HPC: Enabling exascale computing on massive data sets
Cloud: Helping enterprises build open interoperable clouds
Open Source: Contributing code and fostering ecosystem
Intel® TrueScale Infiniband
* Other names and brands may be claimed as the property of others.
7. Democratize data analysis from edge to cloud
Unlock value in silicon
Support open platforms
Deliver software value
8. History of Intel and Apache Hadoop*
Timeline, 2009 to 2013
Stages: Benchmarking, Tuning, Optimization, Product
Milestones: Open Cirrus*, HiBench, Release 1.0 (2011), Release 2.0 (2012)
Segments: Web, Healthcare, Retail, Research, Telco, Smart City
9. Announcing availability of
Intel® Distribution for Apache Hadoop* software
Hardware-enhanced performance & security
Enables partner innovation in analytics
Strengthens Apache Hadoop* ecosystem
10. Intel® Distribution for Apache Hadoop* software
Intel® Manager for Apache Hadoop software: Deployment, Configuration, Monitoring, Alerts, and Security
Sqoop 1.4.1: Data Exchange
Flume 1.3.0: Log Collector
Oozie 3.3.0: Workflow
Pig 0.9.2: Scripting
Mahout 0.7: Machine Learning
R connectors: Statistics
Hive 0.9.0: SQL Query
HBase 0.94.1: Columnar Store
Zookeeper 3.4.5: Coordination
YARN (MRv2): Distributed Processing Framework
HDFS 2.0.3: Hadoop Distributed File System
Legend: Intel proprietary; Intel enhancements contributed back to open source; open source components included without change
All external names and brands are claimed as the property of others.
11. Intel® Distribution for Apache Hadoop* software
• Up to 20x faster decryption with AES-NI*
• Optimized with SSD and Cache Acceleration
• Up to 8.5X faster queries in Hive*
• Hardware-enhanced compression with AVX & SSE4.2
• Automated tuning with Intel® Active Tuner
*Based on internal testing
12. Sold with World-Class Intel Support
Annual Subscription with Technical Support
Support Coverage Options: 24x7 or 8x5
Via Solution Vendors and Service Providers
13. Backed by broad portfolio of datacenter products
Software
Cache Acceleration Software
Server, Storage & Memory, Network
14. Paul Perez
Vice President and GM
Data Center Group
15. Intel portfolio delivers balanced performance
Shown to improve 1 Terabyte sort from 4 hours to 7 minutes
Baseline (>4 hours): Intel® Xeon 5690, 7200 HDD, 1GbE Adapter
• Intel® Xeon® E5-2690 processor: ~50% improved
• Intel® SSD 520 Series: ~80% improved
• Intel® 10GbE Adapters: ~50% improved
• Intel® Distribution for Apache Hadoop* software: ~40% improved
Result: ~7 minutes
Source: Intel Internal testing
For more information go to: intel.com/performance
Other brands and names are the property of their respective owners
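The stage-by-stage figures on this slide are consistent with the headline claim if each "improved" percentage is read as a reduction in the remaining sort time, applied one after another. That reading is an assumption, not something the slide states, and the 240-minute baseline stands in for ">4 hours". A minimal sketch, with illustrative names:

```python
from typing import Sequence


def compound_reductions(baseline_minutes: float, reductions: Sequence[float]) -> float:
    """Apply successive fractional time reductions to a baseline duration."""
    remaining = baseline_minutes
    for r in reductions:
        if not 0.0 <= r < 1.0:
            raise ValueError(f"reduction must be in [0, 1): {r}")
        remaining *= 1.0 - r
    return remaining


# Percentages as shown on the slide; treating them as reductions in
# remaining run time is an assumption.
STAGE_REDUCTIONS = [0.50, 0.80, 0.50, 0.40]
BASELINE_MINUTES = 240.0  # assumption: ">4 hours" rounded down to 4 hours

if __name__ == "__main__":
    final = compound_reductions(BASELINE_MINUTES, STAGE_REDUCTIONS)
    print(f"{final:.1f} minutes, about {BASELINE_MINUTES / final:.0f}x faster overall")
```

Under these assumptions the four stages compound to about 7.2 minutes, in line with the "~7 minutes" on the slide. Because the reductions multiply, the order in which the upgrades are applied does not change the final figure.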
16. Proven in the enterprise
Using the Intel® Distribution to gain tremendous results
IT
17. Satnam Alag
Vice President and CTO
18. Delivering innovation in the open
Pipeline of innovation from Intel Labs
• Machine Learning
• Data-Intensive Algorithms & Computer Architecture
Roadmap of open source from Intel Software
• Project Panthera: Standard SQL on Apache Hadoop
• Project Rhino: Hardening Apache Hadoop
19. Lighting up unused data for big impact
Intel accelerating adoption of Hadoop
+
Apache Hadoop landing on Intel Xeon
2 years faster
[Chart: Intel® Xeon processor growth from big data use, in units, 2013 to 2017]
20. With broad support from the ecosystem
21. Enabling partner innovation in next-gen analytics
Richard Pledereder, Senior Vice President
SAP® HANA* Engineering
Steve Garrou, Vice President
Global Solutions
Ranga Rangachari, Vice President and GM
Storage Business
Paul Perez, Vice President and GM
Data Center Group
22. Summary
• Intel announced Intel® Distribution for Apache Hadoop* software
• Delivers hardware-enhanced capabilities and software enhancements
• Backed by broad portfolio of Intel data center products
• Contributes to open source and supports Apache Hadoop
• Enabling ecosystem of partners to innovate on analytics solutions
25. Apache Hadoop Performance Test Configuration
4 hours to 7 minutes
Cluster Configuration
• 1 Head Node (name node, job tracker)
• 10 Workers (data nodes, task trackers)
• 10-Gigabit Switch: Cisco Nexus 5020
Software Configuration
• Intel Distribution for Apache Hadoop 2.1.1
• Apache Hadoop 1.0.3
• RHEL 6.3
• Oracle Java 1.7.0_05
Head Node Hardware
• 1 x Dell r710 1U server
  • Intel: 2 x 3.47GHz Intel® Xeon® processor X5690
  • Memory: 48G RAM
  • Storage: 10K SAS HDD
  • Intel® Ethernet 10 Gigabit SFP+
  • Intel® Ethernet 1 Gigabit
Worker Node Hardware
• 10 x Dell r720 2U servers
  • Intel: 2 x 2.90GHz Intel® Xeon® processor E5-2690
  • Memory: 128G RAM
  • Storage: 520 Series SSDs
  • Intel® Ethernet 10 Gigabit SFP+
  • Intel® Ethernet 1 Gigabit
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
For more information go to http://www.intel.com/performance