SlideShare une entreprise Scribd logo
1  sur  16
Sergey Maidanov
Software Engineering Manager,
Scripting Analyzers & Tools, Intel
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
2
Programming Languages by Popularity
Python remains #1
programming language in
hiring demand
followed by Java and C++
Go and Scala
demonstrate strong
growth for last 2 years
STAC Summit, November 2015
* Source: CodeEval, Feb 2015
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
3
Programming Languages Productivity
STAC Summit, November 2015
0
1
2
3
4
5
Python Java C++ C
PROGRAMMING
COMPLEXITY (HOURS)
Prechelt* Berkholz**
0
1
2
3
4
Python Java C++ C
LANGUAGE VERBOSITY
(LOC/FEATURE)
Prechelt* Berkholz**
* L.Prechelt, An empirical comparison of seven programming languages, IEEE Computer,
2000, Vol. 33, Issue 10, pp. 23-29
** RedMonk - D.Berkholz, Programming languages ranked by expressiveness
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Numerical Modeling: From Prototype To Production
4
1. Pre-
Processing
2. Analysis
3. Modeling
4. Model
Validation
5.
Visualization
1. Pre-
processing
2. Model
Calibration
3. Decision
Making
4. Model
Validation
5. Reporting
Prototyping Production
Python*,
R*, Matlab*,
Excel*
Fortran*,
C++.
Java*/Scala*
HPC/Big Data ClusterWorkstation
STAC Summit, November 2015
MPI*,
OpenMP*,
Hadoop*,
Spark*, …
1x
3-10x
and more
Development cost
Development cost
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
5
Why’s Interpreted Code Unfriendly To Modern HW?
STAC Summit, November 2015
Moore’s law still works and will work for at least next 10 years
We have hit limits in
 Power
 Instruction level parallelism
 Clock speed
But not in
 Transistors (more memory, bigger caches, wider SIMD, specialized HW)
 Number of cores
Flop/Byte continues growing
 10x worse in last 20 years
Source: J.Cownie, HPC Trends and What They Mean for
Me, Imperial College, Oct. 2015
Efficient software development means
 Optimizations for data locality & contiguity
 Vectorization
 Threading
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
6
Performance: Native vs. Interpreted Code
0.2
1
5
25
125
625
3125
15625
Python C C (Parallelism)
BLACK SCHOLES FORMULA
MOPTIONS/SEC
STAC Summit, November 2015
Chapter 19. Performance
Optimization of Black
Scholes Pricing
Configuration info: - Versions: Intel® Distribution for Python 2.7.10 Technical Preview 1 (Aug 03, 2015), icc 15.0; Hardware: Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores each, HT=OFF),
64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Operating System: Ubuntu 14.04 LTS.
55x
349x
Vectorization,
threading, and data
locality optimizations
Static compilation
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
7
Possible Improvement Approaches
How to make Python more usable in both prototyping &
production?
 Numerical/machine learning packages (NumPy/SciPy/Scikit-learn)
accelerated with native libraries (e.g. Intel® MKL)
 Python language extensions that exploit vectorization and multi-
core parallelism, e.g. Cython (via GCC/ICC), Numba (LLVM)
 Better performance profiling of Python codes
 Packages and extensions for multi-node parallelism, e.g. mpi4py
 Integration with Big Data/ML infrastructures (Hadoop, Spark)
STAC Summit, November 2015
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Configuration info: - Versions: Intel® Distribution for Python 2.7.10 Technical Preview 1 (Aug 03, 2015), Ubuntu* built Python*: Python 2.7.10, NumPy 1.9.2 built with gcc 4.8.4; Hardware: Intel® Xeon® CPU
E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores each, HT=OFF), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Operating System: Ubuntu 14.04 LTS.
1 1 1 1 12X 4X 4X 4X 4X
22X
61X
70X
77X
97X
SVD Cholesky QR Inv DGEMM
Speedup
Ubuntu
Intel, 1 thread
Intel, 32 threads
Python* Performance Boost on Select Numerical Functions
Intel® Distribution for Python (Technical Preview) vs. Ubuntu Python
N=10000 N=40000 N=30000 N=25000 N=40000
Python Numerical Packages Acceleration
STAC Summit, November 2015 8
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
9
Python Language Extensions For Performance
STAC Summit, November 2015
0.2
1
5
25
125
625
3125
15625
Python NumExpr Numba Cython C (Parallelism)
BLACK SCHOLES FORMULA
MOPTIONS/SEC
36x
88x
6x
<35%
Configuration info: - Versions: Intel® Distribution for Python 2.7.10 Technical Preview 1 (Aug 03, 2015), Ubuntu* built Python*: Python 2.7.10, NumPy 1.9.2 built with gcc 4.8.4, llvm 3.6.0, Numba 0.22.0, icc
15.0; Hardware: Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores each, HT=OFF), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Operating System: Ubuntu 14.04 LTS.
1x
1.2x
1.5x
10x6x
Development cost 1x
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Scientific/Engineering
Web/Social
Business
10
Data Analytics Usage Models
1. Pre-
Processing
2. Analysis
3. Modeling
4. Model
Validation
5.
Visualization
1. Pre-
processing
2. Model
Calibration
3. Decision
Making
4. Model
Validation
5. Reporting
Interactive Modeling
Visualization & Reporting Production Use
R*, Python*,
other 3rd party
tools
C++, Java*,
Scala*, Python*
Software
DAAL
Software
DAAL
Visual
IDE
Collaboration
Services
Data
Store
Infrastructure
(Comms, Security)
Management
(Monitoring, Rule
Management)
Client Edge
Data Source Edge
C++, JS, C#
Data Center
Workstation
SQL, noSQL,
Files,
Sensor Data
C++, JS, C#
STAC Summit, November 2015
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Intel Confidential – NDA presentation; presentation contains Intel
confidential information presented to customers with a non-disclosure
agreement
11
Big Data Requires End-To-End Solutions
What device do I run analytics?
 Perform analysis close to data source to optimize response latency, decrease network
bandwidth utilization, and maximize security.
 Offload data to cluster for large-scale analytics only.
 Make personalized decisions on a client device
Pre-processing Transformation Analysis Modeling Decision Making
Decompression,
Filtering, Normalization
Aggregation,
Dimension Reduction
Summary Statistics
Clustering, etc.
Machine Learning (Training)
Parameter Estimation
Simulation
Forecasting
Decision Trees, etc.
Scientific/Engineering
Web/Social
Business
Compute (Server, Desktop, etc ) Client EdgeData Source Edge
Validation
Hypothesis testing
Model errors
STAC Summit, November 2015
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Kernel D
Dense
Big Data Flow and Computational Flow
12
…
Observations, n
Time
Memory
Capacity
Categorical
Blank/Missing
Numeric
Attributes,p
Big Data Attributes Computational Solution
Distributed across different nodes/devices •Distributed computing, e.g. comm-avoiding algorithms
Huge data size not fitting into device
memory
•Distributed computing
•Streaming algorithms
Data coming in time •Data buffering
•Streaming algorithms
Non-homogeneous data •CategoricalNumeric (counters, histograms, etc)
•Homogeneous numeric data kernels
• Conversions, Indexing, Repacking
Sparse/Missing/Noisy data •Sparse data algorithms
•Recovery methods (bootstrapping, outlier correction)
Distributed Computing Streaming Computing
D1D2D3
Si+1 = T(Si,Di)
Ri+1 = F(Si+1)
R1
Rk
D1
D2
Dk
R2 R
Si,Ri
Offline Computing
D1Dk-
1
Dk
R
…
Append
R = F(D1,…,Dk)
Converts, Indexing,
Repacking
Data Recovery
F
D
F
I
C
C
B
F
D
F
I
C
C
B
F
D
F
I
C
C
B
F
D
F
I
C
C
B
1
2
7
4
5
6
3
C C C C8
D
D
D
D
D
D
D
D
1
2
D D D D4
D D D D7
D
D
D
D
D
D
D
D
Kernel C
Indexed
D
D
D
D
D
D
D
D
Counter
Histogram
I
I I I I
F
D
F
I
C
C
B
F
D
F
I
C
C
B
F
D
F
I
C
C
B
F
D
F
I
C
C
B
1
2
7
4
5
6
3
C C C C8
Outlier
F
D
F
I
C
C
B
F
D
F
I
C
C
B
F
D
F
I
C
C
B
F
D
F
I
C
C
B
C C C C
Recover
STAC Summit, November 2015
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
13
Intel® DAAL – Essentials for End-To-End Analytics
• C++ and Java*/Scala* library for data analytics
• Targeting Python and R interfaces in future releases
• “MKL” for machine learning with a few key
differences
• Optimizes entire data flow vs. compute part only
• Targets both data center and edge devices
• Supports offline, online and distributed data
processing
• Abstracted from cross-device communication layer
• Allows plugging in different Big Data & IoT analytics
frameworks
• Comes with samples for Hadoop*, Spark*, MPI*
• Builds upon MKL/IPP for best performance
4X
6X 6X
7X 7X
0
2
4
6
8
1M x 200 1M x 400 1M x 600 1M x 800 1M x 1000
Speedup
Table size
PCA (correlation method) on an 8-node Hadoop* cluster
based on Intel® Xeon® Processors E5-2697 v3
Configuration Info - Versions: Intel® Data Analytics Acceleration Library 2016,
CDH v5.3.1, Apache Spark* v1.2.0, Weka 3.6.12; Hardware: Intel® Xeon®
Processor E5-2699 v3, 2 Eighteen-core CPUs (45MB LLC, 2.3GHz), 128GB of
RAM per node; Operating System: CentOS 6.6 x86_64.
2.75X
12X
23X
38X
0
10
20
30
40
50
50000
transactions
250000
transactions
500000
transactions
1000000
transactions
Speedup
Apriori on Intel® Xeon® Processor E5-2699
v3
STAC Summit, November 2015
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice Intel Confidential – NDA presentation; presentation
contains Intel confidential information presented to
customers with a non-disclosure agreement
14
Summary
• Python is among top productivity languages
• Python is unfriendly to modern hardware, and hence to production use
• New tools and libraries allow making tradeoff between productivity and performance
• Intel is investing in Python to be more usable in prototyping and production
• Intel® Distribution for Python - in Tech Preview Now!
• Intel® VTune Amplifier for Python – available for evaluation Now!
• Big Data analytics requires end-to-end solutions
• Intel has “end-to-end” response for new analytics challenges
• Intel® Data Analytics Acceleration Library 2016 product available Now!
Copyright © 2015, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO
LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS
INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE,
MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information
and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when
combined with other products.
Copyright © 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are
trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are
reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
15STAC Summit, November 2015 15
Denis Nagorny - Pumping Python Performance

Contenu connexe

Tendances

Me3D: A Model-driven Methodology Expediting Embedded Device Driver Development
Me3D: A Model-driven Methodology  Expediting Embedded Device  Driver DevelopmentMe3D: A Model-driven Methodology  Expediting Embedded Device  Driver Development
Me3D: A Model-driven Methodology Expediting Embedded Device Driver Development
huichenphd
 
FPGA Camp - Intellitech Presentation
FPGA Camp - Intellitech PresentationFPGA Camp - Intellitech Presentation
FPGA Camp - Intellitech Presentation
FPGA Central
 
Chip ex 2011 faraday
Chip ex 2011 faradayChip ex 2011 faraday
Chip ex 2011 faraday
chiportal
 
Memory Optimization
Memory OptimizationMemory Optimization
Memory Optimization
guest3eed30
 

Tendances (19)

Ray Tracing with Intel® Embree and Intel® OSPRay: Use Cases and Updates | SIG...
Ray Tracing with Intel® Embree and Intel® OSPRay: Use Cases and Updates | SIG...Ray Tracing with Intel® Embree and Intel® OSPRay: Use Cases and Updates | SIG...
Ray Tracing with Intel® Embree and Intel® OSPRay: Use Cases and Updates | SIG...
 
Introduction to the Graphics Pipeline of the PS3
Introduction to the Graphics Pipeline of the PS3Introduction to the Graphics Pipeline of the PS3
Introduction to the Graphics Pipeline of the PS3
 
Developing new zynq based instruments
Developing new zynq based instrumentsDeveloping new zynq based instruments
Developing new zynq based instruments
 
5 pipeline arch_rationale
5 pipeline arch_rationale5 pipeline arch_rationale
5 pipeline arch_rationale
 
Me3D: A Model-driven Methodology Expediting Embedded Device Driver Development
Me3D: A Model-driven Methodology  Expediting Embedded Device  Driver DevelopmentMe3D: A Model-driven Methodology  Expediting Embedded Device  Driver Development
Me3D: A Model-driven Methodology Expediting Embedded Device Driver Development
 
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profileLinux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
 
FPGA Camp - Intellitech Presentation
FPGA Camp - Intellitech PresentationFPGA Camp - Intellitech Presentation
FPGA Camp - Intellitech Presentation
 
Revers engineering
Revers engineeringRevers engineering
Revers engineering
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU Delegates
 
SequenceL intro slideshare
SequenceL intro slideshareSequenceL intro slideshare
SequenceL intro slideshare
 
Overview of Intel® Omni-Path Architecture
Overview of Intel® Omni-Path ArchitectureOverview of Intel® Omni-Path Architecture
Overview of Intel® Omni-Path Architecture
 
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...
Android on Windows 11 - A Developer's Perspective (Windows Subsystem For Andr...
 
Automated hardware testing using python
Automated hardware testing using pythonAutomated hardware testing using python
Automated hardware testing using python
 
Best practices for long-term support and security of the device-tree
Best practices for long-term support and security of the device-treeBest practices for long-term support and security of the device-tree
Best practices for long-term support and security of the device-tree
 
Agilent flash programming agilent utility card versus deep serial memory-ca...
Agilent flash programming   agilent utility card versus deep serial memory-ca...Agilent flash programming   agilent utility card versus deep serial memory-ca...
Agilent flash programming agilent utility card versus deep serial memory-ca...
 
Chip ex 2011 faraday
Chip ex 2011 faradayChip ex 2011 faraday
Chip ex 2011 faraday
 
EclipseCon 2011: Deciphering the CDT debugger alphabet soup
EclipseCon 2011: Deciphering the CDT debugger alphabet soupEclipseCon 2011: Deciphering the CDT debugger alphabet soup
EclipseCon 2011: Deciphering the CDT debugger alphabet soup
 
Memory Optimization
Memory OptimizationMemory Optimization
Memory Optimization
 
Nobuya Okada presentation
Nobuya Okada presentationNobuya Okada presentation
Nobuya Okada presentation
 

En vedette

En vedette (20)

олімпіади
олімпіади олімпіади
олімпіади
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
The High Performance Python Landscape by Ian Ozsvald
The High Performance Python Landscape by Ian OzsvaldThe High Performance Python Landscape by Ian Ozsvald
The High Performance Python Landscape by Ian Ozsvald
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
 
Spark + Scikit Learn- Performance Tuning
Spark + Scikit Learn- Performance TuningSpark + Scikit Learn- Performance Tuning
Spark + Scikit Learn- Performance Tuning
 
Python profiling
Python profilingPython profiling
Python profiling
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in Spark
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
 
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production ScaleGPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
GPU Support in Spark and GPU/CPU Mixed Resource Scheduling at Production Scale
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
 
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
 
Getting The Best Performance With PySpark
Getting The Best Performance With PySparkGetting The Best Performance With PySpark
Getting The Best Performance With PySpark
 
Boosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of TechniquesBoosting spark performance: An Overview of Techniques
Boosting spark performance: An Overview of Techniques
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
 

Similaire à Denis Nagorny - Pumping Python Performance

“Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open ...
“Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open ...“Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open ...
“Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open ...
Edge AI and Vision Alliance
 

Similaire à Denis Nagorny - Pumping Python Performance (20)

Software Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT PlatformsSoftware Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT Platforms
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and future
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
 
Intels presentation at blue line industrial computer seminar
Intels presentation at blue line industrial computer seminarIntels presentation at blue line industrial computer seminar
Intels presentation at blue line industrial computer seminar
 
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
 
“Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open ...
“Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open ...“Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open ...
“Intel Video AI Box—Converging AI, Media and Computing in a Compact and Open ...
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
Re-architecting the Datacenter to Deliver Better Experiences (Intel)
Re-architecting the Datacenter to Deliver Better Experiences (Intel)Re-architecting the Datacenter to Deliver Better Experiences (Intel)
Re-architecting the Datacenter to Deliver Better Experiences (Intel)
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance Computing
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
 
Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph
 
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephCeph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and Ceph
 
Go native benchmark test su dispositivi x86: java, ndk, ipp e tbb
Go native  benchmark test su dispositivi x86: java, ndk, ipp e tbbGo native  benchmark test su dispositivi x86: java, ndk, ipp e tbb
Go native benchmark test su dispositivi x86: java, ndk, ipp e tbb
 
No[1][1]
No[1][1]No[1][1]
No[1][1]
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdown
 

Dernier

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 

Dernier (20)

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 

Denis Nagorny - Pumping Python Performance

  • 1. Sergey Maidanov Software Engineering Manager, Scripting Analyzers & Tools, Intel
  • 2. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 2 Programming Languages by Popularity Python remains #1 programming language in hiring demand followed by Java and C++ Go and Scala demonstrate strong growth for last 2 years STAC Summit, November 2015 * Source: CodeEval, Feb 2015
  • 3. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 3 Programming Languages Productivity STAC Summit, November 2015 0 1 2 3 4 5 Python Java C++ C PROGRAMMING COMPLEXITY (HOURS) Prechelt* Berkholz** 0 1 2 3 4 Python Java C++ C LANGUAGE VERBOSITY (LOC/FEATURE) Prechelt* Berkholz** * L.Prechelt, An empirical comparison of seven programming languages, IEEE Computer, 2000, Vol. 33, Issue 10, pp. 23-29 ** RedMonk - D.Berkholz, Programming languages ranked by expressiveness
  • 4. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Numerical Modeling: From Prototype To Production 4 1. Pre- Processing 2. Analysis 3. Modeling 4. Model Validation 5. Visualization 1. Pre- processing 2. Model Calibration 3. Decision Making 4. Model Validation 5. Reporting Prototyping Production Python*, R*, Matlab*, Excel* Fortran*, C++. Java*/Scala* HPC/Big Data ClusterWorkstation STAC Summit, November 2015 MPI*, OpenMP*, Hadoop*, Spark*, … 1x 3-10x and more Development cost Development cost
  • 5. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 5 Why’s Interpreted Code Unfriendly To Modern HW? STAC Summit, November 2015 Moore’s law still works and will work for at least next 10 years We have hit limits in  Power  Instruction level parallelism  Clock speed But not in  Transistors (more memory, bigger caches, wider SIMD, specialized HW)  Number of cores Flop/Byte continues growing  10x worse in last 20 years Source: J.Cownie, HPC Trends and What They Mean for Me, Imperial College, Oct. 2015 Efficient software development means  Optimizations for data locality & contiguity  Vectorization  Threading
  • 6. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 6 Performance: Native vs. Interpreted Code 0.2 1 5 25 125 625 3125 15625 Python C C (Parallelism) BLACK SCHOLES FORMULA MOPTIONS/SEC STAC Summit, November 2015 Chapter 19. Performance Optimization of Black Scholes Pricing Configuration info: - Versions: Intel® Distribution for Python 2.7.10 Technical Preview 1 (Aug 03, 2015), icc 15.0; Hardware: Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores each, HT=OFF), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Operating System: Ubuntu 14.04 LTS. 55x 349x Vectorization, threading, and data locality optimizations Static compilation
  • 7. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 7 Possible Improvement Approaches How to make Python more usable in both prototyping & production?  Numerical/machine learning packages (NumPy/SciPy/Scikit-learn) accelerated with native libraries (e.g. Intel® MKL)  Python language extensions that exploit vectorization and multi- core parallelism, e.g. Cython (via GCC/ICC), Numba (LLVM)  Better performance profiling of Python codes  Packages and extensions for multi-node parallelism, e.g. mpi4py  Integration with Big Data/ML infrastructures (Hadoop, Spark) STAC Summit, November 2015
  • 8. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Configuration info: - Versions: Intel® Distribution for Python 2.7.10 Technical Preview 1 (Aug 03, 2015), Ubuntu* built Python*: Python 2.7.10, NumPy 1.9.2 built with gcc 4.8.4; Hardware: Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores each, HT=OFF), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Operating System: Ubuntu 14.04 LTS. 1 1 1 1 12X 4X 4X 4X 4X 22X 61X 70X 77X 97X SVD Cholesky QR Inv DGEMM Speedup Ubuntu Intel, 1 thread Intel, 32 threads Python* Performance Boost on Select Numerical Functions Intel® Distribution for Python (Technical Preview) vs. Ubuntu Python N=10000 N=40000 N=30000 N=25000 N=40000 Python Numerical Packages Acceleration STAC Summit, November 2015 8
  • 9. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 9 Python Language Extensions For Performance STAC Summit, November 2015 0.2 1 5 25 125 625 3125 15625 Python NumExpr Numba Cython C (Parallelism) BLACK SCHOLES FORMULA MOPTIONS/SEC 36x 88x 6x <35% Configuration info: - Versions: Intel® Distribution for Python 2.7.10 Technical Preview 1 (Aug 03, 2015), Ubuntu* built Python*: Python 2.7.10, NumPy 1.9.2 built with gcc 4.8.4, llvm 3.6.0, Numba 0.22.0, icc 15.0; Hardware: Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores each, HT=OFF), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Operating System: Ubuntu 14.04 LTS. 1x 1.2x 1.5x 10x6x Development cost 1x
  • 10. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Scientific/Engineering Web/Social Business 10 Data Analytics Usage Models 1. Pre- Processing 2. Analysis 3. Modeling 4. Model Validation 5. Visualization 1. Pre- processing 2. Model Calibration 3. Decision Making 4. Model Validation 5. Reporting Interactive Modeling Visualization & Reporting Production Use R*, Python*, other 3rd party tools C++, Java*, Scala*, Python* Software DAAL Software DAAL Visual IDE Collaboration Services Data Store Infrastructure (Comms, Security) Management (Monitoring, Rule Management) Client Edge Data Source Edge C++, JS, C# Data Center Workstation SQL, noSQL, Files, Sensor Data C++, JS, C# STAC Summit, November 2015
  • 11. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Intel Confidential – NDA presentation; presentation contains Intel confidential information presented to customers with a non-disclosure agreement 11 Big Data Requires End-To-End Solutions What device do I run analytics?  Perform analysis close to data source to optimize response latency, decrease network bandwidth utilization, and maximize security.  Offload data to cluster for large-scale analytics only.  Make personalized decisions on a client device Pre-processing Transformation Analysis Modeling Decision Making Decompression, Filtering, Normalization Aggregation, Dimension Reduction Summary Statistics Clustering, etc. Machine Learning (Training) Parameter Estimation Simulation Forecasting Decision Trees, etc. Scientific/Engineering Web/Social Business Compute (Server, Desktop, etc ) Client EdgeData Source Edge Validation Hypothesis testing Model errors STAC Summit, November 2015
  • 12. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Kernel D Dense Big Data Flow and Computational Flow 12 … Observations, n Time Memory Capacity Categorical Blank/Missing Numeric Attributes,p Big Data Attributes Computational Solution Distributed across different nodes/devices •Distributed computing, e.g. comm-avoiding algorithms Huge data size not fitting into device memory •Distributed computing •Streaming algorithms Data coming in time •Data buffering •Streaming algorithms Non-homogeneous data •CategoricalNumeric (counters, histograms, etc) •Homogeneous numeric data kernels • Conversions, Indexing, Repacking Sparse/Missing/Noisy data •Sparse data algorithms •Recovery methods (bootstrapping, outlier correction) Distributed Computing Streaming Computing D1D2D3 Si+1 = T(Si,Di) Ri+1 = F(Si+1) R1 Rk D1 D2 Dk R2 R Si,Ri Offline Computing D1Dk- 1 Dk R … Append R = F(D1,…,Dk) Converts, Indexing, Repacking Data Recovery F D F I C C B F D F I C C B F D F I C C B F D F I C C B 1 2 7 4 5 6 3 C C C C8 D D D D D D D D 1 2 D D D D4 D D D D7 D D D D D D D D Kernel C Indexed D D D D D D D D Counter Histogram I I I I I F D F I C C B F D F I C C B F D F I C C B F D F I C C B 1 2 7 4 5 6 3 C C C C8 Outlier F D F I C C B F D F I C C B F D F I C C B F D F I C C B C C C C Recover STAC Summit, November 2015
  • 13. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice 13 Intel® DAAL – Essentials for End-To-End Analytics • C++ and Java*/Scala* library for data analytics • Targeting Python and R interfaces in future releases • “MKL” for machine learning with a few key differences • Optimizes entire data flow vs. compute part only • Targets both data center and edge devices • Supports offline, online and distributed data processing • Abstracted from cross-device communication layer • Allows plugging in different Big Data & IoT analytics frameworks • Comes with samples for Hadoop*, Spark*, MPI* • Builds upon MKL/IPP for best performance 4X 6X 6X 7X 7X 0 2 4 6 8 1M x 200 1M x 400 1M x 600 1M x 800 1M x 1000 Speedup Table size PCA (correlation method) on an 8-node Hadoop* cluster based on Intel® Xeon® Processors E5-2697 v3 Configuration Info - Versions: Intel® Data Analytics Acceleration Library 2016, CDH v5.3.1, Apache Spark* v1.2.0, Weka 3.6.12; Hardware: Intel® Xeon® Processor E5-2699 v3, 2 Eighteen-core CPUs (45MB LLC, 2.3GHz), 128GB of RAM per node; Operating System: CentOS 6.6 x86_64. 2.75X 12X 23X 38X 0 10 20 30 40 50 50000 transactions 250000 transactions 500000 transactions 1000000 transactions Speedup Apriori on Intel® Xeon® Processor E5-2699 v3 STAC Summit, November 2015
  • 14. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Intel Confidential – NDA presentation; presentation contains Intel confidential information presented to customers with a non-disclosure agreement 14 Summary • Python is among top productivity languages • Python is unfriendly to modern hardware, and hence to production use • New tools and libraries allow making tradeoff between productivity and performance • Intel is investing in Python to be more usable in prototyping and production • Intel® Distribution for Python - in Tech Preview Now! • Intel® VTune Amplifier for Python – available for evaluation Now! • Big Data analytics requires end-to-end solutions • Intel has “end-to-end” response for new analytics challenges • Intel® Data Analytics Acceleration Library 2016 product available Now!
  • 15. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2015, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 15STAC Summit, November 2015 15

Notes de l'éditeur

  1. Data from Stanford CPUDB