Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Sheng Fu
(sheng.fu@intel.com)
August 12, 2015
Improving the performance of
OpenSubdiv* on Intel Architecture
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS
DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO
SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER
INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION
CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS
COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH
MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves
these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this
information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products.
Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar
performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other
platforms, and assigning them a relative performance number that correlates with the performance improvements reported.
SPEC, SPECint, SPECfp, SPECrate. SPECpower, SPECjAppServer, SPECjbb, SPECjvm, SPECWeb, SPECompM, SPECompL, SPEC MPI, SPECjEnterprise* are trademarks of the Standard Performance Evaluation Corporation. See
http://www.spec.org for more information. TPC-C, TPC-H, TPC-E are trademarks of the Transaction Processing Council. See http://www.tpc.org for more information.
Hyper-Threading Technology requires a computer system with a processor supporting HT Technology and an HT Technology-enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and
software you use. For more information including details on which processors support HT Technology, see here
Intel® Turbo Boost Technology requires a Platform with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration.
Check with your platform manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see http://www.intel.com/technology/turboboost
No computer system can provide absolute security. Requires an enabled Intel® processor and software optimized for use of the technology. Consult your system manufacturer and/or software vendor for more information.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to:
Learn About Intel® Processor Numbers
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
Copyright © 2014 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon and Intel Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. All dates and
products specified are for planning purposes only and are subject to change without notice
*Other names and brands may be claimed as the property of others.
Legal Disclaimers
The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are forward-looking statements that
involve a number of risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,” “believes,” “seeks,” “estimates,” “may,” “will,” “should” and their variations
identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many
factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those
expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could cause actual results to differ materially from the
company’s expectations. Demand could be different from Intel's expectations due to factors including changes in business and economic conditions; customer acceptance of
Intel’s and competitors’ products; supply constraints and other disruptions affecting customers; changes in customer order patterns including order cancellations; and changes
in the level of inventory at customers. Uncertainty in global economic and financial conditions poses a risk that consumers and businesses may defer purchases in response to
negative financial events, which could negatively affect product demand and other related matters. Intel operates in intensely competitive industries that are characterized by
a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. Revenue and the gross
margin percentage are affected by the timing of Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors,
including product offerings and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to
technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from expectations based on capacity
utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the
timing and execution of the manufacturing ramp and associated costs; start-up costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of
materials or resources; product manufacturing quality/yields; and impairments of long-lived assets, including manufacturing, assembly/test and intangible assets. Intel's
results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including
military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Expenses, particularly certain
marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's products and the level of
revenue and profits. Intel’s results could be affected by the timing of closing of acquisitions and divestitures. Intel's results could be affected by adverse effects associated with
product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust,
disclosure and other issues, such as the litigation and regulatory matters described in Intel's SEC reports. An unfavorable ruling could include monetary damages or an
injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or
requiring other remedies such as compulsory licensing of intellectual property. A detailed discussion of these and other factors that could affect Intel’s results is included in
Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release.
Risk Factors
Agenda
• Introduction to OpenSubdiv*
• Optimizing subdivision kernel with ICC
• Optimizing patch evaluation with ISPC
• Embree Viewer: a demo that interactively renders animated subdivision surfaces on Intel architecture
What is a subdivision surface?
• Start from a polygon control mesh
• Apply subdivision rules recursively to get the limit surface
Why have subdivision surfaces been extensively used in the DCC industry?
• Support arbitrary topology
• Smooth
• Deform efficiently for animation
What is OpenSubdiv*?
• Open source libraries that implement high-performance subdivision surface evaluation on CPU and GPU
• Optimized for drawing deforming surfaces with static topology at interactive frame rates
• Match the RenderMan* specification
Pipeline to render subdivision surfaces
1. Control mesh: feature-adaptive subdivision produces patches
2. Patches: patch evaluation tessellates the patches into triangles
3. Triangles: render the triangle meshes
Optimizing a subdivision kernel
• How a subdivision kernel works: compute the vertex data for a vertex in the new level by summing the weighted vertex data of the surrounding vertices in the current level
• Example with four surrounding vertices (v1, w1), (v2, w2), (v3, w3), (v4, w4):
  vnew = v1*w1 + v2*w2 + v3*w3 + v4*w4

for (int i = start; i < end; ++i) {
    // Clear the accumulator for destination vertex i
    for (int k = 0; k < numElems; ++k)
        result[k] = 0.0f;
    // Sum the weighted data of every source vertex in this stencil
    for (int j = 0; j < sizes[i]; ++j, ++indices, ++weights) {
        src = vertexSrc + (*indices) * numElems;
        weight = *weights;
        for (int k = 0; k < numElems; ++k) {
            result[k] += src[k] * weight;
        }
    }
    // Write the accumulated result to the destination vertex
    dst = vertexDst + i * numElems;
    memcpy(dst, result, numElems * sizeof(float));
}
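The weighted sum above can be tried in isolation with a small self-contained C++ sketch. The StencilTable struct and the applyStencils function are hypothetical names introduced for this illustration and are not taken from OpenSubdiv; the sketch accumulates directly into the destination buffer instead of using a separate result array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel inputs: sizes[i] source vertices
// contribute to destination vertex i, and their indices and weights are
// stored back to back in the two flat arrays.
struct StencilTable {
    std::vector<int>   sizes;
    std::vector<int>   indices;
    std::vector<float> weights;
};

// Computes dst[i] = sum over j of weight_j * src[index_j] for every stencil,
// with numElems floats per vertex.
inline std::vector<float> applyStencils(const StencilTable& table,
                                        const std::vector<float>& vertexSrc,
                                        int numElems) {
    assert(numElems > 0);
    std::vector<float> vertexDst(table.sizes.size() * numElems, 0.0f);
    std::size_t cursor = 0;  // current position in indices/weights
    for (std::size_t i = 0; i < table.sizes.size(); ++i) {
        float* dst = &vertexDst[i * numElems];
        for (int j = 0; j < table.sizes[i]; ++j, ++cursor) {
            const float* src = &vertexSrc[table.indices[cursor] * numElems];
            const float weight = table.weights[cursor];
            for (int k = 0; k < numElems; ++k)
                dst[k] += src[k] * weight;
        }
    }
    return vertexDst;
}
```

With four corner vertices of a square and weights of 0.25 each, the single output vertex is the face centre.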
Vectorizing a subdivision kernel with ICC auto-vectorization
• What is vectorization?
  • Converting scalar code to SIMD code
• What is ICC auto-vectorization?
  • ICC automatically identifies vectorizable loops and generates packed SIMD instructions for them
  • Only the innermost loop can be auto-vectorized
  • Use pragmas to help ICC vectorize the loop:
    #pragma ivdep, #pragma simd, #pragma vector aligned …
Optimizing a subdivision kernel with ICC
for (int i = start; i < end; ++i) {
    for (int k = 0; k < numElems; ++k)
        result[k] = 0.0f;
    for (int j = 0; j < sizes[i]; ++j, ++indices, ++weights) {
        src = vertexSrc + (*indices) * numElems;
        weight = *weights;
        // This loop got vectorized
        #pragma simd
        #pragma vector aligned
        for (int k = 0; k < numElems; ++k) {
            // Accumulated in a local variable to avoid an extra memory copy
            result[k] += src[k] * weight;
        }
    }
    dst = vertexDst + i * numElems;
    memcpy(dst, result, numElems * sizeof(float));
}
Optimizing a subdivision kernel with ICC
• Align vertex data to get better performance
  • Vertex data must be aligned on a 4-float or 8-float boundary
• The subdivision kernel uses a template to remove the cost of the loop
  • When numElems is a constant of 4 or 8, the inner loop below can be converted to two SIMD instructions (multiply and add), or one FMA instruction

template <int numElems> void
ComputeStencilKernel(
    ……….
    #pragma simd
    #pragma vector aligned
    for (int k = 0; k < numElems; ++k) {
        result[k] += src[k] * weight;
    }
    ………..
}
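A minimal sketch of the template idea, in portable C++ without the ICC pragmas. The function names accumulate, accumulateDynamic and accumulateDispatch are assumptions made for this example and are not the names used in the OpenSubdiv fork. Whether the fixed-width loops become SIMD or FMA instructions depends on the compiler and target flags.

```cpp
#include <cassert>

// Inner accumulation loop with the element count fixed at compile time, so
// the compiler can fully unroll it and, if it chooses, emit SIMD code.
template <int NumElems>
inline void accumulate(float* result, const float* src, float weight) {
    for (int k = 0; k < NumElems; ++k)
        result[k] += src[k] * weight;
}

// Generic fallback for element counts that are only known at run time.
inline void accumulateDynamic(float* result, const float* src, float weight,
                              int numElems) {
    for (int k = 0; k < numElems; ++k)
        result[k] += src[k] * weight;
}

// Hypothetical dispatcher: picks a fixed-width instantiation for the
// padded widths 4 and 8 and uses the generic loop otherwise.
inline void accumulateDispatch(float* result, const float* src, float weight,
                               int numElems) {
    assert(numElems > 0);
    switch (numElems) {
        case 4:  accumulate<4>(result, src, weight); break;
        case 8:  accumulate<8>(result, src, weight); break;
        default: accumulateDynamic(result, src, weight, numElems); break;
    }
}
```

The dispatch happens once per call, so the per-element loop overhead disappears for the two common widths while other widths still work.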
Optimizing a subdivision kernel with ICC
Subdivision kernel time, collected from glViewer (CPU kernel, subd level = 2, tessellation level = 1)
Data collected on a two-socket, 20-core Ivy Bridge system

GCC        1.6 ms     0.5 ms     0.9 ms
ICC        0.46 ms    0.15 ms    0.24 ms
Speedup    3.5x       3.3x       3.8x
Parallelize a subdivision kernel with TBB
• TBB is an open-source, task-based parallel programming library
• The OpenSubdiv* TBB kernel uses TBB parallel_for to parallelize the subdivision kernel
• TBB parallel_for can also be used at the subd mesh level to achieve better load balancing

Pseudocode for nested parallel_for:
run tbb parallel_for over the array of sub-meshes
{
    run tbb parallel_for over the array of subdivision kernels
    {
    }
}
Parallelize a subdivision kernel with TBB
VTune Amplifier thread profiling (collected on a 20-core Ivy Bridge)
[Figure: CPU utilization for nested parallel_for, compared with CPU utilization for parallel_for only on the subdivision kernel]
Performance result:
• Total number of meshes: 222
• Minimum control face number: 28
• Maximum control face number: 60,038
• Wall clock time for "parallel for only on subdivision level": 5 ms
• Wall clock time for "nested parallel for": 2.6 ms (2x speedup)
Optimizing patch evaluation with ISPC
Subdivision surface render pipeline:
1. Control faces: feature-adaptive subdivision produces patches
2. Patches: patch evaluation tessellates the patches into triangles
3. Triangles: render the triangle meshes
Step1: bundle sample points for the same patch
Benefit of bundling sample points:
• Only need to gather vertex data for a patch once
• Get ready for evaluating the patch with SIMD
The data layout in the patch coordinate buffer for bundled sample points:

PatchCoord entries for one patch:
  Array Index 1 | Patch Index 1 | Vertex Index 1 | S1 | T1
  Array Index 1 | Patch Index 1 | Vertex Index 1 | S2 | T2
PatchCoord entry for another patch:
  Array Index n | Patch Index n | Vertex Index n | S  | T
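One way to produce this layout is to sort the sample points by patch handle and then scan for runs of equal handles. The sketch below is an illustration only: PatchHandle and PatchCoord are simplified stand-ins whose field names mirror the layout above, and bundleByPatch and bundleSizes are hypothetical helpers, not functions from OpenSubdiv.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Simplified stand-ins for the patch handle and patch coordinate; the
// field names are assumptions made for this sketch.
struct PatchHandle {
    int arrayIndex;
    int patchIndex;
    int vertIndex;
};

struct PatchCoord {
    PatchHandle handle;
    float s, t;
};

inline bool sameHandle(const PatchHandle& a, const PatchHandle& b) {
    return std::tie(a.arrayIndex, a.patchIndex, a.vertIndex) ==
           std::tie(b.arrayIndex, b.patchIndex, b.vertIndex);
}

// Sorts sample points so that all samples on the same patch are adjacent.
// stable_sort keeps the original order of the samples within each patch.
inline void bundleByPatch(std::vector<PatchCoord>& coords) {
    std::stable_sort(coords.begin(), coords.end(),
        [](const PatchCoord& a, const PatchCoord& b) {
            return std::tie(a.handle.arrayIndex, a.handle.patchIndex,
                            a.handle.vertIndex) <
                   std::tie(b.handle.arrayIndex, b.handle.patchIndex,
                            b.handle.vertIndex);
        });
}

// Returns the length of each run of consecutive samples sharing a handle.
inline std::vector<int> bundleSizes(const std::vector<PatchCoord>& coords) {
    std::vector<int> sizes;
    for (std::size_t i = 0; i < coords.size(); ) {
        std::size_t j = i + 1;
        while (j < coords.size() &&
               sameHandle(coords[i].handle, coords[j].handle))
            ++j;
        sizes.push_back(static_cast<int>(j - i));
        i = j;
    }
    return sizes;
}
```

Each run returned by bundleSizes corresponds to one gather of control vertices followed by one batched evaluation.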
Step2: Evaluate a patch with ISPC
What is the Intel SPMD (single program, multiple data) Program Compiler (ISPC)?
• An open-source language and compiler for Intel SIMD architectures
• ISPC is NOT an "autovectorizing" compiler!
  • It does not generate vector code by analyzing and transforming scalar loops, as ICC does.
• ISPC is more of a "WYSIWYG" vectorizing compiler
  • The programmer tells ISPC what is vector and what is scalar
  • Vector types are explicit, not discovered.
Step2: Evaluate a patch with ISPC
ISPC Language
• Familiar C-based syntax
• Code reads like a sequential algorithm but executes in parallel (SPMD)
• Easily mixes scalar and vector computation
• Two new type modifiers (uniform and varying) distinguish between scalar
and vector data types
• Easily interoperates with C/C++
• You can call C/C++ code from ISPC functions, or call ISPC code from C/C++ code
• Passing pointers between ISPC and C/C++ code just works
• Efficient data layout
ISPC Example – an ISPC function
// "export" makes the function visible from C.
// "uniform" marks the inputs as scalar types.
export void simple(uniform float vin[], uniform float vout[],
                   uniform int count) {
    // The "foreach" statement provides automatic multi-dimensional traversal
    // of the iteration space, optimal code generation for fully-vectorized
    // iterations, and automatic remainder loop generation.
    // The loop index is varying (the default), so "vector-width" iterations
    // are done at once (depending on the compile target), one loop iteration
    // per vector "lane", with masking.
    foreach (index = 0 ... count) {
        varying float v = vin[index];   // vector type
        if (v < 3.)
            v = v * v;
        else
            v = sqrt(v);
        vout[index] = v;
    }
}
ISPC Example – C code that calls it
#include <stdio.h>
#include "simple.h"
int main() {
    float vin[16], vout[16];
    for (int i = 0; i < 16; ++i)
        vin[i] = i;
    simple(vin, vout, 16);   // call the ISPC function
    for (int i = 0; i < 16; ++i)
        printf("%d: simple(%f) = %f\n", i, vin[i], vout[i]);
}

Output:
0: simple(0.000000) = 0.000000
1: simple(1.000000) = 1.000000
2: simple(2.000000) = 4.000000
3: simple(3.000000) = 1.732051
...
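For comparison, a scalar C++ version of the same computation is sketched below. The name simpleScalar is made up for this example. It processes one element per iteration, whereas the ISPC version processes one element per vector lane, and it can serve as a reference when checking the ISPC output.

```cpp
#include <cassert>
#include <cmath>

// Scalar reference for the ISPC function: square inputs below 3 and take
// the square root of everything else.
inline void simpleScalar(const float* vin, float* vout, int count) {
    assert(count >= 0);
    for (int index = 0; index < count; ++index) {
        float v = vin[index];
        if (v < 3.0f)
            v = v * v;
        else
            v = std::sqrt(v);
        vout[index] = v;
    }
}
```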
ISPC patch evaluation: gathering control points
• Gathering only needs to be done once for each patch, since sampling points are sorted by patch handle
• Data are uniform, no SIMD yet.

uniform Point controlVertices[16];
for (uniform int i = 0; i < 16; i++) {
    uniform unsigned int id = vertexIndices[i];
    uniform const float * uniform pVertex;
    pVertex = inQ + inDesc.offset + id * inDesc.stride;
    controlVertices[i].x = pVertex[0];
    controlVertices[i].y = pVertex[1];
    controlVertices[i].z = pVertex[2];
    pVertex += 3;
}
ISPC patch evaluation: vectorized patch evaluation
foreach (n = 0 ... nPoint) {
    float sWeights[4], tWeights[4];
    getBSplineWeights(s, sWeights);
    getBSplineWeights(t, tWeights);
    float weight[16];
    for (uniform int i = 0; i < 4; ++i)
        for (uniform int j = 0; j < 4; ++j) {
            weight[4*i+j] = sWeights[j] * tWeights[i];
        }
    float *pOutQ = outQ + outDesc.offset + n * outDesc.stride;
    for (uniform int c = 0; c < nChannel; c++) {
        uniform int offset = c * 16;
        Point Q;
        Q.x = Q.y = Q.z = 0.0;
        for (uniform int i = 0; i < 16; ++i) {
            Q = Q + weight[i] * controlVertices[offset + i];
        }
        *pOutQ++ = Q.x, *pOutQ++ = Q.y, *pOutQ++ = Q.z;
    }
}

inline void
getBSplineWeights(float t, float point[4]) {
    float const one6th = 1.0f / 6.0f;
    float t2 = t * t;
    float t3 = t * t2;
    point[0] = one6th * (1.0f - 3.0f*(t - t2) - t3);
    point[1] = one6th * (4.0f - 6.0f*t2 + 3.0f*t3);
    point[2] = one6th * (1.0f + 3.0f*(t + t2 - t3));
    point[3] = one6th * (t3);
}
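The arithmetic of a single sample can be checked with a scalar C++ sketch. Point3, bsplineWeights and evalPatch are names chosen for this illustration. The weight polynomials are copied from getBSplineWeights above, and the sketch assumes one channel with a 4x4 grid of control points stored row by row. The ISPC code performs the same computation for several samples at once, one per vector lane.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point3 { float x, y, z; };

// Uniform cubic B-spline basis functions at parameter t; same polynomials
// as getBSplineWeights. The four weights always sum to 1.
inline std::array<float, 4> bsplineWeights(float t) {
    const float one6th = 1.0f / 6.0f;
    const float t2 = t * t;
    const float t3 = t * t2;
    std::array<float, 4> w = {{
        one6th * (1.0f - 3.0f * (t - t2) - t3),
        one6th * (4.0f - 6.0f * t2 + 3.0f * t3),
        one6th * (1.0f + 3.0f * (t + t2 - t3)),
        one6th * t3 }};
    return w;
}

// Evaluates one sample (s, t) on a bicubic B-spline patch whose 16 control
// points are stored row by row (index 4*i + j).
inline Point3 evalPatch(const std::array<Point3, 16>& cv, float s, float t) {
    const std::array<float, 4> sw = bsplineWeights(s);
    const std::array<float, 4> tw = bsplineWeights(t);
    Point3 q = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            const float w = sw[j] * tw[i];
            const Point3& p = cv[4 * i + j];
            q.x += w * p.x;
            q.y += w * p.y;
            q.z += w * p.z;
        }
    }
    return q;
}
```

For a flat grid of control points at integer positions (j, i), the surface point at (s, t) is (1 + s, 1 + t), which makes a convenient sanity check.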
Parallelize ISPC patch evaluation with TBB
// Run tbb parallel_for on all sampling points
tbb::blocked_range<int> range = tbb::blocked_range<int>(0, numPatchCoords, grain_size);
tbb::parallel_for(range, [&](const tbb::blocked_range<int> &r)
{
    int i = r.begin();
    while (i < r.end()) {
        // Search for sampling points that belong to the same patch
        int nCoord = 1;
        Far::PatchTable::PatchHandle handle = patchCoords[i].handle;
        while (i + nCoord < r.end() && handle.isEqual(patchCoords[i + nCoord].handle))
            nCoord++;
        // Put UV into local arrays
        __declspec( align(64) ) float u[nCoord], v[nCoord];
        for (int n = 0; n < nCoord; n++) {
            u[n] = patchCoords[i + n].s;
            v[n] = patchCoords[i + n].t;
        }
        // Call the ISPC evaluation function
        ispc::evalPatch(nCoord, u, v, …);
        i += nCoord;
    }
});
ISPC patch evaluation performance data

Sampling points   ISPC, single thread (ms)   ISPC, 20 threads (ms)   TBB, single thread (ms)   TBB, 20 threads (ms)
65536             2.5 (3.6x)                 0.5 (1.25x)             7.1                       0.6
655360            12 (5.8x)                  3.0 (2.1x)              70                        6.3

Performance data collected in glLimitEval, subdivision level = 3, vertex animation turned off. CPU: two-socket, 20-core Ivy Bridge.
Demo: Embree Viewer
Subdivision surface render pipeline:
1. Control faces: feature-adaptive subdivision produces patches
2. Patches: patch evaluation tessellates the patches into triangles
3. Triangles: render the triangle meshes
Demo: Embree Viewer
• A demo similar to glViewer
• Complete CPU-based solution: subdivision, tessellation, and rendering are all on the CPU
• Ray tracing-based rendering with Embree, an open source ray tracing kernel library
• High-quality rendering, supports shadows
Demo: Embree Viewer
• Step1: feature-adaptive subdivision to generate patches with the TBB subdivision kernel
[Figure: patches 1 to 7, generated with subdivision level = 1]
Demo: Embree Viewer
• Step2: uniformly tessellate each patch into triangles, using ispcEvaluator to evaluate tessellation points on the limit surface
[Figure: patches tessellated with tessellation level = 1]
Demo: Embree Viewer
Step3: render mesh with Embree
• Create an Embree scene: rtcNewScene
• Create an Embree triangle mesh: rtcNewTriangleMesh
• Pass the vertex buffer and index buffer representing the triangle mesh to Embree: rtcSetBuffer
• Update the BVH when mesh positions are updated: rtcUpdate
• Build the Embree BVH: rtcCommit
Demo: Embree Viewer
Step3: render mesh with Embree
• Divide the screen into 8x8 tiles, and use TBB parallel_for to render each tile in parallel
• Fire 8 packed primary rays and test for intersections using SIMD: rtcIntersect8
• Fire a shadow ray for each ray that hits the mesh: rtcOccluded
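The tile decomposition can be sketched in a few lines of C++. This example assumes that "8x8" refers to the tile size in pixels, which the slide does not state explicitly; Tile and makeTiles are hypothetical names. The resulting list is what a parallel loop would iterate over, with one tile per task.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open pixel rectangle [x0, x1) x [y0, y1).
struct Tile { int x0, y0, x1, y1; };

// Splits a width x height image into tiles of at most tileSize x tileSize
// pixels. Tiles on the right and bottom edges are clipped to the image.
inline std::vector<Tile> makeTiles(int width, int height, int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += tileSize)
        for (int x = 0; x < width; x += tileSize)
            tiles.push_back({x, y,
                             std::min(x + tileSize, width),
                             std::min(y + tileSize, height)});
    return tiles;
}
```

Under this assumption, each row of a tile holds 8 pixels, which lines up with one packet of 8 primary rays.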
Demo: Embree Viewer
Model: toy car
Shadow: off
Patches: 11,331
Triangle: 201,600
Vertices: 171,283
Subd level: 2
Tess level: 1
FPS: 72
Resolution: 800x800
CPU: two socket 20 core IvyBridge
Demo: Embree Viewer
Model: toy car
Shadow: on
Patches: 11,331
Triangle: 201,600
Vertices: 171,283
Subd level: 2
Tess level: 1
FPS: 45
Resolution: 800x800
CPU: two socket 20 core IvyBridge
Demo: Embree Viewer
Model: toy car
Patches: 11,331
Triangle: 201,600
Vertices: 171,283
Subd level: 2
Tess level: 1
Resolution: 800x800
CPU: two socket 20 core IvyBridge
Links for tools and libraries mentioned in this presentation
The optimized OpenSubdiv is in the following fork:
https://github.com/shengfuintel/OpenSubdiv (check out the branch named intel)
Intel tools and libraries mentioned in this presentation:
• ICC: https://software.intel.com/en-us/c-compilers
• ISPC: https://ispc.github.io
• Embree: https://embree.github.io
• VTune Amplifier: https://software.intel.com/en-us/intel-vtune-amplifier-xe
• TBB: https://www.threadingbuildingblocks.org/
 
Intel: мобильность и трансформация рабочего места
Intel: мобильность и трансформация рабочего местаIntel: мобильность и трансформация рабочего места
Intel: мобильность и трансформация рабочего места
 
Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010
 
Lynn Comp - Intel Big Data & Cloud Summit 2013 (2)
Lynn Comp - Intel Big Data & Cloud Summit 2013 (2)Lynn Comp - Intel Big Data & Cloud Summit 2013 (2)
Lynn Comp - Intel Big Data & Cloud Summit 2013 (2)
 
4 dpdk roadmap(1)
4 dpdk roadmap(1)4 dpdk roadmap(1)
4 dpdk roadmap(1)
 
Intel Mobile Launch Information
Intel Mobile Launch InformationIntel Mobile Launch Information
Intel Mobile Launch Information
 
E20190227[EDLS]インテル®︎FPGAによるエッジAI
E20190227[EDLS]インテル®︎FPGAによるエッジAIE20190227[EDLS]インテル®︎FPGAによるエッジAI
E20190227[EDLS]インテル®︎FPGAによるエッジAI
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital World
 
Como criar um mundo autônomo e conectado - Jomar Silva
Como criar um mundo autônomo e conectado - Jomar SilvaComo criar um mundo autônomo e conectado - Jomar Silva
Como criar um mundo autônomo e conectado - Jomar Silva
 
Crooke CWF Keynote FINAL final platinum
Crooke CWF Keynote FINAL final platinumCrooke CWF Keynote FINAL final platinum
Crooke CWF Keynote FINAL final platinum
 

More from Intel® Software

More from Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 

Recently uploaded

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 

Recently uploaded (20)

%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 

Improving the performance of OpenSubdiv* on Intel Architecture

  • 1. Improving the performance of OpenSubdiv* on Intel Architecture
      Sheng Fu (sheng.fu@intel.com), August 12, 2015
      Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
  • 4. Agenda
      • Introduction to OpenSubdiv*
      • Optimizing the subdivision kernel with ICC
      • Optimizing patch evaluation with ISPC
      • Embree Viewer: a demo that interactively renders animated subdivision surfaces on Intel architecture
  • 5. What is a subdivision surface?
      • Start from a polygonal control mesh
      • Apply a subdivision rule recursively; the surface this refinement converges to is the limit surface
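The recursive refinement described on this slide can be illustrated with a much simpler 2D analog. The sketch below is not from the slides and is not what OpenSubdiv implements: it uses Chaikin's corner-cutting rule on a closed polygon, chosen only because it fits in a few lines. The names `Point2`, `chaikinStep`, and `subdivide` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 { float x, y; };

// One refinement step of Chaikin's corner cutting on a closed polygon:
// every edge (p, q) contributes two new points, 1/4 and 3/4 along the edge.
std::vector<Point2> chaikinStep(const std::vector<Point2>& control) {
    assert(control.size() >= 3);
    std::vector<Point2> refined;
    refined.reserve(control.size() * 2);
    for (std::size_t i = 0; i < control.size(); ++i) {
        const Point2& p = control[i];
        const Point2& q = control[(i + 1) % control.size()];
        refined.push_back({0.75f * p.x + 0.25f * q.x, 0.75f * p.y + 0.25f * q.y});
        refined.push_back({0.25f * p.x + 0.75f * q.x, 0.25f * p.y + 0.75f * q.y});
    }
    return refined;
}

// Apply the rule recursively; each level doubles the point count and
// moves the polygon closer to its smooth limit curve.
std::vector<Point2> subdivide(std::vector<Point2> mesh, int levels) {
    for (int level = 0; level < levels; ++level) {
        mesh = chaikinStep(mesh);
    }
    return mesh;
}
```

Calling `subdivide` on a four-point square with two levels yields a 16-point polygon with rounded corners. Surface schemes follow the same pattern, with rules defined on faces, edges, and vertices instead of polygon edges.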
  • 6. Why have subdivision surfaces been extensively used in the digital content creation (DCC) industry?
      • They support arbitrary topology
      • They are smooth
      • They deform efficiently for animation
  • 7. What is OpenSubdiv*?
      • A set of open-source libraries that implement high-performance subdivision surface evaluation on CPU and GPU
      • Optimized for drawing deforming surfaces with static topology at interactive frame rates
      • Matches the RenderMan* specification
  • 8. Pipeline to render subdivision surfaces
      1. Control mesh → patches: feature-adaptive subdivision
      2. Patches → triangles: evaluate the patches to tessellate them into triangles
      3. Triangles: render the triangle meshes
  • 9. Optimizing a subdivision kernel
      • How a subdivision kernel works: compute the vertex data for a vertex in the new level by summing the weighted vertex data of the surrounding vertices in the current level. For four surrounding vertices v1..v4 with weights w1..w4:

            vnew = v1*w1 + v2*w2 + v3*w3 + v4*w4

      • The scalar kernel (result is a local float buffer of numElems entries):

            for (int i = start; i < end; ++i) {
                // Clear the accumulator for output vertex i.
                for (int k = 0; k < numElems; ++k)
                    result[k] = 0.0f;
                // Sum the weighted source vertices of this stencil.
                for (int j = 0; j < sizes[i]; ++j, ++indices, ++weights) {
                    src = vertexSrc + (*indices) * numElems;
                    weight = *weights;
                    for (int k = 0; k < numElems; ++k) {
                        result[k] += src[k] * weight;
                    }
                }
                // Write the accumulated vertex to the destination buffer.
                dst = vertexDst + i * numElems;
                memcpy(dst, result, numElems * sizeof(float));
            }
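To make the weighted sum concrete, here is a minimal, self-contained sketch of the same loop as a C++ function. It is an illustration under assumptions, not the OpenSubdiv source: the function name `computeStencils`, the fixed upper bound `kMaxElems`, and the choice to start at stencil 0 (instead of the slide's `start`/`end` range) are all hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Computes numStencils output vertices. Stencil i combines sizes[i] source
// vertices; their indices and weights are stored back to back in the
// 'indices' and 'weights' arrays.
void computeStencils(const float* vertexSrc, float* vertexDst,
                     const int* sizes, const int* indices, const float* weights,
                     int numStencils, int numElems) {
    const int kMaxElems = 8;  // assumption: at most 8 floats per vertex
    assert(numElems > 0 && numElems <= kMaxElems);
    float result[kMaxElems];
    for (int i = 0; i < numStencils; ++i) {
        for (int k = 0; k < numElems; ++k) result[k] = 0.0f;
        for (int j = 0; j < sizes[i]; ++j, ++indices, ++weights) {
            const float* src = vertexSrc + (*indices) * numElems;
            const float weight = *weights;
            for (int k = 0; k < numElems; ++k) {
                result[k] += src[k] * weight;
            }
        }
        std::memcpy(vertexDst + i * numElems, result, numElems * sizeof(float));
    }
}
```

With two source vertices (1, 2, 3, 4) and (3, 4, 5, 6), a stencil that weights both by 0.5 produces their midpoint (2, 3, 4, 5).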
  • 10. Vectorizing a subdivision kernel with ICC auto-vectorization
      • What is vectorization?
          • Converting scalar code to SIMD code
      • What is ICC auto-vectorization?
          • ICC automatically identifies loops it can vectorize and generates packed SIMD instructions for them
          • Only the innermost loop is auto-vectorized
          • Pragmas help ICC vectorize a loop: #pragma ivdep, #pragma simd, #pragma vector aligned, …
  • 11. Optimizing a subdivision kernel with ICC

            for (int i = start; i < end; ++i) {
                for (int k = 0; k < numElems; ++k)
                    result[k] = 0.0f;
                for (int j = 0; j < sizes[i]; ++j, ++indices, ++weights) {
                    src = vertexSrc + (*indices) * numElems;
                    weight = *weights;
                    // This loop got vectorized.
                    #pragma simd
                    #pragma vector aligned
                    for (int k = 0; k < numElems; ++k) {
                        // Accumulated in a local variable to avoid an extra memory copy.
                        result[k] += src[k] * weight;
                    }
                }
                dst = vertexDst + i * numElems;
                memcpy(dst, result, numElems * sizeof(float));
            }
  • 12. Optimizing a subdivision kernel with ICC
      • Align vertex data to get better performance
          • Vertex data must be aligned on 4 floats or 8 floats (16 or 32 bytes)
      • The subdivision kernel uses a template parameter to remove the cost of the loop
          • When numElems is a compile-time constant of 4 or 8, the inner loop can be converted to two SIMD instructions (a multiply and an add) or to one FMA instruction

            template <int numElems>
            void ComputeStencilKernel( ... ) {
                ...
                #pragma simd
                #pragma vector aligned
                for (int k = 0; k < numElems; ++k) {
                    result[k] += src[k] * weight;
                }
                ...
            }
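A minimal sketch of the template idea, assuming the caller only knows `numElems` at run time and dispatches to a compile-time instantiation. The names `accumulate`, `accumulateGeneric`, and `accumulateDispatch` are hypothetical and do not come from OpenSubdiv. The ICC pragmas are left out so the sketch compiles with any C++ compiler.

```cpp
#include <cassert>

// Fixed-width version: numElems is a compile-time constant, so the compiler
// can fully unroll the loop and emit straight-line SIMD code.
template <int numElems>
void accumulate(float* result, const float* src, float weight) {
    for (int k = 0; k < numElems; ++k) {
        result[k] += src[k] * weight;
    }
}

// Fallback for widths that have no specialized instantiation.
void accumulateGeneric(float* result, const float* src, float weight, int numElems) {
    for (int k = 0; k < numElems; ++k) {
        result[k] += src[k] * weight;
    }
}

// Runtime dispatch: pick the fixed-width kernel when the width is 4 or 8.
void accumulateDispatch(float* result, const float* src, float weight, int numElems) {
    assert(numElems > 0);
    switch (numElems) {
        case 4:  accumulate<4>(result, src, weight); break;
        case 8:  accumulate<8>(result, src, weight); break;
        default: accumulateGeneric(result, src, weight, numElems); break;
    }
}
```

In a real kernel the dispatch would sit outside the per-vertex loops, so the branch is taken once per call, not once per weight.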
  • 13. Optimizing a subdivision kernel with ICC
      Subdivision kernel time, collected from glViewer (CPU kernel, subdivision level = 2, tessellation level = 1) on a 2-socket, 20-core Ivy Bridge system:

            GCC        1.6 ms     0.5 ms     0.9 ms
            ICC        0.46 ms    0.15 ms    0.24 ms
            Speedup    3.5x       3.3x       3.8x
  • 14. Parallelize a subdivision kernel with TBB
      • TBB is an open-source, task-based parallel programming library
      • The OpenSubdiv* TBB kernel uses TBB parallel_for to parallelize the subdivision kernel
      • TBB parallel_for can also be applied at the subdivision mesh level to achieve better load balancing
      • Pseudocode for nested parallel_for:

            run tbb parallel_for over the array of subdivision meshes {
                run tbb parallel_for over the array of subdivision kernels of that mesh {
                }
            }
  • 15. Parallelize a subdivision kernel with TBB
      VTune Amplifier thread profiling (collected on a 20-core Ivy Bridge system). Figures: CPU utilization for nested parallel_for; CPU utilization for parallel_for only on the subdivision kernel.
      Performance result:
      • Total number of meshes: 222
      • Minimum control face count: 28
      • Maximum control face count: 60,038
      • Wall-clock time with parallel_for only on the subdivision kernel: 5 ms
      • Wall-clock time with nested parallel_for: 2.6 ms (about 2x speedup)
  • 16. Optimizing patch evaluation with ISPC
      Subdivision surface render pipeline:
      1. Control faces → patches: feature-adaptive subdivision
      2. Patches → triangles: evaluate the patches to tessellate them into triangles
      3. Triangles: render the triangle meshes
  • 17. Step 1: bundle sample points for the same patch
      Benefits of bundling sample points:
      • Vertex data for a patch only needs to be gathered once
      • The patch is ready to be evaluated with SIMD
      Data layout in the patch coordinate buffer for bundled sample points (one PatchCoord per row):

            Array index   Patch index   Vertex index   S     T
            1             1             1              S1    T1     (PatchCoords for one patch)
            1             1             1              S2    T2     (PatchCoords for one patch)
            n             n             n              S     T      (PatchCoord for another patch)
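One way to produce this layout is to sort the sample points by patch and then record each run of equal patches as a bundle. The sketch below assumes a `PatchCoord` struct with the five fields named on the slide. The struct layout, the `PatchBundle` type, and the function `bundleByPatch` are hypothetical and are not the OpenSubdiv API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the fields listed on the slide.
struct PatchCoord {
    int arrayIndex;
    int patchIndex;
    int vertexIndex;
    float s, t;
};

// A run of consecutive PatchCoords that all refer to the same patch.
struct PatchBundle {
    std::size_t first;
    std::size_t count;
};

static bool samePatch(const PatchCoord& a, const PatchCoord& b) {
    return a.arrayIndex == b.arrayIndex && a.patchIndex == b.patchIndex;
}

// Sorts the sample points by (arrayIndex, patchIndex) and returns one bundle
// per patch, so the control points of each patch are gathered only once.
std::vector<PatchBundle> bundleByPatch(std::vector<PatchCoord>& coords) {
    std::stable_sort(coords.begin(), coords.end(),
                     [](const PatchCoord& a, const PatchCoord& b) {
                         if (a.arrayIndex != b.arrayIndex) return a.arrayIndex < b.arrayIndex;
                         return a.patchIndex < b.patchIndex;
                     });
    std::vector<PatchBundle> bundles;
    std::size_t covered = 0;
    for (std::size_t i = 0; i < coords.size();) {
        std::size_t j = i + 1;
        while (j < coords.size() && samePatch(coords[i], coords[j])) ++j;
        bundles.push_back({i, j - i});
        covered += j - i;
        i = j;
    }
    assert(covered == coords.size());  // every sample belongs to exactly one bundle
    return bundles;
}
```

Each bundle can then be passed to a SIMD evaluator that loads the patch once and evaluates `count` (s, t) pairs across vector lanes.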
  • 18. Step 2: Evaluate a patch with ISPC
      What is the Intel SPMD (single program, multiple data) Program Compiler (ISPC)?
      • An open-source language and compiler for Intel SIMD architectures
      • ISPC is NOT an "autovectorizing" compiler!
          • It does not generate vector code by analyzing and transforming scalar loops, as ICC does
      • ISPC is more of a "WYSIWYG" vectorizing compiler
          • The programmer tells ISPC what is vector and what is scalar
          • Vector types are explicit, not discovered
  • 19. Step 2: Evaluate a patch with ISPC
     ISPC Language
     • Familiar C-based syntax
       • Code like sequential algorithms, but executes in parallel (SPMD)
     • Easily mixes scalar and vector computation
       • Two new type modifiers (uniform and varying) distinguish between scalar and vector data types
     • Easily interoperates with C/C++
       • You can call C/C++ code from ISPC functions, or call ISPC code from C/C++ code
       • Passing pointers between ISPC and C/C++ code just works
     • Efficient data layout
  • 20. ISPC Example – an ISPC function
       export void simple(uniform float vin[], uniform float vout[],
                          uniform int count) {
           foreach (index = 0 ... count) {
               varying float v = vin[index];
               if (v < 3.)
                   v = v * v;
               else
                   v = sqrt(v);
               vout[index] = v;
           }
       }
     Annotations:
     • export: visible from C
     • uniform: scalar input type
     • foreach: the “foreach” statement provides automatic multi-dimensional traversal of the iteration space, optimal code generation for fully-vectorized iterations, and automatic remainder loop generation
     • varying float: vector type
     • index: varying (default) loop index, so a “vector-width” number of iterations are done at once (depending on the compile target), one loop iteration per vector “lane”, with masking
  • 21. ISPC Example – C code that calls it
       #include <stdio.h>
       #include "simple.h"

       int main() {
           float vin[16], vout[16];
           for (int i = 0; i < 16; ++i)
               vin[i] = i;
           simple(vin, vout, 16);   /* call the ISPC function */
           for (int i = 0; i < 16; ++i)
               printf("%d: simple(%f) = %f\n", i, vin[i], vout[i]);
       }
     Output:
       0: simple(0.000000) = 0.000000
       1: simple(1.000000) = 1.000000
       2: simple(2.000000) = 4.000000
       3: simple(3.000000) = 1.732051
       ...
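To make the "one loop iteration per lane, with masking" idea concrete, the following plain C++ sketch emulates how a gang of program instances could step through the `simple` example. It is a conceptual model only, not what the ISPC compiler emits: the gang size of 8 and the name `simple_gang` are assumptions made for this illustration, and the inner lane loop stands in for a single SIMD instruction.

```cpp
#include <cassert>
#include <cmath>

// Assumed gang size for this sketch; the real width depends on the
// ISPC compile target.
constexpr int kGangSize = 8;

// Scalar emulation of the `simple` example. Each pass of the outer loop
// models one vector-wide iteration; the inner loop models the lanes.
void simple_gang(const float* vin, float* vout, int count) {
    assert(count >= 0);
    for (int base = 0; base < count; base += kGangSize) {
        for (int lane = 0; lane < kGangSize; ++lane) {
            const int index = base + lane;
            // Remainder mask: lanes past the end of the array stay idle.
            if (index >= count) continue;
            const float v = vin[index];
            // Branch mask: both sides are computed and the mask selects
            // the result, which is how divergent lanes are handled.
            const bool takeSquare = v < 3.0f;
            const float squared = v * v;
            const float root = std::sqrt(v);
            vout[index] = takeSquare ? squared : root;
        }
    }
}
```

Calling `simple_gang` with a count that is not a multiple of the gang size exercises the remainder mask: the final pass leaves the unused lanes, and the memory behind them, untouched.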
  • 22. ISPC patch evaluation: gathering control points
       uniform Point controlVertices[16];
       for (uniform int i = 0; i < 16; i++) {
           uniform unsigned int id = vertexIndices[i];
           uniform const float * uniform pVertex;
           pVertex = inQ + inDesc.offset + id * inDesc.stride;
           controlVertices[i].x = pVertex[0];
           controlVertices[i].y = pVertex[1];
           controlVertices[i].z = pVertex[2];
           pVertex += 3;   // pVertex is recomputed at the top of the next iteration
       }
     • Gathering only needs to be done once for each patch, since sampling points are sorted by patch handle
     • Data are uniform, no SIMD yet.
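The addressing in the gather loop (base pointer plus buffer offset plus vertex index times stride) can be checked with a scalar C++ counterpart. This is a sketch under assumptions: `BufferDesc` is a hypothetical mirror of the `inDesc` descriptor with only the two fields the slide uses, and `gatherControlPoints` is a name chosen for this illustration.

```cpp
#include <cassert>

struct Point { float x, y, z; };

// Hypothetical mirror of the slide's inDesc: both values are measured
// in floats, not bytes.
struct BufferDesc {
    int offset;  // first float of vertex 0 inside the buffer
    int stride;  // floats between the starts of consecutive vertices
};

// Copy the 16 control points of one patch out of an interleaved vertex
// buffer into a small contiguous array.
void gatherControlPoints(const float* inQ, BufferDesc desc,
                         const int vertexIndices[16], Point out[16]) {
    assert(inQ != nullptr && desc.stride >= 3);
    for (int i = 0; i < 16; ++i) {
        const float* p = inQ + desc.offset + vertexIndices[i] * desc.stride;
        out[i] = Point{p[0], p[1], p[2]};
    }
}
```

Because the gathered array is small and contiguous, every sample point bundled for the same patch can reuse it without touching the large vertex buffer again.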
  • 23. ISPC patch evaluation: vectorized patch evaluation
       inline void getBSplineWeights(float t, float point[4]) {
           float const one6th = 1.0f / 6.0f;
           float t2 = t * t;
           float t3 = t * t2;
           point[0] = one6th * (1.0f - 3.0f*(t - t2) - t3);
           point[1] = one6th * (4.0f - 6.0f*t2 + 3.0f*t3);
           point[2] = one6th * (1.0f + 3.0f*(t + t2 - t3));
           point[3] = one6th * (t3);
       }

       foreach (n = 0 ... nPoint) {
           float sWeights[4], tWeights[4];
           getBSplineWeights(s, sWeights);
           getBSplineWeights(t, tWeights);
           float weight[16];
           for (uniform int i = 0; i < 4; ++i)
               for (uniform int j = 0; j < 4; ++j) {
                   weight[4*i+j] = sWeights[j] * tWeights[i];
               }
           float *pOutQ = outQ + outDesc.offset + n * outDesc.stride;
           for (uniform int c = 0; c < nChannel; c++) {
               uniform int offset = c * 16;
               Point Q;
               Q.x = Q.y = Q.z = 0.0;
               for (uniform int i = 0; i < 16; ++i) {
                   Q = Q + weight[i] * controlVertices[offset + i];
               }
               *pOutQ++ = Q.x, *pOutQ++ = Q.y, *pOutQ++ = Q.z;
           }
       }
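A scalar C++ version of the same arithmetic is handy for checking the weights. The sketch below reuses the weight formulas from the slide unchanged; `bsplineWeights` and `evalBSplinePatch` are names chosen for this illustration, and it evaluates a single channel for a single sample point, whereas the ISPC code handles one sample point per lane.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point { float x, y, z; };

// Uniform cubic B-spline basis weights at parameter t in [0, 1],
// using the same formulas as getBSplineWeights on the slide.
std::array<float, 4> bsplineWeights(float t) {
    const float one6th = 1.0f / 6.0f;
    const float t2 = t * t;
    const float t3 = t * t2;
    return {one6th * (1.0f - 3.0f * (t - t2) - t3),
            one6th * (4.0f - 6.0f * t2 + 3.0f * t3),
            one6th * (1.0f + 3.0f * (t + t2 - t3)),
            one6th * t3};
}

// Evaluate one bicubic B-spline patch at (s, t). Control points are laid
// out row-major as cv[4 * i + j], with j running along s and i along t.
Point evalBSplinePatch(const Point cv[16], float s, float t) {
    assert(cv != nullptr);
    const std::array<float, 4> sw = bsplineWeights(s);
    const std::array<float, 4> tw = bsplineWeights(t);
    Point q{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            const float w = sw[j] * tw[i];  // tensor-product weight
            q.x += w * cv[4 * i + j].x;
            q.y += w * cv[4 * i + j].y;
            q.z += w * cv[4 * i + j].z;
        }
    }
    return q;
}
```

Two properties make quick sanity checks: the four weights sum to 1 for any t, and a flat control grid with cv[4*i+j] = (j, i, 0) evaluates to (1 + s, 1 + t, 0), because the patch covers only the middle cell of its 4x4 control grid.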
  • 24. Parallelize ISPC patch evaluation with TBB
       tbb::blocked_range<int> range =
           tbb::blocked_range<int>(0, numPatchCoords, grain_size);
       // Run tbb parallel_for on all sampling points
       tbb::parallel_for(range, [&](const tbb::blocked_range<int> &r) {
           int i = r.begin();
           while (i < r.end()) {
               // Search sampling points that belong to the same patch
               int nCoord = 1;
               Far::PatchTable::PatchHandle handle = patchCoords[i].handle;
               while (i + nCoord < r.end() &&
                      handle.isEqual(patchCoords[i + nCoord].handle))
                   nCoord++;
               // Put UV into a local array
               __declspec( align(64) ) float u[nCoord], v[nCoord];
               for (int n = 0; n < nCoord; n++) {
                   u[n] = patchCoords[i + n].s;
                   v[n] = patchCoords[i + n].t;
               }
               // Call ispc evaluation function
               ispc::evalPatch(nCoord, u, v, …);
               i += nCoord;
           }
       });
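The run search inside the parallel_for body can be isolated and tested without TBB. In this sketch the patch handles are reduced to plain integer ids and `findRuns` is a hypothetical helper; it mirrors the slide's inner while loop, including the detail that the search stops at the end of the current range.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Return the (start, count) runs of equal patch ids inside [begin, end).
// The search never looks past `end`, matching the r.end() bound that the
// slide uses for each TBB sub-range.
std::vector<std::pair<int, int>> findRuns(const std::vector<int>& patchIds,
                                          int begin, int end) {
    assert(0 <= begin && begin <= end &&
           end <= static_cast<int>(patchIds.size()));
    std::vector<std::pair<int, int>> runs;
    int i = begin;
    while (i < end) {
        int n = 1;
        while (i + n < end && patchIds[i + n] == patchIds[i]) ++n;
        runs.emplace_back(i, n);  // one evaluation call per run
        i += n;
    }
    return runs;
}
```

For ids {7, 7, 7, 9, 9, 4}, the single range [0, 6) yields three runs. Splitting it into [0, 2) and [2, 6) yields four, because patch 7 straddles the boundary: the result is still correct, but that patch's control points are gathered twice, which is one reason a very small grain size may cost performance.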
  • 25. ISPC patch evaluation performance data
       Sampling points   ISPC                                   TBB
                         Single thread (ms)   20 threads (ms)   Single thread (ms)   20 threads (ms)
       65536             2.5 (3.6x)           0.5 (1.25x)       7.1                  0.6
       655360            12 (5.8x)            3.0 (2.1x)        70                   6.3
     Performance data collected in glLimitEval, subdivision level = 3, vertex animation is turned off, CPU: two socket 20 core IvyBridge
  • 26. Demo: Embree Viewer
     Subdivision surface render pipeline:
     1. Control faces -> patches: feature adaptive subdivision to get patches
     2. Patches -> triangles: evaluate patches to tessellate patches into triangles
     3. Render triangle meshes
  • 27. Demo: Embree Viewer
     • A demo similar to glViewer
     • Complete CPU-based solution: subdivision, tessellation, and rendering are all on the CPU
     • Ray tracing-based rendering with Embree, an open source ray tracing kernel
     • High quality rendering, with support for shadows.
  • 28. Demo: Embree Viewer
     • Step 1: feature-adaptive subdivision to generate patches with the TBB subdivision kernel
     [Figure: patch1 to patch7, patches generated with subdivision level = 1]
  • 29. Demo: Embree Viewer
     • Step 2: uniformly tessellate each patch into triangles, using ispcEvaluator to evaluate tessellation points on the limit surface
     [Figure: patches tessellated with tessellation level = 1]
  • 30. Demo: Embree Viewer
     Step 3: render mesh with Embree
     • Create an Embree scene: rtcNewScene
     • Create an Embree triangle mesh: rtcNewTriangleMesh
     • Pass the vertex buffer and index buffer representing the triangle mesh to Embree: rtcSetBuffer
     • Update BVH when mesh positions are updated: rtcUpdate
     • Build Embree BVH: rtcCommit
  • 31. Demo: Embree Viewer
     Step 3: render mesh with Embree
     • Divide the screen into 8x8 tiles, and use TBB parallel_for to render each tile in parallel
     • Fire 8 packed primary rays and test for intersections using SIMD: rtcIntersect8
     • Fire a shadow ray for each intersected ray: rtcOccluded
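The tile decomposition can be sketched independently of TBB and Embree. This is an illustration under a stated assumption: it reads "8x8 tiles" as tiles of 8 by 8 pixels, so that each tile row could map to one packet of 8 primary rays. The slide does not confirm this reading, and `Tile` and `makeTiles` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open pixel bounds of one screen tile: [x0, x1) by [y0, y1).
struct Tile { int x0, y0, x1, y1; };

// Split a width x height image into tiles of tileSize x tileSize pixels
// (assumed to be 8 here). Tiles on the right and bottom edges are clipped
// to the image, so any resolution is covered exactly once.
std::vector<Tile> makeTiles(int width, int height, int tileSize = 8) {
    assert(width >= 0 && height >= 0 && tileSize > 0);
    std::vector<Tile> tiles;
    for (int y0 = 0; y0 < height; y0 += tileSize) {
        for (int x0 = 0; x0 < width; x0 += tileSize) {
            tiles.push_back(Tile{x0, y0,
                                 std::min(x0 + tileSize, width),
                                 std::min(y0 + tileSize, height)});
        }
    }
    return tiles;
}
```

Each tile is an independent unit of work, so the resulting vector is the kind of index space a parallel loop could iterate over. Under this reading, the 800x800 resolution used in the demo slides gives 100 x 100 = 10,000 tiles.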
  • 32. Demo: Embree Viewer
     • Model: toy car
     • Shadow: off
     • Patches: 11,331
     • Triangles: 201,600
     • Vertices: 171,283
     • Subd level: 2
     • Tess level: 1
     • FPS: 72
     • Resolution: 800x800
     • CPU: two socket 20 core IvyBridge
  • 33. Demo: Embree Viewer
     • Model: toy car
     • Shadow: on
     • Patches: 11,331
     • Triangles: 201,600
     • Vertices: 171,283
     • Subd level: 2
     • Tess level: 1
     • FPS: 45
     • Resolution: 800x800
     • CPU: two socket 20 core IvyBridge
  • 34. Demo: Embree Viewer
     • Model: toy car
     • Patches: 11,331
     • Triangles: 201,600
     • Vertices: 171,283
     • Subd level: 2
     • Tess level: 1
     • Resolution: 800x800
     • CPU: two socket 20 core IvyBridge
  • 35. Links for tools and libraries mentioned in this presentation
     The optimized OpenSubdiv is in the following fork (check out the intel branch): https://github.com/shengfuintel/OpenSubdiv
     Intel tools and libraries mentioned in this presentation:
     • ICC: https://software.intel.com/en-us/c-compilers
     • ISPC: https://ispc.github.io
     • Embree: https://embree.github.io
     • VTune Amplifier: https://software.intel.com/en-us/intel-vtune-amplifier-xe
     • TBB: https://www.threadingbuildingblocks.org/