SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
Notes about moving
from Python to C++
Yung-Yu Chen (@yungyuc)

PyConTW 2020, Tainan, Taiwan
1
Why move?
• Numerical software: Computer programs to solve scientific or
mathematic problems.

• Other names: Mathematical software, scientific software, technical
software.

• Python is a popular language for application experts to describe the
problems and solutions, because it is easy to use.

• Most of the computing systems (the numerical software) are designed in
a hybrid architecture.

• The computing kernel uses C++.

• Python is chosen for the user-level API.
2
What I do
• By training, I am a computational scientist, focusing on
applications of continuum mechanics. (I am not a
computer scientist.)

• In my day job, I write high-performance code for
semiconductor applications of computational geometry
and photolithography.

• In my spare time, I am teaching a course ‘numerical
software development’ in the department of computer
science in NCTU.
3
twitter: https://twitter.com/yungyuc

linkedin: https://www.linkedin.com/in/yungyuc/.
Example: PDEs
4
Numerical simulations of
conservation laws:

∂u
∂t
+
3
∑
k=1
∂F(k)
(u)
∂xk
= 0
Use case: stress waves in 

anisotropic solids
Use case: compressible flows
Example: OPC
5
photoresist
silicon substrate
photomask
light source
Photolithography in semiconductor fabrication
wave length is only
hundreds of nm
image I want to
project on the PR
shape I need
on the mask
Optical proximity correction (OPC)
(smaller than the
wave length)
write code to
make it happen
Python: the good is the bad
• Dynamicity enables
convention over
compilation for productivity

• It is so when you have
good testing and
documentation
import math
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
@property
def length(self):
return math.hypot(self.x, self.y)
5.0print(Point(3, 4).length)
print(Point("3", "4").length) Traceback (most recent call last):
File "ambiguity.py", line 10, in <module>
print(Point("3", "4").length)
File "ambiguity.py", line 8, in length
return math.hypot(self.x, self.y)
TypeError: a float is required
6
What is fast?
worker(Data&, unsigned long, int):
mov DWORD PTR [rdi+rsi*4], edx
mov rax, rdi
ret
class Data {
int m_storage[1024];
public:
int & operator[] (size_t it) { return m_storage[it]; }
};
Data & worker(Data & data, size_t idx, int value) {
data[idx] = value;
return data;
}
High-level language
Machine code
7
Dynamic typing is slow
• Python trades runtime for convenience; everything is
indirect and incurs runtime overhead
def work(data, key, name, value):
# Assign value to the data as a container with key.
data[key] = value
# Assign value to the data as an object with name.
setattr(data, name, value)
return data
8
• The innocent-looking Python code uses many checking
functions under the hood
int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value)
{
PyMappingMethods *m;
if (o == NULL || key == NULL || value == NULL) {
null_error();
return -1;
}
m = o->ob_type->tp_as_mapping;
if (m && m->mp_ass_subscript)
return m->mp_ass_subscript(o, key, value);
if (o->ob_type->tp_as_sequence) {
if (PyIndex_Check(key)) {
Py_ssize_t key_value;
key_value = PyNumber_AsSsize_t(key, PyExc_IndexError);
if (key_value == -1 && PyErr_Occurred())
return -1;
return PySequence_SetItem(o, key_value, value);
}
else if (o->ob_type->tp_as_sequence->sq_ass_item) {
type_error(/* ... omit ... */);
return -1;
}
}
type_error(/* ... omit ... */);
return -1;
}
int PyObject_SetAttr(PyObject *v, PyObject *name, PyObject *value)
{
PyTypeObject *tp = Py_TYPE(v);
int err;
if (!PyUnicode_Check(name)) {
PyErr_Format(PyExc_TypeError, /* ... omit ... */);
return -1;
}
Py_INCREF(name);
PyUnicode_InternInPlace(&name);
if (tp->tp_setattro != NULL) {
err = (*tp->tp_setattro)(v, name, value);
Py_DECREF(name);
return err;
}
if (tp->tp_setattr != NULL) {
const char *name_str = PyUnicode_AsUTF8(name);
if (name_str == NULL) {
Py_DECREF(name);
return -1;
}
err = (*tp->tp_setattr)(v, (char *)name_str, value);
Py_DECREF(name);
return err;
}
Py_DECREF(name);
_PyObject_ASSERT(name, name->ob_refcnt >= 1);
if (tp->tp_getattr == NULL && tp->tp_getattro == NULL)
PyErr_Format(PyExc_TypeError, /* ... omit ... */);
else
PyErr_Format(PyExc_TypeError, /* ... omit ... */);
return -1;
}
data[key] = value setattr(data, name, value)
(in Objects/abstract.c) (in Objects/object.c)
static PyObject *__pyx_pf_6simple_work(
PyObject *__pyx_v_data, PyObject *__pyx_v_key
, PyObject *__pyx_v_name, PyObject *__pyx_v_value
) {
PyObject_SetItem(__pyx_v_data, __pyx_v_key, __pyx_v_value);
PyObject_SetAttr(__pyx_v_data, __pyx_v_name, __pyx_v_value);
return __pyx_v_data;
}
def work(data, key, name, value):
data[key] = value
setattr(data, name, value)
return data
mapping
sequence
attribute name check
check to set
9
Speed is the king
To pursue high-performance and flexibility simultaneously,
the systems need to use the best tools at the proper layers.
This is a path to develop such systems.
10
Python controls execution flow
C++ manages resources
Use C++ to generate fast assembly code
Glue Python and C++
PyObject uses malloc
PyObject_MALLOC() is a wrapper to the standard C memory
allocator malloc().
11
PyObject *
_PyObject_New(PyTypeObject *tp)
{
PyObject *op;
op = (PyObject *) PyObject_MALLOC(_PyObject_SIZE(tp));
if (op == NULL)
return PyErr_NoMemory();
return PyObject_INIT(op, tp);
}
Python call stack
• Python maintains its
own stack

• It’s different from that of
the ‘real machine’

• The two (C/C++ and
Python) call stacks work
very differently
12
#0 line 11: show_stack
#1 line 23: caller
#2 line 25: <module>: top-level module
local_val is: defined outside
import inspect
def show_stack():
# Print calling stacks.
f = inspect.currentframe()
i = 0
while f:
name = f.f_code.co_name
if '<module>' == name:
name += ': top-level module'
print("#%d line %d: %s" % (i, f.f_lineno, name))
f = f.f_back
i += 1
# Obtain a local variable defined in an outside stack.
f = inspect.currentframe().f_back
print('local_val is:', f.f_locals['local_var'])
del f
def caller():
local_var = "defined outside"
show_stack()
caller()
Best practice: glue
• For performance and compile-time checking, we do not want to use
Python

• For flow control and configuration, we do not want to use C++

• Glue tools:

• Python C API : https://docs.python.org/3/c-api/index.html

• Cython : https://cython.org/

• Boost.Python : https://www.boost.org/doc/libs/1_74_0/libs/python/
doc/html/index.html

• Pybind11 : https://github.com/pybind/pybind11
13
I only listed tools that require manual generation of wrapping code and compilation.
Pick pybind11
14
Tools Pros Cons
Python C API always available
manual reference count is hard to
get right
Cython easy to use no comprehensive C++ support
Boost.Python comprehensive C++ support requires boost
Pybind11
easy-to-use comprehensive
modern C++ support
require C++11
Build system: cmake
• Again, there are many tools for build systems

• Plain make

• Python distutils, setuptools, pip, etc.

• cmake

• cmake:

• Cross platform

• Consistent and up-to-date support
15
How cmake works
16
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <cmath>
std::vector<double> distance(
std::vector<double> const & x
, std::vector<double> const & y) {
std::vector<double> r(x.size());
for (size_t i = 0 ; i < x.size() ; ++i)
{
r[i] = std::hypot(x.at(i), y.at(i));
}
return r;
}
PYBIND11_MODULE(_pybmod, mod) {
namespace py = pybind11;
mod.doc() = "simple pybind11 module";
mod.def("distance", &distance);
}
cmake_minimum_required(VERSION 3.9)
project(pybmod)
find_package(pybind11 REQUIRED)
include_directories(${pybind11_INCLUDE_DIRS})
pybind11_add_module(
_pybmod
pybmod.cpp
)
target_link_libraries(_pybmod
PRIVATE ${MKL_LINKLINE})
set_target_properties(_pybmod
PROPERTIES CXX_VISIBILITY_PRESET "default")
add_custom_target(_pybmod_py
COMMAND ${CMAKE_COMMAND} -E
copy $<TARGET_FILE:_pybmod>
${PROJECT_SOURCE_DIR}
DEPENDS _pybmod)
$ mkdir -p build ; cd build
$ cmake .. -DCMAKE_BUILD_TYPE=Release
$ make -C build _pybmod_py
$ python3 -c 'import _pybmod ; print(_pybmod.distance([3, 8], [4, 6]))'
[5.0, 10.0]
CMakeLists.txt pybmod.cpp
Zero-copy: do it where it fits
Python app C++ app
C++
container
Ndarray
manage
access
Python app C++ app
C++
container
Ndarray
manage
accessa11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory buffer shared across language memory buffer shared across language
Top (Python) - down (C++) Bottom (C++) - up (Python)
Python app C++ app
a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory buffer shared across language
Ndarray
C++
container
17
Not recom
m
ended
for performance
Share buffer from C++
Two handy approaches by pybind11
18
cls
.def_buffer
(
[](wrapped_type & self)
{
namespace py = pybind11;
return py::buffer_info
(
self.data() /* Pointer to buffer */
, sizeof(int8_t) /* Size of one scalar */
, py::format_descriptor<char>::format()
/* Python struct-style format descriptor */
, 1 /* Number of dimensions */
, { self.size() } /* Buffer dimensions */
, { 1 } /* Strides (in bytes) for each index */
);
}
)
cls
.def_property_readonly
(
"ndarray"
, [](wrapped_type & self)
{
namespace py = pybind11;
return py::array
(
py::detail::npy_format_descriptor<int8_t>::dtype()
/* Numpy dtype */
, { self.size() } /* Buffer dimensions */
, { 1 } /* Strides (in bytes) for each index */
, self.data() /* Pointer to buffer */
, py::cast(self.shared_from_this())
/* Owning Python object */
);
}
)
.def_property_readonly(“ndarray2”, /* ... */)
Return a distinct numpy ndarray:

• The C++ class can return multiple
buffers
Python buffer protocol:

• The C++ class itself is turned into an
array description
Check assembly
• Make sure compilers generate good code for
performance hotspot

• Checking instead of writing: we don’t
have time to use assembly for everything

• If compilers don’t do a good job, before
jumping into hand-written assembly, consider
to use intrinsics: https://software.intel.com/
sites/landingpage/IntrinsicsGuide/
19
vmovupd ymm0, ymmword [rsi + r9*8]
vmulpd ymm0, ymm0, ymm0
vmovupd ymm1, ymmword [rdx + r9*8]
vmulpd ymm1, ymm1, ymm1
vaddpd ymm0, ymm0, ymm1
vsqrtpd ymm0, ymm0
void calc_distance(
size_t const n
, double const * x
, double const * y
, double * r) {
for (size_t i = 0 ; i < n ; ++i)
{
r[i] = std::sqrt(x[i]*x[i] + y[i]*y[i]);
}
}
movupd xmm0, xmmword [rsi + r8*8]
mulpd xmm0, xmm0
movupd xmm1, xmmword [rdx + r8*8]
mulpd xmm1, xmm1
addpd xmm1, xmm0
sqrtpd xmm0, xmm1
-O3 -mavx2 -O3
# Use redare2 to print the assembly.
r2 -Aqc "e scr.color=0 ; s sym.calc_distance_unsignedlong_doubleconst__doubleconst__double ; pdf" 
_pybmod.cpython-37m-darwin.so
AVX: 256-bit-wide vectorization SSE: 128-bit-wide vectorization
Manage resources
• Python uses reference count to manage object lifecycle.
You can do the in C++ by using shared pointer.
20
• Reference counting involves a lock and is slow. You didn’t
experience it in Python because (i) other parts in Python
are even slower and (ii) you didn’t have a choice.
class ConcreteBuffer
: public std::enable_shared_from_this<ConcreteBuffer>
{
private:
struct ctor_passkey {};
public:
static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes)
{
return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey());
}
ConcreteBuffer(size_t nbytes, const ctor_passkey &)
: m_nbytes(nbytes)
, m_data(allocate(nbytes))
{}
};
Translate reference count
• If reference count is used in C++, make sure it is correctly
translated between Python.
21
pybind11::class_
<
ConcreteBuffer
, std::shared_ptr<ConcreteBuffer>
>(
mod, "ConcreteBuffer"
, "Contiguous memory buffer"
);
class ConcreteBuffer
: public std::enable_shared_from_this<ConcreteBuffer>
{
private:
struct ctor_passkey {};
public:
static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes)
{
return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey());
}
ConcreteBuffer(size_t nbytes, const ctor_passkey &)
: m_nbytes(nbytes)
, m_data(allocate(nbytes))
{}
};
• If reference count is used in C++, make sure it is correctly
translated between Python.
Run Python in C++
• When C++ code puts together the architecture, oftentimes
we want to keep Python code as little as possible.

• But Python is used for scripting and it’s necessary to
return Python objects for the scripts.

• Pybind11 allows to write concise C++. It’s much more
maintainable than in-line Python.
22
>>> joined = cppfunction()
>>> print(joined)
one, two, three
py::str cppfunction() {
py::list lst = py::list();
lst.append("one");
lst.append("two");
lst.append("three");
return py::str(", ").attr("join")(lst);
}
// No, don't do this!
PyRun_SimpleString("', '.join('one', 'two', 'three')");
Import and run
Pybind11 API is very Pythonic
23
>>> show_modules()
1101
3000
void show_modules() {
py::module sys = py::module::import("sys");
py::print(py::len(sys.attr("modules")));
py::print(sys.attr("getrecursionlimit")());
}
def show_modules():
import sys
print(len(sys.modules))
print(sys.getrecursionlimit())
Spend time in compilation
to save time in runtime
Compiler does a lot of good. Make use of it as much as
possible.
24
template < typename T >
class SimpleArray
{
public:
using value_type = T;
using shape_type = small_vector<size_t>;
// Determine content value element size during compile time.
static constexpr size_t ITEMSIZE = sizeof(value_type);
static constexpr size_t itemsize() { return ITEMSIZE; }
// Straight constructor.
explicit SimpleArray(size_t length)
: m_buffer(ConcreteBuffer::construct(length * ITEMSIZE))
, m_shape{length}
, m_stride{1}
{}
template< class InputIt > SimpleArray(InputIt first, InputIt last)
: SimpleArray(last-first)
{
// Let STL decide how to optimize memory copy.
std::copy(first, last, data());
}
};
Compiler reduces runtime
errors
• Wrap the wrappers: reduce duplicated code

• Do it in compile time to reduce runtime errors
25
template
<
class Wrapper
, class Wrapped
, class Holder = std::unique_ptr<Wrapped>
, class WrappedBase = Wrapped
>
class WrapBase
{
public:
using wrapper_type = Wrapper;
using wrapped_type = Wrapped;
using wrapped_base_type = WrappedBase;
using holder_type = Holder;
// Treat inheritance hierarchy.
using class_ = typename std::conditional_t
<
std::is_same< Wrapped, WrappedBase >::value
, pybind11::class_
< wrapped_type, holder_type >
, pybind11::class_
<wrapped_type, wrapped_base_type, holder_type>
>;
// Singleton.
static wrapper_type & commit(pybind11::module & mod)
{
static wrapper_type derived(mod);
return derived;
}
class_ & cls() { return m_cls; }
protected:
// Register through construction.
template <typename... Extra>
WrapBase(
pybind11::module & mod
, char const * pyname, char const * pydoc
, const Extra & ... extra
)
: m_cls(mod, pyname, pydoc, extra ...)
{}
private:
class_ m_cls;
};
Decouple resource
management from algorithms
26
• Fixed-size
contiguous data
buffer
template < typename T >
class SimpleArray
{
public:
using value_type = T;
using shape_type = small_vector<size_t>;
static constexpr size_t ITEMSIZE = sizeof(value_type);
explicit SimpleArray(size_t length)
: m_buffer(ConcreteBuffer::construct(length * ITEMSIZE))
, m_shape{length}
, m_stride{1}
{}
private:
std::shared_ptr<ConcreteBuffer> m_buffer;
shape_type m_shape;
shape_type m_stride;
};
class ConcreteBuffer
: public std::enable_shared_from_this<ConcreteBuffer>
{
private:
struct ctor_passkey {};
public:
static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes)
{ return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey()); }
ConcreteBuffer(size_t nbytes, const ctor_passkey &)
: m_nbytes(nbytes), m_data(allocate(nbytes))
{}
private:
using unique_ptr_type =
std::unique_ptr<int8_t, std::default_delete<int8_t[]>>;
size_t m_nbytes;
unique_ptr_type m_data;
};
• Descriptive data
object owning the
data buffer
Encapsulate complex calculation
without losing performance
27
// A solution element.
class Selm : public ElementBase<Selm>
{
public:
value_type dxneg() const { return x()-xneg(); }
value_type dxpos() const { return xpos()-x(); }
value_type xctr() const { return (xneg()+xpos())/2; }
value_type const & so0(size_t iv) const { return field().so0(xindex(), iv); }
value_type & so0(size_t iv) { return field().so0(xindex(), iv); }
value_type const & so1(size_t iv) const { return field().so1(xindex(), iv); }
value_type & so1(size_t iv) { return field().so1(xindex(), iv); }
value_type const & cfl() const { return field().cfl(xindex()); }
value_type & cfl() { return field().cfl(xindex()); }
value_type xn(size_t iv) const { return field().kernel().calc_xn(*this, iv); }
value_type xp(size_t iv) const { return field().kernel().calc_xp(*this, iv); }
value_type tn(size_t iv) const { return field().kernel().calc_tn(*this, iv); }
value_type tp(size_t iv) const { return field().kernel().calc_tp(*this, iv); }
value_type so0p(size_t iv) const { return field().kernel().calc_so0p(*this, iv); }
void update_cfl() { return field().kernel().update_cfl(*this); }
};
xj− 1
2
xs
j− 1
2
xj+ 1
2
xs
j+ 1
2
x+
j− 1
2
= x−
j+ 1
2
xj = xs
j
(h∗
)n
j− 1
2 ,+
· (0, −∆x+
j− 1
2
) (h∗
)n
j+ 1
2 ,−
· (0, −∆x−
j+ 1
2
)
h
n+ 1
2
j · (0, ∆xj)
(h∗
)n,+
j− 1
2
· (−∆t
2 , 0) (h∗
)n,+
j+ 1
2
· (∆t
2 , 0)
3 solution elements around a compound conservation element
(Part of a code for the space-time conservation element and solution element (CESE) method)
Conclusion
• Speed is the king for numerical calculation

• Use Python for configuration and C++ for number
crunching: glue them

• Keep in mind: Software engineering, zero-copy data
buffer, read assembly, manage resources, use compiler
for speed and safety

• More information: https://github.com/yungyuc/nsd
28
Thank you
29

Contenu connexe

Tendances

Tendances (20)

Function Parameters
Function ParametersFunction Parameters
Function Parameters
 
chapter-4-data-file-handlingeng.pdf
chapter-4-data-file-handlingeng.pdfchapter-4-data-file-handlingeng.pdf
chapter-4-data-file-handlingeng.pdf
 
Python ppt.pptx
Python ppt.pptxPython ppt.pptx
Python ppt.pptx
 
Constructors and Destructor in C++
Constructors and Destructor in C++Constructors and Destructor in C++
Constructors and Destructor in C++
 
C++ oop
C++ oopC++ oop
C++ oop
 
Phython Programming Language
Phython Programming LanguagePhython Programming Language
Phython Programming Language
 
Introduction to Game programming with PyGame Part 1
Introduction to Game programming with PyGame Part 1Introduction to Game programming with PyGame Part 1
Introduction to Game programming with PyGame Part 1
 
Python GUI Programming
Python GUI ProgrammingPython GUI Programming
Python GUI Programming
 
Constructor and Types of Constructors
Constructor and Types of ConstructorsConstructor and Types of Constructors
Constructor and Types of Constructors
 
Virtual Functions | Polymorphism | OOP
Virtual Functions | Polymorphism | OOPVirtual Functions | Polymorphism | OOP
Virtual Functions | Polymorphism | OOP
 
Python basic syntax
Python basic syntaxPython basic syntax
Python basic syntax
 
Compiler lab final report writing
Compiler lab final report writingCompiler lab final report writing
Compiler lab final report writing
 
Python
PythonPython
Python
 
Python - the basics
Python - the basicsPython - the basics
Python - the basics
 
Full Python in 20 slides
Full Python in 20 slidesFull Python in 20 slides
Full Python in 20 slides
 
Variables in python
Variables in pythonVariables in python
Variables in python
 
Pointers in C/C++ Programming
Pointers in C/C++ ProgrammingPointers in C/C++ Programming
Pointers in C/C++ Programming
 
Modul metasploit
Modul metasploitModul metasploit
Modul metasploit
 
Inline Functions and Default arguments
Inline Functions and Default argumentsInline Functions and Default arguments
Inline Functions and Default arguments
 
Vb.net (loop structure)
Vb.net (loop structure)Vb.net (loop structure)
Vb.net (loop structure)
 

Similaire à Notes about moving from python to c++ py contw 2020

Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CSteffen Wenz
 
Object Oriented Technologies
Object Oriented TechnologiesObject Oriented Technologies
Object Oriented TechnologiesUmesh Nikam
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationGlobalLogic Ukraine
 
Pemrograman Python untuk Pemula
Pemrograman Python untuk PemulaPemrograman Python untuk Pemula
Pemrograman Python untuk PemulaOon Arfiandwi
 
Unit 2function in python.pptx
Unit 2function in python.pptxUnit 2function in python.pptx
Unit 2function in python.pptxvishnupriyapm4
 
An Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAn Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAdam Getchell
 
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Edureka!
 
Pres_python_talakhoury_26_09_2023.pdf
Pres_python_talakhoury_26_09_2023.pdfPres_python_talakhoury_26_09_2023.pdf
Pres_python_talakhoury_26_09_2023.pdfRamziFeghali
 
#OOP_D_ITS - 2nd - C++ Getting Started
#OOP_D_ITS - 2nd - C++ Getting Started#OOP_D_ITS - 2nd - C++ Getting Started
#OOP_D_ITS - 2nd - C++ Getting StartedHadziq Fabroyir
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance PythonIan Ozsvald
 
TWINS: OOP and FP - Warburton
TWINS: OOP and FP - WarburtonTWINS: OOP and FP - Warburton
TWINS: OOP and FP - WarburtonCodemotion
 
Chapter 1 Class 12 Computer Science Unit 1
Chapter 1 Class 12 Computer Science Unit 1Chapter 1 Class 12 Computer Science Unit 1
Chapter 1 Class 12 Computer Science Unit 1ssusera7a08a
 
Revision of the basics of python1 (1).pdf
Revision of the basics of python1 (1).pdfRevision of the basics of python1 (1).pdf
Revision of the basics of python1 (1).pdfoptimusnotch44
 
20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docx
20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docx20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docx
20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docxeugeniadean34240
 

Similaire à Notes about moving from python to c++ py contw 2020 (20)

Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in C
 
Object Oriented Technologies
Object Oriented TechnologiesObject Oriented Technologies
Object Oriented Technologies
 
C++ manual Report Full
C++ manual Report FullC++ manual Report Full
C++ manual Report Full
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
 
Pemrograman Python untuk Pemula
Pemrograman Python untuk PemulaPemrograman Python untuk Pemula
Pemrograman Python untuk Pemula
 
Unit 2function in python.pptx
Unit 2function in python.pptxUnit 2function in python.pptx
Unit 2function in python.pptx
 
C++ Programming
C++ ProgrammingC++ Programming
C++ Programming
 
An Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAn Overview Of Python With Functional Programming
An Overview Of Python With Functional Programming
 
Lecture2.ppt
Lecture2.pptLecture2.ppt
Lecture2.ppt
 
C++ Programming
C++ ProgrammingC++ Programming
C++ Programming
 
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
 
Pres_python_talakhoury_26_09_2023.pdf
Pres_python_talakhoury_26_09_2023.pdfPres_python_talakhoury_26_09_2023.pdf
Pres_python_talakhoury_26_09_2023.pdf
 
#OOP_D_ITS - 2nd - C++ Getting Started
#OOP_D_ITS - 2nd - C++ Getting Started#OOP_D_ITS - 2nd - C++ Getting Started
#OOP_D_ITS - 2nd - C++ Getting Started
 
C Programming Tutorial - www.infomtec.com
C Programming Tutorial - www.infomtec.comC Programming Tutorial - www.infomtec.com
C Programming Tutorial - www.infomtec.com
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
Pythonpresent
PythonpresentPythonpresent
Pythonpresent
 
TWINS: OOP and FP - Warburton
TWINS: OOP and FP - WarburtonTWINS: OOP and FP - Warburton
TWINS: OOP and FP - Warburton
 
Chapter 1 Class 12 Computer Science Unit 1
Chapter 1 Class 12 Computer Science Unit 1Chapter 1 Class 12 Computer Science Unit 1
Chapter 1 Class 12 Computer Science Unit 1
 
Revision of the basics of python1 (1).pdf
Revision of the basics of python1 (1).pdfRevision of the basics of python1 (1).pdf
Revision of the basics of python1 (1).pdf
 
20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docx
20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docx20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docx
20145-5SumII_CSC407_assign1.htmlCSC 407 Computer Systems II.docx
 

Plus de Yung-Yu Chen

Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for SpeedYung-Yu Chen
 
SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++Yung-Yu Chen
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a jobYung-Yu Chen
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of pythonYung-Yu Chen
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from PythonYung-Yu Chen
 
Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeYung-Yu Chen
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for SpeedYung-Yu Chen
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computingYung-Yu Chen
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering SoftwareYung-Yu Chen
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational WorkYung-Yu Chen
 

Plus de Yung-Yu Chen (10)

Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
 
SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of python
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from Python
 
Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New Rope
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for Speed
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computing
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational Work
 

Dernier

Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsCharlene Llagas
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detailhaiderbaloch3
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 

Dernier (20)

Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detail
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 

Notes about moving from python to c++ py contw 2020

  • 1. Notes about moving from Python to C++ Yung-Yu Chen (@yungyuc) PyConTW 2020, Tainan, Taiwan 1
  • 2. Why move? • Numerical software: Computer programs to solve scientific or mathematic problems. • Other names: Mathematical software, scientific software, technical software. • Python is a popular language for application experts to describe the problems and solutions, because it is easy to use. • Most of the computing systems (the numerical software) are designed in a hybrid architecture. • The computing kernel uses C++. • Python is chosen for the user-level API. 2
  • 3. What I do • By training, I am a computational scientist, focusing on applications of continuum mechanics. (I am not a computer scientist.) • In my day job, I write high-performance code for semiconductor applications of computational geometry and photolithography. • In my spare time, I am teaching a course ‘numerical software development’ in the department of computer science in NCTU. 3 twitter: https://twitter.com/yungyuc
 linkedin: https://www.linkedin.com/in/yungyuc/.
  • 4. Example: PDEs 4 Numerical simulations of conservation laws: ∂u ∂t + 3 ∑ k=1 ∂F(k) (u) ∂xk = 0 Use case: stress waves in 
 anisotropic solids Use case: compressible flows
  • 5. Example: OPC 5 photoresist silicon substrate photomask light source Photolithography in semiconductor fabrication wave length is only hundreds of nm image I want to project on the PR shape I need on the mask Optical proximity correction (OPC) (smaller than the wave length) write code to make it happen
  • 6. Python: the good is the bad • Dynamicity enables convention over compilation for productivity • It is so when you have good testing and documentation import math class Point: def __init__(self, x, y): self.x = x self.y = y @property def length(self): return math.hypot(self.x, self.y) 5.0print(Point(3, 4).length) print(Point("3", "4").length) Traceback (most recent call last): File "ambiguity.py", line 10, in <module> print(Point("3", "4").length) File "ambiguity.py", line 8, in length return math.hypot(self.x, self.y) TypeError: a float is required 6
  • 7. What is fast? worker(Data&, unsigned long, int): mov DWORD PTR [rdi+rsi*4], edx mov rax, rdi ret class Data { int m_storage[1024]; public: int & operator[] (size_t it) { return m_storage[it]; } }; Data & worker(Data & data, size_t idx, int value) { data[idx] = value; return data; } High-level language Machine code 7
  • 8. Dynamic typing is slow • Python trades runtime for convenience; everything is indirect and incurs runtime overhead def work(data, key, name, value): # Assign value to the data as a container with key. data[key] = value # Assign value to the data as an object with name. setattr(data, name, value) return data 8 • The innocent-looking Python code uses many checking functions under the hood
  • 9. int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value) { PyMappingMethods *m; if (o == NULL || key == NULL || value == NULL) { null_error(); return -1; } m = o->ob_type->tp_as_mapping; if (m && m->mp_ass_subscript) return m->mp_ass_subscript(o, key, value); if (o->ob_type->tp_as_sequence) { if (PyIndex_Check(key)) { Py_ssize_t key_value; key_value = PyNumber_AsSsize_t(key, PyExc_IndexError); if (key_value == -1 && PyErr_Occurred()) return -1; return PySequence_SetItem(o, key_value, value); } else if (o->ob_type->tp_as_sequence->sq_ass_item) { type_error(/* ... omit ... */); return -1; } } type_error(/* ... omit ... */); return -1; } int PyObject_SetAttr(PyObject *v, PyObject *name, PyObject *value) { PyTypeObject *tp = Py_TYPE(v); int err; if (!PyUnicode_Check(name)) { PyErr_Format(PyExc_TypeError, /* ... omit ... */); return -1; } Py_INCREF(name); PyUnicode_InternInPlace(&name); if (tp->tp_setattro != NULL) { err = (*tp->tp_setattro)(v, name, value); Py_DECREF(name); return err; } if (tp->tp_setattr != NULL) { const char *name_str = PyUnicode_AsUTF8(name); if (name_str == NULL) { Py_DECREF(name); return -1; } err = (*tp->tp_setattr)(v, (char *)name_str, value); Py_DECREF(name); return err; } Py_DECREF(name); _PyObject_ASSERT(name, name->ob_refcnt >= 1); if (tp->tp_getattr == NULL && tp->tp_getattro == NULL) PyErr_Format(PyExc_TypeError, /* ... omit ... */); else PyErr_Format(PyExc_TypeError, /* ... omit ... */); return -1; } data[key] = value setattr(data, name, value) (in Objects/abstract.c) (in Objects/object.c) static PyObject *__pyx_pf_6simple_work( PyObject *__pyx_v_data, PyObject *__pyx_v_key , PyObject *__pyx_v_name, PyObject *__pyx_v_value ) { PyObject_SetItem(__pyx_v_data, __pyx_v_key, __pyx_v_value); PyObject_SetAttr(__pyx_v_data, __pyx_v_name, __pyx_v_value); return __pyx_v_data; } def work(data, key, name, value): data[key] = value setattr(data, name, value) return data mapping sequence attribute name check check to set 9
  • 10. Speed is the king To pursue high-performance and flexibility simultaneously, the systems need to use the best tools at the proper layers. This is a path to develop such systems. 10 Python controls execution flow C++ manages resources Use C++ to generate fast assembly code Glue Python and C++
  • 11. PyObject uses malloc PyObject_MALLOC() is a wrapper to the standard C memory allocator malloc(). 11 PyObject * _PyObject_New(PyTypeObject *tp) { PyObject *op; op = (PyObject *) PyObject_MALLOC(_PyObject_SIZE(tp)); if (op == NULL) return PyErr_NoMemory(); return PyObject_INIT(op, tp); }
  • 12. Python call stack • Python maintains its own stack • It’s different from that of the ‘real machine’ • The two (C/C++ and Python) call stacks work very differently 12 #0 line 11: show_stack #1 line 23: caller #2 line 25: <module>: top-level module local_val is: defined outside import inspect def show_stack(): # Print calling stacks. f = inspect.currentframe() i = 0 while f: name = f.f_code.co_name if '<module>' == name: name += ': top-level module' print("#%d line %d: %s" % (i, f.f_lineno, name)) f = f.f_back i += 1 # Obtain a local variable defined in an outside stack. f = inspect.currentframe().f_back print('local_val is:', f.f_locals['local_var']) del f def caller(): local_var = "defined outside" show_stack() caller()
  • 13. Best practice: glue • For performance and compile-time checking, we do not want to use Python • For flow control and configuration, we do not want to use C++ • Glue tools: • Python C API : https://docs.python.org/3/c-api/index.html • Cython : https://cython.org/ • Boost.Python : https://www.boost.org/doc/libs/1_74_0/libs/python/ doc/html/index.html • Pybind11 : https://github.com/pybind/pybind11 13 I only listed tools that require manual generation of wrapping code and compilation.
  • 14. Pick pybind11 14 Tools Pros Cons Python C API always available manual reference count is hard to get right Cython easy to use no comprehensive C++ support Boost.Python comprehensive C++ support requires boost Pybind11 easy-to-use comprehensive modern C++ support require C++11
  • 15. Build system: cmake • Again, there are many tools for build systems • Plain make • Python distutils, setuptools, pip, etc. • cmake • cmake: • Cross platform • Consistent and up-to-date support 15
  • 16. How cmake works 16 #include <pybind11/pybind11.h> #include <pybind11/stl.h> #include <cmath> std::vector<double> distance( std::vector<double> const & x , std::vector<double> const & y) { std::vector<double> r(x.size()); for (size_t i = 0 ; i < x.size() ; ++i) { r[i] = std::hypot(x.at(i), y.at(i)); } return r; } PYBIND11_MODULE(_pybmod, mod) { namespace py = pybind11; mod.doc() = "simple pybind11 module"; mod.def("distance", &distance); } cmake_minimum_required(VERSION 3.9) project(pybmod) find_package(pybind11 REQUIRED) include_directories(${pybind11_INCLUDE_DIRS}) pybind11_add_module( _pybmod pybmod.cpp ) target_link_libraries(_pybmod PRIVATE ${MKL_LINKLINE}) set_target_properties(_pybmod PROPERTIES CXX_VISIBILITY_PRESET "default") add_custom_target(_pybmod_py COMMAND ${CMAKE_COMMAND} -E copy $<TARGET_FILE:_pybmod> ${PROJECT_SOURCE_DIR} DEPENDS _pybmod) $ mkdir -p build ; cd build $ cmake .. -DCMAKE_BUILD_TYPE=Release $ make -C build _pybmod_py $ python3 -c 'import _pybmod ; print(_pybmod.distance([3, 8], [4, 6]))' [5.0, 10.0] CMakeLists.txt pybmod.cpp
  • 17. Zero-copy: do it where it fits Python app C++ app C++ container Ndarray manage access Python app C++ app C++ container Ndarray manage accessa11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory buffer shared across language memory buffer shared across language Top (Python) - down (C++) Bottom (C++) - up (Python) Python app C++ app a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory buffer shared across language Ndarray C++ container 17 Not recom m ended for performance
  • 18. Share buffer from C++ Two handy approaches by pybind11 18 cls .def_buffer ( [](wrapped_type & self) { namespace py = pybind11; return py::buffer_info ( self.data() /* Pointer to buffer */ , sizeof(int8_t) /* Size of one scalar */ , py::format_descriptor<char>::format() /* Python struct-style format descriptor */ , 1 /* Number of dimensions */ , { self.size() } /* Buffer dimensions */ , { 1 } /* Strides (in bytes) for each index */ ); } ) cls .def_property_readonly ( "ndarray" , [](wrapped_type & self) { namespace py = pybind11; return py::array ( py::detail::npy_format_descriptor<int8_t>::dtype() /* Numpy dtype */ , { self.size() } /* Buffer dimensions */ , { 1 } /* Strides (in bytes) for each index */ , self.data() /* Pointer to buffer */ , py::cast(self.shared_from_this()) /* Owning Python object */ ); } ) .def_property_readonly(“ndarray2”, /* ... */) Return a distinct numpy ndarray: • The C++ class can return multiple buffers Python buffer protocol: • The C++ class itself is turned into an array description
  • 19. Check assembly • Make sure compilers generate good code for performance hotspot • Checking instead of writing: we don’t have time to use assembly for everything • If compilers don’t do a good job, before jumping into hand-written assembly, consider to use intrinsics: https://software.intel.com/ sites/landingpage/IntrinsicsGuide/ 19 vmovupd ymm0, ymmword [rsi + r9*8] vmulpd ymm0, ymm0, ymm0 vmovupd ymm1, ymmword [rdx + r9*8] vmulpd ymm1, ymm1, ymm1 vaddpd ymm0, ymm0, ymm1 vsqrtpd ymm0, ymm0 void calc_distance( size_t const n , double const * x , double const * y , double * r) { for (size_t i = 0 ; i < n ; ++i) { r[i] = std::sqrt(x[i]*x[i] + y[i]*y[i]); } } movupd xmm0, xmmword [rsi + r8*8] mulpd xmm0, xmm0 movupd xmm1, xmmword [rdx + r8*8] mulpd xmm1, xmm1 addpd xmm1, xmm0 sqrtpd xmm0, xmm1 -O3 -mavx2 -O3 # Use redare2 to print the assembly. r2 -Aqc "e scr.color=0 ; s sym.calc_distance_unsignedlong_doubleconst__doubleconst__double ; pdf" _pybmod.cpython-37m-darwin.so AVX: 256-bit-wide vectorization SSE: 128-bit-wide vectorization
  • 20. Manage resources • Python uses reference count to manage object lifecycle. You can do the in C++ by using shared pointer. 20 • Reference counting involves a lock and is slow. You didn’t experience it in Python because (i) other parts in Python are even slower and (ii) you didn’t have a choice. class ConcreteBuffer : public std::enable_shared_from_this<ConcreteBuffer> { private: struct ctor_passkey {}; public: static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes) { return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey()); } ConcreteBuffer(size_t nbytes, const ctor_passkey &) : m_nbytes(nbytes) , m_data(allocate(nbytes)) {} };
  • 21. Translate reference count • If reference count is used in C++, make sure it is correctly translated between Python. 21 pybind11::class_ < ConcreteBuffer , std::shared_ptr<ConcreteBuffer> >( mod, "ConcreteBuffer" , "Contiguous memory buffer" ); class ConcreteBuffer : public std::enable_shared_from_this<ConcreteBuffer> { private: struct ctor_passkey {}; public: static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes) { return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey()); } ConcreteBuffer(size_t nbytes, const ctor_passkey &) : m_nbytes(nbytes) , m_data(allocate(nbytes)) {} }; • If reference count is used in C++, make sure it is correctly translated between Python.
  • 22. Run Python in C++ • When C++ code puts together the architecture, oftentimes we want to keep Python code as little as possible. • But Python is used for scripting and it’s necessary to return Python objects for the scripts. • Pybind11 allows to write concise C++. It’s much more maintainable than in-line Python. 22 >>> joined = cppfunction() >>> print(joined) one, two, three py::str cppfunction() { py::list lst = py::list(); lst.append("one"); lst.append("two"); lst.append("three"); return py::str(", ").attr("join")(lst); } // No, don't do this! PyRun_SimpleString("', '.join('one', 'two', 'three')");
  • 23. Import and run Pybind11 API is very Pythonic 23 >>> show_modules() 1101 3000 void show_modules() { py::module sys = py::module::import("sys"); py::print(py::len(sys.attr("modules"))); py::print(sys.attr("getrecursionlimit")()); } def show_modules(): import sys print(len(sys.modules)) print(sys.getrecursionlimit())
  • 24. Spend time in compilation to save time in runtime Compiler does a lot of good. Make use of it as much as possible. 24 template < typename T > class SimpleArray { public: using value_type = T; using shape_type = small_vector<size_t>; // Determine content value element size during compile time. static constexpr size_t ITEMSIZE = sizeof(value_type); static constexpr size_t itemsize() { return ITEMSIZE; } // Straight constructor. explicit SimpleArray(size_t length) : m_buffer(ConcreteBuffer::construct(length * ITEMSIZE)) , m_shape{length} , m_stride{1} {} template< class InputIt > SimpleArray(InputIt first, InputIt last) : SimpleArray(last-first) { // Let STL decide how to optimize memory copy. std::copy(first, last, data()); } };
  • 25. Compiler reduces runtime errors • Wrap the wrappers: reduce duplicated code • Do it in compile time to reduce runtime errors 25 template < class Wrapper , class Wrapped , class Holder = std::unique_ptr<Wrapped> , class WrappedBase = Wrapped > class WrapBase { public: using wrapper_type = Wrapper; using wrapped_type = Wrapped; using wrapped_base_type = WrappedBase; using holder_type = Holder; // Treat inheritance hierarchy. using class_ = typename std::conditional_t < std::is_same< Wrapped, WrappedBase >::value , pybind11::class_ < wrapped_type, holder_type > , pybind11::class_ <wrapped_type, wrapped_base_type, holder_type> >; // Singleton. static wrapper_type & commit(pybind11::module & mod) { static wrapper_type derived(mod); return derived; } class_ & cls() { return m_cls; } protected: // Register through construction. template <typename... Extra> WrapBase( pybind11::module & mod , char const * pyname, char const * pydoc , const Extra & ... extra ) : m_cls(mod, pyname, pydoc, extra ...) {} private: class_ m_cls; };
  • 26. Decouple resource management from algorithms 26 • Fixed-size contiguous data buffer template < typename T > class SimpleArray { public: using value_type = T; using shape_type = small_vector<size_t>; static constexpr size_t ITEMSIZE = sizeof(value_type); explicit SimpleArray(size_t length) : m_buffer(ConcreteBuffer::construct(length * ITEMSIZE)) , m_shape{length} , m_stride{1} {} private: std::shared_ptr<ConcreteBuffer> m_buffer; shape_type m_shape; shape_type m_stride; }; class ConcreteBuffer : public std::enable_shared_from_this<ConcreteBuffer> { private: struct ctor_passkey {}; public: static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes) { return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey()); } ConcreteBuffer(size_t nbytes, const ctor_passkey &) : m_nbytes(nbytes), m_data(allocate(nbytes)) {} private: using unique_ptr_type = std::unique_ptr<int8_t, std::default_delete<int8_t[]>>; size_t m_nbytes; unique_ptr_type m_data; }; • Descriptive data object owning the data buffer
  • 27. Encapsulate complex calculation without losing performance 27 // A solution element. class Selm : public ElementBase<Selm> { public: value_type dxneg() const { return x()-xneg(); } value_type dxpos() const { return xpos()-x(); } value_type xctr() const { return (xneg()+xpos())/2; } value_type const & so0(size_t iv) const { return field().so0(xindex(), iv); } value_type & so0(size_t iv) { return field().so0(xindex(), iv); } value_type const & so1(size_t iv) const { return field().so1(xindex(), iv); } value_type & so1(size_t iv) { return field().so1(xindex(), iv); } value_type const & cfl() const { return field().cfl(xindex()); } value_type & cfl() { return field().cfl(xindex()); } value_type xn(size_t iv) const { return field().kernel().calc_xn(*this, iv); } value_type xp(size_t iv) const { return field().kernel().calc_xp(*this, iv); } value_type tn(size_t iv) const { return field().kernel().calc_tn(*this, iv); } value_type tp(size_t iv) const { return field().kernel().calc_tp(*this, iv); } value_type so0p(size_t iv) const { return field().kernel().calc_so0p(*this, iv); } void update_cfl() { return field().kernel().update_cfl(*this); } }; xj− 1 2 xs j− 1 2 xj+ 1 2 xs j+ 1 2 x+ j− 1 2 = x− j+ 1 2 xj = xs j (h∗ )n j− 1 2 ,+ · (0, −∆x+ j− 1 2 ) (h∗ )n j+ 1 2 ,− · (0, −∆x− j+ 1 2 ) h n+ 1 2 j · (0, ∆xj) (h∗ )n,+ j− 1 2 · (−∆t 2 , 0) (h∗ )n,+ j+ 1 2 · (∆t 2 , 0) 3 solution elements around a compound conservation element (Part of a code for the space-time conservation element and solution element (CESE) method)
  • 28. Conclusion • Speed is the king for numerical calculation • Use Python for configuration and C++ for number crunching: glue them • Keep in mind: Software engineering, zero-copy data buffer, read assembly, manage resources, use compiler for speed and safety • More information: https://github.com/yungyuc/nsd 28