SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
© 2015 Nexedi
wendelin.core
effortless out-of-core NumPy
2014-04-03 – Paris
© 2015 Nexedi
Who am I?
● Kirill Smelkov
● Senior developer at Nexedi
● Author of wendelin.core
● Contributor to linux, git and scientific libraries
from time to time
● kirr@nexedi.com
© 2015 Nexedi
Agenda
● Where do we come from
● Five problems to solves
● The solution
● Future Roadmap
© 2015 Nexedi
Where do we come from?
© 2015 Nexedi
Nexedi
● Possibly Largest OSS Publisher in Europe
– ERP5: ERP, CRM, ECM, e-business framework
– SlapOS: distributed mesh cloud operation system
– NEO: distributed transactional NoSQL database
– Wendelin: out-of-core big data based on NumPy
– re6st: resilient IPv6 mesh overlay network
– RenderJS: javascript component system
– JIO: javascript virtual database and virtual filesystem
– cloudooo: multimedia conversion server
– Web Runner: web based Platform-as-a-Service (PaaS) and IDE
– OfficeJS: web office suite based on RenderJS and JIO
© 2015 Nexedi
© 2015 Nexedi
© 2015 Nexedi
+
?
Application Convergence
© 2015 Nexedi
neoctl
Sate access
Command control
Master
OID & TID allocation
Synchronisation
Load balancing
Storage
Object data
Transaction data
Partition table
Application
ZODB
neo.client
Data
ControlAdmin
State archival
Command proxy
ERP5 Storage: NEO
© 2015 Nexedi
Standard Hardware no router / no SAN
– 2 x 10 Gbps
– 2 x 6 core Xeon CPU
– 512 GB RAM
– 4 x 1 TB SSD
– 1 x M2090 GPU
x 160
x 32
+
+
x 320
– 10 Gbps
– Unmanaged
© 2015 Nexedi
Five Problems to Solve
© 2015 Nexedi
It is All About NumPy
© 2015 Nexedi
Problem 1: Persistent NumPy
● How to store NumPy arrays in a database?
– in NEO?
– in NoSQL?
– in SQL?
© 2015 Nexedi
Problem 2: Distributed NumPy
● How to share NumPy arrays in a cluster?
– One PC, many Python processes
– Many PC, many Python processes
© 2015 Nexedi
Problem 3: Out-of-core NumPy
● How to load big NumPy arrays in small RAM?
– ERP5: “it should work even if it does not work”
– Stopping business is not an option
(because of not enough RAM)
© 2015 Nexedi
Problem 4: Transactional NumPy
● How to make NumPy arrays transaction safe?
– Exception handling
– Concurrent writes
– Distributed computing
© 2015 Nexedi
Problem 5: Compatibility
● Compatibility with NumPy-based stack is a must
● Native BLAS support is a must
● Cython/FORTRAN/C/C++ support is a must
● Code rewrite is not an option
– Blaze: not NumPy compatible below Python level
– Dato: not NumPy compatible
© 2015 Nexedi
The Solution
© 2015 Nexedi
Unsolutions
● Update NumPy & libraries with calling
notification hooks when memory is changed
→ not practical
– There is a lot of code in numpy and lot of libraries around numpy
– Catching them all would be a huge task
● Compare array data to original array content at
commit time and store only found-to-be-
changed parts → not good
– At every commit whole array data has to be read/analyzed and
array data can be very big
© 2015 Nexedi
Remember mmap? READ
● Region of memory mapped by kernel to a file
● Memory pages start with NONE protection
→ CPU can not read nor write
● Whenever read request comes from CPU,
kernel traps it (thanks to MMU), loads content
for that page from file, and resumes original
read
© 2015 Nexedi
Remember mmap? WRITE
● Whenever write request comes from CPU,
kernel traps it, marks the page as DIRTY,
unprotects it and resumes original write
→ kernel knows which pages were modified
● Whenever application wants to make sure
modified data is stored back to file (msync),
kernel goes over list of dirty pages and writes
their content back to file
© 2015 Nexedi
Partial Conclusion
● If we manage to represent arrays as files, we'll
get “track-changes-to-content” from kernel
© 2015 Nexedi
FUSE ?
● FUSE & virtual filesystem representing "glued"
arrays from ZODB BTree & objects
● Problem 1: does not work with huge pages
– Performance issues
– Not easy to fix
● Problem 2: no support for commit / abort
– Transaction issues
© 2015 Nexedi
UVMM: Userspace Virtual Memory Manager
● Trap write access to memory via installing
SIGSEGV signal handler
© 2015 Nexedi
UVMM ON CPU WRITE
● SIGSEGV handler gets notified,
● Marks corresponding array block as dirty
● Adjust memory protection to be read-write
● Resumes write instruction
● → we know which array parts were modified
© 2015 Nexedi
UVMM ON CPU READ
● Set pages initial protection to PROT_NONE
→ no-read and no-write
● First load in SIGSEGV handler
● When RAM is tight, we can "forget" already
loaded (but not-yet modified) memory parts and
free RAM for loading new data
© 2015 Nexedi
UVMM LIMITS ?
● Array size is only limited by virtual memory
address space size
→ 127TB on Linux/amd64 (today)
● Future Linux kernel may support more
© 2015 Nexedi
Is it safe to do work in SIGSEGV handler?
● Short answer: YES
● Long answer: www.wendelin.io
© 2015 Nexedi
from wendelin.bigfile import BigFile
# bigfile with data storage in 'some backend'
class BigFile_SomeBackend(BigFile):
.blksize = … # file is stored in block of size
def loadblk(self, blk, buf) # load file block #blk to memory buffer `buf`
def storeblk(self, blk, buf) # store data from memory buffer `buf` to file
# block blk
f = BigFile_SomeBackend(...)
Tutorial: init a BigFile backend
© 2015 Nexedi
# BigFile handle is a representation of file snapshot that could be locally
# modified in-memory. The changes could be later either discarded or stored
# back to file. One file can have many opened handles each with its own
# modifications.
fh = f.fileh_open()
# memory mapping of fh
vma = fh.mmap(pgoffset=0, pglen=N)
# vma exposes memoryview/buffer interfaces
mem = memoryview(vma)
# now we can do with `mem` whatever we like
...
fh.dirty_discard() # to forget all changes done to `mem` memory
fh.dirty_writeout(...) # to store changes back to file
BigFile Handle: BigFile as Memory
© 2015 Nexedi
from webdelin.bigfile.file_zodb import ZBigFile
import transaction
f = ZBigFile() # create anew
f = root['...'].some.object # load saved state from database
# the same as with plain BigFile (previous example)
fh = fileh_open()
vma = fh.mmap(0, N)
mem = memoryview(vma)
# we can also modify other objects living in ZODB
transaction.abort() # to abort all changes to mem and other objects
transaction.commit() # to commit all changes to mem and other objects
ZBigFile: ZODB & Transactions
© 2015 Nexedi
# f - some BigFile
# n - some (large) number
fh = f.fileh_open() # handle to bigfile (see slide ...)
A = BigArray(shape=(n,10), dtype=uint32, fh)
a = A[0:3*(1<<30), :] # real ndarray viewing first 3 giga-rows (= ~120GB) of
# data from f
# NOTE 120GB can be significantly > of RAM available
a.mean() # computes mean of items in above range
# this call is just an ndarray.mean() call and code
# which works is the code in NumPy.
# NOTE data will be loaded and freed by virtual memory
# manager transparently to client code which computes
# the mean
BigArray: “ndarray” on top of BigFile
© 2015 Nexedi
a[2] = ...
...
fh.dirty_discard() # to discard, or
fh.dirty_writeout() # to write
BigArray: Transactions
© 2015 Nexedi
from wendelin.bigarra.array_zodb import ZbigArray
import transaction
# root is connection to oped database
root['sensor_data'] = A = ZBigArray(shape=..., dtype=...)
# populate A with data
A[2] = 1
# compute mean
A.mean()
# abort / commit changes
transaction.abort()
transaction.commit()
ZBigArray: ZODB & Transactions
© 2015 Nexedi
NEO and ZBigArray
ZBigArray
1 2 3 4 5 6 7 8 9 10 11 12
5
9
6
10
7
11
1 2 3 4
8
12
© 2015 Nexedi
Future Improvements
● Temporary arrays created by NumPy libraries
● Performance
● Multithreading
© 2015 Nexedi
Future Roadmap
© 2015 Nexedi
Roadmap
● Make wendelin.core fast
– userfaultfd, filesystem-based approach
– remove use of pickles
– remove large temporary arrays in NumPy, etc.
● Yet, you can start using wendelin.core now!
– Persistent
– Distributed
– Out-of-core
– Transactional
– Virtually no change to your code needed
– Open Source
www.wendelin.io
© 2015 Nexedi
wendelin.core
effortless out-of-core NumPy
2014-04-03 – Paris
www.wendelin.io

Contenu connexe

Similaire à PyData Paris 2015 - Track 3.4 Kirill Smelkov

Week 4 B IP Subnetting Lab Essay
Week 4 B IP Subnetting Lab EssayWeek 4 B IP Subnetting Lab Essay
Week 4 B IP Subnetting Lab Essay
Amanda Brady
 
Defcon 22 - Stitching numbers - generating rop payloads from in memory numbers
Defcon 22 - Stitching numbers - generating rop payloads from in memory numbersDefcon 22 - Stitching numbers - generating rop payloads from in memory numbers
Defcon 22 - Stitching numbers - generating rop payloads from in memory numbers
Alexandre Moneger
 

Similaire à PyData Paris 2015 - Track 3.4 Kirill Smelkov (20)

[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
 
MySQL Cluster Asynchronous replication (2014)
MySQL Cluster Asynchronous replication (2014) MySQL Cluster Asynchronous replication (2014)
MySQL Cluster Asynchronous replication (2014)
 
Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and ceph
 
Week 4 B IP Subnetting Lab Essay
Week 4 B IP Subnetting Lab EssayWeek 4 B IP Subnetting Lab Essay
Week 4 B IP Subnetting Lab Essay
 
WekaIO: Making Machine Learning Compute Bound Again
WekaIO: Making Machine Learning Compute Bound AgainWekaIO: Making Machine Learning Compute Bound Again
WekaIO: Making Machine Learning Compute Bound Again
 
OSMC 2012 | Shinken by Jean Gabès
OSMC 2012 | Shinken by Jean GabèsOSMC 2012 | Shinken by Jean Gabès
OSMC 2012 | Shinken by Jean Gabès
 
Scaling PHP apps
Scaling PHP appsScaling PHP apps
Scaling PHP apps
 
How to deploy & optimize eZ Publish (2014)
How to deploy & optimize eZ Publish (2014)How to deploy & optimize eZ Publish (2014)
How to deploy & optimize eZ Publish (2014)
 
PyData Paris 2015 - Track 4.1 Jean-Paul Smets et Ivan Tiagov
PyData Paris 2015 - Track 4.1 Jean-Paul Smets et Ivan TiagovPyData Paris 2015 - Track 4.1 Jean-Paul Smets et Ivan Tiagov
PyData Paris 2015 - Track 4.1 Jean-Paul Smets et Ivan Tiagov
 
Defcon 22 - Stitching numbers - generating rop payloads from in memory numbers
Defcon 22 - Stitching numbers - generating rop payloads from in memory numbersDefcon 22 - Stitching numbers - generating rop payloads from in memory numbers
Defcon 22 - Stitching numbers - generating rop payloads from in memory numbers
 
My "Perfect" Toolchain Setup for Grails Projects
My "Perfect" Toolchain Setup for Grails ProjectsMy "Perfect" Toolchain Setup for Grails Projects
My "Perfect" Toolchain Setup for Grails Projects
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS Meetup
 
Introduction to PaaS and Heroku
Introduction to PaaS and HerokuIntroduction to PaaS and Heroku
Introduction to PaaS and Heroku
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
MongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseMongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL Database
 
Better Code: Concurrency
Better Code: ConcurrencyBetter Code: Concurrency
Better Code: Concurrency
 
Management and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesManagement and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - Slides
 
OpenStack Days Krakow
OpenStack Days KrakowOpenStack Days Krakow
OpenStack Days Krakow
 
Cook your KV
Cook your KVCook your KV
Cook your KV
 
Instant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositoriesInstant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositories
 

Plus de Pôle Systematic Paris-Region

Plus de Pôle Systematic Paris-Region (20)

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
 
Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?
 
Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
 
Osis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritageOsis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritage
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
 

Dernier

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Dernier (20)

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 

PyData Paris 2015 - Track 3.4 Kirill Smelkov

  • 1. © 2015 Nexedi wendelin.core effortless out-of-core NumPy 2014-04-03 – Paris
  • 2. © 2015 Nexedi Who am I? ● Kirill Smelkov ● Senior developer at Nexedi ● Author of wendelin.core ● Contributor to linux, git and scientific libraries from time to time ● kirr@nexedi.com
  • 3. © 2015 Nexedi Agenda ● Where do we come from ● Five problems to solves ● The solution ● Future Roadmap
  • 4. © 2015 Nexedi Where do we come from?
  • 5. © 2015 Nexedi Nexedi ● Possibly Largest OSS Publisher in Europe – ERP5: ERP, CRM, ECM, e-business framework – SlapOS: distributed mesh cloud operation system – NEO: distributed transactional NoSQL database – Wendelin: out-of-core big data based on NumPy – re6st: resilient IPv6 mesh overlay network – RenderJS: javascript component system – JIO: javascript virtual database and virtual filesystem – cloudooo: multimedia conversion server – Web Runner: web based Platform-as-a-Service (PaaS) and IDE – OfficeJS: web office suite based on RenderJS and JIO
  • 9. © 2015 Nexedi neoctl Sate access Command control Master OID & TID allocation Synchronisation Load balancing Storage Object data Transaction data Partition table Application ZODB neo.client Data ControlAdmin State archival Command proxy ERP5 Storage: NEO
  • 10. © 2015 Nexedi Standard Hardware no router / no SAN – 2 x 10 Gbps – 2 x 6 core Xeon CPU – 512 GB RAM – 4 x 1 TB SSD – 1 x M2090 GPU x 160 x 32 + + x 320 – 10 Gbps – Unmanaged
  • 11. © 2015 Nexedi Five Problems to Solve
  • 12. © 2015 Nexedi It is All About NumPy
  • 13. © 2015 Nexedi Problem 1: Persistent NumPy ● How to store NumPy arrays in a database? – in NEO? – in NoSQL? – in SQL?
  • 14. © 2015 Nexedi Problem 2: Distributed NumPy ● How to share NumPy arrays in a cluster? – One PC, many Python processes – Many PC, many Python processes
  • 15. © 2015 Nexedi Problem 3: Out-of-core NumPy ● How to load big NumPy arrays in small RAM? – ERP5: “it should work even if it does not work” – Stopping business is not an option (because of not enough RAM)
  • 16. © 2015 Nexedi Problem 4: Transactional NumPy ● How to make NumPy arrays transaction safe? – Exception handling – Concurrent writes – Distributed computing
  • 17. © 2015 Nexedi Problem 5: Compatibility ● Compatibility with NumPy-based stack is a must ● Native BLAS support is a must ● Cython/FORTRAN/C/C++ support is a must ● Code rewrite is not an option – Blaze: not NumPy compatible below Python level – Dato: not NumPy compatible
  • 18. © 2015 Nexedi The Solution
  • 19. © 2015 Nexedi Unsolutions ● Update NumPy & libraries with calling notification hooks when memory is changed → not practical – There is a lot of code in numpy and lot of libraries around numpy – Catching them all would be a huge task ● Compare array data to original array content at commit time and store only found-to-be- changed parts → not good – At every commit whole array data has to be read/analyzed and array data can be very big
  • 20. © 2015 Nexedi Remember mmap? READ ● Region of memory mapped by kernel to a file ● Memory pages start with NONE protection → CPU can not read nor write ● Whenever read request comes from CPU, kernel traps it (thanks to MMU), loads content for that page from file, and resumes original read
  • 21. © 2015 Nexedi Remember mmap? WRITE ● Whenever write request comes from CPU, kernel traps it, marks the page as DIRTY, unprotects it and resumes original write → kernel knows which pages were modified ● Whenever application wants to make sure modified data is stored back to file (msync), kernel goes over list of dirty pages and writes their content back to file
  • 22. © 2015 Nexedi Partial Conclusion ● If we manage to represent arrays as files, we'll get “track-changes-to-content” from kernel
  • 23. © 2015 Nexedi FUSE ? ● FUSE & virtual filesystem representing "glued" arrays from ZODB BTree & objects ● Problem 1: does not work with huge pages – Performance issues – Not easy to fix ● Problem 2: no support for commit / abort – Transaction issues
  • 24. © 2015 Nexedi UVMM: Userspace Virtual Memory Manager ● Trap write access to memory via installing SIGSEGV signal handler
  • 25. © 2015 Nexedi UVMM ON CPU WRITE ● SIGSEGV handler gets notified, ● Marks corresponding array block as dirty ● Adjust memory protection to be read-write ● Resumes write instruction ● → we know which array parts were modified
  • 26. © 2015 Nexedi UVMM ON CPU READ ● Set pages initial protection to PROT_NONE → no-read and no-write ● First load in SIGSEGV handler ● When RAM is tight, we can "forget" already loaded (but not-yet modified) memory parts and free RAM for loading new data
  • 27. © 2015 Nexedi UVMM LIMITS ? ● Array size is only limited by virtual memory address space size → 127TB on Linux/amd64 (today) ● Future Linux kernel may support more
  • 28. © 2015 Nexedi Is it safe to do work in SIGSEGV handler? ● Short answer: YES ● Long answer: www.wendelin.io
  • 29. © 2015 Nexedi from wendelin.bigfile import BigFile # bigfile with data storage in 'some backend' class BigFile_SomeBackend(BigFile): .blksize = … # file is stored in block of size def loadblk(self, blk, buf) # load file block #blk to memory buffer `buf` def storeblk(self, blk, buf) # store data from memory buffer `buf` to file # block blk f = BigFile_SomeBackend(...) Tutorial: init a BigFile backend
  • 30. © 2015 Nexedi # BigFile handle is a representation of file snapshot that could be locally # modified in-memory. The changes could be later either discarded or stored # back to file. One file can have many opened handles each with its own # modifications. fh = f.fileh_open() # memory mapping of fh vma = fh.mmap(pgoffset=0, pglen=N) # vma exposes memoryview/buffer interfaces mem = memoryview(vma) # now we can do with `mem` whatever we like ... fh.dirty_discard() # to forget all changes done to `mem` memory fh.dirty_writeout(...) # to store changes back to file BigFile Handle: BigFile as Memory
  • 31. © 2015 Nexedi from webdelin.bigfile.file_zodb import ZBigFile import transaction f = ZBigFile() # create anew f = root['...'].some.object # load saved state from database # the same as with plain BigFile (previous example) fh = fileh_open() vma = fh.mmap(0, N) mem = memoryview(vma) # we can also modify other objects living in ZODB transaction.abort() # to abort all changes to mem and other objects transaction.commit() # to commit all changes to mem and other objects ZBigFile: ZODB & Transactions
  • 32. © 2015 Nexedi # f - some BigFile # n - some (large) number fh = f.fileh_open() # handle to bigfile (see slide ...) A = BigArray(shape=(n,10), dtype=uint32, fh) a = A[0:3*(1<<30), :] # real ndarray viewing first 3 giga-rows (= ~120GB) of # data from f # NOTE 120GB can be significantly > of RAM available a.mean() # computes mean of items in above range # this call is just an ndarray.mean() call and code # which works is the code in NumPy. # NOTE data will be loaded and freed by virtual memory # manager transparently to client code which computes # the mean BigArray: “ndarray” on top of BigFile
  • 33. © 2015 Nexedi a[2] = ... ... fh.dirty_discard() # to discard, or fh.dirty_writeout() # to write BigArray: Transactions
  • 34. © 2015 Nexedi from wendelin.bigarra.array_zodb import ZbigArray import transaction # root is connection to oped database root['sensor_data'] = A = ZBigArray(shape=..., dtype=...) # populate A with data A[2] = 1 # compute mean A.mean() # abort / commit changes transaction.abort() transaction.commit() ZBigArray: ZODB & Transactions
  • 35. © 2015 Nexedi NEO and ZBigArray ZBigArray 1 2 3 4 5 6 7 8 9 10 11 12 5 9 6 10 7 11 1 2 3 4 8 12
  • 36. © 2015 Nexedi Future Improvements ● Temporary arrays created by NumPy libraries ● Performance ● Multithreading
  • 38. © 2015 Nexedi Roadmap ● Make wendelin.core fast – userfaultfd, filesystem-based approach – remove use of pickles – remove large temporary arrays in NumPy, etc. ● Yet, you can start using wendelin.core now! – Persistent – Distributed – Out-of-core – Transactional – Virtually no change to your code needed – Open Source www.wendelin.io
  • 39. © 2015 Nexedi wendelin.core effortless out-of-core NumPy 2014-04-03 – Paris www.wendelin.io