SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
'FUSE'ing Python for rapid development of
        storage efficient file-system




               PyCon APAC ‘12
          Singapore, Jun 07-09, 2012

        Chetan Giridhar, Vishal Kanaujia
File Systems
• Provides way to organize, store, retrieve and manage
  information
• Abstraction layer
• File system:                                           User
                                         User space
   – Maps name to an object
                                         Kernel space
   – Objects to file contents
                                                      File System
• File system types (format):
                                                      Hardware
   – Media based file systems (FAT, ext2)
   – Network file systems (NFS)
   – Special-purpose file systems (procfs)
• Services
   – open(), read(), write(), close()…
Virtual File-system
• To support multiple FS in *NIX
• VFS is an abstract layer in kernel
• Decouples file system implementation from the
  interface (POSIX API)
   – Common API serving different file system types
• Handles user calls related to file systems.
   – Implements generic FS actions
   – Directs request to specific code to handle the request
• Associate (and disassociate) devices with instances of
  the appropriate file system.
File System: VFS

                              User                   Myapp.py: open(), read()


                            glibc.so                  glibc.so: open(), read()
System call
 interface                                           System call: sys_open(),
                                                           sys_read()
                               VFS
Kernel space                                             VFS: vfs_open(),
                                                           vfs_read()

               ext2           NFS           procfs
                      Block layer/ Device
                       drivers/Hardware
Developing FS in *NIX
• In-kernel file-systems (traditionally)
• It is a complex task
     •   Understanding kernel libraries and modules
     •   Development experience in kernel space
     •   Managing disk i/o
     •   Time consuming and tedious development
          – Frequent kernel panic
          – Higher testing efforts
     • Kernel bloating and side effects like security
Solution: User space
• In user space:
  – Shorter development cycle
  – Easy to update fixes, test and distribute
  – More flexibility
     • Programming tools, debuggers, and libraries as you
       have if you were developing standard *NIX applications
• User-space file-systems
  – File systems become regular applications (as
    opposed to kernel extensions)
FUSE (Filesystem in USErspace)
• Implement a file system in user-space
   – no kernel code required!
• Secure, non-privileged mounts
• Useful to develop “virtual” file-systems
   – Allows you to imagine “anything” as a file ☺
   – local disk, across the network, from memory, or any other
     combination
• User operates on a mounted instance of FS:
   - Unix utilities
   - POSIX libraries
FUSE: Diving deeper
  Myfile.py:
   open()                        CustomFS: open()




    glibc.so                        libfuse.so



VFS: vfs_open()        fuse.ko      /dev/fuse
FUSE | develop
• Choices of development in C, C++, Java, … and of course
  Python!
• Python interface for FUSE
   – (FusePython: most popularly used)
• Open source project
   – http://fuse.sourceforge.net/
• For ubuntu systems:
   $sudo apt-get instatall python-fuse
   $mkdir ./mount_point
   $python myfuse.py ./mount_point
   $fusermount -u ./mount_point
FUSE API Overview
• File management
   – open(path)
   – create(path, mode)
   – read(path, length, offset)
   – write(path, data, offset)
 • Directory and file system management
   – unlink(path)
   – readdir(path)
• Metadata operations
   – getattr(path)
   – chmod(path, mode)
   – chown(path, uid, gid)
seFS – storage efficient FS
• A prototype, experimental file system with:
   – Online data de-duplication (SHA1)
   – Compression (text based)
• SQLite
• Ubuntu 11.04, Python-Fuse Bindings
• Provides following services:
   open()           write()        chmod()
   create()         readdir()      chown()
   read()           unlink()
seFS Architecture
                                     <<FUSE code>>
               Your                    myfuse.py
            application
                                         File
                                       System
                                      Operations
             <<SQLite DB
             Interface>>
               seFS.py

                                                         storage
                                                       Efficiency =
                                                     De-duplication
                                                             +
  <<pylibrary>>                                       Compression
SQLiteHandler.py
 Compression.py            seFS DB
    Crypto.py
seFS: Database
CREATE TABLE metadata(                    CREATE TABLE data(
       "id" INTEGER,                           "id" INTEGER
      "abspath" TEXT,                         PRIMARY KEY
     "length" INTEGER,                      AUTOINCREMENT,
       "mtime" TEXT,        seFS                "sha" TEXT,
       "ctime" TEXT,                           "data" BLOB,
       "atime" TEXT,
                           Schema            "length" INTEGER,
     "inode" INTEGER);                     "compressed" BLOB);




                         data table




                         metadata table
seFS API flow
$touch abc     $rm abc         $cat >> abc    User
                                              Operations




getattr()                         getattr()
               getattr()
                                   open()
 create()      access()                       seFS APIs
                                   write()
 open()        unlink()
                                   flush()
 create()
                                  release()
release()
                           storage
               seFS DB     Efficiency
seFS: Code
def getattr(self, path):
    sefs = seFS()
    stat = fuse.stat()                                   stat.stat_ino =
    context = fuse.FuseGetContext()                                  int(sefs.getinode(path))
    #Root
    if path == '/':                                       # Get the file size from DB
         stat.stat_nlink = 2                              if sefs.getlength(path) is not None:
         stat.stat_mode = stat.S_IFDIR | 0755                stat.stat_size =
    else:                                                            int(sefs.getlength(path))
         stat.stat_mode = stat.S_IFREG | 0777             else:
         stat.stat_nlink = 1                                  stat.stat_size = 0
         stat.stat_uid, stat.stat_gid =                   return stat
                       (context ['uid'], context   else:
['gid'])                                              return - errno.ENOENT

        # Search for this path in DB
        ret = sefs.search(path)
        # If file exists in DB, get its times
        if ret is True:
            tup = sefs.getutime(path)
            stat.stat_mtime =
              int(tup[0].strip().split('.')[0])
            stat.stat_ctime =
              int(tup[1].strip().split('.')[0])
            stat.stat_atime =
              int(tup[2].strip().split('.')[0])
seFS: Code…
 def create(self, path,flags=None,mode=None):
    sefs = seFS()
    ret = self.open(path, flags)
    if ret == -errno.ENOENT:
    #Create the file in database
         ret = sefs.open(path)
         t = int(time.time())
         mytime = (t, t, t)
         ret = sefs.utime(path, mytime)
         self.fd = len(sefs.ls())
         sefs.setinode(path, self.fd)
         return 0


def write(self, path, data, offset):
    length = len(data)
    sefs = seFS()
    ret = sefs.write(path, data)
    return length
seFS: Learning
• Design your file system and define the
  objectives first, before development
• Database schema is crucial
• skip implementing functionality your file
  system doesn’t intend to support
• Knowledge on FUSE API is essential
  – FUSE APIs are look-alike to standard POSIX APIs
  – Limited documentation of FUSE API
• Performance?
Conclusion
• Development of FS is very easy with FUSE
• Python aids RAD with Python-Fuse bindings
• seFS: Thought provoking implementation
• Creative applications – your needs and
  objectives
• When are you developing your own File
  system?! ☺
Further Read
• Sample Fuse based File systems
  – Sshfs
  – YoutubeFS
  – Dedupfs
  – GlusterFS
• Python-Fuse bindings
  – http://fuse.sourceforge.net/
• Linux internal manual
Contact Us
• Chetan Giridhar
  – http://technobeans.com
  – cjgiridhar@gmail.com
• Vishal Kanaujia
  – http://freethreads.wordpress.com
  – vishalkanaujia@gmail.com
Backup
Agenda
• The motivation
• Intro to *NIX File Systems
• Trade off: code in user and kernel space
• FUSE?
• Hold on – What’s VFS?
• Diving into FUSE internals
• Design and develop your own File System with
  Python-FUSE bindings
• Lessons learnt
• Python-FUSE: Creative applications/ Use-cases
User-space and Kernel space
• Kernel-space
   – Kernel code including device drivers
   – Kernel resources (hardware)
   – Privileged user
• User-space
   – User application runs
   – Libraries dealing with kernel space
   – System resources
FUSE: Internals
• Three major components:
  – Userspace library (libfuse.*)
  – Kernel module (fuse.ko)
  – Mount utility (fusermount)
• Kernel module hooks in to VFS
  – Provides a special device “/dev/fuse”
     • Can be accessed by a user-space process
     • Interface: user-space application and fuse kernel
       module
     • Read/ writes occur on a file descriptor of /dev/fuse
FUSE Workflow
                        Custatom File Systatem




   User (file I/O)         User-space FUSE lib

                                                 User space


                                                 Kernel space

Virtual File Systatem       FUSE: kernel lib
Facts and figures
• seFS – online storage efficiency
• De-duplication/ compression
   – Managed catalogue information (file meta-data rarely changes)
   – Compression encoded information
• Quick and easy prototyping (Proof of concept)
• Large dataset generation
   – Data generated on demand
Creative applications: FUSE based File
               systems
• SSHFS: Provides access to a remote file-system
  through SSH
• WikipediaFS: View and edit Wikipedia articles as
  if they were real files
• GlusterFS: Clustered Distributed Filesystem
  having capability to scale up to several petabytes.
• HDFS: FUSE bindings exist for the open
  source Hadoop distributed file system
• seFS: You know it already ☺
Fuse'ing python for rapid development of storage efficient FS
Fuse'ing python for rapid development of storage efficient FS

Contenu connexe

Tendances

Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedAdrian Huang
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Adrian Huang
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Nag Arvind Gudiseva
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
 
Javase7 1641812
Javase7 1641812Javase7 1641812
Javase7 1641812Vinay H G
 
Rhel 6.2 complete ebook
Rhel 6.2 complete ebookRhel 6.2 complete ebook
Rhel 6.2 complete ebookYash Gulati
 
Apache hadoop 2_installation
Apache hadoop 2_installationApache hadoop 2_installation
Apache hadoop 2_installationsushantbit04
 
Linux command line cheatsheet
Linux command line cheatsheetLinux command line cheatsheet
Linux command line cheatsheetWe Ihaveapc
 
Rhel 6.2 complete ebook
Rhel 6.2  complete ebookRhel 6.2  complete ebook
Rhel 6.2 complete ebookYash Gulati
 
Hadoop installation on windows
Hadoop installation on windows Hadoop installation on windows
Hadoop installation on windows habeebulla g
 
Course 102: Lecture 5: File Handling Internals
Course 102: Lecture 5: File Handling Internals Course 102: Lecture 5: File Handling Internals
Course 102: Lecture 5: File Handling Internals Ahmed El-Arabawy
 

Tendances (20)

Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
 
2016 03 15_biological_databases_part4
2016 03 15_biological_databases_part42016 03 15_biological_databases_part4
2016 03 15_biological_databases_part4
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
 
Registry
RegistryRegistry
Registry
 
Javase7 1641812
Javase7 1641812Javase7 1641812
Javase7 1641812
 
Mysql all
Mysql allMysql all
Mysql all
 
Rhel 6.2 complete ebook
Rhel 6.2 complete ebookRhel 6.2 complete ebook
Rhel 6.2 complete ebook
 
Linux basic
Linux basicLinux basic
Linux basic
 
Apache hadoop 2_installation
Apache hadoop 2_installationApache hadoop 2_installation
Apache hadoop 2_installation
 
Persistences
PersistencesPersistences
Persistences
 
Linux command line cheatsheet
Linux command line cheatsheetLinux command line cheatsheet
Linux command line cheatsheet
 
Beginning hive and_apache_pig
Beginning hive and_apache_pigBeginning hive and_apache_pig
Beginning hive and_apache_pig
 
Rhel 6.2 complete ebook
Rhel 6.2  complete ebookRhel 6.2  complete ebook
Rhel 6.2 complete ebook
 
Hadoop installation on windows
Hadoop installation on windows Hadoop installation on windows
Hadoop installation on windows
 
Course 102: Lecture 5: File Handling Internals
Course 102: Lecture 5: File Handling Internals Course 102: Lecture 5: File Handling Internals
Course 102: Lecture 5: File Handling Internals
 
Android Data Persistence
Android Data PersistenceAndroid Data Persistence
Android Data Persistence
 
Linux cheat-sheet
Linux cheat-sheetLinux cheat-sheet
Linux cheat-sheet
 
Linux configer
Linux configerLinux configer
Linux configer
 

En vedette

Bytecode Optimizations
Bytecode OptimizationsBytecode Optimizations
Bytecode OptimizationsESUG
 
Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotha...
Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotha...Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotha...
Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotha...akaptur
 
Diving into byte code optimization in python
Diving into byte code optimization in python Diving into byte code optimization in python
Diving into byte code optimization in python Chetan Giridhar
 
Testers in product development code review phase
Testers in product development   code review phaseTesters in product development   code review phase
Testers in product development code review phaseChetan Giridhar
 
Design patterns in python v0.1
Design patterns in python v0.1Design patterns in python v0.1
Design patterns in python v0.1Chetan Giridhar
 
PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonChetan Giridhar
 
Rapid development & integration of real time communication in websites
Rapid development & integration of real time communication in websitesRapid development & integration of real time communication in websites
Rapid development & integration of real time communication in websitesChetan Giridhar
 

En vedette (7)

Bytecode Optimizations
Bytecode OptimizationsBytecode Optimizations
Bytecode Optimizations
 
Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotha...
Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotha...Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotha...
Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotha...
 
Diving into byte code optimization in python
Diving into byte code optimization in python Diving into byte code optimization in python
Diving into byte code optimization in python
 
Testers in product development code review phase
Testers in product development   code review phaseTesters in product development   code review phase
Testers in product development code review phase
 
Design patterns in python v0.1
Design patterns in python v0.1Design patterns in python v0.1
Design patterns in python v0.1
 
PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in python
 
Rapid development & integration of real time communication in websites
Rapid development & integration of real time communication in websitesRapid development & integration of real time communication in websites
Rapid development & integration of real time communication in websites
 

Similaire à Fuse'ing python for rapid development of storage efficient FS

Fuse'ing python for rapid development of storage efficient
Fuse'ing python for rapid development of storage efficientFuse'ing python for rapid development of storage efficient
Fuse'ing python for rapid development of storage efficientVishal Kanaujia
 
An Introduction to User Space Filesystem Development
An Introduction to User Space Filesystem DevelopmentAn Introduction to User Space Filesystem Development
An Introduction to User Space Filesystem DevelopmentMatt Turner
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117exsuns
 
Development of the irods rados plugin @ iRODS User group meeting 2014
Development of the irods rados plugin @ iRODS User group meeting 2014Development of the irods rados plugin @ iRODS User group meeting 2014
Development of the irods rados plugin @ iRODS User group meeting 2014mgrawinkel
 
Writing file system in CPython
Writing file system in CPythonWriting file system in CPython
Writing file system in CPythondelimitry
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msJodok Batlogg
 
Functional Programming with Streams in node.js
Functional Programming with Streams in node.jsFunctional Programming with Streams in node.js
Functional Programming with Streams in node.jsAdam Crabtree
 
20090514 Introducing Puppet To Sasag
20090514 Introducing Puppet To Sasag20090514 Introducing Puppet To Sasag
20090514 Introducing Puppet To Sasaggarrett honeycutt
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentJim Mlodgenski
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pigSudar Muthu
 
Script of Scripts Polyglot Notebook and Workflow System
Script of ScriptsPolyglot Notebook and Workflow SystemScript of ScriptsPolyglot Notebook and Workflow System
Script of Scripts Polyglot Notebook and Workflow SystemBo Peng
 
xPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetupxPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetupRadu Chilom
 
第2回 Hadoop 輪読会
第2回 Hadoop 輪読会第2回 Hadoop 輪読会
第2回 Hadoop 輪読会Toshihiro Suzuki
 
An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"sandinmyjoints
 
Secure .NET programming
Secure .NET programmingSecure .NET programming
Secure .NET programmingAnte Gulam
 
Node.js - A practical introduction (v2)
Node.js  - A practical introduction (v2)Node.js  - A practical introduction (v2)
Node.js - A practical introduction (v2)Felix Geisendörfer
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosEuangelos Linardos
 

Similaire à Fuse'ing python for rapid development of storage efficient FS (20)

Fuse'ing python for rapid development of storage efficient
Fuse'ing python for rapid development of storage efficientFuse'ing python for rapid development of storage efficient
Fuse'ing python for rapid development of storage efficient
 
An Introduction to User Space Filesystem Development
An Introduction to User Space Filesystem DevelopmentAn Introduction to User Space Filesystem Development
An Introduction to User Space Filesystem Development
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
 
PyFilesystem
PyFilesystemPyFilesystem
PyFilesystem
 
Development of the irods rados plugin @ iRODS User group meeting 2014
Development of the irods rados plugin @ iRODS User group meeting 2014Development of the irods rados plugin @ iRODS User group meeting 2014
Development of the irods rados plugin @ iRODS User group meeting 2014
 
Writing file system in CPython
Writing file system in CPythonWriting file system in CPython
Writing file system in CPython
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900ms
 
Functional Programming with Streams in node.js
Functional Programming with Streams in node.jsFunctional Programming with Streams in node.js
Functional Programming with Streams in node.js
 
20090514 Introducing Puppet To Sasag
20090514 Introducing Puppet To Sasag20090514 Introducing Puppet To Sasag
20090514 Introducing Puppet To Sasag
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pig
 
Script of Scripts Polyglot Notebook and Workflow System
Script of ScriptsPolyglot Notebook and Workflow SystemScript of ScriptsPolyglot Notebook and Workflow System
Script of Scripts Polyglot Notebook and Workflow System
 
xPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetupxPatterns on Spark, Tachyon and Mesos - Bucharest meetup
xPatterns on Spark, Tachyon and Mesos - Bucharest meetup
 
Vfs
VfsVfs
Vfs
 
第2回 Hadoop 輪読会
第2回 Hadoop 輪読会第2回 Hadoop 輪読会
第2回 Hadoop 輪読会
 
An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"
 
Secure .NET programming
Secure .NET programmingSecure .NET programming
Secure .NET programming
 
04 standard class library c#
04 standard class library c#04 standard class library c#
04 standard class library c#
 
Node.js - A practical introduction (v2)
Node.js  - A practical introduction (v2)Node.js  - A practical introduction (v2)
Node.js - A practical introduction (v2)
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
 

Dernier

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Dernier (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Fuse'ing python for rapid development of storage efficient FS

  • 1. 'FUSE'ing Python for rapid development of storage efficient file-system PyCon APAC ‘12 Singapore, Jun 07-09, 2012 Chetan Giridhar, Vishal Kanaujia
  • 2. File Systems • Provides way to organize, store, retrieve and manage information • Abstraction layer • File system: User User space – Maps name to an object Kernel space – Objects to file contents File System • File system types (format): Hardware – Media based file systems (FAT, ext2) – Network file systems (NFS) – Special-purpose file systems (procfs) • Services – open(), read(), write(), close()…
  • 3. Virtual File-system • To support multiple FS in *NIX • VFS is an abstract layer in kernel • Decouples file system implementation from the interface (POSIX API) – Common API serving different file system types • Handles user calls related to file systems. – Implements generic FS actions – Directs request to specific code to handle the request • Associate (and disassociate) devices with instances of the appropriate file system.
  • 4. File System: VFS User Myapp.py: open(), read() glibc.so glibc.so: open(), read() System call interface System call: sys_open(), sys_read() VFS Kernel space VFS: vfs_open(), vfs_read() ext2 NFS procfs Block layer/ Device drivers/Hardware
  • 5. Developing FS in *NIX • In-kernel file-systems (traditionally) • It is a complex task • Understanding kernel libraries and modules • Development experience in kernel space • Managing disk i/o • Time consuming and tedious development – Frequent kernel panic – Higher testing efforts • Kernel bloating and side effects like security
  • 6. Solution: User space • In user space: – Shorter development cycle – Easy to update fixes, test and distribute – More flexibility • Programming tools, debuggers, and libraries as you have if you were developing standard *NIX applications • User-space file-systems – File systems become regular applications (as opposed to kernel extensions)
  • 7. FUSE (Filesystem in USErspace) • Implement a file system in user-space – no kernel code required! • Secure, non-privileged mounts • Useful to develop “virtual” file-systems – Allows you to imagine “anything” as a file ☺ – local disk, across the network, from memory, or any other combination • User operates on a mounted instance of FS: - Unix utilities - POSIX libraries
  • 8. FUSE: Diving deeper Myfile.py: open() CustomFS: open() glibc.so libfuse.so VFS: vfs_open() fuse.ko /dev/fuse
  • 9. FUSE | develop • Choices of development in C, C++, Java, … and of course Python! • Python interface for FUSE – (FusePython: most popularly used) • Open source project – http://fuse.sourceforge.net/ • For ubuntu systems: $sudo apt-get instatall python-fuse $mkdir ./mount_point $python myfuse.py ./mount_point $fusermount -u ./mount_point
  • 10. FUSE API Overview • File management – open(path) – create(path, mode) – read(path, length, offset) – write(path, data, offset) • Directory and file system management – unlink(path) – readdir(path) • Metadata operations – getattr(path) – chmod(path, mode) – chown(path, uid, gid)
  • 11. seFS – storage efficient FS • A prototype, experimental file system with: – Online data de-duplication (SHA1) – Compression (text based) • SQLite • Ubuntu 11.04, Python-Fuse Bindings • Provides following services: open() write() chmod() create() readdir() chown() read() unlink()
  • 12. seFS Architecture <<FUSE code>> Your myfuse.py application File System Operations <<SQLite DB Interface>> seFS.py storage Efficiency = De-duplication + <<pylibrary>> Compression SQLiteHandler.py Compression.py seFS DB Crypto.py
  • 13. seFS: Database CREATE TABLE metadata( CREATE TABLE data( "id" INTEGER, "id" INTEGER "abspath" TEXT, PRIMARY KEY "length" INTEGER, AUTOINCREMENT, "mtime" TEXT, seFS "sha" TEXT, "ctime" TEXT, "data" BLOB, "atime" TEXT, Schema "length" INTEGER, "inode" INTEGER); "compressed" BLOB); data table metadata table
  • 14. seFS API flow $touch abc $rm abc $cat >> abc User Operations getattr() getattr() getattr() open() create() access() seFS APIs write() open() unlink() flush() create() release() release() storage seFS DB Efficiency
  • 15. seFS: Code def getattr(self, path): sefs = seFS() stat = fuse.stat() stat.stat_ino = context = fuse.FuseGetContext() int(sefs.getinode(path)) #Root if path == '/': # Get the file size from DB stat.stat_nlink = 2 if sefs.getlength(path) is not None: stat.stat_mode = stat.S_IFDIR | 0755 stat.stat_size = else: int(sefs.getlength(path)) stat.stat_mode = stat.S_IFREG | 0777 else: stat.stat_nlink = 1 stat.stat_size = 0 stat.stat_uid, stat.stat_gid = return stat (context ['uid'], context else: ['gid']) return - errno.ENOENT # Search for this path in DB ret = sefs.search(path) # If file exists in DB, get its times if ret is True: tup = sefs.getutime(path) stat.stat_mtime = int(tup[0].strip().split('.')[0]) stat.stat_ctime = int(tup[1].strip().split('.')[0]) stat.stat_atime = int(tup[2].strip().split('.')[0])
  • 16. seFS: Code… def create(self, path,flags=None,mode=None): sefs = seFS() ret = self.open(path, flags) if ret == -errno.ENOENT: #Create the file in database ret = sefs.open(path) t = int(time.time()) mytime = (t, t, t) ret = sefs.utime(path, mytime) self.fd = len(sefs.ls()) sefs.setinode(path, self.fd) return 0 def write(self, path, data, offset): length = len(data) sefs = seFS() ret = sefs.write(path, data) return length
  • 17. seFS: Learning • Design your file system and define the objectives first, before development • Database schema is crucial • skip implementing functionality your file system doesn’t intend to support • Knowledge on FUSE API is essential – FUSE APIs are look-alike to standard POSIX APIs – Limited documentation of FUSE API • Performance?
  • 18. Conclusion • Development of FS is very easy with FUSE • Python aids RAD with Python-Fuse bindings • seFS: Thought provoking implementation • Creative applications – your needs and objectives • When are you developing your own File system?! ☺
  • 19. Further Read • Sample Fuse based File systems – Sshfs – YoutubeFS – Dedupfs – GlusterFS • Python-Fuse bindings – http://fuse.sourceforge.net/ • Linux internal manual
  • 20. Contact Us • Chetan Giridhar – http://technobeans.com – cjgiridhar@gmail.com • Vishal Kanaujia – http://freethreads.wordpress.com – vishalkanaujia@gmail.com
  • 22. Agenda • The motivation • Intro to *NIX File Systems • Trade off: code in user and kernel space • FUSE? • Hold on – What’s VFS? • Diving into FUSE internals • Design and develop your own File System with Python-FUSE bindings • Lessons learnt • Python-FUSE: Creative applications/ Use-cases
  • 23. User-space and Kernel space • Kernel-space – Kernel code including device drivers – Kernel resources (hardware) – Privileged user • User-space – User application runs – Libraries dealing with kernel space – System resources
  • 24. FUSE: Internals • Three major components: – Userspace library (libfuse.*) – Kernel module (fuse.ko) – Mount utility (fusermount) • Kernel module hooks in to VFS – Provides a special device “/dev/fuse” • Can be accessed by a user-space process • Interface: user-space application and fuse kernel module • Read/ writes occur on a file descriptor of /dev/fuse
  • 25. FUSE Workflow Custatom File Systatem User (file I/O) User-space FUSE lib User space Kernel space Virtual File Systatem FUSE: kernel lib
  • 26. Facts and figures • seFS – online storage efficiency • De-duplication/ compression – Managed catalogue information (file meta-data rarely changes) – Compression encoded information • Quick and easy prototyping (Proof of concept) • Large dataset generation – Data generated on demand
  • 27. Creative applications: FUSE based File systems • SSHFS: Provides access to a remote file-system through SSH • WikipediaFS: View and edit Wikipedia articles as if they were real files • GlusterFS: Clustered Distributed Filesystem having capability to scale up to several petabytes. • HDFS: FUSE bindings exist for the open source Hadoop distributed file system • seFS: You know it already ☺