IBM Linux Technology Center




                      VirtFS
A virtualization aware File System pass-through



     Venkateswararao Jujjuri (JV)
     jvrao@us.ibm.com
     Linux Symposium 2010




                                             © 2010 IBM Corporation


Paravirtual Applications and System Services

 Move virtualization intelligence up into system
    services.
   Being explored by research and academic communities
    but largely ignored by the mainline.
   Provides a hybrid environment that leverages security,
    isolation, and performance.
   Visibility into guest operations allows the hypervisor to
    offer a variety of use cases.
       Desktops, network sharing, file systems
   Avoids a layer of indirection and boosts performance.
   Adding this to the existing device virtualization takes
    virtualization to the next level.






Paravirtual File Systems
 Good target as an entry into paravirtual system
  services.
 Virtual storage in the form of virtual disks.
      Can't be shared between multiple guests.
      Redundant caching.
      Unnecessary indirection between FS and block
       layer.
 Using traditional distributed/network file systems over a
  virtualized network device.
      Configuration, management and encapsulation
       overheads.
      Double caching.
      Different semantics for different file systems.



Use cases of Paravirtual File Systems
 Replace virtual disk as the root filesystem.
      Rapid cloning, easy management, secure.
 Sharing between host and guests.
 Offer file system services to thin clients like LibraryOS.
 Cloud computing
      Secure window of host file system on the guest.
      Different portions of the same file system shared
       among different guests.
      Knowledge about the guest activity enables the
       hypervisor to offer services like de-dup, snapshots,
       etc.
 Better utilization of system resources.




VirtFS
 Paravirtual file system pass-through between the KVM
  host and guest.
 Uses 9P Protocol between Client and Server.
      9P2000.L protocol is being developed/defined as part of
       this effort.
 Server is part of QEMU and uses VirtIO transport.
      File System is exported to the guest at the invocation of
       QEMU.
 Client is part of the Guest Kernel.
      Mounted on the guest with the mount tag defined during
       the QEMU invocation.






Plan 9 Overview
 Plan 9 OS was developed at Bell Labs (AT&T, later
  Alcatel-Lucent).
 The intention was to address Unix shortcomings.
 Seamless distributed system with integrated secure
  network resource sharing.
 Three core design principles
      Single set of simple, well-defined interfaces to services.
      Simple protocol to securely distribute the interfaces
       across any network
      Dynamic hierarchical structure to organize these
       interfaces.
 Unix pioneered the concept of treating devices like files;
  Plan 9 took the metaphor further by using file
  operations as the simple, well-defined interfaces to all
  system and application services.


9P Overview
 9P represents the protocol/abstract interface used to
    access resources under Plan 9.
   Any transport can be used. The only requirement is it
    should be a reliable, in-order transport.
   Merged into Linux kernel 2.6.14, with major changes
    in 2.6.24.
   Part of Linux mainline with VirtIO transport support.
   9P2000.u extension
        To support POSIX semantics, the protocol was extended
         with the 9P2000.u version during the Linux port.
        Provided support for numeric uid/gid and extended
         operations to support symlinks, links, special files, etc.
        Did not include full support for Linux operations.





9P2000.L Protocol extension
 Aimed at addressing 9P2000.u protocol deficiencies
  while keeping the core protocol elements intact.
      New opcodes that match the Linux VFS API, in a
       complementary namespace.
      Linux native data formats (stat/permissions, etc.)
      Support for xattr, locking, quotas, etc.
 Legacy operations and 9P2000.L operations coexist, with no
  changes to the existing operations.
 Protocol version is negotiated during the initial handshake.
  If the server doesn't support that version, the client falls back.
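
The version handshake can be sketched in a few lines. The sketch below encodes a 9P version message (size[4] type[1] tag[2] msize[4] version[s], little-endian) and models a fallback in which the server answers with the base version when it does not recognize the requested one. The names toy_server and negotiate are hypothetical, the server policy is an assumption made for illustration, and nothing here is taken from the QEMU or Linux sources.

```python
import struct

TVERSION, RVERSION = 100, 101   # 9P message type codes for version messages
NOTAG = 0xFFFF                  # tag value used by version messages
KNOWN_VERSIONS = ("9P2000", "9P2000.u", "9P2000.L")


def encode_version(msg_type, msize, version, tag=NOTAG):
    """Build size[4] type[1] tag[2] msize[4] version[s], little-endian."""
    name = version.encode("utf-8")
    body = struct.pack("<BHIH", msg_type, tag, msize, len(name)) + name
    return struct.pack("<I", len(body) + 4) + body


def decode_version(data):
    """Return (msg_type, tag, msize, version) from an encoded message."""
    size, msg_type, tag, msize, slen = struct.unpack_from("<IBHIH", data)
    if size != len(data):
        raise ValueError("size field does not match message length")
    return msg_type, tag, msize, data[13:13 + slen].decode("utf-8")


def toy_server(request, supported):
    """Hypothetical server policy: echo a supported version, else fall back."""
    _, tag, msize, wanted = decode_version(request)
    if wanted in supported:
        chosen = wanted
    elif wanted.startswith("9P2000") and "9P2000" in supported:
        chosen = "9P2000"       # assumed fallback to the base protocol
    else:
        chosen = "unknown"
    return encode_version(RVERSION, msize, chosen, tag)


def negotiate(supported_by_server, msize=8192):
    """Client side: ask for 9P2000.L and accept whatever known version returns."""
    request = encode_version(TVERSION, msize, "9P2000.L")
    _, _, _, agreed = decode_version(toy_server(request, supported_by_server))
    if agreed not in KNOWN_VERSIONS:
        raise RuntimeError("no common protocol version")
    return agreed
```

Calling negotiate({"9P2000", "9P2000.u"}) models a server without 9P2000.L support, in which case the client ends up speaking the version named in the reply.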






KVM and QEMU
 Kernel-based Virtual Machine - KVM
      A full virtualization solution for Linux on x86 hardware
       containing virtualization extensions (Intel VT-x / AMD-V).
      A set of Linux kernel modules (kvm.ko, plus kvm-intel.ko
       or kvm-amd.ko) offers a special process mode to
       user-space processes.

 Quick EMUlator – QEMU
      Uses interfaces provided by KVM to offer full system
       virtualization.
      Emulates standard PC hardware such as IDE disk, VGA
       graphics, PCI devices, etc.
      Any I/O requests a guest OS makes are intercepted and
       routed to user mode, where the QEMU process emulates
       them.


VirtIO Transport
 A paravirtual I/O bus based on a hypervisor-neutral DMA
  API.
 Offers lockless ring queues between the guest and the
  host to enable zero-copy bulk data transfer.
 VirtIO PCI transport allows VirtFS to be implemented in
  such a way that guest-driven I/O operations can be
  zero-copy.
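
The ring idea can be illustrated with a toy single-producer/single-consumer queue. This is only a sketch of the index discipline (one side advances head, the other advances tail, so neither needs a lock), not the actual VirtIO ring layout; the Ring class and its method names are hypothetical. Buffers are passed by reference, which is the sense in which the transfer is zero-copy.

```python
class Ring:
    """Toy fixed-size ring: one producer (guest) and one consumer (host)."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.slots = [None] * size
        self.mask = size - 1
        self.head = 0   # advanced only by the producer
        self.tail = 0   # advanced only by the consumer

    def push(self, buf):
        """Producer side: publish a buffer reference, or report a full ring."""
        if self.head - self.tail == len(self.slots):
            return False
        self.slots[self.head & self.mask] = buf
        self.head += 1
        return True

    def pop(self):
        """Consumer side: take the oldest buffer reference, or None if empty."""
        if self.tail == self.head:
            return None
        buf = self.slots[self.tail & self.mask]
        self.tail += 1
        return buf
```

A guest-side caller would push a memoryview of its data buffer and the host side would pop the same object, so the payload itself is never duplicated.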






VirtFS Block Diagram


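[Block diagram. Guest: apps on guest -> VFS interface -> VirtFS (v9fs)
 client in the guest kernel. A VirtIO ring connects the guest and the host.
 Host user space: VirtFS server (v9fs server in QEMU) -> FS API ->
 VFS interface -> file system in the host kernel. Hardware at the bottom.]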





VirtFS Implementation
 KVM, QEMU, and VirtIO present an ideal platform for the
  VirtFS server.
 Two types of virtual devices
      virtio-9p-pci, used to transport protocol messages and data
       between the host and the guest.
      fsdev, used to define the exported file system characteristics
       such as fs type and security model.
 Command line options
     -fsdev local,id=myid,path=/share_path/,security_model=mapped
      -device virtio-9p-pci,fsdev=myid,mount_tag=v_tag1
     -virtfs local,path=/share_path/,security_model=passthrough,mnt_tag=v_tag2
     On the client: mount -t 9p -o trans=virtio -o version=9p2000.L
      v_tag1 /mnt
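
For scripting, the options above can be assembled programmatically. The helper below is a minimal sketch that only reproduces the -fsdev/-device pair and the guest mount command exactly as written on this slide; the function names fsdev_args and guest_mount_cmd are hypothetical, and the sketch does not launch QEMU or validate the path.

```python
def fsdev_args(fsdev_id, path, security_model, mount_tag):
    """Return the QEMU arguments that export `path` under `mount_tag`."""
    if security_model not in ("mapped", "passthrough"):
        raise ValueError("security_model must be 'mapped' or 'passthrough'")
    return [
        "-fsdev",
        f"local,id={fsdev_id},path={path},security_model={security_model}",
        "-device",
        f"virtio-9p-pci,fsdev={fsdev_id},mount_tag={mount_tag}",
    ]


def guest_mount_cmd(mount_tag, mount_point):
    """Return the mount command a guest would run for that mount tag."""
    return ["mount", "-t", "9p", "-o", "trans=virtio",
            "-o", "version=9p2000.L", mount_tag, mount_point]
```

The fsdev id ties the two host-side options together, while the mount tag is the only name the guest needs to know.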


Security
 Two models of security enforcement
 One completely isolates the guest user domain from
  that of the host.
      Eliminates the need for root squashing.
      No setuid/setgid exposures.
      Complete isolation enhances security.
      Not very portable.
 The other shares user domains between the host and the
  guests.
      Follows the traditional network file system model.
      If not used carefully, it is susceptible to security holes.
 Client-based security enforcement.
 Server ensures that the client's access never crosses outside
  the exported portion.
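
One way a server can keep a client inside the exported portion is to resolve every client-supplied path and reject anything that lands outside the export root. The function below is a sketch of that check under the assumption that paths arrive as strings relative to the export; resolve_in_export is a hypothetical name, and this is not how the QEMU server is actually written.

```python
import os


def resolve_in_export(export_root, client_path):
    """Resolve client_path under export_root, refusing paths that escape it."""
    root = os.path.realpath(export_root)
    # Treat the client path as relative to the export, then resolve
    # "..", "." and symlinks before comparing against the root.
    target = os.path.realpath(os.path.join(root, client_path.lstrip("/")))
    if os.path.commonpath([root, target]) != root:
        raise PermissionError(f"{client_path!r} escapes the export")
    return target
```

Because symlinks are resolved before the comparison, a link inside the export that points outside it is rejected as well.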


Security Model - Mapped
 VirtFS server intercepts and maps all file object create
    and get/set attribute requests from client.
   Files are created with VirtFS server's user credentials.
   Client user credentials are stored in extended
    attributes.
   Extended user attributes are allowed for regular files
    and directories only.
   For special files, corresponding regular files are created
    on file server and appropriate mode bits are added to
    extended attributes.
   This enhances security.
        Guest user domain is completely isolated from host's.
        Symlinks can't be followed locally on the file server.
 File system will be VirtFS'ized: the stored credentials and
  mode bits are meaningful only when accessed through VirtFS.



Security Model – Mapped (Cont...)
   On Host (ls -l output)
  drwx------. 2 virfsuid virtfsgid 4096 2010-05-11 09:19 adir
  -rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:36 afifo
  -rw-------. 2 virfsuid virtfsgid 0 2010-05-11 09:19 afile
  -rw-------. 2 virfsuid virtfsgid 0 2010-05-11 09:19 alink
  -rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:57 asocket1
  -rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:32 blkdev
  -rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:33 chardev
  -rw-------. 1 root root 6 2010-05-11 09:20 asymlink

   On Guest (ls -l output)
  drwxr-xr-x 2 guestuser guestuser 4096 2010-05-11 12:19 adir
  prw-r--r-- 1 guestuser guestuser 0 2010-05-11 12:36 afifo
  -rw-r--r-- 2 guestuser guestuser 0 2010-05-11 12:19 afile
  -rw-r--r-- 2 guestuser guestuser 0 2010-05-11 12:19 alink
  srwxr-xr-x 1 guestuser guestuser 0 2010-05-11 12:57 asocket1
  brw-r--r-- 1 guestuser guestuser 0, 0 2010-05-11 12:32 blkdev
  crw-r--r-- 1 guestuser guestuser 4, 5 2010-05-11 12:33 chardev
  lrwxrwxrwx 1 root root 6 2010-05-11 12:20 asymlink -> afile
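
The difference between the two listings can be modeled without touching a real file system. In the sketch below the host object is always created with the server's restrictive mode, and the client's uid, gid and mode are kept in a dictionary standing in for extended attributes. The attribute names (user.virtfs.uid and so on) and the function names are assumptions chosen for illustration.

```python
import stat


def mapped_create(client_uid, client_gid, client_mode):
    """Return (host_mode, xattrs) for an object created by the guest."""
    if stat.S_ISDIR(client_mode):
        host_mode = stat.S_IFDIR | 0o700    # directories stay directories
    else:
        host_mode = stat.S_IFREG | 0o600    # everything else: regular file
    xattrs = {
        "user.virtfs.uid": client_uid,      # assumed attribute names
        "user.virtfs.gid": client_gid,
        "user.virtfs.mode": client_mode,    # file type and permission bits
    }
    return host_mode, xattrs


def guest_view(xattrs):
    """Return the ls-style mode string the guest sees for the object."""
    return stat.filemode(xattrs["user.virtfs.mode"])
```

For a FIFO created in the guest with mode 0644, the host side is a plain regular file while the guest view is rebuilt from the stored mode bits.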


Security Model – Passthrough
 All requests are passed directly to the underlying file
  system without any interception.
 File system objects on the file server are created with
  the client user's credentials.
 Two methods to do this:
      setuid/setgid during the creation.
      chmod/chown immediately after creation.
 All special files are created as-is.
 Portable between NFS/CIFS.
 Susceptible to security issues.
      Client root can create files on the fileserver with root
       privileges if fileserver is running as root.
      Symlinks can be followed locally.




Security Model – Passthrough (Cont...)
   On Host
  # grep 611 /etc/passwd
  hostuser:x:611:611::/home/hostuser:/bin/bash
  # ls -l
  -rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 file1
  -rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 link1
  srwxrwxr-x. 1 hostuser hostuser 0 2010-05-12 18:27 mysock
  lrwxrwxrwx. 1 hostuser hostuser 5 2010-05-12 18:25 symlink1 -> file1

   On Guest
  $ grep 611 /etc/passwd
  guestuser:x:611:611::/home/guestuser:/bin/bash
  $ ls -l
  -rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 file1
  -rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 link1
  srwxrwxr-x 1 guestuser guestuser 0 2010-05-12 21:27 mysock
  lrwxrwxrwx 1 guestuser guestuser 5 2010-05-12 21:25 symlink1 -> file1
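
The second passthrough method, chown/chmod immediately after creation, can be sketched as follows. The function name create_passthrough is hypothetical. Changing ownership to an arbitrary client uid requires a privileged server, which is exactly the exposure described for this model, so the usage note only changes ownership to the calling user.

```python
import os


def create_passthrough(path, mode, client_uid, client_gid):
    """Create a file, then hand it to the client's uid/gid right away."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
    try:
        # Between open() and fchown() the file still belongs to the server
        # user; this window is inherent to the create-then-chown method.
        os.fchown(fd, client_uid, client_gid)
        os.fchmod(fd, mode)     # re-apply the mode in case umask trimmed it
    finally:
        os.close(fd)
```

An unprivileged process can try it with its own ids, for example create_passthrough(path, 0o640, os.getuid(), os.getgid()).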





ACL Implementation
 Access Control Lists (ACLs) allow fine-grained control.
 No universal standards.
 Linux offers POSIX ACLs, but they are not versatile/rich
  enough to support NFSv4.
 Rich ACL patch set for Linux is on the mailing list.
 Strategy for VirtFS
      Enforcement at client.
      Support only one ACL model.
      Start with POSIX ACLs.
      Help Rich ACLs make it into the mainline.
      Convert to Rich ACLs once they are available in
       mainline.






Where are we?
 VirtFS server is in QEMU mainline.
 Security model patchset has been accepted into QEMU
    mainline as part of QEMU 0.13.
   Several patches have made it into mainline Linux.
   Fedora 13 and Lucid can mount VirtFS (9P2000.u).
   Making good progress on 9P2000.L. Implemented all
    the VFS calls required to satisfy the Tuxera POSIX test
    suite. These patches are either on the list or have
    already been accepted.
   A patchset to generalize the worker thread infrastructure in
    QEMU is in mainline. Working on converting the current
    single-threaded server to a multi-threaded one using that
    infrastructure.
   Working on POSIX ACLs and byte-range lock
    implementation.


Performance Tests
 Plain & Simple; dd to compare.
 Write
      dd if=/dev/zero of=/mnt/fileX bs=<blocksize>
       count=<count>
 Read
      dd if=/mnt/fileX of=/dev/null bs=<blocksize>
       count=<count>
 Block size: 8k, 64k, 2M.
 Count: number of blocks needed for 8GB worth of I/O.
 All tests are conducted on the guest.
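
The count for each block size follows from dividing the total I/O volume by the block size. The sketch below assumes that 8GB means 8 GiB and that dd interprets the k and M suffixes as multiples of 1024; the names dd_count and dd_write_cmd are hypothetical, and /mnt/fileX is the placeholder path used on this slide.

```python
TOTAL_BYTES = 8 * 1024**3           # assumed: 8GB taken as 8 GiB
BLOCK_SIZES = {"8k": 8 * 1024, "64k": 64 * 1024, "2M": 2 * 1024**2}


def dd_count(block_size):
    """Number of blocks of block_size bytes that add up to TOTAL_BYTES."""
    count, remainder = divmod(TOTAL_BYTES, block_size)
    if remainder:
        raise ValueError("block size does not divide the total evenly")
    return count


def dd_write_cmd(label, target="/mnt/fileX"):
    """Build the dd write command line for one of the tested block sizes."""
    return ["dd", "if=/dev/zero", f"of={target}",
            f"bs={label}", f"count={dd_count(BLOCK_SIZES[label])}"]
```

Under these assumptions the three runs use 1048576, 131072 and 4096 blocks respectively.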






Comparison with NFS and CIFS




[Charts: Sequential Read | Sequential Write]





Comparison with blockdev




[Charts: Sequential Read | Sequential Write]





Next Steps
 Fully Linux-compliant, complete 9P2000.L protocol.
 ACL implementation.
 Page cache sharing between host and guest(s).
 dcache sharing between host and guest(s).
 Interfacing with other filesystem APIs.
 Enable consistent caching.
 NFS and CIFS exportability.
 Making it a rootfs for guests instead of using root
  volumes.
 Ongoing stability, scalability, and performance
  improvements.






Conclusions
 Huge potential for specialized filesystems in the
    virtualization space.
   Growth in the cloud space will be a major catalyst.
   A lot of scope for innovation.
   A step towards paravirtual system services.
   Related Work
        XenFS
        Shared Folders (VMHGFS)
        Lguest 9P support (virtio gateway to spfs)
        Plan 9 kernel KVM and Lguest support (Sandia)
        Foundation: Venti content-addressable storage back-end
         for VMware (MIT)







                        Questions?





Contenu connexe

Tendances

IBM System x Private Cloud Offering, Advanced Configuration: Architecture and...
IBM System x Private Cloud Offering, Advanced Configuration: Architecture and...IBM System x Private Cloud Offering, Advanced Configuration: Architecture and...
IBM System x Private Cloud Offering, Advanced Configuration: Architecture and...IBM India Smarter Computing
 
Presentation comparing server io consolidation solution with i scsi, infini...
Presentation   comparing server io consolidation solution with i scsi, infini...Presentation   comparing server io consolidation solution with i scsi, infini...
Presentation comparing server io consolidation solution with i scsi, infini...xKinAnx
 
4 implementation
4 implementation4 implementation
4 implementationhanmya
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Eric Van Hensbergen
 
Towards Software Defined Persistent Memory
Towards Software Defined Persistent MemoryTowards Software Defined Persistent Memory
Towards Software Defined Persistent MemorySwaminathan Sundararaman
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualizationSisimon Soman
 
Ibm system storage n series with multi store and snapmover redp4170
Ibm system storage n series with multi store and snapmover redp4170Ibm system storage n series with multi store and snapmover redp4170
Ibm system storage n series with multi store and snapmover redp4170Banking at Ho Chi Minh city
 
TSM og virtualisering
 TSM og virtualisering TSM og virtualisering
TSM og virtualiseringSolv AS
 
Hyper-V VMM ile Cloud computing
Hyper-V VMM ile Cloud computingHyper-V VMM ile Cloud computing
Hyper-V VMM ile Cloud computingAhmet Mutlu
 
VMware vSphere 4.1 deep dive - part 1
VMware vSphere 4.1 deep dive - part 1VMware vSphere 4.1 deep dive - part 1
VMware vSphere 4.1 deep dive - part 1Louis Göhl
 
San 101 basics of administrating a san
San 101 basics of administrating a sanSan 101 basics of administrating a san
San 101 basics of administrating a sanpineapplebed24
 
Linux one vs x86 18 july
Linux one vs x86 18 julyLinux one vs x86 18 july
Linux one vs x86 18 julyDiego Rodriguez
 

Tendances (20)

IBM System x Private Cloud Offering, Advanced Configuration: Architecture and...
IBM System x Private Cloud Offering, Advanced Configuration: Architecture and...IBM System x Private Cloud Offering, Advanced Configuration: Architecture and...
IBM System x Private Cloud Offering, Advanced Configuration: Architecture and...
 
Presentation comparing server io consolidation solution with i scsi, infini...
Presentation   comparing server io consolidation solution with i scsi, infini...Presentation   comparing server io consolidation solution with i scsi, infini...
Presentation comparing server io consolidation solution with i scsi, infini...
 
4 implementation
4 implementation4 implementation
4 implementation
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
Towards Software Defined Persistent Memory
Towards Software Defined Persistent MemoryTowards Software Defined Persistent Memory
Towards Software Defined Persistent Memory
 
What's New in RHEL 6 for Linux on System z?
What's New in RHEL 6 for Linux on System z?What's New in RHEL 6 for Linux on System z?
What's New in RHEL 6 for Linux on System z?
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
 
Ibm system storage n series with multi store and snapmover redp4170
Ibm system storage n series with multi store and snapmover redp4170Ibm system storage n series with multi store and snapmover redp4170
Ibm system storage n series with multi store and snapmover redp4170
 
Brocade VDX 6730 Converged Switch for IBM
Brocade VDX 6730 Converged Switch for IBMBrocade VDX 6730 Converged Switch for IBM
Brocade VDX 6730 Converged Switch for IBM
 
TSM og virtualisering
 TSM og virtualisering TSM og virtualisering
TSM og virtualisering
 
Cisco MDS 9124 Multilayer Fabric Switch
Cisco MDS 9124 Multilayer Fabric SwitchCisco MDS 9124 Multilayer Fabric Switch
Cisco MDS 9124 Multilayer Fabric Switch
 
Vmware san connectivity
Vmware san connectivityVmware san connectivity
Vmware san connectivity
 
Summit x460
Summit x460Summit x460
Summit x460
 
Hyper-V VMM ile Cloud computing
Hyper-V VMM ile Cloud computingHyper-V VMM ile Cloud computing
Hyper-V VMM ile Cloud computing
 
VMware vSphere 4.1 deep dive - part 1
VMware vSphere 4.1 deep dive - part 1VMware vSphere 4.1 deep dive - part 1
VMware vSphere 4.1 deep dive - part 1
 
San 101 basics of administrating a san
San 101 basics of administrating a sanSan 101 basics of administrating a san
San 101 basics of administrating a san
 
Cisco MDS 9148 Multilayer Fabric Switch
Cisco MDS 9148 Multilayer Fabric SwitchCisco MDS 9148 Multilayer Fabric Switch
Cisco MDS 9148 Multilayer Fabric Switch
 
Linux on System z – disk I/O performance
Linux on System z – disk I/O performanceLinux on System z – disk I/O performance
Linux on System z – disk I/O performance
 
Linux one vs x86 18 july
Linux one vs x86 18 julyLinux one vs x86 18 july
Linux one vs x86 18 july
 
What’s new System Center 2012 SP1, VMM
What’s new System Center 2012 SP1, VMMWhat’s new System Center 2012 SP1, VMM
What’s new System Center 2012 SP1, VMM
 

Similaire à VirtFS Ols2010

Virtualization Everywhere
Virtualization EverywhereVirtualization Everywhere
Virtualization Everywherewebhostingguy
 
Presentation power vm virtualization without limits
Presentation   power vm virtualization without limitsPresentation   power vm virtualization without limits
Presentation power vm virtualization without limitssolarisyougood
 
SYSAD323 Virtualization Basics
SYSAD323 Virtualization BasicsSYSAD323 Virtualization Basics
SYSAD323 Virtualization BasicsDon Bosco BSIT
 
Cloud stack for z Systems - July 2016
Cloud stack for z Systems - July 2016Cloud stack for z Systems - July 2016
Cloud stack for z Systems - July 2016Anderson Bassani
 
分会场二深入分析Veritas cluster server和storage foundation在aix高可用以及灾难恢复环境下如何对存储管理进行优化
分会场二深入分析Veritas cluster server和storage foundation在aix高可用以及灾难恢复环境下如何对存储管理进行优化分会场二深入分析Veritas cluster server和storage foundation在aix高可用以及灾难恢复环境下如何对存储管理进行优化
分会场二深入分析Veritas cluster server和storage foundation在aix高可用以及灾难恢复环境下如何对存储管理进行优化ITband
 
Presentation cloud computing and the internet
Presentation   cloud computing and the internetPresentation   cloud computing and the internet
Presentation cloud computing and the internetxKinAnx
 
Xen and Client Virtualization: the case of XenClient XT
Xen and Client Virtualization: the case of XenClient XTXen and Client Virtualization: the case of XenClient XT
Xen and Client Virtualization: the case of XenClient XTThe Linux Foundation
 
Contrail Deep-dive - Cloud Network Services at Scale
Contrail Deep-dive - Cloud Network Services at ScaleContrail Deep-dive - Cloud Network Services at Scale
Contrail Deep-dive - Cloud Network Services at ScaleMarketingArrowECS_CZ
 
Virtualization and Open Virtualization Format (OVF)
Virtualization and Open Virtualization Format (OVF)Virtualization and Open Virtualization Format (OVF)
Virtualization and Open Virtualization Format (OVF)rajsandhu1989
 
Virtualization technolegys for amdocs
Virtualization technolegys for amdocsVirtualization technolegys for amdocs
Virtualization technolegys for amdocsSamuel Dratwa
 
RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)Sumant Garg
 
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISORLOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISORVanika Kapoor
 
Focus Group Open Source 09.05.2011 Massimiliano Belardi
Focus Group Open Source 09.05.2011 Massimiliano BelardiFocus Group Open Source 09.05.2011 Massimiliano Belardi
Focus Group Open Source 09.05.2011 Massimiliano BelardiRoberto Galoppini
 

Similaire à VirtFS Ols2010 (20)

Virtualization Everywhere
Virtualization EverywhereVirtualization Everywhere
Virtualization Everywhere
 
zLinux
zLinuxzLinux
zLinux
 
Presentation power vm virtualization without limits
Presentation   power vm virtualization without limitsPresentation   power vm virtualization without limits
Presentation power vm virtualization without limits
 
SYSAD323 Virtualization Basics
SYSAD323 Virtualization BasicsSYSAD323 Virtualization Basics
SYSAD323 Virtualization Basics
 
Cloud stack for z Systems - July 2016
Cloud stack for z Systems - July 2016Cloud stack for z Systems - July 2016
Cloud stack for z Systems - July 2016
 
0bnosis 2016
0bnosis 20160bnosis 2016
0bnosis 2016
 
分会场二深入分析Veritas cluster server和storage foundation在aix高可用以及灾难恢复环境下如何对存储管理进行优化
分会场二深入分析Veritas cluster server和storage foundation在aix高可用以及灾难恢复环境下如何对存储管理进行优化分会场二深入分析Veritas cluster server和storage foundation在aix高可用以及灾难恢复环境下如何对存储管理进行优化
分会场二深入分析Veritas cluster server和storage foundation在aix高可用以及灾难恢复环境下如何对存储管理进行优化
 
Presentation cloud computing and the internet
Presentation   cloud computing and the internetPresentation   cloud computing and the internet
Presentation cloud computing and the internet
 
Pnfs
PnfsPnfs
Pnfs
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
 
Xen and Client Virtualization: the case of XenClient XT
Xen and Client Virtualization: the case of XenClient XTXen and Client Virtualization: the case of XenClient XT
Xen and Client Virtualization: the case of XenClient XT
 
CloudX on OpenStack
CloudX on OpenStackCloudX on OpenStack
CloudX on OpenStack
 
Contrail Deep-dive - Cloud Network Services at Scale
Contrail Deep-dive - Cloud Network Services at ScaleContrail Deep-dive - Cloud Network Services at Scale
Contrail Deep-dive - Cloud Network Services at Scale
 
Virtualization and Open Virtualization Format (OVF)
Virtualization and Open Virtualization Format (OVF)Virtualization and Open Virtualization Format (OVF)
Virtualization and Open Virtualization Format (OVF)
 
Virtualization technolegys for amdocs
Virtualization technolegys for amdocsVirtualization technolegys for amdocs
Virtualization technolegys for amdocs
 
RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)
 
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISORLOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISOR
 
Nfv
NfvNfv
Nfv
 
Ibm tivoli security and system z redp4355
Ibm tivoli security and system z redp4355Ibm tivoli security and system z redp4355
Ibm tivoli security and system z redp4355
 
Focus Group Open Source 09.05.2011 Massimiliano Belardi
Focus Group Open Source 09.05.2011 Massimiliano BelardiFocus Group Open Source 09.05.2011 Massimiliano Belardi
Focus Group Open Source 09.05.2011 Massimiliano Belardi
 

Dernier

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
VirtFS Ols2010

  • 1. IBM Linux Technology Center VirtFS A virtualization aware File System pass-through Venkateswararao Jujjuri (JV) jvrao@us.ibm.com Linux Symposium 2010 © 2010 IBM Corporation
  • 2. Paravirtual Applications and System Services
      - Move the virtualization intelligence up into system services.
      - Being explored by research and academic communities, but largely ignored by the mainline.
      - Provides a hybrid environment leveraging security, isolation, and performance.
      - Visibility into guest operations allows the hypervisor to offer a variety of use cases.
          - Desktops, network sharing, file systems.
      - Avoids a layer of indirection and boosts performance.
      - Adding this to the existing device virtualization takes virtualization to the next level.
  • 3. Paravirtual File Systems
      - Good target as an entry into paravirtual system services.
      - Virtual storage in the form of virtual disks.
          - Can't be shared between multiple guests.
          - Redundant caching.
          - Unnecessary indirection between FS and block layer.
      - Using traditional distributed/network file systems over a virtualized network device.
          - Configuration, management and encapsulation overheads.
          - Double caching.
          - Different semantics for different file systems.
  • 4. Use cases of Paravirtual File Systems
      - Replace virtual disk as the root filesystem.
          - Rapid cloning, easy management, secure.
      - Sharing between host and guests.
      - Offer file system services to thin clients like LibraryOS.
      - Cloud computing
          - Secure window of host file system on the guest.
          - Different portions of the same file system shared among different guests.
      - Knowledge about the guest activity enables the hypervisor to offer services like de-dup, snapshots etc.
      - Better utilization of system resources.
  • 5. VirtFS
      - Paravirtual file system pass-through between the KVM host and guest.
      - Uses the 9P protocol between client and server.
          - The 9P2000.L protocol is being developed/defined as part of this effort.
      - Server is part of QEMU and uses the VirtIO transport.
          - File system is exported to the guest at the invocation of QEMU.
      - Client is part of the guest kernel.
          - Mounted on the guest with the mount tag defined during the QEMU invocation.
  • 6. Plan 9 Overview
      - The Plan 9 OS was developed by AT&T Bell Laboratories (now Alcatel-Lucent).
      - Intention was to address Unix shortcomings.
          - Seamless distributed system with integrated secure network resource sharing.
      - Three core design principles
          - Single set of simple, well-defined interfaces to services.
          - Simple protocol to securely distribute the interfaces across any network.
          - Dynamic hierarchical structure to organize these interfaces.
      - Unix pioneered the concept of treating devices like files; Plan 9 took the metaphor further by using file operations as the simple, well-defined interfaces to all system and application services.
  • 7. 9P Overview
      - 9P is the protocol/abstract interface used to access resources under Plan 9.
      - Any transport can be used. The only requirement is that it be a reliable, in-order transport.
      - Merged into Linux kernel 2.6.14, with major changes in 2.6.24.
      - Part of Linux mainline with VirtIO transport support.
      - 9P2000.u extension
          - For POSIX adoption, the protocol was extended with the 9P2000.u version during the Linux port.
          - Provided support for numeric uid/gid and extended operations to support symlinks, links, special files etc.
          - Did not include full support for Linux operations.
  • 8. 9P2000.L Protocol extension
      - Aimed at addressing 9P2000.u protocol deficiencies while keeping the core protocol elements intact.
      - New opcodes which match the Linux VFS API, in a complementary namespace.
      - Linux native data formats (stat/permissions, etc.).
      - Support for xattr, locking, quotas etc.
      - Legacy protocol versions and 9P2000.L coexist, with no changes to the existing operations.
      - Protocol version is negotiated during the initial handshake. If the server doesn't support that version, the client falls back.
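The version handshake on slide 8 can be illustrated with a minimal sketch. It packs a 9P version request (size[4] type[1] tag[2] msize[4] version[s], little-endian) and walks a client preference list until the server accepts one. The `server_reply` function is a hypothetical stand-in for a real server, and the fallback order is an assumption, not something the slides specify.

```python
import struct

TVERSION = 100   # 9P message type for a version request
NOTAG = 0xFFFF   # tag value used for version messages


def encode_tversion(msize: int, version: str) -> bytes:
    """Pack size[4] type[1] tag[2] msize[4] version[s], little-endian."""
    v = version.encode("utf-8")
    body = struct.pack("<BHI", TVERSION, NOTAG, msize)
    body += struct.pack("<H", len(v)) + v      # 9P strings carry a 2-byte length
    return struct.pack("<I", 4 + len(body)) + body


def server_reply(requested: str, supported: set) -> str:
    """Hypothetical server: echo a supported version, else 'unknown'."""
    return requested if requested in supported else "unknown"


def negotiate(preferences, supported):
    """Try each client preference in order; return the first accepted one."""
    for version in preferences:
        if server_reply(version, supported) == version:
            return version
    return None  # no common version


if __name__ == "__main__":
    prefs = ["9P2000.L", "9P2000.u", "9P2000"]  # assumed fallback order
    print(negotiate(prefs, {"9P2000.u", "9P2000"}))
```

A real server may also answer directly with an earlier version string instead of rejecting the request; the sketch only models the retry path.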
  • 9. KVM and QEMU
      - Kernel-based Virtual Machine (KVM)
          - A full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT-x / AMD-V).
          - A set of Linux kernel modules (kvm.ko, kvm-intel.ko or kvm-amd.ko) offers a special process mode to user-space processes.
      - Quick EMUlator (QEMU)
          - Uses interfaces provided by KVM to offer full system virtualization.
          - Emulates standard PC hardware such as IDE disk, VGA graphics, PCI devices etc.
          - Any I/O requests a guest OS makes are intercepted and routed to user mode to be emulated by the QEMU process.
  • 10. VirtIO Transport
      - A paravirtual I/O bus based on a hypervisor-neutral DMA API.
      - Offers lockless ring queues between the guest and the host to enable zero-copy bulk data transfer.
      - The VirtIO PCI transport allows VirtFS to be implemented in such a way that guest-driven I/O operations can be zero-copy.
  • 11. VirtFS Block Diagram
      [Figure: block diagram. Guest: apps on guest, VFS interface, VirtFS (v9fs) client in the guest kernel. VirtIO ring between guest and host. Host user space: VirtFS server (v9fs server in QEMU), FS API. Host kernel: VFS interface, file system. Hardware.]
  • 12. VirtFS Implementation
      - KVM, QEMU, and VirtIO present an ideal platform for the VirtFS server.
      - Two types of virtual devices
          - virtio-9p-pci, used to transport protocol messages and data between the host and the guest.
          - fsdev, used to define the exported file system characteristics such as fs type and security model.
      - Command line options
          -fsdev local,id=myid,path=/share_path/,security_model=mapped -device virtio-9p-pci,fsdev=myid,mount_tag=v_tag1
          -virtfs local,path=/share_path/,security_model=passthrough,mnt_tag=v_tag2
      - On the client
          mount -t 9p -o trans=virtio -o version=9p2000.L v_tag1 /mnt
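As a small illustration of how the `-fsdev`/`-device` pair and the guest mount command from slide 12 fit together, the sketch below assembles both argument lists from one export description. The helper names are hypothetical; only the option strings shown on the slide are used, and nothing is executed.

```python
def qemu_virtfs_args(fsdev_id, path, security_model, mount_tag):
    """Build the -fsdev/-device argument pair shown on the slide."""
    return [
        "-fsdev",
        f"local,id={fsdev_id},path={path},security_model={security_model}",
        "-device",
        f"virtio-9p-pci,fsdev={fsdev_id},mount_tag={mount_tag}",
    ]


def guest_mount_command(mount_tag, mount_point):
    """Build the guest-side mount command for the same mount tag."""
    return ["mount", "-t", "9p", "-o", "trans=virtio",
            "-o", "version=9p2000.L", mount_tag, mount_point]


if __name__ == "__main__":
    print(" ".join(qemu_virtfs_args("myid", "/share_path/", "mapped", "v_tag1")))
    print(" ".join(guest_mount_command("v_tag1", "/mnt")))
```

The point of the pairing is that the `fsdev` id links the device to the export definition on the host, while the mount tag is the only name the guest needs to know.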
  • 13. Security
      - Two models of security enforcement
      - One with complete isolation of the guest user domain from that of the host.
          - Eliminates the need for root squashing.
          - No setuid/setgid exposures.
          - Complete isolation enhances security.
          - Not very portable.
      - The other model shares user domains between the host and the guests.
          - Follows the traditional network file system model.
          - If not used carefully, it is susceptible to security holes.
      - Client-based security enforcement.
          - Server makes sure that client control never crosses the exported portion.
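The last point, keeping client requests inside the exported portion, can be sketched as a path check. This is an illustrative, string-only approach and an assumption on my part; the slides do not describe how the VirtFS server enforces the boundary, and this sketch does not account for symlinks. The function name is hypothetical.

```python
import posixpath


def resolve_in_export(export_root: str, client_path: str) -> str:
    """Map a client-supplied path under export_root, rejecting escapes."""
    root = posixpath.normpath(export_root)
    # Treat the client path as relative to the export, then collapse ".." parts.
    candidate = posixpath.normpath(posixpath.join(root, client_path.lstrip("/")))
    if posixpath.commonpath([root, candidate]) != root:
        raise PermissionError(f"{client_path!r} escapes the export")
    return candidate


if __name__ == "__main__":
    print(resolve_in_export("/share_path/", "a/b"))
    try:
        resolve_in_export("/share_path/", "../etc/passwd")
    except PermissionError as err:
        print("rejected:", err)
```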
  • 14. Security Model - Mapped
      - VirtFS server intercepts and maps all file object create and get/set attribute requests from the client.
      - Files are created with the VirtFS server's user credentials.
      - Client user credentials are stored in extended attributes.
      - Extended user attributes are allowed for regular files and directories only.
          - For special files, corresponding regular files are created on the file server and the appropriate mode bits are added to the extended attributes.
      - This enhances security.
          - Guest user domain is completely isolated from the host's.
          - Symlinks can't be followed locally on the file server.
      - File system will be VirtFS'ized.
  • 15. Security Model – Mapped (Cont...)
      - On Host (ls -l output)
          drwx------. 2 virfsuid virtfsgid 4096 2010-05-11 09:19 adir
          -rw-------. 1 virfsuid virtfsgid    0 2010-05-11 09:36 afifo
          -rw-------. 2 virfsuid virtfsgid    0 2010-05-11 09:19 afile
          -rw-------. 2 virfsuid virtfsgid    0 2010-05-11 09:19 alink
          -rw-------. 1 virfsuid virtfsgid    0 2010-05-11 09:57 asocket1
          -rw-------. 1 virfsuid virtfsgid    0 2010-05-11 09:32 blkdev
          -rw-------. 1 virfsuid virtfsgid    0 2010-05-11 09:33 chardev
          -rw-------. 1 root     root         6 2010-05-11 09:20 asymlink
      - On Guest (ls -l output)
          drwxr-xr-x 2 guestuser guestuser 4096 2010-05-11 12:19 adir
          prw-r--r-- 1 guestuser guestuser    0 2010-05-11 12:36 afifo
          -rw-r--r-- 2 guestuser guestuser    0 2010-05-11 12:19 afile
          -rw-r--r-- 2 guestuser guestuser    0 2010-05-11 12:19 alink
          srwxr-xr-x 1 guestuser guestuser    0 2010-05-11 12:57 asocket1
          brw-r--r-- 1 guestuser guestuser 0, 0 2010-05-11 12:32 blkdev
          crw-r--r-- 1 guestuser guestuser 4, 5 2010-05-11 12:33 chardev
          lrwxrwxrwx 1 root      root         6 2010-05-11 12:20 asymlink -> afile
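The difference between the host and guest listings can be reproduced with a small in-memory model of the mapped scheme: the host object is always a plain file or directory with owner-only permissions, while the guest's uid, gid and mode live in extended attributes. The dictionary store and the attribute key names are illustrative assumptions (loosely modelled on a `user.virtfs.*` naming scheme), not the server's actual implementation.

```python
import stat

# Hypothetical attribute keys used only for this illustration.
UID_KEY, GID_KEY, MODE_KEY = "user.virtfs.uid", "user.virtfs.gid", "user.virtfs.mode"


def mapped_create(store, name, guest_uid, guest_gid, guest_mode):
    """Record a guest create: the host object is owner-only, real attrs go to xattrs."""
    is_dir = stat.S_ISDIR(guest_mode)
    store[name] = {
        # Special files become regular files on the host; directories stay directories.
        "host_mode": (stat.S_IFDIR | 0o700) if is_dir else (stat.S_IFREG | 0o600),
        "xattrs": {UID_KEY: guest_uid, GID_KEY: guest_gid, MODE_KEY: guest_mode},
    }


def mapped_getattr(store, name):
    """Return what the guest sees: (uid, gid, mode) read back from the xattrs."""
    x = store[name]["xattrs"]
    return x[UID_KEY], x[GID_KEY], x[MODE_KEY]


if __name__ == "__main__":
    fs = {}
    mapped_create(fs, "afifo", 500, 500, stat.S_IFIFO | 0o644)
    print("host :", stat.filemode(fs["afifo"]["host_mode"]))
    print("guest:", stat.filemode(mapped_getattr(fs, "afifo")[2]))
```

Because the host never holds a real FIFO, socket or device node, nothing the guest creates can be interpreted as a special file by host processes.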
  • 16. Security Model – Passthrough
      - All requests are passed directly to the underlying file system without any interception.
      - File system objects on the file server are created with the client user's credentials.
      - Two methods to do this:
          - setuid/setgid during the creation.
          - chmod/chown immediately after creation.
      - All special files are created as-is.
      - Portable between NFS/CIFS.
      - Susceptible to security issues.
          - Client root can create files on the file server with root privileges if the file server is running as root.
          - Symlinks can be followed locally.
  • 17. Security Model – Passthrough (Cont...)
      - On Host
          # grep 611 /etc/passwd
          hostuser:x:611:611::/home/hostuser:/bin/bash
          # ls -l
          -rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 file1
          -rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 link1
          srwxrwxr-x. 1 hostuser hostuser 0 2010-05-12 18:27 mysock
          lrwxrwxrwx. 1 hostuser hostuser 5 2010-05-12 18:25 symlink1 -> file1
      - On Guest
          $ grep 611 /etc/passwd
          guestuser:x:611:611::/home/guestuser:/bin/bash
          $ ls -l
          -rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 file1
          -rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 link1
          srwxrwxr-x 1 guestuser guestuser 0 2010-05-12 21:27 mysock
          lrwxrwxrwx 1 guestuser guestuser 5 2010-05-12 21:25 symlink1 -> file1
  • 18. ACL Implementation
      - Access Control Lists (ACLs) allow fine-grained control.
      - No universal standards.
          - Linux offers POSIX ACLs, but they are not versatile/rich enough to support NFSv4.
          - Rich ACL patch set for Linux is on the mailing list.
      - Strategy for VirtFS
          - Enforcement at the client.
          - Support only one ACL model.
          - Start with POSIX ACLs.
          - Help Rich ACLs make it into the mainline.
          - Convert to Rich ACLs once they are available in mainline.
  • 19. Where are we?
      - VirtFS server is in QEMU mainline.
      - Security model patchset has been accepted into QEMU mainline and is part of QEMU 0.13.
      - Several patches have made it into mainline Linux.
      - Fedora 13 and Lucid mount VirtFS (9P2000.u).
      - Making good progress on 9P2000.L. Implemented all the VFS calls required to satisfy the Tuxera POSIX test suite. These patches are either on the list or already accepted.
      - A patchset to generalize the worker thread infrastructure in QEMU is in mainline. Work is under way to convert the current single-threaded server into a multi-threaded one using that infrastructure.
      - Working on POSIX ACLs and byte-range lock implementation.
  • 20. Performance Tests
      - Plain and simple; dd to compare.
      - Write
          dd if=/dev/zero of=/mnt/fileX bs=<blocksize> count=<count>
      - Read
          dd if=/mnt/fileX of=/dev/null bs=<blocksize> count=<count>
      - blocksize: 8k, 64k, 2M.
      - count: number of blocks needed to do 8GB worth of I/O.
      - All tests are conducted on the guest.
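The `count` values implied by the test setup follow directly from the block size. The short calculation below assumes binary units (8GB taken as 8 GiB, and 8k/64k/2M as multiples of 1024), which the slide does not state explicitly; the helper names are illustrative.

```python
TOTAL_BYTES = 8 * 1024**3  # assumption: "8GB" means 8 GiB

# Block sizes from the slide, assumed to be binary multiples.
BLOCK_SIZES = {"8k": 8 * 1024, "64k": 64 * 1024, "2M": 2 * 1024 * 1024}


def dd_count(block_size_bytes: int) -> int:
    """Number of blocks needed to transfer TOTAL_BYTES."""
    return TOTAL_BYTES // block_size_bytes


def dd_write_command(path: str, label: str) -> str:
    """Render the write test from the slide for one block size."""
    return f"dd if=/dev/zero of={path} bs={label} count={dd_count(BLOCK_SIZES[label])}"


if __name__ == "__main__":
    for label in BLOCK_SIZES:
        print(dd_write_command("/mnt/fileX", label))
```

Under these assumptions the counts are 1048576, 131072 and 4096 blocks respectively.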
  • 21. Comparison with NFS and CIFS
      [Figure: two charts, Sequential Read and Sequential Write.]
  • 22. Comparison with blockdev
      [Figure: two charts, Sequential Read and Sequential Write.]
  • 23. Next Steps
      - Fully Linux-compliant, complete 9P2000.L protocol.
      - ACL implementation.
      - Page cache sharing between host and guest(s).
      - dcache sharing between host and guest(s).
      - Interfacing with other filesystem APIs.
      - Enable consistent caching.
      - NFS and CIFS exportability.
      - Making it a rootfs for guests instead of using root volumes.
      - Ongoing stability, scalability and performance improvements.
  • 24. Conclusions
      - Huge potential for specialized filesystems in the virtualization space.
      - Growth in the cloud space will be a major catalyst.
      - Lots of scope for innovation.
      - A step towards paravirtual system services.
      - Related Work
          - XenFS
          - Shared Folders (VMHGFS)
          - Lguest 9P support (virtio gateway to spfs)
          - Plan 9 Kernel KVM and Lguest Support (Sandia)
          - Foundation: Venti content-addressable storage back-end for VMware (MIT)
  • 25. Questions?