Solaris 10 Administration Topics Workshop
                                  3 - File Systems
                               By Peter Baer Galvin


                                         For Usenix
                            Last Revision April 2009

                        Copyright 2009 Peter Baer Galvin - All Rights Reserved



Saturday, May 2, 2009
About the Speaker
                         Peter Baer Galvin - 781 273 4100
                         pbg@cptech.com
                         www.cptech.com
                         peter@galvin.info
                         My Blog: www.galvin.info
                         Bio
                                 Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading
                                 systems integrator and VAR, and was the Systems Manager for Brown University's
                                 Computer Science Department. He has written articles for Byte and other magazines. He
                                 was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's
                                 Wicked World, the security column for SunWorld magazine, and Pete's Super Systems, the
                                 systems administration column there. He is now the Sun columnist for Usenix ;login:
                                 magazine. Peter is co-author of the Operating System Concepts and Applied Operating
                                 System Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials
                                 in security and system administration and given talks at many conferences and
                                 institutions.




                                         Copyright 2009 Peter Baer Galvin - All Rights Reserved                            2



Saturday, May 2, 2009
Objectives
                        Cover a wide variety of topics in Solaris 10

                        Useful for experienced system administrators

                        Save time

                        Avoid (my) mistakes

                        Learn about new stuff

                        Answer your questions about old stuff

                        Won't read the man pages to you

                        Workshop for hands-on experience and to reinforce concepts

                        Note – Security covered in separate tutorial


                                    Copyright 2009 Peter Baer Galvin - All Rights Reserved   3




Saturday, May 2, 2009
More Objectives
                        What makes a novice vs. an advanced administrator?
                           Bytes as well as bits, tactics and strategy
                           Knows how to avoid trouble
                               How to get out of it once in it
                               How to not make it worse
                           Has reasoned philosophy
                           Has methodology


                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   4




Saturday, May 2, 2009
Prerequisites

                        Recommend at least a couple of years of
                        Solaris experience
                           Or at least a few years of other Unix
                           experience
                        Best is a few years of admin experience,
                        mostly on Solaris


                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   5




Saturday, May 2, 2009
About the Tutorial

                        Every SysAdmin has a different knowledge set
                        A lot to cover, but notes should make good
                        reference
                           So some covered quickly, some in detail
                               Setting base of knowledge

                           Please ask questions
                               But let’s take off-topic off-line

                               Solaris BOF
                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   6




Saturday, May 2, 2009
Fair Warning
                        Sites vary
                        Circumstances vary
                        Admin knowledge varies
                        My goals
                           Provide information useful for each of you at
                           your sites
                           Provide opportunity for you to learn from
                           each other

                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   7




Saturday, May 2, 2009
Why Listen to Me
                   20 Years of Sun experience
                   Seen much as a consultant
                   Hopefully, you've used:
                        My Usenix ;login: column
                        The Solaris Corner @ www.samag.com
                        The Solaris Security FAQ
                        SunWorld “Pete's Wicked World”
                        SunWorld “Pete's Super Systems”
                        Unix Secure Programming FAQ (out of date)
                        Operating System Concepts (The Dino Book), now 8th ed
                        Applied Operating System Concepts



                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   8




Saturday, May 2, 2009
Slide Ownership

                        As indicated per slide, some slides
                        copyright Sun Microsystems
                        Feel free to share all the slides - as long as
                        you don't charge for them or teach from
                        them for a fee



                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   9




Saturday, May 2, 2009
Overview
                                     Lay of the Land




                        Copyright 2009 Peter Baer Galvin - All Rights Reserved



Saturday, May 2, 2009
Schedule
                         Times and Breaks




                        Copyright 2009 Peter Baer Galvin - All Rights Reserved   11




Saturday, May 2, 2009
Coverage


                        Solaris 10+, with some Solaris 9 where
                        needed
                        Selected topics that are new, different,
                        confusing, underused, overused, etc




                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   12




Saturday, May 2, 2009
Outline

                        Overview
                        Objectives
                        Choosing the most appropriate file system(s)
                        UFS / SDS
                        Veritas FS / VM (not in detail)
                        ZFS




                                   Copyright 2009 Peter Baer Galvin - All Rights Reserved   13




Saturday, May 2, 2009
Polling Time
                        Solaris releases in use?
                           Plans to upgrade?
                        Other OSes in use?
                        Use of Solaris rising or falling?
                           SPARC and x86
                           OpenSolaris?

                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   14




Saturday, May 2, 2009
Your Objectives?




                        Copyright 2009 Peter Baer Galvin - All Rights Reserved   15




Saturday, May 2, 2009
Lab Preparation
                        Have a device capable of telnet on the
                        USENIX network
                           Or have a buddy
                        Learn your "magic number"
                        Telnet to 131.106.62.100 + "magic number"
                        User "root", password "lisa"
                           It's all very secure
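                           For example, with a (hypothetical) magic number
                           of 7, you would telnet to 131.106.62.107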

                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   16




Saturday, May 2, 2009
Lab Preparation

                        Or...
                           Use VirtualBox
                           Use your own system
                           Use a remote machine you have legitimate
                           access to


                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   17




Saturday, May 2, 2009
Choosing the Most Appropriate File Systems




                        Copyright 2009 Peter Baer Galvin - All Rights Reserved



Saturday, May 2, 2009
Choosing the Most Appropriate File Systems

                        Many file systems, many of which are not optional (tmpfs et al.)

                        Where you have choice, how to choose?

                        Consider

                             Solaris version being used

                                   < S10 means no ZFS

                             ISV support

                                   For each ISV make sure desired FS is supported

                                   Apps, backups, clustering

                             Priorities

                                   Now weigh priorities of performance, reliability, experience,
                                   features, risk / reward

                                     Copyright 2009 Peter Baer Galvin - All Rights Reserved        19




Saturday, May 2, 2009
Consider...
                        Pros and cons of mixing file systems
                        Root file system
                           Not much value in using vxfs / vxvm here
                           unless used elsewhere
                        Interoperability (need to detach from one type
                        of system and attach to another?)
                        Cost
                        Supportability & support model
                        Non-production vs. production use
                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   20




Saturday, May 2, 2009
Root Disk Mirroring
                            The Crux of Performance




                         Copyright 2009 Peter Baer Galvin - All Rights Reserved



Saturday, May 2, 2009
Topics


                •Root disk mirroring
                •ZFS



                        Copyright 2009 Peter Baer Galvin - All Rights Reserved   22




Saturday, May 2, 2009
Root Disk Mirroring
                        Complicated because
                          Must be bootable
                          Want it protected from disk failure
                              And want the protection to work


                          Can increase or decrease upgrade
                          complexity
                              Veritas
                              Live upgrade
                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   23




Saturday, May 2, 2009
Manual Mirroring
                        Vxvm encapsulation can cause lack of availability
                        Vxvm needs a rootdg disk
                        Any automatic mirroring can propagate errors
                        Consider
                            Use disksuite (Solaris Volume Manager) to mirror boot disk
                            Use a 3rd disk as rootdg, a 3rd disksuite metadb, and a manual
                            mirror copy
                            Or use a 10MB rootdg on the 2 boot disks and let disksuite do the
                            mirroring
                            Best of all worlds – details in column at
                            www.samag.com/solaris

                                   Copyright 2009 Peter Baer Galvin - All Rights Reserved   24




Saturday, May 2, 2009
Manual Mirroring
             Sometimes want more than no mirroring, less than real mirroring
             Thus "manual mirroring"
                Nightly cron job to copy partitions elsewhere
                Can be used to duplicate the root disk, if installboot is used
                Combination of newfs, mount, ufsdump | ufsrestore (sketch below)
                Quite effective, useful, and cheap
                Easy recovery from corrupt root image, malicious error, sysadmin
                error
                Has saved at least one client
                But disk failure can require manual intervention
                Complete script can be found at www.samag.com/solaris
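                A minimal sketch of such a nightly job, assuming a spare disk at
                c0t2d0 (device names are illustrative; the full script at
                www.samag.com/solaris handles more cases):

             # Hypothetical nightly cron job: copy the root slice to a spare disk
             # and make that disk bootable (SPARC shown; adjust devices to your system)
             echo y | newfs /dev/rdsk/c0t2d0s0      # answer the confirmation prompt
             mount /dev/dsk/c0t2d0s0 /mnt
             ufsdump 0f - /dev/rdsk/c0t0d0s0 | (cd /mnt; ufsrestore rf -)
             installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t2d0s0
             umount /mnt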

                             Copyright 2009 Peter Baer Galvin - All Rights Reserved   25




Saturday, May 2, 2009
Best Practice – Root Disk
                        Have 4 disks for root!
                           1st is primary boot device
                           2nd is disksuite mirror of first
                           3rd is manual mirror of 1st
                           4th is manual mirror, kept on a shelf!
                        Put nothing but systems files on these disks
                        (/, /var, /opt, /usr, swap)

                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   26




Saturday, May 2, 2009
Aside: Disk Performance
                              Which is faster?




                73GB drive                                      300GB drive
               10000 RPM                                         10000 RPM
                   3Gb/sec                                           3Gb/sec

                         Copyright 2009 Peter Baer Galvin - All Rights Reserved   27




Saturday, May 2, 2009
UFS / SDS




                        Copyright 2009 Peter Baer Galvin - All Rights Reserved



Saturday, May 2, 2009
UFS Overview
                        Standard pre-Solaris 10 file system
                        Many years old, updated continuously
                             But still showing its age
                        No integrated volume manager; instead use SDS
                        (disk suite)
                        Very fast, but feature-poor
                             For example, snapshots exist but are only
                             useful for backups (example below)
                        Painful to manage, change, repair
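                        UFS snapshots are created with fssnap(1M), typically just to get a
                        consistent ufsdump of a live file system; a rough sketch (paths and
                        devices are illustrative):

             # Snapshot /export with backing store in /var/tmp
             # (fssnap prints the snapshot device, e.g. /dev/fssnap/0)
             fssnap -F ufs -o bs=/var/tmp /export
             # Back up from the snapshot while /export stays in use
             ufsdump 0f /dev/rmt/0 /dev/rfssnap/0
             # Delete the snapshot when done
             fssnap -F ufs -d /export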

                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   29




Saturday, May 2, 2009
Features
                        64-bit pointers
                        16TB file systems (on 64-bit Solaris)
                        1TB maximum file size
                        metadata logging (by default) increases
                        performance and keeps file systems (usually)
                        consistent after a crash
                        Lots of ISV and internal command (dump) support
                        Only bootable Solaris file system (until S10 10/08)
                        Dynamic multipathing, but via separate “traffic
                        manager” facility
                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   30




Saturday, May 2, 2009
Issues
                        Sometimes there is still corruption

                              Need to run fsck

                                      Sometimes it fails

                        Many limits

                        Many features lacking (compared to ZFS)

                        Lots of manual administration tasks

                              format to slice up a disk

                              newfs to format the file system, fsck to check it

                              mount and /etc/vfstab to mount a file system

                              share commands, plus svcadm commands, to NFS export

                              Plus separate volume management
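                        As a rough illustration of that workflow (device and mount point
                        names are made up):

             # Slice the disk with format (interactive), then create and check UFS
             newfs /dev/rdsk/c1t0d0s0
             fsck /dev/rdsk/c1t0d0s0
             # Mount it now and add an /etc/vfstab entry for boot time
             mkdir /data
             mount /dev/dsk/c1t0d0s0 /data
             echo "/dev/dsk/c1t0d0s0 /dev/rdsk/c1t0d0s0 /data ufs 2 yes -" >> /etc/vfstab
             # NFS-export it
             share -F nfs -o rw /data
             svcadm enable nfs/server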
                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved   31




Saturday, May 2, 2009
Volume Management
                        Separate set of commands (meta*) to manage volumes (RAID et al)

                        For example, to mirror the root file system

                              Have 2 disks with identical partitioning

                                      Have 2 small partitions per disk for meta-data (here
                                      slices 5 and 6)

                              newfs the file systems

                              Create meta-data state databases (at least 3, for quorum)

                                      # metadb -a /dev/dsk/c0t0d0s5

                                      # metadb -a /dev/dsk/c0t0d0s6

                                      # metadb -a /dev/dsk/c0t1d0s5

                                      # metadb -a /dev/dsk/c0t1d0s6

                                   Copyright 2009 Peter Baer Galvin - All Rights Reserved   32




Saturday, May 2, 2009
Volume Management (cont)
                          Initialize submirrors (components of mirrors) and mirror the partitions - here
                          we do /, swap, /var, and /export
                        # metainit -f d10 1 1 c0t0d0s0
                        # metainit -f d20 1 1 c0t1d0s0
                        # metainit d0 -m d10

                          Make the new / bootable
                        # metaroot d0
                        # metainit -f d11 1 1 c0t0d0s1
                        # metainit -f d21 1 1 c0t1d0s1
                        # metainit d1 -m d11
                        # metainit -f d14 1 1 c0t0d0s4
                        # metainit -f d24 1 1 c0t1d0s4
                        # metainit d4 -m d14
                        # metainit -f d17 1 1 c0t0d0s7
                        # metainit -f d27 1 1 c0t1d0s7
                        # metainit d7 -m d17
                                        Copyright 2009 Peter Baer Galvin - All Rights Reserved             33




Saturday, May 2, 2009
Volume Management (cont)

                        Update /etc/vfstab to reflect new meta devices
                        /dev/md/dsk/d1    -        -         swap      -         no          -
                        /dev/md/dsk/d4    /dev/md/rdsk/d4 /var         ufs       1           yes   -
                        /dev/md/dsk/d7    /dev/md/rdsk/d7 /export ufs            1           yes   -

                        Finally attach the submirror to each device to be mirrored
                        # metattach d0 d20
                        # metattach d1 d21
                        # metattach d4 d24
                        # metattach d7 d27

                        Now the root disk is mirrored, and commands such as Solaris upgrade, live
                        upgrade, and boot understand that
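                        One step not shown above: on SPARC, the second disk typically also
                        needs a boot block installed so the system can boot from it if the
                        primary fails, e.g.:

                        # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0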




                                    Copyright 2009 Peter Baer Galvin - All Rights Reserved             34




Saturday, May 2, 2009
Veritas VM / FS




                        Copyright 2009 Peter Baer Galvin - All Rights Reserved



Saturday, May 2, 2009
Overview
                        A popular, commercial addition to Solaris
                        64-bit
                        Integrated volume management (vxfs + vxvm)
                        Mirrored root disk via “encapsulation”
                        Good ISV support
                        Good extended features such as snapshots, replication
                        Shrink and grow file systems
                        Extent based (for better and worse), journaled,
                        clusterable
                        Cross-platform
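                        For flavor, a typical VxVM / VxFS sequence looks roughly like this
                        (disk group and volume names are invented):

             # Create a 10GB volume in disk group mydg, put VxFS on it, and mount it
             vxassist -g mydg make myvol 10g
             mkfs -F vxfs /dev/vx/rdsk/mydg/myvol
             mount -F vxfs /dev/vx/dsk/mydg/myvol /data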
                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved   36




Saturday, May 2, 2009
Features
                        Very large limits
                        Dynamic multipathing included
                        Hot spares to automatically replace failed
                        disks
                        Dirty region logging (DRL) volume
                        transaction logs for fast recovery from
                        a crash
                            But can still require a consistency check

                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   37




Saturday, May 2, 2009
Issues
                        $$$
                        Adds supportability complexities (who do
                        you call)
                        Complicates OS upgrades (unencapsulate
                        first)
                        Fairly complex to manage
                        Comparison of performance vs. ZFS at
                        http://www.sun.com/software/whitepapers/
                        solaris10/zfs_veritas.pdf

                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   38




Saturday, May 2, 2009
ZFS




                        Copyright 2009 Peter Baer Galvin - All Rights Reserved



Saturday, May 2, 2009
ZFS
                        Looks to be the “next great thing”
                        Shipped officially in S10U2 (the 06/06 release)
                        From scratch file system
                        Includes volume management, file system, reliability,
                        scalability, performance, snapshots, clones,
                        replication
                        128-bit file system, almost everything is “infinite”
                        Checksumming throughout
                        Simple, endian independent, export/importable…
                        Still using traffic manager for multipathing
             (some following slides are from ZFS talk by Jeff Bonwick
                and Bill Moore – ZFS team leads at Sun)
                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved   40




Saturday, May 2, 2009
Trouble with Existing Filesystems
                        No defense against silent data corruption
                           Any defect in disk, controller, cable, driver, or firmware can
                           corrupt data silently; like running a server without ECC
                           memory
                        Brutal to manage
                           Labels, partitions, volumes, provisioning, grow/shrink, /etc/
                           vfstab...
                           Lots of limits: filesystem/volume size, file size, number of files,
                           files per directory, number of snapshots, ...
                           Not portable between platforms (e.g. x86 to/from SPARC)
                        Dog slow
                           Linear-time create, fat locks, fixed block size, naïve prefetch,
                           slow random writes, dirty region logging
                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved       41




Saturday, May 2, 2009
Design Principles
                        Pooled storage
                           Completely eliminates the antique notion of volumes
                           Does for storage what VM did for memory

                        End-to-end data integrity
                           Historically considered “too expensive”
                           Turns out, no it isn't
                           And the alternative is unacceptable

                        Transactional operation
                           Keeps things always consistent on disk
                           Removes almost all constraints on I/O order
                           Allows us to get huge performance wins
                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved   42




Saturday, May 2, 2009
Why “volumes” Exist
                        In the beginning, each filesystem managed a
                        single disk
                        Customers wanted more space, bandwidth,
                        reliability
                           Rewrite filesystems to handle many disks: hard
                           Insert a little shim (“volume”) to cobble disks together:
                           easy

                        An industry grew up around the FS/volume
                        model
                           Filesystems, volume managers sold as separate products
                           Inherent problems in FS/volume interface can't be fixed
                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved   43




Saturday, May 2, 2009
Traditional Volumes

                          [Diagram: two separate stacks - an FS on a Volume (stripe)
                          and an FS on a Volume (mirror)]




                          Copyright 2009 Peter Baer Galvin - All Rights Reserved   44




Saturday, May 2, 2009
ZFS Pools

                        Abstraction: malloc/free
                        No partitions to manage
                        Grow/shrink automatically
                        All bandwidth always available
                        All storage in the pool is shared


                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   45




Saturday, May 2, 2009
ZFS Pooled Storage

                         [Diagram: many file systems sharing pooled storage - a Storage
                         Pool (RAIDZ) and a Storage Pool (Mirror)]




                             Copyright 2009 Peter Baer Galvin - All Rights Reserved        46




Saturday, May 2, 2009
ZFS Data Integrity Model
                        Everything is copy-on-write
                           Never overwrite live data
                           On-disk state always valid – no “windows of
                           vulnerability”
                           No need for fsck(1M)
                        Everything is transactional
                           Related changes succeed or fail as a whole
                           No need for journaling
                        Everything is checksummed
                           No silent data corruption
                           No panics due to silently corrupted metadata
                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   48




Saturday, May 2, 2009
                        [Slides 49-71: image-only slides; no text was extracted]
Terms
                        Pool - a set of disks in one or more RAID
                        formats (e.g. a mirrored stripe)
                           No “/” in the pool name
                        File system - mountable-container of files
                        Data set - file system, block device,
                        snapshot, volume or clone within a pool
                           Named via pool/path[@snapshot]
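                           For example, tank/home/pbg@friday would name the snapshot
                           "friday" of the file system home/pbg in the pool tank
                           (names invented)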

                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   72




Saturday, May 2, 2009
Terms (cont)
                        ZIL - ZFS intent log
                           On-disk duplicate of in-memory log of
                           changes to make to data sets
                           A write goes to memory and the ZIL, is
                           acknowledged, and then goes to disk
                        ARC - in-memory read cache
                        L2ARC - level 2 ARC - on flash memory
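                        On ZFS versions that support separate log and cache devices, they
                        are added per pool roughly like this (pool and device names are
                        placeholders):

             # Put the ZIL on a separate fast device; add an L2ARC cache device
             zpool add tank log c3t0d0
             zpool add tank cache c4t0d0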

                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   73




Saturday, May 2, 2009
What ZFS doesn’t do
                        Can’t remove individual devices from pools
                           Rather, replace the device, or 3-way mirror
                           including the device and then remove the device
                           Can’t shrink a pool (yet)
                        Can add individual devices, but that's not optimal (yet)
                           If you add a single disk to a RAIDZ or RAIDZ2 pool, you end up
                           with RAIDZ(2) + 1 concatenated device
                           Instead add full RAID elements to a pool (example below)
                               Add a mirror pair or a RAIDZ(2) set
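                        For example (hypothetical devices), growing a pool by whole
                        top-level RAID elements:

             # Add a mirror pair to the pool
             zpool add tank mirror c2t0d0 c2t1d0
             # Or add another RAIDZ set
             zpool add tank raidz c3t0d0 c3t1d0 c3t2d0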
                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   74




Saturday, May 2, 2009
zpool
             # zpool
             missing command
             usage: zpool command args ...
             where 'command' is one of the following:


                        create [-fn] [-o property=value] ...
                           [-O file-system-property=value] ...
                           [-m mountpoint] [-R root] <pool> <vdev> ...
                        destroy [-f] <pool>


                        add [-fn] <pool> <vdev> ...
                        remove <pool> <device> ...


                        list [-H] [-o property[,...]] [pool] ...
                        iostat [-v] [pool] ... [interval [count]]
                        status [-vx] [pool] ...


                        online <pool> <device> ...
                        offline [-t] <pool> <device> ...
                        clear <pool> [device]

                                    Copyright 2009 Peter Baer Galvin - All Rights Reserved   75


Saturday, May 2, 2009
zpool (cont)
                        attach [-f] <pool> <device> <new-device>
                        detach <pool> <device>
                        replace [-f] <pool> <device> [new-device]


                        scrub [-s] <pool> ...


                        import [-d dir] [-D]
                        import [-o mntopts] [-o property=value] ...
                            [-d dir | -c cachefile] [-D] [-f] [-R root] -a
                        import [-o mntopts] [-o property=value] ...
                          [-d dir | -c cachefile] [-D] [-f] [-R root] <pool | id>
                  [newpool]
                      export [-f] <pool> ...
                        upgrade
                        upgrade -v
                        upgrade [-V version] <-a | pool ...>


                        history [-il] [<pool>] ...
                        get <"all" | property[,...]> <pool> ...
                        set <property=value> <pool>
                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved   76




Saturday, May 2, 2009
zpool (cont)
             # zpool create ezfs raidz c2t0d0 c3t0d0 c4t0d0 c5t0d0
             # zpool status -v
               pool: ezfs
              state: ONLINE
              scrub: none requested
             config:


                        NAME          STATE      READ WRITE CKSUM
                        ezfs          ONLINE        0      0      0
                          raidz       ONLINE        0      0      0
                            c2t0d0    ONLINE        0      0      0
                            c3t0d0    ONLINE        0      0      0
                            c4t0d0    ONLINE        0      0      0
                            c5t0d0    ONLINE        0      0      0


             errors: No known data errors




                                     Copyright 2009 Peter Baer Galvin - All Rights Reserved   77




Saturday, May 2, 2009
zpool (cont)
                pool: zfs
              state: ONLINE
              scrub: none requested
             config:


                        NAME         STATE        READ WRITE CKSUM
                        zfs          ONLINE          0       0      0
                          raidz      ONLINE          0       0      0
                            c0d0s7   ONLINE          0       0      0
                            c0d1s7   ONLINE          0       0      0
                            c1d1     ONLINE          0       0      0
                            c1d0     ONLINE          0       0      0


             errors: No known data errors




                                   Copyright 2009 Peter Baer Galvin - All Rights Reserved   78




Saturday, May 2, 2009
zpool (cont)
             (/)# zpool iostat -v
                            capacity              operations               bandwidth
             pool         used avail             read write               read write
             ---------- ----- -----             ----- -----              ----- -----
             bigp         630G    392G              2      4             41.3K   496K
               raidz      630G    392G              2      4             41.3K   496K
                  c0d0s6     -       -              0      2             8.14K   166K
                  c0d1s6     -       -              0      2             7.77K   166K
                  c1d0s6     -       -              0      2             24.1K   166K
                  c1d1s6     -       -              0      2             22.2K   166K
             ---------- ----- -----             ----- -----              ----- -----



                         Copyright 2009 Peter Baer Galvin - All Rights Reserved         79




Saturday, May 2, 2009
zpool (cont)
             # zpool status -v
               pool: rpool
              state: ONLINE
              scrub: none requested
             config:
                        NAME         STATE    READ WRITE CKSUM
                        rpool        ONLINE      0    0     0
                          mirror     ONLINE      0    0     0
                            c0d0s0   ONLINE      0    0     0
                            c0d1s0   ONLINE      0    0     0
             errors: No known data errors
               pool: zpbg
              state: ONLINE
              scrub: none requested
             config:
                        NAME         STATE    READ WRITE CKSUM
                        zpbg         ONLINE      0    0     0
                          raidz1     ONLINE      0    0     0
                            c4t0d0   ONLINE      0    0     0
                            c4t1d0   ONLINE      0    0     0
                            c5t0d0   ONLINE      0    0     0
                            c5t1d0   ONLINE      0    0     0
                            c6t0d0   ONLINE      0    0     0
             errors: No known data errors

                                         Copyright 2009 Peter Baer Galvin - All Rights Reserved   80




Saturday, May 2, 2009
zpool (cont)
              zpool iostat -v
                              capacity        operations          bandwidth
             pool           used avail       read write          read write
             ----------    ----- -----      ----- -----         ----- -----
             rpool         6.72G    225G        0         1     9.09K     11.6K
               mirror      6.72G    225G        0         1     9.09K     11.6K
                  c0d0s0       -       -        0         0     5.01K     11.7K
                  c0d1s0       -       -        0         0     5.09K     11.7K
             ----------    -----   -----    -----     -----     -----     -----
             zpbg          3.72T    833G        0         0     32.0K     1.24K
               raidz1      3.72T    833G        0         0     32.0K     1.24K
                  c4t0d0       -       -        0         0     9.58K       331
                  c4t1d0       -       -        0         0     10.3K       331
                  c5t0d0       -       -        0         0     10.4K       331
                 c5t1d0        -       -        0         0     10.3K       331
                 c6t0d0        -       -        0         0     9.54K       331
             ----------    -----   -----    -----     -----     -----     -----


                              Copyright 2009 Peter Baer Galvin - All Rights Reserved   81




Saturday, May 2, 2009
zpool (cont)

                        Note that for import and export, the pool is
                        the unit of operation
                           You can't import or export a file system
                           by itself, because it's an integral part of a pool
                           This might lead you to use smaller pools
                           than you otherwise would
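                        A typical move between hosts looks like this (pool name is
                        illustrative):

             # On the old host: flush state and detach the pool
             zpool export tank
             # On the new host: discover and attach it
             zpool import tank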


                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   82




Saturday, May 2, 2009
zfs
             # zfs
             missing command
             usage: zfs command args ...
             where 'command' is one of the following:


                         create [-p] [-o property=value] ... <filesystem>
                         create [-ps] [-b blocksize] [-o property=value] ... -V <size> <volume>
                         destroy [-rRf] <filesystem|volume|snapshot>


                         snapshot [-r] [-o property=value] ... <filesystem@snapname|
                        volume@snapname>
                         rollback [-rRf] <snapshot>
                         clone [-p] [-o property=value] ... <snapshot> <filesystem|volume>
                         promote <clone-filesystem>
                         rename <filesystem|volume|snapshot> <filesystem|volume|snapshot>
                         rename -p <filesystem|volume> <filesystem|volume>
                         rename -r <snapshot> <snapshot>


                                    Copyright 2009 Peter Baer Galvin - All Rights Reserved        83




Saturday, May 2, 2009
zfs (cont)
                      list [-rH] [-o property[,...]] [-t type[,...]] [-s
                  property] ...
                           [-S property] ... [filesystem|volume|snapshot] ...
                        set <property=value> <filesystem|volume|snapshot> ...
                       get [-rHp] [-o field[,...]] [-s source[,...]]
                           <"all" | property[,...]> [filesystem|volume|
                  snapshot] ...
                       inherit [-r] <property> <filesystem|volume|snapshot> ...
                       upgrade [-v]
                       upgrade [-r] [-V version] <-a | filesystem ...>

                        mount
                        mount [-vO] [-o opts] <-a | filesystem>
                        unmount [-f] <-a | filesystem|mountpoint>
                        share <-a | filesystem>
                        unshare [-f] <-a | filesystem|mountpoint>


                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   84




Saturday, May 2, 2009
zfs (cont)
                        send [-R] [-[iI] snapshot] <snapshot>
                        receive [-vnF] <filesystem|volume|snapshot>
                        receive [-vnF] -d <filesystem>


                        allow [-ldug] <"everyone"|user|group>[,...] <perm|@setname>[,...]
                            <filesystem|volume>
                        allow [-ld] -e <perm|@setname>[,...] <filesystem|volume>
                        allow -c <perm|@setname>[,...] <filesystem|volume>
                        allow -s @setname <perm|@setname>[,...] <filesystem|volume>


                        unallow [-rldug] <"everyone"|user|group>[,...]
                            [<perm|@setname>[,...]] <filesystem|volume>
                        unallow [-rld] -e [<perm|@setname>[,...]] <filesystem|volume>
                        unallow [-r] -c [<perm|@setname>[,...]] <filesystem|volume>
                      unallow [-r] -s @setname [<perm|@setname>[,...]] <filesystem|
                  volume>
             Each dataset is of the form: pool/[dataset/]*dataset[@name]
             For the property list, run: zfs set|get
             For the delegated permission list, run: zfs allow|unallow


                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved     85




Saturday, May 2, 2009
zfs (cont)
             # zfs get
             missing property argument
             usage:
                        get [-rHp] [-o field[,...]] [-s source[,...]]
                            <"all" | property[,...]> [filesystem|volume|snapshot] ...
             The following properties are supported:
                        PROPERTY         EDIT   INHERIT    VALUES
                        available          NO         NO   <size>
                        compressratio      NO         NO   <1.00x or higher if compressed>
                        creation           NO         NO   <date>
                        mounted            NO         NO   yes | no
                        origin             NO         NO   <snapshot>
                        referenced         NO         NO   <size>
                        type               NO         NO   filesystem | volume | snapshot
                        used               NO         NO   <size>
                      aclinherit          YES       YES    discard | noallow | restricted |
                  passthrough
                        aclmode           YES       YES    discard | groupmask | passthrough
                        atime             YES       YES    on | off



                                     Copyright 2009 Peter Baer Galvin - All Rights Reserved    86




Saturday, May 2, 2009
zfs (cont)
                        canmount          YES          NO    on | off | noauto
                        casesensitivity    NO        YES     sensitive | insensitive | mixed
                      checksum            YES        YES     on | off | fletcher2 | fletcher4 |
                  sha256
                        compression       YES        YES     on | off | lzjb | gzip | gzip-[1-9]
                        copies            YES        YES     1 | 2 | 3
                        devices           YES        YES     on | off
                        exec              YES        YES     on | off
                        mountpoint        YES        YES     <path> | legacy | none
                      nbmand              YES        YES     on | off
                      normalization        NO        YES     none | formC | formD | formKC |
                  formKD
                        primarycache      YES        YES     all | none | metadata
                        quota             YES          NO    <size> | none
                        readonly          YES        YES     on | off
                        recordsize        YES        YES     512 to 128k, power of 2
                        refquota          YES          NO    <size> | none
                        refreservation    YES          NO    <size> | none
                        reservation       YES          NO    <size> | none


                                   Copyright 2009 Peter Baer Galvin - All Rights Reserved          87




Saturday, May 2, 2009
zfs (cont)
                                secondarycache    YES        YES     all | none | metadata
                                setuid            YES        YES     on | off
                                shareiscsi        YES        YES     on | off | type=<type>
                               sharenfs           YES        YES     on | off | share(1M)
                           options
                               sharesmb           YES        YES     on | off | sharemgr(1M)
                           options
                               snapdir            YES        YES     hidden | visible
                                utf8only           NO        YES     on | off
                                version           YES         NO     1 | 2 | 3 | current
                                volblocksize       NO        YES     512 to 128k, power of 2
                                volsize           YES         NO     <size>
                                vscan             YES        YES     on | off
                                xattr             YES        YES     on | off
                                zoned             YES        YES     on | off


                        Sizes are specified in bytes with standard units such as K, M, G,
                            etc.
                        User-defined properties can be specified by using a name
                            containing a colon (:).

                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved        88




Saturday, May 2, 2009
zfs (cont)
             (/)# zfs list
             NAME                   USED AVAIL REFER              MOUNTPOINT
             bigp                   630G   384G      -            /zfs/bigp
             bigp/big               630G   384G   630G            /zfs/bigp/big
             (root@sparky)-(7/pts)-(06:35:11/05/05)-
             (/)# zfs snapshot bigp/big@5-nov
             (root@sparky)-(8/pts)-(06:35:11/05/05)-
             (/)# zfs list
             NAME                   USED AVAIL REFER              MOUNTPOINT
             bigp                   630G   384G      -            /zfs/bigp
             bigp/big               630G   384G   630G            /zfs/bigp/big
             bigp/big@5-nov            0      -   630G            /zfs/bigp/big@5-nov

             # zfs send bigp/big@5-nov | ssh host zfs receive poolB/received/
                big@5-nov
             # zfs send -i 5-nov big/bigp@6-nov | ssh host 
                zfs receive poolB/received/big

                            Copyright 2009 Peter Baer Galvin - All Rights Reserved      89




Saturday, May 2, 2009
zfs (cont)
            # zpool history
            History for 'zpbg':
            2006-04-03.11:47:44 zpool create -f zpbg raidz c5t0d0 c10t0d0
                c11t0d0 c12t0d0 c13t0d0
            2006-04-03.18:19:48 zfs receive zpbg/imp
            2006-04-03.18:41:39 zfs receive zpbg/home
            2006-04-03.19:04:22 zfs receive zpbg/photos
            2006-04-03.19:37:56 zfs set mountpoint=/export/home zpbg/home
            2006-04-03.19:44:22 zfs receive zpbg/mail
            2006-04-03.20:12:34 zfs set mountpoint=/var/mail zpbg/mail
            2006-04-03.20:14:32 zfs receive zpbg/mqueue
            2006-04-03.20:15:01 zfs set mountpoint=/var/spool/mqueue zpbg/
                mqueue
            # zfs create -V 2g tank/volumes/v2
            # zfs set shareiscsi=on tank/volumes/v2
            # iscsitadm list target
            Target: tank/volumes/v2
                 iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-
            cf9a72aa062a
                 Connections: 0
                           Copyright 2009 Peter Baer Galvin - All Rights Reserved   90




Saturday, May 2, 2009
zpool history -l
                        Shows user name, host name, and zone of
                        command
            # zpool history -l users
            History for ’users’:
            2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0
            [user root on corona:global]
            2008-07-10.09:43:13 zfs create users/marks
            [user root on corona:global]
            2008-07-10.09:43:44 zfs destroy users/marks
            [user root on corona:global]
            2008-07-10.09:43:48 zfs create users/home
            [user root on corona:global]
            2008-07-10.09:43:56 zfs create users/home/markm
            [user root on corona:global]
            2008-07-10.09:44:02 zfs create users/home/marks
            [user root on corona:global]


                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   91




Saturday, May 2, 2009
zpool history -i

                        Shows zfs internal activities - useful for
                        debugging
                # zpool history -i users
                History for ’users’:
                2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0
                2008-07-10.09:43:13 [internal create txg:6] dataset = 21
                2008-07-10.09:43:13 zfs create users/marks
                2008-07-10.09:43:48 [internal create txg:12] dataset = 27
                2008-07-10.09:43:48 zfs create users/home
                2008-07-10.09:43:55 [internal create txg:14] dataset = 33




                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   92




Saturday, May 2, 2009
ZFS Delegate Admin
                        Use zfs allow and zfs unallow to grant
                        and remove permissions
                        Use the “delegation” pool property to control
                        whether delegation is enabled
                        Then delegate
                 # zfs allow cindys create,destroy,mount,snapshot tank/cindys
                 # zfs allow tank/cindys
                 -------------------------------------------------------------
                 Local+Descendent permissions on (tank/cindys)
                 user cindys create,destroy,mount,snapshot
                 -------------------------------------------------------------

                 # zfs unallow cindys tank/cindys
                 # zfs allow tank/cindys

                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   93




Saturday, May 2, 2009
ZFS - Odds and Ends
                        zfs get all will display all set attributes of all ZFS file
                        systems
                        Recursive snapshots (via -r) as of S10 8/07
                        zfs clone makes a RW copy of a snapshot
                        zfs promote sets the root of the file system to be the
                        specified clone
                        You can undo a zpool destroy with zpool import
                        -D
                        As of S10 8/07 ZFS is integrated with FMA
                        As of S10 11/06 ZFS supports double-parity RAID (RAIDZ2)
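                        A quick sketch of the clone / promote workflow and of undoing a
                        destroy (dataset names are invented):

             # Clone a snapshot read-write, then promote the clone so it no
             # longer depends on its origin
             zfs snapshot tank/ws@stable
             zfs clone tank/ws@stable tank/ws-test
             zfs promote tank/ws-test
             # Recover a pool destroyed by mistake
             zpool import -D tank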
                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved   94




Saturday, May 2, 2009
ZFS “GUI”

                        Did you know that Solaris has an admin
                        GUI?
                        Webconsole enabled by default
                        Turn off via svcadm if not used
                        By default (on Nevada B64 at least) ZFS is
                        the only feature enabled in it


                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   95




Saturday, May 2, 2009
Copyright 2009 Peter Baer Galvin - All Rights Reserved   96




Saturday, May 2, 2009
ZFS Automatic Snapshots
                        In Nevada 100 (LSARC 2008/571) - will be in OpenSolaris
                        2008.11

                        SMF service and GNOME app

                        Can take automatic scheduled snapshots

                              By default all zfs file systems, at boot, then every 15
                              minutes, every hour, every day, etc

                              Auto delete of oldest snapshots if user-defined
                              amount of space is not available

                        Can perform incremental or full backups via those snapshots
                        Nautilus integration allows user to browse and restore files
                        graphically

                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved   97




Saturday, May 2, 2009
ZFS Automatic Snapshots (cont)


                        One SMF service per time frequency:
              frequent      snapshots every 15 mins, keeping 4 snapshots
              hourly        snapshots every hour, keeping 24 snapshots
              daily         snapshots every day, keeping 31 snapshots
              weekly        snapshots every week, keeping 7 snapshots
              monthly       snapshots every month, keeping 12 snapshots

                        Details here: http://src.opensolaris.org/source/xref/jds/zfs-
                        snapshot/README.zfs-auto-snapshot.txt
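                        Assuming the service FMRIs from that README, individual schedules
                        are toggled with svcadm and per-dataset behavior with a ZFS user
                        property, e.g.:

             # Enable the 15-minute schedule and opt a dataset into it
             svcadm enable svc:/system/filesystem/zfs/auto-snapshot:frequent
             zfs set com.sun:auto-snapshot:frequent=true tank/timf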




                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved   98




Saturday, May 2, 2009
ZFS Automatic Snapshots (cont)
                        Service properties provide more details

                        zfs/fs-name        The name of the filesystem. If the special filesystem name "//" is used, then the
                        system snapshots only filesystems with the zfs user property "com.sun:auto-snapshot:<label>" set to
                        true, so to take frequent snapshots of tank/timf, run the following zfs command:

                           # zfs set com.sun:auto-snapshot:frequent=true tank/timf
                        The "snap-children" property is ignored when using this fs-name value. Instead, the system
                        automatically determines when it can take recursive vs. non-recursive snapshots of the system,
                        based on the values of the ZFS user properties.

                        zfs/interval       [ hours | days | months | none ]

                        When set to none, we don't take automatic snapshots, but leave an SMF instance available for users to
                        manually fire the method script whenever they want - useful for snapshotting on system events.

                        zfs/keep           How many snapshots to retain - e.g. setting this to "4" would keep only the four
                        most recent snapshots. When each new snapshot is taken, the oldest is destroyed. If a snapshot has
                        been cloned, the service will drop to maintenance mode when attempting to destroy that snapshot.
                        Setting to "all" keeps all snapshots.

                        zfs/period         How often you want to take snapshots, in intervals set according to "zfs/interval"
                        (e.g. every 10 days)



                                          Copyright 2009 Peter Baer Galvin - All Rights Reserved                                 99




Saturday, May 2, 2009
ZFS Automatic Snapshots (cont)
                        zfs/snapshot-children    "true" if you would like to recursively take snapshots of all child
                        filesystems of the specified fs-name. This value is ignored when setting zfs/fs-name='//'

                        zfs/backup               [ full | incremental | none ]

                        zfs/backup-save-cmd      The command string used to save the backup stream.

                        zfs/backup-lock          You shouldn't need to change this - but it should be set to "unlocked"
                        by default. We use it to indicate when a backup is running.

                        zfs/label                A label that can be used to differentiate this set of snapshots from
                        others; not required. If multiple schedules are running on the same machine, using
                        distinct labels for each schedule is needed - otherwise one schedule could remove
                        snapshots taken by another schedule according to its snapshot-retention policy. (see
                        "zfs/keep")

                        zfs/verbose              Set to false by default; setting to true makes the service
                        produce more output about what it's doing.

                        zfs/avoidscrub           Set to false by default, this determines whether we should avoid
                        taking snapshots on any pools that have a scrub or resilver in progress. More info in the
                        bugid:
                        bugid:

                                6343667 need itinerary so interrupted scrub/resilver doesn't have to start over
                                       Copyright 2009 Peter Baer Galvin - All Rights Reserved                        100




Saturday, May 2, 2009
ZFS Automatic Snapshot (cont)




                        http://blogs.sun.com/erwann/resource/menu-location.png




                              Copyright 2009 Peter Baer Galvin - All Rights Reserved   101




Saturday, May 2, 2009
ZFS Automatic Snapshot (cont)



                        If the life-preserver icon is enabled in the file browser,
                        then a backup of the directory is available
                            Press it to bring up the nav bar




                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   102




Saturday, May 2, 2009
ZFS Automatic Snapshot (cont)
                        Drag the slider into the past to show previous versions
                        of files in the directory
                        Then right-click on a file and select “Restore to
                        Desktop” if you want it back
                        More features coming




                            Press to bring up nav bar
                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   103




Saturday, May 2, 2009
ZFS Status
                        NetBackup and Legato support ZFS for
                        backup / restore
                        VCS supports ZFS as the file system of
                        clustered services
                        Most vendors don’t care which file system the
                        app runs on
                        Performance as good as other file systems
                            Feature set better

                              Copyright 2009 Peter Baer Galvin - All Rights Reserved   104




Saturday, May 2, 2009
ZFS Futures
                        Support by ISVs
                               Backup / restore
                                    Some don’t get metadata (yet)

                                    Use zfs send to emit a file containing the filesystem

                               Clustering (see Lustre)

                        Performance still a work in progress
                        Being ported to FreeBSD and Mac OS X Leopard
                        Check out the ZFS FAQ at
                                 http://www.opensolaris.org/os/community/zfs/faq/



                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   105




Saturday, May 2, 2009
ZFS Performance
                         From http://www.opensolaris.org/jive/thread.jspa?messageID=14997
             billm


                 On Thu, Nov 17, 2005 at 05:21:36AM -0800, Jim Lin wrote:
                 > Does ZFS reorganize (ie. defrag) the files over time?

                     Not yet.

                     > If it doesn't, it might not perform well in "write-little read-much"
                     > scenarios (where read performance is much more important than write
                     > performance).

                     As always, the correct answer is "it depends". Let's take a look at
                     several cases:

                     - Random reads: No matter if the data was written randomly or
                     sequentially, random reads are random for any filesystem,
                     regardless of their layout policy. Not much you can do to
                     optimize these, except have the best I/O scheduler possible.

                                       Copyright 2009 Peter Baer Galvin - All Rights Reserved   106




Saturday, May 2, 2009
ZFS Performance (cont)

                  - Sequential writes, sequential reads: With ZFS, sequential writes
                  lead to sequential layout on disk. So sequential reads will
                  perform quite well in this case.

                  - Random writes, sequential reads: This is the most interesting
                  case. With random writes, ZFS turns them into sequential writes,
                  which go *really* fast. With sequential reads, you know which
                  order the reads are going to be coming in, so you can kick off
                  a bunch of prefetch reads. Again, with a good I/O scheduler
                  (which ZFS just happens to have), you can turn this into good read
                  performance, if not entirely as good as totally sequential.

                  Believe me, we've thought about this a lot. There is a lot we can do to
                  improve performance, and we're just getting started.


                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved     107




Saturday, May 2, 2009
ZFS Performance (cont)
                        For DBs and other applications that want
                        direct disk access
                           There is no direct I/O in ZFS
                           But can get very good performance by
                           matching the I/O size of the app (e.g.
                           Oracle uses 8K) with the recordsize of the
                           ZFS file system
                              Set recordsize before the data is written - it
                              only affects newly written files (see the sketch below)
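
                        A minimal sketch of matching the record size to an 8K database I/O size (dataset name is an
                        assumption; set the property before loading data, since it only affects newly written files):

                           # zfs create -o recordsize=8k mypool/oradata
                           # zfs get recordsize mypool/oradata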
                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   108




Saturday, May 2, 2009
ZFS Performance (cont)
                        The ZIL can be a bottleneck on NFS servers
                           NFS does sync writes
                           Put the ZIL on another disk, or on SSD
                        ZFS aggressively uses memory for caching (the ARC)
                        The ARC is a low-priority user of memory, but can cause
                        temporary conflicts with other memory consumers
                        Use arcstat to monitor memory use
                    http://www.solarisinternals.com/wiki/index.php/Arcstat
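
                        A sketch of the separate-ZIL suggestion above (pool and device names are assumptions; separate
                        log devices require a sufficiently recent pool version):

                           # zpool add tank log c4t0d0
                           # zpool status tank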
                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   109




Saturday, May 2, 2009
ZFS Backup Tool
                        Zetaback is a thin-agent based ZFS backup tool

                        Runs from a central host

                        Scans clients for new ZFS filesystems

                        Manages varying desired backup intervals (per host) for

                               full backups
                               incremental backups

                        Maintains varying retention policies (per host)

                        Summarizes existing backups

                        Restores any host:fs backup at any point in time to any target
                        host
                         https://labs.omniti.com/trac/zetaba
                                     Copyright 2009 Peter Baer Galvin - All Rights Reserved   110




Saturday, May 2, 2009
zfs upgrade
                        On-disk format of ZFS changes over time
                        Forward-upgradeable, but not backward
                        compatible
                           Watch out when attaching and detaching zpools
                           Also, “zfs send” streams are not readable by older ZFS versions
                 # zfs upgrade
                 This system is currently running ZFS filesystem version 2.
                 The following filesystems are out of date, and can be upgraded. After being
                 upgraded, these filesystems (and any ’zfs send’ streams generated from
                 subsequent snapshots) will no longer be accessible by older software
                 versions.
                 VER FILESYSTEM
                 --- ------------
                  1  datab
                  1  datab/users
                  1  datab/users/area51
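
                 To move forward (a sketch - first confirm that no older host or backup-stream consumer still needs
                 the old versions), both the pool and filesystem formats can be upgraded in one step each:

                    # zpool upgrade -a
                    # zfs upgrade -a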

                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved           111




Saturday, May 2, 2009
Automatic Snapshots and Backups



                        Unsupported services, may become
                        supported
                        http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_10
                        http://blogs.sun.com/timf/entry/zfs_automatic_for_the_people




                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   112




Saturday, May 2, 2009
ZFS - Smashing!




                        http://www.youtube.com/watch?v=CN6iDzesEs0&fmt=18
                               Copyright 2009 Peter Baer Galvin - All Rights Reserved   113




Saturday, May 2, 2009
Storage Odds and Ends
                  iostat -y    shows performance info on multipathed devices

                  raidctl     is the RAID configuration tool for multiple RAID controllers

                  fsstat    is a per-file-system-type stat command
                  # fsstat -F
                     new  name  name  attr  attr lookup rddir  read  read write write
                    file remov  chng   get   set    ops   ops   ops bytes   ops bytes
                       0     0     0     0     0      0     0     0     0     0     0 ufs
                       0     0     0 26.0K     0  52.0K   354 4.71K 1.56M     0     0 proc
                       0     0     0     0     0      0     0     0     0     0     0 nfs
                   53.2K 1.02K 24.0K 8.99M 48.6K  4.26M  161K 44.8M 11.8G 23.1M 6.58G zfs
                       0     0     0 2.94K     0      0     0     0     0     0     0 lofs
                   7.26K 2.84K 4.30K 31.5K    83  35.4K     6 40.5K 41.3M 45.6K 39.2M tmpfs
                       0     0     0   410     0      0     0    33 11.0K     0     0 mntfs
                       0     0     0     0     0      0     0     0     0     0     0 nfs3
                       0     0     0     0     0      0     0     0     0     0     0 nfs4
                       0     0     0     0     0      0     0     0     0     0     0 autofs
                                         Copyright 2009 Peter Baer Galvin - All Rights Reserved              114




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes
                  http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html
                  Example 1: ZFS Filesystem

                  Objectives:

                                 Understand the purpose of the ZFS filesystem.

                                 Configure a ZFS pool and filesystem.

                  Requirements:

                                 A server (SPARC or x64 based) running the OpenSolaris OS.

                                 Configuration details from the running server.

                  Step 1: Identify your Disks.

                  Identify the storage available for adding to the ZFS pool using the format(1) command. Your output will vary from that shown here:

                  # format
                  Searching for disks...done
                  AVAILABLE DISK SELECTIONS:
                                0. c0t2d0
                                   /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@2,0
                                1. c0t3d0
                                   /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@3,0
                  Specify disk (enter its number): ^D

                                              Copyright 2009 Peter Baer Galvin - All Rights Reserved                                                   115




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont




                  Step 2: Add your disks to your ZFS pool.

                  # zpool create -f mypool c0t3d0s0
                  # zpool list
                  NAME          SIZE       USED     AVAIL     CAP    HEALTH    ALTROOT
                  mypool         10G        94K     10.0G      0%    ONLINE    -
                  Step 3: Create a filesystem in your pool.

                  # zfs create mypool/myfs
                  # df -h /mypool/myfs
                  Filesystem                          size   used    avail capacity       Mounted on
                  mypool/myfs                         9.8G    18K     9.8G         1%     /mypool/myfs
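
                  Not part of the original article, but ZFS properties such as compression or a quota can be layered
                  on the new filesystem at this point, for example:

                     # zfs set compression=on mypool/myfs
                     # zfs set quota=5G mypool/myfs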




                                       Copyright 2009 Peter Baer Galvin - All Rights Reserved            116




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont
                  Example 2: Network File System (NFS)

                  Objectives:

                                Understand the purpose of the NFS filesystem.

                                Create an NFS shared filesystem on a server and mount it on a client.

                  Requirements:

                                Two servers (SPARC or x64 based) - one from the previous example - running the OpenSolaris OS.

                                Configuration details from the running systems.

                  Step 1: Create the NFS shared filesystem on the server.

                  Switch on the NFS service on the server:

                  # svcs nfs/server
                  STATE                   STIME          FMRI
                  disabled                6:49:39        svc:/network/nfs/server:default
                  # svcadm enable nfs/server
                  Share the ZFS filesystem over NFS:

                  # zfs set sharenfs=on mypool/myfs
                  # dfshares
                  RESOURCE                        SERVER ACCESS TRANSPORT
                  x4100:/mypool/myfs              x4100      -             -

                                             Copyright 2009 Peter Baer Galvin - All Rights Reserved                              117




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 2: Switch on the NFS service on the client.

                  This is similar to the procedure for the server:

                  # svcs nfs/client
                  STATE                   STIME          FMRI
                  disabled                6:47:03        svc:/network/nfs/client:default
                  # svcadm enable nfs/client
                  Mount the shared filesystem on the client:

                  # mkdir /mountpoint
                  # mount -F nfs x4100:/mypool/myfs /mountpoint
                  # df -h /mountpoint
                  Filesystem                     size     used       avail capacity    Mounted on
                  x4100:/mypool/myfs 9.8G                 18K        9.8G   1%        /mountpoint
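
                  To make the mount persist across reboots, an /etc/vfstab entry along these lines could be added on
                  the client (a sketch using the example names above):

                     x4100:/mypool/myfs  -  /mountpoint  nfs  -  yes  rw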




                                      Copyright 2009 Peter Baer Galvin - All Rights Reserved        118




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Example 3: Common Internet File System (CIFS)

                  Objectives:

                                Understand the purpose of the CIFS filesystem.

                                Configure a CIFS share on one machine (from the previous example) and make it available on the other machine.

                  Requirements:

                                Two servers (SPARC or x64 based) running the OpenSolaris OS.

                                Configuration details provided here.

                  Step 1: Create a ZFS filesystem for CIFS.

                  # zfs create -o casesensitivity=mixed mypool/myfs2
                  # df -h /mypool/myfs2
                  Filesystem             size   used  avail capacity  Mounted on
                  mypool/myfs2           9.8G    18K   9.8G     1%    /mypool/myfs2
                  Step 2: Switch on the SMB Server service on the server.

                  # svcs smb/server
                  STATE                   STIME         FMRI
                  disabled                6:49:39       svc:/network/smb/server:default
                  # svcadm enable smb/server


                                            Copyright 2009 Peter Baer Galvin - All Rights Reserved                                             119




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 3: Share the filesystem using CIFS.

                  # zfs set sharesmb=on mypool/myfs2
                  Verify using the following command:

                  # zfs get sharesmb mypool/myfs2
                  NAME                       PROPERTY             VALUE    SOURCE
                  mypool/myfs2               sharesmb             on       local
                  Step 4: Verify the CIFS naming.

                  Because we have not explicitly named the share, we can examine the default name assigned to it using the following command:

                  # sharemgr show -vp
                  default nfs=()
                  zfs
                         zfs/mypool/myfs nfs=()
                                     /mypool/myfs
                         zfs/mypool/myfs2 smb=()
                                     mypool_myfs2=/mypool/myfs2
                  Both the NFS share (/mypool/myfs) and the CIFS share (mypool_myfs2) are shown.
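
                  If a friendlier share name is wanted, it can be set explicitly when sharing (a sketch, not part of
                  the original article):

                     # zfs set sharesmb=name=myfs2 mypool/myfs2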

                  Step 5: Edit the file /etc/pam.conf to support creation of an encrypted version of the user's password for CIFS.

                  Add the following line to the end of the file:

                  other password required pam_smb_passwd.so.1 nowarn


                                                     Copyright 2009 Peter Baer Galvin - All Rights Reserved                                     120




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont



                  Step 6: Change the password using the passwd command.
                  # passwd username
                  New Password:
                  Re-enter new Password:
                  passwd: password successfully changed for root
                  Now repeat Steps 5 and 6 on the Solaris client.

                  Step 7: Enable the CIFS client service on the client node.
                  # svcs smb/client
                  STATE              STIME        FMRI
                  disabled           6:47:03      svc:/network/smb/client:default
                  # svcadm enable smb/client



                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   121




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 8: Make a mount point on the client and mount the CIFS resource
                  from the server.

                  Mount the resource across the network and check it using the following
                  command sequence:
                  # mkdir /mountpoint2
                  # mount -F smbfs //root@x4100/mypool_myfs2 /mountpoint2
                  Password: *******
                  # df -h /mountpoint2
                  Filesystem                  size   used  avail capacity  Mounted on
                  //root@x4100/mypool_myfs2   9.8G    18K   9.8G     1%    /mountpoint2
                  # df -n
                  /               : ufs
                  /mountpoint     : nfs
                  /mountpoint2    : smbfs
                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved       122




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont
                  Example 4: Comstar Fibre Channel Target

                        Objectives

                               Understand the purpose of the Comstar Fibre Channel target.

                               Configure an FC target and initiator on two servers.

                        Requirements:

                               Two servers (SPARC or x64 based) running the OpenSolaris OS.

                               Configuration details provided here.

                        Step 1: Start the SCSI Target Mode Framework and verify it.

                        Use the following commands to start up and check the service on the host that provides the target:

                  # svcs stmf
                  STATE                 STIME        FMRI
                  disabled              19:15:25 svc:/system/device/stmf:default
                  # svcadm enable stmf
                  # stmfadm list-state
                  Operational Status: online
                  Config Status             : initialized
                                         Copyright 2009 Peter Baer Galvin - All Rights Reserved                              123




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 2: Ensure that the framework can see the ports.

                  Use the following command to ensure that the target mode framework can see the HBA ports:
                  # stmfadm list-target -v

                  Target: wwn.210000E08B909221

                        Operational Status: Online

                        Provider Name     : qlt
                        Alias             : qlt0,0

                        Sessions          : 4
                            Initiator: wwn.210100E08B272AB5

                                Alias: ute198:qlc1
                                Logged in since: Thu Mar 27 16:38:30 2008

                            Initiator: wwn.210100E08B296A60
                                Alias: ute198:qlc3
                                Logged in since: Thu Mar 27 16:38:30 2008

                            Initiator: wwn.210000E08B072AB5
                                Alias: ute198:qlc0

                                Logged in since: Thu Mar 27 16:38:30 2008
                            Initiator: wwn.210000E08B096A60

                                Alias: ute198:qlc2
                                Logged in since: Thu Mar 27 16:38:30 2008

                                        Copyright 2009 Peter Baer Galvin - All Rights Reserved                124




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Target: wwn.210100E08BB09221
                        Operational Status: Online
                        Provider Name        : qlt
                        Alias                : qlt1,0
                        Sessions             : 4
                            Initiator: wwn.210100E08B272AB5
                                   Alias: ute198:qlc1
                                   Logged in since: Thu Mar 27 16:38:30 2008
                            Initiator: wwn.210100E08B296A60
                                   Alias: ute198:qlc3
                                   Logged in since: Thu Mar 27 16:38:30 2008
                            Initiator: wwn.210000E08B072AB5
                                   Alias: ute198:qlc0
                                   Logged in since: Thu Mar 27 16:38:30 2008
                            Initiator: wwn.210000E08B096A60
                                   Alias: ute198:qlc2
                                   Logged in since: Thu Mar 27 16:38:30 2008



                                       Copyright 2009 Peter Baer Galvin - All Rights Reserved   125




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 3: Create a device to use as storage for the target.

                           Use ZFS to create a volume (zvol) for use as the storage behind the
                           target:


                  # zpool list
                  NAME       SIZE       USED    AVAIL       CAP    HEALTH    ALTROOT
                  mypool       68G        94K   68.0G        0%    ONLINE    -


                  # zfs create -V 5gb mypool/myvol
                  # zfs list
                  NAME                 USED     AVAIL   REFER     MOUNTPOINT
                  mypool              5.00G     61.9G      18K    /mypool
                  mypool/myvol            5G    66.9G      16K    -




                                     Copyright 2009 Peter Baer Galvin - All Rights Reserved      126




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 4: Register the zvol with the framework.

                  The zvol becomes the SCSI logical unit (disk) behind the target:
                  # sbdadm create-lu /dev/zvol/rdsk/mypool/myvol
                  Created the following LU:
                  GUID                              DATA SIZE   SOURCE
                  6000ae4093000000000047f3a1930007  5368643584  /dev/zvol/rdsk/mypool/myvol


                  Confirm its existence as follows:


                  # stmfadm list-lu -v
                  LU Name: 6000AE4093000000000047F3A1930007

                        Operational Status: Online

                        Provider Name       : sbd
                        Alias               : /dev/zvol/rdsk/mypool/myvol
                         View Entry Count     : 0
                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved        127




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 5: Find the initiator HBA ports to which to map the LUs.

                  Discover HBA ports on the initiator host using the following command:
                  # fcinfo hba-port
                  HBA Port WWN: 25000003ba0ad303
                           Port Mode: Initiator
                           Port ID: 1
                           OS Device Name: /dev/cfg/c5
                           Manufacturer: QLogic Corp.
                           Model: 2200
                           Firmware Version: 2.1.145
                           FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver:
                           Type: L-port
                           State: online
                           Supported Speeds: 1Gb
                           Current Speed: 1Gb
                           Node WWN: 24000003ba0ad303


                                   Copyright 2009 Peter Baer Galvin - All Rights Reserved   128




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 5: Find the initiator HBA ports to which to map the LUs.

                  Discover HBA ports on the initiator host using the following command:
                  # fcinfo hba-port
                  HBA Port WWN: 25000003ba0ad303
                           Port Mode: Initiator
                           Port ID: 1
                           OS Device Name: /dev/cfg/c5
                           Manufacturer: QLogic Corp.
                           Model: 2200
                           Firmware Version: 2.1.145
                           FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver:
                           Type: L-port
                           State: online
                           Supported Speeds: 1Gb
                           Current Speed: 1Gb
                           Node WWN: 24000003ba0ad303
                            . . .
                                    Copyright 2009 Peter Baer Galvin - All Rights Reserved   129




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  Step 6: Create a host group and add the world-wide numbers (WWNs) of the initiator host HBA
                  ports to it.

                  Name the group mygroup:
                  # stmfadm create-hg mygroup
                  # stmfadm list-hg
                  Host Group: mygroup

                  Add the WWNs of the ports to the group:
                  # stmfadm add-hg-member -g mygroup wwn.210000E08B096A60 \
                    wwn.210100E08B296A60 \
                    wwn.210100E08B272AB5 \
                    wwn.210000E08B072AB5

                  Now check that everything is in order:
                  # stmfadm list-hg-member -v -g mygroup

                  With the host group created, you're now ready to export the logical unit. This is accomplished by
                  adding a view entry to the logical unit using this host group, as shown in the following command:
                  # stmfadm add-view -h mygroup       6000AE4093000000000047F3A1930007



                                     Copyright 2009 Peter Baer Galvin - All Rights Reserved                           130




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont


                  Step 7: Check the visibility of the targets on the initiator host.

                  First, force the devices on the initiator host to be rescanned with a simple
                  script:
                  #!/bin/ksh
                  fcinfo hba-port |grep "^HBA" |awk '{print $4}'|while read ln
                  do
                            fcinfo remote-port -p $ln -s >/dev/null 2>&1
                  done
                  The disk exported over FC should then appear in the format list:
                  # format
                  Searching for disks...done
                  c6t6000AE4093000000000047F3A1930007d0: configured with
                  capacity of 5.00GB


                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved         131




Saturday, May 2, 2009
Build an OpenSolaris Storage Server in 10 Minutes - cont

                  ...
                  partition> p
                  Current partition table (default):
                  Total disk cylinders available: 20477 + 2 (reserved cylinders)


                  Part         Tag      Flag      Cylinders           Size               Blocks
                    0          root      wm        0 -    511       128.00MB      (512/0/0)       262144
                    1          swap      wu      512 -   1023       128.00MB      (512/0/0)       262144
                    2     backup         wu        0 - 20476          5.00GB      (20477/0/0) 10484224
                    3 unassigned         wm        0                  0           (0/0/0)              0
                    4 unassigned         wm        0                  0           (0/0/0)              0
                    5 unassigned         wm        0                  0           (0/0/0)              0
                    6           usr      wm     1024 - 20476          4.75GB      (19453/0/0)     9959936
                    7 unassigned         wm        0                  0           (0/0/0)              0


                  partition>



                                      Copyright 2009 Peter Baer Galvin - All Rights Reserved                132




Saturday, May 2, 2009
ZFS Root
                  Solaris 10 10/08 (aka S10U6) supports installation with ZFS as the root file
                  system (as does OpenSolaris)

                  Note that you can’t as of U6 flash archive a ZFS root system(!)

                  Can migrate by using Live Upgrade (LU) to copy the root to a second disk (a ZFS pool),
                  upgrading there, then booting there

                  lucreate to copy the primary BE to create an alternate BE

                          # zpool create mpool mirror c1t0d0s0 c1t1d0s0
                          # lucreate -c c1t2d0s0 -n zfsBE -p mpool
                  The default file systems are created in the specified pool and the non-shared file
                  systems are then copied into the root pool

                  Run luupgrade to upgrade the alternate BE (optional)

                 Run luactivate on the newly upgraded alternate BE so that when the system is
                rebooted, it will be the new primary BE

                # luactivate zfsBE
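
                A sketch of the optional luupgrade step mentioned above (the install-media path is an assumption):

                   # luupgrade -u -n zfsBE -s /net/installserver/export/s10u6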
                                  Copyright 2009 Peter Baer Galvin - All Rights Reserved            133




Saturday, May 2, 2009
Life is good
                        Once on ZFS as root, life is good
                        Mirror the root disk with 1 command (if not mirrored):
                  # zpool attach rpool c1t0d0s0 c1t1d0s0
                            Note that you have to manually do an installboot on the
                            mirrored disk (see the sketch after this list)
                        Now consider all the ZFS features, used on the boot disk
                            Snapshot before patch, upgrade, any change
                                  Undo change via 1 command
                            Replicate to another system for backup, DR
                            ...
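
                        A sketch of the pre-change snapshot and the manual installboot step mentioned above (SPARC shown -
                        on x86 use installgrub instead; pool and device names are assumptions):

                           # zfs snapshot -r rpool@before_patching
                           # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0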

                                   Copyright 2009 Peter Baer Galvin - All Rights Reserved   134




Saturday, May 2, 2009
ZFS Labs
                        What pools are available in your zone?
                           What are their states?
                           What is their performance like?
                        What ZFS file systems?
                        Create a new file system
                        Create a file there
                        Take a snapshot of that file system
                        Delete the file
                        Revert to the file system state as of the snapshot
                        How do you see the contents of a snapshot?
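
                        One possible command sequence for this lab (pool and dataset names are assumptions; adjust for
                        what exists in your zone):

                           # zpool list
                           # zpool status
                           # zpool iostat -v 5
                           # zfs list
                           # zfs create tank/lab
                           # touch /tank/lab/afile
                           # zfs snapshot tank/lab@before
                           # rm /tank/lab/afile
                           # zfs rollback tank/lab@before
                           # ls /tank/lab/.zfs/snapshot/before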

                                 Copyright 2009 Peter Baer Galvin - All Rights Reserved   135




Saturday, May 2, 2009
ZFS Final Thought
                  Eric Schrock's Weblog      -   Thursday Nov 17, 2005

                  UFS/SVM vs. ZFS: Code Complexity

                  A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People
                  tend to focus on performance, features, and CLI tools as they are easier to compare. I thought I'd take a moment
                  to look at differences in the code complexity between UFS and ZFS. It is well known within the kernel group that
                  UFS is about as brittle as code can get. 20 years of ongoing development, with feature after feature being
                  bolted on tends to result in a rather complicated system. Even the smallest changes can have wide ranging
                  effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is
                  considerably newer, it is a huge beast with its own set of problems. Since ZFS is both a volume manager and a
                  filesystem, we can use this script written by Jeff to count the lines of source code in each component. Not a true
                  measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate
                  yields:

                   UFS: kernel= 46806 user= 40147 total= 86953

                   SVM: kernel= 75917 user=161984 total=237901

                  TOTAL: kernel=122723 user=202131 total=324854

                   ZFS: kernel= 50239 user= 21073 total= 71312

                  The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to
                  be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code
                  (kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM. I wonder what
                  those ZFS numbers will look like in 20 years...
                                          Copyright 2009 Peter Baer Galvin - All Rights Reserved                                      136




Saturday, May 2, 2009
Copyright 2009 Peter Baer Galvin - All Rights Reserved   137




Saturday, May 2, 2009
Where to Learn More
                    Community: http://www.opensolaris.org/os/community/zfs
                    Wikipedia: http://en.wikipedia.org/wiki/ZFS
                    ZFS blogs: http://blogs.sun.com/main/tags/zfs
                        ZFS ports
                           Apple Mac: http://developer.apple.com/adcnews
                           FreeBSD: http://wiki.freebsd.org/ZFS
                           Linux/FUSE: http://zfs-on-fuse.blogspot.com
                           As an appliance: http://www.nexenta.com
                    Beginner’s Guide to ZFS: http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp
                                Copyright 2009 Peter Baer Galvin - All Rights Reserved   138




Saturday, May 2, 2009
Sun Storage 7x10



                         Copyright 2009 Peter Baer Galvin - All Rights Reserved   139




Saturday, May 2, 2009
Speaking of Futures



                        The future of Sun storage?
                        Announced 11/10/2008




                              Copyright 2009 Peter Baer Galvin - All Rights Reserved   140




Saturday, May 2, 2009
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3
2009 04.s10-admin-topics3

Contenu connexe

Plus de Desmond Devendran (20)

Siam key-facts
Siam key-factsSiam key-facts
Siam key-facts
 
Siam foundation-process-guides
Siam foundation-process-guidesSiam foundation-process-guides
Siam foundation-process-guides
 
Siam foundation-body-of-knowledge
Siam foundation-body-of-knowledgeSiam foundation-body-of-knowledge
Siam foundation-body-of-knowledge
 
Enterprise service-management-essentials
Enterprise service-management-essentialsEnterprise service-management-essentials
Enterprise service-management-essentials
 
Service Integration and Management
Service Integration and Management Service Integration and Management
Service Integration and Management
 
Diagram of iso_22301_implementation_process_en
Diagram of iso_22301_implementation_process_enDiagram of iso_22301_implementation_process_en
Diagram of iso_22301_implementation_process_en
 
CHFI 1
CHFI 1CHFI 1
CHFI 1
 
File000176
File000176File000176
File000176
 
File000175
File000175File000175
File000175
 
File000174
File000174File000174
File000174
 
File000173
File000173File000173
File000173
 
File000172
File000172File000172
File000172
 
File000171
File000171File000171
File000171
 
File000170
File000170File000170
File000170
 
File000169
File000169File000169
File000169
 
File000168
File000168File000168
File000168
 
File000167
File000167File000167
File000167
 
File000166
File000166File000166
File000166
 
File000165
File000165File000165
File000165
 
File000164
File000164File000164
File000164
 

Dernier

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Dernier (20)

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

2009 04.s10-admin-topics3

  • 1. Solaris 10 Administration Topics Workshop 3 - File Systems By Peter Baer Galvin For Usenix Last Revision April 2009 Copyright 2009 Peter Baer Galvin - All Rights Reserved Saturday, May 2, 2009
  • 2. About the Speaker Peter Baer Galvin - 781 273 4100 pbg@cptech.com www.cptech.com peter@galvin.info My Blog: www.galvin.info Bio Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading systems integrator and VAR, and was the Systems Manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines. He was contributing editor of the Solaris Corner for SysAdmin Magazine , wrote Pete's Wicked World, the security column for SunWorld magazine, and Pete’s Super Systems, the systems administration column there. He is now Sun columnist for the Usenix ;login: magazine. Peter is co-author of the Operating Systems Concepts and Applied Operating Systems Concepts texbooks. As a consultant and trainer, Mr. Galvin has taught tutorials in security and system administration and given talks at many conferences and institutions. Copyright 2009 Peter Baer Galvin - All Rights Reserved 2 Saturday, May 2, 2009
  • 3. Objectives Cover a wide variety of topics in Solaris 10 Useful for experienced system administrators Save time Avoid (my) mistakes Learn about new stuff Answer your questions about old stuff Won't read the man pages to you Workshop for hands-on experience and to reinforce concepts Note – Security covered in separate tutorial Copyright 2009 Peter Baer Galvin - All Rights Reserved 3 Saturday, May 2, 2009
  • 4. More Objectives What makes novice vs. advanced administrator? Bytes as well as bits, tactics and strategy Knows how to avoid trouble How to get out of it once in it How to not make it worse Has reasoned philosophy Has methodology Copyright 2009 Peter Baer Galvin - All Rights Reserved 4 Saturday, May 2, 2009
  • 5. Prerequisites Recommend at least a couple of years of Solaris experience Or at least a few years of other Unix experience Best is a few years of admin experience, mostly on Solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 5 Saturday, May 2, 2009
  • 6. About the Tutorial Every SysAdmin has a different knowledge set A lot to cover, but notes should make good reference So some covered quickly, some in detail Setting base of knowledge Please ask questions But let’s take off-topic off-line Solaris BOF Copyright 2009 Peter Baer Galvin - All Rights Reserved 6 Saturday, May 2, 2009
  • 7. Fair Warning Sites vary Circumstances vary Admin knowledge varies My goals Provide information useful for each of you at your sites Provide opportunity for you to learn from each other Copyright 2009 Peter Baer Galvin - All Rights Reserved 7 Saturday, May 2, 2009
  • 8. Why Listen to Me 20 Years of Sun experience Seen much as a consultant Hopefully, you've used: My Usenix ;login: column The Solaris Corner @ www.samag.com The Solaris Security FAQ SunWorld “Pete's Wicked World” SunWorld “Pete's Super Systems” Unix Secure Programming FAQ (out of date) Operating System Concepts (The Dino Book), now 8th ed Applied Operating System Concepts Copyright 2009 Peter Baer Galvin - All Rights Reserved 8 Saturday, May 2, 2009
  • 9. Slide Ownership As indicated per slide, some slides copyright Sun Microsystems Feel free to share all the slides - as long as you don’t charge for them or teach from them for fee Copyright 2009 Peter Baer Galvin - All Rights Reserved 9 Saturday, May 2, 2009
  • 10. Overview Lay of the Land Copyright 2009 Peter Baer Galvin - All Rights Reserved Saturday, May 2, 2009
  • 11. Schedule Times and Breaks Copyright 2009 Peter Baer Galvin - All Rights Reserved 11 Saturday, May 2, 2009
  • 12. Coverage Solaris 10+, with some Solaris 9 where needed Selected topics that are new, different, confusing, underused, overused, etc Copyright 2009 Peter Baer Galvin - All Rights Reserved 12 Saturday, May 2, 2009
  • 13. Outline Overview Objectives Choosing the most appropriate file system(s) UFS / SDS Veritas FS / VM (not in detail) ZFS Copyright 2009 Peter Baer Galvin - All Rights Reserved 13 Saturday, May 2, 2009
  • 14. Polling Time Solaris releases in use? Plans to upgrade? Other OSes in use? Use of Solaris rising or falling? SPARC and x86 OpenSolaris? Copyright 2009 Peter Baer Galvin - All Rights Reserved 14 Saturday, May 2, 2009
  • 15. Your Objectives? Copyright 2009 Peter Baer Galvin - All Rights Reserved 15 Saturday, May 2, 2009
  • 16. Lab Preparation Have device capable of telnet on the USENIX network Or have a buddy Learn your “magic number” Telnet to 131.106.62.100+”magic number” User “root, password “lisa” It’s all very secure Copyright 2009 Peter Baer Galvin - All Rights Reserved 16 Saturday, May 2, 2009
  • 17. Lab Preparation Or... Use virtualbox Use your own system Use a remote machine you have legit access to Copyright 2009 Peter Baer Galvin - All Rights Reserved 17 Saturday, May 2, 2009
  • 18. Choosing the Most Appropriate File Systems Copyright 2009 Peter Baer Galvin - All Rights Reserved Saturday, May 2, 2009
  • 19. Choosing the Most Appropriate File Systems Many file systems, many not optional (tmpfs et al) Where you have choice, how to choose? Consider Solaris version being used < S10 means no ZFS ISV support For each ISV make sure desired FS is supported Apps, backups, clustering Priorities Now weigh priorities of performance, reliability, experience, features, risk / reward Copyright 2009 Peter Baer Galvin - All Rights Reserved 19 Saturday, May 2, 2009
  • 20. Consider... Pros and cons of mixing file systems Root file system Not much value in using vxfs / vxvm here unless used elsewhere Interoperability (need to detach from one type of system and attach to another?) Cost Supportability & support model Non-production vs. production use Copyright 2009 Peter Baer Galvin - All Rights Reserved 20 Saturday, May 2, 2009
  • 21. Root Disk Mirroring The Crux of Performance Copyright 2009 Peter Baer Galvin - All Rights Reserved Saturday, May 2, 2009
  • 22. Topics •Root disk mirroring •ZFS Copyright 2009 Peter Baer Galvin - All Rights Reserved 22 Saturday, May 2, 2009
  • 23. Root Disk Mirroring Complicated because Must be bootable Want it protected from disk failure And want the protection to work Can increase or decrease upgrade complexity Veritas Live upgrade Copyright 2009 Peter Baer Galvin - All Rights Reserved 23 Saturday, May 2, 2009
  • 24. Manual Mirroring Vxvm encapsulation can cause lack of availability Vxvm needs a rootdg disk Any automatic mirroring can propagate errors Consider Use disksuite (Solaris Volume Manager) to mirror boot disk Use 3rd disk as rootdg, 3rd disksuite metadb, manual mirror copy Or use 10Mb rootdg on 2 boot disks in disksuite to do the mirroring Best of all worlds – details in column at www.samag.com/solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 24 Saturday, May 2, 2009
  • 25. Manual Mirroring Sometimes want more than no mirroring, less than real mirroring Thus "manual mirroring" Nightly cron job to copy partitions elsewhere Can be used to duplicate root disk, if installboot used Combination of newfs, mount, ufsdump | ufsrestore Quite effective, useful, and cheap Easy recovery from corrupt root image, malicious error, sysadmin error Has saved at least one client But disk failure can require manual intervention Complete script can be found at www.samag.com/solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 25 Saturday, May 2, 2009
  • 26. Best Practice – Root Disk Have 4 disks for root! 1st is primary boot device 2nd is disksuite mirror of first 3rd is manual mirror of 1st 4th is manual mirror, kept on a shelf! Put nothing but systems files on these disks (/, /var, /opt, /usr, swap) Copyright 2009 Peter Baer Galvin - All Rights Reserved 26 Saturday, May 2, 2009
  • 27. Aside: Disk Performance Which is faster? 73GB drive 300GB drive 10000 RPM 10000 RPM 3Gb/sec 3Gb/sec Copyright 2009 Peter Baer Galvin - All Rights Reserved 27 Saturday, May 2, 2009
  • 28. UFS / SDS Copyright 2009 Peter Baer Galvin - All Rights Reserved Saturday, May 2, 2009
  • 29. UFS Overview Standard Pre-Solaris 10 file system Many years old, updated continously But still showing its age No integrated volume manager, instead use SDS (disk suite) Very fast, but feature poor For example snapshots exist but only useful for backups Painful to manage, change, repair Copyright 2009 Peter Baer Galvin - All Rights Reserved 29 Saturday, May 2, 2009
  • 30. Features 64-bit pointers 16TB file systems (on 64-bit Solaris) 1TB maximum file size metadata logging (by default) increases performance and keeps file systems (usually) consistent after a crash Lots of ISV and internal command (dump) support Only bootable Solaris file system (until S10 10/08) Dynamic multipathing, but via separate “traffic manager” facility Copyright 2009 Peter Baer Galvin - All Rights Reserved 30 Saturday, May 2, 2009
  • 31. Issues Sometimes there is still corruption Need to run fsck Sometimes it fails Many limits Many features lacking (compared to ZFS) Lots of manual administration tasks format to slice up a disk newfs to format the file system, fsck to check it mount and /etc/vfstab to mount a file system share commands, plus svcadm commands, to NFS export Plus separate volume management Copyright 2009 Peter Baer Galvin - All Rights Reserved 31 Saturday, May 2, 2009
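For reference, a hedged walk-through of those manual steps on a hypothetical second disk c1t1d0 mounted at /data (slice and option choices are illustrative only):
# format                                  (interactively slice the disk)
# newfs /dev/rdsk/c1t1d0s0                (build the UFS file system)
# fsck /dev/rdsk/c1t1d0s0                 (check it)
# mkdir /data; mount /dev/dsk/c1t1d0s0 /data
/etc/vfstab entry:  /dev/dsk/c1t1d0s0  /dev/rdsk/c1t1d0s0  /data  ufs  2  yes  -
# share -F nfs -o rw /data                (NFS export)
# svcadm enable nfs/server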
• 32. Volume Management Separate set of commands (meta*) to manage volumes (RAID et al) For example, to mirror the root file system Have 2 disks with identical partitioning Have 2 small partitions per disk for meta-data (here slices 5 and 6) newfs the file systems Create meta-data state databases (at least 3, for quorum) # metadb -a /dev/dsk/c0t0d0s5 # metadb -a /dev/dsk/c0t0d0s6 # metadb -a /dev/dsk/c0t1d0s5 # metadb -a /dev/dsk/c0t1d0s6 Copyright 2009 Peter Baer Galvin - All Rights Reserved 32 Saturday, May 2, 2009
  • 33. Volume Management (cont) Initialize submirrors (components of mirrors) and mirror the partitions - here we do /, swap, and /var # metainit -f d10 1 1 c0t0d0s0 # metainit -f d20 1 1 c0t1d0s0 # metainit d0 -m d10 Make the new / bootable # metaroot d0 # metainit -f d11 1 1 c0t0d0s1 # metainit -f d21 1 1 c0t1d0s1 # metainit d1 -m d11 # metainit -f d14 1 1 c0t0d0s4 # metainit -f d24 1 1 c0t1d0s4 # metainit d4 -m d14 # metainit -f d17 1 1 c0t0d0s7 # metainit -f d27 1 1 c0t1d0s7 # metainit d7 -m d17 Copyright 2009 Peter Baer Galvin - All Rights Reserved 33 Saturday, May 2, 2009
  • 34. Volume Management (cont) Update /etc/vfstab to reflect new meta devices /dev/md/dsk/d1 - - swap - no - /dev/md/dsk/d4 /dev/md/rdsk/d4 /var ufs 1 yes - /dev/md/dsk/d7 /dev/md/rdsk/d7 /export ufs 1 yes - Finally attach the submirror to each device to be mirrored # metattach d0 d20 # metattach d1 d21 # metattach d4 d24 # metattach d7 d27 Now the root disk is mirrored, and commands such as Solaris upgrade, live upgrade, and boot understand that Copyright 2009 Peter Baer Galvin - All Rights Reserved 34 Saturday, May 2, 2009
  • 35. Veritas VM / FS Copyright 2009 Peter Baer Galvin - All Rights Reserved Saturday, May 2, 2009
  • 36. Overview A popular, commercial addition to Solaris 64-bit Integrated volume management (vxfs + vxvm) Mirrored root disk via “encapsulation” Good ISV support Good extended features such as snapshots, replication Shrink and grow file systems Extent based (for better and worse), journaled, clusterable Cross-platform Copyright 2009 Peter Baer Galvin - All Rights Reserved 36 Saturday, May 2, 2009
  • 37. Features Very large limits Dynamic multipathing included Hot spares to automatically replace failed disks Dirty region logging (DRL) volume transaction logs for fast recovery from crash But still can require consistency check Copyright 2009 Peter Baer Galvin - All Rights Reserved 37 Saturday, May 2, 2009
• 38. Issues $$$ Adds supportability complexities (who do you call) Complicates OS upgrades (unencapsulate first) Fairly complex to manage Comparison of performance vs. ZFS at http://www.sun.com/software/whitepapers/solaris10/zfs_veritas.pdf Copyright 2009 Peter Baer Galvin - All Rights Reserved 38 Saturday, May 2, 2009
  • 39. ZFS Copyright 2009 Peter Baer Galvin - All Rights Reserved Saturday, May 2, 2009
• 40. ZFS Looks to be the “next great thing” Shipped officially in S10U2 (the 06/06 release) From-scratch file system Includes volume management, file system, reliability, scalability, performance, snapshots, clones, replication 128-bit file system, almost everything is “infinite” Checksumming throughout Simple, endian independent, export/importable… Still using traffic manager for multipathing (some following slides are from ZFS talk by Jeff Bonwick and Bill Moore – ZFS team leads at Sun) Copyright 2009 Peter Baer Galvin - All Rights Reserved 40 Saturday, May 2, 2009
• 41. Trouble with Existing Filesystems No defense against silent data corruption Any defect in disk, controller, cable, driver, or firmware can corrupt data silently; like running a server without ECC memory Brutal to manage Labels, partitions, volumes, provisioning, grow/shrink, /etc/vfstab... Lots of limits: filesystem/volume size, file size, number of files, files per directory, number of snapshots, ... Not portable between platforms (e.g. x86 to/from SPARC) Dog slow Linear-time create, fat locks, fixed block size, naïve prefetch, slow random writes, dirty region logging Copyright 2009 Peter Baer Galvin - All Rights Reserved 41 Saturday, May 2, 2009
  • 42. Design Principles Pooled storage Completely eliminates the antique notion of volumes Does for storage what VM did for memory End-to-end data integrity Historically considered “too expensive” Turns out, no it isn't And the alternative is unacceptable Transactional operation Keeps things always consistent on disk Removes almost all constraints on I/O order Allows us to get huge performance wins Copyright 2009 Peter Baer Galvin - All Rights Reserved 42 Saturday, May 2, 2009
  • 43. Why “volumes” Exist In the beginning, each filesystem managed a single disk Customers wanted more space, bandwidth, reliability Rewrite filesystems to handle many disks: hard Insert a little shim (“volume”) to cobble disks together: easy An industry grew up around the FS/volume model Filesystems, volume managers sold as separate products Inherent problems in FS/volume interface can't be fixed Copyright 2009 Peter Baer Galvin - All Rights Reserved 43 Saturday, May 2, 2009
• 44. Traditional Volumes (diagram: one file system bound to each volume; an FS on a striped volume and an FS on a mirrored volume) Copyright 2009 Peter Baer Galvin - All Rights Reserved 44 Saturday, May 2, 2009
  • 45. ZFS Pools Abstraction: malloc/free No partitions to manage Grow/shrink automatically All bandwidth always available All storage in the pool is shared Copyright 2009 Peter Baer Galvin - All Rights Reserved 45 Saturday, May 2, 2009
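To make the pooled-storage point concrete, a small sketch with hypothetical device names; every file system in the pool sees added capacity immediately, with no repartitioning or growfs step:
# zpool create tank mirror c2t0d0 c3t0d0
# zfs create tank/home
# zfs create tank/build
# zpool add tank mirror c4t0d0 c5t0d0     (both file systems can now use the new space)
# zpool list tank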
• 46. ZFS Pooled Storage (diagram: many file systems drawing from shared storage pools, one RAID-Z and one mirrored) Copyright 2009 Peter Baer Galvin - All Rights Reserved 46 Saturday, May 2, 2009
  • 48. ZFS Data Integrity Model Everything is copy-on-write Never overwrite live data On-disk state always valid – no “windows of vulnerability” No need for fsck(1M) Everything is transactional Related changes succeed or fail as a whole No need for journaling Everything is checksummed No silent data corruption No panics due to silently corrupted metadata Copyright 2009 Peter Baer Galvin - All Rights Reserved 48 Saturday, May 2, 2009
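Because every block carries a checksum, a pool can verify itself on demand and, where redundancy exists, repair what it finds; a quick way to exercise this on a hypothetical pool named tank:
# zpool scrub tank           (read and verify every block against its checksum)
# zpool status -v tank       (the CKSUM column counts errors detected and repaired)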
• 72. Terms Pool - set of disks in one or more RAID formats (e.g. a mirrored stripe); names have no leading “/” File system - mountable container of files Data set - file system, block device, snapshot, volume or clone within a pool Named via pool/path[@snapshot] Copyright 2009 Peter Baer Galvin - All Rights Reserved 72 Saturday, May 2, 2009
  • 73. Terms (cont) ZIL - ZFS intent log On-disk duplicate of in-memory log of changes to make to data sets Write goes to memory, ZIL, is acknowledged, then goes to disk ARC - in-memory read cache L2ARC - level 2 ARC - on flash memory Copyright 2009 Peter Baer Galvin - All Rights Reserved 73 Saturday, May 2, 2009
• 74. What ZFS doesn’t do Can’t remove individual devices from pools Rather, replace the device, or 3-way mirror including the device and then remove the device Can’t shrink a pool (yet) Can add individual devices, but not optimum (yet) If adding disk to RAIDZ or RAIDZ2, then end up with RAIDZ(2) + 1 concatenated device Instead add full RAID elements to a pool Add a mirror pair or RAIDZ(2) set Copyright 2009 Peter Baer Galvin - All Rights Reserved 74 Saturday, May 2, 2009
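A sketch of the recommended alternatives, with hypothetical device names: replace a device in place rather than removing it, and grow the pool by whole redundancy groups:
# zpool replace tank c4t0d0 c6t0d0              (swap one device for another)
# zpool add tank mirror c7t0d0 c8t0d0           (add a complete mirror pair)
# zpool add tank raidz c9t0d0 c10t0d0 c11t0d0   (or a complete RAIDZ set)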
  • 75. zpool # zpool missing command usage: zpool command args ... where 'command' is one of the following: create [-fn] [-o property=value] ... [-O file-system-property=value] ... [-m mountpoint] [-R root] <pool> <vdev> ... destroy [-f] <pool> add [-fn] <pool> <vdev> ... remove <pool> <device> ... list [-H] [-o property[,...]] [pool] ... iostat [-v] [pool] ... [interval [count]] status [-vx] [pool] ... online <pool> <device> ... offline [-t] <pool> <device> ... clear <pool> [device] Copyright 2009 Peter Baer Galvin - All Rights Reserved 75 Saturday, May 2, 2009
  • 76. zpool (cont) attach [-f] <pool> <device> <new-device> detach <pool> <device> replace [-f] <pool> <device> [new-device] scrub [-s] <pool> ... import [-d dir] [-D] import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile] [-D] [-f] [-R root] -a import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile] [-D] [-f] [-R root] <pool | id> [newpool] export [-f] <pool> ... upgrade upgrade -v upgrade [-V version] <-a | pool ...> history [-il] [<pool>] ... get <"all" | property[,...]> <pool> ... set <property=value> <pool> Copyright 2009 Peter Baer Galvin - All Rights Reserved 76 Saturday, May 2, 2009
  • 77. zpool (cont) # zpool create ezfs raidz c2t0d0 c3t0d0 c4t0d0 c5t0d0 # zpool status -v pool: ezfs state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM ezfs ONLINE 0 0 0 raidz ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 c3t0d0 ONLINE 0 0 0 c4t0d0 ONLINE 0 0 0 c5t0d0 ONLINE 0 0 0 errors: No known data errors Copyright 2009 Peter Baer Galvin - All Rights Reserved 77 Saturday, May 2, 2009
  • 78. zpool (cont) pool: zfs state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM zfs ONLINE 0 0 0 raidz ONLINE 0 0 0 c0d0s7 ONLINE 0 0 0 c0d1s7 ONLINE 0 0 0 c1d1 ONLINE 0 0 0 c1d0 ONLINE 0 0 0 errors: No known data errors Copyright 2009 Peter Baer Galvin - All Rights Reserved 78 Saturday, May 2, 2009
  • 79. zpool (cont) (/)# zpool iostat -v capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- bigp 630G 392G 2 4 41.3K 496K raidz 630G 392G 2 4 41.3K 496K c0d0s6 - - 0 2 8.14K 166K c0d1s6 - - 0 2 7.77K 166K c1d0s6 - - 0 2 24.1K 166K c1d1s6 - - 0 2 22.2K 166K ---------- ----- ----- ----- ----- ----- ----- Copyright 2009 Peter Baer Galvin - All Rights Reserved 79 Saturday, May 2, 2009
  • 80. zpool (cont) # zpool status -v pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c0d0s0 ONLINE 0 0 0 c0d1s0 ONLINE 0 0 0 errors: No known data errors pool: zpbg state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM zpbg ONLINE 0 0 0 raidz1 ONLINE 0 0 0 c4t0d0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c5t0d0 ONLINE 0 0 0 c5t1d0 ONLINE 0 0 0 c6t0d0 ONLINE 0 0 0 errors: No known data errors Copyright 2009 Peter Baer Galvin - All Rights Reserved 80 Saturday, May 2, 2009
  • 81. zpool (cont) zpool iostat -v capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- rpool 6.72G 225G 0 1 9.09K 11.6K mirror 6.72G 225G 0 1 9.09K 11.6K c0d0s0 - - 0 0 5.01K 11.7K c0d1s0 - - 0 0 5.09K 11.7K ---------- ----- ----- ----- ----- ----- ----- zpbg 3.72T 833G 0 0 32.0K 1.24K raidz1 3.72T 833G 0 0 32.0K 1.24K c4t0d0 - - 0 0 9.58K 331 c4t1d0 - - 0 0 10.3K 331 c5t0d0 - - 0 0 10.4K 331 c5t1d0 - - 0 0 10.3K 331 c6t0d0 - - 0 0 9.54K 331 ---------- ----- ----- ----- ----- ----- ----- Copyright 2009 Peter Baer Galvin - All Rights Reserved 81 Saturday, May 2, 2009
• 82. zpool (cont) Note that for import and export, a pool is the delineator You can’t import or export a file system because it’s an integral part of a pool Might cause you to use smaller pools than you otherwise would Copyright 2009 Peter Baer Galvin - All Rights Reserved 82 Saturday, May 2, 2009
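For example (the pool name is from the earlier slides; the second host is hypothetical), the whole pool moves, never an individual file system:
# zpool export zpbg       (on the old host: unmounts and releases every dataset in the pool)
# zpool import            (on the new host: lists pools visible on attached devices)
# zpool import zpbg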
  • 83. zfs # zfs missing command usage: zfs command args ... where 'command' is one of the following: create [-p] [-o property=value] ... <filesystem> create [-ps] [-b blocksize] [-o property=value] ... -V <size> <volume> destroy [-rRf] <filesystem|volume|snapshot> snapshot [-r] [-o property=value] ... <filesystem@snapname| volume@snapname> rollback [-rRf] <snapshot> clone [-p] [-o property=value] ... <snapshot> <filesystem|volume> promote <clone-filesystem> rename <filesystem|volume|snapshot> <filesystem|volume|snapshot> rename -p <filesystem|volume> <filesystem|volume> rename -r <snapshot> <snapshot> Copyright 2009 Peter Baer Galvin - All Rights Reserved 83 Saturday, May 2, 2009
  • 84. zfs (cont) list [-rH] [-o property[,...]] [-t type[,...]] [-s property] ... [-S property] ... [filesystem|volume|snapshot] ... set <property=value> <filesystem|volume|snapshot> ... get [-rHp] [-o field[,...]] [-s source[,...]] <"all" | property[,...]> [filesystem|volume| snapshot] ... inherit [-r] <property> <filesystem|volume|snapshot> ... upgrade [-v] upgrade [-r] [-V version] <-a | filesystem ...> mount mount [-vO] [-o opts] <-a | filesystem> unmount [-f] <-a | filesystem|mountpoint> share <-a | filesystem> unshare [-f] <-a | filesystem|mountpoint> Copyright 2009 Peter Baer Galvin - All Rights Reserved 84 Saturday, May 2, 2009
  • 85. zfs (cont) send [-R] [-[iI] snapshot] <snapshot> receive [-vnF] <filesystem|volume|snapshot> receive [-vnF] -d <filesystem> allow [-ldug] <"everyone"|user|group>[,...] <perm|@setname>[,...] <filesystem|volume> allow [-ld] -e <perm|@setname>[,...] <filesystem|volume> allow -c <perm|@setname>[,...] <filesystem|volume> allow -s @setname <perm|@setname>[,...] <filesystem|volume> unallow [-rldug] <"everyone"|user|group>[,...] [<perm|@setname>[,...]] <filesystem|volume> unallow [-rld] -e [<perm|@setname>[,...]] <filesystem|volume> unallow [-r] -c [<perm|@setname>[,...]] <filesystem|volume> unallow [-r] -s @setname [<perm|@setname>[,...]] <filesystem| volume> Each dataset is of the form: pool/[dataset/]*dataset[@name] For the property list, run: zfs set|get For the delegated permission list, run: zfs allow|unallow Copyright 2009 Peter Baer Galvin - All Rights Reserved 85 Saturday, May 2, 2009
  • 86. zfs (cont) # zfs get missing property argument usage: get [-rHp] [-o field[,...]] [-s source[,...]] <"all" | property[,...]> [filesystem|volume|snapshot] ... The following properties are supported: PROPERTY EDIT INHERIT VALUES available NO NO <size> compressratio NO NO <1.00x or higher if compressed> creation NO NO <date> mounted NO NO yes | no origin NO NO <snapshot> referenced NO NO <size> type NO NO filesystem | volume | snapshot used NO NO <size> aclinherit YES YES discard | noallow | restricted | passthrough aclmode YES YES discard | groupmask | passthrough atime YES YES on | off Copyright 2009 Peter Baer Galvin - All Rights Reserved 86 Saturday, May 2, 2009
  • 87. zfs (cont) canmount YES NO on | off | noauto casesensitivity NO YES sensitive | insensitive | mixed checksum YES YES on | off | fletcher2 | fletcher4 | sha256 compression YES YES on | off | lzjb | gzip | gzip-[1-9] copies YES YES 1 | 2 | 3 devices YES YES on | off exec YES YES on | off mountpoint YES YES <path> | legacy | none nbmand YES YES on | off normalization NO YES none | formC | formD | formKC | formKD primarycache YES YES all | none | metadata quota YES NO <size> | none readonly YES YES on | off recordsize YES YES 512 to 128k, power of 2 refquota YES NO <size> | none refreservation YES NO <size> | none reservation YES NO <size> | none Copyright 2009 Peter Baer Galvin - All Rights Reserved 87 Saturday, May 2, 2009
  • 88. zfs (cont) secondarycache YES YES all | none | metadata setuid YES YES on | off shareiscsi YES YES on | off | type=<type> sharenfs YES YES on | off | share(1M) options sharesmb YES YES on | off | sharemgr(1M) options snapdir YES YES hidden | visible utf8only NO YES on | off version YES NO 1 | 2 | 3 | current volblocksize NO YES 512 to 128k, power of 2 volsize YES NO <size> vscan YES YES on | off xattr YES YES on | off zoned YES YES on | off Sizes are specified in bytes with standard units such as K, M, G, etc. User-defined properties can be specified by using a name containing a colon (:). Copyright 2009 Peter Baer Galvin - All Rights Reserved 88 Saturday, May 2, 2009
• 89. zfs (cont) (/)# zfs list NAME USED AVAIL REFER MOUNTPOINT bigp 630G 384G - /zfs/bigp bigp/big 630G 384G 630G /zfs/bigp/big (root@sparky)-(7/pts)-(06:35:11/05/05)- (/)# zfs snapshot bigp/big@5-nov (root@sparky)-(8/pts)-(06:35:11/05/05)- (/)# zfs list NAME USED AVAIL REFER MOUNTPOINT bigp 630G 384G - /zfs/bigp bigp/big 630G 384G 630G /zfs/bigp/big bigp/big@5-nov 0 - 630G /zfs/bigp/big@5-nov # zfs send bigp/big@5-nov | ssh host zfs receive poolB/received/big@5-nov # zfs send -i 5-nov bigp/big@6-nov | ssh host zfs receive poolB/received/big Copyright 2009 Peter Baer Galvin - All Rights Reserved 89 Saturday, May 2, 2009
• 90. zfs (cont) # zpool history History for 'zpbg': 2006-04-03.11:47:44 zpool create -f zpbg raidz c5t0d0 c10t0d0 c11t0d0 c12t0d0 c13t0d0 2006-04-03.18:19:48 zfs receive zpbg/imp 2006-04-03.18:41:39 zfs receive zpbg/home 2006-04-03.19:04:22 zfs receive zpbg/photos 2006-04-03.19:37:56 zfs set mountpoint=/export/home zpbg/home 2006-04-03.19:44:22 zfs receive zpbg/mail 2006-04-03.20:12:34 zfs set mountpoint=/var/mail zpbg/mail 2006-04-03.20:14:32 zfs receive zpbg/mqueue 2006-04-03.20:15:01 zfs set mountpoint=/var/spool/mqueue zpbg/mqueue # zfs create -V 2g tank/volumes/v2 # zfs set shareiscsi=on tank/volumes/v2 # iscsitadm list target Target: tank/volumes/v2 iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-cf9a72aa062a Connections: 0 Copyright 2009 Peter Baer Galvin - All Rights Reserved 90 Saturday, May 2, 2009
  • 91. zpool history -l Shows user name, host name, and zone of command # zpool history -l users History for ’users’: 2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0 [user root on corona:global] 2008-07-10.09:43:13 zfs create users/marks [user root on corona:global] 2008-07-10.09:43:44 zfs destroy users/marks [user root on corona:global] 2008-07-10.09:43:48 zfs create users/home [user root on corona:global] 2008-07-10.09:43:56 zfs create users/home/markm [user root on corona:global] 2008-07-10.09:44:02 zfs create users/home/marks [user root on corona:global] Copyright 2009 Peter Baer Galvin - All Rights Reserved 91 Saturday, May 2, 2009
  • 92. zpool history -i Shows zfs internal activities - useful for debugging # zpool history -i users History for ’users’: 2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0 2008-07-10.09:43:13 [internal create txg:6] dataset = 21 2008-07-10.09:43:13 zfs create users/marks 2008-07-10.09:43:48 [internal create txg:12] dataset = 27 2008-07-10.09:43:48 zfs create users/home 2008-07-10.09:43:55 [internal create txg:14] dataset = 33 Copyright 2009 Peter Baer Galvin - All Rights Reserved 92 Saturday, May 2, 2009
• 93. ZFS Delegate Admin Use zfs allow and zfs unallow to grant and remove permissions Use the pool “delegation” property to control whether delegation is enabled Then delegate # zfs allow cindys create,destroy,mount,snapshot tank/cindys # zfs allow tank/cindys ------------------------------------------------------------- Local+Descendent permissions on (tank/cindys) user cindys create,destroy,mount,snapshot ------------------------------------------------------------- # zfs unallow cindys tank/cindys # zfs allow tank/cindys Copyright 2009 Peter Baer Galvin - All Rights Reserved 93 Saturday, May 2, 2009
• 94. ZFS - Odds and Ends zfs get all will display all set attributes of all ZFS file systems Recursive snapshots (via -r) as of S10 8/07 zfs clone makes a RW copy of a snapshot zfs promote sets the root of the file system to be the specified clone You can undo a zpool destroy with zpool import -D As of S10 8/07 ZFS is integrated with FMA As of S10 11/06 ZFS supports double-parity RAID (raidz2) Copyright 2009 Peter Baer Galvin - All Rights Reserved 94 Saturday, May 2, 2009
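A hedged sketch of the clone/promote and undo-destroy behaviors on a hypothetical pool named tank:
# zfs snapshot tank/fs@gold
# zfs clone tank/fs@gold tank/fs-test     (writable copy, initially sharing blocks with the snapshot)
# zfs promote tank/fs-test                (the clone becomes the parent; the original can be retired)
# zpool destroy tank
# zpool import -D                         (destroyed but intact pools are listed)
# zpool import -D tank                    (and can be brought back)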
• 95. ZFS “GUI” Did you know that Solaris has an admin GUI? Webconsole enabled by default Turn off via svcadm if not used By default (on Nevada B64 at least) ZFS administration is the only feature enabled Copyright 2009 Peter Baer Galvin - All Rights Reserved 95 Saturday, May 2, 2009
  • 97. ZFS Automatic Snapshots In Nevada 100 (LSARC 2008/571) - will be in OpenSolaris 2008.11 SMF service and GNOME app Can take automatic scheduled snapshots By default all zfs file systems, at boot, then every 15 minutes, every hour, every day, etc Auto delete of oldest snapshots if user-defined amount of space is not available Can perform incremental or full backups via those snapshots Nautilus integration allows user to browse and restore files graphically Copyright 2009 Peter Baer Galvin - All Rights Reserved 97 Saturday, May 2, 2009
• 98. ZFS Automatic Snapshots (cont) One SMF service per time frequency: frequent snapshots every 15 mins, keeping 4 snapshots hourly snapshots every hour, keeping 24 snapshots daily snapshots every day, keeping 31 snapshots weekly snapshots every week, keeping 7 snapshots monthly snapshots every month, keeping 12 snapshots Details here: http://src.opensolaris.org/source/xref/jds/zfs-snapshot/README.zfs-auto-snapshot.txt Copyright 2009 Peter Baer Galvin - All Rights Reserved 98 Saturday, May 2, 2009
• 99. ZFS Automatic Snapshots (cont) Service properties provide more details zfs/fs-name The name of the filesystem. If the special filesystem name "//" is used, then the system snapshots only filesystems with the zfs user property "com.sun:auto-snapshot:<label>" set to true, so to take frequent snapshots of tank/timf, run the following zfs command: # zfs set com.sun:auto-snapshot:frequent=true tank/timf The "snap-children" property is ignored when using this fs-name value. Instead, the system automatically determines when it's able to take recursive, vs. non-recursive snapshots of the system, based on the values of the ZFS user properties. zfs/interval [ hours | days | months | none] When set to none, we don't take automatic snapshots, but leave an SMF instance available for users to manually fire the method script whenever they want - useful for snapshotting on system events. zfs/keep How many snapshots to retain - eg. setting this to "4" would keep only the four most recent snapshots. When each new snapshot is taken, the oldest is destroyed. If a snapshot has been cloned, the service will drop to maintenance mode when attempting to destroy that snapshot. Setting to "all" keeps all snapshots. zfs/period How often you want to take snapshots, in intervals set according to "zfs/interval" (eg. every 10 days) Copyright 2009 Peter Baer Galvin - All Rights Reserved 99 Saturday, May 2, 2009
• 100. ZFS Automatic Snapshots (cont) zfs/snapshot-children "true" if you would like to recursively take snapshots of all child filesystems of the specified fs-name. This value is ignored when setting zfs/fs-name='//' zfs/backup [ full | incremental | none ] zfs/backup-save-cmd The command string used to save the backup stream. zfs/backup-lock You shouldn't need to change this - but it should be set to "unlocked" by default. We use it to indicate when a backup is running. zfs/label A label that can be used to differentiate this set of snapshots from others, not required. If multiple schedules are running on the same machine, using distinct labels for each schedule is needed - otherwise one schedule could remove snapshots taken by another schedule according to its snapshot-retention policy. (see "zfs/keep") zfs/verbose Set to false by default, setting to true makes the service produce more output about what it's doing. zfs/avoidscrub Set to false by default, this determines whether we should avoid taking snapshots on any pools that have a scrub or resilver in progress. More info in the bugid: 6343667 need itinerary so interrupted scrub/resilver doesn't have to start over Copyright 2009 Peter Baer Galvin - All Rights Reserved 100 Saturday, May 2, 2009
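A hedged example of turning the service on; the SMF FMRIs and snapshot naming shown here are taken from the OpenSolaris zfs-auto-snapshot package and may differ by build, and tank/home is a hypothetical dataset:
# zfs set com.sun:auto-snapshot=true tank/home     (opt a dataset in when zfs/fs-name is '//')
# svcadm enable svc:/system/filesystem/zfs/auto-snapshot:frequent
# svcadm enable svc:/system/filesystem/zfs/auto-snapshot:daily
# zfs list -t snapshot                             (snapshots show up as ...@zfs-auto-snap_<label>-<timestamp>)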
• 101. ZFS Automatic Snapshot (cont) http://blogs.sun.com/erwann/resource/menu-location.png Copyright 2009 Peter Baer Galvin - All Rights Reserved 101 Saturday, May 2, 2009
  • 102. ZFS Automatic Snapshot (cont) If life-preserver icon enabled in file browser, then backup of directory is available Press to bring up nav bar Copyright 2009 Peter Baer Galvin - All Rights Reserved 102 Saturday, May 2, 2009
• 103. ZFS Automatic Snapshot (cont) Drag slider into past to show previous version of files in the directory Then right-click on a file and select “Restore to Desktop” if you want it back More features coming Press to bring up nav bar Copyright 2009 Peter Baer Galvin - All Rights Reserved 103 Saturday, May 2, 2009
• 104. ZFS Status Netbackup, Legato support ZFS for backup / restore VCS supports ZFS as file system of clustered services Most vendors don’t care which file system the app runs on Performance as good as other file systems Feature set better Copyright 2009 Peter Baer Galvin - All Rights Reserved 104 Saturday, May 2, 2009
  • 105. ZFS Futures Support by ISVs Backup / restore Some don’t get metadata (yet) Use zfs send to emit file containing filesystem Clustering (see Lustre) Performance still a work in progress Being ported to BSD, Mac OS Leopard Check out the ZFS FAQ at http://www.opensolaris.org/os/community/zfs/faq/ Copyright 2009 Peter Baer Galvin - All Rights Reserved 105 Saturday, May 2, 2009
• 106. ZFS Performance From http://www.opensolaris.org/jive/thread.jspa?messageID=14997 billm: On Thu, Nov 17, 2005 at 05:21:36AM -0800, Jim Lin wrote: > Does ZFS reorganize (ie. defrag) the files over time? Not yet. > If it doesn't, it might not perform well in "write-little read-much" > scenarios (where read performance is much more important than write > performance). As always, the correct answer is "it depends". Let's take a look at several cases: - Random reads: No matter if the data was written randomly or sequentially, random reads are random for any filesystem, regardless of their layout policy. Not much you can do to optimize these, except have the best I/O scheduler possible. Copyright 2009 Peter Baer Galvin - All Rights Reserved 106 Saturday, May 2, 2009
  • 107. ZFS Performance (cont) - Sequential writes, sequential reads: With ZFS, sequential writes lead to sequential layout on disk. So sequential reads will perform quite well in this case. - Random writes, sequential reads: This is the most interesting case. With random writes, ZFS turns them into sequential writes, which go *really* fast. With sequential reads, you know which order the reads are going to be coming in, so you can kick off a bunch of prefetch reads. Again, with a good I/O scheduler (which ZFS just happens to have), you can turn this into good read performance, if not entirely as good as totally sequential. Believe me, we've thought about this a lot. There is a lot we can do to improve performance, and we're just getting started. Copyright 2009 Peter Baer Galvin - All Rights Reserved 107 Saturday, May 2, 2009
• 108. ZFS Performance (cont) For DBs and other direct-disk-access-wanting applications There is no direct I/O in ZFS But can get very good performance by matching I/O size of the app (e.g. Oracle uses 8K) with recordsize of zfs file system This is set at filesystem create time Copyright 2009 Peter Baer Galvin - All Rights Reserved 108 Saturday, May 2, 2009
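For example (hypothetical pool and dataset names); note that recordsize affects only files written after it is set, so set it before loading data:
# zfs create -o recordsize=8k tank/oradata     (match an 8K database block size)
# zfs get recordsize tank/oradata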
• 109. ZFS Performance (cont) The ZIL can be a bottleneck on NFS servers NFS does sync writes Put the ZIL on another disk, or on SSD ZFS aggressively uses memory for caching Low priority user, but can cause temporary conflicts with other users Use arcstat to monitor memory use http://www.solarisinternals.com/wiki/index.php/Arcstat Copyright 2009 Peter Baer Galvin - All Rights Reserved 109 Saturday, May 2, 2009
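On releases whose pool version supports separate log and cache devices, both the ZIL and the L2ARC can be placed on fast devices; the device names below are hypothetical, and arcstat is the Perl script from the solarisinternals wiki:
# zpool add tank log c6t0d0      (dedicated ZIL device, ideally an SSD or NVRAM-backed LUN)
# zpool add tank cache c7t0d0    (L2ARC device on flash)
# ./arcstat.pl 5                 (sample ARC size and hit/miss rates every 5 seconds)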
  • 110. ZFS Backup Tool Zetaback is a thin-agent based ZFS backup tool Runs from a central host Scans clients for new ZFS filesystems Manages varying desired backup intervals (per host) for full backups incremental backups Maintain varying retention policies (per host) Summarize existing backups Restore any host:fs backup at any point in time to any target host https://labs.omniti.com/trac/zetaba Copyright 2009 Peter Baer Galvin - All Rights Reserved 110 Saturday, May 2, 2009
• 111. zfs upgrade On-disk format of ZFS changes over time Forward-upgradeable, but not backward compatible Watch out when attaching and detaching zpools Also “zfs send” streams not readable by older zfs versions # zfs upgrade This system is currently running ZFS filesystem version 2. The following filesystems are out of date, and can be upgraded. After being upgraded, these filesystems (and any ’zfs send’ streams generated from subsequent snapshots) will no longer be accessible by older software versions. VER FILESYSTEM --- ------------ 1 datab 1 datab/users 1 datab/users/area51 Copyright 2009 Peter Baer Galvin - All Rights Reserved 111 Saturday, May 2, 2009
• 112. Automatic Snapshots and Backups Unsupported services, may become supported http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_10 http://blogs.sun.com/timf/entry/zfs_automatic_for_the_people Copyright 2009 Peter Baer Galvin - All Rights Reserved 112 Saturday, May 2, 2009
  • 113. ZFS - Smashing! http://www.youtube.com/watch?v=CN6iDzesEs0&fmt=18 Copyright 2009 Peter Baer Galvin - All Rights Reserved 113 Saturday, May 2, 2009
  • 114. Storage Odds and Ends iostat -y shows performance info on multipathed devices raidctl is RAID configuration tool for multiple RAID controllers fsstat file-system based stat command # fsstat -F new name name attr attr lookup rddir read read write write file remov chng get set ops ops ops bytes ops bytes 0 0 0 0 0 0 0 0 0 0 0 ufs 0 0 0 26.0K 0 52.0K 354 4.71K 1.56M 0 0 proc 0 0 0 0 0 0 0 0 0 0 0 nfs 53.2K 1.02K 24.0K 8.99M 48.6K 4.26M 161K 44.8M 11.8G 23.1M 6.58G zfs 0 0 0 2.94K 0 0 0 0 0 0 0 lofs 7.26K 2.84K 4.30K 31.5K 83 35.4K 6 40.5K 41.3M 45.6K 39.2M tmpfs 0 0 0 410 0 0 0 33 11.0K 0 0 mntfs 0 0 0 0 0 0 0 0 0 0 0 nfs3 0 0 0 0 0 0 0 0 0 0 0 nfs4 0 0 0 0 0 0 0 0 0 0 0 autofs Copyright 2009 Peter Baer Galvin - All Rights Reserved 114 Saturday, May 2, 2009
  • 115. Build an OpenSolaris Storage Server in 10 Minutes http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html Example 1: ZFS Filesystem Objectives: Understand the purpose of the ZFS filesystem. Configure a ZFS pool and filesystem. Requirements: A server (SPARC or x64 based) running the OpenSolaris OS. Configuration details from the running server. Step 1: Identify your Disks. Identify the storage available for adding to the ZFS pool using the format(1) command. Your output will vary from that shown here: # format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c0t2d0 /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@2,0 1. c0t3d0 /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@3,0 Specify disk (enter its number): ^D Copyright 2009 Peter Baer Galvin - All Rights Reserved 115 Saturday, May 2, 2009
  • 116. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 2: Add your disks to your ZFS pool. # zpool create -f mypool c0t3d0s0 # zpool list NAME SIZE USED AVAIL CAP HEALTH ALTROOT mypool 10G 94K 10.0G 0% ONLINE - Step 3: Create a filesystem in your pool. # zfs create mypool/myfs # df -h /mypool/myfs Filesystem size used avail capacity Mounted on mypool/myfs 9.8G 18K 9.8G 1% /mypool/myfs Copyright 2009 Peter Baer Galvin - All Rights Reserved 116 Saturday, May 2, 2009
  • 117. Build an OpenSolaris Storage Server in 10 Minutes - cont Example 2: Network File System (NFS) Objectives: Understand the purpose of the NFS filesystem. Create an NFS shared filesystem on a server and mount it on a client. Requirements: Two servers (SPARC or x64 based) - one from the previous example - running the OpenSolaris OS. Configuration details from the running systems. Step 1: Create the NFS shared filesystem on the server. Switch on the NFS service on the server: # svcs nfs/server STATE STIME FMRI disabled 6:49:39 svc:/network/nfs/server:default # svcadm enable nfs/server Share the ZFS filesystem over NFS: # zfs set sharenfs=on mypool/myfs # dfshares RESOURCE SERVER ACCESS TRANSPORT x4100:/mypool/myfs x4100 - - Copyright 2009 Peter Baer Galvin - All Rights Reserved 117 Saturday, May 2, 2009
• 118. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 2: Switch on the NFS service on the client. This is similar to the procedure for the server: # svcs nfs/client STATE STIME FMRI disabled 6:47:03 svc:/network/nfs/client:default # svcadm enable nfs/client Mount the shared filesystem on the client: # mkdir /mountpoint # mount -F nfs x4100:/mypool/myfs /mountpoint # df -h /mountpoint Filesystem size used avail capacity Mounted on x4100:/mypool/myfs 9.8G 18K 9.8G 1% /mountpoint Copyright 2009 Peter Baer Galvin - All Rights Reserved 118 Saturday, May 2, 2009
• 119. Build an OpenSolaris Storage Server in 10 Minutes - cont Example 3: Common Internet File System (CIFS) Objectives: Understand the purpose of the CIFS filesystem. Configure a CIFS share on one machine (from the previous example) and make it available on the other machine. Requirements: Two servers (SPARC or x64 based) running the OpenSolaris OS. Configuration details provided here. Step 1: Create a ZFS filesystem for CIFS. # zfs create -o casesensitivity=mixed mypool/myfs2 # df -h /mypool/myfs2 Filesystem size used avail capacity Mounted on mypool/myfs2 9.8G 18K 9.8G 1% /mypool/myfs2 Step 2: Switch on the SMB Server service on the server. # svcs smb/server STATE STIME FMRI disabled 6:49:39 svc:/network/smb/server:default # svcadm enable smb/server Copyright 2009 Peter Baer Galvin - All Rights Reserved 119 Saturday, May 2, 2009
  • 120. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 3: Share the filesystem using CIFS. # zfs set sharesmb=on mypool/myfs2 Verify using the following command: # zfs get sharesmb mypool/myfs2 NAME PROPERTY VALUE SOURCE mypool/myfs2 sharesmb on local Step 4: Verify the CIFS naming. Because we have not explicitly named the share, we can examine the default name assigned to it using the following command: # sharemgr show -vp default nfs=() zfs zfs/mypool/myfs nfs=() /mypool/myfs zfs/mypool/myfs2 smb=() mypool_myfs2=/mypool/myfs2 Both the NFS share (/mypool/myfs) and the CIFS share (mypool_myfs2) are shown. Step 5: Edit the file /etc/pam.conf to support creation of an encrypted version of the user's password for CIFS. Add the following line to the end of the file: other password required pam_smb_passwd.so.1 nowarn Copyright 2009 Peter Baer Galvin - All Rights Reserved 120 Saturday, May 2, 2009
• 121. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 6: Change the password using the passwd command. # passwd username New Password: Re-enter new Password: passwd: password successfully changed for root Now repeat Steps 5 and 6 on the Solaris client. Step 7: Enable CIFS client services on the client node. # svcs smb/client STATE STIME FMRI disabled 6:47:03 svc:/network/smb/client:default # svcadm enable smb/client Copyright 2009 Peter Baer Galvin - All Rights Reserved 121 Saturday, May 2, 2009
• 122. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 8: Make a mount point on the client and mount the CIFS resource from the server. Mount the resource across the network and check it using the following command sequence: # mkdir /mountpoint2 # mount -F smbfs //root@x4100/mypool_myfs2 /mountpoint2 Password: ******* # df -h /mountpoint2 Filesystem size used avail capacity Mounted on //root@x4100/mypool_myfs2 9.8G 18K 9.8G 1% /mountpoint2 # df -n / : ufs /mountpoint : nfs /mountpoint2 : smbfs Copyright 2009 Peter Baer Galvin - All Rights Reserved 122 Saturday, May 2, 2009
• 123. Build an OpenSolaris Storage Server in 10 Minutes - cont Example 4: Comstar Fibre Channel Target Objectives Understand the purpose of the Comstar Fibre Channel target. Configure an FC target and initiator on two servers. Requirements: Two servers (SPARC or x64 based) running the OpenSolaris OS. Configuration details provided here. Step 1: Start the SCSI Target Mode Framework and verify it. Use the following commands to start up and check the service on the host that provides the target: # svcs stmf STATE STIME FMRI disabled 19:15:25 svc:/system/device/stmf:default # svcadm enable stmf # stmfadm list-state Operational Status: online Config Status : initialized Copyright 2009 Peter Baer Galvin - All Rights Reserved 123 Saturday, May 2, 2009
  • 124. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 2: Ensure that the framework can see the ports. Use the following command to ensure that the target mode framework can see the HBA ports: # stmfadm list-target -v Target: wwn.210000E08B909221 Operational Status: Online Provider Name : qlt Alias : qlt0,0 Sessions : 4 Initiator: wwn.210100E08B272AB5 Alias: ute198:qlc1 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210100E08B296A60 Alias: ute198:qlc3 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210000E08B072AB5 Alias: ute198:qlc0 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210000E08B096A60 Alias: ute198:qlc2 Logged in since: Thu Mar 27 16:38:30 2008 Copyright 2009 Peter Baer Galvin - All Rights Reserved 124 Saturday, May 2, 2009
  • 125. Build an OpenSolaris Storage Server in 10 Minutes - cont Target: wwn.210100E08BB09221 Operational Status: Online Provider Name : qlt Alias : qlt1,0 Sessions : 4 Initiator: wwn.210100E08B272AB5 Alias: ute198:qlc1 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210100E08B296A60 Alias: ute198:qlc3 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210000E08B072AB5 Alias: ute198:qlc0 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210000E08B096A60 Alias: ute198:qlc2 Logged in since: Thu Mar 27 16:38:30 2008 Copyright 2009 Peter Baer Galvin - All Rights Reserved 125 Saturday, May 2, 2009
• 126. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 4: Register the zvol with the framework. The zvol becomes the SCSI logical unit (disk) behind the target: # sbdadm create-lu /dev/zvol/rdsk/mypool/myvol Created the following LU: GUID DATA SIZE SOURCE 6000ae4093000000000047f3a1930007 5368643584 /dev/zvol/rdsk/mypool/myvol Confirm its existence as follows: # stmfadm list-lu -v LU Name: 6000AE4093000000000047F3A1930007 Operational Status: Online Provider Name : sbd Alias : /dev/zvol/rdsk/mypool/myvol View Entry Count : 0 Copyright 2009 Peter Baer Galvin - All Rights Reserved 127 Saturday, May 2, 2009
  • 127. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 4: Register the zvol with the framework. The zvol becomes the SCSI logical unit (disk) behind the target: # sbdadm create-lu /dev/zvol/rdsk/mypool/myvol Created the following LU: GUID DATA SIZE SOURCE 6000ae4093000000000047f3a1930007 5368643584 /dev/zvol/rdsk/mypool/ myvol Confirm its existence as follows: # stmfadm list-lu -v LU Name: 6000AE4093000000000047F3A1930007 Operational Status: Online Provider Name : sbd Alias : /dev/zvol/rdsk/mypool/myvol View Entry Count : 0 Copyright 2009 Peter Baer Galvin - All Rights Reserved 127 Saturday, May 2, 2009
  • 128. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 5: Find the initiator HBA ports to which to map the LUs. Discover HBA ports on the initiator host using the following command: # fcinfo hba-port HBA Port WWN: 25000003ba0ad303 Port Mode: Initiator Port ID: 1 OS Device Name: /dev/cfg/c5 Manufacturer: QLogic Corp. Model: 2200 Firmware Version: 2.1.145 FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver: Type: L-port State: online Supported Speeds: 1Gb Current Speed: 1Gb Node WWN: 24000003ba0ad303 Copyright 2009 Peter Baer Galvin - All Rights Reserved 128 Saturday, May 2, 2009
• 130. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 6: Create a host group and add the world-wide names (WWNs) of the initiator host HBA ports to it. Name the group mygroup: # stmfadm create-hg mygroup # stmfadm list-hg Host Group: mygroup Add the WWNs of the ports to the group: # stmfadm add-hg-member -g mygroup wwn.210000E08B096A60 wwn.210100E08B296A60 wwn.210100E08B272AB5 wwn.210000E08B072AB5 Now check that everything is in order: # stmfadm list-hg-member -v -g mygroup With the host group created, you're now ready to export the logical unit. This is accomplished by adding a view entry to the logical unit using this host group, as shown in the following command: # stmfadm add-view -h mygroup 6000AE4093000000000047F3A1930007 Copyright 2009 Peter Baer Galvin - All Rights Reserved 130 Saturday, May 2, 2009
  • 131. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 7: Check the visibility of the targets on the initiator host. First, force the devices on the initiator host to be rescanned with a simple script: #!/bin/ksh fcinfo hba-port |grep "^HBA" |awk '{print $4}'|while read ln do fcinfo remote-port -p $ln -s >/dev/null 2>&1 done The disk exported over FC should then appear in the format list: # format Searching for disks...done c6t6000AE4093000000000047F3A1930007d0: configured with capacity of 5.00GB Copyright 2009 Peter Baer Galvin - All Rights Reserved 131 Saturday, May 2, 2009
  • 132. Build an OpenSolaris Storage Server in 10 Minutes - cont ... partition> p Current partition table (default): Total disk cylinders available: 20477 + 2 (reserved cylinders) Part Tag Flag Cylinders Size Blocks 0 root wm 0 - 511 128.00MB (512/0/0) 262144 1 swap wu 512 - 1023 128.00MB (512/0/0) 262144 2 backup wu 0 - 20476 5.00GB (20477/0/0) 10484224 3 unassigned wm 0 0 (0/0/0) 0 4 unassigned wm 0 0 (0/0/0) 0 5 unassigned wm 0 0 (0/0/0) 0 6 usr wm 1024 - 20476 4.75GB (19453/0/0) 9959936 7 unassigned wm 0 0 (0/0/0) 0 partition> Copyright 2009 Peter Baer Galvin - All Rights Reserved 132 Saturday, May 2, 2009
• 133. ZFS Root Solaris 10 10/08 (aka S10U6) supports installation with ZFS as the root file system (as does OpenSolaris) Note that you can’t as of U6 flash archive a ZFS root system(!) Can upgrade by using liveupgrade (LU) to mirror to second disk (ZFS pool) and upgrading there, then booting there lucreate to copy the primary BE to create an alternate BE # zpool create mpool mirror c1t0d0s0 c1t1d0s0 # lucreate -c c1t2d0s0 -n zfsBE -p mpool The default file systems are created in the specified pool and the non-shared file systems are then copied into the root pool Run luupgrade to upgrade the alternate BE (optional) Run luactivate on the newly upgraded alternate BE so that when the system is rebooted, it will be the new primary BE # luactivate zfsBE Copyright 2009 Peter Baer Galvin - All Rights Reserved 133 Saturday, May 2, 2009
  • 134. Life is good Once on ZFS as root, life is good Mirror the root disk with 1 command (if not mirrored): # zpool attach rpool c1t0d0s0 c1t1d0s0 Note that you have to manually do an installboot on the mirrored disk Now consider all the ZFS features, used on the boot disk Snapshot before patch, upgrade, any change Undo change via 1 command Replicate to another system for backup, DR ... Copyright 2009 Peter Baer Galvin - All Rights Reserved 134 Saturday, May 2, 2009
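The boot block step referred to above, assuming the newly attached mirror is c1t1d0s0; SPARC uses installboot, x86 uses installgrub:
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0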
  • 135. ZFS Labs What pools are available in your zone? What are their states? What is their performance like? What ZFS file systems? Create a new file system Create a file there Take a snapshot of that file system Delete the file Revert to the file system state as of the snapshot How do you see the contents of a snapshot? Copyright 2009 Peter Baer Galvin - All Rights Reserved 135 Saturday, May 2, 2009
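One possible walk-through of the lab, assuming a pool named rpool with its default mountpoint; substitute whatever pools and file systems exist in your zone:
# zpool list; zpool status; zpool iostat -v 5
# zfs list
# zfs create rpool/lab
# touch /rpool/lab/afile
# zfs snapshot rpool/lab@before
# rm /rpool/lab/afile
# zfs rollback rpool/lab@before
# ls /rpool/lab/.zfs/snapshot/before      (snapshot contents are browsable read-only under .zfs/snapshot)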
  • 136. ZFS Final Thought Eric Schrock's Weblog - Thursday Nov 17, 2005 UFS/SVM vs. ZFS: Code Complexity A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People tend to focus on performance, features, and CLI tools as they are easier to compare. I thought I'd take a moment to look at differences in the code complexity between UFS and ZFS. It is well known within the kernel group that UFS is about as brittle as code can get. 20 years of ongoing development, with feature after feature being bolted on tends to result in a rather complicated system. Even the smallest changes can have wide ranging effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is considerably newer, it is a huge beast with its own set of problems. Since ZFS is both a volume manager and a filesystem, we can use this script written by Jeff to count the lines of source code in each component. Not a true measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate yields: UFS: kernel= 46806 user= 40147 total= 86953 SVM: kernel= 75917 user=161984 total=237901 TOTAL: kernel=122723 user=202131 total=324854 ZFS: kernel= 50239 user= 21073 total= 71312 The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code (kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM. I wonder what those ZFS numbers will look like in 20 years... Copyright 2009 Peter Baer Galvin - All Rights Reserved 136 Saturday, May 2, 2009
• 138. Where to Learn More Community: http://www.opensolaris.org/os/community/zfs Wikipedia: http://en.wikipedia.org/wiki/ZFS ZFS blogs: http://blogs.sun.com/main/tags/zfs ZFS ports Apple Mac: http://developer.apple.com/adcnews FreeBSD: http://wiki.freebsd.org/ZFS Linux/FUSE: http://zfs-on-fuse.blogspot.com As an appliance: http://www.nexenta.com Beginner’s Guide to ZFS: http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp Copyright 2009 Peter Baer Galvin - All Rights Reserved 138 Saturday, May 2, 2009
  • 139. Sun Storage 7x10 Copyright 2009 Peter Baer Galvin - All Rights Reserved 139 Saturday, May 2, 2009
  • 140. Speaking of Futures The future of Sun storage? Announced 11/10/2008 Copyright 2009 Peter Baer Galvin - All Rights Reserved 140 Saturday, May 2, 2009