SlideShare une entreprise Scribd logo
1  sur  132
Télécharger pour lire hors ligne
Enabling Grids for E-sciencE



                  Distribute Data and Metadata
                           Management
                             with gLite
                                                  Antonio Calanducci
                                       antonio.calanducci@ct.infn.it
                                  National Institute of Nuclear Physics
                                                          INFN Catania
                                      EGEE NA3 Training & Induction
                                                            ISSGC 2009
                     Sophia-Antipolis, Polytech’Nice-Sofia , 10/07/2009

www.eu-egee.org

INFSO-RI-031688
Outline
                  Enabling Grids for E-sciencE




   • Grid Data Management Challenge

   • Storage Elements and SRM

   • File Catalogs and Data Management tools

   • Metadata Service with use cases




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   2
The Grid DM Challenge
                    Enabling Grids for E-sciencE




 • Heterogeneity
                                                        – Need common interface
      – Data are stored on different                      to storage resources
        storage systems using different
                                                             Storage Resource
        access technologies                                   Manager (SRM)
 • Distribution
      – Data are stored in different                    – Need to keep track where
        locations – in most cases there                   data is stored
        is no shared file system or                          File and Replica
                                                              Catalogs
        common namespace
                                                        – Need scheduled, reliable
      – Data need to be moved between                     file transfer
        different locations                                  File transfer service
 • Data about data
      – Data stored can be huge: find                   – Need a way to describe
        files according to their contents                 files’ content and query
                                                          their descriptions
                                                             Metadata service
INFSO-RI-508833                                         ISSGC 09, Sophia-Antipolis, 10-07-09   3
SRM in an example
                     Enabling Grids for E-sciencE




                  She is running a job which needs:
                  Data for physics event reconstruction
                  Simulated Data
                  Some data analysis files            They are at CERN
                                                      in dCache
                  She will write files remotely too



                                                                     They are at Fermilab
                                                                     In a disk array

                        They are at Nikhef
                        in a classic SE



INFSO-RI-508833                                      ISSGC 09, Sophia-Antipolis, 10-07-09   4
SRM in an example
                  Enabling Grids for E-sciencE




   dCache
   Own system, own protocols
   and parameters
                                                          You as a
                                                         user need
   gLite DPM                                             to know all
   Independent system from
   dCache or Castor
                                                             the
                                                         systems!!!

   Castor
   No connection with
   dCache or DPM

INFSO-RI-508833                                    ISSGC 09, Sophia-Antipolis, 10-07-09   5
SRM in an example
                  Enabling Grids for E-sciencE




   dCache
   Own system, own protocols
   and parameters
                                                       I talk to them on your
                                                       behalf
                                                               You as a
                                                       I will user needspace
                                                              even allocate
   gLite DPM                                           for your know all
                                                             to files



                                                 SRM
   Independent system from                             And I will use transfer
   dCache or Castor                                    protocols the
                                                                   to send your files
                                                       theresystems!!!


   Castor
   No connection with
   dCache or DPM

INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09   5
Storage Resource Managers
                      Enabling Grids for E-sciencE




   • Data are stored on disk pool servers or Mass Storage
     Systems

   • Storage resource management needs to take into
     account
        –   Transparent access to files (migration to/from disk pool)
        –   File pinning
        –   Space reservation
        –   File status notification
        –   Life time management


   • The SRM (Storage Resource Manager) takes care of
     all these details
        – The SRM is a single interface that takes care of local storage
          interaction and provides a Grid interface to the outside world

INFSO-RI-508833                                      ISSGC 09, Sophia-Antipolis, 10-07-09   6
Data management with gLite
                      Enabling Grids for E-sciencE


    •   Assumptions:
         – Users and programs produce and require data
         – the lowest granularity of the data is on the file level (we
           deal with files rather than data objects or
           tables)
               Data = files

    •   Files:
         –   Mostly, write once, read many
         –   Located in Storage Elements (SEs)
         –   Several replicas of one file in different sites
         –   Accessible by Grid users and applications from “anywhere”
         –   Locatable by the WMS (data requirements in JDL)

    •   Also…
         – WMS can send (small amounts of) data to/from jobs: Input
           and Output Sandbox
         – Files may be copied from/to local filesystems (WNs, UIs) to
           the Grid (SEs)

INFSO-RI-508833                                      ISSGC 09, Sophia-Antipolis, 10-07-09   7
gLite Grid Storage Requirements
                      Enabling Grids for E-sciencE




•   The Storage Element is the service which allow a user or an application to
    store data for future retrieval

•   Manage local storage (disks) and interface to Mass Storage
    Systems(tapes) like
     – HPSS, CASTOR, DiskeXtender (UNITREE), …

•   Be able to manage different storage systems uniformly and transparently
    for the user (providing an SRM interface)

•   Support basic file transfer protocols
     – GridFTP mandatory
     – Others if available (https, ftp, etc)

•   Support a native I/O (remote file) access protocol
     – POSIX (like) I/O client library for direct access of data (GFAL)


INFSO-RI-508833                                         ISSGC 09, Sophia-Antipolis, 10-07-09   8
gLite Storage Element
                  Enabling Grids for E-sciencE




                            (http/https)




INFSO-RI-508833                                      ISSGC 09, Sophia-Antipolis, 10-07-09   9
gLite SE types history
                        Enabling Grids for E-sciencE



     • gLite data access protocols:
          – File Transfer: GSIFTP (GridFTP)/HTTP(S)
          – File I/O (Remote File access):
                   gsidcap
                   rfio (insecure RFIO)
                   gsirfio (secured RFIO)


     • Classic SE:
          – GridFTP server
          – Insecure RFIO daemon (rfiod) – only LAN limited
            file access
          – Single disk or disk array
          – No quota management
          – Does not support the SRM interface
INFSO-RI-508833                                             ISSGC 09, Sophia-Antipolis, 10-07-09   10
gLite SE types history(II)
                   Enabling Grids for E-sciencE




 • Mass Storage Systems (Castor)
      – Files migrated between front-end disk and back-end tape storage
        hierarchies
      – GridFTP server
      – Insecure RFIO (Castor)
      – Provide a SRM interface with all the benefits
 • Disk pool managers (dCache and gLite DPM)
      – manage distributed storage servers in a centralized way
      – Physical disks or arrays are combined into a common (virtual) file
        system
      – Disks can be dynamically added to the pool
      – GridFTP server
      – Secure remote access protocols (gsidcap for dCache, gsirfio for
        DPM)
      – SRM interface


INFSO-RI-508833                                          ISSGC 09, Sophia-Antipolis, 10-07-09   11
DPM Overview
                          Enabling Grids for E-sciencE


   • The Disk Pool Manager (DPM) is a lightweight solution
     for disk storage management, which offers the SRM
     interfaces (2.2 released in DPM version 1.6.3).
   • Each DPM–type Storage Element (SE), is composed
     by an head node and one (in the same machine) or
     more disk servers
   • The DPM handles the storage on Disk Servers.
         – It handles pools: a pool is a group of file systems, located
           on one or more disk servers. The DPM Disk Servers can
           have multiple filesystems in the pool.
   • Supported protocols:
        – File Transfers: GridFTP, HTTP/HTTPs
        – File I/O: GSIRFIO



EGEE-II INFSO-RI-031688                                  EGEE Tutorial - Barcelona 14th - 18th April 2008,   12
                                                                                                              3
DPM Architecture
                          Enabling Grids for E-sciencE




                                                                                                /dpm
                                                                                                   /domain
                                                                                                         /home
                    CLI, C API,                                                                              /vo
    (uid, gid1, …)
                 SRM-enabled client,                                          DPM
                       etc.                              da                 head node                               file
                                                           ta
                                                                tra
•    DPM Name Server                                               ns
                                                                      fe
    Namespace                                                           r
    Authorization
    Physical files location
•    Disk Servers
    Physical files
•    Direct data transfer from/to                            DPM
     disk server (no bottleneck)                         disk servers


EGEE-II INFSO-RI-031688                                         EGEE Tutorial - Barcelona 14th - 18th April 2008,      13
                                                                                                                        5
DPM architecture (details)
                          Enabling Grids for E-sciencE




EGEE-II INFSO-RI-031688                                   EGEE Tutorial - Barcelona 14th - 18th April 2008,   14
                                                                                                               6
DPM architecture (details)
                          Enabling Grids for E-sciencE




      CLI, C API,
     SRM-enabled
      client, etc.




EGEE-II INFSO-RI-031688                                   EGEE Tutorial - Barcelona 14th - 18th April 2008,   14
                                                                                                               6
DPM architecture (details)
                           Enabling Grids for E-sciencE




      CLI, C API,
     SRM-enabled
      client, etc.


                          SRMv1      SRMv2          SRMv2.2



       DPM                              /dpm
     head node                             /domain
                                              /home
                            DPM    DPNS          /vo
                                                    file
                           daemon daemon




EGEE-II INFSO-RI-031688                                       EGEE Tutorial - Barcelona 14th - 18th April 2008,   14
                                                                                                                   6
DPM architecture (details)
                           Enabling Grids for E-sciencE




      CLI, C API,
     SRM-enabled
      client, etc.


                          SRMv1      SRMv2          SRMv2.2



       DPM                              /dpm
     head node                             /domain
                                              /home
                            DPM    DPNS          /vo
                                                    file
                           daemon daemon




EGEE-II INFSO-RI-031688                                       EGEE Tutorial - Barcelona 14th - 18th April 2008,   14
                                                                                                                   6
DPM architecture (details)
                           Enabling Grids for E-sciencE




                                                                                                         DPM
                                                                                    …
                                                                                                     disk servers
      CLI, C API,                                                     Secure RFIO       Secure RFIO
                                                                        GridFTP           GridFTP
     SRM-enabled
      client, etc.


                          SRMv1      SRMv2          SRMv2.2



       DPM                              /dpm
     head node                             /domain
                                              /home
                            DPM    DPNS          /vo
                                                    file
                           daemon daemon




EGEE-II INFSO-RI-031688                                       EGEE Tutorial - Barcelona 14th - 18th April 2008,   14
                                                                                                                   6
DPM architecture (details)
                           Enabling Grids for E-sciencE




                                            data transfer                                                DPM
                                                                                    …
                                                                                                     disk servers
      CLI, C API,                                                     Secure RFIO       Secure RFIO
                                                                        GridFTP           GridFTP
     SRM-enabled
      client, etc.


                          SRMv1      SRMv2          SRMv2.2



       DPM                              /dpm
     head node                             /domain
                                              /home
                            DPM    DPNS          /vo
                                                    file
                           daemon daemon




EGEE-II INFSO-RI-031688                                       EGEE Tutorial - Barcelona 14th - 18th April 2008,   14
                                                                                                                   6
DPM architecture (details)
                           Enabling Grids for E-sciencE




                                            data transfer                                                DPM
                                                                                    …
                                                                                                     disk servers
      CLI, C API,                                                     Secure RFIO       Secure RFIO
                                                                        GridFTP           GridFTP
     SRM-enabled
      client, etc.


                          SRMv1      SRMv2          SRMv2.2



       DPM                              /dpm
     head node                             /domain
                                              /home
                            DPM    DPNS          /vo
                                                    file
                           daemon daemon
                                                                                         DPNS database


                                                                          DPM database

EGEE-II INFSO-RI-031688                                       EGEE Tutorial - Barcelona 14th - 18th April 2008,   14
                                                                                                                   6
DPM strengths
                          Enabling Grids for E-sciencE




      • Easy to install/configure
            - Few configuration files
      • Manageable storage
            - Logical Namespace
            - Easy to add/remove file systems
      •   Low maintenance effort
      •   Supports as many disk servers as needed
      •   Low memory footprint
      •   Low CPU utilization



EGEE-II INFSO-RI-031688                                  EGEE Tutorial - Barcelona 14th - 18th April 2008,   15
                                                                                                              7
Files Naming conventions
                       Enabling Grids for E-sciencE


   • Logical File Name (LFN)
        – An alias created by a user to refer to some item of data, e.g. “lfn:/grid/
           gilda/20030203/run2/track1”
   • Globally Unique Identifier (GUID)
        – A non-easy-to-remember unique identifier for an item of data, e.g.
           “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
   • Site URL (SURL) (or Physical File Name (PFN) or Site FN)
        – The location of an actual piece of data on a storage system
            e.g. “srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1”        (SRM)
   • Transport URL (TURL)
        – Temporary locator of a replica + access protocol: understood by a SE
        – e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”




INFSO-RI-508833                                               ISSGC 09, Sophia-Antipolis, 10-07-09   16
What is a file catalog
                                    File Catalog
                    Enabling Grids for E-sciencE




                                                                                  SE




                                                                                  SE




            gLite                                                                 SE
             UI




INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09   17
Upload and file convention example
                       Enabling Grids for E-sciencE

 [issgc59@issgc-ui ~]$ lcg-cr -v -d aliserv6.ct.infn.it -l lfn:/grid/gilda/tutorials/
 issgc09/message0.sh file:/home/issgc59/message0.sh
 Using grid catalog type: lfc
 Using grid catalog : lfc-gilda.ct.infn.it
 SE type: SRMv1
 Destination SURL : srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/
 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88
 Source SRM Request Token: 401296
 Source URL: file:/home/issgc59/message0.sh
 File size: 20
 VO name: gilda
 Destination specified: aliserv6.ct.infn.it
 Destination URL for copy: gsiftp://aliserv6.ct.infn.it/aliserv6.ct.infn.it:/data01/gilda/
 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88.401296.0
 # streams: 1
            20 bytes           0.01 KB/sec avg        0.01 KB/sec inst
 Transfer took 3070 ms
 Using LFN: lfn:/grid/gilda/tutorials/issgc09/message0.sh
 Using GUID: guid:2eed7180-2505-46fb-bef2-19ef77f1b2fd
 Registering LFN: /grid/gilda/tutorials/issgc09/message0.sh (2eed7180-2505-46fb-
 bef2-19ef77f1b2fd)
 Registering SURL: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/
 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88 (2eed7180-2505-46fb-bef2-19ef77f1b2fd)
  guid:2eed7180-2505-46fb-bef2-19ef77f1b2fd
INFSO-RI-508833                                                ISSGC 09, Sophia-Antipolis, 10-07-09
The LFC (LCG File Catalog)
                        Enabling Grids for E-sciencE




•   It keeps track of the location of copies (replicas) of Grid files
•   LFN acts as main key in the database. It has:
     –   Symbolic links to it (additional LFNs)
     –   Unique Identifier (GUID)
     –   System metadata
     –   Information on replicas
     –   One field of user metadata




INFSO-RI-508833                                            ISSGC 09, Sophia-Antipolis, 10-07-09   19
LFC Features
                   Enabling Grids for E-sciencE



         – Cursors for large queries
         – Timeouts and retries from the client
         – User exposed transactional API (+ auto rollback on failure)
         – Hierarchical namespace and namespace operations (for
           LFNs)
         – Integrated GSI Authentication + Authorization
         – Access Control Lists (Unix Permissions and POSIX ACLs)
         – Checksums
         – Integration with VOMS (VirtualID and VirtualGID)




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   20
lfc-ls
                         Enabling Grids for E-sciencE



 Listing the entries of a LFC directory
      lfc-ls  [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side]     [--ds]
         path…
      where path specifies the LFN pathname (mandatory)

      – Remember that LFC has a directory tree structure
      – /grid/<VO_name>/<you create it>

           LFC Namespace                   Defined by the user


      – All members of a VO have read-write permissions under
        their directory
      – You can set LFC_HOME to use relative paths
                  > lfc-ls /grid/gilda/tony
                                                                 -l : long listing
                  > export LFC_HOME=/grid/gilda                  -R : list the contents of directories
                  > lfc-ls -l tony                               recursively: Don’t use it!
                  > lfc-ls -l -R /grid

INFSO-RI-508833                                                    ISSGC 09, Sophia-Antipolis, 10-07-09   21
lfc-mkdir
                     Enabling Grids for E-sciencE




   Creating directories in the LFC
        lfc-mkdir [-m mode] [-p] path...

   •   Where path specifies the LFC pathname

   •   Remember that while registering a new file (using lcg-cr, for
       example) the corresponding destination directory must be created in
       the catalog beforehand.

   •   Examples:
        > lfc-mkdir /grid/gilda/tony/demo


        You can just check the directory with:
        > lfc-ls -l /grid/gilda/tony
             drwxr-xrwx 0 19122 1077                0 Jun 14 11:36 demo



INFSO-RI-508833                                       ISSGC 09, Sophia-Antipolis, 10-07-09   22
lfc-ln
                      Enabling Grids for E-sciencE


         Creating a symbolic link
         lfc-ln -s file linkname
         lfc-ln -s directory linkname

         Create a link to the specified file or directory with linkname

         – Examples:
         > lfc-ln -s /grid/gilda/tony/demo/test /grid/gilda/tony/aLink

                                                                     Symbolic
                                      Original File                    link


         Let’s check the link using lfc-ls with long listing (-l):
         > lfc-ls -l
         lrwxrwxrwx 1 19122 1077 0 Jun 14 11:58 aLink ->/grid/gilda/tony/demo/test
         drwxr-xrwx 1 19122 1077 0 Jun 14 11:39 demo

INFSO-RI-508833                                           ISSGC 09, Sophia-Antipolis, 10-07-09   23
LFC commands
                      Enabling Grids for E-sciencE


     Summary of the LFC Catalog commands
     lfc-chmod                              Change access mode of the LFC file/directory
     lfc-chown                              Change owner and group of the LFC file-directory

     lfc-delcomment                         Delete the comment associated with the file/directory

     lfc-getacl                             Get file/directory access control lists

     lfc-ln                                 Make a symbolic link to a file/directory

     lfc-ls                                 List file/directory entries in a directory

     lfc-mkdir                              Create a directory

     lfc-rename                             Rename a file/directory

     lfc-rm                                 Remove a file/directory

     lfc-setacl                             Set file/directory access control lists

     lfc-setcomment                         Add/replace a comment



INFSO-RI-508833                                                     ISSGC 09, Sophia-Antipolis, 10-07-09   24
LFC C API
                      Enabling Grids for E-sciencE


    Low level methods (many POSIX-like):
                                                                               lfc_setacl
     lfc_access             lfc_deleteclass          lfc_listreplica           lfc_setatime
     lfc_aborttrans         lfc_delreplica           lfc_lstat                 lfc_setcomment
     lfc_addreplica         lfc_endtrans             lfc_mkdir                 lfc_seterrbuf
     lfc_apiinit            lfc_enterclass           lfc_modifyclass           lfc_setfsize
     lfc_chclass            lfc_errmsg               lfc_opendir               lfc_starttrans
     lfc_chdir              lfc_getacl               lfc_queryclass            lfc_stat
     lfc_chmod              lfc_getcomment           lfc_readdir               lfc_symlink
     lfc_chown              lfc_getcwd               lfc_readlink              lfc_umask
     lfc_closedir           lfc_getpath              lfc_rename                lfc_undelete
     lfc_creat              lfc_lchown               lfc_rewind                lfc_unlink
     lfc_delcomment         lfc_listclass            lfc_rmdir                 lfc_utime
     lfc_delete             lfc_listlinks            lfc_selectsrvr            send2lfc

INFSO-RI-508833                                                  ISSGC 09, Sophia-Antipolis, 10-07-09   25
GFAL: Grid File Access Library
                        Enabling Grids for E-sciencE


•   Interactions with SE require some components:
    - File catalog services to locate replicas
    - SRM
    - File access mechanism to access files from the SE on the WN
•   GFAL does all this tasks for you:
    - Hides all these operations
    - Presents a POSIX interface for the I/O operations
      - Single shared library in threaded and unthreaded versions (libgfal.so, libgfal_pthr.so)
      - Single header file (gfal_api.h)
    - User can create all commands needed for storage management
    - It offers as well an interface to SRM
•   Supported protocols:
    - file (local or nfs-like access)
    - dcap, gsidcap and kdcap (dCache access)
    - rfio (castor access) and gsirfio (DPM)

INFSO-RI-508833                                                  ISSGC 09, Sophia-Antipolis, 10-07-09   26
GFAL: File I/O API (I)
                   Enabling Grids for E-sciencE




    int gfal_access (const char *path, int amode);
    int gfal_chmod (const char *path, mode_t mode);
    int gfal_close (int fd);
    int gfal_creat (const char *filename, mode_t mode);
    off_t gfal_lseek (int fd, off_t offset, int whence);
    int gfal_open (const char * filename, int flags, mode_t mode);
    ssize_t gfal_read (int fd, void *buf, size_t size);
    int gfal_rename (const char *old_name, const char *new_name);
    ssize_t gfal_setfilchg (int, const void *, size_t);
    int gfal_stat (const char *filename, struct stat *statbuf);
    int gfal_unlink (const char *filename);
    ssize_t gfal_write (int fd, const void *buf, size_t size);




INFSO-RI-508833                                       ISSGC 09, Sophia-Antipolis, 10-07-09   27
GFAL: FC API (II)
                   Enabling Grids for E-sciencE



   int gfal_closedir (DIR *dirp);
   int gfal_mkdir (const char *dirname, mode_t mode);
   DIR *gfal_opendir (const char *dirname);
   struct dirent *gfal_readdir (DIR *dirp);
   int gfal_rmdir (const char *dirname);




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   28
GFAL: Catalog API
                     Enabling Grids for E-sciencE


        int create_alias (const char *guid, const char *lfn, long long size)
        int guid_exists (const char *guid)
        char *guidforpfn (const char *surl)
        char *guidfromlfn (const char *lfn)
        char **lfnsforguid (const char *guid)
        int register_alias (const char *guid, const char *lfn)
        int register_pfn (const char *guid, const char *surl)
        int setfilesize (const char *surl, long long size)
        char *surlfromguid (const char *guid)
        char **surlsfromguid (const char *guid)
        int unregister_alias (const char *guid, const char *lfn)
        int unregister_pfn (const char *guid, const char *surl)




INFSO-RI-508833                                       ISSGC 09, Sophia-Antipolis, 10-07-09   29
GFAL: SRM API
                     Enabling Grids for E-sciencE




       int deletesurl (const char *surl)
       int getfilemd (const char *surl, struct stat64 *statbuf)
       int set_xfer_done (const char *surl, int reqid, int fileid, char
          *token, int oflag)
       int set_xfer_running (const char *surl, int reqid, int fileid,
          char *token)
       char *turlfromsurl (const char *surl, char **protocols, int
         oflag, int *reqid, int *fileid, char **token)
       int srm_get (int nbfiles, char **surls, int nbprotocols, char
          **protocols, int *reqid, char **token, struct srm_filestatus
          **filestatuses)
       int srm_getstatus (int nbfiles, char **surls, int reqid, char
          *token, struct srm_filestatus **filestatuses)


INFSO-RI-508833                                       ISSGC 09, Sophia-Antipolis, 10-07-09   30
GFAL Java API
                   Enabling Grids for E-sciencE




      •   GFAL API are available for C/C++ programmers
      •   We wrote a wrapper around the C APIs using Java Native
          Interface and a the Java APIs on top of it
      •   More information can be found here:
          https://grid.ct.infn.it/twiki/bin/view/GILDA/APIGFAL




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   31
lcg_utils DM tools
                    Enabling Grids for E-sciencE



    • High level interface (CL tools and APIs) to
         – Upload/download files to/from the Grid (UI,CE and WN
           <---> SEs)
         – Replicate data between SEs and locate the best replica
           available
         – Interact with the file catalog


    • Definition: A file is considered to be a Grid File if
      it is both physically present in a SE and
      registered in the File Catalog

    • lcg-utils ensure the consistency between files
      in the Storage Elements and entries in the File
      Catalog

INFSO-RI-508833                                    ISSGC 09, Sophia-Antipolis, 10-07-09   32
lcg-utils commands
                         Enabling Grids for E-sciencE

     Replica Management
     lcg-cp          Copies a grid file to a local destination
     lcg-cr          Copies a file to a SE and registers the file in the catalog
     lcg-del         Delete one file
     lcg-rep         Replication between SEs and registration of the replica
     lcg-gt          Gets the TURL for a given SURL and transfer protocol
     lcg-sd          Sets file status to “Done” for a given SURL in a SRM
                     request
     File Catalog Interaction

     lcg-aa       Add an alias in LFC for a given GUID

     lcg-ra       Remove an alias in LFC for a given GUID

     lcg-rf       Registers in LFC a file placed in a SE
     lcg-uf       Unregisters in LFC a file placed in a SE
     lcg-la       Lists the alias for a given SURL, GUID or LFN
     lcg-lg       Get the GUID for a given LFN or SURL
     lcg-lr       Lists the replicas for a given GUID, SURL or LFN


INFSO-RI-508833                                                      ISSGC 09, Sophia-Antipolis, 10-07-09   33
Enabling Grids for E-sciencE
                                                                     LFC interfaces
                                    SE          SE             SE



                  LCG
                  UTIL                         GFAL




                                            Python
                                                                     LFC
                                                                    CLIENT                 LFC
                                                                     C API                SERVER
                  WMS                          DLI




                                               CLI
                                          lfc-ls, lfc-mkdir,
                                            lfc-setacl, …




INFSO-RI-508833                                                         ISSGC 09, Sophia-Antipolis, 10-07-09   34
Data movement introduction
                     Enabling Grids for E-sciencE




    • Grids are naturally distributed systems
    • The means that data also needs to be distributed
         – First generation data distribution mainly concentrated on copy
           protocols in a grid environment:
               gridftp
               http + mod_gridsite
    • But copies controlled by clients have problems…




INFSO-RI-508833                                      ISSGC 09, Sophia-Antipolis, 10-07-09
Direct Client Controlled
                      Enabling Grids for E-sciencE
                                                                          Data Movement

                                                     Client
                                                                       Control
                                                                      Channels


       Source Storage Element                         Data Flow
                                                       Channel          Destination Storage
                                                                              Element

          •   Although transport protocol may be robust, state is held inside
              client – inconvenient and fragile.
          •   Client only knows about local state, no sense of global
              knowledge about data transfers between storage elements.
               – Storage elements overwhelmed with replication requests
               – Multiple replications of the same data can happen
                  simultaneously
               – Site has little control over balance of network resources - DoS



INFSO-RI-508833                                                     ISSGC 09, Sophia-Antipolis, 10-07-09
Transfer Service
                      Enabling Grids for E-sciencE




    • Clear need for a service
      for data transfer                                                      •Submit new request
                                                                             •Monitor progress
         – Client connects to service                   Client               •Cancel request
           to submit request
         – Service maintains state
           about transfer                                                SOAP via https
         – Client can periodically
           reconnect to check status
           or cancel request                                       Transfer
         – Service can have                                        Service              Control
           knowledge of global state,
           not just a single request
               Load balancing
               Scheduling                            Source
                                                     Storage                     Destination
                                                     Element                      Storage
                                                                     Data         Element
                                                                     Flow



INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09
gLite FTS: Channels
                    Enabling Grids for E-sciencE




   •   FTS Service has a concept of
       channels
   •   A channel is a unidirectional
       connection between two sites
   •   Transfer requests between these
       two sites are assigned to that
       channel
   •   Channels usually correspond to a                • Channels control certain
       dedicated network pipe (e.g., OPN)                transfer properties:
       associated with production                        transfer concurrency,
   •   But channels can also take                        gridftp streams.
       wildcards:
        – * to MY_SITE : All incoming                  • Channels can be
        – MY SITE to * : All outgoing                    controlled
                                                         independently: started,
        – * to * : Catch all
                                                         stopped, drained.



INFSO-RI-508833                                       ISSGC 09, Sophia-Antipolis, 10-07-09
Why Grid needs Metadata?
                    Enabling Grids for E-sciencE




   • Grids allow to save millions of files spread over
     several storage sites.
   • Users and applications need an efficient mechanism
        – to describe files
        – to locate files based on their contents

   • This is achieved by
        – associating descriptive attributes to files
             Metadata is data about data
        – answering user queries against the associated information




INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09
Basic Metadata Concept
                    Enabling Grids for E-sciencE




     • Entries – Representation of real world entities which
       we are attaching metadata to for describing them
     • Attribute – key/value pair
          – Type – The type (int, float, string,…)
          – Name/Key – The name of the attribute
          – Value - Value of an entry's attribute
     • Schema – A set of attributes
     • Collection – A set of entries associated with a
       schema
     • Metadata - List of attributes (including their values)
       associated with entries


INFSO-RI-508833                                          ISSGC 09, Sophia-Antipolis, 10-07-09   40
Example: Movie Trailers
                          Enabling Grids for E-sciencE




      • Movie trailers files (entries) saved on Grid Storage
        Elements and registered into File Catalogue
      • We want to add metadata to describe movie content.
      • A possible schema:
           –      Title -- varchar
           –      Runtime -- int
           –      Cast -- varchar
           –      LFN -- varchar


      • A metadata catalogue will be the repository of the
        movies’ metadata and will allow to find movies
        satisfying users’ queries


INFSO-RI-508833                                                ISSGC 09, Sophia-Antipolis, 10-07-09
Trailer example
                          Enabling Grids for E-sciencE




 Entry names                              Title          Ru Cast            LFN
8c3315c1-811f-4823-a778-60a203439689
                                     My Best             nti Julia
                                                         80                 lfn:/grid/gilda/movies/
                                     Friend’s                Roberts        mybfwed.avi
                                     wedding             me Kirsten
51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2        120                lfn:/grid/gilda/movies/
                                                               Dunst        spiderman2.avi
 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father     113   Al pacino    lfn:/grid/gilda/movies/
                                                                            godfather.avi




INFSO-RI-508833                                                   ISSGC 09, Sophia-Antipolis, 10-07-09   42
Trailer example
                          Enabling Grids for E-sciencE




                                                               Attribute


 Entry names                              Title          Ru Cast             LFN
8c3315c1-811f-4823-a778-60a203439689
                                     My Best             nti Julia
                                                         80                  lfn:/grid/gilda/movies/
                                     Friend’s                Roberts         mybfwed.avi
                                     wedding             me Kirsten
51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2        120                 lfn:/grid/gilda/movies/
                                                                Dunst        spiderman2.avi
 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father     113    Al pacino    lfn:/grid/gilda/movies/
                                                                             godfather.avi




INFSO-RI-508833                                                    ISSGC 09, Sophia-Antipolis, 10-07-09   42
Trailer example
                          Enabling Grids for E-sciencE




                           Schema                              Attribute


 Entry names                              Title          Ru Cast             LFN
8c3315c1-811f-4823-a778-60a203439689
                                     My Best             nti Julia
                                                         80                  lfn:/grid/gilda/movies/
                                     Friend’s                Roberts         mybfwed.avi
                                     wedding             me Kirsten
51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2        120                 lfn:/grid/gilda/movies/
                                                                Dunst        spiderman2.avi
 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father     113    Al pacino    lfn:/grid/gilda/movies/
                                                                             godfather.avi




INFSO-RI-508833                                                    ISSGC 09, Sophia-Antipolis, 10-07-09   42
Trailer example
                          Enabling Grids for E-sciencE




                           Schema                              Attribute


 Entry names                              Title          Ru Cast             LFN
8c3315c1-811f-4823-a778-60a203439689
                                     My Best             nti Julia
                                                         80                  lfn:/grid/gilda/movies/
                                     Friend’s                Roberts         mybfwed.avi
                                     wedding             me Kirsten
51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2        120                 lfn:/grid/gilda/movies/
                                                                Dunst        spiderman2.avi
 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father     113    Al pacino    lfn:/grid/gilda/movies/
                                                                             godfather.avi



                                                                                       Entries


INFSO-RI-508833                                                    ISSGC 09, Sophia-Antipolis, 10-07-09   42
Trailer example
                          Enabling Grids for E-sciencE




                           Schema                              Attribute


 Entry names                              Title          Ru Cast             LFN
8c3315c1-811f-4823-a778-60a203439689
                                     My Best             nti Julia
                                                         80                  lfn:/grid/gilda/movies/
                                     Friend’s                Roberts         mybfwed.avi
                                     wedding             me Kirsten
51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2        120                 lfn:/grid/gilda/movies/
                                                                Dunst        spiderman2.avi
 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father     113    Al pacino    lfn:/grid/gilda/movies/
                                                                             godfather.avi


                      Collection /trailers
                                                                                       Entries


INFSO-RI-508833                                                    ISSGC 09, Sophia-Antipolis, 10-07-09   42
Metadata service on the Grid
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   43
Metadata service on the Grid
                  Enabling Grids for E-sciencE



   • Information about files -- but not only!




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   43
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not only!
   • metadata can describe any grid entity/object
        – ex: JobIDs - add logging information to your jobs




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   43
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not only!
   • metadata can describe any grid entity/object
        – ex: JobIDs - add logging information to your jobs
   • monitoring of running applications:
        – ex: ongoing results from running jobs can be published on
          the metadata server




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   43
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not only!
   • metadata can describe any grid entity/object
        – ex: JobIDs - add logging information to your jobs
   • monitoring of running applications:
        – ex: ongoing results from running jobs can be published on
          the metadata server
   • Inputset for a storm of parametric jobs




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   43
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not only!
   • metadata can describe any grid entity/object
        – ex: JobIDs - add logging information to your jobs
   • monitoring of running applications:
        – ex: ongoing results from running jobs can be published on
          the metadata server
   • Inputset for a storm of parametric jobs
   • information exchanging among grid peers
        – ex: producers/consumers job collections: master jobs
          produce data to be analyzed; slave jobs query the metadata
          server to retrieve input to “consume”




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   43
Metadata service on the Grid
                   Enabling Grids for E-sciencE



   • Information about files -- but not only!
   • metadata can describe any grid entity/object
        – ex: JobIDs - add logging information to your jobs
   • monitoring of running applications:
        – ex: ongoing results from running jobs can be published on
          the metadata server
   • Inputset for a storm of parametric jobs
   • information exchanging among grid peers
        – ex: producers/consumers job collections: master jobs
          produce data to be analyzed; slave jobs query the metadata
          server to retrieve input to “consume”
   • Simplified DB access on the grid
        – Grid applications that needs structured data can model their
          data schemas as metadata

INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   43
Monitoring of running application
                        Enabling Grids for E-sciencE




                   SE                                                             showing results
                                                                                  as long as they
                                                                                  are produced
                                          W
                                          N


                  CE                       WN            Metadata
                                                         Catalogue
                                                       /results collection
                                          WN
        Workload
        Manager



             Scientist/Developer                                                       Customer/
             submitting jobs                                                           Scientist

INFSO-RI-508833                                                      ISSGC 09, Sophia-Antipolis, 10-07-09   44
Inputset for parametric jobs
                             Enabling Grids for E-sciencE




   • /grid/my_simulation/input
  ----------------------------------------------------------------------------------------------------
  |entry     |x1        |x2        |y1        |y2        |step      |isTaken   |found     |output    |
  |--------------------------------------------------------------------------------------------------|
  |1         |9453.1    |9453.32   |-439.93   |-439.91   |0.0006    |JobID1234 |No pillars|          |
  |2         |9342.13   |3435      |3423      |2343.2    |0.003     |No        |          |          |
  |3         |34254.3   |342342    |432.43    |132       |0.002     |No        |          |          |
  | ...... and so on                                                                                 |
  ----------------------------------------------------------------------------------------------------

  • This collection lists all the parameter set to be run on the
    Grid
  • On the WN, one of the inputset is selected and “isTaken” is
    set = JOB_ID of the job that has fetched it
  • Results is also written in the “found” column to monitor the
    simulation
     • so users can check the simulation from a UI, querying the
        metadata server, or from a WebPage (using APIs for ex)
   • StdOutput can be copied also into the “output”
     text column
INFSO-RI-508833                                                                ISSGC 09, Sophia-Antipolis, 10-07-09   45
A possible parameter-get.sh script
                      Enabling Grids for E-sciencE

   #!/bin/bash
   # Find the first set of parameters that has not been taken by noone
   ID=`mdcli find /grid/my_simulation/input 'isTaken="No"' | head -1`
   # Exit if all the parameters set has been already analyzed
   if [ "$ID" = "" ]; then exit 1; fi
   # set isTaken as its JOB_ID so that no one else will analyze the same set of parameter
   mdcli setattr /grid/my_simulation/input/$ID isTaken `echo $GLITE_WMS_JOBID`
   # retrieve the set of the parameter to be scanned
   X1=`mdcli getattr /grid/my_simulation/input/$ID x1 | tail -1`
   Y1=`mdcli getattr /grid/my_simulation/input/$ID y1 | tail -1`
   X2=`mdcli getattr /grid/my_simulation/input/$ID x2 | tail -1`
   Y2=`mdcli getattr /grid/my_simulation/input/$ID y2 | tail -1`
   STEP=`mdcli getattr /grid/my_simulation/input/$ID step | tail -1`


   # Run the scan with the proper parameter and save the output to output.txt
   java -cp issgc_sfk_nesc.jar:sfkscanner.jar  uk.ac.nesc.toe.sfk.radar.Scanner $X1 $Y1 $X2 $Y2
   $STEP > output.txt


   # the Scanner class returns the writing "No pillars found in this area" or "Found area:" -
   so this will give useful info for monitoring during the run
   mdcli setattr /grid/my_simulation/input/$ID found `cat output.txt | grep -i found`
   # save the output (and the pillar text if found) on the metadata server
   mdcli setattr /grid/my_simulation/input/$ID output `cat output.txt`

INFSO-RI-508833                                             ISSGC 09, Sophia-Antipolis, 10-07-09   46
Use a Metadata services to exchange data
                  Enabling Grids for E-sciencE
                                                          among running jobs




INFSO-RI-508833                                 ISSGC 09, Sophia-Antipolis, 10-07-09
Use a Metadata services to exchange data
                   Enabling Grids for E-sciencE
                                                           among running jobs



     • Suppose we have two sets of jobs:
         – Producers: they generate a file, store on a SE, register
           it onto the LFC File Catalogue assigning a LFN
         – Consumers: they will take a LFN, download the file and
           elaborate it




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09
Use a Metadata services to exchange data
                   Enabling Grids for E-sciencE
                                                           among running jobs



     • Suppose we have two sets of jobs:
         – Producers: they generate a file, store on a SE, register
           it onto the LFC File Catalogue assigning a LFN
         – Consumers: they will take a LFN, download the file and
           elaborate it

     • A Metadata collection can be used to share the
       information generated by the Producers; it
       could act as a “bag-of-LFNs” (bag-of-task
       model) from which Consumers can fetch file for
       further elaboration



INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09
Information exchanging among grid peers
                        Enabling Grids for E-sciencE




                                                                SE
                                 Producers jobs
                                                                                 Consumers jobs
                                put LFN
                                                 W                                         fetch LFN
                                                 N                                W
                                                                                  N


                   CE                             WN    Metadata                   WN
                                                        Catalogue
                                                       /bag-of-LFNs
                                                  WN    collection
                                                                                  WN
                                                          CE
   Workload
   Manager

                  Scientist/Developer
                  submitting jobs

INFSO-RI-508833                                                ISSGC 09, Sophia-Antipolis, 10-07-09   48
The AMGA Metadata Catalogue
                    Enabling Grids for E-sciencE




    • Official metadata service for the gLite middleware
     – but no dependencies from gLite software
     – it can be used with other grid technologies/other environments
    • AMGA: Arda Metadata Grid Application
    • Provide a complete but simple interface, in order to make all
      users able to use it easily.
    • Designed with scalability in mind in order to deal with large
      number of entries
     – based on a lightweight and streamed text-based protocol, like HTTP/SMTP
    • Grid security is provided to grant different access levels to
      different users.
    • Flexible with support to dynamic schemas in order to serve
      several application domains
    • Simple installation by tar source, RPMs or Yum/YAIM


INFSO-RI-508833                                     ISSGC 09, Sophia-Antipolis, 10-07-09   49
AMGA Features
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09
AMGA Features
                    Enabling Grids for E-sciencE




 • Dynamic Schemas
      – Schemas can be modified at runtime by client
            Create, delete schemas
            Add, remove attributes




INFSO-RI-508833                                    ISSGC 09, Sophia-Antipolis, 10-07-09
AMGA Features
                    Enabling Grids for E-sciencE




 • Dynamic Schemas
      – Schemas can be modified at runtime by client
            Create, delete schemas
            Add, remove attributes
 • AMGA collections are hierarchical organized
      – Collections can contain sub-collections
      – Sub-collections can inherit/extend parent collection’ schema




INFSO-RI-508833                                    ISSGC 09, Sophia-Antipolis, 10-07-09
AMGA Features
                      Enabling Grids for E-sciencE




 • Dynamic Schemas
      – Schemas can be modified at runtime by client
            Create, delete schemas
            Add, remove attributes
 • AMGA collections are hierarchical organized
      – Collections can contain sub-collections
      – Sub-collections can inherit/extend parent collection’ schema
 • Flexible Queries
      – SQL-like query language
      – Different join type (inner, outer, left, right) between schemas are
        provided
        selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album
        '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, “%.mp3")‘




INFSO-RI-508833                                            ISSGC 09, Sophia-Antipolis, 10-07-09
AMGA Features
                      Enabling Grids for E-sciencE




 • Dynamic Schemas
      – Schemas can be modified at runtime by client
            Create, delete schemas
            Add, remove attributes
 • AMGA collections are hierarchical organized
      – Collections can contain sub-collections
      – Sub-collections can inherit/extend parent collection’ schema
 • Flexible Queries
      – SQL-like query language
      – Different join type (inner, outer, left, right) between schemas are
        provided
        selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album
        '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, “%.mp3")‘


       Support for Views, Constraints, Indexes

INFSO-RI-508833                                            ISSGC 09, Sophia-Antipolis, 10-07-09
Example
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09
AMGA Security
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   52
AMGA Security
                  Enabling Grids for E-sciencE




  • Unix style permissions - users and groups




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   52
AMGA Security
                  Enabling Grids for E-sciencE




  • Unix style permissions - users and groups
  • ACLs – Per-collection or per-entry (table row).




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   52
AMGA Security
                  Enabling Grids for E-sciencE




  • Unix style permissions - users and groups
  • ACLs – Per-collection or per-entry (table row).
  • Secure client/server connections – SSL




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   52
AMGA Security
                  Enabling Grids for E-sciencE




  •   Unix style permissions - users and groups
  •   ACLs – Per-collection or per-entry (table row).
  •   Secure client/server connections – SSL
  •   Client Authentication based on
       – Username/password
       – General X509 certificates (DN based)
       – Grid-proxy certificates (DN based)




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   52
AMGA Security
                  Enabling Grids for E-sciencE




  •   Unix style permissions - users and groups
  •   ACLs – Per-collection or per-entry (table row).
  •   Secure client/server connections – SSL
  •   Client Authentication based on
       – Username/password
       – General X509 certificates (DN based)
       – Grid-proxy certificates (DN based)
  • VOMS support:
       – VO attribute maps to defined AMGA user
       – VOMS Role maps to defined AMGA user
       – VOMS Group maps to defined AMGA group




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   52
AMGA Implementation
                            Enabling Grids for E-sciencE




 • C++ multiprocess server
      – Backends
            Oracle, MySQL 4/5, PostgreSQL,
             SQLite
      – Front Ends
              TCP text streaming
                  •   High performance
                  •   Client API for C++, Java, Python,
                      Perl, PHP
              SOAP (deprecated)
                  •   Interoperability                     • AMGA server runs on SLC3/4,
                  •   Scalability                           Fedora Core, Gentoo, Debian
              WS-DAIR Interface (new in
              AMGA 2.0)
                  •   WS-enable environment


 • Standalone Python Library
   implementation
      – Data stored on file system

INFSO-RI-508833                                                   ISSGC 09, Sophia-Antipolis, 10-07-09
AMGA Datatypes
                  Enabling Grids for E-sciencE




   - Using the above datatypes you are sure that your
     metadata can be easily moved to all supported back-
     ends

   - If you do not care about DB portability, you can use, in
     principle, as entry attribute type ALL the datatypes
     supported by the back-end, even the more esoteric ones
     (PostgreSQL Network Address type or Geometric ones)
INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09
Accessing AMGA from UI/WNs
                     Enabling Grids for E-sciencE



   • TCP Streaming Front-end
        – mdcli & mdclient CLI and C++ API (md_cli.h, MD_Client.h)
        – Java Client API and command line mdjavaclient.sh & mdjavacli.sh
          (also under Windows !!)
        – Python and Perl Client API
        – PHP Client API – NEW
              developed totally by the GILDA team – INFN CT
        – AMGA Web Interface (AMGA WI) ---NEW
              Developed totally by the GILDA team – INFN CT
              Based on JAVA AMGA Standard APIs
              Web Application using standard as JSP Custom Tags, Servlet
   • SOAP Frontend (WSDL)
        – C++ gSOAP
        – AXIS (Java)
        – ZSI (Python)

INFSO-RI-508833                                      ISSGC 09, Sophia-Antipolis, 10-07-09
Advanced features: Metadata Replication
                        Enabling Grids for E-sciencE




     • AMGA provides a replication/federation mechanisms
     • Motivation
          –   Scalability – Support hundreds/thousands of concurrent users
          –   Geographical distribution – Hide network latency
          –   Reliability – No single point of failure
          –   DB Independent replication – Heterogeneous DB systems
          –   Disconnected computing – Off-line access (laptops)
     • Architecture
          – Asynchronous replication
          – Master-slave – writes only allowed on the master
          – Application level replication
                   Replicate Metadata commands
          – Partial replication – supports replication of only sub-trees of the
            metadata hierarchy


INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09
Metadata Replication: Use cases
                       Enabling Grids for E-sciencE




                  Full replication                         Partial replication




                  Federation                                      Proxy




INFSO-RI-508833                                       ISSGC 09, Sophia-Antipolis, 10-07-09
Existing SQL DBs importing
                   Enabling Grids for E-sciencE




    • Since AMGA 1.2.10, a new import feature allow to
      access existing DB table
    • Once imported into AMGA the tables from one or
      more DBs you want to access through AMGA, you
      can exploit many of the features brought to you by
      AMGA for your existing tables
    • Advantages:
         – your db tables can be accessed by grid users/applications,
           using grid authentication (VOMS proxies)/authorization with
           ACLs
         – exploiting AMGA federation features you can access several
           databases together from the Grid



INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   58
Set up AMGA to access your tables
                        Enabling Grids for E-sciencE




     • To remember: AMGA stores its own tables in its
       DB backend
     • To access and existing DB you have 2 option:
                   import the tables of the DB you want to access to into
                    AMGA DB backend
                   viceversa, add AMGA DB backed tables to the DB you want
                    to access to
     • Use the import command by root to “mount” you
       table into the AMGA collection hierarchy
              Query> whoami
              >> root
              Query> createdir /world
              Query> cd /world/
              Query> import world.City /world/City
              Query> import world.Country /world/Country
              Query> import world.CountryLanguage /world/CountryLanguage



INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09   59
Set up AMGA to access your tables
                     Enabling Grids for E-sciencE




      • Properly set up authorization on the imported
        tables:
        Query> acl_remove /world/City/ system:anyuser
        Query> acl_remove /world/Country system:anyuser
        Query> acl_add /world/ gilda:users rx
        Query> acl_show /world
        >> root rwx
        >> gilda:users rx
        >> system:anyuser rx
        Query> selectattr City:CountryCode City:Name 'like(City:Name, "Am%")
        limit 5'
        >> NLD
        >> Amsterdam
        >> NLD
        >> Amersfoort
        >> BRA
        >> Americana
        >> ECU
        >> Ambato
        >> IDN

      ‣ More information on existing DB access @:
          ‣       http://amga.web.cern.ch/amga/importing.html
          ‣       https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess
INFSO-RI-508833                                              ISSGC 09, Sophia-Antipolis, 10-07-09   60
Native SQL Support
                      Enabling Grids for E-sciencE



   • Objective:
        – implement native SQL query processing functionality in AMGA

   • Current Status:
        – direct SQL data statement in SQL92 Entry Level has been
          implemented in the 1.9 release
              Including 4 statements: SELECT, DELETE, UPDATE and INSERT
              ALL SQL commands should be issued in UPPERCASE


   • Entry name:
        – when a new entry is created with addentry/addentries, a name
          has to be assigned (filling the “file” column in the AMGA db
          backend)
              in the INSERT implementation, it’s filled automatically with a
               random guid

INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09   61
Enabling Grids for E-sciencE
                                                     Native SQL example
    Query> INSERT INTO `City` VALUES (1,'Kabul','AFG','Kabol',1780000)
    >>    Operation Success
    Query> dir /world/City/
    >> /world/City/80b4fe646ed11dda02100304873049
    >> entry
    Query> SELECT COUNT (*) FROM /world/City
    >> 3429
    Query> SELECT * FROM /world/City WHERE Name LIKE '%Catani%'
    >> 1472
    >> Catania
    >> ITA
    >> Sisilia
    >> 337862
    Query> SELECT /world/City:Name, /world/City:District, /world/Country:Name, /
    world/Country:Region, /world/Country:Continent FROM /world/City, /world/Country
    WHERE /world/City:Name LIKE '%Catani%' AND Code = 'ITA'
    >> Catania
    >> Sisilia
    >> Italy
    >> Southern Europe
    >> Europe


INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09   62
Biomed - MDM
                      Enabling Grids for E-sciencE




•   Medical Data Manager – MDM
    – Store and access medical images and associated metadata on the Grid
    – Built on top of gLite 1.5 data management system
    – Demonstrated at last EGEE conference (October 05, Pisa)

•   Strong security requirements
    – Patient data is sensitive
    – Data must be encrypted
    – Metadata access must be restricted to authorized users

•   AMGA used as metadata server
    – Demonstrates authentication and encrypted access
    – Used as a simplified DB
•   More details at
    – https://uimon.cern.ch/twiki/bin/view/EGEE/DMEncryptedStorage



INFSO-RI-508833                                        ISSGC 09, Sophia-Antipolis, 10-07-09
gMOD: grid Movie On Demand
                  Enabling Grids for E-sciencE




  • gMOD provides a Video-On-Demand service
  • User chooses among a list of video and the chosen
    one is streamed in real time to the video client of the
    user’s workstation
  • For each movie a lot of details (Title, Runtime,
    Country, Release Date, Genre, Director, Case, Plot
    Outline) are stored and users can search a particular
    movie querying on one or more attributes
  • Two kind of users can interact with gMOD:
    TrailersManagers that can administer the db of
    movies (uploading new ones and attaching metadata
    to them); GILDA VO users (guest) can browse, search
    and choose a movie to be streamed.

INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09
gMOD under the hood
                  Enabling Grids for E-sciencE




     • Built on top of gLite services:
     • Storage Elements, sited in different place, physically
       contain the movie files
     • LFC, the File Catalogue, keeps track in which Storage
       Element a particular movie is located
     • AMGA is the repository of the detailed information for
       each movie, and makes possible queries on them
     • The Virtual Organization Membership Service (VOMS) is
       used to assign the right role to the different users
     • The Workload Management System (WMS) is responsible
       to retrieve the chosen movie from the right Storage
       Element and stream it over the network down to the
       user’s desktop or laptop


INFSO-RI-508833                                      ISSGC 09, Sophia-Antipolis, 10-07-09
gMOD interactions
                  Enabling Grids for E-sciencE




      VOMS                                                 Metadata
                                                           Catalogue                 Storage
                                                                                     Elements
                                GENIUS Portal               AMGA
      get Role



                                                              LFC File
                                                             Catalogue


       User                                Workload Management System

                                            W            CE
                                            N W
                                                N
                                                 WN


INFSO-RI-508833                                            ISSGC 09, Sophia-Antipolis, 10-07-09
gMOD screenshot
                      Enabling Grids for E-sciencE


        gMOD is accesible through the Genius Portal (https://glite-demo.ct.infn.it)




INFSO-RI-508833                                         ISSGC 09, Sophia-Antipolis, 10-07-09
gLibrary features
                  Enabling Grids for E-sciencE



   • INFN-developed tool totally gLite based
   • It allows to store, organize, search and retrieve digital
     assets on a Grid environment with an intuitive front-end
   • What we mean by Digital Assets:




INFSO-RI-508833                                     ISSGC’09, Taipei, 21st April 09   68
gLibrary as the iTunes for the Grid
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   69
Browse & Search
                     Enabling Grids for E-sciencE

   • Assets can be browsed selecting a type (or category) and
     selecting one or more filters:
        – attributes of the selected types, chosen from a defined list, used to narrow
          the result set
   • Filter application is cascading and context-sensitive: the
     selection of a filter value dynamically influences subsequent
     filter values (“à la iTunes” browsing)
       – Classical search by description and keywords available too




INFSO-RI-508833                                              ISSGC’09, Taipei, 21st April 09   70
Organize assets
                    Enabling Grids for E-sciencE

 • “Types” and “Categories” definition by repository
   providers/admins:
                                                     • and/or organized
 • Assets are organized by                             by collection:
   type:                                             – Group together related
 – a list of specific attributes to                    assets of different type;
   describe each kind of asset to                    – Useful also to define
   be managed by the system                            subsets of assets
 – hierarchical (a child type shares                   belonging to the same
   and extend parent’s attributes)                     type
 – queried during searches                           – Multiple category
                                                       assignment per asset
                                                       (tagging)


                                                         Collections




INFSO-RI-508833                                     ISSGC’09, Taipei, 21st April 09   71
Store & Retrieve
                  Enabling Grids for E-sciencE


•   Users can upload their local assets on one or more
    (creating replicas) Storage Elements of the Grid
     - Files already on grid SE can be registered in a gLibrary
       repository by the LFC File Catalogue browser

•   Download from SEs to the users’ laptop/desktop:
     - selection of a replica link from a list



•   Transfers are handled from the browser over HTTP/
    HTTPS provided that users have their own X.509 Grid
    Certificate imported

INFSO-RI-508833                                    ISSGC’09, Taipei, 21st April 09   72
gLibrary architecture
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                        ISSGC’09, Taipei, 21st April 09   73
Features
                          Enabling Grids for E-sciencE


     • Implemented as Web 2.0 application
          – AJAX and Javascript are strongly used to offer a desktop like
            user experience
          – Business logic implemented using PHP 5 OOP support




EGEE-II INFSO-RI-031688                Antonio Calanducci   ISGC09 Grid Tutorial, Taipei 18-04-2009   74
Technologies used
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   75
Technologies used
                  Enabling Grids for E-sciencE



  • Web standards:
       – Javascript/AJAX/JSON on the client side
       – PHP5 classes to implement business logic on the server side




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   75
Technologies used
                  Enabling Grids for E-sciencE



  • Web standards:
       – Javascript/AJAX/JSON on the client side
       – PHP5 classes to implement business logic on the server side




INFSO-RI-508833                                   ISSGC 09, Sophia-Antipolis, 10-07-09   75
Technologies used
                   Enabling Grids for E-sciencE



  • Web standards:
       – Javascript/AJAX/JSON on the client side
       – PHP5 classes to implement business logic on the server side


  • Grid technologies:
       – Storage Element SRM interface to get the TURLs (Transfer URLs)
       – Transfers handled with GridFTP (first release) and X.509 cert auth
         HTTPS
       – X.509 based Globus Security Infrastructure with the VOMS
         extensions to handle authentication and authorization (ACL based)
         on Metadata and Storage Elements
       – All grid services implemented with the EGEE gLite middleware
         (DPM Storage Elements, AMGA Metadata Catalogue, LFC File
         Catalogue, VOMS Services)


INFSO-RI-508833                                    ISSGC 09, Sophia-Antipolis, 10-07-09   75
De Roberto cultural heritage
                  Enabling Grids for E-sciencE



  • De Roberto, an Italian writer of the XIX/XX century, born in
    Naples, but spending his life in Catania, has left to the
    humanistic community numerous works
  • Those are made up of valuable and hard-to-manage pieces:
    manuscripts, typescripts, drafts with handwritten
    corrections, magazines, cuts, sketches, photos, etc.




INFSO-RI-508833                                        ISSGC’09, Taipei, 21st April 09   76
Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   77
Digitalize to preserve them
                  Enabling Grids for E-sciencE




 • Some sheets are damaged (mold, crumbed
   pieces) and need physical restoration
 • Digitalization to avoid the loss of this works,
   some of them still unpublished and relevant
   for the humanistic communities




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   78
Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   79
Acquisition stage
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                       ISSGC’09, Taipei, 21st April 09   80
Acquisition stage
                     Enabling Grids for E-sciencE


• Digitalization of manuscripts, typescripts, printed works
   – TIFF Files, one per page, 600 dpi, about 100MB for A3
           High resolution scans for in-depth examination
     – PDF, one per work, 300 dpi, varying file sizes 40-400MB
           Overall examination of works
     – 8000 sheets/scans, 3 Terabyte of disk space
     – Different physical formats, A3/A4/custom size




INFSO-RI-508833                                          ISSGC’09, Taipei, 21st April 09   80
Acquisition stage
                      Enabling Grids for E-sciencE


• Digitalization of manuscripts, typescripts, printed works
   – TIFF Files, one per page, 600 dpi, about 100MB for A3
           High resolution scans for in-depth examination
     – PDF, one per work, 300 dpi, varying file sizes 40-400MB
           Overall examination of works
   – 8000 sheets/scans, 3 Terabyte of disk space
   – Different physical formats, A3/A4/custom size
• Embedded Metadata
   – TIFF with embedded metadata to provide scan physical
     features and information about the content
           ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate
           Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright
            Notice
     – Added with Photoshop after the digitalization phase (Adobe
       XMP format)
INFSO-RI-508833                                                 ISSGC’09, Taipei, 21st April 09   80
Goals and requirements
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                           ISSGC’09, Taipei, 21st April 09   81
Goals and requirements
                  Enabling Grids for E-sciencE


 • Make those works accessible to the humanistic communities
      – Always on-line: 24 x 365
      – Available from everywhere
      – Simple and easy-to-use interface for non-expert people




INFSO-RI-508833                                           ISSGC’09, Taipei, 21st April 09   81
Goals and requirements
                    Enabling Grids for E-sciencE


 • Make those works accessible to the humanistic communities
      – Always on-line: 24 x 365
      – Available from everywhere
      – Simple and easy-to-use interface for non-expert people
 • Quickly find the desired document
      – Document organization according the physical and semantic metadata
          Organization by type/collections
             Dynamic filtering of search result sets according the selection of
             one or more document metadata




INFSO-RI-508833                                             ISSGC’09, Taipei, 21st April 09   81
Goals and requirements
                    Enabling Grids for E-sciencE


 • Make those works accessible to the humanistic communities
      – Always on-line: 24 x 365
      – Available from everywhere
      – Simple and easy-to-use interface for non-expert people
 • Quickly find the desired document
      – Document organization according the physical and semantic metadata
          Organization by type/collections
             Dynamic filtering of search result sets according the selection of
             one or more document metadata
 • Long-term preservation (digital preservation)
      – Multiple copies (replicas) spread in different geographical sites
      – Reliability of storage systems and replica redundancy to achieve secure
        preservation




INFSO-RI-508833                                             ISSGC’09, Taipei, 21st April 09   81
What Data Grids can offer to them
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   82
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto Heritage
     ----> Data Grid Storage Elements




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   82
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto Heritage
     ----> Data Grid Storage Elements
   • enable an ubiquitous and 24/24h access to
     scientists ---> Web Application




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   82
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto Heritage
     ----> Data Grid Storage Elements
   • enable an ubiquitous and 24/24h access to
     scientists ---> Web Application
   • document organization for a quick search
     ---> Metadata Services




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   82
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto Heritage
     ----> Data Grid Storage Elements
   • enable an ubiquitous and 24/24h access to
     scientists ---> Web Application
   • document organization for a quick search
     ---> Metadata Services
   • long-term digital preservation of data
     ---> redundancy through Replicas of files on
     several Storage Elements




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   82
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto Heritage
     ----> Data Grid Storage Elements
   • enable an ubiquitous and 24/24h access to
     scientists ---> Web Application
   • document organization for a quick search
     ---> Metadata Services
   • long-term digital preservation of data
     ---> redundancy through Replicas of files on
     several Storage Elements
   • simple and easy-to-use system for searches,
     organization, upload and download of
     digitalized documents on the Grid

INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   82
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto Heritage
     ----> Data Grid Storage Elements
   • enable an ubiquitous and 24/24h access to
     scientists ---> Web Application
   • document organization for a quick search
     ---> Metadata Services
   • long-term digital preservation of data
     ---> redundancy through Replicas of files on
     several Storage Elements
   • simple and easy-to-use system for searches,
     organization, upload and download of
     digitalized documents on the Grid
       ----->
INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   82
What Data Grids can offer to them
                  Enabling Grids for E-sciencE


   • store the 8000 scans of De Roberto Heritage
     ----> Data Grid Storage Elements
   • enable an ubiquitous and 24/24h access to
     scientists ---> Web Application
   • document organization for a quick search
     ---> Metadata Services
   • long-term digital preservation of data
     ---> redundancy through Replicas of files on
     several Storage Elements
   • simple and easy-to-use system for searches,
     organization, upload and download of
     digitalized documents on the Grid
       ----->
INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   82
Data Grids to preserve Cultural Heritage
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   83
Live Demo
                  Enabling Grids for E-sciencE




INFSO-RI-508833                                  ISSGC’09, Taipei, 21st April 09   84
Data Management Services Summary
                  Enabling Grids for E-sciencE



 • Storage Element – save date and provide a common interface
      – Storage Resource Manager (SRM) dCache, DPM, ...
      – Native Access protocols        rfio, dcap, nfs, …
      – Transfer protocols             gsiftp, https, …


 • Catalogs – keep track where data are stored
      – File Catalog
                                                         LCG File Catalog (LFC)
      – Replica Catalog
      – Metadata Catalog                                 AMGA Metadata Catalogue


 • Data Movement – schedules reliable file transfer
      – File Transfer Service                    gLite FTS (manages physical transfers)



INFSO-RI-508833                                                    ISSGC 09, Sophia-Antipolis, 10-07-09   85
References
                  Enabling Grids for E-sciencE




 •    gLite documentation homepage
      – http://glite.web.cern.ch/glite/documentation/default.asp

 •    DM subsystem documentation
      – http://egee-jra1-dm.web.cern.ch/egee-jra1-dm/doc.htm

 •    LFC and DPM documentation
      – https://uimon.cern.ch/twiki/bin/view/LCG/
        DataManagementDocumentation




INFSO-RI-508833                                  ISSGC 09, Sophia-Antipolis, 10-07-09   86
References
                     Enabling Grids for E-sciencE


   • AMGA Web Site
             http://cern.ch/amga


   • AMGA Manual
            http://amga.web.cern.ch/amga/downloads/amga-manual_1_3_0.pdf


   • AMGA API Javadoc
            http://amga.web.cern.ch/amga/javadoc/index.html


   • AMGA Web Frontend
            http://gilda-forge.ct.infn.it/projects/amgawi/


   • AMGA Basic Tutorial
            https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAHandsOn

   • More information on existing DB access @:
        – http://amga.web.cern.ch/amga/importing.html
        – https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess

INFSO-RI-508833                                         ISSGC 09, Sophia-Antipolis, 10-07-09   87
gLibrary references
                          Enabling Grids for E-sciencE




     •    gLibray project homepage:
          – https://glibrary.ct.infn.it/

     •    gLibrary paper:
           – https://glibrary.ct.infn.it/glibrary/downloads/gLibrary_paper_v2.pdf




EGEE-II INFSO-RI-031688                Antonio Calanducci    ISGC09 Grid Tutorial, Taipei 18-04-2009   88
Session 24 - Distribute Data and Metadata Management with gLite

Contenu connexe

Tendances (11)

Introduction of file based workflows 111004 vfinal
Introduction of file based workflows 111004 vfinalIntroduction of file based workflows 111004 vfinal
Introduction of file based workflows 111004 vfinal
 
Sucet os module_2_notes
Sucet os module_2_notesSucet os module_2_notes
Sucet os module_2_notes
 
Shaman Project Hemmje
Shaman Project  HemmjeShaman Project  Hemmje
Shaman Project Hemmje
 
Smashing The Stack
Smashing The StackSmashing The Stack
Smashing The Stack
 
Ogce Workflow Suite Tg09
Ogce Workflow Suite Tg09Ogce Workflow Suite Tg09
Ogce Workflow Suite Tg09
 
Introduction to Microkernels
Introduction to MicrokernelsIntroduction to Microkernels
Introduction to Microkernels
 
Resume
ResumeResume
Resume
 
IEEExeonmem
IEEExeonmemIEEExeonmem
IEEExeonmem
 
Summer Training In Dotnet
Summer Training In DotnetSummer Training In Dotnet
Summer Training In Dotnet
 
Phasor data concentrator or i pdc
Phasor data concentrator or i pdcPhasor data concentrator or i pdc
Phasor data concentrator or i pdc
 
Gt3112931298
Gt3112931298Gt3112931298
Gt3112931298
 

Similaire à Session 24 - Distribute Data and Metadata Management with gLite

Distributed file system
Distributed file systemDistributed file system
Distributed file systemAnamika Singh
 
IRJET- Distributed Decentralized Data Storage using IPFS
IRJET- Distributed Decentralized Data Storage using IPFSIRJET- Distributed Decentralized Data Storage using IPFS
IRJET- Distributed Decentralized Data Storage using IPFSIRJET Journal
 
Next-generation sequencing: Data mangement
Next-generation sequencing: Data mangementNext-generation sequencing: Data mangement
Next-generation sequencing: Data mangementGuy Coates
 
Multicore series-1-0223
Multicore series-1-0223Multicore series-1-0223
Multicore series-1-0223Aysha Khan
 
Survey of distributed storage system
Survey of distributed storage systemSurvey of distributed storage system
Survey of distributed storage systemZhichao Liang
 
Research computing at ILRI
Research computing at ILRIResearch computing at ILRI
Research computing at ILRIILRI
 
Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Jian Qin
 
Architecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceArchitecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceSpeck&Tech
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes Minio
 
DFS PPT.pptx
DFS PPT.pptxDFS PPT.pptx
DFS PPT.pptxVMahesh5
 
[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage ReductionPerforce
 
Damon2011 preview
Damon2011 previewDamon2011 preview
Damon2011 previewsundarnu
 
Exploiting Multi Core Architectures for Process Speed Up
Exploiting Multi Core Architectures for Process Speed UpExploiting Multi Core Architectures for Process Speed Up
Exploiting Multi Core Architectures for Process Speed UpIJERD Editor
 
Peer-to-Peer Management of Large-Scale Memory Sources (midterm)
Peer-to-Peer Management of Large-Scale Memory Sources (midterm)Peer-to-Peer Management of Large-Scale Memory Sources (midterm)
Peer-to-Peer Management of Large-Scale Memory Sources (midterm)odcsss
 
Securing Multi-copy Dynamic Data in Cloud using AES
Securing Multi-copy Dynamic Data in Cloud using AESSecuring Multi-copy Dynamic Data in Cloud using AES
Securing Multi-copy Dynamic Data in Cloud using AESIRJET Journal
 
Storage for next-generation sequencing
Storage for next-generation sequencingStorage for next-generation sequencing
Storage for next-generation sequencingGuy Coates
 

Similaire à Session 24 - Distribute Data and Metadata Management with gLite (20)

Distributed file system
Distributed file systemDistributed file system
Distributed file system
 
IRJET- Distributed Decentralized Data Storage using IPFS
IRJET- Distributed Decentralized Data Storage using IPFSIRJET- Distributed Decentralized Data Storage using IPFS
IRJET- Distributed Decentralized Data Storage using IPFS
 
Next-generation sequencing: Data mangement
Next-generation sequencing: Data mangementNext-generation sequencing: Data mangement
Next-generation sequencing: Data mangement
 
Multicore series-1-0223
Multicore series-1-0223Multicore series-1-0223
Multicore series-1-0223
 
Survey of distributed storage system
Survey of distributed storage systemSurvey of distributed storage system
Survey of distributed storage system
 
Research computing at ILRI
Research computing at ILRIResearch computing at ILRI
Research computing at ILRI
 
Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08
 
Architecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceArchitecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for science
 
11. dfs
11. dfs11. dfs
11. dfs
 
Chapter-5-DFS.ppt
Chapter-5-DFS.pptChapter-5-DFS.ppt
Chapter-5-DFS.ppt
 
12. dfs
12. dfs12. dfs
12. dfs
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes
 
DFS PPT.pptx
DFS PPT.pptxDFS PPT.pptx
DFS PPT.pptx
 
[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction
 
Damon2011 preview
Damon2011 previewDamon2011 preview
Damon2011 preview
 
Exploiting Multi Core Architectures for Process Speed Up
Exploiting Multi Core Architectures for Process Speed UpExploiting Multi Core Architectures for Process Speed Up
Exploiting Multi Core Architectures for Process Speed Up
 
Peer-to-Peer Management of Large-Scale Memory Sources (midterm)
Peer-to-Peer Management of Large-Scale Memory Sources (midterm)Peer-to-Peer Management of Large-Scale Memory Sources (midterm)
Peer-to-Peer Management of Large-Scale Memory Sources (midterm)
 
Securing Multi-copy Dynamic Data in Cloud using AES
Securing Multi-copy Dynamic Data in Cloud using AESSecuring Multi-copy Dynamic Data in Cloud using AES
Securing Multi-copy Dynamic Data in Cloud using AES
 
Storage for next-generation sequencing
Storage for next-generation sequencingStorage for next-generation sequencing
Storage for next-generation sequencing
 
Unit 5
Unit 5Unit 5
Unit 5
 

Plus de ISSGC Summer School

Session 58 - Cloud computing, virtualisation and the future
Session 58 - Cloud computing, virtualisation and the future Session 58 - Cloud computing, virtualisation and the future
Session 58 - Cloud computing, virtualisation and the future ISSGC Summer School
 
Session 58 :: Cloud computing, virtualisation and the future Speaker: Ake Edlund
Session 58 :: Cloud computing, virtualisation and the future Speaker: Ake EdlundSession 58 :: Cloud computing, virtualisation and the future Speaker: Ake Edlund
Session 58 :: Cloud computing, virtualisation and the future Speaker: Ake EdlundISSGC Summer School
 
Session 50 - High Performance Computing Ecosystem in Europe
Session 50 - High Performance Computing Ecosystem in EuropeSession 50 - High Performance Computing Ecosystem in Europe
Session 50 - High Performance Computing Ecosystem in EuropeISSGC Summer School
 
Session 49 Practical Semantic Sticky Note
Session 49 Practical Semantic Sticky NoteSession 49 Practical Semantic Sticky Note
Session 49 Practical Semantic Sticky NoteISSGC Summer School
 
Session 48 - Principles of Semantic metadata management
Session 48 - Principles of Semantic metadata management Session 48 - Principles of Semantic metadata management
Session 48 - Principles of Semantic metadata management ISSGC Summer School
 
Session 49 - Semantic metadata management practical
Session 49 - Semantic metadata management practical Session 49 - Semantic metadata management practical
Session 49 - Semantic metadata management practical ISSGC Summer School
 
Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution ISSGC Summer School
 
Session 37 - Intro to Workflows, API's and semantics
Session 37 - Intro to Workflows, API's and semantics Session 37 - Intro to Workflows, API's and semantics
Session 37 - Intro to Workflows, API's and semantics ISSGC Summer School
 
Session 43 :: Accessing data using a common interface: OGSA-DAI as an example
Session 43 :: Accessing data using a common interface: OGSA-DAI as an exampleSession 43 :: Accessing data using a common interface: OGSA-DAI as an example
Session 43 :: Accessing data using a common interface: OGSA-DAI as an exampleISSGC Summer School
 
Session 40 : SAGA Overview and Introduction
Session 40 : SAGA Overview and Introduction Session 40 : SAGA Overview and Introduction
Session 40 : SAGA Overview and Introduction ISSGC Summer School
 
General Introduction to technologies that will be seen in the school
General Introduction to technologies that will be seen in the school General Introduction to technologies that will be seen in the school
General Introduction to technologies that will be seen in the school ISSGC Summer School
 

Plus de ISSGC Summer School (20)

Session 58 - Cloud computing, virtualisation and the future
Session 58 - Cloud computing, virtualisation and the future Session 58 - Cloud computing, virtualisation and the future
Session 58 - Cloud computing, virtualisation and the future
 
Session 58 :: Cloud computing, virtualisation and the future Speaker: Ake Edlund
Session 58 :: Cloud computing, virtualisation and the future Speaker: Ake EdlundSession 58 :: Cloud computing, virtualisation and the future Speaker: Ake Edlund
Session 58 :: Cloud computing, virtualisation and the future Speaker: Ake Edlund
 
Session 50 - High Performance Computing Ecosystem in Europe
Session 50 - High Performance Computing Ecosystem in EuropeSession 50 - High Performance Computing Ecosystem in Europe
Session 50 - High Performance Computing Ecosystem in Europe
 
Integrating Practical2009
Integrating Practical2009Integrating Practical2009
Integrating Practical2009
 
Session 49 Practical Semantic Sticky Note
Session 49 Practical Semantic Sticky NoteSession 49 Practical Semantic Sticky Note
Session 49 Practical Semantic Sticky Note
 
Departure
DepartureDeparture
Departure
 
Session 48 - Principles of Semantic metadata management
Session 48 - Principles of Semantic metadata management Session 48 - Principles of Semantic metadata management
Session 48 - Principles of Semantic metadata management
 
Session 49 - Semantic metadata management practical
Session 49 - Semantic metadata management practical Session 49 - Semantic metadata management practical
Session 49 - Semantic metadata management practical
 
Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution
 
Session 42 - GridSAM
Session 42 - GridSAMSession 42 - GridSAM
Session 42 - GridSAM
 
Session 37 - Intro to Workflows, API's and semantics
Session 37 - Intro to Workflows, API's and semantics Session 37 - Intro to Workflows, API's and semantics
Session 37 - Intro to Workflows, API's and semantics
 
Session 43 :: Accessing data using a common interface: OGSA-DAI as an example
Session 43 :: Accessing data using a common interface: OGSA-DAI as an exampleSession 43 :: Accessing data using a common interface: OGSA-DAI as an example
Session 43 :: Accessing data using a common interface: OGSA-DAI as an example
 
Session 40 : SAGA Overview and Introduction
Session 40 : SAGA Overview and Introduction Session 40 : SAGA Overview and Introduction
Session 40 : SAGA Overview and Introduction
 
Session 36 - Engage Results
Session 36 - Engage ResultsSession 36 - Engage Results
Session 36 - Engage Results
 
Session 23 - Intro to EGEE-III
Session 23 - Intro to EGEE-IIISession 23 - Intro to EGEE-III
Session 23 - Intro to EGEE-III
 
Session 33 - Production Grids
Session 33 - Production GridsSession 33 - Production Grids
Session 33 - Production Grids
 
Social Program
Social ProgramSocial Program
Social Program
 
Session29 Arc
Session29 ArcSession29 Arc
Session29 Arc
 
Session 23 - gLite Overview
Session 23 - gLite OverviewSession 23 - gLite Overview
Session 23 - gLite Overview
 
General Introduction to technologies that will be seen in the school
General Introduction to technologies that will be seen in the school General Introduction to technologies that will be seen in the school
General Introduction to technologies that will be seen in the school
 

Dernier

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 

Dernier (20)

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Session 24 - Distribute Data and Metadata Management with gLite

  • 1. Enabling Grids for E-sciencE Distribute Data and Metadata Management with gLite Antonio Calanducci antonio.calanducci@ct.infn.it National Institute of Nuclear Physics INFN Catania EGEE NA3 Training & Induction ISSGC 2009 Sophia-Antipolis, Polytech’Nice-Sofia , 10/07/2009 www.eu-egee.org INFSO-RI-031688
  • 2. Outline Enabling Grids for E-sciencE • Grid Data Management Challenge • Storage Elements and SRM • File Catalogs and Data Management tools • Metadata Service with use cases INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 2
  • 3. The Grid DM Challenge Enabling Grids for E-sciencE • Heterogeneity – Need common interface – Data are stored on different to storage resources storage systems using different  Storage Resource access technologies Manager (SRM) • Distribution – Data are stored in different – Need to keep track where locations – in most cases there data is stored is no shared file system or  File and Replica Catalogs common namespace – Need scheduled, reliable – Data need to be moved between file transfer different locations  File transfer service • Data about data – Data stored can be huge: find – Need a way to describe files according to their contents files’ content and query their descriptions  Metadata service INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 3
  • 4. SRM in an example Enabling Grids for E-sciencE She is running a job which needs: Data for physics event reconstruction Simulated Data Some data analysis files They are at CERN in dCache She will write files remotely too They are at Fermilab In a disk array They are at Nikhef in a classic SE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 4
  • 5. SRM in an example Enabling Grids for E-sciencE dCache Own system, own protocols and parameters You as a user need gLite DPM to know all Independent system from dCache or Castor the systems!!! Castor No connection with dCache or DPM INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 5
  • 6. SRM in an example Enabling Grids for E-sciencE dCache Own system, own protocols and parameters I talk to them on your behalf You as a I will user needspace even allocate gLite DPM for your know all to files SRM Independent system from And I will use transfer dCache or Castor protocols the to send your files theresystems!!! Castor No connection with dCache or DPM INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 5
  • 7. Storage Resource Managers Enabling Grids for E-sciencE • Data are stored on disk pool servers or Mass Storage Systems • Storage resource management needs to take into account – Transparent access to files (migration to/from disk pool) – File pinning – Space reservation – File status notification – Life time management • The SRM (Storage Resource Manager) takes care of all these details – The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 6
  • 8. Data management with gLite Enabling Grids for E-sciencE • Assumptions: – Users and programs produce and require data – the lowest granularity of the data is on the file level (we deal with files rather than data objects or tables)  Data = files • Files: – Mostly, write once, read many – Located in Storage Elements (SEs) – Several replicas of one file in different sites – Accessible by Grid users and applications from “anywhere” – Locatable by the WMS (data requirements in JDL) • Also… – WMS can send (small amounts of) data to/from jobs: Input and Output Sandbox – Files may be copied from/to local filesystems (WNs, UIs) to the Grid (SEs) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 7
  • 9. gLite Grid Storage Requirements Enabling Grids for E-sciencE • The Storage Element is the service which allow a user or an application to store data for future retrieval • Manage local storage (disks) and interface to Mass Storage Systems(tapes) like – HPSS, CASTOR, DiskeXtender (UNITREE), … • Be able to manage different storage systems uniformly and transparently for the user (providing an SRM interface) • Support basic file transfer protocols – GridFTP mandatory – Others if available (https, ftp, etc) • Support a native I/O (remote file) access protocol – POSIX (like) I/O client library for direct access of data (GFAL) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 8
  • 10. gLite Storage Element Enabling Grids for E-sciencE (http/https) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 9
  • 11. gLite SE types history Enabling Grids for E-sciencE • gLite data access protocols: – File Transfer: GSIFTP (GridFTP)/HTTP(S) – File I/O (Remote File access):  gsidcap  rfio (insecure RFIO)  gsirfio (secured RFIO) • Classic SE: – GridFTP server – Insecure RFIO daemon (rfiod) – only LAN limited file access – Single disk or disk array – No quota management – Does not support the SRM interface INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 10
  • 12. gLite SE types history(II) Enabling Grids for E-sciencE • Mass Storage Systems (Castor) – Files migrated between front-end disk and back-end tape storage hierarchies – GridFTP server – Insecure RFIO (Castor) – Provide a SRM interface with all the benefits • Disk pool managers (dCache and gLite DPM) – manage distributed storage servers in a centralized way – Physical disks or arrays are combined into a common (virtual) file system – Disks can be dynamically added to the pool – GridFTP server – Secure remote access protocols (gsidcap for dCache, gsirfio for DPM) – SRM interface INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 11
  • 13. DPM Overview Enabling Grids for E-sciencE • The Disk Pool Manager (DPM) is a lightweight solution for disk storage management, which offers the SRM interfaces (2.2 released in DPM version 1.6.3). • Each DPM–type Storage Element (SE), is composed by an head node and one (in the same machine) or more disk servers • The DPM handles the storage on Disk Servers. – It handles pools: a pool is a group of file systems, located on one or more disk servers. The DPM Disk Servers can have multiple filesystems in the pool. • Supported protocols: – File Transfers: GridFTP, HTTP/HTTPs – File I/O: GSIRFIO EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 12 3
  • 14. DPM Architecture Enabling Grids for E-sciencE /dpm /domain /home CLI, C API, /vo (uid, gid1, …) SRM-enabled client, DPM etc. da head node file ta tra • DPM Name Server ns fe Namespace r Authorization Physical files location • Disk Servers Physical files • Direct data transfer from/to DPM disk server (no bottleneck) disk servers EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 13 5
  • 15. DPM architecture (details) Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  • 16. DPM architecture (details) Enabling Grids for E-sciencE CLI, C API, SRM-enabled client, etc. EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  • 17. DPM architecture (details) Enabling Grids for E-sciencE CLI, C API, SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  • 18. DPM architecture (details) Enabling Grids for E-sciencE CLI, C API, SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  • 19. DPM architecture (details) Enabling Grids for E-sciencE DPM … disk servers CLI, C API, Secure RFIO Secure RFIO GridFTP GridFTP SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  • 20. DPM architecture (details) Enabling Grids for E-sciencE data transfer DPM … disk servers CLI, C API, Secure RFIO Secure RFIO GridFTP GridFTP SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  • 21. DPM architecture (details) Enabling Grids for E-sciencE data transfer DPM … disk servers CLI, C API, Secure RFIO Secure RFIO GridFTP GridFTP SRM-enabled client, etc. SRMv1 SRMv2 SRMv2.2 DPM /dpm head node /domain /home DPM DPNS /vo file daemon daemon DPNS database DPM database EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 14 6
  • 22. DPM strengths Enabling Grids for E-sciencE • Easy to install/configure - Few configuration files • Manageable storage - Logical Namespace - Easy to add/remove file systems • Low maintenance effort • Supports as many disk servers as needed • Low memory footprint • Low CPU utilization EGEE-II INFSO-RI-031688 EGEE Tutorial - Barcelona 14th - 18th April 2008, 15 7
  • 23. Files Naming conventions Enabling Grids for E-sciencE • Logical File Name (LFN) – An alias created by a user to refer to some item of data, e.g. “lfn:/grid/ gilda/20030203/run2/track1” • Globally Unique Identifier (GUID) – A non-easy-to-remember unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6” • Site URL (SURL) (or Physical File Name (PFN) or Site FN) – The location of an actual piece of data on a storage system e.g. “srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1” (SRM) • Transport URL (TURL) – Temporary locator of a replica + access protocol: understood by a SE – e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat” INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 16
  • 24. What is a file catalog File Catalog Enabling Grids for E-sciencE SE SE gLite SE UI INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 17
  • 25. Upload and file convention example Enabling Grids for E-sciencE [issgc59@issgc-ui ~]$ lcg-cr -v -d aliserv6.ct.infn.it -l lfn:/grid/gilda/tutorials/ issgc09/message0.sh file:/home/issgc59/message0.sh Using grid catalog type: lfc Using grid catalog : lfc-gilda.ct.infn.it SE type: SRMv1 Destination SURL : srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/ 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88 Source SRM Request Token: 401296 Source URL: file:/home/issgc59/message0.sh File size: 20 VO name: gilda Destination specified: aliserv6.ct.infn.it Destination URL for copy: gsiftp://aliserv6.ct.infn.it/aliserv6.ct.infn.it:/data01/gilda/ 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88.401296.0 # streams: 1 20 bytes 0.01 KB/sec avg 0.01 KB/sec inst Transfer took 3070 ms Using LFN: lfn:/grid/gilda/tutorials/issgc09/message0.sh Using GUID: guid:2eed7180-2505-46fb-bef2-19ef77f1b2fd Registering LFN: /grid/gilda/tutorials/issgc09/message0.sh (2eed7180-2505-46fb- bef2-19ef77f1b2fd) Registering SURL: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/ 2009-07-10/file273edfaa-e0ed-448c-b713-7996c2d09f88 (2eed7180-2505-46fb-bef2-19ef77f1b2fd) guid:2eed7180-2505-46fb-bef2-19ef77f1b2fd INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 26. The LFC (LCG File Catalog) Enabling Grids for E-sciencE • It keeps track of the location of copies (replicas) of Grid files • LFN acts as main key in the database. It has: – Symbolic links to it (additional LFNs) – Unique Identifier (GUID) – System metadata – Information on replicas – One field of user metadata INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 19
  • 27. LFC Features Enabling Grids for E-sciencE – Cursors for large queries – Timeouts and retries from the client – User exposed transactional API (+ auto rollback on failure) – Hierarchical namespace and namespace operations (for LFNs) – Integrated GSI Authentication + Authorization – Access Control Lists (Unix Permissions and POSIX ACLs) – Checksums – Integration with VOMS (VirtualID and VirtualGID) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 20
  • 28. lfc-ls Enabling Grids for E-sciencE Listing the entries of a LFC directory lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path… where path specifies the LFN pathname (mandatory) – Remember that LFC has a directory tree structure – /grid/<VO_name>/<you create it> LFC Namespace Defined by the user – All members of a VO have read-write permissions under their directory – You can set LFC_HOME to use relative paths > lfc-ls /grid/gilda/tony -l : long listing > export LFC_HOME=/grid/gilda -R : list the contents of directories > lfc-ls -l tony recursively: Don’t use it! > lfc-ls -l -R /grid INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 21
  • 29. lfc-mkdir Enabling Grids for E-sciencE Creating directories in the LFC lfc-mkdir [-m mode] [-p] path... • Where path specifies the LFC pathname • Remember that while registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalog beforehand. • Examples: > lfc-mkdir /grid/gilda/tony/demo You can just check the directory with: > lfc-ls -l /grid/gilda/tony drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 22
  • 30. lfc-ln Enabling Grids for E-sciencE Creating a symbolic link lfc-ln -s file linkname lfc-ln -s directory linkname Create a link to the specified file or directory with linkname – Examples: > lfc-ln -s /grid/gilda/tony/demo/test /grid/gilda/tony/aLink Symbolic Original File link Let’s check the link using lfc-ls with long listing (-l): > lfc-ls -l lrwxrwxrwx 1 19122 1077 0 Jun 14 11:58 aLink ->/grid/gilda/tony/demo/test drwxr-xrwx 1 19122 1077 0 Jun 14 11:39 demo INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 23
  • 31. LFC commands Enabling Grids for E-sciencE Summary of the LFC Catalog commands lfc-chmod Change access mode of the LFC file/directory lfc-chown Change owner and group of the LFC file-directory lfc-delcomment Delete the comment associated with the file/directory lfc-getacl Get file/directory access control lists lfc-ln Make a symbolic link to a file/directory lfc-ls List file/directory entries in a directory lfc-mkdir Create a directory lfc-rename Rename a file/directory lfc-rm Remove a file/directory lfc-setacl Set file/directory access control lists lfc-setcomment Add/replace a comment INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 24
  • 32. LFC C API Enabling Grids for E-sciencE Low level methods (many POSIX-like): lfc_setacl lfc_access lfc_deleteclass lfc_listreplica lfc_setatime lfc_aborttrans lfc_delreplica lfc_lstat lfc_setcomment lfc_addreplica lfc_endtrans lfc_mkdir lfc_seterrbuf lfc_apiinit lfc_enterclass lfc_modifyclass lfc_setfsize lfc_chclass lfc_errmsg lfc_opendir lfc_starttrans lfc_chdir lfc_getacl lfc_queryclass lfc_stat lfc_chmod lfc_getcomment lfc_readdir lfc_symlink lfc_chown lfc_getcwd lfc_readlink lfc_umask lfc_closedir lfc_getpath lfc_rename lfc_undelete lfc_creat lfc_lchown lfc_rewind lfc_unlink lfc_delcomment lfc_listclass lfc_rmdir lfc_utime lfc_delete lfc_listlinks lfc_selectsrvr send2lfc INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 25
  • 33. GFAL: Grid File Access Library Enabling Grids for E-sciencE • Interactions with SE require some components: - File catalog services to locate replicas - SRM - File access mechanism to access files from the SE on the WN • GFAL does all this tasks for you: - Hides all these operations - Presents a POSIX interface for the I/O operations - Single shared library in threaded and unthreaded versions (libgfal.so, libgfal_pthr.so) - Single header file (gfal_api.h) - User can create all commands needed for storage management - It offers as well an interface to SRM • Supported protocols: - file (local or nfs-like access) - dcap, gsidcap and kdcap (dCache access) - rfio (castor access) and gsirfio (DPM) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 26
  • 34. GFAL: File I/O API (I) Enabling Grids for E-sciencE int gfal_access (const char *path, int amode); int gfal_chmod (const char *path, mode_t mode); int gfal_close (int fd); int gfal_creat (const char *filename, mode_t mode); off_t gfal_lseek (int fd, off_t offset, int whence); int gfal_open (const char * filename, int flags, mode_t mode); ssize_t gfal_read (int fd, void *buf, size_t size); int gfal_rename (const char *old_name, const char *new_name); ssize_t gfal_setfilchg (int, const void *, size_t); int gfal_stat (const char *filename, struct stat *statbuf); int gfal_unlink (const char *filename); ssize_t gfal_write (int fd, const void *buf, size_t size); INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 27
  • 35. GFAL: FC API (II) Enabling Grids for E-sciencE int gfal_closedir (DIR *dirp); int gfal_mkdir (const char *dirname, mode_t mode); DIR *gfal_opendir (const char *dirname); struct dirent *gfal_readdir (DIR *dirp); int gfal_rmdir (const char *dirname); INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 28
  • 36. GFAL: Catalog API Enabling Grids for E-sciencE int create_alias (const char *guid, const char *lfn, long long size) int guid_exists (const char *guid) char *guidforpfn (const char *surl) char *guidfromlfn (const char *lfn) char **lfnsforguid (const char *guid) int register_alias (const char *guid, const char *lfn) int register_pfn (const char *guid, const char *surl) int setfilesize (const char *surl, long long size) char *surlfromguid (const char *guid) char **surlsfromguid (const char *guid) int unregister_alias (const char *guid, const char *lfn) int unregister_pfn (const char *guid, const char *surl) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 29
  • 37. GFAL: SRM API Enabling Grids for E-sciencE int deletesurl (const char *surl) int getfilemd (const char *surl, struct stat64 *statbuf) int set_xfer_done (const char *surl, int reqid, int fileid, char *token, int oflag) int set_xfer_running (const char *surl, int reqid, int fileid, char *token) char *turlfromsurl (const char *surl, char **protocols, int oflag, int *reqid, int *fileid, char **token) int srm_get (int nbfiles, char **surls, int nbprotocols, char **protocols, int *reqid, char **token, struct srm_filestatus **filestatuses) int srm_getstatus (int nbfiles, char **surls, int reqid, char *token, struct srm_filestatus **filestatuses) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 30
  • 38. GFAL Java API Enabling Grids for E-sciencE • GFAL API are available for C/C++ programmers • We wrote a wrapper around the C APIs using Java Native Interface and a the Java APIs on top of it • More information can be found here: https://grid.ct.infn.it/twiki/bin/view/GILDA/APIGFAL INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 31
  • 39. lcg_utils DM tools Enabling Grids for E-sciencE • High level interface (CL tools and APIs) to – Upload/download files to/from the Grid (UI,CE and WN <---> SEs) – Replicate data between SEs and locate the best replica available – Interact with the file catalog • Definition: A file is considered to be a Grid File if it is both physically present in a SE and registered in the File Catalog • lcg-utils ensure the consistency between files in the Storage Elements and entries in the File Catalog INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 32
  • 40. lcg-utils commands Enabling Grids for E-sciencE Replica Management lcg-cp Copies a grid file to a local destination lcg-cr Copies a file to a SE and registers the file in the catalog lcg-del Delete one file lcg-rep Replication between SEs and registration of the replica lcg-gt Gets the TURL for a given SURL and transfer protocol lcg-sd Sets file status to “Done” for a given SURL in a SRM request File Catalog Interaction lcg-aa Add an alias in LFC for a given GUID lcg-ra Remove an alias in LFC for a given GUID lcg-rf Registers in LFC a file placed in a SE lcg-uf Unregisters in LFC a file placed in a SE lcg-la Lists the alias for a given SURL, GUID or LFN lcg-lg Get the GUID for a given LFN or SURL lcg-lr Lists the replicas for a given GUID, SURL or LFN INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 33
  • 41. Enabling Grids for E-sciencE LFC interfaces SE SE SE LCG UTIL GFAL Python LFC CLIENT LFC C API SERVER WMS DLI CLI lfc-ls, lfc-mkdir, lfc-setacl, … INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 34
  • 42. Data movement introduction Enabling Grids for E-sciencE • Grids are naturally distributed systems • The means that data also needs to be distributed – First generation data distribution mainly concentrated on copy protocols in a grid environment:  gridftp  http + mod_gridsite • But copies controlled by clients have problems… INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 43. Direct Client Controlled Enabling Grids for E-sciencE Data Movement Client Control Channels Source Storage Element Data Flow Channel Destination Storage Element • Although transport protocol may be robust, state is held inside client – inconvenient and fragile. • Client only knows about local state, no sense of global knowledge about data transfers between storage elements. – Storage elements overwhelmed with replication requests – Multiple replications of the same data can happen simultaneously – Site has little control over balance of network resources - DoS INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 44. Transfer Service Enabling Grids for E-sciencE • Clear need for a service for data transfer •Submit new request •Monitor progress – Client connects to service Client •Cancel request to submit request – Service maintains state about transfer SOAP via https – Client can periodically reconnect to check status or cancel request Transfer – Service can have Service Control knowledge of global state, not just a single request  Load balancing  Scheduling Source Storage Destination Element Storage Data Element Flow INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 45. gLite FTS: Channels Enabling Grids for E-sciencE • FTS Service has a concept of channels • A channel is a unidirectional connection between two sites • Transfer requests between these two sites are assigned to that channel • Channels usually correspond to a • Channels control certain dedicated network pipe (e.g., OPN) transfer properties: associated with production transfer concurrency, • But channels can also take gridftp streams. wildcards: – * to MY_SITE : All incoming • Channels can be – MY SITE to * : All outgoing controlled independently: started, – * to * : Catch all stopped, drained. INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 46. Why Grid needs Metadata? Enabling Grids for E-sciencE • Grids allow to save millions of files spread over several storage sites. • Users and applications need an efficient mechanism – to describe files – to locate files based on their contents • This is achieved by – associating descriptive attributes to files  Metadata is data about data – answering user queries against the associated information INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 47. Basic Metadata Concept Enabling Grids for E-sciencE • Entries – Representation of real world entities which we are attaching metadata to for describing them • Attribute – key/value pair – Type – The type (int, float, string,…) – Name/Key – The name of the attribute – Value - Value of an entry's attribute • Schema – A set of attributes • Collection – A set of entries associated with a schema • Metadata - List of attributes (including their values) associated with entries INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 40
  • 48. Example: Movie Trailers Enabling Grids for E-sciencE • Movie trailers files (entries) saved on Grid Storage Elements and registered into File Catalogue • We want to add metadata to describe movie content. • A possible schema: – Title -- varchar – Runtime -- int – Cast -- varchar – LFN -- varchar • A metadata catalogue will be the repository of the movies’ metadata and will allow to find movies satisfying users’ queries INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 49. Trailer example Enabling Grids for E-sciencE Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  • 50. Trailer example Enabling Grids for E-sciencE Attribute Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  • 51. Trailer example Enabling Grids for E-sciencE Schema Attribute Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  • 52. Trailer example Enabling Grids for E-sciencE Schema Attribute Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi Entries INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  • 53. Trailer example Enabling Grids for E-sciencE Schema Attribute Entry names Title Ru Cast LFN 8c3315c1-811f-4823-a778-60a203439689 My Best nti Julia 80 lfn:/grid/gilda/movies/ Friend’s Roberts mybfwed.avi wedding me Kirsten 51a18b7a-fd21-4b2c-aa74-4c53ee64846a Spider-man 2 120 lfn:/grid/gilda/movies/ Dunst spiderman2.avi 401e6df4-c1be-4822-958c-ce3eb5c54fcb The God Father 113 Al pacino lfn:/grid/gilda/movies/ godfather.avi Collection /trailers Entries INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 42
  • 54. Metadata service on the Grid Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  • 55. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  • 56. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  • 57. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs • monitoring of running applications: – ex: ongoing results from running jobs can be published on the metadata server INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  • 58. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs • monitoring of running applications: – ex: ongoing results from running jobs can be published on the metadata server • Inputset for a storm of parametric jobs INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  • 59. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs • monitoring of running applications: – ex: ongoing results from running jobs can be published on the metadata server • Inputset for a storm of parametric jobs • information exchanging among grid peers – ex: producers/consumers job collections: master jobs produce data to be analyzed; slave jobs query the metadata server to retrieve input to “consume” INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  • 60. Metadata service on the Grid Enabling Grids for E-sciencE • Information about files -- but not only! • metadata can describe any grid entity/object – ex: JobIDs - add logging information to your jobs • monitoring of running applications: – ex: ongoing results from running jobs can be published on the metadata server • Inputset for a storm of parametric jobs • information exchanging among grid peers – ex: producers/consumers job collections: master jobs produce data to be analyzed; slave jobs query the metadata server to retrieve input to “consume” • Simplified DB access on the grid – Grid applications that needs structured data can model their data schemas as metadata INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 43
  • 61. Monitoring of running application Enabling Grids for E-sciencE SE showing results as long as they are produced W N CE WN Metadata Catalogue /results collection WN Workload Manager Scientist/Developer Customer/ submitting jobs Scientist INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 44
  • 62. Inputset for parametric jobs Enabling Grids for E-sciencE • /grid/my_simulation/input ---------------------------------------------------------------------------------------------------- |entry     |x1        |x2        |y1        |y2        |step      |isTaken   |found     |output    | |--------------------------------------------------------------------------------------------------| |1         |9453.1    |9453.32   |-439.93   |-439.91   |0.0006    |JobID1234 |No pillars|          | |2         |9342.13   |3435      |3423      |2343.2    |0.003     |No        |          |          | |3 |34254.3 |342342 |432.43 |132 |0.002 |No | | | | ...... and so on | ---------------------------------------------------------------------------------------------------- • This collection lists all the parameter set to be run on the Grid • On the WN, one of the inputset is selected and “isTaken” is set = JOB_ID of the job that has fetched it • Results is also written in the “found” column to monitor the simulation • so users can check the simulation from a UI, querying the metadata server, or from a WebPage (using APIs for ex) • StdOutput can be copied also into the “output” text column INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 45
  • 63. A possible parameter-get.sh script Enabling Grids for E-sciencE #!/bin/bash # Find the first set of parameters that has not been taken by noone ID=`mdcli find /grid/my_simulation/input 'isTaken="No"' | head -1` # Exit if all the parameters set has been already analyzed if [ "$ID" = "" ]; then exit 1; fi # set isTaken as its JOB_ID so that no one else will analyze the same set of parameter mdcli setattr /grid/my_simulation/input/$ID isTaken `echo $GLITE_WMS_JOBID` # retrieve the set of the parameter to be scanned X1=`mdcli getattr /grid/my_simulation/input/$ID x1 | tail -1` Y1=`mdcli getattr /grid/my_simulation/input/$ID y1 | tail -1` X2=`mdcli getattr /grid/my_simulation/input/$ID x2 | tail -1` Y2=`mdcli getattr /grid/my_simulation/input/$ID y2 | tail -1` STEP=`mdcli getattr /grid/my_simulation/input/$ID step | tail -1` # Run the scan with the proper parameter and save the output to output.txt java -cp issgc_sfk_nesc.jar:sfkscanner.jar  uk.ac.nesc.toe.sfk.radar.Scanner $X1 $Y1 $X2 $Y2 $STEP > output.txt # the Scanner class returns the writing "No pillars found in this area" or "Found area:" - so this will give useful info for monitoring during the run mdcli setattr /grid/my_simulation/input/$ID found `cat output.txt | grep -i found` # save the output (and the pillar text if found) on the metadata server mdcli setattr /grid/my_simulation/input/$ID output `cat output.txt` INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 46
  • 64. Use a Metadata services to exchange data Enabling Grids for E-sciencE among running jobs INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 65. Use a Metadata services to exchange data Enabling Grids for E-sciencE among running jobs • Suppose we have two sets of jobs: – Producers: they generate a file, store on a SE, register it onto the LFC File Catalogue assigning a LFN – Consumers: they will take a LFN, download the file and elaborate it INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 66. Use a Metadata services to exchange data Enabling Grids for E-sciencE among running jobs • Suppose we have two sets of jobs: – Producers: they generate a file, store on a SE, register it onto the LFC File Catalogue assigning a LFN – Consumers: they will take a LFN, download the file and elaborate it • A Metadata collection can be used to share the information generated by the Producers; it could act as a “bag-of-LFNs” (bag-of-task model) from which Consumers can fetch file for further elaboration INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 67. Information exchanging among grid peers Enabling Grids for E-sciencE SE Producers jobs Consumers jobs put LFN W fetch LFN N W N CE WN Metadata WN Catalogue /bag-of-LFNs WN collection WN CE Workload Manager Scientist/Developer submitting jobs INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 48
  • 68. The AMGA Metadata Catalogue Enabling Grids for E-sciencE • Official metadata service for the gLite middleware – but no dependencies from gLite software – it can be used with other grid technologies/other environments • AMGA: Arda Metadata Grid Application • Provide a complete but simple interface, in order to make all users able to use it easily. • Designed with scalability in mind in order to deal with large number of entries – based on a lightweight and streamed text-based protocol, like HTTP/SMTP • Grid security is provided to grant different access levels to different users. • Flexible with support to dynamic schemas in order to serve several application domains • Simple installation by tar source, RPMs or Yum/YAIM INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 49
  • 69. AMGA Features Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 70. AMGA Features Enabling Grids for E-sciencE • Dynamic Schemas – Schemas can be modified at runtime by client  Create, delete schemas  Add, remove attributes INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 71. AMGA Features Enabling Grids for E-sciencE • Dynamic Schemas – Schemas can be modified at runtime by client  Create, delete schemas  Add, remove attributes • AMGA collections are hierarchical organized – Collections can contain sub-collections – Sub-collections can inherit/extend parent collection’ schema INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 72. AMGA Features Enabling Grids for E-sciencE • Dynamic Schemas – Schemas can be modified at runtime by client  Create, delete schemas  Add, remove attributes • AMGA collections are hierarchical organized – Collections can contain sub-collections – Sub-collections can inherit/extend parent collection’ schema • Flexible Queries – SQL-like query language – Different join type (inner, outer, left, right) between schemas are provided selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, “%.mp3")‘ INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 73. AMGA Features Enabling Grids for E-sciencE • Dynamic Schemas – Schemas can be modified at runtime by client  Create, delete schemas  Add, remove attributes • AMGA collections are hierarchical organized – Collections can contain sub-collections – Sub-collections can inherit/extend parent collection’ schema • Flexible Queries – SQL-like query language – Different join type (inner, outer, left, right) between schemas are provided selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, “%.mp3")‘  Support for Views, Constraints, Indexes INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 74. Example Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 75. AMGA Security Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  • 76. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  • 77. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups • ACLs – Per-collection or per-entry (table row). INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  • 78. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups • ACLs – Per-collection or per-entry (table row). • Secure client/server connections – SSL INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  • 79. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups • ACLs – Per-collection or per-entry (table row). • Secure client/server connections – SSL • Client Authentication based on – Username/password – General X509 certificates (DN based) – Grid-proxy certificates (DN based) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  • 80. AMGA Security Enabling Grids for E-sciencE • Unix style permissions - users and groups • ACLs – Per-collection or per-entry (table row). • Secure client/server connections – SSL • Client Authentication based on – Username/password – General X509 certificates (DN based) – Grid-proxy certificates (DN based) • VOMS support: – VO attribute maps to defined AMGA user – VOMS Role maps to defined AMGA user – VOMS Group maps to defined AMGA group INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 52
  • 81. AMGA Implementation Enabling Grids for E-sciencE • C++ multiprocess server – Backends  Oracle, MySQL 4/5, PostgreSQL, SQLite – Front Ends TCP text streaming • High performance • Client API for C++, Java, Python, Perl, PHP SOAP (deprecated) • Interoperability • AMGA server runs on SLC3/4, • Scalability Fedora Core, Gentoo, Debian WS-DAIR Interface (new in AMGA 2.0) • WS-enable environment • Standalone Python Library implementation – Data stored on file system INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 82. AMGA Datatypes Enabling Grids for E-sciencE - Using the above datatypes you are sure that your metadata can be easily moved to all supported back- ends - If you do not care about DB portability, you can use, in principle, as entry attribute type ALL the datatypes supported by the back-end, even the more esoteric ones (PostgreSQL Network Address type or Geometric ones) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 83. Accessing AMGA from UI/WNs Enabling Grids for E-sciencE • TCP Streaming Front-end – mdcli & mdclient CLI and C++ API (md_cli.h, MD_Client.h) – Java Client API and command line mdjavaclient.sh & mdjavacli.sh (also under Windows !!) – Python and Perl Client API – PHP Client API – NEW  developed totally by the GILDA team – INFN CT – AMGA Web Interface (AMGA WI) ---NEW  Developed totally by the GILDA team – INFN CT  Based on JAVA AMGA Standard APIs  Web Application using standard as JSP Custom Tags, Servlet • SOAP Frontend (WSDL) – C++ gSOAP – AXIS (Java) – ZSI (Python) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 84. Advanced features: Metadata Replication Enabling Grids for E-sciencE • AMGA provides a replication/federation mechanisms • Motivation – Scalability – Support hundreds/thousands of concurrent users – Geographical distribution – Hide network latency – Reliability – No single point of failure – DB Independent replication – Heterogeneous DB systems – Disconnected computing – Off-line access (laptops) • Architecture – Asynchronous replication – Master-slave – writes only allowed on the master – Application level replication  Replicate Metadata commands – Partial replication – supports replication of only sub-trees of the metadata hierarchy INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 85. Metadata Replication: Use cases Enabling Grids for E-sciencE Full replication Partial replication Federation Proxy INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 86. Existing SQL DBs importing Enabling Grids for E-sciencE • Since AMGA 1.2.10, a new import feature allow to access existing DB table • Once imported into AMGA the tables from one or more DBs you want to access through AMGA, you can exploit many of the features brought to you by AMGA for your existing tables • Advantages: – your db tables can be accessed by grid users/applications, using grid authentication (VOMS proxies)/authorization with ACLs – exploiting AMGA federation features you can access several databases together from the Grid INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 58
  • 87. Set up AMGA to access your tables Enabling Grids for E-sciencE • To remember: AMGA stores its own tables in its DB backend • To access and existing DB you have 2 option:  import the tables of the DB you want to access to into AMGA DB backend  viceversa, add AMGA DB backed tables to the DB you want to access to • Use the import command by root to “mount” you table into the AMGA collection hierarchy Query> whoami >> root Query> createdir /world Query> cd /world/ Query> import world.City /world/City Query> import world.Country /world/Country Query> import world.CountryLanguage /world/CountryLanguage INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 59
  • 88. Set up AMGA to access your tables Enabling Grids for E-sciencE • Properly set up authorization on the imported tables: Query> acl_remove /world/City/ system:anyuser Query> acl_remove /world/Country system:anyuser Query> acl_add /world/ gilda:users rx Query> acl_show /world >> root rwx >> gilda:users rx >> system:anyuser rx Query> selectattr City:CountryCode City:Name 'like(City:Name, "Am%") limit 5' >> NLD >> Amsterdam >> NLD >> Amersfoort >> BRA >> Americana >> ECU >> Ambato >> IDN ‣ More information on existing DB access @: ‣ http://amga.web.cern.ch/amga/importing.html ‣ https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 60
  • 89. Native SQL Support Enabling Grids for E-sciencE • Objective: – implement native SQL query processing functionality in AMGA • Current Status: – direct SQL data statement in SQL92 Entry Level has been implemented in the 1.9 release  Including 4 statements: SELECT, DELETE, UPDATE and INSERT  ALL SQL commands should be issued in UPPERCASE • Entry name: – when a new entry is created with addentry/addentries, a name has to be assigned (filling the “file” column in the AMGA db backend)  in the INSERT implementation, it’s filled automatically with a random guid INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 61
  • 90. Enabling Grids for E-sciencE Native SQL example Query> INSERT INTO `City` VALUES (1,'Kabul','AFG','Kabol',1780000) >> Operation Success Query> dir /world/City/ >> /world/City/80b4fe646ed11dda02100304873049 >> entry Query> SELECT COUNT (*) FROM /world/City >> 3429 Query> SELECT * FROM /world/City WHERE Name LIKE '%Catani%' >> 1472 >> Catania >> ITA >> Sisilia >> 337862 Query> SELECT /world/City:Name, /world/City:District, /world/Country:Name, / world/Country:Region, /world/Country:Continent FROM /world/City, /world/Country WHERE /world/City:Name LIKE '%Catani%' AND Code = 'ITA' >> Catania >> Sisilia >> Italy >> Southern Europe >> Europe INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 62
  • 91. Biomed - MDM Enabling Grids for E-sciencE • Medical Data Manager – MDM – Store and access medical images and associated metadata on the Grid – Built on top of gLite 1.5 data management system – Demonstrated at last EGEE conference (October 05, Pisa) • Strong security requirements – Patient data is sensitive – Data must be encrypted – Metadata access must be restricted to authorized users • AMGA used as metadata server – Demonstrates authentication and encrypted access – Used as a simplified DB • More details at – https://uimon.cern.ch/twiki/bin/view/EGEE/DMEncryptedStorage INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 92. gMOD: grid Movie On Demand Enabling Grids for E-sciencE • gMOD provides a Video-On-Demand service • User chooses among a list of video and the chosen one is streamed in real time to the video client of the user’s workstation • For each movie a lot of details (Title, Runtime, Country, Release Date, Genre, Director, Case, Plot Outline) are stored and users can search a particular movie querying on one or more attributes • Two kind of users can interact with gMOD: TrailersManagers that can administer the db of movies (uploading new ones and attaching metadata to them); GILDA VO users (guest) can browse, search and choose a movie to be streamed. INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 93. gMOD under the hood Enabling Grids for E-sciencE • Built on top of gLite services: • Storage Elements, sited in different place, physically contain the movie files • LFC, the File Catalogue, keeps track in which Storage Element a particular movie is located • AMGA is the repository of the detailed information for each movie, and makes possible queries on them • The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users • The Workload Management System (WMS) is responsible to retrieve the chosen movie from the right Storage Element and stream it over the network down to the user’s desktop or laptop INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 94. gMOD interactions Enabling Grids for E-sciencE VOMS Metadata Catalogue Storage Elements GENIUS Portal AMGA get Role LFC File Catalogue User Workload Management System W CE N W N WN INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 95. gMOD screenshot Enabling Grids for E-sciencE gMOD is accesible through the Genius Portal (https://glite-demo.ct.infn.it) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09
  • 96. gLibrary features Enabling Grids for E-sciencE • INFN-developed tool totally gLite based • It allows to store, organize, search and retrieve digital assets on a Grid environment with an intuitive front-end • What we mean by Digital Assets: INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 68
  • 97. gLibrary as the iTunes for the Grid Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 69
  • 98. Browse & Search Enabling Grids for E-sciencE • Assets can be browsed selecting a type (or category) and selecting one or more filters: – attributes of the selected types, chosen from a defined list, used to narrow the result set • Filter application is cascading and context-sensitive: the selection of a filter value dynamically influences subsequent filter values (“à la iTunes” browsing) – Classical search by description and keywords available too INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 70
  • 99. Organize assets Enabling Grids for E-sciencE • “Types” and “Categories” definition by repository providers/admins: • and/or organized • Assets are organized by by collection: type: – Group together related – a list of specific attributes to assets of different type; describe each kind of asset to – Useful also to define be managed by the system subsets of assets – hierarchical (a child type shares belonging to the same and extend parent’s attributes) type – queried during searches – Multiple category assignment per asset (tagging) Collections INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 71
  • 100. Store & Retrieve Enabling Grids for E-sciencE • Users can upload their local assets on one or more (creating replicas) Storage Elements of the Grid - Files already on grid SE can be registered in a gLibrary repository by the LFC File Catalogue browser • Download from SEs to the users’ laptop/desktop: - selection of a replica link from a list • Transfers are handled from the browser over HTTP/ HTTPS provided that users have their own X.509 Grid Certificate imported INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 72
  • 101. gLibrary architecture Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 73
  • 102. Features Enabling Grids for E-sciencE • Implemented as Web 2.0 application – AJAX and Javascript are strongly used to offer a desktop like user experience – Business logic implemented using PHP 5 OOP support EGEE-II INFSO-RI-031688 Antonio Calanducci ISGC09 Grid Tutorial, Taipei 18-04-2009 74
  • 103. Technologies used Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 75
  • 104. Technologies used Enabling Grids for E-sciencE • Web standards: – Javascript/AJAX/JSON on the client side – PHP5 classes to implement business logic on the server side INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 75
  • 105. Technologies used Enabling Grids for E-sciencE • Web standards: – Javascript/AJAX/JSON on the client side – PHP5 classes to implement business logic on the server side INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 75
  • 106. Technologies used Enabling Grids for E-sciencE • Web standards: – Javascript/AJAX/JSON on the client side – PHP5 classes to implement business logic on the server side • Grid technologies: – Storage Element SRM interface to get the TURLs (Transfer URLs) – Transfers handled with GridFTP (first release) and X.509 cert auth HTTPS – X.509 based Globus Security Infrastructure with the VOMS extensions to handle authentication and authorization (ACL based) on Metadata and Storage Elements – All grid services implemented with the EGEE gLite middleware (DPM Storage Elements, AMGA Metadata Catalogue, LFC File Catalogue, VOMS Services) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 75
  • 107. De Roberto cultural heritage Enabling Grids for E-sciencE • De Roberto, an Italian writer of the XIX/XX century, born in Naples, but spending his life in Catania, has left to the humanistic community numerous works • Those are made up of valuable and hard-to-manage pieces: manuscripts, typescripts, drafts with handwritten corrections, magazines, cuts, sketches, photos, etc. INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 76
  • 108. Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 77
  • 109. Digitalize to preserve them Enabling Grids for E-sciencE • Some sheets are damaged (mold, crumbed pieces) and need physical restoration • Digitalization to avoid the loss of this works, some of them still unpublished and relevant for the humanistic communities INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 78
  • 110. Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 79
  • 111. Acquisition stage Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 80
  • 112. Acquisition stage Enabling Grids for E-sciencE • Digitalization of manuscripts, typescripts, printed works – TIFF Files, one per page, 600 dpi, about 100MB for A3  High resolution scans for in-depth examination – PDF, one per work, 300 dpi, varying file sizes 40-400MB  Overall examination of works – 8000 sheets/scans, 3 Terabyte of disk space – Different physical formats, A3/A4/custom size INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 80
  • 113. Acquisition stage Enabling Grids for E-sciencE • Digitalization of manuscripts, typescripts, printed works – TIFF Files, one per page, 600 dpi, about 100MB for A3  High resolution scans for in-depth examination – PDF, one per work, 300 dpi, varying file sizes 40-400MB  Overall examination of works – 8000 sheets/scans, 3 Terabyte of disk space – Different physical formats, A3/A4/custom size • Embedded Metadata – TIFF with embedded metadata to provide scan physical features and information about the content  ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate  Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright Notice – Added with Photoshop after the digitalization phase (Adobe XMP format) INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 80
  • 114. Goals and requirements Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 81
  • 115. Goals and requirements Enabling Grids for E-sciencE • Make those works accessible to the humanistic communities – Always on-line: 24 x 365 – Available from everywhere – Simple and easy-to-use interface for non-expert people INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 81
  • 116. Goals and requirements Enabling Grids for E-sciencE • Make those works accessible to the humanistic communities – Always on-line: 24 x 365 – Available from everywhere – Simple and easy-to-use interface for non-expert people • Quickly find the desired document – Document organization according the physical and semantic metadata Organization by type/collections Dynamic filtering of search result sets according the selection of one or more document metadata INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 81
  • 117. Goals and requirements Enabling Grids for E-sciencE • Make those works accessible to the humanistic communities – Always on-line: 24 x 365 – Available from everywhere – Simple and easy-to-use interface for non-expert people • Quickly find the desired document – Document organization according the physical and semantic metadata Organization by type/collections Dynamic filtering of search result sets according the selection of one or more document metadata • Long-term preservation (digital preservation) – Multiple copies (replicas) spread in different geographical sites – Reliability of storage systems and replica redundancy to achieve secure preservation INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 81
  • 118. What Data Grids can offer to them Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  • 119. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  • 120. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  • 121. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  • 122. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services • long-term digital preservation of data ---> redundancy through Replicas of files on several Storage Elements INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  • 123. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services • long-term digital preservation of data ---> redundancy through Replicas of files on several Storage Elements • simple and easy-to-use system for searches, organization, upload and download of digitalized documents on the Grid INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  • 124. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services • long-term digital preservation of data ---> redundancy through Replicas of files on several Storage Elements • simple and easy-to-use system for searches, organization, upload and download of digitalized documents on the Grid -----> INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  • 125. What Data Grids can offer to them Enabling Grids for E-sciencE • store the 8000 scans of De Roberto Heritage ----> Data Grid Storage Elements • enable an ubiquitous and 24/24h access to scientists ---> Web Application • document organization for a quick search ---> Metadata Services • long-term digital preservation of data ---> redundancy through Replicas of files on several Storage Elements • simple and easy-to-use system for searches, organization, upload and download of digitalized documents on the Grid -----> INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 82
  • 126. Data Grids to preserve Cultural Heritage Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 83
  • 127. Live Demo Enabling Grids for E-sciencE INFSO-RI-508833 ISSGC’09, Taipei, 21st April 09 84
  • 128. Data Management Services Summary Enabling Grids for E-sciencE • Storage Element – save date and provide a common interface – Storage Resource Manager (SRM) dCache, DPM, ... – Native Access protocols rfio, dcap, nfs, … – Transfer protocols gsiftp, https, … • Catalogs – keep track where data are stored – File Catalog LCG File Catalog (LFC) – Replica Catalog – Metadata Catalog AMGA Metadata Catalogue • Data Movement – schedules reliable file transfer – File Transfer Service gLite FTS (manages physical transfers) INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 85
  • 129. References Enabling Grids for E-sciencE • gLite documentation homepage – http://glite.web.cern.ch/glite/documentation/default.asp • DM subsystem documentation – http://egee-jra1-dm.web.cern.ch/egee-jra1-dm/doc.htm • LFC and DPM documentation – https://uimon.cern.ch/twiki/bin/view/LCG/ DataManagementDocumentation INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 86
  • 130. References Enabling Grids for E-sciencE • AMGA Web Site http://cern.ch/amga • AMGA Manual http://amga.web.cern.ch/amga/downloads/amga-manual_1_3_0.pdf • AMGA API Javadoc http://amga.web.cern.ch/amga/javadoc/index.html • AMGA Web Frontend http://gilda-forge.ct.infn.it/projects/amgawi/ • AMGA Basic Tutorial https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAHandsOn • More information on existing DB access @: – http://amga.web.cern.ch/amga/importing.html – https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGADBaccess INFSO-RI-508833 ISSGC 09, Sophia-Antipolis, 10-07-09 87
  • 131. gLibrary references Enabling Grids for E-sciencE • gLibray project homepage: – https://glibrary.ct.infn.it/ • gLibrary paper: – https://glibrary.ct.infn.it/glibrary/downloads/gLibrary_paper_v2.pdf EGEE-II INFSO-RI-031688 Antonio Calanducci ISGC09 Grid Tutorial, Taipei 18-04-2009 88