Cybersecurity Awareness Training Presentation v2024.03
Larocca
1. The gLite WMS and the
Data Management System
Giuseppe LA ROCCA
INFN Catania
giuseppe.larocca@ct.infn.it
Master Class for Life Science,
4-6 May 2010
Singapore
2. Outline
• An introduction to the gLite WMS
• Job Submission via WMS
• Command line interface
• Job status
• The Job Description Language overview
• JDL attributes
• The gLite DMS
– The Storage Resource Manager (SRM)
• Grid file referencing schemes
• LFC File Catalogue
– Architecture
– LFC commands
• File & Replica Management Client Tools
• Run bioinformatics applications via Grid portal
4. Overview of the WMS
• The Workload Management System (WMS) is the gLite 3
component that allows users to submit jobs and performs all
the tasks required to execute them, without exposing the user to the
complexity of the Grid.
• The WMS comprises a set of Grid middleware components
responsible for the distribution and management of tasks
across Grid resources.
– The Workload Manager (WM) aims to accept and satisfy
requests for job management coming from its clients.
• The WM passes the job to an appropriate Computing Element (CE)
for execution, taking into account the requirements and preferences
expressed in the job description.
• The decision of which resource should be used is the
outcome of a matchmaking process.
– The Logging and Bookkeeping service tracks jobs managed by
the WMS. It collects events from many WMS components and
records the status and history of the job.
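The matchmaking step can be pictured as a filter followed by a choice. The Python sketch below is a simplified model, not the WMS implementation. It assumes that each CE is a plain dict of published attributes and that the job's requirements are a predicate. It breaks ties with a hypothetical ranking function, here the number of free CPUs (`GlueCEStateFreeCPUs`). The host names are made up.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: a CE is a dict of the attributes it publishes.
CEAd = Dict[str, object]


def matchmake(ces: List[CEAd],
              requirements: Callable[[CEAd], bool],
              rank: Callable[[CEAd], float]) -> Optional[CEAd]:
    """Return the eligible CE with the highest rank, or None if no CE matches."""
    # 1. Keep only the CEs on which the job's requirements evaluate to true.
    eligible = [ce for ce in ces if requirements(ce)]
    if not eligible:
        return None
    # 2. Among the eligible CEs, prefer the one with the best rank.
    return max(eligible, key=rank)


# Illustrative data (made-up host names and values).
CES: List[CEAd] = [
    {"GlueCEUniqueID": "ce-a.example.org:2119/jobmanager-lcgpbs-short",
     "GlueCEInfoLRMSType": "PBS", "GlueCEStateFreeCPUs": 4},
    {"GlueCEUniqueID": "ce-b.example.org:2119/jobmanager-sge-default",
     "GlueCEInfoLRMSType": "SGE", "GlueCEStateFreeCPUs": 50},
    {"GlueCEUniqueID": "ce-c.example.org:2119/jobmanager-lcgpbs-long",
     "GlueCEInfoLRMSType": "PBS", "GlueCEStateFreeCPUs": 12},
]

best = matchmake(CES,
                 requirements=lambda ce: ce["GlueCEInfoLRMSType"] == "PBS",
                 rank=lambda ce: float(ce["GlueCEStateFreeCPUs"]))
print(best["GlueCEUniqueID"] if best else "no matching CE")
```

The SGE queue has the most free CPUs but fails the requirement. The ranking is therefore applied only to the two PBS CEs.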
5. Job Submission via WMS
[Figure: the user creates a proxy on the GILDA User Interface, using the VO Management Service (DB of VO users); a Grid Site hosts a Computing Element and a Storage Element.]
6. Job Submission via WMS
[Figure: the user creates a proxy (VO Management Service, DB of VO users), writes a JDL and submits the job (executable + small inputs) from the GILDA User Interface to the Workload Management System; the WMS queries the Information System, to which the Grid Site (Computing Element, Storage Element) publishes its state.]
7. Job Submission via WMS
[Figure: the user creates a proxy and submits the job (JDL, executable + small inputs) from the GILDA User Interface to the Workload Management System, which queries the Information System and submits the job to the Computing Element of a Grid Site; logging events from the WMS and from the job process are recorded by the Logging and Bookkeeping service.]
8. Job Submission via WMS
[Figure: complete job flow. From the GILDA User Interface the user creates a proxy, submits the job (JDL, executable + small inputs) to the Workload Management System, retrieves the job status from Logging and Bookkeeping, and retrieves the (small) output files from the WMS. The WMS queries the Information System and submits the job to the Computing Element of a Grid Site, which publishes its state; job status events are logged to Logging and Bookkeeping.]
9. The Command Line Interface
• The gLite WMS implements two different services to manage
jobs: the Network Server and the WMProxy.
– The recommended way to manage jobs is through
the gLite WMS via WMProxy, because it gives the best
performance and allows the use of the most advanced
functionalities.
• The WMProxy offers several
features, including:
– submission of job collections;
– faster authentication;
– faster match-making;
– faster response time for users;
– higher job throughput.
10. Proxy Delegation
To explicitly delegate a user proxy to WMProxy, the
command to use is:
glite-wms-job-delegate-proxy -d <delegID>
Example:
$ glite-wms-job-delegate-proxy -d mydelegID
Connecting to the service
https://rb102.cern.ch:7443/glite_wms_wmproxy_server
======= glite-wms-job-delegate-proxy Success ========
Your proxy has been successfully delegated to the
WMProxy:
https://rb102.cern.ch:7443/glite_wms_wmproxy_server
with the delegation identifier: mydelegID
=====================================================
11. Job Submission
Starting from a simple JDL file, we can submit it via
WMProxy by doing:
$ glite-wms-job-submit -d mydelegID test.jdl
Connecting to the service
https://rb102.cern.ch:7443/glite_wms_wmproxy_server
======== glite-wms-job-submit Success ========
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://rb102.cern.ch:9000/vZKKk3gdBla6RySximq_vQ
==============================================
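Scripts that submit many jobs usually need to capture the job identifier. The minimal Python sketch below pulls it out of the submission report. It assumes the output layout shown above, with the identifier on the line after "Your job identifier is:". That layout may vary between middleware versions.

```python
import re
from typing import Optional

# Job identifiers are https URLs, e.g. https://rb102.cern.ch:9000/<id>
JOB_ID_RE = re.compile(r"https://\S+:\d+/\S+")


def extract_job_id(submit_output: str) -> Optional[str]:
    """Return the job identifier printed after 'Your job identifier is:'."""
    lines = [line.strip() for line in submit_output.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("Your job identifier is:"):
            # The identifier is on the next non-empty line.
            for candidate in lines[i + 1:]:
                if candidate:
                    return candidate if JOB_ID_RE.fullmatch(candidate) else None
    return None


SAMPLE = """Connecting to the service
https://rb102.cern.ch:7443/glite_wms_wmproxy_server
======== glite-wms-job-submit Success ========
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://rb102.cern.ch:9000/vZKKk3gdBla6RySximq_vQ
=============================================="""

print(extract_job_id(SAMPLE))
```

The WMProxy service URL earlier in the report also looks like an https URL. For that reason the parser only searches after the "Your job identifier is:" marker.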
12. Listing the CE(s) matching a job
It is possible to see which CEs are eligible to run a job
described by a given JDL using:
$ glite-wms-job-list-match -d mydelegID test.jdl
Connecting to the service
https://rb102.cern.ch:7443/glite_wms_wmproxy_server
====================================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have
been found:
*CEId*
- CE.pakgrid.org.pk:2119/jobmanager-lcgpbs-cms
- grid-ce0.desy.de:2119/jobmanager-lcgpbs-cms
- gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default
- grid-ce2.desy.de:2119/jobmanager-lcgpbs-cms
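The list of matching CEs can also be processed in a script. The following is a small Python sketch that assumes the layout shown above, where each CE ID sits on a "- " line after the `*CEId*` header. It splits every CE ID into its `<host>:<port>` part and its job manager/queue part.

```python
from typing import List


def parse_ce_ids(list_match_output: str) -> List[str]:
    """Return the CE identifiers listed after the '*CEId*' header."""
    ce_ids: List[str] = []
    in_list = False
    for raw in list_match_output.splitlines():
        line = raw.strip()
        if line == "*CEId*":
            in_list = True          # the CE list starts after this header
        elif in_list and line.startswith("- "):
            ce_ids.append(line[2:].strip())
    return ce_ids


SAMPLE = """COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- CE.pakgrid.org.pk:2119/jobmanager-lcgpbs-cms
- grid-ce0.desy.de:2119/jobmanager-lcgpbs-cms
- gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default
- grid-ce2.desy.de:2119/jobmanager-lcgpbs-cms"""

for ce_id in parse_ce_ids(SAMPLE):
    host, queue = ce_id.split("/", 1)   # "<host>:<port>" and "<jobmanager-queue>"
    print(host, queue)
```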
13. Retrieving the status of a job
$ glite-wms-job-status https://rb102.cern.ch:9000/fNdD4FW_Xxkt2s2aZJeoeg
=====================================================
BOOKKEEPING INFORMATION:
Status info for the Job :
https://rb102.cern.ch:9000/fNdD4FW_Xxkt2s2aZJeoeg
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce1.inrne.bas.bg:2119/jobmanager-lcgpbs-cms
Submitted: Mon Dec 4 15:05:43 2006 CET
=====================================================
The verbosity level controls the amount of information provided.
The value of the -v option ranges from 0 to 3.
The glite-wms-job-status command accepts several job IDs as
arguments, e.g.: glite-wms-job-status <jobID1> ... or,
more conveniently, the -i <file path> option can be used to
read the job IDs from a file.
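To monitor jobs from a script, the status report can be turned into a dictionary. The following is a minimal Python sketch that assumes the "Key: value" layout of the report shown above. The `succeeded` helper only recognises the "Done (Success)" status and exit code 0 that appear in that example.

```python
from typing import Dict


def parse_status(report: str) -> Dict[str, str]:
    """Collect the 'Key: value' lines of a glite-wms-job-status report."""
    info: Dict[str, str] = {}
    for raw in report.splitlines():
        line = raw.strip()
        # Skip separators, headers and lines without a 'Key: value' shape
        # (URLs such as https://... contain ':' but not ': ').
        if not line or line.startswith("=") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        info[key.strip()] = value.strip()
    return info


def succeeded(info: Dict[str, str]) -> bool:
    """True for the 'Done (Success)' status with exit code 0."""
    return info.get("Current Status") == "Done (Success)" and info.get("Exit code") == "0"


REPORT = """BOOKKEEPING INFORMATION:
Status info for the Job :
https://rb102.cern.ch:9000/fNdD4FW_Xxkt2s2aZJeoeg
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: ce1.inrne.bas.bg:2119/jobmanager-lcgpbs-cms
Submitted: Mon Dec 4 15:05:43 2006 CET"""

info = parse_status(REPORT)
print(info["Destination"], succeeded(info))
```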
14. Retrieving the output(s)
$ glite-wms-job-output https://rb102.cern.ch:9000/yabp72aERhofLA6W2-LrJw
Connecting to the service
https://128.142.160.93:7443/glite_wms_wmproxy_server
=====================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://rb102.cern.ch:9000/yabp72aERhofLA6W2-LrJw
have been successfully retrieved and stored in the
directory:
/tmp/doe_yabp72aERhofLA6W2-LrJw
=====================================================
The default location for storing the outputs (normally
/tmp) is defined in the UI configuration, but it is possible
to specify in which directory to save the output using the
--dir <path name> option.
15. Cancelling a job
$ glite-wms-job-cancel https://rb102.cern.ch:9000/P1c60RFsrIZ9mnBALa7yZA
Are you sure you want to remove specified job(s)
[y/n]y : y
Connecting to the service
https://128.142.160.93:7443/glite_wms_wmproxy_server
========== glite-wms-job-cancel Success ============
The cancellation request has been successfully
submitted for the following job(s):
- https://rb102.cern.ch:9000/P1c60RFsrIZ9mnBALa7yZA
====================================================
If the cancellation is successful, the job terminates in the
CANCELLED status.
16. Job Submission with CLI
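GILDA User Interface: the job is managed with the following commands, in order:
• voms-proxy-init --voms gilda
• glite-wms-job-delegate-proxy -d delegID
• glite-wms-job-list-match -d delegID hostname.jdl
• glite-wms-job-submit -d delegID hostname.jdl (returns the JobID)
• glite-wms-job-status JobID
• glite-wms-job-output JobID
[Figure: the commands are issued from the GILDA User Interface; voms-proxy-init contacts the VO Management Service (DB of VO users), and the job runs as a process on the Computing Element of a Grid Site, which also hosts a Storage Element.]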
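The same command sequence can be driven from a script on the UI. The Python sketch below builds the argument lists in the order given above, and a small helper runs one command with `subprocess`. The delegation ID, JDL file name and default VO are placeholders. Because voms-proxy-init asks for the certificate passphrase, that step is best run interactively.

```python
import subprocess
from typing import List


def setup_and_submit(deleg_id: str, jdl_file: str, vo: str = "gilda") -> List[List[str]]:
    """Commands of the CLI workflow, up to submission, in order."""
    return [
        ["voms-proxy-init", "--voms", vo],                       # create the VOMS proxy
        ["glite-wms-job-delegate-proxy", "-d", deleg_id],        # delegate it to WMProxy
        ["glite-wms-job-list-match", "-d", deleg_id, jdl_file],  # optional: check matching CEs
        ["glite-wms-job-submit", "-d", deleg_id, jdl_file],      # prints the JobID
    ]


def manage(job_id: str) -> List[List[str]]:
    """Commands used once the JobID is known."""
    return [
        ["glite-wms-job-status", job_id],
        ["glite-wms-job-output", job_id],
    ]


def run(cmd: List[str]) -> str:
    """Run one command on the UI and return its standard output (raises on failure).

    voms-proxy-init asks for the certificate passphrase, so run that step
    interactively (e.g. subprocess.run(cmd, check=True)) rather than via run().
    """
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


for cmd in setup_and_submit("delegID", "hostname.jdl"):
    print(" ".join(cmd))
```

The commands are kept as argument lists instead of shell strings. This means job IDs and file names are passed through without shell quoting problems.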
18. Job Description Language
• The Job Description Language (JDL) is a high-level
language based on the Classified Advertisement
(ClassAd) language, used to describe jobs and
aggregates of jobs with arbitrary dependency relations.
– The JDL is used in WLCG/EGEE to specify the desired
job characteristics and constraints, which are taken
into account by the WMS to select the best resource
to execute the job.
– A job description is a file (called JDL file) consisting
of lines having the format: attribute = expression;
– Expressions can span several lines, but only the last
one must be terminated by a semicolon.
19. Job Description Language
• The single-quote character (') cannot be used in the JDL.
• Comments must be preceded by a sharp character
(#) or a double slash (//) at the beginning of each
line.
• Multi-line comments must be enclosed between "/*"
and "*/".
Attention! The JDL is sensitive to blank characters and
tabs. No blank characters or tabs should follow the
semicolon at the end of a line.
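The formatting rules above are easy to get wrong by hand. One option is to generate simple JDL files with a small helper. The Python sketch below is a hypothetical helper, not part of gLite. It handles only literal values (strings, numbers, booleans, lists of strings) and leaves expressions such as Requirements out of scope. It writes one `attribute = expression;` line per attribute with nothing after the semicolon, and it refuses the forbidden `'` character.

```python
from typing import Dict, List, Union

Value = Union[bool, int, float, str, List[str]]


def jdl_value(value: Value) -> str:
    """Render a literal Python value as a JDL expression."""
    if isinstance(value, bool):               # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if "'" in value or '"' in value:
            # ' is forbidden in the JDL; embedded " is simply not handled by this sketch.
            raise ValueError(f"unsupported quote character in {value!r}")
        return f'"{value}"'
    return "{" + ", ".join(jdl_value(item) for item in value) + "}"


def to_jdl(attributes: Dict[str, Value]) -> str:
    """One 'attribute = expression;' line per attribute, nothing after the ';'."""
    return "".join(f"{name} = {jdl_value(v)};\n" for name, v in attributes.items())


print(to_jdl({
    "Executable": "test.sh",
    "InputSandbox": ["/home/larocca/test.sh"],
    "StdOutput": "std.out",
    "StdError": "std.err",
}), end="")
```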
20. Simple JDL example
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
The Executable attribute specifies the command to be
run by the job. If the command is already present on
the WN, it must be expressed as an absolute path; if it
has to be copied from the UI, only the file name must
be specified, and the path of the command on the UI
should be given in the InputSandbox attribute.
Executable = "test.sh";
InputSandbox = {"/home/larocca/test.sh"};
StdOutput = "std.out";
StdError = "std.err";
21. • The Arguments attribute can contain a string value,
which is taken as argument list for the executable:
Arguments = "fileA 10";
• In the Executable and in the Arguments attributes it
may be necessary to use special characters, such as
&, \, |, >, <. These characters should be preceded by
a triple backslash (\\\) in the JDL, or specified inside quoted strings,
e.g.: Arguments = "-f file1\\\&file2";
• The shell environment of the job can be modified using
the Environment attribute.
Environment = {"CMS_PATH=$HOME/cms"};
22. • If files have to be copied from the UI to the execution
node, they must be listed in the InputSandbox
attribute: InputSandbox = {"test.sh", ... ,"fileN"};
• The files to be transferred back to the UI after the job
is finished can be specified using the OutputSandbox
attribute: OutputSandbox = {"std.out","std.err"};
• Wildcards are allowed only in the InputSandbox
attribute.
• Absolute paths cannot be specified in the
OutputSandbox attribute.
• The InputSandbox cannot contain two files with the
same name, even if they have a different absolute
path, as when transferred they would overwrite each
other.
23. • The Requirements attribute can be used to express
constraints on the resources where the job should run.
– Its value is a Boolean expression that must
evaluate to true for a job to run on that specific CE.
• Note: Only one Requirements attribute can be specified
(if there is more than one, only the last one is
considered). If several conditions must be applied to
the job, they must all be combined in a single
Requirements attribute.
• For example, suppose that the user wants to run
on a CE that uses PBS as batch system and has at least
two CPUs in total. The user will then write in the job
description file:
Requirements = other.GlueCEInfoLRMSType ==
"PBS" && other.GlueCEInfoTotalCPUs > 1;
24. • The WMS can also be asked to send a job to a particular queue
in a CE with the following expression:
Requirements = other.GlueCEUniqueID ==
"lxshare0286.cern.ch:2119/jobmanager-pbs-short";
• It is also possible to use regular expressions when expressing
a requirement.
– Suppose, for example, that the user wants all jobs
to run on any CE in the domain cern.ch. This can be
achieved by putting the following expression in the JDL
file:
Requirements =
RegExp("cern.ch",other.GlueCEUniqueID);
– The opposite can be required by using:
Requirements =
(!RegExp("cern.ch", other.GlueCEUniqueID));
25. • If the job must run on a CE where a particular
experiment software is installed and this information is
published by the CE, something like the following must
be written:
Requirements = Member("BLAST-1.0.3",
other.GlueHostApplicationSoftwareRunTimeEnvironment);
Note: The Member operator is used to test if its first argument
(a scalar value) is a member of its second argument (a list).
In fact, the GlueHostApplicationSoftwareRunTimeEnvironment
attribute is a list of strings and is used to publish any VO-
specific information relative to the CE (typically, information
on the VO software available on that CE).
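To make the semantics of the three Requirements examples concrete, the Python sketch below gives rough Python counterparts and evaluates them against one illustrative CE, modelled as a dict of Glue attributes. These are approximations for illustration, not the ClassAd evaluator used by the WMS. The attribute values are made up.

```python
import re
from typing import Dict, List, Union

# Hypothetical model of a CE: the Glue attributes it publishes.
CE = Dict[str, Union[str, int, List[str]]]


def regexp(pattern: str, value: str) -> bool:
    """Counterpart of RegExp(pattern, value): true if the pattern occurs in value."""
    return re.search(pattern, value) is not None


def member(item: str, values: List[str]) -> bool:
    """Counterpart of Member(item, list): true if item is an element of list."""
    return item in values


def pbs_and_more_than_one_cpu(other: CE) -> bool:
    # other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1
    return other["GlueCEInfoLRMSType"] == "PBS" and other["GlueCEInfoTotalCPUs"] > 1


def in_cern_domain(other: CE) -> bool:
    # RegExp("cern.ch", other.GlueCEUniqueID)
    return regexp("cern.ch", other["GlueCEUniqueID"])


def has_blast(other: CE) -> bool:
    # Member("BLAST-1.0.3", other.GlueHostApplicationSoftwareRunTimeEnvironment)
    return member("BLAST-1.0.3", other["GlueHostApplicationSoftwareRunTimeEnvironment"])


ce: CE = {  # illustrative values
    "GlueCEUniqueID": "lxshare0286.cern.ch:2119/jobmanager-pbs-short",
    "GlueCEInfoLRMSType": "PBS",
    "GlueCEInfoTotalCPUs": 8,
    "GlueHostApplicationSoftwareRunTimeEnvironment": ["BLAST-1.0.3"],
}
print(pbs_and_more_than_one_cpu(ce), in_cern_domain(ce), not in_cern_domain(ce), has_blast(ce))
```

Two caveats apply. First, in a regular expression the `.` in "cern.ch" matches any character, so the pattern is looser than it looks; a strict match in Python would use `re.escape("cern.ch")`. Second, ClassAd string comparison may be case-insensitive, whereas Python's `==` is case-sensitive.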
26. Advanced job types
• Job Collection: a set of independent jobs that a user can
submit and monitor as if it were a single job
[
Type = "Collection";
nodes = { [
Executable = "/bin/hostname";
Arguments = "-f";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
],[
Executable = "/bin/sh";
Arguments = "start_povray_valve.sh";
StdOutput = "povray.out";
StdError = "povray.err";
InputSandbox = {"start_povray_valve.sh"};
OutputSandbox = {"povray.err","povray.out"};
Requirements = Member("POVRAY-3,5",
other.GlueHostApplicationSoftwareRunTimeEnvironment);
] };
]
27. Advanced job types
• Parametric Job: a job collection where the jobs are identical
but for the value of a running parameter
JobType = "Parametric";
Executable = “/bin/echo";
Arguments = “_PARAM_”;
StdOutput = "myoutput_PARAM_.txt";
StdError = "myerror_PARAM_.txt";
Parameters = 3;
ParameterStep = 1;
ParameterStart = 1;
OutputSandbox = {“myoutput_PARAM_.txt”};
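A parametric job can be thought of as one template expanded once per parameter value, with `_PARAM_` replaced everywhere. The Python sketch below models that expansion. It assumes that `Parameters` is an exclusive upper bound for the values generated from `ParameterStart` in steps of `ParameterStep`. That assumption should be checked against the JDL attributes specification of your middleware version. Passing an explicit list of values to `expand` avoids depending on it.

```python
from typing import Dict, List, Union

Attr = Union[str, List[str]]


def parameter_values(parameters: int, start: int = 0, step: int = 1) -> List[int]:
    """ASSUMPTION: Parameters is an exclusive upper bound for the _PARAM_ values."""
    return list(range(start, parameters, step))


def substitute(value: Attr, param: int) -> Attr:
    """Replace every _PARAM_ occurrence in a string or list of strings."""
    if isinstance(value, str):
        return value.replace("_PARAM_", str(param))
    return [item.replace("_PARAM_", str(param)) for item in value]


def expand(template: Dict[str, Attr], values: List[int]) -> List[Dict[str, Attr]]:
    """One concrete job description per parameter value."""
    return [{k: substitute(v, p) for k, v in template.items()} for p in values]


TEMPLATE: Dict[str, Attr] = {
    "Executable": "/bin/echo",
    "Arguments": "_PARAM_",
    "StdOutput": "myoutput_PARAM_.txt",
    "StdError": "myerror_PARAM_.txt",
    "OutputSandbox": ["myoutput_PARAM_.txt"],
}

for job in expand(TEMPLATE, parameter_values(parameters=3, start=1, step=1)):
    print(job["Arguments"], job["StdOutput"])
```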
28. Advanced job types
• A DAG (Directed Acyclic Graph) is a set of jobs where the input, output, or execution of
one or more jobs depends on one or more other jobs
• The jobs are the nodes (vertices) of the graph
• The edges (arcs) identify the dependencies
[Figure: nodeA precedes nodeB, nodeC and nodeF, which all precede nodeD]
Type = "dag";
max_nodes_running = 5;
InputSandbox = {"/tmp/foo/*.exe", "/home/larocca/bar", "gsiftp://neo.datamat.it:5678/tmp/cms_sim.exe", "file:///tmp/myconf"};
InputSandboxBaseURI = "gsiftp://matrix.datamat.it:5432/tmp";
nodes = [
  nodeA = [ description = [
    JobType = "Normal";
    Executable = "a.exe";
    InputSandbox = { "/home/larocca/myfile.txt", root.InputSandbox};
  ];
  ];
  nodeF = [ description = [
    JobType = "Normal";
    Executable = "b.exe";
    Arguments = "1 2 3";
    OutputSandbox = {"myoutput.txt", "myerror.txt" };
  ];
  ];
  nodeD = [ description = [
    JobType = "Checkpointable";
    Executable = "b.exe";
    Arguments = "1 2 3";
    InputSandbox = { "file:///home/larocca/data.txt",
                     root.nodes.nodeF.description.OutputSandbox[0] };
  ];
  ];
  nodeC = [ file = "/home/larocca/nodec.jdl"; ];
  nodeB = [ file = "foo.jdl"; ];
];
dependencies = { { nodeA, nodeB }, { nodeA, nodeC }, {nodeA, nodeF }, { { nodeB, nodeC, nodeF }, nodeD } };
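The dependencies attribute lists {parent, child} pairs, where either side may be a group such as {nodeB, nodeC, nodeF}. A level-by-level variant of Kahn's topological sort shows which nodes can start together once their parents have finished. The Python sketch below applies it to the dependencies above. It is an illustration of the scheduling order, not the WMS's DAG manager, and it does not model the max_nodes_running limit.

```python
from typing import Dict, List, Sequence, Set, Tuple, Union

Group = Union[str, Sequence[str]]


def edges(dependencies: List[Tuple[Group, Group]]) -> Set[Tuple[str, str]]:
    """Expand {parent(s), child(ren)} pairs into single parent -> child edges."""
    def as_list(group: Group) -> List[str]:
        return [group] if isinstance(group, str) else list(group)
    return {(p, c) for parents, children in dependencies
            for p in as_list(parents) for c in as_list(children)}


def stages(nodes: List[str], dependencies: List[Tuple[Group, Group]]) -> List[List[str]]:
    """Level-by-level Kahn's algorithm: each stage only needs earlier stages done."""
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for parent, child in edges(dependencies):
        indegree[child] += 1
        children[parent].append(child)
    current = sorted(n for n in nodes if indegree[n] == 0)
    result: List[List[str]] = []
    while current:
        result.append(current)
        ready = []
        for node in current:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        current = sorted(ready)
    if sum(len(s) for s in result) != len(nodes):
        raise ValueError("the dependencies contain a cycle: not a DAG")
    return result


NODES = ["nodeA", "nodeB", "nodeC", "nodeD", "nodeF"]
DEPENDENCIES = [("nodeA", "nodeB"), ("nodeA", "nodeC"), ("nodeA", "nodeF"),
                (("nodeB", "nodeC", "nodeF"), "nodeD")]
print(stages(NODES, DEPENDENCIES))
```

For this DAG the stages are nodeA, then nodeB, nodeC and nodeF together, and finally nodeD. If any nodes are left unscheduled, the graph contains a cycle.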
31. Storage Elements
• The Storage Element is the service which allows a user or an
application to store data for future retrieval.
• The DMS provides services to locate, access and transfer files
– The user does not need to know the physical location of a file, just its
logical file name;
– Files can be replicated or transferred to several locations (SEs) as
needed;
– Files are shared with all the members of the given VO.
• Files stored in a SE are write-once, read-many
– Files cannot be changed, only removed or replaced;
32. Protocols
– The GSIFTP protocol offers the functionalities of FTP, but
with support for GSI. It is responsible for secure, fast and
efficient file transfers to/from Storage Elements.
– RFIO was developed to access tape archiving systems, such
as CASTOR (CERN Advanced STORage manager) and it
comes in a secure and an insecure version.
– The gsidcap protocol is the GSI enabled version of the
dCache native access protocol, dcap.
33. Types of Storage Elements /1
• In WLCG/EGEE, different types of Storage Elements are
available:
• CASTOR. It consists of a disk buffer frontend to a tape
mass storage system. A virtual file system (namespace)
shields the user from the complexities of the underlying disk and
tape setup. File migration between disk and
tape is managed by a process called “stager”. The
native storage protocol, the insecure RFIO, allows
access to files in the SE. Since the protocol is not GSI-enabled,
RFIO access is only allowed from a location in the same
LAN as the SE. With the proper modifications,
the CASTOR disk buffer can also be used as a disk-only
storage system.
34. Types of Storage Elements /2
• StoRM. It has been designed to support space
reservation and direct access (native POSIX I/O calls),
as well as access through other standard libraries (like RFIO).
• StoRM takes advantage of high-performance parallel
file systems like GPFS (from IBM).
– In addition, standard POSIX file systems are supported
(XFS from SGI and ext3).
• StoRM takes advantage of the ACL support provided by the
underlying file systems to implement the security
models.
35. Types of Storage Elements /3
• dCache. It consists of a server and one or more pool
nodes. The server represents the single point of access
to the SE and presents files in the pool disks under a
single virtual file system tree. Nodes can be
dynamically added to the pool. The native gsidcap
protocol allows POSIX-like data access. dCache is
widely employed as disk buffer frontend to many mass
storage systems, like HPSS and Enstore, as well as a
disk-only storage system.
• LCG Disk pool manager. It’s a lightweight disk pool
manager, suitable for relatively small sites (max 10 TB
of total space). Disks can be added dynamically to the
pool at any time. Like in dCache and CASTOR, a virtual
file system hides the complexity of the disk pool
architecture. The secure RFIO protocol allows file
access from the WAN.
37. The Storage Resource Manager
The Storage Resource Manager (SRM) has been
designed to be the single interface for the
management of disk and tape storage resources.
Any type of Storage Element in WLCG/EGEE offers
an SRM interface except for the Classic SE, which
is being phased out.
SRM hides the complexity of the resources setup
behind it and allows the user to request files,
keep them on a disk buffer for a specified lifetime,
reserve space for new entries, and so on.
– In gLite, interactions with the SRM are hidden
by high-level services (DM tools and APIs)
39. Grid file referencing schemes
LFN GUID SURL TURL
• Logical File Name (LFN)
– lfn:/grid/gilda/tutorials/input-file
• Grid Unique IDentifier (GUID)
– guid:4d57edef-fa5c-4512-a345-1c838916b357
• Storage URL (for a specific replica, on a specific Storage
Element)
– srm://aliserv6.ct.infn.it/gilda/generated/2007-11-13/fileb366f371-b2c0-485d-b12c-c114edaf4db4
– sfn://se01.athena.hellasgrid.gr/data/dteam/doe/file1
• Transport URL (for a specific replica, on an SE, with a specific
protocol)
– gsiftp://aliserv6.ct.infn.it/gilda/generated/2007-11-13/fileb366f371-b2c0-485d-b12c-c114edaf4db4
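The four naming schemes can be told apart by their prefix. The Python sketch below is a simplified classifier built from the examples above. Its TURL scheme set (gsiftp, rfio, gsidcap) follows the protocols described in this tutorial, which is an assumption and not an exhaustive list. It also extracts the storage host from a SURL or TURL.

```python
from urllib.parse import urlparse

# ASSUMPTION: scheme sets derived from the examples in this tutorial, not exhaustive.
SURL_SCHEMES = {"srm", "sfn"}
TURL_SCHEMES = {"gsiftp", "rfio", "gsidcap"}


def kind(ref: str) -> str:
    """Classify a Grid file reference as LFN, GUID, SURL or TURL."""
    if ref.startswith("lfn:"):
        return "LFN"
    if ref.startswith("guid:"):
        return "GUID"
    scheme = urlparse(ref).scheme
    if scheme in SURL_SCHEMES:
        return "SURL"
    if scheme in TURL_SCHEMES:
        return "TURL"
    raise ValueError(f"unrecognised Grid file reference: {ref}")


def storage_host(ref: str) -> str:
    """Host of the SE holding a replica (SURL or TURL only)."""
    if kind(ref) not in ("SURL", "TURL"):
        raise ValueError("LFNs and GUIDs do not name a storage host")
    return urlparse(ref).hostname or ""


for ref in ("lfn:/grid/gilda/tutorials/input-file",
            "guid:4d57edef-fa5c-4512-a345-1c838916b357",
            "sfn://se01.athena.hellasgrid.gr/data/dteam/doe/file1"):
    print(kind(ref), ref)
```

Only SURLs and TURLs name a particular replica on a particular SE. An LFN or GUID must first be resolved through the file catalogue.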
41. Needles in a haystack
• How do I keep track of all the files I have on the Grid?
• How does the Grid keep track of the mapping
between LFN(s), GUID and SURL(s)?
LFC File Catalogue
LFC = LCG File Catalogue
LCG = LHC Computing Grid
LHC = Large Hadron Collider
• The LCG File Catalogue is the service which
maintains mappings between LFN(s), GUID
and SURL(s).
42. LFC File Catalogue
• It consists of a single catalogue, where the LFN is the
main key. Further LFNs can be added as symlinks to the
main LFN.
– Looks like a “top-level” directory in the Grid
– For each supported VO, a separate subdirectory
exists under the “/grid” directory
– All the members of the VO have read/write
permissions
– System metadata are supported, while for user
metadata only a single string entry is available
• The catalogue publishes its endpoint in the Information
Service so that it can be discovered by Data
Management tools and other services (the WMS for
example).
43. Architecture of the LFC Catalogue
• LFN acts as main key in the database.
It has:
– Symbolic links to it (additional LFNs)
– System metadata
– Information on replicas
– One field of user metadata
– Access Control Lists
– Integration with VOMS
(VirtualID and VirtualGID)
– C API
44. Before starting...
• Users can interact with the file catalogue through CLIs
and APIs.
– The environment variable LFC_HOST
(e.g.: LFC_HOST=lfc-gilda.ct.infn.it)
must contain the host name of the LFC server
to be used.
• The directory structure of the LFC namespace has the
form: /grid/<VO>/<subpaths>
– Users of a given VO will have read and write
permissions only under the corresponding
<VO> subdirectory.
45. LFC Commands
lfc-chmod Change access mode of the LFC file/directory
lfc-chown Change owner and group of the LFC file/directory
lfc-delcomment Delete the comment associated with the
file/directory
lfc-getacl Get file/directory access control lists
lfc-ln Make a symbolic link to a file/directory
lfc-ls List file/directory entries in a directory
lfc-mkdir Create a directory
lfc-rename Rename a file/directory
lfc-rm Remove a file/directory
lfc-setacl Set file/directory access control lists
lfc-setcomment Add/replace a comment
46. lfc-ls
• Listing the entries of an LFC directory
– lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
– where path specifies the LFN pathname (mandatory)
– Remember that LFC has a directory tree structure:
/grid/<VO_name>/<you create it>
(/grid/<VO_name> is the LFC namespace; everything below it is defined by the user)
– All members of a VO have read-write permissions under
their directory
– You can set LFC_HOME to use relative paths
lfc-ls /grid/gilda/tutorials/taipei02
export LFC_HOME=/grid/gilda/tutorials
lfc-ls -l taipei02
lfc-ls -l -R /grid
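The path rules above can be expressed in two small functions. Absolute paths are used as-is, and relative ones are taken relative to LFC_HOME. VO members may write under /grid/<VO>. The Python sketch below is an illustration of those rules, not LFC client code. A real script would read LFC_HOME from the environment, for example with `os.environ.get("LFC_HOME")`.

```python
import posixpath
from typing import Optional


def resolve(path: str, lfc_home: Optional[str] = None) -> str:
    """Absolute LFC paths are kept; relative ones are taken relative to LFC_HOME."""
    if path.startswith("/"):
        return path
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.join(lfc_home, path)


def under_vo(path: str, vo: str) -> bool:
    """True if path lies in /grid/<VO>, where members of that VO may write."""
    root = f"/grid/{vo}"
    return path == root or path.startswith(root + "/")


# Mirrors: export LFC_HOME=/grid/gilda/tutorials ; lfc-ls -l taipei02
lfn = resolve("taipei02", lfc_home="/grid/gilda/tutorials")
print(lfn, under_vo(lfn, "gilda"))
```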
47. lfc-mkdir
• Creating directories in the LFC
– lfc-mkdir [-m mode] [-p] path...
• Where path specifies the LFC pathname
• Remember that while registering a new file (using lcg-cr,
for example) the corresponding destination
directory must be created in the catalog beforehand.
• Examples:
lfc-mkdir /grid/gilda/<YOUR_DIRECTORY>
(<YOUR_DIRECTORY> is created by the user)
48. lfc-ln
• Creating a symbolic link
– lfc-ln -s file linkname
– lfc-ln -s directory linkname
– Create a link to the specified file or directory with
linkname
Examples:
– lfc-ln -s /grid/gilda/test /grid/gilda/aLink
(/grid/gilda/test is the original file, /grid/gilda/aLink the symbolic link)
Let’s check the link using lfc-ls with long listing
– lfc-ls -l aLink
lrwxrwxrwx 1 19122 1077 0 Jun 14 11:58 aLink -> /grid/gilda/test
49. Access Control List (ACL)
• LFC allows users to attach an access control list (ACL) to a file
or directory: a list of permissions which specify who is allowed to
access or modify it. The permissions are very much like those of
a UNIX file system: read (r), write (w) and execute (x).
• In LFC, users and groups are internally identified by numerical
virtual uids and virtual gids, which are virtual in the sense that
they exist only in the LFC namespace.
– A user can be specified as a name, as a virtual uid or as a
DN.
– A group can be specified as a name, as a virtual gid or as a
VOMS FQAN.
• A directory in LFC also has a default ACL (which is the ACL
applied to any file or directory created under that
directory). After creation, the ACLs can be freely changed.
– When creating a sub-directory, its default ACL is inherited
from the parent directory
50. Print the ACL of a directory
$ lfc-getacl /grid/gilda/tutorials/test-acl
# file: /grid/gilda/tutorials/test-acl
# owner: /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca/Email=giuseppe.larocca@ct.infn.it
# group: gilda
user::rwx
group::rwx #effective:rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x
In this example, the owner and all users in the gilda group
have full privileges to the directory, while other users cannot
write into it.
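When auditing many directories, it helps to read the lfc-getacl output programmatically. The Python sketch below assumes the output format shown above. It maps each entry (such as `user::` or `default:other::`) to its rwx string and converts that string to the single octal digit used for numeric permissions (for example 6 for rw- and 0 for ---).

```python
from typing import Dict


def parse_getacl(text: str) -> Dict[str, str]:
    """Map each ACL entry prefix (e.g. 'user::', 'default:other::') to its rwx string."""
    acl: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop '# file:' headers and '#effective:' notes
        if not line:
            continue
        cut = line.rfind(":") + 1             # the permission follows the last ':'
        acl[line[:cut]] = line[cut:]
    return acl


def to_octal(perm: str) -> int:
    """'rwx' -> 7, 'rw-' -> 6, 'r-x' -> 5, '---' -> 0."""
    weights = {"r": 4, "w": 2, "x": 1}
    return sum(weights[c] for c in perm if c != "-")


OUTPUT = """# file: /grid/gilda/tutorials/test-acl
# owner: /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Giuseppe La Rocca/Email=giuseppe.larocca@ct.infn.it
# group: gilda
user::rwx
group::rwx #effective:rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x"""

acl = parse_getacl(OUTPUT)
print({entry: to_octal(perm) for entry, perm in acl.items()})
```

The parsed result confirms the reading above: `other::` is `r-x` (5), so other users cannot write into the directory.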
51. Modify the ACL
lfc-setacl [-d] [-m] [-s] acl_entries path
The -m option means that we are modifying the existing
ACL. Other options of lfc-setacl are -d to remove ACL
entries, and -s to replace the complete set of ACL
entries.
acl_entries is a comma-separated list of entries. Each entry
has colon-separated fields: ACL type, id (uid or gid),
permission. Only directories can have default ACL
entries!
The entries look like:
user::perm          default:user::perm
user:uid:perm       default:user:uid:perm
group::perm         default:group::perm
group:gid:perm      default:group:gid:perm
mask:perm           default:mask:perm
other:perm          default:other:perm
52. Modify the ACL of a directory
Let's change the default ACL, giving read/write
permission to user and group, and no privileges
to others.
– The syntax we apply here is modify (-m),
default (d:) for user (u:), and likewise
for group (g:) and others (o:).
$ lfc-setacl -m d:u::6,d:g::6,d:o:0 $LFC_HOME/test-acl/
53. Adding metadata information
The lfc-setcomment and lfc-delcomment commands allow the
user to associate a comment with a catalogue entry and delete
such comment. This is the only user-defined metadata that
can be associated with catalogue entries.
The comments for the files may be listed using the --comment
option of the lfc-ls command. This is shown in the following
example:
$ lfc-setcomment /grid/gilda/file1 "My metadata"
$ lfc-ls --comment /grid/gilda/file1
/grid/gilda/file1 My metadata
54. LCG Data Management Client Tools
• The LCG Data Management tools allow users to copy files between
UI, WN and a SE, to register entries in the file catalogue and
replicate files between SEs.
lcg-cp Copies a Grid file to a local destination
lcg-cr Copies a file to a SE and registers it in the catalogue
lcg-del Deletes one file (either one replica or all the replicas)
lcg-rep Copies a file from one SE to another SE and registers it
in the catalogue
lcg-gt Gets the TURL for a given SURL and transfer protocol
lcg-aa Adds an alias in the catalogue for a given GUID
lcg-ra Removes an alias in the catalogue for a given GUID
lcg-rf Registers in the catalogue a file residing on a SE
lcg-uf Unregisters in the catalogue a file residing on a SE
lcg-la Lists the aliases for a given LFN, GUID or SURL
lcg-lg Gets the GUID for a given LFN or SURL
lcg-lr Lists the replicas for a given LFN, GUID or SURL
55. Environment variables /1
• The --vo <vo name> option, to specify the virtual
organisation of the user, is present in all commands,
except for lcg-gt. Its usage is mandatory unless the
variable LCG_GFAL_VO is set (e.g.: export
LCG_GFAL_VO=gilda)
Timeouts
The commands lcg-cr, lcg-del, lcg-gt, lcg-rf, lcg-sd and
lcg-rep all have timeouts implemented.
By using the option -t, the user can specify a number of
seconds for the timeout.
The default is 0 seconds, that is no timeout.
If a timeout occurs while an operation is being performed,
all actions performed up to that moment are
rolled back, so no broken files are left on a SE and no
non-existent files are registered in the catalogues.
56. Environment variables /2
• For all lcg-* commands to work, the environment
variable LCG_GFAL_INFOSYS must be set to point to a
top BDII in the format <hostname>:<port>, so that
the commands can retrieve the necessary information
export LCG_GFAL_INFOSYS=gilda-bdii.ct.infn.it:2170
• The VO_<VO>_DEFAULT_SE variable specifies the
default SE for the VO.
export VO_GILDA_DEFAULT_SE=aliserv6.ct.infn.it
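A quick pre-flight check can catch a missing variable before any lcg-* command fails. The Python sketch below is a hypothetical helper that uses only the variables described above. It assumes that the VO name is uppercased to build VO_<VO>_DEFAULT_SE, as in the VO_GILDA_DEFAULT_SE example, and it applies only a rough hostname:port pattern to LCG_GFAL_INFOSYS.

```python
import os
import re
from typing import List, Mapping


def check_lcg_env(env: Mapping[str, str], vo: str) -> List[str]:
    """Return the problems found in the lcg-* environment (an empty list means OK)."""
    problems: List[str] = []
    if not re.fullmatch(r"[A-Za-z0-9.-]+:\d+", env.get("LCG_GFAL_INFOSYS", "")):
        problems.append("LCG_GFAL_INFOSYS must be <hostname>:<port> of a top BDII")
    if not env.get("LCG_GFAL_VO"):
        problems.append("LCG_GFAL_VO is not set: pass --vo <vo name> to every command except lcg-gt")
    default_se = f"VO_{vo.upper()}_DEFAULT_SE"   # e.g. VO_GILDA_DEFAULT_SE
    if not env.get(default_se):
        problems.append(f"{default_se} is not set: give lcg-cr an explicit -d <destination>")
    return problems


for problem in check_lcg_env(os.environ, "gilda"):
    print("warning:", problem)
```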
57. Uploading a file to the Grid /1
$ lcg-cr --vo gilda -d aliserv6.ct.infn.it file:/home/larocca/file1
guid:6ac491ea-684c-11d8-8f12-9c97cebf582a
where the only argument is the local file to be
uploaded and the -d <destination> option
indicates the SE used as the destination for the
file. The command returns the file GUID.
If no destination is given, the SE specified by the
VO_<VO>_DEFAULT_SE environment variable is taken.
The -P option allows the user to specify a relative path
name for the file in the SE. If no -P option is given, the
relative path is automatically generated.
58. Uploading a file to the Grid /2
The following are examples of the different ways to
specify a destination:
-d aliserv6.ct.infn.it
-d srm://aliserv6.ct.infn.it/data/gilda/my_file
-d aliserv6.ct.infn.it -P my_dir/my_file
The -l <lfn> option can be used to specify an LFN:
$ lcg-cr --vo gilda -d aliserv6.ct.infn.it -l lfn:/grid/gilda/myalias1 file:/home/larocca/file1
guid:db7ddbc5-613e-423f-9501-3c0c00a0ae24
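For scripted uploads, the lcg-cr options shown above can be assembled programmatically, and the returned GUID can be captured. The Python sketch below uses only the --vo, -d, -P and -l options described here. It assumes that lcg-cr prints the GUID on standard output, as in the examples.

```python
import re
import subprocess
from typing import List, Optional

GUID_RE = re.compile(r"guid:[0-9a-fA-F-]{36}")


def lcg_cr_command(local_file: str, vo: str, dest: Optional[str] = None,
                   rel_path: Optional[str] = None, lfn: Optional[str] = None) -> List[str]:
    """Build an lcg-cr argument list from the --vo, -d, -P and -l options."""
    cmd = ["lcg-cr", "--vo", vo]
    if dest:
        cmd += ["-d", dest]          # without -d, VO_<VO>_DEFAULT_SE is used
    if rel_path:
        cmd += ["-P", rel_path]      # relative path of the file on the SE
    if lfn:
        cmd += ["-l", lfn]           # LFN to register in the catalogue
    cmd.append("file:" + local_file)
    return cmd


def parse_guid(output: str) -> str:
    """Extract the GUID that lcg-cr prints on success."""
    match = GUID_RE.search(output)
    if match is None:
        raise ValueError("no GUID found in lcg-cr output")
    return match.group(0)


def upload(local_file: str, vo: str, **options: Optional[str]) -> str:
    """Run lcg-cr on the UI and return the new file's GUID."""
    result = subprocess.run(lcg_cr_command(local_file, vo, **options),
                            check=True, capture_output=True, text=True)
    return parse_guid(result.stdout)


print(" ".join(lcg_cr_command("/home/larocca/file1", "gilda",
                              dest="aliserv6.ct.infn.it", lfn="lfn:/grid/gilda/myalias1")))
```

On a configured UI, `upload("/home/larocca/file1", "gilda", dest="aliserv6.ct.infn.it", lfn="lfn:/grid/gilda/myalias1")` would run the second command shown above and return its GUID.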
59. Replicating a file
$ lcg-rep -v --vo gilda -d <SECOND_SE> guid:db7ddbc5-613e-423f-9501-3c0c00a0ae24
Source URL:
sfn://aliserv6.ct.infn.it/data/gilda/larocca/file1
File size: 30
Destination specified: <SECOND_SE>
Source URL for copy:
gsiftp://aliserv6.ct.infn.it/data/gilda/larocca/file1
Destination URL for copy:
gsiftp://<SECOND_SE>/data/gilda/generated/2004-07-09/file50c0752c-f61f-4bc3-b48e-af3f22924b57
# streams: 1
Transfer took 2040 ms
Destination URL registered in LRC:
srm://<SECOND_SE>/data/gilda/generated/2004-07-09/file50c0752c-f61f-4bc3-b48e-af3f22924b57
60. Listing replicas
$ lcg-lr --vo gilda lfn:/grid/gilda/tutorials/larocca/my_alias1
srm://aliserv6.ct.infn.it/data/gilda/generated/2004-07-09/file79aee616-6cd7-4b75-8848-f091
srm://<SECOND_SE>/data/gilda/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930
Again, an LFN, the GUID or a SURL can be used to specify
the file.
61. Copying files out of the Grid
$ lcg-cp --vo gilda -t 100 -v lfn:/grid/gilda/tutorials/mytext.txt file:/tmp/mytext.txt
Source URL: lfn:/grid/gilda/mytext.txt
File size: 104857600
Source URL for copy:
gsiftp://aliserv6.ct.infn.it:/storage/gilda/2007-07-06/input2.dat.10.0
Destination URL: file:///tmp/myfile
# streams: 1
# set timeout to 100 (seconds)
85983232 bytes 8396.77 KB/sec avg 9216.11
Transfer took 12040 ms
62. Deleting replicas /1
A file stored on a SE and registered in LFC can be
deleted using the lcg-del command.
• If a SURL is provided as argument, then that
particular replica will be deleted.
• If an LFN or GUID is given instead, then the -s <SE>
option must be used to indicate which one of the
replicas must be erased.
$ lcg-del --vo gilda -s aliserv6.ct.infn.it guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
63. Deleting replicas /2
• If the -a option is used, all the replicas of the given file
will be deleted and unregistered from the catalog.
$ lcg-del --vo gilda -a guid:91b89dfe-ff95-4614-bad2-c538bfa28fac
64. Registering Grid files
The lcg-rf command registers a file physically
present in a SE, creating a GUID-SURL mapping in the
catalogue.
The -g <GUID> option allows the user to specify a GUID (otherwise
one is created automatically).
$ lcg-rf --vo gilda -l lfn:/grid/gilda/newfile srm://aliserv6.ct.infn.it/data/gilda/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930de1
guid:baddb707-0cb5-4d9a-8141-a046659d243b
65. Unregistering Grid files
lcg-uf deletes a GUID-SURL mapping
(respectively the first and second argument of the
command) from the catalogue:
$ lcg-uf --vo gilda guid:baddb707-0cb5-4d9a-8141-a046659d243b srm://aliserv6.ct.infn.it/data/gilda/generated/2004-07-08/file0dcabb46-2214-4db8-9ee8-2930de1
If the last replica of a file is unregistered, the
corresponding GUID-LFN mapping is also removed.
Attention!
lcg-uf just removes entries from the catalogue: the file itself is not deleted from the SE.
66. Working with large datasets
• The InputSandbox and OutputSandbox attributes are the
basic way to move files to and from the User Interface
(UI) and the Worker Node (WN).
• However, there are other ways to move files to and from
the WN, especially when large files (> 10 MB) are involved.
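67. [Figure: data flow for large datasets. The User Interface sends the Input "sandbox" to the WMS and receives the Output "sandbox"; the WMS gets DataSets info from the LCG File Catalogue (LFC) and passes the Input "sandbox" + Broker Info to the Computing Element; a Storage Element is also shown.]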
69. Running more realistic jobs
with the GENIUS Grid portal:
Porting “BLAST” & “MrBayes”
applications to the Grid
Case study from
CNR - ITB
70. The GENIUS Grid Portal architecture
www.enginframe.com
www.nice-italy.com
www.infn.it
• The GENIUS Grid portal (the version 4.2 license is free for educational use)
is built on top of the EnginFrame Java/XML framework;
• It is a gateway to the European EGEE Project middleware (and it is
easily customizable for other middleware);
• It exposes gLite-enabled applications via web browsers
as well as Web Services.
71. What is EnginFrame ?
• It is a web-based technology able to expose Grid
services running on Grid infrastructures
• It allows organizations to provide application-oriented
computing and data services to both users (via Web
browsers) and applications (via SOAP/WSDL and/or
RSS)
• It’s a Grid gateway!!
• It greatly simplifies the development of Web Portals
exposing computing services that can run on a broad
range of different computational Grid systems
72. About MrBayes
• MrBayes is a program for the Bayesian estimation of
phylogeny.
• Bayesian inference of phylogeny is based on the posterior
probability distribution of trees, which is the probability of
a tree conditioned on the observations.
– To approximate the posterior probability distribution of
trees MrBayes uses a simulation technique called
Markov Chain Monte Carlo (or MCMC).
• The program takes as input a character matrix in a NEXUS
file format.
• The output consists of several files containing the parameters
sampled by the MCMC algorithm.
• The application is CPU-intensive, especially if the MPI
version of the software is used.
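MrBayes samples trees and model parameters, which is far beyond a short example. To show only the MCMC idea, the Python sketch below swaps in a toy problem: estimating the posterior of a coin's bias θ after 7 heads in 10 tosses, with a uniform prior. The sampler is a random-walk Metropolis algorithm. The exact posterior is Beta(8, 4), with mean 8/12 ≈ 0.667, which gives a check on the sampler. None of this is MrBayes code. The step size, burn-in and seed are arbitrary choices.

```python
import math
import random
from typing import List


def log_posterior(theta: float, heads: int, tosses: int) -> float:
    """Unnormalised log posterior: uniform prior times binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + (tosses - heads) * math.log(1.0 - theta)


def metropolis(heads: int, tosses: int, steps: int = 20000, burn_in: int = 2000,
               step_size: float = 0.1, seed: int = 1) -> List[float]:
    """Random-walk Metropolis sampler for the coin bias theta."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tosses)
    samples: List[float] = []
    for i in range(steps):
        proposal = theta + rng.gauss(0.0, step_size)      # symmetric proposal
        candidate = log_posterior(proposal, heads, tosses)
        # Accept with probability min(1, posterior(proposal) / posterior(theta)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:                                   # discard the warm-up phase
            samples.append(theta)
    return samples


samples = metropolis(heads=7, tosses=10)
print(sum(samples) / len(samples))   # close to the exact posterior mean 8/12
```

Only ratios of posterior values are needed, so the normalising constant never has to be computed. This property is what makes MCMC practical for posterior distributions over trees.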
76. About BLAST
BLAST (Basic Local Alignment Search Tool) provides a
method for rapid searching of nucleotide and protein
databases.
The program compares nucleotide or protein sequences to
sequence databases and calculates the statistical
significance of matches.