Workflow tutorial @
    ISSGC’09
          Gergely Sipos
           MTA SZTAKI
         sipos@sztaki.hu

    EGEE Training and Induction
  EGEE Application Porting Support


   www.lpds.sztaki.hu/gasuc
    www.portal.p-grade.hu
It’s already Day 10…




Agenda of the morning


9-10:30 – Lecture room
• Introduction to workflow systems and problems
• P-GRADE Portal as an implementation with demo

Break

11-12:30 – Computer room
• Hands-on: workflows, parameter studies
• Further information and next steps




Many of my slides were taken from

•   Abu Zafar Abbasi
•   Peter Kacsuk
•   Johan Montagnat
•   Tristan Glatard
•   Ewa Deelman




Workflow

  The automation of a business process, in whole or
  part, during which documents, information or tasks
  are passed from one participant to another for
  action, according to a set of procedural rules to
  achieve, or contribute to, an overall business goal.
                       Workflow Reference Model, 19/11/1998


                                  www.wfmc.org

• A workflow management system (WFMS) is the software
  that carries out this automation


Why use workflows in Grid?

• Build distributed applications through
  orchestration of multiple services
     • A single job or a single service is good for nothing…
• Integration of multiple teams involved
     • Collaborative work
• Unit of reuse
     • (E-)science requires traceable, repeatable analysis
• (Typically) eases the use of grids
     • Graphical representation


Grid Workflow definition examples


Grid workflow can be defined as the composition of grid
application services which execute on heterogeneous
and distributed resources in a well-defined order to
accomplish a specific goal.
                                              R. Buyya

The automation of the processes, which involves the
orchestration of a set of Grid services, agents and actors
that must be combined together to solve a problem or to
define a new service.
                                Geoffrey Fox [GGF 10]


Example: Ultra-short range weather
         forecast with P-GRADE Portal
Forecasting dangerous weather situations (storms, fog, etc.) is a
crucial task in the protection of life and property.

Processed information: surface-level measurements, high-altitude
measurements, radar, satellite, lightning, results of previously
computed models

Requirements:
• Execution time < 10 min
• High resolution (1 km)

Execution on a GT2-based Hungarian Grid

[Figure: workflow graph; node labels 25 x, 10 x, 25 x, 5 x]
Example: Montage workflow with Pegasus (and DAGMan)




                                                                                     Tasks run on NSF’s TeraGrid

   Montage application
   ~7,000 compute jobs in instance
   ~10,000 nodes in the executable
   workflow
   same number of clusters as
   processors
   speedup of ~15 on 32 processors




Pegasus: a Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Ewa Deelman, Gurmeet Singh, Mei-Hui Su, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, G. Bruce Berriman, John Good, Anastasia Laity, Joseph C. Jacob, Daniel S. Katz, Scientific Programming Journal, Volume 13, Number 3, 2005
Example: CancerGrid workflow
                      with gUSE (and WS-PGRADE)



N=20e-30e, M=100: ~2.7 billion tasks!

[Figure: CancerGrid workflow graph with two generator jobs, accessed through the CancerGrid Portal; node multiplicities range from 1 through N to N x M]



                   Workflow is hidden from end users
                Tasks run on Desktop Grids and RDBMS

http://www.cancergrid.eu/
Grid WFMS




Source: Jia Yu and Rajkumar Buyya: A Taxonomy of Workflow Management Systems for Grid Computing,
Journal of Grid Computing, Volume 3, Numbers 3-4, September 2005
What does a typical Grid WFMS provide?
• A level of abstraction above grid processes
   – gridftp, lcg-cr, lfc-mkdir, ...
   – condor_submit, globus-job-run, glite-wms-job-submit, ...
   – lcg-infosites, ...
• A level of abstraction above “legacy processes”
   – SQL read/write
   – HTTP file transfer
   – ...
• Automated mapping and execution of tasks on grid resources
   –   Submission of jobs
   –   Invocation of (Web) services
   –   Management of data
   –   Cataloguing of intermediate and final data products
• Improve successful application execution
• Improve application performance
• Provide provenance tracking capabilities
What does a typical grid
 workflow consist of?

               • Dataflow graph
               • Activities
                  – Definition of Jobs
                  – Specification of services
               • Data channels
                  – Data transfer
                  – Coordination
               • Cyclic / acyclic (DAG)
               • Conditional statements




Data lifecycle in workflows

[Figure: Data Lifecycle in a Workflow Environment. Cycle stages: Data Discovery, Data Analysis Setup, Data Processing, Derived Data and Provenance Archival. Surrounding components: Metadata Catalogs, Workflow Creation, Component Libraries, Workflow Template Libraries, Workflow Mapping and Execution, Data Replica Catalogs, Software Catalogs, Data Movement Services, Provenance Catalogs, Workflow Reuse]
User interaction

[Figure: the "Data Lifecycle in a Workflow Environment" diagram (Data Discovery, Data Analysis Setup, Data Processing, Derived Data and Provenance Archival, with the surrounding catalogs, libraries and services), annotated with the points of user interaction: Storages and Catalogs, WF definition tools, WF enactment service]
Layered architecture of WFMS

 Abstract Workflow

                Results




WF optimizer (e.g. Pegasus Mapper):
  A decision system that develops strategies for reliable and
  efficient execution in a variety of environments

WF scheduler (e.g. Condor DAGMan):
  Reliable and scalable execution of dependent tasks

Grid scheduler (e.g. Condor Schedd):
  Reliable, scalable execution of independent tasks (locally,
  across the network), priorities, scheduling


Cyberinfrastructure: Cluster, Condor pool, OSG, EGEE, TeraGrid
(Some of the) available grid
                 workflow systems
           http://www.gridworkflow.org
Categories for
    – Composition tools
    – Description languages
         • Scientific
         • Industrial
         • Formalism
    – Engines
Some relevant tools for ARC, gLite, Globus, UNICORE grid users
• Condor DAGMan
    – Used as an enactor in P-GRADE Portal, Pegasus, …
    – Uses DAGMan WF language (DAG = Directed Acyclic Graph)
•   MOTEUR
    – Interfaced with “pilot job” framework on EGEE (pull style job execution)
    – Uses SCUFL WF language
•   gLite WMS
    – Describe workflows in JDL
    – Share Input-Output sandboxes with multiple jobs
•   Taverna
    – Mainly for cluster computing
    – An ARC interface is available from the University of Lübeck
•   …
Workflow sharing: MyExperiment

http://www.myexperiment.org/
Current and Future Research
•   Workflow provenance
     – Reproducibility, traceability → trust in vitro simulations
•   Flexibility
     – Views at various levels: end user, application developer, grid operator, ...
•   Information sources
     – Heterogeneities, inconsistencies
•   Automation
     – Manual vs. automated workflow design; reasoning and planning
     – Semantics for operations and data
•   Interoperability
     – Reusability of applications
     – Complex workflow built from multiple sources
     – Standards vs future requirements
•   Collaborative usage
     – Versioning
     – Change management
•   Adaptive computing
     – Workflow refinement adapts to changing execution environment
     – Optimizing execution in multi-dimensional requirement spaces
     – Long-lived workflows


P-GRADE Portal

    A Grid WFMS

www.portal.p-grade.hu


Short History of P-GRADE portal

• Parallel Grid Application Development
  Environment
• Initial development started in the Hungarian
  SuperComputing Grid project in 2003
• It has been continuously developed since 2003
      • Around 30 man-years of development + training + user support
• Detailed information: http://portal.p-grade.hu/
• Open Source community development since
  January 2008:
  https://sourceforge.net/projects/pgportal/

• Current version: 2.8

Current P-GRADE Portal
                 related projects
• GGF GIN (Since 2006)
  – Providing the GIN Resource Testing portal
• EU EGEE-II, EGEE-III (2006-2010)
  – Tool recommended for application development
  – Intensively used in new users’ training
• EU SEE-GRID-SCI (2008-2010)
  – Interfacing to DSpace-based workflow storage
  – Infrastructure testing workflows
• EU CancerGrid (2007-2009)
  – Development of new generation P-GRADE (gUSE
    and WS-PGRADE)
  – Integration with desktop grids
• EU EDGeS (2008-2009)
  – Transparent access to Desktop Grid systems
Portal installations

P-GRADE Portal services:
– SEE-GRID infrastructure
– Several VOs of EGEE:
  •   Biomed, Astronomy, Central European, NA4,...
– GILDA: Training VO of EGEE
– Many national Grids (UK National Grid Service,
  HunGrid, Turkish Grid, etc.)
– US Open Science Grid, TeraGrid
– OGF Grid Interoperability Now (GIN) VO
– …

Portal services and account request:
http://portal.p-grade.hu/index.php?m=3&s=0
Account request form on portal login page

Multi-Grid portal installation:
www.lpds.sztaki.hu/multi-grid




Design principles of P-GRADE portal

• P-GRADE Portal is not only a user interface, it is a
   –   General purpose
   –   Workflow-level
   –   Multi-Grid
   –   Application Development and Execution Environment
• P-GRADE Portal includes a high-level middleware layer for
  orchestrating jobs on grid resources
   – inside a grid
   – among several different grids (and several VOs)
• P-GRADE Portal is grid-neutral:
   – Unlike many existing grid portals it is not tailored to any particular
     grid type
   – Can be connected to various grids based on different grid
     middleware
        • LCG-2, gLite, GT2, GT4, ARC, Unicore, etc.
   – Implements the high-level grid middleware services on top of the
     existing grid middleware services
   – The workflow interface is the same no matter which type of grid is
     connected to it

What is a P-GRADE Portal workflow?

• A directed acyclic
  graph where
   – Nodes represent jobs
     (batch programs to be
     executed on a computing
     element)
   – Ports represent
     input/output files the jobs
     expect/produce
   – Arcs represent file transfer
     operations

• Semantics of the
  workflow:
   – A job can be executed if
     all of its input files are
     available
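
The execution rule above (a job may start once all of its input files are available) can be sketched in a few lines of Python. This is only an illustration of the dataflow semantics, not the portal's or DAGMan's actual scheduler; the function name `schedule`, the job names, and the file names are all hypothetical.

```python
def schedule(jobs, initial_files):
    """Group jobs into waves; a job is ready once all its input files exist.

    jobs maps a job name to a pair (input_files, output_files).
    """
    available = set(initial_files)   # files that exist so far
    pending = dict(jobs)             # jobs that have not run yet
    waves = []
    while pending:
        ready = sorted(
            name for name, (inputs, _outputs) in pending.items()
            if set(inputs) <= available
        )
        if not ready:
            # Nothing can start: an input is missing or the graph has a cycle.
            raise ValueError(f"jobs with missing inputs: {sorted(pending)}")
        for name in ready:
            _inputs, outputs = pending.pop(name)
            available.update(outputs)  # "running" a job produces its outputs
        waves.append(ready)
    return waves


# Hypothetical four-job workflow: one split, two parallel branches, one merge.
workflow = {
    "split": (["input.dat"], ["a.dat", "b.dat"]),
    "left": (["a.dat"], ["a.out"]),
    "right": (["b.dat"], ["b.out"]),
    "merge": (["a.out", "b.out"], ["result.dat"]),
}
print(schedule(workflow, ["input.dat"]))
```

Jobs that land in the same wave have no file dependency on each other, which is where the branch parallelism of a workflow comes from.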


Three levels of parallelism
– Job level: Parallel execution inside a workflow node (MPI
  job as workflow component).
  Each job can be a parallel program.

– Workflow level: Parallel execution among workflow
  nodes (WF branch parallelism).
  Multiple jobs run in parallel.

– PS workflow level: Parameter study execution
  of the workflow.
  Multiple instances of the same workflow process
  different data files.



Example: Computational
     Chemistry

Department of Chemistry, University of Perugia

Solution of the Schrödinger equation for triatomic systems using a
time-dependent (RWAVEPR) or time-independent (ABC) method

• ~100 independent jobs to run
• A single execution can take between 5 and 10 hours
• Many simulations at the same time
• Sequential Fortran 90



Typical user scenario
               Job compilation phase




[Figure: Client, Portal server, Grid services. The client uploads job source(s) to the portal server, where they are compiled and edited; the client then downloads the binary(ies).]




Typical user scenario
              Workflow development phase




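[Figure: Client, Portal server, Grid services, DSpace WF repository. The client starts the editor, opens and edits the workflow, adds binaries, and saves the workflow to the portal server; the portal server can import a workflow from the DSpace WF repository.]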

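Typical user scenario
                 Workflow execution phase

[Figure: MyProxy certificate servers, Portal server, Grid services, Client. The portal server downloads proxy certificates from the MyProxy servers, transfers files and submits jobs to the Grid services, monitors the jobs, and downloads (small) results; the client visualizes job and workflow progress and downloads (small) results from the portal server.]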



Accessing local and remote files
Use legacy executables with Grid files without touching the code
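[Figure: Client, Portal server, Grid services (storage elements and file catalogs, computing elements). Local input files and executables travel from the client through the portal server to the computing elements; remote input files are read from, and remote output files are written to, the storage elements; local output files travel back through the portal server to the client. Only the permanent files!]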

P-GRADE Portal structural overview


                                 Java Webstart
              Web browser
                                 workflow editor



  DSpace           Extended DAGMan                          Globus GIIS
 repository         WF specification                         gLite BDII




                 Extended DAGMan
                  Globus and gLite
              command line clients + scripts




   EGEE, Globus (and ARC) Grid services + MyProxy service
           (gLite WMS, LFC,…; Globus GRAM, …)

Web interface - Portlets




Email notifications




NOTIFY




Workflow portlet



WORKFLOW EDITOR




Graphical workflow editing


•   To define a graph:
    –   Drag & drop components:
        jobs and ports
    –   Define their properties
    –   Connect ports by
        channels
        (no cycles, no loops)

        System generates JDL for
        each job automatically




Workflow Editor
   Properties of a job




                    Properties of a job:
                    • Executable file
                    • Type of executable
                      (Sequential / Parallel)
                    • Command line parameters
                    • Which resource to use?
                        • Which VO?
                        • Broker or Computing
                          element?




Workflow Editor
 Defining input-output files


                           File properties
                 Type:
                  input: the executable reads
                  output: the executable generates
                 File type:
                  local: comes from my desktop
                  remote: comes from an SE
                 File:
                     location of the file
                 Internal file name:
                    Executable uses this
                    e.g. fopen(“file.in”, …)
                 File storage type (output files only):
                  Permanent: final result
                  Volatile: temp. data channel




How to refer to an I/O file?

Input file
  Local file
  •   Client-side location:
      c:\experiments\11-04.dat
  Remote file
  •   LFC logical file name
      (LFC file catalog is required – EGEE VOs)
      lfn:/grid/gilda/sipos/11-04.dat
  •   GridFTP address (in Globus Grids):
      gsiftp://somengshost.ac.uk/mydir/11-04.dat

Output file
  Local file
  •   Client-side location:
      result.dat
  Remote file
  •   LFC logical file name
      (LFC file catalog is required – EGEE VOs)
      lfn:/grid/gilda/sipos/11-04_-_result.dat
  •   GridFTP address (in Globus Grids):
      gsiftp://somengshost.ac.uk/mydir/result.dat
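
As a rough illustration of the three reference styles on this slide, the sketch below tells them apart by their prefix. The function name `classify_file_reference` and the returned labels are hypothetical; the portal's own parsing rules are not described in the slides.

```python
def classify_file_reference(reference):
    """Return which kind of I/O file reference a string looks like."""
    if reference.startswith("lfn:"):
        # LFC logical file name, resolved through the LFC file catalog
        return "remote: LFC logical file name"
    if reference.startswith("gsiftp://"):
        # GridFTP address, as used in Globus grids
        return "remote: GridFTP address"
    # Anything else is treated here as a path on the user's machine
    return "local: client-side location"


for ref in [
    "result.dat",
    "lfn:/grid/gilda/sipos/11-04.dat",
    "gsiftp://somengshost.ac.uk/mydir/11-04.dat",
]:
    print(ref, "->", classify_file_reference(ref))
```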
Upload a workflow from client side
       or from FTP server



 UPLOAD




                STORED on FTP server




Importing an application




INCOMPLETE WORKFLOW → open it in the editor and save it again




Import a workflow from DSpace
          repository




External access to DSpace
http://pgrade-dspace.sztaki.hu




Certificate and proxy
management Portlet




OGF GIN interoperability portal by P-GRADE
Accessing Globus, gLite and ARC based grids/VOs simultaneously

[Figure: the P-GRADE GEMLCA portal and its GEMLCA Repository in the centre, accessing the surrounding grids/VOs with a separate proxy for each (Proxy 1 to Proxy 6)]

Application execution




Fault-tolerant execution

• Utilizing
   – Condor DAGMan’s rescue mechanism
   – EGEE job resubmission mechanism of WMS
• If the EGEE broker leaves a job stuck in a
  CE’s queue, the portal automatically
   – kills the job on this site and
   – resubmits the job to the broker, excluding this
     site.
• As a result
   – the portal guarantees the correct submission of a
     job as long as there exists at least one matching
     resource
   – job submission is reliable even in an unreliable grid
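
The kill-and-resubmit loop described above can be sketched as follows. This is a simplified model under stated assumptions, not the portal's code: the broker is replaced by "take the first site that is not prohibited", and `submit_until_running`, `is_stuck`, and the site names are hypothetical.

```python
def submit_until_running(job, matching_sites, is_stuck):
    """Resubmit a job, prohibiting every site where it got stuck.

    is_stuck(job, site) is a caller-supplied check (hypothetical) that
    reports whether the job is stuck in that site's queue.
    Returns the site that accepted the job and the prohibited sites.
    """
    prohibited = set()
    while True:
        candidates = [s for s in matching_sites if s not in prohibited]
        if not candidates:
            raise RuntimeError(f"no matching resource left for {job}")
        site = candidates[0]          # stand-in for the broker's choice
        if not is_stuck(job, site):
            return site, sorted(prohibited)
        # Stuck: kill the job on this site and prohibit it next time.
        prohibited.add(site)


stuck_sites = {"ce-a", "ce-b"}        # hypothetical misbehaving sites
site, prohibited = submit_until_running(
    "job-1", ["ce-a", "ce-b", "ce-c"], lambda job, s: s in stuck_sites
)
print(site, prohibited)
```

The loop terminates because each retry removes one site from the candidate list; it succeeds as long as at least one matching site does not leave the job stuck.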
Information system visualization




LFC-SE file browser portlet




Compilation support




WORKFLOW DEMO




From workflows to
parameter studies
Advanced execution patterns




Scaling up a workflow to a
                       parameter study


                                          Complete
                                          workflow




            P-GRADE Portal:
     Files in the same LFC catalog
(e.g. /grid/gilda/sipos/myinputs)


                     P-GRADE Portal:
                    Results produced in
                     the same catalog
Advanced parameter studies
• Initial input data
• Generator component(s): generate or cut input into smaller pieces
• Complete workflow
• Collector component(s): aggregate result

P-GRADE Portal: Files in the same LFC catalog
   (e.g. /grid/gilda/sipos/myinputs)
P-GRADE Portal: Results produced in the same catalog
Concept of parameter study
                          workflows



GEN: Generator part generates the input parameter space
SEQ (several parallel instances): Parameter study part
COLL: Collector part evaluates and integrates the results
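
The generator / parameter study / collector pattern can be sketched as a small Python program. It only mirrors the shape of the pattern on a single machine; the function names and the toy computation (squaring numbers and summing them) are hypothetical stand-ins for real grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def generate(count):
    """GEN: produce the input parameter space."""
    return [float(i) for i in range(count)]


def seq_job(parameter):
    """SEQ: one independent run of the parameter study part."""
    return parameter * parameter


def collect(results):
    """COLL: evaluate and integrate the individual results."""
    return sum(results)


def run_parameter_study(count, workers=4):
    parameters = generate(count)
    # The SEQ instances do not depend on each other, so they may run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(seq_job, parameters))
    return collect(results)


print(run_parameter_study(4))
```

The collector can only start after every SEQ instance has finished, which is why it sits at the end of the graph.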

Turning a WF into a parameter study



                     By switching at least one
                      of the open input ports
                     into a “PS Input port” the
                        WF is turned into a
                         Parameter Study




Input-output files are stored in SEs
/grid/gilda/sipos/InputImages:   Image.0, Image.1
/grid/gilda/sipos/XCoordinates:  XCoordinate.0, XCoordinate.1
/grid/gilda/sipos/YCoordinates:  YCoordinate.0, YCoordinate.1

CROSS PRODUCT of data items:
2 x 2 x 2 = 8 executions of the whole workflow

/grid/gilda/sipos/Output:        ImagePart.0, ImagePart.1, ...
Typical data-flow compositions

Inputs {A1, A2, A3} and {B1, B2, B3} feed an Activity / WF:

CROSS ITERATOR (X): all-to-all. Every Ai is paired with every Bj (A X B).
  P-GRADE Portal supports this.
DOT ITERATOR: one-to-one. A1 with B1, A2 with B2, A3 with B3.
MATCH ITERATOR (M): Ai is paired with Bj if Ai and Bj have a
  common ancestor (A M B).
  Find the dot and match iterators in TAVERNA, MOTEUR.
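
The difference between the cross and dot iterators maps directly onto two standard Python tools, `itertools.product` and `zip`. The sketch below uses them purely as an analogy; the helper names `cross` and `dot` are hypothetical, and the match iterator is left out because it needs provenance information about the data items.

```python
from itertools import product


def cross(*ports):
    """Cross iterator: all-to-all combination of the input sets."""
    return list(product(*ports))


def dot(*ports):
    """Dot iterator: one-to-one pairing of items at the same position."""
    return list(zip(*ports))


A = ["A1", "A2", "A3"]
B = ["B1", "B2", "B3"]
print(len(cross(A, B)))   # number of all-to-all pairs
print(dot(A, B))          # position-wise pairs

# Three PS input ports with two files each, as in the image example.
images = ["Image.0", "Image.1"]
xs = ["XCoordinate.0", "XCoordinate.1"]
ys = ["YCoordinate.0", "YCoordinate.1"]
print(len(cross(images, xs, ys)))
```

With a cross product the number of workflow executions is the product of the port sizes, so it grows quickly as ports are added.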



PS Input Port



              Grid
           Directory
           instead of
              FILE
           reference




Parameter generator

              Generator can be
              attached to any
              parameter input port

              Generator can be
              • Auto generator: to
              generate text files
              • Custom generator: to
              generate any content

              Generated files are
              moved into SE by the
              portal




Definition Window of Auto Generator Job


                            User defines the template
                            of the text file

                            User puts key(s) into the
                            template

                            User defines values for
                            the key(s)
                            • Integer number
                            • Real number
                            • Custom set
                            •…
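
The idea behind the auto generator (a text template with keys, plus a list of values per key) can be sketched with Python's `string.Template`. Two things here are assumptions, not documented portal behaviour: the `$KEY` placeholder syntax, and combining several keys as a cross product of their values. The function name `generate_inputs` is hypothetical.

```python
from itertools import product
from string import Template


def generate_inputs(template_text, key_values):
    """Return one generated text per combination of key values.

    key_values maps each key used in the template to its list of values.
    Keys are combined as a cross product (an assumption of this sketch).
    """
    keys = sorted(key_values)
    template = Template(template_text)
    combinations = product(*(key_values[key] for key in keys))
    return [template.substitute(dict(zip(keys, combo))) for combo in combinations]


# Hypothetical template with two keys, T and P.
files = generate_inputs(
    "temperature=$T\npressure=$P\n",
    {"T": [280, 300], "P": [1.0]},
)
for index, content in enumerate(files):
    print(f"--- generated file {index} ---")
    print(content, end="")
```

Each generated text would become one input file of the parameter study.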




Placement of result


• Use the default value!
• Will contain one compressed file for each execution of
  the workflow.
• Choose a “reliable” Storage Element




                                                         65
Executing PS workflows




PS Details for parameter sweep workflow applications

                             66
Detailed view of a PS workflow




 Generator job(s)


Overall statistics of
workflow instances


Workflow instances



 Collector job(s)
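The four parts of this view map onto a simple pattern: a generator produces the inputs, one workflow instance runs per input, statistics summarize the instance states, and a collector merges the outputs. The Python sketch below imitates that pattern locally with threads. The function names, the status strings and the simulated failure are assumptions for illustration and do not reflect the portal's internals.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def generator_job():
    """Produce one input per workflow instance (here: plain integers)."""
    return list(range(5))


def workflow_instance(value):
    """Stand-in for one execution of the workflow on a single input."""
    if value == 3:
        return ("error", None)          # simulated failed instance
    return ("finished", value * value)


def collector_job(results):
    """Merge the outputs of all successfully finished instances."""
    return sum(out for status, out in results if status == "finished")


inputs = generator_job()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() keeps the results in the same order as the inputs.
    results = list(pool.map(workflow_instance, inputs))

stats = Counter(status for status, _ in results)   # overall statistics
total = collector_job(results)
```

In this sketch the collector runs only after every instance has returned, and it skips the failed instance rather than aborting the whole study.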
                                                         67
PARAMETER STUDY
WORKFLOW DEMO




                  68
Learn once, use everywhere
Develop once, execute anywhere


         Thank you!

     www.portal.p-grade.hu
    pgportal@lpds.sztaki.hu




                                 69
Backup slides to answer
      questions




                          70
Proxy delegations
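[Figure: proxy delegation between the user, the MyProxy server, the VOMS server, the P-GRADE Portal server and GILDA services]
• Login & password based authentication: username and password are passed to the P-GRADE Portal server and on to the MyProxy server
• Proxy based authentication: the portal server holds a Proxy, obtains a VOMS ext. Proxy through the VOMS server, and presents it to GILDA services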
                                                                         71
Settings




           Portal administrator can
              – connect the portal to several grids
              – register default resources of the connected grids




                                     72
Settings




     User can customize the connected grids by adding and removing resources




                              73

Workflow tutorial overview and agenda

  • 1. Workflow tutorial @ ISSGC’09 Gergely Sipos MTA SZTAKI sipos@sztaki.hu EGEE Training and Induction EGEE Application Porting Support www.lpds.sztaki.hu/gasuc www.portal.p-grade.hu 1
  • 3. Agenda of the morning 9-10:30 – Lecture room • Introduction to workflow systems and problems • P-GRADE Portal as an implementation with demo Break 11-12:30 – Computer room • Hands-on: workflows, parameter studies • Further information and next steps 3
  • 4. Many of my slides were taken from • Abu Zafar Abbasi • Peter Kacsuk • Johan Montagnat • Tristan Glatard • Ewa Deelman 4
  • 5. Workflow The automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules to achieve, or contribute to, an overall business goal. Workflow Reference Model, 19/11/1998 www.wfmc.org • Workflow management system (WFMS) is the software that does it 5
  • 6. Why use workflows in Grid? • Build distributed applications through orchestration of multiple services • A single job or a single service is good for nothing… • Integration of multiple teams involved • Collaborative work • Unit of reuse • (E-)science requires traceable, repeatable analysis • (Typically) ease of use of grids • Graphical representation
  • 7. Grid Workflow definition examples Grid workflow can be defined as the composition of grid application services which execute on heterogeneous and distributed resources in a well-defined order to accomplish a specific goal. R. Buyya The automation of the processes, which involves the orchestration of a set of Grid services, agents and actors that must be combined together to solve a problem or to define a new service. Geoffrey Fox [GGF 10] 7
  • 8. Example: Ultra-short range weather forecast with P-GRADE Portal Forecasting dangerous weather situations (storms, fog, etc.), a crucial task in the protection of life and property Processed information: surface level measurements, high-altitude measurements, radar, satellite, lightning, results of previously computed models Requirements: • Execution time < 10 min • High resolution (1 km) Execution on a GT2 based Hungarian Grid [Workflow figure; node multiplicities: 25 x, 10 x, 25 x, 5 x]
  • 9. Example: Montage workflow with Pegasus (and DAGMan) Tasks run on NSF’s TeraGrid Montage application ~7,000 compute jobs in instance ~10,000 nodes in the executable workflow same number of clusters as processors speedup of ~15 on 32 processors Pegasus: a Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Ewa Deelman, Gurmeet Singh, Mei-Hui Su, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, G. Bruce Berriman, John Good, Anastasia Laity, Joseph C. Jacob, Daniel S. Katz, Scientific Programming Journal, Volume 13, Number 3, 2005
  • 10. Example: CancerGrid workflow with gUSE (and WS-PGRADE) N=20e-30e, M=100 → ~2.7 billion tasks !!! Workflow is hidden from end users (CancerGrid Portal) Tasks run on Desktop Grids and RDBMS http://www.cancergrid.eu/ [Workflow figure: generator jobs and nodes with multiplicities 1, N and NxM]
  • 11. Grid WFMS Source: Jia Yu and Rajkumar Buyya: A Taxonomy of Workflow Management Systems for Grid Computing, Journal of Grid Computing, Volume 3, Numbers 3-4 / September, 2005 11
  • 12. What does a typical Grid WFMS provide? • A level of abstraction above grid processes – gridftp, lcg-cr, lfc-mkdir, ... – condor_submit, globus-job-run, glite-wms-job-submit, ... – lcg-infosites, ... • A level of abstraction above „legacy processes” – SQL read/write – HTTP file transfer – ... • Automated mapping and execution of tasks on grid resources – Submission of jobs – Invocation of (Web) services – Manage data – Catalog intermediate and final data products • Improve successful application execution • Improve application performance • Provide provenance tracking capabilities
  • 13. What does a typical grid workflow consist of? • Dataflow graph • Activities – Definition of Jobs – Specification of services • Data channels – Data transfer – Coordination • Cyclic / acyclic (DAG) • Conditional statements
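Whether a workflow graph is acyclic (a DAG) can be checked mechanically. The following is a minimal sketch, assuming activities are given as a plain dictionary that maps each activity to the activities it depends on; the function name and the sample graphs are illustrative and do not come from any particular WFMS.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(dependencies):
    """Return True if the activity graph contains no cycle.

    dependencies maps each activity name to the set of activities
    whose output it needs (an assumed, illustrative representation).
    """
    try:
        # Iterating the static order raises CycleError on a cyclic graph.
        list(TopologicalSorter(dependencies).static_order())
        return True
    except CycleError:
        return False


# A diamond-shaped workflow: A feeds B and C, which both feed D.
dag = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
# Two activities that wait for each other form a cycle.
loop = {"A": {"B"}, "B": {"A"}}

print(is_acyclic(dag), is_acyclic(loop))
```

Engines that only accept DAGs can reject a cyclic description up front with a check of this kind.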
  • 14. Data lifecycle in workflows [Figure: Data Lifecycle in a Workflow Environment. Labels: Workflow Creation, Workflow Reuse, Workflow Mapping and Execution, Data Discovery, Data Analysis Setup, Data Processing, Derived Data and Provenance Archival, Metadata Catalogs, Component Libraries, Provenance Catalogs, Workflow Template Libraries, Data Movement Services, Data Replica Catalogs, Software Catalogs]
  • 15. User interaction [Figure: Data Lifecycle in a Workflow Environment, annotated with the user interaction points WF definition tools; Storages, Catalogs; WF enactment service. Labels: Workflow Creation, Workflow Reuse, Workflow Mapping and Execution, Data Discovery, Data Analysis Setup, Data Processing, Derived Data and Provenance Archival, Metadata Catalogs, Component Libraries, Provenance Catalogs, Workflow Template Libraries, Data Movement Services, Data Replica Catalogs, Software Catalogs]
  • 16. Layered architecture of WFMS Abstract Workflow in, Results out • WF optimizer (e.g. Pegasus Mapper): a decision system that develops strategies for reliable and efficient execution in a variety of environments • WF scheduler (e.g. Condor DAGMan): reliable and scalable execution of dependent tasks • Grid scheduler (e.g. Condor Schedd): reliable, scalable execution of independent tasks (locally, across the network), priorities, scheduling • Cyberinfrastructure: Cluster, Condor pool, OSG, EGEE, TeraGrid
  • 17. (Some of the) available grid workflow systems http://www.gridworkflow.org Categories for – Composition tools – Description languages • Scientific • Industrial • Formalism – Engines Some relevant tools for ARC, gLite, Globus, UNICORE grid users • Condor DAGMan – Used as an enactor in P-GRADE Portal, Pegasus, … – Uses DAGMan WF language (DAG = Directed Acyclic Graph) • MOTEUR – Interfaced with “pilot job” framework on EGEE (pull style job execution) – Uses SCUFL WF language • gLite WMS – Describe workflows in JDL – Share Input-Output sandboxes with multiple jobs • Taverna – Mainly for cluster computing – An ARC interface is available from Lübeck University • …
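As an illustration of the DAGMan WF language mentioned above, the sketch below writes a small DAG description using DAGMan's `JOB <name> <submit file>` and `PARENT <name> CHILD <name>` statements. The job names and submit file names are made up for the example, and the helper `to_dagman` is hypothetical; this is not how P-GRADE Portal or Pegasus generate their DAG files.

```python
def to_dagman(jobs, edges):
    """Render a DAGMan input file as text.

    jobs:  dict mapping a job name to its Condor submit description file
    edges: list of (parent, child) pairs, one dependency each
    """
    lines = [f"JOB {name} {submit}" for name, submit in jobs.items()]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


# Hypothetical three-job workflow: A must finish before B and C start.
dag_text = to_dagman(
    {"A": "a.sub", "B": "b.sub", "C": "c.sub"},
    [("A", "B"), ("A", "C")],
)
print(dag_text)
```

The resulting text could be saved to a file and handed to DAGMan, provided the referenced submit files exist.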
  • 18. Workflow sharing: MyExperiment http://www.myexperiment.org/
  • 19. Workflow sharing: MyExperiment http://www.myexperiment.org/
  • 20. Current and Future Research • Workflow provenance – Reproducibility, traceability → trust in vitro simulations • Flexibility – Views at various levels: end user, application developer, grid operator, ... • Information sources – Heterogeneities, inconsistencies • Automation – Manual vs. automated workflow design; reasoning and planning – Semantics for operations and data • Interoperability – Reusability of applications – Complex workflow built from multiple sources – Standards vs. future requirements • Collaborative usage – Versioning – Change management • Adaptive computing – Workflow refinement adapts to changing execution environment – Optimizing execution in multi-dimensional requirement spaces – Long-lived workflows
  • 21. P-GRADE Portal A Grid WFMS www.portal.p-grade.hu 21
  • 22. Short History of P-GRADE portal • Parallel Grid Application Development Environment • Initial development started in the Hungarian SuperComputing Grid project in 2003 • It has been continuously developed since 2003 • Around 30 man-years of development + training + user support • Detailed information: http://portal.p-grade.hu/ • Open Source community development since January 2008: https://sourceforge.net/projects/pgportal/ • Current version: 2.8
  • 23. Current P-GRADE Portal related projects • GGF GIN (Since 2006) – Providing the GIN Resource Testing portal • EU EGEE-II, EGEE-III (2006-2010) – Tool recommended for application development – Intensively used in new users’ training • EU SEE-GRID-SCI (2008-2010) – Interfacing to DSpace-based workflow storage – Infrastructure testing workflows • EU CancerGrid (2007-2009) – Development of new generation P-GRADE (gUSE and WS-PGRADE) – Integration with desktop grids • EU EDGeS (2008-2009) – Transparent access to Desktop Grid systems 23
  • 24. Portal installations P-GRADE Portal services: – SEE-GRID infrastructure – Several VOs of EGEE: • Biomed, Astronomy, Central European, NA4,... – GILDA: Training VO of EGEE – Many national Grids (UK National Grid Service, HunGrid, Turkish Grid, etc.) – US Open Science Grid, TeraGrid – OGF Grid Interoperability Now (GIN) VO – … Portal services and account request: http://portal.p-grade.hu/index.php?m=3&s=0 Account request form on portal login page 24
  • 26. Design principles of P-GRADE portal • P-GRADE Portal is not only a user interface, it is a – General purpose – Workflow-level – Multi-Grid – Application Development and Execution Environment • P-GRADE Portal includes a high-level middleware layer for orchestrating jobs on grid resources – inside a grid – among several different grids (and several VOs) • P-GRADE Portal is grid-neutral: – Unlike many existing grid portals it is not tailored to any particular grid type – Can be connected to various grids based on different grid middleware • LCG-2, gLite, GT2, GT4, ARC, Unicore, etc. – Implements the high-level grid middleware services on top of the existing grid middleware services – The workflow interface is the same no matter which type of grid is connected to it 26
  • 27. What is a P-GRADE Portal workflow? • A directed acyclic graph where – Nodes represent jobs (batch programs to be executed on a computing element) – Ports represent input/output files the jobs expect/produce – Arcs represent file transfer operations • semantics of the workflow: – A job can be executed if all of its input files are available 27
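The execution rule above (a job can run once all of its input files are available) can be sketched in a few lines. This is an illustrative model only: the job and file names are invented, jobs "run" instantly, and the function `run_workflow` is not part of P-GRADE Portal.

```python
def run_workflow(jobs, available_files):
    """Return the order in which jobs become executable.

    jobs maps a job name to (input_files, output_files); a job is ready
    when every one of its input files is in the set of available files.
    """
    available = set(available_files)
    pending = dict(jobs)
    order = []
    while pending:
        # All jobs in `ready` could run in parallel (workflow-level parallelism).
        ready = sorted(
            name for name, (inputs, _) in pending.items()
            if set(inputs) <= available
        )
        if not ready:
            raise RuntimeError(f"jobs can never start: {sorted(pending)}")
        for name in ready:
            _, outputs = pending.pop(name)
            order.append(name)
            available.update(outputs)  # produced files unlock later jobs
    return order


# Hypothetical four-node workflow with one fork and one join.
jobs = {
    "pre": (["raw.dat"], ["clean.dat"]),
    "simA": (["clean.dat"], ["a.out"]),
    "simB": (["clean.dat"], ["b.out"]),
    "merge": (["a.out", "b.out"], ["result.dat"]),
}
print(run_workflow(jobs, ["raw.dat"]))
```

Here `simA` and `simB` become ready in the same round, which is where branch parallelism would come from in a real enactor.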
  • 28. Three levels of parallelism – Job level: Parallel execution inside a workflow node (MPI job as workflow component); each job can be a parallel program – Workflow level: Parallel execution among workflow nodes (WF branch parallelism); multiple jobs run in parallel – PS workflow level: Parameter study execution of the workflow; multiple instances of the same workflow process different data files
  • 29. Example: Computational Chemistry Department of Chemistry, University of Perugia • Solution of the Schrödinger equation for triatomic systems using the time-dependent (RWAVEPR) or time-independent (ABC) method • ~100 independent jobs to run • A single execution can take between 5 and 10 hours • Many simulations at the same time • Sequential Fortran 90
  • 30. Typical user scenario Job compilation phase UPLOAD JOB SOURCE(S) Portal server Grid Client COMPILE – EDIT services DOWNLOAD BINARI(ES) 30
  • 31. Typical user scenario Workflow development phase Components: Client, Portal server, Grid services, DSpace WF repository Actions: START EDITOR, OPEN & EDIT WORKFLOW, ADD BINARIES, SAVE WORKFLOW, IMPORT WORKFLOW
  • 32. Typical user scenario Workflow execution phase Components: Client, Portal server, MyProxy Certificate servers, Grid services Actions: DOWNLOAD PROXY CERTIFICATES; TRANSFER FILES, SUBMIT JOBS; MONITOR JOBS; VISUALIZE JOBS and WORKFLOW PROGRESS; DOWNLOAD (SMALL) RESULTS
  • 33. Accessing local and remote files Use legacy executables with Grid files without touching the code Components: Portal server; Grid services (Storage elements and File catalogs, Computing elements) Data flows: LOCAL INPUT FILES & EXECUTABLES; REMOTE INPUT FILES; REMOTE OUTPUT FILES; LOCAL OUTPUT FILES (only the permanent files!)
  • 34. P-GRADE Portal structural overview [Figure labels: Web browser; Java Webstart workflow editor; DSpace repository; Extended DAGMan WF specification; Globus GIIS; gLite BDII; Extended DAGMan; Globus and gLite command line clients + scripts; EGEE, Globus (and ARC) Grid services + MyProxy service (gLite WMS, LFC,…; Globus GRAM, …)]
  • 35. Web interface - Portlets 35
  • 38. Graphical workflow editing • To define a graph: – Drag & drop components: jobs and ports – Define their properties – Connect ports by channels (no cycles, no loops) System generates JDL for each job automatically 38
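To give a feel for what a generated job description might contain, here is a rough sketch that builds a JDL text from a job's properties. It uses the common gLite JDL attributes `Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox` and `OutputSandbox`. The function `make_jdl`, the file names, and the exact set of attributes are assumptions for illustration; the JDL that the portal really generates is not shown in the slides.

```python
def jdl_list(items):
    """Format a Python list as a JDL list of quoted strings."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def make_jdl(executable, arguments, input_files, output_files):
    """Build a minimal JDL description for one job (illustrative only)."""
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # The executable travels to the worker node with the input files.
        f"InputSandbox = {jdl_list([executable] + input_files)};",
        f"OutputSandbox = {jdl_list(['std.out', 'std.err'] + output_files)};",
    ]
    return "\n".join(lines)


# Hypothetical job: a script reading file.in and producing file.out.
print(make_jdl("sim.sh", "-n 10", ["file.in"], ["file.out"]))
```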
  • 39. Workflow Editor: properties of a job • Executable file • Type of executable (Sequential / Parallel) • Command line parameters • Which resource to use? • Which VO? • Broker or Computing element?
  • 40. Workflow Editor: defining input-output files File properties • Type – input: the executable reads – output: the executable generates • File type – local: comes from my desktop – remote: comes from an SE • File: location of the file • Internal file name: the executable uses this, e.g. fopen(“file.in”, …) • File storage type (output files only) – Permanent: final result – Volatile: temp. data channel
  • 41. How to refer to an I/O file?
      Local file, client side location: input c:\experiments\11-04.dat; output result.dat
      Remote file, LFC logical file name (LFC file catalog is required – EGEE VOs): input lfn:/grid/gilda/sipos/11-04.dat; output lfn:/grid/gilda/sipos/11-04_-_result.dat
      Remote file, GridFTP address (in Globus Grids): input gsiftp://somengshost.ac.uk/mydir/11-04.dat; output gsiftp://somengshost.ac.uk/mydir/result.dat
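The three kinds of file reference on this slide can be told apart by their prefix. The sketch below is an assumed, simplified classifier (the function name and return strings are illustrative); it only looks at the `lfn:` and `gsiftp://` prefixes used in the examples above.

```python
def classify_reference(reference):
    """Classify an I/O file reference by its prefix (simplified sketch)."""
    if reference.startswith("lfn:"):
        return "remote: LFC logical file name"
    if reference.startswith("gsiftp://"):
        return "remote: GridFTP address"
    # Anything else is treated as a path on the user's own machine.
    return "local: client side location"


for ref in (
    r"c:\experiments\11-04.dat",
    "lfn:/grid/gilda/sipos/11-04.dat",
    "gsiftp://somengshost.ac.uk/mydir/11-04.dat",
):
    print(ref, "->", classify_reference(ref))
```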
  • 42. Upload a workflow from client side or from FTP server UPLOAD STORED on FTP server 42
  • 43. Importing an application INCOMPLETE WORKFLOW → Open it in editor and save it again
  • 44. Import a workflow from DSpace repository 44
  • 45. External access to DSpace http://pgrade-dspace.sztaki.hu 45
  • 47. OGF GIN interoperability portal by P-GRADE Accessing Globus, gLite and ARC based grids/VOs simultaneously Proxy 1 P-GRADE P-GRADE GEMLCA portal Portal Proxy 6 Proxy 2 Proxy 5 GEMLCA Repository Proxy 3 Proxy 4
  • 49. Fault-tolerant execution • Utilizing – Condor DAGMan’s rescue mechanism – EGEE job resubmission mechanism of WMS • If the EGEE broker leaves a job stuck in a CE’s queue, the portal automatically – kills the job on this site and – resubmits the job to the broker, prohibiting this site. • As a result – the portal guarantees the correct submission of a job as long as there exists at least one matching resource – job submission is reliable even in an unreliable grid
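The kill-and-resubmit behaviour described above can be modelled as a loop that keeps a list of prohibited sites. This is a toy sketch under stated assumptions: the broker's choice is replaced by "first matching site that is not prohibited", and `job_runs_on` is a hypothetical callback standing in for "the job left the queue and ran". It is not the portal's actual code.

```python
def submit_reliably(matching_sites, job_runs_on):
    """Resubmit a job until it runs, prohibiting each site where it got stuck.

    matching_sites: sites that satisfy the job's requirements
    job_runs_on:    callable(site) -> True if the job runs there,
                    False if it stays stuck in the queue
    Returns (site where the job ran, sorted list of prohibited sites).
    """
    prohibited = set()
    while True:
        candidates = [s for s in matching_sites if s not in prohibited]
        if not candidates:
            raise RuntimeError("no matching resource left")
        site = candidates[0]  # stand-in for the broker's decision
        if job_runs_on(site):
            return site, sorted(prohibited)
        # Stuck job: kill it on this site and exclude the site next time.
        prohibited.add(site)


# Hypothetical grid where only the third computing element works.
print(submit_reliably(["ce1", "ce2", "ce3"], lambda site: site == "ce3"))
```

As long as at least one matching site actually runs the job, the loop ends with a successful submission, which mirrors the guarantee stated on the slide.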
  • 51. LFC-SE file browser portlet 51
  • 54. From workflows to parameter studies Advanced execution patterns 54
  • 55. Scaling up a workflow to a parameter study Complete workflow P-GRADE Portal: Files in the same LFC catalog (e.g. /grid/gilda/sipos/myinputs) P-GRADE Portal: Results produced in the same catalog 55
  • 56. Advanced parameter studies • Initial input data • Generator component(s): generate or cut input into smaller pieces • Complete workflow • Collector component(s): aggregate result • P-GRADE Portal: Files in the same LFC catalog (e.g. /grid/gilda/sipos/myinputs) • P-GRADE Portal: Results produced in the same catalog
  • 57. Concept of parameter study workflows • GEN: Generator part generates the input parameter space • SEQ, SEQ, SEQ, SEQ: Parameter study part • COLL: Collector part evaluates and integrates the results
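The generator, parameter study and collector parts can be mimicked with three small functions. This is a sketch with invented work: the generator cuts a list into pieces, each SEQ job computes a sum of squares on its piece, and the collector adds the partial results. In a real parameter study the SEQ jobs would be independent grid jobs running in parallel.

```python
def generator(data, piece_size):
    """GEN: cut the initial input into smaller pieces."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def seq_job(piece):
    """SEQ: the work done on one piece (sum of squares as a placeholder)."""
    return sum(x * x for x in piece)


def collector(partial_results):
    """COLL: integrate the partial results into one final result."""
    return sum(partial_results)


def parameter_study(data, piece_size):
    pieces = generator(data, piece_size)
    # Each call below is independent of the others.
    partial_results = [seq_job(piece) for piece in pieces]
    return collector(partial_results)


print(parameter_study(list(range(10)), 3))
```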
  • 58. Turning a WF into a parameter study By switching at least one of the open input ports into a “PS Input port” the WF is turned into a Parameter Study 58
  • 59. Input-output files are stored in SEs • /grid/gilda/sipos/InputImages: Image.0, Image.1 • /grid/gilda/sipos/XCoordinates: XCoordinate.0, XCoordinate.1 • /grid/gilda/sipos/YCoordinates: YCoordinate.0, YCoordinate.1 CROSS PRODUCT of data items: 2x2x2=8 executions of the whole workflow • /grid/gilda/sipos/Output: ImagePart.0, ImagePart.1, ...
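The count of 8 executions follows from taking the cross product of the three input directories, each holding two files. A minimal sketch, using the directory and file names from the slide; the helper `workflow_runs` and the order in which the runs are numbered are assumptions for illustration.

```python
from itertools import product

# Contents of the three PS input directories, as listed on the slide.
inputs = {
    "/grid/gilda/sipos/InputImages": ["Image.0", "Image.1"],
    "/grid/gilda/sipos/XCoordinates": ["XCoordinate.0", "XCoordinate.1"],
    "/grid/gilda/sipos/YCoordinates": ["YCoordinate.0", "YCoordinate.1"],
}


def workflow_runs(directories):
    """One workflow execution per combination of one file from each directory."""
    return list(product(*directories.values()))


runs = workflow_runs(inputs)
print(len(runs))  # 2 x 2 x 2 combinations
for index, (image, x, y) in enumerate(runs):
    # The numbering of runs here is illustrative, not the portal's scheme.
    print(f"run {index}: {image}, {x}, {y}")
```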
  • 60. Typical data-flow compositions, each combining inputs {A1, A2, A3} and {B1, B2, B3} for an Activity / WF • CROSS ITERATOR (X): all-to-all, result A X B; P-GRADE Portal supports this • DOT ITERATOR: one-to-one (A1 with B1, A2 with B2, A3 with B3) • MATCH ITERATOR (M): Ai with Bj if Ai and Bj have a common ancestor, result A M B • Find these in TAVERNA, MOTEUR
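The difference between the cross, dot and match iterators is easiest to see on small inputs. The sketch below is an assumed model, not the implementation used by P-GRADE Portal, Taverna or MOTEUR: data items are plain strings, and the ancestry needed by the match iterator is supplied as a hypothetical dictionary from item to the set of source items it was derived from.

```python
from itertools import product


def cross(a_items, b_items):
    """Cross iterator: every A item is combined with every B item."""
    return list(product(a_items, b_items))


def dot(a_items, b_items):
    """Dot iterator: items are paired one-to-one by position."""
    return list(zip(a_items, b_items))


def match(a_items, b_items, ancestors):
    """Match iterator: pair Ai with Bj only if they share an ancestor."""
    return [
        (a, b)
        for a, b in product(a_items, b_items)
        if ancestors[a] & ancestors[b]
    ]


A = ["A1", "A2", "A3"]
B = ["B1", "B2", "B3"]
# Hypothetical provenance: which source item each data item came from.
ancestors = {
    "A1": {"s1"}, "A2": {"s2"}, "A3": {"s3"},
    "B1": {"s2"}, "B2": {"s1"}, "B3": {"s3"},
}

print(len(cross(A, B)))
print(dot(A, B))
print(match(A, B, ancestors))
```

With three items on each side, the cross iterator yields nine invocations while the dot iterator yields three; the match iterator's result depends entirely on the recorded ancestry.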
  • 61. PS Input Port Grid Directory instead of FILE reference 61
  • 62. Parameter generator Generator can be attached to any parameter input port Generator can be • Auto generator: to generate text files • Custom generator: to generate any content Generated files are moved into SE by the portal 62
  • 63. Definition Window of Auto Generator Job User defines the template of the text file User puts key(s) into the template User defines values for the key(s) • Integer number • Real number • Custom set •… 63
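The idea of a text template with keys and per-key value lists can be sketched with Python's `string.Template`. Several points here are assumptions made for the example: the `$key` placeholder syntax is Python's, not necessarily the portal's; the key names are invented; and combining multiple keys as a cross product of their values is a guess at the behaviour rather than something the slide states.

```python
from itertools import product
from string import Template


def auto_generate(template_text, key_values):
    """Produce one text per combination of key values.

    template_text: text containing $key placeholders
    key_values:    dict mapping each key to its list of values
    """
    keys = list(key_values)
    template = Template(template_text)
    texts = []
    for combination in product(*(key_values[key] for key in keys)):
        # Fill every key in the template with one value from its list.
        texts.append(template.substitute(dict(zip(keys, combination))))
    return texts


# Hypothetical input file template with an integer and a real-valued key.
files = auto_generate(
    "temperature=$T pressure=$P\n",
    {"T": [300, 310], "P": [1.0]},
)
for content in files:
    print(content, end="")
```

Each generated text would correspond to one input file, which the portal then moves into a Storage Element for the parameter study.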
  • 65. Placement of result Use the default value! Will contain one compressed file for each execution of the workflow. Choose a „reliable” Storage Element 65
  • 66. Executing PS workflows PS Details for parameter sweep workflow applications
  • 67. Detailed view of a PS workflow Generator job(s) Overall statistics of workflow instances Workflow instances Collector job(s) 67
  • 69. Learn once, use everywhere Develop once, execute anywhere Thank you! www.portal.p-grade.hu pgportal@lpds.sztaki.hu 69
  • 70. Backup slides to answer questions 70
  • 71. Proxy delegations [Figure labels: Proxy based authentication; Login & psw based authentication (username, password); MyProxy server; VOMS server; Proxy; Proxy with VOMS ext.; P-GRADE Portal server; GILDA services]
  • 72. Settings Portal administrator can – connect the portal to several grids – register default resources of the connected grids 72
  • 73. Settings User can customize the connected grids by adding and removing resources 73