SEASR:

                 Meandre:
        Semantic-Driven Data-Intensive
            Flows in the Clouds
         Xavier Llora, Bernie Acs, Loretta Auvil, Boris Capitanu, Michael Welge, David Goldberg




                           National Center for Supercomputing Applications
                              University of Illinois at Urbana-Champaign
                                                                       


                               {xllora, acs1, lauvil, capitanu, mwelge, deg}@illinois.edu
The SEASR project and its Meandre infrastructure
are sponsored by The Andrew W. Mellon Foundation
SEASR: The Project 

          SEASR: Software Environment for the
           Advancement of Scholarly Research
  •  Funded by the Andrew W. Mellon Foundation to answer the humanities community’s call for a
  research and development environment capable of powering leading-edge digital humanities
  initiatives.

  •  Fosters collaboration through empowering scholars to share data and research processes with
  an infrastructure and framework designed to support reusable, repeatable, and scalable services
  and processes.

  •  Designed to enable developers to rapidly design, build, and share software applications that
  support research and collaboration using modular components that can be assembled to create
  reusable data-flows.

  •  Project web site: http://seasr.org




SEASR: The High-Altitude Picture




SEASR: @ Work – DISCUS
SEASR: @ Work – NEMA
SEASR: @ Work – NESTER
SEASR: @ Work – MONK
SEASR: @ Work – Evolution Highway
SEASR: A Quick Overview
      •  Addresses:
             –  The challenges of transforming information into knowledge

             –  The need for software bridges that migrate unstructured and
                semi-structured data into structured data and/or metadata to enable
                analysis and accessibility

      •  Aims:
             –  Make digital collections more useful and flexible

             –  Provide access to analytic processes and visualizations

             –  Enable easy mash-up with other web-based services (SOA) 



SEASR: Knowledge Discovery…
      Predictable process


   The Process
          •  Selection
          •  Preparation
          •  Transform
          •  Processing
          •  Interpret




SEASR: Knowledge Discovery…
      Predictable process across domains.


      Domains
             •  Literature
             •  History
             •  Music
             •  Art
             •  Science




SEASR: Knowledge Discovery…
      Predictable process across domains and digital collections.


   Collection Types
          •  Text
          •  Multimedia
          •  Data




SEASR: Design Goals
      •  Transparency
             –  From a single laptop to an HPC cluster

             –  Not bound to a particular computation fabric

             –  Allow heterogeneous development

      •  Intuitive programming paradigm
             –  Modular, reusable components and flows

             –  Foster collaboration and sharing

      •  Open Source
      •  Service-Oriented Architecture (SOA)
Meandre: Infrastructure
  •  SEASR/Meandre Infrastructure:
         –  Dataflow execution paradigm
         –  Semantic-web driven
         –  Web Oriented
         –  Supports publishing services
         –  Modular components
         –  Encapsulation and execution mechanism
         –  Promotes reuse, sharing, and collaboration


Meandre: Data Driven Execution
      •  Execution Paradigms
             –  Conventional programs perform computational tasks by
                executing a sequence of instructions.
             –  Data driven execution revolves around the idea of
                applying transformation operations to a flow or stream
                of data when it is available. 

      •  Dataflow Approach: each component
             –  May have zero to many inputs
             –  May have zero to many outputs
             –  Performs a logical operation when data is available
Meandre: Dataflow Example
      •  Dataflow Addition Example
             –  Logical operation ‘+’
             –  Requires two inputs (Value1, Value2)
             –  Produces one output (Sum)

      •  When two inputs are available
             –  Logical operation can be performed

             –  Sum is output

      •  When output is produced
             –  Reset internal values

             –  Wait for two new input values to become available
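
The firing rule for the addition example can be sketched in Python. This is a minimal illustration, not Meandre's component API: the class name `AddComponent`, the `push` method, and the port names `value1` and `value2` are assumptions made for the example.

```python
class AddComponent:
    """Hypothetical dataflow component that fires only when both inputs are present."""

    def __init__(self):
        self.inputs = {"value1": None, "value2": None}

    def push(self, port, value):
        """Deliver a value to an input port; return the sum if the component fires."""
        if port not in self.inputs:
            raise KeyError(port)
        self.inputs[port] = value
        if any(v is None for v in self.inputs.values()):
            return None  # still waiting for the other input
        total = self.inputs["value1"] + self.inputs["value2"]
        # Reset internal values and wait for two new inputs.
        self.inputs = {"value1": None, "value2": None}
        return total


adder = AddComponent()
first = adder.push("value1", 2)    # only one input so far: no output
second = adder.push("value2", 3)   # both inputs available: the component fires
third = adder.push("value1", 10)   # state was reset, so it waits again
```

The component has no notion of a program counter: the arrival of data alone decides when the operation runs.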
Meandre: The Dataflow Component
     •  Data dictates component execution semantics

     [Diagram: a component with inputs and outputs, made up of a descriptor in RDF
      of its behavior and the component implementation]
Meandre: Component Metadata
      •  Describes a component
      •  Separates: 
             –  Component semantics (black box)
             –  Component implementation

      •  Provides a unified framework:
             –  Basic building blocks or units (components)
             –  Complex tasks (flows)
             –  Standardized metadata

Meandre: Semantic Web Concepts
      •  Relies on the Resource Description Framework (RDF), which uses
         a simple notation, usually written as XML, to express graph
         relations and provides a set of conventions and a common
         means to exchange information
      •  Provides a common framework to share and reuse data
         across application, enterprise, and community boundaries
      •  Focuses on common formats for integration and combination
         of data drawn from diverse sources
      •  Pays special attention to the language used for recording how
         the data relates to real world objects
      •  Allows navigation to sets of data resources that are
         semantically connected.
Meandre: Metadata Ontologies
      •  Meandre's metadata relies on three ontologies: 
             –  The RDF ontology serves as a base for defining
                Meandre descriptors 
             –  The Dublin Core Elements ontology provides basic
                publishing and descriptive capabilities in the description
                of Meandre descriptors
             –  The Meandre ontology describes a set of relationships
                that model valid components, as understood by the
                Meandre execution engine architecture




Meandre: Components in RDF
 @prefix   meandre:   <http://www.meandre.org/ontology/> .
 # The xsd, dc, rdfs, and rdf prefixes below refer to existing standards
 @prefix   xsd:       <http://www.w3.org/2001/XMLSchema#> .
 @prefix   dc:        <http://purl.org/dc/elements/1.1/> .
 @prefix   rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix   rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
 @prefix   :          <#> .

 <http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations>
     meandre:name "Limited iterations"^^xsd:string ;
     rdf:type meandre:executable_component ;
     dc:creator "Xavier Llora"^^xsd:string ;
     dc:date "2007-11-17T00:32:35"^^xsd:date ;
     dc:description "Allows only a limited number of iterations"^^xsd:string ;
     dc:format "java/class"^^xsd:string ;
     dc:rights "University of Illinois/NCSA Open Source License"^^xsd:string ;
     meandre:execution_context
         <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/colt.jar> ,
         <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore.jar> ,
         <http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations/implementation/> ,
         <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore-meandre.jar> ,
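
To make the role of such a descriptor concrete, the sketch below holds a few of its statements as plain (subject, predicate, object) tuples and looks up the execution context. It is an illustration only: the in-memory list and the `objects` helper are assumptions, and a real engine would use an RDF library rather than string tuples.

```python
# Hypothetical in-memory view of part of the descriptor as triples.
COMPONENT = "http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations"
RES = "http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/"

triples = [
    (COMPONENT, "meandre:name", "Limited iterations"),
    (COMPONENT, "rdf:type", "meandre:executable_component"),
    (COMPONENT, "dc:creator", "Xavier Llora"),
    (COMPONENT, "dc:format", "java/class"),
    (COMPONENT, "meandre:execution_context", RES + "colt.jar"),
    (COMPONENT, "meandre:execution_context", RES + "gacore.jar"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# The metadata alone tells a consumer which resources the component needs,
# without touching the implementation itself.
jars = objects(COMPONENT, "meandre:execution_context")
creator = objects(COMPONENT, "dc:creator")
```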
Meandre: Component Types
      •  Components are the basic building block of any
         computational task. 

      •  There are two kinds of Meandre components: 
             –  Executable components 

                    •  Perform computational tasks that require no human
                       interactions during runtime

                    •  Processes are initialized during flow startup and are fired
                       in accordance with the policies defined for them.

             –  Control components

                    •  Used to pause dataflow during user interaction cycles

                    •  The WebUI may be an HTML form, an applet, or another user interface
Meandre: Component Assemblies
      •  Defined by connecting outputs from one component to the
         inputs of another.
             –  Cyclical connections are supported 

             –  Components may have 
                    •  Zero to many inputs

                    •  Zero to many outputs

                    •  Properties that control runtime behavior 

      •  Described using RDF 
             –  Enables storage, reuse, and sharing, as with components

             –  Allows discovery and dynamic execution

Meandre: Flow (Complex Tasks)
     •  A flow is a collection of connected components


     [Diagram: a flow connecting the components Read, Get, Merge, Do, and Show;
      dataflow execution]
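
One way to picture a flow is as a list of connectors between component instances. The sketch below is illustrative only: the instance names come from the diagram, while the port names, the wiring, and the `Connector` structure are assumptions, since the slide text does not show which ports are connected.

```python
from collections import namedtuple

# Each connector links an output port of one instance to an input port of another.
Connector = namedtuple("Connector", "source source_port target target_port")

connectors = [
    Connector("read", "text", "merge", "left"),
    Connector("get", "text", "merge", "right"),
    Connector("merge", "merged", "do", "input"),
    Connector("do", "result", "show", "object"),
    Connector("show", "done", "read", "trigger"),  # assumed cyclical connection
]


def downstream(instance):
    """Instances that receive data directly from `instance`."""
    return sorted({c.target for c in connectors if c.source == instance})


def has_cycle(start):
    """Depth-first search: can data leaving `start` come back to it?"""
    stack, seen = list(downstream(start)), set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(downstream(node))
    return False
```

Because the flow is just data, the same description can be stored, shared, and inspected before anything is executed.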
Meandre: Create, Publish, & Share
      •  “Components” and “Flows” have RDF descriptors
             –  Easily shared, fostering reuse

             –  Allow machines to read and interpret them
             –  Independent of the implementations

             –  Combine different implementations and platforms

                    –  Components: Java, Python, Lisp, Web Services

                    –  Execution: on a laptop or a high-performance cluster

      •  A “Location” is an RDF descriptor of one to many
         components, one to many flows, and their
         implementations

Meandre: Repository & Locations
      •  Each location represents a set of components and flows
      •  Users can
             –  Combine different locations together

             –  Create components

             –  Assemble flows

             –  Share components and flows

      •  Repositories Help 
             –  Administer complex environments

             –  Organize components and flows


Meandre: Metadata Properties
      •  Components and Flows share properties such as
         component name, creator, creation date, description, tags,
         and rights.
      •  Component-specific metadata describes the
         component's behavior, its location, type of
         implementation, firing policy, runnable, format, resource
         location, and execution context
      •  Flow-specific metadata describes the directed graph of
         components: component instances, connectors,
         connector instance data port source, connector instance
         data port target, connector instance source, connector
         instance target, and instance name

Meandre: Programming Paradigm 

      •  The programming paradigm creates complex
         tasks by linking together specialized
         components. Meandre's publishing mechanism
         allows components developed by third parties to be
         assembled into a new flow.
      •  There are two ways to develop flows:
             –  Meandre’s Workbench visual programming tool
             –  Meandre’s ZigZag scripting language




Meandre: Workbench Existing Flow

  [Screenshot: the Workbench with an existing flow open, showing the Components,
   Flows, and Locations panels]




Meandre: Workbench Create Flow




  [Screenshot: the Workbench with the Components, Flows, and Locations panels]

Meandre: Workbench Create Flow
  Drag and drop the selected component into the workspace.

  [Screenshot: the Workbench with the Components, Flows, and Locations panels]

Meandre: Workbench Create Flow
  The properties for the selected component are exposed.

  [Screenshot: the Workbench with the Components, Flows, and Locations panels]

Meandre: Workbench Create Flow


  The description for the selected component is exposed.

  [Screenshot: the Workbench with the Components, Flows, and Locations panels]

Meandre: Workbench Create Flow
  Drag and drop another component into the workspace.

  [Screenshot: the Workbench with the Components, Flows, and Locations panels]

Meandre: Workbench Create Flow
  Connect the output of the first component to the input of the second.
  Clicking the first port to connect highlights it with a color change (red).

  [Screenshot: the Workbench with the Components, Flows, and Locations panels]

Meandre: Workbench Create Flow
  Connect the output of the first component to the input of the second.
  Clicking the port to connect to causes a line to be displayed as a visual indicator.

  [Screenshot: the Workbench with the Components, Flows, and Locations panels]

Meandre: Workbench Create Flow
  Repeat drag and drop to complete the assembly.

  [Screenshot: the Workbench with the Components, Flows, and Locations panels]

Meandre: ZigZag Script Language
      •  ZigZag is a simple language for describing data-intensive
         flows
             –  Modeled on Python for simplicity.
             –  ZigZag is a declarative language for expressing the
                directed graphs that describe flows.

      •  Command-line tools allow ZigZag files to be compiled
         and executed.
             –  A compiler is provided to transform a ZigZag program
                (.zz) into a Meandre archive unit (.mau).
             –  MAUs can then be executed by a Meandre engine.
Meandre: ZigZag Script Language
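      •  Example flow
             –  The flow pushes two strings that get concatenated and
                printed to the console

             –  [Flow diagram: two push-string instances feed a concatenate-strings
                instance, whose output goes to a print-object instance]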




Meandre: ZigZag Script Language
   •  ZigZag code that represents the example flow:

         #
         # Imports the three required components and creates the component aliases
         #
         import <http://localhost:1714/public/services/demo_repository.rdf>
         alias <http://test.org/component/push_string> as PUSH
         alias <http://test.org/component/concatenate-strings> as CONCAT
         alias <http://test.org/component/print-object> as PRINT
         #
         # Creates four instances for the flow
         #
         push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
         #
         # Sets up the properties of the instances
         #
         push_hello.message, push_world.message = "Hello ", "world!"
         #
         # Describes the data-intensive flow
         #
         @phres, @pwres = push_hello(), push_world()
         @cres = concat( string_one: phres.string; string_two: pwres.string )
         print( object: cres.concatenated_string )
         #

   •  Repository Location (the import statement)
          –  Defines the logical repository location where the components in this flow
             can be found, similar to defining a location for the Workbench, which
             would then display the available components located there




Meandre: ZigZag Script Language
   •  ZigZag code that represents the example flow (Alias):

         alias <http://test.org/component/push_string> as PUSH
         alias <http://test.org/component/concatenate-strings> as CONCAT
         alias <http://test.org/component/print-object> as PRINT

          –  Assigns a logical name reference for each component, making subsequent
             program calls easier to read and write




Meandre: ZigZag Script Language
   •  ZigZag code that represents the example flow (Implementation Instances):

         #
         # Creates four instances for the flow
         #
         push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()

          –  Create instances of the components using the “Alias” references, similar
             to dragging components onto the Workbench canvas




Meandre: ZigZag Script Language
   •  ZigZag code that represents the example flow (Set the Property Values):

         #
         # Sets up the properties of the instances
         #
         push_hello.message, push_world.message = "Hello ", "world!"

          –  Define the property values for the components, similar to filling in
             values in the Workbench’s properties panel




Meandre: ZigZag Script Language
   •  ZigZag code that represents the example flow (Describe Connections):

         #
         # Describes the data-intensive flow
         #
         @phres, @pwres = push_hello(), push_world()
         @cres = concat( string_one: phres.string; string_two: pwres.string )
         print( object: cres.concatenated_string )
         #

          –  Define the connections or relationships between the components in this
             flow, similar to drawing connection lines on the Workbench canvas
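
For readers less familiar with ZigZag, the wiring of the example flow can be mimicked with ordinary Python functions. This only mirrors the connections; a Meandre engine would fire each component as data becomes available rather than calling functions in sequence. The function names, the dictionary-based output ports, and the `printed` list are assumptions made for the sketch, and the message values are taken to be "Hello " and "world!".

```python
# Plain functions standing in for the three components behind PUSH, CONCAT, and PRINT.
def push_string(message):
    """Emit the configured message on the 'string' output port."""
    return {"string": message}


def concatenate_strings(string_one, string_two):
    """Join the two inputs and emit them on 'concatenated_string'."""
    return {"concatenated_string": string_one + string_two}


printed = []  # keeps what was printed so the result can be inspected


def print_object(obj):
    """Print the incoming object to the console."""
    printed.append(obj)
    print(obj)


# Two push instances with their 'message' property set.
phres = push_string("Hello ")
pwres = push_string("world!")

# Connections: both push outputs feed concat, whose output feeds print.
cres = concatenate_strings(phres["string"], pwres["string"])
print_object(cres["concatenated_string"])
```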




Meandre: ZigZag Script Language
   •  Automatic Parallelization 
          –  Multiple instances of a component could be run in parallel to boost
             throughput.

          –  A specialized operator is available in ZigZag scripting to cause multiple
             instances of a given component to be used
                  •  Consider the simple flow example shown in the diagram

                  [Diagram: a push instance connected to a pass instance, connected to a print instance]

                  •  The dataflow declaration would look like
                          #
                          # Describes the data-intensive flow
                          #
                          @pu = push()
                          @pt = pass( string:pu.string )
                          print( object:pt.string )
Meandre: ZigZag Script Language
   •  Automatic Parallelization 
          –  Adding the operator [+AUTO] to the middle component
                       # Describes the data-intensive flow
                       #
                       @pu = push()
                       @pt = pass( string:pu.string ) [+AUTO]
                       print( object:pt.string )

          –  [+AUTO] tells the ZigZag compiler to parallelize the “pass
             component instance” by the number of cores available on
             the system.
          –  [+AUTO] may also be written [+N], where N is a numeric
             value, for example [+10].


Meandre: ZigZag Script Language
   •  Automatic Parallelization 
          –  Adding the operator [+4] would result in a directed graph 


                       # Describes the data-intensive flow
                       #
                       @pu = push()
                       @pt = pass( string:pu.string ) [+4]
                       print( object:pt.string )




Meandre: ZigZag Script Language
   •  Automatic Parallelization 
          –  ZigZag has created 4 parallel instances of the component.
                  •  It has also introduced a mapper instance that is in charge of
                     distributing the incoming data to each of the parallel instances.

                  •  This is called unordered parallelization, since data may
                     arrive at the print component out of the original order in which
                     it was generated by the push component instance.
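
The effect of unordered parallelization can be imitated in Python with a thread pool. This is only an analogy for what the ZigZag compiler generates: `pass_through`, `unordered_parallel`, and the upper-casing work are hypothetical stand-ins for the parallelized "pass" component.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pass_through(item):
    """Stand-in for the work done by one instance of the parallelized component."""
    return item.upper()


def unordered_parallel(items, workers=4):
    """Distribute items over `workers` instances and yield results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Mapper role: hand each incoming item to one of the parallel workers.
        futures = [pool.submit(pass_through, item) for item in items]
        # Results are collected in completion order, not submission order.
        for future in as_completed(futures):
            results.append(future.result())
    return results


out = unordered_parallel(["a", "b", "c", "d", "e"])
```

Every item is processed exactly once, but nothing guarantees that `out` lists the results in the order the items were pushed.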




Meandre: ZigZag Script Language
   •  Automatic Parallelization 
          –  The operator [+AUTO] can be told to maintain data order with
             “!” 

                                  # Describes the data-intensive flow
                                  #
                                  @pu = push()
                                  @pt = pass( string:pu.string ) [+AUTO!]
                                  print( object:pt.string )


          –  [+AUTO!] tells the ZigZag compiler to parallelize the “pass
             component instance” by the number of cores available on
             the system and to maintain the order of the data passing through.



Meandre: ZigZag Script Language
   •  Automatic Parallelization 
          –  ZigZag has created 4 parallel instances of the component.
                  •  It has also introduced a mapper instance that is in charge of
                     distributing the incoming data to each of the parallel instances.

                  •  It has also introduced a reducer instance that is in charge of
                     collecting the output of the parallel instances and restoring
                     the original order of the data.
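
Order-preserving parallelization, as requested with [+AUTO!], can be imitated in Python by tagging each item with a sequence number before distributing it and sorting on that number afterwards. This is a sketch of the idea, not the code ZigZag generates: `pass_through`, `ordered_parallel`, and the tagging scheme are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pass_through(item):
    """Stand-in for the work done by one instance of the parallelized component."""
    return item.upper()


def ordered_parallel(items, workers=4):
    """Process items in parallel, then return the results in their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Distribution step: remember the sequence number of every submitted item.
        futures = {pool.submit(pass_through, item): seq
                   for seq, item in enumerate(items)}
        # Workers may finish in any order.
        tagged = [(futures[f], f.result()) for f in as_completed(futures)]
    # Reordering step: sort on the sequence number to restore the original order.
    return [result for _, result in sorted(tagged)]


ordered = ordered_parallel(["a", "b", "c", "d", "e"])
```

The price of ordering is that a slow item holds back every result behind it, which is why the unordered form is the default.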




Meandre: Flows to MAU
      •  Flows can be executed using their RDF
         descriptors
      •  Flows can be compiled into a Meandre archive unit (MAU)
      •  A MAU is:
             –  Self-contained representation
             –  Ready for execution
             –  Portable
             –  The base of flow execution in grid environments


Meandre: The Architecture
      •  The design of the Meandre architecture follows
         three directives:
             –  provide a robust, transparently scalable solution from
                a laptop to large-scale clusters
             –  create a unified solution for batch and interactive tasks
             –  encourage reusing and sharing components

      •  To meet these goals, the architecture
         relies on four stacked layers and builds on top of
         service-oriented architectures (SOA)

Meandre: Basic Single Server




Meandre MDX: Cloud Computing
      •  Servers can be 
             –  instantiated on demand
             –  disposed of when done or on demand

      •  A cluster is formed by at least one server
      •  The Meandre Distributed Exchange (MDX)
             –  Orchestrates operational integrity by managing cluster
                configuration and membership using a shared database
                resource.
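As a rough illustration of cluster membership kept in a shared database, the sketch below uses an in-memory SQLite database as a stand-in. The table name, column names, and the heartbeat-with-timeout rule are assumptions made for the example; the slides do not describe the actual MDX schema or protocol.

```python
import sqlite3


def open_registry():
    """In-memory stand-in for the shared database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE servers (name TEXT PRIMARY KEY, last_seen REAL NOT NULL)"
    )
    return db


def join(db, name, now):
    """A server registers itself, or refreshes its heartbeat timestamp."""
    db.execute(
        "INSERT OR REPLACE INTO servers (name, last_seen) VALUES (?, ?)",
        (name, now),
    )
    db.commit()


def leave(db, name):
    """A server is disposed of and removes itself from the cluster."""
    db.execute("DELETE FROM servers WHERE name = ?", (name,))
    db.commit()


def members(db, now, timeout=30.0):
    """Servers whose last heartbeat is recent enough to count as alive."""
    rows = db.execute(
        "SELECT name FROM servers WHERE ? - last_seen <= ? ORDER BY name",
        (now, timeout),
    )
    return [name for (name,) in rows]
```

Because every server reads and writes the same table, a cluster of one server and a cluster of many use the same code path, which matches the statement that a cluster is formed by at least one server.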



Meandre MDX: The Picture
      [Figure: the MDX backbone]





Meandre MDX: The Architecture
      •  Virtualization infrastructure
             –  Provides uniform access to the underlying execution environment. It relies on
                virtualization of machines and the use of Java for hardware abstraction.

      •  IO standardization
             –  A unified layer provides access to shared data stores, distributed file systems,
                specialized metadata stores, and other service-oriented architecture
                gateways.

      •  Data-intensive flow infrastructure
             –  Provides the basic Meandre execution engine for data-intensive flows, component
                repositories and discovery mechanisms, extensible plugins, and web user interfaces
                (webUIs).

      •  Interaction layer
             –  Can provide self-contained applications via webUIs, create plugins for third-party
                services, interact with the embedding application that relies on the Meandre engine,
                or provide services to the cloud.
Meandre MDX: The Experiment
      •  Experimental Prototype
             –  Designed and built to validate the viability of an MDX cluster

             –  Using VMware Server 2.0 on three identical hosts, each with
                    •  Windows Server 2003

                    •  Two quad-core 2.8 GHz Xeon processors

                    •  1600 MHz front-side bus

                    •  32 GB of RAM

                    •  4 TB of RAID 5 disk




Meandre MDX: The Experiment
      •  Experimental Prototype
             –  8 virtual machine instances were created on each of two hosts with
                    •  32-bit Ubuntu 8.04 Linux 

                    •  3 GB RAM dedicated to each instance

                    •  1 physical processor core assigned to each VM

                    •  VM instances were equipped to run a Meandre MDX server using Sun's Java
                       1.5 JVM

             –  The third physical host supports 2 virtual machine instances with
                    •  32-bit Ubuntu 8.04 Linux 

                    •  3 GB RAM dedicated to each instance

                    •  1 physical processor core assigned to each VM

                    •  Highly available MySQL database and HTTP load-balancing facility
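The resource figures on these two slides can be cross-checked with a little arithmetic. The sketch below assumes that the 16 Meandre servers used in the later experiments are the 8 VMs on each of two hosts; that split is inferred from the numbers on the slides rather than stated outright, and the function name host_budget is made up for the example.

```python
# Per-host hardware, taken from the prototype description.
CORES_PER_HOST = 2 * 4      # two quad-core processors
RAM_PER_HOST_GB = 32

# Per-VM allocation, taken from the prototype description.
CORES_PER_VM = 1
RAM_PER_VM_GB = 3


def host_budget(vm_count):
    """Return (cores used, RAM used in GB, RAM left in GB) for one host."""
    cores_used = vm_count * CORES_PER_VM
    ram_used = vm_count * RAM_PER_VM_GB
    return cores_used, ram_used, RAM_PER_HOST_GB - ram_used


# Assumption: two hosts carry 8 Meandre VMs each; the third carries 2 VMs.
meandre_host = host_budget(8)    # (8, 24, 8)
support_host = host_budget(2)    # (2, 6, 26)
total_meandre_servers = 2 * 8    # 16
```

Under these assumptions each Meandre host uses all 8 physical cores and 24 GB of its 32 GB of RAM, leaving 8 GB for the host operating system and the virtualization layer.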
Meandre MDX: The Experiment
      •  We conducted three different experiments
             –  All three were based on the same flow shown earlier in the ZigZag
                example with a single change to make the single line of text into
                250,000 lines of text for each iteration of the flow.

             –  The first test was designed to test the scalability of a single
                Meandre server. 



             –  Figure: concurrent flows running on a standalone
                engine, on a log/log scale; each iteration of the flow
                pushed 250,000 lines of text


Meandre MDX: The Experiment
      •  We conducted three different experiments
             –  The second experiment was run against a virtual Meandre cluster
                consisting of 16 Meandre servers. 

             –  Figure: concurrent flows running on a standalone
                engine, on a log/log scale; each iteration of the flow
                pushed 1 line of text


Meandre MDX: The Experiment
      •  We conducted three different experiments
             –  The third experiment was run against a virtual Meandre cluster
                consisting of 16 Meandre servers. 

             –  Figure: concurrent flows running on a standalone
                engine, on a log/log scale; each iteration of the flow
                pushed 250,000 lines of text


Meandre MDX: The Experiment
      •  We conducted three different experiments
             –  The first test clearly shows
                    •  The average time per flow increased linearly with the
                       number of concurrent flows


             –  The next two experiments clearly show
                    •  Cluster throughput grows linearly with the number of
                       Meandre servers available
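Since the experiment figures use a log/log scale, a linear relationship such as "time per flow grows linearly with concurrent flows" appears there as a straight line of slope 1. The sketch below shows one way to check that with a least-squares fit in log space. The function name loglog_slope is hypothetical and the sample numbers are made up for illustration; they are not the measurements from the experiments.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    A slope close to 1 means y grows linearly with x;
    a slope close to 2 would mean quadratic growth, and so on.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


# Made-up example data: time per flow doubling as concurrency doubles.
concurrent_flows = [1, 2, 4, 8, 16]
seconds_per_flow = [3.0, 6.0, 12.0, 24.0, 48.0]
slope = loglog_slope(concurrent_flows, seconds_per_flow)
```

The same check applies to the cluster runs by passing the number of Meandre servers as xs and the measured cluster throughput as ys.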



Upcoming Events
•  SEASR 2009 workshop
  –  The workshop is organized to provide expanded
     opportunities for learning, knowledge sharing, and
     support and is intended to provide sufficient
     introduction and support so that teams can implement
     a study using SEASR.
  –  The workshop is intended for institutional teams of
     scholars from the Humanities.
  –  The workshop will include communication and work
     from a team’s home campus as well as a face-to-face
     meeting on the University of Illinois campus.

SEASR eScience 2008

  • 1. SEASR: Meandre: ! Semantic-Driven Data-Intensive ! Flows in the Clouds Xavier Llora, Bernie Acs, Loretta Auvil, Boris Capitanu, Michael Welge, David Goldberg National Center for Supercomputing Applications! University of Illinois at Urbana-Champaign {xllora, acs1, lauvil, capitanu, mwelge, deg}@illinois.edu The SEASR project and its Meandre infrastructure! are sponsored by The Andrew W. Mellon Foundation
  • 2. SEASR: The Project SEASR: Software Environment for the! Advancement of Scholarly Research •  Funded by the Andrew W. Mellon Foundation to answer the humanities community’s call for a research and development environment capable of powering leading edge digital humanities initiatives. •  Fosters collaboration through empowering scholars to share data and research processes with an infrastructure and framework designed to support reusable, repeatable, and scalable services and processes. •  Designed to enable developers to rapidly design, build, and share software applications that support research and collaboration using modular components that can be assembled to create reusable data-flows. •  Project web site: http://seasr.org The SEASR project and its Meandre infrastructure! are sponsored by The Andrew W. Mellon Foundation
  • 3. SEASR: The High-Altitude Picture The SEASR project and its Meandre infrastructure! are sponsored by The Andrew W. Mellon Foundation
  • 4. SEASR: @ Work – DISCUS
  • 5. SEASR: @ Work – NEMA
  • 6. SEASR: @ Work – NESTER
  • 7. SEASR: @ Work – MONK
  • 8. SAESR: @ Work – Evolution Highway
  • 9. SEASR: A Quick Overview •  Addresses: –  Challenges of transforming information into knowledge –  Constructs software bridges to migrate unstructured and semi- structured data into structured data and/or metadata to enable analysis and accessibility. •  Aims: –  Make digital collections more useful and flexible –  Provide access to analytic processes and visualizations –  Enable easy mash-up with other web-based services (SOA) The SEASR project and its Meandre infrastructure! are sponsored by The Andrew W. Mellon Foundation
  • 10. SEASR: Knowledge Discovery… Predictable process The Process •  Selection •  Preparation •  Transform •  Processing •  Interpret The SEASR project and its Meandre infrastructure! are sponsored by The Andrew W. Mellon Foundation
  • 11. SEASR: Knowledge Discovery… Predictable process across domains. Domains •  Literature •  History •  Music •  Art • Science The SEASR project and its Meandre infrastructure! are sponsored by The Andrew W. Mellon Foundation
  • 12. SEASR: Knowledge Discovery… Predictable process across domains and digital collections. Collection Types • Text •  Multimedia •  Data The SEASR project and its Meandre infrastructure! are sponsored by The Andrew W. Mellon Foundation
  • 13. SEASR: Design Goals •  Transparency –  From a single laptop to an HPC cluster –  Not bound to a particular computation fabric –  Allow heterogeneous development •  Intuitive programming paradigm –  Modular, reusable components and flows –  Foster collaboration and sharing •  Open Source •  Service-Oriented Architecture (SOA)
  • 14. Meandre: Infrastructure •  SEASR/Meandre Infrastructure: –  Dataflow execution paradigm –  Semantic-web driven –  Web oriented –  Supports publishing services –  Modular components –  Encapsulation and execution mechanism –  Promotes reuse, sharing, and collaboration
  • 15. Meandre: Data Driven Execution •  Execution Paradigms –  Conventional programs perform computational tasks by executing a sequence of instructions. –  Data driven execution revolves around the idea of applying transformation operations to a flow or stream of data when it is available. •  Dataflow Approach (each component) –  May have zero to many inputs –  May have zero to many outputs –  Performs a logical operation when data is available
  • 16. Meandre: Dataflow Example •  Dataflow Addition Example –  Logical operation ‘+’ –  Requires two inputs (Value1, Value2) –  Produces one output (Sum) •  When two inputs are available –  Logical operation can be performed –  Sum is output •  When output is produced –  Reset internal values –  Wait for two new input values to become available
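The firing rule on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Meandre's component API: the `Adder` class, its `push` method, and the port names `value1` and `value2` are hypothetical names chosen to mirror the slide.

```python
class Adder:
    """Toy dataflow component: fires only when both inputs are present."""

    def __init__(self):
        self.value1 = None
        self.value2 = None

    def push(self, port, value):
        """Deliver a value to an input port; return the sum if the component fires."""
        if port == "value1":
            self.value1 = value
        elif port == "value2":
            self.value2 = value
        else:
            raise ValueError(f"unknown port: {port}")
        if self.value1 is None or self.value2 is None:
            return None  # still waiting for the other input
        result = self.value1 + self.value2
        self.value1 = self.value2 = None  # reset internal values after firing
        return result


adder = Adder()
first = adder.push("value1", 2)    # only one input so far: no output
second = adder.push("value2", 3)   # both inputs present: the component fires
third = adder.push("value2", 10)   # state was reset, so it waits again
print(first, second, third)
```

The component produces nothing until both ports hold a value, then emits the sum and clears its state, which matches the three steps listed on the slide.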
  • 17. Meandre: The Dataflow Component •  Data dictates component execution semantics (diagram: a component with inputs, outputs, and properties P, made up of a descriptor in RDF of its behavior and the component implementation)
  • 18. Meandre: Component Metadata •  Describes a component •  Separates: –  Component semantics (black box) –  Component implementation •  Provides a unified framework: –  Basic building blocks or units (components) –  Complex tasks (flows) –  Standardized metadata
  • 19. Meandre: Semantic Web Concepts •  Relies on the Resource Description Framework (RDF), which uses a simple notation to express graph relations, usually written as XML, to provide a set of conventions and a common means to exchange information •  Provides a common framework to share and reuse data across application, enterprise, and community boundaries •  Focuses on common formats for integration and combination of data drawn from diverse sources •  Pays special attention to the language used for recording how the data relates to real world objects •  Allows navigation to sets of data resources that are semantically connected.
  • 20. Meandre: Metadata Ontologies •  Meandre's metadata relies on three ontologies: –  The RDF ontology serves as a base for defining Meandre descriptors –  The Dublin Core Elements ontology provides basic publishing and descriptive capabilities in the description of Meandre descriptors –  The Meandre ontology describes a set of relationships that model valid components, as understood by the Meandre execution engine architecture
  • 21. Meandre: Components in RDF (callout on the prefix declarations: existing standards)
      @prefix meandre: <http://www.meandre.org/ontology/> .
      @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
      @prefix dc: <http://purl.org/dc/elements/1.1/> .
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
      @prefix : <#> .
      <http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations>
          meandre:name "Limited iterations"^^xsd:string ;
          rdf:type meandre:executable_component ;
          dc:creator "Xavier Llora"^^xsd:string ;
          dc:date "2007-11-17T00:32:35"^^xsd:date ;
          dc:description "Allows only a limited number of iterations"^^xsd:string ;
          dc:format "java/class"^^xsd:string ;
          dc:rights "University of Illinois/NCSA Open Source License"^^xsd:string ;
          meandre:execution_context
              <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/colt.jar> ,
              <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore.jar> ,
              <http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations/implementation/> ,
              <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore-meandre.jar> ,
  • 22. Meandre: Component Types •  Components are the basic building block of any computational task. •  There are two kinds of Meandre components: –  Executable components •  Perform computational tasks that require no human interaction during runtime •  Processes are initialized during flow startup and are fired in accordance with the policies defined for them. –  Control components •  Used to pause dataflow during user interaction cycles •  WebUI may be an HTML form, applet, or other user interface
  • 23. Meandre: Component Assemblies •  Defined by connecting outputs from one component to the inputs of another. –  Cyclical connections are supported –  Components may have •  Zero to many inputs •  Zero to many outputs •  Properties that control runtime behavior •  Described using RDF –  Enables storage, reuse, and sharing, as with components –  Allows discovery and dynamic execution
  • 24. Meandre: Flow (Complex Tasks) •  A flow is a collection of connected components (diagram: components Read, Merge, Show, Get, and Do, each with properties P, connected for dataflow execution)
  • 25. Meandre: Create, Publish, & Share •  “Components” and “Flows” have RDF descriptors –  Easily shared, fostering reuse –  Allow machines to read and interpret –  Independent of the implementations –  Combine different implementations & platforms –  Components: Java, Python, Lisp, Web Services –  Execution: on a laptop or a high performance cluster •  A “Location” is an RDF descriptor of one to many components, one to many flows, and their implementations
  • 26. Meandre: Repository & Locations •  Each location represents a set of components/flows •  Users can –  Combine different locations together –  Create components –  Assemble flows –  Share components and flows •  Repositories help –  Administer complex environments –  Organize components and flows
  • 27. Meandre: Metadata Properties •  Components and flows share properties such as name, creator, creation date, description, tags, and rights. •  Component-specific metadata describes the component's behavior: its location, type of implementation, firing policy, runnable, format, resource location, and execution context •  Flow-specific metadata describes the directed graph of components: component instances, connectors, connector instance data port source, connector instance data port target, connector instance source, connector instance target, and instance name
  • 28. Meandre: Programming Paradigm •  The programming paradigm creates complex tasks by linking together specialized components. Meandre's publishing mechanism allows components developed by third parties to be assembled in a new flow. •  There are two ways to develop flows: –  Meandre’s Workbench visual programming tool –  Meandre’s ZigZag scripting language
  • 29. Meandre: Workbench Existing Flow (screenshot; panels: Components, Flows, Locations)
  • 30. Meandre: Workbench Create Flow (screenshot; panels: Components, Flows, Locations)
  • 31. Meandre: Workbench Create Flow: Drag & drop the selected component into the workspace (screenshot; panels: Components, Flows, Locations)
  • 32. Meandre: Workbench Create Flow: Properties for the selected component are exposed (screenshot; panels: Components, Flows, Locations)
  • 33. Meandre: Workbench Create Flow: Description for the selected component is exposed (screenshot; panels: Components, Flows, Locations)
  • 34. Meandre: Workbench Create Flow: Drag & drop another component into the workspace (screenshot; panels: Components, Flows, Locations)
  • 35. Meandre: Workbench Create Flow: Connect the output of the first component to the input of the second. Clicking the first port to connect highlights it with a color change (red) (screenshot; panels: Components, Flows, Locations)
  • 36. Meandre: Workbench Create Flow: Connect the output of the first component to the input of the second. Clicking the port to connect to causes a line to be displayed as a visual indicator (screenshot; panels: Components, Flows, Locations)
  • 37. Meandre: Workbench Create Flow: Repeat drag & drop to complete the assembly (screenshot; panels: Components, Flows, Locations)
  • 38. Meandre: ZigZag Script Language •  ZigZag is a simple language for describing data-intensive flows –  Modeled on Python for simplicity. –  ZigZag is a declarative language for expressing the directed graphs that describe flows. •  Command-line tools allow ZigZag files to be compiled and executed. –  A compiler is provided to transform a ZigZag program (.zz) into a Meandre archive unit (.mau). –  MAUs can then be executed by a Meandre engine.
  • 39. Meandre: ZigZag Script Language •  As an example, consider the flow diagram –  The flow pushes two strings that get concatenated and printed to the console (diagram)
  • 40. Meandre: ZigZag Script Language •  ZigZag code that represents example flow:
      #
      # Imports the three required components and creates the component aliases
      #
      import <http://localhost:1714/public/services/demo_repository.rdf>
      alias <http://test.org/component/push_string> as PUSH
      alias <http://test.org/component/concatenate-strings> as CONCAT
      alias <http://test.org/component/print-object> as PRINT
      #
      # Creates four instances for the flow
      #
      push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
      #
      # Sets up the properties of the instances
      #
      push_hello.message, push_world.message = "Hello ", "world!"
      #
      # Describes the data-intensive flow
      #
      @phres, @pwres = push_hello(), push_world()
      @cres = concat( string_one: phres.string; string_two: pwres.string )
      print( object: cres.concatenated_string )
      #
    Callout on the import statement (Repository Location): defines the logical repository location where components in this flow can be found, similar to defining a location for the workbench, which would then display the available components located there.
  • 41. Meandre: ZigZag Script Language •  ZigZag code that represents example flow:
      #
      # Imports the three required components and creates the component aliases
      #
      import <http://localhost:1714/public/services/demo_repository.rdf>
      alias <http://test.org/component/push_string> as PUSH
      alias <http://test.org/component/concatenate-strings> as CONCAT
      alias <http://test.org/component/print-object> as PRINT
      #
      # Creates four instances for the flow
      #
      push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
      #
      # Sets up the properties of the instances
      #
      push_hello.message, push_world.message = "Hello ", "world!"
      #
      # Describes the data-intensive flow
      #
      @phres, @pwres = push_hello(), push_world()
      @cres = concat( string_one: phres.string; string_two: pwres.string )
      print( object: cres.concatenated_string )
      #
    Callout on the alias statements (Alias): assigns a logical name reference for each component, making subsequent program calls easier to read and write.
  • 42. Meandre: ZigZag Script Language •  ZigZag code that represents example flow:
      #
      # Imports the three required components and creates the component aliases
      #
      import <http://localhost:1714/public/services/demo_repository.rdf>
      alias <http://test.org/component/push_string> as PUSH
      alias <http://test.org/component/concatenate-strings> as CONCAT
      alias <http://test.org/component/print-object> as PRINT
      #
      # Creates four instances for the flow
      #
      push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
      #
      # Sets up the properties of the instances
      #
      push_hello.message, push_world.message = "Hello ", "world!"
      #
      # Describes the data-intensive flow
      #
      @phres, @pwres = push_hello(), push_world()
      @cres = concat( string_one: phres.string; string_two: pwres.string )
      print( object: cres.concatenated_string )
      #
    Callout on the instance creation line (Implementation Instances): creates instances of the components using the “Alias” references, similar to dragging components onto the workbench canvas.
  • 43. Meandre: ZigZag Script Language •  ZigZag code that represents example flow:
      #
      # Imports the three required components and creates the component aliases
      #
      import <http://localhost:1714/public/services/demo_repository.rdf>
      alias <http://test.org/component/push_string> as PUSH
      alias <http://test.org/component/concatenate-strings> as CONCAT
      alias <http://test.org/component/print-object> as PRINT
      #
      # Creates four instances for the flow
      #
      push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
      #
      # Sets up the properties of the instances
      #
      push_hello.message, push_world.message = "Hello ", "world!"
      #
      # Describes the data-intensive flow
      #
      @phres, @pwres = push_hello(), push_world()
      @cres = concat( string_one: phres.string; string_two: pwres.string )
      print( object: cres.concatenated_string )
      #
    Callout on the property assignment line (Set the Property Values): defines the property values for components, which is similar to filling in values in the workbench’s properties panel.
  • 44. Meandre: ZigZag Script Language •  ZigZag code that represents example flow:
      #
      # Imports the three required components and creates the component aliases
      #
      import <http://localhost:1714/public/services/demo_repository.rdf>
      alias <http://test.org/component/push_string> as PUSH
      alias <http://test.org/component/concatenate-strings> as CONCAT
      alias <http://test.org/component/print-object> as PRINT
      #
      # Creates four instances for the flow
      #
      push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
      #
      # Sets up the properties of the instances
      #
      push_hello.message, push_world.message = "Hello ", "world!"
      #
      # Describes the data-intensive flow
      #
      @phres, @pwres = push_hello(), push_world()
      @cres = concat( string_one: phres.string; string_two: pwres.string )
      print( object: cres.concatenated_string )
      #
    Callout on the flow description lines (Describe Connections): defines the connections or relationships between the components in this flow, which is similar to drawing connection lines on the workbench canvas.
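For readers who do not know ZigZag, the example flow can be approximated in plain Python. This is a rough analogue under stated assumptions, not how Meandre executes a flow: the three functions are hypothetical stand-ins for the PUSH, CONCAT, and PRINT components, the dictionary keys mirror the port names in the ZigZag script, and the two message values are assumed to be the strings "Hello " and "world!".

```python
def push_string(message):
    """Stand-in for the PUSH component: emits its 'message' property on port 'string'."""
    return {"string": message}


def concatenate_strings(string_one, string_two):
    """Stand-in for CONCAT: joins its two inputs on port 'concatenated_string'."""
    return {"concatenated_string": string_one + string_two}


def print_object(obj):
    """Stand-in for PRINT: writes the object to the console and returns it."""
    print(obj)
    return obj


# Same wiring as the ZigZag script: two pushes feed concat, which feeds print.
phres = push_string("Hello ")
pwres = push_string("world!")
cres = concatenate_strings(phres["string"], pwres["string"])
printed = print_object(cres["concatenated_string"])
```

The Python version fixes the call order by hand. In the ZigZag version the script only declares the connections, and the engine fires each component when its inputs become available.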
  • 45. Meandre: ZigZag Script Language •  Automatic Parallelization –  Multiple instances of a component can be run in parallel to boost throughput. –  A specialized operator is available in ZigZag scripting to cause multiple instances of a given component to be used •  Consider the simple flow example shown in the diagram •  The dataflow declaration would look like
      #
      # Describes the data-intensive flow
      #
      @pu = push()
      @pt = pass( string:pu.string )
      print( object:pt.string )
  • 46. Meandre: ZigZag Script Language •  Automatic Parallelization –  Adding the operator [+AUTO] to the middle component
      # Describes the data-intensive flow
      #
      @pu = push()
      @pt = pass( string:pu.string ) [+AUTO]
      print( object:pt.string )
    –  [+AUTO] tells the ZigZag compiler to parallelize the “pass” component instance by the number of cores available on the system. –  [+AUTO] may also be written [+N], where N is a numeric value to use, for example [+10].
  • 47. Meandre: ZigZag Script Language •  Automatic Parallelization –  Adding the operator [+4] would result in a directed graph (diagram)
      # Describes the data-intensive flow
      #
      @pu = push()
      @pt = pass( string:pu.string ) [+4]
      print( object:pt.string )
  • 48. Meandre: ZigZag Script Language •  Automatic Parallelization –  ZigZag has created 4 parallel instances of the component. •  It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances. •  This is called unordered parallelization, since data may arrive at the print component instance out of the original order in which it was generated by the push component instance.
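The effect of unordered parallelization can be imitated with Python's standard thread pool. This is a sketch of the general pattern only, not of the code the ZigZag compiler generates: `pass_through` and `run_unordered` are hypothetical names, the pool plays the role of the mapper by handing items to four workers, and results are collected in completion order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pass_through(item):
    """Stand-in for the 'pass' component: returns its input unchanged."""
    return item


def run_unordered(items, workers=4):
    """Distribute items over parallel workers and collect results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(pass_through, item) for item in items]
        for future in as_completed(futures):
            # Completion order is not guaranteed to match submission order.
            results.append(future.result())
    return results


lines = [f"line {i}" for i in range(20)]
out = run_unordered(lines)
print(len(out), sorted(out) == sorted(lines))
```

Every item comes out exactly once, but nothing guarantees that `out` has the same order as `lines`, which is the behavior the slide calls unordered parallelization.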
  • 49. Meandre: ZigZag Script Language •  Automatic Parallelization –  The operator [+AUTO] can be told to maintain data order with “!”
      # Describes the data-intensive flow
      #
      @pu = push()
      @pt = pass( string:pu.string ) [+AUTO!]
      print( object:pt.string )
    –  [+AUTO!] tells the ZigZag compiler to parallelize the “pass” component instance by the number of cores available on the system and to maintain the order of the data passing through.
  • 50. Meandre: ZigZag Script Language •  Automatic Parallelization –  ZigZag has created 4 parallel instances of the component. •  It has also introduced a mapper instance that is in charge of distributing the incoming data to each of the parallel instances. •  It has also introduced a reducer instance that is in charge of collecting the data coming from the parallel instances and restoring its original order.
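One common way to keep data in order under parallel execution is to tag each item with a sequence number on the way in and release results only when the next expected number has arrived. The sketch below shows that pattern with the Python standard library. It is an assumption about how an order-preserving collector can work, not a description of Meandre's implementation, and `pass_through` and `run_ordered` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed


def pass_through(item):
    """Stand-in for the 'pass' component: returns its input unchanged."""
    return item


def run_ordered(items, workers=4):
    """Process items in parallel and release results in their original order."""
    ordered = []
    pending = []   # min-heap of (sequence number, result) awaiting release
    next_seq = 0   # sequence number that must be released next
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tag every submitted item with its position in the input.
        futures = {pool.submit(pass_through, item): seq
                   for seq, item in enumerate(items)}
        for future in as_completed(futures):
            heapq.heappush(pending, (futures[future], future.result()))
            # Release every buffered result that is next in line.
            while pending and pending[0][0] == next_seq:
                ordered.append(heapq.heappop(pending)[1])
                next_seq += 1
    return ordered


lines = [f"line {i}" for i in range(20)]
ordered_out = run_ordered(lines)
print(ordered_out == lines)
```

The price of ordering is buffering: a slow item holds back every result behind it, so ordered parallelization can have higher latency than the unordered variant.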
  • 51. Meandre: Flows to MAU •  Flows can be executed using their RDF descriptors •  Flows can be compiled into a MAU •  A MAU is: –  A self-contained representation –  Ready for execution –  Portable –  The base of flow execution in grid environments
  • 52. Meandre: The Architecture •  The design of the Meandre architecture follows three directives: –  provide a robust and transparent scalable solution from a laptop to large-scale clusters –  create a unified solution for batch and interactive tasks –  encourage reusing and sharing components •  To ensure such goals, the designed architecture relies on four stacked layers and builds on top of service-oriented architectures (SOA)
  • 53. Meandre: Basic Single Server (diagram)
  • 54. Meandre MDX: Cloud Computing •  Servers can be –  instantiated on demand –  disposed of when done or on demand •  A cluster is formed by at least one server •  The Meandre Distributed Exchange (MDX) –  Orchestrates operational integrity by managing cluster configuration and membership using a shared database resource.
  • 55. Meandre MDX: The Picture (diagram: MDX backbone)
  • 56. Meandre MDX: The Architecture •  Virtualization infrastructure –  Provides uniform access to the underlying execution environment. It relies on virtualization of machines and the usage of Java for hardware abstraction. •  IO standardization –  A unified layer provides access to shared data stores, distributed file systems, specialized metadata stores, and other service-oriented architecture gateways. •  Data-intensive flow infrastructure –  Provides the basic Meandre execution engine for data-intensive flows, component repositories and discovery mechanisms, extensible plugins, and web user interfaces (webUIs). •  Interaction layer –  Can provide self-contained applications via webUIs, create plugins for third-party services, interact with the embedding application that relies on the Meandre engine, or provide services to the cloud.
  • 57. Meandre MDX: The Experiment •  Experimental Prototype –  Designed and built to validate the viability of an MDX cluster –  Using VMware Server 2.0 on three identical hosts with •  Windows Server 2003 •  Two quad-core 2.8 GHz Xeon processors •  1600 MHz front side bus •  32 GB of RAM •  4 TB of RAID 5 disk
  • 58. Meandre MDX: The Experiment •  Experimental Prototype –  8 virtual machine instances were created on each host with •  32-bit Ubuntu 8.04 Linux •  3 GB RAM dedicated to each instance •  1 physical processor core assigned to each VM •  VM instances were equipped to run a Meandre MDX server using Sun's Java 1.5 JVM –  A third physical host supports 2 virtual machine instances with •  32-bit Ubuntu 8.04 Linux •  3 GB RAM dedicated to each instance •  1 physical processor core assigned to each VM •  Highly available MySQL database and HTTP load-balancing facility
  • 59. Meandre MDX: The Experiment •  We conducted three different experiments –  All three were based on the same flow shown earlier in the ZigZag example, with a single change: the single line of text became 250,000 lines of text for each iteration of the flow. –  The first test was designed to test the scalability of a single Meandre server. (figure caption: concurrent flows running on a standalone engine on a log/log scale; each iteration of the flow pushed 250,000 lines of text)
  • 60. Meandre MDX: The Experiment •  We conducted three different experiments –  All three were based on the same flow shown earlier in the ZigZag example, with a single change: the single line of text became 250,000 lines of text for each iteration of the flow. –  The second experiment was run against a virtual Meandre cluster consisting of 16 Meandre servers. (figure caption: concurrent flows running on a standalone engine on a log/log scale; each iteration of the flow pushed 1 line of text)
  • 61. Meandre MDX: The Experiment •  We conducted three different experiments –  All three were based on the same flow shown earlier in the ZigZag example, with a single change: the single line of text became 250,000 lines of text for each iteration of the flow. –  The third experiment was run against a virtual Meandre cluster consisting of 16 Meandre servers. (figure caption: concurrent flows running on a standalone engine on a log/log scale; each iteration of the flow pushed 250,000 lines of text)
  • 62. Meandre MDX: The Experiment •  We conducted three different experiments –  The first test shows •  The average time per flow increased linearly with the number of concurrent flows –  The next experiments show •  Cluster throughput grows linearly with the number of Meandre servers available
  • 63. Upcoming Events •  SEASR 2009 workshop –  The workshop is organized to provide expanded opportunities for learning, knowledge sharing, and support, with enough introduction and support that teams can implement a study using SEASR. –  The workshop is intended for institutional teams of scholars from the humanities. –  The workshop will include communication and work from a team’s home campus as well as a face-to-face meeting on the University of Illinois campus.
  • 64. SEASR: Meandre: Semantic-Driven Data-Intensive Flows in the Clouds Xavier Llora, Bernie Acs, Loretta Auvil, Boris Capitanu, Michael Welge, David Goldberg National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign {xllora, acs1, lauvil, capitanu, mwelge, deg}@illinois.edu